Troubleshooting Guide

Solutions for common issues with Ohlala SmartOps deployment, Teams integration, and daily operations. Find quick fixes and detailed debugging steps.

    Quick solutions for common issues with Ohlala SmartOps. Use the search function (Ctrl+F) to find specific error messages.

    🚨 Quick Diagnostics

    Run this checklist to identify common issues:

    1. Check Service Health

      curl https://your-api-gateway-url/prod-{StackName}/health
      

      Expected: {"status": "healthy"}

    2. Verify CloudFormation Stack

      • AWS Console → CloudFormation
      • Stack status: CREATE_COMPLETE or UPDATE_COMPLETE
    3. Check ECS Service

      • AWS Console → ECS → Clusters
      • Service should have 1 running task
    4. Review Recent Logs

      • AWS Console → CloudWatch → Log Groups
      • Check /aws/ecs/ohlala-smartops-{StackName}
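
The first three checks in this list can also be run from a terminal. The following is a minimal sketch, not an official tool: the function names are hypothetical, and the health URL, cluster name, and service name are values you supply yourself. Nothing runs until `run_diagnostics` is called.

```shell
#!/usr/bin/env bash
# Quick-diagnostics helpers (illustrative sketch; function names are hypothetical).

# Returns 0 when a /health response body reports "status": "healthy".
is_healthy() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# Returns 0 for the two stack states the checklist treats as good.
stack_status_ok() {
  case "$1" in
    CREATE_COMPLETE|UPDATE_COMPLETE) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: run_diagnostics <health-url> <stack-name> <ecs-cluster> <ecs-service>
run_diagnostics() {
  local url="$1" stack="$2" cluster="$3" service="$4" body status running

  # 1. Service health
  body=$(curl -fsS --max-time 10 "$url") || body=""
  if is_healthy "$body"; then echo "health: OK"; else echo "health: FAIL ($body)"; fi

  # 2. CloudFormation stack status
  status=$(aws cloudformation describe-stacks --stack-name "$stack" \
    --query 'Stacks[0].StackStatus' --output text)
  if stack_status_ok "$status"; then echo "stack: OK ($status)"; else echo "stack: FAIL ($status)"; fi

  # 3. ECS running task count (the checklist expects exactly 1)
  running=$(aws ecs describe-services --cluster "$cluster" --services "$service" \
    --query 'services[0].runningCount' --output text)
  if [ "$running" = "1" ]; then echo "ecs: OK"; else echo "ecs: FAIL (running tasks: $running)"; fi
}
```

Example call, with placeholder values: `run_diagnostics "https://your-api-gateway-url/prod-{StackName}/health" your-stack-name your-cluster your-service`.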

    📊 CloudWatch Logs Troubleshooting

    Quick Log Analysis

    Most issues can be diagnosed by checking CloudWatch logs for ERROR messages in the ECS task logs.

    1. Access ECS Task Logs

    Via AWS Console:

    1. Go to CloudWatch → Log Groups
    2. Find /aws/ecs/ohlala-smartops-{your-stack-name}
    3. Click on the most recent log stream
    4. Search for “ERROR” using Ctrl+F
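
The same search can be done from the command line. This is a sketch under assumptions: the helper names are hypothetical, the log group follows the naming pattern shown above, and the 60-minute default window is an arbitrary choice.

```shell
# Log group name for a stack, following the naming pattern used in this guide.
log_group_for_stack() {
  printf '/aws/ecs/ohlala-smartops-%s\n' "$1"
}

# Epoch milliseconds for "N minutes ago" (filter-log-events expects milliseconds).
minutes_ago_ms() {
  echo $(( ( $(date +%s) - $1 * 60 ) * 1000 ))
}

# Usage: recent_errors <stack-name> [minutes]
# Prints log messages containing ERROR from the last N minutes (default 60).
recent_errors() {
  aws logs filter-log-events \
    --log-group-name "$(log_group_for_stack "$1")" \
    --filter-pattern "ERROR" \
    --start-time "$(minutes_ago_ms "${2:-60}")" \
    --query 'events[].message' \
    --output text
}
```

For example, `recent_errors your-stack-name 30` would list ERROR lines from the last half hour across all log streams in the group, rather than only the most recent stream.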

    🤖 Bot Not Responding

    Symptoms

    • No response when messaging the bot in Teams
    • Bot appears offline
    • Commands time out without a response

    Solution 1: Verify Webhook Configuration

    1. Check Webhook URL

      # Get from CloudFormation outputs
      aws cloudformation describe-stacks \
        --stack-name your-stack-name \
        --query "Stacks[0].Outputs[?OutputKey=='TeamsWebhookURL'].OutputValue" \
        --output text
      
    2. Update in Azure Bot

      • Azure Portal → Your Bot → Configuration
      • Messaging endpoint must match CloudFormation output
      • Must end with /api/messages
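
Before pasting the URL into Azure, it can help to sanity-check the value CloudFormation returns. The sketch below assumes the `TeamsWebhookURL` output key shown above; the function names are hypothetical, and the HTTPS check is an added assumption based on Azure Bot requiring an HTTPS messaging endpoint.

```shell
# Returns 0 when the URL is HTTPS and ends with /api/messages.
valid_messaging_endpoint() {
  case "$1" in
    https://*/api/messages) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage: check_webhook <stack-name>
# Fetches the webhook URL from the stack outputs and validates its shape.
check_webhook() {
  local url
  url=$(aws cloudformation describe-stacks \
    --stack-name "$1" \
    --query "Stacks[0].Outputs[?OutputKey=='TeamsWebhookURL'].OutputValue" \
    --output text)
  if valid_messaging_endpoint "$url"; then
    echo "OK: $url"
  else
    echo "Unexpected endpoint: '$url'" >&2
    return 1
  fi
}
```

A trailing slash or a plain `http://` URL fails the check, which are two easy mistakes to make when copying the endpoint by hand.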

    Solution 2: Check Authentication

    1. Verify Secrets in AWS

      aws secretsmanager get-secret-value \
        --secret-id ohlala-smartops-teams-{StackName} \
        --query SecretString \
        --output json
      
    2. Validate Credentials Match Azure

      • App ID must match Azure Bot’s App ID
      • Password must be valid and not expired
      • Tenant ID must match your Azure AD
    3. Check Lambda Authorizer Logs

      • CloudWatch → Log Groups → /aws/lambda/ohlala-authorizer-{StackName}
      • Look for “Authorization failed” messages

    Solution 3: Teams App Issues

    1. Re-upload Teams Package

      • Remove existing app from Teams
      • Download fresh package
      • Update manifest.json with correct bot ID
      • Re-upload to Teams
      • You may need to manually bump the version in manifest.json to force Teams to accept the update
    2. Check Teams Policies

      • Teams Admin Center → Teams apps → Permission policies
      • Ensure custom apps are allowed
      • Check user has permission to use bots

    ❌ Deployment Failures

    CloudFormation Stack Failed

    Error: “CREATE_FAILED - Resource handler returned message: ‘The specified subnet does not exist’”

    Solution:

    # For Existing VPC mode, verify subnet IDs
    aws ec2 describe-subnets \
      --subnet-ids subnet-xxxxx \
      --region your-region
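
When the stack takes several subnets, checking them one at a time gets tedious. The following is a sketch with hypothetical function names; it screens out placeholder or mistyped IDs locally (subnet IDs are `subnet-` followed by 8 or 17 hexadecimal characters) before asking EC2 about each one.

```shell
# Returns 0 when the argument has the shape of a subnet ID.
looks_like_subnet_id() {
  printf '%s\n' "$1" | grep -Eq '^subnet-([0-9a-f]{8}|[0-9a-f]{17})$'
}

# Usage: check_subnets <region> <subnet-id>...
# Prints subnet, VPC, and availability zone for each ID that exists in the region.
check_subnets() {
  local region="$1" id
  shift
  for id in "$@"; do
    if ! looks_like_subnet_id "$id"; then
      echo "$id: not a valid subnet ID format"
      continue
    fi
    aws ec2 describe-subnets --subnet-ids "$id" --region "$region" \
      --query 'Subnets[0].[SubnetId,VpcId,AvailabilityZone]' --output text \
      || echo "$id: not found in $region"
  done
}
```

The VPC column is worth a glance as well: subnets that exist but belong to a different VPC than the one passed to the stack will also cause deployment problems.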
    

    Error: “CREATE_FAILED - IAM role already exists”

    Solution:

    # Delete the existing roles, or deploy under a different stack name.
    # delete-role fails while policies are still attached, so detach them first.
    aws iam delete-role --role-name ec2-management-bot-execution-role
    aws iam delete-role --role-name ec2-management-bot-task-role
    

    ECS Task Won’t Start

    Error: “ResourceInitializationError: unable to pull secrets or registry auth”

    Solution:

    1. Check ECR permissions
    2. Verify marketplace subscription is active
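    3. Check the permissions on the execution role. The managed policy below covers ECR pulls and CloudWatch Logs only; reading secrets also requires secretsmanager:GetSecretValue on the role: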
    aws iam attach-role-policy \
      --role-name ec2-management-bot-execution-role \
      --policy-arn arn:aws:iam::aws:policy/AmazonECSTaskExecutionRolePolicy
    

    🧠 Bedrock Model Issues

    Error: “ValidationException: The provided model identifier is invalid”

    This is the most common deployment issue.

    Cause: Amazon Bedrock Claude Sonnet 4 model access is not enabled or not available in your deployment region.

    Solution:

    1. Navigate to Amazon Bedrock Console

      • Go to AWS Console → Amazon Bedrock
      • Ensure you’re in the correct region (same as deployment)
    2. Enable Claude Sonnet 4 Model Access

      • Left sidebar → “Model access”
      • Click “Edit” or “Manage model access”
      • Find Anthropic section
      • Enable Claude Sonnet 4:
        • ✅ Claude Sonnet 4 (anthropic.claude-sonnet-4-20250514-v1:0)
    3. Submit Request

      • Click “Next” → “Submit”
      • Most requests are approved immediately
      • Wait for status to show “Available”
    4. Verify Access

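      # Lists the model if it is offered in your deployment region.
      # This confirms availability; it does not by itself prove access was granted.
      aws bedrock list-foundation-models \
        --region your-region \
        --query 'modelSummaries[?contains(modelId, `claude-sonnet-4`)]'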
      
    5. Test in Bedrock Playground

      • Bedrock Console → Playgrounds → Chat
      • Select Claude Sonnet 4
      • Send test message: “Hello”
      • Should receive response
    6. Restart Application (if already deployed)

      # Force ECS service restart
      aws ecs update-service \
        --cluster your-cluster \
        --service your-service \
        --force-new-deployment
      

    Regional Support with Cross-Region Inference Profiles:

    Primary Regions (Native Claude Sonnet 4 Support):

    • us-east-1 ✅ (Recommended)
    • us-west-2 ✅
    • eu-west-1 ✅
    • eu-central-1 ✅
    • ap-northeast-1 ✅
    • ap-southeast-2 ✅

    Supported via Inference Profiles:

    • eu-west-3 ✅ (via global/EU inference profiles)
    • eu-west-2 ✅ (via global/EU inference profiles)
    • eu-north-1 ✅ (via global/EU inference profiles)
    • ap-southeast-1 ✅ (via global/APAC inference profiles)
    • ap-northeast-2 ✅ (via global/APAC inference profiles)
    • ap-south-1 ✅ (via global/APAC inference profiles)
    • ca-central-1 ✅ (via global inference profiles)
    • sa-east-1 ✅ (via global inference profiles)

    How Inference Profiles Work:

    1. Global Profile: global.anthropic.claude-sonnet-4-20250514-v1:0 - Works from any region
    2. Regional Profiles: eu.anthropic.claude-sonnet-4-20250514-v1:0 - Optimized for EU regions
    3. Automatic Fallback: Application automatically tries the best profile for your region

    For eu-west-3 Specifically:

    • The application will automatically use global or EU inference profiles
    • No additional configuration required
    • Same Claude Sonnet 4 quality and performance
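
To make the fallback idea concrete, here is a sketch of how a region could be mapped to candidate profile IDs. It is illustrative only and is not the application's actual selection logic: the function name is hypothetical, and only the two profile IDs quoted above (EU and global) are used, since this guide does not show the APAC profile ID.

```shell
MODEL_ID="anthropic.claude-sonnet-4-20250514-v1:0"

# Usage: candidate_profiles <region>
# Prints candidate inference profile IDs, most specific first, one per line.
candidate_profiles() {
  case "$1" in
    eu-*) echo "eu.${MODEL_ID}" ;;   # regional profile for EU regions
  esac
  echo "global.${MODEL_ID}"          # global profile as the fallback
}
```

For `eu-west-3` this prints the EU profile followed by the global one; for `sa-east-1` it prints only the global profile. If your AWS CLI version provides `aws bedrock list-inference-profiles`, running it with `--region` set to your deployment region should show which profiles are actually available to your account.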

    Error: “AccessDeniedException: You do not have access to the requested model”

    Cause: Model access was requested but has not yet been approved, or the wrong model ID is in use.

    Solution:

    1. Check approval status:

      • Bedrock Console → Model access
      • Status should be “Available”, not “Pending”
    2. Wait for approval:

      • Standard models: Usually immediate
      • Advanced models: Up to 24-48 hours
      • Check email for approval notification

    🔐 Permission Issues

    Error: “AccessDeniedException: User is not authorized to perform bedrock:InvokeModel”

    Solution:

    1. Add Bedrock permissions to ECS task role:
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Action": [
            "bedrock:InvokeModel",
            "bedrock:InvokeModelWithResponseStream"
          ],
          "Resource": [
            "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
            "arn:aws:bedrock:*:*:inference-profile/*claude*"
          ]
        }
      ]
    }
    
    2. Ensure Bedrock is available in your region
    3. Check Service Control Policies (SCPs) aren’t blocking access
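
One way to apply the policy from step 1 is as an inline policy on the task role. This is a sketch under assumptions: the function names and the policy name `ohlala-bedrock-invoke` are hypothetical, and the role name is whatever your stack created (this guide mentions `ec2-management-bot-task-role`).

```shell
# Prints the Bedrock policy document from step 1.
bedrock_policy_json() {
  cat <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*:*:inference-profile/*claude*"
      ]
    }
  ]
}
EOF
}

# Usage: grant_bedrock_access <task-role-name>
# Attaches the document as an inline policy (policy name is a placeholder).
grant_bedrock_access() {
  aws iam put-role-policy \
    --role-name "$1" \
    --policy-name "ohlala-bedrock-invoke" \
    --policy-document "$(bedrock_policy_json)"
}
```

Because the role is managed by CloudFormation, a manual change like this shows up as stack drift and may be overwritten by a later stack update, so treat it as a diagnostic step rather than a permanent fix.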

    Error: “UnauthorizedOperation: You are not authorized to perform ec2:DescribeInstances”

    Solution:

    1. Update task role policy
    2. Check for SCPs (Service Control Policies) blocking access
    3. Verify cross-account permissions if using multiple accounts

    💬 Teams Integration Issues

    Bot Shows as Offline

    Causes & Solutions:

    1. Azure Bot Channel Not Configured

      • Azure Portal → Bot → Channels
      • Ensure Teams channel is enabled
      • Status should be “Running”
    2. API Gateway Throttling

      • Check CloudWatch metrics for 429 errors
    3. Network Connectivity

      • Verify security groups allow HTTPS outbound
      • Check NAT Gateway is functioning (if used)

    Messages Not Formatted Correctly

    Issue: Bot responses show raw JSON or markdown

    Solution:

    1. Update Teams app manifest version
    2. Ensure bot supports Adaptive Cards:
    "supportsFiles": false,
    "supportsCalling": false,
    "supportsVideo": false
    

    Bot Added but Can’t Use Commands

    Issue: Bot visible but commands don’t work

    Solution:

    1. Check bot is added to channel properly
    2. Verify @ mentions are working
    3. Test in personal chat first
    4. Check Teams app permissions

    📞 Getting Support

    Before Contacting Support

    1. Collect diagnostic information:

      • Stack name and region
      • Error messages (exact text)
      • CloudWatch logs (last 100 lines)
      • Time of occurrence
    2. Try quick fixes:

      • Restart ECS service
      • Clear Teams cache
      • Re-authenticate bot
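
The diagnostic information listed in step 1 can be gathered into a single file to attach to a support email. This is a sketch with hypothetical function names; it assumes the log group naming pattern used earlier in this guide and AWS CLI v2, which provides `aws logs tail`. The one-hour window is an arbitrary choice.

```shell
# Prints the header fields for a diagnostics file.
diagnostic_header() {
  printf 'Stack name: %s\n' "$1"
  printf 'Region: %s\n' "$2"
  printf 'Collected at (UTC): %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)"
}

# Usage: collect_diagnostics <stack-name> <region> > diagnostics.txt
# Writes the header followed by the last 100 log lines from the past hour.
collect_diagnostics() {
  diagnostic_header "$1" "$2"
  echo "--- Last 100 log lines ---"
  aws logs tail "/aws/ecs/ohlala-smartops-$1" --region "$2" --since 1h | tail -n 100
}
```

The exact error text and the time the problem occurred still need to be added by hand, since the script only records when the logs were collected.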

    Contact Support

    Email: support@ohlala.cloud

    Include:

    • AWS Account ID
    • Stack Name
    • Error Description
    • Steps to Reproduce
    • Diagnostic Logs

    Response Time: 1 business day

    📖 Additional Resources