Troubleshooting Guide

Solutions for common issues with Ohlala SmartOps deployment, Teams integration, and daily operations. Find quick fixes and detailed debugging steps.

Troubleshooting Guide

Quick solutions for common issues with Ohlala SmartOps. Use the search function (Ctrl+F) to find specific error messages.

🚨 Quick Diagnostics

Run this checklist to identify common issues:

  1. Check Service Health

    curl https://your-api-gateway-url/prod-{StackName}/health
    

    Expected: {"status": "healthy"}

  2. Verify CloudFormation Stack

    • AWS Console → CloudFormation
    • Stack status: CREATE_COMPLETE or UPDATE_COMPLETE
  3. Check ECS Service

    • AWS Console → ECS → Clusters
    • Service should have 1 running task
  4. Review Recent Logs

    • AWS Console → CloudWatch → Log Groups
    • Check /aws/ecs/ohlala-smartops-{StackName}

📊 CloudWatch Logs Troubleshooting

Quick Log Analysis

Most issues can be diagnosed by checking CloudWatch logs for ERROR messages in the ECS task logs.

1. Access ECS Task Logs

Via AWS Console:

  1. Go to CloudWatchLog Groups
  2. Find /aws/ecs/ohlala-smartops-{your-stack-name}
  3. Click on the most recent log stream
  4. Search for “ERROR” using Ctrl+F

🤖 Bot Not Responding

Symptoms

  • No response when messaging the bot in Teams
  • Bot appears offline
  • Commands timeout without response

Solution 1: Verify Webhook Configuration

  1. Check Webhook URL

    # Get from CloudFormation outputs
    aws cloudformation describe-stacks \
      --stack-name your-stack-name \
      --query "Stacks[0].Outputs[?OutputKey=='TeamsWebhookURL'].OutputValue" \
      --output text
    
  2. Update in Azure Bot

    • Azure Portal → Your Bot → Configuration
    • Messaging endpoint must match CloudFormation output
    • Must end with /api/messages

Solution 2: Check Authentication

  1. Verify Secrets in AWS

    aws secretsmanager get-secret-value \
      --secret-id ohlala-smartops-teams-{StackName} \
      --query SecretString \
      --output json
    
  2. Validate Credentials Match Azure

    • App ID must match Azure Bot’s App ID
    • Password must be valid and not expired
    • Tenant ID must match your Azure AD
  3. Check Lambda Authorizer Logs

    • CloudWatch → Log Groups → /aws/lambda/ohlala-authorizer-{StackName}
    • Look for “Authorization failed” messages

Solution 3: Teams App Issues

  1. Re-upload Teams Package

    • Remove existing app from Teams
    • Download fresh package
    • Update manifest.json with correct bot ID
    • Re-upload to Teams
    • You may need to manually bump the version in manifest.json to force Teams to accept the update
  2. Check Teams Policies

    • Teams Admin Center → Teams apps → Permission policies
    • Ensure custom apps are allowed
    • Check user has permission to use bots

❌ Deployment Failures

CloudFormation Stack Failed

Error: “CREATE_FAILED - Resource handler returned message: ‘The specified subnet does not exist’”

Solution:

# For Existing VPC mode, verify subnet IDs
aws ec2 describe-subnets \
  --subnet-ids subnet-xxxxx \
  --region your-region

Error: “CREATE_FAILED - IAM role already exists”

Solution:

# Delete existing role or use different stack name
aws iam delete-role --role-name ec2-management-bot-execution-role
aws iam delete-role --role-name ec2-management-bot-task-role

ECS Task Won’t Start

Error: “ResourceInitializationError: unable to pull secrets or registry auth”

Solution:

  1. Check ECR permissions
  2. Verify marketplace subscription is active
  3. Check execution role has secret access:
aws iam attach-role-policy \
  --role-name ec2-management-bot-execution-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonECSTaskExecutionRolePolicy

🧠 Bedrock Model Issues

Error: “ValidationException: The provided model identifier is invalid”

This is the #1 most common deployment issue!

Cause: Amazon Bedrock Claude Sonnet 4 model access is not enabled or not available in your deployment region.

Solution:

  1. Navigate to Amazon Bedrock Console

    • Go to AWS Console → Amazon Bedrock
    • Ensure you’re in the correct region (same as deployment)
  2. Enable Claude Sonnet 4 Model Access

    • Left sidebar → “Model access”
    • Click “Edit” or “Manage model access”
    • Find Anthropic section
    • Enable Claude Sonnet 4:
      • Claude Sonnet 4 (anthropic.claude-sonnet-4-20250514-v1:0)
  3. Submit Request

    • Click “Next” → “Submit”
    • Most requests are approved immediately
    • Wait for status to show “Available”
  4. Verify Access

    # Test via AWS CLI
    aws bedrock list-foundation-models \
      --region us-east-1 \
      --query 'modelSummaries[?contains(modelId, `claude-sonnet-4`)]'
    
  5. Test in Bedrock Playground

    • Bedrock Console → Playgrounds → Chat
    • Select Claude Sonnet 4
    • Send test message: “Hello”
    • Should receive response
  6. Restart Application (if already deployed)

    # Force ECS service restart
    aws ecs update-service \
      --cluster your-cluster \
      --service your-service \
      --force-new-deployment
    

Regional Support with Cross-Region Inference Profiles:

Primary Regions (Native Claude Sonnet 4 Support):

  • us-east-1 ✅ (Recommended)
  • us-west-2 ✅
  • eu-west-1 ✅
  • eu-central-1 ✅
  • ap-northeast-1 ✅
  • ap-southeast-2 ✅

Supported via Inference Profiles:

  • eu-west-3 ✅ (via global/EU inference profiles)
  • eu-west-2 ✅ (via global/EU inference profiles)
  • eu-north-1 ✅ (via global/EU inference profiles)
  • ap-southeast-1 ✅ (via global/APAC inference profiles)
  • ap-northeast-2 ✅ (via global/APAC inference profiles)
  • ap-south-1 ✅ (via global/APAC inference profiles)
  • ca-central-1 ✅ (via global inference profiles)
  • sa-east-1 ✅ (via global inference profiles)

How Inference Profiles Work:

  1. Global Profile: global.anthropic.claude-sonnet-4-20250514-v1:0 - Works from any region
  2. Regional Profiles: eu.anthropic.claude-sonnet-4-20250514-v1:0 - Optimized for EU regions
  3. Automatic Fallback: Application automatically tries the best profile for your region

For eu-west-3 Specifically:

  • The application will automatically use global or EU inference profiles
  • No additional configuration required
  • Same Claude Sonnet 4 quality and performance

Error: “AccessDeniedException: You do not have access to the requested model”

Cause: Model access requested but not yet approved, or using wrong model ID.

Solution:

  1. Check approval status:

    • Bedrock Console → Model access
    • Status should be “Available”, not “Pending”
  2. Wait for approval:

    • Standard models: Usually immediate
    • Advanced models: Up to 24-48 hours
    • Check email for approval notification

🔐 Permission Issues

Error: “AccessDeniedException: User is not authorized to perform bedrock:InvokeModel”

Solution:

  1. Add Bedrock permissions to ECS task role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*:*:inference-profile/*claude*"
      ]
    }
  ]
}
  1. Ensure Bedrock is available in your region
  2. Check Service Control Policies (SCPs) aren’t blocking access

Error: “UnauthorizedOperation: You are not authorized to perform ec2:DescribeInstances”

Solution:

  1. Update task role policy
  2. Check for SCPs (Service Control Policies) blocking access
  3. Verify cross-account permissions if using multiple accounts

💬 Teams Integration Issues

Bot Shows as Offline

Causes & Solutions:

  1. Azure Bot Channel Not Configured

    • Azure Portal → Bot → Channels
    • Ensure Teams channel is enabled
    • Status should be “Running”
  2. API Gateway Throttling

    • Check CloudWatch metrics for 429 errors
  3. Network Connectivity

    • Verify security groups allow HTTPS outbound
    • Check NAT Gateway is functioning (if used)

Messages Not Formatted Correctly

Issue: Bot responses show raw JSON or markdown

Solution:

  1. Update Teams app manifest version
  2. Ensure bot supports Adaptive Cards:
"supportsFiles": false,
"supportsCalling": false,
"supportsVideo": false

Bot Added but Can’t Use Commands

Issue: Bot visible but commands don’t work

Solution:

  1. Check bot is added to channel properly
  2. Verify @ mentions are working
  3. Test in personal chat first
  4. Check Teams app permissions

📞 Getting Support

Before Contacting Support

  1. Collect diagnostic information:

    • Stack name and region
    • Error messages (exact text)
    • CloudWatch logs (last 100 lines)
    • Time of occurrence
  2. Try quick fixes:

    • Restart ECS service
    • Clear Teams cache
    • Re-authenticate bot

Contact Support

Email: support@ohlala.cloud

Include:

  • AWS Account ID
  • Stack Name
  • Error Description
  • Steps to Reproduce
  • Diagnostic Logs

Response Time: 1 business day

📖 Additional Resources