Troubleshooting Guide

Solutions for common issues with Ohlala SmartOps deployment, Teams integration, and daily operations. Find quick fixes and detailed debugging steps.

Quick Diagnostics

Run this checklist to identify common issues:

  1. Check Service Health

    curl https://your-api-gateway-url/prod-{StackName}/health
    

    Expected: {"status": "healthy"}

  2. Verify CloudFormation Stack

    • AWS Console → CloudFormation
    • Stack status: CREATE_COMPLETE or UPDATE_COMPLETE
  3. Check ECS Service

    • AWS Console → ECS → Clusters
    • Service should have 1 running task
  4. Review Recent Logs

    • AWS Console → CloudWatch → Log Groups
    • Check /aws/ecs/ohlala-smartops-{StackName}

CloudWatch Logs Troubleshooting

Quick Log Analysis

Most issues can be diagnosed by checking CloudWatch logs for ERROR messages in the ECS task logs.

1. Access ECS Task Logs

Via AWS Console:

  1. Go to CloudWatchLog Groups
  2. Find /aws/ecs/ohlala-smartops-{your-stack-name}
  3. Click on the most recent log stream
  4. Search for “ERROR” using Ctrl+F

Bot Not Responding

Symptoms

  • No response when messaging the bot in Teams
  • Bot appears offline
  • Commands timeout without response

Solution 1: Verify Webhook Configuration

  1. Check Webhook URL

    # Get from CloudFormation outputs
    aws cloudformation describe-stacks \
      --stack-name your-stack-name \
      --query "Stacks[0].Outputs[?OutputKey=='TeamsWebhookURL'].OutputValue" \
      --output text
    
  2. Update in Azure Bot

    • Azure Portal → Your Bot → Configuration
    • Messaging endpoint must match CloudFormation output
    • Must end with /api/messages

Solution 2: Check Authentication

  1. Verify Secrets in AWS

    aws secretsmanager get-secret-value \
      --secret-id ohlala-smartops-teams-{StackName} \
      --query SecretString \
      --output json
    
  2. Validate Credentials Match Azure

    • App ID must match Azure Bot’s App ID
    • Password must be valid and not expired
    • Tenant ID must match your Azure AD
  3. Check Lambda Authorizer Logs

    • CloudWatch → Log Groups → /aws/lambda/ohlala-authorizer-{StackName}
    • Look for “Authorization failed” messages

Solution 3: Teams App Issues

  1. Re-upload Teams Package

    • Remove existing app from Teams
    • Download fresh package
    • Update manifest.json with correct bot ID
    • Re-upload to Teams
    • You may need to manually bump the version in manifest.json to force Teams to accept the update
  2. Check Teams Policies

    • Teams Admin Center → Teams apps → Permission policies
    • Ensure custom apps are allowed
    • Check user has permission to use bots

Deployment Failures

CloudFormation Stack Failed

Error: “CREATE_FAILED - Resource handler returned message: ‘The specified subnet does not exist’”

Solution:

# For Existing VPC mode, verify subnet IDs
aws ec2 describe-subnets \
  --subnet-ids subnet-xxxxx \
  --region your-region

Error: “CREATE_FAILED - IAM role already exists”

Solution:

# Delete existing role or use different stack name
aws iam delete-role --role-name ec2-management-bot-execution-role
aws iam delete-role --role-name ec2-management-bot-task-role

ECS Task Won’t Start

Error: “ResourceInitializationError: unable to pull secrets or registry auth”

Solution:

  1. Check ECR permissions
  2. Verify marketplace subscription is active
  3. Check execution role has secret access:
aws iam attach-role-policy \
  --role-name ec2-management-bot-execution-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonECSTaskExecutionRolePolicy

Bedrock Model Issues

Error: “ValidationException: The provided model identifier is invalid”

This is the #1 most common deployment issue!

Cause: Amazon Bedrock Claude Sonnet 4.5 model access is not enabled or not available in your deployment region.

Solution:

  1. Navigate to Amazon Bedrock Console

    • Go to AWS Console → Amazon Bedrock
    • Ensure you’re in the correct region (same as deployment)
  2. Enable Claude Sonnet 4.5 Model Access

    • Left sidebar → “Model access”
    • Click “Edit” or “Manage model access”
    • Find Anthropic section
    • Enable Claude Sonnet 4.5:
      • Claude Sonnet 4.5 (anthropic.claude-sonnet-4-5-*)
  3. Submit Request

    • Click “Next” → “Submit”
    • Most requests are approved immediately
    • Wait for status to show “Available”
  4. Verify Access

    # Test via AWS CLI
    aws bedrock list-foundation-models \
      --region us-east-1 \
      --query 'modelSummaries[?contains(modelId, `claude-sonnet`)]'
    
  5. Test in Bedrock Playground

    • Bedrock Console → Playgrounds → Chat
    • Select Claude Sonnet 4.5
    • Send test message: “Hello”
    • Should receive response
  6. Restart Application (if already deployed)

    # Force ECS service restart
    aws ecs update-service \
      --cluster your-cluster \
      --service your-service \
      --force-new-deployment
    

Regional Support with Cross-Region Inference Profiles:

Primary Regions (Native Claude Sonnet 4.5 Support):

  • us-east-1 (Recommended)
  • us-west-2
  • eu-west-1
  • eu-central-1
  • ap-northeast-1
  • ap-southeast-2

Supported via Inference Profiles:

  • eu-west-3 (via global/EU inference profiles)
  • eu-west-2 (via global/EU inference profiles)
  • eu-north-1 (via global/EU inference profiles)
  • ap-southeast-1 (via global/APAC inference profiles)
  • ap-northeast-2 (via global/APAC inference profiles)
  • ap-south-1 (via global/APAC inference profiles)
  • ca-central-1 (via global inference profiles)
  • sa-east-1 (via global inference profiles)

How Inference Profiles Work:

  1. Global Profile: global.anthropic.claude-sonnet-4-5-* - Works from any region
  2. Regional Profiles: eu.anthropic.claude-sonnet-4-5-* - Optimized for EU regions
  3. Automatic Fallback: Application automatically tries the best profile for your region

For eu-west-3 Specifically:

  • The application will automatically use global or EU inference profiles
  • No additional configuration required
  • Same Claude Sonnet 4.5 quality and performance

Error: “AccessDeniedException: You do not have access to the requested model”

Cause: Model access requested but not yet approved, or using wrong model ID.

Solution:

  1. Check approval status:

    • Bedrock Console → Model access
    • Status should be “Available”, not “Pending”
  2. Wait for approval:

    • Standard models: Usually immediate
    • Advanced models: Up to 24-48 hours
    • Check email for approval notification

Permission Issues

Error: “AccessDeniedException: User is not authorized to perform bedrock:InvokeModel”

Solution:

  1. Add Bedrock permissions to ECS task role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*:*:inference-profile/*claude*"
      ]
    }
  ]
}
  1. Ensure Bedrock is available in your region
  2. Check Service Control Policies (SCPs) aren’t blocking access

Error: “UnauthorizedOperation: You are not authorized to perform ec2:DescribeInstances”

Solution:

  1. Update task role policy
  2. Check for SCPs (Service Control Policies) blocking access
  3. Verify cross-account permissions if using multiple accounts

Teams Integration Issues

Bot Shows as Offline

Causes & Solutions:

  1. Azure Bot Channel Not Configured

    • Azure Portal → Bot → Channels
    • Ensure Teams channel is enabled
    • Status should be “Running”
  2. API Gateway Throttling

    • Check CloudWatch metrics for 429 errors
  3. Network Connectivity

    • Verify security groups allow HTTPS outbound
    • Check NAT Gateway is functioning (if used)

Messages Not Formatted Correctly

Issue: Bot responses show raw JSON or markdown

Solution:

  1. Update Teams app manifest version
  2. Ensure bot supports Adaptive Cards:
"supportsFiles": false,
"supportsCalling": false,
"supportsVideo": false

Bot Added but Can’t Use Commands

Issue: Bot visible but commands don’t work

Solution:

  1. Check bot is added to channel properly
  2. Verify @ mentions are working
  3. Test in personal chat first
  4. Check Teams app permissions

Google Chat Integration Issues

Bot Not Responding in Google Chat

Causes & Solutions:

  1. Check Google Chat is Enabled

    • CloudFormation parameter GoogleChatEnabled must be true
    • Verify ECS task has GOOGLE_CHAT_ENABLED=true environment variable
  2. Verify Webhook URL Configuration

    • Go to Google Cloud Console → Chat API Configuration
    • App URL must match CloudFormation output GoogleChatWebhookURL
    • Must end with /api/google-chat
  3. Check Service Account Credentials

    • Verify GoogleChatServiceAccountInfo parameter is correctly formatted
    • JSON must be on a single line
    • Project ID must match the service account’s project

Most Common Cause: Apps not published through Google Workspace Marketplace must explicitly list allowed users.

Solution:

  1. Check Visibility Settings (most likely issue)

    • Go to Google Cloud ConsoleAPIs & ServicesGoogle Chat APIConfiguration
    • Scroll to Visibility section
    • Select “Make this Chat app available to specific people and groups”
    • Click Add people or groups
    • Add your email address explicitly (even if you’re the project owner)
    • Click Save
    • Wait 2-5 minutes for propagation
  2. Verify Configuration is Saved

    • Check for a green “Saved” confirmation
    • Refresh the page and verify settings persisted
  3. Try Different Discovery Methods

    • In Google Chat: + New chatFind apps → search for app name
    • In a Space: Click space name → Apps & integrationsAdd apps

Google Chat Authentication Errors

Symptoms: 401/403 errors in CloudWatch logs for /api/google-chat

Solution:

  1. Check Lambda Authorizer Logs

    • CloudWatch → Log Groups → /aws/lambda/ohlala-gc-authorizer-{StackName}
    • Look for specific error messages
  2. Verify Audience URL

    • In Chat API Configuration, Authentication Audience must be set to “App URL”
    • The audience in JWT tokens will match your webhook endpoint
  3. Validate Service Account JSON

    • Ensure JSON is properly escaped when passed to CloudFormation
    • Use jq -c to convert to single line

Cards Not Rendering Properly in Google Chat

Issue: Charts or cards don’t display correctly

Solution:

  1. Check QuickChart Container

    • ECS task should have quickchart container running
    • Check container logs for errors
  2. Verify S3 Bucket

    • Chart images are stored in S3 bucket smartops-charts-{StackName}
    • Check bucket permissions and lifecycle rules
  3. Check CloudWatch Logs

    • Search for “chart” or “card” errors in main-bot logs

Google Chat vs Teams Behavior Differences

Known Differences:

  • Message Updates: Google Chat has limited message update support
  • Card Format: Uses Google Card JSON instead of Adaptive Cards
  • Charts: Uploaded to S3 as images (Teams renders inline)
  • @mentions: Required in spaces, optional in direct messages

Getting Support

Before Contacting Support

  1. Collect diagnostic information:

    • Stack name and region
    • Error messages (exact text)
    • CloudWatch logs (last 100 lines)
    • Time of occurrence
  2. Try quick fixes:

    • Restart ECS service
    • Clear Teams cache
    • Re-authenticate bot

Contact Support

Email: support@ohlala.cloud

Include:

  • AWS Account ID
  • Stack Name
  • Error Description
  • Steps to Reproduce
  • Diagnostic Logs

Response Time: 1 business day

Additional Resources