Troubleshooting Guide
Quick solutions for common issues with Ohlala SmartOps. Use the search function (Ctrl+F) to find specific error messages.
Quick Diagnostics
Run this checklist to identify common issues:
Check Service Health
curl https://your-api-gateway-url/prod-{StackName}/health
Expected:
{"status": "healthy"}
Verify CloudFormation Stack
- AWS Console → CloudFormation
- Stack status: CREATE_COMPLETE or UPDATE_COMPLETE
Check ECS Service
- AWS Console → ECS → Clusters
- Service should have 1 running task
Review Recent Logs
- AWS Console → CloudWatch → Log Groups
- Check /aws/ecs/ohlala-smartops-{StackName}
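The first two checks in this list can be scripted. The following is a minimal sketch, not something shipped with Ohlala SmartOps: the helper names `is_healthy` and `stack_status_ok` are hypothetical, and the health check assumes the response body has the JSON shape shown under "Expected".

```shell
#!/bin/sh
# Hypothetical helpers for the checklist above; not part of Ohlala SmartOps.

# is_healthy BODY
# Succeeds when the health endpoint's JSON body contains "status": "healthy".
# Uses a grep pattern instead of a JSON parser, so it only suits this small payload.
is_healthy() {
  printf '%s' "$1" | grep -Eq '"status"[[:space:]]*:[[:space:]]*"healthy"'
}

# stack_status_ok STATUS
# Succeeds only for the two stack states the checklist treats as good.
stack_status_ok() {
  case "$1" in
    CREATE_COMPLETE|UPDATE_COMPLETE) return 0 ;;
    *) return 1 ;;
  esac
}
```

To feed live values into the helpers, pass them the output of the commands from the checklist, for example `is_healthy "$(curl -s https://your-api-gateway-url/prod-{StackName}/health)"` and `stack_status_ok "$(aws cloudformation describe-stacks --stack-name your-stack-name --query 'Stacks[0].StackStatus' --output text)"`.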
CloudWatch Logs Troubleshooting
Quick Log Analysis
Most issues can be diagnosed by checking CloudWatch logs for ERROR messages in the ECS task logs.
1. Access ECS Task Logs
Via AWS Console:
- Go to CloudWatch → Log Groups
- Find /aws/ecs/ohlala-smartops-{your-stack-name}
- Click on the most recent log stream
- Search for “ERROR” using Ctrl+F
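The same search can be done from a terminal. This is a hedged sketch that assumes AWS CLI v2 (which provides `aws logs tail`); the helper names `error_lines` and `recent_errors` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; assume AWS CLI v2, which provides `aws logs tail`.

# error_lines
# Reads log text on stdin and prints only the lines that contain ERROR.
# Always exits 0, so "no errors found" is not treated as a failure.
error_lines() {
  grep 'ERROR' || true
}

# recent_errors LOG_GROUP [SINCE]
# Prints ERROR lines from the last SINCE (default 1h) of the given log group.
recent_errors() {
  aws logs tail "$1" --since "${2:-1h}" | error_lines
}
```

For example, `recent_errors "/aws/ecs/ohlala-smartops-{your-stack-name}" 30m` would show errors from the last 30 minutes, with the placeholder replaced by your stack name.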
Bot Not Responding
Symptoms
- No response when messaging the bot in Teams
- Bot appears offline
- Commands time out without a response
Solution 1: Verify Webhook Configuration
Check Webhook URL
# Get from CloudFormation outputs
aws cloudformation describe-stacks \
--stack-name your-stack-name \
--query "Stacks[0].Outputs[?OutputKey=='TeamsWebhookURL'].OutputValue" \
--output text
Update in Azure Bot
- Azure Portal → Your Bot → Configuration
- Messaging endpoint must match CloudFormation output
- Must end with /api/messages
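A quick sanity check on the endpoint string can catch copy and paste mistakes before you save it in Azure. This is a minimal sketch with a hypothetical helper name, `endpoint_ok`; the HTTPS requirement is an added check on top of the suffix rule stated above.

```shell
#!/bin/sh
# Hypothetical helper; not part of Ohlala SmartOps.

# endpoint_ok URL
# Succeeds when URL starts with https:// and ends with /api/messages.
# A trailing slash or any extra path segment after /api/messages is rejected.
endpoint_ok() {
  case "$1" in
    https://*/api/messages) return 0 ;;
    *) return 1 ;;
  esac
}
```

One way to use it is to pass the value printed by the `describe-stacks` command above, then compare that same value character for character with the messaging endpoint shown in the Azure Portal.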
Solution 2: Check Authentication
Verify Secrets in AWS
aws secretsmanager get-secret-value \
--secret-id ohlala-smartops-teams-{StackName} \
--query SecretString \
--output json
Validate Credentials Match Azure
- App ID must match Azure Bot’s App ID
- Password must be valid and not expired
- Tenant ID must match your Azure AD tenant
Check Lambda Authorizer Logs
- CloudWatch → Log Groups → /aws/lambda/ohlala-authorizer-{StackName}
- Look for “Authorization failed” messages
Solution 3: Teams App Issues
Re-upload Teams Package
- Remove existing app from Teams
- Download fresh package
- Update manifest.json with correct bot ID
- Re-upload to Teams
- You may need to manually bump the version in manifest.json to force Teams to accept the update
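Bumping the version is a small string operation. The sketch below uses a hypothetical helper, `bump_patch`, and assumes the manifest version is a plain major.minor.patch string with no leading zeros; editing manifest.json itself is left as a manual step.

```shell
#!/bin/sh
# Hypothetical helper; assumes a three-part version such as 1.0.0.

# bump_patch VERSION
# Prints VERSION with its last numeric component incremented by one.
bump_patch() {
  prefix=${1%.*}   # everything before the last dot, e.g. "1.0"
  patch=${1##*.}   # everything after the last dot, e.g. "0"
  printf '%s.%s\n' "$prefix" "$((patch + 1))"
}
```

For instance, if manifest.json currently contains `"version": "1.0.0"`, `bump_patch 1.0.0` prints the value to write back before re-zipping and re-uploading the package.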
Check Teams Policies
- Teams Admin Center → Teams apps → Permission policies
- Ensure custom apps are allowed
- Check that the user has permission to use bots
Deployment Failures
CloudFormation Stack Failed
Error: “CREATE_FAILED - Resource handler returned message: ‘The specified subnet does not exist’”
Solution:
# For Existing VPC mode, verify subnet IDs
aws ec2 describe-subnets \
--subnet-ids subnet-xxxxx \
--region your-region
Error: “CREATE_FAILED - IAM role already exists”
Solution:
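# Delete the existing roles, or redeploy under a different stack name.
# delete-role fails while a role still has attached or inline policies,
# so detach or delete those first.
aws iam delete-role --role-name ec2-management-bot-execution-role
aws iam delete-role --role-name ec2-management-bot-task-role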
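`aws iam delete-role` is rejected while the role still has managed or inline policies. The following is a hedged sketch of the cleanup order, using a hypothetical helper named `delete_role_clean`. It is destructive, it does not handle instance profiles, and it should be reviewed before being run against a real account.

```shell
#!/bin/sh
# Hypothetical helper, not part of Ohlala SmartOps. Destructive: review before running.

# delete_role_clean ROLE
# Detaches managed policies, deletes inline policies, then deletes the role.
delete_role_clean() {
  role=$1

  # Managed policies: text output is a whitespace-separated list of ARNs.
  for arn in $(aws iam list-attached-role-policies --role-name "$role" \
      --query 'AttachedPolicies[].PolicyArn' --output text); do
    [ "$arn" = "None" ] && continue
    aws iam detach-role-policy --role-name "$role" --policy-arn "$arn"
  done

  # Inline policies are addressed by name rather than by ARN.
  for name in $(aws iam list-role-policies --role-name "$role" \
      --query 'PolicyNames[]' --output text); do
    [ "$name" = "None" ] && continue
    aws iam delete-role-policy --role-name "$role" --policy-name "$name"
  done

  aws iam delete-role --role-name "$role"
}
```

A call such as `delete_role_clean ec2-management-bot-task-role` would then replace the bare `delete-role` command. Using a different stack name avoids the deletion entirely and is the safer option when the existing roles might still be in use.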
ECS Task Won’t Start
Error: “ResourceInitializationError: unable to pull secrets or registry auth”
Solution:
- Check ECR permissions
- Verify marketplace subscription is active
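- Check that the execution role has the required permissions. The managed policy below covers ECR image pulls and CloudWatch Logs; access to the secret (secretsmanager:GetSecretValue) must be granted to the role separately:
aws iam attach-role-policy \
--role-name ec2-management-bot-execution-role \
--policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy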
Bedrock Model Issues
Error: “ValidationException: The provided model identifier is invalid”
This is the most common deployment issue.
Cause: Amazon Bedrock Claude Sonnet 4 model access is not enabled or not available in your deployment region.
Solution:
Navigate to Amazon Bedrock Console
- Go to AWS Console → Amazon Bedrock
- Ensure you’re in the correct region (same as deployment)
Enable Claude Sonnet 4 Model Access
- Left sidebar → “Model access”
- Click “Edit” or “Manage model access”
- Find the Anthropic section
- Enable Claude Sonnet 4 (anthropic.claude-sonnet-4-20250514-v1:0)
Submit Request
- Click “Next” → “Submit”
- Most requests are approved immediately
- Wait for status to show “Available”
Verify Access
# Test via AWS CLI (use your deployment region)
aws bedrock list-foundation-models \
--region us-east-1 \
--query "modelSummaries[?contains(modelId, 'claude-sonnet-4')]"
Test in Bedrock Playground
- Bedrock Console → Playgrounds → Chat
- Select Claude Sonnet 4
- Send test message: “Hello”
- Should receive response
Restart Application (if already deployed)
# Force ECS service restart
aws ecs update-service \
--cluster your-cluster \
--service your-service \
--force-new-deployment
Regional Support with Cross-Region Inference Profiles:
Cross-Region Support
Ohlala SmartOps now supports all AWS regions through intelligent inference profile selection, including regions without native Claude Sonnet 4 support such as eu-west-3.
Primary Regions (Native Claude Sonnet 4 Support):
- us-east-1 (Recommended)
- us-west-2
- eu-west-1
- eu-central-1
- ap-northeast-1
- ap-southeast-2
Supported via Inference Profiles:
- eu-west-3 (via global/EU inference profiles)
- eu-west-2 (via global/EU inference profiles)
- eu-north-1 (via global/EU inference profiles)
- ap-southeast-1 (via global/APAC inference profiles)
- ap-northeast-2 (via global/APAC inference profiles)
- ap-south-1 (via global/APAC inference profiles)
- ca-central-1 (via global inference profiles)
- sa-east-1 (via global inference profiles)
How Inference Profiles Work:
- Global Profile: global.anthropic.claude-sonnet-4-20250514-v1:0 (works from any region)
- Regional Profiles: eu.anthropic.claude-sonnet-4-20250514-v1:0 (optimized for EU regions)
- Automatic Fallback: the application automatically tries the best profile for your region
For eu-west-3 Specifically:
- The application will automatically use global or EU inference profiles
- No additional configuration required
- Same Claude Sonnet 4 quality and performance
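To make the fallback idea concrete, here is a minimal sketch of how a candidate list could be built from a region name. It is illustrative only: the helper name `profile_candidates` is hypothetical, the "regional first, then global" ordering is an assumption rather than the application's documented logic, and only the two profile IDs listed in this guide are used (the APAC profile ID is not given here, so APAC regions fall through to the global profile).

```shell
#!/bin/sh
# Hypothetical helper; the ordering is an assumption, not the application's actual logic.

# profile_candidates REGION
# Prints inference profile IDs to try for REGION, one per line, in order.
profile_candidates() {
  model='anthropic.claude-sonnet-4-20250514-v1:0'
  case "$1" in
    # EU regions get the EU regional profile first.
    eu-*) printf '%s\n' "eu.$model" ;;
  esac
  # The global profile is always the last candidate.
  printf '%s\n' "global.$model"
}
```

With this sketch, `profile_candidates eu-west-3` lists the EU profile followed by the global one, while `profile_candidates sa-east-1` lists only the global profile.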
Error: “AccessDeniedException: You do not have access to the requested model”
Cause: Model access was requested but is not yet approved, or the wrong model ID is being used.
Solution:
Check approval status:
- Bedrock Console → Model access
- Status should be “Available”, not “Pending”
Wait for approval:
- Standard models: Usually immediate
- Advanced models: Up to 48 hours
- Check email for approval notification
Permission Issues
Error: “AccessDeniedException: User is not authorized to perform bedrock:InvokeModel”
Solution:
- Add Bedrock permissions to ECS task role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*:*:inference-profile/*claude*"
      ]
    }
  ]
}
- Ensure Bedrock is available in your region
- Check Service Control Policies (SCPs) aren’t blocking access
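When an AccessDeniedException names a specific resource, it helps to check whether that ARN falls under the two Resource patterns in the policy above. The sketch below approximates IAM wildcard matching with shell glob patterns, which behave similarly for `*`; the helper name `arn_covered` is hypothetical and the check is a local approximation, not an IAM policy evaluation.

```shell
#!/bin/sh
# Hypothetical helper; approximates IAM "*" matching with shell globs.

# arn_covered ARN
# Succeeds when ARN matches one of the two Resource patterns from the policy above.
arn_covered() {
  case "$1" in
    arn:aws:bedrock:*::foundation-model/anthropic.claude-*) return 0 ;;
    arn:aws:bedrock:*:*:inference-profile/*claude*) return 0 ;;
    *) return 1 ;;
  esac
}
```

If the ARN from the error message is not covered, the policy needs an additional Resource entry. If it is covered, the denial more likely comes from an SCP or from the policy not being attached to the task role.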
Error: “UnauthorizedOperation: You are not authorized to perform ec2:DescribeInstances”
Solution:
- Update the task role policy to allow ec2:DescribeInstances
- Check for SCPs (Service Control Policies) blocking access
- Verify cross-account permissions if using multiple accounts
Teams Integration Issues
Bot Shows as Offline
Causes & Solutions:
Azure Bot Channel Not Configured
- Azure Portal → Bot → Channels
- Ensure Teams channel is enabled
- Status should be “Running”
API Gateway Throttling
- Check CloudWatch metrics for 429 errors
Network Connectivity
- Verify security groups allow HTTPS outbound
- Check NAT Gateway is functioning (if used)
Messages Not Formatted Correctly
Issue: Bot responses show raw JSON or markdown
Solution:
- Update Teams app manifest version
- Ensure bot supports Adaptive Cards:
"supportsFiles": false,
"supportsCalling": false,
"supportsVideo": false
Bot Added but Can’t Use Commands
Issue: Bot visible but commands don’t work
Solution:
- Check bot is added to channel properly
- Verify @ mentions are working
- Test in personal chat first
- Check Teams app permissions
Getting Support
Before Contacting Support
Collect diagnostic information:
- Stack name and region
- Error messages (exact text)
- CloudWatch logs (last 100 lines)
- Time of occurrence
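The items above can be gathered into one text report. This is a minimal sketch with a hypothetical helper, `diag_report`; it assumes the ECS task logs have already been saved to a local file, and the report layout is illustrative rather than a format that support requires.

```shell
#!/bin/sh
# Hypothetical helper; assumes the logs were already saved to a local file.

# diag_report STACK REGION LOGFILE
# Prints the stack name, region, a UTC timestamp, and the last 100 lines of LOGFILE.
diag_report() {
  printf 'Stack name: %s\n' "$1"
  printf 'Region: %s\n' "$2"
  printf 'Collected at (UTC): %s\n' "$(date -u '+%Y-%m-%dT%H:%M:%SZ')"
  printf '%s\n' '--- last 100 log lines ---'
  tail -n 100 "$3"
}
```

For example, `diag_report your-stack-name your-region ecs.log > report.txt` produces a file that can be attached to a support email. Add the exact error text and the time the problem occurred by hand, since the timestamp in the report is the collection time.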
Try quick fixes:
- Restart ECS service
- Clear Teams cache
- Re-authenticate bot
Contact Support
Email: support@ohlala.cloud
Include:
- AWS Account ID
- Stack Name
- Error Description
- Steps to Reproduce
- Diagnostic Logs
Response Time: 1 business day