Ohlala SmartOps Documentation

Complete documentation for Ohlala SmartOps - AI-powered EC2 management in Microsoft Teams

Ohlala SmartOps is an AI-powered infrastructure management solution that brings AWS EC2 operations directly into Microsoft Teams. Using Amazon Bedrock’s Claude AI, it understands natural language commands and provides intelligent responses with visual dashboards.

Quick Navigation

Section | Description
Getting Started | Deploy SmartOps in 15-20 minutes
Architecture | System design and performance
Deployment | CloudFormation configuration
Features | Complete feature overview
Bot Commands | Command reference and examples
Troubleshooting | Common issues and solutions

Core Capabilities

  • Conversational AI Interface - Manage infrastructure using plain English
  • Intelligent Analysis - On-demand health checks and anomaly detection
  • Cost Optimization - Automated rightsizing recommendations
  • Remote Troubleshooting - SSM integration for command execution
  • Visual Dashboards - On-demand metrics and reports delivered in Teams
  • Enterprise Security - Runs in your AWS account with IAM controls

1 - Getting Started with Ohlala SmartOps

Deploy Ohlala SmartOps and begin managing EC2 instances through Microsoft Teams or Google Chat

What You’ll Accomplish

By completing this guide, you’ll be able to:

  • Deploy SmartOps in your AWS account
  • Connect it to Microsoft Teams, Google Chat, or both
  • Execute your first EC2 management commands
  • View health reports and cost optimization recommendations

Deployment Steps

The deployment process is organized into clear, manageable steps:

Estimated Time: 15-20 minutes total

Difficulty: Intermediate

Prerequisites: AWS account with admin access, plus either an Azure account (for Teams) or a Google Workspace account (for Google Chat)

1. Check Prerequisites

Verify you have everything needed before starting

  • AWS account requirements
  • Microsoft Teams or Google Workspace setup
  • Required permissions

2. Enable Amazon Bedrock

Configure AI model access in your AWS account

  • Enable Claude Sonnet 4.5 model
  • Verify regional availability
  • Understand inference profiles

3. Subscribe via AWS Marketplace

Get Ohlala SmartOps from AWS Marketplace

  • Subscribe to the product
  • Download CloudFormation template
  • Understand pricing

4. Register Azure Bot (Teams users)

Create and configure your Teams bot

  • Create Azure Bot resource
  • Generate authentication credentials
  • Configure bot settings

4b. Google Cloud Setup (Google Chat users)

Create Google Cloud Project and credentials

  • Create GCP project
  • Enable Google Chat API
  • Generate service account credentials

5. Deploy CloudFormation Stack

Launch the infrastructure in AWS

  • Configure stack parameters (including chat platform selection)
  • Deploy resources
  • Monitor deployment progress

6a. Connect to Teams (Teams users)

Integrate the bot with Microsoft Teams

  • Configure webhook URL
  • Install Teams app
  • Test the connection

6b. Connect to Google Chat (Google Chat users)

Configure the Chat app and install the bot

  • Configure Chat app with webhook URL
  • Set visibility and permissions
  • Install the app

7. Verify & Test

Confirm everything is working

  • Run test commands
  • Check health reports
  • Troubleshoot common issues

Step | Title | Description
1 | Prerequisites | Check requirements and prepare your environment
2 | Bedrock Setup | Enable AI models in your AWS account
3 | AWS Marketplace | Subscribe and download CloudFormation template
4 | Azure Bot | Configure Microsoft Teams bot (Teams users only)
4b | Google Cloud | Create GCP credentials (Google Chat users only)
5 | CloudFormation | Deploy infrastructure in AWS
6a | Teams Integration | Connect bot to Microsoft Teams
6b | Google Chat Integration | Connect bot to Google Chat
7 | Verify & Test | Confirm deployment and run first commands

Pro Tips

  • Start Simple: Begin with monitoring features before enabling modifications
  • Budget Awareness: SmartOps includes intelligent cost tracking with $5 milestone warnings
  • Clear Error Messages: The bot provides actionable guidance when issues occur
  • Team Collaboration: Share the assistant with your team for maximum productivity

1.1 - Prerequisites

Check all requirements before deploying Ohlala SmartOps

Choose Your Chat Platform

Ohlala SmartOps supports Microsoft Teams, Google Chat, or both. Choose based on your organization’s needs:

Platform | Best For | Requirements
Microsoft Teams | Organizations using Microsoft 365 | Azure account, Teams workspace
Google Chat | Organizations using Google Workspace | Google Cloud project, Workspace account
Both | Organizations using both platforms | All requirements from both

AWS Requirements

AWS Account

  • Administrative access to create IAM roles, ECS clusters, and other resources
  • AWS Marketplace subscription capability
  • Billing enabled for AWS services usage

Required AWS Services

Ensure these services are available in your chosen region:

  • Amazon ECS Fargate - Container orchestration
  • Amazon Bedrock - AI model access
  • API Gateway - Teams and/or Google Chat webhook endpoint
  • CloudFormation - Infrastructure deployment
  • Systems Manager (SSM) - Instance management

Existing Infrastructure

  • At least 1 EC2 instance to manage
  • SSM Agent installed on instances

Microsoft Teams Requirements

Skip this section if you plan to use Google Chat only.

Azure Account

  • Azure subscription (Free tier works)
  • Ability to create resources in Azure Portal
  • Azure AD tenant for authentication

Teams Workspace

  • Microsoft Teams installed and configured
  • Admin permissions to install custom apps
  • Teams channel where you want to add the bot

Google Workspace Requirements

Skip this section if you plan to use Microsoft Teams only.

Google Cloud Platform

  • Google Cloud account with ability to create projects
  • Billing enabled (for API usage tracking, minimal cost)
  • Permissions to enable APIs and create service accounts

Google Workspace

  • Google Workspace account (Business, Enterprise, or Education)
  • Admin permissions to create and manage Chat apps, or
  • Developer permissions to create internal Chat apps
  • Google Chat enabled in your Workspace

Knowledge Requirements

  • Basic understanding of AWS services
  • Familiarity with CloudFormation
  • Experience with your chosen chat platform administration (Teams or Google Chat)
  • Understanding of bot concepts

Pre-Deployment Checklist

Use this checklist to verify readiness:

AWS (Required for all)

  • AWS account with admin access
  • AWS region selected that supports all required services
  • At least one EC2 instance running
  • SSM Agent installed on instances

Microsoft Teams (if using Teams)

  • Azure account created
  • Microsoft Teams workspace available
  • Teams admin permissions confirmed

Google Chat (if using Google Chat)

  • Google Cloud account with billing enabled
  • Google Workspace account available
  • Permissions to create Chat apps

Regional Availability

For optimal performance, we recommend:

  • US East (N. Virginia) - us-east-1
  • US West (Oregon) - us-west-2
  • Europe (Ireland) - eu-west-1
  • Europe (Frankfurt) - eu-central-1

All Supported Regions

The solution works in any region with ECS Fargate support. Bedrock access is automatically handled through cross-region inference profiles.

Next Step

Once you’ve verified all prerequisites:

Continue to Bedrock Setup →

1.2 - Enable Amazon Bedrock

Configure Amazon Bedrock Claude model access for AI capabilities

Why Bedrock is Required

Ohlala SmartOps uses Amazon Bedrock with Claude Sonnet 4.5 to provide:

  • Natural language understanding of your commands
  • Intelligent analysis of infrastructure issues
  • Smart recommendations for optimization
  • Context-aware responses based on your environment

Step-by-Step Setup

1. Open Amazon Bedrock Console

Navigate to the Amazon Bedrock console in your deployment region:

Open Amazon Bedrock Console

2. Navigate to Model Access

In the left sidebar, click on “Model access”

3. Enable Claude Sonnet 4.5

  1. Click “Modify model access”
  2. Find and enable:
    • Claude Sonnet 4.5 (anthropic.claude-sonnet-4-5-*)
  3. Click “Submit” to request access

4. Wait for Approval

  • Standard models are usually approved immediately
  • Wait for status to show “Access granted”
  • Refresh the page if needed

Amazon Bedrock Model access page showing enabled Claude Sonnet 4.5 with “Access granted” status

Regional Considerations

How Regional Access Works

Ohlala SmartOps automatically handles regional model access:

  1. Detects your deployment region
  2. Uses the optimal inference profile for your location
  3. No additional configuration needed
  4. Same features available in every region

Primary Regions (Native Support)

Best performance in these regions:

  • US East (N. Virginia) - us-east-1
  • US West (Oregon) - us-west-2
  • Europe (Frankfurt) - eu-central-1
  • Europe (Ireland) - eu-west-1
  • Asia Pacific (Tokyo) - ap-northeast-1
  • Asia Pacific (Sydney) - ap-southeast-2

All Other Regions

Supported via cross-region inference profiles:

  • Europe (Paris) - eu-west-3
  • Europe (London) - eu-west-2
  • Asia Pacific (Singapore) - ap-southeast-1
  • Asia Pacific (Mumbai) - ap-south-1
  • Canada (Central) - ca-central-1
  • South America (São Paulo) - sa-east-1
  • And more…

Verify Access

Check Model Status

  1. Return to the Model access page
  2. Verify Claude Sonnet 4.5 shows “Access granted”
  3. Note the model ID pattern: anthropic.claude-sonnet-4-5-*
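
As a rough illustration of how the model ID pattern above can be used, the following sketch checks a model ID string against it. The function name is hypothetical and the two example IDs are illustrative only; check the Bedrock console for the exact IDs available in your account.

```python
from fnmatch import fnmatchcase

# Pattern quoted in this guide for the supported model family.
MODEL_ID_PATTERN = "anthropic.claude-sonnet-4-5-*"


def is_supported_model(model_id: str) -> bool:
    """Return True when a model ID matches the documented pattern."""
    return fnmatchcase(model_id, MODEL_ID_PATTERN)


# Illustrative IDs only (assumptions, not taken from this guide).
print(is_supported_model("anthropic.claude-sonnet-4-5-20250929-v1:0"))  # True
print(is_supported_model("anthropic.claude-3-haiku-20240307-v1:0"))     # False
```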

Troubleshooting

Model Not Available

  • Issue: Claude Sonnet 4.5 not listed
  • Solution: Check you’re in a supported region

Access Denied After Deployment

  • Issue: Bot shows “Model access required” error
  • Solution:
    1. Enable model access as shown above
    2. Wait for “Access granted” status
    3. No need to redeploy - the bot will work automatically

Request Pending Too Long

  • Issue: Status stuck on “Pending”
  • Solution:
    1. Cancel and resubmit the request
    2. Contact AWS Support if issues persist

Good to Know

Model Costs

  • Input: $3.00 per million tokens
  • Output: $15.00 per million tokens
  • Average command uses ~500-2000 tokens
  • Built-in cost tracking alerts you at $5 milestones
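
To put these prices in context, here is a small sketch that estimates the cost of one command from the rates listed above. The 1,500 input / 500 output token split is an assumption chosen for illustration; actual usage depends on the command and the size of the response.

```python
INPUT_USD_PER_MILLION = 3.00    # input price listed under "Model Costs"
OUTPUT_USD_PER_MILLION = 15.00  # output price listed under "Model Costs"


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the Bedrock cost in USD for one request."""
    total = (input_tokens * INPUT_USD_PER_MILLION
             + output_tokens * OUTPUT_USD_PER_MILLION)
    return total / 1_000_000


# Assumed split: 1,500 input + 500 output tokens (2,000 tokens in total).
per_command = estimate_cost_usd(1500, 500)
print(f"${per_command:.4f} per command")                           # $0.0120 per command
print(f"about {int(5 / per_command)} commands per $5 milestone")   # about 416 commands per $5 milestone
```

Under that assumed split, a $5 milestone corresponds to a few hundred commands.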

Alternative Models

Currently, only Claude Sonnet 4.5 is supported. Support for additional models may be added in future releases.

Next Step

Once Bedrock access is enabled:

Continue to AWS Marketplace →

1.3 - AWS Marketplace Subscription

Subscribe to Ohlala SmartOps through AWS Marketplace

What You’ll Get

The AWS Marketplace subscription provides:

  • Official CloudFormation template
  • Pre-built container images
  • Automatic updates
  • AWS support integration
  • Simplified billing through AWS

Subscription Steps

1. Navigate to AWS Marketplace

Open the Ohlala SmartOps product page:

AWS Marketplace - Ohlala SmartOps

2. View Purchase Options

Click “View purchase options” to start the subscription process.

AWS Marketplace product page with “View purchase options” button highlighted

3. Subscribe to the Product

Scroll down and click “Subscribe” to accept the terms.

AWS Marketplace subscription confirmation page with “Subscribe” button highlighted

4. Launch Your Software

Once subscribed, click “Launch your software” to proceed with deployment.

AWS Marketplace subscription confirmation page with “Launch your software” button highlighted

5. Download CloudFormation Template

  1. Select “Amazon ECS” as the launch method
  2. Click the “cloudformation template” link to download
  3. Save the template file locally - you’ll need it in the next steps
  4. Alternatively, download the CloudFormation template directly: Download Template

AWS Marketplace launch options page with “cloudformation template” link highlighted

Verify Subscription

Check Subscription Status

  1. Go to AWS Marketplace → Manage subscriptions
  2. Find “Ohlala SmartOps” in your subscriptions
  3. Verify status shows “Active”

Common Questions

Q: Can I cancel anytime?

A: Yes, you can cancel the subscription anytime through AWS Marketplace. You only pay for resources used.

Q: Can I deploy multiple instances?

A: Yes, you can deploy multiple stacks using the same subscription. Contact us for volume licensing.

Q: How do updates work?

A: Updates are provided through new versions in Marketplace. You can update at your convenience.

Next Step

With your subscription active and template downloaded:

Continue to Azure Bot Setup →

1.4 - Azure Bot Registration

Create and configure your Microsoft Teams bot in Azure

What You’ll Create

  • Azure Bot resource for Teams integration
  • Authentication credentials (App ID, Password, Tenant ID)
  • Secure communication channel with Teams

Step-by-Step Setup

1. Access Azure Portal

Navigate to Azure Portal and sign in:

https://portal.azure.com

2. Create Resource Group (Optional)

It’s recommended to create a dedicated resource group:

  1. Search for “Resource groups” in the search bar
  2. Click “Create”
  3. Configure:
    • Subscription: Your Azure subscription
    • Resource group: ohlala-smartops-rg
    • Region: Choose any region (e.g., North Europe)
  4. Click “Review + create” then “Create”

Azure Portal Resource Groups page with “Create” button highlighted

3. Create Azure Bot

Search for Azure Bot

In the Azure Portal search bar, type “Azure Bot” and select it from the marketplace.

Azure Portal search bar with “Azure Bot” typed in

Configure Bot Settings

Fill in the bot configuration:

Bot handle: OhlalaSmartOps
Subscription: Your Azure subscription
Resource group: ohlala-smartops-rg (or your chosen group)
Location: North Europe (or your preferred region)
Pricing tier: F0 (Free)
Type: Single Tenant (default)
Microsoft App ID: Create new

Azure Bot creation page with required fields filled in

Click “Review + create” then “Create”. Deployment takes about 1-2 minutes.

4. Get Authentication Credentials

After deployment completes, go to your bot resource.

  1. Go to Settings → Configuration
  2. You’ll see the Microsoft App ID - copy and save this

Azure Bot Configuration page with “Create new” button highlighted under “Microsoft App ID”

Create App Password

  1. Click “Manage Password” next to the App ID
  2. In the new window, click “New client secret”

Azure Bot Client Secret creation page with “New client secret” button highlighted

  3. Configure the secret:
    • Description: Ohlala SmartOps Bot Secret
    • Expires: Choose duration (recommend 24 months)

Azure Bot Client Secret creation page with description and expiration fields filled in

  4. Click “Add”
  5. IMPORTANT: Copy the secret value immediately. Azure does not display it again after you leave the page.

Azure Bot Client Secret creation page with “Copy to clipboard” button highlighted

Get Tenant ID

The Tenant ID is shown in the Azure Portal:

  1. Click on your account menu (top right)
  2. Select “Switch directory”
  3. Your Tenant ID is displayed there

Alternatively:

  1. Go to Azure Active Directory (now named Microsoft Entra ID)
  2. The Tenant ID is on the overview page

Save Your Credentials

You now have three critical values needed for deployment:

Credential | Where to Find | Example
Microsoft App ID | Bot Configuration page | 12345678-1234-1234-1234-123456789012
Microsoft App Password | Client secrets (copied) | AbC123... (long string)
Microsoft App Tenant ID | Azure AD or account menu | 87654321-4321-4321-4321-210987654321
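
Because the App ID and Tenant ID are both GUIDs while the App Password is a free-form string, a quick format check can catch values pasted into the wrong field. The following is a minimal sketch; the function name is hypothetical and the check only verifies the 8-4-4-4-12 hexadecimal shape shown in the examples above.

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True when value is a canonical 8-4-4-4-12 hexadecimal GUID."""
    try:
        # uuid.UUID also accepts braces or missing hyphens, so compare
        # against the canonical form to require the exact layout.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


print(is_guid("12345678-1234-1234-1234-123456789012"))  # True
print(is_guid("AbC123"))                                # False
```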

5. Configure Teams Channel

1. Open Channels Page

In your Azure Bot resource, navigate to Channels in the left sidebar.

Azure Portal Channels page with “Microsoft Teams” icon highlighted

2. Add Microsoft Teams Channel

  1. Click on the Microsoft Teams icon
  2. Accept the terms and click “Agree”

Azure Bot Microsoft Teams channel configuration page with “Agree” button highlighted

  3. Click Apply

Azure Bot Microsoft Teams channel configuration page with “Apply” button highlighted

Common Issues

Issue: Free Tier Not Available

Solution: F0 tier is limited to one per subscription. Use S1 (Standard) tier instead (~$0.50/month).

Issue: Can’t Create App Password

Solution: You need appropriate permissions in Azure AD. Contact your Azure administrator.

Issue: Lost App Password

Solution: You can create a new client secret:

  1. Go to Bot Configuration → Manage Password
  2. Create a new client secret
  3. Update your deployment with the new password

Next Step

With your Azure Bot configured and credentials saved:

Continue to CloudFormation Deployment →

1.5 - Google Cloud Setup

Create Google Cloud Project and Service Account for Google Chat integration

What You’ll Create

  • Google Cloud Project
  • Service Account with JSON credentials
  • Enabled Google Chat API

Prerequisites

Before starting, ensure you have:

  • Google Cloud Platform account with billing enabled
  • Google Workspace account (Business, Enterprise, or Education)

Step-by-Step Setup

1. Create Google Cloud Project

Set Up Project

  1. Go to Google Cloud Console
  2. Click Select a project then New Project

Google Cloud Console Project dialog

  3. Enter project name: ohlala-smartops (or your preferred name)

  4. Click Create

Google Cloud Console New Project dialog

  5. Wait for project creation to complete

2. Enable Google Chat API

  1. In Google Cloud Console, go to APIs & Services then Library

  2. Search for Google Chat API

Google Cloud Console Search for Google Chat API

  3. Click Google Chat API

  4. Click Enable

3. Create Service Account

Create the Account

  1. Go to IAM & Admin then Service Accounts
  2. Click Create Service Account

Service Account creation page

  3. Enter details:
    • Name: ohlala-smartops-bot
    • ID: ohlala-smartops-bot (auto-generated)
    • Description: Service account for Ohlala SmartOps Chat app
  4. Click Create and Continue
  5. Skip the optional steps (roles, access)
  6. Click Done

Generate JSON Key

  1. Click on your new service account
  2. Go to Keys tab
  3. Click Add Key then Create new key
  4. Select JSON format
  5. Click Create
  6. Save the downloaded JSON file securely

Create JSON key dialog

4. Prepare Credentials for CloudFormation

You’ll need two values for CloudFormation deployment:

Credential | CloudFormation Parameter | Where to Find
Project ID | GoogleChatProjectId | Visible in Google Cloud Console (top bar or project settings)
Service Account JSON | GoogleChatServiceAccountInfo | Downloaded JSON file (converted to single line)

Convert JSON to Single Line

The service account JSON must be on a single line for CloudFormation. Use one of these methods:

Linux/Mac:

jq -c . service-account.json

Windows (PowerShell):

Get-Content -Raw service-account.json | ConvertFrom-Json | ConvertTo-Json -Compress
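
Where neither jq nor PowerShell is at hand, the same conversion can be sketched in Python with the standard library. The function name and the sample document are illustrative; a real key file contains more fields than the two shown here.

```python
import json


def compact_json(text: str) -> str:
    """Parse JSON text and re-serialize it on a single line."""
    return json.dumps(json.loads(text), separators=(",", ":"))


# Stand-in for the contents of service-account.json (illustrative values).
sample = '{\n  "type": "service_account",\n  "project_id": "my-project-123456"\n}'
print(compact_json(sample))
# {"type":"service_account","project_id":"my-project-123456"}
```

Parsing before re-serializing also confirms that the file is valid JSON, which a plain newline removal would not.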

Summary

At this point, you should have:

  • Google Cloud Project created
  • Project ID noted for CloudFormation
  • Google Chat API enabled
  • Service Account created
  • JSON key downloaded and converted to single line

Next Step

You’re ready to deploy the CloudFormation stack with your Google Chat credentials:

Continue to CloudFormation Deployment →

1.6 - Deploy CloudFormation Stack

Deploy the Ohlala SmartOps infrastructure in AWS

What Gets Deployed

The CloudFormation stack creates:

  • ECS Fargate cluster with container services
  • API Gateway for Teams and/or Google Chat webhooks
  • Network infrastructure (VPC, subnets, security groups)
  • IAM roles with appropriate permissions
  • Secrets Manager for credentials
  • CloudWatch logs for monitoring

Deployment Steps

1. Open CloudFormation Console

Navigate to CloudFormation in your target region:

https://console.aws.amazon.com/cloudformation/home

2. Create New Stack

Click “Create stack” and choose “With new resources (standard)”

CloudFormation Create Stack page with “Create stack” button highlighted

3. Upload Template

  1. Select “Choose an existing template”
  2. Select “Upload a template file”
  3. Click “Choose file” and select the template downloaded from AWS Marketplace
  4. Click “Next”

CloudFormation Create Stack page with “Choose an existing template” option and file upload highlighted

4. Configure Stack Parameters

Stack Name

Enter a unique stack name: OhlalaSmartOps (or your preference)

Required Parameters

Fill in the mandatory parameters:

Parameter | Description | Example/Value
DeploymentMode | VPC configuration | NewVPC (recommended)
ContainerImageTag | Version to deploy | v2.0.2 (default)
ChatPlatform | Chat integration to enable | Teams, GoogleChat, or Both

CloudFormation ChatPlatform parameter dropdown

Teams Parameters (required when ChatPlatform is Teams or Both)

Parameter | Description | Example/Value
MicrosoftAppId | From Azure Bot setup | Your App ID
MicrosoftAppPassword | From Azure Bot setup | Your App Password
MicrosoftAppTenantId | From Azure Bot setup | Your Tenant ID

Google Chat Parameters (required when ChatPlatform is GoogleChat or Both)

Parameter | Description | Example/Value
GoogleChatProjectId | Your GCP project ID | my-project-123456
GoogleChatServiceAccountInfo | Service account JSON key (single line) | {"type":"service_account",...}
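
The conditional requirements above can be checked before launching the stack. The sketch below uses the parameter names from the tables; the function itself is a hypothetical helper, not part of the product.

```python
TEAMS_PARAMS = ("MicrosoftAppId", "MicrosoftAppPassword", "MicrosoftAppTenantId")
GOOGLE_PARAMS = ("GoogleChatProjectId", "GoogleChatServiceAccountInfo")


def missing_parameters(params: dict) -> list:
    """List the platform-specific parameters that are still empty."""
    platform = params.get("ChatPlatform")
    if platform not in ("Teams", "GoogleChat", "Both"):
        raise ValueError(f"Unexpected ChatPlatform: {platform!r}")
    required = []
    if platform in ("Teams", "Both"):
        required += TEAMS_PARAMS
    if platform in ("GoogleChat", "Both"):
        required += GOOGLE_PARAMS
    return [name for name in required if not params.get(name)]


# Placeholder values: Teams credentials supplied, Google Chat ones missing.
print(missing_parameters({
    "ChatPlatform": "Both",
    "MicrosoftAppId": "app-id",
    "MicrosoftAppPassword": "secret",
    "MicrosoftAppTenantId": "tenant-id",
}))
# ['GoogleChatProjectId', 'GoogleChatServiceAccountInfo']
```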

CloudFormation Stack Parameters page with fields filled in

VPC Configuration (if NewVPC)

Keep defaults or customize:

  • VPCCIDR: 10.0.0.0/16
  • PublicSubnet1CIDR: 10.0.1.0/24
  • PublicSubnet2CIDR: 10.0.2.0/24
  • PrivateSubnet1CIDR: 10.0.10.0/24
  • PrivateSubnet2CIDR: 10.0.11.0/24
  • EnableNATGateway: true
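
If you customize these ranges, each subnet must sit inside the VPC range and must not overlap another subnet. A minimal sketch of that check with Python's ipaddress module follows; the function name is hypothetical and the values are the defaults listed above.

```python
import ipaddress
from itertools import combinations


def check_subnets(vpc_cidr: str, subnet_cidrs: dict) -> list:
    """Return a list of problems found in a VPC/subnet CIDR layout."""
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} {net} is outside {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems


defaults = {
    "PublicSubnet1CIDR": "10.0.1.0/24",
    "PublicSubnet2CIDR": "10.0.2.0/24",
    "PrivateSubnet1CIDR": "10.0.10.0/24",
    "PrivateSubnet2CIDR": "10.0.11.0/24",
}
print(check_subnets("10.0.0.0/16", defaults))  # []
```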

Security & Reports Configuration

Parameter | Description | Default
AllowedIngressCIDR | CIDR range allowed to access API Gateway | 0.0.0.0/0
EnableDailyReports | Enable daily health reports to chat platforms | true
DailyReportSchedule | Cron expression for report schedule | cron(0 8 * * ? *)

Schedule Examples:

  • cron(0 8 * * ? *) - Every day at 8am UTC
  • cron(0 9 ? * MON-FRI *) - Weekdays at 9am UTC
  • cron(0 14 * * ? *) - Every day at 2pm UTC
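
The examples above have six fields and use ?, which matches the Amazon EventBridge cron format (minutes, hours, day of month, month, day of week, year). Assuming that is the format the stack expects, the sketch below performs a basic shape check before you paste a custom schedule. It does not validate the value ranges of individual fields, and the function name is hypothetical.

```python
import re

FIELDS = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def parse_schedule(expression: str) -> dict:
    """Split a cron(...) expression into named fields, with basic checks."""
    match = re.fullmatch(r"cron\((.+)\)", expression.strip())
    if not match:
        raise ValueError("expected the form cron(...)")
    parts = match.group(1).split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected 6 fields, got {len(parts)}")
    fields = dict(zip(FIELDS, parts))
    # EventBridge does not allow day-of-month and day-of-week to both
    # carry a value or *; at least one of them has to be '?'.
    if "?" not in (fields["day_of_month"], fields["day_of_week"]):
        raise ValueError("use '?' for day-of-month or day-of-week")
    return fields


print(parse_schedule("cron(0 9 ? * MON-FRI *)")["day_of_week"])  # MON-FRI
```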

Click “Next”

5. Configure Stack Options

On the stack options page:

  • Tags: (Optional) Add tags for resource organization
  • Permissions: Leave default
  • Advanced options: Leave default

Click “Next”

6. Review and Create

  1. Review all settings
  2. Check the acknowledgment box:
    • I acknowledge that AWS CloudFormation might create IAM resources with custom names

CloudFormation Stack Review page with acknowledgment checkbox highlighted

  3. Click “Submit”

CloudFormation Stack Review page with “Submit” button highlighted

7. Monitor Deployment

The stack creation takes 5-10 minutes. Monitor progress:

  1. Select your stack in the CloudFormation console
  2. Check the Events tab for real-time updates
  3. Wait for status: CREATE_COMPLETE

Get Stack Outputs

Once deployment completes, get the important URLs:

  1. Select your stack
  2. Go to the Outputs tab
  3. Save these values:
Output | Description | Use
TeamsWebhookURL | API Gateway endpoint for Teams | Configure in Azure Bot
GoogleChatWebhookURL | API Gateway endpoint for Google Chat | Configure in Google Chat API
APIGatewayEndpoint | Base API URL | Reference only
ECSCluster | Cluster name | For monitoring
ECSService | Service name | For monitoring
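
If you script this step, the outputs arrive as a list of key/value records. The sketch below turns such a list into a dictionary; it assumes the response shape returned by boto3's CloudFormation describe_stacks call, and the sample values are placeholders rather than real endpoints.

```python
def outputs_to_dict(describe_stacks_response: dict) -> dict:
    """Map OutputKey to OutputValue for the first stack in the response."""
    outputs = describe_stacks_response["Stacks"][0].get("Outputs", [])
    return {item["OutputKey"]: item["OutputValue"] for item in outputs}


# Placeholder response trimmed to the fields used here.
sample_response = {
    "Stacks": [{
        "StackName": "OhlalaSmartOps",
        "Outputs": [
            {"OutputKey": "TeamsWebhookURL",
             "OutputValue": "https://xxx.execute-api.region.amazonaws.com/prod-stackname/api/messages"},
            {"OutputKey": "ECSCluster", "OutputValue": "example-cluster"},
        ],
    }]
}

outputs = outputs_to_dict(sample_response)
print(outputs["TeamsWebhookURL"])
```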

Verify Deployment

Check ECS Service

  1. Go to ECS Console → Clusters
  2. Find your cluster (e.g., OhlalaSmartOps-Cluster-...)
  3. Check service shows 1 running task

Check API Gateway

  1. Go to API Gateway Console
  2. Find your API (e.g., OhlalaSmartOps-API-...)
  3. Verify endpoints are created

Check Health Endpoint

Test the health endpoint (no authentication required):

curl https://your-api-id.execute-api.region.amazonaws.com/prod-stackname/health

Should return: {"status": "healthy"}
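
The same check can be scripted. This is a minimal sketch using only the Python standard library; the function names are hypothetical, and the URL passed to check_health must be replaced with the endpoint for your own stack.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True when the response body reports status 'healthy'."""
    try:
        return json.loads(body).get("status") == "healthy"
    except (json.JSONDecodeError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check_health(url: str, timeout: float = 10.0) -> bool:
    """Fetch the health endpoint; urlopen raises on HTTP error statuses."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200 and is_healthy(response.read().decode("utf-8"))


print(is_healthy('{"status": "healthy"}'))  # True
# check_health("https://your-api-id.execute-api.region.amazonaws.com/prod-stackname/health")
```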

Troubleshooting

Stack Creation Failed

IAM Role Already Exists

Error: “Resource of type ‘AWS::IAM::Role’ with identifier already exists”

Solution: Use a different stack name, or delete the existing role first

Insufficient Permissions

Error: “User is not authorized to perform: iam:CreateRole”

Solution: Ensure you have admin permissions or the required IAM policies

Service Quota Exceeded

Error: “Service quota exceeded”

Solution: Request a quota increase or deploy in a different region

Stack Stuck in CREATE_IN_PROGRESS

  • Check Events tab for specific resource causing delay
  • ECS service can take 3-5 minutes to stabilize
  • If stuck >15 minutes, consider deleting and retrying

Next Step

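With infrastructure deployed and webhook URL ready, continue based on your ChatPlatform selection:

  • Teams or Both: Continue to Connect to Microsoft Teams →
  • GoogleChat or Both: Continue to Connect to Google Chat →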

1.7 - Connect to Microsoft Teams

Configure the webhook and install the bot in Microsoft Teams

What You’ll Configure

  • Azure Bot webhook endpoint
  • Teams channel connection
  • Bot app installation
  • Initial testing

Integration Steps

1. Configure Webhook in Azure Bot

Get the Webhook URL

From your CloudFormation stack outputs, copy the TeamsWebhookURL:

https://xxx.execute-api.region.amazonaws.com/prod-stackname/api/messages

Update Bot Configuration

  1. Go to Azure Portal
  2. Navigate to your Azure Bot resource
  3. Go to Configuration under Settings
  4. Set Messaging endpoint to your webhook URL
  5. Click Apply to save

Azure Bot Configuration page with Messaging endpoint field highlighted

2. Install Teams App

  1. Download the Teams app package:

    Ohlala SmartOps Teams App

  2. Customize the manifest:

    • Extract the zip file
    • Edit manifest.json
    • Replace YOUR_APP_ID with your Microsoft App ID
    • Re-zip the files
  3. Install in Teams:

    • Open Microsoft Teams
    • Go to Apps in the left sidebar, then Manage your apps
    • Click Upload an app
    • Select Upload a custom app
    • Choose your zip file
    • Click Add to install
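
The manifest customization in step 2 can also be scripted. The sketch below copies the downloaded package to a new zip and replaces YOUR_APP_ID along the way. It assumes manifest.json sits at the root of the archive and that the input and output paths differ; the function name and file names are hypothetical.

```python
import zipfile


def build_teams_package(source_zip: str, app_id: str, output_zip: str) -> None:
    """Copy a Teams app package, substituting the App ID in manifest.json."""
    with zipfile.ZipFile(source_zip) as src, \
            zipfile.ZipFile(output_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == "manifest.json":
                text = data.decode("utf-8").replace("YOUR_APP_ID", app_id)
                data = text.encode("utf-8")
            # Entries keep their original names, so files stay at the zip root.
            dst.writestr(info.filename, data)


# Example call (file names are placeholders):
# build_teams_package("teams-app.zip", "12345678-1234-1234-1234-123456789012", "teams-app-custom.zip")
```

Writing to a separate output file keeps the original download untouched in case the App ID needs to be changed later.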

Teams Apps menu showing Manage your apps option

Teams upload custom app dialog

Note: If you lack permission to upload custom apps, ask your Teams admin to upload the app for you from the Teams Admin Center.

Teams Admin Center showing Manage apps page with Upload a custom app button highlighted

3. Add Bot to Team or Chat

For Personal Use

  1. Find Ohlala SmartOps in your apps
  2. Click Add
  3. Start chatting directly with the bot

For Team Use

  1. Go to your team
  2. Click More options → Manage team
  3. Go to Apps tab
  4. Click Upload a custom app
  5. Select your app
  6. Click Add to team

Test the Connection

Send Test Message

In Teams, message the bot:

@Ohlala SmartOps hello

Expected response:

Hello! I’m Ohlala SmartOps, your AI-powered AWS infrastructure assistant. Type ‘/help’ to see what I can do for you.

Test Basic Command

Try a simple command:

@Ohlala SmartOps /help

Teams chat showing bot responding to /help command with list of available commands

The bot should respond with a help card showing available commands.

Verify Integration

Check Connection Status

In Azure Portal

  1. Go to your bot → Channels
  2. Microsoft Teams should show Running
  3. Click Microsoft Teams to see activity

In AWS Console

  1. Go to CloudWatch → Log Groups
  2. Find /aws/ecs/ohlala-smartops-...
  3. Check for incoming request logs

Monitor API Gateway

  1. Go to API Gateway Console
  2. Select your API
  3. Go to Dashboard
  4. You should see incoming requests when messaging the bot

Troubleshooting

Bot Not Responding

Check Webhook URL

  • Verify URL in Azure Bot Configuration matches CloudFormation output exactly
  • Ensure it includes the full path with /api/messages
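
These two checks are easy to get wrong when copying URLs by hand. The sketch below flags the most common mistakes; the path suffixes come from the webhook URL formats shown in this guide, while the function name is hypothetical.

```python
from urllib.parse import urlparse

# Path suffixes taken from the webhook URL examples in this guide.
EXPECTED_SUFFIX = {"Teams": "/api/messages", "GoogleChat": "/api/google-chat"}


def webhook_problems(url: str, platform: str) -> list:
    """Return a list of likely mistakes in a pasted webhook URL."""
    problems = []
    if url != url.strip():
        problems.append("URL has leading or trailing whitespace")
    parsed = urlparse(url.strip())
    if parsed.scheme != "https":
        problems.append("URL should start with https://")
    if not parsed.path.endswith(EXPECTED_SUFFIX[platform]):
        problems.append(f"path should end with {EXPECTED_SUFFIX[platform]}")
    return problems


url = "https://xxx.execute-api.region.amazonaws.com/prod-stackname/api/messages"
print(webhook_problems(url, "Teams"))  # []
```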

Check ECS Service

  • Go to ECS Console
  • Verify service has 1 running task
  • Check task logs for errors

Test Health Endpoint

curl https://your-api.execute-api.region.amazonaws.com/prod-stackname/health

“Service Unavailable” Error

Causes:

  • ECS task not running
  • API Gateway misconfigured
  • Authentication failing

Solution:

  1. Check ECS service is running
  2. Verify API Gateway deployment
  3. Check CloudWatch logs for details

Authentication Errors

Symptoms: 401 or 403 errors in logs

Solution:

  1. Verify Microsoft App credentials in Secrets Manager
  2. Ensure Tenant ID is correct
  3. Check Lambda authorizer logs

Teams App Installation Issues

“App not found”:

  • Ensure manifest.json has correct App ID
  • Verify bot is published in Azure

“Permissions required”:

  • Contact Teams admin to allow custom apps
  • Check organizational app policies

Success Checklist

Confirm everything is working:

  • Azure Bot shows Teams channel as Running
  • Webhook URL configured correctly
  • Teams app installed successfully
  • Bot responds to messages
  • Help command works
  • No errors in CloudWatch logs

Next Step

Your bot is connected! Now let’s verify everything and run your first commands:

Continue to Verification & Testing →

1.8 - Connect to Google Chat

Configure the Chat app and install the bot in Google Chat

What You’ll Configure

  • Chat app settings with webhook URL
  • App visibility and permissions
  • Bot installation in Google Chat

Prerequisites

Before starting, ensure you have:

  • Google Cloud Project created with Chat API enabled (Google Cloud Setup)
  • CloudFormation stack deployed with ChatPlatform set to GoogleChat or Both
  • GoogleChatWebhookURL from CloudFormation stack outputs

Integration Steps

1. Get the Webhook URL

  1. Go to AWS CloudFormation Console
  2. Select your stack
  3. Go to the Outputs tab
  4. Copy the GoogleChatWebhookURL value

The URL looks like: https://xxx.execute-api.region.amazonaws.com/prod-stackname/api/google-chat

2. Configure Chat App

Access Chat API Configuration

  1. Go to Google Chat API Configuration
  2. Or navigate: APIs & Services then Enabled APIs then Google Chat API then Configuration

Google Chat API Configuration page

Fill App Information

App name: Ohlala SmartOps

Avatar URL: https://767397776277-marketplace.s3.us-east-1.amazonaws.com/Ohlala_logo.png

Description: AI-powered AWS infrastructure management assistant

Enable Interactive features: ON

Configure App URL

Functionality: Select both:

  • Receive 1:1 messages
  • Join spaces and group conversations

Connection settings: Enter the GoogleChatWebhookURL value from your CloudFormation stack outputs as the app URL.

Google Chat API Configuration page

Authentication Audience: Select App URL

Set Visibility

  1. Select “Make this Chat app available to specific people and groups”
  2. Click Add people or groups
  3. Enter email addresses of all users who need access to the bot
  4. Click Save

Chat app visibility settings

3. Add Bot to Google Chat

For Personal Use (Direct Message)

  1. Open Google Chat
  2. Click + New chat then type “smartops”

Google Chat Find apps dialog

  3. Search for Ohlala SmartOps
  4. Click on the app
  5. Click Add
  6. Start chatting directly with the bot

For Space Use (Group)

  1. Open or create a Google Chat space
  2. Click the space name then Integrations
  3. Click Add apps
  4. Search for Ohlala SmartOps
  5. Click Add
  6. The bot will appear in the space

Adding bot to a Google Chat space

Test the Connection

Send Test Message

In Google Chat, message the bot:

@Ohlala SmartOps hello

Or in a direct message:

hello

Expected response:

Hello! I’m Ohlala SmartOps, your AI-powered AWS infrastructure assistant. Type ‘/help’ to see what I can do for you.

Test Basic Command

Try a simple command:

/help

Example conversation with bot showing /help command

The bot should respond with a help card showing available commands.

Verify Integration

Check Google Chat API

  1. Go to Google Cloud Console
  2. Navigate to APIs & Services then Dashboard
  3. Find Google Chat API
  4. Click to see usage metrics

Check AWS CloudWatch Logs

  1. Go to CloudWatch → Log Groups
  2. Find /aws/ecs/ohlala-smartops-...
  3. Look for logs containing google-chat or incoming POST requests
  4. Verify requests are being received and processed

Monitor API Gateway

  1. Go to API Gateway Console
  2. Select your API
  3. Go to Dashboard
  4. You should see requests to /api/google-chat endpoint

Troubleshooting

Bot Not Responding

Check Webhook URL

  • Verify URL in Google Chat API Configuration matches CloudFormation output
  • Ensure it includes the full path with /api/google-chat
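These two checks can be automated before digging further. The sketch below is one possible Python helper; the function name and the example URLs are hypothetical, and the only facts taken from this guide are the /api/google-chat path and the need to match the CloudFormation output.

```python
from urllib.parse import urlparse

EXPECTED_PATH_SUFFIX = "/api/google-chat"  # path named in this guide


def check_webhook_url(configured_url, cloudformation_url):
    """Return a list of problems found with the configured webhook URL."""
    problems = []
    parsed = urlparse(configured_url)
    if parsed.scheme != "https":
        problems.append("URL should use https")
    if not parsed.path.rstrip("/").endswith(EXPECTED_PATH_SUFFIX):
        problems.append("URL path should end with /api/google-chat")
    if configured_url.rstrip("/") != cloudformation_url.rstrip("/"):
        problems.append("URL differs from the CloudFormation output")
    return problems


# Hypothetical values for illustration.
stack_output = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/api/google-chat"
print(check_webhook_url(stack_output, stack_output))
```

An empty list means the configured URL passed all three checks.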

Check ECS Service

  • Go to ECS Console
  • Verify service has running tasks
  • Check task logs for errors

Verify Google Chat is Enabled

  • Check CloudFormation parameter ChatPlatform is GoogleChat or Both
  • Verify ECS task has the environment variable set

Bot Not Found in Google Chat

Causes:

  • App visibility not configured correctly
  • You’re not in the allowed users/groups list
  • App not yet published (may take a few minutes)

Solution:

  1. Go to Chat API Configuration
  2. Check Visibility settings
  3. Add your email to allowed users
  4. Wait a few minutes and try again

Authentication Errors

Symptoms: 401 or 403 errors in CloudWatch logs

Solution:

  1. Verify that the service account JSON supplied to CloudFormation is correct
  2. Check that the JSON is properly formatted (single line, no extra quotes)
  3. Verify Project ID matches the service account’s project
  4. Check Lambda authorizer logs for specific error messages
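The first three checks can be run locally before redeploying. The sketch below is a minimal Python example; the helper name is hypothetical, and it assumes the standard Google service account key fields (type, project_id, private_key, client_email). It does not contact Google or AWS.

```python
import json

REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_json(raw, expected_project_id):
    """Return a list of formatting problems in a service account JSON string."""
    problems = []
    if "\n" in raw.strip():
        problems.append("JSON spans multiple lines; paste it as a single line")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return problems + [f"JSON does not parse: {exc.msg}"]
    if not isinstance(data, dict):
        # Typical result of wrapping the whole document in extra quotes.
        return problems + ["JSON is not an object (check for extra quotes)"]
    for field in REQUIRED_FIELDS:
        if field not in data:
            problems.append(f"missing field: {field}")
    if data.get("project_id") != expected_project_id:
        problems.append("project_id does not match the expected project")
    return problems


# Hypothetical key material for illustration only.
sample_json = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "-----BEGIN PRIVATE KEY-----\nabc\n-----END PRIVATE KEY-----\n",
    "client_email": "bot@my-project.iam.gserviceaccount.com",
})
print(check_service_account_json(sample_json, "my-project"))
```

Line breaks inside private_key remain escaped in a single-line JSON string, so a valid single-line key passes the first check.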

“Service Unavailable” Error

Causes:

  • ECS task not running
  • API Gateway misconfigured
  • JWT validation failing

Solution:

  1. Check ECS service is running
  2. Verify API Gateway deployment
  3. Check CloudWatch logs for Lambda authorizer errors
  4. Verify the audience URL in Chat API Configuration matches your endpoint
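For the last step, it can help to look at the aud claim of a token captured during debugging. The sketch below is an illustrative Python helper that decodes the JWT payload without verifying the signature, so it is suitable for inspection only and is not how the Lambda authorizer validates requests. The function names and the demo token builder are hypothetical.

```python
import base64
import json


def decode_jwt_claims(token):
    """Decode the payload segment of a JWT WITHOUT verifying its signature."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("a JWT has three dot-separated segments")
    payload = parts[1]
    payload += "=" * (-len(payload) % 4)  # restore stripped base64 padding
    return json.loads(base64.urlsafe_b64decode(payload))


def audience_matches(token, endpoint_url):
    """Return True if the token's aud claim contains the endpoint URL."""
    aud = decode_jwt_claims(token).get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return endpoint_url in audiences


def make_unsigned_demo_token(claims):
    """Build a fake, unsigned token for demonstration purposes only."""
    def encode(obj):
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return ".".join([encode({"alg": "none"}), encode(claims), "signature"])


# Hypothetical endpoint for illustration.
endpoint = "https://abc123.execute-api.us-east-1.amazonaws.com/prod/api/google-chat"
demo_token = make_unsigned_demo_token({"aud": endpoint})
print(audience_matches(demo_token, endpoint))
```

A mismatch here points at the Authentication Audience setting rather than at the ECS service or API Gateway deployment.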

Card Rendering Issues

Google Chat uses a different card format than Teams. If cards don’t render correctly:

  1. Check CloudWatch logs for card formatting errors
  2. Verify the QuickChart sidecar container is running in ECS
  3. Check S3 bucket for chart images (if using charts)

Google Chat vs Teams Differences

Feature         | Google Chat              | Microsoft Teams
Card Format     | Google Card JSON         | Adaptive Cards
Authentication  | Service Account + JWT    | Azure AD + Bot Framework
Charts          | Uploaded to S3 as images | Rendered inline
Message Updates | Limited support          | Full support
@mentions       | Required in spaces       | Optional
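To make the card format difference concrete, the sketch below builds the same title and text as a Google Chat cardsV2 message and as an Adaptive Card body. It is a minimal Python illustration of the two public schemas; the function names and the card id are hypothetical, and the cards SmartOps actually sends are likely richer.

```python
def google_chat_card(title, text):
    """Minimal Google Chat message in the cardsV2 format."""
    return {
        "cardsV2": [
            {
                "cardId": "smartops-demo",  # hypothetical id
                "card": {
                    "header": {"title": title},
                    "sections": [
                        {"widgets": [{"textParagraph": {"text": text}}]}
                    ],
                },
            }
        ]
    }


def teams_adaptive_card(title, text):
    """Minimal Adaptive Card payload of the kind Teams renders."""
    return {
        "type": "AdaptiveCard",
        "$schema": "http://adaptivecards.io/schemas/adaptive-card.json",
        "version": "1.4",
        "body": [
            {"type": "TextBlock", "text": title, "weight": "Bolder", "size": "Medium"},
            {"type": "TextBlock", "text": text, "wrap": True},
        ],
    }


chat_card = google_chat_card("SmartOps Help", "Type /help to see commands.")
teams_card = teams_adaptive_card("SmartOps Help", "Type /help to see commands.")
print(sorted(chat_card), teams_card["type"])
```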

Success Checklist

Confirm everything is working:

  • Chat app configured with correct webhook URL
  • App visibility settings allow your access
  • Bot appears in Google Chat app search
  • Bot responds to messages
  • Help command works
  • No errors in CloudWatch logs

Next Step

Your bot is connected to Google Chat! Now verify everything is working:

Continue to Verification & Testing →

1.9 - Verification & Testing

Confirm your deployment and run first commands

Deployment Checklist

Before testing commands, verify each component:

AWS Infrastructure

  • CloudFormation stack shows CREATE_COMPLETE
  • ECS service has 1 running task
  • API Gateway endpoint responds to health check
  • CloudWatch logs are being created
  • Secrets Manager contains credentials

Microsoft Teams (if using Teams)

  • Azure Bot shows Teams channel as Running
  • Webhook URL is configured correctly
  • Teams app is installed
  • Bot appears in Teams chat/channel

Google Chat (if using Google Chat)

  • Google Chat API enabled and configured
  • Service account credentials stored in CloudFormation
  • Chat app appears in Google Chat search
  • Bot is added to a space or direct message

Bedrock

  • Claude Sonnet 4.5 shows Access granted
  • Model is available in deployment region

Your First Commands

The commands below work in both Microsoft Teams and Google Chat. In Teams, use @Ohlala SmartOps to mention the bot. In Google Chat direct messages, you can type commands directly without mentioning.
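One way to picture this difference is a normalization step that removes the mention before the command is interpreted. The Python sketch below is purely illustrative; the function name is hypothetical and the source does not describe how SmartOps handles mentions internally.

```python
BOT_MENTION = "@Ohlala SmartOps"


def strip_mention(message):
    """Remove a leading bot mention so both platforms yield the same text."""
    text = message.strip()
    if text.lower().startswith(BOT_MENTION.lower()):
        text = text[len(BOT_MENTION):].strip()
    return text


print(strip_mention("@Ohlala SmartOps hello"))
print(strip_mention("hello"))
```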

1. Test Connection

Teams: @Ohlala SmartOps hello

Google Chat (direct message): hello

Expected Response: Friendly greeting confirming the bot is working

2. Get Help

Teams: @Ohlala SmartOps help

Google Chat: /help

Expected Response: Interactive card with available commands and examples

3. Check Instance Status

show me my EC2 instances

Expected Response: List of your EC2 instances with status information

4. Health Report

/health

Expected Response: Detailed health metrics for your instances

5. Natural Language Query

which instances are running in us-east-1?

Expected Response: Filtered list based on your query

Advanced Testing

Test SSM Integration

@Ohlala SmartOps check disk space on i-1234567890abcdef0

  • Verifies SSM command execution
  • Returns disk usage information

Test Cost Analysis

@Ohlala SmartOps analyze my EC2 costs

  • Checks CloudWatch metrics access
  • Provides cost optimization suggestions

Test Multi-Instance Commands

@Ohlala SmartOps show me all stopped instances

  • Tests filtering and analysis capabilities
  • Demonstrates natural language understanding

Monitoring Your Deployment

CloudWatch Metrics

Monitor key metrics in CloudWatch:

  1. ECS Service

    • CPU utilization (should be <50%)
    • Memory utilization (should be <70%)
    • Task count (should be 1)
  2. API Gateway

    • Request count
    • 4XX/5XX errors (should be minimal)
    • Latency (should be <3 seconds)
  3. Bedrock Usage

    • Token consumption
    • API throttling events
    • Model invocation errors
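The thresholds above can be expressed as a simple check over a metrics snapshot. The sketch below is a minimal Python example; the dictionary keys and the function name are hypothetical, and the values would have to be fetched from CloudWatch separately.

```python
# Limits taken from the list above; the key names are assumptions.
THRESHOLDS = {
    "cpu_percent": 50.0,
    "memory_percent": 70.0,
    "latency_seconds": 3.0,
}
EXPECTED_TASK_COUNT = 1


def evaluate_metrics(metrics):
    """Return warnings for every metric outside its expected range."""
    warnings = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value >= limit:
            warnings.append(f"{name}={value} is not below {limit}")
    if metrics.get("task_count") != EXPECTED_TASK_COUNT:
        warnings.append(f"task_count should be {EXPECTED_TASK_COUNT}")
    return warnings


healthy_snapshot = {
    "cpu_percent": 35.0,
    "memory_percent": 60.0,
    "latency_seconds": 1.2,
    "task_count": 1,
}
print(evaluate_metrics(healthy_snapshot))
```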

Common Issues & Solutions

Issue: Bot Not Responding

Quick Diagnosis:

# Check health endpoint
curl https://your-api.execute-api.region.amazonaws.com/prod-stackname/health
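The same check can be scripted. The following Python sketch assumes the standard API Gateway URL layout shown in the curl command; the API id, region, and stage values are placeholders, and the function names are hypothetical.

```python
import urllib.request


def health_url(api_id, region, stage):
    """Build the health check URL from its API Gateway components."""
    return f"https://{api_id}.execute-api.{region}.amazonaws.com/{stage}/health"


def is_healthy(url, timeout=5):
    """Return True if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers connection errors, timeouts, and HTTP error responses.
        return False


# Placeholder values; replace with your own deployment details.
url = health_url("abc123", "us-east-1", "prod-mystack")
print(url)
```

Calling is_healthy(url) against a real deployment performs the request; the example above only builds the URL.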

Solutions:

  1. Check ECS task is running
  2. For Teams: Verify webhook URL in Azure Bot configuration
  3. For Google Chat: Verify webhook URL in Chat API Configuration
  4. Ensure the app is installed correctly
  5. Review CloudWatch logs

Issue: “Model Access Required” Error

Symptom: Bot responds but shows Bedrock error

Solution:

  1. Go to Bedrock Console then Model access
  2. Enable Claude Sonnet 4.5
  3. Wait for “Access granted”
  4. Retry command (no restart needed)

Issue: No Instances Found

Symptom: Bot works but doesn’t see EC2 instances

Checks:

  1. Instances are in same region as deployment
  2. Instances have SSM agent installed
  3. IAM permissions are correct
  4. Try: list all instances in all regions

Issue: Commands Timeout

Symptom: Bot shows “thinking” but never responds

Solutions:

  1. Check ECS task memory/CPU
  2. Look for Bedrock throttling
  3. Verify network connectivity
  4. Scale ECS service if needed

Issue: Teams Authentication Failures

Symptom: 401/403 errors in logs for Teams requests

Solutions:

  1. Regenerate Azure Bot credentials
  2. Update Secrets Manager
  3. Restart ECS service
  4. Check tenant ID is correct

Issue: Google Chat Authentication Failures

Symptom: 401/403 errors in logs for Google Chat requests

Solutions:

  1. Verify service account JSON is correctly formatted (single line)
  2. Check the service account has the correct permissions
  3. Verify the Project ID matches the service account
  4. Check the Authentication Audience setting in Chat API Configuration

Best Practices

  1. Start simple: Use basic commands first
  2. Be specific: Include instance IDs for targeted actions
  3. Use natural language: The bot understands context
  4. Review suggestions: Always verify before applying changes

Success Indicators

Your deployment is successful when:

  • Bot responds within 2-3 seconds
  • All test commands work
  • No errors in CloudWatch logs
  • Costs align with expectations
  • Team members can use the bot

Next Steps

Now that your bot is working:

  1. Explore Features

    • Try advanced commands
    • Experiment with natural language queries
    • Review health and cost reports
  2. Train Your Team

    • Share the bot with team members
    • Create usage guidelines
    • Document common workflows

Getting Help

If you encounter issues:

  1. Check Documentation

  2. Contact Support

  3. Community Resources

Congratulations

You’ve successfully deployed Ohlala SmartOps. Your AI-powered infrastructure assistant is ready to help manage your AWS environment through natural language conversations in Microsoft Teams, Google Chat, or both.

Happy automating!

2 - Architecture & Limitations

System architecture, design decisions, and current limitations of Ohlala SmartOps

System Architecture

High-Level Overview

Ohlala SmartOps follows a containerized, serverless architecture designed for high availability and cost efficiency:

High-level architecture diagram showing user interaction with Teams, API Gateway, ECS Fargate, Bedrock, and AWS services

Container Architecture

Multi-Container Design with dedicated responsibilities:

Main Bot Container

  • Purpose: Teams integration, conversation orchestration, Bedrock AI
  • Port: 8000
  • Resources: 768 CPU units, 1536MB memory
  • Key Features:
    • Microsoft Bot Framework integration
    • Amazon Bedrock (Claude) orchestration
    • Conversation state management
    • Multi-language support

MCP AWS API Container

  • Purpose: Secure AWS operations via Model Context Protocol
  • Port: 8080
  • Resources: 256 CPU units, 512MB memory
  • Key Features:
    • AWS service abstractions
    • Permission-aware operations
    • Rate limiting and retry logic
    • Security-first design

Architecture Highlights

Fully Serverless

ECS Fargate + API Gateway eliminate infrastructure management overhead

  • Zero server maintenance - AWS handles all patching and scaling
  • Automatic scaling - Responds to demand without intervention
  • Pay-per-use pricing - Only pay for actual compute time
  • Note: ~30s cold start for new container instances

Security-First Design

Defense in depth with multiple security layers

  • Private subnets - Containers have no direct internet exposure
  • Isolated containers - Bot logic and AWS operations run separately
  • JWT validation - Lambda authorizer validates all requests
  • Secrets management - Credentials stored in AWS Secrets Manager
  • Least privilege IAM - Each component has minimal required permissions

Microservices Architecture

Multi-container pattern for better maintainability

  • Main bot container - Handles Teams interactions and AI orchestration
  • MCP AWS container - Provides secure AWS API access
  • Clear boundaries - Each container has a single responsibility
  • Independent updates - Deploy changes without affecting other components

Stateless by Design

No persistent storage keeps architecture simple

  • Reduced complexity - No database to manage or scale
  • Lower costs - No database charges or backup requirements
  • Horizontal scaling - Any container can handle any request
  • Trade-off: Conversation context resets on container restart

Regional Flexibility

Deploy anywhere with single-region stacks

  • Data sovereignty - Keep data in your required region
  • Low latency - Deploy close to your EC2 instances
  • Cost optimization - No cross-region data transfer fees
  • Simple disaster recovery - Deploy multiple independent stacks

High-Performance Networking

Optimized for Teams integration with enterprise-grade networking

  • Network Load Balancer - Layer 4 load balancing for minimal latency
  • VPC Link - Secure private connection from API Gateway
  • Auto-scaling - Network automatically handles traffic spikes
  • Health checks - Automatic failover for unhealthy containers

Performance Characteristics

Response Times

  • Health Check: < 1 second
  • Simple Commands: 2-5 seconds
  • AI Analysis: 5-15 seconds
  • SSM Operations: 10-60 seconds (depending on command)

Throughput Limits

  • Concurrent Users: 1-20 (single task)
  • Commands/Day: 10,00+ (with proper scaling)
  • API Gateway: 10,000 requests/second (AWS limit)
  • Bedrock: 20 requests/minute per model (AWS limit)
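A per-minute quota like the Bedrock figure above is commonly respected on the client side with a sliding-window limiter. The sketch below is a generic Python illustration of that technique, not the mechanism SmartOps uses; the class name is hypothetical and timestamps are passed in explicitly so the behavior is easy to follow.

```python
from collections import deque


class SlidingWindowLimiter:
    """Allow at most max_calls within any window_seconds interval."""

    def __init__(self, max_calls=20, window_seconds=60.0):
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of accepted calls, oldest first

    def allow(self, now):
        """Record and allow a call at time `now`, or return False if over quota."""
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) < self.max_calls:
            self._calls.append(now)
            return True
        return False


limiter = SlidingWindowLimiter()
results = [limiter.allow(float(second)) for second in range(21)]
print(results.count(True), results.count(False))
```

In real use, `now` would typically come from time.monotonic(), and a rejected call would be delayed and retried rather than dropped.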

Scaling Behavior

  • Auto-scaling: ECS service set to auto-heal (1 task)
  • Cold start: ~30 seconds for new tasks

Current Limitations

1. Session Management

  • Issue: No persistent conversation history
  • Impact: Context lost on container restart
  • Workaround: Keep conversations short and focused

2. Multi-Region Support

  • Issue: Single region deployment only
  • Impact: No built-in disaster recovery
  • Workaround: Deploy multiple stacks in different regions

3. Cold Start Latency

  • Issue: 30+ second delay for new container starts
  • Impact: First request after idle period is slow
  • Workaround: Keep minimum 1 task running always
  • Mitigation: ECS warmup targets available

Security Architecture

Network Security

  • Private Subnets: Containers have no direct internet access
  • Security Groups: Restrictive ingress/egress rules
  • VPC Endpoints: Secure access to AWS services

Authentication & Authorization

  • Teams Authentication: Microsoft Bot Framework JWT validation
  • AWS Permissions: IAM roles with least-privilege access
  • Inter-Container: Shared API key for MCP communication
  • Secrets: AWS Secrets Manager for sensitive data

Data Protection

  • Encryption in Transit: TLS 1.2+ for all communication
  • Encryption at Rest: EBS volumes encrypted by default
  • Logging: CloudWatch Logs with retention policies
  • Audit Trail: All AWS API calls logged via CloudTrail

Technical References

Container Images

  • Registry: Amazon ECR
  • Repository: 709825985650.dkr.ecr.us-east-1.amazonaws.com/ohlala-automation-solutions/
  • Tags: Version-based (v1.1.0 latest, earlier: v1.0.x)

Monitoring & Observability

  • Metrics: CloudWatch Container Insights
  • Logs: Structured JSON logging to CloudWatch
  • Health Checks: HTTP endpoints on both containers
  • Alarms: CPU, Memory, Error Rate monitoring

Backup & Recovery

  • Container Images: Immutable, versioned in ECR
  • Infrastructure: CloudFormation templates in version control
  • Configuration: Environment variables and secrets
  • No Persistent Data: Stateless design eliminates backup needs

3 - SmartOps Features & Security

Comprehensive guide to Ohlala SmartOps features with emphasis on the approval system that ensures infrastructure safety

Core Capabilities

Infrastructure Discovery

  • Automatic EC2 Detection: Zero-configuration discovery of SSM-enabled instances
  • Tag-Based Organization: Intelligent grouping by environment, application, and team
  • Multi-Region Support: Manages instances across all supported AWS regions

Cost Intelligence

  • Usage Analysis: Deep dive into actual vs. provisioned capacity
  • AI-Powered Recommendations: ML-driven rightsizing suggestions
  • Savings Calculations: Precise cost impact modeling with confidence intervals

Smart Troubleshooting

  • AI-Guided Diagnostics: Step-by-step issue resolution assistance
  • Remote Command Execution: Secure SSM-based command execution with approval
  • Pattern Recognition: Intelligent problem identification and solution suggestions

On-Demand Analytics

  • Health Assessments: Infrastructure status reports when requested
  • Performance Insights: Capacity planning and optimization recommendations
  • Custom Reports: Team-specific views and executive summaries

Detailed Feature Documentation

Cost Optimization →

Comprehensive FinOps capabilities for EC2 cost management:

  • Rightsizing recommendations with usage pattern analysis
  • Reserved Instance planning and optimization
  • Schedule-based scaling opportunities
  • ROI calculations and savings tracking

Monitoring & Analytics →

On-demand monitoring and reporting features:

  • Health reports and status dashboards
  • Performance metrics and trend analysis
  • Automated reporting and scheduled updates
  • Custom analytics and team-specific views

Security & Compliance →

Enterprise-grade security and audit capabilities:

  • Approval system deep dive
  • Complete audit trails and compliance reporting
  • Identity and access management integration
  • Security best practices and safeguards

AI & Safety Features

Intelligent Understanding

  • Natural Language Processing: Understands context and intent
  • Fuzzy Matching: Handles typos and variations in commands
  • Context Awareness: Remembers conversation history for follow-ups

Safety by Design

  • Read-First Policy: All operations require explicit confirmation
  • Risk Assessment: AI evaluates potential impact before actions
  • Audit Trail: Complete logging with user identity tracking

Quick Start

Try These Commands

@Ohlala SmartOps what instances do I have?

@Ohlala SmartOps show me a health report

@Ohlala SmartOps analyze my EC2 costs

@Ohlala SmartOps which instances need attention?

Best Practices

  1. Start with Read-Only: Explore monitoring features first
  2. Use Natural Language: Don’t worry about exact syntax
  3. Review Before Approving: Always check what commands will do
  4. Ask Follow-ups: Build on previous responses for context

Integration Capabilities

Native AWS Services

  • EC2: Complete instance lifecycle management
  • Systems Manager: Secure command execution
  • CloudWatch: Metrics collection and analysis
  • Cost Explorer: Detailed cost analysis
  • Bedrock: AI-powered insights

Communication Platforms

  • Microsoft Teams: Full-featured chat interface with Adaptive Cards
  • Google Chat: Full-featured chat interface for Google Workspace users
  • Azure AD / Google Workspace: Enterprise identity and access management
  • Slack: Coming soon with comparable feature set

Key Benefits

Operational Efficiency

  • Streamlined workflows with AI-powered assistance
  • Faster incident response through automated discovery and analysis
  • Reduced manual overhead for routine infrastructure tasks

Infrastructure Optimization

  • Cost optimization recommendations based on actual usage patterns
  • Right-sizing suggestions for underutilized resources
  • Proactive monitoring to identify optimization opportunities

4 - Bot Commands & Examples

Complete guide to Ohlala SmartOps chat commands and conversation examples for Microsoft Teams. Learn natural language patterns and see real responses.

Command Overview

SmartOps understands both natural language and specific commands. You can interact in three ways:

  1. Natural Language: “Show me instances that are running high on CPU”
  2. Direct Commands: “list instances”, “health report”
  3. Contextual Queries: Follow-up questions based on previous responses

Safety Through Approval System

Documentation Sections

Slash Commands →

Built-in commands for quick access to common operations:

  • Essential commands (/help, /status, /instances)
  • Information commands (/version, /regions, /limits)
  • Utility commands (/clear, /settings, /feedback)
  • Support commands (/debug, /contact)

Command Examples →

Detailed examples of all available commands with natural language variations and expected responses:

  • Instance management (list, describe, control)
  • Health monitoring and troubleshooting
  • Cost optimization and rightsizing
  • Remote command execution

Natural Language Features →

Learn how SmartOps understands context and intent:

  • Context awareness and fuzzy matching
  • Intent recognition patterns
  • Follow-up conversations
  • Handling typos and variations

Quick Start Commands

Try these commands to get started:

Basic Information

@Ohlala SmartOps help

@Ohlala SmartOps what instances do I have?

@Ohlala SmartOps show me a health report

Natural Language

@Ohlala SmartOps which instances need attention?

@Ohlala SmartOps how much am I spending on EC2?

@Ohlala SmartOps help me troubleshoot my web server

Follow-up Questions

After any response, you can ask follow-up questions like:

  • “Show me more details about that”
  • “What would you recommend?”
  • “Can you help me fix this?”

Best Practices

  1. Start Simple: Begin with read-only commands to get familiar
  2. Use Natural Language: Don’t worry about exact syntax
  3. Ask Follow-ups: Build on previous responses for context
  4. Review Before Approving: Always check what commands will do

4.1 - Slash Commands Reference

Complete reference for built-in slash commands and their usage

Essential Commands

/help

Purpose: Display all available commands and features

Usage: /help or /help [command] for detailed help on a specific command

Response: Interactive adaptive card showing:

  • All available slash commands
  • Natural language command examples
  • Quick action buttons for common operations
  • Localized content based on user’s Teams language

Help command showing interactive card with all available bot features


/instances

Purpose: List all EC2 instances with interactive management options

Usage: /instances

Response: Interactive card displaying:

  • Instance IDs, names, and tags
  • Current state (running, stopped, etc.)
  • Instance type and platform
  • SSM connectivity status
  • Quick action buttons for each instance

Instances command showing interactive list of EC2 instances with action buttons


/health

Purpose: Comprehensive health dashboard for instances

Usage: /health for all instances, or /health [instance-id] for a specific instance

Response: Rich dashboard featuring:

  • CPU, memory, and disk usage metrics
  • SSM agent connectivity status
  • Visual health indicators and charts
  • System performance trends
  • CloudWatch metrics integration

Health command displaying comprehensive dashboard with metrics and charts

Health command showing detailed instance health metrics and status


/rightsizing

Purpose: Cost optimization and rightsizing recommendations

Usage: /rightsizing

Response: Cost optimization dashboard with:

  • Current instance utilization analysis
  • Rightsizing recommendations
  • Potential cost savings calculations
  • Instance type upgrade/downgrade suggestions
  • CloudWatch metrics-based insights

Rightsizing command showing cost optimization recommendations and potential savings


Monitoring & Management Commands

/status

Purpose: Show pending commands and recent activity

Usage: /status

Response: Command status dashboard showing:

  • Currently pending SSM commands
  • Recent command execution history
  • Command success/failure rates
  • AWS Console links for detailed monitoring
  • Elapsed time for running operations

Status command displaying pending operations and recent activity


/history

Purpose: View detailed command execution history

Usage: /history

Response: Comprehensive command history with:

  • Past command executions
  • Success/failure status
  • Detailed results and outputs
  • Timestamp and user information
  • Filtering and search capabilities

History command showing detailed execution history and results


/token-usage

Purpose: Monitor Bedrock AI token usage and costs

Usage: /token-usage

Response: Token usage analytics including:

  • Current billing period usage
  • Token consumption trends
  • Cost breakdown by operation type
  • Usage limits and quotas
  • Optimization recommendations

Token usage command showing AI usage analytics and cost breakdown

Token usage command showing AI usage analytics and cost breakdown with details


Command Tips

Quick Access

  • Type / in Teams to see all available slash commands
  • Commands are case-insensitive: /help = /HELP = /Help
  • Use Tab completion in Teams for faster command entry

Combining with Natural Language

You can follow slash commands with natural language for more specific requests:

/health show me only instances with high CPU usage

/instances filter by production environment

/help with cost optimization

Command Parameters

  • Most commands work without parameters for overview information
  • Add instance IDs for specific instance details: /health i-1234567890abcdef0
  • Use /help [command] for detailed usage instructions
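The rules above (case-insensitive names, optional arguments after the command) can be illustrated with a small parser. This is a hypothetical Python sketch for explanation only; the source does not describe how SmartOps parses commands.

```python
def parse_slash_command(message):
    """Split a slash command into a lowercase name and its argument text.

    Returns None when the message is not a slash command.
    """
    text = message.strip()
    if not text.startswith("/"):
        return None
    name, _, rest = text[1:].partition(" ")
    return {"command": name.lower(), "argument": rest.strip()}


print(parse_slash_command("/HEALTH i-1234567890abcdef0"))
print(parse_slash_command("show me my instances"))
```

Anything that does not start with a slash would fall through to natural language handling.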

Command History

  • Use ↑ (up arrow) in Teams to repeat recent commands
  • All commands are logged for audit purposes
  • Interactive cards maintain state for better user experience

Command Comparison

Command                           | Speed  | Detail Level   | Best For
/instances                        | Fast   | Interactive    | Instance management
"show me my instances"            | Slower | Conversational | Analysis & insights
/health                           | Fast   | Dashboard      | Health monitoring
"which instances need attention?" | Slower | AI Analysis    | Troubleshooting
/status                           | Fast   | Current        | Operation tracking

Error Handling

Common Issues

Command not recognized:

Unknown command: /instaces
Did you mean: /instances?
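A "did you mean" suggestion of this kind can be produced with approximate string matching. The sketch below uses Python's standard difflib module as one possible approach; whether SmartOps uses this technique is not stated in the source, and the function name is hypothetical. The command list is taken from this reference.

```python
import difflib

KNOWN_COMMANDS = [
    "/help",
    "/instances",
    "/health",
    "/rightsizing",
    "/status",
    "/history",
    "/token-usage",
]


def suggest_command(typed):
    """Return the closest known command, or None if nothing is similar."""
    matches = difflib.get_close_matches(
        typed.lower(), KNOWN_COMMANDS, n=1, cutoff=0.6
    )
    return matches[0] if matches else None


print(suggest_command("/instaces"))
```

The cutoff of 0.6 is difflib's default similarity threshold; raising it makes suggestions more conservative.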

Missing permissions:

❌ Insufficient AWS permissions for this operation
Contact your administrator to review IAM policies

Service unavailable:

⚠️ AWS services temporarily unavailable
Try again in a few moments or use /status for details

Recovery Steps

  1. Check spelling - Commands must be exact
  2. Verify AWS permissions - Commands require proper IAM roles
  3. Try /status - Check if services are operational
  4. Use /help - See all available commands

Next Steps

Quick Start

Try these commands right now in Teams:

  1. /help - See what’s available
  2. /instances - View your EC2 instances with interactive controls
  3. /health - Check instance health dashboard
  4. /rightsizing - Discover cost optimization opportunities
  5. "show me instances that need attention" - Try natural language

Advanced Usage

Command Workflows

Combine slash commands for powerful workflows:

  1. /instances → Click instance → View health details
  2. /health → Identify issues → Use natural language for troubleshooting
  3. /rightsizing → Review recommendations → Ask for implementation help
  4. /status → Monitor ongoing operations → /history for detailed results

Interactive Features

  • Action Buttons: Most commands include interactive buttons for common actions
  • Context Preservation: Commands remember your selections for follow-up questions
  • Real-time Updates: Health and status information refreshes automatically
  • Multi-language Support: Commands adapt to your Teams language preference

4.2 - Command Examples & Usage

Detailed examples of all SmartOps commands with natural language variations and expected responses for EC2 management in Teams.

Command Categories

Instance Management

List Instances

Shows all EC2 instances with current status and basic metrics.

Natural Language Examples:

  • “What instances do I have?”
  • “Show me all EC2 instances”
  • “List my servers”

Direct Command: @Ohlala SmartOps list instances

Response Format:

📊 EC2 Instance Summary
Found 5 instances in us-east-1

✅ web-server-01 (i-0abc123def)
   Type: t3.medium | State: running
   CPU: 45% | Memory: 62% | Disk: 38%

⚠️ database-01 (i-0def456ghi)
   Type: m5.large | State: running
   CPU: 78% | Memory: 85% | Disk: 72%

[... more instances ...]

Get Instance Details

Detailed information about a specific instance.

Natural Language Examples:

  • “Tell me about instance i-0abc123def”
  • “Show details for web-server-01”
  • “What’s the configuration of my database server?”

Direct Command: @Ohlala SmartOps describe instance <instance-id>

Response Format:

📋 Instance Details: web-server-01

Instance ID: i-0abc123def
Type: t3.medium (2 vCPU, 4 GB RAM)
State: running (since 2024-03-15 10:30 UTC)
Platform: Amazon Linux 2
AZ: us-east-1a
Private IP: 10.0.1.45
Public IP: 54.123.45.67

Tags:
- Name: web-server-01
- Environment: production
- Team: platform

Monitoring:
- CPU: 45% (avg last hour)
- Memory: 62% (current)
- Network In: 125 MB/hour
- Network Out: 450 MB/hour

Health Monitoring

Health Report

Comprehensive health status of all instances.

Natural Language Examples:

  • “Show me the health report”
  • “How healthy are my instances?”
  • “Give me a status update”

Direct Command: @Ohlala SmartOps health report

Response Format:

🏥 Infrastructure Health Report
Generated: 2024-03-20 14:30 UTC

Overall Health: ⚠️ ATTENTION NEEDED

Summary:
✅ Healthy: 12 instances
⚠️ Warning: 3 instances
❌ Critical: 1 instance

Issues Requiring Attention:

❌ CRITICAL: app-server-03
   - CPU: 95% (sustained for 30 min)
   - Action: Consider scaling or investigating process

⚠️ WARNING: database-01
   - Disk: 85% full
   - Action: Clean up logs or expand storage

⚠️ WARNING: web-cache-02
   - Memory: 88% utilized
   - Action: Monitor for OOM issues

📈 Trends:
- CPU usage up 15% from yesterday
- 2 new instances added this week
- Cost trending 8% over budget

Instance Health Check

Check the health of a specific instance.

Natural Language Examples:

  • “Is web-server-01 healthy?”
  • “Check the health of i-0abc123def”
  • “How is my database server doing?”

Direct Command: @Ohlala SmartOps check health <instance-id>

Cost Optimization

Cost Analysis

Analyze EC2 costs and identify savings opportunities.

Natural Language Examples:

  • “Analyze my EC2 costs”
  • “Where can I save money?”
  • “Show me cost optimization opportunities”

Direct Command: @Ohlala SmartOps cost analysis

Response Format:

💰 EC2 Cost Analysis Report
Period: Last 30 days

Current Spending:
- Total: $3,456.78
- On-Demand: $2,890.45 (84%)
- Reserved: $566.33 (16%)
- Spot: $0.00 (0%)

Top Recommendations:

1. 🎯 Right-size Overprovisioned Instances
   Potential Savings: $456/month (13%)

   - web-server-01: t3.medium → t3.small
     Current: 15% CPU avg → Save $28/month

   - test-server-02: m5.xlarge → m5.large
     Current: 8% CPU avg → Save $95/month

2. 💼 Purchase Reserved Instances
   Potential Savings: $890/month (26%)

   - 5 instances running 24/7
   - Recommend 1-year no upfront RIs

3. 🌙 Implement Schedule-Based Scaling
   Potential Savings: $234/month (7%)

   - Dev/test instances can be stopped nights/weekends
   - 10 instances identified

Total Potential Savings: $1,580/month (46%)
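The percentages in the sample report follow from dividing each saving by the total spend. The short Python sketch below reproduces that arithmetic from the figures shown, assuming the 30-day total is treated as the monthly baseline; the variable and function names are hypothetical.

```python
TOTAL_MONTHLY_SPEND = 3456.78  # "Total" from the sample report

SAVINGS = {
    "rightsizing": 456,
    "reserved_instances": 890,
    "scheduling": 234,
}


def percent_of_spend(amount, total=TOTAL_MONTHLY_SPEND):
    """Express a saving as a whole-number percentage of total spend."""
    return round(amount / total * 100)


total_savings = sum(SAVINGS.values())
for name, amount in SAVINGS.items():
    print(f"{name}: ${amount}/month ({percent_of_spend(amount)}%)")
print(f"total: ${total_savings}/month ({percent_of_spend(total_savings)}%)")
```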

Rightsizing Recommendations

Get specific rightsizing suggestions.

Natural Language Examples:

  • “Which instances should I rightsize?”
  • “Show me oversized instances”
  • “Find underutilized servers”

Direct Command: @Ohlala SmartOps rightsizing recommendations

Troubleshooting

Troubleshoot Instance

AI-guided troubleshooting for instance issues.

Natural Language Examples:

  • “My web server is slow”
  • “Help me troubleshoot i-0abc123def”
  • “Database connections are timing out”

Direct Command: @Ohlala SmartOps troubleshoot <instance-id>

Interactive Response:

🔧 Troubleshooting Assistant

I'll help you troubleshoot web-server-01. Let me gather some information...

Current Status:
- Instance is running
- CPU: 45% (normal)
- Memory: 92% (HIGH)
- Disk I/O: Normal
- Network: Normal

⚠️ High memory usage detected!

Let me check what's consuming memory...
[Running diagnostic commands via SSM]

Top Memory Consumers:
1. java process: 2.8 GB (70%)
2. mysql: 650 MB (16%)
3. nginx: 120 MB (3%)

Recommendations:
1. Immediate: Restart the Java application
2. Short-term: Increase instance type to t3.large
3. Long-term: Investigate memory leak in application

Would you like me to:
A) Restart the Java application now
B) Show application logs
C) Create a snapshot before changes

Find Issues

Identify instances with problems.

Natural Language Examples:

  • “Which instances need attention?”
  • “Show me problematic servers”
  • “Find unhealthy instances”

Direct Command: @Ohlala SmartOps find issues

Remote Execution

Execute Command

Run commands on instances via SSM.

Natural Language Examples:

  • “Run ‘df -h’ on web-server-01”
  • “Check disk space on all instances”
  • “Restart nginx on the web servers”

Direct Command: @Ohlala SmartOps execute "<command>" on <instance-id>

Safety Features:

  • Confirmation required for all SSM commands
  • Commands run with limited privileges
  • Audit trail maintained
  • Output limited to 24,000 characters

Response Format:

🔨 Command Execution Request

Target: web-server-01 (i-0abc123def)
Command: systemctl restart nginx

⚠️ This command will restart the nginx service.
This may cause brief downtime.

Type 'yes' to confirm execution

[After confirmation]

✅ Command Executed Successfully

Output:
nginx.service - The nginx HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled)
   Active: active (running) since Wed 2024-03-20 15:45:32 UTC

Execution Time: 1.2 seconds
Command ID: abc-def-ghi-123
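The confirmation step and the output limit can be pictured as a thin wrapper around command execution. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, the runner callable stands in for the real SSM call, and accepting "yes" in any letter case is an assumption rather than documented behavior.

```python
MAX_OUTPUT_CHARS = 24_000  # output limit listed under Safety Features


def is_confirmed(reply):
    """Treat only an explicit 'yes' as approval (case handling is assumed)."""
    return reply.strip().lower() == "yes"


def truncate_output(output, limit=MAX_OUTPUT_CHARS):
    """Cut command output down to the documented character limit."""
    return output if len(output) <= limit else output[:limit]


def run_with_approval(command, reply, runner):
    """Run `command` through `runner` only after explicit confirmation."""
    if not is_confirmed(reply):
        return "Cancelled: command was not confirmed."
    return truncate_output(runner(command))


def fake_runner(command):
    """Stand-in for the real SSM execution, for demonstration only."""
    return f"ran: {command}"


print(run_with_approval("systemctl restart nginx", "yes", fake_runner))
print(run_with_approval("systemctl restart nginx", "no", fake_runner))
```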

Instance Control

Start Instance

Start stopped instances.

Natural Language Examples:

  • “Start web-server-01”
  • “Boot up the test environment”
  • “Turn on i-0abc123def”

Direct Command: @Ohlala SmartOps start instance <instance-id>

Stop Instance

Stop running instances.

Natural Language Examples:

  • “Stop the dev server”
  • “Shut down test-instance-02”
  • “Turn off i-0abc123def”

Direct Command: @Ohlala SmartOps stop instance <instance-id>

Safety Confirmation:

⚠️ Stop Instance Confirmation

You're about to stop: prod-database-01
Environment: production
Current connections: 45

This action will:
- Terminate all active connections
- Stop the instance (data on instance store volumes will be lost)
- Incur no further instance-hour charges (attached EBS volumes are still billed)

Type 'yes' to confirm stopping this instance

Reboot Instance

Restart instances gracefully.

Natural Language Examples:

  • “Reboot web-server-01”
  • “Restart my application server”
  • “Perform a soft reset on i-0abc123def”

Direct Command: @Ohlala SmartOps reboot instance <instance-id>

Scheduling

Schedule Report

Set up automated daily reports.

Natural Language Examples:

  • “Send me a daily health report at 9 AM”
  • “Schedule cost reports every Monday”
  • “Set up morning status updates”

Direct Command: @Ohlala SmartOps schedule daily report at <time>

Help and Information

Get Help

Show available commands and usage.

Natural Language Examples:

  • “Help”
  • “What can you do?”
  • “Show me available commands”

Direct Command: @Ohlala SmartOps help

Response Format:

🤖 Ohlala SmartOps - Command Reference

I understand natural language! Just describe what you need.
You can also use these commands:

📊 Monitoring
• list instances - Show all EC2 instances
• health report - Comprehensive health status
• check health <id> - Check specific instance

💰 Cost Optimization
• cost analysis - Analyze spending
• rightsizing recommendations - Find savings

🔧 Troubleshooting
• troubleshoot <id> - AI-guided diagnostics
• find issues - Identify problems

🔨 Remote Execution
• execute "<command>" on <id> - Run via SSM

⚙️ Instance Control
• start/stop/reboot instance <id>

📅 Scheduling
• schedule daily report at <time>

💡 Tips:
- Use instance names or IDs
- Ask follow-up questions
- Natural language works best!

Need more help? Visit docs.ohlala.cloud

Advanced Features

Bulk Operations

Execute commands across multiple instances:

@Ohlala SmartOps execute "sudo yum update -y" on tag:Environment=dev

Filtering

Filter instances by various criteria:

@Ohlala SmartOps list instances where cpu > 80%

@Ohlala SmartOps find instances tagged Environment=production

Chaining Commands

Combine multiple operations:

@Ohlala SmartOps stop all dev instances then create ami backups

4.3 - Natural Language Processing

Learn how SmartOps understands context and intent through natural language processing, fuzzy matching, and conversational AI.

Natural Language Processing Features

SmartOps uses Amazon Bedrock’s Claude AI to understand context and intent. Examples:

Context Awareness

User: "Show me expensive instances"
Bot: [Lists instances sorted by cost]

User: "Which of those can be rightsized?"
Bot: [Understands "those" refers to expensive instances]

Intent Recognition

User: "My website is down"
Bot: "I'll help troubleshoot. Let me check your web servers..."
[Automatically identifies web-tagged instances and checks health]

Fuzzy Matching

User: "Check the databse server"
Bot: "Checking database-server-01..."
[Handles typos and variations]
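
For intuition, typo-tolerant name matching can be approximated with the Python standard library. This is a minimal sketch, not how SmartOps resolves names (that is handled by the AI model); the function names, the normalization rule, and the 0.6 cutoff are all assumptions.

```python
from difflib import get_close_matches
from typing import Optional, Sequence


def normalize(name: str) -> str:
    """Lowercase and treat spaces/underscores as hyphens (assumed convention)."""
    return name.strip().lower().replace(" ", "-").replace("_", "-")


def resolve_instance_name(
    phrase: str, known_names: Sequence[str], cutoff: float = 0.6
) -> Optional[str]:
    """Return the known instance name closest to `phrase`, or None if nothing is close."""
    by_normalized = {normalize(n): n for n in known_names}
    matches = get_close_matches(
        normalize(phrase), list(by_normalized), n=1, cutoff=cutoff
    )
    return by_normalized[matches[0]] if matches else None
```

With the names `web-server-01`, `database-server-01`, and `app-server-01`, the phrase "databse server" resolves to `database-server-01`, while an unrelated string returns `None`.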

How SmartOps Understands You

1. Intent Classification

SmartOps recognizes different types of requests:

Information Requests:

  • “What instances do I have?”
  • “Show me the current status”
  • “How much am I spending?”

Action Requests:

  • “Restart the web server”
  • “Stop the test instances”
  • “Update all development servers”

Troubleshooting Requests:

  • “My application is slow”
  • “Why is the database not responding?”
  • “Help me fix this error”

2. Context Tracking

SmartOps remembers conversation context:

Example Conversation:

User: "List my production instances"
Bot: [Shows 5 production instances]

User: "Which one has the highest CPU?"
Bot: "Among your production instances, web-prod-02 has the highest CPU at 78%"

User: "Show me more details about that one"
Bot: [Shows detailed info for web-prod-02]

User: "Can you help me optimize it?"
Bot: "I can help optimize web-prod-02. Let me analyze its usage patterns..."

3. Entity Recognition

SmartOps identifies specific entities in your requests:

Instance References:

  • Instance IDs: “i-0abc123def”
  • Instance names: “web-server-01”
  • Tags: “all production instances”
  • Roles: “database servers”, “web servers”

Time References:

  • “last week”, “yesterday”, “this month”
  • “since 9 AM”, “in the past hour”

Metric References:

  • “high CPU”, “low memory”, “disk space”
  • “expensive instances”, “underutilized servers”

Conversation Patterns

Follow-up Questions

After any response, you can ask natural follow-ups:

After Health Report:

  • “What’s causing the high CPU on server-01?”
  • “How can I fix the memory issue?”
  • “Should I be worried about that warning?”

After Cost Analysis:

  • “How much would rightsizing save me?”
  • “Which instances should I prioritize?”
  • “Can you help me implement those recommendations?”

After Instance List:

  • “Which ones need updates?”
  • “Show me only the problematic ones”
  • “What would you recommend for optimization?”

Conversational Shortcuts

SmartOps understands casual language:

Informal Requests:

  • “What’s up with my servers?” → Health status check
  • “How are things looking?” → Overall status report
  • “Any issues I should know about?” → Problem identification
  • “Can you help me save money?” → Cost optimization analysis

Implicit Context:

  • “Restart it” (after discussing a specific instance)
  • “Show me more” (after any report or list)
  • “What do you think?” (asking for AI recommendations)

Advanced Language Features

Comparative Queries

User: "Which instances use more CPU than database-01?"
User: "Show me servers that cost more than $100/month"
User: "Find instances smaller than t3.medium"

Temporal Queries

User: "What happened to web-server-01 yesterday?"
User: "Show me cost trends over the last 3 months"
User: "Which instances were restarted this week?"

Conditional Logic

User: "If the CPU is over 80%, show me the top processes"
User: "When database connections exceed 50, check memory usage"
User: "Alert me if any production instance goes down"

Aggregation Requests

User: "Total cost of all development instances"
User: "Average CPU across web servers"
User: "Count of instances by environment"

Handling Ambiguity

When your request is unclear, SmartOps will ask for clarification:

Example Ambiguous Requests:

User: "Restart the server"
Bot: "I found 5 servers. Which one would you like me to restart?
- web-server-01 (production)
- web-server-02 (production)
- app-server-01 (staging)
- db-server-01 (production)
- test-server-01 (development)"

User: "Stop the expensive instances"
Bot: "I found 3 instances with costs over $200/month. Would you like to:
A) See details about these instances first
B) Stop all 3 instances
C) Stop only non-production instances"

Error Recovery

SmartOps handles common communication issues:

Typos and Misspellings

User: "chck the helth of web-srvr-01"
Bot: "Checking the health of web-server-01..."
[Automatically corrects and understands intent]

Incomplete Requests

User: "show instances"
Bot: "Here are your EC2 instances. Would you like me to:
- Show more details about any specific instance
- Filter by environment or status
- Include cost or performance information"

Mixed Languages (Limited)

User: "montrer les instances"
Bot: "I understand you want to see instances. Here are your EC2 instances..."
[Basic French recognition, responds in English]

Learning from Context

SmartOps learns patterns within a conversation:

Preference Learning

User: "Show instances without IDs"
Bot: [Remembers preference for this conversation]

User: "Now show the database servers"
Bot: [Lists database servers without IDs, as previously requested]

Domain Knowledge

SmartOps understands infrastructure terminology:

Technical Terms:

  • “Load balancer”, “auto-scaling group”, “availability zone”
  • “RDS”, “EBS”, “VPC”, “security group”
  • “Production”, “staging”, “development” environments

Common Patterns:

  • “Scale up/down”, “failover”, “backup”
  • “Peak hours”, “maintenance window”, “scheduled downtime”
  • “Performance bottleneck”, “capacity planning”

Best Practices for Natural Communication

1. Be Conversational

  • Use natural language like you’re talking to a colleague
  • Don’t worry about exact command syntax
  • Feel free to ask follow-up questions

2. Provide Context

  • Mention environment: “production servers”, “dev instances”
  • Include timeframes: “since yesterday”, “this week”
  • Specify scope: “all web servers”, “just the database”

3. Ask for Clarification

  • If responses aren’t what you expected, ask for clarification
  • Request more details: “Can you explain that further?”
  • Ask for alternatives: “What other options do I have?”

4. Build on Previous Responses

  • Use “that instance”, “those servers”, “the expensive ones”
  • Ask follow-ups: “What would you recommend?”, “Should I be concerned?”

Integration with Commands

Natural language seamlessly integrates with command execution:

User: "The production web servers seem slow, can you help?"

Bot: "I'll investigate your production web servers. Let me check their current status...

Found 3 production web servers:
- web-prod-01: CPU 45%, Memory 78%
- web-prod-02: CPU 89%, Memory 92% ⚠️
- web-prod-03: CPU 52%, Memory 65%

web-prod-02 appears to have high resource usage. Would you like me to:
A) Show detailed process information
B) Restart the web services
C) Investigate recent logs
D) Recommend instance scaling"

5 - Troubleshooting Guide

Solutions for common issues with Ohlala SmartOps deployment, Teams integration, and daily operations. Find quick fixes and detailed debugging steps.

Quick Diagnostics

Run this checklist to identify common issues:

  1. Check Service Health

    curl https://your-api-gateway-url/prod-{StackName}/health
    

    Expected: {"status": "healthy"}

  2. Verify CloudFormation Stack

    • AWS Console → CloudFormation
    • Stack status: CREATE_COMPLETE or UPDATE_COMPLETE
  3. Check ECS Service

    • AWS Console → ECS → Clusters
    • Service should have 1 running task
  4. Review Recent Logs

    • AWS Console → CloudWatch → Log Groups
    • Check /aws/ecs/ohlala-smartops-{StackName}
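
If you script the first check, the response body can be interpreted as follows. This is a sketch that assumes the health endpoint returns exactly the JSON shown above; the function name is hypothetical, and fetching the URL is left to whatever HTTP client you already use.

```python
import json


def is_healthy(response_body: str) -> bool:
    """Return True only when the body is JSON of the form {"status": "healthy"}."""
    try:
        payload = json.loads(response_body)
    except json.JSONDecodeError:
        # An HTML error page from a gateway or proxy lands here.
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"
```

Treating a non-JSON body as unhealthy is deliberate: a gateway error page usually means the request never reached the container.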

CloudWatch Logs Troubleshooting

Quick Log Analysis

Most issues can be diagnosed by checking CloudWatch logs for ERROR messages in the ECS task logs.

1. Access ECS Task Logs

Via AWS Console:

  1. Go to CloudWatch → Log Groups
  2. Find /aws/ecs/ohlala-smartops-{your-stack-name}
  3. Click on the most recent log stream
  4. Search for “ERROR” using Ctrl+F
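
If you export a log stream to a file, the same search can be scripted. The sketch below only filters lines that contain "ERROR"; the function name and the sample log lines in the test are hypothetical, and the real log format may differ.

```python
from typing import Iterable, List, Tuple


def find_errors(lines: Iterable[str], marker: str = "ERROR") -> List[Tuple[int, str]]:
    """Return (1-based line number, text) for every line containing `marker`."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if marker in line
    ]
```

Keeping the line numbers makes it easy to jump back to the surrounding entries in the CloudWatch console.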

Bot Not Responding

Symptoms

  • No response when messaging the bot in Teams
  • Bot appears offline
  • Commands time out without a response

Solution 1: Verify Webhook Configuration

  1. Check Webhook URL

    # Get from CloudFormation outputs
    aws cloudformation describe-stacks \
      --stack-name your-stack-name \
      --query "Stacks[0].Outputs[?OutputKey=='TeamsWebhookURL'].OutputValue" \
      --output text
    
  2. Update in Azure Bot

    • Azure Portal → Your Bot → Configuration
    • Messaging endpoint must match CloudFormation output
    • Must end with /api/messages
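
The endpoint rules above can be checked mechanically before you paste the URL into Azure. This is a minimal sketch; the function name and the wording of the messages are hypothetical, and the `https` requirement is taken from the TeamsWebhookURL format in the Deployment Reference.

```python
from typing import List
from urllib.parse import urlparse


def endpoint_problems(configured: str, stack_output: str) -> List[str]:
    """Compare the Azure messaging endpoint with the TeamsWebhookURL stack output."""
    problems = []
    parsed = urlparse(configured)
    if parsed.scheme != "https":
        problems.append("endpoint must use https")
    if not parsed.path.endswith("/api/messages"):
        problems.append("endpoint must end with /api/messages")
    if configured.rstrip("/") != stack_output.rstrip("/"):
        problems.append("endpoint does not match the TeamsWebhookURL stack output")
    return problems
```

An empty list means the configured endpoint passes all three checks.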

Solution 2: Check Authentication

  1. Verify Secrets in AWS

    aws secretsmanager get-secret-value \
      --secret-id ohlala-smartops-teams-{StackName} \
      --query SecretString \
      --output json
    
  2. Validate Credentials Match Azure

    • App ID must match Azure Bot’s App ID
    • Password must be valid and not expired
    • Tenant ID must match your Azure AD
  3. Check Lambda Authorizer Logs

    • CloudWatch → Log Groups → /aws/lambda/ohlala-authorizer-{StackName}
    • Look for “Authorization failed” messages

Solution 3: Teams App Issues

  1. Re-upload Teams Package

    • Remove existing app from Teams
    • Download fresh package
    • Update manifest.json with correct bot ID
    • Re-upload to Teams
    • You may need to manually bump the version in manifest.json to force Teams to accept the update
  2. Check Teams Policies

    • Teams Admin Center → Teams apps → Permission policies
    • Ensure custom apps are allowed
    • Check user has permission to use bots

Deployment Failures

CloudFormation Stack Failed

Error: “CREATE_FAILED - Resource handler returned message: ‘The specified subnet does not exist’”

Solution:

# For Existing VPC mode, verify subnet IDs
aws ec2 describe-subnets \
  --subnet-ids subnet-xxxxx \
  --region your-region

Error: “CREATE_FAILED - IAM role already exists”

Solution:

# Delete the leftover roles (IAM requires their policies to be detached first), or redeploy under a different stack name
aws iam delete-role --role-name ec2-management-bot-execution-role
aws iam delete-role --role-name ec2-management-bot-task-role

ECS Task Won’t Start

Error: “ResourceInitializationError: unable to pull secrets or registry auth”

Solution:

  1. Check ECR permissions
  2. Verify marketplace subscription is active
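  3. Check that the execution role has the managed ECS execution policy attached (it covers ECR pulls and CloudWatch Logs; reading secrets additionally needs secretsmanager:GetSecretValue on the secret):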
aws iam attach-role-policy \
  --role-name ec2-management-bot-execution-role \
  --policy-arn arn:aws:iam::aws:policy/AmazonECSTaskExecutionRolePolicy

Bedrock Model Issues

Error: “ValidationException: The provided model identifier is invalid”

This is the #1 most common deployment issue!

Cause: Amazon Bedrock Claude Sonnet 4.5 model access is not enabled or not available in your deployment region.

Solution:

  1. Navigate to Amazon Bedrock Console

    • Go to AWS Console → Amazon Bedrock
    • Ensure you’re in the correct region (same as deployment)
  2. Enable Claude Sonnet 4.5 Model Access

    • Left sidebar → “Model access”
    • Click “Edit” or “Manage model access”
    • Find Anthropic section
    • Enable Claude Sonnet 4.5:
      • Claude Sonnet 4.5 (anthropic.claude-sonnet-4-5-*)
  3. Submit Request

    • Click “Next” → “Submit”
    • Most requests are approved immediately
    • Wait for status to show “Available”
  4. Verify Access

    # List the Claude Sonnet models offered in your deployment region via the AWS CLI
    aws bedrock list-foundation-models \
      --region your-region \
      --query 'modelSummaries[?contains(modelId, `claude-sonnet`)]'
    
  5. Test in Bedrock Playground

    • Bedrock Console → Playgrounds → Chat
    • Select Claude Sonnet 4.5
    • Send test message: “Hello”
    • Should receive response
  6. Restart Application (if already deployed)

    # Force ECS service restart
    aws ecs update-service \
      --cluster your-cluster \
      --service your-service \
      --force-new-deployment
    

Regional Support with Cross-Region Inference Profiles:

Primary Regions (Native Claude Sonnet 4.5 Support):

  • us-east-1 (Recommended)
  • us-west-2
  • eu-west-1
  • eu-central-1
  • ap-northeast-1
  • ap-southeast-2

Supported via Inference Profiles:

  • eu-west-3 (via global/EU inference profiles)
  • eu-west-2 (via global/EU inference profiles)
  • eu-north-1 (via global/EU inference profiles)
  • ap-southeast-1 (via global/APAC inference profiles)
  • ap-northeast-2 (via global/APAC inference profiles)
  • ap-south-1 (via global/APAC inference profiles)
  • ca-central-1 (via global inference profiles)
  • sa-east-1 (via global inference profiles)

How Inference Profiles Work:

  1. Global Profile: global.anthropic.claude-sonnet-4-5-* - Works from any region
  2. Regional Profiles: eu.anthropic.claude-sonnet-4-5-* - Optimized for EU regions
  3. Automatic Fallback: Application automatically tries the best profile for your region

For eu-west-3 Specifically:

  • The application will automatically use global or EU inference profiles
  • No additional configuration required
  • Same Claude Sonnet 4.5 quality and performance
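
The region lists above can be turned into a small lookup to see which profile family applies to a deployment region. This is only a sketch: the function name is hypothetical, and the order in which families are tried (regional before global) is an assumption, not documented application behavior.

```python
from typing import List

# Region lists copied from this page.
NATIVE_REGIONS = {
    "us-east-1", "us-west-2", "eu-west-1",
    "eu-central-1", "ap-northeast-1", "ap-southeast-2",
}
EU_PROFILE_REGIONS = {"eu-west-3", "eu-west-2", "eu-north-1"}
APAC_PROFILE_REGIONS = {"ap-southeast-1", "ap-northeast-2", "ap-south-1"}
GLOBAL_PROFILE_REGIONS = {"ca-central-1", "sa-east-1"}


def profile_candidates(region: str) -> List[str]:
    """Return the profile families to try for `region`, in an assumed order."""
    if region in NATIVE_REGIONS:
        return ["native"]
    if region in EU_PROFILE_REGIONS:
        return ["EU", "global"]
    if region in APAC_PROFILE_REGIONS:
        return ["APAC", "global"]
    if region in GLOBAL_PROFILE_REGIONS:
        return ["global"]
    return []  # region is not listed on this page
```

An empty result means the region is not covered by the lists above, so check Bedrock availability before deploying there.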

Error: “AccessDeniedException: You do not have access to the requested model”

Cause: Model access requested but not yet approved, or using wrong model ID.

Solution:

  1. Check approval status:

    • Bedrock Console → Model access
    • Status should be “Available”, not “Pending”
  2. Wait for approval:

    • Standard models: Usually immediate
    • Advanced models: Up to 24-48 hours
    • Check email for approval notification

Permission Issues

Error: “AccessDeniedException: User is not authorized to perform bedrock:InvokeModel”

Solution:

  1. Add Bedrock permissions to ECS task role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*:*:inference-profile/*claude*"
      ]
    }
  ]
}
  2. Ensure Bedrock is available in your region
  3. Check Service Control Policies (SCPs) aren’t blocking access
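
To sanity-check a policy document like the one above before attaching it, a rough matcher can help. This sketch is a deliberate simplification of IAM evaluation: it ignores Deny statements, conditions, SCPs, and permission boundaries, and the function names are hypothetical. The ARNs in the usage note are made-up examples.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List


def _as_list(value: Any) -> List[str]:
    """IAM allows a single string or a list for Action and Resource."""
    return [value] if isinstance(value, str) else list(value)


def policy_allows(policy: Dict[str, Any], action: str, resource_arn: str) -> bool:
    """Return True if any Allow statement matches both the action and the resource."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = _as_list(statement.get("Action", []))
        resources = _as_list(statement.get("Resource", []))
        # Action names are case-insensitive in IAM; resource ARNs are not.
        action_ok = any(fnmatchcase(action.lower(), a.lower()) for a in actions)
        resource_ok = any(fnmatchcase(resource_arn, r) for r in resources)
        if action_ok and resource_ok:
            return True
    return False


BEDROCK_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
            "Resource": [
                "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
                "arn:aws:bedrock:*:*:inference-profile/*claude*",
            ],
        }
    ],
}
```

For example, `policy_allows(BEDROCK_POLICY, "bedrock:InvokeModel", "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-example")` is true, while the same policy does not allow `ec2:DescribeInstances`. For an authoritative answer, use the IAM policy simulator.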

Error: “UnauthorizedOperation: You are not authorized to perform ec2:DescribeInstances”

Solution:

  1. Update task role policy
  2. Check for SCPs (Service Control Policies) blocking access
  3. Verify cross-account permissions if using multiple accounts

Teams Integration Issues

Bot Shows as Offline

Causes & Solutions:

  1. Azure Bot Channel Not Configured

    • Azure Portal → Bot → Channels
    • Ensure Teams channel is enabled
    • Status should be “Running”
  2. API Gateway Throttling

    • Check CloudWatch metrics for 429 errors
  3. Network Connectivity

    • Verify security groups allow HTTPS outbound
    • Check NAT Gateway is functioning (if used)

Messages Not Formatted Correctly

Issue: Bot responses show raw JSON or markdown

Solution:

  1. Update Teams app manifest version
  2. Ensure bot supports Adaptive Cards:
"supportsFiles": false,
"supportsCalling": false,
"supportsVideo": false

Bot Added but Can’t Use Commands

Issue: Bot visible but commands don’t work

Solution:

  1. Check bot is added to channel properly
  2. Verify @ mentions are working
  3. Test in personal chat first
  4. Check Teams app permissions

Google Chat Integration Issues

Bot Not Responding in Google Chat

Causes & Solutions:

  1. Check Google Chat is Enabled

    • CloudFormation parameter GoogleChatEnabled must be true
    • Verify ECS task has GOOGLE_CHAT_ENABLED=true environment variable
  2. Verify Webhook URL Configuration

    • Go to Google Cloud Console → Chat API Configuration
    • App URL must match CloudFormation output GoogleChatWebhookURL
    • Must end with /api/google-chat
  3. Check Service Account Credentials

    • Verify GoogleChatServiceAccountInfo parameter is correctly formatted
    • JSON must be on a single line
    • Project ID must match the service account’s project

Most Common Cause: Apps not published through Google Workspace Marketplace must explicitly list allowed users.

Solution:

  1. Check Visibility Settings (most likely issue)

    • Go to Google Cloud Console → APIs & Services → Google Chat API → Configuration
    • Scroll to Visibility section
    • Select “Make this Chat app available to specific people and groups”
    • Click Add people or groups
    • Add your email address explicitly (even if you’re the project owner)
    • Click Save
    • Wait 2-5 minutes for propagation
  2. Verify Configuration is Saved

    • Check for a green “Saved” confirmation
    • Refresh the page and verify settings persisted
  3. Try Different Discovery Methods

    • In Google Chat: + New chat → Find apps → search for app name
    • In a Space: Click space name → Apps & integrations → Add apps

Google Chat Authentication Errors

Symptoms: 401/403 errors in CloudWatch logs for /api/google-chat

Solution:

  1. Check Lambda Authorizer Logs

    • CloudWatch → Log Groups → /aws/lambda/ohlala-gc-authorizer-{StackName}
    • Look for specific error messages
  2. Verify Audience URL

    • In Chat API Configuration, Authentication Audience must be set to “App URL”
    • The audience in JWT tokens will match your webhook endpoint
  3. Validate Service Account JSON

    • Ensure JSON is properly escaped when passed to CloudFormation
    • Use jq -c . key.json to convert the key file to a single line
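
Where `jq` is not installed, the same single-line conversion can be done with the Python standard library. This is a small sketch with a hypothetical function name; it re-serializes the JSON, so a malformed key file fails loudly instead of being passed to CloudFormation.

```python
import json


def minify_json(text: str) -> str:
    """Parse JSON text and re-emit it on a single line without extra spaces."""
    return json.dumps(json.loads(text), separators=(",", ":"))
```

Newlines inside string values, such as those in a private key, stay escaped as `\n`, so the result is still one physical line.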

Cards Not Rendering Properly in Google Chat

Issue: Charts or cards don’t display correctly

Solution:

  1. Check QuickChart Container

    • ECS task should have quickchart container running
    • Check container logs for errors
  2. Verify S3 Bucket

    • Chart images are stored in S3 bucket smartops-charts-{StackName}
    • Check bucket permissions and lifecycle rules
  3. Check CloudWatch Logs

    • Search for “chart” or “card” errors in main-bot logs

Google Chat vs Teams Behavior Differences

Known Differences:

  • Message Updates: Google Chat has limited message update support
  • Card Format: Uses Google Card JSON instead of Adaptive Cards
  • Charts: Uploaded to S3 as images (Teams renders inline)
  • @mentions: Required in spaces, optional in direct messages

Getting Support

Before Contacting Support

  1. Collect diagnostic information:

    • Stack name and region
    • Error messages (exact text)
    • CloudWatch logs (last 100 lines)
    • Time of occurrence
  2. Try quick fixes:

    • Restart ECS service
    • Clear Teams cache
    • Re-authenticate bot

Contact Support

Email: support@ohlala.cloud

Include:

  • AWS Account ID
  • Stack Name
  • Error Description
  • Steps to Reproduce
  • Diagnostic Logs

Response Time: 1 business day

6 - Deployment Reference

CloudFormation template parameters and advanced deployment configuration options

Parameter Overview

The template supports two deployment modes:

  • NewVPC: Creates complete network infrastructure (recommended)
  • ExistingVPC: Integrates with your existing VPC

Required Parameters

Deployment Configuration

DeploymentMode

  • Type: String
  • Default: NewVPC
  • Allowed Values: NewVPC, ExistingVPC
  • Description: Choose to create a new VPC or use existing VPC infrastructure

ContainerImageTag

  • Type: String
  • Default: v1.1.0
  • Description: Container image tag version (e.g., v1.0.0, v1.1.0)
  • Example: v1.1.0

Microsoft Teams Configuration

MicrosoftAppId

  • Type: String
  • Description: Microsoft Teams Bot App ID
  • Format: GUID format
  • Example: 12345678-90ab-cdef-1234-567890abcdef
  • Where to find: Azure Portal → Bot Resource → Configuration
  • NoEcho: false

MicrosoftAppPassword

  • Type: String
  • Description: Microsoft Teams Bot App Password
  • Format: String with special characters
  • Example: abcDEF123~hijKLM456-nopQRS789.tuvWXY012
  • Where to find: Created during bot registration (save immediately!)
  • NoEcho: true (hidden in console)

MicrosoftAppTenantId

  • Type: String
  • Description: Microsoft Teams Tenant ID
  • Format: GUID format
  • Example: 87654321-abcd-ef01-4321-0987654321fe
  • Where to find: Azure Portal → Azure Active Directory → Overview
  • NoEcho: false

Google Chat Configuration (Optional)

These parameters are optional and only needed if you want to enable Google Chat integration.

GoogleChatEnabled

  • Type: String
  • Default: "false"
  • Allowed Values: "true", "false"
  • Description: Enable Google Chat integration
  • When to enable: If your organization uses Google Workspace and Google Chat

GoogleChatProjectId

  • Type: String
  • Default: "" (empty)
  • Description: Google Cloud Project ID for Google Chat integration
  • Format: lowercase letters, numbers, and hyphens
  • Example: my-company-smartops-123456
  • Where to find: Google Cloud Console → Project selector → Project ID
  • NoEcho: false

GoogleChatServiceAccountInfo

  • Type: String
  • Default: "" (empty)
  • Description: Google Chat service account JSON key (single line)
  • Format: JSON string on a single line
  • Where to find: GCP Console → IAM → Service Accounts → Keys → Create JSON key
  • NoEcho: true (hidden in console)
  • Important: Must be minified to a single line. Use jq -c . key.json on Linux/Mac

AI Features (Optional)

EnableAIInsights

  • Type: String
  • Default: "false"
  • Allowed Values: "true", "false"
  • Description: Enable AI-powered insights in instance reports (uses Claude Sonnet 4.5 via Bedrock)
  • When to enable: For enhanced analysis and recommendations in reports

Existing VPC Parameters

These parameters are required only when DeploymentMode: ExistingVPC:

ExistingVPCId

  • Type: String
  • Default: "" (empty)
  • Description: ID of existing VPC (e.g., vpc-12345678)
  • Pattern: ^(vpc-[0-9a-f]{8,17})?$
  • Example: vpc-0123456789abcdef0
  • Constraint: Must be a valid VPC ID or empty for NewVPC mode

ExistingPrivateSubnet1Id

  • Type: String
  • Default: "" (empty)
  • Description: ID of first private subnet (e.g., subnet-12345678)
  • Pattern: ^(subnet-[0-9a-f]{8,17})?$
  • Example: subnet-0123456789abcdef0
  • Requirement: Must be in different AZ from ExistingPrivateSubnet2Id

ExistingPrivateSubnet2Id

  • Type: String
  • Default: "" (empty)
  • Description: ID of second private subnet in different AZ (e.g., subnet-87654321)
  • Pattern: ^(subnet-[0-9a-f]{8,17})?$
  • Example: subnet-0fedcba9876543210
  • Requirement: Must be in different AZ from ExistingPrivateSubnet1Id

ExistingPublicSubnet1Id

  • Type: String
  • Default: "" (empty)
  • Description: ID of first public subnet (e.g., subnet-abcd1234)
  • Pattern: ^(subnet-[0-9a-f]{8,17})?$
  • Example: subnet-0abcd1234ef015678
  • Requirement: Must be in different AZ from ExistingPublicSubnet2Id

ExistingPublicSubnet2Id

  • Type: String
  • Default: "" (empty)
  • Description: ID of second public subnet in different AZ (e.g., subnet-dcba4321)
  • Pattern: ^(subnet-[0-9a-f]{8,17})?$
  • Example: subnet-0dcba432110fe8765
  • Requirement: Must be in different AZ from ExistingPublicSubnet1Id

NewVPC Network Configuration

These parameters are optional and only used when DeploymentMode: NewVPC:

VPCCIDR

  • Type: String
  • Default: 10.0.0.0/16
  • Description: CIDR block for the VPC
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.0.0/16

PublicSubnet1CIDR

  • Type: String
  • Default: 10.0.1.0/24
  • Description: CIDR block for public subnet 1
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.1.0/24

PublicSubnet2CIDR

  • Type: String
  • Default: 10.0.2.0/24
  • Description: CIDR block for public subnet 2
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.2.0/24

PrivateSubnet1CIDR

  • Type: String
  • Default: 10.0.10.0/24
  • Description: CIDR block for private subnet 1
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.10.0/24

PrivateSubnet2CIDR

  • Type: String
  • Default: 10.0.11.0/24
  • Description: CIDR block for private subnet 2
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.11.0/24

EnableNATGateway

  • Type: String
  • Default: "true"
  • Allowed Values: "true", "false"
  • Description: Enable NAT Gateway for private subnets
  • Cost Impact: NAT Gateway adds ~$32/month
  • Recommendation: Set to "false" for cost savings if outbound internet not needed

Stack Outputs

The template provides these outputs after successful deployment:

APIGatewayEndpoint

  • Description: API Gateway endpoint URL
  • Format: https://{ApiGateway}.execute-api.{Region}.amazonaws.com/prod-{StackName}
  • Usage: Base URL for API access

TeamsWebhookURL

  • Description: URL to configure in Microsoft Teams Bot Framework
  • Format: https://{ApiGateway}.execute-api.{Region}.amazonaws.com/prod-{StackName}/api/messages
  • Usage: Set this as the messaging endpoint in Azure Bot Configuration

GoogleChatWebhookURL

  • Description: URL to configure in Google Chat API Configuration
  • Format: https://{ApiGateway}.execute-api.{Region}.amazonaws.com/prod-{StackName}/api/google-chat
  • Usage: Set this as the App URL in Google Cloud Console → Chat API Configuration

ECSCluster

  • Description: ECS Cluster Name
  • Format: OhlalaSmartOps-Cluster-{StackName}
  • Usage: For monitoring and management

ECSService

  • Description: ECS Service Name
  • Format: OhlalaSmartOps-Service-{StackName}
  • Usage: For monitoring and scaling

VPCId

  • Description: VPC ID (created or existing)
  • Format: vpc-xxxxxxxxx
  • Usage: For reference and additional resource creation
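
The URL formats above share one base, which the following sketch makes explicit. The function name is hypothetical and the API ID, region, and stack name in the test are placeholders; read the real values from your stack outputs rather than constructing them.

```python
from typing import Dict


def stack_urls(api_id: str, region: str, stack_name: str) -> Dict[str, str]:
    """Build the three endpoint outputs from the formats documented above."""
    base = f"https://{api_id}.execute-api.{region}.amazonaws.com/prod-{stack_name}"
    return {
        "APIGatewayEndpoint": base,
        "TeamsWebhookURL": f"{base}/api/messages",
        "GoogleChatWebhookURL": f"{base}/api/google-chat",
    }
```

This is mainly useful for spotting a mismatch between what is configured in Azure or Google Cloud and what the stack actually exposes.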

Deployment Examples

Simple NewVPC Deployment

Parameters:
  DeploymentMode: NewVPC
  ContainerImageTag: v1.1.0
  MicrosoftAppId: "12345678-90ab-cdef-1234-567890abcdef"
  MicrosoftAppPassword: "your-secret-password"
  MicrosoftAppTenantId: "87654321-abcd-ef01-4321-0987654321fe"
  EnableNATGateway: "false"  # Cost optimization

Custom NewVPC with Different CIDR

Parameters:
  DeploymentMode: NewVPC
  VPCCIDR: "172.16.0.0/16"
  PublicSubnet1CIDR: "172.16.1.0/24"
  PublicSubnet2CIDR: "172.16.2.0/24"
  PrivateSubnet1CIDR: "172.16.10.0/24"
  PrivateSubnet2CIDR: "172.16.11.0/24"
  EnableNATGateway: "true"
  # ... Teams parameters

ExistingVPC Deployment

Parameters:
  DeploymentMode: ExistingVPC
  ExistingVPCId: "vpc-0123456789abcdef0"
  ExistingPrivateSubnet1Id: "subnet-0123456789abcdef0"
  ExistingPrivateSubnet2Id: "subnet-0fedcba9876543210"
  ExistingPublicSubnet1Id: "subnet-0abcd1234ef015678"
  ExistingPublicSubnet2Id: "subnet-0dcba432110fe8765"
  # ... Teams parameters

Parameter Validation

The template includes validation rules:

Pattern Validation

  • VPC IDs: Must match vpc- followed by 8-17 hex characters
  • Subnet IDs: Must match subnet- followed by 8-17 hex characters
  • CIDR Blocks: Must be valid IP CIDR format

Logical Validation

  • ExistingVPC mode requires all four subnet IDs
  • Subnets must be in at least 2 different availability zones
  • CIDR blocks must not overlap
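
The CIDR rule can be verified before deployment with Python's `ipaddress` module. This sketch assumes two checks: every subnet lies inside the VPC range, and no two subnets overlap. The function name and message wording are hypothetical.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List


def cidr_problems(vpc_cidr: str, subnet_cidrs: Dict[str, str]) -> List[str]:
    """Report subnets outside the VPC range and pairs of subnets that overlap."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []
    for name, network in subnets.items():
        if not network.subnet_of(vpc):
            problems.append(f"{name} ({network}) is outside the VPC range {vpc}")
    for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} overlaps {name_b}")
    return problems
```

The template defaults (10.0.0.0/16 with subnets 10.0.1.0/24, 10.0.2.0/24, 10.0.10.0/24, and 10.0.11.0/24) produce no problems.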

Cross-Parameter Rules

  • If DeploymentMode: ExistingVPC, all existing VPC parameters are required
  • If DeploymentMode: NewVPC, existing VPC parameters are ignored
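
The cross-parameter rules amount to a simple presence check, sketched below. The function name is hypothetical; the parameter names are the ones documented on this page, and the default mode mirrors the template default of NewVPC.

```python
from typing import Dict, List

EXISTING_VPC_PARAMS = (
    "ExistingVPCId",
    "ExistingPrivateSubnet1Id",
    "ExistingPrivateSubnet2Id",
    "ExistingPublicSubnet1Id",
    "ExistingPublicSubnet2Id",
)


def missing_existing_vpc_params(params: Dict[str, str]) -> List[str]:
    """List the ExistingVPC parameters that are absent or empty.

    In NewVPC mode these parameters are ignored, so nothing is reported.
    """
    if params.get("DeploymentMode", "NewVPC") != "ExistingVPC":
        return []
    return [name for name in EXISTING_VPC_PARAMS if not params.get(name)]
```

Running this against your parameter file before creating the stack catches the "must provide all subnet IDs" error early.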

Cost Impact by Parameter

Parameter | Cost Impact | Notes
EnableNATGateway: "true" | +$32/month | Only for NewVPC mode
EnableNATGateway: "false" | $0 | Saves money but no outbound internet
DeploymentMode: ExistingVPC | $0 | Uses existing network infrastructure
ContainerImageTag | $0 | No cost difference between versions

Common Parameter Errors

Missing Required Parameters

Template validation error: Parameter 'MicrosoftAppId' must have a value

Solution: Provide all required Teams configuration parameters

Invalid VPC ID Format

Parameter validation failed: vpc-invalid does not match pattern

Solution: Use correct format: vpc- + 8-17 hex characters

Subnet AZ Requirements Not Met

The subnet IDs must be in at least two different availability zones

Solution: Choose subnets from different AZs in your region

ExistingVPC Missing Parameters

When using ExistingVPC mode, you must provide all subnet IDs

Solution: Provide all four subnet parameters for ExistingVPC mode
