Documentation

Complete documentation for Ohlala SmartOps - AI-powered EC2 management in Microsoft Teams

Ohlala SmartOps Documentation

Welcome to the complete documentation for Ohlala SmartOps, your AI-powered EC2 management assistant for Microsoft Teams.

Quick Navigation

🚀 Getting Started

Deploy SmartOps in 15 to 20 minutes and start managing EC2 instances with natural conversations.

🏗️ Architecture

System design, performance characteristics, and architectural decisions

☁️ Deployment Guide

AWS Marketplace deployment and CloudFormation configuration

⭐ Features

Complete feature overview and command reference

What is Ohlala SmartOps?

Ohlala SmartOps is an AI-powered infrastructure management solution that brings AWS EC2 operations directly into Microsoft Teams. Using Amazon Bedrock’s Claude AI, it understands natural language commands and provides intelligent responses with visual dashboards.

Core Capabilities

  • 🤖 Conversational AI Interface - Manage infrastructure using plain English
  • 📊 Intelligent Analysis - On-demand health checks and anomaly detection
  • 💰 Cost Optimization - Automated rightsizing recommendations
  • 🔧 Remote Troubleshooting - SSM integration for command execution
  • 📈 Visual Dashboards - On-demand metrics and reports delivered in Teams
  • 🔒 Enterprise Security - Runs in your AWS account with IAM controls

Documentation Overview

Getting Started →

New to SmartOps? Start here for prerequisites, quick setup, and your first commands.

Architecture →

System design, architectural highlights, performance characteristics, and limitations.

Deployment →

CloudFormation parameters reference and advanced deployment configuration.

Teams Setup →

Step-by-step guide for Azure Bot registration and Microsoft Teams app configuration.

Features →

Comprehensive overview of all features including on-demand analysis, cost optimization, and troubleshooting capabilities.

Architecture →

Technical deep-dive into system components, security model, and data flow.

Bot Commands →

Complete command reference and conversation examples.

Troubleshooting →

Solutions for common issues and frequently asked questions.

Support →

Get help, report issues, and stay updated with release notes.

Search Documentation

Use the search box in the top navigation to quickly find specific topics, commands, or troubleshooting guides.

1 - Getting Started with Ohlala SmartOps

Deploy Ohlala SmartOps and begin managing EC2 instances through Microsoft Teams

Getting Started

Welcome to Ohlala SmartOps! Follow this step-by-step guide to deploy and configure your AI-powered AWS infrastructure management assistant.

🎯 What You’ll Accomplish

By completing this guide, you’ll be able to:

  • ✅ Deploy SmartOps in your AWS account
  • ✅ Connect it to Microsoft Teams
  • ✅ Execute your first EC2 management commands
  • ✅ View health reports and cost optimization recommendations

📚 Deployment Steps

The deployment process is organized into clear, manageable steps:

Estimated Time: 15-20 minutes total
Difficulty: Intermediate
Prerequisites: AWS account with admin access, Azure account, Teams workspace

1. Check Prerequisites

Verify you have everything needed before starting

  • AWS account requirements
  • Microsoft Teams setup
  • Required permissions

2. Enable Amazon Bedrock

Configure AI model access in your AWS account

  • Enable Claude Sonnet 4 model
  • Verify regional availability
  • Understand inference profiles

3. Subscribe via AWS Marketplace

Get Ohlala SmartOps from AWS Marketplace

  • Subscribe to the product
  • Download CloudFormation template
  • Understand pricing

4. Register Azure Bot

Create and configure your Teams bot

  • Create Azure Bot resource
  • Generate authentication credentials
  • Configure bot settings

5. Deploy CloudFormation Stack

Launch the infrastructure in AWS

  • Configure stack parameters
  • Deploy resources
  • Monitor deployment progress

6. Connect to Teams

Integrate the bot with Microsoft Teams

  • Configure webhook URL
  • Install Teams app
  • Test the connection

7. Verify & Test

Confirm everything is working

  • Run test commands
  • Check health reports
  • Troubleshoot common issues

| Step | Title | Description |
|------|-------|-------------|
| 1 | Prerequisites | Check requirements and prepare your environment |
| 2 | Bedrock Setup | Enable AI models in your AWS account |
| 3 | AWS Marketplace | Subscribe and download CloudFormation template |
| 4 | Azure Bot | Configure Microsoft Teams bot |
| 5 | CloudFormation | Deploy infrastructure in AWS |
| 6 | Teams Integration | Connect bot to Microsoft Teams |
| 7 | Verify & Test | Confirm deployment and run first commands |

💡 Pro Tips

  • Start Simple: Begin with monitoring features before enabling modifications
  • Budget Awareness: SmartOps includes intelligent cost tracking with $5 milestone warnings
  • Clear Error Messages: The bot provides actionable guidance when issues occur
  • Team Collaboration: Share the assistant with your team for maximum productivity

1.1 - Prerequisites

Check all requirements before deploying Ohlala SmartOps

Prerequisites

Before deploying Ohlala SmartOps, ensure you have all the necessary requirements in place.

✅ AWS Requirements

AWS Account

  • Administrative access to create IAM roles, ECS clusters, and other resources
  • AWS Marketplace subscription capability
  • Billing enabled for AWS services usage

Required AWS Services

Ensure these services are available in your chosen region:

  • Amazon ECS Fargate - Container orchestration
  • Amazon Bedrock - AI model access
  • API Gateway - Teams webhook endpoint
  • CloudFormation - Infrastructure deployment
  • Systems Manager (SSM) - Instance management

Existing Infrastructure

  • At least 1 EC2 instance to manage
  • SSM Agent installed on instances

✅ Microsoft Teams Requirements

Azure Account

  • Azure subscription (Free tier works)
  • Ability to create resources in Azure Portal
  • Azure AD tenant for authentication

Teams Workspace

  • Microsoft Teams installed and configured
  • Admin permissions to install custom apps
  • Teams channel where you want to add the bot

✅ Knowledge Requirements

  • Basic understanding of AWS services
  • Familiarity with CloudFormation
  • Experience with Microsoft Teams administration
  • Understanding of bot concepts

Not Required

  • Programming knowledge
  • Deep AWS expertise
  • Infrastructure as Code experience

📋 Pre-Deployment Checklist

Use this checklist to verify readiness:

  • AWS account with admin access
  • AWS region selected that supports all required services
  • At least one EC2 instance running
  • SSM Agent installed on instances
  • Azure account created
  • Microsoft Teams workspace available
  • Teams admin permissions confirmed

🌍 Regional Availability

For optimal performance, we recommend:

  • US East (N. Virginia) - us-east-1
  • US West (Oregon) - us-west-2
  • Europe (Ireland) - eu-west-1
  • Europe (Frankfurt) - eu-central-1

All Supported Regions

The solution works in any region with ECS Fargate support. Bedrock access is automatically handled through cross-region inference profiles.

⏭️ Next Step

Once you’ve verified all prerequisites:

Continue to Bedrock Setup →

1.2 - Enable Amazon Bedrock

Configure Amazon Bedrock Claude model access for AI capabilities

Enable Amazon Bedrock

🤖 Why Bedrock is Required

Ohlala SmartOps uses Amazon Bedrock with Claude Sonnet 4 to provide:

  • Natural language understanding of your commands
  • Intelligent analysis of infrastructure issues
  • Smart recommendations for optimization
  • Context-aware responses based on your environment

📋 Step-by-Step Setup

1. Open Amazon Bedrock Console

Navigate to the Amazon Bedrock console in your deployment region:

Open Amazon Bedrock Console ↗️

2. Navigate to Model Access

In the left sidebar, click on “Model access”

3. Enable Claude Sonnet 4

  1. Click “Modify model access”
  2. Find and enable:
    • Claude Sonnet 4 (anthropic.claude-sonnet-4-20250514-v1:0)
  3. Click “Submit” to request access

4. Wait for Approval

  • Standard models are usually approved immediately
  • Wait for status to show “Access granted”
  • Refresh the page if needed

Amazon Bedrock Model access page showing enabled Claude Sonnet 4 with “Access granted” status

🌍 Regional Considerations

How Regional Access Works

Ohlala SmartOps automatically handles regional model access:

  1. Detects your deployment region
  2. Uses the optimal inference profile for your location
  3. No additional configuration needed
  4. Same performance across all regions

Primary Regions (Native Support)

Best performance in these regions:

  • US East (N. Virginia) - us-east-1
  • US West (Oregon) - us-west-2
  • Europe (Frankfurt) - eu-central-1
  • Europe (Ireland) - eu-west-1
  • Asia Pacific (Tokyo) - ap-northeast-1
  • Asia Pacific (Sydney) - ap-southeast-2

All Other Regions

Supported via cross-region inference profiles:

  • Europe (Paris) - eu-west-3
  • Europe (London) - eu-west-2
  • Asia Pacific (Singapore) - ap-southeast-1
  • Asia Pacific (Mumbai) - ap-south-1
  • Canada (Central) - ca-central-1
  • South America (São Paulo) - sa-east-1
  • And more…

🔍 Verify Access

Check Model Status

  1. Return to the Model access page
  2. Verify Claude Sonnet 4 shows “Access granted”
  3. Note the model ID for reference: anthropic.claude-sonnet-4-20250514-v1:0

❗ Troubleshooting

Model Not Available

  • Issue: Claude Sonnet 4 not listed
  • Solution: Check you’re in a supported region

Access Denied After Deployment

  • Issue: Bot shows “Model access required” error
  • Solution:
    1. Enable model access as shown above
    2. Wait for “Access granted” status
    3. No need to redeploy - the bot will work automatically

Request Pending Too Long

  • Issue: Status stuck on “Pending”
  • Solution:
    1. Cancel and resubmit the request
    2. Contact AWS Support if issues persist

💡 Good to Know

Model Costs

  • Input: $3.00 per million tokens
  • Output: $15.00 per million tokens
  • Average command uses ~500-2000 tokens
  • Built-in cost tracking alerts you at $5 milestones
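
To make these prices concrete, here is a minimal sketch that estimates the Bedrock cost of one command and how many such commands fit within one $5 milestone. The prices come from the list above; the function names and the 1,500 input / 500 output token split in the usage note are illustrative assumptions, not measured values.

```python
# Prices quoted in the Model Costs list (USD per million tokens).
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00
MILESTONE_USD = 5.00  # the bot warns at each $5 of spend


def command_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated Bedrock cost in USD for one command."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


def commands_per_milestone(input_tokens: int, output_tokens: int) -> int:
    """Return how many commands of this size fit within one $5 milestone."""
    return int(MILESTONE_USD // command_cost(input_tokens, output_tokens))
```

For example, a command that uses 1,500 input tokens and 500 output tokens (a hypothetical split at the upper end of the 500-2000 range) costs about $0.012, so roughly 416 such commands fit within one milestone.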

Alternative Models

Currently, only Claude Sonnet 4 is supported. Support for additional models may be added in future releases.

⏭️ Next Step

Once Bedrock access is enabled:

Continue to AWS Marketplace →

1.3 - AWS Marketplace Subscription

Subscribe to Ohlala SmartOps through AWS Marketplace

AWS Marketplace Subscription

Subscribe to Ohlala SmartOps through AWS Marketplace to get the official deployment package.

📦 What You’ll Get

The AWS Marketplace subscription provides:

  • ✅ Official CloudFormation template
  • ✅ Pre-built container images
  • ✅ Automatic updates
  • ✅ AWS support integration
  • ✅ Simplified billing through AWS

📋 Subscription Steps

1. Navigate to AWS Marketplace

Open the Ohlala SmartOps product page:

AWS Marketplace - Ohlala SmartOps ↗️

2. View Purchase Options

Click “View purchase options” to start the subscription process.

AWS Marketplace product page with “View purchase options” button highlighted

3. Subscribe to the Product

Scroll down and click “Subscribe” to accept the terms.

AWS Marketplace subscription confirmation page with “Subscribe” button highlighted

4. Launch Your Software

Once subscribed, click “Launch your software” to proceed with deployment.

AWS Marketplace subscription confirmation page with “Launch your software” button highlighted

5. Download CloudFormation Template

  1. Select “Amazon ECS” as the launch method
  2. Click the “cloudformation template” link to download
  3. Save the template file locally - you’ll need it in the next steps
  4. Alternatively, you can download the CloudFormation template here: Download Template

AWS Marketplace launch options page with “cloudformation template” link highlighted

🔍 Verify Subscription

Check Subscription Status

  1. Go to AWS Marketplace → Manage subscriptions
  2. Find “Ohlala SmartOps” in your subscriptions
  3. Verify status shows “Active”

❓ Common Questions

Q: Can I cancel anytime?

A: Yes, you can cancel the subscription anytime through AWS Marketplace. You only pay for resources used.

Q: Is there a free trial?

A: Yes, a 30-day free trial is available. After that, standard pricing applies.

Q: Can I deploy multiple instances?

A: Yes, you can deploy multiple stacks using the same subscription. Contact us for volume licensing.

Q: How do updates work?

A: Updates are provided through new versions in Marketplace. You can update at your convenience.

⏭️ Next Step

With your subscription active and template downloaded:

Continue to Azure Bot Setup →

1.4 - Azure Bot Registration

Create and configure your Microsoft Teams bot in Azure

Azure Bot Registration

Set up the Azure Bot that will connect Ohlala SmartOps to Microsoft Teams.

🎯 What You’ll Create

  • Azure Bot resource for Teams integration
  • Authentication credentials (App ID, Password, Tenant ID)
  • Secure communication channel with Teams

📋 Step-by-Step Setup

1. Access Azure Portal

Navigate to Azure Portal and sign in:

https://portal.azure.com ↗️

2. Create Resource Group (Optional)

It’s recommended to create a dedicated resource group:

  1. Search for “Resource groups” in the search bar
  2. Click “Create”
  3. Configure:
    • Subscription: Your Azure subscription
    • Resource group: ohlala-smartops-rg
    • Region: Choose any region (e.g., North Europe)
  4. Click “Review + create” then “Create”

Azure Portal Resource Groups page with “Create” button highlighted

3. Create Azure Bot

Search for Azure Bot

In the Azure Portal search bar, type “Azure Bot” and select it from the marketplace.

Azure Portal search bar with “Azure Bot” typed in

Configure Bot Settings

Fill in the bot configuration:

Bot handle: OhlalaSmartOps
Subscription: Your Azure subscription
Resource group: ohlala-smartops-rg (or your chosen group)
Location: North Europe (or your preferred region)
Pricing tier: F0 (Free)
Type: Single Tenant (default)
Microsoft App ID: Create new

Azure Bot creation page with required fields filled in

Click “Review + create” then “Create”. Deployment takes about 1-2 minutes.

4. Get Authentication Credentials

After deployment completes, go to your bot resource.

  1. Go to Settings → Configuration
  2. You’ll see the Microsoft App ID - copy and save this

Azure Bot Configuration page with “Create new” button highlighted under “Microsoft App ID”

Create App Password

  1. Click “Manage Password” next to the App ID
  2. In the new window, click “New client secret”

Azure Bot Client Secret creation page with “New client secret” button highlighted

  3. Configure the secret:
    • Description: Ohlala SmartOps Bot Secret
    • Expires: Choose duration (24 months recommended)

Azure Bot Client Secret creation page with description and expiration fields filled in

  4. Click “Add”
  5. IMPORTANT: Copy the secret value immediately. Azure does not show it again after you leave the page.

Azure Bot Client Secret creation page with “Copy to clipboard” button highlighted

Get Tenant ID

The Tenant ID is shown in the Azure Portal:

  1. Click on your account menu (top right)
  2. Select “Switch directory”
  3. Your Tenant ID is displayed there

Alternatively:

  1. Go to Azure Active Directory (now named Microsoft Entra ID)
  2. The Tenant ID is on the overview page

📝 Save Your Credentials

You now have three critical values needed for deployment:

| Credential | Where to Find | Example |
|------------|---------------|---------|
| Microsoft App ID | Bot Configuration page | 12345678-1234-1234-1234-123456789012 |
| Microsoft App Password | Client secrets (copied) | AbC123... (long string) |
| Microsoft App Tenant ID | Azure AD or account menu | 87654321-4321-4321-4321-210987654321 |
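
Before pasting these values into CloudFormation, a quick format check can catch copy-and-paste mistakes. The sketch below assumes only what the examples above show: the App ID and Tenant ID are GUIDs, and the password is an opaque string. The function names are illustrative and not part of SmartOps.

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True if value is a canonical 8-4-4-4-12 hexadecimal GUID."""
    try:
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False


def check_credentials(app_id: str, app_password: str, tenant_id: str) -> list[str]:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    if not is_guid(app_id):
        problems.append("Microsoft App ID is not a GUID")
    if not is_guid(tenant_id):
        problems.append("Microsoft App Tenant ID is not a GUID")
    if not app_password.strip():
        problems.append("Microsoft App Password is empty")
    if is_guid(app_password):
        # The client secrets page lists both a Value and a Secret ID (a GUID).
        problems.append("Microsoft App Password looks like a GUID; "
                        "you may have copied the Secret ID instead of the Value")
    return problems
```

This check only validates shape. It cannot confirm that the credentials belong to your bot or that the secret is still valid.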

5. Configure Teams Channel

1. Open Channels Page

In your Azure Bot resource, navigate to Channels in the left sidebar.

Azure Portal Channels page with “Microsoft Teams” icon highlighted

2. Add Microsoft Teams Channel

  1. Click on the Microsoft Teams icon
  2. Accept the terms and click “Agree”

Azure Bot Microsoft Teams channel configuration page with “Agree” button highlighted

  3. Click “Apply”

Azure Bot Microsoft Teams channel configuration page with “Apply” button highlighted

❓ Common Issues

Issue: Free Tier Not Available

Solution: F0 tier is limited to one per subscription. Use S1 (Standard) tier instead (~$0.50/month).

Issue: Can’t Create App Password

Solution: You need appropriate permissions in Azure AD. Contact your Azure administrator.

Issue: Lost App Password

Solution: You can create a new client secret:

  1. Go to Bot Configuration → Manage Password
  2. Create a new client secret
  3. Update your deployment with the new password

⏭️ Next Step

With your Azure Bot configured and credentials saved:

Continue to CloudFormation Deployment →

1.5 - Deploy CloudFormation Stack

Deploy the Ohlala SmartOps infrastructure in AWS

Deploy CloudFormation Stack

Deploy the complete Ohlala SmartOps infrastructure using the CloudFormation template from AWS Marketplace.

📦 What Gets Deployed

The CloudFormation stack creates:

  • ECS Fargate cluster with container services
  • API Gateway for Teams webhook
  • Network infrastructure (VPC, subnets, security groups)
  • IAM roles with appropriate permissions
  • Secrets Manager for credentials
  • CloudWatch logs for monitoring

📋 Deployment Steps

1. Open CloudFormation Console

Navigate to CloudFormation in your target region:

https://console.aws.amazon.com/cloudformation/home ↗️

2. Create New Stack

Click “Create stack” and choose “With new resources (standard)”

CloudFormation Create Stack page with “Create stack” button highlighted

3. Upload Template

  1. Select “Choose an existing template”
  2. Select “Upload a template file”
  3. Click “Choose file” and select the template downloaded from AWS Marketplace
  4. Click “Next”

CloudFormation Create Stack page with “Choose an existing template” option and file upload highlighted

4. Configure Stack Parameters

Stack Name

Enter a unique stack name: OhlalaSmartOps (or your preference)

Required Parameters

Fill in the mandatory parameters:

| Parameter | Description | Example/Value |
|-----------|-------------|---------------|
| DeploymentMode | VPC configuration | NewVPC (recommended) |
| ContainerImageTag | Version to deploy | v1.0.15 (default) |
| MicrosoftAppId | From Azure Bot setup | Your App ID |
| MicrosoftAppPassword | From Azure Bot setup | Your App Password |
| MicrosoftAppTenantId | From Azure Bot setup | Your Tenant ID |
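
If you script the deployment instead of using the console, the same parameters can be written in the ParameterKey/ParameterValue JSON list that the AWS CLI accepts for `create-stack --parameters`. This is a sketch under that assumption: the parameter names come from the table above, while `build_parameters` is a hypothetical helper.

```python
import json


def build_parameters(app_id: str, app_password: str, tenant_id: str,
                     deployment_mode: str = "NewVPC",
                     image_tag: str = "v1.0.15") -> list[dict]:
    """Return the required stack parameters as ParameterKey/ParameterValue pairs."""
    values = {
        "DeploymentMode": deployment_mode,
        "ContainerImageTag": image_tag,
        "MicrosoftAppId": app_id,
        "MicrosoftAppPassword": app_password,
        "MicrosoftAppTenantId": tenant_id,
    }
    # Refuse to build a parameter file with blank mandatory values.
    missing = [key for key, value in values.items() if not value]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    return [{"ParameterKey": k, "ParameterValue": v} for k, v in values.items()]


def to_json(parameters: list[dict]) -> str:
    """Serialize the parameter list for use as a CLI parameters file."""
    return json.dumps(parameters, indent=2)
```

The resulting file contains the App Password in plain text, so keep it out of version control and delete it after deployment.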

CloudFormation Stack Parameters page with fields filled in

VPC Configuration (if NewVPC)

Keep defaults or customize:

  • VPCCIDR: 10.0.0.0/16
  • PublicSubnet1CIDR: 10.0.1.0/24
  • PublicSubnet2CIDR: 10.0.2.0/24
  • PrivateSubnet1CIDR: 10.0.10.0/24
  • PrivateSubnet2CIDR: 10.0.11.0/24
  • EnableNATGateway: true
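
If you customize the CIDR ranges, each subnet must sit inside the VPC range and no two subnets may overlap. The following sketch checks both conditions with Python's standard `ipaddress` module. The parameter names and defaults are taken from the list above; `check_cidrs` is an illustrative helper, not part of the template.

```python
import ipaddress
from itertools import combinations

# Default values from the VPC Configuration list.
DEFAULTS = {
    "VPCCIDR": "10.0.0.0/16",
    "PublicSubnet1CIDR": "10.0.1.0/24",
    "PublicSubnet2CIDR": "10.0.2.0/24",
    "PrivateSubnet1CIDR": "10.0.10.0/24",
    "PrivateSubnet2CIDR": "10.0.11.0/24",
}


def check_cidrs(params: dict[str, str]) -> list[str]:
    """Return a list of CIDR problems; an empty list means the layout is valid."""
    vpc = ipaddress.ip_network(params["VPCCIDR"])
    subnets = {name: ipaddress.ip_network(cidr)
               for name, cidr in params.items() if name != "VPCCIDR"}
    problems = []
    # Every subnet must be contained in the VPC range.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} {net} is outside the VPC range {vpc}")
    # No two subnets may share addresses.
    for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} {net_a} overlaps {name_b} {net_b}")
    return problems
```

Calling `check_cidrs(DEFAULTS)` returns an empty list, which confirms that the default layout is consistent.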

Click “Next”

5. Configure Stack Options

On the stack options page:

  • Tags: (Optional) Add tags for resource organization
  • Permissions: Leave default
  • Advanced options: Leave default

Click “Next”

6. Review and Create

  1. Review all settings
  2. Check the acknowledgment box:
    • I acknowledge that AWS CloudFormation might create IAM resources with custom names

CloudFormation Stack Review page with acknowledgment checkbox highlighted

  3. Click “Submit”

CloudFormation Stack Review page with “Submit” button highlighted

7. Monitor Deployment

The stack creation takes 5-10 minutes. Monitor progress:

  1. Select your stack in the CloudFormation console
  2. Check the Events tab for real-time updates
  3. Wait for status: CREATE_COMPLETE

📊 Get Stack Outputs

Once deployment completes, get the important URLs:

  1. Select your stack
  2. Go to the Outputs tab
  3. Save these values:

| Output | Description | Use |
|--------|-------------|-----|
| TeamsWebhookURL | API Gateway endpoint | Configure in Azure Bot |
| APIGatewayEndpoint | Base API URL | Reference only |
| ECSCluster | Cluster name | For monitoring |
| ECSService | Service name | For monitoring |
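
The same outputs can be read from the command line. As a sketch, assuming you have saved the JSON printed by `aws cloudformation describe-stacks --stack-name OhlalaSmartOps --output json`, the helper below turns the response into a simple dictionary keyed by output name. The function name is illustrative.

```python
import json


def stack_outputs(describe_stacks_json: str) -> dict[str, str]:
    """Map OutputKey to OutputValue for the first stack in the response."""
    stack = json.loads(describe_stacks_json)["Stacks"][0]
    # A stack that is still being created may not have an Outputs list yet.
    return {item["OutputKey"]: item["OutputValue"]
            for item in stack.get("Outputs", [])}
```

With the dictionary in hand, `stack_outputs(text)["TeamsWebhookURL"]` gives the value to paste into the Azure Bot configuration.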

🔍 Verify Deployment

Check ECS Service

  1. Go to ECS Console → Clusters
  2. Find your cluster (e.g., OhlalaSmartOps-Cluster-...)
  3. Check service shows 1 running task

Check API Gateway

  1. Go to API Gateway Console
  2. Find your API (e.g., OhlalaSmartOps-API-...)
  3. Verify endpoints are created

Check Health Endpoint

Test the health endpoint (no authentication required):

curl https://your-api-id.execute-api.region.amazonaws.com/prod-stackname/health

A healthy deployment returns: {"status": "healthy"}
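
The same check can be scripted. The sketch below assumes, based on the placeholder URLs in this guide, that the health endpoint sits under the same stage as the webhook, so it derives the health URL by swapping the `/api/messages` suffix for `/health`. The function names are illustrative.

```python
import json
import urllib.request


def health_url_from_webhook(webhook_url: str) -> str:
    """Derive the health URL from the TeamsWebhookURL stack output."""
    suffix = "/api/messages"
    if not webhook_url.endswith(suffix):
        raise ValueError("expected a webhook URL ending in /api/messages")
    return webhook_url[: -len(suffix)] + "/health"


def is_healthy(body: str) -> bool:
    """Return True if the response body is JSON with status 'healthy'."""
    try:
        return json.loads(body).get("status") == "healthy"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check_health(webhook_url: str, timeout: float = 10.0) -> bool:
    """Call the health endpoint and report whether it answers as healthy."""
    url = health_url_from_webhook(webhook_url)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status == 200 and is_healthy(response.read().decode("utf-8"))
```

Note that `urllib.request.urlopen` raises `urllib.error.HTTPError` for 4xx and 5xx responses, so wrap `check_health` in a `try` block if you want a plain boolean in every case.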

❓ Troubleshooting

Stack Creation Failed

IAM Role Already Exists

Error: “Resource of type ‘AWS::IAM::Role’ with identifier already exists”
Solution: Use a different stack name, or delete the existing role first

Insufficient Permissions

Error: “User is not authorized to perform: iam:CreateRole”
Solution: Ensure you have admin permissions or the required IAM policies

Service Quota Exceeded

Error: “Service quota exceeded”
Solution: Request a quota increase or deploy in a different region

Stack Stuck in CREATE_IN_PROGRESS

  • Check Events tab for specific resource causing delay
  • ECS service can take 3-5 minutes to stabilize
  • If stuck >15 minutes, consider deleting and retrying

⏭️ Next Step

With infrastructure deployed and webhook URL ready:

Continue to Teams Integration →

1.6 - Connect to Microsoft Teams

Configure the webhook and install the bot in Microsoft Teams

Connect to Microsoft Teams

Link your deployed infrastructure with Microsoft Teams to enable chat-based infrastructure management.

🔗 What You’ll Configure

  • Azure Bot webhook endpoint
  • Teams channel connection
  • Bot app installation
  • Initial testing

📋 Integration Steps

1. Configure Webhook in Azure Bot

Get the Webhook URL

From your CloudFormation stack outputs, copy the TeamsWebhookURL:

https://xxx.execute-api.region.amazonaws.com/prod-stackname/api/messages

Update Bot Configuration

  1. Go to Azure Portal ↗️
  2. Navigate to your Azure Bot resource
  3. Go to Configuration under Settings
  4. Set Messaging endpoint to your webhook URL
  5. Click Apply to save

Azure Bot Configuration page with Messaging endpoint field highlighted

2. Install Teams App

  1. Download the Teams app package:

    Ohlala SmartOps Teams App ↗️

  2. Customize the manifest:

    • Extract the zip file
    • Edit manifest.json
    • Replace YOUR_APP_ID with your Microsoft App ID
    • Re-zip the files
  3. Install in Teams:

    • Open Microsoft Teams
    • Go to Apps → Manage your apps
    • Click Upload an app
    • Select Upload a custom app
    • Choose your zip file
    • Click Add to install

N.B.: If you lack permission to upload custom apps, you can ask your Teams admin to upload the app for you in the Teams Admin portal ↗️
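
The manifest customization in step 2 can also be scripted. This is a minimal sketch, assuming the package contains `manifest.json` at the root of the zip and uses the literal placeholder `YOUR_APP_ID` as described above. The function name and file paths are illustrative.

```python
import zipfile

PLACEHOLDER = "YOUR_APP_ID"


def customize_app_package(src_zip: str, dst_zip: str, app_id: str) -> int:
    """Copy src_zip to dst_zip, replacing the placeholder in manifest.json.

    Returns the number of replacements made. src_zip and dst_zip must differ.
    """
    replaced = 0
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "manifest.json":
                text = data.decode("utf-8")
                replaced = text.count(PLACEHOLDER)
                data = text.replace(PLACEHOLDER, app_id).encode("utf-8")
            # All other files (icons and so on) are copied unchanged.
            dst.writestr(item.filename, data)
    if replaced == 0:
        raise ValueError("no YOUR_APP_ID placeholder found in manifest.json")
    return replaced
```

Writing to a new archive keeps the downloaded package intact, so you can rerun the script if you later change the App ID.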

3. Add Bot to Team or Chat

For Personal Use

  1. Find Ohlala SmartOps in your apps
  2. Click Add
  3. Start chatting directly with the bot

For Team Use

  1. Go to your team
  2. Click ⋯ (More options) → Manage team
  3. Go to Apps tab
  4. Click Upload a custom app
  5. Select your app
  6. Click Add to team

🧪 Test the Connection

Send Test Message

In Teams, message the bot:

@Ohlala SmartOps hello

Expected response:

👋 Hello! I'm Ohlala SmartOps, your AI-powered AWS infrastructure assistant.
Type '/help' to see what I can do for you.

Test Basic Command

Try a simple command:

@Ohlala SmartOps /help

Teams chat showing bot responding to /help command with list of available commands

The bot should respond with a help card showing available commands.

🔍 Verify Integration

Check Connection Status

In Azure Portal

  1. Go to your bot → Channels
  2. Microsoft Teams should show Running
  3. Click Microsoft Teams to see activity

In AWS Console

  1. Go to CloudWatch → Log Groups
  2. Find /aws/ecs/ohlala-smartops-...
  3. Check for incoming request logs

Monitor API Gateway

  1. Go to API Gateway Console
  2. Select your API
  3. Go to Dashboard
  4. You should see incoming requests when messaging the bot

❓ Troubleshooting

Bot Not Responding

Check Webhook URL

  • Verify URL in Azure Bot Configuration matches CloudFormation output exactly
  • Ensure it includes the full path with /api/messages

Check ECS Service

  • Go to ECS Console
  • Verify service has 1 running task
  • Check task logs for errors

Test Health Endpoint

curl https://your-api.execute-api.region.amazonaws.com/prod-stackname/health

“Service Unavailable” Error

Causes:

  • ECS task not running
  • API Gateway misconfigured
  • Authentication failing

Solution:

  1. Check ECS service is running
  2. Verify API Gateway deployment
  3. Check CloudWatch logs for details

Authentication Errors

Symptoms: 401 or 403 errors in logs

Solution:

  1. Verify Microsoft App credentials in Secrets Manager
  2. Ensure Tenant ID is correct
  3. Check Lambda authorizer logs

Teams App Installation Issues

“App not found”:

  • Ensure manifest.json has correct App ID
  • Verify bot is published in Azure

“Permissions required”:

  • Contact Teams admin to allow custom apps
  • Check organizational app policies

🎉 Success Checklist

Confirm everything is working:

  • Azure Bot shows Teams channel as Running
  • Webhook URL configured correctly
  • Teams app installed successfully
  • Bot responds to messages
  • Help command works
  • No errors in CloudWatch logs

⏭️ Next Step

Your bot is connected! Now let’s verify everything and run your first commands:

Continue to Verification & Testing →

1.7 - Verification & Testing

Confirm your deployment and run first commands

Verification & Testing

Congratulations on deploying Ohlala SmartOps! Let’s verify everything is working and explore the capabilities.

✅ Deployment Checklist

Before testing commands, verify each component:

AWS Infrastructure

  • CloudFormation stack shows CREATE_COMPLETE
  • ECS service has 1 running task
  • API Gateway endpoint responds to health check
  • CloudWatch logs are being created
  • Secrets Manager contains credentials

Azure & Teams

  • Azure Bot shows Teams channel as Running
  • Webhook URL is configured correctly
  • Teams app is installed
  • Bot appears in Teams chat/channel

Bedrock

  • Claude Sonnet 4 shows Access granted
  • Model is available in deployment region

🎯 Your First Commands

1. Test Connection

@Ohlala SmartOps hello

Expected Response: Friendly greeting confirming the bot is working

2. Get Help

@Ohlala SmartOps help

Expected Response: Interactive card with available commands and examples

3. Check Instance Status

@Ohlala SmartOps show me my EC2 instances

Expected Response: List of your EC2 instances with status information

4. Health Report

@Ohlala SmartOps /health

Expected Response: Detailed health metrics for your instances

5. Natural Language Query

@Ohlala SmartOps which instances are running in us-east-1?

Expected Response: Filtered list based on your query

🔍 Advanced Testing

Test SSM Integration

@Ohlala SmartOps check disk space on i-1234567890abcdef0
  • Verifies SSM command execution
  • Returns disk usage information

Test Cost Analysis

@Ohlala SmartOps analyze my EC2 costs
  • Checks CloudWatch metrics access
  • Provides cost optimization suggestions

Test Multi-Instance Commands

@Ohlala SmartOps show me all stopped instances
  • Tests filtering and analysis capabilities
  • Demonstrates natural language understanding

📊 Monitoring Your Deployment

CloudWatch Metrics

Monitor key metrics in CloudWatch:

  1. ECS Service

    • CPU utilization (should be <50%)
    • Memory utilization (should be <70%)
    • Task count (should be 1)
  2. API Gateway

    • Request count
    • 4XX/5XX errors (should be minimal)
    • Latency (should be <3 seconds)
  3. Bedrock Usage

    • Token consumption
    • API throttling events
    • Model invocation errors
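
The thresholds above can be expressed as a small check that you feed with values read from CloudWatch. This is a sketch only: the metric names in the dictionary are hypothetical labels chosen for this example, not CloudWatch metric names, and the limits are the ones listed above.

```python
# Upper limits from the monitoring list; a value must stay below its limit.
THRESHOLDS = {
    "cpu_percent": 50.0,
    "memory_percent": 70.0,
    "latency_seconds": 3.0,
}
EXPECTED_TASK_COUNT = 1


def evaluate(metrics: dict[str, float]) -> list[str]:
    """Return a warning for each metric that breaches its threshold."""
    warnings = []
    for name, limit in THRESHOLDS.items():
        value = metrics.get(name)
        if value is not None and value >= limit:
            warnings.append(f"{name} is {value}, expected below {limit}")
    tasks = metrics.get("task_count")
    if tasks is not None and tasks != EXPECTED_TASK_COUNT:
        warnings.append(f"task_count is {tasks}, expected {EXPECTED_TASK_COUNT}")
    return warnings
```

Metrics that are missing from the input are skipped rather than reported, so the function can be used with a partial set of readings.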

🚨 Common Issues & Solutions

Issue: Bot Not Responding

Quick Diagnosis:

# Check health endpoint
curl https://your-api.execute-api.region.amazonaws.com/prod-stackname/health

Solutions:

  1. Check ECS task is running
  2. Verify webhook URL in Azure
  3. Ensure Teams app is installed
  4. Review CloudWatch logs

Issue: “Model Access Required” Error

Symptom: Bot responds but shows Bedrock error

Solution:

  1. Go to Bedrock Console → Model access
  2. Enable Claude Sonnet 4
  3. Wait for “Access granted”
  4. Retry command (no restart needed)

Issue: No Instances Found

Symptom: Bot works but doesn’t see EC2 instances

Checks:

  1. Instances are in same region as deployment
  2. Instances have SSM agent installed
  3. IAM permissions are correct
  4. Try: @Ohlala SmartOps list all instances in all regions

Issue: Commands Timeout

Symptom: Bot shows “thinking” but never responds

Solutions:

  1. Check ECS task memory/CPU
  2. Look for Bedrock throttling
  3. Verify network connectivity
  4. Scale ECS service if needed

Issue: Authentication Failures

Symptom: 401/403 errors in logs

Solutions:

  1. Regenerate Azure Bot credentials
  2. Update Secrets Manager
  3. Restart ECS service
  4. Check tenant ID is correct

Best Practices

  1. Start simple: Use basic commands first
  2. Be specific: Include instance IDs for targeted actions
  3. Use natural language: The bot understands context
  4. Review suggestions: Always verify before applying changes

🎉 Success Indicators

Your deployment is successful when:

  • ✅ Bot responds within 2-3 seconds
  • ✅ All test commands work
  • ✅ No errors in CloudWatch logs
  • ✅ Costs align with expectations
  • ✅ Team members can use the bot

📚 Next Steps

Now that your bot is working:

  1. Explore Features

    • Try advanced commands
    • Experiment with natural language queries
    • Review health and cost reports
  2. Train Your Team

    • Share the bot with team members
    • Create usage guidelines
    • Document common workflows

🆘 Getting Help

If you encounter issues:

  1. Check Documentation

  2. Contact Support

  3. Community Resources

🎊 Congratulations!

You’ve successfully deployed Ohlala SmartOps! Your AI-powered infrastructure assistant is ready to help manage your AWS environment through natural language conversations in Microsoft Teams.

Happy automating! 🤖

2 - Architecture & Limitations

System architecture, design decisions, and current limitations of Ohlala SmartOps

Architecture & Limitations

Understanding the system design, architectural decisions, and current limitations of Ohlala SmartOps.

🏗️ System Architecture

High-Level Overview

Ohlala SmartOps follows a containerized, serverless architecture designed for high availability and cost efficiency:

High-level architecture diagram showing user interaction with Teams, API Gateway, ECS Fargate, Bedrock, and AWS services

Container Architecture

Multi-Container Design with dedicated responsibilities:

Main Bot Container

  • Purpose: Teams integration, conversation orchestration, Bedrock AI
  • Port: 8000
  • Resources: 768 CPU units, 1536 MB memory
  • Key Features:
    • Microsoft Bot Framework integration
    • Amazon Bedrock (Claude) orchestration
    • Conversation state management
    • Multi-language support

MCP AWS API Container

  • Purpose: Secure AWS operations via Model Context Protocol
  • Port: 8080
  • Resources: 256 CPU units, 512 MB memory
  • Key Features:
    • AWS service abstractions
    • Permission-aware operations
    • Rate limiting and retry logic
    • Security-first design
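
Adding up the two container allocations gives the total the task needs. The sketch below does that arithmetic and checks the total against the Fargate 1 vCPU size (1,024 CPU units, which Fargate pairs with 2 GB to 8 GB of memory in 1 GB steps). The task size itself is an inference from the numbers above, not something this documentation states, and the container names are illustrative labels.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Container:
    name: str
    cpu_units: int
    memory_mb: int


# Allocations listed in the Container Architecture section.
CONTAINERS = [
    Container("main-bot", 768, 1536),
    Container("mcp-aws-api", 256, 512),
]


def task_totals(containers: list[Container]) -> tuple[int, int]:
    """Return (total CPU units, total memory in MB) across all containers."""
    return (sum(c.cpu_units for c in containers),
            sum(c.memory_mb for c in containers))


def fits_one_vcpu_task(cpu_units: int, memory_mb: int) -> bool:
    """Return True if the totals match a valid Fargate 1 vCPU configuration."""
    return cpu_units == 1024 and memory_mb in range(2048, 8193, 1024)
```

Here `task_totals(CONTAINERS)` returns `(1024, 2048)`, which corresponds to 1 vCPU and 2 GB.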

🎯 Architecture Highlights

🚀 Fully Serverless

ECS Fargate + API Gateway eliminate infrastructure management overhead

  • Zero server maintenance - AWS handles all patching and scaling
  • Automatic scaling - Responds to demand without intervention
  • Pay-per-use pricing - Only pay for actual compute time
  • Note: ~30s cold start for new container instances

🔒 Security-First Design

Defense in depth with multiple security layers

  • Private subnets - Containers have no direct internet exposure
  • Isolated containers - Bot logic and AWS operations run separately
  • JWT validation - Lambda authorizer validates all requests
  • Secrets management - Credentials stored in AWS Secrets Manager
  • Least privilege IAM - Each component has minimal required permissions

📦 Microservices Architecture

Multi-container pattern for better maintainability

  • Main bot container - Handles Teams interactions and AI orchestration
  • MCP AWS container - Provides secure AWS API access
  • Clear boundaries - Each container has a single responsibility
  • Independent updates - Deploy changes without affecting other components

💾 Stateless by Design

No persistent storage keeps architecture simple

  • Reduced complexity - No database to manage or scale
  • Lower costs - No database charges or backup requirements
  • Horizontal scaling - Any container can handle any request
  • Trade-off: Conversation context resets on container restart

🌍 Regional Flexibility

Deploy anywhere with single-region stacks

  • Data sovereignty - Keep data in your required region
  • Low latency - Deploy close to your EC2 instances
  • Cost optimization - No cross-region data transfer fees
  • Simple disaster recovery - Deploy multiple independent stacks

⚡ High-Performance Networking

Optimized for Teams integration with enterprise-grade networking

  • Network Load Balancer - Layer 4 load balancing for minimal latency
  • VPC Link - Secure private connection from API Gateway
  • Auto-scaling - Network automatically handles traffic spikes
  • Health checks - Automatic failover for unhealthy containers

📊 Performance Characteristics

Response Times

  • Health Check: < 1 second
  • Simple Commands: 2-5 seconds
  • AI Analysis: 5-15 seconds
  • SSM Operations: 10-60 seconds (depending on command)

Throughput Limits

  • Concurrent Users: 1-20 (single task)
  • Commands/Day: 10,00+ (with proper scaling)
  • API Gateway: 10,000 requests/second (AWS limit)
  • Bedrock: 20 requests/minute per model (AWS limit)
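
A per-minute quota such as the Bedrock figure above is usually respected with a client-side limiter. The sketch below is one possible sliding-window approach; the class name and defaults are hypothetical, and it is not a description of how SmartOps itself throttles requests.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `window_s`-second window."""

    def __init__(self, max_calls=20, window_s=60.0, clock=time.monotonic):
        self.max_calls = max_calls
        self.window_s = window_s
        self.clock = clock          # injectable so tests can control time
        self._stamps = deque()      # timestamps of accepted calls, oldest first

    def try_acquire(self):
        """Return True and record the call if under the limit, else False."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            self._stamps.append(now)
            return True
        return False
```

A caller that gets `False` can queue the request or tell the user to retry shortly, rather than letting the upstream service reject it.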

Scaling Behavior

  • Auto-scaling: ECS service keeps 1 task running and replaces it automatically if it fails (auto-heal)
  • Cold start: ~30 seconds for new tasks

⚠️ Current Limitations

1. Session Management

  • Issue: No persistent conversation history
  • Impact: Context lost on container restart
  • Workaround: Keep conversations short and focused

2. Multi-Region Support

  • Issue: Single region deployment only
  • Impact: No built-in disaster recovery
  • Workaround: Deploy multiple stacks in different regions

3. Cold Start Latency

  • Issue: 30+ second delay for new container starts
  • Impact: First request after idle period is slow
  • Workaround: Always keep at least 1 task running
  • Mitigation: ECS warmup targets available

🔒 Security Architecture

Network Security

  • Private Subnets: Containers have no direct internet access
  • Security Groups: Restrictive ingress/egress rules
  • VPC Endpoints: Secure access to AWS services

Authentication & Authorization

  • Teams Authentication: Microsoft Bot Framework JWT validation
  • AWS Permissions: IAM roles with least-privilege access
  • Inter-Container: Shared API key for MCP communication
  • Secrets: AWS Secrets Manager for sensitive data

Data Protection

  • Encryption in Transit: TLS 1.2+ for all communication
  • Encryption at Rest: EBS volumes encrypted by default
  • Logging: CloudWatch Logs with retention policies
  • Audit Trail: All AWS API calls logged via CloudTrail

📖 Technical References

Container Images

  • Registry: Amazon ECR
  • Repository: 709825985650.dkr.ecr.us-east-1.amazonaws.com/ohlala-automation-solutions/
  • Tags: Version-based (v1.0.0, v1.1.0, etc.)

Monitoring & Observability

  • Metrics: CloudWatch Container Insights
  • Logs: Structured JSON logging to CloudWatch
  • Health Checks: HTTP endpoints on both containers
  • Alarms: CPU, Memory, Error Rate monitoring

Backup & Recovery

  • Container Images: Immutable, versioned in ECR
  • Infrastructure: CloudFormation templates in version control
  • Configuration: Environment variables and secrets
  • No Persistent Data: Stateless design eliminates backup needs

3 - SmartOps Features & Security

Comprehensive guide to Ohlala SmartOps features with emphasis on the approval system that ensures infrastructure safety

SmartOps Features & Security

Discover the powerful capabilities of Ohlala SmartOps and understand how our approval system ensures your infrastructure remains safe while providing seamless AI-powered management.

🎯 Core Capabilities

🔍 Infrastructure Discovery

  • Automatic EC2 Detection: Zero-configuration discovery of SSM-enabled instances
  • Tag-Based Organization: Intelligent grouping by environment, application, and team
  • Multi-Region Support: Manages instances across all supported AWS regions

💰 Cost Intelligence

  • Usage Analysis: Deep dive into actual vs. provisioned capacity
  • AI-Powered Recommendations: ML-driven rightsizing suggestions
  • Savings Calculations: Precise cost impact modeling with confidence intervals

🔧 Smart Troubleshooting

  • AI-Guided Diagnostics: Step-by-step issue resolution assistance
  • Remote Command Execution: Secure SSM-based command execution with approval
  • Pattern Recognition: Intelligent problem identification and solution suggestions

📊 On-Demand Analytics

  • Health Assessments: Infrastructure status reports when requested
  • Performance Insights: Capacity planning and optimization recommendations
  • Custom Reports: Team-specific views and executive summaries

📖 Detailed Feature Documentation

💰 Cost Optimization →

Comprehensive FinOps capabilities for EC2 cost management:

  • Rightsizing recommendations with usage pattern analysis
  • Reserved Instance planning and optimization
  • Schedule-based scaling opportunities
  • ROI calculations and savings tracking

📊 Monitoring & Analytics →

On-demand monitoring and reporting features:

  • Health reports and status dashboards
  • Performance metrics and trend analysis
  • Automated reporting and scheduled updates
  • Custom analytics and team-specific views

🔒 Security & Compliance →

Enterprise-grade security and audit capabilities:

  • Approval system deep dive
  • Complete audit trails and compliance reporting
  • Identity and access management integration
  • Security best practices and safeguards

🤖 AI & Safety Features

Intelligent Understanding

  • Natural Language Processing: Understands context and intent
  • Fuzzy Matching: Handles typos and variations in commands
  • Context Awareness: Remembers conversation history for follow-ups

Safety by Design

  • Read-First Policy: All operations require explicit confirmation
  • Risk Assessment: AI evaluates potential impact before actions
  • Audit Trail: Complete logging with user identity tracking

🚀 Quick Start

Try These Commands

@Ohlala SmartOps what instances do I have?
@Ohlala SmartOps show me a health report
@Ohlala SmartOps analyze my EC2 costs
@Ohlala SmartOps which instances need attention?

Best Practices

  1. Start with Read-Only: Explore monitoring features first
  2. Use Natural Language: Don’t worry about exact syntax
  3. Review Before Approving: Always check what commands will do
  4. Ask Follow-ups: Build on previous responses for context

🔗 Integration Capabilities

Native AWS Services

  • EC2: Complete instance lifecycle management
  • Systems Manager: Secure command execution
  • CloudWatch: Metrics collection and analysis
  • Cost Explorer: Detailed cost analysis
  • Bedrock: AI-powered insights

Communication Platforms

  • Microsoft Teams: Primary chat interface with full feature support
  • Azure AD: Enterprise identity and access management
  • Slack: Coming soon with comparable feature set

🎯 Key Benefits

Operational Efficiency

  • Streamlined workflows with AI-powered assistance
  • Faster incident response through automated discovery and analysis
  • Reduced manual overhead for routine infrastructure tasks

Infrastructure Optimization

  • Cost optimization recommendations based on actual usage patterns
  • Right-sizing suggestions for underutilized resources
  • Proactive monitoring to identify optimization opportunities

4 - Bot Commands & Examples

Complete guide to Ohlala SmartOps chat commands and conversation examples for Microsoft Teams. Learn natural language patterns and see real responses.

Bot Commands & Examples

Complete guide to chatting with Ohlala SmartOps in Microsoft Teams. Learn natural language patterns, see example conversations, and understand how the AI responds to your infrastructure questions.

🤖 Command Overview

SmartOps understands both natural language and specific commands. You can interact in three ways:

  1. Natural Language: “Show me instances that are running high on CPU”
  2. Direct Commands: “list instances”, “health report”
  3. Contextual Queries: Follow-up questions based on previous responses

🛡️ Safety Through Approval System

📖 Documentation Sections

⚡ Slash Commands →

Built-in commands for quick access to common operations:

  • Essential commands (/help, /status, /instances)
  • Information commands (/version, /regions, /limits)
  • Utility commands (/clear, /settings, /feedback)
  • Support commands (/debug, /contact)

📝 Command Examples →

Detailed examples of all available commands with natural language variations and expected responses:

  • Instance management (list, describe, control)
  • Health monitoring and troubleshooting
  • Cost optimization and rightsizing
  • Remote command execution

🎯 Natural Language Features →

Learn how SmartOps understands context and intent:

  • Context awareness and fuzzy matching
  • Intent recognition patterns
  • Follow-up conversations
  • Handling typos and variations

🚀 Quick Start Commands

Try these commands to get started:

Basic Information

@Ohlala SmartOps help
@Ohlala SmartOps what instances do I have?
@Ohlala SmartOps show me a health report

Natural Language

@Ohlala SmartOps which instances need attention?
@Ohlala SmartOps how much am I spending on EC2?
@Ohlala SmartOps help me troubleshoot my web server

Follow-up Questions

After any response, you can ask follow-up questions like:

  • “Show me more details about that”
  • “What would you recommend?”
  • “Can you help me fix this?”

💡 Best Practices

  1. Start Simple: Begin with read-only commands to get familiar
  2. Use Natural Language: Don’t worry about exact syntax
  3. Ask Follow-ups: Build on previous responses for context
  4. Review Before Approving: Always check what commands will do

4.1 - Slash Commands

Complete reference for built-in slash commands and their usage

Slash Commands Reference

Ohlala SmartOps includes several built-in slash commands that provide quick access to common operations and information.

🚀 Essential Commands

/help

Purpose: Display all available commands and features

Usage:

/help
/help [command] - Show detailed help for specific command

Response: Interactive adaptive card showing:

  • All available slash commands
  • Natural language command examples
  • Quick action buttons for common operations
  • Localized content based on user’s Teams language

[Screenshot: Help command showing interactive card with all available bot features]

/instances

Purpose: List all EC2 instances with interactive management options

Usage:

/instances

Response: Interactive card displaying:

  • Instance IDs, names, and tags
  • Current state (running, stopped, etc.)
  • Instance type and platform
  • SSM connectivity status
  • Quick action buttons for each instance

[Screenshot: Instances command showing interactive list of EC2 instances with action buttons]

/health

Purpose: Comprehensive health dashboard for instances

Usage:

/health - Show health dashboard for all instances
/health [instance-id] - Show health for specific instance

Response: Rich dashboard featuring:

  • CPU, memory, and disk usage metrics
  • SSM agent connectivity status
  • Visual health indicators and charts
  • System performance trends
  • CloudWatch metrics integration

[Screenshot: Health command displaying comprehensive dashboard with metrics and charts]

/rightsizing

Purpose: Cost optimization and rightsizing recommendations

Usage:

/rightsizing

Response: Cost optimization dashboard with:

  • Current instance utilization analysis
  • Rightsizing recommendations
  • Potential cost savings calculations
  • Instance type upgrade/downgrade suggestions
  • CloudWatch metrics-based insights

[Screenshot: Rightsizing command showing cost optimization recommendations and potential savings]

📊 Monitoring & Management Commands

/status

Purpose: Show pending commands and recent activity

Usage:

/status

Response: Command status dashboard showing:

  • Currently pending SSM commands
  • Recent command execution history
  • Command success/failure rates
  • AWS Console links for detailed monitoring
  • Elapsed time for running operations

[Screenshot: Status command displaying pending operations and recent activity]

/history

Purpose: View detailed command execution history

Usage:

/history

Response: Comprehensive command history with:

  • Past command executions
  • Success/failure status
  • Detailed results and outputs
  • Timestamp and user information
  • Filtering and search capabilities

[Screenshot: History command showing detailed execution history and results]

/token-usage

Purpose: Monitor Bedrock AI token usage and costs

Usage:

/token-usage

Response: Token usage analytics including:

  • Current billing period usage
  • Token consumption trends
  • Cost breakdown by operation type
  • Usage limits and quotas
  • Optimization recommendations

[Screenshot: Token usage command showing AI usage analytics and cost breakdown]

📝 Command Tips

Quick Access

  • Type / in Teams to see all available slash commands
  • Commands are case-insensitive: /help = /HELP = /Help
  • Use Tab completion in Teams for faster command entry

Combining with Natural Language

You can follow slash commands with natural language for more specific requests:

/health show me only instances with high CPU usage
/instances filter by production environment
/help with cost optimization

Command Parameters

  • Most commands work without parameters for overview information
  • Add instance IDs for specific instance details: /health i-1234567890abcdef0
  • Use /help [command] for detailed usage instructions
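
The rules in this section (case-insensitive commands, an optional instance ID, and optional trailing natural language) can be sketched as a small parser. The names `ParsedCommand` and `parse_slash_command` are hypothetical, and the real bot may split messages differently.

```python
import re
from typing import NamedTuple, Optional

# EC2 instance IDs are "i-" followed by 8 or 17 hexadecimal characters.
INSTANCE_ID = re.compile(r"^i-(?:[0-9a-f]{8}|[0-9a-f]{17})$")


class ParsedCommand(NamedTuple):
    name: str                   # lower-cased command, e.g. "/health"
    instance_id: Optional[str]  # set when the first argument is an instance ID
    free_text: str              # remaining natural-language text, may be empty


def parse_slash_command(message: str) -> Optional[ParsedCommand]:
    """Split a message into command, optional instance ID, and free text."""
    parts = message.strip().split(maxsplit=1)
    if not parts or not parts[0].startswith("/"):
        return None  # not a slash command; leave it to the natural-language path
    name = parts[0].lower()
    rest = parts[1].strip() if len(parts) > 1 else ""
    first, _, tail = rest.partition(" ")
    if INSTANCE_ID.match(first.lower()):
        return ParsedCommand(name, first.lower(), tail.strip())
    return ParsedCommand(name, None, rest)
```

For example, `/HEALTH i-1234567890abcdef0` and `/health i-1234567890abcdef0` parse to the same command and instance ID, while `/health show me only instances with high CPU usage` keeps the whole tail as free text.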

Command History

  • Use ↑ (up arrow) in Teams to repeat recent commands
  • All commands are logged for audit purposes
  • Interactive cards maintain state for better user experience

🔍 Command Comparison

Command | Speed | Detail Level | Best For
--- | --- | --- | ---
/instances | ⚡ Fast | 📊 Interactive | Instance management
"show me my instances" | 🐌 Slower | 📖 Conversational | Analysis & insights
/health | ⚡ Fast | 📈 Dashboard | Health monitoring
"which instances need attention?" | 🐌 Slower | 🔍 AI Analysis | Troubleshooting
/status | ⚡ Fast | 📋 Current | Operation tracking

🚨 Error Handling

Common Issues

Command not recognized:

Unknown command: /instaces
Did you mean: /instances?

Missing permissions:

❌ Insufficient AWS permissions for this operation
Contact your administrator to review IAM policies

Service unavailable:

⚠️ AWS services temporarily unavailable
Try again in a few moments or use /status for details
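
A "Did you mean" suggestion like the one shown for `/instaces` can be produced with a standard string-similarity lookup. This sketch uses Python's `difflib`; the command list is taken from this reference, while the function name and the 0.6 cutoff are assumptions rather than the bot's actual settings.

```python
import difflib

# Slash commands documented in this reference.
KNOWN_COMMANDS = [
    "/help", "/instances", "/health", "/rightsizing",
    "/status", "/history", "/token-usage",
]


def suggest_command(typed, known=KNOWN_COMMANDS):
    """Return the closest known command, or None if nothing is similar."""
    matches = difflib.get_close_matches(typed.lower(), known, n=1, cutoff=0.6)
    return matches[0] if matches else None
```

With this list, `suggest_command("/instaces")` returns `/instances`, and input that resembles no command returns `None` so the bot can fall back to `/help`.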

Recovery Steps

  1. Check spelling - Command names must be spelled correctly (case does not matter)
  2. Verify AWS permissions - Commands require proper IAM roles
  3. Try /status - Check if services are operational
  4. Use /help - See all available commands

📖 Next Steps

Quick Start

Try these commands right now in Teams:

  1. /help - See what’s available
  2. /instances - View your EC2 instances with interactive controls
  3. /health - Check instance health dashboard
  4. /rightsizing - Discover cost optimization opportunities
  5. "show me instances that need attention" - Try natural language

🔄 Advanced Usage

Command Workflows

Combine slash commands for powerful workflows:

1. /instances → Click instance → View health details
2. /health → Identify issues → Use natural language for troubleshooting
3. /rightsizing → Review recommendations → Ask for implementation help
4. /status → Monitor ongoing operations → /history for detailed results

Interactive Features

  • Action Buttons: Most commands include interactive buttons for common actions
  • Context Preservation: Commands remember your selections for follow-up questions
  • Real-time Updates: Health and status information refreshes automatically
  • Multi-language Support: Commands adapt to your Teams language preference

4.2 - Command Examples & Usage

Detailed examples of all SmartOps commands with natural language variations and expected responses for EC2 management in Teams.

Command Examples & Usage

Comprehensive examples of all SmartOps commands with natural language variations and detailed response formats.

📝 Command Categories

Instance Management

List Instances

Shows all EC2 instances with current status and basic metrics.

Natural Language Examples:

  • “What instances do I have?”
  • “Show me all EC2 instances”
  • “List my servers”

Direct Command:

@Ohlala SmartOps list instances

Response Format:

📊 EC2 Instance Summary
Found 5 instances in us-east-1

✅ web-server-01 (i-0abc123def)
   Type: t3.medium | State: running
   CPU: 45% | Memory: 62% | Disk: 38%

⚠️ database-01 (i-0def456ghi)
   Type: m5.large | State: running
   CPU: 78% | Memory: 85% | Disk: 72%

[... more instances ...]

Get Instance Details

Detailed information about a specific instance.

Natural Language Examples:

  • “Tell me about instance i-0abc123def”
  • “Show details for web-server-01”
  • “What’s the configuration of my database server?”

Direct Command:

@Ohlala SmartOps describe instance <instance-id>

Response Format:

📋 Instance Details: web-server-01

Instance ID: i-0abc123def
Type: t3.medium (2 vCPU, 4 GB RAM)
State: running (since 2024-03-15 10:30 UTC)
Platform: Amazon Linux
AZ: us-east-1a
Private IP: 10.0.1.45
Public IP: 54.123.45.67

Tags:
- Name: web-server-01
- Environment: production
- Team: platform

Monitoring:
- CPU: 45% (avg last hour)
- Memory: 62% (current)
- Network In: 125 MB/hour
- Network Out: 450 MB/hour

Health Monitoring

Health Report

Comprehensive health status of all instances.

Natural Language Examples:

  • “Show me the health report”
  • “How healthy are my instances?”
  • “Give me a status update”

Direct Command:

@Ohlala SmartOps health report

Response Format:

🏥 Infrastructure Health Report
Generated: 2024-03-20 14:30 UTC

Overall Health: ⚠️ ATTENTION NEEDED

Summary:
✅ Healthy: 12 instances
⚠️ Warning: 3 instances
❌ Critical: 1 instance

Issues Requiring Attention:

❌ CRITICAL: app-server-03
   - CPU: 95% (sustained for 30 min)
   - Action: Consider scaling or investigating process

⚠️ WARNING: database-01
   - Disk: 85% full
   - Action: Clean up logs or expand storage

⚠️ WARNING: web-cache-02
   - Memory: 88% utilized
   - Action: Monitor for OOM issues

📈 Trends:
- CPU usage up 15% from yesterday
- 2 new instances added this week
- Cost trending 8% over budget

Instance Health Check

Check health of specific instance.

Natural Language Examples:

  • “Is web-server-01 healthy?”
  • “Check the health of i-0abc123def”
  • “How is my database server doing?”

Direct Command:

@Ohlala SmartOps check health <instance-id>

Cost Optimization

Cost Analysis

Analyze EC2 costs and identify savings opportunities.

Natural Language Examples:

  • “Analyze my EC2 costs”
  • “Where can I save money?”
  • “Show me cost optimization opportunities”

Direct Command:

@Ohlala SmartOps cost analysis

Response Format:

💰 EC2 Cost Analysis Report
Period: Last 30 days

Current Spending:
- Total: $3,456.78
- On-Demand: $2,890.45 (84%)
- Reserved: $566.33 (16%)
- Spot: $0.00 (0%)

Top Recommendations:

1. 🎯 Right-size Overprovisioned Instances
   Potential Savings: $456/month (13%)

   - web-server-01: t3.medium → t3.small
     Current: 15% CPU avg → Save $28/month

   - test-server-02: m5.xlarge → m5.large
     Current: 8% CPU avg → Save $95/month

2. 💼 Purchase Reserved Instances
   Potential Savings: $890/month (26%)

   - 5 instances running 24/7
   - Recommend 1-year no upfront RIs

3. 🌙 Implement Schedule-Based Scaling
   Potential Savings: $234/month (7%)

   - Dev/test instances can be stopped nights/weekends
   - 10 instances identified

Total Potential Savings: $1,580/month (46%)
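
The percentages in the sample report can be reproduced from its dollar figures. The short check below assumes each percentage is the monthly saving divided by the 30-day total and rounded to the nearest integer, which is consistent with every number shown.

```python
# Figures copied from the sample report above.
total_spend = 3456.78
savings = {
    "rightsizing": 456,
    "reserved_instances": 890,
    "scheduling": 234,
}


def pct(part, whole):
    """Percentage of `whole`, rounded to the nearest integer."""
    return round(100 * part / whole)


total_savings = sum(savings.values())

if __name__ == "__main__":
    for name, amount in savings.items():
        print(f"{name}: ${amount}/month ({pct(amount, total_spend)}%)")
    print(f"total: ${total_savings:,}/month ({pct(total_savings, total_spend)}%)")
```

The three recommendations sum to $1,580, or 46% of $3,456.78, matching the report's total line.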

Rightsizing Recommendations

Get specific rightsizing suggestions.

Natural Language Examples:

  • “Which instances should I rightsize?”
  • “Show me oversized instances”
  • “Find underutilized servers”

Direct Command:

@Ohlala SmartOps rightsizing recommendations

Troubleshooting

Troubleshoot Instance

AI-guided troubleshooting for instance issues.

Natural Language Examples:

  • “My web server is slow”
  • “Help me troubleshoot i-0abc123def”
  • “Database connections are timing out”

Direct Command:

@Ohlala SmartOps troubleshoot <instance-id>

Interactive Response:

🔧 Troubleshooting Assistant

I'll help you troubleshoot web-server-01. Let me gather some information...

Current Status:
- Instance is running
- CPU: 45% (normal)
- Memory: 92% (HIGH)
- Disk I/O: Normal
- Network: Normal

⚠️ High memory usage detected!

Let me check what's consuming memory...
[Running diagnostic commands via SSM]

Top Memory Consumers:
1. java process: 2.8 GB (70%)
2. mysql: 650 MB (16%)
3. nginx: 120 MB (3%)

Recommendations:
1. Immediate: Restart the Java application
2. Short-term: Increase instance type to t3.large
3. Long-term: Investigate memory leak in application

Would you like me to:
A) Restart the Java application now
B) Show application logs
C) Create a snapshot before changes

Find Issues

Identify instances with problems.

Natural Language Examples:

  • “Which instances need attention?”
  • “Show me problematic servers”
  • “Find unhealthy instances”

Direct Command:

@Ohlala SmartOps find issues

Remote Execution

Execute Command

Run commands on instances via SSM.

Natural Language Examples:

  • “Run ‘df -h’ on web-server-01”
  • “Check disk space on all instances”
  • “Restart nginx on the web servers”

Direct Command:

@Ohlala SmartOps execute "<command>" on <instance-id>

Safety Features:

  • Confirmation required for all SSM commands
  • Commands run with limited privileges
  • Audit trail maintained
  • Output limited to 24,000 characters

Response Format:

🔨 Command Execution Request

Target: web-server-01 (i-0abc123def)
Command: systemctl restart nginx

⚠️ This command will restart the nginx service.
This may cause brief downtime.

Type 'yes' to confirm execution

[After confirmation]

✅ Command Executed Successfully

Output:
nginx.service - The nginx HTTP Server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled)
   Active: active (running) since Wed 2024-03-20 15:45:32 UTC

Execution Time: 1.2 seconds
Command ID: abc-def-ghi-123
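
The confirmation step and the output limit described above can be sketched as two small helpers. `PendingCommand`, `handle_reply`, and `truncate_output` are hypothetical names, and the truncation marker is an assumption; only the 'yes' confirmation and the 24,000-character limit come from this page.

```python
from dataclasses import dataclass

MAX_OUTPUT_CHARS = 24_000  # output limit stated under Safety Features


@dataclass
class PendingCommand:
    instance_id: str
    command: str
    approved: bool = False


def handle_reply(pending: PendingCommand, reply: str) -> bool:
    """Approve only on an explicit 'yes'; anything else leaves it unapproved."""
    pending.approved = reply.strip().lower() == "yes"
    return pending.approved


def truncate_output(output: str, limit: int = MAX_OUTPUT_CHARS) -> str:
    """Cut command output to the limit and mark that it was shortened."""
    if len(output) <= limit:
        return output
    marker = "\n[output truncated]"
    return output[: limit - len(marker)] + marker
```

Treating every reply other than an exact "yes" as a refusal keeps the default safe: a typo or an unrelated message never triggers execution.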

Instance Control

Start Instance

Start stopped instances.

Natural Language Examples:

  • “Start web-server-01”
  • “Boot up the test environment”
  • “Turn on i-0abc123def”

Direct Command:

@Ohlala SmartOps start instance <instance-id>

Stop Instance

Stop running instances.

Natural Language Examples:

  • “Stop the dev server”
  • “Shut down test-instance-02”
  • “Turn off i-0abc123def”

Direct Command:

@Ohlala SmartOps stop instance <instance-id>

Safety Confirmation:

⚠️ Stop Instance Confirmation

You're about to stop: prod-database-01
Environment: production
Current connections: 45

This action will:
- Terminate all active connections
- Stop the instance (data on instance store volumes will be lost)
- Incur no further instance-hour charges (attached EBS volumes are still billed)

Type 'yes' to confirm stopping this instance

Reboot Instance

Restart instances gracefully.

Natural Language Examples:

  • “Reboot web-server-01”
  • “Restart my application server”
  • “Perform a soft reset on i-0abc123def”

Direct Command:

@Ohlala SmartOps reboot instance <instance-id>

Scheduling

Schedule Report

Set up automated daily reports.

Natural Language Examples:

  • “Send me a daily health report at 9 AM”
  • “Schedule cost reports every Monday”
  • “Set up morning status updates”

Direct Command:

@Ohlala SmartOps schedule daily report at <time>

Help and Information

Get Help

Show available commands and usage.

Natural Language Examples:

  • “Help”
  • “What can you do?”
  • “Show me available commands”

Direct Command:

@Ohlala SmartOps help

Response Format:

🤖 Ohlala SmartOps - Command Reference

I understand natural language! Just describe what you need.
You can also use these commands:

📊 Monitoring
• list instances - Show all EC2 instances
• health report - Comprehensive health status
• check health <id> - Check specific instance

💰 Cost Optimization
• cost analysis - Analyze spending
• rightsizing recommendations - Find savings

🔧 Troubleshooting
• troubleshoot <id> - AI-guided diagnostics
• find issues - Identify problems

🔨 Remote Execution
• execute "<command>" on <id> - Run via SSM

⚙️ Instance Control
• start/stop/reboot instance <id>

📅 Scheduling
• schedule daily report at <time>

💡 Tips:
- Use instance names or IDs
- Ask follow-up questions
- Natural language works best!

Need more help? Visit docs.ohlala.cloud

🔄 Advanced Features

Bulk Operations

Execute commands across multiple instances:

@Ohlala SmartOps execute "sudo yum update -y" on tag:Environment=dev
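
A target such as `tag:Environment=dev` maps naturally onto the `Targets` structure that the Systems Manager SendCommand API accepts. The sketch below shows one way to do that translation; `parse_target` is a hypothetical helper, and whether SmartOps resolves targets this way internally is an assumption.

```python
def parse_target(spec: str) -> dict:
    """Turn 'tag:Key=Value' or an instance ID into an SSM-style target."""
    if spec.startswith("tag:"):
        key, sep, value = spec[len("tag:"):].partition("=")
        if not key or not sep or not value:
            raise ValueError(f"expected tag:Key=Value, got {spec!r}")
        return {"Key": f"tag:{key}", "Values": [value]}
    if spec.startswith("i-"):
        return {"Key": "InstanceIds", "Values": [spec]}
    raise ValueError(f"unrecognised target: {spec!r}")
```

Rejecting anything that is neither a tag expression nor an instance ID avoids sending a command to an unintended set of machines.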

Filtering

Filter instances by various criteria:

@Ohlala SmartOps list instances where cpu > 80%
@Ohlala SmartOps find instances tagged Environment=production

Chaining Commands

Combine multiple operations:

@Ohlala SmartOps stop all dev instances then create ami backups

4.3 - Natural Language Processing

Learn how SmartOps understands context and intent through natural language processing, fuzzy matching, and conversational AI.

Natural Language Processing

SmartOps uses Claude AI to understand context and intent, making infrastructure management feel like a natural conversation.

🎯 Natural Language Processing Features

Powered by Claude on Amazon Bedrock, SmartOps resolves references to earlier messages, infers intent, and tolerates typos. Examples:

Context Awareness

User: "Show me expensive instances"
Bot: [Lists instances sorted by cost]

User: "Which of those can be rightsized?"
Bot: [Understands "those" refers to expensive instances]

Intent Recognition

User: "My website is down"
Bot: "I'll help troubleshoot. Let me check your web servers..."
[Automatically identifies web-tagged instances and checks health]

Fuzzy Matching

User: "Check the databse server"
Bot: "Checking database-server-01..."
[Handles typos and variations]

🤖 How SmartOps Understands You

1. Intent Classification

SmartOps recognizes different types of requests:

Information Requests:

  • “What instances do I have?”
  • “Show me the current status”
  • “How much am I spending?”

Action Requests:

  • “Restart the web server”
  • “Stop the test instances”
  • “Update all development servers”

Troubleshooting Requests:

  • “My application is slow”
  • “Why is the database not responding?”
  • “Help me fix this error”

2. Context Tracking

SmartOps remembers conversation context:

Example Conversation:

User: "List my production instances"
Bot: [Shows 5 production instances]

User: "Which one has the highest CPU?"
Bot: "Among your production instances, web-prod-02 has the highest CPU at 78%"

User: "Show me more details about that one"
Bot: [Shows detailed info for web-prod-02]

User: "Can you help me optimize it?"
Bot: "I can help optimize web-prod-02. Let me analyze its usage patterns..."

3. Entity Recognition

SmartOps identifies specific entities in your requests:

Instance References:

  • Instance IDs: “i-0abc123def”
  • Instance names: “web-server-01”
  • Tags: “all production instances”
  • Roles: “database servers”, “web servers”

Time References:

  • “last week”, “yesterday”, “this month”
  • “since 9 AM”, “in the past hour”

Metric References:

  • “high CPU”, “low memory”, “disk space”
  • “expensive instances”, “underutilized servers”

💬 Conversation Patterns

Follow-up Questions

After any response, you can ask natural follow-ups:

After Health Report:

  • “What’s causing the high CPU on server-01?”
  • “How can I fix the memory issue?”
  • “Should I be worried about that warning?”

After Cost Analysis:

  • “How much would rightsizing save me?”
  • “Which instances should I prioritize?”
  • “Can you help me implement those recommendations?”

After Instance List:

  • “Which ones need updates?”
  • “Show me only the problematic ones”
  • “What would you recommend for optimization?”

Conversational Shortcuts

SmartOps understands casual language:

Informal Requests:

  • “What’s up with my servers?” → Health status check
  • “How are things looking?” → Overall status report
  • “Any issues I should know about?” → Problem identification
  • “Can you help me save money?” → Cost optimization analysis

Implicit Context:

  • “Restart it” (after discussing a specific instance)
  • “Show me more” (after any report or list)
  • “What do you think?” (asking for AI recommendations)

🔍 Advanced Language Features

Comparative Queries

User: "Which instances use more CPU than database-01?"
User: "Show me servers that cost more than $100/month"
User: "Find instances smaller than t3.medium"

Temporal Queries

User: "What happened to web-server-01 yesterday?"
User: "Show me cost trends over the last 3 months"
User: "Which instances were restarted this week?"

Conditional Logic

User: "If the CPU is over 80%, show me the top processes"
User: "When database connections exceed 50, check memory usage"
User: "Alert me if any production instance goes down"

Aggregation Requests

User: "Total cost of all development instances"
User: "Average CPU across web servers"
User: "Count of instances by environment"

🎭 Handling Ambiguity

When your request is unclear, SmartOps will ask for clarification:

Example Ambiguous Requests:

User: "Restart the server"
Bot: "I found 5 servers. Which one would you like me to restart?
- web-server-01 (production)
- web-server-02 (production)
- app-server-01 (staging)
- db-server-01 (production)
- test-server-01 (development)"

User: "Stop the expensive instances"
Bot: "I found 3 instances with costs over $200/month. Would you like to:
A) See details about these instances first
B) Stop all 3 instances
C) Stop only non-production instances"
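
The clarification behaviour in the first example can be approximated with a simple rule: act only when a reference resolves to exactly one instance, otherwise ask. The inventory below mirrors the sample reply; the function names and the substring matching are illustrative assumptions, since the real resolution is done by the AI model.

```python
# Inventory mirroring the sample reply above.
INVENTORY = [
    {"name": "web-server-01", "env": "production"},
    {"name": "web-server-02", "env": "production"},
    {"name": "app-server-01", "env": "staging"},
    {"name": "db-server-01", "env": "production"},
    {"name": "test-server-01", "env": "development"},
]


def find_matches(term, inventory):
    """Instances whose name contains the term (case-insensitive)."""
    term = term.lower()
    return [i for i in inventory if term in i["name"].lower()]


def resolve_restart_target(term, inventory):
    """Resolve a unique match; otherwise return a clarification question."""
    matches = find_matches(term, inventory)
    if len(matches) == 1:
        return f"Target resolved: {matches[0]['name']}"
    if not matches:
        return f"I couldn't find an instance matching '{term}'."
    lines = [f"- {m['name']} ({m['env']})" for m in matches]
    header = f"I found {len(matches)} servers. Which one would you like me to restart?"
    return header + "\n" + "\n".join(lines)
```

Even a uniquely resolved target would still go through the confirmation step before anything is restarted.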

🔧 Error Recovery

SmartOps handles common communication issues:

Typos and Misspellings

User: "chck the helth of web-srvr-01"
Bot: "Checking the health of web-server-01..."
[Automatically corrects and understands intent]

Incomplete Requests

User: "show instances"
Bot: "Here are your EC2 instances. Would you like me to:
- Show more details about any specific instance
- Filter by environment or status
- Include cost or performance information"

Mixed Languages (Limited)

User: "montrer les instances"
Bot: "I understand you want to see instances. Here are your EC2 instances..."
[Basic French recognition, responds in English]

📚 Learning from Context

SmartOps learns patterns within a conversation:

Preference Learning

User: "Show instances without IDs"
Bot: [Remembers preference for this conversation]

User: "Now show the database servers"
Bot: [Lists database servers without IDs, as previously requested]

Domain Knowledge

SmartOps understands infrastructure terminology:

Technical Terms:

  • “Load balancer”, “auto-scaling group”, “availability zone”
  • “RDS”, “EBS”, “VPC”, “security group”
  • “Production”, “staging”, “development” environments

Common Patterns:

  • “Scale up/down”, “failover”, “backup”
  • “Peak hours”, “maintenance window”, “scheduled downtime”
  • “Performance bottleneck”, “capacity planning”

🎯 Best Practices for Natural Communication

1. Be Conversational

  • Use natural language like you’re talking to a colleague
  • Don’t worry about exact command syntax
  • Feel free to ask follow-up questions

2. Provide Context

  • Mention environment: “production servers”, “dev instances”
  • Include timeframes: “since yesterday”, “this week”
  • Specify scope: “all web servers”, “just the database”

3. Ask for Clarification

  • If responses aren’t what you expected, ask for clarification
  • Request more details: “Can you explain that further?”
  • Ask for alternatives: “What other options do I have?”

4. Build on Previous Responses

  • Use “that instance”, “those servers”, “the expensive ones”
  • Ask follow-ups: “What would you recommend?”, “Should I be concerned?”

🔗 Integration with Commands

Natural language seamlessly integrates with command execution:

User: "The production web servers seem slow, can you help?"

Bot: "I'll investigate your production web servers. Let me check their current status...

Found 3 production web servers:
- web-prod-01: CPU 45%, Memory 78%
- web-prod-02: CPU 89%, Memory 92% ⚠️
- web-prod-03: CPU 52%, Memory 65%

web-prod-02 appears to have high resource usage. Would you like me to:
A) Show detailed process information
B) Restart the web services
C) Investigate recent logs
D) Recommend instance scaling"

5 - Troubleshooting Guide

Solutions for common issues with Ohlala SmartOps deployment, Teams integration, and daily operations. Find quick fixes and detailed debugging steps.

Troubleshooting Guide

Quick solutions for common issues with Ohlala SmartOps. Use the search function (Ctrl+F) to find specific error messages.

🚨 Quick Diagnostics

Run this checklist to identify common issues:

  1. Check Service Health

    curl https://your-api-gateway-url/prod-{StackName}/health
    

    Expected: {"status": "healthy"}

  2. Verify CloudFormation Stack

    • AWS Console → CloudFormation
    • Stack status: CREATE_COMPLETE or UPDATE_COMPLETE
  3. Check ECS Service

    • AWS Console → ECS → Clusters
    • Service should have 1 running task
  4. Review Recent Logs

    • AWS Console → CloudWatch → Log Groups
    • Check /aws/ecs/ohlala-smartops-{StackName}
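The health check in step 1 can also be scripted. The following is a minimal Python sketch, assuming the endpoint returns the JSON body shown above. The function names `is_healthy` and `check_health` are illustrative and are not part of SmartOps.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True if the body is a JSON object whose "status" is "healthy"."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def check_health(base_url: str, timeout: float = 10.0) -> bool:
    """GET {base_url}/health, where base_url ends with /prod-{StackName}.

    urlopen raises an exception on network errors and non-2xx responses,
    which the caller should treat as "not healthy".
    """
    url = base_url.rstrip("/") + "/health"
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return is_healthy(response.read().decode("utf-8"))
```

Call `check_health("https://your-api-gateway-url/prod-{StackName}")` with your own values substituted.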

📊 CloudWatch Logs Troubleshooting

Quick Log Analysis

Most issues can be diagnosed by checking CloudWatch logs for ERROR messages in the ECS task logs.

1. Access ECS Task Logs

Via AWS Console:

  1. Go to CloudWatch → Log Groups
  2. Find /aws/ecs/ohlala-smartops-{your-stack-name}
  3. Click on the most recent log stream
  4. Search for “ERROR” using Ctrl+F
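If you export a log stream to a text file, the same search can be done offline. This is a small sketch that assumes plain-text log lines; `find_error_lines` is a hypothetical helper, not a SmartOps or AWS function.

```python
def find_error_lines(lines, keyword="ERROR", context=1):
    """Return (line_number, text) pairs for lines containing `keyword`.

    Each hit is preceded by up to `context` earlier lines so the cause is
    visible. Line numbers are 1-based and no line is reported twice.
    """
    hits = []
    seen = set()
    for index, line in enumerate(lines):
        if keyword in line:
            for i in range(max(0, index - context), index + 1):
                if i not in seen:
                    seen.add(i)
                    hits.append((i + 1, lines[i].rstrip("\n")))
    return hits
```

Pass the result of `open("exported-log.txt").readlines()` (the file name is an example) to list every error with the line that preceded it.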

🤖 Bot Not Responding

Symptoms

  • No response when messaging the bot in Teams
  • Bot appears offline
  • Commands time out without a response

Solution 1: Verify Webhook Configuration

  1. Check Webhook URL

    # Get from CloudFormation outputs
    aws cloudformation describe-stacks \
      --stack-name your-stack-name \
      --query "Stacks[0].Outputs[?OutputKey=='TeamsWebhookURL'].OutputValue" \
      --output text
    
  2. Update in Azure Bot

    • Azure Portal → Your Bot → Configuration
    • Messaging endpoint must match CloudFormation output
    • Must end with /api/messages
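The two rules above (the endpoint matches the stack output and ends with /api/messages) can be checked mechanically. This is an illustrative sketch; `endpoint_problems` is a hypothetical name, and both URLs are values you copy in by hand.

```python
def endpoint_problems(azure_endpoint: str, stack_output_url: str) -> list:
    """Compare the Azure Bot messaging endpoint with the TeamsWebhookURL output."""
    problems = []
    configured = azure_endpoint.strip()
    expected = stack_output_url.strip()
    if not configured.endswith("/api/messages"):
        problems.append("endpoint must end with /api/messages")
    if configured != expected:
        problems.append("endpoint does not match the TeamsWebhookURL output")
    return problems
```

An empty list means the Azure configuration and the CloudFormation output agree.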

Solution 2: Check Authentication

  1. Verify Secrets in AWS

    aws secretsmanager get-secret-value \
      --secret-id ohlala-smartops-teams-{StackName} \
      --query SecretString \
      --output json
    
  2. Validate Credentials Match Azure

    • App ID must match Azure Bot’s App ID
    • Password must be valid and not expired
    • Tenant ID must match your Azure AD
  3. Check Lambda Authorizer Logs

    • CloudWatch → Log Groups → /aws/lambda/ohlala-authorizer-{StackName}
    • Look for “Authorization failed” messages

Solution 3: Teams App Issues

  1. Re-upload Teams Package

    • Remove existing app from Teams
    • Download fresh package
    • Update manifest.json with correct bot ID
    • Re-upload to Teams
    • You may need to manually bump the version in manifest.json to force Teams to accept the update
  2. Check Teams Policies

    • Teams Admin Center → Teams apps → Permission policies
    • Ensure custom apps are allowed
    • Check user has permission to use bots
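Bumping the manifest version by hand is easy to get wrong. The sketch below assumes the `version` field in manifest.json is a three-part `major.minor.patch` string; `bump_patch_version` is an illustrative helper, not part of the Teams tooling.

```python
import json


def bump_patch_version(manifest_text: str) -> str:
    """Return the manifest JSON with the patch part of "version" incremented.

    Example: "1.0.0" becomes "1.0.1". All other fields are left unchanged.
    """
    manifest = json.loads(manifest_text)
    major, minor, patch = (int(part) for part in manifest["version"].split("."))
    manifest["version"] = f"{major}.{minor}.{patch + 1}"
    return json.dumps(manifest, indent=2)
```

Read manifest.json, pass its text through the function, write the result back, then re-zip the package before uploading.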

❌ Deployment Failures

CloudFormation Stack Failed

Error: “CREATE_FAILED - Resource handler returned message: ‘The specified subnet does not exist’”

Solution:

# For Existing VPC mode, verify subnet IDs
aws ec2 describe-subnets \
  --subnet-ids subnet-xxxxx \
  --region your-region

Error: “CREATE_FAILED - IAM role already exists”

Solution:

# Delete existing role or use different stack name
aws iam delete-role --role-name ec2-management-bot-execution-role
aws iam delete-role --role-name ec2-management-bot-task-role

ECS Task Won’t Start

Error: “ResourceInitializationError: unable to pull secrets or registry auth”

Solution:

  1. Check ECR permissions
  2. Verify marketplace subscription is active
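  3. Check that the execution role has the base ECS permissions (ECR image pull and CloudWatch Logs). This managed policy does not include secretsmanager:GetSecretValue, so confirm separately that the role can read the Teams secret:

    aws iam attach-role-policy \
      --role-name ec2-management-bot-execution-role \
      --policy-arn arn:aws:iam::aws:policy/AmazonECSTaskExecutionRolePolicy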

🧠 Bedrock Model Issues

Error: “ValidationException: The provided model identifier is invalid”

This is the most common deployment issue!

Cause: Amazon Bedrock Claude Sonnet 4 model access is not enabled or not available in your deployment region.

Solution:

  1. Navigate to Amazon Bedrock Console

    • Go to AWS Console → Amazon Bedrock
    • Ensure you’re in the correct region (same as deployment)
  2. Enable Claude Sonnet 4 Model Access

    • Left sidebar → “Model access”
    • Click “Edit” or “Manage model access”
    • Find Anthropic section
    • Enable Claude Sonnet 4:
      • Claude Sonnet 4 (anthropic.claude-sonnet-4-20250514-v1:0)
  3. Submit Request

    • Click “Next” → “Submit”
    • Most requests are approved immediately
    • Wait for status to show “Available”
  4. Verify Access

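    # List the Claude Sonnet 4 models offered in your deployment region.
    # This shows regional availability only; step 5 confirms that access was granted.
    aws bedrock list-foundation-models \
      --region your-region \
      --query 'modelSummaries[?contains(modelId, `claude-sonnet-4`)]'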
    
  5. Test in Bedrock Playground

    • Bedrock Console → Playgrounds → Chat
    • Select Claude Sonnet 4
    • Send test message: “Hello”
    • Should receive response
  6. Restart Application (if already deployed)

    # Force ECS service restart
    aws ecs update-service \
      --cluster your-cluster \
      --service your-service \
      --force-new-deployment
    

Regional Support with Cross-Region Inference Profiles:

Primary Regions (Native Claude Sonnet 4 Support):

  • us-east-1 ✅ (Recommended)
  • us-west-2 ✅
  • eu-west-1 ✅
  • eu-central-1 ✅
  • ap-northeast-1 ✅
  • ap-southeast-2 ✅

Supported via Inference Profiles:

  • eu-west-3 ✅ (via global/EU inference profiles)
  • eu-west-2 ✅ (via global/EU inference profiles)
  • eu-north-1 ✅ (via global/EU inference profiles)
  • ap-southeast-1 ✅ (via global/APAC inference profiles)
  • ap-northeast-2 ✅ (via global/APAC inference profiles)
  • ap-south-1 ✅ (via global/APAC inference profiles)
  • ca-central-1 ✅ (via global inference profiles)
  • sa-east-1 ✅ (via global inference profiles)

How Inference Profiles Work:

  1. Global Profile: global.anthropic.claude-sonnet-4-20250514-v1:0 - Works from any region
  2. Regional Profiles: eu.anthropic.claude-sonnet-4-20250514-v1:0 - Optimized for EU regions
  3. Automatic Fallback: Application automatically tries the best profile for your region

For eu-west-3 Specifically:

  • The application will automatically use global or EU inference profiles
  • No additional configuration required
  • Same Claude Sonnet 4 quality and performance
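The fallback idea described above can be sketched as follows. This is not the application's actual code: the guide names only the `global.` and `eu.` profile IDs, so other regional prefixes are left out, and trying the regional profile before the global one is an assumption. `candidate_model_ids` and `first_working` are hypothetical names.

```python
MODEL_ID = "anthropic.claude-sonnet-4-20250514-v1:0"


def candidate_model_ids(region: str) -> list:
    """Return inference profile IDs to try for a region, most specific first."""
    candidates = []
    if region.startswith("eu-"):
        # Assumption: every eu-* region can use the EU regional profile.
        candidates.append(f"eu.{MODEL_ID}")
    candidates.append(f"global.{MODEL_ID}")
    return candidates


def first_working(candidates, invoke):
    """Call invoke(model_id) for each candidate; return the first that succeeds."""
    last_error = None
    for model_id in candidates:
        try:
            invoke(model_id)
            return model_id
        except Exception as error:  # real code should catch the client error only
            last_error = error
    raise RuntimeError("no inference profile worked") from last_error
```

Here `invoke` stands in for whatever function sends a test prompt to Bedrock with the given model ID.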

Error: “AccessDeniedException: You do not have access to the requested model”

Cause: Model access requested but not yet approved, or using wrong model ID.

Solution:

  1. Check approval status:

    • Bedrock Console → Model access
    • Status should be “Available”, not “Pending”
  2. Wait for approval:

    • Standard models: Usually immediate
    • Advanced models: Can take 24 to 48 hours
    • Check email for approval notification

🔐 Permission Issues

Error: “AccessDeniedException: User is not authorized to perform bedrock:InvokeModel”

Solution:

  1. Add Bedrock permissions to ECS task role:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*:*:inference-profile/*claude*"
      ]
    }
  ]
}
  2. Ensure Bedrock is available in your region
  3. Check Service Control Policies (SCPs) aren’t blocking access

Error: “UnauthorizedOperation: You are not authorized to perform ec2:DescribeInstances”

Solution:

  1. Update task role policy
  2. Check for SCPs (Service Control Policies) blocking access
  3. Verify cross-account permissions if using multiple accounts

💬 Teams Integration Issues

Bot Shows as Offline

Causes & Solutions:

  1. Azure Bot Channel Not Configured

    • Azure Portal → Bot → Channels
    • Ensure Teams channel is enabled
    • Status should be “Running”
  2. API Gateway Throttling

    • Check CloudWatch metrics for 429 errors
  3. Network Connectivity

    • Verify security groups allow HTTPS outbound
    • Check NAT Gateway is functioning (if used)

Messages Not Formatted Correctly

Issue: Bot responses show raw JSON or markdown

Solution:

  1. Update Teams app manifest version
  2. Ensure bot supports Adaptive Cards:
"supportsFiles": false,
"supportsCalling": false,
"supportsVideo": false

Bot Added but Can’t Use Commands

Issue: Bot visible but commands don’t work

Solution:

  1. Check bot is added to channel properly
  2. Verify @ mentions are working
  3. Test in personal chat first
  4. Check Teams app permissions

📞 Getting Support

Before Contacting Support

  1. Collect diagnostic information:

    • Stack name and region
    • Error messages (exact text)
    • CloudWatch logs (last 100 lines)
    • Time of occurrence
  2. Try quick fixes:

    • Restart ECS service
    • Clear Teams cache
    • Re-authenticate bot

Contact Support

Email: support@ohlala.cloud

Include:

  • AWS Account ID
  • Stack Name
  • Error Description
  • Steps to Reproduce
  • Diagnostic Logs

Response Time: 1 business day

6 - Deployment Reference

CloudFormation template parameters and advanced deployment configuration options

Deployment Reference

Technical reference for CloudFormation template parameters and advanced deployment configurations for Ohlala SmartOps.

📋 Parameter Overview

The template supports two deployment modes:

  • NewVPC: Creates complete network infrastructure (recommended)
  • ExistingVPC: Integrates with your existing VPC

🔑 Required Parameters

Deployment Configuration

DeploymentMode

  • Type: String
  • Default: NewVPC
  • Allowed Values: NewVPC, ExistingVPC
  • Description: Choose to create a new VPC or use existing VPC infrastructure

ContainerImageTag

  • Type: String
  • Default: v1.0.0
  • Description: Container image tag version (e.g., v1.0.0, v1.1.0)
  • Example: v1.0.0

Microsoft Teams Configuration

MicrosoftAppId

  • Type: String
  • Description: Microsoft Teams Bot App ID
  • Format: GUID format
  • Example: 12345678-90ab-cdef-1234-567890abcdef
  • Where to find: Azure Portal → Bot Resource → Configuration
  • NoEcho: false

MicrosoftAppPassword

  • Type: String
  • Description: Microsoft Teams Bot App Password
  • Format: String with special characters
  • Example: abcDEF123~hijKLM456-nopQRS789.tuvWXY012
  • Where to find: Created during bot registration (save immediately!)
  • NoEcho: true (hidden in console)

MicrosoftAppTenantId

  • Type: String
  • Description: Microsoft Teams Tenant ID
  • Format: GUID format
  • Example: 87654321-abcd-ef01-4321-0987654321fe
  • Where to find: Azure Portal → Microsoft Entra ID (formerly Azure Active Directory) → Overview
  • NoEcho: false
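Both MicrosoftAppId and MicrosoftAppTenantId must be GUIDs, which contain only hexadecimal digits in an 8-4-4-4-12 layout. A quick check before deploying, as a sketch (`is_guid` is an illustrative helper, not part of the template):

```python
import uuid


def is_guid(value: str) -> bool:
    """Return True if value is a GUID in canonical 8-4-4-4-12 hexadecimal form."""
    try:
        # uuid.UUID also accepts braces and unhyphenated input, so compare the
        # canonical rendering with the input to reject those variants.
        return str(uuid.UUID(value)) == value.lower()
    except ValueError:
        return False
```

A value containing letters outside a-f, or with the wrong group lengths, returns False.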

🏗️ Existing VPC Parameters

These parameters are required only when DeploymentMode is ExistingVPC:

ExistingVPCId

  • Type: String
  • Default: "" (empty)
  • Description: ID of existing VPC (e.g., vpc-12345678)
  • Pattern: ^(vpc-[0-9a-f]{8,17})?$
  • Example: vpc-0123456789abcdef0
  • Constraint: Must be a valid VPC ID or empty for NewVPC mode

ExistingPrivateSubnet1Id

  • Type: String
  • Default: "" (empty)
  • Description: ID of first private subnet (e.g., subnet-12345678)
  • Pattern: ^(subnet-[0-9a-f]{8,17})?$
  • Example: subnet-0123456789abcdef0
  • Requirement: Must be in different AZ from ExistingPrivateSubnet2Id

ExistingPrivateSubnet2Id

  • Type: String
  • Default: "" (empty)
  • Description: ID of second private subnet in different AZ (e.g., subnet-87654321)
  • Pattern: ^(subnet-[0-9a-f]{8,17})?$
  • Example: subnet-0fedcba9876543210
  • Requirement: Must be in different AZ from ExistingPrivateSubnet1Id

ExistingPublicSubnet1Id

  • Type: String
  • Default: "" (empty)
  • Description: ID of first public subnet (e.g., subnet-abcd1234)
  • Pattern: ^(subnet-[0-9a-f]{8,17})?$
  • Example: subnet-0abcd1234ef015678
  • Requirement: Must be in different AZ from ExistingPublicSubnet2Id

ExistingPublicSubnet2Id

  • Type: String
  • Default: "" (empty)
  • Description: ID of second public subnet in different AZ (e.g., subnet-dcba4321)
  • Pattern: ^(subnet-[0-9a-f]{8,17})?$
  • Example: subnet-0dcba432110fe8765
  • Requirement: Must be in different AZ from ExistingPublicSubnet1Id
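The patterns listed for these parameters can be tested locally before launching the stack. The sketch below copies the two regular expressions from this section; `matches_pattern` is an illustrative helper.

```python
import re

# Patterns as documented above. The empty string is allowed because the
# parameters default to "" in NewVPC mode.
VPC_ID = re.compile(r"^(vpc-[0-9a-f]{8,17})?$")
SUBNET_ID = re.compile(r"^(subnet-[0-9a-f]{8,17})?$")


def matches_pattern(value: str, pattern) -> bool:
    """Return True if the whole value matches the compiled pattern."""
    return pattern.fullmatch(value) is not None
```

Only lowercase hexadecimal characters (0-9, a-f) are accepted after the prefix, so an ID containing letters such as g or h is rejected.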

🌐 NewVPC Network Configuration

These parameters are optional and are used only when DeploymentMode is NewVPC:

VPCCIDR

  • Type: String
  • Default: 10.0.0.0/16
  • Description: CIDR block for the VPC
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.0.0/16

PublicSubnet1CIDR

  • Type: String
  • Default: 10.0.1.0/24
  • Description: CIDR block for public subnet 1
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.1.0/24

PublicSubnet2CIDR

  • Type: String
  • Default: 10.0.2.0/24
  • Description: CIDR block for public subnet 2
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.2.0/24

PrivateSubnet1CIDR

  • Type: String
  • Default: 10.0.10.0/24
  • Description: CIDR block for private subnet 1
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.10.0/24

PrivateSubnet2CIDR

  • Type: String
  • Default: 10.0.11.0/24
  • Description: CIDR block for private subnet 2
  • Pattern: Valid IP CIDR range (x.x.x.x/x)
  • Example: 10.0.11.0/24

EnableNATGateway

  • Type: String
  • Default: "true"
  • Allowed Values: "true", "false"
  • Description: Enable NAT Gateway for private subnets
  • Cost Impact: NAT Gateway adds ~$32/month
  • Recommendation: Set to "false" for cost savings if outbound internet access is not needed

📤 Stack Outputs

The template provides these outputs after successful deployment:

APIGatewayEndpoint

  • Description: API Gateway endpoint URL
  • Format: https://{ApiGateway}.execute-api.{Region}.amazonaws.com/prod-{StackName}
  • Usage: Base URL for API access

TeamsWebhookURL

  • Description: URL to configure in Microsoft Teams Bot Framework
  • Format: https://{ApiGateway}.execute-api.{Region}.amazonaws.com/prod-{StackName}/api/messages
  • Usage: Set this as the messaging endpoint in Azure Bot Configuration

ECSCluster

  • Description: ECS Cluster Name
  • Format: OhlalaSmartOps-Cluster-{StackName}
  • Usage: For monitoring and management

ECSService

  • Description: ECS Service Name
  • Format: OhlalaSmartOps-Service-{StackName}
  • Usage: For monitoring and scaling

VPCId

  • Description: VPC ID (created or existing)
  • Format: vpc-xxxxxxxxx
  • Usage: For reference and additional resource creation
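Several resource names in this documentation follow the stack name. The sketch below gathers the naming patterns quoted in this guide into one place; `resource_names` is a hypothetical helper, and the real values should always be confirmed against your stack outputs.

```python
def resource_names(stack_name: str) -> dict:
    """Derive the resource names this guide documents for a given stack name."""
    return {
        "ecs_cluster": f"OhlalaSmartOps-Cluster-{stack_name}",
        "ecs_service": f"OhlalaSmartOps-Service-{stack_name}",
        "ecs_log_group": f"/aws/ecs/ohlala-smartops-{stack_name}",
        "authorizer_log_group": f"/aws/lambda/ohlala-authorizer-{stack_name}",
        "teams_secret_id": f"ohlala-smartops-teams-{stack_name}",
        "api_stage": f"prod-{stack_name}",
    }
```

This is convenient when filling in the cluster, service, and log group placeholders used by the troubleshooting commands.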

🚀 Deployment Examples

Simple NewVPC Deployment

Parameters:
  DeploymentMode: NewVPC
  ContainerImageTag: v1.0.0
  MicrosoftAppId: "12345678-90ab-cdef-1234-567890abcdef"
  MicrosoftAppPassword: "your-secret-password"
  MicrosoftAppTenantId: "87654321-abcd-ef01-4321-0987654321fe"
  EnableNATGateway: "false"  # Cost optimization

Custom NewVPC with Different CIDR

Parameters:
  DeploymentMode: NewVPC
  VPCCIDR: "172.16.0.0/16"
  PublicSubnet1CIDR: "172.16.1.0/24"
  PublicSubnet2CIDR: "172.16.2.0/24"
  PrivateSubnet1CIDR: "172.16.10.0/24"
  PrivateSubnet2CIDR: "172.16.11.0/24"
  EnableNATGateway: "true"
  # ... Teams parameters

ExistingVPC Deployment

Parameters:
  DeploymentMode: ExistingVPC
  ExistingVPCId: "vpc-0123456789abcdef0"
  ExistingPrivateSubnet1Id: "subnet-0123456789abcdef0"
  ExistingPrivateSubnet2Id: "subnet-0fedcba9876543210"
  ExistingPublicSubnet1Id: "subnet-0abcd1234ef015678"
  ExistingPublicSubnet2Id: "subnet-0dcba432110fe8765"
  # ... Teams parameters
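If you deploy with the AWS CLI rather than the console, parameter sets like the ones above need to be in the JSON list format that `aws cloudformation create-stack --parameters file://...` accepts. A minimal conversion sketch (`to_cfn_parameters` is an illustrative name; whether your subscription supports CLI deployment is not covered here):

```python
import json


def to_cfn_parameters(values: dict) -> str:
    """Render a dict as a CloudFormation CLI parameters file.

    Each entry becomes {"ParameterKey": ..., "ParameterValue": ...}, with
    values converted to strings because the template types are all String.
    """
    entries = [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]
    return json.dumps(entries, indent=2)
```

Write the returned text to a file and keep it out of version control, since it contains MicrosoftAppPassword.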

🔍 Parameter Validation

The template includes validation rules:

Pattern Validation

  • VPC IDs: Must match vpc- followed by 8-17 hex characters
  • Subnet IDs: Must match subnet- followed by 8-17 hex characters
  • CIDR Blocks: Must be valid IP CIDR format

Logical Validation

  • ExistingVPC mode requires all four subnet IDs
  • Subnets must be in at least 2 different availability zones
  • CIDR blocks must not overlap
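The CIDR rule can be verified before deployment with Python's standard `ipaddress` module. This sketch checks for overlaps and, as an extra check not listed above, that every subnet lies inside the VPC range. `cidr_problems` is an illustrative helper, not the template's own validation.

```python
import ipaddress
from itertools import combinations


def cidr_problems(vpc_cidr: str, subnet_cidrs: dict) -> list:
    """Report subnets that fall outside the VPC range or overlap each other."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = [
        f"{name} ({net}) is outside the VPC range {vpc}"
        for name, net in subnets.items()
        if not net.subnet_of(vpc)
    ]
    # Compare every pair of subnets once.
    for (name_a, net_a), (name_b, net_b) in combinations(subnets.items(), 2):
        if net_a.overlaps(net_b):
            problems.append(f"{name_a} ({net_a}) overlaps {name_b} ({net_b})")
    return problems
```

With the template defaults (10.0.0.0/16 and the four /24 subnets) the function returns an empty list.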

Cross-Parameter Rules

  • If DeploymentMode: ExistingVPC, all existing VPC parameters are required
  • If DeploymentMode: NewVPC, existing VPC parameters are ignored
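The cross-parameter rules can be expressed as a short pre-flight check. This is a sketch of the rules as stated above, not the template's implementation; `missing_parameters` is a hypothetical name.

```python
EXISTING_VPC_PARAMETERS = (
    "ExistingVPCId",
    "ExistingPrivateSubnet1Id",
    "ExistingPrivateSubnet2Id",
    "ExistingPublicSubnet1Id",
    "ExistingPublicSubnet2Id",
)


def missing_parameters(parameters: dict) -> list:
    """List the existing-VPC parameters that are required but empty or absent."""
    mode = parameters.get("DeploymentMode", "NewVPC")
    if mode not in ("NewVPC", "ExistingVPC"):
        raise ValueError(f"unsupported DeploymentMode: {mode}")
    if mode == "NewVPC":
        return []  # existing VPC parameters are ignored in this mode
    return [name for name in EXISTING_VPC_PARAMETERS if not parameters.get(name)]
```

An empty result means the parameter set satisfies the rules for its deployment mode.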

💰 Cost Impact by Parameter

| Parameter | Cost Impact | Notes |
|---|---|---|
| EnableNATGateway: "true" | +$32/month | Only for NewVPC mode |
| EnableNATGateway: "false" | $0 | Saves money but no outbound internet |
| DeploymentMode: ExistingVPC | $0 | Uses existing network infrastructure |
| ContainerImageTag | $0 | No cost difference between versions |

🚨 Common Parameter Errors

Missing Required Parameters

Template validation error: Parameter 'MicrosoftAppId' must have a value

Solution: Provide all required Teams configuration parameters

Invalid VPC ID Format

Parameter validation failed: vpc-invalid does not match pattern

Solution: Use correct format: vpc- + 8-17 hex characters

Subnet AZ Requirements Not Met

The subnet IDs must be in at least two different availability zones

Solution: Choose subnets from different AZs in your region

ExistingVPC Missing Parameters

When using ExistingVPC mode, you must provide all subnet IDs

Solution: Provide all four subnet parameters for ExistingVPC mode

📚 Additional Resources