
Architecture & Limitations

System architecture, design decisions, and current limitations of Ohlala SmartOps

    Understanding the system design, architectural decisions, and current limitations of Ohlala SmartOps.

    🏗️ System Architecture

    High-Level Overview

    Ohlala SmartOps follows a containerized, serverless architecture designed for high availability and cost efficiency:

    Figure: High-level architecture diagram showing user interaction with Microsoft Teams, API Gateway, ECS Fargate, Amazon Bedrock, and other AWS services

    Container Architecture

    The multi-container design gives each container a dedicated responsibility:

    Main Bot Container

    • Purpose: Teams integration, conversation orchestration, Bedrock AI
    • Port: 8000
    • Resources: 768 CPU units (0.75 vCPU), 1536 MB memory
    • Key Features:
      • Microsoft Bot Framework integration
      • Amazon Bedrock (Claude) orchestration
      • Conversation state management
      • Multi-language support

    MCP AWS API Container

    • Purpose: Secure AWS operations via Model Context Protocol
    • Port: 8080
    • Resources: 256 CPU units (0.25 vCPU), 512 MB memory
    • Key Features:
      • AWS service abstractions
      • Permission-aware operations
      • Rate limiting and retry logic
      • Security-first design
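    The resource figures above map directly onto ECS container definitions. The sketch below assumes both containers run in one Fargate task (this page does not say so explicitly) and uses hypothetical container names; it sums the per-container reservations and checks the total against the Fargate task sizes supported up to 4 vCPU.

```python
# Hypothetical container definitions mirroring the figures on this page.
# Keys follow ECS container-definition field names; ECS memory is in MiB.
CONTAINERS = [
    {"name": "main-bot", "cpu": 768, "memory": 1536,
     "portMappings": [{"containerPort": 8000, "protocol": "tcp"}]},
    {"name": "mcp-aws-api", "cpu": 256, "memory": 512,
     "portMappings": [{"containerPort": 8080, "protocol": "tcp"}]},
]

# Fargate (Linux) task sizes up to 4 vCPU: CPU units -> allowed memory (MiB).
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4096 + 1, 1024)),
    1024: list(range(2048, 8192 + 1, 1024)),
    2048: list(range(4096, 16384 + 1, 1024)),
    4096: list(range(8192, 30720 + 1, 1024)),
}


def task_totals(containers):
    """Sum CPU units and memory (MiB) reserved by all containers in one task."""
    cpu = sum(c["cpu"] for c in containers)
    memory = sum(c["memory"] for c in containers)
    return cpu, memory


def is_valid_fargate_size(cpu, memory):
    """True if (cpu, memory) is an exact Fargate task-size combination."""
    return memory in FARGATE_SIZES.get(cpu, [])


cpu, memory = task_totals(CONTAINERS)
print(f"task size: {cpu} CPU units ({cpu / 1024:g} vCPU), {memory} MiB")
print("valid Fargate size:", is_valid_fargate_size(cpu, memory))
```

    With these figures the task totals 1024 CPU units (1 vCPU) and 2048 MiB, the smallest memory setting Fargate accepts at 1 vCPU.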

    🎯 Architecture Highlights

    🚀 Fully Serverless

    ECS Fargate + API Gateway eliminate infrastructure management overhead

    • Zero server maintenance - AWS patches and manages the underlying hosts
    • Automatic scaling - Responds to demand without intervention
    • Pay-per-use pricing - Billed per second for the vCPU and memory allocated to running tasks
    • Note: ~30s cold start for new container instances
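    Fargate bills for the vCPU and memory a task reserves for as long as it runs, so a service that always keeps one task up accrues cost around the clock. A rough estimate, assuming the 1 vCPU / 2 GB task implied by the container figures above; the per-hour rates are placeholders to replace with current pricing for your region, and the result covers Fargate compute only (not API Gateway, the load balancer, or Bedrock).

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def fargate_monthly_cost(vcpu, memory_gb, vcpu_hour_rate, gb_hour_rate,
                         tasks=1, hours=HOURS_PER_MONTH):
    """Estimate Fargate compute cost for tasks that run continuously.

    Cost = (vCPU * vCPU-hour rate + GB * GB-hour rate) * hours * tasks.
    """
    hourly = vcpu * vcpu_hour_rate + memory_gb * gb_hour_rate
    return hourly * hours * tasks


# Placeholder rates (USD per hour); substitute current pricing for your region.
VCPU_HOUR = 0.04048
GB_HOUR = 0.004445

cost = fargate_monthly_cost(vcpu=1, memory_gb=2,
                            vcpu_hour_rate=VCPU_HOUR, gb_hour_rate=GB_HOUR)
print(f"~${cost:.2f}/month for one always-on task")
```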

    🔒 Security-First Design

    Defense in depth with multiple security layers

    • Private subnets - Containers have no direct internet exposure
    • Isolated containers - Bot logic and AWS operations run separately
    • JWT validation - Lambda authorizer validates all requests
    • Secrets management - Credentials stored in AWS Secrets Manager
    • Least privilege IAM - Each component has minimal required permissions
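    The page does not show how the Lambda authorizer validates tokens. As a minimal sketch, assuming a REST API TOKEN authorizer and Bot Framework channel tokens (issuer `https://api.botframework.com`, audience equal to the bot's Microsoft App ID), the claim checks and the returned IAM policy could look like this. `EXPECTED_AUDIENCE` and the `principalId` value are placeholders. The sketch deliberately omits signature verification against the Bot Framework signing keys, which a real authorizer must perform with a JWT library before trusting any claim.

```python
import base64
import json
import time

# Assumed issuer for Bot Framework channel tokens; the audience is a
# hypothetical placeholder for the bot's Microsoft App ID.
EXPECTED_ISSUER = "https://api.botframework.com"
EXPECTED_AUDIENCE = "00000000-0000-0000-0000-000000000000"


def decode_jwt_payload(token):
    """Decode the payload segment of a compact JWT WITHOUT verifying it."""
    payload_b64 = token.split(".")[1]
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)  # restore padding
    return json.loads(base64.urlsafe_b64decode(padded))


def claims_are_valid(claims, now=None):
    """Check issuer, audience and expiry only; the signature is NOT checked."""
    now = time.time() if now is None else now
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    return (claims.get("iss") == EXPECTED_ISSUER
            and EXPECTED_AUDIENCE in audiences
            and claims.get("exp", 0) > now)


def handler(event, context):
    """TOKEN authorizer: return an IAM policy that allows or denies the call."""
    token = event.get("authorizationToken", "").removeprefix("Bearer ")
    try:
        allowed = claims_are_valid(decode_jwt_payload(token))
    except (IndexError, ValueError, AttributeError, TypeError):
        allowed = False  # malformed token: deny rather than error out
    return {
        "principalId": "bot-framework",
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [{
                "Action": "execute-api:Invoke",
                "Effect": "Allow" if allowed else "Deny",
                "Resource": event.get("methodArn", "*"),
            }],
        },
    }
```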

    📦 Microservices Architecture

    Multi-container pattern for better maintainability

    • Main bot container - Handles Teams interactions and AI orchestration
    • MCP AWS container - Provides secure AWS API access
    • Clear boundaries - Each container has a single responsibility
    • Independent updates - Deploy changes without affecting other components

    💾 Stateless by Design

    No persistent storage keeps the architecture simple

    • Reduced complexity - No database to manage or scale
    • Lower costs - No database charges or backup requirements
    • Horizontal scaling - Any container can handle any request
    • Trade-off: Conversation context resets on container restart
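    The trade-off above follows from keeping conversation context in process memory. A minimal sketch of such a store, assuming a bounded per-conversation history; `ConversationMemory` and `max_turns` are hypothetical names, not the product's code. Everything lives in a dictionary inside the running process, so a restarted container starts empty.

```python
from collections import defaultdict, deque


class ConversationMemory:
    """Per-conversation history held only in process memory (hypothetical).

    Nothing is written to disk or a database, so a new container
    starts with an empty store and earlier context is gone.
    """

    def __init__(self, max_turns=20):
        # Each conversation keeps at most max_turns of its most recent messages.
        self._history = defaultdict(lambda: deque(maxlen=max_turns))

    def add(self, conversation_id, role, text):
        self._history[conversation_id].append({"role": role, "text": text})

    def context(self, conversation_id):
        # .get() does not create an entry for unknown conversations.
        return list(self._history.get(conversation_id, ()))
```

    Capping history with `deque(maxlen=...)` also keeps memory use and prompt size bounded, which fits the advice to keep conversations short and focused.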

    🌍 Regional Flexibility

    Deploy in any AWS region that offers the required services, with one independent stack per region

    • Data sovereignty - Keep data in your required region
    • Low latency - Deploy close to your EC2 instances
    • Cost optimization - No cross-region data transfer fees
    • Simple disaster recovery - Deploy multiple independent stacks

    ⚑ High-Performance Networking

    Optimized for Teams integration with enterprise-grade networking

    • Network Load Balancer - Layer 4 load balancing for minimal latency
    • VPC Link - Secure private connection from API Gateway
    • Automatic scaling - The NLB scales automatically to absorb traffic spikes
    • Health checks - Traffic is routed away from containers that fail health checks
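    For the load balancer to route around unhealthy containers, each container needs an endpoint it can probe. A minimal standard-library sketch, assuming an HTTP health check on a hypothetical `/health` path; the real path and web framework used by the containers are not documented here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answer GET /health with 200 so load balancer checks pass (hypothetical path)."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # keep frequent health-check probes out of the logs


def make_server(port=8000, host="0.0.0.0"):
    """Build (but do not start) the health server."""
    return HTTPServer((host, port), HealthHandler)
```

    Start it with `make_server(8000).serve_forever()` for the main bot container, or port 8080 for the MCP container.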

    📊 Performance Characteristics

    Response Times

    • Health Check: < 1 second
    • Simple Commands: 2-5 seconds
    • AI Analysis: 5-15 seconds
    • SSM Operations: 10-60 seconds (depending on command)
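    The 10 to 60 second range for SSM operations reflects waiting for Run Command to finish on the target instance. A minimal sketch of a bounded polling loop, assuming status is read through boto3's `ssm.get_command_invocation(...)["Status"]`; `wait_for_command` and its defaults are illustrative, and the status reader, clock, and sleep are injected so the loop can be exercised without AWS.

```python
import time

# Statuses after which an SSM command invocation will not change again.
TERMINAL = {"Success", "Failed", "Cancelled", "TimedOut"}


def wait_for_command(get_status, timeout=60.0, interval=2.0,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until a terminal status or until timeout elapses.

    get_status is any zero-argument callable returning the current status,
    e.g. a wrapper around ssm.get_command_invocation(...)["Status"].
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in TERMINAL:
            return status
        if clock() >= deadline:
            raise TimeoutError(f"command still {status!r} after {timeout}s")
        sleep(interval)
```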

    Throughput Limits

    • Concurrent Users: 1-20 (single task)
    • Commands/Day: 10,000+ (with proper scaling)
    • API Gateway: 10,000 requests/second (default account-level quota per region)
    • Bedrock: 20 requests/minute per model (AWS quota; varies by model and region)
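    Because the Bedrock quota is counted per minute, a burst of requests beyond it gets throttled. One way to stay under it on the client side is a sliding-window limiter, sketched below with the 20 requests/minute figure above as the default; keep it configurable, since actual quotas differ by model and region. The class name is hypothetical, and a rejected call would typically be queued or retried after a short delay.

```python
import time
from collections import deque


class RequestsPerMinuteLimiter:
    """Sliding-window limiter: at most max_requests calls per `window` seconds."""

    def __init__(self, max_requests=20, window=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self._sent = deque()  # timestamps of requests inside the window

    def try_acquire(self):
        """Record a request and return True if it fits the window, else False."""
        now = self.clock()
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()  # forget requests older than the window
        if len(self._sent) < self.max_requests:
            self._sent.append(now)
            return True
        return False
```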

    Scaling Behavior

    • Auto-scaling: The ECS service runs a single task and auto-heals by replacing it if it stops or becomes unhealthy
    • Cold start: ~30 seconds for new tasks

    ⚠️ Current Limitations

    1. Session Management

    • Issue: No persistent conversation history
    • Impact: Context lost on container restart
    • Workaround: Keep conversations short and focused

    2. Multi-Region Support

    • Issue: Single region deployment only
    • Impact: No built-in disaster recovery
    • Workaround: Deploy multiple stacks in different regions

    3. Cold Start Latency

    • Issue: 30+ second delay for new container starts
    • Impact: First request after idle period is slow
    • Workaround: Keep at least one task running at all times
    • Mitigation: ECS warmup targets available

    🔒 Security Architecture

    Network Security

    • Private Subnets: Containers have no direct internet access
    • Security Groups: Restrictive ingress/egress rules
    • VPC Endpoints: Secure access to AWS services

    Authentication & Authorization

    • Teams Authentication: Microsoft Bot Framework JWT validation
    • AWS Permissions: IAM roles with least-privilege access
    • Inter-Container: Shared API key for MCP communication
    • Secrets: AWS Secrets Manager for sensitive data
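    A shared API key between containers is only as strong as the check on the receiving side. A minimal sketch for the MCP container, assuming the key arrives in a hypothetical `X-Api-Key` header and is loaded at startup from Secrets Manager (for example through the `secrets` field of the ECS task definition, which injects it as an environment variable).

```python
import hmac

# Hypothetical header name; the header actually used between containers
# is not documented on this page.
API_KEY_HEADER = "X-Api-Key"


def is_authorized(headers, expected_key):
    """Return True only if the request carries the shared MCP API key.

    hmac.compare_digest runs in constant time, so response timing does not
    reveal how much of a guessed key was correct. An empty expected key
    always fails, so a missing secret cannot silently disable the check.
    """
    supplied = headers.get(API_KEY_HEADER, "")
    return bool(expected_key) and hmac.compare_digest(
        supplied.encode(), expected_key.encode())
```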

    Data Protection

    • Encryption in Transit: TLS 1.2+ for all communication
    • Encryption at Rest: Fargate task ephemeral storage encrypted by default
    • Logging: CloudWatch Logs with retention policies
    • Audit Trail: AWS API calls (management events) logged via CloudTrail

    📖 Technical References

    Container Images

    • Registry: Amazon ECR
    • Repository: 709825985650.dkr.ecr.us-east-1.amazonaws.com/ohlala-automation-solutions/
    • Tags: Version-based (v1.0.0, v1.1.0, etc.)

    Monitoring & Observability

    • Metrics: CloudWatch Container Insights
    • Logs: Structured JSON logging to CloudWatch
    • Health Checks: HTTP endpoints on both containers
    • Alarms: CPU, Memory, Error Rate monitoring
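    CloudWatch Logs Insights discovers fields in JSON log lines automatically, which is the main payoff of structured logging. A minimal standard-library sketch of a JSON formatter, assuming container output reaches CloudWatch through the ECS `awslogs` log driver; the field names are illustrative, not the containers' actual log schema.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        entry = {
            "timestamp": self.formatTime(record, "%Y-%m-%dT%H:%M:%S%z"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            entry["exception"] = self.formatException(record.exc_info)
        return json.dumps(entry)


def configure_logging(level=logging.INFO):
    """Write JSON lines to stderr, which the awslogs driver forwards."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    root = logging.getLogger()
    root.handlers[:] = [handler]
    root.setLevel(level)
```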

    Backup & Recovery

    • Container Images: Immutable, versioned in ECR
    • Infrastructure: CloudFormation templates in version control
    • Configuration: Environment variables and secrets
    • No Persistent Data: Stateless design eliminates backup needs

    📚 Additional Resources

    Need Help?