|
| 1 | +# Nova Sonic Architecture |
| 2 | + |
| 3 | +This document provides a comprehensive overview of the Nova Sonic speech-to-speech application architecture. |
| 4 | + |
| 5 | +## System Overview |
| 6 | + |
| 7 | +Nova Sonic is a speech-to-speech conversation application that enables real-time voice interactions with an AI assistant powered by Amazon Bedrock. The application uses WebRTC for low-latency audio streaming, offers a restaurant booking capability, and delivers real-time events through the AWS AppSync Events API. As illustrated in Figure 1, the system consists of frontend and backend components that work together with several AWS services. |
| 8 | + |
| 9 | + |
| 10 | +*Figure 1: High-level architecture of the Nova Sonic application showing the main components and their interactions.* |
| 11 | + |
| 12 | +## Key Components |
| 13 | + |
| 14 | +The following components are illustrated in the high-level architecture diagram (Figure 1) and interact as shown in the data flow diagram (Figure 2): |
| 15 | + |
| 16 | +### Frontend Web Application |
| 17 | + |
| 18 | +- **Technology**: React-based web application |
| 19 | +- **Deployment**: ECS Fargate service with Application Load Balancer |
| 20 | +- **Features**: |
| 21 | + - WebRTC integration for audio streaming |
| 22 | + - Real-time conversation display |
| 23 | + - Restaurant booking interface |
| 24 | + - AppSync Events API client for real-time updates |
| 25 | + |
| 26 | +### Backend API Service |
| 27 | + |
| 28 | +- **Technology**: Python-based API service |
| 29 | +- **Deployment Options**: |
| 30 | + - ECS Fargate service (default) |
| 31 | + - EC2 instances running Docker containers (alternative) |
| 32 | +- **Features**: |
| 33 | + - Audio processing and streaming |
| 34 | + - Integration with Amazon Bedrock |
| 35 | + - DynamoDB interaction for data persistence |
| 36 | + - Restaurant booking API integration |
| 37 | + - AppSync Events API integration for real-time events |
| 38 | + |
| 39 | +### AWS Services |
| 40 | + |
| 41 | +#### Amazon Bedrock |
| 42 | + |
| 43 | +- Powers the AI conversation capabilities |
| 44 | +- Provides Nova Sonic for speech synthesis |
| 45 | +- Enables natural language understanding and generation |
| 46 | + |
| 47 | +#### Amazon DynamoDB |
| 48 | + |
| 49 | +- **Tables**: |
| 50 | + - `NovaSonicConversations`: Stores conversation history |
| 51 | + - `RestaurantBookings`: Stores restaurant booking information |
| 52 | +- **Features**: |
| 53 | + - DynamoDB Streams for change data capture |
| 54 | + - On-demand capacity for cost optimization |
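The table settings listed above (on-demand capacity plus DynamoDB Streams) can be sketched as the keyword arguments one would pass to boto3's `create_table`. This is a minimal sketch, not the project's actual table definition: the partition key names `conversation_id` and `booking_id` and the `NEW_AND_OLD_IMAGES` stream view are assumptions, since the document does not specify them.

```python
def table_params(table_name: str, key_name: str) -> dict:
    """Build create_table arguments for an on-demand table with a stream."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": key_name, "AttributeType": "S"},
        ],
        "KeySchema": [{"AttributeName": key_name, "KeyType": "HASH"}],
        # On-demand capacity: pay per request, no provisioned throughput.
        "BillingMode": "PAY_PER_REQUEST",
        # Streams enable change data capture; the view type is an assumption.
        "StreamSpecification": {
            "StreamEnabled": True,
            "StreamViewType": "NEW_AND_OLD_IMAGES",
        },
    }


# Table names come from this document; the key names are hypothetical.
TABLES = {
    "NovaSonicConversations": "conversation_id",
    "RestaurantBookings": "booking_id",
}

ALL_TABLE_PARAMS = [table_params(name, key) for name, key in TABLES.items()]
```

Each dictionary could then be passed as `boto3.client("dynamodb").create_table(**params)`; the sketch stops short of that call so it runs without AWS credentials.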
| 55 | + |
| 56 | +#### AWS AppSync Events API |
| 57 | + |
| 58 | +- Provides real-time publish/subscribe functionality |
| 59 | +- Enables real-time updates for conversation and booking events |
| 60 | +- Integrates with DynamoDB Streams via Lambda function |
| 61 | + |
| 62 | +#### Amazon Elastic Container Service (ECS) |
| 63 | + |
| 64 | +- Manages container deployment and orchestration |
| 65 | +- Supports both Fargate and EC2 launch types |
| 66 | +- Auto-scaling based on CPU utilization |
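CPU-based auto-scaling for an ECS service is commonly expressed as a target tracking policy in Application Auto Scaling. The sketch below builds the arguments for boto3's `put_scaling_policy`; the cluster name, service name, and 60% CPU target are illustrative assumptions, not values taken from this project.

```python
def cpu_target_tracking_policy(
    cluster: str, service: str, target_cpu_percent: float = 60.0
) -> dict:
    """Build put_scaling_policy arguments that track average service CPU."""
    return {
        "PolicyName": f"{service}-cpu-target-tracking",
        "ServiceNamespace": "ecs",
        "ResourceId": f"service/{cluster}/{service}",
        "ScalableDimension": "ecs:service:DesiredCount",
        "PolicyType": "TargetTrackingScaling",
        "TargetTrackingScalingPolicyConfiguration": {
            # Tasks are added or removed to hold average CPU near this value.
            "TargetValue": target_cpu_percent,
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "ECSServiceAverageCPUUtilization",
            },
        },
    }


# Hypothetical cluster and service names, for illustration only.
POLICY = cpu_target_tracking_policy("nova-sonic-cluster", "nova-sonic-api")
```

The service would first need to be registered as a scalable target (`register_scalable_target`) with minimum and maximum task counts before a policy like this can be attached.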
| 67 | + |
| 68 | +#### Elastic Load Balancing |
| 69 | + |
| 70 | +- **Application Load Balancer (ALB)**: |
| 71 | + - Distributes HTTP/HTTPS traffic |
| 72 | + - Supports WebSocket connections |
| 73 | + - Enables HTTPS with AWS Certificate Manager |
| 74 | +- **Network Load Balancer (NLB)**: |
| 75 | + - Handles WebRTC UDP traffic |
| 76 | + - Enables low-latency audio streaming |
| 77 | + |
| 78 | +#### Amazon Route 53 |
| 79 | + |
| 80 | +- DNS management for custom domains |
| 81 | +- Integration with AWS Certificate Manager for HTTPS |
| 82 | + |
| 83 | +## Network Architecture |
| 84 | + |
| 85 | +The network architecture, as depicted in Figure 3, includes the following components: |
| 86 | + |
| 87 | +### VPC Configuration |
| 88 | + |
| 89 | +- VPC with public and private subnets across 2 availability zones |
| 90 | +- NAT Gateway for outbound internet access from private subnets |
| 91 | +- Security groups for fine-grained access control |
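To make the subnet layout concrete, the sketch below carves a VPC address range into one public and one private subnet per availability zone using Python's standard `ipaddress` module. The `10.0.0.0/16` VPC range and the `/24` subnet size are assumptions chosen for illustration; the document does not state the actual CIDR blocks.

```python
import ipaddress
from itertools import islice


def plan_subnets(vpc_cidr: str, az_count: int = 2, new_prefix: int = 24) -> dict:
    """Allocate one public and one private subnet per availability zone."""
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = 2 * az_count
    # Take the first `needed` equally sized blocks from the VPC range.
    blocks = [str(net) for net in islice(vpc.subnets(new_prefix=new_prefix), needed)]
    if len(blocks) < needed:
        raise ValueError("VPC range is too small for the requested subnets")
    return {"public": blocks[:az_count], "private": blocks[az_count:]}


# Assumed VPC range; two availability zones as described above.
SUBNET_PLAN = plan_subnets("10.0.0.0/16", az_count=2)
```

In this layout the load balancers and the NAT Gateway would sit in the public subnets, while the containers run in the private ones.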
| 92 | + |
| 93 | +### Security Groups |
| 94 | + |
| 95 | +- **API Security Group**: Controls access to the API service |
| 96 | +- **API Load Balancer Security Group**: Controls access to the API load balancer |
| 97 | +- **WebRTC Load Balancer Security Group**: Controls access to the WebRTC load balancer |
| 98 | +- **Webapp Security Group**: Controls access to the web application |
| 99 | + |
| 100 | +## Data Flow |
| 101 | + |
| 102 | +The following diagram (Figure 2) illustrates how data flows through the Nova Sonic system during user interactions: |
| 103 | + |
| 104 | + |
| 105 | +*Figure 2: Data flow diagram illustrating how information moves through the Nova Sonic system during user interactions.* |
| 106 | + |
| 107 | +### Speech-to-Speech Conversation Flow |
| 108 | + |
| 109 | +1. User speaks into the microphone on the web application |
| 110 | +2. Audio is streamed via WebRTC to the backend API |
| 111 | +3. API sends the audio to Amazon Transcribe for speech-to-text conversion |
| 112 | +4. Transcribed text is sent to Amazon Bedrock for processing |
| 113 | +5. Bedrock generates a response |
| 114 | +6. Response is sent to Amazon Nova Sonic for text-to-speech conversion |
| 115 | +7. Synthesized speech is streamed back to the user via WebRTC |
| 116 | +8. Conversation history is stored in DynamoDB |
| 117 | +9. DynamoDB Streams trigger Lambda function |
| 118 | +10. Lambda function publishes events to AppSync Events API |
| 119 | +11. Web application receives real-time updates via AppSync subscription |
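Steps 9 and 10 above hinge on a Lambda function that turns DynamoDB stream records into plain events. The following is a minimal sketch of that conversion, assuming the standard DynamoDB Streams event shape; the function names are hypothetical, and only the most common attribute types are handled. It does not publish anything, it only prepares the events.

```python
def from_attribute_value(value: dict):
    """Convert one DynamoDB attribute value ({"S": "x"}, {"N": "1"}, ...) to Python."""
    type_tag, raw = next(iter(value.items()))
    if type_tag == "S" or type_tag == "BOOL":
        return raw
    if type_tag == "N":
        try:
            return int(raw)
        except ValueError:
            return float(raw)
    if type_tag == "NULL":
        return None
    if type_tag == "L":
        return [from_attribute_value(v) for v in raw]
    if type_tag == "M":
        return {k: from_attribute_value(v) for k, v in raw.items()}
    raise ValueError(f"unsupported attribute type: {type_tag}")


def table_name_from_arn(stream_arn: str) -> str:
    """Extract the table name from a stream ARN (...:table/<name>/stream/<ts>)."""
    return stream_arn.split(":table/")[1].split("/")[0]


def records_to_events(stream_event: dict) -> list:
    """Map stream records to simple event dictionaries, skipping deletions."""
    events = []
    for record in stream_event.get("Records", []):
        image = record["dynamodb"].get("NewImage")
        if image is None:  # REMOVE records carry no new image
            continue
        events.append({
            "table": table_name_from_arn(record["eventSourceARN"]),
            "action": record["eventName"],
            "item": {k: from_attribute_value(v) for k, v in image.items()},
        })
    return events
```

A real handler would call `records_to_events(event)` inside `lambda_handler(event, context)` and then publish the result to the AppSync Events API.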
| 120 | + |
| 121 | +### Restaurant Booking Flow |
| 122 | + |
| 123 | +1. User requests to book a restaurant through voice conversation |
| 124 | +2. Bedrock identifies the booking intent and extracts details |
| 125 | +3. API sends booking request to the Restaurant Booking API |
| 126 | +4. Booking confirmation is stored in DynamoDB |
| 127 | +5. DynamoDB Streams trigger Lambda function |
| 128 | +6. Lambda function publishes booking event to AppSync Events API |
| 129 | +7. Web application receives real-time booking confirmation via AppSync subscription |
| 130 | + |
| 131 | +## Real-time Event System |
| 132 | + |
| 133 | +The AppSync Events API delivers changes captured from the DynamoDB tables to subscribed clients over a real-time publish/subscribe channel. This keeps conversation and booking events up to date in the web application, as illustrated in the data flow diagram (Figure 2). |
| 134 | + |
| 135 | +### Components |
| 136 | + |
| 137 | +- **AppSync Events API**: AWS AppSync API configured as an EVENTS API |
| 138 | +- **DynamoDB Tables with Streams**: Tables with DynamoDB streams enabled |
| 139 | +- **Lambda Function**: Processes DynamoDB stream events and publishes to AppSync |
| 140 | +- **Client Integration**: Web clients subscribe to AppSync for real-time updates |
| 141 | + |
| 142 | +### Channels |
| 143 | + |
| 144 | +- **restaurant-booking**: Events related to restaurant bookings |
| 145 | +- **conversations**: Events related to conversation transcripts |
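The mapping from tables to channels, and the shape of a publish request, can be sketched as follows. This assumes the AppSync Events HTTP publish format, in which the request body carries a `channel` path and an `events` array of JSON-encoded strings, with a small per-request batch limit (five, to the best of my understanding). The `/updates` channel segment and the helper names are hypothetical; only the two channel names come from this document.

```python
import json

# Table names and channel namespaces are from this document;
# the "/updates" segment is an assumed channel path.
CHANNEL_BY_TABLE = {
    "RestaurantBookings": "/restaurant-booking/updates",
    "NovaSonicConversations": "/conversations/updates",
}

MAX_EVENTS_PER_PUBLISH = 5  # assumed batch limit per publish request


def build_publish_bodies(channel: str, events: list) -> list:
    """Split events into batches and build one request body per batch."""
    bodies = []
    for start in range(0, len(events), MAX_EVENTS_PER_PUBLISH):
        batch = events[start:start + MAX_EVENTS_PER_PUBLISH]
        bodies.append({
            "channel": channel,
            # Each event is sent as a JSON-encoded string.
            "events": [json.dumps(event) for event in batch],
        })
    return bodies
```

Each body would then be sent as an authenticated HTTP POST to the API's event endpoint; the endpoint URL and the authorization mode (API key, IAM, or another option) depend on how the Events API is configured and are not shown here.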
| 146 | + |
| 147 | +## Deployment Options |
| 148 | + |
| 149 | +Nova Sonic supports two deployment options: ECS-based (the default) and EC2-based. The detailed architecture diagram (Figure 3) shows the infrastructure components and how they interact in each scenario: |
| 150 | + |
| 151 | + |
| 152 | +*Figure 3: Detailed architecture diagram showing the deployment options and infrastructure components of the Nova Sonic system.* |
| 153 | + |
| 154 | +### ECS-based Deployment (Default) |
| 155 | + |
| 156 | +``` |
| 157 | +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ |
| 158 | +│ Application │ │ ECS │ │ EC2 Instance │ |
| 159 | +│ Load Balancer │────▶│ Service │────▶│ or Fargate │ |
| 160 | +└───────────────┘ └───────────────┘ └───────────────┘ |
| 161 | + │ |
| 162 | + ▼ |
| 163 | + ┌───────────────┐ |
| 164 | + │ DynamoDB │ |
| 165 | + └───────────────┘ |
| 166 | +``` |
| 167 | + |
| 168 | +- Managed container orchestration |
| 169 | +- Better integration with AWS ecosystem |
| 170 | +- More sophisticated deployment options |
| 171 | +- Built-in monitoring and logging |
| 172 | + |
| 173 | +### EC2-based Deployment (Alternative) |
| 174 | + |
| 175 | +``` |
| 176 | +┌───────────────┐ ┌───────────────┐ ┌───────────────┐ |
| 177 | +│ Application │ │ Auto │ │ EC2 Instance │ |
| 178 | +│ Load Balancer │────▶│ Scaling Group │────▶│ with Docker │ |
| 179 | +└───────────────┘ └───────────────┘ └───────────────┘ |
| 180 | + │ |
| 181 | + ▼ |
| 182 | + ┌───────────────┐ |
| 183 | + │ DynamoDB │ |
| 184 | + └───────────────┘ |
| 185 | +``` |
| 186 | + |
| 187 | +- Simpler architecture with fewer AWS services |
| 188 | +- Direct control over the Docker runtime |
| 189 | +- Potentially lower costs for certain workloads |
| 190 | +- Easier to debug and troubleshoot |
| 191 | + |
| 192 | +## Security Considerations |
| 193 | + |
| 194 | +As shown in the architecture diagrams (Figures 1 and 3), security is implemented at multiple layers of the Nova Sonic system: |
| 195 | + |
| 196 | +- Security groups restrict traffic between components |
| 197 | +- IAM roles follow the principle of least privilege |
| 198 | +- Containers run in private subnets with outbound internet access through NAT Gateway |
| 199 | +- Load balancers are the only components exposed to the internet |
| 200 | +- HTTPS is configured for secure communication; it is also needed for WebRTC, because browsers grant microphone access only to pages served from a secure context |
| 201 | +- HTTP to HTTPS redirection is implemented for enhanced security |
| 202 | + |
| 203 | +## Cost Optimization |
| 204 | + |
| 205 | +As shown in the deployment architecture (Figure 3), several cost optimization strategies are implemented: |
| 206 | + |
| 207 | +- Auto-scaling is configured to scale based on CPU utilization |
| 208 | +- NAT Gateway is shared across availability zones to reduce costs |
| 209 | +- DynamoDB is configured with on-demand capacity to optimize costs based on usage |
| 210 | +- An EC2-based deployment option is available for potentially lower costs in certain scenarios |