Skip to content

Latest commit

 

History

History
175 lines (142 loc) · 10.1 KB

File metadata and controls

175 lines (142 loc) · 10.1 KB

Architecture Deep Dive

Architecture

Archnitecture Diagram

Description

  1. The user sends a request to the application hosted on AWS Amplify.
  2. Amplify integrates with the backend API Gateway.
  3. Instructors can upload patient data to the application, which is stored in an S3 bucket using a pre-signed upload URL.
  4. Adding a new file to the S3 bucket triggers the data ingestion workflow. The Lambda function runs a Docker container with Amazon Elastic Container Registry (ECR) and uses LangChain to process documents.
  5. The Lambda function embeds the text from uploaded files into vectors using Amazon Bedrock. This project uses the Amazon Titan Text Embeddings V2 model to generate embeddings.
  6. The Lambda function stores the vectors in the PostgreSQL database.
  7. Users can perform simulation group management/access actions by sending an API request which invokes a Lambda function.
  8. This Lambda function interacts with Amazon RDS.
  9. Users can start chatting with the LLM by sending an API request that invokes the Lambda function to generate a response. The Lambda function runs a Docker container with Amazon ECR.
  10. The Lambda function stores the embedded messages in Amazon DynamoDB
  11. This Lambda function uses RAG architecture and LangChain to retrieve the response from LLMs hosted on Amazon Bedrock augmented with the patient's information stored in the Amazon RDS. This project uses Meta Llama 3 70 B as its selected LLM.
  12. The LLM responses are streamed as chunks to the frontend using AppSync as a websocket implementation, so that the frontend may receive realtime responses as they are generated rather than receiving all of the generated text at once.
  13. An ECS server invokes Amazon Nova Sonic through bedrock and delivers chunks of audio to and from the backend via CloudFront and a websocket connection.

Database Schema

Database Schema

RDS Langchain Tables

langchain_pg_collection table

Column Name Description
uuid The uuid of the collection
name The name of the collection
cmetadata The metadata of the collection

langchain_pg_embedding table

Column Name Description
id The ID of the embeddings
collection_id The uuid of the collection
embedding The vector embeddings of the document
cmetadata The metadata of the collection
document The content of the document

RDS PostgreSQL Tables

users table

Column Name Description
user_id The ID of the user
user_email The email of the user
username The username of the user
first_name The first name of the user
last_name The last name of the user
preferred_name The preferred name of the user
time_account_created The time the account was created in UTC
roles The roles of the user
last_sign_in The time the user last signed in in UTC

simulation_groups table

Column Name Description
simulation_group_id The ID of the simulation group
group_name The name of the simulation group
group_description The description of the simulation group
group_access_code The access code for students to join the group
group_student_access Whether or not students can access the group
system_prompt The system prompt for the group

enrolments table

Column Name Description
enrolment_id The ID of the enrolment
user_id The ID of the enrolled user
simulation_group_id The ID of the associated simulation group
enrolment_type The type of enrolment (e.g., student, instructor, admin)
group_completion_percentage The percentage of the group completed (currently unused)
time_enroled The timestamp when the enrolment occurred

patients table

Column Name Description
patient_id The ID of the patient
simulation_group_id The ID of the associated simulation group
patient_name The name of the patient
patient_age The age of the patient
patient_gender The gender of the patient
patient_number The number of the patient
patient_prompt The prompt that reveals more about patient
llm_completion The name of the patient

patient_data table

Column Name Description
file_id The ID of the file
patient_id The ID of the associated patient
filetype The type of the file (e.g., pdf, docx, etc.)
s3_bucket_reference The reference to the S3 bucket
filepath The path to the file in the S3 bucket
filename The name of the file
time_uploaded The timestamp when the file was uploaded
metadata Additional metadata about the file
file_number Number of the file to keep track of order

student_interactions table

Column Name Description
student_interaction_id The ID of the student interaction
patient_id The ID of the associated patient
enrolment_id The ID of the related enrolment
patient_score Score calculated by the LLM for the student interacting with the patient
last_accessed The timestamp of the last time the patient was accessed
patient_context_embedding A float array representing the patient context embedding
is_completed A Boolean representing if the instructor has marked this student's interaction as complete

sessions table

Column Name Description
session_id The ID of the session
student_interaction_id The ID of the associated student interaction
session_name The name of the session
session_context_embeddings A float array representing the session context embeddings (currently unused)
last_accessed The timestamp of the last time the session was accessed
notes The notes a student can take per session when talking to a patient

messages table

Column Name Description
message_id The ID of the message
session_id The ID of the associated session
student_sent Whether the message was sent by the student (boolean)
message_content The content of the message (currently unused)
time_sent The timestamp when the message was sent

user_engagement_log table

Column Name Description
log_id The ID of the engagement log entry
user_id The ID of the user
simulation_group_id The ID of the associated simulation group
patient_id The ID of the associated patient
enrolment_id The ID of the related enrolment
timestamp The timestamp of the engagement event
engagement_type The type of engagement (e.g., patient access)
engagement_details The text describing the engagement

S3 Bucket Structure

.
├── {simulation_group_id}
    └── {patient_id}
        └── documents
            ├── document_1.pdf
            └── document_2.pdf
        └── info
            ├── info_1.pdf
            └── info_2.pdf
        └── answer_key
            ├── answer_key_1.pdf
        └── profile_pic
            ├── {patient_id}_profile_pic.png