"AI-Powered Social Cue Analyzer for Enhancing Communication in Socially Anxious Individuals with Cultural Sensitivity"
Socially anxious individuals often struggle to interpret and respond appropriately to social cues such as facial expressions, tone, and gestures. These difficulties are exacerbated in multicultural or cross-cultural environments, where the norms of expression and interpretation can vary significantly. A culturally unaware AI system could misinterpret behavior or offer incorrect feedback, leading to reduced effectiveness or even harm.
Develop a multimodal, culturally-aware AI assistant that:
- Analyzes facial expressions, gaze, and speech patterns.
- Evaluates linguistic content using an LLM.
- Delivers real-time or post-session feedback on:
  - The user's nonverbal expressiveness.
  - Their interpretation of others' cues.
  - Suggested communication improvements.
- Incorporates cultural variability into emotion recognition and response recommendations.
The country_context.py file contains detailed cultural information organized using these 14 key dimensions:
- formality: Level of formality expected in interactions (formal vs. informal communication styles)
- individualism_collectivism: Whether the culture prioritizes individual or group needs and decisions
- time_orientation: Approach to time management (monochronic/punctual vs. polychronic/flexible)
- context_orientation: Communication style (high-context/indirect vs. low-context/direct)
- mental_health_stigma: Cultural attitudes toward mental health discussion and professional help
- greeting_norms: Appropriate ways to greet people (handshakes, bows, verbal greetings)
- small_talk_topics: Safe and appropriate casual conversation topics
- sensitive_topics: Topics to avoid or approach carefully in conversations
- gift_giving_norms: Cultural expectations around giving and receiving gifts
- eye_contact_and_gaze_tips: Appropriate eye contact patterns and gaze behavior
- emotional_display_rules: Cultural norms for expressing emotions in public
- coping_and_support_norms: How people typically seek and receive emotional support
- stress_expression: How stress and disagreement are typically communicated
- nonverbal_signals: Important gestures, body language, and their cultural meanings
Countries in Database: The country_context.py file includes 6 countries with complete cultural context implementation:
Fully Implemented (6 countries):
- India - Complete cultural context with all 14 dimensions
- China - Complete cultural context with all 14 dimensions
- United States - Complete cultural context with all 14 dimensions
- Japan - Complete cultural context with all 14 dimensions
- Germany - Complete cultural context with all 14 dimensions
- Spain - Complete cultural context with all 14 dimensions
Each country includes comprehensive cultural information covering formality levels, individualism vs. collectivism, time orientation, communication styles, mental health attitudes, greeting norms, appropriate conversation topics, gift-giving customs, eye contact patterns, emotional expression rules, support systems, stress communication, and important nonverbal signals.
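To make the dimension list concrete, here is a minimal sketch of how a per-country dictionary in country_context.py might be organized. The `COUNTRY_CONTEXT` name, the lookup helper, and all the example values are illustrative assumptions, not the project's actual data:

```python
# Hypothetical sketch of the country_context.py data layout.
# Keys follow the 14 dimensions listed above; values are illustrative.
COUNTRY_CONTEXT = {
    "India": {
        "formality": "Moderately formal; titles and honorifics are common.",
        "individualism_collectivism": "Collectivist; family and group needs weigh heavily.",
        "context_orientation": "High-context; refusals are often indirect.",
        # ...remaining dimensions omitted for brevity
    },
    "Germany": {
        "formality": "Formal; use surnames until invited otherwise.",
        "time_orientation": "Monochronic; punctuality is expected.",
        # ...remaining dimensions omitted for brevity
    },
}

def get_cultural_context(country: str) -> dict:
    """Return the cultural profile for a country, or an empty dict if unknown."""
    return COUNTRY_CONTEXT.get(country, {})
```

A lookup helper like this lets the analyzer degrade gracefully when a country is not yet in the database instead of raising a KeyError.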
The prompts.py file defines a structured JSON response format that the LLM returns for each analysis:
{
"how_to_take_the_conversation_forward": "Actionable communication steps suitable to the cultural context and emotional state",
"what_to_avoid": "Behaviors, topics, or nonverbal cues that might cause discomfort or misunderstanding",
"reassurance_to_offer": "Gentle words or gestures that help calm the anxious person",
"nonverbal_tips": "Body language or gaze tips that would appear appropriate and comforting in this culture",
"small_talk_starters": ["Array of suggested casual conversation openers suitable to culture and context"],
"tone_and_pacing_tips": "Advice on how fast to speak, how formal to sound, and how to pause",
"cultural_sensitivity_notes": "Important cultural taboos or politeness norms to keep in mind"
}
This structured response ensures the AI provides:
- Actionable guidance for continuing conversations
- Cultural awareness of what to avoid
- Emotional support through appropriate reassurance
- Nonverbal coaching for body language and gaze
- Conversation starters tailored to cultural context
- Communication style advice for tone and pacing
- Cultural sensitivity reminders for appropriate behavior
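Because the LLM returns free-form text, a validation step is useful before the fields are displayed. The helper below is a sketch (not code from the project) that parses the reply and checks it against the seven keys defined by the format above:

```python
import json

# The seven keys defined by the response format in prompts.py.
REQUIRED_KEYS = {
    "how_to_take_the_conversation_forward",
    "what_to_avoid",
    "reassurance_to_offer",
    "nonverbal_tips",
    "small_talk_starters",
    "tone_and_pacing_tips",
    "cultural_sensitivity_notes",
}

def parse_analysis(raw: str) -> dict:
    """Parse the LLM's JSON reply and verify all expected fields are present."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"LLM response missing fields: {sorted(missing)}")
    # small_talk_starters is the one array-valued field in the format.
    if not isinstance(data["small_talk_starters"], list):
        raise ValueError("small_talk_starters must be a list")
    return data
```

Failing loudly on a malformed reply makes it easy to retry the LLM call rather than show the user incomplete guidance.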
anxiety_helper/
├── engine.py                          # Main application engine
├── environment.yml                    # Conda environment configuration
├── requirements.txt                   # Python dependencies
├── README.md                          # Project documentation
├── prompts used to create context.txt # Context prompts for development
│
├── audio/                             # Audio processing module
│   └── transcribe.py                  # Speech-to-text transcription with conversation history
│
├── analyzer/                          # LLM analysis module
│   ├── country_context.py             # Cultural context handling
│   ├── llm_analyzer.py                # Main LLM analysis logic
│   ├── llm.py                         # LLM integration
│   ├── prompts.py                     # LLM prompts and templates
│   └── utils.py                       # Utility functions
│
└── vision/                            # Computer vision module
    ├── main.py                        # Vision processing entry point
    ├── analyzers/                     # Vision analysis components
    │   ├── base_analyzer.py           # Base analyzer class
    │   ├── emotion_analyzer.py        # Facial emotion detection
    │   └── gaze_analyzer.py           # Eye gaze tracking and engagement detection (Bored/Interested/Surprised)
    └── utils/                         # Vision utilities
        └── image_utils.py             # Image processing utilities
- Purpose: Processes speech input and converts it to text for analysis
- Key Components:
transcribe.py: Real-time speech-to-text transcription with conversation history tracking
- Features:
- Conversation History: Automatically stores and tracks conversation flow between speakers
- Speaker Identification: Distinguishes between "SELF" and "OTHER" participants
- Context Preservation: Maintains conversation context for improved LLM analysis
- Demo: Run `python audio/transcribe.py`
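The conversation-history behavior described above can be sketched as follows. This `AudioTranscriber` is a simplified stand-in for the class in audio/transcribe.py, and the method names here are illustrative assumptions:

```python
class AudioTranscriber:
    """Simplified stand-in for audio/transcribe.py: keeps a running dialogue."""

    def __init__(self):
        self.conversation_history = []  # ordered list of "SELF: ..."/"OTHER: ..." lines

    def add_utterance(self, transcript: str, speaker: str) -> None:
        # Speaker codes mirror the engine's convention: "0" = SELF, "1" = OTHER.
        prefix = "SELF" if speaker == "0" else "OTHER"
        self.conversation_history.append(f"{prefix}: {transcript}")

    def context_window(self, n: int = 20) -> str:
        # The last n entries are joined into the context passed to the LLM.
        return "\n".join(self.conversation_history[-n:])
```

Keeping history as prefixed strings makes the context window trivial to build with a slice and a join.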
- Purpose: Analyzes visual cues including facial expressions and gaze patterns
- Key Components:
  - emotion_analyzer.py: Detects facial emotions and expressions
  - gaze_analyzer.py: Tracks eye movement, gaze direction, and engagement states (Bored, Interested, Surprised)
  - image_utils.py: Image preprocessing and utility functions
- Demo: Run `python vision/main.py`
- Recent Update: The gaze analyzer no longer detects an "Excited" state, to avoid classification overlap
- Purpose: Processes multimodal data using Large Language Models for cultural context
- Key Components:
  - llm_analyzer.py: Main analysis engine combining all modalities
  - country_context.py: Cultural sensitivity and context awareness
  - prompts.py: Structured prompts for LLM interactions
  - llm.py: LLM integration and API handling
  - utils.py: Helper functions for data processing
- engine.py: Main application orchestrator
- environment.yml: Conda environment specification
- requirements.txt: Python package dependencies
The engine.py file orchestrates the entire multimodal analysis pipeline, integrating all three modules into a cohesive real-time system:
┌─────────────────────────────────────────────────────────────────┐
│                     ANXIETY HELPER PIPELINE                     │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│      AUDIO      │   │     VISION      │   │    ANALYZER     │
│      MODULE     │   │     MODULE      │   │     MODULE      │
│                 │   │                 │   │                 │
│ • Transcriber   │   │ • Emotion       │   │ • LLM Analysis  │
│ • Real-time     │   │ • Gaze Tracking │   │ • Cultural      │
│   Speech-to-    │   │ • Visual Cues   │   │   Context       │
│   Text          │   │                 │   │ • Feedback      │
└─────────────────┘   └─────────────────┘   └─────────────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │
                               ▼
                ┌───────────────────────────┐
                │    INTEGRATED ANALYSIS    │
                │                           │
                │ • Multimodal Fusion       │
                │ • Cultural Adaptation     │
                │ • Real-time Feedback      │
                └───────────────────────────┘
# Country context setup (default: "China")
country = "China"  # Cultural context parameter

# Initialize components
transcriber = AudioTranscriber()      # Audio processing with conversation history
emotion_analyzer = EmotionAnalyzer()  # Facial emotion detection
gaze_analyzer = GazeAnalyzer()        # Eye tracking

The engine runs a continuous loop that processes multiple data streams:
a) Video Frame Capture
- Captures frames from the webcam (cv2.VideoCapture)
- Processes each frame through the vision analyzers
b) Emotion Detection
- EmotionAnalyzer.analyze(frame) → Returns emotion + confidence
- Detects: happiness, sadness, anger, fear, surprise, etc.
c) Gaze Analysis
- GazeAnalyzer.analyze(frame) → Returns gaze direction and engagement state
- Gaze Direction: Tracks "Left", "Right", "Center" eye positioning
- Engagement States: Detects "Bored", "Interested", "Surprised" based on eye openness and eyebrow position
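The mapping from eye openness and eyebrow position to an engagement state could look like the sketch below. The thresholds and signal ranges are illustrative guesses, not the values used by gaze_analyzer.py:

```python
def classify_engagement(eye_openness: float, eyebrow_raise: float) -> str:
    """Map normalized eye/eyebrow signals (0.0-1.0) to an engagement state.

    Thresholds are illustrative assumptions, not the project's tuned values.
    """
    if eye_openness > 0.8 and eyebrow_raise > 0.6:
        return "Surprised"   # wide eyes plus raised eyebrows
    if eye_openness < 0.3:
        return "Bored"       # drooping eyelids suggest disengagement
    return "Interested"      # default attentive state
```

Dropping the former "Excited" state (mentioned below) simplifies exactly this kind of rule: "Excited" and "Surprised" would otherwise compete for the same wide-eye signal.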
d) Audio Transcription
- AudioTranscriber.get_transcription() → Returns speech text
- Continuous background audio processing
- Speaker Identification: User specifies speaker (0=self, 1=other)
- Conversation History: Stores dialogue with "SELF:" and "OTHER:" prefixes
- Context Window: Analyzes the last 20 conversation entries for historical context
When new speech is detected, the system triggers comprehensive analysis:
if transcript:  # New speech detected
    analysis = analyze_conversation(
        cultural_context=country,    # e.g., "China"
        emotional_state=emotion,     # e.g., "neutral"
        gaze_behavior=gaze,          # e.g., "direct"
        transcript=transcript        # Spoken content
    )

The LLM analyzer combines all inputs with cultural awareness:
- Speech Content: What was said
- Emotional State: How it was expressed facially
- Gaze Behavior: Eye contact patterns
- Conversation History: Previous dialogue context (last 20 exchanges)
- Speaker Identification: Who said what (SELF vs OTHER)
- Cultural Context: Country-specific social norms
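One way to fuse these six inputs is to render them into a single prompt string before calling the LLM. This is a sketch of that step; the project's real template lives in prompts.py, and this function name is an assumption:

```python
def build_analysis_prompt(cultural_context: str, emotional_state: str,
                          gaze_behavior: str, transcript: str,
                          history: str) -> str:
    """Combine the multimodal signals into one prompt for the LLM.

    Illustrative only; the actual template is defined in prompts.py.
    """
    return (
        f"Cultural context: {cultural_context}\n"
        f"Speaker's facial emotion: {emotional_state}\n"
        f"Gaze behavior: {gaze_behavior}\n"
        f"Conversation so far (last 20 exchanges):\n{history}\n"
        f"Latest utterance: {transcript}\n"
        "Reply using the structured JSON format defined in prompts.py."
    )
```

Putting the cultural context first lets the model condition everything that follows (emotion, gaze, dialogue) on the country-specific norms.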
The system provides:
- Real-time visual feedback (on-screen labels)
- Comprehensive analysis (cultural sensitivity insights)
- Improvement suggestions (communication tips)
- Start: Run `python engine.py`
- Real-time Monitoring: Webcam shows live emotion/gaze detection
- Speech Trigger: Speak to trigger full analysis
- Speaker Identification: Specify if you (0) or other person (1) spoke
- Conversation Tracking: System automatically stores dialogue history
- Analysis Display: View cultural context and feedback based on conversation history
- Continue: Press Enter to continue monitoring
- Exit: Press 'q' to quit
Audio Input → Transcription → Speaker ID → Conversation History ──┐
                                                                  ├─→ LLM Analysis → Cultural Feedback
Video Input → Emotion + Gaze ─────────────────────────────────────┘
The system maintains intelligent conversation history to provide contextually-aware feedback and analysis. This feature enables the AI to understand conversation flow and provide more accurate cultural guidance.
- Real-time Tracking: Every transcribed speech is automatically stored
- Speaker Identification: System prompts user to identify speaker:
  - 0 = SELF (the user)
  - 1 = OTHER (conversation partner)
- Formatted Storage: Conversations stored with clear prefixes:
  SELF: Hello, how are you today?
  OTHER: I'm doing well, thank you for asking.
  SELF: That's great to hear!
- Historical Context: LLM analyzes conversation flow and progression
- Context Window: Uses last 20 conversation exchanges for analysis
- Cultural Continuity: Maintains cultural sensitivity across entire conversation
- Emotional Progression: Tracks emotional state changes throughout dialogue
In audio/transcribe.py:

class AudioTranscriber:
    def __init__(self, ...):
        self.conversation_history = []  # Stores conversation flow

In engine.py:

# Speaker identification and storage
speaker = input("Enter speaker {self:0, other:1}: ")
if speaker == "1":
    transcriber.conversation_history.append("OTHER: " + transcript)
    # Analyze with conversation context
    analysis = analyze_conversation(
        cultural_context, emotional_state, gaze_behavior,
        "\n".join(transcriber.conversation_history[-20:])  # Last 20 exchanges
    )
else:
    transcriber.conversation_history.append("SELF: " + transcript)

- Enhanced Accuracy: Better cultural guidance based on conversation flow
- Contextual Feedback: Suggestions consider previous dialogue
- Improved Learning: System understands conversation patterns
- Cultural Continuity: Maintains appropriate cultural tone throughout interaction
- Audio Module: `python audio/transcribe.py` - Demo speech transcription
- Vision Module: `python vision/main.py` - Demo visual analysis
- Analyzer Module: LLM-powered multimodal analysis
conda env export --no-builds > environment.yml
conda env create -f environment.yml
pip install -r requirements.txt

- Real-time multimodal integration
- Advanced cultural context models
- Mobile application development
- Extended language support
- Personalized feedback systems