"AI-Powered Social Cue Analyzer for Enhancing Communication in Socially Anxious Individuals with Cultural Sensitivity"
Socially anxious individuals often struggle to interpret and respond appropriately to social cues such as facial expressions, tone, and gestures. These difficulties are exacerbated in multicultural or cross-cultural environments, where the norms of expression and interpretation can vary significantly. A culturally unaware AI system could misinterpret behavior or offer incorrect feedback, leading to reduced effectiveness or even harm.
Develop a multimodal, culturally-aware AI assistant that:
- Analyzes facial expressions, gaze, and speech patterns.
- Evaluates linguistic content using an LLM.
- Delivers real-time or post-session feedback on:
  - The user's nonverbal expressiveness.
  - Their interpretation of others' cues.
  - Suggested communication improvements.
- Incorporates cultural variability into emotion recognition and response recommendations.
The country_context.py file contains detailed cultural information organized using these 14 key dimensions:
- formality: Level of formality expected in interactions (formal vs. informal communication styles)
- individualism_collectivism: Whether the culture prioritizes individual or group needs and decisions
- time_orientation: Approach to time management (monochronic/punctual vs. polychronic/flexible)
- context_orientation: Communication style (high-context/indirect vs. low-context/direct)
- mental_health_stigma: Cultural attitudes toward mental health discussion and professional help
- greeting_norms: Appropriate ways to greet people (handshakes, bows, verbal greetings)
- small_talk_topics: Safe and appropriate casual conversation topics
- sensitive_topics: Topics to avoid or approach carefully in conversations
- gift_giving_norms: Cultural expectations around giving and receiving gifts
- eye_contact_and_gaze_tips: Appropriate eye contact patterns and gaze behavior
- emotional_display_rules: Cultural norms for expressing emotions in public
- coping_and_support_norms: How people typically seek and receive emotional support
- stress_expression: How stress and disagreement are typically communicated
- nonverbal_signals: Important gestures, body language, and their cultural meanings
Countries in Database: The country_context.py file includes 6 countries with complete cultural context implementation:
Fully Implemented (6 countries):
- India - Complete cultural context with all 14 dimensions
- China - Complete cultural context with all 14 dimensions
- United States - Complete cultural context with all 14 dimensions
- Japan - Complete cultural context with all 14 dimensions
- Germany - Complete cultural context with all 14 dimensions
- Spain - Complete cultural context with all 14 dimensions
Each country includes comprehensive cultural information covering formality levels, individualism vs. collectivism, time orientation, communication styles, mental health attitudes, greeting norms, appropriate conversation topics, gift-giving customs, eye contact patterns, emotional expression rules, support systems, stress communication, and important nonverbal signals.
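To make the dimension list concrete, here is a minimal sketch of how a per-country dictionary in country_context.py might be organized. The `COUNTRY_CONTEXT` name, the lookup helper, and all the example values are illustrative assumptions, not the project's actual data:

```python
# Hypothetical sketch of the country_context.py data layout.
# Keys follow the 14 dimensions listed above; values are illustrative.
COUNTRY_CONTEXT = {
    "India": {
        "formality": "Moderately formal; titles and honorifics are common.",
        "individualism_collectivism": "Collectivist; family and group needs weigh heavily.",
        "context_orientation": "High-context; refusals are often indirect.",
        # ...remaining dimensions omitted for brevity
    },
    "Germany": {
        "formality": "Formal; use surnames until invited otherwise.",
        "time_orientation": "Monochronic; punctuality is expected.",
        # ...remaining dimensions omitted for brevity
    },
}

def get_cultural_context(country: str) -> dict:
    """Return the cultural profile for a country, or an empty dict if unknown."""
    return COUNTRY_CONTEXT.get(country, {})
```

A lookup helper like this lets the analyzer degrade gracefully when a country is not yet in the database instead of raising a KeyError.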
The prompts.py file defines a structured JSON response format that the LLM returns for each analysis:
{
"how_to_take_the_conversation_forward": "Actionable communication steps suitable to the cultural context and emotional state",
"what_to_avoid": "Behaviors, topics, or nonverbal cues that might cause discomfort or misunderstanding",
"reassurance_to_offer": "Gentle words or gestures that help calm the anxious person",
"nonverbal_tips": "Body language or gaze tips that would appear appropriate and comforting in this culture",
"small_talk_starters": ["Array of suggested casual conversation openers suitable to culture and context"],
"tone_and_pacing_tips": "Advice on how fast to speak, how formal to sound, and how to pause",
"cultural_sensitivity_notes": "Important cultural taboos or politeness norms to keep in mind"
}
This structured response ensures the AI provides:
- Actionable guidance for continuing conversations
- Cultural awareness of what to avoid
- Emotional support through appropriate reassurance
- Nonverbal coaching for body language and gaze
- Conversation starters tailored to cultural context
- Communication style advice for tone and pacing
- Cultural sensitivity reminders for appropriate behavior
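Because the LLM returns free-form text, a validation step is useful before the fields are displayed. The helper below is a sketch (not code from the project) that parses the reply and checks it against the seven keys defined by the format above:

```python
import json

# The seven keys defined by the response format in prompts.py.
REQUIRED_KEYS = {
    "how_to_take_the_conversation_forward",
    "what_to_avoid",
    "reassurance_to_offer",
    "nonverbal_tips",
    "small_talk_starters",
    "tone_and_pacing_tips",
    "cultural_sensitivity_notes",
}

def parse_analysis(raw: str) -> dict:
    """Parse the LLM's JSON reply and verify all expected fields are present."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"LLM response missing fields: {sorted(missing)}")
    # small_talk_starters is the one array-valued field in the format.
    if not isinstance(data["small_talk_starters"], list):
        raise ValueError("small_talk_starters must be a list")
    return data
```

Failing loudly on a malformed reply makes it easy to retry the LLM call rather than show the user incomplete guidance.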
anxiety_helper/
├── engine.py                          # Main application engine
├── environment.yml                    # Conda environment configuration
├── requirements.txt                   # Python dependencies
├── README.md                          # Project documentation
├── prompts used to create context.txt # Context prompts for development
│
├── audio/                             # Audio processing module
│   └── transcribe.py                  # Speech-to-text transcription with conversation history
│
├── analyzer/                          # LLM analysis module
│   ├── country_context.py             # Cultural context handling
│   ├── llm_analyzer.py                # Main LLM analysis logic
│   ├── llm.py                         # LLM integration
│   ├── prompts.py                     # LLM prompts and templates
│   └── utils.py                       # Utility functions
│
└── vision/                            # Computer vision module
    ├── main.py                        # Vision processing entry point
    ├── analyzers/                     # Vision analysis components
    │   ├── base_analyzer.py           # Base analyzer class
    │   ├── emotion_analyzer.py        # Facial emotion detection
    │   └── gaze_analyzer.py           # Eye gaze tracking and engagement detection (Bored/Interested/Surprised)
    └── utils/                         # Vision utilities
        └── image_utils.py             # Image processing utilities
- Purpose: Processes speech input and converts it to text for analysis
- Key Components:
transcribe.py: Real-time speech-to-text transcription with conversation history tracking
- Features:
- Conversation History: Automatically stores and tracks conversation flow between speakers
- Speaker Identification: Distinguishes between "SELF" and "OTHER" participants
- Context Preservation: Maintains conversation context for improved LLM analysis
- Demo: Run `python audio/transcribe.py`
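The conversation-history behavior described above can be sketched as follows. This `AudioTranscriber` is a simplified stand-in for the class in audio/transcribe.py, and the method names here are illustrative assumptions:

```python
class AudioTranscriber:
    """Simplified stand-in for audio/transcribe.py: keeps a running dialogue."""

    def __init__(self):
        self.conversation_history = []  # ordered list of "SELF: ..."/"OTHER: ..." lines

    def add_utterance(self, transcript: str, speaker: str) -> None:
        # Speaker codes mirror the engine's convention: "0" = SELF, "1" = OTHER.
        prefix = "SELF" if speaker == "0" else "OTHER"
        self.conversation_history.append(f"{prefix}: {transcript}")

    def context_window(self, n: int = 20) -> str:
        # The last n entries are joined into the context passed to the LLM.
        return "\n".join(self.conversation_history[-n:])
```

Keeping history as prefixed strings makes the context window trivial to build with a slice and a join.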
- Purpose: Analyzes visual cues including facial expressions and gaze patterns
- Key Components:
  - emotion_analyzer.py: Detects facial emotions and expressions
  - gaze_analyzer.py: Tracks eye movement, gaze direction, and engagement states (Bored, Interested, Surprised)
  - image_utils.py: Image preprocessing and utility functions
- Demo: Run `python vision/main.py`
- Recent Update: The gaze analyzer no longer detects an "Excited" state, to avoid classification overlap
- Purpose: Processes multimodal data using Large Language Models for cultural context
- Key Components:
  - llm_analyzer.py: Main analysis engine combining all modalities
  - country_context.py: Cultural sensitivity and context awareness
  - prompts.py: Structured prompts for LLM interactions
  - llm.py: LLM integration and API handling
  - utils.py: Helper functions for data processing
- engine.py: Main application orchestrator
- environment.yml: Conda environment specification
- requirements.txt: Python package dependencies
The engine.py file orchestrates the entire multimodal analysis pipeline, integrating all three modules into a cohesive real-time system:
┌─────────────────────────────────────────────────────────────────┐
│                     ANXIETY HELPER PIPELINE                     │
└─────────────────────────────────────────────────────────────────┘
                                │
                                ▼
┌─────────────────┐   ┌─────────────────┐   ┌─────────────────┐
│      AUDIO      │   │     VISION      │   │    ANALYZER     │
│      MODULE     │   │     MODULE      │   │     MODULE      │
│                 │   │                 │   │                 │
│ • Transcriber   │   │ • Emotion       │   │ • LLM Analysis  │
│ • Real-time     │   │ • Gaze Tracking │   │ • Cultural      │
│   Speech-to-    │   │ • Visual Cues   │   │   Context       │
│   Text          │   │                 │   │ • Feedback      │
└─────────────────┘   └─────────────────┘   └─────────────────┘
         │                     │                     │
         └─────────────────────┼─────────────────────┘
                               │
                               ▼
                ┌───────────────────────────┐
                │    INTEGRATED ANALYSIS    │
                │                           │
                │ • Multimodal Fusion       │
                │ • Cultural Adaptation     │
                │ • Real-time Feedback      │
                └───────────────────────────┘
# Country context setup (default: "China")
country = "China"  # Cultural context parameter

# Initialize components
transcriber = AudioTranscriber()      # Audio processing with conversation history
emotion_analyzer = EmotionAnalyzer()  # Facial emotion detection
gaze_analyzer = GazeAnalyzer()        # Eye tracking

The engine runs a continuous loop that processes multiple data streams:
a) Video Frame Capture
- Captures frames from the webcam (cv2.VideoCapture)
- Processes each frame through the vision analyzers
b) Emotion Detection
- EmotionAnalyzer.analyze(frame) → Returns emotion + confidence
- Detects: happiness, sadness, anger, fear, surprise, etc.
c) Gaze Analysis
- GazeAnalyzer.analyze(frame) → Returns gaze direction and engagement state
- Gaze Direction: Tracks "Left", "Right", "Center" eye positioning
- Engagement States: Detects "Bored", "Interested", "Surprised" based on eye openness and eyebrow position
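The mapping from eye openness and eyebrow position to an engagement state could look like the sketch below. The thresholds and signal ranges are illustrative guesses, not the values used by gaze_analyzer.py:

```python
def classify_engagement(eye_openness: float, eyebrow_raise: float) -> str:
    """Map normalized eye/eyebrow signals (0.0-1.0) to an engagement state.

    Thresholds are illustrative assumptions, not the project's tuned values.
    """
    if eye_openness > 0.8 and eyebrow_raise > 0.6:
        return "Surprised"   # wide eyes plus raised eyebrows
    if eye_openness < 0.3:
        return "Bored"       # drooping eyelids suggest disengagement
    return "Interested"      # default attentive state
```

Dropping the former "Excited" state (mentioned below) simplifies exactly this kind of rule: "Excited" and "Surprised" would otherwise compete for the same wide-eye signal.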
d) Audio Transcription
- AudioTranscriber.get_transcription() → Returns speech text
- Continuous background audio processing
- Speaker Identification: User specifies speaker (0=self, 1=other)
- Conversation History: Stores dialogue with "SELF:" and "OTHER:" prefixes
- Context Window: Analyzes the last 20 conversation entries for historical context
When new speech is detected, the system triggers comprehensive analysis:
if transcript:  # New speech detected
    analysis = analyze_conversation(
        cultural_context=country,    # e.g., "China"
        emotional_state=emotion,     # e.g., "neutral"
        gaze_behavior=gaze,          # e.g., "direct"
        transcript=transcript        # Spoken content
    )

The LLM analyzer combines all inputs with cultural awareness:
- Speech Content: What was said
- Emotional State: How it was expressed facially
- Gaze Behavior: Eye contact patterns
- Conversation History: Previous dialogue context (last 20 exchanges)
- Speaker Identification: Who said what (SELF vs OTHER)
- Cultural Context: Country-specific social norms
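One way to fuse these six inputs is to render them into a single prompt string before calling the LLM. This is a sketch of that step; the project's real template lives in prompts.py, and this function name is an assumption:

```python
def build_analysis_prompt(cultural_context: str, emotional_state: str,
                          gaze_behavior: str, transcript: str,
                          history: str) -> str:
    """Combine the multimodal signals into one prompt for the LLM.

    Illustrative only; the actual template is defined in prompts.py.
    """
    return (
        f"Cultural context: {cultural_context}\n"
        f"Speaker's facial emotion: {emotional_state}\n"
        f"Gaze behavior: {gaze_behavior}\n"
        f"Conversation so far (last 20 exchanges):\n{history}\n"
        f"Latest utterance: {transcript}\n"
        "Reply using the structured JSON format defined in prompts.py."
    )
```

Putting the cultural context first lets the model condition everything that follows (emotion, gaze, dialogue) on the country-specific norms.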
The system provides:
- Real-time visual feedback (on-screen labels)
- Comprehensive analysis (cultural sensitivity insights)
- Improvement suggestions (communication tips)
- Start: Run `python engine.py`
- Real-time Monitoring: Webcam shows live emotion/gaze detection
- Speech Trigger: Speak to trigger full analysis
- Speaker Identification: Specify if you (0) or other person (1) spoke
- Conversation Tracking: System automatically stores dialogue history
- Analysis Display: View cultural context and feedback based on conversation history
- Continue: Press Enter to continue monitoring
- Exit: Press 'q' to quit
Audio Input → Transcription → Speaker ID → Conversation History ──┐
                                                                  ├─→ LLM Analysis → Cultural Feedback
Video Input → Emotion + Gaze ─────────────────────────────────────┘
The system maintains intelligent conversation history to provide contextually-aware feedback and analysis. This feature enables the AI to understand conversation flow and provide more accurate cultural guidance.
- Real-time Tracking: Every transcribed speech is automatically stored
- Speaker Identification: System prompts user to identify speaker:
  - 0 = SELF (the user)
  - 1 = OTHER (conversation partner)
- Formatted Storage: Conversations stored with clear prefixes:
  SELF: Hello, how are you today?
  OTHER: I'm doing well, thank you for asking.
  SELF: That's great to hear!
- Historical Context: LLM analyzes conversation flow and progression
- Context Window: Uses last 20 conversation exchanges for analysis
- Cultural Continuity: Maintains cultural sensitivity across entire conversation
- Emotional Progression: Tracks emotional state changes throughout dialogue
In audio/transcribe.py:

class AudioTranscriber:
    def __init__(self, ...):
        self.conversation_history = []  # Stores conversation flow

In engine.py:

# Speaker identification and storage
speaker = input("Enter speaker {self:0, other:1}: ")
if speaker == "1":
    transcriber.conversation_history.append("OTHER: " + transcript)
    # Analyze with conversation context
    analysis = analyze_conversation(
        cultural_context, emotional_state, gaze_behavior,
        "\n".join(transcriber.conversation_history[-20:])  # Last 20 exchanges
    )
else:
    transcriber.conversation_history.append("SELF: " + transcript)

- Enhanced Accuracy: Better cultural guidance based on conversation flow
- Contextual Feedback: Suggestions consider previous dialogue
- Improved Learning: System understands conversation patterns
- Cultural Continuity: Maintains appropriate cultural tone throughout interaction
- Audio Module: `python audio/transcribe.py` - Demo speech transcription
- Vision Module: `python vision/main.py` - Demo visual analysis
- Analyzer Module: LLM-powered multimodal analysis
conda env export --no-builds > environment.yml
conda env create -f environment.yml
pip install -r requirements.txt

- Real-time multimodal integration
- Advanced cultural context models
- Mobile application development
- Extended language support
- Personalized feedback systems