ya0002/anxiety_helper

🧠 Project Title

"AI-Powered Social Cue Analyzer for Enhancing Communication in Socially Anxious Individuals with Cultural Sensitivity"


🔍 Problem Statement

Socially anxious individuals often struggle to interpret and respond appropriately to social cues such as facial expressions, tone, and gestures. These difficulties are exacerbated in multicultural or cross-cultural environments, where the norms of expression and interpretation can vary significantly. A culturally unaware AI system could misinterpret behavior or offer incorrect feedback, leading to reduced effectiveness or even harm.


🎯 Objective

Develop a multimodal, culturally aware AI assistant that:

  1. Analyzes facial expressions, gaze, and speech patterns.

  2. Evaluates linguistic content using an LLM.

  3. Delivers real-time or post-session feedback on:

    • The user's nonverbal expressiveness.

    • The user's interpretation of others' cues.

    • Suggested communication improvements.

  4. Incorporates cultural variability into emotion recognition and response recommendations.


๐ŸŒ Cultural Context Framework

๐Ÿ“‹ Country Context Dictionary Keys

The country_context.py file contains detailed cultural information organized using these 14 key dimensions:

  1. formality - Level of formality expected in interactions (formal vs. informal communication styles)
  2. individualism_collectivism - Whether the culture prioritizes individual or group needs and decisions
  3. time_orientation - Approach to time management (monochronic/punctual vs. polychronic/flexible)
  4. context_orientation - Communication style (high-context/indirect vs. low-context/direct)
  5. mental_health_stigma - Cultural attitudes toward mental health discussion and professional help
  6. greeting_norms - Appropriate ways to greet people (handshakes, bows, verbal greetings)
  7. small_talk_topics - Safe and appropriate casual conversation topics
  8. sensitive_topics - Topics to avoid or approach carefully in conversations
  9. gift_giving_norms - Cultural expectations around giving and receiving gifts
  10. eye_contact_and_gaze_tips - Appropriate eye contact patterns and gaze behavior
  11. emotional_display_rules - Cultural norms for expressing emotions in public
  12. coping_and_support_norms - How people typically seek and receive emotional support
  13. stress_expression - How stress and disagreement are typically communicated
  14. nonverbal_signals - Important gestures, body language, and their cultural meanings
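As a rough illustration of how these 14 dimensions might be laid out in country_context.py, here is a hedged sketch: the dictionary name, the helper function, and all field values below are hypothetical and may differ from the actual implementation — only the key names follow the list above.

```python
# Hypothetical sketch of a country_context.py entry; the real module's
# structure and wording may differ. Key names match the 14 dimensions above.
COUNTRY_CONTEXT = {
    "Japan": {
        "formality": "High; polite speech and honorifics are expected with strangers.",
        "individualism_collectivism": "Collectivist; group harmony is prioritized.",
        "greeting_norms": "Bowing is common; handshakes are acceptable in business.",
        "eye_contact_and_gaze_tips": "Prolonged direct eye contact can feel confrontational.",
        # ...the remaining dimensions follow the same key naming convention
    },
}

def get_context(country: str) -> dict:
    """Return the cultural context dict for a country, or {} if unknown."""
    return COUNTRY_CONTEXT.get(country, {})
```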

Countries in Database: The country_context.py file currently implements all 14 dimensions for 6 countries:

  • India 🇮🇳
  • China 🇨🇳
  • United States 🇺🇸
  • Japan 🇯🇵
  • Germany 🇩🇪
  • Spain 🇪🇸

Each entry covers formality, individualism vs. collectivism, time orientation, communication style, mental health attitudes, greeting norms, appropriate conversation topics, gift-giving customs, eye contact patterns, emotional display rules, support norms, stress expression, and important nonverbal signals.

🤖 LLM Response Format

The prompts.py file defines a structured JSON response format that the LLM returns for each analysis:

{
  "how_to_take_the_conversation_forward": "Actionable communication steps suitable to the cultural context and emotional state",
  "what_to_avoid": "Behaviors, topics, or nonverbal cues that might cause discomfort or misunderstanding", 
  "reassurance_to_offer": "Gentle words or gestures that help calm the anxious person",
  "nonverbal_tips": "Body language or gaze tips that would appear appropriate and comforting in this culture",
  "small_talk_starters": ["Array of suggested casual conversation openers suitable to culture and context"],
  "tone_and_pacing_tips": "Advice on how fast to speak, how formal to sound, and how to pause",
  "cultural_sensitivity_notes": "Important cultural taboos or politeness norms to keep in mind"
}

This structured response ensures the AI provides:

  • Actionable guidance for continuing conversations
  • Cultural awareness of what to avoid
  • Emotional support through appropriate reassurance
  • Nonverbal coaching for body language and gaze
  • Conversation starters tailored to cultural context
  • Communication style advice for tone and pacing
  • Cultural sensitivity reminders for appropriate behavior
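Because the response is plain JSON, consuming it on the client side is a one-liner. A minimal sketch, assuming the model returns exactly the schema above (the `raw` string here is an invented example, not actual model output):

```python
import json

# `raw` stands in for the text returned by the LLM; field names
# match the structured format defined in prompts.py.
raw = """{
  "how_to_take_the_conversation_forward": "Ask an open question about their weekend.",
  "what_to_avoid": "Direct questions about salary.",
  "reassurance_to_offer": "A calm nod and a warm tone.",
  "nonverbal_tips": "Soft eye contact, relaxed shoulders.",
  "small_talk_starters": ["The weather has been lovely lately."],
  "tone_and_pacing_tips": "Speak slowly and pause after questions.",
  "cultural_sensitivity_notes": "Avoid pointing with a single finger."
}"""

feedback = json.loads(raw)
for starter in feedback["small_talk_starters"]:
    print("Try:", starter)
```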

๐Ÿ“ Project Structure

anxiety_helper/
โ”œโ”€โ”€ ๐Ÿ“„ engine.py                    # Main application engine
โ”œโ”€โ”€ ๐Ÿ“„ environment.yml              # Conda environment configuration
โ”œโ”€โ”€ ๐Ÿ“„ requirements.txt             # Python dependencies
โ”œโ”€โ”€ ๐Ÿ“„ README.md                    # Project documentation
โ”œโ”€โ”€ ๐Ÿ“„ prompts used to create context.txt  # Context prompts for development
โ”‚
โ”œโ”€โ”€ ๐ŸŽค audio/                       # Audio processing module
โ”‚   โ””โ”€โ”€ transcribe.py               # Speech-to-text transcription with conversation history
โ”‚
โ”œโ”€โ”€ ๐Ÿง  analyzer/                    # LLM analysis module
โ”‚   โ”œโ”€โ”€ country_context.py          # Cultural context handling
โ”‚   โ”œโ”€โ”€ llm_analyzer.py             # Main LLM analysis logic
โ”‚   โ”œโ”€โ”€ llm.py                      # LLM integration
โ”‚   โ”œโ”€โ”€ prompts.py                  # LLM prompts and templates
โ”‚   โ””โ”€โ”€ utils.py                    # Utility functions
โ”‚
โ””โ”€โ”€ ๐Ÿ‘๏ธ vision/                      # Computer vision module
    โ”œโ”€โ”€ main.py                     # Vision processing entry point    โ”œโ”€โ”€ analyzers/                  # Vision analysis components
    โ”‚   โ”œโ”€โ”€ base_analyzer.py        # Base analyzer class
    โ”‚   โ”œโ”€โ”€ emotion_analyzer.py     # Facial emotion detection
    โ”‚   โ””โ”€โ”€ gaze_analyzer.py        # Eye gaze tracking and engagement detection (Bored/Interested/Surprised)
    โ””โ”€โ”€ utils/                      # Vision utilities
        โ””โ”€โ”€ image_utils.py          # Image processing utilities

🔧 Module Descriptions

🎤 Audio Module

  • Purpose: Processes speech input and converts it to text for analysis
  • Key Components:
    • transcribe.py: Real-time speech-to-text transcription with conversation history tracking
  • Features:
    • Conversation History: Automatically stores and tracks conversation flow between speakers
    • Speaker Identification: Distinguishes between "SELF" and "OTHER" participants
    • Context Preservation: Maintains conversation context for improved LLM analysis
  • Demo: Run python audio/transcribe.py

๐Ÿ‘๏ธ Vision Module

  • Purpose: Analyzes visual cues including facial expressions and gaze patterns
  • Key Components:
    • emotion_analyzer.py: Detects facial emotions and expressions
    • gaze_analyzer.py: Tracks eye movement, gaze direction, and engagement states (Bored, Interested, Surprised)
    • image_utils.py: Image preprocessing and utility functions
  • Demo: Run python vision/main.py
  • Recent Update: Gaze analyzer no longer detects "Excited" state to avoid classification overlap

🧠 Analyzer Module

  • Purpose: Processes multimodal data using Large Language Models for cultural context
  • Key Components:
    • llm_analyzer.py: Main analysis engine combining all modalities
    • country_context.py: Cultural sensitivity and context awareness
    • prompts.py: Structured prompts for LLM interactions
    • llm.py: LLM integration and API handling
    • utils.py: Helper functions for data processing

📄 Core Files

  • engine.py: Main application orchestrator
  • environment.yml: Conda environment specification
  • requirements.txt: Python package dependencies

Pipeline Flow & Architecture

🎯 Main Engine (engine.py)

The engine.py file orchestrates the entire multimodal analysis pipeline, integrating all three modules into a cohesive real-time system:

┌────────────────────────────────────────────────────────────────┐
│                    🚀 ANXIETY HELPER PIPELINE                  │
└────────────────────────────────────────────────────────────────┘
                                    │
                                    ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   🎤 AUDIO      │    │   👁️ VISION     │    │   🧠 ANALYZER   │
│   MODULE        │    │   MODULE        │    │   MODULE        │
│                 │    │                 │    │                 │
│ • Transcriber   │    │ • Emotion       │    │ • LLM Analysis  │
│ • Real-time     │    │ • Gaze Tracking │    │ • Cultural      │
│   Speech-to-    │    │ • Visual Cues   │    │   Context       │
│   Text          │    │                 │    │ • Feedback      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
         │                       │                       │
         └───────────────────────┼───────────────────────┘
                                 │
                                 ▼
                    ┌─────────────────────────┐
                    │  📊 INTEGRATED ANALYSIS │
                    │                         │
                    │ • Multimodal Fusion     │
                    │ • Cultural Adaptation   │
                    │ • Real-time Feedback    │
                    └─────────────────────────┘

🔧 Step-by-Step Pipeline Flow

1. 🎬 Initialization Phase

# Country context setup (default: "China")
country = "China"  # Cultural context parameter

# Initialize components
transcriber = AudioTranscriber()      # Audio processing with conversation history
emotion_analyzer = EmotionAnalyzer()  # Facial emotion detection
gaze_analyzer = GazeAnalyzer()        # Eye tracking

2. 🔄 Real-time Processing Loop

The engine runs a continuous loop that processes multiple data streams:

a) 📹 Video Frame Capture

  • Captures frames from webcam (cv2.VideoCapture)
  • Processes each frame through vision analyzers

b) 😊 Emotion Detection

  • EmotionAnalyzer.analyze(frame) → Returns emotion + confidence
  • Detects: happiness, sadness, anger, fear, surprise, etc.

c) 👀 Gaze Analysis

  • GazeAnalyzer.analyze(frame) → Returns gaze direction and engagement state
  • Gaze Direction: Tracks "Left", "Right", "Center" eye positioning
  • Engagement States: Detects "Bored", "Interested", "Surprised" based on eye openness and eyebrow position
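The engagement rule above can be pictured as a small decision function. The thresholds below are purely illustrative assumptions, not the values used in the repository's gaze_analyzer.py:

```python
# Illustrative sketch of the engagement-state rule described above.
# The 0.8/0.6/0.3 thresholds are hypothetical; the real analyzer may
# derive these signals from facial landmarks rather than raw floats.
def engagement_state(eye_openness: float, eyebrow_raise: float) -> str:
    """Classify engagement from normalized (0-1) eye and eyebrow measures."""
    if eye_openness > 0.8 and eyebrow_raise > 0.6:
        return "Surprised"   # wide eyes + raised eyebrows
    if eye_openness < 0.3:
        return "Bored"       # half-closed eyes
    return "Interested"      # neutral, attentive default
```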

d) 🎤 Audio Transcription

  • AudioTranscriber.get_transcription() → Returns speech text
  • Continuous background audio processing
  • Speaker Identification: User specifies speaker (0=self, 1=other)
  • Conversation History: Stores dialogue with "SELF:" and "OTHER:" prefixes
  • Context Window: Analyzes last 20 conversation entries for historical context
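The prefixing and windowing steps above can be sketched in a few lines; the `record` helper is an invented name for illustration, since the engine does this inline:

```python
# Sketch of speaker-prefixed history plus the 20-entry context window.
# `record` is a hypothetical helper; engine.py appends to the list directly.
conversation_history = []

def record(speaker_flag: str, transcript: str) -> None:
    """Store a transcribed line with a SELF:/OTHER: prefix (0=self, 1=other)."""
    prefix = "OTHER: " if speaker_flag == "1" else "SELF: "
    conversation_history.append(prefix + transcript)

record("0", "Hello, how are you today?")
record("1", "I'm doing well, thank you.")

# Only the most recent 20 entries are joined and sent to the LLM as context.
context_window = "\n".join(conversation_history[-20:])
```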

3. 🧠 Multimodal Analysis Trigger

When new speech is detected, the system triggers comprehensive analysis:

if transcript:  # New speech detected
    analysis = analyze_conversation(
        cultural_context=country,     # e.g., "China"
        emotional_state=emotion,      # e.g., "neutral"
        gaze_behavior=gaze,          # e.g., "direct"
        transcript=transcript        # Spoken content
    )

4. 🌍 Cultural Context Integration

The LLM analyzer combines all inputs with cultural awareness:

  • Speech Content: What was said
  • Emotional State: How it was expressed facially
  • Gaze Behavior: Eye contact patterns
  • Conversation History: Previous dialogue context (last 20 exchanges)
  • Speaker Identification: Who said what (SELF vs OTHER)
  • Cultural Context: Country-specific social norms

5. 💬 Feedback Generation

The system provides:

  • Real-time visual feedback (on-screen labels)
  • Comprehensive analysis (cultural sensitivity insights)
  • Improvement suggestions (communication tips)

🎮 User Interaction Flow

  1. Start: Run python engine.py
  2. Real-time Monitoring: Webcam shows live emotion/gaze detection
  3. Speech Trigger: Speak to trigger full analysis
  4. Speaker Identification: Specify if you (0) or other person (1) spoke
  5. Conversation Tracking: System automatically stores dialogue history
  6. Analysis Display: View cultural context and feedback based on conversation history
  7. Continue: Press Enter to continue monitoring
  8. Exit: Press 'q' to quit

📈 Data Flow Summary

Audio Input → Transcription → Speaker ID → Conversation History → ┐
                                                                  ├→ LLM Analysis → Cultural Feedback
Video Input → Emotion + Gaze → ────────────────────────────────── ┘
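The fusion step at the end of this flow amounts to folding all three streams into one prompt for the LLM. A minimal sketch under the assumption that the prompt is assembled as plain text (`build_prompt` is an illustrative name, not necessarily the function in prompts.py):

```python
# Hypothetical sketch of the fusion step: per-frame vision signals and the
# transcript are combined into a single prompt string for the LLM.
def build_prompt(cultural_context: str, emotional_state: str,
                 gaze_behavior: str, transcript: str) -> str:
    return (
        f"Cultural context: {cultural_context}\n"
        f"Detected emotion: {emotional_state}\n"
        f"Gaze behavior: {gaze_behavior}\n"
        f"Conversation so far:\n{transcript}\n"
        "Respond with the structured JSON feedback format."
    )

prompt = build_prompt("China", "neutral", "direct", "SELF: Hello!")
```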

💬 Conversation History & Context Tracking

🎯 Overview

The system maintains intelligent conversation history to provide contextually-aware feedback and analysis. This feature enables the AI to understand conversation flow and provide more accurate cultural guidance.

🔧 Key Features

📝 Automatic Conversation Storage

  • Real-time Tracking: Every transcribed speech is automatically stored
  • Speaker Identification: System prompts user to identify speaker:
    • 0 = SELF (the user)
    • 1 = OTHER (conversation partner)
  • Formatted Storage: Conversations stored with clear prefixes:
    SELF: Hello, how are you today?
    OTHER: I'm doing well, thank you for asking.
    SELF: That's great to hear!
    

🧠 Context-Aware Analysis

  • Historical Context: LLM analyzes conversation flow and progression
  • Context Window: Uses last 20 conversation exchanges for analysis
  • Cultural Continuity: Maintains cultural sensitivity across entire conversation
  • Emotional Progression: Tracks emotional state changes throughout dialogue

🔄 Implementation Details

In audio/transcribe.py:

class AudioTranscriber:
    def __init__(self, ...):
        self.conversation_history = []  # Stores conversation flow

In engine.py:

# Speaker identification and storage
speaker = input("Enter speaker {self:0, other:1}: ")
if speaker == "1":
    transcriber.conversation_history.append("OTHER: " + transcript)
    # Analyze with conversation context
    analysis = analyze_conversation(
        cultural_context, emotional_state, gaze_behavior,
        "\n".join(transcriber.conversation_history[-20:])  # Last 20 exchanges
    )
else:
    transcriber.conversation_history.append("SELF: " + transcript)

📊 Benefits

  • Enhanced Accuracy: Better cultural guidance based on conversation flow
  • Contextual Feedback: Suggestions consider previous dialogue
  • Improved Learning: System understands conversation patterns
  • Cultural Continuity: Maintains appropriate cultural tone throughout interaction

🚀 Current Setup

📋 Quick Start

  • 🎤 Audio Module: python audio/transcribe.py - Demo speech transcription
  • 👁️ Vision Module: python vision/main.py - Demo visual analysis
  • 🧠 Analyzer Module: LLM-powered multimodal analysis

🐍 Environment Management

Export Current Environment

conda env export --no-builds > environment.yml

Create Environment from Configuration

conda env create -f environment.yml

Alternative: Install from Requirements

pip install -r requirements.txt

🔮 Future Enhancements

  • Real-time multimodal integration
  • Advanced cultural context models
  • Mobile application development
  • Extended language support
  • Personalized feedback systems

About

A multimodal tool to help people with anxiety communicate better.
