Skip to content

Blink Speech is a revolutionary browser-based assistive communication application that transforms eye blink patterns and gaze gestures into spoken phrases using advanced computer vision and speech synthesis. Built with modern web technologies, it operates entirely client-side to ensure maximum privacy and accessibility.

License

Recursion-Labs/Blink-Speech

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

41 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

Blink Speech

Turning blinks and gaze into voice โ€“ communication without boundaries.

Blink Speech is a revolutionary browser-based assistive communication application that transforms eye blink patterns and gaze gestures into spoken phrases using advanced computer vision and speech synthesis. Built with modern web technologies, it operates entirely client-side to ensure maximum privacy and accessibility.

Demo Video Documentation Live Site License


๐Ÿ“Š Project Stats

GitHub stars GitHub forks GitHub issues GitHub pull requests GitHub last commit GitHub code size


โœจ Key Features

๐ŸŽฏ Real-time Gesture Recognition - Advanced computer vision detects blinks and gaze directions
๐Ÿ—ฃ๏ธ Natural Speech Synthesis - High-quality text-to-speech using Web Speech API
๐ŸŽจ Fully Customizable - Create your own gesture-to-phrase mappings
๐Ÿ”’ Privacy-First - All processing happens locally, no video data transmitted
โšก Zero Installation - Runs in any modern web browser with HTTPS
โ™ฟ Accessibility-Focused - Designed for users with motor impairments and speech limitations
๐ŸŒ Multi-language Support - Works with any language or custom phrases
๐Ÿ“ฑ Cross-Platform - Compatible with desktop, tablet, and mobile devices

๐Ÿ› ๏ธ Technology Stack

React Vite TypeScript MediaPipe WebGazer.js TensorFlow.js Tailwind CSS Radix UI Supabase Next.js Web Speech API


๐ŸŒ Vision & Impact

Everyone Deserves a Voice

Blink Speech was born from a simple yet powerful belief: communication is a fundamental human right. By transforming natural eye movements into spoken words, we're breaking down barriers that prevent people from expressing their thoughts, needs, and emotions.

Who We Help

๐Ÿฅ Healthcare Patients

  • ICU patients who cannot speak due to intubation
  • Post-surgery recovery when vocal communication is difficult
  • Individuals with locked-in syndrome or severe paralysis
  • Emergency communication when traditional methods fail

โ™ฟ People with Disabilities

  • ALS (Lou Gehrig's disease) patients as speech deteriorates
  • Individuals with muscular dystrophy or cerebral palsy
  • Stroke survivors during speech therapy recovery
  • Anyone with motor impairments affecting traditional communication

โฐ Temporary Conditions

  • Recovery from oral or throat surgery
  • Severe laryngitis or vocal cord issues
  • Medication side effects affecting speech
  • Fatigue-related communication difficulties

๐ŸŒ Global Accessibility

  • Works in any language or cultural context
  • No specialized hardware or expensive equipment required
  • Runs on existing devices (computers, tablets, phones)
  • Free and open-source for maximum accessibility

๐Ÿš€ Quick Start

For Users

  1. ๐Ÿ“ฑ Try the Demo Video: YouTube Demo
  2. ๐Ÿ“– Read the User Guide: Complete usage instructions
  3. ๐ŸŽฏ Complete Calibration: Follow the 5-point setup for optimal accuracy
  4. ๐Ÿ—ฃ๏ธ Start Communicating: Use blinks and gaze to speak!

For Developers

  1. ๐Ÿ“ฅ Clone the Repository

    git clone https://github.com/akshad-exe/Blink-Speech.git
    cd Blink-Speech
  2. โš™๏ธ Install Dependencies

    # Frontend
    cd frontend && npm install
    
    # Backend
    cd ../backend && npm install
  3. ๐Ÿ”ง Configure Environment

  4. ๐ŸŽฌ Run Development Servers

    # Terminal 1 - Frontend (https://localhost:5173)
    cd frontend && npm run dev
    
    # Terminal 2 - Backend (http://localhost:3001)
    cd backend && npm run dev

๐Ÿ“š Complete Documentation

๐Ÿ“– User Documentation

๐Ÿ—๏ธ Technical Documentation

๐Ÿ”ฌ Core Technologies

๐Ÿ“‹ Complete Documentation Hub - Start here for all documentation


๐Ÿ‘ฅ Team

Role Name GitHub
๐Ÿง  Project Lead Md Athar Jamal Makki @atharhive
๐ŸŽจ Frontend Lead Akshad Jogi @akshad-exe
๐Ÿ› ๏ธ Backend Lead Ayush Sarkar @dev-Ninjaa

๐ŸŽฏ How It Works

1. ๐Ÿ‘๏ธ Gesture Recognition

Advanced computer vision powered by MediaPipe and WebGazer.js detects:

  • Blink Patterns: Single, double, triple, and long blinks
  • Gaze Directions: Left, right, up, down, and center positioning
  • Combined Gestures: Blinks + gaze for complex communication (20+ combinations)

2. ๐ŸŽฏ Real-time Processing

  • <150ms Detection Latency: Near-instantaneous gesture recognition
  • Eye Aspect Ratio (EAR): Scientific method for accurate blink detection
  • Adaptive Thresholds: Automatic calibration for optimal performance
  • 15-30 FPS Processing: Smooth real-time operation

3. ๐Ÿ—ฃ๏ธ Speech Synthesis

  • Web Speech API: High-quality, natural-sounding voices
  • Multi-language Support: Works with any language
  • Customizable Voice: Adjust rate, pitch, and volume
  • <1s Speech Latency: From gesture to spoken word

4. ๐Ÿ”’ Privacy & Security

  • 100% Local Processing: No video data ever leaves your device
  • HTTPS Encryption: Secure communication protocols
  • Anonymous Usage: No personal information required
  • Local Storage: Settings saved securely on your device

๐Ÿฅ Medical & Healthcare Applications

Critical Care Benefits

๐Ÿšจ Emergency Communication: Instant access to critical phrases ("Help", "Pain", "Emergency")
๐Ÿ“Š Patient Monitoring: Non-verbal feedback for medical assessment
๐Ÿ”„ Telemedicine Integration: Remote patient communication capabilities
โšก Rapid Response: Immediate notification systems for urgent needs

Rehabilitation Support

๐Ÿง  Stroke Recovery: Bridge communication during speech therapy
๐Ÿ’ช Motor Skill Development: Eye-tracking exercises aid neurological recovery
๐Ÿ“ˆ Progress Tracking: Monitor improvement in motor control and communication
๐ŸŽฏ Adaptive Learning: System learns and adapts to individual capabilities

Long-term Care

๐Ÿ  Home Healthcare: Enables independent communication with caregivers
๐Ÿ“ฑ Family Connection: Stay connected with loved ones remotely
๐Ÿ”” Alert Systems: Customizable emergency and routine notifications
๐Ÿ“ Care Documentation: Optional logging for healthcare providers


๐Ÿ“Š Performance & Compatibility

System Specifications

  • Detection Accuracy: >95% in optimal conditions
  • Latency: <150ms gesture recognition, <1s speech output
  • Frame Rate: Adaptive 15-30 FPS based on device capabilities
  • Memory Usage: <100MB typical operation
  • Storage: ~50MB for complete application cache

Browser Support

Browser Version MediaPipe WebGazer Speech API Status
Chrome 80+ โœ… โœ… โœ… โœ… Optimal
Firefox 75+ โœ… โœ… โœ… โœ… Excellent
Safari 13+ โœ… โš ๏ธ โœ… โœ… Good
Edge 80+ โœ… โœ… โœ… โœ… Excellent

Device Compatibility

๐Ÿ–ฅ๏ธ Desktop: Windows, macOS, Linux - Full feature support
๐Ÿ“ฑ Tablet: iPad, Android tablets - Optimized touch interface
๐Ÿ“ฒ Mobile: Smartphone support with adaptive UI
๐ŸŽฅ Cameras: Built-in webcams, USB cameras, HD recommended


๐Ÿš€ Roadmap & Future Features

๐Ÿ”ฎ Version 2.0 (In Development)

  • ๐Ÿง  AI-Powered Phrase Prediction: Context-aware phrase suggestions
  • ๐ŸŒ Enhanced Multi-language: 50+ languages with native voices
  • ๐Ÿ“Š Analytics Dashboard: Usage patterns and communication insights
  • ๐Ÿ”— Healthcare Integrations: Direct API connections to medical systems

๐ŸŒŸ Future Innovations

  • ๐Ÿ‘“ AR/VR Integration: Wearable device support (AR glasses, smart contact lenses)
  • ๐Ÿค– Machine Learning: Personalized gesture recognition improvement
  • ๐Ÿฅ Medical Partnerships: Integration with hospital communication systems
  • ๐ŸŒ Offline PWA: Complete offline functionality as Progressive Web App
  • ๐ŸŽฎ Gamification: Interactive learning and practice modes

๐Ÿค Community Features

  • ๐Ÿ‘ฅ Gesture Sharing: Community-driven phrase mappings
  • ๐Ÿ“š Learning Resources: Tutorials and best practices
  • ๐Ÿ”ง Plugin System: Extensible architecture for custom integrations
  • ๐Ÿ“ฑ Mobile Apps: Native iOS/Android applications

๐ŸŒŸ Key Features

๐ŸŽฏ Core Capabilities

โœ… Zero Installation - Works instantly in any modern browser
โœ… Complete Privacy - 100% client-side processing, no data transmission
โœ… Real-time Recognition - <150ms gesture detection latency
โœ… Custom Mappings - Create your own gesture-to-phrase combinations
โœ… Multi-language - Support for any language or custom phrases
โœ… Offline Ready - Core features work without internet connection

โ™ฟ Accessibility Features

โœ… High Contrast Mode - Enhanced visibility for users with visual impairments
โœ… Large Text Options - Scalable interface for better readability
โœ… Screen Reader Support - Full compatibility with assistive technologies
โœ… Keyboard Navigation - Complete keyboard accessibility
โœ… Voice Customization - Adjustable speech rate, pitch, and volume
โœ… Emergency Mode - Quick access to critical communication phrases

๐Ÿ”ง Advanced Features

โœ… Adaptive Performance - Automatic optimization based on device capabilities
โœ… Calibration System - Personalized setup for optimal accuracy
โœ… Data Export/Import - Share settings between devices and users
โœ… Cloud Sync - Optional backup and synchronization (Supabase)
โœ… SMS Integration - Send messages via Twilio API
โœ… Real-time Logging - Optional activity tracking for healthcare providers


๐Ÿค Contributing

We welcome contributions from developers, researchers, and accessibility advocates! Here's how you can help:

๐Ÿ› ๏ธ Development

  • ๐Ÿ› Report Bugs: Create an issue with detailed reproduction steps
  • ๐Ÿ’ก Suggest Features: Share ideas for improving accessibility and usability
  • ๐Ÿ”ง Submit Code: Fork, develop, and create pull requests
  • ๐Ÿ“ Documentation: Help improve guides, tutorials, and API docs

๐Ÿงช Testing & Feedback

  • ๐Ÿฅ Healthcare Professionals: Provide clinical insights and use case feedback
  • โ™ฟ Accessibility Users: Share experiences and improvement suggestions
  • ๐ŸŒ Localization: Help translate and adapt for different languages/cultures
  • ๐Ÿ“Š Research: Academic collaboration on computer vision and accessibility

๐Ÿ“‹ Contribution Guidelines

  1. Read our Development Guide
  2. Follow our Code of Conduct
  3. Check existing issues and discussions before creating new ones
  4. Write clear commit messages and documentation
  5. Test thoroughly and include relevant test cases

๐Ÿ“„ License

Blink Speech is open-source software licensed under the MIT License. This means you can:

โœ… Use - For personal, commercial, or research purposes
โœ… Modify - Adapt the code to your specific needs
โœ… Distribute - Share with others or deploy your own version
โœ… Contribute - Help improve the project for everyone


๐Ÿ†˜ Support & Community

๐Ÿ“ž Get Help

๐ŸŒ Connect

  • ๐Ÿ™ GitHub: @akshad-exe/Blink-Speech
  • ๐Ÿ“ง Contact: For accessibility partnerships and healthcare integrations
  • ๐Ÿค Collaborate: Open to academic research partnerships

๐Ÿšจ Emergency Support

For urgent accessibility needs or critical bugs affecting communication:

  1. Create a high-priority GitHub issue
  2. Include detailed system information and reproduction steps
  3. Tag the issue with "urgent" or "accessibility-critical"

๐Ÿ™ Acknowledgments

Research & Inspiration:

  • MediaPipe team at Google for facial landmark detection
  • WebGazer.js contributors for browser-based eye tracking
  • Accessibility research community for guidance and feedback
  • Healthcare professionals providing real-world insights

Open Source Technologies:

  • React and Vite communities for modern web development tools
  • TensorFlow.js for browser-based machine learning
  • Supabase for backend infrastructure
  • Tailwind CSS and Radix UI for accessible design systems

Special Thanks:

  • Beta testers who provided crucial feedback
  • Accessibility advocates who guided our design decisions
  • Healthcare institutions that shared use case requirements
  • Open source contributors who helped improve the codebase

๐ŸŒŸ If Blink Speech has helped you or someone you know, please consider starring the repository to help others discover this tool! ๐ŸŒŸ

โญ Star on GitHub โญ

"Communication is a human right. Technology should make it accessible to everyone."

About

Blink Speech is a revolutionary browser-based assistive communication application that transforms eye blink patterns and gaze gestures into spoken phrases using advanced computer vision and speech synthesis. Built with modern web technologies, it operates entirely client-side to ensure maximum privacy and accessibility.

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Contributors 3

  •  
  •  
  •  

Languages