This document lists all the APIs, tools, and technologies used in building Lexia.
Purpose: High-quality text-to-speech with word-level timestamps
- Endpoint:
/v1/text-to-speech/{voice_id}/with-timestamps - Features Used:
- Text-to-speech synthesis with multiple voice options
- Word-level timestamp data for synchronized highlighting
- Voice catalog retrieval for dynamic voice selection
- Documentation: ElevenLabs API Docs
- License: Commercial API (users provide their own API keys)
Purpose: AI-powered text editing and explanations
- Model:
claude-sonnet-4-20250514 - Endpoint:
/v1/messages - Features Used:
- Text editing based on voice commands (e.g., "make this more formal")
- Text explanation generation for the "Explain" feature
- Documentation: Anthropic API Docs
- License: Commercial API (users provide their own API keys)
Purpose: Browser-native speech recognition for dictation
- Interface:
SpeechRecognition/webkitSpeechRecognition - Features Used:
- Continuous speech recognition
- Interim results for real-time transcription display
- Language configuration (en-US)
- Documentation: MDN Web Speech API
- License: Built into modern browsers (Chrome, Edge)
chrome.storage.local- Persist user settings and API keyschrome.runtime- Message passing between componentschrome.contextMenus- Right-click "Read with Lexia" menu itemchrome.scripting- Inject content scriptschrome.commands- Keyboard shortcuts (Alt+V, Alt+Space)
Documentation: Chrome Extensions API
window.getSelection()- Capture user's text selectionRangeAPI - Manipulate DOM for word highlightingAudioAPI - Playback of synthesized speechClipboardAPI - Fallback for dictation when no input is focused
Our hold-to-dictate interaction pattern is inspired by the "WhisperFlow" paradigm - a voice input method where users hold a button to speak and release to insert text. This provides:
- Clear start/stop boundaries for speech recognition
- Immediate visual feedback during dictation
- Easy cancellation (ESC key)
The floating sidebar player design is inspired by Speechify, featuring:
- Compact, non-intrusive sidebar
- Word-by-word highlighting synchronized with audio
- Speed controls and skip functionality
| Tool | Purpose |
|---|---|
| Visual Studio Code | Primary IDE |
| Chrome DevTools | Extension debugging |
| Git/GitHub | Version control |
- SVG icons are based on Font Awesome style paths
- Icons are embedded inline as SVG for performance and simplicity
- License: Icons created to match Font Awesome aesthetic (not directly copied)
Lexia is built with vanilla JavaScript and has no npm dependencies. This means:
- No build step required
- Minimal extension size
- Easy to audit and understand
- Works directly in the browser
- No backend server - All API calls go directly from the browser to ElevenLabs/Anthropic
- No data collection - We don't track, store, or process any user data
- User-owned API keys - Users maintain full control over their API usage and costs
This project is open source. See the repository for license details.
- ElevenLabs for providing high-quality, natural-sounding TTS with timestamp support
- Anthropic for Claude's excellent instruction-following capabilities
- The Chrome Extensions team for Manifest V3 and comprehensive APIs
- MDN Web Docs for excellent Web Speech API documentation
Built for the ElevenLabs Hackathon by Team AISquad#1