Glyph

Voice-controlled markdown editing that doesn't suck

Voice is an underutilized interface. Most tools treat it as a gimmick, but voice can drive complex workflows faster than typing when done right.

Glyph is a voice-native, agentic markdown editor with conversational AI that learns your patterns and maintains context across sessions.

Documentation

Installation Guide - Setup and configuration
Complete Functionality - Feature overview and usage examples
Architecture Documentation - Technical design and implementation
Testing Guide - QA documentation and test coverage
Contributing Guidelines - Development setup and contribution process

What this does

Glyph has two modes:

Direct editing: Point it at a markdown file, say what you want changed, see a diff, approve or reject
Agent mode: Conversational AI that manages your entire Obsidian vault, remembers context across sessions

The agent mode is where it gets interesting - it learns how you refer to your notes and can handle complex multi-step operations through natural conversation.

Why I built this

Most tools treat voice as an afterthought - a novelty feature that barely works. But voice is actually faster than typing for many operations, especially when you're already thinking in natural language.

Voice commands like "mark the third task complete" or "link this to my architecture notes" are faster than clicking through menus, but most voice tools can't handle context or remember how you work.

Voice works everywhere - on mobile where typing sucks, when your hands are busy, or when you're thinking through complex problems and don't want to stop to navigate UIs. The key is building voice interfaces that actually understand what you mean and remember what you've done, not just transcribe words. Glyph achieves this through persistent memory, multi-turn conversations, context tracking, and a learning system that improves with usage.

Quick start

git clone https://github.com/tnagar72/Glyph.git
cd Glyph
python -m venv glyph_env && source glyph_env/bin/activate  # Windows: glyph_env\Scripts\activate
pip install -r requirements.txt

# Test it works
python main.py --transcript-only

# Try agent mode (the fun part)
python main.py --setup-agent  # point it at your Obsidian vault
python main.py --agent-mode

Then just talk to it:

You: "Create a note about today's standup"
Glyph: ✅ Created "Daily Standup 2024-07-08.md"

You: "Add action items section to it" 
Glyph: ✅ Added section "Action Items" to Daily Standup 2024-07-08.md

You: "Open that note in Obsidian"
Glyph: ✅ Opened Daily Standup 2024-07-08.md in Obsidian

Direct editing mode

For when you just want to edit a single file:

# Edit a specific file
python main.py --file notes.md

# Preview changes without applying (recommended first time)
python main.py --file notes.md --dry-run

# Use Enter-to-stop recording if spacebar interferes with your terminal
python main.py --file notes.md --enter-stop

Voice command examples:

"Mark the second task as complete"
"Add a new task about fixing the deployment pipeline"
"Move the meeting notes section to the top"
"Change the deadline from Friday to next Monday"
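To make the first command concrete, here is a deterministic sketch of the edit that "Mark the second task as complete" is expected to produce. Glyph hands the transcript to GPT-4 rather than applying a regex, so the helper name `complete_nth_task` and its counting rule (every checkbox counts, checked or not) are assumptions for illustration only.

```python
import re

# Matches a markdown task line such as "- [ ] Write tests" or "* [x] Done".
TASK_RE = re.compile(r"^(\s*[-*] )\[[ xX]\]")


def complete_nth_task(markdown: str, n: int) -> str:
    """Return the markdown with the n-th (1-based) task checkbox ticked.

    Hypothetical helper: it only shows the result the voice command implies.
    """
    lines = markdown.split("\n")
    seen = 0
    for i, line in enumerate(lines):
        if TASK_RE.match(line):
            seen += 1
            if seen == n:
                # Rewrite only the checkbox, keeping indentation and task text.
                lines[i] = TASK_RE.sub(r"\1[x]", line, count=1)
                return "\n".join(lines)
    raise ValueError(f"found {seen} task(s), cannot complete task {n}")
```

Running a proposed edit like this through `--dry-run` first is the safer habit, since a spoken "second task" can be ambiguous when some tasks are already ticked.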

Agent mode (the interesting part)

This is where Glyph shines. It's a conversational AI that:

Remembers context within a session and across sessions
Learns your note names and handles typos/variations
Manages your whole vault with 15+ specialized tools
Handles multi-step operations through natural conversation

# Set up once
python main.py --setup-agent

# Launch agent
python main.py --agent-mode

# Text-only mode for testing without voice
python main.py --agent-mode --text-only

What the agent can do

File operations:

Create, read, edit, delete, rename, move notes
Add/edit sections within notes
Generate summaries and content
Create links between notes
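Operations like these map naturally onto a table of named tools that the agent chooses from. The sketch below shows one way such a registry could look; the `TOOLS` dict, the `tool` decorator, `run_tool`, and both tool names are assumptions made for illustration, not the contents of `agent_tools.py`.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: tool name -> function taking the vault path first.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(name: str):
    """Register a function under a tool name."""
    def decorator(func):
        TOOLS[name] = func
        return func
    return decorator


@tool("create_note")
def create_note(vault: Path, title: str, body: str = "") -> str:
    path = vault / f"{title}.md"
    if path.exists():
        raise FileExistsError(f"{path.name} already exists")
    path.write_text(body, encoding="utf-8")
    return f'Created "{path.name}"'


@tool("add_section")
def add_section(vault: Path, title: str, heading: str) -> str:
    path = vault / f"{title}.md"
    text = path.read_text(encoding="utf-8")
    if text and not text.endswith("\n"):
        text += "\n"
    gap = "\n" if text else ""  # blank line between old content and new heading
    path.write_text(f"{text}{gap}## {heading}\n", encoding="utf-8")
    return f'Added section "{heading}" to {path.name}'


def run_tool(name: str, vault: Path, **kwargs) -> str:
    """Dispatch a tool call by name; unknown names fail loudly."""
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](vault, **kwargs)
```

A dict keyed by name keeps the set of tools easy to extend, which is also what a plugin mechanism would need.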

Smart features:

Learns how you refer to notes ("my stanford app" → "Stanford Application.md")
Handles typos and partial matches
Maintains conversation context ("add that to the note I just created")
Automatically backs up before making changes
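A rough sketch of how reference learning and typo handling could fit together: learned phrases are checked first, then a fuzzy match against note names. This uses the standard library's `difflib` as a stand-in; the `NoteResolver` class and its methods are hypothetical and are not taken from `agent_memory.py`.

```python
import difflib
import os
from typing import Dict, List, Optional


class NoteResolver:
    """Resolve a spoken reference to a note filename (illustrative only)."""

    def __init__(self, note_names: List[str]) -> None:
        self.note_names = list(note_names)
        self.aliases: Dict[str, str] = {}  # learned phrase -> filename

    def learn(self, phrase: str, filename: str) -> None:
        """Remember that the user calls `filename` by `phrase`."""
        self.aliases[phrase.lower().strip()] = filename

    def resolve(self, phrase: str, cutoff: float = 0.6) -> Optional[str]:
        key = phrase.lower().strip()
        # 1. Exact hit on something learned earlier.
        if key in self.aliases:
            return self.aliases[key]
        # 2. Fuzzy match on lowercase names without the ".md" extension.
        stems = {os.path.splitext(n)[0].lower(): n for n in self.note_names}
        matches = difflib.get_close_matches(key, list(stems), n=1, cutoff=cutoff)
        return stems[matches[0]] if matches else None
```

Calling `learn("my stanford app", "Stanford Application.md")` after the user confirms a match means the same phrase resolves immediately next time, without relying on the similarity score.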

Example conversation:

You: "Find my notes about the API redesign"
Agent: Found: "API Redesign Proposal.md", "API Migration Notes.md"

You: "Open the proposal one"
Agent: ✅ Opened "API Redesign Proposal.md" in Obsidian

You: "Add a section about backwards compatibility"
Agent: ✅ Added section "Backwards Compatibility" to API Redesign Proposal.md

You: "Link it to my migration notes"
Agent: ✅ Added wikilink to API Migration Notes.md
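The last step relies on Obsidian's `[[Note Name]]` wikilink syntax. A minimal sketch of that operation follows; the `add_wikilink` function and the "Related:" line it writes are assumptions, and the real tool may place links differently.

```python
from pathlib import Path


def add_wikilink(note: Path, target: Path) -> bool:
    """Append an Obsidian-style [[wikilink]] to `note` unless already present.

    Returns True if the file was changed.
    """
    link = f"[[{target.stem}]]"  # wikilinks use the note name without ".md"
    text = note.read_text(encoding="utf-8")
    if link in text:
        return False
    if text and not text.endswith("\n"):
        text += "\n"
    note.write_text(f"{text}\nRelated: {link}\n", encoding="utf-8")
    return True
```

Checking for an existing link first keeps repeated voice commands from stacking duplicate links in the note.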

Live transcription

Real-time voice-to-text streaming:

# Stream to terminal
python main.py --live

# Copy directly to clipboard
python main.py --live --clipboard

# Pipe to other tools
python main.py --live | grep -i "important" | tee important_notes.txt

Great for meeting notes, brainstorming, or any time you need fast voice-to-text.
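If the filtering stage needs more than `grep`, the same idea can be written in Python. This sketch assumes `--live` writes one transcribed utterance per line to stdout, which the pipe example suggests but does not state; `filter_lines` is a hypothetical helper.

```python
from typing import Iterable, Iterator


def filter_lines(lines: Iterable[str], keyword: str) -> Iterator[str]:
    """Yield lines containing `keyword`, ignoring case (like `grep -i`)."""
    needle = keyword.lower()
    for line in lines:
        if needle in line.lower():
            yield line.rstrip("\n")
```

Passing `sys.stdin` as `lines` and printing each result with `flush=True` would let a script built on this replace the `grep` stage in the pipeline above.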

Configuration

Glyph supports both local Whisper models and OpenAI API transcription.

Transcription methods

# Configure your preferred method
python main.py --setup-transcription

Method	Cost	Privacy	Speed	Accuracy
Local Whisper	Free	Complete	2-10s	Good
OpenAI API	$0.006/min	Data sent to OpenAI	1-3s	Excellent
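For a sense of scale, the API rate in the table can be turned into a quick estimate. This is simple arithmetic on the listed $0.006/min figure; it assumes billing is linear in audio length and ignores any rounding the provider applies.

```python
from decimal import Decimal

API_RATE_PER_MINUTE = Decimal("0.006")  # USD per minute, from the table above


def api_cost(minutes: float) -> Decimal:
    """Estimated API transcription cost in USD for `minutes` of audio."""
    cost = Decimal(str(minutes)) * API_RATE_PER_MINUTE
    return cost.quantize(Decimal("0.001"))
```

At this rate an hour of dictation costs about 36 cents, so the choice between the two methods mostly comes down to privacy and latency rather than price.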

Whisper models

# Choose model size vs speed tradeoff
python main.py --setup-model

Model	Parameters	Speed	Use case
tiny	39M	Fastest	Quick testing
base	74M	Fast	Simple commands
small	244M	Medium	Balanced
medium	769M	Slow	Recommended
large	1550M	Slowest	Best accuracy

Audio setup

# Interactive device selection
python main.py --setup-audio

# View all current settings
python main.py --show-config

Technical details

Architecture

Two main modes with different strengths:

Direct mode: Voice → Whisper → GPT-4 → Diff → Apply

Single-shot editing with immediate results
Rich diff display with user approval
Automatic backups and undo support
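The diff-then-approve step can be sketched with the standard library's `difflib`. This is a plain-text stand-in for the rich diff display; `unified_diff_text`, `review_edit`, and the `approve` callback are hypothetical names rather than the functions in `diff.py`.

```python
import difflib
from typing import Callable


def unified_diff_text(original: str, proposed: str, filename: str) -> str:
    """Build a unified diff between the current and proposed file contents."""
    diff = difflib.unified_diff(
        original.splitlines(keepends=True),
        proposed.splitlines(keepends=True),
        fromfile=f"{filename} (current)",
        tofile=f"{filename} (proposed)",
    )
    return "".join(diff)


def review_edit(original: str, proposed: str, filename: str,
                approve: Callable[[str], bool]) -> str:
    """Return `proposed` only if there is a change and `approve(diff)` accepts it."""
    diff = unified_diff_text(original, proposed, filename)
    if not diff:
        return original  # nothing changed, nothing to approve
    return proposed if approve(diff) else original
```

In an interactive session `approve` would print the diff and ask for a yes or no; with `--dry-run` it would print the diff and always decline.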

Agent mode: Voice → Context Analysis → Tool Selection → Execution → Learning

Persistent conversation state and memory
Multi-step operations with reference tracking
Learning system that improves over time
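Reference tracking of the kind described here can be reduced to "remember which note was touched last, and resolve vague phrases to it". The sketch below shows that idea with JSON persistence between sessions; the `ConversationContext` class, its fields, and the list of vague phrases are assumptions, not the design of `agent_context.py`.

```python
import json
from dataclasses import dataclass, field
from typing import List, Optional

# Phrases treated as "the note we were just talking about" (illustrative list).
VAGUE_REFERENCES = {"it", "that", "that note", "this note", "the note i just created"}


@dataclass
class ConversationContext:
    recent_notes: List[str] = field(default_factory=list)

    def touch(self, note: str) -> None:
        """Record that a note was just created, opened, or edited."""
        if note in self.recent_notes:
            self.recent_notes.remove(note)
        self.recent_notes.append(note)

    def resolve(self, reference: str) -> Optional[str]:
        """Map a vague reference to the most recently touched note."""
        if reference.lower().strip() in VAGUE_REFERENCES:
            return self.recent_notes[-1] if self.recent_notes else None
        return reference  # already a concrete name; pass it through

    def to_json(self) -> str:
        return json.dumps({"recent_notes": self.recent_notes})

    @classmethod
    def from_json(cls, payload: str) -> "ConversationContext":
        return cls(recent_notes=list(json.loads(payload)["recent_notes"]))
```

Saving the `to_json()` output when a session ends and loading it at startup is the simplest way to make "that note" still mean something the next day.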

Key components

glyph/
├── main.py                 # Entry point and mode routing
├── recording.py            # Audio capture with validation
├── transcription.py        # Dual transcription service
├── llm.py                 # GPT-4 integration
├── agent_cli.py           # Conversational interface
├── agent_tools.py         # 15+ vault management tools
├── agent_memory.py        # Reference learning system
├── agent_context.py       # Multi-turn conversation tracking
├── interactive_cli.py     # Rich terminal interface
├── live_transcription.py  # Real-time streaming
├── backup_manager.py      # Centralized backup system
└── [config files...]      # Audio, model, transcription setup
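As an illustration of the backup-before-edit behaviour, a timestamped copy is enough to make any change reversible. This is a minimal sketch; `backup_file`, the naming scheme, and the backup directory layout are assumptions rather than the behaviour of `backup_manager.py`.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_file(path: Path, backup_dir: Path) -> Path:
    """Copy `path` into `backup_dir` under a timestamped name; return the copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Microsecond precision keeps rapid successive backups from colliding.
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S-%f")
    target = backup_dir / f"{path.stem}.{stamp}{path.suffix}"
    shutil.copy2(path, target)  # copy2 also preserves modification times
    return target
```

Undo then amounts to copying the most recent backup back over the edited file.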

Testing

Comprehensive test suite with 100% feature coverage:

# Quick validation
python run_tests_simple.py

# Full test suite (agent + direct mode)
python run_all_tests.py

# Individual components
python test_agent_comprehensive.py
python test_nonagent_comprehensive.py

The tests mock external APIs and audio input, so they run reliably in CI/CD.

Troubleshooting

Audio not working:

# Check devices
python -c "import sounddevice as sd; print(sd.query_devices())"

# Run audio setup
python main.py --setup-audio

Transcription errors:

# Test all methods
python main.py --test-transcription

# Use larger model for better accuracy
python main.py --setup-model

Terminal interference (iTerm users):

# Use enter-to-stop instead of spacebar
python main.py --agent-mode --enter-stop

Import errors:

# Make sure you're in the virtual environment
source glyph_env/bin/activate

Requirements

Python 3.8+ (3.9+ recommended)
2GB+ RAM (4GB+ for large Whisper models)
Microphone
OpenAI API key (needed for the GPT-4 editing step; optional for transcription, which can run locally)
Obsidian (optional, for vault integration)

Works on macOS, Linux, and Windows. Tested primarily on macOS with iTerm2.

What's next

Some ideas I'm considering:

Plugin architecture: Let people write custom agent tools
Voice synthesis: Have the agent talk back with confirmations
Better context: Understand relationships between notes
Web interface: For remote access and mobile use
Custom models: Train on your specific note patterns

Open to contributions and ideas. This started as a weekend project but has grown into something I use daily.

Files worth looking at

agent_tools.py - The 15+ tools that power agent mode
agent_memory.py - How the learning system works
transcription.py - Dual transcription with fallback logic
test_agent_comprehensive.py - Comprehensive agent testing
TESTING_GUIDE.md - Complete testing documentation
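For readers curious what "fallback logic" means in practice, the general pattern is to try each transcription method in order and return the first success. The sketch below shows that pattern only; `transcribe_with_fallback` and the `(name, callable)` pairs are hypothetical and do not reflect the actual interface of `transcription.py`.

```python
from typing import Callable, List, Sequence, Tuple

Transcriber = Callable[[bytes], str]


def transcribe_with_fallback(
    audio: bytes, methods: Sequence[Tuple[str, Transcriber]]
) -> Tuple[str, str]:
    """Try each (name, transcriber) in order; return (name, text) on first success."""
    errors: List[str] = []
    for name, transcriber in methods:
        try:
            return name, transcriber(audio)
        except Exception as exc:  # any failure moves on to the next method
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all transcription methods failed: " + "; ".join(errors))
```

Returning the method name alongside the text makes it easy to log which path handled a request, which helps when debugging slow or inaccurate transcriptions.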

Contributing

Standard stuff:

Fork, branch, PR
Run tests before submitting: python run_all_tests.py
Follow existing code style
Add tests for new features

The codebase is pretty clean and well-documented. Most complexity is in the agent system and conversation management.

License

MIT - do whatever you want with it.

If you build something cool on top of this, let me know. Always interested to see how people extend it.

Note: This tool processes voice locally by default but can use OpenAI's API for transcription and text processing. See the privacy/security section in FUNCTIONALITY.md for details.
