Achieve 3-4x speedup through intelligent prompt compression, smart caching, and adaptive planning
Running Claude Code with local Ollama models hits performance walls:
| Issue | Impact |
|---|---|
| Slow responses | 5-6 minutes for moderate prompts |
| Silent failures | Commands hang without error messages |
| Repeated work | No benefit from previous computations |
| Slow commands | `/help` commands take 10+ seconds |
The result? Frustrating, unpredictable local AI development.

| Feature | What it does | Benefit |
|---|---|---|
| 🗜️ Compression | Automatically compresses prompts 30-50% without losing information | ✅ Saves 10-20s per prompt |
| 📦 Caching | Smart response caching system with intelligent TTL | ✅ 20x faster repeated queries |
| 🧩 Planning | Breaks tasks into optimal parallelizable chunks | ✅ 3-4x faster complex tasks |

| Task | Before | After | Improvement |
|---|---|---|---|
| Medium Prompt (500 tokens) | 45s | 18s | 2.5x ⚡ |
| Large Prompt (1000+ tokens) | 90s | 28s | 3.2x ⚡ |
| Repeated /help | 10s | 0.5s | 20x ⚡⚡⚡ |
| Average Session | 45s | 12s | 3.75x ⚡ |

Hardware: Windows 11, i7-12700K, 16GB RAM
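
The Improvement column is simply the before time divided by the after time. The sketch below recomputes it from the figures in the table; the function and variable names are illustrative and not part of CATALYST.

```python
# Before/after timings in seconds, copied from the benchmark table.
BENCHMARKS = {
    "Medium Prompt (500 tokens)": (45.0, 18.0),
    "Large Prompt (1000+ tokens)": (90.0, 28.0),
    "Repeated /help": (10.0, 0.5),
    "Average Session": (45.0, 12.0),
}


def speedup(before_s: float, after_s: float) -> float:
    """Return how many times faster the 'after' run is than the 'before' run."""
    if after_s <= 0:
        raise ValueError("after_s must be positive")
    return before_s / after_s


def seconds_saved(before_s: float, after_s: float) -> float:
    """Return the absolute time saved in seconds."""
    return before_s - after_s


if __name__ == "__main__":
    for task, (before, after) in BENCHMARKS.items():
        print(
            f"{task}: {speedup(before, after):.2f}x, "
            f"saves {seconds_saved(before, after):.1f}s"
        )
```

Measuring your own before and after times and running them through the same division is a quick way to see whether these numbers hold on your hardware.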
- ✅ Claude Code CLI installed
- ✅ Ollama running (`ollama launch claude`)
- ✅ Windows/Linux/macOS

Option 1: Copy Plugin (Recommended)

    mkdir -p ~/.claude/plugins/cache/local/
    cp -r ~/Projects/claude-code-orchestrator ~/.claude/plugins/cache/local/catalyst

Option 2: Symlink (Development)

    ln -s ~/Projects/claude-code-orchestrator ~/.claude/plugins/cache/local/catalyst

Then reload plugins and verify:

    claude --reload-plugins
    claude --skill catalyst-prompt-optimizer status

Expected: CATALYST installed, Ollama enabled

    claude --skill catalyst-prompt-optimizer analyze "your long prompt"
    claude --skill catalyst-prompt-optimizer stats

Compression Results:
- 200 tokens → 5-10% compression (1-2s faster)
- 500 tokens → 25-35% compression (3-5s faster)
- 1000+ tokens → 30-50% compression (10-20s faster)
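
To give a feel for what prompt compression can mean in practice, here is a minimal sketch that collapses redundant whitespace and drops exact duplicate lines. It is an assumption-laden illustration, not CATALYST's actual algorithm: `compress_prompt` and `compression_ratio` are hypothetical names, and dropping repeated lines is only safe for prose prompts, not for code where repetition can be meaningful.

```python
import re


def compress_prompt(prompt: str) -> str:
    """Collapse runs of whitespace and drop blank or exactly repeated lines.

    Hypothetical illustration only; not the optimizer shipped with CATALYST.
    """
    seen = set()
    kept = []
    for line in prompt.splitlines():
        normalized = re.sub(r"\s+", " ", line).strip()
        if not normalized or normalized in seen:
            continue  # skip blank lines and lines already emitted
        seen.add(normalized)
        kept.append(normalized)
    return "\n".join(kept)


def compression_ratio(original: str, compressed: str) -> float:
    """Return the fraction of characters removed, between 0.0 and 1.0."""
    if not original:
        return 0.0
    return 1.0 - len(compressed) / len(original)


if __name__ == "__main__":
    prompt = "Review   this code.\n\nReview this code.\nFocus on   security."
    short = compress_prompt(prompt)
    print(short)
    print(f"{compression_ratio(prompt, short):.0%} smaller")
```

Short prompts have little redundancy to remove, which fits the lower percentages listed for 200-token prompts.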
    claude --skill catalyst-planning plan "your complex task"

Example: Code review (normally 5-6 min)
- Syntax analysis [30s]
- Security check [40s]
- Performance review [40s]
- Synthesis [30s]
Total: about 2 minutes (roughly 3x faster)
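
The chunk times in this example can be added up in two ways: one after another, or with the three analysis chunks running concurrently before synthesis. The sketch below computes both. The `Chunk` type and the assumption that synthesis depends on the other three chunks are illustrative, and whether chunks truly overlap on a single local Ollama instance depends on your hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    name: str
    seconds: int
    depends_on: tuple = ()  # names of chunks that must finish first


# Durations taken from the code review example; dependencies are assumed.
CODE_REVIEW = [
    Chunk("Syntax analysis", 30),
    Chunk("Security check", 40),
    Chunk("Performance review", 40),
    Chunk(
        "Synthesis",
        30,
        depends_on=("Syntax analysis", "Security check", "Performance review"),
    ),
]


def sequential_seconds(chunks) -> int:
    """Total time when chunks run one after another."""
    return sum(c.seconds for c in chunks)


def parallel_seconds(chunks) -> int:
    """Critical-path time with unlimited parallelism.

    Assumes each chunk is listed after the chunks it depends on.
    """
    finish = {}
    for c in chunks:
        start = max((finish[d] for d in c.depends_on), default=0)
        finish[c.name] = start + c.seconds
    return max(finish.values(), default=0)


if __name__ == "__main__":
    print("sequential:", sequential_seconds(CODE_REVIEW), "s")  # 140 s
    print("parallel:  ", parallel_seconds(CODE_REVIEW), "s")    # 70 s
```

Run sequentially, the four chunks take 140 seconds, which is where the figure of about 2 minutes comes from. Full overlap of the analysis chunks would be the best case at 70 seconds.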
    claude --skill catalyst-cache status
    claude --skill catalyst-cache clear-all

Typical Savings: 30-60 seconds per session
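
Response caching with a TTL (time to live) generally works by storing each response with an expiry time and treating expired entries as misses. The sketch below shows that idea in isolation. `TTLCache` and its methods are hypothetical and do not describe CATALYST's internal storage format or configuration.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds (illustrative)."""

    def __init__(
        self,
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # evict the stale entry
            return None
        return value

    def put(self, key: str, value: str) -> None:
        """Store a value that expires ttl_seconds from now."""
        self._entries[key] = (self._clock() + self._ttl, value)

    def clear_all(self) -> None:
        """Drop every entry, similar in spirit to `catalyst-cache clear-all`."""
        self._entries.clear()


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=60)
    cache.put("/help", "cached help text")
    print(cache.get("/help"))
```

A repeated query such as `/help` is served from memory until its entry expires, which is why repeated commands benefit the most.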
Windows (PowerShell):

    [Environment]::SetEnvironmentVariable("ANTHROPIC_BASE_URL", "http://localhost:11434", "User")
    [Environment]::SetEnvironmentVariable("ANTHROPIC_AUTH_TOKEN", "ollama", "User")
    [Environment]::SetEnvironmentVariable("ANTHROPIC_API_KEY", "", "User")

Linux/macOS:

    export ANTHROPIC_BASE_URL=http://localhost:11434
    export ANTHROPIC_AUTH_TOKEN=ollama
    export ANTHROPIC_API_KEY=""

Verify:

    echo $ANTHROPIC_BASE_URL # Should show: http://localhost:11434

| Model | Speed | Quality | Best For |
|---|---|---|---|
| ⭐ qwen3-coder | ⚡⚡⚡ | ⭐⭐⭐⭐⭐ | Recommended |
| neural-chat | ⚡⚡⚡ | ⭐⭐⭐⭐ | Conversations |
| phi | ⚡⚡⚡⚡ | ⭐⭐⭐ | Quick tasks |
| glm-4.7 | ⚡⚡ | ⭐⭐⭐⭐⭐ | Complex reasoning |
| mistral | ⚡⚡ | ⭐⭐⭐⭐ | Balanced |

Plugin not loading?

    claude --reload-plugins
    ls ~/.claude/plugins/cache/local/catalyst/

Ollama not detected?

    echo $ANTHROPIC_BASE_URL
    ollama list

Still slow (>2 minutes)?
- Use `catalyst-planning` to break tasks into chunks
- Try a faster model: `phi` or `neural-chat`
- Check CPU/GPU usage

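
For the "Ollama not detected?" case, the three environment variables from the configuration step can be checked in one go. The sketch below only inspects the values documented in this README; `check_ollama_env` is a hypothetical helper, not a command shipped with CATALYST.

```python
import os
from typing import List, Mapping

# Value documented in this README for a local Ollama endpoint.
EXPECTED_BASE_URL = "http://localhost:11434"


def check_ollama_env(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    base_url = env.get("ANTHROPIC_BASE_URL", "")
    if base_url.rstrip("/") != EXPECTED_BASE_URL:
        problems.append(
            f"ANTHROPIC_BASE_URL is {base_url!r}, expected {EXPECTED_BASE_URL!r}"
        )
    if env.get("ANTHROPIC_AUTH_TOKEN") != "ollama":
        problems.append("ANTHROPIC_AUTH_TOKEN should be set to 'ollama'")
    if env.get("ANTHROPIC_API_KEY", ""):
        problems.append("ANTHROPIC_API_KEY should be empty for local Ollama")
    return problems


if __name__ == "__main__":
    issues = check_ollama_env()
    print("Environment looks fine" if not issues else "\n".join(issues))
```

On Windows, variables set with `SetEnvironmentVariable(..., "User")` only appear in terminals opened after the change, so rerun the check in a new window.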
Cache not working?

    claude --skill catalyst-cache status
    claude --skill catalyst-cache clear-all

| Question | Answer |
|---|---|
| Works with cloud API? | Auto-disables with the cloud API (not needed) |
| How much disk space? | 1-5 MB typical, max 50 MB |
| Is cache secure? | Local only; use `--skip-cache` for sensitive prompts |
| Other LLM providers? | v1.1 will add LM Studio, GPT4All |
| Change models? | Clear cache: `catalyst-cache clear-all` |
| Customize TTL? | Edit `~/.catalyst-cache-config.json` |

    catalyst/
    ├── .claude-plugin/    Plugin configuration
    ├── skills/            Three skills (optimizer, planner, cache)
    ├── lib/               Four core libraries
    ├── README.md          This file
    ├── QUICK_START.md     Installation guide
    ├── package.json       NPM metadata
    └── LICENSE            MIT License

| Metric | Value |
|---|---|
| Lines of Code | 1,700+ |
| Skills | 3 production-ready |
| Dependencies | Zero |
| License | MIT |
| Status | Production Ready |

v1.0.0 ✅
- Prompt compression
- Response caching
- Task planning

v1.1.0 🚧
- Web dashboard
- LM Studio support
- Auto model selection

v2.0.0 🔮
- Multi-model orchestration
- VSCode Extension
MIT License - Free for personal, educational, and commercial use.

Report Issues • Discussions • Full Guide

Made with ❤️ for the Claude Code + Ollama community

⭐ If this helps, consider giving it a star! ⭐