What it really means to be a 10x engineer, and the tool built for that reality.
The industry loves the lore of the "10x engineer": the lone genius who ships a new product in a weekend, the hacker who rewrites the entire stack in a caffeine-fueled sprint, the visionary who creates something from nothing.
That's not what a 10x engineer actually does.
The real 10x engineers aren't working on greenfield projects. They're not inventing new frameworks or building the next viral app. They're maintaining behemoth codebases where millions of users depend on their decisions every single day.
Their incentive structure is fundamentally different: "If it's not broken, don't fix it."
And with that constraint in mind, they ask a different question entirely:
"What truly matters for solving this particular problem, and how can I gain enough confidence to ship it reliably?"
Idea creation is a human trait. Ideas arise from impulsive feelings, obstacles we encounter, problems we want to solve. Creating something new is exciting, visceral, immediate.
Maintaining something reliably over time requires a completely different pedigree, and it is far more important than creating the idea itself.
Consider the reality:
- A production codebase with 100,000+ lines across hundreds of files
- Millions of users whose workflows depend on your system staying stable
- Years of accumulated complexity: edge cases, performance optimizations, backwards compatibility
- Distributed teams where no single person understands the entire system
- The cost of breaking things is measured in downtime, lost revenue, and user trust
In this environment, the questions that matter are:
- "If I change this function, what breaks?"
- "How does this data flow through the system?"
- "What are all the execution paths that touch this code?"
- "Where are the hidden dependencies I need to understand before refactoring?"
This is where most coding tools fail.
They're built for the weekend hackathon, the demo video, the "move fast and break things" mentality. They optimize for speed of creation, not confidence in maintenance.
We built axe because we understood this problem intimately. Our team has been maintaining production systems at scale, and we needed a tool that matched the way real engineering actually works.
axe is built for large codebases. Not prototypes. Not "good enough for now" solutions.
It's built for the engineer who needs to:
- Understand a call graph before changing a function signature
- Trace data flow to debug a subtle state corruption
- Analyze execution paths to understand why a test fails in CI but not locally
- Perform impact analysis before refactoring to know exactly what depends on what
The core insight: To ship reliably in large codebases, you need precise understanding, not exhaustive reading.
| Resource | Description |
|---|---|
| Installation | Get up and running in seconds. |
| axe-dig | Powerful inference needs precise retrieval. Dig into how the retrieval engine works. |
| Use Cases | Real-world workflows: features, bugs, refactoring, exploration. |
| Tools | Complete reference for file ops, shell, and axe-dig tools. |
| Agents | Creating custom agents and subagents for parallel work. |
| Configuration | Providers, models, sessions, and architecture deep dive. |
| Bodega Inference Engine | Loading models with continuous batching, tool parsers, speculative decoding, and more. |
Most coding tools take the brute-force approach: dump your entire codebase into the context window and hope the LLM figures it out.
This is backwards.
axe uses axe-dig, a 5-layer retrieval engine that extracts exactly what matters for the task at hand:

| Layer | Analysis | Question it answers |
|---|---|---|
| 5 | Program Dependence | "What affects line 42?" |
| 4 | Data Flow | "Where does this value go?" |
| 3 | Control Flow | "How complex is this?" |
| 2 | Call Graph | "Who calls this function?" |
| 1 | AST | "What functions exist?" |

When you need to understand a function, axe-dig gives you:
- The function signature and what it does
- Forward call graph: What does this function call?
- Backward call graph: Who calls this function?
- Control flow complexity: How many execution paths exist?
- Data flow: How do values transform through this code?
- Impact analysis: What breaks if I change this?
Sometimes this means fetching more context, not less. When you're debugging a race condition or tracing a subtle bug through multiple layers, axe-dig will pull in the full dependency chain, because correctness matters more than brevity.
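To make the forward and backward call graphs above concrete, here is a minimal sketch built on Python's standard `ast` module. It is not axe-dig's implementation: the sample source and the `build_call_graphs` helper are invented for illustration, and only direct calls by plain name are resolved.

```python
import ast
from collections import defaultdict

# Hypothetical module used only for this illustration.
SOURCE = '''
def load(path):
    return parse(read(path))

def read(path):
    return open(path).read()

def parse(text):
    return text.split()

def main():
    return load("data.txt")
'''

def build_call_graphs(source: str):
    """Return (forward, backward) call graphs for top-level functions."""
    tree = ast.parse(source)
    forward = defaultdict(set)   # caller -> callees
    backward = defaultdict(set)  # callee -> callers
    for node in tree.body:
        if not isinstance(node, ast.FunctionDef):
            continue
        for inner in ast.walk(node):
            # Method calls such as text.split() are skipped in this sketch.
            if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                forward[node.name].add(inner.func.id)
                backward[inner.func.id].add(node.name)
    return forward, backward

forward, backward = build_call_graphs(SOURCE)
print(sorted(forward["load"]))    # what does load() call?
print(sorted(backward["parse"]))  # who calls parse()?
```

A real analyzer also has to resolve methods, imports, and dynamic dispatch, which is where most of the difficulty lies.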
To demonstrate the precision advantage, we built a minimal CLI agent with basic tools (grep, edit, write, shell) and compared it against the same agent with axe-dig tools.
Note: These are intentionally minimal implementations, meant to show how large the axe-dig difference is. If you want to try it yourself, the agent lives at tests/smol_axe/knife.py; we named it knife since it's a small implementation of axe.
Left: Basic CLI agent with grep
Right: Basic CLI agent with axe-dig
The difference is clear. The basic agent searches blindly, while axe-dig understands code structure and dependencies.
When asked to explain how call flow tracking works, both agents found the context, but the results were dramatically different.
Left: Had to read the entire file after grepping for literal strings. 44,000 tokens.
Right: axe-dig used 17,000 tokens while also discovering:
- Call graphs for the decorator used on tracer functions
- Thread-safe depth tracking mechanisms
- How functions using this decorator actually work
axe-dig didn't just use fewer tokens; it provided a better understanding of how the code flows.
The difference compounds with follow-up questions. When we asked about caller information:
Left: Started wrong, inferred wrong, continued wrong.
Right: Had more context and better understanding from the start, leading to precise answers.
This is why axe optimizes for what the code actually does and how it flows.
In the mlx-lm codebase, when asked how to compute DWQ targets:
Left: Explained the concept generically.
Right: axe cli actively searched the codebase and found the actual implementation.
**Precision means finding the answer in your code, not explaining theory.**
For Bodega, we optimized models for intelligence per watt: maximum throughput, minimum power consumption.
For axe, the equivalent metric is relevant tokens per context window.
| Scenario | Raw Tokens | axe-dig Tokens | Savings |
|---|---|---|---|
| Function + callees | 21,271 | 175 | 99% |
| Codebase overview (26 files) | 103,901 | 11,664 | 89% |
| Deep call chain (7 files) | 53,474 | 2,667 | 95% |
Precision retrieval naturally uses fewer tokens when extracting only what's needed for correct decisions. But when you need to trace a complex bug through seven layers, axe-dig fetches 150,000 tokens, whatever it takes. Other tools burn tokens because they charge per token. axe optimizes for correctness.
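The savings column follows directly from the two token counts in each row. A short sketch of the arithmetic, using only the numbers from the table (the `savings_percent` helper name is ours):

```python
# (raw tokens, axe-dig tokens) copied from the table above
scenarios = {
    "Function + callees": (21_271, 175),
    "Codebase overview (26 files)": (103_901, 11_664),
    "Deep call chain (7 files)": (53_474, 2_667),
}

def savings_percent(raw: int, retrieved: int) -> float:
    """Share of the raw tokens that never enter the context window."""
    return 100.0 * (raw - retrieved) / raw

for name, (raw, retrieved) in scenarios.items():
    print(f"{name}: {savings_percent(raw, retrieved):.0f}% saved")
```

Rounded to whole percentages, this reproduces the 99%, 89%, and 95% figures.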
axe was designed with local compute and local LLMs in mind.
Why does this matter?
Local LLMs have different constraints than cloud APIs:
- Slower prefill and decoding (can't waste time on irrelevant context)
- Smaller context windows (need precision, not bloat)
- No per-token billing (optimization is about speed and accuracy, not cost)
This forced us to build a precise retrieval engine from the ground up. We couldn't rely on "dump everything and let the cloud LLM figure it out."
The result: axe works brilliantly with both local and cloud models, because precision benefits everyone.
Here's axe running with srswti/blackbird-she-doesnt-refuse-21b, a 21B parameter model from our Bodega collection, running entirely locally:
Hardware: M1 Max, 64GB RAM
Performance: Spawning subagents, parallel task execution, full agentic capabilities
As you can see, the capability of axe-optimized Bodega models running locally is exceptional. The precision retrieval engine means even local models can handle complex workflows efficiently, because they're not wasting compute on irrelevant context.
Traditional search finds syntax. axe-dig semantic search finds behavior.
# Traditional grep
grep "cache" src/ # Finds: variable names, comments, "cache_dir"
# axe-dig semantic search
chop semantic search "memoize expensive computations with TTL expiration"
# Finds: get_user_profile() because:
# - It calls redis.get() and redis.setex() (caching pattern)
# - Has TTL parameter in redis.setex call
# - Called by functions that do expensive DB queries
# Even though it doesn't mention "memoize" or "TTL"
Every function gets embedded with:
- Signature and docstring
- Forward and backward call graphs
- Complexity metrics (branches, loops, cyclomatic complexity)
- Data flow patterns (variables used and transformed)
- Dependencies (imports, external modules)
- First ~10 lines of implementation
This gets encoded into 1024-dimensional embeddings, indexed with FAISS for fast similarity search.
Find code by what it does, not what it's named.
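The retrieval step itself is a nearest-neighbour lookup over those embeddings. The sketch below uses a brute-force cosine similarity in plain Python as a stand-in for the FAISS index; the three-dimensional vectors and function names are made up for illustration, whereas the real embeddings are 1024-dimensional and produced by an embedding model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def search(query_vec, index, top_k=2):
    """Brute-force nearest neighbours, best match first."""
    scored = [(cosine(query_vec, vec), name) for name, vec in index.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]

# Toy index: function name -> (invented) embedding of its behaviour summary.
index = {
    "get_user_profile": [0.9, 0.1, 0.3],
    "render_template": [0.1, 0.9, 0.2],
    "send_email": [0.2, 0.3, 0.9],
}
query = [0.8, 0.2, 0.4]  # stands in for the embedded natural-language query

print(search(query, index, top_k=1))
```

An approximate index trades a little recall for speed once the number of functions grows; the ranking logic stays the same.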
| Tool | What it does | Use case |
|---|---|---|
| CodeSearch | Semantic search by behavior | "Find payment processing logic" |
| CodeContext | LLM-ready function summaries with call graphs | Understand unfamiliar code |
| CodeStructure | Navigate functions/classes in files/dirs | Explore new codebases |
| CodeImpact | Reverse call graph (who calls this?) | Safe refactoring |
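For a sense of what a reverse call graph query involves, here is a minimal sketch that walks callers transitively with a breadth-first search. The call graph and the `impacted_by` helper are hypothetical; this shows the general technique, not the CodeImpact implementation.

```python
from collections import deque

# Hypothetical call graph: caller -> callees
CALLS = {
    "api_refund": ["process_refund"],
    "cli_refund": ["process_refund"],
    "process_refund": ["validate_refund", "write_ledger"],
    "validate_refund": ["check_refund_window"],
    "nightly_report": ["write_ledger"],
}

def impacted_by(target, calls):
    """Return every function that directly or transitively calls target."""
    callers = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, set()).add(caller)
    seen, queue = set(), deque([target])
    while queue:
        current = queue.popleft()
        for caller in callers.get(current, ()):
            if caller not in seen:
                seen.add(caller)
                queue.append(caller)
    return seen

print(sorted(impacted_by("check_refund_window", CALLS)))
```

Changing `check_refund_window` here touches four callers, while `nightly_report` is unaffected, which is the kind of answer that makes a refactor safe to scope.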
- `ReadFile` / `WriteFile` / `StrReplaceFile` - Standard file I/O
- `Grep` - Exact file locations + line numbers (use after CodeSearch)
- `Glob` - Pattern matching
- `ReadMediaFile` - Images, PDFs, videos
- `Task` - Spawn subagents for parallel work
- `CreateSubagent` - Custom agent specs
- `SetTodoList` - Track multi-step tasks
Subagents in action:
Spawn specialized subagents to divide and conquer complex workflows. Each subagent operates independently with its own context and tools.
# Install axe-cli (includes axe-dig)
uv pip install axe-cli
# Or from source
git clone https://github.com/SRSWTI/axe-cli
cd axe-cli
uv sync
axe
cd /path/to/your/project
axe
On first run, axe-dig automatically indexes your codebase (30-60 seconds for typical projects). After that, queries are instant.
# greet axe
hiii
# start coding
hey axe, can you tell me how does dwq targets are computed in mlx
# Toggle to shell mode
[Ctrl+X]
pytest tests/
[Ctrl+X]
Hit Ctrl+X to toggle between axe and your normal shell. No context switching. No juggling terminals.
Building the world's fastest retrieval and inference engines.
To access the Bodega inference engine you need BodegaOS Sensors, the backend inference server that runs the MLX engine and serves the API on localhost:44468.
Install (macOS Tahoe+, Apple Silicon only):
curl -sL https://raw.githubusercontent.com/SRSWTI/axe/main/install_sensors.sh | bash
The script auto-detects your RAM and downloads the right edition (Standard / Pro), then downloads the BodegaOS client app. After running:
- Double-click the BodegaOS Sensors `.dmg` → drag to Applications → launch it.
- Double-click the BodegaOS `.dmg` → drag to Applications (optional; visual model manager).
- Open BodegaOS → log in with Google → Chat → Bodega Hub → Advanced to browse and download models.
Once Sensors is running, the inference API is live at http://localhost:44468. Load models via POST /v1/admin/load-model and point axe at them using your ~/.axe/config.toml.
Exclusive models trained/optimized for Bodega Inference Engine. axe includes zero-day support for all Bodega models, ensuring immediate access to our latest breakthroughs.
Note: Our models are also available on Hugging Face.
Ultra-compact reasoning models designed for efficiency and edge deployment. Super light, amazing agentic coding capabilities, robust tool support, minimal memory footprint.
- bodega-raptor-0.9b - 900M params. Runs on a base M4 Air at 100+ tok/s.
- bodega-raptor-90m - Extreme edge variant. Sub-100M params for amazing tool calling.
- bodega-raptor-1b-reasoning-opus4.5-distill - Distilled from Claude Opus 4.5 reasoning patterns.
- bodega-raptor-8b-mxfp4 - Balanced power/performance for laptops.
- bodega-raptor-15b-6bit - Enhanced raptor variant.
Frontier intelligence, distilled and optimized.
- deepseek-v3.2-speciale-distilled-raptor-32b-4bit - DeepSeek V3.2 distilled to 32B with Raptor reasoning. Exceptional math/code generation in 5-7GB footprint. 120 tok/s on M1 Max.
- bodega-centenario-21b-mxfp4 - Production workhorse. 21B params optimized for sustained inference workloads.
- bodega-solomon-9b - Multimodal and best for agentic coding.
Launched specifically for the Axe coding use case. High-performance agentic coding models optimized for the Axe ecosystem.
- axe-turbo-31b - High-capacity workloads. Exceptional agentic capabilities.
- axe-stealth-37b - Our primary axe model.
Task-specific optimization.
- bodega-vertex-4b - 4B params. Optimized for structured data.
- blackbird-she-doesnt-refuse-21b - Uncensored 21B variant for unrestricted generation.
There are two separate config systems. Make sure you know which one you're editing:
Tells axe which models exist and how to reach the Bodega server. The [models.*] blocks here only accept:
default_model = "bodega-raptor"
[providers.bodega]
type = "bodega"
base_url = "http://localhost:44468" # Local Bodega server
api_key = "" # Not required for local Bodega
[models.bodega-raptor]
provider = "bodega"
model = "srswti/bodega-raptor-8b-mxfp4" # must match the model_id you loaded in Bodega
max_context_size = 32768
capabilities = ["thinking"]
[models.axe-stealth-37b]
provider = "bodega"
model = "srswti/axe-stealth-37b"
max_context_size = 32768
capabilities = ["thinking"]
See sample_config.toml for a full example with all providers (OpenRouter, Anthropic, OpenAI, Bodega).
Bodega runs as an app on your machine. You load models into it by calling the /v1/admin/load-model endpoint. This is where all the rich options live:
| Option | What it does |
|---|---|
| `model_path` | HuggingFace repo ID or local path |
| `model_id` | Alias used in API requests (defaults to `model_path`); must match `model` in your axe-cli config |
| `model_type` | `"lm"` for text models, `"multimodal"` for vision models |
| `tool_call_parser` | Parses structured tool calls. Values: `qwen3`, `qwen3_coder`, `qwen3_5`, `harmony`, `glm4_moe`, etc. |
| `reasoning_parser` | Extracts `<think>` blocks into `reasoning_content`. Same values as above. |
| `enable_auto_tool_choice` | Instructs the model to automatically select the right tool. |
| `max_concurrency` | Parallel requests this model handler accepts before queueing. |
| `context_length` | Token context window for this model. |
| `continuous_batching` | High-throughput batching, with up to 5x gains for multi-agent workloads. |
| `cb_max_num_seqs` | Total batch scheduler capacity (active + waiting sequences). |
| `cb_completion_batch_size` | Max sequences generating tokens simultaneously per GPU step. |
| `cb_prefill_batch_size` | New prompts injected into the active batch per step. |
| `draft_model_path` | Draft model for speculative decoding (faster single-user generation). |
| `prompt_cache_size` | Slots reserved for KV-cache reuse on repeated prefixes. |
Full reference: docs/bodega-inference-engine.md. Covers: all load parameters, continuous batching, speculative decoding, tool parsers, reasoning parsers, multimodal support, context length, max concurrency, and hardware tuning.
Example load calls for common axe model configurations:
# Raptor 8B: general purpose, tool calling + reasoning enabled
curl -X POST http://localhost:44468/v1/admin/load-model \
-H "Content-Type: application/json" \
-d '{
"model_path": "srswti/bodega-raptor-8b-mxfp4",
"model_id": "srswti/bodega-raptor-8b-mxfp4",
"model_type": "lm",
"context_length": 32768,
"max_concurrency": 1,
"enable_auto_tool_choice": true,
"tool_call_parser": "qwen3",
"reasoning_parser": "qwen3"
}'
# axe-stealth-37b: primary axe model, with continuous batching
curl -X POST http://localhost:44468/v1/admin/load-model \
-H "Content-Type: application/json" \
-d '{
"model_path": "srswti/axe-stealth-37b",
"model_id": "srswti/axe-stealth-37b",
"model_type": "multimodal",
"context_length": 32768,
"max_concurrency": 1,
"enable_auto_tool_choice": true,
"tool_call_parser": "qwen3_coder",
"reasoning_parser": "qwen3_5",
"continuous_batching": true,
"cb_max_num_seqs": 256,
"cb_prefill_batch_size": 16,
"cb_completion_batch_size": 32
}'
# Orion 0.6B: continuous batching for multi-agent throughput (~900 tok/s on M4 Max)
curl -X POST http://localhost:44468/v1/admin/load-model \
-H "Content-Type: application/json" \
-d '{
"model_path": "srswti/bodega-orion-0.6b",
"model_id": "srswti/bodega-orion-0.6b",
"model_type": "lm",
"max_concurrency": 1,
"enable_auto_tool_choice": true,
"tool_call_parser": "qwen3",
"reasoning_parser": "qwen3",
"continuous_batching": true,
"cb_max_num_seqs": 256,
"cb_prefill_batch_size": 16,
"cb_completion_batch_size": 32
}'
See docs/bodega-inference-engine.md for the complete guide.
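If you would rather script model loading than use curl, the same request can be assembled with Python's standard library. This is a sketch under the assumption that the endpoint accepts exactly the JSON shown in the curl calls above; `build_load_request` is a hypothetical helper, and the request is only built here, not sent.

```python
import json
import urllib.request

BASE_URL = "http://localhost:44468"  # address of the local Sensors server

def build_load_request(model_path: str, **options) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/admin/load-model."""
    # model_id defaults to model_path, mirroring the curl examples.
    payload = {"model_path": model_path, "model_id": model_path, **options}
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/admin/load-model",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_load_request(
    "srswti/bodega-raptor-8b-mxfp4",
    model_type="lm",
    context_length=32768,
    max_concurrency=1,
    enable_auto_tool_choice=True,
    tool_call_parser="qwen3",
    reasoning_parser="qwen3",
)

# Sending needs a running Sensors server, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode())
```

The response format is not documented in this README, so the sketch stops at building the request.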
Our internal team has been using features that will change the game:
Understanding code isn't just about reading; it's about seeing the structure, connections, and flow.
The dashboard provides real-time visualization for:
Code Health Analysis:
- Cyclic dependencies: Visualize circular imports and dependency loops that make refactoring dangerous
- Dead code detection: See unreachable functions and unused modules with connection graphs
- Safe refactoring zones: Identify code that can be changed without cascading effects
- Execution trace visualization: Watch the actual flow of data through your system at runtime
Debugging Workflows:
- Trace execution paths visually from entry point to crash
- See which functions are called, in what order, with what values
- Identify bottlenecks and performance hotspots in the call graph
- Understand data transformations across multiple layers
The dashboard turns axe-dig's 5-layer analysis into interactive, explorable visualizations. No more drawing diagrams on whiteboards: axe generates them from your actual code.
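Cyclic dependency detection, the first item under Code Health Analysis, comes down to finding a cycle in the import graph. Below is a minimal depth-first sketch of that idea; the module names and `find_cycle` helper are invented, and this is not the dashboard's actual implementation.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    color = {node: WHITE for node in graph}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for dep in graph.get(node, ()):
            if color.get(dep, WHITE) == GREY:
                # dep is already on the current path: the cycle closes here.
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None

# Hypothetical import graph: module -> modules it imports
IMPORTS = {
    "billing": ["orders"],
    "orders": ["inventory"],
    "inventory": ["billing"],
    "utils": [],
}
print(find_cycle(IMPORTS))
```

The returned path starts and ends on the same module, which is the loop a visualization would highlight.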
See what actually happened at runtime. No more guessing why a test failed.
# Trace a failing test
/trace pytest tests/test_payment.py::test_refund
# Shows exact values that flowed through each function:
# process_refund(amount=Decimal("50.00"), transaction_id="tx_123")
#   → validate_refund(transaction=Transaction(status="completed"))
#   → check_refund_window(created_at=datetime(2024, 1, 15))
#   → datetime.now() - created_at = timedelta(days=45)
#   → raised RefundWindowExpired  # 30-day window exceeded
Status: Under active development. Our team has been using this internally for weeks.
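The idea behind such a trace can be sketched with Python's built-in `sys.settrace` hook, which reports every Python-level call together with its frame. This illustrates the general technique only, not how `/trace` is implemented; `record_calls` and the refund functions are hypothetical.

```python
import sys

def record_calls(func, *args, **kwargs):
    """Run func; return (calls, error) where calls lists each Python call with its arguments."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            code = frame.f_code
            names = code.co_varnames[:code.co_argcount]
            calls.append((code.co_name, {n: frame.f_locals[n] for n in names}))
        return None  # per-line tracing is not needed for this sketch

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args, **kwargs)
        error = None
    except Exception as exc:
        error = exc
    finally:
        sys.settrace(previous)  # always restore the previous hook
    return calls, error

# Hypothetical code under test
class RefundWindowExpired(Exception):
    pass

def check_refund_window(age_days):
    if age_days > 30:
        raise RefundWindowExpired(f"{age_days} days old")

def validate_refund(age_days):
    check_refund_window(age_days)

def process_refund(amount, age_days):
    validate_refund(age_days)
    return amount

calls, error = record_calls(process_refund, 50, age_days=45)
for name, arguments in calls:
    print(name, arguments)
print("raised:", type(error).__name__)
```

Tracing hooks slow execution noticeably, so a production tracer would typically filter by module or sample rather than record every call.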
Large monorepos become unmaintainable when everything is tangled together. axe analyzes your codebase and automatically factors it into logical modules based on:
- Dependency analysis: Which code actually depends on what
- Call graph clustering: Functions that work together, grouped together
- Data flow boundaries: Natural separation points in your architecture
- Usage patterns: How different parts of the codebase are actually used
The result: Clear module boundaries, reduced coupling, easier maintenance. This has been heavily requested by enterprise customers managing multi-million-line monorepos.
Example workflow:
# Analyze current structure
/monorepo analyze .
# Shows: 47 logical modules detected across 1,200 files
# Suggests: Split into 5 packages with clear boundaries
# Impact: Reduces cross-module dependencies by 73%
# Apply factoring
/monorepo factor --target packages/
Migrating codebases between languages is notoriously error-prone. axe uses its deep understanding of code structure to enable reliable migrations:
How it works:
- Analyze source code: Extract call graphs, data flow, and business logic
- Preserve semantics: Understand what the code does, not just what it says
- Generate target code: Translate to the new language while maintaining behavior
- Verify correctness: Compare execution traces and test coverage
Supported migrations:
- Python → TypeScript (preserve type safety)
- JavaScript → Go (maintain concurrency patterns)
- Ruby → Rust (keep performance characteristics)
- Java → Kotlin (modernize while preserving architecture)
Unlike simple transpilers, axe understands your code's intent and translates it idiomatically to the target language.
Flame graphs and memory profiling integrated directly in the chat interface.
# Generate flame graph
/flamegraph api_server.py
# Find memory leaks
/memory-profile background_worker.py
# Only run tests affected by your changes
/test-impact src/payment/processor.py
# Shows: 12 tests need to run (not all 1,847)
Python, TypeScript, JavaScript, Go, Rust, Java, C, C++, Ruby, PHP, C#, Kotlin, Scala, Swift, Lua, Elixir
Language auto-detected. Specify with --lang if needed.
| Feature | Claude Code | OpenAI Codex | axe |
|---|---|---|---|
| Built for | Weekend projects | Demos | Production codebases |
| Context strategy | Dump everything | Dump everything | Extract signal (precision-first) |
| Code search | Text/regex | Text/regex | Semantic (behavior-based) |
| Call graph analysis | No | No | Yes: 5-layer analysis |
| Precision optimization | No (incentivized to waste) | No (incentivized to waste) | Yes: fetch what's needed for correctness |
| Execution tracing | No | No | Coming soon |
| Flame graphs | No | No | Coming soon |
| Memory profiling | No | No | Coming soon |
| Visual debugging | No | No | Coming soon |
| Shell integration | No | No | Yes: Ctrl+X toggle |
| Session management | Limited | Limited | Yes: full history + replay |
| Skills system | No | No | Yes: modular, extensible |
| Subagents | No | No | Yes: parallel task execution |
- Issues: GitHub Issues
- Discussions: GitHub Discussions
Special thanks to kimi-cli for their amazing work, which inspired our tools and the Kosong provider.







