llm-nav-sim — local-LLM navigation for the RB-NEX Omni3WD, built twice

Two parallel takes on the same goal: drive the RB-NEX 3-wheel omnidirectional robot using a local vision-language model (Gemma 4 via Ollama), in simulation, so the camera → LLM → motor-command loop can be validated before the Raspberry Pi + Arduino hardware is ready.

Each implementation stands alone — pick whichever matches your preferred stack.

Two questions this repo tries to answer

Coding-agent benchmark. Given the same informal brief, same hardware target, same Ollama + Gemma stack, and a user-assigned language per agent (Python for Opus, JavaScript for GPT), how does each agent shape the rest of the problem? See the side-by-side below.
Open-source VLM viability. Is Gemma 4 served locally via Ollama on a consumer laptop GPU fast enough to act as the "brain" of a discrete-step control loop? Answer below.

What Gemma 4 latency actually looked like

Hardware: RTX 5060 Laptop, 8 GB VRAM, so partial CPU offload for an 8B Q4_K_M model (~9.6 GB on disk).

Cold first call: ~30-50 s (model loading into VRAM).
Warm-cache typical: ~15-20 s per call.
Best observed warm call: 6.5 s.
Effective command cadence: ~20 s between moves (LLM-bound, not robot-bound).
Full coin-collection run (opus_build): ~403 s of sim time, ~6.5 min wall clock. 21 commands issued, one pickup.

Honest takeaway: not fast enough for fluid teleop today, but enough for a slow discrete-step navigation loop — which is exactly what a first-pass hobby robot needs. Dropping to a smaller VLM (LLaVA-7B, Qwen2.5-VL-3B) or running on a GPU that holds the full checkpoint in VRAM would likely pull the warm call under 5 s.

`gpt_build/` — browser + Three.js (JavaScript)

Browser-based 3D simulator served as plain static HTML/CSS/JS with Three.js. Talks directly to local Ollama from the browser, with an editable HUD for model name, URL and timeout, plus a heuristic fallback controller when the model is slow or returns malformed JSON. Endless tiled floor, procedurally-spawned red obstacles, and a glowing reward pickup.

cd gpt_build
python -m http.server 8130
# then open http://localhost:8130/public/

Full details in gpt_build/README.md.

`opus_build/` — PyBullet (Python)

Python / PyBullet simulator. Fixed room with walls, a table, a blue box, a green cylinder, and one yellow coin target. The robot captures 640×480 RGB frames from a forward-mounted camera, posts them to Ollama's /api/chat, and executes the returned JSON command. Per-run transcripts and a dated DEBUG_NOTES dev journal live under opus_build/logs/.

cd opus_build
python -m venv .venv
.venv/Scripts/python -m pip install -r requirements.txt
.venv/Scripts/python -m main

Full details in opus_build/README.md.

Side-by-side

The two sub-projects were built by two different coding agents from the same informal spec (Claude Opus 4.7 built opus_build/, GPT‑5.5 built gpt_build/) — effectively a benchmark of the two agents on the same robotics-sim task.

User-specified for both agents:

The architecture: image_cap → img_out → command_exec → motor primitives, with the loop split so image_cap + img_out run on a Raspberry Pi and command_exec + motor primitives run on an Arduino.
The language per agent (Python for Opus, JavaScript for GPT).
The hardware target (RB-NEX 3-wheel omni-directional chassis).
The LLM stack (local Ollama, Gemma 4).
The sim-first framing (validate the loop before the Pi, battery, and Arduino are physically on the bench).

Left to each agent: specific framework within the language (PyBullet vs Three.js), world layout, fallback behaviour, command schema details, prompt wording and motion-model calibration, process documentation style.

Dimension	`gpt_build` (GPT‑5.5)	`opus_build` (Opus 4.7)
Stack	Browser / Three.js / JS	PyBullet / Python
How to run	`python -m http.server` + browser	`python -m main` in a venv
Ollama endpoint	`/api/generate`	`/api/chat` with `format=json`
World	Endless tile floor, procedural obstacles	Fixed 6×6 m room, 3 static obstacles, 1 yellow coin
Camera	Three.js canvas → JPEG	PyBullet `getCameraImage` → PNG (640×480)
LLM fallback	Heuristic from reward bearing + clearance	Safe `stop` only
Action set	7 actions (`move_straight`/…/`stop`)	7 actions (`advance`/…/`stop`)
Command fields	`action`, `speed`, `duration_ms`, `angle?`	`action`, `speed_mmps`, `duration_ms`, `reasoning`
Live UI	Browser HUD with editable model/URL/timeout	PyBullet GUI + terminal pose-delta log
Process docs	`README`, `LOGS.md`, `HARDWARE_DEPLOYMENT.md`	`README`, `DEBUG_NOTES.md` dev journal, milestone git tags
Pi-stack parity	Browser; rewrite for Pi	Python; drops onto Pi with minimal changes

Honest takeaways from running both (only the bits that were actually the agents' calls, not the user's assignment):

Defensive design vs. deterministic testing. GPT added a heuristic fallback controller (using reward bearing + obstacle clearance) so the robot still behaves when Ollama misbehaves. Opus went with a fixed repeatable room and a safe stop on any error, so a failed run points the finger at the model or the prompt, not at a heuristic.
World shape. GPT picked an endless procedural world with spawning obstacles — more variety, harder to reproduce a specific bug. Opus picked a small fixed room with one coin — easier to isolate and rerun a scenario, less replay value.
Schema + prompt style. GPT's schema is minimal (action, speed, duration_ms, optional angle). Opus added a reasoning field and uses Ollama's format=json plus an explicit motion-model block in the prompt (so the model knows roughly how far each command travels).
Process artefacts. GPT left a LOGS.md build log plus a dedicated HARDWARE_DEPLOYMENT.md. Opus kept a dated DEBUG_NOTES.md dev journal with a milestone tag per commit (m1-scaffold → v1.0) — more overhead, more reproducible.
Both work. Both converge on the reward in practice, with different failure modes: GPT's sim makes specific bugs hard to reproduce because the world resets; Opus's sim had a real spin-oscillation loop that had to be fixed by re-tuning the prompt and widening the pickup radius.

Target hardware (shared)

The hardware split came out of the original user brief, not the agents — both sims were built to fit this layout:

Raspberry Pi (Python) — camera capture (image_cap) and the LLM call (img_out).
Arduino — motor commands (command_exec dispatching into the existing Omni3WD library from the RB-NEX firmware in rb-nex-02/lib/MotorWheel/).

See each sub-project for its hardware-deployment notes:

gpt_build/docs/HARDWARE_DEPLOYMENT.md
The Deployment on real hardware section of opus_build/README.md.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

llm-nav-sim — local-LLM navigation for the RB-NEX Omni3WD, built twice

Two questions this repo tries to answer

What Gemma 4 latency actually looked like

`gpt_build/` — browser + Three.js (JavaScript)

`opus_build/` — PyBullet (Python)

Side-by-side

Target hardware (shared)

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
gpt_build		gpt_build
opus_build		opus_build
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

llm-nav-sim — local-LLM navigation for the RB-NEX Omni3WD, built twice

Two questions this repo tries to answer

What Gemma 4 latency actually looked like

gpt_build/ — browser + Three.js (JavaScript)

opus_build/ — PyBullet (Python)

Side-by-side

Target hardware (shared)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

`gpt_build/` — browser + Three.js (JavaScript)

`opus_build/` — PyBullet (Python)

Packages