Skip to content

quanta-guy/llm-nav-sim

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 

Repository files navigation

llm-nav-sim — local-LLM navigation for the RB-NEX Omni3WD, built twice

Two parallel takes on the same goal: drive the RB-NEX 3-wheel omnidirectional robot using a local vision-language model (Gemma 4 via Ollama), in simulation, so the camera → LLM → motor-command loop can be validated before the Raspberry Pi + Arduino hardware is ready.

Each implementation stands alone — pick whichever matches your preferred stack.

Two questions this repo tries to answer

  1. Coding-agent benchmark. Given the same informal brief, same hardware target, same Ollama + Gemma stack, and a user-assigned language per agent (Python for Opus, JavaScript for GPT), how does each agent shape the rest of the problem? See the side-by-side below.
  2. Open-source VLM viability. Is Gemma 4 served locally via Ollama on a consumer laptop GPU fast enough to act as the "brain" of a discrete-step control loop? Answer below.

What Gemma 4 latency actually looked like

Hardware: RTX 5060 Laptop, 8 GB VRAM, so partial CPU offload for an 8B Q4_K_M model (~9.6 GB on disk).

  • Cold first call: ~30-50 s (model loading into VRAM).
  • Warm-cache typical: ~15-20 s per call.
  • Best observed warm call: 6.5 s.
  • Effective command cadence: ~20 s between moves (LLM-bound, not robot-bound).
  • Full coin-collection run (opus_build): ~403 s of sim time, ~6.5 min wall clock. 21 commands issued, one pickup.

Honest takeaway: not fast enough for fluid teleop today, but enough for a slow discrete-step navigation loop — which is exactly what a first-pass hobby robot needs. Dropping to a smaller VLM (LLaVA-7B, Qwen2.5-VL-3B) or running on a GPU that holds the full checkpoint in VRAM would likely pull the warm call under 5 s.


gpt_build/ — browser + Three.js (JavaScript)

gpt_build preview

Browser-based 3D simulator served as plain static HTML/CSS/JS with Three.js. Talks directly to local Ollama from the browser, with an editable HUD for model name, URL and timeout, plus a heuristic fallback controller when the model is slow or returns malformed JSON. Endless tiled floor, procedurally-spawned red obstacles, and a glowing reward pickup.

cd gpt_build
python -m http.server 8130
# then open http://localhost:8130/public/

Full details in gpt_build/README.md.


opus_build/ — PyBullet (Python)

opus_build preview

Python / PyBullet simulator. Fixed room with walls, a table, a blue box, a green cylinder, and one yellow coin target. The robot captures 640×480 RGB frames from a forward-mounted camera, posts them to Ollama's /api/chat, and executes the returned JSON command. Per-run transcripts and a dated DEBUG_NOTES dev journal live under opus_build/logs/.

cd opus_build
python -m venv .venv
.venv/Scripts/python -m pip install -r requirements.txt
.venv/Scripts/python -m main

Full details in opus_build/README.md.


Side-by-side

The two sub-projects were built by two different coding agents from the same informal spec (Claude Opus 4.7 built opus_build/, GPT‑5.5 built gpt_build/) — effectively a benchmark of the two agents on the same robotics-sim task.

User-specified for both agents:

  • The architecture: image_capimg_outcommand_exec → motor primitives, with the loop split so image_cap + img_out run on a Raspberry Pi and command_exec + motor primitives run on an Arduino.
  • The language per agent (Python for Opus, JavaScript for GPT).
  • The hardware target (RB-NEX 3-wheel omni-directional chassis).
  • The LLM stack (local Ollama, Gemma 4).
  • The sim-first framing (validate the loop before the Pi, battery, and Arduino are physically on the bench).

Left to each agent: specific framework within the language (PyBullet vs Three.js), world layout, fallback behaviour, command schema details, prompt wording and motion-model calibration, process documentation style.

Dimension gpt_build (GPT‑5.5) opus_build (Opus 4.7)
Stack Browser / Three.js / JS PyBullet / Python
How to run python -m http.server + browser python -m main in a venv
Ollama endpoint /api/generate /api/chat with format=json
World Endless tile floor, procedural obstacles Fixed 6×6 m room, 3 static obstacles, 1 yellow coin
Camera Three.js canvas → JPEG PyBullet getCameraImage → PNG (640×480)
LLM fallback Heuristic from reward bearing + clearance Safe stop only
Action set 7 actions (move_straight/…/stop) 7 actions (advance/…/stop)
Command fields action, speed, duration_ms, angle? action, speed_mmps, duration_ms, reasoning
Live UI Browser HUD with editable model/URL/timeout PyBullet GUI + terminal pose-delta log
Process docs README, LOGS.md, HARDWARE_DEPLOYMENT.md README, DEBUG_NOTES.md dev journal, milestone git tags
Pi-stack parity Browser; rewrite for Pi Python; drops onto Pi with minimal changes

Honest takeaways from running both (only the bits that were actually the agents' calls, not the user's assignment):

  • Defensive design vs. deterministic testing. GPT added a heuristic fallback controller (using reward bearing + obstacle clearance) so the robot still behaves when Ollama misbehaves. Opus went with a fixed repeatable room and a safe stop on any error, so a failed run points the finger at the model or the prompt, not at a heuristic.
  • World shape. GPT picked an endless procedural world with spawning obstacles — more variety, harder to reproduce a specific bug. Opus picked a small fixed room with one coin — easier to isolate and rerun a scenario, less replay value.
  • Schema + prompt style. GPT's schema is minimal (action, speed, duration_ms, optional angle). Opus added a reasoning field and uses Ollama's format=json plus an explicit motion-model block in the prompt (so the model knows roughly how far each command travels).
  • Process artefacts. GPT left a LOGS.md build log plus a dedicated HARDWARE_DEPLOYMENT.md. Opus kept a dated DEBUG_NOTES.md dev journal with a milestone tag per commit (m1-scaffoldv1.0) — more overhead, more reproducible.
  • Both work. Both converge on the reward in practice, with different failure modes: GPT's sim makes specific bugs hard to reproduce because the world resets; Opus's sim had a real spin-oscillation loop that had to be fixed by re-tuning the prompt and widening the pickup radius.

Target hardware (shared)

The hardware split came out of the original user brief, not the agents — both sims were built to fit this layout:

  • Raspberry Pi (Python) — camera capture (image_cap) and the LLM call (img_out).
  • Arduino — motor commands (command_exec dispatching into the existing Omni3WD library from the RB-NEX firmware in rb-nex-02/lib/MotorWheel/).

See each sub-project for its hardware-deployment notes:

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors