Two parallel takes on the same goal: drive the RB-NEX 3-wheel omnidirectional robot using a local vision-language model (Gemma 4 via Ollama), in simulation, so the camera → LLM → motor-command loop can be validated before the Raspberry Pi + Arduino hardware is ready.
Each implementation stands alone — pick whichever matches your preferred stack.
- Coding-agent benchmark. Given the same informal brief, same hardware target, same Ollama + Gemma stack, and a user-assigned language per agent (Python for Opus, JavaScript for GPT), how does each agent shape the rest of the problem? See the side-by-side below.
- Open-source VLM viability. Is Gemma 4 served locally via Ollama on a consumer laptop GPU fast enough to act as the "brain" of a discrete-step control loop? Answer below.
Hardware: RTX 5060 Laptop, 8 GB VRAM, so partial CPU offload for an 8B Q4_K_M model (~9.6 GB on disk).
- Cold first call: ~30-50 s (model loading into VRAM).
- Warm-cache typical: ~15-20 s per call.
- Best observed warm call: 6.5 s.
- Effective command cadence: ~20 s between moves (LLM-bound, not robot-bound).
- Full coin-collection run (
opus_build): ~403 s of sim time, ~6.5 min wall clock. 21 commands issued, one pickup.
Honest takeaway: not fast enough for fluid teleop today, but enough for a slow discrete-step navigation loop — which is exactly what a first-pass hobby robot needs. Dropping to a smaller VLM (LLaVA-7B, Qwen2.5-VL-3B) or running on a GPU that holds the full checkpoint in VRAM would likely pull the warm call under 5 s.
gpt_build/ — browser + Three.js (JavaScript)
Browser-based 3D simulator served as plain static HTML/CSS/JS with Three.js. Talks directly to local Ollama from the browser, with an editable HUD for model name, URL and timeout, plus a heuristic fallback controller when the model is slow or returns malformed JSON. Endless tiled floor, procedurally-spawned red obstacles, and a glowing reward pickup.
cd gpt_build
python -m http.server 8130
# then open http://localhost:8130/public/Full details in gpt_build/README.md.
opus_build/ — PyBullet (Python)
Python / PyBullet simulator. Fixed room with walls, a table, a blue
box, a green cylinder, and one yellow coin target. The robot
captures 640×480 RGB frames from a forward-mounted camera, posts
them to Ollama's /api/chat, and executes the returned JSON
command. Per-run transcripts and a dated DEBUG_NOTES dev journal
live under opus_build/logs/.
cd opus_build
python -m venv .venv
.venv/Scripts/python -m pip install -r requirements.txt
.venv/Scripts/python -m mainFull details in opus_build/README.md.
The two sub-projects were built by two different coding agents from
the same informal spec (Claude Opus 4.7 built opus_build/, GPT‑5.5
built gpt_build/) — effectively a benchmark of the two agents on
the same robotics-sim task.
User-specified for both agents:
- The architecture:
image_cap→img_out→command_exec→ motor primitives, with the loop split soimage_cap+img_outrun on a Raspberry Pi andcommand_exec+ motor primitives run on an Arduino. - The language per agent (Python for Opus, JavaScript for GPT).
- The hardware target (RB-NEX 3-wheel omni-directional chassis).
- The LLM stack (local Ollama, Gemma 4).
- The sim-first framing (validate the loop before the Pi, battery, and Arduino are physically on the bench).
Left to each agent: specific framework within the language (PyBullet vs Three.js), world layout, fallback behaviour, command schema details, prompt wording and motion-model calibration, process documentation style.
| Dimension | gpt_build (GPT‑5.5) |
opus_build (Opus 4.7) |
|---|---|---|
| Stack | Browser / Three.js / JS | PyBullet / Python |
| How to run | python -m http.server + browser |
python -m main in a venv |
| Ollama endpoint | /api/generate |
/api/chat with format=json |
| World | Endless tile floor, procedural obstacles | Fixed 6×6 m room, 3 static obstacles, 1 yellow coin |
| Camera | Three.js canvas → JPEG | PyBullet getCameraImage → PNG (640×480) |
| LLM fallback | Heuristic from reward bearing + clearance | Safe stop only |
| Action set | 7 actions (move_straight/…/stop) |
7 actions (advance/…/stop) |
| Command fields | action, speed, duration_ms, angle? |
action, speed_mmps, duration_ms, reasoning |
| Live UI | Browser HUD with editable model/URL/timeout | PyBullet GUI + terminal pose-delta log |
| Process docs | README, LOGS.md, HARDWARE_DEPLOYMENT.md |
README, DEBUG_NOTES.md dev journal, milestone git tags |
| Pi-stack parity | Browser; rewrite for Pi | Python; drops onto Pi with minimal changes |
Honest takeaways from running both (only the bits that were actually the agents' calls, not the user's assignment):
- Defensive design vs. deterministic testing. GPT added a
heuristic fallback controller (using reward bearing + obstacle
clearance) so the robot still behaves when Ollama misbehaves.
Opus went with a fixed repeatable room and a safe
stopon any error, so a failed run points the finger at the model or the prompt, not at a heuristic. - World shape. GPT picked an endless procedural world with spawning obstacles — more variety, harder to reproduce a specific bug. Opus picked a small fixed room with one coin — easier to isolate and rerun a scenario, less replay value.
- Schema + prompt style. GPT's schema is minimal
(
action,speed,duration_ms, optionalangle). Opus added areasoningfield and uses Ollama'sformat=jsonplus an explicit motion-model block in the prompt (so the model knows roughly how far each command travels). - Process artefacts. GPT left a
LOGS.mdbuild log plus a dedicatedHARDWARE_DEPLOYMENT.md. Opus kept a datedDEBUG_NOTES.mddev journal with a milestone tag per commit (m1-scaffold→v1.0) — more overhead, more reproducible. - Both work. Both converge on the reward in practice, with different failure modes: GPT's sim makes specific bugs hard to reproduce because the world resets; Opus's sim had a real spin-oscillation loop that had to be fixed by re-tuning the prompt and widening the pickup radius.
The hardware split came out of the original user brief, not the agents — both sims were built to fit this layout:
- Raspberry Pi (Python) — camera capture (
image_cap) and the LLM call (img_out). - Arduino — motor commands (
command_execdispatching into the existingOmni3WDlibrary from the RB-NEX firmware inrb-nex-02/lib/MotorWheel/).
See each sub-project for its hardware-deployment notes:
gpt_build/docs/HARDWARE_DEPLOYMENT.md- The Deployment on real hardware section of
opus_build/README.md.

