97 changes: 97 additions & 0 deletions docs/spec/gemma-3n-e4b-path.md
@@ -0,0 +1,97 @@
---
orphan: true
---

# Gemma 3N E4B Software Path Overview

This page summarizes the current software path for the Gemma 3N E4B
bring-up lane on KV260. The useful public status is narrow: the launcher
scaffold is in place with mock coverage and typed contracts, while
bitstream evidence, a board run, and real model inference evidence are
still pending.

Release-line performance language is limited to the **20 tok/s target**.
This page does not present a measured throughput number.

## Current State

The software path is staged in
[`pccxai/pccx-llm-launcher`][launcher-repo]. As of 2026-05-07, the public
evidence is a set of open launcher PRs that define data preparation,
configuration, token I/O, and mock orchestration surfaces. They are
useful because they make the future board path concrete without
claiming that hardware execution has happened.

| Slice | Public evidence | Current interpretation |
| --- | --- | --- |
| Weight preparation | [pccxai/pccx-llm-launcher#84][launcher-84] | Adds preparation of caller-supplied BF16-shaped arrays: grouped `e_max` / BFP power-of-two scaling, signed W4 quantization, packed-byte output, and manifest metadata. |
| Tokenizer | [pccxai/pccx-llm-launcher#87][launcher-87] | Adds an offline tokenizer surface with a local JSON config path and miniature placeholder vocabulary fixture; it does not include real Gemma tokenizer data. |
| Architecture spec | [pccxai/pccx-llm-launcher#89][launcher-89] | Adds a config-only `GemmaArchSpec` loader and validation surface, plus packed-size math for W4 manifest invariants. |
| Token streaming | [pccxai/pccx-llm-launcher#83][launcher-83] | Adds first-pass token streaming over a KV260 serial TTY boundary using marker-wrapped, length-prefixed chunks and mock/no-device coverage. |
| End-to-end orchestration | [pccxai/pccx-llm-launcher#88][launcher-88] | Wires prompt encode, W4 prep, scripted token stream, AXI mock polling, output receive, and decode into a deterministic mock-only path. |
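
The weight-preparation and architecture-spec rows describe grouped power-of-two scaling, signed W4 quantization, packed-byte output, and packed-size math. The following is a minimal sketch of those ideas, not the code in the launcher PRs. The function names, the scale convention (`shift = e_max - 3`), the low-nibble-first byte order, and the zero padding of odd-length inputs are all assumptions made for illustration.

```python
import math


def quantize_group(values):
    """Quantize one group to signed 4-bit codes sharing a power-of-two scale.

    Returns (shift, codes), where each value is approximated by
    code * 2**shift. The shift convention is an assumption of this sketch.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp gives max_abs = m * 2**e_max with 0.5 <= m < 1.
    _, e_max = math.frexp(max_abs)
    shift = e_max - 3  # so that |v| / 2**shift < 8 before rounding
    # Clamp to the signed 4-bit range after rounding.
    codes = [max(-8, min(7, round(math.ldexp(v, -shift)))) for v in values]
    return shift, codes


def packed_size(num_codes):
    """Bytes needed for num_codes 4-bit codes, two codes per byte."""
    return (num_codes + 1) // 2


def pack_nibbles(codes):
    """Pack signed 4-bit codes two per byte, low nibble first (assumed order)."""
    codes = list(codes)
    if len(codes) % 2:
        codes.append(0)  # pad odd-length input with a zero code
    return bytes((lo & 0xF) | ((hi & 0xF) << 4)
                 for lo, hi in zip(codes[0::2], codes[1::2]))


def unpack_nibbles(data, count):
    """Inverse of pack_nibbles: recover the first `count` signed codes."""
    codes = []
    for byte in data:
        for nib in (byte & 0xF, byte >> 4):
            codes.append(nib - 16 if nib >= 8 else nib)
    return codes[:count]
```

A size check such as `len(pack_nibbles(codes)) == packed_size(len(codes))` is the kind of invariant a manifest validator could enforce before any board command exists.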

The claim guard for this page follows the project public-wording list in
`manual/06_PUBLIC_WORDING_AND_CLAIMS.md`: target-only performance
wording, no hardware inference claim, no bitstream readiness claim, and
no production readiness claim.

## Data Path

The current software path can be read as five handoffs:

1. **Model-side shape and weight material** enters the launcher as
caller-supplied arrays and local config files. The W4 preparation
slice defines the packed representation and manifest fields that the
later hardware path must consume.
2. **Tokenizer input and output** are represented by an offline tokenizer
contract. The fixture is intentionally small and synthetic, so it is
contract coverage rather than a claim about real model assets.
3. **Gemma architecture metadata** is loaded through a config-only spec
object. That gives the launcher a place to validate dimensions and
compute packed-size expectations before any board command exists.
4. **Token transport** is framed over serial as marker-wrapped,
length-prefixed binary chunks. The current tests cover framing,
timeouts, mock AXI behavior, and no-TTY skips; they do not replace a
board capture.
5. **The orchestrator** joins those pieces with mock KV260 connection
surfaces. It can exercise the intended flow end to end in software,
but the real serial Gemma chat path remains a stub pending board
evidence.
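
The token-transport handoff can be illustrated with a small framing sketch that runs without any device. It is not the launcher's wire format: the marker bytes, the 2-byte big-endian length field, and the function names are hypothetical, and timeout handling is left out.

```python
import struct

# Hypothetical markers and length field; the launcher's real values may differ.
START = b"<TK"
END = b"TK>"
MAX_PAYLOAD = 0xFFFF  # largest length that fits the assumed 2-byte field


def encode_frame(payload: bytes) -> bytes:
    """Wrap a payload as START + big-endian length + payload + END."""
    if len(payload) > MAX_PAYLOAD:
        raise ValueError("payload too large for a 2-byte length prefix")
    return START + struct.pack(">H", len(payload)) + payload + END


def decode_frames(buffer: bytes):
    """Extract complete frames from buffer.

    Returns (frames, remainder). The remainder holds any incomplete frame
    so the caller can prepend it to the next read.
    """
    frames = []
    pos = 0
    while True:
        start = buffer.find(START, pos)
        if start < 0:
            # Keep a short tail in case a start marker is split across reads.
            tail_from = max(pos, len(buffer) - (len(START) - 1))
            return frames, buffer[tail_from:]
        body = start + len(START) + 2
        if len(buffer) < body:
            return frames, buffer[start:]
        (length,) = struct.unpack_from(">H", buffer, start + len(START))
        end = body + length
        if len(buffer) < end + len(END):
            return frames, buffer[start:]
        if buffer[end:end + len(END)] != END:
            pos = start + 1  # corrupt frame: skip this marker and resync
            continue
        frames.append(buffer[body:end])
        pos = end + len(END)
```

Because the payload length is carried explicitly, a payload may contain the marker bytes without confusing the decoder. A mock transport can feed `decode_frames` arbitrary read boundaries to exercise partial-frame handling in software.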

## What Is Complete

The current scaffold covers:

- W4 quantization contract code and deterministic local tests.
- Tokenizer and architecture-spec contracts with local fixtures.
- Serial token-framing code and mock/no-device test paths.
- A mock end-to-end orchestrator and CLI route for deterministic local
output.
- Claim-scan-clean PR evidence on the launcher side.

This is enough to make the software boundary reviewable. It is not
enough to publish board execution, real inference, timing, or throughput
claims.

## Gates Still Open

The public gates that remain open are:

- Full bitstream build evidence from the KV260 hardware repository.
- Board programming and smoke-capture evidence.
- Real serial transport evidence against a connected KV260.
- Real Gemma tokenizer and weight asset handoff under the model license
boundary.
- End-to-end inference evidence with captured logs and reproducible
commands.
- Any measured throughput statement; only the **20 tok/s target** is
public at this stage.

Until evidence for those gates lands, public wording should stay within
the scaffold, mock, contract, pending-evidence, and target vocabulary.

[launcher-repo]: https://github.com/pccxai/pccx-llm-launcher
[launcher-83]: https://github.com/pccxai/pccx-llm-launcher/pull/83
[launcher-84]: https://github.com/pccxai/pccx-llm-launcher/pull/84
[launcher-87]: https://github.com/pccxai/pccx-llm-launcher/pull/87
[launcher-88]: https://github.com/pccxai/pccx-llm-launcher/pull/88
[launcher-89]: https://github.com/pccxai/pccx-llm-launcher/pull/89