diff --git a/docs/spec/gemma-3n-e4b-path.md b/docs/spec/gemma-3n-e4b-path.md
new file mode 100644
index 000000000..fb6f24502
--- /dev/null
+++ b/docs/spec/gemma-3n-e4b-path.md
@@ -0,0 +1,97 @@
+---
+orphan: true
+---
+
+# Gemma 3N E4B Software Path Overview
+
+This page summarizes the current software path for the Gemma 3N E4B
+bring-up lane on KV260. The useful public status is narrow: the launcher
+scaffold is in place with mock coverage and typed contracts, while
+bitstream evidence, a board run, and real model inference evidence are
+still pending.
+
+The release-line performance language remains **20 tok/s target** only.
+This page does not present a measured throughput number.
+
+## Current State
+
+The software path is staged in
+[`pccxai/pccx-llm-launcher`][launcher-repo]. As of 2026-05-07, the public
+evidence is a set of open launcher PRs that define data preparation,
+configuration, token I/O, and mock orchestration surfaces. They are
+useful because they make the future board path concrete without
+claiming that hardware execution has happened.
+
+| Slice | Public evidence | Current interpretation |
+| --- | --- | --- |
+| Weight preparation | [pccxai/pccx-llm-launcher#84][launcher-84] | Adds caller-supplied BF16-shaped array preparation, grouped `e_max` / BFP power-of-two scaling, signed W4 quantization, packed-byte output, and manifest metadata. |
+| Tokenizer | [pccxai/pccx-llm-launcher#87][launcher-87] | Adds an offline tokenizer surface with a local JSON config path and miniature placeholder vocabulary fixture; it does not include real Gemma tokenizer data. |
+| Architecture spec | [pccxai/pccx-llm-launcher#89][launcher-89] | Adds a config-only `GemmaArchSpec` loader and validation surface, plus packed-size math for W4 manifest invariants. |
+| Token streaming | [pccxai/pccx-llm-launcher#83][launcher-83] | Adds first-pass token streaming over a KV260 serial TTY boundary using marker-wrapped, length-prefixed chunks and mock/no-device coverage. |
+| End-to-end orchestration | [pccxai/pccx-llm-launcher#88][launcher-88] | Wires prompt encode, W4 prep, scripted token stream, AXI mock polling, output receive, and decode into a deterministic mock-only path. |
+
+The claim guard for this page follows the project public-wording list in
+`manual/06_PUBLIC_WORDING_AND_CLAIMS.md`: target-only performance
+wording, no hardware inference claim, no bitstream readiness claim, and
+no production readiness claim.
+
+## Data Path
+
+The current software path can be read as five handoffs:
+
+1. **Model-side shape and weight material** enters the launcher as
+   caller-supplied arrays and local config files. The W4 preparation
+   slice defines the packed representation and manifest fields that the
+   later hardware path must consume.
+2. **Tokenizer input and output** are represented by an offline tokenizer
+   contract. The fixture is intentionally small and synthetic, so it is
+   contract coverage rather than a claim about real model assets.
+3. **Gemma architecture metadata** is loaded through a config-only spec
+   object. That gives the launcher a place to validate dimensions and
+   compute packed-size expectations before any board command exists.
+4. **Token transport** is framed over serial as marker-wrapped binary
+   chunks. The current tests cover framing, timeouts, mock AXI behavior,
+   and no-TTY skips; they do not replace a board capture.
+5. **The orchestrator** joins those pieces with mock KV260 connection
+   surfaces. It can exercise the intended flow end to end in software,
+   but the real serial Gemma chat path remains a stub pending board
+   evidence.
+
+## What Is Complete
+
+The current scaffold covers:
+
+- W4 quantization contract code and deterministic local tests.
+- Tokenizer and architecture-spec contracts with local fixtures.
+- Serial token-framing code and mock/no-device test paths.
+- A mock end-to-end orchestrator and CLI route for deterministic local
+  output.
+- Claim-scan-clean PR evidence on the launcher side.
+
+This is enough to make the software boundary reviewable. It is not
+enough to publish board execution, real inference, timing, or throughput
+claims.
+
+## Gates Still Open
+
+The public gates that remain open are:
+
+- Full bitstream build evidence from the KV260 hardware repository.
+- Board programming and smoke-capture evidence.
+- Real serial transport evidence against a connected KV260.
+- Real Gemma tokenizer and weight asset handoff under the model license
+  boundary.
+- End-to-end inference evidence with captured logs and reproducible
+  commands.
+- Any measured throughput statement; only the **20 tok/s target** is
+  public at this stage.
+
+Until those gates land, public wording should stay in the scaffold,
+mock, contract, pending-evidence, and target vocabulary.
+
+[launcher-repo]: https://github.com/pccxai/pccx-llm-launcher
+[launcher-83]: https://github.com/pccxai/pccx-llm-launcher/pull/83
+[launcher-84]: https://github.com/pccxai/pccx-llm-launcher/pull/84
+[launcher-87]: https://github.com/pccxai/pccx-llm-launcher/pull/87
+[launcher-88]: https://github.com/pccxai/pccx-llm-launcher/pull/88
+[launcher-89]: https://github.com/pccxai/pccx-llm-launcher/pull/89
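
The weight-preparation and architecture-spec rows in the table above mention grouped `e_max` / BFP power-of-two scaling, signed W4 quantization, packed-byte output, and packed-size math. Below is a minimal, hypothetical Python sketch of that general technique. It is not the launcher's code, API, or manifest format. The group size of 32, the per-group scale of `2**(e_max - 3)` (which maps each group into the signed 4-bit range `[-8, 7]`), low-nibble-first packing, and every name used here (`prepare_w4`, `packed_size`, `PackedW4`, and so on) are illustrative assumptions.

```python
import math
from dataclasses import dataclass

# Hypothetical parameters: the real group size and nibble order used by
# the launcher are not stated on this page.
GROUP_SIZE = 32
W4_MIN, W4_MAX = -8, 7


@dataclass
class PackedW4:
    packed: bytes          # two signed 4-bit codes per byte, low nibble first
    exponents: list[int]   # one shared power-of-two exponent per group
    count: int             # number of original values


def group_exponent(values):
    """Shared exponent e_max such that every |v| < 2**e_max in the group."""
    peak = max((abs(v) for v in values), default=0.0)
    if peak == 0.0:
        return 0  # all-zero group: any exponent works, all codes become 0
    _, exp = math.frexp(peak)  # peak = m * 2**exp with 0.5 <= m < 1
    return exp


def quantize_w4(values, group_size=GROUP_SIZE):
    """Quantize each group with a shared power-of-two scale; return codes and exponents."""
    codes, exponents = [], []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        e_max = group_exponent(group)
        exponents.append(e_max)
        scale = 2.0 ** (e_max - 3)  # |v| < 2**e_max implies |v / scale| < 8
        for v in group:
            q = round(v / scale)  # Python round() is round-half-to-even
            codes.append(min(W4_MAX, max(W4_MIN, q)))  # 8 can appear; clamp to 7
    return codes, exponents


def pack_nibbles(codes):
    """Pack signed 4-bit codes two per byte; an odd tail is padded with 0."""
    out = bytearray()
    for i in range(0, len(codes), 2):
        lo = codes[i] & 0xF
        hi = (codes[i + 1] & 0xF) if i + 1 < len(codes) else 0
        out.append(lo | (hi << 4))
    return bytes(out)


def packed_size(count, group_size=GROUP_SIZE):
    """(payload bytes, exponent entries) implied by `count` values."""
    return math.ceil(count / 2), math.ceil(count / group_size)


def prepare_w4(values, group_size=GROUP_SIZE):
    codes, exponents = quantize_w4(values, group_size)
    return PackedW4(pack_nibbles(codes), exponents, len(values))


def dequantize(pw, group_size=GROUP_SIZE):
    """Unpack codes and rescale them; useful for checking the round trip."""
    out = []
    for i in range(pw.count):
        byte = pw.packed[i // 2]
        nibble = (byte >> 4) if i % 2 else (byte & 0xF)
        q = nibble - 16 if nibble >= 8 else nibble  # sign-extend the 4-bit code
        scale = 2.0 ** (pw.exponents[i // group_size] - 3)
        out.append(q * scale)
    return out
```

For example, `prepare_w4([0.5, -1.0, 0.25, 3.0])` uses one shared exponent (`e_max = 2`, scale `0.5`). It packs the codes `[1, -2, 0, 6]` into two bytes. This matches `packed_size(4) == (2, 1)`. A power-of-two scale keeps the per-group factor to a single small exponent and turns rescaling into a shift. The cost is coarser steps than a free-form scale would give.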