pccxai · hkimw · May 12, 2026
@@ -68,7 +68,7 @@ traces out of IPC payloads.
 
 - [Analyzer API](analyzer_api.md) - the plugin-registry primitive used by
   per-crate plugin traits.
-- [CLI reference](cli.md) - the binaries currently shipping; the old
+- [CLI reference](cli.md) - the binaries currently available; the old
   `pccx_analyze` umbrella does not exist today.
 
 ## Cite This Page

@@ -15,9 +15,9 @@ PCCX™ public docs and PRs must not claim any of the following
 without measured, reproducible evidence (each phrase is the
 *exact* claim form that requires evidence):
 
-- on-board KV260 inference operability
+- board inference operability
 - end-to-end Gemma 3N E4B runtime on KV260
-- numeric tokens-per-second targets (e.g. 20 tok/s)
+- numeric throughput claims or targets without measurement
 - timing-closure completion
 - bitstream-success outcomes
 - production-readiness

@@ -60,7 +60,7 @@ trademark docket issue in `pccxai/pccx` for the working list.
 
 - Use `PCCX™` on first prominent mention in any new public-facing
   document.
-- Do not use `PCCX®`, `registered trademark`, or `등록상표` until
+- Use only the `™` form; do not use registered-mark symbols or wording until
   registration is granted in the relevant jurisdiction and this
   policy is explicitly updated.
 - Treat `PCCX Compatible` and `PCCX Certified` as future

@@ -17,7 +17,7 @@ here are claims of use, not statements of registration.
 For the canonical entry point and the live filing docket, see
 [`TRADEMARKS.md`](../../TRADEMARKS.md) (root) and
 [`trademark-filing-log.md`](trademark-filing-log.md) (this directory).
-Use `PCCX™` on first prominent mention; do not use `PCCX®` until
+Use `PCCX™` on first prominent mention; do not use registered-mark symbols until
 registration is granted in the relevant jurisdiction.
 
 ## Claimed marks

@@ -16,7 +16,7 @@ decisions.
 ## Trademark
 
 - [`../../TRADEMARKS.md`](../../TRADEMARKS.md) — canonical trademark
-  policy. Use `PCCX™`. Do **not** use `PCCX®` until and unless
+  policy. Use `PCCX™`; avoid registered-mark symbols until and unless
   registration is granted.
 - [`../ip/trademark-filing-log.md`](../ip/trademark-filing-log.md)
   — public-safe filing docket (KR Class 09 / 42).

@@ -103,9 +103,9 @@ No. v002.1 is the planned sparsity and speculative-decoding ramp on the
 same KV260 RTL line. The baseline v002.0 integration and evidence gates
 remain visible dependencies.
 
-### Does the 20 tok/s figure mean measured throughput?
+### Does the v002.1 throughput target mean measured throughput?
 
-No. It is a v002.1 target. The docs may discuss it as a target, but it
+No. It is a v002.1 release-line target. The docs may discuss it as a target, but it
 must not be phrased as achieved throughput until KV260 evidence lands in
 {doc}`Evidence/index`.
 

@@ -43,7 +43,7 @@ Tracking issue: [pccxai/pccx#28 — v0.2.0 umbrella][v020].
 
 - same RTL repository (`pccx-FPGA-NPU-LLM-kv260`), continued from v002.0
 - G sparsity / H–H+ EAGLE-3 / I SSD / J Tree / K benchmark phases
-- 20 tok/s target lives on this release line
+- v002.1 throughput target lives on this release line
 - compute budget for EAGLE head training: $70–100 ($40 if a TRC TPU
   grant lands)
 

@@ -129,7 +129,7 @@ cycle.
 The two simplifications together remove **one CVO_SCALE** and
 **one CVO_TANH** per attention block per layer. Over the 35 layers of
 Gemma 3N E4B, that is 70 CVO invocations saved per decode step. Against
-the v002.1 throughput target (~20 tok/s; see :doc:`../../roadmap`), the
+the v002.1 throughput target (see :doc:`../../roadmap`), the
 SFU budget saves roughly 2–3 % wall-clock time.
 
 .. seealso::

@@ -2,7 +2,7 @@
 Gemma 3N E4B on pccx v002 — Execution and Scheduling
 =================================================================
 
-This page explains *how* a single decode token of Gemma 3N E4B runs
+This page explains the single-token Gemma 3N E4B decode path
 end-to-end on pccx v002 — which tensor lives where, which instruction
 fires which core, and how the scheduler keeps all three compute engines
 busy.
@@ -214,7 +214,7 @@ Under the baseline configuration (W4A8 compute path, INT4 KV cache,
 
 .. note::
 
-   The 20 tok/s figure is the **v002.1** release-line target (sparsity
+   The v002.1 throughput target is a release-line target (sparsity
    + speculative decoding on top of the v002.0 baseline RTL). The v002.0
    release line is measured-only — no throughput figure is asserted
    until KV260 evidence is reported. See :doc:`../../roadmap`.
@@ -227,7 +227,7 @@ Under the baseline configuration (W4A8 compute path, INT4 KV cache,
      - Target
      - Source of bottleneck
    * - Decode throughput
-     - **20 tok/s** — v002.1 target
+     - v002.1 throughput target
      - GEMV bandwidth at 400 MHz × 4 lanes × 1024 MAC/clk.
    * - L2 activation bandwidth
      - **~1.6 GB/s**

@@ -3,7 +3,7 @@ Gemma 3N E4B — Overview
 ========================
 
 pccx v002 is sized for **Gemma 3N E4B** on a bare-metal Kria KV260. The
-20 tok/s decoding figure is the **v002.1** release-line target (sparsity
+v002.1 decoding target is a release-line target (sparsity
 + speculative decoding on top of the v002.0 baseline RTL); the v002.0
 release line is measured-only. See :doc:`../../roadmap` for the staged
 release split. Before diving into the operator-level pipeline, this

@@ -85,7 +85,7 @@ changes ship together so there is no half-state.
 - No `git push --force` or `--force-with-lease`.
 - No tags pushed.
 - No staging push.
-- No PCCX® claim, no registered-trademark claim, no private
+- No registered-mark claim, no private
   trademark filings exposed.
 - No hardware/runtime/timing/bitstream claim is made by this PR;
   it is documentation-only.
@@ -67,7 +67,7 @@ throughput figure is asserted until KV260 evidence is reported.
      - Target
      - Rationale
    * - Decoding throughput
-     - **20 tok/s (Gemma 3N E4B)** — v002.1 target
+     - Gemma 3N E4B decoding — v002.1 throughput target
      - Bandwidth-matched between L2 cache and the GEMV cores
    * - Core clock frequency
      - **400 MHz**

@@ -37,8 +37,8 @@ runtime, or driver exists today.
 
 - No measured tokens-per-second on KV260 or any other board for Gemma 4
   E4B.
-- No bitstream, no timing-closed implementation.
-- No production-ready runtime.
+- No bitstream evidence and no implementation timing sign-off claim.
+- No production runtime claim.
 - No ABI stability.
 - No driver implementation.
 - No accuracy / quality benchmarks.

@@ -43,7 +43,7 @@ KV260 bring-up `[HW]` → runtime `[HW]` → release evidence checklist
 
 - same RTL repository (`pccx-FPGA-NPU-LLM-kv260`), continued from v002.0
 - G sparsity / H–H+ EAGLE-3 / I SSD / J Tree / K benchmark phases
-- 20 tok/s target lives on this release line
+- v002.1 throughput target lives on this release line
 - compute budget for EAGLE head training: $70–100 ($40 if a TRC TPU
   grant lands)
 

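@@ -124,7 +124,7 @@ In the attention block, Gemma 3N makes two choices that, versus the standard Transformer,
 
 The two simplifications remove **one CVO_SCALE** and **one CVO_TANH** per
 attention block per layer. Over the 35 layers of Gemma 3N E4B, that saves 70
-CVO invocations per token. Against the v002.1 throughput target (~20 tok/s; see :doc:`../../roadmap`),
+CVO invocations per token. Against the v002.1 throughput target (see :doc:`../../roadmap`),
 the SFU budget gains roughly 2–3 % in time.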
 
 .. seealso::

@@ -205,7 +205,7 @@ end-to-end decode target:
 
 .. note::
 
-   The 20 tok/s figure is the **v002.1** release-line target (sparsity
+   The v002.1 throughput target is a release-line target (sparsity
    + speculative decoding layered on the v002.0 baseline RTL). The v002.0
    release line is measured-only: no throughput figure is asserted
    until KV260 board evidence is reported.
@@ -219,7 +219,7 @@ end-to-end decode target:
      - Target
      - Source of bottleneck
    * - Decode throughput
-     - **20 tok/s** — v002.1 target
+     - v002.1 throughput target
      - GEMV bandwidth at 400 MHz × 4 lanes × 1024 MAC/clk.
    * - L2 activation bandwidth
      - **~1.6 GB/s**

@@ -3,7 +3,7 @@ Gemma 3N E4B — Overview
 ========================
 
 pccx v002 is sized for running **Gemma 3N E4B** on a bare-metal Kria
-KV260. The 20 tok/s decoding figure is the **v002.1** release-line
+KV260. The v002.1 decoding target is a release-line
 target (sparsity + speculative decoding layered on the v002.0
 baseline RTL); the v002.0 release line is measured-only.
 See :doc:`../../roadmap` for the staged release split. Operator-level

@@ -65,7 +65,7 @@
      - Target
      - Rationale
    * - Decoding throughput
-     - **20 tok/s (Gemma 3N E4B)** — v002.1 target
+     - Gemma 3N E4B decoding — v002.1 throughput target
      - Bandwidth matching between the L2 cache and the GEMV cores
    * - Core clock frequency
      - **400 MHz**
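
The wording this change set removes (numeric tok/s figures and the registered-mark symbol) could in principle be caught mechanically before review. The following is a minimal sketch and not part of the PR: `GATED_PATTERNS` and `find_gated_claims` are hypothetical names, and the two regexes cover only the phrasings visible in this diff, not the full evidence-gated list in the claims policy.

```python
import re

# Hypothetical patterns derived from wording removed in this diff.
# These are illustrative assumptions, not the project's actual lint rules.
GATED_PATTERNS = {
    # Matches figures such as "20 tok/s" or "12.5 tokens/s".
    "numeric throughput": re.compile(
        r"\b\d+(?:\.\d+)?\s*tok(?:ens)?/s\b", re.IGNORECASE
    ),
    # Matches the registered-mark symbol attached to the project name.
    "registered-mark symbol": re.compile(r"PCCX\s*®"),
}


def find_gated_claims(text):
    """Return (line_number, label, matched_text) for each gated phrase found."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in GATED_PATTERNS.items():
            for match in pattern.finditer(line):
                hits.append((lineno, label, match.group(0)))
    return hits


if __name__ == "__main__":
    sample = (
        "- 20 tok/s target lives on this release line\n"
        "- v002.1 throughput target lives on this release line\n"
    )
    for lineno, label, matched in find_gated_claims(sample):
        print(f"line {lineno}: {label}: {matched!r}")
```

Run against the old and new wording of a hunk, a check like this would flag only the removed line. A real gate would also need an allow-list for pages that quote a phrase in order to prohibit it.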