docs/Lab/workflow_facade.md: 2 changes (1 addition, 1 deletion)
@@ -68,7 +68,7 @@ traces out of IPC payloads.

 - [Analyzer API](analyzer_api.md) - the plugin-registry primitive used by
 per-crate plugin traits.
-- [CLI reference](cli.md) - the binaries currently shipping; the old
+- [CLI reference](cli.md) - the binaries currently available; the old
 `pccx_analyze` umbrella does not exist today.

 ## Cite This Page
docs/evidence/no-unsupported-claims-policy.md: 4 changes (2 additions, 2 deletions)
@@ -15,9 +15,9 @@ PCCX™ public docs and PRs must not claim any of the following
 without measured, reproducible evidence (each phrase is the
 *exact* claim form that requires evidence):

-- on-board KV260 inference operability
+- board inference operability
 - end-to-end Gemma 3N E4B runtime on KV260
-- numeric tokens-per-second targets (e.g. 20 tok/s)
+- numeric throughput claims or targets without measurement
 - timing-closure completion
 - bitstream-success outcomes
 - production-readiness
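The policy is phrased as a list of exact claim forms, which makes it easy to screen for mechanically. The sketch below is illustrative only: the function `find_claim_candidates` and its pattern list are hypothetical, not the project's tooling, and they cover just a few of the listed forms. A hit marks a line for human review rather than a violation, because the docs may still discuss a figure that is clearly labeled as a target.

```python
import re

# Hypothetical patterns for a few of the policy's claim forms; a real
# lint would cover the full list and allow clearly labeled targets.
CLAIM_PATTERNS = {
    "tokens-per-second figure": re.compile(r"\b\d+(?:\.\d+)?\s*tok/s\b", re.IGNORECASE),
    "timing closure": re.compile(r"\btiming[- ]clos(?:ure|ed)\b", re.IGNORECASE),
    "production readiness": re.compile(r"\bproduction[- ]read(?:y|iness)\b", re.IGNORECASE),
}


def find_claim_candidates(text: str) -> list[tuple[int, str]]:
    """Return (line_number, label) pairs for lines that need human review."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in CLAIM_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


# The old and new roadmap bullets from this PR: only the old one is flagged.
old = "- 20 tok/s target lives on this release line"
new = "- v002.1 throughput target lives on this release line"
print(find_claim_candidates(old))  # [(1, 'tokens-per-second figure')]
print(find_claim_candidates(new))  # []
```

Keeping the patterns in a labeled dictionary means each hit reports which claim form it resembles, which is what a reviewer needs when deciding whether the line is a labeled target or an unsupported claim.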
docs/ip/trademark-filing-log.md: 2 changes (1 addition, 1 deletion)
@@ -60,7 +60,7 @@ trademark docket issue in `pccxai/pccx` for the working list.

 - Use `PCCX™` on first prominent mention in any new public-facing
 document.
-- Do not use `PCCX®`, `registered trademark`, or `등록상표` until
+- Use only the `™` form; do not use registered-mark symbols or wording until
 registration is granted in the relevant jurisdiction and this
 policy is explicitly updated.
 - Treat `PCCX Compatible` and `PCCX Certified` as future
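The trademark rules lend themselves to a similar check. This is a hypothetical sketch (`check_mark_usage` is not an existing tool): it approximates "first prominent mention" as the first textual occurrence of `PCCX`, which is an assumption, and any page that quotes the forbidden forms on purpose, as the pre-change wording of this log did, would need an allowlist.

```python
import re

# Hypothetical lint for the trademark guidance; not an existing project tool.
REGISTERED_FORMS = {
    "PCCX®": re.compile(r"PCCX®"),
    "registered trademark": re.compile(r"registered[- ]trademark", re.IGNORECASE),
    "등록상표": re.compile(r"등록상표"),
}


def check_mark_usage(text: str) -> list[str]:
    """Return human-readable problems found in one document's text."""
    problems = [
        f"registered-mark wording: {label}"
        for label, pattern in REGISTERED_FORMS.items()
        if pattern.search(text)
    ]
    # Assumption: the first textual occurrence stands in for the
    # "first prominent mention", which really needs an editor's judgment.
    first = re.search(r"PCCX(™|®)?", text)
    if first is not None and first.group(1) != "™":
        problems.append("first mention of PCCX does not use the ™ form")
    return problems


print(check_mark_usage("PCCX™ docs. Later mentions can use PCCX."))  # []
print(check_mark_usage("- No PCCX® claim, no registered-trademark claim"))
```

The second call uses the old wording from the literalinclude-migration checklist in this PR; it trips both the registered-form patterns and the first-mention rule.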
docs/ip/trademarks.md: 2 changes (1 addition, 1 deletion)
@@ -17,7 +17,7 @@ here are claims of use, not statements of registration.
 For the canonical entry point and the live filing docket, see
 [`TRADEMARKS.md`](../../TRADEMARKS.md) (root) and
 [`trademark-filing-log.md`](trademark-filing-log.md) (this directory).
-Use `PCCX™` on first prominent mention; do not use `PCCX®` until
+Use `PCCX™` on first prominent mention; do not use registered-mark symbols until
 registration is granted in the relevant jurisdiction.

 ## Claimed marks
docs/legal/README.md: 2 changes (1 addition, 1 deletion)
@@ -16,7 +16,7 @@ decisions.
 ## Trademark

 - [`../../TRADEMARKS.md`](../../TRADEMARKS.md) — canonical trademark
-policy. Use `PCCX™`. Do **not** use `PCCX®` until and unless
+policy. Use `PCCX™`; avoid registered-mark symbols until and unless
 registration is granted.
 - [`../ip/trademark-filing-log.md`](../ip/trademark-filing-log.md)
 — public-safe filing docket (KR Class 09 / 42).
docs/quickstart.md: 4 changes (2 additions, 2 deletions)
@@ -103,9 +103,9 @@ No. v002.1 is the planned sparsity and speculative-decoding ramp on the
 same KV260 RTL line. The baseline v002.0 integration and evidence gates
 remain visible dependencies.

-### Does the 20 tok/s figure mean measured throughput?
+### Does the v002.1 throughput target mean measured throughput?

-No. It is a v002.1 target. The docs may discuss it as a target, but it
+No. It is a v002.1 release-line target. The docs may discuss it as a target, but it
 must not be phrased as achieved throughput until KV260 evidence lands in
 {doc}`Evidence/index`.

docs/roadmap.md: 2 changes (1 addition, 1 deletion)
@@ -43,7 +43,7 @@ Tracking issue: [pccxai/pccx#28 — v0.2.0 umbrella][v020].

 - same RTL repository (`pccx-FPGA-NPU-LLM-kv260`), continued from v002.0
 - G sparsity / H–H+ EAGLE-3 / I SSD / J Tree / K benchmark phases
-- 20 tok/s target lives on this release line
+- v002.1 throughput target lives on this release line
 - compute budget for EAGLE head training: $70–100 ($40 if a TRC TPU
 grant lands)

docs/v002/Models/gemma3n_attention_rope.rst: 2 changes (1 addition, 1 deletion)
@@ -129,7 +129,7 @@ cycle.
 The two simplifications together remove **one CVO_SCALE** and
 **one CVO_TANH** per attention block per layer. Over the 35 layers of
 Gemma 3N E4B, that is 70 CVO invocations saved per decode step. Against
-the v002.1 throughput target (~20 tok/s; see :doc:`../../roadmap`), the
+the v002.1 throughput target (see :doc:`../../roadmap`), the
 SFU budget saves roughly 2–3 % wall-clock time.

 .. seealso::
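The 70-invocation figure follows from simple bookkeeping: one CVO_SCALE and one CVO_TANH removed per attention block, one attention block per layer, across 35 layers, for each decode step. A minimal sketch of that count, with a hypothetical helper name; the 2–3 % wall-clock estimate is not reproduced because it depends on per-invocation SFU latency, which this page does not give.

```python
# Hypothetical helper mirroring the count on the attention/RoPE page.
REMOVED_PER_BLOCK = ("CVO_SCALE", "CVO_TANH")  # one of each per attention block


def cvo_invocations_saved(num_layers: int = 35, blocks_per_layer: int = 1) -> int:
    """CVO invocations saved per decode step across the whole model."""
    return len(REMOVED_PER_BLOCK) * blocks_per_layer * num_layers


print(cvo_invocations_saved())  # → 70 (2 ops x 1 block x 35 layers)
```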
docs/v002/Models/gemma3n_execution.rst: 6 changes (3 additions, 3 deletions)
@@ -2,7 +2,7 @@
 Gemma 3N E4B on pccx v002 — Execution and Scheduling
 =================================================================

-This page explains *how* a single decode token of Gemma 3N E4B runs
+This page explains the single-token Gemma 3N E4B decode path
 end-to-end on pccx v002 — which tensor lives where, which instruction
 fires which core, and how the scheduler keeps all three compute engines
 busy.
@@ -214,7 +214,7 @@ Under the baseline configuration (W4A8 compute path, INT4 KV cache,

 .. note::

-The 20 tok/s figure is the **v002.1** release-line target (sparsity
+The v002.1 throughput target is a release-line target (sparsity
 + speculative decoding on top of the v002.0 baseline RTL). The v002.0
 release line is measured-only — no throughput figure is asserted
 until KV260 evidence is reported. See :doc:`../../roadmap`.
@@ -227,7 +227,7 @@ Under the baseline configuration (W4A8 compute path, INT4 KV cache,
 - Target
 - Source of bottleneck
 * - Decode throughput
-- **20 tok/s** — v002.1 target
+- v002.1 throughput target
 - GEMV bandwidth at 400 MHz × 4 lanes × 1024 MAC/clk.
 * - L2 activation bandwidth
 - **~1.6 GB/s**
docs/v002/Models/gemma3n_overview.rst: 2 changes (1 addition, 1 deletion)
@@ -3,7 +3,7 @@ Gemma 3N E4B — Overview
 ========================

 pccx v002 is sized for **Gemma 3N E4B** on a bare-metal Kria KV260. The
-20 tok/s decoding figure is the **v002.1** release-line target (sparsity
+v002.1 decoding target is a release-line target (sparsity
 + speculative decoding on top of the v002.0 baseline RTL); the v002.0
 release line is measured-only. See :doc:`../../roadmap` for the staged
 release split. Before diving into the operator-level pipeline, this
docs/v002/RTL/pccx-v002-literalinclude-migration.md: 2 changes (1 addition, 1 deletion)
@@ -85,7 +85,7 @@ changes ship together so there is no half-state.
 - No `git push --force` or `--force-with-lease`.
 - No tags pushed.
 - No staging push.
-- No PCCX® claim, no registered-trademark claim, no private
+- No registered-mark claim, no private
 trademark filings exposed.
 - No hardware/runtime/timing/bitstream claim is made by this PR;
 it is documentation-only.
docs/v002/overview.rst: 2 changes (1 addition, 1 deletion)
@@ -67,7 +67,7 @@ throughput figure is asserted until KV260 evidence is reported.
 - Target
 - Rationale
 * - Decoding throughput
-- **20 tok/s (Gemma 3N E4B)** — v002.1 target
+- Gemma 3N E4B decoding — v002.1 throughput target
 - Bandwidth-matched between L2 cache and the GEMV cores
 * - Core clock frequency
 - **400 MHz**
docs/v003/gemma4-e4b-planning.md: 4 changes (2 additions, 2 deletions)
@@ -37,8 +37,8 @@ runtime, or driver exists today.

 - No measured tokens-per-second on KV260 or any other board for Gemma 4
 E4B.
-- No bitstream, no timing-closed implementation.
-- No production-ready runtime.
+- No bitstream evidence and no implementation timing sign-off claim.
+- No production runtime claim.
 - No ABI stability.
 - No driver implementation.
 - No accuracy / quality benchmarks.
ko/docs/roadmap.md: 2 changes (1 addition, 1 deletion)
@@ -43,7 +43,7 @@ KV260 bring-up `[HW]` → runtime `[HW]` → release evidence checklist

 - successor to v002.0 in the same RTL repository (`pccx-FPGA-NPU-LLM-kv260`)
 - G sparsity / H–H+ EAGLE-3 / I SSD / J Tree / K benchmark phases
-- the 20 tok/s target sits on this release line
+- the v002.1 throughput target sits on this release line
 - compute budget for EAGLE head training: $70–100 ($40 if the TRC TPU
 grant comes through)

ko/docs/v002/Models/gemma3n_attention_rope.rst: 2 changes (1 addition, 1 deletion)
@@ -124,7 +124,7 @@ In its attention block, Gemma 3N makes two choices differently from the standard Transformer

 The two simplifications remove **one CVO_SCALE** + **one CVO_TANH**
 per attention block per layer. Over the 35 layers of Gemma 3N E4B, that is 70
-CVO invocations saved per token. Against the v002.1 throughput target (~20 tok/s; :doc:`../../roadmap`
+CVO invocations saved per token. Against the v002.1 throughput target (:doc:`../../roadmap`
 for reference), this is a time saving of roughly 2–3 % in the SFU budget.

 .. seealso::
ko/docs/v002/Models/gemma3n_execution.rst: 4 changes (2 additions, 2 deletions)
@@ -205,7 +205,7 @@ end-to-end decode targets:

 .. note::

-The 20 tok/s figure is the target of the **v002.1** release line (v002.0 baseline
+The v002.1 throughput target is the release line's target (v002.0 baseline
 RTL plus sparsity + speculative decoding stacked on top). The v002.0
 release line is measured-only; no throughput figure is claimed until
 KV260 board evidence is reported.
@@ -219,7 +219,7 @@ end-to-end decode targets:
 - Target
 - Source of bottleneck
 * - Decode throughput
-- **20 tok/s** — v002.1 target
+- v002.1 throughput target
 - GEMV bandwidth at 400 MHz × 4 lanes × 1024 MAC/clk.
 * - L2 activation bandwidth
 - **~1.6 GB/s**
ko/docs/v002/Models/gemma3n_overview.rst: 2 changes (1 addition, 1 deletion)
@@ -3,7 +3,7 @@ Gemma 3N E4B — Overview
 ========================

 pccx v002 is designed around running **Gemma 3N E4B** on a bare-metal
-Kria KV260. The 20 tok/s decoding figure is the **v002.1** release
+Kria KV260. The v002.1 decoding target is a release
 line target (sparsity + speculative decoding stacked on the v002.0
 baseline RTL); the v002.0 release line is measured-only.
 See :doc:`../../roadmap` for the staged release split. Operator-level
ko/docs/v002/overview.rst: 2 changes (1 addition, 1 deletion)
@@ -65,7 +65,7 @@
 - Target
 - Rationale
 * - Decoding throughput
-- **20 tok/s (Gemma 3N E4B)** — v002.1 target
+- Gemma 3N E4B decoding — v002.1 throughput target
 - Bandwidth matching between the L2 cache and the GEMV cores
 * - Core clock frequency
 - **400 MHz**