diff --git a/docs/design-decisions.md b/docs/design-decisions.md new file mode 100644 index 000000000..35bcc5163 --- /dev/null +++ b/docs/design-decisions.md @@ -0,0 +1,65 @@ +# Design Decisions + +This page records public architecture decisions that affect the pccx +documentation line. It is a decision log, not an implementation-status +or performance-evidence page. + +## v002.1 Datapath Decisions + +Source: [pccxai/pccx-FPGA-NPU-LLM-kv260#80][kv260-80]. + +`kv260#80` resolves three defaults for follow-on v002.1 RTL, testbench, +software, and synthesis work. The upstream PR is docs-only: it does not +change datapath behavior, timing constraints, utilization evidence, or +measured throughput. + +### Activation Quantization + +**Decision:** use `e_max` / block-floating-point power-of-two activation +scaling as the v002.1 default. The parameter handle is +`ACT_SCALE_POLICY`, with the default mode `ACT_SCALE_EMAX_BFP`. + +**Rationale:** the current preprocess direction already has an exponent +scan path, so this default gives the BF16-to-INT8 conversion a small, +deterministic first target. True symmetric INT8 and driver-provided scale +tables remain useful reviewed modes, but they require extra policy and +hardware/software interface work before they should become the default. + +**Boundary:** this decision selects the first default policy. It does +not claim final task accuracy for quantized activations, and it does +not close later work on symmetric INT8, scale tables, saturation +conventions, or activation-scale restore. + +### K-Drain Limit + +**Decision:** make the K-split accumulator drain cadence parameterized, +with `K_DRAIN_LIMIT = 1024` as the v002.1 default. + +**Rationale:** `1024` matches the current W4A8 packer and sign-recovery +bit-width budget and the existing smoke-test assumptions. Keeping the +value as a parameter leaves room for `4096` or another reviewed limit +after the packer, sign recovery, scheduler, and testbench bounds all +derive from the same setting. + +**Boundary:** this decision does not assert that every long-K tile is +already drained by a completed scheduler path. It sets the default and +the parameter name that later RTL and tests should consume. + +### DSP Accounting Baseline + +**Decision:** report the v002.1 architectural DSP baseline as +`DSP_BASELINE_GEMM + DSP_BASELINE_GEMV + DSP_BASELINE_ALPHA`, with +`DSP_BASELINE_GEMM = 1024`, `DSP_BASELINE_GEMV = 64`, and +implementation extras reported under `DSP_BASELINE_ALPHA`. + +**Rationale:** the baseline should describe the intended compute-core +geometry: a 32 x 32 GEMM grid plus four GEMV lanes that each use 16 +DSP48E2 slices in the first reduction stage. Extra DSP use from final +accumulators, CVO/SFU, post-process, debug, or synthesis side effects +should be visible as alpha instead of hidden inside the denominator. + +**Boundary:** this decision is an accounting convention. It does not +claim a final utilization number or timing-closed device fit; synthesis +reports remain the evidence source for actual DSP usage. + +[kv260-80]: https://github.com/pccxai/pccx-FPGA-NPU-LLM-kv260/pull/80 diff --git a/docs/index.rst b/docs/index.rst index e65147752..3acba019b 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -100,7 +100,9 @@ Working tracks for the next release lines: candidates ResNet18 / YOLOv8n / MobileNetV3). The :doc:`roadmap` summarises how the three tracks relate, and the -``pccx`` family-tree figure on that page links them visually. +``pccx`` family-tree figure on that page links them visually. Public +architecture defaults that need a stable reference are recorded in +:doc:`design-decisions`. The v001 architecture is archived at :doc:`archive/experimental_v001/index`. @@ -202,3 +204,4 @@ risks, keeping the ecosystem safe for open-source hardware development. v003/index vision-v001/index + design-decisions