Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
7d317a3
fix(streaming): signal finish_reason=length when a tool-loop guard ca…
tbraun96 May 23, 2026
d1ac545
fix(streaming): cancel scheduler when a loop guard fires, not just su…
tbraun96 May 23, 2026
7da054d
fix(streaming): detect & cancel in-think `<tool_call>` leak (Qwen3.6 …
tbraun96 May 23, 2026
1bb82ed
pre-refactor: Phase 2c precision + watchdog baseline
tbraun96 May 24, 2026
6097910
phase-A: remove all 13 always-on prompt injections
tbraun96 May 24, 2026
060ffe1
phase-B: vLLM-style stop-string holdback + per-request RepetitionDete…
tbraun96 May 24, 2026
bfa4666
hotfix(watchdog,budget): tune from phaseAB live opencode test
tbraun96 May 24, 2026
755db59
hotfix-2: cap orphan-tool-call suppression streak (kill at 256 tokens)
tbraun96 May 24, 2026
f46d9f4
phase-C-2 (part 1/2): extract pre-sample logits pipeline scaffold
tbraun96 May 24, 2026
622e8df
hotfix-2b: orphan-suppression check must run on every token, not only…
tbraun96 May 24, 2026
795d8d1
hotfix-3: content-loop watchdog must run on MTP path too
tbraun96 May 24, 2026
c6650d3
phase-C-2 (part 2/2): wire LogitsProcessor pipeline into all decode p…
tbraun96 May 24, 2026
d3adea7
realfix2: speculative xgrammar advance between verify positions (K>=2)
tbraun96 May 24, 2026
29a0a7c
phase-2c day-1: KV cache sweep — 18 configs across dgx1+dgx2, KV dime…
tbraun96 May 24, 2026
3580978
phase-2c day-2: kernel bisect infrastructure + 3 NEGATIVE bisects
tbraun96 May 24, 2026
ea44fe6
phase-2c day-3: NVFP4 weight checkpoint test — BREAKTHROUGH
tbraun96 May 24, 2026
72644aa
hotfix: stuck-in-tool-body watchdog (NVFP4 doom-loop fix)
tbraun96 May 24, 2026
303cbab
phase-2c day-3 audit: causal-pathway map of FP8 → NVFP4 dispatch points
tbraun96 May 24, 2026
0467d1e
phase-2c day-3 Bug #1 attempt: REVERT — kernel infrastructure not ready
tbraun96 May 24, 2026
8d2cc87
fp8-merge73: native FP8 SSM + byte-exact streaming + PR 73 qwen3_xml …
tbraun96 May 25, 2026
e99159d
grammar: qwen3_coder body uses any_text (matches XML wire format)
tbraun96 May 25, 2026
2a2500c
validator: reject empty 'command'/'cmd'/'script' for shell tools
tbraun96 May 25, 2026
eaaa269
tool_handlers: soft-pass empty-required-string validation errors
tbraun96 May 25, 2026
49bad35
kernel/moe_fp8: two-level FP32 accumulation (DeepGEMM pattern)
tbraun96 May 25, 2026
4fa47b6
grammar+sampler: Tier-0 EBNF + Tier-1 byte-counter mask for tool params
tbraun96 May 26, 2026
6f9d595
mission-12h: Tier-2 strict path/cmd validators + final mission report
tbraun96 May 26, 2026
8c296ea
fp8-drift: o_proj W8A8 N/K fix + GPU dequant kernel + BF16 MoE path
tbraun96 May 28, 2026
d03197c
opencode-fix: relative-path validator + WS-mask newline exclusion + d…
tbraun96 May 28, 2026
6608824
diag: complete per-step logit dump (ATLAS_LOGIT_DUMP)
tbraun96 May 28, 2026
25f8bbe
fp8-prefix-cache: fix exact-hit SSM double-advance (Marconi snapshot)
tbraun96 May 29, 2026
367846f
mtp+moe: fix MTP 0%-accept (fp32-residual dtype bug) + BF16 router (t…
tbraun96 May 29, 2026
d7a4da8
tool-parser: server-side write-path drift recovery (ATLAS_WRITE_PATH_…
tbraun96 May 29, 2026
b0779b9
residual: remove FP32-residual feature — BF16 residual stream always
tbraun96 May 29, 2026
9922d4c
fix(tool-recovery): recover FP8-drifted file-write tool calls (3 modes)
tbraun96 May 29, 2026
d0b95f1
fix(moe): route MTP K=2/K=3 verify through BF16 path when experts are…
tbraun96 May 30, 2026
d2eb167
test(toml_repair): add r105/r110/r4 TOML-shape probe tests
tbraun96 May 31, 2026
dc6ea50
debug(kernels): ATLAS_DEBUG_SYNC_KERNELS + ATLAS_DEBUG_NO_GRAPH diagn…
tbraun96 May 31, 2026
2db83dc
fix(attn): multi-seq O-proj BF16 branch for ATLAS_FP8_DEQUANT_ATTN_TO…
tbraun96 May 31, 2026
bb2b53f
chore(docker): fast-layer build helper Dockerfile.fencesalvage
tbraun96 May 31, 2026
f7525bd
feat(quant): ATLAS_FP8_DEQUANT_LAYERS — selective per-layer BF16 dequant
tbraun96 May 31, 2026
a970624
feat(agentic): BW1 bash-wandering / content-completeness watchdog
tbraun96 May 31, 2026
fd688ab
bench: opencode harness evidence trail (FP8 drift / BF16 lever / BW1 …
tbraun96 Jun 1, 2026
68c3c50
fix(tool-salvage): guard EOF-fence slice panic; repair C-style // com…
tbraun96 Jun 1, 2026
4521dc7
fix(loop-detect): tool calls are progress — stop spinning-detector ki…
tbraun96 Jun 1, 2026
c487bc4
fix(prefix-cache): recompute SSM over [snap_tok,total) when snapshot …
tbraun96 Jun 1, 2026
7e8e2d6
prefix-cache: ATLAS_NO_MARCONI_EXACT diagnostic gate + partial-hit re…
tbraun96 Jun 1, 2026
3d43e2f
webserver_ok F1-F5: bound runaway via post-think content cap + watchd…
tbraun96 Jun 2, 2026
bc9f694
fix(qwen3.6-fp8): rep_penalty 1.1->1.0 on sampler presets + tool-JSON…
tbraun96 Jun 3, 2026
52244ab
fix(qwen3.6-fp8): 10/10 webserver_ok MTP-on — delete tool-call band-a…
tbraun96 Jun 3, 2026
0ff94b5
perf(qwen3.6): phase-2 decode profiling — host-path stage timing + sp…
tbraun96 Jun 4, 2026
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
The table of contents is too big for display.
Diff view
Diff view
  •  
  •  
  •  
6 changes: 6 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -1,9 +1,15 @@
# Build artifacts
target/
**/target/
# Out-of-tree cargo target dirs used for parallel/experiment builds
# (target-final, target-fu, target-i84, target-phc, target-rc, ...).
target-*/
*.o
*.ptx
*.bak
# Local docker build input: release binary copied in before `docker build`.
# Regenerated from `target/release/spark`; never tracked.
docker/gb10/spark-fastbin

# tool-eval-bench run reports — local benchmark output, not source
/runs/
Expand Down
2 changes: 2 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

7 changes: 6 additions & 1 deletion Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -37,7 +37,12 @@ cudarc = { version = "0.19", features = ["driver", "nvrtc", "cuda-version-from-b

# Serialization (HF config.json)
serde = { version = "1", features = ["derive"] }
serde_json = "1"
# `preserve_order`: keep JSON object keys in insertion/declaration order
# (IndexMap, not BTreeMap). Required so `serde_json::to_value(ToolDefinition)`
# and the nested `parameters` JSON-schema Value match transformers'
# `tojson` (= `json.dumps(sort_keys=False)`) key order in the `<tools>`
# prompt block. See crates/spark-server/src/tokenizer/jinja_helpers.rs.
serde_json = { version = "1", features = ["preserve_order"] }

# Error handling
thiserror = "2"
Expand Down
46 changes: 46 additions & 0 deletions bench/fp8_dgx2_drift/FP8_COSINE_RESULTS_2026_05_29.txt
Original file line number Diff line number Diff line change
@@ -0,0 +1,46 @@
layer type cos rel_l2 |vllm| |atlas|
----------------------------------------------------
0 ssm 0.99993 0.01197 0.6 0.6
1 ssm 0.99979 0.02077 0.8 0.8
2 ssm 0.99902 0.04431 1.0 1.0
3 attn 0.99874 0.05058 1.1 1.1 <-- divergence onset (cos<0.999)
4 ssm 0.99851 0.05465 1.1 1.1
5 ssm 0.99872 0.05059 1.3 1.3
6 ssm 0.99914 0.04149 1.3 1.3
7 attn 0.99811 0.06280 1.3 1.3
8 ssm 0.99804 0.06429 1.3 1.3
9 ssm 0.99808 0.06284 1.3 1.4
10 ssm 0.99845 0.05645 1.6 1.6
11 attn 0.99775 0.06734 1.6 1.6
12 ssm 0.99785 0.06570 1.5 1.5
13 ssm 0.99785 0.06604 1.5 1.6
14 ssm 0.99673 0.08093 1.6 1.6
15 attn 0.99188 0.12740 1.7 1.7
16 ssm 0.99290 0.11916 1.9 1.9
17 ssm 0.99236 0.12357 1.9 1.9
18 ssm 0.99410 0.10848 2.2 2.2
19 attn 0.99410 0.10851 2.8 2.8
20 ssm 0.99430 0.10677 2.8 2.9
21 ssm 0.99461 0.10373 2.8 2.8
22 ssm 0.99409 0.10868 3.2 3.2
23 attn 0.99353 0.11534 3.3 3.2
24 ssm 0.99183 0.12967 3.1 3.1
25 ssm 0.99296 0.12099 3.5 3.4
26 ssm 0.99469 0.10615 3.8 3.7
27 attn 0.99464 0.10609 4.2 4.1
28 ssm 0.99490 0.10416 4.5 4.4
29 ssm 0.99483 0.10377 4.6 4.6
30 ssm 0.99341 0.11750 4.9 4.8
31 attn 0.99291 0.11950 5.8 5.8
32 ssm 0.99209 0.12568 6.3 6.3
33 ssm 0.99126 0.13321 7.2 7.1
34 ssm 0.99233 0.12482 8.2 8.1
35 attn 0.99011 0.14054 9.2 9.2
36 ssm 0.98919 0.14737 9.8 9.7
37 ssm 0.98809 0.15493 11.3 11.3
38 ssm 0.98650 0.16487 14.3 14.2
39 attn 0.98766 0.15663 16.9 17.1
----------------------------------------------------
worst layer: L38 (ssm) cos=0.98650 rel_l2=0.1649
final-layer L39 cos=0.98766
DIVERGENCE ONSET: L3 — inspect this layer's ops (attn/SSM/MoE/norm) next
295 changes: 295 additions & 0 deletions bench/fp8_dgx2_drift/FP8_GAP_TRACKER.md

Large diffs are not rendered by default.

612 changes: 612 additions & 0 deletions bench/fp8_dgx2_drift/MASTER_DRIFT_TABLE.md

Large diffs are not rendered by default.

488 changes: 488 additions & 0 deletions bench/fp8_dgx2_drift/MISSION_PROGRESS.md

Large diffs are not rendered by default.

403 changes: 403 additions & 0 deletions bench/fp8_dgx2_drift/STATUS.md

Large diffs are not rendered by default.

97 changes: 97 additions & 0 deletions bench/fp8_dgx2_drift/atlas_tokenize.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
#!/usr/bin/env python3
"""Reproduce Atlas's exact tokenization of the probe JSON.

Uses Atlas's own OpenAI-variant Jinja template (jinja-templates/openai/qwen3_5_moe.jinja)
to render the same string Atlas sees, then encodes via the model's HF tokenizer.

Writes /tmp/atlas_tokens_dgx2.json in the same format as
/tmp/atlas_tokens.json so hf_dual_forward.py can consume it.
"""
from __future__ import annotations

import json
import pathlib

import jinja2
from transformers import AutoTokenizer

TEMPLATE_PATH = pathlib.Path("/workspace/atlas-mtp/jinja-templates/openai/qwen3_5_moe.jinja")
TOKENIZER_SNAP = "/workspace/.cache/huggingface/Qwen3.6-35B-A3B-FP8-dequanted-BF16"
PROBE_PATH = pathlib.Path("/workspace/atlas-dumps/numdrift/atlas_turn11_probe.json")
OUT_PATH = pathlib.Path("/tmp/atlas_tokens_dgx2.json")
TARGET_TOKEN_COUNT = 9780 # what Atlas reports today


def normalize_tool_call_arguments(messages):
"""Atlas's chat_impl.rs F76: pre-parse tool_call argument strings into dicts."""
out = []
for m in messages:
m2 = dict(m)
if m2.get("tool_calls"):
new_calls = []
for tc in m2["tool_calls"]:
tc2 = dict(tc)
if tc2.get("function") and isinstance(tc2["function"].get("arguments"), str):
fn2 = dict(tc2["function"])
try:
fn2["arguments"] = json.loads(fn2["arguments"])
except Exception:
pass
tc2["function"] = fn2
new_calls.append(tc2)
m2["tool_calls"] = new_calls
out.append(m2)
return out


def main() -> None:
probe = json.loads(PROBE_PATH.read_text())
template_src = TEMPLATE_PATH.read_text()

env = jinja2.Environment(
loader=jinja2.BaseLoader(),
trim_blocks=False,
lstrip_blocks=False,
keep_trailing_newline=True,
)
tmpl = env.from_string(template_src)

messages = normalize_tool_call_arguments(probe["messages"])
tools = probe.get("tools")

rendered = tmpl.render(
messages=messages,
tools=tools,
add_generation_prompt=True,
enable_thinking=True,
reasoning_effort="high",
disable_tool_steering=False,
add_vision_id=False,
)
print(f"rendered len chars: {len(rendered)}")

tok = AutoTokenizer.from_pretrained(TOKENIZER_SNAP)
ids = tok(rendered, add_special_tokens=False, return_tensors=None)["input_ids"]
print(f"token count: {len(ids)}")
print(f"first 10: {ids[:10]}")
print(f"last 10: {ids[-10:]}")

out = {
"prompt_len": len(ids),
"all_tokens": ids,
"generated_tokens": [],
}
OUT_PATH.write_text(json.dumps(out))
print(f"wrote {OUT_PATH}")

if len(ids) != TARGET_TOKEN_COUNT:
print(
f"\nWARN: count {len(ids)} != target {TARGET_TOKEN_COUNT} "
f"(Atlas-reported); template/encoder difference may bias HF dump"
)
else:
print(f"\nMATCH: count == {TARGET_TOKEN_COUNT}")


if __name__ == "__main__":
main()
1 change: 1 addition & 0 deletions bench/fp8_dgx2_drift/atlas_tokens_dgx2.json

Large diffs are not rendered by default.

17 changes: 17 additions & 0 deletions bench/fp8_dgx2_drift/c1_final_logit_overlap.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,17 @@
{
"residual_cos_L39": 0.9895407557487488,
"logit_cos": 0.9872766137123108,
"argmax_atlas_token": 6820,
"argmax_bf16_token": 6820,
"top1_agree": true,
"topk_jaccard": {
"1": 1.0,
"5": 1.0,
"10": 1.0,
"50": 0.8867924528301887,
"200": 0.8348623853211009,
"1000": 0.8399264029438822
},
"kl_bf16_vs_atlas": 0.020466946885892377,
"kl_atlas_vs_bf16": 0.020438889431768144
}
Loading
Loading