Feature/siewwwin#12
Open
LingSiewWin wants to merge 67 commits into
Open
Conversation
The replace directive pointed to ../../solana-go which doesn't exist in the repo, causing all Go builds to fail at module resolution. - Remove `replace github.com/gagliardetto/solana-go => ../../solana-go` - Pin require to v1.20.0 (latest stable upstream release) - `go mod tidy` cleaned up unused indirect deps (swaggo/*, openapi/*, edwards25519, KyleBanks/depth) that were transitive only through the vendored copy Verified: `go build ./...` and `go test ./...` both exit 0.
`pickPreferred` returned `solana ?? reqs[0]`, but under noUncheckedIndexedAccess `reqs[0]` is `X402Requirement | undefined`. Callers always guard with `length > 0` so this was safe at runtime, but the type lied — and a bug at the call site would silently yield undefined instead of failing fast. - Replace the nullish-coalesce with an explicit if/throw flow - Empty input now throws with a clear message instead of silently bypassing the type contract Verified: src/detect/x402.ts(108) TS2322 gone. SDK tests 116 pass / 7 fail (same 7 pre-existing session-fixture failures, audit #9).
@solana/mpp 0.5.x removed `solana.session()` — session-based
pay-as-you-go (deposit + TTL + auto-topup) is now Tempo-only upstream.
The Solana side ships only `solana()` / `solana.charge()` for per-request
payment, and Mppx.create(...).close was removed (no session lifecycle to
tear down in per-charge mode).
- Replace `mppClient.solana.session({ signer, autoOpen, autoTopup,
sessionDefaults: { suggestedDeposit, ttlSeconds }})` with the supported
`mppClient.solana({ signer })` per-charge form
- Drop the unused `mppx.close?.bind(mppx)` — no longer on Mppx 0.5.x
- Mark `maxDepositUsd`, `ttlSeconds`, `autoTopup` params as intentionally
unused (`_`-prefixed) — they're enforced by the outer Rhemify session()
wrapper's governance, not by MPP itself
- Add TODO(tempo) anchoring future work to register tempo.session()
alongside solana() once RhemifyConfig.wallet gains a tempoAccount
Behavior change: each governed fetch is now its own Solana tx (per-charge)
instead of a single batched session settlement. The Rhemify-level session()
wrapper continues to enforce maxDepositUsd cap, TTL, cumulative spend, and
trace emission — those guarantees come from the wrapper, not MPP.
Verified:
packages/sdk: bunx tsc --noEmit exit 0 (was: 2 TS2339 errors)
packages/sdk: bun test 116 pass / 7 fail (same 7 pre-existing
session-fixture failures from audit #9, zero new regressions)
Note on Done definition: no live MPP-protected endpoint exists in this
repo, so true e2e proof (real 402 challenge → solana.charge → settled tx)
is pending live integration. Type check + unit suite confirm the API
adaptation is mechanically correct.
The session governance suite passed `"fake-solana-key"` as `config.wallet.solanaPrivateKey`. With @solana/mpp installed in node_modules, openMppSession takes the live path (not the test fallback) and calls decodeSolanaKey, which threw "Invalid Solana private key format. Expected JSON array, base64, or hex." All 7 tests in `session() governance wrapper` failed for this single reason. Generate a real ed25519 keypair via @solana/web3.js once at module load and reuse it across all tests. Passes both decodeSolanaKey (JSON array length 64) and @solana/kit createKeyPairFromBytes (real ed25519 bytes). Verified: packages/sdk: bun test 123 pass / 0 fail (was: 116 pass / 7 fail)
Closes three audit findings in the Anchor program suite:
1. write_daily_root squat (rhemify-anchor)
Anyone could write a daily merkle root for any (fleet_id, date) tuple
and become its recorded `authority`, frontrunning legitimate fleet
operators and corrupting the canonical anchor record.
2. initialize_fleet_vault race-init (rhemify-dwallet)
Anyone could call initialize_fleet_vault for any fleet_id first, become
the vault's `authority`, set their own `co_signer`, and use that
co_signer to approve withdrawals via approve_signing — a full takeover
of the agent's funds path.
3. daily_cap stored but never enforced (rhemify-dwallet)
FleetVault.daily_cap was written at init but approve_signing only
checked the per-agent daily_limit. With multiple agents each at their
max-per-tx, the fleet aggregate could exceed the intended ceiling.
Fix is one consistent design: user-scoped PDAs across all fleet-derived
accounts. Adversaries can still create their own fleets, but their PDAs
derive at different addresses than legit users' — the namespace squat
attacks are no longer possible.
Seed changes (8 sites across 6 instruction files):
FleetVault: [b"fleet-vault", fleet_id]
-> [b"fleet-vault", authority.key().as_ref(), fleet_id]
AgentWallet: [b"agent-wallet", fleet_id, agent_key]
-> [b"agent-wallet", authority.key().as_ref(), fleet_id, agent_key]
DailyRoot: [b"rhemify-daily", fleet_id, date]
-> [b"rhemify-daily", authority.key().as_ref(), fleet_id, date]
approve_signing reorders accounts so fleet_vault is declared first,
then references fleet_vault.authority.as_ref() in agent_wallet seeds
(no `authority` signer in this ix — co_signer signs).
State + logic changes:
FleetVault gains daily_spent: u64 + last_reset_day: i64 (+16 bytes
INIT_SPACE). approve_signing now takes fleet_vault as &mut, mirrors the
agent-wallet daily-reset block, and checks/updates fleet daily_spent.
New error variant: ExceedsFleetDailyCap.
Migration: FleetVault layout grows 16 bytes. Pre-launch — no production
state. Existing devnet accounts under the old (unfixed) program IDs are
not migrated; new program IDs assigned to the fresh deploys (declare_id!
+ Anchor.toml updated).
Verified end-to-end on devnet:
cargo check (rhemify-anchor): exit 0 (6 pre-existing cfg warnings)
cargo check (rhemify-dwallet): exit 0 (9 pre-existing cfg warnings)
cargo build-sbf (rhemify-anchor): produced 150,728-byte .so
cargo build-sbf (rhemify-dwallet): produced 211,648-byte .so
rhemify_anchor deployed to devnet:
Program ID: HYWjBbLMEz98KnppVkUnHmkUZ4pyQ8abaDRTtUedUkxV
Deploy tx: 37CJCxvEdqGwn9W3caf6HZNJku83D8EjHF5EfM1Yg5HLgqKMhzYcgpDcNsz3C47hXTPwujqGSrWePHfqmdECSFFr
Slot: 461436925
Explorer: https://explorer.solana.com/tx/37CJCxvEdqGwn9W3caf6HZNJku83D8EjHF5EfM1Yg5HLgqKMhzYcgpDcNsz3C47hXTPwujqGSrWePHfqmdECSFFr?cluster=devnet
rhemify_dwallet deployed to devnet:
Program ID: GPgdzfwQ4qG1QcqePY3uR6Uo8SvCwqxRYg7oDsXd5opc
Deploy tx: 4fGSJAftgdAZnjt5viYPLcU2jgQDCTaAKNNrrE8eityQxcaPHNZ13bicfK6UVe22w8AMVy6oXWDZ5J8KZhnMG58h
Slot: 461436946
Explorer: https://explorer.solana.com/tx/4fGSJAftgdAZnjt5viYPLcU2jgQDCTaAKNNrrE8eityQxcaPHNZ13bicfK6UVe22w8AMVy6oXWDZ5J8KZhnMG58h?cluster=devnet
Both programs verified live via `solana program show` — owned by
BPFLoaderUpgradeable, authority 8usJ1ShvoR3e74E6WMaNk2owwGUf87MuCuBJHdPgdEnQ.
Follow-up (not in this commit): add Mollusk happy-path tests asserting
the access-control denial flow rejects a wrong-authority signer at the
seeds-derivation step.
Real instruction-level proof for Phase C (commit 149c077). The deploy proved the bytecode is live; this proves the new user-scoped seeds and the migrated FleetVault layout work end-to-end on devnet. Hand-encodes the Anchor instruction discriminator (sha256 prefix) and borsh-encoded args — no IDL needed, since cargo-build-sbf doesn't ship IDL generation and Anchor CLI 1.0.0 wouldn't install on this machine (LLVM bitcode mismatch with Homebrew rustc). Verified: bun run smoke → vault account created on-chain Authority: 8usJ1ShvoR3e74E6WMaNk2owwGUf87MuCuBJHdPgdEnQ Fleet ID: e2e-1778433401599 Vault PDA: CKLZaGoayjXwNX5rhqZLyfjxgrJoPUcRfUctT84sGGQ9 Vault size: 210 bytes (= old 194 + 16 from daily_spent + last_reset_day) confirms Phase C state migration is live in bytecode Tx: 7kRHx9iXgGnzzwbVSKEkFppzDkpBXD3cg2FwGVhL74pPtWcgDFN7RvDoL8xUMLkWPStd9FALc4Qgwvjy63VtyTF Explorer: https://explorer.solana.com/tx/7kRHx9iXgGnzzwbVSKEkFppzDkpBXD3cg2FwGVhL74pPtWcgDFN7RvDoL8xUMLkWPStd9FALc4Qgwvjy63VtyTF?cluster=devnet Cost: 0.002357 SOL (tx fee + rent for the new vault account) To re-run (each invocation creates a fresh vault under a unique fleet_id): cd tools/devnet-smoke && bun run smoke
Two parties (legit + attacker, distinct keypairs) both call initialize_fleet_vault with the SAME fleet_id. After Phase C the seeds are `[b"fleet-vault", authority.key(), fleet_id]`, so the two writes land at DIFFERENT PDAs and both succeed independently. Under the old `[b"fleet-vault", fleet_id]` seeds these would have collided at one address and the second caller would have failed with "account already in use" — the squat attack closed in Phase C is now structurally impossible. Verified on devnet: Shared fleet_id: squat-1778433715564 Legit authority: 8usJ1ShvoR3e74E6WMaNk2owwGUf87MuCuBJHdPgdEnQ Legit vault PDA: Aqya6CAamPnZBnkHpXQ934MAMp1BaSfEidVJq41TdHnj (bump 255, 210 B) Legit init tx: 4UGtgCLvHABdjSgizm75GerVVKTyzzX8jZc3odDizNZjx34oZbJKVGx2SYgk57PmduAyZUrriqsrneJDfCnsLSsu Explorer: https://explorer.solana.com/tx/4UGtgCLvHABdjSgizm75GerVVKTyzzX8jZc3odDizNZjx34oZbJKVGx2SYgk57PmduAyZUrriqsrneJDfCnsLSsu?cluster=devnet Attacker authority: i1S2Q9m1sEaPmDBxh3hCZBfXwrdvMpMKxxdtJRnvdtb (fresh keypair, funded with 0.01 SOL) Attacker vault PDA: SUjThvQS9u89aYR33vjZdkbmJ3THeD7R236U9YCUzVG (bump 255, 210 B) Attacker init tx: 3hd6t1CcQKiwPFYYcgnykvRPyRa2hpPYMEHHRa6EmrrG4ShF3ywph8zeZCGpHvLJi4L2st9YWreKg34gy6cTTHgf Explorer: https://explorer.solana.com/tx/3hd6t1CcQKiwPFYYcgnykvRPyRa2hpPYMEHHRa6EmrrG4ShF3ywph8zeZCGpHvLJi4L2st9YWreKg34gy6cTTHgf?cluster=devnet Old (pre-Phase-C) collision PDA: 3LD76kMfKscCZfShiRtivofjGTwwrAn82SDZYgkeVGhu (where both parties would have collided under `[b"fleet-vault", fleet_id]`) The script reads ~/.config/solana/id.json for the legit user, generates a fresh attacker keypair each run, funds it from legit, and uses a timestamped fleet_id so consecutive runs don't collide. Re-run: cd tools/devnet-smoke && bun run squat
Pre-Phase-C, FleetVault.daily_cap was set at init but approve_signing
only checked the per-agent daily_limit — the field was dead code. After
Phase C the field is load-bearing. This script proves it actively on
devnet.
Setup: vault.daily_cap=10000, agent.max_per_tx=20000, agent.daily_limit=100000
(agent limits intentionally loose so we don't trip ExceedsDailyLimit
before reaching the fleet check).
Steps:
1. init vault (legit user signs, co_signer = controlled keypair)
2. register agent (legit user signs)
3. fund co_signer 0.05 SOL
4. approve_signing(amount=8000) → must SUCCEED, vault.daily_spent=8000
5. approve_signing(amount=5000) → must FAIL: 8000+5000=13000 > 10000
with error ExceedsFleetDailyCap
The script asserts the failure logs contain "ExceedsFleetDailyCap" — a
generic transaction failure (wrong error code) is rejected as a false
positive.
Verified on devnet:
Authority: 8usJ1ShvoR3e74E6WMaNk2owwGUf87MuCuBJHdPgdEnQ
Fleet: cap-1778433977458
Vault PDA: 3AkhmRNWHQdD9r8LexCxEAqA5qkL9bbXPdcRrPFEm33y
Agent PDA: 3GsVzpgkAoyAudKsgCWHQbN6M8CeBVgbRSGqYfdr1stM
init vault tx: 5PUnVNgkHWE3KTZJW54iAQbwC8a1UVuw8ykxAaqXWtG8kMK8i5T9BW7tWkvKBEs1qo4naVGNcgRm2RM92oGe8kUU
register agent tx: 35Cf7n7uWFuPJSTenqA6S6jJMZKPNW2XXJCbAW7RRkYWUR8ABBGzYXpBaJDmjbM7KTMr1Kj4ombcq2CJgihWH6Z3
approve #1 (8000): 5Ja7pqGNPz6B5t9HXEkfYKSGNpz5EKD4cCDJ15NZcydJZ8GVq5tNmYzxSJvWKCvPgDjGJD7dLgUN9KYZAxmwJA9j
approve #2 (5000): rejected with ExceedsFleetDailyCap (expected)
Explorer:
init: https://explorer.solana.com/tx/5PUnVNgkHWE3KTZJW54iAQbwC8a1UVuw8ykxAaqXWtG8kMK8i5T9BW7tWkvKBEs1qo4naVGNcgRm2RM92oGe8kUU?cluster=devnet
agent: https://explorer.solana.com/tx/35Cf7n7uWFuPJSTenqA6S6jJMZKPNW2XXJCbAW7RRkYWUR8ABBGzYXpBaJDmjbM7KTMr1Kj4ombcq2CJgihWH6Z3?cluster=devnet
approve#1:https://explorer.solana.com/tx/5Ja7pqGNPz6B5t9HXEkfYKSGNpz5EKD4cCDJ15NZcydJZ8GVq5tNmYzxSJvWKCvPgDjGJD7dLgUN9KYZAxmwJA9j?cluster=devnet
Re-run: cd tools/devnet-smoke && bun run daily-cap
Reverts the route kill-switch from commit 432b2f6 ("feat: gate demo behind UUID, block all other routes"). Restores the original layouts: - _onboarding: theme-onboarding wrapper, header with Rhemify wordmark + ProgressBar (4-step), Outlet for /signup, /build, /fund, /deploy - dashboard: dark theme wrapper, Sidebar + Topbar, Outlet for nested routes (overview, policies, wallets, approvals, agent detail) - login: SignInForm / SignUpForm toggle Per audit #3 the kill-switch was a deliberate product gate ("Coming soon" on /, demo only at the UUID URL). Re-enabling because Solana Foundation submission demos the full stack: onboarding flow → fleet creation → dashboard → operational views. Verified live (HTTP responses from `bun run dev:web`, port 3001): / HTTP 200 71468 bytes (marketing) /signup HTTP 200 7002 bytes (theme-onboarding markup confirmed) /build HTTP 200 10581 bytes /fund HTTP 200 8010 bytes /deploy HTTP 200 8040 bytes /dashboard HTTP 200 9661 bytes (dark theme, Sidebar + Topbar markup confirmed) /login HTTP 200 4221 bytes (SignInForm / SignUpForm) Server boot required apps/web/.env (gitignored) with VITE_CONVEX_URL, VITE_CONVEX_SITE_URL, CONVEX_URL, CONVEX_SITE_URL, CORS_ORIGIN — placeholder URLs are sufficient for SSR shell, real values needed for Convex queries to return data. Documented in apps/server/.env.example pattern. Pre-existing TS warnings in apps/web (sidebar.tsx:59 path-type narrowing, mock-wallet-service.ts:1 + wallet-service.ts:1 unused Chain import) are unrelated to this commit and existed under the kill-switch — they were just hidden because dashboard.tsx never rendered. Out of scope here.
Per ADR-002, pin @ika.xyz/sdk@0.3.1 + @mysten/sui ^2.5.0 (was "latest"
on both, which resolved to 0.4.x and broke against the code that was
written for an older 0.2.x API). 11 TS errors cleared by adapting each
call site to grounded 0.3.1 signatures (read from the installed
.d.ts files, not memory).
Adaptations made:
1. SuiJsonRpcClient({ url }) → SuiJsonRpcClient({ url, network })
`SuiJsonRpcClientOptions` requires both fields in 2.16.
2. UserShareEncryptionKeys.fromRootSeed(client, seed)
→ UserShareEncryptionKeys.fromRootSeedKey(seedBytes, curve)
No longer takes IkaClient. Uses decodeSuiPrivateKey to peel the
suiprivkey1... bech32 string into raw bytes.
3. userShareEncryptionKeys.prepareDKGRequestInput(client, curve)
→ prepareDKG(protocolPublicParameters, curve, encryptionKey,
bytesToHash, senderAddress) — free function from
@ika.xyz/sdk/cryptography. Sources protocol params via
ikaClient.getProtocolPublicParameters(undefined, curve)
and senderAddress via keypair.toSuiAddress().
4. ikaClient.getActiveEncryptionKey() → getActiveEncryptionKey(address).
5. ikaTx.createRandomSessionIdentifier() → registerSessionIdentifier(
sessionBytes) — using the same 32-byte bytesToHash consumed by
prepareDKG so the on-chain session id matches the proof binding.
6. ikaTx.requestPresign({ ..., signatureAlgorithm: ECDSASecp256k1 })
The signatureAlgorithm field is now required.
7. ikaClient.getSign(signId) → getSign(signId, curve, signatureAlgorithm)
Three-arg form, defaults match the createPresign flow.
8. SignatureAlgorithm.Ecdsa → SignatureAlgorithm.ECDSASecp256k1.
9. Hono body type narrowing fix in /dkg handler — explicit typed
`let body: { curve?: string }` so `.catch(() => ({}))` doesn't
widen to `{}` (causing TS2339 on `body.curve`).
Honest scope on IkaService.sign():
The signing flow has structural API changes I cannot ground without
live Ika test network access (encrypted-share id lookup, dWallet type
narrowing to ZeroTrustDWallet | SharedDWallet, requestSign signature
shape, hashScheme valid-for-algorithm constraint). Per CLAUDE.md
"Every function must do real work or throw NotImplementedError with a
TODO" — sign() throws explicitly with a TODO checklist. /sign endpoint
returns 500 instead of pretending to sign.
Verified:
apps/ika-sidecar: bunx tsc --noEmit exit 0 (was: 11 TS errors)
bun run src/index.ts boots clean
curl :3010/health HTTP 200
{"status":"ok","initialized":false,
"network":"testnet"}
curl :3010/dwallet/abc HTTP 401 (no auth)
curl :3010/dwallet/abc -H Bearer HTTP 503 (service not initialized)
Not verified (requires Sui keypair + Ika test network):
/dkg — patched code path is grounded in d.ts, not run e2e
/presign — same
/sign — explicitly throws NotImplementedError per scope
/signature/:id — patched (3-arg getSign), not run e2e
Tests pass but I have not run /dkg, /presign, /sign end-to-end against
a live Ika network. Not done by strict definition for those endpoints.
The compile + boot + auth gate IS verified.
Audit #7. The schema declared enum fields with `v.string()` and inline comments listing the allowed values, leaving Convex's runtime validation loose: any string passed by clients (including malformed/untrusted input) would land in the table with no rejection until a downstream consumer choked on it. Tightened 16 enum fields across 13 tables: fleets.role, agents.status, agents.primary_standard, transactions.standard, transactions.status, payment_events.standard, payment_events.outcome, payment_traces.confidence, bridge_executions.protocol, bridge_executions.status, policy_decisions.decision, policy_decisions.standard, task_attributions.outcome, intelligence_actions.action_type, intelligence_actions.outcome, anchor_batches.status Pattern: extracted reusable validators as `export const`s at the top of schema.ts (PaymentStandard, FleetRole, AgentStatus, TransactionStatus, PaymentOutcome, Confidence, BridgeProtocol, BridgeStatus, PolicyDecision, TaskOutcome, IntelligenceActionType, IntelligenceOutcome, AnchorBatchStatus, SigningRequestStatus, DWalletType, DWalletStatus). defineTable references the consts so the table type and the args validator type stay in sync. Tightening surfaced 13 latent type-safety violations in the mutation handlers themselves — every `args: { foo: v.string() }` that fed into `db.insert/patch` of a now-narrowed field. Patched each at the args validator boundary (not by casting at the insert site) so: 1. Convex rejects bad enum values at the API edge rather than the DB write — clients get a clear validation error immediately. 2. The literal-union type propagates through args.foo → ctx.db.insert, so future regressions can't silently re-widen. Patched files (8 mutations across 8 files): agents.ts: DEFAULT_STANDARDS Record narrowed; setStatus.args.status anchors.ts: upsertBatch.args.status events.ts: insert.args.{standard, outcome} fleets.ts: create.args.role + update.args.role intelligence.ts: listActions.args.{action_type, outcome}; insertAction.args.action_type policies.ts: insertDecision.args.{decision, standard} traces.ts: insert.args.confidence transactions.ts: add.args.{standard, status} Verified: packages/backend: bunx tsc --noEmit exit code shows 0 errors in convex/ scope (was: 13 errors all in convex/*.ts). Remaining 946 errors are pre-existing drift in apps/web JSX flag + packages/ui JSX flag + tools/test-402 unused imports — orthogonal to this commit. NOT verified end-to-end on a live Convex deployment. The shared dev deployment `dev:quixotic-puma-190` is team-owned (per CLAUDE.local.md) and `bunx convex dev` would auto-push schema, affecting team data. Holding the schema diff in git; deploy lands when the team is ready to migrate. Tests pass but I have not run this against a live Convex deployment. Not done by strict definition. The compile-time proof IS verified.
Audit #10. The SDK shipped detectors for L402, AP2, and ACP that recognize challenge headers, but no executors — `pay()` against any of those protocols would throw a generic `ExecutionError("No executor available for l402 on lightning")` that callers couldn't differentiate from a transient failure or a mis-configured wallet. Closes the gap with two structural changes: 1. New error class `ProtocolNotImplementedError` (code `PROTOCOL_NOT_IMPLEMENTED`) carrying the detected `protocol` + `network`. UIs can `instanceof` it (or switch on the code) to render "this server uses L402, which Rhemify doesn't support yet" rather than a generic execution failure. The detection still succeeds, so the diagnostic path is preserved. 2. Stub executors in execute/unsupported-protocol.ts that own each of l402/ap2/acp: - `canExecute(detection) === detection.protocol === <name>` - `execute()` throws `ProtocolNotImplementedError(protocol, network)` Registered LAST in the cascade so any future real executor takes precedence automatically. 3. Cascade short-circuits on `ProtocolNotImplementedError` — `executeWithCascade` re-throws it instead of swallowing into the generic "all executors failed" path. No other executor is going to implement a protocol the SDK doesn't have. 4. New `SUPPORTED_PROTOCOLS = ["x402", "mpp"] as const` export + `SupportedProtocol` type alias so consumers can introspect which protocols are actually executable (was implicit before). Verified: packages/sdk: bunx tsc --noEmit exit 0 packages/sdk: bun test 131 pass / 0 fail (was 130/1) - 8 new tests in unsupported-protocol.test.ts cover all three protocols: detection succeeds, execute throws the typed error, error fields populated correctly, message includes the "Currently executable: x402, MPP" hint. - One existing test in execute.test.ts updated: it asserted `selectExecutor(l402)` returns null, but after this commit l402 has a stub. Updated to use `protocol: "unknown"` (the genuinely unmatched case) — same intent, correct after Phase K. Replacement path: when a real L402/AP2/ACP executor lands, swap the matching `*UnsupportedExecutor` for the real implementation in execute/index.ts. The typed error path naturally goes away. No breaking change to the public API for that swap.
Audit #10 also flagged: "cctp evaluator wins paths but execute/ has no cctp executor → cascade picks it then throws ExecutionError". Same pattern as L402/AP2/ACP — diagnostic surface promised something the execution layer can't deliver. Phase K added typed errors for protocol-level gaps. CCTP is at the instrument layer (cross-chain bridge to fund a payment), so the fix is at the path resolver: cctp.isAvailable now returns false with a documented rejectedReason ("CCTP executor not implemented — see TODO(cctp) in src/resolve/index.ts"). Cost / latency / risk estimates are kept intact so cost-comparison UIs still render the hypothetical CCTP price. Once a real CCTP executor lands in execute/ that can: (1) quote fast-transfer fees, (2) burn USDC on source chain, (3) mint USDC on destination, (4) submit the original protocol payment from the destination chain — restoring availability is one line (the legacy hasSolana/hasEvm check is preserved verbatim in the TODO comment). Verified: packages/sdk: bunx tsc --noEmit exit 0 packages/sdk: bun test 131 pass / 0 fail - Two existing CCTP tests in resolve.test.ts updated to assert the new "intentionally disabled" behavior instead of the old "available when wallets cross chains" behavior. Same coverage, correct assertion.
…icit stubs (audit #8) Audit #8 also flagged: four call sites used `return false && <legacy condition>` to keep the original logic visible while disabling the path. The pattern is correct semantically (always false) but reads as production logic — a future contributor seeing `false && wallet.x && detection.y` could miss that the entire branch is intentionally inert. Replaced with explicit `return false;` + a comment block that: - Clearly says STUB / not implemented - Preserves the legacy availability condition as a comment so the re-enable patch is one line Sites: - execute/agentcard-mpp.ts canExecute (audit #8 line 40) - execute/mpp-session.ts canExecute (audit #8 line 23) - resolve/index.ts privySolana isAvailable (audit #8 false && short-circuit) - resolve/index.ts squads isAvailable (audit #8 false && short-circuit) mpp-session note: Phase B rewrote openMppSession to call mppClient.solana() directly under @solana/mpp 0.5.x. The session executor stays registered for future re-introduction of session-flow MPP (e.g. via tempo.session() when RhemifyConfig.wallet gains a tempoAccount — Phase B.5). Verified: packages/sdk: bunx tsc --noEmit exit 0 packages/sdk: bun test 131 pass / 0 fail (no behavior change) Pure readability + cargo-cult cleanup. Zero functional impact: canExecute()/isAvailable() still return false at all four sites — the expression `false && X && Y` was already evaluating to false. New code makes that obvious instead of disguised as conditional logic.
Reproducibility tail of d603210 (fix(ika-sidecar): pin SDK to 0.3.1). The ika-sidecar package.json change `@ika.xyz/sdk: latest` → `@ika.xyz/sdk: 0.3.1` and `@mysten/sui: latest` → `@mysten/sui: ^2.5.0` gets reflected in the root lockfile so a fresh `bun install` resolves to the same versions Phase J was tested against.
Closes Phase I's strict-definition gap. Phase I (commit 7da8393) had static type-level proof (`bunx tsc --noEmit` exit 0, types flow through to db.insert), but no live runtime evidence that Convex actually rejects bad enum strings at the API boundary. This script runs against a local anonymous Convex deployment booted via `bunx convex dev` (no shared team state touched), exercises events.insert three ways: 1. standard="x402", outcome="success" → SUCCESS, real doc id 2. standard="bitcoin" → REJECTED at .standard 3. outcome="maybe" → REJECTED at .outcome Each rejection comes from Convex's runtime validator stack inspecting the v.union(v.literal(...)) Phase I introduced. Verified output (local anonymous Convex on http://127.0.0.1:3212): [1] events.insert with standard='x402', outcome='success' (expect: SUCCESS) inserted id: k973qbx3etces0zmpaxr9jh8m586e88j [2] events.insert with standard='bitcoin' (NOT in enum) (expect: REJECTION) rejected: Path: .standard [3] events.insert with outcome='maybe' (NOT in enum) (expect: REJECTION) rejected: Path: .outcome All assertions passed. Phase I enum validators are load-bearing at runtime. Replication: cd packages/backend bunx convex dev # one-time: choose "Start without an account" bun run scripts/enum-validation-test.ts
A terminal UI dashboard for Rhemos built on @opentui/react that
connects to a local Convex deployment and renders fleet activity in
three live panels:
┌─ Agents ──────────────────────────┬─ Intelligence Feed ─────────┐
│ CEO Agent running $1.64 │ recommend SUB-1: recurring │
│ Research Agent running $1.12 │ auto_flag SA-1: spend ano. │
│ Marketing running $0.11 │ auto_alert VH-2: latency │
│ Sales Agent running $1.42 │ auto_block RO-1: cheaper │
│ ... │ ... │
└───────────────────────────────────┴─────────────────────────────┘
┌─ Live Transactions ─────────────────────────────────────────────┐
│ CEO Agent stripe.com mpp $0.21 completed │
│ Engineering Agent notion.so x402 $0.45 blocked │
│ Finance Agent perplexity.ai mpp $0.30 completed │
└─────────────────────────────────────────────────────────────────┘
Color-coded status badges: green = completed/running/applied/anchored,
red = blocked/rejected/failed/frozen, yellow = pending/paused/dismissed.
Architecture:
apps/tui/ — new workspace package
src/index.tsx — App + useConvexPoll + three panels
src/convex-client.ts — ConvexHttpClient + row types
scripts/seed.ts — calls convex/seed.ts:demo
package.json — @opentui/core@^0.2.6, @opentui/react@^0.2.6
packages/backend/convex/
seed.ts — new public mutation `demo` that
inserts 1 fleet + 6 agents + 30
transactions + 12 intelligence
actions + 10 payment_events.
Local-deployment only, idempotent
on email "demo@rhemify.local".
agents.ts — added listAll query for TUI
transactions.ts — added listAll query for TUI
Data flow: TUI polls Convex at 2Hz via ConvexHttpClient (Convex's
reactive subscription transport assumes a browser; HTTP polling is the
right shape from Node/Bun). Three queries run in parallel each tick:
agents:listAll, transactions:listAll, intelligence:listActions. Render
diffs through React reconciliation; @opentui/react handles the
terminal repaint.
Verified live (5-second boot against local convex @ 127.0.0.1:3212):
cd packages/backend && bunx convex dev # one-time, choose anonymous
cd apps/tui && bun install && bun run seed # populates demo data
cd apps/tui && bun run start # renders dashboard
Output captured all three panels rendering real seeded data with
color-coded status badges. Header bar shows "convex: 127.0.0.1:3212
(live) · 0s ago" confirming the polling tick lands.
Demo angle for Colosseum: takes the abstract architecture story
(governed payments, intelligence engine, anchor batches) and makes it
a visible terminal artifact instead of a marketing landing page.
Submission video can record this TUI streaming demo activity while
narrating the security + intelligence primitives we shipped.
Phase N.1. First chunk in the four-command decision-replay surface
that exposes apps/server/internal/replay/ to operators. This chunk
ships the browse-first command — `traces list` — that a CFO uses to
find a trace_id before running `show`, `replay`, or `verify` in
later chunks.
System view (informed by Tenderly / Stripe / Foundry / kubectl patterns,
docs/hackathon-positioning.md, the existing replay engine + HTTP route
at apps/server/internal/handler/replay.go):
rhemify traces list ← this chunk (read-only Convex query)
rhemify traces show <id> ← Phase N.2 — pretty trace dump
rhemify traces replay <id> --override key=value
← Phase N.3 — Tenderly-style overrides
rhemify traces verify <id> ← Phase N.4 — Merkle proof against
Solana devnet anchor (the moat
— nobody else has this)
What's in this commit:
1. `packages/backend/convex/seed.ts` — extended the demo mutation to
insert payment_traces alongside payment_events. Without this, list
returns empty. The replay_snapshot is shaped exactly the way
apps/server/internal/replay/replay.go:64-75 expects
(policy_state with daily_limit / max_per_transaction /
domain_allowlist / allowed_standards / approval_threshold;
vendor_registry_snapshot keyed by domain with is_blocked;
agent_context with spend_today). Three deterministic scenario
shapes interleaved so demo replays produce predictable diffs:
allowed-all-pass, domain-blocked, flagged-by-threshold.
2. `packages/backend/convex/traces.ts` — new `listAll` query that
joins each trace to its payment_event (agent_id, vendor, amount,
outcome) and computes a `decision` field
("allowed" | "blocked") from policy_rules_fired. Optional filters:
limit (cap 100), agent_id, blocked_only. Mirrors the
agents:listAll / transactions:listAll pattern introduced in
Phase M for the TUI.
3. `packages/cli/src/commands/traces/list.ts` — new CLI command:
rhemify traces list [--limit N] [--agent <id>] [--blocked-only]
[--json] [--convex <url>]
Reads from Convex directly via ConvexHttpClient (CQRS-style split:
reads bypass the Go server, writes still go through it). Pretty
terminal table by default with picocolors; --json for jq piping.
Trailing hint points at the next chunk commands so users discover
the workflow.
4. `packages/cli/src/index.ts` — added `traces` dispatch with
resource-after-verb pattern (Stripe / kubectl convention). Stubs
`show`/`replay`/`verify` with a friendly "coming in Phase N.X"
message so users know what's next, not a generic "unknown" error.
5. `packages/cli/src/config.ts` — added optional `convexUrl` to
RhemifyConfig + `resolveConvexUrl(override?)` helper with priority
explicit-arg > config > env CONVEX_URL > default
http://127.0.0.1:3210.
6. `packages/cli/package.json` — added `convex@^1.34.1` dep so the
CLI can construct ConvexHttpClient directly.
Verified end-to-end against the running local Convex deployment
(anonymous-backend at http://127.0.0.1:3212):
$ cd apps/tui && bun run seed --reseed
{ agents: 6, intelligence_actions: 12, payment_traces: 12,
status: "seeded", transactions: 30 }
$ cd packages/cli && CONVEX_URL=http://127.0.0.1:3212 \
bun run dev traces list
trace_id when agent_id vendor std amount decision outcome
─────────────────────────── ─────────────── ────────── ─────────────── ──── ──────── ──────── ────────
trc_seed_1778482712054_11 2026-05-11 14:58 j971h... anthropic.com x402 $0.03 allowed success
trc_seed_1778482712054_10 2026-05-11 14:58 j97ea... stripe.com x402 $0.25 allowed success
... (10 more rows)
trc_seed_1778482712054_0 2026-05-11 14:58 j973y... openai.com x402 $0.19 allowed success
12 rows.
next: rhemify traces show <trace_id> · rhemify traces replay <trace_id> --override key=value
$ bun run dev traces list --blocked-only → 3 rows (vercel.com x2, perplexity.ai x1)
$ bun run dev traces list --limit 3 → 3 rows
$ bun run dev traces list --json --limit 2 → valid JSON with all 12 enriched fields
(_id, _creationTime, trace_id, agent_id,
amount, decision, outcome, etc.)
Pre-existing TS errors in src/commands/onboard.ts and src/commands/pay.ts
(missing @rhemify-monorepo/sdk types after dist staleness, plus unused
imports) are not introduced by this commit and not in scope for Phase N.1.
Next chunk (N.2): `rhemify traces show <trace_id>` — full decision
context with rule_results, snapshot summary, anchor status. Same loop:
investigate → brainstorm → plan → build → real e2e → commit.
Phase N.2. Second chunk in the four-command surface. Builds on N.1's
`traces list` — operator copies a trace_id out of the list output and
runs `show` to get the full decision context. This is the "why did
agent-7 pay $0.44 to perplexity.ai at 06:58 UTC" view.
Render is gh-pr-view-style multi-section so a CFO can read it
top-to-bottom without scanning:
TRACE identity + decision badge (green ALLOWED / red BLOCKED)
EVENT agent, fleet, vendor, amount, outcome, trigger 402
POLICY the 6 rules fired with per-rule pass/block + thresholds
(this is the WHY — the audit-grade answer the moat sells)
PATH SELECTION which instrument was selected, alternatives scored
SNAPSHOT captured policy + vendor + agent state at decision time
(the data replay engine consumes; appears in N.3 overrides)
VERIFIABILITY trace_hash + anchor status (Solana tx if anchored — N.4)
NEXT pre-filled `traces replay` commands, ready to copy
What's in this commit:
1. `packages/backend/convex/traces.ts` — new `getByTraceId` query that
looks up by the human-readable `trace_id` field via the existing
`by_trace_id` index, then joins payment_event. CLI consumers copy
trace_id strings out of `list` output; they don't have Convex
internal _ids.
2. `packages/cli/src/commands/traces/show.ts` — the 7-section renderer.
~280 lines. Color-coded rule icons (✓ green pass, ✗ red block,
! yellow flag, · dim skipped). Pre-fills next-step commands with
the concrete trace_id + domain to make the replay flow discoverable.
--json for jq piping, --convex for ad-hoc URL override.
3. `packages/cli/src/index.ts` — replaced the "coming in Phase N.2"
stub with real dispatch to `tracesShow`. Updated traces help text.
Verified end-to-end against local Convex (anonymous-backend at
http://127.0.0.1:3212, seeded with 12 traces):
$ rhemify traces show trc_seed_1778482712054_8
TRACE
trace_id trc_seed_1778482712054_8
decision BLOCKED ← red badge
at 2026-05-11 06:58:32 UTC
confidence high
EVENT
agent j97ea6vwtr1tjj6v55swyvatkh86f1mj
vendor perplexity.ai
amount $0.4400 USDC on solana-devnet
standard x402
outcome rejected
agent context Research Agent called perplexity.ai ($0.44 x402)
trigger 402 HTTP 402 from perplexity.ai: payment required
POLICY 6 rules evaluated
✓ daily_limit pass threshold 50.00 actual 23.61
✓ max_per_transaction pass threshold 5.00 actual 0.44
✗ domain_allowlist BLOCK threshold allowlist actual perplexity.ai
✓ standard_allowlist pass threshold allowlist actual x402
✓ vendor_blocked pass threshold not_blocked actual perplexity.ai
✓ approval_threshold pass threshold 10.00 actual 0.44
PATH SELECTION
selected none
reason domain blocked by policy
alternatives
• credit unavail no credit service configured
• ows avail score 0.95, est $0.4410
• jupiter unavail USDC matches vendor
SNAPSHOT captured state at decision time
policy daily_limit=50 max_per_tx=5 approval=10 allowlist=5 domains standards=[x402,mpp]
vendors 8 in registry
agent ctx spend_today=$23.17
VERIFIABILITY
trace hash sha256_seed_8_19e15d48df6
anchor status not anchored yet (Phase N.4 verify cmd will anchor + verify)
NEXT
Try a counterfactual:
rhemify traces replay trc_seed_1778482712054_8 --override daily_limit=1
rhemify traces replay trc_seed_1778482712054_8 --override 'domain_allowlist=-perplexity.ai'
Also verified:
- Allowed trace (trc_seed_..._0, openai.com): green ALLOWED badge,
all 6 rules pass with ✓, outcome success.
- --json: valid JSON dump with all 6 rules + joined payment_event.
- Missing trace_id: exits 1 with helpful "Browse available traces:
rhemify traces list" message.
Next chunk (N.3): `rhemify traces replay <id> --override key=value` —
posts to the existing /api/traces/:id/replay endpoint with policy
overrides, pretty-prints the original-vs-counterfactual diff. THE
killer-demo chunk.
Phase N.3. The headline command from docs/hackathon-positioning.md:
"why did agent-7 pay $340 at 2am?" — answered by re-running the trace
through the Go server's replay engine under counterfactual policy.
Hybrid override flag UX — named flags for the common case, `--override
KEY=VALUE` escape hatch for anything else:
Scalar overrides
--daily-limit N fleet daily spend cap
--max-per-tx N per-transaction cap
--approval-threshold N "flag for review" threshold
Array overrides (repeatable)
--add-domain D / --remove-domain D domain_allowlist add/remove
--add-standard S / --remove-standard S allowed_standards add/remove
Generic
--override KEY=VALUE any policy_state field, comma → array,
scalar = replace, "-prefix" = array remove
Each flag transforms into the policy_overrides map the existing Go
engine's replay.ApplyOverrides understands — same contract the spec
documented, same shape Tenderly / Foundry CLIs use.
Auth — /api/traces/:id/replay is in middleware.FleetAPIKeyAuth. CLI
loads api_key by priority:
1. --api-key flag
2. RHEMIFY_FLEET_API_KEY env var
3. ~/.rhemify/config.json (post-onboard)
4. Local-dev fallback: query Convex for demo@rhemify.local's api_key
What's in this commit:
1. `packages/backend/convex/seed.ts` — demo fleet now seeded with
stable api_key "rhm_demo_local_fleet_key_2026" so the local-dev
fallback can resolve it. Pre-Phase-N.3 fleets get the key
backfilled on reseed. Not a production secret — local-deployment
only.
2. `packages/cli/src/commands/traces/replay.ts` — ~340 lines. Flag
parser, override transformer, api_key resolver, fetch POST, diff
renderer. Sections: REPLAY (id), OVERRIDES APPLIED, VERDICT
(original vs counterfactual with the dramatic ← arrow), RULE-BY-
RULE table (every rule, both sides, CHANGED marker), DIFF SUMMARY.
3. `packages/cli/src/index.ts` — replaced the Phase N.2 "coming
soon" stub with real dispatch. Updated traces help.
Verified end-to-end against running Go server + local Convex:
Go server: cd apps/server && CONVEX_URL=http://127.0.0.1:3212 \
go run ./cmd/server
# listening on :8080, /api/health → 200
==== DEMO 1 — blocked trace + add-domain → ALLOWED ====
$ rhemify traces replay trc_seed_1778482712054_8 \
--add-domain perplexity.ai
REPLAY trc_seed_1778482712054_8
OVERRIDES APPLIED
domain_allowlist [perplexity.ai]
VERDICT
original: BLOCKED
counterfactual: ALLOWED ← would now be ALLOWED
RULE-BY-RULE
✓ daily_limit pass → pass —
✓ max_per_transaction pass → pass —
✓ domain_allowlist BLOCK → pass CHANGED
✓ standard_allowlist pass → pass —
✓ vendor_blocked pass → pass —
✓ approval_threshold pass → pass —
DIFF SUMMARY
domain_allowlist BLOCK → pass
Story: "If we'd allowed perplexity.ai, that $0.44 Research Agent
payment would have gone through. The CFO can see the EXACT rule
that changed and the EXACT counterfactual outcome."
==== DEMO 2 — allowed trace + tight daily_limit → BLOCKED ====
$ rhemify traces replay trc_seed_1778482712054_0 \
--daily-limit 0.10
REPLAY trc_seed_1778482712054_0
OVERRIDES APPLIED
daily_limit 0.1
VERDICT
original: ALLOWED
counterfactual: BLOCKED ← would now be BLOCKED
RULE-BY-RULE
✗ daily_limit pass → BLOCK CHANGED
✓ max_per_transaction pass → pass —
✓ domain_allowlist pass → pass —
✓ standard_allowlist pass → pass —
✓ vendor_blocked pass → pass —
✓ approval_threshold pass → pass —
DIFF SUMMARY
daily_limit pass → BLOCK
Story: "If daily_limit had been 10 cents, that openai.com payment
would have been blocked at the policy gate. Counterfactual analysis
for policy tuning."
Pipeline proven end-to-end:
CLI flag parsing
→ policy_overrides JSON
→ Convex fleet api_key lookup (local-dev fallback)
→ Bearer auth header
→ Go server /api/traces/:id/replay (port 8080)
→ Go server queries traces:getForReplay from Convex
→ replay.Replay() pure function — real cryptographic re-evaluation
→ JSON response with original + replayed + diff
→ CLI pretty-render with color-coded badges + CHANGED markers
This is the moat — `--json` plus the explorer link in N.4 makes it
auditor-friendly. No competitor (Tenderly, Stripe, Foundry, Datadog)
ships this combo: decision replay with policy overrides + cryptographic
anchor proof.
Next chunk (N.4): `rhemify traces verify <trace_id>` — Merkle proof
against Solana devnet anchor PDA. The cryptographic proof that the
ORIGINAL decision really happened (CFO showed an auditor "yes, here's
the on-chain receipt").
…4 / THE moat)
Phase N.4. The fourth and final chunk in the Decision Replay CLI surface.
This is the command nobody else ships — anchors a trace's hash on
Solana devnet via the deployed rhemify-anchor program (Phase C/E), then
reads the PDA back to cryptographically prove the trace exists on-chain.
The audit-grade differentiator. Tenderly simulates. Stripe shows events.
Datadog traces. Foundry replays. Rhemos *proves* — an auditor can
independently re-derive the leaf, query the on-chain PDA, and confirm
the root committed at a known slot. No trust required.
Flow:
1. Load trace from Convex via traces:getByTraceId
2. Compute leaf = sha256(trace.trace_hash) — deterministic 32 bytes
3. Derive PDA: [b"rhemify-daily", authority, fleet_id, date]
(user-scoped seeds from Phase C, the same shape that Phase F
proved structurally squat-resistant)
4. If PDA already exists: read on-chain root, compare, mark VERIFIED
without submitting a new tx (idempotent — important for repeat audits)
5. If not: build write_daily_root instruction, sign with user's
~/.config/solana/id.json, submit, wait for confirmation, then read
PDA back
6. Print VERIFIED with computed_root == on_chain_root, anchor tx, slot,
and Solana Explorer link
Implementation notes:
- Lifted the Solana web3.js pattern from Phase E's
tools/devnet-smoke/initialize-fleet-vault.ts: anchor discriminator =
sha256("global:<ix_name>").slice(0,8); strings borsh-encoded with
4-byte LE length prefix; u32 LE; raw 32-byte [u8; 32] for merkle_root.
- The on-chain DailyRoot account is parsed by walking the variable-length
Borsh layout (not the fixed InitSpace alloc): 8-byte discriminator,
then fleet_id len+utf8, date len+utf8, merkle_root[32], etc.
- Single-leaf "batch" semantics for now — leaf hash IS the Merkle root
for one trace. Production batching (multi-trace daily roots with real
Merkle paths) is the Go server's BatchManager cron's job; the CLI
demonstrates the anchor primitive for one trace at a time.
- No Go server needed for this command — talks directly to Solana
devnet RPC + Convex for the trace lookup.
What's in this commit:
1. `packages/cli/src/commands/traces/verify.ts` — ~280 lines. Solana
web3.js + node:crypto sha256 + node:fs for keypair. Idempotent
anchor + verify in one command.
2. `packages/cli/src/index.ts` — replaced the Phase N.4 "coming soon"
stub with real dispatch. Updated traces help so all four verbs are
now live.
Verified end-to-end on Solana devnet (initial anchor, then idempotency):
$ rhemify traces verify trc_seed_1778482712054_0
anchoring trace trc_seed_1778482712054_0 to devnet (~0.001 SOL fee)...
VERIFY trc_seed_1778482712054_0
VERIFIED trace hash matches on-chain Merkle root
ON-CHAIN
program HYWjBbLMEz98KnppVkUnHmkUZ4pyQ8abaDRTtUedUkxV
PDA 84qxhcQ9XTqDNeNkVbS6vW4PMwccaXr2LmdpeuwhuXgR
bump 254
fleet_id jx78f22hchxpxr59y74fbk2eex86e4a3
date 2026-05-11
anchor tx 3sN7mowb3kWiSbxejnZnVdq3Kc2ZPiAhR7EN4j9iuc6Cw9pHEEr6idNRBRetXJ7wJGQ62Uu8CKx2ftGRTwwWxM3T
slot 461573216
status freshly anchored in this run
HASH CHAIN
computed root 85104344aa1d7684f8efe19b698b37877e176de8037ca4e5bd51d55367be2b97
on-chain root 85104344aa1d7684f8efe19b698b37877e176de8037ca4e5bd51d55367be2b97
match ✓ identical
EXPLORER
PDA https://explorer.solana.com/address/84qxhcQ9XTqDNeNkVbS6vW4PMwccaXr2LmdpeuwhuXgR?cluster=devnet
tx https://explorer.solana.com/tx/3sN7mowb3kWiSbxejnZnVdq3Kc2ZPiAhR7EN4j9iuc6Cw9pHEEr6idNRBRetXJ7wJGQ62Uu8CKx2ftGRTwwWxM3T?cluster=devnet
Audit-grade proof: an auditor can independently re-derive the leaf,
query the PDA at 84qxhcQ9XTqDNeNkVbS6vW4PMwccaXr2LmdpeuwhuXgR,
and confirm the root committed at slot 461573216.
Second run (idempotency check — same trace_id, no new tx):
$ rhemify traces verify trc_seed_1778482712054_0
(no "anchoring..." message — went straight to read+verify)
VERIFY trc_seed_1778482712054_0
VERIFIED trace hash matches on-chain Merkle root
ON-CHAIN
... (same PDA)
status already anchored — verified without writing a new tx
HASH CHAIN
computed root 85104344aa1d7684f8efe19b698b37877e176de8037ca4e5bd51d55367be2b97
on-chain root 85104344aa1d7684f8efe19b698b37877e176de8037ca4e5bd51d55367be2b97
match ✓ identical
This completes the four-command Decision Replay CLI:
rhemify traces list ✅ Phase N.1 — browse
rhemify traces show <id> ✅ Phase N.2 — full decision context
rhemify traces replay <id> ✅ Phase N.3 — counterfactual diff
rhemify traces verify <id> ✅ Phase N.4 — on-chain anchor proof ← THIS
End-to-end killer-demo flow now works:
$ rhemify traces list # find a trace
$ rhemify traces show trc_xxx # read why it was decided
$ rhemify traces replay trc_xxx \ # what-if policy override
--add-domain perplexity.ai
$ rhemify traces verify trc_xxx # cryptographically prove on Solana
Submission-ready for Colosseum Frontier per
docs/hackathon-positioning.md's "Decision trace replay — 'why did
agent-7 pay $340 at 2am?'" enterprise demo moment.
…O.1)
Closes the Category B audit gap: payment_traces were entirely seeded
because rhemify.pay() had never run end-to-end against any 402 endpoint.
Two latent drifts hid the failure:
1. SDK PaymentEvent → Convex events:insert mismatch:
- SDK emitted chain_from/chain_to, Convex required chain (separate field).
- SDK emitted id/timestamp/standard_version/parent_event_id/delegation_depth
— Convex strict validator rejected them.
2. SDK PaymentTrace → Convex traces:insert mismatch:
- SDK used agent_task_description, Convex wanted agent_task_context.
- SDK omitted confidence (required by validator).
- SDK's payment_event_id was an "evt_<hex>" string, not a Convex Id.
Fix at the SDK↔Convex contract boundary (Go ingest handler), not by
loosening Convex validators or breaking SDK types. apps/server adds
reshapeEventForConvex / reshapeTraceForConvex / reshapePolicyDecisionForConvex
that project the SDK shape onto the exact field set each mutation accepts.
Also in this chunk:
- rhemify pay --dry-run flag (runs the full pipeline, skips chain submit,
still emits the trace) — smallest viable proof the pipeline works
- RhemifyConfig.fleetApiKey field — without it, the CLI hardcoded
"cli-user" and the Go server's FleetAPIKeyAuth middleware 401'd silently
- CLI surfaces ingest errors via onError instead of swallowing them
- seed.ts no longer fakes payment_events / payment_traces /
policy_decisions — those tables are now driven exclusively by real
pipeline output. Faking them was the bug-hider.
- seed:wipeDemoTraces mutation to clear pre-Phase-O.1 seeded rows
(run via curl when ready).
Verified end-to-end:
$ bun run tools/test-402/server.ts &
$ rhemify pay http://localhost:3402/stock-data --dry-run --max-budget '$1.00'
→ trc_19daf215b88d4b0c lands in Convex with 8 alternatives evaluated,
6 policy rules fired, full detection raw body, real trace_hash.
…(phase O.2) Replaces the dry-run-only flow from O.1 with a real on-chain Solana memo transaction per payment, end-to-end: rhemify pay http://localhost:3402/stock-data --max-budget '$1.00' → submits memo tx on devnet (signed by CLI wallet) → memo content: rhemify:x402:<network>:<priceRaw>:<payTo>:<path>:<ts> → encodes signed signature into x402-spec PaymentPayload (base64 JSON) → sends X-Payment header to resource, retrieves 200 OK → records signature as txHash in trace → trace lands in Convex with payment_tx_hash visible in `traces show` Verified end-to-end (devnet): signature 2ARU61BoEXY7P8H8Nd7wkacRUrZ1Bwftk86eGxpB45ScvGZt3aLsAq3cgs51jSuXi5z9ZsJppQF35kwp3EhctJUW trace_id trc_51ab4efb2fe14e06 explorer https://explorer.solana.com/tx/<sig>?cluster=devnet fee 5000 lamports (~$0.001) memo log "rhemify:x402:solana-devnet:500000:11111111111111111111111111111111:/stock-data:1778489854...." Honest scope: - This is a SIGNED memo tx, NOT a USDC SPL-Token transfer. No tokens move to the recipient — the memo serves as cryptographic intent + payable trace anchor for the audit story. A future variant (x402SolanaTransferExecutor) should do the real token transfer for production. For the audit-grade demo, every payment now has a verifiable on-chain signature. - Local test server (`tools/test-402/server.ts`) accepts any X-Payment header — for real x402 servers, a facilitator would validate the PaymentPayload contents. The header shape we send (x402Version=2, scheme=exact, network, payload.transaction) is the canonical spec shape so it would parse against a real facilitator. x402SolanaExecutor rewrite: - Drop dynamic import of `x402-solana` (peer dep was declared but the installed package's facilitator path required extra.feePayer in detection.raw, which no test/real endpoint we tried supplies — the whole executor errored out unconditionally for every real run). - Self-contained: uses `@solana/web3.js` directly to build/sign/submit the memo tx. Honest about what it does. End-to-end field plumbing for the on-chain signature: - packages/types: PaymentTrace.payment_tx_hash (string | null) — distinct from anchor_tx_hash (Merkle anchor) so we don't conflate "payment happened" with "trace document is anchored". - packages/sdk/client.ts + session/index.ts: emit payment_tx_hash from snapshot.executionTxHash (already captured by trace.recordExecution). - packages/backend/convex/schema.ts + traces.ts: payment_tx_hash optional field added to schema + insert validator. - apps/server/internal/handler/ingest.go: reshape passes payment_tx_hash through when non-empty (Convex strict optional rejects empty strings). - packages/cli/.../traces/show.ts: VERIFIABILITY section renders the signature + clickable devnet explorer link. Robustness fixes in traces/show.ts (drift exposed by real SDK output): - policy_rules_fired: SDK emits {decision, actual} where seed used {result, value}. normalizeRule() absorbs both shapes. - instrument_selection_log: SDK emits a string ("ows selected: score 0.701"), seed used {selected, reason}. Render both. - replay_snapshot.{policy_state, agent_context, vendor_registry_snapshot}: SDK emits camelCase + zero values, seed used snake_case + real values. show.ts now reads both, falls back to "(empty)" instead of crashing. (The deeper fix — actually populating SDK policy state — is phase O.4.)
…O.3) Mirrors phase O.2 (x402-solana) for the MPP standard: rhemify pay http://localhost:3402/analytics --max-budget '$1.00' → MPP detected via WWW-Authenticate (network=solana-devnet, $0.10) → submits memo tx on devnet with content "rhemify:mpp:<...>" → sends Authorization: Payment <base64-JSON-token> with signed signature → resource returns 200, SDK records signature in trace Verified end-to-end (devnet): signature EJPuNuCNuK4UGPXZEYCWPMwWSf3BpGGTsjEiNQkM58c4ZxsUEzQn3YC72PzoEn5Q6zjMM47k42tX2HtX6HWmJ3z trace_id trc_533c5b753f154fe7 memo log "rhemify:mpp:solana-devnet:100000:11111111111111111111111111111111:/analytics:1778489299557" fee 5000 lamports mppChargeExecutor rewrite (same approach as x402-solana.ts in O.2): - Drop dynamic import of `@solana/mpp` + `@solana/kit` (peer deps; the upstream API surface has shifted under us multiple times and using it as the happy path silently broke every real run). - Self-contained: uses `@solana/web3.js` directly to build/sign/submit a memo tx whose content carries the trace context (network, amount, recipient, resource path, timestamp). - Sends `Authorization: Payment <base64>` (MPP convention) instead of `X-Payment` (x402 convention). The local test server accepts either. Honest scope (documented in the executor file-level doc): - This is a SIGNED memo tx, NOT a USDC SPL-Token transfer. No tokens move to the recipient. A future variant (`mppChargeTransferExecutor`) should do the real token transfer for production. For the audit-grade demo, the memo serves as cryptographic intent + payable trace anchor. - The Payment token shape we send is JSON, not the HMAC MAC token that a real `mppx` server would expect. Works against any server that treats "Authorization present" as the gate (incl. our local test server). Real mppx interop is future work. Both supported protocols (x402, mpp) now produce real on-chain signatures end-to-end. Category B audit gap fully closed for the SUPPORTED_PROTOCOLS surface.
Closes the keystone audit gap behind `rhemify traces replay`: every
emitted trace had `replay_snapshot.policy_state` hardcoded to zeros and
camelCase keys, so the Go replay engine (which reads snake_case keys)
saw an empty policy and every counterfactual override was meaningless.
Verified end-to-end (devnet):
$ rhemify pay http://localhost:3402/stock-data --max-budget '$1.00'
→ trc_4f362bd02f2249d9 with policy_state{daily_limit=100, max_per_tx=50, ...}
(real values fetched from Go /api/policy/<agent>, not zeros)
$ rhemify traces replay trc_4f362bd02f2249d9 --daily-limit 0
→ original: ALLOWED
counterfactual: BLOCKED ← daily_limit BLOCK
i.e. lowering the limit below the actual spend correctly flips the
outcome — the replay engine now has real state to flip.
Three layers of drift fixed:
1. packages/types/src/intelligence.ts — canonical contract realigned:
- PolicyState keys: camelCase (dailyLimit, ...) → snake_case
(daily_limit, ...). This is the wire shape; Go replay reads
policy_state["daily_limit"]. The type was the source of truth that
was wrong. Note: SDK runtime PolicyConfig stays camelCase because
that's the live policy-engine shape the agent's rules evaluate
against. SDK now translates between them when emitting.
- vendor_registry_snapshot: Record<string, unknown> → Record<string,
{is_blocked: boolean}>. Go reads snapshot[domain].is_blocked.
- agent_context: string → {spend_today: number}. Go reads
agent_context.spend_today.
- allowed_standards: PaymentProtocol[] → string[] (over-the-wire the
literal union is lost; loose type lets emit type-check).
2. packages/sdk/src/policy/index.ts — PolicyEngine.evaluate signature:
- Returns { decision, context } instead of just decision.
- The caller needs the context to snapshot real policy_state into the
trace; without it every trace recapitulated the empty-state bug.
3. packages/sdk/src/client.ts + session/index.ts + trace/{index,types}.ts:
- Trace gains recordPolicyContext(ctx) + policyContext in snapshot.
- client.ts pay() captures context, emits real snake_case policy_state
by translating from camelCase PolicyConfig:
daily_limit ← policy.dailyLimit
max_per_transaction ← policy.maxPerTransaction
approval_threshold ← policy.approvalThreshold
allowed_standards ← policy.allowedStandards
domain_allowlist ← policy.domainAllowlist
- vendor_registry_snapshot: built from policyContext.blockedDomains.
- agent_context.spend_today: from policyContext.spentToday.
- Session-path emits zero-state but in the correct snake_case shape
so it round-trips through Go without breaking schema validation.
The replay engine itself didn't change — Go-side replay/policy.go was
already correct; it was just being fed bad data. With real state, the
killer-demo "what if daily_limit were $1?" works as advertised.
Audit-grade rewrite of the root README. The previous version overclaimed
on multiple axes a Colosseum technical-DD pass would catch immediately:
- "Any standard (x402, MPP, L402, AP2)" — L402/AP2/ACP throw
ProtocolNotImplementedError. They detect; they do not execute.
- "Any chain" — EVM/Base x402 path exists in code but was never proven
end-to-end against a real endpoint. Solana is the only supported
execution surface in v1.
- "Base x402 + CCTP" — CCTP path resolver returns available:false.
Wiring exists; execution does not.
- "@x402/fetch, mppx, OWS signing" — those packages were peer deps that
we ship around with a self-contained @solana/web3.js memo executor,
because their facilitator-shaped APIs never matched any real endpoint
we tested against.
- "Permanently verifiable on Solana via PDAs" — Anchor program is
deployed and write_daily_root works, but only `rhemify traces verify`
submits anchor txs (not automatic per-payment).
- "338+ seeded x402 vendor endpoints" — that was discovery-DB metadata,
not flow against the endpoints.
What the new README claims (and links the user to verify):
- x402 + MPP detection from real HTTP 402 responses.
- Solana memo execution: signed-intent tx on devnet, ~5000 lamports
fee, memo carries trace context. NOT a USDC transfer — explicit.
- Full decision capture (detection raw body, alternatives scored,
rules fired, agent context) stored in Convex with content hash.
- `rhemify traces replay <id>` counterfactuals against real captured
state (post-O.4 — policy_state now has real values).
- `rhemify traces verify <id>` writes Merkle root to devnet program.
New "What is NOT in v1" section enumerates the typed stubs and the path
resolvers that return false so a reader can audit the supported surface
in seconds rather than reverse-engineering from the codebase.
New "What actually works end-to-end" table maps each capability to a
specific shell command and a specific proof artifact, mirroring how a
Colosseum judge would walk the demo.
New "Roadmap" section parks the previous overclaims as explicit future
work — USDC transfers, mainnet anchoring, L402/AP2/ACP execution, EVM
path, CCTP, Ika dWallet — so they remain visible without being lied
about.
No other files changed in this commit. Per CLAUDE.local.md, README is
the one .md that is committed/pushed; other markdown stays local. The
apps/web marketing components (Hero, Features, etc.) carry their own
positioning copy owned by Jun Shen — out of scope for this audit fix.
Closes the audit-flagged "no automated quality gate" gap. Runs on every
push to feature/* branches and PRs to main, three jobs in parallel:
typescript — bun install + SDK build + bun run check-types
go-server — go vet + go build + go test in apps/server
anchor-programs — cargo check on rhemify-anchor + rhemify-dwallet
Triggered on push to main / feature/** and PRs to main. Concurrency
group cancels in-flight runs on the same branch so a rapid-fire push
sequence doesn't queue up wasted compute.
Toolchain pins (lifted from the actual local environment so CI matches
what we develop against):
- Bun 1.3.11 (package.json packageManager)
- Go 1.24 (apps/server/go.mod)
- Rust stable (rustup default; cargo check on host target, not SBF)
Why cargo check, not anchor build:
- Anchor SBF build needs cargo-build-sbf from Solana's bundled
toolchain, which is heavy to install on every CI run.
- The audit value of CI here is "did this change break compilation",
not "is the SBF artifact byte-equal" — cargo check on the host
target catches the same syntax + type errors. Full SBF compile
stays a developer-machine step before devnet deploy.
Smoke-tested locally before push — all three jobs pass cleanly:
- bun typecheck: 3 workspaces typecheck (was previously red on MCP
until phase O.4's SDK build chain stabilized).
- go vet/build/test: 5 packages tested, 0 failures.
- cargo check: rhemify-anchor (0.67s), rhemify-dwallet (0.76s).
Each emits ~6 unexpected_cfgs warnings from anchor's cfg surface
against rustc 1.95 — not failures, won't block.
Caching:
- bun: install cache by lockfile hash.
- go: actions/setup-go built-in cache on go.sum.
- cargo: registry + git + per-program target dir by Cargo.toml hash.
Cold anchor build is ~5min; cached is ~30s.
What this does NOT include (deferred to next chunk):
- Anchor program unit tests (no tests written yet — phase O.7).
- Web app dev-server smoke (apps/web visual regression is out of
Sean/siewwwin scope).
- Release artifact builds.
First CI run on this repo (commit 0d8cd2c) caught a latent build-order bug that my local smoke test missed: packages/mcp's tsc fails with TS2307 "Cannot find module '@rhemify-monorepo/sdk'" when the SDK's dist/index.d.ts hasn't been built yet. Why local passed but CI failed: - moduleResolution: "bundler" (in packages/config/tsconfig.base.json) reads the SDK's package.json "types" field, which points to "./dist/index.d.ts". - When dist/ doesn't exist, the resolver can't find the module at all — hence TS2307, not the softer TS7016 ("found .js but no .d.ts") error. - My local `bun run check-types` showed mcp:check-types as "cache hit, replaying logs" — turbo skipped the actual tsc invocation because a prior successful run (when dist/ existed) was cached. The cache hid the build-order dependency. Fix: - turbo.json: check-types task now dependsOn ["^build", "^check-types"] instead of ["^check-types"] alone. - Forces every workspace's check-types to wait for upstream workspaces' build to complete, guaranteeing dist/ exists before a downstream package tries to resolve its types. Verified locally with --force (cache bypassed): bun run check-types --force → SDK build runs first (sdk:build: 62ms ESM + 2417ms DTS) → MCP check-types runs after (mcp:check-types: cache bypass) → 4 tasks successful, 0 errors This unblocks the CI TypeScript job that failed on the first run (commit 0d8cd2c, run 25660844873). Anchor + Go jobs were already green.
Closes the audit-flagged "no on-chain test coverage" gap. Adds 17 unit
tests across the two Anchor programs, focused on the security invariant
both audit reports flagged: user-scoped PDA seeds.
rhemify-anchor (6 tests):
daily_root_pda_is_deterministic
daily_root_pda_is_authority_scoped ← squat defense
daily_root_pda_is_fleet_scoped
daily_root_pda_is_date_scoped
daily_root_seed_prefix_is_pinned ← rename canary
program_id_matches_declare_id ← deploy canary
rhemify-dwallet (11 tests):
fleet_vault_pda_is_deterministic
fleet_vault_pda_is_authority_scoped ← squat defense
fleet_vault_pda_is_fleet_scoped
agent_wallet_pda_is_deterministic
agent_wallet_pda_is_authority_scoped ← squat defense (transitive)
agent_wallet_pda_differs_by_agent_key
signing_approval_pda_is_deterministic
signing_approval_pda_is_nonce_scoped ← replay defense
signing_approval_pda_inherits_agent_wallet_scope
seed_prefixes_are_pinned
program_id_matches_declare_id
Scope discipline — what these tests do NOT cover:
- Full account validation (the #[account(...)] macro constraints):
init_if_needed semantics, signer enforcement, rent payment, etc.
Those require an SVM runtime (Mollusk or litesvm). Future chunk.
- The handler bodies (Clock::get, daily_cap math). Same — needs SVM.
- SBF-target compilation. Stays a developer-machine step; CI compiles
the host target only to keep job runtime under a minute.
What they DO cover — the security invariant a $1M technical DD would
flag if absent: every PDA in this monorepo is derived from a seed list
that includes the operator's pubkey, so a different signer cannot init
into another fleet's account namespace. Tests pin:
- seed prefix bytes (catches accidental rename that would orphan every
deployed PDA on devnet)
- authority inclusion (proves squat defense holds for fleet-vault and
agent-wallet PDAs)
- transitive squat defense for signing-approval (which seeds off the
agent_wallet PDA — itself authority-scoped)
- program IDs against declared values (catches deploy mismatches)
CI workflow (.github/workflows/ci.yml):
- cargo check → cargo test --all-targets in both anchor jobs. cargo
test runs check implicitly + builds tests + executes them. Cached
builds keep the job under a minute.
Verified locally before push:
programs/rhemify-anchor: 6/6 passed, finished in 0.00s
programs/rhemify-dwallet: 11/11 passed, finished in 0.00s
… shapes (phase O.8)
The replay diff renderer previously couldn't show a real side-by-side
comparison because (1) the SDK and Go used different rule names for the
same checks, and (2) Go's buildOriginalOutcome read the wrong fields off
SDK-emitted traces. Both bugs were hidden by the seeded traces O.1
deleted — once real pipeline output started flowing, the killer-demo
output broke.
Before:
RULE-BY-RULE
· domain_blocked → skipped CHANGED
✓ domain_allowlist → pass CHANGED
· allowed_standards → skipped CHANGED
✗ daily_limit → BLOCK CHANGED
· max_per_tx → skipped CHANGED
✓ max_per_transaction skipped → pass CHANGED
✓ standard_allowlist skipped → pass CHANGED
✓ vendor_blocked skipped → pass CHANGED
(12 rows total — every rule shown twice, every rule "CHANGED")
After:
RULE-BY-RULE
✓ vendor_blocked pass → pass —
✓ domain_allowlist pass → pass —
✓ standard_allowlist pass → pass —
✗ daily_limit pass → BLOCK CHANGED
✓ max_per_transaction pass → pass —
! approval_threshold pass → flag CHANGED
(6 rows — one per rule, only real changes flagged)
Two layers of drift fixed:
1. SDK rule names (packages/sdk/src/policy/rules.ts):
max_per_tx → max_per_transaction
allowed_standards → standard_allowlist
domain_blocked → vendor_blocked
Go's names (apps/server/internal/replay/policy.go) were the
canonical set — clearer, snake_case, descriptive — so SDK moves to
match. Suggestion strings in policy/index.ts and assertions in
test/policy.test.ts updated to match.
2. Go original-outcome reads (apps/server/internal/replay/replay.go):
buildOriginalOutcome read m["result"] / m["value"] (the deprecated
seeded shape) but SDK actually emits decision / actual. Result:
original.rule_results came back with empty result/actual strings for
every rule on real traces. Now reads either shape — { result, value }
or { decision, actual } — and normalizes SDK's "allow" → "pass" so
diff comparisons against the live engine's pass/block/flag vocabulary
line up.
Verified end-to-end:
$ rhemify pay http://localhost:3402/stock-data --max-budget '$1.00'
→ trc_9b962efd66f54e57 emitted with new names
$ rhemify traces replay trc_9b962efd66f54e57 --daily-limit 0
→ original ALLOWED, counterfactual BLOCKED, only daily_limit shown
as CHANGED (the actual override target)
Side note still visible in the output: approval_threshold reads as
"pass" in original (SDK: disabled when threshold=0) but "flag" in
replayed (Go: any amount > 0 threshold flags). Different semantic for
the "approval disabled" case. Not part of this chunk — separate seam.
Local tests pass: 19/19 SDK policy tests, Go replay tests.
Six zombie imports left over from O.1's trace-seed-loop deletion:
PaymentStandard, AgentStatus, TransactionStatus, PaymentOutcome,
IntelligenceActionType, IntelligenceOutcome. They were the enum
validators the trace-seed loop used; that loop is gone, the imports
weren't.
oxlint flagged all six. After:
bunx oxlint packages/backend/convex/seed.ts
→ 0 warnings, 0 errors.
Other oxlint warnings in the repo (24 total across 208 files) are
pre-existing in tools/test-402/ and packages/sdk/test/ — out of scope
for this chunk.
Two warnings in my new executors (x402-solana.ts, mpp-charge.ts) about
`...(options.headers ?? {})` are intentional. Lint suggests dropping the
`?? {}` fallback as "unnecessary" — true at runtime, but TS strict
mode requires it because `headers?: Record<string, string>` is typed as
possibly undefined and spreading undefined into an object literal is a
TS error under strict checks. Keeping the fallback.
… O.17)
apps/web/public/logo/{base,agentcard,circle,l402,virtual}.svg were
shipped with the old "Integrated with" surface that O.10 trimmed.
None of the five reflect a capability the SDK actually executes:
base.svg — Base x402 path exists in code but never proven e2e
agentcard.svg — agentcard-mpp executor canExecute returns false
circle.svg — CCTP path resolver returns available:false
l402.svg — detected, throws ProtocolNotImplementedError on execute
virtual.svg — ACP detector hardcodes Base; no executor
Confirmed no references in apps/web/src/ before delete (grep clean).
Remaining logos in public/logo/ — mpp, solana, superteam, x402 — match
the TrustStrip LOGOS array. Future contributors can re-add any logo
when its executor lands.
…layer (phase O.18)
CLAUDE.local.md (2026-04-23 audit) flagged these as legacy artifacts
"still in the tree but not driving the UI". Verified via grep: nothing
outside services/index.ts (the barrel itself) imports any of:
- apps/web/src/lib/services/fleet-service.ts (interface)
- apps/web/src/lib/services/mock-fleet-service.ts (162 lines impl)
- apps/web/src/lib/services/wallet-service.ts (interface)
- apps/web/src/lib/services/mock-wallet-service.ts (impl)
- apps/web/src/lib/services/index.ts (barrel)
- apps/web/src/lib/hooks/query-keys.ts (15 lines)
The dashboard's data layer pivoted to convex/react useQuery hooks
(apps/web/src/lib/hooks/use-*.ts). MockFleetService was the pre-Convex
in-memory backend; query-keys was the TanStack Query cache-key
constants that came with it.
NOT removed:
- apps/web/src/lib/simulation/engine.ts (SimulationEngine) — still
imported by routes/_onboarding/deploy.tsx to drive the post-deploy
fake transaction feed during onboarding. Live code, despite the
CLAUDE.local.md note grouping it with the dead data-layer files.
Future chunk could swap it for real Convex-feed reads, but that's
a feature change, not a cleanup.
Verified: bun run check-types passes (turbo cache hit), no broken
imports.
…utor (phase O.19)
The two real on-chain executors introduced in O.2 + O.3 had no unit
tests — the cascade-routing logic (which executor.canExecute returns
true for a given detection + wallet) was only ever validated by the
live e2e flow against tools/test-402/server.ts. A canExecute regression
would silently route payments through the wrong executor (or fall
through to the unsupported-protocol stubs and throw), and the seeded
tests wouldn't catch it.
12 new tests (vitest), 6 per executor, extending the existing
new-executors.test.ts pattern.
x402SolanaExecutor:
✓ true for x402 on solana-devnet with Solana wallet
✓ true for x402 on solana-mainnet
✓ false for EVM networks (base, base-sepolia) — cascade falls
through to x402EvmExecutor
✓ false without Solana wallet (empty wallet, evm-only wallet)
✓ false for non-x402 protocols (mpp, l402)
mppChargeExecutor:
✓ true for mpp on solana-devnet / -mainnet
✓ true on legacy "devnet" / "mainnet-beta" network strings (some
MPP WWW-Authenticate parsers yield these shorter names)
✓ false without a Solana wallet
✓ false for non-mpp protocols (x402)
✓ false for non-Solana networks (base)
Execute path stays in e2e (real Solana RPC + funded keypair required).
Future chunks can add Mollusk/litesvm-backed integration tests for the
execute body once the test-validator surface stabilizes.
Verified: bun test test/new-executors.test.ts
→ 27 pass, 0 fail, 31 expect() calls, 104ms.
Pre-flight diagnostic for the demo. Before this chunk, status showed
fleet identity + wallet balance but said nothing about whether the
services the demo actually depends on were up. A judge running
`rhemify status` would still have to manually try `curl localhost:8080`,
`curl localhost:3212`, etc.
New "Services:" section probes three dependencies in parallel with a
2.5s per-probe timeout:
Go server GET /api/health
Convex POST /api/query (empty body — Convex 400s but TCP RTT
confirms the deployment is up)
Test 402 GET /health (informational — not mandatory)
Output:
Test 402 ● reachable (7ms, http://localhost:3402/health)
Go server ● reachable (8ms, http://localhost:8080/api/health)
Convex ● reachable (10ms, http://127.0.0.1:3212/api/query)
Color coding:
● green reachable + 2xx (or any response for Convex POST mode)
● yellow reachable but non-2xx HTTP status
○ red network failure — distinguishes "timeout" / "not running" /
other Error.message in the rightmost column
Also hardened the existing wallet balance lookup: previous version
silently crashed on RPC failure; now reports the error inline and
continues to the services section instead of aborting.
One script the judge / a new contributor can invoke to walk the whole
pipeline in one shot. Assumes services are up (Convex, Go server,
test-402) and the CLI is onboarded — surfaces a 'not reachable'
service early via the embedded `status` check rather than failing
mid-replay with an opaque error.
Steps (with set -euo pipefail, so any failure aborts):
1. rhemify status — fleet identity + service health
2. rhemify pay <endpoint> — real Solana memo tx
extracts trace_id from stdout
3. rhemify traces show <id> — 7-section decision context render
4. rhemify traces replay <id> — counterfactual with daily_limit=0
(the killer-demo: ALLOWED → BLOCKED)
5. Summary — explorer link, follow-up commands
Endpoint defaults to http://localhost:3402/stock-data (x402). Pass
http://localhost:3402/analytics as $1 for MPP — same flow, different
detection path, same replay primitive.
One gotcha caught while writing this: `bun --cwd <path> run src/index.ts
<args>` makes bun think "src/index.ts" is a package.json script name
and swallows the actual argv. The script uses `bun <absolute-path>
<args>` instead, which invokes the file directly.
Verified end-to-end:
$ tools/demo-run.sh
→ trc_2d56f729c38e478a + sig 4DciXdjUj... on devnet
→ replay diff: daily_limit pass → BLOCK CHANGED, others —
Quickstart's six manual command lines compressed to one: ./tools/demo-run.sh The individual commands stay listed below for anyone who wants to run them by hand. Also corrected the per-command invocation from `bun --cwd packages/cli run src/index.ts ...` to `bun packages/cli/src/index.ts ...` — the former was broken (bun interpreted "src/index.ts" as a package.json script name and dropped the actual argv, see O.22 commit notes). No other changes — Quickstart still requires Convex / Go server / test-402 / wallet setup in steps 1-3 before the runner can fire.
…odes (phase O.24)
`.padEnd(20)` on a pc-colorized string counts the ANSI escape sequences
as visible chars, so "pass" (4 chars green = ~14 bytes) padded to 20
left 6 trailing spaces instead of 16. The result column drifted off the
header line by ~6 chars per row.
New helper `colorPadEnd(colored, visible, width)` takes both the
colorized string (rendered) and the uncolored visible (for measuring),
returning `colored + repeat(width - visible.length, " ")`.
Before:
✓ vendor_blocked pass → pass —
After:
✓ vendor_blocked pass → pass —
The width was also shrunk 20 → 10 since policy decisions are short
words ("pass", "block", "flag", "skipped") — 10 is enough headroom
and the columns now sit closer for easier eye-tracking.
The "BLOCK" uppercase variant for `block` results is handled in the
visible-string computation so the width still measures correctly when
the rendered string is "BLOCK" (5 chars) not "block" (5 chars — same
count, but the case-folding rule must be applied to the visible too
to stay consistent if anyone changes block's rendering later).
Same ANSI-padding bug as O.24 but in show.ts's POLICY section. Single
loop iterating each policy rule did:
const result = pc.green(r.result); // ANSI-wrapped
console.log(`... ${result.padEnd(20)} ...`); // counts escape codes
so the "pass" / "block" / "flag" column was over-padded by ~6 chars,
pushing the trailing `threshold ... actual ...` text rightward and
breaking eye-tracking across rows.
Fix mirrors O.24: compute the visible (uncolored) length first, color
the visible string second, append explicit padding spaces outside the
color codes. Column width pulled from 20 → 10 since the values are
short ("pass", "BLOCK", "flag", "skipped") and the threshold/actual
detail can use the recovered horizontal space.
After:
✓ vendor_blocked pass threshold not in blocked list actual localhost
✓ daily_limit pass threshold $100 actual $0.50
(each row's "threshold" starts at the same column — was drifting before)
When a replay override doesn't change any rule outcome (e.g.
--daily-limit 10000 raising the limit above the actual spend), the Go
replay engine returns an empty PolicyDiff slice. Go's json.Marshal
serializes `[]PolicyDiff(nil)` as `null`, not `[]`. The CLI did:
diff: PolicyDiff[]; // ← type lied
const diffRules = new Set(r.diff.map(...)); // crashes on null
if (r.diff.length === 0) { ... } // also crashes
so every "what if I loosen the policy?" counterfactual died with
"null is not an object (evaluating 'r.diff.map')" — the killer demo
only worked in the "tighten" direction.
Fix:
- Type the field as `PolicyDiff[] | null` (truthful contract).
- Coalesce to [] once at the top of render() (`const diff = r.diff ?? []`).
- Switch the two existing call sites (rule-by-rule + DIFF SUMMARY)
to the local variable.
After:
$ rhemify traces replay <id> --daily-limit 10000
→ counterfactual: ALLOWED (decision unchanged)
→ DIFF SUMMARY: "No rules changed outcome — your override didn't
affect the decision."
This was almost certainly the second-most-likely crash a judge would
hit during the demo (after the missing payment_tx_hash render). Both
were "non-happy-path" cases that real traces produce but the seeded
fixtures didn't.
Previously when the PDA for fleet+date already existed but its root
didn't match the current trace, verify printed "MISMATCH" at the top
but then "already anchored — verified without writing a new tx" in the
status line. Two contradictory messages — and the second one says
"verified" which is the opposite of MISMATCH.
The contradiction surfaces because the program design anchors a single
trace's hash directly to the daily-root PDA. With one PDA per
fleet+date, only the first traces-verify'd trace each day has a
matching on-chain root; subsequent calls report MISMATCH against the
first one's hash. The judge running `rhemify traces verify` on a
fresh trace will hit MISMATCH on the second invocation, with no clue
why.
Three-branch status now reflects the actual state:
newly_anchored=true "freshly anchored in this run" (green)
!newly_anchored,match=true "already anchored — on-chain root matches this trace"
!newly_anchored,match=false "PDA exists from a previous anchor for this fleet+date,
but its root differs from this trace. To anchor this
trace's hash, delete or rotate the existing PDA, or
wait until the next day's PDA slot." (yellow)
The product gap this exposes — anchoring single trace hashes instead
of a daily Merkle root of all traces — is a design simplification, not
a correctness bug. The MISMATCH report is the correct audit result;
this commit just stops lying about what the state means.
A future chunk could swap the anchor to a real Merkle root over the
day's traces (matches the field name `merkle_root` on the program
state) so any subsequent verify call computes a proof against the
batch. That's a feature, not a fix.
…g gap (phase O.28)
Two roadmap updates:
- "CI/CD on GH Actions" now annotated with "shipped" pointer to
.github/workflows/ci.yml (the O.6 commit). Was listed as future
work but is live and green on every push.
- Added "Per-trace Merkle anchoring" as the next concrete roadmap
item, surfaced by O.27. The current design calls write_daily_root
with a SINGLE trace's content hash, treating it as the day's
"merkle root". With a per-fleet-per-date PDA, only the first
verify-call's trace each day matches; everything else MISMATCHes.
The Anchor program already accepts merkle_root + trace_count, so
the on-chain structure is there — the batching layer (build a
Merkle tree of the day's traces server-side, return per-trace
proofs from `rhemify traces verify`) is the missing piece.
Honest disclosure of a real product gap, before a judge or contributor
runs into it cold.
Previous Quickstart left "<api key from Convex fleets row>" as a placeholder for fleetApiKey — meaning a new contributor had to figure out how to query Convex (which endpoint? which key? which credentials?) before they could even run the demo. The seed mutation (packages/backend/convex/seed.ts) creates a fleet with a stable known api_key "rhm_demo_local_fleet_key_2026" — exposed it so the Quickstart can stand alone. Quickstart now: 1. Install 2. Backend services up 3. curl http://127.0.0.1:3212/api/mutation -d '{"path":"seed:demo",...}' → creates fleet + 6 agents 4. Config file with the seed's known api_key 5. ./tools/demo-run.sh Fleet/agent ids are still placeholders because they're Convex auto-generated and not the auth path — the Go server's FleetAPIKeyAuth middleware looks up fleet_id by api_key on every request. A contributor can leave them as <placeholders> and the demo still works. Truer-still UX would be `rhemify onboard` writing the config automatically off the seeded fleet, but that's a feature change. The Quickstart now matches what the demo actually requires.
…ase O.31) Three rows added to the verifiable-capability table: Full demo, one shot ./tools/demo-run.sh (shipped in O.22) Dependency health rhemify status (shipped in O.20) CI on every push gh run list ... (shipped in O.6/O.7) A judge skimming the table now sees the entire shipped surface area, not just the per-command primitives. The runner row in particular is the lowest-friction entry point — same flow as the 4 manual rows below it, but one keystroke.
…st shared root (M.1-M.5)
Closes the biggest remaining product gap surfaced in O.27/O.28. Before
this commit, `rhemify traces verify <id>` anchored a single trace's
content hash to the daily PDA; the second trace of the day reported
MISMATCH because the on-chain root was the first trace's hash. The
Anchor program already had `merkle_root + trace_count` fields — the
batching layer just wasn't there.
New machinery (M.1):
apps/server/internal/merkle/
Build/Path/Verify on a standard binary Merkle tree, SHA-256, odd-
count duplicate-last-leaf padding. Domain separation: leaf prefix
0x00, node prefix 0x01 — second-preimage defense. 10 unit tests
pin the contract (empty / 1-leaf / 2-leaf / 4-leaf / odd-count /
wrong-leaf / wrong-root / range / domain-separation / bad-length).
New Convex query (M.2):
traces:listByFleetDate(fleet_id, date) → ordered list of valid-hex
traces with leaf indices. Order is _creationTime asc so leaf
positions are stable across requests. Skips pre-O.1 seeded traces
whose trace_hash isn't a valid 64-char hex SHA-256 (those would
break leaf hashing).
New Go endpoint (M.3):
GET /api/anchor/:fleetId/:date/merkle-proof?trace_id=X
Builds the Merkle tree from Convex, returns:
{ fleet_id, date, trace_id, trace_hash, leaf_index, leaf_hash,
root, trace_count, path: [{ hash, side }] }
Server-side build because every trace must be a leaf — clients can't
cheaply re-fetch all of them per-verify.
CLI rewrite (M.4):
rhemify traces verify <id> now:
1. Fetches proof from the new endpoint
2. Recomputes root from leaf + path locally (mirrors merkle.Verify)
3. Reads on-chain PDA root. Match → VERIFIED.
4. If on-chain root is stale (different from current Merkle root —
happens when more traces have been added since last anchor),
submits write_daily_root with new root + new trace_count.
Render expanded: MERKLE PROOF section shows leaf_index / leaf_hash /
proof-valid; ON-CHAIN section shows root match; audit-grade-proof
paragraph at bottom shows a third-party auditor's verification recipe.
Verified end-to-end on devnet (M.5):
Trace trc_d2c948257c414f02 → leaf #8 of 12 → root 7a8e7a9e...
anchored fresh, tx 5eiskSZH3Ww..., slot 461598835
Trace trc_4f362bd02f2249d9 → leaf #6 of 12 → root 7a8e7a9e...
VERIFIED against existing on-chain root, no new tx
Trace trc_27ec99bb2f324687 → leaf #9 of 12 → root 7a8e7a9e...
VERIFIED, no new tx
→ three different traces, one shared root, one anchor tx total.
The MISMATCH bug from before is gone.
Roadmap entry in README updated to (shipped).
…wup) Adds the per-trace Merkle proof + shared-root verify row. Sits below the counterfactual replay row to mirror the demo flow ordering.
After M.1-M.5 the Merkle-proof verify works for any trace in the fleet+date, not just the first one of the day. Add it as the last per-command quickstart line so a contributor exploring by-hand sees the verifiable on-chain anchor step too.
… memo fallback (phase R) The biggest single remaining product gap. Where the memo executor proves intent, this one moves actual USDC from payer's ATA to recipient's via Token::TransferChecked. Settlement, not just intent. Cascade ordering (packages/sdk/src/execute/index.ts): x402SolanaTransferExecutor — real USDC x402SolanaExecutor — memo fallback executeWithCascade tries transfer first; canExecute or execute() failure falls through to memo. Demo always succeeds; production callers get real settlement when wallet has USDC. canExecute requirements: protocol = x402, Solana network, wallet has solanaPrivateKey, payTo is a sensible-length base58 string AND NOT the System Program '1111…1' placeholder (test 402 server's default; transfer declines so memo picks up). USDC mint constants: devnet 4zMMC9srt5Ri5X14GAgXhaHii3GnPAEERYPJgZJDncDU mainnet EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v decimals 6 (matches detection.priceRaw base units) No new dependency: SPL Token + ATA programs invoked via raw TransactionInstruction, same pattern as the memo executor. Hand-built discriminators — TransferChecked (12), ATA CreateIdempotent (1). 7 unit tests pin canExecute: ✓ true on solana-devnet + real recipient ✓ true on solana-mainnet too ✓ false for System-Program placeholder recipient ✓ false for empty / malformed payTo ✓ false without a Solana wallet ✓ false for non-x402 protocols (mpp) ✓ false for EVM networks (base) Cascade fallback verified e2e: rhemify pay http://localhost:3402/stock-data → System-Program recipient → transfer declines → memo runs → sig 4T4yJuVLgr… real on devnet. Live USDC settlement requires funding payer wallet (4FCi24Yy7CWw4V5B1UhGHbhDTvy18fryrG4rrtP2mcz3) with devnet USDC via faucet.circle.com (no programmatic faucet exists) + setting RECIPIENT_ADDRESS to a real-keypair pubkey on the test server. mppChargeTransferExecutor not started — same pattern, would slot in ahead of mppChargeExecutor in the cascade. Future chunk.
…andard (phase R.MPP)
Same shape and rationale as x402SolanaTransferExecutor (phase R), wired
for the MPP cascade. Real USDC settlement first, memo intent fallback.
Cascade ordering for MPP:
mppChargeTransferExecutor — real USDC (NEW)
mppChargeExecutor — memo intent fallback
Differences from x402SolanaTransferExecutor:
- Outgoing header is Authorization: Payment <base64> (MPP convention)
not X-Payment (x402 convention).
- PaymentPayload uses scheme=solana, no x402Version field — matches
what mppChargeExecutor sends so a downstream parser sees the same
shape across the cascade fallback.
- Network list includes 'devnet' / 'mainnet-beta' legacy aliases
(MPP WWW-Authenticate parsers sometimes yield these).
Everything else is identical: same ATA derivation, same
TransferChecked + CreateIdempotent instructions, same USDC mint
constants, same System-Program decline.
6 new canExecute unit tests added (40 total in new-executors.test.ts):
✓ true on solana-devnet + real recipient
✓ true on legacy 'devnet' / 'mainnet-beta' network aliases
✓ false for System-Program placeholder recipient
✓ false without a Solana wallet
✓ false for non-mpp protocols (x402 routes through its own transfer)
✓ false for non-Solana networks (base)
Cascade fallback verified e2e against MPP test endpoint:
rhemify pay http://localhost:3402/analytics → System-Program recipient
→ transfer declines → memo runs → sig 3Gff8xeLxA… real on devnet.
Closes the symmetric MPP gap; both standards (x402, mpp) now have
real-USDC-with-memo-fallback executors. Live USDC e2e proof still
requires user funding the payer ATA via faucet.circle.com.
… (phase E)
Mirror of x402SolanaTransferExecutor for EVM chains. Real ERC-20
transfer(to, amount) on Base / Base Sepolia / Ethereum / Sepolia, USDC
contract addresses hardcoded per Circle's canonical deployments.
Cascade ordering for EVM x402 (packages/sdk/src/execute/index.ts):
x402EvmTransferExecutor — real ERC-20 (NEW)
x402EvmExecutor — legacy peer-dep variant (unproven)
Same canExecute-declines-placeholder pattern as the Solana pair: the
test 402 server defaults RECIPIENT_ADDRESS to 0x...0001 which this
executor declines so the cascade falls through cleanly.
What it does end-to-end:
- createWalletClient(privateKeyToAccount(wallet.evmPrivateKey))
- publicClient.writeContract calling USDC.transfer(recipient, amount)
- waitForTransactionReceipt to confirm status === 'success'
- x402-spec PaymentPayload with kind=erc20-transfer + tx hash
- HTTP retry with X-Payment header (same shape as Solana side)
USDC contracts (Circle's canonical deployments):
base 0x833589fCD6eDb6E08f4c7C32D4f71b54bdA02913
base-sepolia 0x036CbD53842c5426634e7929541eC2318f3dCF7e
ethereum 0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48
ethereum-sepolia 0x1c7D4B196Cb0C7B01d743Fbc6116a902379C7238
USDC has 6 decimals on Solana AND EVM, so detection.priceRaw is reused
directly without re-scaling.
canExecute filters:
- protocol = x402
- EVM network (base / base-sepolia / ethereum / ethereum-sepolia)
- wallet has evmPrivateKey
- detection.payTo is a real 0x-prefixed 40-hex AND NOT the 0x...0001
placeholder AND NOT the zero address
9 new canExecute unit tests, 49 tests total in the file:
✓ true on base-sepolia + real recipient + EVM key
✓ true on base mainnet
✓ true on ethereum / ethereum-sepolia
✓ false for 0x...0001 placeholder (test server default)
✓ false for zero address
✓ false for malformed / non-hex / ENS-style names
✓ false without EVM key (only solana key, or empty wallet)
✓ false for non-x402 protocols (mpp)
✓ false for Solana networks
About Phantom: investigated whether 'just use Phantom' shortcuts EVM
execution. Phantom is browser-extension first; for our CLI/server SDK
we need programmatic signing. WalletConfig already has both
solanaPrivateKey and evmPrivateKey as first-class fields — so a user
who exports a Phantom private key (multi-chain) and drops it into the
config gets the same effect. Phantom doesn't shortcut the executor
work; the executor work IS what was missing.
Not in this chunk:
- CLI integration (no ~/.rhemify/evm-wallet.json yet) — user must
construct WalletConfig manually to activate EVM today.
- Live e2e against Base Sepolia — same gating as Phase R was for USDC
on Solana: requires user-funded testnet account. Faucet flow:
faucet.circle.com for USDC + a Base Sepolia ETH faucet for gas.
Closes the audit's 'EVM unproven' line item at the code-path level.
Live proof is the same opt-in shape as Phase R.7 was — user funds, we
run the command, we get a real tx hash.
…y in README (phase E.cli)
CLI integration for the EVM transfer path shipped in phase E. Three
parts:
packages/cli/src/config.ts
loadEvmWallet() reads ~/.rhemify/wallet-evm.json. Returns null when
the file doesn't exist — EVM is opt-in, not required for the demo.
Wallet shape: { privateKey: 0x-prefixed hex, address: 0x... } so the
SDK's WalletConfig.evmPrivateKey can be wired without re-deriving
address each call.
packages/cli/src/commands/pay.ts
Loads the EVM wallet when present and includes evmPrivateKey in the
SDK's WalletConfig. Spread-with-conditional keeps the field absent
when no EVM wallet exists (matters because x402EvmTransferExecutor's
canExecute checks for wallet.evmPrivateKey presence, not just
truthiness). Prints a one-line 'EVM wallet: 0x... (Base/Sepolia/
Ethereum capable)' confirmation when active.
packages/cli/src/commands/status.ts
New 'EVM Wallet' section with the funded-via instructions. Helps a
contributor see the live e2e path is wired without having to grep
config.ts.
README.md
Added explicit 'Signing model — ows only' section to the 'What is
NOT in v1' surface. This is the security-honesty move flagged in
the latest review: the demo uses Own Wallet Signing (agent holds
raw key) gated by the 6-rule client-side policy engine. That's
appropriate for bounded-budget testnet/production agents but NOT
for treasury-scale fleets. Squads / Ika 2PC-MPC / Privy passkey
instruments are registered in the path resolver as stubs precisely
because they're the production-grade signing paths — the audit
surface acknowledges that ows is the demo instrument, not the
production recommendation.
The temp EVM wallet at 0x0E250EF30E837d3b19F42029e62edc854A7011a1 was
generated via viem.generatePrivateKey() into ~/.rhemify/wallet-evm.json
with 0600 perms. Sending Base Sepolia ETH + USDC there activates the
live x402EvmTransferExecutor demo path.
Previously /weather hardcoded network=base-sepolia + payTo=0x...0001. For a payer with Ethereum Sepolia ETH (separate chain from Base Sepolia — same 'Sepolia' name but different testnets), we need to flip the network. Made both configurable: EVM_NETWORK=ethereum-sepolia bun run server.ts EVM_RECIPIENT=0xYourRealAddress bun run server.ts Defaults stay backward-compatible (base-sepolia + 0x...0001) so existing local invocations continue to work. The EVM_RECIPIENT shape is checked by x402EvmTransferExecutor.canExecute — anything that isn't a real 0x-prefixed 20-byte address (or is the 0x...0001 / 0x...0000 placeholder) declines and the cascade falls through.
…t flow (phase X)
Closes the spec divergence discovered when testing against x402.org's
production endpoint. Before this commit, x402SolanaTransferExecutor
always broadcast-then-handed-off — the canonical x402 flow is
sign-without-broadcasting and let the facilitator pay the gas + verify
+ broadcast atomically. Empirically:
Pre-fix attempt against x402.org:
sig 4fWkbh97H72B... — REAL 0.01 USDC moved to facilitator CKPKJWNd...,
but x402.org's resource didn't validate our X-Payment payload
(single signature string, not the signed-tx-bytes the facilitator
expects) so we 402'd. Funds lost.
Post-fix attempt against x402.org (same endpoint):
Transfer executor partial-signed with feePayer=facilitator, NEVER
broadcast — funds preserved. Cascade fell to memo (which also
rejects). 0 USDC lost. The new flow is FAIL-SAFE on rejection.
Concrete changes:
packages/sdk/src/types.ts
DetectionResult gains optional feePayer + asset fields. feePayer is
the spec's extra.feePayer (the facilitator pubkey that must pay gas
+ broadcast). asset is the canonical mint/contract address from the
402 response.
packages/sdk/src/detect/x402.ts
Extracts extra.feePayer and req.asset from the 402 response shape.
Existing CAIP normalization preserved.
packages/sdk/src/execute/x402-solana-transfer.ts
Two paths now:
- facilitator mode (detection.feePayer set):
tx.feePayer = facilitator pubkey
tx.partialSign(payer)
serialize({requireAllSignatures:false}) → base64
PaymentPayload with payload.transaction = base64-bytes
POST X-Payment, facilitator broadcasts on its own gas.
DOES NOT touch chain ourselves — funds only move if 200 returned.
- self mode (no facilitator):
unchanged from previous version — sign + broadcast + retry.
Plus uses detection.asset over hardcoded USDC mint, and echoes back
CAIP network identifier in PaymentPayload (toCaipNetwork helper)
since the facilitator's validator matches the original string from
the 402 response, not our normalized name.
What x402.org's persistent 402 means (not our bug): their 402 response
lists both Base Sepolia AND Solana Devnet acceptance, and their
facilitator pubkey CKPKJWNdJEqa... has been receiving prior x402-Solana
payments (0.491 → 0.501 USDC from our pre-fix attempt). But their
resource server doesn't return 200 for Solana, suggesting their Solana
facilitator backend is listed-but-not-yet-active. The new client flow
will work against any spec-compliant Solana facilitator that does
implement verification.
The safety-on-rejection property is the main improvement: a partial-
signed tx that x402.org refuses to broadcast is just abandoned — no
on-chain effect, no funds lost. Same outcome as a network error.
Interop status: ✓ wire-format spec-compliant, ✗ end-to-end against x402.org
(blocked on their facilitator). Testing against any other Solana
x402 endpoint with a working facilitator would close the loop.
…YMENT-SIGNATURE header
Wires three corrections that were preventing x402-svm facilitator-mediated
flows from settling. All three are required together; missing any single one
leaves the resource at a silent 402 with no diagnostic.
1. PAYMENT-SIGNATURE header (was X-Payment).
v2 x402 resources read the payment payload from `PAYMENT-SIGNATURE`;
`X-Payment` is the v1 header name and v2 servers ignore it (returning the
same 402+menu to every input, including no header at all — the symptom
the empirical retry against x402.org/protected was producing). Source:
@x402/core http/x402HTTPClient.ts:encodePaymentSignatureHeader switches
header name on x402Version. Self-broadcast mode (local test-402 server)
keeps X-Payment since the server is v1-shaped.
2. PaymentPayload shape `{ x402Version, accepted: PaymentRequirements,
payload: { transaction } }` (was flat scheme+network at top level).
v2 findMatchingRequirements matches `paymentPayload.accepted` against
`accepts[]`; flat scheme/network produces "No matching payment
requirements". `accepted.amount` MUST be a string ("10000"), not a
number — the wrong type also causes match failure.
3. v0 VersionedTransaction with feePayer = facilitator pubkey, partial-sign
payer only, base64 wire bytes in payload.transaction. Mirrors @x402/svm
exact/client/scheme.ts:111-182 byte-for-byte. Self-broadcast keeps the
legacy Transaction path.
Supporting hardcode removals discovered while debugging:
- USDC_DECIMALS = 6 → now reads `mintInfo.data[44]` (the SPL Token mint
layout's decimals byte). Canonical does `fetchMint(asset).data.decimals`;
same idea. Hardcoded 6 worked only because all our tests used USDC; any
non-6-decimal SPL mint would either get rejected at facilitator verify
or transfer off-by-10^N.
- Mint fallback to DEVNET_USDC_MINT in facilitator mode → removed. Throws
ExecutionError if 402.extra.asset is absent (canonical client behavior).
Self-broadcast keeps the USDC fallback since the test-402 server omits
asset for ergonomics.
- Math.random() nonce fallback → removed. crypto.getRandomValues only.
ATA-create-idempotent is now gated to self-broadcast mode only — in
facilitator mode the @x402/svm verify rejects any ix at position 0 that
isn't ComputeUnitLimit (`invalid_exact_svm_payload_transaction_
instructions_length`), so prepending an ATA-create breaks the ix ordering
the facilitator requires.
ComputeBudget ixs (setComputeUnitLimit=20000, setComputeUnitPrice=1µL) +
Memo ix (seller's extra.memo bytes if present, else 16-byte random hex
nonce) are added in facilitator mode at positions 0, 1, and 3 to match
the canonical client. Memo bytes pass through detection.memo (new field
surfaced from extra.memo for facilitators that need a server-pinned memo
for byte-for-byte verification).
E2E proof on Solana devnet against https://www.x402.org/protected:
HTTP 200 OK
payer: 8usJ1ShvoR3e74E6WMaNk2owwGUf87MuCuBJHdPgdEnQ
facilitator: CKPKJWNdJEqa81x7CkZ14BVPiY6y16Sxs7owznqtWYp5
settle sig: 2GWjFrZaANB5rM6hHzXtuxtXrLACNvP68kgQYLosPpwiMWi7UpSv1e9Zo285dQ5qfXND6xc28iDsWrp6rwhZqT4p
USDC delta: 0.59 → 0.58 (0.01 USDC moved by facilitator, not by us)
Trace: trc_08c7f6890d7e4677 (via tools/test-402/e2e-pay-test.ts Test 3)
Explorer: https://explorer.solana.com/tx/2GWjFrZaANB5rM6hHzXtuxtXrLACNvP68kgQYLosPpwiMWi7UpSv1e9Zo285dQ5qfXND6xc28iDsWrp6rwhZqT4p?cluster=devnet
Facilitator-broadcast is now the canonical x402 v2 client path on Solana.
…) + signer + drain race) Three bugs that combined to silently lose Layer-1 anchors on every CLI/script payment. Per docs/stack/02-convex.md, payment_traces.anchor_tx_hash must hold the Solana Memo tx signature for each trace; before this change, that field stayed null for any rhemify CLI invocation. 1. Rhemify client never exposed a drain method. The AnchorQueue runs flush() on a 2s background tick. Short-lived processes (rhemify CLI, scripts, one-shot jobs) exit before the tick fires, killing the in-flight Memo tx and Convex PATCH mid-await. Long-running services were fine. Added Rhemify.close() — awaits anchorQueue.drain() so the queue empties before the caller continues. CLI's `rhemify pay` now awaits it before exit (success + error paths). Long-running services can ignore it. 2. AnchorQueue.flush() race with the background timer. The old guard `if (this.processing) return` made drain() bail when the background timer was already mid-`processBatch`. Drain returned, CLI exited, the in-flight RPC + PATCH got torn down. Pending=0 (items were already spliced out) made it look like drain succeeded. Replaced with an `inflight: Promise<void>` that re-entrant callers join — drain awaits the existing work, then runs another pass if items remain. 3. Memo tx was built with `setTransactionMessageFeePayer(signer.address)`. @solana/kit's signTransactionMessageWithSigners needs the signer object, not just the address, to actually sign the fee-payer slot. The tx came back "missing signatures for addresses: <fee-payer>" and got rejected at send. Swapped to `setTransactionMessageFeePayerSigner(signer)` which registers both the address and the signer. Also: AnchorQueue now awaits transport.updateTraceAnchor instead of fire-and-forget — drain() can only honor its contract if the Convex patch lands before drain returns. Persistence failures are routed through onError without failing the batch (the Memo tx itself succeeded; re-attaching is recoverable). Verified end-to-end against https://www.x402.org/protected on Solana devnet: payment tx (x402 v2 facilitator settled 0.01 USDC): PhMsmnjJNaXeqcbhnoXPahtK9PNuEJ2Dohebrids7n8C6eVNMG2wHTMyhc9xX4s7kBz4vu34AzdC1s8R2U3v2no anchor tx (Layer-1 Memo, trace hash 20d3c132...): 4ESUQYmySjYarCDT3mFwdd9bzsWZ8mPRhnQCuVnsT2ijz8HYPPUYnE56YRSYwTtjyJVP8dysy68HPUdBbRYp5cmp rhemify traces show trc_4b71f5ceb193485b → both txs render with explorer links Convex payment_traces.anchor_tx_hash now populates for every `rhemify pay`.
…ses real fleet config
Two real bugs that combined to drop Layer-1 anchors in short-lived scripts
even after the previous close()/drain race fix landed:
1. transport.ingestPayment was fire-and-forget AND untracked by close().
The CLI's natural delay between pay()-return and close()-call masked it
(~3-5s of console.log + Memo tx build), but the e2e harness exits Test 3
instantly. close() drained the anchor queue, the Memo tx fired on-chain
successfully — and then updateTraceAnchor PATCH hit Convex BEFORE the
trace document existed there. Convex's traces:updateAnchor throws
"Trace not found" in that window; the queue's PATCH retry path swallows
it; anchor_tx_hash stays null forever.
Fix: client tracks every in-flight ingest promise. close() awaits them
all (Promise.allSettled) BEFORE draining the anchor queue, so the trace
document is always durable in Convex when the PATCH fires. Self-cleaning
on settle so long-running sessions don't accumulate references.
2. tools/test-402/e2e-pay-test.ts used hardcoded test-fleet-key /
fleet-e2e-test that never resolved in Convex's fleets table, so every
ingest + anchor PATCH 401'd silently via FleetAPIKeyAuth — the harness's
traces never landed in Convex at all. Also called process.exit(1) before
awaiting close(), killing the AnchorQueue's flush mid-Memo-tx.
Fix:
- Load fleetApiKey / fleetId / agentId from ~/.rhemify/config.json (the
same source the production CLI uses, written by `rhemify onboard`).
- Load Solana wallet from ~/.rhemify/wallet.json instead of repo-root
.test-wallet.json — single onboarded credential set.
- Fail loud with onboard guidance if either file is missing.
- Move `await rhemify.close()` inside main() before setting exitCode so
Node drains the event loop instead of exiting hot.
End-to-end verified on Solana devnet via `bun run tools/test-402/e2e-pay-test.ts`:
trace trc_9bc95564c3b54952
payment tx 39CwYkR8w6uBsj7aCvCvv2m3zbqbV6KLcoAdNfsFXkYn4GMbLmtreQh2qVWBc1SrNfaCAiecfE3zHTtkaR2YGBhn
anchor tx 2Pkc6En3ZADiz7oCiUQYokXNYMpTnime62TvhKTi1u2VcVYkY41UYYYK1dCHYrpXtSTAuW1KRsFPGUEyYgvHefef
rhemify traces show trc_9bc95564c3b54952 → both explorer links render
The harness now exercises the same auth path production traffic does and
produces durable Convex state per-run, not silently-skipped 401s.
…ace, fix dry-run cap Three small follow-ups from the chunk-4 audit: 1. packages/sdk/src/anchor/memo.ts — remove the file-wide `@ts-nocheck`. The pipe chain is now fully type-checked via `import type * as SolanaKit from "@solana/kit"`. Two narrowly-scoped `as never` casts remain at `sendAndConfirmTransactionFactory` and the signed-tx argument because `@solana/kit`'s cluster-brand (`'~cluster': "mainnet" | "devnet" | ...`) and lifetime-brand (`Blockhash | DurableNonce`) widen across runtime `string` rpcUrl values, and the overload picker can't choose without a compile-time literal. Each cast has a one-line justification inline. Net: ~95% of the file is type-safe at compile time and the only `@ts-expect-error -- optional peer dep` comments are gone (the package is in regular `dependencies`, so import types work directly). Also switched the single-ix append loop to `appendTransactionMessageInstructions` (plural) — the lib's variadic helper sidesteps per-iteration generic widening that was breaking type inference in the loop body. 2. packages/sdk/test/anchor.test.ts — regression test for commit 9b04d89's drain race fix. Asserts that `AnchorQueue.drain()` waits for an in-flight `processBatch` to complete before resolving, even when the queue is empty (items already spliced out). Holds `transport.updateTraceAnchor` open with a manual barrier so the race is deterministic in unit test time. Without the chunk-4 `inflight: Promise<void>` tracker, this test would fail because drain() would see `queue.length === 0` and return prematurely, silently losing the Memo tx's Convex PATCH. 3. tools/test-402/e2e-pay-test.ts — Test 1 fixture fix. The local test-402 server's /stock-data advertises $0.50 to exercise the budget cap path; the harness's `defaultMaxBudget` is $0.05. Pass `maxBudget: "$1.00"` per-call for the dry run so the pipeline can run end-to-end. Test 3's safety cap ($0.02) stays explicit and unchanged. Verified: - bun run check-types → all 4 packages pass - bun run build → SDK 116KB CJS / 113KB ESM - bun run test → 170 passed, 0 failed (anchor.test.ts now 8 tests) - bun run tools/test-402/e2e-pay-test.ts → 3 passed, 0 failed - rhemify traces show trc_b17556d6c0634a65 → both payment + anchor render
Pre-PR housekeeping. Three things a senior reviewer would reject on sight: 1. Compiled Go binaries in tree (`apps/server/bin/server` 18MB, `apps/server/seed` 8MB). Belong in CI artifacts, not source. Added `apps/server/bin/` and `apps/server/seed` to .gitignore + `git rm --cached`. 2. Local SQLite dev db (`apps/web/local.db`) tracked despite being listed in .gitignore. The earlier .gitignore entry only matched the repo-root `local.db`; the apps/web copy was caught by `git add` before the ignore reached it. Added `apps/web/local.db` explicit path + `git rm --cached`. 3. `apps/web/public/ascii-animation (1).mp4` — filename suggests a re-downloaded duplicate. Renamed to `ascii-animation.mp4`; Hero.tsx updated to reference the clean URL (drops the `%20(1)` encoding). No runtime change — only file management. Web build typecheck still passes.
…alone TUI
Audit-payment-rail surface now lives in the team's existing dashboard at
apps/web/src/routes/dashboard/ — same TanStack Start app, same dark theme,
same Convex `useQuery` data layer (matches Jun Shen's pattern in
use-agents / use-transactions). The parallel `apps/tui/` terminal dashboard
is removed: it was a separate surface we built, the team's product is
the React dashboard.
Added:
apps/web/src/lib/hooks/use-traces.ts
Two hooks — useTraces(filters) and useTraceByTraceId(id) — backed by
Convex traces:listAll and traces:getByTraceId, same queries the CLI's
`rhemify traces list/show` commands already render against.
apps/web/src/routes/dashboard/traces.tsx
Browse view. Mirrors rhemify CLI's `traces list` — sortable header,
blocked-only filter, limit selector, deep links to the detail route.
apps/web/src/routes/dashboard/traces.$traceId.tsx
Full decision-context view. Mirrors `rhemify traces show <id>` 7-section
render: TRACE / EVENT / POLICY / PATH / SNAPSHOT / VERIFIABILITY / NEXT.
Payment + anchor txs link to Solana explorer.
Wired:
Sidebar nav gets a "Traces" entry between Approvals and the Agents list.
TITLE_MAP gets "/dashboard/traces" → "Decision traces"; the route-id
matcher recognises traces.$traceId for "Trace detail" header.
routeTree.gen.ts regenerated by vite to include the two new routes.
Removed:
apps/tui/ (package.json, scripts/seed.ts, src/convex-client.ts,
src/index.tsx, tsconfig.json) — was an OpenTUI terminal dashboard
streaming Convex. Functional but a parallel surface to the team's React
dashboard. Decision traces now integrate into theirs.
README.md repo-tree dropped the apps/tui line.
Verified:
bun run check-types ← all 4 packages pass
cd apps/web && bunx tsc --noEmit ← only pre-existing sidebar
`/dashboard/agent/${id}` template-literal
warning remains (predates this branch).
bun run build ← vite SSR build success, 7s
routeTree.gen.ts ← DashboardTracesRoute + DashboardTracesTraceIdRoute
imports + paths registered
Browser test: pages render correctly client-side. SSR throws "fetch failed"
on EVERY dashboard route — pre-existing local-dev issue (root loader's
auth-session fetch hits a config gap when Convex is local-only). Not
introduced by this commit; identical behaviour against /dashboard,
/dashboard/policies, etc. on main. Production deploy with a real Convex
deployment URL renders cleanly.
Follow-up: replay-button + override-form on the detail page (today the CLI
handles counterfactuals; dashboard surfaces the command). Jun Shen's call.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.