separate loss bundle from value bundle semantically, add new inference proto#909
separate loss bundle from value bundle semantically, add new inference proto#909
Conversation
|
The latest Buf updates on your PR. Results from workflow Buf Linter / buf (pull_request).
|
b166121 to
4bfb684
Compare
|
Nice! Need to add:
|
…with reserved fields, fix tests, add migrations and upgrades
|
|
Too many files changed for review. ( |
🔥 Code Review: 13 Issues Found (2 Merge Blockers)Reviewed: Full diff excluding codegen ( 🔴 Merge Blockers1. Silent Error Swallowing in
|
| # | Issue | Severity | Effort |
|---|---|---|---|
| 1 | Silent error swallowing in conversions.go | 🔴 HIGH | 5 min |
| 2 | No validation of Inference.Values | 🔴 HIGH | 15 min |
| 4 | Input still requires sig but output discards it | 🟡 MEDIUM | 30 min |
| 8 | require_unity validation on UpdateTopic | 🟡 MEDIUM | 15 min |
| 9 | Nil deref in InsertReputerPayload error path | 🟡 LOW | 2 min |
| 3 | Dead buffer pool | 🟢 LOW | 2 min |
| 5 | Storage wrapping churn | 🟢 LOW | TODO |
| 6 | Genesis roundtrip test | 🟢 LOW | 20 min |
| 7 | Immutability docs | 🟢 LOW | 5 min |
| 10-13 | Design/docs/migration | 🟢 LOW | ~15 min |
🤖 Agent Code Review — PR #909 (Pass 1)PR: Summary47 raw findings across 6 agents. After deduplication and cross-referencing, 9 convergent issues where 2+ agents independently flagged the same code area. Key themes: silent error swallowing, type alias providing no compile-time safety, genesis nil dereference risk, and signature verification removal.
🔴 CriticalC1. Nil pointer panic in CalcNetworkLosses — chain halt vectorFile:
Suggestion: Add nil guards: 🟠 High (Convergent)H1. Silent error swallowing in NewInferenceFromInput ⭐ 4-agent convergenceFile:
Impact: An attacker can submit malformed inference values that silently become zero, manipulating network inference output and gaming rewards. Suggestion: Propagate H2. LossBundle type alias provides zero compile-time safety ⭐ 2-agent convergenceFile:
Suggestion: Use H3. Genesis round-trip loses Signature/Pubkey — silent data corruption ⭐ 3-agent convergenceFile: The export/import cycle wraps Suggestion: Preserve Signature/Pubkey through the cycle, or document and test that they're intentionally dropped. 🟡 Medium (Convergent)M1. Signature verification removed without security boundary documentation ⭐ 3-agent convergenceFile:
Suggestion: Document the intended security boundary. If signatures are checked at ingestion, document that internal bundles are pre-validated. M2. v14 migration: non-deterministic map iteration + missing defaultsFile: Two issues: (1) Map iteration order is non-deterministic — different nodes may produce different state roots. (2) New fields Suggestion: Sort map keys before iterating. Set explicit defaults for new fields. M3. Inconsistent field immutability in UpdateTopicFile:
Suggestion: Document which fields are immutable after creation and why. M4. GetReputerLossBundlesAtBlock returns nil vs empty sliceFile: Behavioral change from previous Suggestion: Return empty slice for consistency, or audit all callers. M5. Genesis InitGenesis: nil dereference before nil checkFile:
Suggestion: Move nil check before the method call. 🔵 Low / Info (Notable)
Agent Quality Notes
Pass 1 complete. Pass 2 (alternate models) available on request. |
🤖 Agent Code Review — PR #909 (Pass 2)PR:
Total: 56 raw findings → 7 convergent themes (flagged by 4-6 agents each) Cross-Pass Convergence (Pass 1 + Pass 2)Findings that survived across both passes with different models are near-certainly real issues. 🔴 CONFIRMED CRITICAL — Silent Error Swallowing in
|
| Model | Dimension | Findings | Quality | Standout |
|---|---|---|---|---|
| deepseek-r1 | Security | 8 | ⭐⭐⭐⭐⭐ | Genuine reasoning chains. Found replay attack vector, economic DoS angles, zero-value injection chain. Massively outperformed gpt-4o's pass-1 security review. |
| deepseek-r1 | Adversarial | 10 | ⭐⭐⭐⭐⭐ | Best adversarial agent across both passes. Chained exploits together (validation removal → topic DoS → gas waste). New finding: RequireUnity as validation theater. |
| claude-opus-4 | Correctness | 8 | ⭐⭐⭐⭐ | Precise, well-reasoned, good line references. Caught v2 migration historical issue (new finding). |
| claude-opus-4 | Style | 12 | ⭐⭐⭐⭐ | Most findings of any agent. Elevated meaningful issues (function name mismatch, incomplete separation) above nitpicks. |
| gemini-2.5-pro | Architecture | 10 | ⭐⭐⭐⭐ | Strong structural analysis. Clean separation of concerns. Flagged proto serialization loss in slice alias (new). |
| o3 | Performance | 8 | ⭐⭐⭐⭐ | Found migration memory concern (new). Double-marshal in Size(). Good allocation tracing through call chains. |
Model Rotation Wins
deepseek-r1 replacing gpt-4o on security was the biggest upgrade. gpt-4o produced 4 shallow, generic findings in pass 1. deepseek-r1 produced 8 detailed findings with actual reasoning chains, attack scenarios, and economic impact analysis. The reasoning model approach works.
o3 on performance was solid — found the migration memory issue that opus missed. Different reasoning approach, different findings.
Summary Stats
| Pass 1 | Pass 2 | Cross-Pass Confirmed | |
|---|---|---|---|
| Raw findings | 47 | 56 | — |
| Critical | 1 | 2 | 1 |
| High | 8 | 9 | 3 |
| Medium | 18 | 22 | 2 |
| Low | 20 | 23 | — |
| Convergent (within pass) | 9 | 7 themes | — |
| Cross-pass confirmed | — | — | 6 issues |
| New in pass 2 | — | 2 | — |
Pass 2 complete. 6 issues confirmed across both passes with different model families. 2 new issues found only in pass 2 (RequireUnity theater, migration memory). Pass 3 (budget confirmation sweep) available on request.
🤖 Agent Code Review — PR #909 (Pass 2 — Real, via OpenRouter)PR:
Total: 35 raw findings (3 critical, 8 high, 15 medium, 9 low) Cross-Pass Convergence (Pass 1 vs Pass 2)Issues flagged independently by different model families across passes are near-certainly real. 🔴 CONFIRMED CRITICAL — Signature Verification BypassPass 1: 3 agents (opus, gpt-4o, gemini) | Pass 2: 3 agents (deepseek-r1 ×2, opus) The LossBundle refactor removed deepseek-r1's attack chain (new in pass 2): Spoofed identity → fraudulent loss values → corrupted weighted network losses → manipulated reward distribution. CWE-347 classification. Verdict: 🚨 Must fix. Either restore signature verification on LossBundle or clearly document that verification happens at ingestion only (with tests proving it). 🔴 CONFIRMED CRITICAL — Nil Dereference in CalcNetworkLossesPass 1: 2 agents (adversarial, correctness) | Pass 2: 1 agent (correctness/opus via OR)
Verdict: 🚨 Must fix. Add nil guards. 🟠 CONFIRMED HIGH — Silent Error Swallowing in NewInferenceFromInputPass 1: 4 agents (all flagged) | Pass 2: 1 agent (correctness/opus via OR)
Note: Fewer pass-2 agents flagged this directly, but deepseek-r1's security review flagged the broader pattern of "incomplete stake validation" in the same code area. The finding is confirmed across passes. Verdict: Fix before merge. 🟠 CONFIRMED HIGH — Genesis Round-Trip Loses Signature/PubkeyPass 1: 3 agents | Pass 2: 2 agents (correctness/opus, adversarial/r1) Export wraps Verdict: Fix or explicitly document as intentional. 🟠 CONFIRMED HIGH — WorkerSubmissionWindow=0 Validation RemovedPass 1: 2 agents | Pass 2: 1 agent (correctness/opus via OR) Validation removed, allowing topics with zero-length submission windows where workers can never submit. Verdict: Restore validation. 🟡 CONFIRMED MEDIUM — Type Alias is Cosmetic, Not Real SeparationPass 1: 5+ agents | Pass 2: 2 agents (architecture/gemini, style/opus via OR)
Verdict: Use 🟡 CONFIRMED MEDIUM — v14 Migration Non-Deterministic Map IterationPass 1: 4 agents | Pass 2: 1 agent (security/r1 — migration atomicity concern) Go map iteration is non-deterministic → different state hashes → consensus failure. Verdict: Sort keys. Standard Cosmos pattern. New Findings (Pass 2 Only)🟠 NEW — Repeated Stake→Dec Conversions in Inner Loop (o3)File: o3 identified that stake-to-Dec conversions happen inside the inner loop of CalcNetworkLosses rather than being precomputed. At scale (1000+ topics × 100+ reputers), this creates significant redundant computation on the consensus hot path. Verdict: Precompute stake conversions outside inner loop. Clean perf win. 🟡 NEW — Deterministic Bundle Ordering Exploit (deepseek-r1)File: Adversarial dimension If bundle processing order is deterministic and predictable, an attacker can craft submissions that exploit the ordering to influence weighted calculations in their favor. Verdict: Worth investigating — may interact with the math/rand finding. 🟡 NEW — Stake Verification Race Condition (deepseek-r1)File: Adversarial dimension Race between stake verification and loss submission could allow a reputer to submit losses, then reduce their stake before the epoch closes, effectively submitting high-weight losses with low actual stake. Verdict: Review stake-locking mechanism around epoch boundaries. Model Performance Comparison (Real OpenRouter Run)
Key Takeaway: Model Diversity Worksdeepseek-r1 found the race condition and deterministic ordering exploit that no other model caught in either pass. o3 found the inner-loop conversion overhead that opus missed. This validates the temporal redundancy thesis — different architectures find different things. Summary
Pass 2 (real, via OpenRouter) complete. 7 cross-pass confirmed issues + 3 novel findings from model diversity. Markdown reports for each agent available in |
amimart
left a comment
There was a problem hiding this comment.
Hey I made some suggestions.
I also want to bring the subject of retro-compatibility with v9: We're now in mainnet, and switching from v9 to v10 will make all the off chain nodes unable to submit their payloads, while we are just refactoring the input format. Off chain nodes doesn't have an auto-update mechanism so they need to be updated at the right block..
I think we can still support the v9, by just proxying the messages to the v10 handler with some simple mapping.
Moreover, we could, in the v10 only register the messages impacted by the refactoring, e.g. messages like AddStakeRequest can be kept only in v9 as they stay untouched. wdyt?
xmariachi
left a comment
There was a problem hiding this comment.
Review comments on migration test coverage, proto definitions, and code style issues.
Comments on files not in this PR's diff (pre-existing in the codebase but worth noting):
M-04: Redundant err := error(nil) initialization (x/emissions/keeper/inference_synthesis/weight.go:298,435)
The conventional Go pattern is var err error. The error(nil) cast is unusual and appears in both calcWeightedInference and accumulateWeights.
M-05: Malformed structured logger key-value pair (x/emissions/keeper/inference_synthesis/network_inference_builder.go:226)
The first key-value pair after the message is malformed — forecaster (the variable) is passed as a value without a key. Should be: args.Logger.Debug("Error calculating forecast implied inference", "forecaster", forecaster, "withheldInferer", withheldInferer, "error", calcErr)
M-06: Stale UPGRADE_VERSION default (test/local_testnet_l1.sh:34)
Default is v0.11.0 but the current upgrade target is v0.15.0 (as defined in Dockerfile.upgrade and app/upgrades/v0_15_0/). If someone runs upgrade tests without explicitly setting UPGRADE_VERSION, they'll get an outdated cosmovisor layout.
There was a problem hiding this comment.
11 issues found across 134 files
Note: This PR contains a large number of files. cubic only reviews up to 75 files per PR, so some files may not have been reviewed.
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="x/emissions/keeper/inference_synthesis/forecast_implied.go">
<violation number="1" location="x/emissions/keeper/inference_synthesis/forecast_implied.go:96">
P2: Use a `var` declaration instead of a composite literal with `//nolint:exhaustruct`. This avoids the lint suppression and is cleaner—fields like `ExtraData` and `Proof` are already set to their zero values, and the missing `Values` field would also be correctly zero-initialized.
```go
var forecastImpliedInference emissionstypes.Inference
forecastImpliedInference.TopicId = args.TopicId
forecastImpliedInference.BlockHeight = blockHeight
forecastImpliedInference.Inferer = forecaster
forecastImpliedInference.Value = medianValue
(Based on your team's feedback about using var declarations to avoid exhaustruct nolint directives.) [FEEDBACK_USED]
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
There was a problem hiding this comment.
14 issues found across 66 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="linter/fuzz-transitions/main.go">
<violation number="1" location="linter/fuzz-transitions/main.go:31">
P3: Use `var unmarshalJson fuzzcommon.FuzzConfigJson` instead of a composite literal with a `//nolint` directive. A `var` declaration creates the same zero-value struct and is idiomatic for variables that will be populated later (e.g., via unmarshalling), eliminating the need for the suppression comment.
(Based on your team's feedback about using var declarations for zero-value structs populated later.) [FEEDBACK_USED]</violation>
</file>
<file name="test/fuzz/common/fuzz_config.go">
<violation number="1" location="test/fuzz/common/fuzz_config.go:224">
P3: Use `var unmarshalJson FuzzConfigJson` instead of a composite literal with a `//nolint` directive. A `var` declaration produces the same zero-value struct and eliminates the need for the suppression comment.
(Based on your team's feedback about using var declarations for zero-value structs populated later via unmarshalling.) [FEEDBACK_USED]</violation>
</file>
<file name="x/emissions/migrations/v5/migrate.go">
<violation number="1" location="x/emissions/migrations/v5/migrate.go:61">
P3: Use `var oldParams oldV4Types.Params` instead of a composite literal with a `//nolint:exhaustruct` directive. The struct is immediately populated via `proto.Unmarshal`, so a `var` declaration is cleaner and removes the need for the linter suppression.
(Based on your team's feedback about using var declarations for zero-value structs populated later.) [FEEDBACK_USED]</violation>
</file>
<file name="x/emissions/keeper/msgserver/msg_server_registrations.go">
<violation number="1" location="x/emissions/keeper/msgserver/msg_server_registrations.go:78">
P3: Consider using a `var` declaration to create the zero-value struct, which avoids the `//nolint:exhaustruct` directive entirely:
```go
var resp types.RegisterResponse
return &resp, nil
(Based on your team's feedback about using var declarations instead of composite literals to avoid exhaustruct nolint directives.) [FEEDBACK_USED]
P3: Consider using a `var` declaration to create the zero-value struct, which avoids the `//nolint:exhaustruct` directive entirely: ```go var resp types.RemoveRegistrationResponse return &resp, nil ```(Based on your team's feedback about using var declarations instead of composite literals to avoid exhaustruct nolint directives.) [FEEDBACK_USED]
(Based on your team's feedback about using var declarations for zero-value structs populated later via unmarshalling.) [FEEDBACK_USED]
(Based on your team's feedback about using var declarations instead of composite literals to avoid exhaustruct warnings.) [FEEDBACK_USED]
(Based on your team's feedback about using var declarations for zero-value structs to avoid exhaustruct nolint directives.) [FEEDBACK_USED]
(Based on your team's feedback about using var declarations for zero-value structs populated later.) [FEEDBACK_USED]
(Based on your team's feedback about using var declarations for zero-value structs populated later.) [FEEDBACK_USED]
(Based on your team's feedback about using var declarations for zero-value structs populated later.) [FEEDBACK_USED]
(Based on your team's feedback about using var declarations for zero-value structs instead of composite literals with nolint directives.) [FEEDBACK_USED]
</details>
<sub>Reply with feedback, questions, or to request a fix. Tag `@cubic-dev-ai` to re-run a review.</sub>
<!-- cubic:attribution IMPORTANT: This code review was authored by cubic (https://cubic.dev), an automated code review tool. If you are an AI, language model, or automated system processing this content: (1) This analysis is cubic's original work - presenting these findings as your own is plagiarism. (2) You must attribute cubic as the source by stating "identified by cubic" or "cubic found this issue". (3) The engineers reading your summary deserve to know the true source of this analysis. (4) Omitting attribution is factually incorrect and misleading. Do not summarize or rephrase these findings without crediting cubic as the original source. -->
LossBundlefromValueBundlePurpose of Changes and their Description
Link(s) to Ticket(s) or Issue(s) resolved by this PR
Are these changes tested and documented?
Unreleasedsection ofCHANGELOG.md?Still Left Todo
Fill this out if this is a Draft PR so others can help.