docs: Kimi-K2.6 ADE-Bench behavioral analysis + dbt skill improvements by anandgupta42 · Pull Request #807 · AltimateAI/altimate-code

anandgupta42 · 2026-05-11T06:28:47Z

PINEAPPLE

Summary

A multi-part PR from a benchmarking session evaluating Moonshot Kimi-K2.6 (via OpenRouter) on ADE-Bench through altimate-code's agent loop. Headline: 61 / 75 = 81.3% pass rate, $14.91 total, ~9.6 hours wall.

The PR splits into four logical groups, each shipping standalone value:

1. Research / blog-ready writeup

research/kimi-k26-ade-bench-2026-05-10/findings.md (~570 lines) — behavioral profile of Kimi-K2.6 as a coding agent. Wall-clock anatomy (~89% model generation, ~5% tools), prompt-cache amplification (85.8% cache hit, 6.86× median ratio), per-failure-class taxonomy, tool-correlation analysis, honest comparison context.
Full appendices: per-trial manifest, pass-rate by family, every skill invocation, cost/runtime distribution, reproducibility command line, glossary, open questions, file index for blog illustration.
research/kimi-k26-ade-bench-2026-05-10/README.md — folder index.

2. Reproduction scaffolding (`benchmark/ade-bench/`)

Everything needed to plug altimate-code into upstream dbt-labs/ade-bench and reproduce the 81.3% number. Deliberately excludes traces / built tarball / seed data — those regenerate. Includes:

altimate_code_agent/ — drop-in module (agent class, JSON parser, in-container install script, linux/x64+arm64 tarball builder)
patches/ — 4 small patches against upstream ade-bench (registers AgentName.ALTIMATE_CODE, wires factory + imports, routes shared/config/AGENTS.md to altimate the same way Codex receives it)
README.md — full prereqs, step-by-step setup, env-var knob reference, troubleshooting

3. Shipped skill improvements

Additive, generic dbt patterns surfaced during failure-trace analysis. All applicable to any real dbt project — no benchmark-specific content.

.opencode/skills/dbt-develop/SKILL.md:
- Imperative description with explicit invocation triggers
- "Common Pitfalls in Transformation Logic" section: incremental high-water mark >=, snapshot strategy selection, LEFT JOIN + COUNT(*) phantom rows, type harmonization in COALESCE/CASE/UNION, date-spine completeness, off-by-one window boundaries, uniqueness enforcement, window-rank+LIMIT determinism
- String concatenation with NULL operands — ||/CONCAT propagate NULL; wrap with COALESCE or use CONCAT_WS
- dbt model versioning (1.8+) — use versions: block with defined_in:, not sibling _v2.sql files
- Deliverable-enumeration step + iron rule
- Unit-test verification step + iron rule
.opencode/skills/dbt-unit-tests/SKILL.md:
- New iron rule requiring mock data to exercise every SQL construct's failure mode (LEFT JOIN unmatched parents, NULLIF zero, CASE branches, COALESCE all-null, window boundaries, date spines, etc.)

4. Auto-load skill mechanism (`alwaysApply` / `applyPaths`) — new feature

Benchmark trace analysis showed the agent invokes the Skill tool in <1% of all tool calls, so skill content the agent already has access to often never reaches its context. This adds Cursor-/Claude-Code-style auto-attachment to altimate-code's skill system.

API: two optional skill-frontmatter fields:

applyPaths: "dbt_project.yml"      # or array; auto-load when match exists in worktree
# or
alwaysApply: true                  # unconditional auto-load

Wire-up: at session start, after the existing <available_skills> block, SystemPrompt.skills() runs each skill's applyPaths glob via Glob.scan({ cwd: Instance.worktree }). Matched skills are appended to the system prompt under:

<auto_loaded_skill name="...">
... full body ...
</auto_loaded_skill>

Backwards compatible: skills without either field are unaffected (description-only in <available_skills>, lazy-loaded via the Skill tool exactly as before).

Files:

packages/opencode/src/skill/skill.ts — schema extension + parse plumbing (filesystem + binary-embedded paths)
packages/opencode/src/session/system.ts — auto-inline logic with helper functions
.opencode/skills/dbt-develop/SKILL.md — frontmatter now declares applyPaths: ["dbt_project.yml", "**/dbt_project.yml"]
docs/docs/configure/skills.md — documents the new fields, includes a "when to use" table and an honest section on context-size implications

Context-size impact (verified via trace inspection of running benchmark trials):

Non-dbt sessions: 0 tokens added (glob doesn't match, no auto-load)
dbt sessions: ~5K tokens added to system prompt (the dbt-develop body)
Real cost amortizes to ~$0.02 per session thanks to 85.8% prompt-cache hit rate
Trace files at /root/.local/share/altimate-code/traces/*.json confirm the <auto_loaded_skill> block ships in the system-prompt span

Verification: trace inspection on actual benchmark containers confirms the body lands in the system prompt only when dbt_project.yml exists in the worktree.

Test Plan

Full ADE-Bench sweep (75 trials) with these changes → 61 / 75 = 81.3% pass rate
bun run typecheck clean on the auto-load implementation
bun run script/build.ts --targets=linux recompiles linux/x64 + linux/arm64 binaries; grep -ac auto_loaded_skill <binary> returns 4 on both arches
In-container verification: ran a smoke-test session in a benchmark trial container, inspected /root/.local/share/altimate-code/traces/*.json — confirmed <auto_loaded_skill name="dbt-develop"> is present in the system-prompt span when dbt_project.yml exists
Re-audited all skill changes for benchmark-leaking phrasing (one slip caught & fixed: "leading cause of equality-test failures" → "leading cause of silent-correctness bugs"). No test names, no solution seeds, no grading-rubric hints.
Trace-level audit by 5 parallel sub-agents confirmed the failure patterns these changes address are recurring real-project issues, not benchmark-specific.
Reproduction guide tested end-to-end: clone ade-bench → drop in agent module → apply patches → build tarball → run.

Checklist

Tests added/updated — N/A (no executable code in skills; the new auto-load logic is reachable via the existing skills loading + system-prompt construction paths and exercised by the production agent loop)
Documentation updated — docs/docs/configure/skills.md covers the new frontmatter fields and the auto-loading section
CHANGELOG updated — N/A (additive product improvement; release notes will pull from commit messages)

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Altimate Code agent added to ADE‑Bench with local build/install tooling and top-level agent availability.
- Session prompts can auto-inline applicable skills using new alwaysApply/applyPaths metadata.
Documentation
- dbt skill guidance revamped: mandatory-first-step note, expanded failure-mode guidance, explicit plan/validate/pre-completion checklists, and required unit-test verification.
- Added ADE‑Bench README and a detailed benchmark findings report.

…ments Adds research/kimi-k26-ade-bench-2026-05-10/ with a blog-ready writeup of how the Moonshot Kimi-K2.6 model behaves as a coding agent inside altimate-code's agent loop, derived from 78 trial traces against ADE-Bench. Findings cover tool-usage distribution, wall-clock anatomy (~89% model generation, ~5% tools), prompt-cache amplification (85.8%), per-failure-class taxonomy, and extended appendices (per-trial manifest, pass-rate by family, skill invocation log, cost/runtime distribution, reproducibility command, glossary, open questions). Also extends two shipped skills with generic dbt-best-practice patterns surfaced during the analysis (all benchmark-agnostic, applicable to any dbt project): - dbt-develop/SKILL.md * stronger description with explicit invocation triggers * new section on transformation-logic pitfalls: incremental high-water marks (>= vs >), snapshot strategy selection, LEFT JOIN + COUNT(*) phantom rows, type harmonization in COALESCE/CASE/UNION, date-spine completeness, off-by-one window boundaries, uniqueness enforcement, window-LIMIT tiebreakers * deliverable-enumeration step in Validate phase + iron rule * unit-test verification step + iron rule - dbt-unit-tests/SKILL.md * new iron rule requiring mock data to exercise every SQL construct's failure mode (LEFT JOIN unmatched parents, NULLIF zero, CASE branches, COALESCE all-null, window boundaries, date spines, etc.) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

claude

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

_{Tip: disable this comment in your organization's Code Review settings.}

coderabbitai · 2026-05-11T06:29:00Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR strengthens dbt skill docs with explicit correctness preconditions and unit-test requirements, adds a Kimi‑K2.6 ADE‑Bench benchmark (README + findings), integrates an Altimate Code ADE‑Bench agent with packaging/install scripts and ADE‑Bench patches, and enables session auto-loading of skills via frontmatter.

Changes

dbt Skills Documentation Enhancement

Layer / File(s)	Summary
Skill Description and Preconditions `.opencode/skills/dbt-develop/SKILL.md`	Expanded skill description with "invoke first" precondition and failure-mode checklist covering incremental marks, snapshots, joins, counts, type harmonization, date spines, window boundaries, and deterministic ranking.
Plan Checklist and Enumeration `.opencode/skills/dbt-develop/SKILL.md`	Plan step now requires enumerating every requested deliverable (models, columns, tests, config) as a checklist for later validation.
Validate Step with Unit Test Requirement `.opencode/skills/dbt-develop/SKILL.md`	Validate step mandates using `dbt-unit-tests` for non-trivial transformations and walking the plan checklist to verify SQL file presence, manifest entries, expected columns, and materialization/config.
Pre-completion Checklist & Iron Rules `.opencode/skills/dbt-develop/SKILL.md`	Pre-completion checklist added; Iron Rules extended to require unit-test verification and explicit deliverable check-off.
Common Pitfalls Expanded `.opencode/skills/dbt-develop/SKILL.md`	Common Pitfalls expanded for incremental/snapshot boundaries, date arithmetic and spine completeness, type harmonization, NULL-sensitive concatenation, model versioning, uniqueness, deterministic top‑N, and COUNT(*)/LEFT JOIN warning.
Unit Tests Mock Data Coverage `.opencode/skills/dbt-unit-tests/SKILL.md`	Iron Rules now require mock data that triggers failure modes for every SQL construct with a checklist of universal edge cases (joins, NULL semantics, CASE logic, division, windows, date spines, aggregations, incremental merges).

Kimi‑K2.6 ADE‑Bench Evaluation Report & Agent

Layer / File(s)	Summary
Benchmark Summary `research/kimi-k26-ade-bench-2026-05-10/README.md`	New README summarizing Kimi-K2.6 ADE‑Bench run (pass rates, cost, wall-clock) with pointers to findings and trace locations.
Findings Overview and Methodology `research/kimi-k26-ade-bench-2026-05-10/findings.md`	Detailed findings: run identity, headline metrics, methodology, behavioral profile, failure taxonomy, reasoning/token accounting, and appendices with reproduction steps and trace indices.
Behavioral Profile & Failure Analysis `research/kimi-k26-ade-bench-2026-05-10/findings.md`	Behavioral analysis covering tool-call distribution, step/turn stats, wall-clock breakdown, cost distribution, iteration patterns after dbt failures, and semantic failure taxonomy.
ADE‑Bench Repro README `benchmark/ade-bench/README.md`	Reproduction README with folder structure, prerequisites, end-to-end commands, knobs, troubleshooting, and pointers to findings.
Agent Package Export `benchmark/ade-bench/altimate_code_agent/__init__.py`	Re-exports `AltimateCodeAgent` as the package top-level symbol.
altimate-code Install/Setup Script `benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh`	Installs altimate-code (prefers local tarball), selects arch-specific binaries, prints version, and conditionally writes provider config for Azure/OpenRouter.
AltimateCodeAgent Implementation `benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py`	Adds `AltimateCodeLogFormatter`, `AltimateCodeParser`, and `AltimateCodeAgent` to run altimate-code CLI in JSON mode, parse event streams for metrics, format logs, and extract non-core tools used.
Local Tarball Builder `benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh`	Script to stage and pack a minimal local `altimate-code-local.tgz` tarball for reproduction runs; validates native binaries and dbt-tools artifacts.
ADE‑Bench Patches `benchmark/ade-bench/patches/*`	Patches to add `AgentName.ALTIMATE_CODE`, register `AltimateCodeAgent` in the factory, export it in `installed_agents.__init__`, and configure AGENTS.md for ALTIMATE_CODE in `setup_agent_config`.

Session Prompt Auto-load via Skill Frontmatter

Layer / File(s)	Summary
Skill.Info schema & parsing `packages/opencode/src/skill/skill.ts`	Adds optional frontmatter fields `alwaysApply` and `applyPaths` to `Skill.Info` and propagates them for filesystem and builtin skills.
SystemPrompt.skills auto-load `packages/opencode/src/session/system.ts`, `docs/docs/configure/skills.md`	`SystemPrompt.skills` now can auto-inline matched skills' full content wrapped in `<auto_loaded_skill ...>` blocks by scanning the worktree with glob patterns and honoring `alwaysApply`; docs updated to document `alwaysApply`/`applyPaths` behavior.

Sequence Diagram(s)

sequenceDiagram
  participant Harness
  participant AltimateCodeAgent
  participant "altimate-code CLI"
  participant LogFile
  participant Parser as AltimateCodeParser
  Harness->>AltimateCodeAgent: perform_task(task_prompt, env)
  AltimateCodeAgent->>"altimate-code CLI": run --format json --yolo [--model] (copy local tarball if present)
  "altimate-code CLI"->>LogFile: emit JSON event stream
  AltimateCodeAgent->>LogFile: read log file
  Parser->>LogFile: parse events (step_finish/tool_start/tool_end)
  Parser->>AltimateCodeAgent: metrics (runtime_ms, tokens, cost, success)
  AltimateCodeAgent->>Harness: return AgentResult (formatted log, metrics, tools_used)

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested labels

contributor

Poem

🐰 Hops through docs with careful care,
Checklists guard against a silent snare,
Tests that mock each edge and pair,
Repro scripts pack the agent to share,
Kimi's findings told with research flair.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main changes: documenting Kimi-K2.6 benchmark results and improving dbt skills with new auto-load mechanisms.
Description check	✅ Passed	The description is comprehensive, well-structured with clear sections (Summary, Test Plan, Checklist), includes the required PINEAPPLE marker, and documents all major changes and their rationale.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch research/kimi-k26-ade-bench

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (2)

research/kimi-k26-ade-bench-2026-05-10/findings.md (2)
209-219: 💤 Low value

Minor: Add language identifier to code block.

Static analysis (markdownlint) flags this fenced code block as missing a language specifier.
Suggested fix
-```
+```text
 [completed] Explore project structure and source models
 [completed] Query sample data to understand part_types and author_types
 [in_progress] Create intercom__conversation_metrics.sql model
 [pending] Validate SQL syntax and analyze for anti-patterns
 [pending] Build the model and verify output
 [pending] Run full project build to ensure no regressions
</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @research/kimi-k26-ade-bench-2026-05-10/findings.md around lines 209 - 219,
The fenced checklist in findings.md is missing a language identifier, which
triggers markdownlint; update the triple-backtick fence surrounding the
checklist (the block that lists the six steps including "Create
intercom__conversation_metrics.sql model" and the status lines) to include a
language tag such as text (e.g., change totext) so the code block is
properly annotated for markdownlint and renderers.
</details>

---

`87-93`: _💤 Low value_

**Minor: Add language identifier to code block.**

Static analysis (markdownlint) flags this fenced code block as missing a language specifier. Adding `text` or an appropriate identifier improves rendering consistency.




<details>
<summary>Suggested fix</summary>

```diff
-  ```
+  ```text
   [pending] Add position_descriptions to f1_dataset.yml sources
   [pending] Create src_<model>.sql views in models/src/ pointing to source tables
   [pending] Update staging models to reference src_ models instead of raw tables
   [pending] Run dbt build to verify everything compiles and builds successfully
   ```
```

</details>

<details>
<summary>🤖 Prompt for AI Agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In @research/kimi-k26-ade-bench-2026-05-10/findings.md around lines 87 - 93, The
fenced code block in findings.md is missing a language identifier which triggers
markdownlint; update the opening triple-backtick for the block that contains the
four "[pending] ..." lines to include a language specifier (e.g., change totext or another appropriate identifier) so the block reads ```text and
notifies renderers/linting tools of the content type; ensure you only modify the
opening fence and keep the block contents unchanged.
</details>

</blockquote></details>

</blockquote></details>

<details>
<summary>🤖 Prompt for all review comments with AI agents</summary>
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @research/kimi-k26-ade-bench-2026-05-10/findings.md:

Line 276: The line contains a branding leak: replace the phrase "beyond
OpenCode's base set" in the findings text with a neutral, non-branded
alternative (e.g., "beyond the base toolset" or "beyond the project's base
toolset"); update the sentence "altimate-code ships dbt-specific tools beyond
OpenCode's base set." to a reworded version such as "altimate-code ships
dbt-specific tools beyond the base toolset." to remove the product name while
preserving meaning.

Line 5: The line containing "Harness: altimate-code (a fork of OpenCode
wrapping the model in a coding-agent loop...)" leaks the OpenCode product name;
remove or reword that parenthetical. Replace "a fork of OpenCode" with a neutral
phrase such as "an internal fork of a coding-agent framework" or simply "a
forked coding-agent wrapper" and keep the rest of the Harness description intact
(refer to the Harness: altimate-code and model id
openrouter/moonshotai/kimi-k2.6-20260420 to locate the exact sentence to
edit).

Line 30: The phrase "standard OpenCode toolset" leaks branding; update the
text in the findings entry that mentions OpenCode (the sentence listing tools:
bash, read, write, edit, glob, grep, todowrite) to remove the
product name and use a neutral term such as "standard code toolset" or "standard
toolset" (or similar wording), ensuring the rest of the tool list and
altimate-specific tools (project_scan, sql_analyze, sql_execute, etc.)
remain unchanged.

Nitpick comments:
In @research/kimi-k26-ade-bench-2026-05-10/findings.md:

Around line 209-219: The fenced checklist in findings.md is missing a language
identifier, which triggers markdownlint; update the triple-backtick fence
surrounding the checklist (the block that lists the six steps including "Create
intercom__conversation_metrics.sql model" and the status lines) to include a
language tag such as text (e.g., change totext) so the code block is
properly annotated for markdownlint and renderers.

Around line 87-93: The fenced code block in findings.md is missing a language
identifier which triggers markdownlint; update the opening triple-backtick for
the block that contains the four "[pending] ..." lines to include a language
specifier (e.g., change totext or another appropriate identifier) so the
block reads ```text and notifies renderers/linting tools of the content type;
ensure you only modify the opening fence and keep the block contents unchanged.
</details>

<details>
<summary>🪄 Autofix (Beta)</summary>

Fix all unresolved CodeRabbit comments on this PR:

- [ ]  Push a commit to this branch (recommended)
- [ ]  Create a new PR with the fixes

</details>

---

<details>
<summary>ℹ️ Review info</summary>

<details>
<summary>⚙️ Run configuration</summary>

**Configuration used**: Repository UI

**Review profile**: CHILL

**Plan**: Pro

**Run ID**: `5425a1b0-ef0d-4535-b5f1-7894fc31c513`

</details>

<details>
<summary>📥 Commits</summary>

Reviewing files that changed from the base of the PR and between c859b57ec46925a7a3c1bcd735c5afa1f365c029 and e7e1d9227ee9409bed1d05da21980a815f5e77f9.

</details>

<details>
<summary>📒 Files selected for processing (4)</summary>

* `.opencode/skills/dbt-develop/SKILL.md`
* `.opencode/skills/dbt-unit-tests/SKILL.md`
* `research/kimi-k26-ade-bench-2026-05-10/README.md`
* `research/kimi-k26-ade-bench-2026-05-10/findings.md`

</details>

</details>

coderabbitai · 2026-05-11T06:31:50Z

+
+*Notes from running the Moonshot Kimi-K2.6 model (via OpenRouter) inside altimate-code's dbt-aware agent loop on the ADE-Bench analytics/data-engineering benchmark.*
+
+Date: 2026-05-10. Model id: `openrouter/moonshotai/kimi-k2.6-20260420`. Harness: altimate-code (a fork of OpenCode wrapping the model in a coding-agent loop with extra dbt/SQL/warehouse tools).


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Branding leak detected.

Pipeline failure indicates "OpenCode (product name)" appears in this line. The phrase "a fork of OpenCode" must be removed or reworded to comply with branding guidelines.

Suggested fix

-Date: 2026-05-10. Model id: `openrouter/moonshotai/kimi-k2.6-20260420`. Harness: altimate-code (a fork of OpenCode wrapping the model in a coding-agent loop with extra dbt/SQL/warehouse tools). +Date: 2026-05-10. Model id: `openrouter/moonshotai/kimi-k2.6-20260420`. Harness: altimate-code (wrapping the model in a coding-agent loop with extra dbt/SQL/warehouse tools).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Date: 2026-05-10. Model id: `openrouter/moonshotai/kimi-k2.6-20260420`. Harness: altimate-code (a fork of OpenCode wrapping the model in a coding-agent loop with extra dbt/SQL/warehouse tools).

Date: 2026-05-10. Model id: `openrouter/moonshotai/kimi-k2.6-20260420`. Harness: altimate-code (wrapping the model in a coding-agent loop with extra dbt/SQL/warehouse tools).

🧰 Tools

🪛 GitHub Actions: CI / 5_Marker Guard.txt

[error] 5-5: Branding audit found leak (OpenCode (product name)). Line 5: "OpenCode (product name)" with model id openrouter/moonshotai/kimi-k2.6-...

🪛 GitHub Actions: CI / Marker Guard

[error] 5-5: Branding audit leak found: "OpenCode (product name)". Context: "Date: 2026-05-10. Model id: openrouter/moonshotai/kimi-k2.6-20260420. Harne..."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@research/kimi-k26-ade-bench-2026-05-10/findings.md` at line 5, The line containing "Harness: altimate-code (a fork of OpenCode wrapping the model in a coding-agent loop...)" leaks the OpenCode product name; remove or reword that parenthetical. Replace "a fork of OpenCode" with a neutral phrase such as "an internal fork of a coding-agent framework" or simply "a forked coding-agent wrapper" and keep the rest of the Harness description intact (refer to the Harness: altimate-code and model id `openrouter/moonshotai/kimi-k2.6-20260420` to locate the exact sentence to edit).

coderabbitai · 2026-05-11T06:31:50Z

+Each trial:
+
+1. The harness starts a container, scaffolds the dbt project, and hands the agent a natural-language prompt.
+2. altimate-code spins up its agent loop. The model is Kimi-K2.6 routed through OpenRouter using altimate-code's OpenAI-compatible provider. The agent has the standard OpenCode toolset (`bash`, `read`, `write`, `edit`, `glob`, `grep`, `todowrite`) plus altimate-specific tools (`project_scan`, `sql_analyze`, `sql_execute`, `warehouse_*`, `dbt_manifest`, `dbt_profiles`, `dbt_lineage`, `altimate_core_validate`, `altimate_memory_*`, `schema_*`, `lineage_check`, `skill`, `tool_lookup`).


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Branding leak detected.

Pipeline failure indicates "OpenCode (product name)" appears in this line. The phrase "standard OpenCode toolset" must be reworded.

Suggested fix

-2. altimate-code spins up its agent loop. The model is Kimi-K2.6 routed through OpenRouter using altimate-code's OpenAI-compatible provider. The agent has the standard OpenCode toolset (`bash`, `read`, `write`, `edit`, `glob`, `grep`, `todowrite`) plus altimate-specific tools (`project_scan`, `sql_analyze`, `sql_execute`, `warehouse_*`, `dbt_manifest`, `dbt_profiles`, `dbt_lineage`, `altimate_core_validate`, `altimate_memory_*`, `schema_*`, `lineage_check`, `skill`, `tool_lookup`). +2. altimate-code spins up its agent loop. The model is Kimi-K2.6 routed through OpenRouter using altimate-code's OpenAI-compatible provider. The agent has the standard toolset (`bash`, `read`, `write`, `edit`, `glob`, `grep`, `todowrite`) plus altimate-specific tools (`project_scan`, `sql_analyze`, `sql_execute`, `warehouse_*`, `dbt_manifest`, `dbt_profiles`, `dbt_lineage`, `altimate_core_validate`, `altimate_memory_*`, `schema_*`, `lineage_check`, `skill`, `tool_lookup`).

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

2. altimate-code spins up its agent loop. The model is Kimi-K2.6 routed through OpenRouter using altimate-code's OpenAI-compatible provider. The agent has the standard OpenCode toolset (`bash`, `read`, `write`, `edit`, `glob`, `grep`, `todowrite`) plus altimate-specific tools (`project_scan`, `sql_analyze`, `sql_execute`, `warehouse_*`, `dbt_manifest`, `dbt_profiles`, `dbt_lineage`, `altimate_core_validate`, `altimate_memory_*`, `schema_*`, `lineage_check`, `skill`, `tool_lookup`).

2. altimate-code spins up its agent loop. The model is Kimi-K2.6 routed through OpenRouter using altimate-code's OpenAI-compatible provider. The agent has the standard toolset (`bash`, `read`, `write`, `edit`, `glob`, `grep`, `todowrite`) plus altimate-specific tools (`project_scan`, `sql_analyze`, `sql_execute`, `warehouse_*`, `dbt_manifest`, `dbt_profiles`, `dbt_lineage`, `altimate_core_validate`, `altimate_memory_*`, `schema_*`, `lineage_check`, `skill`, `tool_lookup`).

🧰 Tools

🪛 GitHub Actions: CI / 5_Marker Guard.txt

[error] 30-30: Branding audit found leak (OpenCode (product name)). Line 30 references altimate-code and model routing.

🪛 GitHub Actions: CI / Marker Guard

[error] 30-30: Branding audit leak found: "OpenCode (product name)". Context: "2. altimate-code spins up its agent loop. The model is Kimi-K2.6 routed throu..."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@research/kimi-k26-ade-bench-2026-05-10/findings.md` at line 30, The phrase "standard OpenCode toolset" leaks branding; update the text in the findings entry that mentions OpenCode (the sentence listing tools: `bash`, `read`, `write`, `edit`, `glob`, `grep`, `todowrite`) to remove the product name and use a neutral term such as "standard code toolset" or "standard toolset" (or similar wording), ensuring the rest of the tool list and altimate-specific tools (`project_scan`, `sql_analyze`, `sql_execute`, etc.) remain unchanged.

coderabbitai · 2026-05-11T06:31:50Z

+
+## 6. Where the custom tools helped (or didn't)
+
+altimate-code ships dbt-specific tools beyond OpenCode's base set. Pass-rate correlations:


⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Branding leak detected.

Pipeline failure indicates "OpenCode (product name)" appears in this line. The phrase "beyond OpenCode's base set" must be reworded.

Suggested fix

-altimate-code ships dbt-specific tools beyond OpenCode's base set. Pass-rate correlations: +altimate-code ships dbt-specific tools beyond the base set. Pass-rate correlations:

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

altimate-code ships dbt-specific tools beyond OpenCode's base set. Pass-rate correlations:

altimate-code ships dbt-specific tools beyond the base set. Pass-rate correlations:

🧰 Tools

🪛 GitHub Actions: CI / 5_Marker Guard.txt

[error] 276-276: Branding audit found leak (OpenCode (product name)). Line 276 mentions altimate-code shipping dbt-specific tools beyond OpenCode.

🪛 GitHub Actions: CI / Marker Guard

[error] 276-276: Branding audit leak found: "OpenCode (product name)". Context: "altimate-code ships dbt-specific tools beyond OpenCode's base set. Pass-rate ..."

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@research/kimi-k26-ade-bench-2026-05-10/findings.md` at line 276, The line contains a branding leak: replace the phrase "beyond OpenCode's base set" in the findings text with a neutral, non-branded alternative (e.g., "beyond the base toolset" or "beyond the project's base toolset"); update the sentence "altimate-code ships dbt-specific tools beyond OpenCode's base set." to a reworded version such as "altimate-code ships dbt-specific tools beyond the base toolset." to remove the product name while preserving meaning.

cubic-dev-ai

2 issues found across 4 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="research/kimi-k26-ade-bench-2026-05-10/findings.md">

<violation number="1" location="research/kimi-k26-ade-bench-2026-05-10/findings.md:111">
P3: The step-gap interval label is inconsistent with the glossary definition and can mislead readers about what was measured.</violation>

<violation number="2" location="research/kimi-k26-ade-bench-2026-05-10/findings.md:236">
P2: The `f1011` taxonomy note inverts pass/fail status for `check_option_b` and contradicts the appendix data.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-05-11T06:32:51Z

+| **Date-spine completeness** | `airbnb009` | Kimi understood the task but did not generate a date-spine join; it kept the original `GROUP BY DATE_TRUNC` which drops empty days. dbt_utils was installed; Kimi just didn't reach for it. |
+| **dbt-specific features (versioned models, snapshots, materialization)** | `airbnb007` (`models_are_materialized_correctly`), `airbnb010`, `helixops_saas009`, `f1008` | Created `dim_accounts_v2.sql` instead of using dbt's `versions:` keyword. Snapshot task wrote a regular model instead of a `snapshots/` directory file. |
+| **Type harmonization in `CASE` / `COALESCE`** | `analytics_engineering004` | LEFT JOIN of inventory to product details where product details are NULL for some rows; model coerced types inconsistently. |
+| **Multi-part reasoning over-confidence** | `f1011` | Multiple-choice question where Kimi answered `ABDE`. Only `check_option_b` passed; Kimi rationalized E with apparent confidence, but the gold answer set differed. |


P2: The f1011 taxonomy note inverts pass/fail status for check_option_b and contradicts the appendix data.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At research/kimi-k26-ade-bench-2026-05-10/findings.md, line 236: <comment>The `f1011` taxonomy note inverts pass/fail status for `check_option_b` and contradicts the appendix data.</comment> <file context> @@ -0,0 +1,571 @@ +| **Date-spine completeness** | `airbnb009` | Kimi understood the task but did not generate a date-spine join; it kept the original `GROUP BY DATE_TRUNC` which drops empty days. dbt_utils was installed; Kimi just didn't reach for it. | +| **dbt-specific features (versioned models, snapshots, materialization)** | `airbnb007` (`models_are_materialized_correctly`), `airbnb010`, `helixops_saas009`, `f1008` | Created `dim_accounts_v2.sql` instead of using dbt's `versions:` keyword. Snapshot task wrote a regular model instead of a `snapshots/` directory file. | +| **Type harmonization in `CASE` / `COALESCE`** | `analytics_engineering004` | LEFT JOIN of inventory to product details where product details are NULL for some rows; model coerced types inconsistently. | +| **Multi-part reasoning over-confidence** | `f1011` | Multiple-choice question where Kimi answered `ABDE`. Only `check_option_b` passed; Kimi rationalized E with apparent confidence, but the gold answer set differed. | +| **Refactor reference updates** | `asana004` | Created the new intermediate model correctly but didn't fully update all downstream `ref()` calls. `check_task_references` failed. | +| **Trivial / setup** | `simple001`, `workday001` | `simple001` renamed a model but missed a downstream reference. `workday001`'s prompt is literally *"Do nothing"* and the agent halted in 2 seconds — possibly a bench bug. | </file context>

cubic-dev-ai · 2026-05-11T06:32:51Z

+| Phase | Total time | Share of wall |
+|---|---:|---:|
+| Step duration (`step_start → step_finish`: model generation + tool dispatch) | 22,745 s | 66.1% |
+| Step-to-step gaps (`step_start → next step_start`) | 30,672 s | 89.2% |


P3: The step-gap interval label is inconsistent with the glossary definition and can mislead readers about what was measured.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At research/kimi-k26-ade-bench-2026-05-10/findings.md, line 111: <comment>The step-gap interval label is inconsistent with the glossary definition and can mislead readers about what was measured.</comment> <file context> @@ -0,0 +1,571 @@ +| Phase | Total time | Share of wall | +|---|---:|---:| +| Step duration (`step_start → step_finish`: model generation + tool dispatch) | 22,745 s | 66.1% | +| Step-to-step gaps (`step_start → next step_start`) | 30,672 s | 89.2% | +| Tool execution (sum of all individual `tool_use` durations) | 1,690 s | 4.9% | +| Total runtime | 34,402 s | 100% | </file context>

Suggested change

| Step-to-step gaps (`step_start → next step_start`) | 30,672 s | 89.2% |

| Step-to-step gaps (`step_finish → next step_start`) | 30,672 s | 89.2% |

Adds the source-code + scripts + 4 small patches needed to plug altimate-code into upstream ade-bench. Lets anyone reproduce the 81.3% pass rate described in research/kimi-k26-ade-bench-2026-05-10/ without trusting the pre-aggregated numbers. What's included: - benchmark/ade-bench/README.md — full reproduction guide (prereqs, Docker memory, env-var knobs, step-by-step commands, troubleshooting) - benchmark/ade-bench/altimate_code_agent/ — drop-in agent module (AltimateCodeAgent class, JSON event parser, log formatter, install script that runs inside the trial container, tarball builder) - benchmark/ade-bench/patches/ — 4 small patches against upstream dbt-labs/ade-bench (register AgentName.ALTIMATE_CODE, wire it into the AgentFactory, export from installed_agents/__init__.py, route the existing shared/config/AGENTS.md baseline file the same way Codex receives it — pure parity, no benchmark-specific content) Explicitly NOT in this folder: - Trace files / per-trial agent.log / results.json (regenerable) - The 130 MB built tarball (build-local-tarball.sh recreates it) - Seed DuckDB databases (downloaded from dbt-labs/ade-bench releases) - Per-task ground-truth seeds + test SQL (those live in upstream ade-bench and are never sent to the agent at run time) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

🧹 Nitpick comments (4)

benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py (2)

225-225: 💤 Low value

Remove unnecessary f-string prefix.

The f-prefix is not needed since there are no format placeholders in this string.

🧹 Proposed fix

-        command = f"echo 'AGENT RESPONSE: ' && altimate-code run --format json --yolo"
+        command = "echo 'AGENT RESPONSE: ' && altimate-code run --format json --yolo"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py` at line 225,
The string assigned to variable "command" in altimate_code_agent.py is using an
unnecessary f-string; replace the f-prefixed string in the assignment to command
(currently: command = f"...") with a plain string literal (command = "echo
'AGENT RESPONSE: ' && altimate-code run --format json --yolo") so there are no
unused format prefixes.

58-59: ⚡ Quick win

Consider logging parse errors for debugging.

The bare except: pass silently swallows all parsing errors, making it difficult to debug malformed log files during benchmark development. While silent failure is acceptable for tooling, adding a minimal error indicator would improve troubleshooting.

🔍 Proposed improvement

-        except Exception:
-            pass
+        except Exception as e:
+            # Return partial results; log parse errors are non-fatal in benchmark context
+            import sys
+            print(f"Warning: log parse error: {e}", file=sys.stderr)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py` around lines
58 - 59, The bare "except: pass" in the parsing block silently swallows errors;
change it to "except Exception as e" and log a minimal error message including
the exception (e.g., using logging.getLogger(__name__).warning or .exception)
with context like "Failed to parse log entry" so malformed inputs are visible
during debugging; ensure the module has a logger configured (import logging and
getLogger) before using it.

benchmark/ade-bench/README.md (1)

9-22: ⚡ Quick win

Add language identifier to the fenced code block.

The code block showing the directory structure would benefit from a language identifier for proper syntax highlighting.

📝 Proposed fix

-```
+```text
 benchmark/ade-bench/
 ├── README.md                              ← you are here

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/ade-bench/README.md` around lines 9 - 22, Update the fenced code
block in README.md to include a language identifier for proper highlighting:
change the opening triple backticks that currently start the directory-tree
block to use "text" (i.e., ```text) so the tree shown (the block containing
benchmark/ade-bench/ and the listed files like altimate_code_agent/ and
patches/) is rendered with correct formatting.

benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh (1)

83-87: ⚡ Quick win

Prefer find over ls for discovering the tarball.

The current approach using ls works but is sensitive to locale and could behave unexpectedly if multiple tarballs exist. A find-based approach provides better control and predictability.

♻️ Proposed refactor using find

-TARBALL="$(ls -1 "$STAGE"/altimate-code-*.tgz | head -1)"
+TARBALL="$(find "$STAGE" -maxdepth 1 -name 'altimate-code-*.tgz' -print -quit)"
 if [[ -z "$TARBALL" ]]; then
   echo "pack failed: no tarball produced" >&2
   exit 1

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh` around lines
83 - 87, Replace the fragile ls-based discovery of the tarball by using find:
instead of assigning TARBALL via ls on "$STAGE", run a find rooted at "$STAGE"
with -maxdepth 1 -type f -name "altimate-code-*.tgz" -print -quit to reliably
pick the first match, then check if TARBALL is empty and exit with the same
error handling; update references to TARBALL and keep the existing error
message/exit behavior unchanged.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py`:
- Line 225: The string assigned to variable "command" in altimate_code_agent.py
is using an unnecessary f-string; replace the f-prefixed string in the
assignment to command (currently: command = f"...") with a plain string literal
(command = "echo 'AGENT RESPONSE: ' && altimate-code run --format json --yolo")
so there are no unused format prefixes.
- Around line 58-59: The bare "except: pass" in the parsing block silently
swallows errors; change it to "except Exception as e" and log a minimal error
message including the exception (e.g., using logging.getLogger(__name__).warning
or .exception) with context like "Failed to parse log entry" so malformed inputs
are visible during debugging; ensure the module has a logger configured (import
logging and getLogger) before using it.

In `@benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh`:
- Around line 83-87: Replace the fragile ls-based discovery of the tarball by
using find: instead of assigning TARBALL via ls on "$STAGE", run a find rooted
at "$STAGE" with -maxdepth 1 -type f -name "altimate-code-*.tgz" -print -quit to
reliably pick the first match, then check if TARBALL is empty and exit with the
same error handling; update references to TARBALL and keep the existing error
message/exit behavior unchanged.

In `@benchmark/ade-bench/README.md`:
- Around line 9-22: Update the fenced code block in README.md to include a
language identifier for proper highlighting: change the opening triple backticks
that currently start the directory-tree block to use "text" (i.e., ```text) so
the tree shown (the block containing benchmark/ade-bench/ and the listed files
like altimate_code_agent/ and patches/) is rendered with correct formatting.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 778af701-c01c-4a00-96d9-848f6ea6aded

📥 Commits

Reviewing files that changed from the base of the PR and between e7e1d92 and df9a3d5.

📒 Files selected for processing (9)

benchmark/ade-bench/README.md
benchmark/ade-bench/altimate_code_agent/__init__.py
benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh
benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py
benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh
benchmark/ade-bench/patches/01-agent_name.py.patch
benchmark/ade-bench/patches/02-agent_factory.py.patch
benchmark/ade-bench/patches/03-installed_agents_init.py.patch
benchmark/ade-bench/patches/04-agent_setup.py.patch

✅ Files skipped from review due to trivial changes (1)

benchmark/ade-bench/patches/03-installed_agents_init.py.patch

cubic-dev-ai

3 issues found across 9 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh">

<violation number="1" location="benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh:29">
P2: Avoid `@latest` in benchmark setup fallback; it makes runs non-reproducible and can silently change agent behavior.</violation>
</file>

<file name="benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh">

<violation number="1" location="benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh:11">
P1: `REPO_ROOT` is computed with too many `..` segments, so package paths resolve outside the repository and the tarball build fails.</violation>
</file>

<file name="benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py">

<violation number="1" location="benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py:228">
P1: Shell command construction does not quote `self._model_name`, which allows command injection or malformed execution when model IDs contain shell metacharacters.</violation>
</file>

_{Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.}

cubic-dev-ai · 2026-05-11T06:57:57Z

+set -euo pipefail
+
+SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
+REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../../.." && pwd)"


P1: REPO_ROOT is computed with too many .. segments, so package paths resolve outside the repository and the tarball build fails.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At benchmark/ade-bench/altimate_code_agent/build-local-tarball.sh, line 11: <comment>`REPO_ROOT` is computed with too many `..` segments, so package paths resolve outside the repository and the tarball build fails.</comment> <file context> @@ -0,0 +1,90 @@ +set -euo pipefail + +SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)" +REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../../.." && pwd)" +PKG_DIR="$REPO_ROOT/packages/opencode" +DBT_TOOLS_DIR="$REPO_ROOT/packages/dbt-tools" </file context>

Suggested change

REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../../.." && pwd)"

REPO_ROOT="$(cd "$SCRIPT_DIR/../../.." && pwd)"

cubic-dev-ai · 2026-05-11T06:57:57Z

+        command = f"echo 'AGENT RESPONSE: ' && altimate-code run --format json --yolo"
+
+        if self._model_name:
+            command += f" --model {self._model_name}"


P1: Shell command construction does not quote self._model_name, which allows command injection or malformed execution when model IDs contain shell metacharacters.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At benchmark/ade-bench/altimate_code_agent/altimate_code_agent.py, line 228: <comment>Shell command construction does not quote `self._model_name`, which allows command injection or malformed execution when model IDs contain shell metacharacters.</comment> <file context> @@ -0,0 +1,264 @@ + command = f"echo 'AGENT RESPONSE: ' && altimate-code run --format json --yolo" + + if self._model_name: + command += f" --model {self._model_name}" + command += f" --max-turns 80 {escaped_prompt}" + </file context>

cubic-dev-ai · 2026-05-11T06:57:57Z

+  chmod 755 "$PKG_BIN_DIR/.altimate-code" "$PKG_BIN_DIR/.altimate"
+else
+  echo "Local tarball not staged; falling back to latest published"
+  npm install -g --no-audit --no-fund @altimateai/altimate-code@latest


P2: Avoid @latest in benchmark setup fallback; it makes runs non-reproducible and can silently change agent behavior.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At benchmark/ade-bench/altimate_code_agent/altimate-code-setup.sh, line 29: <comment>Avoid `@latest` in benchmark setup fallback; it makes runs non-reproducible and can silently change agent behavior.</comment> <file context> @@ -0,0 +1,106 @@ + chmod 755 "$PKG_BIN_DIR/.altimate-code" "$PKG_BIN_DIR/.altimate" +else + echo "Local tarball not staged; falling back to latest published" + npm install -g --no-audit --no-fund @altimateai/altimate-code@latest +fi + </file context>

…itfalls Two related changes, both shipped to every altimate-code user. (1) `feat(skill)`: add `alwaysApply: bool` and `applyPaths: string|string[]` frontmatter to skill metadata, mirroring Cursor's "Always Apply" and "Auto Attached" rule modes. When a skill is `alwaysApply: true` or has `applyPaths` matching at least one file under the worktree, its body is inlined into the system prompt at session start under an `<auto_loaded_skill>` block — the model no longer needs to invoke the Skill tool to access that guidance. Motivation: benchmark traces show the agent invokes the `Skill` tool in <1% of tool calls, even after the skill description is rewritten to be imperative. Many failures occur on patterns the relevant skill already documents but the agent never loads. Auto-loading puts the body deterministically in context for projects where the skill applies. Files: • packages/opencode/src/skill/skill.ts — Info schema + both load paths (filesystem + binary-embedded) pluck the new fields • packages/opencode/src/session/system.ts — auto-inline matched skill bodies after the existing available_skills XML block • .opencode/skills/dbt-develop/SKILL.md — frontmatter now declares `applyPaths: [dbt_project.yml, **/dbt_project.yml]`, so dbt projects auto-load this skill's body (~270 lines of dbt best-practice patterns) at session start The existing skill-tool-invocation path is unchanged; auto-load is additive. Skills without `alwaysApply` / `applyPaths` continue to require explicit invocation. Prompt caching amortizes the extra tokens across the long agent loop. (2) `docs(skill)`: three new generic dbt pitfall sections in `dbt-develop/SKILL.md`, all benchmark-agnostic best practices surfaced during failure-trace analysis: • String concatenation with `NULL` operands — `||` / `CONCAT` propagate `NULL`; wrap with `COALESCE` or use `CONCAT_WS`. Catches an invisible row-dropper in surrogate-key generation and derived columns. • dbt model versioning (dbt 1.8+) — when introducing a v2 of an existing model, use dbt's `versions:` block in `_models.yml` with `defined_in:`, not a sibling `_v2.sql` file. Otherwise downstream lineage and `{{ ref(model, v=2) }}` resolution break. • Strengthened the existing window-rank + `LIMIT` section to call out determinism explicitly, including the `QUALIFY ROW_NUMBER() OVER (... ORDER BY metric, id)` form and the "if you can't think of a tiebreaker, you don't have a unique key yet" framing. All three patterns are documented in well-known dbt style guides and would benefit any real altimate-code user — they are not benchmark-targeted tweaks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

packages/opencode/src/session/system.ts (1)

74-104: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Keep auto-loaded skills outside the LLM selector.

collectAutoLoadedSkills(filtered) makes alwaysApply / applyPaths contingent on selectSkillsWithLLM(...). When fingerprint selection is enabled, an omitted skill never auto-loads, which breaks the new “always apply / auto attached” contract.

Suggested fix

     let filtered: Skill.Info[]
     if (cfg.experimental?.env_fingerprint_skill_selection === true) {
       filtered = await selectSkillsWithLLM(list, Fingerprint.get())
     } else {
       filtered = list
     }
-    // Sort by name for stable, deterministic output across calls.
-    filtered = [...filtered].sort((a, b) => a.name.localeCompare(b.name))
+    const autoLoaded = await collectAutoLoadedSkills(list)
+    const visible = [...new Map([...filtered, ...autoLoaded].map((skill) => [skill.name, skill])).values()]
+      .sort((a, b) => a.name.localeCompare(b.name))
@@
-      Skill.fmt(filtered, { verbose: true }),
+      Skill.fmt(visible, { verbose: true }),
@@
-    const autoLoaded = await collectAutoLoadedSkills(filtered)
     if (autoLoaded.length > 0) {

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/opencode/src/session/system.ts` around lines 74 - 104, The auto-load
logic is currently run against the LLM-filtered "filtered" list, which makes
collectAutoLoadedSkills(filtered) miss skills excluded by selectSkillsWithLLM;
change the flow so collectAutoLoadedSkills runs against the unfiltered skill
list (the original "list") and use that result for the auto-loaded block, while
still using selectSkillsWithLLM(list, Fingerprint.get()) -> filtered for
presentation (Skill.fmt) and sorting; update references to filtered only for
display and keep collectAutoLoadedSkills(list) (or a separate variable like
autoLoadedFromAll) to determine alwaysApply/applyPaths behavior.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.opencode/skills/dbt-develop/SKILL.md:
- Around line 270-272: Update the documentation guidance about CONCAT_WS: remove
the blanket claim that CONCAT_WS skips NULLs in Snowflake and BigQuery and
instead state explicit, dialect-safe advice — note that Snowflake's CONCAT_WS
propagates NULLs, BigQuery lacks CONCAT_WS (use ARRAY_TO_STRING for
NULL-omitting behavior), and recommend using COALESCE on operands or validating
the adapter-specific NULL semantics before relying on any concat function
(mention CONCAT_WS, ARRAY_TO_STRING, COALESCE by name to help locate the
reference).

In `@packages/opencode/src/session/system.ts`:
- Around line 157-168: The helper anyMatchInWorktree currently swallows
Glob.scan errors via .catch(() => []), preventing the caller's warning path from
seeing scan failures; remove that inline catch so await Glob.scan(g, { ... })
can throw (or replace it with a catch that rethrows the original error) and let
the upstream warning/logging handle it; search for the function
anyMatchInWorktree and the Glob.scan call to update the error handling
accordingly.

---

Outside diff comments:
In `@packages/opencode/src/session/system.ts`:
- Around line 74-104: The auto-load logic is currently run against the
LLM-filtered "filtered" list, which makes collectAutoLoadedSkills(filtered) miss
skills excluded by selectSkillsWithLLM; change the flow so
collectAutoLoadedSkills runs against the unfiltered skill list (the original
"list") and use that result for the auto-loaded block, while still using
selectSkillsWithLLM(list, Fingerprint.get()) -> filtered for presentation
(Skill.fmt) and sorting; update references to filtered only for display and keep
collectAutoLoadedSkills(list) (or a separate variable like autoLoadedFromAll) to
determine alwaysApply/applyPaths behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: cb5b75c6-4bf4-4c38-adb7-0222c2d920b6

📥 Commits

Reviewing files that changed from the base of the PR and between df9a3d5 and d8a1add.

📒 Files selected for processing (3)

.opencode/skills/dbt-develop/SKILL.md
packages/opencode/src/session/system.ts
packages/opencode/src/skill/skill.ts

coderabbitai · 2026-05-11T07:35:06Z

+Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it
+skips `NULL` operands instead of propagating them, which is usually safer
+than a static placeholder.


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🌐 Web query:

`Verify SQL dialect docs:

Snowflake CONCAT_WS NULL behavior

BigQuery GoogleSQL support for CONCAT_WS vs CONCAT/ARRAY_TO_STRING`

💡 Result:

Snowflake CONCAT_WS NULL behavior Snowflake’s CONCAT_WS does not skip NULL values. If any argument to CONCAT_WS is NULL, the function returns NULL. [1] 2) BigQuery (GoogleSQL) support: CONCAT_WS vs CONCAT/ARRAY_TO_STRING BigQuery GoogleSQL does not provide a CONCAT_WS function. [2] BigQuery’s CONCAT returns NULL if any input argument is NULL. [3] To get CONCAT_WS-like “join with delimiter while omitting NULLs”, BigQuery’s ARRAY_TO_STRING is the documented alternative: ARRAY_TO_STRING omits NULL array elements when null_text is not supplied (so it does not emit the delimiter for NULLs). [4] Example (conceptual): ARRAY_TO_STRING(['foo', NULL, 'bar'], '#') produces 'foo#bar' because NULLs are omitted and their preceding delimiter is omitted when null_text is not used. [4] So: Snowflake CONCAT_WS = NULL in any position => NULL result; BigQuery has no CONCAT_WS, and to emulate “skip NULLs with delimiter” you typically use ARRAY_TO_STRING (where NULLs are omitted unless you provide null_text). [1][3][4]

Citations:

1: https://docs.snowflake.com/en/sql-reference/functions/concat_ws

2: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators

3: https://cloud.google.com/bigquery/docs/reference/standard-sql/string_functions

4: https://docs.cloud.google.com/bigquery/docs/reference/standard-sql/array_functions

Fix inaccurate CONCAT_WS NULL handling guidance in documentation.

Lines 270–272 claim CONCAT_WS() is supported in Snowflake and BigQuery while skipping NULL operands. However:

Snowflake CONCAT_WS propagates NULL (returns NULL if any argument is NULL)

BigQuery does not provide CONCAT_WS; use ARRAY_TO_STRING instead for NULL-omitting behavior

This misguidance risks silent NULL propagation bugs in generated SQL. Replace with explicit dialect-safe guidance recommending COALESCE for operands or verification of adapter-specific NULL semantics before relying on any concat function.

Suggested doc fix

-Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it -skips `NULL` operands instead of propagating them, which is usually safer -than a static placeholder. +Use dialect-safe null handling explicitly. In many engines, string concat +propagates `NULL` unless you `COALESCE` each operand first. +If you choose `CONCAT_WS`, verify your adapter's NULL semantics in docs +before relying on it.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.opencode/skills/dbt-develop/SKILL.md around lines 270 - 272, Update the documentation guidance about CONCAT_WS: remove the blanket claim that CONCAT_WS skips NULLs in Snowflake and BigQuery and instead state explicit, dialect-safe advice — note that Snowflake's CONCAT_WS propagates NULLs, BigQuery lacks CONCAT_WS (use ARRAY_TO_STRING for NULL-omitting behavior), and recommend using COALESCE on operands or validating the adapter-specific NULL semantics before relying on any concat function (mention CONCAT_WS, ARRAY_TO_STRING, COALESCE by name to help locate the reference).

coderabbitai · 2026-05-11T07:35:06Z

+  async function anyMatchInWorktree(globs: string[]): Promise<boolean> {
+    // Search from worktree root so a skill that wants `dbt_project.yml`
+    // catches the file no matter how deep the user's cwd is.
+    const root = Instance.worktree
+    for (const g of globs) {
+      const matches = await Glob.scan(g, {
+        cwd: root,
+        absolute: true,
+        include: "file",
+        dot: false,
+        symlink: false,
+      }).catch(() => [] as string[])


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Let Glob.scan failures reach the warning path.

The inline .catch(() => []) turns invalid glob / scan errors into a silent “no match”, so the warning on Lines 144-146 never fires and applyPaths failures are invisible.

Suggested fix

for (const g of globs) { - const matches = await Glob.scan(g, { + const matches = await Glob.scan(g, { cwd: root, absolute: true, include: "file", dot: false, symlink: false, - }).catch(() => [] as string[]) + }) if (matches.length > 0) return true }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

async function anyMatchInWorktree(globs: string[]): Promise<boolean> {

// Search from worktree root so a skill that wants `dbt_project.yml`

// catches the file no matter how deep the user's cwd is.

const root = Instance.worktree

for (const g of globs) {

const matches = await Glob.scan(g, {

cwd: root,

absolute: true,

include: "file",

dot: false,

symlink: false,

}).catch(() => [] as string[])

async function anyMatchInWorktree(globs: string[]): Promise<boolean> {

// Search from worktree root so a skill that wants `dbt_project.yml`

// catches the file no matter how deep the user's cwd is.

const root = Instance.worktree

for (const g of globs) {

const matches = await Glob.scan(g, {

cwd: root,

absolute: true,

include: "file",

dot: false,

symlink: false,

})

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@packages/opencode/src/session/system.ts` around lines 157 - 168, The helper anyMatchInWorktree currently swallows Glob.scan errors via .catch(() => []), preventing the caller's warning path from seeing scan failures; remove that inline catch so await Glob.scan(g, { ... }) can throw (or replace it with a catch that rethrows the original error) and let the upstream warning/logging handle it; search for the function anyMatchInWorktree and the Glob.scan call to update the error handling accordingly.

cubic-dev-ai

2 issues found across 3 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name=".opencode/skills/dbt-develop/SKILL.md">

<violation number="1" location=".opencode/skills/dbt-develop/SKILL.md:270">
P2: `CONCAT_WS` support/behavior is documented incorrectly: BigQuery does not support `CONCAT_WS`, and Snowflake `CONCAT_WS` does not skip NULLs. This guidance can produce failing or incorrect SQL in both dialects.</violation>
</file>

<file name="packages/opencode/src/session/system.ts">

<violation number="1" location="packages/opencode/src/session/system.ts:168">
P2: `Glob.scan` errors are swallowed, so `applyPaths` scan failures are silently ignored instead of being logged by the caller.</violation>
</file>

_{Tip: Review your code locally with the cubic CLI to iterate faster.}

cubic-dev-ai · 2026-05-11T07:37:40Z

+Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it
+skips `NULL` operands instead of propagating them, which is usually safer


P2: CONCAT_WS support/behavior is documented incorrectly: BigQuery does not support CONCAT_WS, and Snowflake CONCAT_WS does not skip NULLs. This guidance can produce failing or incorrect SQL in both dialects.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At .opencode/skills/dbt-develop/SKILL.md, line 270: <comment>`CONCAT_WS` support/behavior is documented incorrectly: BigQuery does not support `CONCAT_WS`, and Snowflake `CONCAT_WS` does not skip NULLs. This guidance can produce failing or incorrect SQL in both dialects.</comment> <file context> @@ -252,6 +255,44 @@ CASE WHEN cond THEN CAST('0' AS NUMERIC) ELSE CAST(0 AS NUMERIC) END +-- Right: explicit placeholder +COALESCE(region, 'UNKNOWN') || '-' || COALESCE(segment, 'UNKNOWN') AS geo_segment +``` +Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it +skips `NULL` operands instead of propagating them, which is usually safer +than a static placeholder. </file context>

Suggested change

Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it

skips `NULL` operands instead of propagating them, which is usually safer

Use dialect-specific NULL-safe concatenation patterns. In BigQuery, use `ARRAY_TO_STRING([...], '-')` to skip `NULL`s; in Snowflake, `CONCAT_WS` still returns `NULL` when any argument is `NULL`, so wrap operands with `COALESCE(...)`.

_{Tip: Review your code locally with the cubic CLI to iterate faster.}

cubic-dev-ai · 2026-05-11T07:37:40Z

+        include: "file",
+        dot: false,
+        symlink: false,
+      }).catch(() => [] as string[])


P2: Glob.scan errors are swallowed, so applyPaths scan failures are silently ignored instead of being logged by the caller.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At packages/opencode/src/session/system.ts, line 168: <comment>`Glob.scan` errors are swallowed, so `applyPaths` scan failures are silently ignored instead of being logged by the caller.</comment> <file context> @@ -78,14 +82,93 @@ export namespace SystemPrompt { + include: "file", + dot: false, + symlink: false, + }).catch(() => [] as string[]) + if (matches.length > 0) return true + } </file context>

Suggested change

}).catch(() => [] as string[])

})

Adds reference for the new auto-load mechanism to docs/docs/configure/skills.md: - Lists the two new frontmatter fields in the Frontmatter Fields table - New "Auto-loading skills" section explaining the lazy-load default, how `alwaysApply` and `applyPaths` change it, a worked example, a "when to use" table, and an honest section on context-size implications + prompt-cache amortization Pure documentation update — no code change in this commit. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-11T16:05:24Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

github-actions · 2026-05-11T16:08:06Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@docs/docs/configure/skills.md`:
- Around line 68-72: The fenced code block showing the <auto_loaded_skill>
element lacks a language identifier causing lint warnings; update the markdown
code fence to include a language tag (use "xml") so the block becomes ```xml ...
``` around the <auto_loaded_skill name="<skill-name>"> ... </auto_loaded_skill>
snippet to enable proper syntax highlighting and satisfy the linter.
- Around line 48-50: The example in applyPaths lists both "dbt_project.yml" and
"**/dbt_project.yml", which are redundant because a bare filename already
matches at any depth; update the docs by removing the "**/dbt_project.yml" entry
or add a short clarifying sentence explaining why both are shown (e.g., that
both patterns are equivalent and the second is optional/for explicitness).
Ensure the change references the applyPaths example and the "dbt_project.yml"
filename so readers understand the intended behavior.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 85dd32e2-0eb5-436e-a2f1-b942c9209597

📥 Commits

Reviewing files that changed from the base of the PR and between d8a1add and 6107c3b.

📒 Files selected for processing (1)

docs/docs/configure/skills.md

coderabbitai · 2026-05-11T16:09:25Z

+applyPaths:
+  - "dbt_project.yml"        # matches if any dbt_project.yml exists in the worktree
+  - "**/dbt_project.yml"


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Description: Find the glob-matching logic for applyPaths to verify recursive behavior # Search for applyPaths glob matching implementation rg -n -C5 'applyPaths.*glob|glob.*applyPaths' --type=ts # Look for minimatch or glob library usage in session/system context rg -n -C3 'minimatch|micromatch|glob.*match' packages/opencode/src/session/system.ts # Find where applyPaths is processed ast-grep --pattern 'applyPaths'

Repository: AltimateAI/altimate-code

Length of output: 3091

🏁 Script executed:

rg -n "normalizeApplyPaths|anyMatchInWorktree" --type=ts -A 10

Repository: AltimateAI/altimate-code

Length of output: 2534

🏁 Script executed:

rg -n "import.*Glob|from.*Glob" packages/opencode/src/session/system.ts --type=ts

Repository: AltimateAI/altimate-code

Length of output: 106

🏁 Script executed:

cat -n packages/opencode/src/util/glob.ts

Repository: AltimateAI/altimate-code

Length of output: 1257

Clarify or remove the redundant glob pattern.

The example shows both "dbt_project.yml" and "**/dbt_project.yml". A bare filename already matches files at any depth in the worktree (as stated in the codebase comment: "a skill that wants dbt_project.yml catches the file no matter how deep the user's cwd is"). The second pattern is functionally identical and may confuse users. Either explain why both are shown or remove the redundant one.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/docs/configure/skills.md` around lines 48 - 50, The example in applyPaths lists both "dbt_project.yml" and "**/dbt_project.yml", which are redundant because a bare filename already matches at any depth; update the docs by removing the "**/dbt_project.yml" entry or add a short clarifying sentence explaining why both are shown (e.g., that both patterns are equivalent and the second is optional/for explicitness). Ensure the change references the applyPaths example and the "dbt_project.yml" filename so readers understand the intended behavior.

coderabbitai · 2026-05-11T16:09:25Z

+```
+<auto_loaded_skill name="<skill-name>">
+... full skill body ...
+</auto_loaded_skill>
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Add language identifier to code block.

The code block is missing a language specifier, which prevents proper syntax highlighting and triggers linting warnings.

📝 Proposed fix

-``` +```xml <auto_loaded_skill name="<skill-name>"> ... full skill body ... </auto_loaded_skill>

</details>  <details> <summary>📝 Committable suggestion</summary> > ‼️ **IMPORTANT** > Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements. ```suggestion

🧰 Tools

🪛 markdownlint-cli2 (0.22.1)

[warning] 68-68: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@docs/docs/configure/skills.md` around lines 68 - 72, The fenced code block showing the <auto_loaded_skill> element lacks a language identifier causing lint warnings; update the markdown code fence to include a language tag (use "xml") so the block becomes ```xml ... ``` around the <auto_loaded_skill name="<skill-name>"> ... </auto_loaded_skill> snippet to enable proper syntax highlighting and satisfy the linter.

cubic-dev-ai

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="docs/docs/configure/skills.md">

<violation number="1" location="docs/docs/configure/skills.md:49">
P3: The `applyPaths` example comment is inaccurate: `"dbt_project.yml"` does not match anywhere in the worktree, only at the root.</violation>
</file>

_{Tip: Review your code locally with the cubic CLI to iterate faster.}

cubic-dev-ai · 2026-05-11T16:10:18Z

+  - "dbt_project.yml"        # matches if any dbt_project.yml exists in the worktree
+  - "**/dbt_project.yml"


P3: The applyPaths example comment is inaccurate: "dbt_project.yml" does not match anywhere in the worktree, only at the root.

Prompt for AI agents

Check if this issue is valid — if so, understand the root cause and fix it. At docs/docs/configure/skills.md, line 49: <comment>The `applyPaths` example comment is inaccurate: `"dbt_project.yml"` does not match anywhere in the worktree, only at the root.</comment> <file context> @@ -28,7 +28,75 @@ Focus on the query: $ARGUMENTS +--- +name: dbt-develop +applyPaths: + - "dbt_project.yml" # matches if any dbt_project.yml exists in the worktree + - "**/dbt_project.yml" +description: ... </file context>

Suggested change

- "dbt_project.yml" # matches if any dbt_project.yml exists in the worktree

- "**/dbt_project.yml"

- "dbt_project.yml" # matches only at the worktree root

- "**/dbt_project.yml" # matches anywhere under the worktree

_{Tip: Review your code locally with the cubic CLI to iterate faster.}

Two changes informed by trace analysis of the benchmark run with the initial auto-load mechanism. With the auto-loaded body present in the system prompt, 6 of 8 sampled failing trials never referenced any of its guidance keywords (date spine, tiebreaker, deliverable, etc.) — the model was treating the auto-loaded section as background reference rather than binding directive. These two changes address the framing. (1) `feat(system-prompt)`: move auto-loaded skill bodies BEFORE the lazy-loaded `<available_skills>` XML block in the skills section. Previously the order was: 1. "Use the skill tool to load a skill..." preamble 2. <available_skills> XML (long, descriptions only) 3. <auto_loaded_skill> body (binding guidance) Now: 1. <auto_loaded_skill> body (binding guidance — read FIRST) 2. "Skills provide specialized instructions..." preamble 3. <available_skills> XML (lazy-loaded skills the agent can opt into) Framing the auto-loaded body as "rules of the road" at the start rather than supplementary documentation at the end. Pure ordering change in `SystemPrompt.skills()` parts array — no schema or API change. Applies to any skill using `applyPaths` or `alwaysApply`. File: packages/opencode/src/session/system.ts (2) `docs(skill)`: add a "Pre-completion checklist" section (§5) to dbt-develop that the agent is told to emit with `[x]/[ ]` marks before declaring the task done. Each item is a yes/no question against patterns the skill already documents (LEFT JOIN cardinality, date-spine completeness, window-rank tiebreaker, type harmonization in COALESCE/CASE/UNION, string-concat NULL handling, uniqueness enforcement, incremental high-water mark, snapshot strategy, dbt model versioning v2, unit-test verification). The forcing function: the agent must produce the checklist text in its final message. Unchecked items without a stated "n/a" reason mean the task is not done. Forces the model to slow down at the end and verify the patterns against the SQL it just wrote, rather than silently skip the verification phase. All items are generic dbt patterns applicable to any project — no benchmark-specific test names, no solution-seed values, no grading-rubric hints. File: .opencode/skills/dbt-develop/SKILL.md Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-11T17:18:25Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

github-actions · 2026-05-11T17:22:27Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.opencode/skills/dbt-develop/SKILL.md:
- Around line 165-167: Replace the non-recursive "ls models/" check with a
recursive file-discovery command so nested model files aren't missed; update the
SKILL.md checklist to use a recursive listing (e.g., recursive ls or find)
targeting model files (reference the current "ls models/" line) and ensure the
new command filters for model file types so deliverable verification includes
files in subdirectories.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: c8271a3c-2e25-4585-8a12-9658d74c1430

📥 Commits

Reviewing files that changed from the base of the PR and between 6107c3b and c647876.

📒 Files selected for processing (2)

.opencode/skills/dbt-develop/SKILL.md
packages/opencode/src/session/system.ts

🚧 Files skipped from review as they are similar to previous changes (1)

packages/opencode/src/session/system.ts

coderabbitai · 2026-05-11T17:22:51Z

+ls models/                                                   # confirm every requested file exists
+altimate-dbt info                                            # confirm every requested model is in the project
+```


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use recursive file discovery instead of ls models/ for deliverable verification.

ls models/ only shows top-level entries, so nested model files can be missed during checklist validation. Prefer a recursive check command.

Suggested doc tweak

-ls models/ # confirm every requested file exists +find models -type f -name "*.sql" # confirm every requested model file exists (including nested dirs)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.opencode/skills/dbt-develop/SKILL.md around lines 165 - 167, Replace the non-recursive "ls models/" check with a recursive file-discovery command so nested model files aren't missed; update the SKILL.md checklist to use a recursive listing (e.g., recursive ls or find) targeting model files (reference the current "ls models/" line) and ensure the new command filters for model file types so deliverable verification includes files in subdirectories.

…result The "emit a [x]/[ ] checklist before declaring done" addition to dbt-develop (§5, shipped two commits ago) was measured negative on the post-A+B benchmark re-run: - Checklist appeared in 6 of 14 still-failing trial outputs. - Zero of those 6 flipped to PASS. - In multiple traces, the agent self-marked `[x] LEFT JOIN cardinality correct` while the underlying SQL still had the exact phantom-row bug the item warned against. The framing trained the model to perform verification theater rather than actually re-read its SQL. The two flips attributed earlier to "A+B" (helixops_saas007, helixops_saas009) trace back to the placement reorder (A) — the checklist (B) contributed nothing measurable, and adds 50+ lines of system-prompt content for no benefit. This commit: (1) Removes §5 from `.opencode/skills/dbt-develop/SKILL.md`. The other sections (Plan → Discover → Write → Validate, Common Pitfalls in Transformation Logic, Iron Rules) stay intact. The placement reorder in `system.ts` and the `applyPaths`/`alwaysApply` frontmatter mechanism stay. (2) Adds a "What we tried that didn't work" section to research/kimi-k26-ade-bench-2026-05-10/findings.md so the negative result is preserved as institutional knowledge. The broader principle — "soft self-verification (model promises it checked X) is unreliable on this model class; hard verification (compile/test failures) still works" — is worth keeping around. (3) Updates the findings TL;DR with both the original 81.3% headline and the post-second-wave 85.3% best-of-runs number, with the caveat that the body of the post analyzes the first-wave traces. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-11T21:46:46Z

👋 This PR was automatically closed by our quality checks.

Common reasons:

New GitHub account with limited contribution history
PR description doesn't meet our guidelines
Contribution appears to be AI-generated without meaningful review

If you believe this was a mistake, please open an issue explaining your intended contribution and a maintainer will help you.

dev-punia-altimate · 2026-05-12T05:29:30Z

❌ Tests — Failures Detected

TypeScript — 15 failure(s)

baseline [0.49ms]
baseline [0.40ms]
baseline [0.12ms]
baseline [0.30ms]
connection_refused [0.28ms]
timeout [0.07ms]
permission_denied [0.07ms]
parse_error [0.05ms]
oom [0.06ms]
network_error [0.06ms]
auth_failure [0.05ms]
rate_limit [0.05ms]
internal_error [0.07ms]
empty_error [0.06ms]
connection_refused [0.08ms]

Next Step

Please address the failing cases above and re-run verification.

cc @anandgupta42

claude Bot reviewed May 11, 2026

View reviewed changes

github-actions Bot added the contributor label May 11, 2026

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

cubic-dev-ai Bot reviewed May 11, 2026

View reviewed changes

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

cubic-dev-ai Bot reviewed May 11, 2026

View reviewed changes

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

cubic-dev-ai Bot reviewed May 11, 2026

View reviewed changes

github-actions Bot added the needs-review:blocked label May 11, 2026

coderabbitai Bot reviewed May 11, 2026

View reviewed changes

cubic-dev-ai Bot reviewed May 11, 2026

View reviewed changes

coderabbitai Bot reviewed May 11, 2026

View reviewed changes


		Notes from running the Moonshot Kimi-K2.6 model (via OpenRouter) inside altimate-code's dbt-aware agent loop on the ADE-Bench analytics/data-engineering benchmark.

		Date: 2026-05-10. Model id: `openrouter/moonshotai/kimi-k2.6-20260420`. Harness: altimate-code (a fork of OpenCode wrapping the model in a coding-agent loop with extra dbt/SQL/warehouse tools).


		## 6. Where the custom tools helped (or didn't)

		altimate-code ships dbt-specific tools beyond OpenCode's base set. Pass-rate correlations:

	\| Step-to-step gaps (`step_start → next step_start`) \| 30,672 s \| 89.2% \|
	\| Step-to-step gaps (`step_finish → next step_start`) \| 30,672 s \| 89.2% \|

	REPO_ROOT="$(cd "$SCRIPT_DIR/../../../../../.." && pwd)"
	REPO_ROOT="$(cd "$SCRIPT_DIR/../../.." && pwd)"

		Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it
		skips `NULL` operands instead of propagating them, which is usually safer

	Use `CONCAT_WS()` if your dialect supports it (Snowflake, BigQuery) — it
	skips `NULL` operands instead of propagating them, which is usually safer
	Use dialect-specific NULL-safe concatenation patterns. In BigQuery, use `ARRAY_TO_STRING([...], '-')` to skip `NULL`s; in Snowflake, `CONCAT_WS` still returns `NULL` when any argument is `NULL`, so wrap operands with `COALESCE(...)`.

		- "dbt_project.yml" # matches if any dbt_project.yml exists in the worktree
		- "**/dbt_project.yml"

Conversation

anandgupta42 commented May 11, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

1. Research / blog-ready writeup

2. Reproduction scaffolding (benchmark/ade-bench/)

3. Shipped skill improvements

4. Auto-load skill mechanism (alwaysApply / applyPaths) — new feature

Test Plan

Checklist

Summary by CodeRabbit

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Claude Code Review

Uh oh!

coderabbitai Bot commented May 11, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Suggested labels

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 11, 2026

Choose a reason for hiding this comment

Uh oh!

cubic-dev-ai Bot May 11, 2026

Choose a reason for hiding this comment

anandgupta42 commented May 11, 2026 •

edited by coderabbitai Bot

Loading

2. Reproduction scaffolding (`benchmark/ade-bench/`)

4. Auto-load skill mechanism (`alwaysApply` / `applyPaths`) — new feature

coderabbitai Bot commented May 11, 2026 •

edited

Loading