docs: plan DuckDB/Node backend for buckaroo-js-core (#930) by paddymul · Pull Request #935 · buckaroo-data/buckaroo

paddymul · 2026-06-23T18:01:25Z

Summary

Design doc for #930 — a first-class DuckDB-backed buckaroo backend that runs in a pure Node/Electron host with no Python kernel, so the JS-core viewer (DFViewerInfinite/SmartRowCache/IDatasource) renders the same behind DuckDB as behind pandas/polars. IModel is the transport seam and stays untouched.

Motivating consumer is an Electron app (@duckdb/node-api, no Python) whose author wants the full notebook experience — search, infinite scroll, summary stats, histograms.

Blocked on #933

This waits on #933 (unified DF transport). That PR's decodeDFData + parquet_b64 envelope let the infinite path accept a single JSON message with inline base64 parquet, removing the two-frame requirement (BuckarooWidgetInfinite.tsx:122 reads buffers[0] unconditionally today). With #933 the IPC IModel adapter is a plain round-trip and there is zero buckaroo-js-core change; without it, #930 would need a JS-core patch. Don't start the row-transport work until #933 lands.
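A minimal sketch of what consuming a single-message envelope could look like on the host side. The `ParquetB64Envelope` shape and both function names are hypothetical; the real envelope and `decodeDFData` are defined by #933, not here. It only shows the general idea: one JSON message carries base64 parquet inline, so no second binary frame is needed.

```typescript
// Hypothetical envelope shape, for illustration only (#933 defines the real one).
interface ParquetB64Envelope {
  format: "parquet_b64";
  data: string; // base64-encoded bytes of a complete parquet file
}

// Decode the inline payload to raw bytes using the standard atob global,
// which is available in browsers, Electron renderers, and Node 16+.
function decodeParquetB64(env: ParquetB64Envelope): Uint8Array {
  const binary = atob(env.data);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes;
}

// Cheap sanity check: a parquet file starts and ends with the 4-byte magic "PAR1".
function looksLikeParquet(bytes: Uint8Array): boolean {
  const magic = [0x50, 0x41, 0x52, 0x31];
  if (bytes.length < 8) return false;
  const tail = bytes.length - 4;
  return magic.every((b, i) => bytes[i] === b && bytes[tail + i] === b);
}
```

The magic-byte check is only a guard against a mangled payload; it does not validate the parquet footer.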

v1 scope

- SUMMARIZE → SDType → wide {col}__{stat} parquet for pinned summary rows.
- Windowed rows + sort + paging over infinite_request.
- Viewer mode, read-only (no autoclean/post-processing/search/quick-commands).
- COPY … TO tmpfile (FORMAT PARQUET) for rows: the only serialization path that writes no type-coercion code (this @duckdb/node-api has no Arrow and no in-memory parquet; coercion is where fidelity bugs live).
- Faithful a,b,c rename + synthesized index (kills index-named / dotted / duplicate-column landmines that user SQL hits more than DataFrames do).
- Injected DuckSource connection: required for catalog correctness, not just hygiene.
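The rename, synthesized index, and windowed COPY items above can be sketched as one SQL builder. Everything named here (`positionalName`, `WindowRequest`, `buildWindowCopySql`) is hypothetical, and so are the request fields and the spreadsheet-style a, b, ..., z, aa naming; the plan does not spell out the exact scheme. The sketch renames columns positionally through a table alias column list, so it never references the original names and is unaffected by dotted, duplicate, or index-named columns.

```typescript
// Map a zero-based column position to a, b, ..., z, aa, ab, ... (assumed scheme).
function positionalName(i: number): string {
  let n = i + 1;
  let name = "";
  while (n > 0) {
    const rem = (n - 1) % 26;
    name = String.fromCharCode(97 + rem) + name;
    n = Math.floor((n - 1) / 26);
  }
  return name;
}

// Hypothetical subset of an infinite_request payload.
interface WindowRequest {
  start: number; // first row, inclusive
  end: number; // last row, exclusive
  sort?: string; // a renamed column, e.g. "b"
  sortDirection?: "asc" | "desc";
}

function quoteIdent(name: string): string {
  return '"' + name.replace(/"/g, '""') + '"';
}

// Build a COPY statement that writes one sorted window of rows to a parquet file.
function buildWindowCopySql(
  sourceSql: string,
  columnCount: number,
  req: WindowRequest,
  outPath: string
): string {
  if (!Number.isInteger(req.start) || !Number.isInteger(req.end) ||
      req.start < 0 || req.end < req.start) {
    throw new Error("invalid window");
  }
  const names = Array.from({ length: columnCount }, (_, i) => positionalName(i));
  if (req.sort !== undefined && names.indexOf(req.sort) === -1) {
    throw new Error("unknown sort column: " + req.sort);
  }
  const aliasList = names.map(quoteIdent).join(", ");
  // Assumption: row_number() over the unsorted source stands in for the index.
  const inner =
    `SELECT row_number() OVER () - 1 AS "index", * ` +
    `FROM (${sourceSql}) AS src(${aliasList})`;
  const direction = req.sortDirection === "desc" ? "DESC" : "ASC";
  const orderBy = req.sort === undefined
    ? 'ORDER BY "index"'
    : `ORDER BY ${quoteIdent(req.sort)} ${direction}, "index"`;
  const path = outPath.replace(/'/g, "''");
  return `COPY (SELECT * FROM (${inner}) AS renamed ${orderBy} ` +
    `LIMIT ${req.end - req.start} OFFSET ${req.start}) ` +
    `TO '${path}' (FORMAT PARQUET)`;
}
```

A caller would run the returned statement on the injected connection and then read the temp file back. Note that `row_number() OVER ()` is only as stable as the source's scan order, so a real implementation may need a deterministic ordering for the index.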

Fast-follows (designed-for, not built)

Histograms/quantiles; search (as a search_-prefixed filtered stat set — the broader plan, which DuckDB is the natural first backend to implement since xorq-slowness was the only reason it didn't exist); quick commands (per-command SQL translation behind one effective-query seam).

Also

DECIMAL precision loss split out to #934, "DECIMAL columns lose precision (rendered as float64)" (v1 casts DECIMAL → DOUBLE; likely affects all backends).

Doc: docs/plans/930-duckdb-node-backend.md. Plan only, no code.

🤖 Generated with Claude Code

v1 scope: SUMMARIZE stats + windowed sort/paging over an IModel-over-IPC adapter, injected DuckSource, no buckaroo-js-core change. Search, histograms, and quick commands are designed-for fast-follows behind one effective-query seam.

Blocked on #933 (unified DF transport): its decodeDFData + parquet_b64 envelope let the infinite path take a single JSON message, removing the two-frame requirement that would otherwise force a JS-core change. DECIMAL precision loss split out to #934.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c4a9353eb2


chatgpt-codex-connector · 2026-06-23T18:04:29Z

+| `min` / `max` | `min` / `max` |
+| `distinct_count` | `approx_unique` |
+| `mean` / `std` | `avg` / `std` |
+| `null_count` | `count × null_percentage` (derived) |


Derive null_count from the percentage scale

DuckDB SUMMARIZE exposes null_percentage as a percentage, so this formula would overstate nulls by 100x if implemented literally: with 1,000 rows and 25% nulls, count × null_percentage gives 25,000 instead of 250. Since Buckaroo's null_count is an absolute integer count, the plan should either divide the parsed percentage by 100 or compute nulls directly.
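The arithmetic above can be sketched as follows. Both function names are hypothetical and not part of the plan. The first divides the percentage by 100 before scaling; the second builds a query that counts nulls directly, which avoids depending on how precisely `null_percentage` is reported.

```typescript
// null_percentage is on a 0-100 scale, so divide by 100 before scaling by rows.
// Rounding is needed because the result must be an absolute integer count.
function nullCountFromPercentage(rowCount: number, nullPercentage: number): number {
  return Math.round((rowCount * nullPercentage) / 100);
}

// Exact alternative: COUNT(*) counts all rows, COUNT(col) skips NULLs,
// so the difference is the null count. quotedColumn must already be a
// safely quoted identifier.
function exactNullCountSql(quotedColumn: string, sourceSql: string): string {
  return `SELECT COUNT(*) - COUNT(${quotedColumn}) AS null_count ` +
    `FROM (${sourceSql}) AS src`;
}

// 1,000 rows at 25% nulls: the literal formula gives 25,000, the fixed one 250.
const literal = 1000 * 25;
const fixed = nullCountFromPercentage(1000, 25);
```

If the percentage is reported at limited precision, the derived count can be off by a few rows on large tables, which is one reason to prefer the direct count.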


github-actions · 2026-06-23T18:05:49Z

📦 TestPyPI package published

pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.15.1.dev28046370246

or with uv:

uv pip install --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo==0.15.1.dev28046370246

MCP server for Claude Code

claude mcp add buckaroo-table -- uvx --from "buckaroo[mcp]==0.15.1.dev28046370246" --index-strategy unsafe-best-match --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ buckaroo-table


paddymul temporarily deployed to testpypi June 23, 2026 18:03 — with GitHub Actions Inactive

paddymul wants to merge 1 commit into main from docs/930-duckdb-node-backend-plan
