Dataset pipelines#163

Open
Ankur Goyal (ankrgyl) wants to merge 17 commits into main from dataset-pipelines

Conversation

@ankrgyl
Contributor

No description provided.

@github-actions

github-actions Bot commented Apr 30, 2026

Latest downloadable build artifacts for this PR commit d232b9e8dcd4:

Available artifact names
  • `artifacts-build-global`
  • `artifacts-build-local-x86_64-pc-windows-msvc`
  • `artifacts-build-local-x86_64-apple-darwin`
  • `artifacts-build-local-aarch64-pc-windows-msvc`
  • `artifacts-build-local-x86_64-unknown-linux-musl`
  • `artifacts-build-local-x86_64-unknown-linux-gnu`
  • `artifacts-build-local-aarch64-apple-darwin`
  • `artifacts-build-local-aarch64-unknown-linux-gnu`
  • `artifacts-plan-dist-manifest`
  • `cargo-dist-cache`

@ankrgyl Ankur Goyal (ankrgyl) changed the title from "wip: dataset pipelines" to "Dataset pipelines" on May 3, 2026
Comment thread README.md Outdated
bt datasets pipeline run ./pipeline.ts --limit 100

# Staged execution for inspection or agent editing.
bt datasets pipeline fetch ./pipeline.ts --limit 500
Member

why is this called fetch here, but the enum is called pull? Can we make the naming more consistent?

Contributor Author

good catch, at some point i renamed from fetch to pull and there were some lingering references.

Comment thread src/datasets/pipeline.rs
Ok((ctx, client, project))
}

fn discovery_filter(
Member

should we add a timestamp filter of some kind? Just to constrain the queries here?

Contributor Author

good catch

Contributor Author

added a --window argument, which defaults to 1d and is always ANDed into the filter.
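
For readers following along, the --window behavior described above might look roughly like the sketch below. This is not the code from the PR: `parse_window`, `with_window`, the accepted unit suffixes, and the `created >= <unix seconds>` clause syntax are all hypothetical, chosen only to show a window clause that is always combined with the user's filter.

```rust
use std::time::Duration;

/// Parse a window such as "30m", "12h", or "1d" into a Duration.
/// Hypothetical helper; the PR's actual parsing is not shown in this thread.
fn parse_window(s: &str) -> Option<Duration> {
    let s = s.trim();
    // The last character is the unit; everything before it is the count.
    let unit = s.chars().last()?;
    let digits = &s[..s.len() - unit.len_utf8()];
    let n: u64 = digits.parse().ok()?;
    let secs_per_unit: u64 = match unit {
        's' => 1,
        'm' => 60,
        'h' => 3_600,
        'd' => 86_400,
        _ => return None,
    };
    n.checked_mul(secs_per_unit).map(Duration::from_secs)
}

/// AND a time-window clause into an optional user filter.
/// The clause syntax here is illustrative only, not the real query language.
fn with_window(user_filter: Option<&str>, now_unix: u64, window: Duration) -> String {
    let cutoff = now_unix.saturating_sub(window.as_secs());
    let window_clause = format!("created >= {cutoff}");
    match user_filter {
        // Parenthesize both sides so the user's own OR terms cannot escape the window.
        Some(f) if !f.trim().is_empty() => format!("({f}) AND ({window_clause})"),
        _ => window_clause,
    }
}
```

Because the window clause is applied even when no user filter is given, every discovery query stays bounded in time, which is the constraint the review comment asked for.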

Comment thread src/datasets/pipeline.rs Outdated
/// Maximum number of source refs to discover
#[arg(
long,
alias = "target",
Member

I would prefer if we didn't have this alias. limit aligns better with the rest of the CLI options, and it might be confusing that this is referring to source refs, and not final row count (while things like --target-dataset refer to the output).

Contributor Author

agree

Comment thread src/datasets/pipeline.rs
);
}

datasets_api::create_dataset_with_metadata(
Member

auto creating datasets like this might be surprising behaviour. Means that folks run into issues if they make a spelling mistake or something similar. But I don't feel strongly about this, just figured being explicit about it might be better for the agents.

Contributor Author

i generally agree but our SDKs auto create projects and datasets so i think as-is, this is more consistent with our current semantics.
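
To make the trade-off in this thread concrete, here is a minimal sketch of the two behaviors being compared: implicit creation versus failing on an unknown name. It uses an in-memory map in place of the real API; `resolve_dataset`, `Resolved`, and the `create_if_missing` flag are hypothetical names and do not reflect `datasets_api` or the SDKs.

```rust
use std::collections::HashMap;

/// Outcome of resolving a dataset name (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
enum Resolved {
    Existing(u64),
    Created(u64),
}

/// Look up `name`, creating it only when `create_if_missing` is true.
/// The HashMap stands in for the remote dataset registry.
fn resolve_dataset(
    datasets: &mut HashMap<String, u64>,
    name: &str,
    create_if_missing: bool,
) -> Result<Resolved, String> {
    if let Some(&id) = datasets.get(name) {
        return Ok(Resolved::Existing(id));
    }
    if !create_if_missing {
        // Explicit mode: a misspelled name surfaces as an error instead of a new dataset.
        return Err(format!("dataset '{name}' not found"));
    }
    // Auto-create mode: a new entry is silently added under whatever name was given.
    let id = datasets.len() as u64 + 1;
    datasets.insert(name.to_string(), id);
    Ok(Resolved::Created(id))
}
```

With `create_if_missing` set to true, a typo such as "evlas" yields a second dataset, which is the surprise the reviewer describes; with it set to false, the same typo fails fast. Returning `Created` versus `Existing` at least lets a caller log when a dataset was created implicitly.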
