Tool Name
@agent-tools/dataframe
Description
In-memory tabular data manipulation with column-oriented operations — filter, derive, group, aggregate, join, pivot, and reshape data tables without external databases.
Why It's Useful for Agents
Agents frequently need to process, transform, and analyze structured data — CSV query results, API response arrays, log entries, metrics. Currently they must write imperative loops or spawn database processes. A DataFrame tool provides declarative, composable table operations directly in memory.
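For contrast, here is a minimal sketch of the imperative baseline mentioned above: a hand-written group-and-aggregate over plain objects. The row shape mirrors the sample data used in this proposal; `EmployeeRow`, `DeptSummary`, and `summarizeByDept` are hypothetical names for illustration only, not part of any existing tool.

```typescript
// Hypothetical row shape, mirroring the sample data in this proposal.
type EmployeeRow = { name: string; dept: string; salary: number };
type DeptSummary = { dept: string; count: number; avgSalary: number; maxSalary: number };

// Hand-written group-and-aggregate: the kind of loop a verb chain would replace.
function summarizeByDept(rows: EmployeeRow[]): DeptSummary[] {
  const acc = new Map<string, { count: number; total: number; max: number }>();
  for (const row of rows) {
    const entry = acc.get(row.dept) ?? { count: 0, total: 0, max: -Infinity };
    entry.count += 1;
    entry.total += row.salary;
    entry.max = Math.max(entry.max, row.salary);
    acc.set(row.dept, entry);
  }
  // Map preserves insertion order, so groups appear in first-seen order.
  return Array.from(acc.entries()).map(([dept, e]) => ({
    dept,
    count: e.count,
    avgSalary: e.total / e.count,
    maxSalary: e.max,
  }));
}

const sampleRows: EmployeeRow[] = [
  { name: "Alice", dept: "eng", salary: 120000 },
  { name: "Bob", dept: "eng", salary: 110000 },
  { name: "Carol", dept: "sales", salary: 95000 },
];
const deptSummary = summarizeByDept(sampleRows);
console.log(deptSummary);
```

Every new aggregate (median, stdev, a second grouping key) means another accumulator field and more bookkeeping in the loop, which is the repetition a declarative groupby/rollup is meant to remove.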
Built on Arquero (1.5k stars, BSD-3-Clause, v8.0.3, ~44k weekly npm downloads, UW Interactive Data Lab), a dplyr/pandas-inspired, verb-based table library with lazy evaluation and Apache Arrow interop. Arquero is lighter and more focused than Danfo.js (5k stars, MIT, ~4.7k weekly downloads; pandas-like, but it carries a heavier TensorFlow.js dependency), and it complements apache-arrow (3M weekly downloads, Apache-2.0; a columnar memory format and IPC layer with no query verbs).
Distinct from #150 (@agent-tools/parquet — file format I/O, not in-memory manipulation), #72 (@agent-tools/tabular — CSV/spreadsheet read/write, not query operations), #66 (@agent-tools/sqlite — SQL database, not in-memory functional API), #105 (@agent-tools/math — numerical computation, not table operations).
Proposed API
// Assumption: `op` and `desc` are exported alongside `dataframe`; both are used below.
import { dataframe, op, desc } from "@agent-tools/dataframe";

// Create from arrays, objects, CSV, or Arrow tables
const df = dataframe.from([
  { name: "Alice", dept: "eng", salary: 120000 },
  { name: "Bob", dept: "eng", salary: 110000 },
  { name: "Carol", dept: "sales", salary: 95000 },
]);

// Verb-based transformations (chainable, lazy)
const result = df
  .filter((d) => d.salary > 100000)
  .derive({ bonus: (d) => d.salary * 0.1 })
  .select("name", "dept", "bonus");

// Grouping and aggregation
const summary = df
  .groupby("dept")
  .rollup({
    count: () => op.count(),
    avg_salary: (d) => op.mean(d.salary),
    max_salary: (d) => op.max(d.salary),
  });

// Joins (otherDf is any second table that has a "dept" column)
const merged = df.join(otherDf, ["dept", "dept"]);

// Pivot / reshape
const wide = df.pivot("dept", "name", "salary");
const long = wide.fold(["eng", "sales"], { as: ["dept", "salary"] });

// I/O
const csv = result.toCSV();
const objects = result.objects();
const arrow = result.toArrow(); // Apache Arrow IPC interop

// Summary statistics
const stats = df.describe(); // count, mean, std, min, max per numeric column

// Sorting and sampling
const top5 = df.orderby(desc("salary")).slice(0, 5);
const sample = df.sample(100);
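To make the intended output of `describe()` concrete, the following is a rough, dependency-free sketch of what it could compute per numeric column. It is not the proposed implementation: `NumericStats`, `describeColumn`, and `describeTable` are hypothetical names, and the choice of the sample standard deviation (n - 1 denominator) is an assumption, since the proposal only says "std".

```typescript
type NumericStats = { count: number; mean: number; std: number; min: number; max: number };

// Statistics for one non-empty numeric column.
function describeColumn(values: number[]): NumericStats {
  const count = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / count;
  const sumSq = values.reduce((sum, v) => sum + (v - mean) * (v - mean), 0);
  // Assumption: sample standard deviation (n - 1); 0 when there is a single value.
  const std = count > 1 ? Math.sqrt(sumSq / (count - 1)) : 0;
  return { count, mean, std, min: Math.min(...values), max: Math.max(...values) };
}

// Describe every column whose values are all numbers; other columns are skipped.
function describeTable(rows: Array<Record<string, unknown>>): Record<string, NumericStats> {
  const stats: Record<string, NumericStats> = {};
  const first = rows[0];
  const columns = first ? Object.keys(first) : [];
  for (const col of columns) {
    const values: number[] = [];
    for (const row of rows) {
      const v = row[col];
      if (typeof v === "number") values.push(v);
    }
    if (values.length === rows.length) stats[col] = describeColumn(values);
  }
  return stats;
}

const tableStats = describeTable([
  { name: "Alice", dept: "eng", salary: 120000 },
  { name: "Bob", dept: "eng", salary: 110000 },
  { name: "Carol", dept: "sales", salary: 95000 },
]);
console.log(tableStats);
```

How missing values and mixed-type columns should be handled is left open here; this sketch simply skips any column that is not numeric in every row.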
Scope
In scope:
- Table creation from arrays, objects, CSV strings, Apache Arrow tables
- Column selection, renaming, reordering
- Row filtering with expression functions
- Derived/computed columns
- Groupby + rollup aggregation (count, sum, mean, median, min, max, stdev, variance)
- Joins (inner, left, right, full, cross, semi, anti)
- Pivot (wide↔long), fold, spread
- Sorting, slicing, sampling, deduplication
- Descriptive statistics (describe)
- Export to objects, CSV, Arrow IPC
- Expression language with arithmetic, string, and date operations
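The join variants listed above differ mainly in which left-hand rows survive and whether right-hand columns are attached. A small sketch of four of them (inner, left, semi, anti) on a single key, in plain TypeScript, may help pin down the expected semantics. `joinOn`, `JoinKind`, and `JoinRow` are hypothetical names, and "Dan" and the `floor` column are made-up sample data; none of this is the proposed API.

```typescript
type JoinKind = "inner" | "left" | "semi" | "anti";
type JoinRow = Record<string, unknown>;

// Single-key hash join over arrays of plain objects.
function joinOn(left: JoinRow[], right: JoinRow[], key: string, kind: JoinKind): JoinRow[] {
  // Index right-hand rows by key value so each left row is matched in O(1).
  const index = new Map<unknown, JoinRow[]>();
  for (const r of right) {
    const bucket = index.get(r[key]);
    if (bucket) bucket.push(r);
    else index.set(r[key], [r]);
  }
  const out: JoinRow[] = [];
  for (const l of left) {
    const matches = index.get(l[key]) ?? [];
    if (kind === "semi") {
      // Keep left rows that have at least one match; no right columns added.
      if (matches.length > 0) out.push({ ...l });
    } else if (kind === "anti") {
      // Keep left rows that have no match.
      if (matches.length === 0) out.push({ ...l });
    } else if (matches.length > 0) {
      // inner and left: one output row per matching pair.
      for (const m of matches) out.push({ ...m, ...l });
    } else if (kind === "left") {
      // left: keep unmatched rows; right-hand columns are simply absent here.
      out.push({ ...l });
    }
  }
  return out;
}

const employees: JoinRow[] = [
  { name: "Alice", dept: "eng" },
  { name: "Bob", dept: "eng" },
  { name: "Carol", dept: "sales" },
  { name: "Dan", dept: "ops" }, // hypothetical row with no matching department
];
const departments: JoinRow[] = [
  { dept: "eng", floor: 3 },
  { dept: "sales", floor: 1 },
];

const innerRows = joinOn(employees, departments, "dept", "inner");
const leftRows = joinOn(employees, departments, "dept", "left");
const semiRows = joinOn(employees, departments, "dept", "semi");
const antiRows = joinOn(employees, departments, "dept", "anti");
console.log(innerRows.length, leftRows.length, semiRows.length, antiRows.length);
```

Right, full, and cross joins follow the same pattern with the roles of the two sides swapped or combined; a real implementation would also need a policy for filling unmatched columns (for example `undefined` or `null`).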
Out of scope:
- Persistent storage (use @agent-tools/sqlite)
- File format I/O beyond CSV (use @agent-tools/parquet, @agent-tools/tabular)
- Visualization / charting (use @agent-tools/chart)
- Machine learning / statistical modeling
- Distributed / out-of-core processing