Live, queryable datasets that update automatically.
Think of it like a spreadsheet that fills itself in — you describe the dataset you want (YC companies currently hiring, insurance quotes in your area, restaurants serving a specific brand), and BigSet builds it, keeps it fresh, and lets you query it with SQL.
Built on TinyFish APIs.
At the end of the day, the only thing that matters is data. Every decision, every agent, every product — it all comes down to having the right data at the right time.
So what if you could just… ask for it? Describe the dataset you want — in plain English — and have it built, structured, and kept fresh automatically. No scrapers to maintain. No pipelines to babysit. No waking up to broken cron jobs because some site changed a div.
You describe it. BigSet collects it. Your agents query it with SQL. It stays up to date on your schedule — every 30 minutes, every hour, whatever you need. And if something breaks, a healer agent patches it before you even notice.
Any dataset. Any source. Always fresh. That's the idea.
Prerequisites: Docker, Make, and a free Clerk account
git clone https://github.com/tinyfish-io/bigset.git
cd bigset

Create a Clerk application at dashboard.clerk.com, then go to JWT Templates and enable the Convex template.
# Root .env — used by Docker for the frontend container
cp .env.example .env
# Fill in NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY and CLERK_SECRET_KEY
# Frontend .env.local — used by Next.js and Convex CLI
cp frontend/.env.example frontend/.env.local
# Fill in all three Clerk keys (publishable, secret, and JWT issuer domain)

Optional: to enable PostHog product analytics, session replay, and error tracking, set NEXT_PUBLIC_POSTHOG_KEY and NEXT_PUBLIC_POSTHOG_HOST. Leave them blank to disable analytics cleanly (the app no-ops every event).
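The "no-ops every event" behavior can be pictured with a small wrapper like the one below. This is only a sketch under stated assumptions, not BigSet's actual code: `createTracker`, `Tracker`, `AnalyticsEvent`, and the `send` callback are hypothetical names, and treating a blank host the same as a blank key is also an assumption.

```typescript
// Hypothetical sketch: an analytics wrapper that silently does nothing
// when the PostHog env vars are left blank. Not BigSet's actual code.

type AnalyticsEvent = { name: string; props: Record<string, unknown> };

interface Tracker {
  enabled: boolean;
  capture(name: string, props?: Record<string, unknown>): void;
}

// `env` mirrors the two variables named in the README. `send` is an
// assumed stand-in for whatever forwards events to PostHog.
function createTracker(
  env: { NEXT_PUBLIC_POSTHOG_KEY?: string; NEXT_PUBLIC_POSTHOG_HOST?: string },
  send: (event: AnalyticsEvent) => void,
): Tracker {
  const key = (env.NEXT_PUBLIC_POSTHOG_KEY ?? "").trim();
  const host = (env.NEXT_PUBLIC_POSTHOG_HOST ?? "").trim();

  // Assumption: analytics is off if either value is blank.
  if (key === "" || host === "") {
    // Every call is a no-op, so call sites never need their own guards.
    return { enabled: false, capture: () => {} };
  }

  return {
    enabled: true,
    capture: (name, props = {}) => send({ name, props }),
  };
}
```

The point of the design is that the rest of the app can call `capture` unconditionally; whether anything is sent is decided once, at construction time.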
make dev

This starts all Docker services, waits for Convex to be healthy, and deploys Convex functions automatically.
docker compose exec convex ./generate_admin_key.sh

Paste the output into frontend/.env.local as CONVEX_SELF_HOSTED_ADMIN_KEY, then re-run make dev.
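Before re-running make dev, it can help to confirm that no required key was left blank. The helper below is a hypothetical sketch, not something shipped in the repo: `parseEnv`, `missingKeys`, and `REQUIRED_KEYS` are illustrative names, and it assumes frontend/.env.local uses the same Clerk variable names as the root .env. The JWT issuer domain variable is left out because its name is not given here.

```typescript
// Hypothetical helper: report which required keys are missing or blank
// in the text of an env file. Not part of the BigSet repo.

// Keys named in the README. The JWT issuer domain variable is omitted
// because the README does not state its name.
const REQUIRED_KEYS = [
  "NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY",
  "CLERK_SECRET_KEY",
  "CONVEX_SELF_HOSTED_ADMIN_KEY",
];

// Parse simple KEY=VALUE lines; blank lines and # comments are ignored.
function parseEnv(text: string): Map<string, string> {
  const values = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=VALUE line
    values.set(line.slice(0, eq).trim(), line.slice(eq + 1).trim());
  }
  return values;
}

// Return the required keys that are absent or have an empty value.
function missingKeys(text: string, required: string[] = REQUIRED_KEYS): string[] {
  const values = parseEnv(text);
  return required.filter((key) => (values.get(key) ?? "") === "");
}
```

To use it, read the file contents however you prefer and pass the text to `missingKeys`; an empty array means every listed key has a value.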
The landing page and the dashboard's "Curated" section read from a set of 9 system-owned datasets. Load them with:
cd frontend
npx convex run publicSeed:seedPublicDatasets

The script is idempotent: rerunning it skips datasets that already exist (matched by a stable seedKey, so renaming a curated dataset never creates a duplicate). To add a 10th curated dataset, append it to PUBLIC_DATASETS in frontend/convex/publicSeed.ts with a fresh seedKey and rerun the command. To replace existing curated content in place, pass force: true:
npx convex run publicSeed:seedPublicDatasets '{"force":true}'

Open localhost:3500 and click Get started to sign in.
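The seeding rules described above (skip by seedKey, replace only with force) can be sketched in memory as follows. This is an illustration of the idea, not the contents of frontend/convex/publicSeed.ts: `CuratedDataset`, `SeedResult`, `seedDatasets`, and the `Map` standing in for the database are all hypothetical.

```typescript
// Hypothetical in-memory sketch of seedKey-based idempotent seeding.
// The real logic lives in frontend/convex/publicSeed.ts and may differ.

interface CuratedDataset {
  seedKey: string; // stable identifier; never changes after the first seed
  name: string;    // display name; safe to rename later
}

type SeedResult = { created: string[]; replaced: string[]; skipped: string[] };

function seedDatasets(
  store: Map<string, CuratedDataset>, // stand-in for the database table
  datasets: CuratedDataset[],
  options: { force?: boolean } = {},
): SeedResult {
  const result: SeedResult = { created: [], replaced: [], skipped: [] };
  for (const dataset of datasets) {
    const exists = store.has(dataset.seedKey);
    if (exists && !options.force) {
      // Already seeded: leave the stored copy untouched.
      result.skipped.push(dataset.seedKey);
      continue;
    }
    // Keyed by seedKey, so a renamed dataset overwrites instead of duplicating.
    store.set(dataset.seedKey, { ...dataset });
    (exists ? result.replaced : result.created).push(dataset.seedKey);
  }
  return result;
}
```

Because the lookup uses seedKey rather than the name, a second run with a renamed entry is skipped by default and only overwrites the stored copy when `force` is set.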
Note: The backend env needs no setup; backend/.env.example has correct defaults. If you edit Convex functions in frontend/convex/, run make convex-push to deploy the changes.
| Layer | Tech |
|---|---|
| Frontend | Next.js 16, React 19, Tailwind 4 |
| Backend | Fastify, TypeScript (agent runner) |
| Auth | Clerk |
| Database | Convex (self-hosted) |
| Data Collection | TinyFish APIs (Search, Fetch, Browser) |
| Table view | TanStack Table + react-window virtualization |
| Analytics | PostHog — events, session replay, error tracking (optional) |
bigset/
├── frontend/ Next.js 16 — UI + Convex schema & functions
│ ├── convex/ Convex functions, schema, and auth config
│ └── .env.local Clerk + Convex keys (not committed)
├── backend/ Fastify — agent runner, writes to Convex via HTTP
├── .env Clerk keys for docker-compose (not committed)
├── docker-compose.dev.yml
└── Makefile
BigSet is a work in progress. We're building in the open because the best ideas come from the people who actually want to use the thing.
We'd love your feedback, ideas, or help building — come say hi:
- 🐦 Twitter: @Tiny_Fish for project updates
- 🗣 Twitter: @not_simantak for the unfiltered version
- 🐛 GitHub Issues: Report bugs or request features
Contributions are very welcome — whether it's code, feedback, or just telling us what datasets you'd want to build.
- Fork the repo
- Create a branch (git checkout -b my-feature)
- Make your changes
- Run bash scripts/verify-authz.sh to confirm the authorization layer still holds
- Open a PR
If you're not sure where to start, open an issue or come say hi.