GitHub - tinyfish-io/bigset: What if you had all the data in the world?

Live, queryable datasets that update automatically.

Think of it like a spreadsheet that fills itself in — you describe the dataset you want (YC companies currently hiring, insurance quotes in your area, restaurants serving a specific brand), and BigSet builds it, keeps it fresh, and lets you query it with SQL.

Built on TinyFish APIs.

✨ Why BigSet?

At the end of the day, the only thing that matters is data. Every decision, every agent, every product — it all comes down to having the right data at the right time.

So what if you could just… ask for it? Describe the dataset you want — in plain English — and have it built, structured, and kept fresh automatically. No scrapers to maintain. No pipelines to babysit. No waking up to broken cron jobs because some site changed a div.

You describe it. BigSet collects it. Your agents query it with SQL. It stays up to date on your schedule — every 30 minutes, every hour, whatever you need. And if something breaks, a healer agent patches it before you even notice.

Any dataset. Any source. Always fresh. That's the idea.

🚀 Quick Start

Prerequisites: Docker, Make, and a free Clerk account. Step 5 also runs npx, which ships with Node.js.

1. Clone and set up Clerk

git clone https://github.com/tinyfish-io/bigset.git
cd bigset

Create a Clerk application at dashboard.clerk.com, then go to JWT Templates and enable the Convex template.

2. Configure env files

# Root .env — used by Docker for the frontend container
cp .env.example .env
# Fill in NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY and CLERK_SECRET_KEY

# Frontend .env.local — used by Next.js and Convex CLI
cp frontend/.env.example frontend/.env.local
# Fill in all three Clerk keys (publishable, secret, and JWT issuer domain)
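Before starting the stack, it can help to confirm that the keys were actually filled in. The sketch below is not part of BigSet: `parseEnv`, `missingKeys`, and `ROOT_ENV_KEYS` are hypothetical helper names, and the parser handles only plain `KEY=VALUE` lines (no inline comments or multi-line values). The two key names come from the root `.env` comment above.

```typescript
// Parse KEY=VALUE lines from dotenv-style text, ignoring blank lines and # comments.
function parseEnv(text: string): Map<string, string> {
  const vars = new Map<string, string>();
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq <= 0) continue; // not a KEY=VALUE line
    const key = line.slice(0, eq).trim();
    // Strip one pair of matching surrounding quotes, if present.
    const value = line.slice(eq + 1).trim().replace(/^(["'])(.*)\1$/, "$2");
    vars.set(key, value);
  }
  return vars;
}

// Return the required keys that are absent or set to an empty value.
function missingKeys(text: string, required: string[]): string[] {
  const vars = parseEnv(text);
  return required.filter((key) => (vars.get(key) ?? "") === "");
}

// Keys the root .env is documented to need (hypothetical constant name).
const ROOT_ENV_KEYS = ["NEXT_PUBLIC_CLERK_PUBLISHABLE_KEY", "CLERK_SECRET_KEY"];
```

In a Node script, passing the contents of `.env` (read with `fs.readFileSync(".env", "utf8")`) and `ROOT_ENV_KEYS` to `missingKeys` would list any keys that are still blank.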

Optional: to enable PostHog product analytics, session replay, and error tracking, set NEXT_PUBLIC_POSTHOG_KEY and NEXT_PUBLIC_POSTHOG_HOST. Leave them blank to disable PostHog cleanly (the app no-ops every event).
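One way to picture the "no-ops every event" behavior is a tracker that is chosen once, based on whether the env values are set. This is an illustrative sketch only, not BigSet's actual code: `makeTracker`, `AnalyticsClient`, and the injected `createClient` factory are hypothetical names, and treating a blank key or a blank host as "disabled" is an assumption.

```typescript
type EventProps = Record<string, unknown>;

// Minimal shape of an analytics client (hypothetical, for illustration).
interface AnalyticsClient {
  capture(event: string, props?: EventProps): void;
}

// Build a tracking function from the two env values.
// `createClient` stands in for real SDK initialization and is injected
// so that this sketch stays self-contained.
function makeTracker(
  key: string | undefined,
  host: string | undefined,
  createClient: (key: string, host: string) => AnalyticsClient,
): (event: string, props?: EventProps) => void {
  if (!key || !host) {
    // Analytics disabled: every call is a no-op and no client is created.
    return () => {};
  }
  const client = createClient(key, host);
  return (event, props) => client.capture(event, props);
}
```

Because the decision is made once at construction time, call sites can invoke the tracker unconditionally without checking whether analytics is enabled.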

3. Start everything

make dev

This starts all Docker services, waits for Convex to be healthy, and deploys Convex functions automatically.

4. Generate Convex admin key (first time only)

docker compose exec convex ./generate_admin_key.sh

Paste the output into frontend/.env.local as CONVEX_SELF_HOSTED_ADMIN_KEY, then re-run make dev.

5. Load curated public datasets

The landing page and the dashboard's "Curated" section read from a set of 9 system-owned datasets. Load them with:

cd frontend
npx convex run publicSeed:seedPublicDatasets

The script is idempotent: rerunning it skips datasets that already exist. Datasets are matched by a stable seedKey, so renaming a curated dataset never creates a duplicate. To add a 10th curated dataset, append it to PUBLIC_DATASETS in frontend/convex/publicSeed.ts with a fresh seedKey and rerun the command. To replace existing curated content in place, pass force: true:

npx convex run publicSeed:seedPublicDatasets '{"force":true}'
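The skip-or-replace behavior described above can be sketched with an in-memory store. This is not the code in frontend/convex/publicSeed.ts: `seedDatasets`, `CuratedDataset`, `SeedResult`, and the `Map` standing in for the database are all assumptions made for illustration. Only the ideas of matching on `seedKey` and the `force` option come from the text.

```typescript
interface CuratedDataset {
  seedKey: string; // stable identity, independent of the display name
  name: string;
  rows: string[];
}

interface SeedResult {
  created: string[];
  replaced: string[];
  skipped: string[];
}

// `store` stands in for the database table, keyed by seedKey.
function seedDatasets(
  store: Map<string, CuratedDataset>,
  definitions: CuratedDataset[],
  options: { force?: boolean } = {},
): SeedResult {
  const result: SeedResult = { created: [], replaced: [], skipped: [] };
  for (const def of definitions) {
    const exists = store.has(def.seedKey);
    if (exists && !options.force) {
      // Already seeded: leave the stored copy untouched.
      result.skipped.push(def.seedKey);
      continue;
    }
    // Insert a new dataset, or overwrite in place when force is set.
    store.set(def.seedKey, { ...def, rows: [...def.rows] });
    (exists ? result.replaced : result.created).push(def.seedKey);
  }
  return result;
}
```

Keying on `seedKey` rather than `name` is what makes a rename safe: a second run with a changed name still finds the existing entry and skips it, or overwrites it when `force` is set, instead of inserting a second copy.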

Open localhost:3500 and click Get started to sign in.

Note: Backend env needs no setup — backend/.env.example has correct defaults. If you edit Convex functions in frontend/convex/, run make convex-push to deploy the changes.

🛠 Tech Stack

Layer	Tech
Frontend	Next.js 16, React 19, Tailwind 4
Backend	Fastify, TypeScript (agent runner)
Auth	Clerk
Database	Convex (self-hosted)
Data Collection	TinyFish APIs (Search, Fetch, Browser)
Table view	TanStack Table + react-window virtualization
Analytics	PostHog — events, session replay, error tracking (optional)

📁 Project Structure

bigset/
├── frontend/            Next.js 16 — UI + Convex schema & functions
│   ├── convex/          Convex functions, schema, and auth config
│   └── .env.local       Clerk + Convex keys (not committed)
├── backend/             Fastify — agent runner, writes to Convex via HTTP
├── .env                 Clerk keys for docker-compose (not committed)
├── docker-compose.dev.yml
└── Makefile

🏗 Building in Public

BigSet is a work in progress. We're building in the open because the best ideas come from the people who actually want to use the thing.

We'd love your feedback, ideas, or help building — come say hi:

🐦 Twitter: @Tiny_Fish for project updates
🗣 Twitter: @not_simantak for the unfiltered version
🐛 GitHub Issues: Report bugs or request features

🤝 Contributing

Contributions are very welcome — whether it's code, feedback, or just telling us what datasets you'd want to build.

1. Fork the repo
2. Create a branch (git checkout -b my-feature)
3. Make your changes
4. Run bash scripts/verify-authz.sh to confirm the authorization layer still holds
5. Open a PR

If you're not sure where to start, open an issue or come say hi.

📄 License

AGPL-3.0
