Congratulations — you made it to the technical challenge round. We're genuinely excited to see how you approach this.
This repository contains a hands-on data engineering challenge for two specialization tracks. We use it across our hiring process because it reflects the kind of work our team does every day.
What matters most to us is not a perfect implementation — it's how you think. We want to see the decisions you make, the trade-offs you consider, and how you communicate your reasoning.
We believe a modern data engineer's work spans two main areas, and the challenge offers one track for each:
**Platform Engineer.** You're interested in the infrastructure layer: how data pipelines are built, orchestrated, and maintained. You care about reusability, testability, and making the right architectural decisions.
→ Read docs/platform_track.md for the full brief.
**DWH Engineer.** You're interested in the data modeling layer: how raw data is shaped into clean, business-ready tables. You care about SQL quality, data contracts, and making data useful for analysts and BI tools.
→ Read docs/dwh_track.md for the full brief.
You work at a fictional company that runs an online shop. The company's product catalog, user base, and order history are exposed through a REST API:
FakeStore API: https://fakestoreapi.com
Three endpoints are available:
| Endpoint | What it returns |
|---|---|
| `/products` | Product catalog (20 items across 4 categories) |
| `/users` | Registered customers (10 users) |
| `/carts` | Shopping carts / orders (20 carts with line items) |
The data engineering team needs to ingest this data, transform it, and make it available for reporting. Your task is to build that pipeline — or the data model on top of it, depending on your track.
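To get a feel for the data before wiring anything up, here is a minimal sketch of flattening one cart's nested line items into one row per product — the kind of unnesting the pipeline will eventually do. The field names follow the FakeStore `/carts` response shape; the values below are illustrative, not real API output.

```python
import json

# One cart as returned by the FakeStore /carts endpoint
# (field names per the API; the values here are made up).
raw_cart = json.loads("""
{
  "id": 1,
  "userId": 1,
  "date": "2020-03-02T00:00:00.000Z",
  "products": [
    {"productId": 1, "quantity": 4},
    {"productId": 2, "quantity": 1}
  ]
}
""")

def flatten_cart(cart: dict) -> list[dict]:
    """Explode the nested products array into one row per line item."""
    return [
        {
            "cart_id": cart["id"],
            "user_id": cart["userId"],
            "order_date": cart["date"],
            "product_id": item["productId"],
            "quantity": item["quantity"],
        }
        for item in cart["products"]
    ]

rows = flatten_cart(raw_cart)
print(rows[0])
# → {'cart_id': 1, 'user_id': 1, 'order_date': '2020-03-02T00:00:00.000Z', 'product_id': 1, 'quantity': 4}
```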
- Click "Code" → "Open with Codespaces" at the top of this page
- Wait for the environment to build (~2 minutes)
- Run `pixi run start-dev` in the terminal
- Open the Dagster UI at localhost:3000
- Materialize the `ingestion` asset group (this fetches data from the API and writes it to DuckDB)
- Open `notebooks/explore.ipynb` with the `dev` kernel to explore the raw data
- Choose your track and start building — see `docs/` for your brief
Requirements: pixi (the project uses pixi for dependency management)
```shell
# Install all dependencies
pixi install -e dev

# Start Dagster dev server
pixi run start-dev

# Open Jupyter Lab
pixi run notebook
```

Other useful commands:
```shell
make lint        # Run ruff linter
make format      # Auto-format with ruff
make typecheck   # Run pyright
make test        # Run pytest with coverage
make dbt-run     # Run all dbt models
make dbt-test    # Run all dbt tests
make dbt-docs    # Generate and serve dbt documentation
```

| Tool | Purpose |
|---|---|
| Dagster | Data orchestration and asset lineage |
| dbt | SQL transformations and data modeling |
| DuckDB | Local analytical database |
| Pixi | Dependency and environment management |
| Ruff | Python linting and formatting |
| Pyright | Static type checking |
| pytest | Testing framework |
| Jupyter Lab | Data exploration |
For Platform Engineers:
- Architectural decision-making — what trade-offs did you consider and why did you land where you did?
- Reusable, testable code — could another engineer extend this without needing to ask you?
- Observability — does the pipeline tell you what happened when something goes wrong?
- Pragmatism — a working, well-reasoned solution beats an elegant but incomplete one
For DWH Engineers:
- Data modeling craft — is the model correct, clear, and maintainable?
- SQL quality — is the SQL readable, well-structured, and appropriately tested?
- Business understanding — do the mart models answer real business questions?
- Documentation — would an analyst be able to use your models without asking you?
```
.
├── docs/
│   ├── platform_track.md        # Platform Engineer brief
│   └── dwh_track.md             # DWH Engineer brief
├── src/
│   ├── code_location_de/        # Dagster code location (Python)
│   │   └── code_location_de/
│   │       ├── assets/
│   │       │   ├── ingestion/   # PROVIDED: raw data ingestion from API
│   │       │   └── platform/    # STUB: Platform Engineer implements here
│   │       ├── resources/       # PROVIDED: API client, IO manager config
│   │       └── checks/          # STUB: DWH Engineer adds asset checks here
│   └── code_location_de_dbt/    # dbt project (SQL transformations)
│       └── models/
│           ├── staging/         # STUB: one model per raw source table
│           ├── intermediate/    # STUB: joins and unnesting
│           └── marts/           # STUB: reporting-ready tables
├── notebooks/
│   └── explore.ipynb            # PROVIDED: explore raw API data
├── pyproject.toml               # Dependencies and tool config (Pixi)
├── workspace.yaml               # Dagster workspace config
└── Makefile                     # Convenience commands
```
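The `intermediate/` layer is where nested cart line items get unnested into one row per product, in SQL this time. DuckDB's JSON functions cover this; as a self-contained sketch, here is the same idea using SQLite's `json_each`, which ships with Python. The table and column names are placeholders, not the project's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Pretend "raw_carts" is the ingested table: one JSON document per cart.
con.execute("CREATE TABLE raw_carts (payload TEXT)")
con.execute(
    "INSERT INTO raw_carts VALUES (?)",
    (
        '{"id": 1, "userId": 1, '
        '"products": [{"productId": 1, "quantity": 4}, '
        '{"productId": 2, "quantity": 1}]}',
    ),
)

# Intermediate-style model: one row per cart line item. SQLite's
# json_each unnests the products array; DuckDB's JSON extension
# offers comparable extraction and unnesting functions.
rows = con.execute(
    """
    SELECT
        json_extract(payload, '$.id')      AS cart_id,
        json_extract(payload, '$.userId')  AS user_id,
        json_extract(value, '$.productId') AS product_id,
        json_extract(value, '$.quantity')  AS quantity
    FROM raw_carts, json_each(payload, '$.products')
    """
).fetchall()

print(rows)  # [(1, 1, 1, 4), (1, 1, 2, 1)]
```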
- Fork this repository
- Implement your solution on a branch
- Open a pull request back to the main branch of your fork (so we can see a clean diff)
- Share the repository link with us
Include a short write-up in your PR description:
- What you implemented vs. what you skipped and why
- Anything you'd do differently with more time
- Which AI tools you used and how
Before you start, we recommend spending 30–60 minutes with these resources:
- Dagster concepts: Assets
- Dagster concepts: Resources
- Dagster concepts: Asset checks
- dbt: How we structure our projects
- dbt: Testing
- DuckDB: JSON functions
You don't need to read all of this — it's here so you know where to look when you need it.