Congratulations — you made it to the technical challenge round. We're genuinely excited to see how you approach this.
This repository contains a hands-on data engineering challenge for two specialization tracks. We use it across our hiring process because it reflects the kind of work our team does every day.
What matters most to us is not a perfect implementation — it's how you think. We want to see the decisions you make, the trade-offs you consider, and how you communicate your reasoning.
We believe a modern data engineer's work spans two main areas, and the challenge offers one track for each:
**Platform Engineer.** You're interested in the infrastructure layer: how data pipelines are built, orchestrated, and maintained. You care about reusability, testability, and making the right architectural decisions.
→ Read docs/platform_track.md for the full brief.
**DWH Engineer.** You're interested in the data modeling layer: how raw data is shaped into clean, business-ready tables. You care about SQL quality, data contracts, and making data useful for analysts and BI tools.
→ Read docs/dwh_track.md for the full brief.
You work at a fictional company that runs an online shop. The company's product catalog, user base, and order history are exposed through a REST API:
FakeStore API: https://fakestoreapi.com
Three endpoints are available:
| Endpoint | What it returns |
|---|---|
| `/products` | Product catalog (20 items across 4 categories) |
| `/users` | Registered customers (10 users) |
| `/carts` | Shopping carts / orders (20 carts with line items) |
The data engineering team needs to ingest this data, transform it, and make it available for reporting. Your task is to build that pipeline — or the data model on top of it, depending on your track.
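To get a feel for the data before wiring anything up, here is a minimal sketch of flattening one cart's nested line items into one row per product — the kind of unnesting the pipeline will eventually do. The field names follow the FakeStore `/carts` response shape; the values below are illustrative, not real API output.

```python
import json

# One cart as returned by the FakeStore /carts endpoint
# (field names per the API; the values here are made up).
raw_cart = json.loads("""
{
  "id": 1,
  "userId": 1,
  "date": "2020-03-02T00:00:00.000Z",
  "products": [
    {"productId": 1, "quantity": 4},
    {"productId": 2, "quantity": 1}
  ]
}
""")

def flatten_cart(cart: dict) -> list[dict]:
    """Explode the nested products array into one row per line item."""
    return [
        {
            "cart_id": cart["id"],
            "user_id": cart["userId"],
            "order_date": cart["date"],
            "product_id": item["productId"],
            "quantity": item["quantity"],
        }
        for item in cart["products"]
    ]

rows = flatten_cart(raw_cart)
print(rows[0])
# → {'cart_id': 1, 'user_id': 1, 'order_date': '2020-03-02T00:00:00.000Z', 'product_id': 1, 'quantity': 4}
```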
- Click "Code" → "Open with Codespaces" at the top of this page
- Wait for the environment to build (~2 minutes)
- Run `pixi run start-dev` in the terminal
- Open the Dagster UI at localhost:3000
- Materialize the `ingestion` asset group (this fetches data from the API and writes it to DuckDB)
- Open `notebooks/explore.ipynb` with the `dev` kernel to explore the raw data
- Choose your track and start building — see `docs/` for your brief
Requirements: pixi (the project uses pixi for dependency management)
```shell
# Install all dependencies
pixi install -e dev

# Start Dagster dev server
pixi run start-dev

# Open Jupyter Lab
pixi run notebook
```

Other useful commands:
```shell
make lint        # Run ruff linter
make format      # Auto-format with ruff
make typecheck   # Run pyright
make test        # Run pytest with coverage
make dbt-run     # Run all dbt models
make dbt-test    # Run all dbt tests
make dbt-docs    # Generate and serve dbt documentation
```

| Tool | Purpose |
|---|---|
| Dagster | Data orchestration and asset lineage |
| dbt | SQL transformations and data modeling |
| DuckDB | Local analytical database |
| Pixi | Dependency and environment management |
| Ruff | Python linting and formatting |
| Pyright | Static type checking |
| pytest | Testing framework |
| Jupyter Lab | Data exploration |
For Platform Engineers:
- Architectural decision-making — what trade-offs did you consider and why did you land where you did?
- Reusable, testable code — could another engineer extend this without needing to ask you?
- Observability — does the pipeline tell you what happened when something goes wrong?
- Pragmatism — a working, well-reasoned solution beats an elegant but incomplete one
For DWH Engineers:
- Data modeling craft — is the model correct, clear, and maintainable?
- SQL quality — is the SQL readable, well-structured, and appropriately tested?
- Business understanding — do the mart models answer real business questions?
- Documentation — would an analyst be able to use your models without asking you?
```
.
├── docs/
│   ├── platform_track.md        # Platform Engineer brief
│   └── dwh_track.md             # DWH Engineer brief
├── src/
│   ├── code_location_de/        # Dagster code location (Python)
│   │   └── code_location_de/
│   │       ├── assets/
│   │       │   ├── ingestion/   # PROVIDED: raw data ingestion from API
│   │       │   └── platform/    # STUB: Platform Engineer implements here
│   │       ├── resources/       # PROVIDED: API client, IO manager config
│   │       └── checks/          # STUB: DWH Engineer adds asset checks here
│   └── code_location_de_dbt/    # dbt project (SQL transformations)
│       └── models/
│           ├── staging/         # STUB: one model per raw source table
│           ├── intermediate/    # STUB: joins and unnesting
│           └── marts/           # STUB: reporting-ready tables
├── notebooks/
│   └── explore.ipynb            # PROVIDED: explore raw API data
├── pyproject.toml               # Dependencies and tool config (Pixi)
├── workspace.yaml               # Dagster workspace config
└── Makefile                     # Convenience commands
```
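The `intermediate/` layer is where nested cart line items get unnested into one row per product, in SQL this time. DuckDB's JSON functions cover this; as a self-contained sketch, here is the same idea using SQLite's `json_each`, which ships with Python. The table and column names are placeholders, not the project's actual schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Pretend "raw_carts" is the ingested table: one JSON document per cart.
con.execute("CREATE TABLE raw_carts (payload TEXT)")
con.execute(
    "INSERT INTO raw_carts VALUES (?)",
    (
        '{"id": 1, "userId": 1, '
        '"products": [{"productId": 1, "quantity": 4}, '
        '{"productId": 2, "quantity": 1}]}',
    ),
)

# Intermediate-style model: one row per cart line item. SQLite's
# json_each unnests the products array; DuckDB's JSON extension
# offers comparable extraction and unnesting functions.
rows = con.execute(
    """
    SELECT
        json_extract(payload, '$.id')      AS cart_id,
        json_extract(payload, '$.userId')  AS user_id,
        json_extract(value, '$.productId') AS product_id,
        json_extract(value, '$.quantity')  AS quantity
    FROM raw_carts, json_each(payload, '$.products')
    """
).fetchall()

print(rows)  # [(1, 1, 1, 4), (1, 1, 2, 1)]
```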
- Fork this repository
- Implement your solution on a branch
- Open a pull request back to the main branch of your fork (so we can see a clean diff)
- Share the repository link with us
Include a short write-up in your PR description:
- What you implemented vs. what you skipped and why
- Anything you'd do differently with more time
- Which AI tools you used and how
Before you start, we recommend spending 30–60 minutes with these resources:
- Dagster concepts: Assets
- Dagster concepts: Resources
- Dagster concepts: Asset checks
- dbt: How we structure our projects
- dbt: Testing
- DuckDB: JSON functions
You don't need to read all of this — it's here so you know where to look when you need it.