Coding Challenge - Data Engineer @ Vizzuality

What is the purpose of this challenge?

With this challenge we would like to see a little bit more about how you work and the way you make decisions. Specifically, the challenge will help us see:

Your current technical knowledge
Your thought process
The way you work and organize

The challenge will also give you a glimpse of the kind of technical work we do on a daily basis and the kind of datasets we work with.

Please, feel free to surprise us, and showcase any skills that you think are important!

Keep in mind that there is no right or wrong answer. If you feel like your process isn't perfect, don't worry. This is just meant to be an exercise to help us gauge where you are in terms of your current capacity and be a talking point during the next interview.

The challenge

Overview

The challenge consists of creating a data pipeline that takes a raster dataset, summarizes it by administrative regions and stores the results in a relational database. Specifically, we want to summarize the total ecosystem carbon of the northern lakes region in the USA using data from the National Forest Carbon Monitoring System.

The result must be a database with the total carbon values for each county of the states of Michigan, Wisconsin and Minnesota. To achieve this, we want you to create a simple Python pipeline that loads the raster, computes the zonal statistics and writes the values to the database.
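To make the term concrete, the zonal statistic described above can be sketched in plain Python. This is only a toy illustration under stated assumptions: the function name `zonal_sum`, the zone ids, the nodata value and all numbers are hypothetical placeholders, and a real pipeline would more likely read the raster and county geometries with geospatial libraries than with nested lists.

```python
from collections import defaultdict


def zonal_sum(values, zones, nodata=None):
    """Sum raster cell values per zone id.

    `values` and `zones` are equally shaped 2D lists. Cells whose value
    equals `nodata`, or whose zone id is None (outside every region),
    are skipped.
    """
    totals = defaultdict(float)
    for value_row, zone_row in zip(values, zones):
        for value, zone in zip(value_row, zone_row):
            if zone is None or value == nodata:
                continue
            totals[zone] += value
    return dict(totals)


# Toy 3x3 raster (placeholder numbers, not real carbon data).
values = [
    [1.0, 2.0, -9999.0],
    [3.0, 4.0, 5.0],
    [6.0, -9999.0, 7.0],
]
# Hypothetical zone ids, e.g. county codes rasterized onto the same grid.
zones = [
    ["A", "A", "B"],
    ["A", "B", "B"],
    [None, "B", "B"],
]

totals = zonal_sum(values, zones, nodata=-9999.0)
print(totals)
```

Skipping nodata cells explicitly matters here: a sentinel such as -9999 would otherwise be added to the county total and silently corrupt it.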

Data

Total ecosystem carbon raster from the National Forest Carbon Monitoring System.
Use any administrative boundary source for USA counties you think is appropriate.

Goals

As a minimum, we expect you to deliver these 4 points:

Deliver a simple and reproducible Python data pipeline.
The pipeline must be easily reproducible end to end. This means that all the setup instructions or programs must be part of the deliverable.
The results must be accurate and correct. Watch out for the units: there are some clues in the metadata documents.
Include instructions on how to query the results, so that, after executing the pipeline, we are able to perform such queries.
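The last two goals can be sketched together: a unit conversion followed by a load-and-query step. Everything specific here is an assumption made for illustration. The example pretends the raster stores a per-hectare density on 50 m x 50 m pixels (the real units and resolution must be taken from the metadata), uses SQLite from the standard library as a stand-in for whichever relational database you choose, and the table name `county_carbon`, its columns and all values are hypothetical placeholders rather than real results.

```python
import sqlite3

SQUARE_METRES_PER_HECTARE = 10_000


def density_sum_to_total(density_sum, pixel_width_m, pixel_height_m):
    """Convert a sum of per-hectare densities into a total amount.

    Each pixel contributes density * pixel area, so a sum of densities
    must be scaled by the pixel area expressed in hectares.
    """
    pixel_area_ha = pixel_width_m * pixel_height_m / SQUARE_METRES_PER_HECTARE
    return density_sum * pixel_area_ha


# Placeholder per-county density sums (id, name, state, sum of pixel values).
density_sums = [
    ("00001", "Example County A", "Michigan", 1000.0),
    ("00002", "Example County B", "Wisconsin", 2500.0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE county_carbon (
        county_id    TEXT PRIMARY KEY,
        county_name  TEXT NOT NULL,
        state        TEXT NOT NULL,
        total_carbon REAL NOT NULL
    )
    """
)
# Assumed 50 m x 50 m pixels, i.e. 0.25 ha each (hypothetical resolution).
rows = [
    (cid, name, state, density_sum_to_total(total, 50.0, 50.0))
    for cid, name, state, total in density_sums
]
conn.executemany("INSERT INTO county_carbon VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Example query a reviewer could run after the pipeline finishes.
michigan = conn.execute(
    "SELECT county_name, total_carbon FROM county_carbon "
    "WHERE state = ? ORDER BY county_name",
    ("Michigan",),
).fetchall()
print(michigan)
```

Documenting one or two queries like the final `SELECT` alongside the connection details is usually enough for someone else to verify the results.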

Optional Goals

If you feel confident and want to go the extra mile, show us more skills, and surprise us, you can add some or all of the points below:

Share an initial exploration of the input datasets with some visualization in a notebook or similar medium.
A map with the results.
Do you think something is missing, or that you can add useful features? Go for it!

Tools

Apart from Python, use any tools you are comfortable with.

Things to keep in mind

Be pragmatic and mindful of the trade-off between feature-completeness and complexity/performance. A complete solution is better than a flashy one. Keep it simple.
About the use of AI assistance: as with any other tool, you are allowed to use it. Nonetheless, we expect the delivered project to be entirely yours, and we expect you to understand and be able to defend every aspect of your decisions. We want to know how you approached the problem, not how an LLM does, so keep any AI tools contained and under your control.

How should I deliver the results?

As a link to a reproducible and self-contained repository on your preferred git platform (GitHub, GitLab, Codeberg...)

How much time should I spend?

Based on our experience, we believe you shouldn't spend more than 6 hours. But ultimately, how much time you dedicate to the challenge is up to you. We will also be talking about allocated time during the interview.

What if I have questions?

Email us any questions and we will answer as soon as possible.

What will happen in the interview?

In the upcoming interview we’ll focus on your coding challenge submission. We will expect you to explain your code to an audience that will include members of the Science team but also one or two people from other functional areas (Design, Front-End, Back-End, Project Management, etc.). We will ask you any clarifying questions we might have.
This will be an opportunity for you to provide some more context about the challenge, explain the assumptions you made, and add anything else you might want. The technical solution is not the only thing that we value; your approach and explanations matter too.
Finally, we will also allocate some time for you to ask questions about anything you would like to know more about (e.g. the role, how we work at Vizzuality, our culture, benefits, etc.).
The interview will last at most 120 minutes.
