Choose whatever language you’re most comfortable with to solve these problems. The tasks are designed to test your ability to work with data and scripts and showcase relevant skills for the DevOps role at Revieve.
The ACME Inc. tool supply company manages its operations with three CSV files:
- `customers.csv` keeps customer information:
  - `id` is a numeric customer id
  - `firstname` is the customer's first name
  - `lastname` is the customer's last name
- `products.csv` keeps product info:
  - `id` is a numeric product id
  - `name` is the human-readable name
  - `cost` is the product cost in euros
- `orders.csv` keeps order information:
  - `id` is a numeric order id
  - `customer` is the numeric id of the customer who created the order
  - `products` is a space-separated list of product ids ordered by the customer
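As a starting point, the three files could be loaded with Python's standard `csv` module. This is only a sketch, assuming the file and column names exactly match the description above:

```python
import csv

def load_csv(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

customers = load_csv("customers.csv")  # columns: id, firstname, lastname
products = load_csv("products.csv")    # columns: id, name, cost
orders = load_csv("orders.csv")        # columns: id, customer, products
```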
Manually dealing with those files is hard and error-prone, and they've asked for your help writing some code to make their lives easier.
Right now, `orders.csv` doesn't include total order cost information. We need to use the data in these files to emit an `order_prices.csv` file with the following columns:
- `id` is the numeric id of the order
- `euros` is the total cost of the order
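A minimal sketch of this step, assuming the files were loaded as in the snippet above and that every product id in an order exists in `products.csv`; formatting `euros` to two decimals is an assumption, not part of the spec:

```python
import csv

def write_order_prices(products, orders, out_path="order_prices.csv"):
    # Index product cost by product id for O(1) lookups per line item.
    cost_by_id = {p["id"]: float(p["cost"]) for p in products}
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "euros"])
        for order in orders:
            # "products" is a space-separated list of product ids.
            total = sum(cost_by_id[pid] for pid in order["products"].split())
            writer.writerow([order["id"], f"{total:.2f}"])
```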
The marketing department wants to know which customers are interested in each product; they've asked for a `product_customers.csv` file that, for each product, gives the list of customers who have purchased this product:
- `id` is the numeric product id
- `customer_ids` is a space-separated list of customer ids of the customers who have purchased this product
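One way this could look, again as a sketch: the spec doesn't say whether repeat buyers should appear once or how ids should be ordered, so de-duplicating and sorting numerically are assumptions here:

```python
import csv

def write_product_customers(orders, out_path="product_customers.csv"):
    buyers = {}  # product id -> set of customer ids (assumption: de-duplicate)
    for order in orders:
        for pid in order["products"].split():
            buyers.setdefault(pid, set()).add(order["customer"])
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "customer_ids"])
        for pid in sorted(buyers, key=int):  # assumption: sort numerically
            writer.writerow([pid, " ".join(sorted(buyers[pid], key=int))])
```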
To evaluate our customers, we need a `customer_ranking.csv` containing the following columns, ranked in descending order by `total_euros`:
- `id` is the numeric id of the customer
- `firstname` is the customer's first name
- `lastname` is the customer's last name
- `total_euros` is the total euros this customer has spent on products
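A sketch under the same assumptions as above; whether customers with no orders should appear at all (here: yes, with a total of 0.00) is not specified:

```python
import csv

def write_customer_ranking(customers, products, orders,
                           out_path="customer_ranking.csv"):
    cost_by_id = {p["id"]: float(p["cost"]) for p in products}
    totals = {c["id"]: 0.0 for c in customers}  # customers w/o orders -> 0.00
    for order in orders:
        totals[order["customer"]] += sum(
            cost_by_id[pid] for pid in order["products"].split()
        )
    ranked = sorted(customers, key=lambda c: totals[c["id"]], reverse=True)
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "firstname", "lastname", "total_euros"])
        for c in ranked:
            writer.writerow(
                [c["id"], c["firstname"], c["lastname"],
                 f"{totals[c['id']]:.2f}"]
            )
```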
- Automation: Create a script or configuration to automate the execution of these data processing tasks. Consider using a task runner or a simple CI/CD pipeline (e.g., GitHub Actions) to regenerate the CSV files whenever new data is added; a workflow sketch follows this list.
- Documentation: Provide clear and concise documentation on how to run your solution, including any setup steps and dependencies.
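For the automation item, a GitHub Actions workflow along these lines could regenerate the files on every push that touches the input CSVs. The workflow file name and the `process.py` entry point are placeholders, not part of the task:

```yaml
# .github/workflows/generate.yml — hypothetical names throughout
name: Generate report CSVs
on:
  push:
    paths:
      - "customers.csv"
      - "products.csv"
      - "orders.csv"
jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # Assumed entry point that produces the three output CSVs.
      - run: python process.py
```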
For those who wish to go above and beyond, consider implementing one or more of the following optional features:
- Unit Testing:
- Write unit tests for your data processing scripts using a testing framework of your choice.
- Include the tests in your GitHub Actions workflow to run automatically on each push and pull request.
- Error Handling and Logging:
- Implement robust error handling and logging for your scripts.
- Use a logging library to capture and store logs.
- Code Quality:
- Ensure your code follows best practices for code quality and style.
- Use tools like flake8, pylint, or black to lint and format your code.
- Performance Optimization:
- Optimize your scripts for better performance, especially if handling large datasets.
- Provide a brief explanation of the optimizations you implemented.
- Dockerization:
- Create a Dockerfile to containerize your scripts (a sketch follows this list).
- Update the GitHub Actions workflow to use the Docker container for running the scripts.
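For the Dockerization option, a minimal Dockerfile might look like the following; `process.py` and the `acme-reports` image tag are placeholders:

```dockerfile
# Hypothetical sketch; "process.py" and the image tag are placeholders.
FROM python:3.12-slim
WORKDIR /app
COPY . .
# Build: docker build -t acme-reports .
# Run:   docker run --rm -v "$PWD:/app" acme-reports
# Mounting the host directory lets the generated CSVs land back on the host.
ENTRYPOINT ["python", "process.py"]
```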
- Fork this repository to your own GitHub account.
- Implement your solutions.
- Create a pull request in your forked repository with your solution.
- Ensure your pull request includes clear instructions and explanations of your approach.