This is my repository for the Data Science course at Olin College, taught by Professor Zach del Rosario during the spring 2023 semester. The curriculum is made from open-source data science exercises, intended to take a student from zero coding experience to basic data science literacy. These exercises are heavily inspired by the (discontinued) Data Challenge Lab at Stanford University and rely on the Tidyverse.
- Curriculum contains the desired learning outcomes of this material
- Exercises contains the exercises, which provide a first introduction to using the Tidyverse to do Data Science
- Challenges contains more open-ended data challenges, which will test and build upon your skills from the exercises
- Content visualization is a script that helps sequence course content and visualize topics
- Sequencing is a script that helps assign exercise and challenge due dates
Data Science is a powerful toolkit to extract usable insights from data. In this class, you will learn tools and gain understanding. You will use software tools to liberate data from published images and tables, wrangle messy datasets into machine learning (ML)-ready form, fit and interpret ML models, and visualize data to extract meaning. You will also speak the language of uncertainty (statistics) to avoid getting fooled by models. You will criticize published findings and ask what is, and what is not, in the data. Assignments will include regular practice exercises, progressively puzzling real-data challenges, and a final project of your choice where you obtain, wrangle, and understand a dataset.
I welcome suggestions and contributions! If you want to contribute, please see Contributing.