This is the code repo for the O'Reilly book "Data Science: The Hard Parts" to be released in September 2023.
If you're new to Python, I recommend downloading the Anaconda Distribution. Please follow the installation instructions provided there.
All code is provided as chapter-specific Jupyter notebooks (download instructions here) that are rendered on GitHub.
- Chapter 1: So What? Creating Value With Data Science
- Chapter 2: Metrics Design
- Chapter 3: Growth Decompositions: Understanding Tailwinds and Headwinds
- Chapter 4: 2x2 Designs
- Chapter 5: Building Business Cases
- Chapter 6: What's In a Lift?
- Chapter 7: Narratives
- Chapter 8: Datavis: Choosing the Right Plot to Deliver a Message
- Chapter 9: Simulation and Bootstrapping
- Chapter 10: Linear Regression: Going Back to Basics
- Chapter 11: Data Leakage
- Chapter 12: Productionizing Models
- Chapter 13: Storytelling in ML
- Chapter 14: From Prediction to Decisions
- Chapter 15: Incrementality: The Holy Grail of Data Science?
- Chapter 16: A/B Tests
- Chapter 17: Large Language Models and the Practice of Data Science
I recently started writing blog posts on my Substack. You can find the accompanying code here.