Course Description

This class is aimed at aspiring programmers who are learning their first programming language. It may also suit programmers who are at a beginner-to-intermediate level in another language and want a jump start in learning Python. Such students will also clarify their understanding of which aspects of programming are broadly shared among languages and which are idiosyncratic to the language they already know.

The majority of this course is dedicated to the foundational aspects of programming. However, these are often easier to explain through concrete examples, and many of the examples are geared towards applications in analytics, statistics, and artificial intelligence. This focus skews the class slightly, but students will still build a foundation strong enough to learn other domain-specific skills. It also means that a little prior statistics knowledge will be helpful during the last two classes.

Students who take this class will be able to:

  • Identify and describe the foundations of all programming languages, especially the use of:
    • Variables,
    • Common data processing techniques,
    • Control flow,
    • Functions,
    • Scope,
    • Imports, libraries, and APIs
  • Use Python to import and process data.
  • Work with libraries and APIs in Python.
  • Use Python libraries to create charts and graphs.
  • Describe artificial intelligence and machine learning at a high level.
  • Use scikit-learn to perform ML tasks.

Course Outline

1. The Big Picture and The Very Basics

This class begins with the end by examining the code we hope to write and fully understand by the final class. After examining an example of a complete machine learning task, we'll dive into the basics in order to demystify the code.

  • What does it mean to program a computer?
  • What can programming languages do?
  • Why Python? How is it different from other languages?
  • Variables and data types
  • Combining data with operations
  • Exercise:
    • Write and run your first simple script.
    • Use a debugger to examine code examples.
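
As a rough idea of what a first script might look like, here is a minimal sketch of variables, data types, and operations. The variable names and the hours-per-class figure are made up for illustration and are not part of the course materials.

```python
# Variables bind names to values, and every value has a type.
# All values below are illustrative assumptions.
course_name = "Intro to Python"   # str
class_count = 8                   # int
hours_per_class = 2.5             # float (hypothetical figure)
is_beginner_friendly = True       # bool

# Operations combine existing values into new ones.
total_hours = class_count * hours_per_class          # int * float gives a float
summary = course_name + ": " + str(total_hours) + " hours"

print(type(class_count).__name__)  # int
print(summary)                     # Intro to Python: 20.0 hours
```

Stepping through these lines in a debugger shows each name appearing with its value and type as the script runs.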

2. Complex Data and Control Flow

Modern programs, especially machine learning programs, rely on large, complex datasets. In this class we'll examine the two most foundational building blocks of complex data: lists and dictionaries. Once we have seen these two collection types, we'll look at how to iteratively and selectively process individual items from the collections.

  • Lists and dictionaries
  • Control Flow (if/elif/else)
  • Looping
  • Exercise:
    • Create and modify lists and dictionaries
    • Search for items in a list
    • Create, modify, and select from nested lists and dictionaries
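
The topics above can be sketched together in one short script. This is only an illustration under assumed data: the student names, scores, and grade thresholds are invented for the example.

```python
# A list of dictionaries: each dictionary describes one student.
# The names, scores, and thresholds are made up for illustration.
students = [
    {"name": "Ada", "scores": [90, 85]},
    {"name": "Grace", "scores": [70, 65]},
]

# Modify the list, then modify a list nested inside a dictionary.
students.append({"name": "Alan", "scores": [50, 55]})
students[0]["scores"].append(95)

# Loop over the collection and branch on each student's average.
labels = {}
for student in students:
    average = sum(student["scores"]) / len(student["scores"])
    if average >= 85:
        labels[student["name"]] = "high"
    elif average >= 60:
        labels[student["name"]] = "medium"
    else:
        labels[student["name"]] = "low"

# Search for an item in a list.
names = []
for student in students:
    names.append(student["name"])
grace_position = names.index("Grace") if "Grace" in names else None

print(labels)
print(grace_position)
```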

3. Functions and Classes

In Python (and most other programming languages) there are two fundamental ways to organize code into logical units: functions and classes. Reusing blocks of code and creating complex data structures both rely heavily on the existence of functions and classes. Similarly, the libraries that programmers use to execute AI and ML workloads are organized into importable functions and classes.

  • Functions
  • Classes
  • Code reuse
  • Exercise:
    • Create and use functions to process data
    • Create and use classes to manage complex data and processes
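
To make the distinction concrete, here is a small sketch in which a function packages one reusable computation and a class bundles data with the operations on it. `mean` and `Gradebook` are hypothetical names chosen for this example.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


class Gradebook:
    """Hypothetical example class: stores scores per student."""

    def __init__(self):
        # Maps a student name to that student's list of scores.
        self.scores = {}

    def add_score(self, name, score):
        # Create the student's list on first use, then append to it.
        self.scores.setdefault(name, []).append(score)

    def average(self, name):
        # Reuse the standalone function instead of repeating its logic.
        return mean(self.scores[name])


gradebook = Gradebook()
gradebook.add_score("Ada", 90)
gradebook.add_score("Ada", 80)
print(gradebook.average("Ada"))  # 85.0
```

The same pattern appears in ML libraries: a model is typically an object that holds its own state and exposes methods for working with it.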

4. Working With Libraries and APIs Part 1

Python is popular for machine learning in no small part due to the wealth of available open source libraries and APIs focused on ML tasks. The ecosystem surrounding the language is at least as important as the language itself in terms of how the language ends up being used in the software industry.

  • Importing and using libraries and modules.
  • Reading documentation.
  • Searching for tutorials and answers to questions.
  • Using Jupyter Notebooks vs a text editor.
  • Exercise:
    • Use Jupyter Notebook, Pandas, and Matplotlib to perform descriptive statistics and create charts.
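
The import forms look the same regardless of which library is being imported. The sketch below deliberately uses only Python's standard library (`statistics`, `collections`, and `math`) as a stand-in for Pandas and Matplotlib, so it runs without installing anything. The list of ages is made-up data.

```python
import statistics                # import a whole module
from collections import Counter  # import a single name from a module
import math as m                 # import a module under a shorter alias

# Made-up data for illustration.
ages = [23, 25, 25, 31, 40]

# Simple descriptive statistics.
average_age = statistics.mean(ages)
median_age = statistics.median(ages)
most_common_age, occurrences = Counter(ages).most_common(1)[0]
rounded_up_average = m.ceil(average_age)

print(average_age, median_age, most_common_age, occurrences, rounded_up_average)
```

Third-party libraries are imported the same way once they are installed; conventional aliases such as `import pandas as pd` follow the third form shown here.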

5. AI/ML In Python and K-Nearest Neighbors

After a primer on modern AI and ML, we'll finally get back to the example we saw in the first class. With our expanded understanding of Python and the foundations of computer programming, students will be challenged to expand on that example by building an additional model and adding useful charts.

  • What are artificial intelligence and machine learning?
  • What is supervised learning?
  • Scikit-learn's API.
  • Test/train/validation data splits.
  • The K-Nearest Neighbors algorithm.
  • Exercise:
    • Modify the parameters of a K-NN model and evaluate the changing performance.
    • Use scikit-learn to build a predictive model using a different algorithm.
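
As a sketch of the first exercise, the snippet below fits K-NN models with different values of `n_neighbors` and compares their accuracy on held-out data. It assumes scikit-learn is installed and uses the bundled Iris dataset purely as a stand-in; the dataset, the values of `k`, and the split size are assumptions, not the course's actual example. For brevity it uses a single train/test split and omits a separate validation set.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in dataset (an assumption; the course example may use other data).
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the rows for testing. random_state makes the
# split reproducible; stratify keeps class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit one model per value of k and record its test accuracy.
accuracy_by_k = {}
for k in (1, 5, 15):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    accuracy_by_k[k] = model.score(X_test, y_test)

print(accuracy_by_k)
```

Swapping `KNeighborsClassifier` for another scikit-learn estimator is usually a one-line change, because estimators share the same `fit` and `score` methods.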

6. Decision Trees, Random Forests, and Hyper-Parameter Search

In this class we'll introduce two more models, the Decision Tree and the Random Forest. We'll also introduce the concept of "hyper-parameter search", a widely used tactic for finding the hyper-parameter values that work best for a given dataset.

  • The Decision Tree algorithm
  • Ensemble models
  • Bagging
  • Random Forests
  • Hyper-parameter search
  • Exercise:
    • Apply grid search with decision trees and random forests.
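
One possible shape for this exercise is sketched below, assuming scikit-learn is installed. `GridSearchCV` tries every combination in a parameter grid and scores each one with cross-validation. The Iris dataset and the specific grids are assumptions chosen to keep the example small.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Stand-in dataset (an assumption for illustration).
X, y = load_iris(return_X_y=True)

# Search over tree depth for a single decision tree.
tree_search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [1, 2, 3, None]},
    cv=5,  # 5-fold cross-validation for every candidate
)
tree_search.fit(X, y)

# Search over forest size and tree depth for a random forest.
# The grid has 2 x 2 = 4 combinations, each evaluated with 5 folds.
forest_search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 50], "max_depth": [2, None]},
    cv=5,
)
forest_search.fit(X, y)

print(tree_search.best_params_, tree_search.best_score_)
print(forest_search.best_params_, forest_search.best_score_)
```

The cost of grid search grows multiplicatively with each added hyper-parameter, which is worth keeping in mind when choosing the grid.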

7. Back To The Big Picture: Further Study and What Can Go Wrong

Machine learning is powerful: the state of the art in ML is pushing boundaries in everything from language translation to facial recognition. In this final class we'll name some of the big topics we didn't have time to cover, discuss the current state of the art, and provide some signposts for your continuing journey in machine learning.

Machine learning is also dangerous. Many uses of machine learning tactics have been found to perpetuate and even exacerbate societal bias. As we discuss some of the most advanced ML tactics, we'll also discuss some cautionary tales about ML gone awry and advice for avoiding these pitfalls.

  • Popular ML use cases.
    • Facial recognition.
    • Language translation.
    • Fraud and spam detection.
    • Recommendation systems.
    • Generative models.
  • Prominent ML failures and some causes.
    • Class imbalance.
    • Assumed objectivity.
    • Poor data quality.
    • Adversarial data.
  • Exercise:
    • Work on your Kaggle challenge.
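
Class imbalance is one failure cause that is easy to demonstrate with a few lines of arithmetic. In the sketch below, a "model" that always predicts the majority class scores high on accuracy while finding none of the rare cases. The 95/5 split is invented data for illustration.

```python
# Invented data: 5 positive cases (1) among 95 negative cases (0).
labels = [1] * 5 + [0] * 95

# A useless "model" that always predicts the majority class.
predictions = [0] * len(labels)

# Accuracy: the fraction of predictions that match the labels.
correct = sum(p == t for p, t in zip(predictions, labels))
accuracy = correct / len(labels)

# Recall: the fraction of actual positives the model found.
true_positives = sum(p == 1 and t == 1 for p, t in zip(predictions, labels))
recall = true_positives / sum(labels)

print(accuracy)  # 0.95
print(recall)    # 0.0
```

A single headline metric can hide this kind of failure, which is one reason to check more than accuracy when evaluating your Kaggle model.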

8. Kaggle Challenge Presentations

In the previous class, students will have been presented with a Kaggle challenge and asked to build and train a model to complete it. In this class, each student will give a brief presentation about their work, their results, and what they learned.