I recently (early 2018) began learning Python for Data Science and have explored NumPy, Pandas and Matplotlib to clean, analyze and visualize data.
This repo is used to store practice exercises and a project for the workshops that I attended.
The program description (complete as of Feb 10, 2018):
Class Hours: 18
Our program serves as the foundation for many well-known concepts of data science. We teach practical techniques and algorithms for extracting and studying useful knowledge from data. This course is not a theory class as we believe there are many ways to learn statistics and analytics concepts on your own. We are providing students with a set of practical tools for data science and knowledge on how to apply Python to solve linear algebra, statistics, and probability problems. This course is designed to fill the gap between theoretical academic research and the needs of the industry. We will start with a crash course on the basics of the Python programming language and then learn how to use Python to turn raw data into insight and knowledge.
The program offers a fundamental introduction to data science using the Python programming language and the practical application of statistical, analytical and linear algebra models to a variety of data science projects. By the end, students should feel comfortable enough to apply the acquired knowledge on their own when seeking a junior data scientist position.
- Discover best practices for data analysis and start on the path to becoming a data scientist
- Get comfortable using Python to build statistical and analytical models
- Learn and practice essential tools for data analytics: NumPy, Pandas and Matplotlib
- Learn to find solutions to problems by analyzing data using appropriate tools
- Master your analytical skills by working on real life projects
- Explore graphical techniques to see what your data looks like
- Implement the core Data Science techniques of Linear Algebra, Probability, Gradient Descent, and Linear Regression
- Build your own analytical tools with Python from scratch
- Become familiar with industry standards and learn the best practices for writing code
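
To illustrate the "from scratch" goals in the list above (gradient descent and linear regression), here is a minimal sketch in plain Python. It is not taken from the course materials: the function name `fit_line`, the learning rate, the step count and the data are all assumptions chosen for illustration.

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = slope * x + intercept by batch gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Residuals of the current line against the observed values.
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        # Partial derivatives of the mean squared error.
        grad_slope = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2.0 / n * sum(errors)
        # Step against the gradient.
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Made-up data lying exactly on y = 2x + 1 (an assumption, not course data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(round(slope, 3), round(intercept, 3))
```

With noisier data or larger `x` values, the learning rate would likely need tuning, since too large a step makes the updates diverge.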
Lectures on new topics take about 90 minutes and start at 10:00 am. After the lecture, students work on new exercises with instructor guidance. Around 1:00 pm, students present and discuss their work with instructors and learn alternative solutions and best practices from instructors and invited data science professionals.
Session 1
- Variables
- Data types: strings, integers, floats, lists
- Mutability
- Control Flow statements
- If statements
- For loops
- Practical Exercises
Session 2
- Functions
- Data types: tuples, dictionaries, sets
- While loops
- Indexing and slicing
- Reading from CSV and TXT Files
- Writing to CSV and TXT Files
- Analyzing a File’s content
- Practical Exercises
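
The file-handling topics of Session 2 (writing a CSV file, reading it back, and analyzing its content) might look roughly like the sketch below. It uses only the standard library. The file name `scores.csv` and the sample rows are hypothetical, not the workshop's data.

```python
import csv
import os
import tempfile

# Hypothetical sample rows; the workshop's own data files are not shown here.
rows = [
    {"name": "Ada", "score": "91"},
    {"name": "Grace", "score": "84"},
    {"name": "Alan", "score": "78"},
]

# Write into a fresh temporary directory so nothing existing is overwritten.
path = os.path.join(tempfile.mkdtemp(), "scores.csv")

# Write the rows to a CSV file with a header line.
with open(path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "score"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back; every value arrives as a string.
with open(path, newline="") as f:
    loaded = list(csv.DictReader(f))

# Analyze the content: convert the score column and compute the mean.
scores = [int(row["score"]) for row in loaded]
mean_score = sum(scores) / len(scores)
print(len(loaded), mean_score)
```

Passing `newline=""` to `open` is what the `csv` module documentation recommends, so that line endings are handled by the reader and writer themselves.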
Session 3
- Scientific computing with Python
- NumPy Arrays
- Creating and manipulating NumPy Arrays
- Computation on NumPy Arrays
- Broadcasting and UFuncs
- Sorting and Indexing NumPy Arrays
- Practical Exercises
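
As a rough illustration of the Session 3 topics (aggregation, broadcasting, ufuncs and sorting), the sketch below centers the columns of a small array. The array values are made up and the example is not one of the course exercises.

```python
import numpy as np

# Made-up 3x2 array: three samples, two features.
data = np.array([[1.0, 10.0],
                 [2.0, 20.0],
                 [3.0, 30.0]])

# Aggregate along axis 0 to get one mean per column, shape (2,).
col_means = data.mean(axis=0)

# Broadcasting stretches the (2,) means across all three rows of the (3, 2) array.
centered = data - col_means

# A ufunc such as np.abs applies elementwise without an explicit Python loop.
magnitudes = np.abs(centered)

# argsort gives the indices that sort the first column; reversing yields descending order.
order = np.argsort(data[:, 0])[::-1]

print(centered)
print(data[order])
```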
Session 4
- Python Data Analysis Library - Pandas
- Pandas Data structures
- Data Indexing and Selection
- Aggregation and Grouping
- High-Performance Pandas
- Logic, Control Flow and Filtering in Pandas
- Practical Exercises
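
The selection, filtering and grouping topics of Session 4 can be sketched on a tiny DataFrame. The column names and values below are invented for illustration and do not come from the workshop.

```python
import pandas as pd

# Made-up records for illustration only.
df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston", "Boston"],
    "year": [2016, 2017, 2016, 2017, 2017],
    "sales": [100, 150, 80, 120, 60],
})

# Boolean-mask filtering: keep only the 2017 rows.
recent = df[df["year"] == 2017]

# Selection with .loc: rows where sales exceed 100, restricted to two columns.
big = df.loc[df["sales"] > 100, ["city", "sales"]]

# Grouping and aggregation: total sales per city.
totals = df.groupby("city")["sales"].sum()

print(recent)
print(big)
print(totals)
```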
Session 5
- Visualization with Matplotlib
- Line Plots, Scatter Plots and Histograms
- Customizing Plots
- Multiple Subplots
- Density and Contour Plots
- Practical Exercises
Session 6
- Final Project – Statistical Modeling with Python
- Writing Efficient Python Code
- Q and A Session