Skip to content

Harshchafle/Machine-Learning-and-Data-Analytics-with-python

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

51 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Machine Learning Learning Timetable

Phase 1: Foundations & Supervised Learning (Weeks 1-8)

This phase focuses on core Python libraries for ML, essential math, and the most common supervised learning algorithms.

Week 1: Python for Data Science & Math Fundamentals

Day 1: Python Refresh & NumPy

Topic 1: Python Refresher (if needed): Functions, classes, modules.

Topic 2: Introduction to NumPy: Arrays, array operations, broadcasting.

Day 2: Pandas for Data Manipulation

Topic 1: Pandas DataFrames: Creation, selection, indexing.

Topic 2: Data Cleaning with Pandas: Handling missing values, duplicates.

Day 3: Data Visualization with Matplotlib & Seaborn

Topic 1: Matplotlib Basics: Line plots, scatter plots, histograms.

Topic 2: Seaborn for Statistical Plots: Box plots, violin plots, heatmaps.

Day 4: Introduction to Linear Algebra for ML

Topic 1: Vectors and Matrices: Operations, dot product, transpose.

Topic 2: Systems of Linear Equations (conceptual understanding).

Day 5: Introduction to Calculus for ML

Topic 1: Derivatives and Gradients (conceptual understanding).

Topic 2: Gradient Descent intuition (how models learn).

Day 6: Probability & Statistics Basics

Topic 1: Probability distributions (Normal, Bernoulli).

Topic 2: Descriptive Statistics (mean, median, mode, variance, standard deviation).

Day 7: Review & Practice

Review all concepts learned.

Solve small coding challenges using NumPy and Pandas.

Week 2: Exploratory Data Analysis (EDA) & Data Preprocessing

Day 1: Advanced EDA Techniques

Topic 1: Correlation analysis.

Topic 2: Outlier detection and treatment.

Day 2: Feature Engineering - Numerical Data

Topic 1: Scaling (Standardization, Normalization).

Topic 2: Binning, polynomial features.

Day 3: Feature Engineering - Categorical Data

Topic 1: One-Hot Encoding.

Topic 2: Label Encoding, Target Encoding (conceptual).

Day 4: Data Splitting & Cross-Validation

Topic 1: Train-Validation-Test Split.

Topic 2: K-Fold Cross-Validation.

Day 5: Introduction to Scikit-learn

Topic 1: Scikit-learn API: Estimators, transformers, pipelines.

Topic 2: Basic data loading and preprocessing with scikit-learn.

Day 6: Building a First ML Pipeline

Topic 1: Create a basic pipeline for data preprocessing and model training.

Topic 2: Implement a simple regression model end-to-end.

Day 7: Review & Mini Project

Consolidate EDA and preprocessing skills.

Mini-Project: Perform EDA and preprocess a simple dataset (e.g., Iris, Boston Housing).

Week 3: Linear & Logistic Regression

Day 1: Linear Regression Theory

Topic 1: Simple Linear Regression: Hypothesis, cost function (MSE).

Topic 2: Gradient Descent for Linear Regression.

Day 2: Implementing Linear Regression

Topic 1: Coding Linear Regression from scratch (using NumPy).

Topic 2: Linear Regression with scikit-learn.

Day 3: Regression Evaluation Metrics

Topic 1: MAE, MSE, RMSE, R-squared.

Topic 2: Interpreting regression coefficients.

Day 4: Logistic Regression Theory

Topic 1: Logistic Regression: Sigmoid function, cross-entropy loss.

Topic 2: Gradient Descent for Logistic Regression.

Day 5: Implementing Logistic Regression

Topic 1: Coding Logistic Regression from scratch (using NumPy).

Topic 2: Logistic Regression with scikit-learn.

Day 6: Classification Evaluation Metrics

Topic 1: Accuracy, Precision, Recall, F1-score.

Topic 2: Confusion Matrix.

Day 7: Review & Practice

Solve problems involving both regression and classification on small datasets.

Week 4: Regularization & Model Selection

Day 1: Overfitting & Underfitting

Topic 1: Bias-Variance Trade-off.

Topic 2: Identifying overfitting and underfitting.

Day 2: Regularization Techniques

Topic 1: L1 (Lasso) Regularization.

Topic 2: L2 (Ridge) Regularization.

Day 3: Hyperparameter Tuning

Topic 1: Grid Search.

Topic 2: Random Search.

Day 4: Model Persistence

Topic 1: Saving and loading models (Joblib, Pickle).

Topic 2: Versioning trained models (conceptual).

Day 5: Project Structuring Introduction

Topic 1: Basic folder structure for an ML project (data/, notebooks/, src/, models/).

Topic 2: README.md and requirements.txt importance.

Day 6: Git & GitHub Basics

Topic 1: Version control with Git: add, commit, push, pull.

Topic 2: Creating a GitHub repository for your project.

Day 7: Mini-Project: Predictive Model with Regularization & Hyperparameter Tuning

Choose a classification or regression dataset.

Implement a full pipeline: preprocessing, model training with regularization, hyperparameter tuning, evaluation.

Structure your project with basic folders and use Git.

Phase 2: Advanced Supervised & Unsupervised Learning (Weeks 9-16)

This phase introduces more complex algorithms and moves into unsupervised methods.

Week 5: Decision Trees & Ensemble Methods (Part 1)

Day 1: Decision Tree Theory

Topic 1: How Decision Trees work: Splits, impurity (Gini, Entropy).

Topic 2: Overfitting in Decision Trees and Pruning.

Day 2: Implementing Decision Trees

Topic 1: Decision Trees for Classification with scikit-learn.

Topic 2: Decision Trees for Regression with scikit-learn.

Day 3: Ensemble Learning Introduction

Topic 1: Bagging vs. Boosting (conceptual).

Topic 2: Random Forests theory.

Day 4: Random Forests Implementation

Topic 1: Random Forest Classifier with scikit-learn.

Topic 2: Random Forest Regressor with scikit-learn.

Day 5: Feature Importance

Topic 1: Understanding feature importance from tree-based models.

Topic 2: Practical applications of feature importance.

Day 6: Advanced Evaluation Metrics

Topic 1: ROC AUC Curve and PR Curve (for imbalanced data).

Topic 2: Model interpretability beyond coefficients.

Day 7: Review & Practice

Compare performance of Linear/Logistic Regression, Decision Trees, and Random Forests on a dataset.

Week 6: Ensemble Methods (Part 2) & SVMs

Day 1: Gradient Boosting Theory

Topic 1: AdaBoost (conceptual).

Topic 2: Gradient Boosting Machines (GBM) intuition.

Day 2: XGBoost & LightGBM

Topic 1: Introduction to XGBoost library.

Topic 2: Introduction to LightGBM library.

Day 3: Support Vector Machines (SVM) Theory

Topic 1: Linear SVM: Max-margin classifier.

Topic 2: Kernels (Polynomial, RBF) for non-linear data.

Day 4: Implementing SVMs

Topic 1: SVM Classifier with scikit-learn.

Topic 2: SVM Regressor with scikit-learn.

Day 5: Choosing the Right Algorithm

Topic 1: Factors influencing algorithm choice (data size, linearity, interpretability).

Topic 2: When to use which algorithm.

Day 6: Project Setup & Environment Management

Topic 1: Virtual environments (conda, venv).

Topic 2: requirements.txt best practices.

Day 7: Mini-Project: Advanced Classification/Regression

Apply XGBoost/LightGBM or SVM to a moderately complex dataset.

Focus on hyperparameter tuning and robust evaluation.

Week 7: Unsupervised Learning - Clustering

Day 1: Introduction to Unsupervised Learning

Topic 1: Difference between supervised and unsupervised learning.

Topic 2: Use cases for clustering and dimensionality reduction.

Day 2: K-Means Clustering Theory

Topic 1: K-Means algorithm: Centroids, inertia.

Topic 2: Elbow method for optimal K.

Day 3: Implementing K-Means

Topic 1: K-Means with scikit-learn.

Topic 2: Interpreting clusters.

Day 4: Hierarchical Clustering

Topic 1: Agglomerative vs. Divisive clustering (conceptual).

Topic 2: Dendrograms.

Day 5: DBSCAN Clustering

Topic 1: DBSCAN algorithm: Density-based clustering.

Topic 2: Advantages and disadvantages over K-Means.

Day 6: Cluster Evaluation Metrics

Topic 1: Silhouette Score.

Topic 2: Davies-Bouldin Index.

Day 7: Review & Practice

Apply different clustering algorithms to an unlabeled dataset and compare results.

Week 8: Unsupervised Learning - Dimensionality Reduction

Day 1: Introduction to Dimensionality Reduction

Topic 1: Curse of Dimensionality.

Topic 2: Use cases for dimensionality reduction.

Day 2: Principal Component Analysis (PCA) Theory

Topic 1: PCA intuition: Variance maximization, eigenvalues/eigenvectors.

Topic 2: Interpreting principal components.

Day 3: Implementing PCA

Topic 1: PCA with scikit-learn.

Topic 2: Using PCA for visualization and feature extraction.

Day 4: t-SNE and UMAP (Conceptual)

Topic 1: t-SNE for visualization of high-dimensional data.

Topic 2: UMAP for dimensionality reduction and visualization.

Day 5: Feature Selection Techniques

Topic 1: Filter methods (e.g., variance threshold, correlation).

Topic 2: Wrapper methods (e.g., RFE).

Day 6: Model Interpretability

Topic 1: SHAP values (conceptual).

Topic 2: LIME (conceptual).

Day 7: Mini-Project: Unsupervised Learning Application

Apply clustering and/or dimensionality reduction to a suitable dataset.

Visualize the results and interpret the findings.

Phase 3: Deep Learning & MLOps Foundations (Weeks 9-16)

This phase introduces Deep Learning concepts and lays the groundwork for MLOps.

Week 9: Introduction to Neural Networks & TensorFlow/Keras

Day 1: Neural Network Fundamentals

Topic 1: Perceptrons, activation functions.

Topic 2: Feedforward Neural Networks architecture.

Day 2: Backpropagation & Optimization

Topic 1: How neural networks learn (conceptual understanding of backpropagation).

Topic 2: Optimizers (SGD, Adam, RMSprop).

Day 3: TensorFlow/Keras Basics

Topic 1: Setting up TensorFlow/Keras environment.

Topic 2: Building a simple dense neural network for classification.

Day 4: Training & Evaluation of NNs

Topic 1: Training process (epochs, batch size).

Topic 2: Loss functions, metrics for neural networks.

Day 5: Overfitting in Neural Networks

Topic 1: Dropout regularization.

Topic 2: Early stopping.

Day 6: Introduction to Computer Vision with CNNs

Topic 1: Convolutional Neural Network (CNN) intuition.

Topic 2: Common CNN architectures (LeNet, AlexNet - conceptual).

Day 7: Review & Practice

Implement a simple neural network for a classification task (e.g., MNIST).

Week 10: Convolutional Neural Networks (CNNs)

Day 1: Deeper Dive into CNNs

Topic 1: Convolutional layers, pooling layers.

Topic 2: Batch Normalization.

Day 2: Building a CNN for Image Classification

Topic 1: Practical implementation of a CNN in Keras for image classification (e.g., CIFAR-10).

Topic 2: Data Augmentation for image data.

Day 3: Transfer Learning

Topic 1: What is transfer learning and why it's useful.

Topic 2: Using pre-trained models (e.g., VGG16, ResNet) with Keras.

Day 4: Object Detection (Conceptual)

Topic 1: Introduction to object detection tasks.

Topic 2: High-level overview of R-CNN, YOLO, SSD.

Day 5: Image Segmentation (Conceptual)

Topic 1: Introduction to image segmentation tasks.

Topic 2: High-level overview of U-Net, Mask R-CNN.

Day 6: Model Explainability for Deep Learning (Conceptual)

Topic 1: Grad-CAM, LRP (Layer-wise Relevance Propagation).

Topic 2: Importance of interpreting deep learning models.

Day 7: Mini-Project: Image Classification with Transfer Learning

Build an image classifier using a pre-trained CNN on a custom dataset.

Week 11: Natural Language Processing (NLP) Fundamentals

Day 1: NLP Basics

Topic 1: Text preprocessing: Tokenization, stemming, lemmatization.

Topic 2: Bag-of-Words, TF-IDF.

Day 2: Recurrent Neural Networks (RNNs)

Topic 1: RNN architecture and sequential data.

Topic 2: Limitations of simple RNNs.

Day 3: LSTMs & GRUs

Topic 1: Long Short-Term Memory (LSTM) networks.

Topic 2: Gated Recurrent Units (GRU).

Day 4: Implementing RNNs/LSTMs for Text

Topic 1: Text classification with LSTMs in Keras.

Topic 2: Sentiment analysis example.

Day 5: Word Embeddings

Topic 1: Word2Vec, GloVe (conceptual).

Topic 2: Using pre-trained word embeddings.

Day 6: Attention Mechanism & Transformers (Conceptual)

Topic 1: Introduction to Attention.

Topic 2: High-level overview of Transformer architecture.

Day 7: Mini-Project: Basic NLP Task

Build a text classifier (e.g., spam detection) using traditional ML and then an LSTM.

Week 12: MLOps - Problem Definition & Data Management

Day 1: ML Project Lifecycle Overview

Topic 1: CRISP-DM vs. typical ML lifecycle (Problem Definition, Data, Modeling, Deployment, Monitoring).

Topic 2: Why MLOps is crucial.

Day 2: Defining the ML Problem

Topic 1: Business understanding: What problem are we solving?

Topic 2: Defining success metrics (business and technical).

Day 3: Data Collection Strategies

Topic 1: Data sources (databases, APIs, web scraping).

Topic 2: Data acquisition pipelines (conceptual).

Day 4: Data Versioning (DVC - Conceptual)

Topic 1: Why version data?

Topic 2: Introduction to Data Version Control (DVC).

Day 5: Feature Stores (Conceptual)

Topic 1: Why use feature stores?

Topic 2: Online vs. offline feature stores.

Day 6: Data Governance & Ethics in ML

Topic 1: Privacy concerns (GDPR, PII).

Topic 2: Fairness, bias in data and models.

Day 7: Case Study Analysis

Analyze a real-world ML project's problem definition and data strategy.

Week 13: MLOps - Model Development & Experiment Tracking

Day 1: Experiment Tracking

Topic 1: Why track experiments?

Topic 2: Introduction to MLflow (logging parameters, metrics, models).

Day 2: Reproducibility in ML

Topic 1: Ensuring reproducible results.

Topic 2: Seed management, environment consistency.

Day 3: Code Organization for ML Projects

Topic 1: Modular code structure (src/, scripts/, tests/).

Topic 2: Best practices for writing clean and testable ML code.

Day 4: Unit Testing for ML Code

Topic 1: Testing data preprocessing functions.

Topic 2: Testing model training components.

Day 5: Model Registry (Conceptual)

Topic 1: Centralized model management.

Topic 2: Versioning and staging models.

Day 6: CI/CD for ML (Conceptual)

Topic 1: Continuous Integration/Continuous Deployment for ML.

Topic 2: Automated testing and deployment.

Day 7: Practice: MLflow Integration

Integrate MLflow into one of your previous mini-projects to track experiments.

Week 14: MLOps - Deployment Strategies

Day 1: Model Deployment Overview

Topic 1: Different deployment patterns (batch, real-time, edge).

Topic 2: REST APIs for model serving.

Day 2: Flask/FastAPI for Model Serving

Topic 1: Building a simple Flask/FastAPI endpoint to serve predictions.

Topic 2: Receiving requests and returning predictions.

Day 3: Docker for Containerization

Topic 1: Docker fundamentals: Images, containers, Dockerfiles.

Topic 2: Containerizing your ML model and its dependencies.

Day 4: Cloud Deployment (AWS/GCP/Azure - Conceptual)

Topic 1: Overview of cloud ML services (SageMaker, Vertex AI, Azure ML).

Topic 2: Basic conceptual understanding of deploying a Docker container to a cloud platform.

Day 5: Kubernetes (Conceptual)

Topic 1: Introduction to Kubernetes for orchestration.

Topic 2: Scaling and managing containerized applications.

Day 6: Serverless Functions for ML (Conceptual)

Topic 1: Deploying models as serverless functions (AWS Lambda, Google Cloud Functions).

Topic 2: Pros and cons of serverless.

Day 7: Mini-Project: Deploy a Simple Model

Take one of your trained models, create a Flask/FastAPI endpoint, and containerize it with Docker.

Week 15: MLOps - Monitoring & Maintenance

Day 1: Model Monitoring

Topic 1: Why monitor models in production?

Topic 2: Key metrics to monitor (performance, data drift, concept drift).

Day 2: Data Drift Detection

Topic 1: Techniques for detecting changes in input data distribution.

Topic 2: Tools for data drift monitoring (e.g., Evidently AI - conceptual).

Day 3: Concept Drift Detection

Topic 1: Detecting changes in the relationship between input and output.

Topic 2: Strategies for handling concept drift (model retraining).

Day 4: Model Retraining Strategies

Topic 1: Scheduled retraining.

Topic 2: Event-driven retraining.

Day 5: A/B Testing for ML Models

Topic 1: Comparing different model versions in production.

Topic 2: Gradual rollout strategies.

Day 6: Incident Response & Alerting

Topic 1: Setting up alerts for model performance degradation.

Topic 2: Troubleshooting common MLOps issues.

Day 7: Review & Project Ideation

Reflect on the entire ML lifecycle.

Brainstorm ideas for a capstone project that incorporates MLOps principles.

Week 16: Advanced MLOps & Special Topics (as needed)

Day 1: Responsible AI & Explainable AI (XAI)

Topic 1: Bias and fairness in ML models.

Topic 2: Introduction to XAI tools (e.g., LIME, SHAP) in MLOps context.

Day 2: ML Security

Topic 1: Adversarial attacks on ML models.

Topic 2: Securing ML pipelines.

Day 3: Model Governance & Auditing

Topic 1: Compliance and regulatory requirements.

Topic 2: Maintaining audit trails for models.

Day 4: Reinforcement Learning (Conceptual)

Topic 1: Introduction to RL: Agents, environments, rewards.

Topic 2: Q-learning, Deep Q-Networks (conceptual).

Day 5: Graph Neural Networks (Conceptual)

Topic 1: Introduction to GNNs.

Topic 2: Use cases for graph data.

Day 6: Ethics in AI

Topic 1: Societal impact of AI.

Topic 2: Ethical guidelines for AI development.

Day 7: Capstone Project Planning & Literature Review

Finalize your capstone project idea.

Start researching relevant papers, datasets, and existing solutions.

Machine Learning Project and Requirement Structure A well-structured ML project is crucial for collaboration, reproducibility, and maintainability. Here's a common and highly recommended structure:

your-ml-project-name/ ├── .git/ # Git version control directory (hidden) ├── .github/ # GitHub Actions (for CI/CD later) │ └── workflows/ │ └── main.yml # CI/CD pipeline definition ├── data/ │ ├── raw/ # Original, immutable raw data │ ├── interim/ # Intermediate data (after initial cleaning/transformation) │ └── processed/ # Final, ready-to-use data for training ├── notebooks/ # Jupyter notebooks for exploration, EDA, prototyping │ ├── 01_eda.ipynb │ ├── 02_model_experimentation.ipynb │ └── ... ├── src/ # Source code for the ML pipeline │ ├── init.py # Makes 'src' a Python package │ ├── components/ # Reusable modules/functions for specific pipeline steps │ │ ├── init.py │ │ ├── data_ingestion.py │ │ ├── data_transformation.py │ │ ├── model_trainer.py │ │ └── ... │ ├── pipeline/ # End-to-end ML pipelines │ │ ├── init.py │ │ ├── train_pipeline.py │ │ ├── predict_pipeline.py │ │ └── ... │ ├── utils/ # General utility functions (e.g., file I/O, logging) │ │ └── init.py │ │ └── utils.py │ ├── logger.py # Custom logging configuration │ └── exception.py # Custom exception handling ├── models/ # Trained model artifacts (.pkl, .h5, etc.) ├── experiments/ # Logs and artifacts from MLflow/DVC runs ├── config/ # Configuration files (YAML, JSON) for paths, hyperparameters │ └── config.yaml ├── tests/ # Unit and integration tests │ ├── test_data_processing.py │ ├── test_model_training.py │ └── ... ├── app/ # (Optional) For deploying the model as an API │ ├── main.py # Flask/FastAPI application │ ├── Dockerfile # Dockerfile for containerizing the API │ └── requirements.txt # Dependencies for the API ├── Dockerfile # (Optional) For creating a Docker image of the entire project/training environment ├── Makefile # (Optional) For automating common tasks (e.g., make train, make test) ├── requirements.txt # Python dependencies for the project ├── setup.py # (Optional) For making your project installable as a package ├── README.md # Project overview, setup, usage, results ├── LICENSE # Software license (e.g., MIT, Apache 2.0) └── .gitignore # Files/directories to be ignored by Git

About

Learning Python to Upskill in programming

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published