This repository contains a complete, reproducible end-to-end machine learning pipeline for the Autism Spectrum Disorder (ASD) Screening Dataset for Children, covering data preprocessing, exploratory data analysis (EDA), feature engineering, model training, evaluation, and deployment.
The project is implemented using a Jupyter Notebook for experimentation and analysis, and the trained model is deployed using Flask as a lightweight web application.
- Handling missing values
- Encoding categorical variables
- Scaling numerical features
- Outlier detection and removal
- Data leakage prevention using a configurable `DROP_LEAKAGE` flag
- Selecting the top 10 features
- Train/test split
- Classical ML models:
  - Decision Tree Classifier
  - Other baseline classifiers
- Model evaluation using:
  - Accuracy
  - Precision
  - Recall
  - F1-score
- Model & preprocessing pipeline serialization
- Trained ML model deployed using Flask
- Reuse of saved preprocessing pipeline in production
- Modular and extensible backend structure
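Several of the preprocessing steps listed above can be illustrated without third-party dependencies. This is a minimal sketch, not the notebook's actual code. It works on plain lists of dicts rather than pandas DataFrames. The flag's behaviour, the `LEAKAGE_COLUMNS` tuple and the helpers `drop_leakage`, `remove_outliers_iqr` and `train_test_split_rows` are all hypothetical. The sketch assumes the leaking column is `result`, which in the UCI children's screening data is the summed score of the A1 to A10 answers from which the label is derived.

```python
import random
import statistics

# Hypothetical flag mirroring the README's DROP_LEAKAGE option.
DROP_LEAKAGE = True

# Assumption: "result" (the summed screening score) leaks the label.
LEAKAGE_COLUMNS = ("result",)


def drop_leakage(rows, drop=DROP_LEAKAGE, columns=LEAKAGE_COLUMNS):
    """Return copies of the rows, minus the leakage columns when drop is True."""
    if not drop:
        return [dict(row) for row in rows]
    return [{k: v for k, v in row.items() if k not in columns} for row in rows]


def remove_outliers_iqr(rows, column, k=1.5):
    """Keep rows whose value lies within [Q1 - k*IQR, Q3 + k*IQR]."""
    values = [row[column] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]


def train_test_split_rows(rows, test_size=0.2, seed=42):
    """Shuffle with a fixed seed, then split into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Usage on toy rows: age 60 is an outlier relative to the other ages.
rows = [{"age": a, "result": a % 10} for a in [4, 5, 6, 7, 8, 9, 10, 11, 60]]
clean = remove_outliers_iqr(drop_leakage(rows), "age")
train, test = train_test_split_rows(clean)
print(len(clean), len(train), len(test))  # 8 6 2
```

The outlier rule shown is the common 1.5 × IQR fence. The fixed seed in the split makes reruns reproducible.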
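The four evaluation metrics all reduce to counts of true positives, false positives and false negatives. This is a minimal dependency-free sketch. It assumes a binary target whose positive label is `"YES"`, which is the hypothetical default of the `positive` parameter. In practice, `sklearn.metrics` offers `accuracy_score`, `precision_score`, `recall_score` and `f1_score`, which compute the same quantities and take a `pos_label` argument.

```python
def binary_metrics(y_true, y_pred, positive="YES"):
    """Accuracy, precision, recall and F1 for a binary target."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    # Report 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = ["YES", "YES", "NO", "NO", "YES"]
y_pred = ["YES", "NO", "NO", "YES", "YES"]
scores = binary_metrics(y_true, y_pred)
# accuracy 0.6; precision, recall and F1 are all 2/3 here
```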
autism-pipeline/
│
├── autism.csv                        # Cleaned dataset
├── Autism-Child-Data.arff            # Original ARFF dataset
│
├── models.pkl                        # Serialized trained model(s)
├── preprocessor.pkl                  # Serialized preprocessing pipeline
│
├── autism_pipeline_notebook.ipynb    # EDA + preprocessing + training
│
├── templates/
│   └── index.html                    # Frontend template
│
├── app.py                            # Flask application
├── train.py                          # Model training & serialization
├── README.md                         # Project documentation
└── .gitignore
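The tree keeps `preprocessor.pkl` separate from `models.pkl`. This lets the exact transformation fitted during training be reapplied to incoming requests in `app.py`. This is a minimal sketch of that round trip using the standard `pickle` module. It assumes the artifacts are pickled, although the project may use `joblib` or another format. `StandardScalerSketch`, `save` and `load` are hypothetical names that stand in for the real fitted preprocessor and the project's own loading code.

```python
import pickle
import statistics
import tempfile
from pathlib import Path


class StandardScalerSketch:
    """Hypothetical stand-in for the fitted preprocessor (z-score scaling)."""

    def fit(self, values):
        self.mean_ = statistics.fmean(values)
        self.std_ = statistics.pstdev(values) or 1.0  # avoid dividing by zero
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


def save(obj, path):
    """Training side: persist a fitted object (e.g. preprocessor.pkl)."""
    with open(path, "wb") as fh:
        pickle.dump(obj, fh)


def load(path):
    """Serving side: restore the fitted object (e.g. when app.py starts)."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "preprocessor.pkl"
    save(StandardScalerSketch().fit([2.0, 4.0, 6.0]), path)
    restored = load(path)
    # The restored object reuses the training mean/std on new inputs.
    scaled = restored.transform([4.0, 8.0])
```

`pickle` stores a reference to each object's class rather than its code. As a result, the class definitions (or the same scikit-learn version) used when saving must be importable when `app.py` loads the file. Only unpickle files from trusted sources, because loading a pickle can execute arbitrary code.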
git clone https://github.com/your-username/autism-pipeline.git
cd autism-pipeline