This project automates the cleaning and analysis of messy CSV/Excel data. It identifies missing values, detects outliers, normalizes data, and generates insightful reports with visualizations.

Features:

- Handles missing values (fills numeric columns with the mean or median, and categorical columns with "Unknown")
- Detects outliers using the Interquartile Range (IQR) method
- Normalizes numerical data using Min-Max Scaling
- Generates statistical reports and visualizations (Matplotlib & Seaborn)
- Saves the cleaned dataset for further analysis
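
The three cleaning steps listed above can be sketched in plain pandas. This is a minimal illustration, not the notebook's actual code: the function names `fill_missing`, `iqr_outlier_mask`, and `min_max_scale`, the `k = 1.5` IQR multiplier, and the sample columns are all assumptions made for the example. The notebook itself may rely on scikit-learn's `MinMaxScaler` instead of the hand-written scaling shown here.

```python
import pandas as pd


def fill_missing(df: pd.DataFrame, strategy: str = "median") -> pd.DataFrame:
    """Fill numeric gaps with the column mean/median, other gaps with "Unknown"."""
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].mean() if strategy == "mean" else out[col].median()
            out[col] = out[col].fillna(fill)
        else:
            out[col] = out[col].fillna("Unknown")
    return out


def iqr_outlier_mask(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Return True where a value lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


def min_max_scale(series: pd.Series) -> pd.Series:
    """Rescale a numeric column to [0, 1]; a constant column maps to 0.0."""
    lo, hi = series.min(), series.max()
    if hi == lo:
        return pd.Series(0.0, index=series.index, name=series.name)
    return (series - lo) / (hi - lo)


# Hypothetical sample data, used only to exercise the helpers.
df = pd.DataFrame({
    "age": [25, 30, None, 35, 120],
    "city": ["Pune", None, "Delhi", "Pune", "Delhi"],
})
cleaned = fill_missing(df)
outliers = iqr_outlier_mask(cleaned["age"])
cleaned["age_scaled"] = min_max_scale(cleaned["age"])
```

In this sample, only the `age` value of 120 falls outside the IQR fences. The sketch flags outliers rather than dropping them, so whether to remove, cap, or keep them stays a separate decision.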

Technologies used:

- Python
- Pandas (for data manipulation)
- Matplotlib & Seaborn (for data visualization)
- Jupyter Notebook (for interactive execution)

Installation:

- Clone the Repository: `git clone https://github.com/kbhavneet/data-cleaning-project.git`, then `cd data-cleaning-project`
- Install Dependencies: `pip install pandas matplotlib seaborn openpyxl numpy scikit-learn`
- Run Jupyter Notebook: `jupyter notebook`
  - Open `data_cleaning_final.ipynb` and execute the cells.

Usage:

- Place your dataset (`.csv` or `.xlsx`) inside the project folder as `sample_data.csv`.
- Run the Jupyter Notebook to clean and process the data.
- View the Insights:
  - Processed data is saved as `cleaned_data.csv`.
  - Visualization plots display trends and distributions.
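
The load and save steps around the notebook could look roughly like the following. It is a sketch under assumptions: `load_dataset` and `save_cleaned` are hypothetical helper names, not functions from the notebook, and choosing the reader by file extension is one possible way to accept both CSV and Excel input.

```python
from pathlib import Path

import pandas as pd


def load_dataset(path: str) -> pd.DataFrame:
    """Read a .csv or .xlsx file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".xlsx":
        return pd.read_excel(path)  # relies on openpyxl being installed
    raise ValueError(f"Unsupported file type: {suffix}")


def save_cleaned(df: pd.DataFrame, path: str = "cleaned_data.csv") -> None:
    """Write the processed table to CSV without the pandas index column."""
    df.to_csv(path, index=False)
```

A typical call would be `save_cleaned(load_dataset("sample_data.csv"))`, with the cleaning steps applied in between.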


