This project provides tools to search for datasets on Kaggle, download and preprocess them, and make predictions with a Linear Regression model. It includes interactive text-based user interfaces built with curses.
- Search for datasets on Kaggle interactively.
- Download datasets and automatically extract files.
- Load datasets into a pandas DataFrame and preprocess them.
- Train a Linear Regression model and evaluate it using RMSE and MAE.
- Visualize results with scatter plots.
- Python: Python 3.7 or higher.
- Install Required Libraries:
pip install pandas numpy matplotlib scikit-learn kaggle
- Go to your Kaggle account settings page.
- Download the kaggle.json API token.
- Place it in `~/.kaggle/` (Linux/Mac) or `%USERPROFILE%\.kaggle\` (Windows).
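To confirm the token is where the Kaggle client expects it, a small check like the one below can help. This is a minimal sketch, not part of the project: `check_kaggle_token` is a hypothetical helper, and it assumes the token is a JSON file with `username` and `key` fields, stored in `.kaggle` under your home directory.

```python
import json
from pathlib import Path


def check_kaggle_token(config_dir=None):
    """Return (ok, message) describing whether kaggle.json looks usable.

    Hypothetical helper: config_dir defaults to ~/.kaggle, which resolves
    to %USERPROFILE%\\.kaggle on Windows.
    """
    config_dir = Path(config_dir) if config_dir else Path.home() / ".kaggle"
    token = config_dir / "kaggle.json"

    if not token.is_file():
        return False, f"missing: {token}"

    try:
        creds = json.loads(token.read_text())
    except json.JSONDecodeError:
        return False, f"not valid JSON: {token}"

    # Assumption: a valid token is an object with these two fields.
    if not isinstance(creds, dict):
        return False, f"unexpected content in {token}"
    missing = [field for field in ("username", "key") if field not in creds]
    if missing:
        return False, f"missing fields in {token}: {', '.join(missing)}"

    return True, f"ok: {token}"
```

Calling `check_kaggle_token()` with no arguments checks the default location and returns a short message you can print before running the scripts.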
.
├── kaggle_connect.py # Handles dataset search and download via Kaggle API.
├── prediction.py # Performs data preprocessing, model training, and visualization.
└── README.md # Documentation for the project.
Run the following command to search for and download a Kaggle dataset, then run the prediction script on it:
python prediction.py
Or, if your system uses the python3 command:
python3 prediction.py
Follow the interactive prompts:
- Enter a search term for datasets (e.g., Boston Housing Dataset).
- Select a dataset from the list.
- Specify a folder to store the downloaded files.
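The prompts above roughly correspond to a search call followed by a download call on the Kaggle client. The sketch below shows one way this could look; it is not the code in kaggle_connect.py. The function names are illustrative, plain `input()` stands in for the curses interface, and the `ref` attribute on search results is an assumption about the objects the `kaggle` package returns.

```python
import os


def get_api():
    """Create an authenticated Kaggle client.

    Imported lazily because the kaggle package looks for credentials
    as soon as it is imported.
    """
    from kaggle.api.kaggle_api_extended import KaggleApi

    api = KaggleApi()
    api.authenticate()
    return api


def search_datasets(api, term, limit=10):
    """Return up to `limit` dataset references ("owner/name") matching term."""
    results = api.dataset_list(search=term)
    # Assumption: each result exposes its "owner/name" slug as .ref
    return [str(dataset.ref) for dataset in results[:limit]]


def download_dataset(api, ref, folder):
    """Download a dataset into folder, unzip it, and list the files there."""
    os.makedirs(folder, exist_ok=True)
    api.dataset_download_files(ref, path=folder, unzip=True)
    return sorted(os.listdir(folder))


def main():
    """Plain-text stand-in for the interactive flow (no curses)."""
    api = get_api()
    refs = search_datasets(api, input("Search term: "))
    for index, ref in enumerate(refs):
        print(f"[{index}] {ref}")
    choice = int(input("Dataset number: "))
    folder = input("Download folder: ")
    print(download_dataset(api, refs[choice], folder))
```

Passing the client in as an argument keeps the search and download helpers easy to try out with a stub object before using real credentials; call `main()` to run the full flow.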
The script:
- Displays descriptive statistics of the data.
- Splits the data into training and testing sets.
- Trains a Linear Regression model and evaluates its performance.
- Displays a scatter plot comparing actual and predicted values.
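The steps above can be sketched with pandas and scikit-learn as follows. This is an illustration under stated assumptions rather than the contents of prediction.py: the function names, the 80/20 split, the fixed random seed, and the choice to keep only numeric columns and drop rows with missing values are all assumptions. Synthetic data stands in for a downloaded dataset.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split


def train_and_evaluate(df, target, test_size=0.2, seed=42):
    """Fit a linear regression on the numeric columns and score it."""
    # Assumed preprocessing: numeric columns only, incomplete rows dropped.
    numeric = df.select_dtypes(include="number").dropna()
    X = numeric.drop(columns=[target])
    y = numeric[target]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression().fit(X_train, y_train)
    predicted = model.predict(X_test)

    metrics = {
        "rmse": float(np.sqrt(mean_squared_error(y_test, predicted))),
        "mae": float(mean_absolute_error(y_test, predicted)),
    }
    return model, y_test, predicted, metrics


def plot_actual_vs_predicted(actual, predicted):
    """Scatter plot of actual against predicted values."""
    import matplotlib.pyplot as plt  # imported here so training works headless

    plt.scatter(actual, predicted, alpha=0.6)
    lo, hi = min(actual), max(actual)
    plt.plot([lo, hi], [lo, hi], linestyle="--")  # perfect-prediction line
    plt.xlabel("Actual")
    plt.ylabel("Predicted")
    plt.show()


# Synthetic stand-in for a downloaded dataset: y = 3x + 2 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 200)
df = pd.DataFrame({"x": x, "y": 3.0 * x + 2.0 + rng.normal(0, 0.5, 200)})

print(df.describe())
model, y_test, predicted, metrics = train_and_evaluate(df, "y")
print(metrics)
```

Calling `plot_actual_vs_predicted(y_test, predicted)` afterwards opens the scatter plot; points close to the dashed line are predictions close to the actual values.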
Model Metrics: the model is evaluated using RMSE and MAE.
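For reference, RMSE (root mean squared error) and MAE (mean absolute error) can be written out directly. The snippet below is a plain-Python illustration of the two definitions with made-up numbers; the helper names are hypothetical and the project itself may rely on scikit-learn for these values.

```python
import math


def rmse(actual, predicted):
    """Square root of the mean squared difference; penalizes large errors more."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, predicted):
    """Mean of the absolute differences; every error counts linearly."""
    absolute = [abs(a - p) for a, p in zip(actual, predicted)]
    return sum(absolute) / len(absolute)


# Made-up values for illustration only.
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 12.0]

print(mae(actual, predicted))   # errors 1, 0, 1, 3 -> 5 / 4 = 1.25
print(rmse(actual, predicted))  # sqrt((1 + 0 + 1 + 9) / 4) = sqrt(2.75)
```

Both metrics are in the same units as the target column. RMSE is never smaller than MAE, and a large gap between them suggests a few predictions are far off.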
Contributions are welcome! Feel free to submit issues or pull requests to enhance the functionality.