Welcome to my CODSOFT Machine Learning Projects Repository! This is an ML portfolio from my Machine Learning internship at CODSOFT. It includes three practical ML projects that demonstrate the use of supervised learning techniques on real-world datasets. Each project involves data preprocessing, model training, and evaluation using Logistic Regression and other essential tools.
- Goal: Predict the genre of a movie based on its description or text-based features.
- Approach:
  - Used `TfidfVectorizer` to convert movie descriptions into numerical feature vectors.
  - Trained a Logistic Regression classifier to predict genres from the text features.
- Highlights:
  - Great for NLP beginners.
  - Hands-on use of TF-IDF and Logistic Regression.
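The genre-classification approach above can be sketched as a scikit-learn pipeline. This is a minimal illustration on a handful of made-up descriptions, not the project's actual code or data; the `stop_words` and `max_iter` settings are assumptions chosen for the toy example.

```python
# Minimal sketch: TF-IDF features + Logistic Regression for genre prediction.
# The descriptions and genre labels below are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

descriptions = [
    "a detective investigates a murder in a small town",
    "police hunt a serial killer across the city",
    "two friends fall in love during a summer in paris",
    "a romance blossoms between old classmates at a wedding",
    "astronauts travel to a distant planet to save humanity",
    "an alien spaceship lands and robots invade the earth",
]
genres = ["crime", "crime", "romance", "romance", "sci-fi", "sci-fi"]

# TfidfVectorizer turns each description into a sparse TF-IDF vector;
# LogisticRegression then learns a weight per term for each genre.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    LogisticRegression(max_iter=1000),
)
model.fit(descriptions, genres)

# Predict the genre of an unseen description.
print(model.predict(["a killer is hunted by a detective"]))
```

Wrapping both steps in `make_pipeline` means the vectorizer is fitted only on the training text during `fit`, and the same vocabulary is reused when `predict` transforms new descriptions.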
- Goal: Detect fraudulent transactions from customer credit card transaction data.
- Approach:
  - Combined `TfidfVectorizer` for categorical text features (like merchant, job, etc.) and `StandardScaler` for numeric features.
  - Used `scipy.sparse.hstack` to merge both types into a unified feature set.
  - Trained a Logistic Regression model.
- Performance: Achieved approximately 99.5% accuracy.
- Highlights:
  - Demonstrates hybrid data handling (text + numeric).
  - Realistic fraud detection scenario.
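A minimal sketch of the hybrid text + numeric setup, assuming a pandas DataFrame with hypothetical columns `merchant`, `job`, `amt`, `city_pop`, and `is_fraud`; the real dataset's schema and preprocessing may differ. The dense scaled block is wrapped in `csr_matrix` so that `scipy.sparse.hstack` only receives sparse inputs.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Tiny made-up dataset; column names are hypothetical stand-ins.
df = pd.DataFrame({
    "merchant": ["grocery_store", "online_shop", "gas_station",
                 "online_shop", "grocery_store", "electronics_store"],
    "job": ["teacher", "engineer", "nurse", "engineer", "teacher", "student"],
    "amt": [25.0, 950.0, 40.0, 1200.0, 30.0, 880.0],
    "city_pop": [5000, 120000, 8000, 150000, 6000, 90000],
    "is_fraud": [0, 1, 0, 1, 0, 1],
})

NUMERIC = ["amt", "city_pop"]

def to_text(frame):
    # Join the categorical columns into one string per row for TF-IDF.
    return frame["merchant"] + " " + frame["job"]

tfidf = TfidfVectorizer()
scaler = StandardScaler()

# Fit on training data: sparse TF-IDF block + scaled numeric block (made sparse).
X_train = hstack([
    tfidf.fit_transform(to_text(df)),
    csr_matrix(scaler.fit_transform(df[NUMERIC])),
]).tocsr()
y_train = df["is_fraud"]

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# New rows reuse the fitted transformers via transform(), never fit_transform().
new = pd.DataFrame({"merchant": ["online_shop"], "job": ["engineer"],
                    "amt": [1100.0], "city_pop": [130000]})
X_new = hstack([
    tfidf.transform(to_text(new)),
    csr_matrix(scaler.transform(new[NUMERIC])),
]).tocsr()
print(clf.predict(X_new))
```

Reusing the fitted `tfidf` and `scaler` at prediction time keeps new transactions in the same feature space (same vocabulary, same column order, same scaling) that the model was trained on.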
- Goal: Predict whether a customer will leave a bank (churn) based on account and demographic data.
- Approach:
  - Text features like geography and gender were vectorized.
  - Numeric features were scaled using `StandardScaler`.
  - Combined features and trained a Logistic Regression model.
- Performance: Achieved over 83% accuracy.
- Highlights:
  - Focuses on customer behavior modeling.
  - Illustrates simple preprocessing and classification techniques.
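The churn write-up says geography and gender were vectorized. For low-cardinality categorical columns like these, one-hot encoding is a common alternative; the sketch below swaps in `OneHotEncoder` inside a `ColumnTransformer`, which is not necessarily what the project does. The column names `Geography`, `Gender`, `Age`, `Balance`, and `Exited` are an assumed schema, and the rows are made up.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical customer records; "Exited" = 1 means the customer churned.
df = pd.DataFrame({
    "Geography": ["France", "Spain", "Germany", "France",
                  "Germany", "Spain", "Germany", "France"],
    "Gender": ["Female", "Male", "Female", "Male",
               "Male", "Female", "Female", "Male"],
    "Age": [42, 30, 55, 28, 60, 35, 50, 33],
    "Balance": [0.0, 12000.0, 110000.0, 0.0, 95000.0, 5000.0, 120000.0, 3000.0],
    "Exited": [0, 0, 1, 0, 1, 0, 1, 0],
})

# One-hot encode categoricals and standardize numerics in a single step.
preprocess = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["Geography", "Gender"]),
    ("num", StandardScaler(), ["Age", "Balance"]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X = df.drop(columns="Exited")
y = df["Exited"]
model.fit(X, y)

# Score two new (made-up) customers.
new = pd.DataFrame({"Geography": ["Germany", "France"], "Gender": ["Male", "Female"],
                    "Age": [62, 29], "Balance": [130000.0, 0.0]})
print(model.predict(new))
```

`handle_unknown="ignore"` lets the fitted pipeline score rows containing a category it never saw during training, encoding that category as all zeros instead of raising an error.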
Tools and libraries used:

- `pandas` for data handling
- `scikit-learn` for preprocessing, modeling, and evaluation
- `scipy` for handling sparse matrices
- `TfidfVectorizer` for text feature extraction
- `StandardScaler` for feature scaling
- `LogisticRegression` for classification
To get started:

- Clone this repository.
- Navigate to the project directory you want to explore.
- Install dependencies: `pip install -r requirements.txt`
These projects collectively demonstrate key aspects of machine learning workflows, including:
- Data Preprocessing: Handling both text and numeric data, scaling, and vectorizing.
- Feature Engineering: Converting categorical data into numerical format and combining multiple data types.
- Model Training & Evaluation: Applying Logistic Regression to predict outcomes and measure performance.
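Accuracy alone can be misleading when one class is rare, as fraudulent transactions usually are. The hypothetical example below uses made-up labels, not results from these projects: a classifier that never predicts fraud still scores 99% accuracy while catching none of the fraud cases. Reporting precision and recall for the minority class, for instance with `sklearn.metrics.classification_report`, gives a fuller picture.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 1,000 transactions, 10 of them fraudulent (1%).
y_true = [1] * 10 + [0] * 990
# A degenerate "model" that never flags a transaction as fraud.
y_pred = [0] * 1000

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)                      # fraction of fraud caught
prec = precision_score(y_true, y_pred, zero_division=0)  # no positive predictions
print(f"accuracy={acc:.3f} recall={rec:.3f} precision={prec:.3f}")
# accuracy=0.990 recall=0.000 precision=0.000
```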
Each project showcases a practical application of ML to real-world data, offering hands-on experience in fraud detection, customer behavior analysis, and text classification, which makes the collection a useful starting point for learners entering data science and machine learning.