This repository contains a collection of Artificial Intelligence and Machine Learning projects focused on solving real-world problems using data-driven approaches. The projects demonstrate skills in data preprocessing, natural language processing (NLP), machine learning model development, and evaluation.
The goal of this repository is to build and showcase practical AI solutions and experiments using modern machine learning techniques.
- Python
- Scikit-learn
- Pandas
- NumPy
- Matplotlib
- Seaborn
- NLTK
- BeautifulSoup
- WordCloud
- Jupyter Notebook / Google Colab
- Data Cleaning and Preprocessing
- Natural Language Processing (NLP)
- Text Feature Engineering
- Bag of Words (Count Vectorization)
- TF-IDF Vectorization
- Model Training and Evaluation
- Random Forest Classification
- Model Performance Metrics
- Cross Validation
- Data Visualization
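
As a rough illustration of the cross-validation concept listed above, the sketch below scores a TF-IDF + Random Forest pipeline with stratified k-fold splits. The toy `texts` and `labels` are invented for this example and are not taken from the project dataset; the fold count and hyperparameters are assumptions as well.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Hypothetical toy data: three short descriptions per category.
texts = [
    "paperback novel by a bestselling author",
    "hardcover textbook on linear algebra",
    "illustrated story book for children",
    "cotton shirt for men with slim fit",
    "leather belt and matching wallet",
    "women running shoes with soft sole",
    "wireless bluetooth headphones with mic",
    "usb charger cable for smartphone",
    "laptop with fast processor and ssd",
    "stainless steel kitchen knife set",
    "cotton bed sheet with pillow covers",
    "plastic storage container for pantry",
]
labels = (
    ["Books"] * 3
    + ["Clothing & Accessories"] * 3
    + ["Electronics"] * 3
    + ["Household"] * 3
)

# Keeping the vectorizer inside the pipeline means it is refit on each
# training fold, so no vocabulary leaks in from the held-out fold.
model = make_pipeline(
    TfidfVectorizer(),
    RandomForestClassifier(n_estimators=50, random_state=0),
)

# Stratified folds keep one sample of every category in each test fold.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy")

print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

With only twelve samples the scores are not meaningful; the point is the shape of the workflow, not the numbers.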
Objective:
Automatically classify e-commerce product descriptions into predefined categories.
Dataset:
- 50,000+ product descriptions
- 4 product categories:
  - Books
  - Clothing & Accessories
  - Electronics
  - Household
Approach:
- Text preprocessing
  - HTML tag removal
  - Tokenization
  - Stopword removal
  - Lemmatization
- Feature extraction using:
  - Count Vectorization
  - TF-IDF Vectorization
- Model training using Random Forest Classifier
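
The approach above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: `clean_text`, `build_pipeline`, and the toy data are hypothetical names and values, a regular expression stands in for BeautifulSoup when stripping HTML, scikit-learn's built-in English stop-word list stands in for NLTK's, and lemmatization is left out because it needs NLTK corpora to be downloaded first.

```python
import html
import re

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline

TAG_RE = re.compile(r"<[^>]+>")


def clean_text(raw: str) -> str:
    """Strip HTML tags, decode entities, collapse whitespace, lowercase."""
    no_tags = TAG_RE.sub(" ", raw)
    unescaped = html.unescape(no_tags)
    return re.sub(r"\s+", " ", unescaped).strip().lower()


# Hypothetical toy data: two descriptions per category.
texts = [
    "<p>Paperback novel by a bestselling author</p>",
    "<p>Hardcover textbook on linear algebra</p>",
    "<div>Cotton shirt for men, slim fit</div>",
    "<div>Leather belt &amp; matching wallet</div>",
    "<b>Wireless bluetooth headphones</b> with mic",
    "USB charger cable for smartphone",
    "Stainless steel kitchen knife set",
    "<i>Cotton bed sheet</i> with pillow covers",
]
labels = [
    "Books", "Books",
    "Clothing & Accessories", "Clothing & Accessories",
    "Electronics", "Electronics",
    "Household", "Household",
]


def build_pipeline(vectorizer):
    """Chain a text vectorizer with a Random Forest classifier."""
    return make_pipeline(
        vectorizer,
        RandomForestClassifier(n_estimators=100, random_state=42),
    )


# The vectorizers handle tokenization and stop-word removal; clean_text
# replaces their default preprocessing step (which would only lowercase).
bow_model = build_pipeline(
    CountVectorizer(preprocessor=clean_text, stop_words="english")
)
tfidf_model = build_pipeline(
    TfidfVectorizer(preprocessor=clean_text, stop_words="english")
)

bow_model.fit(texts, labels)
tfidf_model.fit(texts, labels)

query = ["<b>bluetooth headphones</b> with charger"]
print("Bag of Words prediction:", bow_model.predict(query)[0])
print("TF-IDF prediction:", tfidf_model.predict(query)[0])
```

Building both pipelines from the same helper keeps the classifier settings identical, so any difference in results comes from the feature extraction step alone.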
Results:
- Model Accuracy: 93%
- Evaluated using:
  - Precision
  - Recall
  - F1-score
  - Confusion Matrix
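
The metrics listed above can be computed with scikit-learn along these lines. The `y_true` and `y_pred` lists are made up for illustration (six hypothetical products, one Electronics item misclassified as Household) and do not reproduce the 93% figure reported for the real dataset.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    recall_score,
)

CATEGORIES = ["Books", "Clothing & Accessories", "Electronics", "Household"]

# Hypothetical true and predicted labels for six products.
y_true = [
    "Books", "Books", "Clothing & Accessories",
    "Electronics", "Electronics", "Household",
]
y_pred = [
    "Books", "Books", "Clothing & Accessories",
    "Electronics", "Household", "Household",
]

accuracy = accuracy_score(y_true, y_pred)

# Rows are true categories, columns are predicted categories,
# both in the order given by CATEGORIES.
cm = confusion_matrix(y_true, y_pred, labels=CATEGORIES)

# One recall value per category, in CATEGORIES order.
per_class_recall = recall_score(
    y_true, y_pred, labels=CATEGORIES, average=None
)

print(f"accuracy: {accuracy:.3f}")
print(cm)
print(classification_report(y_true, y_pred, labels=CATEGORIES, zero_division=0))
```

In this toy case the misclassified Electronics item lowers the Electronics recall and the Household precision, which is the kind of pattern the confusion matrix makes visible.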
Insights:
- The Clothing & Accessories category achieved the highest classification accuracy of the four categories.
- The Electronics category showed slightly lower recall, which is attributed to its product descriptions overlapping with those of other categories.
- End-to-end NLP pipeline
- Text preprocessing techniques
- Comparison of feature engineering approaches
- Machine learning model training
- Model performance evaluation
- Visualization using WordCloud and plots
AI_models/
│
├── ecommerce_product_classification
│
├── notebooks
│
└── datasets
- Implement additional models such as:
  - Naive Bayes
  - Logistic Regression
  - Gradient Boosting
- Hyperparameter tuning
- Model deployment using APIs
- Build an interactive application for product classification
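
One possible way to approach the model comparison and hyperparameter tuning items above is a single grid search that swaps the classifier step of a pipeline. This is only a sketch of planned work: the toy data, the parameter grids, and the choice of macro F1 as the scoring metric are all assumptions, not decisions made in this repository.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical toy data: three short descriptions per category.
texts = [
    "paperback novel by a bestselling author",
    "hardcover textbook on linear algebra",
    "illustrated story book for children",
    "cotton shirt for men with slim fit",
    "leather belt and matching wallet",
    "women running shoes with soft sole",
    "wireless bluetooth headphones with mic",
    "usb charger cable for smartphone",
    "laptop with fast processor and ssd",
    "stainless steel kitchen knife set",
    "cotton bed sheet with pillow covers",
    "plastic storage container for pantry",
]
labels = (
    ["Books"] * 3
    + ["Clothing & Accessories"] * 3
    + ["Electronics"] * 3
    + ["Household"] * 3
)

pipe = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),  # placeholder, replaced by the grid
])

# Each dict swaps in one candidate classifier with its own small grid.
param_grid = [
    {"clf": [MultinomialNB()], "clf__alpha": [0.1, 1.0]},
    {"clf": [LogisticRegression(max_iter=1000)], "clf__C": [0.1, 1.0, 10.0]},
    {
        "clf": [GradientBoostingClassifier(random_state=0)],
        "clf__n_estimators": [50, 100],
    },
]

search = GridSearchCV(pipe, param_grid, cv=3, scoring="f1_macro")
search.fit(texts, labels)

print("best classifier:", type(search.best_params_["clf"]).__name__)
print("best macro F1:", search.best_score_)
```

On the real dataset the grids would need to be wider and the search considerably slower, so a randomized search may be a more practical starting point.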
Aman Bhargava, working at Samsung Research Institute Bangalore
Interested in:
- Machine Learning
- Artificial Intelligence
- Natural Language Processing
- Data Science
- Large Language Models (LLMs)
- The future of AI in the telecom domain
This repository is for learning and portfolio purposes.