Here is the final version of your GitHub README.md with both training accuracy (80%) and testing accuracy (79%) clearly mentioned:
# 🐦 Twitter Sentiment Analysis Using Machine Learning
This project aims to analyze and classify sentiments expressed in tweets related to various airlines. It applies machine learning techniques to categorize each tweet as **Positive**, **Negative**, or **Neutral** based on its textual content.
---
## 📌 Objective
To build a machine learning model that classifies the sentiment of tweets using Natural Language Processing (NLP) and vectorization techniques.
---
## 📂 Dataset
The dataset used is `train.csv`, containing the following key columns:
- `tweet_id` – Unique ID for each tweet
- `airline` – The airline company mentioned
- `airline_sentiment` – The sentiment label (positive, negative, neutral)
- `text` – The content of the tweet
---
## 🧰 Libraries Used
- `pandas`, `numpy` – Data manipulation
- `matplotlib`, `seaborn` – Data visualization
- `nltk` – Text preprocessing
- `sklearn` – Machine learning models and evaluation
- `wordcloud` – Word cloud visualization
---
## 🔄 Workflow
### 1. **Data Preprocessing**
- Lowercasing text
- Removing URLs, mentions, hashtags, punctuations
- Removing stopwords
- Tokenization and Lemmatization (using NLTK)
### 2. **Exploratory Data Analysis**
- Visualizing sentiment distribution
- Airline-wise sentiment analysis
- Word clouds for each sentiment category
### 3. **Feature Extraction**
- TF-IDF Vectorization of cleaned text
### 4. **Model Training**
Trained the following classifiers:
- Logistic Regression
- Naive Bayes
- Random Forest
- Support Vector Machine (SVM)
> 📈 **Best Model Training Accuracy: ~80%**
> 🧪 **Best Model Testing Accuracy: ~79%**
### 5. **Model Evaluation**
Used the following metrics:
- Accuracy
- Confusion Matrix
- Classification Report
---
## ✅ Results
The best-performing model achieved:
- **Training Accuracy:** ~80%
- **Testing Accuracy:** ~79%
These results indicate strong model performance with minimal overfitting.
---
## 💡 Possible Improvements
- Integrate deep learning models like LSTM for better results
- Add real-time tweet scraping using Tweepy (Twitter API)
- Use Word2Vec or transformer-based embeddings (like BERT)
- Perform cross-validation for better model reliability
---
## 📊 Visualizations
- Word clouds for Positive, Negative, Neutral tweets
- Bar charts for sentiment distribution across airlines
- Confusion matrices for each ML model
---
## 🚀 Getting Started
1. Clone the repo:
```bash
git clone https://github.com/venkat-0706/Twalyze.git-
Install dependencies:
pip install -r requirements.txt
-
Run the notebook: Open the Colab notebook here
Created by @venkat-0706 Feel free to reach out for suggestions or collaborations!