Twalyze


# 🐦 Twitter Sentiment Analysis Using Machine Learning

This project aims to analyze and classify sentiments expressed in tweets related to various airlines. It applies machine learning techniques to categorize each tweet as **Positive**, **Negative**, or **Neutral** based on its textual content.

---

## 📌 Objective

To build a machine learning model that classifies the sentiment of tweets using Natural Language Processing (NLP) and vectorization techniques.

---

## 📂 Dataset

The dataset used is `train.csv`, containing the following key columns:

- `tweet_id` – Unique ID for each tweet  
- `airline` – The airline company mentioned  
- `airline_sentiment` – The sentiment label (positive, negative, neutral)  
- `text` – The content of the tweet  
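
As a minimal sketch of how these columns might be loaded and checked with `pandas`, the snippet below reads a tiny in-memory stand-in for `train.csv`. The sample rows, the airline names, and the `load_tweets` helper are illustrative assumptions, not part of the project.

```python
import io

import pandas as pd

# Tiny in-memory stand-in for train.csv; these rows are made up for illustration.
SAMPLE_CSV = """tweet_id,airline,airline_sentiment,text
1,United,negative,"@united my flight was delayed again"
2,Delta,positive,"@delta thanks for the smooth trip!"
3,United,neutral,"@united what time does boarding start?"
"""

EXPECTED_COLUMNS = ["tweet_id", "airline", "airline_sentiment", "text"]


def load_tweets(source):
    """Read the CSV and keep the four key columns, failing early if one is missing."""
    df = pd.read_csv(source)
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df[EXPECTED_COLUMNS]


# Hypothetical helper; pass the path "train.csv" instead of the sample to load the real file.
df = load_tweets(io.StringIO(SAMPLE_CSV))
print(df["airline_sentiment"].value_counts())
```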

---

## 🧰 Libraries Used

- `pandas`, `numpy` – Data manipulation  
- `matplotlib`, `seaborn` – Data visualization  
- `nltk` – Text preprocessing  
- `sklearn` – Machine learning models and evaluation  
- `wordcloud` – Word cloud visualization  

---

## 🔄 Workflow

### 1. **Data Preprocessing**
- Lowercasing text  
- Removing URLs, mentions, hashtags, and punctuation  
- Removing stopwords  
- Tokenization and Lemmatization (using NLTK)  
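
The cleaning steps above can be sketched with the standard library alone. This is only an approximation of the project's pipeline: the project uses NLTK for stopwords and lemmatization, while this sketch swaps in a small hand-picked stopword set, splits on whitespace, and skips lemmatization so that it runs without downloads. It also assumes a hashtag is dropped entirely rather than just losing its `#`. The `clean_tweet` name is hypothetical.

```python
import re
import string

# Hand-picked stopwords (an assumption); the project relies on NLTK's list instead.
STOPWORDS = {"a", "an", "the", "is", "was", "to", "for", "my", "of", "and", "in", "on"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_HASHTAG_RE = re.compile(r"[@#]\w+")
PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def clean_tweet(text):
    """Lowercase, strip URLs/mentions/hashtags/punctuation, then drop stopwords."""
    text = text.lower()
    text = URL_RE.sub(" ", text)              # remove links first, before punctuation
    text = MENTION_HASHTAG_RE.sub(" ", text)  # remove @mentions and whole #hashtags
    text = text.translate(PUNCT_TABLE)        # delete remaining punctuation
    tokens = text.split()                     # whitespace tokenization (no lemmatization here)
    return [t for t in tokens if t not in STOPWORDS]


print(clean_tweet("@united My flight to NYC was DELAYED!! http://t.co/abc #fail"))
# ['flight', 'nyc', 'delayed']
```

URLs are removed before punctuation on purpose: stripping punctuation first would break `http://...` into ordinary-looking tokens that the URL pattern no longer matches.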

### 2. **Exploratory Data Analysis**
- Visualizing sentiment distribution  
- Airline-wise sentiment analysis  
- Word clouds for each sentiment category  
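
A rough sketch of the counts behind the first two views, assuming a DataFrame with the `airline` and `airline_sentiment` columns. The rows below are invented for illustration.

```python
import pandas as pd

# Made-up rows standing in for the real dataset.
df = pd.DataFrame({
    "airline": ["United", "United", "Delta", "Delta", "Delta"],
    "airline_sentiment": ["negative", "neutral", "positive", "negative", "negative"],
})

# Overall sentiment distribution.
overall = df["airline_sentiment"].value_counts()

# Airline-wise breakdown: one row per airline, one column per sentiment.
by_airline = pd.crosstab(df["airline"], df["airline_sentiment"])

print(overall)
print(by_airline)
```

With `matplotlib` installed, `by_airline.plot(kind="bar")` would turn the cross-tabulation into a grouped bar chart.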

### 3. **Feature Extraction**
- TF-IDF Vectorization of cleaned text  

### 4. **Model Training**
Trained the following classifiers:
- Logistic Regression  
- Naive Bayes  
- Random Forest  
- Support Vector Machine (SVM)  

> 📈 **Best Model Training Accuracy: ~80%**  
> 🧪 **Best Model Testing Accuracy: ~79%**
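
One way the four classifiers could be trained side by side is sketched below with scikit-learn pipelines. The specific estimator variants (`MultinomialNB`, `LinearSVC`), the hyperparameters, and the toy data are assumptions; the README does not specify them, and the scores printed here say nothing about the accuracy reported above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy training data, invented for illustration.
texts = [
    "flight delayed again terrible service",
    "lost my luggage awful experience",
    "thanks for the smooth trip",
    "great crew lovely flight",
    "what time does boarding start",
    "is there wifi on this flight",
]
labels = ["negative", "negative", "positive", "positive", "neutral", "neutral"]

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Naive Bayes": MultinomialNB(),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "SVM": LinearSVC(),
}

fitted = {}
for name, clf in models.items():
    # Each model gets its own TF-IDF step so the pipeline can be reused on raw text.
    pipe = make_pipeline(TfidfVectorizer(), clf)
    pipe.fit(texts, labels)
    fitted[name] = pipe
    print(name, pipe.score(texts, labels))  # training accuracy on the toy data only
```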

### 5. **Model Evaluation**
Used the following metrics:
- Accuracy  
- Confusion Matrix  
- Classification Report  
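
The three metrics map directly onto functions in `sklearn.metrics`. The sketch below uses invented true and predicted labels rather than the project's outputs.

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

LABELS = ["negative", "neutral", "positive"]

# Made-up labels for illustration only.
y_true = ["negative", "negative", "neutral", "positive", "positive", "negative"]
y_pred = ["negative", "neutral", "neutral", "positive", "negative", "negative"]

acc = accuracy_score(y_true, y_pred)
# Rows are true classes, columns are predicted classes, both in LABELS order.
cm = confusion_matrix(y_true, y_pred, labels=LABELS)

print(acc)
print(cm)
print(classification_report(y_true, y_pred, labels=LABELS, zero_division=0))
```

Passing `labels=LABELS` fixes the row and column order of the confusion matrix, which makes matrices from different models directly comparable.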

---

## ✅ Results

The best-performing model achieved:
- **Training Accuracy:** ~80%  
- **Testing Accuracy:** ~79%  

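The one-point gap between training and testing accuracy suggests minimal overfitting. Accuracy alone can hide weak performance on less frequent sentiment classes, so the confusion matrix and classification report give a fuller picture.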

---

## 💡 Possible Improvements

- Integrate deep learning models like LSTM for better results  
- Add real-time tweet scraping using Tweepy (Twitter API)  
- Use Word2Vec or transformer-based embeddings (like BERT)  
- Perform cross-validation for better model reliability  
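
As a hedged illustration of the cross-validation idea, the sketch below runs 3-fold cross-validation over a TF-IDF plus Logistic Regression pipeline with `cross_val_score`. The fold count, the choice of model, and the toy data are assumptions made for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Toy data with three examples per class so that 3 stratified folds are possible.
texts = [
    "flight delayed again", "lost my luggage", "rude staff terrible service",
    "thanks smooth trip", "great crew", "lovely flight amazing service",
    "what time is boarding", "is there wifi", "which gate for my flight",
]
labels = ["negative"] * 3 + ["positive"] * 3 + ["neutral"] * 3

# Vectorizing inside the pipeline keeps TF-IDF statistics from leaking across folds.
pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, texts, labels, cv=3)

print(scores, scores.mean())
```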

---

## 📊 Visualizations

- Word clouds for Positive, Negative, Neutral tweets  
- Bar charts for sentiment distribution across airlines  
- Confusion matrices for each ML model  

---

## 🚀 Getting Started

1. Clone the repo:
   ```bash
   git clone https://github.com/venkat-0706/Twalyze.git
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the notebook: open the project's Colab notebook.


---

## 📬 Contact

Created by @venkat-0706. Feel free to reach out with suggestions or collaboration ideas!
