SoudeepGhoshal/EmotionDetection

Emotion Distribution Modeling in Text

An applied multi-label NLP system that predicts probabilistic distributions over 10 emotion categories from input text. The project integrates multiple public emotion datasets into a unified taxonomy and implements a complete training, evaluation, and inference pipeline using a BERT-based architecture.


Overview

The system predicts probabilities for the following emotion categories:

  • Joy
  • Sadness
  • Anger
  • Fear
  • Surprise
  • Disgust
  • Love
  • Neutral
  • Sarcasm
  • Nostalgia

The model operates in a multi-label setting using sigmoid outputs, allowing multiple emotions to be active for a single input.
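As a minimal sketch of what the multi-label setting means in practice, the snippet below applies a sigmoid to each logit independently, so the resulting probabilities need not sum to 1 and several emotions can pass a threshold at once. The logits and the 0.5 threshold are hypothetical values chosen for illustration, not taken from the project.

```python
import math

EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise",
            "disgust", "love", "neutral", "sarcasm", "nostalgia"]

def sigmoid(x: float) -> float:
    """Map a raw logit to an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def active_emotions(logits, threshold=0.5):
    """Return every emotion whose sigmoid probability reaches the threshold."""
    probs = [sigmoid(z) for z in logits]
    return [(name, round(p, 3)) for name, p in zip(EMOTIONS, probs) if p >= threshold]

# Hypothetical logits in which both joy and love are strongly activated.
example_logits = [2.0, -3.0, -4.0, -3.5, -0.5, -5.0, 1.5, -1.0, -4.0, -2.0]
print(active_emotions(example_logits))
```

With a softmax, joy and love would compete for the same probability mass; with per-label sigmoids they are scored separately.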


Dataset Construction

To build a consistent multi-label framework, the following datasets were integrated:

  • GoEmotions (Google Research)
  • Twitter Emotion Dataset

Preprocessing steps:

  • Harmonized heterogeneous label spaces into a unified 10-emotion taxonomy.
  • Mapped fine-grained labels into broader categories.
  • Generated probabilistic target vectors for multi-label learning.
  • Performed stratified train/validation/test splits.
  • Stored dataset statistics and metadata for reproducibility.
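The label harmonization and target-vector steps above might look roughly like the following sketch. The mapping table is hypothetical (the source labels shown are GoEmotions categories, but the project's actual mapping is not given here), and spreading unit mass evenly over the mapped categories is an assumption about how the probabilistic targets are formed.

```python
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise",
            "disgust", "love", "neutral", "sarcasm", "nostalgia"]

# Hypothetical mapping from fine-grained source labels to the unified taxonomy.
LABEL_MAP = {
    "joy": "joy", "amusement": "joy",
    "sadness": "sadness", "grief": "sadness",
    "anger": "anger", "annoyance": "anger",
    "fear": "fear", "nervousness": "fear",
    "love": "love", "neutral": "neutral",
}

def to_target_vector(source_labels):
    """Map source labels to unified categories and spread unit mass evenly over them."""
    mapped = {LABEL_MAP[label] for label in source_labels if label in LABEL_MAP}
    if not mapped:
        return [0.0] * len(EMOTIONS)  # no recognizable label: all-zero target
    weight = 1.0 / len(mapped)
    return [weight if emotion in mapped else 0.0 for emotion in EMOTIONS]

print(to_target_vector(["amusement", "love"]))   # mass split between joy and love
print(to_target_vector(["annoyance", "anger"]))  # both collapse into anger
```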

Model Architecture

  • Backbone: bert-base-uncased
  • Classification Head: Linear projection over pooled CLS representation
  • Loss Function: BCEWithLogitsLoss
  • Optimizer: AdamW
  • Learning Rate Scheduler: Cosine Annealing
  • Early Stopping based on validation performance
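For readers unfamiliar with the scheduler and stopping rule listed above, here is a dependency-free sketch of both. The closed-form cosine schedule is the standard one; the peak learning rate, the patience value, and the choice of a higher-is-better validation score are assumptions, not settings taken from train.py.

```python
import math

def cosine_annealing_lr(step, total_steps, lr_max, lr_min=0.0):
    """Closed-form cosine annealing: lr_max at step 0, lr_min at total_steps."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * step / total_steps))

class EarlyStopping:
    """Signal a stop once the validation score fails to improve `patience` times in a row."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_checks = 0

    def should_stop(self, val_score):
        if val_score > self.best:
            self.best = val_score   # new best: reset the counter
            self.bad_checks = 0
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

# Hypothetical run: 10 steps, peak learning rate 2e-5.
schedule = [cosine_annealing_lr(s, 10, 2e-5) for s in range(11)]
stopper = EarlyStopping(patience=3)
decisions = [stopper.should_stop(score) for score in [0.50, 0.60, 0.59, 0.58, 0.57]]
print(schedule[0], schedule[5], schedule[10], decisions)
```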

The model is trained to output independent probabilities for each emotion category.
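To make the loss concrete, the sketch below computes binary cross-entropy directly from raw logits in plain Python, using the numerically stable form and averaging over labels (mirroring the default mean reduction of BCEWithLogitsLoss). It is an illustration of the formula, not the project's training code, and the example logits and targets are made up.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed stably from raw logits."""
    total = 0.0
    for z, y in zip(logits, targets):
        # Stable rewrite of -[y*log(sigmoid(z)) + (1 - y)*log(1 - sigmoid(z))].
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# Hypothetical two-label example: first label present, second absent.
print(bce_with_logits([2.0, -1.0], [1.0, 0.0]))
```

Because each label contributes its own term, a confident mistake on one emotion is penalized without affecting the terms for the others.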


Evaluation Results

Evaluation performed on 14,853 test samples.

Overall Metrics

Metric               Value
Dominant Accuracy    0.7045
Subset Accuracy      0.6364
Hamming Loss         0.0559
Macro F1 Score       0.5232
Weighted F1 Score    0.7037
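The sketch below shows how the first three metrics can be computed on toy data. Subset accuracy and Hamming loss follow their standard multi-label definitions; the definition of dominant accuracy (top-scoring predicted emotion equals top-scoring target emotion) is an assumption, since it is not spelled out here. The toy arrays use three labels for brevity.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full predicted label set matches exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))

def dominant_accuracy(target_probs, pred_probs):
    """Assumed definition: the argmax of the prediction equals the argmax of the target."""
    def argmax(values):
        return max(range(len(values)), key=values.__getitem__)
    hits = sum(argmax(t) == argmax(p) for t, p in zip(target_probs, pred_probs))
    return hits / len(target_probs)

# Toy binary label matrices (rows are samples, columns are labels).
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(subset_accuracy(y_true, y_pred), hamming_loss(y_true, y_pred))
print(dominant_accuracy([[0.7, 0.2, 0.1], [0.1, 0.5, 0.4]],
                        [[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]]))
```

Subset accuracy is the strictest of the three, since a single wrong label slot makes the whole sample count as an error.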

Per-Emotion F1 Scores

Emotion      F1 Score
Joy          0.7515
Sadness      0.8378
Anger        0.6313
Fear         0.8041
Surprise     0.4725
Disgust      0.3119
Love         0.7568
Neutral      0.6664
Sarcasm      0.0000
Nostalgia    0.0000

An F1 score of 0.0000 for sarcasm and nostalgia means the model produced no true positives for either class on the test set. This is primarily due to severe class imbalance in the integrated dataset.
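As a quick consistency check, the reported Macro F1 is the unweighted mean of the ten per-emotion F1 scores listed above. The snippet also computes the mean over the eight emotions with nonzero F1, purely to illustrate how much the two zero-scoring classes pull the macro average down; that second figure is not a reported result.

```python
PER_EMOTION_F1 = {
    "Joy": 0.7515, "Sadness": 0.8378, "Anger": 0.6313, "Fear": 0.8041,
    "Surprise": 0.4725, "Disgust": 0.3119, "Love": 0.7568, "Neutral": 0.6664,
    "Sarcasm": 0.0000, "Nostalgia": 0.0000,
}

# Macro F1: every class counts equally, regardless of how many samples it has.
macro_f1 = sum(PER_EMOTION_F1.values()) / len(PER_EMOTION_F1)

# Illustrative only: the same average restricted to classes with nonzero F1.
nonzero = [score for score in PER_EMOTION_F1.values() if score > 0]
macro_f1_nonzero = sum(nonzero) / len(nonzero)

print(round(macro_f1, 4))          # 0.5232, matching the reported Macro F1
print(round(macro_f1_nonzero, 4))  # 0.654
```

The Weighted F1 (0.7037) is higher because it weights each class by its support, so the rare classes contribute very little to it.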


Visual Diagnostics

Confusion Matrix

[Figure: confusion matrix]

Per-Emotion Performance

[Figure: per-emotion metrics]

These diagnostics help identify systematic confusion patterns (e.g., anger vs. fear, joy vs. sarcasm) and class imbalance effects.


Inference Pipeline

The inference module includes:

  • Tokenization with truncation and padding
  • Sigmoid probability outputs
  • Threshold-based filtering
  • Probability mass redistribution for interpretability
  • Command-line interface for quick testing

Example usage:

python inference.py "I feel incredibly happy today!"
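The threshold filtering and probability mass redistribution steps could be sketched as follows. This assumes that redistribution means renormalizing the surviving probabilities so they sum to 1; the function name `postprocess`, the 0.3 threshold, and the input values are hypothetical and may differ from what inference.py does.

```python
def postprocess(probs, threshold=0.3):
    """Drop emotions below the threshold, then renormalize the rest to sum to 1."""
    kept = {name: p for name, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    if total == 0:
        return {}  # nothing passed the threshold
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    return {name: p / total for name, p in ranked}

# Hypothetical sigmoid outputs for one input sentence.
raw = {"joy": 0.9, "love": 0.6, "sadness": 0.05, "neutral": 0.1}
print(postprocess(raw))
```

After renormalization the output reads as a distribution over the retained emotions only, which is easier to interpret than independent per-label scores.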

Reproducing Results

1. Clone the Repository

git clone <repository-url>
cd emotion-detection

2. Install Dependencies

pip install -r requirements.txt

Required libraries include:

  • torch
  • transformers
  • scikit-learn
  • pandas
  • numpy
  • matplotlib

3. Prepare the Dataset

Place processed CSV files in the data/ directory:

data/
├── train.csv
├── val.csv
└── test.csv

4. Train the Model

python train.py

The best-performing model checkpoint will be saved in:

models/best_model.pt

5. Evaluate the Model

python test.py

Evaluation reports and visualizations will be generated automatically.


Project Structure

emotion-detection/
│
├── data/
├── models/
├── train.py
├── test.py
├── inference.py
├── preprocess.py
├── requirements.txt
├── LICENCE.txt
└── README.md

Limitations

  • Severe class imbalance for sarcasm and nostalgia.
  • Semantic ambiguity between closely related emotions.
  • Neutral class bias under uncertainty.

Future improvements may include class-weighted loss functions, data augmentation, or probability calibration.
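As one possible shape for the class-weighted loss idea, the sketch below derives a per-label positive weight as the ratio of negative to positive examples, which is the ratio commonly passed to the `pos_weight` argument of BCEWithLogitsLoss. The helper name, the fallback weight of 1.0 for labels with no positives, and the toy label matrix are assumptions for illustration.

```python
def positive_class_weights(label_matrix):
    """Per-label weight = (#negatives / #positives); rarer labels get larger weights."""
    n_samples = len(label_matrix)
    weights = []
    for j in range(len(label_matrix[0])):
        positives = sum(row[j] for row in label_matrix)
        if positives == 0:
            weights.append(1.0)  # assumed fallback: no positives to reweight
        else:
            weights.append((n_samples - positives) / positives)
    return weights

# Toy label matrix: label 0 is common, label 1 is rare.
labels = [[1, 0], [1, 0], [1, 1], [0, 0]]
print(positive_class_weights(labels))
```

On a heavily imbalanced label such as sarcasm or nostalgia, this ratio becomes large, so each rare positive example contributes more to the loss.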


Author

Soudeep Ghoshal
