An applied multi-label NLP system that predicts a probability for each of 10 emotion categories from input text. The project integrates multiple public emotion datasets into a unified taxonomy and implements a complete training, evaluation, and inference pipeline using a BERT-based architecture.
The system predicts probabilities for the following emotion categories:
- Joy
- Sadness
- Anger
- Fear
- Surprise
- Disgust
- Love
- Neutral
- Sarcasm
- Nostalgia
The model operates in a multi-label setting using sigmoid outputs, allowing multiple emotions to be active for a single input.
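As a minimal sketch of what "multi-label with sigmoid outputs" means in practice, the snippet below applies a sigmoid to each logit independently, so several emotions can clear the threshold at once. The logit values, the 0.5 threshold, and the helper name `active_emotions` are illustrative assumptions, not taken from the project.

```python
import math

EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise",
            "disgust", "love", "neutral", "sarcasm", "nostalgia"]

def sigmoid(x: float) -> float:
    """Map a single logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def active_emotions(logits, threshold=0.5):
    """Return {emotion: probability} for every label at or above the threshold."""
    probs = [sigmoid(z) for z in logits]
    return {e: p for e, p in zip(EMOTIONS, probs) if p >= threshold}

# Hypothetical logits for one input (assumed values, not model output).
example_logits = [2.0, -3.0, -4.0, -3.5, 0.2, -5.0, 1.5, -1.0, -4.0, -2.0]

# Each probability is computed on its own, so they need not sum to 1.
print(active_emotions(example_logits))
```

With a 0.5 threshold, exactly the labels with non-negative logits are active, which here are joy, surprise, and love.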
To build a consistent multi-label framework, the following datasets were integrated:
- GoEmotions (Google Research)
- Twitter Emotion Dataset
Preprocessing steps:
- Harmonized heterogeneous label spaces into a unified 10-emotion taxonomy.
- Mapped fine-grained labels into broader categories.
- Generated probabilistic target vectors for multi-label learning.
- Performed stratified train/validation/test splits.
- Stored dataset statistics and metadata for reproducibility.
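The label harmonization step can be sketched roughly as follows. The source labels are GoEmotions label names, but the mapping itself is a hypothetical example, since the project's actual mapping table is not listed here. The sketch also uses plain multi-hot targets as the simplest case; how the project derives its probabilistic targets is not specified.

```python
EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise",
            "disgust", "love", "neutral", "sarcasm", "nostalgia"]
INDEX = {name: i for i, name in enumerate(EMOTIONS)}

# Hypothetical mapping from fine-grained source labels to the unified taxonomy.
FINE_TO_UNIFIED = {
    "joy": "joy",
    "excitement": "joy",
    "sadness": "sadness",
    "grief": "sadness",
    "anger": "anger",
    "annoyance": "anger",
    "fear": "fear",
    "nervousness": "fear",
    "love": "love",
    "neutral": "neutral",
}

def to_target_vector(source_labels):
    """Convert a list of source labels into a 10-dimensional multi-hot vector."""
    target = [0.0] * len(EMOTIONS)
    for label in source_labels:
        unified = FINE_TO_UNIFIED.get(label)
        if unified is not None:  # labels without a mapping are skipped in this sketch
            target[INDEX[unified]] = 1.0
    return target

# Two fine-grained labels collapse into the same broad category (anger).
print(to_target_vector(["annoyance", "anger"]))
# Labels from different categories activate multiple positions (joy and love).
print(to_target_vector(["excitement", "love"]))
```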
- Backbone: bert-base-uncased
- Classification Head: Linear projection over pooled CLS representation
- Loss Function: BCEWithLogitsLoss
- Optimizer: AdamW
- Learning Rate Scheduler: Cosine Annealing
- Early Stopping based on validation performance
The model is trained to output independent probabilities for each emotion category.
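The components above could be wired together along these lines in PyTorch. This is a sketch under stated assumptions rather than the project's `train.py`: the learning rate, `T_max`, batch size, and epoch count are made up, and a random tensor stands in for the encoder's pooled output so the snippet runs without downloading `bert-base-uncased`. In a real run the encoder parameters would presumably be optimized as well.

```python
import torch
from torch import nn

NUM_EMOTIONS = 10
HIDDEN_SIZE = 768  # hidden size of bert-base-uncased

# Linear classification head over the pooled [CLS] representation.
head = nn.Linear(HIDDEN_SIZE, NUM_EMOTIONS)

criterion = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy per label
optimizer = torch.optim.AdamW(head.parameters(), lr=2e-5)  # lr is an assumption
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=3)  # T_max is an assumption

# Stand-in data: random "pooled" vectors and random multi-hot targets.
torch.manual_seed(0)
pooled = torch.randn(4, HIDDEN_SIZE)
targets = torch.randint(0, 2, (4, NUM_EMOTIONS)).float()

for epoch in range(3):
    optimizer.zero_grad()
    logits = head(pooled)              # shape: (batch, 10)
    loss = criterion(logits, targets)  # averaged over all batch/label entries
    loss.backward()
    optimizer.step()
    scheduler.step()                   # cosine-anneal the learning rate once per epoch

# At inference time, a sigmoid turns each logit into an independent probability.
probs = torch.sigmoid(head(pooled))
print(probs.shape)
```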
Evaluation was performed on 14,853 test samples.
| Metric | Value |
|---|---|
| Dominant Accuracy | 0.7045 |
| Subset Accuracy | 0.6364 |
| Hamming Loss | 0.0559 |
| Macro F1 Score | 0.5232 |
| Weighted F1 Score | 0.7037 |
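For readers unfamiliar with these metrics, here is a small sketch of how three of them can be computed on toy data. Subset accuracy and Hamming loss follow their standard definitions. "Dominant accuracy" is not defined in this document, so the version below (the highest-probability prediction matches the first highest-valued target label) is an assumption.

```python
def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label vector is predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = sum(ti != pi for t, p in zip(y_true, y_pred) for ti, pi in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))

def argmax(values):
    """Index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)

def dominant_accuracy(y_true, probs):
    """Assumed definition: top predicted label equals the top target label."""
    return sum(argmax(t) == argmax(p) for t, p in zip(y_true, probs)) / len(y_true)

# Toy data with 3 labels instead of 10, for readability.
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 1]]
probs = [[0.9, 0.2, 0.1], [0.3, 0.8, 0.4], [0.6, 0.1, 0.7]]
y_pred = [[int(p >= 0.5) for p in row] for row in probs]  # 0.5 threshold is an assumption

print(subset_accuracy(y_true, y_pred))  # 1 of 3 rows matches exactly
print(hamming_loss(y_true, y_pred))     # 2 of 9 label slots are wrong
print(dominant_accuracy(y_true, probs)) # all 3 top labels match
```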
| Emotion | F1 Score |
|---|---|
| Joy | 0.7515 |
| Sadness | 0.8378 |
| Anger | 0.6313 |
| Fear | 0.8041 |
| Surprise | 0.4725 |
| Disgust | 0.3119 |
| Love | 0.7568 |
| Neutral | 0.6664 |
| Sarcasm | 0.0000 |
| Nostalgia | 0.0000 |
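Macro F1 is the unweighted mean of the per-class F1 scores, so the two tables can be cross-checked with a few lines of arithmetic. The snippet below uses only the values reported above and also shows how much the two zero-scoring classes pull the average down.

```python
per_class_f1 = {
    "joy": 0.7515, "sadness": 0.8378, "anger": 0.6313, "fear": 0.8041,
    "surprise": 0.4725, "disgust": 0.3119, "love": 0.7568, "neutral": 0.6664,
    "sarcasm": 0.0000, "nostalgia": 0.0000,
}

# Unweighted mean over all 10 classes.
macro_f1 = sum(per_class_f1.values()) / len(per_class_f1)
print(f"{macro_f1:.4f}")  # 0.5232, matching the reported Macro F1 Score

# Mean over the 8 classes with a non-zero score, for comparison.
nonzero = [v for v in per_class_f1.values() if v > 0]
macro_f1_nonzero = sum(nonzero) / len(nonzero)
print(f"{macro_f1_nonzero:.4f}")  # 0.6540
```

The gap between the macro F1 (0.5232) and the weighted F1 (0.7037) is consistent with the zero-scoring classes being rare, since weighted F1 scales each class by its support.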
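The F1 scores of 0.0000 for sarcasm and nostalgia mean the model produced no correct positive predictions for these classes on the test set. This is primarily due to severe class imbalance in the integrated dataset.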
These diagnostics help identify systematic confusion patterns (e.g., anger vs. fear, joy vs. sarcasm) and class imbalance effects.
The inference module includes:
- Tokenization with truncation and padding
- Sigmoid probability outputs
- Threshold-based filtering
- Probability mass redistribution for interpretability
- Command-line interface for quick testing
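The threshold filtering and probability mass redistribution steps might look roughly like the sketch below. It assumes that "redistribution" means renormalizing the surviving probabilities so they sum to 1. The function name `redistribute`, the 0.3 threshold, and the behavior when nothing passes the threshold are all illustrative assumptions.

```python
def redistribute(probs, threshold=0.3):
    """Drop emotions below the threshold, then renormalize the rest to sum to 1."""
    kept = {emotion: p for emotion, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    if total == 0:
        # Nothing passed the threshold; this sketch returns an empty result.
        return {}
    return {emotion: p / total for emotion, p in kept.items()}

# Hypothetical sigmoid outputs for one input (assumed values).
raw = {"joy": 0.8, "love": 0.4, "neutral": 0.1, "sadness": 0.02}

shares = redistribute(raw)
for emotion, share in shares.items():
    print(f"{emotion}: {share:.2%}")
```

Here joy and love survive the filter, and their 0.8 and 0.4 become shares of two thirds and one third of the displayed probability mass.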
Example usage:
python inference.py "I feel incredibly happy today!"
git clone <repository-url>
cd emotion-detection
pip install -r requirements.txt
Required libraries include:
- torch
- transformers
- scikit-learn
- pandas
- numpy
- matplotlib
Place processed CSV files in the data/ directory:
data/
├── train.csv
├── val.csv
└── test.csv
python train.py
The best-performing model checkpoint will be saved in:
models/best_model.pt
python test.py
Evaluation reports and visualizations will be generated automatically.
emotion-detection/
│
├── data/
├── models/
├── train.py
├── test.py
├── inference.py
├── preprocess.py
├── requirements.txt
├── LICENCE.txt
└── README.md
- Severe class imbalance for sarcasm and nostalgia.
- Semantic ambiguity between closely related emotions.
- Neutral class bias under uncertainty.
Future improvements may include class-weighted loss functions, data augmentation, or probability calibration.
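As one possible starting point for a class-weighted loss, per-class positive weights can be derived from label counts, using the common heuristic of negatives divided by positives. The helper below and its toy data are hypothetical. In PyTorch, such weights could be passed as a tensor to the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.

```python
def positive_class_weights(targets):
    """Per-class weight = (# negative examples) / (# positive examples)."""
    num_samples = len(targets)
    num_classes = len(targets[0])
    weights = []
    for c in range(num_classes):
        positives = sum(row[c] for row in targets)
        if positives > 0:
            weights.append((num_samples - positives) / positives)
        else:
            # No positives at all: fall back to 1.0 to avoid division by zero.
            weights.append(1.0)
    return weights

# Toy targets with 2 classes: class 0 is common, class 1 is rare.
toy_targets = [[1, 0], [1, 0], [1, 0], [0, 1]]
weights = positive_class_weights(toy_targets)
print(weights)  # the rare class receives the larger weight
```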
Soudeep Ghoshal

