Emotion Distribution Modeling in Text

An applied multi-label NLP system that predicts probability distributions over 10 emotion categories from input text. The project integrates multiple public emotion datasets into a unified taxonomy and implements a complete training, evaluation, and inference pipeline using a BERT-based architecture.

Overview

The system predicts probabilities for the following emotion categories:

Joy
Sadness
Anger
Fear
Surprise
Disgust
Love
Neutral
Sarcasm
Nostalgia

The model operates in a multi-label setting using sigmoid outputs, allowing multiple emotions to be active for a single input.
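To illustrate why sigmoid outputs suit the multi-label setting, here is a minimal sketch that compares per-label sigmoid probabilities with a softmax over the same logits. The logit values are made up for illustration and do not come from the trained model.

```python
import math

EMOTIONS = ["Joy", "Sadness", "Anger", "Fear", "Surprise",
            "Disgust", "Love", "Neutral", "Sarcasm", "Nostalgia"]


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def softmax(xs):
    """Softmax with max-subtraction for numerical stability."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up logits, one per emotion (illustrative only).
logits = [2.0, -3.0, -2.5, -3.0, 1.5, -4.0, 0.5, -1.0, -5.0, -4.5]

independent = [sigmoid(x) for x in logits]  # each label scored on its own
competing = softmax(logits)                 # labels compete for a total of 1

for name, p_sig, p_soft in zip(EMOTIONS, independent, competing):
    print(f"{name:<10} sigmoid={p_sig:.3f}  softmax={p_soft:.3f}")
```

With sigmoid, Joy, Surprise, and Love can all exceed 0.5 at once, whereas softmax forces the labels to share a single unit of probability mass.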

Dataset Construction

To build a consistent multi-label framework, the following datasets were integrated:

GoEmotions (Google Research)
Twitter Emotion Dataset

Preprocessing steps:

Harmonized heterogeneous label spaces into a unified 10-emotion taxonomy.
Mapped fine-grained labels into broader categories.
Generated probabilistic target vectors for multi-label learning.
Performed stratified train/validation/test splits.
Stored dataset statistics and metadata for reproducibility.
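The label harmonization and target-vector steps above can be sketched roughly as follows. The `FINE_TO_BROAD` mapping and the choice to split probability mass equally across the mapped emotions are assumptions made for illustration; the actual mapping and weighting used in preprocess.py may differ.

```python
EMOTIONS = ["Joy", "Sadness", "Anger", "Fear", "Surprise",
            "Disgust", "Love", "Neutral", "Sarcasm", "Nostalgia"]

# Hypothetical mapping from fine-grained source labels to the unified taxonomy.
FINE_TO_BROAD = {
    "joy": "Joy",
    "excitement": "Joy",
    "sadness": "Sadness",
    "grief": "Sadness",
    "anger": "Anger",
    "annoyance": "Anger",
    "fear": "Fear",
    "nervousness": "Fear",
    "love": "Love",
    "neutral": "Neutral",
}


def to_target_vector(fine_labels):
    """Map fine-grained labels to a 10-dimensional target vector.

    Assumption: mass is split equally among the distinct broad emotions
    that the fine-grained labels map to. Unmapped labels are ignored.
    """
    broad = {FINE_TO_BROAD[label] for label in fine_labels if label in FINE_TO_BROAD}
    if not broad:
        return [0.0] * len(EMOTIONS)
    mass = 1.0 / len(broad)
    return [mass if emotion in broad else 0.0 for emotion in EMOTIONS]


print(to_target_vector(["grief", "sadness"]))  # both collapse into Sadness
print(to_target_vector(["joy", "love"]))       # mass shared by Joy and Love
```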

Model Architecture

Backbone: bert-base-uncased
Classification Head: Linear projection over pooled CLS representation
Loss Function: BCEWithLogitsLoss
Optimizer: AdamW
Learning Rate Scheduler: Cosine Annealing
Early Stopping: Based on validation performance

The model is trained to output independent probabilities for each emotion category.
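For intuition about the loss, the following plain-Python sketch computes binary cross-entropy directly from logits, averaged over labels, which is the quantity `torch.nn.BCEWithLogitsLoss` computes with its default mean reduction. It is an illustration of the formula, not the training code, and the example logits and targets are made up.

```python
import math


def bce_with_logits(logits, targets):
    """Mean binary cross-entropy over labels, computed from raw logits.

    Uses the numerically stable form
        max(x, 0) - x * y + log(1 + exp(-|x|))
    which equals -[y * log(sigmoid(x)) + (1 - y) * log(1 - sigmoid(x))].
    Targets may be hard (0/1) or soft values in [0, 1].
    """
    losses = [
        max(x, 0.0) - x * y + math.log1p(math.exp(-abs(x)))
        for x, y in zip(logits, targets)
    ]
    return sum(losses) / len(losses)


# Two labels: the first is active, the second is not.
targets = [1.0, 0.0]
print(bce_with_logits([5.0, -5.0], targets))  # confident and correct: small loss
print(bce_with_logits([-5.0, 5.0], targets))  # confident and wrong: large loss
```

Because each label contributes its own term, the loss treats the 10 emotions as independent binary decisions rather than as one mutually exclusive choice.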

Evaluation Results

Evaluation was performed on 14,853 test samples.

Overall Metrics

Metric	Value
Dominant Accuracy	0.7045
Subset Accuracy	0.6364
Hamming Loss	0.0559
Macro F1 Score	0.5232
Weighted F1 Score	0.7037
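As a reading aid for the table above, here is a small sketch of how subset accuracy, Hamming loss, and a dominant-label accuracy can be computed. The definition of dominant accuracy used here (the highest-scoring predicted emotion matches the highest-scoring target emotion) and the 0.5 threshold are assumptions; test.py may define them differently. The toy data uses three labels only to keep the example short.

```python
def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def binarize(probs, threshold=0.5):
    """Turn per-label probabilities into 0/1 predictions."""
    return [[1 if p >= threshold else 0 for p in row] for row in probs]


def subset_accuracy(y_true, y_pred):
    """Fraction of samples whose full label set is predicted exactly."""
    exact = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return exact / len(y_true)


def hamming_loss(y_true, y_pred):
    """Fraction of individual label decisions that are wrong."""
    wrong = sum(a != b for t, p in zip(y_true, y_pred) for a, b in zip(t, p))
    return wrong / (len(y_true) * len(y_true[0]))


def dominant_accuracy(y_true, probs):
    """Assumed definition: top predicted label equals top target label."""
    hits = sum(1 for t, p in zip(y_true, probs) if argmax(t) == argmax(p))
    return hits / len(y_true)


# Toy data: 4 samples, 3 labels.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
probs = [[0.9, 0.1, 0.2], [0.2, 0.8, 0.1], [0.7, 0.4, 0.1], [0.1, 0.6, 0.55]]
y_pred = binarize(probs)

print(subset_accuracy(y_true, y_pred))
print(hamming_loss(y_true, y_pred))
print(dominant_accuracy(y_true, probs))
```

Subset accuracy is the strictest of the three, since a single wrong label makes the whole sample count as a miss, while Hamming loss gives partial credit per label.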

Per-Emotion F1 Scores

Emotion	F1 Score
Joy	0.7515
Sadness	0.8378
Anger	0.6313
Fear	0.8041
Surprise	0.4725
Disgust	0.3119
Love	0.7568
Neutral	0.6664
Sarcasm	0.0000
Nostalgia	0.0000

Sarcasm and nostalgia both score an F1 of 0.0000, meaning the model produced no true positives for either class on the test set. This is primarily due to severe class imbalance in the integrated dataset.
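The macro F1 score is the unweighted mean of the per-emotion F1 scores, so the two zero-scoring classes pull it down noticeably. The short check below reproduces the reported macro F1 from the per-emotion table; the second figure, which averages only the eight classes with nonzero F1, is a derived illustration and not a reported result.

```python
per_emotion_f1 = {
    "Joy": 0.7515,
    "Sadness": 0.8378,
    "Anger": 0.6313,
    "Fear": 0.8041,
    "Surprise": 0.4725,
    "Disgust": 0.3119,
    "Love": 0.7568,
    "Neutral": 0.6664,
    "Sarcasm": 0.0000,
    "Nostalgia": 0.0000,
}

# Macro F1: every class counts equally, regardless of how many samples it has.
macro_f1 = sum(per_emotion_f1.values()) / len(per_emotion_f1)

# Illustration only: mean over the classes the model actually learned.
nonzero = [score for score in per_emotion_f1.values() if score > 0]
macro_f1_nonzero = sum(nonzero) / len(nonzero)

print(f"macro F1 over 10 classes: {macro_f1:.4f}")
print(f"mean F1 over {len(nonzero)} nonzero classes: {macro_f1_nonzero:.4f}")
```

The weighted F1 score (0.7037) is much higher than the macro F1 because it weights each class by its support, and the two failing classes are rare.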

Visual Diagnostics

Confusion Matrix

Per-Emotion Performance

These diagnostics help identify systematic confusion patterns (e.g., anger vs. fear, joy vs. sarcasm) and class imbalance effects.

Inference Pipeline

The inference module includes:

Tokenization with truncation and padding
Sigmoid probability outputs
Threshold-based filtering
Probability mass redistribution for interpretability
Command-line interface for quick testing

Example usage:

python inference.py "I feel incredibly happy today!"
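The post-processing steps listed above (sigmoid, threshold filtering, probability mass redistribution) could look roughly like this. The function name `postprocess`, the threshold of 0.3, and the fallback to the single most likely label are all assumptions for illustration; inference.py may behave differently.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def postprocess(logits, labels, threshold=0.3):
    """Hypothetical post-processing: filter by threshold, then renormalize.

    1. Convert each logit to an independent probability.
    2. Drop labels whose probability is below `threshold`.
    3. Rescale the surviving probabilities so they sum to 1.
    If nothing survives, keep only the single most likely label.
    """
    probs = [sigmoid(x) for x in logits]
    kept = {label: p for label, p in zip(labels, probs) if p >= threshold}
    if not kept:
        best = max(range(len(probs)), key=probs.__getitem__)
        kept = {labels[best]: probs[best]}
    total = sum(kept.values())
    return {label: p / total for label, p in kept.items()}


# Made-up logits for three of the emotions.
result = postprocess([2.0, -3.0, 0.0], ["Joy", "Sadness", "Surprise"])
for label, share in result.items():
    print(f"{label}: {share:.3f}")
```

Renormalizing after filtering makes the output read as a distribution over the detected emotions, at the cost of no longer being the raw per-label probabilities.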

Reproducing Results

1. Clone the Repository

git clone <repository-url>
cd emotion-detection

2. Install Dependencies

pip install -r requirements.txt

Required libraries include:

torch
transformers
scikit-learn
pandas
numpy
matplotlib

3. Prepare the Dataset

Place processed CSV files in the data/ directory:

data/
├── train.csv
├── val.csv
└── test.csv

4. Train the Model

python train.py

The best-performing model checkpoint will be saved in:

models/best_model.pt

5. Evaluate the Model

python test.py

Evaluation reports and visualizations will be generated automatically.

Project Structure

emotion-detection/
│
├── data/
├── models/
├── train.py
├── test.py
├── inference.py
├── preprocess.py
├── requirements.txt
├── LICENSE
└── README.md

Limitations

Severe class imbalance for sarcasm and nostalgia.
Semantic ambiguity between closely related emotions.
Neutral class bias under uncertainty.

Future improvements may include class-weighted loss functions, data augmentation, or probability calibration.
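As one possible direction for the class-weighted loss mentioned above, per-class positive weights can be derived from label counts and passed to the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`. The sketch below computes the commonly used ratio of negatives to positives per class; the helper name `positive_class_weights`, the optional cap, and the toy targets are assumptions, and this weighting is not part of the current training setup.

```python
def positive_class_weights(targets, max_weight=None):
    """Compute one weight per class as (#negatives / #positives).

    `targets` is a list of rows, one row per sample, one column per class.
    Any value > 0 counts as a positive. Classes with no positives get
    weight 1.0, since no ratio can be computed for them.
    `max_weight` optionally caps very large weights for rare classes.
    """
    n_samples = len(targets)
    n_classes = len(targets[0])
    weights = []
    for j in range(n_classes):
        pos = sum(1 for row in targets if row[j] > 0)
        neg = n_samples - pos
        weight = neg / pos if pos else 1.0
        if max_weight is not None:
            weight = min(weight, max_weight)
        weights.append(weight)
    return weights


# Toy targets: 4 samples, 2 classes; the second class is rare.
toy_targets = [[1, 0], [1, 0], [1, 1], [0, 0]]
print(positive_class_weights(toy_targets))
print(positive_class_weights(toy_targets, max_weight=2.0))
```

A weight above 1 makes missed positives for that class more costly, which tends to raise recall on rare classes such as sarcasm and nostalgia, usually at some cost in precision.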

Author

Soudeep Ghoshal
