TRIBE v2 Emotion Validation Experiment

Can predicted brain activations help a language model understand emotion better?

This project validates whether META's TRIBE v2 brain encoding model — which predicts fMRI responses to text and audio — carries discriminative emotion signal that an LLM can use. We test this with a controlled 3-condition experiment using 100 emotion-labeled text samples, two input modalities (text vs expressive audio), and local LLM inference via Ollama.

Pipeline

                        +---------------------------+
  mosei_samples.json    |  ElevenLabs TTS (12 voices)|
  (100 labelled texts)  |  emotional_audio/*.wav     |
          |             +---------------------------+
          |                          |
          v                          v
  tribe_inference.py      tribe_inference_emotional.py
  (text-based input)      (audio-based, batched)
          |                          |
          v                          v
  tribe_activations.json   tribe_activations_emotional.json
  (7 ROI means x 100)      (same schema, audio-derived)
          |                          |
          +----------+  +------------+
                     v  v
               brain_formatter.py
               (z-score -> natural language)
                     |
                     v
          formatted_contexts*.json
          (real context + shuffled control)
                     |
              +------+------+
              v             v
         evaluate.py    roi_classifier.py
         (LLM eval)     (direct ML, no LLM)
              |             |
              v             v
       results_summary   plots/ + results.txt

Experimental Design

Condition	LLM Input	Role
A (baseline)	Text only	What the LLM achieves without any brain signal
B (test)	Text + real TRIBE v2 neural context	Does brain data help?
C (control)	Text + shuffled neural context	Proves B is meaningful, not noise

Validation logic: B > A > C = signal. B ≈ C = noise.

100 samples drawn from dair-ai/emotion:

40 explicit (emotion named directly: "I feel happy")
40 implicit (emotion inferred from context: "it was my last day before retirement")
20 sarcastic (sentiment contradicts tone: "oh great, another Monday")

4 emotion classes: happy, angry, sad, fearful. Chance = 25%.

Results

LLM Evaluation (llama3.1:8b via Ollama)

Emotional audio input to TRIBE v2:

Condition	Overall	Explicit	Implicit	Sarcastic
A (text only)	52%	60%	50%	40%
B (neural context)	55%	67.5%	50%	40%
C (shuffled)	54%	60%	55%	40%

Text-based input to TRIBE v2 (for comparison):

Condition	Overall	Explicit	Implicit
A	52%	60%	50%
B	54%	62.5%	52.5%
C	52%	55%	55%

Emotional audio improved Condition B on explicit emotions by +5pp (62.5% -> 67.5%).

Direct ROI Classifier (no LLM)

Logistic regression / SVM / Random Forest on 7 ROI means directly:

Input	LogReg	SVM	RF	Chance
Text-based	20%	26%	20%	25%
Emotional audio	26%	33%	32%	25%

Per category (LogReg, text-based): Explicit 47.5%, Implicit 37.5%, Sarcastic 30%

Per category (LogReg, emotional audio): Explicit 45%, Implicit 47.5%, Sarcastic 35%

Most informative ROIs: ACC (conflict/uncertainty) and amygdala (threat/arousal).

Key Findings

Neural context helps on explicit emotions. Condition B reached 67.5% accuracy on explicitly stated emotions, vs 60% baseline and 60% shuffled -- a clean B > A = C pattern.
Raw brain activations encode emotion above chance. A simple SVM on 7 ROI means hit 33% (emotional audio) -- above the 25% chance level with no LLM required.
Emotional audio (ElevenLabs) enriches TRIBE v2 signal. Implicit emotion classification from raw ROIs jumped from 37.5% (text) to 47.5% (audio). Prosody reaches the Wav2VecBert encoder and changes the predicted activation patterns.
ACC and amygdala carry the most discriminative signal -- consistent with emotion neuroscience literature.
Implicit emotions remain challenging for the LLM eval. C(55%) > B(50%) on implicit in the LLM condition suggests the natural language formatter does not yet surface the signal the raw activations contain. The ROI classifier, which bypasses the formatter, shows the signal is there.
Sarcasm is flat across all conditions (40%) -- expected; sarcasm requires pragmatic inference no brain encoder helps with at this scale.

Verdict: PARTIAL SIGNAL. Brain data improves explicit emotion prediction. Implicit emotion signal exists in the activations (shown by ROI classifier) but the LLM formatter does not yet surface it effectively.

Limitations

100 samples -- results are directional, not statistically definitive. A 3% accuracy difference at this scale is noise.
7 index-based ROI slices -- not anatomically validated. The actual amygdala is not at vertices 2967–5721. Real parcellation (Glasser HCP-MMP1.0, 360 regions) could change the results significantly.
z-score clustering -- many samples fall in the "moderate" z-score band, limiting the formatter's expressiveness.
ElevenLabs free tier -- 10,000 chars/month. Re-generation requires a new API key or paid plan.
Local LLM (8B params) -- a larger or fine-tuned emotion classifier would likely respond better to the neural context.

What Would Make This Publishable

The approach is novel -- no prior work combines TRIBE v2 predicted activations with LLM emotion inference in a controlled experiment. The early signal is real. What it needs to reach publication quality:

Real atlas parcellation. Replace the 7 index-based slices with Glasser HCP-MMP1.0 (360 anatomically validated regions). This alone could substantially change the classifier results.
500+ samples with proper statistical significance testing across conditions.
A validated multimodal emotion dataset with annotated implicit and explicit emotion labels (CMU-MOSEI is the natural target).
Direct embedding approach instead of NL formatting -- feed the activations as a vector to the LLM rather than translating them to text first. The ROI classifier results suggest the signal is there; the translation step is the bottleneck.
Larger or fine-tuned model -- llama3.1:8b is a generalist. An emotion-specialized model would respond more reliably to the neural context.

The infrastructure is here. The idea is here. It just needs resources.

Plots

All plots are in plots/:

Plot	Description
`accuracy_by_category.png`	LLM emotion accuracy per condition x category
`vad_valence_heatmap.png`	VAD-Valence correlation per category x condition
`vad_correlation_overall.png`	VAD (V/A/D) correlations overall
`roi_classifier_comparison.png`	ROI classifier accuracy: text vs audio
`roi_emotion_fingerprint_text.png`	Emotion activation heatmap (text-based)
`roi_emotion_fingerprint_emotional.png`	Emotion activation heatmap (emotional audio)

Installation

Requires Python 3.12, CUDA 12.1, and uv.

# 1. Clone this repo
git clone https://github.com/RudraanshBhati/tribe-v2-emotion-eval
cd tribe-v2-emotion-eval

# 2. Clone TRIBE v2 as a workspace member (required -- editable install)
git clone https://github.com/facebookresearch/tribev2

# 3. Install all dependencies (PyTorch CUDA index configured in pyproject.toml)
uv sync

# 4. Pull the LLM for evaluation
ollama pull llama3.1:8b

# 5. Set up secrets
cp .env.example .env
# Edit .env and add:
#   ELEVENLABS_API_KEY=your_key_here   (only needed to regenerate audio)
#   HF_TOKEN=your_token_here           (HuggingFace gated access)

VRAM note: TRIBE v2 uses ~8-10 GB. Ollama uses ~5-6 GB. Run them sequentially on a 12 GB GPU.

Reproduce

Run steps in order. Each step's output is the next step's input.

# 0. Activate venv
source .venv/Scripts/activate   # Windows
# source .venv/bin/activate     # Linux/Mac

# 1. Build dataset (100 samples from dair-ai/emotion)
python load_mosei.py

# 2. Generate emotional audio (ElevenLabs, requires API key)
python generate_audio_elevenlabs.py

# 3a. TRIBE v2 inference on text
python tribe_inference.py

# 3b. TRIBE v2 inference on emotional audio (batched, ~15 min on RTX 4070)
python tribe_inference_emotional.py

# 4. Format brain activations as natural language context
python brain_formatter.py

# 5. LLM evaluation (300 Ollama calls, ~5-8 min)
python evaluate.py

# 6a. Direct ROI classifier (no LLM)
python roi_classifier.py

# 6b. Visualize LLM results + decision gate
python visualize_results.py

Results land in plots/, results_summary*.csv, roi_classifier_results.txt, and decision_gate.txt.

File Structure

tribe-v2-emotion-eval/
├── pyproject.toml                      # Dependencies (uv)
├── .python-version                     # Python 3.12
├── .gitignore
│
├── load_mosei.py                       # Step 1: dataset
├── generate_audio_elevenlabs.py        # Step 2: ElevenLabs TTS
├── generate_emotional_audio.py         # Step 2 alt: edge-tts fallback
├── tribe_inference.py                  # Step 3a: text-based TRIBE v2
├── tribe_inference_emotional.py        # Step 3b: audio TRIBE v2 (batched)
├── brain_formatter.py                  # Step 4: activations -> NL context
├── evaluate.py                         # Step 5: LLM eval (Ollama)
├── roi_classifier.py                   # Step 6a: direct ML classifier
├── visualize_results.py                # Step 6b: plots + decision gate
│
├── mosei_samples.json                  # 100 labelled samples
├── results_summary.csv                 # LLM eval results (text-based)
├── results_summary_emotional.csv       # LLM eval results (emotional audio)
├── roi_classifier_results.txt          # ROI classifier accuracy summary
├── decision_gate.txt                   # Final verdict
│
├── plots/                              # All output figures (6 PNGs)
└── PROJECT_STATUS.md                   # Detailed experiment log

Credits & Acknowledgements

TRIBE v2 -- META / Facebook AI Research. Brain encoding model predicting fMRI from multimodal inputs.
ElevenLabs -- Emotionally expressive TTS used for audio generation.
dair-ai/emotion -- Hugging Face emotion dataset (CMU-MOSEI substitute).
Ollama + llama3.1:8b -- Local LLM inference.
Warriner, A.B., Kuperman, V., & Brysbaert, M. (2013) -- Norms of valence, arousal, and dominance for 13,915 English lemmas. Behavior Research Methods.

Citation

If you use this experiment design or code:

@misc{tribe_emotion_eval_2026,
  title   = {TRIBE v2 Emotion Validation Experiment},
  year    = {2026},
  url     = {https://github.com/RudraanshBhati/tribe-v2-emotion-eval},
  note    = {Validates whether TRIBE v2 predicted brain activations improve LLM emotion understanding}
}

License

MIT

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TRIBE v2 Emotion Validation Experiment

Pipeline

Experimental Design

Results

LLM Evaluation (llama3.1:8b via Ollama)

Direct ROI Classifier (no LLM)

Key Findings

Limitations

What Would Make This Publishable

Plots

Installation

Reproduce

File Structure

Credits & Acknowledgements

Citation

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
plots		plots
.env.example		.env.example
.gitignore		.gitignore
.python-version		.python-version
PROJECT_STATUS.md		PROJECT_STATUS.md
README.md		README.md
brain_formatter.py		brain_formatter.py
decision_gate.txt		decision_gate.txt
evaluate.py		evaluate.py
generate_audio_elevenlabs.py		generate_audio_elevenlabs.py
load_mosei.py		load_mosei.py
mosei_samples.json		mosei_samples.json
pyproject.toml		pyproject.toml
results_summary.csv		results_summary.csv
results_summary_emotional.csv		results_summary_emotional.csv
roi_classifier.py		roi_classifier.py
roi_classifier_results.txt		roi_classifier_results.txt
tribe_inference.py		tribe_inference.py
tribe_inference_emotional.py		tribe_inference_emotional.py
visualize_results.py		visualize_results.py

Folders and files

Latest commit

History

Repository files navigation

TRIBE v2 Emotion Validation Experiment

Pipeline

Experimental Design

Results

LLM Evaluation (llama3.1:8b via Ollama)

Direct ROI Classifier (no LLM)

Key Findings

Limitations

What Would Make This Publishable

Plots

Installation

Reproduce

File Structure

Credits & Acknowledgements

Citation

License

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages