Trustworthy Speech Emotion Recognition [Paper Link]
Trust-SER is an open-source project for researchers exploring speech emotion recognition (SER) applications with trustworthiness elements.
The core elements of Trustworthy Speech Emotion Recognition are illustrated in the figure (the figure uses images from https://openmoji.org/). The framework currently supports the following pretrained speech backbones:
- APC
- TERA
- Wav2vec 2.0
- WavLM Base+
- Whisper Tiny
- Whisper Base
- Whisper Small
To begin with, please clone this repo:
git clone [email protected]:usc-sail/trust-ser.git
To create and activate the conda environment:
cd trust-ser
conda env create -f trust-ser.yml
conda activate trust-ser
To install the essential SUPERB audio benchmark (S3PRL):
cd model/s3prl
pip3 install -e .
Please specify the dataset paths and your working directory in config/config.yml:
data_dir:
  crema_d: CREMA_D_PATH
  iemocap: IEMOCAP_PATH
  meld: MELD_PATH
  msp-improv: MSP-IMPROV_PATH
  msp-podcast: MSP-PODCAST_PATH
  ravdess: RAVDESS_PATH
project_dir: OUTPUT_PATH
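For reference, a minimal sketch of reading this config in Python (assuming the file is plain YAML and PyYAML is available; the key names follow the snippet above):

```python
import yaml

# Minimal sketch: read the paths defined in config/config.yml.
with open("config/config.yml") as f:
    config = yaml.safe_load(f)

iemocap_path = config["data_dir"]["iemocap"]  # dataset root, e.g. IEMOCAP_PATH
output_path = config["project_dir"]           # all generated artifacts go here
```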
For most of the datasets, users first need to generate the train/dev/test splits with the provided scripts. Take the IEMOCAP data as an example:
cd train_split_gen
python3 iemocap.py
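The exact split logic lives in the per-dataset scripts; purely as an illustration, a speaker-independent split could look like the sketch below (the speaker-based rule and the field names are assumptions, not the repo's exact recipe):

```python
import random

# Hypothetical illustration of a speaker-independent split: no speaker
# appears in more than one of train/dev/test.
def split_by_speaker(utterances, dev_ratio=0.1, test_ratio=0.2, seed=8):
    speakers = sorted({u["speaker_id"] for u in utterances})
    random.Random(seed).shuffle(speakers)
    n_test = int(len(speakers) * test_ratio)
    n_dev = int(len(speakers) * dev_ratio)
    test_spk = set(speakers[:n_test])
    dev_spk = set(speakers[n_test:n_test + n_dev])
    splits = {"train": [], "dev": [], "test": []}
    for u in utterances:
        if u["speaker_id"] in test_spk:
            splits["test"].append(u)
        elif u["speaker_id"] in dev_spk:
            splits["dev"].append(u)
        else:
            splits["train"].append(u)
    return splits
```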
For most of the datasets, users can generate the preprocessed audio files with the provided script. The preprocessing resamples the audio to 16 kHz and downmixes it to a single (mono) channel. Take the IEMOCAP data as an example:
cd preprocess_audio
python3 preprocess_audio.py --dataset iemocap
# dataset: iemocap, ravdess, msp-improv, msp-podcast, crema_d
The script will generate the following folder under your working directory:
OUTPUT_PATH/audio/iemocap
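Under the hood, this preprocessing step amounts to downmixing to mono and resampling to 16 kHz. A minimal torchaudio sketch of that operation (the function below is illustrative, not the repo's API):

```python
import torchaudio
import torchaudio.transforms as T

# Illustrative preprocessing: load an audio file, downmix to mono,
# and resample to 16 kHz, as described above.
def preprocess(in_path, out_path, target_sr=16000):
    waveform, sr = torchaudio.load(in_path)
    if waveform.size(0) > 1:  # multi-channel -> mono
        waveform = waveform.mean(dim=0, keepdim=True)
    if sr != target_sr:
        waveform = T.Resample(orig_freq=sr, new_freq=target_sr)(waveform)
    torchaudio.save(out_path, waveform, target_sr)
```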
To train with a pretrained backbone, use the following:
cd experiment
CUDA_VISIBLE_DEVICES=0 taskset -c 1-30 python3 finetune_single_thread.py --pretrain_model apc --dataset iemocap --learning_rate 0.0005 --downstream_model cnn --num_epochs 30 --num_layers 3 --conv_layers 2 --pooling mean --hidden_size 128
# pretrain_model: apc, tera, wavlm, wav2vec2_0, whisper_tiny, whisper_base, whisper_small
# pooling: mean, att (self-attention)
# hidden_size: hidden dimension of the CNN
# conv_layers: number of CNN layers
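For orientation, a minimal sketch of what a CNN downstream head with mean pooling over backbone features might look like (the feature dimension, kernel size, and class count are assumptions, not the repo's exact model):

```python
import torch.nn as nn

# Illustrative downstream head: 1D conv layers over pretrained features,
# mean pooling over time, then a linear emotion classifier.
class CNNHead(nn.Module):
    def __init__(self, feat_dim=512, hidden_size=128, conv_layers=2, num_classes=4):
        super().__init__()
        layers, in_dim = [], feat_dim
        for _ in range(conv_layers):
            layers += [nn.Conv1d(in_dim, hidden_size, kernel_size=5, padding=2), nn.ReLU()]
            in_dim = hidden_size
        self.conv = nn.Sequential(*layers)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, feats):                 # feats: (batch, time, feat_dim)
        x = self.conv(feats.transpose(1, 2))  # -> (batch, hidden_size, time)
        x = x.mean(dim=-1)                    # mean pooling over time
        return self.classifier(x)
```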
To evaluate the fairness with a pretrained backbone and its downstream model, use the following:
cd trustworthy/fairness
CUDA_VISIBLE_DEVICES=0 taskset -c 1-30 python3 fairness_evaluation.py --pretrain_model apc --dataset iemocap --learning_rate 0.0005 --downstream_model cnn --num_epochs 30 --num_layers 3 --conv_layers 2 --pooling mean --hidden_size 128
The output will be under: OUTPUT_PATH/fairness/iemocap. The output metrics include demographic disparity / statistical parity (speaker-wise) and equality of opportunity (speaker-wise).
The aggregation uses the max, so the reported value is the worst case across speakers. The lower the metric, the better the fairness.
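As a rough illustration of these speaker-wise metrics and the max aggregation (the functions below follow the standard definitions and are not taken from the repo; all inputs are 1-D NumPy arrays):

```python
import numpy as np

# Illustrative speaker-wise fairness gaps with worst-case (max) aggregation;
# lower values indicate better fairness.
def statistical_parity_gap(preds, groups, target_class):
    # Max difference across groups in the rate of predicting target_class.
    rates = [np.mean(preds[groups == g] == target_class) for g in np.unique(groups)]
    return max(rates) - min(rates)

def equal_opportunity_gap(preds, labels, groups, target_class):
    # Max difference across groups in the true-positive rate for target_class.
    tprs = []
    for g in np.unique(groups):
        mask = (groups == g) & (labels == target_class)
        if mask.any():
            tprs.append(np.mean(preds[mask] == target_class))
    return max(tprs) - min(tprs)
```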
To evaluate the safety with a pretrained backbone and its downstream model, use the following:
cd trustworthy/safety
CUDA_VISIBLE_DEVICES=0 taskset -c 1-30 python3 adversarial_attack.py --pretrain_model apc --dataset iemocap --learning_rate 0.0005 --downstream_model cnn --num_epochs 30 --num_layers 3 --conv_layers 2 --pooling mean --hidden_size 128 --attack_method fgsm --snr 45
# attack_method: fgsm
# snr: SNR (dB) of the injected adversarial noise
The output will be under: OUTPUT_PATH/attack/iemocap. The output metric is the attack success rate; the higher the metric, the worse the safety.
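For intuition, a minimal FGSM sketch in which the perturbation is scaled to a target SNR (this scaling scheme is an assumption; the repo's noise budget may be computed differently):

```python
import torch

# Illustrative FGSM attack: perturb the waveform along the gradient sign,
# with the perturbation scaled to sit at roughly the requested SNR.
def fgsm_attack(model, waveform, label, loss_fn, snr_db=45.0):
    waveform = waveform.clone().requires_grad_(True)
    loss = loss_fn(model(waveform), label)
    loss.backward()
    noise = waveform.grad.sign()
    # Scale noise power so that 10 * log10(P_signal / P_noise) == snr_db.
    signal_power = waveform.pow(2).mean()
    noise_power = noise.pow(2).mean()
    scale = torch.sqrt(signal_power / (noise_power * 10 ** (snr_db / 10)))
    return (waveform + scale * noise).detach()
```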
To evaluate the privacy (attribute-inference risk) with a pretrained backbone and its downstream model, use the following:
CUDA_VISIBLE_DEVICES=0 taskset -c 90-120 python3 gender_inference.py --pretrain_model $pretrain_model --dataset $dataset --learning_rate 0.00005 --downstream_model cnn --num_epochs 10 --num_layers 3 --conv_layers 2 --pooling mean --hidden_size 128 --privacy_attack gender
The output will be under: OUTPUT_PATH/privacy/gender/iemocap. The output metric is the gender prediction accuracy; the higher the metric, the worse the privacy (support for a speaker-recognition attack is being added).
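To make the threat model concrete, here is a generic attribute-inference probe: a simple scikit-learn classifier trained on speech representations to predict gender. This is illustrative only; the repo's attacker uses the CNN downstream model from the command above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Illustrative attribute-inference probe: an attacker fits a classifier on
# speech representations to predict gender. Higher test accuracy means the
# representations leak more gender information, i.e. worse privacy.
def gender_inference_probe(train_feats, train_gender, test_feats, test_gender):
    probe = LogisticRegression(max_iter=1000).fit(train_feats, train_gender)
    return accuracy_score(test_gender, probe.predict(test_feats))
```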