Skip to content

SA-toolkit: Speaker speech anonymization toolkit in python

License

Notifications You must be signed in to change notification settings

deep-privacy/SA-toolkit

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SA-toolkit

SA-toolkit: Speaker speech anonymization toolkit in python

License: Apache 2.0 Open In Colab Gradio demo

SA-toolkit is a pytorch-based library providing pipelines and basic building blocs designing speaker anonymization techniques.
This library is the result of the work of Pierre Champion's thesis.

Features include:

  • ASR training with a pytorch kaldi LF-MMI wrapper (evaluation, and VC linguistic feature extraction)
  • VC HiFi-GAN training with on-the-fly feature caching (anonymization)
  • ASV training (evaluation)
  • WER Utility and EER/Linkability/Cllr Privacy evaluations
  • Clear and simplified egs directories
  • Unified trainer/configs
  • TorchScript YAAPT & TorchScript kaldi.fbank (with batch processing!)
  • On the fly only feature extraction
  • TorchScript JIT-compatible network models

All data are formatted with kaldi-like wav.scp, spk2utt, text, etc.
Kaldi is necessary for training the ASR models and the handy run.pl/ssh.pl/data_split.. scripts, but most of the actual logic is performed in python; you won't have to deal kaldi ;)

Installation

conda

The best way to install the SA-toolkit is with the install.sh script, which setup a micromamba environment, and kaldi.
Take a look at the script and adapt it to your cluster configuration, or leave it do it's magic.
This install is recommended for training ASR models.

git clone https://github.com/deep-privacy/SA-toolkit
./install.sh

pip

Another way of installing SA-toolkit is with pip3, this will setup everything for inference/testing.

pip3 install 'git+https://github.com/deep-privacy/SA-toolkit.git@master#egg=satools&subdirectory=satools'

Anonymize bin

Once installed (with any of the above ways), you will have access to the anonymize bin in your PATH that you can use together with a config (example: here) to anonymize a kaldi like directory.

anonymize --config ./configs/anon_pipelines --directory ./data/XXX

PyTorch API

Torch HUB anonymization example

This locally installs satools with Torch HUB (the required pip dependencies are: torch and torchaudio).
This version gives access to the python/torch model for inference/testing, but for training use install.sh. You can modify tag_version accordingly to the available model tag here.

import torch

model = torch.hub.load("deep-privacy/SA-toolkit", "anonymization", tag_version="hifigan_bn_tdnnf_wav2vec2_vq_48_v1", trust_repo=True)
wav_conv = model.convert(torch.rand((1, 77040)), target="1069")
asr_bn = model.get_bn(torch.rand((1, 77040))) # (ASR-BN extraction for disentangled linguistic features (best with hifigan_bn_tdnnf_wav2vec2_vq_48_v1))

Torch JIT anonymization example

This version does not rely on any dependencies using TorchScript.

import torch
import torchaudio
waveform, _, text_gt, speaker, chapter, utterance = torchaudio.datasets.LIBRISPEECH("/tmp", "dev-clean", download=True)[1]
torchaudio.save(f"/tmp/clear_{speaker}-{chapter}-{str(utterance)}.wav", waveform, 16000)

model = torch.jit.load("__Exp_Path__/final.jit").eval()
wav_conv = model.convert(waveform, target="1069")
torchaudio.save(f"/tmp/anon_{speaker}-{chapter}-{str(utterance)}.wav", wav_conv, 16000)

Ensure you have the model downloaded. Check the egs/vc directory for more detail.

VPC 2024 performances

tag_version=hifigan_bn_tdnnf_600h_vq_48_v1

VPC-B6

---- ASV_eval^anon results ----
 dataset split gender enrollment trial     EER
   libri  test      f       anon  anon  21.146
   libri  test      m       anon  anon  21.137

---- ASR results ----
 dataset split       asr    WER
   libri   dev      anon  9.693
   libri  test      anon  9.092

tag_version=hifigan_bn_tdnnf_wav2vec2_vq_48_v1

VPC-B5

---- ASV_eval^anon results ----
 dataset split gender enrollment trial     EER
   libri  test      f       anon  anon  33.946
   libri  test      m       anon  anon  34.729

---- ASR results ----
 dataset split       asr    WER
   libri   dev      anon  4.731
   libri  test      anon  4.369

tag_version=hifigan_bn_tdnnf_wav2vec2_vq_48_v1+f0-transformation=quant_16_awgn_2

Add F0 transformations to B5

With a stronger attacker (a better ASV model), the F0 transformation does not necessarily help to get a higher EER, (the VPC 2024 attack model is sensible to F0 modification).

---- ASR results ----
 dataset split       asr    WER
   libri   dev      anon  5.306
   libri  test      anon  4.814

---- ASV_eval^anon results ----
 dataset split gender enrollment trial     EER
   libri  test      f       anon  anon  42.151
   libri  test      m       anon  anon  40.755

tag_version=hifigan_inception_bn_tdnnf_wav2vec2_train_600_vq_48_v1+f0-transformation=quant_16_awgn_2

Experiment where libritts speech data is converted to a single speaker (using an anonymization system), then used as training data for another anonymization system.
ASR bottleneck extractor fine-tuned on librispeech 600 (rather than 100 like the above).

---- ASR results ----
 dataset split       asr    WER
   libri   dev      anon  4.693
   libri  test      anon  4.209

---- ASV_eval^anon results ----
 dataset split gender enrollment trial     EER
   libri  test      f       anon  anon  35.765
   libri  test      m       anon  anon  35.195

Model training

Checkout the READMEs of egs/asr/librispeech / egs/vc/libritts / egs/asv/voxceleb .

Evaluation

It is prefered to use the Voice-Privacy-Challenge-2024 evaluation tool as this SA-toolkit library was used for two baselines (B5 and B6)*

cd egs/anon/vctk
./local/eval.py --config configs/eval_clear  # eval privacy/utility of the signals

Ensure you have the corresponding evaluation model trained or downloaded.

Citation

This library is the result of the work of Pierre Champion's thesis.
If you found this library useful in academic research, please cite:

@phdthesis{champion2023,
    title={Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques},
    author={Pierre Champion},
    year={2023},
    school={Université de Lorraine - INRIA Nancy},
    type={Thesis},
}

(Also consider starring the project on GitHub.)

Acknowledgements

License

Most of the software is distributed under Apache 2.0 License (http://www.apache.org/licenses/LICENSE-2.0); the parts distributed under other licenses are indicated by a LICENSE file in related directories.