SA-toolkit is a PyTorch-based library providing pipelines and basic building blocks for designing speaker anonymization techniques.
This library is the result of the work of Pierre Champion's thesis.
Features include:
- ASR training with a pytorch kaldi LF-MMI wrapper (evaluation, and VC linguistic feature extraction)
- VC HiFi-GAN training with on-the-fly feature caching (anonymization)
- ASV training (evaluation)
- WER Utility and EER/Linkability/Cllr Privacy evaluations
- Clear and simplified egs directories
- Unified trainer/configs
- TorchScript YAAPT & TorchScript kaldi.fbank (with batch processing!)
- On the fly only feature extraction
- TorchScript JIT-compatible network models
All data are formatted with Kaldi-like wav.scp, spk2utt, text, etc.
Kaldi is necessary for training the ASR models and for the handy run.pl / ssh.pl / data_split.. scripts, but most of the actual logic is performed in Python; you won't have to deal with Kaldi ;)
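As a rough illustration of the Kaldi-like layout mentioned above, the sketch below writes a tiny data directory, parses its wav.scp and utt2spk files, and derives the spk2utt mapping. It assumes the usual Kaldi conventions (one entry per line, key first, whitespace-separated). The helper names read_kaldi_table and build_spk2utt, as well as the utterance ids and paths, are hypothetical and are not part of satools.

```python
import tempfile
from collections import defaultdict
from pathlib import Path


def read_kaldi_table(path):
    """Read a Kaldi-style table: one '<key> <value...>' entry per line."""
    table = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        parts = line.split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        table[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return table


def build_spk2utt(utt2spk):
    """Invert utterance -> speaker into speaker -> sorted list of utterances."""
    spk2utt = defaultdict(list)
    for utt, spk in utt2spk.items():
        spk2utt[spk].append(utt)
    return {spk: sorted(utts) for spk, utts in sorted(spk2utt.items())}


# Build a tiny example directory (ids and wav paths are made up).
data_dir = Path(tempfile.mkdtemp())
(data_dir / "wav.scp").write_text(
    "1069-0001 /corpus/1069/0001.wav\n"
    "1069-0002 /corpus/1069/0002.wav\n"
    "2300-0001 /corpus/2300/0001.wav\n",
    encoding="utf-8",
)
(data_dir / "utt2spk").write_text(
    "1069-0001 1069\n1069-0002 1069\n2300-0001 2300\n",
    encoding="utf-8",
)

wav_scp = read_kaldi_table(data_dir / "wav.scp")
utt2spk = read_kaldi_table(data_dir / "utt2spk")
spk2utt = build_spk2utt(utt2spk)

# Print in spk2utt format: '<speaker> <utt1> <utt2> ...'
for spk, utts in spk2utt.items():
    print(spk, " ".join(utts))
```

Every utterance id in wav.scp should also appear in utt2spk; checking that the two key sets match is a cheap sanity test before launching a pipeline.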
The best way to install the SA-toolkit is with the install.sh script, which sets up a micromamba environment and Kaldi.
Take a look at the script and adapt it to your cluster configuration, or let it do its magic.
This install is recommended for training ASR models.
git clone https://github.com/deep-privacy/SA-toolkit
cd SA-toolkit
./install.sh
Another way to install SA-toolkit is with pip3; this sets up everything needed for inference/testing.
pip3 install 'git+https://github.com/deep-privacy/SA-toolkit.git@master#egg=satools&subdirectory=satools'
Once installed (by either of the methods above), the anonymize binary is available in your PATH. Use it together with a config to anonymize a Kaldi-like directory, for example:
anonymize --config ./configs/anon_pipelines --directory ./data/XXX
The snippet below locally installs satools through Torch Hub (the only required pip dependencies are torch and torchaudio).
This version gives access to the Python/torch model for inference/testing; for training, use install.sh.
You can change tag_version to match any of the available model tags.
import torch
model = torch.hub.load("deep-privacy/SA-toolkit", "anonymization", tag_version="hifigan_bn_tdnnf_wav2vec2_vq_48_v1", trust_repo=True)
wav_conv = model.convert(torch.rand((1, 77040)), target="1069")
asr_bn = model.get_bn(torch.rand((1, 77040)))  # ASR-BN extraction for disentangled linguistic features (best with hifigan_bn_tdnnf_wav2vec2_vq_48_v1)
Thanks to TorchScript, this version does not rely on any additional dependencies.
import torch
import torchaudio
waveform, _, text_gt, speaker, chapter, utterance = torchaudio.datasets.LIBRISPEECH("/tmp", "dev-clean", download=True)[1]
torchaudio.save(f"/tmp/clear_{speaker}-{chapter}-{str(utterance)}.wav", waveform, 16000)
model = torch.jit.load("__Exp_Path__/final.jit").eval()
wav_conv = model.convert(waveform, target="1069")
torchaudio.save(f"/tmp/anon_{speaker}-{chapter}-{str(utterance)}.wav", wav_conv, 16000)
Ensure you have the model downloaded. Check the egs/vc directory for more detail.
VPC-B6
---- ASV_eval^anon results ----
dataset  split  gender  enrollment  trial  EER
libri    test   f       anon        anon   21.146
libri    test   m       anon        anon   21.137
---- ASR results ----
dataset  split  asr   WER
libri    dev    anon  9.693
libri    test   anon  9.092
VPC-B5
---- ASV_eval^anon results ----
dataset  split  gender  enrollment  trial  EER
libri    test   f       anon        anon   33.946
libri    test   m       anon        anon   34.729
---- ASR results ----
dataset  split  asr   WER
libri    dev    anon  4.731
libri    test   anon  4.369
Add F0 transformations to B5
With a stronger attacker (a better ASV model), the F0 transformation does not necessarily yield a higher EER (the VPC 2024 attack model is sensitive to F0 modification).
---- ASR results ----
dataset  split  asr   WER
libri    dev    anon  5.306
libri    test   anon  4.814
---- ASV_eval^anon results ----
dataset  split  gender  enrollment  trial  EER
libri    test   f       anon        anon   42.151
libri    test   m       anon        anon   40.755
tag_version=hifigan_inception_bn_tdnnf_wav2vec2_train_600_vq_48_v1+f0-transformation=quant_16_awgn_2
Experiment where libritts speech data is converted to a single speaker (using
an anonymization system), then used as training data for another anonymization
system.
ASR bottleneck extractor fine-tuned on librispeech 600 (rather than 100 like the
above).
---- ASR results ----
dataset  split  asr   WER
libri    dev    anon  4.693
libri    test   anon  4.209
---- ASV_eval^anon results ----
dataset  split  gender  enrollment  trial  EER
libri    test   f       anon        anon   35.765
libri    test   m       anon        anon   35.195
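For readers unfamiliar with the EER figures reported above, here is a minimal, dependency-free sketch of how an equal error rate can be computed from target and non-target verification scores. It is a plain threshold sweep written for illustration, not the toolkit's implementation; the function name equal_error_rate and the toy scores are made up.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Return the EER in percent using a sweep over all observed thresholds.

    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for thr in thresholds:
        # False rejection rate: same-speaker trials scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance rate: different-speaker trials scored at or above it.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where both error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return 100.0 * eer


# Toy scores: one target trial scores low, one non-target trial scores high.
toy_targets = [0.9, 0.8, 0.7, 0.35]
toy_nontargets = [0.1, 0.2, 0.3, 0.75]
toy_eer = equal_error_rate(toy_targets, toy_nontargets)
print(f"EER = {toy_eer:.1f}%")
```

In the privacy setting, the attacker is the ASV system, so a higher EER on anonymized speech indicates that speakers are harder to re-identify; 50% corresponds to chance level.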
Check out the READMEs of egs/asr/librispeech, egs/vc/libritts, and egs/asv/voxceleb.
It is preferred to use the Voice-Privacy-Challenge-2024 evaluation tool, as this SA-toolkit library was used for two of its baselines (B5 and B6).
cd egs/anon/vctk
./local/eval.py --config configs/eval_clear # eval privacy/utility of the signals
Ensure you have the corresponding evaluation model trained or downloaded.
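The utility side of the evaluation is reported as a word error rate (WER). As a hedged illustration of the metric itself, the sketch below computes WER with a word-level edit distance. It is not the scoring code used by the toolkit or by Kaldi; the function name word_error_rate and the example sentences are made up, and no text normalization is applied.

```python
def word_error_rate(reference, hypothesis):
    """WER in percent: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over six reference words.
toy_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {toy_wer:.2f}%")
```

A lower WER on anonymized speech means that the linguistic content is better preserved, which is the utility goal of the anonymization.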
If you found this library useful in academic research, please cite:
@phdthesis{champion2023,
title={Anonymizing Speech: Evaluating and Designing Speaker Anonymization Techniques},
author={Pierre Champion},
year={2023},
school={Université de Lorraine - INRIA Nancy},
type={Thesis},
}
(Also consider starring the project on GitHub.)
- Idiap's pkwrap
- Jik876's HifiGAN
- A.Larcher's Sidekit
- Organizers of the VoicePrivacy Challenge
Most of the software is distributed under the Apache 2.0 License (http://www.apache.org/licenses/LICENSE-2.0); the parts distributed under other licenses are indicated by a LICENSE file in the related directories.