An advanced singing voice synthesis system with high fidelity, expressiveness, controllability and flexibility based on DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism

nathanstep55/pitch-transposition-using-DiffSinger

 
 

High-Quality Voice Pitch Transposition using DiffSinger (OpenVPI maintained version)

With the recent release of PC-NSF-HiFiGAN, it is possible to pitch-shift any vocals or periodic audio with reasonably high quality using this codebase!

Steps to use:

  • Clone/download this repository
  • Download PC-NSF-HiFiGAN as a .zip file
  • Extract the .zip file
  • Open the config.json inside and add the following lines in the JSON near the bottom (within the curly braces):
    "audio_sample_rate": 44100,
    "num_mels": 128,
    "audio_num_mel_bins": 128,
    "hop_size": 512,
    "fft_size": 2048,
    "win_size": 2048,
    "fmin": 40,
    "fmax": 16000,
    "mel_fmin": 40,
    "mel_fmax": 16000,
    "mel_base": "e",
    "mel_scale": "slaney"
    
  • Download the RMVPE pitch detection model from here and extract the .zip file
  • In the scripts directory, run the pitch_transpose.py script using the following syntax:
    python3 pitch_transpose.py --pitchalgo rmvpe --rmvpeckpt <model.pt inside RMVPE folder> --transpose <pitch in semitones> --config <config.json inside PC-NSF-HiFiGAN folder> --ckpt <model.ckpt inside PC-NSF-HiFiGAN folder> <input file>
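
The config.json edit in the steps above can also be scripted. The following is a minimal sketch, assuming the extracted config.json holds a single JSON object. The helper name patch_config is hypothetical and not part of this repository; the keys and values are copied from the list above.

```python
import json
from pathlib import Path

# Keys listed in the steps above; values copied verbatim.
EXTRA_KEYS = {
    "audio_sample_rate": 44100,
    "num_mels": 128,
    "audio_num_mel_bins": 128,
    "hop_size": 512,
    "fft_size": 2048,
    "win_size": 2048,
    "fmin": 40,
    "fmax": 16000,
    "mel_fmin": 40,
    "mel_fmax": 16000,
    "mel_base": "e",
    "mel_scale": "slaney",
}


def patch_config(path):
    """Add the extra keys to the JSON object stored at `path` (hypothetical helper)."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(config, dict):
        raise ValueError("expected config.json to hold a JSON object")
    # Assumption: overwriting is acceptable if a key is already present.
    config.update(EXTRA_KEYS)
    path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

For example, patch_config("PC-NSF-HiFiGAN/config.json") would patch the file in place, where the folder name is a placeholder for wherever the .zip was extracted.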

The output file is named filename-nsfhifigan-2.0-out.wav, where "filename" is the input filename without the .wav extension and 2.0 is the transposition in semitones.
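
When processing several files, it can help to assemble the command and predict the output name programmatically. This is a rough sketch: build_command and expected_output_name are hypothetical helpers, the flags are the ones shown in the steps above, and the way the semitone value is printed in the filename is an assumption based on the "2.0" example.

```python
from pathlib import Path


def build_command(input_wav, semitones, config, ckpt, rmvpe_ckpt):
    """Assemble the argument list for pitch_transpose.py using the flags shown above."""
    return [
        "python3", "pitch_transpose.py",
        "--pitchalgo", "rmvpe",
        "--rmvpeckpt", str(rmvpe_ckpt),
        "--transpose", str(semitones),
        "--config", str(config),
        "--ckpt", str(ckpt),
        str(input_wav),
    ]


def expected_output_name(input_wav, semitones):
    """Guess the output name from the naming pattern described above.

    Assumption: the semitone value appears the way Python prints a float
    (2 becomes "2.0"); the script itself may format it differently.
    """
    stem = Path(input_wav).stem  # input filename without its extension
    return f"{stem}-nsfhifigan-{float(semitones)}-out.wav"
```

The returned list can be passed to subprocess.run with cwd set to the scripts directory, for example subprocess.run(build_command("song.wav", 2, "config.json", "model.ckpt", "model.pt"), cwd="scripts", check=True), where all file paths are placeholders.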

To use the WORLD vocoder instead, add the argument --method world. It produces lower-quality output but does not rely on machine learning.
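
For reference, a transposition given in semitones corresponds to a fixed frequency ratio in twelve-tone equal temperament: shifting by n semitones multiplies the fundamental frequency by 2^(n/12). The sketch below only illustrates that arithmetic on a list of f0 values; it is not taken from pitch_transpose.py, and the function names are hypothetical.

```python
def semitone_ratio(semitones):
    """Frequency ratio for a shift of `semitones` in equal temperament."""
    return 2.0 ** (semitones / 12.0)


def transpose_f0(f0_hz, semitones):
    """Scale every f0 value (in Hz) by the ratio for the requested shift.

    Assumption: unvoiced frames are marked with 0 Hz, so they stay 0 after scaling.
    """
    ratio = semitone_ratio(semitones)
    return [f * ratio for f in f0_hz]
```

A shift of +12 semitones doubles every frequency (one octave up), and a shift of -12 halves it.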

Original README below:

This is a refactored and enhanced version of DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism based on the original paper and implementation, which provides:

  • Cleaner code structure: useless and redundant files are removed and the others are re-organized.
  • Better sound quality: the sampling rate of synthesized audio is raised to 44.1 kHz from the original 24 kHz.
  • Higher fidelity: improved acoustic models and diffusion sampling acceleration algorithms are integrated.
  • More controllability: introduced variance models and parameters for prediction and control of pitch, energy, breathiness, etc.
  • Production compatibility: functionalities are designed to match the requirements of production deployment and the SVS communities.

[Architecture diagrams: Overview, Variance Model, Acoustic Model]

User Guidance

Chinese Tutorials: Text, Video

Progress & Roadmap

Architecture & Algorithms

TBD

Development Resources

TBD

References

Original Paper & Implementation

Generative Models & Algorithms

Dependencies & Submodules

Disclaimer

Any organization or individual is prohibited from using any functionalities included in this repository to generate someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.

License

This forked DiffSinger repository is licensed under the Apache 2.0 License.
