date	tags
2020-03-23	paper, gan, deep-learning, vocoder, tts, speech, generative

Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN

Link to the paper

Congyi Wang, Yu Chen, Bin Wang, Yi Shi

Report

Year: 2021

Samples: https://anonymous1086.github.io/prlsgan-vocoder/

This paper presents a GAN-based vocoder based on MelGAN, that introduces the Pointwise Relativistic Least Square GAN (PRLSGAN) to improve the performance of the GAN-based vocoder. It is based in Relativistic GAN and it shows improvement mainly in the artifacts normally produced by GAN-based vocoders.

The authors make a great overview of different types of vocoder, to motivate the use of GAN vocoders which, among other things, are able to generate audio in parallel from a set of features (e.g. spectrogram).

After, they introduce the basic methods:

Parallel WaveGAN (PWGAN): consists of a generator, a discriminator and a multi-resolution STFT auxiliary loss
Multi-resolution STFT: consists of a conventional loss applied over the STFT of the signal produced by the generator.
MelGAN: consists of a generator implementing transposed convolutions, and a multi-scale discriminator that handles audio at different levels. As loss function, it implements a feature-matching loss between generated and real audio, as well as an adversarial loss.

PRLSGAN builds upon MelGAN and implements a different loss: a combination of the MSE loss and a pointwise relative discrepancy loss that seems to increase the minimax difficulty, forcing the generator to refine the output.

The method

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

wang2021.md

wang2021.md

Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN

Files

wang2021.md

Latest commit

History

wang2021.md

File metadata and controls

Improve GAN-based Neural Vocoder using Pointwise Relativistic LeastSquare GAN