Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

compute_loudness issues #361

Closed
nielsrolf opened this issue Jun 9, 2021 · 9 comments · Fixed by #387
Closed

compute_loudness issues #361

nielsrolf opened this issue Jun 9, 2021 · 9 comments · Fixed by #387

Comments

@nielsrolf
Copy link

Hello,

I was looking at the loudness curves that spectral_ops.compute_loudness produces and found that they don't describe the perceived loudness very well. An example is shown below:

image
image

The loudness has a little peak somewhere where there is no audible sound, and from the curve you cannot see the four distinct notes as clearly as I would expect.

When looking into the code, I think I found the issue

  s = stft_fn(audio, frame_size=n_fft, overlap=overlap, pad_end=True)

  # Compute power.
  amplitude = lib.abs(s)
  power_db = amplitude_to_db(amplitude, use_tf=use_tf)

  # Perceptual weighting.
  frequencies = librosa.fft_frequencies(sr=sample_rate, n_fft=n_fft)
  a_weighting = librosa.A_weighting(frequencies)[lib.newaxis, lib.newaxis, :]
  loudness = power_db + a_weighting

  # Set dynamic range.
  loudness -= ref_db
  loudness = lib.maximum(loudness, -range_db)
  mean = tf.reduce_mean if use_tf else np.mean

  # Average over frequency bins.
  loudness = mean(loudness, axis=-1)

loudness is a decibel value per frequency bin before taking the mean, but it does not make sense to take the mean over decibel values I believe. Instead, the a-weighting should be applied to the power spectrum (not in dB), then it should be summed over the frequency bins and then converted to dB scale.

In the way it is currently implemented, the A-weighting changes nothing besides adding its mean value to every time step. I plotted the result of compute_loudness with and without A-weighting, but in the without case I added its mean value, and this is the result (on another sample):
image

So I tried to look into how librosa computes loudness but found that they don't offer this function because it seems to be so complex, but in this discussion there is some helpful code.

Another unrelated small issue is, that the stft is computed with padding, which makes the loudness go down at the end of every sample a lot.

Let me know if you agree that these things are issues, and if I should work on a PR for it.

@voodoohop
Copy link

voodoohop commented Jun 9, 2021

The loudness calculation has been puzzling me too.

@nielsrolf the potential bug you discovered looks like it could be the culprit. I think one could also just change the mean calculation to:

loudness = amplitude_to_db(mean(db_to_amplitude(loudness), axis=-1))

I assumed we'd want to average in the linear domain and not in the power domain.

@nielsrolf
Copy link
Author

I tried the version that @voodoohop proposed without padding and with a shorter fft_size, it looks much better I think:
Screenshot 2021-06-09 at 16 38 01

The shorter fft size really helps to distinguish separate notes.

Before that, I played around a little bit and implemented a less theoretically backed version, but one that describes the perceived loudness also really good in my opinion:
image

  s = stft_fn(audio, frame_size=n_fft, overlap=overlap, pad_end=False)

  # Compute power.
  amplitude = lib.abs(s)
  power = amplitude**2
  
  frequencies = librosa.fft_frequencies(sr=sample_rate, n_fft=n_fft)
  a_weighting = librosa.A_weighting(frequencies)[lib.newaxis, lib.newaxis, :]
  weighting = 10**(a_weighting/10)
  power = power * weighting

  power = reduce_mean(power, axis=-1)
  loudness = log(power*100 + 1)

@PluieElectrique
Copy link

I agree with what's been said so far. The way the current code adds the A-weighting and then averages over decibels means that the loudness values are only shifted a little. Furthermore, take a look at this frequency sweep from 0 Hz to 8 kHz:

sweep-audacity

This is a linear spectrogram in Audacity with an FFT size of 2048, Hann windows, and zero padding 1. Notice the ringing around the curve (this is known as spectral leakage?). Now, look at the loudness features for this sweep:

sweep-ddsp

The blue line is using the NumPy loudness code. The orange line is the same code, but with

loudness = np.mean(loudness, axis=-1)

replaced by

loudness = np.mean(np.power(10, loudness / 10.0), axis=-1)
loudness = 10.0 * np.log10(np.maximum(1e-20, loudness))

so that instead of taking the mean over decibels, we convert to power, take the mean, and then convert back to decibels. This gets rid of the artifacts and produces a smooth curve, which seems to better match the smooth rise in loudness of the sweep.

I suppose there's a chance that the fluctuations of the blue line help the model by giving it extra information. But it seems unlikely, because DDSP is supposed to extract loudness in the same way as "Fast and Flexible Neural Audio Synthesis", and it says in the FFNAS appendix that: (emphasis mine)

To calculate loudness, we use librosa’s perceptual_weighting() function on the square of the STFT. This produces a spectrum in dB, which we convert back into a linear scale and compute a mean over frequency bins. This value is then scaled via log compression with a small offset eps = 1e − 5 to prevent overflow.

@voodoohop
Copy link

Shall we compile a pull request from this?

@nielsrolf
Copy link
Author

nielsrolf commented Jul 20, 2021

The question is how to do it without breaking old models, or if it's fine to break old models (maybe for v2.0?). Some input from the core devs would be nice @jesseengel

@voodoohop
Copy link

We could add a gin parameter to enable the "fixed" loudness calculation to not break old models.

nielsrolf added a commit to pollinations/ddsp that referenced this issue Jul 22, 2021
@nielsrolf
Copy link
Author

I implemented the fix as proposed by @PluieElectrique in pollinations/ddsp. A demo of the new method can be found in this colab.

I added a use_buggy_loudness parameter that is by default set to true if the ddsp version is below or equal to '1.6.2'.

@jesseengel
Copy link
Contributor

Hi all,

Sorry for the long delay on the reply. Thanks for looking into this and keeping up on it. I've caught up on the discussion and I think it makes a lot of sense. This loudness calculation was always an unnecessary headache (power seems to work just as well), and I think I got confused in the initial implementation because librosa.A_weighting returned units in decibels so I just combined and averaged in dB space. My bad!

The colab makes this pretty clear and the curves look good. Thanks @nielsrolf!

So I'd like to incorporate the changes, and the question just becomes how is the best way to do it? It's a bit complicated because there's the existing trained DDSP models for use in the colab (trained with loudness), there's the models in magenta.js (trained with power), and we're trying to finish up the work on this VST plugin (also using power). There are actually 3 separate bugs in the power/loudness that I'd like to fix, so it seems now is as good a time as any to make a breaking change and do them together.

  1. This loudness issue
  2. compute_power has an extra square accidentally as @mosheman5 found in Does square needed in compute_power? #383
  3. For some reason I originally chose a ref_db of 20.7. I mean, I guess it's principled as setting 0 dB at white noise amplitude 1.0, n_fft=2048, but doing that for power as well just complicates things from the natural scale of ref_db=0 (ref_amp=1.0), which is what everyone else does, so it makes all the power/loudness in DDSP crazy low without a good reason. So, yah, I'd like to fix that.

Rather than create a tangled web of deprecated kwargs and constants, I think I'd prefer a breaking change (for sanity's sake). My plan is to try incorporating all those changes in a PR, and train some new reference models to make sure everything is okay, and replace those for the colab notebook. Then merge the PR and push to v2.0.0 to mark the breaking change. That still leaves updating the magenta.js base models and code, but I can follow up with that.

If anyone has any big objections, let me know, otherwise I'm just going to go ahead with it. Old models can of course still be used, but will be pinned to older versions.

@nielsrolf
Copy link
Author

Sounds good!
While you are at it, you can also consider computing the spectrogram without padding, and then copying the last loudness value to pad the loudness curve to the desired shape as I did it in pollinations/ddsp

copybara-service bot pushed a commit that referenced this issue Aug 20, 2021
Breaking Change (-> v2.0.0):
* Perform loudness averaging in linear scale, not dB.
* Remove extra square from compute power.
* Set reference db to 0.0 to align with librosa instead of magic 20.7 number for white noise reference. That way for power, dB=0.0 corresponds to amplitude = 1.0. Correspondingly, set dB range to [-80, 0] instead of [-120, 0] to align with librosa.
* Provide convenience functions for `power_to_db` and `db_to_power` that are aligned to librosa. Clearer docstrings.
* Moved db functions to core with other scaling functions.
* Tests to make sure the db functions match librosa.
* Default to `use_tf=True`.
* Moved diff() to core (not spectral specific)
* Default loudness, power, and rms to much more reasonable frame size default of 512 (instead of 1024/2048) [better time resolution].
* Compute Loudness on the fly by default (it's very fast and doesn't hurt training speed).

Still TODO:
Retrain and swap out models in Colab demo.

PiperOrigin-RevId: 391661045
copybara-service bot pushed a commit that referenced this issue Jan 14, 2022
Breaking Change (-> v2.0.0):
* Perform loudness averaging in linear scale, not dB.
* Remove extra square from compute power.
* Set reference db to 0.0 to align with librosa instead of magic 20.7 number for white noise reference. That way for power, dB=0.0 corresponds to amplitude = 1.0. Correspondingly, set dB range to [-80, 0] instead of [-120, 0] to align with librosa.
* Provide convenience functions for `power_to_db` and `db_to_power` that are aligned to librosa. Clearer docstrings.
* Moved db functions to core with other scaling functions.
* Tests to make sure the db functions match librosa.
* Default to `use_tf=True`.
* Moved diff() to core (not spectral specific)
* Default loudness, power, and rms to much more reasonable frame size default of 512 (instead of 1024/2048) [better time resolution].
* Compute Loudness on the fly by default (it's very fast and doesn't hurt training speed).

Still TODO:
Retrain and swap out models in Colab demo.

PiperOrigin-RevId: 391661045
copybara-service bot pushed a commit that referenced this issue Jan 14, 2022
Breaking Change (-> v2.0.0):
* Perform loudness averaging in linear scale, not dB.
* Remove extra square from compute power.
* Set reference db to 0.0 to align with librosa instead of magic 20.7 number for white noise reference. That way for power, dB=0.0 corresponds to amplitude = 1.0. Correspondingly, set dB range to [-80, 0] instead of [-120, 0] to align with librosa.
* Provide convenience functions for `power_to_db` and `db_to_power` that are aligned to librosa. Clearer docstrings.
* Moved db functions to core with other scaling functions.
* Tests to make sure the db functions match librosa.
* Default to `use_tf=True`.
* Moved diff() to core (not spectral specific)
* Default loudness, power, and rms to much more reasonable frame size default of 512 (instead of 1024/2048) [better time resolution].
* Compute Loudness on the fly by default (it's very fast and doesn't hurt training speed).

Future TODO:
Retrain and swap out models in Colab demo.

PiperOrigin-RevId: 391661045
copybara-service bot pushed a commit that referenced this issue Jan 15, 2022
Breaking Change (-> v2.0.0):
* Perform loudness averaging in linear scale, not dB.
* Remove extra square from compute power.
* Set reference db to 0.0 to align with librosa instead of magic 20.7 number for white noise reference. That way for power, dB=0.0 corresponds to amplitude = 1.0. Correspondingly, set dB range to [-80, 0] instead of [-120, 0] to align with librosa.
* Provide convenience functions for `power_to_db` and `db_to_power` that are aligned to librosa. Clearer docstrings.
* Moved db functions to core with other scaling functions.
* Tests to make sure the db functions match librosa.
* Default to `use_tf=True`.
* Moved diff() to core (not spectral specific)
* Default loudness, power, and rms to much more reasonable frame size default of 512 (instead of 1024/2048) [better time resolution].
* Compute Loudness on the fly by default (it's very fast and doesn't hurt training speed).

Future TODO:
Retrain and swap out models in Colab demo.

PiperOrigin-RevId: 391661045
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants