compute_loudness · Issue #361
The loudness calculation has been puzzling me too. @nielsrolf the potential bug you discovered looks like it could be the culprit. I think one could also just change the mean calculation to average in the linear domain: convert the per-bin decibel values back to power, take the mean, and convert the result back to decibels. I assumed we'd want to average in the linear domain and not in the log (decibel) domain.
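Concretely, the two averaging strategies can be compared side by side (a minimal sketch; the function names are mine, not from the codebase):

```python
import numpy as np

def mean_over_db(loudness_db):
    # Current behavior: average the per-bin decibel values directly.
    return np.mean(loudness_db, axis=-1)

def mean_over_power(loudness_db):
    # Proposed fix: convert dB -> power, average in the linear domain,
    # then convert back to dB.
    power = np.power(10.0, loudness_db / 10.0)
    mean_power = np.mean(power, axis=-1)
    return 10.0 * np.log10(np.maximum(1e-20, mean_power))

# Example: two bins at -10 dB and -70 dB.
bins_db = np.array([-10.0, -70.0])
print(mean_over_db(bins_db))     # -40.0
print(mean_over_power(bins_db))  # ~ -13.01: the louder bin dominates
```

Averaging in dB splits the difference between the bins, while averaging in the power domain lets the energetic bin dominate, which is closer to how loud the frame actually is.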
I tried the version that @voodoohop proposed, without padding and with a shorter FFT size. The shorter FFT size really helps to distinguish separate notes. Before that, I played around a little and implemented a less theoretically backed version, but one that in my opinion also describes the perceived loudness really well:

```python
s = stft_fn(audio, frame_size=n_fft, overlap=overlap, pad_end=False)

# Compute power.
amplitude = np.abs(s)
power = amplitude**2

# Apply A-weighting in the linear (power) domain, not in dB.
frequencies = librosa.fft_frequencies(sr=sample_rate, n_fft=n_fft)
a_weighting = librosa.A_weighting(frequencies)[np.newaxis, np.newaxis, :]
weighting = 10**(a_weighting / 10)
power = power * weighting

# Average over frequency bins, then compress with a log.
power = np.mean(power, axis=-1)
loudness = np.log(power * 100 + 1)
```
I agree with what's been said so far. The way the current code adds the A-weighting and then averages over decibels means that the loudness values are only shifted a little. Furthermore, take a look at this frequency sweep from 0 Hz to 8 kHz: this is a linear spectrogram in Audacity with an FFT size of 2048, Hann windows, and zero padding 1. Notice the ringing around the curve (I believe this is spectral leakage). Now look at the loudness features for this sweep: the blue line uses the NumPy loudness code. The orange line is the same code, but with

```python
loudness = np.mean(loudness, axis=-1)
```

replaced by

```python
loudness = np.mean(np.power(10, loudness / 10.0), axis=-1)
loudness = 10.0 * np.log10(np.maximum(1e-20, loudness))
```

so that instead of taking the mean over decibels, we convert to power, take the mean, and then convert back to decibels. This gets rid of the artifacts and produces a smooth curve, which seems to better match the smooth rise in loudness of the sweep. I suppose there's a chance that the fluctuations of the blue line help the model by giving it extra information. But that seems unlikely, because DDSP is supposed to extract loudness in the same way as "Fast and Flexible Neural Audio Synthesis", and the FFNAS appendix says (emphasis mine):
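For anyone who wants to reproduce the effect without the repo, here is a self-contained NumPy sketch of the sweep comparison (my own framing code stands in for the library's STFT, so the exact numbers will differ from the plots above):

```python
import numpy as np

# Linear sweep 0 Hz -> 8 kHz over 2 s, mirroring the Audacity example.
sr = 16000
t = np.arange(2 * sr) / sr
audio = np.sin(2 * np.pi * 2000.0 * t**2)  # instantaneous frequency 4000*t Hz

# Hann-windowed power spectrogram (FFT size 2048, hop 512).
n_fft, hop = 2048, 512
frames = np.lib.stride_tricks.sliding_window_view(audio, n_fft)[::hop]
power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1)) ** 2
db = 10.0 * np.log10(np.maximum(1e-20, power))

# "Blue line": mean over decibels (current behavior).
loud_db_mean = np.mean(db, axis=-1)

# "Orange line": mean over power, then back to decibels (the fix).
loud_pow_mean = 10.0 * np.log10(np.maximum(1e-20, np.mean(power, axis=-1)))

def jitter(x):
    # Frame-to-frame fluctuation of a loudness curve.
    return np.std(np.diff(x))
```

Comparing `jitter(loud_db_mean)` against `jitter(loud_pow_mean)` shows the difference the plots illustrate: the dB-averaged curve jitters with the leakage sidelobes, while the power-averaged curve stays smooth for this constant-amplitude sweep.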
Shall we put together a pull request from this?
The question is how to do it without breaking old models, or whether it's fine to break old models (maybe for v2.0?). Some input from the core devs would be nice. @jesseengel
We could add a gin parameter that enables the "fixed" loudness calculation, so as not to break old models.
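A sketch of what that could look like, written here as a plain keyword argument (in DDSP it would be exposed as a gin-configurable parameter; the flag name `use_fixed_loudness` is my invention):

```python
import numpy as np

def compute_loudness_mean(db_per_bin, use_fixed_loudness=False):
    """Collapse per-bin dB values to one loudness value per frame.

    use_fixed_loudness=False reproduces the old behavior (mean over dB),
    so existing checkpoints keep working; True averages in the power
    domain as discussed above.
    """
    if use_fixed_loudness:
        power = 10.0 ** (db_per_bin / 10.0)
        return 10.0 * np.log10(np.maximum(1e-20, np.mean(power, axis=-1)))
    return np.mean(db_per_bin, axis=-1)
```

Old models would simply never set the flag, while new configs opt in through gin.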
I implemented the fix as proposed by @PluieElectrique in pollinations/ddsp. A demo of the new method can be found in this colab. I added a
Hi all, sorry for the long delay on the reply. Thanks for looking into this and keeping up on it. I've caught up on the discussion and I think it makes a lot of sense. This loudness calculation was always an unnecessary headache (power seems to work just as well), and I think I got confused in the initial implementation. The colab makes this pretty clear and the curves look good. Thanks @nielsrolf!

So I'd like to incorporate the changes, and the question just becomes what is the best way to do it? It's a bit complicated because there are the existing trained DDSP models for use in the colab (trained with loudness), the models in magenta.js (trained with power), and we're trying to finish up the work on this VST plugin (also using power). There are actually 3 separate bugs in the power/loudness that I'd like to fix, so now seems as good a time as any to make a breaking change and do them all together.
Rather than create a tangled web of deprecated kwargs and constants, I think I'd prefer a breaking change (for sanity's sake). My plan is to incorporate all those changes in a PR, train some new reference models to make sure everything is okay, and swap those in for the colab notebook. Then merge the PR and push v2.0.0 to mark the breaking change. That still leaves updating the magenta.js base models and code, but I can follow up on that. If anyone has any big objections, let me know; otherwise I'm just going to go ahead with it. Old models can of course still be used, but will be pinned to older versions.
Sounds good!
Breaking Change (-> v2.0.0):

* Perform loudness averaging in linear scale, not dB.
* Remove extra square from compute power.
* Set reference db to 0.0 to align with librosa instead of magic 20.7 number for white noise reference. That way for power, dB=0.0 corresponds to amplitude = 1.0. Correspondingly, set dB range to [-80, 0] instead of [-120, 0] to align with librosa.
* Provide convenience functions for `power_to_db` and `db_to_power` that are aligned to librosa. Clearer docstrings.
* Moved db functions to core with other scaling functions.
* Tests to make sure the db functions match librosa.
* Default to `use_tf=True`.
* Moved diff() to core (not spectral specific).
* Default loudness, power, and rms to a much more reasonable frame size default of 512 (instead of 1024/2048) [better time resolution].
* Compute loudness on the fly by default (it's very fast and doesn't hurt training speed).

Still TODO: Retrain and swap out models in Colab demo.

PiperOrigin-RevId: 391661045
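The librosa-aligned conversion helpers mentioned in the change list could look roughly like this (a sketch of the convention described above, not the actual DDSP implementation):

```python
import numpy as np

def power_to_db(power, range_db=80.0):
    # With a 0.0 dB reference, power = 1.0 (amplitude = 1.0) maps to 0 dB,
    # matching librosa.power_to_db(power, ref=1.0). Values are floored
    # at -range_db, i.e. the [-80, 0] range mentioned in the change list.
    floor = 10.0 ** (-range_db / 10.0)
    return 10.0 * np.log10(np.maximum(floor, power))

def db_to_power(db):
    # Inverse of power_to_db (exact away from the floor).
    return 10.0 ** (db / 10.0)
```

So `power_to_db(1.0)` gives 0.0, very small powers clip to -80.0, and `db_to_power(power_to_db(x))` round-trips for any `x` inside the range.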
Hello,

I was looking at the loudness curves that `spectral_ops.compute_loudness` produces and found that they don't describe the perceived loudness very well. An example is shown below: the loudness has a small peak where there is no audible sound, and the curve does not show the four distinct notes as clearly as I would expect.
When looking into the code, I think I found the issue: `loudness` is a decibel value per frequency bin before taking the mean, but I believe it does not make sense to take the mean over decibel values. Instead, the A-weighting should be applied to the power spectrum (not in dB), then summed over the frequency bins, and only then converted to dB scale. As currently implemented, the A-weighting changes nothing besides adding its mean value to every time step. I plotted the result of `compute_loudness` with and without A-weighting (in the "without" case I added the weighting's mean value), and this is the result (on another sample). I also tried to look into how librosa computes loudness, but found that they don't offer this function, apparently because it is so complex; however, in this discussion there is some helpful code.
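The pipeline just described (A-weight the power spectrum, sum over bins, then convert to dB) can be sketched without librosa by computing the IEC 61672 analog A-weighting curve directly; this is a sketch with my own framing code, whereas DDSP itself uses `librosa.A_weighting` and its own STFT:

```python
import numpy as np

def a_weighting_db(frequencies):
    # IEC 61672 analog A-weighting curve in dB (the same curve
    # librosa.A_weighting evaluates), normalized to ~0 dB at 1 kHz.
    f2 = np.asarray(frequencies, dtype=np.float64) ** 2
    const = np.array([20.6, 107.7, 737.9, 12194.0]) ** 2
    num = const[3] * f2**2
    den = (f2 + const[0]) * np.sqrt((f2 + const[1]) * (f2 + const[2])) * (f2 + const[3])
    return 2.0 + 20.0 * np.log10(np.maximum(1e-30, num / den))

def loudness_db(audio, sr=16000, n_fft=2048, hop=512):
    frames = np.lib.stride_tricks.sliding_window_view(audio, n_fft)[::hop]
    power = np.abs(np.fft.rfft(frames * np.hanning(n_fft), axis=-1)) ** 2
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    weighting = 10.0 ** (a_weighting_db(freqs) / 10.0)  # apply in the power domain
    total = np.sum(power * weighting, axis=-1)          # sum over bins first...
    return 10.0 * np.log10(np.maximum(1e-20, total))    # ...then convert to dB
```

Because the weighting is applied to linear power before the reduction over bins, it actually changes the shape of the loudness curve instead of just shifting it.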
Another unrelated small issue is that the STFT is computed with padding, which makes the loudness drop sharply at the end of every sample.
Let me know if you agree that these are issues, and whether I should work on a PR for them.