Add PSNR (Y/U/V) for outbound-rtp #794
base: main
Conversation
This is similar to qpSum but codec-independent. Since PSNR requires additional computation, it is defined with an accompanying psnrMeasurements counter to allow the computation of an average PSNR. Defined as three components for the Y, U and V planes respectively. See also https://datatracker.ietf.org/doc/html/rfc8761#section-5
@henbos ^ |
This needs to be presented at the next virtual interim. Youenn mentions he'd like to hear about use cases of the metric. |
https://www.researchgate.net/publication/383545049_Low-Complexity_Video_PSNR_Measurement_in_Real-Time_Communication_Products has a whole paper about this. tl;dr is "qp is codec dependent", PSNR is not (but comes at a cost, hence this cannot be a simple sum). @youennf the folks who implemented https://developer.apple.com/documentation/videotoolbox/kvtcompressionpropertykey_calculatemeansquarederror?changes=l_4_8 might be able to tell you more too. cc @taste1981 |
I have not read the paper; is a preprint available? But based on my own interactions with PSNR, there is a source and a decoded image. Is the measurement on the outbound-rtp related to the source and the encoded image, i.e., the PSNR due to the encoder? |
This one is encoder PSNR, not scaling PSNR. Scaling PSNR would end up living on media-source in stats. |
webrtc-stats.html
Outdated
The PSNR is defined in [[ISO-29170-1:2017]].
</p>
<p class="note">
The frequency of PSNR measurements is outside the scope of this specification.
We should recommend that, perceptually, Y-PSNR is used. And PSNR values can vary based on motion and encoder settings.
Typically, average Y-PSNR is calculated by dividing by the number of frames.
that division is done by the application.
The other thing that I am curious about is whether the PSNR requires decoding the encoded video or is calculated as part of the encoder operation. Mainly the impact on CPU: if it requires some kind of decode step, I wonder if this is only calculated for I-frames or huge frames. |
@sprangerik have you looked at this? |
@jesup thoughts? |
I am supportive of this. For context, see also https://webrtc-review.googlesource.com/c/src/+/368960 |
The idea is to have it as part of the encoder process. The encoder is by definition also a decoder, so it can directly use both the raw input and reconstructed state without penalty. The actual PSNR calculation will of course often incur an extra CPU hit, unless it is already a part of e.g. a rate-distortion aware rate controller - but that's not often the case for real-time encoders. That's why it's proposed to limit the frequency of PSNR calculations. This of course means the user cannot count on PSNR metrics being populated. Even for a given stream, the PSNR values might suddenly disappear if e.g. there is a software/hardware switching event and only one implementation supports PSNR output. |
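To make the CPU-cost discussion concrete, here is a minimal illustrative sketch (not from the PR or from libWebRTC) of the per-plane PSNR computation an encoder could run against its own reconstructed output, assuming 8-bit samples with peak value 255:

```javascript
// Illustrative sketch: PSNR of one plane (e.g. Y) between the raw input
// and the encoder's reconstructed output. Assumes 8-bit samples (peak 255).
function planePsnr(original, reconstructed) {
  let sse = 0;
  for (let i = 0; i < original.length; i++) {
    const d = original[i] - reconstructed[i];
    sse += d * d; // sum of squared errors
  }
  const mse = sse / original.length;
  if (mse === 0) return Infinity; // identical planes
  return 10 * Math.log10((255 * 255) / mse);
}
```

The cost is O(width × height) per sampled frame, which is why the proposal limits how often measurements are taken rather than computing one for every frame.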
since webrtc-stats does aggregate values anyway, we could do a sumPsnr and countFrames, i.e., each time a PSNR is calculated it is added to the sum and the corresponding frame count counter goes up. If it is done for all frames, we would not need a frame counter |
This issue was discussed in WebRTC February 2025 meeting – (#794 Add PSNR) |
This seems like a nice feature that could have a few uses. I do wonder if it could be a separate API instead of part of the outbound RTP stats. My initial concerns are calculating this data regardless of whether the application is even interested in it, and the lack of any specification or recommendations on the frequency of measurements. Some pros and cons come to mind if this were implemented as a separate API instead.
Cons:
If this should remain in the stats could we consider adding some sort of getStats object to enable logging for this kind of data? |
WebRTC users routinely log getStats data, so adding this would not be any big overhead. If the stats are collected on a timescale of seconds, the overhead is usually negligible. (polling stats for every frame is not a good idea). |
If I understand correctly, the concern of overhead is in the browser doing an expensive calculation most websites would never request (though per-frame is not an issue due to caching, per-second might be; is never an acceptable frequency?). How expensive is this computation? Our web stats model is like a boat we keep loading with new stuff. Eventually, it becomes problematic. At some point (maybe now?) might we wish we had something like this?
await sender.getStats({verbosity: "high"}) // low | medium (default) | high |
I don't think WebRTC has to do these measurements very often for the PSNR measurements to be valuable and if they aren't done very often (say every second or every several seconds) then I don't think we need to make API changes. A similar example is that if you negotiate corruption-detection we do corruptionMeasurements, but since we only make these once per second they don't have any significant performance implications compared to the rest of the decoding pipeline. |
Btw this is unrelated to the polling frequency since the metrics only update when a measurement is made and a measurement happens in the background whether or not the app is polling getStats. (Polling getStats several times per second is bad because of the overhead of that call, not because of counters incrementing in the background) |
One concern is that this stat seems to require making two getStats calls over some interval. E.g. is the use case here to try one encoder setting, get stats, then wait 1 second and call getStats again expecting two different measurements? If so this might cause a divide-by-zero error in one browser but not another. |
All metrics in the getStats API are used like so: "delta foo / delta bar", that is true whether it is a rate (delta bytesSent / delta timestamp), or a measurement thingy (delta totalCorruptionProbability / delta corruptionMeasurements) or something more exotic like (delta qpSum / delta framesEncoded) or even (jitterBufferDelay / jitterBufferEmittedCount). I could go on with more examples but "divide by zero" is something that the user of this API should be aware of |
qpSum / framesDecoded might be a better example of a foot gun since that could fail when the network glitches but not in a stable environment. In practice web developers will write helper functions that do lookups of deltas and rates, taking care of foot guns. Also you have to be prepared for a metric not being present all of a sudden |
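Such a helper might look like this sketch (the function name and shape are hypothetical, not from any spec): it returns undefined instead of dividing by zero when no new measurements arrived, and likewise when the metric is absent.

```javascript
// Hypothetical helper: average of a summed metric between two stats
// snapshots, guarding against the divide-by-zero and missing-metric
// foot guns discussed above.
function deltaAverage(prev, curr, sumKey, countKey) {
  if (prev[sumKey] === undefined || curr[sumKey] === undefined) {
    return undefined; // metric unsupported or not yet populated
  }
  const dCount = curr[countKey] - prev[countKey];
  if (!(dCount > 0)) return undefined; // no new measurements this interval
  return (curr[sumKey] - prev[sumKey]) / dCount;
}
```

Usage would follow the "delta foo / delta bar" pattern from above, e.g. deltaAverage(ortp1, ortp2, "qpSum", "framesEncoded").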
Yes, I didn't mean to suggest the divide-by-zero hazard was limited to this API. The difference is, I think, the concern in this case is: |
PSNR is similar to qp so having it in getStats makes sense. As the paper says, we have done this at a frequency higher than once per second on devices where battery consumption is a concern and it works there. Hardware encoder support makes this "cheaper" even. Note that the calculation is done by the encoder so it cannot be triggered by calling getStats with some magic option. I considered whether it was possible to gate it on the corruption detection RTP header extension but that would have been quite awkward since it is not closely related (not without precedent; quite a few statistics depend on header extensions). When I say "A/B testing", consider a project like Jitsi moving to AV1, in particular the "Metrics Captured" section which, unsurprisingly, relies on getStats. See here for how one uses PSNR to evaluate when it is available. Such experiments are designed not to compare 🍎 to 🍌 (different browsers, different operating systems) so letting a UA decide on sampling frequency is not a concern as long as it does so consistently. |
Polling is just asking "do you have any new measurements for me?" It doesn't matter if the app polling interval and the browser polling interval align or not, and it's clear from the guidelines that there is no control of the sampling period. So I would argue that the only thing that matters is whether the measurements are arriving at a granular enough level to be useful. If the concern is that a browser implementer doesn't know what a useful measurement interval is, maybe we can provide some guidance there, but I fail to see the interop issue with different polling intervals that are all within a "useful" range. FTR I think 15 seconds is too large an interval since a lot can happen in that period of time. |
I would not even poll getStats for A/B testing purposes. One would typically poll periodically and use the last result, or call getStats explicitly before closing the peerconnection, and then calculate the average PSNR as psnrSum_{y,u,v}/psnrMeasurements. Only calls with enough psnrMeasurements should be taken into account, which one needs to do irrespective of sampling frequency to exclude "short calls". (While we are rambling: it seems Firefox throws when calling getStats on a closed peerconnection, which is not my understanding of #3; arguably, with all the transceivers gone, all the interesting stats disappear nowadays) |
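That end-of-call computation could be sketched as follows. The psnrSum/psnrMeasurements field names are from this PR's proposal; the threshold value is an arbitrary illustrative choice, not from any spec.

```javascript
// End-of-call average PSNR per plane, discarding "short calls" that have
// too few measurements. MIN_MEASUREMENTS is a hypothetical threshold.
const MIN_MEASUREMENTS = 10;
function averagePsnr({ psnrSum, psnrMeasurements }) {
  if (!psnrSum || !(psnrMeasurements >= MIN_MEASUREMENTS)) {
    return undefined; // unsupported, or too short a call to be meaningful
  }
  return {
    y: psnrSum.y / psnrMeasurements,
    u: psnrSum.u / psnrMeasurements,
    v: psnrSum.v / psnrMeasurements,
  };
}
```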
That's fine for telemetry. I think our concern was more someone making runtime decisions off stats, e.g.:
// probe and switch to best codec for media being sent right now:
let bestCodec, bestY = 0;
for (const codec of sender.getParameters().codecs) {
  const params = sender.getParameters();
  params.encodings[0].codec = codec;
  await sender.setParameters(params);
  await wait(1000);
  const ortp1 = [...(await sender.getStats()).values()].find(({type}) => type == "outbound-rtp");
  await wait(1000);
  const ortp2 = [...(await sender.getStats()).values()].find(({type}) => type == "outbound-rtp");
  const y = (ortp2.psnrSum.y - ortp1.psnrSum.y) / (ortp2.psnrMeasurements - ortp1.psnrMeasurements);
  if (bestY < y) {
    bestY = y;
    bestCodec = codec;
  }
}
const params = sender.getParameters();
params.encodings[0].codec = bestCodec;
await sender.setParameters(params);
This might be good for an implementer to know. |
I can replace that with "is implementation-defined" linking to https://infra.spec.whatwg.org/#implementation-defined And copy @sprangerik's great "these metrics should primarily be used as a basis for statistical analysis rather than be used as an absolute truth on a per-frame basis" |
We could say that PSNR measurement frequency is implementation-specific but that "the user agent SHOULD make PSNR measurements no less frequently than every X seconds, if PSNR measurements are supported for the current encoder implementation". I still think the app needs to handle the case that a PSNR measurement is not available for a given encoder implementation or browser, but this would give the app an upper bound on how long to maximally wait. |
From what I can see, browser implementations would boil down to the same line of code in libWebRTC for software encoders, no? None of the parameters for video encoding are "specified", so why do we need to be prescriptive here? Updated along the lines of #794 (comment) |
<p>
The PSNR is defined in [[ISO-29170-1:2017]].
</p>
This seems redundant with the same text under psnrSum (line 2235), which is already linked to here.
<p>
The PSNR is defined in [[ISO-29170-1:2017]].
</p>
<p class="note">
PSNR metrics should primarily be used as a basis for statistical analysis rather
than be used as an absolute truth on a per-frame basis.
The frequency of PSNR measurements is [=implementation-defined=].
</p>
We want to avoid "should" and normative statements in non-normative notes. If we want to be normative we should lift it out.
Based on #794 (comment) are we comfortable picking a min frequency of every 5 seconds?
(Edit: added "or the encoding frame rate whichever is lower")
<p class="note">
PSNR metrics should primarily be used as a basis for statistical analysis rather
than be used as an absolute truth on a per-frame basis.
The frequency of PSNR measurements is [=implementation-defined=].
</p>
<p>
If the current encoder supports taking PSNR measurements, their
frequency SHOULD be no less than every 5 seconds or the
encoding frame rate, whichever is lower.
</p>
<p class="note">
This allows for testing. PSNR measurements are intended for
statistical analysis, and aren't expected to be accurate down
to a frame.
</p>
Can you explain why 5 seconds? keyFramesEncoded and keyFramesDecoded are examples of very-low-frequency events happening already, where you cannot make an assumption about the minimum interval between two getStats calls that will give you an increase. Same for packetsLost.
What value would you like? The frequency of keyFramesEncoded and keyFramesDecoded is determined by external factors unlikely to vary by user agent.
5 is arbitrary. In this thread I heard 15 seconds was too high, and that 1 second was not too expensive, but also that we'd rather not overconstrain implementations too much. 10 seconds?
This value, whatever we pick would go into WPT and also give web developers that can't wait for some reason a minimum time to wait to be interoperable.
Hmm, maybe we need to also qualify that the encoder is actually encoding something? E.g. if the track is muted or a canvas track, then the frame rate may be less than what value we pick. Maybe we add "... or the encoding frame rate whichever is lower"?
tl;dr: this is a parameter of the video encoder configuration. It becomes observable in stats just like others.
What the spec should say is "implementation defined" - the implied warning is "do not compare apples to oranges"
The frequency of keyFramesEncoded is a good example: it is determined by encoder settings such as GOP size. One wants this as large as possible for good performance, but this is something where the encoder can be tuned without "interoperability" constraints since the decoder behavior is required to be flexible.
If one wants to get fancy about PSNR, one may need to take into account things like "is this a screen sharing track" (higher resolution, lower frame rate and wholly different content) or frame rate (which may depend on BWE). Worth the effort for statistical analysis? Unlikely.
15 seconds was arbitrary too, I think with the purpose of "I don't think this makes sense anymore". One second picked in the code is 3x the value from the paper (which has a parameter study) and hence 1/3rd as expensive in terms of power impact.
A SHOULD implies implementation-defined. Is there a value that would be satisfactory?
From editors meeting: the main reason for a number seems to be feature detection - people want to know that if they have waited this long between calls to getStats, and the frame counter has increased by X, and the stat doesn't show up, the feature is off. However, feature detection can be done by insisting that the stat is visible, with value zero, if supported. |
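Under that model, feature detection reduces to a presence check. A sketch, assuming the psnrMeasurements counter proposed in this PR (the helper name is hypothetical):

```javascript
// If the feature is supported, the counter is present (possibly still zero);
// if it is absent, PSNR output is off for this encoder implementation/browser.
function psnrSupported(outboundRtpStats) {
  return outboundRtpStats.psnrMeasurements !== undefined;
}
```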
If we make the PSNR measurement frequency implementation-defined, the spec should help the UA developer select a good frequency value. The spec should add some guidelines, for instance doing measurements every second or every 30 frames or so. |
Huh? This is a counter; it cannot disappear, it can only stop increasing. |
I think the gist of it is that we want the frequency to be a high as possible as long as the performance impact can be kept negligible. Perhaps we can add some text to that effect. Trying to give guidelines in terms of the expected frequency doesn't seem helpful to UA developers - but on the other hand those developers are probably in the best position to determine at which frequency the performance hit becomes non-negligible for their particular encoder implementations. |
That's great, because I don't think anyone is proposing a ceiling on frequency. A floor would be nice though. I disagree that giving guidance is bad, since what you wrote sounds like good guidance already. Let's write it down. |
As we say in German, "Guter Rat ist teuer" (good advice is hard to come by). The guidance should be "talk to your video encoding guys". That is what "implementation-defined" is for, no? |
This naturally leads to privacy questions if the selection of the rate depends on the encoder, the system load, the CPU... |
Who is proposing this?
How is this a new concern given video codecs in WebRTC already are adapting to CPU, thermal state, etc? Which is not described by any specification either. Note that PSNR should be possible to compute with JS and existing APIs already. And you are most welcome to invite privacy guys into your meetings obviously. |