Description
At the start of decoding (and after a flush), the WebCodecs VideoDecoder requires a key frame, which is currently defined as an IDR frame.
H.264 has the concept of a Recovery Point SEI message (D.2.8 in the (08.21) H.264 spec): "The recovery point SEI message assists a decoder in determining when the decoding process will produce acceptable pictures for display after the decoder initiates random access or after the encoder indicates a broken link in the coded video sequence."
So (afaict) an I-frame with such an SEI message is meant to be usable as the start frame for a decoding operation.
`ffprobe` also marks these frames as key frames.
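To make the distinction concrete, here is a minimal sketch of how a client could tell an IDR access unit apart from an I-frame that carries a Recovery Point SEI message. It assumes the access unit is an Annex B byte stream (start-code delimited); the names `splitAnnexB`, `findRecoveryPoint` and `classifyAccessUnit` are made up for illustration and are not part of WebCodecs. It is not a full H.264 parser and does not check that the slice is actually an I-slice.

```typescript
// NAL unit types and SEI payload type from the H.264 spec.
const NAL_IDR_SLICE = 5;
const NAL_SEI = 6;
const SEI_RECOVERY_POINT = 6;

interface RecoveryPoint {
  recoveryFrameCnt: number;
  exactMatchFlag: boolean;
}

type AccessKind = "idr" | "recovery-point" | "other";

// Split an Annex B byte stream into NAL units (start codes removed).
function splitAnnexB(data: Uint8Array): Uint8Array[] {
  const starts: number[] = [];
  for (let i = 0; i + 2 < data.length; i++) {
    if (data[i] === 0 && data[i + 1] === 0 && data[i + 2] === 1) {
      starts.push(i + 3);
      i += 2;
    }
  }
  return starts.map((start, k) => {
    let end = k + 1 < starts.length ? starts[k + 1] - 3 : data.length;
    // Drop trailing zero bytes (e.g. the leading zero of a 4-byte start code).
    while (end > start && data[end - 1] === 0) end--;
    return data.subarray(start, end);
  });
}

// Remove emulation prevention bytes (00 00 03 -> 00 00).
function unescapeRbsp(bytes: Uint8Array): Uint8Array {
  const out: number[] = [];
  let zeros = 0;
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (zeros >= 2 && b === 3) { zeros = 0; continue; }
    out.push(b);
    zeros = b === 0 ? zeros + 1 : 0;
  }
  return Uint8Array.from(out);
}

// Tiny MSB-first bit reader with unsigned Exp-Golomb (ue(v)) support.
function makeBitReader(bytes: Uint8Array) {
  let pos = 0;
  const readBit = (): number => {
    if ((pos >> 3) >= bytes.length) throw new RangeError("read past end of payload");
    const bit = (bytes[pos >> 3] >> (7 - (pos & 7))) & 1;
    pos++;
    return bit;
  };
  const readUe = (): number => {
    let zeros = 0;
    while (readBit() === 0) zeros++;
    let value = 1;
    for (let i = 0; i < zeros; i++) value = value * 2 + readBit();
    return value - 1;
  };
  return { readBit, readUe };
}

// Return the recovery point fields if this NAL unit is an SEI carrying one.
function findRecoveryPoint(nal: Uint8Array): RecoveryPoint | null {
  if (nal.length === 0 || (nal[0] & 0x1f) !== NAL_SEI) return null;
  const rbsp = unescapeRbsp(nal.subarray(1));
  let i = 0;
  while (i + 1 < rbsp.length) {
    let type = 0;
    while (rbsp[i] === 0xff) { type += 255; i++; }
    type += rbsp[i++];
    let size = 0;
    while (rbsp[i] === 0xff) { size += 255; i++; }
    size += rbsp[i++];
    if (type === SEI_RECOVERY_POINT) {
      const reader = makeBitReader(rbsp.subarray(i, i + size));
      const recoveryFrameCnt = reader.readUe();
      const exactMatchFlag = reader.readBit() === 1;
      return { recoveryFrameCnt, exactMatchFlag };
    }
    i += size;
  }
  return null;
}

// Classify one access unit: IDR, non-IDR with recovery point SEI, or neither.
function classifyAccessUnit(accessUnit: Uint8Array): AccessKind {
  let sawRecoveryPoint = false;
  for (const nal of splitAnnexB(accessUnit)) {
    if (nal.length === 0) continue;
    if ((nal[0] & 0x1f) === NAL_IDR_SLICE) return "idr";
    if (findRecoveryPoint(nal) !== null) sawRecoveryPoint = true;
  }
  return sawRecoveryPoint ? "recovery-point" : "other";
}
```

A caller that only wants frames that are immediately displayable would additionally check `recoveryFrameCnt === 0` on the result of `findRecoveryPoint`.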
I don't have enough data to comment on how often this happens in real-life video streams; personally I have 1000s of hours of video taken with different JVC / Sony camcorders (time-lapse recordings, used in animal conservation projects), which have the following properties:
- Stream starts (when record button is pressed) with IDR frame
- `IBBPBBPBBPBBI` GOPs, where every I-frame has a Recovery Point SEI message with `exact_match_flag=1` and `recovery_frame_cnt=0`
- IDR frames repeat every 300 frames (every 25 GOPs)
- Streams get "cut" into a new file after 4GB of recording; the new file starts with an I-frame, but not (guaranteed) an IDR frame.
Not being able to start decoding on I-frame + SEI means that:
- Worst case, the first 24 GOPs of a stream cannot be decoded without access to the previous file
- When random access is needed in the decoder, worst case 299 frames need to be decoded before the requested frame can be shown (takes about 0.25s on my M1 MacBook; not the end of the world, but not a smooth drag-playhead-and-find experience for users either). Note that the video files are generally 4GB in size, so decoding all frames up front is not a solution either.
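The worst-case numbers above follow from the GOP layout. A small sketch of the arithmetic, assuming frame indices are counted in decode order from an IDR frame at index 0 and ignoring B-frame reordering (the function names are made up for illustration):

```typescript
// Stream layout described above: 12-frame GOPs (IBBPBBPBBPBB), IDR every 300 frames.
const GOP_LENGTH = 12;
const IDR_INTERVAL = 300;

// Frames that must be decoded and discarded before `targetFrame` can be shown,
// when decoding may only start every `entryInterval` frames.
function framesDecodedBeforeTarget(targetFrame: number, entryInterval: number): number {
  if (!Number.isInteger(targetFrame) || targetFrame < 0) {
    throw new RangeError("targetFrame must be a non-negative integer");
  }
  return targetFrame % entryInterval;
}

// Frames at the start of a "cut" file that cannot be decoded on their own when
// only IDR frames are accepted. `offsetInIdrPeriod` is the position of the
// file's first frame within the 300-frame IDR period of the original recording.
function undecodableLeadingFrames(offsetInIdrPeriod: number): number {
  return (IDR_INTERVAL - offsetInIdrPeriod) % IDR_INTERVAL;
}

// Seeking to the frame just before the next IDR:
const idrOnly = framesDecodedBeforeTarget(299, IDR_INTERVAL);   // 299 frames
const withRecovery = framesDecodedBeforeTarget(299, GOP_LENGTH); // 11 frames

// A file cut at the first I-frame after an IDR loses 288 frames, i.e. 24 GOPs.
const lostGops = undecodableLeadingFrames(GOP_LENGTH) / GOP_LENGTH;
```

If decoding could start at any I-frame with a Recovery Point SEI, the worst case for a seek would drop from 299 frames to 11 under these assumptions.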
The only client-side solution (short of re-encoding, which results in unacceptable quality loss) that kind of seems to work, but is probably a very bad idea, is to feed the decoder a dummy IDR frame before the real stream and then drop the first frame of the output.