Consider marking an I-frame with Recovery Point SEI message as h264 key frame #650

Open
@reinhrst

Description

At the start of decoding (and after a flush), the WebCodecs VideoDecoder requires a key frame, which for H.264 is currently defined as an IDR frame.

H.264 has the concept of a Recovery Point SEI message (D.2.8 in the (08.21) H.264 spec): "The recovery point SEI message assists a decoder in determining when the decoding process will produce acceptable pictures for display after the decoder initiates random access or after the encoder indicates a broken link in the coded video sequence."

So (afaict) an I-frame with such an SEI message is meant to be usable as the start frame for a decoding operation.

ffprobe also marks these frames as key-frames.
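To make the "I-frame + Recovery Point SEI" criterion concrete, here is a minimal sketch of how such an access unit could be recognised. It assumes the access unit is available as an Annex B byte stream (start-code delimited), that the input is well formed, and that slices are carried in NAL unit types 1 and 5 only (no data partitioning). All function and class names are made up for illustration; this is not how WebCodecs or ffprobe implement the check.

```python
from typing import List, NamedTuple, Optional


class RecoveryPoint(NamedTuple):
    recovery_frame_cnt: int
    exact_match_flag: bool
    broken_link_flag: bool


def split_nal_units(data: bytes) -> List[bytes]:
    """Split an Annex B byte stream into NAL units, start codes removed."""
    units = []
    i = data.find(b"\x00\x00\x01")
    while i != -1:
        start = i + 3
        nxt = data.find(b"\x00\x00\x01", start)
        unit = data[start:len(data) if nxt == -1 else nxt]
        # A NAL unit never ends in 0x00; trailing zeros belong to the
        # next (4-byte) start code or to stream padding.
        unit = unit.rstrip(b"\x00")
        if unit:
            units.append(unit)
        i = nxt
    return units


def unescape_rbsp(payload: bytes) -> bytes:
    """Drop emulation prevention bytes (the 0x03 in 0x00 0x00 0x03)."""
    out = bytearray()
    zeros = 0
    for b in payload:
        if zeros >= 2 and b == 3:
            zeros = 0
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


def get_bit(data: bytes, bitpos: int) -> int:
    return (data[bitpos >> 3] >> (7 - (bitpos & 7))) & 1


def read_ue(data: bytes, bitpos: int):
    """Read one unsigned Exp-Golomb value; return (value, next bit position)."""
    zeros = 0
    while get_bit(data, bitpos) == 0:
        zeros += 1
        bitpos += 1
    bitpos += 1  # skip the terminating 1 bit
    value = 0
    for _ in range(zeros):
        value = (value << 1) | get_bit(data, bitpos)
        bitpos += 1
    return (1 << zeros) - 1 + value, bitpos


def parse_recovery_point(rbsp: bytes) -> Optional[RecoveryPoint]:
    """Scan the SEI messages in one SEI RBSP for payloadType 6 (recovery point)."""
    pos = 0
    while pos + 1 < len(rbsp):
        fields = []
        for _ in range(2):  # payloadType, then payloadSize, same byte coding
            value = 0
            while rbsp[pos] == 0xFF:
                value += 255
                pos += 1
            value += rbsp[pos]
            pos += 1
            fields.append(value)
        payload_type, payload_size = fields
        if payload_type == 6:
            payload = rbsp[pos:pos + payload_size]
            cnt, bit = read_ue(payload, 0)
            return RecoveryPoint(cnt, bool(get_bit(payload, bit)),
                                 bool(get_bit(payload, bit + 1)))
        pos += payload_size
    return None


def classify_access_unit(data: bytes) -> str:
    """Return 'idr', 'recovery-point-intra', or 'other' for one access unit."""
    recovery = None
    slice_types = []
    for nal in split_nal_units(data):
        nal_type = nal[0] & 0x1F
        rbsp = unescape_rbsp(nal[1:])
        if nal_type == 5:  # coded slice of an IDR picture
            return "idr"
        if nal_type == 6 and recovery is None:  # SEI
            recovery = parse_recovery_point(rbsp)
        if nal_type == 1:  # coded slice of a non-IDR picture
            _first_mb, bit = read_ue(rbsp, 0)
            slice_type, _ = read_ue(rbsp, bit)
            slice_types.append(slice_type % 5)  # 2 means I slice
    all_intra = bool(slice_types) and all(t == 2 for t in slice_types)
    return "recovery-point-intra" if recovery and all_intra else "other"


# Hand-built sample: SEI (recovery_frame_cnt=0, exact_match_flag=1) + one I slice header.
sample = bytes.fromhex("00000001 060601c480 000001 418880")
print(classify_access_unit(sample))  # prints: recovery-point-intra
```

A caller could additionally require `recovery_frame_cnt == 0` (as in the streams described below) before treating the access unit as a usable starting point, since a non-zero count means the output only becomes acceptable some frames later.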

I don't have enough data to comment on how often this happens in real-life video streams; personally, I have thousands of hours of video taken with different JVC / Sony camcorders (time-lapse recordings, used in animal conservation projects), which have the following properties:

  • Stream starts (when record button is pressed) with IDR frame
  • IBBPBBPBBPBBI GOPs, where every I-frame has Recovery Point SEI message with exact_match_flag=1 and recovery_frame_cnt=0
  • IDR frames repeat every 300 frames (every 25 GOPs)
  • Streams get "cut" into a new file after 4 GB of recording; the new file starts with an I-frame, but not necessarily an IDR frame.

Not being able to start decoding on I-frame + SEI means that:

  • Worst case, the first 24 GOPs of a stream cannot be decoded without access to the previous file
  • When random access is needed in the decoder, worst case 299 frames need to be decoded before the requested frame can be shown (this takes about 0.25s on my M1 MacBook; not the end of the world, but not a smooth drag-playhead-and-find experience for users either). Note that the video files are generally 4 GB in size, so decoding all frames up front is not a solution either.
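The worst-case numbers above follow directly from the GOP layout. A small sketch of the arithmetic, counting frames in decode order and ignoring any B-frame reordering (a simplifying assumption; the names are illustrative only):

```python
GOP_PATTERN = "IBBPBBPBBPBB"  # one GOP; the trailing I in "IBBPBBPBBPBBI" starts the next GOP
IDR_PERIOD_FRAMES = 300       # an IDR frame every 300 frames


def worst_case_preroll(entry_interval_frames: int) -> int:
    """Frames to decode before the target when the target is the last
    frame before the next usable entry point."""
    return entry_interval_frames - 1


gop_len = len(GOP_PATTERN)
gops_per_idr = IDR_PERIOD_FRAMES // gop_len

# Seeking when only IDR frames are accepted as key frames.
preroll_idr_only = worst_case_preroll(IDR_PERIOD_FRAMES)

# Seeking when every I-frame with a Recovery Point SEI is accepted.
preroll_with_recovery_points = worst_case_preroll(gop_len)

# A file cut right after an IDR GOP leaves this many GOPs before the next IDR.
undecodable_gops = gops_per_idr - 1
undecodable_frames = undecodable_gops * gop_len

print(gop_len, gops_per_idr)                           # prints: 12 25
print(preroll_idr_only, preroll_with_recovery_points)  # prints: 299 11
print(undecodable_gops, undecodable_frames)            # prints: 24 288
```

Under these assumptions, accepting I-frames with a Recovery Point SEI as entry points would cut the worst-case seek cost from 299 frames to 11.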

A client-side solution (short of re-encoding, which results in unacceptable quality loss) that kind of seems to work, but is probably a very bad idea, is to offer a dummy IDR frame to the decoder before feeding the real stream, and then drop the first frame of the output.


    Labels

      • maybe: Ideas that might be in scope, and worth discussing
      • need-definition: An issue where something needs to be specified normatively
      • registry: pertains to new or updated registry entry
