-
Notifications
You must be signed in to change notification settings - Fork 683
Decoder metadata interface #2672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
With this framing, I think Limits::max_image_width and Limits::max_image_height no longer need to be communicated to or handled by the ImageDecoder trait, because the external code can check ImageDecoder::dimensions() before invoking ImageDecoder::read_image(); only the memory limit (Limits::max_alloc) is essential. That being said, the current way Limits are handled by ImageDecoder isn't that awkward to implement, so to reduce migration costs keeping the current ImageDecoder::set_limits() API may be OK. |
|
A couple thoughts... I do like the idea of handling animation decoding with this same trait. To understand, are you thinking of "sequences" as being animations or also stuff like the multiple images stored in a TIFF file? Even just handling animation has some tricky cases though. For instance in PNG, the default image that you get if you treat the image as non-animated may be different from the first frame of the animation. We might need both a The addition of an |
It's a dyn-compatible way that achieves the goal of the constructor so it is actually an abstraction.
What do you by this? The main problem in I'm also not suggesting that calling |
|
@fintelia This now includes the other changes including to
|
|
I can't speak about image metadata, but I really don't like the new
Regarding rectangle decoding, I think it would be better if we force decoders to support arbitrary rects. That's because the current interface is actually less efficient by allowing decoder to support only certain rects. To read a specific rect that is not supported as is, However, most image formats are based on lines of block (macro pixels). So we can do a trick. Decode a line according to the too-large rect, and then only copy the pixels in the real rect to the output buffer. This reduces the memory overhead for unsupported rects from And if a format can't do the line-based trick for unsupported rects, then decoders should just allocate a temp buffer for the too-large rect and then crop (=copy what is needed). This is still just as efficient as the best For use cases where users can use rowpitch to ignore the exccess parts of the too-large rect, we could just have a method that gives back a preferred rect, which can be decoded very efficiently. So the API could look like this: trait ImageDecoder {
// ...
/// Returns a viewbox that contains all pixels of the given rect but can potentially be decoded more efficiently.
/// If rect decoding is not supported or no more-efficient rect exists, the given rect is returned as is.
fn preferred_viewbox(&self, viewbox: Rect) -> Rect {
viewbox // default impl
}
fn read_image_rect(&mut self, buf, viewbox) -> ImageResult {
Err(ImageError::Decoding(Decoding::RectDecodingNotSupported)) // or similar
}This API should make rect decoding easier to use, easier to implement, and allow for more efficient implementations. |
86c9194 to
cdc0363
Compare
That was one of the open questions, the argument you're presenting makes it clear it should return the layout and that's it. Renamed to
It's suppose to be to the full image. Yeah, that needs more documentation and pointers to the proper implementation. |
| } | ||
| } | ||
|
|
||
| impl ImageReader<'_> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it is easier for scrolling through the file if the impl blocks for each struct immediately follow the definitions
src/io/image_reader_type.rs
Outdated
| self.viewbox = Some(viewbox); | ||
| } | ||
|
|
||
| /// Get the previously decoded EXIF metadata if any. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "previously decoded" part here makes me a bit nervous. I think we'll want to be clearer about what the user has to do to make sure they don't get None for images that actually do have EXIF data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit odd and will depend on the format and metadatum. For instance, XMP is encoded per-image in tiff but only once in gif (despite us representing this as an image sequence) and also only once in png (no word about APNG). Problematically, in gif and png the standard requires absolutely no ordering with any of the other chunks. So it might be encountered before all of the header information is done; or after all the images have been consumed.
The problem with back references is of course the unclear association. And when multiple are included we always have a problem with consuming them 'after the end' since it should need to be buffered or the decoder able to store seek-back points (like png). Encoding all that in the interface is a challenge, i will incur some unavoidable complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moving the metadata query between peek_layout and read_image doesn't really affect this argument with the variants of MetadataHint that I've found to be necessary. So that is cleaner, see #2672 (comment)
Of course we have last_attributes still remaining since that is information that combines the information the reader has retrieved, e.g. the orientation given directly from read_image together with the fallback of querying it from exif.
This comment was marked as outdated.
This comment was marked as outdated.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
This comment was marked as resolved.
1a114c3 to
306c6d2
Compare
|
Resolving the naming question as |
e8d2713 to
4325060
Compare
|
@fintelia I understand this is too big for a code-depth review but I'd be interested in the directional input. Is the merging of 'animations' and simple images as well as the optimization hint methods convincing enough? Is the idea of returning data from As an aside, in wondermagick we basically find that sequence encoding is a missing API to match imagemagick. We can currently only do this with |
f6720de to
c677c88
Compare
|
I think it would be nice to have an iterator of all images in a file, where in the case of pyramid TIFFs for example, I could filter for the image with dimensions most closely matching a given need. |
2f71c9a to
a2703f4
Compare
| /// Consume the header of the image, determining the image's layout. | ||
| /// | ||
| /// This must be called before a call to [`Self::read_image`] to ensure that the initial | ||
| /// metadata has been read. In contrast to a constructor it can be called after configuring | ||
| /// limits and context which avoids resource issues for formats that buffer metadata. | ||
| /// | ||
| /// The layout returned by an implementation of [`ImageDecoder::peek_layout`] must match the | ||
| /// buffer expected in [`ImageDecoder::read_image`]. | ||
| fn peek_layout(&mut self) -> ImageResult<crate::ImageLayout>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To better understand how the new ImageDecoder interface works, I've been trying to implement it for the image-extras decoders, and as time permits will add comments as I run into issues and have questions. (Edit: see https://github.com/mstoeckl/image-extras/tree/experiment-deco-meta, still rather rough)
- The name ("peek_...") suggests that it is safe to call this multiple times in a row (or zero!); for example: that the sequence
self.peek_layout(),self.peek_layout(),self.read_image()should work and would not read two headers from the input. Could this be explicitly documented? - For formats containing a single image, may and/or should calls to peek_layout() return an Error (like ParameterErrorKind::NoMoreData) after read_image() has been called?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should work and would not read two headers from the input. Could this be explicitly documented?
may and/or should calls to peek_layout() return an Error (like ParameterErrorKind::NoMoreData) after read_image() has been called?
Both correctly guessed as was the intention behind the naming. Of course we can touch up the documentation. The decoder should return a useful error as NoMoreData but for any format that is neither is_animated nor is_sequence we don't need to make any strong requirements. It would be nice to make our internal implementations behave more strictly correct which is mostly missing tests, I think.
| #[non_exhaustive] | ||
| pub struct ImageLayout { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
External crates would need to construct ImageLayout in order to implement ImageDecoder::peek_layout(), which is tricky because ImageLayout is marked non-exhaustive.
| /// Returns the color type of the image file before decoding | ||
| fn original_color_type(&self) -> ExtendedColorType { | ||
| self.color_type().into() | ||
| fn original_color_type(&mut self) -> ImageResult<ExtendedColorType> { | ||
| Ok(self.peek_layout()?.color.into()) | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
original_color_type() might be worth merging into ImageLayout/peek_layout(), because I expect typical format decoders will determine the value of this at the same time that the ImageLayout::color is determined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yes, I wasn't quite sure what to do with it in the commit where the previous dimensions were finally folded into peek_layout but that is a very reasonable treatment. Just an extra Option<ExtendedColorType> field I suppose?
This was one of the silently broken features when adding the new methods. While reviewing how to address the underlying cause with a clippy fix it became obvious a few other methods were missing, too.
Change ImageDecoder::read_image to `fn(&mut self)`. There are some codecs where metadata and other information is only available after decoding. Add ImageDecoder::init for codecs where reading the header needs to take into account limits and other configuration we are to introduce. This sketch compiles as: `--no-default-features --features=png`
Renames it `peek_layout` to clarify the relation to `read_image`, that it is to be called potentially multiple times for each such call. Document the trait more with this relationship, moving some methods around to clarify.
The latter interface will be something that interfaces and interacts with the underlying decoder. Introduces ImageDecoder::set_viewport as a stand-in for what is intended here: color, subsampling, gain map application will take more interaction, clarification of how the library will request their data, and adjustments to the layout definition by the decoder.
The purpose of the trait is to be careful with the boundary to the `moxcms` dependency. We use it because it is a quality implementation but it is heavy weight for what we need. There's other possible ways to provide transfer functions and color space transforms. Now this also introduces ICC profile parsing but again that could be done with a *much* lighter dependency as we only need basic information from it. The trait should make every little additional cross-dependency a conscious decision. Also it should be the start of a customization point, by feature flag or actually at runtime.
No longer responsible for ensuring the size constraints are met under the new policy and with available of constructing a reader from an instance of a boxed decoder.
This allows us to write a generic iterator which uses the same decoder function to generate a whole sequence of frames. The attributes are designed to be extensible to describe changes in available metadata as well, very concretely some formats require that XMP/ICC/… are polled for each individual image whereas others have one for the whole file and put that at the end. So there's no universal sequence for querying the metadata, and we need to hold runtime information. This will be the focus of the next commit.
This commit is meant to be revertable. The demonstration code was an experiment at improving the protocol between the trait implementation and the user facing `ImageReader`. While successfully showing that the interface would work with extension points as intended the details are not fleshed out and we would like at least some implementations to provide actual `viewbox` internals to add this feature. Also remember: this is not yet tested, `ImageReader::peek_layout` and all its derived methods should very likely consult the viewbox setting.
This method was mostly implemented to acquire EXIF information and retrieve the entry from it. However, that is a weird way of using the trait system with this mutable method. Instead we move the responsibility of extracting and manipulation of EXIF to the ImageReader and have the decoder interface only support its input. The orientation is added as a field to per-image decoded data in case where such information exists outside of EXIF (e.g. in TIFF).
At least the per-image metadata should be retrieved there. This also ensures that there is a consistent location for data available after the header for all images and data that changes per image. The call to `read_image` can then be destructive for all metadata that is only valid for that particular frame (e.g. TIFF) which it needed to delay until the next `peek_layout` before, leading to very odd interactions. Previously, it meant that `read_image` was supposed to cleanup the image data itself but not the metadata. This split responsibility is not super intuitive for many image types..
Having separate methods for these is not very effective in terms of protocol. All the formats we have can determine them at the same sequence point and `peek_layout` has become somewhat central in the definition of the protocol. It defines the buffer requirements for `read_image`—having one method means one source of truth for this length requirement which is far clearer than requiring that the caller run multiple methods and combine their results (with unclear semantics of running intermediate other methods).
c7a2be7 to
4d151b6
Compare
RunDevelopment
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I love how this came along since the last time I had a look. Awesome work, @197g and everyone involved!
| /// Retrieve animation attributes. | ||
| /// | ||
| /// You should check [`DecoderAttributes::is_animated`] before calling this method. It will | ||
| /// only be available on animated images. Additionally, most file formats store the metadata in | ||
| /// the header which might not be read until after calling [`ImageDecoder::peek_layout`]. | ||
| fn animation_attributes(&mut self) -> Option<DecodedAnimationAttributes> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
What does "available" mean here? Does that mean that
is_animated == falseimpliesanimation_attrs() == None?If so, then I would suggest replacing the first two sentences with: "This method always returns
Nonefor non-animated images. Use [`DecoderAttributes::is_animated`] to check whether an image is animated." -
Why is this method
mut? From its description, this sounds like a pure method that doesn't cause side effects like IO or changing the internal state of the decoder.I know that GIF needs
mutcall itsensure_decoder()method, but I don't understand why it callsensure_decoder()in the first place.animation_attributesis documented as requiringpeek_layout, which initializes theGifDecoder::decoderfield.This leaves me wondering. Is
animation_attributesallowed to rely on the side effects ofpeek_layout? If it is, then it doesn't needmutsincepeek_layoutdoes the side effects. -
(I assume that repeated calls are allowed.) Are the
DecodedAnimationAttributesreturned by this method allowed to change over the lifetime of the decoder? In other words, if it returnedSome(x)at some point, is it allowed to returnSome(y)forx != yor evenNonein subsequent calls?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- […]
Aye.
Why is this method mut? From its description, this sounds like a pure method that doesn't cause side effects like IO or changing the internal state of the decoder.
Mainly since all other metadata methods like icc data have the same interface. With the current shape of the trait, every caller is generally expected to have unique ownership over the decoder. There's little reason to make these methods shared then while this allows a few more patterns in the implementation that might otherwise be awkward.
It does not need mut but it also does not need shared.
Are the DecodedAnimationAttributes returned by this method allowed to change over the lifetime of the decoder? In other words, if it returned Some(x) at some point, is it allowed to return Some(y) for x != y or even None in subsequent calls?
In the sense that the interface is not unsafe that is always okay. We probably need more fields in the DecoderAttributes if ImageReader has reason to ever cache these from the decoder object. But there's plenty of extension points left.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the sense that the interface is not
unsafethat is always okay.
I was more so asking about what is correct in terms of the interface. So would we consider behavior like return values changing a bug if a decoder behaved that way?
| /// basic capability information about the format. If, in the future, we added different basic | ||
| /// methods of retrieving color data then the attributes would indicate the preferred and/or | ||
| /// possible choices. | ||
| fn attributes(&self) -> DecoderAttributes { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please explain what kind of attributes this method returns? Right now, it's not clear to me whether the attributes are about the format itself (i.e. static attributes that solely describe the format and its implementation) or about the image file (i.e. attributes that change even within a format depending on which file is read).
Right now, ::is_animated is documented as: "Are there multiple images in this file that form an animation?" However, implementations like GIF contradict this documentation, because they return a constant value irrespective of the image file's contents. I.e. even GIF images with only a single frame return is_animated: true.
See #2245, the intended
ImageDecoderchanges.This changes the
ImageDecodertrait to fix some underlying issues. The main change is a clarification to the responsibilities; the trait is an interface from an implementor towards theimagelibrary. That is, the protocol established from its interface should allow us to drive the decoder into our buffers and our metadata. It is not optimized to be used by an external caller which should prefer the use ofImageReaderand other inherent methods instead.This is a work-in-progress, below motivates the changes and discusses open points.
ImageDecoder::peek_layoutencourages decoders to read headers after the constructor. This fixes the inherent problem we had with communicating limits. The sequences for internal use is roughly:ImageDecoder::read_image(&mut self)no longer consumesself. We no longer need the additionalboxedmethod and its trait work around, the trait is now dyn-compatible.Discussion
peek_layoutmore consistently afterread_imageinitpeek_layoutshould return the full layout information in a single struct. We have a similar open issue forpngin its own crate, and the related work fortiffis in the pipeline where itsBufferLayoutPreferencealready exists to be extended with said information.Review limits and remove its size bounds insofar as they can be checked against the communicated bounds in the metadata step by thesee: Replaceimageside.ImageDecoder::set_limitswithImageDecoder::set_allocation_limit#2709, Add an atomically shared allocation limit #27081.1, but it's not highly critical.read_imageis 'destructive' in all decoders, i.e. re-reading an image and reading an image beforeinitshould never access an incorrect part of the underlying stream but instead return an error. Affects pnm and qoi for instance where the read will interpret bytes based on the dimensions and color, which would be invalid before reading the header and only valid for one read.read_imagethen switching to a sequence reader. But that is supposed to become mainly an adapter that implements the iterator protocol.ImageReaderwith a new interface to return some of it. That may be better suited for a separate PR though.CicpRgband apply it to a decodedDynamicImage.