Subtractive vs additive patch sizes #942

thekevinscott · 2023-05-28T16:58:09Z

Maintainer

Today, combining patchSize with padding is additive.

Let's say you specify a patchSize of 64 with a padding of 2. The model will receive successive 68x68 slices of the image (patchSize + (padding * 2), or 64 + (2 * 2)).
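A minimal sketch of the current additive arithmetic. `additiveModelInputSize` is a hypothetical helper name used only for illustration, not part of UpscalerJS:

```typescript
// Hypothetical helper illustrating the additive scheme: the slice sent to the
// model is the requested patch plus `padding` pixels on each side.
function additiveModelInputSize(patchSize: number, padding: number): number {
  return patchSize + padding * 2;
}

console.log(additiveModelInputSize(64, 2)); // 68
```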

I made this API choice because an additive API surface seemed more intuitive. However, the upcoming series of MAXIM models complicates this by introducing two architectural changes not present in the existing ESRGAN models:

The ability for models to specify a fixed input shape (e.g., an input shape of [any, 64, 64, 3]: all incoming tensors must be 64x64)
The ability for models to specify a divisibility constraint (e.g., all image dimensions must be even, that is, divisible by 2, even if the model's input shape is dynamic)

In the first case, specifying a patch size is no longer valid, as the model dictates the size of the incoming image slices. In the second case, a patch size can be specified but must be adjusted to satisfy the model's divisibility constraint (e.g., a patchSize of 3 would need to be adjusted to 4 for a model whose divisibility factor is 4).
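A sketch of the divisibility adjustment, assuming the patch size is rounded up to the nearest multiple of the factor (consistent with the 3-to-4 example; the post does not say whether UpscalerJS rounds up or down). `adjustForDivisibility` is a hypothetical name:

```typescript
// Hypothetical helper: round a requested patch size up to the nearest multiple
// of the model's divisibility factor. Rounding up (not down) is an assumption.
function adjustForDivisibility(patchSize: number, divisibilityFactor: number): number {
  if (!Number.isInteger(divisibilityFactor) || divisibilityFactor < 1) {
    throw new RangeError("divisibilityFactor must be a positive integer");
  }
  return Math.ceil(patchSize / divisibilityFactor) * divisibilityFactor;
}

console.log(adjustForDivisibility(3, 4));  // 4
console.log(adjustForDivisibility(64, 4)); // 64
console.log(adjustForDivisibility(65, 4)); // 68
```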

The latter case in particular requires changes in the way users interact with UpscalerJS when specifying patchSize and padding. That change can take one of two forms:

Expect users to do the math themselves when providing patchSize and padding to a model. If a model has a fixed input size of [null, 64, 64, 3], this means understanding the algorithm under the hood and providing something like { patchSize: 60, padding: 2 }.
Move to a subtractive algorithm when specifying these two properties.
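Under the first option, a user targeting a fixed-input model would have to invert the additive formula by hand. A sketch of that arithmetic (the helper name is hypothetical, not an UpscalerJS API):

```typescript
// Hypothetical helper: given a model's fixed input size and the desired padding,
// compute the patchSize a user would have to pass under the additive scheme.
function additivePatchSizeForFixedInput(fixedInputSize: number, padding: number): number {
  const patchSize = fixedInputSize - padding * 2;
  if (patchSize <= 0) {
    throw new RangeError("padding is too large for the model's fixed input size");
  }
  return patchSize;
}

// A [null, 64, 64, 3] model with padding 2 needs { patchSize: 60, padding: 2 }.
console.log({ patchSize: additivePatchSizeForFixedInput(64, 2), padding: 2 });
```

Under the second option, the same user would simply pass `{ patchSize: 64, padding: 2 }`.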

My vote is for the latter approach.

What that means in practice is that, with a patchSize of 64 and a padding of 2, the slice of image passed to the model will be 64x64, but the usable slice (the chunk used to compose the final image, i.e., the slice minus its padding) will be 60x60. Put another way, the image returned via the progress callback will be of size 60 (the final enhanced image won't be affected).
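A sketch of the subtractive arithmetic. `subtractivePatch` and `slicesAlongAxis` are hypothetical names, not UpscalerJS internals; `slicesAlongAxis` additionally assumes the image is tiled by the usable region along each axis and ignores edge handling:

```typescript
interface SubtractivePatch {
  modelInputSize: number; // size of the slice handed to the model
  usableSize: number;     // size kept for the final image after trimming padding
}

// Under the subtractive scheme, patchSize is the model input; padding is carved out of it.
function subtractivePatch(patchSize: number, padding: number): SubtractivePatch {
  const usableSize = patchSize - padding * 2;
  if (usableSize <= 0) {
    throw new RangeError("padding * 2 must be smaller than patchSize");
  }
  return { modelInputSize: patchSize, usableSize };
}

// Assumption: slices advance by the usable size, so this many cover one axis.
function slicesAlongAxis(length: number, patchSize: number, padding: number): number {
  return Math.ceil(length / subtractivePatch(patchSize, padding).usableSize);
}

console.log(subtractivePatch(64, 2));     // { modelInputSize: 64, usableSize: 60 }
console.log(slicesAlongAxis(300, 64, 2)); // 5
```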

In addition, I think this change should apply across all models. It is unreasonable to expect users to know what kind of model they are using (one with a dynamic input shape, a fixed input shape, or a divisibility-constrained input shape) and to switch between an additive and a subtractive scheme accordingly.
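One way to picture a single subtractive scheme across all three model types is a discriminated union over the model's input constraint. This is a sketch under assumptions: `InputConstraint` and `resolvePatch` are hypothetical, fixed-shape models simply override the requested patch size, and divisibility is satisfied by rounding up:

```typescript
// Hypothetical description of a model's input-shape constraint.
type InputConstraint =
  | { kind: "dynamic" }
  | { kind: "fixed"; size: number }
  | { kind: "divisible"; factor: number };

interface ResolvedPatch {
  modelInputSize: number; // size of the slice handed to the model
  usableSize: number;     // size kept after trimming padding from each side
}

function resolvePatch(constraint: InputConstraint, patchSize: number, padding: number): ResolvedPatch {
  let modelInputSize: number;
  switch (constraint.kind) {
    case "dynamic":
      modelInputSize = patchSize;
      break;
    case "fixed":
      // The model dictates the slice size; the requested patchSize is ignored.
      modelInputSize = constraint.size;
      break;
    case "divisible":
      // Assumption: round up to the nearest multiple of the factor.
      modelInputSize = Math.ceil(patchSize / constraint.factor) * constraint.factor;
      break;
    default: {
      const unreachable: never = constraint;
      throw new Error(`Unknown constraint: ${JSON.stringify(unreachable)}`);
    }
  }
  // Subtractive: padding is always taken out of the model input, whatever the model type.
  const usableSize = modelInputSize - padding * 2;
  if (usableSize <= 0) {
    throw new RangeError("padding is too large for the resolved patch size");
  }
  return { modelInputSize, usableSize };
}

console.log(resolvePatch({ kind: "fixed", size: 64 }, 128, 2)); // { modelInputSize: 64, usableSize: 60 }
```

The caller passes the same `patchSize` and `padding` regardless of model type; only `resolvePatch` branches on the constraint.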

This would therefore be a breaking change, though I don't expect a large one: it only affects users who expect patchSize to equal the size of the image chunk they get back when padding is also specified. I suspect that is either zero or a very small subset of users.
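For anyone who did rely on the old behavior, the old settings can be translated arithmetically. A minimal sketch, assuming a model with a dynamic input shape (fixed or divisibility-constrained models may adjust the size further); `migrateAdditiveToSubtractive` is a hypothetical helper, not an UpscalerJS API:

```typescript
interface PatchOptions {
  patchSize: number;
  padding: number;
}

// Hypothetical helper: convert additive settings into subtractive settings that
// feed the model the same slice size and keep the same usable chunk.
function migrateAdditiveToSubtractive(old: PatchOptions): PatchOptions {
  return { patchSize: old.patchSize + old.padding * 2, padding: old.padding };
}

// Old { patchSize: 64, padding: 2 } sent 68x68 slices and kept 64x64 of each.
console.log(migrateAdditiveToSubtractive({ patchSize: 64, padding: 2 })); // { patchSize: 68, padding: 2 }
```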

thekevinscott · 2023-06-05T20:36:26Z

Maintainer Author

Implemented this with #948
