Subtractive vs additive patch sizes #942
Closed
thekevinscott started this conversation in Ideas
Replies: 1 comment
-
Implemented this with #948.
-
Today, using patch sizes with padding is additive.

Let's say you specify a `patchSize` of `64` with a `padding` of `2`. The model will receive successive slices of image of size `68` (`patchSize + (padding * 2)`, or `64 + (2 * 2)`).

I made this API choice because it seemed more intuitive to expect an additive API surface. However, the upcoming series of MAXIM models complicates this by introducing two new architectural changes not present in the existing ESRGAN models:
- a fixed input shape (e.g., `[any, 64, 64, 3]` - all incoming tensors must be of shape `64x64`)
- a divisibility-constrained input shape (the dimensions of incoming tensors must be divisible by some factor)

In the first case, specifying a patch size is no longer valid, as the model dictates the size of the incoming slices of image; in the second case, patch size can be specified but must be adjusted to satisfy the model's divisibility constraint (e.g., a `patchSize` of `3` would need to be adjusted to `4` for a model whose divisibility factor is `4`).

The latter case in particular requires changes in the way users interact with UpscalerJS when specifying `patchSize` and `padding`. That change is either:

1. Keep the additive scheme and require users to understand how to provide `patchSize` and `padding` to a model. If a model has a fixed input size of `[null, 64, 64, 3]`, this means understanding the algorithm under the hood and providing something like `{ patchSize: 60, padding: 2 }`.
2. Switch to a subtractive scheme, where `padding` is taken out of `patchSize` instead of being added to it.
My vote is for the latter approach.
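For reference, here is a minimal sketch of the arithmetic behind the two schemes. The names `Scheme`, `SliceSizes`, and `sliceSizes` are hypothetical and exist only for illustration; they are not part of the UpscalerJS API, and the real slicing code may differ.

```typescript
// Hypothetical helper for illustration only; not part of the UpscalerJS API.
type Scheme = 'additive' | 'subtractive';

interface SliceSizes {
  modelInput: number; // width/height of the slice handed to the model
  usable: number;     // width/height kept once the padding is trimmed off
}

function sliceSizes(patchSize: number, padding: number, scheme: Scheme): SliceSizes {
  if (scheme === 'additive') {
    // Additive: padding is added on both sides of the requested patch.
    return { modelInput: patchSize + padding * 2, usable: patchSize };
  }
  // Subtractive: padding is carved out of the requested patch.
  const usable = patchSize - padding * 2;
  if (usable <= 0) {
    throw new Error('padding * 2 must be smaller than patchSize');
  }
  return { modelInput: patchSize, usable };
}

console.log(sliceSizes(64, 2, 'additive'));    // { modelInput: 68, usable: 64 }
console.log(sliceSizes(64, 2, 'subtractive')); // { modelInput: 64, usable: 60 }
// What an additive scheme asks of users when the model only accepts 64x64:
console.log(sliceSizes(60, 2, 'additive'));    // { modelInput: 64, usable: 60 }
```

The last call shows the homework the additive scheme pushes onto users of a fixed-size model: they have to back out `60` from `64 - (2 * 2)` themselves.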
What that means in practice is that, when specifying a `patchSize` of `64` and a `padding` of `2`, the slice of image passed to the model will be of size `64`, but the usable slice of image (the chunk that will be used to compose the final image, minus the padding) will be `60`. Another way to think of this: the image returned via the `progress` callback will be of size `60` (the final enhanced image won't be affected).

In addition, I think this change should apply across all models. I think it is unreasonable to expect users to work out what kind of model is being used (whether it be of dynamic input shape, fixed input shape, or divisibility-constrained input shape) and then do the homework of switching between an additive and a subtractive scheme depending on the model.
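As a rough sketch of what one rule for every model could look like, the snippet below resolves a requested `patchSize` against the three kinds of input shape and then subtracts the padding. Everything here is an assumption made for illustration: `InputConstraint`, `resolvePatchSize`, and `usableSize` are hypothetical names, the fixed-shape branch simply overrides the requested size, and the divisible branch rounds up to the next multiple (consistent with the `3` to `4` example, though another rounding rule could be chosen).

```typescript
// Hypothetical types for illustration; UpscalerJS may model these constraints differently.
type InputConstraint =
  | { kind: 'dynamic' }
  | { kind: 'fixed'; size: number }
  | { kind: 'divisible'; factor: number };

// Size of the slice handed to the model for a requested patchSize.
function resolvePatchSize(requested: number, constraint: InputConstraint): number {
  switch (constraint.kind) {
    case 'dynamic':
      return requested; // any size is accepted
    case 'fixed':
      return constraint.size; // the model dictates the size
    case 'divisible':
      // Assumption: round up to the next multiple of the factor.
      return Math.ceil(requested / constraint.factor) * constraint.factor;
  }
}

// Under a subtractive scheme the usable chunk is the model input minus padding on both sides.
function usableSize(requested: number, padding: number, constraint: InputConstraint): number {
  return resolvePatchSize(requested, constraint) - padding * 2;
}

console.log(resolvePatchSize(3, { kind: 'divisible', factor: 4 })); // 4
console.log(resolvePatchSize(32, { kind: 'fixed', size: 64 }));     // 64
console.log(usableSize(64, 2, { kind: 'dynamic' }));                // 60
```

With this shape, the caller never has to branch on model type; only the resolution step does.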
That means that this would be a breaking change. That said, I would not expect it to be a large one: it would only break things for users who expect `patchSize` to equal the chunk of image they get back when also specifying `padding`. I suspect this is either zero or a very small subset of users.