How to skip integer params from Flax Model? #2621
Hey @amankhandelia

**Params Issue**

For the integer `relative_position_index` you have a few options:

**Option 1**

Since `relative_position_index` is a fixed, non-learnable array, you can accept it as a field of the Module and pass it in from the outside:

```python
class FlaxDonutSwinSelfAttention(nn.Module):
    config: DonutSwinConfig
    dim: int
    num_attention_heads: int
    relative_position_index: jnp.ndarray  # accept it as an input
    dtype: jnp.dtype = jnp.float32
```

This way it's not in your `params` at all.

**Option 2**

Use `optax.multi_transform` to prevent the optimizer from trying to modify those parameters. See the "Freezing Layers" section from our Transfer Learning guide as an example.
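For illustration, a minimal sketch of what that could look like, assuming `params` is a plain nested dict and the integer leaves sit under a `relative_position_index` key (the label function and learning rate below are illustrative, not from this thread):

```python
import optax
from flax import traverse_util

def label_params(params):
    # Label every leaf "frozen" if its path contains the (assumed) key
    # "relative_position_index", otherwise "trainable".
    flat = traverse_util.flatten_dict(params)
    labels = {
        path: "frozen" if "relative_position_index" in path else "trainable"
        for path in flat
    }
    return traverse_util.unflatten_dict(labels)

# The frozen partition gets optax.set_to_zero(), so its updates are always 0.
tx = optax.multi_transform(
    {"trainable": optax.adamw(1e-4), "frozen": optax.set_to_zero()},
    label_params,
)
```

If your `params` is a `FrozenDict`, unfreeze it first (or build the labels with the same container type) so the label tree matches the param tree structure.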
**Option 3**

You can create a separate collection (e.g. `"index"`) for it with `self.variable`:

```python
# we are piggybacking the `params` rng here for simplicity
index_key = self.make_rng('params') if self.has_rng('params') else None
self.relative_position_index = self.variable(
"index", "relative_position_index",
relative_position_index_init, index_key, self.window_size, self.dtype
).value
```

Just beware that you have to pass this collection around now, same as with the `params` collection (see the sketch at the end of this reply).

**Updating issue**

I am not familiar enough with HuggingFace's API, so it's hard to comment on whether re-running the initialization after resizing the token embeddings is part of the problem.
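Regarding passing the extra collection around, here is a minimal sketch of what the call sites for Option 3 could look like (names such as `module`, `dummy_inputs`, and `inputs` are placeholders):

```python
import jax

# init returns every collection the module created, not just "params"
variables = module.init(jax.random.PRNGKey(0), dummy_inputs)
params = variables["params"]
index_vars = variables["index"]  # the extra, non-trainable collection

# apply needs the extra collection alongside "params" on every forward pass
outputs = module.apply({"params": params, "index": index_vars}, inputs)
```

Only `params` would then be handed to the optimizer and to `jax.value_and_grad`.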
Hello there,
I have been working on porting the Donut model from torch to Flax. In my tests, inference is fully functional and gives the same results as the torch model. In the same vein, I am finetuning the Flax model to check whether it reaches the same performance as the torch model. For that I am following this notebook by @NielsRogge.
This is where I am getting errors. To reproduce the error, please refer to the colab notebook.
On this line we are defining `relative_position_index`, which is of type `int32` and is not a learnable parameter. In order to make `jax.value_and_grad` work with the int type I am passing `allow_int=True` (roughly as in the sketch at the end of this post). I am guessing that not skipping the gradient calculation on this integer variable is the source of the problem (although I could be wrong). So my primary question is: how do I skip the gradient calculation for such parameters? And if that is not the cause, what should I do to correct the error? Any help is deeply appreciated.
I have done one more hacky thing, which could also be the source of my pains, although I am not so certain of that. As you will see in the notebook, we have to expand the number of tokens in the pretrained model to include special tokens for the document classes. In order to do the same (the exact code is in the notebook), after running that code I am running `model.init` again through `model.init_weights`. Could this be the problem? My understanding of Jax/Flax is not good enough to tell whether it is, so if somebody else sees a problem please do comment.
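For reference, a stripped-down sketch of the gradient call I am describing (the forward pass and loss helper here are placeholders, not the exact notebook code):

```python
import jax

def loss_fn(params):
    # placeholder forward pass and loss; the real code lives in the notebook
    logits = model(**batch, params=params, train=True).logits
    return cross_entropy(logits, batch["labels"])  # hypothetical helper

# allow_int=True makes value_and_grad accept the int32
# relative_position_index leaves inside `params`; their "gradients" come
# back as zero-sized float0 values instead of real gradients.
loss, grads = jax.value_and_grad(loss_fn, allow_int=True)(state.params)
```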
cc: @cgarciae