A bit of confusion about the Flax semantics of setup

I have a module that has a complex parameterization. During training, I need setup to be called repeatedly to transform my parameters into the right form. During test, I would like to avoid rerunning this transformation since the parameters don't change. Is there a way to do this? I looked at
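For concreteness, the kind of module being described might look roughly like the sketch below; ReparamDense, slow_transform, and the shapes are invented for illustration, not taken from the thread. The raw parameter is transformed in setup, and because setup runs again on every apply, the transformation is recomputed even at test time when the parameters are frozen.

```python
import jax.numpy as jnp
from flax import linen as nn


def slow_transform(raw):
    # stand-in for an expensive reparameterization (e.g. an
    # orthogonalization or spectral normalization of the kernel)
    return raw / (jnp.linalg.norm(raw) + 1e-6)


class ReparamDense(nn.Module):
    in_features: int
    features: int

    def setup(self):
        raw = self.param("raw_kernel",
                         nn.initializers.lecun_normal(),
                         (self.in_features, self.features))
        # setup runs on every apply, so this is recomputed each time
        self.kernel = slow_transform(raw)

    def __call__(self, x):
        return x @ self.kernel
```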
-
This is an interesting problem. You provide a (potentially fresh) dict of params on each apply, but the JAX arrays don't have identity, so there's no super fast way to check that the params didn't change. I don't think you would want to use bind, because in a setup like this you can only bind inside eval_step, which means you would still recompute it every step in the end (you can't pass a bound module across transform boundaries).

So I would do this by using a collection to cache the value:

```python
def setup(self):
    self.t = self.variable("constants", "t", slow_function, param)
```

Then in training you do:

```python
# don't pass old constants back in
y, variables = model.apply({"params": params}, mutable="constants")
# discard constants after the optimizer update...
```

and during eval you use:

```python
y = model.apply({"params": params, "constants": constants})
```

Before eval you could pre-compute the constants, or pass an empty constants dict and make it mutable so it's automatically computed on the first eval_step (in this case JAX would compile 2 versions of eval_step: one that runs slow_function and one that uses the cache).
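Putting the pieces of that reply together, a minimal end-to-end sketch might look like the following; CachedReparamDense, slow_function, and the shapes are illustrative placeholders rather than code from the thread.

```python
import jax
import jax.numpy as jnp
from flax import linen as nn


def slow_function(raw):
    # expensive transformation we only want to rerun when params change
    return raw / (jnp.linalg.norm(raw) + 1e-6)


class CachedReparamDense(nn.Module):
    in_features: int
    features: int

    def setup(self):
        raw = self.param("raw_kernel",
                         nn.initializers.lecun_normal(),
                         (self.in_features, self.features))
        # stored in a separate "constants" collection; the init function
        # (slow_function) only runs when the value is missing and the
        # collection is mutable
        self.kernel = self.variable("constants", "kernel", slow_function, raw)

    def __call__(self, x):
        return x @ self.kernel.value


model = CachedReparamDense(in_features=8, features=4)
x = jnp.ones((2, 8))
variables = model.init(jax.random.PRNGKey(0), x)
params = variables["params"]

# training: params change every step, so recompute the cache and discard it
# after the optimizer update
y, new_vars = model.apply({"params": params}, x, mutable="constants")

# eval: pass the precomputed constants back in; nothing is recomputed
constants = new_vars["constants"]
y_eval = model.apply({"params": params, "constants": constants}, x)
```

Inside a jitted eval_step, the cached path never calls slow_function, which matches the point above about JAX compiling two versions of eval_step.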
-
Thanks! This is a very clear explanation and doesn't seem to make my code much messier. I'm using caches for RNN state anyway, so this is not too much of a change. It's nice that I can have some mutable and some non-mutable variables. Just to clarify two points.
One random comment: