How does nn.vjp work? #2217

pablo2909 · 2022-06-22T00:52:03Z

pablo2909
Jun 22, 2022

Hi,

I have been playing with nn.vjp but I can't get it to work. Here is my toy example showcasing the limits of my understanding:

import jax.numpy as jnp
from jax import random
import flax.linen as nn

class VF(nn.Module):
    init_val: float = 0.01
    def setup(self):
        self.kernel = self.param(
            'kernel',  # name of the parameter
            nn.initializers.constant(self.init_val),  # initializer: returns the parameter array
            (1,),  # shape passed to the initializer
        )
    def __call__(self, h, t):  # h and t are mandatory arguments; t is unused here
        return h * self.kernel**2


class JVP_VF_params(nn.Module):
    @nn.compact
    def __call__(self, h, t, x):
        primal_out, vjpfun_params = nn.vjp(lambda mdl, h: mdl(h, t), VF(init_val=-3), h)
        print(f"primal out {primal_out}")
        z1, z2 = vjpfun_params(x)
        print(z1, z2)
        return z1

vf1 = JVP_VF_params()
key1, _ = random.split(random.PRNGKey(1))
params1 = vf1.init(key1, jnp.array(3.), 0, jnp.array([4.]))

In this example, VF (vector field) is a function for which I would like to compute the vjp with respect to its parameters and the vjp with respect to the input h.
As expected, primal_out = 3 * (-3)**2 = 27. However, we also have z1 = 36, which suggests z1 = x * kernel**2 = 4 * (-3)**2 = 36. Hence the function vjpfun_params returned by nn.vjp is the vjp with respect to the input h. That makes sense given my definition of the lambda, but I expected the vjp with respect to kernel, since the documentation says "vjp_variables – The vjpfun will return a cotangent vector for all variable collections specified by this filter."

This leads to my two questions:

How do I get the vjp with respect to the parameters? (gradient of vf with respect to the params)
How do I get the vjp with respect to the input h?

Note that I would like the vjp to live in an nn.Module so that I can incorporate it into my existing code.

Thank you very much for reading this far and for any help on this question. :)
Paul

PS: I have read the related discussions but was not able to find the answer I was looking for :)
PPS: I understand that a vjp is the product of a vector and the Jacobian. For simplicity, I conflate it here with the gradient with respect to the parameters or the input.
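
For reference, the two quantities asked about in this post can be checked outside of Flax with plain jax.vjp. This is only a minimal sketch, not the nn.vjp mechanism itself: the free function vf and the names kernel_bar and h_bar are hypothetical stand-ins for the VF module and its cotangents, and the kernel is passed explicitly instead of living in a variable collection.

```python
import jax
import jax.numpy as jnp

def vf(kernel, h, t):
    # Same formula as VF.__call__ in the question; t is unused there as well.
    return h * kernel**2

kernel = jnp.array([-3.0])  # value the question uses for init_val
h = jnp.array(3.0)
x = jnp.array([4.0])        # cotangent, same shape as the output of vf

# Differentiate with respect to both the kernel and h; t is closed over.
primal_out, vjp_fn = jax.vjp(lambda k, h_: vf(k, h_, 0.0), kernel, h)

# jax.vjp returns one cotangent per primal, in the order the primals were passed.
kernel_bar, h_bar = vjp_fn(x)

print(primal_out)  # h * kernel**2         = 3 * 9          = 27
print(kernel_bar)  # 2 * h * kernel * x    = 2 * 3 * -3 * 4 = -72
print(h_bar)       # kernel**2 * x         = 9 * 4          = 36
```

With these inputs, the 36 observed in the question matches the cotangent for h, while the cotangent for the kernel would be -72.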

Answered by jheek

Jun 22, 2022

jheek · 2022-06-22T09:21:01Z

jheek
Jun 22, 2022
Maintainer

In this example you are only calling init. Because the params aren't initialized yet, you will get a vjp for h but an empty dict as the vjp for params. When you call vf1.apply, z2 will also contain a cotangent for the params.

You can also preinitialize the weights to get a param vjp even during init:

class JVP_VF_params(nn.Module):
    @nn.compact
    def __call__(self, h, t, x):
        vf = VF(init_val=-3)
        vf(h, t) # make sure params are initialized
        primal_out, vjpfun_params = nn.vjp(lambda mdl, h: mdl(h, t), vf, h)
        print(f"primal out {primal_out}")
        z1, z2 = vjpfun_params(x)
        print(z1, z2)
        return z1

4 replies

pablo2909 Jun 22, 2022
Author

Thank you for the quick reply! This solves my issue :)

pablo2909 Jul 11, 2022
Author

Hi @jheek,

Follow-up question on this issue :)

Why do we use the __call__ method of mdl in the lambda function, and not apply? I am asking because I have the following scenario in mind:

I have some function vf:

class VF(nn.Module):
    init_val: float = 2.
    def setup(self):
        self.kernel = self.param(
            'kernel',  # name of the parameter
            nn.initializers.constant(self.init_val),  # initializer: returns the parameter array
            (1,),  # shape passed to the initializer
        )
    def __call__(self, h, t):  # h and t are mandatory arguments
        print(f"{self.kernel = }")
        return h * self.kernel**2 + t**2

vf = VF(init_val=-3)
key1, _ = random.split(random.PRNGKey(1))
params1 = vf.init(key1, jnp.array(3.), 0)

The kernel in params1 is initialised to -3.

Then I compute the VJP of vf:

class VJP_VF_params(nn.Module):
    mdl_in: nn.Module        

    @nn.compact
    def __call__(self, h, t, x):
        # vf(h, t) # make sure params are initialized
        d = dict(h=h, t=t) # create a dictionary with the inputs to contain them, otherwise it fails
        primal_out, vjpfun_params = nn.vjp(lambda mdl, d: mdl(d["h"], d["t"]), self.mdl_in, d)
        print(f"{primal_out = }")
        z1, z2 = vjpfun_params(x)
        print(f"{z1 = }, {z2 = }") #z2 is the derivative of vf with respect to the params times x
        #z1 is the derivative of vf with respect to h and t times x
        return z1
vf1 = VJP_VF_params(mdl_in=vf)
key1, _ = random.split(random.PRNGKey(1))
params2 = vf1.init(key1, jnp.array(3.), 0, jnp.array([4.]))
vf1.apply(params2, h=jnp.array(3.), t=1., x=jnp.array([4.]))

Now assume some operation changed the kernel in params1 from -3 to 4. How do I communicate that to the VJP? Should I initialise it again? What is the best way to achieve that?

Thank you for your help! :)

pablo2909 Jul 13, 2022
Author

To answer my first question:

Why do we use the __call__ method of mdl in the lambda function, and not apply?

I think this is because I define vf as a submodule of VJP_VF_params, so it becomes bound and its variables are attached to it. In that case, we can just use __call__. This is what we do when we define, for example, nn.Dense as a submodule: we just use its __call__ method.

pablo2909 Jul 13, 2022
Author

To answer my second question:

Now assume some operation changed the kernel in params1 from -3 to 4. How do I communicate that to the VJP? Should I initialise it again? What is the best way to achieve that?

Note that VF and VJP_VF_params share the same parameters, so if the parameters of VF change, just pass the new parameters to the apply method of VJP_VF_params.
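
The same idea can be sketched in plain JAX, where it is perhaps easier to see why no re-initialisation is needed: the VJP is rebuilt from whatever parameters are passed in on each call. This is an analogue under stated assumptions, not the Flax code from this thread: vf, vjp_of_vf, old_params and new_params are hypothetical names, and the parameter dict is flat, whereas the variables of a Flax module that wraps VF may be nested under the submodule's name.

```python
import jax
import jax.numpy as jnp

def vf(params, h, t):
    # Same formula as the second VF in this thread: h * kernel**2 + t**2.
    return h * params["kernel"] ** 2 + t ** 2

def vjp_of_vf(params, h, t, x):
    # Stateless: nothing is cached, the VJP is built from the params passed in.
    primal_out, vjp_fn = jax.vjp(lambda p, h_: vf(p, h_, t), params, h)
    params_bar, h_bar = vjp_fn(x)
    return primal_out, params_bar, h_bar

h, t, x = jnp.array(3.0), 1.0, jnp.array([4.0])

old_params = {"kernel": jnp.array([-3.0])}
new_params = {"kernel": jnp.array([4.0])}  # e.g. after some update step

out_old = vjp_of_vf(old_params, h, t, x)
out_new = vjp_of_vf(new_params, h, t, x)

print(out_old)  # primal 28, kernel cotangent -72, h cotangent 36
print(out_new)  # primal 49, kernel cotangent  96, h cotangent 64
```

Calling the function again with the updated dict is all that is required, which mirrors passing the updated variables to apply.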
