Taking gradients with respect to network activations (not weights) #2501
Unanswered. awtaw5q25ASF asked this question in Q&A.
Replies: 1 comment
-
There is a new `Module.perturb` API in Flax for exactly this. For example:

```python
from typing import Callable

import jax
import jax.numpy as jnp
import flax.linen as nn


class Block(nn.Module):
    units: int
    activation: Callable

    @nn.compact
    def __call__(self, x):
        x = nn.Dense(features=self.units)(x)
        x = self.activation(x)
        # Register the activation as a zero-valued perturbation so gradients
        # can later be taken with respect to it.
        x = self.perturb('act', x)
        return x


class Model(nn.Module):
    units: int
    num_blocks: int
    activation: Callable

    @nn.compact
    def __call__(self, x):
        for _ in range(self.num_blocks):
            x = Block(units=self.units, activation=self.activation)(x)
        return x


x = jnp.ones((1, 4))
y = jax.random.normal(jax.random.PRNGKey(0), (1, 3))

model = Model(units=3, num_blocks=2, activation=nn.relu)
# init creates a 'perturbations' collection alongside 'params'.
variables = model.init(jax.random.PRNGKey(0), jnp.ones((1, 4)))


def loss_fn(params, perturbations, x, y):
    y_pred = model.apply({'params': params, 'perturbations': perturbations}, x)
    return jnp.mean((y_pred - y) ** 2)


# Differentiating with respect to the perturbations (which are all zero) gives
# the gradients with respect to the activations themselves.
activation_grads = jax.grad(loss_fn, argnums=1)(variables['params'], variables['perturbations'], x, y)
print(activation_grads)
```
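Because `perturb` adds a zero-valued variable to the intermediate value, the gradient with respect to that variable equals the gradient with respect to the activation itself. The result is a nested dict mirroring the module tree; assuming the snippet above, one quick way to inspect it is:

```python
# Print the shape of each activation gradient in the nested structure.
print(jax.tree_util.tree_map(jnp.shape, activation_grads))
```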
-
Consider a feedforward neural network

z(x) = sigma_N(f_N(... sigma_1(f_1(x)) ...)),

where each sigma_n is an activation function, and each f_n is a layer (e.g., linear). We can define the network's activations recursively as

a_n = f_n(sigma_{n-1}(a_{n-1})),

where a_0 = x and sigma_0 is the identity. Suppose we also have a loss function L(y, z(x)).

My question is: can we use JAX and Flax to easily calculate the set of gradients dL/da_n for n = 1, ..., N, i.e., the gradient of the loss with respect to each of the network's activations?
Thank you!
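For reference, here is a minimal pure-JAX sketch of the same idea used in the reply above (the layer sizes, names, and shapes are illustrative, not from the thread): add a zero perturbation delta_n to each activation a_n and differentiate the loss with respect to those perturbations; evaluated at zero, those gradients equal dL/da_n.

```python
import jax
import jax.numpy as jnp

def forward(params, perturbations, x):
    # a_n = f_n(sigma_{n-1}(a_{n-1})) + delta_n, with a_0 = x and sigma_0 the identity.
    a = x
    for n, ((W, b), delta) in enumerate(zip(params, perturbations)):
        h = a if n == 0 else jax.nn.relu(a)  # sigma_{n-1}(a_{n-1})
        a = h @ W + b + delta                # f_n(...) plus a zero perturbation
    return a                                 # the final activation a_N

def loss_fn(params, perturbations, x, y):
    return jnp.mean((forward(params, perturbations, x) - y) ** 2)

key = jax.random.PRNGKey(0)
sizes = [4, 3, 3]  # input dimension followed by layer widths
params = [
    (0.1 * jax.random.normal(key, (m, n)), jnp.zeros(n))
    for m, n in zip(sizes[:-1], sizes[1:])
]
perturbations = [jnp.zeros((1, n)) for n in sizes[1:]]  # one zero delta_n per activation

x = jnp.ones((1, 4))
y = jax.random.normal(key, (1, 3))

# Gradients with respect to the zero perturbations == gradients with respect to each a_n.
activation_grads = jax.grad(loss_fn, argnums=1)(params, perturbations, x, y)
```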