Why doesn't this loss function take the batch as input? #2319
-
```python
@jax.jit
def train_step(state, batch):
  """Train for a single step."""
  def loss_fn(params):
    logits = CNN().apply({'params': params}, batch['image'])
    loss = cross_entropy_loss(logits=logits, labels=batch['label'])
    return loss, logits
  grad_fn = jax.value_and_grad(loss_fn, has_aux=True)
  (_, logits), grads = grad_fn(state.params)
  state = state.apply_gradients(grads=grads)
  metrics = compute_metrics(logits=logits, labels=batch['label'])
  return state, metrics
```

My guess is that, since we are inside jit, all values including the batch sample are already traced, and the gradient only needs to be computed with respect to the parameters. The reason we are passing the parameters is solely to tell the gradient transform what to differentiate with respect to.
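Here is a minimal, self-contained sketch of what I mean; the toy linear model, array shapes, and dict keys are made up for illustration and are not from the Flax example above:

```python
import jax
import jax.numpy as jnp

params = {'w': jnp.ones((3,)), 'b': jnp.zeros(())}
batch = {'x': jnp.arange(6.0).reshape(2, 3), 'y': jnp.array([1.0, 2.0])}

@jax.jit
def train_step(params, batch):
  def loss_fn(params):
    # `batch` is captured from the enclosing scope: inside jit it is a traced
    # value like everything else, but no gradient is produced for it.
    pred = batch['x'] @ params['w'] + params['b']
    return jnp.mean((pred - batch['y']) ** 2)
  # value_and_grad differentiates only w.r.t. loss_fn's (single) argument.
  loss, grads = jax.value_and_grad(loss_fn)(params)
  return loss, grads

loss, grads = train_step(params, batch)
print(loss)        # scalar loss
print(grads['w'])  # grads has the same pytree structure as params
```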
-
It is not related to jitting, but indeed, `jax.grad` has the following explanation:

> `fun`: Function to be differentiated. Its arguments at positions specified by `argnums` should be arrays, scalars, or standard Python containers. (...) It should return a scalar (which includes arrays with shape () but not arrays with shape (1,) etc.)

Note the default value of `argnums` is 0, so by default `grad` will return a function that evaluates the gradients of your input wrt the first argument, which in the case of your example is `params`. You could also pass the batch as a second argument to `loss_fn` (which would …
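As a rough sketch of that alternative (the toy model, shapes, and names below are invented for illustration, not taken from the Flax example), passing the batch explicitly changes nothing about what gets differentiated, because `argnums` defaults to 0:

```python
import jax
import jax.numpy as jnp

params = {'w': jnp.ones((3,)), 'b': jnp.zeros(())}
batch = {'x': jnp.arange(6.0).reshape(2, 3), 'y': jnp.array([1.0, 2.0])}

def loss_fn(params, batch):
  pred = batch['x'] @ params['w'] + params['b']
  return jnp.mean((pred - batch['y']) ** 2)

# argnums=0 is the default; written out explicitly to show why only `params`
# is differentiated even though `batch` is now a regular argument.
grad_fn = jax.value_and_grad(loss_fn, argnums=0)
loss, grads = grad_fn(params, batch)  # batch is passed through, not differentiated
print(jax.tree_util.tree_map(jnp.shape, grads))  # same pytree structure as params
```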