How to effectively do gradient accumulation with flax.training.train_state?
#1989 · Unanswered
reshinthadithyan asked this question in Q&A
Replies: 0 comments
I have a `train_step` that looks roughly like this. What is the most effective way to add gradient accumulation to it? I implemented a rough version with `jax.lax.cond` [based on this], writing an additional accumulation attribute to `flax.training.train_state`. Is this the right way to do it? Thanks much; an example with gradient accumulation would help.
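Since no replies were posted, here is one common pattern, as a hedged sketch rather than the official flax recipe: split each batch into microbatches, accumulate per-microbatch gradients with `jax.lax.scan`, and average before applying the update (in flax you would then pass the averaged gradients to `TrainState.apply_gradients`). The loss, shapes, and names below are illustrative assumptions, not from the original question.

```python
# Sketch of gradient accumulation in plain JAX (flax-independent core idea).
# Assumption: a simple linear model with mean-squared-error loss; with
# equal-sized microbatches, averaged microbatch gradients equal the
# full-batch gradient exactly.
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    pred = x @ params
    return jnp.mean((pred - y) ** 2)

def accumulated_grads(params, xs, ys, num_micro):
    # Reshape the batch into `num_micro` equal microbatches.
    xs = xs.reshape(num_micro, -1, xs.shape[-1])
    ys = ys.reshape(num_micro, -1)
    grad_fn = jax.grad(loss_fn)

    def body(acc, batch):
        x, y = batch
        g = grad_fn(params, x, y)
        # Running sum of gradients across microbatches.
        return jax.tree_util.tree_map(jnp.add, acc, g), None

    zeros = jax.tree_util.tree_map(jnp.zeros_like, params)
    total, _ = jax.lax.scan(body, zeros, (xs, ys))
    # Average so the result matches a single full-batch gradient.
    return jax.tree_util.tree_map(lambda g: g / num_micro, total)

params = jax.random.normal(jax.random.PRNGKey(0), (4,))
xs = jax.random.normal(jax.random.PRNGKey(1), (8, 4))
ys = jax.random.normal(jax.random.PRNGKey(2), (8,))

g_acc = accumulated_grads(params, xs, ys, num_micro=4)
g_full = jax.grad(loss_fn)(params, xs, ys)
```

With this structure, a flax version only needs the averaged gradients fed to `state.apply_gradients(grads=...)` every `num_micro` steps; storing the running sum on a custom `TrainState` attribute, as the question describes, is a reasonable way to carry it between `train_step` calls.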