Hello team, I'm trying to run experiments in half precision (bfloat16) on TPU v3-8 VMs. I'm seeing the following behaviour and wanted to check with you whether this is expected:

```python
from typing import Sequence
import jax
import jax.numpy as jnp
import flax.linen as nn


class MLP(nn.Module):
    features: Sequence[int]

    @nn.compact
    def __call__(self, x):
        for feat in self.features[:-1]:
            x = nn.relu(nn.Dense(feat, dtype=jnp.bfloat16)(x))
        x = nn.Dense(self.features[-1], dtype=jnp.bfloat16)(x)
        return x


model = MLP([12, 8, 4])
batch = jnp.ones((32, 10), dtype=jnp.bfloat16)
variables = model.init(jax.random.PRNGKey(0), batch)
output = model.apply(variables, batch)

print('Input dtype: ', batch.dtype)
print('Output dtype: ', output.dtype)
print('Model dtype: ', jax.tree_util.tree_map(lambda s: s.dtype, variables))
```

Output:
I understand that it is somewhat uncommon to keep the model weights in half precision (especially for batch norm layers). But how would one force the model weights to be bfloat16 (say, in case the model is too large to fit on a single device)? I guess one could explicitly cast all variables after init, as sketched below.
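For reference, a minimal sketch of that explicit cast, assuming the `variables` pytree from the snippet above; it just maps `astype` over every leaf and is not specific to any Flax helper:

```python
# Cast every leaf of the variables pytree to bfloat16 after init.
variables_bf16 = jax.tree_util.tree_map(
    lambda x: x.astype(jnp.bfloat16), variables)
```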
Casting the params this way results (as expected) in all of the variables being bfloat16. Thanks for your help!

Versions:
Python version 3.8.13
Replies: 1 comment
Nevermind - I just found the solution: use `param_dtype` on the modules! So using `param_dtype=jnp.bfloat16` on the Dense layers keeps the weights in bfloat16. Can be closed.
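For anyone finding this later, a minimal sketch of what that looks like on the MLP above (`param_dtype` is the `flax.linen` argument mentioned here; it controls the dtype the parameters are created in, while `dtype` controls the computation dtype and `param_dtype` defaults to float32):

```python
from typing import Sequence

import jax
import jax.numpy as jnp
import flax.linen as nn


class MLP(nn.Module):
    features: Sequence[int]

    @nn.compact
    def __call__(self, x):
        for feat in self.features[:-1]:
            # param_dtype=bfloat16 creates the kernel/bias in bfloat16
            # instead of the default float32.
            x = nn.relu(nn.Dense(feat, dtype=jnp.bfloat16,
                                 param_dtype=jnp.bfloat16)(x))
        x = nn.Dense(self.features[-1], dtype=jnp.bfloat16,
                     param_dtype=jnp.bfloat16)(x)
        return x


model = MLP([12, 8, 4])
variables = model.init(jax.random.PRNGKey(0),
                       jnp.ones((32, 10), dtype=jnp.bfloat16))
print(jax.tree_util.tree_map(lambda s: s.dtype, variables))  # all bfloat16
```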