Replies: 2 comments
-
Hey @jlperla! Please take a look at The Flax Philosophy. We avoid the use of combinators as much as possible; the only combinators we have are …
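For context, here is a minimal sketch (not from the original reply) of what that combinator-free style looks like in practice with `flax.nnx`: submodules are declared explicitly in `__init__` and wired with plain Python in `__call__`. The class name `TwoLayer` and the layer sizes are purely illustrative.

```python
import jax
import jax.numpy as jnp
from flax import nnx


class TwoLayer(nnx.Module):
    # Explicit submodules instead of a Sequential-style combinator.
    def __init__(self, din: int, dhidden: int, dout: int, *, rngs: nnx.Rngs):
        self.linear1 = nnx.Linear(din, dhidden, rngs=rngs)
        self.linear2 = nnx.Linear(dhidden, dout, rngs=rngs)

    def __call__(self, x: jax.Array) -> jax.Array:
        # Control flow is plain Python; no combinator needed.
        return self.linear2(nnx.relu(self.linear1(x)))


model = TwoLayer(2, 32, 1, rngs=nnx.Rngs(0))
y = model(jnp.ones((4, 2)))  # shape (4, 1)
```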
-
Thanks @cgarciae, I think I see the philosophy. As a JAX outsider, though, the mental model of performance in JAX is much more opaque than you may realize, so it might be useful to give some more guidance. The other thing is that random number generation in nnx looks cool, but it is unclear to me how to use it properly. Along those lines, can you critique this implementation and point out anything that would lead to hidden sub-optimal performance, incorrect use of random numbers, etc.? If helpful, I could clean this up as an example for the docs if you think that is valuable.

```python
import typing as tp
import jax
import jax.numpy as jnp
from flax import nnx
from flax.nnx.nnx import rnglib
from flax.typing import Dtype, PrecisionLike
class MLP(nnx.Module):
    def __init__(
        self,
        in_features: int,
        out_features: int,
        *,
        width: int,
        depth: int,
        activation: tp.Callable,
        rngs: rnglib.Rngs,
        use_bias: bool = True,
        use_final_bias: bool = True,
        final_activation: tp.Optional[tp.Callable] = None,
        dtype: tp.Optional[Dtype] = None,
        param_dtype: Dtype = jnp.float32,
        precision: PrecisionLike = None,
    ):
        self.in_features = in_features
        self.out_features = out_features
        self.width = width
        self.depth = depth
        self.use_bias = use_bias
        self.use_final_bias = use_final_bias
        self.activation = activation
        self.final_activation = final_activation
        assert depth > 0  # skipping specialization of no hidden layers

        # Build the stack of linear layers with activations interleaved.
        self.layers = []
        self.layers.append(
            nnx.Linear(
                in_features,
                width,
                use_bias=use_bias,
                dtype=dtype,
                param_dtype=param_dtype,
                precision=precision,
                rngs=rngs,
            )
        )
        for i in range(self.depth - 1):
            self.layers.append(
                nnx.Linear(
                    width,
                    width,
                    use_bias=self.use_bias,
                    dtype=dtype,
                    param_dtype=param_dtype,
                    precision=precision,
                    rngs=rngs,
                )
            )
            self.layers.append(self.activation)
        self.layers.append(
            nnx.Linear(
                width,
                out_features,
                use_bias=self.use_final_bias,
                dtype=dtype,
                param_dtype=param_dtype,
                precision=precision,
                rngs=rngs,
            )
        )
        if self.final_activation is not None:
            self.layers.append(self.final_activation)

    def __call__(self, x: jax.Array) -> jax.Array:
        for layer in self.layers:
            x = layer(x)
        return x


if __name__ == "__main__":
    rngs = nnx.Rngs(0)
    n_in = 2
    n_out = 1
    depth = 3
    width = 128
    activation = nnx.relu
    model = MLP(n_in, n_out, width=width, depth=depth, activation=activation, rngs=rngs)

    # NOT SURE HOW THE nnx rngs can be split, etc.?
    x = jax.random.normal(rngs.next(), (n_in,))
    model(x)
    random_inputs = jax.random.normal(rngs.next(), (5, n_in))

    @nnx.jit
    def loss(f, batch):
        return jnp.mean(jax.vmap(f)(batch))

    val = loss(model, random_inputs)

    my_batch = jax.random.normal(rngs.next(), (20, n_in))

    @nnx.jit
    def loss_closure(f):
        return jnp.mean(jax.vmap(f)(my_batch))

    loss_val, loss_grad = nnx.value_and_grad(loss_closure)(model)
    print(val)
```
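On the RNG question in the code above, here is a minimal sketch (not part of the original post) of drawing fresh keys from named `nnx.Rngs` streams rather than splitting keys by hand. The stream name `data` is an arbitrary illustrative choice; `params` is the stream that NNX layers use for parameter initialization.

```python
import jax
from flax import nnx

# Named streams: 'params' feeds layer initializers, 'data' is an
# illustrative stream used here only to generate inputs.
rngs = nnx.Rngs(params=0, data=1)

key1 = rngs.data()  # each call returns a fresh key from that stream
key2 = rngs.data()  # a different key; no explicit jax.random.split needed
x = jax.random.normal(key1, (5, 2))
batch = jax.random.normal(key2, (20, 2))
```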
-
Is there a coding pattern (or even a class already in progress) for a flexible MLP-style wrapper in NNX? I'm thinking along the lines of https://docs.kidger.site/equinox/api/nn/mlp/ or https://pytorch.org/vision/main/generated/torchvision.ops.MLP.html, with a few extra features.
To me, at least, the key parameters are:
I don't mind trying to see if an RA can do a PR for this as practice with NNX if you give some hints. But in the meantime, is there a coding pattern which will work well with JAX/NNX?
Note that the PyTorch ones don't quite have these features, but the Equinox one does, after I put in a PR to it.