Flax model too big... OOM memory allocation error #2283
Hello, I am trying to train a simple encoder-decoder model (input: …). Here is my model:

```python
latent_dims = 20

class AE(nn.Module):
    @nn.compact
    def __call__(self, x):
        # encoder
        x = jnp.reshape(x, -1)
        x = nn.Dense(features=50176)(x)
        x = nn.relu(x)
        x = nn.Dense(features=512)(x)
        x = nn.relu(x)
        x = nn.Dense(features=latent_dims)(x)
        # decoder
        x = nn.Dense(features=512)(x)
        x = nn.relu(x)
        x = nn.Dense(features=50176 * 2)(x)
        return x.reshape(2, 224, 224)
```

When I run this cell:

```python
autoencoder = AE()
sample_batch = jnp.ones((1, 224, 224))
params = autoencoder.init(jax.random.PRNGKey(0), sample_batch)
```

the following error appears:
So it seems that the model is simply too big (9 GB) to fit in main memory, and I am not sure how to resolve this given my compute resources. Is it possible to reduce the model size in Flax? What about memory-mapping the model between disk and main memory if it is too large to fit? I just don't know how to progress from this point. The model is relatively simple, and many architectures take in images as large as 224 x 224, so I find it surprising that the model takes up this much space. Any recommendations? Thanks!
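For reference, the size is dominated by the first `Dense` layer: flattening a 224×224 input gives 50176 features, so that layer's kernel alone is 50176×50176 floats. A back-of-the-envelope check (my own arithmetic, assuming float32 at 4 bytes per parameter and `latent_dims = 20`):

```python
# Rough parameter count for the model above: each Dense layer has
# in*out weights plus `out` biases.
layers = [
    (50176, 50176),  # first encoder Dense: 224*224 -> 50176
    (50176, 512),
    (512, 20),       # latent_dims = 20
    (20, 512),
    (512, 100352),   # decoder output: 50176 * 2
]
n_params = sum(i * o + o for i, o in layers)
print(f"{n_params:,} params, ~{n_params * 4 / 1e9:.1f} GB in float32")
# 2,594,873,364 params, ~10.4 GB in float32
```

So the ~9–10 GB is expected for this architecture; most CNNs handle 224×224 inputs cheaply because they convolve rather than connect every pixel to a 50176-wide dense layer.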
When initializing the model you should `jit` your init function. If you don't do this, you will run a full forward pass, which consumes a lot of memory. Jitting will only initialize the parameters; everything else will be optimized away by XLA's dead code elimination. So you should try this:

```python
autoencoder = AE()
sample_batch = jnp.ones((1, 224, 224))
params = jax.jit(autoencoder.init)(jax.random.PRNGKey(0), sample_batch)
```
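If you only want to inspect parameter shapes, `jax.eval_shape` goes one step further and traces the function abstractly, so nothing is allocated at all. A minimal sketch (`init_like` is a hypothetical stand-in for `autoencoder.init` so the snippet runs without Flax; the same call works on the real init function):

```python
import jax
import jax.numpy as jnp

# Hypothetical stand-in for autoencoder.init: returns an array shaped
# like the first Dense layer's kernel in the model above.
def init_like(rng, x):
    flat = x.size  # 1 * 224 * 224 = 50176
    return jnp.zeros((flat, 50176))

# eval_shape traces abstractly: the ~10 GB kernel is never materialized.
shapes = jax.eval_shape(init_like, jax.random.PRNGKey(0), jnp.ones((1, 224, 224)))
print(shapes.shape)  # (50176, 50176)
```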