diff --git a/CHANGELOG.md b/CHANGELOG.md
index e603f6bdbf..7b2151d164 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,28 +7,56 @@ vNext
 (Add your change to a random empty line to avoid merge conflicts)
- -
- - Added an NLP text classification example (SST-2 sentiment) to examples/sst2.
-   that uses a birectional LSTM (BiLSTM) to encode the input text.
- - Added flax.training.train_state to simplifying using Optax optimizers.
- - Rewrote ImageNet example to use Optax instead of flax.optim for optimizers.
- -
- - `mutable` argument is now available on `Module.init` and `Module.init_with_outputs`
- - When calling `init` the 'intermediates' collection is no longer mutable
-   Therefore, intermediates will no longer be returned from initialization by default.
- -
- - Bug Fix: Correclty handle non-default parameters of Linen Modules with nested inheritance.
- -
- -
- - `BatchNorm` instances will behave correctly during init when called multiple times.
- -
- -
- -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
+ -
+
+0.3.4
+------
+
+Possibly breaking changes:
+ - When calling `init` the 'intermediates' collection is no longer mutable.
+   Therefore, intermediates will no longer be returned from initialization by default.
+ - Don't update batch statistics during initialization.
+ - When not using any non-determinism (e.g., dropout), it is no longer necessary to specify the `deterministic` argument in `MultiHeadDotProductAttention`.
+
+Other changes:
+ - Rewrote various examples to use Optax instead of Flax optimizers (e.g., ImageNet, SST2).
+ - Added an NLP text classification example (on the SST-2 dataset) to
+   [`examples/sst2`](https://github.com/google/flax/tree/master/examples/sst2)
+   that uses a bidirectional LSTM (BiLSTM) to encode the input text.
+ - Added `flax.training.train_state` to simplify using Optax optimizers.
+ - The `mutable` argument is now available on `Module.init` and `Module.init_with_output`.
+ - Bug fix: Correctly handle non-default parameters of Linen Modules with nested inheritance.
+ - Expose `dot_product_attention_weights`, allowing access to attention weights.
+ - `BatchNorm` instances will behave correctly during init when called multiple times.
+ - Added a more extensive "how to contribute" guide in `contributing.md`.
+ - Add proper cache behavior for [`lift.jit`](https://flax.readthedocs.io/en/latest/_autosummary/flax.linen.jit.html#flax.linen.jit),
+   fixing cache misses.
+ - Fix bug in Embed layer: make sure it behaves correctly when embedding is np.array.
+ - Fix `linen.Module` for deep inheritance chains.
+ - Fix bug in DenseGeneral: correctly expand bias to account for batch & noncontracting dimensions.
+ - Allow Flax lifted transforms to work on partially applied Modules.
+ - Make `MultiOptimizer` use `apply_gradient` instead of `apply_param_gradient`.
 
 0.3.3
 ------
diff --git a/README.md b/README.md
index ee3ebba287..eb442526e6 100644
--- a/README.md
+++ b/README.md
@@ -153,7 +153,7 @@ To cite this repository:
   author = {Jonathan Heek and Anselm Levskaya and Avital Oliver and Marvin Ritter and Bertrand Rondepierre and Andreas Steiner and Marc van {Z}ee},
   title = {{F}lax: A neural network library and ecosystem for {JAX}},
   url = {http://github.com/google/flax},
-  version = {0.3.3},
+  version = {0.3.4},
   year = {2020},
 }
 ```
diff --git a/flax/version.py b/flax/version.py
index 34dcfec74e..4ca93680b0 100644
--- a/flax/version.py
+++ b/flax/version.py
@@ -12,5 +12,5 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-__version__ = "0.3.3"
+__version__ = "0.3.4"
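
A minimal sketch of the headline 0.3.4 behavior changes listed in the changelog hunk above: the `init` handling of the 'intermediates' collection, the new `mutable` argument on `Module.init`, and `flax.training.train_state` with Optax. The `MLP` module, shapes, and optimizer settings below are hypothetical illustrations, not part of this diff; assumes Flax >= 0.3.4 and Optax are installed.

```python
import jax
import jax.numpy as jnp
import optax
import flax.linen as nn
from flax.training import train_state


class MLP(nn.Module):
  @nn.compact
  def __call__(self, x):
    h = nn.relu(nn.Dense(4)(x))
    # Record an activation in the 'intermediates' collection via `sow`.
    self.sow('intermediates', 'hidden', h)
    return nn.Dense(1)(h)


model = MLP()
x = jnp.ones((2, 3))

# As of 0.3.4, 'intermediates' is not mutable during `init`, so the returned
# variables contain only the collections actually initialized (e.g. 'params').
variables = model.init(jax.random.PRNGKey(0), x)

# The new `mutable` argument on `init` lets you request extra collections,
# restoring the pre-0.3.4 behavior explicitly.
variables_with_intermediates = model.init(
    jax.random.PRNGKey(0), x, mutable=['params', 'intermediates'])

# `train_state.TrainState` bundles apply_fn, params, and an Optax optimizer
# into one object, simplifying training loops.
state = train_state.TrainState.create(
    apply_fn=model.apply, params=variables['params'], tx=optax.adam(1e-3))
```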