Releases: keras-team/keras-hub
v0.23.0
Summary:
New Models:
We've integrated a range of cutting-edge models, each designed to tackle specific challenges in their respective domains:
- Cell2Sentence: A single-cell, biology-aware model built on the Gemma-2 architecture, designed to interpret complex biological data.
- T5Gemma: A new encoder-decoder model, ideal for sequence-to-sequence tasks like translation and summarization.
- PARSeq: An end-to-end, ViT-based model for scene text recognition (STR), excelling at reading text in natural images.
- D-FINE: A high-performance, real-time object detection model.
- DepthAnythingV2: A monocular depth estimation (MDE) model trained on a combination of synthetic labeled data and real-world unlabeled images.
- Qwen3 Moe: The largest language model in the Qwen series, utilizing a Mixture-of-Experts (MoE) architecture for enhanced performance and efficiency.
- MobileNetV5: A state-of-the-art vision encoder specifically designed for high-efficiency AI on edge devices.
- SmolLM3: A compact yet powerful language model excelling in reasoning, long-context understanding, and multilingual capabilities.
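As a rough illustration of the Mixture-of-Experts idea mentioned for Qwen3 Moe, the sketch below routes one input to its top-k experts and mixes their outputs using renormalized gate weights. It is a generic top-k routing toy in plain Python, not the KerasHub or Qwen3 implementation; the function names, the toy experts, and the choice of k=2 are assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized so the selected experts sum to 1.
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, router_logits, experts, k=2):
    """Evaluate only the selected experts and mix their outputs."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy experts: each is just a scalar function.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x, lambda x: x * x]
logits = [0.1, 2.0, -1.0, 2.0]

routes = route_top_k(logits, k=2)
output = moe_forward(3.0, logits, experts, k=2)
```

Only the selected experts are evaluated for a given input, which is the source of the efficiency usually attributed to MoE layers: the parameter count grows with the number of experts while per-token compute grows with k.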
Improvements & Enhancements
This update also includes several key improvements to enhance the platform's stability, compatibility, and flexibility:
- export_to_transformers: You can now export trainable models, tokenizers, and configurations directly into the Hugging Face Transformers format using export_to_transformers. This feature is currently available for Gemma models, with support for more architectures coming soon.
- OpenVINO Backend Support: We've integrated OpenVINO inference support, enabling optimized inference for Mistral, Gemma, and GPT-2 models.
- Bidirectional Attention Mask: Gemma models now support a bidirectional attention mask, enabling more effective fine-tuning on tasks that require understanding the full context of a sequence.
- CLIP & SD3 Model Refactor: The CLIP and Stable Diffusion 3 models have been refactored to improve numerical stability. Updated checkpoints are now available to ensure seamless and reliable performance.
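To make the bidirectional attention mask item above concrete, here is a minimal plain-Python sketch contrasting a causal mask with a bidirectional one. It is not the Gemma implementation in KerasHub; the function names and the boolean list-of-lists representation are assumptions chosen for readability.

```python
def causal_mask(seq_len):
    """mask[i][j] is True when query position i may attend to key position j.

    In a causal mask, a position only sees itself and earlier positions.
    """
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def bidirectional_mask(padding_mask):
    """Every real token attends to every other real token.

    padding_mask[i] is True for real tokens and False for padding, so
    padded positions neither attend nor get attended to.
    """
    n = len(padding_mask)
    return [[padding_mask[i] and padding_mask[j] for j in range(n)] for i in range(n)]


# Three real tokens followed by one padding token.
padding = [True, True, True, False]
causal = causal_mask(4)
full = bidirectional_mask(padding)
```

With the causal mask, the first token sees only itself; with the bidirectional mask, it sees all three real tokens, which is what embedding-style or classification-style fine-tuning typically wants.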
What's Changed
- Register tiny Gemma presets by @sachinprasadhs in #2360
- Update fixed preset version for gemma3 by @sachinprasadhs in #2362
- Add generic export_to_transformers to the base classes by @Bond099 in #2346
- update version file in master by @sachinprasadhs in #2361
- add styleguide for GCA code reviews by @divyashreepathihalli in #2366
- Update styleguide.md by @divyashreepathihalli in #2370
- Add T5Gemma to KerasHub by @harshaljanjani in #2339
- Allow passing flexible positions to positional embedding layers by @abheesht17 in #2369
- Supports Loading Quantized Models with from_preset() by @JyotinderSingh in #2367
- PARSeq Model by @sineeli in #2089
- Add D-FINE to KerasHub by @harshaljanjani in #2318
- Fixing dtype issue by @buildwithsuhana in #2372
- quantize(...) should accept a config object by @JyotinderSingh in #2388
- [OpenVINO backend] Adding support for OpenVINO backend & support inference for Mistral & Gemma & GPT2 by @Mohamed-Ashraf273 in #2350
- minor modify by @pass-lin in #2386
- Add bidirectional attention mask for EmbeddingGemma by @abheesht17 in #2382
- Fixes by @buildwithsuhana in #2395
- Disable DINO quantisation checks by @abheesht17 in #2397
- Introduce D-FINE model presets in KerasHub by @harshaljanjani in #2376
- Introduce T5Gemma model presets in KerasHub by @harshaljanjani in #2373
- Update CLIP presets by @abheesht17 in #2400
- Fix Gemma OpenVINO tests by @abheesht17 in #2402
- Adds support for gemma_270m to checkpoint converter by @JyotinderSingh in #2380
- [internal] Reorder @pytest.mark.large decorator to fix CI by @JyotinderSingh in #2410
- Update preset map for VGG model by @sonali-kumari1 in #2411
- Update preset map for T5 model by @sonali-kumari1 in #2414
- Update preset map values for cspnet by @dhantule in #2416
- Add DepthAnythingV2. by @james77777778 in #2377
- Add Qwen3 Moe by @kanpuriyanawab in #2260
- update hf checkpoints list by @sachinprasadhs in #2381
- Patch conversion script qwen3 moe by @kanpuriyanawab in #2425
- update SD3 & 3.5 presets by @sachinprasadhs in #2417
- Add and Register the Qwen3_MoE Presets to Hub by @laxmareddyp in #2429
- Add MobileNetV5 to KerasHub by @harshaljanjani in #2399
- For sharded weights let's not delete explicitly by @amitsrivastava78 in #2431
- Update Keras min Test version to 3.9 by @sachinprasadhs in #2434
- Overrides _post_quantize to reset generate_function graph after quantization by @JyotinderSingh in #2436
- Handles incompatible quantization mode for ReversibleEmbedding by @JyotinderSingh in #2435
- extend PR stale and closure time by @sachinprasadhs in #2437
- register depth anything presets by @sachinprasadhs in #2420
- [SmolLM3] Add Backbone, CausalLM + Converter for HuggingFace Weights by @DavidLandup0 in #2327
- Register Cell2Sentence Presets by @laxmareddyp in #2442
- register parseq preset by @sachinprasadhs in #2438
- register mobilenet presets by @sachinprasadhs in #2443
- update release version by @sachinprasadhs in #2446
New Contributors
- @buildwithsuhana made their first contribution in #2372
- @Mohamed-Ashraf273 made their first contribution in #2350
- @dhantule made their first contribution in #2416
- @amitsrivastava78 made their first contribution in #2431
Full Changelog: v0.22.2...v0.23.0
v0.23.0.dev0
What's Changed
- Register tiny Gemma presets by @sachinprasadhs in #2360
- Update fixed preset version for gemma3 by @sachinprasadhs in #2362
- Add generic export_to_transformers to the base classes by @Bond099 in #2346
- update version file in master by @sachinprasadhs in #2361
- add styleguide for GCA code reviews by @divyashreepathihalli in #2366
- Update styleguide.md by @divyashreepathihalli in #2370
- Add T5Gemma to KerasHub by @harshaljanjani in #2339
- Allow passing flexible positions to positional embedding layers by @abheesht17 in #2369
- Supports Loading Quantized Models with from_preset() by @JyotinderSingh in #2367
- PARSeq Model by @sineeli in #2089
- Add D-FINE to KerasHub by @harshaljanjani in #2318
- Fixing dtype issue by @buildwithsuhana in #2372
- quantize(...) should accept a config object by @JyotinderSingh in #2388
- [OpenVINO backend] Adding support for OpenVINO backend & support inference for Mistral & Gemma & GPT2 by @Mohamed-Ashraf273 in #2350
- minor modify by @pass-lin in #2386
- Add bidirectional attention mask for EmbeddingGemma by @abheesht17 in #2382
- Fixes by @buildwithsuhana in #2395
- Disable DINO quantisation checks by @abheesht17 in #2397
- Introduce D-FINE model presets in KerasHub by @harshaljanjani in #2376
- Introduce T5Gemma model presets in KerasHub by @harshaljanjani in #2373
- Update CLIP presets by @abheesht17 in #2400
- Fix Gemma OpenVINO tests by @abheesht17 in #2402
- Adds support for gemma_270m to checkpoint converter by @JyotinderSingh in #2380
- [internal] Reorder @pytest.mark.large decorator to fix CI by @JyotinderSingh in #2410
- Update preset map for VGG model by @sonali-kumari1 in #2411
- Update preset map for T5 model by @sonali-kumari1 in #2414
- Update preset map values for cspnet by @dhantule in #2416
- Add DepthAnythingV2. by @james77777778 in #2377
- Add Qwen3 Moe by @kanpuriyanawab in #2260
- update hf checkpoints list by @sachinprasadhs in #2381
- Patch conversion script qwen3 moe by @kanpuriyanawab in #2425
- update SD3 & 3.5 presets by @sachinprasadhs in #2417
- Add and Register the Qwen3_MoE Presets to Hub by @laxmareddyp in #2429
- Add MobileNetV5 to KerasHub by @harshaljanjani in #2399
- For sharded weights let's not delete explicitly by @amitsrivastava78 in #2431
- Update Keras min Test version to 3.9 by @sachinprasadhs in #2434
- Overrides _post_quantize to reset generate_function graph after quantization by @JyotinderSingh in #2436
- Handles incompatible quantization mode for ReversibleEmbedding by @JyotinderSingh in #2435
- extend PR stale and closure time by @sachinprasadhs in #2437
- register depth anything presets by @sachinprasadhs in #2420
- [SmolLM3] Add Backbone, CausalLM + Converter for HuggingFace Weights by @DavidLandup0 in #2327
- Register Cell2Sentence Presets by @laxmareddyp in #2442
- register parseq preset by @sachinprasadhs in #2438
- register mobilenet presets by @sachinprasadhs in #2443
New Contributors
- @buildwithsuhana made their first contribution in #2372
- @Mohamed-Ashraf273 made their first contribution in #2350
- @dhantule made their first contribution in #2416
- @amitsrivastava78 made their first contribution in #2431
Full Changelog: v0.22.2...v0.23.0.dev0
v0.22.2
New Model: VaultGemma
VaultGemma is a 1-billion-parameter, 26-layer, text-only decoder model trained with sequence-level differential privacy (DP).
Derived from Gemma 2, its architecture notably drops the norms after the Attention and MLP blocks and uses full attention for all layers, rather than alternating with local sliding attention.
The pretrained model is available with a 1024-token sequence length.
What's Changed
- Add DP research model by @sachinprasadhs in #2396
Full Changelog: v0.22.1...v0.22.2
v0.22.1
What's Changed
- Patch release with Gemma3 presets fix by @sachinprasadhs in #2363
Full Changelog: v0.22.0...v0.22.1
v0.22.0
Summary:
New Models:
We've integrated a range of cutting-edge models, each designed to tackle specific challenges in their respective domains:
- Gemma 3 270M: A 270M-parameter, 18-layer, text-only model, released in base and instruction-tuned variants and designed for hyper-efficient AI, particularly task-specific fine-tuning.
- Qwen3: A powerful, large-scale multilingual language model, excelling in various natural language processing tasks, from text generation to complex reasoning.
- DeiT: Data-efficient Image Transformers (DeiT), specifically designed to train Vision Transformers effectively with less data, making high-performance visual models more accessible.
- HGNetV2: An advanced version of the Hybrid-Grouped Network, known for its efficient architecture in computer vision tasks, particularly optimized for performance on diverse hardware.
- DINOV2: A state-of-the-art Self-Supervised Vision Transformer, enabling the learning of robust visual representations without relying on explicit labels, ideal for foundation models.
- ESM & ESM2: Evolutionary Scale Modeling (ESM & ESM2), powerful protein language models used for understanding protein sequences and structures, with ESM2 offering improved capabilities for bioinformatics research.
Improvements & Enhancements
This update also includes several key improvements to enhance the platform's stability, compatibility, and flexibility:
- Python 3.10 Minimum Support: Updated the minimum supported Python version to 3.10, ensuring compatibility with the latest libraries and features.
- Gemma Conversion (Keras to SafeTensors): Added a new conversion script to convert Gemma models from Keras format to Hugging Face's Safetensors format.
- Gemma3 Conversion Script: Added conversion script for Gemma3 models, streamlining their integration into the Hugging Face ecosystem.
- ViT Non-Square Image Support: Enhanced the Vision Transformer (ViT) model to now accept non-square images as input, providing greater flexibility for various computer vision applications.
- LLM Left Padding Method: Added support for left padding in our LLM padding methods, offering more control and compatibility for specific model architectures and inference requirements.
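The left-padding item above can be pictured with a small plain-Python sketch. This is not the KerasHub padding code: the function name, the `side` parameter, and the choice to truncate by keeping the first `length` tokens are all hypothetical and only serve to show how left and right padding differ.

```python
def pad_sequences(sequences, length, pad_id=0, side="right"):
    """Pad (or truncate) token id lists to a fixed length.

    Returns the padded ids and a boolean mask marking real tokens.
    side="left" puts padding before the tokens, side="right" after them.
    """
    if side not in ("left", "right"):
        raise ValueError(f"side must be 'left' or 'right', got {side!r}")
    padded, masks = [], []
    for seq in sequences:
        tokens = list(seq)[:length]  # assumption: truncate from the end
        fill = [pad_id] * (length - len(tokens))
        real = [True] * len(tokens)
        pad = [False] * len(fill)
        if side == "left":
            padded.append(fill + tokens)
            masks.append(pad + real)
        else:
            padded.append(tokens + fill)
            masks.append(real + pad)
    return padded, masks


batch = [[5, 6], [1, 2, 3, 4]]
left_ids, left_mask = pad_sequences(batch, length=4, side="left")
right_ids, right_mask = pad_sequences(batch, length=4, side="right")
```

With left padding, the last real token of every prompt sits in the final column of the batch, which is convenient for batched generation because new tokens are appended at the same position for every row.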
What's Changed
Complete list of all the changes included in this release.
- register presets by @sachinprasadhs in #2268
- Fix batch preprocessing bug in Moonshine generation by @harshaljanjani in #2266
- fix get_lora_target_names function by @divyashreepathihalli in #2167
- implement of leftpadding by @pass-lin in #2242
- make vit compatible with non square images by @sineeli in #2255
- Bump up master version to 0.22.0.dev0 by @laxmareddyp in #2277
- Fix keras-io integration test by @laxmareddyp in #2280
- Add Qwen3 by @kanpuriyanawab in #2249
- Add DeiT Model by @Sohaib-Ahmed21 in #2203
- [HOTFIX] Add Docstring for QwenCausalLM by @kanpuriyanawab in #2279
- Fix: Correct coverage tracking for keras_hub by @sachinprasadhs in #2283
- Update the sharded version number for Llama3 variants by @laxmareddyp in #2294
- Support None for max_shard_size by @laxmareddyp in #2261
- Sharded weights type error by @laxmareddyp in #2296
- Fix PaliGemmaCausalLM example. by @hertschuh in #2302
- Routine HF sync by @divyashreepathihalli in #2303
- Incorrect condition on sliding_window_size by @laxmareddyp in #2289
- Bump the python group with 2 updates by @dependabot[bot] in #2282
- Modify TransformerEncoder masking documentation by @sonali-kumari1 in #2297
- Fix Gemma3InterleaveEmbeddings JAX inference error by ensuring indices are int32 by @pctablet505 in #2305
- Update preset versions for Mixtral,Qwen-MoE and Mistral models by @laxmareddyp in #2307
- Fix Mistral conversion script by @laxmareddyp in #2306
- Bump the python group with 6 updates by @dependabot[bot] in #2317
- Qwen3 causal lm by @kanpuriyanawab in #2311
- Fix JAX GPU tests by @sachinprasadhs in #2319
- support flash-attn at torch backend by @pass-lin in #2257
- Add HGNetV2 to KerasHub by @harshaljanjani in #2293
- Qwen3 presets register by @laxmareddyp in #2325
- Register HGNetV2 presets by @laxmareddyp in #2326
- Safetensors conversion by @Bond099 in #2290
- Add DINOV2. by @james77777778 in #2328
- Refactor CLIP and update SD3. by @james77777778 in #2316
- add DINOv2 preset details by @sachinprasadhs in #2336
- Fix dtype issues on JAX CPU in SD3 tests. by @james77777778 in #2338
- Revert "Fix dtype issues of JAX CPU in SD3. (#2338)" by @divyashreepathihalli in #2344
- Resolve preset comparison bug in glue load model method by @emmanuel-ferdman in #2345
- Removes unnecessary call to torch.no_grad() by @JyotinderSingh in #2353
- Add Esm by @pass-lin in #2244
- Fix float16 issue in SD3 when using JAX CPU. by @james77777778 in #2354
- update python to 3.10 and Keras minimum version to 3.8 by @sachinprasadhs in #2292
- register DeiT presets by @sachinprasadhs in #2348
- Fix path for presets to link it to API docs in keras.io by @sachinprasadhs in #2357
- Fix for llama3.1 instruct models by @pctablet505 in #2355
- Add & register ESM presets by @sachinprasadhs in #2356
- Add Gemma 3 conversion script by @abheesht17 in #2358
- Remove exact matching of outputs from Gemma 3 conversion notebook by @abheesht17 in #2359
New Contributors
- @Sohaib-Ahmed21 made their first contribution in #2203
- @sonali-kumari1 made their first contribution in #2297
- @Bond099 made their first contribution in #2290
- @emmanuel-ferdman made their first contribution in #2345
Full Changelog: v0.21.1...v0.22.0
For detailed documentation and usage examples/guides, please refer to our updated guides on https://keras.io/keras_hub/
v0.22.0.dev0
What's Changed
- Fix Roformer export symbol by @abheesht17 in #2199
- Bump up master version to 0.21 by @abheesht17 in #2204
- reenable test by @mattdangerw in #2188
- Add xception model by @mattdangerw in #2179
- Make image converter built by @mattdangerw in #2206
- Qwen - Fix Preset Loader + Add Causal LM Test by @kanpuriyanawab in #2193
- Update Qwen conversion script by @laxmareddyp in #2207
- Revert "Do not export Qwen for release" by @sachinprasadhs in #2208
- Fixes compute_output_shape for PaliGemmaVitEncoder and Gemma3VisionEncoderBlock by @JyotinderSingh in #2210
- Python 3.12 fix by @mattdangerw in #2211
- Small Gemma3 doc-string edits by @abheesht17 in #2214
- Llama3.1 by @pctablet505 in #2132
- Update gemma3_causal_lm_preprocessor.py by @pctablet505 in #2217
- fix: apply weights_only = True by @b8zhong in #2215
- Fix the keras_hub package for typecheckers and IDEs by @mattdangerw in #2222
- Add utility to map COCO IDs to class names by @mattdangerw in #2219
- Set GPU timeouts to 2 hours by @mattdangerw in #2226
- Fix nightly by @mattdangerw in #2227
- Another fix for nightly builds by @mattdangerw in #2229
- Cast a few more input to tensors in SD3 by @mattdangerw in #2234
- Fix up package build scripts again by @mattdangerw in #2230
- Add qwen presets by @laxmareddyp in #2241
- script for converting retinanet weights from trochvision by @sineeli in #2233
- Sharded weights support by @james77777778 in #2218
- Add Qwen Moe by @kanpuriyanawab in #2163
- Add Mixtral by @kanpuriyanawab in #2196
- Made label data optional for inference and adopted other required changes by @laxmareddyp in #2183
- Fix the layer names by @kanpuriyanawab in #2247
- Add new CSPNet preset and add manual padding. by @sachinprasadhs in #2212
- Update the int8 quant logic in ReversibleEmbedding by @james77777778 in #2250
- Add Moonshine to KerasHub by @harshaljanjani in #2093
- Add Kaggle handle for moonshine presets by @laxmareddyp in #2253
- Update requirements-jax-cuda.txt by @pctablet505 in #2252
- Add Mixtral,Qwen-MoE presets and Update conversion script. by @laxmareddyp in #2248
- fix flash attention test by @divyashreepathihalli in #2263
- Fix JAX bugs for qwen moe & mixtral by @kanpuriyanawab in #2258
- Create pull_request_template.md by @sachinprasadhs in #2262
- Update preset versions for sharded models by @laxmareddyp in #2264
- Add AudioToText and AudioToTextPreprocessor class stubs to enable auto class functionality by @harshaljanjani in #2265
- register moonshine presets by @sachinprasadhs in #2267
- register presets by @sachinprasadhs in #2268
- Fix batch preprocessing bug in Moonshine generation by @harshaljanjani in #2266
- fix get_lora_target_names function by @divyashreepathihalli in #2167
- implement of leftpadding by @pass-lin in #2242
- make vit compatible with non square images by @sineeli in #2255
- Bump up master version to 0.22.0.dev0 by @laxmareddyp in #2277
- Fix keras-io integration test by @laxmareddyp in #2280
- Add Qwen3 by @kanpuriyanawab in #2249
- Add DeiT Model by @Sohaib-Ahmed21 in #2203
- [HOTFIX] Add Docstring for QwenCausalLM by @kanpuriyanawab in #2279
- Fix: Correct coverage tracking for keras_hub by @sachinprasadhs in #2283
- Update the sharded version number for Llama3 variants by @laxmareddyp in #2294
- Support None for max_shard_size by @laxmareddyp in #2261
- Sharded weights type error by @laxmareddyp in #2296
- Fix PaliGemmaCausalLM example. by @hertschuh in #2302
- Routine HF sync by @divyashreepathihalli in #2303
- Incorrect condition on sliding_window_size by @laxmareddyp in #2289
- Bump the python group with 2 updates by @dependabot[bot] in #2282
- Modify TransformerEncoder masking documentation by @sonali-kumari1 in #2297
- Fix Gemma3InterleaveEmbeddings JAX inference error by ensuring indices are int32 by @pctablet505 in #2305
- Update preset versions for Mixtral,Qwen-MoE and Mistral models by @laxmareddyp in #2307
- Fix Mistral conversion script by @laxmareddyp in #2306
- Bump the python group with 6 updates by @dependabot[bot] in #2317
- Qwen3 causal lm by @kanpuriyanawab in #2311
- Fix JAX GPU tests by @sachinprasadhs in #2319
- support flash-attn at torch backend by @pass-lin in #2257
- Add HGNetV2 to KerasHub by @harshaljanjani in #2293
- Qwen3 presets register by @laxmareddyp in #2325
- Register HGNetV2 presets by @laxmareddyp in #2326
- Safetensors conversion by @Bond099 in #2290
- Add DINOV2. by @james77777778 in #2328
- Refactor CLIP and update SD3. by @james77777778 in #2316
- add DINOv2 preset details by @sachinprasadhs in #2336
- Fix dtype issues on JAX CPU in SD3 tests. by @james77777778 in #2338
- Revert "Fix dtype issues of JAX CPU in SD3. (#2338)" by @divyashreepathihalli in #2344
- Resolve preset comparison bug in glue load model method by @emmanuel-ferdman in #2345
- Removes unnecessary call to torch.no_grad() by @JyotinderSingh in #2353
- Add Esm by @pass-lin in #2244
- Fix float16 issue in SD3 when using JAX CPU. by @james77777778 in #2354
- update python to 3.10 and Keras minimum version to 3.8 by @sachinprasadhs in #2292
- register DeiT presets by @sachinprasadhs in #2348
- Fix path for presets to link it to API docs in keras.io by @sachinprasadhs in #2357
- Fix for llama3.1 instruct models by @pctablet505 in #2355
- Add & register ESM presets by @sachinprasadhs in #2356
- Add Gemma 3 conversion script by @abheesht17 in #2358
- Remove exact matching of outputs from Gemma 3 conversion notebook by @abheesht17 in #2359
New Contributors
- @JyotinderSingh made their first contribution in #2210
- @pctablet505 made their first contribution in #2132
- @b8zhong made their first contribution in #2215
- @harshaljanjani made their first contribution in #2093
- @Sohaib-Ahmed21 made their first contribution in #2203
- @sonali-kumari1 made their first contribution in #2297
- @Bond099 mad...
v0.21.1
Summary:
- Added comprehensive docstrings to QwenCausalLM, resolved integration test issues for keras-io, and fixed coverage tracking for KerasHub.
What's Changed
- Add QwencausalLM docstrings, coverage tracking, keras-io integration fix by @laxmareddyp in #2284
- Version bump to 0.21.1 by @laxmareddyp in #2285
Full Changelog: v0.21.0...v0.21.1
v0.21.0
Summary
- New Models:
  - Xception: Added Xception architecture for image classification tasks.
  - Qwen: Added Qwen2.5 large language models and presets of base language models and instruction-tuned language models ranging from 0.5 to 72 billion parameters.
  - Qwen MoE: Added transformer-based Mixture of Experts (MoE) decoder-only language model with a base variant having 2.7B activated parameters during runtime.
  - Mixtral: Added Mixtral LLM, a pretrained generative Sparse Mixture of Experts with pre-trained and instruction tuned models having 7 billion activated parameters.
  - Moonshine: Added Moonshine, a speech recognition task model.
  - CSPNet: Added Cross Stage Partial Network (CSPNet) classification task model.
  - Llama3: Added support for Llama 3.1 and 3.2.
- Added sharded weight support to KerasPresetSaver and KerasPresetLoader, defaulting to a 10GB maximum shard size.
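The idea behind a maximum shard size can be sketched as a greedy grouping of weights into files. The code below is a plain-Python illustration under stated assumptions, not the KerasPresetSaver logic: the function name, the weight names and sizes, and the interpretation of 10GB as 10 * 1024**3 bytes are all hypothetical.

```python
def plan_shards(weight_sizes, max_shard_size):
    """Greedily group weights, in order, into shards of at most max_shard_size bytes.

    A single weight larger than the limit still gets a shard of its own.
    """
    shards, current, current_size = [], [], 0
    for name, size in weight_sizes.items():
        # Start a new shard when adding this weight would exceed the limit.
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)
    return shards


GB = 1024 ** 3
# Hypothetical weight names and sizes in bytes.
sizes = {"embedding": 4 * GB, "block_0": 5 * GB, "block_1": 5 * GB, "head": 1 * GB}
shards = plan_shards(sizes, max_shard_size=10 * GB)
```

Here the 15GB of weights end up in two shards, `["embedding", "block_0"]` and `["block_1", "head"]`, each below the 10GB limit.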
What's Changed
- Fix Roformer export symbol by @abheesht17 in #2199
- Bump up master version to 0.21 by @abheesht17 in #2204
- reenable test by @mattdangerw in #2188
- Add xception model by @mattdangerw in #2179
- Make image converter built by @mattdangerw in #2206
- Qwen - Fix Preset Loader + Add Causal LM Test by @kanpuriyanawab in #2193
- Update Qwen conversion script by @laxmareddyp in #2207
- Revert "Do not export Qwen for release" by @sachinprasadhs in #2208
- Fixes compute_output_shape for PaliGemmaVitEncoder and Gemma3VisionEncoderBlock by @JyotinderSingh in #2210
- Python 3.12 fix by @mattdangerw in #2211
- Small Gemma3 doc-string edits by @abheesht17 in #2214
- Llama3.1 by @pctablet505 in #2132
- Update gemma3_causal_lm_preprocessor.py by @pctablet505 in #2217
- fix: apply weights_only = True by @b8zhong in #2215
- Fix the keras_hub package for typecheckers and IDEs by @mattdangerw in #2222
- Add utility to map COCO IDs to class names by @mattdangerw in #2219
- Set GPU timeouts to 2 hours by @mattdangerw in #2226
- Fix nightly by @mattdangerw in #2227
- Another fix for nightly builds by @mattdangerw in #2229
- Cast a few more input to tensors in SD3 by @mattdangerw in #2234
- Fix up package build scripts again by @mattdangerw in #2230
- Add qwen presets by @laxmareddyp in #2241
- script for converting retinanet weights from trochvision by @sineeli in #2233
- Sharded weights support by @james77777778 in #2218
- Add Qwen Moe by @kanpuriyanawab in #2163
- Add Mixtral by @kanpuriyanawab in #2196
- Made label data optional for inference and adopted other required changes by @laxmareddyp in #2183
- Fix the layer names by @kanpuriyanawab in #2247
- Add new CSPNet preset and add manual padding. by @sachinprasadhs in #2212
- Update the int8 quant logic in ReversibleEmbedding by @james77777778 in #2250
- Add Moonshine to KerasHub by @harshaljanjani in #2093
- Add Kaggle handle for moonshine presets by @laxmareddyp in #2253
- Update requirements-jax-cuda.txt by @pctablet505 in #2252
- Add Mixtral,Qwen-MoE presets and Update conversion script. by @laxmareddyp in #2248
- fix flash attention test by @divyashreepathihalli in #2263
- Fix JAX bugs for qwen moe & mixtral by @kanpuriyanawab in #2258
- Create pull_request_template.md by @sachinprasadhs in #2262
- Update preset versions for sharded models by @laxmareddyp in #2264
- Add AudioToText and AudioToTextPreprocessor class stubs to enable auto class functionality by @harshaljanjani in #2265
- register moonshine presets by @sachinprasadhs in #2267
- Version bump 0.21.0.dev1 by @laxmareddyp in #2273
- Version bump to 0.21.0 by @laxmareddyp in #2275
New Contributors
- @JyotinderSingh made their first contribution in #2210
- @pctablet505 made their first contribution in #2132
- @b8zhong made their first contribution in #2215
Full Changelog: v0.20.0...v0.21.0
v0.20.0
What's Changed
- Install TF Text on non-Windows only by @abheesht17 in #2115
- Add SigLIP by @james77777778 in #2113
- Fix PaliGemmaVitEncoder output shape by @abheesht17 in #2123
- Cspnet architecture. by @sachinprasadhs in #2091
- Update our master version to be a dev release by @mattdangerw in #2131
- Add top 3 HF Presets for Mobilenet by @pkgoogle in #2105
- Add SigLIP2 by @james77777778 in #2127
- update Gemma attention for TPU by @divyashreepathihalli in #2130
- Update dev version rule for nightly by @SamanehSaadat in #2139
- Fix dtype bug in image converter by @abheesht17 in #2147
- Add instruction in .md for manual pre-commit run by @abheesht17 in #2148
- Add Qwen 2.5 by @shivance in #2088
- Updated CONTRIBUTING.md (Fixes issue #2153) by @villurignanesh in #2156
- Update kaggle preset paths for SigLip model by @laxmareddyp in #2164
- Routine Kaggle HF sync by @divyashreepathihalli in #2165
- Enable LoRA target names arg by @divyashreepathihalli in #2166
- Update retinanet_presets.py by @sineeli in #2157
- Add Gemma3 by @abheesht17 in #2152
- Add precommit to the common requirements file by @mattdangerw in #2173
- Add back a format script for compat by @mattdangerw in #2174
- Add a TextToImagePreprocessor base class by @mattdangerw in #2181
- Bump the python group with 2 updates by @dependabot in #2185
- implement of roformerv2 by @pass-lin in #2145
- Move sliding window attn before FA block for Gemma by @abheesht17 in #2187
- Update gating condition to include check for supporting GPUs for flash attention by @divyashreepathihalli in #2184
- Revert "Fix dtype bug in image converter (#2147)" by @mattdangerw in #2180
- Add vision for Gemma3 by @abheesht17 in #2170
- Do not export Qwen for release by @abheesht17 in #2198
- Version bump to 0.20.0.dev1 by @abheesht17 in #2200
- Version bump to 0.20.0 by @abheesht17 in #2202
New Contributors
- @villurignanesh made their first contribution in #2156
Full Changelog: v0.19.3...v0.20.0
v0.20.0.dev1
What's Changed
- Version bump to 0.20.0.dev1 by @abheesht17 in #2200
Full Changelog: v0.20.0.dev0...v0.20.0.dev1