Update dependency transformers to v4.48.0 #130

renovate · 2024-06-05T03:45:14Z

This PR contains the following updates:

Package	Change	Age	Adoption	Passing	Confidence
transformers	`==4.38.0` -> `==4.48.0`

Release Notes

huggingface/transformers (transformers)

`v4.48.0`: ModernBERT, Aria, TimmWrapper, ColPali, Falcon3, Bamba, VitPose, DinoV2 w/ Registers, Emu3, Cohere v2, TextNet, DiffLlama, PixtralLarge, Moonshine

Compare Source

New models

ModernBERT

The ModernBERT model was proposed in Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference by Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, Said Taghadouini, Alexis Gallagher, Raja Biswas, Faisal Ladhak, Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard and Iacopo Poli.

It is a refresh of the traditional encoder architecture, as used in previous models such as BERT and RoBERTa.

It builds on BERT and implements many modern architectural improvements which have been developed since its original release, such as:

Rotary Positional Embeddings to support sequences of up to 8192 tokens.
Unpadding to ensure no compute is wasted on padding tokens, speeding up processing time for batches with mixed-length sequences.
GeGLU Replacing the original MLP layers with GeGLU layers, shown to improve performance.
Alternating Attention where most attention layers employ a sliding window of 128 tokens, with Global Attention only used every 3 layers.
Flash Attention to speed up processing.
A model design that follows the recent paper The Case for Co-Designing Model Architectures with Hardware, ensuring maximum efficiency across inference GPUs.
Modern training data scales (2 trillion tokens) and mixtures (including code and math data).
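The alternating attention pattern in the list above can be sketched in plain Python. This is only an illustration, not the ModernBERT implementation: the helper names are hypothetical, and both the choice of which layers are global (0, 3, 6, ...) and the reading of "128 tokens" as a window centred on the query token are assumptions.

```python
def uses_global_attention(layer_idx, global_every=3):
    """True for layers that attend globally (assumed here: layers 0, 3, 6, ...)."""
    return layer_idx % global_every == 0


def attention_mask(seq_len, layer_idx, window=128):
    """Build a mask where mask[i][j] is True when token i may attend to token j."""
    if uses_global_attention(layer_idx):
        return [[True] * seq_len for _ in range(seq_len)]
    half = window // 2  # assumption: the window is centred on the query token
    return [[abs(i - j) <= half for j in range(seq_len)] for i in range(seq_len)]


mask_local = attention_mask(seq_len=256, layer_idx=1)   # sliding-window layer
mask_global = attention_mask(seq_len=256, layer_idx=3)  # global layer
print(sum(mask_local[128]), sum(mask_global[128]))
```

In a local layer each token sees only its neighbourhood, so the cost of those layers grows linearly with sequence length rather than quadratically.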

Add ModernBERT to Transformers by @warner-benjamin in #35158

Aria

The Aria model was proposed in Aria: An Open Multimodal Native Mixture-of-Experts Model by Li et al. from the Rhymes.AI team.

Aria is an open multimodal-native model with best-in-class performance across a wide range of multimodal, language, and coding tasks. It has a Mixture-of-Experts architecture, with respectively 3.9B and 3.5B activated parameters per visual token and text token.

Add Aria by @aymeric-roucher in #34157

TimmWrapper

We add a TimmWrapper set of classes so that timm models can be loaded into the library as transformers models.

Here's a general usage example:

import torch
from urllib.request import urlopen
from PIL import Image
from transformers import AutoConfig, AutoModelForImageClassification, AutoImageProcessor

checkpoint = "timm/resnet50.a1_in1k"
img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))

image_processor = AutoImageProcessor.from_pretrained(checkpoint)
inputs = image_processor(img, return_tensors="pt")
model = AutoModelForImageClassification.from_pretrained(checkpoint)

with torch.no_grad():
    logits = model(**inputs).logits

top5_probabilities, top5_class_indices = torch.topk(logits.softmax(dim=1) * 100, k=5)

Thanks to this, timm models now have access to pipelines, as well as Trainer, accelerate device maps, quantization, etc:

import torch
from urllib.request import urlopen
from PIL import Image

from transformers import pipeline

img = Image.open(urlopen(
    'https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/beignets-task-guide.png'
))
pipe = pipeline("image-classification", model="timm/resnet18.a1_in1k")
print(pipe(img))

Add TimmWrapper by @qubvel and @amyeroberts in #34564

Pixtral-Large

Pixtral modeling and checkpoint conversion code has been updated to support the new Pixtral-Large model.

Update Pixtral conversion script to support large format! by @arthurzucker in #34801

ColPali

The ColPali model was proposed in ColPali: Efficient Document Retrieval with Vision Language Models by Manuel Faysse*, Hugues Sibille*, Tony Wu*, Bilel Omrani, Gautier Viaud, Céline Hudelot, Pierre Colombo (* denotes equal contribution). Work led by ILLUIN Technology.

In the proposed ColPali approach, the authors leverage VLMs to construct efficient multi-vector embeddings directly from document images (“screenshots”) for document retrieval. They train the model to maximize the similarity between these document embeddings and the corresponding query embeddings, using the late interaction method introduced in ColBERT.
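As a rough illustration of late interaction, the sketch below scores a query against two pages using the ColBERT-style MaxSim rule: for each query vector take the best dot product with any document vector, then sum. The vectors and function names are made up for the example and are not ColPali's API.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def late_interaction_score(query_vecs, doc_vecs):
    """Sum, over query vectors, of the best dot product with any document vector."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Hypothetical multi-vector embeddings: one vector per query token / image patch.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]],
    "page_b": [[0.1, 0.1], [0.3, 0.2]],
}

scores = {name: late_interaction_score(query, vecs) for name, vecs in pages.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

Because document embeddings do not depend on the query, they can be computed once and stored, and only this cheap scoring step runs at query time.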

Add ColPali to 🤗 transformers by @tonywu71 and @yonigozlan in #33736

Falcon3

Falcon3 represents a natural evolution from previous releases, emphasizing expanding the models’ science, math, and code capabilities. This iteration includes five base models: Falcon3-1B-Base, Falcon3-3B-Base, Falcon3-Mamba-7B-Base, Falcon3-7B-Base, and Falcon3-10B-Base. In developing these models, the authors incorporated several key innovations aimed at improving the models’ performances while reducing training costs:

One pre-training: They conducted a single large-scale pretraining run on the 7B model, using 2048 H100 GPU chips, leveraging 14 trillion tokens featuring web, code, STEM, and curated high-quality and multilingual data.
Depth up-scaling for improved reasoning: Building on recent studies on the effects of model depth, they upscaled the 7B model to a 10B-parameter model by duplicating the redundant layers and continuing pre-training with 2TT of high-quality data. This yielded Falcon3-10B-Base, which achieves state-of-the-art zero-shot and few-shot performance for models under 13B parameters.
Knowledge distillation for better tiny models: To provide compact and efficient alternatives, they developed Falcon3-1B-Base and Falcon3-3B-Base by leveraging pruning and knowledge distillation techniques, using less than 100GT of curated high-quality data, thereby redefining pre-training efficiency.

Add Falcon3 documentation by @mokeddembillel in #35307

Bamba

Bamba-9B is a decoder-only language model based on the Mamba-2 architecture and is designed to handle a wide range of text generation tasks. It is trained from scratch using a two-stage training approach. In the first stage, the model is trained on 2 trillion tokens from the Dolma v1.7 dataset. In the second stage, it undergoes additional training on 200 billion tokens, leveraging a carefully curated blend of high-quality data to further refine its performance and enhance output quality.

Check out all Bamba-9B model checkpoints here.

Add the Bamba Model by @fabianlim in #34982

VitPose

ViTPose is a state-of-the-art vision transformer-based model for human pose estimation, introduced by Yufei Xu, Jing Zhang, Qiming Zhang, and Dacheng Tao in "ViTPose: Simple Vision Transformer Baselines for Human Pose Estimation".

The model leverages the capabilities of vision transformers to accurately predict 2D human keypoints. Adopting a top-down approach, ViTPose estimates keypoint locations for each detected person, allowing it to be easily used with any object detection model.

Add VitPose by @SangbumChoi and @NielsRogge in #30530

DINOv2 with registers

The DINOv2 with Registers model was proposed in Vision Transformers Need Registers by Timothée Darcet, Maxime Oquab, Julien Mairal, Piotr Bojanowski.

The Vision Transformer (ViT) is a transformer encoder model (BERT-like) originally introduced to do supervised image classification on ImageNet.

Next, people figured out ways to make ViT work really well on self-supervised image feature extraction (i.e. learning meaningful features, also called embeddings) on images without requiring any labels. Some example papers here include DINOv2 and MAE.

The authors of DINOv2 noticed that ViTs have artifacts in attention maps. It’s due to the model using some image patches as “registers”. The authors propose a fix: just add some new tokens (called “register” tokens), which you only use during pre-training (and throw away afterwards). This results in:

no artifacts
interpretable attention maps
and improved performances.

Add DINOv2 with registers by @NielsRogge in #35348

Emu3

The Emu3 model was proposed in Emu3: Next-Token Prediction is All You Need by Xinlong Wang, Xiaosong Zhang, Zhengxiong Luo, Quan Sun, Yufeng Cui, Jinsheng Wang, Fan Zhang, Yueze Wang, Zhen Li, Qiying Yu, Yingli Zhao, Yulong Ao, Xuebin Min, Tao Li, Boya Wu, Bo Zhao, Bowen Zhang, Liangdong Wang, Guang Liu, Zheqi He, Xi Yang, Jingjing Liu, Yonghua Lin, Tiejun Huang, Zhongyuan Wang.

Emu3 sets a new standard in multimodal AI by using next-token prediction to handle images, text, and videos. It simplifies multimodal modeling by tokenizing all data into a unified format and training a single transformer. Visual data is tokenized using vector quantization methods based on VQ-VAE model. Discretized visual tokens are later fused with text token ids for image and text generation.

Emu3 outperforms leading models like SDXL and LLaVA-1.6 in both generation and perception tasks, without relying on diffusion or compositional methods.
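The tokenization idea can be sketched with a toy vector quantizer: each visual feature is replaced by the index of its nearest codebook entry, and the resulting ids share one sequence with text ids. The codebook, the vocabulary size, and the choice to place visual ids after the text vocabulary by adding an offset are all assumptions for illustration, not details taken from Emu3.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared Euclidean distance)."""
    def sq_dist(entry):
        return sum((a - b) ** 2 for a, b in zip(vector, entry))
    return min(range(len(codebook)), key=lambda i: sq_dist(codebook[i]))


TEXT_VOCAB_SIZE = 10  # hypothetical: text ids occupy 0..9
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical codebook

patch_features = [[0.9, 0.1], [0.1, 0.8], [0.95, 1.05]]
# Assumption: visual ids are shifted past the text vocabulary so both share one id space.
visual_ids = [TEXT_VOCAB_SIZE + nearest_code(p, codebook) for p in patch_features]

text_ids = [3, 7, 1]
sequence = text_ids + visual_ids  # one stream for a single next-token predictor
print(sequence)
```

Once images are discrete ids, a single transformer can be trained on the mixed stream with the ordinary next-token objective.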

Add Emu3 by @zucchini-nlp in #33770

Cohere2

A new Cohere update was added through a new "Cohere2" set of classes.

Add Cohere2 model by @alexrs-cohere in #35224

TextNet

TextNet is a lightweight and efficient architecture designed specifically for text detection, offering superior performance compared to traditional models like MobileNetV3. With variants TextNet-T, TextNet-S, and TextNet-B (6.8M, 8.0M, and 8.9M parameters respectively), it achieves an excellent balance between accuracy and inference speed.

Add TextNet by @jadechoghari in #34979

DiffLlama

DiffLlama combines the Llama architecture with the attention mechanism of the Differential Transformer.

Add DiffLllama by @weak-kajuma in #34083

PixtralLarge

The conversion script needed a few updates, while the modeling code was barely changed!

[PixtralLarge] Update Pixtral conversion script to support large format! (#34801)

Moonshine

Moonshine is an autoregressive speech recognition encoder-decoder model that improves upon Whisper's architecture. Namely, it replaces absolute position embeddings with Rotary Position Embeddings (RoPE). This allows Moonshine to handle audio inputs of any length, unlike Whisper, which is restricted to fixed 30-second windows. It was introduced by Nat Jeffries, Evan King, Manjunath Kudlur, Guy Nicholson, James Wang, and Pete Warden in Moonshine: Speech Recognition for Live Transcription and Voice Commands.
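A small sketch may help show why rotary embeddings are not tied to a fixed input length. Each query and key pair of features is rotated by an angle proportional to its position, so the attention score depends only on the distance between the two positions. The single frequency `theta` and the function names are simplifications chosen for this example, not Moonshine's code.

```python
import math


def rotate(pair, position, theta=0.1):
    """Rotate a 2-D feature pair by an angle proportional to its position."""
    x, y = pair
    angle = position * theta
    c, s = math.cos(angle), math.sin(angle)
    return (x * c - y * s, x * s + y * c)


def score(q, k, q_pos, k_pos):
    """Dot product of the rotated query and key."""
    qr = rotate(q, q_pos)
    kr = rotate(k, k_pos)
    return qr[0] * kr[0] + qr[1] * kr[1]


q, k = (1.0, 2.0), (0.5, -1.0)
near = score(q, k, q_pos=5, k_pos=2)        # offset of 3 near the start
far = score(q, k, q_pos=1005, k_pos=1002)   # same offset, much later in the input
print(abs(near - far) < 1e-9)
```

The two scores agree up to floating-point error, which is the property that lets the same weights be applied at positions never seen as absolute indices.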

Add Moonshine by @eustlb in #34784

Quantization methods

VPTQ Quantization

From the VPTQ contributors:

VPTQ is a novel Post-Training Quantization method that leverages Vector Quantization to achieve high accuracy on LLMs at an extremely low bit-width (<2-bit). VPTQ can compress 70B, even the 405B model, to 1-2 bits without retraining and maintain high accuracy. More details here: https://github.com/microsoft/vptq
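The sub-2-bit figure follows from simple arithmetic once a single codebook index stands in for a whole group of weights. The sketch below works that out; the vector lengths and codebook sizes are hypothetical settings picked for the example, and the estimate ignores the storage of the codebooks themselves and any additional components a real method may use.

```python
import math


def bits_per_weight(vector_length, codebook_size):
    """Index bits spent per weight when each group of weights stores one codebook index."""
    return math.log2(codebook_size) / vector_length


def index_storage_gb(num_weights, vector_length, codebook_size):
    """Approximate size of the stored indices in gigabytes (10**9 bytes)."""
    return num_weights * bits_per_weight(vector_length, codebook_size) / 8 / 1e9


# Hypothetical settings: groups of 8 weights, 4096-entry codebook (12-bit index).
print(bits_per_weight(8, 4096))
print(index_storage_gb(70e9, 8, 4096))
```

With these numbers a group of eight weights costs twelve bits, i.e. 1.5 bits per weight, compared with 16 bits per weight for half-precision storage.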

FEAT : Adding VPTQ quantization method to HFQuantizer by @wejoncy in #34770

HIGGS Quantization

From the contributors:

HIGGS is a new 0-shot quantization algorithm that combines Hadamard preprocessing with MSE-Optimal quantization grids to achieve lower quantization error and SOTA performance. You can find more information in the paper.

Runtime support for HIGGS is implemented through FLUTE and its library.

This PR adds support for HIGGS+FLUTE into transformers allowing for low-error 0-shot quantization and fast LLM inference.
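To give a feel for the Hadamard preprocessing step only, the sketch below applies an orthonormal Walsh-Hadamard transform to a vector with one outlier. It is not the HIGGS algorithm: the quantization grids are not shown, and the function name and data are invented for the example.

```python
import math


def hadamard_transform(values):
    """Orthonormal fast Walsh-Hadamard transform; the length must be a power of two."""
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a non-zero power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


weights = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # one large outlier
rotated = hadamard_transform(weights)
restored = hadamard_transform(rotated)  # the orthonormal transform is its own inverse

print(max(abs(v) for v in weights), max(abs(v) for v in rotated))
```

The outlier's energy is spread evenly across all entries, which shrinks the range a quantization grid has to cover, and applying the transform a second time recovers the original values.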

HIGGS Quantization Support by @BlackSamorez in #34997

Cleanup

We merged a cleanup for vision language models to make sure all models are standardized.

VLMs: major clean up 🧼 (#34502)

Breaking changes

Conversion scripts

Many models in Transformers include scripts to convert the original model checkpoints into a Transformers-compatible format. These scripts can be found in the repo using the glob pattern models/**/convert_*.py. They were a recurring source of vulnerability reports and CVEs because many models were originally released using insecure formats like older PyTorch .bin weights or pickle files. The conversion scripts had to open these formats, and this meant that they were vulnerable to maliciously crafted inputs.

In practice, we do not see this as a serious vulnerability. The conversion scripts are never imported or called by the rest of the library; each script is standalone, and so the only way to exploit the vulnerability is to create a malicious checkpoint, induce a user to download it, and then also induce them to manually call a specific conversion script on it.

However, even if there is little practical risk of an exploit, we are aware that open vulnerability reports create a compliance problem for users, and so beginning with this release we will be excluding these conversion scripts from release branches and wheels. They will remain accessible to developers on the main branch.
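For anyone who wants to check whether a given source tree or installed copy still carries these scripts, the glob pattern quoted above can be applied with the standard library. This is a minimal sketch: the helper name is hypothetical, and the demo uses a throwaway directory with made-up file names rather than a real checkout.

```python
import tempfile
from pathlib import Path


def find_conversion_scripts(root):
    """Return paths under `root` that match models/**/convert_*.py, sorted."""
    return sorted(Path(root).glob("models/**/convert_*.py"))


# Demo on a temporary tree; the file names below are invented for illustration.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp, "models", "bert")
    model_dir.mkdir(parents=True)
    (model_dir / "convert_bert_checkpoint.py").touch()
    (model_dir / "modeling_bert.py").touch()
    found = [p.name for p in find_conversion_scripts(tmp)]

print(found)
```

Pointing `find_conversion_scripts` at the package directory of an installed release should, per the note above, return an empty list from this version onward.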

🚨🚨🚨 Delete conversion scripts when making release wheels by @Rocketknight1 in #35296

Backtracking in Nougat

A regular expression used within the Nougat code has been modified to ensure it does not hang. The method should output the same results but we cannot guarantee it; we recommend upgrading to the latest transformers if you use this model to ensure your code is performance-optimized.
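The kind of problem being fixed can be illustrated with a generic pair of patterns. These are not the expressions used in Nougat; they are hypothetical examples of a nested quantifier that can backtrack exponentially on a non-matching input, and an equivalent form that avoids the ambiguity.

```python
import re

# Hypothetical patterns, NOT the ones from the Nougat code.
# Nested quantifiers: a run of digits can be split between iterations in many ways,
# so a failing input such as "1111...1x" forces the engine to try them all.
NESTED = re.compile(r"^(\d+\s*)+$")

# Same language, but each digit run is delimited by mandatory whitespace,
# so there is only one way to split the input.
LINEAR = re.compile(r"^\d+(?:\s+\d+)*\s*$")

samples = ["12 34", "1234", "12 34 ", "12 x", "", " 12"]
results = [(bool(NESTED.match(s)), bool(LINEAR.match(s))) for s in samples]
print(results)
```

Both patterns accept and reject the same short samples; the difference only shows up as running time on long inputs that fail to match, which is why such a change can be behaviour-preserving in practice yet hard to guarantee in general.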

🚨🚨🚨 Limit backtracking in Nougat regexp by @qubvel in #35264

Whisper decoding

This PR finalizes work that aims to enable short-form (< 30 secs) and long-form generation using temperature fallback. It is a significant improvement to the Whisper codebase, but it does result in the following breaking changes:

➡️ Previously:
• Short-form: Returned a ModelOutput or torch.LongTensor, including decoder input IDs and the EOS token ID.
• Long-form: Returned a Dict or torch.LongTensor, excluding decoder input IDs and the EOS token ID.

➡️ From now on:
Short-form and long-form generation are now treated identically, meaning output differentiation based on these modes is no longer applicable.

Decoder input IDs and EOS token IDs are never returned, except in two specific cases: when return_dict_in_generate=True and (return_timestamps=False or force_unique_generate_call=True).

In this case, the output will be a ModelOutput, which is the result of the underlying call to GenerationMixin’s generate. Indeed, return_timestamps=False ensures no seeking occurs; only a single call to generate is made. Therefore, this output includes both decoder input IDs and the EOS token ID.
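The rule above can be written as a small predicate, which may be handy when auditing call sites during the upgrade. The function is a hypothetical helper that only restates the condition from these notes; it is not part of the transformers API.

```python
def returns_decoder_ids_and_eos(return_dict_in_generate,
                                return_timestamps,
                                force_unique_generate_call):
    """True when the output still includes decoder input IDs and the EOS token ID."""
    return return_dict_in_generate and (
        not return_timestamps or force_unique_generate_call
    )


# A few combinations of the three generation arguments named in the notes.
print(returns_decoder_ids_and_eos(True, False, False))   # dict output, no timestamps
print(returns_decoder_ids_and_eos(True, True, True))     # dict output, forced single call
print(returns_decoder_ids_and_eos(True, True, False))    # dict output, timestamps only
print(returns_decoder_ids_and_eos(False, False, False))  # tensor output
```

Code that previously sliced off decoder input IDs or the EOS token from short-form outputs should only keep doing so when this condition holds.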

[Whisper] 🚨 Fix whisper decoding 🚨 by @eustlb in #34135

Attention refactor

In order to have a cleaner, isolated, future-proof code for the attention layers, they have been refactored so as to keep the model attention code within their files; but attention definitions relating to SDPA, Flash Attention, and other types of attention have been moved to a common file.
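The shape of such a refactor can be sketched as a shared registry of attention functions that model code selects from by name. Everything here is illustrative: the registry name, the toy layer, and the pure-Python attention are assumptions made for the example and do not reproduce the library's actual interfaces.

```python
import math


def eager_attention(query, keys, values):
    """Plain softmax attention for a single query vector."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]


# Shared "common file" (hypothetical name): other kernels would be registered here too.
ATTENTION_FUNCTIONS = {"eager": eager_attention}


class ToyAttentionLayer:
    """Model-side code: picks an implementation by name and ignores its internals."""

    def __init__(self, implementation="eager"):
        self.attend = ATTENTION_FUNCTIONS[implementation]

    def forward(self, query, keys, values):
        return self.attend(query, keys, values)


layer = ToyAttentionLayer()
output = layer.forward([1.0, 0.0], [[1.0, 0.0], [1.0, 0.0]], [[2.0, 0.0], [4.0, 0.0]])
print(output)
```

With this split, adding a new attention backend means registering one function in the shared place instead of editing every model file.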

🚨All attention refactor🚨 by @ArthurZucker in #35235

Bugfixes and improvements

[tokenizers] Ensure that add_prefix_space is propagated to backend_tokenizer.pre_tokenizer (#35593)
Setup loss_type in config at model init time (#34616)
[docs] Update Python version in translations by @jla524 in #35096
[docs] top_p, top_k, temperature docstrings by @stevhliu in #35065
Fix private forked repo. CI by @ydshieh in #35114
Add feature dim attributes to BitLinear for easier PEFT integration by @agostinv in #34946
Update I-JEPA checkpoints path by @qubvel in #35120
Fix GA loss bugs and add unit test by @techkang in #35121
[I-JEPA] Update docs by @NielsRogge in #35148
Corrected typo in agent system prompts by @Uvi-12 in #35143
Option to set 'non_blocking' for to(device) in BatchEncoding and BatchFeature by @daniel-bogdoll in #34883
Fix typo in EETQ Tests by @MekkCyber in #35160
Cleanup: continue the init refactor by @LysandreJik in #35167
Super tiny fix logging message by @fzyzcjy in #35132
Fixed typo of 'avilable' in prompts.py by @Uvi-12 in #35145
[CI] Fix bnb quantization tests with accelerate>=1.2.0 by @matthewdouglas in #35172
Fix num_items_in_batch not being an integer by @xspirus in #35115
Assisted decoding multi-gpu by @zucchini-nlp in #35116
Fix file path for shard_num 1 with mllama converter by @strangiato in #35053
Support BatchNorm in Hubert pos_conv_emb as in fairseq by @gallilmaimon in #34389
Remove unnecessary masked_fill in deberta models by @xadupre in #35182
Fix DBRX LayerNorm init method by @hgt312 in #35177
Fixing GGUF support for StableLm by @MekkCyber in #35060
[i18n-ar] Translated file : docs/source/ar/community.md into Arabic by @AhmedAlmaghz in #33027
Multiple typo fixes in NLP, Audio docs by @henryhmko in #35181
Only import torch.distributed if it is available by @GaetanLepage in #35133
[i18n-] Translating Benchmarks.md to Chinese by @asdkfjsd in #35137
[docs] Fix FlashAttention link by @stevhliu in #35171
Update data collator docstrings to accurately reference Nvidia tensor core compute capability version by @johngrahamreynolds in #35188
[i18n-] Translating agents.md to Chinese by @HMJ0628 in #35139
BLIP: enable device map by @zucchini-nlp in #34850
🧹 Remove deprecated RotaryEmbedding parts in the Attention layers by @Cyrilvallez in #34858
[PEFT] Better Trainer error when prompt learning with loading best model at the end by @BenjaminBossan in #35087
Cleanup: continue the init refactor by @LysandreJik in #35170
Fix CI by @Cyrilvallez in #35208
Fix seamless TTS generate by @ylacombe in #34968
docs: clarify initializer_range parameter description in Idefics3VisionConfig by @h3110Fr13nd in #35215
Fixed typo of 'indentifier' in audio_utils.py by @Uvi-12 in #35226
Fix type hints for apply_chat_template by @Rocketknight1 in #35216
Support Python 3.10+ Union style in chat template type hints parsing by @RezaRahemtola in #35103
Refactoring AssistedCandidateGenerator for Improved Modularity and Reusability by @keyboardAnt and @jmamou in #35009
Change back to Thread for SF conversion by @ydshieh in #35236
[Init refactor] Modular changes by @LysandreJik in #35240
Fix typo in chat template example by @EricWinsorDSIT in #35250
Run model as compressed/uncompressed mode by @horheynm in #34719
skip Fuyu from test_generate by @nhamanasu in #35246
[tests] fix "Tester object has no attribute '_testMethodName'" by @faaany in #34910
Use rsfE with pytest by @ydshieh in #35119
Update AMD docker image (rocm 6.1) by @ivarflakstad in #35259
Fixed typos in Audio Classification Documentation by @Uvi-12 in #35263
Translating agents_advanced.md to Chinese by @HMJ0628 in #35231
Fix FSDP no longer working by @muellerzr in #35212
don't use no_sync when deepspeed doesn't support it for certain zero stages by @winglian in #35157
[i18n-Chinese] Translating perf_train_cpu.md to Chinese by @asdkfjsd in #35242
Fall back to slow image processor in ImageProcessingAuto when no fast processor available by @yonigozlan in #34785
Aggeregate test summary files in CircleCI workflow runs by @ydshieh in #34989
Blip: fix offloading and MP tests by @zucchini-nlp in #35239
Fix : model used to test ggml conversion of Falcon-7b is incorrect by @MekkCyber in #35083
Temporarily disable amd push ci by @ivarflakstad in #35293
Delete redundancy for loop checks. by @zhanluxianshen in #35288
[Whisper] patch float type on mps by @eustlb in #35295
Fix typos in Translated Audio Classification Docs by @jla524 in #35287
Translating "translate perf_infer_gpu_multi.md" to Chinese by @HMJ0628 in #35271
Fix wrongs in quicktour[zh] by @zhanluxianshen in #35272
Improved documentation of Automatic speech recognition by @Uvi-12 in #35268
fix modular order by @ArthurZucker in #35297
Add sdpa for Beit by @OmarManzoor in #34941
Support for SDPA for SAM models by @MagnusS0 in #34110
remove benchmark job in push-important-models.yml by @ydshieh in #35292
Fix typos in translated quicktour docs by @jla524 in #35302
Fix image preview in multi-GPU inference docs by @jla524 in #35303
Fix remove unused parameter in docs by @zzzzzsa in #35306
Add Cohere2 docs details by @alexrs-cohere in #35294
Fixed typo in audio_classification.md by @Uvi-12 in #35305
[docs] Improve register_pipeline by @stevhliu in #35300
Fix loading with only state dict and low_cpu_mem_usage = True by @SunMarc in #35217
[tests] make cuda-only tests device-agnostic by @faaany in #35222
Trigger GitHub CI with a comment on PR by @ydshieh in #35211
change bnb tests by @jiqing-feng in #34713
[Whisper] fix docstrings typo by @eustlb in #35319
feat: add benchmarks_entrypoint.py by @McPatate in #34495
Fix documentation for ColPali by @tonywu71 in #35321
Update comment CI bot by @ydshieh in #35323
PaliGemma: Make sure to add to suffix if is present in text by @probicheaux in #35201
Fix some fa2 tests by @ArthurZucker in #35340
Modernbert Release Fixes by @warner-benjamin in #35344
[docs] Add link to ModernBERT Text Classification GLUE finetuning script by @tomaarsen in #35347
fix onnx export of speech foundation models by @nikosanto13 in #34224
[Mamba2] Fix caching, slow path, and multi-gpu by @vasqu in #35154
Reduce CircleCI usage by @ydshieh in #35355
Implement AsyncTextIteratorStreamer for asynchronous streaming by @CISC in #34931
Cleaner attention interfaces by @Cyrilvallez in #35342
Add Tensor Parallel support for Qwen2VL by @jla524 in #35050
fix zoedepth initialization error under deepspeed zero3 by @Tavish9 in #35011
Aurevoir PyTorch 1 by @ydshieh in #35358
bugfix: torch.export failure caused by _make_causal_mask by @jiwoong-choi in #35291
update codecarbon by @nhamanasu in #35243
Update test fetcher when we want to test all by @ArthurZucker in #35364
Use weights_only=True with torch.load for transfo_xl by @ydshieh in #35241
Make test_generate_with_static_cache even less flaky by @ydshieh in #34995
Improve modular transformers documentation by @joelpaulkoch in #35322
Improved Documentation Of Audio Classification by @Uvi-12 in #35368
[docs] Follow up register_pipeline by @stevhliu in #35310
owlvit/2 dynamic input resolution by @bastrob in #34764
Fix new FA2 if is_causal is passed explicitly by @Cyrilvallez in #35390
bitsandbytes: simplify 8bit dequantization by @matthewdouglas in #35068
make LlamaModel._update_causal_mask torch compilable by @winglian in #35187
Patch GPTNeoX to use adequate FA2 if position_ids is provided by @taha-yassine in #35318
uniformize kwargs for SAM by @tibor-reiss in #34578
Deprecate _is_quantized_training_enabled by @MekkCyber in #34991
Scale loss before backward by @qgallouedec in #35207
Fix typing in docstring for PaliGemmaProcessor by @alvarobartt in #35278
Fix : VPTQ test by @MekkCyber in #35394
add bnb support for Ascend NPU by @statelesshz in #31512
bugfix Idefics3 processor - handle gracefully cases with text and no images by @mfarre in #35363
Adding logger.info about update_torch_dtype in some quantizers by @MekkCyber in #35046
Add compile test for fast image processor by @yonigozlan in #35184
Disable .github/workflows/self-comment-ci.yml for now by @ydshieh in #35366
enable non-cuda awq model support without modify version by @jiqing-feng in #35334
[GPTQ, CompressedTensors] Fix unsafe imports and metada check by @vasqu in #34815
Drop inplace operation for loss computation with gradient accumulation by @qgallouedec in #35416
Fix: Rename keyword argument in_channels to num_channels by @ningyuv in #35289
CLIP conversion script - Change fairseq to OpenAI by @gau-nernst in #35384
Fix f-string to show ACCELERATE_MIN_VERSION on error by @KSafran in #35189
Fix model_accepts_loss_kwargs for timm model by @qubvel in #35257
Update perf_infer_gpu_one.md: fix a typo by @martin0258 in #35441
Add compute_loss_func to Seq2SeqTrainer by @d223302 in #35136
Update docs for sdpa_kernel by @jla524 in #35410
[i18n-ar] Translated file: docs/source/ar/tasks/question_answering.md into Arabic by @AhmedAlmaghz in #35196
[i18n-ar] Translated file: docs/source/ar/tasks/summarization.md into Arabic by @AhmedAlmaghz in #35195
Update translated docs for sdpa_kernel by @jla524 in #35461
Reintroduce Python 3.9 support for ModernBERT by @tomaarsen in #35458
Fix new BNB test failures by @matthewdouglas in #35345
Fix docs typos. by @zhanluxianshen in #35465
Fix paligemma warning message by @hiyouga in #35486

Significant community contributions

The following contributors have made significant changes to the library over the last release:

@ydshieh
- Fix private forked repo. CI (#35114)
- Change back to Thread for SF conversion (#35236)
- Use rsfE with pytest (#35119)
- Aggeregate test summary files in CircleCI workflow runs (#34989)
- remove benchmark job in push-important-models.yml (#35292)
- Trigger GitHub CI with a comment on PR (#35211)
- Update comment CI bot (#35323)
- Reduce CircleCI usage (#35355)
- Aurevoir PyTorch 1 (#35358)
- Use weights_only=True with torch.load for transfo_xl (#35241)
- Make test_generate_with_static_cache even less flaky (#34995)
- Disable .github/workflows/self-comment-ci.yml for now (#35366)
@aymeric-roucher
- Add Aria (#34157)
@NielsRogge
- [I-JEPA] Update docs (#35148)
- Add DINOv2 with registers (#35348)
@HMJ0628
- [i18n-] Translating agents.md to Chinese (#35139)
- Translating agents_advanced.md to Chinese (#35231)
- Translating "translate perf_infer_gpu_multi.md" to Chinese (#35271)
@alexrs-cohere
- Add Cohere2 model (#35224)
- Add Cohere2 docs details (#35294)
@ArthurZucker
- fix modular order (#35297)
- 🚨All attention refactor🚨 (#35235)
- Fix some fa2 tests (#35340)
- Update test fetcher when we want to test all (#35364)
@tonywu71
- Add ColPali to 🤗 transformers (#33736)
- Fix documentation for ColPali (#35321)
@OmarManzoor
- Add sdpa for Beit (#34941)
@fabianlim
- Add the Bamba Model (#34982)
@warner-benjamin
- Add ModernBERT to Transformers (#35158)
- Modernbert Release Fixes (#35344)
@wejoncy
- FEAT : Adding VPTQ quantization method to HFQuantizer (#34770)
@bastrob
- owlvit/2 dynamic input resolution (#34764)
@BlackSamorez
- HIGGS Quantization Support (#34997)

`v4.47.1`

Compare Source

Patch release v4.47.1

We waited a little bit to make sure it was stable, thanks @winglian for double checking and everyone for the fixes!

Fix GA loss bugs and add unit test (#35121)
Contributed by @techkang and @ArthurZucker.
Fix num_items_in_batch not being an integer (#35115)
Contributed by @xspirus.
Fix FSDP no longer working (#35212)
Contributed by @muellerzr.
Don't use no_sync when DeepSpeed doesn't support it for certain ZeRO configurations (#35157)
Contributed by @winglian.
Only import torch.distributed if it is available (#35133)
Contributed by @GaetanLepage.
[Whisper] Patch float type on MPS (#35295)
Contributed by @eustlb. 🔜 we should probably have MPS CIs to avoid repeating this!

`v4.47.0`: PaliGemma-2, I-JEPA, OLMo-2, LayerSkip, Tensor Parallel

Compare Source

New models

PaliGemma-2

PaliGemma 2 and PaliGemma are lightweight open vision-language models (VLM) inspired by PaLI-3, and based on open components like the SigLIP vision model and the Gemma language model. PaliGemma takes both images and text as inputs and can answer questions about images with detail and context, meaning that PaliGemma can perform deeper analysis of images and provide useful insights, such as captioning for images and short videos, object detection, and reading text embedded within images.

PaliGemma 2 is available in 3B, 10B, and 28B parameter sizes, which are based on Gemma 2 2B, 9B, and 27B models, respectively. The original PaliGemma models are available in the 3B size. For more information on Gemma model variants, see the Gemma models list. PaliGemma model variants support different pixel resolutions for image inputs, including 224 x 224, 448 x 448, and 896 x 896 pixels.

I-JEPA

The I-JEPA model was proposed in Image-based Joint-Embedding Predictive Architecture by Mahmoud Assran, Quentin Duval, Ishan Misra, Piotr Bojanowski, Pascal Vincent, Michael Rabbat, Yann LeCun, Nicolas Ballas. I-JEPA is a self-supervised learning method that predicts the representations of one part of an image based on other parts of the same image. This approach focuses on learning semantic features without relying on pre-defined invariances from hand-crafted data transformations, which can bias specific tasks, or on filling in pixel-level details, which often leads to less meaningful representations.

Add I-JEPA by @jmtzt in #33125

OLMo 2

The OLMo2 model is the successor of the OLMo model, which was proposed in OLMo: Accelerating the Science of Language Models.

The architectural changes from the original OLMo model to this model are:

RMSNorm is used instead of standard layer norm.
Norm is applied to attention queries and keys.
Norm is applied after attention/feedforward layers rather than before.
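The first and third changes can be sketched in a few lines. RMSNorm rescales a vector by its root mean square without subtracting the mean, and the block below applies the norm to the sublayer output before the residual addition. This is one plausible reading of the list above with hypothetical function names, not code from the OLMo2 implementation.

```python
import math


def rms_norm(x, weight, eps=1e-6):
    """Scale `x` to unit root-mean-square, then apply a learned per-feature weight."""
    mean_square = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_square + eps)
    return [w * v * inv_rms for v, w in zip(x, weight)]


def post_norm_block(x, sublayer, weight):
    """Residual block that normalizes the sublayer output (norm after, not before)."""
    normed = rms_norm(sublayer(x), weight)
    return [a + b for a, b in zip(x, normed)]


x = [3.0, 4.0]
ones = [1.0, 1.0]
normalized = rms_norm(x, ones)
# Stand-in for attention or feed-forward: a simple doubling function.
block_out = post_norm_block(x, lambda v: [2.0 * a for a in v], ones)
print(normalized, block_out)
```

Unlike standard layer norm, no mean is subtracted and no bias is added, so the direction of the input vector is preserved.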

Commits:

Add OLMo November 2024 by @2015aroras in #34551
Rename OLMo November to OLMo2 by @2015aroras in #34864

Layer-Skip Llama

We add support for Meta's Layer-Skip Llama 3.2 1B model.

The Llama3.2 1B model was continually pretrained with the LayerSkip recipe (early exit loss and layer dropout), as presented in Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding, and is capable of performing self-speculative decoding: decode with earlier layers and verify with remaining layers.
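The draft-then-verify loop can be illustrated with toy stand-ins. Here `draft_next` plays the role of the early-exit layers and `verify_next` the full model; both are invented functions over integer tokens, and the verification is written as a loop for clarity, whereas a real implementation would check all drafted tokens in one forward pass.

```python
def speculative_step(prefix, draft_next, verify_next, num_draft=4):
    """One round: draft tokens cheaply, keep the ones the full model agrees with."""
    drafted = []
    context = list(prefix)
    for _ in range(num_draft):
        token = draft_next(context)
        drafted.append(token)
        context.append(token)

    accepted = list(prefix)
    for token in drafted:
        expected = verify_next(accepted)
        if token != expected:
            accepted.append(expected)  # replace the first wrong draft token and stop
            return accepted
        accepted.append(token)
    return accepted


def verify_next(context):
    """Toy 'full model': always predicts the next digit, modulo 10."""
    return (context[-1] + 1) % 10


def draft_next(context):
    """Toy 'early exit': agrees with the full model except right after a 3."""
    return 0 if context[-1] == 3 else (context[-1] + 1) % 10


result = speculative_step([1], draft_next, verify_next)
print(result)
```

The output always matches what greedy decoding with the full model alone would produce; the saving comes from accepting several drafted tokens per verification round.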

Self-speculation (Layer-Skip Llama) by @ArthurZucker in #34240

Tensor Parallel implementation

This PR uses the torch.distributed.tensor.parallel subpackage to implement Tensor Parallel for Llama (as an example).

The motivation is multi-fold:

to make the modeling code as simple as the single-worker case: all manual TP implementations under if self.config.pretraining_tp > 1 can be removed.
to make tensor parallelism easily accessible to users: a model.tensor_parallel(device_mesh) method was added that allows users to turn a single-process model into a parallel model.
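The core idea behind tensor parallelism can be shown without any distributed machinery. The sketch below splits the output rows of a linear layer's weight across two simulated workers, lets each compute its slice, and concatenates the results. It is a conceptual toy with hypothetical helper names and does not use or mirror torch.distributed.tensor.parallel.

```python
def matvec(weight_rows, x):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight_rows]


def shard_rows(weight_rows, num_workers):
    """Split the output rows into equal contiguous shards, one per worker."""
    if len(weight_rows) % num_workers:
        raise ValueError("rows must divide evenly across workers in this sketch")
    per_worker = len(weight_rows) // num_workers
    return [weight_rows[i * per_worker:(i + 1) * per_worker] for i in range(num_workers)]


weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, -1.0]]
x = [3.0, 4.0]

full = matvec(weight, x)                                  # single-worker result
partials = [matvec(shard, x) for shard in shard_rows(weight, 2)]  # one per worker
gathered = [v for part in partials for v in part]         # concatenate the slices
print(full, gathered)
```

Each worker holds only half of the weight matrix yet the gathered output equals the single-worker result, which is what lets one model be spread across devices.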

This is the first PR of many to simplify and enable Tensor Parallel across models.

Simplify Tensor Parallel implementation with PyTorch TP by @kwen2501 in #34184

Configuration

📅 Schedule: Branch creation - "* 0-4 * * 3" (UTC), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.

This PR was generated by Mend Renovate. View the repository job log.

renovate bot requested review from aurangzaib048 and LorenzoMinto as code owners June 5, 2024 03:45

renovate bot force-pushed the renovate/transformers-4.x branch from d3f070c to 24ba2af Compare July 1, 2024 15:55

renovate bot changed the title ~~Update dependency transformers to v4.41.2~~ Update dependency transformers to v4.42.0 Jul 1, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 24ba2af to 9ef59a3 Compare July 1, 2024 19:30

renovate bot changed the title ~~Update dependency transformers to v4.42.0~~ Update dependency transformers to v4.42.1 Jul 1, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 9ef59a3 to 9abb6f3 Compare July 2, 2024 07:16

renovate bot changed the title ~~Update dependency transformers to v4.42.1~~ Update dependency transformers to v4.42.2 Jul 2, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 9abb6f3 to d173aa7 Compare July 2, 2024 16:10

renovate bot changed the title ~~Update dependency transformers to v4.42.2~~ Update dependency transformers to v4.42.3 Jul 2, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from d173aa7 to 1a54c3d Compare July 15, 2024 18:50

renovate bot changed the title ~~Update dependency transformers to v4.42.3~~ Update dependency transformers to v4.42.4 Jul 15, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 1a54c3d to 0ded9ee Compare July 27, 2024 16:05

renovate bot changed the title ~~Update dependency transformers to v4.42.4~~ Update dependency transformers to v4.43.1 Jul 27, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 0ded9ee to 50a5053 Compare July 28, 2024 16:32

renovate bot changed the title ~~Update dependency transformers to v4.43.1~~ Update dependency transformers to v4.43.2 Jul 28, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 50a5053 to daab913 Compare July 30, 2024 16:30

renovate bot changed the title ~~Update dependency transformers to v4.43.2~~ Update dependency transformers to v4.43.3 Jul 30, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from daab913 to 0023214 Compare August 9, 2024 12:28

renovate bot changed the title ~~Update dependency transformers to v4.43.3~~ Update dependency transformers to v4.43.4 Aug 9, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 0023214 to ccebe27 Compare August 10, 2024 22:22

renovate bot changed the title ~~Update dependency transformers to v4.43.4~~ Update dependency transformers to v4.44.0 Aug 10, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from ccebe27 to ce803f6 Compare August 24, 2024 18:38

renovate bot changed the title ~~Update dependency transformers to v4.44.0~~ Update dependency transformers to v4.44.1 Aug 24, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from ce803f6 to 7ecf72b Compare August 26, 2024 18:54

renovate bot changed the title ~~Update dependency transformers to v4.44.1~~ Update dependency transformers to v4.44.2 Aug 26, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 7ecf72b to f7a3fb2 Compare September 29, 2024 19:28

renovate bot changed the title ~~Update dependency transformers to v4.44.2~~ Update dependency transformers to v4.45.0 Sep 29, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from f7a3fb2 to 07fa8a0 Compare September 30, 2024 18:27

renovate bot changed the title ~~Update dependency transformers to v4.45.0~~ Update dependency transformers to v4.45.1 Sep 30, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 07fa8a0 to f1816da Compare October 11, 2024 18:50

renovate bot changed the title ~~Update dependency transformers to v4.45.1~~ Update dependency transformers to v4.45.2 Oct 11, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from f1816da to 89065f4 Compare October 28, 2024 10:29

renovate bot changed the title ~~Update dependency transformers to v4.45.2~~ Update dependency transformers to v4.46.0 Oct 28, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 89065f4 to ba78e43 Compare October 29, 2024 16:37

renovate bot changed the title ~~Update dependency transformers to v4.46.0~~ Update dependency transformers to v4.45.2 Oct 29, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from ba78e43 to bf7700d Compare November 2, 2024 20:08

renovate bot changed the title ~~Update dependency transformers to v4.45.2~~ Update dependency transformers to v4.46.1 Nov 2, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from bf7700d to ac06e43 Compare November 9, 2024 18:58

renovate bot changed the title ~~Update dependency transformers to v4.46.1~~ Update dependency transformers to v4.46.2 Nov 9, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from ac06e43 to b20f59a Compare November 23, 2024 01:00

renovate bot changed the title ~~Update dependency transformers to v4.46.2~~ Update dependency transformers to v4.46.3 Nov 23, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from b20f59a to 2a428cc Compare December 9, 2024 20:34

renovate bot changed the title ~~Update dependency transformers to v4.46.3~~ Update dependency transformers to v4.47.0 Dec 9, 2024

renovate bot force-pushed the renovate/transformers-4.x branch from 2a428cc to 58d82ea Compare December 24, 2024 18:35

renovate bot changed the title ~~Update dependency transformers to v4.47.0~~ Update dependency transformers to v4.47.1 Dec 24, 2024

Update dependency transformers to v4.48.0

9c900c3

renovate bot force-pushed the renovate/transformers-4.x branch from 58d82ea to 9c900c3 Compare January 19, 2025 08:53

renovate bot changed the title ~~Update dependency transformers to v4.47.1~~ Update dependency transformers to v4.48.0 Jan 19, 2025
