
@red-hat-konflux red-hat-konflux bot commented May 17, 2025

This PR contains the following updates:

Package      Change
accelerate   ==0.28.0 -> ==0.34.2

Warning

Some dependencies could not be looked up. Check the warning logs for more information.
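Once the bump is merged, it can be worth confirming that the environment actually resolved to the pinned version. The following is a minimal sketch using only the standard library; the helper names `parse_pin`, `matches_pin`, and `check_installed` are hypothetical and are not part of accelerate or Renovate.

```python
from __future__ import annotations

from importlib import metadata


def parse_pin(requirement: str) -> tuple[str, str]:
    """Split an exact pin such as 'accelerate==0.34.2' into (name, version)."""
    name, sep, version = requirement.partition("==")
    if not sep or not name.strip() or not version.strip():
        raise ValueError(f"not an exact '==' pin: {requirement!r}")
    return name.strip(), version.strip()


def matches_pin(requirement: str, installed_version: str) -> bool:
    """Return True when the installed version string equals the pinned one."""
    _, pinned = parse_pin(requirement)
    return installed_version == pinned


def check_installed(requirement: str) -> bool:
    """Look up the installed distribution; False if it is missing or mismatched."""
    name, _ = parse_pin(requirement)
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return False
    return matches_pin(requirement, installed)
```

Calling `check_installed("accelerate==0.34.2")` in CI would then fail fast if an older wheel were still cached. The comparison is a plain string match, so it only suits exact `==` pins like the one in this PR.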


Release Notes

huggingface/accelerate (accelerate)

v0.34.2

Compare Source

v0.34.1: Patchfix

Compare Source

Bug fixes

  • Fixes an issue where processed DataLoaders could no longer be pickled in #​3074 thanks to @​byi8220
  • Fixes an issue when using FSDP where default_transformers_cls_names_to_wrap would separate _no_split_modules by characters instead of keeping it as a list of layer names in #​3075
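The second fix above describes a common Python pitfall: treating a string as if it were a list of names splits it into single characters. The sketch below illustrates that class of bug and one defensive way around it. It is not the actual change made in the referenced PR, and `normalize_layer_names` is a hypothetical helper.

```python
from __future__ import annotations


def normalize_layer_names(value: str | list[str] | None) -> list[str]:
    """Return layer class names as a list, whatever form they arrive in."""
    if value is None:
        return []
    if isinstance(value, str):
        # A comma-separated string is split on commas, never per character.
        return [name.strip() for name in value.split(",") if name.strip()]
    return list(value)


# The failure mode: list() on a string yields one entry per character.
broken = list("T5Block")

# The defensive version keeps the layer name intact.
fixed = normalize_layer_names("T5Block")
```

Here `broken` holds seven one-character strings, while `fixed` holds the single name `"T5Block"`.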

Full Changelog: huggingface/accelerate@v0.34.0...v0.34.1

v0.34.0: StatefulDataLoader Support, FP8 Improvements, and PyTorch Updates!

Compare Source

Dependency Changes

  • Updated Safetensors Requirement: The library now requires safetensors version 0.4.3.
  • Added support for NumPy 2.0: The library now fully supports NumPy 2.0.0.

Core

New Script Behavior Changes
  • Process Group Management: PyTorch now requires users to destroy process groups after training. The accelerate library will handle this automatically with accelerator.end_training(), or you can do it manually using PartialState().destroy_process_group().
  • MLU Device Support: Added support for saving and loading RNG states on MLU devices by @​huismiling
  • NPU Support: Corrected backend and distributed settings when using transfer_to_npu, ensuring better performance and compatibility.
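One way to make sure the process group is always torn down, even when training raises, is to wrap the training call in `try`/`finally`. The sketch below relies only on the `end_training()` method named in the notes above. `run_and_clean_up` and `_StubAccelerator` are hypothetical names, and the stub exists solely so the example runs without accelerate installed; in a real script you would pass an `accelerate.Accelerator` instance instead.

```python
from typing import Any, Callable


def run_and_clean_up(accelerator: Any, train_fn: Callable[[Any], Any]) -> Any:
    """Run train_fn(accelerator) and always call end_training() afterwards."""
    try:
        return train_fn(accelerator)
    finally:
        # Per the release notes, end_training() handles process group cleanup.
        accelerator.end_training()


class _StubAccelerator:
    """Stand-in used only to make this sketch runnable without accelerate."""

    def __init__(self) -> None:
        self.ended = False

    def end_training(self) -> None:
        self.ended = True


stub = _StubAccelerator()
result = run_and_clean_up(stub, lambda acc: "done")
```

The `finally` clause means cleanup also runs when `train_fn` raises, which is the case most likely to leave a process group behind.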
DataLoader Enhancements
  • Stateful DataLoader: We are excited to announce that early support has been added for the StatefulDataLoader from torchdata, allowing better handling of data loading states. Enable it by passing use_stateful_dataloader=True to the DataLoaderConfiguration; when you call load_state(), the DataLoader is automatically resumed from its last step, so you no longer have to iterate through batches that were already consumed.
  • Decoupled Data Loader Preparation: The prepare_data_loader() function is now independent of the Accelerator, giving you more flexibility in choosing which API level to use.
  • XLA Compatibility: Added support for skipping initial batches when using XLA.
  • Improved State Management: Bug fixes and enhancements for saving/loading DataLoader states, ensuring smoother training sessions.
  • Epoch Setting: Introduced the set_epoch function for MpDeviceLoaderWrapper.
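To illustrate why a stateful loader removes the need to replay consumed batches, here is a toy, pure-Python iterator that exposes `state_dict()` and `load_state_dict()`. It is a conceptual stand-in only: `ResumableBatches` is a hypothetical class, not torchdata's `StatefulDataLoader`, and it ignores shuffling, workers, and distributed sharding.

```python
from typing import Any, Dict, Iterator, List


class ResumableBatches:
    """Toy batch iterator that can save and restore its position."""

    def __init__(self, data: List[Any], batch_size: int) -> None:
        self.data = data
        self.batch_size = batch_size
        self._next_index = 0

    def __iter__(self) -> Iterator[List[Any]]:
        while self._next_index < len(self.data):
            start = self._next_index
            # Advance before yielding so a checkpoint taken after receiving
            # a batch points at the next unseen batch.
            self._next_index = start + self.batch_size
            yield self.data[start:self._next_index]
        # Epoch finished: start from the beginning next time.
        self._next_index = 0

    def state_dict(self) -> Dict[str, int]:
        return {"next_index": self._next_index}

    def load_state_dict(self, state: Dict[str, int]) -> None:
        self._next_index = state["next_index"]


loader = ResumableBatches(list(range(10)), batch_size=2)
batches = iter(loader)
first, second = next(batches), next(batches)
checkpoint = loader.state_dict()

# A fresh loader resumes directly at the third batch, with no replay.
resumed = ResumableBatches(list(range(10)), batch_size=2)
resumed.load_state_dict(checkpoint)
remaining = list(resumed)
```

Without the saved position, the resumed run would have to iterate past the first two batches before reaching new data.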
FP8 Training Improvements
  • Enhanced FP8 Training: Fully Sharded Data Parallelism (FSDP) and DeepSpeed support now work seamlessly with TransformerEngine FP8 training, including better defaults for the quantized FP8 weights.
  • Integration baseline: We've added a new suite of examples and benchmarks to verify that our TransformerEngine integration works as intended. Each script runs one half with 🤗 Accelerate's integration and the other half with raw TransformerEngine, which gives users an example of what accelerate does under the hood and a sanity check that nothing breaks over time. Find them here
  • Import Fixes: Resolved issues with the import checks for TransformerEngine that caused downstream problems.
  • FP8 Docker Images: We've also added new Docker images for TransformerEngine and accelerate. Use docker pull huggingface/accelerate:gpu-fp8-transformerengine to quickly get an environment going.

torchpippy no more, long live torch.distributed.pipelining

  • With the latest PyTorch release, torchpippy is now fully integrated into torch core, and as a result we are exclusively supporting the PyTorch implementation from now on
  • This shift brings breaking changes to the examples and the API. Namely:
    • Tracing of inputs is done with a shape each GPU will see, rather than the size of the total batch. So for 2 GPUs, one should pass in an input of [1, n, n] rather than [2, n, n] as before.
    • We no longer support Encoder/Decoder models. PyTorch tracing for pipelining no longer supports encoder/decoder models, so the t5 example has been removed.
    • Computer vision model support currently does not work: There are some tracing issues with ResNets that we are actively looking into.
  • If any of these changes is too disruptive, we recommend pinning your accelerate version. If the missing encoder/decoder model support is actively blocking your inference with pippy, please open an issue and let us know. We can look into restoring the old torchpippy support if needed.
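The change in tracing shape can be worked out with a small helper: divide only the batch dimension by the number of GPUs and leave the other dimensions alone. This is a sketch of the arithmetic, not an accelerate or PyTorch API; `per_device_example_shape` is a hypothetical name, and the even-divisibility check is an assumption made for the example.

```python
from typing import Tuple


def per_device_example_shape(global_shape: Tuple[int, ...], num_gpus: int) -> Tuple[int, ...]:
    """Shape of the example input each GPU sees, splitting only dim 0."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    batch, *rest = global_shape
    if batch % num_gpus != 0:
        raise ValueError(f"batch size {batch} is not divisible by {num_gpus} GPUs")
    return (batch // num_gpus, *rest)


# Previously a [2, n, n] input was traced for 2 GPUs; now each GPU sees [1, n, n].
n = 16
traced_shape = per_device_example_shape((2, n, n), num_gpus=2)
```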

Fully Sharded Data Parallelism (FSDP)

  • Environment Flexibility: Environment variables are now fully optional for FSDP, simplifying configuration. You can now create a FullyShardedDataParallelPlugin yourself, with no need for environment patching:
from accelerate import FullyShardedDataParallelPlugin
fsdp_plugin = FullyShardedDataParallelPlugin(...)
  • FSDP RAM efficient loading: Added a utility to enable RAM-efficient model loading (by setting the proper environment variable). This is generally needed when you are not using accelerate launch and need to ensure the environment variables are set up properly for model loading:
from accelerate.utils import enable_fsdp_ram_efficient_loading, disable_fsdp_ram_efficient_loading
enable_fsdp_ram_efficient_loading()
  • Model State Dict Management: Enhanced support for unwrapping model state dicts in FSDP, making it easier to manage distributed models.
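Because the RAM-efficient loading utility works by setting an environment variable, a scoped toggle can be useful when only one model load should be affected. The sketch below is a generic context manager that sets a variable and restores the previous value afterwards. The variable name `FSDP_CPU_RAM_EFFICIENT_LOADING` is an assumption made for illustration and should be checked against the accelerate source; `temporary_env` is a hypothetical helper, not part of accelerate.

```python
import os
from contextlib import contextmanager
from typing import Iterator, Optional

# Assumption: illustrative name only; verify it against accelerate's source.
ASSUMED_FLAG = "FSDP_CPU_RAM_EFFICIENT_LOADING"


@contextmanager
def temporary_env(name: str, value: str) -> Iterator[None]:
    """Set an environment variable for the duration of a with-block."""
    previous: Optional[str] = os.environ.get(name)
    os.environ[name] = value
    try:
        yield
    finally:
        # Restore the prior state exactly, including "was not set at all".
        if previous is None:
            os.environ.pop(name, None)
        else:
            os.environ[name] = previous
```

In practice the library's own `enable_fsdp_ram_efficient_loading()` and `disable_fsdp_ram_efficient_loading()` shown above are the supported route; the context manager only shows the underlying pattern.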

New Examples

Bug Fixes

New Contributors

Full Changelog:

Detailed Full Changelog:

v0.33.0: MUSA backend support and bugfixes

Compare Source

MUSA backend support and bugfixes

Small release this month, focused on added backend support and bug fixes:

What's Changed

New Contributors

Full Changelog: huggingface/accelerate@v0.32.1...v0.33.0

v0.32.1

Compare Source

v0.32.0: Profilers, new hooks, speedups, and more!

Compare Source

Core
  • Utilize shard saving from the huggingface_hub rather than our own implementation (#​2795)
  • Refactor logging to use logger in dispatch_model (#​2855)
  • The Accelerator.step number is now restored when using save_state and load_state (#​2765)
  • A new profiler has been added, allowing users to collect performance metrics during model training and inference, including detailed analysis of execution time and memory consumption. The results can then be viewed in Chrome's tracing tool. Read more about it here (#2883)
  • Reduced the import time of import accelerate and other major core imports by 68%; it should now be only slightly longer than import torch (#2845)
  • Fixed a bug in get_backend and added a clear_device_cache utility (#​2857)
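To check an import-time claim like the one above on your own machine, one rough approach is to time a fresh interpreter importing the module and compare two measurements. The sketch below uses only the standard library; `time_cold_import` and `percent_reduction` are hypothetical helper names, and the measurement includes interpreter startup, so it is best compared against a baseline such as `import torch` measured the same way.

```python
import subprocess
import sys
import time


def time_cold_import(module_name: str) -> float:
    """Seconds taken by a fresh interpreter to start and import module_name."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", f"import {module_name}"], check=True)
    return time.perf_counter() - start


def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from before to after, in percent."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before * 100.0
```

For example, timing `time_cold_import("accelerate")` under both pinned versions and passing the two results to `percent_reduction` gives a figure comparable to the one quoted in the notes. A single run is noisy, so averaging several runs is advisable.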
Distributed Data Parallelism
  • Introduce DDP communication hooks to have more flexibility in how gradients are communicated across workers, overriding the standard allreduce. (#​2841)
  • Make log_line_prefix_template optional in the notebook_launcher (#2888)
FSDP
  • If the output directory doesn't exist when using accelerate merge-weights, one will be automatically created (#​2854)
  • When merging weights, the default is now .safetensors (#​2853)
XPU
  • Migrate to PyTorch's native XPU backend on torch>=2.4 (#2825)
  • Add @require_triton test decorator and enable test_dynamo to work on XPU (#2878)
  • Fixed load_state_dict not working on XPU and refined the XPU safetensors version check (#2879)
XLA
  • Added support for XLA Dynamo backends for both training and inference (#​2892)
Examples
  • Added a new multi-cpu SLURM example using accelerate launch (#​2902)
Full Changelog
New Contributors

Full Changelog: huggingface/accelerate@v0.31.0...v0.32.0

v0.31.0: Better support for sharded state dict with FSDP and Bugfixes

Compare Source

Core
FSDP
Megatron
What's Changed
New Contributors

Full Changelog: huggingface/accelerate@v0.30.1...v0.31.0

v0.30.1: Bugfixes

Compare Source

Patchfix
  • Fix duplicate environment variable check in multi-cpu condition thanks to @​yhna940 in #​2752
  • Fix issue with missing values in the SageMaker config leading to not being able to launch in #​2753
  • Fix CPU OMP num threads setting thanks to @​jiqing-feng in #​2755
  • Fix FSDP checkpoints being unable to resume when using offloading and sharded weights, caused by a CUDA OOM when loading the optimizer and model, in #2762
  • Fix an incorrect conditional check when configuring enable_cpu_affinity thanks to @statelesshz in #2748
  • Fix stacklevel in logging to log the actual user call site (instead of the call site inside the logger wrapper) of log functions thanks to @​luowyang in #​2730
  • Fix support for multiple optimizers when using LOMO thanks to @​younesbelkada in #​2745
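The stacklevel fix in this list refers to a standard-library feature: when a helper wraps a logging call, passing `stacklevel=2` makes the log record point at the helper's caller instead of the helper itself. The sketch below demonstrates the general technique with plain `logging` (Python 3.8+). It is not accelerate's logger code, and `log_info`, `user_code`, and `_Collect` are hypothetical names.

```python
import logging
from typing import List

records: List[logging.LogRecord] = []


class _Collect(logging.Handler):
    """Handler that stores records so the call site can be inspected."""

    def emit(self, record: logging.LogRecord) -> None:
        records.append(record)


logger = logging.getLogger("stacklevel_demo")
logger.setLevel(logging.INFO)
logger.addHandler(_Collect())
logger.propagate = False


def log_info(message: str) -> None:
    """Thin wrapper: stacklevel=2 attributes the record to our caller."""
    logger.info(message, stacklevel=2)


def user_code() -> None:
    log_info("hello from user code")


user_code()
caller_name = records[-1].funcName
```

With the default `stacklevel=1`, `caller_name` would be `log_info`, which is the unhelpful call site the fix was meant to avoid.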

Full Changelog: huggingface/accelerate@v0.30.0...v0.30.1

v0.30.0: Advanced optimizer support, MoE DeepSpeed support, add upcasting for FSDP, and more

Compare Source

Core
Documentation
  • Through collaboration between @fabianlim (lead contributor), @stas00, @pacman100, and @muellerzr, we have a new concept guide for FSDP and DeepSpeed that details how the two interoperate and explains clearly how each of them works. This was a monumental effort by @fabianlim to make everything as accurate as possible for users. I highly recommend visiting this new documentation, available here
  • New distributed inference examples have been added thanks to @​SunMarc in #​2672
  • Fixed some docs for using internal trackers by @​brentyi in #​2650
DeepSpeed
  • Accelerate can now handle MoE models when using deepspeed, thanks to @​pacman100 in #​2662
  • Allow "auto" for gradient clipping in YAML by @​regisss in #​2649
  • Introduce a deepspeed-specific Docker image by @muellerzr in #2707. To use it, pull the gpu-deepspeed tag: docker pull huggingface/accelerate:cuda-deepspeed-nightly
Megatron
Big Modeling
Bug Fixes
Full Changelog

Configuration

📅 Schedule: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.

Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 Ignore: Close this PR and you won't be reminded about this update again.


  • If you want to rebase/retry this PR, check this box

To execute skipped test pipelines write comment /ok-to-test.


Documentation

Find out how to configure dependency updates in MintMaker documentation or see all available configuration options in Renovate documentation.

Signed-off-by: red-hat-konflux <126015336+red-hat-konflux[bot]@users.noreply.github.com>
@coveralls

Pull Request Test Coverage Report for Build 15082634293

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall first build on konflux/mintmaker/konflux-poc/accelerate-0.x at 93.407%

Totals Coverage Status
Change from base Build 15020007478: 93.4%
Covered Lines: 85
Relevant Lines: 91

💛 - Coveralls
