
transformer_engine.pytorch.distributed.checkpoint function only works with TE modules, instead of all Callables #1423

Open
MaciejBalaNV opened this issue Jan 27, 2025 · 0 comments


The activation checkpointing function only works with TE modules, even though the `function` argument is typed as `Callable`, which indicates it should accept an arbitrary function, as PyTorch's activation checkpointing does. The error in TE comes from this line, where it is assumed that the `function` argument can carry an `fsdp_wrapped` attribute. When we pass a plain function (or, in this case, a method), the error message is as follows:

[rank0]:   File "/usr/local/lib/python3.12/dist-packages/transformer_engine/pytorch/distributed.py", line 651, in checkpoint
[rank0]:     setattr(function, "fsdp_wrapped", False)
[rank0]: AttributeError: 'method' object has no attribute 'fsdp_wrapped'
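For context, the failure is reproducible in plain Python, independent of TE: functions have a writable `__dict__`, so `setattr` succeeds on them, while bound method objects reject new attributes. A minimal sketch (the class and function names here are hypothetical, and the wrapper workaround is a user-side assumption, not an official TE fix):

```python
# Plain functions accept arbitrary attributes via setattr.
def plain_function(x):
    return x

setattr(plain_function, "fsdp_wrapped", False)
print(plain_function.fsdp_wrapped)  # False

# Bound methods do not, which is what trips up
# setattr(function, "fsdp_wrapped", False) in TE's checkpoint().
class MyModule:  # hypothetical stand-in for a non-TE module
    def forward(self, x):
        return x

m = MyModule()
try:
    setattr(m.forward, "fsdp_wrapped", False)
except AttributeError as err:
    print(err)  # same AttributeError as in the traceback above

# A possible user-side workaround: wrap the method in a plain
# function, which can carry the attribute.
def wrapped_forward(*args, **kwargs):
    return m.forward(*args, **kwargs)

setattr(wrapped_forward, "fsdp_wrapped", False)  # now succeeds
```

Passing such a wrapper (rather than the bound method itself) to `transformer_engine.pytorch.distributed.checkpoint` should sidestep the `setattr` failure, though the underlying typing/behavior mismatch still deserves a fix in TE.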