From ddde667da7610a7bdcf02313cccd3a4ccca97ecc Mon Sep 17 00:00:00 2001
From: Tsung-Hsien Lee
Date: Fri, 8 Nov 2024 00:47:38 -0800
Subject: [PATCH] Add citing PyTorch Distributed Shampoo section in README.md

Summary: Add a section for how to cite PyTorch Distributed Shampoo.

Reviewed By: hjmshi

Differential Revision: D65094885

fbshipit-source-id: cf0dbb27b879e184751b9d1cf3a2cc56ca855bd8
---
 distributed_shampoo/README.md | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/distributed_shampoo/README.md b/distributed_shampoo/README.md
index 1906c4e..87e0f02 100644
--- a/distributed_shampoo/README.md
+++ b/distributed_shampoo/README.md
@@ -10,7 +10,6 @@ Developers:
 - Hao-Jun Michael Shi (Meta Platforms, Inc.)
 - Tsung-Hsien Lee
 - Anna Cai (Meta Platforms, Inc.)
-- Runa Eschenhagen (University of Cambridge)
 - Shintaro Iwasaki (Meta Platforms, Inc.)
 - Ke Sang (Meta Platforms, Inc.)
 - Wang Zhou (Meta Platforms, Inc.)
@@ -44,7 +43,7 @@ Key distinctives of this implementation include:
 
 We have tested this implementation on the following versions of PyTorch:
 
-- PyTorch >= 2.2;
+- PyTorch >= 2.0;
 - Python >= 3.10;
 - CUDA 11.3-11.4; 12.2+;
 
@@ -476,6 +475,22 @@ When encountering those errors, following are things you could try:
 3. Increase `start_preconditioning_step`.
 4. Consider applying gradient clipping.
 
+## Citing PyTorch Distributed Shampoo
+
+If you use PyTorch Distributed Shampoo in your work, please use the following BibTeX entry.
+
+```BibTeX
+@misc{shi2023pytorchshampoo,
+  title={A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale},
+  author={Hao-Jun Michael Shi and Tsung-Hsien Lee and Shintaro Iwasaki and Jose Gallego-Posada and Zhijing Li and Kaushik Rangadurai and Dheevatsa Mudigere and Michael Rabbat},
+  howpublished={\url{https://github.com/facebookresearch/optimizers/tree/main/distributed_shampoo}},
+  year={2023},
+  eprint={2309.06497},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG}
+}
+```
+
 ## References
 
 1. [Shampoo: Preconditioned Stochastic Tensor Optimization](https://proceedings.mlr.press/v80/gupta18a/gupta18a.pdf). Vineet Gupta, Tomer Koren, and Yoram Singer. International Conference on Machine Learning, 2018.
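As a side note (not part of the patch), the BibTeX entry added above can be sanity-checked with a short Python sketch using only the standard library. The entry text is reproduced verbatim; the regular expressions are a minimal check for the one-field-per-line layout used here, not a general BibTeX parser.

```python
import re

# The BibTeX entry from the patch above, reproduced verbatim.
BIBTEX_ENTRY = r"""@misc{shi2023pytorchshampoo,
  title={A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale},
  author={Hao-Jun Michael Shi and Tsung-Hsien Lee and Shintaro Iwasaki and Jose Gallego-Posada and Zhijing Li and Kaushik Rangadurai and Dheevatsa Mudigere and Michael Rabbat},
  howpublished={\url{https://github.com/facebookresearch/optimizers/tree/main/distributed_shampoo}},
  year={2023},
  eprint={2309.06497},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}"""

# Entry type and citation key come from the opening "@type{key," line.
match = re.match(r"@(\w+)\{([^,]+),", BIBTEX_ENTRY)
entry_type, cite_key = match.group(1), match.group(2)

# Field names are the identifiers left of "=" on each field line.
fields = re.findall(r"^\s*(\w+)\s*=", BIBTEX_ENTRY, flags=re.MULTILINE)

print(entry_type, cite_key)  # misc shi2023pytorchshampoo
print(fields)
```

A check like this catches the common copy-paste failures (a lost closing brace, a dropped field) before the entry goes into a `.bib` file.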