From ddde667da7610a7bdcf02313cccd3a4ccca97ecc Mon Sep 17 00:00:00 2001
From: Tsung-Hsien Lee
Date: Fri, 8 Nov 2024 00:47:38 -0800
Subject: [PATCH] Add citing PyTorch Distributed Shampoo section in README.md

Summary: Add a section for how to cite PyTorch Distributed Shampoo.

Reviewed By: hjmshi

Differential Revision: D65094885

fbshipit-source-id: cf0dbb27b879e184751b9d1cf3a2cc56ca855bd8
---
 distributed_shampoo/README.md | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/distributed_shampoo/README.md b/distributed_shampoo/README.md
index 1906c4e..87e0f02 100644
--- a/distributed_shampoo/README.md
+++ b/distributed_shampoo/README.md
@@ -10,7 +10,6 @@ Developers:
 - Hao-Jun Michael Shi (Meta Platforms, Inc.)
 - Tsung-Hsien Lee
 - Anna Cai (Meta Platforms, Inc.)
-- Runa Eschenhagen (University of Cambridge)
 - Shintaro Iwasaki (Meta Platforms, Inc.)
 - Ke Sang (Meta Platforms, Inc.)
 - Wang Zhou (Meta Platforms, Inc.)
@@ -44,7 +43,7 @@ Key distinctives of this implementation include:
 
 We have tested this implementation on the following versions of PyTorch:
 
-- PyTorch >= 2.2;
+- PyTorch >= 2.0;
 - Python >= 3.10;
 - CUDA 11.3-11.4; 12.2+;
 
@@ -476,6 +475,22 @@ When encountering those errors, following are things you could try:
 3. Increase `start_preconditioning_step`.
 4. Consider applying gradient clipping.
 
+## Citing PyTorch Distributed Shampoo
+
+If you use PyTorch Distributed Shampoo in your work, please use the following BibTeX entry.
+
+```BibTeX
+@misc{shi2023pytorchshampoo,
+  title={A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale},
+  author={Hao-Jun Michael Shi and Tsung-Hsien Lee and Shintaro Iwasaki and Jose Gallego-Posada and Zhijing Li and Kaushik Rangadurai and Dheevatsa Mudigere and Michael Rabbat},
+  howpublished={\url{https://github.com/facebookresearch/optimizers/tree/main/distributed_shampoo}},
+  year={2023},
+  eprint={2309.06497},
+  archivePrefix={arXiv},
+  primaryClass={cs.LG}
+}
+```
+
 ## References
 
 1. [Shampoo: Preconditioned Stochastic Tensor Optimization](https://proceedings.mlr.press/v80/gupta18a/gupta18a.pdf). Vineet Gupta, Tomer Koren, and Yoram Singer. International Conference on Machine Learning, 2018.
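As a side note (not part of the patch), the BibTeX entry added above can be sanity-checked with a short Python sketch using only the standard library. The entry text is reproduced verbatim; the regular expressions are a minimal check for the one-field-per-line layout used here, not a general BibTeX parser.

```python
import re

# The BibTeX entry from the patch above, reproduced verbatim.
BIBTEX_ENTRY = r"""@misc{shi2023pytorchshampoo,
  title={A Distributed Data-Parallel PyTorch Implementation of the Distributed Shampoo Optimizer for Training Neural Networks At-Scale},
  author={Hao-Jun Michael Shi and Tsung-Hsien Lee and Shintaro Iwasaki and Jose Gallego-Posada and Zhijing Li and Kaushik Rangadurai and Dheevatsa Mudigere and Michael Rabbat},
  howpublished={\url{https://github.com/facebookresearch/optimizers/tree/main/distributed_shampoo}},
  year={2023},
  eprint={2309.06497},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}"""

# Entry type and citation key come from the opening "@type{key," line.
match = re.match(r"@(\w+)\{([^,]+),", BIBTEX_ENTRY)
entry_type, cite_key = match.group(1), match.group(2)

# Field names are the identifiers left of "=" on each field line.
fields = re.findall(r"^\s*(\w+)\s*=", BIBTEX_ENTRY, flags=re.MULTILINE)

print(entry_type, cite_key)  # misc shi2023pytorchshampoo
print(fields)
```

A check like this catches the common copy-paste failures (a lost closing brace, a dropped field) before the entry goes into a `.bib` file.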