Commit 7eedc31 (parent: 4285cd9)

update READMEs

Signed-off-by: Masahiro Tanaka <mtanaka@anyscale.com>

3 files changed (+12 / -10 lines)

training/tensor_parallel/README.md (7 additions, 6 deletions)

```diff
@@ -1,14 +1,15 @@
-# AutoTP training examples
+# AutoTP Training Examples
+
 This folder groups AutoTP training examples at different complexity levels.
 
 ## Contents
-- `basic_example/`: minimal AutoTP + ZeRO-2 example with synthetic tokens. It also shows that AutoTP recognizes typical parameter patterns and automatically applies proper partitioning.
-- `hf_integration/`: Hugging Face Trainer example (adapted from Stanford Alpaca).
-- `custom_patterns/`: AutoTP example with custom layer patterns and a simple
+- [Basic example](basic_example): minimal AutoTP + ZeRO-2 example with synthetic tokens. It also shows that AutoTP recognizes typical parameter patterns and automatically applies proper partitioning.
+- [HuggingFace integration](hf_integration): Hugging Face Trainer example (adapted from Stanford Alpaca).
+- [Custom partitioning patterns](custom_patterns): AutoTP example with custom layer patterns and a simple
   text dataset that uses a DP-rank random sampler. It shows how to define
   parameter partitioning easily for custom models with non-standard parameter
   definitions.
 
 ## Related references
-- AutoTP training docs: https://github.com/deepspeedai/DeepSpeed/blob/master/docs/code-docs/source/training.rst
-- AutoTP training tutorial: https://github.com/deepspeedai/DeepSpeed/blob/master/docs/_tutorials/autotp-training.md
+- [AutoTP training docs](https://deepspeed.readthedocs.io/en/latest/training.html)
+- [AutoTP training tutorial](https://github.com/deepspeedai/DeepSpeed/blob/master/docs/_tutorials/autotp-training.md)
```

training/tensor_parallel/custom_patterns/README.md (3 additions, 1 deletion)

```diff
@@ -1,4 +1,5 @@
-# AutoTP custom patterns example
+# AutoTP (Tensor Parallel) Custom Patterns Example
+
 This example extends the minimal AutoTP script with:
 
 - custom layer sharding patterns (`partition_config`)
@@ -10,6 +11,7 @@ AutoTP is enabled by the DeepSpeed config (`tensor_parallel.autotp_size`), so
 you do not need to call any initialization helpers before `deepspeed.initialize`.
 
 ## Key code (custom patterns)
+
 The config below targets **Pythia 6.9B (GPT-NeoX)**, which uses a fused
 `query_key_value` projection. We provide a `shape` so AutoTP can split the
 fused Q/K/V tensor cleanly across tensor-parallel ranks. The MLP uses
```
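The diff above notes that AutoTP is switched on entirely from the DeepSpeed config (`tensor_parallel.autotp_size`), with no initialization helper needed before `deepspeed.initialize`. A minimal sketch of such a config follows; only the `tensor_parallel.autotp_size` key is taken from these READMEs, while every concrete value (tensor-parallel degree 2, ZeRO stage 2 as paired with AutoTP in the basic example, batch size 8) is an illustrative assumption, and the `partition_config` schema is omitted because the diff does not show it.

```python
# Sketch of a DeepSpeed config dict that enables AutoTP training.
# Only the "tensor_parallel.autotp_size" key is taken from the READMEs above;
# all concrete values here are illustrative assumptions, not the examples'
# actual settings.
ds_config = {
    "train_batch_size": 8,                  # assumed value
    "zero_optimization": {"stage": 2},      # the top-level README pairs AutoTP with ZeRO-2
    "tensor_parallel": {"autotp_size": 2},  # tensor-parallel degree (assumed: 2)
}

# Per the custom_patterns README, no AutoTP init helper is required: passing
# this config straight to deepspeed.initialize is enough, e.g.
#   engine, optimizer, _, _ = deepspeed.initialize(model=model, config=ds_config)
```

The point of the design is that tensor parallelism becomes a config-only switch, so the same training script runs with or without AutoTP depending on the config file it is given.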
training/tensor_parallel/hf_integration/README.md (2 additions, 3 deletions)

````diff
@@ -1,10 +1,9 @@
-# tensor parallel example (Hugging Face Trainer + AutoTP)
+# AutoTP (Tensor Parallel) HuggingFace Integration Example
+
 This project is adapted from https://github.com/tatsu-lab/stanford_alpaca.
 It uses Hugging Face `Trainer` with a DeepSpeed config that enables AutoTP via `tensor_parallel.autotp_size`.
 We only modified the DeepSpeed config and logging, as an example use case.
 
 **Script**
 
 ``` bash run.sh ``` or ```bash run.sh MODE```
-
-
````
