Support DI sharding for FPE_EBC #2968
Conversation
This pull request was exported from Phabricator. Differential Revision: D74671655
Summary: Support DI sharding for models that have a FeatureProcessedEmbeddingBagCollection. These changes add QuantFeatureProcessedEmbeddingBagCollectionSharder as a recognized sharder, handle the multiple envs needed to specify DI sharding, and propagate the TBE properly when processing the sharding plan. This doesn't support true hybrid sharding yet, so all FPE_EBCs must be sharded with the same (sharding_type, device). Differential Revision: D74671655
This pull request was exported from Phabricator. Differential Revision: D74671655
Summary: Support DI sharding for models that have a FeatureProcessedEmbeddingBagCollection. However, conservatively enforce that the FPE itself can only be sharded on HBM, not across CPU as well. These changes add QuantFeatureProcessedEmbeddingBagCollectionSharder as a recognized sharder, handle the multiple envs needed to specify DI sharding, and propagate the TBE properly when processing the sharding plan. This doesn't support true hybrid sharding for FPE. Differential Revision: D74671655
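As a rough sketch of the pieces the summary names (the recognized sharder plus per-device-type envs for DI sharding), the wiring might look like this; the entry point, import paths, world sizes, and quantized_model are assumptions, not the PR's actual code:

import torch
from torchrec.distributed.quant_embeddingbag import (
    QuantFeatureProcessedEmbeddingBagCollectionSharder,
)
from torchrec.distributed.shard import _shard_modules
from torchrec.distributed.types import ShardingEnv

# One ShardingEnv per device type: DI sharding places some modules on
# HBM ("cuda") and others on CPU, so an env is provided for each.
envs = {
    "cuda": ShardingEnv.from_local(world_size=2, rank=0),
    "cpu": ShardingEnv.from_local(world_size=1, rank=0),
}

# quantized_model is a hypothetical quantized inference model that
# contains a FeatureProcessedEmbeddingBagCollection.
sharded_model = _shard_modules(
    module=quantized_model,
    env=envs,  # assumed to accept a Dict[str, ShardingEnv] keyed by device type
    device=torch.device("cuda"),
    sharders=[QuantFeatureProcessedEmbeddingBagCollectionSharder()],
)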
assert (
    shard_device_type == expected_device_type
), f"Expected {expected_device_type} but got {shard_device_type} for FeatureProcessedEmbeddingBagCollection sharding device type"
Since we assert here and the expected_device_type is always cuda, why don't we keep the signature of the method's env as a single ShardingEnv, and just pass the cuda sharding env to this?
While right now FPE is used on cuda, we can support it on cuda/cpu in the future. Performing the check here allows us to de-duplicate the logic so we can ensure there's one sharding type for FPE for all sharding paths.
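A minimal sketch of that rationale, with one centralized device-type check that every sharding path calls (the helper name is hypothetical):

def validate_fpe_device_type(
    shard_device_type: str,
    expected_device_type: str = "cuda",
) -> None:
    # Every sharding path (single-env and DI multi-env) funnels through
    # this one check, so there is exactly one device-type rule for FPE.
    # Today that rule pins FPE to cuda/HBM; future cuda/cpu support only
    # has to relax this helper.
    assert (
        shard_device_type == expected_device_type
    ), f"Expected {expected_device_type} but got {shard_device_type} for FeatureProcessedEmbeddingBagCollection sharding device type"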
This pull request was exported from Phabricator. Differential Revision: D74671655
Force-pushed from 527d6cf to 14c9ff0
Summary: Pull Request resolved: pytorch#2968. Support DI sharding for models that have a FeatureProcessedEmbeddingBagCollection. However, conservatively enforce that the FPE itself can only be sharded on HBM, not across CPU as well. These changes add QuantFeatureProcessedEmbeddingBagCollectionSharder as a recognized sharder, handle the multiple envs needed to specify DI sharding, and propagate the TBE properly when processing the sharding plan. This doesn't support true hybrid sharding for FPE. Reviewed By: faran928. Differential Revision: D74671655
This pull request was exported from Phabricator. Differential Revision: D74671655
Force-pushed from 14c9ff0 to 2ec92ba
Summary:
Support DI sharding for models that have a FeatureProcessedEmbeddingBagCollection.
These changes make sure we add QuantFeatureProcessedEmbeddingBagCollectionSharder as a recognized sharder, handle multiple envs needed for specifying DI sharding, and propagate TBE properly when processing the sharding plan.
Differential Revision: D74671655
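Until true hybrid sharding for FPE lands, one way to keep all FPE_EBC tables on a single (sharding_type, device) is to pin them through planner constraints; a sketch with hypothetical table names:

from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType

# Pin each FPE_EBC table (names are hypothetical) to one sharding type
# so the planner never mixes placements for FPE.
constraints = {
    "fpe_table_0": ParameterConstraints(
        sharding_types=[ShardingType.TABLE_WISE.value],
    ),
    "fpe_table_1": ParameterConstraints(
        sharding_types=[ShardingType.TABLE_WISE.value],
    ),
}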