Support eval mode for st publish (#5085)

EddyLXJ · meta-codesync[bot] · commit 8c83b4114d85 · 2025-11-04T19:00:17.000-08:00
Summary:
Pull Request resolved: #5085

X-link: https://github.com/facebookresearch/FBGEMM/pull/2093

As the title says: silvertorch bulk eval does not call eval() on the module; it only runs the module under torch.no_grad():
https://www.internalfb.com/code/fbsource/[324dbccd0ab0]/fbcode/dper_lib/silvertorch/core/publish/data_processing/bulk_eval_dmp_gpu.py?lines=1057

torch.no_grad() does not change self.training, so this change adds reset_inference_mode(), which calls self.eval() to set self.training to False in TBE for bulk eval.

Reviewed By: emlin

Differential Revision: D86220286

fbshipit-source-id: 9a48c7b4dc09767c99a545d1f25e53bf4265079f
diff --git a/fbgemm_gpu/fbgemm_gpu/tbe/ssd/training.py b/fbgemm_gpu/fbgemm_gpu/tbe/ssd/training.py
@@ -4762,3 +4762,9 @@ def is_first_tbe() -> bool:
                 logging.info(
                     f"[FREE_MEM Eviction] Evict all at batch {self.step}, {free_cpu_mem_gb} GB free CPU memory, {global_evict_trigger} ranks triggered eviction"
                 )
+
+    def reset_inference_mode(self) -> None:
+        """
+        Put this module in eval mode (self.training = False), e.g. for bulk eval.
+        """
+        self.eval()
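
The distinction this change relies on can be sketched without PyTorch. The snippet below is a minimal illustration, not the FBGEMM implementation: `ToyModule` and `toy_no_grad` are hypothetical stand-ins that mimic only the `training` flag of `torch.nn.Module` and the fact that `torch.no_grad()` leaves module state untouched. Only `reset_inference_mode()` mirrors a name from the diff.

```python
from contextlib import contextmanager


class ToyModule:
    """Hypothetical stand-in: mimics only the `training` flag of torch.nn.Module."""

    def __init__(self) -> None:
        self.training = True  # modules start out in training mode

    def train(self, mode: bool = True) -> "ToyModule":
        self.training = mode
        return self

    def eval(self) -> "ToyModule":
        # Same contract as nn.Module.eval(): equivalent to train(False).
        return self.train(False)

    def reset_inference_mode(self) -> None:
        # Mirrors the method added in this diff: just switch to eval mode.
        self.eval()


@contextmanager
def toy_no_grad():
    """Hypothetical stand-in for torch.no_grad(): it never touches module state."""
    yield


module = ToyModule()

# Running under the no-grad context alone leaves the module in training mode.
with toy_no_grad():
    flag_inside_no_grad = module.training

# An explicit call is what flips the flag.
module.reset_inference_mode()
flag_after_reset = module.training

print(flag_inside_no_grad, flag_after_reset)
```

Under these assumptions, a bulk eval caller that only wraps the forward pass in a no-grad context would need to call `reset_inference_mode()` once beforehand for any `self.training` checks inside the module to take the eval path.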