Fix CI and open telemetry API usage #305

Closed
H-Huang wants to merge 4 commits into meta-pytorch:main from H-Huang:fix_otel

@H-Huang H-Huang commented Dec 23, 2025

TorchFT CI is installing the latest opentelemetry-sdk, which caused breakages in some of the APIs in otel.py. This PR updates otel.py to use the correct APIs.
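
As a rough illustration of how this kind of breakage can be caught early, the sketch below checks the installed `opentelemetry-sdk` version against a range that a project has tested. It is not taken from this PR or from `otel.py`: the helper names and the `TESTED_RANGE` bounds are hypothetical, and it assumes plain numeric `MAJOR.MINOR` version strings.

```python
from __future__ import annotations

from importlib import metadata


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version: str) -> tuple[int, int]:
    """Parse the leading MAJOR.MINOR of a version string (numeric parts assumed)."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


def is_within(version: str, lower: tuple[int, int], upper: tuple[int, int]) -> bool:
    """Return True if lower <= (major, minor) < upper."""
    return lower <= major_minor(version) < upper


# Hypothetical bounds for illustration only; not the range TorchFT supports.
TESTED_RANGE = ((1, 0), (2, 0))

found = installed_version("opentelemetry-sdk")
if found is None:
    print("opentelemetry-sdk is not installed")
elif not is_within(found, *TESTED_RANGE):
    print(f"opentelemetry-sdk {found} is outside the tested range {TESTED_RANGE}")
```

A check like this only reports drift; pinning or constraining the dependency in the CI requirements is the usual way to stop an untested release from being installed in the first place.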

This PR also fixes TorchFT lint (pyre check was failing).

This should resolve the failing TorchFT CI runs:
https://github.com/meta-pytorch/torchft/actions/runs/19983176027/job/57313269930

This is also causing downstream CI failures in torchtitan: https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu_torchft.yaml

@meta-cla meta-cla bot added the CLA Signed label Dec 23, 2025
@H-Huang H-Huang changed the title Fix open telemetry API usage Fix CI and open telemetry API usage Dec 23, 2025

@tianyu-l tianyu-l left a comment

There seem to be other failures.

Stamp to unblock -- hope that's OK

H-Huang commented Dec 23, 2025

CPU tests are passing now, but the GPU tests are failing with an NCCL error (https://github.com/meta-pytorch/torchft/actions/runs/20466340171/job/58826912036?pr=305) that is unrelated to this PR. I will land it, and we should investigate where the NCCL error is happening.

meta-codesync bot commented Dec 23, 2025

@H-Huang has imported this pull request. If you are a Meta employee, you can view this in D89743414.

H-Huang added a commit to H-Huang/torchft that referenced this pull request Dec 26, 2025
Summary:
TorchFT CI is installing the latest `opentelemetry-sdk`, which caused breakages in some of the APIs in otel.py. This PR updates `otel.py` to use the correct APIs.

This PR also fixes TorchFT lint (pyre check was failing).

This should resolve the failing TorchFT CI runs:
https://github.com/meta-pytorch/torchft/actions/runs/19983176027/job/57313269930

This is also causing downstream CI failures in torchtitan: https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu_torchft.yaml


Reviewed By: tianyu-l

Differential Revision: D89743414

Pulled By: H-Huang
@meta-codesync meta-codesync bot closed this in 86c4291 Dec 26, 2025

meta-codesync bot commented Dec 26, 2025

@H-Huang merged this pull request in 86c4291.

AnantGulati pushed a commit to AnantGulati/torchft that referenced this pull request Jan 2, 2026
Summary:
Pull Request resolved: meta-pytorch#306

TorchFT CI is installing the latest `opentelemetry-sdk`, which caused breakages in some of the APIs in otel.py. This PR updates `otel.py` to use the correct APIs.

This PR also fixes TorchFT lint (pyre check was failing).

This should resolve the failing TorchFT CI runs:
https://github.com/meta-pytorch/torchft/actions/runs/19983176027/job/57313269930

This is also causing downstream CI failures in torchtitan: https://github.com/pytorch/torchtitan/actions/workflows/integration_test_8gpu_torchft.yaml

Pull Request resolved: meta-pytorch#305

Reviewed By: tianyu-l

Differential Revision: D89743414

Pulled By: H-Huang

fbshipit-source-id: c066fe535b332ba94b918f5b684595c8de8b6740

Labels

CLA Signed, Merged

3 participants