-
Notifications
You must be signed in to change notification settings - Fork 553
Error Handling: propagate status for ReleaseGilAndTransferData
and XlaDataToTensors
.
#9431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft
ysiraichi
wants to merge
8
commits into
ysiraichi/status-for-oom-errors
Choose a base branch
from
ysiraichi/propagate-status-for-oom
base: ysiraichi/status-for-oom-errors
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Draft
Error Handling: propagate status for ReleaseGilAndTransferData
and XlaDataToTensors
.
#9431
ysiraichi
wants to merge
8
commits into
ysiraichi/status-for-oom-errors
from
ysiraichi/propagate-status-for-oom
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Blocked until #9429 is merged. |
9d505e7
to
5d4742b
Compare
247fdf5
to
b390a61
Compare
5d4742b
to
40a75d7
Compare
b390a61
to
821c384
Compare
40a75d7
to
b0e25da
Compare
- Refactor status tests - Remove test_status.cpp as its tests are now covered by specialized context tests - Both specialized tests now cover all status utility functions and macros
- Add `XLA_RETURN_IF_ERROR_WITH_LOCATION` macro for external library status propagation - Add `XLA_ASSIGN_OR_RETURN_WITH_LOCATION` macro for external library status handling - Enhance test coverage with new test cases for location-specific macro variants - Improve macro documentation to clarify internal vs external usage patterns
b0e25da
to
97ef4c1
Compare
821c384
to
08c5ecd
Compare
…ansferData` Modify `ReleaseGilAndTransferData` function to use proper status propagation instead of `GetValueOrThrow` with `GetComputationClientOrDie`. This improves error handling by allowing status types to be propagated up the call stack rather than immediately throwing exceptions. Changes: - Update function signature to return `absl::StatusOr<std::vector<xla::Literal>>` - Replace `GetComputationClientOrDie()` with `GetComputationClient()` - Use `XLA_ASSIGN_OR_RETURN` macros for both client acquisition and `TransferFromDevice` - Update callers in tensor_util.cpp and xla_graph_executor.cpp to handle `StatusOr<T>` This follows the status propagation patterns used elsewhere in the codebase and aligns with the examples in pjrt_registry.cpp.
Modify `XlaDataToTensors` function to use proper status propagation instead of `GetValueOrThrow`, and update all callers to handle the new `StatusOr<T>` return type. This continues the status propagation improvements started with `ReleaseGilAndTransferData`. Changes: - Update `XlaDataToTensors` signature to return `absl::StatusOr<std::vector<at::Tensor>>` - Replace `GetValueOrThrow` with `XLA_ASSIGN_OR_RETURN` for `ReleaseGilAndTransferData` call - Update all callers to use `GetValueOrThrow` wrapper: - `XLATensor::ToTensor` in tensor.cpp:515 - test_xla_sharding.cpp:31 - init_python_bindings.cpp:2716 - xla_backend_impl.cpp:95 - Add necessary status.h includes to xla_backend_impl.cpp and test_xla_sharding.cpp This maintains backward compatibility at the API level while enabling proper status propagation internally within the tensor conversion pipeline.
97ef4c1
to
de09876
Compare
46401cf
to
0cc5400
Compare
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR refactors our error handling by replacing
GetValueOrThrow
with proper status propagation usingabsl::StatusOr<T>
andXLA_ASSIGN_OR_RETURN
macros.Key Changes:
ReleaseGilAndTransferData
Function:absl::StatusOr<std::vector<xla::Literal>>
.GetComputationClientOrDie()
withGetComputationClient()
.XLA_ASSIGN_OR_RETURN
for client acquisition andTransferFromDevice
calls.tensor_util.cpp
andxla_graph_executor.cpp
to handle the newStatusOr<T>
return type.XlaDataToTensors
Function:absl::StatusOr<std::vector<at::Tensor>>
.GetValueOrThrow
withXLA_ASSIGN_OR_RETURN
for theReleaseGilAndTransferData
call.XLATensor::ToTensor
,test_xla_sharding.cpp
,init_python_bindings.cpp
, andxla_backend_impl.cpp
) to correctly handle theStatusOr<T>
return type.status.h
includes toxla_backend_impl.cpp
andtest_xla_sharding.cpp
.These modifications align with existing status propagation patterns in the codebase, as seen in
pjrt_registry.cpp
, and maintain API-level backward compatibility while improving internal error handling within the tensor conversion pipeline.