Improve torch_xla.compile documentation #9194
base: master
Conversation
Force-pushed from 69266ed to 890336c
docs/source/learn/eager.md
## Basic Usage
- **Recompilation Overhead**: Non-core operations (e.g., data preprocessing)
  could leak into the graph, triggering expensive recompilations.
... can leak ....
docs/source/learn/eager.md
- `torch_xla.compile` is optimized for PyTorch/XLA training workflows. Designed
  to work efficiently with the XLA backend for iterative training, it's the
  recommended API for compiling training loops due to its observed performance
  advantages. Best practice dictates enclosing the complete training step—forward
The best practice is to enclose the complete training step-forward pass, ...
PTAL.
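For illustration, a minimal sketch of the pattern this part of the doc describes: the complete training step (forward pass, loss, backward pass, optimizer update) enclosed in a single `torch_xla.compile`-wrapped function. The `model`, `optimizer`, and loss function here are hypothetical placeholders, and the wrapping form assumes `torch_xla.compile` accepts a callable as the docs under review describe.

```python
import torch
import torch_xla

def train_step(model, optimizer, inputs, labels):
    # The whole step lives inside one compiled region: forward pass,
    # loss computation, backward pass, and optimizer update.
    optimizer.zero_grad()
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)
    loss.backward()
    optimizer.step()
    return loss

# Wrap the callable so each invocation runs as a single compiled graph.
compiled_train_step = torch_xla.compile(train_step)
```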
docs/source/learn/eager.md
The benchmark results unequivocally demonstrate that eager mode combined with
`torch_xla.compile` achieves performance parity with the traditional LazyTensor
tracing mode, both yielding `147` tokens/s. This empirically validates the claim
"... This shows that the new API provides a better user experience without a performance penalty. " I'm trying to keep sentence length down to ease readibility. I don't think the last phrase ("no-regret ...") is needed.
I did some wordsmithing. PTAL.
## Basic Usage
- **Recompilation Overhead**: Non-core operations (e.g., data preprocessing) can
I think it could be clearer to explain that these non-core operations are all recorded. Maybe this phrasing:
- **Recompilation Overhead**: Whenever any part of the captured graph changes, `torch_xla.sync()` will
recompile the whole graph. Changes in non-core operations (e.g., data preprocessing) thus trigger expensive
recompilations.
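To make the suggested wording concrete, here is a hedged sketch of that failure mode: under lazy tracing, everything executed before `torch_xla.sync()` is recorded into the same graph, so a preprocessing step that varies between iterations changes the captured graph and forces a recompile. `loader`, `preprocess`, and `model` are hypothetical placeholders, and the `torch_xla.device()` helper is assumed.

```python
import torch_xla

device = torch_xla.device()  # assumes the torch_xla.device() helper

for batch in loader:
    # Under lazy tracing, this preprocessing is recorded into the same
    # graph as the model computation. If its shapes or ops differ between
    # iterations, torch_xla.sync() sees a different graph each time and
    # recompiles the whole thing.
    inputs = preprocess(batch).to(device)
    loss = model(inputs).sum()
    loss.backward()
    torch_xla.sync()  # cuts the graph here; any upstream change means a recompile
```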
To address these issues, PyTorch/XLA introduces an experimental eager mode
(enabled via `torch_xla.experimental.eager_mode(True)`) and the
`torch_xla.compile` API. **This shift aligns PyTorch/XLA more closely with
Did you mean to emphasize this sentence? Seems a bit jarring to me but just a minor opinion.
performance**. Eager mode is likely to become the default in future releases.
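A short usage sketch of the two APIs this paragraph introduces, reusing the hypothetical `train_step`, `model`, `optimizer`, and `loader` from the earlier sketch: once the flag is set, everything outside the compiled region runs eagerly.

```python
import torch_xla

# Turn on the experimental eager mode: ops now execute immediately
# instead of being traced into a lazy graph.
torch_xla.experimental.eager_mode(True)

# Opt the performance-critical step back into compilation.
compiled_train_step = torch_xla.compile(train_step)

for inputs, labels in loader:  # hypothetical data loader, runs eagerly
    loss = compiled_train_step(model, optimizer, inputs, labels)
```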
- **Eager Mode**: Executes operations immediately, enhancing flexibility and
  debugging but at a performance cost for core tasks.
I think "core tasks" is vague. Could simply remove: "[...] but at a performance cost."
This varying overhead means pure eager mode is not intended for main training or
inference loops. Its utility lies in non-core tasks like data preprocessing,
random number generation, custom utilities, or debugging, where immediate
execution is prioritized over throughput.
We also need to call out that `torch_xla.compile` is independently useful, even when not in eager mode. That's why torchprime does: https://github.com/AI-Hypercomputer/torchprime/blob/31c450e82c6273f50f9815351f6fbebb42903f58/torchprime/torch_xla_models/train.py#L330

Wrapping the training loop in `torch_xla.compile` provides a few benefits:
- There's no `mark_step` anywhere.
- Dataloading operations don't leak into the training loop graph. This benefit is similar to if eager mode were turned on. The only difference is that the dataloading operations are captured into a separate graph as opposed to running eagerly.
- `torch_xla.compile(full_graph=True)` will catch accidental graph breaks.
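A hedged sketch of the torchprime-style usage this comment points to, with eager mode left off; it reuses the hypothetical `train_step`, `model`, `optimizer`, and `loader` from above, and assumes `full_graph` can be passed alongside the callable.

```python
import torch_xla

# Wrapping the step removes the need for manual mark_step/sync calls and
# keeps dataloading out of the training graph. full_graph=True is meant to
# surface accidental graph breaks instead of silently splitting the step
# into several compiled graphs.
compiled_train_step = torch_xla.compile(train_step, full_graph=True)

for inputs, labels in loader:  # hypothetical data loader
    loss = compiled_train_step(model, optimizer, inputs, labels)
```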
Since `torch_xla.compile` isn't really tied to eager mode, I think it could be clearer to rename the document heading as such. Alternatively, we could move the contents about `torch_xla.compile` to a separate `compile.md`, and then talk about its interaction with eager mode in this markdown.
Fixes #8859