Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Cache Offload] Remove device sync overhead #2533

Open
wants to merge 15 commits into
base: xiezhq-hierarchical
Choose a base branch
from

Conversation

Edenzzzz
Copy link

@Edenzzzz Edenzzzz commented Dec 20, 2024

Motivation

Device synchronization is expensive. In most cases we can use event or stream sync. https://developer.download.nvidia.com/compute/DevZone/docs/html/C/doc/html/group__CUDART__EVENT_g14c387cc57ce2e328f6669854e6020a5.html

Modifications

  1. Replace device sync with event sync, which only blocks a single stream.
  2. Removed device sync after tensor.to(). It should block the current stream until completion. I ran the benchmark and didn't encounter any illegal memory access issues. If you want to use another stream to write through in the background, we will also need pin_memory()?
    cc @xiezhq-hermann , would appreciate if you could share a bug reproducer for that.

Checklist

  • Format your code according to the Contributor Guide.
  • Add unit tests as outlined in the Contributor Guide.
  • Update documentation as needed, including docstrings or example tutorials.

@Edenzzzz Edenzzzz changed the title Replace device sync with event sync to reduce overhead Replace device sync overhead Dec 20, 2024
@Edenzzzz Edenzzzz changed the title Replace device sync overhead [Cache Offload] Replace device sync overhead Dec 20, 2024
@Edenzzzz Edenzzzz changed the title [Cache Offload] Replace device sync overhead [Cache Offload] Remove device sync overhead Dec 22, 2024
@xiezhq-hermann xiezhq-hermann force-pushed the xiezhq-hierarchical branch 3 times, most recently from 2bd500a to 1853cf2 Compare January 2, 2025 08:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants