@chloechia4 chloechia4 commented Oct 1, 2025

Description

This PR adds tests for the low-level cuFile bindings, together with the generated low-level bindings introduced in CUDA 13.0.

CUDA 13.0 cuFile Operations

test_set_stats_level
test_stats_start
test_stats_stop
test_stats_reset
test_get_stats_l1
test_get_stats_l2
test_get_stats_l3
test_get_bar_size_in_kb
test_set_parameter_posix_pool_slab_array
test_set_get_parameter_size_t

Note: The original test_batch_io_large_operations() stopped passing after the switch from CUDA 12.9 to 13.0. The cause was that the test submitted all operations (reads and writes) together in one batch, so the file reads could run before the writes. The reads then returned 0 bytes because no write I/O had happened yet, and the test failed. I split the test into two phases: writes complete first in one batch handle, and then reads are submitted in a second batch handle. The new test also passes on CUDA 12.9.
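
The ordering problem in the note above can be illustrated without cuFile. The following is a minimal sketch, assuming a POSIX system: a thread pool with os.pwrite/os.pread stands in for batch submission, so this is an analogy and not the cuFile batch API. The helper name two_phase_roundtrip and the constants are hypothetical.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, wait

BLOCK = 4096   # bytes per operation (arbitrary for this sketch)
NUM_OPS = 8    # number of write/read pairs


def two_phase_roundtrip(path):
    """Write NUM_OPS blocks, wait for all of them, then read them back."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT)
    try:
        payloads = [bytes([i + 1]) * BLOCK for i in range(NUM_OPS)]
        with ThreadPoolExecutor(max_workers=4) as pool:
            # Phase 1: submit every write and wait for the whole "batch".
            writes = [pool.submit(os.pwrite, fd, data, i * BLOCK)
                      for i, data in enumerate(payloads)]
            wait(writes)
            assert all(f.result() == BLOCK for f in writes)

            # Phase 2: only now submit the reads, as a separate "batch".
            reads = [pool.submit(os.pread, fd, BLOCK, i * BLOCK)
                     for i in range(NUM_OPS)]
            results = [f.result() for f in reads]
        return payloads, results
    finally:
        os.close(fd)


with tempfile.TemporaryDirectory() as tmp:
    expected, actual = two_phase_roundtrip(os.path.join(tmp, "batch.bin"))
    print(actual == expected)
```

If the reads were submitted in the same pool pass as the writes, a read could run against a still-empty file and return zero bytes, which is the failure mode the note describes.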

All tests passing across CUDA versions

Contributor

copy-pr-bot bot commented Oct 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Member

leofang commented Oct 1, 2025

Thanks, Chloe! Pinging you internally...

@leofang requested review from cpcloud and mdboom October 1, 2025 23:03
@leofang added the P0, feature, and cuda.bindings labels Oct 1, 2025
@leofang added this to the cuda-python parking lot milestone Oct 1, 2025
Member

@chloechia4 any reason cuFileDriverClose_v2 was removed? I see this symbol still exists in the cuFile header. For cuda-bindings, the Cython layer (cyxxxxx.{pxd,pyx}) is considered a stable public API.

Contributor

@chloechia4: I left a comment on why I think this is happening in your cybind MR. Addressing that, putting the changes here, and removing those extra files @leofang mentioned should hopefully do it. I haven't tried testing (I'm on WSL and it looks like cuFile doesn't work there).

Author

Addressed this

assert io_events[i].status == cufile.Status.COMPLETE, f"Write {i} failed with status {io_events[i].status}"

# Force file sync
os.fsync(fd)
Contributor

This isn't needed.

# Verify that statistics data was written to the buffer
# Convert buffer to bytes and check that it's not all zeros
buffer_bytes = bytes(stats_buffer)
Contributor

Rather than just checking that buffer_bytes is not all zeros, can you verify the result by looking at actual fields of the data structure (CUfileStatsLevel1_t)?

Author

Good point. I added field checks in get_stats_l1/get_stats_l2/get_stats_l3. It appears that the Python bindings don't expose the CUfileStatsLevel*_t structures as ctypes classes that we can use directly, so I added equivalent Python classes.
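
For readers following along, the approach of mirroring a C struct in Python and checking its fields might look like the sketch below. The FakeOpStats layout and its field names are hypothetical and chosen only for illustration; they are not the real CUfileStatsLevel1_t definition, and the buffer is filled by hand rather than by a stats call.

```python
import ctypes


class FakeOpStats(ctypes.Structure):
    """Hypothetical layout for illustration; NOT the real CUfileStatsLevel1_t."""
    _fields_ = [
        ("ops_count", ctypes.c_uint64),
        ("bytes_total", ctypes.c_uint64),
        ("errors", ctypes.c_uint64),
    ]


def parse_stats(raw):
    """Interpret a raw bytes buffer as FakeOpStats and return a typed copy."""
    if len(raw) < ctypes.sizeof(FakeOpStats):
        raise ValueError("buffer too small for FakeOpStats")
    return FakeOpStats.from_buffer_copy(raw)


# Simulate a buffer that a stats call would have filled in.
stats_buffer = (ctypes.c_uint8 * ctypes.sizeof(FakeOpStats))()
filled = FakeOpStats(ops_count=3, bytes_total=12288, errors=0)
ctypes.memmove(stats_buffer, ctypes.byref(filled), ctypes.sizeof(filled))

stats = parse_stats(bytes(stats_buffer))
# Field-level checks catch problems that a "not all zeros" check would miss.
assert stats.ops_count == 3
assert stats.bytes_total == 3 * 4096
assert stats.errors == 0
```

A mirror class like this has to be kept in sync with the C header by hand, so a size check against the buffer length is a cheap guard against layout drift.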

check_status(__status__)


cpdef get_parameter_min_max_value(int param, intptr_t min_value, intptr_t max_value):
Contributor

Add tests for this API as well.
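
As a rough sketch of what such a test could exercise: the signature above takes min_value and max_value as intptr_t, so a caller would pass the addresses of storage it owns. The example below assumes the outputs are size_t values (an assumption, not confirmed by this thread) and uses a hypothetical stand-in function in place of the real binding so that it runs without cuFile.

```python
import ctypes


def fake_get_parameter_min_max_value(param, min_value, max_value):
    """Hypothetical stand-in that mimics the pointer-style signature.

    min_value and max_value are integer addresses of caller-owned storage,
    matching the intptr_t arguments in the binding's signature.
    """
    ctypes.c_size_t.from_address(min_value).value = 1
    ctypes.c_size_t.from_address(max_value).value = 1024


def query_min_max(get_min_max, param):
    """Allocate output slots, call the binding, and return (min, max)."""
    min_val = ctypes.c_size_t(0)
    max_val = ctypes.c_size_t(0)
    get_min_max(param, ctypes.addressof(min_val), ctypes.addressof(max_val))
    return min_val.value, max_val.value


lo, hi = query_min_max(fake_get_parameter_min_max_value, param=0)
assert lo <= hi
print(lo, hi)
```

In a real test the stand-in would be replaced by the binding itself, and the param argument by an actual parameter enum value; the min <= max check is the kind of invariant that does not depend on specific driver defaults.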

Contributor

mdboom commented Oct 15, 2025

/ok to test

Contributor

copy-pr-bot bot commented Oct 15, 2025

/ok to test

@mdboom, there was an error processing your request: E1

See the following link for more information: https://docs.gha-runners.nvidia.com/cpr/e/1/
