Description
Describe the bug
The compute-sanitizer --tool racecheck reports a race condition in decode_page_data_generic_kernel when decoding plain strings (decode_kernel_mask::STRING). Commenting out this line:
cudf/cpp/src/io/parquet/decode_fixed.cu, line 1231 in 2ae7474:
initialize_string_descriptors<is_calc_sizes_only::NO>(s, sb, target_pos, block);
seems to make the race condition go away, but since the function is entirely serial, this could be a red herring. It's also possible that the reported race condition is itself a false positive. The race condition also goes away if the column is dictionary encoded (decode_kernel_mask::STRING_DICT).
Steps/Code to reproduce bug
Append this GTest to parquet_reader_test.cpp
TEST_F(ParquetReaderTest, RaceCondition)
{
  // Parquet buffer
  std::vector<char> parquet_buffer;
  {
    // Input table
    auto col0  = testdata::ascending<cudf::string_view>();
    auto table = cudf::table_view{{col0}};
    cudf::io::table_input_metadata expected_metadata(table);
    expected_metadata.column_metadata[0].set_name("col0");
    // Write to parquet buffer
    cudf::io::parquet_writer_options out_opts =
      cudf::io::parquet_writer_options::builder(cudf::io::sink_info{&parquet_buffer}, table)
        .metadata(std::move(expected_metadata))
        .compression(cudf::io::compression_type::ZSTD)  // Avoid snappy for false positive race
                                                        // condition reports in unsnap kernel
        .stats_level(cudf::io::statistics_freq::STATISTICS_COLUMN);
    cudf::io::write_parquet(out_opts);
  }
  cudf::io::parquet_reader_options const options = cudf::io::parquet_reader_options::builder(
    cudf::io::source_info(cudf::host_span<char>(parquet_buffer.data(), parquet_buffer.size())));
  EXPECT_NO_THROW(cudf::io::read_parquet(options));
}
Now run this test:
compute-sanitizer --tool racecheck ./PARQUET_TEST --gtest_filter="ParquetReaderTest.RaceCondition"
Expected behavior
compute-sanitizer should not report a race condition.
Environment overview (please complete the following information)
- RDS Lab dgx05 machine
- cuDF cuda 12.9 pip devcontainer
- branch-25.10
Environment details
N/A
Additional context
Race condition log from compute-sanitizer:
========= Error: Race reported between Read access at void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)2>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0x8e60
========= and Write access at void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)2>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0x9ac0 [316 hazards]
=========
========= Error: Race reported between Read access at void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)2>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0x8e70
========= and Write access at void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)2>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0x9cf0 [316 hazards]
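The two reports above name the same kernel instantiation and differ only in the instruction offsets of the read and write. When comparing runs (for example with and without the initialize_string_descriptors call), it may help to reduce the log to just those offsets and the hazard counts. The following is a rough sketch of such a helper, not part of cuDF or compute-sanitizer. The names hazard_report, trailing_offset, hazard_count and parse_racecheck_log are hypothetical, and the code assumes every report has the two-line "Read access at ... / and Write access at ... [N hazards]" shape shown in this log.

```cpp
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// One report from the log, reduced to the fields that vary between reports.
// Assumption: the first line is the read and the second the write, as in the log above.
struct hazard_report {
  std::string read_offset;   // offset after the kernel name on the first line
  std::string write_offset;  // offset after the kernel name on the second line
  int hazards;               // value inside "[N hazards]", 0 if missing
};

// Returns the hex offset that follows the last "+0x" on the line, or "" if there is none.
std::string trailing_offset(std::string const& line)
{
  auto const plus = line.rfind("+0x");
  if (plus == std::string::npos) { return ""; }
  auto const begin = plus + 1;  // skip the '+'
  auto const end   = line.find_first_not_of("0123456789abcdefABCDEFx", begin);
  return line.substr(begin, end == std::string::npos ? end : end - begin);
}

// Returns N from a trailing "[N hazards]" tag, or 0 if the tag is absent or malformed.
int hazard_count(std::string const& line)
{
  auto const tag = line.rfind(" hazards]");
  if (tag == std::string::npos) { return 0; }
  auto const open = line.rfind('[', tag);
  if (open == std::string::npos) { return 0; }
  try {
    return std::stoi(line.substr(open + 1, tag - open - 1));
  } catch (std::exception const&) {
    return 0;
  }
}

// Scans the log line by line and pairs each "Race reported between" line
// with the "and ... access at" line that follows it.
std::vector<hazard_report> parse_racecheck_log(std::string const& log)
{
  std::vector<hazard_report> reports;
  std::istringstream stream(log);
  std::string line;
  while (std::getline(stream, line)) {
    if (line.find("Race reported between") != std::string::npos) {
      reports.push_back(hazard_report{trailing_offset(line), "", 0});
    } else if (!reports.empty() && line.find("and ") != std::string::npos &&
               line.find("access at") != std::string::npos) {
      reports.back().write_offset = trailing_offset(line);
      reports.back().hazards      = hazard_count(line);
    }
  }
  return reports;
}
```

Passing the captured compute-sanitizer output as a single string to parse_racecheck_log should give one entry per report, which makes it easier to see whether the offsets or hazard counts change between builds.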