
[BUG] Race condition in PQ decoder while decoding PLAIN strings #19632

@mhaseeb123

Describe the bug
Running compute-sanitizer --tool racecheck reports a race condition in the decode_page_data_generic kernel when decoding PLAIN-encoded strings (decode_kernel_mask::STRING). Commenting out this line:

initialize_string_descriptors<is_calc_sizes_only::NO>(s, sb, target_pos, block);

seems to make the race condition go away, but since that function is entirely serial, this could be a red herring. It is also possible that the reported race condition is itself a false positive. The race condition also goes away if the column is dictionary encoded (decode_kernel_mask::STRING_DICT).
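
For context on why a serial helper can still appear in a racecheck report: racecheck flags shared-memory accesses from different threads that are not separated by a barrier, even when the writing side runs on a single thread. The sketch below is a hypothetical, standalone illustration and is not cuDF code; the names serial_init_then_read and count_mismatches are made up for this example, and whether the decoder follows this pattern is not established here.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kBlockSize = 128;

// Hypothetical kernel (not from cuDF): thread 0 fills shared memory serially,
// then every thread in the block reads one slot.
template <bool use_barrier>
__global__ void serial_init_then_read(int* out)
{
  __shared__ int slots[kBlockSize];

  // Serial section: only one thread writes.
  if (threadIdx.x == 0) {
    for (int i = 0; i < kBlockSize; ++i) { slots[i] = i; }
  }

  // Without this barrier, threads in other warps may read slots[] before
  // thread 0 has written it. That is a read/write hazard on shared memory.
  if constexpr (use_barrier) { __syncthreads(); }

  out[threadIdx.x] = slots[threadIdx.x];
}

// Launches one block and counts output slots that hold an unexpected value.
template <bool use_barrier>
int count_mismatches()
{
  int* d_out = nullptr;
  cudaMalloc(&d_out, kBlockSize * sizeof(int));
  serial_init_then_read<use_barrier><<<1, kBlockSize>>>(d_out);

  std::vector<int> h_out(kBlockSize);
  // cudaMemcpy on the default stream waits for the kernel to finish.
  cudaMemcpy(h_out.data(), d_out, kBlockSize * sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);

  int mismatches = 0;
  for (int i = 0; i < kBlockSize; ++i) {
    if (h_out[i] != i) { ++mismatches; }
  }
  return mismatches;
}
```

Calling count_mismatches<true>() should return 0. The count_mismatches<false>() variant has undefined results and is the one that would be expected to show up under racecheck, so a serial function is not by itself evidence of a false positive.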

Steps/Code to reproduce bug
Append this GTest to parquet_reader_test.cpp

TEST_F(ParquetReaderTest, RaceCondition)
{
  // Parquet buffer
  std::vector<char> parquet_buffer;
  {
    // Input table
    auto col0  = testdata::ascending<cudf::string_view>();
    auto table = cudf::table_view{{col0}};
    cudf::io::table_input_metadata expected_metadata(table);
    expected_metadata.column_metadata[0].set_name("col0");

    // Write to parquet buffer
    cudf::io::parquet_writer_options out_opts =
      cudf::io::parquet_writer_options::builder(cudf::io::sink_info{&parquet_buffer}, table)
        .metadata(std::move(expected_metadata))
        .compression(cudf::io::compression_type::ZSTD)  // Avoid Snappy: racecheck reports false
                                                        // positives in the unsnap kernel
        .stats_level(cudf::io::statistics_freq::STATISTICS_COLUMN);
    cudf::io::write_parquet(out_opts);
  }

  cudf::io::parquet_reader_options const options = cudf::io::parquet_reader_options::builder(
    cudf::io::source_info(cudf::host_span<char>(parquet_buffer.data(), parquet_buffer.size())));
  EXPECT_NO_THROW(cudf::io::read_parquet(options));
}

Then run the test under racecheck:

compute-sanitizer --tool racecheck ./PARQUET_TEST --gtest_filter="ParquetReaderTest.RaceCondition"

Expected behavior
compute-sanitizer should not report a race condition.

Environment overview

  • RDS Lab dgx05 machine
  • cuDF cuda 12.9 pip devcontainer
  • branch-25.10

Environment details
N/A

Additional context
Race condition log from compute-sanitizer:

========= Error: Race reported between Read access at void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)2>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0x8e60
=========     and Write access at void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)2>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0x9ac0 [316 hazards]
========= 
========= Error: Race reported between Read access at void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)2>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0x8e70
=========     and Write access at void cudf::io::parquet::detail::<unnamed>::decode_page_data_generic<unsigned char, (int)128, (cudf::io::parquet::detail::decode_kernel_mask)2>(cudf::io::parquet::detail::PageInfo *, cudf::device_span<const cudf::io::parquet::detail::ColumnChunkDesc, (unsigned long)18446744073709551615>, unsigned long, unsigned long, cudf::device_span<const bool, (unsigned long)18446744073709551615>, cudf::device_span<unsigned long, (unsigned long)18446744073709551615>, unsigned int *)+0x9cf0 [316 hazards]

Labels

  • bug: Something isn't working
  • cuIO: cuIO issue
  • libcudf: Affects libcudf (C++/CUDA) code.
