Skip to content

Conversation

@rjzamora
Copy link
Member

@rjzamora rjzamora commented Dec 3, 2025

Description

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@rjzamora rjzamora self-assigned this Dec 3, 2025
@rjzamora rjzamora requested a review from a team as a code owner December 3, 2025 15:43
@rjzamora rjzamora added 3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function labels Dec 3, 2025
@rjzamora rjzamora requested review from mroeschke and vyasr December 3, 2025 15:43
@rjzamora rjzamora added the non-breaking Non-breaking change label Dec 3, 2025
@github-actions github-actions bot added Python Affects Python cuDF API. cudf-polars Issues specific to cudf-polars labels Dec 3, 2025
@GPUtester GPUtester moved this to In Progress in cuDF Python Dec 3, 2025
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All changes to this file are directly reverting #20724

Comment on lines -159 to +172
small_df = _concat(
*await get_small_table(context, small_child, small_ch),
context=ir_context,
)
if context.comm().nranks > 1 and not small_duplicated:
small_dfs, small_size = await get_small_table(context, small_child, small_ch)
need_allgather = context.comm().nranks > 1 and not small_duplicated
if (
ir.options[0] != "Inner" or small_size < target_partition_size
) and not need_allgather:
# Pre-concat for non-inner joins, otherwise
# we need a local shuffle, and face additional
# memory pressure anyway.
small_dfs = [_concat(*small_dfs, context=ir_context)]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before #20724, we were only pre-concatenating for non-inner joins (to avoid the extra local shuffle).

After #20724, we were always pre-concatenating.

Now, we pre-concatenate when it makes sense:

  • To avoid the local shuffle (e.g. non-inner joins)
  • We have small data
  • We aren't already concatenating via an allgather

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

3 - Ready for Review Ready for review by team cudf-polars Issues specific to cudf-polars improvement Improvement / enhancement to an existing function non-breaking Non-breaking change Python Affects Python cuDF API.

Projects

Status: In Progress

Development

Successfully merging this pull request may close these issues.

1 participant