Skip to content

by default enable skip agg #12558

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 6, 2025

Conversation

binmahone
Copy link
Collaborator

this PR closes #12557, pls check out the descriptions there

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone
Copy link
Collaborator Author

build

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone
Copy link
Collaborator Author

build

winningsix
winningsix previously approved these changes Apr 22, 2025
@abellina
Copy link
Collaborator

abellina commented Apr 22, 2025

I believe we should mark the config internal as it doesn't seem like it is a user setting. You'd have to know that the partial aggregate has two stages, and that the first stage does some aggregation and that there are more stages that further combine these partial aggregates into bigger partial aggregates.

Side comment, this log at INFO level seems a little odd. Shouldn't this be at DEBUG level? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L995 and the INFO level one should be the case where we do skip? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L986

@sameerz sameerz added the performance A performance related task/issue label Apr 22, 2025
@binmahone
Copy link
Collaborator Author

I believe we should mark the config internal as it doesn't seem like it is a user setting. You'd have to know that the partial aggregate has two stages, and that the first stage does some aggregation and that there are more stages that further combine these partial aggregates into bigger partial aggregates.

will do

Side comment, this log at INFO level seems a little odd. Shouldn't this be at DEBUG level? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L995 and the INFO level one should be the case where we do skip? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L986

These two logs are both helpful when troubleshooting the skip agg case. How about I changing both to INFO level? I don't know what role DEBUG level logging are supposed to play for us. To me, it would be nice if a log in user production is informative enough for most troubleshootings. Asking user to run it again with DEBUG logging settings would be impractical sometimes because it might be very costly.

@binmahone
Copy link
Collaborator Author

I believe we should mark the config internal as it doesn't seem like it is a user setting. You'd have to know that the partial aggregate has two stages, and that the first stage does some aggregation and that there are more stages that further combine these partial aggregates into bigger partial aggregates.

will do

Side comment, this log at INFO level seems a little odd. Shouldn't this be at DEBUG level? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L995 and the INFO level one should be the case where we do skip? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L986

These two logs are both helpful when troubleshooting the skip agg case. How about I changing both to INFO level? I don't know what role DEBUG level logging are supposed to play for us. To me, it would be nice if a log in user production is informative enough for most troubleshootings (unless it's very verbose). Asking user to run it again with DEBUG logging settings would be impractical sometimes because it might be very costly.

@abellina
Copy link
Collaborator

I believe we should mark the config internal as it doesn't seem like it is a user setting. You'd have to know that the partial aggregate has two stages, and that the first stage does some aggregation and that there are more stages that further combine these partial aggregates into bigger partial aggregates.

will do

Side comment, this log at INFO level seems a little odd. Shouldn't this be at DEBUG level? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L995 and the INFO level one should be the case where we do skip? https://github.com/NVIDIA/spark-rapids/blob/branch-25.06/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuAggregateExec.scala#L986

These two logs are both helpful when troubleshooting the skip agg case. How about I changing both to INFO level? I don't know what role DEBUG level logging are supposed to play for us. To me, it would be nice if a log in user production is informative enough for most troubleshootings (unless it's very verbose). Asking user to run it again with DEBUG logging settings would be impractical sometimes because it might be very costly.

That's fine with me. It's once per task for the first batch only, so it should be fine for now to make them both INFO

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone
Copy link
Collaborator Author

build

Signed-off-by: Hongbin Ma (Mahone) <[email protected]>
@binmahone
Copy link
Collaborator Author

build

@binmahone
Copy link
Collaborator Author

build

@binmahone binmahone merged commit 6ce344b into NVIDIA:branch-25.06 May 6, 2025
54 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance A performance related task/issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[FEA] by default enable skip agg when the inputs of agg cannot reduce
5 participants