Fix missing exec-to-stageId mapping in Qual tool #1437

Merged
merged 11 commits into from
Dec 3, 2024


amahussein
Collaborator

Fixes #1156

This adds logic to walk the SparkGraph in order to assign execs to
stages. For nodes that have no AccumIDs, the clustering process
relies on adjacent nodes.
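
The idea of borrowing stages from adjacent nodes can be illustrated with a minimal sketch. Everything in it is hypothetical: `GraphNode`, `GraphEdge`, and `assignStages` are simplified stand-ins and not the actual `ToolsPlanGraph` code, and the sketch ignores details such as stage boundaries at exchanges that a real implementation would probably need to handle.

```scala
// Hypothetical, simplified model of a plan graph (not the tool's real classes).
final case class GraphNode(id: Long, accumIds: Set[Long])
final case class GraphEdge(fromId: Long, toId: Long)

object StageAssignSketch {
  /** Maps every node ID to a set of stage IDs, borrowing from neighbours when needed. */
  def assignStages(
      nodes: Seq[GraphNode],
      edges: Seq[GraphEdge],
      accumToStages: Map[Long, Set[Int]]): Map[Long, Set[Int]] = {
    // Step 1: nodes that own accumulators get their stages directly.
    val direct: Map[Long, Set[Int]] = nodes.map { n =>
      n.id -> n.accumIds.flatMap(a => accumToStages.getOrElse(a, Set.empty[Int]))
    }.toMap

    // Undirected adjacency: a node may borrow from either side of an edge.
    val neighbours: Map[Long, Set[Long]] =
      edges.flatMap(e => Seq(e.fromId -> e.toId, e.toId -> e.fromId))
        .groupBy(_._1)
        .map { case (id, pairs) => id -> pairs.map(_._2).toSet }

    // Step 2: nodes that still have no stage copy the stages of adjacent nodes.
    // Repeat until nothing changes; assigned nodes are never modified again.
    @scala.annotation.tailrec
    def propagate(current: Map[Long, Set[Int]]): Map[Long, Set[Int]] = {
      val next = current.map { case (id, stages) =>
        if (stages.nonEmpty) {
          id -> stages
        } else {
          val borrowed = neighbours.getOrElse(id, Set.empty[Long])
            .flatMap(nId => current.getOrElse(nId, Set.empty[Int]))
          id -> borrowed
        }
      }
      if (next == current) current else propagate(next)
    }

    propagate(direct)
  }
}
```

In this sketch a node that is not connected to any node with accumulators keeps an empty stage set, which would correspond to the unassigned case the PR tries to reduce.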

Some followups and improvements can be tracked in #1434

Refactoring and Improvements:

This pull request includes several changes to the com.nvidia.spark.rapids.tool package, focusing on improving the SQL plan analysis and stage mapping functionality. The changes involve refactoring methods, adding new interfaces, and updating tests to ensure comprehensive coverage.

  • AppSQLPlanAnalyzer.scala: Removed the SQLPlanParser.getStagesInSQLNode method and replaced it with app.getStageIDsFromAccumIds to map stages to operators more efficiently.
  • SQLPlanParser.scala: Refactored the parseSQLPlan method to use ToolsPlanGraph.createGraphWithStageClusters and updated related methods to utilize the new nodeIdToStagesFunc parameter for improved stage mapping.
  • WholeStageExecParser.scala: Updated constructors to include the nodeIdToStagesFunc parameter and refactored methods to use this function for stage mapping.
  • PhotonStageExecParser.scala: Modified the constructor to include the nodeIdToStagesFunc parameter for consistent stage mapping.
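
As a rough illustration of passing a stage-lookup function into a parser, the sketch below injects a `nodeIdToStagesFunc` through the constructor. Only the parameter name comes from the list above; its type (`Long => Set[Int]`), and the `PlanNode`, `ExecSummary`, and `ExecParserSketch` names, are assumptions made for this example and are not the actual parser classes.

```scala
// Hypothetical stand-ins for a plan node and a parser result.
final case class PlanNode(id: Long, name: String)
final case class ExecSummary(nodeId: Long, name: String, stages: Set[Int])

// The parser does not know how stages are computed; it only calls the function.
// The function type Long => Set[Int] is an assumption, not the real signature.
class ExecParserSketch(node: PlanNode, nodeIdToStagesFunc: Long => Set[Int]) {
  def parse: ExecSummary =
    ExecSummary(node.id, node.name, nodeIdToStagesFunc(node.id))
}

object ExecParserSketchDemo {
  // Example lookup backed by a precomputed node-to-stages map.
  val stageClusters: Map[Long, Set[Int]] = Map(1L -> Set(0), 2L -> Set(0, 1))
  val lookup: Long => Set[Int] = id => stageClusters.getOrElse(id, Set.empty[Int])

  val summary: ExecSummary = new ExecParserSketch(PlanNode(2L, "Project"), lookup).parse
}
```

Injecting the function keeps the parsers independent of how the mapping is built, so the same parser code can work with any lookup strategy.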

New Interface:

  • AccumToStageRetriever.scala: Introduced a new trait AccumToStageRetriever to define the interface for retrieving stage IDs from accumulables, allowing for better separation of logic and easier testing.
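
A trait of this kind might look like the sketch below. The trait name and the method name `getStageIDsFromAccumIds` appear in this description, but the signature shown here is an assumption, and `InMemoryAccumToStage` is a hypothetical implementation added only to show how such an interface could be stubbed in tests.

```scala
// Sketch only: names come from the PR description, the signature is assumed.
trait AccumToStageRetriever {
  def getStageIDsFromAccumIds(accumIds: Seq[Long]): Set[Int]
}

// Hypothetical in-memory implementation, for example as a test double.
final class InMemoryAccumToStage(accumToStages: Map[Long, Set[Int]])
    extends AccumToStageRetriever {
  // Unknown accumulator IDs contribute no stages.
  override def getStageIDsFromAccumIds(accumIds: Seq[Long]): Set[Int] =
    accumIds.flatMap(id => accumToStages.getOrElse(id, Set.empty[Int])).toSet
}
```

With an interface like this, plan-parsing code can be tested against a small fixed map instead of a full application object.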

Test Updates:

  • BasePlanParserSuite.scala and SQLPlanParserSuite.scala: Added new methods to verify that all execution nodes are assigned to stages and updated existing tests to ensure comprehensive validation of the new stage mapping logic.
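
A check that every exec ends up with at least one stage could be sketched as follows. `ExecResult` and `StageCoverageCheck` are hypothetical names for this example; the actual helper methods in the test suites are not shown in this description and may differ.

```scala
// Hypothetical parsed-exec record: which node, which exec, which stages.
final case class ExecResult(nodeId: Long, exec: String, stages: Set[Int])

object StageCoverageCheck {
  /** Returns the execs that were left without any stage. */
  def unassigned(execs: Seq[ExecResult]): Seq[ExecResult] =
    execs.filter(_.stages.isEmpty)

  /** Fails with a readable message if any exec has no stage. */
  def assertAllAssigned(execs: Seq[ExecResult]): Unit = {
    val missing = unassigned(execs).map(e => e.exec + "(" + e.nodeId + ")")
    assert(missing.isEmpty, "Execs without a stage: " + missing.mkString(", "))
  }
}
```

Returning the unassigned execs separately from the assertion makes it possible to report which operators are affected rather than only that the check failed.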

Together, these changes make the exec-to-stage assignment in the com.nvidia.spark.rapids.tool package more accurate and keep the stage-lookup logic behind the AccumToStageRetriever interface, which makes it easier to test.

Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

@amahussein amahussein added bug Something isn't working core_tools Scope the core module (scala) labels Nov 26, 2024
@amahussein amahussein self-assigned this Nov 26, 2024
@amahussein amahussein requested review from nartal1, parthosa and cindyyuanjiang and removed request for nartal1 and parthosa November 26, 2024 21:09
Collaborator

@nartal1 nartal1 left a comment

Thanks @amahussein! I tested your PR on different eventlogs. The assignment of stageIds to execs is more accurate, and I see fewer execs with no stageIds assigned to them. This will be useful as we have more execs assigned to stages.

Collaborator Author

@amahussein amahussein left a comment

Thanks @nartal1 !
I have addressed your comments.

Collaborator

@nartal1 nartal1 left a comment

LGTM. Thanks @amahussein !

@amahussein amahussein merged commit 0eb5bf5 into NVIDIA:dev Dec 3, 2024
14 checks passed
Development

Successfully merging this pull request may close these issues.

[BUG] unsupportedoperators.csv shows stageID=-1 for certain unsupported operator
2 participants