We wish to add serialization time to one of the diagnostic views in the Profiler tool. The serialization time metric was originally planned for the IO diagnostic view; however, it is a stage/task-level metric and cannot be added to the IO diagnostic view, which collects SQL/node-level metrics.
Contributes to #1374
### Changes
- Added an IO diagnostic view in Profiler output:
`io_diagnostic_metrics.csv`
- Added class `IOAccumDiagnosticMetrics` to store the selected IO-related
metric names and their helper methods
- Added class `IODiagnosticResult` to represent each IO diagnostic
result (a minimal sketch of its shape follows this list)
- Cached the results from `generateSQLAccums` in
`core/src/main/scala/com/nvidia/spark/rapids/tool/analysis/AppSQLPlanAnalyzer.scala`
and used them to compute IO diagnostic metrics in function
`generateIODiagnosticAccums`
- Added `IODiagnostics` in class `DiagnosticSummaryInfo`
- Reorganized the presentation of `AccumProfileResults` and
`SQLAccumProfileResults` for better readability
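
For reference, here is a minimal sketch of what an `IODiagnosticResult` row could look like. The field names are assumptions that mirror the columns of `io_diagnostic_metrics.csv`; the actual class added by this PR may be structured differently (e.g. it may reuse a shared statistics type from the codebase):

```scala
// Hypothetical sketch, not the PR's actual definition: one aggregate
// (min/median/max/total) per IO metric, keyed by app/SQL/stage/node.
case class MetricStats(min: Long, median: Long, max: Long, total: Long)

case class IODiagnosticResult(
    appIndex: Int,
    appName: String,
    appId: String,
    sqlId: Long,
    stageId: Int,
    stageDurationMs: Long,
    nodeId: Long,
    nodeName: String,
    outputRows: MetricStats,
    scanTime: MetricStats,
    outputBatches: MetricStats,
    bufferTime: MetricStats,
    shuffleWriteTime: MetricStats,
    fetchWaitTime: MetricStats,
    gpuDecodeTime: MetricStats)
```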
### Testing
- Added unit test "test IO diagnostic metrics" in
`core/src/test/scala/com/nvidia/spark/rapids/tool/profiling/AnalysisSuite.scala`
### Example Output
```
appIndex,appName,appId,sqlId,stageId,stageDurationMs,nodeId,nodeName,outputRowsMin,outputRowsMedian,outputRowsMax,outputRowsTotal,scanTimeMin,scanTimeMedian,scanTimeMax,scanTimeTotal,outputBatchesMin,outputBatchesMedian,outputBatchesMax,outputBatchesTotal,bufferTimeMin,bufferTimeMedian,bufferTimeMax,bufferTimeTotal,shuffleWriteTimeMin,shuffleWriteTimeMedian,shuffleWriteTimeMax,shuffleWriteTimeTotal,fetchWaitTimeMin,fetchWaitTimeMedian,fetchWaitTimeMax,fetchWaitTimeTotal,gpuDecodeTimeMin,gpuDecodeTimeMedian,gpuDecodeTimeMax,gpuDecodeTimeTotal
1,Spark shell,local-1622814619968,0,0,1743,16,"GpuColumnarExchange",1666666,1666667,1666667,10000000,0,0,0,0,200,200,200,1200,0,0,0,0,41434653,60830365,100858775,400284505,0,0,0,0,0,0,0,0
1,Spark shell,local-1622814619968,0,0,1743,21,"Scan",1666666,1666667,1666667,10000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Spark shell,local-1622814619968,0,1,1631,8,"GpuColumnarExchange",1666666,1666667,1666667,10000000,0,0,0,0,200,200,200,1200,0,0,0,0,37444140,92128351,108992798,508750471,0,0,0,0,0,0,0,0
1,Spark shell,local-1622814619968,0,1,1631,13,"Scan",1666666,1666667,1666667,10000000,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
1,Spark shell,local-1622814619968,0,2,688,3,"GpuColumnarExchange",1,1,1,200,0,0,0,0,1,1,1,200,0,0,0,0,139875,230038,9747416,93193331,0,0,0,0,0,0,0,0
```
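
As a quick sanity check, the CSV can be read with plain Scala. The snippet below is only an illustrative consumer, not part of the tool; it assumes the file sits in the working directory and, as in the sample above, contains no quoted commas:

```scala
import scala.io.Source

// Print the total shuffle write time per (stageId, nodeName) row from
// the Profiler's io_diagnostic_metrics.csv (example path).
object ReadIODiagnostics extends App {
  val source = Source.fromFile("io_diagnostic_metrics.csv")
  try {
    val lines = source.getLines().toList
    // Map each column name from the header row to its index.
    val idx = lines.head.split(",").map(_.trim).zipWithIndex.toMap
    lines.tail.foreach { line =>
      val cols = line.split(",").map(_.trim.stripPrefix("\"").stripSuffix("\""))
      val stageId = cols(idx("stageId"))
      val nodeName = cols(idx("nodeName"))
      val shuffleWriteTotal = cols(idx("shuffleWriteTimeTotal"))
      println(s"stage=$stageId node=$nodeName shuffleWriteTimeTotal=$shuffleWriteTotal")
    }
  } finally {
    source.close()
  }
}
```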
### Follow-up Issue
#1454
---------
Signed-off-by: cindyyuanjiang <[email protected]>