[BUG] Ensure AutoTuner generates functional configuration files #1489

Open
parthosa opened this issue Jan 7, 2025 · 0 comments
Labels: bug (Something isn't working), core_tools (Scope the core module (scala))

parthosa commented Jan 7, 2025

Describe the bug
The AutoTuner currently generates a configuration file (rapids_4_spark_qualification_output/tuning/app_xxx.conf) that combines existing application execution configurations with Spark RAPIDS configurations recommended by the AutoTuner. This file is intended to be directly usable for subsequent job runs. However, several issues in the generated configuration file prevent it from being used as-is.

Some of the issues identified (a sketch showing how such entries could be filtered out follows the list):

  1. Configurations with redacted values that are not functional:
--conf spark.databricks.cloudfetch.requestDownloadUrlsWithHeaders=*********(redacted)
--conf spark.databricks.cloudfetch.requesterClassName=*********(redacted)
  2. Configurations specific to the original execution, which are unnecessary or invalid for future runs:
--conf spark.app.startTime=1666840921589
--conf spark.driver.appUIAddress=<dynamic_ip_address>:<dynamic_port>
--conf spark.driver.host=<dynamic_ip_address>
--conf spark.driver.port=<dynamic_port>
  3. If the Spark RAPIDS JAR is unavailable, the AutoTuner adds a comment about its absence but does not provide a valid JAR path in the generated configuration:
--conf spark.plugins=org.apache.spark.sql.connect.SparkConnectPlugin
  4. Similarly, the AutoTuner comments on missing configurations but does not add them to the generated file with the recommended values:
- 'spark.rapids.memory.pinnedPool.size' should be set to 2048m.
- Cannot recommend RAPIDS Shuffle Manager for unsupported Spark version: '3.1.3'.
  To enable RAPIDS Shuffle Manager, use a supported Spark version (e.g., '3.5.1')
  and set: '--conf spark.shuffle.manager=com.nvidia.spark.rapids.spark351.RapidsShuffleManager'.
  See supported versions: https://docs.nvidia.com/spark-rapids/user-guide/latest/additional-functionality/rapids-shuffle.html#rapids-shuffle-manager.
  5. Some configurations are set by the CSP environment without the user's involvement:
--conf spark.dataproc.sql.joinConditionReorder.enabled=true
--conf spark.dataproc.sql.local.rank.pushdown.enabled=true
--conf spark.executorEnv.PYTHONPATH={{PWD}}/pyspark.zip<CPS>{{PWD}}/py4j-0.10.9-src.zip
--conf spark.metrics.namespace=app_name:${spark.app.name}.app_id:${spark.app.id}
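
As a rough illustration of the fix, the sketch below (Scala, matching the core module) filters out redacted and non-portable entries before the tuning file is written. The object name, the prefix list, and the sample input are hypothetical and only cover the examples above; they are not the actual AutoTuner code.

object TuningConfFilter {

  // Execution-specific or CSP-managed keys that should not be carried over
  // to a new run (assumed, non-exhaustive list based on the examples above).
  private val nonPortablePrefixes = Seq(
    "spark.app.startTime",
    "spark.driver.appUIAddress",
    "spark.driver.host",
    "spark.driver.port",
    "spark.dataproc.",
    "spark.executorEnv.PYTHONPATH",
    "spark.metrics.namespace"
  )

  // Redacted values are rendered with a "(redacted)" suffix in the event log.
  private def isRedacted(value: String): Boolean = value.endsWith("(redacted)")

  def isPortable(key: String, value: String): Boolean =
    !isRedacted(value) && !nonPortablePrefixes.exists(key.startsWith)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample of combined app + recommended configurations.
    val combinedConfs = Seq(
      "spark.databricks.cloudfetch.requesterClassName" -> "*********(redacted)",
      "spark.app.startTime" -> "1666840921589",
      "spark.rapids.memory.pinnedPool.size" -> "2048m"
    )
    // Only spark.rapids.memory.pinnedPool.size survives the filter here.
    combinedConfs.collect {
      case (k, v) if isPortable(k, v) => s"--conf $k=$v"
    }.foreach(println)
  }
}

A prefix-based deny list keeps the filter conservative; a real fix would likely also promote recommended-but-commented settings (issue 4 above) into actual entries.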

Expected behavior
The AutoTuner should generate a clean, functional configuration file that is ready for direct use with the Spark RAPIDS plugin: redacted and execution-specific entries should be dropped, and recommended values should be emitted as real --conf entries rather than comments.
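
For illustration, a functional file for the examples above might contain only portable settings such as the following (the JAR path is a placeholder; the pinned pool size and shuffle manager values come from the recommendations quoted above):

--conf spark.jars=/path/to/rapids-4-spark_2.12.jar
--conf spark.plugins=com.nvidia.spark.SQLPlugin
--conf spark.rapids.memory.pinnedPool.size=2048m
--conf spark.shuffle.manager=com.nvidia.spark.rapids.spark351.RapidsShuffleManager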
