Skip to content
Open
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions presto/scripts/setup_benchmark_data_and_tables.sh
Original file line number Diff line number Diff line change
Expand Up @@ -61,3 +61,7 @@ DATA_GEN_SCRIPT_PATH=$(readlink -f ../../benchmark_data_tools/generate_data_file
$CONVERT_DECIMALS_TO_FLOATS_ARG

./setup_benchmark_tables.sh -b $BENCHMARK_TYPE -s $SCHEMA_NAME -d $DATA_DIR_NAME

echo "Running ANALYZE TABLES for schema '$SCHEMA_NAME'..."
Copy link
Contributor

@mattgara mattgara Nov 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It probably makes sense to make this optional. Otherwise, for the foreseeable future this would place a CPU worker requirement on this script (due to the requirement on the analyze tables script.)

That is add a command line option to this script that (may be enabled by default) will toggle this behaviour on/off.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and I'm in favor of having the option "on" by default.

I think we should also add a check when the option is "on" to make sure we are running on CPU workers (as ANALYZE is currently not supported by GPU workers, as Matt alluded to). If the option is set and we are running on GPU, then I think we should abort the script with an error suggesting we run analyze_tables.sh with CPU workers

./analyze_tables.sh --schema-name "$SCHEMA_NAME"
echo "Column statistics refreshed for schema '$SCHEMA_NAME'."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: This message does not match the previous message. Consider something along the lines of "ANALYZE TABLES successfully executed for schema '$SCHEMA_NAME'".

4 changes: 4 additions & 0 deletions presto/scripts/setup_benchmark_tables.sh
Original file line number Diff line number Diff line change
Expand Up @@ -50,3 +50,7 @@ trap cleanup EXIT
--schema-name $SCHEMA_NAME \
--schemas-dir-path $TEMP_SCHEMA_DIR \
--data-dir-name $DATA_DIR_NAME

echo "Running ANALYZE TABLES for schema '$SCHEMA_NAME'..."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this leads to analyze_tables.sh being called twice, does it not? As setup_benchmark_data_and_tables.sh calls setup_benchmark_tables.sh on line 63?

I think we can just have the one copy here.

./analyze_tables.sh --schema-name "$SCHEMA_NAME"
echo "Column statistics refreshed for schema '$SCHEMA_NAME'."