
Commit 9ee1129

[SPARK-51739][PYTHON][FOLLOW-UP] Set spark.sql.execution.arrow.pyspark.validateSchema.enabled for 3.5 connect client build
### What changes were proposed in this pull request?

We have a scheduled build that tests the Spark 3.5 client against the Spark 4.0 server, and it currently fails. This change sets the legacy conf so the tests pass.

### Why are the changes needed?

The build fails as below (https://github.com/apache/spark/actions/runs/14502535136/job/40685331457):

```
======================================================================
ERROR [0.289s]: test_empty_rows (pyspark.sql.tests.connect.test_parity_arrow_map.ArrowMapParityTests.test_empty_rows)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_arrow_map.py", line 128, in test_empty_rows
    self.assertEqual(self.spark.range(10).mapInArrow(empty_rows, "a int").count(), 0)
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 248, in count
    pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1663, in toPandas
    return self._session.client.to_pandas(query)
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 873, in to_pandas
    table, schema, metrics, observed_metrics, _ = self._execute_and_fetch(
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch
    for response in self._execute_and_fetch_as_iterator(req):
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator
    self._handle_error(error)
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error
    self._handle_rpc_error(error)
  File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error
    raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) [ARROW_TYPE_MISMATCH] Invalid schema from SQL_MAP_ARROW_ITER_UDF: expected StructType(StructField(a,IntegerType,true)), got StructType(StructField(a,DoubleType,true)). SQLSTATE: 42K0G
```

### Does this PR introduce _any_ user-facing change?

No, test-only.

### How was this patch tested?

Will monitor the build.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #50613 from HyukjinKwon/SPARK-51739-followup.

Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>
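The failure above comes from a schema check: the UDF declared `"a int"` but yielded Arrow batches typed as double. The following is a minimal, hypothetical sketch of that kind of check; `validate_arrow_schema` and the dict-based schema representation are illustrative stand-ins, not PySpark internals.

```python
# Hypothetical sketch of the validation that
# spark.sql.execution.arrow.pyspark.validateSchema.enabled guards:
# compare the declared result schema of a mapInArrow UDF against the
# schema of the batches it actually produces.

def validate_arrow_schema(expected: dict, actual: dict) -> None:
    """Raise if the produced field types do not match the declared ones."""
    if expected != actual:
        raise ValueError(
            f"[ARROW_TYPE_MISMATCH] expected {expected}, got {actual}"
        )

# The failing test declared "a int" but the UDF produced doubles:
declared = {"a": "int"}
produced = {"a": "double"}

try:
    validate_arrow_schema(declared, produced)
except ValueError as e:
    print(e)
```

Disabling the conf skips this comparison, which is why the 3.5 client tests pass against a 4.0 server once the workflow sets it to `false`.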
1 parent bfe9558 commit 9ee1129

File tree

1 file changed: +2 −1 lines changed


.github/workflows/build_python_connect35.yml

Lines changed: 2 additions & 1 deletion
```diff
@@ -90,7 +90,8 @@ jobs:
           # Start a Spark Connect server for local
           PYTHONPATH="python/lib/pyspark.zip:python/lib/py4j-0.10.9.9-src.zip:$PYTHONPATH" ./sbin/start-connect-server.sh \
             --driver-java-options "-Dlog4j.configurationFile=file:$GITHUB_WORKSPACE/conf/log4j2.properties" \
-            --jars "`find connector/protobuf/target -name spark-protobuf-*SNAPSHOT.jar`,`find connector/avro/target -name spark-avro*SNAPSHOT.jar`"
+            --jars "`find connector/protobuf/target -name spark-protobuf-*SNAPSHOT.jar`,`find connector/avro/target -name spark-avro*SNAPSHOT.jar`" \
+            --conf spark.sql.execution.arrow.pyspark.validateSchema.enabled=false

           # Checkout to branch-3.5 to use the tests in branch-3.5.
           cd ..
```
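For ad-hoc runs outside CI, the same legacy conf can be passed when starting the Connect server by hand. This is a sketch of the flag added in the diff, assuming you run it from the root of a Spark 4.0 checkout; other flags from the workflow are omitted for brevity.

```shell
# Start a Spark Connect server with Arrow schema validation disabled,
# mirroring the workflow change above (config fragment, not a full setup).
./sbin/start-connect-server.sh \
  --conf spark.sql.execution.arrow.pyspark.validateSchema.enabled=false
```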

0 commit comments
