[SPARK-51739][PYTHON][FOLLOW-UP] Set spark.sql.execution.arrow.pyspark.validateSchema.enabled for 3.5 connect client build
### What changes were proposed in this pull request?
We have a scheduled build that tests the Spark 3.5 client against the Spark 4.0 server, but it currently fails. We should set the legacy conf so the tests pass.
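For reference, the conf from SPARK-51739 can be disabled on the server to restore the pre-4.0 (legacy) behavior. A minimal sketch of what that looks like in a PySpark session; the exact wiring in the CI build scripts may differ:

```python
# Sketch: disable Arrow result-schema validation introduced in SPARK-51739
# so the Spark 3.5 connect client tests keep their legacy behavior against
# a Spark 4.0 server. Setting it at session build time is one option:
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.execution.arrow.pyspark.validateSchema.enabled", "false")
    .getOrCreate()
)
```

With validation disabled, a `mapInArrow` UDF whose actual output schema drifts from the declared one (e.g. `DoubleType` instead of `IntegerType`) is no longer rejected with `ARROW_TYPE_MISMATCH`, matching the 3.5 behavior the tests expect.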
### Why are the changes needed?
Build fails as below (https://github.com/apache/spark/actions/runs/14502535136/job/40685331457):
```
======================================================================
ERROR [0.289s]: test_empty_rows (pyspark.sql.tests.connect.test_parity_arrow_map.ArrowMapParityTests.test_empty_rows)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/tests/test_arrow_map.py", line 128, in test_empty_rows
self.assertEqual(self.spark.range(10).mapInArrow(empty_rows, "a int").count(), 0)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 248, in count
pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/dataframe.py", line 1663, in toPandas
return self._session.client.to_pandas(query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 873, in to_pandas
table, schema, metrics, observed_metrics, _ = self._execute_and_fetch(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1283, in _execute_and_fetch
for response in self._execute_and_fetch_as_iterator(req):
File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1264, in _execute_and_fetch_as_iterator
self._handle_error(error)
File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1503, in _handle_error
self._handle_rpc_error(error)
File "/home/runner/work/spark/spark-3.5/python/pyspark/sql/connect/client/core.py", line 1539, in _handle_rpc_error
raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.SparkConnectGrpcException: (org.apache.spark.SparkException) [ARROW_TYPE_MISMATCH] Invalid schema from SQL_MAP_ARROW_ITER_UDF: expected StructType(StructField(a,IntegerType,true)), got StructType(StructField(a,DoubleType,true)). SQLSTATE: 42K0G
```
### Does this PR introduce _any_ user-facing change?
No, test-only.
### How was this patch tested?
Will monitor the build.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #50613 from HyukjinKwon/SPARK-51739-followup.
Authored-by: Hyukjin Kwon <[email protected]>
Signed-off-by: Hyukjin Kwon <[email protected]>