[Bug]: 'Failed to copy Non partitioned table to Column partitioned table: not supported.' #38017
Description
What happened?
I have a Python Dataflow streaming pipeline which reads from PubSub, does some processing, and writes to BigQuery. Using STORAGE_WRITE_API works fine but I'm trying to use FILE_LOADS to reduce costs, and I get the following exception:
```
Error message from worker: generic::unknown: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1498, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 912, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1057, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_file_loads.py", line 528, in process
    self.process_one(element, job_name_prefix)
  File "/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_file_loads.py", line 575, in process_one
    self.bq_wrapper.wait_for_bq_job(job_reference, sleep_duration_sec=10)
  File "/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 690, in wait_for_bq_job
    raise RuntimeError(
RuntimeError: BigQuery job beam_bq_job_COPY_pipenmeafileloads_COPY_STEP_e2c58135db064a1e869f3633a7a1b037_0b8da1df1d54f6591407114d91108002 failed. Error Result: <ErrorProto
 message: 'Failed to copy Non partitioned table to Column partitioned table: not supported.'
 reason: 'invalid'>
```

I've done this successfully in the past, and I think it could be related to a high volume of incoming messages.
I'm using DAY partitioning in the output table, and a triggering_frequency of 5 minutes. Does anyone know why this could be happening? Is there a set of parameters that could help resolve this issue?
Someone reported something similar here:
https://stackoverflow.com/questions/68556242/pub-sub-to-bigquery-batch-using-dataflow-python
Thanks.
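For reference, a minimal sketch of the write configuration described above. The table name is a placeholder, and passing a `timePartitioning` spec via `additional_bq_parameters` is only an assumption about a possible mitigation (so that tables Beam creates match the destination's DAY partitioning), not a confirmed fix:

```python
# Hedged sketch (assumptions, not a confirmed fix): keyword arguments one
# might pass to beam.io.WriteToBigQuery for the setup in this report.
write_kwargs = dict(
    table="project:dataset.table",  # placeholder destination table
    method="FILE_LOADS",            # file loads instead of STORAGE_WRITE_API
    triggering_frequency=300,       # 5-minute triggering, as in the report
    additional_bq_parameters={
        # Ask BigQuery to create tables column-partitioned by day; whether
        # this propagates to the temporary load tables is an assumption.
        "timePartitioning": {"type": "DAY"},
    },
)

# Usage (requires apache_beam[gcp]):
#   pcoll | beam.io.WriteToBigQuery(**write_kwargs)
```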
Issue Priority
Priority: 2 (default / most bugs should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner