[Bug]: 'Failed to copy Non partitioned table to Column partitioned table: not supported.' #38017

@tomaslink

What happened?

I have a Python Dataflow streaming pipeline that reads from PubSub, does some processing, and writes to BigQuery. Using STORAGE_WRITE_API works fine, but when I switch to FILE_LOADS to reduce costs, I get the following exception:

Error message from worker: generic::unknown: Traceback (most recent call last):
  File "apache_beam/runners/common.py", line 1498, in apache_beam.runners.common.DoFnRunner.process
  File "apache_beam/runners/common.py", line 912, in apache_beam.runners.common.PerWindowInvoker.invoke_process
  File "apache_beam/runners/common.py", line 1057, in apache_beam.runners.common.PerWindowInvoker._invoke_process_per_window
  File "/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_file_loads.py", line 528, in process
    self.process_one(element, job_name_prefix)
  File "/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_file_loads.py", line 575, in process_one
    self.bq_wrapper.wait_for_bq_job(job_reference, sleep_duration_sec=10)
  File "/usr/local/lib/python3.12/site-packages/apache_beam/io/gcp/bigquery_tools.py", line 690, in wait_for_bq_job
    raise RuntimeError(
RuntimeError: BigQuery job beam_bq_job_COPY_pipenmeafileloads_COPY_STEP_e2c58135db064a1e869f3633a7a1b037_0b8da1df1d54f6591407114d91108002 failed. Error Result: <ErrorProto
 message: 'Failed to copy Non partitioned table to Column partitioned table: not supported.'
 reason: 'invalid'>

I've done this successfully in the past, so I suspect it could be related to a high volume of incoming messages.
I'm using DAY partitioning on the output table and a triggering_frequency of 5 minutes. Does anyone know why this could be happening? Is there a set of parameters that could help resolve this issue?
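
For reference, here is a minimal sketch of the kind of write configuration described above. It is not the actual pipeline code. The table name and the partitioning column (`event_ts`) are hypothetical placeholders, and the helper `file_loads_write_kwargs` is only an illustration. It builds keyword arguments for `beam.io.WriteToBigQuery` with the FILE_LOADS method, a 5-minute triggering frequency, and DAY partitioning on a column passed through `additional_bq_parameters`.

```python
def file_loads_write_kwargs(table, partition_field, triggering_frequency_sec=300):
    """Build keyword arguments for beam.io.WriteToBigQuery (illustrative helper).

    `table` and `partition_field` are placeholders supplied by the caller;
    they are not taken from the real pipeline.
    """
    return {
        "table": table,
        # Same value as beam.io.WriteToBigQuery.Method.FILE_LOADS.
        "method": "FILE_LOADS",
        # Seconds between load jobs in streaming mode (5 minutes by default here).
        "triggering_frequency": triggering_frequency_sec,
        # Forwarded to the BigQuery load job configuration:
        # DAY partitioning on a column of the destination table.
        "additional_bq_parameters": {
            "timePartitioning": {"type": "DAY", "field": partition_field},
        },
    }


# Hypothetical table and column names, for illustration only.
kwargs = file_loads_write_kwargs("my-project:my_dataset.events", "event_ts")

# In the pipeline, the dict would be unpacked into the sink, roughly:
#   rows | "WriteToBQ" >> beam.io.WriteToBigQuery(**kwargs)
```

The job name in the traceback contains `COPY_STEP`, so the failure seems to occur when results are copied from temporary tables into the destination table, not in the load itself.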

Someone reported something similar here:
https://stackoverflow.com/questions/68556242/pub-sub-to-bigquery-batch-using-dataflow-python

Thanks.

Issue Priority

Priority: 2 (default / most bugs should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
