What would you like to happen?
What is the problem?
When using BigQuery Storage Write API (STORAGE_WRITE_API method) and write operations fail, the BigQueryStorageApiInsertError object does not contain information about which
table the error occurred on. This makes it difficult for users to identify and troubleshoot errors, especially when writing to multiple tables.
Current behavior:
- BigQueryStorageApiInsertError only contains the row data and error message
- Users cannot determine which table caused the error without additional logging
Expected behavior:
- BigQueryStorageApiInsertError should include table identification (project, dataset, table)
- API should be consistent with BigQueryInsertError (used by the STREAMING_INSERTS method), which provides table information via TableReference
Proposed solution:
Add the following to BigQueryStorageApiInsertError:
- A tableUrn field (format: projects/{project}/datasets/{dataset}/tables/{table})
- Convenience methods: getProjectId(), getDatasetId(), getTableId()
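To illustrate the proposed API shape, here is a minimal sketch of how the convenience methods could derive their values from the tableUrn field. The class name and parsing logic are illustrative assumptions for this proposal, not the actual Beam implementation.

```java
// Hypothetical sketch of the proposed accessors on the insert-error object.
// Assumes tableUrn always follows the standard format:
//   projects/{project}/datasets/{dataset}/tables/{table}
class StorageApiInsertErrorSketch {
    private final String tableUrn;

    StorageApiInsertErrorSketch(String tableUrn) {
        this.tableUrn = tableUrn;
    }

    // URN segments after splitting on "/":
    // [0]=projects [1]={project} [2]=datasets [3]={dataset} [4]=tables [5]={table}
    private String segment(int index) {
        return tableUrn.split("/")[index];
    }

    String getTableUrn()  { return tableUrn; }
    String getProjectId() { return segment(1); }
    String getDatasetId() { return segment(3); }
    String getTableId()   { return segment(5); }
}
```

Storing only the URN and deriving the three IDs on demand keeps the error object small while staying consistent with the TableDestination.getTableUrn() format.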
Use case:
This is particularly useful for:
- Error monitoring and alerting systems
- Debugging write failures in multi-table pipelines
- Logging and auditing
Related:
- Consistent with the existing BigQueryInsertError API
- Uses the standard TableDestination.getTableUrn() format
Additional context:
I'm willing to contribute a PR for this enhancement.
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner