-
Notifications
You must be signed in to change notification settings - Fork 4.5k
Open
Labels
Description
What would you like to happen?
The WriteToPubSub transform in the Apache Beam Python SDK does not correctly handle the ordering_key when publishing messages to Pub/Sub. While the ordering_key is correctly serialized into a PubsubMessage protobuf, it is not passed to the publish() method of the PublisherClient. As a result, message ordering is not preserved, even when an ordering_key is specified.
The issue lies in the _flush method of the _PubSubWriteDoFn class, which omits the ordering_key when calling self._pub_client.publish() here or here for
The publish method in _PubSubWriteDoFn only passes data and attributes to the underlying client, and is missing the ordering_key:
# From apache_beam/io/gcp/pubsub.py
if self.with_attributes and pubsub_msg.attributes:
future = self._pub_client.publish(
self._topic, pubsub_msg.data, **pubsub_msg.attributes)
else:
future = self._pub_client.publish(self._topic, pubsub_msg.data)
Issue Priority
Priority: 2 (default / most feature requests should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner