Beam Python PubSubIO Does Not Support Ordered Key Publishing #36201

@mohamed-ali24

What would you like to happen?

The WriteToPubSub transform in the Apache Beam Python SDK does not correctly handle the ordering_key when publishing messages to Pub/Sub. While the ordering_key is correctly serialized into a PubsubMessage protobuf, it is not passed to the publish() method of the PublisherClient. As a result, message ordering is not preserved, even when an ordering_key is specified.

The issue lies in the _flush method of the _PubSubWriteDoFn class, which omits the ordering_key when calling self._pub_client.publish().

Both branches of that call pass only data and, optionally, attributes to the underlying client:

# From apache_beam/io/gcp/pubsub.py

if self.with_attributes and pubsub_msg.attributes:
  future = self._pub_client.publish(
    self._topic, pubsub_msg.data, **pubsub_msg.attributes)
else:
  future = self._pub_client.publish(self._topic, pubsub_msg.data)
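One possible shape for a fix is sketched below. It is not the Beam implementation: `publish_with_ordering_key` and `FakePublisher` are hypothetical names used only for illustration, and the fake client stands in for `PublisherClient` so the snippet runs without GCP credentials. It assumes the message object exposes `data`, `attributes`, and `ordering_key`, and that `publish()` accepts `ordering_key` as a keyword argument, as the google-cloud-pubsub client does.

```python
from types import SimpleNamespace


def publish_with_ordering_key(pub_client, topic, pubsub_msg, with_attributes):
    """Hypothetical helper: forward ordering_key to publish() when it is set."""
    kwargs = {}
    if with_attributes and pubsub_msg.attributes:
        # Attributes are passed as extra keyword arguments, as in the snippet above.
        kwargs.update(pubsub_msg.attributes)
    # Forward the ordering key only when it is non-empty, so messages
    # without a key are published exactly as before.
    ordering_key = getattr(pubsub_msg, "ordering_key", "")
    if ordering_key:
        kwargs["ordering_key"] = ordering_key
    return pub_client.publish(topic, pubsub_msg.data, **kwargs)


class FakePublisher:
    """Stand-in for PublisherClient that records each publish() call."""

    def __init__(self):
        self.calls = []

    def publish(self, topic, data, ordering_key="", **attrs):
        self.calls.append((topic, data, ordering_key, attrs))


fake = FakePublisher()
msg = SimpleNamespace(
    data=b"payload", attributes={"source": "demo"}, ordering_key="user-1")
publish_with_ordering_key(fake, "projects/p/topics/t", msg, with_attributes=True)
```

Forwarding the key is probably not sufficient on its own: the google-cloud-pubsub publisher rejects a non-empty ordering_key unless the client was created with PublisherOptions(enable_message_ordering=True), so _PubSubWriteDoFn would likely also need to set that option when constructing its client.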

Issue Priority

Priority: 2 (default / most feature requests should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
