Wait for the Dapr health check asynchronously in aio/clients/grpc/subscription.py to avoid blocking, ensuring the asyncio gRPC stream can close properly. #839
Switch Dapr health check from blocking call to async call to avoid blocking the event loop in async environments Signed-off-by: mingsing <[email protected]>
Codecov Report
Coverage diff (main vs. #839):
             main     #839      +/-
Coverage   86.63%   87.35%   +0.72%
Files          84       94      +10
Lines        4473     6176    +1703
Hits         3875     5395    +1520
Misses        598      781     +183
         return SubscriptionMessage(message.event_message)
     except AioRpcError as e:
-        if e.code() == StatusCode.UNAVAILABLE:
+        if e.code() == StatusCode.UNAVAILABLE or e.code() == StatusCode.UNKNOWN:
StatusCode.UNKNOWN shouldn't be retriable, no?
> StatusCode.UNKNOWN shouldn't be retriable, no?
Actually, when Dapr shuts down immediately, the exception code is not always StatusCode.UNAVAILABLE; it can also surface as StatusCode.UNKNOWN, as in this previously observed error message: "gRPC error while reading from stream: api server closed, Status Code: StatusCode.UNKNOWN. Attempting to reconnect...".
The synchronous subscription implementation also retries upon encountering StatusCode.UNKNOWN, see: e7c85ce#diff-c7dbe5c0c85056e25bd5d884ad10147a7d12f4b8e9aac52513504a171404093e (dapr/clients/grpc/subscription.py)
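To make the retry rule under discussion concrete, here is a minimal sketch of a "should we reconnect?" check that treats both codes as retriable. The StatusCode enum below is a hypothetical stand-in for grpc.StatusCode (so the snippet runs without grpcio), and should_reconnect is an illustrative helper, not a function from the SDK.

```python
import enum


class StatusCode(enum.Enum):
    """Hypothetical stand-in for grpc.StatusCode (values follow the gRPC spec)."""

    OK = 0
    CANCELLED = 1
    UNKNOWN = 2
    UNAVAILABLE = 14


# Codes after which the subscription stream would be re-established.
# UNKNOWN is included because, per the comment above, an abrupt sidecar
# shutdown can surface as UNKNOWN rather than UNAVAILABLE.
RETRIABLE_CODES = frozenset({StatusCode.UNAVAILABLE, StatusCode.UNKNOWN})


def should_reconnect(code: StatusCode) -> bool:
    """Return True if a stream error with this code should trigger a reconnect."""
    return code in RETRIABLE_CODES
```

Keeping the retriable codes in one set makes the policy easy to review and to keep aligned between the sync and async subscription implementations.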
    DaprGrpcClientAsync.get_credentials = replacement_get_credentials_func
    DaprHealthAsync.get_ssl_context = replacement_get_health_context
    DaprHealth.get_ssl_context = replacement_get_health_context
Why do we still need the sync version?
> Why do we still need the sync version?
See dapr/clients/grpc/subscription.py: the __init__ method of DaprGrpcClientAsync invokes the synchronous wait_until_ready. I'm not sure whether it would be good practice to add an asynchronous factory method for creating a DaprGrpcClientAsync instance, since __init__ can only be synchronous.
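For reference, the asynchronous factory pattern mentioned above usually looks something like the sketch below. ExampleAsyncClient, create, and _wait_until_ready are hypothetical names for illustration only; this is not the SDK's DaprGrpcClientAsync, and the readiness check is a placeholder.

```python
import asyncio


class ExampleAsyncClient:
    """Hypothetical client used only to illustrate an async factory method."""

    def __init__(self, address: str) -> None:
        # __init__ stays synchronous and performs no I/O.
        self.address = address
        self.ready = False

    async def _wait_until_ready(self) -> None:
        # Placeholder for an awaitable health check against the sidecar.
        await asyncio.sleep(0)
        self.ready = True

    @classmethod
    async def create(cls, address: str) -> "ExampleAsyncClient":
        """Build the instance, then await readiness before returning it."""
        client = cls(address)
        await client._wait_until_ready()
        return client
```

A caller would write `client = await ExampleAsyncClient.create("localhost:50001")` instead of calling the constructor directly. The trade-off is an API change for existing users who construct the client synchronously.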
    while True:
        try:
            req = urllib.request.Request(health_url, headers=headers)
            with urllib.request.urlopen(req, context=DaprHealth.get_ssl_context()) as response:
Wouldn't it be better to use an async http lib like aiohttp?
Switch Dapr health check from blocking call to async call to avoid blocking the event loop in async environments
Description
This PR addresses a critical issue in the Dapr Python SDK's async pubsub subscription functionality where the gRPC stream fails to close properly during reconnection, causing the first response after reconnection to be lost. The root cause is the synchronous health check blocking the async event loop, which prevents the _StreamRequestMixin._done_writing() method from executing properly.
Details:
Solution:
The fix makes the health check asynchronous by using loop.run_in_executor(), which:
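The run_in_executor approach can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: blocking_health_check is a hypothetical stand-in for the synchronous health check (it just sleeps), and demo exists only to show that other coroutines keep running while the check is in progress.

```python
import asyncio
import time
from typing import Tuple


def blocking_health_check(delay: float = 0.05) -> bool:
    """Hypothetical stand-in for a synchronous, blocking health check."""
    time.sleep(delay)
    return True


async def wait_until_ready_async() -> bool:
    """Run the blocking check in a worker thread so the event loop stays free."""
    loop = asyncio.get_running_loop()
    # Passing None selects the loop's default ThreadPoolExecutor.
    return await loop.run_in_executor(None, blocking_health_check)


async def demo() -> Tuple[bool, int]:
    """Show that a background task keeps ticking while the check runs."""
    ticks = 0

    async def ticker() -> None:
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.005)

    task = asyncio.create_task(ticker())
    healthy = await wait_until_ready_async()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return healthy, ticks
```

If blocking_health_check were called directly inside the coroutine, ticker would not advance until it returned; offloading it to the executor is what lets pending stream callbacks run during the wait.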