
fix: do not create many redis subscriptions per token - fast yielding background #4419

Draft
wants to merge 4 commits into main

Conversation

benedikt-bartscher (Contributor)

No description provided.

benedikt-bartscher (Contributor, Author)

@masenf any idea why fast-yielding background tasks seem to lock up after some state updates?

benedikt-bartscher (Contributor, Author)

@masenf I accidentally pushed unfinished changes and forgot to mention that the test needs to be run with redis.

bonus: properly clean up StateManager connections on disconnect
@benedikt-bartscher benedikt-bartscher changed the title test fast yielding in background task fix: do not create many redis subscriptions per token - fast yielding background Nov 23, 2024
@benedikt-bartscher benedikt-bartscher marked this pull request as ready for review November 23, 2024 20:04
benedikt-bartscher (Contributor, Author) commented Nov 24, 2024

I pushed 389f4c7, which fixes the issue.
Another idea to improve this even further: one shared subscription per token and worker (if I am not mistaken, we basically need to lock a token to a worker on redis; any further locking can be done with asyncio). Reuse the subscription and release/garbage-collect it after a delay.

However, I would merge this as-is, because it works and does not change the behavior much. I would then try implementing the other idea in a follow-up.
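For illustration only, a minimal sketch of what that follow-up could look like, using redis-py's asyncio client; the class name, channel handling, and RELEASE_DELAY are all hypothetical and not part of this PR:

```python
# Hypothetical sketch (not the PR's implementation): one shared pubsub
# subscription per token, reused by all waiters in this worker and
# garbage-collected after a short delay once the last waiter is gone.
import asyncio

import redis.asyncio as redis

RELEASE_DELAY = 5.0  # seconds to keep an idle subscription around (assumed)


class SharedTokenSubscription:
    """Reference-counted subscription for a single token's lock channel."""

    def __init__(self, client: redis.Redis, channel: str):
        self._pubsub = client.pubsub()
        self._channel = channel
        self._refs = 0
        self._close_task = None

    async def acquire(self) -> "SharedTokenSubscription":
        # Cancel any pending delayed close and (re)subscribe if needed.
        if self._close_task is not None:
            self._close_task.cancel()
            self._close_task = None
        if self._refs == 0:
            await self._pubsub.subscribe(self._channel)
        self._refs += 1
        return self

    async def wait_for_message(self, timeout: float = 1.0):
        # All waiters for this token poll the same subscription.
        return await self._pubsub.get_message(
            ignore_subscribe_messages=True, timeout=timeout
        )

    def release(self) -> None:
        # The last waiter schedules a delayed unsubscribe instead of closing now.
        self._refs -= 1
        if self._refs == 0:
            self._close_task = asyncio.create_task(self._delayed_close())

    async def _delayed_close(self) -> None:
        await asyncio.sleep(RELEASE_DELAY)
        await self._pubsub.unsubscribe(self._channel)
        await self._pubsub.close()
```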

@@ -3475,6 +3502,19 @@ async def _wait_lock(self, lock_key: bytes, lock_id: bytes) -> None:
break
state_is_locked = await self._try_get_lock(lock_key, lock_id)

@override
async def disconnect(self, token: str):
benedikt-bartscher (Contributor, Author)

This is probably wrong. Clients can reconnect. We need to check if there are any unexpired redis tokens.
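As context for that concern, one hedged way to make the cleanup reconnect-safe is to drop per-token resources only when no unexpired keys remain. The `{token}_*` key layout and the class below are assumptions for illustration, not taken from this PR:

```python
# Hypothetical sketch, not the PR's code: only clean up when no unexpired
# redis keys remain for this token. Assumes state keys live under a
# "{token}_*" prefix, which is NOT confirmed here.
import redis.asyncio as redis


class TokenAwareCleanup:
    def __init__(self, client: redis.Redis):
        self.redis = client

    async def _token_has_live_state(self, token: str) -> bool:
        async for _key in self.redis.scan_iter(match=f"{token}_*"):
            return True  # at least one unexpired key: the client may reconnect
        return False

    async def disconnect(self, token: str) -> None:
        if await self._token_has_live_state(token):
            return  # keep state and subscriptions; a reconnect may follow
        # ...actual cleanup of per-token subscriptions/locks would go here...
```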

masenf (Collaborator) commented Nov 26, 2024

Edit: oops, I thought this was the StateManagerMemory.disconnect func; either way, we need to be careful about reconnects, which can be triggered by a simple page refresh.

I had a draft comment on my in-progress review calling out this function, but I suppose your testing found it first. We need to keep the in-memory states around. We do have an internal bug to add an expiry for the memory state manager, but it just hasn't been high priority because we assume no one is using the memory state manager in production.

benedikt-bartscher (Contributor, Author) commented Nov 27, 2024

Maybe one asyncio Lock is enough; then we won't need to garbage collect.
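A minimal sketch of the "one asyncio Lock per token" idea, assuming a module-level registry; the names are hypothetical:

```python
# Hypothetical sketch: one in-process asyncio.Lock per token, so waiters in
# the same worker are serialized without extra redis subscriptions. The
# redis lock would still guard against other workers.
import asyncio
from collections import defaultdict

_token_locks: dict[str, asyncio.Lock] = defaultdict(asyncio.Lock)


async def modify_state(token: str) -> None:
    async with _token_locks[token]:
        # Critical section: only one task per token per worker runs here.
        ...
```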

benedikt-bartscher (Contributor, Author)

Note to self: subscribe to redis expire events to clean up locks; this may be easier with #4459.
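A hedged sketch of listening for redis key-expiry events; this relies on the server having `notify-keyspace-events` configured to include expired-key events (e.g. "Ex"), and the db number and cleanup hook are assumptions:

```python
# Hypothetical sketch: subscribe to redis expired-key notifications so stale
# lock bookkeeping can be dropped when the underlying key expires.
import redis.asyncio as redis


async def watch_expired_locks(client: redis.Redis, db: int = 0) -> None:
    pubsub = client.pubsub()
    # Expiry notifications are published on the keyevent channel for the db.
    await pubsub.psubscribe(f"__keyevent@{db}__:expired")
    async for message in pubsub.listen():
        if message["type"] != "pmessage":
            continue
        expired_key = message["data"]
        # Drop any local bookkeeping (e.g. in-process locks) tied to this key.
        print("expired:", expired_key)
```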

@benedikt-bartscher benedikt-bartscher marked this pull request as draft November 26, 2024 23:17
benedikt-bartscher (Contributor, Author)

@masenf is it true that the current state locking does not guarantee FIFO ordering? IMO the longest-waiting task should get the lock next.
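Purely to illustrate the FIFO point (this is not how reflex currently schedules lock waiters), a sketch of serving waiters in arrival order with asyncio:

```python
# Hypothetical sketch: a per-token FIFO queue of futures, so the
# longest-waiting task is always woken next on release.
import asyncio
from collections import deque

_waiters: dict[str, deque] = {}


async def wait_for_lock(token: str) -> None:
    queue = _waiters.setdefault(token, deque())
    fut = asyncio.get_running_loop().create_future()
    queue.append(fut)
    if len(queue) == 1:
        fut.set_result(None)  # nobody ahead of us; take the lock immediately
    await fut


def release_lock(token: str) -> None:
    queue = _waiters[token]
    queue.popleft()  # remove ourselves (head of the queue)
    if queue:
        queue[0].set_result(None)  # wake the longest-waiting task next
```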

benedikt-bartscher (Contributor, Author)

I have a local draft where only one pubsub instance is created per reflex worker. All lock releases are distributed to the waiting tasks via asyncio locks.
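A rough sketch of that shape, assuming lock releases are published on per-token channels; the channel naming and the class below are hypothetical, not the draft itself:

```python
# Hypothetical sketch: a single pubsub instance per worker fans lock-release
# messages out to whichever tasks are waiting on that token, using in-process
# asyncio.Event objects instead of one subscription per waiting task.
import asyncio

import redis.asyncio as redis


class LockReleaseDispatcher:
    def __init__(self, client: redis.Redis):
        self._pubsub = client.pubsub()
        self._events: dict[str, asyncio.Event] = {}
        self._task = None

    async def start(self) -> None:
        # Assumed channel scheme: one "lock_release.<token>" channel per token.
        await self._pubsub.psubscribe("lock_release.*")
        self._task = asyncio.create_task(self._dispatch())

    async def _dispatch(self) -> None:
        async for message in self._pubsub.listen():
            if message["type"] != "pmessage":
                continue
            token = message["channel"].decode().rsplit(".", 1)[-1]
            event = self._events.get(token)
            if event is not None:
                event.set()  # wake every task in this worker waiting on the token

    async def wait_for_release(self, token: str) -> None:
        event = self._events.setdefault(token, asyncio.Event())
        await event.wait()
        event.clear()  # re-arm for the next wait (good enough for a sketch)
```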
