Rate Limit and Retry for Models #1734
base: main
Conversation
@grll Nice work! Looks like we still have some failing tests, let me know if you'd like some guidance there. As for the code, can we please reduce the duplication a bit by moving the …
+1 this is nicely done.
@grll @DouweM what do you think about having a GCRA implementation, as provided by something like https://github.com/ZhuoZhuoCrayon/throttled-py?
It's a more efficient variant of the leaky bucket algorithm without its downsides (i.e., it doesn't need a background "drip" process). See https://brandur.org/rate-limiting
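For context, here is a rough Python sketch of the GCRA idea, illustrative only and not throttled-py's implementation: the entire state is a single timestamp, the "theoretical arrival time" (TAT), which is why no background drip process is needed.

```python
import time


class GCRA:
    """Allow one request per `period` seconds, with a burst allowance.
    State is a single float (the theoretical arrival time), so there is
    no queue to drain and no background process."""

    def __init__(self, period: float, burst: int = 1):
        self.period = period  # emission interval between requests
        self.burst = burst    # how many requests may arrive back-to-back
        self.tat = 0.0        # theoretical arrival time of the next request

    def allow(self) -> bool:
        now = time.monotonic()
        tat = max(self.tat, now)
        # Earliest instant at which this request still fits the burst window.
        allow_at = tat - (self.burst - 1) * self.period
        if now < allow_at:
            return False
        self.tat = tat + self.period  # advance the TAT by one interval
        return True
```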
Hey, thanks for the review. I will have a look; I was indeed struggling a bit with the tests in CI while they were working fine locally, so I need to investigate a bit more. I can also have a look at the refactor you suggested!
Hey, thanks for the suggestion. The initial goal with using aiolimiter was to keep things as simple as possible, but we could instead integrate with throttled-py to let users choose from various algorithms / Redis backends as well.
@grll I am the developer of throttled-py. throttled-py provides flexible rate-limiting strategies and storage backend configurations. I am very willing to participate in the discussion and implementation of this PR.
Hey @ZhuoZhuoCrayon, thanks for jumping in, and great work on throttled-py. One concern I have is that asyncio support is limited in the current implementation of throttled-py. What is the current status around asyncio?
@grll Features are in the development branch, and a stable version will be released this weekend.
throttled-py asynchronous support has been officially released in v2.1.0. The core API is the same for synchronous and asynchronous code; just replace `from throttled import ...` with `from throttled.asyncio import ...`:

```python
import asyncio

from throttled.asyncio import RateLimiterType, Throttled, rate_limiter, store, utils

throttle = Throttled(
    using=RateLimiterType.GCRA.value,
    quota=rate_limiter.per_sec(1_000, burst=1_000),
    store=store.MemoryStore(),
)


async def call_api() -> bool:
    result = await throttle.limit("/ping", cost=1)
    return result.limited


async def main():
    benchmark: utils.Benchmark = utils.Benchmark()
    denied_num: int = sum(await benchmark.async_serial(call_api, 100_000))
    print(f"❌ Denied: {denied_num} requests")


if __name__ == "__main__":
    asyncio.run(main())
```

I think GCRA has lower performance overhead and smoother throttling. At the same time, making the storage backend optional can increase subsequent scalability. Anyway, adding throttling and retries to the model is very cool, thank you!
@ZhuoZhuoCrayon Thanks a ton, throttled looks great! @grll Are you up for changing this PR to use throttled instead?
Sure, happy to!
Adds a quite popular feature request: rate limit and retry support for models.

It is implemented as a wrapper, like InstrumentedModel. It leverages aiolimiter for a simple implementation of the leaky bucket algorithm for rate limiting, while retries leverage the tenacity library.

Usage:
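A minimal sketch of the pattern, assuming hypothetical names (`RateLimitedModel`, its constructor parameters, and the `request` method on the wrapped model) rather than the PR's actual API:

```python
import asyncio

from aiolimiter import AsyncLimiter
from tenacity import AsyncRetrying, stop_after_attempt, wait_exponential


class RateLimitedModel:
    """Hypothetical wrapper sketch, not this PR's actual class: delegates
    to an inner model, gated by a leaky bucket (aiolimiter) and retried
    on transient failures (tenacity)."""

    def __init__(self, wrapped, max_rate: float = 10.0, time_period: float = 1.0, attempts: int = 3):
        self.wrapped = wrapped
        # Allow at most `max_rate` requests per `time_period` seconds.
        self.limiter = AsyncLimiter(max_rate=max_rate, time_period=time_period)
        self.attempts = attempts

    async def request(self, *args, **kwargs):
        # Retry with exponential backoff; wait for a rate-limit slot
        # before every attempt so retries are also throttled.
        async for attempt in AsyncRetrying(
            stop=stop_after_attempt(self.attempts),
            wait=wait_exponential(multiplier=1, max=30),
        ):
            with attempt:
                async with self.limiter:
                    return await self.wrapped.request(*args, **kwargs)
```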