Anthropic OTPM limits are estimated with max_tokens at the start of requests #386

Open
RyanMarten opened this issue Jan 21, 2025 · 0 comments

RyanMarten commented Jan 21, 2025

Still seeing rate limits on Anthropic because of how OTPM usage is estimated:

OTPM rate limits are estimated based on max_tokens at the beginning of each request, and the estimate is adjusted at the end of the request to reflect the actual number of output tokens used. If you’re hitting OTPM limits earlier than expected, try reducing max_tokens to better approximate the size of your completions.

https://docs.anthropic.com/en/api/rate-limits

We need to assume the model's max_tokens when we send a request, but we can unblock once we get the response.
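A minimal sketch of that idea, not the project's actual rate limiter: charge the full max_tokens against the budget when a request goes out, then replace that charge with the real output token count once the response arrives. The class name `OutputTokenBudget`, its methods, and the simple 60-second sliding window are all assumptions made for illustration; the provider's own accounting may work differently.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class OutputTokenBudget:
    """Hypothetical client-side tracker for an output-tokens-per-minute limit."""

    limit_per_minute: int
    window_seconds: float = 60.0  # assumed sliding window, for illustration only
    # request_id -> (timestamp, tokens currently counted against the limit)
    _charges: Dict[str, Tuple[float, int]] = field(default_factory=dict)

    def _used(self, now: float) -> int:
        # Drop charges that have aged out of the window, then sum the rest.
        self._charges = {
            rid: (ts, tokens)
            for rid, (ts, tokens) in self._charges.items()
            if now - ts < self.window_seconds
        }
        return sum(tokens for _, tokens in self._charges.values())

    def try_reserve(
        self, request_id: str, max_tokens: int, now: Optional[float] = None
    ) -> bool:
        """Charge the full max_tokens up front; return False if it does not fit."""
        now = time.monotonic() if now is None else now
        if self._used(now) + max_tokens > self.limit_per_minute:
            return False
        self._charges[request_id] = (now, max_tokens)
        return True

    def reconcile(self, request_id: str, actual_output_tokens: int) -> None:
        """Swap the up-front estimate for the actual usage from the response."""
        if request_id in self._charges:
            ts, _ = self._charges[request_id]
            self._charges[request_id] = (ts, actual_output_tokens)
```

With a limit of 1000, reserving 800 for one request blocks a second request of 800 until the first is reconciled down to its actual usage (for example, the `usage.output_tokens` value in the response), at which point the second reservation fits.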

@adamoptimizer

Originally posted by @RyanMarten in #373 (comment)
