Anthropic OTPM limits estimate with max_tokens at the start of requests #386

RyanMarten · 2025-01-21T09:16:05Z

Still seeing rate limits on Anthropic because of this behavior, quoted from the docs:

OTPM rate limits are estimated based on max_tokens at the beginning of each request, and the estimate is adjusted at the end of the request to reflect the actual number of output tokens used. If you’re hitting OTPM limits earlier than expected, try reducing max_tokens to better approximate the size of your completions.

https://docs.anthropic.com/en/api/rate-limits

When we send a request, we need to assume it will consume max_tokens output tokens and block that much capacity, but we can release the unused part once we get the response.
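A rough sketch of the reserve-then-settle idea, not the actual implementation in this repo. `OutputTokenBudget`, `try_reserve`, and `settle` are hypothetical names, and the continuous per-second refill is an assumption about how the limit replenishes:

```python
import time


class OutputTokenBudget:
    """Hypothetical client-side tracker for an output-tokens-per-minute limit."""

    def __init__(self, otpm_limit, clock=time.monotonic):
        self.limit = otpm_limit
        self.available = float(otpm_limit)
        self.clock = clock  # injectable so the tracker can be tested
        self.last_refill = clock()

    def _refill(self):
        # Assumption: capacity comes back continuously at limit / 60 per second.
        now = self.clock()
        elapsed = now - self.last_refill
        self.available = min(self.limit, self.available + elapsed * self.limit / 60.0)
        self.last_refill = now

    def try_reserve(self, max_tokens):
        """Block out max_tokens before sending; return False if there is no room."""
        self._refill()
        if max_tokens > self.available:
            return False
        self.available -= max_tokens
        return True

    def settle(self, max_tokens, actual_output_tokens):
        """Give back the part of the reservation the response did not use."""
        self._refill()
        unused = max(0, max_tokens - actual_output_tokens)
        self.available = min(self.limit, self.available + unused)
```

The caller would invoke `try_reserve(max_tokens)` before sending and `settle(max_tokens, actual)` after the response arrives, where `actual` is the output token count reported in the response.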

@adamoptimizer

Originally posted by @RyanMarten in #373 (comment)

adamoptimizer self-assigned this Jan 21, 2025

adamoptimizer mentioned this issue Jan 21, 2025

ref: block capacity by max_tokens for anthropic online #387

Open
