I'm thinking, similarly to litellm, we can store this in a JSON file (easily readable and editable by the community). Their file stores fields relevant to sending a single request: https://github.com/BerriAI/litellm/blob/main/model_prices_and_context_window.json
We should store fields relevant to sending many requests:
{
  "batch_available": true,
  "batch_discount_factor": 0.5,
  "max_requests_per_batch": Z,
  "max_batch_file_size": W,
  "rate_limit_strategy": "separate",  # could also be concurrent tokens, etc.
  "max_input_tokens_per_minute": Y,
  "max_output_tokens_per_minute": X,
  "max_requests_per_minute": V
}
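A minimal sketch of the loading side, assuming a file named model_rate_limits.json keyed by model name (the file name, function name, and lookup key are illustrative, not an existing curator API):

# Illustrative only: load per-model bulk-request defaults from a community-maintained JSON file.
import json
from pathlib import Path

DEFAULTS_PATH = Path("model_rate_limits.json")  # hypothetical file following the schema above

def load_rate_limit_defaults(model_name: str) -> dict:
    """Return the stored defaults for a model, or an empty dict if the model is unknown."""
    with open(DEFAULTS_PATH) as f:
        all_defaults = json.load(f)
    return all_defaults.get(model_name, {})

defaults = load_rate_limit_defaults("gemini/gemini-1.5-flash")
# Fall back to the current conservative value only when the model has no entry.
max_rpm = defaults.get("max_requests_per_minute", 10)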
Sometimes this information is contained in the headers, but most of the time it is not, so the user has to find these values in the provider's documentation and set them manually in backend_params. We could do this for the user automatically by storing the values in this file and loading them in.
For example, Gemini doesn't provide rate-limit information in the headers. With this file, a user who specifies a Gemini model gets a reasonable default (instead of our current, very conservative fallback of 10 RPM). See how we currently have to specify the values by hand in examples/litellm-recipe-generation/litellm_recipe_prompting.py (lines 49 to 59 at b083f69).
There is an obvious complicating factor: different tiers and custom rate-limit values per user. But those can be defined manually by the user (as we already require them to do now, even on a standard tier). The point of this is to have reasonable defaults.
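As a rough illustration of the intended precedence (the field names come from the proposed file above; the merge logic and numbers are illustrative, not current curator behaviour): stored defaults apply unless the user passes explicit backend_params, e.g. because they are on a higher tier.

# Stored defaults for the model, e.g. loaded as in the sketch above (values illustrative).
stored_defaults = {"max_requests_per_minute": 1000, "max_input_tokens_per_minute": 4_000_000}

# A user with a raised quota overrides only the fields they need to.
backend_params = {"max_requests_per_minute": 2000}

# User-supplied values always take precedence over the stored defaults.
effective_params = {**stored_defaults, **backend_params}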
They also have a nice website where you can search and compare providers (we could do the same for bulk inference and batching): https://models.litellm.ai/