add openai client backend #565
base: main
Conversation
- Test calling OpenAI through curator using the new "openai_client" backend.
- Test calling DeepSeek through the OpenAI client directly.
- Test calling DeepSeek through curator using the openai_client backend (hello): finishes in 9 s.
- Test calling DeepSeek through curator using the openai_client backend (reasoning problem): ~10 min for 15k tokens.
- Testing hello with 50 RPM and 100 requests: takes ~2 min at around 44 RPM.
- Testing s1 with 50 RPM and 100 requests: ~17 min at ~6 RPM.
- Testing s1 with 100 RPM and 1,000 requests: ~25 min at ~44 RPM.
- Testing 5,000 requests with 500 RPM: with max in progress at 1,600 and max retries at 10, only 6 straggler requests were still in progress at the end; canceled after 47 minutes, since a couple were on their third try.
The bottleneck here is the OpenAI client itself. Starting another run in parallel on a different machine gives good throughput there (~250 RPM). Use multiple clients in the backend? I will try implementing this as a new backend parameter. The parallel-machine approach just uses multiple curator instances, whose outputs you then need to merge together afterwards; a minimal sketch of that merge follows.
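For illustration only, a minimal sketch of that merge step, assuming each curator instance saved its output as a Hugging Face dataset on disk; all paths here are made up:

```python
from datasets import concatenate_datasets, load_from_disk

# Hypothetical workaround: two curator runs on separate machines each
# saved a shard of the output; merge the shards afterwards.
shard_a = load_from_disk("run_machine_a")  # output of instance 1
shard_b = load_from_disk("run_machine_b")  # output of instance 2

merged = concatenate_datasets([shard_a, shard_b])
merged.save_to_disk("merged_run")
```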
- Add num_clients parameter to OnlineBackendParams and OnlineRequestProcessorConfig
- Modify OpenAIClientOnlineRequestProcessor to initialize multiple clients
- Implement round-robin client selection in the call_single_request method (see the sketch below)
- Add an example file demonstrating multi-client usage with DeepSeek
- Add unit tests for the multi-client functionality

This enhancement addresses bottlenecks in parallel request handling by allowing users to create multiple AsyncOpenAI clients that distribute requests in a round-robin fashion.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
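As a concrete illustration of the round-robin selection described above, a minimal standalone sketch; `ClientPool`, its method names, and the `DEEPSEEK_API_KEY` variable are assumptions, while `AsyncOpenAI` and `chat.completions.create` are the real openai-python v1 API:

```python
import asyncio
import os
from itertools import cycle

from openai import AsyncOpenAI


class ClientPool:
    """Illustrative round-robin pool of AsyncOpenAI clients."""

    def __init__(self, num_clients: int, **client_kwargs):
        # Each client gets its own HTTP connection pool, which is the
        # point of the multi-client change described above.
        self._clients = [AsyncOpenAI(**client_kwargs) for _ in range(num_clients)]
        self._rotation = cycle(self._clients)

    async def call_single_request(self, **request_kwargs):
        # Round-robin: hand each request the next client in turn.
        client = next(self._rotation)
        return await client.chat.completions.create(**request_kwargs)


async def main():
    # DeepSeek exposes an OpenAI-compatible endpoint; the env var
    # name is an assumption for this sketch.
    pool = ClientPool(
        num_clients=4,
        base_url="https://api.deepseek.com",
        api_key=os.environ["DEEPSEEK_API_KEY"],
    )
    response = await pool.call_single_request(
        model="deepseek-reasoner",
        messages=[{"role": "user", "content": "hello"}],
    )
    print(response.choices[0].message.content)


if __name__ == "__main__":
    asyncio.run(main())
```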
multiple clients, one curator
Thanks! Please override the existing backend.
llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
)
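Hypothetically, the num_clients knob added in this PR could ride along via backend_params; the exact parameter surface here is an assumption based on the commit message, and Reasoner is the wrapper from the diff above:

```python
# Hypothetical usage; "num_clients" comes from this PR's
# OnlineBackendParams addition, and the backend_params route
# is an assumption.
llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
    backend_params={"num_clients": 4},
)
```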
`openai` vs. `openai_client`: isn't that confusing?
_OPENAI_ALLOWED_IMAGE_SIZE_MB = 20


class OpenAIClientOnlineRequestProcessor(BaseOnlineRequestProcessor, OpenAIRequestMixin):
Please inherit from OpenAIOnlineProcessor, since there is too much redundancy with the existing one. Let's override only the functions that change; a rough sketch follows.
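A rough sketch of that suggestion. The parent's internals are not visible in this diff, so they are stubbed here as an assumption; only the two class names come from the PR and this comment:

```python
from openai import AsyncOpenAI


# Stub standing in for curator's existing processor. The real class
# named in the comment above is OpenAIOnlineProcessor; its internals
# are not shown in this diff, so this body is an assumption.
class OpenAIOnlineProcessor:
    def __init__(self, **client_kwargs):
        self.client = AsyncOpenAI(**client_kwargs)

    async def call_single_request(self, request: dict):
        return await self.client.chat.completions.create(**request)


class OpenAIClientOnlineRequestProcessor(OpenAIOnlineProcessor):
    """Inherit everything and override only what changes, as suggested:
    here, just the client setup and per-request client selection."""

    def __init__(self, num_clients: int = 1, **client_kwargs):
        super().__init__(**client_kwargs)
        # Replace the parent's single client with a round-robin pool.
        self._clients = [AsyncOpenAI(**client_kwargs) for _ in range(num_clients)]
        self._next = 0

    async def call_single_request(self, request: dict):
        client = self._clients[self._next % len(self._clients)]
        self._next += 1
        return await client.chat.completions.create(**request)
```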
@@ -0,0 +1,76 @@
import unittest
from unittest.mock import AsyncMock, MagicMock, patch
Let's integration-test it, since the other backends are also tested that way; having this as a unit test would be a bit weird. A minimal sketch follows.
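An integration-style sketch in the spirit of this comment, exercising the live backend instead of mocking AsyncOpenAI; the Reasoner import path and the DEEPSEEK_API_KEY gate are assumptions:

```python
import os

import pytest

# Hypothetical import; the example file added in this PR is the likely
# home of Reasoner, but the exact module path is an assumption.
from examples.deepseek_reasoner import Reasoner


@pytest.mark.skipif("DEEPSEEK_API_KEY" not in os.environ, reason="needs a live API key")
def test_openai_client_backend_end_to_end():
    # Hit the real endpoint end to end, like the other backend tests.
    llm = Reasoner(
        model_name="deepseek-reasoner",
        backend="openai_client",
    )
    responses = llm(["Reply with the single word: pong"])
    assert len(responses) == 1
```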
Test ping