add openai client backend #565

Open · wants to merge 10 commits into base: main
Conversation

@RyanMarten commented Mar 2, 2025

Test ping

curl https://api.deepseek.com/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${DEEPSEEK_API_KEY}" \
  -d '{
        "model": "deepseek-reasoner",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant."},
          {"role": "user", "content": "Hello!"}
        ],
        "stream": false
      }'

@RyanMarten closed this Mar 2, 2025
@RyanMarten reopened this Mar 2, 2025
@RyanMarten commented Mar 2, 2025

Test calling OpenAI through curator using the new "openai_client" backend

from bespokelabs import curator

# Make sure the backend works
llm = curator.LLM(model_name="gpt-4o-mini", backend="openai_client")
response = llm("Hello, world!")
print(response["response"])

@RyanMarten commented Mar 2, 2025

Test calling DeepSeek through the OpenAI client

# Make sure that the client works.
# "Hello" answers very quickly; the reasoning question below takes ~2m14s.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("DEEPSEEK_API_KEY"), base_url="https://api.deepseek.com")

response = client.chat.completions.create(
    model="deepseek-reasoner",
    messages=[
        {"role": "system", "content": "You are a helpful assistant"},
        {"role": "user", "content": "Given a rational number, write it as a fraction in lowest terms and calculate the product of the resulting numerator and denominator. For how many rational numbers between 0 and 1 will $20_{}^{}!$ be the resulting product?"}, # noqa
        # {"role": "user", "content": "Hello"},
    ],
    stream=False,
)
print(response.choices[0].message.content)
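
Note: for deepseek-reasoner, the completion also carries the model's chain of thought in a reasoning_content field alongside content (a DeepSeek extension to the OpenAI response schema); the curator examples below read the same field out of the raw completions object.

reasoning = response.choices[0].message.reasoning_content  # DeepSeek-specific field
print(reasoning)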

@RyanMarten commented Mar 2, 2025

Test calling DeepSeek through curator using the openai_client backend (hello)

from bespokelabs import curator
import os

class Reasoner(curator.LLM):
    """Curator class for reasoning."""

    return_completions_object = True

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        input["deepseek_reasoning"] = response["choices"][0]["message"]["reasoning_content"]
        input["deepseek_solution"] = response["choices"][0]["message"]["content"]
        return input


llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
    generation_params={"temperature": 0.0},
    backend_params={
        "max_requests_per_minute": 5,
        "max_tokens_per_minute": 100_000_000,
        "base_url": "https://api.deepseek.com/",
        "api_key": os.environ.get("DEEPSEEK_API_KEY"),
    },
)

ds = llm("Hello")
print("REASONING: ", ds[0]["deepseek_reasoning"])
print("\n\nSOLUTION: ", ds[0]["deepseek_solution"])

Finishes in 9s with the following response:

REASONING: Okay, the user just said "Hello". I should respond in a friendly and welcoming manner. Let me make sure to keep it natural and not too formal. Maybe add a smiley to keep the tone light. Something like, "Hello! How can I assist you today? 😊" That should work. Wait, should I use an emoji here? The previous example did, so maybe it's okay. Alright, that's a good response.
SOLUTION: Hello! How can I assist you today? 😊

@RyanMarten commented Mar 2, 2025

Test calling DeepSeek through curator using the openai_client backend (reasoning problem)

from bespokelabs import curator
import os
from datasets import load_dataset

class Reasoner(curator.LLM):
    """Curator class for reasoning."""

    return_completions_object = True

    def prompt(self, input):
        """Create a prompt for the LLM to reason about the problem."""
        return [{"role": "user", "content": input["question"]}]

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        input["deepseek_reasoning"] = response["choices"][0]["message"]["reasoning_content"]
        input["deepseek_solution"] = response["choices"][0]["message"]["content"]
        return input


llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
    generation_params={"temperature": 0.0},
    backend_params={
        "max_requests_per_minute": 5,
        "max_tokens_per_minute": 100_000_000,
        "base_url": "https://api.deepseek.com/",
        "api_key": os.environ.get("DEEPSEEK_API_KEY"),
    },
)

ds = load_dataset("simplescaling/s1K", split="train")
ds = ds.remove_columns(["thinking_trajectories", "cot", "attempt"])
ds = llm(ds.take(1))
print("REASONING: ", ds[0]["deepseek_reasoning"])
print("\n\nSOLUTION: ", ds[0]["deepseek_solution"])

[Screenshot: 2025-03-02 at 10:59:51 AM]

~10 minutes for 15k tokens.

@RyanMarten commented Mar 2, 2025

Testing hello with 50 RPM and 100 requests

from bespokelabs import curator
import os
from datasets import Dataset

class Reasoner(curator.LLM):
    """Curator class for reasoning."""

    return_completions_object = True

    def prompt(self, input):
        """Create a prompt for the LLM to reason about the problem."""
        return [{"role": "user", "content": input["question"]}]

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        input["deepseek_reasoning"] = response["choices"][0]["message"]["reasoning_content"]
        input["deepseek_solution"] = response["choices"][0]["message"]["content"]
        return input


llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
    generation_params={"temperature": 0.0},
    backend_params={
        "max_requests_per_minute": 50,
        "max_tokens_per_minute": 100_000_000,
        "base_url": "https://api.deepseek.com/",
        "api_key": os.environ.get("DEEPSEEK_API_KEY"),
    },
)

ds = Dataset.from_dict({"question": ["Hello"]*100})
ds = llm(ds) 
print("REASONING: ", ds[0]["deepseek_reasoning"])
print("\n\nSOLUTION: ", ds[0]["deepseek_solution"])

Takes ~2 minutes, sustaining around 44 RPM.

[Screenshot: 2025-03-02 at 11:26:58 AM]

@RyanMarten commented Mar 2, 2025

Testing s1 with 50 RPM and 100 requests

from bespokelabs import curator
import os
from datasets import load_dataset

class Reasoner(curator.LLM):
    """Curator class for reasoning."""

    return_completions_object = True

    def prompt(self, input):
        """Create a prompt for the LLM to reason about the problem."""
        return [{"role": "user", "content": input["question"]}]

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        input["deepseek_reasoning"] = response["choices"][0]["message"]["reasoning_content"]
        input["deepseek_solution"] = response["choices"][0]["message"]["content"]
        return input


llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
    generation_params={"temperature": 0.0},
    backend_params={
        "max_requests_per_minute": 50,
        "max_tokens_per_minute": 100_000_000,
        "base_url": "https://api.deepseek.com/",
        "api_key": os.environ.get("DEEPSEEK_API_KEY"),
    },
)

ds = load_dataset("simplescaling/s1K", split="train")
ds = ds.remove_columns(["thinking_trajectories", "cot", "attempt"])
ds = llm(ds.take(100))
print("REASONING: ", ds[0]["deepseek_reasoning"])
print("\n\nSOLUTION: ", ds[0]["deepseek_solution"])

[Screenshot: 2025-03-02 at 11:28:15 AM]

~17 min at ~6 RPM.

@RyanMarten requested a review from @kartik4949 on March 2, 2025, 19:22
@RyanMarten commented Mar 2, 2025

Testing s1 with 100 RPM and 1,000 requests

from bespokelabs import curator
import os
from datasets import load_dataset

class Reasoner(curator.LLM):
    """Curator class for reasoning."""

    return_completions_object = True

    def prompt(self, input):
        """Create a prompt for the LLM to reason about the problem."""
        return [{"role": "user", "content": input["question"]}]

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        input["deepseek_reasoning"] = response["choices"][0]["message"]["reasoning_content"]
        input["deepseek_solution"] = response["choices"][0]["message"]["content"]
        return input


llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
    generation_params={"temperature": 0.0},
    backend_params={
        "max_requests_per_minute": 100,
        "max_tokens_per_minute": 100_000_000,
        "base_url": "https://api.deepseek.com/",
        "api_key": os.environ.get("DEEPSEEK_API_KEY"),
    },
)

ds = load_dataset("simplescaling/s1K", split="train")
ds = ds.remove_columns(["thinking_trajectories", "cot", "attempt"])
ds = llm(ds)
print("REASONING: ", ds[0]["deepseek_reasoning"])
print("\n\nSOLUTION: ", ds[0]["deepseek_solution"])

~25 min at ~44 RPM.

@RyanMarten closed this Mar 2, 2025
@RyanMarten reopened this Mar 2, 2025
@RyanMarten commented Mar 2, 2025

Testing 5,000 requests with 500 RPM

from bespokelabs import curator
import os
from datasets import load_dataset

class Reasoner(curator.LLM):
    """Curator class for reasoning."""

    return_completions_object = True

    def prompt(self, input):
        """Create a prompt for the LLM to reason about the problem."""
        return [{"role": "user", "content": input["problem"]}]

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        input["deepseek_reasoning"] = response["choices"][0]["message"]["reasoning_content"]
        input["deepseek_solution"] = response["choices"][0]["message"]["content"]
        return input


llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
    generation_params={"temperature": 0.0},
    backend_params={
        "max_requests_per_minute": 500,
        "max_tokens_per_minute": 100_000_000,
        "base_url": "https://api.deepseek.com/",
        "api_key": os.environ.get("DEEPSEEK_API_KEY"),
    },
)

ds = load_dataset("mlfoundations-dev/herorun1_code", split="train")
ds = llm(ds.take(5_000))
print("REASONING: ", ds[0]["deepseek_reasoning"])
print("\n\nSOLUTION: ", ds[0]["deepseek_solution"])
ds.push_to_hub("mlfoundations-dev/herorun1_code-test")

[Screenshot: 2025-03-02 at 12:10:15 PM]

Max in-progress requests: ~1,600.

Stragglers plus a max_retries of 10 dragged out the tail: with only 6 requests still in progress (a couple on their third try), I canceled after 47 minutes.

@RyanMarten commented Mar 2, 2025

Testing 25,000 requests with 2,500 RPM

import os

from datasets import load_dataset

from bespokelabs import curator


class Reasoner(curator.LLM):
    """Curator class for reasoning."""

    return_completions_object = True

    def prompt(self, input):
        """Create a prompt for the LLM to reason about the problem."""
        return [{"role": "user", "content": input["problem"]}]

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        input["deepseek_reasoning"] = response["choices"][0]["message"]["reasoning_content"]
        input["deepseek_solution"] = response["choices"][0]["message"]["content"]
        return input


llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
    generation_params={"temperature": 0.0},
    backend_params={
        "max_requests_per_minute": 2_500,
        "max_tokens_per_minute": 1_000_000_000,
        "base_url": "https://api.deepseek.com/",
        "api_key": os.environ.get("DEEPSEEK_API_KEY"),
        "require_all_responses": False,
        "max_retries": 2,
    },
)

ds = load_dataset("mlfoundations-dev/herorun1_code", split="train")
ds = llm(ds.take(25_000))
# print("REASONING: ", ds[0]["deepseek_reasoning"])
# print("\n\nSOLUTION: ", ds[0]["deepseek_solution"])
ds.push_to_hub("mlfoundations-dev/herorun1_code-test")

I think here we are bumping up against the openai client's ability to send parallel requests: in-progress requests are increasing at the same rate as in the max-RPM-500 run. Also set

    "require_all_responses": False,
    "max_retries": 2,

which will reduce the impact of stragglers that end with a "length" finish reason.

Same max concurrency of ~1,600; 20 minutes in, seeing ~255 RPM.
[Screenshot: 2025-03-02 at 12:52:40 PM]

[Screenshot: 2025-03-02 at 1:43:37 PM]
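
If the ceiling really is the client's connection pool, one thing to try (a sketch, assuming the openai SDK's default httpx pool limits are what cap us at ~1,600 in-progress requests) is passing a custom http_client with larger limits:

import os

import httpx
from openai import AsyncOpenAI

# Illustrative limit values, not tuned numbers.
client = AsyncOpenAI(
    api_key=os.environ.get("DEEPSEEK_API_KEY"),
    base_url="https://api.deepseek.com",
    http_client=httpx.AsyncClient(
        limits=httpx.Limits(max_connections=5_000, max_keepalive_connections=1_000)
    ),
)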

@RyanMarten commented Mar 2, 2025

The bottleneck here is the OpenAI client itself: I started another run in parallel on a different machine and got good throughput there (~250 RPM).

Use multiple clients in the backend? I will try implementing this as a new backend param, num_clients.

The script below just uses multiple curator instances (the shards then need to be merged afterward; see the merge sketch after the script).

import os
import argparse
from datasets import load_dataset

from bespokelabs import curator


class Reasoner(curator.LLM):
    """Curator class for reasoning."""

    return_completions_object = True

    def prompt(self, input):
        """Create a prompt for the LLM to reason about the problem."""
        return [{"role": "user", "content": input["problem"]}]

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        input["deepseek_reasoning"] = response["choices"][0]["message"]["reasoning_content"]
        input["deepseek_solution"] = response["choices"][0]["message"]["content"]
        return input


def main():
    # Parse command line arguments
    parser = argparse.ArgumentParser(description="Process a dataset with DeepSeek and shard it")
    parser.add_argument("--worker", type=int, required=True, help="Worker ID (0-indexed)")
    parser.add_argument("--global", type=int, dest="global_workers", required=True, 
                        help="Total number of workers")
    parser.add_argument("--dataset", type=str, required=True, 
                        help="Input dataset name (e.g., 'mlfoundations-dev/herorun1_code')")
    parser.add_argument("--output", type=str, required=False,
                        help="Output dataset name base (worker ID will be appended). Defaults to input dataset name with 'annotated' suffix.")
    args = parser.parse_args()
    
    # Validate arguments
    if args.worker < 0 or args.worker >= args.global_workers:
        raise ValueError(f"Worker ID must be between 0 and {args.global_workers-1}")
    
    if args.global_workers <= 0:
        raise ValueError("Total number of workers must be positive")
    
    # Initialize the LLM
    llm = Reasoner(
        model_name="deepseek-reasoner",
        backend="openai_client",
        generation_params={"temperature": 0.0},
        backend_params={
            "max_requests_per_minute": 500,
            "max_tokens_per_minute": 1_000_000_000,
            "base_url": "https://api.deepseek.com/",
            "api_key": os.environ.get("DEEPSEEK_API_KEY"),
            "require_all_responses": False,
            "max_retries": 2,
        },
    )
    
    # Load the dataset
    print(f"Loading dataset: {args.dataset}")
    ds = load_dataset(args.dataset, split="train")
    
    # Calculate shard size and indices
    total_examples = len(ds)
    shard_size = total_examples // args.global_workers
    start_idx = args.worker * shard_size
    end_idx = start_idx + shard_size if args.worker < args.global_workers - 1 else total_examples
    
    print(f"Worker {args.worker}/{args.global_workers}: Processing examples {start_idx} to {end_idx-1} (total: {end_idx-start_idx})")
    
    # Extract the shard
    ds_shard = ds.select(range(start_idx, end_idx))
    
    # Process the shard with the LLM
    processed_ds = llm(ds_shard)
    
    # Create output dataset name with worker ID
    output_name = f"{args.output}-{args.worker}" if args.output else f"{args.dataset}-annotated-{args.worker}"
    
    # Push to hub
    print(f"Pushing processed dataset to {output_name}")
    processed_ds.push_to_hub(output_name)
    print(f"Worker {args.worker} completed successfully")


if __name__ == "__main__":
    main()
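
The later merge step then looks something like this (a sketch; the repo names assume the script's default "-annotated-{worker}" suffix and two workers):

from datasets import concatenate_datasets, load_dataset

shards = [
    load_dataset(f"mlfoundations-dev/herorun1_code-annotated-{i}", split="train")
    for i in range(2)  # matches --global 2
]
merged = concatenate_datasets(shards)
merged.push_to_hub("mlfoundations-dev/herorun1_code-annotated")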

Implementation commit:

- Add num_clients parameter to OnlineBackendParams and OnlineRequestProcessorConfig
- Modify OpenAIClientOnlineRequestProcessor to initialize multiple clients
- Implement round-robin client selection in call_single_request method
- Add example file demonstrating multi-client usage with DeepSeek
- Add unit tests for multi-client functionality

This enhancement addresses bottlenecks in parallel request handling by
allowing users to create multiple AsyncOpenAI clients that distribute
requests in a round-robin fashion.

🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <[email protected]>
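
In sketch form, the round-robin selection looks like this (illustrative class and method names, not the PR's literal implementation):

import itertools

from openai import AsyncOpenAI

class ClientPool:
    """Rotate through several AsyncOpenAI clients to spread connection load."""

    def __init__(self, num_clients: int, base_url: str, api_key: str):
        clients = [AsyncOpenAI(base_url=base_url, api_key=api_key) for _ in range(num_clients)]
        self._cycle = itertools.cycle(clients)

    def next_client(self) -> AsyncOpenAI:
        # Each request grabs the next client in the rotation.
        return next(self._cycle)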

@RyanMarten commented Mar 2, 2025

Multiple clients, one curator instance

import os

from datasets import load_dataset

from bespokelabs import curator

# This example demonstrates using multiple OpenAI clients in parallel
# to overcome the bottleneck of parallel requests that a single client can send.
# The `num_clients` parameter creates multiple AsyncOpenAI clients that are used
# in a round-robin fashion to distribute requests.


class Reasoner(curator.LLM):
    """Curator class for reasoning."""

    return_completions_object = True

    def prompt(self, input):
        """Create a prompt for the LLM to reason about the problem."""
        return [{"role": "user", "content": input["problem"]}]

    def parse(self, input, response):
        """Parse the LLM response to extract reasoning and solution."""
        input["deepseek_reasoning"] = response["choices"][0]["message"]["reasoning_content"]
        input["deepseek_solution"] = response["choices"][0]["message"]["content"]
        return input


# Initialize the LLM with multiple clients for parallel request handling
llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",
    generation_params={"temperature": 0.0},
    backend_params={
        "max_requests_per_minute": 2_500,
        "max_tokens_per_minute": 1_000_000_000,
        "base_url": "https://api.deepseek.com/",
        "api_key": os.environ.get("DEEPSEEK_API_KEY"),
        "require_all_responses": False,
        "max_retries": 2,
        "num_clients": 2,  # Create 2 OpenAI clients for parallel requests
    },
)


ds = load_dataset("mlfoundations-dev/herorun1_code", split="train")
ds = llm(ds.select(range(25_000, 50_000)))
ds.push_to_hub("mlfoundations-dev/herorun1_code-second-25k")

[Screenshot: 2025-03-02 at 2:37:36 PM]

[Screenshot: 2025-03-02 at 3:45:51 PM]

@kartik4949 (Contributor) left a comment

Thanks! Please override the existing backend.


llm = Reasoner(
    model_name="deepseek-reasoner",
    backend="openai_client",

@kartik4949 (Contributor) commented:

openai vs openai_client: isn't it confusing?

_OPENAI_ALLOWED_IMAGE_SIZE_MB = 20


class OpenAIClientOnlineRequestProcessor(BaseOnlineRequestProcessor, OpenAIRequestMixin):
@kartik4949 (Contributor) commented:

Please inherit from OpenAIOnlineProcessor, since there is too much redundancy with the existing one. Let's only override the changed functions.
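
i.e. something like this sketch (base-class and method names are taken from this thread and may not match the repo's exact signatures):

class OpenAIClientOnlineRequestProcessor(OpenAIOnlineRequestProcessor):
    """Reuse the existing OpenAI processor; override only what differs."""

    async def call_single_request(self, request):
        # Swap in the next client round-robin, then defer to the parent's logic
        # (assumes the parent issues requests through self.client).
        self.client = self._next_client()
        return await super().call_single_request(request)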

@@ -0,0 +1,76 @@
import unittest
from unittest.mock import AsyncMock, MagicMock, patch
@kartik4949 (Contributor) commented:

Let's integration-test it, since the other backends are also tested that way; having this as a unit test would be a bit odd.
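
For example, a hedged sketch of such an integration test, mirroring the hello-world snippet at the top of this PR (the pytest marker is an assumption about the repo's test setup):

import pytest

from bespokelabs import curator

@pytest.mark.integration  # hypothetical marker; depends on the repo's test config
def test_openai_client_backend_hello():
    llm = curator.LLM(model_name="gpt-4o-mini", backend="openai_client")
    response = llm("Hello, world!")
    assert len(response["response"]) == 1  # one prompt in, one response out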

@RyanMarten commented

[Screenshot: 2025-03-06 at 11:06:50 AM]

800+ RPM with two clients

2 participants