
generator-style run_batch #303

Closed
ValeKnappich opened this issue Mar 14, 2024 · 1 comment · May be fixed by #2513
Labels: enhancement (New feature or request), inactive

Comments

@ValeKnappich

run_batch is great for performance, but you give up control during the iteration. For instance, you might want to save results to disk as soon as they come in while still getting the performance benefits of run_batch. A simple solution is to turn run_batch into a generator that yields each result as soon as it's available.

Slow but full control with run:

    args = [{...}, {...}, ...]
    for arg in args:
        state = my_program.run(**arg)
        save_result(state, arg)

Fast but no control during iteration with run_batch:

    args = [{...}, {...}, ...]
    states = my_program.run_batch(args)
    # save results at the end
    for arg, state in zip(args, states):
        save_result(state, arg)

Fast with full control with a generator-style run_batch:

    args = [{...}, {...}, ...]
    for arg, state in my_program.run_batch(args):
        save_result(state, arg)

The change is quite straightforward, but I currently don't have time to set up a PR, so I just monkeypatched run_batch to act as a generator. A couple of considerations for whoever takes the time to create the PR:

  • I just replaced the behavior altogether:
    sgl.lang.interpreter.run_program_batch = run_program_batch_generator
    A proper PR should probably add a flag to run_batch to switch between the two variants.
  • This implementation does not necessarily return results in order. Since the arguments are also returned, this is not an issue for me, and it gives a more precise progress estimate. Yielding them in order would also be simple to implement.
  • I removed the progress bar functionality, because the generator style lets you control the progress bar yourself: tqdm(run_batch(args), total=len(args))
import concurrent.futures
import multiprocessing

# Assuming the sglang-internal helpers live in this module, as the
# monkeypatch target above suggests
from sglang.lang.interpreter import pin_program, run_program


def run_program_batch_generator(
    program,
    backend,
    batch_arguments,
    default_sampling_para,
    num_threads,
    progress_bar,  # unused: callers can wrap the generator in tqdm instead
):
    if hasattr(backend, "endpoint"):
        backend = backend.endpoint

    # Extract the prefix by tracing and cache it
    if len(batch_arguments) > 1:
        pin_program(program, backend)

    # Determine the number of threads to use
    if num_threads == "auto":
        num_threads = max(96, multiprocessing.cpu_count() * 16)
    num_threads = min(num_threads, len(batch_arguments))

    # Execute each run_program call and yield results as they become available
    if num_threads == 1:
        for arguments in batch_arguments:
            # Yield (arguments, result) to match the multi-threaded branch
            yield arguments, run_program(
                program,
                backend,
                (),
                arguments,
                default_sampling_para,
                False,
                True,
            )
    else:
        with concurrent.futures.ThreadPoolExecutor(num_threads) as executor:
            # Map each future to its corresponding arguments
            future_to_arguments = {
                executor.submit(
                    run_program,
                    program,
                    backend,
                    (),
                    arguments,
                    default_sampling_para,
                    False,
                    True,
                ): arguments
                for arguments in batch_arguments
            }

            # Yield (arguments, result) pairs as the futures complete
            for future in concurrent.futures.as_completed(future_to_arguments):
                yield future_to_arguments[future], future.result()
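On the ordering point above: yielding results in input order is also easy, because you can submit all futures up front and then iterate over them in submission order instead of completion order. A minimal, framework-independent sketch (the `fn` and `run_batch_ordered` names are mine, not sglang's):

```python
import concurrent.futures


def run_batch_ordered(fn, batch_arguments, num_threads=4):
    """Yield (arguments, result) pairs in submission order.

    Work still runs concurrently; iteration simply blocks on the
    earliest unfinished future, so output order matches input order.
    """
    num_threads = max(1, min(num_threads, len(batch_arguments)))
    with concurrent.futures.ThreadPoolExecutor(num_threads) as executor:
        # Submit everything first so all work is in flight
        futures = [
            (arguments, executor.submit(fn, **arguments))
            for arguments in batch_arguments
        ]
        # Consume in submission order, not completion order
        for arguments, future in futures:
            yield arguments, future.result()
```

The trade-off is that a slow early item delays the whole iteration, whereas the as_completed variant yields whatever finishes first.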
@hnyls2002 hnyls2002 added the enhancement New feature or request label Apr 7, 2024

This issue has been automatically closed due to inactivity. Please feel free to reopen it if needed.

xingyaoww pushed a commit to xingyaoww/sglang that referenced this issue Dec 18, 2024
This change adds a new generator_style parameter to run_batch that allows
yielding results as they become available, while maintaining the performance
benefits of batch processing. This is particularly useful when you want to
process results as soon as they are ready, for example to save them to disk.

When generator_style=True, run_batch yields tuples of (arguments, result)
as they become available, instead of returning a list at the end.

Fixes sgl-project#303
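The API described in the commit message can be modeled with a self-contained sketch; the toy run_batch below is mine (not sglang's implementation), and for simplicity its list path runs sequentially, unlike the real batch runner:

```python
import concurrent.futures


def run_batch(fn, batch_arguments, num_threads=4, generator_style=False):
    """Toy model of the proposed generator_style flag.

    With generator_style=True, yield (arguments, result) tuples as they
    complete; otherwise return a plain list of results, as today.
    """
    if not generator_style:
        # Simplified: the real run_batch also parallelizes this path
        return [fn(**arguments) for arguments in batch_arguments]

    def _generate():
        workers = max(1, min(num_threads, len(batch_arguments)))
        with concurrent.futures.ThreadPoolExecutor(workers) as executor:
            future_to_args = {
                executor.submit(fn, **arguments): arguments
                for arguments in batch_arguments
            }
            # Yield pairs in completion order
            for future in concurrent.futures.as_completed(future_to_args):
                yield future_to_args[future], future.result()

    return _generate()
```

Note that returning the inner generator (rather than making run_batch itself a generator function) keeps the generator_style=False path eager, so existing callers that expect a list are unaffected.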