run_batch is great for performance, but you also give up control during the iteration. For instance, you might want to save results to disk as soon as they come in while still getting the performance benefits of run_batch. A simple solution is to modify run_batch to act as a generator that yields each result as soon as it's available.
The change is quite straightforward, but I currently don't have time to set up a PR. I just monkeypatched it so that run_batch acts as a generator, replacing the behavior altogether:

sgl.lang.interpreter.run_program_batch = run_program_batch_generator

A couple of considerations for anyone who takes the time to create the PR:
- One should probably add a flag to run_batch to switch between the two variants.
- This implementation does not necessarily return results in order. Since the arguments are returned alongside each result, this is not an issue for me, and it gives a more precise progress estimate. Yielding them in order would also be simple to implement.
- I removed the progress-bar functionality here because the generator style lets you control the progress bar yourself: tqdm(run_batch(args), total=len(args))
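The consumer side of this pattern is the main payoff. The sketch below assumes run_batch has been monkeypatched to yield (arguments, result) pairs; a toy generator (fake_run_batch, with a made-up result dict) stands in for it so the example runs without sglang:

```python
import json
import os
import tempfile

def fake_run_batch(batch_arguments):
    # Stand-in for the monkeypatched, generator-style run_batch:
    # yields (arguments, result) pairs one at a time.
    for args in batch_arguments:
        yield args, {"prompt": args, "answer": args.upper()}

out_path = os.path.join(tempfile.mkdtemp(), "results.jsonl")
with open(out_path, "w") as f:
    # Each result is persisted the moment it arrives; a crash mid-batch
    # loses only the unfinished items. Wrap the generator in
    # tqdm(..., total=len(batch)) to get a progress bar back.
    for args, result in fake_run_batch(["a", "b"]):
        f.write(json.dumps(result) + "\n")
        f.flush()

print(open(out_path).read().count("\n"))  # 2
```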
import concurrent.futures
import multiprocessing

# run_program and pin_program come from sglang's interpreter module
# (the same module whose run_program_batch is being monkeypatched)
from sglang.lang.interpreter import pin_program, run_program


def run_program_batch_generator(
    program,
    backend,
    batch_arguments,
    default_sampling_para,
    num_threads,
    progress_bar,
):
    if hasattr(backend, "endpoint"):
        backend = backend.endpoint

    # Extract the prefix by tracing and cache it
    if len(batch_arguments) > 1:
        pin_program(program, backend)

    # Determine the number of threads to use
    if num_threads == "auto":
        num_threads = max(96, multiprocessing.cpu_count() * 16)
    num_threads = min(num_threads, len(batch_arguments))

    # Execute each run_program call and yield results as they become available
    if num_threads == 1:
        for arguments in batch_arguments:
            # Yield (arguments, result) to match the threaded branch below
            yield arguments, run_program(
                program,
                backend,
                (),
                arguments,
                default_sampling_para,
                False,
                True,
            )
    else:
        with concurrent.futures.ThreadPoolExecutor(num_threads) as executor:
            # Map each future to its corresponding arguments
            future_to_arguments = {
                executor.submit(
                    run_program,
                    program,
                    backend,
                    (),
                    arguments,
                    default_sampling_para,
                    False,
                    True,
                ): arguments
                for arguments in batch_arguments
            }
            # Yield results in completion order, not submission order
            for future in concurrent.futures.as_completed(future_to_arguments):
                yield future_to_arguments[future], future.result()
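The in-order variant mentioned above is indeed simple: keep the futures in a list and block on each one in turn. A minimal sketch, with a toy worker and the hypothetical name run_batch_generator_ordered standing in for run_program and the real function:

```python
import concurrent.futures

def run_batch_generator_ordered(fn, batch_arguments, num_threads=4):
    """Yield (arguments, result) pairs in submission order.

    Work still proceeds in parallel; we merely resolve each future
    in turn, so a slow early item delays later (possibly already
    finished) ones.
    """
    num_threads = min(num_threads, max(len(batch_arguments), 1))
    with concurrent.futures.ThreadPoolExecutor(num_threads) as executor:
        futures = [executor.submit(fn, args) for args in batch_arguments]
        for args, future in zip(batch_arguments, futures):
            yield args, future.result()

print(list(run_batch_generator_ordered(lambda x: x + 1, [10, 20, 30])))
# [(10, 11), (20, 21), (30, 31)]
```

The trade-off versus as_completed is head-of-line blocking: progress estimates become less accurate, but consumers that need ordered output get it for free.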
This change adds a new generator_style parameter to run_batch that allows
yielding results as they become available, while maintaining the performance
benefits of batch processing. This is particularly useful when you want to
process results as soon as they are ready, for example to save them to disk.
When generator_style=True, run_batch yields tuples of (arguments, result)
as they become available, instead of returning a list at the end.
Fixes sgl-project#303
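The semantics of the proposed flag might look like the following sketch; run_batch_sketch and the plain worker fn are hypothetical stand-ins, not sglang's actual API:

```python
import concurrent.futures

def run_batch_sketch(fn, batch_arguments, generator_style=False, num_threads=4):
    """Hypothetical illustration of a generator_style flag on run_batch."""
    def generate():
        num = min(num_threads, max(len(batch_arguments), 1))
        with concurrent.futures.ThreadPoolExecutor(num) as executor:
            future_to_args = {
                executor.submit(fn, args): args for args in batch_arguments
            }
            for future in concurrent.futures.as_completed(future_to_args):
                yield future_to_args[future], future.result()

    if generator_style:
        return generate()  # lazy: (arguments, result) pairs as they complete
    return [result for _, result in generate()]  # eager: original list behavior
```

With generator_style=True the caller drives iteration, e.g. wrapping the generator in tqdm(..., total=len(batch_arguments)) for progress, or writing each result to disk as it arrives.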
In short:
- Slow but full control: run
- Fast but no control during iteration: run_batch
- Fast with full control: generator-style run_batch