Skip to content

Feat/max error rate #171

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 40 commits into
base: main
Choose a base branch
from
Open

Feat/max error rate #171

wants to merge 40 commits into from

Conversation

markVaykhansky
Copy link
Collaborator

@markVaykhansky markVaykhansky commented May 21, 2025

Addressing this issue:
#105

--max-error-rate flag was added.
It is only applicable in either or the the following scenarios:

  1. Fixed length dataset
  2. max_requests is set
  3. rate_type is constant and max_duration is set
  • If error rate is not applicable (i.e non of the above scenarios) we just ignore it (throwing an error would break sweep for example)
  • Implemented using multiprocessing.Event which is used to trigger a shutdown once we hit max_error_rate.
  • max_error_rate - added to report
  • error_rate in the final report is calculated as total errored / total requests finalized i.e we exclude incomplete requests

@markVaykhansky markVaykhansky self-assigned this May 21, 2025

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15167220831/artifacts/3170182036.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15167280131/artifacts/3170205369.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15180648310/artifacts/3176050662.
They will be retained for up to 30 days.

@markVaykhansky markVaykhansky marked this pull request as ready for review May 22, 2025 10:58
@markVaykhansky markVaykhansky requested a review from markurtz May 22, 2025 11:02

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15186822925/artifacts/3176853207.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15373675841/artifacts/3236044854.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15374892878/artifacts/3236270734.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15375026939/artifacts/3236298177.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15411127841/artifacts/3247570190.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15440208848/artifacts/3257998536.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15440444689/artifacts/3258080259.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15442589273/artifacts/3258825312.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15458689300/artifacts/3264746327.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15460007795/artifacts/3265136314.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15462798203/artifacts/3266031840.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15465932166/artifacts/3267086597.
They will be retained for up to 30 days.

📦 Build Artifacts Available
The build artifacts (.whl and .tar.gz) have been successfully generated and are available for download: https://github.com/neuralmagic/guidellm/actions/runs/15465961033/artifacts/3267096068.
They will be retained for up to 30 days.

@markVaykhansky
Copy link
Collaborator Author

I've changed the logic as suggested to have a single termination method which is a shutdown event.
However, this did not go a smoothly as expected:

After implementing the code like you suggested above I ran into an issue with get_request. After a shutdown was initiated and the _wait_for_shutdown task raised an exception the code moved to cancel the pending task i.e the request sending task. However, it was stuck indefinitely cause the logic there's asyncio.to_thread(requests_queue.get) which is blocking. To resolve the issue I've re-introduced the sleep_intermittently logic which check the shutdown_event periodically inside the get_request. Then I ran into another issue, after sending the shutdown event there's a race condition between the asyncio.wait and checking for errors in get_request, to resolve the issue I've created a separate internal_shutdown_event which is set after the task.cancel().

Another super weird thing that happened is the raising specifically asyncio.CancellationError in _wait _for_shutdown did not trigger the return of asyncio.wait with FIRST_EXCEPTION and it was stuck forever, I'm not sure why couldn't find a reasonable explanation. Ended up throwing a custom error instead.

Also, I added the context of last X request as suggested but I feel that for the scenario where we do know the exact amount, i.e const rate + fixed duration, it is best to look at the entire run.

@@ -4,40 +4,40 @@ repos:
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- repo: https://github.com/astral-sh/ruff-pre-commit
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'll need these reenabled. If you ever need to push and skip recommit, you can pass in the --no-verify flag with your git commit command

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also @sjmonson just landed a diff that might clear up some of the issues you were seeing

@@ -147,6 +147,8 @@ The `guidellm benchmark` command is used to run benchmarks against a generative

- `--max-requests`: Sets the maximum number of requests for each benchmark run. If not provided, the benchmark will run until `--max-seconds` is reached or the dataset is exhausted.

- `--max-error-rate`: The maximum error rate after which a benchmark will stop. Applicable only for finite deterministic scenarios i.e `rate_type` is `constant` and `--max-seconds` exists OR `--max-requests` exists OR the dataset is finite. If `--max-error-rate` is `None` or not applicable, benchmarks will continue regardless of error rate.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- `--max-error-rate`: The maximum error rate after which a benchmark will stop. Applicable only for finite deterministic scenarios i.e `rate_type` is `constant` and `--max-seconds` exists OR `--max-requests` exists OR the dataset is finite. If `--max-error-rate` is `None` or not applicable, benchmarks will continue regardless of error rate.
- `--max-error`: The maximum error rate after which a benchmark will stop. Can either be a rate i.e 0 < rate < 1 or constant number. If rate is given and rate_type is 'constant' and 'max_seconds' exists then the rate will be calculated as part of the total expected requests counts i.e rate * duration. If rate is given and number of requests is not pre-determined than a context window of the last requests will be looked at. Context window size is configurable under GUIDELLM__ERROR_CHECK_WINDOW_SIZE. If a number above 1 is given than we just count the total number of error and check if it's above the threshold.

@@ -46,12 +50,15 @@ class SchedulerRunInfo(StandardBaseModel):
end_number: float
processes: int
strategy: SchedulingStrategy
last_requests_statuses: deque[RequestStatus]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can simplify the logic here so we don't need to add this in especially since this list can be come very long and it is not json serializable so would break any pydantic serialization in the event someone wants to save that state. See note later down where that logic is located

"RequestLoader",
"RequestLoaderDescription",
]


class GetInfiniteDatasetLengthError(Exception):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this exception and logic is no longer needed since if we can get the length then we go through a specific logic route and if we can't, then we fallback on the window

if max_duration is not None and max_duration < 0:
raise ValueError(f"Invalid max_duration: {max_duration}")
self._validate_scheduler_params(
scheduling_strategy, max_duration, max_error, max_number
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT: could we keep the arg order the same as the surrounding function? Makes it a bit easier to track especially if additions com in later

)
asyncio.create_task(self.send_result(results_queue, result))
asyncio.create_task(self.send_result(results_queue, request_scheduled_result))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good catch on potential threading issues with these. Rather than polluting the local namespace with a bunch of vars that aren't used later, though, could we add the WorkerProcessResult to pass directly to this call function?

process_request := await self.get_request(requests_queue)
) is not None:
dequeued_time = time.time()
# We are using a separate internal event
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my note on fixes for get_request so we don't hit this race condition ideally and can rely on the shutdown event here

process_id: int,
max_concurrency: Optional[int] = None,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we change this out so it isn't optional? I think we had a change either pushed in or queued for that and that way we don't need to validate it for async. I believe the changes are in so that for sync, it still returns a max_concurrency there as well

internal_shutdown_event.set()
try: # noqa: SIM105
await task
except asyncio.CancelledError:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are raising a different error in the shutdown task, there could be a race condition here where an error occurs and shutdown is set before we call await causing an uncaught exception here

while not shutdown_event.is_set(): # noqa: ASYNC110
await asyncio.sleep(shutdown_poll_interval)

# Raising asyncio.CancelledError instead would
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is the correct way where we raise a specific error signaling this which we can track and catch. Can we remove the comments here since this is pretty self explanatory based on the name of the exception?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants