@pskiran1 (Member) commented Oct 13, 2025

What does the PR do?

This PR adds tests for the new max_ensemble_inflight_responses parameter.
It adds ensemble_backpressure_test.py, together with custom decoupled producer and slow consumer models, to validate the feature.

Related PR: triton-inference-server/core#455

Checklist

  • PR title reflects the change and is of format <commit_type>: <Title>
  • Changes are described in the pull request.
  • Related issues are referenced.
  • Populated github labels field
  • Added test plan and verified test passes.
  • Verified that the PR passes existing CI.
  • Verified copyright is correct on all changed files.
  • Added succinct git squash message before merging ref.
  • All template sections are filled out.
  • Optional: Additional screenshots for behavior/output changes with before/after.

Commit Type:

Check the conventional commit type box here and add the label to the GitHub PR.

  • build
  • ci
  • docs
  • feat
  • fix
  • perf
  • refactor
  • revert
  • style
  • test

Related PRs:

Where should the reviewer start?

Test plan:

  • CI Pipeline ID:

Caveats:

Background

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

@pskiran1 pskiran1 requested a review from Copilot October 13, 2025 17:24

@Copilot Copilot AI left a comment

Pull Request Overview

This PR adds support for a new max_ensemble_inflight_responses parameter to ensemble models for preventing unbounded memory growth in scenarios with decoupled models and slow consumers.

  • Implements backpressure mechanism to limit concurrent responses in ensemble pipelines
  • Adds comprehensive test coverage including valid/invalid parameter validation
  • Creates new test models for decoupled producer and slow consumer scenarios
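The backpressure idea summarized above can be illustrated outside Triton with a bounded semaphore: a producer must take a slot before emitting a response, and the slot is returned once the consumer has finished with that response. This is only a minimal sketch under that assumption, not the Triton core implementation; `InflightLimiter` and its methods are hypothetical names introduced for illustration.

```python
import threading


class InflightLimiter:
    """Caps the number of responses produced but not yet consumed.

    Hypothetical illustration only; not the Triton core implementation.
    """

    def __init__(self, max_inflight):
        if max_inflight <= 0:
            raise ValueError("max_inflight must be a positive integer")
        self._slots = threading.BoundedSemaphore(max_inflight)
        self._lock = threading.Lock()
        self._inflight = 0
        self.peak = 0  # highest number of simultaneously held slots

    def acquire(self, timeout=None):
        """Wait for a free slot; return False if the timeout expires."""
        if not self._slots.acquire(timeout=timeout):
            return False
        with self._lock:
            self._inflight += 1
            self.peak = max(self.peak, self._inflight)
        return True

    def release(self):
        """Return a slot after the consumer has handled a response."""
        with self._lock:
            self._inflight -= 1
        self._slots.release()
```

A producer would call `acquire()` before sending each response and the consumer would call `release()` after processing it, so the producer stalls once the limit is reached instead of buffering without bound.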

Reviewed Changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| qa/L0_simple_ensemble/test.sh | Adds backpressure testing logic and invalid parameter validation |
| qa/L0_simple_ensemble/models/slow_consumer/config.pbtxt | Configures Python backend model with intentional processing delay |
| qa/L0_simple_ensemble/models/slow_consumer/1/model.py | Implements model that adds 200ms delay per request to simulate slow processing |
| qa/L0_simple_ensemble/models/ensemble_enabled_max_inflight_responses/config.pbtxt | Ensemble configuration with backpressure parameter set to 4 |
| qa/L0_simple_ensemble/models/ensemble_disabled_max_inflight_responses/config.pbtxt | Baseline ensemble configuration without backpressure parameter |
| qa/L0_simple_ensemble/models/decoupled_producer/config.pbtxt | Configures decoupled Python model for multiple response generation |
| qa/L0_simple_ensemble/models/decoupled_producer/1/model.py | Implements decoupled model that produces N responses based on input value |
| qa/L0_simple_ensemble/ensemble_backpressure_test.py | Comprehensive test suite for backpressure functionality |
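
The producer and slow consumer scenario described in these files can be approximated in plain Python, without the Triton Python backend, to show what the limit changes. The sketch below is an assumption-laden stand-in for the real models: `run_pipeline` is a hypothetical helper, a `queue.Queue` plays the role of the ensemble's response buffer, and `time.sleep` stands in for the consumer's 200ms delay.

```python
import queue
import threading
import time


def run_pipeline(num_responses, max_inflight=0, consumer_delay_s=0.2):
    """Simulate a fast producer feeding a slow consumer.

    max_inflight=0 means an unbounded buffer (no backpressure).
    Returns the peak number of buffered responses seen by the producer.
    """
    buffer = queue.Queue(maxsize=max_inflight)  # maxsize=0 is unbounded
    peak = 0

    def producer():
        nonlocal peak
        for i in range(num_responses):
            buffer.put(i)  # blocks while the buffer is full
            peak = max(peak, buffer.qsize())
        buffer.put(None)  # sentinel: no more responses

    def consumer():
        while True:
            if buffer.get() is None:
                break
            time.sleep(consumer_delay_s)  # simulated slow processing

    threads = [
        threading.Thread(target=producer),
        threading.Thread(target=consumer),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return peak
```

With `max_inflight=4` the returned peak cannot exceed 4, because the bounded queue blocks the producer. With `max_inflight=0` the producer usually finishes long before the consumer, so the peak tends toward `num_responses`, which is the unbounded growth the parameter is meant to prevent.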

@pskiran1 pskiran1 added the "PR: ci" label (Changes to our CI configuration files and scripts) Oct 13, 2025
@pskiran1 pskiran1 requested a review from Copilot October 13, 2025 18:04

@Copilot Copilot AI left a comment

Pull Request Overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.


kill $SERVER_PID
wait $SERVER_PID

# Test ensemble backpressure feature (max_ensemble_inflight_responses parameter)
Review comment (Contributor):

Remove /models and create the model repository here instead. That makes debugging easier and can speed up model loading for the other tests.

Reply (Member, Author):

Applied the above change to have individual model repositories for different groups of test cases.
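
The change itself lives in test.sh, which is not shown here. As a rough illustration of the idea only, a per-group model repository can be assembled by copying just the models that a group of test cases needs into its own directory. The helper name `make_model_repository` and the directory names in the usage example are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def make_model_repository(source_dir, repo_dir, model_names):
    """Copy only the named model directories from source_dir into repo_dir."""
    source_dir, repo_dir = Path(source_dir), Path(repo_dir)
    repo_dir.mkdir(parents=True, exist_ok=True)
    for name in model_names:
        src = source_dir / name
        if not src.is_dir():
            raise FileNotFoundError(f"model directory not found: {src}")
        # dirs_exist_ok requires Python 3.8 or newer.
        shutil.copytree(src, repo_dir / name, dirs_exist_ok=True)
    return repo_dir


if __name__ == "__main__":
    # Usage example with throwaway directories (names are illustrative).
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "all_models"
        for name in ("decoupled_producer", "slow_consumer", "unrelated"):
            (source / name / "1").mkdir(parents=True)
        repo = make_model_repository(
            source,
            Path(tmp) / "backpressure_models",
            ["decoupled_producer", "slow_consumer"],
        )
        print(sorted(p.name for p in repo.iterdir()))
```

Keeping each repository small means a server started for one group of tests loads only the models that group needs.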

@pskiran1 pskiran1 changed the title from "ci: Add support for max_ensemble_inflight_responses parameter to prevent unbounded memory growth in ensemble models" to "ci: Add support for max_inflight_responses parameter to prevent unbounded memory growth in ensemble models" Oct 17, 2025