feat(core): Multiple endpoint types on benchmark and endpoint end #277
Conversation
/ok to test cc7936d
Added some comments :)
```python
# Normalize a bare string into a single-element list.
benchmark_type_combination = [benchmark_type_combination]
# ...
# The endpoint is compatible if its types cover this whole combination.
if model_types.issuperset(set(benchmark_type_combination)):
    is_target_compatible = True
```
We only keep information about the last successful check in this variable - is this not a potential issue? To be completely honest with you, I have a hard time understanding this part, could you elaborate on it? :D
Benchmark requirements are strict. From a benchmark perspective, you define a list of sets that are accepted. For example, let's imagine that your benchmark is a VLM benchmark but accepts both base and chat models. Then, on the benchmark side, you would define `evaluation.config.supported_endpoint_types` as a list of:

```yaml
config:
  supported_endpoint_types:
    - [vlm, chat]
    - [vlm, completions]
```

This means that your model must have all capabilities in at least one of these sets (those sets are implemented as lists, but that does not matter here IMO). So the capabilities we define on the model endpoint must be a superset of at least one element from the `supported_endpoint_types` list.
Let's have a look at some examples:

- Model `[chat]` - neither a superset of `[vlm, chat]` nor of `[vlm, completions]`
- Model `[chat, vlm]` - a superset of `[vlm, chat]` but not of `[vlm, completions]`
- Model `[chat, vlm, completions]` - a superset of both `[vlm, chat]` and `[vlm, completions]`
- Model `[chat, vlm, completions, audio]` - a superset of both `[vlm, chat]` and `[vlm, completions]`

It does not matter which one is accepted - it only matters that at least one is compatible. So we keep the last one.
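Putting the pieces together, the check could look roughly like the sketch below. Only `model_types`, `supported_endpoint_types`, and the two lines from the diff come from this PR; the loop scaffolding and example values are assumptions for illustration:

```python
model_types = {"chat", "vlm", "completions"}  # capabilities defined on the endpoint
supported_endpoint_types = [["vlm", "chat"], ["vlm", "completions"]]  # benchmark side

is_target_compatible = False
for benchmark_type_combination in supported_endpoint_types:
    # A bare string is normalized into a single-element list.
    if isinstance(benchmark_type_combination, str):
        benchmark_type_combination = [benchmark_type_combination]
    # Compatible if the endpoint covers every type in this combination.
    if model_types.issuperset(set(benchmark_type_combination)):
        is_target_compatible = True  # note: only the last match is recorded

print(is_target_compatible)  # True - this endpoint covers both combinations
```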
But I do see a case where we have multiple compatible combinations - which one is passed down to the evaluation harness?
I'm raising an error on multiple compatible combinations.
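For illustration, a minimal sketch of such a guard, reusing the names from the snippet above; the error type and message are my assumptions, not necessarily the PR's actual code:

```python
model_types = {"chat", "vlm", "completions"}  # endpoint capabilities
supported_endpoint_types = [["vlm", "chat"], ["vlm", "completions"]]  # benchmark side

compatible = [
    combination
    for combination in supported_endpoint_types
    if model_types.issuperset(set(combination))
]
if len(compatible) > 1:
    # Both combinations match here, so the check refuses to pick one silently.
    raise ValueError(f"Multiple compatible endpoint type combinations: {compatible}")
```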
Force-pushed from 82c512c to e2db2fb
/ok to test f781e54
Closing, as a new idea for model types - tags - has emerged.
This MR introduces multiple type definitions on both the benchmark and endpoint ends. With these changes, we can define a list of required types on the benchmark end, for example `[chat, vlm]`. On the endpoint side, we define capabilities, which must be a superset of the required types. More information is provided in the documentation.