feat(core): Multiple endpoint types on benchmark and endpoint end #277
Conversation
/ok to test cc7936d
Added some comments :)
```python
# Normalize a bare string into a single-element list.
benchmark_type_combination = [benchmark_type_combination]
# ...
# The endpoint is compatible if its types cover this whole combination.
if model_types.issuperset(set(benchmark_type_combination)):
    is_target_compatible = True
```
We only keep information about the last successful check in this variable - is this not a potential issue? To be completely honest with you, I have a hard time understanding this part, could you elaborate on it? :D
Benchmark requirements are strict. From a benchmark perspective, you define a list of sets that are accepted. For example, let's imagine that your benchmark is a VLM benchmark but accepts both base and chat models. Then, on the benchmark side, you would define `evaluation.config.supported_endpoint_types` as a list of:

```yaml
config:
  supported_endpoint_types:
    - [vlm, chat]
    - [vlm, completions]
```

This means that your model must have all capabilities in at least one of these sets (those sets are implemented as lists, but that does not matter here IMO). So the capabilities we define on the model endpoint must be a superset of at least one element from the `supported_endpoint_types` list.
Let's have a look at some examples:

- Model `[chat]` - neither a superset of `[vlm, chat]` nor of `[vlm, completions]`
- Model `[chat, vlm]` - a superset of `[vlm, chat]` but not of `[vlm, completions]`
- Model `[chat, vlm, completions]` - a superset of both `[vlm, chat]` and `[vlm, completions]`
- Model `[chat, vlm, completions, audio]` - a superset of both `[vlm, chat]` and `[vlm, completions]`

It does not matter which one is accepted - it only matters that at least one is compatible. So we keep the last one.
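Putting the pieces together, the check could look roughly like the sketch below. Only `model_types`, `supported_endpoint_types`, and the two lines from the diff come from this PR; the loop scaffolding and example values are assumptions for illustration:

```python
model_types = {"chat", "vlm", "completions"}  # capabilities defined on the endpoint
supported_endpoint_types = [["vlm", "chat"], ["vlm", "completions"]]  # benchmark side

is_target_compatible = False
for benchmark_type_combination in supported_endpoint_types:
    # A bare string is normalized into a single-element list.
    if isinstance(benchmark_type_combination, str):
        benchmark_type_combination = [benchmark_type_combination]
    # Compatible if the endpoint covers every type in this combination.
    if model_types.issuperset(set(benchmark_type_combination)):
        is_target_compatible = True  # note: only the last match is recorded

print(is_target_compatible)  # True - this endpoint covers both combinations
```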
But I do see a case where we have multiple compatible combinations - which one is passed down to the evaluation harness?
I'm raising an error on multiple compatible combinations.
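For illustration, a minimal sketch of such a guard, reusing the names from the snippet above; the error type and message are my assumptions, not necessarily the PR's actual code:

```python
model_types = {"chat", "vlm", "completions"}  # endpoint capabilities
supported_endpoint_types = [["vlm", "chat"], ["vlm", "completions"]]  # benchmark side

compatible = [
    combination
    for combination in supported_endpoint_types
    if model_types.issuperset(set(combination))
]
if len(compatible) > 1:
    # Both combinations match here, so the check refuses to pick one silently.
    raise ValueError(f"Multiple compatible endpoint type combinations: {compatible}")
```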
Force-pushed from 82c512c to e2db2fb
/ok to test f781e54
Closing, as a new idea for model types - tags - has emerged.
This MR introduces multiple type definitions on both the benchmark and endpoint ends. With these changes, we can define a list of required types on the benchmark end, for example `[chat, vlm]`. On the endpoint side, we define capabilities, which must be a superset of the required types. More information is provided in the documentation.