Currently we have a variety of UQ scoring methods for different scenarios. Because these scorers work differently, their runtime performance differs. On top of that, the speed of the underlying LLM and the nature of the user's input space (text input space) mean that UQLM's actual runtime performance can vary dramatically.
We imagine a key concern for users deciding whether to adopt UQLM in production workflows will be: "how much latency will UQLM add to my application's LLM pipeline?"
Right now I see two angles from which we can help users with this concern:
- Help users more quickly understand the runtime performance they should expect across a variety of scenarios. Possible solutions include:
  - A chart on our README or docs site that showcases a few experiments we ran and their results
  - A benchmarking/leaderboard section of the docs site (a more mature version of the above)
- Give users a way to easily test runtime performance in their specific scenario (their UQLM scorer choices, their LLM, their data, their network bandwidth, etc.). Possible solutions include:
  - A notebook in /examples that guides a user through setting up and running a runtime performance experiment
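For the notebook idea, a minimal sketch of what the core timing harness might look like is below. This is only an illustration: `mock_scorer` is a hypothetical stand-in for whatever UQLM scorer call the user wants to measure, and the benchmark simply times end-to-end wall-clock latency over a few trials with `time.perf_counter`.

```python
import asyncio
import statistics
import time

async def mock_scorer(prompt: str) -> float:
    """Hypothetical stand-in for a UQLM scorer call.

    In a real experiment, replace this with the user's chosen
    scorer invocation against their own LLM and data.
    """
    await asyncio.sleep(0.01)  # simulate LLM/network latency
    return 0.9  # placeholder confidence score

async def benchmark(prompts, scorer, trials=3):
    """Time end-to-end scoring latency over several trials."""
    timings = []
    for _ in range(trials):
        start = time.perf_counter()
        # Score all prompts concurrently, as a pipeline typically would
        await asyncio.gather(*(scorer(p) for p in prompts))
        timings.append(time.perf_counter() - start)
    return {
        "mean_s": statistics.mean(timings),
        "min_s": min(timings),
        "max_s": max(timings),
    }

results = asyncio.run(
    benchmark(["What is UQ?", "Define latency."], mock_scorer)
)
print(results)
```

A notebook built around something like this could let users swap in their own scorer, LLM, and prompt set, so the numbers reflect their network and input space rather than ours.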