Expand our latency benchmarking capabilities #87

@dskarbrevik

Description

Currently we have a variety of UQ scoring methods for various scenarios. Because these scorers work differently, their runtime performance differs as well. Combined with the varying speeds of different LLMs and the nature of different user input spaces (text input space), UQLM's actual runtime performance can vary dramatically.

We imagine that a key concern for users deciding whether to adopt UQLM in production workflows is: "How much latency will UQLM add to my application's LLM pipeline?"

Right now I see two angles from which we can help users with this concern:

  1. Help users more quickly understand the runtime performance they should expect in a variety of different scenarios

Possible solutions could include:

  • A chart on our readme or docs site that showcases a few experiments we ran and the results
  • A benchmarking/leaderboard section of the doc site (more mature version of the above solution)
  2. Give users a way to easily test runtime performance in their specific scenario (their UQLM scorer choices, their LLM, their data, their network bandwidth, etc.)

Possible solutions could include:

  • A notebook in /examples that guides a user through setting up and running a runtime performance experiment
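To make the notebook idea concrete, here is a minimal sketch of the kind of timing harness such a notebook could walk users through. The names `measure_latency` and `fake_scorer` are hypothetical, not part of the UQLM API; in the real notebook, the stub would be replaced by an actual UQLM scorer call against the user's own LLM and data:

```python
import statistics
import time


def measure_latency(score_fn, prompts, runs=3):
    """Time score_fn over each prompt, repeated `runs` times.

    score_fn is a stand-in for whatever scoring call the user wants
    to benchmark; returns simple summary statistics in seconds.
    """
    samples = []
    for _ in range(runs):
        for prompt in prompts:
            start = time.perf_counter()
            score_fn(prompt)
            samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "n": len(samples),
        "mean_s": statistics.mean(samples),
        "p95_s": samples[int(0.95 * (len(samples) - 1))],
    }


# Hypothetical stand-in for a real scorer call; simulates ~1 ms of work.
def fake_scorer(prompt):
    time.sleep(0.001)
    return 0.5


stats = measure_latency(fake_scorer, ["q1", "q2", "q3"])
print(stats["n"])  # 9 samples: 3 prompts x 3 runs
```

A notebook version could extend this with per-scorer comparisons and a plot of latency versus prompt length, which would directly answer the "how much latency does UQLM add" question for a user's own setup.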
