Expand our latency benchmarking capabilities #87

@dskarbrevik

Description

Currently we have a variety of UQ scoring methods for various scenarios. Because these scorers work differently, their runtime performance differs as well. Combined with the varying speeds of different LLMs and the nature of different user input spaces (text input space), UQLM's actual runtime performance can vary dramatically.

We imagine that a key concern for users deciding whether to adopt UQLM in production workflows is: "How much latency will UQLM add to my application's LLM pipeline?"

Right now I see two angles from which we can help users with this concern:

  1. Help users more quickly understand the runtime performance they should expect in a variety of different scenarios

Possible solutions could include:

  • A chart on our readme or docs site that showcases a few experiments we ran and the results
  • A benchmarking/leaderboard section of the doc site (more mature version of the above solution)
  2. Give users a way to easily test runtime performance in their specific scenario (their UQLM scorer choices, their LLM, their data, their network bandwidth, etc.)

Possible solutions could include:

  • A notebook in /examples that guides a user through setting up and running a runtime performance experiment
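To make the notebook idea concrete, here is a minimal sketch of the kind of timing harness such a notebook could walk users through. The names `measure_latency` and `fake_scorer` are hypothetical, not part of the UQLM API; in the real notebook, the stub would be replaced by an actual UQLM scorer call against the user's own LLM and data:

```python
import statistics
import time


def measure_latency(score_fn, prompts, runs=3):
    """Time score_fn over each prompt, repeated `runs` times.

    score_fn is a stand-in for whatever scoring call the user wants
    to benchmark; returns simple summary statistics in seconds.
    """
    samples = []
    for _ in range(runs):
        for prompt in prompts:
            start = time.perf_counter()
            score_fn(prompt)
            samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "n": len(samples),
        "mean_s": statistics.mean(samples),
        "p95_s": samples[int(0.95 * (len(samples) - 1))],
    }


# Hypothetical stand-in for a real scorer call; simulates ~1 ms of work.
def fake_scorer(prompt):
    time.sleep(0.001)
    return 0.5


stats = measure_latency(fake_scorer, ["q1", "q2", "q3"])
print(stats["n"])  # 9 samples: 3 prompts x 3 runs
```

A notebook version could extend this with per-scorer comparisons and a plot of latency versus prompt length, which would directly answer the "how much latency does UQLM add" question for a user's own setup.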
