a cost-to-be-correct meter for model selection #5869

Open
kvchitrapu opened this issue Dec 27, 2024 · 3 comments
Labels: enhancement (New feature or request)


@kvchitrapu
Contributor

kvchitrapu commented Dec 27, 2024

What problem or use case are you trying to solve?

Users need a way to balance cost and accuracy when choosing AI models for their specific tasks. For example, someone might prefer paying for a high-accuracy model for unfamiliar languages like TrueScript but would opt for a less expensive, lower-tier model for Python, where they can handle corrections themselves.

Describe the UX of the solution you'd like

A clear, intuitive "Cost-to-be-Correct Meter" within the UI that shows:

  1. The estimated cost of using a particular model.
  2. A relative accuracy score or confidence level.
  3. Guidance or recommendations based on the user's chosen parameters (e.g., language familiarity or budget).

This should let users compare options and make informed decisions directly from the interface.

Do you have thoughts on the technical implementation?

  1. Integrate the meter into the model selection workflow.
  2. Pull cost and accuracy metrics dynamically from model data, perhaps from existing SWE results already run on each model.
  3. Allow users to input parameters like budget and level of familiarity with a language to generate personalized recommendations.
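
To make the three steps above concrete, here is a minimal sketch of how such a meter could be computed. Everything in it is an assumption for illustration: the `ModelStats` fields, the `cost_to_be_correct` and `recommend` helpers, the scoring formula, and the numbers are hypothetical and not taken from the project or from any real benchmark run. It treats "cost to be correct" as the expected spend per successful task (cost per attempt divided by success rate) and weights accuracy more heavily when the user is less familiar with the language.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelStats:
    """Hypothetical per-model metrics, e.g. pulled from past evaluation runs."""
    name: str
    cost_per_task_usd: float  # average spend per attempted task
    success_rate: float       # fraction of tasks solved, in [0, 1]


def cost_to_be_correct(m: ModelStats) -> float:
    """Expected spend per *solved* task, assuming independent retries."""
    if m.success_rate <= 0:
        return float("inf")
    return m.cost_per_task_usd / m.success_rate


def recommend(models: list[ModelStats],
              budget_per_task_usd: float,
              familiarity: float) -> Optional[ModelStats]:
    """Pick a model within budget.

    familiarity is in [0, 1]: 0 means the user cannot fix mistakes
    themselves (favor accuracy), 1 means they easily can (favor low cost).
    The linear trade-off below is an illustrative assumption only.
    """
    affordable = [m for m in models if m.cost_per_task_usd <= budget_per_task_usd]
    if not affordable:
        return None
    max_cost = max(m.cost_per_task_usd for m in affordable)

    def score(m: ModelStats) -> float:
        cost_norm = m.cost_per_task_usd / max_cost if max_cost > 0 else 0.0
        return (1 - familiarity) * m.success_rate - familiarity * cost_norm

    return max(affordable, key=score)


if __name__ == "__main__":
    # Made-up numbers, purely for demonstration.
    catalog = [
        ModelStats("model-a", cost_per_task_usd=0.10, success_rate=0.30),
        ModelStats("model-b", cost_per_task_usd=1.00, success_rate=0.50),
    ]
    for m in catalog:
        print(f"{m.name}: ${cost_to_be_correct(m):.2f} per solved task")
    print(recommend(catalog, budget_per_task_usd=2.0, familiarity=0.1).name)
    print(recommend(catalog, budget_per_task_usd=2.0, familiarity=0.9).name)
```

A static lookup table would be simpler, but deriving the figure from measured cost and success rate keeps the meter up to date as new evaluation results arrive.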

Describe alternatives you've considered

  1. Hardcoding static recommendations based on language or task.

This feature could enhance the user experience by reducing friction in selecting the right model while aligning costs with needs. It could also serve as a unique value proposition for the project.

@kvchitrapu kvchitrapu added the enhancement New feature or request label Dec 27, 2024
@neubig neubig self-assigned this Dec 28, 2024
@neubig
Contributor

neubig commented Dec 28, 2024

I'll try to work on this (based on @xingyaoww's spreadsheet) unless someone else would like to take it.

@BradKML

BradKML commented Dec 29, 2024

Okay, so how do we determine which models take priority for inclusion on the board? Does it include or exclude finetunes and specialized merges (see Sakana AI) that are not available on conventional providers like OpenRouter? If it includes them, is it possible to show multiple benchmarks so that people can see how consistent each model is (in case of benchmark hacking)?
Also, as a secondary question: would OpenRouter be treated as a universal standard? Or could HuggingFace models be loaded on a rented GPU and monitored instead?

@kvchitrapu
Contributor Author

@neubig, I appreciate you taking this on! Happy to collaborate with you on developing this feature. While I have a conceptual understanding, it’d be helpful if you could provide a starting point and an outline for implementation. Let me know how I can assist.

@BradKML, as a starting point, priority could be guided by the OH-verified configurations, as @neubig mentioned using @xingyaoww's spreadsheet. Once we’ve solidified usability, we can potentially expand to include specialized merges.

Development

No branches or pull requests

3 participants