You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Users need a way to balance cost and accuracy when choosing AI models for their specific tasks. For example, someone might prefer paying for a high-accuracy model for unfamiliar languages like TrueScript but would opt for a less expensive, lower-tier model for Python, where they can handle corrections themselves.
Describe the UX of the solution you'd like
A clear, intuitive "Cost-to-be-Correct Meter" within the UI that shows:
The estimated cost of using a particular model.
A relative accuracy score or confidence level.
Guidance or recommendations based on the user's chosen parameters (e.g., language familiarity or budget).
This should allow users to compare options and make informed decisions directly from the interface.
Do you have thoughts on the technical implementation?
Integrate the meter into the model selection workflow.
Pull cost and accuracy metrics dynamically from model data. Perhaps from existing SWE results run on the model?
Allow users to input parameters like budget and level of familiarity with a language to generate personalized recommendations.
Describe alternatives you've considered
Hardcoding static recommendations based on language or task.
This feature could enhance the user experience by reducing friction in selecting the right model while aligning costs with needs. It could also serve as a unique value proposition for the project.
The text was updated successfully, but these errors were encountered:
Okay, so how do we determine which ones take priority to be put on the board? Does it include or exclude finetunes and specialized merges (see Sakana AI) that are not on conventional providers like OpenRouter? If it does, is it possible to include multiple benchmarks such that people can see how consistent it is (in case of benchmark hacking)?
Also as a secondary question: would OpenRouter be treated as a universal standard? Maybe HuggingFace models loaded on a rented GPU, and have that monitored instead?
@neubig, I appreciate you taking this on! Happy to collaborate with you on developing this feature. While I have a conceptual understanding, it’d be helpful if you could provide a starting point and an outline for implementation. Let me know how I can assist.
@BradKML , as a starting point, priority could be guided by the OH-verified configurations, as @neubig mentioned using @xingyaoww 's spreadsheet. Once we’ve solidified usability, we can potential expand to include specialized merges.
What problem or use case are you trying to solve?
Users need a way to balance cost and accuracy when choosing AI models for their specific tasks. For example, someone might prefer paying for a high-accuracy model for unfamiliar languages like TrueScript but would opt for a less expensive, lower-tier model for Python, where they can handle corrections themselves.
Describe the UX of the solution you'd like
A clear, intuitive "Cost-to-be-Correct Meter" within the UI that shows:
This should allow users to compare options and make informed decisions directly from the interface.
Do you have thoughts on the technical implementation?
Describe alternatives you've considered
This feature could enhance the user experience by reducing friction in selecting the right model while aligning costs with needs. It could also serve as a unique value proposition for the project.
The text was updated successfully, but these errors were encountered: