Describe the Feature
The Factual Correctness metric takes a long time to compute, and it is often useful to inspect precision and recall in addition to F1. It would therefore help if, instead of making three separate calls to the same metric, a single call could return all three scores at once. Something similar already exists for metrics like ROUGE, but there it matters less because those metrics are much cheaper to compute.
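Precision, recall and F1 can all be derived from the same claim verdicts, so one pass could in principle return all three. The sketch below assumes the metric splits the response and the reference into claims and uses an LLM to verify them in both directions (response claims against the reference, reference claims against the response). `scores_from_verdicts` and `FactualScores` are hypothetical names used for illustration, not an existing library API.

```python
from dataclasses import dataclass


@dataclass
class FactualScores:
    precision: float
    recall: float
    f1: float


def scores_from_verdicts(
    response_claims_supported: list[bool],
    reference_claims_covered: list[bool],
    beta: float = 1.0,
) -> FactualScores:
    """Compute precision, recall and F-beta from one set of (assumed) LLM verdicts.

    response_claims_supported: one verdict per response claim, True if the
        reference supports it (hypothetical input format).
    reference_claims_covered: one verdict per reference claim, True if the
        response covers it (hypothetical input format).
    """
    # Supported response claims count as true positives, unsupported ones as false positives.
    tp = sum(response_claims_supported)
    fp = len(response_claims_supported) - tp
    # Reference claims missing from the response count as false negatives.
    fn = len(reference_claims_covered) - sum(reference_claims_covered)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0

    # F-beta score; beta = 1 gives the usual F1.
    b2 = beta ** 2
    denom = b2 * precision + recall
    fbeta = (1 + b2) * precision * recall / denom if denom else 0.0
    return FactualScores(precision=precision, recall=recall, f1=fbeta)


if __name__ == "__main__":
    scores = scores_from_verdicts([True, True, True, False], [True, True, False, False])
    print(scores)
```

Under this assumption, the verdicts an F1 run already needs contain everything required for precision and recall, so returning all three together would add no extra LLM calls.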
Why is the feature important for you?
For a test set of only 50 questions, running these three metrics takes about 5 minutes. That does not scale, and it adds up quickly alongside other slow metrics: running 10 such metrics on the same test set can take up to an hour. Wherever LLM calls can be reduced, I think it is crucial to do so.
Thank you very much!