[Feature Proposal] Multi-needle in a haystack #41

jsharf · 2024-03-27T15:45:44Z

I really like this kind of benchmark. It would be interesting to make generalized versions of this, where a variable number of needles is inserted. These could be unrelated, independent needles, or they could be related. For example, you could imagine 4 needles:

A implies B
B implies C, D
D implies E
B is true

Then you could test the "related" needles, to ensure that all of them were detected and the relationship is understood. (What might A be? What about D?)
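As a rough sketch of how the expected answers for the example above could be worked out, simple forward chaining over the implications is enough. The `RULES` table and `forward_chain` helper are hypothetical names made up for illustration, not anything in the repo:

```python
# Hypothetical sketch: derive which facts follow from the four needles above.
# Each rule maps a premise to the conclusions it implies.
RULES = {
    "A": ["B"],
    "B": ["C", "D"],
    "D": ["E"],
}


def forward_chain(rules, known):
    """Return every fact derivable from `known` by repeatedly applying `rules`."""
    derived = set(known)
    frontier = list(known)
    while frontier:
        fact = frontier.pop()
        for conclusion in rules.get(fact, []):
            if conclusion not in derived:
                derived.add(conclusion)
                frontier.append(conclusion)
    return derived


facts = forward_chain(RULES, {"B"})
print(sorted(facts))  # ['B', 'C', 'D', 'E']
```

A never shows up in the derived set: knowing B is true says nothing about whether A holds, so "unknown" would be the expected answer for A, while D follows directly from B.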

Curious what you think about this. If you're interested in a feature like this and willing to accept a pull request, I could find the time to try implementing it. If you have a style guide preference or anything like that, please let me know.

gkamradt · 2024-03-27T17:22:59Z

Hey! Awesome post and request

Couple things:

I totally agree that reasoning should be a part of the next set of tests. As an aside, I've been wondering what the "unit test" of reasoning is - what is the minimal amount of reasoning we can start with? It may be the transitive reasoning you're referring to here. I like this because you can easily append additional chains, and even put forks in the logic.
Lance from LangChain added multi-needle recall, but it didn't have reasoning in there.
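A minimal sketch of what "appending additional chains" could look like, assuming two hypothetical helpers, `chain_needles` and `insert_at_depths`. Neither exists in the repo, and the even spacing of needles is just an assumption for illustration:

```python
def chain_needles(symbols, start_fact):
    """Build 'X implies Y.' needles for consecutive symbols, plus one seed fact."""
    needles = [f"{a} implies {b}." for a, b in zip(symbols, symbols[1:])]
    needles.append(f"{start_fact} is true.")
    return needles


def insert_at_depths(haystack_sentences, needles):
    """Spread needles evenly through the haystack, keeping their order."""
    result = list(haystack_sentences)
    count = len(needles)
    # Insert from the back so earlier insertion positions stay valid.
    for i in reversed(range(count)):
        position = (i + 1) * len(haystack_sentences) // (count + 1)
        result.insert(position, needles[i])
    return result


needles = chain_needles(["A", "B", "C", "D"], start_fact="A")
haystack = [f"Filler sentence {n}." for n in range(10)]
context = insert_at_depths(haystack, needles)
```

Making the chain longer is just a matter of passing more symbols, and a fork could be added by appending one more "X implies Y." needle with a repeated premise.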

We are trying to have the repo separate tests from providers from evaluators. Other than that, no style guide.

Contributions are very welcome and we'll be quick with feedback

gkamradt · 2024-03-27T23:47:26Z

More context here too
https://twitter.com/GregKamradt/status/1772491996063526971
