Description
To compare GPT model versions released before and after the DecodingTrust benchmark was published, I would like to evaluate the following models:
gpt-3.5-turbo / gpt-3.5-turbo-1106 / gpt-3.5-turbo-0125
gpt-4 / gpt-4-0613 / gpt-4-1106-preview / gpt-4-turbo-2024-04-09
However, as far as I can tell, the benchmark evaluation (in this case for toxicity) depends on crfm-helm version 0.2.3, which ships with only three GPT-3.5-turbo versions, all of which are deprecated and no longer usable.
Is there any way to use other GPT models, such as the ones listed above? I have tried upgrading the crfm-helm package, but that introduces so many changes and follow-on problems that it does not seem feasible, if it is possible at all.
I would very much appreciate any help with this.
Thanks a lot,
Leon