-
Notifications
You must be signed in to change notification settings - Fork 450
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT added support for scorers powered by thinking models #719
base: main
Are you sure you want to change the base?
Conversation
Thanks for this submission. Can you elaborate on the problem? The code doesn't really tell me what you are running into. What do you mean by "thinking"? Perhaps the reasoning steps models like o1 take? In general, I think it's best to start with an issue and decide there whether or not a PR is needed, and what the change should be. It usually saves you time because maintainers can advise on what the change should look like. For example, making this change would break a lot of functionality in PyRIT as existing model endpoints depend on that logic. Let me know what you think so we can figure out a solution for the problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed over on Discord, let's move this to remove_markdown_json
, and rename that function to something like extract_json
. Very cool! A few lines describing all the cases this can handle would also be nice. And we probably want to catch the index out of bound case specifically rather than doing a bare except.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Can we add a few basic unit tests? Should include
Empty string
Valid json only
Valid json with other characters before (at least one case with markdown)
Valid json with other characters after
Valid Jason between other characters around it
JSON parsing failure
@@ -273,7 +273,13 @@ async def _score_value_with_llm( | |||
try: | |||
response_json = response.request_pieces[0].converted_value | |||
|
|||
response_json = remove_markdown_json(response_json) | |||
try: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you have another version of it here? It's not obvious to me that it has different functionality. Can you elaborate?
response = response.split("}")[0] + "}" | ||
return json.loads(response) | ||
except (json.JSONDecodeError, IndexError): | ||
return "Invalid JSON response: {}".format(response_msg) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to raise InvalidJsonException to trigger retries
Description
Changed the function _score_value_with_llm (pyrit/score/scorer.py) to support thinking models and any cases where the scorer model does not output only the requested json.
Tests and Documentation
Haven't made any tests as of yet, mainly due to lack of knowledge regarding how to do it and lack of time