-
Notifications
You must be signed in to change notification settings - Fork 461
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FEAT: Improve prompts reference to objectives for offline results analysis #724
Conversation
* The prompt entries stored for the multi-turn attacks loose the link to the objective for which the prompt is applicable to. This makes it hard or impossible to map the prompt entries to the objective when pulling the data out for further analysis. * This change captures the objective with the orchestrator identifier. * The `orchestrator_identifier` json blob is enriched with a base64 encoded objective and an id derived from the objective string. The reason for the base64 encoding is to address any encoding issues when capturing the content in the json file. And the hash is to provide a fixed length ID for the objective. Co-authored-by: Nicole Pellicena <[email protected]>
* When there are many objectives, the systems prompts used within an orchestrator doesn't have a reference to the objective they were applicable to. Having a reference to the objective helps with understanding the system prompts used during the attack. This change adds the link to the objective within the orchestrator_identifier. Co-authored-by: Imran Bohoran <[email protected]>
@microsoft-github-policy-service agree company="Mindgard" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great idea and really good use case! I tried to spell my feedback out in Discord but my text was likely too short.
I recommend one big change; objective
should not go in the orchestrator_identifier
. A single orchestrator can have many objectives - so I think saving it here doesn't make sense. orchestrator_identifier
should be unique per orchestrator object.
But we do want to keep track of objective. I think conversation_objective
should be new property added to PromptRequestPiece
. Another option would be a new conversation table, but I think storing it to the PromptRequestpiece
makes the most sense. Either way, any conversation stored/retrieved from the db should have an objective. It'll make scoring more intuitive also, and we can get rid of tasks
.
Because prompts are saved in prompt_normalizer, this will likely also be a bigger more central change. We need to change seed_prompt to also have this objective. This is good in that it'll make everything more consistent, but also a nuanced change to tackle.
This is not the easiest first issue to tackle. I created this issue to track. If you're interested, don't hesitate to take and/or reach out. If not, it is something our team will likely take in the next month or so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be in every orchestrator?
raise ValueError('objective is required') | ||
|
||
orchestrator_identifier = self.get_identifier() | ||
orchestrator_identifier["objective_base64"] = str(base64.b64encode(objective.encode('utf-8')), encoding='utf-8') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At reason why base64 and a hash? Why not just the objective as text? As someone who may use this, I would find it a bit annoying that I can't read it when looking at the entries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having a look at the change(s). The thinking was to address any potential encoding issues with the content in the objective when adding it in the json string (hence base64) and naming it such that anyone wants to view it could then decode that for display purposes.
The hash was to be able to be able to have a fixed size string to help with possible aggregations on the prompts (i.e. group by objective)
We tried to put some reasoning in the commit message for the same.
Thanks for the explanation. Adding it as a new property in the I'll be very happy to and very much interested to contribute to this change, and given that you've already created an issue to track this, should we take the discussion there and we can perhaps start off with some proposals on how to get started? |
The concept of the objective seemed to only be available in Multi-turn orchestrators. When using the single-turn orchestrators we didn't come across a place to provide an objective, hence the reason why this was applied only on the multi-turn orchestrator. |
I'm going to close this in favor of the issue here: #726 This is a problem we want to tackle but we want to go about it like above (which is really also following your idea @imranbohoran :)) |
Description
When analysing results of a set of attacks, we loose the reference to the objectives (when multiple objectives are provided in a multi-turn attack) within the prompts. This pull request attempts to capture the objective on each prompt using the
orchestrator_identifier
of the persisted PromptEntry.Links to the discussion on discord - https://discord.com/channels/1311106595429548142/1311106596159623261/1339989315618607145
CC: @romanlutz @rlundeen2
Changes in this PR:
FEAT: Enhance attack result data
for which the prompt is applicable to. This makes it hard or impossible to map the
prompt entries to the objective when pulling the data out for further analysis.
orchestrator_identifier
json blob is enriched with a base64 encoded objective andan id derived from the objective string.
The reason for the base64 encoding is to address any encoding issues when capturing
the content in the json file. And the hash is to provide a fixed length ID for the objective
FEAT: Enrich orchestrator_identifier for system prompts
an orchestrator doesn't have a reference to the objective they were
applicable to. Having a reference to the objective helps with understanding
the system prompts used during the attack. This change adds the link to the
objective within the orchestrator_identifier.