Compare our list of jailbreak templates with ps-fuzz #772
Comments
Hi, I'd like to handle this as my first issue.
Awesome! Go ahead @ryanjieh! I think the first step would be to create a list of what they have, then compare with PyRIT and post the results here.
The prompts for ps-fuzz (in ps-fuzz/ps-fuzz/attack_data/harmful_behaviour.csv) are the same as those in https://github.com/llm-attacks/llm-attacks/blob/main/data/advbench/harmful_behaviors.csv, which is where PyRIT's malicious prompts come from. In ps-fuzz/ps-fuzz/attack_data/prompt_injection_from_base64.parquet there are 25 new prompts that can be injected via base64 conversion. I was wondering whether this issue is instead about looking at the Python scripts used to jailbreak in ps-fuzz/ps-fuzz/attacks?
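For anyone repeating the dataset check, here is a minimal sketch of one way to confirm that two prompt CSVs overlap. It is not how the comparison above was actually done. The column name `goal`, the helper names `load_prompts` and `compare_prompts`, and the placeholder rows are all assumptions for illustration; in practice you would read the two real files instead of the inline strings.

```python
import csv
import io


def load_prompts(csv_text, column="goal"):
    """Return the set of normalized prompts found in one CSV column.

    Normalization here means collapsing whitespace and lowercasing, so that
    trivially different copies of the same prompt compare equal.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {" ".join(row[column].split()).lower() for row in reader if row.get(column)}


def compare_prompts(ours, theirs):
    """Split two prompt sets into shared, only-ours, and only-theirs."""
    return {
        "shared": ours & theirs,
        "only_ours": ours - theirs,
        "only_theirs": theirs - ours,
    }


# Placeholder rows standing in for the real CSV contents (assumed layout).
pyrit_csv = "goal,target\nPlaceholder prompt A,Sure\nPlaceholder prompt B,Sure\n"
psfuzz_csv = "goal,target\nplaceholder  prompt a,Sure\nPlaceholder prompt B,Sure\n"

result = compare_prompts(load_prompts(pyrit_csv), load_prompts(psfuzz_csv))
print(len(result["shared"]), len(result["only_ours"]), len(result["only_theirs"]))
```

If `only_theirs` comes back empty, the other dataset adds nothing new on the prompt side, which matches the finding reported above for the CSV file.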
Ah! Good question! I was mainly referring to the attacks but clearly underspecified it above. We definitely have some of them, such as DAN as a jailbreak template. We also have typoglycemia, or rather a more generic version called noise converter. So this task is about going through the list of what they have and comparing that with PyRIT. If you're not sure whether something exists in PyRIT, feel free to ask, of course.
Summary of ps-fuzz attacks 140325.xlsx
Please tell me if I should make any PRs to implement anything. Here is my summary of attacks in psfuzz/psfuzz/attacks:
I updated the formatting to make it readable as a table. A few notes:
Great work on matching them to PyRIT! Are you interested in adding any of the prompt templates from dynamic_test or ethical_compliance?
Sure! I can add them in a PR.
I added the prompt templates (1 system prompt from dynamic_test, 2 system prompts from ethical_compliance) as YAML files and made a draft PR. Sorry for the late response.
ps-fuzz: https://github.com/prompt-security/ps-fuzz/tree/main
The task here is to compare our list with theirs and add any we might be missing to ours.
They use the MIT license just like us, so there should not be an issue. Obviously, anything we copy needs to be attributed correctly (using `authors` and `groups` as applicable) and linked (using the `source` field).
Even just comparing and reporting the comparison results in this thread is a great first step!
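As a rough illustration of the attribution requirement, the sketch below checks that a template's metadata carries the three fields named above before it is contributed. It assumes the template has already been loaded into a plain dict; `missing_attribution` and the sample metadata are hypothetical and not part of PyRIT. Since `groups` is only needed "as applicable", a missing value is something to review rather than a hard error.

```python
# Attribution fields named in this issue; the check itself is a hypothetical helper.
REQUIRED_ATTRIBUTION_FIELDS = ("authors", "groups", "source")


def missing_attribution(template):
    """Return the attribution fields that are absent or empty in a template dict."""
    return [field for field in REQUIRED_ATTRIBUTION_FIELDS if not template.get(field)]


# Hypothetical template metadata, as it might look after loading a YAML file.
template = {
    "name": "example_template",
    "source": "https://github.com/prompt-security/ps-fuzz/tree/main",
    "authors": [],
}

print(missing_attribution(template))
```

Here the empty `authors` list and the absent `groups` key are both reported, so the contributor knows what to fill in or consciously leave out.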