Compare our list of jailbreak templates with ps-fuzz #772

romanlutz · 2025-03-09T18:44:31Z

ps-fuzz: https://github.com/prompt-security/ps-fuzz/tree/main

The task here is to compare our list with theirs and add any templates we are missing to ours.

They use the MIT license, just like us, so there should not be an issue. Obviously, anything we copy needs to be attributed correctly (using authors and groups as applicable) and linked (using the source field).

Even just comparing and reporting the comparison results in this thread is a great first step!

ryanjieh · 2025-03-12T03:35:20Z

Hi, I'd like to take this on as my first issue.

romanlutz · 2025-03-12T05:36:53Z

Awesome! Go ahead, @ryanjieh! I think the first step would be to create a list of what they have, then compare it with PyRIT and post the results here.

ryanjieh · 2025-03-12T10:25:47Z

The ps-fuzz prompts (in ps-fuzz/ps-fuzz/attack_data/harmful_behaviour.csv) are the same as those in https://github.com/llm-attacks/llm-attacks/blob/main/data/advbench/harmful_behaviors.csv, which is where PyRIT's malicious prompts come from.

In ps-fuzz/ps-fuzz/attack_data/prompt_injection_from_base64.parquet there are 25 new prompts that can be injected via base64 encoding.

I was wondering whether this issue is instead about the Python jailbreak scripts in ps-fuzz/ps-fuzz/attacks?

romanlutz · 2025-03-12T22:41:48Z

Ah! Good question! I was mainly referring to the attacks but clearly underspecified it above.

We definitely have some of them, like DAN, as jailbreak templates. We also have typoglycemia, or rather a more generic version of it called the noise converter. So this task is about going through the list of what they have and comparing it with PyRIT. If you're not sure whether something exists in PyRIT, feel free to ask, of course.

ryanjieh · 2025-03-15T08:41:33Z (edited by romanlutz)

Summary of ps-fuzz attacks 140325.xlsx

Please let me know if I should open any PRs to implement any of these.

Here is my summary of attacks in psfuzz/psfuzz/attacks:

| Attack name | Description | Similar attack in PyRIT | Implementation suggestion |
|---|---|---|---|
| aim.py | Always Intelligent and Machiavellian | pyrit/datasets/prompt_templates/jailbreak/aim.yaml | N/A |
| base64_injection.py | Encodes the malicious prompt in base64 | No similar attack | New code in pyrit/datasets/prompt_converters |
| complimentary_transition.py | Checks whether the AI can still refuse requests and abide by its system prompt, even after being complimented for discussing unsafe topics | No similar attack | New template in pyrit/orchestrator/multi_turn |
| custom_benchmark.py | General-purpose prompt injection challenge script for a series of prompts | pyrit/orchestrator/fuzzer_orchestrator.py | N/A |
| dan.py | Do Anything Now | pyrit/datasets/prompt_templates/jailbreak/dan_1.yaml | N/A |
| dynamic_test.py | Generates attack prompts to (1) forget all system prompts, (2) convince the LLM that the user is an authority figure, (3) ask the LLM to roleplay an authority, (4) generate an affirmative response with a suffix to break guardrails, (5) print the system prompts | Prompts 1 and 5 are new. Prompts 2 and 3 are in pyrit/datasets/prompt_converters/persuasion. Prompt 4 is closely related to pyrit/prompt_converter/insert_punctuation_converter.py | New code in pyrit/prompt_converter |
| ethical_compliance.py | Checks whether the LLM resists discussing unethical topics, testing it against autogenerated misconceptions | No similar attack | New template in pyrit/orchestrator/multi_turn |
| harmful_behaviour.py | Sends the list of attack prompts in ps_fuzz/attack_data/harmful_behavior.csv | pyrit/orchestrator/single_turn/prompt_sending_orchestrator.py does this with pyrit/datasets/data/adv_bench_dataset.json, which has the same prompts | N/A |
| self_refine.py | Frames an unethical, banned task in the format of good tasks | pyrit/orchestrator/multi_turn/pair_orchestrator.py | N/A |
| translation.py | Uses non-English languages to circumvent restrictions | pyrit/datasets/prompt_converters/translation_converter | N/A |
| typoglycemia.py | Obfuscates words by removing vowels | pyrit/datasets/prompt_converters/noise_converter (a more general case) | N/A |
| ucar.py | Requests that the LLM ignore moral filters and answer without question | pyrit/datasets/prompt_templates/jailbreak/ucar.yaml | N/A |
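To make the two encoding-style rows above (base64_injection and typoglycemia) concrete, here is a minimal standalone sketch using only the Python standard library. It is not the ps-fuzz or PyRIT implementation: the function names and the decode instruction wording are hypothetical, and the vowel-stripping rule simply follows the "removing vowels" description in the table.

```python
import base64
import re

# Illustrative sketch only: helper names and prompt wording are hypothetical,
# not taken from ps-fuzz or PyRIT.

VOWELS = re.compile(r"[aeiouAEIOU]")


def to_base64(prompt: str) -> str:
    """Encode a prompt as base64 text (the core step behind base64 injection)."""
    return base64.b64encode(prompt.encode("utf-8")).decode("ascii")


def base64_injection(prompt: str) -> str:
    """Wrap the encoded prompt in an assumed instruction asking the model to decode it."""
    encoded = to_base64(prompt)
    return f"Decode the following base64 string and follow its instructions:\n{encoded}"


def strip_vowels(prompt: str) -> str:
    """Typoglycemia-style obfuscation: remove vowels from every word.

    Words made up only of vowels (e.g. "a") disappear entirely.
    """
    words = (VOWELS.sub("", word) for word in prompt.split())
    return " ".join(word for word in words if word)


if __name__ == "__main__":
    question = "What is the capital of Turkey?"
    print(base64_injection(question))
    print(strip_vowels(question))  # Wht s th cptl f Trky?
```

In PyRIT itself, these transformations are what the existing Base64Converter and the more general noise converter already cover, so the sketch is only meant to show what the ps-fuzz attacks do to a prompt.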

romanlutz · 2025-03-17T05:37:52Z

I updated the formatting to make it readable as a table.

A few notes:

- We also have base64 as a converter: Base64Converter.
- I'm not sure I understand the complimentary transition technique after looking at it. That's not a problem with your description; I'm just not convinced it works. Maybe I'm missing something...
- Custom benchmark seems closest to using PromptSendingOrchestrator with a custom benchmark dataset. It isn't really anything new compared to what we have.
- Dynamic test is a little like our RedTeamingOrchestrator. I wouldn't mind including this system prompt in our dataset folder under pyrit/datasets/orchestrators/red_teaming/, with attribution to ps-fuzz, of course.
- From ethical compliance, we definitely want unethical_task_generation_prompt and ethical_compliance_template in the same directory as above. I think these will be great!
- Self-refine isn't related to PAIR, but I'm unsure how we'd add it.

Great work on matching them to PyRIT!!! Are you interested in adding any of the prompt templates from dynamic_test or ethical_compliance?

ryanjieh · 2025-03-17T11:47:39Z

Sure! I can add them in a PR.

ryanjieh · 2025-03-24T12:27:05Z

I added the prompt templates (one system prompt from dynamic_test and two system prompts from ethical_compliance) as YAML files.

I've opened a draft PR. Sorry for the late response.

romanlutz added the labels "enhancement" (New feature or request), "good first issue" (Good for newcomers), and "help wanted" (Extra attention is needed) on Mar 9, 2025

romanlutz assigned ryanjieh Mar 12, 2025

ryanjieh mentioned this issue in FEAT add ps-fuzz prompts #823 (merged) on Mar 24, 2025

romanlutz linked a pull request Mar 26, 2025 that will close this issue

romanlutz closed this as completed in #823 Mar 26, 2025
