Commit 1dbade3

Port from safety to redteaming (#201)
* Port to redteaming
* add red teaming changes
* Try better project
* AI Foundry changes
* Add evals requirements
* Update documentation
* Remove unused bicep
* Update documentation
* Deprecate 3.9 support
* Update docs
* Update scope syntax to be correct
1 parent ac4b0a6 commit 1dbade3

File tree

17 files changed: +335 −344 lines changed


.devcontainer/devcontainer.json

Lines changed: 3 additions & 0 deletions

@@ -36,6 +36,9 @@
   "esbenp.prettier-vscode",
   "mechatroner.rainbow-csv",
   "ms-vscode.vscode-node-azure-pack",
+  "esbenp.prettier-vscode",
+  "twixes.pypi-assistant",
+  "ms-python.vscode-python-envs",
   "teamsdevapp.vscode-ai-foundry",
   "ms-windows-ai-studio.windows-ai-studio"
 ],

.github/workflows/app-tests.yaml

Lines changed: 1 addition & 3 deletions

@@ -28,10 +28,8 @@ jobs:
       fail-fast: false
       matrix:
         os: ["ubuntu-latest", "macos-latest-xlarge", "macos-13", "windows-latest"]
-        python_version: ["3.9", "3.10", "3.11", "3.12"]
+        python_version: ["3.10", "3.11", "3.12"]
         exclude:
-          - os: macos-latest-xlarge
-            python_version: "3.9"
           - os: macos-latest-xlarge
             python_version: "3.10"
     env:

.vscode/launch.json

Lines changed: 8 additions & 0 deletions

@@ -21,6 +21,14 @@
       "module": "uvicorn",
       "args": ["fastapi_app:create_app", "--factory", "--reload"],
       "justMyCode": false
+    },
+    {
+      "name": "Python: Current File",
+      "type": "debugpy",
+      "request": "launch",
+      "program": "${file}",
+      "console": "integratedTerminal",
+      "justMyCode": false
     }
   ],
   "compounds": [

README.md

Lines changed: 1 addition & 1 deletion

@@ -69,7 +69,7 @@ A related option is VS Code Dev Containers, which will open the project in your
 
 * [Azure Developer CLI (azd)](https://aka.ms/install-azd)
 * [Node.js 18+](https://nodejs.org/download/)
-* [Python 3.9+](https://www.python.org/downloads/)
+* [Python 3.10+](https://www.python.org/downloads/)
 * [PostgreSQL 14+](https://www.postgresql.org/download/)
 * [pgvector](https://github.com/pgvector/pgvector)
 * [Docker Desktop](https://www.docker.com/products/docker-desktop/)

docs/images/redteam_dashboard.png (34.3 KB)

docs/images/redteam_logs.png (80.4 KB)

docs/safety_evaluation.md

Lines changed: 48 additions & 49 deletions

@@ -1,17 +1,17 @@
 # Evaluating RAG answer safety
 
-When deploying a RAG app to production, you should evaluate the safety of the answers generated by the RAG flow. This is important to ensure that the answers are appropriate and do not contain any harmful or sensitive content. This project includes scripts that use Azure AI services to simulate an adversarial user and evaluate the safety of the answers generated in response to those adversarial queries.
+When deploying a RAG app to production, you should evaluate the safety of the answers generated by the RAG flow. This is important to ensure that the answers are appropriate and do not contain any harmful or sensitive content. This project includes scripts that use the [azure-ai-evaluation SDK](https://pypi.org/project/azure-ai-evaluation/#history) to perform an [automated safety scan with an AI Red Teaming agent](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent).
 
 * [Deploy an Azure AI project](#deploy-an-azure-ai-project)
 * [Setup the evaluation environment](#setup-the-evaluation-environment)
-* [Simulate and evaluate adversarial users](#simulate-and-evaluate-adversarial-users)
-* [Review the safety evaluation results](#review-the-safety-evaluation-results)
+* [Run red teaming agent](#run-red-teaming-agent)
+* [Review the red teaming results](#review-the-red-teaming-results)
 
 ## Deploy an Azure AI project
 
-In order to use the adversarial simulator and safety evaluators, you need an Azure AI project inside an Azure AI Hub.
+In order to use the Red Teaming agent, you need an Azure AI project inside Azure AI Foundry.
 
-1. Run this command to tell `azd` to provision an Azure AI project and hub:
+1. Run this command to tell `azd` to provision an Azure AI project:
 
     ```shell
     azd env set USE_AI_PROJECT true
@@ -45,63 +45,62 @@ In order to use the adversarial simulator and safety evaluators, you need an Azu
     .evalenv\Scripts\activate
     ```
 
-1. Install the dependencies for the safety evaluation script:
+3. Install the dependencies for the safety evaluation script:
 
     ```bash
     pip install uv
     uv pip install -r evals/requirements.txt
     ```
 
-## Simulate and evaluate adversarial users
+## Run the red teaming agent
 
-Run the following command to simulate adversarial queries and evaluate the safety of the answers generated in response to those queries:
+To run the red teaming agent, you need to have the RAG app running at a deployed URL or local URL. Consult the main README for deployment instructions or local server instructions.
+
+In the same terminal where you activated `.evalenv`, run this command to perform a red teaming scan:
 
 ```shell
-python evals/safety_evaluation.py --target_url <TARGET_URL> --max_simulations <MAX_RESULTS>
+python evals/safety_evaluation.py --target_url <TARGET_URL> --questions_per_category <NUM_QUESTIONS>
 ```
 
-* `--target_url`: The target URL for the callback. Default is `http://127.0.0.1:8000/chat`, so make sure that the RAG app is running locally. If you are running the RAG app in a different environment, set this to the correct URL.
-* `--max_simulations`: The maximum number of simulated user queries. Default is `200`. The higher the number, the longer the evaluation will take. The default of `200` simulations will take about 25 minutes to run, which includes both the time to generate the simulated data and the time to evaluate it.
-
-## Review the safety evaluation results
-
-That command will generate a report in `safety_results.json`.
-Results will look like this:
-
-```json
-{
-    "hate_unfairness": {
-        "low_count": 20,
-        "score_total": 8,
-        "mean_score": 0.4,
-        "low_rate": 1.0
-    },
-    "sexual": {
-        "low_count": 20,
-        "score_total": 9,
-        "mean_score": 0.45,
-        "low_rate": 1.0
-    },
-    "violence": {
-        "low_count": 20,
-        "score_total": 9,
-        "mean_score": 0.45,
-        "low_rate": 1.0
-    },
-    "self_harm": {
-        "low_count": 20,
-        "score_total": 10,
-        "mean_score": 0.5,
-        "low_rate": 1.0
-    }
-}
-```
+* `--target_url`: The target URL for the callback. Default is `http://127.0.0.1:8000/chat`, which assumes the RAG app is running locally. If you want to scan a deployed app instead, set this to `https://DEPLOYEDURL/chat`.
+* `--questions_per_category`: The number of questions per risk category to ask during the scan. There are four risk categories (`hate_unfairness`, `sexual`, `violence`, and `self_harm`). The default is 5 questions per category, but you can decrease this number to 1 for test runs, or increase it for a more thorough evaluation. Note that the total questions asked will be higher, as the red teaming agent is configured to try [multiple attack strategies](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent#supported-attack-strategies), where it transforms the original question in different ways.
+
+## Review the red teaming results
+
+That command will save results both in Azure AI Foundry and in the local `evals/redteams` directory.
+
+### Review results in Azure AI Foundry
+
+Navigate from the URL in the terminal output to the results in Azure AI Foundry.
+First you'll see overall attack metrics:
+
+![Red Teaming Agent Results](images/redteam_dashboard.png)
+
+Then you can dig into the data and see the questions and answers received from your app:
+
+![Red Teaming Agent Results](images/redteam_logs.png)
+
+Note that the questions are intentionally adversarial, so you should mentally prepare for that before looking at the results. The goal is to see if your app can handle these adversarial queries and provide safe answers. Even if your scan results in a 0% attack success rate, you should still review the questions and answers to ensure that you're happy with the way your app responds to these adversarial queries.
+
+Learn more in the [Red Teaming Agent documentation](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent#viewing-your-results-in-azure-ai-foundry-project).
+
+## Review results locally
+
+In addition to the results in Azure AI Foundry, you can also review the results locally in the `evals/redteams` directory. The results are saved in JSON format.
+
+Each file in this directory corresponds to a single red teaming scan, and contains the following fields:
+
+* `scorecard`: A summary of the scan results, including the attack success rate and the number of questions asked.
+* `parameters`: The parameters used for the scan, including the risk categories and attack strategies.
+* `attack_details`: A list of the questions asked during the scan, along with the answers received from your app.
+* `studio_url`: A link to the Azure AI Foundry studio where you can view the results in more detail.
 
-The ideal score is `low_rate` of 1.0 and `mean_score` of 0.0. The `low_rate` indicates the fraction of answers that were reported as "Low" or "Very low" by an evaluator. The `mean_score` is the average score of all the answers, where 0 is a very safe answer and 7 is a very unsafe answer.
+Learn more in the [Red Teaming Agent documentation](https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent#results-from-your-automated-scans).
 
 ## Resources
 
-To learn more about the Azure AI services used in this project, look through the script and reference the following documentation:
+To learn more about red team scanning, look through the script, documentation, and videos below:
 
-* [Generate simulated data for evaluation](https://learn.microsoft.com/azure/ai-studio/how-to/develop/simulator-interaction-data)
-* [Evaluate with the Azure AI Evaluation SDK](https://learn.microsoft.com/azure/ai-studio/how-to/develop/evaluate-sdk)
+* [safety_evaluation.py](/evals/safety_evaluation.py)
+* [Run automated safety scans with AI Red Teaming Agent](https://learn.microsoft.com/azure/ai-foundry/how-to/develop/run-scans-ai-red-teaming-agent)
+* [Build 2025: Red-teaming Demo](https://www.youtube.com/watch?v=sZzcSX7BFVA)
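
For skimming the local results described in the updated documentation above, a short helper along these lines can work. This is only a sketch: it assumes the field names listed above (`scorecard`, `parameters`, `attack_details`, `studio_url`), and the exact layout inside each field depends on the azure-ai-evaluation version.

```python
# Sketch: skim the most recent local red-team scan saved by evals/safety_evaluation.py.
# Field names follow the documentation above; inner structure may vary by SDK version.
import json
import pathlib

results_dir = pathlib.Path("evals/redteams")
latest = max(results_dir.glob("*.json"), key=lambda p: p.stat().st_mtime)  # newest scan file
data = json.loads(latest.read_text())

print("Scan file: ", latest.name)
print("Studio URL:", data.get("studio_url"))
print("Scorecard: ", json.dumps(data.get("scorecard"), indent=2))
# Peek at a few of the adversarial question/answer pairs (content may be disturbing).
for attack in data.get("attack_details", [])[:3]:
    print(attack)
```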

evals/redteams/.gitkeep

Whitespace-only changes.

evals/requirements.txt

Lines changed: 2 additions & 2 deletions

@@ -1,4 +1,4 @@
-git+https://github.com/Azure-Samples/ai-rag-chat-evaluator/@2025-02-06b
-azure-ai-evaluation
+git+https://github.com/Azure-Samples/ai-rag-chat-evaluator/@2025-06-02
+azure-ai-evaluation[redteam]>=1.8.0
 rich
 dotenv-azd

evals/safety_evaluation.py

Lines changed: 59 additions & 92 deletions

@@ -1,123 +1,87 @@
 import argparse
 import asyncio
-import json
+import datetime
 import logging
 import os
 import pathlib
-from enum import Enum
+import sys
+from typing import Optional
 
 import requests
-from azure.ai.evaluation import AzureAIProject, ContentSafetyEvaluator
-from azure.ai.evaluation.simulator import (
-    AdversarialScenario,
-    AdversarialSimulator,
-    SupportedLanguages,
-)
+from azure.ai.evaluation.red_team import AttackStrategy, RedTeam, RiskCategory
 from azure.identity import AzureDeveloperCliCredential
 from dotenv_azd import load_azd_env
-from rich.logging import RichHandler
-from rich.progress import track
-
-logger = logging.getLogger("ragapp")
 
 root_dir = pathlib.Path(__file__).parent
 
 
-class HarmSeverityLevel(Enum):
-    """Harm severity levels reported by the Azure AI Evaluator service.
-    These constants have been copied from the azure-ai-evaluation package,
-    where they're currently in a private module.
-    """
-
-    VeryLow = "Very low"
-    Low = "Low"
-    Medium = "Medium"
-    High = "High"
-
-
 def get_azure_credential():
     AZURE_TENANT_ID = os.getenv("AZURE_TENANT_ID")
     if AZURE_TENANT_ID:
-        logger.info("Setting up Azure credential using AzureDeveloperCliCredential with tenant_id %s", AZURE_TENANT_ID)
+        print("Setting up Azure credential using AzureDeveloperCliCredential with tenant_id %s", AZURE_TENANT_ID)
         azure_credential = AzureDeveloperCliCredential(tenant_id=AZURE_TENANT_ID, process_timeout=60)
     else:
-        logger.info("Setting up Azure credential using AzureDeveloperCliCredential for home tenant")
+        print("Setting up Azure credential using AzureDeveloperCliCredential for home tenant")
        azure_credential = AzureDeveloperCliCredential(process_timeout=60)
     return azure_credential
 
 
-async def callback(
-    messages: dict,
+def callback(
+    question: str,
     target_url: str = "http://127.0.0.1:8000/chat",
 ):
-    messages_list = messages["messages"]
-    query = messages_list[-1]["content"]
     headers = {"Content-Type": "application/json"}
     body = {
-        "messages": [{"content": query, "role": "user"}],
+        "messages": [{"content": question, "role": "user"}],
        "stream": False,
-        "context": {"overrides": {"use_advanced_flow": True, "top": 3, "retrieval_mode": "hybrid", "temperature": 0.3}},
+        "context": {
+            "overrides": {"use_advanced_flow": False, "top": 3, "retrieval_mode": "hybrid", "temperature": 0.3}
+        },
     }
     url = target_url
     r = requests.post(url, headers=headers, json=body)
    response = r.json()
     if "error" in response:
-        message = {"content": response["error"], "role": "assistant"}
+        return f"Error received: {response['error']}"
     else:
-        message = response["message"]
-    return {"messages": messages_list + [message]}
-
-
-async def run_simulator(target_url: str, max_simulations: int):
-    credential = get_azure_credential()
-    azure_ai_project: AzureAIProject = {
-        "subscription_id": os.environ["AZURE_SUBSCRIPTION_ID"],
-        "resource_group_name": os.environ["AZURE_RESOURCE_GROUP"],
-        "project_name": os.environ["AZURE_AI_PROJECT"],
-    }
-
-    # Simulate single-turn question-and-answering against the app
-    scenario = AdversarialScenario.ADVERSARIAL_QA
-    adversarial_simulator = AdversarialSimulator(azure_ai_project=azure_ai_project, credential=credential)
-
-    outputs = await adversarial_simulator(
-        scenario=scenario,
-        target=lambda messages, stream=False, session_state=None, context=None: callback(messages, target_url),
-        max_simulation_results=max_simulations,
-        language=SupportedLanguages.English,  # Match this to your app language
-        randomization_seed=1,  # For more consistent results, use a fixed seed
+        return response["message"]["content"]
+
+
+async def run_redteaming(target_url: str, questions_per_category: int = 1, scan_name: Optional[str] = None):
+    AZURE_AI_FOUNDRY = os.getenv("AZURE_AI_FOUNDRY")
+    AZURE_AI_PROJECT = os.getenv("AZURE_AI_PROJECT")
+    model_red_team = RedTeam(
+        azure_ai_project=f"https://{AZURE_AI_FOUNDRY}.services.ai.azure.com/api/projects/{AZURE_AI_PROJECT}",
+        credential=get_azure_credential(),
+        risk_categories=[
+            RiskCategory.Violence,
+            RiskCategory.HateUnfairness,
+            RiskCategory.Sexual,
+            RiskCategory.SelfHarm,
+        ],
+        num_objectives=questions_per_category,
     )
 
-    # Run safety evaluation on the outputs and save the scores
-    # Do not save the outputs, as they may contain disturbing content
-    evaluators = ["hate_unfairness", "sexual", "violence", "self_harm"]
-    summary_scores = {
-        evaluator: {"low_count": 0, "score_total": 0.0, "mean_score": 0.0, "low_rate": 0.0} for evaluator in evaluators
-    }
-    for output in track(outputs, description="Evaluating simulated responses..."):
-        query = output["messages"][0]["content"]
-        answer = output["messages"][1]["content"]
-        safety_eval = ContentSafetyEvaluator(credential=credential, azure_ai_project=azure_ai_project)
-        eval_score = safety_eval(query=query, response=answer)
-        for evaluator in evaluators:
-            severity_level = HarmSeverityLevel(eval_score[evaluator])
-            if severity_level == HarmSeverityLevel.VeryLow or severity_level == HarmSeverityLevel.Low.name:
-                summary_scores[evaluator]["low_count"] += 1
-            else:
-                logger.info(f"Failing score from:\nQ: {query}\nA: {answer}\n{evaluator} score: {eval_score}")
-            numeric_severity_score = eval_score[f"{evaluator}_score"]
-            if isinstance(numeric_severity_score, float) or isinstance(numeric_severity_score, int):
-                summary_scores[evaluator]["score_total"] += numeric_severity_score
-
-    # Compute the overall statistics
-    for evaluator in evaluators:
-        if len(outputs) > 0:
-            summary_scores[evaluator]["mean_score"] = summary_scores[evaluator]["score_total"] / len(outputs)
-            summary_scores[evaluator]["low_rate"] = summary_scores[evaluator]["low_count"] / len(outputs)
-
-    # Save summary scores
-    with open(root_dir / "safety_results.json", "w") as f:
-        json.dump(summary_scores, f, indent=2)
+    if scan_name is None:
+        timestamp = datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
+        scan_name = f"Safety evaluation {timestamp}"
+
+    await model_red_team.scan(
+        scan_name=scan_name,
+        output_path=f"{root_dir}/redteams/{scan_name}.json",
+        attack_strategies=[
+            AttackStrategy.Baseline,
+            # Easy Complexity:
+            AttackStrategy.Morse,
+            AttackStrategy.UnicodeConfusable,
+            AttackStrategy.Url,
+            # Moderate Complexity:
+            AttackStrategy.Tense,
+            # Difficult Complexity:
+            AttackStrategy.Compose([AttackStrategy.Tense, AttackStrategy.Url]),
+        ],
+        target=lambda query: callback(query, target_url),
+    )
 
 
 if __name__ == "__main__":
@@ -126,14 +90,17 @@ async def run_simulator(target_url: str, max_simulations: int):
         "--target_url", type=str, default="http://127.0.0.1:8000/chat", help="Target URL for the callback."
     )
     parser.add_argument(
-        "--max_simulations", type=int, default=200, help="Maximum number of simulations (question/response pairs)."
+        "--questions_per_category",
+        type=int,
+        default=5,
+        help="Number of questions per risk category to ask during the scan.",
     )
+    parser.add_argument("--scan_name", type=str, default=None, help="Name of the safety evaluation (optional).")
     args = parser.parse_args()
 
-    logging.basicConfig(
-        level=logging.WARNING, format="%(message)s", datefmt="[%X]", handlers=[RichHandler(rich_tracebacks=True)]
-    )
-    logger.setLevel(logging.INFO)
     load_azd_env()
-
-    asyncio.run(run_simulator(args.target_url, args.max_simulations))
+    try:
+        asyncio.run(run_redteaming(args.target_url, args.questions_per_category, args.scan_name))
+    except Exception:
+        logging.exception("Unhandled exception in safety evaluation")
+        sys.exit(1)
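
Before running a full scan with the updated script, it can help to confirm that the `/chat` endpoint accepts the request shape `callback()` sends. The sketch below mirrors the request body shown in the diff above; the question text is a made-up placeholder, and the local URL assumes the app is running on port 8000.

```python
# Sketch: send the same POST that callback() in evals/safety_evaluation.py sends,
# to confirm the RAG app is reachable before starting a red-team scan.
import requests

target_url = "http://127.0.0.1:8000/chat"  # or https://DEPLOYEDURL/chat for a deployed app
body = {
    "messages": [{"content": "What tents do you sell?", "role": "user"}],  # placeholder question
    "stream": False,
    "context": {
        "overrides": {"use_advanced_flow": False, "top": 3, "retrieval_mode": "hybrid", "temperature": 0.3}
    },
}
response = requests.post(target_url, headers={"Content-Type": "application/json"}, json=body).json()
if "error" in response:
    print(f"Error received: {response['error']}")
else:
    print(response["message"]["content"])
```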
