Model selection, cost analysis, and tuning recommendations for LLM security testing.
dspy.configure(lm=dspy.LM("groq/moonshotai/kimi-k2-instruct-0905"))Stats:
- Success rate: 11%
- Meta-descriptions: 0% (all executable attacks)
- Cost: ~$0.50 per 100 attacks
- Temperature: 1.2 (for diversity)
Strengths:
- 57% better than Llama 3.3
- Zero meta-descriptions (high quality)
- Strong reasoning for attack planning
Use when: Researching new attack techniques
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile"))Stats:
- Success rate: 7%
- Meta-descriptions: ~20% (quality issues)
- Cost: $0 (Groq free tier)
- Temperature: 1.2 (for diversity)
Strengths:
- Free (Groq free tier)
- Fast inference
- Good for experimentation
Weaknesses:
- 36% worse than Kimi K2
- Some outputs are instructions, not attacks
Use when: Budget is critical, or early experimentation
# DON'T: dspy.configure(lm=dspy.LM("groq/openai/gpt-oss-120b"))Problem: 100% refusal rate due to safety filters
Error: "I'm sorry, but I can't help with that"
Why: OpenAI-style safety alignment blocks adversarial research
Alternative: Use OpenAI Moderation API for defense testing, not attack generation
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile"))Stats:
- Cost: $0 (Groq free tier)
- Latency: ~500ms per request
- Capability: Good at following system prompts
- Temperature: 0.0 (for consistency)
Strengths:
- Free for large-scale testing
- Consistent responses (low temperature)
- Fast enough for 1,095 test suite
Use for:
- Baseline security testing
- Defense evaluation
- CI/CD integration
dspy.configure(lm=dspy.LM("anthropic/claude-3-5-sonnet-20241022", api_key=os.getenv("ANTHROPIC_API_KEY")))Stats:
- Cost: ~$0.30 per 100 tests
- Latency: ~1000ms per request
- Capability: Excellent reasoning and safety
- Temperature: 0.0
Strengths:
- High-quality responses
- Strong constitutional AI alignment
- Good for validating defense effectiveness
Use for:
- Final validation of high-security systems
- Comparing defense approaches
- Production deployment testing
dspy.configure(lm=dspy.LM("openai/gpt-4o", api_key=os.getenv("OPENAI_API_KEY")))Stats:
- Cost: ~$0.15 per 100 tests
- Latency: ~800ms per request
- Capability: Strong reasoning
- Temperature: 0.0
Use for:
- Mid-tier testing (between Llama and Claude)
- When Groq free tier is exhausted
# Training configuration
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile"))
from dspy.teleprompt import MIPROv2
optimizer = MIPROv2(
metric=defense_effectiveness_metric, # NOT similarity!
auto="light", # Faster than "medium" or "heavy"
num_candidates=20, # Quick mode (vs 200 for full)
init_temperature=1.0
)Training cost: $0-3 depending on model and candidates
Time: 30-60 minutes
| Model | Cost | Latency | Success Rate | Recommendation |
|---|---|---|---|---|
| Research attacks | $0 | N/A | 16.5% | ✅ Always use |
| Llama 3.3 (DSPy) | $0 | 500ms | 7% | Research only |
| Kimi K2 (DSPy) | $0.50 | 600ms | 11% | Research only |
| GPT-OSS-120B | N/A | N/A | 0% | ❌ Don't use |
Recommendation: Use data/all_research_attacks.json (free, 16.5% baseline)
| Model | Cost | Latency | Quality | Recommendation |
|---|---|---|---|---|
| Llama 3.3 (Groq) | $0 | 500ms | Good | ✅ Default |
| GPT-4o | $0.15 | 800ms | Excellent | Validation |
| Claude 3.5 Sonnet | $0.30 | 1000ms | Excellent | High-security |
Recommendation: Start with Llama 3.3, upgrade to Claude/GPT-4o for production validation
| Approach | Training Cost | Per-Check Cost | Block Rate | Total Cost (1000 checks) |
|---|---|---|---|---|
| Pattern-based | $0 | $0 | 70-80% ✅ | $0 |
| DSPy (Llama 3.3) | $0 | $0.001 | Target: 95-99% |
$1 |
| DSPy (GPT-4o) | $2 | $0.0015 | Target: 95-99% |
$3.50 |
| DSPy (Claude) | $3 | $0.003 | Target: 95-99% |
$6 |
Note: DSPy block rates are targets based on AegisLLM 2024 paper (99.76% achieved). Our implementation scripts are created but not yet validated at scale.
Recommendation: Pattern-based (validated) + DSPy Llama 3.3 (pending validation) for estimated $1 per 1000 checks
dspy.configure(
lm=dspy.LM(
"groq/moonshotai/kimi-k2-instruct-0905",
temperature=1.2, # Higher for diversity
top_p=0.95,
max_tokens=2048
)
)Why 1.2: Generates diverse attack variations
Warning: Don't go above 1.5 (too random, low quality)
dspy.configure(
lm=dspy.LM(
"groq/llama-3.3-70b-versatile",
temperature=0.0, # Deterministic
top_p=1.0,
max_tokens=512
)
)Why 0.0: Consistent responses for reproducible testing
# During optimization
optimizer = MIPROv2(
metric=effectiveness_metric,
auto="light",
init_temperature=1.0, # DSPy will tune this
num_candidates=20
)Let DSPy tune: MIPROv2 will adjust temperature during training
from dspy.teleprompt import MIPROv2
optimizer = MIPROv2(
metric=your_effectiveness_metric,
auto="light", # Fast optimization
num_candidates=20, # Test 20 prompt variations
init_temperature=1.0,
verbose=True
)Time: ~30 minutes Cost: ~$0-2 Quality: Good (80-90% of full optimization)
optimizer = MIPROv2(
metric=your_effectiveness_metric,
auto="medium", # Thorough optimization
num_candidates=200, # Test 200 variations
init_temperature=1.0,
verbose=True
)Time: ~3-4 hours Cost: ~$10-20 Quality: Excellent (95-99% optimal)
❌ DON'T use similarity:
# BAD: Don't do this
def bad_metric(example, prediction):
return similarity(prediction.attack, example.attack) # 5× overestimate!✅ DO use actual effectiveness:
# GOOD: Use this
def defense_effectiveness_metric(example, prediction):
"""Test if defense correctly identifies malicious input."""
return int(prediction.is_malicious == example.is_malicious)
def attack_effectiveness_metric(example, prediction):
"""Test if attack actually succeeds."""
defender_response = test_attack(prediction.attack, target_prompt)
return int(attack_succeeded(defender_response, example.goal))Key: Metric must measure real-world effectiveness, not surface similarity
Create .env file:
# Groq (Free tier - recommended for testing)
GROQ_API_KEY=gsk_your_key_here
# OpenAI (Paid - for GPT-4o)
OPENAI_API_KEY=sk-your_key_here
# Anthropic (Paid - for Claude)
ANTHROPIC_API_KEY=sk-ant-your_key_here
# DSPy cache directory (optional)
DSPY_CACHE_DIR=./.dspy_cacheimport os
from pathlib import Path
from dotenv import load_dotenv
# Load environment variables
env_file = Path(".env")
if env_file.exists():
load_dotenv(env_file)
# Configure DSPy
import dspy
dspy.configure(
lm=dspy.LM(
"groq/llama-3.3-70b-versatile",
api_key=os.getenv("GROQ_API_KEY")
),
cache_dir=os.getenv("DSPY_CACHE_DIR", "./.dspy_cache")
)# Process multiple attacks in parallel
import asyncio
async def test_attacks_batch(attacks, batch_size=10):
"""Test attacks in batches for faster execution."""
results = []
for i in range(0, len(attacks), batch_size):
batch = attacks[i:i + batch_size]
batch_results = await asyncio.gather(
*[test_attack(attack) for attack in batch]
)
results.extend(batch_results)
return resultsSpeedup: 5-10× faster for large test suites
# Enable DSPy caching for repeated calls
dspy.configure(
lm=dspy.LM("groq/llama-3.3-70b-versatile"),
cache_dir="./.dspy_cache" # Reuse previous results
)Cost savings: ~50% for repeated tests
import time
from functools import wraps
def rate_limit(calls_per_minute=60):
"""Rate limit API calls."""
min_interval = 60.0 / calls_per_minute
last_called = [0.0]
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
elapsed = time.time() - last_called[0]
left_to_wait = min_interval - elapsed
if left_to_wait > 0:
time.sleep(left_to_wait)
result = func(*args, **kwargs)
last_called[0] = time.time()
return result
return wrapper
return decorator
@rate_limit(calls_per_minute=60)
def call_api(prompt):
return dspy.Predict(signature)(prompt=prompt)Use when: Hitting API rate limits
import logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# DSPy verbose mode
dspy.configure(
lm=dspy.LM("groq/llama-3.3-70b-versatile"),
verbose=True # Show all LM calls
)class CostTracker:
"""Track API costs across testing."""
def __init__(self):
self.total_tokens = 0
self.total_cost = 0.0
self.call_count = 0
def track_call(self, tokens, cost_per_1k_tokens):
self.total_tokens += tokens
self.total_cost += (tokens / 1000) * cost_per_1k_tokens
self.call_count += 1
def report(self):
print(f"Total calls: {self.call_count}")
print(f"Total tokens: {self.total_tokens:,}")
print(f"Total cost: ${self.total_cost:.2f}")
tracker = CostTracker()❌ DON'T hardcode keys:
# BAD: Don't do this
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile", api_key="gsk_1234..."))✅ DO use environment variables:
# GOOD: Do this
import os
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile", api_key=os.getenv("GROQ_API_KEY")))from tenacity import retry, stop_after_attempt, wait_exponential
@retry(
stop=stop_after_attempt(3),
wait=wait_exponential(multiplier=1, min=2, max=10)
)
def call_api_with_retry(prompt):
"""Retry on rate limit errors."""
try:
return dspy.Predict(signature)(prompt=prompt)
except Exception as e:
if "rate_limit" in str(e).lower():
raise # Retry
else:
return None # Don't retry other errorsModel selection guide: See BEST_PRACTICES.md Research methodology: See RESEARCH_FINDINGS.md Troubleshooting: See LESSONS_LEARNED.md
Open an issue at GitHub Issues for configuration help.