
Configuration Guide

Model selection, cost analysis, and tuning recommendations for LLM security testing.


🎯 Model Recommendations

For Attack Generation (Research Only)

⚠️ Note: Curated research attacks (16.5% success rate) outperform all DSPy-generated attacks. Use generation only when researching new attack techniques.

Primary: Kimi K2 (Best DSPy Performance)

dspy.configure(lm=dspy.LM("groq/moonshotai/kimi-k2-instruct-0905"))

Stats:

  • Success rate: 11%
  • Meta-descriptions: 0% (all executable attacks)
  • Cost: ~$0.50 per 100 attacks
  • Temperature: 1.2 (for diversity)

Strengths:

  • 57% better than Llama 3.3
  • Zero meta-descriptions (high quality)
  • Strong reasoning for attack planning

Use when: Researching new attack techniques


Alternative: Llama 3.3 (Cost-Effective)

dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile"))

Stats:

  • Success rate: 7%
  • Meta-descriptions: ~20% (quality issues)
  • Cost: $0 (Groq free tier)
  • Temperature: 1.2 (for diversity)

Strengths:

  • Free (Groq free tier)
  • Fast inference
  • Good for experimentation

Weaknesses:

  • 36% worse than Kimi K2
  • Some outputs are instructions, not attacks

Use when: Budget is critical, or early experimentation


❌ Don't Use: GPT-OSS-120B

# DON'T: dspy.configure(lm=dspy.LM("groq/openai/gpt-oss-120b"))

Problem: 100% refusal rate due to safety filters

Error: "I'm sorry, but I can't help with that"

Why: OpenAI-style safety alignment blocks adversarial research

Alternative: Use OpenAI Moderation API for defense testing, not attack generation


For Defense Testing

Primary: Llama 3.3 (Recommended)

dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile"))

Stats:

  • Cost: $0 (Groq free tier)
  • Latency: ~500ms per request
  • Capability: Good at following system prompts
  • Temperature: 0.0 (for consistency)

Strengths:

  • Free for large-scale testing
  • Consistent responses (low temperature)
  • Fast enough for 1,095 test suite

Use for:

  • Baseline security testing
  • Defense evaluation
  • CI/CD integration
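As a back-of-envelope check on the suite-size claim above, ~500ms per sequential request puts a full 1,095-test run at roughly nine minutes:

```python
# Sequential wall-clock estimate for the 1,095-test suite at ~500 ms/request
n_tests = 1095
latency_s = 0.5  # ~500 ms per Groq request

total_minutes = n_tests * latency_s / 60
print(f"~{total_minutes:.1f} minutes")  # ~9.1 minutes
```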

Alternative: Claude 3.5 Sonnet (High Quality)

dspy.configure(lm=dspy.LM("anthropic/claude-3-5-sonnet-20241022", api_key=os.getenv("ANTHROPIC_API_KEY")))

Stats:

  • Cost: ~$0.30 per 100 tests
  • Latency: ~1000ms per request
  • Capability: Excellent reasoning and safety
  • Temperature: 0.0

Strengths:

  • High-quality responses
  • Strong constitutional AI alignment
  • Good for validating defense effectiveness

Use for:

  • Final validation of high-security systems
  • Comparing defense approaches
  • Production deployment testing

Alternative: GPT-4o (Balanced)

dspy.configure(lm=dspy.LM("openai/gpt-4o", api_key=os.getenv("OPENAI_API_KEY")))

Stats:

  • Cost: ~$0.15 per 100 tests
  • Latency: ~800ms per request
  • Capability: Strong reasoning
  • Temperature: 0.0

Use for:

  • Mid-tier testing (between Llama and Claude)
  • When Groq free tier is exhausted

For DSPy Defense Optimization

Training: Llama 3.3 or Better

# Training configuration
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile"))

from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=defense_effectiveness_metric,  # NOT similarity!
    auto="light",  # Faster than "medium" or "heavy"
    num_candidates=20,  # Quick mode (vs 200 for full)
    init_temperature=1.0
)

Training cost: $0-3 depending on model and candidates

Time: 30-60 minutes


💰 Cost Analysis

Attack Testing (Per 100 Tests)

| Model | Cost | Latency | Success Rate | Recommendation |
|---|---|---|---|---|
| Research attacks | $0 | N/A | 16.5% | Always use |
| Llama 3.3 (DSPy) | $0 | 500ms | 7% | Research only |
| Kimi K2 (DSPy) | $0.50 | 600ms | 11% | Research only |
| GPT-OSS-120B | N/A | N/A | 0% | ❌ Don't use |

Recommendation: Use data/all_research_attacks.json (free, 16.5% baseline)


Defense Testing (Per 100 Checks)

| Model | Cost | Latency | Quality | Recommendation |
|---|---|---|---|---|
| Llama 3.3 (Groq) | $0 | 500ms | Good | Default |
| GPT-4o | $0.15 | 800ms | Excellent | Validation |
| Claude 3.5 Sonnet | $0.30 | 1000ms | Excellent | High-security |

Recommendation: Start with Llama 3.3, upgrade to Claude/GPT-4o for production validation


Defense Training (One-Time)

| Approach | Training Cost | Per-Check Cost | Block Rate | Total Cost (1,000 checks) |
|---|---|---|---|---|
| Pattern-based | $0 | $0 | 70-80% ✅ | $0 |
| DSPy (Llama 3.3) | $0 | $0.001 | Target: 95-99% ⚠️ | $1 |
| DSPy (GPT-4o) | $2 | $0.0015 | Target: 95-99% ⚠️ | $3.50 |
| DSPy (Claude) | $3 | $0.003 | Target: 95-99% ⚠️ | $6 |

Note: DSPy block rates are targets based on AegisLLM 2024 paper (99.76% achieved). Our implementation scripts are created but not yet validated at scale.

Recommendation: Pattern-based defenses (validated) combined with DSPy Llama 3.3 (pending validation), for an estimated $1 per 1,000 checks
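The totals in the table above follow from a simple linear cost model, one-time training cost plus per-check inference, sketched here with the table's figures:

```python
def total_cost(training_cost, per_check_cost, num_checks=1000):
    """Linear cost model: one-time training plus per-check inference."""
    return training_cost + per_check_cost * num_checks

print(total_cost(0, 0.001))   # DSPy (Llama 3.3): 1.0
print(total_cost(2, 0.0015))  # DSPy (GPT-4o): 3.5
print(total_cost(3, 0.003))   # DSPy (Claude): 6.0
```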


🔧 Temperature Settings

For Attack Generation

dspy.configure(
    lm=dspy.LM(
        "groq/moonshotai/kimi-k2-instruct-0905",
        temperature=1.2,  # Higher for diversity
        top_p=0.95,
        max_tokens=2048
    )
)

Why 1.2: Generates diverse attack variations

Warning: Don't go above 1.5 (too random, low quality)


For Defense Testing

dspy.configure(
    lm=dspy.LM(
        "groq/llama-3.3-70b-versatile",
        temperature=0.0,  # Deterministic
        top_p=1.0,
        max_tokens=512
    )
)

Why 0.0: Consistent responses for reproducible testing


For Defense Training

# During optimization
optimizer = MIPROv2(
    metric=effectiveness_metric,
    auto="light",
    init_temperature=1.0,  # DSPy will tune this
    num_candidates=20
)

Let DSPy tune: MIPROv2 will adjust temperature during training


🎚️ MIPROv2 Optimizer Settings

Quick Mode (Recommended for Development)

from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=your_effectiveness_metric,
    auto="light",  # Fast optimization
    num_candidates=20,  # Test 20 prompt variations
    init_temperature=1.0,
    verbose=True
)

Time: ~30 minutes

Cost: ~$0-2

Quality: Good (80-90% of full optimization)


Full Mode (Production)

optimizer = MIPROv2(
    metric=your_effectiveness_metric,
    auto="medium",  # Thorough optimization
    num_candidates=200,  # Test 200 variations
    init_temperature=1.0,
    verbose=True
)

Time: ~3-4 hours

Cost: ~$10-20

Quality: Excellent (95-99% optimal)


Effectiveness Metric (Critical!)

❌ DON'T use similarity:

# BAD: Don't do this
def bad_metric(example, prediction):
    return similarity(prediction.attack, example.attack)  # 5× overestimate!

✅ DO use actual effectiveness:

# GOOD: Use this
def defense_effectiveness_metric(example, prediction):
    """Test if defense correctly identifies malicious input."""
    return int(prediction.is_malicious == example.is_malicious)

def attack_effectiveness_metric(example, prediction):
    """Test if attack actually succeeds."""
    defender_response = test_attack(prediction.attack, target_prompt)
    return int(attack_succeeded(defender_response, example.goal))

Key: Metric must measure real-world effectiveness, not surface similarity
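Before handing a metric to the optimizer, it is worth sanity-checking it in isolation. A minimal sketch: SimpleNamespace stands in for DSPy Example/Prediction objects here, so no API calls are needed.

```python
from types import SimpleNamespace

def defense_effectiveness_metric(example, prediction):
    """Score 1 when the defense's verdict matches the ground-truth label."""
    return int(prediction.is_malicious == example.is_malicious)

# Stand-ins for DSPy Example/Prediction objects
example = SimpleNamespace(is_malicious=True)
correct = SimpleNamespace(is_malicious=True)
wrong = SimpleNamespace(is_malicious=False)

print(defense_effectiveness_metric(example, correct))  # 1
print(defense_effectiveness_metric(example, wrong))    # 0
```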


🌐 API Configuration

Environment Variables

Create .env file:

# Groq (Free tier - recommended for testing)
GROQ_API_KEY=gsk_your_key_here

# OpenAI (Paid - for GPT-4o)
OPENAI_API_KEY=sk-your_key_here

# Anthropic (Paid - for Claude)
ANTHROPIC_API_KEY=sk-ant-your_key_here

# DSPy cache directory (optional)
DSPY_CACHE_DIR=./.dspy_cache

Loading Configuration

import os
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables
env_file = Path(".env")
if env_file.exists():
    load_dotenv(env_file)

# Configure DSPy
import dspy

dspy.configure(
    lm=dspy.LM(
        "groq/llama-3.3-70b-versatile",
        api_key=os.getenv("GROQ_API_KEY")
    ),
    cache_dir=os.getenv("DSPY_CACHE_DIR", "./.dspy_cache")
)

🚀 Performance Tuning

Batch Processing

# Process multiple attacks in parallel
import asyncio

async def test_attacks_batch(attacks, batch_size=10):
    """Test attacks in batches for faster execution.

    Assumes test_attack is an async coroutine (e.g. an async API client call);
    a synchronous function cannot be passed to asyncio.gather directly.
    """
    results = []
    for i in range(0, len(attacks), batch_size):
        batch = attacks[i:i + batch_size]
        batch_results = await asyncio.gather(
            *[test_attack(attack) for attack in batch]
        )
        results.extend(batch_results)
    return results

Speedup: 5-10× faster for large test suites


Caching

# Enable DSPy caching for repeated calls
dspy.configure(
    lm=dspy.LM("groq/llama-3.3-70b-versatile"),
    cache_dir="./.dspy_cache"  # Reuse previous results
)

Cost savings: ~50% for repeated tests


Rate Limiting

import time
from functools import wraps

def rate_limit(calls_per_minute=60):
    """Rate limit API calls."""
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            left_to_wait = min_interval - elapsed
            if left_to_wait > 0:
                time.sleep(left_to_wait)
            result = func(*args, **kwargs)
            last_called[0] = time.time()
            return result
        return wrapper
    return decorator

@rate_limit(calls_per_minute=60)
def call_api(prompt):
    return dspy.Predict(signature)(prompt=prompt)

Use when: Hitting API rate limits


📊 Monitoring & Logging

Enable Verbose Logging

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# DSPy verbose mode
dspy.configure(
    lm=dspy.LM("groq/llama-3.3-70b-versatile"),
    verbose=True  # Show all LM calls
)

Track Costs

class CostTracker:
    """Track API costs across testing."""

    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0
        self.call_count = 0

    def track_call(self, tokens, cost_per_1k_tokens):
        self.total_tokens += tokens
        self.total_cost += (tokens / 1000) * cost_per_1k_tokens
        self.call_count += 1

    def report(self):
        print(f"Total calls: {self.call_count}")
        print(f"Total tokens: {self.total_tokens:,}")
        print(f"Total cost: ${self.total_cost:.2f}")

tracker = CostTracker()
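A usage sketch for the tracker. The per-call token count and $/1K-token rate below are hypothetical, and the class is repeated so the snippet runs standalone:

```python
class CostTracker:
    """Track API costs across testing (identical to the class above)."""

    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0
        self.call_count = 0

    def track_call(self, tokens, cost_per_1k_tokens):
        self.total_tokens += tokens
        self.total_cost += (tokens / 1000) * cost_per_1k_tokens
        self.call_count += 1

    def report(self):
        print(f"Total calls: {self.call_count}")
        print(f"Total tokens: {self.total_tokens:,}")
        print(f"Total cost: ${self.total_cost:.2f}")

tracker = CostTracker()
for _ in range(100):
    # Hypothetical figures: ~1,500 tokens per call at $0.0006 per 1K tokens
    tracker.track_call(tokens=1500, cost_per_1k_tokens=0.0006)
tracker.report()
# Total calls: 100
# Total tokens: 150,000
# Total cost: $0.09
```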

🔒 Security Considerations

API Key Management

❌ DON'T hardcode keys:

# BAD: Don't do this
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile", api_key="gsk_1234..."))

✅ DO use environment variables:

# GOOD: Do this
import os
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile", api_key=os.getenv("GROQ_API_KEY")))

Rate Limit Handling

from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

def _is_rate_limit(exc):
    return "rate_limit" in str(exc).lower()

@retry(
    retry=retry_if_exception(_is_rate_limit),  # Retry only rate-limit errors
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def call_api_with_retry(prompt):
    """Retry on rate-limit errors; let other errors propagate."""
    return dspy.Predict(signature)(prompt=prompt)

📚 Further Configuration

  • Model selection guide: See BEST_PRACTICES.md
  • Research methodology: See RESEARCH_FINDINGS.md
  • Troubleshooting: See LESSONS_LEARNED.md


🤝 Questions?

Open an issue at GitHub Issues for configuration help.