
Configuration Guide

Model selection, cost analysis, and tuning recommendations for LLM security testing.


🎯 Model Recommendations

For Attack Generation (Research Only)

⚠️ Note: Curated research attacks (16.5% success rate) outperform all DSPy-generated attacks. Use generation only when researching new attack techniques.

Primary: Kimi K2 (Best DSPy Performance)

dspy.configure(lm=dspy.LM("groq/moonshotai/kimi-k2-instruct-0905"))

Stats:

  • Success rate: 11%
  • Meta-descriptions: 0% (all executable attacks)
  • Cost: ~$0.50 per 100 attacks
  • Temperature: 1.2 (for diversity)

Strengths:

  • 57% better than Llama 3.3
  • Zero meta-descriptions (high quality)
  • Strong reasoning for attack planning

Use when: Researching new attack techniques


Alternative: Llama 3.3 (Cost-Effective)

dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile"))

Stats:

  • Success rate: 7%
  • Meta-descriptions: ~20% (quality issues)
  • Cost: $0 (Groq free tier)
  • Temperature: 1.2 (for diversity)

Strengths:

  • Free (Groq free tier)
  • Fast inference
  • Good for experimentation

Weaknesses:

  • 36% worse than Kimi K2
  • Some outputs are instructions, not attacks

Use when: Budget is critical, or early experimentation


❌ Don't Use: GPT-OSS-120B

# DON'T: dspy.configure(lm=dspy.LM("groq/openai/gpt-oss-120b"))

Problem: 100% refusal rate due to safety filters

Error: "I'm sorry, but I can't help with that"

Why: OpenAI-style safety alignment blocks adversarial research

Alternative: Use OpenAI Moderation API for defense testing, not attack generation


For Defense Testing

Primary: Llama 3.3 (Recommended)

dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile"))

Stats:

  • Cost: $0 (Groq free tier)
  • Latency: ~500ms per request
  • Capability: Good at following system prompts
  • Temperature: 0.0 (for consistency)

Strengths:

  • Free for large-scale testing
  • Consistent responses (low temperature)
  • Fast enough for 1,095 test suite

Use for:

  • Baseline security testing
  • Defense evaluation
  • CI/CD integration
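As a back-of-envelope check on the suite-size claim above, ~500ms per sequential request puts a full 1,095-test run at roughly nine minutes:

```python
# Sequential wall-clock estimate for the 1,095-test suite at ~500 ms/request
n_tests = 1095
latency_s = 0.5  # ~500 ms per Groq request

total_minutes = n_tests * latency_s / 60
print(f"~{total_minutes:.1f} minutes")  # ~9.1 minutes
```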

Alternative: Claude 3.5 Sonnet (High Quality)

dspy.configure(lm=dspy.LM("anthropic/claude-3-5-sonnet-20241022", api_key=os.getenv("ANTHROPIC_API_KEY")))

Stats:

  • Cost: ~$0.30 per 100 tests
  • Latency: ~1000ms per request
  • Capability: Excellent reasoning and safety
  • Temperature: 0.0

Strengths:

  • High-quality responses
  • Strong constitutional AI alignment
  • Good for validating defense effectiveness

Use for:

  • Final validation of high-security systems
  • Comparing defense approaches
  • Production deployment testing

Alternative: GPT-4o (Balanced)

dspy.configure(lm=dspy.LM("openai/gpt-4o", api_key=os.getenv("OPENAI_API_KEY")))

Stats:

  • Cost: ~$0.15 per 100 tests
  • Latency: ~800ms per request
  • Capability: Strong reasoning
  • Temperature: 0.0

Use for:

  • Mid-tier testing (between Llama and Claude)
  • When Groq free tier is exhausted

For DSPy Defense Optimization

Training: Llama 3.3 or Better

# Training configuration
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile"))

from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=defense_effectiveness_metric,  # NOT similarity!
    auto="light",  # Faster than "medium" or "heavy"
    num_candidates=20,  # Quick mode (vs 200 for full)
    init_temperature=1.0
)

Training cost: $0-3 depending on model and candidates

Time: 30-60 minutes


💰 Cost Analysis

Attack Testing (Per 100 Tests)

| Model | Cost | Latency | Success Rate | Recommendation |
|---|---|---|---|---|
| Research attacks | $0 | N/A | 16.5% | Always use |
| Llama 3.3 (DSPy) | $0 | 500ms | 7% | Research only |
| Kimi K2 (DSPy) | $0.50 | 600ms | 11% | Research only |
| GPT-OSS-120B | N/A | N/A | 0% | ❌ Don't use |

Recommendation: Use data/all_research_attacks.json (free, 16.5% baseline)


Defense Testing (Per 100 Checks)

| Model | Cost | Latency | Quality | Recommendation |
|---|---|---|---|---|
| Llama 3.3 (Groq) | $0 | 500ms | Good | Default |
| GPT-4o | $0.15 | 800ms | Excellent | Validation |
| Claude 3.5 Sonnet | $0.30 | 1000ms | Excellent | High-security |

Recommendation: Start with Llama 3.3, upgrade to Claude/GPT-4o for production validation


Defense Training (One-Time)

| Approach | Training Cost | Per-Check Cost | Block Rate | Total Cost (1,000 checks) |
|---|---|---|---|---|
| Pattern-based | $0 | $0 | 70-80% ✅ | $0 |
| DSPy (Llama 3.3) | $0 | $0.001 | Target: 95-99% ⚠️ | $1 |
| DSPy (GPT-4o) | $2 | $0.0015 | Target: 95-99% ⚠️ | $3.50 |
| DSPy (Claude) | $3 | $0.003 | Target: 95-99% ⚠️ | $6 |

Note: DSPy block rates are targets based on AegisLLM 2024 paper (99.76% achieved). Our implementation scripts are created but not yet validated at scale.

Recommendation: Pattern-based defenses (validated) combined with DSPy Llama 3.3 (pending validation), for an estimated $1 per 1,000 checks
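The totals in the table above follow from a simple linear cost model, one-time training cost plus per-check inference, sketched here with the table's figures:

```python
def total_cost(training_cost, per_check_cost, num_checks=1000):
    """Linear cost model: one-time training plus per-check inference."""
    return training_cost + per_check_cost * num_checks

print(total_cost(0, 0.001))   # DSPy (Llama 3.3): 1.0
print(total_cost(2, 0.0015))  # DSPy (GPT-4o): 3.5
print(total_cost(3, 0.003))   # DSPy (Claude): 6.0
```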


🔧 Temperature Settings

For Attack Generation

dspy.configure(
    lm=dspy.LM(
        "groq/moonshotai/kimi-k2-instruct-0905",
        temperature=1.2,  # Higher for diversity
        top_p=0.95,
        max_tokens=2048
    )
)

Why 1.2: Generates diverse attack variations

Warning: Don't go above 1.5 (too random, low quality)


For Defense Testing

dspy.configure(
    lm=dspy.LM(
        "groq/llama-3.3-70b-versatile",
        temperature=0.0,  # Deterministic
        top_p=1.0,
        max_tokens=512
    )
)

Why 0.0: Consistent responses for reproducible testing


For Defense Training

# During optimization
optimizer = MIPROv2(
    metric=effectiveness_metric,
    auto="light",
    init_temperature=1.0,  # DSPy will tune this
    num_candidates=20
)

Let DSPy tune: MIPROv2 will adjust temperature during training


🎚️ MIPROv2 Optimizer Settings

Quick Mode (Recommended for Development)

from dspy.teleprompt import MIPROv2

optimizer = MIPROv2(
    metric=your_effectiveness_metric,
    auto="light",  # Fast optimization
    num_candidates=20,  # Test 20 prompt variations
    init_temperature=1.0,
    verbose=True
)

Time: ~30 minutes

Cost: ~$0-2

Quality: Good (80-90% of full optimization)


Full Mode (Production)

optimizer = MIPROv2(
    metric=your_effectiveness_metric,
    auto="medium",  # Thorough optimization
    num_candidates=200,  # Test 200 variations
    init_temperature=1.0,
    verbose=True
)

Time: ~3-4 hours

Cost: ~$10-20

Quality: Excellent (95-99% optimal)


Effectiveness Metric (Critical!)

❌ DON'T use similarity:

# BAD: Don't do this
def bad_metric(example, prediction):
    return similarity(prediction.attack, example.attack)  # 5× overestimate!

✅ DO use actual effectiveness:

# GOOD: Use this
def defense_effectiveness_metric(example, prediction):
    """Test if defense correctly identifies malicious input."""
    return int(prediction.is_malicious == example.is_malicious)

def attack_effectiveness_metric(example, prediction):
    """Test if attack actually succeeds."""
    defender_response = test_attack(prediction.attack, target_prompt)
    return int(attack_succeeded(defender_response, example.goal))

Key: Metric must measure real-world effectiveness, not surface similarity
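Before handing a metric to the optimizer, it is worth sanity-checking it in isolation. A minimal sketch: SimpleNamespace stands in for DSPy Example/Prediction objects here, so no API calls are needed.

```python
from types import SimpleNamespace

def defense_effectiveness_metric(example, prediction):
    """Score 1 when the defense's verdict matches the ground-truth label."""
    return int(prediction.is_malicious == example.is_malicious)

# Stand-ins for DSPy Example/Prediction objects
example = SimpleNamespace(is_malicious=True)
correct = SimpleNamespace(is_malicious=True)
wrong = SimpleNamespace(is_malicious=False)

print(defense_effectiveness_metric(example, correct))  # 1
print(defense_effectiveness_metric(example, wrong))    # 0
```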


🌐 API Configuration

Environment Variables

Create .env file:

# Groq (Free tier - recommended for testing)
GROQ_API_KEY=gsk_your_key_here

# OpenAI (Paid - for GPT-4o)
OPENAI_API_KEY=sk-your_key_here

# Anthropic (Paid - for Claude)
ANTHROPIC_API_KEY=sk-ant-your_key_here

# DSPy cache directory (optional)
DSPY_CACHE_DIR=./.dspy_cache

Loading Configuration

import os
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables
env_file = Path(".env")
if env_file.exists():
    load_dotenv(env_file)

# Configure DSPy
import dspy

dspy.configure(
    lm=dspy.LM(
        "groq/llama-3.3-70b-versatile",
        api_key=os.getenv("GROQ_API_KEY")
    ),
    cache_dir=os.getenv("DSPY_CACHE_DIR", "./.dspy_cache")
)

🚀 Performance Tuning

Batch Processing

# Process multiple attacks in parallel
import asyncio

async def test_attacks_batch(attacks, batch_size=10):
    """Test attacks in batches for faster execution.

    Assumes test_attack is an async coroutine (e.g. an async API client call);
    a synchronous function cannot be passed to asyncio.gather directly.
    """
    results = []
    for i in range(0, len(attacks), batch_size):
        batch = attacks[i:i + batch_size]
        batch_results = await asyncio.gather(
            *[test_attack(attack) for attack in batch]
        )
        results.extend(batch_results)
    return results

Speedup: 5-10× faster for large test suites


Caching

# Enable DSPy caching for repeated calls
dspy.configure(
    lm=dspy.LM("groq/llama-3.3-70b-versatile"),
    cache_dir="./.dspy_cache"  # Reuse previous results
)

Cost savings: ~50% for repeated tests


Rate Limiting

import time
from functools import wraps

def rate_limit(calls_per_minute=60):
    """Rate limit API calls."""
    min_interval = 60.0 / calls_per_minute
    last_called = [0.0]

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            left_to_wait = min_interval - elapsed
            if left_to_wait > 0:
                time.sleep(left_to_wait)
            result = func(*args, **kwargs)
            last_called[0] = time.time()
            return result
        return wrapper
    return decorator

@rate_limit(calls_per_minute=60)
def call_api(prompt):
    return dspy.Predict(signature)(prompt=prompt)

Use when: Hitting API rate limits


📊 Monitoring & Logging

Enable Verbose Logging

import logging

logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# DSPy verbose mode
dspy.configure(
    lm=dspy.LM("groq/llama-3.3-70b-versatile"),
    verbose=True  # Show all LM calls
)

Track Costs

class CostTracker:
    """Track API costs across testing."""

    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0
        self.call_count = 0

    def track_call(self, tokens, cost_per_1k_tokens):
        self.total_tokens += tokens
        self.total_cost += (tokens / 1000) * cost_per_1k_tokens
        self.call_count += 1

    def report(self):
        print(f"Total calls: {self.call_count}")
        print(f"Total tokens: {self.total_tokens:,}")
        print(f"Total cost: ${self.total_cost:.2f}")

tracker = CostTracker()
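A usage sketch for the tracker. The per-call token count and $/1K-token rate below are hypothetical, and the class is repeated so the snippet runs standalone:

```python
class CostTracker:
    """Track API costs across testing (identical to the class above)."""

    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0
        self.call_count = 0

    def track_call(self, tokens, cost_per_1k_tokens):
        self.total_tokens += tokens
        self.total_cost += (tokens / 1000) * cost_per_1k_tokens
        self.call_count += 1

    def report(self):
        print(f"Total calls: {self.call_count}")
        print(f"Total tokens: {self.total_tokens:,}")
        print(f"Total cost: ${self.total_cost:.2f}")

tracker = CostTracker()
for _ in range(100):
    # Hypothetical figures: ~1,500 tokens per call at $0.0006 per 1K tokens
    tracker.track_call(tokens=1500, cost_per_1k_tokens=0.0006)
tracker.report()
# Total calls: 100
# Total tokens: 150,000
# Total cost: $0.09
```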

🔒 Security Considerations

API Key Management

❌ DON'T hardcode keys:

# BAD: Don't do this
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile", api_key="gsk_1234..."))

✅ DO use environment variables:

# GOOD: Do this
import os
dspy.configure(lm=dspy.LM("groq/llama-3.3-70b-versatile", api_key=os.getenv("GROQ_API_KEY")))

Rate Limit Handling

from tenacity import retry, retry_if_exception, stop_after_attempt, wait_exponential

def _is_rate_limit(exc):
    return "rate_limit" in str(exc).lower()

@retry(
    retry=retry_if_exception(_is_rate_limit),  # Retry only rate-limit errors
    stop=stop_after_attempt(3),
    wait=wait_exponential(multiplier=1, min=2, max=10)
)
def call_api_with_retry(prompt):
    """Retry on rate-limit errors; let other errors propagate."""
    return dspy.Predict(signature)(prompt=prompt)

📚 Further Configuration

  • Model selection guide: See BEST_PRACTICES.md
  • Research methodology: See RESEARCH_FINDINGS.md
  • Troubleshooting: See LESSONS_LEARNED.md


🤝 Questions?

Open an issue at GitHub Issues for configuration help.