Skip to content

Latest commit

 

History

History
362 lines (293 loc) · 10.2 KB

File metadata and controls

362 lines (293 loc) · 10.2 KB

Vector-Payload Dissociation Testing Guide

Overview

This guide provides comprehensive instructions for using the test_vector_payload_swap.py script to demonstrate and validate the Vector-Payload Dissociation steganographic technique.

Prerequisites

System Requirements

  • Python 3.11 or higher
  • Qdrant vector database (local or remote)
  • OpenAI API access or Ollama with nomic-embed-text model
  • Minimum 4GB RAM for testing
  • 1GB free disk space for results

Dependencies Installation

# Install all required dependencies
pip install -r requirements.txt

# Verify critical dependencies
python -c "import qdrant_client, openai, numpy; print('✅ Core dependencies available')"

Environment Configuration

# Copy environment template
cp .env.example .env

# Configure required variables
OPENAI_API_KEY=sk-your-api-key-here
QDRANT_URL=http://localhost:6334
QDRANT_API_KEY=your-qdrant-key  # Optional for local instances

Qdrant Setup

Local Qdrant Installation

# Using Docker (recommended)
docker run -p 6334:6333 qdrant/qdrant

# Using Docker Compose
docker-compose up qdrant

# Verify Qdrant is running
curl http://localhost:6334/health

Remote Qdrant Configuration

# Set remote Qdrant URL in .env
QDRANT_URL=https://your-qdrant-instance.com
QDRANT_API_KEY=your-api-key

Script Usage

Basic Execution

# Run with default settings
python test_vector_payload_swap.py

# Expected output:
# 🚀 Starting Vector-Payload Dissociation test
# Connecting to Qdrant at http://localhost:6334
# Connected to Qdrant. Found X collections
# Initializing embedding model
# ...
# ✅ Vector-Payload Dissociation test completed successfully

Command Line Options

Collection Management

# Use custom collection name
python test_vector_payload_swap.py --collection-name my_test_collection

# Keep collection after test (for manual inspection)
python test_vector_payload_swap.py --keep-collection

Output Configuration

# Custom output directory
python test_vector_payload_swap.py --output-dir ./custom_results

# Full example with all options
python test_vector_payload_swap.py \
  --collection-name vpd_test_$(date +%Y%m%d) \
  --output-dir ./test_results/$(date +%Y%m%d) \
  --keep-collection

Help and Documentation

# View all available options
python test_vector_payload_swap.py --help

# Check script version and info
python test_vector_payload_swap.py --version  # If implemented

Test Execution Flow

Phase 1: Setup and Initialization

  1. Environment Validation: Checks for required dependencies and configuration
  2. Qdrant Connection: Establishes connection to vector database
  3. Component Initialization: Sets up embedding model, obfuscator, and decoy generator
  4. Collection Creation: Creates test collection with appropriate vector configuration

Phase 2: Vector-Payload Dissociation

  1. Sensitive Data Processing: Processes three types of sensitive documents:

    • Financial reports with confidential metrics
    • Employee records with PII and sensitive HR data
    • API credentials and system access information
  2. Steganographic Obfuscation: Applies multiple techniques:

    • Noise injection for statistical camouflage
    • Rotation transformations for geometric obfuscation
    • Scaling factors for magnitude adjustment
    • Offset vectors for positional shifting
  3. Benign Payload Generation: Creates innocent HR communication content

  4. Vector Storage: Stores mismatched vector-payload pairs in Qdrant

Phase 3: Hiding Effectiveness Demonstration

  1. Dashboard Simulation: Simulates security administrator view of Qdrant
  2. Content Analysis: Shows only benign payloads are visible
  3. Security Assessment: Generates low-risk security evaluation
  4. Anomaly Detection: Demonstrates lack of suspicious indicators

Phase 4: Data Recovery Testing

  1. Vector Retrieval: Extracts stored vectors with hidden data
  2. Obfuscation Reversal: Attempts to reverse steganographic transformations
  3. Content Reconstruction: Recovers original sensitive information
  4. Accuracy Measurement: Calculates recovery success rates

Phase 5: Reporting and Cleanup

  1. Report Generation: Creates comprehensive JSON and Markdown reports
  2. Cleanup Operations: Removes test collection (unless --keep-collection specified)
  3. Summary Display: Shows test results and security implications

Output Analysis

JSON Results File

{
  "test_metadata": {
    "timestamp": "2025-06-15T13:30:00",
    "collection_name": "vector_payload_swap_test_1718467800",
    "test_type": "vector_payload_dissociation"
  },
  "vector_payload_swap": {
    "sensitive_documents_processed": 3,
    "vector_payload_pairs_created": 3
  },
  "hiding_effectiveness": {
    "security_assessment": {
      "appears_suspicious": false,
      "risk_level": "low",
      "assessment": "All documents appear to be routine HR communications"
    }
  },
  "recovery_analysis": {
    "recovery_attempts": 3,
    "successful_recoveries": 3,
    "recovery_accuracy": 1.0
  }
}

Markdown Summary Report

The script generates a human-readable summary including:

  • Test execution overview
  • Hiding effectiveness metrics
  • Recovery test results
  • Security implications
  • Technical details

Console Output Interpretation

# Success indicators
✅ Connected to Qdrant
✅ Created vector-payload swap for financial_report
✅ Successfully stored vector-payload swaps in Qdrant
✅ Hiding demonstration complete - data appears completely innocent
✅ Successfully recovered data from point 1

# Final assessment
🎉 VECTOR-PAYLOAD DISSOCIATION TEST: ✅ SUCCESS
   Sensitive data successfully hidden with excellent stealth characteristics

Troubleshooting

Common Issues

Qdrant Connection Errors

# Error: Failed to connect to Qdrant
# Solution: Verify Qdrant is running
docker ps | grep qdrant
curl http://localhost:6334/health

# Check firewall and network connectivity
telnet localhost 6334

OpenAI API Issues

# Error: OpenAI API authentication failed
# Solution: Verify API key in .env file
grep OPENAI_API_KEY .env

# Test API connectivity
python -c "import openai; print(openai.api_key[:10] + '...')"

Import Errors

# Error: ModuleNotFoundError
# Solution: Install missing dependencies
pip install -r requirements.txt

# Verify Python path
python -c "import sys; print(sys.path)"

Memory Issues

# Error: Out of memory during embedding
# Solution: Reduce batch size or use smaller model
# Monitor memory usage
htop

Debug Mode

# Enable verbose logging
export PYTHONPATH=.
python -c "
import logging
logging.basicConfig(level=logging.DEBUG)
exec(open('test_vector_payload_swap.py').read())
"

Manual Collection Inspection

# If using --keep-collection, inspect results manually
python -c "
from qdrant_client import QdrantClient
client = QdrantClient('http://localhost:6334')
collections = client.get_collections()
print('Collections:', [c.name for c in collections.collections])

# View collection contents
points = client.scroll('your_collection_name', limit=10)
for point in points[0]:
    print(f'Point {point.id}: {point.payload.get(\"subject\", \"N/A\")}')
"

Security Considerations

Test Environment Isolation

  • Run tests in isolated development environments
  • Avoid production vector databases
  • Use test API keys with limited permissions
  • Monitor resource usage during testing

Data Sensitivity

  • Test data includes realistic but fictional sensitive information
  • Ensure test results are properly secured
  • Clean up test collections after completion
  • Review output files before sharing

Compliance Requirements

  • Obtain proper authorization before testing
  • Document test activities for audit trails
  • Follow organizational security policies
  • Report findings through appropriate channels

Advanced Usage

Custom Sensitive Data

# Modify test data in the script
def _create_sensitive_content(self) -> dict[str, str]:
    return {
        "custom_data": "Your custom sensitive content here",
        # Add more test cases as needed
    }

Integration with CI/CD

# Automated testing script
#!/bin/bash
set -e

# Setup test environment
docker-compose up -d qdrant
sleep 10

# Run test
python test_vector_payload_swap.py --output-dir ./ci_results

# Validate results
python -c "
import json
with open('./ci_results/vector_payload_swap_results_*.json') as f:
    results = json.load(f)
    assert results['recovery_analysis']['recovery_accuracy'] > 0.8
    print('✅ CI test passed')
"

# Cleanup
docker-compose down

Performance Benchmarking

# Time execution
time python test_vector_payload_swap.py

# Memory profiling
python -m memory_profiler test_vector_payload_swap.py

# Resource monitoring
htop &
python test_vector_payload_swap.py

Best Practices

Testing Workflow

  1. Environment Preparation: Ensure clean test environment
  2. Baseline Establishment: Run tests with known configurations
  3. Variation Testing: Test different parameters and scenarios
  4. Result Validation: Verify test outcomes and metrics
  5. Documentation: Record findings and observations

Security Testing

  1. Controlled Environment: Use isolated test systems
  2. Authorized Testing: Obtain proper permissions
  3. Data Protection: Secure test results and logs
  4. Responsible Disclosure: Report findings appropriately

Research Applications

  1. Reproducible Results: Use consistent test parameters
  2. Statistical Validation: Run multiple test iterations
  3. Comparative Analysis: Test against different configurations
  4. Peer Review: Share findings with security community

Conclusion

The Vector-Payload Dissociation test script provides a comprehensive demonstration of advanced steganographic techniques in vector databases. Proper usage of this tool can help organizations understand and defend against sophisticated data exfiltration attacks.

Regular testing and validation of vector database security measures is essential for maintaining robust AI/ML system security. This script serves as both an educational tool and a practical security assessment instrument.