Vector-Payload Dissociation Testing Guide

Overview

This guide provides comprehensive instructions for using the test_vector_payload_swap.py script to demonstrate and validate the Vector-Payload Dissociation steganographic technique.

Prerequisites

System Requirements

Python 3.11 or higher
Qdrant vector database (local or remote)
OpenAI API access or Ollama with nomic-embed-text model
Minimum 4GB RAM for testing
1GB free disk space for results

Dependencies Installation

# Install all required dependencies
pip install -r requirements.txt

# Verify critical dependencies
python -c "import qdrant_client, openai, numpy; print('✅ Core dependencies available')"

Environment Configuration

# Copy environment template
cp .env.example .env

# Configure required variables
OPENAI_API_KEY=sk-your-api-key-here
QDRANT_URL=http://localhost:6334
QDRANT_API_KEY=your-qdrant-key  # Optional for local instances

Qdrant Setup

Local Qdrant Installation

# Using Docker (recommended)
docker run -p 6334:6333 qdrant/qdrant

# Using Docker Compose
docker-compose up qdrant

# Verify Qdrant is running
curl http://localhost:6334/health

Remote Qdrant Configuration

# Set remote Qdrant URL in .env
QDRANT_URL=https://your-qdrant-instance.com
QDRANT_API_KEY=your-api-key

Script Usage

Basic Execution

# Run with default settings
python test_vector_payload_swap.py

# Expected output:
# 🚀 Starting Vector-Payload Dissociation test
# Connecting to Qdrant at http://localhost:6334
# Connected to Qdrant. Found X collections
# Initializing embedding model
# ...
# ✅ Vector-Payload Dissociation test completed successfully

Command Line Options

Collection Management

# Use custom collection name
python test_vector_payload_swap.py --collection-name my_test_collection

# Keep collection after test (for manual inspection)
python test_vector_payload_swap.py --keep-collection

Output Configuration

# Custom output directory
python test_vector_payload_swap.py --output-dir ./custom_results

# Full example with all options
python test_vector_payload_swap.py \
  --collection-name vpd_test_$(date +%Y%m%d) \
  --output-dir ./test_results/$(date +%Y%m%d) \
  --keep-collection

Help and Documentation

# View all available options
python test_vector_payload_swap.py --help

# Check script version and info
python test_vector_payload_swap.py --version  # If implemented

Test Execution Flow

Phase 1: Setup and Initialization

Environment Validation: Checks for required dependencies and configuration
Qdrant Connection: Establishes connection to vector database
Component Initialization: Sets up embedding model, obfuscator, and decoy generator
Collection Creation: Creates test collection with appropriate vector configuration

Phase 2: Vector-Payload Dissociation

Sensitive Data Processing: Processes three types of sensitive documents:
- Financial reports with confidential metrics
- Employee records with PII and sensitive HR data
- API credentials and system access information
Steganographic Obfuscation: Applies multiple techniques:
- Noise injection for statistical camouflage
- Rotation transformations for geometric obfuscation
- Scaling factors for magnitude adjustment
- Offset vectors for positional shifting
Benign Payload Generation: Creates innocent HR communication content
Vector Storage: Stores mismatched vector-payload pairs in Qdrant

Phase 3: Hiding Effectiveness Demonstration

Dashboard Simulation: Simulates security administrator view of Qdrant
Content Analysis: Shows only benign payloads are visible
Security Assessment: Generates low-risk security evaluation
Anomaly Detection: Demonstrates lack of suspicious indicators

Phase 4: Data Recovery Testing

Vector Retrieval: Extracts stored vectors with hidden data
Obfuscation Reversal: Attempts to reverse steganographic transformations
Content Reconstruction: Recovers original sensitive information
Accuracy Measurement: Calculates recovery success rates

Phase 5: Reporting and Cleanup

Report Generation: Creates comprehensive JSON and Markdown reports
Cleanup Operations: Removes test collection (unless --keep-collection specified)
Summary Display: Shows test results and security implications

Output Analysis

JSON Results File

{
  "test_metadata": {
    "timestamp": "2025-06-15T13:30:00",
    "collection_name": "vector_payload_swap_test_1718467800",
    "test_type": "vector_payload_dissociation"
  },
  "vector_payload_swap": {
    "sensitive_documents_processed": 3,
    "vector_payload_pairs_created": 3
  },
  "hiding_effectiveness": {
    "security_assessment": {
      "appears_suspicious": false,
      "risk_level": "low",
      "assessment": "All documents appear to be routine HR communications"
    }
  },
  "recovery_analysis": {
    "recovery_attempts": 3,
    "successful_recoveries": 3,
    "recovery_accuracy": 1.0
  }
}

Markdown Summary Report

The script generates a human-readable summary including:

Test execution overview
Hiding effectiveness metrics
Recovery test results
Security implications
Technical details

Console Output Interpretation

# Success indicators
✅ Connected to Qdrant
✅ Created vector-payload swap for financial_report
✅ Successfully stored vector-payload swaps in Qdrant
✅ Hiding demonstration complete - data appears completely innocent
✅ Successfully recovered data from point 1

# Final assessment
🎉 VECTOR-PAYLOAD DISSOCIATION TEST: ✅ SUCCESS
   Sensitive data successfully hidden with excellent stealth characteristics

Troubleshooting

Common Issues

Qdrant Connection Errors

# Error: Failed to connect to Qdrant
# Solution: Verify Qdrant is running
docker ps | grep qdrant
curl http://localhost:6334/health

# Check firewall and network connectivity
telnet localhost 6334

OpenAI API Issues

# Error: OpenAI API authentication failed
# Solution: Verify API key in .env file
grep OPENAI_API_KEY .env

# Test API connectivity
python -c "import openai; print(openai.api_key[:10] + '...')"

Import Errors

# Error: ModuleNotFoundError
# Solution: Install missing dependencies
pip install -r requirements.txt

# Verify Python path
python -c "import sys; print(sys.path)"

Memory Issues

# Error: Out of memory during embedding
# Solution: Reduce batch size or use smaller model
# Monitor memory usage
htop

Debug Mode

# Enable verbose logging
export PYTHONPATH=.
python -c "
import logging
logging.basicConfig(level=logging.DEBUG)
exec(open('test_vector_payload_swap.py').read())
"

Manual Collection Inspection

# If using --keep-collection, inspect results manually
python -c "
from qdrant_client import QdrantClient
client = QdrantClient('http://localhost:6334')
collections = client.get_collections()
print('Collections:', [c.name for c in collections.collections])

# View collection contents
points = client.scroll('your_collection_name', limit=10)
for point in points[0]:
    print(f'Point {point.id}: {point.payload.get(\"subject\", \"N/A\")}')
"

Security Considerations

Test Environment Isolation

Run tests in isolated development environments
Avoid production vector databases
Use test API keys with limited permissions
Monitor resource usage during testing

Data Sensitivity

Test data includes realistic but fictional sensitive information
Ensure test results are properly secured
Clean up test collections after completion
Review output files before sharing

Compliance Requirements

Obtain proper authorization before testing
Document test activities for audit trails
Follow organizational security policies
Report findings through appropriate channels

Advanced Usage

Custom Sensitive Data

# Modify test data in the script
def _create_sensitive_content(self) -> dict[str, str]:
    return {
        "custom_data": "Your custom sensitive content here",
        # Add more test cases as needed
    }

Integration with CI/CD

# Automated testing script
#!/bin/bash
set -e

# Setup test environment
docker-compose up -d qdrant
sleep 10

# Run test
python test_vector_payload_swap.py --output-dir ./ci_results

# Validate results
python -c "
import json
with open('./ci_results/vector_payload_swap_results_*.json') as f:
    results = json.load(f)
    assert results['recovery_analysis']['recovery_accuracy'] > 0.8
    print('✅ CI test passed')
"

# Cleanup
docker-compose down

Performance Benchmarking

# Time execution
time python test_vector_payload_swap.py

# Memory profiling
python -m memory_profiler test_vector_payload_swap.py

# Resource monitoring
htop &
python test_vector_payload_swap.py

Best Practices

Testing Workflow

Environment Preparation: Ensure clean test environment
Baseline Establishment: Run tests with known configurations
Variation Testing: Test different parameters and scenarios
Result Validation: Verify test outcomes and metrics
Documentation: Record findings and observations

Security Testing

Controlled Environment: Use isolated test systems
Authorized Testing: Obtain proper permissions
Data Protection: Secure test results and logs
Responsible Disclosure: Report findings appropriately

Research Applications

Reproducible Results: Use consistent test parameters
Statistical Validation: Run multiple test iterations
Comparative Analysis: Test against different configurations
Peer Review: Share findings with security community

Conclusion

The Vector-Payload Dissociation test script provides a comprehensive demonstration of advanced steganographic techniques in vector databases. Proper usage of this tool can help organizations understand and defend against sophisticated data exfiltration attacks.

Regular testing and validation of vector database security measures is essential for maintaining robust AI/ML system security. This script serves as both an educational tool and a practical security assessment instrument.

FilesExpand file tree

vector_payload_dissociation_testing.md

Latest commit

History