This guide provides comprehensive instructions for using the test_vector_payload_swap.py script to demonstrate and validate the Vector-Payload Dissociation steganographic technique.
- Python 3.11 or higher
- Qdrant vector database (local or remote)
- OpenAI API access or Ollama with nomic-embed-text model
- Minimum 4GB RAM for testing
- 1GB free disk space for results
# Install all required dependencies
pip install -r requirements.txt
# Verify critical dependencies
python -c "import qdrant_client, openai, numpy; print('✅ Core dependencies available')"

# Copy environment template
cp .env.example .env
# Configure required variables
OPENAI_API_KEY=sk-your-api-key-here
QDRANT_URL=http://localhost:6334
QDRANT_API_KEY=your-qdrant-key  # Optional for local instances

# Using Docker (recommended)
docker run -p 6334:6333 qdrant/qdrant
# Using Docker Compose
docker-compose up qdrant
# Verify Qdrant is running
curl http://localhost:6334/healthz

# Set remote Qdrant URL in .env
QDRANT_URL=https://your-qdrant-instance.com
QDRANT_API_KEY=your-api-key

# Run with default settings
python test_vector_payload_swap.py
# Expected output:
# 🚀 Starting Vector-Payload Dissociation test
# Connecting to Qdrant at http://localhost:6334
# Connected to Qdrant. Found X collections
# Initializing embedding model
# ...
# ✅ Vector-Payload Dissociation test completed successfully

# Use custom collection name
python test_vector_payload_swap.py --collection-name my_test_collection
# Keep collection after test (for manual inspection)
python test_vector_payload_swap.py --keep-collection

# Custom output directory
python test_vector_payload_swap.py --output-dir ./custom_results
# Full example with all options
python test_vector_payload_swap.py \
--collection-name vpd_test_$(date +%Y%m%d) \
--output-dir ./test_results/$(date +%Y%m%d) \
--keep-collection

# View all available options
python test_vector_payload_swap.py --help
# Check script version and info
python test_vector_payload_swap.py --version  # If implemented

- Environment Validation: Checks for required dependencies and configuration
- Qdrant Connection: Establishes connection to vector database
- Component Initialization: Sets up embedding model, obfuscator, and decoy generator
- Collection Creation: Creates test collection with appropriate vector configuration
- Sensitive Data Processing: Processes three types of sensitive documents:
  - Financial reports with confidential metrics
  - Employee records with PII and sensitive HR data
  - API credentials and system access information
- Steganographic Obfuscation: Applies multiple techniques:
  - Noise injection for statistical camouflage
  - Rotation transformations for geometric obfuscation
  - Scaling factors for magnitude adjustment
  - Offset vectors for positional shifting
- Benign Payload Generation: Creates innocent HR communication content
- Vector Storage: Stores mismatched vector-payload pairs in Qdrant
- Dashboard Simulation: Simulates security administrator view of Qdrant
- Content Analysis: Shows only benign payloads are visible
- Security Assessment: Generates low-risk security evaluation
- Anomaly Detection: Demonstrates lack of suspicious indicators
- Vector Retrieval: Extracts stored vectors with hidden data
- Obfuscation Reversal: Attempts to reverse steganographic transformations
- Content Reconstruction: Recovers original sensitive information
- Accuracy Measurement: Calculates recovery success rates
- Report Generation: Creates comprehensive JSON and Markdown reports
- Cleanup Operations: Removes test collection (unless --keep-collection specified)
- Summary Display: Shows test results and security implications
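The obfuscation and reversal steps above can be sketched in a few lines. This is a minimal illustration of how the four transforms (noise, rotation, scale, offset) compose into an invertible mapping, not the script's actual implementation; all names here are hypothetical:

```python
# Minimal sketch: reversible vector obfuscation assuming the composition
# hidden = R @ (v + noise) * scale + offset, undone with the same keys.
import numpy as np

rng = np.random.default_rng(42)
dim = 8

def make_params(dim):
    # A random orthogonal matrix (via QR) gives a reversible rotation
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return {
        "noise": rng.normal(scale=0.01, size=dim),  # statistical camouflage
        "rotation": q,                              # geometric obfuscation
        "scale": 1.7,                               # magnitude adjustment
        "offset": rng.normal(size=dim),             # positional shift
    }

def obfuscate(vec, p):
    return p["rotation"] @ (vec + p["noise"]) * p["scale"] + p["offset"]

def deobfuscate(vec, p):
    return p["rotation"].T @ ((vec - p["offset"]) / p["scale"]) - p["noise"]

original = rng.normal(size=dim)   # stands in for a sensitive embedding
params = make_params(dim)
hidden = obfuscate(original, params)
recovered = deobfuscate(hidden, params)
print(np.allclose(original, recovered))
```

The key point is that each transform is individually invertible, so anyone holding the parameters can recover the original vector exactly, while the stored vector bears no obvious resemblance to it.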
{
"test_metadata": {
"timestamp": "2025-06-15T13:30:00",
"collection_name": "vector_payload_swap_test_1718467800",
"test_type": "vector_payload_dissociation"
},
"vector_payload_swap": {
"sensitive_documents_processed": 3,
"vector_payload_pairs_created": 3
},
"hiding_effectiveness": {
"security_assessment": {
"appears_suspicious": false,
"risk_level": "low",
"assessment": "All documents appear to be routine HR communications"
}
},
"recovery_analysis": {
"recovery_attempts": 3,
"successful_recoveries": 3,
"recovery_accuracy": 1.0
}
}

The script generates a human-readable summary including:
- Test execution overview
- Hiding effectiveness metrics
- Recovery test results
- Security implications
- Technical details
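As a quick check, the headline metrics can be pulled out of the newest report programmatically. The `test_results/` path and results filename pattern here are assumptions based on the CI example later in this guide, and `summarize` is a hypothetical helper:

```python
# Hypothetical helper: print recovery metrics from the newest JSON report,
# using the schema shown in the example above.
import glob
import json

def summarize(pattern="./test_results/vector_payload_swap_results_*.json"):
    paths = sorted(glob.glob(pattern))
    if not paths:
        return "no results yet"
    with open(paths[-1]) as f:
        r = json.load(f)["recovery_analysis"]
    return (f"recovery {r['successful_recoveries']}/{r['recovery_attempts']} "
            f"({r['recovery_accuracy']:.0%})")

print(summarize())
```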
# Success indicators
✅ Connected to Qdrant
✅ Created vector-payload swap for financial_report
✅ Successfully stored vector-payload swaps in Qdrant
✅ Hiding demonstration complete - data appears completely innocent
✅ Successfully recovered data from point 1
# Final assessment
🎉 VECTOR-PAYLOAD DISSOCIATION TEST: ✅ SUCCESS
Sensitive data successfully hidden with excellent stealth characteristics

# Error: Failed to connect to Qdrant
# Solution: Verify Qdrant is running
docker ps | grep qdrant
curl http://localhost:6334/healthz
# Check firewall and network connectivity
telnet localhost 6334

# Error: OpenAI API authentication failed
# Solution: Verify API key in .env file
grep OPENAI_API_KEY .env
# Confirm the key is visible to Python (reads the environment, not .env directly)
python -c "import os; print(os.environ.get('OPENAI_API_KEY', 'NOT SET')[:10] + '...')"

# Error: ModuleNotFoundError
# Solution: Install missing dependencies
pip install -r requirements.txt
# Verify Python path
python -c "import sys; print(sys.path)"

# Error: Out of memory during embedding
# Solution: Reduce batch size or use smaller model
# Monitor memory usage
htop

# Enable verbose logging
export PYTHONPATH=.
python -c "
import logging
logging.basicConfig(level=logging.DEBUG)
exec(open('test_vector_payload_swap.py').read())
"

# If using --keep-collection, inspect results manually
python -c "
from qdrant_client import QdrantClient
client = QdrantClient('http://localhost:6334')
collections = client.get_collections()
print('Collections:', [c.name for c in collections.collections])
# View collection contents
points = client.scroll('your_collection_name', limit=10)
for point in points[0]:
    print(f'Point {point.id}: {point.payload.get(\"subject\", \"N/A\")}')
"

- Run tests in isolated development environments
- Avoid production vector databases
- Use test API keys with limited permissions
- Monitor resource usage during testing
- Test data includes realistic but fictional sensitive information
- Ensure test results are properly secured
- Clean up test collections after completion
- Review output files before sharing
- Obtain proper authorization before testing
- Document test activities for audit trails
- Follow organizational security policies
- Report findings through appropriate channels
# Modify test data in the script
def _create_sensitive_content(self) -> dict[str, str]:
return {
"custom_data": "Your custom sensitive content here",
# Add more test cases as needed
}

# Automated testing script
#!/bin/bash
set -e
# Setup test environment
docker-compose up -d qdrant
sleep 10
# Run test
python test_vector_payload_swap.py --output-dir ./ci_results
# Validate results
python -c "
import glob, json
# open() does not expand wildcards, so resolve the pattern with glob first
path = sorted(glob.glob('./ci_results/vector_payload_swap_results_*.json'))[-1]
with open(path) as f:
    results = json.load(f)
assert results['recovery_analysis']['recovery_accuracy'] > 0.8
print('✅ CI test passed')
"
# Cleanup
docker-compose down

# Time execution
time python test_vector_payload_swap.py
# Memory profiling
python -m memory_profiler test_vector_payload_swap.py
# Resource monitoring
htop &
python test_vector_payload_swap.py

- Environment Preparation: Ensure clean test environment
- Baseline Establishment: Run tests with known configurations
- Variation Testing: Test different parameters and scenarios
- Result Validation: Verify test outcomes and metrics
- Documentation: Record findings and observations
- Controlled Environment: Use isolated test systems
- Authorized Testing: Obtain proper permissions
- Data Protection: Secure test results and logs
- Responsible Disclosure: Report findings appropriately
- Reproducible Results: Use consistent test parameters
- Statistical Validation: Run multiple test iterations
- Comparative Analysis: Test against different configurations
- Peer Review: Share findings with security community
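The statistical-validation recommendation above can be sketched as a small aggregation step over repeated runs. The `run_*/` directory layout and per-run JSON filename are assumptions, not the script's fixed output, and `aggregate_accuracy` is a hypothetical helper:

```python
# Hypothetical helper: average recovery_accuracy across repeated test runs,
# assuming one results JSON per run directory with the schema shown earlier.
import glob
import json
import statistics

def aggregate_accuracy(pattern):
    accuracies = []
    for path in sorted(glob.glob(pattern)):
        with open(path) as f:
            accuracies.append(json.load(f)["recovery_analysis"]["recovery_accuracy"])
    return statistics.mean(accuracies) if accuracies else None

# Example:
# aggregate_accuracy("./test_results/run_*/vector_payload_swap_results_*.json")
```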
The Vector-Payload Dissociation test script provides a comprehensive demonstration of advanced steganographic techniques in vector databases. Proper usage of this tool can help organizations understand and defend against sophisticated data exfiltration attacks.
Regular testing and validation of vector database security measures are essential for maintaining robust AI/ML system security. This script serves as both an educational tool and a practical security assessment instrument.