Status: Accepted
Authors: Provenant team
Supersedes: None
Current contract owner:
../TESTING_STRATEGY.md defines the live test-layer taxonomy, golden-fixture ownership rules, and current CI commands. This ADR records the decision to use golden tests as part of the verification model.
We need a reliable way to verify that Provenant produces output functionally equivalent to the Python ScanCode Toolkit reference implementation. Key challenges:
- Feature Parity Verification - How do we prove our parsers extract the same data?
- Regression Prevention - How do we catch unintended behavior changes?
- Edge Case Coverage - How do we ensure rare formats and corner cases work?
- Architectural Differences - How do we handle intentional implementation differences?
The Python reference implementation has extensive test data and expected outputs, but our Rust implementation may legitimately differ in structure (e.g., single package vs array, field ordering).
We use golden testing where parsers are validated against reference outputs from ScanCode Toolkit, with documented exceptions for intentional architectural differences.
┌──────────────────┐
│ testdata/ │
│ npm/package.json │
│ │
└────────┬─────────┘
│
├─────────────────────────┐
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ Python ScanCode │ │ Provenant │
│ │ │ │
│ scancode -p ... │ │ NpmParser:: │
│ │ │ extract_first_ │
└────────┬─────────┘ └────────┬─────────┘
│ │
▼ ▼
┌──────────────────┐ ┌──────────────────┐
│ expected.json │ │ actual output │
│ (reference) │ │ │
└────────┬─────────┘ └────────┬─────────┘
│ │
└─────────┬───────────────┘
│
▼
┌─────────────┐
│ JSON diff │
│ comparison │
└─────────────┘
1. Generate Reference Output (historical example for one-time setup per test case):
Set up the reference submodule, run the corresponding ScanCode command once, and save the resulting reference fixture alongside the Rust-owned test data.
2. Create Golden Test (in Rust):
Create a focused test that loads one fixture, runs the owning parser or subsystem entry point, and compares against the expected artifact or semantic projection.
3. Handle Intentional Differences:
Document intentional differences directly in the test metadata or fixture ownership notes so the exception is explicit and reviewable.
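Step 2 above can be sketched end to end. Everything in this sketch is illustrative: `parse_npm_name_version` is a hypothetical stand-in for the owning parser's real entry point, and the fixture and expected values are inlined here rather than read from testdata/ and the expected/ directory.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for the owning parser's entry point; a real golden
// test would invoke the actual NpmParser on the fixture file.
fn parse_npm_name_version(manifest: &str) -> BTreeMap<String, String> {
    let mut out = BTreeMap::new();
    for line in manifest.lines() {
        let line = line.trim().trim_end_matches(',');
        for key in ["name", "version"] {
            let prefix = format!("\"{key}\": \"");
            if let Some(rest) = line.strip_prefix(prefix.as_str()) {
                out.insert(key.to_string(), rest.trim_end_matches('"').to_string());
            }
        }
    }
    out
}

// Inline stand-ins for testdata/npm/package.json and the reference fixture;
// real tests read both from disk and compare the full structures.
const FIXTURE: &str = r#"{
  "name": "left-pad",
  "version": "1.3.0"
}"#;

fn expected() -> BTreeMap<String, String> {
    BTreeMap::from([
        ("name".to_string(), "left-pad".to_string()),
        ("version".to_string(), "1.3.0".to_string()),
    ])
}
```

The shape is the important part: load one fixture, run one entry point, compare against one expected artifact.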
src/parsers/
├── npm.rs # Implementation
├── npm_test.rs # Unit tests
└── npm_golden_test.rs # Golden tests
testdata/
├── npm/
│ ├── package.json # Test input
│ ├── package-lock.json
│ └── yarn.lock
└── expected/
├── npm-package.json # Reference output
├── npm-lockfile.json
└── npm-yarn.json
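The layout above implies a simple fixture-path convention; a hypothetical helper (the function names are illustrative, not the project's real API) might look like:

```rust
use std::path::PathBuf;

// Test inputs live under testdata/<ecosystem>/ ...
fn fixture_path(ecosystem: &str, file: &str) -> PathBuf {
    ["testdata", ecosystem, file].iter().collect()
}

// ... and reference outputs under testdata/expected/.
fn expected_path(name: &str) -> PathBuf {
    ["testdata", "expected", name].iter().collect()
}
```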
- Feature Parity Proof
  - Direct comparison with Python reference
  - Catches missing fields or incorrect values
  - Validates edge case handling
- Regression Prevention
  - Any change that breaks compatibility is caught immediately
  - Prevents accidental feature removal
  - Safe refactoring with confidence
- Documentation of Differences
  - Ignored tests document WHY we differ from Python
  - Architectural decisions are explicit
  - Future maintainers understand context
- Real-World Test Data
  - Uses actual package manifests from ecosystems
  - Covers edge cases found in production
  - Validates against proven reference implementation
- Continuous Validation
  - Pre-commit hooks run fast local quality gates (format/lint/docs checks)
  - CI validates on every push
  - Automated regression detection
- Test Maintenance
  - Must regenerate expected outputs if Python changes
  - Need to document intentional differences
  - Acceptable: Worth the confidence in correctness
- Blocked Tests
  - Some tests blocked on detection engine (license normalization)
  - Can't validate full output until detection is implemented
  - Acceptable: Unit tests validate extraction correctness
- JSON Structure Differences
  - Must handle field ordering differences
  - Some fields may be legitimately different (e.g., array vs single object)
  - Mitigated: Custom comparison logic, documented exceptions
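The array-vs-single-object case can be absorbed by comparison logic rather than failing the diff. A minimal sketch, using a hypothetical `Record` stand-in (the real code uses the normalized PackageData struct):

```rust
// Minimal stand-in for a package record.
#[derive(Debug, PartialEq, Clone)]
struct Record {
    name: String,
}

// Either side of the comparison may emit one record or a list of records.
#[derive(Debug, Clone)]
enum Output {
    Single(Record),
    Many(Vec<Record>),
}

// Normalize both shapes to a sorted list before comparing, so the
// structural difference alone never counts as a parity failure.
fn normalize(out: Output) -> Vec<Record> {
    let mut records = match out {
        Output::Single(r) => vec![r],
        Output::Many(rs) => rs,
    };
    records.sort_by(|a, b| a.name.cmp(&b.name));
    records
}
```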
Python Approach: represent the manifest result as multiple package-like records.
Rust Approach: normalize the same information into one package record with dependency edges.
Rationale: Both are valid representations. Rust uses a normalized PackageData struct for consistency, validated via comprehensive unit tests.
Decision: Document the difference and rely on the appropriate test layer.
For Swift, parser-only goldens may still need special handling because the Rust implementation intentionally models a graph differently from older Python expectations.
For CocoaPods, parser-only goldens are active again because the current Rust fixtures and expectations now pin the parser contract directly rather than relying on the older ignored-golden workaround.
These examples are historical illustrations of the decision, not the authoritative current command set. For the live test-layer model, fixture ownership rules, and CI commands, follow ../TESTING_STRATEGY.md.
Python: Provider field (p:) is ignored ("not used yet")
Rust: Provider field fully extracted and stored in extra_data.providers
Rationale: We implement features that Python has marked as TODO. This is intentional improvement.
Decision: Document as enhancement, ignore golden test for provider field.
Approach: test individual parser functions without comparing to Python reference.
Rejected because:
- No proof of feature parity with Python reference
- Easy to miss fields or edge cases
- Manual assertion maintenance is error-prone
- Doesn't catch regressions against reference
Approach: generate Rust snapshots and review diffs manually.
Rejected because:
- No comparison with Python reference (our source of truth)
- Snapshot becomes the truth (circular validation)
- Harder to verify feature parity
- Doesn't validate against proven reference implementation
Approach: generate random inputs and verify coarse-grained properties.
Partial acceptance: We use property-based testing for security (DoS protection, invalid input handling), but NOT as primary validation strategy.
Why not primary:
- Can't verify feature parity with reference
- Doesn't test real-world manifests
- Hard to generate valid package manifests
- Golden tests are more effective for correctness
Approach: run the full CLI and compare emitted artifacts.
Partial acceptance: We do this at CI level, but NOT as primary test strategy.
Why not primary:
- Slower than unit/golden tests
- Harder to debug failures
- Can't test parsers in isolation
- Golden tests at parser level are more granular
Golden tests are gated behind the golden-tests Cargo feature flag to keep the default cargo test fast.
All *_golden_test.rs modules are conditionally compiled with #[cfg(all(test, feature = "golden-tests"))]. CI always runs with --features golden-tests.
✅ Write golden test when:
- Parser is complete and stable
- Reference output available from Python ScanCode
- Edge cases covered by real test data
❌ Don't write golden test when:
- Feature depends on detection engine (not yet built)
- Architectural difference makes comparison meaningless
- Parser is still experimental/unstable
Document with #[ignore = "reason"] when:
- Detection Engine Dependency: Test requires license normalization or copyright detection
- Architectural Difference: Intentional implementation difference (e.g., data structure)
- Beyond Parity: We implement features Python has as TODO/missing
Always document WHY in the ignore attribute.
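As a hypothetical illustration only (the module name, parser entry point, and reason string are stand-ins, not the project's real test code), an ignored golden test carrying its rationale might look like:

```rust
// Stand-in for the owning parser; returns a package count so the sketch
// is self-contained.
pub fn parse_swift_manifest(_input: &str) -> usize {
    1
}

#[cfg(test)]
mod swift_golden_test {
    #[test]
    #[ignore = "architectural difference: Rust emits one normalized package with dependency edges; the Python reference expects several package records"]
    fn matches_python_reference() {
        // The golden comparison would run here once the comparison helper
        // learns to normalize this difference.
        assert_eq!(super::parse_swift_manifest("Package.swift"), 1);
    }
}
```

The reason lives in the attribute itself, so `cargo test -- --ignored` output and code review both surface it.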
Comparison helpers should normalize legitimate differences such as ordering, null-vs-missing representation, and URL normalization before asserting equivalence.
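Ordering and null-vs-missing normalization can be sketched with a minimal JSON-like value (the real helpers operate on the project's actual JSON representation, e.g. a serde_json::Value):

```rust
use std::collections::BTreeMap;

// Minimal JSON-like value for illustration.
#[derive(Debug, PartialEq, Clone)]
enum Json {
    Null,
    Str(String),
    Obj(BTreeMap<String, Json>),
}

// Normalize before asserting equivalence: BTreeMap canonicalizes key
// ordering, and explicit nulls are dropped so `"field": null` compares
// equal to the field being absent entirely.
fn normalize(value: Json) -> Json {
    match value {
        Json::Obj(map) => Json::Obj(
            map.into_iter()
                .filter(|(_, v)| *v != Json::Null)
                .map(|(k, v)| (k, normalize(v)))
                .collect(),
        ),
        other => other,
    }
}
```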
Before marking a parser complete:
- ✅ All relevant golden tests passing OR documented as ignored with reason
- ✅ Unit tests cover extraction logic
- ✅ Edge cases validated (empty files, malformed input, etc.)
- ✅ Real-world test data included
- ✅ Performance acceptable (benchmarked)
- ADR 0001: Trait-Based Parser Architecture - Parser structure enables golden testing
- ADR 0002: Extraction vs Detection Separation - Why some tests are blocked on detection engine
- ADR 0004: Security-First Parsing - Security property testing complements golden tests
- Python reference test data: reference/scancode-toolkit/tests/packagedcode/data/
- Golden test examples: src/parsers/*_golden_test.rs
- Test infrastructure: src/parsers/golden_test_utils.rs
- CI configuration: .github/workflows/check.yml