A comprehensive Rust library for matching patient records in healthcare information exchanges, developed for NHS Wales.
This crate implements both deterministic and probabilistic patient matching algorithms based on research from:
- Patient Matching within a Health Information Exchange
- Patient Identification Techniques – Approaches, Implications, and Findings

Features:

- ✅ Deterministic Matching: Exact matches on NHS numbers and key demographics
- ✅ Probabilistic Matching: Fuzzy matching with configurable scoring thresholds
- ✅ String Similarity Algorithms: Jaro-Winkler and Levenshtein distance
- ✅ UK NHS Number Support: Validation and normalization
- ✅ Phonetic Matching: Soundex-like algorithm for names (handles "Stephen" vs "Steven")
- ✅ Welsh Language Support: Handles diacritics (Siân → Sian)
- ✅ Address Normalization: Postcode and street address comparison
- ✅ Phone Number Normalization: UK format handling (+44, 0044, 07xxx)
- ✅ Configurable Weights: Customize importance of each field
- ✅ Serialization Support: JSON import/export via serde
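
As an illustration of the NHS number validation listed above, here is a minimal sketch of the standard Modulus 11 check-digit test for ten-digit NHS numbers. The function names `parse_nhs_digits` and `is_valid_nhs_number` are hypothetical and are not taken from this crate's API; the crate's own validation may differ in what separators it accepts.

```rust
/// Hypothetical helper (not the crate's API): extracts exactly ten digits,
/// ignoring spaces and hyphens. Any other character makes the input invalid.
fn parse_nhs_digits(input: &str) -> Option<[u32; 10]> {
    let mut digits = [0u32; 10];
    let mut count = 0;
    for ch in input.chars() {
        if ch == ' ' || ch == '-' {
            continue; // display separators, e.g. "943 476 5919"
        }
        let d = ch.to_digit(10)?; // non-digit characters are rejected
        if count == 10 {
            return None; // more than ten digits
        }
        digits[count] = d;
        count += 1;
    }
    if count == 10 {
        Some(digits)
    } else {
        None
    }
}

/// Modulus 11 check: the first nine digits are weighted 10 down to 2,
/// and the tenth digit must equal the derived check digit.
fn is_valid_nhs_number(input: &str) -> bool {
    let digits = match parse_nhs_digits(input) {
        Some(d) => d,
        None => return false,
    };
    let weighted_sum: u32 = digits[..9]
        .iter()
        .zip((2u32..=10).rev())
        .map(|(d, w)| d * w)
        .sum();
    let check = match 11 - (weighted_sum % 11) {
        11 => 0,          // a result of 11 maps to check digit 0
        10 => return false, // a result of 10 means the number is never valid
        c => c,
    };
    check == digits[9]
}
```

Under this check, `9434765919` is valid, while the placeholder `1234567890` used in the usage examples is not (its derived check digit is 10), so real test fixtures may need a checksum-valid number.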

Add to your Cargo.toml:

```toml
[dependencies]
patient-matching = "0.1.0"
```

Basic usage:

```rust
use patient_matching::{Patient, MatchingEngine, MatchConfig};
use chrono::NaiveDate;

fn main() {
    // Create two patient records
    let patient1 = Patient::builder()
        .given_name("John")
        .family_name("Smith")
        .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
        .nhs_number("1234567890")
        .build();

    let patient2 = Patient::builder()
        .given_name("Jon") // Typo
        .family_name("Smith")
        .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
        .nhs_number("1234567890")
        .build();

    // Create matching engine with default config
    let engine = MatchingEngine::default_config();

    // Match patients
    let result = engine.match_patients(&patient1, &patient2);

    println!("Match score: {:.2}", result.score);
    println!("Is match: {}", result.is_match);
    println!("Confidence: {:?}", result.confidence);
}
```

Configuration presets and custom weights:

```rust
use patient_matching::{MatchConfig, MatchingEngine};

// Strict matching (exact matches required)
let strict_engine = MatchingEngine::new(MatchConfig::strict());

// Lenient matching (more forgiving for typos)
let lenient_engine = MatchingEngine::new(MatchConfig::lenient());

// Custom configuration
let custom_config = MatchConfig {
    match_threshold: 0.90,
    nhs_number_weight: 0.40, // Increase NHS number importance
    given_name_weight: 0.15,
    family_name_weight: 0.20,
    date_of_birth_weight: 0.15,
    use_phonetic_matching: true,
    ..Default::default()
};
let engine = MatchingEngine::new(custom_config);
```

Deterministic matching:

```rust
// Check for exact matches only
let is_deterministic_match = engine.deterministic_match(&patient1, &patient2);
if is_deterministic_match {
    println!("Exact match on NHS number or all key demographics");
}
```

Score breakdown:

```rust
let result = engine.match_patients(&patient1, &patient2);

println!("Overall score: {:.2}", result.score);
println!("NHS number score: {:?}", result.breakdown.nhs_number_score);
println!("Given name score: {:?}", result.breakdown.given_name_score);
println!("Family name score: {:?}", result.breakdown.family_name_score);
println!("Date of birth score: {:?}", result.breakdown.date_of_birth_score);
println!("Address score: {:?}", result.breakdown.address_score);
println!("Phonetic name score: {:?}", result.breakdown.phonetic_name_score);
```

The `Patient` struct supports:
- NHS Number: UK national health identifier
- Name Fields: Given, middle, and family names
- Date of Birth: Birth date for age verification
- Gender: Male, Female, Other, Unknown
- Address: Multi-line address with postcode
- Contact: Phone, mobile, email
- Local ID: Hospital/practice-specific identifier
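
To make the field list concrete, the sketch below models a reduced record and one plausible deterministic rule: match when both NHS numbers are present and equal, or when given name, family name, and date of birth all agree. `PatientRecord`, `record`, and this `deterministic_match` function are hypothetical stand-ins, not the crate's types, and the choice of "key demographics" is an assumption.

```rust
/// Hypothetical, reduced record for illustration only. The crate's `Patient`
/// has more fields and uses `chrono::NaiveDate` for the date of birth.
#[derive(Debug, Clone, PartialEq)]
struct PatientRecord {
    nhs_number: Option<String>,
    given_name: String,
    family_name: String,
    date_of_birth: Option<(i32, u32, u32)>, // (year, month, day)
    postcode: Option<String>,
}

/// Convenience constructor for the fields the rule below uses.
fn record(
    nhs_number: Option<&str>,
    given_name: &str,
    family_name: &str,
    date_of_birth: Option<(i32, u32, u32)>,
) -> PatientRecord {
    PatientRecord {
        nhs_number: nhs_number.map(str::to_string),
        given_name: given_name.to_string(),
        family_name: family_name.to_string(),
        date_of_birth,
        postcode: None,
    }
}

/// Assumed rule: identical NHS numbers, or identical names and date of birth.
fn deterministic_match(a: &PatientRecord, b: &PatientRecord) -> bool {
    // Rule 1: both NHS numbers are present and equal.
    if let (Some(x), Some(y)) = (&a.nhs_number, &b.nhs_number) {
        if x == y {
            return true;
        }
    }
    // Rule 2: names agree (ignoring ASCII case) and both dates are present and equal.
    let names_equal = a.given_name.eq_ignore_ascii_case(&b.given_name)
        && a.family_name.eq_ignore_ascii_case(&b.family_name);
    let dob_equal = match (a.date_of_birth, b.date_of_birth) {
        (Some(x), Some(y)) => x == y,
        _ => false, // a missing date never counts as agreement
    };
    names_equal && dob_equal
}
```

In this sketch a missing NHS number simply falls through to the demographic rule; whether conflicting NHS numbers should veto a demographic match is a policy decision the sketch does not make.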
The matching engine uses a weighted scoring system:
| Field | Default Weight | Purpose |
|---|---|---|
| NHS Number | 30% | Strongest identifier when available |
| Family Name | 20% | Critical demographic |
| Date of Birth | 20% | Age verification |
| Given Name | 15% | Important but subject to nicknames |
| Address | 5% | Supporting evidence |
| Gender | 5% | Supporting evidence |
| Phone | 5% | Supporting evidence |
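
One way to combine the weights in the table is a weighted average over the fields that both records actually contain. The sketch below assumes that missing fields are skipped and the remaining weights renormalized; this README does not say how the crate treats missing fields, so both that policy and the names `FieldScore` and `weighted_score` are assumptions.

```rust
/// One field comparison: its configured weight and, when both records
/// carry the field, a similarity between 0.0 and 1.0.
struct FieldScore {
    weight: f64,
    similarity: Option<f64>,
}

/// Weighted average over the fields that are present.
/// Assumption: absent fields are ignored and the weights renormalized.
fn weighted_score(fields: &[FieldScore]) -> f64 {
    let mut total_weight = 0.0;
    let mut weighted_sum = 0.0;
    for field in fields {
        if let Some(similarity) = field.similarity {
            total_weight += field.weight;
            weighted_sum += field.weight * similarity.clamp(0.0, 1.0);
        }
    }
    if total_weight > 0.0 {
        weighted_sum / total_weight
    } else {
        0.0 // nothing comparable, so no evidence of a match
    }
}
```

For example, with exact agreement on NHS number (0.30), family name (0.20), and date of birth (0.20), a given-name similarity of 0.9 (0.15), and no address, gender, or phone data, the score is 0.835 / 0.85, or about 0.98.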
Phonetic Matching provides bonus points when names sound similar (e.g., "Stephen" vs "Steven").
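
The crate describes its phonetic algorithm only as "Soundex-like", so the following sketch uses classic American Soundex as a stand-in to show why "Stephen" and "Steven" compare as equal. The functions `soundex` and `sounds_alike` are illustrative names, not the crate's API.

```rust
/// Maps a lowercase ASCII consonant to its Soundex digit.
/// Vowels, 'h', 'w', and 'y' have no digit.
fn soundex_digit(c: char) -> Option<char> {
    match c {
        'b' | 'f' | 'p' | 'v' => Some('1'),
        'c' | 'g' | 'j' | 'k' | 'q' | 's' | 'x' | 'z' => Some('2'),
        'd' | 't' => Some('3'),
        'l' => Some('4'),
        'm' | 'n' => Some('5'),
        'r' => Some('6'),
        _ => None,
    }
}

/// Classic four-character Soundex code, or None if the name has no ASCII letters.
/// Non-ASCII letters are dropped, so strip diacritics before calling this.
fn soundex(name: &str) -> Option<String> {
    let mut letters = name
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase());
    let first = letters.next()?;
    let mut code = String::with_capacity(4);
    code.push(first.to_ascii_uppercase());
    let mut previous = soundex_digit(first);
    for c in letters {
        if code.len() == 4 {
            break;
        }
        if c == 'h' || c == 'w' {
            continue; // 'h' and 'w' do not separate repeated codes
        }
        let digit = soundex_digit(c);
        if let Some(d) = digit {
            if digit != previous {
                code.push(d);
            }
        }
        previous = digit; // a vowel resets this to None, allowing a repeat
    }
    while code.len() < 4 {
        code.push('0'); // pad short codes with zeros
    }
    Some(code)
}

/// Two names sound alike when both have a code and the codes are equal.
fn sounds_alike(a: &str, b: &str) -> bool {
    match (soundex(a), soundex(b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}
```

Both "Stephen" and "Steven" encode to `S315` here, because "p" and "v" share a digit and the "h" is ignored.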

Design principles drawn from the research:

- No 100% Accuracy: Research shows even the best algorithms achieve 90-98% accuracy. This crate aims for transparency with confidence scores.
- Standardization Critical: All inputs are normalized:
  - Names: lowercase, remove diacritics, trim spaces
  - Postcodes: uppercase, remove spaces
  - Phone numbers: remove formatting, handle country codes
  - NHS numbers: digits only
- Multi-Factor Approach: Following research recommendations, matching uses multiple demographic fields rather than relying on a single identifier.
- Weighted Probabilistic Matching: Combines multiple weak identifiers into a strong match signal, following best practices from health information exchanges.
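
The normalization rules listed above can be sketched with the standard library alone. All function names below are hypothetical, the diacritic table covers only a small set of precomposed letters (including the Welsh ŵ and ŷ), and the choice of the national `0...` form as the canonical phone format is an assumption rather than the crate's documented behaviour.

```rust
/// Folds a few precomposed accented lowercase letters to ASCII.
/// Combining-mark sequences are not handled in this sketch.
fn fold_diacritic(c: char) -> char {
    match c {
        '\u{e1}' | '\u{e0}' | '\u{e2}' | '\u{e4}' => 'a', // á à â ä
        '\u{e9}' | '\u{e8}' | '\u{ea}' | '\u{eb}' => 'e', // é è ê ë
        '\u{ed}' | '\u{ec}' | '\u{ee}' | '\u{ef}' => 'i', // í ì î ï
        '\u{f3}' | '\u{f2}' | '\u{f4}' | '\u{f6}' => 'o', // ó ò ô ö
        '\u{fa}' | '\u{f9}' | '\u{fb}' | '\u{fc}' => 'u', // ú ù û ü
        '\u{1e83}' | '\u{1e81}' | '\u{175}' | '\u{1e85}' => 'w', // ẃ ẁ ŵ ẅ
        '\u{fd}' | '\u{1ef3}' | '\u{177}' | '\u{ff}' => 'y', // ý ỳ ŷ ÿ
        other => other,
    }
}

/// Names: trim spaces, lowercase, remove diacritics.
fn normalize_name(name: &str) -> String {
    name.trim().to_lowercase().chars().map(fold_diacritic).collect()
}

/// Postcodes: remove whitespace, uppercase.
fn normalize_postcode(postcode: &str) -> String {
    postcode
        .chars()
        .filter(|c| !c.is_whitespace())
        .map(|c| c.to_ascii_uppercase())
        .collect()
}

/// Phone numbers: keep digits only, then rewrite +44 / 0044 prefixes
/// to the national form with a leading 0 (assumed canonical form).
fn normalize_uk_phone(phone: &str) -> String {
    let digits: String = phone.chars().filter(|c| c.is_ascii_digit()).collect();
    let international = digits
        .strip_prefix("0044")
        .or_else(|| digits.strip_prefix("44"));
    match international {
        // Drop an optional trunk zero, as in "+44 (0)7700 900123".
        Some(rest) => format!("0{}", rest.trim_start_matches('0')),
        None => digits,
    }
}

/// NHS numbers: digits only.
fn nhs_digits_only(nhs_number: &str) -> String {
    nhs_number.chars().filter(|c| c.is_ascii_digit()).collect()
}
```

With these rules "Siân" and "Sian" normalize to the same string, and `+44 7700 900123`, `0044 7700 900123`, and `07700 900123` all reduce to one form, so later comparisons operate on equal inputs.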

Run the test suite:

```bash
# Unit tests
cargo test

# Integration tests
cargo test --test integration_tests

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_fuzzy_name_match
```

Test coverage:

- ✅ Perfect matches (100% score)
- ✅ Fuzzy name matching (typos, alternate spellings)
- ✅ Welsh names with diacritics
- ✅ Phonetic name matching
- ✅ UK phone number normalization
- ✅ Address comparison
- ✅ NHS number validation
- ✅ Deterministic matching
- ✅ Strict vs lenient modes
- ✅ Missing field handling
- ✅ Serialization/deserialization

Run the example program:

```bash
cargo run
```

This runs example scenarios including:
- Perfect match
- Fuzzy name match (Stephen vs Steven)
- Welsh names with diacritics (Siân vs Sian)
- Address matching
- Complete mismatch
- Strict vs lenient comparison
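
Performance:

- Time Complexity: O(1) for deterministic matching, since it compares a fixed set of fields. String similarity cost grows with input length; standard Levenshtein distance, for example, is O(m·n) for strings of lengths m and n.
- Memory: Minimal allocation, uses borrowed references where possible
- Concurrency: Thread-safe, since matching does not mutate shared state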
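
To show where the string-similarity cost comes from, here is a sketch of the standard two-row dynamic-programming Levenshtein distance, which takes O(m·n) time and O(n) extra space for inputs of lengths m and n. The crate's own implementation is not shown in this README; `levenshtein` and `levenshtein_similarity` are illustrative names, and the length-based normalization to a 0..1 score is one common choice, not necessarily the crate's.

```rust
/// Levenshtein edit distance over Unicode scalar values,
/// using two rolling rows of the dynamic-programming table.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the processed prefix of `a` and b[..j]
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1; // deleting the first i + 1 characters of `a`
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Similarity in 0.0..=1.0: one minus distance divided by the longer length.
/// Two empty strings are treated as identical.
fn levenshtein_similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}
```

Names are short, so the quadratic cost is small per pair; it matters mainly when many pairs are compared.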

Limitations:

- No Machine Learning: This is a rule-based system, not ML/AI
- Single Country Focus: Optimized for UK/NHS data formats
- No Persistent Storage: In-memory matching only
- No Batch Processing: Processes pairs of patients
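
Because matching works on one pair at a time, searching a candidate list is left to the caller. A minimal sketch of such a loop follows; `best_candidate` is a hypothetical helper that takes any scoring closure, which in practice could wrap `engine.match_patients(a, b).score` from the usage examples.

```rust
/// Returns the highest-scoring candidate at or above `threshold`, if any.
/// Hypothetical helper: `score` is any pairwise scoring function.
fn best_candidate<'a, T, F>(
    query: &T,
    candidates: &'a [T],
    threshold: f64,
    score: F,
) -> Option<(&'a T, f64)>
where
    F: Fn(&T, &T) -> f64,
{
    let mut best: Option<(&'a T, f64)> = None;
    for candidate in candidates {
        let s = score(query, candidate);
        // Keep the candidate only if it clears the threshold and beats
        // the current best; ties keep the earlier candidate.
        if s >= threshold && best.map_or(true, |(_, b)| s > b) {
            best = Some((candidate, s));
        }
    }
    best
}
```

This is a linear scan, so comparing every record against every other is quadratic in the number of records; large datasets usually need some form of blocking or indexing first, which is outside the scope of this sketch.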

Planned enhancements:

- Support for other national identifiers (SSN, etc.)
- Batch matching API for large datasets
- Machine learning integration
- Performance benchmarks
- More sophisticated address parsing
- International phone number support

License: MIT OR Apache-2.0

Contributions welcome! Please ensure:

- All tests pass (`cargo test`)
- Code is formatted (`cargo fmt`)
- No clippy warnings (`cargo clippy`)

References:

- Godlove T, Ball AW. "Patient Matching within a Health Information Exchange." Perspect Health Inf Manag. 2015.
- Riplinger L, Piera-Jiménez J, Dooling JP. "Patient Identification Techniques – Approaches, Implications, and Findings." Yearb Med Inform. 2020.

For NHS Wales specific queries, contact the Digital Health and Care Wales (DHCW) team.