Patient Matching Rust Crate

A comprehensive Rust library for matching patient records in healthcare information exchanges, developed for NHS Wales.

Overview

This crate implements both deterministic and probabilistic patient matching algorithms, based on the published research listed under References.

Features

✅ Deterministic Matching: Exact matches on NHS numbers and key demographics
✅ Probabilistic Matching: Fuzzy matching with configurable scoring thresholds
✅ String Similarity Algorithms: Jaro-Winkler and Levenshtein distance
✅ UK NHS Number Support: Validation and normalization
✅ Phonetic Matching: Soundex-like algorithm for names (handles "Stephen" vs "Steven")
✅ Welsh Language Support: Handles diacritics (Siân → Sian)
✅ Address Normalization: Postcode and street address comparison
✅ Phone Number Normalization: UK format handling (+44, 0044, 07xxx)
✅ Configurable Weights: Customize importance of each field
✅ Serialization Support: JSON import/export via serde
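
NHS number validation is conventionally a Modulus 11 check digit. The first nine digits are weighted 10 down to 2, and the tenth digit must equal 11 minus (weighted sum mod 11). A result of 11 maps to 0, and a result of 10 means the number is invalid. The std-only sketch below shows that check. `normalize_nhs_number` and `is_valid_nhs_number` are hypothetical names, not necessarily the crate's API. Note that the placeholder `1234567890` used in the usage examples fails this check.

```rust
/// Hypothetical helper: keep ASCII digits only, e.g. "943 476 5919" -> "9434765919".
fn normalize_nhs_number(raw: &str) -> String {
    raw.chars().filter(|c| c.is_ascii_digit()).collect()
}

/// Modulus 11 check: weight the first nine digits 10 down to 2, then the check
/// digit is 11 - (sum % 11), where 11 becomes 0 and 10 means "invalid number".
fn is_valid_nhs_number(raw: &str) -> bool {
    let digits: Vec<u32> = normalize_nhs_number(raw)
        .chars()
        .filter_map(|c| c.to_digit(10))
        .collect();
    if digits.len() != 10 {
        return false;
    }
    let sum: u32 = digits[..9]
        .iter()
        .zip((2..=10).rev()) // weights 10, 9, ..., 2
        .map(|(d, w)| d * w)
        .sum();
    let check = match 11 - (sum % 11) {
        11 => 0,
        10 => return false,
        c => c,
    };
    check == digits[9]
}

fn main() {
    for n in ["943 476 5919", "1234567890"] {
        // prints "943 476 5919: valid = true" then "1234567890: valid = false"
        println!("{n}: valid = {}", is_valid_nhs_number(n));
    }
}
```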

Installation

Add to your Cargo.toml (the examples below also use chrono for dates of birth):

[dependencies]
patient-matching = "0.1.0"
chrono = "0.4"

Usage

Basic Example

use patient_matching::{Patient, MatchingEngine};
use chrono::NaiveDate;

fn main() {
    // Create two patient records
    let patient1 = Patient::builder()
        .given_name("John")
        .family_name("Smith")
        .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
        .nhs_number("1234567890")
        .build();

    let patient2 = Patient::builder()
        .given_name("Jon")  // Typo
        .family_name("Smith")
        .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
        .nhs_number("1234567890")
        .build();

    // Create matching engine with default config
    let engine = MatchingEngine::default_config();

    // Match patients
    let result = engine.match_patients(&patient1, &patient2);

    println!("Match score: {:.2}", result.score);
    println!("Is match: {}", result.is_match);
    println!("Confidence: {:?}", result.confidence);
}
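
The example above compares "John" with the typo "Jon". The sketch below illustrates how a string-similarity measure scores that pair. It is a from-scratch Jaro-Winkler implementation using the standard parameters: a prefix scale of 0.1, with the common prefix capped at four characters. `jaro` and `jaro_winkler` are illustrative helpers. The crate's own implementation, normalization and thresholds may differ.

```rust
/// Jaro similarity: share of matching characters, penalized for transpositions.
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Two characters "match" only if equal and no further apart than this window.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for (i, &ca) in a.iter().enumerate() {
        let start = i.saturating_sub(window);
        let end = (i + window + 1).min(b.len());
        for j in start..end {
            if !b_matched[j] && b[j] == ca {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Matched characters that appear in a different order are half-transpositions.
    let a_seq = a.iter().zip(&a_matched).filter(|&(_, &m)| m).map(|(c, _)| c);
    let b_seq = b.iter().zip(&b_matched).filter(|&(_, &m)| m).map(|(c, _)| c);
    let half_transpositions = a_seq.zip(b_seq).filter(|&(x, y)| x != y).count();
    let m = matches as f64;
    let t = half_transpositions as f64 / 2.0;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler: boost the Jaro score by 0.1 per shared leading character (max 4).
fn jaro_winkler(a: &str, b: &str) -> f64 {
    let j = jaro(a, b);
    let prefix = a.chars().zip(b.chars()).take_while(|(x, y)| x == y).take(4).count();
    j + prefix as f64 * 0.1 * (1.0 - j)
}

fn main() {
    // Inputs are assumed to be normalized (lowercased) beforehand.
    println!("jon vs john: {:.3}", jaro_winkler("jon", "john")); // prints "jon vs john: 0.933"
}
```

Winkler's rationale for the prefix bonus is that recording errors are less common at the start of a name than later in it.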

Configurable Matching

use patient_matching::{MatchConfig, MatchingEngine};

// Strict matching (exact matches required)
let strict_engine = MatchingEngine::new(MatchConfig::strict());

// Lenient matching (more forgiving for typos)
let lenient_engine = MatchingEngine::new(MatchConfig::lenient());

// Custom configuration
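let custom_config = MatchConfig {
    match_threshold: 0.90,
    nhs_number_weight: 0.40,  // Increase NHS number importance
    given_name_weight: 0.15,
    family_name_weight: 0.20,
    date_of_birth_weight: 0.15,
    use_phonetic_matching: true,
    // Remaining fields keep their defaults; if the address, gender and phone
    // weights stay at 0.05 each, all weights here total 1.05 rather than 1.0
    ..Default::default()
};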

let engine = MatchingEngine::new(custom_config);

Deterministic Matching

// Check for exact matches only
let is_deterministic_match = engine.deterministic_match(&patient1, &patient2);

if is_deterministic_match {
    println!("Exact match on NHS number or all key demographics");
}
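
The deterministic rule above can be sketched as a pure function. This sketch assumes that "all key demographics" means given name, family name and date of birth, and that all fields are already normalized. `Record` and `deterministic_match` are hypothetical stand-ins, not the crate's `Patient` type or its method.

```rust
/// Hypothetical stand-in record with already-normalized fields.
struct Record<'a> {
    nhs_number: Option<&'a str>,
    given_name: &'a str,
    family_name: &'a str,
    date_of_birth: (i32, u32, u32), // (year, month, day)
}

/// Exact match on NHS number, OR exact match on all key demographics
/// (assumed here: given name, family name and date of birth).
fn deterministic_match(a: &Record<'_>, b: &Record<'_>) -> bool {
    let same_nhs = matches!((a.nhs_number, b.nhs_number), (Some(x), Some(y)) if x == y);
    let same_demographics = a.given_name == b.given_name
        && a.family_name == b.family_name
        && a.date_of_birth == b.date_of_birth;
    same_nhs || same_demographics
}

fn main() {
    let a = Record {
        nhs_number: Some("9434765919"),
        given_name: "john",
        family_name: "smith",
        date_of_birth: (1980, 5, 15),
    };
    // Same NHS number, different given name: matches on the identifier alone.
    let b = Record { given_name: "jon", ..a };
    // No NHS number, identical demographics: matches on demographics.
    let c = Record { nhs_number: None, ..a };
    // prints "true true"
    println!("{} {}", deterministic_match(&a, &b), deterministic_match(&a, &c));
}
```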

Detailed Match Breakdown

let result = engine.match_patients(&patient1, &patient2);

println!("Overall score: {:.2}", result.score);
println!("NHS number score: {:?}", result.breakdown.nhs_number_score);
println!("Given name score: {:?}", result.breakdown.given_name_score);
println!("Family name score: {:?}", result.breakdown.family_name_score);
println!("Date of birth score: {:?}", result.breakdown.date_of_birth_score);
println!("Address score: {:?}", result.breakdown.address_score);
println!("Phonetic name score: {:?}", result.breakdown.phonetic_name_score);

Patient Data Model

The Patient struct supports:

NHS Number: UK national health identifier
Name Fields: Given, middle, and family names
Date of Birth: Birth date for age verification
Gender: Male, Female, Other, Unknown
Address: Multi-line address with postcode
Contact: Phone, mobile, email
Local ID: Hospital/practice-specific identifier
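
Any of these fields can be absent from a real record. One natural way to model them in Rust is with `Option`, which also gives a matcher an explicit way to skip missing fields. The struct below is a hypothetical sketch only, and its field names and types are assumptions. The crate's actual `Patient` (built via `Patient::builder()`, with a `chrono::NaiveDate` date of birth) may look quite different.

```rust
/// Hypothetical sketch of the fields listed above; not the crate's real definition.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum Gender {
    Male,
    Female,
    Other,
    #[default]
    Unknown,
}

#[derive(Debug, Clone, Default)]
struct Address {
    lines: Vec<String>,
    postcode: Option<String>,
}

#[derive(Debug, Clone, Default)]
struct Patient {
    nhs_number: Option<String>,
    given_name: Option<String>,
    middle_names: Vec<String>,
    family_name: Option<String>,
    /// (year, month, day); kept std-only here instead of chrono::NaiveDate
    date_of_birth: Option<(i32, u32, u32)>,
    gender: Gender,
    address: Option<Address>,
    phone: Option<String>,
    mobile: Option<String>,
    email: Option<String>,
    /// Hospital/practice-specific identifier
    local_id: Option<String>,
}

fn main() {
    // Unset fields default to None, an empty Vec, or Gender::Unknown.
    let patient = Patient {
        given_name: Some("Siân".to_string()),
        family_name: Some("Jones".to_string()),
        date_of_birth: Some((1980, 5, 15)),
        address: Some(Address {
            lines: vec!["1 Example Street".to_string()],
            postcode: Some("CF10 4UW".to_string()),
        }),
        ..Default::default()
    };
    println!("{patient:#?}");
}
```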

Matching Algorithm

The matching engine uses a weighted scoring system:

| Field | Default Weight | Purpose |
|---|---|---|
| NHS Number | 30% | Strongest identifier when available |
| Family Name | 20% | Critical demographic |
| Date of Birth | 20% | Age verification |
| Given Name | 15% | Important but subject to nicknames |
| Address | 5% | Supporting evidence |
| Gender | 5% | Supporting evidence |
| Phone | 5% | Supporting evidence |
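
The following is a minimal sketch of the weighted scoring in the table. It assumes each field comparison yields a similarity between 0.0 and 1.0, or `None` when the field is missing on either record. How the crate treats missing fields is not documented here. This sketch skips them and renormalizes over the remaining weights, which is one common choice. `WEIGHTS` and `weighted_score` are hypothetical names.

```rust
/// Default weights from the table, in a fixed field order (they sum to 1.0).
const WEIGHTS: [(&str, f64); 7] = [
    ("nhs_number", 0.30),
    ("family_name", 0.20),
    ("date_of_birth", 0.20),
    ("given_name", 0.15),
    ("address", 0.05),
    ("gender", 0.05),
    ("phone", 0.05),
];

/// Weighted average of per-field similarities (same order as WEIGHTS).
/// Missing fields (None) are skipped and the remaining weights renormalized;
/// returns None if no field could be compared at all.
fn weighted_score(field_scores: &[Option<f64>; 7]) -> Option<f64> {
    let mut total = 0.0_f64;
    let mut weight_used = 0.0_f64;
    for (&(_field, weight), score) in WEIGHTS.iter().zip(field_scores) {
        if let Some(s) = *score {
            total += weight * s;
            weight_used += weight;
        }
    }
    (weight_used > 0.0).then(|| total / weight_used)
}

fn main() {
    // NHS, family name, DOB and gender agree; given name is a near match;
    // address and phone are missing on one side.
    let scores = [Some(1.0), Some(1.0), Some(1.0), Some(0.93), None, Some(1.0), None];
    if let Some(s) = weighted_score(&scores) {
        println!("score = {s:.3}");
    }
}
```

With renormalization, a record without address or phone details is not penalized for the gap. The trade-off is that a score built from fewer fields rests on less evidence.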

Phonetic Matching provides bonus points when names sound similar (e.g., "Stephen" vs "Steven").
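
The phonetic algorithm is described above only as "Soundex-like". For reference, classic American Soundex works as follows. It keeps the first letter and encodes the remaining consonants as digits. It collapses adjacent duplicates, where `h` and `w` do not separate duplicates but vowels do. Finally, it pads or truncates the result to four characters. It is sketched below. The crate's variant may differ, and `soundex` is an illustrative helper.

```rust
/// Soundex digit for a lowercase ASCII letter; vowels, y, h and w give None.
fn soundex_digit(c: char) -> Option<char> {
    match c {
        'b' | 'f' | 'p' | 'v' => Some('1'),
        'c' | 'g' | 'j' | 'k' | 'q' | 's' | 'x' | 'z' => Some('2'),
        'd' | 't' => Some('3'),
        'l' => Some('4'),
        'm' | 'n' => Some('5'),
        'r' => Some('6'),
        _ => None,
    }
}

/// Classic American Soundex, e.g. "Robert" -> "R163".
/// Expects ASCII letters (strip diacritics first); None if there are no letters.
fn soundex(name: &str) -> Option<String> {
    let letters: Vec<char> = name
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase())
        .collect();
    let first = *letters.first()?;
    let mut code = first.to_ascii_uppercase().to_string();
    let mut last = soundex_digit(first);
    for &c in &letters[1..] {
        if c == 'h' || c == 'w' {
            continue; // h and w do not separate letters with the same digit
        }
        let digit = soundex_digit(c);
        if let Some(d) = digit {
            if digit != last {
                code.push(d);
            }
        }
        last = digit; // a vowel resets `last`, so equal digits either side of it are both kept
    }
    code.truncate(4);
    while code.len() < 4 {
        code.push('0');
    }
    Some(code)
}

fn main() {
    for name in ["Stephen", "Steven"] {
        println!("{name}: {:?}", soundex(name));
    }
}
```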

Research Basis

Key Findings Applied

No 100% Accuracy: Research shows that even the best algorithms achieve only 90-98% accuracy, so this crate reports a confidence level alongside every match score rather than a bare yes/no.
Standardization Is Critical: All inputs are normalized before comparison:
- Names: lowercase, remove diacritics, trim spaces
- Postcodes: uppercase, remove spaces
- Phone numbers: remove formatting, handle country codes
- NHS numbers: digits only
Multi-Factor Approach: Following research recommendations, matching uses multiple demographic fields rather than relying on a single identifier.
Weighted Probabilistic Matching: Combines multiple weak identifiers into a strong match signal, following best practices from health information exchanges.
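
The normalization rules listed above for names, postcodes and phone numbers can be sketched with the standard library alone. All function names here are hypothetical. The diacritic table covers only a few precomposed characters, and a fuller approach would use Unicode decomposition. The phone handling covers only the +44 and 0044 prefixes mentioned in the features list.

```rust
/// Maps common accented vowels (including Welsh circumflexes such as â, ŵ, ŷ)
/// to their base letter. Illustrative table only, not full Unicode handling.
fn strip_diacritic(c: char) -> char {
    match c {
        'à' | 'á' | 'â' | 'ä' => 'a',
        'è' | 'é' | 'ê' | 'ë' => 'e',
        'ì' | 'í' | 'î' | 'ï' => 'i',
        'ò' | 'ó' | 'ô' | 'ö' => 'o',
        'ù' | 'ú' | 'û' | 'ü' => 'u',
        'ŵ' | 'ẁ' | 'ẃ' | 'ẅ' => 'w',
        'ŷ' | 'ỳ' | 'ý' | 'ÿ' => 'y',
        other => other,
    }
}

/// Names: lowercase, strip diacritics, trim and collapse whitespace.
fn normalize_name(raw: &str) -> String {
    raw.split_whitespace()
        .map(|word| word.to_lowercase().chars().map(strip_diacritic).collect::<String>())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Postcodes: uppercase, remove all whitespace.
fn normalize_postcode(raw: &str) -> String {
    raw.chars()
        .filter(|c| !c.is_whitespace())
        .collect::<String>()
        .to_uppercase()
}

/// UK phone numbers: keep digits only, then rewrite a +44 / 0044 prefix as a leading 0.
fn normalize_uk_phone(raw: &str) -> String {
    let digits: String = raw.chars().filter(|c| c.is_ascii_digit()).collect();
    // "0044..." -> "44..."; UK national numbers never start with "00".
    let international = digits.strip_prefix("00").unwrap_or(digits.as_str());
    // National numbers start with 0, so a leading "44" here must be the country code.
    if let Some(rest) = international.strip_prefix("44") {
        return format!("0{rest}");
    }
    digits
}

fn main() {
    println!("{}", normalize_name("  Siân   Jones "));
    println!("{}", normalize_postcode("cf10 4uw"));
    println!("{}", normalize_uk_phone("+44 7700 900123"));
}
```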

Testing

Run the test suite:

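# All tests (unit, integration, and doc tests)
cargo test

# Only the integration tests in tests/integration_tests.rs
cargo test --test integration_tests

# Show println! output from passing tests
cargo test -- --nocapture

# Run tests whose names contain "test_fuzzy_name_match"
cargo test test_fuzzy_name_match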

Test Coverage

✅ Perfect matches (100% score)
✅ Fuzzy name matching (typos, alternate spellings)
✅ Welsh names with diacritics
✅ Phonetic name matching
✅ UK phone number normalization
✅ Address comparison
✅ NHS number validation
✅ Deterministic matching
✅ Strict vs lenient modes
✅ Missing field handling
✅ Serialization/deserialization

Example: Running the Demo

cargo run

This runs example scenarios including:

Perfect match
Fuzzy name match (Stephen vs Steven)
Welsh names with diacritics (Siân vs Sian)
Address matching
Complete mismatch
Strict vs lenient comparison

Performance Considerations

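Time Complexity: effectively O(1) per record pair for deterministic matching (a fixed number of equality checks on short fields); O(n·m) for Levenshtein and Jaro-Winkler on strings of lengths n and m, which stays cheap because names and addresses are short
Memory: Minimal allocation, uses borrowed references where possible
Concurrency: Thread-safe; all operations are immutable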
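
To make the string-similarity cost concrete, consider the standard dynamic-programming Levenshtein distance. For strings of lengths n and m, it fills an (n+1) by (m+1) table, so it takes O(n·m) time. Only one row needs to be kept in memory at a time. `levenshtein` and `levenshtein_similarity` below are illustrative helpers. Dividing by the longer length is just one common way to turn a distance into a 0-1 similarity.

```rust
/// Levenshtein edit distance, keeping a single DP row: O(n·m) time, O(m) extra space.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    // prev[j] = edit distance between the first i chars of `a` and the first j chars of `b`
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, ca) in a.chars().enumerate() {
        curr[0] = i + 1; // delete all i + 1 characters of `a` seen so far
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = curr[j] + 1;
            curr[j + 1] = substitute.min(delete).min(insert);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Similarity in 0.0..=1.0: one minus distance over the longer string's length.
fn levenshtein_similarity(a: &str, b: &str) -> f64 {
    let max_len = a.chars().count().max(b.chars().count());
    if max_len == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / max_len as f64
}

fn main() {
    println!("{}", levenshtein("kitten", "sitting"));
    println!("{:.2}", levenshtein_similarity("jon", "john"));
}
```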

Limitations

No Machine Learning: This is a rule-based system, not ML/AI
Single Country Focus: Optimized for UK/NHS data formats
No Persistent Storage: In-memory matching only
No Batch Processing: Compares one pair of patient records at a time

Future Enhancements

Support for other national identifiers (SSN, etc.)
Batch matching API for large datasets
Machine learning integration
Performance benchmarks
More sophisticated address parsing
International phone number support

License

MIT OR Apache-2.0

Contributing

Contributions welcome! Please ensure:

All tests pass (cargo test)
Code is formatted (cargo fmt)
No clippy warnings (cargo clippy)

References

Grannis SJ, et al. "Patient Matching within a Health Information Exchange." AMIA Annu Symp Proc. 2014.
Reisman M. "Patient Identification Techniques – Approaches, Implications, and Findings." NCVHS. 2020.

Contact

For NHS Wales-specific queries, contact the Digital Health and Care Wales (DHCW) team.

Repository: Work-In-Progress-For-Health/patient-matching-rust-crate
