Patient Matching Rust Crate

A comprehensive Rust library for matching patient records in healthcare information exchanges, developed for NHS Wales.

Overview

This crate implements both deterministic and probabilistic patient matching algorithms, based on published research on patient matching (see References).

Features

  • Deterministic Matching: Exact matches on NHS numbers and key demographics
  • Probabilistic Matching: Fuzzy matching with configurable scoring thresholds
  • String Similarity Algorithms: Jaro-Winkler and Levenshtein distance
  • UK NHS Number Support: Validation and normalization
  • Phonetic Matching: Soundex-like algorithm for names (handles "Stephen" vs "Steven")
  • Welsh Language Support: Handles diacritics (Siân → Sian)
  • Address Normalization: Postcode and street address comparison
  • Phone Number Normalization: UK format handling (+44, 0044, 07xxx)
  • Configurable Weights: Customize importance of each field
  • Serialization Support: JSON import/export via serde
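
To make the "validation and normalization" feature concrete, here is a minimal sketch of the standard Modulus 11 check used for NHS numbers. The function name `validate_nhs_number` is hypothetical and is not taken from this crate's API; the crate's own validation may differ in what separators it accepts.

```rust
/// Hypothetical helper (not the crate's API): strips spaces and hyphens,
/// then verifies the Modulus 11 check digit of a 10-digit NHS number.
/// Returns the normalized digit string when the number is valid.
pub fn validate_nhs_number(input: &str) -> Option<String> {
    let mut digits: Vec<u32> = Vec::with_capacity(10);
    for c in input.chars() {
        match c {
            '0'..='9' => digits.push(c.to_digit(10)?),
            ' ' | '-' => {} // common separators are ignored
            _ => return None, // any other character is rejected
        }
    }
    if digits.len() != 10 {
        return None;
    }
    // Weight the first nine digits by 10, 9, ..., 2 and sum the products.
    let sum: u32 = digits[..9]
        .iter()
        .zip((2..=10u32).rev())
        .map(|(d, w)| *d * w)
        .sum();
    let check = match 11 - (sum % 11) {
        11 => 0,
        10 => return None, // a check value of 10 is never valid
        c => c,
    };
    if check != digits[9] {
        return None;
    }
    Some(digits.iter().filter_map(|d| char::from_digit(*d, 10)).collect())
}
```

Under this check, `943 476 5919` validates, while the placeholder `1234567890` used in the usage examples does not (its check value works out to 10), so a validating implementation would treat that placeholder as malformed.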

Installation

Add to your Cargo.toml:

[dependencies]
patient-matching = "0.1.0"

Usage

Basic Example

use patient_matching::{MatchingEngine, Patient};
use chrono::NaiveDate;

fn main() {
    // Create two patient records
    let patient1 = Patient::builder()
        .given_name("John")
        .family_name("Smith")
        .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
        .nhs_number("1234567890")
        .build();

    let patient2 = Patient::builder()
        .given_name("Jon")  // Typo
        .family_name("Smith")
        .date_of_birth(NaiveDate::from_ymd_opt(1980, 5, 15).unwrap())
        .nhs_number("1234567890")
        .build();

    // Create matching engine with default config
    let engine = MatchingEngine::default_config();

    // Match patients
    let result = engine.match_patients(&patient1, &patient2);

    println!("Match score: {:.2}", result.score);
    println!("Is match: {}", result.is_match);
    println!("Confidence: {:?}", result.confidence);
}

Configurable Matching

use patient_matching::{MatchConfig, MatchingEngine};

// Strict matching (exact matches required)
let strict_engine = MatchingEngine::new(MatchConfig::strict());

// Lenient matching (more forgiving for typos)
let lenient_engine = MatchingEngine::new(MatchConfig::lenient());

// Custom configuration
let custom_config = MatchConfig {
    match_threshold: 0.90,
    nhs_number_weight: 0.40,  // Increase NHS number importance
    given_name_weight: 0.15,
    family_name_weight: 0.20,
    date_of_birth_weight: 0.15,
    use_phonetic_matching: true,
    ..Default::default()
};

let engine = MatchingEngine::new(custom_config);

Deterministic Matching

// Check for exact matches only
let is_deterministic_match = engine.deterministic_match(&patient1, &patient2);

if is_deterministic_match {
    println!("Exact match on NHS number or all key demographics");
}
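
As an illustration of what an "exact match on NHS number or all key demographics" rule could look like, here is a self-contained sketch. The `Record` struct, the choice of given name, family name, and date of birth as the "key demographics", and the case-insensitive name comparison are all assumptions made for this example; they are not the crate's actual types or rules.

```rust
/// Hypothetical, simplified record used only for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Record {
    pub nhs_number: Option<String>,
    pub given_name: String,
    pub family_name: String,
    pub date_of_birth: (i32, u32, u32), // (year, month, day)
}

/// Returns true when the two records agree exactly on the NHS number,
/// or, failing that, on every key demographic field.
pub fn deterministic_match(a: &Record, b: &Record) -> bool {
    // Rule 1: both records carry an NHS number and the numbers are identical.
    if let (Some(x), Some(y)) = (&a.nhs_number, &b.nhs_number) {
        if x == y {
            return true;
        }
    }
    // Rule 2 (assumed field set): names and date of birth all agree.
    a.given_name.eq_ignore_ascii_case(&b.given_name)
        && a.family_name.eq_ignore_ascii_case(&b.family_name)
        && a.date_of_birth == b.date_of_birth
}
```

One design question this sketch leaves open is whether two records with different NHS numbers should ever match on demographics alone; here they can, which a stricter implementation might disallow.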

Detailed Match Breakdown

let result = engine.match_patients(&patient1, &patient2);

println!("Overall score: {:.2}", result.score);
println!("NHS number score: {:?}", result.breakdown.nhs_number_score);
println!("Given name score: {:?}", result.breakdown.given_name_score);
println!("Family name score: {:?}", result.breakdown.family_name_score);
println!("Date of birth score: {:?}", result.breakdown.date_of_birth_score);
println!("Address score: {:?}", result.breakdown.address_score);
println!("Phonetic name score: {:?}", result.breakdown.phonetic_name_score);

Patient Data Model

The Patient struct supports:

  • NHS Number: UK national health identifier
  • Name Fields: Given, middle, and family names
  • Date of Birth: Birth date for age verification
  • Gender: Male, Female, Other, Unknown
  • Address: Multi-line address with postcode
  • Contact: Phone, mobile, email
  • Local ID: Hospital/practice-specific identifier

Matching Algorithm

The matching engine uses a weighted scoring system:

| Field         | Default Weight | Purpose                            |
|---------------|----------------|------------------------------------|
| NHS Number    | 30%            | Strongest identifier when available |
| Family Name   | 20%            | Critical demographic               |
| Date of Birth | 20%            | Age verification                   |
| Given Name    | 15%            | Important but subject to nicknames |
| Address       | 5%             | Supporting evidence                |
| Gender        | 5%             | Supporting evidence                |
| Phone         | 5%             | Supporting evidence                |
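
The default weights listed above sum to 100%. A minimal sketch of how per-field similarities might be combined with those weights follows. The `FieldScores` struct and the decision to renormalize over the fields that are actually present are assumptions for illustration; the crate may handle missing fields differently.

```rust
/// Hypothetical per-field similarities in 0.0..=1.0. `None` means the field
/// is missing on at least one record and cannot be compared.
#[derive(Debug, Default, Clone, Copy)]
pub struct FieldScores {
    pub nhs_number: Option<f64>,
    pub family_name: Option<f64>,
    pub date_of_birth: Option<f64>,
    pub given_name: Option<f64>,
    pub address: Option<f64>,
    pub gender: Option<f64>,
    pub phone: Option<f64>,
}

/// Weighted average of the available field scores, using the default
/// weights from the table. Missing fields are skipped and the remaining
/// weights are renormalized (an assumption of this sketch).
pub fn weighted_score(s: &FieldScores) -> f64 {
    let fields = [
        (s.nhs_number, 0.30),
        (s.family_name, 0.20),
        (s.date_of_birth, 0.20),
        (s.given_name, 0.15),
        (s.address, 0.05),
        (s.gender, 0.05),
        (s.phone, 0.05),
    ];
    let mut total = 0.0;
    let mut weight_used = 0.0;
    for &(score, weight) in fields.iter() {
        if let Some(v) = score {
            total += v * weight;
            weight_used += weight;
        }
    }
    if weight_used == 0.0 {
        0.0 // nothing comparable: report no evidence of a match
    } else {
        total / weight_used
    }
}
```

The resulting score would then be compared against a threshold such as `match_threshold` to decide whether the pair counts as a match.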

Phonetic Matching provides bonus points when names sound similar (e.g., "Stephen" vs "Steven").
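
The crate describes its phonetic algorithm only as "Soundex-like", so the sketch below uses classic American Soundex as a stand-in to show why "Stephen" and "Steven" compare as equal. The function names are hypothetical, and non-ASCII letters are simply dropped, so names should be normalized (diacritics removed) before encoding.

```rust
/// Soundex digit for a lowercase ASCII letter, or None for letters
/// that are not coded (vowels, y, h, w).
fn soundex_digit(c: char) -> Option<char> {
    match c {
        'b' | 'f' | 'p' | 'v' => Some('1'),
        'c' | 'g' | 'j' | 'k' | 'q' | 's' | 'x' | 'z' => Some('2'),
        'd' | 't' => Some('3'),
        'l' => Some('4'),
        'm' | 'n' => Some('5'),
        'r' => Some('6'),
        _ => None,
    }
}

/// Classic four-character Soundex code; None if the name has no ASCII letters.
pub fn soundex(name: &str) -> Option<String> {
    let mut letters = name
        .chars()
        .filter(|c| c.is_ascii_alphabetic())
        .map(|c| c.to_ascii_lowercase());
    let first = letters.next()?;
    let mut out = String::with_capacity(4);
    out.push(first.to_ascii_uppercase());
    let mut last = soundex_digit(first);
    for c in letters {
        if out.len() == 4 {
            break;
        }
        if c == 'h' || c == 'w' {
            continue; // h and w do not separate letters with the same code
        }
        let current = soundex_digit(c);
        if let Some(d) = current {
            if current != last {
                out.push(d);
            }
        }
        last = current; // a vowel resets `last`, allowing a repeated code
    }
    while out.len() < 4 {
        out.push('0');
    }
    Some(out)
}

/// True when both names produce the same Soundex code.
pub fn sounds_alike(a: &str, b: &str) -> bool {
    match (soundex(a), soundex(b)) {
        (Some(x), Some(y)) => x == y,
        _ => false,
    }
}
```

Both "Stephen" and "Steven" encode to `S315`, which is the kind of agreement a phonetic bonus would reward.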

Research Basis

Key Findings Applied

  1. No 100% Accuracy: Research shows even the best algorithms achieve 90-98% accuracy. This crate aims for transparency with confidence scores.

  2. Standardization Critical: All inputs are normalized:

    • Names: lowercase, remove diacritics, trim spaces
    • Postcodes: uppercase, remove spaces
    • Phone numbers: remove formatting, handle country codes
    • NHS numbers: digits only
  3. Multi-Factor Approach: Following research recommendations, matching uses multiple demographic fields rather than relying on a single identifier.

  4. Weighted Probabilistic Matching: Combines multiple weak identifiers into a strong match signal, following best practices from health information exchanges.
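
The standardization rules in item 2 can be sketched as a handful of small functions. All names here are hypothetical, and the diacritic table is a hand-written assumption that only covers precomposed Latin vowels (including the Welsh ŵ and ŷ); a production implementation would more likely use a Unicode normalization library.

```rust
/// Maps an accented lowercase letter to its base letter (limited table).
fn fold_diacritic(c: char) -> char {
    match c {
        'â' | 'á' | 'à' | 'ä' => 'a',
        'ê' | 'é' | 'è' | 'ë' => 'e',
        'î' | 'í' | 'ì' | 'ï' => 'i',
        'ô' | 'ó' | 'ò' | 'ö' => 'o',
        'û' | 'ú' | 'ù' | 'ü' => 'u',
        'ŵ' | 'ẃ' | 'ẁ' | 'ẅ' => 'w',
        'ŷ' | 'ý' | 'ỳ' | 'ÿ' => 'y',
        other => other,
    }
}

/// Names: lowercase, remove diacritics, trim and collapse spaces.
pub fn normalize_name(name: &str) -> String {
    name.split_whitespace()
        .map(|w| w.to_lowercase().chars().map(fold_diacritic).collect::<String>())
        .collect::<Vec<_>>()
        .join(" ")
}

/// Postcodes: uppercase, remove spaces.
pub fn normalize_postcode(postcode: &str) -> String {
    postcode
        .chars()
        .filter(|c| !c.is_whitespace())
        .flat_map(|c| c.to_uppercase())
        .collect()
}

/// Phone numbers: strip formatting and rewrite +44 / 0044 to a leading 0.
pub fn normalize_uk_phone(phone: &str) -> String {
    let digits: String = phone.chars().filter(|c| c.is_ascii_digit()).collect();
    let rest = if phone.trim_start().starts_with('+') {
        digits.strip_prefix("44")
    } else {
        digits.strip_prefix("0044")
    };
    // Drop an optional trunk zero, as in "+44 (0)7700 ...", then re-add one.
    let national = rest.map(|r| format!("0{}", r.strip_prefix('0').unwrap_or(r)));
    national.unwrap_or(digits)
}

/// NHS numbers: digits only.
pub fn normalize_nhs_number(nhs: &str) -> String {
    nhs.chars().filter(|c| c.is_ascii_digit()).collect()
}
```

With these rules, "  Siân " and "SIAN" both normalize to `sian`, and `+44 7700 900123`, `0044 7700 900123`, and `07700 900123` all normalize to `07700900123`, so later comparisons see identical strings.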

Testing

Run the test suite:

# Unit tests
cargo test

# Integration tests
cargo test --test integration_tests

# Run with output
cargo test -- --nocapture

# Run specific test
cargo test test_fuzzy_name_match

Test Coverage

  • ✅ Perfect matches (100% score)
  • ✅ Fuzzy name matching (typos, alternate spellings)
  • ✅ Welsh names with diacritics
  • ✅ Phonetic name matching
  • ✅ UK phone number normalization
  • ✅ Address comparison
  • ✅ NHS number validation
  • ✅ Deterministic matching
  • ✅ Strict vs lenient modes
  • ✅ Missing field handling
  • ✅ Serialization/deserialization

Example: Running the Demo

cargo run

This runs example scenarios including:

  1. Perfect match
  2. Fuzzy name match (Stephen vs Steven)
  3. Welsh names with diacritics (Siân vs Sian)
  4. Address matching
  5. Complete mismatch
  6. Strict vs lenient comparison

Performance Considerations

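  • Time Complexity: Deterministic matching is a fixed number of equality checks on short fields. String similarity cost grows with field length; Levenshtein distance, for example, is O(m·n) in the lengths of the two strings being compared
  • Memory: Minimal allocation, uses borrowed references where possible
  • Concurrency: Thread-safe, since matching operations do not mutate shared state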
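
To show where the string similarity cost comes from, here is a minimal sketch of Levenshtein distance using the usual two-row dynamic programming table, plus one common way to turn the distance into a 0.0 to 1.0 similarity. The function names and the normalization by the longer string's length are assumptions for this example, not the crate's API.

```rust
/// Edit distance between two strings, counted in Unicode scalar values.
/// Runs in O(m * n) time and O(n) extra space.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] holds the distance between the processed prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr: Vec<usize> = vec![0; b.len() + 1];
    for (i, ca) in a.iter().enumerate() {
        curr[0] = i + 1;
        for (j, cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}

/// Similarity in 0.0..=1.0: 1.0 for identical strings, 0.0 when every
/// character differs. Two empty strings are treated as identical.
pub fn levenshtein_similarity(a: &str, b: &str) -> f64 {
    let longest = a.chars().count().max(b.chars().count());
    if longest == 0 {
        return 1.0;
    }
    1.0 - levenshtein(a, b) as f64 / longest as f64
}
```

For the "John" versus "Jon" pair from the basic example, the distance on the lowercased names is 1 and the similarity is 0.75. Because name fields are short, the quadratic cost is small per pair; it matters mainly when many pairs are compared.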

Limitations

  1. No Machine Learning: This is a rule-based system, not ML/AI
  2. Single Country Focus: Optimized for UK/NHS data formats
  3. No Persistent Storage: In-memory matching only
  4. No Batch Processing: Compares one pair of patients at a time

Future Enhancements

  • Support for other national identifiers (SSN, etc.)
  • Batch matching API for large datasets
  • Machine learning integration
  • Performance benchmarks
  • More sophisticated address parsing
  • International phone number support

License

MIT OR Apache-2.0

Contributing

Contributions welcome! Please ensure:

  • All tests pass (cargo test)
  • Code is formatted (cargo fmt)
  • No clippy warnings (cargo clippy)

References

  1. Grannis SJ, et al. "Patient Matching within a Health Information Exchange." AMIA Annu Symp Proc. 2014.
  2. Reisman M. "Patient Identification Techniques – Approaches, Implications, and Findings." NCVHS. 2020.

Contact

For NHS Wales-specific queries, contact the Digital Health and Care Wales (DHCW) team.
