The high-performance core of the Law Compare system, built with Rust. It provides the heavy-lifting logic for parsing, aligning, and analyzing legal documents.
The backend transforms raw unstructured legal text into a structured hierarchical tree.
- Pattern Matching: Uses optimized regular expressions to identify legal markers (e.g., "第一条", "第十章").
- State Machine: A custom parser traverses the text, maintaining a stack of parents (Chapters, Sections) to correctly attribute Article nodes.
- Normalization: Handles full-width/half-width characters and varied indentation styles prevalent in official legal publications.
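The marker-matching and parent-stack idea above can be sketched as follows. This is an illustrative, std-only simplification (one Chapter level, prefix/suffix checks instead of the optimized regexes, and the `Node`/`parse` names are hypothetical), not the actual parser.

```rust
// Simplified sketch of the hierarchy parser: lines whose leading marker is
// "第…章" open a Chapter; "第…条" lines become Articles under the current
// Chapter. The real parser keeps a full stack (Chapters, Sections, ...).

#[derive(Debug)]
enum NodeKind {
    Chapter,
    Article,
}

#[derive(Debug)]
struct Node {
    kind: NodeKind,
    heading: String,
    children: Vec<Node>,
}

fn parse(text: &str) -> Vec<Node> {
    let mut roots: Vec<Node> = Vec::new();
    for line in text.lines().map(str::trim).filter(|l| !l.is_empty()) {
        // Leading marker: everything before the first (half- or full-width) space.
        let marker: String = line
            .chars()
            .take_while(|c| *c != ' ' && *c != '\u{3000}')
            .collect();
        if marker.starts_with('第') && marker.ends_with('章') {
            roots.push(Node {
                kind: NodeKind::Chapter,
                heading: line.to_string(),
                children: Vec::new(),
            });
        } else if marker.starts_with('第') && marker.ends_with('条') {
            let article = Node {
                kind: NodeKind::Article,
                heading: line.to_string(),
                children: Vec::new(),
            };
            match roots.last_mut() {
                Some(chapter) => chapter.children.push(article), // attribute to current parent
                None => roots.push(article),                     // orphan article at root
            }
        }
    }
    roots
}

fn main() {
    let text = "第一章 総則\n第一条 この法律は…\n第二条 定義…\n第二章 罰則\n第三条 …";
    let tree = parse(text);
    assert_eq!(tree.len(), 2);
    assert_eq!(tree[0].children.len(), 2);
    println!("{} chapters", tree.len());
}
```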
This is the core algorithm that links "old" articles to "new" articles, even when they have been renumbered or moved within the document.

- Similarity Matrix: Computes a weighted score between every article pair using Jaccard Similarity, Containment Score, and Character Overlap.
- Multi-Stage Matching:
  - Strict 1:1 Match: Same number and high similarity.
  - Renumbering Detection: High similarity but different numbering.
  - Contextual Bonus: Scores are boosted when surrounding articles or parent headings match.
  - Merge/Split Detection: N:1 and 1:N patterns that identify complex legislative changes.
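One component of the weighted score, Jaccard similarity over character bigrams, can be sketched as below. This is a minimal, std-only illustration; the bigram choice and any weighting are assumptions, and the containment and character-overlap scores follow the same shape.

```rust
use std::collections::HashSet;

// Jaccard similarity over character bigrams:
// |bigrams(a) ∩ bigrams(b)| / |bigrams(a) ∪ bigrams(b)|

fn bigrams(s: &str) -> HashSet<(char, char)> {
    let chars: Vec<char> = s.chars().collect();
    chars.windows(2).map(|w| (w[0], w[1])).collect()
}

fn jaccard(a: &str, b: &str) -> f64 {
    let (sa, sb) = (bigrams(a), bigrams(b));
    if sa.is_empty() && sb.is_empty() {
        return 1.0; // both too short to form bigrams: treat as identical
    }
    let inter = sa.intersection(&sb).count() as f64;
    let union = sa.union(&sb).count() as f64;
    inter / union
}

fn main() {
    let old = "手数料は千円とする";
    let new = "手数料は二千円とする";
    // A small insertion keeps most bigrams shared, so the score stays high.
    println!("jaccard = {:.2}", jaccard(old, new));
    assert!(jaccard(old, new) > 0.5);
}
```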
Engineered for scale and low latency.
- Zero-Copy Strings (`Arc<str>`): Textual data is wrapped in atomic reference counters. When an article is identical in both versions, both trees point to the same memory segment, reducing memory overhead.
- Tokio Multi-threading: API requests are handled asynchronously; CPU-intensive alignment tasks are offloaded to `spawn_blocking`.
- Parallel Processing: Uses `rayon` to calculate the N x M similarity matrix in parallel.
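The `Arc<str>` sharing can be demonstrated with a short std-only sketch. The `Article` struct and `intern` helper are hypothetical names for illustration; the real codebase pairs this with `rayon` for the matrix and Tokio's `spawn_blocking` for the API, which are omitted here to keep the example dependency-free.

```rust
use std::sync::Arc;

// When the new version of an article is byte-identical to the old one,
// both trees hold clones of the same Arc<str>: one allocation, two handles.

struct Article {
    text: Arc<str>,
}

fn intern(new_text: &str, old: &Article) -> Article {
    if new_text == &*old.text {
        // Unchanged article: no copy, just a reference-count bump.
        Article { text: Arc::clone(&old.text) }
    } else {
        // Changed article: gets its own buffer.
        Article { text: Arc::from(new_text) }
    }
}

fn main() {
    let old = Article { text: Arc::from("第一条 この法律は…") };
    let unchanged = intern("第一条 この法律は…", &old);
    let changed = intern("第一条 この法律は、改正後…", &old);
    assert!(Arc::ptr_eq(&old.text, &unchanged.text)); // same memory segment
    assert!(!Arc::ptr_eq(&old.text, &changed.text));  // distinct allocation
    println!("shared = {}", Arc::ptr_eq(&old.text, &unchanged.text));
}
```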
- A hybrid approach using optimized regex patterns to extract Dates, Amounts, and Legal Terms.
- Helps identify material changes (e.g., fee increases) vs. simple wording tweaks.
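The amount-extraction idea can be sketched std-only as below. The real implementation uses optimized regex patterns over several entity types; this simplification handles only Arabic-numeral amounts ending in 円, and `extract_amounts` is an illustrative name, not the actual API.

```rust
// Extract Arabic-numeral amounts followed by 円, so old/new values can be
// compared numerically (e.g. detecting a fee increase rather than a
// wording tweak).

fn extract_amounts(text: &str) -> Vec<u64> {
    let chars: Vec<char> = text.chars().collect();
    let mut out = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        let start = i;
        while i < chars.len() && chars[i].is_ascii_digit() {
            i += 1;
        }
        // A run of digits immediately followed by 円 is an amount.
        if i > start && chars.get(i) == Some(&'円') {
            let n: u64 = chars[start..i].iter().collect::<String>().parse().unwrap();
            out.push(n);
        }
        if i == start {
            i += 1; // non-digit character: advance
        }
    }
    out
}

fn main() {
    let old = "手数料は1000円とする";
    let new = "手数料は1500円とする";
    let (a, b) = (extract_amounts(old)[0], extract_amounts(new)[0]);
    assert!(b > a); // material change: the fee increased
    println!("fee changed: {} -> {}", a, b);
}
```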
- Framework: Axum
- NLP: `jieba-rs`
- Diff: `similar`
- Parallelism: `rayon` & `tokio`