Every registered detector in dqt has a structured doc page at `docs/algorithms/<group>/<slug>.md` containing:
- What it computes, assumptions, and parameters
- When it works well and when it fails (with failure-mode table)
- Default-threshold calibration (FPR per canonical data shape)
- Recommended thresholds per data shape
- Canonical citation and runnable Python API example
- Limitations
The 64 detectors below are grouped by their `dqt.algorithms.<group>` module. Every detector implements the same contract: `fit(reference) -> state`, then `score(current, state) -> DetectorResult`.
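
A minimal sketch of that two-step contract, assuming a stdlib-only environment. `Detector`, the `DetectorResult` dataclass, `VolumeDetector` and `run` are illustrative stand-ins, not the dqt API. The point is why `fit` returns state: the reference is summarised once, and `score` only needs that summary.

```python
from dataclasses import dataclass
from typing import Any, Protocol, Sequence


@dataclass(frozen=True)
class DetectorResult:
    # Hypothetical stand-in; the real dqt DetectorResult may carry other fields.
    score: float
    threshold: float

    @property
    def flagged(self) -> bool:
        return self.score > self.threshold


class Detector(Protocol):
    def fit(self, reference: Sequence[Any]) -> Any: ...
    def score(self, current: Sequence[Any], state: Any) -> DetectorResult: ...


class VolumeDetector:
    """Toy detector: fractional deviation of the current row count from the reference."""

    def __init__(self, threshold: float = 0.2) -> None:
        self.threshold = threshold

    def fit(self, reference: Sequence[Any]) -> dict:
        # The fitted state is just the baseline row count.
        return {"baseline_rows": len(reference)}

    def score(self, current: Sequence[Any], state: dict) -> DetectorResult:
        base = state["baseline_rows"]
        # Assumption: unsigned deviation; dqt's volume detector may report a signed value.
        deviation = abs(len(current) - base) / base if base else float("inf")
        return DetectorResult(score=deviation, threshold=self.threshold)


def run(detector: Detector, reference: Sequence[Any], current: Sequence[Any]) -> DetectorResult:
    state = detector.fit(reference)
    return detector.score(current, state)
```

Keeping the state separate from the detector object lets one fitted baseline score many current windows.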
Declarative, deterministic, rule-based checks. Most need no statistical fitting; numeric_mean and volume compare against a simple fitted baseline.
| Slug | Summary |
|---|---|
| cardinality_in_range | COUNT(DISTINCT col) must fall within [min_val, max_val]. |
| column_pair_comparison | Fraction of rows violating a cross-column rule (e.g. shipped_at >= created_at). |
| completeness | Fraction of non-null values (inverse of null_fraction). |
| composite_uniqueness | Duplicate fraction on a multi-column composite key. |
| date_format | Fraction of non-null values whose string shape does not match the declared format. |
| date_part_missing_fraction | Fraction of expected time buckets (day/hour/...) that contain zero rows. |
| freshness_seconds_behind | Seconds elapsed since the most recent row timestamp. |
| max_in_range | MAX(col) must fall within [min_val, max_val]. |
| median_in_range | PERCENTILE_CONT(0.5) must fall within [min_val, max_val]. |
| min_in_range | MIN(col) must fall within [min_val, max_val]. |
| monotonicity | Sequence must be non-decreasing (or non-increasing). |
| null_fraction | Fraction of NULL rows in the column. |
| numeric_mean | Z-score of AVG(col) relative to the fitted baseline mean. |
| quantile_in_range | A specified quantile (p95 etc.) must fall within [min_val, max_val]. |
| regex_match | Fraction of non-null values not matching a POSIX regex. |
| row_count_in_range | Row count in a date window must fall within [min_rows, max_rows]. |
| set_exclusion | Fraction of values matching a forbidden set. |
| set_membership | Fraction of values not in the allowed set. |
| sql_assertion_violation | Fraction of rows failing a custom SQL boolean expression. |
| stddev_in_range | STDDEV(col) must fall within [min_val, max_val]. |
| string_case_violation | Fraction of values violating an upper/lower/title case rule. |
| string_length_range | Fraction of values whose character length is outside [min_len, max_len]. |
| sum_in_range | SUM(col) must fall within [min_val, max_val]. |
| uniqueness | COUNT(DISTINCT col) / COUNT(*); higher is better. |
| validity | Fraction of rows satisfying a user-supplied SQL predicate. |
| value_in_range | Fraction of rows whose value falls outside [min_val, max_val]. |
| volume | Fractional deviation of current row count from a fitted baseline. |
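
Most of these rules reduce to a single pass over the column. The sketch below shows four of them in plain Python over a list of values, with `None` standing in for NULL. The function names are hypothetical, and the NULL handling for the range and set-membership checks is an assumption, since the table does not specify it.

```python
from typing import Any, Optional, Sequence


def null_fraction(values: Sequence[Any]) -> float:
    # Fraction of NULL (None) rows; completeness is 1 minus this value.
    return sum(v is None for v in values) / len(values) if values else 0.0


def value_out_of_range_fraction(values: Sequence[Optional[float]], min_val: float, max_val: float) -> float:
    # Assumption: NULLs are excluded from both numerator and denominator.
    present = [v for v in values if v is not None]
    if not present:
        return 0.0
    return sum(not (min_val <= v <= max_val) for v in present) / len(present)


def set_membership_violation(values: Sequence[Any], allowed: set) -> float:
    # Assumption: NULLs are skipped rather than counted as violations.
    present = [v for v in values if v is not None]
    return sum(v not in allowed for v in present) / len(present) if present else 0.0


def uniqueness(values: Sequence[Any]) -> float:
    # Mirrors SQL: COUNT(DISTINCT col) ignores NULLs, while COUNT(*) counts every row.
    return len({v for v in values if v is not None}) / len(values) if values else 0.0
```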
Extension points for arbitrary user logic.
| Slug | Summary |
|---|---|
| callable_check | Wrap any Python fn(df) -> float as a dqt detector. |
| remote_check | POST a sample to an external HTTP/GraphQL endpoint and use the returned score. |
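
A sketch of how a callable_check-style wrapper can adapt a user function to the fit/score contract. `CallableCheck`, its dict result, and the list-of-dicts stand-in for a DataFrame are all hypothetical; the detector's doc page has the real signature.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Rows = List[Dict[str, Any]]  # stand-in for a DataFrame in this sketch


@dataclass
class CallableCheck:
    """Hypothetical wrapper: turns fn(df) -> float into a fit/score detector."""

    fn: Callable[[Rows], float]
    threshold: float

    def fit(self, reference: Rows) -> None:
        return None  # the user function carries its own logic; nothing to fit

    def score(self, current: Rows, state: None) -> Dict[str, Any]:
        value = float(self.fn(current))
        return {"score": value, "flagged": value > self.threshold}


def negative_amount_fraction(df: Rows) -> float:
    # Example user logic: share of rows with a negative amount.
    return sum(r["amount"] < 0 for r in df) / len(df) if df else 0.0
```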
Two-sample drift detectors comparing a reference window to a current window.
| Slug | Summary |
|---|---|
| adwin | Adaptive windowing with Hoeffding's bound; binary drift signal on streaming numeric data. |
| chi_square_drift | 1 - p_value from a chi-square test on categorical frequency counts. |
| js_divergence | Bounded symmetric Jensen-Shannon distance in [0, 1]. |
| kl_divergence | Asymmetric KL divergence in nats. |
| ks_pvalue | 1 - p_value from a two-sample Kolmogorov-Smirnov test on continuous data. |
| mmd | Kernel-based Maximum Mean Discrepancy for multivariate drift. |
| psi | Population Stability Index — industry-standard binned drift score. |
| wasserstein_1 | Earth-mover distance normalised by reference standard deviation. |
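
As an example of a binned drift score, here is a stdlib-only PSI sketch. PSI sums (current share - reference share) * ln(current share / reference share) over bins. The sketch assumes equal-frequency bins cut at reference deciles and floors empty bins at a small `eps`; both choices are common but may differ from dqt's psi implementation.

```python
import bisect
import math
from typing import List, Sequence


def quantile_edges(reference: Sequence[float], bins: int = 10) -> List[float]:
    # Interior cut points at reference quantiles (equal-frequency bins).
    ref = sorted(reference)
    return [ref[int(len(ref) * i / bins)] for i in range(1, bins)]


def bin_fractions(values: Sequence[float], edges: List[float]) -> List[float]:
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect.bisect_right(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    edges = quantile_edges(reference, bins)
    ref_p = bin_fractions(reference, edges)
    cur_p = bin_fractions(current, edges)
    total = 0.0
    for r, c in zip(ref_p, cur_p):
        r, c = max(r, eps), max(c, eps)  # floor empty bins to avoid log(0)
        total += (c - r) * math.log(c / r)
    return total
```

A commonly quoted rule of thumb reads PSI below 0.1 as stable and above 0.25 as a significant shift; the per-shape recommendations on the psi doc page take precedence.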
Information-theoretic association and drift measures.
| Slug | Summary |
|---|---|
| cramers_v | Cramér's V, a bounded effect size for categorical drift. |
| mutual_information | Normalised mutual information between reference and current. |
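
Cramér's V for drift can be computed by treating the two windows as the rows of a 2 x k contingency table, running Pearson's chi-square on it, and normalising: V = sqrt(chi2 / (n * (min(r, k) - 1))). With two rows the denominator is just n, and V lies in [0, 1]. This is an illustration, not dqt's code.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(reference: Sequence[Hashable], current: Sequence[Hashable]) -> float:
    if not reference or not current:
        raise ValueError("both windows must be non-empty")
    ref_counts, cur_counts = Counter(reference), Counter(current)
    categories = list(ref_counts.keys() | cur_counts.keys())
    # 2 x k contingency table: row 0 = reference window, row 1 = current window.
    table = [[ref_counts[c] for c in categories], [cur_counts[c] for c in categories]]
    n = len(reference) + len(current)
    row_totals = [len(reference), len(current)]
    chi2 = 0.0
    for j in range(len(categories)):
        col_total = table[0][j] + table[1][j]
        for i in range(2):
            expected = row_totals[i] * col_total / n  # always > 0 here
            chi2 += (table[i][j] - expected) ** 2 / expected
    k = min(2, len(categories)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0
```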
Multivariate outlier detectors operating on numeric feature matrices.
| Slug | Summary |
|---|---|
| ecod | Empirical-CDF tail probability aggregation; the default for wide tabular data. |
| hbos | Per-column histogram density score; fast feature-independent baseline. |
| isolation_forest_fraction | Tree-ensemble isolation depth; reports the fraction of rows classified as anomalous. |
| lof | Local Outlier Factor — k-nearest-neighbour density ratio. |
| mahalanobis_distance | Mahalanobis distance from the reference mean; its square is chi-square distributed under multivariate normality. |
| one_class_svm | Kernel SVM that learns a tight support boundary around the reference. |
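
For two numeric columns the Mahalanobis computation fits in a few lines: estimate the mean and covariance from the reference, invert the 2 x 2 covariance, and score each row by d2 = (x - mu)^T Sigma^-1 (x - mu). Under bivariate normality d2 approximately follows a chi-square distribution with 2 degrees of freedom, whose survival function is exactly exp(-d2 / 2). This sketch is restricted to two columns; arbitrary widths call for a linear-algebra library.

```python
import math
from statistics import fmean
from typing import Sequence, Tuple

Point = Tuple[float, float]


def fit_2d(reference: Sequence[Point]):
    # Mean vector and sample covariance (n - 1 denominator) of a 2-column reference.
    xs, ys = [p[0] for p in reference], [p[1] for p in reference]
    mx, my = fmean(xs), fmean(ys)
    n = len(reference)
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # Inverse of [[sxx, sxy], [sxy, syy]] stored as (a, b, d) for [[a, b], [b, d]].
    inv = (syy / det, -sxy / det, sxx / det)
    return (mx, my), inv


def mahalanobis_sq(p: Point, mean: Point, inv) -> float:
    dx, dy = p[0] - mean[0], p[1] - mean[1]
    a, b, d = inv
    return a * dx * dx + 2 * b * dx * dy + d * dy * dy


def tail_probability_2d(d2: float) -> float:
    # Chi-square survival function with 2 degrees of freedom: exp(-d2 / 2).
    return math.exp(-d2 / 2)
```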
Univariate outlier detectors on a single numeric column.
| Slug | Summary |
|---|---|
| adjusted_boxplot_fraction | IQR fence corrected for skewness via the medcouple statistic. |
| auto_outlier | Profiles the reference and delegates to the appropriate inner detector. |
| double_mad_outlier_fraction | Asymmetric MAD with separate scales for the left and right tails. |
| generalized_esd | Rosner's ESD test for up to k outliers in a normal column. |
| grubbs | Single-outlier hypothesis test under normality. |
| iqr_fence | Classic Tukey IQR fence. |
| mad_outlier_fraction | Modified Z-score using median and MAD; 50% breakdown point. |
| outlier_fraction_drift | Meta-detector on a time series of upstream outlier fractions. |
| zscore_outlier_fraction | Standard Z-score; valid only for confirmed Gaussian columns. |
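
The modified Z-score behind mad_outlier_fraction replaces the mean and standard deviation with the median and MAD, M = 0.6745 * (x - median) / MAD, so a single extreme value cannot inflate its own scale and mask itself. The sketch uses the 3.5 cut-off recommended by Iglewicz and Hoaglin; dqt's default threshold may differ.

```python
from statistics import median
from typing import Sequence


def mad_outlier_fraction(values: Sequence[float], threshold: float = 3.5) -> float:
    # Modified Z-score: M_i = 0.6745 * (x_i - median) / MAD.
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return 0.0  # degenerate case: MAD is 0, so the score is undefined; report no outliers
    flagged = sum(abs(0.6745 * (v - med) / mad) > threshold for v in values)
    return flagged / len(values)
```

For `[10, 11, 9, 10, 12, 10, 100]` the median is 10 and the MAD is 1, so 100 gets M of about 60.7 while 12 gets about 1.35; one of seven values is flagged.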
Pattern-conformance detectors that don't need a data-driven reference.
| Slug | Summary |
|---|---|
| benford_law_fit | Chi-square goodness-of-fit against Benford's first-digit law. |
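
Benford's law predicts a leading digit d with probability log10(1 + 1/d), so 1 leads about 30.1% of the time and 9 about 4.6%. A minimal goodness-of-fit sketch compares observed first-digit counts with those expectations; with nine digit categories the statistic has 8 degrees of freedom, and the 5% critical value is about 15.51.

```python
import math
from collections import Counter
from typing import Iterable


def first_digit(x: float) -> int:
    # Leading non-zero digit of |x| read from scientific notation, e.g. 0.0345 -> 3.
    return int(f"{abs(x):.14e}"[0])


def benford_chi_square(values: Iterable[float]) -> float:
    digits = [first_digit(v) for v in values if v != 0]  # zero has no leading digit
    n = len(digits)
    if n == 0:
        raise ValueError("no non-zero values to test")
    observed = Counter(digits)
    stat = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)  # Benford: P(d) = log10(1 + 1/d)
        stat += (observed[d] - expected) ** 2 / expected
    return stat  # compare against chi-square with 8 degrees of freedom
```

Powers of two are a classic Benford-conforming sequence, so `benford_chi_square([2 ** k for k in range(1, 1001)])` should land well below 15.51.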
Cross-table referential integrity.
| Slug | Summary |
|---|---|
| referential_integrity_rate | Fraction of child FK values that exist in the parent table. |
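
The rate reduces to set membership once the parent keys are loaded into a hash set. This sketch assumes NULL foreign keys are excluded (they reference no parent row) and returns 1.0 when nothing is left to check; dqt's handling of both cases may differ.

```python
from typing import Hashable, Iterable, Optional


def referential_integrity_rate(
    child_fks: Iterable[Optional[Hashable]], parent_keys: Iterable[Hashable]
) -> float:
    parents = set(parent_keys)  # O(1) membership lookups
    # Assumption: NULL foreign keys are skipped rather than counted as orphans.
    fks = [fk for fk in child_fks if fk is not None]
    if not fks:
        return 1.0
    return sum(fk in parents for fk in fks) / len(fks)
```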
Schema-drift detection.
| Slug | Summary |
|---|---|
| schema_change | Added, removed, or type-changed columns relative to the recorded baseline schema. |
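
Schema-drift detection is a three-way diff between column-to-type mappings. A minimal sketch, assuming schemas are plain `{column: type_name}` dicts; the function name and the type names in any example are illustrative.

```python
from typing import Dict, List, Tuple


def schema_diff(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, List]:
    added = sorted(set(current) - set(baseline))
    removed = sorted(set(baseline) - set(current))
    # Columns present in both schemas whose declared type differs.
    type_changed: List[Tuple[str, str, str]] = sorted(
        (col, baseline[col], current[col])
        for col in set(baseline) & set(current)
        if baseline[col] != current[col]
    )
    return {"added": added, "removed": removed, "type_changed": type_changed}
```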
Time-series anomaly and change-point detection.
| Slug | Summary |
|---|---|
| bocpd | Bayesian online change-point detection with run-length posterior. |
| cusum | Two-sided CUSUM control chart for sustained mean shifts. |
| holt_winters | Holt-Winters exponential smoothing with prediction-interval anomalies. |
| matrix_profile | STUMPY Matrix Profile for shape-based discord detection. |
| page_hinkley | Sequential one-directional mean-shift test. |
| prophet_anomaly | Meta Prophet forecast with uncertainty band; requires the `dqt[forecast]` extra. |
| stl_residual_zscore | STL decomposition + Z-score on the residual component. |
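
The tabular two-sided CUSUM keeps two running sums, one for upward and one for downward departures from a target mean. Each step is reduced by an allowance k and clipped at zero, and an alarm fires when either sum exceeds the decision interval h. Textbook settings are k of about 0.5 sigma and h of 4 to 5 sigma. The sketch resets both sums after each alarm, which is one common convention and not necessarily dqt's.

```python
from typing import List, Sequence


def cusum_alarms(series: Sequence[float], target: float, k: float, h: float) -> List[int]:
    # k is the allowance (slack), h the decision interval, both in the series' units.
    s_hi = s_lo = 0.0
    alarms: List[int] = []
    for t, x in enumerate(series):
        s_hi = max(0.0, s_hi + (x - target - k))  # accumulates upward shifts
        s_lo = max(0.0, s_lo + (target - k - x))  # accumulates downward shifts
        if s_hi > h or s_lo > h:
            alarms.append(t)
            s_hi = s_lo = 0.0  # restart after signalling
    return alarms
```

With target 0, k = 0.5 and h = 4, a step from 0 to 2 at index 20 adds 1.5 per step to the upper sum, so the first alarm fires at index 22.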
- `scripts/regenerate_calibration_tables.py`: recomputes the default-threshold calibration tables on the canonical fixtures (Normal, Lognormal, Poisson, Beta, Pareto, Exponential).
- `packages/dqt/tests/docs/test_docs_completeness.py`: verifies that every registered detector has a doc page with all required sections.
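
Calibration amounts to running a detector on clean data from each fixture and measuring how often it fires anyway. The sketch below estimates the per-row false-positive rate of a Tukey IQR fence (k = 1.5) on five of the six fixture shapes. The distribution parameters are arbitrary assumptions, Poisson is left out because Python's `random` module has no Poisson sampler, and none of this is taken from the calibration script itself.

```python
import random
from statistics import quantiles
from typing import Callable, Dict, List

# Five of the six canonical shapes; parameters are illustrative assumptions.
FIXTURES: Dict[str, Callable[[random.Random], float]] = {
    "normal": lambda r: r.gauss(0.0, 1.0),
    "lognormal": lambda r: r.lognormvariate(0.0, 1.0),
    "exponential": lambda r: r.expovariate(1.0),
    "beta": lambda r: r.betavariate(2.0, 5.0),
    "pareto": lambda r: r.paretovariate(3.0),
}


def iqr_fence_flag_rate(sample: List[float], k: float = 1.5) -> float:
    q1, _, q3 = quantiles(sample, n=4)  # quartile cut points
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return sum(not (lo <= v <= hi) for v in sample) / len(sample)


def calibrate(n: int = 2000, trials: int = 20, seed: int = 0) -> Dict[str, float]:
    # Every row is drawn from the clean fixture, so any flag is a false positive.
    rng = random.Random(seed)
    table: Dict[str, float] = {}
    for name, draw in FIXTURES.items():
        rates = [iqr_fence_flag_rate([draw(rng) for _ in range(n)]) for _ in range(trials)]
        table[name] = sum(rates) / trials
    return table
```

On the Normal fixture the fence flags roughly 0.7% of rows, while on skewed shapes such as Lognormal the rate is far higher, which is why thresholds are recommended per data shape.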
- Implement the detector under `packages/dqt/src/dqt/algorithms/<group>/<slug>.py` and register it.
- Add an entry to `packages/dqt/src/dqt/algorithms/_scales.py`.
- Create `docs/algorithms/<group>/<slug>.md` following the structure of the existing pages.
- Run `pytest packages/dqt/tests/docs/test_docs_completeness.py`; it will fail until every required section is present.
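
A completeness check of this kind can be as simple as scanning a page's Markdown headings for each required section. The section titles below are hypothetical, paraphrased from the required contents listed at the top of this page; the real test may match different headings or inspect more than titles.

```python
import re
from typing import List

# Hypothetical section titles; the real docs test may use different names.
REQUIRED_SECTIONS = [
    "What it computes",
    "When it works well",
    "Calibration",
    "Recommended thresholds",
    "Citation",
    "Limitations",
]


def missing_sections(markdown: str) -> List[str]:
    # Collect ATX headings ("## Title") and report required titles with no matching heading.
    headings = [
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}\s+(.+)$", markdown, re.MULTILINE)
    ]
    return [s for s in REQUIRED_SECTIONS if not any(h.startswith(s.lower()) for h in headings)]
```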