dqt Algorithms Reference

Every registered detector in dqt has a structured doc page at docs/algorithms/<group>/<slug>.md containing:

- What it computes, assumptions, and parameters
- When it works well and when it fails (with failure-mode table)
- Default-threshold calibration (FPR per canonical data shape)
- Recommended thresholds per data shape
- Canonical citation and runnable Python API example
- Limitations

The 64 detectors below are grouped by `dqt.algorithms.<group>` module. Every detector implements the same contract: `fit(reference) -> state`, then `score(current, state) -> DetectorResult`.
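
As a rough illustration of that contract, the sketch below implements a toy detector in the spirit of `null_fraction`. It does not import dqt: the `DetectorResult` fields (`score`, `passed`), the `threshold` parameter, and the class names are assumptions made for this example, not dqt's actual definitions.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class DetectorResult:
    # Hypothetical result shape; dqt's real DetectorResult may differ.
    score: float
    passed: bool


@dataclass
class NullFractionState:
    # State learned from the reference window.
    baseline_null_fraction: float


class NullFractionSketch:
    """Toy detector following fit(reference) -> state, score(current, state)."""

    def __init__(self, threshold: float = 0.05) -> None:
        # Assumed meaning: maximum tolerated increase over the baseline.
        self.threshold = threshold

    @staticmethod
    def _null_fraction(values: Sequence[Optional[float]]) -> float:
        if not values:
            return 0.0
        return sum(v is None for v in values) / len(values)

    def fit(self, reference: Sequence[Optional[float]]) -> NullFractionState:
        return NullFractionState(self._null_fraction(reference))

    def score(
        self, current: Sequence[Optional[float]], state: NullFractionState
    ) -> DetectorResult:
        fraction = self._null_fraction(current)
        increase = fraction - state.baseline_null_fraction
        return DetectorResult(score=fraction, passed=increase <= self.threshold)


detector = NullFractionSketch(threshold=0.05)
state = detector.fit([1.0, 2.0, None, 4.0])              # 1 of 4 values is null
result = detector.score([1.0, None, None, 4.0], state)   # 2 of 4 values are null
```

Keeping the fitted state separate from the detector object means the same state can be stored and reused to score later windows.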

basic (27)

Declarative, deterministic, rule-based checks that don't require statistical fitting.

Slug	Summary
cardinality_in_range	`COUNT(DISTINCT col)` must fall within `[min_val, max_val]`.
column_pair_comparison	Fraction of rows violating a cross-column rule (`shipped_at >= created_at`).
completeness	Fraction of non-null values (the complement of `null_fraction`, i.e. `1 - null_fraction`).
composite_uniqueness	Duplicate fraction on a multi-column composite key.
date_format	Fraction of non-null values whose string shape does not match the declared format.
date_part_missing_fraction	Fraction of expected time buckets (day/hour/...) that contain zero rows.
freshness_seconds_behind	Seconds elapsed since the most recent row timestamp.
max_in_range	`MAX(col)` must fall within `[min_val, max_val]`.
median_in_range	`PERCENTILE_CONT(0.5)` must fall within `[min_val, max_val]`.
min_in_range	`MIN(col)` must fall within `[min_val, max_val]`.
monotonicity	Sequence must be non-decreasing (or non-increasing).
null_fraction	Fraction of NULL rows in the column.
numeric_mean	Z-score of `AVG(col)` relative to the fitted baseline mean.
quantile_in_range	A specified quantile (p95 etc.) must fall within `[min_val, max_val]`.
regex_match	Fraction of non-null values not matching a POSIX regex.
row_count_in_range	Row count in a date window must fall within `[min_rows, max_rows]`.
set_exclusion	Fraction of values matching a forbidden set.
set_membership	Fraction of values not in the allowed set.
sql_assertion_violation	Fraction of rows failing a custom SQL boolean expression.
stddev_in_range	`STDDEV(col)` must fall within `[min_val, max_val]`.
string_case_violation	Fraction of values violating an upper/lower/title case rule.
string_length_range	Fraction of values whose character length is outside `[min_len, max_len]`.
sum_in_range	`SUM(col)` must fall within `[min_val, max_val]`.
uniqueness	`COUNT(DISTINCT col) / COUNT(*)`; higher is better.
validity	Fraction of rows satisfying a user-supplied SQL predicate.
value_in_range	Fraction of rows whose value falls outside `[min_val, max_val]`.
volume	Fractional deviation of current row count from a fitted baseline.
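
To make a few of these summaries concrete, here is a minimal pure-Python sketch of `null_fraction`, `completeness`, `uniqueness`, and `value_in_range` computed over an in-memory list. The function names and the treatment of nulls in the range check (nulls are skipped) are assumptions for illustration; the real checks presumably operate on SQL or DataFrame columns.

```python
from typing import Optional, Sequence


def null_fraction(values: Sequence[Optional[float]]) -> float:
    """Fraction of entries that are None (standing in for SQL NULL)."""
    # Assumes a non-empty column.
    return sum(v is None for v in values) / len(values)


def completeness(values: Sequence[Optional[float]]) -> float:
    """Fraction of non-null entries: the complement of null_fraction."""
    return 1.0 - null_fraction(values)


def uniqueness(values: Sequence[Optional[float]]) -> float:
    """COUNT(DISTINCT col) / COUNT(*); COUNT(DISTINCT) ignores NULLs in SQL."""
    distinct = {v for v in values if v is not None}
    return len(distinct) / len(values)


def value_out_of_range_fraction(
    values: Sequence[Optional[float]], min_val: float, max_val: float
) -> float:
    """Fraction of non-null entries outside [min_val, max_val]."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return 0.0
    outside = sum(not (min_val <= v <= max_val) for v in non_null)
    return outside / len(non_null)


column = [5, 12, None, 7, 30]
nulls = null_fraction(column)
complete = completeness(column)
unique = uniqueness(column)
out_of_range = value_out_of_range_fraction(column, min_val=0, max_val=10)
```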

custom (2)

Extension points for arbitrary user logic.

Slug	Summary
callable_check	Wrap any Python `fn(df) -> float` as a dqt detector.
remote_check	POST a sample to an external HTTP/GraphQL endpoint and use the returned score.

drift (8)

Two-sample drift detectors comparing a reference window to a current window.

Slug	Summary
adwin	Adaptive windowing with Hoeffding's bound; binary drift signal on streaming numeric data.
chi_square_drift	`1 - p_value` from a chi-square test on categorical frequency counts.
js_divergence	Bounded symmetric Jensen-Shannon distance in `[0, 1]`.
kl_divergence	Asymmetric KL divergence in nats.
ks_pvalue	`1 - p_value` from a two-sample Kolmogorov-Smirnov test on continuous data.
mmd	Kernel-based Maximum Mean Discrepancy for multivariate drift.
psi	Population Stability Index — industry-standard binned drift score.
wasserstein_1	Earth-mover distance normalised by reference standard deviation.
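
As one example of a binned drift score, the sketch below computes the Population Stability Index with the usual formula, the sum over bins of `(q_i - p_i) * ln(q_i / p_i)`, where `p` and `q` are the reference and current bin proportions. The binning scheme (10 equal-width bins over the reference range), the clipping of out-of-range values into the edge bins, and the `eps` floor for empty bins are assumptions of this sketch, not necessarily what dqt's `psi` does.

```python
import math
from typing import List, Sequence


def _bin_proportions(
    values: Sequence[float], lo: float, hi: float, n_bins: int, eps: float
) -> List[float]:
    """Equal-width histogram proportions, floored at eps to avoid log(0)."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for v in values:
        idx = int((v - lo) / width) if width > 0 else 0
        idx = min(max(idx, 0), n_bins - 1)  # clip values outside [lo, hi]
        counts[idx] += 1
    return [max(c / len(values), eps) for c in counts]


def psi(
    reference: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)  # bin edges come from the reference
    p = _bin_proportions(reference, lo, hi, n_bins, eps)
    q = _bin_proportions(current, lo, hi, n_bins, eps)
    return sum((qi - pi) * math.log(qi / pi) for pi, qi in zip(p, q))


reference = [float(v) for v in range(100)]
no_drift = psi(reference, reference)
shifted = psi(reference, [v + 30.0 for v in reference])
```

Every term of the sum is non-negative, so identical samples score exactly zero and the score grows as the two histograms diverge.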

info (2)

Information-theoretic association and drift measures.

Slug	Summary
cramers_v	Cramér's V — bounded effect-size for categorical drift.
mutual_information	Normalised mutual information between reference and current.
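
For categorical drift, Cramér's V can be computed from a 2 x C contingency table whose rows are the reference and current windows and whose columns are the categories: `V = sqrt(chi2 / (n * min(r - 1, c - 1)))`. The sketch below is a plain-Python version under that framing; treating the window as the second variable is an assumption about how the statistic is applied to drift, and no bias correction is applied.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v(reference: Sequence[Hashable], current: Sequence[Hashable]) -> float:
    """Cramér's V for the association between window and category."""
    ref_counts, cur_counts = Counter(reference), Counter(current)
    categories = list(set(ref_counts) | set(cur_counts))
    table = [
        [ref_counts[c] for c in categories],
        [cur_counts[c] for c in categories],
    ]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]

    # Pearson chi-square statistic against the independence expectation.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(table) - 1, len(categories) - 1)
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


same = cramers_v(["a"] * 50 + ["b"] * 50, ["a"] * 50 + ["b"] * 50)
disjoint = cramers_v(["b"] * 100, ["a"] * 100)
```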

outliers_multi (6)

Multivariate outlier detectors operating on numeric feature matrices.

Slug	Summary
ecod	Empirical-CDF tail probability aggregation; the default for wide tabular data.
hbos	Per-column histogram density score; fast feature-independent baseline.
isolation_forest_fraction	Tree-ensemble isolation depth; classifies anomalous rows.
lof	Local Outlier Factor — k-nearest-neighbour density ratio.
mahalanobis_distance	Mahalanobis distance from the reference mean; the squared distance is chi-square distributed under multivariate normality.
one_class_svm	Kernel SVM that learns a tight support boundary around the reference.

outliers_uni (9)

Univariate outlier detectors on a single numeric column.

Slug	Summary
adjusted_boxplot_fraction	IQR fence corrected for skewness via the medcouple statistic.
auto_outlier	Profiles the reference and delegates to the appropriate inner detector.
double_mad_outlier_fraction	Asymmetric MAD with separate scales for the left and right tails.
generalized_esd	Rosner's ESD test for up to k outliers in a normal column.
grubbs	Single-outlier hypothesis test under normality.
iqr_fence	Classic Tukey IQR fence.
mad_outlier_fraction	Modified Z-score using median and MAD; 50% breakdown point.
outlier_fraction_drift	Meta-detector on a time series of upstream outlier fractions.
zscore_outlier_fraction	Standard Z-score; valid only for confirmed Gaussian columns.
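
The two most common robust rules in this group can be sketched with the standard library alone. The modified Z-score below uses the conventional constant 0.6745 and a cutoff of 3.5, and the Tukey fence uses a multiplier of 1.5; these defaults, the function names, and the choice of `statistics.quantiles` (default "exclusive" method) for the quartiles are assumptions of this sketch rather than dqt's documented parameters.

```python
import statistics
from typing import Sequence


def mad_outlier_fraction(values: Sequence[float], threshold: float = 3.5) -> float:
    """Fraction of values whose modified Z-score exceeds the threshold."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate case: more than half the values are identical.
        return 0.0
    flagged = sum(abs(0.6745 * (v - med) / mad) > threshold for v in values)
    return flagged / len(values)


def iqr_fence_fraction(values: Sequence[float], k: float = 1.5) -> float:
    """Fraction of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    flagged = sum(v < lower or v > upper for v in values)
    return flagged / len(values)


column = [10, 11, 12, 11, 10, 12, 11, 10, 12, 100]
mad_fraction = mad_outlier_fraction(column)
iqr_fraction = iqr_fence_fraction(column)
```

Both rules flag only the value 100 here, because neither the median, the MAD, nor the quartiles are pulled toward the extreme point.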

pattern (1)

Pattern-conformance detectors that don't need a data-driven reference.

Slug	Summary
benford_law_fit	Chi-square goodness-of-fit against Benford's first-digit law.
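
Benford's law gives the expected share of leading digit `d` as `log10(1 + 1/d)`. The sketch below computes the chi-square goodness-of-fit statistic against those expected counts; it returns only the statistic (with 8 degrees of freedom), leaves the p-value to a statistics library, and uses hypothetical helper names.

```python
import math
from collections import Counter
from typing import Iterable, Optional


def first_digit(x: float) -> Optional[int]:
    """Leading non-zero digit of x, or None for zero."""
    x = abs(float(x))
    if x == 0:
        return None
    # Scientific notation always starts with the leading digit.
    return int(f"{x:.15e}"[0])


def benford_chi_square(values: Iterable[float]) -> float:
    """Chi-square statistic of observed first digits vs. Benford's law."""
    digits = [d for d in (first_digit(v) for v in values) if d is not None]
    n = len(digits)
    observed = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (observed[d] - expected) ** 2 / expected
    return chi2


powers_of_two = [2 ** k for k in range(100)]   # known to follow Benford closely
uniform_digits = list(range(1, 1000))          # leading digits are uniform

benford_like = benford_chi_square(powers_of_two)
not_benford = benford_chi_square(uniform_digits)
```

For reference, the 5% critical value of a chi-square distribution with 8 degrees of freedom is about 15.51; the powers of two fall well below it and the uniform sequence far above.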

referential (1)

Cross-table referential integrity.

Slug	Summary
referential_integrity_rate	Fraction of child FK values that exist in the parent table.
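
A minimal in-memory sketch of this rate is shown below. Skipping NULL foreign keys and returning 1.0 when there is nothing to check are assumptions of the example; the actual detector may define these edge cases differently.

```python
from typing import Hashable, Iterable, Optional


def referential_integrity_rate(
    child_fk_values: Iterable[Optional[Hashable]],
    parent_keys: Iterable[Hashable],
) -> float:
    """Fraction of non-null child FK values present in the parent key set."""
    parents = set(parent_keys)
    non_null = [v for v in child_fk_values if v is not None]
    if not non_null:
        return 1.0  # nothing to violate
    return sum(v in parents for v in non_null) / len(non_null)


rate = referential_integrity_rate([1, 2, 2, 5, None], parent_keys=[1, 2, 3])
```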

schema (1)

Schema-drift detection.

Slug	Summary
schema_change	Added, removed, or type-changed columns relative to the recorded baseline schema.
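
The three kinds of change named above can be derived from two column-to-type mappings. The sketch below assumes schemas are represented as plain `dict`s of column name to type string; that representation and the function name are illustrative choices, not dqt's recorded-schema format.

```python
from typing import Dict, List


def schema_diff(
    baseline: Dict[str, str], current: Dict[str, str]
) -> Dict[str, List[str]]:
    """Columns added, removed, or type-changed relative to the baseline."""
    shared = baseline.keys() & current.keys()
    return {
        "added": sorted(current.keys() - baseline.keys()),
        "removed": sorted(baseline.keys() - current.keys()),
        "type_changed": sorted(c for c in shared if baseline[c] != current[c]),
    }


baseline = {"id": "BIGINT", "amount": "DOUBLE", "created_at": "TIMESTAMP"}
current = {"id": "BIGINT", "amount": "VARCHAR", "updated_at": "TIMESTAMP"}
diff = schema_diff(baseline, current)
```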

timeseries (7)

Time-series anomaly and change-point detection.

Slug	Summary
bocpd	Bayesian online change-point detection with run-length posterior.
cusum	Two-sided CUSUM control chart for sustained mean shifts.
holt_winters	Holt-Winters exponential smoothing with prediction-interval anomalies.
matrix_profile	STUMPY Matrix Profile for shape-based discord detection.
page_hinkley	Sequential one-directional mean-shift test.
prophet_anomaly	Meta Prophet forecast with uncertainty band; needs `dqt[forecast]`.
stl_residual_zscore	STL decomposition + Z-score on the residual component.
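
As an example from this group, a two-sided tabular CUSUM accumulates deviations above and below the reference mean and raises an alarm when either sum exceeds a decision limit. The sketch below estimates the mean and standard deviation from the reference window; the slack `k = 0.5 sigma` and limit `h = 5 sigma` are common textbook defaults assumed here, not dqt's calibrated thresholds.

```python
import statistics
from typing import Optional, Sequence


def cusum_first_alarm(
    reference: Sequence[float],
    current: Sequence[float],
    k_sigmas: float = 0.5,
    h_sigmas: float = 5.0,
) -> Optional[int]:
    """Index in `current` of the first CUSUM alarm, or None if there is none."""
    mu = statistics.fmean(reference)
    sigma = statistics.stdev(reference)  # needs at least two reference points
    k, h = k_sigmas * sigma, h_sigmas * sigma

    s_hi = s_lo = 0.0
    for i, x in enumerate(current):
        s_hi = max(0.0, s_hi + (x - mu - k))  # accumulates upward shifts
        s_lo = max(0.0, s_lo + (mu - x - k))  # accumulates downward shifts
        if s_hi > h or s_lo > h:
            return i
    return None


reference = [10, 11, 9, 10, 11, 9, 10, 10]
stable = cusum_first_alarm(reference, [10, 11, 9, 10])
shifted = cusum_first_alarm(reference, [10, 10, 10, 12, 12, 12, 12])
```

The shift to 12 begins at index 3 but the alarm fires a few points later, at index 5, which is the intended behaviour: CUSUM reacts to sustained shifts rather than single spikes.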

Tooling

- `scripts/regenerate_calibration_tables.py`: recomputes the default-threshold calibration tables on the canonical fixtures (Normal, Lognormal, Poisson, Beta, Pareto, Exponential).
- `packages/dqt/tests/docs/test_docs_completeness.py`: verifies that every registered detector has a doc page with all required sections.

Adding a new detector

1. Implement the detector under `packages/dqt/src/dqt/algorithms/<group>/<slug>.py` and register it.
2. Add an entry to `packages/dqt/src/dqt/algorithms/_scales.py`.
3. Create `docs/algorithms/<group>/<slug>.md` following the structure of the existing pages.
4. Run `pytest packages/dqt/tests/docs/test_docs_completeness.py`. It will fail until every required section is present.
