This project contains an UNOFFICIAL Python port of the open-source Data Quality Dashboard (DQD) application, which is written in R.
This package runs data quality checks against an OMOP CDM instance. It is a Python port of the R-based Data Quality Dashboard built by members of the OHDSI community.
- Requires Python 3.7+
- Currently compatible with the following database management systems:
  - Google BigQuery
  - Microsoft SQL Server (tsql)
  - SQLite
Clone the repository locally.
├── README.md
├── app
...
│ ├── index.html
│ ├── js
│ ├── results.json
│ ├── results_default.json
│ ...
├── csv
│ ├── OMOP_CDMv5.2_Check_Descriptions.csv
│ ├── OMOP_CDMv5.2_Concept_Level.csv
│ ├── OMOP_CDMv5.2_Field_Level.csv
...
├── db.py
├── r_to_python_sql_template.py
├── requirements.txt
├── run_checks.py
└── sql
├── concept_plausible_gender.sql
├── concept_plausible_unit_concept_ids.sql
├── concept_plausible_value_high.sql
├── concept_plausible_value_low.sql
...
run_checks.py: Entrypoint script that executes the DQD on an OMOP CDM instance.
r_to_python_sql_template.py: A helper script for converting the R-compatible templating used by the official DQD to Python-compatible templating.
csv/: Directory containing instructions for running DQD checks.
sql/: Directory containing SQL queries for running DQD checks.
app/: Directory containing static files for running the web application.
results.json: Output file of DQD execution.
db.py: Helper module for standardizing database management.
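The kind of conversion r_to_python_sql_template.py is described as performing can be sketched as follows. This is a minimal illustration, not the script's actual implementation: it assumes the R templates mark parameters as `@name` (SqlRender style) and that the Python side fills `{name}` placeholders with `str.format`. The function name `r_to_python_template` and the sample query are hypothetical, and SqlRender's conditional blocks are not handled.

```python
import re

# Matches SqlRender-style parameters such as @cdmTableName.
# Assumption: parameter names are plain identifiers.
_R_PARAM = re.compile(r"@([A-Za-z_][A-Za-z0-9_]*)")


def r_to_python_template(sql: str) -> str:
    """Rewrite @name placeholders as {name} for use with str.format."""
    # Escape literal braces first so str.format leaves them untouched.
    escaped = sql.replace("{", "{{").replace("}", "}}")
    return _R_PARAM.sub(r"{\1}", escaped)


# Hypothetical query, used only to show the round trip.
template = r_to_python_template(
    "SELECT COUNT(*) FROM @cdmDatabaseSchema.@cdmTableName"
)
print(template)
print(template.format(cdmDatabaseSchema="my_dataset", cdmTableName="person"))
```

Escaping the braces before substitution keeps any literal `{` or `}` in the SQL from being mistaken for a placeholder when the template is later formatted.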
- Install package requirements.
pip install -r requirements.txt
- Install http-server for static file rendering.
npm install -g http-server
- The script run_checks.py is the entry point. View the command-line parameters for the DBMS hosting the OMOP CDM instance. E.g. for a BigQuery instance:
python run_checks.py bigquery -h
usage: run_checks.py bigquery [-h] [--output_folder OUTPUT_FOLDER] [--output_file OUTPUT_FILE]
[--check_names [{cdmTable,measurePersonCompleteness,measureConditionEraCompleteness,cdmField,isRequired,cdmDatatype,isPrimaryKey,isForeignKey,fkDomain,fkClass,isStandardValidConcept,measureValueCompleteness,standardConceptRecordCompleteness,sourceConceptRecordCompleteness,sourceValueCompleteness,plausibleValueLow,plausibleValueHigh,plausibleTemporalAfter,plausibleDuringLife,withinVisitDates,plausibleGender,plausibleUnitConceptIds} ...]]
[--tables_to_include [TABLES_TO_INCLUDE ...]] [--sql_only]
project_id dataset_id
positional arguments:
project_id BigQuery project id
dataset_id BigQuery dataset id
optional arguments:
-h, --help show this help message and exit
--output_folder OUTPUT_FOLDER
Folder to save DQD output
--output_file OUTPUT_FILE
Name of output json file to save results.
--check_names [{cdmTable,measurePersonCompleteness,measureConditionEraCompleteness,cdmField,isRequired,cdmDatatype,isPrimaryKey,isForeignKey,fkDomain,fkClass,isStandardValidConcept,measureValueCompleteness,standardConceptRecordCompleteness,sourceConceptRecordCompleteness,sourceValueCompleteness,plausibleValueLow,plausibleValueHigh,plausibleTemporalAfter,plausibleDuringLife,withinVisitDates,plausibleGender,plausibleUnitConceptIds} ...]
Subgroup of checks to run
--tables_to_include [TABLES_TO_INCLUDE ...]
Subgroup of OMOP tables to include in checks
--sql_only
- Execute the run_checks.py script with the desired parameters to run DQD on the OMOP CDM instance. E.g.
python run_checks.py bigquery my_bq_project my_bq_dataset --check_names cdmTable
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:35<00:00, 35.38s/it]
Writing results to app/results.json.
- If --output_folder and --output_file are left unspecified, the results are written by default to the app/ directory with the filename results.json. If the defaults were not used, the resulting file must be moved to app/ and named results.json so the web application can find it.
- Visualize the DQD results by running the DQD static web application. To start, run the following command:
http-server app/
Starting up http-server, serving app/
http-server version: 14.1.1
http-server settings:
CORS: disabled
Cache: 3600 seconds
Connection Timeout: 120 seconds
Directory Listings: visible
AutoIndex: visible
Serve GZIP Files: false
Serve Brotli Files: false
Default File Extension: none
Available on:
http://127.0.0.1:8080
http://192.168.1.64:8080
Hit CTRL-C to stop the server
- Open http://127.0.0.1:8080 in a web browser to view the resulting tables.
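If the checks were run with a non-default output location, the results file has to end up at app/results.json before the web application is started. A minimal sketch of that step using only the standard library is shown below. The helper `publish_results` is hypothetical and not part of this repository.

```python
import shutil
from pathlib import Path


def publish_results(results_file, app_dir="app"):
    """Copy a DQD results file to <app_dir>/results.json and return the new path."""
    source = Path(results_file)
    if not source.is_file():
        raise FileNotFoundError(f"No results file at {source}")

    target = Path(app_dir) / "results.json"
    target.parent.mkdir(parents=True, exist_ok=True)

    # Skip the copy when the file is already in place.
    if source.resolve() != target.resolve():
        shutil.copyfile(source, target)
    return target
```

For example, `publish_results("output/my_run.json")` (an illustrative path) would copy that file to app/results.json and leave the original untouched.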