diff --git a/website/src/pages/transform/quickstart.mdx b/website/src/pages/transform/quickstart.mdx
index 56a7440..eb2bb4d 100644
--- a/website/src/pages/transform/quickstart.mdx
+++ b/website/src/pages/transform/quickstart.mdx
@@ -1,114 +1,75 @@
-# carrot-transform
+# Carrot-Transform Quick Start Guide
-## Quick Start
+## Installation
-Carrot transform is run from the command line. It now supports poetry to control the python dependencies. To install poetry, please follow the instructions [here](https://python-poetry.org/docs/).
-
-Once this is done, you can install the dependencies from the command line with the following command:
+Carrot-Transform is now available on PyPI, so you can install it with:
+```bash
+pip install carrot-transform
 ```
+
+Alternatively, if you are working with the source code, install dependencies using Poetry:
+```bash
 poetry install
 ```
+To install poetry, please follow the instructions [here](https://python-poetry.org/docs/).
-To run from the command line, enter:
+## Running Carrot-Transform
+To execute Carrot-Transform, run:
 ```
-poetry run python carrot_transform.py [args]
+carrot-transform [command] [options]
 ```
 For example, you can get the version number with:
 ```
-poetry run python carrot_transform.py -v
+carrot-transform -v
 ```
-There are many mandatory and optional arguments for carrot transform. In the quick start, we will demonstrate the mandatory arguments on a test case (taken from carrot-CDM) included in the repository.
+There are many mandatory and optional arguments for carrot transform. In the quick start, we will demonstrate the mandatory arguments on a test case (taken from carrot-CDM) included in the repository.
 Enter the following (as one command):
-```
-poetry run python carrot_transform.py run mapstream carrottransform/examples/test/inputs\
- --rules-file\
- carrottransform/examples/test/rules/rules_14June2021.json\
- --person-file\
- carrottransform/examples/test/inputs/Demographics.csv\
- --output-dir\
- carrottransform/examples/test/test_output\
- --omop-ddl-file\
- carrottransform/config/OMOPCDM_postgresql_5.3_ddl.sql\
- --omop-config-file\
- carrottransform/config/omop.json
+### Basic Example
+To process a test dataset included in the repository, run:
+```bash
+carrot-transform run mapstream \
+ carrottransform/examples/test/inputs \
+ --rules-file carrottransform/examples/test/rules/rules_14June2021.json \
+ --person-file carrottransform/examples/test/inputs/Demographics.csv \
+ --output-dir carrottransform/examples/test/test_output \
+ --omop-ddl-file carrottransform/config/OMOPCDM_postgresql_5.3_ddl.sql \
+ --omop-config-file carrottransform/config/omop.json
 ```
-This should create a set of output files in this directory:
-```
+This will generate a set of output files in this directory:
+```bash
 carrottransform/examples/test/test_output
 ```
 ## Arguments
-### Required:
-
-```
-input-dir,
- Directory containing input files.
-
---rules-file
- json file containing mapping rules
-
---person-file
- File containing person_ids in the first column
-
---output-dir,
- define the output directory for OMOP-format tsv files
-```
-
-
-Either:
-```
---omop-ddl-file,
- File containing OHDSI ddl statements for OMOP tables. Instead of specifying the file explicitly, it can be found automatically if --omop-version is specified instead. See --omop-version for further details.
-```
-
-AND
-```
---omop-config-file,
- File containing additional/override json config for omop outputs. Instead of specifying the file explicitly, it can be found automatically if --omop-version is specified instead. See --omop-version for further details.
-```
-
-OR:
-```
---omop-version
- Omop version - e.g., "5.3". Required if neither -omop-ddl-file nor --omop-config-file are set. If this is the case, the software will look for carrottransform/config/omop.json
- and
-carrottransform/config/OMOPCDM_postgresql_ XX_ddl.sql
-to import, where XX is the version number entered as the argument.
-```
-
-Optional:
-```
---write-mode,
- default = w
- options: w, a
- select whether to write new output files, or append to existing output files
-
---saved-person-id-file,
- Full path to person id file used to save person_id state and share person_ids between data sets
-
---use-input-person-ids,
- default = N
- options: Y, N
- If set to anything other than "N", person ids will be used from the input files. If set to "N" (default behaviour), person ids will be replaced with new integers.
-
---last-used-ids-file,
- Full path to last used ids file for OMOP tables. The file should be in a tab separated variable format:
-tablename last_used_id
-where last_used_id must be an integer.
-
---log-file-threshold,
- default = 0
-Change the limit for output count limit for logfile output. Logfile will contain the threshold number of output results.
-```
-
-Reduction in complexity over the original CaRROT-CDM version for the Transform part of *ETL* - In practice *Extract* is always
-performed by Data Partners, *Load* by database bulk-load software.
-
-
+### Required Arguments
+
+| Flag | Description |
+|--------------------|------------------------------------------------|
+| `--input-dir` | Directory containing input files |
+| `--rules-file` | JSON file with mapping rules |
+| `--person-file` | CSV file where the first column contains person IDs |
+| `--output-dir` | Directory for OMOP-format TSV files |
+
+
+### OMOP Configuration (Choose One Approach)
+| Approach | Required Arguments |
+|------------------|--------------------|
+| **Specify Files** | `--omop-ddl-file` (DDL statements for OMOP tables) and `--omop-config-file` (override JSON config) |
+| **Specify Version** | `--omop-version` (e.g., `5.3`, which will automatically find `carrottransform/config/omop.json` and `carrottransform/config/OMOPCDM_postgresql_XX_ddl.sql`) |
+
+
+### Optional Arguments
+| Flag | Default | Description |
+|---------------------------|---------|-------------|
+| `--write-mode` | `w` | Set to `w` (overwrite) or `a` (append) for output files |
+| `--saved-person-id-file` | None | Path to a file to save and share `person_id` state |
+| `--use-input-person-ids` | `N` | Use input person IDs (`Y`) or replace with new integers (`N`) |
+| `--last-used-ids-file` | None | Path to a file tracking last used IDs (tab-separated format) |
+| `--log-file-threshold` | `0` | Change output limit for log files |
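
Review note: two behaviours described in the argument tables are easy to misread, so here is a small Python sketch of them. It is an illustration only, not carrot-transform code. The helper names `resolve_omop_files` and `format_last_used_ids` are hypothetical. The path pattern for `--omop-version` and the tab-separated `tablename last_used_id` layout come from the documentation text in this diff. The absence of a header row and the example table names are assumptions.

```python
from pathlib import PurePosixPath


def resolve_omop_files(version, config_dir="carrottransform/config"):
    """Return (config_path, ddl_path) for an OMOP version such as "5.3".

    Hypothetical helper: mirrors the documented rule that --omop-version XX
    selects omop.json and OMOPCDM_postgresql_XX_ddl.sql in the config directory.
    """
    base = PurePosixPath(config_dir)
    config_path = base / "omop.json"
    ddl_path = base / f"OMOPCDM_postgresql_{version}_ddl.sql"
    return str(config_path), str(ddl_path)


def format_last_used_ids(last_used_ids):
    """Render {table name: last used id} as tab-separated lines.

    Assumes one "tablename<TAB>last_used_id" pair per line and no header row.
    """
    lines = []
    for table, last_id in last_used_ids.items():
        # The documentation requires an integer; reject bools and other types.
        if isinstance(last_id, bool) or not isinstance(last_id, int):
            raise ValueError(f"last_used_id for {table!r} must be an integer")
        lines.append(f"{table}\t{last_id}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(resolve_omop_files("5.3"))
    print(format_last_used_ids({"person": 120, "observation": 4500}), end="")
```

Under these assumptions, the string returned by `format_last_used_ids` could be written to a file and passed via `--last-used-ids-file`. The two paths from `resolve_omop_files` correspond to what the "Specify Files" approach would pass explicitly.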