Updated the quickstart documentation #17

Merged
merged 1 commit into from
Feb 21, 2025
Updated the quickstart documentation
bp1183 committed Feb 21, 2025
commit 9168d4232e7c0e8febe6a445ff8ee5ece3d663e4
141 changes: 51 additions & 90 deletions website/src/pages/transform/quickstart.mdx
@@ -1,114 +1,75 @@
# carrot-transform
# Carrot-Transform Quick Start Guide

## Quick Start
## Installation

Carrot transform is run from the command line. It now supports poetry to control the python dependencies. To install poetry, please follow the instructions [here](https://python-poetry.org/docs/).

Once this is done, you can install the dependencies from the command line with the following command:
Carrot-Transform is now available on PyPI, so you can install it with:
```bash
pip install carrot-transform
```

Alternatively, if you are working with the source code, install dependencies using Poetry:
```bash
poetry install
```
To install poetry, please follow the instructions [here](https://python-poetry.org/docs/).

To run from the command line, enter:

## Running Carrot-Transform
To execute Carrot-Transform, run:
```
poetry run python carrot_transform.py [args]
carrot-transform [command] [options]
```

For example, you can get the version number with:
```
poetry run python carrot_transform.py -v
carrot-transform -v
```

There are many mandatory and optional arguments for carrot transform. In the quick start, we will demonstrate the mandatory arguments on a test case (taken from carrot-CDM) included in the repository.
There are many mandatory and optional arguments for carrot transform. In the quick start, we will demonstrate the mandatory arguments on a test case (taken from carrot-CDM) included in the repository.
Enter the following (as one command):

```
poetry run python carrot_transform.py run mapstream carrottransform/examples/test/inputs\
--rules-file\
carrottransform/examples/test/rules/rules_14June2021.json\
--person-file\
carrottransform/examples/test/inputs/Demographics.csv\
--output-dir\
carrottransform/examples/test/test_output\
--omop-ddl-file\
carrottransform/config/OMOPCDM_postgresql_5.3_ddl.sql\
--omop-config-file\
carrottransform/config/omop.json
### Basic Example
To process a test dataset included in the repository, run:
```bash
carrot-transform run mapstream \
carrottransform/examples/test/inputs \
--rules-file carrottransform/examples/test/rules/rules_14June2021.json \
--person-file carrottransform/examples/test/inputs/Demographics.csv \
--output-dir carrottransform/examples/test/test_output \
--omop-ddl-file carrottransform/config/OMOPCDM_postgresql_5.3_ddl.sql \
--omop-config-file carrottransform/config/omop.json
```

This should create a set of output files in this directory:
```
This will generate a set of output files in this directory:
```bash
carrottransform/examples/test/test_output
```
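For scripted runs, the same invocation can be assembled in Python and passed to `subprocess`. This is only a sketch: the helper names `build_mapstream_command` and `run_quickstart` are hypothetical (not part of Carrot-Transform), and it assumes the `carrot-transform` executable is on the `PATH` and that the script is run from the repository root so the relative example paths resolve.

```python
import subprocess
from pathlib import Path, PurePosixPath

# Paths taken from the quick-start example; relative to the repository root.
EXAMPLE_ROOT = PurePosixPath("carrottransform/examples/test")
CONFIG_ROOT = PurePosixPath("carrottransform/config")


def build_mapstream_command(example_root=EXAMPLE_ROOT, config_root=CONFIG_ROOT):
    """Return the quick-start command as an argument list (hypothetical helper)."""
    return [
        "carrot-transform", "run", "mapstream",
        str(example_root / "inputs"),
        "--rules-file", str(example_root / "rules" / "rules_14June2021.json"),
        "--person-file", str(example_root / "inputs" / "Demographics.csv"),
        "--output-dir", str(example_root / "test_output"),
        "--omop-ddl-file", str(config_root / "OMOPCDM_postgresql_5.3_ddl.sql"),
        "--omop-config-file", str(config_root / "omop.json"),
    ]


def run_quickstart():
    """Run the command, then return the sorted names of the files it wrote."""
    # check=True raises CalledProcessError if the tool exits with a non-zero status.
    subprocess.run(build_mapstream_command(), check=True)
    output_dir = Path(str(EXAMPLE_ROOT / "test_output"))
    return sorted(path.name for path in output_dir.iterdir())
```

Passing the arguments as a list avoids the shell line-continuation backslashes used in the command above, which are easy to get wrong when copying.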



## Arguments
### Required:

```
input-dir,
Directory containing input files.

--rules-file
json file containing mapping rules

--person-file
File containing person_ids in the first column

--output-dir,
define the output directory for OMOP-format tsv files
```


Either:
```
--omop-ddl-file,
File containing OHDSI ddl statements for OMOP tables. Instead of specifying the file explicitly, it can be found automatically if --omop-version is specified instead. See --omop-version for further details.
```

AND
```
--omop-config-file,
File containing additional/override json config for omop outputs. Instead of specifying the file explicitly, it can be found automatically if --omop-version is specified instead. See --omop-version for further details.
```

OR:
```
--omop-version
OMOP version - e.g., "5.3". Required if neither --omop-ddl-file nor --omop-config-file is set. If this is the case, the software will look for carrottransform/config/omop.json
and
carrottransform/config/OMOPCDM_postgresql_XX_ddl.sql
to import, where XX is the version number entered as the argument.
```

Optional:
```
--write-mode,
default = w
options: w, a
select whether to write new output files, or append to existing output files

--saved-person-id-file,
Full path to person id file used to save person_id state and share person_ids between data sets

--use-input-person-ids,
default = N
options: Y, N
If set to anything other than "N", person ids will be used from the input files. If set to "N" (default behaviour), person ids will be replaced with new integers.

--last-used-ids-file,
Full path to last used ids file for OMOP tables. The file should be in a tab separated variable format:
tablename last_used_id
where last_used_id must be an integer.

--log-file-threshold,
default = 0
Change the limit for output count limit for logfile output. Logfile will contain the threshold number of output results.
```

Carrot-Transform reduces complexity relative to the original CaRROT-CDM version by covering only the Transform part of *ETL*: in practice, *Extract* is always performed by Data Partners and *Load* by database bulk-load software.


### Required Arguments

| Flag | Description |
|--------------------|------------------------------------------------|
| `--input-dir` | Directory containing input files |
| `--rules-file` | JSON file with mapping rules |
| `--person-file` | CSV file where the first column contains person IDs |
| `--output-dir` | Directory for OMOP-format TSV files |


### OMOP Configuration (Choose One Approach)
| Approach | Required Arguments |
|------------------|--------------------|
| **Specify Files** | `--omop-ddl-file` (DDL statements for OMOP tables) and `--omop-config-file` (override JSON config) |
| **Specify Version** | `--omop-version` (e.g., `5.3`); the tool then looks for `carrottransform/config/omop.json` and `carrottransform/config/OMOPCDM_postgresql_XX_ddl.sql`, where `XX` is the version number given |
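
The choice between the two approaches can be illustrated with a small sketch. It is not the tool's actual implementation: `resolve_omop_files` is a hypothetical function, and the rule that explicit files take precedence over the version is an assumption. Only the two file-name patterns come from the documentation above.

```python
from pathlib import PurePosixPath

# Directory named in the documentation for the bundled OMOP config files.
CONFIG_DIR = PurePosixPath("carrottransform/config")


def resolve_omop_files(omop_ddl_file=None, omop_config_file=None, omop_version=None):
    """Return (ddl_path, config_path) for either approach (hypothetical helper)."""
    # Approach 1: both files are given explicitly (assumed to take precedence).
    if omop_ddl_file and omop_config_file:
        return PurePosixPath(omop_ddl_file), PurePosixPath(omop_config_file)
    # Approach 2: derive both paths from the version, e.g. "5.3".
    if omop_version:
        ddl = CONFIG_DIR / f"OMOPCDM_postgresql_{omop_version}_ddl.sql"
        return ddl, CONFIG_DIR / "omop.json"
    raise ValueError(
        "Provide both --omop-ddl-file and --omop-config-file, or --omop-version"
    )
```

For version `5.3`, this yields the same two paths that the basic example passes explicitly.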


### Optional Arguments
| Flag | Default | Description |
|---------------------------|---------|-------------|
| `--write-mode` | `w` | Set to `w` (overwrite) or `a` (append) for output files |
| `--saved-person-id-file` | None | Path to a file to save and share `person_id` state |
| `--use-input-person-ids` | `N` | Use input person IDs (`Y`) or replace with new integers (`N`) |
| `--last-used-ids-file` | None | Path to a file tracking last used IDs (tab-separated format) |
| `--log-file-threshold` | `0` | Limit on the number of output results written to the log file |
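
The previous version of this page described the `--last-used-ids-file` format as tab-separated `tablename last_used_id` pairs, where `last_used_id` must be an integer. A minimal sketch of reading such a file follows. `parse_last_used_ids` is a hypothetical helper, and it assumes the file has no header row, which the documentation does not state.

```python
import csv


def parse_last_used_ids(lines):
    """Map each OMOP table name to its last used integer ID (hypothetical helper).

    `lines` is any iterable of text lines, such as an open file object.
    """
    last_used = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 'tablename<TAB>last_used_id', got {row!r}")
        table_name, last_used_id = row
        # int() raises ValueError when the ID is not an integer, as the format requires.
        last_used[table_name] = int(last_used_id)
    return last_used
```

With a real file, this could be called as `parse_last_used_ids(open(path, newline=""))`, where `path` is the value passed to `--last-used-ids-file`.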