diff --git a/docs/RabbitInAHat.html b/docs/RabbitInAHat.html
index d2c6de18..188affaa 100644
--- a/docs/RabbitInAHat.html
+++ b/docs/RabbitInAHat.html
@@ -392,13 +392,13 @@

Process Overview

  • Save Rabbit-In-a-Hat work and export to an MS Word document.
  • -
    -

    Installation and support

    -

    Rabbit-In-a-Hat comes with WhiteRabbit; refer to steps 1 and 2 of WhiteRabbit’s installation section.

    +
    +

    Installation and support

    +

    Rabbit-In-a-Hat comes with WhiteRabbit; refer to steps 1 and 2 of WhiteRabbit’s installation section.

    -
    -

    Getting Started

    +
    +

    Using the application functions

    Creating a New Document

    To create a new document, navigate to File –> Open Scan Report. Use the “Open” window to browse for the scan document created by WhiteRabbit. When a scan report is opened, the tables scanned will appear in orange boxes on the “Source” side of the Tables.

    @@ -429,7 +429,7 @@

    Loading in a Custom CDM

    Stem table

    In some cases a source domain maps to multiple OMOP CDM target domains, for example lab values that map to both the measurement and observation domains. Using the stem table removes some of the overhead of repeating the mapping for every target and also eases implementation (see below).

    -

    The idea of the stem table is that it contains all the types of columns that you need regardless of the CDM table the data ultimately ends up in. There is a pre-specified map from stem to all CDM clinical event tables, linking every stem field to one or multiple fields in the CDM. When implementing the ETL, the vocabulary decides where a particular row mapped to the stem table ultimately goes. The OMOP CDM Data Model Conventions mention:

    +

    The idea of the stem table is that it contains all the types of columns that you need regardless of the CDM table the data ultimately ends up in. There is a pre-specified map from stem to all CDM clinical event tables, linking every stem field to one or multiple fields in the CDM. When implementing the ETL, the vocabulary decides where a particular row mapped to the stem table ultimately goes. The OMOP CDM Data Model Conventions mention:

    Write the data record into the table(s) corresponding to the domain of the Standard CONCEPT_ID(s).
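
    The convention quoted above can be sketched in SQL. This is an illustrative sketch only, assuming a populated stem_table alongside the standard OMOP concept vocabulary table; the queries themselves are not generated by the tooling, and the selected columns are a hypothetical subset.

    ```sql
    -- Hypothetical sketch: route stem table rows by the domain of their standard concept.
    -- Rows whose concept belongs to the Measurement domain go to measurement ...
    INSERT INTO measurement (measurement_concept_id, person_id, measurement_date)
    SELECT s.concept_id, s.person_id, s.start_date
    FROM stem_table s
    JOIN concept c ON c.concept_id = s.concept_id
    WHERE c.domain_id = 'Measurement';

    -- ... and rows whose concept belongs to the Observation domain go to observation.
    INSERT INTO observation (observation_concept_id, person_id, observation_date)
    SELECT s.concept_id, s.person_id, s.start_date
    FROM stem_table s
    JOIN concept c ON c.concept_id = s.concept_id
    WHERE c.domain_id = 'Observation';
    ```

    The point of the pattern is that the mapping to stem is written once, and the domain_id of the standard concept decides the final target table at ETL time.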

    @@ -444,10 +444,9 @@

    Concept id hints (v0.9.0)

    The concept id hints are stored statically in a csv file and are not automatically updated. The code used to create the aforementioned csv file is also included in the repo.

    -
    -
    -

    Table to Table Mappings

    -

    It is assumed that the owners of the source data are able to provide detail on what each data table contains; Rabbit-In-a-Hat will describe the columns within the table but will not provide the context a data owner can provide. For the CDM tables, if more information is needed navigate to the OMOP CDM wiki and review the current OMOP specification.

    +
    +

    Table to Table Mappings

    +

    It is assumed that the owners of the source data are able to provide detail on what each data table contains; Rabbit-In-a-Hat will describe the columns within the table but will not provide the context a data owner can provide. For the CDM tables, if more information is needed navigate to the OMOP CDM documentation and review the current OMOP specification.

    To connect a source table to a CDM table, simply hover over the source table until an arrow head appears.

    Use your mouse to grab the arrow head and drag it to the corresponding CDM table. In the example below, the drug_claims data will provide information for the drug_exposure table.

    @@ -456,26 +455,28 @@

    Table to Table Mappings

    Continue this process until all tables that are needed to build a CDM are mapped to their corresponding CDM tables. One source table can map to multiple CDM tables and one CDM table can receive multiple mappings. There may be tables in the source data that should not be mapped into the CDM and there may be tables in the CDM that cannot be populated from the source data.

    -
    -

    Field to Field Mappings

    +
    +

    Field to Field Mappings

    Double clicking on an arrow connecting a source and CDM table opens a Fields pane below the selected arrow. The Fields pane lists all the source table and CDM fields and is meant for making the specific column mappings between tables. Hovering over a source field will generate an arrow head that can then be selected and dragged to its corresponding CDM field. For example, in the drug_claims to drug_exposure table mapping example, the source data owners know that patient_id is the patient identifier and corresponds to CDM.person_id. Also, just as before, the arrow can be selected and Logic and Comments can be added.

    If you select the source table orange box, Rabbit-In-a-Hat will expose the values the source data has for that table. This is meant to help in the process of understanding the source data and what logic may be required to handle the data in the ETL. In the example below ndcnum is selected and raw NDC codes are displayed, starting with the most frequent (note that if a “Min cell count” was selected in the WhiteRabbit scan, values with a count smaller than that will not show).

    Continue this process until all source columns necessary in all mapped tables have been mapped to the corresponding CDM column. Not all columns must be mapped into a CDM column and not all CDM columns require a mapping. One source column may supply information to multiple CDM columns and one CDM column can receive information from multiple columns.

    -
    -

    Generating an ETL Document

    +
    +

    Output generation

    +
    +

    Generating an ETL Document

    To generate an ETL MS Word document, use File –> Generate ETL document –> Generate ETL Word document and select a location to save. The ETL document can also be exported to markdown or html; in this case a file per target table is created and you will be prompted to select a folder. Regardless of the format, the generated document will contain all mappings and notes from Rabbit-In-a-Hat.

    Once the information is in the document, if an update is needed you must either update the information in Rabbit-In-a-Hat and regenerate the document, or update the document directly. If you make changes in the document, Rabbit-In-a-Hat will not read those changes back into the tool. However, it is common to generate the document with the core mapping information and fill in more detail within the document.

    Once the document is completed, it should be shared with the individuals who plan to implement the code to execute the ETL. The markdown and html formats enable easy publishing as a web page on e.g. GitHub. A good example is the Synthea ETL documentation.

    -
    -

    Generating a Testing Framework

    +
    +

    Generating a Testing Framework

    To make sure the ETL process is working as specified, it is highly recommended to create unit tests that evaluate the behavior of the ETL process. To efficiently create a set of unit tests, Rabbit-in-a-Hat can generate a testing framework.

    -
    -

    Generating a SQL Skeleton (v0.9.0)

    +
    +

    Generating a SQL Skeleton (v0.9.0)

    The step after documenting your ETL process is to implement it in an ETL framework of your choice. As many implementations involve SQL, Rabbit-In-a-Hat provides a convenience function to export your design to an SQL skeleton. This contains all field to field mappings, with logic/descriptions as comments, as non-functional pseudo-code. This saves you copying names into your SQL code, but still requires you to implement the actual logic. The general format of the skeleton is:

    INSERT INTO <target_table> (
       <target_fields>
    @@ -485,6 +486,8 @@ 

    Generating a SQL Skeleton (v0.9.0)

    FROM <source_table> ;
    +
    +
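
    As a hedged illustration of how a generated skeleton might be filled in (the table is taken from the mapping example earlier in this document, but the field names and expressions here are hypothetical and not produced by the tool):

    ```sql
    -- Hypothetical filled-in skeleton for a drug_claims -> drug_exposure mapping.
    -- The generated skeleton provides the INSERT/SELECT shape and the field lists
    -- with logic/descriptions as comments; the actual expressions are written by
    -- the ETL developer.
    INSERT INTO drug_exposure (
       drug_exposure_id,
       person_id,
       drug_concept_id
    )
    SELECT
       claim_id,        -- logic/comment from the design would appear here
       patient_id,      -- patient_id corresponds to person_id
       0                -- placeholder until the vocabulary mapping is implemented
    FROM drug_claims ;
    ```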
diff --git a/docs/WhiteRabbit.html b/docs/WhiteRabbit.html
index 3a7a9611..69bdd9f6 100644
--- a/docs/WhiteRabbit.html
+++ b/docs/WhiteRabbit.html
@@ -406,7 +406,7 @@

    Installation

    See Running from the command line for details on how to run from the command line instead.
  • Go to Using the Application Functions for detailed instructions on how to make a scan of your data.
  • -

    Note: on releases earlier than version 0.8.0, open the respective WhiteRabbit.jar or RabbitInAHat.jar files instead.

    +

    Note: on releases earlier than version 0.8.0, open the respective WhiteRabbit.jar or RabbitInAHat.jar files instead. Note: WhiteRabbit and RabbitInAHat only work from a path containing only ASCII characters.

    Memory

    WhiteRabbit may fail to start when the memory allocated by the JVM is too big or too small. By default this is set to 1200m. To increase the memory (in this example to 2400m), either set the environment variable EXTRA_JVM_ARGUMENTS=-Xmx2400m before starting, or edit the line %JAVACMD% %JAVA_OPTS% -Xmx2400m... in bin/WhiteRabbit.bat. To lower the memory, set one of these variables to e.g. -Xmx600m. If you have a 32-bit Java VM installed and problems persist, consider installing a 64-bit Java VM.
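
    On Linux/macOS, the environment-variable route described above can be sketched as follows; the launch-script path is an assumption about the distribution layout and may differ on your system.

    ```shell
    # Raise the JVM heap for WhiteRabbit to 2400m via the environment variable
    # described above, then start the application from its bin directory.
    export EXTRA_JVM_ARGUMENTS=-Xmx2400m
    # ./bin/whiteRabbit      # launch line commented out: path is illustrative
    echo "EXTRA_JVM_ARGUMENTS is set to: $EXTRA_JVM_ARGUMENTS"
    ```
    
    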

    @@ -420,7 +420,7 @@

    Support

    -

    Using the Application Functions

    +

    Using the application functions

    Specifying the Location of Source Data

    @@ -482,7 +482,7 @@

    SQL Server

    PostgreSQL

      -
    • Server location: this field contains the host name and database name (<host>/<database>)
    • +
    • Server location: this field contains the host name and database name (<host>/<database>). You can also specify the port (ex: <host>:<port>/<database>), which defaults to 5432.
    • User name: name of the user used to log into the server
    • Password: password for the supplied user name
    • Database name: this field contains the schema containing the tables
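
    For example, a filled-in PostgreSQL connection might look like the following; every value is a hypothetical placeholder, not a default of the tool (except the 5432 port, which is PostgreSQL's standard default):

    ```
    Server location: localhost:5432/source_db
    User name:       whiterabbit_user
    Password:        ********
    Database name:   public
    ```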