Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[redcap] new module #9474

Draft
wants to merge 79 commits into
base: main
Choose a base branch
from
Draft

[redcap] new module #9474

wants to merge 79 commits into from

Conversation

regisoc
Copy link
Contributor

@regisoc regisoc commented Nov 16, 2024

Brief summary of changes (WIP)

This PR adds the REDCap interoperability module.
This module opens an API endpoint to receive notifications from REDCap servers through the use of the Data Entry Trigger (DET). REDCap notifications are parsed and REDCap records are imported into LORIS.
If an error happens during this notification handling process, an error is generated in the issue tracker.
The module is imported from HBCD, and even if the core logic is the same, REDCap objects are now their own models.
This version also supports connection to multiple REDCap instances.

Features

The main goal was to replicate HBCD data import process based on REDCap DET.

  • Refactoring of the REDCap HTTP Client and linked classes/models.
  • Support for multiple REDCap instances.
  • Additional tools and features:
    • redcap2linst: script to import REDCap forms as LORIS LINST instruments.
    • fetch button: UI button on instrument panel to trigger data import (could be done later)

State

  • core module
  • tools
  • documentation
  • tests
    • automated test
    • test plan
  • RB configuration
  • additional features:
    • instrument fetch button (HBCD added feature) - UI button in the side panel of an instrument to trigger the import of data. Could be done in a later PR.

Testing instructions (if applicable)

Notification process:

  1. Define main assignee, REDCap instances and projects in the config.xml following the structure given in RB/config.xml file.
  2. Define which instruments could be imported by adding them to the configuration list in LORIS. See Front-end > Admin > Configuration > REDCap > redcap_importable_instrument.
  3. Send a URL notification to LORIS (i.e. endpoint /redcap/notifications) to simulate the reception of a DET notification. The payload should be the following (see redcap/notifications/redcapnotification.class.inc):
{
    "instrument":"instrument_backend_name",
    "project_id":"REDCap project ID",
    "project_url":"REDCap project URL",
    "record":"REDCap record ID / PSCID",
    "redcap_event_name":"REDCap event name",
    "redcap_url":"REDCap instance URL",
    "username":"REDCap username",
    "${instrument_backend_name}_complete":"2"
}

REDCap LINST instrument process:

  • See tools/redcap2linst.php usage.

Link(s) to related issue(s)

Links to related PRs

SQL/0000-00-01-Modules.sql Outdated Show resolved Hide resolved
@maximemulder maximemulder added Project: HBCD Issue or PR related to the HBCD project Project: C-BIG Issue or PR related to the C-BIG project Difficulty: Complex PR or issue that require a great effort to implementat, review, or test Area: Instruments PR or issue related instruments Release: SQL patch PR that contains an SQL patch to apply labels Nov 29, 2024
Copy link
Contributor

@maximemulder maximemulder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally finished reviewing the code, will test it now.

Okay, I finished reviewing the code, I am now in the process of testing it with MPN. Congratulations ! The module seems to work quite well and provides really convenient configuration options like multi-instance and multi-project integration. In addition you also use statically typed models instead of dictionaries directly, which I really like.

I wrote a bunch of small code comments, which you can also ignore if you do not agree with them, especially those marked with SUBJECTIVE.

On a broader level, I would have the following architectural comments:

  • I think we should use readonly properties for the configuration and model classes, which would help to remove a lot of boilerplate code.
  • I would prefer to avoid abbreviated variable names like pid, iname and rc, and instead use longer, more descriptive, names like project_id, instance_name and redcap_client. I know this does not play well with the 85-character line limit, but well, the problem is the limit here, not the longer names. I would also prefer more precise names like instrument_name, event_name and record_id instead of instrument, event and record when that makes sense.
  • In variable and type names, REDCap is sometimes written Redcap and sometimes REDCap, I think we should stick with one (IMO the simpler Redcap for code, and the "official" REDCap for documentation).
  • The configuration parser can be simplified. The current code builds the global configuration first, and then mutates it to add the instance and project configurations. It would be simpler to descend the XML tree, gather the instance configurations, and only then build the global configuration (same reasoning applies for instance and project configurations).
  • I need to have an additional REDCap project configuration parameter for MPN that changes the behavior of the pipeline. However, the configuration is currently not accessible from the REDCap notification handler. I think maybe there should be one configuration object accessible from there that contains all the configuration information for the specific project and instance of this call (instead of having a tree).
  • There are several singletons and factories, but I am not sure if these are really needed. If they do not persist across requests they should be removed IMO.
  • Same thing with the REDCap HTTP client cache. Is there a specific case that needs it ? If not, I think the abstraction should either be factorized so that it does not need to be manually read/written in each HTTP request call, or removed.
  • Same thing with the REDCap HTTP client handler. I am not sure managing different clients is necessary. Probably what should happen is: 1. Get the REDCap instance and project from the DET notification, 2. Get the REDCap configuration for that instance and project from the configuration file, 3. instantiate a REDCap HTTP client from that configuration and pass it as an argument to the REDCap notification handler, without necessarily bookkeeping instantiated clients like now (unless there is a good reason to do so).
  • The RedcapNotification class should be part of the REDCap HTTP client models, as it is structured JSON data obtained by HTTP from REDCap.

modules/redcap/php/models/redcapprop.class.inc Outdated Show resolved Hide resolved
modules/redcap/php/module.class.inc Show resolved Hide resolved
modules/redcap/php/module.class.inc Show resolved Hide resolved
@maximemulder maximemulder force-pushed the 20241020_redcap_module branch from 61d5c00 to 28f3733 Compare December 18, 2024 10:08
@maximemulder
Copy link
Contributor

maximemulder commented Dec 19, 2024

UPDATE: Since Regis is taking (well-deserved) time off, I am working on this PR myself and adding the required configuration options for MPN (that are hopefully also useful for other projects !).

A few things I noticed during my tests:

  • In the REDCap get project API call, the creation time and production time fields are nullable, I changed the types in the model to reflect that.
  • In the REDCap get event API call, MPN also returns the day_offset, offset_min and offset_max fields. MPN uses REDCap v14.0.28. Maybe these fields are optional or depend on the REDCap version.
  • In the REDCap get event API call, the current code expects the REDCap "event name" and "unique event name" to be related. However, that assumption does not always hold when special characters are involved (parentheses, hyphens, underscores, spaces...). I removed the check here.

@maximemulder maximemulder force-pushed the 20241020_redcap_module branch 7 times, most recently from c72c69b to 3472533 Compare December 24, 2024 09:35
@maximemulder
Copy link
Contributor

maximemulder commented Dec 24, 2024

Okay, I made some changes to the module for it to be usable for MPN, and made a few refactors that make the code (IMO) cleaner. Still a few TODOs notably regarding error handling, but I think that's a nice progress. List of sifnificant changes:

  • Added new configuration options and a documentation for the configuration.
  • Reworked the REDCap configuration parsing.
  • Reworked the REDCap HTTP client construction (no more HTTP REDCap client handler).
  • Moved all the REDCap client -related classes and models in redcap/php/client (to be renamed redcap/php/redcap_client ?).
  • Use readonly instead of getters in the models to reduce boilerplate.
  • Added a RedcapProps class to validate REDCap response objects (replacement for RedcapProp).

With all these changes, I removed almost all of my previous code comments except for a few ones I am still interested about.

Notably TODOs:

  • Finish error handling.
  • Migrate RedcapProject to RedcapProps and readonly.
  • Is the cache useful in the REDCap HTTP client ? It should be either removed or factorized IMO.
  • IMO the REDCap HTTP client does too many things. Code that is not directly related to calling the REDCap API and formatting results should probably be extracted.

CI seems broken because of an APT dependency, this is unrelated to the changes made here.

@maximemulder maximemulder force-pushed the 20241020_redcap_module branch from 3472533 to 5223553 Compare January 2, 2025 10:11
@regisoc
Copy link
Contributor Author

regisoc commented Jan 2, 2025

@maximemulder happy new year, and thanks for adding more to this PR while I was away!
I did not look into the code yet, just based on your comments:

  • In the REDCap get project API call, the creation time and production time fields are nullable, I changed the types in the model to reflect that. -> types, good.
  • In the REDCap get event API call, MPN also returns the day_offset, offset_min and offset_max fields. MPN uses REDCap v14.0.28. Maybe these fields are optional or depend on the REDCap version. -> not sure as we do not have a clear vision on REDCap changelog. Was that added to the redcapevent class as optional args?
  • In the REDCap get event API call, the current code expects the REDCap "event name" and "unique event name" to be related. However, that assumption does not always hold when special characters are involved (parentheses, hyphens, underscores, spaces...). I removed the check here. -> the event_name is the redcap event name (e.g. 'aaa' or a LORIS visit/timepoint name in our case) and the unique_event_name concatenates the arm number with an underscore. From what we talked about with Rida based on HBCD way of naming, I thought we wanted them to be linked, thus the test. The only special characters should be an underscore between the base event name and the arm number. Is MPN using other chars?
  • Added new configuration options and a documentation for the configuration. -> there was a doc before and it is difficult to see what was there before with the force-push, thanks for updating the documentation.
  • Reworked the REDCap HTTP client construction (no more HTTP REDCap client handler). -> good if you have removed it, I did not like it but it was a first draft. I will look into this, because I just want to be sure we can access multiple REDCap instances.
  • Moved all the REDCap client -related classes and models in redcap/php/client (to be renamed redcap/php/redcap_client ?). -> I do not mind that. redcap/php/client looks good to me.
  • Use readonly instead of getters in the models to reduce boilerplate. -> great!
  • Added a RedcapProps class to validate REDCap response objects (replacement for RedcapProp). -> ok. Is the old RedcapProp still needed?

Todo:

  • Finish error handling. -> the issue creation in the issue tracker is still working, I wanted to add more specific LORIS-REDCap related exceptions.
  • Migrate RedcapProject to RedcapProps and readonly. -> not sure to follow here? Is it the only one missing in the transition from RedcapProp to RedcapProps?
  • Is the cache useful in the REDCap HTTP client ? It should be either removed or factorized IMO. -> yes it is needed and yes is will be factorized. DET triggers a looot of notifications and having big items cached (i.e. the data dictionary, events, arms, instruments list) per REDCap instance is a must IMO. It was not a priority but I put the base of it.
  • IMO the REDCap HTTP client does too many things. Code that is not directly related to calling the REDCap API and formatting results should probably be extracted. -> not sure to get this, which methods?

CI issue -> not related to phan type checks?

I will look into that for next week and I would like to have a quick meeting too so we can align on the next steps.

@maximemulder maximemulder force-pushed the 20241020_redcap_module branch from db54f90 to 3478e29 Compare January 7, 2025 23:33
Copy link
Contributor

@maximemulder maximemulder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have done all the changes I wanted to do on the REDCap module (I am just finishing to update the tests), and it now seems to cover all my needs for MPN.

I have two comments here. While I refactored the configuration, I removed the instance name, but you can add it back if you want (IMO using the URL instead of the name is enough), it should be easy to add such options with the refactored configuration and parser.

modules/redcap/php/redcapnotificationhandler.class.inc Outdated Show resolved Hide resolved
Comment on lines +605 to +676
if (isset($instrument_values['dtt'])) {
$dt = \DateTime::createFromFormat(
'Y-m-d H:i:s',
$instrument_values['dtt'],
);

if (!$dt) {
error_log(
"[redcap] Could not parse 'dtt': "
. $instrument_values['dtt']
);
} else {
$instrument_values['Date_taken'] = $dt->format('Y-m-d');
}
}

// if null/empty, try getting that based on the timestamp
if (isset($instrument_values['Date_taken'])
|| empty($instrument_values['Date_taken'])
) {
if (isset($instrument_values['timestamp'])) {
$dt = \DateTime::createFromFormat(
'Y-m-d H:i:s',
$instrument_values['timestamp']
);
if (!$dt) {
error_log(
"[redcap] Could not parse 'timestamp': "
. $instrument_values['timestamp']
);
} else {
$instrument_values['Date_taken'] = $dt->format('Y-m-d');
}
}
}

// if null/empty, try getting that based on the timestamp_start
if (isset($instrument_values['Date_taken'])
|| empty($instrument_values['Date_taken'])
) {
if (isset($instrument_values['timestamp_start'])) {
$dt = \DateTime::createFromFormat(
'Y-m-d H:i:s',
$instrument_values['timestamp_start']
);
if (!$dt) {
error_log(
"[redcap] Could not parse 'timestamp_start': "
. $instrument_values['timestamp_start']
);
} else {
$instrument_values['Date_taken'] = $dt->format('Y-m-d');
}
}
}

// if still null/empty, get the current date
if (isset($instrument_values['Date_taken'])
|| empty($instrument_values['Date_taken'])
) {
$dtNow = new \DateTimeImmutable();
$instrument_values['Date_taken'] = $dtNow->format('Y-m-d');
}

// add the timestamp_stop in the values based on the last timestamp
if (isset($instrument_values['timestamp'])
&& !empty($instrument_values['timestamp'])
) {
// rename var to uniformize with other LORIS instruments
// Duration will be calculated when _saveValues is called.
$instrument_values['timestamp_stop'] = $instrument_values['timestamp'];
}
Copy link
Contributor

@maximemulder maximemulder Jan 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand all this code. It seems rather hacky and has been a little fragile in my experience (I had to move some things around for it to work on MPN). It would be best to refactor it IMO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HBCD case, most of the timestamps were not certain to be there at some point, but we were still in need of a data of administration for each instrument Date_taken.

Copy link
Contributor Author

@regisoc regisoc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quickly tested, I have a consistent 500 error that might not be related. Trying to debug that.

modules/redcap/CONFIGURATION.md Outdated Show resolved Hide resolved
modules/redcap/CONFIGURATION.md Show resolved Hide resolved
* @license http://www.gnu.org/licenses/gpl-3.0.txt GPLv3
* @link https://www.github.com/aces/Loris/
*/
class Queries
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used in notification handler only?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, but it is not intended and it could be extended with other queries to be used elsewhere if wanted.

TBH it might be good to split the notification handler into several classes for different tasks (get LORIS instrument, validate record dictionary, update instrument...) but I did not have the time for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the Query idea but same here, a lack of time.
Yes, we talked about splitting tasks and I wanted to it, but again, lack of time.
If it is working, I guess a refactoring can be done later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, the module is currently ok in terms of code, can be improved but nothing blocking if it works IMO.

modules/redcap/php/client/models/redcapprop.class.inc Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area: Instruments PR or issue related instruments Difficulty: Complex PR or issue that require a great effort to implementat, review, or test Project: C-BIG Issue or PR related to the C-BIG project Project: HBCD Issue or PR related to the HBCD project Release: SQL patch PR that contains an SQL patch to apply
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants