
POC of AI-Assisted Testing for MSF OpenMRS → DHIS2 Integration #185

@AishaHassen

Description

Background

The MSF project involves mapping a high volume of forms and data elements from OpenMRS to DHIS2 using a dynamically read metadata file. Testing is currently time-consuming, particularly for custom mappings that require transformations, value splitting, and conditional logic (e.g., mapping multi-pick lists to booleans). With 50-60+ new forms being added continuously, we need to automate testing to improve efficiency and reduce errors.

Problem Statement

Testing the integration between OpenMRS and DHIS2 is challenging because:

  • Standard mappings are mostly one-to-one but still need verification
  • Custom mappings require complex transformations and conditional logic that are error-prone
  • Manual generation of test data and expected outputs is tedious
  • Adequate coverage requires testing many scenarios, including null values and all option combinations
  • Custom logic specifications are sometimes written in difficult-to-parse formats
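As a concrete illustration of the custom-mapping problem, a single multi-pick answer in OpenMRS may need to fan out into several boolean data elements in DHIS2. A minimal sketch (the option codes below are hypothetical, not from the MSF metadata):

```javascript
// Hypothetical example: fan a multi-pick OpenMRS answer out into boolean flags.
// The option list is illustrative only; the real codes come from the metadata file.
const OPTIONS = ['fever', 'cough', 'fatigue'];

function multiPickToBooleans(selected) {
  // selected: array of picked option codes from the OpenMRS observation
  const flags = {};
  for (const option of OPTIONS) {
    flags[option] = selected.includes(option); // one boolean per option
  }
  return flags;
}

// multiPickToBooleans(['fever', 'fatigue'])
//   → { fever: true, cough: false, fatigue: true }
```

Each such transformation is a small piece of logic, but with dozens per form and 50-60+ forms, hand-verifying them all is exactly the burden this POC aims to remove.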

Proposed Solution

Use an LLM (Claude) to automate test generation and validation through the following approach:

Phase 1: Proof of Concept (Single Form)

  1. Input Generation: LLM generates various input data scenarios to ensure sufficient coverage, including edge cases, null values, and all option combinations
  2. Expected Output Generation: LLM uses only the metadata file to generate expected DHIS2 event payloads (not the project's existing YAML code, to avoid biasing tests)
  3. Test Execution: OpenFn workflow reads inputs from Google Sheet, executes MSF workflow, captures actual output, and writes back to the same sheet
  4. Assertion: An Apps Script function in the Google Sheet compares actual vs. expected outputs

Architecture Overview

Google Sheet (Test Cases)
    ↓ (read test inputs)
OpenFn Test Workflow
    ↓ (execute)
MSF Production Workflow
    ↓ (capture actual output)
OpenFn Test Workflow
    ↓ (write actual output)
Google Sheet (Test Cases + Actual Outputs)
    ↓ (run assertion script)
Test Results (Pass/Fail + Differences)

Key Design Decisions

  • Expected output = the events mapping JSON body sent to DHIS2 (not the DHIS2 response after event creation)
  • LLM should work from metadata file only to generate expected outputs
  • Final assertion should be deterministic JavaScript code in Google Sheet
  • Google Sheet serves as both input source and results repository
  • Start with manual POC before building automation workflow
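For reference, the "events mapping JSON body" follows the DHIS2 events API shape. A hypothetical expected_output value might look like the following (all UIDs, dates, and values are invented for illustration):

```javascript
// Hypothetical expected_output: a DHIS2 events payload.
// All UIDs and values below are invented; real ones come from the metadata file.
const expectedOutput = {
  events: [
    {
      program: 'kLqPkZnrxWX',   // DHIS2 program UID (invented)
      orgUnit: 'DiszpKrYNg8',   // org unit UID (invented)
      eventDate: '2024-01-15',
      status: 'COMPLETED',
      dataValues: [
        { dataElement: 'aBcDeFgHiJk', value: 'true' }, // e.g. a mapped boolean
        { dataElement: 'lMnOpQrStUv', value: '37.5' },
      ],
    },
  ],
};
```

Because this is the body sent to DHIS2 rather than the DHIS2 response, tests stay deterministic and never depend on server-assigned IDs.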

Deliverables

1. Google Sheet Template

Create a Google Sheet with the following structure (@jfeldman-openfn I think you already had the columns set, does this align with what you had?):

Required Columns:

  • test_id - Unique identifier for each test case
  • form_name - Name of the form being tested
  • test_scenario - Description of what's being tested (e.g., "null values", "multi-pick to boolean")
  • input_payload - OpenMRS input data (JSON string or structured columns)
  • expected_output - DHIS2 event payload expected (JSON string) - LLM generated from metadata
  • actual_output - DHIS2 event payload produced by workflow (JSON string) - populated by OpenFn workflow
  • test_status - Pass/Fail - computed by assertion script
  • differences - Details of mismatches if test fails - computed by assertion script
  • last_run - Timestamp of last test execution
  • notes - Additional context or observations
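A single populated row might look like the following (all values hypothetical, keyed by the column names above):

```javascript
// Hypothetical test case row, keyed by the column names above.
const sampleRow = {
  test_id: 'TC-001',
  form_name: 'Outpatient Consultation', // invented form name
  test_scenario: 'multi-pick to boolean',
  input_payload: '{"obs":[{"concept":"symptoms","value":["fever"]}]}', // JSON string
  expected_output: '', // filled in by the LLM from the metadata file
  actual_output: '',   // populated by the OpenFn test workflow
  test_status: '',     // Pass/Fail, computed by the assertion script
  differences: '',     // computed by the assertion script
  last_run: '',
  notes: '',
};
```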

Sheet Features:

  • Set up Google Sheet with designed column structure
  • Add data validation and formatting where appropriate
  • Include sample test cases for initial POC
  • Create JavaScript assertion script (Apps Script) that:
    • Compares expected_output vs actual_output for each row
    • Normalizes JSON before comparison (handle whitespace, property order)
    • Populates test_status (Pass/Fail)
    • Populates differences with detailed mismatch information
    • Can be triggered manually or automatically after workflow execution
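The normalization and deep-compare steps above can be sketched in plain JavaScript (usable in Apps Script or Node). This is a minimal sketch, not a final implementation — the path notation and message format are placeholders:

```javascript
// Normalize a parsed JSON value so property order doesn't affect comparison:
// recursively sort object keys, then compare canonical forms.
function canonicalize(value) {
  if (Array.isArray(value)) return value.map(canonicalize);
  if (value !== null && typeof value === 'object') {
    const sorted = {};
    for (const key of Object.keys(value).sort()) {
      sorted[key] = canonicalize(value[key]);
    }
    return sorted;
  }
  return value;
}

// Deep-compare two parsed payloads; returns a list of human-readable differences.
function deepDiff(expected, actual, path = '$') {
  const e = canonicalize(expected);
  const a = canonicalize(actual);
  const es = JSON.stringify(e);
  const as = JSON.stringify(a);
  if (es === as) return [];
  if (typeof e !== 'object' || e === null || typeof a !== 'object' || a === null) {
    return [`${path}: expected ${es} but got ${as}`];
  }
  const keys = new Set([...Object.keys(e), ...Object.keys(a)]);
  const diffs = [];
  for (const key of keys) {
    diffs.push(...deepDiff(e[key], a[key], `${path}.${key}`));
  }
  return diffs;
}
```

Because JSON strings are parsed and canonicalized before comparison, whitespace and property-order differences never produce false failures.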

2. OpenFn Test Execution Workflow

Build an OpenFn workflow that automates test execution:

Workflow Requirements:

Step 1: Read Test Cases from Google Sheet

  • Use @openfn/language-googlesheets adaptor
  • Read all rows from the test cases sheet where actual_output is empty or needs re-testing
  • Parse input_payload column into proper JSON format
  • Handle pagination if there are many test cases
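Independent of the adaptor call itself, the row-filtering and parsing logic for this step might look like the following (column layout is assumed here and must be adjusted to the final sheet):

```javascript
// Turn raw sheet rows (first row = header) into runnable test cases.
// Keeps only rows whose actual_output cell is empty, i.e. not yet executed.
function selectTestCases(rows) {
  const [header, ...data] = rows;
  const col = (name) => header.indexOf(name);
  return data
    .filter((row) => !row[col('actual_output')])
    .map((row) => ({
      testId: row[col('test_id')],
      input: JSON.parse(row[col('input_payload')]), // parse JSON string into an object
    }));
}
```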

Step 2: Execute MSF Workflow for Each Test Case

  • For each test case, invoke the MSF production workflow with the input_payload
  • This may require:
    • Setting up the workflow to accept test inputs programmatically
    • Ensuring test execution doesn't affect production data (use test DHIS2 instance or mock endpoints)
    • Capturing the transformed payload before it's sent to DHIS2 (the events mapping JSON)
  • Handle errors gracefully (workflow failures should be logged, not crash entire test run)

Step 3: Capture Actual Output

  • Extract the DHIS2 event payload that would be sent (before actual DHIS2 API call)
  • This is the critical output needed for comparison
  • Convert output to JSON string format for storage

Step 4: Write Results Back to Google Sheet

  • Update the same row in Google Sheet with:
    • actual_output - the captured DHIS2 event payload
    • last_run - current timestamp
  • Ensure atomic updates (don't lose data if workflow fails mid-execution)
  • Handle Google Sheets API rate limits appropriately
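One common way to respect Sheets API rate limits is exponential backoff on retryable failures. A sketch (the delay parameters are arbitrary starting points, not tuned values):

```javascript
// Exponential backoff with a cap: the delay doubles each attempt (attempt starts at 0).
function backoffDelayMs(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

// Retry an async operation on failure, sleeping between attempts.
async function withRetry(operation, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (attempt === maxAttempts - 1) throw err; // out of retries, surface the error
      await new Promise((resolve) => setTimeout(resolve, backoffDelayMs(attempt)));
    }
  }
}
```

Batching row updates (one write per run rather than per test case) also reduces pressure on the quota.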

Workflow Configuration:

  • Workflow should accept parameters:
    • sheet_id - Google Sheet ID
    • sheet_name - Tab name (default: "Test Cases")
    • test_mode - Boolean to ensure safe execution
    • test_ids - Optional array to run specific tests only
  • Include comprehensive error handling and logging
  • Add retry logic for transient failures
  • Log execution summary (total tests run, successes, failures)
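Applying defaults to the workflow parameters could be as simple as the following (parameter names taken from the list above; defaults are suggestions):

```javascript
// Merge caller-supplied parameters with defaults for the test workflow.
function resolveParams(params = {}) {
  return {
    sheet_id: params.sheet_id,                      // required: Google Sheet ID
    sheet_name: params.sheet_name ?? 'Test Cases',  // default tab name
    test_mode: params.test_mode ?? true,            // default to safe execution
    test_ids: params.test_ids ?? null,              // null = run all tests
  };
}
```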

Workflow Considerations:

  • Should the workflow run all tests sequentially or support parallel execution?
  • How to handle workflows that depend on state or previous data?
  • Should test execution be isolated (separate state/credentials)?
  • How to prevent test data from reaching production DHIS2?

3. Integration & Testing

  • Verify workflow correctly reads from Google Sheet
  • Confirm workflow successfully executes MSF transformation logic
  • Validate actual outputs are written back to correct rows
  • Test assertion script correctly identifies matches and mismatches
  • Document any edge cases or limitations discovered

Implementation Tasks

Phase 1: Setup & POC

  • Select representative form for POC (standard + custom mappings)
  • Prepare metadata file and context for LLM
  • Design Google Sheet structure with Justin
  • Create Google Sheet template with assertion script
  • Use Claude to generate test inputs covering various scenarios
  • Use Claude to generate expected outputs from metadata file
  • Test the process, assess validity and time savings, and decide whether to proceed with Phase 2
  • Share outcomes with the MSF team for their input (and potential budget sign-off)

Phase 2: Workflow Development

  • Build OpenFn test workflow per requirements above
  • Set up test environment/credentials (non-production DHIS2 if needed)
  • Implement Google Sheets read/write operations
  • Implement MSF workflow invocation and output capture
  • Add error handling and logging
  • Test workflow with POC form

Phase 3: Validation

  • Run workflow with POC test cases
  • Execute assertion script to compare outputs
  • Document what contextual information LLM needs
  • Evaluate LLM performance on complex conditional logic
  • Assess time savings vs. manual testing

Technical Specifications

OpenFn Workflow Structure

job-1: Read test cases from Google Sheet
  ├─ Get all rows from sheet
  ├─ Filter for tests to run
  └─ Parse input payloads

job-2: Execute MSF workflow (for each test case)
  ├─ Invoke MSF transformation logic
  ├─ Capture DHIS2 event payload (actual output)
  └─ Store in state

job-3: Write results to Google Sheet
  ├─ Update rows with actual outputs
  ├─ Add timestamps
  └─ Log execution summary

Google Sheet Assertion Script (Apps Script Sketch)

function runAssertions() {
  // Sketch only: column indexes are illustrative and must match the final sheet layout.
  const sheet = SpreadsheetApp.getActiveSpreadsheet().getSheetByName('Test Cases');
  const rows = sheet.getDataRange().getValues();
  for (let i = 1; i < rows.length; i++) {  // skip the header row
    const expectedRaw = rows[i][4];        // expected_output column
    const actualRaw = rows[i][5];          // actual_output column
    if (!expectedRaw || !actualRaw) continue;
    try {
      // deepDiff: deep-compare helper (normalizes key order, returns a list of mismatches)
      const diffs = deepDiff(JSON.parse(expectedRaw), JSON.parse(actualRaw));
      sheet.getRange(i + 1, 7).setValue(diffs.length ? 'Fail' : 'Pass'); // test_status
      sheet.getRange(i + 1, 8).setValue(diffs.join('; '));               // differences
    } catch (e) {
      sheet.getRange(i + 1, 7).setValue('Fail');
      sheet.getRange(i + 1, 8).setValue('Invalid JSON: ' + e.message);
    }
  }
}

Acceptance Criteria

  • Google Sheet properly structured and assertion script functional
  • OpenFn workflow successfully reads inputs, executes MSF logic, and writes outputs
  • LLM successfully generates comprehensive input test data covering edge cases
  • LLM accurately produces expected outputs based solely on metadata file
  • Assertion script correctly identifies mismatches with detailed diff information
  • Process significantly reduces manual testing time compared to current approach
  • Approach is repeatable for additional forms

Notes

  • Assertion logic should be version controlled (export Apps Script to repo)
