Skip to content

Conversation

@fuzhaoyuan
Copy link
Contributor

This is duplicated from #11822 due to some core tests constraints

Fix #11808

Fix Clinical Attribute NA Filtering

Summary

This PR fixes a issue in NA filtering for clinical attributes introduced by PR #11603 . The fix ensures studies without an attribute definition are properly treated as NA while preserving the attribute level conflict resolution.

Problem

When PR #11603 added logic to handle attribute level conflicts (sample vs patient level), it inadvertently broke NA filtering. Samples from studies that completely lack an attribute definition were incorrectly excluded from NA filter results.

Solution

Core NA Filtering Logic

This PR implements a comprehensive three-tier conditional logic for handling clinical attribute filtering:

1. IF study has NO attribute definition:

All samples treated as NA via study_without_attribute CTE and UNION ALL

study_without_attribute AS (
    SELECT DISTINCT cancer_study_identifier FROM sample_derived
    WHERE cancer_study_identifier NOT IN (SELECT cancer_study_identifier FROM study_with_attribute)
)

2. ELSE IF study HAS the attribute at different level (sample vs patient):

→ Use data from the corresponding level only
→ Sample queries skip patient-level attributes, vice versa
→ Prevents attribute level conflicts (preserves #11603 fix)

study_attribute_levels AS (
    SELECT DISTINCT cancer_study_identifier FROM clinical_data_derived
    WHERE attribute_name = ... AND type='${type}'  -- Filters by correct level
)

3. ELSE IF study HAS the attribute at matching level:

→ LEFT JOIN to retrieve data
→ IF attribute_value IS NULL or empty → treated as NA
→ ELSE filter by actual value

Changes

Modified Files

ClickhouseStudyViewFilterMapper.xml

  • Added studyAttributeFilterCTEs and samplesFromStudiesWithoutAttributeAsNA SQL fragments
  • Refactored numericalClinicalDataCountFilter to use shared fragments and handle studies without attributes
  • Refactored categoricalClinicalDataCountFilter with same improvements

ClickhouseSampleMapperTest.java

  • Updated test expectations to reflect correct NA handling (restored pre-Fix Clinical Data NA Count Bug with Cross-Study Attribute Level Conflict Resolution #11603 behavior)
  • acc_tcga study samples (4 samples with no AGE/DEAD attributes) now correctly included in NA filters
  • Test changes:
    • NA AGE filter: 7 → 11 samples (7 explicit NA + 4 from study without attribute)
    • NA AGE + UNKNOWN: 8 → 12 samples
    • NA DEAD filter: 17 → 21 samples (17 explicit NA + 4 from study without attribute)

Signed-off-by: Zhaoyuan (Ryan) Fu <[email protected]>
Signed-off-by: Zhaoyuan (Ryan) Fu <[email protected]>
Signed-off-by: Zhaoyuan (Ryan) Fu <[email protected]>
@fuzhaoyuan fuzhaoyuan changed the title Master fix clinical na filter Fix Clinical Attribute NA Filtering Jan 6, 2026
@sonarqubecloud
Copy link

sonarqubecloud bot commented Jan 6, 2026

@fuzhaoyuan fuzhaoyuan self-assigned this Jan 6, 2026
@dippindots dippindots requested a review from alisman January 6, 2026 17:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NA filtering doesn't work in combined study setting

3 participants