Skip to content

Conversation

@fuzhaoyuan
Copy link
Contributor

@fuzhaoyuan fuzhaoyuan commented Nov 21, 2025

Brought over to #11901

Fix #11808

Fix Clinical Attribute NA Filtering

Summary

This PR fixes a issue in NA filtering for clinical attributes introduced by PR #11603 . The fix ensures studies without an attribute definition are properly treated as NA while preserving the attribute level conflict resolution.

Problem

When PR #11603 added logic to handle attribute level conflicts (sample vs patient level), it inadvertently broke NA filtering. Samples from studies that completely lack an attribute definition were incorrectly excluded from NA filter results.

Solution

Core NA Filtering Logic

This PR implements a comprehensive three-tier conditional logic for handling clinical attribute filtering:

1. IF study has NO attribute definition:

All samples treated as NA via study_without_attribute CTE and UNION ALL

study_without_attribute AS (
    SELECT DISTINCT cancer_study_identifier FROM sample_derived
    WHERE cancer_study_identifier NOT IN (SELECT cancer_study_identifier FROM study_with_attribute)
)

2. ELSE IF study HAS the attribute at different level (sample vs patient):

→ Use data from the corresponding level only
→ Sample queries skip patient-level attributes, vice versa
→ Prevents attribute level conflicts (preserves #11603 fix)

study_attribute_levels AS (
    SELECT DISTINCT cancer_study_identifier FROM clinical_data_derived
    WHERE attribute_name = ... AND type='${type}'  -- Filters by correct level
)

3. ELSE IF study HAS the attribute at matching level:

→ LEFT JOIN to retrieve data
→ IF attribute_value IS NULL or empty → treated as NA
→ ELSE filter by actual value

Changes

Modified Files

ClickhouseStudyViewFilterMapper.xml

  • Added studyAttributeFilterCTEs and samplesFromStudiesWithoutAttributeAsNA SQL fragments
  • Refactored numericalClinicalDataCountFilter to use shared fragments and handle studies without attributes
  • Refactored categoricalClinicalDataCountFilter with same improvements

ClickhouseSampleMapperTest.java

  • Updated test expectations to reflect correct NA handling (restored pre-Fix Clinical Data NA Count Bug with Cross-Study Attribute Level Conflict Resolution #11603 behavior)
  • acc_tcga study samples (4 samples with no AGE/DEAD attributes) now correctly included in NA filters
  • Test changes:
    • NA AGE filter: 7 → 11 samples (7 explicit NA + 4 from study without attribute)
    • NA AGE + UNKNOWN: 8 → 12 samples
    • NA DEAD filter: 17 → 21 samples (17 explicit NA + 4 from study without attribute)

@fuzhaoyuan fuzhaoyuan self-assigned this Nov 21, 2025
@fuzhaoyuan fuzhaoyuan requested a review from alisman November 21, 2025 19:37
@fuzhaoyuan fuzhaoyuan force-pushed the master-fix-clinical-na-filter branch from 7ea4224 to 6477f67 Compare November 24, 2025 22:27
@fuzhaoyuan fuzhaoyuan force-pushed the master-fix-clinical-na-filter branch 2 times, most recently from 343b4d6 to d1d62de Compare December 2, 2025 17:13
@alisman alisman force-pushed the master-fix-clinical-na-filter branch from d1d62de to 1e32960 Compare December 22, 2025 15:45
Signed-off-by: Zhaoyuan (Ryan) Fu <[email protected]>
Signed-off-by: Zhaoyuan (Ryan) Fu <[email protected]>
Signed-off-by: Zhaoyuan (Ryan) Fu <[email protected]>
@alisman alisman force-pushed the master-fix-clinical-na-filter branch from 1e32960 to d98b143 Compare January 5, 2026 21:19
@sonarqubecloud
Copy link

sonarqubecloud bot commented Jan 5, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

NA filtering doesn't work in combined study setting

1 participant