Fixes #11583 — Optimize molecular data multi-profile fetch for ClickHouse (reduce N+1 queries) #11840

immortal71 · 2025-12-05T23:34:39Z

This PR addresses the performance bottleneck when using ClickHouse in multi-profile molecular data fetches. Instead of repeated per-gene queries (N+1), the ClickHouse repository now fetches per-sample rows from the genetic_alteration_derived table and aggregates them into the legacy values CSV format expected by the service layer. The service now requests all entrez gene IDs in a single call.

Key changes:

New ClickHouse mapper & XML to query genetic_alteration_derived.
ClickHouse repository that aggregates per-sample rows into GeneMolecularAlteration.
Service change to call repository once with the full set of requested entrez IDs.
Unit tests for clickhouse aggregation.

Notes & next steps: Add entrez_gene_id to the derived table to avoid a join to gene during the ClickHouse query for better perf.

Copilot

Pull request overview

This PR optimizes molecular data fetching for ClickHouse by eliminating N+1 query patterns when retrieving data across multiple profiles. Instead of querying per gene, the implementation now fetches all requested genes in a single ClickHouse query and aggregates per-sample rows into the legacy CSV format expected by the service layer.

Key Changes

Introduced ClickHouse-specific repository that queries genetic_alteration_derived and aggregates results into GeneMolecularAlteration objects
Modified service layer to batch all entrez gene IDs into a single repository call
Added comprehensive unit tests for both service and repository aggregation logic

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
`MolecularDataServiceImpl.java`	Replaced per-gene streaming queries with single batched repository call
`MolecularDataMyBatisClickhouseRepository.java`	New repository implementation that aggregates per-sample ClickHouse rows into CSV format
`MolecularDataMapper.java`	New mapper interface for ClickHouse queries
`MolecularDataMapper.xml`	MyBatis XML query definition for fetching per-sample molecular data
`MolecularDataRowPerSample.java`	New model class representing individual sample-level molecular data rows
`MolecularDataServiceImplTest.java`	Added test verifying multi-profile molecular data fetch with single repository call
`MolecularDataMyBatisClickhouseRepositoryTest.java`	Added test verifying aggregation logic from per-sample rows to CSV format

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

...bioportal/legacy/persistence/mybatisclickhouse/MolecularDataMyBatisClickhouseRepository.java

immortal71 · 2025-12-07T12:05:15Z

@heuermh @jorvis @alisman , does this need any changes ?, I am little confused here can you guide me and review this pr ??

onursumer

Not sure if master is the best target for this PR. It might be better if we try optimizing the rc-7.0-clickhouse-only branch because eventually we will switch to clickhouse only implementation.

onursumer · 2025-12-10T22:15:53Z

...bioportal/legacy/persistence/mybatisclickhouse/MolecularDataMyBatisClickhouseRepository.java

+
+@Repository
+@ConditionalOnProperty(name = "clickhouse_mode", havingValue = "test")
+public class MolecularDataMyBatisClickhouseRepository implements MolecularDataRepository {


We should probably just modify MolecularDataMyBatisRepository instead of introducing another legacy repository class.

onursumer · 2025-12-10T22:17:34Z

src/main/java/org/cbioportal/legacy/model/MolecularDataRowPerSample.java

+
+import java.io.Serializable;
+
+public class MolecularDataRowPerSample implements Serializable {


Do we really need to introduce a new legacy model? Can't we achieve the same thing by just using a map and the existing GeneMolecularAlteration model?

onursumer · 2025-12-10T22:19:57Z

src/main/resources/org/cbioportal/persistence/mybatisclickhouse/MolecularDataMapper.xml

+<?xml version="1.0" encoding="UTF-8"?>
+<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
+
+<mapper namespace="org.cbioportal.legacy.persistence.mybatisclickhouse.MolecularDataMapper">


We can probably just modify the existing mapper src/main/resources/org/cbioportal/legacy/persistence/mybatis/MolecularDataMapper.xml instead of introducing a new one.

onursumer · 2025-12-10T22:22:13Z

src/main/java/org/cbioportal/legacy/persistence/mybatisclickhouse/MolecularDataMapper.java

+import org.apache.ibatis.annotations.Param;
+import org.cbioportal.legacy.model.MolecularDataRowPerSample;
+
+public interface MolecularDataMapper {


I would just modify the existing legacy mapper instead of introducing another legacy mapper

immortal71 · 2025-12-11T08:32:59Z

@onursumer
Thanks a lot for the detailed review and guidance. I understand that the preferred approach is to optimize the existing ClickHouse path rather than adding separate legacy classes and mappers. I’ll update this PR to: ~
--Target the rc-7.0-clickhouse-only branch instead of master.
-- Modify 'MolecularDataMyBatisRepository' and the existing MolecularDataMapper.xml instead of introducing new legacy repository/mapper classes and models.
I’ll push an updated version based on this direction and ping you for another review.

immortal71 · 2025-12-11T09:51:38Z

Refactoring Complete per Reviewer Feedback

I've updated this PR to address all the feedback from @onursumer

Changes Made:

Removed separate ClickHouse-specific classes:

Deleted MolecularDataMyBatisClickhouseRepository
Deleted MolecularDataRowPerSample model
Deleted ClickHouse-specific mapper files in mybatisclickhouse package

Integrated optimization into existing infrastructure:

Modified MolecularDataMyBatisRepository to include ClickHouse-specific logic using @Value("${clickhouse_mode:false}") instead of @ConditionalOnProperty
Added conditional path in getGeneMolecularAlterationsInMultipleMolecularProfiles() that switches based on clickhouseMode flag
Reuses existing GeneMolecularAlteration model throughout

Updated existing mapper:

Added getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse() method to existing MolecularDataMapper interface
Added conditional ClickHouse query in existing MolecularDataMapper.xml that queries genetic_alteration_derived table
Repository aggregates per-sample rows into CSV format as expected by service layer

Changed property value:

Changed from havingValue="test" to proper boolean flag check as suggested by Copilot

Updated PR target:

Changed base branch from master to rc-7.0-clickhouse-only as requested

The optimization still eliminates N+1 queries by fetching all genes in a single ClickHouse query, but now integrates cleanly into existing codebase without introducing parallel legacy classes.

immortal71 · 2025-12-11T09:59:09Z

@onursumer waiting for the review !! and further guidance

…ioPortal#11761 -- Created MolecularDataCountItem model to represent per-profile counts -- Added fetchMolecularDataCountsInMultipleMolecularProfiles method to service layer -- Implemented new /api/molecular-data/counts POST endpoint - Returns JSON array with count per molecular profile in single database query -- Leverages existing getMolecularDataInMultipleMolecularProfiles optimization from PR cBioPortal#11840 -- Added unit tests for service and controller layers -- Includes implementation plan document for reference

immortal71 · 2025-12-28T00:45:59Z

@zainasir @inodb @onursumer @sheridancbio I've pushed a fix (commit 97a0773) that resolves the circular dependency issue causing the build failures. The fix:

Replaced SampleService injection with SampleMapper to eliminate circular dependency
Added null safety checks and error handling
Ensures compatibility with both MySQL and ClickHouse environments

Could you please approve the pending workflows so the new builds can run with the fixed code? Thanks!

immortal71 · 2026-01-02T11:54:44Z

Friendly ping on this PR. All feedback from @onursumer has been implemented, circular dependency fixed in 97a0773, and tests pass locally. When you have time, could you please take another look ?

onursumer · 2026-01-07T22:27:27Z

Hi @immortal71, we recently merged rc-7.0-clickhouse-only into master. Can you change the base branch back to master and rebase your PR? Thanks!

immortal71 · 2026-01-08T13:39:20Z

@onursumer done !!

immortal71 · 2026-01-08T14:27:51Z

@onursumer Can you review it ?

onursumer · 2026-01-08T16:15:05Z

@immortal71 can you also rebase your branch on master (git rebase master) so that it includes latest changes from master?

immortal71 · 2026-01-09T06:07:44Z

@immortal71 can you also rebase your branch on master (git rebase master) so that it includes latest changes from master?

@onursumer done !!

immortal71 · 2026-01-09T17:31:32Z

@onursumer
Can you review this !!

onursumer · 2026-01-09T17:52:34Z

@immortal71 your branch is still 22 commits behind the master branch. Can you rebase it on the latest cBioPortal/cbioportal:master?

…gle query to reduce N+1 queries (ClickHouse perf)

…ry that aggregates per-sample rows into gene-profile values

…nto existing repository/mapper - Remove separate ClickHouse classes - Change conditional property to true

@onursumer

Per reviewer feedback from @onursumer: --> Removed separate ClickHouse-specific repository and mapper classes --> Moved optimization into existing MolecularDataMyBatisRepository - Updated existing MolecularDataMapper.xml with conditional ClickHouse query --> Changed @ConditionalOnProperty havingValue from 'test' to 'true' --> Reuses existing GeneMolecularAlteration model instead of new legacy classes The ClickHouse path queries genetic_alteration_derived table and aggregates per-sample rows into CSV format in the repository layer.

…ization - Replaced SampleService injection with SampleMapper to avoid circular dependency - Added null safety check for optional SampleMapper dependency - Added try-catch with fallback to standard method for database compatibility - Added SLF4J logger for debugging and error tracking - Ensures tests pass in both MySQL and ClickHouse environments Fixes cBioPortal#11583

immortal71 · 2026-01-10T04:42:07Z

@onursumer done!!

Copilot AI review requested due to automatic review settings December 5, 2025 23:34

Copilot started reviewing on behalf of immortal71 December 5, 2025 23:35 View session

Copilot AI reviewed Dec 5, 2025

View reviewed changes

...bioportal/legacy/persistence/mybatisclickhouse/MolecularDataMyBatisClickhouseRepository.java Outdated Show resolved Hide resolved

dippindots requested review from dippindots, fuzhaoyuan and onursumer and removed request for dippindots December 9, 2025 16:20

onursumer requested changes Dec 10, 2025

View reviewed changes

immortal71 changed the base branch from master to rc-7.0-clickhouse-only December 11, 2025 09:41

immortal71 mentioned this pull request Dec 11, 2025

molecular data meta endpoint needs to handle multiple molecular profiles #11761

Open

This was referenced Dec 11, 2025

Add molecular data counts endpoint for multiple profiles - Fixes #11761 #11856

Open

Refactor sluggish endpoints/sql for clickhouse-only "mode" #11582

Open

dippindots requested a review from onursumer January 3, 2026 00:19

immortal71 changed the base branch from rc-7.0-clickhouse-only to master January 8, 2026 13:32

immortal71 added 2 commits January 9, 2026 20:39

Optimized multi-profile molecular data by fetching all genes in a sin…

7cffb3b

…gle query to reduce N+1 queries (ClickHouse perf)

Added ClickHouse molecular data mapper, model and clickhouse reposito…

a7fd8d0

…ry that aggregates per-sample rows into gene-profile values

immortal71 added 3 commits January 9, 2026 20:39

Refactor ClickHouse optimization per reviewer feedback - Move logic i…

a0c1a9b

…nto existing repository/mapper - Remove separate ClickHouse classes - Change conditional property to true

immortal71 force-pushed the issue-11583-clickhouse-performance branch from 97a0773 to d4c610f Compare January 10, 2026 04:40


		import java.io.Serializable;

		public class MolecularDataRowPerSample implements Serializable {

Uh oh!

Fixes #11583 — Optimize molecular data multi-profile fetch for ClickHouse (reduce N+1 queries) #11840

Are you sure you want to change the base?

Fixes #11583 — Optimize molecular data multi-profile fetch for ClickHouse (reduce N+1 queries) #11840

Conversation

immortal71 commented Dec 5, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Key Changes

Reviewed changes

Uh oh!

Uh oh!

immortal71 commented Dec 7, 2025

Uh oh!

onursumer left a comment

Choose a reason for hiding this comment

Uh oh!

onursumer Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

onursumer Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

onursumer Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

onursumer Dec 10, 2025

Choose a reason for hiding this comment

Uh oh!

immortal71 commented Dec 11, 2025

Uh oh!

immortal71 commented Dec 11, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Refactoring Complete per Reviewer Feedback

Changes Made:

Uh oh!

immortal71 commented Dec 11, 2025

Uh oh!

immortal71 commented Dec 28, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

immortal71 commented Jan 2, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

onursumer commented Jan 7, 2026

Uh oh!

immortal71 commented Jan 8, 2026

Uh oh!

immortal71 commented Jan 8, 2026

Uh oh!

onursumer commented Jan 8, 2026

Uh oh!

immortal71 commented Jan 9, 2026

Uh oh!

immortal71 commented Jan 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

onursumer commented Jan 9, 2026

Uh oh!

immortal71 commented Jan 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

immortal71 commented Dec 11, 2025 •

edited

Loading

immortal71 commented Dec 28, 2025 •

edited

Loading

immortal71 commented Jan 2, 2026 •

edited

Loading

immortal71 commented Jan 9, 2026 •

edited

Loading