-
-
Notifications
You must be signed in to change notification settings - Fork 788
Add molecular data counts endpoint for multiple profiles - Fixes #11761 #11856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Add molecular data counts endpoint for multiple profiles - Fixes #11761 #11856
Conversation
…gle query to reduce N+1 queries (ClickHouse perf)
…ry that aggregates per-sample rows into gene-profile values
…nto existing repository/mapper - Remove separate ClickHouse classes - Change conditional property to true
Per reviewer feedback from @onursumer: --> Removed separate ClickHouse-specific repository and mapper classes --> Moved optimization into existing MolecularDataMyBatisRepository - Updated existing MolecularDataMapper.xml with conditional ClickHouse query --> Changed @ConditionalOnProperty havingValue from 'test' to 'true' --> Reuses existing GeneMolecularAlteration model instead of new legacy classes The ClickHouse path queries genetic_alteration_derived table and aggregates per-sample rows into CSV format in the repository layer.
…ioPortal#11761 -- Created MolecularDataCountItem model to represent per-profile counts -- Added fetchMolecularDataCountsInMultipleMolecularProfiles method to service layer -- Implemented new /api/molecular-data/counts POST endpoint - Returns JSON array with count per molecular profile in single database query -- Leverages existing getMolecularDataInMultipleMolecularProfiles optimization from PR cBioPortal#11840 -- Added unit tests for service and controller layers -- Includes implementation plan document for reference
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR implements a new endpoint /api/molecular-data/counts to retrieve molecular data counts for multiple molecular profiles in a single request, addressing performance issues with ClickHouse-backed databases where N sequential HTTP requests cause slow performance.
Key Changes:
- New
MolecularDataCountItemmodel class to represent per-profile counts - New service method
fetchMolecularDataCountsInMultipleMolecularProfilesthat leverages existing optimized queries - New POST endpoint at
/api/molecular-data/countswith proper security annotations - ClickHouse-optimized query implementation in the repository layer
- Unit tests for both service and controller layers
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
MolecularDataCountItem.java |
New model class representing molecular data count per profile |
MolecularDataService.java |
Added interface method for fetching counts across multiple profiles |
MolecularDataServiceImpl.java |
Implementation that groups molecular data by profile and counts entries; optimized single query approach |
MolecularDataController.java |
New POST endpoint with security annotations and Swagger documentation |
MolecularDataMyBatisRepository.java |
Added ClickHouse-specific implementation with conditional routing based on clickhouse_mode flag |
MolecularDataMapper.java |
Added new mapper method for ClickHouse query |
MolecularDataMapper.xml |
Added ClickHouse-optimized SQL query for genetic_alteration_derived table; contains unprofessional comment |
MolecularDataServiceImplTest.java |
Added unit tests but references undefined helper methods |
MolecularDataControllerTest.java |
Added controller test with proper mocking and assertions |
ISSUE-11761-IMPLEMENTATION-PLAN.md |
Implementation plan document describing the approach |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Fetch samples and build unique ID map | ||
| List<Sample> samples = sampleService.getSamplesByInternalIds(allInternalIds); |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The repository injects SampleService as a dependency (line 22-23), creating a circular dependency issue. The repository layer should not depend on the service layer. Instead, the transformation from internal IDs to samples should be handled in the service layer, with the repository only dealing with data persistence concerns. Consider refactoring to move the sample lookup logic to MolecularDataServiceImpl.
| private List<GeneMolecularAlteration> getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse( | ||
| Set<String> molecularProfileIds, List<Integer> entrezGeneIds, String projection) { | ||
|
|
||
| List<GeneMolecularAlteration> rawRows = | ||
| molecularDataMapper.getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse( | ||
| molecularProfileIds, entrezGeneIds); |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ClickHouse method signature doesn't accept the projection parameter but the caller passes it. The method getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse at line 96 has parameter String projection but the mapper method at line 100 doesn't accept a projection parameter. While the projection parameter is unused in the ClickHouse implementation (which always returns full data), this inconsistency could be confusing. Consider either removing the unused parameter from the private method signature or documenting why it's ignored for ClickHouse.
src/main/java/org/cbioportal/legacy/persistence/mybatis/MolecularDataMyBatisRepository.java
Show resolved
Hide resolved
src/main/java/org/cbioportal/legacy/web/MolecularDataController.java
Outdated
Show resolved
Hide resolved
| .thenReturn(createSampleIdMap()); | ||
|
|
||
| List<Sample> samples = createSamples(); | ||
| when(sampleService.getSamplesByInternalIds(any())).thenReturn(samples); |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test references undefined helper methods createSampleIdMap() and createSamples() that don't exist in this class or its parent class BaseServiceImplTest. These methods need to be implemented for the test to compile and run successfully.
| List<Integer> allInternalIds = new ArrayList<>(); | ||
| profileSamplesMap.values().forEach(s -> | ||
| Arrays.stream(s.getSplitSampleIds()) | ||
| .mapToInt(Integer::parseInt) |
Copilot
AI
Dec 11, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Potential uncaught 'java.lang.NumberFormatException'.
src/main/java/org/cbioportal/legacy/persistence/mybatis/MolecularDataMyBatisRepository.java
Outdated
Show resolved
Hide resolved
…r.java Co-authored-by: Copilot <[email protected]>
…larDataMyBatisRepository.java Co-authored-by: Copilot <[email protected]>
…larDataMyBatisRepository.java Co-authored-by: Copilot <[email protected]>
…em.java Co-authored-by: Copilot <[email protected]>
|
@alisman @dippindots hey I have make some changes according to copilot suggestion !! ,can you review this and let me know if there is any other thing i should do or if it good for merger !! |
Summary
Implements a new endpoint to retrieve molecular data counts for multiple molecular profiles in a single database call, eliminating the need for N HTTP requests when querying N profiles.
Problem
The existing
/api/molecular-profiles/{molecularProfileId}/molecular-data/fetch?projection=METAendpoint returns counts in HTTP headers but requires one call per profile. This causes:Solution
Created new
/api/molecular-data/countsPOST endpoint that:[{molecularProfileId: "...", count: 123}, ...]Changes
MolecularDataCountItem- represents count per molecular profilefetchMolecularDataCountsInMultipleMolecularProfilesmethodPOST /api/molecular-data/countsendpointTesting
Related Issues
Fixes #11761
Notes
getMolecularDataInMultipleMolecularProfilesmethod from PR Fixes #11583 — Optimize molecular data multi-profile fetch for ClickHouse (reduce N+1 queries) #11840