
@immortal71

Summary

Implements a new endpoint to retrieve molecular data counts for multiple molecular profiles in a single database call, eliminating the need for N HTTP requests when querying N profiles.

Problem

The existing /api/molecular-profiles/{molecularProfileId}/molecular-data/fetch?projection=META endpoint returns counts in HTTP headers but requires one call per profile. This causes:

  • N HTTP requests for N profiles
  • N database queries with joins (slow on ClickHouse)
  • Poor performance when dealing with multiple profiles

Solution

Created a new POST /api/molecular-data/counts endpoint that returns a JSON array with the count per molecular profile, using a single database query.

Changes

  • New Model: MolecularDataCountItem - represents count per molecular profile
  • Service Layer: Added fetchMolecularDataCountsInMultipleMolecularProfiles method
  • Controller: New POST /api/molecular-data/counts endpoint
  • Tests: Unit tests for both service and controller layers
  • Documentation: Implementation plan document included
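
The grouping step behind the new endpoint can be sketched as follows. This is a minimal illustration, not the code in this PR: `MolecularDataRow` is a hypothetical stand-in for a fetched molecular data entry, and the fields of `MolecularDataCountItem` shown here are assumed, since the description does not list them.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class MolecularDataCountSketch {

    // Hypothetical stand-in for one fetched molecular data entry.
    record MolecularDataRow(String molecularProfileId, String sampleId, int entrezGeneId) {}

    // Assumed shape of the per-profile count; the real model class may differ.
    record MolecularDataCountItem(String molecularProfileId, long count) {}

    // Groups rows fetched for all profiles at once and counts them per profile.
    static List<MolecularDataCountItem> countPerProfile(List<MolecularDataRow> rows) {
        Map<String, Long> counts = rows.stream()
            .collect(Collectors.groupingBy(
                MolecularDataRow::molecularProfileId,
                LinkedHashMap::new,          // keep first-seen profile order
                Collectors.counting()));
        return counts.entrySet().stream()
            .map(e -> new MolecularDataCountItem(e.getKey(), e.getValue()))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<MolecularDataRow> rows = List.of(
            new MolecularDataRow("profile_a", "s1", 672),
            new MolecularDataRow("profile_a", "s2", 672),
            new MolecularDataRow("profile_b", "s1", 675));
        countPerProfile(rows).forEach(System.out::println);
    }
}
```

One request body listing N profile IDs then yields N count items, instead of N separate requests that each return a count in a header.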

Testing

  • Added unit tests for service layer (MolecularDataServiceImplTest)
  • Added controller test (MolecularDataControllerTest)
  • Tests verify correct grouping and counting per profile
  • No compilation errors

Related Issues

Fixes #11761

Notes

…gle query to reduce N+1 queries (ClickHouse perf)
…ry that aggregates per-sample rows into gene-profile values
…nto existing repository/mapper - Remove separate ClickHouse classes - Change conditional property to true
Per reviewer feedback from @onursumer:
- Removed separate ClickHouse-specific repository and mapper classes
- Moved optimization into existing MolecularDataMyBatisRepository
  - Updated existing MolecularDataMapper.xml with conditional ClickHouse query
- Changed @ConditionalOnProperty havingValue from 'test' to 'true'
- Reuses existing GeneMolecularAlteration model instead of new legacy classes

The ClickHouse path queries genetic_alteration_derived table and aggregates
per-sample rows into CSV format in the repository layer.
…ioPortal#11761

- Created MolecularDataCountItem model to represent per-profile counts
- Added fetchMolecularDataCountsInMultipleMolecularProfiles method to service layer
- Implemented new /api/molecular-data/counts POST endpoint
  - Returns JSON array with count per molecular profile in single database query
- Leverages existing getMolecularDataInMultipleMolecularProfiles optimization from PR cBioPortal#11840
- Added unit tests for service and controller layers
- Includes implementation plan document for reference
Copilot AI review requested due to automatic review settings December 11, 2025 12:21

Copilot AI left a comment

Pull request overview

This PR implements a new endpoint /api/molecular-data/counts to retrieve molecular data counts for multiple molecular profiles in a single request, addressing performance issues with ClickHouse-backed databases where N sequential HTTP requests cause slow performance.

Key Changes:

  • New MolecularDataCountItem model class to represent per-profile counts
  • New service method fetchMolecularDataCountsInMultipleMolecularProfiles that leverages existing optimized queries
  • New POST endpoint at /api/molecular-data/counts with proper security annotations
  • ClickHouse-optimized query implementation in the repository layer
  • Unit tests for both service and controller layers

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| MolecularDataCountItem.java | New model class representing molecular data count per profile |
| MolecularDataService.java | Added interface method for fetching counts across multiple profiles |
| MolecularDataServiceImpl.java | Implementation that groups molecular data by profile and counts entries; optimized single query approach |
| MolecularDataController.java | New POST endpoint with security annotations and Swagger documentation |
| MolecularDataMyBatisRepository.java | Added ClickHouse-specific implementation with conditional routing based on clickhouse_mode flag |
| MolecularDataMapper.java | Added new mapper method for ClickHouse query |
| MolecularDataMapper.xml | Added ClickHouse-optimized SQL query for genetic_alteration_derived table; contains unprofessional comment |
| MolecularDataServiceImplTest.java | Added unit tests but references undefined helper methods |
| MolecularDataControllerTest.java | Added controller test with proper mocking and assertions |
| ISSUE-11761-IMPLEMENTATION-PLAN.md | Implementation plan document describing the approach |

Comment on lines 122 to 123
// Fetch samples and build unique ID map
List<Sample> samples = sampleService.getSamplesByInternalIds(allInternalIds);
Copilot AI Dec 11, 2025

The repository injects SampleService as a dependency (lines 22-23), creating a circular dependency issue. The repository layer should not depend on the service layer. Instead, the transformation from internal IDs to samples should be handled in the service layer, with the repository only dealing with data persistence concerns. Consider refactoring to move the sample lookup logic to MolecularDataServiceImpl.
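
A rough sketch of the suggested layering, with the sample lookup in the service and the repository returning only raw rows. Everything here is a simplified stand-in for illustration: `RawAlteration`, `fetchRaw`, `stableSampleIdsPerProfile`, and the `Sample` record are hypothetical, and only `getSamplesByInternalIds` is a name taken from the snippet above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LayeringSketch {

    // Hypothetical, simplified stand-ins for the real domain types.
    record Sample(int internalId, String stableId) {}
    record RawAlteration(String molecularProfileId, List<Integer> sampleInternalIds) {}

    // Repository: persistence only, no knowledge of SampleService.
    interface MolecularDataRepository {
        List<RawAlteration> fetchRaw(List<String> molecularProfileIds);
    }

    interface SampleService {
        List<Sample> getSamplesByInternalIds(List<Integer> internalIds);
    }

    // Service: owns the internal-ID-to-sample lookup.
    static class MolecularDataService {
        private final MolecularDataRepository repository;
        private final SampleService sampleService;

        MolecularDataService(MolecularDataRepository repository, SampleService sampleService) {
            this.repository = repository;
            this.sampleService = sampleService;
        }

        // Assumes one raw row per profile, to keep the sketch short.
        Map<String, List<String>> stableSampleIdsPerProfile(List<String> molecularProfileIds) {
            List<RawAlteration> raw = repository.fetchRaw(molecularProfileIds);
            List<Integer> allInternalIds = raw.stream()
                .flatMap(r -> r.sampleInternalIds().stream())
                .distinct()
                .collect(Collectors.toList());
            Map<Integer, String> stableIdByInternalId =
                sampleService.getSamplesByInternalIds(allInternalIds).stream()
                    .collect(Collectors.toMap(Sample::internalId, Sample::stableId));
            return raw.stream().collect(Collectors.toMap(
                RawAlteration::molecularProfileId,
                r -> r.sampleInternalIds().stream()
                    .map(stableIdByInternalId::get)
                    .collect(Collectors.toList())));
        }
    }
}
```

With this split the dependency points in one direction only (service to repository), so the repository can be tested without mocking any service.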

Copilot uses AI. Check for mistakes.
Comment on lines +96 to +101
private List<GeneMolecularAlteration> getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse(
        Set<String> molecularProfileIds, List<Integer> entrezGeneIds, String projection) {

    List<GeneMolecularAlteration> rawRows =
        molecularDataMapper.getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse(
            molecularProfileIds, entrezGeneIds);
Copilot AI Dec 11, 2025

The ClickHouse method signature doesn't accept the projection parameter but the caller passes it. The method getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse at line 96 has parameter String projection but the mapper method at line 100 doesn't accept a projection parameter. While the projection parameter is unused in the ClickHouse implementation (which always returns full data), this inconsistency could be confusing. Consider either removing the unused parameter from the private method signature or documenting why it's ignored for ClickHouse.

Comment on lines +437 to +440
.thenReturn(createSampleIdMap());

List<Sample> samples = createSamples();
when(sampleService.getSamplesByInternalIds(any())).thenReturn(samples);
Copilot AI Dec 11, 2025

The test references undefined helper methods createSampleIdMap() and createSamples() that don't exist in this class or its parent class BaseServiceImplTest. These methods need to be implemented for the test to compile and run successfully.

List<Integer> allInternalIds = new ArrayList<>();
profileSamplesMap.values().forEach(s ->
    Arrays.stream(s.getSplitSampleIds())
        .mapToInt(Integer::parseInt)
Copilot AI Dec 11, 2025

Potential uncaught 'java.lang.NumberFormatException'.
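
One way to address this is to parse the split IDs in a small helper that handles malformed tokens explicitly. The sketch below is only an illustration: `parseInternalIds` is a hypothetical helper, and whether blank tokens should be skipped and non-numeric ones rejected (as done here) is a policy choice for the author.

```java
import java.util.ArrayList;
import java.util.List;

public class SampleIdParsingSketch {

    // Parses internal sample IDs, skipping blank tokens and failing with a
    // descriptive message on anything that is not an integer.
    static List<Integer> parseInternalIds(String[] splitSampleIds) {
        List<Integer> ids = new ArrayList<>();
        for (String raw : splitSampleIds) {
            String trimmed = raw == null ? "" : raw.trim();
            if (trimmed.isEmpty()) {
                continue; // assumed policy: ignore empty tokens, e.g. from a trailing comma
            }
            try {
                ids.add(Integer.parseInt(trimmed));
            } catch (NumberFormatException e) {
                throw new IllegalArgumentException(
                    "Non-numeric internal sample ID: '" + raw + "'", e);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(parseInternalIds(new String[] {"101", " 102 ", ""}));
    }
}
```

Wrapping the failure in an `IllegalArgumentException` keeps the offending token in the message, which is easier to trace than a bare `NumberFormatException` thrown from inside a stream.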

@immortal71
Author

immortal71 commented Dec 27, 2025

@alisman @dippindots Hey, I have made some changes according to Copilot's suggestions. Can you review this and let me know if there is anything else I should do, or if it is good to merge?

Development

Successfully merging this pull request may close these issues.

molecular data meta endpoint needs to handle multiple molecular profiles

1 participant