-
-
Notifications
You must be signed in to change notification settings - Fork 789
Add molecular data counts endpoint for multiple profiles - Fixes #11761 #11856
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
178fa63
92f117c
60bb0dd
9266690
6580004
72d95b0
3b5a82f
f118ba1
b0707a2
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| # Issue #11761: Molecular Data Meta Endpoint for Multiple Profiles | ||
|
|
||
| ## Problem Analysis | ||
| Currently, when the frontend needs to get counts for multiple molecular profiles, it must: | ||
| 1. Call `/api/molecular-profiles/{profileId}/molecular-data/fetch?projection=META` for EACH profile | ||
| 2. This causes N database hits with joins, which is slow on ClickHouse | ||
|
|
||
| The existing `/api/molecular-data/fetch?projection=META` endpoint returns ONE total count across all profiles, but doesn't provide per-profile breakdown. | ||
|
|
||
| ## Proposed Solution | ||
| Create a new endpoint that: | ||
| - Accepts multiple molecular profile IDs | ||
| - Returns per-profile counts in a single database query | ||
| - Returns JSON response (not just HTTP headers) with profile-specific counts | ||
|
|
||
| ## Implementation Plan | ||
|
|
||
| ### 1. Backend Changes | ||
|
|
||
| #### New Model Class: `MolecularDataCountItem.java` | ||
| ```java | ||
| package org.cbioportal.legacy.model; | ||
|
|
||
| public class MolecularDataCountItem { | ||
| private String molecularProfileId; | ||
| private Integer count; | ||
|
|
||
| // getters/setters | ||
| } | ||
| ``` | ||
|
|
||
| #### Update Service Interface: `MolecularDataService.java` | ||
| Add method: | ||
| ```java | ||
| List<MolecularDataCountItem> fetchMolecularDataCountsInMultipleMolecularProfiles( | ||
| List<String> molecularProfileIds, | ||
| List<String> sampleIds, | ||
| List<Integer> entrezGeneIds); | ||
| ``` | ||
|
|
||
| #### Update Service Implementation: `MolecularDataServiceImpl.java` | ||
| Implement the new method by: | ||
| 1. Querying all profiles at once | ||
| 2. Grouping results by molecularProfileId | ||
| 3. Counting entries per profile | ||
|
|
||
| #### Update Controller: `MolecularDataController.java` | ||
| Add new endpoint: | ||
| ```java | ||
| @PostMapping("/molecular-data/counts") | ||
| public ResponseEntity<List<MolecularDataCountItem>> fetchMolecularDataCountsInMultipleMolecularProfiles( | ||
| @RequestBody MolecularDataMultipleStudyFilter filter); | ||
| ``` | ||
|
|
||
| ### 2. Frontend Changes (if needed) | ||
| Update `ResultsViewPageStore.ts` to use the new endpoint instead of making N calls. | ||
|
|
||
| ## Database Query Optimization | ||
| The service will use existing `getMolecularDataInMultipleMolecularProfiles` but with projection="ID" to minimize data transfer, then group and count in Java. | ||
|
|
||
| For ClickHouse optimization, we can leverage the already-optimized query from PR #11840. | ||
|
|
||
| ## Testing Plan | ||
| 1. Unit tests for service method | ||
| 2. Integration test for controller endpoint | ||
| 3. Verify single database query is made (not N queries) | ||
| 4. Compare performance with old approach | ||
|
|
||
| ## Files to Modify | ||
| - `src/main/java/org/cbioportal/legacy/model/MolecularDataCountItem.java` (NEW) | ||
| - `src/main/java/org/cbioportal/legacy/service/MolecularDataService.java` | ||
| - `src/main/java/org/cbioportal/legacy/service/impl/MolecularDataServiceImpl.java` | ||
| - `src/main/java/org/cbioportal/legacy/web/MolecularDataController.java` | ||
| - `src/test/java/org/cbioportal/legacy/service/impl/MolecularDataServiceImplTest.java` | ||
| - `src/test/java/org/cbioportal/legacy/web/MolecularDataControllerTest.java` | ||
|
|
||
| ## Questions/Decisions | ||
| 1. Should we return counts per profile, or counts per profile+gene combination? | ||
| - **Answer**: Per profile only (simpler, matches the issue description) | ||
|
|
||
| 2. Should this endpoint support both sampleIds and sampleListId like other endpoints? | ||
| - **Answer**: Yes, for consistency | ||
|
|
||
| 3. Should we add this to the existing `/molecular-data/fetch` endpoint or create new endpoint? | ||
| - **Answer**: New endpoint `/molecular-data/counts` for clarity | ||
|
|
||
| --- | ||
|
|
||
| **Please review this plan and confirm before I proceed with implementation!** |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| package org.cbioportal.legacy.model; | ||
|
|
||
| import java.io.Serializable; | ||
|
|
||
| /** | ||
| * Represents a count of molecular data items for a specific molecular profile. | ||
| * <p> | ||
| * This model is typically used in API responses to convey the number of molecular data records | ||
| * associated with a given molecular profile in the cBioPortal legacy API. | ||
| * </p> | ||
| * <p> | ||
| * Fields: | ||
| * <ul> | ||
| * <li><b>molecularProfileId</b>: The unique identifier of the molecular profile.</li> | ||
| * <li><b>count</b>: The number of molecular data items associated with the profile.</li> | ||
| * </ul> | ||
| * </p> | ||
| */ | ||
| public class MolecularDataCountItem implements Serializable { | ||
|
|
||
| private String molecularProfileId; | ||
| private Integer count; | ||
|
|
||
| public String getMolecularProfileId() { | ||
| return molecularProfileId; | ||
| } | ||
|
|
||
| public void setMolecularProfileId(String molecularProfileId) { | ||
| this.molecularProfileId = molecularProfileId; | ||
| } | ||
|
|
||
| public Integer getCount() { | ||
| return count; | ||
| } | ||
|
|
||
| public void setCount(Integer count) { | ||
| this.count = count; | ||
| } | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,12 +9,21 @@ | |
| import org.cbioportal.legacy.model.MolecularProfileSamples; | ||
| import org.cbioportal.legacy.persistence.MolecularDataRepository; | ||
| import org.springframework.beans.factory.annotation.Autowired; | ||
| import org.springframework.beans.factory.annotation.Value; | ||
| import org.springframework.stereotype.Repository; | ||
| import org.cbioportal.legacy.model.Sample; | ||
| import org.cbioportal.legacy.service.SampleService; | ||
|
|
||
| @Repository | ||
| public class MolecularDataMyBatisRepository implements MolecularDataRepository { | ||
|
|
||
| @Autowired private MolecularDataMapper molecularDataMapper; | ||
|
|
||
| @Autowired(required = false) | ||
| private SampleService sampleService; | ||
|
|
||
| @Value("${clickhouse_mode:false}") | ||
| private boolean clickhouseMode; | ||
|
|
||
| @Override | ||
| public MolecularProfileSamples getCommaSeparatedSampleIdsOfMolecularProfile( | ||
|
|
@@ -71,9 +80,117 @@ public Iterable<GeneMolecularAlteration> getGeneMolecularAlterationsIterableFast | |
| public List<GeneMolecularAlteration> getGeneMolecularAlterationsInMultipleMolecularProfiles( | ||
| Set<String> molecularProfileIds, List<Integer> entrezGeneIds, String projection) { | ||
|
|
||
| if (clickhouseMode) { | ||
| return getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse( | ||
| molecularProfileIds, entrezGeneIds, projection); | ||
| } | ||
|
|
||
| return molecularDataMapper.getGeneMolecularAlterationsInMultipleMolecularProfiles( | ||
| molecularProfileIds, entrezGeneIds, projection); | ||
| } | ||
|
|
||
| /** | ||
| * ClickHouse-optimized implementation that queries genetic_alteration_derived table | ||
| * and aggregates per-sample rows into CSV format expected by the service layer. | ||
| */ | ||
| private List<GeneMolecularAlteration> getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse( | ||
| Set<String> molecularProfileIds, List<Integer> entrezGeneIds, String projection) { | ||
|
|
||
| List<GeneMolecularAlteration> rawRows = | ||
| molecularDataMapper.getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse( | ||
| molecularProfileIds, entrezGeneIds); | ||
|
Comment on lines
+96
to
+101
|
||
|
|
||
| if (rawRows.isEmpty()) { | ||
| return Collections.emptyList(); | ||
| } | ||
|
|
||
| // Group by profile | ||
| Map<String, List<GeneMolecularAlteration>> rowsByProfile = rawRows.stream() | ||
| .collect(Collectors.groupingBy(GeneMolecularAlteration::getMolecularProfileId)); | ||
|
|
||
| // Get sample order for each profile | ||
| Map<String, MolecularProfileSamples> profileSamplesMap = | ||
| commaSeparatedSampleIdsOfMolecularProfilesMap(molecularProfileIds); | ||
|
|
||
| // Collect all internal IDs for sample lookup | ||
| List<Integer> allInternalIds = new ArrayList<>(); | ||
| profileSamplesMap.values().forEach(s -> | ||
| Arrays.stream(s.getSplitSampleIds()) | ||
| .mapToInt(Integer::parseInt) | ||
|
||
| .forEach(allInternalIds::add)); | ||
|
|
||
| // Fetch samples and build unique ID map | ||
immortal71 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| if (sampleService == null) { | ||
| throw new IllegalStateException("SampleService is required in ClickHouse mode but is not available."); | ||
| } | ||
| List<Sample> samples = sampleService.getSamplesByInternalIds(allInternalIds); | ||
| Map<Integer, Sample> internalIdToSample = samples.stream() | ||
| .collect(Collectors.toMap(Sample::getInternalId, Function.identity())); | ||
|
|
||
| List<GeneMolecularAlteration> results = new ArrayList<>(); | ||
|
|
||
| for (String profileId : profileSamplesMap.keySet()) { | ||
| MolecularProfileSamples mps = profileSamplesMap.get(profileId); | ||
| String[] sampleIds = mps.getSplitSampleIds(); | ||
|
|
||
| // Build sample unique ID order | ||
| List<String> sampleUniqueIdOrder = new ArrayList<>(sampleIds.length); | ||
| for (String internalIdStr : sampleIds) { | ||
| try { | ||
| int internalId = Integer.parseInt(internalIdStr); | ||
| Sample s = internalIdToSample.get(internalId); | ||
| if (s != null) { | ||
| sampleUniqueIdOrder.add(s.getCancerStudyIdentifier() + "_" + s.getStableId()); | ||
| } else { | ||
| sampleUniqueIdOrder.add(null); | ||
| } | ||
| } catch (NumberFormatException e) { | ||
| // Skip invalid internalIdStr, add null to maintain order | ||
| sampleUniqueIdOrder.add(null); | ||
| } | ||
| } | ||
|
|
||
| List<GeneMolecularAlteration> profileRows = rowsByProfile.get(profileId); | ||
| if (profileRows == null) continue; | ||
|
|
||
| // Group rows by gene | ||
| Map<Integer, List<GeneMolecularAlteration>> geneRows = profileRows.stream() | ||
| .collect(Collectors.groupingBy(GeneMolecularAlteration::getEntrezGeneId)); | ||
|
|
||
| for (Map.Entry<Integer, List<GeneMolecularAlteration>> entry : geneRows.entrySet()) { | ||
| Integer geneId = entry.getKey(); | ||
| List<GeneMolecularAlteration> geneAlterations = entry.getValue(); | ||
|
|
||
| // Build map of sampleUniqueId -> value | ||
| Map<String, String> sampleValueMap = new HashMap<>(); | ||
| for (GeneMolecularAlteration alt : geneAlterations) { | ||
| // Values field contains sampleUniqueId|value for ClickHouse rows | ||
| String[] parts = alt.getValues().split("\\|", 2); | ||
| if (parts.length == 2) { | ||
| sampleValueMap.put(parts[0], parts[1]); | ||
| } | ||
| } | ||
|
|
||
| // Build CSV values string in sample order | ||
| StringBuilder sb = new StringBuilder(); | ||
| for (int i = 0; i < sampleUniqueIdOrder.size(); i++) { | ||
| if (i > 0) sb.append(','); | ||
| String sampleUniqueId = sampleUniqueIdOrder.get(i); | ||
| if (sampleUniqueId != null && sampleValueMap.containsKey(sampleUniqueId)) { | ||
| sb.append(sampleValueMap.get(sampleUniqueId)); | ||
| } | ||
| } | ||
|
|
||
| GeneMolecularAlteration alteration = new GeneMolecularAlteration(); | ||
| alteration.setEntrezGeneId(geneId); | ||
| alteration.setMolecularProfileId(profileId); | ||
| alteration.setValues(sb.toString()); | ||
| results.add(alteration); | ||
| } | ||
| } | ||
|
|
||
| return results; | ||
| } | ||
|
|
||
| @Override | ||
| public List<GenesetMolecularAlteration> getGenesetMolecularAlterations( | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.