89 changes: 89 additions & 0 deletions ISSUE-11761-IMPLEMENTATION-PLAN.md
@@ -0,0 +1,89 @@
# Issue #11761: Molecular Data Meta Endpoint for Multiple Profiles

## Problem Analysis
Currently, when the frontend needs counts for multiple molecular profiles, it must call `/api/molecular-profiles/{profileId}/molecular-data/fetch?projection=META` once for EACH profile. For N profiles this causes N database hits with joins, which is slow on ClickHouse.

The existing `/api/molecular-data/fetch?projection=META` endpoint returns ONE total count across all profiles, but does not provide a per-profile breakdown.

## Proposed Solution
Create a new endpoint that:
- Accepts multiple molecular profile IDs
- Returns per-profile counts in a single database query
- Returns a JSON response body (not just HTTP headers) with profile-specific counts

## Implementation Plan

### 1. Backend Changes

#### New Model Class: `MolecularDataCountItem.java`
```java
package org.cbioportal.legacy.model;

public class MolecularDataCountItem {
private String molecularProfileId;
private Integer count;

// getters/setters
}
```

#### Update Service Interface: `MolecularDataService.java`
Add method:
```java
List<MolecularDataCountItem> fetchMolecularDataCountsInMultipleMolecularProfiles(
List<String> molecularProfileIds,
List<String> sampleIds,
List<Integer> entrezGeneIds);
```

#### Update Service Implementation: `MolecularDataServiceImpl.java`
Implement the new method by:
1. Querying all profiles at once
2. Grouping results by molecularProfileId
3. Counting entries per profile
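
The three steps above can be sketched in isolation as follows. This is a minimal illustration, not the service code: `Row` and `CountItem` are hypothetical stand-ins for the fetched molecular data rows and for `MolecularDataCountItem`, and the rows are assumed to have already been loaded by one query covering all profiles.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

final class ProfileCountSketch {

  // Hypothetical stand-in for one fetched molecular data row.
  record Row(String molecularProfileId, int entrezGeneId, String sampleId) {}

  // Hypothetical stand-in for MolecularDataCountItem.
  record CountItem(String molecularProfileId, long count) {}

  static List<CountItem> countPerProfile(List<String> requestedProfileIds, List<Row> rows) {
    // Steps 2 and 3: group the already-fetched rows by profile ID and count each group.
    Map<String, Long> counts =
        rows.stream()
            .collect(Collectors.groupingBy(Row::molecularProfileId, Collectors.counting()));

    // Emit one item per requested profile, in request order, with 0 for profiles
    // that returned no rows. Duplicate IDs in the request are collapsed.
    List<CountItem> result = new ArrayList<>();
    for (String profileId : new LinkedHashSet<>(requestedProfileIds)) {
      result.add(new CountItem(profileId, counts.getOrDefault(profileId, 0L)));
    }
    return result;
  }
}
```

Iterating over the requested IDs rather than over the grouped map keeps profiles with zero rows in the response, so the frontend does not need to distinguish "missing" from "empty".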

#### Update Controller: `MolecularDataController.java`
Add new endpoint:
```java
@PostMapping("/molecular-data/counts")
public ResponseEntity<List<MolecularDataCountItem>> fetchMolecularDataCountsInMultipleMolecularProfiles(
@RequestBody MolecularDataMultipleStudyFilter filter);
```

### 2. Frontend Changes (if needed)
Update `ResultsViewPageStore.ts` to use the new endpoint instead of making N calls.

## Database Query Optimization
The service will reuse the existing `getMolecularDataInMultipleMolecularProfiles` method with `projection="ID"` to minimize data transfer, then group and count the results in Java.

For ClickHouse optimization, we can leverage the already-optimized query from PR #11840.

## Testing Plan
1. Unit tests for service method
2. Integration test for controller endpoint
3. Verify single database query is made (not N queries)
4. Compare performance with old approach
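
One way to approach item 3 is a hand-written test double that counts how often it is queried. The sketch below assumes a trimmed-down, hypothetical `RowRepository` interface rather than the real `MolecularDataRepository`; in the actual tests a mocking library could play the same role.

```java
import java.util.ArrayList;
import java.util.List;

final class SingleQuerySketch {

  // Hypothetical, simplified repository contract used only for this illustration.
  interface RowRepository {
    List<String> findRows(List<String> profileIds, List<Integer> entrezGeneIds);
  }

  // Test double that records how many times it was queried.
  static final class CountingRepository implements RowRepository {
    int calls = 0;

    @Override
    public List<String> findRows(List<String> profileIds, List<Integer> entrezGeneIds) {
      calls++;
      List<String> rows = new ArrayList<>();
      for (String profileId : profileIds) {
        for (Integer geneId : entrezGeneIds) {
          rows.add(profileId + ":" + geneId);
        }
      }
      return rows;
    }
  }

  // Old shape: one repository call per profile (N calls for N profiles).
  static List<String> fetchPerProfile(
      RowRepository repo, List<String> profileIds, List<Integer> geneIds) {
    List<String> rows = new ArrayList<>();
    for (String profileId : profileIds) {
      rows.addAll(repo.findRows(List.of(profileId), geneIds));
    }
    return rows;
  }

  // New shape: a single repository call covering every profile.
  static List<String> fetchBatched(
      RowRepository repo, List<String> profileIds, List<Integer> geneIds) {
    return repo.findRows(profileIds, geneIds);
  }
}
```

A test can then assert that `calls` equals the number of profiles for the old shape and exactly 1 for the batched shape, while both return the same rows.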

## Files to Modify
- `src/main/java/org/cbioportal/legacy/model/MolecularDataCountItem.java` (NEW)
- `src/main/java/org/cbioportal/legacy/service/MolecularDataService.java`
- `src/main/java/org/cbioportal/legacy/service/impl/MolecularDataServiceImpl.java`
- `src/main/java/org/cbioportal/legacy/web/MolecularDataController.java`
- `src/test/java/org/cbioportal/legacy/service/impl/MolecularDataServiceImplTest.java`
- `src/test/java/org/cbioportal/legacy/web/MolecularDataControllerTest.java`

## Questions/Decisions
1. Should we return counts per profile, or counts per profile+gene combination?
- **Answer**: Per profile only (simpler, matches the issue description)

2. Should this endpoint support both sampleIds and sampleListId like other endpoints?
- **Answer**: Yes, for consistency

3. Should we add this to the existing `/molecular-data/fetch` endpoint or create new endpoint?
- **Answer**: New endpoint `/molecular-data/counts` for clarity

---

**Please review this plan and confirm before I proceed with implementation!**
@@ -0,0 +1,39 @@
package org.cbioportal.legacy.model;

import java.io.Serializable;

/**
* Represents a count of molecular data items for a specific molecular profile.
* <p>
* This model is typically used in API responses to convey the number of molecular data records
* associated with a given molecular profile in the cBioPortal legacy API.
* </p>
* <p>
* Fields:
* <ul>
* <li><b>molecularProfileId</b>: The unique identifier of the molecular profile.</li>
* <li><b>count</b>: The number of molecular data items associated with the profile.</li>
* </ul>
* </p>
*/
public class MolecularDataCountItem implements Serializable {

private String molecularProfileId;
private Integer count;

public String getMolecularProfileId() {
return molecularProfileId;
}

public void setMolecularProfileId(String molecularProfileId) {
this.molecularProfileId = molecularProfileId;
}

public Integer getCount() {
return count;
}

public void setCount(Integer count) {
this.count = count;
}
}
@@ -26,6 +26,9 @@ Cursor<GeneMolecularAlteration> getGeneMolecularAlterationsIter(
List<GeneMolecularAlteration> getGeneMolecularAlterationsInMultipleMolecularProfiles(
Set<String> molecularProfileIds, List<Integer> entrezGeneIds, String projection);

List<GeneMolecularAlteration> getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse(
Set<String> molecularProfileIds, List<Integer> entrezGeneIds);

List<GenesetMolecularAlteration> getGenesetMolecularAlterations(
String molecularProfileId, List<String> genesetIds, String projection);

@@ -9,12 +9,21 @@
import org.cbioportal.legacy.model.MolecularProfileSamples;
import org.cbioportal.legacy.persistence.MolecularDataRepository;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.stereotype.Repository;
import org.cbioportal.legacy.model.Sample;
import org.cbioportal.legacy.service.SampleService;

@Repository
public class MolecularDataMyBatisRepository implements MolecularDataRepository {

@Autowired private MolecularDataMapper molecularDataMapper;

@Autowired(required = false)
private SampleService sampleService;

@Value("${clickhouse_mode:false}")
private boolean clickhouseMode;

@Override
public MolecularProfileSamples getCommaSeparatedSampleIdsOfMolecularProfile(
@@ -71,9 +80,117 @@ public Iterable<GeneMolecularAlteration> getGeneMolecularAlterationsIterableFast
public List<GeneMolecularAlteration> getGeneMolecularAlterationsInMultipleMolecularProfiles(
Set<String> molecularProfileIds, List<Integer> entrezGeneIds, String projection) {

if (clickhouseMode) {
return getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse(
molecularProfileIds, entrezGeneIds, projection);
}

return molecularDataMapper.getGeneMolecularAlterationsInMultipleMolecularProfiles(
molecularProfileIds, entrezGeneIds, projection);
}

/**
* ClickHouse-optimized implementation that queries genetic_alteration_derived table
* and aggregates per-sample rows into CSV format expected by the service layer.
*/
private List<GeneMolecularAlteration> getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse(
Set<String> molecularProfileIds, List<Integer> entrezGeneIds, String projection) {

List<GeneMolecularAlteration> rawRows =
molecularDataMapper.getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse(
molecularProfileIds, entrezGeneIds);
Review comment on lines +96 to +101 (Copilot AI, Dec 11, 2025):
The ClickHouse method signature doesn't accept the projection parameter but the caller passes it. The method getGeneMolecularAlterationsInMultipleMolecularProfilesClickHouse at line 96 has parameter String projection but the mapper method at line 100 doesn't accept a projection parameter. While the projection parameter is unused in the ClickHouse implementation (which always returns full data), this inconsistency could be confusing. Consider either removing the unused parameter from the private method signature or documenting why it's ignored for ClickHouse.

if (rawRows.isEmpty()) {
return Collections.emptyList();
}

// Group by profile
Map<String, List<GeneMolecularAlteration>> rowsByProfile = rawRows.stream()
.collect(Collectors.groupingBy(GeneMolecularAlteration::getMolecularProfileId));

// Get sample order for each profile
Map<String, MolecularProfileSamples> profileSamplesMap =
commaSeparatedSampleIdsOfMolecularProfilesMap(molecularProfileIds);

// Collect all internal IDs for sample lookup
List<Integer> allInternalIds = new ArrayList<>();
profileSamplesMap.values().forEach(s ->
Arrays.stream(s.getSplitSampleIds())
.mapToInt(Integer::parseInt)
.forEach(allInternalIds::add));

Review comment on `.mapToInt(Integer::parseInt)` (Copilot AI, Dec 11, 2025):
Potential uncaught 'java.lang.NumberFormatException'.

// Fetch samples and build unique ID map
if (sampleService == null) {
throw new IllegalStateException("SampleService is required in ClickHouse mode but is not available.");
}
List<Sample> samples = sampleService.getSamplesByInternalIds(allInternalIds);
Map<Integer, Sample> internalIdToSample = samples.stream()
.collect(Collectors.toMap(Sample::getInternalId, Function.identity()));

List<GeneMolecularAlteration> results = new ArrayList<>();

for (String profileId : profileSamplesMap.keySet()) {
MolecularProfileSamples mps = profileSamplesMap.get(profileId);
String[] sampleIds = mps.getSplitSampleIds();

// Build sample unique ID order
List<String> sampleUniqueIdOrder = new ArrayList<>(sampleIds.length);
for (String internalIdStr : sampleIds) {
try {
int internalId = Integer.parseInt(internalIdStr);
Sample s = internalIdToSample.get(internalId);
if (s != null) {
sampleUniqueIdOrder.add(s.getCancerStudyIdentifier() + "_" + s.getStableId());
} else {
sampleUniqueIdOrder.add(null);
}
} catch (NumberFormatException e) {
// Skip invalid internalIdStr, add null to maintain order
sampleUniqueIdOrder.add(null);
}
}

List<GeneMolecularAlteration> profileRows = rowsByProfile.get(profileId);
if (profileRows == null) continue;

// Group rows by gene
Map<Integer, List<GeneMolecularAlteration>> geneRows = profileRows.stream()
.collect(Collectors.groupingBy(GeneMolecularAlteration::getEntrezGeneId));

for (Map.Entry<Integer, List<GeneMolecularAlteration>> entry : geneRows.entrySet()) {
Integer geneId = entry.getKey();
List<GeneMolecularAlteration> geneAlterations = entry.getValue();

// Build map of sampleUniqueId -> value
Map<String, String> sampleValueMap = new HashMap<>();
for (GeneMolecularAlteration alt : geneAlterations) {
// Values field contains sampleUniqueId|value for ClickHouse rows
String[] parts = alt.getValues().split("\\|", 2);
if (parts.length == 2) {
sampleValueMap.put(parts[0], parts[1]);
}
}

// Build CSV values string in sample order
StringBuilder sb = new StringBuilder();
for (int i = 0; i < sampleUniqueIdOrder.size(); i++) {
if (i > 0) sb.append(',');
String sampleUniqueId = sampleUniqueIdOrder.get(i);
if (sampleUniqueId != null && sampleValueMap.containsKey(sampleUniqueId)) {
sb.append(sampleValueMap.get(sampleUniqueId));
}
}

GeneMolecularAlteration alteration = new GeneMolecularAlteration();
alteration.setEntrezGeneId(geneId);
alteration.setMolecularProfileId(profileId);
alteration.setValues(sb.toString());
results.add(alteration);
}
}

return results;
}
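
The per-gene pivot performed in the method above can be illustrated on its own. This is a simplified sketch under stated assumptions: `toCsv` is a hypothetical helper, each encoded row is assumed to have the `sampleUniqueId|value` shape described in the code comment, and the sample order list may contain `null` for samples that could not be resolved.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CsvPivotSketch {

  // Builds one comma-separated values string that follows the given sample order.
  // Samples with no matching row (or a null ID) contribute an empty field.
  static String toCsv(List<String> sampleUniqueIdOrder, List<String> encodedRows) {
    Map<String, String> valueBySample = new HashMap<>();
    for (String encoded : encodedRows) {
      // Limit 2 keeps any further '|' characters inside the value part.
      String[] parts = encoded.split("\\|", 2);
      if (parts.length == 2) {
        valueBySample.put(parts[0], parts[1]);
      }
    }

    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < sampleUniqueIdOrder.size(); i++) {
      if (i > 0) {
        sb.append(',');
      }
      String sampleUniqueId = sampleUniqueIdOrder.get(i);
      if (sampleUniqueId != null && valueBySample.containsKey(sampleUniqueId)) {
        sb.append(valueBySample.get(sampleUniqueId));
      }
    }
    return sb.toString();
  }
}
```

For the order `study_S1, study_S2, study_S3` and the rows `study_S3|0.5` and `study_S1|-1`, this yields `-1,,0.5`: the field count always matches the sample order, which is what lets a consumer map values back to samples by position.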

@Override
public List<GenesetMolecularAlteration> getGenesetMolecularAlterations(
@@ -4,6 +4,7 @@
import org.cbioportal.legacy.model.GeneFilterQuery;
import org.cbioportal.legacy.model.GeneMolecularAlteration;
import org.cbioportal.legacy.model.GeneMolecularData;
import org.cbioportal.legacy.model.MolecularDataCountItem;
import org.cbioportal.legacy.model.meta.BaseMeta;
import org.cbioportal.legacy.service.exception.MolecularProfileNotFoundException;

@@ -51,4 +52,7 @@ List<GeneMolecularData> getMolecularDataInMultipleMolecularProfilesByGeneQueries

BaseMeta getMetaMolecularDataInMultipleMolecularProfiles(
List<String> molecularProfileIds, List<String> sampleIds, List<Integer> entrezGeneIds);

List<MolecularDataCountItem> fetchMolecularDataCountsInMultipleMolecularProfiles(
List<String> molecularProfileIds, List<String> sampleIds, List<Integer> entrezGeneIds);
}
@@ -10,6 +10,7 @@
import org.cbioportal.legacy.model.GeneFilterQuery;
import org.cbioportal.legacy.model.GeneMolecularAlteration;
import org.cbioportal.legacy.model.GeneMolecularData;
import org.cbioportal.legacy.model.MolecularDataCountItem;
import org.cbioportal.legacy.model.MolecularProfile;
import org.cbioportal.legacy.model.MolecularProfile.MolecularAlterationType;
import org.cbioportal.legacy.model.MolecularProfileSamples;
@@ -225,18 +226,16 @@ public List<GeneMolecularData> getMolecularDataInMultipleMolecularProfiles(
samples = sampleService.fetchSamples(studyIds, sampleIds, "ID");
}

- // query each entrezGeneId separately so they can be cached
+ // Query all requested entrez gene ids in a single call instead of per-gene queries.
+ // Doing so reduces query overhead and avoids N+1 query patterns, which are a
+ // significant performance problem when using high-latency backends such as ClickHouse.
+ if (entrezGeneIds == null || entrezGeneIds.isEmpty()) {
+ return molecularDataList;
+ }
+
List<GeneMolecularAlteration> molecularAlterations =
- entrezGeneIds.stream()
- .flatMap(
- gene ->
- molecularDataRepository
- .getGeneMolecularAlterationsInMultipleMolecularProfiles(
- distinctMolecularProfileIds,
- Collections.singletonList(gene),
- projection)
- .stream())
- .collect(Collectors.toList());
+ molecularDataRepository.getGeneMolecularAlterationsInMultipleMolecularProfiles(
+ distinctMolecularProfileIds, entrezGeneIds, projection);
Map<String, List<GeneMolecularAlteration>> molecularAlterationsMap =
molecularAlterations.stream()
.collect(groupingBy(GeneMolecularAlteration::getMolecularProfileId));
@@ -316,6 +315,36 @@ public BaseMeta getMetaMolecularDataInMultipleMolecularProfiles(
return baseMeta;
}

@Override
@PreAuthorize(
"hasPermission(#molecularProfileIds, 'Collection<MolecularProfileId>', T(org.cbioportal.legacy.utils.security.AccessLevel).READ)")
public List<MolecularDataCountItem> fetchMolecularDataCountsInMultipleMolecularProfiles(
List<String> molecularProfileIds, List<String> sampleIds, List<Integer> entrezGeneIds) {

// Fetch molecular data with minimal projection to reduce data transfer
List<GeneMolecularData> molecularData =
getMolecularDataInMultipleMolecularProfiles(
molecularProfileIds, sampleIds, entrezGeneIds, "ID");

// Group by molecular profile ID and count
Map<String, Long> countsByProfile =
molecularData.stream()
.collect(
Collectors.groupingBy(
GeneMolecularData::getMolecularProfileId, Collectors.counting()));

// Convert to list of MolecularDataCountItem
List<MolecularDataCountItem> result = new ArrayList<>();
for (String profileId : molecularProfileIds) {
MolecularDataCountItem item = new MolecularDataCountItem();
item.setMolecularProfileId(profileId);
item.setCount(countsByProfile.getOrDefault(profileId, 0L).intValue());
result.add(item);
}

return result;
}
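
A small note on the `Long` to `Integer` narrowing in the method above: `intValue()` silently wraps when the count exceeds `Integer.MAX_VALUE`, whereas `Math.toIntExact` throws. Whether counts that large can occur here depends on data size, which this diff does not state, so the following is only a sketch of the difference, with hypothetical helper names.

```java
final class CountNarrowingSketch {

  // Mirrors Long.intValue(): keeps only the low 32 bits, so large counts wrap silently.
  static int lenient(long count) {
    return Long.valueOf(count).intValue();
  }

  // Fails fast with ArithmeticException when the count does not fit in an int.
  static int strict(long count) {
    return Math.toIntExact(count);
  }
}
```

With a count of 4,294,967,297 (2^32 + 1), `lenient` returns 1 while `strict` throws, so the strict variant surfaces the problem instead of reporting a wrong number.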

private void validateMolecularProfile(String molecularProfileId)
throws MolecularProfileNotFoundException {
