-
Notifications
You must be signed in to change notification settings - Fork 19
Make idx files independent as before (no requirements of keeping fasta around) #100
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ASTA is no longer needed to run FI searches. The flanking amino acid residues, which were previously parsed on-the-fly from the FASTA, are no longer reported.
…on indexing) now store protein description text in them, making them self-sufficient for searching. The original FASTA is no longer needed; previous the flanking AA residues and protein accessions were dynamically parsed from the corresponding FASTA. NOTE: flanking AA residues are currently not stored and reported in the .idx searches; I'll work on adding these to the .idx files next. Also MzIdentML output code needs fixing with respect to .idx searches; since the original FASTA is no longer required, I need to properly handle things like exporting the DBSequence element only if the FASTA is present.
…output formats appear working now
…y previously parsed on-the-fly from the FASTA each time after each search (which is pretty horrid from a performance perspective). Now that the FASTA is no longer needed, this info needs to be stored in the .idx. Next, I need to add this to the the old peptide index.
…"struct ScoresMS1" to "struct CometScores"/"struct CometScoresMS1" to avoid some "Scores" naming collision issue with Percolator "Scores" and (2) add parenthesis around "(std::numeric_limits<unsigned int>::max)()" to avoid the "max" macro expansion in Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR makes .idx files independent so that .fasta files are no longer required after the .idx file has been created. It achieves this by storing protein description information and flanking amino acids directly in the .idx file. Additionally, it renames Comet's Scores struct to CometScores to resolve a naming conflict with Percolator's Scores class in Crux code.
Key Changes:
- Added flanking amino acid storage (cPrevAA/cNextAA) to indexed peptides so they don't need to be retrieved from FASTA
- Introduced
g_pvProteinNamesmap to store protein names in the .idx file - Added
g_bIdxNoFastaflag to handle .idx searches without requiring the corresponding .fasta file - Renamed
Scores/ScoresMS1structs toCometScores/CometScoresMS1throughout the codebase
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| RealtimeSearch/SearchMS1MS2.cs | Testing/debug configuration changes (iPrintEveryScan, bPerformMS1Search) |
| CometWrapper/CometWrapper.cpp | Updated to use renamed CometScores and CometScoresMS1 types |
| CometWrapper/CometDataWrapper.h | Updated wrapper classes to use renamed score types |
| CometSearch/CometWriteMzIdentML.cpp | Added logic to skip sequence retrieval when g_bIdxNoFasta is true; updated protein output for indexed databases |
| CometSearch/CometSearchManager.h | Updated method signatures to use renamed score types |
| CometSearch/CometSearchManager.cpp | Major refactoring of file handling to support optional FASTA; removed unused fpfasta references; added g_bIdxNoFasta flag management |
| CometSearch/CometSearch.h | Reorganized includes (moved to header from .cpp) |
| CometSearch/CometSearch.cpp | Added flanking AA calculation during index building and scoring; includes debug code that needs removal |
| CometSearch/CometPeptideIndex.cpp | Modified to write protein names and flanking AAs to .idx file; includes memory leak |
| CometSearch/CometMassSpecUtils.cpp | Simplified protein name retrieval; contains disabled GetPrevNextAA function |
| CometSearch/CometInterfaces.h | Updated interface to use renamed score types |
| CometSearch/CometFragmentIndex.h | Renamed WritePlainPeptideIndex to WriteFIPlainPeptideIndex |
| CometSearch/CometFragmentIndex.cpp | Modified to write protein names and flanking AAs to fragment index; includes memory leak and incomplete implementation |
| CometSearch/CometDataInternal.h | Added cPrevAA/cNextAA fields to DBIndex and PlainPeptideIndexStruct; added g_pvProteinNames map; contains logic error in operator< |
| CometSearch/CometData.h | Renamed Scores to CometScores and ScoresMS1 to CometScoresMS1 |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…decoy peptides (relevant only for peptide indexing)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
… admonished by copilot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 5 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
Comments suppressed due to low confidence (1)
CometSearch/CometData.h:192
- The assignment operator should take a const reference (const CometScoresMS1&) instead of a non-const reference (CometScoresMS1&). This is the standard C++ convention for copy assignment operators.
CometScoresMS1& operator=(CometScoresMS1& a)
{
fDotProduct = a.fDotProduct;
fRTime = a.fRTime;
iScanNumber = a.iScanNumber;
return *this;
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Revert to making .idx files independent i.e. there's no need for .fasta anymore for peptide index and FI searches once the . idx has been created. This entails writing the protein description info and flanking AA into the .idx, both of which were previously parsed from the .fasta. Also rename Comet's "Scores" struct to "CometScores" as there's some new conflict in Crux code with a "Scores" class in Percolator.