- Mono
- RawFileReader from Planet Orbitrap or email [email protected] with Subject "Access to RawFileReader"
- mono RawRead.exe 171010_Ip_Hela_ugi.raw (for all scans)
- mono RawRead.exe <... rawFile> 0 2 (for profile scans with charge state > 1)
awk -F '\t' '{print $1" "$6}' 171010_Ip_Hela_ugi.rawCombined/combined/txt/proteinGroups.txt | less
awk -F '\t' '{print $16}' 171010_Ip_Hela_ugi.raw.intensity0.charge0-comet-human.txt | less
Note: RawRead.cs in this repository contains the C# code for extracting all information from a raw file; countIons.cs is an additional helper focused on compact per-scan tables and targeted accumulation.
Linux (Mono C# compiler):
mcs RawRead.cs /reference:ThermoFisher.CommonCore.RawFileReader.dll \
/reference:ThermoFisher.CommonCore.Data.dll \
/reference:ThermoFisher.CommonCore.MassPrecisionEstimator.dll \
/reference:MathNet.Numerics.dll /reference:System.Numerics.dll
Windows: use the Microsoft C# compiler (csc) or a suitable .NET toolchain.
Key outputs (examples)
- <raw>.MZ.txt: mass precision estimates (Mass, mmu, ppm)
- <raw>.chromatogram.txt: BasePeak chromatogram (RT, intensity)
- <raw>.centroid.MGF, <raw>.profile.MGF: MS2 blocks for centroid/profile scans
- <raw>.profile.intensity{insThr}.charge{chgThr}.MS.txt: per-scan profile/intensity listing
- <raw>.intensity{insThr}.charge{chgThr}.FFT.txt: FFT-derived summary
Notable behaviors & caveats
- Uses GetReaction(0).PrecursorMass for PEPMASS in MGF; some scans may lack reactions, but the code assumes they exist.
- Extracts charge from trailer labels by matching "Charge State:"; trailer labeling may vary by instrument/firmware.
- Heuristic branches (e.g., title.Contains(" ms ")) determine some output formats; these heuristics may not be universal.
- Large files: the code keeps arrays sized by the number of scans and may use significant memory for very long runs.
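The first two caveats describe places where a scan can lack the expected metadata. The following is a minimal Python sketch of a more defensive lookup, not the C# code in RawRead.cs. It assumes the trailer data is available as two parallel lists of label and value strings and that precursor masses arrive as a plain list; the function names charge_from_trailer and precursor_mz are hypothetical, and treating a charge of 0 as "unknown" is also an assumption.

```python
from typing import Optional, Sequence


def charge_from_trailer(labels: Sequence[str], values: Sequence[str]) -> Optional[int]:
    """Return the charge state from parallel trailer label/value lists, or None."""
    for label, value in zip(labels, values):
        # Normalise the label so "Charge State:", "charge state" and
        # "Charge State :" are all accepted.
        key = label.strip().rstrip(":").strip().lower()
        if key == "charge state":
            try:
                charge = int(float(value.strip()))
            except ValueError:
                return None  # value present but not numeric
            # Assumption: 0 (or a negative number) means "not determined".
            return charge if charge > 0 else None
    return None  # no charge-state entry in this trailer


def precursor_mz(reaction_masses: Sequence[float]) -> Optional[float]:
    """Return the first precursor m/z, or None for scans without reactions."""
    return reaction_masses[0] if reaction_masses else None
```

Returning None instead of indexing unconditionally lets the caller decide whether to skip the scan or write it without PEPMASS or CHARGE.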
countIons.cs (compiled to countIons.exe) writes a compact per-scan TSV next to a RAW file (<raw>.cI.tsv) and can accumulate targeted TIC values from a targets CSV.
Build and run example
mcs countIons.cs /reference:ThermoFisher.CommonCore.RawFileReader.dll \
/reference:ThermoFisher.CommonCore.Data.dll \
/reference:ThermoFisher.CommonCore.MassPrecisionEstimator.dll \
/reference:MathNet.Numerics.dll /reference:System.Numerics.dll -out:countIons.exe
mono countIons.exe /path/to/file.raw /path/to/targets.csv
Note
- Tolerances (mass: 0.0001, time: 0.01 min) are hard-coded, but can be exposed as CLI flags on request.
Key behavior
- Writes per-scan TSV: <raw>.cI.tsv (scan, BasePeakMass, TIC, title, time, etc.). If an apparently complete .cI.tsv exists, the program reuses it instead of re-scanning the RAW.
- Reads a targets CSV (header must include Mass [m/z]). Optional Start [min] and End [min] columns are supported and used as RT windows.
- Matching rules (current defaults):
  - Observed m/z is parsed from the scan title (e.g. 1120.0691 in ... [email protected] ...). The code does NOT use BasePeakMass for matching.
  - Mass tolerance: absolute difference <= 0.0001.
  - Time window: if Start [min]/End [min] are present, the scan RT must be inside [Start - 0.01, End + 0.01] minutes.
- Because countIons.cs only uses the precursor m/z parsed from the collected scan title (and an absolute mass tolerance of 0.0001), it cannot reliably discriminate different peptide sequences that share the same monoisotopic mass within that tolerance when their retention-time windows overlap. For example, two peptides with identical or near-identical monoisotopic mass (e.g., "LSLAQEDLISNR" vs "GSLLLGGLDAEASR" in a hypothetical case) will both be counted for the same scan if the scan's RT falls inside both targets' Start/End windows. If you need sequence-level disambiguation, use additional information (e.g., MS2 fragment matching, narrower mass/time tolerances, or peptide-specific markers) rather than title-only m/z matching.
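The matching rules can be sketched in a few lines. This is an illustrative Python version, not the C# code in countIons.cs. The tolerances are the ones stated above; everything else is an assumption: the regular expression presumes the precursor m/z is the decimal number directly before an "@" in the title, the function names are hypothetical, and the RT window is only applied when both Start and End are given.

```python
import re
from typing import Optional

MASS_TOL = 0.0001  # absolute m/z tolerance stated above
TIME_TOL = 0.01    # minutes, stated above

# Assumption: the precursor m/z is the decimal number immediately before "@".
_MZ_BEFORE_AT = re.compile(r"(\d+(?:\.\d+)?)@")


def mz_from_title(title: str) -> Optional[float]:
    """Parse the observed precursor m/z from a scan title, or return None."""
    match = _MZ_BEFORE_AT.search(title)
    return float(match.group(1)) if match else None


def scan_matches(title: str, rt_min: float, target_mz: float,
                 start_min: Optional[float] = None,
                 end_min: Optional[float] = None) -> bool:
    """Apply the mass tolerance and, when a window is given, the RT check."""
    mz = mz_from_title(title)
    if mz is None or abs(mz - target_mz) > MASS_TOL:
        return False
    if start_min is not None and end_min is not None:
        # Window is widened by TIME_TOL on both sides.
        return (start_min - TIME_TOL) <= rt_min <= (end_min + TIME_TOL)
    return True  # no RT window: the mass match alone decides
```

Note that nothing in this check looks at the peptide sequence or the fragment spectrum, which is why two targets with the same m/z and overlapping windows both match the same scan.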
Note about accumulated intensities
- In the present implementation, when a single scan matches multiple targets, the scan's TIC is added to each matching target's AccumulatedTIC. In other words, the same ion intensity can be "double-counted" (or counted multiple times) across targets. All such cases are listed in the <raw>.<csv-basename>.duplicated_scans.tsv report so you can find and inspect duplicated assignments.
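To make the double-counting concrete, here is a small Python sketch of the accumulation step under the stated tolerances. It is not the C# implementation: the dictionary keys (scan, mz, rt, tic, name, start, end) and the function name accumulate are hypothetical, and every target is assumed to carry an RT window.

```python
from collections import defaultdict

MASS_TOL = 0.0001  # absolute m/z tolerance
TIME_TOL = 0.01    # minutes


def accumulate(scans, targets):
    """Sum scan TIC per target; return (accumulated, duplicated, unmatched).

    scans:   dicts with keys scan, mz, rt, tic
    targets: dicts with keys name, mz, start, end
    """
    accumulated = defaultdict(float)
    duplicated = []  # (scan number, [target names]) for scans hitting >1 target
    unmatched = []   # scan numbers with no matching target
    for s in scans:
        hits = [
            t["name"] for t in targets
            if abs(s["mz"] - t["mz"]) <= MASS_TOL
            and t["start"] - TIME_TOL <= s["rt"] <= t["end"] + TIME_TOL
        ]
        for name in hits:
            # The full TIC is credited to every matching target.
            accumulated[name] += s["tic"]
        if len(hits) > 1:
            duplicated.append((s["scan"], hits))
        elif not hits:
            unmatched.append(s["scan"])
    return dict(accumulated), duplicated, unmatched
```

With two targets of equal m/z and overlapping windows, a scan inside the overlap contributes its TIC to both, so the sum of AccumulatedTIC over all targets can exceed the total TIC of the matched scans.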
Outputs
- Accumulation CSV (<raw>.<csv-basename>): contains the original target CSV fields plus AccumulatedTIC, MatchedCount, MatchedMasses, MatchedTimes. MatchedMasses lists the per-scan identifiers (scan numbers) joined by semicolons.
- Duplicates report (<raw>.<csv-basename>.duplicated_scans.tsv): scans matched to more than one target; includes the full per-scan row and the full target CSV rows for each matching target.
- Unmatched report (<raw>.<csv-basename>.unmatched_scans.tsv): per-scan rows with no matching target.
- A concise match summary is printed to the console (counts and report paths).
Future work
- A natural next step is to actually inspect MS2 fragment spectra (fragment ion matching) to disambiguate peptides that share precursor m/z. Implementing fragment-based matching would allow sequence-level confirmation (for example by matching theoretical fragment ions or using a lightweight search engine) and avoid counting ambiguous precursors based on title-only m/z. This is planned as a future enhancement.