Protein Prospector Converter #39

jcmaynard · 2022-11-17T16:58:23Z

Hi,

Would it be possible to add a converter for Protein Prospector output?

Cheers!

devonjkohler · 2023-10-19T17:00:14Z

I haven't seen the output of PaSER. Could you shoot me over some data for this? I can look into the converter if I have the data.

Devon

jcmaynard · 2023-11-02T16:48:07Z

Hi Devon,

I can send you some Prospector output: one report with phospho data and one with global proteins. Where is the best place to send it?

Cheers,

Jason

devonjkohler · 2023-11-02T20:05:44Z

Hey @jcmaynard,

Would you be able to share them via email: [email protected]

Devon

tonywu1999 · 2024-07-06T00:01:17Z

Hi @jcmaynard

I got started on writing the code for the Protein Prospector converter. Devon shared your dataset with me, but I'm a little confused about the data format.

Where is the RAW file name located in the output? I saw a cell containing text that looks like a file name: Z20180606_YvA_TotalRPLC/SW201948rc2mc2mm. Is that the RAW file name? If so, what happens when you analyze a dataset with multiple MS runs (e.g. multiple TMT mixtures)? Would each MS run have its own TXT output file?
Which column represents the precursor charge? Is it z? In other tools, I usually see a column called Charge.

jcmaynard · 2024-07-08T18:01:55Z

Hi @tonywu1999

Here is the manual for Protein Prospector (specifically the section on data output): https://prospector.ucsf.edu/prospector/html/instruct/batchtagman.htm#search_compare

The first two rows have some cells that represent the Project Name: "Z20180606_YvA_TotalRPLC", and the search name "SW201948rc2mc2mm".

The data report lists the Peaklists used for the search under the header "Fraction". The charge state is under column header "z".

In the case of TMT10 or TMTpro, the intensity headers will have the same name for example "Int 127" for both 127N and 127C. The N isotope will always be first. I'm in the process of trying to get the Prospector Admin to change this.
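For reports that still carry the duplicated headers, the "N isotope first" rule can be applied mechanically. The following is a minimal Python sketch, not the MSstatsConvert implementation; the function name `label_tmt_isotopes` and the example header list are hypothetical, and it assumes a duplicated header appears exactly twice.

```python
from collections import Counter


def label_tmt_isotopes(headers):
    """Rename duplicated 'Int <mass>' headers to 'Int <mass>N' / 'Int <mass>C'.

    Assumption (from the description above): when a reporter-ion header
    appears twice, the first occurrence is the N isotope and the second
    is the C isotope. Headers that appear once are left unchanged.
    """
    counts = Counter(headers)   # how often each header occurs in total
    seen = Counter()            # how many times we have passed each header
    renamed = []
    for name in headers:
        if name.startswith("Int ") and counts[name] == 2:
            suffix = "N" if seen[name] == 0 else "C"
            seen[name] += 1
            renamed.append(name + suffix)
        else:
            renamed.append(name)
    return renamed


# Hypothetical header row, for illustration only.
example = ["z", "Int 126", "Int 127", "Int 127", "Int 128", "Int 128"]
print(label_tmt_isotopes(example))
# ['z', 'Int 126', 'Int 127N', 'Int 127C', 'Int 128N', 'Int 128C']
```

Renaming by position keeps the intensity values untouched and only makes the channel names unique, so the step is harmless for reports that already use labeled headers.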

There are a number of different options for reporting peptide mods in Prospector, the data I shared was just one of them. Mods can be split out into separate columns if that would be easier to parse.

Jason

tonywu1999 · 2024-07-12T22:03:09Z

> The data report lists the Peaklists used for the search under the header "Fraction". The charge state is under column header "z".

Understood - z represents the precursor charge.

> The first two rows have some cells that represent the Project Name: "Z20180606_YvA_TotalRPLC", and the search name "SW201948rc2mc2mm".

I'm still confused about how to determine which run(s) produced this report. For example, in the attached example from another tool (MaxQuant), there is a column "Raw.file" that indicates which RAW file a particular row is associated with. I can't find an equivalent in the Prospector search file.

> In the case of TMT10 or TMTpro, the intensity headers will have the same name for example "Int 127" for both 127N and 127C. The N isotope will always be first. I'm in the process of trying to get the Prospector Admin to change this.

Could you clarify what you mean by this? How would one know when a measurement is associated with the N isotope vs the C isotope based on inspecting the input dataset?

> There are a number of different options for reporting peptide mods in Prospector, the data I shared was just one of them. Mods can be split out into separate columns if that would be easier to parse.

I think your initial dataset works well with how peptide mods are reported. Is there a certain setting that a user needs to select to display the mods in the current format (i.e. is this the default format)?

jcmaynard · 2024-07-15T18:10:06Z

Hi @tonywu1999,

Here is a breakdown of the column headers from the reports I sent @devonjkohler:
Column Headers

"" - The first column has no header; it is the protein rank that Prospector outputs. The only meaningful detail is that a hyphen in the rank (for example, [2-3]) indicates a homologous protein.
Uniq Pep
Acc # - Uniprot Accession number
Gene
Num Unique - number of unique peptides matched to the protein
% Cov
Best Disc Score
Best Expect Val
M+H - Singly charged peptide mass
m/z - Precursor m/z value
z - precursor charge state
ppm - precursor mass error
Prev AA
DB Peptide - Peptide with no modifications shown
Peptide - Peptide with modifications present
Next AA
Protein Mods - All variable modifications present, with the protein amino acid number after the @ symbol. The number after "=" is the SLIP score (the modification site localization score). If a modification has a SLIP score below 6 (less than ~95% confidence), then alternative modification sites are shown, separated by the "|" symbol.
Composition - This is a user defined column showing if a requested modification is present.
M Cl - # of missed cleavages
Fraction - Peaklist/raw file that the spectra is from
RT - Retention time
Spectrum
MSMS Info - Scan number of the spectra from the raw file.
Int 126 - Reporter ion intensity
Int 127
Int 128
Int 129
Int 130
Int 131
Start - Protein amino acid position of the first amino acid in the peptide
Score - peptide score
Expect - expectation value from search
# in DB - number of times the peptide is found in the database
Protein MW
Species
Protein Name

The reporter ion intensity columns for TMT plexes larger than 6-plex will now have an isotope label, for example "Int 127N" and "Int 127C".

Modifications can be reported in 5 different ways:

Off: Only the DB Peptide column is shown
Mods in Peptide: All mods are shown in the peptide in a column named "Peptide" (this is what is shown in the dataset you have)
Variable Mods only: Only the variable mods are shown in a separate column "Variable Mods" example: "Oxidation@11;Oxidation@12"
All Mods (1 column): All modifications are in one column named "Mods"
All Mods (2 Columns): Modifications are split between two columns: "Constant Mods" and "Variable Mods"

For the settings above, modifications are reported at the peptide level: Oxidation@11 refers to the 11th amino acid of the peptide. Protein-level modifications are reported in the separate "Protein Mods" column discussed above. The default is "Variable Mods only", but "Mods in Peptide" or one of the "All Mods" options is used more often.
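As an illustration of the peptide-level format, here is a minimal Python sketch that splits a "Variable Mods" cell such as "Oxidation@11;Oxidation@12" into (modification, position) pairs. The function name `parse_peptide_mods` is hypothetical. The sketch assumes ";" separates entries and "@" separates the modification name from its position, as in the example above; it keeps any non-numeric position as text, which is a defensive assumption rather than documented Prospector behavior.

```python
def parse_peptide_mods(cell):
    """Parse a 'Variable Mods' cell into (modification, position) pairs.

    Numeric positions are returned as int (1-based index into the peptide);
    anything else after '@' is kept as a string (an assumption, in case a
    report uses non-numeric positions).
    """
    pairs = []
    for entry in cell.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate empty cells and trailing separators
        if "@" not in entry:
            raise ValueError(f"expected 'Name@position', got {entry!r}")
        # Split on the last '@' so a name containing '@' is not truncated.
        name, _, position = entry.rpartition("@")
        pairs.append((name, int(position) if position.isdigit() else position))
    return pairs


print(parse_peptide_mods("Oxidation@11;Oxidation@12"))
# [('Oxidation', 11), ('Oxidation', 12)]
```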

The TMT modification names in Prospector are: TMT6plex, TMT10plex, and TMT16plex

I'm happy to set up a zoom or call to discuss if that would be helpful.

Cheers,

Jason

tonywu1999 · 2024-07-20T17:17:18Z

@jcmaynard

Hi,

I'd be happy to discuss on a call. I think you answered all my questions, but I'm curious about how the modifications are reported and would like more clarity on that.

Could you email me at [email protected] and we can coordinate a time to discuss?

Thanks,
Tony

tonywu1999 · 2024-09-03T18:22:21Z

@jcmaynard

As for the timeline for the MSstatsPTM converter, I anticipate it will be complete by the end of October. I had initially thought it would be done earlier, but the MSstatsPTM code needs some refactoring before I can implement the Protein Prospector converter.

So far, I have created the converter from Protein Prospector to MSstatsTMT format, which is available in MSstatsConvert.

jcmaynard · 2024-09-05T18:08:47Z

Thanks for the update Tony


tonywu1999 · 2024-09-16T23:51:30Z

Adding notes here from the July meeting between Jason and me:

General Notes:

Fraction column has RAW file name, even if there’s multiple mixtures.
Shared proteins can show up - we should throw those out
All PSMs is a file. Users should use “Keep replicates” option.
There is ambiguity with how MSstatsTMT should handle which feature intensity to use in fractionation scenarios.

Slip Score Notes:

75=11: the number after "=" (11) is the SLIP score (localization score)
- 75 is the position of the amino acid w.r.t. the protein
- A SLIP score of 6 corresponds to ~95% confidence. (Protein Mods)
75|77 - this means the localization fell below 95% confidence, so it is unclear which site was modified
- Typically we should filter these values out.
Phospho&Phospho ((110 & (113 | 115)) | (112 & (113 | 115))) is ambiguous too.
Interesting cases:
- Methylation - dimethylation vs. trimethylation can be reported.
- We should not see two modifications on the same amino acid at the same time.
- TMT labels are assumed to modify every lysine and the N-terminus (constant mods). The Protein Mods column reports variable mods at the protein level.
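The filtering rule in these notes (keep sites with a SLIP score of at least 6, drop "|" alternatives) can be sketched in Python as follows. This is a hedged illustration, not the converter's actual code: the function name `confident_sites`, the ";" separator, and the example cell "Phospho@75=11;Phospho@75|77" are assumptions based on the format described in this thread. Any entry that does not match the simple "Name@site=score" shape is treated as ambiguous and dropped.

```python
import re

# Matches only unambiguous entries of the assumed form Name@site=slip,
# e.g. "Phospho@75=11". Entries with "|" alternatives do not match.
CONFIDENT_SITE = re.compile(r"^(?P<mod>[^@]+)@(?P<site>\d+)=(?P<slip>\d+)$")


def confident_sites(protein_mods, min_slip=6):
    """Return (modification, protein_position, slip_score) for confident sites.

    min_slip=6 follows the ~95% confidence threshold mentioned above.
    """
    kept = []
    for entry in protein_mods.split(";"):
        match = CONFIDENT_SITE.match(entry.strip())
        if match and int(match["slip"]) >= min_slip:
            kept.append((match["mod"], int(match["site"]), int(match["slip"])))
    return kept


# Hypothetical cell contents, for illustration only.
print(confident_sites("Phospho@75=11;Phospho@75|77"))
# [('Phospho', 75, 11)]
```

Dropping everything that fails the strict pattern is the conservative choice here: compound expressions such as the Phospho&Phospho case above are excluded rather than partially interpreted.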

tonywu1999 · 2024-12-16T20:18:47Z

@jcmaynard

Apologies for the delays. The PR is now merged and available on GitHub. You can install the package from GitHub in the R console:

devtools::install_github("Vitek-Lab/MSstatsConvert", build_vignettes = TRUE)

We plan to push to Bioconductor in the future. Please let me know if you have any problems.

jcmaynard · 2024-12-16T20:34:29Z

Thanks Tony, I'll check this out as soon as I can. Jason


devonjkohler added the enhancement New feature or request label Oct 19, 2023

tonywu1999 mentioned this issue Jul 25, 2024

feat(protein-prospector): Add TMT converter for protein prospector Vitek-Lab/MSstatsConvert#97

Merged

3 tasks

tonywu1999 mentioned this issue Sep 16, 2024

[WIP] feature(protein-prospector): Add new MSstatsPTM converter for protein prospector #100

Merged

3 tasks

tonywu1999 added this to the Beta Testing milestone Dec 16, 2024
