khan2023() correction and new datasets #29

Merged: 9 commits merged on Oct 28, 2024
106 changes: 106 additions & 0 deletions R/hu2023.R
@@ -0,0 +1,106 @@
##' Hu et al, 2023 (The Journal of Physical Chemistry B): Correlated protein modules
##'
##' @description
##'
##' The authors measured correlations between the levels of pairs of proteins
##' in single cells at steady state using single-cell proteomics (SCP). By
##' computing pairwise correlations among 1000 proteins in populations of K562
##' cells and oocytes, they identified many correlated protein modules (CPMs)
##' whose members are involved in shared biological functions. Some CPMs are
##' specific to a particular cell type, while others are common to different
##' cell types. Additionally, protein correlations measured by SCP are
##' functionally and experimentally more significant than those obtained by
##' bulk proteomics or those of the corresponding mRNAs in single-cell
##' transcriptomics.
##'
##' @format Two [SingleCellExperiment] objects:
##'
##' - `hu2023_K562()`: protein data containing quantitative data for 1249
##' proteins and 69 single cells with zero imputation.
##' - `hu2023_oocyte()`: protein data containing quantitative data for 3422
##' proteins and 137 single cells with zero imputation.
##'
##' The `colData(hu2023_K562())` and `colData(hu2023_oocyte())` contain the
##' cell type annotation (`SampleType`).
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Cell isolation**: K562 cells were resuspended and washed in cold PBS.
##' Single cells or groups of 10 cells were sorted into 96-well plates using
##' a FACSAria instrument. Oocyte-cumulus complexes were collected from
##' C57/6J mice after PMSG and HCG injections, and hyaluronidase was used to
##' remove the cumulus cells. All samples were stored at -80 degrees Celsius.
##' - **Sample preparation**: Cells were digested with trypsin at 37 degrees
##' Celsius for 3 hours. For label-free proteomics, digestion was terminated
##' by adding 0.43% TFA and 1% ACN in water, and the samples were then dried
##' in a concentrator. Peptides were resuspended in 0.1% TFA and 1% ACN, and
##' then transferred to sample tubes for LC-MS/MS analysis.
##' - **Separation**: 4 microliters of peptide digest were injected onto a
##' high-performance chromatography column (IonOpticks) and separated at a
##' flow rate of 100 nL/min using a nanoflow liquid chromatography system.
##' The effective gradient was 70 minutes, allowing 16 cells to be analyzed
##' per day.
##' - **Ionization**: Peptides were analyzed using an Orbitrap Eclipse mass
##' spectrometer with a FAIMS Pro interface. FAIMS compensation voltages of
##' −55 and −70 V were applied, with a 1-second cycle time for both voltages.
##' - **Mass spectrometry**: MS spectra were acquired with the Orbitrap
##' analyzer, while MS/MS spectra were acquired with a linear ion trap
##' analyzer. The maximum ion injection time for MS/MS was 200 ms.
##' - **Data analysis**: MS raw files were searched against the UniProt
##' human protein database and an in-house contaminant database
##' using Proteome Discoverer (version 2.4). Label-free quantification was
##' based on peak intensity with the match-between-runs (MBR) feature enabled.
##'
##' @section Data collection:
##'
##' The oocyte protein data were shared by the authors and are accessible
##' from the
##' [Shared File](https://biopic-my.sharepoint.cn/:x:/g/personal/humo_biopic_pku_edu_cn/EfX4CHedVopLuSx2OJNj6LABdESGNdKz4Eh8Zawvd-fNNQ?e=E5m09k&xsdata=MDV8MDJ8ZW5lcy5heWFyQHVjbG91dmFpbi5iZXxjYjY2M2MwYzNjMDY0YjZhNjc1NTA4ZGM4YzMzNjc1YXw3YWIwOTBkNGZhMmU0ZWNmYmM3YzQxMjdiNGQ1ODJlY3wxfDB8NjM4NTM5Mzk5NjI1Mzg1NDQ3fFVua25vd258VFdGcGJHWnNiM2Q4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazFoYVd3aUxDSlhWQ0k2TW4wPXwwfHx8&sdata=Zmt4YnZFZFViTitJRkdTc0FTK2thMjdTT0EzV2JJeS83WlZmV3R6SzdvRT0%3d).
##' The K562 protein data are accessible from the
##' [GitHub repository](https://github.com/dionezhang/CPM/blob/master/ProteinAbundance.Rdata).
##'
##' - `DataMatrix-oocyte-20240614.csv`: normalized imputed protein matrix
##' - `ProteinAbundance.Rdata`: protein matrices (normalized, log transformed)
##'
##' Each protein matrix was converted into a standalone
##' [SingleCellExperiment] object.
##'
##' The oocyte protein data were exported from the shared link as
##' `DataMatrix-oocyte-20240614.csv`. The data were converted to a
##' [SingleCellExperiment] object, and the sample type (`SampleType`), the
##' only sample annotation available, was stored in the `colData`.
##'
##' The K562 protein data were downloaded from the GitHub link and loaded
##' into memory. The `Norm` table was converted to a [SingleCellExperiment]
##' object, the protein descriptions (`Description` table) were stored in the
##' `rowData`, and the sample type (`SampleType`), the only sample annotation
##' available, was stored in the `colData`.
##'
##' @source
##' The oocyte data were downloaded from the
##' [Shared File](https://biopic-my.sharepoint.cn/:x:/g/personal/humo_biopic_pku_edu_cn/EfX4CHedVopLuSx2OJNj6LABdESGNdKz4Eh8Zawvd-fNNQ?e=E5m09k&xsdata=MDV8MDJ8ZW5lcy5heWFyQHVjbG91dmFpbi5iZXxjYjY2M2MwYzNjMDY0YjZhNjc1NTA4ZGM4YzMzNjc1YXw3YWIwOTBkNGZhMmU0ZWNmYmM3YzQxMjdiNGQ1ODJlY3wxfDB8NjM4NTM5Mzk5NjI1Mzg1NDQ3fFVua25vd258VFdGcGJHWnNiM2Q4ZXlKV0lqb2lNQzR3TGpBd01EQWlMQ0pRSWpvaVYybHVNeklpTENKQlRpSTZJazFoYVd3aUxDSlhWQ0k2TW4wPXwwfHx8&sdata=Zmt4YnZFZFViTitJRkdTc0FTK2thMjdTT0EzV2JJeS83WlZmV3R6SzdvRT0%3d)
##' The K562 protein data were downloaded from the
##' [GitHub repository](https://github.com/dionezhang/CPM/blob/master/ProteinAbundance.Rdata).
##' The raw data and the quantification data can also be found in the
##' MassIVE repository `MSV000089625`:
##' ftp://massive.ucsd.edu/MSV000089625/
##'
##' @references
##' Hu, M., Zhang, Y., Yuan, Y., Ma, W., Zheng, Y., Gu, Q., & Xie, X. S. 2023.
##' "Correlated protein modules revealing functional coordination of interacting
##' proteins are detected by single-cell proteomics." The Journal of Physical
##' Chemistry B.
##' ([link to article](https://doi.org/10.1021/acs.jpcb.3c00014)).
##'
##' @aliases hu2023_K562
##' @aliases hu2023_oocyte
##'
##' @examples
##' \donttest{
##' hu2023_oocyte()
##' hu2023_K562()
##' }
##'
##' @keywords datasets
##'
"hu2023"
16 changes: 8 additions & 8 deletions R/khan2023.R
@@ -20,10 +20,10 @@
##' empty (negative control) channels and unused channels.
##' - `peptides`: peptide data containing quantitative data for 10055
##' peptides and 421 single-cells.
##' - `proteins_imputed`: protein data containing quantitative data for 4096
##' proteins and 421 single-cells with k-nearest neighbors (KNN) imputation.
##' - `proteins_unimputed`: protein data containing quantitative data for 4096
##' proteins and 421 single-cells without imputation.
##' - `proteins_imputed`: protein data containing quantitative data for 4571
##' proteins and 420 single-cells with k-nearest neighbors (KNN) imputation.
##' - `proteins_unimputed`: protein data containing quantitative data for 4571
##' proteins and 420 single-cells without imputation.
##'
##' The `colData(khan2023())` contains cell type and batch annotations that
##' are common to all assays. The description of the `rowData` fields for the
@@ -73,7 +73,7 @@
##' based on the peptide sequence information through an `AssayLink` object.
##'
##' The imputed protein data were taken from the same google drive folder
##' (`EpiToMesen.TGFB.nPoP_trial1_ProtByCellMatrix_NSThreshDART_medIntCrNorm_imputedNotBC.csv`).
##' (`EpiToMesen.TGFB.nPoP_trial1_1PercDartFDRTMTBulkDIA.WallE_imputed.txt`).
##' The data were formatted to a [SingleCellExperiment] object and the sample
##' metadata were matched to the column names (mapping is retrieved
##' after running the SCoPE2 R script, `EMTTGFB_singleCellProcessing.R`) and
@@ -82,7 +82,7 @@
##' based on the protein sequence information through an `AssayLink` object.
##'
##' The unimputed protein data were taken from the same google drive folder
##' (`EpiToMesen.TGFB.nPoP_trial1_ProtByCellMatrix_NSThreshDART_medIntCrNorm_unimputed.csv`).
##' (`EpiToMesen.TGFB.nPoP_trial1_1PercDartFDRTMTBulkDIA.WallE_unimputed.txt`).
##' The data were formatted and added exactly as imputed data.
##'
##' @source
@@ -97,8 +97,8 @@
##' @references
##' Saad Khan, Rachel Conover, Anand R. Asthagiri, Nikolai Slavov. 2023.
##' "Dynamics of single-cell protein covariation during epithelial–mesenchymal
##' transition." bioRxiv.
##' ([link to article](https://doi.org/10.1101/2023.12.21.572913)).
##' transition." Journal of Proteome Research.
##' ([link to article](https://pubs.acs.org/doi/10.1021/acs.jproteome.4c00277)).
##'
##' @examples
##' \donttest{
84 changes: 84 additions & 0 deletions R/krull2024.R
@@ -0,0 +1,84 @@
##' Krull et al, 2024 (Nature Communications): IFN-γ response
##'
##' The authors developed a new data-independent acquisition (DIA) strategy
##' that co-analyzes low-input samples alongside a corresponding matching
##' enhancer (ME) sample of higher input. Using this DIA-ME approach, they
##' investigated the proteomic response of U-2 OS cells to interferon gamma
##' (IFN-γ) at the single-cell level.
##'
##' @format A [QFeatures] object with 159 assays, each assay being a
##' [SingleCellExperiment] object.
##'
##' - Assays 1-158: DIA-NN main output report table split by acquisition
##' run. The first 15 runs each acquire 10 cells (the MEs), and the
##' remaining 143 runs each acquire a single cell. These assays contain the
##' results of the spectrum identification and quantification.
##' - `proteins`: DIA-NN protein group matrix, containing normalised
##' quantities for 1553 protein groups in 143 single cells. Proteins
##' were filtered at Q.Value <= 0.01, Lib.Q.Value <= 0.01 and
##' Lib.PG.Q.Value <= 0.01.
##'
##' The `colData(krull2024())` contains cell type annotations. The description
##' of the `rowData` fields for the different assays can be found in the
##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Cell isolation**: Cells were detached by trypsin digestion, diluted
##' in 1.5 mL PBS, and isolated using a BD FACSAria III instrument.
##' - **Sample preparation**: Sorted single cells were collected in lysis
##' buffer (50 mM TEAB, pH 8.5, and 0.025% DDM) and denatured at 70 degrees
##' Celsius for 30 minutes. Samples were acidified with 0.5% FA and
##' transferred to autosampler plates for mass spectrometry analysis.
##' - **Separation**: Peptides were injected in a 2-microliter volume onto a
##' 25 cm x 75 micrometer ID column connected to a nano-ESI source, and
##' separated at a flow rate of 300 nL/min using a 15-minute gradient of
##' ACN in water with 0.1% FA.
##' - **Ionization**: Ionization was performed using a 1,500 V capillary
##' voltage with 3.0 L/min dry gas and a dry temperature of 180 degrees
##' Celsius. MS data acquisition was conducted in diaPASEF mode using a
##' timsTOF Pro mass spectrometer.
##' - **Mass spectrometry**: MS1 scans covered a range of 200-1,700 m/z,
##' while the DIA isolation windows covered 475-1,000 m/z with eight DIA
##' scans per cycle. The collision energy ranged from 45 eV to 27 eV
##' depending on the ion mobility.
##' - **Data analysis**: Data were processed using DIA-NN (v1.8.0) and
##' Spectronaut 18 in library-free mode, using deep learning to predict
##' spectra, retention times, and ion mobility.
##'
##' @section Data collection:
##'
##' The data were collected from the PRIDE
##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD053464)
##' in the `03_SingleCell_Searches.zip` file.
##'
##' We loaded the DIA-NN main report table and generated a sample
##' annotation table based on the MS file names. We then combined the
##' sample annotation and the DIA-NN tables into a [QFeatures] object
##' following the `scp` data structure. We loaded the protein group
##' matrix as a [SingleCellExperiment] object, added it as a new assay,
##' and linked the precursors to the proteins using the `Protein.Group`
##' variable from the `rowData`.
##'
##' @source
##' The data were downloaded from PRIDE
##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD053464)
##' with accession ID `PXD053464`.
##'
##' @references
##' Krull, K. K., Ali, S. A., & Krijgsveld, J. 2024. "Enhanced feature matching
##' in single-cell proteomics characterizes IFN-γ response and co-existence of
##' cell states." Nature Communications, 15(1).
##' ([link to article](https://doi.org/10.1038/s41467-024-52605-x)).
##'
##' @examples
##' \donttest{
##' krull2024()
##' }
##'
##' @keywords datasets
##'
"krull2024"
3 changes: 3 additions & 0 deletions inst/extdata/metadata.csv
@@ -25,3 +25,6 @@
"guise2024","Single-cell proteomics data of 108 postmortem CTL or ALS spinal moto neurons","3.19",NA,"TXT","ftp://massive.ucsd.edu/v05/MSV000092119/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Christophe Vanderaa <[email protected]>","QFeatures","Rda","scpdata/guise2024.rda",2024-01-05,47,"Proteome Discoverer","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"petrosius2023_mES","Mouse embryonic stem cells across ground-state (m2i) and differentiation-permissive (m15) culture conditions.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT",NA,"Homo sapiens",9606,TRUE,"Dataverse","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/petrosius2023_mES.Rda",2024-04-09,605,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"petrosius2023_AstralAML","Single-cell proteomics data of 4 cell types from the OCI-AML8227 model.","3.19",NA,"TXT","https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/4DSPJM",NA,"Homo sapiens",9606,TRUE,"Dataverse","Samuel Gregoire <[email protected]>","QFeatures","Rda","scpdata/petrosius2023_AstralAML.Rda",2023-06-08,217,"Spectronaut","LFQ",TRUE,TRUE,TRUE,TRUE,NA
"krull2024","Single-cell proteomics data IFN-γ response of U-2 OS cells","3.19",NA,"TXT","https://www.ebi.ac.uk/pride/archive/projects/PXD053464",NA,"Homo sapiens",9606,TRUE,"PRIDE","Enes Sefa Ayar <[email protected]>","QFeatures","Rda","scpdata/krull2024.Rda",2024-10-24,159,"DIA-NN","LFQ",TRUE,FALSE,TRUE,TRUE,NA
"hu2023_K562","Single-cell proteomics data of K562 cells","3.19",NA,"TXT","ftp://massive.ucsd.edu/MSV000089625/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Enes Sefa Ayar <[email protected]>","SingleCellExperiment","Rda","scpdata/hu2023_K562.Rda",2024-10-24,1,"Proteome Discoverer","LFQ",FALSE,FALSE,TRUE,TRUE,NA
"hu2023_oocyte","Single-cell proteomics data of oocytes","3.19",NA,"TXT","ftp://massive.ucsd.edu/MSV000089625/",NA,"Homo sapiens",9606,TRUE,"MassIVE","Enes Sefa Ayar <[email protected]>","SingleCellExperiment","Rda","scpdata/hu2023_oocyte.Rda",2024-10-24,1,"Proteome Discoverer","LFQ",FALSE,FALSE,TRUE,TRUE,NA
43 changes: 43 additions & 0 deletions inst/scripts/make-data_hu2023_K562.R
@@ -0,0 +1,43 @@

####---- Hu et al, 2023 ---####


## Hu, M., Zhang, Y., Yuan, Y., Ma, W., Zheng, Y., Gu, Q., & Xie, X. S. 2023.
## “Correlated protein modules revealing functional coordination of interacting
## proteins are detected by single-cell proteomics.” The Journal of Physical
## Chemistry B, https://doi.org/10.1021/acs.jpcb.3c00014

library(SingleCellExperiment)
library(scp)
library(tidyverse)

root <- "~/localdata/SCP/hu2023/"

####---- Add the protein data ----####

## Data accessible at GitHub repository
## https://github.com/dionezhang/CPM/blob/master/ProteinAbundance.Rdata

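#### Load data ####
## Loads `Norm` (normalized, log-transformed protein abundances) and
## `Description` (protein descriptions) into the global environment
load(paste0(root, "ProteinAbundance.Rdata"))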

## Columns 1 to 69 hold the quantities; the protein IDs are in column X
K562 <- Norm %>%
    mutate(X = rownames(Norm)) %>%
    readSingleCellExperiment(ecol = 1:69, fnames = "X")

## Protein data for K562 cells
hu2023_K562 <- SingleCellExperiment(K562)

prots <- rownames(hu2023_K562)
## Reorder the protein descriptions to match the rows of the data
rowData(hu2023_K562) <- Description[prots, , drop = FALSE]
rowData(hu2023_K562)$protein <- prots

colData(hu2023_K562) <- DataFrame(row.names = colnames(Norm),
                                  SampleType = rep("K562", ncol(Norm)))

## Save data
save(hu2023_K562,
     file = paste0(root, "hu2023_K562.Rda"),
     compress = "xz",
     compression_level = 9)
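The `Description[prots, , drop = FALSE]` step in the K562 script relies on base R row-name indexing. That indexing reorders the annotation so that it matches the rows of the protein data. The toy illustration below uses example accessions and gene names. The guard against identifiers missing from the annotation is an addition of this sketch, not part of the original script. Without it, missing identifiers would silently produce `NA` rows.

```r
## Toy annotation table indexed by example protein accessions,
## stored in a different order than the quantitative data
description <- data.frame(
    gene = c("GAPDH", "ACTB", "TUBB"),
    row.names = c("P04406", "P60709", "P07437")
)

## Row order of the protein data (stand-in for rownames(hu2023_K562))
prots <- c("P60709", "P07437", "P04406")

## Guard: every protein must have an annotation, otherwise indexing
## by name would return rows filled with NA
stopifnot(all(prots %in% rownames(description)))

## Index by row names to reorder; drop = FALSE keeps a data.frame even
## when the table has a single column
annot <- description[prots, , drop = FALSE]
annot$protein <- prots
annot
```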
39 changes: 39 additions & 0 deletions inst/scripts/make-data_hu2023_oocyte.R
@@ -0,0 +1,39 @@

####---- Hu et al, 2023 ---####


## Hu, M., Zhang, Y., Yuan, Y., Ma, W., Zheng, Y., Gu, Q., & Xie, X. S. 2023.
## “Correlated protein modules revealing functional coordination of interacting
## proteins are detected by single-cell proteomics.” The Journal of Physical
## Chemistry B, https://doi.org/10.1021/acs.jpcb.3c00014

library(SingleCellExperiment)
library(scp)
library(tidyverse)

root <- "~/localdata/SCP/hu2023/"

####---- Add the protein data ----####

## Data shared by the author, and accessible at
## https://biopic-my.sharepoint.cn/:x:/g/personal/humo_biopic_pku_edu_cn/EfX4CHedVopLuSx2OJNj6LABdESGNdKz4Eh8Zawvd-fNNQ?rtime=7Xzb4B303Eg

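#### Load data ####
oocyte <- read.csv(paste0(root, "DataMatrix-oocyte-20240614.csv"))
## The unnamed first column (protein IDs) is read as `X`; columns 2 to
## 138 hold the quantities for the 137 oocytes
oocyte <- oocyte %>%
    rename(protein = X) %>%
    readSingleCellExperiment(ecol = 2:138, fnames = "protein")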

## Protein data for oocytes
hu2023_oocyte <- SingleCellExperiment(oocyte)

colData(hu2023_oocyte) <- DataFrame(row.names = colnames(hu2023_oocyte),
                                    SampleType = rep("oocyte", ncol(hu2023_oocyte)))

## Save data
save(hu2023_oocyte,
     file = paste0(root, "hu2023_oocyte.Rda"),
     compress = "xz",
     compression_level = 9)
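The `rename(protein = X)` step in the oocyte script works because `read.csv()` gives an unnamed first header field the syntactic name `X`. The toy CSV below uses made-up protein and cell names. It shows this behaviour and mimics, in base R only, how the quantitative columns are separated from the protein column. It is an approximation for illustration, not the implementation of `scp::readSingleCellExperiment()`.

```r
## Toy CSV with an unnamed first column holding protein identifiers
csv_text <- ",cell_1,cell_2\nProtA,1.5,0\nProtB,2.0,3.1\n"
oocyte_like <- read.csv(text = csv_text)

## The empty header field is converted to the syntactic name "X"
names(oocyte_like)

## Quantitative columns (here 2:3, like 2:138 in the script) become a
## matrix whose row names come from the identifier column
quant <- as.matrix(oocyte_like[, 2:3])
rownames(quant) <- oocyte_like$X

## One sample annotation per cell, as stored in the colData
sample_annot <- data.frame(SampleType = rep("oocyte", ncol(quant)),
                           row.names = colnames(quant))
sample_annot
```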
