split data.R

UCLouvain-CBIO · Apr 17, 2024 · e3070d0
1 parent 67f010c
commit e3070d0

Showing 28 changed files with 2,727 additions and 2,833 deletions.
diff --git a/NEWS.md b/NEWS.md
@@ -3,6 +3,7 @@
 ## scpdata 1.11.4
 
 - fix non-ASCII characters
+- split data.R into multiple data specific files
 
 ## scpdata 1.11.3
 

diff --git a/R/brunner2022.R b/R/brunner2022.R
@@ -0,0 +1,85 @@
+##' Brunner et al. 2022 (Mol. Syst. Biol.): cell cycle state study
+##'
+##' Single cell proteomics data acquired by the Mann Lab using a newly
+##' designed timsTOF instrument, referred to as timsTOF-SCP. The
+##' dataset contains quantitative information from single cells blocked
+##' at 4 cell cycle stages: G1, G1-S, G2, G2-M. The data were acquired
+##' using a label-free sample preparation protocol combined with a
+##' data-independent acquisition (DIA) mode.
+##'
+##' @format A [QFeatures] object with 435 assays, each assay being a
+##' [SingleCellExperiment] object.
+##'
+##' - Assay 1-434: DIA-NN main output report table split for each
+##'   acquisition run. Since each run acquires 1 single cell, each
+##'   assay contains a single column. It contains the results
+##'   of the spectrum identification and quantification.
+##' - `protein`: DIA-NN protein group matrix, containing normalised
+##'   quantities for 2476 protein groups in 434 single cells. Proteins
+##'   are filtered at 1% FDR, using global q-values for protein groups
+##'   and both global and run-specific q-values for precursors.
+##'
+##' The `colData(brunner2022())` contains cell type annotations and
+##' batch annotations. The description of the `rowData` fields for the
+##' different assays can be found in the
+##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: cells were detached with trypsin treatment,
+##'   followed by strong pipetting, and isolated using FACS.
+##' - **Sample preparation**: cell lysis by freeze-heat followed by
+##'   sonication, overnight protein digestion with trypsin/lysC mix and
+##'   desalting using EvoTips trap column (EvoSep)
+##' - **Separation**: online EvoSep One LC system using a 5 cm x 75 µm
+##'   ID column with 1.9µm C18 beads (EvoSep) at 100nL/min flow rate.
+##' - **Ionization**: 10µm ID zero dead volume electrospray emitter
+##'   (Bruker Daltonik) + nano-electrospray ion source (Captive spray,
+##'   Bruker Daltonik)
+##' - **Mass spectrometry**: DIA PASEF mode. Correlation between IM
+##'   and m/z was used to synchronize the elution of precursors from
+##'   each IM scan with the quadrupole isolation window. Five
+##'   consecutive diaPASEF cycles. The collision energy was ramped
+##'   linearly as a function of the IM from 59 eV at 1/K0=1.6 Vs cm^-2
+##'   to 20 eV at 1/K0=0.6 Vs cm^-2.
+##' - **Data analysis**: DIA-NN (1.8).
+##'
+##' @section Data collection:
+##'
+##' The data were collected from the PRIDE
+##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD024043)
+##' in the `DIANN1.8_SingleCells_CellCycle.zip` file.
+##'
+##' We loaded the DIA-NN main report table and generated a sample
+##' annotation table based on the MS file names. We next combined the
+##' sample annotation and the DIA-NN tables into a [QFeatures] object
+##' following the `scp` data structure. We loaded the protein group
+##' matrix as a [SingleCellExperiment] object, fixed ambiguous
+##' protein group names, added the protein data as a new assay, and
+##' linked the precursors to proteins using the `Protein.Group`
+##' variable from the `rowData`.
+##'
+##' @source
+##' The data were downloaded from the PRIDE
+##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD024043)
+##' with accession ID `PXD024043`.
+##'
+##' @references
+##' Brunner, Andreas-David, Marvin Thielert, Catherine Vasilopoulou,
+##' Constantin Ammar, Fabian Coscia, Andreas Mund, Ole B. Hoerning, et
+##' al. 2022. "Ultra-High Sensitivity Mass Spectrometry Quantifies
+##' Single-Cell Proteome Changes upon Perturbation." Molecular Systems
+##' Biology 18 (3): e10798.
+##' [Link to article](http://dx.doi.org/10.15252/msb.202110798)
+##'
+##' @examples
+##' \donttest{
+##' brunner2022()
+##' }
+##'
+##' @keywords datasets
+##'
+"brunner2022"
diff --git a/R/cong2020AC.R b/R/cong2020AC.R
@@ -0,0 +1,87 @@
+##' Cong et al. 2020 (Anal. Chem.): HeLa single cells
+##'
+##' Single-cell proteomics using the nanoPOTS sample processing device
+##' in combination with ultranarrow-bore (20um i.d.) packed-column LC
+##' separations and the Orbitrap Eclipse Tribrid MS. The dataset
+##' contains label-free quantitative information at PSM, peptide and
+##' protein level. The samples are single HeLa cells. Bulk samples
+##' (100 and 20 cells) were also included in the experiment to
+##' increase the identification rate thanks to between-run matching
+##' (cf. MaxQuant).
+##'
+##' @format A [QFeatures] object with 9 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `100/20 HeLa cells`: 2 assays containing PSM data for a bulk
+##'   of 100 or 20 HeLa cells, respectively.
+##' - `Blank`: assay containing the PSM data for a blank sample
+##' - `Single cell X`: 4 assays containing PSM data for a single cell.
+##'   The `X` indicates the replicate number.
+##' - `peptides`: quantitative data for 12590 peptides in 7 samples
+##'   (all runs combined).
+##' - `proteins`: quantitative data for 1801 proteins in 7 samples
+##'   (all runs combined).
+##'
+##' Sample annotation is stored in `colData(cong2020AC())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: The HeLa cells were diluted and aspirated
+##'   using a microcapillary with a pulled tip.
+##' - **Sample preparation**: performed using the nanoPOTS device.
+##'   Protein extraction using RapiGest (+ DTT) + alkylation (IAA) +
+##'   Lys-C digestion + RapiGest cleavage (formic acid)
+##' - **Separation**: UltiMate 3000 RSLCnano pump with a home-packed
+##'   nanoLC column (60cm x 20um i.d.; approx. 20 nL/min)
+##' - **Ionization**: ESI (2,000V; Nanospray Flex)
+##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Eclipse.
+##'   MS1 settings: accumulation time = 246ms; resolution = 120,000;
+##'   AGC = 1E6. MS/MS settings depend on quantity. All: AGC = 1E5.
+##'   20-100 cells: accumulation time = 246ms; resolution = 120,000.
+##'   Single cells: accumulation time = 500ms; resolution = 240,000.
+##' - **Data analysis**: MaxQuant (v1.6.3.3) + Excel
+##'
+##' @section Data collection:
+##'
+##' The PSM, peptide and protein data were collected from the PRIDE
+##' repository (accession ID: PXD016921). We downloaded the
+##' `evidence.txt` file containing the PSM identification and
+##' quantification results. The sample annotation was inferred from
+##' the sample names. The data were then converted to a [QFeatures]
+##' object using the [scp::readSCP()] function.
+##'
+##' The peptide data were processed similarly from the `peptides.txt`
+##' file. The quantitative column names were adapted to match the PSM
+##' data. The peptide data were added to the [QFeatures] object and
+##' the links between the features were stored.
+##'
+##' The protein data were similarly processed from the
+##' `proteinGroups.txt` file. The quantitative column names were
+##' adapted to match the PSM data. The protein data were added to the
+##' [QFeatures] object and the links between the features were stored.
+##'
+##' @source
+##' All files can be downloaded from the PRIDE repository PXD016921.
+##' The source link is:
+##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/02/PXD016921
+##'
+##' @references
+##'
+##' Cong, Yongzheng, Yiran Liang, Khatereh Motamedchaboki, Romain
+##' Huguet, Thy Truong, Rui Zhao, Yufeng Shen, Daniel Lopez-Ferrer,
+##' Ying Zhu, and Ryan T. Kelly. 2020. “Improved Single-Cell Proteome
+##' Coverage Using Narrow-Bore Packed NanoLC Columns and
+##' Ultrasensitive Mass Spectrometry.” Analytical Chemistry, January.
+##' ([link to article](https://doi.org/10.1021/acs.analchem.9b04631)).
+##'
+##' @examples
+##' \donttest{
+##' cong2020AC()
+##' }
+##'
+##' @keywords datasets
+##'
+"cong2020AC"