split data.R
lgatto committed Apr 17, 2024
1 parent 67f010c commit e3070d0
Showing 28 changed files with 2,727 additions and 2,833 deletions.
1 change: 1 addition & 0 deletions NEWS.md
@@ -3,6 +3,7 @@
## scpdata 1.11.4

- fix non-ASCII characters
- split data.R into multiple data specific files

## scpdata 1.11.3

85 changes: 85 additions & 0 deletions R/brunner2022.R
@@ -0,0 +1,85 @@
##' Brunner et al. 2022 (Mol. Syst. Biol.): cell cycle state study
##'
##' Single cell proteomics data acquired by the Mann Lab using a newly
##' designed timsTOF instrument, referred to as timsTOF-SCP. The
##' dataset contains quantitative information from single cells blocked
##' at 4 cell cycle stages: G1, G1-S, G2, G2-M. The data were acquired
##' using a label-free sample preparation protocol combined with a
##' data-independent acquisition (DIA) mode.
##'
##' @format A [QFeatures] object with 435 assays, each assay being a
##' [SingleCellExperiment] object.
##'
##' - Assay 1-434: DIA-NN main output report table split for each
##' acquisition run. Since each run acquires 1 single cell, each
##' assay contains a single column. It contains the results
##' of the spectrum identification and quantification.
##' - `protein`: DIA-NN protein group matrix, containing normalised
##' quantities for 2476 protein groups in 434 single cells. Proteins
##' are filtered at 1% FDR, using global q-values for protein groups
##' and both global and run-specific q-values for precursors.
##'
##' The `colData(brunner2022())` contains cell type annotations and
##' batch annotations. The description of the `rowData` fields for the
##' different assays can be found in the
##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Cell isolation**: cells were detached with trypsin treatment,
##' followed by strong pipetting, and isolated using FACS.
##' - **Sample preparation**: cell lysis by freeze-heat followed by
##' sonication, overnight protein digestion with trypsin/lysC mix and
##' desalting using EvoTips trap column (EvoSep).
##' - **Separation**: online EvoSep One LC system using a 5 cm x 75 µm
##' ID column with 1.9µm C18 beads (EvoSep) at 100nL/min flow rate.
##' - **Ionization**: 10µm ID zero dead volume electrospray emitter
##' (Bruker Daltonik) + nanoelectro-spray ion source (Captive spray,
##' Bruker Daltonik)
##' - **Mass spectrometry**: DIA PASEF mode. Correlation between IM
##' and m/z was used to synchronize the elution of precursors from
##' each IM scan with the quadrupole isolation window. Five
##' consecutive diaPASEF cycles. The collision energy was ramped
##' linearly as a function of the IM from 59 eV at 1/K0=1.6 Vs cm^-2
##' to 20 eV at 1/K0=0.6 Vs cm^-2.
##' - **Data analysis**: DIA-NN (1.8).
##'
##' @section Data collection:
##'
##' The data were collected from the PRIDE
##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD024043)
##' in the `DIANN1.8_SingleCells_CellCycle.zip` file.
##'
##' We loaded the DIA-NN main report table and generated a sample
##' annotation table based on the MS file names. We next combined the
##' sample annotation and the DIA-NN tables into a [QFeatures] object
##' following the `scp` data structure. We loaded the protein group
##' matrix as a [SingleCellExperiment] object, fixed ambiguous
##' protein group names, added the protein data as a new assay, and
##' linked the precursors to proteins using the `Protein.Group` variable
##' from the `rowData`.
##'
##' @source
##' The data were downloaded from the PRIDE
##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD024043)
##' with accession ID `PXD024043`.
##'
##' @references
##' Brunner, Andreas-David, Marvin Thielert, Catherine Vasilopoulou,
##' Constantin Ammar, Fabian Coscia, Andreas Mund, Ole B. Hoerning, et
##' al. 2022. "Ultra-High Sensitivity Mass Spectrometry Quantifies
##' Single-Cell Proteome Changes upon Perturbation." Molecular Systems
##' Biology 18 (3): e10798.
##' [Link to article](http://dx.doi.org/10.15252/msb.202110798)
##'
##' @examples
##' \donttest{
##' brunner2022()
##' }
##'
##' @keywords datasets
##'
"brunner2022"
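
The data collection steps documented for `brunner2022` (one assay per acquisition run, precursors linked to protein groups through `Protein.Group`) can be illustrated with a minimal base R sketch. This is not the script used to build the dataset: the toy `report` table and its values are invented for illustration, and only a handful of DIA-NN report columns are mimicked.

```r
## Toy stand-in for a DIA-NN main report (hypothetical values; a real
## report contains many more columns and rows).
report <- data.frame(
    Run = c("cell_01", "cell_01", "cell_02", "cell_02", "cell_02"),
    Precursor.Id = c("PEPTIDEK2", "ELVISK2", "PEPTIDEK2", "ELVISK2",
                     "LIVESR3"),
    Protein.Group = c("P1", "P2", "P1", "P2", "P2"),
    Precursor.Quantity = c(120, 85, 98, 110, 42),
    stringsAsFactors = FALSE
)

## One table per acquisition run, that is, one per single cell.
by_run <- split(report, report$Run)

## Each run becomes a single-column matrix with precursors as rows.
assays <- lapply(by_run, function(x) {
    matrix(x$Precursor.Quantity, ncol = 1,
           dimnames = list(x$Precursor.Id, x$Run[1]))
})

## Precursor-to-protein-group mapping that could be used to link the
## precursor assays to a protein-level assay.
links <- unique(report[, c("Precursor.Id", "Protein.Group")])

## Number of precursors quantified in each run.
vapply(assays, nrow, integer(1))
```

In the packaged dataset these per-run tables are stored as [SingleCellExperiment] assays inside a [QFeatures] object rather than as plain matrices.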
87 changes: 87 additions & 0 deletions R/cong2020AC.R
@@ -0,0 +1,87 @@
##' Cong et al. 2020 (Anal. Chem.): HeLa single cells
##'
##' Single-cell proteomics using the nanoPOTS sample processing device
##' in combination with ultranarrow-bore (20um i.d.) packed-column LC
##' separations and the Orbitrap Eclipse Tribrid MS. The dataset
##' contains label-free quantitative information at PSM, peptide and
##' protein level. The samples are single HeLa cells. Bulk samples
##' (100 and 20 cells) were also included in the experiment to
##' increase the identification rate thanks to between-run matching
##' (cf. MaxQuant).
##'
##' @format A [QFeatures] object with 9 assays, each assay being a
##' [SingleCellExperiment] object:
##'
##' - `100/20 HeLa cells`: 2 assays containing PSM data for a bulk
##' of 100 or 20 HeLa cells, respectively.
##' - `Blank`: assay containing the PSM data for a blank sample.
##' - `Single cell X`: 4 assays containing PSM data for a single cell.
##' The `X` indicates the replicate number.
##' - `peptides`: quantitative data for 12590 peptides in 7 samples
##' (all runs combined).
##' - `proteins`: quantitative data for 1801 proteins in 7 samples
##' (all runs combined).
##'
##' Sample annotation is stored in `colData(cong2020AC())`.
##'
##' @section Acquisition protocol:
##'
##' The data were acquired using the following setup. More information
##' can be found in the source article (see `References`).
##'
##' - **Cell isolation**: The HeLa cells were diluted and aspirated
##' using a microcapillary with a pulled tip.
##' - **Sample preparation**: performed using the nanoPOTS device.
##' Protein extraction using RapiGest (+ DTT) + alkylation (IAA) +
##' Lys-C digestion + cleave RapiGest (formic acid).
##' - **Separation**: UltiMate 3000 RSLCnano pump with a home-packed
##' nanoLC column (60cm x 20um i.d.; approx. 20 nL/min)
##' - **Ionization**: ESI (2,000V; Nanospray Flex)
##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Eclipse.
##' MS1 settings: accumulation time = 246ms; resolution = 120,000;
##' AGC = 1E6. MS/MS settings depend on quantity. All: AGC = 1E5.
##' 20-100 cells: accumulation time = 246ms; resolution = 120,000.
##' Single cells: accumulation time = 500ms; resolution = 240,000.
##' - **Data analysis**: MaxQuant (v1.6.3.3) + Excel
##'
##' @section Data collection:
##'
##' The PSM, peptide and protein data were collected from the PRIDE
##' repository (accession ID: PXD016921). We downloaded the
##' `evidence.txt` file containing the PSM identification and
##' quantification results. The sample annotation was inferred from
##' the sample names. The data were then converted to a [QFeatures]
##' object using the [scp::readSCP()] function.
##'
##' The peptide data were processed similarly from the `peptides.txt`
##' file. The quantitative column names were adapted to match the PSM
##' data. The peptide data were added to the [QFeatures] object and the
##' links between the features were stored.
##'
##' The protein data were similarly processed from the
##' `proteinGroups.txt` file. The quantitative column names were
##' adapted to match the PSM data. The protein data were added to the
##' [QFeatures] object and the links between the features were stored.
##'
##' @source
##' All files can be downloaded from the PRIDE repository PXD016921.
##' The source link is:
##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/02/PXD016921
##'
##' @references
##'
##' Cong, Yongzheng, Yiran Liang, Khatereh Motamedchaboki, Romain
##' Huguet, Thy Truong, Rui Zhao, Yufeng Shen, Daniel Lopez-Ferrer,
##' Ying Zhu, and Ryan T. Kelly. 2020. “Improved Single-Cell Proteome
##' Coverage Using Narrow-Bore Packed NanoLC Columns and
##' Ultrasensitive Mass Spectrometry.” Analytical Chemistry, January.
##' ([link to article](https://doi.org/10.1021/acs.analchem.9b04631)).
##'
##' @examples
##' \donttest{
##' cong2020AC()
##' }
##'
##' @keywords datasets
##'
"cong2020AC"
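
Two steps documented for `cong2020AC`, inferring the sample annotation from the sample names and adapting the quantitative column names so that peptide and PSM data match, can be sketched in base R as follows. This is an illustration under assumptions, not the script used to build the dataset: the run names are modelled on the assay names listed in the documentation, and the `Intensity ` column prefix and the `annot` column names are hypothetical.

```r
## Hypothetical run names, modelled on the documented assay names.
runs <- c("100 HeLa cells", "20 HeLa cells", "Blank",
          "Single cell 1", "Single cell 2")

## Infer a sample type from each name.
is_single <- grepl("^Single cell", runs)
sample_type <- ifelse(is_single, "single cell",
                      ifelse(grepl("HeLa cells$", runs), "bulk", "blank"))

## Replicate numbers are only defined for single-cell runs.
replicate <- rep(NA_integer_, length(runs))
replicate[is_single] <- as.integer(sub("^Single cell ", "",
                                       runs[is_single]))

## Sample annotation table, one row per run (column names are
## illustrative).
annot <- data.frame(Run = runs, SampleType = sample_type,
                    Replicate = replicate, stringsAsFactors = FALSE)

## Assumed peptide-level column names carrying a prefix; stripping the
## prefix makes them match the run names used for the PSM data.
pep_cols <- paste("Intensity", runs)
adapted <- sub("^Intensity ", "", pep_cols)

annot
```

Matching names across the PSM, peptide and protein tables is what allows the samples to be aligned when the tables are stored together in one object.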