From e3070d07622c48f0644dd37d66af60bb2c872324 Mon Sep 17 00:00:00 2001
From: Laurent Gatto <laurent.gatto@uclouvain.be>
Date: Wed, 17 Apr 2024 10:11:44 +0200
Subject: [PATCH] split data.R

---
 NEWS.md                     |    1 +
 R/brunner2022.R             |   85 ++
 R/cong2020AC.R              |   87 ++
 R/data.R                    | 2833 -----------------------------------
 R/derks2022.R               |  103 ++
 R/dou2019_boosting.R        |  123 ++
 R/dou2019_lysates.R         |  119 ++
 R/dou2019_mouse.R           |  126 ++
 R/gregoire2023_mixCTRL.R    |  108 ++
 R/guise2024.R               |   99 ++
 R/khan2023.R                |  110 ++
 R/leduc2022_pSCoPE.R        |  129 ++
 R/leduc2022_plexDIA.R       |  121 ++
 R/liang2020_hela.R          |   98 ++
 R/petrosius2023_AstralAML.R |  121 ++
 R/petrosius2023_mES.R       |  113 ++
 R/schoof2021.R              |  124 ++
 R/specht2019v2.R            |  105 ++
 R/specht2019v3.R            |  110 ++
 R/williams2020_lfq.R        |  118 ++
 R/williams2020_tmt.R        |   96 ++
 R/woo2022_lung.R            |   94 ++
 R/woo2022_macrophage.R      |   94 ++
 R/zhu2018MCP.R              |   86 ++
 R/zhu2018NC_hela.R          |   86 ++
 R/zhu2018NC_islets.R        |   82 +
 R/zhu2018NC_lysates.R       |   85 ++
 R/zhu2019EL.R               |  104 ++
 28 files changed, 2727 insertions(+), 2833 deletions(-)
 create mode 100644 R/brunner2022.R
 create mode 100644 R/cong2020AC.R
 delete mode 100644 R/data.R
 create mode 100644 R/derks2022.R
 create mode 100644 R/dou2019_boosting.R
 create mode 100644 R/dou2019_lysates.R
 create mode 100644 R/dou2019_mouse.R
 create mode 100644 R/gregoire2023_mixCTRL.R
 create mode 100644 R/guise2024.R
 create mode 100644 R/khan2023.R
 create mode 100644 R/leduc2022_pSCoPE.R
 create mode 100644 R/leduc2022_plexDIA.R
 create mode 100644 R/liang2020_hela.R
 create mode 100644 R/petrosius2023_AstralAML.R
 create mode 100644 R/petrosius2023_mES.R
 create mode 100644 R/schoof2021.R
 create mode 100644 R/specht2019v2.R
 create mode 100644 R/specht2019v3.R
 create mode 100644 R/williams2020_lfq.R
 create mode 100644 R/williams2020_tmt.R
 create mode 100644 R/woo2022_lung.R
 create mode 100644 R/woo2022_macrophage.R
 create mode 100644 R/zhu2018MCP.R
 create mode 100644 R/zhu2018NC_hela.R
 create mode 100644 R/zhu2018NC_islets.R
 create mode 100644 R/zhu2018NC_lysates.R
 create mode 100644 R/zhu2019EL.R

diff --git a/NEWS.md b/NEWS.md
index 8895c93..a23a710 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -3,6 +3,7 @@
 ## scpdata 1.11.4
 
 - fix non-ASCII characters
+- split data.R into multiple data specific files
 
 ## scpdata 1.11.3
 
diff --git a/R/brunner2022.R b/R/brunner2022.R
new file mode 100644
index 0000000..7787ff1
--- /dev/null
+++ b/R/brunner2022.R
@@ -0,0 +1,85 @@
+##' Brunner et al. 2022 (Mol. Syst. Biol.): cell cycle state study
+##'
+##' Single cell proteomics data acquired by the Mann Lab using a newly
+##' designed timsTOF instrument, referred to as timsTOF-SCP. The
+##' dataset contains quantitative information from single-cells blocked
+##' at 4 cell cycle stages: G1, G1-S, G2, G2-M. The data was acquired
+##' using a label-free sample preparation protocole combined to a
+##' data independent (DIA) acquisition mode.
+##'
+##' @format A [QFeatures] object with 435 assays, each assay being a
+##' [SingleCellExperiment] object.
+##'
+##' - Assay 1-434: DIA-NN main output report table split for each
+##'   acquisition run. Since each run acquires 1 single cell, each
+##'   assay contains a single column. It contains the results
+##'   of the spectrum identification and quantification.
+##' - `protein`: DIA-NN protein group matrix, containing normalised
+##'   quantities for 2476 protein groups in 434 single cells. Proteins
+##'   are filtered at 1% FDR, using global q-values for protein groups
+##'   and both global and run-specific q-values for precursors.
+##'
+##' The `colData(brunner2022())` contains cell type annotations and
+##' batch annotations. The description of the `rowData` fields for the
+##' different assays can be found in the
+##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: cells were detached with trypsin treatment,
+##'   followed by strong pipetting, and isolate using FACS.
+##' - **Sample preparation**: cell lysis by freeze-heat followed by
+##'   sonication, overnight protein digestion with trypsin/lysC mix and
+##'   desalting using EvoTips trap column (EvoSep)
+##' - **Separation**: online EvoSep One LC system using a 5 cm x 75 µm
+##'   ID column with 1.9µm C18 beads (EvoSep) at 100nL/min flow rate.
+##' - **Ionization**: 10µm ID zero dead volume electrospray emitter
+##'   (Bruker Daltonik) + nanoelectro-spray ion source (Captive spray,
+##'   Bruker Daltonik)
+##' - **Mass spectrometry**: DIA PASEF mode. Correlation between IM
+##'   and m/z was used to synchronize the elution of precursors from
+##'   each IM scan with the quadrupole isolation window. Five
+##'   consecutive diaPASEF cycles. The collision energy was ramped
+##'   linearly as a function of the IM from 59 eV at 1/K0=1.6 Vs cm^2
+##'   to 20 eV at 1/K0=0.6 Vs cm^2.
+##' - **Data analysis**: DIA-NN (1.8).
+##'
+##' @section Data collection:
+##'
+##' The data were collected from the PRIDE
+##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD024043)
+##' in the `DIANN1.8_SingleCells_CellCycle.zip` file.
+##'
+##' We loaded the DIA-NN main report table and generated a sample
+##' annotation table based on the MS file names. We next combined the
+##' sample annotation and the DIANN tables into a [QFeatures] object
+##' following the `scp` data structure. We loaded the proteins group
+##' matrix as a [SingleCellExperiment] object, fixed ambiguous
+##' protein group names, and added the protein data as a new assay and
+##' link the precursors to proteins using the `Protein.Group` variable
+##' from the `rowData`.
+##'
+##' @source
+##' The data were downloaded from PRIDE
+##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD024043)
+##' with accession ID `PXD024043`.
+##'
+##' @references
+##' Brunner, Andreas-David, Marvin Thielert, Catherine Vasilopoulou,
+##' Constantin Ammar, Fabian Coscia, Andreas Mund, Ole B. Hoerning, et
+##' al. 2022. "Ultra-High Sensitivity Mass Spectrometry Quantifies
+##' Single-Cell Proteome Changes upon Perturbation." Molecular Systems
+##' Biology 18 (3): e10798.
+##' [Link to article](http://dx.doi.org/10.15252/msb.202110798)
+##'
+##' @examples
+##' \donttest{
+##' brunner2022()
+##' }
+##'
+##' @keywords datasets
+##'
+"brunner2022"
diff --git a/R/cong2020AC.R b/R/cong2020AC.R
new file mode 100644
index 0000000..7076ca7
--- /dev/null
+++ b/R/cong2020AC.R
@@ -0,0 +1,87 @@
+##' Cong et al. 2020 (Ana. Chem.): HeLa single cells
+##'
+##' Single-cell proteomics using the nanoPOTS sample processing device
+##' in combination with ultranarrow-bore (20um i.d.) packed-column LC
+##' separations and the Orbitrap Eclipse Tribrid MS. The dataset
+##' contains label-free quantitative information at PSM, peptide and
+##' protein level. The samples are single Hela cells. Bulk samples
+##' (100 and 20 cells) were also included in the experiment to
+##' increase the idendtification rate thanks to between-run matching
+##' (cf MaxQuant).
+##'
+##' @format A [QFeatures] object with 9 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `100/20 HeLa cells`: 2 assays containing PSM data for a bulk
+##'   of 100 or 20 HeLa cells, respectively.
+##' - `Blank`: assay containing the PSM data for a blank sample
+##' - `Single cell X`: 4 assays containing PSM data for a single cell.
+##'   The `X` indicates the replicate number.
+##' - `peptides`: quantitative data for 12590 peptides in 7 samples
+##'   (all runs combined).
+##' - `proteins`: quantitative data for 1801 proteins in 7 samples
+##'   (all runs combined).
+##'
+##' Sample annotation is stored in `colData(cong2020AC())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: The HeLa cells were diluted and aspired
+##'   using a microcapillary with a pulled tip.
+##' - **Sample preparation** performed using the nanoPOTs device.
+##'   Protein extraction using RapiGest (+ DTT) + alkylation (IAA) +
+##'   Lys-C digestion + cleave RapiGest (formic acid)
+##' - **Separation**: UltiMate 3000 RSLCnano pump with a home-packed
+##'   nanoLC column (60cm x 20um i.d.; approx. 20 nL/min)
+##' - **Ionization**: ESI (2,000V; Nanospray Flex)
+##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Eclipse.
+##'   MS1 settings: accumulation time = 246ms; resolution = 120,000;
+##'   AGC = 1E6. MS/MS settings depend on quantity. All: AGC = 1E5.
+##'   20-100 cels: accumulation time = 246ms; resolution = 120,000.
+##'   Single cells: accumulation time = 500ms; resolution = 240,000.
+##' - **Data analysis**: MaxQuant (v1.6.3.3) + Excel
+##'
+##' @section Data collection:
+##'
+##' The PSM, peptide and protein data were collected from the PRIDE
+##' repository (accession ID: PXD016921).  We downloaded the
+##' `evidence.txt` file containing the PSM identification and
+##' quantification results. The sample annotation was inferred from
+##' the samples names. The data were then converted to a [QFeatures]
+##' object using the [scp::readSCP()] function.
+##'
+##' The peptide data were processed similarly from the `peptides.txt`
+##' file. The quantitative column names were adpated to match the PSM
+##' data. The peptide data were added to [QFeatures] object and link
+##' between the features were stored.
+##'
+##' The protein data were similarly processed from the
+##' `proteinGroups.txt` file. The quantitative column names were
+##' adapted to match the PSM data. The peptide data were added to
+##' [QFeatures] object and link between the features were stored.
+##'
+##' @source
+##' All files can be downloaded from the PRIDE repository PXD016921.
+##' The source link is:
+##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/02/PXD016921
+##'
+##' @references
+##'
+##' Cong, Yongzheng, Yiran Liang, Khatereh Motamedchaboki, Romain
+##' Huguet, Thy Truong, Rui Zhao, Yufeng Shen, Daniel Lopez-Ferrer,
+##' Ying Zhu, and Ryan T. Kelly. 2020. “Improved Single-Cell Proteome
+##' Coverage Using Narrow-Bore Packed NanoLC Columns and
+##' Ultrasensitive Mass Spectrometry.” Analytical Chemistry, January.
+##' ([link to article](https://doi.org/10.1021/acs.analchem.9b04631)).
+##'
+##' @examples
+##' \donttest{
+##' cong2020AC()
+##' }
+##'
+##' @keywords datasets
+##'
+"cong2020AC"
diff --git a/R/data.R b/R/data.R
deleted file mode 100644
index 0d84857..0000000
--- a/R/data.R
+++ /dev/null
@@ -1,2833 +0,0 @@
-####---- specht2019v2 ----####
-
-
-##' Specht et al. 2019 - SCoPE2 (biorRxiv): macrophages vs monocytes
-##' (version 2)
-##'
-##' @description
-##'
-##' Single cell proteomics data acquired by the Slavov Lab. This is
-##' the version 2 of the data released in December 2019. It contains
-##' quantitative information of macrophages and monocytes at PSM,
-##' peptide and protein level.
-##'
-##' @format A [QFeatures] object with 179 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - Assay 1-63: PSM data for SCoPE2 sets acquired with a TMT-11plex
-##'   protocol, hence those assays contain 11 columns. Columns
-##'   hold quantitative information from single-cell channels, carrier
-##'   channels, reference channels, empty (blank) channels and unused
-##'   channels.
-##' - Assay 64-177: PSM data for SCoPE2 sets acquired with a
-##'   TMT-16plex protocol, hence those assays contain 16 columns.
-##'   Columns hold quantitative information from single-cell channels,
-##'   carrier channels, reference channels, empty (blank) channels and
-##'   unused channels.
-##' - `peptides`: peptide data containing quantitative data for 9208
-##'   peptides and 1018 single-cells.
-##' - `proteins`: protein data containing quantitative data for 2772
-##'   proteins and 1018 single-cells.
-##'
-##' The `colData(specht2019v2())` contains cell type annotation and
-##' batch annotation that are common to all assays. The description of
-##' the `rowData` fields for the PSM data can be found in the
-##' [`MaxQuant` documentation](http://www.coxdocs.org/doku.php?id=maxquant:table:evidencetable).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: flow cytometry (BD FACSAria I).
-##' - **Sample preparation** performed using the SCoPE2 protocol. mPOP
-##'   cell lysis + trypsin digestion + TMT-11plex or 16plex labelling
-##'   and pooling.
-##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
-##'   25cm x 75um IonOpticksAurora Series UHPLC column; 200nL/min).
-##' - **Ionization**: ESI (2,200V).
-##' - **Mass spectrometry**: Thermo Scientific Q-Exactive (MS1
-##'   resolution = 70,000; MS1 accumulation time = 300ms; MS2
-##'   resolution = 70,000).
-##' - **Data analysis**: DART-ID + MaxQuant (1.6.2.3).
-##'
-##' @section Data collection:
-##'
-##' The PSM data were collected from a shared Google Drive folder that
-##' is accessible from the SlavovLab website (see `Source` section).
-##' The folder contains the following files of interest:
-##'
-##' - `ev_updated.txt`: the MaxQuant/DART-ID output file
-##' - `annotation_fp60-97.csv`: sample annotation
-##' - `batch_fp60-97.csv`: batch annotation
-##'
-##' We combined the sample annotation and the batch annotation in
-##' a single table. We also formatted the quantification table so that
-##' columns match with those of the annotation and filter only for
-##' single-cell runs. Both table are then combined in a single
-##' [QFeatures] object using the [scp::readSCP()] function.
-##'
-##' The peptide data were taken from the Slavov lab directly
-##' (`Peptides-raw.csv`). It is provided as a spreadsheet. The data
-##' were formatted to a [SingleCellExperiment] object and the sample
-##' metadata were matched to the column names (mapping is retrieved
-##' after running the SCoPE2 R script) and stored in the `colData`.
-##' The object is then added to the [QFeatures] object (containing the
-##' PSM assays) and the rows of the peptide data are linked to the
-##' rows of the PSM data based on the peptide sequence information
-##' through an `AssayLink` object.
-##'
-##' The protein data (`Proteins-processed.csv`) is formatted similarly
-##' to the peptide data, and the rows of the proteins were mapped onto
-##' the rows of the peptide data based on the protein sequence
-##' information.
-##'
-##' @source
-##' The data were downloaded from the
-##' [Slavov Lab](https://scope2.slavovlab.net/mass-spec/data) website via a
-##' shared Google Drive
-##' [folder](https://drive.google.com/drive/folders/1VzBfmNxziRYqayx3SP-cOe2gu129Obgx).
-##' The raw data and the quantification data can also be found in the
-##' massIVE repository `MSV000083945`:
-##' ftp://massive.ucsd.edu/MSV000083945.
-##'
-##' @references Specht, Harrison, Edward Emmott, Aleksandra A.
-##' Petelski, R. Gray Huffman, David H. Perlman, Marco Serra, Peter
-##' Kharchenko, Antonius Koller, and Nikolai Slavov. 2019.
-##' "Single-Cell Mass-Spectrometry Quantifies the Emergence of
-##' Macrophage Heterogeneity." bioRxiv.
-##' ([link to article](https://doi.org/10.1101/665307)).
-##'
-##' @examples
-##' \donttest{
-##' specht2019v2()
-##' }
-##'
-##' @keywords datasets
-##'
-"specht2019v2"
-
-####---- specht2019v3 ----####
-
-##' Specht et al. 2019 - SCoPE2 (biorRxiv): macrophages vs monocytes
-##' (version 3)
-##'
-##' Single cell proteomics data acquired by the Slavov Lab. This is
-##' the version 3 of the data released in October 2020. It contains
-##' quantitative information of macrophages and monocytes at PSM,
-##' peptide and protein level.
-##'
-##' @format A [QFeatures] object with 179 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - Assay 1-63: PSM data for SCoPE2 sets acquired with a TMT-11plex
-##'   protocol, hence those assays contain 11 columns. Columns
-##'   hold quantitative information from single-cell channels, carrier
-##'   channels, reference channels, empty (blank) channels and unused
-##'   channels.
-##' - Assay 64-177: PSM data for SCoPE2 sets acquired with a
-##'   TMT-16plex protocol, hence those assays contain 16 columns.
-##'   Columns hold quantitative information from single-cell channels,
-##'   carrier channels, reference channels, empty (blank) channels and
-##'   unused channels.
-##' - `peptides`: peptide data containing quantitative data for 9208
-##'   peptides and 1018 single-cells.
-##' - `proteins`: protein data containing quantitative data for 2772
-##'   proteins and 1018 single-cells.
-##'
-##' The `colData(specht2019v2())` contains cell type annotation and
-##' batch annotation that are common to all assays. The description of
-##' the `rowData` fields for the PSM data can be found in the
-##' [`MaxQuant` documentation](http://www.coxdocs.org/doku.php?id=maxquant:table:evidencetable).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: flow cytometry (BD FACSAria I).
-##' - **Sample preparation** performed using the SCoPE2 protocol. mPOP
-##'   cell lysis + trypsin digestion + TMT-11plex or 16plex labeling
-##'   and pooling.
-##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
-##'   25cm x 75um IonOpticksAurora Series UHPLC column; 200nL/min).
-##' - **Ionization**: ESI (2,200V).
-##' - **Mass spectrometry**: Thermo Scientific Q-Exactive (MS1
-##'   resolution = 70,000; MS2 accumulation time = 300ms; MS2
-##'   resolution = 70,000).
-##' - **Data analysis**: DART-ID + MaxQuant (1.6.2.3).
-##'
-##' @section Data collection:
-##'
-##' The PSM data were collected from a shared Google Drive folder that
-##' is accessible from the SlavovLab website (see `Source` section).
-##' The folder contains the following files of interest:
-##'
-##' - `ev_updated_v2.txt`: the MaxQuant/DART-ID output file
-##' - `annotation_fp60-97.csv`: sample annotation
-##' - `batch_fp60-97.csv`: batch annotation
-##'
-##' We combined the sample annotation and the batch annotation in
-##' a single table. We also formatted the quantification table so that
-##' columns match with those of the annotation and filter only for
-##' single-cell runs. Both table are then combined in a single
-##' [QFeatures] object using the [scp::readSCP()] function.
-##'
-##' The peptide data were taken from the Slavov lab directly
-##' (`Peptides-raw.csv`). It is provided as a spreadsheet. The data
-##' were formatted to a [SingleCellExperiment] object and the sample
-##' metadata were matched to the column names (mapping is retrieved
-##' after running the SCoPE2 R script) and stored in the `colData`.
-##' The object is then added to the [QFeatures] object (containing the
-##' PSM assays) and the rows of the peptide data are linked to the
-##' rows of the PSM data based on the peptide sequence information
-##' through an `AssayLink` object.
-##'
-##' The protein data (`Proteins-processed.csv`) is formatted similarly
-##' to the peptide data, and the rows of the proteins were mapped onto
-##' the rows of the peptide data based on the protein sequence
-##' information.
-##'
-##' @note Since version 2, a serious bug in the data were corrected
-##' for TMT channels 12 to 16. Many more cells are therefore contained
-##' in the data. Version 2 is maintained for backward compatibility.
-##' Although the final version of the article was published in 2021,
-##' we have kept `specht2019v3` as the data set name for consistency
-##' with the previous data version `specht2019v2`.
-##'
-##' @source
-##' The data were downloaded from the
-##' [Slavov Lab](https://scope2.slavovlab.net/docs/data) website via a
-##' shared Google Drive
-##' [folder](https://drive.google.com/drive/folders/1VzBfmNxziRYqayx3SP-cOe2gu129Obgx).
-##' The raw data and the quantification data can also be found in the
-##' massIVE repository `MSV000083945`:
-##' ftp://massive.ucsd.edu/MSV000083945.
-##'
-##' @references Specht, Harrison, Edward Emmott, Aleksandra A.
-##' Petelski, R. Gray Huffman, David H. Perlman, Marco Serra, Peter
-##' Kharchenko, Antonius Koller, and Nikolai Slavov. 2021.
-##' "Single-Cell Proteomic and Transcriptomic Analysis of Macrophage
-##' Heterogeneity Using SCoPE2." Genome Biology 22 (1): 50.
-##' ([link to article](http://dx.doi.org/10.1186/s13059-021-02267-5)).
-##'
-##' @examples
-##' \donttest{
-##' specht2019v3()
-##' }
-##'
-##' @keywords datasets
-##'
-"specht2019v3"
-
-
-####---- dou2019_lysates ----####
-
-
-##' Dou et al. 2019 (Anal. Chem.): HeLa lysates
-##'
-##' @description
-##'
-##' Single-cell proteomics using nanoPOTS combined with TMT
-##' multiplexing. It contains quantitative information at PSM and
-##' protein level. The samples are commercial Hela lysates diluted to
-##' single-cell amounts (0.2 ng). The boosting wells contain the same
-##' digest but at higher amount (10 ng).
-##'
-##' @format A [QFeatures] object with 3 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `Hela_run_1`: PSM data with 10 columns corresponding to the
-##'   TMT-10plex channels. Columns hold quantitative information for
-##'   HeLa lysate samples (either 0, 0.2 or 10ng). This is the data
-##'   for run 1.
-##' - `Hela_run_1`: PSM data with 10 columns corresponding to the
-##'   TMT-10plex channels. Columns hold quantitative information for
-##'   HeLa lysate samples (either 0, 0.2 or 10ng). This is the data
-##'   for run 2.
-##' - `peptides`: peptide data containing quantitative data for 13,934
-##'   peptides in 20 samples (run 1 and run 2 combined).
-##' - `proteins`: protein data containing quantitative data for 1641
-##'   proteins in 20 samples (run 1 and run 2 combined).
-##'
-##' Sample annotation is stored in `colData(dou2019_lysates())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: commercially available HeLa protein digest
-##'   (Thermo Scientific).
-##' - **Sample preparation** performed using the nanoPOTs device.
-##'   Protein extraction (DMM + TCEAP) + alkylation (IAA) + Lys-C
-##'   digestion + trypsin digestion + TMT-10plex labeling and pooling.
-##' - **Separation**: nanoLC (Dionex UltiMate with an in-house packed
-##'   50cm x 30um LC columns; 50nL/min)
-##' - **Ionization**: ESI (2,000V)
-##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
-##'   Tribrid (MS1 accumulation time = 50ms; MS1 resolution = 120,000;
-##'   MS1 AGC = 1E6; MS2 accumulation time = 246ms; MS2 resolution =
-##'   60,000; MS2 AGC = 1E5)
-##' - **Data analysis**: MS-GF+ + MASIC (v3.0.7111) + RomicsProcessor
-##'   (custom R package)
-##'
-##' @section Data collection:
-##'
-##' The PSM data were collected from the MassIVE repository
-##' MSV000084110 (see `Source` section). The downloaded files are:
-##'
-##' - `Hela_run_*_msgfplus.mzid`: the MS-GF+ identification result
-##'   files
-##' - `Hela_run_*_ReporterIons.txt`: the MASIC quantification result
-##'   files
-##'
-##' For each batch, the quantification and identification data were
-##' combined based on the scan number (common to both data sets). The
-##' combined datasets for the different runs were then concatenated
-##' feature-wise. To avoid data duplication due to ambiguous matching
-##' of spectra to peptides or ambiguous mapping of peptides to proteins,
-##' we combined ambiguous peptides to peptides groups and proteins to
-##' protein groups. Feature annotations that are not common within a
-##' peptide or protein group are are separated by a `;`. The sample
-##' annotation table was manually created based on the available
-##' information provided in the article. The data were then converted
-##' to a [QFeatures] object using the [scp::readSCP()] function.
-##'
-##' We generated the peptide data. First, we removed PSM matched to
-##' contaminants or decoy peptides and ensured a 1% FDR. We aggregated
-##' the PSM to peptides based on the peptide (or peptide group)
-##' sequence(s) using the median PSM instenity. The peptide data for
-##' the different runs were then joined in a single assay (see
-##' [QFeatures::joinAssays]), again based on the peptide sequence(s).
-##' We then removed the peptide groups. Links between the peptide and
-##' the PSM data were created using [QFeatures::addAssayLink]. Note
-##' that links between PSM and peptide groups are not stored.
-##'
-##' The protein data were downloaded from `Supporting information`
-##' section from the publisher's website (see `Sources`). The data is
-##' supplied as an Excel file `ac9b03349_si_003.xlsx`. The file
-##' contains 7 sheets from which we only took the sheet 6 (named
-##' `5 - Run 1 and 2 raw data`) with the combined protein data for the
-##' two runs. We converted the data to a [SingleCellExperiment]
-##' object and added the object as a new assay in the [QFeatures]
-##' dataset (containing the PSM data). Links between the proteins and
-##' the peptides were created. Note that links to protein groups are
-##' not stored.
-##'
-##' @source
-##' The PSM data can be downloaded from the massIVE repository
-##' MSV000084110. FTP link: ftp://massive.ucsd.edu/MSV000084110/
-##'
-##' The protein data can be downloaded from the
-##' [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.analchem.9b03349)
-##' website (Supporting information section).
-##'
-##' @references
-##' Dou, Maowei, Geremy Clair, Chia-Feng Tsai, Kerui Xu, William B.
-##' Chrisler, Ryan L. Sontag, Rui Zhao, et al. 2019. “High-Throughput
-##' Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a
-##' Nanodroplet Sample Preparation Platform.” Analytical Chemistry,
-##' September
-##' ([link to article](https://doi.org/10.1021/acs.analchem.9b03349)).
-##'
-##' @seealso
-##' [dou2019_mouse], [dou2019_boosting]
-##'
-##' @examples
-##' \donttest{
-##' dou2019_lysates()
-##' }
-##'
-##' @keywords datasets
-##'
-##'
-"dou2019_lysates"
-
-
-####---- dou2019_mouse ----####
-
-
-##' Dou et al. 2019 (Anal. Chem.): murine cell lines
-##'
-##' @description
-##'
-##' Single-cell proteomics using nanoPOTS combined with TMT isobaric
-##' labeling. It contains quantitative information at PSM and protein
-##' level. The cell types are either "Raw" (macrophage cells), "C10"
-##' (epihelial cells), or "SVEC" (endothelial cells). Out of the 132
-##' wells, 72 contain single cells, corresponding to 24 C10 cells, 24
-##' RAW cells, and 24 SVEC. The other wells are either boosting
-##' channels (12), empty channels (36) or reference channels (12).
-##' Boosting and reference channels are balanced (1:1:1) mixes of C10,
-##' SVEC, and RAW samples at 5 ng and 0.2 ng, respectively. The
-##' different cell types where evenly distributed across 4 nanoPOTS
-##' chips. Samples were 11-plexed with TMT labeling.
-##'
-##' @format A [QFeatures] object with 13 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `Single_Cell_Chip_X_Y`: PSM data with 11 columns corresponding
-##'   to the TMT channels (see `Notes`). The `X` indicates the chip
-##'   number (from 1 to 4) and `Y` indicates the row name on the chip
-##'   (from A to C).
-##' - `peptides`: peptide data containing quantitative data for 15,492
-##'   peptides in 132 samples (run 1 and run 2 combined).
-##' - `proteins`: protein data containing quantitative data for 2331
-##'   proteins in 132 samples (all runs combined).
-##'
-##' Sample annotation is stored in `colData(dou2019_mouse())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: single-cells from the three murine cell
-##'   lines were isolated using FACS (BD Influx II cell sorter ).
-##' - **Sample preparation** performed using the nanoPOTs device.
-##'   Protein extraction (DMM + TCEAP) + alkylation (IAA) + Lys-C
-##'   digestion + trypsin digestion + TMT-10plex labeling and pooling.
-##' - **Separation**: nanoLC (Dionex UltiMate with an in-house packed
-##'   50cm x 30um LC columns; 50nL/min)
-##' - **Ionization**: ESI (2,000V)
-##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
-##'   Tribrid (MS1 accumulation time = 50ms; MS1 resolution = 120,000;
-##'   MS1 AGC = 1E6; MS2 accumulation time = 246ms; MS2 resolution =
-##'   60,000; MS2 AGC = 1E5)
-##' - **Data analysis**: MS-GF+ + MASIC (v3.0.7111) + RomicsProcessor
-##'   (custom R package)
-##'
-##' @section Data collection:
-##'
-##' The PSM data were collected from the MassIVE repository
-##' MSV000084110 (see `Source` section). The downloaded files are:
-##'
-##'
-##' - `Single_Cell_Chip_*_*_msgfplus.mzid`: the MS-GF+ identification
-##'   result files.
-##' - `Single_Cell_Chip_*_*_ReporterIons.txt`: the MASIC
-##'   quantification result files.
-##'
-##' For each batch, the quantification and identification data were
-##' combined based on the scan number (common to both data sets). The
-##' combined datasets for the different runs were then concatenated
-##' feature-wise. To avoid data duplication due to ambiguous matching
-##' of spectra to peptides or ambiguous mapping of peptides to proteins,
-##' we combined ambiguous peptides to peptides groups and proteins to
-##' protein groups. Feature annotations that are not common within a
-##' peptide or protein group are are separated by a `;`. The sample
-##' annotation table was manually created based on the available
-##' information provided in the article. The data were then converted
-##' to a [QFeatures] object using the [scp::readSCP()] function.
-##'
-##' We generated the peptide data. First, we removed PSM matched to
-##' contaminants or decoy peptides and ensured a 1% FDR. We aggregated
-##' the PSM to peptides based on the peptide (or peptide group)
-##' sequence(s) using the median PSM instenity. The peptide data for
-##' the different runs were then joined in a single assay (see
-##' [QFeatures::joinAssays]), again based on the peptide sequence(s).
-##' We then removed the peptide groups. Links between the peptide and
-##' the PSM data were created using [QFeatures::addAssayLink]. Note
-##' that links between PSM and peptide groups are not stored.
-##'
-##' The protein data were downloaded from `Supporting information`
-##' section from the publisher's website (see `Sources`). The data is
-##' supplied as an Excel file `ac9b03349_si_005.xlsx`. The file
-##' contains 7 sheets from which we only took the 2nd (named
-##' `01 - Raw sc protein data`) with the combined protein data for the
-##' 12 runs. We converted the data to a [SingleCellExperiment] object
-##' and added the object as a new assay in the [QFeatures] dataset
-##' (containing the PSM data). Links between the proteins and the
-##' corresponding PSM were created. Note that links to protein groups
-##' are not stored.
-##'
-##' @note Although a TMT-10plex labeling is reported in the article,
-##' the PSM data contained 11 channels for each run. Those 11th
-##' channel contain mostly missing data and are hence assumed to be
-##' empty channels.
-##'
-##' @source
-##' The PSM data can be downloaded from the massIVE repository
-##' MSV000084110. FTP link: ftp://massive.ucsd.edu/MSV000084110/
-##'
-##' The protein data can be downloaded from the
-##' [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.analchem.9b03349)
-##' website (Supporting information section).
-##'
-##' @references
-##' Dou, Maowei, Geremy Clair, Chia-Feng Tsai, Kerui Xu, William B.
-##' Chrisler, Ryan L. Sontag, Rui Zhao, et al. 2019. “High-Throughput
-##' Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a
-##' Nanodroplet Sample Preparation Platform.” Analytical Chemistry,
-##' September
-##' ([link to article](https://doi.org/10.1021/acs.analchem.9b03349)).
-##'
-##' @seealso
-##' [dou2019_lysates], [dou2019_boosting]
-##'
-##' @examples
-##' \donttest{
-##' dou2019_mouse()
-##' }
-##'
-##' @keywords datasets
-##'
-"dou2019_mouse"
-
-
-####---- dou2019_boosting ----####
-
-
-##' Dou et al. 2019 (Anal. Chem.): testing boosting ratios
-##'
-##' @description
-##'
-##' Single-cell proteomics using nanoPOTS combined with TMT isobaric
-##' labeling. It contains quantitative information at PSM and protein
-##' level. The cell types are either "Raw" (macrophage cells), "C10"
-##' (epihelial cells), or "SVEC" (endothelial cells). Each cell is
-##' replicated 2 or 3 times. Each cell type was run using 3 levels of
-##' boosting: 0 ng (no boosting), 5 ng or 50 ng. When boosting was
-##' applied, 1 reference well and 1 boosting well were added,
-##' otherwise 1 empty well was added. Each boosting setting (0ng, 5ng,
-##' 50ng) was run in duplicate.
-##'
-##' @format A [QFeatures] object with 7 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `Boosting_X_run_Y`: PSM data with 10 columns corresponding to
-##'   the TMT-10plex channels. The `X` indicates the boosting amount
-##'   (0ng, 5ng or 50ng) and `Y` indicates the run number (1 or 2).
-##' - `peptides`: peptide data containing quantitative data for 13,462
-##'   peptides in 60 samples (run 1 and run 2 combined).
-##' - `proteins`: protein data containing quantitative data for 1436
-##'   proteins and 60 samples (all runs combined).
-##'
-##' Sample annotation is stored in `colData(dou2019_boosting())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: single-cells from the three murine cell
-##'   lines were isolated using FACS (BD Influx II cell sorter ).
-##'   Boosting sample were prepared (presumably in bulk) from 1:1:1
-##'   mix of the three cell lines.
-##' - **Sample preparation** performed using the nanoPOTs device.
-##'   Protein extraction (DMM + TCEAP) + alkylation (IAA) + Lys-C
-##'   digestion + trypsin digestion + TMT-10plex labeling and pooling.
-##' - **Separation**: nanoLC (Dionex UltiMate with an in-house packed
-##'   50cm x 30um LC columns; 50nL/min)
-##' - **Ionization**: ESI (2,000V)
-##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
-##'   Tribrid (MS1 accumulation time = 50ms; MS1 resolution = 120,000;
-##'   MS1 AGC = 1E6; MS2 accumulation time = 246ms; MS2 resolution =
-##'   60,000; MS2 AGC = 1E5)
-##' - **Data analysis**: MS-GF+ + MASIC (v3.0.7111) + RomicsProcessor
-##'   (custom R package)
-##'
-##' @section Data collection:
-##'
-##' The PSM data were collected from the MassIVE repository
-##' MSV000084110 (see `Source` section). The downloaded files are:
-##'
-##' - `Boosting_*ng_run_*_msgfplus.mzid`: the MS-GF+ identification
-##'   result files.
-##' - `Boosting_*ng_run_*_ReporterIons.txt`: the MASIC quantification
-##'   result files.
-##'
-##' For each batch, the quantification and identification data were
-##' combined based on the scan number (common to both data sets). The
-##' combined datasets for the different runs were then concatenated
-##' feature-wise. To avoid data duplication due to ambiguous matching
-##' of spectra to peptides or ambiguous mapping of peptides to proteins,
-##' we combined ambiguous peptides to peptides groups and proteins to
-##' protein groups. Feature annotations that are not common within a
-##' peptide or protein group are are separated by a `;`. The sample
-##' annotation table was manually created based on the available
-##' information provided in the article. The data were then converted
-##' to a [QFeatures] object using the [scp::readSCP()] function.
-##'
-##' We generated the peptide data. First, we removed PSM matched to
-##' contaminants or decoy peptides and ensured a 1% FDR. We aggregated
-##' the PSM to peptides based on the peptide (or peptide group)
-##' sequence(s) using the median PSM instenity. The peptide data for
-##' the different runs were then joined in a single assay (see
-##' [QFeatures::joinAssays]), again based on the peptide sequence(s).
-##' We then removed the peptide groups. Links between the peptide and
-##' the PSM data were created using [QFeatures::addAssayLink]. Note
-##' that links between PSM and peptide groups are not stored.
-##'
-##' The protein data were downloaded from `Supporting information`
-##' section from the publisher's website (see `Sources`). The data is
-##' supplied as an Excel file `ac9b03349_si_004.xlsx`. The file
-##' contains 7 sheets from which we took the 2nd, 4th and 6th sheets
-##' (named `01 - No Boost raw data`, `03 - 5ng boost raw data`,
-##' `05 - 50ng boost raw data`, respectively). The sheets contain the
-##' combined protein data for the duplicate runs given the boosting
-##' amount. We joined the data for all boosting ration based on the
-##' protein name and converted the data to a [SingleCellExperiment]
-##' object. We then added the object as a new assay in the [QFeatures]
-##' dataset (containing the PSM data). Links between the proteins and
-##' the corresponding PSM were created. Note that links to protein
-##' groups are not stored.
-##'
-##' @source
-##' The PSM data can be downloaded from the massIVE repository
-##' MSV000084110. FTP link: ftp://massive.ucsd.edu/MSV000084110/
-##'
-##' The protein data can be downloaded from the
-##' [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.analchem.9b03349)
-##' website (Supporting information section).
-##'
-##' @references
-##' Dou, Maowei, Geremy Clair, Chia-Feng Tsai, Kerui Xu, William B.
-##' Chrisler, Ryan L. Sontag, Rui Zhao, et al. 2019. “High-Throughput
-##' Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a
-##' Nanodroplet Sample Preparation Platform.” Analytical Chemistry,
-##' September
-##' ([link to article](https://doi.org/10.1021/acs.analchem.9b03349)).
-##'
-##' @seealso
-##' [dou2019_lysates], [dou2019_mouse]
-##'
-##' @examples
-##' \donttest{
-##' dou2019_boosting()
-##' }
-##'
-##' @keywords datasets
-##'
-##'
-"dou2019_boosting"
-
-
-####---- zhu2018MCP ----####
-
-
-##' Zhu et al. 2018 (Mol. Cel. Prot.): rat brain laser dissections
-##'
-##' Near single-cell proteomics data of laser captured
-##' micro-dissection samples. The samples are 24 brain sections from
-##' rat pups (day 17). The slices are 12 um thick squares of either
-##' 50, 100, or 200 um width. 5 samples were dissected from the corpus
-##' callum (`CC`), 4 samples were dissected from the corpus collosum
-##' (`CP`), 13 samples were extracted from the cerebral cortex
-##' (`CTX`), and 2 samples are labeled as (`Mix`).
-##'
-##' @format A [QFeatures] object with 4 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `peptides`: quantitative information for 13,055 peptides from
-##'   24 samples
-##' - `proteins_intensity`: protein intensities for 2,257 proteins
-##'   from 24 samples
-##' - `proteins_LFQ`: LFQ intensities for 2,257 proteins from 24 samples
-##' - `proteins_iBAQ`: iBAQ values for 2,257 proteins from 24 samples
-##'
-##' Sample annotation is stored in `colData(zhu2018MCP())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the original article (see `References`).
-##'
-##' - **Cell isolation**: brain patches were collected using
-##'   laser-capture microdissection (PALM MicroBeam) on flash frozen
-##'   rat (*Rattus norvergicus*) brain tissues. Note that the samples
-##'   were stained with H&E before dissection for histological
-##'   analysis. DMSO is used as sample collection solution
-##' - **Sample preparation** performed using the nanoPOTs device: DMSO
-##'   evaporation + protein extraction (DMM + DTT) + alkylation (IAA)
-##'   + Lys-C digestion + trypsin digestion.
-##' - **Separation**: nanoLC (Dionex UltiMate with an in-house packed
-##'   60cm x 30um LC columns; 50nL/min)
-##' - **Ionization**: ESI (2,000V)
-##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
-##'   Tribrid (MS1 accumulation time = 246ms; MS1 resolution =
-##'   120,000; MS1 AGC = 3E6). The MS/MS settings  depend on the
-##'   sample size, excepted for the AGC = 1E5. 50um (time = 502ms;
-##'   resolution = 240,000), 100um (time = 246ms; resolution =
-##'   120,000), 200um (time = 118ms; resolution = 60,000).
-##' - **Data analysis**: MaxQuant (v1.5.3.30) + Perseus (v1.5.6.0) +
-##'   Origin Pro 2017
-##'
-##' @section Data collection:
-##'
-##' The data were collected from the PRIDE repository (accession
-##' ID: PXD008844).  We downloaded the `MaxQuant_Peptides.txt`
-##' and the `MaxQuant_ProteinGroups.txt` files containing the
-##' combined identification and quantification
-##' results. The sample annotations were inferred from the names of
-##' columns holding the quantification data and the information in the
-##' article. The peptides data were converted to a [SingleCellExperiment]
-##' object. We split the protein table to separate the three types of
-##' quantification: protein intensity, label-free quantitification
-##' (LFQ) and intensity based absolute quantification (iBAQ). Each
-##' table is converted to a [SingleCellExperiment] object along with
-##' the remaining protein annotations. The 4 objects are combined in
-##' a single [QFeatures] object and feature links are created based on
-##' the peptide leading razor protein ID and the protein ID.
-##'
-##' @source
-##' The PSM data can be downloaded from the PRIDE repository
-##' PXD008844. FTP link
-##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2018/07/PXD008844
-##'
-##' @references
-##' Zhu, Ying, Maowei Dou, Paul D. Piehowski, Yiran Liang, Fangjun
-##' Wang, Rosalie K. Chu, William B. Chrisler, et al. 2018. “Spatially
-##' Resolved Proteome Mapping of Laser Capture Microdissected Tissue
-##' with Automated Sample Transfer to Nanodroplets.” Molecular &
-##' Cellular Proteomics: MCP 17 (9): 1864–74
-##' ([link to article](http://dx.doi.org/10.1074/mcp.TIR118.000686)).
-##'
-##' @examples
-##' \donttest{
-##' zhu2018MCP()
-##' }
-##'
-##' @keywords datasets
-##'
-##'
-"zhu2018MCP"
-
-
-####---- zhu2018NC_hela ----####
-
-
-##' Zhu et al. 2018 (Nat. Comm.): HeLa titration
-##'
-##' Near single-cell proteomics data of HeLa samples containing
-##' different number of cells. There are three groups of cell
-##' concentrations: low (10-14 cells), medium (35-45 cells) and high
-##' (137-141 cells). The data also contain measures for blanks, HeLa
-##' lysates (50 cell equivalent) and 2 cancer cell line lysates (MCF7
-##' and THP1, 50 cell equivalent).
-##'
-##' @format A [QFeatures] object with 4 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `peptides`: quantitative information for 37,795 peptides from
-##'   21 samples
-##' - `proteins_intensity`: protein intensities for 3,984 proteins
-##'   from 21 samples
-##' - `proteins_LFQ`: LFQ intensities for 3,984 proteins from 21
-##'   samples
-##' - `proteins_iBAQ`: iBAQ values for 3,984 proteins from 21 samples
-##'
-##' Sample annotation is stored in `colData(zhu2018NC_hela())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the original article (see `References`).
-##'
-##' - **Cell isolation**: HeLa cell concentration was adjusted by
-##'   serial dilution and cell counting was performed manually using
-##'   an inverted microscope.
-##' - **Sample preparation** performed using the nanoPOTs device.
-##'   Protein extraction using RapiGest (+ DTT) + alkylation (IAA) +
-##'   Lys-C digestion + cleave RapiGest (formic acid).
-##' - **Separation**: nanoACQUITY UPLC pump (60nL/min) with an
-##'   Self-Pack PicoFrit 70cm x 30um LC columns.
-##' - **Ionization**: ESI (1,900V).
-##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
-##'   Tribrid. MS1 settings: accumulation time = 246ms; resolution =
-##'   120,000; AGC = 1E6. MS/MS settings, depend on the sample size,
-##'   excepted for the AGC = 1E5. Blank and approx. 10 cells (time = 502ms;
-##'   resolution = 240,000), approx. 40 cells (time = 246ms; resolution =
-##'   120,000), approx. 140 cells (time = 118ms; resolution = 60,000).
-##' - **Data analysis**: MaxQuant (v1.5.3.30) + Perseus + OriginLab
-##'   2017
-##'
-##' @section Data collection:
-##'
-##' The data were collected from the PRIDE repository (accession
-##' ID: PXD006847).  We downloaded the `CulturedCells_peptides.txt`
-##' and the `CulturedCells_proteinGroups.txt` files containing the
-##' combined identification and quantification
-##' results. The sample annotations were inferred from the names of
-##' columns holding the quantification data and the information in the
-##' article. The peptides data were converted to a [SingleCellExperiment]
-##' object. We split the protein table to separate the three types of
-##' quantification: protein intensity, label-free quantitification
-##' (LFQ) and intensity based absolute quantification (iBAQ). Each
-##' table is converted to a [SingleCellExperiment] object along with
-##' the remaining protein annotations. The 4 objects are combined in
-##' a single [QFeatures] object and feature links are created based on
-##' the peptide leading razor protein ID and the protein ID.
-##'
-##' @source
-##' The PSM data can be downloaded from the PRIDE repository
-##' PXD006847. FTP link:
-##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2018/01/PXD006847
-##'
-##' @references
-##'
-##' Zhu, Ying, Paul D. Piehowski, Rui Zhao, Jing Chen, Yufeng Shen,
-##' Ronald J. Moore, Anil K. Shukla, et al. 2018. “Nanodroplet
-##' Processing Platform for Deep and Quantitative Proteome Profiling
-##' of 10-100 Mammalian Cells.” Nature Communications 9 (1): 882
-##' ([link to article](http://dx.doi.org/10.1038/s41467-018-03367-w)).
-##'
-##' @seealso The same experiment was conducted on HeLa lysates:
-##' [zhu2018NC_lysates].
-##'
-##' @examples
-##' \donttest{
-##' zhu2018NC_hela()
-##' }
-##'
-##' @keywords datasets
-##'
-"zhu2018NC_hela"
-
-
-####---- zhu2018NC_lysates ----####
-
-
-##' Zhu et al. 2018 (Nat. Comm.): HeLa lysates
-##'
-##' Near single-cell proteomics data of HeLa lysates at different
-##' concentrations (10, 40 and 140 cell equivalent). Each
-##' concentration is acquired in triplicate.
-##'
-##' @format A [QFeatures] object with 4 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `peptides`: quantitative information for 14,921 peptides from
-##'   9 lysate samples
-##' - `proteins_intensity`: quantitative information for 2,199
-##'   proteins from 9 lysate samples
-##' - `proteins_LFQ`: LFQ intensities for 2,199 proteins from 9 lysate
-##'   samples
-##' - `proteins_iBAQ`: iBAQ values for 2,199 proteins from 9 lysate
-##'   samples
-##'
-##' Sample annotation is stored in `colData(zhu2018NC_lysates())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the original article (see `References`).
-##'
-##' - **Cell isolation**: HeLas were collected from cell cultures.
-##' - **Sample preparation** performed in bulk (5E5 cells/mL). Protein
-##'   extraction using RapiGest (+ DTT) + dilution to target
-##'   concentration + alkylation (IAA) + Lys-C digestion + trypsin
-##'   digestion + cleave RapiGest (formic acid).
-##' - **Separation**: nanoACQUITY UPLC pump (60nL/min) with an
-##'   Self-Pack PicoFrit 70cm x 30um LC columns.
-##' - **Ionization**: ESI (1,900V).
-##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
-##'   Tribrid. MS1 settings: accumulation time = 246ms; resolution =
-##'   120,000; AGC = 1E6. MS/MS settings, depend on the sample size,
-##'   excepted for the AGC = 1E5. Blank and approx. 10 cells (time = 502ms;
-##'   resolution = 240,000), approx. 40 cells (time = 246ms; resolution =
-##'   120,000), approx. 140 cells (time = 118ms; resolution = 60,000).
-##' - **Data analysis**: MaxQuant (v1.5.3.30) + Perseus + OriginLab
-##'   2017.
-##'
-##' @section Data collection:
-##'
-##' The data were collected from the PRIDE repository (accession
-##' ID: PXD006847).  We downloaded the `Vail_Prep_Vail_peptides.txt`
-##' and the `Vail_Prep_Vail_proteinGroups.txt` files containing the
-##' combined identification and quantification
-##' results. The sample annotations were inferred from the names of
-##' columns holding the quantification data and the information in the
-##' article. The peptides data were converted to a [SingleCellExperiment]
-##' object. We split the protein table to separate the three types of
-##' quantification: protein intensity, label-free quantitification
-##' (LFQ) and intensity based absolute quantification (iBAQ). Each
-##' table is converted to a [SingleCellExperiment] object along with
-##' the remaining protein annotations. The 4 objects are combined in
-##' a single [QFeatures] object and feature links are created based on
-##' the peptide leading razor protein ID and the protein ID.
-##'
-##' @source
-##' The PSM data can be downloaded from the PRIDE repository
-##' PXD006847. The source link is:
-##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2018/01/PXD006847
-##'
-##' @references
-##'
-##' Zhu, Ying, Paul D. Piehowski, Rui Zhao, Jing Chen, Yufeng Shen,
-##' Ronald J. Moore, Anil K. Shukla, et al. 2018. “Nanodroplet
-##' Processing Platform for Deep and Quantitative Proteome Profiling
-##' of 10-100 Mammalian Cells.” Nature Communications 9 (1): 882
-##' ([link to article](http://dx.doi.org/10.1038/s41467-018-03367-w)).
-##'
-##' @seealso The same experiment was conducted directly on HeLa cells
-##' samples rather than lysates. The data is available in
-##' [zhu2018NC_hela].
-##'
-##' @examples
-##' \donttest{
-##' zhu2018NC_lysates()
-##' }
-##'
-##' @keywords datasets
-##'
-##'
-"zhu2018NC_lysates"
-
-
-####---- zhu2018NC_islets ----####
-
-
-##' Zhu et al. 2018 (Nat. Comm.): human pancreatic islets
-##'
-##'
-##' Near single-cell proteomics data human pancreas samples. The
-##' samples were collected from pancreatic tissue slices using laser
-##' dissection. The pancreata were obtained from organ donors through
-##' the JDRFNetwork for Pancreatic Organ Donors with Diabetes (nPOD)
-##' program. The sample come either from control patients (n=9) or
-##' from type 1 diabetes (T1D) patients (n=9).
-##'
-##' @format A [QFeatures] object with 4 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `peptides`: quantitative information for 24,321 peptides from
-##'   18 islet samples
-##' - `proteins_intensity`: quantitative information for 3,278
-##'   proteins from 18 islet samples
-##' - `proteins_LFQ`: LFQ intensities for 3,278 proteins from 18 islet
-##'   samples
-##' - `proteins_iBAQ`: iBAQ values for 3,278 proteins from 18 islet
-##'   samples
-##'
-##' Sample annotation is stored in `colData(zhu2018NC_islets())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: The islets were extracted from the pacreatic
-##'   tissues using laser-capture microdissection.
-##' - **Sample preparation** performed using the nanoPOTs device.
-##'   Protein extraction using RapiGest (+ DTT) + alkylation (IAA) +
-##'   Lys-C digestion + cleave RapiGest (formic acid)
-##' - **Separation**: nanoACQUITY UPLC pump with an Self-Pack PicoFrit
-##'   70cm x 30um LC columns; 60nL/min)
-##' - **Ionization**: ESI (1,900V)
-##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
-##'   Tribrid. MS1 settings: accumulation time = 246ms; resolution =
-##'   120,000; AGC = 1E6. MS/MS settings: accumulation time = 118ms;
-##'   resolution = 60,000; AGC = 1E5.
-##' - **Data analysis**: MaxQuant (v1.5.3.30) + Perseus + OriginLab
-##'   2017
-##'
-##' @section Data collection:
-##'
-##' The data were collected from the PRIDE repository (accession
-##' ID: PXD006847).  We downloaded the `Islet_t1d_ct_peptides.txt`
-##' and the `Islet_t1d_ct_proteinGroups.txt` files containing the
-##' combined identification and quantification results. The sample
-##' types were inferred from the names of columns holding the
-##' quantification data. The peptides data were converted to a
-##' [SingleCellExperiment] object. We split the protein table to
-##' separate the three types of quantification: protein intensity,
-##' label-free quantitification (LFQ) and intensity based absolute
-##' quantification (iBAQ). Each table is converted to a
-##' [SingleCellExperiment] object along with the remaining protein
-##' annotations. The 4 objects are combined in a single [QFeatures]
-##' object and feature links are created based on the peptide leading
-##' razor protein ID and the protein ID.
-##'
-##' @source
-##' The PSM data can be downloaded from the PRIDE repository
-##' PXD006847. The source link is:
-##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2018/01/PXD006847
-##'
-##' @references
-##'
-##' Zhu, Ying, Paul D. Piehowski, Rui Zhao, Jing Chen, Yufeng Shen,
-##' Ronald J. Moore, Anil K. Shukla, et al. 2018. “Nanodroplet
-##' Processing Platform for Deep and Quantitative Proteome Profiling
-##' of 10-100 Mammalian Cells.” Nature Communications 9 (1): 882
-##' ([link to article](http://dx.doi.org/10.1038/s41467-018-03367-w)).
-##'
-##' @examples
-##' \donttest{
-##' zhu2018NC_islets()
-##' }
-##'
-##' @keywords datasets
-##'
-"zhu2018NC_islets"
-
-
-####---- cong2020AC ----####
-
-
-##' Cong et al. 2020 (Ana. Chem.): HeLa single cells
-##'
-##' Single-cell proteomics using the nanoPOTS sample processing device
-##' in combination with ultranarrow-bore (20um i.d.) packed-column LC
-##' separations and the Orbitrap Eclipse Tribrid MS. The dataset
-##' contains label-free quantitative information at PSM, peptide and
-##' protein level. The samples are single Hela cells. Bulk samples
-##' (100 and 20 cells) were also included in the experiment to
-##' increase the idendtification rate thanks to between-run matching
-##' (cf MaxQuant).
-##'
-##' @format A [QFeatures] object with 9 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `100/20 HeLa cells`: 2 assays containing PSM data for a bulk
-##'   of 100 or 20 HeLa cells, respectively.
-##' - `Blank`: assay containing the PSM data for a blank sample
-##' - `Single cell X`: 4 assays containing PSM data for a single cell.
-##'   The `X` indicates the replicate number.
-##' - `peptides`: quantitative data for 12590 peptides in 7 samples
-##'   (all runs combined).
-##' - `proteins`: quantitative data for 1801 proteins in 7 samples
-##'   (all runs combined).
-##'
-##' Sample annotation is stored in `colData(cong2020AC())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: The HeLa cells were diluted and aspired
-##'   using a microcapillary with a pulled tip.
-##' - **Sample preparation** performed using the nanoPOTs device.
-##'   Protein extraction using RapiGest (+ DTT) + alkylation (IAA) +
-##'   Lys-C digestion + cleave RapiGest (formic acid)
-##' - **Separation**: UltiMate 3000 RSLCnano pump with a home-packed
-##'   nanoLC column (60cm x 20um i.d.; approx. 20 nL/min)
-##' - **Ionization**: ESI (2,000V; Nanospray Flex)
-##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Eclipse.
-##'   MS1 settings: accumulation time = 246ms; resolution = 120,000;
-##'   AGC = 1E6. MS/MS settings depend on quantity. All: AGC = 1E5.
-##'   20-100 cels: accumulation time = 246ms; resolution = 120,000.
-##'   Single cells: accumulation time = 500ms; resolution = 240,000.
-##' - **Data analysis**: MaxQuant (v1.6.3.3) + Excel
-##'
-##' @section Data collection:
-##'
-##' The PSM, peptide and protein data were collected from the PRIDE
-##' repository (accession ID: PXD016921).  We downloaded the
-##' `evidence.txt` file containing the PSM identification and
-##' quantification results. The sample annotation was inferred from
-##' the samples names. The data were then converted to a [QFeatures]
-##' object using the [scp::readSCP()] function.
-##'
-##' The peptide data were processed similarly from the `peptides.txt`
-##' file. The quantitative column names were adpated to match the PSM
-##' data. The peptide data were added to [QFeatures] object and link
-##' between the features were stored.
-##'
-##' The protein data were similarly processed from the
-##' `proteinGroups.txt` file. The quantitative column names were
-##' adapted to match the PSM data. The peptide data were added to
-##' [QFeatures] object and link between the features were stored.
-##'
-##' @source
-##' All files can be downloaded from the PRIDE repository PXD016921.
-##' The source link is:
-##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2020/02/PXD016921
-##'
-##' @references
-##'
-##' Cong, Yongzheng, Yiran Liang, Khatereh Motamedchaboki, Romain
-##' Huguet, Thy Truong, Rui Zhao, Yufeng Shen, Daniel Lopez-Ferrer,
-##' Ying Zhu, and Ryan T. Kelly. 2020. “Improved Single-Cell Proteome
-##' Coverage Using Narrow-Bore Packed NanoLC Columns and
-##' Ultrasensitive Mass Spectrometry.” Analytical Chemistry, January.
-##' ([link to article](https://doi.org/10.1021/acs.analchem.9b04631)).
-##'
-##' @examples
-##' \donttest{
-##' cong2020AC()
-##' }
-##'
-##' @keywords datasets
-##'
-"cong2020AC"
-
-
-####---- zhu2019EL ----####
-
-
-##' Zhu et al. 2019 (eLife): chicken utricle cells
-##'
-##'
-##' Single-cell proteomics data from chicken utricle acquired to
-##' study the hair-cell development. The cells are isolated from
-##' peeled utrical epithelium and separated into hair cells (FM1-43
-##' high) and supporting cells (FM1-43 low). The sample contain either
-##' 1 cell (n = 28), 3 cells (n = 7), 5 cells (n = 8) or 20 cells (n =
-##' 14).
-##'
-##' @format A [QFeatures] object with 62 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `XYZw`: 60 assays containing PSM data. The sample are annotated
-##'   as follows. `X` indicates the experiment, either 1 or 2. `Y`
-##'   indicated the FM1-43 signal, either high (H) or low (L). `Z`
-##'   indicates the number of cells (0, 1, 3, 5 or 20). `w` indicates
-##'   the replicate, starting from `a`, it can go up to `j`.
-##' - `peptides`: quantitative data for 3444 peptides in 60 samples
-##'   (all runs are combined).
-##' - `proteins_intensity`: protein intensities for 840 proteins
-##'   from 24 samples
-##' - `proteins_iBAQ`: iBAQ values for 840 proteins from 24 samples
-##'
-##' Sample annotation is stored in `colData(zhu2019EL())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: The cells were taken from the utricles of
-##'   E15 chick embryos. Samples were stained with FM1-43FX and the
-##'   cells were dissociated using enzymatic digestion. Cells were
-##'   FACS sorted (BD Influx) and split based on their FM1-43 signal,
-##'   while ensuring no debris, doublets or dead cells are retained.
-##' - **Sample preparation** performed using the nanoPOTs device. Cell
-##'   lysis and protein extraction and reduction are performed using
-##'   dodecyl beta-D-maltoside + DTT + ammonium bicarbonate. Protein
-##'   were then alkylated using IAA. Protein digestion is performed
-##'   using Lys-C and trypsin. Finally samples acidification is
-##'   performed using formic acid.
-##' - **Separation**: Dionex UltiMate pump with an C18-Packed column
-##'   (50cm x 30um; 60nL/min)
-##' - **Ionization**: ESI (2,000V)
-##' - **Mass spectrometry**: Orbitrap Fusion Lumos Tribrid. MS1
-##'   settings: accumulation time = 246ms; resolution = 120,000; AGC =
-##'   3E6. MS/MS settings: accumulation time = 502ms; resolution =
-##'   120,000; AGC = 2E5.
-##' - **Data analysis**: Andromeda & MaxQuant (v1.5.3.30) and the
-##'   search database is NCBI GRCg6a.
-##'
-##' @section Data collection:
-##'
-##' All data were collected from the PRIDE repository (accession ID:
-##' PXD014256).
-##'
-##' The sample annotation information is provided in the
-##' `Zhu_2019_chick_single_cell_samples_CORRECTED.xlsx` file. This file
-##' was given during a personal discussion and is a corrected version
-##' of the annotation table available on the PRIDE repository.
-##'
-##' The PSM data were found in the `evidence.txt` (in the
-##' `Experiment 1+ 2`) folder. The PSM data were filtered so that it
-##' contains only samples that are annotated. The data were then
-##' converted to a [QFeatures] object using the [scp::readSCP()]
-##' function.
-##'
-##' The peptide data were found in the `peptides.txt` file. The column
-##' names holding the quantitative data were adapted to match the
-##' sample names in the [QFeatures] object. The data were then
-##' converted to a [SingleCellExperiment] object and then inserted in
-##' the [QFeatures] object. Links between the PSMs and the peptides
-##' were added
-##'
-##' A similar procedure was applied to the protein data. The data were
-##' found in the `proteinGroups.txt` file. We split the protein table
-##' to separate the two types of quantification: summed intensity and
-##' intensity based absolute quantification (iBAQ). Both tables are
-##' converted to [SingleCellExperiment] objects and are added to the
-##' [QFeatures] object as well as the `AssayLink` between peptides and
-##' proteins.
-##'
-##' @source
-##' The PSM data can be downloaded from the PRIDE repository
-##' PXD014256. The source link is:
-##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2019/11/PXD014256
-##'
-##' @references
-##'
-##' Zhu, Ying, Mirko Scheibinger, Daniel Christian Ellwanger, Jocelyn
-##' F. Krey, Dongseok Choi, Ryan T. Kelly, Stefan Heller, and Peter G.
-##' Barr-Gillespie. 2019. “Single-Cell Proteomics Reveals Changes in
-##' Expression during Hair-Cell Development.” eLife 8 (November).
-##' ([link to article](https://doi.org/10.7554/eLife.50777)).
-##'
-##' @examples
-##' \donttest{
-##' zhu2019EL()
-##' }
-##'
-##' @keywords datasets
-##'
-"zhu2019EL"
-
-
-####---- liang2020_hela ----####
-
-
-##' Liang et al. 2020 (Anal. Chem.): HeLa cells (MaxQuant preprocessing)
-##'
-##' Single-cell proteomics data from HeLa cells using the autoPOTS
-##' acquisition workflow. The samples contain either no cells (blanks),
-##' 1 cell, 10 cells, 150 cells or 500 cells. Samples containing
-##' between 0 and 10 cells are isolated using micro-pipetting while
-##' samples containing between 150 and 500 cells were prepared using
-##' dilution of a bulk sample.
-##'
-##' @format A [QFeatures] object with 17 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `HeLa_*`: 15 assays containing PSM data.
-##' - `peptides`: quantitative data for 48705 peptides in 15 samples
-##'   (all runs are combined).
-##' - `proteins`: quantitative data for 3970 protein groups in 15
-##'   samples (all runs combined).
-##'
-##' Sample annotation is stored in `colData(liang2020_hela())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: The HeLa cells come from a commercially
-##'   available cell line. Samples containing between 0 and 10 cells
-##'   were isolated using micro-manipulation and the counts were
-##'   validated using a microscope. Samples containing between 150 and
-##'   500 cells were prepared by diluting a bulk sample and the exact
-##'   counts were evaluated by obtaining phtotmicrographs.
-##' - **Sample preparation** performed using the autoPOTS worflow that
-##'   relied on the OT-2 pipeting robot. Cell are lysed using
-##'   sonication. Samples are then processed by successive incubation
-##'   with DTT (reduction), then IAA (alkylation), then Lys-C and
-##'   trypsin (protein digestion).
-##' - **Separation**: Samples were injected on the column using a
-##'   modified Ultimate WPS-3000 TPL autosampler coupled to an UltiMate
-##'   3000 RSLCnano pump. The LC column is a home-packed nanoLC column
-##'   (45cm x 30um; 40nL/min)
-##' - **Ionization**: Nanospray Flex ion source (2,000V)
-##' - **Mass spectrometry**: Orbitrap Exploris 480. MS1 settings:
-##'   accumulation time = 250 ms (0-10 cells) or 100 ms (150-500 cells);
-##'   resolution = 120,000; AGC = 100\%. MS2 settings: exlusion
-##'   duration = 90 s (0-10 cells) or 60 s (150-500 cells) ; accumulation
-##'   time = 500 ms (0-1 cell), 250 ms (10 cells), 100 ms (150 cells)
-##'   or 50 ms (500 cells); resolution = 60,000 (0-10 cells) or 30,000
-##'   (150-500 cells); AGC = 5E3 (0-1 cells) or 1E4 (10-500 cells).
-##' - **Data analysis**: MaxQuant (v1.6.7.0) and the search database
-##'   is Swiss-Prot (July 2020).
-##'
-##' @section Data collection:
-##'
-##' All data were collected from the PRIDE repository (accession ID:
-##' PXD021882).
-##'
-##' The sample annotations were collected from the methods section and
-##' from table S3 in the paper.
-##'
-##' The PSM data were found in the `evidence.txt` file. The data were
-##' converted to a [QFeatures] object using the [scp::readSCP()]
-##' function.
-##'
-##' The peptide data were found in the `peptides.txt` file. The column
-##' names holding the quantitative data were adapted to match the
-##' sample names in the [QFeatures] object. The data were then
-##' converted to a [SingleCellExperiment] object and then inserted in
-##' the [QFeatures] object. Links between the PSMs and the peptides
-##' were added
-##'
-##' A similar procedure was applied to the protein data. The data were
-##' found in the `proteinGroups.txt` file. The column names were
-##' adapted, the data were converted to a [SingleCellExperiment]
-##' object and then inserted in the [QFeatures] object. Links between
-##' the peptides and the proteins were added
-##'
-##' @source
-##' The PSM data can be downloaded from the PRIDE repository
-##' PXD021882 The source link is:
-##' http://ftp.pride.ebi.ac.uk/pride/data/archive/2020/12/PXD021882/
-##'
-##' @references
-##'
-##' Liang, Yiran, Hayden Acor, Michaela A. McCown, Andikan J. Nwosu,
-##' Hannah Boekweg, Nathaniel B. Axtell, Thy Truong, Yongzheng Cong,
-##' Samuel H. Payne, and Ryan T. Kelly. 2020. “Fully Automated Sample
-##' Processing and Analysis Workflow for Low-Input Proteome
-##' Profiling.” Analytical Chemistry, December.
-##' ([link to article](https://doi.org/10.1021/acs.analchem.0c04240)).
-##'
-##' @examples
-##' \donttest{
-##' liang2020_hela()
-##' }
-##'
-##' @keywords datasets
-##'
-"liang2020_hela"
-
-
-
-####---- schoof2021 ----####
-
-
-##' Schoof et al. 2021 (Nat. Comm.): acute myeloid leukemia
-##' differentiation
-##'
-##' Single-cell proteomics data from OCI-AML8227 cell culture to
-##' reconstruct the cellular hierarchy. The data were acquired using
-##' TMTpro multiplexing. The samples contain either no cells,
-##' single cells, 10 cells (reference channel) 200 cells (booster
-##' channel) or are simply empty wells. Single cells are expected to
-##' be one of progenitor cells (`PROG`), leukaemia stem cells (`LSC`),
-##' CD38- blast cells (`BLAST CD38-`) or CD38+ blast cells
-##' (`BLAST CD38+`). Booster are either a known 1:1:1 mix of cells
-##' (PROG, LSC and BLAST) or are isolated directly from the bulk
-##' sample. Samples were isolated and annotated using flow cytometry.
-##'
-##' @format A [QFeatures] object with 194 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `F*`: 192 assays containing PSM quantification data for 16
-##'    TMT channels. The quantification data contain signal to noise
-##'    ratios as computed by Proteome Discoverer.
-##' - `proteins`: quantitative data for 2898 protein groups in 3072
-##'   samples (all runs combined). The quantification data contain
-##'   signal to noise ratios as computed by Proteome Discoverer.
-##' - `logNormProteins`: quantitative data for 2723 protein groups in
-##'   2025 single-cell samples. This assay is the protein datasets that
-##'   was processed by the authors. Dimension reduction and clustering
-##'   data are also available in the `reducedDims` and `colData` slots,
-##'   respectively
-##'
-##' Sample annotation is stored in `colData(schoof2021())`. The cell
-##' type annotation is stored in the `Population` column. The flow
-##' cytometry data is also available: FSC-A, FSC-H, FSC-W, SSC-A,
-##' SSC-H, SSC-W, APC-Cy7-A (= CD34) and PE-A (= CD38).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Sample isolation**: cultured AML 8227 cells were stained with
-##'   anti-CD34 and anti-CD38. The sorting was performed by FACSAria
-##'   instrument and deposited in 384 well plates.
-##' - **Sample preparation**: cells are lysed using freeze-boil and
-##'   sonication in a lysis buffer (TFE) that also includes reduction
-##'   and alkylation reagents (TCEP and CAA), followed by trypsin
-##'   (protein) and benzonase (DNA) digestion, TMT-16 labeling and
-##'   quenching, desalting using SOLAµ C18 plate, peptide
-##'   concentration, pooling and peptide concentration again. The
-##'   booster channel contains 200 cell equivalents.
-##' - **Liquid chromatography**: peptides are separated using a C18
-##'   reverse-phase column (50cm x 75 µm i.d., Thermo EasySpray) combined
-##'   to a Thermo EasyLC 1200 for 160 minute gradient with a flowrate of
-##'   100nl/min.
-##' - **Mass spectrometry**: FAIMSPro interface is used. MS1 setup:
-##'   resolution 60.000, AGC target of 300%, accumulation of 50ms. MS2
-##'   setup: resolution 45.000, AGC target of 150, 300 or 500%,
-##'   accumulation of 150, 300, 500, or 1000ms.
-##' - **Raw data processing**: Proteome Discoverer 2.4 + Sequest spectral
-##'   search engine and validation with Percolator
-##'
-##' @section Data collection:
-##'
-##' All data were collected from the PRIDE repository (accession ID:
-##' PXD020586). The data and metadata were extracted from the
-##' `SCeptre_FINAL.zip` file.
-##'
-##' We performed extensive data wrangling to combine al the metadata
-##' available from different files into a single table available using
-##' `colData(schoof2021)`.
-##'
-##' The PSM data were found in the `bulk_PSMs.txt` file. Contaminants
-##' were defined based on the protein accessions listed in
-##' `contaminant.txt`. The data were converted to a [QFeatures]
-##'  object using the [scp::readSCP()] function.
-##'
-##' The protein data were found in the `bulk_Proteins.txt` file.
-##' Contaminants were defined based on the protein accessions listed
-##' in `contaminant.txt`.The column names holding the quantitative
-##' data were adapted to match the sample names in the [QFeatures]
-##' object. Unnecessary feature annotations (such as in which assay
-##' a protein is found) were removed. Feature names were created
-##' following the procedure in SCeptre: features names are the
-##' protein symbol (or accession if missing) and if duplicated
-##' symbols are present (protein isoforms), they are made unique by
-##' appending the protein accession.  Contaminants were defined based
-##' on the protein accessions listed in `contaminant.txt`. The data
-##' were then converted to a [SingleCellExperiment] object and
-##' inserted in the [QFeatures] object.
-##'
-##' The log-normalized protein data were found in the `bulk.h5ad` file.
-##' This dataset was generated by the authors by running the notebook
-##' called `bulk.ipynb`. The `bulk.h5ad` was loaded as an `AnnData`
-##' object using the `scanpy` Python module. The object was then
-##' converted to a `SingleCellExperiment` object using the
-##' `zellkonverter` package. The column names holding the quantitative
-##' data were adapted to match the sample names in the [QFeatures]
-##' object. The data were then inserted in the [QFeatures] object.
-##'
-##' The script to reproduce the `QFeatures` object is available at
-##' `system.file("scripts", "make-data_schoof2021.R", package = "scpdata")`
-##'
-##' @source
-##'
-##' The PSM and protein data can be downloaded from the PRIDE
-##' repository PXD020586 The source link is:
-##' https://www.ebi.ac.uk/pride/archive/projects/PXD020586
-##'
-##' @references
-##'
-##' Schoof, Erwin M., Benjamin Furtwängler, Nil Üresin, Nicolas Rapin,
-##' Simonas Savickas, Coline Gentil, Eric Lechman, Ulrich auf Dem
-##' Keller, John E. Dick, and Bo T. Porse. 2021. “Quantitative
-##' Single-Cell Proteomics as a Tool to Characterize Cellular
-##' Hierarchies.” Nature Communications 12 (1): 745679.
-##' ([link to article](http://dx.doi.org/10.1038/s41467-021-23667-y)).
-##'
-##' @examples
-##' \donttest{
-##' schoof2021()
-##' }
-##'
-##' @keywords datasets
-##'
-"schoof2021"
-
-
-####---- williams2020 LFQ ----####
-
-
-##' Williams et al. 2020 (Anal. Chem.): MCF10A cell line
-##'
-##' Single-cell label free proteomics data from a MCF10A cell line
-##' culture. The data were acquired using a label-free quantification
-##' protocole based on the nanoPOTS technology. The objective was to
-##' test 2 elution gradients for single-cell applications and to
-##' demonstrate successful use of the new nanoPOTS autosampler
-##' presented in the article. The samples contain either no cells,
-##' single cells, 3 cells, 10 cells  50 cells.
-##'
-##' @format A [QFeatures] object with 9 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `peptides_[30 or 60]min_[intensity or LFQ]`: 3 assays
-##'   containing peptide intensities or LFQ normalized
-##'   quantifications (see `References`) for either a 30min or a 60 min
-##'   gradient.
-##' - `proteins_[30 or 60]min_[intensity or iBAQ or LFQ]`: 6 assays
-##'   containing protein intensities, iBAQ normalized or LFQ normalized
-##'   quantifications (see `References`) for either a 30min or a 60 min
-##'   gradient.
-##'
-##' Sample annotation is stored in `colData(williams2020_lfq())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Sample isolation**: cultured MCF10A cells were isolated using
-##'   flow-cytometry based cell sorting and deposit on nanoPOTS
-##'   microwells
-##' - **Sample preparation**: cells are lysed using using a DDM+DTT
-##'   lysis buffer. Alkylation was then performed using an IAA solution.
-##'   Proteins are digested with Lys-C and trypsin followed by
-##'   acidification with FA. Sample droplets are then dried until
-##'   LC-MS/MS analysis.
-##' - **Liquid chromatography**: peptides are loaded using the new
-##'   autosampler described in the paper. Samples are loaded using a
-##'   a homemade miniature syringe pump. The samples are then desalted
-##'   and concentrated through a SPE column (4cm x 100µm i.d. packed
-##'   with 5µm C18) with microflow LC pump. The peptides are then eluted
-##'   from a long LC column (60cm x 50 µm i.d. packed with 3µm C18)
-##'   coupled to a nanoflox LC pump at 150nL/mL with either a 30 min
-##'   or a 60 min gradient.
-##' - **Mass spectrometry**: MS/MS was performed on an Orbitrap Fusion
-##'   Lumos Tribrid MS coupled to a 2kV ESI. MS1 setup: Orbitrap
-##'   analyzer at resolution 120.000, AGC target of 1E6, accumulation
-##'   of 246ms. MS2 setup: ion trap with CID at resolution 60.000, AGC
-##'   target of 2E4, accumulation of 120ms (50 cells) or 250ms (0-10
-##'   cells).
-##' - **Raw data processing**: preprocessing using Maxquant v1.6.2.10
-##'   that use Andromeda search engine (with UniProtKB 2016-21-29),
-##'   MBR and LFQ normalization were enabled.
-##'
-##' @section Data collection:
-##'
-##' All data were collected from the MASSIVE repository (accession ID:
-##' MSV000085230).
-##'
-##' The peptide and protein data were extracted from the `Peptides_[...].txt`
-##' or `ProteinGroups[...].txt` files, respectively, in the
-##' `MCF10A_LC_[30 or 60]minutes` folders.
-##'
-##' The tables were duplicated so that peptide intensisities, peptide
-##' LFQ, protein intensities, protein LFQ and protein intensities are
-##' contained in separate tables. Tables are then converted to
-##' [SingleCellExperiment] objects. Sample annotations were infered
-##' from the sample names and from the paper. All data is combined in
-##' a [QFeatures] object. [AssayLinks] were stored between peptide
-##' assays and their corresponding proteins assays based on the
-##' leading razor protein (hence only unique peptides are linked to
-##' proteins).
-##'
-##' The script to reproduce the `QFeatures` object is available at
-##' `system.file("scripts", "make-data_williams2020_lfq.R", package = "scpdata")`
-##'
-##' @section Suggestion:
-##'
-##' See `QFeatures::joinAssays` if you want to join the 30min and
-##' 60min assays in a single assay for an integrated analysis.
-##'
-##' @source
-##'
-##' The PSM and protein data can be downloaded from the MASSIVE
-##' repository MSV000085230.
-##'
-##' @references
-##'
-##' **Source article**: Williams, Sarah M., Andrey V. Liyu, Chia-Feng
-##' Tsai, Ronald J. Moore, Daniel J. Orton, William B. Chrisler,
-##' Matthew J. Gaffrey, et al. 2020. “Automated Coupling of
-##' Nanodroplet Sample Preparation with Liquid Chromatography-Mass
-##' Spectrometry for High-Throughput Single-Cell Proteomics.”
-##' Analytical Chemistry 92 (15): 10588–96.
-##' ([link to article](http://dx.doi.org/10.1021/acs.analchem.0c01551)).
-##'
-##' **LFQ normalization**: Cox, Jürgen, Marco Y. Hein, Christian A. Luber,
-##' Igor Paron, Nagarjuna Nagaraj, and Matthias Mann. 2014. “Accurate
-##' Proteome-Wide Label-Free Quantification by Delayed Normalization
-##' and Maximal Peptide Ratio Extraction, Termed MaxLFQ.” Molecular
-##' & Cellular Proteomics: MCP 13 (9): 2513–26.
-##' ([link to article](http://dx.doi.org/10.1074/mcp.M113.031591)).
-##'
-##' **iBAQ normalization**: Schwanhäusser, Björn, Dorothea Busse, Na
-##' Li, Gunnar Dittmar, Johannes Schuchhardt, Jana Wolf, Wei Chen, and
-##' Matthias Selbach. 2011. “Global Quantification of Mammalian Gene
-##' Expression Control.” Nature 473 (7347): 337–42.
-##' ([link to article](http://dx.doi.org/10.1038/nature10098)).
-##'
-##' @examples
-##' \donttest{
-##' williams2020_lfq()
-##' }
-##'
-##' @keywords datasets
-##'
-"williams2020_lfq"
-
-####---- williams2020 TMT ----####
-
-
-##' Williams et al. 2020 (Anal. Chem.): 3 AML cell line
-##'
-##' Single-cell label data from three acute myeloid
-##' leukemia cell line culture (MOLM-14, K562, CMK). The data were
-##' acquired using a TMT-based quantification protocole and the
-##' nanoPOTS technology. The objective was to demonstrate successful
-##' use of the new nanoPOTS autosampler presented in the source
-##' article. The samples contain either carrier (10 ng), reference
-##' (0.2ng), empty or single-cell samples..
-##'
-##' @format A [QFeatures] object with 4 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `peptides_[intensity or corrected]`: 2 assays containing peptide
-##'   reporter ion intensities or corrected reporter ion intensities
-##'   as computed by MaxQuant.
-##' - `proteins_[intensity or corrected]`: 2 assays containing protein
-##'   reporter ion intensities or corrected reporter ion intensities
-##'   as computed by MaxQuant.
-##'
-##' Sample annotation is stored in `colData(williams2020_tmt())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Sample isolation**: cultured MOLM-14, K562 or CMK cells were
-##'   isolated using flow-cytometry based cell sorting and deposit on
-##'   nanoPOTS microwells
-##' - **Sample preparation**: cells are lysed using using a DDM lysis
-##'   buffer. Proteins are digested with trypsin followed by TMT
-##'   labelling and quanching with HA. The samples are then acidified
-##'   with FA, pooled in a single samples (adding carrier and reference
-##'   peptide mixtures), and dried until LC-MS/MS analysis.
-##' - **Liquid chromatography**: peptides are loaded using the new
-##'   autosampler described in the paper. Samples are loaded using a
-##'   a homemade miniature syringe pump. The samples are then desalted
-##'   and concentrated through a SPE column (4cm x 100µm i.d. packed
-##'   with 5µm C18) with microflow LC pump. The peptides are then eluted
-##'   from a long LC column (60cm x 50 µm i.d. packed with 3µm C18)
-##'   coupled to a nanoflox LC pump at 150nL/mL (elution time is not
-##'   expliceted).
-##' - **Mass spectrometry**: MS/MS was performed on an Orbitrap Fusion
-##'   Lumos Tribrid MS coupled to a 2kV ESI. MS1 setup: Orbitrap
-##'   analyzer at resolution 120.000, AGC target of 1E6, accumulation
-##'   of 246ms. MS2 setup: Orbitrap with HCD at resolution 120.000, AGC
-##'   target of 1E6, accumulation of 246ms.
-##' - **Raw data processing**: preprocessing using Maxquant v1.6.2.10
-##'   that use Andromeda search engine (with UniProtKB 2016-21-29).
-##'
-##' @section Data collection:
-##'
-##' All data were collected from the MASSIVE repository (accession ID:
-##' MSV000085230).
-##'
-##' The peptide and protein data were extracted from the
-##' `Peptides_AML_SingleCell.txt` or `ProteinGroups_AML_SingleCell.txt`
-##' files, respectively, in the `AML_SingleCell` folders.
-##'
-##' The tables were duplicated so that intensisities and corrected
-##' intensities are contained in separate tables. Tables are then
-##' converted to [SingleCellExperiment] objects. Sample annotations
-##' were inferred from the sample names, from table S2 and from the
-##' Experimental Section of the paper. All data is combined in
-##' a [QFeatures] object. [AssayLinks] were stored between peptide
-##' assays and their corresponding proteins assays based on the
-##' leading razor protein (hence only unique peptides are linked to
-##' proteins).
-##'
-##' The script to reproduce the `QFeatures` object is available at
-##' `system.file("scripts", "make-data_williams2020_tmt.R", package = "scpdata")`
-##'
-##' @source
-##'
-##' The PSM and protein data can be downloaded from the MASSIVE
-##' repository MSV000085230.
-##'
-##' @references
-##'
-##' **Source article**: Williams, Sarah M., Andrey V. Liyu, Chia-Feng
-##' Tsai, Ronald J. Moore, Daniel J. Orton, William B. Chrisler,
-##' Matthew J. Gaffrey, et al. 2020. “Automated Coupling of
-##' Nanodroplet Sample Preparation with Liquid Chromatography-Mass
-##' Spectrometry for High-Throughput Single-Cell Proteomics.”
-##' Analytical Chemistry 92 (15): 10588–96.
-##' ([link to article](http://dx.doi.org/10.1021/acs.analchem.0c01551)).
-##'
-##' @examples
-##' \donttest{
-##' williams2020_tmt()
-##' }
-##'
-##' @keywords datasets
-##'
-"williams2020_tmt"
-
-####---- leduc2022_pSCoPE ----####
-
-##' Leduc et al. 2022 - pSCoPE (biorRxiv): melanoma cells vs monocytes
-##'
-##' Single cell proteomics data acquired by the Slavov Lab. This is
-##' the dataset associated to the third version of the preprint. It
-##' contains quantitative information of melanoma cells and monocytes
-##' at PSM, peptide and protein level. This version of the data was
-##' acquired using the pSCoPE MS acquisition approach.
-##'
-##' @format A [QFeatures] object with 138 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - Assay 1-134: PSM data acquired with a TMT-18plex protocol, hence
-##'   those assays contain 18 columns. Columns hold quantitative
-##'   information from single-cell channels, carrier channels,
-##'   reference channels, empty (negative control) channels and
-##'   unused channels.
-##' - `peptides`: peptide data containing quantitative data for 20,804
-##'   peptides and 1556 single-cells. These data have been filtered
-##'   to keep high-quality PSMs, all batches have been normalized to
-##'   the reference channel, PSMs were aggregated to peptides, and
-##'   single-cells with low median coefficient of variation were kept.
-##' - `peptides_log`: peptide data containing quantitative data for
-##'   12,284 peptides and 1543 single-cells. The `peptides` data was
-##'   further normalized, highly missing peptides were removed and the
-##'   quantifications were log-transformed.
-##' - `proteins_norm2`: protein data containing quantitative data for
-##'   2844 proteins and 1543 single-cells. The peptides from
-##'   `peptides_log` were aggregated to proteins and normalized.
-##' - `proteins_processed`: protein data containing quantitative data
-##'   for 2844 proteins and 1543 single-cells. The `proteins_norm2`
-##'   data were imputed, batch corrected and normalized.
-##'
-##' The `colData(leduc2022_pSCoPE())` contains cell type annotation,
-##' LC batch information, the TMT label, the MS run ID. We also added
-##' the sample prep annotations provided by the cellenONE dispensing
-##' device (only for single cells): time stamp of cell isolation by the
-##' device, the diameter and elongation of the cell, the ID of the
-##' sample glass side (4 slides in total), the field within the glass
-##' (each slide is divided in 4 field), the pooled well ID (each field
-##' contains 9 pools), the x and y coordinates of each cell dropped in
-##' a field and of each cell pool upon pickup. Finally, we also
-##' retrieved the melanoma subpopulation generated by the authors upon
-##' data analysis. The main population is encoded as `A` while the
-##' small population is encoded `B`. The description of the `rowData`
-##' fields for the PSM data can be found in the
-##' [`MaxQuant` documentation](http://www.coxdocs.org/doku.php?id=maxquant:table:evidencetable).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: CellenONE cell sorting.
-##' - **Sample preparation** performed using the improved SCoPE2
-##'   protocol using the CellenONE liquid handling system. nPOP cell
-##'   lysis (DMSO) + trypsin digestion + TMT-18plex
-##'   labeling and pooling. A target library was generated as well to
-##'   perform prioritized DDA (Huffman et al. 2022) using MaxQuant.Live
-##'   (2.0.3).
-##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
-##'   25cm x 75um IonOpticks Aurora Series UHPLC column; 200nL/min).
-##' - **Ionization**: ESI (1,800V).
-##' - **Mass spectrometry**: Thermo Scientific Q-Exactive (MS1
-##'   resolution = 70,000; MS2 accumulation time = 300ms; MS2
-##'   resolution = 70,000). Prioritized data acquisition was performed
-##'   using the pSCoPE protocol (Huffman et al. 2022)
-##' - **Data analysis**: MaxQuant (1.6.17.0) + DART-ID
-##'
-##' @section Data collection:
-##'
-##' The PSM data were collected from a shared Google Drive folder that
-##' is accessible from the SlavovLab website (see `Source` section).
-##' The folder contains the following files of interest:
-##'
-##' - `ev_updated.txt`: the MaxQuant/DART-ID output file
-##' - `annotation.csv`: sample annotation
-##' - `batch.csv`: batch annotation
-##' - `t0.csv`: the processed data table containing the `peptides` data
-##' - `t3.csv`: the processed data table containing the `peptides_log`
-##'   data
-##' - `t4b.csv`: the processed data table containing the
-##'   `proteins_norm2` data
-##' - `t6.csv`: the processed data table containing the
-##'   `proteins_processed` data
-##'
-##' We combined the sample annotation and the batch annotation in
-##' a single table. We also formatted the quantification table so that
-##' columns match with those of the annotations. Both annotation and
-##' quantification tables are then combined in a single [QFeatures]
-##' object using the [scp::readSCP()] function.
-##'
-##' The 4 CSV files were loaded and formatted as [SingleCellExperiment]
-##' objects and the sample metadata were matched to the column names
-##' (mapping is retrieved after running the author's original R script)
-##' and stored in the `colData`.
-##' The object is then added to the [QFeatures] object (containing the
-##' PSM assays) and the rows of the peptide data are linked to the
-##' rows of the PSM data based on the peptide sequence information
-##' through an `AssayLink` object.
-##'
-##' @source
-##' The data were downloaded from the
-##' [Slavov Lab](https://scp.slavovlab.net/Leduc_et_al_2022) website.
-##' The raw data and the quantification data can also be found in the
-##' massIVE repository `MSV000089159`:
-##' ftp://massive.ucsd.edu/MSV000089159.
-##'
-##' @references
-##' Andrew Leduc, Gray Huffman, and Nikolai Slavov. 2022. “Droplet
-##' Sample Preparation for Single-Cell Proteomics Applied to the Cell
-##' Cycle.” bioRxiv. [Link to article](https://doi.org/10.1101/2021.04.24.441211)
-##'
-##' Gray Huffman, Andrew Leduc, Christoph Wichmann, Marco di Gioia,
-##' Francesco Borriello, Harrison Specht, Jason Derks, et al. 2022.
-##' “Prioritized Single-Cell Proteomics Reveals Molecular and
-##' Functional Polarization across Primary Macrophages.” bioRxiv.
-##' [Link to article](https://doi.org/10.1101/2022.03.16.484655).
-##'
-##' @seealso
-##' [leduc2022_plexDIA]
-##'
-##' @examples
-##' \donttest{
-##' leduc2022_pSCoPE()
-##' }
-##'
-##' @keywords datasets
-##'
-"leduc2022_pSCoPE"
-
-####---- leduc2022_plexDIA ----####
-
-##' Leduc et al. 2022 - plexDIA (biorRxiv): melanoma cells
-##'
-##' Single cell proteomics data acquired by the Slavov Lab. This is
-##' the dataset associated to the fourth version of the preprint (and
-##' the Genome Biology publication). It contains quantitative
-##' information of melanoma cells at precursor, peptide and protein level.
-##' This version of the data was acquired using the plexDIA MS
-##' acquisition protocol.
-##'
-##' @format A [QFeatures] object with 48 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - Assay 1-45: precursor data acquired with a mTRAQ-3 protocol,
-##'   hence those assays contain 3 columns. Columns hold quantitative
-##'   information from single cells or negative control samples.
-##' - `Ms1Extracted`: the DIA-NN MS1 extracted signal, it combines the
-##'   information from assays 1-45.
-##' - `peptides`: peptide data containing quantitative data for 3,608
-##'   peptides and 104 single cells. The  data were filtered  to  1%
-##'   protein FDR.
-##' - `proteins`: protein data containing quantitative data for 508
-##'   proteins and 105 single cells. Note that the peptide and protein
-##'   data provided by the authors differ by 3 samples. The precursor
-##'   data were aggregated to protein intensity using maxLFQ. The
-##'   protein data were further median normalized by column and by row,
-##'   log2 transformed, impute using KNN (k = 3), again median
-##'   normalized by column and by row, batch corrected using ComBat,
-##'   and median normalized by column and by row once more.
-##'
-##' The `colData(leduc2022_plexDIA())` contains cell type annotation and
-##' batch annotation that are common to all assays. The description of
-##' the `rowData` fields for the precursor data can be found in the
-##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: CellenONE cell sorting.
-##' - **Sample preparation** performed using the improved SCoPE2
-##'   protocol using the CellenONE liquid handling system. nPOP cell
-##'   lysis (DMSO) + trypsin digestion + mTRAQ-3
-##'   labeling and pooling.
-##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
-##'   25cm x 75um IonOpticks Aurora Series UHPLC column; 200nL/min).
-##' - **Ionization**: ESI (1,800V).
-##' - **Mass spectrometry**: Thermo Scientific Q-Exactive. The duty
-##'   cycle = 1 MS1 + 4 DIA MS2 windows (120 Th, 120 Th, 200 Th and
-##'   580 Th, spanning 378-1,402 m/z). Each MS1 and MS2 scan was
-##'   conducted at 70,000 resolving power, 3×10E6 AGC and 300ms
-##'   maximum injection time.
-##' - **Data analysis**: DIA-NN.
-##'
-##' @section Data collection:
-##'
-##' The PSM data were collected from a shared Google Drive folder that
-##' is accessible from the SlavovLab website (see `Source` section).
-##' The folder contains the following files of interest:
-##'
-##' - `annotation_plexDIA.csv`: sample annotation
-##' - `report_plexDIA_mel_nPOP.tsv`: the DIA-NN output file
-##'   with the precursor data
-##' - `report.pr_matrix_channels_ms1_extracted.tsv`: the DIA-NN
-##'   output file with the combined precursor data
-##' - `plexDIA_peptide.csv`: the processed data table containing the
-##'   `peptide` data
-##' - `plexDIA_protein_imputed.csv`: the processed data table
-##'   containing the `protein` data
-##'
-##' We removed the failed runs as identified by the authors. We also
-##' formatted the annotation and precuror quantification tables to
-##' facilitate matching between corresponding columns. Both annotation
-##' and quantification tables are then combined in a single [QFeatures]
-##' object using `scp::readSCPfromDIANN()`.
-##'
-##' The `plexDIA_peptide.csv` and `plexDIA_protein_imputed.csv` files
-##' were loaded and formatted as [SingleCellExperiment] objects. The
-##' columns names were adapted to match those in the `QFeatures`
-##' object. The `SingleCellExperiment` objects were then added to the
-##' [QFeatures] object and the rows of the peptide data are linked to
-##' the rows of the precursor data based on the peptide sequence or
-##' the protein name through an `AssayLink` object.
-##'
-##' @source
-##' The links to the data were found on the
-##' [Slavov Lab website](https://scp.slavovlab.net/Leduc_et_al_2022).
-##' The data were downloaded from the
-##' [Google drive folder 1](https://drive.google.com/drive/folders/117ZUG5aFIJt0vrqIxpKXQJorNtekO-BV) and
-##' [Google drive folder 2](https://drive.google.com/drive/folders/12-H2a1mfSHZUGf8O50Cr0pPZ4zIDjTac).
-##' The raw data and the quantification data can also be found in the
-##' massIVE repository `MSV000089159`:
-##' ftp://massive.ucsd.edu/MSV000089159.
-##'
-##' @references
-##' Andrew Leduc, Gray Huffman, and Nikolai Slavov. 2022. “Droplet
-##' Sample Preparation for Single-Cell Proteomics Applied to the Cell
-##' Cycle.” bioRxiv. [Link to article](https://doi.org/10.1101/2021.04.24.441211)
-##'
-##' Andrew Leduc, Gray Huffman, Joshua Cantlon, Saad Khan, and Nikolai
-##' Slavov. 2022. “Exploring Functional Protein Covariation across
-##' Single Cells Using nPOP.” Genome Biology 23 (1): 261.
-##' [Link to article](http://dx.doi.org/10.1186/s13059-022-02817-5)
-##'
-##' Jason Derks, Andrew Leduc, Georg Wallmann, Gray Huffman, Matthew
-##' Willetts, Saad Khan, Harrison Specht, Markus Ralser, Vadim
-##' Demichev, and Nikolai Slavov. 2023. “Increasing the Throughput of
-##' Sensitive Proteomics by plexDIA.” Nature Biotechnology 41 (1):
-##' 50–59. [Link to article](http://dx.doi.org/10.1038/s41587-022-01389-w)
-##'
-##' @seealso
-##' [leduc2022_pSCoPE]
-##'
-##' @examples
-##' \donttest{
-##' leduc2022_plexDIA()
-##' }
-##'
-##' @keywords datasets
-##'
-"leduc2022_plexDIA"
-
-####---- derks2022 ----####
-
-##' Derks et al. 2022 - plexDIA (Nat. Biotechnol.): PDAC vs melanoma
-##' cells vs monocytes
-##'
-##' Single cell proteomics data acquired by the Slavov Lab using the
-##' plexDIA protocol. It contains quantitative information from
-##' pancreatic ductal acinar cells (PDAC; HPAF-II), melanoma cells
-##' (WM989-A6-G3) and monocytes (U-937) at precursor and protein
-##' level. The each run acquired 3 samples thanks to mTRAQ
-##' multiplexing.
-##'
-##' @format A [QFeatures] object with 66 assays, each assay being a
-##' [SingleCellExperiment] object. The assays either hold the DIA-NN
-##' main output report table or the DIA-NN MS1 extracted signal table.
-##' The DIA-NN main output report table contains the results of the spectrum
-##' identification and quantification. The DIA-NN MS1 extracted
-##' signal table contains quantification for all mTRAQ channels if its
-##' precursors was identified in at least one of the channels,
-##' regardless of whether there is sufficient evidence in those
-##' channels at 1% FDR.
-##'
-##' The data is composed of three datasets
-##'
-##' 1. **Bulk**: dataset containing bulk (100-cell) data acquired
-##'    using a Q-Exactive mass spectrometer. Assays 1-3 contain data
-##'    from the DIA-NN main output report; assay 4 is the DIA-NN MS1
-##'    extracted signal.
-##' 2. **tims**: dataset containing single-cell data acquired using a
-##'    timsTOF-SCP mass spectrometer. Assays 5-15  contain data
-##'    from the DIA-NN main output report; assay 16 is the DIA-NN MS1
-##'    extracted signal.
-##' 3. **qe**: dataset containing single-cell data acquired
-##'    using a Q-Exactive mass spectrometer. Assays 17-64 contain data
-##'    from the DIA-NN main output report; assay 65 is the DIA-NN MS1
-##'    extracted signal.
-##'
-##' The last assay `proteins` contains the processed protein data
-##' table generated by the authors.
-##'
-##' The `colData(derks2022())` contains cell type annotations and
-##' batch annotations. The description of the `rowData` fields for the
-##' different assays can be found in the
-##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: CellenONE cell sorting.
-##' - **Sample preparation** performed using the improved SCoPE2
-##'   protocol using the CellenONE liquid handling system. nPOP cell
-##'   lysis (DMSO) + trypsin digestion + mTRAQ (3plex) labelling and
-##'   pooling. A target library was generated as well to
-##'   perform prioritized DDA (Huffman et al. 2022) using MaxQuant.Live
-##'   (2.0.3).
-##' - **Separation**: `bulk` - online nLC (Dionex UltiMate 3000 UHPLC)
-##'   with a 25 cm × 75 µm IonOpticks Aurora Series UHPLC column
-##'   (AUR2-25075C18A), 200nL/min. `qe` - online nLC (Dionex UltiMate
-##'   3000 UHPLC) with a 15 cm × 75 µm IonOpticks Aurora Series UHPLC
-##'   column (AUR2-15075C18A), 200nL/min. `tims` - nanoElute liquid
-##'   chromatography system (Bruker Daltonics) using a 25 cm × 75 µm,
-##'   1.6-µm C18 (AUR2-25075C18A-CSI, IonOpticks).
-##' - **Ionization**: ESI.
-##' - **Mass spectrometry**: cf article.
-##' - **Data analysis**: DIA-NN (1.8.1 beta 16).
-##'
-##' @section Data collection:
-##'
-##' The data were collected from a shared Google Drive
-##' [folder](https://drive.google.com/drive/folders/1pUC2zgXKtKYn22mlor0lmUDK0frgwL)
-##' that is accessible from the SlavovLab website (see `Source` section).
-##'
-##' For each dataset separately, we combined the sample annotation
-##' and the DIANN tables in a [QFeatures] object following the `scp`
-##' data structure. We then combined the three datasets in a single
-##' `QFeatures` object. We load the proteins table processed by the
-##' authors as a [SingleCellExperiment] object and adapted the sample
-##' names to match those in the `QFeatures` object. We added the
-##' protein data as a new assay and link the precursors to proteins
-##' using the `Protein.Group` variable from the `rowData`.
-##'
-##' @source
-##' The data were downloaded from the
-##' [Slavov Lab](https://scp.slavovlab.net/Derks_et_al_2022) website.
-##' The raw data and the quantification data can also be found in the
-##' massIVE repository `MSV000089093`.
-##'
-##' @references
-##' Derks, Jason, Andrew Leduc, Georg Wallmann, R. Gray Huffman,
-##' Matthew Willetts, Saad Khan, Harrison Specht, Markus Ralser,
-##' Vadim Demichev, and Nikolai Slavov. 2022. "Increasing the
-##' Throughput of Sensitive Proteomics by plexDIA." Nature
-##' Biotechnology, July.
-##' [Link to article](http://dx.doi.org/10.1038/s41587-022-01389-w)
-##'
-##' @examples
-##' \donttest{
-##' derks2022()
-##' }
-##'
-##' @keywords datasets
-##'
-"derks2022"
-
-
-####---- brunner2022 ----####
-
-##' Brunner et al. 2022 (Mol. Syst. Biol.): cell cycle state study
-##'
-##' Single cell proteomics data acquired by the Mann Lab using a newly
-##' designed timsTOF instrument, referred to as timsTOF-SCP. The
-##' dataset contains quantitative information from single-cells blocked
-##' at 4 cell cycle stages: G1, G1-S, G2, G2-M. The data was acquired
-##' using a label-free sample preparation protocole combined to a
-##' data independent (DIA) acquisition mode.
-##'
-##' @format A [QFeatures] object with 435 assays, each assay being a
-##' [SingleCellExperiment] object.
-##'
-##' - Assay 1-434: DIA-NN main output report table split for each
-##'   acquisition run. Since each run acquires 1 single cell, each
-##'   assay contains a single column. It contains the results
-##'   of the spectrum identification and quantification.
-##' - `protein`: DIA-NN protein group matrix, containing normalised
-##'   quantities for 2476 protein groups in 434 single cells. Proteins
-##'   are filtered at 1% FDR, using global q-values for protein groups
-##'   and both global and run-specific q-values for precursors.
-##'
-##' The `colData(brunner2022())` contains cell type annotations and
-##' batch annotations. The description of the `rowData` fields for the
-##' different assays can be found in the
-##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: cells were detached with trypsin treatment,
-##'   followed by strong pipetting, and isolate using FACS.
-##' - **Sample preparation**: cell lysis by freeze-heat followed by
-##'   sonication, overnight protein digestion with trypsin/lysC mix and
-##'   desalting using EvoTips trap column (EvoSep)
-##' - **Separation**: online EvoSep One LC system using a 5 cm x 75 µm
-##'   ID column with 1.9µm C18 beads (EvoSep) at 100nL/min flow rate.
-##' - **Ionization**: 10µm ID zero dead volume electrospray emitter
-##'   (Bruker Daltonik) + nanoelectro-spray ion source (Captive spray,
-##'   Bruker Daltonik)
-##' - **Mass spectrometry**: DIA PASEF mode. Correlation between IM
-##'   and m/z was used to synchronize the elution of precursors from
-##'   each IM scan with the quadrupole isolation window. Five
-##'   consecutive diaPASEF cycles. The collision energy was ramped
-##'   linearly as a function of the IM from 59 eV at 1/K0=1.6 Vs cm^2
-##'   to 20 eV at 1/K0=0.6 Vs cm^2.
-##' - **Data analysis**: DIA-NN (1.8).
-##'
-##' @section Data collection:
-##'
-##' The data were collected from the PRIDE
-##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD024043)
-##' in the `DIANN1.8_SingleCells_CellCycle.zip` file.
-##'
-##' We loaded the DIA-NN main report table and generated a sample
-##' annotation table based on the MS file names. We next combined the
-##' sample annotation and the DIANN tables into a [QFeatures] object
-##' following the `scp` data structure. We loaded the proteins group
-##' matrix as a [SingleCellExperiment] object, fixed ambiguous
-##' protein group names, and added the protein data as a new assay and
-##' link the precursors to proteins using the `Protein.Group` variable
-##' from the `rowData`.
-##'
-##' @source
-##' The data were downloaded from PRIDE
-##' [repository](https://www.ebi.ac.uk/pride/archive/projects/PXD024043)
-##' with accession ID `PXD024043`.
-##'
-##' @references
-##' Brunner, Andreas-David, Marvin Thielert, Catherine Vasilopoulou,
-##' Constantin Ammar, Fabian Coscia, Andreas Mund, Ole B. Hoerning, et
-##' al. 2022. "Ultra-High Sensitivity Mass Spectrometry Quantifies
-##' Single-Cell Proteome Changes upon Perturbation." Molecular Systems
-##' Biology 18 (3): e10798.
-##' [Link to article](http://dx.doi.org/10.15252/msb.202110798)
-##'
-##' @examples
-##' \donttest{
-##' brunner2022()
-##' }
-##'
-##' @keywords datasets
-##'
-"brunner2022"
-
-####---- woo2022_macrophage ----####
-
-##' Woo et al. 2022 (Cell Syst.): LPS-treated macrophages
-##'
-##' Single-cell data from macrophages subjected to 3 LPS
-##' treatments. The data were
-##' acquired using the TIFF (transfer identification based on FAIMS
-##' filtering) acquisition method. The data contain 155 single cells:
-##' 54 control cells (no treatment), 52 cells treated with LPS during
-##' 24h and 49 cells treated with LPS during 49h.
-##'
-##' @format A [QFeatures] object with 5 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `peptides_[intensity or LFQ]`: 2 assays containing peptide
-##'   quantities or normalized quantities using the maxLFQ method
-##'   as computed by MaxQuant.
-##' - `proteins_[intensity or iBAQ or LFQ]`: 3 assays containing
-##'   protein quantities or normalized proteins using the iBAQ or
-##'   maxLFQ methods as computed by MaxQuant.
-##'
-##' Sample annotation is stored in `colData(woo_macrophage())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Sample isolation**: cultured RAW 264.7 cells treated or not
-##'   with 100 ng/ul LPS. The cells were sorted using the Influx II
-##'   cell sorter and deposited on a nanoPOTS chip.
-##' - **Sample preparation**: cells are lysed using using a DDM+DTT
-##'   lysis and reduction buffer. The proteins are alkylated with IAA
-##'   and digested with LysC and trypsin. Samples are then acidified
-##'   with FA, vacuum dried and stored in freezer until data
-##'   acquisition.
-##' - **Liquid chromatography**: peptides are loaded using an in-house
-##'   autosampler (Williams et al. 2020). The samples are concentrated
-##'   through a SPE column (4cm x 100µm i.d. packed with 5µm C18) with
-##'   microflow LC pump. The peptides are then eluted from an LC
-##'   column (25cm x 50 µm i.d. packed with 1.7µm C18) from a 60 min
-##'   gradient (100nL/min).
-##' - **Mass spectrometry**: MS/MS was performed on an Orbitrap Fusion
-##'   Lumos Tribrid MS with FAIMSpro coupled to a 2.4 kV ESI. FAIMS
-##'   setup: 4-CV method (-45, -55, -65, -75 V). MS1 setup: resolution
-##'   = 120.000, range = 350-1500 m/z,AGC target of 1E6, accumulation
-##'   of 254ms. MS2 setup: 30% HCD, resolution AGC 2E4, accumulation
-##'   of 254ms.
-##' - **Raw data processing**: preprocessing using Maxquant v1.6.2.10
-##'   that use Andromeda search engine (with UniProtKB 2016-21-29).
-##'   MBR was enabled.
-##'
-##' @section Data collection:
-##'
-##' All data were collected from the MASSIVE repository (accession ID:
-##' MSV000085937).
-##'
-##' The peptide and protein data were extracted from the
-##' `peptides_RAW_LPS_scProteomics.txt` or
-##' `proteinGroups_RAW_LPS_scProteomics.txt` files, respectively, in
-##' the `RAW_LPS_SingleCellProteomics` folders.
-##'
-##' The tables were split so that intensities, maxLFQ, and iBAQ
-##' data are contained in separate tables. Tables are then
-##' converted to [SingleCellExperiment] objects. Sample annotations
-##' were inferred from the sample names. All data is combined in
-##' a [QFeatures] object. [AssayLinks] were stored between peptide
-##' assays and their corresponding proteins assays based on the
-##' leading razor protein (hence only unique peptides are linked to
-##' proteins).
-##'
-##' The script to reproduce the `QFeatures` object is available at
-##' `system.file("scripts", "make-data_woo2022_macrophage.R", package = "scpdata")`
-##'
-##' @source
-##'
-##' The peptide and protein data can be downloaded from the MASSIVE
-##' repository MSV000085937
-##'
-##' @references
-##'
-##' **Source article**: Woo, Jongmin, Geremy C. Clair, Sarah M.
-##' Williams, Song Feng, Chia-Feng Tsai, Ronald J. Moore, William B.
-##' Chrisler, et al. 2022. “Three-Dimensional Feature Matching
-##' Improves Coverage for Single-Cell Proteomics Based on Ion Mobility
-##' Filtering.” Cell Systems 13 (5): 426–34.e4.
-##' ([link to article](http://dx.doi.org/10.1016/j.cels.2022.02.003)).
-##'
-##' @examples
-##' \donttest{
-##' woo2022_macrophage()
-##' }
-##'
-##' @keywords datasets
-##'
-"woo2022_macrophage"
-
-####---- woo2022_lung ----####
-
-##' Woo et al. 2022 (Cell Syst.): 26 primary human lung cells
-##'
-##' Single-cell proteomics data from dissociated primary human lung
-##' cells. The data were
-##' acquired using the TIFF (transfer identification based on FAIMS
-##' filtering) acquisition method. The data contain 26 single cells.
-##'
-##' @format A [QFeatures] object with 5 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `peptides_[intensity or LFQ]`: 2 assays containing peptide
-##'   quantities or normalized quantities using the maxLFQ method
-##'   as computed by MaxQuant.
-##' - `proteins_[intensity or iBAQ or LFQ]`: 3 assays containing
-##'   protein quantities or normalized proteins using the iBAQ or
-##'   maxLFQ methods as computed by MaxQuant.
-##'
-##' Sample annotation is stored in `colData(woo_lung())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Sample isolation**: primary human lung cells were dissociated
-##'   following the protocol in Bandyopadhyay et al., 2018. The cells
-##'   were sorted using the Influx II cell sorter and deposited on a
-##'   nanoPOTS chip.
-##' - **Sample preparation**: cells are lysed using using a DDM+DTT
-##'   lysis and reduction buffer. The proteins are alkylated with IAA
-##'   and digested with LysC and trypsin. Samples are then acidified
-##'   with FA, vacuum dried and stored in freezer until data
-##'   acquisition.
-##' - **Liquid chromatography**: peptides are loaded using an in-house
-##'   autosampler (Williams et al. 2020). The samples are concentrated
-##'   through a SPE column (4cm x 100µm i.d. packed with 5µm C18) with
-##'   microflow LC pump. The peptides are then eluted from an LC
-##'   column (25cm x 50 µm i.d. packed with 1.7µm C18) from a 60 min
-##'   gradient (100nL/min).
-##' - **Mass spectrometry**: MS/MS was performed on an Orbitrap Fusion
-##'   Lumos Tribrid MS with FAIMSpro coupled to a 2.4 kV ESI. FAIMS
-##'   setup: 4-CV method (-45, -55, -65, -75 V). MS1 setup: resolution
-##'   = 120.000, range = 350-1500 m/z,AGC target of 1E6, accumulation
-##'   of 254ms. MS2 setup: 30% HCD, resolution AGC 2E4, accumulation
-##'   of 254ms.
-##' - **Raw data processing**: preprocessing using Maxquant v1.6.2.10
-##'   that use Andromeda search engine (with UniProtKB 2016-21-29).
-##'   MBR was enabled.
-##'
-##' @section Data collection:
-##'
-##' All data were collected from the MASSIVE repository (accession ID:
-##' MSV000085937).
-##'
-##' The peptide and protein data were extracted from the
-##' `peptides_nondepleted_Lung_scProteomics.txt` or
-##' `proteinGroups_nondepleted_Lung_scProteomics.txt` files,
-##' respectively, in the `NonDepleted_Lung_SingleCellProteomics`
-##' folders.
-##'
-##' The tables were split so that intensities, maxLFQ, and iBAQ
-##' data are contained in separate tables. Tables are then
-##' converted to [SingleCellExperiment] objects. Sample annotations
-##' were inferred from the sample names. All data is combined in
-##' a [QFeatures] object. [AssayLinks] were stored between peptide
-##' assays and their corresponding proteins assays based on the
-##' leading razor protein (hence only unique peptides are linked to
-##' proteins).
-##'
-##' The script to reproduce the `QFeatures` object is available at
-##' `system.file("scripts", "make-data_woo2022_lung.R", package = "scpdata")`
-##'
-##' @source
-##'
-##' The peptide and protein data can be downloaded from the MASSIVE
-##' repository MSV000085937
-##'
-##' @references
-##'
-##' **Source article**: Woo, Jongmin, Geremy C. Clair, Sarah M.
-##' Williams, Song Feng, Chia-Feng Tsai, Ronald J. Moore, William B.
-##' Chrisler, et al. 2022. “Three-Dimensional Feature Matching
-##' Improves Coverage for Single-Cell Proteomics Based on Ion Mobility
-##' Filtering.” Cell Systems 13 (5): 426–34.e4.
-##' ([link to article](http://dx.doi.org/10.1016/j.cels.2022.02.003)).
-##'
-##' @examples
-##' \donttest{
-##' woo2022_lung()
-##' }
-##'
-##' @keywords datasets
-##'
-"woo2022_lung"
-
-####---- gregoire2023_mixCTRL ----####
-
-##' Grégoire et al. 2023 - mixCTRL (arXiv): benchmark using
-##' monocytes/macrophages
-##'
-##' Single cell proteomics data acquired using the SCoPE2 protocol.
-##' The dataset contains two monocytes cell lines (THP1 and U937) as
-##' well as controled mixtures of both and macrophage-like cells
-##' produced upon PMA treatment. It contains quantitative information
-##' at PSM, peptide and protein levels. Data was acquired using Lumos
-##' Orbitrap (mainly) and timsTOF SCP mass spectrometers.
-##'
-##' @format A [QFeatures] object with 119 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - Assays 1-42: PSM data acquired with a TMT-16plex protocol, hence
-##'   those assays contain 16 columns. Columns hold quantitative
-##'   information from single-cell channels, carrier channels,
-##'   blank (negative control) channels and unused channels.
-##' - Assays 43-84: peptide data resulting from the PSM to peptide
-##'   aggregation of the 42 PSM assays.
-##' - Assays 85-91: peptide data for each of the 7 acquisition
-##'   batches. Peptide data were joined based on their respective
-##'   acquisition batches.
-##' - Assays 92-98: normalised peptide data.
-##' - Assays 99-105: normalised and log-transformed peptide data.
-##' - Assays 106-112: protein data for each of the 7 acquisition
-##'   batches. Normalised and log-transformed peptide data were
-##'   agreggated to protein.
-##' - Assays 113-119: Batch corrected protein data. Normalised and
-##'   log-transformed protein data were batch corrected to remove
-##'   technical variability induced by runs and channels.
-##'
-##' All the data has been filtered to keep high quality features and
-##' samples.
-##'
-##' The `colData(gregoire2023_mixCTRL())` contains cell type annotation and
-##' batch annotation that are common to all assays. The description of
-##' the `rowData` fields for the PSM data can be found in the
-##' [`sage` documentation](https://sage-docs.vercel.app/docs/results/search).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see *References*).
-##'
-##' - **Cell isolation**: BD FACSAria III cell sorting.
-##' - **Sample preparation** performed using the SCoPE2 protocol: mPOP
-##'   cell lysis + trypsin digestion + TMT-16plex labeling and
-##'   pooling.
-##' - **Separation**: online nLC (Ultimate 3000 LC System or Vanquish
-##'   Neo UHPLC System) with a BioZen Peptide Polar C18 250 x 0.0075mm
-##'   column.
-##' - **Mass spectrometry**:  Orbitrap Fusion Lumos Tribrid (MS1
-##'   resolution = 70,000; MS2 accumulation time = 120ms; MS2
-##'   resolution = 70,000) and timsTOF SCP.
-##' - **Data preprocessing**: Sage.
-##'
-##' @section Data collection:
-##'
-##' The PSM data were collected from a Zenodo archive (see `Source`
-##' section). The folder contains the following files of interest:
-##'
-##' - `results.sage.cbio.tsv`: the sage identification output file for
-##'   batches acquired on the Lumos MS.
-##' - `results.sage.giga.tsv`: the sage identification output file for
-##'   batches acquired on the timsTOF SCP MS.
-##' - `quant.cbio.tsv`: the sage quantification output file for
-##'   batches acquired on the Lumos MS.
-##' - `quant.giga.tsv`: the sage quantification output file for
-##'   batches acquired on the timsTOF SCP MS.
-##' - `sampleAnnotation_batch.csv`: sample annotation for each
-##'   acquisition batch. There are in total 8 different annotation
-##'   files.
-##'
-##' We combined the sample annotations in a single table. We also
-##' combined `cbio` and `giga` tables together and merged resulting
-##' identification and quantification tables. Both annotation and
-##' features tables are then combined in a single [QFeatures] object
-##' using the [scp::readSCP()] function.
-##'
-##' The [QFeatures] object was processed as described in the author's
-##' manuscript (see `source`). Note that the imputed assays were used
-##' in the paper for illustrative purposes only and have not been
-##' reproduced here.
-##'
-##' @source
-##' The data were downloaded from the [Zenodo
-##' repository](https://zenodo.org/records/8417228).  The raw data and
-##' the quantification data can also be found in the ProteomeXchange
-##' Consortium via the [PRIDE partner
-##' repository](https://www.ebi.ac.uk/pride/archive/projects/PXD046211),
-##' project `PXD046211`.
-##'
-##' @references
-##' Samuel Grégoire, Christophe Vanderaa, Sébastien Pyr dit Ruys,
-##' Gabriel Mazzucchelli, Christopher Kune, Didier Vertommen and
-##' Laurent Gatto. 2023. *Standardised workflow for mass spectrometry-
-##' based single-cell proteomics data processing and analysis using
-##' the scp package.*
-##' arXiv. DOI:[10.48550/arXiv.2310.13598](https://doi.org/10.48550/arXiv.2310.13598)
-##'
-##' @examples
-##' \donttest{
-##' gregoire2023_mixCTRL()
-##' }
-##'
-##' @keywords datasets
-##'
-"gregoire2023_mixCTRL"
-
-####---- khan2023 ----####
-
-
-##' Khan et al, 2023 (biorRxiv): Epithelial–Mesenchymal Transition
-##'
-##' @description
-##'
-##' Single-cell samples were prepared using the nPOP sample
-##' preparation method.  Proteomics data were acquired using the
-##' SCoPE2 protocol on a Thermo Scientific Q-Exactive mass
-##' spectrometer. The dataset contains quantitative information on 421
-##' MCF-10A single cells undergoing epithelial–mesenchymal transition
-##' (EMT) triggered by TGF beta. The data are available at the PSM,
-##' and protein levels. The paper investigates the dynamics of
-##' correlation modules at the protein level.
-##'
-##' @format A [QFeatures] object with 47 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - Assay 1-44: PSM data acquired with a TMTPro 16plex protocol, hence
-##'   those assays contain 16 columns. Columns hold quantitative information
-##'   from single-cell channels, carrier channels, reference channels,
-##'   empty (negative control) channels and unused channels.
-##' - `peptides`: peptide data containing quantitative data for 10055
-##'   peptides and 421 single-cells.
-##' - `proteins_imputed`: protein data containing quantitative data for 4096
-##'   proteins and 421 single-cells with k-nearest neighbors (KNN) imputation.
-##' - `proteins_unimputed`: protein data containing quantitative data for 4096
-##'   proteins and 421 single-cells without imputation.
-##'
-##' The `colData(khan2023())` contains cell type and batch annotations that
-##' are common to all assays. The description of the `rowData` fields for the
-##' PSM data can be found in the
-##' [`MaxQuant` documentation](https://cox-labs.github.io/coxdocs/output_tables.html).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: CellenONE cell sorting.
-##' - **Sample preparation** performed using the SCoPE2 protocol. nPOP
-##'   cell lysis (DMSO) + trypsin digestion + TMTPro 16plex protocol.
-##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
-##'   25cm x 75um IonOpticks Odyssey Series column (ODY3-25075C18); 200nL/min).
-##' - **Ionization**: ESI (1,700 V).
-##' - **Mass spectrometry**: Thermo Scientific Q-Exactive (MS1
-##'   resolution = 70,000; MS1 accumulation time = 300ms; MS2
-##'   resolution = 70,000).
-##' - **Data analysis**: MaxQuant(2.4.13.0) + DART-ID.
-##'
-##' @section Data collection:
-##'
-##' The PSM data were collected from a shared Google Drive folder that
-##' is accessible from the SlavovLab website (see `Source` section).
-##' The folder ('/002-singleCellDataGeneration') contains the following
-##' files of interest:
-##'
-##' - `ev_updated_NS.DIA.txt`: the MaxQuant/DART-ID output file
-##' - `annotation.csv`: sample annotation
-##' - `batch.csv`: batch annotation
-##'
-##' We combined the sample annotation and the batch annotation in
-##' a single table. We also formatted the quantification table so that
-##' columns match with those of the annotation and filter only for
-##' single-cell runs. Both table are then combined in a single
-##' [QFeatures] object using the [scp::readSCP()] function.
-##'
-##' The peptide data were taken from the same google drive folder
-##' (`EpiToMesen.TGFB.nPoP_trial1_pepByCellMatrix_NSThreshDART_medIntCrNorm.txt`).
-##' The data were formatted to a [SingleCellExperiment] object and the sample
-##' metadata were matched to the column names (mapping is retrieved
-##' after running the SCoPE2 R script, `EMTTGFB_singleCellProcessing.R`) and
-##' stored in the `colData`. The object is then added to the [QFeatures] object
-##' and the rows of the PSM data are linked to the rows of the peptide data
-##' based on the peptide sequence information through an `AssayLink` object.
-##'
-##' The imputed protein data were taken from the same google drive folder
-##' (`EpiToMesen.TGFB.nPoP_trial1_ProtByCellMatrix_NSThreshDART_medIntCrNorm_imputedNotBC.csv`).
-##' The data were formatted to a [SingleCellExperiment] object and the sample
-##' metadata were matched to the column names (mapping is retrieved
-##' after running the SCoPE2 R script, `EMTTGFB_singleCellProcessing.R`) and
-##' stored in the `colData`. The object is then added to the [QFeatures] object
-##' and the rows of the peptide data are linked to the rows of the protein data
-##' based on the protein sequence information through an `AssayLink` object.
-##'
-##' The unimputed protein data were taken from the same google drive folder
-##' (`EpiToMesen.TGFB.nPoP_trial1_ProtByCellMatrix_NSThreshDART_medIntCrNorm_unimputed.csv`).
-##' The data were formatted and added exactly as imputed data.
-##'
-##' @source
-##' The data were downloaded from the
-##' [Slavov Lab](https://scp.slavovlab.net/Khan_et_al_2023) website via a
-##' shared Google Drive
-##' [folder](https://drive.google.com/drive/folders/1zCsRKWNQuAz5msxx0DfjDrIe6pUjqQmj).
-##' The raw data and the quantification data can also be found in the
-##' MassIVE repository `MSV000092872`:
-##' ftp://MSV000092872@massive.ucsd.edu/.
-##'
-##' @references
-##' Saad Khan, Rachel Conover, Anand R. Asthagiri, Nikolai Slavov. 2023.
-##' "Dynamics of single-cell protein covariation during epithelial–mesenchymal
-##' transition." bioRxiv.
-##' ([link to article](https://doi.org/10.1101/2023.12.21.572913)).
-##'
-##' @examples
-##' \donttest{
-##' khan2023()
-##' }
-##'
-##' @keywords datasets
-##'
-"khan2023"
-
-
-####---- guise2024 ----####
-
-
-##' Guise et al. 2020 (Cell Rep.): postmortem ALS spinal moto neurons
-##'
-##' Single-cell proteomics data from postmortem human spinal moto
-##' neurons (MN) obtained from control donors or donors with amyotrophic
-##' lateral sclerosis (ALS). The data were generated following the
-##' NanoPOTS protocol. Cells were isolated from samples obtained by
-##' the university of Miami Brain Bank using laser capture
-##' microdissection (LCM). Additional information about the amount of
-##' TDP-43 intra-cellular levels has been assigned into levels 0 to 4.
-##'
-##' @format A [QFeatures] object with 102 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - `F*`: 100 assays containing PSM data.
-##' - `peptides`: quantitative data for 34,315 peptides in 108 samples.
-##'   All samples combined, along with 8 additional unannotated
-##'   samples.
-##' - `proteins`: quantitative data for 4,437 protein groups in 108
-##'   samples. All samples combined, along with 8 additional
-##'   unannotated samples.
-##'
-##' Sample annotation is stored in `colData(guise2024())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Cell isolation**: The MN were isolated from samples obtained
-##'   by the university of Miami Brain Bank using LCM.
-##' - **Sample preparation** performed using the nanoPOTS workflow.
-##'   Cells are treated with 0.1% DDM (for lysis) added with DTT
-##'   (protein reduction), then IAA (alkylation), then Lys-C and
-##'   trypsin (protein digestion).
-##' - **Separation**: Samples were injected on the column using an
-##'   Ultimate 3000 RSLCnano pump. The in-line loading column is a
-##'   home-packed SPE column (5cm x 75um) while the peptide
-##'   separation is performed on a an in-house-packed analytical SPE
-##'   column (50 cm x 30um), using a 20nL/min flow rate.
-##' - **Ionization**: nanospray emmitter (2,000V)
-##' - **Mass spectrometry**: Orbitrap Exploris 480. HCD fragmentation.
-##'   MS1 settings: accumulation time = 200 ms; resolution = 120,000;
-##'   AGC = 1E6. MS2 settings: exclusion duration = 90 s;
-##'   accumulation time = 500 ms; resolution = 30,000; AGC = 1E5.
-##' - **Data analysis**: Sequest HT in Proteome Discoverer (v2.5) and
-##'   the search database is Swiss-Prot (July 2020).
-##'
-##' @section Data collection:
-##'
-##' All data were collected from the MassIVE repository (accession ID:
-##' MSV000092119).
-##'
-##' The sample annotations were combined from the tables in
-##' `Biogen_TDP43_Round2_Reanalysis_10-13-2021_InputFiles.txt` and in
-##' `Groups.txt`.
-##'
-##' The PSM data were found in the
-##' `Biogen_TDP43_Round2_Reanalysis_10-13-2021_PSMs.txt` file. The
-##' data were converted to a [QFeatures] object using the [scp::readSCP()]
-##' function. We could not find sample annotations for MS run ID:
-##' F61, F34, F42, F88, F77, F8, F21, F5.
-##'
-##' The peptide data were found in the
-##' `Biogen_TDP43_Round2_Reanalysis_10-13-2021_PeptideGroups.txt`
-##' file. The column names holding the quantitative data were adapted
-##' to match the sample names in the [QFeatures] object. The data were
-##' then converted to a [SingleCellExperiment] object and then
-##' inserted in the [QFeatures] object.
-##'
-##' A similar procedure was applied to the protein data. The data were
-##' found in the
-##' `Biogen_TDP43_Round2_Reanalysis_10-13-2021_Proteins.txt` file. The
-##'  column names were
-##' adapted, the data were converted to a [SingleCellExperiment]
-##' object and then inserted in the [QFeatures] object.
-##'
-##' @source
-##'
-##' All data can be downloaded from the MassIVE repository
-##' MSV000092119. The source link is:
-##' ftp://massive.ucsd.edu/v05/MSV000092119/
-##'
-##' @references
-##'
-##' Guise, Amanda J., Santosh A. Misal, Richard Carson, Jen-Hwa Chu,
-##' Hannah Boekweg, Daisha Van Der Watt, Nora C. Welsh, et al. 2024.
-##' “TDP-43-Stratified Single-Cell Proteomics of Postmortem Human
-##' Spinal Motor Neurons Reveals Protein Dynamics in Amyotrophic
-##' Lateral Sclerosis.” Cell Reports 43 (1): 113636.
-##' ([link to article](http://dx.doi.org/10.1016/j.celrep.2023.113636)).
-##'
-##' @examples
-##' \donttest{
-##' guise2024()
-##' }
-##'
-##' @keywords datasets
-##'
-"guise2024"
-
-####---- petrosius2023_mES ----####
-
-##' Petrosius et al, 2023 (Nat. Comm.): Mouse embryonic stem cell (mESC) in
-##' different culture conditions
-##'
-##' @description
-##' Profiling mouse embryonic stem cells across ground-state (m2i) and
-##' differentiation-permissive (m15) culture conditions. The data were
-##' acquired using orbitrap-based data-independent acquisition (DIA).
-##' The objective was to demonstrate the capability of their approach
-##' by profiling mouse embryonic stem cell culture conditions, showcasing
-##' heterogeneity in global proteomes, and highlighting differences in
-##' the expression of key metabolic enzymes in distinct cell subclusters.
-##'
-##' @format A [QFeatures] object with 605 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - Assay 1-603: PSM data acquired with an orbitrap-based data-independent
-##'   acquisition (DIA) protocol, hence those assays contain single column
-##'   that contains the quantitative information.
-##' - `peptides`: peptide data containing quantitative data for 9884
-##'   peptides and 603 single-cells.
-##' - `proteins`: protein data containing quantitative data for 4270
-##'   proteins and 603 single-cells.
-##'
-##' Sample annotation is stored in `colData(petrosius2023_mES())`.
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see `References`).
-##'
-##' - **Sample isolation**: Cell sorting was done on a Sony MA900 cell sorter
-##'   using a 130 microm sorting chip. Cells were sorted at single-cell resolution,
-##'   into a 384-well Eppendorf LoBind PCR plate (Eppendorf AG) containing 1 microL
-##'   of lysis buffer.
-##' - **Sample preparation**: Single-cell protein lysates were digested with
-##'   2 ng of Trypsin (Sigma cat. Nr. T6567) supplied in 1 microL of digestion
-##'   buffer (100mM TEAB pH 8.5, 1:5000 (v/v) benzonase (Sigma cat. Nr. E1014)).
-##'   The digestion was carried out overnight at 37 °C, and subsequently
-##'   acidified by the addition of 1 microL 1% (v/v) trifluoroacetic acid (TFA).
-##'   All liquid dispensing was done using an I-DOT One instrument (Dispendix).
-##' - **Liquid chromatography**: The Evosep one liquid chromatography system was
-##'   used for DIA isolation window survey and HRMS1-DIA experiments.The standard
-##'   31 min or 58min pre-defined Whisper gradients were used, where peptide
-##'   elution is carried out with 100 nl/min flow rate. A 15 cm × 75 microm
-##'   ID column (PepSep) with 1.9 microm C18 beads (Dr. Maisch, Germany) and a 10
-##'   microm ID silica electrospray emitter (PepSep) was used. Both LC systems were
-##'   coupled online to an orbitrap Eclipse TribridMass Spectrometer
-##'   (ThermoFisher Scientific) via an EasySpray ion source connected to a
-##'   FAIMSPro device.
-##' - **Mass spectrometry**: The mass spectrometer was operated in positive
-##'   mode with the FAIMSPro interface compensation voltage set to -45 V.
-##'   MS1 scans were carried out at 120,000 resolution with an automatic gain
-##'   control (AGC) of 300% and maximum injection time set to auto. For the DIA
-##'   isolation window survey a scan range of 500–900 was used and 400–1000
-##'   rest of the experiments. Higher energy collisional dissociation (HCD) was
-##'   used for precursor fragmentation with a normalized collision energy (NCE)
-##'   of 33% and MS2 scan AGC target was set to 1000%.
-##' - **Raw data processing**: The mESC raw data files were processed with
-##'   Spectronaut 17 and protein abundance tables exported and analyzed further
-##'   with python.
-##'
-##' @section Data collection:
-##'
-##' The data were provided by the Author and is accessible at the
-##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT)
-##' The folder ('20240205_111248_mESC_SNEcombine_m15-m2i/') contains
-##' the following files of interest:
-##'
-##' - `20240205_111251_PEPQuant (Normal).tsv`: the PSM level data
-##' - `20240205_111251_Peptide Quant (Normal).tsv`: the peptide level data
-##' - `20240205_111251_PGQuant (Normal).tsv`: the protein level data
-##'
-##' The metadata were downloaded from the [Zenodo repository](https://zenodo.org/records/8146605).
-##'
-##' - `sample_facs.csv`: the metadata
-##'
-##' We formatted the quantification table so that columns match with the
-##' metadata. Then, both tables are then combined in a single
-##' [QFeatures] object using the [scp::readSCP()] function.
-##'
-##' The peptide data were formated to a [SingleCellExperiment] object and the
-##' sample metadata were matched to the column names and stored in the `colData`.
-##' The object is then added to the [QFeatures] object and the rows of the PSM
-##' data are linked to the rows of the peptide data based on the peptide sequence
-##' information through an `AssayLink` object.
-##'
-##' The protein data were formated to a [SingleCellExperiment] object and
-##' the sample metadata were matched to the column names and stored in the
-##' `colData`. The object is then added to the [QFeatures] object and the rows
-##' of the peptide data are linked to the rows of the protein data based on the
-##' protein sequence information through an `AssayLink` object.
-##'
-##' @source The peptide and protein data can be downloaded from the
-##'     [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT)
-##'     The raw data and the quantification data can also be found in
-##'     the MassIVE repository `MSV000092429`:
-##'     ftp://MSV000092429@massive.ucsd.edu/.
-##'
-##' @references
-##' **Source article**: Petrosius, V., Aragon-Fernandez, P., Üresin, N. et al.
-##' "Exploration of cell state heterogeneity using single-cell proteomics
-##' through sensitivity-tailored data-independent acquisition."
-##' Nat Commun 14, 5910 (2023).
-##' ([link to article](https://doi.org/10.1038/s41467-023-41602-1)).
-##'
-##' @examples
-##' \donttest{
-##' petrosius2023_mES()
-##' }
-##'
-##' @keywords datasets
-##'
-"petrosius2023_mES"
-
-####---- petrosius2023_AstralAML ----####
-
-##' Petrosius et al. 2023 (bioRxiv): AML hierarchy on Astral.
-##'
-##' Single cell proteomics data from FACS sorted cells from the
-##' OCI-AML8227 model. The dataset contains leukemic stem cells (LSC;
-##' CD34+, CD38-), progenitor cells (CD34+, CD38+), CD38+ blasts
-##' (CD34-, CD38+) and CD38- blasts (CD34-, CD38-). It contains
-##' quantitative information at PSM, peptide and protein levels. Data
-##' was acquired using an Orbitrap Astral mass spectrometer. Direct DIA
-##' analysis was performed with Spectronaut version 17.
-##'
-##' @format A [QFeatures] object with 217 assays, each assay being a
-##' [SingleCellExperiment] object:
-##'
-##' - Assays 1-215: PSM data from the Spectronaut PEPQuant file with
-##'   LFQ quantities from the FG.MS1Quantity column.
-##' - `peptides`: Peptide data resulting from the PSM to peptide
-##'   aggregation the 215 PSM assays. Resulting peptide assays were
-##'   joined into a single assay.
-##' - `proteins`: Protein data from the Spectronaut PGQuant file with
-##'   LFQ quantities from the PG.Quantity column.
-##'
-##' The `colData(petrosius2023_AstralAML())` contains cell type annotation, batch
-##' annotation and FACS data. The description of the `rowData` fields
-##' can be found in the [`Spectronaut` user manual](https://biognosys.com/content/uploads/2023/03/Spectronaut-17_UserManual.pdf).
-##'
-##' @section Acquisition protocol:
-##'
-##' The data were acquired using the following setup. More information
-##' can be found in the source article (see *References*).
-##'
-##' - **Cell isolation**: Cell sorting was done on a FACS Aria III or
-##'   Aria II instrument, controlled by the DIVA software package and
-##'   operated with a 100 microm nozzle. Cells were sorted at single-cell
-##'   resolution, into a 384-well Eppendorf LoBind PCR plate containing
-##'   1 microL of lysis buffer.
-##' - **Sample preparation** Single-cell protein lysates were digested
-##'   overnight at 37°C with 2 ng of Trypsin supplied in 1 microL of
-##'   digestion buffer. Digestion was stopped by the addition of 1 microL
-##'   1% (v/v) trifluoroacetic acid (TFA). All liquid dispensing was
-##'   done using an I-DOT One instrument.
-##' - **Liquid chromatography**: Chromatographic separation of peptides
-##'   was conducted on a vanquish Neo UHPLC system connected to a 50 cm
-##'   uPAC Neo Low-load and an EASY-spray. Autosampler and injection
-##'   valves were configured to perform direct injections from a 384
-##'   well plate using a 25 uL injection loop on 11.8 min gradients.
-##' - **Mass spectrometry**: Acquisition was conducted with an Orbitrap
-##'   Astral mass spectrometer operated in positive mode with the
-##'   FAIMSPro interface compensation voltage set to -45 V.
-##'   MS1 scans were acquired with the Orbitrap at a resolution of
-##'   120,000 and a scan range of 400 to 900 m/z with normalized
-##'   automatic gain control (AGC) target of 300 % and maximum
-##'   injection time of 246 ms. Data independent acquisition of MS2
-##'   spectra was performed in the Astral using loop control set to 0.7
-##'   seconds per cycle with varying isolation window widths and
-##'   injection times. Fragmentation of precursor ions was performed
-##'   using higher energy collisional dissociation (HCD) using a
-##'   normalized collision energy (NCE) of 25 %. AGC target was set to
-##'   800 %.
-##' - **Raw data processing**: Raw files were processed using
-##'   Spectronaut version 17. Direct DIA analysis was performed in
-##'   pipeline mode. Pulsar searches were performed without fixed
-##'   modifications. N-terminal acetylation and methionine oxidation
-##'   were set as variable modifications. Quantification level was set
-##'   to MS1 and the quantity type set to area under the curve.
-##'
-##' @section Data collection:
-##'
-##' The data were provided by the authors and is accessible at the
-##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT)
-##' The dataset ('Astral AML single-cell data from Petrosius et
-##' al. 2023 preprint') contains the following files of interest:
-##'
-##' - `20240201_130747_PEPQuant (Normal).tsv`: the PSM level data
-##' - `20240201_130747_PGQuant (Normal).tsv`: the protein level data
-##' - `index_map.csv`: FACS data.
-##' - `msRuns_overview.csv`: Sample annotations.
-##'
-##' We added the FACS data to the sample annotations in a single table.
-##' Both annotations and PSM features tables are then combined in a
-##' single [QFeatures] object using the [scp::readSCP()] function.
-##'
-##' The peptide data were obtained by aggregation of the PSM data to
-##' the peptide level. All of the resulting peptides assays were joined
-##' into a single assays. Individual peptides assays were discarded.
-##'
-##' The protein data were formatted from the `20240201_130747_PGQuant (Normal).tsv`
-##' to a [SingleCellExperiment] object and the sample metadata were
-##' matched to the column names and stored in the  `colData`. The
-##' object is then added to the [QFeatures] object and the rows of the
-##' peptide data are linked to the rows of the protein data based on
-##' the protein sequence information through an `AssayLink` object.
-##'
-##' Note that the [QFeatures] object has not been further processed and
-##' has therefore not been normalized, log-transformed or
-##' batch-corrected.
-##'
-##' @source The PSM data, protein data and sample annotations can be
-##'     downloaded from the dataset 'Astral AML single-cell data from
-##'     Petrosius et al. 2023 preprint' in the
-##'     [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT).
-##'
-##' @references
-##'
-##' Valdemaras Petrosius, Pedro Aragon-Fernandez, Tabiwang N. Arrey,
-##' Nil Üresin, Benjamin Furtwängler, Hamish Stewart, Eduard Denisov,
-##' Johannes Petzoldt, Amelia C. Peterson, Christian Hock, Eugen
-##' Damoc, Alexander Makarov, Vlad Zabrouskov, Bo T. Porse and Erwin
-##' M. Schoof.
-##' 2023. "Evaluating the capabilities of the Astral mass analyzer for single-cell proteomics."
-##' biorxiv.  https://doi.org/10.1101/2023.06.06.543943
-##' DOI:[10.1101/2023.06.06.543943](https://doi.org/10.1101/2023.06.06.543943)
-##'
-##' @examples
-##' \donttest{
-##' petrosius2023_AstralAML()
-##' }
-##'
-##' @keywords datasets
-##'
-"petrosius2023_AstralAML"
diff --git a/R/derks2022.R b/R/derks2022.R
new file mode 100644
index 0000000..e780350
--- /dev/null
+++ b/R/derks2022.R
@@ -0,0 +1,103 @@
+##' Derks et al. 2022 - plexDIA (Nat. Biotechnol.): PDAC vs melanoma
+##' cells vs monocytes
+##'
+##' Single cell proteomics data acquired by the Slavov Lab using the
+##' plexDIA protocol. It contains quantitative information from
+##' pancreatic ductal acinar cells (PDAC; HPAF-II), melanoma cells
+##' (WM989-A6-G3) and monocytes (U-937) at precursor and protein
+##' level. The each run acquired 3 samples thanks to mTRAQ
+##' multiplexing.
+##'
+##' @format A [QFeatures] object with 66 assays, each assay being a
+##' [SingleCellExperiment] object. The assays either hold the DIA-NN
+##' main output report table or the DIA-NN MS1 extracted signal table.
+##' The DIA-NN main output report table contains the results of the spectrum
+##' identification and quantification. The DIA-NN MS1 extracted
+##' signal table contains quantification for all mTRAQ channels if its
+##' precursors was identified in at least one of the channels,
+##' regardless of whether there is sufficient evidence in those
+##' channels at 1% FDR.
+##'
+##' The data is composed of three datasets
+##'
+##' 1. **Bulk**: dataset containing bulk (100-cell) data acquired
+##'    using a Q-Exactive mass spectrometer. Assays 1-3 contain data
+##'    from the DIA-NN main output report; assay 4 is the DIA-NN MS1
+##'    extracted signal.
+##' 2. **tims**: dataset containing single-cell data acquired using a
+##'    timsTOF-SCP mass spectrometer. Assays 5-15  contain data
+##'    from the DIA-NN main output report; assay 16 is the DIA-NN MS1
+##'    extracted signal.
+##' 3. **qe**: dataset containing single-cell data acquired
+##'    using a Q-Exactive mass spectrometer. Assays 17-64 contain data
+##'    from the DIA-NN main output report; assay 65 is the DIA-NN MS1
+##'    extracted signal.
+##'
+##' The last assay `proteins` contains the processed protein data
+##' table generated by the authors.
+##'
+##' The `colData(derks2022())` contains cell type annotations and
+##' batch annotations. The description of the `rowData` fields for the
+##' different assays can be found in the
+##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: CellenONE cell sorting.
+##' - **Sample preparation** performed using the improved SCoPE2
+##'   protocol using the CellenONE liquid handling system. nPOP cell
+##'   lysis (DMSO) + trypsin digestion + mTRAQ (3plex) labelling and
+##'   pooling. A target library was generated as well to
+##'   perform prioritized DDA (Huffman et al. 2022) using MaxQuant.Live
+##'   (2.0.3).
+##' - **Separation**: `bulk` - online nLC (Dionex UltiMate 3000 UHPLC)
+##'   with a 25 cm × 75 µm IonOpticks Aurora Series UHPLC column
+##'   (AUR2-25075C18A), 200nL/min. `qe` - online nLC (Dionex UltiMate
+##'   3000 UHPLC) with a 15 cm × 75 µm IonOpticks Aurora Series UHPLC
+##'   column (AUR2-15075C18A), 200nL/min. `tims` - nanoElute liquid
+##'   chromatography system (Bruker Daltonics) using a 25 cm × 75 µm,
+##'   1.6-µm C18 (AUR2-25075C18A-CSI, IonOpticks).
+##' - **Ionization**: ESI.
+##' - **Mass spectrometry**: cf article.
+##' - **Data analysis**: DIA-NN (1.8.1 beta 16).
+##'
+##' @section Data collection:
+##'
+##' The data were collected from a shared Google Drive
+##' [folder](https://drive.google.com/drive/folders/1pUC2zgXKtKYn22mlor0lmUDK0frgwL)
+##' that is accessible from the SlavovLab website (see `Source` section).
+##'
+##' For each dataset separately, we combined the sample annotation
+##' and the DIANN tables in a [QFeatures] object following the `scp`
+##' data structure. We then combined the three datasets in a single
+##' `QFeatures` object. We load the proteins table processed by the
+##' authors as a [SingleCellExperiment] object and adapted the sample
+##' names to match those in the `QFeatures` object. We added the
+##' protein data as a new assay and link the precursors to proteins
+##' using the `Protein.Group` variable from the `rowData`.
+##'
+##' @source
+##' The data were downloaded from the
+##' [Slavov Lab](https://scp.slavovlab.net/Derks_et_al_2022) website.
+##' The raw data and the quantification data can also be found in the
+##' massIVE repository `MSV000089093`.
+##'
+##' @references
+##' Derks, Jason, Andrew Leduc, Georg Wallmann, R. Gray Huffman,
+##' Matthew Willetts, Saad Khan, Harrison Specht, Markus Ralser,
+##' Vadim Demichev, and Nikolai Slavov. 2022. "Increasing the
+##' Throughput of Sensitive Proteomics by plexDIA." Nature
+##' Biotechnology, July.
+##' [Link to article](http://dx.doi.org/10.1038/s41587-022-01389-w)
+##'
+##' @examples
+##' \donttest{
+##' derks2022()
+##' }
+##'
+##' @keywords datasets
+##'
+"derks2022"
diff --git a/R/dou2019_boosting.R b/R/dou2019_boosting.R
new file mode 100644
index 0000000..3ce9dd9
--- /dev/null
+++ b/R/dou2019_boosting.R
@@ -0,0 +1,123 @@
+##' Dou et al. 2019 (Anal. Chem.): testing boosting ratios
+##'
+##' @description
+##'
+##' Single-cell proteomics using nanoPOTS combined with TMT isobaric
+##' labeling. It contains quantitative information at PSM and protein
+##' level. The cell types are either "Raw" (macrophage cells), "C10"
+##' (epihelial cells), or "SVEC" (endothelial cells). Each cell is
+##' replicated 2 or 3 times. Each cell type was run using 3 levels of
+##' boosting: 0 ng (no boosting), 5 ng or 50 ng. When boosting was
+##' applied, 1 reference well and 1 boosting well were added,
+##' otherwise 1 empty well was added. Each boosting setting (0ng, 5ng,
+##' 50ng) was run in duplicate.
+##'
+##' @format A [QFeatures] object with 7 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `Boosting_X_run_Y`: PSM data with 10 columns corresponding to
+##'   the TMT-10plex channels. The `X` indicates the boosting amount
+##'   (0ng, 5ng or 50ng) and `Y` indicates the run number (1 or 2).
+##' - `peptides`: peptide data containing quantitative data for 13,462
+##'   peptides in 60 samples (run 1 and run 2 combined).
+##' - `proteins`: protein data containing quantitative data for 1436
+##'   proteins and 60 samples (all runs combined).
+##'
+##' Sample annotation is stored in `colData(dou2019_boosting())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: single-cells from the three murine cell
+##'   lines were isolated using FACS (BD Influx II cell sorter ).
+##'   Boosting sample were prepared (presumably in bulk) from 1:1:1
+##'   mix of the three cell lines.
+##' - **Sample preparation** performed using the nanoPOTs device.
+##'   Protein extraction (DMM + TCEAP) + alkylation (IAA) + Lys-C
+##'   digestion + trypsin digestion + TMT-10plex labeling and pooling.
+##' - **Separation**: nanoLC (Dionex UltiMate with an in-house packed
+##'   50cm x 30um LC columns; 50nL/min)
+##' - **Ionization**: ESI (2,000V)
+##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
+##'   Tribrid (MS1 accumulation time = 50ms; MS1 resolution = 120,000;
+##'   MS1 AGC = 1E6; MS2 accumulation time = 246ms; MS2 resolution =
+##'   60,000; MS2 AGC = 1E5)
+##' - **Data analysis**: MS-GF+ + MASIC (v3.0.7111) + RomicsProcessor
+##'   (custom R package)
+##'
+##' @section Data collection:
+##'
+##' The PSM data were collected from the MassIVE repository
+##' MSV000084110 (see `Source` section). The downloaded files are:
+##'
+##' - `Boosting_*ng_run_*_msgfplus.mzid`: the MS-GF+ identification
+##'   result files.
+##' - `Boosting_*ng_run_*_ReporterIons.txt`: the MASIC quantification
+##'   result files.
+##'
+##' For each batch, the quantification and identification data were
+##' combined based on the scan number (common to both data sets). The
+##' combined datasets for the different runs were then concatenated
+##' feature-wise. To avoid data duplication due to ambiguous matching
+##' of spectra to peptides or ambiguous mapping of peptides to proteins,
+##' we combined ambiguous peptides to peptides groups and proteins to
+##' protein groups. Feature annotations that are not common within a
+##' peptide or protein group are are separated by a `;`. The sample
+##' annotation table was manually created based on the available
+##' information provided in the article. The data were then converted
+##' to a [QFeatures] object using the [scp::readSCP()] function.
+##'
+##' We generated the peptide data. First, we removed PSM matched to
+##' contaminants or decoy peptides and ensured a 1% FDR. We aggregated
+##' the PSM to peptides based on the peptide (or peptide group)
+##' sequence(s) using the median PSM instenity. The peptide data for
+##' the different runs were then joined in a single assay (see
+##' [QFeatures::joinAssays]), again based on the peptide sequence(s).
+##' We then removed the peptide groups. Links between the peptide and
+##' the PSM data were created using [QFeatures::addAssayLink]. Note
+##' that links between PSM and peptide groups are not stored.
+##'
+##' The protein data were downloaded from `Supporting information`
+##' section from the publisher's website (see `Sources`). The data is
+##' supplied as an Excel file `ac9b03349_si_004.xlsx`. The file
+##' contains 7 sheets from which we took the 2nd, 4th and 6th sheets
+##' (named `01 - No Boost raw data`, `03 - 5ng boost raw data`,
+##' `05 - 50ng boost raw data`, respectively). The sheets contain the
+##' combined protein data for the duplicate runs given the boosting
+##' amount. We joined the data for all boosting ration based on the
+##' protein name and converted the data to a [SingleCellExperiment]
+##' object. We then added the object as a new assay in the [QFeatures]
+##' dataset (containing the PSM data). Links between the proteins and
+##' the corresponding PSM were created. Note that links to protein
+##' groups are not stored.
+##'
+##' @source
+##' The PSM data can be downloaded from the massIVE repository
+##' MSV000084110. FTP link: ftp://massive.ucsd.edu/MSV000084110/
+##'
+##' The protein data can be downloaded from the
+##' [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.analchem.9b03349)
+##' website (Supporting information section).
+##'
+##' @references
+##' Dou, Maowei, Geremy Clair, Chia-Feng Tsai, Kerui Xu, William B.
+##' Chrisler, Ryan L. Sontag, Rui Zhao, et al. 2019. “High-Throughput
+##' Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a
+##' Nanodroplet Sample Preparation Platform.” Analytical Chemistry,
+##' September
+##' ([link to article](https://doi.org/10.1021/acs.analchem.9b03349)).
+##'
+##' @seealso
+##' [dou2019_lysates], [dou2019_mouse]
+##'
+##' @examples
+##' \donttest{
+##' dou2019_boosting()
+##' }
+##'
+##' @keywords datasets
+##'
+##'
+"dou2019_boosting"
diff --git a/R/dou2019_lysates.R b/R/dou2019_lysates.R
new file mode 100644
index 0000000..cabbb65
--- /dev/null
+++ b/R/dou2019_lysates.R
@@ -0,0 +1,119 @@
+##' Dou et al. 2019 (Anal. Chem.): HeLa lysates
+##'
+##' @description
+##'
+##' Single-cell proteomics using nanoPOTS combined with TMT
+##' multiplexing. It contains quantitative information at PSM and
+##' protein level. The samples are commercial Hela lysates diluted to
+##' single-cell amounts (0.2 ng). The boosting wells contain the same
+##' digest but at higher amount (10 ng).
+##'
+##' @format A [QFeatures] object with 3 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `Hela_run_1`: PSM data with 10 columns corresponding to the
+##'   TMT-10plex channels. Columns hold quantitative information for
+##'   HeLa lysate samples (either 0, 0.2 or 10ng). This is the data
+##'   for run 1.
+##' - `Hela_run_1`: PSM data with 10 columns corresponding to the
+##'   TMT-10plex channels. Columns hold quantitative information for
+##'   HeLa lysate samples (either 0, 0.2 or 10ng). This is the data
+##'   for run 2.
+##' - `peptides`: peptide data containing quantitative data for 13,934
+##'   peptides in 20 samples (run 1 and run 2 combined).
+##' - `proteins`: protein data containing quantitative data for 1641
+##'   proteins in 20 samples (run 1 and run 2 combined).
+##'
+##' Sample annotation is stored in `colData(dou2019_lysates())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: commercially available HeLa protein digest
+##'   (Thermo Scientific).
+##' - **Sample preparation** performed using the nanoPOTs device.
+##'   Protein extraction (DMM + TCEAP) + alkylation (IAA) + Lys-C
+##'   digestion + trypsin digestion + TMT-10plex labeling and pooling.
+##' - **Separation**: nanoLC (Dionex UltiMate with an in-house packed
+##'   50cm x 30um LC columns; 50nL/min)
+##' - **Ionization**: ESI (2,000V)
+##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
+##'   Tribrid (MS1 accumulation time = 50ms; MS1 resolution = 120,000;
+##'   MS1 AGC = 1E6; MS2 accumulation time = 246ms; MS2 resolution =
+##'   60,000; MS2 AGC = 1E5)
+##' - **Data analysis**: MS-GF+ + MASIC (v3.0.7111) + RomicsProcessor
+##'   (custom R package)
+##'
+##' @section Data collection:
+##'
+##' The PSM data were collected from the MassIVE repository
+##' MSV000084110 (see `Source` section). The downloaded files are:
+##'
+##' - `Hela_run_*_msgfplus.mzid`: the MS-GF+ identification result
+##'   files
+##' - `Hela_run_*_ReporterIons.txt`: the MASIC quantification result
+##'   files
+##'
+##' For each batch, the quantification and identification data were
+##' combined based on the scan number (common to both data sets). The
+##' combined datasets for the different runs were then concatenated
+##' feature-wise. To avoid data duplication due to ambiguous matching
+##' of spectra to peptides or ambiguous mapping of peptides to proteins,
+##' we combined ambiguous peptides to peptides groups and proteins to
+##' protein groups. Feature annotations that are not common within a
+##' peptide or protein group are are separated by a `;`. The sample
+##' annotation table was manually created based on the available
+##' information provided in the article. The data were then converted
+##' to a [QFeatures] object using the [scp::readSCP()] function.
+##'
+##' We generated the peptide data. First, we removed PSM matched to
+##' contaminants or decoy peptides and ensured a 1% FDR. We aggregated
+##' the PSM to peptides based on the peptide (or peptide group)
+##' sequence(s) using the median PSM instenity. The peptide data for
+##' the different runs were then joined in a single assay (see
+##' [QFeatures::joinAssays]), again based on the peptide sequence(s).
+##' We then removed the peptide groups. Links between the peptide and
+##' the PSM data were created using [QFeatures::addAssayLink]. Note
+##' that links between PSM and peptide groups are not stored.
+##'
+##' The protein data were downloaded from `Supporting information`
+##' section from the publisher's website (see `Sources`). The data is
+##' supplied as an Excel file `ac9b03349_si_003.xlsx`. The file
+##' contains 7 sheets from which we only took the sheet 6 (named
+##' `5 - Run 1 and 2 raw data`) with the combined protein data for the
+##' two runs. We converted the data to a [SingleCellExperiment]
+##' object and added the object as a new assay in the [QFeatures]
+##' dataset (containing the PSM data). Links between the proteins and
+##' the peptides were created. Note that links to protein groups are
+##' not stored.
+##'
+##' @source
+##' The PSM data can be downloaded from the massIVE repository
+##' MSV000084110. FTP link: ftp://massive.ucsd.edu/MSV000084110/
+##'
+##' The protein data can be downloaded from the
+##' [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.analchem.9b03349)
+##' website (Supporting information section).
+##'
+##' @references
+##' Dou, Maowei, Geremy Clair, Chia-Feng Tsai, Kerui Xu, William B.
+##' Chrisler, Ryan L. Sontag, Rui Zhao, et al. 2019. “High-Throughput
+##' Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a
+##' Nanodroplet Sample Preparation Platform.” Analytical Chemistry,
+##' September
+##' ([link to article](https://doi.org/10.1021/acs.analchem.9b03349)).
+##'
+##' @seealso
+##' [dou2019_mouse], [dou2019_boosting]
+##'
+##' @examples
+##' \donttest{
+##' dou2019_lysates()
+##' }
+##'
+##' @keywords datasets
+##'
+##'
+"dou2019_lysates"
diff --git a/R/dou2019_mouse.R b/R/dou2019_mouse.R
new file mode 100644
index 0000000..0afab3b
--- /dev/null
+++ b/R/dou2019_mouse.R
@@ -0,0 +1,126 @@
+##' Dou et al. 2019 (Anal. Chem.): murine cell lines
+##'
+##' @description
+##'
+##' Single-cell proteomics using nanoPOTS combined with TMT isobaric
+##' labeling. It contains quantitative information at PSM and protein
+##' level. The cell types are either "Raw" (macrophage cells), "C10"
+##' (epihelial cells), or "SVEC" (endothelial cells). Out of the 132
+##' wells, 72 contain single cells, corresponding to 24 C10 cells, 24
+##' RAW cells, and 24 SVEC. The other wells are either boosting
+##' channels (12), empty channels (36) or reference channels (12).
+##' Boosting and reference channels are balanced (1:1:1) mixes of C10,
+##' SVEC, and RAW samples at 5 ng and 0.2 ng, respectively. The
+##' different cell types where evenly distributed across 4 nanoPOTS
+##' chips. Samples were 11-plexed with TMT labeling.
+##'
+##' @format A [QFeatures] object with 13 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `Single_Cell_Chip_X_Y`: PSM data with 11 columns corresponding
+##'   to the TMT channels (see `Notes`). The `X` indicates the chip
+##'   number (from 1 to 4) and `Y` indicates the row name on the chip
+##'   (from A to C).
+##' - `peptides`: peptide data containing quantitative data for 15,492
+##'   peptides in 132 samples (run 1 and run 2 combined).
+##' - `proteins`: protein data containing quantitative data for 2331
+##'   proteins in 132 samples (all runs combined).
+##'
+##' Sample annotation is stored in `colData(dou2019_mouse())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: single-cells from the three murine cell
+##'   lines were isolated using FACS (BD Influx II cell sorter ).
+##' - **Sample preparation** performed using the nanoPOTs device.
+##'   Protein extraction (DMM + TCEAP) + alkylation (IAA) + Lys-C
+##'   digestion + trypsin digestion + TMT-10plex labeling and pooling.
+##' - **Separation**: nanoLC (Dionex UltiMate with an in-house packed
+##'   50cm x 30um LC columns; 50nL/min)
+##' - **Ionization**: ESI (2,000V)
+##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
+##'   Tribrid (MS1 accumulation time = 50ms; MS1 resolution = 120,000;
+##'   MS1 AGC = 1E6; MS2 accumulation time = 246ms; MS2 resolution =
+##'   60,000; MS2 AGC = 1E5)
+##' - **Data analysis**: MS-GF+ + MASIC (v3.0.7111) + RomicsProcessor
+##'   (custom R package)
+##'
+##' @section Data collection:
+##'
+##' The PSM data were collected from the MassIVE repository
+##' MSV000084110 (see `Source` section). The downloaded files are:
+##'
+##'
+##' - `Single_Cell_Chip_*_*_msgfplus.mzid`: the MS-GF+ identification
+##'   result files.
+##' - `Single_Cell_Chip_*_*_ReporterIons.txt`: the MASIC
+##'   quantification result files.
+##'
+##' For each batch, the quantification and identification data were
+##' combined based on the scan number (common to both data sets). The
+##' combined datasets for the different runs were then concatenated
+##' feature-wise. To avoid data duplication due to ambiguous matching
+##' of spectra to peptides or ambiguous mapping of peptides to proteins,
+##' we combined ambiguous peptides to peptides groups and proteins to
+##' protein groups. Feature annotations that are not common within a
+##' peptide or protein group are are separated by a `;`. The sample
+##' annotation table was manually created based on the available
+##' information provided in the article. The data were then converted
+##' to a [QFeatures] object using the [scp::readSCP()] function.
+##'
+##' We generated the peptide data. First, we removed PSM matched to
+##' contaminants or decoy peptides and ensured a 1% FDR. We aggregated
+##' the PSM to peptides based on the peptide (or peptide group)
+##' sequence(s) using the median PSM instenity. The peptide data for
+##' the different runs were then joined in a single assay (see
+##' [QFeatures::joinAssays]), again based on the peptide sequence(s).
+##' We then removed the peptide groups. Links between the peptide and
+##' the PSM data were created using [QFeatures::addAssayLink]. Note
+##' that links between PSM and peptide groups are not stored.
+##'
+##' The protein data were downloaded from `Supporting information`
+##' section from the publisher's website (see `Sources`). The data is
+##' supplied as an Excel file `ac9b03349_si_005.xlsx`. The file
+##' contains 7 sheets from which we only took the 2nd (named
+##' `01 - Raw sc protein data`) with the combined protein data for the
+##' 12 runs. We converted the data to a [SingleCellExperiment] object
+##' and added the object as a new assay in the [QFeatures] dataset
+##' (containing the PSM data). Links between the proteins and the
+##' corresponding PSM were created. Note that links to protein groups
+##' are not stored.
+##'
+##' @note Although a TMT-10plex labeling is reported in the article,
+##' the PSM data contained 11 channels for each run. Those 11th
+##' channel contain mostly missing data and are hence assumed to be
+##' empty channels.
+##'
+##' @source
+##' The PSM data can be downloaded from the massIVE repository
+##' MSV000084110. FTP link: ftp://massive.ucsd.edu/MSV000084110/
+##'
+##' The protein data can be downloaded from the
+##' [ACS Publications](https://pubs.acs.org/doi/10.1021/acs.analchem.9b03349)
+##' website (Supporting information section).
+##'
+##' @references
+##' Dou, Maowei, Geremy Clair, Chia-Feng Tsai, Kerui Xu, William B.
+##' Chrisler, Ryan L. Sontag, Rui Zhao, et al. 2019. “High-Throughput
+##' Single Cell Proteomics Enabled by Multiplex Isobaric Labeling in a
+##' Nanodroplet Sample Preparation Platform.” Analytical Chemistry,
+##' September
+##' ([link to article](https://doi.org/10.1021/acs.analchem.9b03349)).
+##'
+##' @seealso
+##' [dou2019_lysates], [dou2019_boosting]
+##'
+##' @examples
+##' \donttest{
+##' dou2019_mouse()
+##' }
+##'
+##' @keywords datasets
+##'
+"dou2019_mouse"
diff --git a/R/gregoire2023_mixCTRL.R b/R/gregoire2023_mixCTRL.R
new file mode 100644
index 0000000..a50d8f2
--- /dev/null
+++ b/R/gregoire2023_mixCTRL.R
@@ -0,0 +1,108 @@
+##' Grégoire et al. 2023 - mixCTRL (arXiv): benchmark using
+##' monocytes/macrophages
+##'
+##' Single cell proteomics data acquired using the SCoPE2 protocol.
+##' The dataset contains two monocytes cell lines (THP1 and U937) as
+##' well as controled mixtures of both and macrophage-like cells
+##' produced upon PMA treatment. It contains quantitative information
+##' at PSM, peptide and protein levels. Data was acquired using Lumos
+##' Orbitrap (mainly) and timsTOF SCP mass spectrometers.
+##'
+##' @format A [QFeatures] object with 119 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - Assays 1-42: PSM data acquired with a TMT-16plex protocol, hence
+##'   those assays contain 16 columns. Columns hold quantitative
+##'   information from single-cell channels, carrier channels,
+##'   blank (negative control) channels and unused channels.
+##' - Assays 43-84: peptide data resulting from the PSM to peptide
+##'   aggregation of the 42 PSM assays.
+##' - Assays 85-91: peptide data for each of the 7 acquisition
+##'   batches. Peptide data were joined based on their respective
+##'   acquisition batches.
+##' - Assays 92-98: normalised peptide data.
+##' - Assays 99-105: normalised and log-transformed peptide data.
+##' - Assays 106-112: protein data for each of the 7 acquisition
+##'   batches. Normalised and log-transformed peptide data were
+##'   agreggated to protein.
+##' - Assays 113-119: Batch corrected protein data. Normalised and
+##'   log-transformed protein data were batch corrected to remove
+##'   technical variability induced by runs and channels.
+##'
+##' All the data has been filtered to keep high quality features and
+##' samples.
+##'
+##' The `colData(gregoire2023_mixCTRL())` contains cell type annotation and
+##' batch annotation that are common to all assays. The description of
+##' the `rowData` fields for the PSM data can be found in the
+##' [`sage` documentation](https://sage-docs.vercel.app/docs/results/search).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see *References*).
+##'
+##' - **Cell isolation**: BD FACSAria III cell sorting.
+##' - **Sample preparation** performed using the SCoPE2 protocol: mPOP
+##'   cell lysis + trypsin digestion + TMT-16plex labeling and
+##'   pooling.
+##' - **Separation**: online nLC (Ultimate 3000 LC System or Vanquish
+##'   Neo UHPLC System) with a BioZen Peptide Polar C18 250 x 0.0075mm
+##'   column.
+##' - **Mass spectrometry**:  Orbitrap Fusion Lumos Tribrid (MS1
+##'   resolution = 70,000; MS2 accumulation time = 120ms; MS2
+##'   resolution = 70,000) and timsTOF SCP.
+##' - **Data preprocessing**: Sage.
+##'
+##' @section Data collection:
+##'
+##' The PSM data were collected from a Zenodo archive (see `Source`
+##' section). The folder contains the following files of interest:
+##'
+##' - `results.sage.cbio.tsv`: the sage identification output file for
+##'   batches acquired on the Lumos MS.
+##' - `results.sage.giga.tsv`: the sage identification output file for
+##'   batches acquired on the timsTOF SCP MS.
+##' - `quant.cbio.tsv`: the sage quantification output file for
+##'   batches acquired on the Lumos MS.
+##' - `quant.giga.tsv`: the sage quantification output file for
+##'   batches acquired on the timsTOF SCP MS.
+##' - `sampleAnnotation_batch.csv`: sample annotation for each
+##'   acquisition batch. There are in total 8 different annotation
+##'   files.
+##'
+##' We combined the sample annotations in a single table. We also
+##' combined `cbio` and `giga` tables together and merged resulting
+##' identification and quantification tables. Both annotation and
+##' features tables are then combined in a single [QFeatures] object
+##' using the [scp::readSCP()] function.
+##'
+##' The [QFeatures] object was processed as described in the author's
+##' manuscript (see `source`). Note that the imputed assays were used
+##' in the paper for illustrative purposes only and have not been
+##' reproduced here.
+##'
+##' @source
+##' The data were downloaded from the [Zenodo
+##' repository](https://zenodo.org/records/8417228).  The raw data and
+##' the quantification data can also be found in the ProteomeXchange
+##' Consortium via the [PRIDE partner
+##' repository](https://www.ebi.ac.uk/pride/archive/projects/PXD046211),
+##' project `PXD046211`.
+##'
+##' @references
+##' Samuel Grégoire, Christophe Vanderaa, Sébastien Pyr dit Ruys,
+##' Gabriel Mazzucchelli, Christopher Kune, Didier Vertommen and
+##' Laurent Gatto. 2023. *Standardised workflow for mass spectrometry-
+##' based single-cell proteomics data processing and analysis using
+##' the scp package.*
+##' arXiv. DOI:[10.48550/arXiv.2310.13598](https://doi.org/10.48550/arXiv.2310.13598)
+##'
+##' @examples
+##' \donttest{
+##' gregoire2023_mixCTRL()
+##' }
+##'
+##' @keywords datasets
+##'
+"gregoire2023_mixCTRL"
diff --git a/R/guise2024.R b/R/guise2024.R
new file mode 100644
index 0000000..cdfced9
--- /dev/null
+++ b/R/guise2024.R
@@ -0,0 +1,99 @@
+##' Guise et al. 2020 (Cell Rep.): postmortem ALS spinal moto neurons
+##'
+##' Single-cell proteomics data from postmortem human spinal moto
+##' neurons (MN) obtained from control donors or donors with amyotrophic
+##' lateral sclerosis (ALS). The data were generated following the
+##' NanoPOTS protocol. Cells were isolated from samples obtained by
+##' the university of Miami Brain Bank using laser capture
+##' microdissection (LCM). Additional information about the amount of
+##' TDP-43 intra-cellular levels has been assigned into levels 0 to 4.
+##'
+##' @format A [QFeatures] object with 102 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `F*`: 100 assays containing PSM data.
+##' - `peptides`: quantitative data for 34,315 peptides in 108 samples.
+##'   All samples combined, along with 8 additional unannotated
+##'   samples.
+##' - `proteins`: quantitative data for 4,437 protein groups in 108
+##'   samples. All samples combined, along with 8 additional
+##'   unannotated samples.
+##'
+##' Sample annotation is stored in `colData(guise2024())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: The MN were isolated from samples obtained
+##'   by the university of Miami Brain Bank using LCM.
+##' - **Sample preparation** performed using the nanoPOTS workflow.
+##'   Cells are treated with 0.1% DDM (for lysis) added with DTT
+##'   (protein reduction), then IAA (alkylation), then Lys-C and
+##'   trypsin (protein digestion).
+##' - **Separation**: Samples were injected on the column using an
+##'   Ultimate 3000 RSLCnano pump. The in-line loading column is a
+##'   home-packed SPE column (5cm x 75um) while the peptide
+##'   separation is performed on a an in-house-packed analytical SPE
+##'   column (50 cm x 30um), using a 20nL/min flow rate.
+##' - **Ionization**: nanospray emmitter (2,000V)
+##' - **Mass spectrometry**: Orbitrap Exploris 480. HCD fragmentation.
+##'   MS1 settings: accumulation time = 200 ms; resolution = 120,000;
+##'   AGC = 1E6. MS2 settings: exclusion duration = 90 s;
+##'   accumulation time = 500 ms; resolution = 30,000; AGC = 1E5.
+##' - **Data analysis**: Sequest HT in Proteome Discoverer (v2.5) and
+##'   the search database is Swiss-Prot (July 2020).
+##'
+##' @section Data collection:
+##'
+##' All data were collected from the MassIVE repository (accession ID:
+##' MSV000092119).
+##'
+##' The sample annotations were combined from the tables in
+##' `Biogen_TDP43_Round2_Reanalysis_10-13-2021_InputFiles.txt` and in
+##' `Groups.txt`.
+##'
+##' The PSM data were found in the
+##' `Biogen_TDP43_Round2_Reanalysis_10-13-2021_PSMs.txt` file. The
+##' data were converted to a [QFeatures] object using the [scp::readSCP()]
+##' function. We could not find sample annotations for MS run ID:
+##' F61, F34, F42, F88, F77, F8, F21, F5.
+##'
+##' The peptide data were found in the
+##' `Biogen_TDP43_Round2_Reanalysis_10-13-2021_PeptideGroups.txt`
+##' file. The column names holding the quantitative data were adapted
+##' to match the sample names in the [QFeatures] object. The data were
+##' then converted to a [SingleCellExperiment] object and then
+##' inserted in the [QFeatures] object.
+##'
+##' A similar procedure was applied to the protein data. The data were
+##' found in the
+##' `Biogen_TDP43_Round2_Reanalysis_10-13-2021_Proteins.txt` file. The
+##'  column names were
+##' adapted, the data were converted to a [SingleCellExperiment]
+##' object and then inserted in the [QFeatures] object.
+##'
+##' @source
+##'
+##' All data can be downloaded from the MassIVE repository
+##' MSV000092119. The source link is:
+##' ftp://massive.ucsd.edu/v05/MSV000092119/
+##'
+##' @references
+##'
+##' Guise, Amanda J., Santosh A. Misal, Richard Carson, Jen-Hwa Chu,
+##' Hannah Boekweg, Daisha Van Der Watt, Nora C. Welsh, et al. 2024.
+##' “TDP-43-Stratified Single-Cell Proteomics of Postmortem Human
+##' Spinal Motor Neurons Reveals Protein Dynamics in Amyotrophic
+##' Lateral Sclerosis.” Cell Reports 43 (1): 113636.
+##' ([link to article](http://dx.doi.org/10.1016/j.celrep.2023.113636)).
+##'
+##' @examples
+##' \donttest{
+##' guise2024()
+##' }
+##'
+##' @keywords datasets
+##'
+"guise2024"
diff --git a/R/khan2023.R b/R/khan2023.R
new file mode 100644
index 0000000..30da107
--- /dev/null
+++ b/R/khan2023.R
@@ -0,0 +1,110 @@
+##' Khan et al, 2023 (biorRxiv): Epithelial–Mesenchymal Transition
+##'
+##' @description
+##'
+##' Single-cell samples were prepared using the nPOP sample
+##' preparation method.  Proteomics data were acquired using the
+##' SCoPE2 protocol on a Thermo Scientific Q-Exactive mass
+##' spectrometer. The dataset contains quantitative information on 421
+##' MCF-10A single cells undergoing epithelial–mesenchymal transition
+##' (EMT) triggered by TGF beta. The data are available at the PSM,
+##' and protein levels. The paper investigates the dynamics of
+##' correlation modules at the protein level.
+##'
+##' @format A [QFeatures] object with 47 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - Assay 1-44: PSM data acquired with a TMTPro 16plex protocol, hence
+##'   those assays contain 16 columns. Columns hold quantitative information
+##'   from single-cell channels, carrier channels, reference channels,
+##'   empty (negative control) channels and unused channels.
+##' - `peptides`: peptide data containing quantitative data for 10055
+##'   peptides and 421 single-cells.
+##' - `proteins_imputed`: protein data containing quantitative data for 4096
+##'   proteins and 421 single-cells with k-nearest neighbors (KNN) imputation.
+##' - `proteins_unimputed`: protein data containing quantitative data for 4096
+##'   proteins and 421 single-cells without imputation.
+##'
+##' The `colData(khan2023())` contains cell type and batch annotations that
+##' are common to all assays. The description of the `rowData` fields for the
+##' PSM data can be found in the
+##' [`MaxQuant` documentation](https://cox-labs.github.io/coxdocs/output_tables.html).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: CellenONE cell sorting.
+##' - **Sample preparation** performed using the SCoPE2 protocol. nPOP
+##'   cell lysis (DMSO) + trypsin digestion + TMTPro 16plex protocol.
+##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
+##'   25cm x 75um IonOpticks Odyssey Series column (ODY3-25075C18); 200nL/min).
+##' - **Ionization**: ESI (1,700 V).
+##' - **Mass spectrometry**: Thermo Scientific Q-Exactive (MS1
+##'   resolution = 70,000; MS1 accumulation time = 300ms; MS2
+##'   resolution = 70,000).
+##' - **Data analysis**: MaxQuant(2.4.13.0) + DART-ID.
+##'
+##' @section Data collection:
+##'
+##' The PSM data were collected from a shared Google Drive folder that
+##' is accessible from the SlavovLab website (see `Source` section).
+##' The folder ('/002-singleCellDataGeneration') contains the following
+##' files of interest:
+##'
+##' - `ev_updated_NS.DIA.txt`: the MaxQuant/DART-ID output file
+##' - `annotation.csv`: sample annotation
+##' - `batch.csv`: batch annotation
+##'
+##' We combined the sample annotation and the batch annotation in
+##' a single table. We also formatted the quantification table so that
+##' columns match with those of the annotation and filter only for
+##' single-cell runs. Both table are then combined in a single
+##' [QFeatures] object using the [scp::readSCP()] function.
+##'
+##' The peptide data were taken from the same google drive folder
+##' (`EpiToMesen.TGFB.nPoP_trial1_pepByCellMatrix_NSThreshDART_medIntCrNorm.txt`).
+##' The data were formatted to a [SingleCellExperiment] object and the sample
+##' metadata were matched to the column names (mapping is retrieved
+##' after running the SCoPE2 R script, `EMTTGFB_singleCellProcessing.R`) and
+##' stored in the `colData`. The object is then added to the [QFeatures] object
+##' and the rows of the PSM data are linked to the rows of the peptide data
+##' based on the peptide sequence information through an `AssayLink` object.
+##'
+##' The imputed protein data were taken from the same google drive folder
+##' (`EpiToMesen.TGFB.nPoP_trial1_ProtByCellMatrix_NSThreshDART_medIntCrNorm_imputedNotBC.csv`).
+##' The data were formatted to a [SingleCellExperiment] object and the sample
+##' metadata were matched to the column names (mapping is retrieved
+##' after running the SCoPE2 R script, `EMTTGFB_singleCellProcessing.R`) and
+##' stored in the `colData`. The object is then added to the [QFeatures] object
+##' and the rows of the peptide data are linked to the rows of the protein data
+##' based on the protein sequence information through an `AssayLink` object.
+##'
+##' The unimputed protein data were taken from the same google drive folder
+##' (`EpiToMesen.TGFB.nPoP_trial1_ProtByCellMatrix_NSThreshDART_medIntCrNorm_unimputed.csv`).
+##' The data were formatted and added exactly as imputed data.
+##'
+##' @source
+##' The data were downloaded from the
+##' [Slavov Lab](https://scp.slavovlab.net/Khan_et_al_2023) website via a
+##' shared Google Drive
+##' [folder](https://drive.google.com/drive/folders/1zCsRKWNQuAz5msxx0DfjDrIe6pUjqQmj).
+##' The raw data and the quantification data can also be found in the
+##' MassIVE repository `MSV000092872`:
+##' ftp://MSV000092872@massive.ucsd.edu/.
+##'
+##' @references
+##' Saad Khan, Rachel Conover, Anand R. Asthagiri, Nikolai Slavov. 2023.
+##' "Dynamics of single-cell protein covariation during epithelial–mesenchymal
+##' transition." bioRxiv.
+##' ([link to article](https://doi.org/10.1101/2023.12.21.572913)).
+##'
+##' @examples
+##' \donttest{
+##' khan2023()
+##' }
+##'
+##' @keywords datasets
+##'
+"khan2023"
diff --git a/R/leduc2022_pSCoPE.R b/R/leduc2022_pSCoPE.R
new file mode 100644
index 0000000..0cf7b37
--- /dev/null
+++ b/R/leduc2022_pSCoPE.R
@@ -0,0 +1,129 @@
+##' Leduc et al. 2022 - pSCoPE (biorRxiv): melanoma cells vs monocytes
+##'
+##' Single cell proteomics data acquired by the Slavov Lab. This is
+##' the dataset associated to the third version of the preprint. It
+##' contains quantitative information of melanoma cells and monocytes
+##' at PSM, peptide and protein level. This version of the data was
+##' acquired using the pSCoPE MS acquisition approach.
+##'
+##' @format A [QFeatures] object with 138 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - Assay 1-134: PSM data acquired with a TMT-18plex protocol, hence
+##'   those assays contain 18 columns. Columns hold quantitative
+##'   information from single-cell channels, carrier channels,
+##'   reference channels, empty (negative control) channels and
+##'   unused channels.
+##' - `peptides`: peptide data containing quantitative data for 20,804
+##'   peptides and 1556 single-cells. These data have been filtered
+##'   to keep high-quality PSMs, all batches have been normalized to
+##'   the reference channel, PSMs were aggregated to peptides, and
+##'   single-cells with low median coefficient of variation were kept.
+##' - `peptides_log`: peptide data containing quantitative data for
+##'   12,284 peptides and 1543 single-cells. The `peptides` data was
+##'   further normalized, highly missing peptides were removed and the
+##'   quantifications were log-transformed.
+##' - `proteins_norm2`: protein data containing quantitative data for
+##'   2844 proteins and 1543 single-cells. The peptides from
+##'   `peptides_log` were aggregated to proteins and normalized.
+##' - `proteins_processed`: protein data containing quantitative data
+##'   for 2844 proteins and 1543 single-cells. The `proteins_norm2`
+##'   data were imputed, batch corrected and normalized.
+##'
+##' The `colData(leduc2022_pSCoPE())` contains cell type annotation,
+##' LC batch information, the TMT label, the MS run ID. We also added
+##' the sample prep annotations provided by the cellenONE dispensing
+##' device (only for single cells): time stamp of cell isolation by the
+##' device, the diameter and elongation of the cell, the ID of the
+##' sample glass side (4 slides in total), the field within the glass
+##' (each slide is divided in 4 field), the pooled well ID (each field
+##' contains 9 pools), the x and y coordinates of each cell dropped in
+##' a field and of each cell pool upon pickup. Finally, we also
+##' retrieved the melanoma subpopulation generated by the authors upon
+##' data analysis. The main population is encoded as `A` while the
+##' small population is encoded `B`. The description of the `rowData`
+##' fields for the PSM data can be found in the
+##' [`MaxQuant` documentation](http://www.coxdocs.org/doku.php?id=maxquant:table:evidencetable).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: CellenONE cell sorting.
+##' - **Sample preparation** performed using the improved SCoPE2
+##'   protocol using the CellenONE liquid handling system. nPOP cell
+##'   lysis (DMSO) + trypsin digestion + TMT-18plex
+##'   labeling and pooling. A target library was generated as well to
+##'   perform prioritized DDA (Huffman et al. 2022) using MaxQuant.Live
+##'   (2.0.3).
+##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
+##'   25cm x 75um IonOpticks Aurora Series UHPLC column; 200nL/min).
+##' - **Ionization**: ESI (1,800V).
+##' - **Mass spectrometry**: Thermo Scientific Q-Exactive (MS1
+##'   resolution = 70,000; MS2 accumulation time = 300ms; MS2
+##'   resolution = 70,000). Prioritized data acquisition was performed
+##'   using the pSCoPE protocol (Huffman et al. 2022)
+##' - **Data analysis**: MaxQuant (1.6.17.0) + DART-ID
+##'
+##' @section Data collection:
+##'
+##' The PSM data were collected from a shared Google Drive folder that
+##' is accessible from the SlavovLab website (see `Source` section).
+##' The folder contains the following files of interest:
+##'
+##' - `ev_updated.txt`: the MaxQuant/DART-ID output file
+##' - `annotation.csv`: sample annotation
+##' - `batch.csv`: batch annotation
+##' - `t0.csv`: the processed data table containing the `peptides` data
+##' - `t3.csv`: the processed data table containing the `peptides_log`
+##'   data
+##' - `t4b.csv`: the processed data table containing the
+##'   `proteins_norm2` data
+##' - `t6.csv`: the processed data table containing the
+##'   `proteins_processed` data
+##'
+##' We combined the sample annotation and the batch annotation in
+##' a single table. We also formatted the quantification table so that
+##' columns match with those of the annotations. Both annotation and
+##' quantification tables are then combined in a single [QFeatures]
+##' object using the [scp::readSCP()] function.
+##'
+##' The 4 CSV files were loaded and formatted as [SingleCellExperiment]
+##' objects and the sample metadata were matched to the column names
+##' (mapping is retrieved after running the author's original R script)
+##' and stored in the `colData`.
+##' The object is then added to the [QFeatures] object (containing the
+##' PSM assays) and the rows of the peptide data are linked to the
+##' rows of the PSM data based on the peptide sequence information
+##' through an `AssayLink` object.
+##'
+##' @source
+##' The data were downloaded from the
+##' [Slavov Lab](https://scp.slavovlab.net/Leduc_et_al_2022) website.
+##' The raw data and the quantification data can also be found in the
+##' massIVE repository `MSV000089159`:
+##' ftp://massive.ucsd.edu/MSV000089159.
+##'
+##' @references
+##' Andrew Leduc, Gray Huffman, and Nikolai Slavov. 2022. “Droplet
+##' Sample Preparation for Single-Cell Proteomics Applied to the Cell
+##' Cycle.” bioRxiv. [Link to article](https://doi.org/10.1101/2021.04.24.441211)
+##'
+##' Gray Huffman, Andrew Leduc, Christoph Wichmann, Marco di Gioia,
+##' Francesco Borriello, Harrison Specht, Jason Derks, et al. 2022.
+##' “Prioritized Single-Cell Proteomics Reveals Molecular and
+##' Functional Polarization across Primary Macrophages.” bioRxiv.
+##' [Link to article](https://doi.org/10.1101/2022.03.16.484655).
+##'
+##' @seealso
+##' [leduc2022_plexDIA]
+##'
+##' @examples
+##' \donttest{
+##' leduc2022_pSCoPE()
+##' }
+##'
+##' @keywords datasets
+##'
+"leduc2022_pSCoPE"
diff --git a/R/leduc2022_plexDIA.R b/R/leduc2022_plexDIA.R
new file mode 100644
index 0000000..02e8783
--- /dev/null
+++ b/R/leduc2022_plexDIA.R
@@ -0,0 +1,121 @@
+##' Leduc et al. 2022 - plexDIA (biorRxiv): melanoma cells
+##'
+##' Single cell proteomics data acquired by the Slavov Lab. This is
+##' the dataset associated to the fourth version of the preprint (and
+##' the Genome Biology publication). It contains quantitative
+##' information of melanoma cells at precursor, peptide and protein level.
+##' This version of the data was acquired using the plexDIA MS
+##' acquisition protocol.
+##'
+##' @format A [QFeatures] object with 48 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - Assay 1-45: precursor data acquired with a mTRAQ-3 protocol,
+##'   hence those assays contain 3 columns. Columns hold quantitative
+##'   information from single cells or negative control samples.
+##' - `Ms1Extracted`: the DIA-NN MS1 extracted signal, it combines the
+##'   information from assays 1-45.
+##' - `peptides`: peptide data containing quantitative data for 3,608
+##'   peptides and 104 single cells. The  data were filtered  to  1%
+##'   protein FDR.
+##' - `proteins`: protein data containing quantitative data for 508
+##'   proteins and 105 single cells. Note that the peptide and protein
+##'   data provided by the authors differ by 3 samples. The precursor
+##'   data were aggregated to protein intensity using maxLFQ. The
+##'   protein data were further median normalized by column and by row,
+##'   log2 transformed, impute using KNN (k = 3), again median
+##'   normalized by column and by row, batch corrected using ComBat,
+##'   and median normalized by column and by row once more.
+##'
+##' The `colData(leduc2022_plexDIA())` contains cell type annotation and
+##' batch annotation that are common to all assays. The description of
+##' the `rowData` fields for the precursor data can be found in the
+##' [`DIA-NN` documentation](https://github.com/vdemichev/DiaNN#readme).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: CellenONE cell sorting.
+##' - **Sample preparation** performed using the improved SCoPE2
+##'   protocol using the CellenONE liquid handling system. nPOP cell
+##'   lysis (DMSO) + trypsin digestion + mTRAQ-3
+##'   labeling and pooling.
+##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
+##'   25cm x 75um IonOpticks Aurora Series UHPLC column; 200nL/min).
+##' - **Ionization**: ESI (1,800V).
+##' - **Mass spectrometry**: Thermo Scientific Q-Exactive. The duty
+##'   cycle = 1 MS1 + 4 DIA MS2 windows (120 Th, 120 Th, 200 Th and
+##'   580 Th, spanning 378-1,402 m/z). Each MS1 and MS2 scan was
+##'   conducted at 70,000 resolving power, 3×10E6 AGC and 300ms
+##'   maximum injection time.
+##' - **Data analysis**: DIA-NN.
+##'
+##' @section Data collection:
+##'
+##' The PSM data were collected from a shared Google Drive folder that
+##' is accessible from the SlavovLab website (see `Source` section).
+##' The folder contains the following files of interest:
+##'
+##' - `annotation_plexDIA.csv`: sample annotation
+##' - `report_plexDIA_mel_nPOP.tsv`: the DIA-NN output file
+##'   with the precursor data
+##' - `report.pr_matrix_channels_ms1_extracted.tsv`: the DIA-NN
+##'   output file with the combined precursor data
+##' - `plexDIA_peptide.csv`: the processed data table containing the
+##'   `peptide` data
+##' - `plexDIA_protein_imputed.csv`: the processed data table
+##'   containing the `protein` data
+##'
+##' We removed the failed runs as identified by the authors. We also
+##' formatted the annotation and precuror quantification tables to
+##' facilitate matching between corresponding columns. Both annotation
+##' and quantification tables are then combined in a single [QFeatures]
+##' object using `scp::readSCPfromDIANN()`.
+##'
+##' The `plexDIA_peptide.csv` and `plexDIA_protein_imputed.csv` files
+##' were loaded and formatted as [SingleCellExperiment] objects. The
+##' columns names were adapted to match those in the `QFeatures`
+##' object. The `SingleCellExperiment` objects were then added to the
+##' [QFeatures] object and the rows of the peptide data are linked to
+##' the rows of the precursor data based on the peptide sequence or
+##' the protein name through an `AssayLink` object.
+##'
+##' @source
+##' The links to the data were found on the
+##' [Slavov Lab website](https://scp.slavovlab.net/Leduc_et_al_2022).
+##' The data were downloaded from the
+##' [Google drive folder 1](https://drive.google.com/drive/folders/117ZUG5aFIJt0vrqIxpKXQJorNtekO-BV) and
+##' [Google drive folder 2](https://drive.google.com/drive/folders/12-H2a1mfSHZUGf8O50Cr0pPZ4zIDjTac).
+##' The raw data and the quantification data can also be found in the
+##' massIVE repository `MSV000089159`:
+##' ftp://massive.ucsd.edu/MSV000089159.
+##'
+##' @references
+##' Andrew Leduc, Gray Huffman, and Nikolai Slavov. 2022. “Droplet
+##' Sample Preparation for Single-Cell Proteomics Applied to the Cell
+##' Cycle.” bioRxiv. [Link to article](https://doi.org/10.1101/2021.04.24.441211)
+##'
+##' Andrew Leduc, Gray Huffman, Joshua Cantlon, Saad Khan, and Nikolai
+##' Slavov. 2022. “Exploring Functional Protein Covariation across
+##' Single Cells Using nPOP.” Genome Biology 23 (1): 261.
+##' [Link to article](http://dx.doi.org/10.1186/s13059-022-02817-5)
+##'
+##' Jason Derks, Andrew Leduc, Georg Wallmann, Gray Huffman, Matthew
+##' Willetts, Saad Khan, Harrison Specht, Markus Ralser, Vadim
+##' Demichev, and Nikolai Slavov. 2023. “Increasing the Throughput of
+##' Sensitive Proteomics by plexDIA.” Nature Biotechnology 41 (1):
+##' 50–59. [Link to article](http://dx.doi.org/10.1038/s41587-022-01389-w)
+##'
+##' @seealso
+##' [leduc2022_pSCoPE]
+##'
+##' @examples
+##' \donttest{
+##' leduc2022_plexDIA()
+##' }
+##'
+##' @keywords datasets
+##'
+"leduc2022_plexDIA"
diff --git a/R/liang2020_hela.R b/R/liang2020_hela.R
new file mode 100644
index 0000000..c82d480
--- /dev/null
+++ b/R/liang2020_hela.R
@@ -0,0 +1,98 @@
+##' Liang et al. 2020 (Anal. Chem.): HeLa cells (MaxQuant preprocessing)
+##'
+##' Single-cell proteomics data from HeLa cells using the autoPOTS
+##' acquisition workflow. The samples contain either no cells (blanks),
+##' 1 cell, 10 cells, 150 cells or 500 cells. Samples containing
+##' between 0 and 10 cells are isolated using micro-pipetting while
+##' samples containing between 150 and 500 cells were prepared using
+##' dilution of a bulk sample.
+##'
+##' @format A [QFeatures] object with 17 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `HeLa_*`: 15 assays containing PSM data.
+##' - `peptides`: quantitative data for 48705 peptides in 15 samples
+##'   (all runs are combined).
+##' - `proteins`: quantitative data for 3970 protein groups in 15
+##'   samples (all runs combined).
+##'
+##' Sample annotation is stored in `colData(liang2020_hela())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: The HeLa cells come from a commercially
+##'   available cell line. Samples containing between 0 and 10 cells
+##'   were isolated using micro-manipulation and the counts were
+##'   validated using a microscope. Samples containing between 150 and
+##'   500 cells were prepared by diluting a bulk sample and the exact
+##'   counts were evaluated by obtaining phtotmicrographs.
+##' - **Sample preparation** performed using the autoPOTS worflow that
+##'   relied on the OT-2 pipeting robot. Cell are lysed using
+##'   sonication. Samples are then processed by successive incubation
+##'   with DTT (reduction), then IAA (alkylation), then Lys-C and
+##'   trypsin (protein digestion).
+##' - **Separation**: Samples were injected on the column using a
+##'   modified Ultimate WPS-3000 TPL autosampler coupled to an UltiMate
+##'   3000 RSLCnano pump. The LC column is a home-packed nanoLC column
+##'   (45cm x 30um; 40nL/min)
+##' - **Ionization**: Nanospray Flex ion source (2,000V)
+##' - **Mass spectrometry**: Orbitrap Exploris 480. MS1 settings:
+##'   accumulation time = 250 ms (0-10 cells) or 100 ms (150-500 cells);
+##'   resolution = 120,000; AGC = 100\%. MS2 settings: exlusion
+##'   duration = 90 s (0-10 cells) or 60 s (150-500 cells) ; accumulation
+##'   time = 500 ms (0-1 cell), 250 ms (10 cells), 100 ms (150 cells)
+##'   or 50 ms (500 cells); resolution = 60,000 (0-10 cells) or 30,000
+##'   (150-500 cells); AGC = 5E3 (0-1 cells) or 1E4 (10-500 cells).
+##' - **Data analysis**: MaxQuant (v1.6.7.0) and the search database
+##'   is Swiss-Prot (July 2020).
+##'
+##' @section Data collection:
+##'
+##' All data were collected from the PRIDE repository (accession ID:
+##' PXD021882).
+##'
+##' The sample annotations were collected from the methods section and
+##' from table S3 in the paper.
+##'
+##' The PSM data were found in the `evidence.txt` file. The data were
+##' converted to a [QFeatures] object using the [scp::readSCP()]
+##' function.
+##'
+##' The peptide data were found in the `peptides.txt` file. The column
+##' names holding the quantitative data were adapted to match the
+##' sample names in the [QFeatures] object. The data were then
+##' converted to a [SingleCellExperiment] object and then inserted in
+##' the [QFeatures] object. Links between the PSMs and the peptides
+##' were added
+##'
+##' A similar procedure was applied to the protein data. The data were
+##' found in the `proteinGroups.txt` file. The column names were
+##' adapted, the data were converted to a [SingleCellExperiment]
+##' object and then inserted in the [QFeatures] object. Links between
+##' the peptides and the proteins were added
+##'
+##' @source
+##' The PSM data can be downloaded from the PRIDE repository
+##' PXD021882 The source link is:
+##' http://ftp.pride.ebi.ac.uk/pride/data/archive/2020/12/PXD021882/
+##'
+##' @references
+##'
+##' Liang, Yiran, Hayden Acor, Michaela A. McCown, Andikan J. Nwosu,
+##' Hannah Boekweg, Nathaniel B. Axtell, Thy Truong, Yongzheng Cong,
+##' Samuel H. Payne, and Ryan T. Kelly. 2020. “Fully Automated Sample
+##' Processing and Analysis Workflow for Low-Input Proteome
+##' Profiling.” Analytical Chemistry, December.
+##' ([link to article](https://doi.org/10.1021/acs.analchem.0c04240)).
+##'
+##' @examples
+##' \donttest{
+##' liang2020_hela()
+##' }
+##'
+##' @keywords datasets
+##'
+"liang2020_hela"
diff --git a/R/petrosius2023_AstralAML.R b/R/petrosius2023_AstralAML.R
new file mode 100644
index 0000000..c39aca8
--- /dev/null
+++ b/R/petrosius2023_AstralAML.R
@@ -0,0 +1,121 @@
+##' Petrosius et al. 2023 (bioRxiv): AML hierarchy on Astral.
+##'
+##' Single cell proteomics data from FACS sorted cells from the
+##' OCI-AML8227 model. The dataset contains leukemic stem cells (LSC;
+##' CD34+, CD38-), progenitor cells (CD34+, CD38+), CD38+ blasts
+##' (CD34-, CD38+) and CD38- blasts (CD34-, CD38-). It contains
+##' quantitative information at PSM, peptide and protein levels. Data
+##' was acquired using an Orbitrap Astral mass spectrometer. Direct DIA
+##' analysis was performed with Spectronaut version 17.
+##'
+##' @format A [QFeatures] object with 217 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - Assays 1-215: PSM data from the Spectronaut PEPQuant file with
+##'   LFQ quantities from the FG.MS1Quantity column.
+##' - `peptides`: Peptide data resulting from the PSM to peptide
+##'   aggregation the 215 PSM assays. Resulting peptide assays were
+##'   joined into a single assay.
+##' - `proteins`: Protein data from the Spectronaut PGQuant file with
+##'   LFQ quantities from the PG.Quantity column.
+##'
+##' The `colData(petrosius2023_AstralAML())` contains cell type
+##' annotation, batch annotation and FACS data. The description of the
+##' `rowData` fields can be found in the
+##' [`Spectronaut` user manual](https://biognosys.com/content/uploads/2023/03/Spectronaut-17_UserManual.pdf).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see *References*).
+##'
+##' - **Cell isolation**: Cell sorting was done on a FACS Aria III or
+##'   Aria II instrument, controlled by the DIVA software package and
+##'   operated with a 100 microm nozzle. Cells were sorted at single-cell
+##'   resolution, into a 384-well Eppendorf LoBind PCR plate containing
+##'   1 microL of lysis buffer.
+##' - **Sample preparation** Single-cell protein lysates were digested
+##'   overnight at 37°C with 2 ng of Trypsin supplied in 1 microL of
+##'   digestion buffer. Digestion was stopped by the addition of 1 microL
+##'   1% (v/v) trifluoroacetic acid (TFA). All liquid dispensing was
+##'   done using an I-DOT One instrument.
+##' - **Liquid chromatography**: Chromatographic separation of peptides
+##'   was conducted on a vanquish Neo UHPLC system connected to a 50 cm
+##'   uPAC Neo Low-load and an EASY-spray. Autosampler and injection
+##'   valves were configured to perform direct injections from a 384
+##'   well plate using a 25 uL injection loop on 11.8 min gradients.
+##' - **Mass spectrometry**: Acquisition was conducted with an Orbitrap
+##'   Astral mass spectrometer operated in positive mode with the
+##'   FAIMSPro interface compensation voltage set to -45 V.
+##'   MS1 scans were acquired with the Orbitrap at a resolution of
+##'   120,000 and a scan range of 400 to 900 m/z with normalized
+##'   automatic gain control (AGC) target of 300 % and maximum
+##'   injection time of 246 ms. Data independent acquisition of MS2
+##'   spectra was performed in the Astral using loop control set to 0.7
+##'   seconds per cycle with varying isolation window widths and
+##'   injection times. Fragmentation of precursor ions was performed
+##'   using higher energy collisional dissociation (HCD) using a
+##'   normalized collision energy (NCE) of 25 %. AGC target was set to
+##'   800 %.
+##' - **Raw data processing**: Raw files were processed using
+##'   Spectronaut version 17. Direct DIA analysis was performed in
+##'   pipeline mode. Pulsar searches were performed without fixed
+##'   modifications. N-terminal acetylation and methionine oxidation
+##'   were set as variable modifications. Quantification level was set
+##'   to MS1 and the quantity type set to area under the curve.
+##'
+##' @section Data collection:
+##'
+##' The data were provided by the authors and is accessible at the
+##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT)
+##' The dataset ('Astral AML single-cell data from Petrosius et
+##' al. 2023 preprint') contains the following files of interest:
+##'
+##' - `20240201_130747_PEPQuant (Normal).tsv`: the PSM level data
+##' - `20240201_130747_PGQuant (Normal).tsv`: the protein level data
+##' - `index_map.csv`: FACS data.
+##' - `msRuns_overview.csv`: Sample annotations.
+##'
+##' We added the FACS data to the sample annotations in a single table.
+##' Both annotations and PSM features tables are then combined in a
+##' single [QFeatures] object using the [scp::readSCP()] function.
+##'
+##' The peptide data were obtained by aggregation of the PSM data to
+##' the peptide level. All of the resulting peptides assays were joined
+##' into a single assays. Individual peptides assays were discarded.
+##'
+##' The protein data were formatted from the `20240201_130747_PGQuant (Normal).tsv`
+##' to a [SingleCellExperiment] object and the sample metadata were
+##' matched to the column names and stored in the  `colData`. The
+##' object is then added to the [QFeatures] object and the rows of the
+##' peptide data are linked to the rows of the protein data based on
+##' the protein sequence information through an `AssayLink` object.
+##'
+##' Note that the [QFeatures] object has not been further processed and
+##' has therefore not been normalized, log-transformed or
+##' batch-corrected.
+##'
+##' @source The PSM data, protein data and sample annotations can be
+##'     downloaded from the dataset 'Astral AML single-cell data from
+##'     Petrosius et al. 2023 preprint' in the
+##'     [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT).
+##'
+##' @references
+##'
+##' Valdemaras Petrosius, Pedro Aragon-Fernandez, Tabiwang N. Arrey,
+##' Nil Üresin, Benjamin Furtwängler, Hamish Stewart, Eduard Denisov,
+##' Johannes Petzoldt, Amelia C. Peterson, Christian Hock, Eugen
+##' Damoc, Alexander Makarov, Vlad Zabrouskov, Bo T. Porse and Erwin
+##' M. Schoof.
+##' 2023. "Evaluating the capabilities of the Astral mass analyzer for single-cell proteomics."
+##' biorxiv.  https://doi.org/10.1101/2023.06.06.543943
+##' DOI:[10.1101/2023.06.06.543943](https://doi.org/10.1101/2023.06.06.543943)
+##'
+##' @examples
+##' \donttest{
+##' petrosius2023_AstralAML()
+##' }
+##'
+##' @keywords datasets
+##'
+"petrosius2023_AstralAML"
diff --git a/R/petrosius2023_mES.R b/R/petrosius2023_mES.R
new file mode 100644
index 0000000..15e621c
--- /dev/null
+++ b/R/petrosius2023_mES.R
@@ -0,0 +1,113 @@
+##' Petrosius et al, 2023 (Nat. Comm.): Mouse embryonic stem cell (mESC) in
+##' different culture conditions
+##'
+##' @description
+##' Profiling mouse embryonic stem cells across ground-state (m2i) and
+##' differentiation-permissive (m15) culture conditions. The data were
+##' acquired using orbitrap-based data-independent acquisition (DIA).
+##' The objective was to demonstrate the capability of their approach
+##' by profiling mouse embryonic stem cell culture conditions, showcasing
+##' heterogeneity in global proteomes, and highlighting differences in
+##' the expression of key metabolic enzymes in distinct cell subclusters.
+##'
+##' @format A [QFeatures] object with 605 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - Assay 1-603: PSM data acquired with an orbitrap-based data-independent
+##'   acquisition (DIA) protocol, hence those assays contain single column
+##'   that contains the quantitative information.
+##' - `peptides`: peptide data containing quantitative data for 9884
+##'   peptides and 603 single-cells.
+##' - `proteins`: protein data containing quantitative data for 4270
+##'   proteins and 603 single-cells.
+##'
+##' Sample annotation is stored in `colData(petrosius2023_mES())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Sample isolation**: Cell sorting was done on a Sony MA900 cell sorter
+##'   using a 130 microm sorting chip. Cells were sorted at single-cell resolution,
+##'   into a 384-well Eppendorf LoBind PCR plate (Eppendorf AG) containing 1 microL
+##'   of lysis buffer.
+##' - **Sample preparation**: Single-cell protein lysates were digested with
+##'   2 ng of Trypsin (Sigma cat. Nr. T6567) supplied in 1 microL of digestion
+##'   buffer (100mM TEAB pH 8.5, 1:5000 (v/v) benzonase (Sigma cat. Nr. E1014)).
+##'   The digestion was carried out overnight at 37 °C, and subsequently
+##'   acidified by the addition of 1 microL 1% (v/v) trifluoroacetic acid (TFA).
+##'   All liquid dispensing was done using an I-DOT One instrument (Dispendix).
+##' - **Liquid chromatography**: The Evosep one liquid chromatography system was
+##'   used for DIA isolation window survey and HRMS1-DIA experiments.The standard
+##'   31 min or 58min pre-defined Whisper gradients were used, where peptide
+##'   elution is carried out with 100 nl/min flow rate. A 15 cm × 75 microm
+##'   ID column (PepSep) with 1.9 microm C18 beads (Dr. Maisch, Germany) and a 10
+##'   microm ID silica electrospray emitter (PepSep) was used. Both LC systems were
+##'   coupled online to an orbitrap Eclipse TribridMass Spectrometer
+##'   (ThermoFisher Scientific) via an EasySpray ion source connected to a
+##'   FAIMSPro device.
+##' - **Mass spectrometry**: The mass spectrometer was operated in positive
+##'   mode with the FAIMSPro interface compensation voltage set to -45 V.
+##'   MS1 scans were carried out at 120,000 resolution with an automatic gain
+##'   control (AGC) of 300% and maximum injection time set to auto. For the DIA
+##'   isolation window survey a scan range of 500–900 was used and 400–1000
+##'   rest of the experiments. Higher energy collisional dissociation (HCD) was
+##'   used for precursor fragmentation with a normalized collision energy (NCE)
+##'   of 33% and MS2 scan AGC target was set to 1000%.
+##' - **Raw data processing**: The mESC raw data files were processed with
+##'   Spectronaut 17 and protein abundance tables exported and analyzed further
+##'   with python.
+##'
+##' @section Data collection:
+##'
+##' The data were provided by the Author and is accessible at the
+##' [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT)
+##' The folder ('20240205_111248_mESC_SNEcombine_m15-m2i/') contains
+##' the following files of interest:
+##'
+##' - `20240205_111251_PEPQuant (Normal).tsv`: the PSM level data
+##' - `20240205_111251_Peptide Quant (Normal).tsv`: the peptide level data
+##' - `20240205_111251_PGQuant (Normal).tsv`: the protein level data
+##'
+##' The metadata were downloaded from the [Zenodo repository](https://zenodo.org/records/8146605).
+##'
+##' - `sample_facs.csv`: the metadata
+##'
+##' We formatted the quantification table so that columns match with the
+##' metadata. Then, both tables are then combined in a single
+##' [QFeatures] object using the [scp::readSCP()] function.
+##'
+##' The peptide data were formated to a [SingleCellExperiment] object and the
+##' sample metadata were matched to the column names and stored in the `colData`.
+##' The object is then added to the [QFeatures] object and the rows of the PSM
+##' data are linked to the rows of the peptide data based on the peptide sequence
+##' information through an `AssayLink` object.
+##'
+##' The protein data were formated to a [SingleCellExperiment] object and
+##' the sample metadata were matched to the column names and stored in the
+##' `colData`. The object is then added to the [QFeatures] object and the rows
+##' of the peptide data are linked to the rows of the protein data based on the
+##' protein sequence information through an `AssayLink` object.
+##'
+##' @source The peptide and protein data can be downloaded from the
+##'     [Dataverse](https://dataverse.uclouvain.be/dataset.xhtml?persistentId=doi:10.14428/DVN/EMAVLT)
+##'     The raw data and the quantification data can also be found in
+##'     the MassIVE repository `MSV000092429`:
+##'     ftp://MSV000092429@massive.ucsd.edu/.
+##'
+##' @references
+##' **Source article**: Petrosius, V., Aragon-Fernandez, P., Üresin, N. et al.
+##' "Exploration of cell state heterogeneity using single-cell proteomics
+##' through sensitivity-tailored data-independent acquisition."
+##' Nat Commun 14, 5910 (2023).
+##' ([link to article](https://doi.org/10.1038/s41467-023-41602-1)).
+##'
+##' @examples
+##' \donttest{
+##' petrosius2023_mES()
+##' }
+##'
+##' @keywords datasets
+##'
+"petrosius2023_mES"
diff --git a/R/schoof2021.R b/R/schoof2021.R
new file mode 100644
index 0000000..d9b17e6
--- /dev/null
+++ b/R/schoof2021.R
@@ -0,0 +1,124 @@
+##' Schoof et al. 2021 (Nat. Comm.): acute myeloid leukemia
+##' differentiation
+##'
+##' Single-cell proteomics data from OCI-AML8227 cell culture to
+##' reconstruct the cellular hierarchy. The data were acquired using
+##' TMTpro multiplexing. The samples contain either no cells,
+##' single cells, 10 cells (reference channel) 200 cells (booster
+##' channel) or are simply empty wells. Single cells are expected to
+##' be one of progenitor cells (`PROG`), leukaemia stem cells (`LSC`),
+##' CD38- blast cells (`BLAST CD38-`) or CD38+ blast cells
+##' (`BLAST CD38+`). Booster are either a known 1:1:1 mix of cells
+##' (PROG, LSC and BLAST) or are isolated directly from the bulk
+##' sample. Samples were isolated and annotated using flow cytometry.
+##'
+##' @format A [QFeatures] object with 194 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `F*`: 192 assays containing PSM quantification data for 16
+##'    TMT channels. The quantification data contain signal to noise
+##'    ratios as computed by Proteome Discoverer.
+##' - `proteins`: quantitative data for 2898 protein groups in 3072
+##'   samples (all runs combined). The quantification data contain
+##'   signal to noise ratios as computed by Proteome Discoverer.
+##' - `logNormProteins`: quantitative data for 2723 protein groups in
+##'   2025 single-cell samples. This assay is the protein datasets that
+##'   was processed by the authors. Dimension reduction and clustering
+##'   data are also available in the `reducedDims` and `colData` slots,
+##'   respectively
+##'
+##' Sample annotation is stored in `colData(schoof2021())`. The cell
+##' type annotation is stored in the `Population` column. The flow
+##' cytometry data is also available: FSC-A, FSC-H, FSC-W, SSC-A,
+##' SSC-H, SSC-W, APC-Cy7-A (= CD34) and PE-A (= CD38).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Sample isolation**: cultured AML 8227 cells were stained with
+##'   anti-CD34 and anti-CD38. The sorting was performed by FACSAria
+##'   instrument and deposited in 384 well plates.
+##' - **Sample preparation**: cells are lysed using freeze-boil and
+##'   sonication in a lysis buffer (TFE) that also includes reduction
+##'   and alkylation reagents (TCEP and CAA), followed by trypsin
+##'   (protein) and benzonase (DNA) digestion, TMT-16 labeling and
+##'   quenching, desalting using SOLAµ C18 plate, peptide
+##'   concentration, pooling and peptide concentration again. The
+##'   booster channel contains 200 cell equivalents.
+##' - **Liquid chromatography**: peptides are separated using a C18
+##'   reverse-phase column (50cm x 75 µm i.d., Thermo EasySpray) combined
+##'   to a Thermo EasyLC 1200 for 160 minute gradient with a flowrate of
+##'   100nl/min.
+##' - **Mass spectrometry**: FAIMSPro interface is used. MS1 setup:
+##'   resolution 60.000, AGC target of 300%, accumulation of 50ms. MS2
+##'   setup: resolution 45.000, AGC target of 150, 300 or 500%,
+##'   accumulation of 150, 300, 500, or 1000ms.
+##' - **Raw data processing**: Proteome Discoverer 2.4 + Sequest spectral
+##'   search engine and validation with Percolator
+##'
+##' @section Data collection:
+##'
+##' All data were collected from the PRIDE repository (accession ID:
+##' PXD020586). The data and metadata were extracted from the
+##' `SCeptre_FINAL.zip` file.
+##'
+##' We performed extensive data wrangling to combine al the metadata
+##' available from different files into a single table available using
+##' `colData(schoof2021)`.
+##'
+##' The PSM data were found in the `bulk_PSMs.txt` file. Contaminants
+##' were defined based on the protein accessions listed in
+##' `contaminant.txt`. The data were converted to a [QFeatures]
+##'  object using the [scp::readSCP()] function.
+##'
+##' The protein data were found in the `bulk_Proteins.txt` file.
+##' Contaminants were defined based on the protein accessions listed
+##' in `contaminant.txt`.The column names holding the quantitative
+##' data were adapted to match the sample names in the [QFeatures]
+##' object. Unnecessary feature annotations (such as in which assay
+##' a protein is found) were removed. Feature names were created
+##' following the procedure in SCeptre: features names are the
+##' protein symbol (or accession if missing) and if duplicated
+##' symbols are present (protein isoforms), they are made unique by
+##' appending the protein accession.  Contaminants were defined based
+##' on the protein accessions listed in `contaminant.txt`. The data
+##' were then converted to a [SingleCellExperiment] object and
+##' inserted in the [QFeatures] object.
+##'
+##' The log-normalized protein data were found in the `bulk.h5ad` file.
+##' This dataset was generated by the authors by running the notebook
+##' called `bulk.ipynb`. The `bulk.h5ad` was loaded as an `AnnData`
+##' object using the `scanpy` Python module. The object was then
+##' converted to a `SingleCellExperiment` object using the
+##' `zellkonverter` package. The column names holding the quantitative
+##' data were adapted to match the sample names in the [QFeatures]
+##' object. The data were then inserted in the [QFeatures] object.
+##'
+##' The script to reproduce the `QFeatures` object is available at
+##' `system.file("scripts", "make-data_schoof2021.R", package = "scpdata")`
+##'
+##' @source
+##'
+##' The PSM and protein data can be downloaded from the PRIDE
+##' repository PXD020586 The source link is:
+##' https://www.ebi.ac.uk/pride/archive/projects/PXD020586
+##'
+##' @references
+##'
+##' Schoof, Erwin M., Benjamin Furtwängler, Nil Üresin, Nicolas Rapin,
+##' Simonas Savickas, Coline Gentil, Eric Lechman, Ulrich auf Dem
+##' Keller, John E. Dick, and Bo T. Porse. 2021. “Quantitative
+##' Single-Cell Proteomics as a Tool to Characterize Cellular
+##' Hierarchies.” Nature Communications 12 (1): 745679.
+##' ([link to article](http://dx.doi.org/10.1038/s41467-021-23667-y)).
+##'
+##' @examples
+##' \donttest{
+##' schoof2021()
+##' }
+##'
+##' @keywords datasets
+##'
+"schoof2021"
diff --git a/R/specht2019v2.R b/R/specht2019v2.R
new file mode 100644
index 0000000..ea1b8d9
--- /dev/null
+++ b/R/specht2019v2.R
@@ -0,0 +1,105 @@
+##' Specht et al. 2019 - SCoPE2 (biorRxiv): macrophages vs monocytes
+##' (version 2)
+##'
+##' @description
+##'
+##' Single cell proteomics data acquired by the Slavov Lab. This is
+##' the version 2 of the data released in December 2019. It contains
+##' quantitative information of macrophages and monocytes at PSM,
+##' peptide and protein level.
+##'
+##' @format A [QFeatures] object with 179 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - Assay 1-63: PSM data for SCoPE2 sets acquired with a TMT-11plex
+##'   protocol, hence those assays contain 11 columns. Columns
+##'   hold quantitative information from single-cell channels, carrier
+##'   channels, reference channels, empty (blank) channels and unused
+##'   channels.
+##' - Assay 64-177: PSM data for SCoPE2 sets acquired with a
+##'   TMT-16plex protocol, hence those assays contain 16 columns.
+##'   Columns hold quantitative information from single-cell channels,
+##'   carrier channels, reference channels, empty (blank) channels and
+##'   unused channels.
+##' - `peptides`: peptide data containing quantitative data for 9208
+##'   peptides and 1018 single-cells.
+##' - `proteins`: protein data containing quantitative data for 2772
+##'   proteins and 1018 single-cells.
+##'
+##' The `colData(specht2019v2())` contains cell type annotation and
+##' batch annotation that are common to all assays. The description of
+##' the `rowData` fields for the PSM data can be found in the
+##' [`MaxQuant` documentation](http://www.coxdocs.org/doku.php?id=maxquant:table:evidencetable).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: flow cytometry (BD FACSAria I).
+##' - **Sample preparation** performed using the SCoPE2 protocol. mPOP
+##'   cell lysis + trypsin digestion + TMT-11plex or 16plex labelling
+##'   and pooling.
+##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
+##'   25cm x 75um IonOpticksAurora Series UHPLC column; 200nL/min).
+##' - **Ionization**: ESI (2,200V).
+##' - **Mass spectrometry**: Thermo Scientific Q-Exactive (MS1
+##'   resolution = 70,000; MS1 accumulation time = 300ms; MS2
+##'   resolution = 70,000).
+##' - **Data analysis**: DART-ID + MaxQuant (1.6.2.3).
+##'
+##' @section Data collection:
+##'
+##' The PSM data were collected from a shared Google Drive folder that
+##' is accessible from the SlavovLab website (see `Source` section).
+##' The folder contains the following files of interest:
+##'
+##' - `ev_updated.txt`: the MaxQuant/DART-ID output file
+##' - `annotation_fp60-97.csv`: sample annotation
+##' - `batch_fp60-97.csv`: batch annotation
+##'
+##' We combined the sample annotation and the batch annotation in
+##' a single table. We also formatted the quantification table so that
+##' columns match with those of the annotation and filter only for
+##' single-cell runs. Both table are then combined in a single
+##' [QFeatures] object using the [scp::readSCP()] function.
+##'
+##' The peptide data were taken from the Slavov lab directly
+##' (`Peptides-raw.csv`). It is provided as a spreadsheet. The data
+##' were formatted to a [SingleCellExperiment] object and the sample
+##' metadata were matched to the column names (mapping is retrieved
+##' after running the SCoPE2 R script) and stored in the `colData`.
+##' The object is then added to the [QFeatures] object (containing the
+##' PSM assays) and the rows of the peptide data are linked to the
+##' rows of the PSM data based on the peptide sequence information
+##' through an `AssayLink` object.
+##'
+##' The protein data (`Proteins-processed.csv`) is formatted similarly
+##' to the peptide data, and the rows of the proteins were mapped onto
+##' the rows of the peptide data based on the protein sequence
+##' information.
+##'
+##' @source
+##' The data were downloaded from the
+##' [Slavov Lab](https://scope2.slavovlab.net/mass-spec/data) website via a
+##' shared Google Drive
+##' [folder](https://drive.google.com/drive/folders/1VzBfmNxziRYqayx3SP-cOe2gu129Obgx).
+##' The raw data and the quantification data can also be found in the
+##' massIVE repository `MSV000083945`:
+##' ftp://massive.ucsd.edu/MSV000083945.
+##'
+##' @references Specht, Harrison, Edward Emmott, Aleksandra A.
+##' Petelski, R. Gray Huffman, David H. Perlman, Marco Serra, Peter
+##' Kharchenko, Antonius Koller, and Nikolai Slavov. 2019.
+##' "Single-Cell Mass-Spectrometry Quantifies the Emergence of
+##' Macrophage Heterogeneity." bioRxiv.
+##' ([link to article](https://doi.org/10.1101/665307)).
+##'
+##' @examples
+##' \donttest{
+##' specht2019v2()
+##' }
+##'
+##' @keywords datasets
+##'
+"specht2019v2"
diff --git a/R/specht2019v3.R b/R/specht2019v3.R
new file mode 100644
index 0000000..286ced5
--- /dev/null
+++ b/R/specht2019v3.R
@@ -0,0 +1,110 @@
+##' Specht et al. 2019 - SCoPE2 (biorRxiv): macrophages vs monocytes
+##' (version 3)
+##'
+##' Single cell proteomics data acquired by the Slavov Lab. This is
+##' the version 3 of the data released in October 2020. It contains
+##' quantitative information of macrophages and monocytes at PSM,
+##' peptide and protein level.
+##'
+##' @format A [QFeatures] object with 179 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - Assay 1-63: PSM data for SCoPE2 sets acquired with a TMT-11plex
+##'   protocol, hence those assays contain 11 columns. Columns
+##'   hold quantitative information from single-cell channels, carrier
+##'   channels, reference channels, empty (blank) channels and unused
+##'   channels.
+##' - Assay 64-177: PSM data for SCoPE2 sets acquired with a
+##'   TMT-16plex protocol, hence those assays contain 16 columns.
+##'   Columns hold quantitative information from single-cell channels,
+##'   carrier channels, reference channels, empty (blank) channels and
+##'   unused channels.
+##' - `peptides`: peptide data containing quantitative data for 9208
+##'   peptides and 1018 single-cells.
+##' - `proteins`: protein data containing quantitative data for 2772
+##'   proteins and 1018 single-cells.
+##'
+##' The `colData(specht2019v2())` contains cell type annotation and
+##' batch annotation that are common to all assays. The description of
+##' the `rowData` fields for the PSM data can be found in the
+##' [`MaxQuant` documentation](http://www.coxdocs.org/doku.php?id=maxquant:table:evidencetable).
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: flow cytometry (BD FACSAria I).
+##' - **Sample preparation** performed using the SCoPE2 protocol. mPOP
+##'   cell lysis + trypsin digestion + TMT-11plex or 16plex labeling
+##'   and pooling.
+##' - **Separation**: online nLC (DionexUltiMate 3000 UHPLC with a
+##'   25cm x 75um IonOpticksAurora Series UHPLC column; 200nL/min).
+##' - **Ionization**: ESI (2,200V).
+##' - **Mass spectrometry**: Thermo Scientific Q-Exactive (MS1
+##'   resolution = 70,000; MS2 accumulation time = 300ms; MS2
+##'   resolution = 70,000).
+##' - **Data analysis**: DART-ID + MaxQuant (1.6.2.3).
+##'
+##' @section Data collection:
+##'
+##' The PSM data were collected from a shared Google Drive folder that
+##' is accessible from the SlavovLab website (see `Source` section).
+##' The folder contains the following files of interest:
+##'
+##' - `ev_updated_v2.txt`: the MaxQuant/DART-ID output file
+##' - `annotation_fp60-97.csv`: sample annotation
+##' - `batch_fp60-97.csv`: batch annotation
+##'
+##' We combined the sample annotation and the batch annotation in
+##' a single table. We also formatted the quantification table so that
+##' columns match with those of the annotation and filter only for
+##' single-cell runs. Both table are then combined in a single
+##' [QFeatures] object using the [scp::readSCP()] function.
+##'
+##' The peptide data were taken from the Slavov lab directly
+##' (`Peptides-raw.csv`). It is provided as a spreadsheet. The data
+##' were formatted to a [SingleCellExperiment] object and the sample
+##' metadata were matched to the column names (mapping is retrieved
+##' after running the SCoPE2 R script) and stored in the `colData`.
+##' The object is then added to the [QFeatures] object (containing the
+##' PSM assays) and the rows of the peptide data are linked to the
+##' rows of the PSM data based on the peptide sequence information
+##' through an `AssayLink` object.
+##'
+##' The protein data (`Proteins-processed.csv`) is formatted similarly
+##' to the peptide data, and the rows of the proteins were mapped onto
+##' the rows of the peptide data based on the protein sequence
+##' information.
+##'
+##' @note Since version 2, a serious bug in the data were corrected
+##' for TMT channels 12 to 16. Many more cells are therefore contained
+##' in the data. Version 2 is maintained for backward compatibility.
+##' Although the final version of the article was published in 2021,
+##' we have kept `specht2019v3` as the data set name for consistency
+##' with the previous data version `specht2019v2`.
+##'
+##' @source
+##' The data were downloaded from the
+##' [Slavov Lab](https://scope2.slavovlab.net/docs/data) website via a
+##' shared Google Drive
+##' [folder](https://drive.google.com/drive/folders/1VzBfmNxziRYqayx3SP-cOe2gu129Obgx).
+##' The raw data and the quantification data can also be found in the
+##' massIVE repository `MSV000083945`:
+##' ftp://massive.ucsd.edu/MSV000083945.
+##'
+##' @references Specht, Harrison, Edward Emmott, Aleksandra A.
+##' Petelski, R. Gray Huffman, David H. Perlman, Marco Serra, Peter
+##' Kharchenko, Antonius Koller, and Nikolai Slavov. 2021.
+##' "Single-Cell Proteomic and Transcriptomic Analysis of Macrophage
+##' Heterogeneity Using SCoPE2." Genome Biology 22 (1): 50.
+##' ([link to article](http://dx.doi.org/10.1186/s13059-021-02267-5)).
+##'
+##' @examples
+##' \donttest{
+##' specht2019v3()
+##' }
+##'
+##' @keywords datasets
+##'
+"specht2019v3"
diff --git a/R/williams2020_lfq.R b/R/williams2020_lfq.R
new file mode 100644
index 0000000..91391e6
--- /dev/null
+++ b/R/williams2020_lfq.R
@@ -0,0 +1,118 @@
+##' Williams et al. 2020 (Anal. Chem.): MCF10A cell line
+##'
+##' Single-cell label free proteomics data from a MCF10A cell line
+##' culture. The data were acquired using a label-free quantification
+##' protocole based on the nanoPOTS technology. The objective was to
+##' test 2 elution gradients for single-cell applications and to
+##' demonstrate successful use of the new nanoPOTS autosampler
+##' presented in the article. The samples contain either no cells,
+##' single cells, 3 cells, 10 cells  50 cells.
+##'
+##' @format A [QFeatures] object with 9 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `peptides_[30 or 60]min_[intensity or LFQ]`: 3 assays
+##'   containing peptide intensities or LFQ normalized
+##'   quantifications (see `References`) for either a 30min or a 60 min
+##'   gradient.
+##' - `proteins_[30 or 60]min_[intensity or iBAQ or LFQ]`: 6 assays
+##'   containing protein intensities, iBAQ normalized or LFQ normalized
+##'   quantifications (see `References`) for either a 30min or a 60 min
+##'   gradient.
+##'
+##' Sample annotation is stored in `colData(williams2020_lfq())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Sample isolation**: cultured MCF10A cells were isolated using
+##'   flow-cytometry based cell sorting and deposit on nanoPOTS
+##'   microwells
+##' - **Sample preparation**: cells are lysed using using a DDM+DTT
+##'   lysis buffer. Alkylation was then performed using an IAA solution.
+##'   Proteins are digested with Lys-C and trypsin followed by
+##'   acidification with FA. Sample droplets are then dried until
+##'   LC-MS/MS analysis.
+##' - **Liquid chromatography**: peptides are loaded using the new
+##'   autosampler described in the paper. Samples are loaded using a
+##'   a homemade miniature syringe pump. The samples are then desalted
+##'   and concentrated through a SPE column (4cm x 100µm i.d. packed
+##'   with 5µm C18) with microflow LC pump. The peptides are then eluted
+##'   from a long LC column (60cm x 50 µm i.d. packed with 3µm C18)
+##'   coupled to a nanoflox LC pump at 150nL/mL with either a 30 min
+##'   or a 60 min gradient.
+##' - **Mass spectrometry**: MS/MS was performed on an Orbitrap Fusion
+##'   Lumos Tribrid MS coupled to a 2kV ESI. MS1 setup: Orbitrap
+##'   analyzer at resolution 120.000, AGC target of 1E6, accumulation
+##'   of 246ms. MS2 setup: ion trap with CID at resolution 60.000, AGC
+##'   target of 2E4, accumulation of 120ms (50 cells) or 250ms (0-10
+##'   cells).
+##' - **Raw data processing**: preprocessing using Maxquant v1.6.2.10
+##'   that use Andromeda search engine (with UniProtKB 2016-21-29),
+##'   MBR and LFQ normalization were enabled.
+##'
+##' @section Data collection:
+##'
+##' All data were collected from the MASSIVE repository (accession ID:
+##' MSV000085230).
+##'
+##' The peptide and protein data were extracted from the `Peptides_[...].txt`
+##' or `ProteinGroups[...].txt` files, respectively, in the
+##' `MCF10A_LC_[30 or 60]minutes` folders.
+##'
+##' The tables were duplicated so that peptide intensisities, peptide
+##' LFQ, protein intensities, protein LFQ and protein intensities are
+##' contained in separate tables. Tables are then converted to
+##' [SingleCellExperiment] objects. Sample annotations were infered
+##' from the sample names and from the paper. All data is combined in
+##' a [QFeatures] object. [AssayLinks] were stored between peptide
+##' assays and their corresponding proteins assays based on the
+##' leading razor protein (hence only unique peptides are linked to
+##' proteins).
+##'
+##' The script to reproduce the `QFeatures` object is available at
+##' `system.file("scripts", "make-data_williams2020_lfq.R", package = "scpdata")`
+##'
+##' @section Suggestion:
+##'
+##' See `QFeatures::joinAssays` if you want to join the 30min and
+##' 60min assays in a single assay for an integrated analysis.
+##'
+##' @source
+##'
+##' The PSM and protein data can be downloaded from the MASSIVE
+##' repository MSV000085230.
+##'
+##' @references
+##'
+##' **Source article**: Williams, Sarah M., Andrey V. Liyu, Chia-Feng
+##' Tsai, Ronald J. Moore, Daniel J. Orton, William B. Chrisler,
+##' Matthew J. Gaffrey, et al. 2020. “Automated Coupling of
+##' Nanodroplet Sample Preparation with Liquid Chromatography-Mass
+##' Spectrometry for High-Throughput Single-Cell Proteomics.”
+##' Analytical Chemistry 92 (15): 10588–96.
+##' ([link to article](http://dx.doi.org/10.1021/acs.analchem.0c01551)).
+##'
+##' **LFQ normalization**: Cox, Jürgen, Marco Y. Hein, Christian A. Luber,
+##' Igor Paron, Nagarjuna Nagaraj, and Matthias Mann. 2014. “Accurate
+##' Proteome-Wide Label-Free Quantification by Delayed Normalization
+##' and Maximal Peptide Ratio Extraction, Termed MaxLFQ.” Molecular
+##' & Cellular Proteomics: MCP 13 (9): 2513–26.
+##' ([link to article](http://dx.doi.org/10.1074/mcp.M113.031591)).
+##'
+##' **iBAQ normalization**: Schwanhäusser, Björn, Dorothea Busse, Na
+##' Li, Gunnar Dittmar, Johannes Schuchhardt, Jana Wolf, Wei Chen, and
+##' Matthias Selbach. 2011. “Global Quantification of Mammalian Gene
+##' Expression Control.” Nature 473 (7347): 337–42.
+##' ([link to article](http://dx.doi.org/10.1038/nature10098)).
+##'
+##' @examples
+##' \donttest{
+##' williams2020_lfq()
+##' }
+##'
+##' @keywords datasets
+##'
+"williams2020_lfq"
diff --git a/R/williams2020_tmt.R b/R/williams2020_tmt.R
new file mode 100644
index 0000000..cd73140
--- /dev/null
+++ b/R/williams2020_tmt.R
@@ -0,0 +1,96 @@
+##' Williams et al. 2020 (Anal. Chem.): 3 AML cell line
+##'
+##' Single-cell label data from three acute myeloid
+##' leukemia cell line culture (MOLM-14, K562, CMK). The data were
+##' acquired using a TMT-based quantification protocole and the
+##' nanoPOTS technology. The objective was to demonstrate successful
+##' use of the new nanoPOTS autosampler presented in the source
+##' article. The samples contain either carrier (10 ng), reference
+##' (0.2ng), empty or single-cell samples..
+##'
+##' @format A [QFeatures] object with 4 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `peptides_[intensity or corrected]`: 2 assays containing peptide
+##'   reporter ion intensities or corrected reporter ion intensities
+##'   as computed by MaxQuant.
+##' - `proteins_[intensity or corrected]`: 2 assays containing protein
+##'   reporter ion intensities or corrected reporter ion intensities
+##'   as computed by MaxQuant.
+##'
+##' Sample annotation is stored in `colData(williams2020_tmt())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Sample isolation**: cultured MOLM-14, K562 or CMK cells were
+##'   isolated using flow-cytometry based cell sorting and deposit on
+##'   nanoPOTS microwells
+##' - **Sample preparation**: cells are lysed using using a DDM lysis
+##'   buffer. Proteins are digested with trypsin followed by TMT
+##'   labelling and quanching with HA. The samples are then acidified
+##'   with FA, pooled in a single samples (adding carrier and reference
+##'   peptide mixtures), and dried until LC-MS/MS analysis.
+##' - **Liquid chromatography**: peptides are loaded using the new
+##'   autosampler described in the paper. Samples are loaded using a
+##'   a homemade miniature syringe pump. The samples are then desalted
+##'   and concentrated through a SPE column (4cm x 100µm i.d. packed
+##'   with 5µm C18) with microflow LC pump. The peptides are then eluted
+##'   from a long LC column (60cm x 50 µm i.d. packed with 3µm C18)
+##'   coupled to a nanoflox LC pump at 150nL/mL (elution time is not
+##'   expliceted).
+##' - **Mass spectrometry**: MS/MS was performed on an Orbitrap Fusion
+##'   Lumos Tribrid MS coupled to a 2kV ESI. MS1 setup: Orbitrap
+##'   analyzer at resolution 120.000, AGC target of 1E6, accumulation
+##'   of 246ms. MS2 setup: Orbitrap with HCD at resolution 120.000, AGC
+##'   target of 1E6, accumulation of 246ms.
+##' - **Raw data processing**: preprocessing using Maxquant v1.6.2.10
+##'   that use Andromeda search engine (with UniProtKB 2016-21-29).
+##'
+##' @section Data collection:
+##'
+##' All data were collected from the MASSIVE repository (accession ID:
+##' MSV000085230).
+##'
+##' The peptide and protein data were extracted from the
+##' `Peptides_AML_SingleCell.txt` or `ProteinGroups_AML_SingleCell.txt`
+##' files, respectively, in the `AML_SingleCell` folders.
+##'
+##' The tables were duplicated so that intensisities and corrected
+##' intensities are contained in separate tables. Tables are then
+##' converted to [SingleCellExperiment] objects. Sample annotations
+##' were inferred from the sample names, from table S2 and from the
+##' Experimental Section of the paper. All data is combined in
+##' a [QFeatures] object. [AssayLinks] were stored between peptide
+##' assays and their corresponding proteins assays based on the
+##' leading razor protein (hence only unique peptides are linked to
+##' proteins).
+##'
+##' The script to reproduce the `QFeatures` object is available at
+##' `system.file("scripts", "make-data_williams2020_tmt.R", package = "scpdata")`
+##'
+##' @source
+##'
+##' The PSM and protein data can be downloaded from the MASSIVE
+##' repository MSV000085230.
+##'
+##' @references
+##'
+##' **Source article**: Williams, Sarah M., Andrey V. Liyu, Chia-Feng
+##' Tsai, Ronald J. Moore, Daniel J. Orton, William B. Chrisler,
+##' Matthew J. Gaffrey, et al. 2020. “Automated Coupling of
+##' Nanodroplet Sample Preparation with Liquid Chromatography-Mass
+##' Spectrometry for High-Throughput Single-Cell Proteomics.”
+##' Analytical Chemistry 92 (15): 10588–96.
+##' ([link to article](http://dx.doi.org/10.1021/acs.analchem.0c01551)).
+##'
+##' @examples
+##' \donttest{
+##' williams2020_tmt()
+##' }
+##'
+##' @keywords datasets
+##'
+"williams2020_tmt"
diff --git a/R/woo2022_lung.R b/R/woo2022_lung.R
new file mode 100644
index 0000000..92cd18b
--- /dev/null
+++ b/R/woo2022_lung.R
@@ -0,0 +1,94 @@
+##' Woo et al. 2022 (Cell Syst.): 26 primary human lung cells
+##'
+##' Single-cell proteomics data from dissociated primary human lung
+##' cells. The data were
+##' acquired using the TIFF (transfer identification based on FAIMS
+##' filtering) acquisition method. The data contain 26 single cells.
+##'
+##' @format A [QFeatures] object with 5 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `peptides_[intensity or LFQ]`: 2 assays containing peptide
+##'   quantities or normalized quantities using the maxLFQ method
+##'   as computed by MaxQuant.
+##' - `proteins_[intensity or iBAQ or LFQ]`: 3 assays containing
+##'   protein quantities or normalized proteins using the iBAQ or
+##'   maxLFQ methods as computed by MaxQuant.
+##'
+##' Sample annotation is stored in `colData(woo_lung())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Sample isolation**: primary human lung cells were dissociated
+##'   following the protocol in Bandyopadhyay et al., 2018. The cells
+##'   were sorted using the Influx II cell sorter and deposited on a
+##'   nanoPOTS chip.
+##' - **Sample preparation**: cells are lysed using using a DDM+DTT
+##'   lysis and reduction buffer. The proteins are alkylated with IAA
+##'   and digested with LysC and trypsin. Samples are then acidified
+##'   with FA, vacuum dried and stored in freezer until data
+##'   acquisition.
+##' - **Liquid chromatography**: peptides are loaded using an in-house
+##'   autosampler (Williams et al. 2020). The samples are concentrated
+##'   through a SPE column (4cm x 100µm i.d. packed with 5µm C18) with
+##'   microflow LC pump. The peptides are then eluted from an LC
+##'   column (25cm x 50 µm i.d. packed with 1.7µm C18) from a 60 min
+##'   gradient (100nL/min).
+##' - **Mass spectrometry**: MS/MS was performed on an Orbitrap Fusion
+##'   Lumos Tribrid MS with FAIMSpro coupled to a 2.4 kV ESI. FAIMS
+##'   setup: 4-CV method (-45, -55, -65, -75 V). MS1 setup: resolution
+##'   = 120.000, range = 350-1500 m/z,AGC target of 1E6, accumulation
+##'   of 254ms. MS2 setup: 30% HCD, resolution AGC 2E4, accumulation
+##'   of 254ms.
+##' - **Raw data processing**: preprocessing using Maxquant v1.6.2.10
+##'   that use Andromeda search engine (with UniProtKB 2016-21-29).
+##'   MBR was enabled.
+##'
+##' @section Data collection:
+##'
+##' All data were collected from the MASSIVE repository (accession ID:
+##' MSV000085937).
+##'
+##' The peptide and protein data were extracted from the
+##' `peptides_nondepleted_Lung_scProteomics.txt` or
+##' `proteinGroups_nondepleted_Lung_scProteomics.txt` files,
+##' respectively, in the `NonDepleted_Lung_SingleCellProteomics`
+##' folders.
+##'
+##' The tables were split so that intensities, maxLFQ, and iBAQ
+##' data are contained in separate tables. Tables are then
+##' converted to [SingleCellExperiment] objects. Sample annotations
+##' were inferred from the sample names. All data is combined in
+##' a [QFeatures] object. [AssayLinks] were stored between peptide
+##' assays and their corresponding proteins assays based on the
+##' leading razor protein (hence only unique peptides are linked to
+##' proteins).
+##'
+##' The script to reproduce the `QFeatures` object is available at
+##' `system.file("scripts", "make-data_woo2022_lung.R", package = "scpdata")`
+##'
+##' @source
+##'
+##' The peptide and protein data can be downloaded from the MASSIVE
+##' repository MSV000085937
+##'
+##' @references
+##'
+##' **Source article**: Woo, Jongmin, Geremy C. Clair, Sarah M.
+##' Williams, Song Feng, Chia-Feng Tsai, Ronald J. Moore, William B.
+##' Chrisler, et al. 2022. “Three-Dimensional Feature Matching
+##' Improves Coverage for Single-Cell Proteomics Based on Ion Mobility
+##' Filtering.” Cell Systems 13 (5): 426–34.e4.
+##' ([link to article](http://dx.doi.org/10.1016/j.cels.2022.02.003)).
+##'
+##' @examples
+##' \donttest{
+##' woo2022_lung()
+##' }
+##'
+##' @keywords datasets
+##'
+"woo2022_lung"
diff --git a/R/woo2022_macrophage.R b/R/woo2022_macrophage.R
new file mode 100644
index 0000000..0bd1e38
--- /dev/null
+++ b/R/woo2022_macrophage.R
@@ -0,0 +1,94 @@
+##' Woo et al. 2022 (Cell Syst.): LPS-treated macrophages
+##'
+##' Single-cell data from macrophages subjected to 3 LPS
+##' treatments. The data were
+##' acquired using the TIFF (transfer identification based on FAIMS
+##' filtering) acquisition method. The data contain 155 single cells:
+##' 54 control cells (no treatment), 52 cells treated with LPS during
+##' 24h and 49 cells treated with LPS during 49h.
+##'
+##' @format A [QFeatures] object with 5 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `peptides_[intensity or LFQ]`: 2 assays containing peptide
+##'   quantities or normalized quantities using the maxLFQ method
+##'   as computed by MaxQuant.
+##' - `proteins_[intensity or iBAQ or LFQ]`: 3 assays containing
+##'   protein quantities or normalized proteins using the iBAQ or
+##'   maxLFQ methods as computed by MaxQuant.
+##'
+##' Sample annotation is stored in `colData(woo_macrophage())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Sample isolation**: cultured RAW 264.7 cells treated or not
+##'   with 100 ng/ul LPS. The cells were sorted using the Influx II
+##'   cell sorter and deposited on a nanoPOTS chip.
+##' - **Sample preparation**: cells are lysed using using a DDM+DTT
+##'   lysis and reduction buffer. The proteins are alkylated with IAA
+##'   and digested with LysC and trypsin. Samples are then acidified
+##'   with FA, vacuum dried and stored in freezer until data
+##'   acquisition.
+##' - **Liquid chromatography**: peptides are loaded using an in-house
+##'   autosampler (Williams et al. 2020). The samples are concentrated
+##'   through a SPE column (4cm x 100µm i.d. packed with 5µm C18) with
+##'   microflow LC pump. The peptides are then eluted from an LC
+##'   column (25cm x 50 µm i.d. packed with 1.7µm C18) from a 60 min
+##'   gradient (100nL/min).
+##' - **Mass spectrometry**: MS/MS was performed on an Orbitrap Fusion
+##'   Lumos Tribrid MS with FAIMSpro coupled to a 2.4 kV ESI. FAIMS
+##'   setup: 4-CV method (-45, -55, -65, -75 V). MS1 setup: resolution
+##'   = 120.000, range = 350-1500 m/z,AGC target of 1E6, accumulation
+##'   of 254ms. MS2 setup: 30% HCD, resolution AGC 2E4, accumulation
+##'   of 254ms.
+##' - **Raw data processing**: preprocessing using Maxquant v1.6.2.10
+##'   that use Andromeda search engine (with UniProtKB 2016-21-29).
+##'   MBR was enabled.
+##'
+##' @section Data collection:
+##'
+##' All data were collected from the MASSIVE repository (accession ID:
+##' MSV000085937).
+##'
+##' The peptide and protein data were extracted from the
+##' `peptides_RAW_LPS_scProteomics.txt` or
+##' `proteinGroups_RAW_LPS_scProteomics.txt` files, respectively, in
+##' the `RAW_LPS_SingleCellProteomics` folders.
+##'
+##' The tables were split so that intensities, maxLFQ, and iBAQ
+##' data are contained in separate tables. Tables are then
+##' converted to [SingleCellExperiment] objects. Sample annotations
+##' were inferred from the sample names. All data is combined in
+##' a [QFeatures] object. [AssayLinks] were stored between peptide
+##' assays and their corresponding proteins assays based on the
+##' leading razor protein (hence only unique peptides are linked to
+##' proteins).
+##'
+##' The script to reproduce the `QFeatures` object is available at
+##' `system.file("scripts", "make-data_woo2022_macrophage.R", package = "scpdata")`
+##'
+##' @source
+##'
+##' The peptide and protein data can be downloaded from the MASSIVE
+##' repository MSV000085937
+##'
+##' @references
+##'
+##' **Source article**: Woo, Jongmin, Geremy C. Clair, Sarah M.
+##' Williams, Song Feng, Chia-Feng Tsai, Ronald J. Moore, William B.
+##' Chrisler, et al. 2022. “Three-Dimensional Feature Matching
+##' Improves Coverage for Single-Cell Proteomics Based on Ion Mobility
+##' Filtering.” Cell Systems 13 (5): 426–34.e4.
+##' ([link to article](http://dx.doi.org/10.1016/j.cels.2022.02.003)).
+##'
+##' @examples
+##' \donttest{
+##' woo2022_macrophage()
+##' }
+##'
+##' @keywords datasets
+##'
+"woo2022_macrophage"
diff --git a/R/zhu2018MCP.R b/R/zhu2018MCP.R
new file mode 100644
index 0000000..bb59a94
--- /dev/null
+++ b/R/zhu2018MCP.R
@@ -0,0 +1,86 @@
+##' Zhu et al. 2018 (Mol. Cel. Prot.): rat brain laser dissections
+##'
+##' Near single-cell proteomics data of laser captured
+##' micro-dissection samples. The samples are 24 brain sections from
+##' rat pups (day 17). The slices are 12 um thick squares of either
+##' 50, 100, or 200 um width. 5 samples were dissected from the corpus
+##' callum (`CC`), 4 samples were dissected from the corpus collosum
+##' (`CP`), 13 samples were extracted from the cerebral cortex
+##' (`CTX`), and 2 samples are labeled as (`Mix`).
+##'
+##' @format A [QFeatures] object with 4 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `peptides`: quantitative information for 13,055 peptides from
+##'   24 samples
+##' - `proteins_intensity`: protein intensities for 2,257 proteins
+##'   from 24 samples
+##' - `proteins_LFQ`: LFQ intensities for 2,257 proteins from 24 samples
+##' - `proteins_iBAQ`: iBAQ values for 2,257 proteins from 24 samples
+##'
+##' Sample annotation is stored in `colData(zhu2018MCP())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the original article (see `References`).
+##'
+##' - **Cell isolation**: brain patches were collected using
+##'   laser-capture microdissection (PALM MicroBeam) on flash frozen
+##'   rat (*Rattus norvergicus*) brain tissues. Note that the samples
+##'   were stained with H&E before dissection for histological
+##'   analysis. DMSO is used as sample collection solution
+##' - **Sample preparation** performed using the nanoPOTs device: DMSO
+##'   evaporation + protein extraction (DMM + DTT) + alkylation (IAA)
+##'   + Lys-C digestion + trypsin digestion.
+##' - **Separation**: nanoLC (Dionex UltiMate with an in-house packed
+##'   60cm x 30um LC columns; 50nL/min)
+##' - **Ionization**: ESI (2,000V)
+##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
+##'   Tribrid (MS1 accumulation time = 246ms; MS1 resolution =
+##'   120,000; MS1 AGC = 3E6). The MS/MS settings  depend on the
+##'   sample size, excepted for the AGC = 1E5. 50um (time = 502ms;
+##'   resolution = 240,000), 100um (time = 246ms; resolution =
+##'   120,000), 200um (time = 118ms; resolution = 60,000).
+##' - **Data analysis**: MaxQuant (v1.5.3.30) + Perseus (v1.5.6.0) +
+##'   Origin Pro 2017
+##'
+##' @section Data collection:
+##'
+##' The data were collected from the PRIDE repository (accession
+##' ID: PXD008844).  We downloaded the `MaxQuant_Peptides.txt`
+##' and the `MaxQuant_ProteinGroups.txt` files containing the
+##' combined identification and quantification
+##' results. The sample annotations were inferred from the names of
+##' columns holding the quantification data and the information in the
+##' article. The peptides data were converted to a [SingleCellExperiment]
+##' object. We split the protein table to separate the three types of
+##' quantification: protein intensity, label-free quantitification
+##' (LFQ) and intensity based absolute quantification (iBAQ). Each
+##' table is converted to a [SingleCellExperiment] object along with
+##' the remaining protein annotations. The 4 objects are combined in
+##' a single [QFeatures] object and feature links are created based on
+##' the peptide leading razor protein ID and the protein ID.
+##'
+##' @source
+##' The PSM data can be downloaded from the PRIDE repository
+##' PXD008844. FTP link
+##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2018/07/PXD008844
+##'
+##' @references
+##' Zhu, Ying, Maowei Dou, Paul D. Piehowski, Yiran Liang, Fangjun
+##' Wang, Rosalie K. Chu, William B. Chrisler, et al. 2018. “Spatially
+##' Resolved Proteome Mapping of Laser Capture Microdissected Tissue
+##' with Automated Sample Transfer to Nanodroplets.” Molecular &
+##' Cellular Proteomics: MCP 17 (9): 1864–74
+##' ([link to article](http://dx.doi.org/10.1074/mcp.TIR118.000686)).
+##'
+##' @examples
+##' \donttest{
+##' zhu2018MCP()
+##' }
+##'
+##' @keywords datasets
+##'
+##'
+"zhu2018MCP"
diff --git a/R/zhu2018NC_hela.R b/R/zhu2018NC_hela.R
new file mode 100644
index 0000000..27e6823
--- /dev/null
+++ b/R/zhu2018NC_hela.R
@@ -0,0 +1,86 @@
+##' Zhu et al. 2018 (Nat. Comm.): HeLa titration
+##'
+##' Near single-cell proteomics data of HeLa samples containing
+##' different number of cells. There are three groups of cell
+##' concentrations: low (10-14 cells), medium (35-45 cells) and high
+##' (137-141 cells). The data also contain measures for blanks, HeLa
+##' lysates (50 cell equivalent) and 2 cancer cell line lysates (MCF7
+##' and THP1, 50 cell equivalent).
+##'
+##' @format A [QFeatures] object with 4 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `peptides`: quantitative information for 37,795 peptides from
+##'   21 samples
+##' - `proteins_intensity`: protein intensities for 3,984 proteins
+##'   from 21 samples
+##' - `proteins_LFQ`: LFQ intensities for 3,984 proteins from 21
+##'   samples
+##' - `proteins_iBAQ`: iBAQ values for 3,984 proteins from 21 samples
+##'
+##' Sample annotation is stored in `colData(zhu2018NC_hela())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the original article (see `References`).
+##'
+##' - **Cell isolation**: HeLa cell concentration was adjusted by
+##'   serial dilution and cell counting was performed manually using
+##'   an inverted microscope.
+##' - **Sample preparation** performed using the nanoPOTs device.
+##'   Protein extraction using RapiGest (+ DTT) + alkylation (IAA) +
+##'   Lys-C digestion + cleave RapiGest (formic acid).
+##' - **Separation**: nanoACQUITY UPLC pump (60nL/min) with an
+##'   Self-Pack PicoFrit 70cm x 30um LC columns.
+##' - **Ionization**: ESI (1,900V).
+##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
+##'   Tribrid. MS1 settings: accumulation time = 246ms; resolution =
+##'   120,000; AGC = 1E6. MS/MS settings, depend on the sample size,
+##'   excepted for the AGC = 1E5. Blank and approx. 10 cells (time = 502ms;
+##'   resolution = 240,000), approx. 40 cells (time = 246ms; resolution =
+##'   120,000), approx. 140 cells (time = 118ms; resolution = 60,000).
+##' - **Data analysis**: MaxQuant (v1.5.3.30) + Perseus + OriginLab
+##'   2017
+##'
+##' @section Data collection:
+##'
+##' The data were collected from the PRIDE repository (accession
+##' ID: PXD006847).  We downloaded the `CulturedCells_peptides.txt`
+##' and the `CulturedCells_proteinGroups.txt` files containing the
+##' combined identification and quantification
+##' results. The sample annotations were inferred from the names of
+##' columns holding the quantification data and the information in the
+##' article. The peptides data were converted to a [SingleCellExperiment]
+##' object. We split the protein table to separate the three types of
+##' quantification: protein intensity, label-free quantitification
+##' (LFQ) and intensity based absolute quantification (iBAQ). Each
+##' table is converted to a [SingleCellExperiment] object along with
+##' the remaining protein annotations. The 4 objects are combined in
+##' a single [QFeatures] object and feature links are created based on
+##' the peptide leading razor protein ID and the protein ID.
+##'
+##' @source
+##' The PSM data can be downloaded from the PRIDE repository
+##' PXD006847. FTP link:
+##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2018/01/PXD006847
+##'
+##' @references
+##'
+##' Zhu, Ying, Paul D. Piehowski, Rui Zhao, Jing Chen, Yufeng Shen,
+##' Ronald J. Moore, Anil K. Shukla, et al. 2018. “Nanodroplet
+##' Processing Platform for Deep and Quantitative Proteome Profiling
+##' of 10-100 Mammalian Cells.” Nature Communications 9 (1): 882
+##' ([link to article](http://dx.doi.org/10.1038/s41467-018-03367-w)).
+##'
+##' @seealso The same experiment was conducted on HeLa lysates:
+##' [zhu2018NC_lysates].
+##'
+##' @examples
+##' \donttest{
+##' zhu2018NC_hela()
+##' }
+##'
+##' @keywords datasets
+##'
+"zhu2018NC_hela"
diff --git a/R/zhu2018NC_islets.R b/R/zhu2018NC_islets.R
new file mode 100644
index 0000000..c0026a7
--- /dev/null
+++ b/R/zhu2018NC_islets.R
@@ -0,0 +1,82 @@
+##' Zhu et al. 2018 (Nat. Comm.): human pancreatic islets
+##'
+##'
+##' Near single-cell proteomics data human pancreas samples. The
+##' samples were collected from pancreatic tissue slices using laser
+##' dissection. The pancreata were obtained from organ donors through
+##' the JDRFNetwork for Pancreatic Organ Donors with Diabetes (nPOD)
+##' program. The sample come either from control patients (n=9) or
+##' from type 1 diabetes (T1D) patients (n=9).
+##'
+##' @format A [QFeatures] object with 4 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `peptides`: quantitative information for 24,321 peptides from
+##'   18 islet samples
+##' - `proteins_intensity`: quantitative information for 3,278
+##'   proteins from 18 islet samples
+##' - `proteins_LFQ`: LFQ intensities for 3,278 proteins from 18 islet
+##'   samples
+##' - `proteins_iBAQ`: iBAQ values for 3,278 proteins from 18 islet
+##'   samples
+##'
+##' Sample annotation is stored in `colData(zhu2018NC_islets())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: The islets were extracted from the pacreatic
+##'   tissues using laser-capture microdissection.
+##' - **Sample preparation** performed using the nanoPOTs device.
+##'   Protein extraction using RapiGest (+ DTT) + alkylation (IAA) +
+##'   Lys-C digestion + cleave RapiGest (formic acid)
+##' - **Separation**: nanoACQUITY UPLC pump with an Self-Pack PicoFrit
+##'   70cm x 30um LC columns; 60nL/min)
+##' - **Ionization**: ESI (1,900V)
+##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
+##'   Tribrid. MS1 settings: accumulation time = 246ms; resolution =
+##'   120,000; AGC = 1E6. MS/MS settings: accumulation time = 118ms;
+##'   resolution = 60,000; AGC = 1E5.
+##' - **Data analysis**: MaxQuant (v1.5.3.30) + Perseus + OriginLab
+##'   2017
+##'
+##' @section Data collection:
+##'
+##' The data were collected from the PRIDE repository (accession
+##' ID: PXD006847).  We downloaded the `Islet_t1d_ct_peptides.txt`
+##' and the `Islet_t1d_ct_proteinGroups.txt` files containing the
+##' combined identification and quantification results. The sample
+##' types were inferred from the names of columns holding the
+##' quantification data. The peptides data were converted to a
+##' [SingleCellExperiment] object. We split the protein table to
+##' separate the three types of quantification: protein intensity,
+##' label-free quantitification (LFQ) and intensity based absolute
+##' quantification (iBAQ). Each table is converted to a
+##' [SingleCellExperiment] object along with the remaining protein
+##' annotations. The 4 objects are combined in a single [QFeatures]
+##' object and feature links are created based on the peptide leading
+##' razor protein ID and the protein ID.
+##'
+##' @source
+##' The PSM data can be downloaded from the PRIDE repository
+##' PXD006847. The source link is:
+##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2018/01/PXD006847
+##'
+##' @references
+##'
+##' Zhu, Ying, Paul D. Piehowski, Rui Zhao, Jing Chen, Yufeng Shen,
+##' Ronald J. Moore, Anil K. Shukla, et al. 2018. “Nanodroplet
+##' Processing Platform for Deep and Quantitative Proteome Profiling
+##' of 10-100 Mammalian Cells.” Nature Communications 9 (1): 882
+##' ([link to article](http://dx.doi.org/10.1038/s41467-018-03367-w)).
+##'
+##' @examples
+##' \donttest{
+##' zhu2018NC_islets()
+##' }
+##'
+##' @keywords datasets
+##'
+"zhu2018NC_islets"
diff --git a/R/zhu2018NC_lysates.R b/R/zhu2018NC_lysates.R
new file mode 100644
index 0000000..c49ab40
--- /dev/null
+++ b/R/zhu2018NC_lysates.R
@@ -0,0 +1,85 @@
+##' Zhu et al. 2018 (Nat. Comm.): HeLa lysates
+##'
+##' Near single-cell proteomics data of HeLa lysates at different
+##' concentrations (10, 40 and 140 cell equivalent). Each
+##' concentration is acquired in triplicate.
+##'
+##' @format A [QFeatures] object with 4 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `peptides`: quantitative information for 14,921 peptides from
+##'   9 lysate samples
+##' - `proteins_intensity`: quantitative information for 2,199
+##'   proteins from 9 lysate samples
+##' - `proteins_LFQ`: LFQ intensities for 2,199 proteins from 9 lysate
+##'   samples
+##' - `proteins_iBAQ`: iBAQ values for 2,199 proteins from 9 lysate
+##'   samples
+##'
+##' Sample annotation is stored in `colData(zhu2018NC_lysates())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the original article (see `References`).
+##'
+##' - **Cell isolation**: HeLas were collected from cell cultures.
+##' - **Sample preparation** performed in bulk (5E5 cells/mL). Protein
+##'   extraction using RapiGest (+ DTT) + dilution to target
+##'   concentration + alkylation (IAA) + Lys-C digestion + trypsin
+##'   digestion + cleave RapiGest (formic acid).
+##' - **Separation**: nanoACQUITY UPLC pump (60nL/min) with an
+##'   Self-Pack PicoFrit 70cm x 30um LC columns.
+##' - **Ionization**: ESI (1,900V).
+##' - **Mass spectrometry**: Thermo Fisher Orbitrap Fusion Lumos
+##'   Tribrid. MS1 settings: accumulation time = 246ms; resolution =
+##'   120,000; AGC = 1E6. MS/MS settings, depend on the sample size,
+##'   excepted for the AGC = 1E5. Blank and approx. 10 cells (time = 502ms;
+##'   resolution = 240,000), approx. 40 cells (time = 246ms; resolution =
+##'   120,000), approx. 140 cells (time = 118ms; resolution = 60,000).
+##' - **Data analysis**: MaxQuant (v1.5.3.30) + Perseus + OriginLab
+##'   2017.
+##'
+##' @section Data collection:
+##'
+##' The data were collected from the PRIDE repository (accession
+##' ID: PXD006847).  We downloaded the `Vail_Prep_Vail_peptides.txt`
+##' and the `Vail_Prep_Vail_proteinGroups.txt` files containing the
+##' combined identification and quantification
+##' results. The sample annotations were inferred from the names of
+##' columns holding the quantification data and the information in the
+##' article. The peptides data were converted to a [SingleCellExperiment]
+##' object. We split the protein table to separate the three types of
+##' quantification: protein intensity, label-free quantitification
+##' (LFQ) and intensity based absolute quantification (iBAQ). Each
+##' table is converted to a [SingleCellExperiment] object along with
+##' the remaining protein annotations. The 4 objects are combined in
+##' a single [QFeatures] object and feature links are created based on
+##' the peptide leading razor protein ID and the protein ID.
+##'
+##' @source
+##' The PSM data can be downloaded from the PRIDE repository
+##' PXD006847. The source link is:
+##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2018/01/PXD006847
+##'
+##' @references
+##'
+##' Zhu, Ying, Paul D. Piehowski, Rui Zhao, Jing Chen, Yufeng Shen,
+##' Ronald J. Moore, Anil K. Shukla, et al. 2018. “Nanodroplet
+##' Processing Platform for Deep and Quantitative Proteome Profiling
+##' of 10-100 Mammalian Cells.” Nature Communications 9 (1): 882
+##' ([link to article](http://dx.doi.org/10.1038/s41467-018-03367-w)).
+##'
+##' @seealso The same experiment was conducted directly on HeLa cells
+##' samples rather than lysates. The data is available in
+##' [zhu2018NC_hela].
+##'
+##' @examples
+##' \donttest{
+##' zhu2018NC_lysates()
+##' }
+##'
+##' @keywords datasets
+##'
+##'
+"zhu2018NC_lysates"
diff --git a/R/zhu2019EL.R b/R/zhu2019EL.R
new file mode 100644
index 0000000..e37b6f6
--- /dev/null
+++ b/R/zhu2019EL.R
@@ -0,0 +1,104 @@
+##' Zhu et al. 2019 (eLife): chicken utricle cells
+##'
+##'
+##' Single-cell proteomics data from chicken utricle acquired to
+##' study the hair-cell development. The cells are isolated from
+##' peeled utrical epithelium and separated into hair cells (FM1-43
+##' high) and supporting cells (FM1-43 low). The sample contain either
+##' 1 cell (n = 28), 3 cells (n = 7), 5 cells (n = 8) or 20 cells (n =
+##' 14).
+##'
+##' @format A [QFeatures] object with 62 assays, each assay being a
+##' [SingleCellExperiment] object:
+##'
+##' - `XYZw`: 60 assays containing PSM data. The sample are annotated
+##'   as follows. `X` indicates the experiment, either 1 or 2. `Y`
+##'   indicated the FM1-43 signal, either high (H) or low (L). `Z`
+##'   indicates the number of cells (0, 1, 3, 5 or 20). `w` indicates
+##'   the replicate, starting from `a`, it can go up to `j`.
+##' - `peptides`: quantitative data for 3444 peptides in 60 samples
+##'   (all runs are combined).
+##' - `proteins_intensity`: protein intensities for 840 proteins
+##'   from 24 samples
+##' - `proteins_iBAQ`: iBAQ values for 840 proteins from 24 samples
+##'
+##' Sample annotation is stored in `colData(zhu2019EL())`.
+##'
+##' @section Acquisition protocol:
+##'
+##' The data were acquired using the following setup. More information
+##' can be found in the source article (see `References`).
+##'
+##' - **Cell isolation**: The cells were taken from the utricles of
+##'   E15 chick embryos. Samples were stained with FM1-43FX and the
+##'   cells were dissociated using enzymatic digestion. Cells were
+##'   FACS sorted (BD Influx) and split based on their FM1-43 signal,
+##'   while ensuring no debris, doublets or dead cells are retained.
+##' - **Sample preparation** performed using the nanoPOTs device. Cell
+##'   lysis and protein extraction and reduction are performed using
+##'   dodecyl beta-D-maltoside + DTT + ammonium bicarbonate. Protein
+##'   were then alkylated using IAA. Protein digestion is performed
+##'   using Lys-C and trypsin. Finally samples acidification is
+##'   performed using formic acid.
+##' - **Separation**: Dionex UltiMate pump with an C18-Packed column
+##'   (50cm x 30um; 60nL/min)
+##' - **Ionization**: ESI (2,000V)
+##' - **Mass spectrometry**: Orbitrap Fusion Lumos Tribrid. MS1
+##'   settings: accumulation time = 246ms; resolution = 120,000; AGC =
+##'   3E6. MS/MS settings: accumulation time = 502ms; resolution =
+##'   120,000; AGC = 2E5.
+##' - **Data analysis**: Andromeda & MaxQuant (v1.5.3.30) and the
+##'   search database is NCBI GRCg6a.
+##'
+##' @section Data collection:
+##'
+##' All data were collected from the PRIDE repository (accession ID:
+##' PXD014256).
+##'
+##' The sample annotation information is provided in the
+##' `Zhu_2019_chick_single_cell_samples_CORRECTED.xlsx` file. This file
+##' was given during a personal discussion and is a corrected version
+##' of the annotation table available on the PRIDE repository.
+##'
+##' The PSM data were found in the `evidence.txt` (in the
+##' `Experiment 1+ 2`) folder. The PSM data were filtered so that it
+##' contains only samples that are annotated. The data were then
+##' converted to a [QFeatures] object using the [scp::readSCP()]
+##' function.
+##'
+##' The peptide data were found in the `peptides.txt` file. The column
+##' names holding the quantitative data were adapted to match the
+##' sample names in the [QFeatures] object. The data were then
+##' converted to a [SingleCellExperiment] object and then inserted in
+##' the [QFeatures] object. Links between the PSMs and the peptides
+##' were added
+##'
+##' A similar procedure was applied to the protein data. The data were
+##' found in the `proteinGroups.txt` file. We split the protein table
+##' to separate the two types of quantification: summed intensity and
+##' intensity based absolute quantification (iBAQ). Both tables are
+##' converted to [SingleCellExperiment] objects and are added to the
+##' [QFeatures] object as well as the `AssayLink` between peptides and
+##' proteins.
+##'
+##' @source
+##' The PSM data can be downloaded from the PRIDE repository
+##' PXD014256. The source link is:
+##' ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2019/11/PXD014256
+##'
+##' @references
+##'
+##' Zhu, Ying, Mirko Scheibinger, Daniel Christian Ellwanger, Jocelyn
+##' F. Krey, Dongseok Choi, Ryan T. Kelly, Stefan Heller, and Peter G.
+##' Barr-Gillespie. 2019. “Single-Cell Proteomics Reveals Changes in
+##' Expression during Hair-Cell Development.” eLife 8 (November).
+##' ([link to article](https://doi.org/10.7554/eLife.50777)).
+##'
+##' @examples
+##' \donttest{
+##' zhu2019EL()
+##' }
+##'
+##' @keywords datasets
+##'
+"zhu2019EL"