
Guide to processing raw LCMS metabolomic and lipidomic data using MS-DIAL, followed by data pre-processing and secondary annotation (of metabolites) in R.


clzani/metabolome-lipidome-MSDIAL

 
 


Processing Metabolome and Lipidome Data with MS-DIAL

MS-DIAL provides a pipeline for untargeted metabolomics. Here we will discuss the process for generating intensity height tables from raw LC-MS data, and subsequent export for downstream analysis and secondary annotation methods.

Requirements

As of MS-DIAL version 4.70, you no longer need to convert your .RAW LC-MS files into .ABF format. If you are using an older version of MS-DIAL, you will need to download the ABF file converter.

Methods

Start up a project

Begin by opening MS-DIAL and starting a new project from the 'File' menu at the top-left of the screen. You will then need to select the 'Project file path:' that contains your LC-MS files using the 'Browse' button.

It is recommended that you change the default .mtd file name generated by MS-DIAL to something more easily recognisable in the future, such as date_sampletype_ionisationmode.mtd.
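As a minimal sketch of this naming convention, the helper below builds a project file name from a run date, sample type, and ionisation mode. The function name `make_mtd_name()` and the `pos`/`neg` labels are our own illustrative assumptions, not part of MS-DIAL.

```r
# Hypothetical helper (not part of MS-DIAL): build a recognisable project file name
make_mtd_name <- function(run_date, sample_type, ion_mode) {
  # Restrict the ionisation mode label to two assumed values
  ion_mode <- match.arg(ion_mode, c("pos", "neg"))
  # Format the date compactly so that files sort chronologically
  sprintf("%s_%s_%s.mtd", format(as.Date(run_date), "%Y%m%d"), sample_type, ion_mode)
}

make_mtd_name("2021-06-01", "stool", "pos")
```

The resulting name can then be typed or pasted into the 'Project file path:' dialog.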

For our data, most of the default options will be appropriate. Ensure you choose the correct 'Ion mode' at the bottom left, and that you select the correct 'Target omics' from either metabolomics or lipidomics.

New project window

The next screen will ask you to define your 'Analysis file paths'. Click the browse button, and select all of your LC-MS files. Note that these must be located in the same folder as your project in order to be valid.

Once you have done this, you will need to change the 'Type' column to indicate whether that sample is a sample, blank, QC, or standard. It is also recommended, for quick analysis later on, to change the values in the 'Class ID' column.

For example, the 'Class ID' column below indicates the sample type (if you have multiple groups, you can specify them here). Note that while MS2 samples are given a 'Type' of QC, you can assign them their own class in this column.
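If your file names follow a consistent pattern, it can help to draft the 'Type' and 'Class ID' assignments in R before entering them in MS-DIAL. The sketch below assumes a hypothetical naming scheme in which the text before the first underscore identifies the sample kind; the file names are invented for illustration.

```r
# Hypothetical file names: the prefix before the first underscore is the sample kind
files <- c("blank_01.raw", "QC_01.raw", "QC_02.raw", "MS2_01.raw", "stool_01.raw")
prefix <- tolower(sub("_.*$", "", files))

sample_sheet <- data.frame(
  file = files,
  # MS2 runs receive a 'Type' of QC, as described above
  type = ifelse(prefix == "blank", "Blank",
                ifelse(prefix %in% c("qc", "ms2"), "QC", "Sample")),
  # The class keeps MS2 runs and each sample group distinct
  class_id = prefix,
  stringsAsFactors = FALSE
)

sample_sheet
```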

Once you have finished, click 'Next' to continue.

Analysis parameter setting - Data collection

If you have a parameter configuration file, you can load it in via the 'Load' button in the bottom-left of the window.

We begin the setting of analysis parameters by inputting the data collection parameters.

Mass Accuracy:

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| MS1 tolerance | 0.002 Da | 0.002 Da |
| MS2 tolerance | 0.002 Da | 0.002 Da |

Data collection parameters:

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| Retention time begin | 0 min | 0 min |
| Retention time end | 40 min | 40 min |
| MS1 mass range begin | 50 Da | 200 Da |
| MS1 mass range end | 1000 Da | 1300 Da |
| MS/MS mass range begin | 50 Da | 200 Da |
| MS/MS mass range end | 1000 Da | 1300 Da |

Isotope recognition:

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| Maximum charged number | 2 | 2 |
| Consider Cl and Br elements | FALSE | FALSE |

Multithreading (will depend on your machine):

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| Number of threads | 8 | 8 |
| Execute retention time corrections | FALSE | FALSE |

Peak detection

Next we select the minimum peak height threshold. Peaks below this threshold will not be retained. A value of 100,000 is recommended for data acquired by Thermo Scientific Xcalibur machines. However, this will vary by apparatus, and may require data-dependent tuning.

We will leave the 'Mass slice width' value to the default, along with all options in the drop-down 'Advanced' menu.

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| Minimum peak height | 100000 | 100000 |
| Mass slice width | 0.05 Da | 0.05 Da |
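When tuning the minimum peak height for your own instrument, it can be useful to see how many peaks would survive at several candidate thresholds. The following is a rough sketch using made-up peak heights; `count_retained()` is a hypothetical helper, and MS-DIAL applies the actual threshold internally during peak detection.

```r
# Hypothetical helper: count peaks at or above each candidate threshold
count_retained <- function(heights, thresholds) {
  vapply(thresholds, function(th) sum(heights >= th), integer(1))
}

# Made-up peak heights, for illustration only
heights <- c(2.5e4, 1.0e5, 7.3e6, 4.0e5)

count_retained(heights, thresholds = c(1e4, 1e5, 1e6))
```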

MS2Dec

The default values are suitable.

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| Sigma window value | 0.5 | 0.5 |
| MS/MS abundance cut off | 0 | 0 |
| Exclude after precursor ion | TRUE | TRUE |
| Keep the isotopic ions until | 0.5 Da | 0.5 Da |
| Keep the isotopic ions w/o MS2Dec | FALSE | FALSE |

Identification

In this tab, you will need the appropriate .msp file for identification of metabolites: this will be either the positive or negative MassBank file you downloaded depending on the ionisation mode you are running currently. Leave this blank for lipidomics.

We will also set a number of parameters:

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| Retention time tolerance | 0.1 min | |
| Accurate mass tolerance (MS1) | 0.002 Da | |
| Accurate mass tolerance (MS2) | 0.002 Da | |
| Identification score cut off | 80 % | |
| Use retention time for scoring | FALSE | |
| Use retention time for filtering | FALSE | |

Under the 'Advanced' drop-down menu, you can select a text file containing the standards run during data acquisition. This file should contain three columns, including the metabolite name, m/z, and retention time (in that order). These columns should be named Metabolite, MZ, and RT.
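A standards file with this layout can be written from R. The sketch below assumes a tab-delimited text file, and the standard names, m/z values, and retention times are placeholders rather than real measurements; replace them with the standards from your own run.

```r
# Placeholder standards: replace with the values from your own acquisition
standards <- data.frame(
  Metabolite = c("standard_A", "standard_B"),
  MZ = c(195.0877, 132.1019),
  RT = c(5.20, 2.75),
  stringsAsFactors = FALSE
)

# Write a tab-delimited text file (assumed format) with the three named columns in order
standards_path <- file.path(tempdir(), "standards.txt")
write.table(standards, standards_path, sep = "\t", quote = FALSE, row.names = FALSE)
```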

We will also adjust the settings here:

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| Retention time tolerance | 0.1 min | |
| Accurate mass tolerance | 0.002 Da | |
| Identification score cut off | 85 % | |
| Relative abundance cut off | 0 % | |
| Only report the top hit | TRUE | |
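To make the two tolerances concrete, the sketch below checks whether a feature falls inside both the accurate mass window (0.002 Da) and the retention time window (0.1 min) of any standard. It only illustrates the window logic; MS-DIAL's own scoring is more involved, and `match_standard()` and the values shown are hypothetical.

```r
# Placeholder standards table with the same columns as the standards file
standards <- data.frame(
  Metabolite = c("standard_A", "standard_B"),
  MZ = c(195.0877, 132.1019),
  RT = c(5.20, 2.75),
  stringsAsFactors = FALSE
)

# Hypothetical helper: names of standards within both tolerance windows
match_standard <- function(feature_mz, feature_rt, standards,
                           mz_tol = 0.002, rt_tol = 0.1) {
  hit <- abs(standards$MZ - feature_mz) <= mz_tol &
    abs(standards$RT - feature_rt) <= rt_tol
  standards$Metabolite[hit]
}

match_standard(195.0885, 5.25, standards)  # inside both windows of standard_A
match_standard(195.0977, 5.25, standards)  # m/z is 0.01 Da away, so no match
```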

Adduct

Here we will select the appropriate adduct ion settings for our runs.

| Metabolomics Positive | Metabolomics Negative | Lipidomics |
| --- | --- | --- |
| [M+H]+ | [M-H]- | [M+H]+ |
| [M+NH4]+ | [M+Na-2H]- | [M+NH4]+ |
| [M+Na]+ | [M+Cl]- | [M-H]- |
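For reference, each of these singly charged adducts shifts the neutral monoisotopic mass M by a fixed amount. The sketch below uses commonly tabulated mass shifts (in Da) to compute the expected m/z; `adduct_mz()` is a hypothetical helper, and the shift values should be checked against your own reference table before use.

```r
# Commonly tabulated mass shifts (Da) for the singly charged adducts listed above
adduct_shift <- c(
  "[M+H]+"     =  1.007276,
  "[M+NH4]+"   = 18.033823,
  "[M+Na]+"    = 22.989218,
  "[M-H]-"     = -1.007276,
  "[M+Na-2H]-" = 20.974666,
  "[M+Cl]-"    = 34.969402
)

# Hypothetical helper: expected m/z of each adduct for a neutral monoisotopic mass
adduct_mz <- function(neutral_mass, adducts = names(adduct_shift)) {
  neutral_mass + adduct_shift[adducts]
}

# Example: a neutral monoisotopic mass of 180.06339 Da (glucose, C6H12O6)
adduct_mz(180.06339, c("[M+H]+", "[M-H]-"))
```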

Alignment

Here, you should rename the 'Result name:' to something more recognisable. For example, alignment_data_sampletype_ionisation.

Set the 'Reference file:' to your second QC sample. The first QC sample is typically different than the others, and this will affect your results.

The 'Retention time tolerance:' parameter is data-dependent; a good starting point is either 0.3 min or 1 min.

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| Retention time tolerance | 0.3 min | 0.3 min |
| MS1 tolerance | 0.002 Da | 0.002 Da |

There are also some other parameters we need to check in the 'Advanced' drop-down menu. Because we don't remove the features based on blank information, the following three tick boxes are 'greyed out', and not available to change. You do not need to change the default options for these.

Gap filling by compulsion may be required for your data, but typically with a higher retention time tolerance (as set above), gap filling does not improve the results, so it is suggested to leave it unchecked initially.

| Parameter | Metabolomics | Lipidomics |
| --- | --- | --- |
| Retention time factor | 0.5 | 0.5 |
| MS1 factor | 0.5 | 0.5 |
| Peak count filter | 0 % | 0 % |
| N% detected in at least one group | 0 % | 0 % |
| Remove features based on blank information | FALSE | FALSE |
| Sample max / blank average | 5 | 5 |
| Keep 'reference matched' metabolite features | FALSE | FALSE |
| Keep 'suggested (w/o MS2)' metabolite features | FALSE | FALSE |
| Keep removable features and assign the tag | FALSE | FALSE |
| Gap filling by compulsion | FALSE | FALSE |
| Together with Alignment | TRUE | TRUE |
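Although blank-based removal is switched off here, it may help to see what the 'Sample max / blank average' ratio would measure if it were enabled. The sketch below is our reading of that parameter name, not MS-DIAL's implementation: for each feature, the maximum height across samples is divided by the mean height across blanks, and features below a fold change of 5 would be flagged for removal. The height values are made up.

```r
# Made-up height matrix: rows are features, columns are blanks and samples
heights <- matrix(
  c(100, 100, 1000,  800,
    500, 500, 1000, 2000,
      0,   0,  300,  200),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("f1", "f2", "f3"),
                  c("blank_1", "blank_2", "sample_1", "sample_2"))
)

is_blank <- grepl("^blank", colnames(heights))

# Ratio of the maximum sample height to the mean blank height, per feature
ratio <- apply(heights[, !is_blank, drop = FALSE], 1, max) /
  rowMeans(heights[, is_blank, drop = FALSE])

# Features at or above the fold change of 5 would be kept
# (a feature absent from all blanks gives Inf and is therefore kept)
keep <- ratio >= 5
keep
```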

Run the pipeline

Once you have finished setting up the analysis parameters, click 'Finish' to begin the run.

When the run finishes, the data will appear on the screen. On the left-hand side of the screen, you will see an 'Alignment navigator' box. Double click the file that is inside to load all of your data at once.

Viewing and exporting data

The 'Show ion table' button in the middle of the screen is a good place to start to see how many metabolites/lipids (features) were found during the run. You can click the 'Metabolite name' column to group all features that were annotated.

Aligned spot information

If you want to view more information about a specific feature, you can single click on the row within the ion table list.

At the top of the screen, the default image shown is the 'Bar chart of aligned spot (OH)'. This will show you the average intensities of each of your 'Class IDs'.

To view the actual peak itself, you can click the adjacent 'EIC of aligned spot' tab to view and assess the quality of the peak. Good quality peaks, especially for smaller weight features, are tight and well-aligned (see the image below).

Next, if you want to see the individual peaks for each of your samples, you can right-click on the peak window and click the 'Table viewer for curating each chromatogram' option. This will show you the peak for that feature for each sample so you can further assess their quality. This will be necessary for manual curation of features later to confirm confidence in the annotation of significant findings.

Ideally, after downstream pre-processing, you would come back to MS-DIAL and manually curate each of the features, removing those with poor-quality spectra. In practice, sometimes there can be a couple of thousand annotated metabolites, and this may not be feasible. It should be noted in these cases that your ordination plots (e.g. PCoA) will be affected by these poor quality features.

Exporting for downstream processing and analysis

There are a few useful things we can routinely export:

  • The raw data height matrix (contains all of our intensity values and feature information).
  • Files for secondary MS/MS annotation with GNPS (see our GNPS processing guide).
  • Analysis parameters (to streamline future analysis set-up).

For consistency and easy import into R, best practice is to open each of your exported files in Excel, and save them as .csv files, then delete the original .txt files.
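If you would rather script this conversion than use Excel, something like the following could work. It is a sketch that assumes the exports are plain tab-delimited text; `txt_to_csv()` is a hypothetical helper, every field is read as text so values are not reformatted, and it has not been checked against every MS-DIAL export layout.

```r
# Hypothetical helper: convert a tab-delimited .txt export to .csv
txt_to_csv <- function(txt_path, delete_original = FALSE) {
  csv_path <- sub("\\.txt$", ".csv", txt_path)
  # Read every field as character, with no header handling, to preserve the layout
  tab <- read.delim(txt_path, header = FALSE, colClasses = "character")
  write.table(tab, csv_path, sep = ",", qmethod = "double",
              row.names = FALSE, col.names = FALSE)
  # Optionally remove the original .txt file once the .csv has been written
  if (delete_original) file.remove(txt_path)
  invisible(csv_path)
}

# Small demonstration on a temporary file
demo_txt <- file.path(tempdir(), "demo_export.txt")
writeLines(c("a\tb", "1\t2"), demo_txt)
demo_csv <- txt_to_csv(demo_txt)
```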

Height table export

To export the raw height matrix table, navigate to the 'Alignment result' option within the 'Export' drop-down menu.

Select the directory for export, leave all other options to their defaults, and click 'Export'.

GNPS export

Our GNPS processing guide goes through this in more detail, but the key points are to choose your desired export directory, and then:

  • Deselect the 'Raw data matrix (Height)' checkbox.
  • Select the 'GNPS export' checkbox.
  • Set the 'Target file' drop-down menu to your second QC sample.
  • Set the 'Export format:' option to mgf.
  • Click 'Export'.

Parameter export

To export the analysis parameters, navigate to the 'Parameter export (Tab-delimited text)' option within the 'Export' drop-down menu. Select your export directory, and click 'Save'.

Data pre-processing in R

To pre-process our raw height data within R, we will use tools from the pmp package. These steps are explained in depth within our pmp preprocessing guide.

A selection of custom scripts are also available to handle the pre-processing steps for you, with all of the customisation and fine-tuning normally available when running the steps individually.

Secondary annotation methods for metabolomics data

We can use additional tools to annotate more of our metabolite features than MS-DIAL can achieve alone.

GNPS secondary annotation

We can acquire secondary annotation of our MS/MS data using an online tool provided by the Global Natural Products Social (GNPS) Molecular Network.

See our guide for running feature-based molecular networking (FBMN) with GNPS, and subsequent incorporation with your MS-DIAL data in R here.

HMDB secondary annotation

Similarly, we can acquire secondary annotation of our MS1 data using the Human Metabolome Database (HMDB). Unlike secondary MS/MS annotation with GNPS, this step occurs following pre-processing within R.

See our guide for annotation of your MS-DIAL data in R using the HMDB database here.

Manual curation of peaks

Now that we have a SummarizedExperiment object, filtered for features with at least one annotation, we need to manually curate the spectra within MS-DIAL. While the pmp pre-processing pipeline does a good job at filtering your LCMS datasets for the best quality data, it is not able to discern the quality of the spectra directly. It is important to manually curate the data before continuing with downstream analysis, as the poor quality data will affect ordination and statistical tests.

The most efficient method for manual curation is to save the alignment ID, name, and ionisation mode information from your SummarizedExperiment object, for example using the function save_curation_table():

# Save the curation table
save_curation_table(metab_stool_glog, here::here('data', 'manual_curation', 'stool_curation.csv'))

Now that we have our stool_curation.csv file, we can import it into Google Sheets and add an additional column named quality_peak that is filled with tickboxes.

Now, when we return to MS-DIAL, we can check the peaks using the Alignment_ID values as our guide, and check the tickboxes of those that are accepted.

Once we have completed this task for both the positive and negative ionisation modes, we can save the Google Sheet as a .csv file, and import it into R. Thankfully, it is very easy to filter SummarizedExperiment objects, and we can do this using our quality_peak column.

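# Load the curated quality table (requires the readr and here packages)
metab_stool_quality <- readr::read_csv(here::here('data', 'manual_curation', 'stool_curated.csv'))

# Filter the features using the TRUE/FALSE quality_peak column
# (assumes the rows are in the same order as the features in metab_stool_glog)
metab_stool_glog <- metab_stool_glog[metab_stool_quality$quality_peak, ]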
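Logical subsetting like this relies on the curated table having exactly one row per feature, in the same order as the object. If rows may have been sorted or dropped in the spreadsheet, matching on the alignment ID is more robust. The sketch below illustrates the idea on a plain data frame standing in for the SummarizedExperiment object; the row names, the `Alignment_ID` column name, and all values are assumptions made for illustration.

```r
# Stand-in for the feature object: row names hold the (assumed) alignment IDs
feature_table <- data.frame(
  intensity = c(10.2, 11.5, 9.8),
  row.names = c("12_pos", "40_pos", "77_pos")
)

# Curated table in a different row order, with one unchecked (NA) tickbox
curation <- data.frame(
  Alignment_ID = c("77_pos", "12_pos", "40_pos"),
  quality_peak = c(TRUE, FALSE, NA),
  stringsAsFactors = FALSE
)

# Look up each feature's curation row by ID, treating missing values as rejected
idx <- match(rownames(feature_table), curation$Alignment_ID)
keep <- curation$quality_peak[idx] %in% TRUE

curated <- feature_table[keep, , drop = FALSE]
rownames(curated)
```

The same `keep` vector could in principle be used to subset the rows of a SummarizedExperiment object, provided its row names carry the same IDs.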

Downstream analysis

Once you have reached this point, you should have a SummarizedExperiment object (complete with sample and feature metadata) that contains only annotated features, with intensity values that have been normalised and transformed, and are ready for downstream analysis in R.

Rights

  • Copyright (c) 2021 Respiratory Immunology lab, Monash University, Melbourne, Australia.
  • MS-DIAL: link
  • License: This pipeline is provided under the MIT license (See LICENSE.txt for details)
  • Authors: M. Macowan, C. Pattaroni, A. Butler, and G. Iacono.
