Processing Metabolome and Lipidome Data with MS-DIAL

MS-DIAL provides a pipeline for untargeted metabolomics. Here we describe how to generate peak height (intensity) tables from raw LC-MS data, and how to export them for downstream analysis and secondary annotation.

Requirements

As of MS-DIAL version 4.70, you no longer need to convert your .RAW LC-MS files into .ABF format. If you are using an older version of MS-DIAL, you will need to download the ABF file converter.

ABF File Converter (depending on MS-DIAL version).
MS-DIAL software for Windows.
MassBank MS/MS positive and negative database files.
The LC-MS files all present in a single folder.
The LC-MS standards file generated during data collection (should contain columns for: metabolite name, m/z, and retention time).

Methods

Start up a project

Begin by opening MS-DIAL and starting a new project from the 'File' menu at the top-left of the screen. You will then need to select the 'Project file path:' that contains your LC-MS files using the 'Browse' button.

It is recommended that you change the default .mtd file name generated by MS-DIAL to something more easily recognisable in the future, such as date_sampletype_ionisationmode.mtd.
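If you prefer to generate these file names programmatically, the naming scheme can be sketched in R as follows. The helper name make_mtd_name(), the YYYYMMDD date format, and the example values are assumptions made for illustration only.

```r
# Hypothetical helper: build a project file name of the form
# date_sampletype_ionisationmode.mtd (the date format is an assumption)
make_mtd_name <- function(sample_type, ion_mode, date = Sys.Date()) {
  paste0(format(as.Date(date), "%Y%m%d"), "_", sample_type, "_", ion_mode, ".mtd")
}

# Example with made-up values
example_name <- make_mtd_name("stool", "pos", date = "2021-06-01")
print(example_name)
```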

For our data, most of the default options will be appropriate. Ensure you choose the correct 'Ion mode' at the bottom left, and that you select the correct 'Target omics' (either metabolomics or lipidomics).

New project window

The next screen will ask you to define your 'Analysis file paths'. Click the browse button, and select all of your LC-MS files. Note that these must be located in the same folder as your project in order to be valid.

Once you have done this, you will need to change the 'Type' column to indicate whether that sample is a sample, blank, QC, or standard. It is also recommended, for quick analysis later on, to change the values in the 'Class ID' column.

For example, the 'Class ID' column can record the sample type (or the group, if you have multiple groups). Note that while MS2 samples are given a 'Type' of QC, you can assign them their own class in this column.

Once you have finished, click 'Next' to continue.

Analysis parameter setting - Data collection

If you have a parameter configuration file, you can load it in via the 'Load' button in the bottom-left of the window.

We begin the setting of analysis parameters by inputting the data collection parameters.

Mass Accuracy:

	Metabolomics	Lipidomics
MS1 tolerance	0.002 Da	0.002 Da
MS2 tolerance	0.002 Da	0.002 Da

Data collection parameters:

	Metabolomics	Lipidomics
Retention time begin	0 min	0 min
Retention time end	40 min	40 min
MS1 mass range begin	50 Da	200 Da
MS1 mass range end	1000 Da	1300 Da
MS/MS mass range begin	50 Da	200 Da
MS/MS mass range end	1000 Da	1300 Da

Isotope recognition:

	Metabolomics	Lipidomics
Maximum charged number	2	2
Consider Cl and Br elements	FALSE	FALSE

Multithreading (will depend on your machine):

	Metabolomics	Lipidomics
Number of threads	8	8
Execute retention time corrections	FALSE	FALSE

Peak detection

Next we select the minimum peak height threshold. Peaks below this threshold will not be retained. A value of 100,000 is recommended for data acquired on Thermo Scientific instruments with Xcalibur. However, this will vary by apparatus, and may require data-dependent tuning.

We will leave the 'Mass slice width' value at its default, along with all options in the drop-down 'Advanced' menu.

	Metabolomics	Lipidomics
Minimum peak height	100000	100000
Mass slice width	0.05 Da	0.05 Da
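As a rough illustration of what the minimum peak height threshold does (this is not MS-DIAL's own implementation), the following R sketch keeps only the peaks at or above the threshold. The peak values are made up.

```r
# Threshold from the table above
min_peak_height <- 100000

# Made-up example peaks (m/z, retention time in minutes, peak height)
peaks <- data.frame(
  mz     = c(181.0707, 203.0526, 132.1019, 256.2635),
  rt     = c(1.25, 1.27, 3.80, 12.40),
  height = c(2.5e6, 8.0e4, 1.0e5, 4.2e5)
)

# Peaks below the threshold are not retained
kept <- peaks[peaks$height >= min_peak_height, ]
print(kept)
```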

MS2Dec

The default values are suitable.

	Metabolomics	Lipidomics
Sigma window value	0.5	0.5
MS/MS abundance cut off	0	0
Exclude after precursor ion	TRUE	TRUE
Keep the isotopic ions until	0.5 Da	0.5 Da
Keep the isotopic ions w/o MS2Dec	FALSE	FALSE

Identification

In this tab, you will need the appropriate .msp file for identification of metabolites: this will be either the positive or negative MassBank file you downloaded depending on the ionisation mode you are running currently. Leave this blank for lipidomics.

We will also set a number of parameters:

	Metabolomics	Lipidomics
Retention time tolerance	0.1 min
Accurate mass tolerance (MS1)	0.002 Da
Accurate mass tolerance (MS2)	0.002 Da
Identification score cut off	80 %
Use retention time for scoring	FALSE
Use retention time for filtering	FALSE

Under the 'Advanced' drop-down menu, you can select a text file containing the standards run during data acquisition. This file should contain three columns: the metabolite name, m/z, and retention time (in that order). These columns should be named Metabolite, MZ, and RT.

We will also adjust the settings here:

	Metabolomics	Lipidomics
Retention time tolerance	0.1 min
Accurate mass tolerance	0.002 Da
Identification score cut off	85 %
Relative abundance cut off	0 %
Only report the top hit	TRUE
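The standards text file described above can be written from R. The sketch below assumes a tab-delimited layout (the delimiter is our assumption), and the metabolite names and values are placeholders rather than real standards.

```r
# Placeholder standards: replace with the standards run during data acquisition
standards <- data.frame(
  Metabolite = c("Example standard A", "Example standard B"),
  MZ         = c(180.0634, 132.1019),
  RT         = c(1.25, 3.80),
  stringsAsFactors = FALSE
)

# Write a tab-delimited text file with the columns Metabolite, MZ, RT (in that order)
standards_file <- file.path(tempdir(), "standards.txt")
write.table(standards, standards_file, sep = "\t", quote = FALSE, row.names = FALSE)

# Inspect the header line
header_line <- readLines(standards_file, n = 1)
print(header_line)
```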

Adduct

Here we will select the appropriate adduct ion settings for our runs.

Metabolomics Positive	Metabolomics Negative	Lipidomics
[M+H]+	[M-H]-	[M+H]+
[M+NH4]+	[M+Na-2H]-	[M+NH4]+
[M+Na]+	[M+Cl]-	[M-H]-
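To see what these adduct settings mean in terms of m/z, the sketch below computes the expected m/z of each singly charged adduct from a neutral monoisotopic mass. The mass shifts are commonly used monoisotopic values rounded to six decimal places, and the helper name adduct_mz() is hypothetical; check the values against your own reference before relying on them.

```r
# Monoisotopic mass shifts (Da) for the singly charged adducts listed above
adduct_shift <- c(
  "[M+H]+"     =  1.007276,
  "[M+NH4]+"   = 18.033823,
  "[M+Na]+"    = 22.989218,
  "[M-H]-"     = -1.007276,
  "[M+Na-2H]-" = 20.974666,
  "[M+Cl]-"    = 34.969402
)

# Hypothetical helper: expected m/z for a neutral mass and a set of adducts
adduct_mz <- function(neutral_mass, adducts = names(adduct_shift)) {
  neutral_mass + adduct_shift[adducts]
}

# Example: glucose (C6H12O6), monoisotopic mass 180.06339 Da
glucose_mz <- adduct_mz(180.06339, c("[M+H]+", "[M-H]-"))
print(glucose_mz)
```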

Alignment

Here, you should rename the 'Result name:' to something more recognisable. For example, alignment_data_sampletype_ionisation.

Set the 'Reference file:' to your second QC sample. The first QC sample typically differs from the others, and using it as the reference will affect your results.

The 'Retention time tolerance:' parameter will be data-dependent, but a good starting point is to try either 0.3 min or 1 min.

	Metabolomics	Lipidomics
Retention time tolerance	0.3 min	0.3 min
MS1 tolerance	0.002 Da	0.002 Da
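To build some intuition for these two tolerances, the sketch below checks whether candidate peaks fall within both the retention time and MS1 tolerance of a reference peak. This is a simplified illustration with made-up values, not MS-DIAL's actual alignment algorithm, and the function name same_feature() is hypothetical.

```r
# Tolerances from the table above
rt_tol <- 0.3    # minutes
mz_tol <- 0.002  # Da

# Hypothetical check: are two peaks close enough in both RT and m/z?
same_feature <- function(rt_a, mz_a, rt_b, mz_b) {
  abs(rt_a - rt_b) <= rt_tol & abs(mz_a - mz_b) <= mz_tol
}

# Made-up reference peak and candidate peaks from another sample
reference  <- list(rt = 5.10, mz = 180.0634)
candidates <- data.frame(
  rt = c(5.32, 5.55, 5.12),
  mz = c(180.0649, 180.0636, 180.0701)
)

candidates$matches <- same_feature(reference$rt, reference$mz,
                                   candidates$rt, candidates$mz)
print(candidates)
```

Only the first candidate is within both tolerances; the second drifts too far in retention time and the third differs too much in m/z. Widening the retention time tolerance to 1 min would admit the second candidate as well.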

There are also some other parameters we need to check in the 'Advanced' drop-down menu. Because we don't remove the features based on blank information, the following three tick boxes are 'greyed out', and not available to change. You do not need to change the default options for these.

Gap filling by compulsion may be required for your data, but typically with a higher retention time tolerance (as set above), gap filling does not improve the results, so it is suggested to leave it unchecked initially.

	Metabolomics	Lipidomics
Retention time factor	0.5	0.5
MS1 factor	0.5	0.5
Peak count filter	0 %	0 %
N% detected in at least one group	0 %	0 %
Remove features based on blank information	FALSE	FALSE
Sample max / blank average	5	5
Keep 'reference matched' metabolite features	FALSE	FALSE
Keep 'suggested (w/o MS2)' metabolite features	FALSE	FALSE
Keep removable features and assign the tag	FALSE	FALSE
Gap filling by compulsion	FALSE	FALSE
Together with Alignment	TRUE	TRUE

Run the pipeline

Once you have finished setting up the analysis parameters, click 'Finish' to begin the run.

When the run finishes, the data will appear on the screen. On the left-hand side of the screen, you will see an 'Alignment navigator' box. Double click the file that is inside to load all of your data at once.

Viewing and exporting data

The 'Show ion table' button in the middle of the screen is a good place to start to see how many metabolites/lipids (features) were found during the run. You can click the 'Metabolite name' column to group all features that were annotated.

Aligned spot information

If you want to view more information about a specific feature, you can single click on the row within the ion table list.

At the top of the screen, the default image shown is the 'Bar chart of aligned spot (OH)'. This will show you the average intensities of each of your 'Class IDs'.

To view the actual peak itself, you can click the adjacent 'EIC of aligned spot' tab to view and assess the quality of the peak. Good quality peaks, especially for smaller weight features, are tight and well-aligned (see the image below).

Next, if you want to see the individual peaks for each of your samples, you can right-click on the peak window and click the 'Table viewer for curating each chromatogram' option. This will show you the peak for that feature for each sample so you can further assess their quality. This will be necessary for manual curation of features later to confirm confidence in the annotation of significant findings.

Ideally, after downstream pre-processing, you would come back to MS-DIAL and manually curate each of the features, removing those with poor-quality spectra. In practice, there can sometimes be a couple of thousand annotated metabolites, and this may not be feasible. Note that in these cases your ordination plots (e.g. PCoA) will be affected by the poor-quality features.

Exporting for downstream processing and analysis

There are a few useful things we can routinely export:

The raw data height matrix (contains all of our intensity values and feature information).
Files for secondary MS/MS annotation with GNPS (see our GNPS processing guide).
Analysis parameters (to streamline future analysis set-up).

For consistency and easy import into R, best practice is to open each of your exported files in Excel, and save them as .csv files, then delete the original .txt files.
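If you would rather skip the Excel step, the same conversion can be sketched directly in R. This is an alternative under stated assumptions: the helper name convert_txt_to_csv() is hypothetical, the export is assumed to be tab-delimited with the same number of columns on every row, and the demo file below is a made-up stand-in for a real export.

```r
# Hypothetical helper: convert a tab-delimited .txt export to .csv
convert_txt_to_csv <- function(txt_path, remove_original = FALSE) {
  # Read every cell as text so that header rows and values are preserved as-is
  tab <- read.delim(txt_path, header = FALSE, colClasses = "character",
                    quote = "", comment.char = "", fill = TRUE)
  csv_path <- sub("\\.txt$", ".csv", txt_path)
  write.table(tab, csv_path, sep = ",", row.names = FALSE,
              col.names = FALSE, qmethod = "double")
  # Optionally delete the original .txt file
  if (remove_original) file.remove(txt_path)
  csv_path
}

# Demo with a tiny made-up file
demo_txt <- file.path(tempdir(), "demo_export.txt")
writeLines(c("Alignment ID\tMetabolite name\tsample_1",
             "0\tUnknown\t152340"), demo_txt)
demo_csv <- convert_txt_to_csv(demo_txt)
print(readLines(demo_csv))
```

Reading all cells as text avoids any silent type conversion during the round trip; column types can then be set when the .csv file is imported for analysis.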

Height table export

To export the raw height matrix table, navigate to the 'Alignment result' option within the 'Export' drop-down menu.

Select the directory for export, leave all other options to their defaults, and click 'Export'.

GNPS export

Our GNPS processing guide goes through this in more detail, but the key points are to choose your desired export directory, and then:

Deselect the 'Raw data matrix (Height)' checkbox.
Select the 'GNPS export' checkbox.
Set the 'Target file' drop-down menu to your second QC sample.
Set the 'Export format:' option to mgf.
Click 'Export'.

Parameter export

To export the analysis parameters, navigate to the 'Parameter export (Tab-delimited text)' option within the 'Export' drop-down menu. Select your export directory, and click 'Save'.

Data pre-processing in R

To pre-process our raw height data within R, we will use tools from the pmp package. These steps are explained in depth within our pmp preprocessing guide.

A selection of custom scripts are also available to handle the pre-processing steps for you, with all of the customisation and fine-tuning normally available when running the steps individually.

Secondary annotation methods for metabolomics data

We can use additional tools to annotate more of our metabolite features than MS-DIAL can achieve alone.

GNPS secondary annotation

We can acquire secondary annotation of our MS/MS data using an online tool provided by the Global Natural Products Social (GNPS) Molecular Network.

See our guide for running feature-based molecular networking (FBMN) with GNPS, and subsequent incorporation with your MS-DIAL data in R here.

HMDB secondary annotation

Similarly, we can acquire secondary annotation of our MS1 data using the Human Metabolome Database (HMDB). Unlike secondary MS/MS annotation with GNPS, this step occurs following pre-processing within R.

See our guide for annotation of your MS-DIAL data in R using the HMDB database here.

Manual curation of peaks

Now that we have a SummarizedExperiment object, filtered for features with at least one annotation, we need to manually curate the spectra within MS-DIAL. While the pmp pre-processing pipeline does a good job of filtering your LC-MS datasets for the best quality data, it is not able to assess the quality of the spectra directly. It is important to manually curate the data before continuing with downstream analysis, as poor-quality data will affect ordination and statistical tests.

The most efficient method for manual curation is to save the alignment IDs, feature names, and ionisation mode information from your SummarizedExperiment object to a file, for example using the function save_curation_table():

# Save the curation table
# (save_curation_table() is defined in save_curation_table.R in this repository)
save_curation_table(metab_stool_glog, here::here('data', 'manual_curation', 'stool_curation.csv'))

Now that we have our stool_curation.csv file, we can import it into Google Sheets and add an additional column named quality_peak that is filled with tickboxes.

Now, when we return to MS-DIAL, we can check the peaks using the Alignment_ID values as our guide, and check the tickboxes of those that are accepted.

Once we have completed this task for both the positive and negative ionisation modes, we can save the Google sheet as a .csv file, and import it into R. Thankfully, it is very easy to filter SummarizedExperiment objects, and we can do this using our quality_peak column.

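# Load the curated quality data (requires the readr package)
metab_stool_quality <- readr::read_csv(here::here('data', 'manual_curation', 'stool_curated.csv'))

# Filter the features using the TRUE/FALSE quality_peak column
# (assumes the rows are in the same order as the features in metab_stool_glog)
metab_stool_glog <- metab_stool_glog[metab_stool_quality$quality_peak,]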
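If the curated sheet may have been re-sorted, or some tickboxes left empty, a more defensive version of this filter matches on the alignment ID rather than on row order. The sketch below uses a plain matrix as a stand-in for the SummarizedExperiment object (both are subset with [rows, ]), and the column name alignment_id and all values are assumptions made for illustration.

```r
# Stand-in for the assay data: features in rows (named by alignment ID),
# samples in columns
intensities <- matrix(
  seq_len(8), nrow = 4,
  dimnames = list(c("10", "25", "31", "47"), c("sample_1", "sample_2"))
)

# Made-up curation table in a different row order, with one unchecked box as NA
curation <- data.frame(
  alignment_id = c("31", "10", "47", "25"),
  quality_peak = c(TRUE, FALSE, NA, TRUE),
  stringsAsFactors = FALSE
)

# Look up each feature's quality flag by ID, treating missing values as rejected
idx  <- match(rownames(intensities), curation$alignment_id)
keep <- curation$quality_peak[idx]
keep[is.na(keep)] <- FALSE

curated <- intensities[keep, , drop = FALSE]
print(rownames(curated))
```

Matching by ID means the result does not depend on how the sheet was sorted during curation, and features missing from the sheet are dropped rather than causing an indexing error.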

Downstream analysis

Once you have reached this point, you should have a SummarizedExperiment object (complete with sample and feature metadata) that contains only annotated features, with intensity values that have been normalised and transformed, and are ready for downstream analysis in R.

Rights

Copyright (c) 2021 Respiratory Immunology lab, Monash University, Melbourne, Australia.
MS-DIAL: link
License: This pipeline is provided under the MIT license (See LICENSE.txt for details)
Authors: M. Macowan, C. Pattaroni, A. Butler, and G. Iacono.
