MS-DIAL provides a pipeline for untargeted metabolomics. Here we will discuss the process for generating intensity height tables from raw LC-MS data, and subsequent export for downstream analysis and secondary annotation methods.
As of MS-DIAL version 4.70, you no longer need to convert your .RAW LC-MS files into .ABF format. If you are using an older version of MS-DIAL, however, you will need to download the ABF file converter.
- ABF File Converter (depending on MS-DIAL version).
- MS-DIAL software for Windows.
- MassBank MS/MS positive and negative database files.
- The LC-MS files all present in a single folder.
- The LC-MS standards file generated during data collection (should contain columns for: metabolite name, m/z, and retention time).
Begin by opening MS-DIAL and starting a new project from the 'File' menu at the top-left of the screen.
You will then need to select the 'Project file path:' that contains your LC-MS files using the 'Browse' button.
It is recommended that you change the default `.mtd` file name generated by MS-DIAL to something more easily recognisable in the future, such as `date_sampletype_ionisationmode.mtd`.
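If you process many projects, it can help to generate the project file name programmatically so that the pattern stays consistent. The following is a minimal sketch in R; the helper name `make_project_name()` and the `pos`/`neg` labels are assumptions made for illustration, and are not part of MS-DIAL.

```r
# Hypothetical helper: build a project file name of the form
# date_sampletype_ionisationmode.mtd
make_project_name <- function(sample_type, ion_mode, date = Sys.Date()) {
  # Restrict the ionisation mode to two assumed labels
  ion_mode <- match.arg(ion_mode, c("pos", "neg"))
  paste0(format(date, "%Y%m%d"), "_", sample_type, "_", ion_mode, ".mtd")
}

# Example with a fixed date so that the result is reproducible
make_project_name("stool", "pos", date = as.Date("2021-06-01"))
```

The resulting string can then be pasted into the project file name field when creating the project.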
For our data, most of the default options will be appropriate. Ensure you choose the correct 'Ion mode' at the bottom left, and that you select the correct 'Target omics' (either metabolomics or lipidomics).
The next screen will ask you to define your 'Analysis file paths'. Click the browse button, and select all of your LC-MS files.
Note that these must be located in the same folder as your project in order to be valid.
Once you have done this, you will need to change the 'Type' column to indicate whether each file is a sample, blank, QC, or standard.
It is also recommended, for quick analysis later on, to change the values in the 'Class ID' column.
For example, in the 'Class ID' column below, the sample type is indicated (if you have multiple groups, you can specify these here).
Of note, while MS2 samples are given a 'Type' of QC, you can specify that they are their own class in this column.
Once you have finished, click 'Next' to continue.
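Because the LC-MS files must sit in the same folder as the project, a quick check before starting MS-DIAL can save a failed import. The sketch below is one way to do this in R; the helper name `files_outside_project()` and the example file names are hypothetical, and the files are created in a temporary directory purely for demonstration.

```r
# Hypothetical helper: return any LC-MS files located outside the project folder
files_outside_project <- function(file_paths, project_dir) {
  project_dir <- normalizePath(project_dir, mustWork = FALSE)
  file_dirs <- normalizePath(dirname(file_paths), mustWork = FALSE)
  file_paths[file_dirs != project_dir]
}

# Demonstration with empty placeholder files in a temporary directory
project_dir <- file.path(tempdir(), "msdial_project")
dir.create(project_dir, showWarnings = FALSE)
inside <- file.path(project_dir, c("QC_01.raw", "sample_01.raw"))
outside <- file.path(tempdir(), "blank_01.raw")
invisible(file.create(c(inside, outside)))

# Any path returned here should be moved into the project folder first
misplaced <- files_outside_project(c(inside, outside), project_dir)
basename(misplaced)
```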
If you have a parameter configuration file, you can load it in via the 'Load'
button in the bottom-left of the window.
We begin the setting of analysis parameters by inputting the data collection parameters.
Mass Accuracy:
Parameter | Metabolomics | Lipidomics |
---|---|---|
MS1 tolerance | 0.002 Da | 0.002 Da |
MS2 tolerance | 0.002 Da | 0.002 Da |
Data collection parameters:
Parameter | Metabolomics | Lipidomics |
---|---|---|
Retention time begin | 0 min | 0 min |
Retention time end | 40 min | 40 min |
MS1 mass range begin | 50 Da | 200 Da |
MS1 mass range end | 1000 Da | 1300 Da |
MS/MS mass range begin | 50 Da | 200 Da |
MS/MS mass range end | 1000 Da | 1300 Da |
Isotope recognition:
Parameter | Metabolomics | Lipidomics |
---|---|---|
Maximum charged number | 2 | 2 |
Consider Cl and Br elements | FALSE | FALSE |
Multithreading (will depend on your machine):
Parameter | Metabolomics | Lipidomics |
---|---|---|
Number of threads | 8 | 8 |
Execute retention time corrections | FALSE | FALSE |
Next we select the minimum peak height threshold. Peaks below this threshold will not be retained.
A value of `100,000` is recommended for data acquired by Thermo Scientific Xcalibur machines.
However, this will vary by apparatus, and may require data-dependent tuning.
We will leave the 'Mass slice width' value at the default, along with all options in the drop-down 'Advanced' menu.
Parameter | Metabolomics | Lipidomics |
---|---|---|
Minimum peak height | 100000 | 100000 |
Mass slice width | 0.05 Da | 0.05 Da |
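To make the effect of the minimum peak height concrete, the sketch below applies the same threshold to a small table of peaks in R. It illustrates the idea only and is not how MS-DIAL implements peak picking; the peak table, including every m/z, retention time, and height value, is made up for the example.

```r
# Threshold from the table above
min_peak_height <- 100000

# Made-up peak table for illustration only
peaks <- data.frame(
  mz = c(181.0707, 195.0877, 205.0972),
  rt_min = c(1.2, 5.4, 7.9),
  height = c(2.5e6, 8.0e4, 1.0e5)
)

# Peaks below the threshold are dropped; peaks at or above it are retained
retained <- peaks[peaks$height >= min_peak_height, ]
retained
```

Raising the threshold reduces the number of noisy, low-intensity features at the cost of potentially losing genuine low-abundance metabolites, which is why the value may need tuning for your instrument.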
The default values are suitable.
Parameter | Metabolomics | Lipidomics |
---|---|---|
Sigma window value | 0.5 | 0.5 |
MS/MS abundance cut off | 0 | 0 |
Exclude after precursor ion | TRUE | TRUE |
Keep the isotopic ions until | 0.5 Da | 0.5 Da |
Keep the isotopic ions w/o MS2Dec | FALSE | FALSE |
In this tab, you will need the appropriate `.msp` file for identification of metabolites: this will be either the positive or negative MassBank file you downloaded, depending on the ionisation mode you are currently running.
Leave this blank for lipidomics.
We will also set a number of parameters:
Parameter | Metabolomics | Lipidomics |
---|---|---|
Retention time tolerance | 0.1 min | |
Accurate mass tolerance (MS1) | 0.002 Da | |
Accurate mass tolerance (MS2) | 0.002 Da | |
Identification score cut off | 80 % | |
Use retention time for scoring | FALSE | |
Use retention time for filtering | FALSE | |
Under the 'Advanced' drop-down menu, you can select a text file containing the standards run during data acquisition.
This file should contain three columns: the metabolite name, m/z, and retention time (in that order).
These columns should be named `Metabolite`, `MZ`, and `RT`.
We will also adjust the settings here:
Parameter | Metabolomics | Lipidomics |
---|---|---|
Retention time tolerance | 0.1 min | |
Accurate mass tolerance | 0.002 Da | |
Identification score cut off | 85 % | |
Relative abundance cut off | 0 % | |
Only report the top hit | TRUE | |
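If your standards are recorded in a spreadsheet, the text file with the three columns described above (`Metabolite`, `MZ`, `RT`) can be written from R. This is a minimal sketch that assumes a tab-delimited layout is what your MS-DIAL version expects; the m/z values are the [M+H]+ ions of the two compounds, while the retention times and the output file name are placeholders that must be replaced with your own values.

```r
# Example standards table; the retention times are placeholders
standards <- data.frame(
  Metabolite = c("Caffeine", "Tryptophan"),
  MZ = c(195.0877, 205.0972),  # [M+H]+ m/z values
  RT = c(5.4, 7.9)             # placeholder retention times (min)
)

# Write a tab-delimited text file (assumed format) to a temporary location
standards_path <- file.path(tempdir(), "lcms_standards.txt")
write.table(standards, standards_path, sep = "\t", quote = FALSE, row.names = FALSE)

# Inspect the header line to confirm the column names and their order
header_line <- readLines(standards_path, n = 1)
header_line
```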
Here we will select the appropriate adduct ion settings for our runs.
Metabolomics Positive | Metabolomics Negative | Lipidomics |
---|---|---|
[M+H]+ | [M-H]- | [M+H]+ |
[M+NH4]+ | [M+Na-2H]- | [M+NH4]+ |
[M+Na]+ | [M+Cl]- | [M-H]- |
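When checking annotations by hand, it is useful to know the m/z you would expect for each adduct in the table above. The sketch below computes these for singly charged ions from a neutral monoisotopic mass. The mass shifts are standard monoisotopic values rounded to six decimal places, and the helper name `adduct_mz()` is hypothetical.

```r
# Mass shifts (Da) added to the neutral monoisotopic mass for each adduct
adduct_shift <- c(
  "[M+H]+"     = 1.007276,
  "[M+NH4]+"   = 18.033823,
  "[M+Na]+"    = 22.989218,
  "[M-H]-"     = -1.007276,
  "[M+Na-2H]-" = 20.974666,
  "[M+Cl]-"    = 34.969402
)

# Hypothetical helper: expected m/z for singly charged adducts
adduct_mz <- function(neutral_mass, adducts = names(adduct_shift)) {
  stopifnot(all(adducts %in% names(adduct_shift)))
  neutral_mass + adduct_shift[adducts]
}

# Example: glucose, monoisotopic mass 180.06339 Da
adduct_mz(180.06339, c("[M+H]+", "[M-H]-"))
```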
Here, you should rename the 'Result name:' to something more recognisable, for example `alignment_data_sampletype_ionisation`.
Set the 'Reference file:' to your second QC sample. The first QC sample is typically different from the others, and this will affect your results.
The 'Retention time tolerance:' parameter will be data-dependent; a good starting point is to try either `0.3 min` or `1 min`.
Parameter | Metabolomics | Lipidomics |
---|---|---|
Retention time tolerance | 0.3 min | 0.3 min |
MS1 tolerance | 0.002 Da | 0.002 Da |
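As a rough intuition for what these two tolerances mean, the sketch below treats two peaks from different samples as the same feature only if both their m/z and retention time differences fall within the tolerances. This hard-window check is a deliberate simplification and not MS-DIAL's actual alignment scoring; the function name `same_feature()` and the example peak values are hypothetical.

```r
# Simplified check: do two peaks fall within both alignment tolerances?
same_feature <- function(mz_a, rt_a, mz_b, rt_b, mz_tol = 0.002, rt_tol = 0.3) {
  abs(mz_a - mz_b) <= mz_tol & abs(rt_a - rt_b) <= rt_tol
}

# m/z differs by 0.0008 Da and retention time by 0.15 min: within both windows
close_pair <- same_feature(195.0877, 5.40, 195.0885, 5.55)

# Same m/z difference, but retention time differs by 0.70 min: outside the window
drifted_pair <- same_feature(195.0877, 5.40, 195.0885, 6.10)

c(close_pair = close_pair, drifted_pair = drifted_pair)
```

Widening the retention time tolerance to `1 min` would merge the second pair as well, which is useful when retention times drift between runs but increases the risk of merging distinct features.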
There are also some other parameters we need to check in the 'Advanced'
drop-down menu.
Because we don't remove the features based on blank information, the following three tick boxes are 'greyed out', and not available to
change. You do not need to change the default options for these.
Gap filling by compulsion may be required for your data, but typically with a higher retention time tolerance (as set above), gap filling does not improve the results, so it is suggested to leave it unchecked initially.
Parameter | Metabolomics | Lipidomics |
---|---|---|
Retention time factor | 0.5 | 0.5 |
MS1 factor | 0.5 | 0.5 |
Peak count filter | 0 % | 0 % |
N% detected in at least one group | 0 % | 0 % |
Remove features based on blank information | FALSE | FALSE |
Sample max / blank average | 5 | 5 |
Keep 'reference matched' metabolite features | FALSE | FALSE |
Keep 'suggested (w/o MS2)' metabolite features | FALSE | FALSE |
Keep removable features and assign the tag | FALSE | FALSE |
Gap filling by compulsion | FALSE | FALSE |
Together with Alignment | TRUE | TRUE |
Once you have finished setting up the analysis parameters, click 'Finish'
to begin the run.
When the run finishes, the data will appear on the screen. On the left-hand side of the screen, you will see an 'Alignment navigator'
box.
Double click the file that is inside to load all of your data at once.
The 'Show ion table'
button in the middle of the screen is a good place to start to see how many metabolites/lipids (features) were found during the run.
You can click the 'Metabolite name'
column to group all features that were annotated.
If you want to view more information about a specific feature, you can single click on the row within the ion table list.
At the top of the screen, the default image shown is the 'Bar chart of aligned spot (OH)'. This will show you the average intensities of each of your 'Class IDs'.
To view the actual peak itself, you can click the adjacent 'EIC of aligned spot' tab to view and assess the quality of the peak.
Good quality peaks, especially for smaller weight features, are tight and well-aligned (see the image below).
Next, if you want to see the individual peaks for each of your samples, you can right-click on the peak window and click the
'Table viewer for curating each chromatogram'
option. This will show you the peak for that feature for each sample so you can further
assess their quality. This will be necessary for manual curation of features later to confirm confidence in the annotation of significant findings.
Ideally, after downstream pre-processing, you would come back to MS-DIAL and manually curate each of the features, removing those with poor-quality spectra. In practice, sometimes there can be a couple of thousand annotated metabolites, and this may not be feasible. It should be noted in these cases that your ordination plots (e.g. PCoA) will be affected by these poor quality features.
There are a few useful things we can routinely export:
- The raw data height matrix (contains all of our intensity values and feature information).
- Files for secondary MS/MS annotation with GNPS (see our GNPS processing guide).
- Analysis parameters (to streamline future analysis set-up).
For consistency and easy import into R, best practice is to open each of your exported files in Excel, save them as `.csv` files, and then delete the original `.txt` files.
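If you would rather not route the files through Excel, the same conversion can be sketched in R. This is an alternative under stated assumptions rather than the documented workflow: the helper name `convert_export_to_csv()` is hypothetical, the demonstration file is a mock with made-up columns, and the `skip` argument is provided because real exports may contain additional header rows above the column names (check your own file before choosing a value).

```r
# Hypothetical helper: convert a tab-delimited export to .csv alongside the original
convert_export_to_csv <- function(txt_path, skip = 0) {
  tab <- read.delim(txt_path, skip = skip, check.names = FALSE,
                    stringsAsFactors = FALSE)
  csv_path <- sub("\\.txt$", ".csv", txt_path)
  write.csv(tab, csv_path, row.names = FALSE)
  csv_path
}

# Demonstration with a mock export written to a temporary directory
txt_path <- file.path(tempdir(), "Height_demo.txt")
writeLines(c("Alignment ID\tAverage Mz\tQC_01",
             "0\t181.0707\t250000"), txt_path)

csv_path <- convert_export_to_csv(txt_path)
converted <- read.csv(csv_path, check.names = FALSE)
converted
```

The helper leaves the original `.txt` file in place, so you can confirm the `.csv` looks correct before deleting anything.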
To export the raw height matrix table, navigate to the 'Alignment result' option within the 'Export' drop-down menu.
Select the directory for export, leave all other options at their defaults, and click 'Export'.
Our GNPS processing guide goes through this in more detail, but the key points are to choose your desired export directory, and then:
- Deselect the 'Raw data matrix (Height)' checkbox.
- Select the 'GNPS export' checkbox.
- Set the 'Target file' drop-down menu to your second QC sample.
- Set the 'Export format:' option to `mgf`.
- Click 'Export'.
To export the analysis parameters, navigate to the 'Parameter export (Tab-delimited text)' option within the 'Export' drop-down menu.
Select your export directory, and click 'Save'.
To pre-process our raw height data within R, we will use tools from the `pmp` package.
These steps are explained in depth within our pmp preprocessing guide.
A selection of custom scripts are also available to handle the pre-processing steps for you, with all of the customisation and fine-tuning normally available when running the steps individually.
We can use additional tools to annotate more of our metabolite features than MS-DIAL can achieve alone.
We can acquire secondary annotation of our MS/MS data using an online tool provided by the Global Natural Products Social (GNPS) Molecular Network.
See our guide for running feature-based molecular networking (FBMN) with GNPS, and subsequent incorporation with your MS-DIAL data in R here.
Similarly, we can acquire secondary annotation of our MS1 data using the Human Metabolome Database (HMDB). Unlike secondary MS/MS annotation with GNPS, this step occurs following pre-processing within R.
See our guide for annotation of your MS-DIAL data in R using the HMDB database here.
Now that we have a `SummarizedExperiment` object, filtered for features with at least one annotation, we need to manually curate the spectra within MS-DIAL. While the `pmp` pre-processing pipeline does a good job of filtering your LC-MS datasets for the best quality data, it is not able to discern the quality of the spectra directly. It is important to manually curate the data before continuing with downstream analysis, as poor-quality data will affect ordination and statistical tests.
The most efficient method for manual curation is to save the alignment ID, name, and ionisation mode information from your `SummarizedExperiment` object, for example using the function `save_curation_table()`:
```r
# Save the curation table
save_curation_table(metab_stool_glog, here::here('data', 'manual_curation', 'stool_curation.csv'))
```
Now that we have our `stool_curation.csv` file, we can import it into Google Sheets and add an additional column named `quality_peak` that is filled with tickboxes.
When we return to MS-DIAL, we can check the peaks using the `Alignment_ID` values as our guide, and tick the boxes of those that are accepted.
Once we have completed this task for both the positive and negative ionisation modes, we can save the Google Sheet as a `.csv` file, and import it into R. Thankfully, it is very easy to filter `SummarizedExperiment` objects, and we can do this using our `quality_peak` column.
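```r
library(readr)

# Load quality data
metab_stool_quality <- read_csv(here::here('data', 'manual_curation', 'stool_curated.csv'))
# The curated table must have one row per feature, in the same order as the object
stopifnot(nrow(metab_stool_quality) == nrow(metab_stool_glog))
# Filter the features using the TRUE/FALSE quality_peak column
metab_stool_glog <- metab_stool_glog[metab_stool_quality$quality_peak,]
```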
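The filter above relies on the rows of the curated sheet being in the same order as the features in the object. If the sheet was sorted or otherwise reordered while curating, matching on the alignment ID is a safer approach. The sketch below shows the idea with plain vectors standing in for the real objects; the ID values are made up, and it assumes the feature identifiers in your object correspond to the `Alignment_ID` column of the sheet.

```r
# Stand-in for the feature identifiers of the SummarizedExperiment (made-up IDs)
feature_ids <- c("101_pos", "205_pos", "317_neg")

# Stand-in for the curated sheet, deliberately in a different row order
curation <- data.frame(
  Alignment_ID = c("317_neg", "101_pos", "205_pos"),
  quality_peak = c(TRUE, FALSE, TRUE)
)

# Look up each feature's row in the curated sheet by ID rather than by position
idx <- match(feature_ids, curation$Alignment_ID)
stopifnot(!anyNA(idx))  # every feature must have been curated

# Logical vector aligned to the feature order of the object
keep <- curation$quality_peak[idx]
keep
```

With the real data, `keep` would take the place of `metab_stool_quality$quality_peak` when subsetting the rows of the object.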
Once you have reached this point, you should have a `SummarizedExperiment` object (complete with sample and feature metadata) that contains only annotated features, with intensity values that have been normalised and transformed, and are ready for downstream analysis in R.
- Copyright (c) 2021 Respiratory Immunology lab, Monash University, Melbourne, Australia.
- MS-DIAL: link
- License: This pipeline is provided under the MIT license (See LICENSE.txt for details)
- Authors: M. Macowan, C. Pattaroni, A. Butler, and G. Iacono.