PhD project offered by the IMPRS-gBGC in July 2021


Understanding dissolved organic matter (DOM) by destroying it: Combining metabolomics and bioinformatics to overcome the chimeric nature of DOM

Sebastian Böcker, Gerd Gleixner, Kai Dührkop, Carsten Simon, Daniel Petras

Project description

This project focuses on the information content of complex mixtures, in particular dissolved organic matter (DOM), and how it can be properly revealed by modern mass spectrometry and bioinformatic tools (Fig. 1). DOM is one of the most ubiquitous and complex mixtures on Earth, being composed of thousands of individual molecules (Hawkes et al., 2020; Roth et al., 2019; Smith et al., 2018). This makes it a perfect study object for benchmarking new bioinformatics tools that deconvolute such mixtures. Modern mass spectrometry allows us to resolve the finest details in complex samples, but reaches its limits in identifying the individual components (Hertkorn et al., 2008). This is due to the chimeric nature of DOM, which persists even after chromatographic separation and hampers the acquisition of clean tandem (MS2) mass spectra for identification (Petras et al., 2017), for example by fragmentation trees (Dührkop et al., 2015). Hence, the structural causes of chemical differences between DOM types remain elusive, which limits our understanding of the information that DOM inherits from metabolic processes at different ecosystem scales (microbial community, soil profile, landscape, watershed). New approaches to deconvolute or aggregate chimeric MS2 data are thus instrumental in driving progress towards full metabolome annotation in complex DOM samples (Dührkop et al., 2020; Rogers et al., 2019).

Research aim & questions

The overall research aim of the PhD project is to analyze a representative set of DOM samples in full detail by ultrahigh resolution (Orbitrap) MS2 analysis and to apply and advance existing bioinformatics approaches for the optimal analysis of the MS2 datasets (Aron et al., 2020; Dührkop et al., 2015, 2020; Rogers et al., 2019). Depending on the successful candidate’s qualifications and development, the project may head in the metabolomics or bioinformatics direction.

The metabolomics project centers around the following questions:
  • Which characteristic MS2 features (mass differences, fragment ions) are common to different DOM samples, and which ones discern them?
  • Which structural classes do these features represent?
  • Which bioinformatics approach is suited best for analyzing these patterns?
  • How do synthetic mixtures of known compounds compare to natural mixtures, and what does that imply for the deconvolution of chimeric MS2 data?
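The first question above, which mass differences are shared between samples and which discriminate them, can be sketched as a set comparison. This is only an illustration under stated assumptions: the fragment m/z values are invented toy numbers (not measured DOM data), the function name `mass_differences` is hypothetical, and rounding to three decimals stands in for the tolerance-based matching that real data would need.

```python
from itertools import combinations


def mass_differences(peaks, decimals=3):
    """Return the set of pairwise m/z differences, rounded to `decimals` places."""
    return {round(abs(a - b), decimals) for a, b in combinations(peaks, 2)}


# Toy fragment m/z lists, invented for illustration only (assumption).
sample_a = [301.035, 257.045, 151.003]
sample_b = [287.056, 243.066, 151.003]

diffs_a = mass_differences(sample_a)
diffs_b = mass_differences(sample_b)

shared = diffs_a & diffs_b   # mass differences common to both samples
only_a = diffs_a - diffs_b   # differences that occur only in sample A
only_b = diffs_b - diffs_a   # differences that occur only in sample B

print("shared:", sorted(shared))
print("only in A:", sorted(only_a))
print("only in B:", sorted(only_b))
```

In this toy case both samples share a difference of about 43.990, which is consistent with a CO2 loss, while the remaining differences separate them.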

The bioinformatics project centers around the development of computational methods for the analysis of DOM MS2 spectra:
  • Can we compute fragmentation trees even if the MS2 spectrum represents a mixture of several isobaric or even isomeric compounds?
  • Do we rediscover mass spectral motifs in the DOM data (van der Hooft et al., 2016) that allow us to decompose the chimeric spectra?
  • Can machine learning techniques correctly identify and predict compound classes in such mixtures?

Applied techniques

Depending on the successful candidate’s qualifications (chemistry or informatics), work will focus either on MS2 data acquisition and biogeochemical interpretation (chemistry focus) or on the development and optimization of computational routines for the analysis of complex MS2 datasets (informatics focus). The MS2 data will be acquired with an ultrahigh resolution Orbitrap Elite mass spectrometer that allows both direct injection and LC analyses (Simon et al., 2018). Biogeochemical analysis will rely on self-written routines and existing tools like GNPS (Aron et al., 2020; Petras et al., 2017), SIRIUS (Dührkop et al., 2015), or CANOPUS (Dührkop et al., 2020). The development of novel bioinformatics pipelines will make extensive use of machine learning techniques to decompose MS2 information from mixtures of knowns and unknowns. Both research foci will yield novel insights into the information content of complex mixtures in terms of indicative mass differences, their diversity, and their potential use for deconvoluting structural substance classes in complex samples.

Affiliation and support

The PhD candidate will be affiliated with the Chair of Bioinformatics at the Institute for Informatics at the FSU Jena and with the working group “Molecular Biogeochemistry” at MPI-BGC. Supervision at the Friedrich Schiller University Jena is provided by Prof. Dr. Sebastian Böcker, and at MPI-BGC by Prof. Dr. Gerd Gleixner. Additional support will be provided by Kai Dührkop (bioinformatics, especially SIRIUS and CANOPUS), Carsten Simon (DOM analysis, Orbitrap), and Daniel Petras (DOM analysis, LC-MS2 with GNPS).

Requirements

Applications to the IMPRS-gBGC are open to well-motivated and highly qualified students from all countries. For this particular PhD project we seek a candidate with qualifications in either metabolomics or bioinformatics.

For the metabolomics focus, we seek a candidate with
  • a Master’s degree in Chemistry, Biochemistry, or another chemistry-related science,
  • experience in analytical chemistry, LC-MS, and handling of big data sets,
  • ideally, experience in high resolution MS data analysis (FT-ICR-MS or Orbitrap),
  • very good oral and written communication skills in English.

For the bioinformatics focus, we seek a candidate with
  • a Master’s degree in Bioinformatics, Informatics, or another informatics-related science,
  • experience in programming and the use of LC-MS tools such as SIRIUS, GNPS, or CANOPUS,
  • ideally, experience in machine learning techniques and small molecule identification,
  • very good oral and written communication skills in English.

The Max Planck Society seeks to increase the number of women in those areas where they are underrepresented and therefore explicitly encourages women to apply. The Max Planck Society is committed to increasing the number of individuals with disabilities in its workforce and therefore encourages applications from such qualified individuals.

Fig. 1: Overcoming the chimeric nature of DOM: (a) Typical non-targeted direct-injection mass spectra of dissolved organic matter resemble a haystack of metabolites, hampering the true identification of unknowns; (b) By isolation and fragmentation, certain “needles” (at m/z 301) can be “pulled out of the haystack” for MS fragmentation, yielding structural information. Thanks to ultrahigh resolution and exact mass determination, chimeric tandem MS data can be dissected to obtain information on specific “needles”. Three exact mass differences are highlighted: neutral loss of CO<sub>2</sub>, a common mass loss (m/z 257); neutral loss of C<sub>8</sub>H<sub>6</sub>O<sub>3</sub>, which is also observed in flavonoids (m/z 151, see also c); and a neutral loss of C<sub>6</sub>H<sub>10</sub>O<sub>5</sub> (m/z 139), which could represent the loss of a sugar moiety (like the one in c, green circle); (c) Bioinformatics tools like SIRIUS can help to interpret the fragmentation of unknown metabolites and can be used to identify unknowns. Shown here is the fragmentation tree from tandem MS experiments with the aglycone (yellow) of the flavonoid Spiraeoside, which has a mass of 301 Da. Color indicates relative ion abundance in the tandem MS spectrum (blue, high; red, low). The indicative mass difference 150.0317 can be obtained by sequential losses of CO<sub>2</sub>, C<sub>6</sub>H<sub>6</sub>, and CO units, and is also detected in DOM (b, m/z 151).
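The mass arithmetic in the caption can be checked with a short calculation. The sketch below sums standard monoisotopic masses for the three highlighted neutral losses and for the sequential CO2, C6H6, and CO losses; the helper `neutral_mass` and the dictionary names are illustrative assumptions, and the precursor is taken at its nominal m/z of 301 as stated in the caption.

```python
# Monoisotopic masses of the most abundant isotopes (standard values, in Da).
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def neutral_mass(composition):
    """Sum monoisotopic masses for a composition given as {element: count}."""
    return sum(MONOISOTOPIC[el] * n for el, n in composition.items())


# Neutral losses highlighted in the caption.
LOSSES = {
    "CO2": {"C": 1, "O": 2},
    "C8H6O3": {"C": 8, "H": 6, "O": 3},
    "C6H10O5": {"C": 6, "H": 10, "O": 5},
}

# Sequential losses that, per the caption, add up to the 150.0317 difference.
SEQUENTIAL = [{"C": 1, "O": 2}, {"C": 6, "H": 6}, {"C": 1, "O": 1}]

PRECURSOR_NOMINAL = 301  # nominal precursor m/z from the caption

for name, comp in LOSSES.items():
    mass = neutral_mass(comp)
    fragment = PRECURSOR_NOMINAL - round(mass)
    print(f"{name}: loss of {mass:.4f} Da, nominal fragment m/z {fragment}")

sequential_sum = sum(neutral_mass(c) for c in SEQUENTIAL)
print(f"CO2 + C6H6 + CO = {sequential_sum:.4f} Da")
```

The sequential sum equals the mass of C8H6O3 because the three units together contain exactly eight carbon, six hydrogen, and three oxygen atoms.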

