Applications of machine learning in human microbiome studies: a review on feature selection, biomarker identification, disease prediction and treatment
Journal
Frontiers in microbiology
Date Issued
2021
Author(s)
Marcos-Zambrano, Laura Judith
Karaduzovic-Hadziabdic, Kanita
Loncar Turukalo, Tatjana
Przymus, Piotr
Aasmets, Oliver
Berland, Magali
Gruca, Aleksandra
Hasic, Jasminka
Hron, Karel
Kolev, Mikhail
Klammsteiner, Thomas
Lahti, Leo
Lopes, Marta B
Moreno, Victor
Naskinova, Irina
Org, Elin
Paciência, Inês
Papoutsoglou, Georgios
Shigdel, Rajesh
Stres, Blaz
Vilne, Baiba
Yousef, Malik
Tsamardinos, Ioannis
Carrillo de Santa Pau, Enrique
Claesson, Marcus J
Moreno-Indias, Isabel
Abstract
The number of microbiome-related studies has notably increased the availability of data
on human microbiome composition and function. These studies provide the essential
material to deeply explore host-microbiome associations and their relation to the
development and progression of various complex diseases. Improved data-analytical
tools are needed to exploit all information from these biological datasets, taking into
account the peculiarities of microbiome data, i.e., the compositional, heterogeneous and
sparse nature of these datasets. The ability to predict host phenotypes from
taxonomy-informed feature selection, establishing associations between the microbiome and disease states, is beneficial for personalized medicine. In this regard,
machine learning (ML) provides new insights into the development of models that can
be used to predict outputs, such as classification and prediction in microbiology, infer
host phenotypes to predict diseases and use microbial communities to stratify patients
by their characterization of state-specific microbial signatures. Here we review the state-of-the-art ML methods and respective software applied in human microbiome studies,
performed as part of the COST Action ML4Microbiome activities. This scoping review
focuses on the application of ML in microbiome studies related to association and
clinical use for diagnostics, prognostics, and therapeutics. Although the data presented
here is more related to the bacterial community, many algorithms could be applied in
general, regardless of the feature type. This literature and software review covering this
broad topic is aligned with the scoping review methodology. The manual identification of
data sources has been complemented with: (1) automated publication search through
digital libraries of the three major publishers using a natural language processing (NLP)
toolkit, and (2) automated identification of relevant software repositories on GitHub
and ranking of the related research papers relying on a learning-to-rank approach.
File(s)
Name
fmicb-12-634511.pdf
Size
5.1 MB
Format
Adobe PDF
Checksum
(MD5):2b48f34bab0bcb5166bfd1c2454636c1
