TY - JOUR
T1 - Automatic identification of optimal marker genes for phenotypic and taxonomic groups of microorganisms
AU - Segev, Elad
AU - Pasternak, Zohar
AU - Sasson, Tom Ben
AU - Jurkevitch, Edouard
AU - Gonen, Mira
N1 - Publisher Copyright:
© 2018 Segev et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
PY - 2018/5
Y1 - 2018/5
N2 - Finding optimal markers for microorganisms important in the medical, agricultural, environmental or ecological fields is of great importance. Thousands of complete microbial genomes now available allow us, for the first time, to exhaustively identify marker proteins for groups of microbial organisms. In this work, we model the biological task as the well-known mathematical “hitting set” problem, solving it based on both greedy and randomized approximation algorithms. We identify unique markers for 17 phenotypic and taxonomic microbial groups, including proteins related to the nitrite reductase enzyme as markers for the non-anammox nitrifying bacteria group, and two transcription regulation proteins, nusG and yhiF, as markers for the Archaea and Escherichia/Shigella taxonomic groups, respectively. Additionally, we identify marker proteins for three subtypes of pathogenic E. coli, which previously had no known optimal markers. Practically, depending on the completeness of the database this algorithm can be used for identification of marker genes for any microbial group, these marker genes may be prime candidates for the understanding of the genetic basis of the group’s phenotype or to help discover novel functions which are uniquely shared among a group of microbes. We show that our method is both theoretically and practically efficient, while establishing an upper bound on its time complexity and approximation ratio; thus, it promises to remain efficient and permit the identification of marker proteins that are specific to phenotypic or taxonomic groups, even as more and more bacterial genomes are being sequenced.
AB - Finding optimal markers for microorganisms important in the medical, agricultural, environmental or ecological fields is of great importance. Thousands of complete microbial genomes now available allow us, for the first time, to exhaustively identify marker proteins for groups of microbial organisms. In this work, we model the biological task as the well-known mathematical “hitting set” problem, solving it based on both greedy and randomized approximation algorithms. We identify unique markers for 17 phenotypic and taxonomic microbial groups, including proteins related to the nitrite reductase enzyme as markers for the non-anammox nitrifying bacteria group, and two transcription regulation proteins, nusG and yhiF, as markers for the Archaea and Escherichia/Shigella taxonomic groups, respectively. Additionally, we identify marker proteins for three subtypes of pathogenic E. coli, which previously had no known optimal markers. Practically, depending on the completeness of the database this algorithm can be used for identification of marker genes for any microbial group, these marker genes may be prime candidates for the understanding of the genetic basis of the group’s phenotype or to help discover novel functions which are uniquely shared among a group of microbes. We show that our method is both theoretically and practically efficient, while establishing an upper bound on its time complexity and approximation ratio; thus, it promises to remain efficient and permit the identification of marker proteins that are specific to phenotypic or taxonomic groups, even as more and more bacterial genomes are being sequenced.
UR - http://www.scopus.com/inward/record.url?scp=85046489539&partnerID=8YFLogxK
U2 - 10.1371/journal.pone.0195537
DO - 10.1371/journal.pone.0195537
M3 - ???researchoutput.researchoutputtypes.contributiontojournal.article???
C2 - 29718935
AN - SCOPUS:85046489539
SN - 1932-6203
VL - 13
JO - PLoS ONE
JF - PLoS ONE
IS - 5
M1 - e0195537
ER -