Quant Biol 2013, Vol. 1, Issue 3: 192-200
Automated interpretation of metabolic capacity from genome and metagenome sequences
Minoru Kanehisa
Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
The KEGG pathway maps are widely used as a reference data set for inferring high-level functions of the organism or the ecosystem from its genome or metagenome sequence data. The KEGG modules, which are tighter functional units often corresponding to subpathways in the KEGG pathway maps, are designed for better automation of genome interpretation. Each KEGG module is represented by a simple Boolean expression of KEGG Orthology (KO) identifiers (K numbers), enabling automatic evaluation of the completeness of genes in the genome. Here we focus on metabolic functions and introduce reaction modules for improving annotation and signature modules for inferring metabolic capacity. We also describe how genome annotation is performed in KEGG using the manually created KO database and the computationally generated SSDB database. The resulting KEGG GENES database with KO (K number) annotation is a reference sequence database to be compared for automated annotation and interpretation of newly determined genomes.
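
The evaluation of a module definition against the K numbers found in a genome can be sketched as below. This is a minimal illustration, not KEGG's own implementation. It assumes the notation in which a space between operands means AND, a comma means OR, and a plus sign joins the subunits of a complex (treated here as AND), with the plus sign binding tightest and the comma loosest. Optional components are not handled. The function name `evaluate` and the example expression, assembled from the synthase, dehydratase and dehydrogenase K numbers listed for M00432 in Tab.1, are illustrative assumptions rather than the curated definition.

```python
import re
from typing import List, Optional, Set

# Tokens: K numbers, parentheses, comma (OR) and plus (complex subunits).
# Whitespace is dropped; adjacency of two operands is read as AND.
TOKEN = re.compile(r"K\d{5}|[(),+]")


def evaluate(definition: str, genome_kos: Set[str]) -> bool:
    """Return True if the K numbers in genome_kos satisfy the definition.

    Assumed precedence (tightest first): '+', adjacency (AND), ',' (OR).
    """
    tokens: List[str] = TOKEN.findall(definition)
    pos = 0

    def peek() -> Optional[str]:
        return tokens[pos] if pos < len(tokens) else None

    def parse_or() -> bool:
        nonlocal pos
        result = parse_and()
        while peek() == ",":
            pos += 1
            rhs = parse_and()  # always parse, then combine
            result = result or rhs
        return result

    def parse_and() -> bool:
        result = parse_complex()
        while peek() is not None and peek() not in {",", ")"}:
            rhs = parse_complex()
            result = result and rhs
        return result

    def parse_complex() -> bool:
        nonlocal pos
        result = parse_atom()
        while peek() == "+":
            pos += 1
            rhs = parse_atom()
            result = result and rhs
        return result

    def parse_atom() -> bool:
        nonlocal pos
        tok = peek()
        if tok is None:
            raise ValueError("unexpected end of definition")
        pos += 1
        if tok == "(":
            value = parse_or()
            if peek() != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok.startswith("K"):
            return tok in genome_kos
        raise ValueError(f"unexpected token {tok!r}")

    value = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens in definition")
    return value


# Illustrative expression (assumption): synthase, two-subunit dehydratase,
# dehydrogenase, taken from the M00432 row of Tab.1.
EXAMPLE_DEFINITION = "K01649 K01703+K01704 K00052"

if __name__ == "__main__":
    genome = {"K01649", "K01703", "K01704", "K00052"}
    print(evaluate(EXAMPLE_DEFINITION, genome))
```

A module is reported as complete only when every step is satisfied. Removing one subunit of the complex (for example K01704) from the set makes the expression evaluate to false.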

Keywords: metabolic pathway, functional module, genome annotation, genome interpretation, KEGG database
Corresponding Author(s): Kanehisa Minoru
Issue Date: 05 September 2013
Minoru Kanehisa. Automated interpretation of metabolic capacity from genome and metagenome sequences[J]. Quant Biol, 2013, 1(3): 192-200.
Fig.1  The pathway maps of the citrate cycle for (A) [...], (B) [...], (C) [...], and (D) [...] are generated by matching the genomic content of enzyme genes against the KEGG reference pathway map. The KEGG module M00010 is defined for the first segment of the citrate cycle, from oxaloacetate to 2-oxoglutarate, because [...] has only this segment, [...] lacks this segment, and [...] has only this segment encoded by adjacent genes in an operon-like structure.
Fig.2  The reaction module RM001 is a characteristic reaction sequence involving tricarboxylic acids for extension of the 2-oxocarboxylic acid chain using acetyl-CoA-derived carbon. It is shown in pink boxes (A) from oxaloacetate to 2-oxoglutarate in the citrate cycle, (B) from 2-oxoglutarate to 2-oxoadipate in lysine biosynthesis, and (C) from pyruvate to 2-oxobutanoate and from 2-oxoisovalerate to 2-oxoisocaproate in valine, leucine and isoleucine biosynthesis.
Fig.3  Linking genomes to molecular networks for interpretation of metabolic capacities.
Fig.4  The KEGG pathway map does not contain all substrates and products defined in the reaction formula. Instead, it uses a simplified representation, shown in the reaction map formula, that retains only the main compounds. Reactant pairs are defined from the reaction formula as one-to-one relationships of substrate-product pairs. Those that appear on the pathway map are called main reactant pairs. A reaction class corresponds to a set of main reactant pairs that share the same chemical structure transformation pattern, defined as an RDM pattern.
Fig.5  This is a simplified version of the KEGG pathway map, where 2-oxocarboxylic acids are denoted by red circles. The chain elongation module RM001 is shown in the vertical direction, and chain modification modules and other reactions are shown in the horizontal direction. The correspondence of RM001 to KEGG modules (M00010, etc.) is also shown.
| Module | Pathway | Organism group | Synthase | Dehydratase | Dehydrogenase |
| --- | --- | --- | --- | --- | --- |
| M00433 | Lysine | Fungi | K01655 (LYS21) | K17450 (ACO2); K01705 (LYS4) | K05824 (LYS12) |
| M00433 | Lysine | Green non-sulfur bacteria, Deinococcus-Thermus | K01655 (LYS21) | K16792 (aksD) + K16793 (aksE) | K05824 (LYS12) |
| M00608 | Lysine, Coenzyme B | Methanogenic archaea | K10977 (aksA) | K16792 (aksD) + K16793 (aksE) | K10978 (aksF) |
| M00432 | Leucine | Pyrococcus | K01649 (leuA) | K01703 (leuC) + K01704 (leuD) | K00052 (leuB); K10978 (aksF) |
Tab.1  Combination of paralogous genes in 2-oxocarboxylic acid chain extension.
| Metabolic capacity | Signature module | Definition |
| --- | --- | --- |
| Carbon fixation in plants and cyanobacteria | (M00161,M00163) + M00165 | M00161 Photosystem II; M00163 Photosystem I; M00165 Reductive pentose phosphate cycle (Calvin cycle) |
| Carbon fixation in alphaproteobacteria | M00597 + M00165 | M00597 Anoxygenic photosystem II; M00165 Reductive pentose phosphate cycle (Calvin cycle) |
| Carbon fixation in green nonsulfur bacteria | M00597 + M00376 | M00597 Anoxygenic photosystem II; M00376 3-Hydroxypropionate bi-cycle |
| Carbon fixation in green sulfur bacteria | M00598 + M00173 | M00598 Anoxygenic photosystem I; M00173 Reductive citrate cycle (Arnon-Buchanan cycle) |
| Nitrate assimilation | (K02575,M00438) + M00531 | K02575 MFS transporter, NNP family, nitrate/nitrite transporter; M00438 ABC transporter, nitrate/nitrite transport system; M00531 Assimilatory nitrate reduction, nitrate=>ammonia |
| Sulfate assimilation | (K14708,M00185) + M00176 | K14708 SLC family 26, sodium-independent sulfate transporter; M00185 ABC transporter, sulfate transport system; M00176 Assimilatory sulfate reduction, sulfate=>H2S |
| Methanogenesis | M00567,M00357,M00356,M00563 | M00567 Methanogenesis, CO2=>methane; M00357 Methanogenesis, acetate=>methane; M00356 Methanogenesis, methanol=>methane; M00563 Methanogenesis, methylamine=>methane |
| Acetogenesis | M00377 + M00579 | M00377 Reductive acetyl-CoA pathway (Wood-Ljungdahl pathway); M00579 Phosphate acetyltransferase-acetate kinase pathway |
Tab.2  Examples of signature modules.
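
As a rough illustration of how the signature modules in Tab.2 could be applied, the sketch below encodes four of the table's entries as small rule combinators and reports which metabolic capacities a genome satisfies. It assumes that a comma in the signature column means "any of" and a plus sign means "all of". The helper names (`has`, `any_of`, `all_of`, `infer_capacities`) and the example input set are hypothetical. The input is expected to hold both the K numbers found in the genome and the M numbers of modules already judged complete.

```python
from typing import Callable, Dict, List, Set

# A rule takes the set of identifiers present in a genome (K numbers found,
# plus M numbers of modules judged complete) and returns True or False.
Rule = Callable[[Set[str]], bool]


def has(identifier: str) -> Rule:
    """Rule satisfied when a single K or M number is present."""
    return lambda present: identifier in present


def any_of(*rules: Rule) -> Rule:
    """Assumed reading of ',' in Tab.2: at least one alternative holds."""
    return lambda present: any(rule(present) for rule in rules)


def all_of(*rules: Rule) -> Rule:
    """Assumed reading of '+' in Tab.2: every component holds."""
    return lambda present: all(rule(present) for rule in rules)


# Four signature modules transcribed from Tab.2.
SIGNATURES: Dict[str, Rule] = {
    "Nitrate assimilation": all_of(
        any_of(has("K02575"), has("M00438")), has("M00531")
    ),
    "Sulfate assimilation": all_of(
        any_of(has("K14708"), has("M00185")), has("M00176")
    ),
    "Methanogenesis": any_of(
        has("M00567"), has("M00357"), has("M00356"), has("M00563")
    ),
    "Acetogenesis": all_of(has("M00377"), has("M00579")),
}


def infer_capacities(present: Set[str]) -> List[str]:
    """List the capacities whose signature is satisfied, in table order."""
    return [name for name, rule in SIGNATURES.items() if rule(present)]


if __name__ == "__main__":
    # Hypothetical genome: a nitrate transporter gene, complete assimilatory
    # nitrate reduction, and a complete acetate-to-methane module.
    example = {"K02575", "M00531", "M00357"}
    print(infer_capacities(example))
```

Encoding the rules as data keeps the table and the evaluation logic separate, so further rows can be added without touching `infer_capacities`.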
