MTMO: an efficient network-centric algorithm for subtree counting and enumeration

doi:10.1007/s40484-018-0140-y

Quant. Biol.

2018, Vol. 6

Issue (2) : 142-154 https://doi.org/10.1007/s40484-018-0140-y

RESEARCH ARTICLE

MTMO: an efficient network-centric algorithm for subtree counting and enumeration

Guanghui Li^1,², Jiawei Luo¹(

), Zheng Xiao¹, Cheng Liang¹

¹. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China
². School of Information Engineering, East China Jiaotong University, Nanchang 330013, China

Download: PDF(626 KB) HTML
Export: BibTeX | EndNote | Reference Manager | ProCite | RefWorks

Abstract

Background: The frequency of small subtrees in biological, social, and other types of networks could shed light into the structure, function, and evolution of such networks. However, counting all possible subtrees of a prescribed size can be computationally expensive because of their potentially large number even in small, sparse networks. Moreover, most of the existing algorithms for subtree counting belong to the subtree-centric approaches, which search for a specific single subtree type at a time, potentially taking more time by searching again on the same network.

Methods: In this paper, we propose a network-centric algorithm (MTMO) to efficiently count k-size subtrees. Our algorithm is based on the enumeration of all connected sets of k–1 edges, incorporates a labeled rooted tree data structure in the enumeration process to reduce the number of isomorphism tests required, and uses an array-based indexing scheme to simplify the subtree counting method.

Results: The experiments on three representative undirected complex networks show that our algorithm is roughly an order of magnitude faster than existing subtree-centric approaches and base network-centric algorithm which does not use rooted tree, allowing for counting larger subtrees in larger networks than previously possible. We also show major differences between unicellular and multicellular organisms. In addition, our algorithm is applied to find network motifs based on pattern growth approach.

Conclusions: A network-centric algorithm which allows for a faster counting of non-induced subtrees is proposed. This enables us to count larger motif in larger networks than previously.

Keywords complex network evolutionary systems biology network motif discovery subtree counting subtree isomorphism

Corresponding Author(s): Jiawei Luo

Issue Date: 11 June 2018

Cite this article:

Guanghui Li,Jiawei Luo,Zheng Xiao, et al. MTMO: an efficient network-centric algorithm for subtree counting and enumeration[J]. Quant. Biol., 2018, 6(2): 142-154.

URL:

https://academic.hep.com.cn/qb/EN/10.1007/s40484-018-0140-y
https://academic.hep.com.cn/qb/EN/Y2018/V6/I2/142

Tab.1 Experimental Datasets

Tab.2 Experimental results for MTMO vs. NRMO

Tab.3 Experimental results for MTMO vs. MODA

Network	No. of vertices	No. of edges	Average degree	Clustering coefficient	Characteristic path length	Diameter	Power-law distribution N= $α$ k-^r
Network	No. of vertices	No. of edges	Average degree	Clustering coefficient	Characteristic path length	Diameter	$α$	r
S.cere	5137	22440	8.737	0.097	3.984	10	4014.5	1.707
E.coli	2955	11645	7.882	0.094	4.004	12	985.1	1.445
H.pylo	715	1364	3.815	0.017	4.135	9	352.2	1.651
C.eleg	2711	4042	2.982	0.022	4.837	14	601.8	1.607

Tab.4 Topological statistics in the PPI networks we studied

Fig.1 Normalized treelet distribution size 8 of the Yeast PPI network (Red), E. coli (Green), H. pylo (Blue) and C. eleg (Pink).

Fig.2 Shapes and labels for 3 and 4-node subgraphs in an undirected network.

Tab.5 The statistical properties of MIPS network generated by our algorithm (non-induced subgraphs)

Tab.6 The statistical properties of MIPS network generated by ESU (induced subgraphs)

Fig.3 Example graph G₁ and its 3-trees.

Fig.4 An instance of a network.

Fig.5 The implicit built trees rooted at vertex 1 of size 4 for network in Figure 4.

Fig.6 An example labeled rooted tree of enumerating 4-subtrees.

Fig.7 The steps of searching the rooted tree during enumerating a sample subtree.

Fig.8 List of treelets for k=5 with their canonical labelings.

Tab.7 Number of each k-subtree non-isomorphic class, and the maximum number of corresponding labeled rooted tree leaves

Tab.8 The minimum and maximum of the canonical labeling for different k-subtrees, and the length of corresponding array

1	Milo, R., Shen-Orr, S., Itzkovitz, S., Kashtan, N., Chklovskii, D. and Alon, U. (2002) Network motifs: simple building blocks of complex networks. Science, 298, 824–827 https://doi.org/10.1126/science.298.5594.824 pmid: 12399590
2	Alon, N., Dao, P., Hajirasouliha, I., Hormozdiari, F. and Sahinalp, S. C. (2008) Biomolecular network motif counting and discovery by color coding. Bioinformatics, 24, i241–i249 https://doi.org/10.1093/bioinformatics/btn163 pmid: 18586721
3	Huan, J., Wang, W. and Prins, J. (2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In Proc. Third IEEE Int’l Conf. on Data Mining, pp. 549–552
4	Kuramochi, M. and Karypis, G. (2001) Frequent subgraph discovery. In Proc. First IEEE Int’l Conf. on Data Mining, pp. 313–320
5	Liang, C., Li, Y., Luo, J. and Zhang, Z. (2015) A novel motif-discovery algorithm to identify co-regulatory motifs in large transcription factor and microRNA co-regulatory networks in human. Bioinformatics, 31, 2348–2355 https://doi.org/10.1093/bioinformatics/btv159 pmid: 25788622
6	Chen, X., Yan, C. C., Zhang, X., Zhang, X., Dai, F., Yin, J. and Zhang, Y. (2016) Drug-target interaction prediction: databases, web servers and computational models. Brief. Bioinform., 17, 696–712 https://doi.org/10.1093/bib/bbv066 pmid: 26283676
7	Chen, X., Yan, C. C., Zhang, X. and You, Z. H. (2017) Long non-coding RNAs and complex diseases: from experimental results to computational models. Brief. Bioinformatics, 18, 558–576 pmid: 27345524
8	Chen, X., Huang, Y. A., You, Z. H., Yan, G. Y. and Wang, X. S. (2017) A novel approach based on KATZ measure to predict associations of human microbiota with non-infectious diseases. Bioinformatics, 33, 733–739 pmid: 28025197
9	Chen, X., Xie, D., Zhao, Q. and You, Z. H. (2017) MicroRNAs and complex diseases: from experimental results to computational models. Brief. Bioinform., doi: 10.1093/bib/bbx130 pmid: 29045685
10	Chen, X., Yan, C. C., Zhang, X., You, Z. H., Deng, L., Liu, Y., Zhang, Y. and Dai, Q. (2016) WBSMDA: within and between score for MiRNA-disease association prediction. Sci. Rep., 6, 21106 https://doi.org/10.1038/srep21106 pmid: 26880032
11	Dao, P., Schönhuth, A., Hormozdiari, F., Hajirasouliha, I., Sahinalp, S. C. and Este, M. (2009) Quantifying systemic evolutionary changes by color coding confidence-scored PPI networks. In 9th Int’l Workshop on Algorithms in Bioinformatics, pp. 37–48
12	Omidi, S., Schreiber, F. and Masoudi-Nejad, A. (2009) MODA: an efficient algorithm for network motif discovery in biological networks. Genes Genet. Syst., 84, 385–395 https://doi.org/10.1266/ggs.84.385 pmid: 20154426
13	Alon, N., Yuster, R. and Zwick, U. (1995) Color-coding. J. Assoc. Comput. Mach., 42, 844–856 https://doi.org/10.1145/210332.210337
14	Zhao, Z., Khan, M., Kumar, V. S. A. and Marathe, M. V. (2010) Subgraph enumeration in large social contact networks using parallel color coding and streaming. In Proc. IEEE 39th Int’l Conf. on Parallel Processing, pp. 594–603
15	Zhao, Z., Wang, G., Butt, A. R., Khan, M., Kumar, V. S. A. and Marathe, M. V. (2012) SAHAD: subgraph analysis in massive networks using Hadoop. In Proc. 26th Int’l. Parallel and Distributed Processing Symp., pp. 390–401
16	Slota, G. M. and Madduri, K. (2013) Fast approximate subgraph counting and enumeration. In Proc. IEEE 42nd Int’l Conf. on Parallel Processing, pp. 210–219
17	Slota, G. M. and Madduri, K. (2014) Complex network analysis using parallel approximate motif counting. In Proc. 28th Int’l. Parallel and Distributed Processing Symp., pp. 405–414
18	Slota, G. M. and Madduri, K. (2015) Parallel color-coding. Parallel Comput., 47, 51–69 https://doi.org/10.1016/j.parco.2015.02.004
19	Wernicke, S. (2006) Efficient detection of network motifs. IEEE/ACM Trans. Comput. Biol. Bioinformatics, 3, 347–359 https://doi.org/10.1109/TCBB.2006.51 pmid: 17085844
20	Kashani, Z. R., Ahrabian, H., Elahi, E., Nowzari-Dalini, A., Ansari, E. S., Asadi, S., Mohammadi, S., Schreiber, F. and Masoudi-Nejad, A. (2009) Kavosh: a new algorithm for finding network motifs. BMC Bioinformatics, 10, 318 https://doi.org/10.1186/1471-2105-10-318 pmid: 19799800
21	Khakabimamaghani, S., Sharafuddin, I., Dichter, N., Koch, I. and Masoudi-Nejad, A. (2013) QuateXelero: an accelerated exact network motif detection algorithm. PLoS One, 8, e68073 https://doi.org/10.1371/journal.pone.0068073 pmid: 23874498
22	Paredes, P. and Ribeiro, P. (2013) Towards a faster network-centric subgraph census. In IEEE/ACM Int’l Conf. on Advances in Social Networks Analysis and Mining, pp. 264–271
23	Ferreira, R., Grossi, R. and Rizzi, R. (2011) Output-sensitive listing of bounded-size trees in undirected graphs. In Proc. ESA’11, pp. 275–286
24	Wasa, K. (2016) Enumeration of enumeration algorithms. arXiv:1605.05102
25	Bu, D., Zhao, Y., Cai, L., Xue, H., Zhu, X., Lu, H., Zhang, J., Sun, S., Ling, L., Zhang, N., et al. (2003) Topological structure analysis of the protein-protein interaction network in budding yeast. Nucleic Acids Res., 31, 2443–2450 https://doi.org/10.1093/nar/gkg340 pmid: 12711690
26	Batagelj, V. and Mrvar, A. (2006) Pajek Datasets. Available:
27	ISCAS89 benchmark suite.
28	Lusseau, D., Schneider, K., Boisseau, O. J., Haase, P., Slooten, E. and Dawson, S. M. (2003) The bottlenose dolphin community of doubtful sound features a large proportion of long-lasting associations. Behav. Ecol. Sociobiol., 54, 396–405 https://doi.org/10.1007/s00265-003-0651-y
29	Newman, M. (2009) Network Data. Available:
30	Xenarios, I., Salwínski, L., Duan, X. J., Higney, P., Kim, S. M. and Eisenberg, D. (2002) DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res., 30, 303–305 https://doi.org/10.1093/nar/30.1.303 pmid: 11752321
31	Mewes, H. W., Amid, C., Arnold, R., Frishman, D., Güldener, U., Mannhaupt, G., Münsterkötter, M., Pagel, P., Strack, N., Stümpflen, V., et al. (2004) MIPS: analysis and annotation of proteins from whole genomes. Nucleic Acids Res., 32, D41– D44 https://doi.org/10.1093/nar/gkh092 pmid: 14681354
32	Wan, S. and Zou, Q. (2017) HAlign-II: efficient ultra-large multiple sequence alignment and phylogenetic tree reconstruction with distributed and parallel computing. Algorithms Mol. Biol., 12, 25 https://doi.org/10.1186/s13015-017-0116-x pmid: 29026435
33	Zou, Q., Hu, Q., Guo, M. and Wang, G. (2015) HAlign: fast multiple similar DNA/RNA sequence alignment based on the centre star strategy. Bioinformatics, 31, 2475–2481 https://doi.org/10.1093/bioinformatics/btv177 pmid: 25812743
34	Stinson, D. (1999) Combinatorial Algorithms: Generation, Enumeration, and Search, pp. 48–52. Boca Raton: CRC Press
35	Alamgir, Z. and Abbasi, S. (2007) Combinatorial algorithms for listing paths in minimal change order. In Proc. Fourth Conf. Combinatorial and Algorithmic Aspects of Networking, pp. 112–130
36	Hilton, P. and Pedersen, J. (1991) Catalan numbers, their generalization, and their uses. Math. Intell., 13, 64–75 https://doi.org/10.1007/BF03024089
37	Aho, A., Hopcroft, J. and Ullman, J. (1974) The Design and Analysis of Computer Algorithms, pp 84–85. New Jersey: Addison-Wesley Publishing Co.
38	Heubach, S. and Mansour, T. (2004) Compositions of n with parts in a set. Congr. Numer., 168, 127–143

[1]

Download

[1]	Ye Yuan, Xinying Ren, Zhen Xie, Xiaowo Wang. A quantitative understanding of microRNA- mediated competing endogenous RNA regulation[J]. Quant. Biol., 2016, 4(1): 47-57.

Viewed

Full text

Abstract

Cited

Shared

Discussed