Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2017, Vol. 11 Issue (3) : 541-554    https://doi.org/10.1007/s11704-016-5300-5
RESEARCH ARTICLE
Finding susceptible and protective interaction patterns in large-scale genetic association study
Yuan LI1,2, Yuhai ZHAO1, Guoren WANG1, Xiaofeng ZHU3, Xiang ZHANG2, Zhanghui WANG1, Jun PANG1
1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819, China
2. Department of Electronic Engineering and Computer Science, Case Western Reserve University, Cleveland OH 44106, USA
3. Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland OH 44106, USA
Abstract

Interaction detection in large-scale genetic association studies has attracted intensive research interest, since many diseases have complex traits. Various approaches have been developed for finding significant genetic interactions. In this article, we propose a novel framework, SRMiner, to detect interacting susceptible and protective genotype patterns. SRMiner can discover not only probable combinations of single nucleotide polymorphisms (SNPs) that cause diseases but also the corresponding SNPs that suppress their pathogenic functions, which provides a better perspective for uncovering the underlying relevance between genetic variants and complex diseases. We have performed extensive experiments on several real Wellcome Trust Case Control Consortium (WTCCC) datasets. We use pathway-based and protein-protein interaction (PPI) network-based evaluation methods to verify the discovered patterns. The results show that SRMiner successfully identifies many disease-related genes verified by existing work. Furthermore, SRMiner can also infer some unconfirmed but highly probable disease-related genes.

Keywords: genetic association studies, genotype pattern mining, data mining, bioinformatics
Corresponding Author(s): Yuhai ZHAO   
Just Accepted Date: 17 June 2016   Online First Date: 14 September 2016    Issue Date: 25 May 2017
 Cite this article:   
Yuan LI, Yuhai ZHAO, Guoren WANG, et al. Finding susceptible and protective interaction patterns in large-scale genetic association study[J]. Front. Comput. Sci., 2017, 11(3): 541-554.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5300-5
https://academic.hep.com.cn/fcs/EN/Y2017/V11/I3/541