Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol.    2017, Vol. 5 Issue (3) : 215-225    https://doi.org/10.1007/s40484-017-0113-6
REVIEW
Computational tools for Hi-C data analysis
Zhijun Han1,2, Gang Wei1
1. CAS Key Laboratory of Computational Biology, Collaborative Innovation Center for Genetics and Developmental Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract

Background: In eukaryotic genomes, chromatin is not randomly distributed in the cell nucleus but is organized into higher-order structures. Emerging evidence indicates that these higher-order chromatin structures play important roles in regulating genome functions such as transcription and DNA replication. With advances in 3C (chromosome conformation capture)-based technologies, Hi-C has been widely used to investigate genome-wide long-range chromatin interactions during cellular differentiation and oncogenesis. Since the Hi-C assay was first published in 2009, many bioinformatic tools have been implemented for processing Hi-C data, from mapping raw reads to normalizing the contact matrix and higher-level interpretation, either providing a complete workflow pipeline or focusing on a particular step.

Results: This article reviews the general Hi-C data processing workflow and the currently popular Hi-C data processing tools. We highlight how these tools are used for a full interpretation of Hi-C results.

Conclusions: The Hi-C assay is a powerful tool for investigating higher-order chromatin structure. Continued development of novel methods for Hi-C data analysis will be necessary to better understand the regulatory function of genome organization.

Author Summary  Hi-C, a derivative of the chromosome conformation capture (3C) technology, has been widely used to dissect chromatin architecture and has greatly contributed to our understanding of the relationship between genome organization and genome function. Computational methods for data analysis are essential for a full interpretation of Hi-C data. In this article, we review the general Hi-C data processing workflow and popular Hi-C data processing tools. We also discuss the challenges and future perspectives regarding the improvement of Hi-C data analysis.
Keywords 3D genome structure      Hi-C data processing tool      chromatin interactions     
Corresponding Author(s): Gang Wei   
About author:

Tongcan Cui and Yizhe Hou contributed equally to this work.

Online First Date: 01 August 2017    Issue Date: 24 August 2017
 Cite this article:   
Zhijun Han,Gang Wei. Computational tools for Hi-C data analysis[J]. Quant. Biol., 2017, 5(3): 215-225.
 URL:  
https://academic.hep.com.cn/qb/EN/10.1007/s40484-017-0113-6
https://academic.hep.com.cn/qb/EN/Y2017/V5/I3/215
  General Hi-C data processing workflow.
  Filtering mapped PETs using restriction enzyme (RE) fragments.
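The RE-fragment filtering step named in the caption above can be illustrated with a minimal sketch. It assumes each paired-end tag (PET) is a tuple of two mapped positions and that the restriction cut sites per chromosome are already known; the function names and data layout are hypothetical and are not taken from any tool listed in this review. Real pipelines typically apply further checks (strand orientation, distance to the restriction site, duplicate removal) that are omitted here.

```python
from bisect import bisect_right


def fragment_index(cut_sites, pos):
    """Return the index of the restriction fragment that contains pos.

    cut_sites is a sorted list of cut positions on one chromosome.
    Fragment 0 covers positions before the first cut site, fragment 1
    covers positions from the first cut site up to the second, and so on.
    """
    return bisect_right(cut_sites, pos)


def filter_pets(pets, cut_sites_by_chrom):
    """Keep PETs whose two ends fall in different RE fragments.

    Each PET is (chrom1, pos1, chrom2, pos2). A pair with both ends in
    the same fragment cannot represent a ligation between two fragments,
    so it is discarded in this simplified filter.
    """
    kept = []
    for chrom1, pos1, chrom2, pos2 in pets:
        frag1 = fragment_index(cut_sites_by_chrom[chrom1], pos1)
        frag2 = fragment_index(cut_sites_by_chrom[chrom2], pos2)
        if chrom1 == chrom2 and frag1 == frag2:
            continue  # both ends in one fragment: uninformative pair
        kept.append((chrom1, pos1, chrom2, pos2))
    return kept


# Toy data (illustrative only).
cut_sites = {"chr1": [100, 250, 400], "chr2": [150, 300]}
pets = [
    ("chr1", 120, "chr1", 240),  # same fragment on chr1
    ("chr1", 120, "chr1", 380),  # different fragments on chr1
    ("chr1", 50, "chr2", 50),    # different chromosomes
]
valid = filter_pets(pets, cut_sites)
```

In this toy input only the first pair is removed, because both of its ends lie between the cut sites at 100 and 250.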
Tool | Aligner | Mapping strategy | PETs filtering | Normalization | Description
--- | --- | --- | --- | --- | ---
Hiclib [15] | Bowtie2 | Iterative | RE fragments | ICE | No standalone pipeline provided. runHiC is based on hiclib and is command-line based
HIPPIE [14] | BWA | - | - | Explicit model | Designed for high-performance computing clusters with Oracle Grid Engine. Can integrate with epigenetic datasets and GWAS data
HiC-inspector [11] | Bowtie | - | RE fragments | Coverage correction | RE filtering only keeps PETs with the 3'-end facing the restriction site. Command-line based and provides a simple interactive browser
HiC-Box [12] | Bowtie2 | - | Not detailed | Not detailed | GUI based, compatible with Genome Re-Assembly Assessing Likelihood (GRAAL). No published paper with details
HiC-Pro [18] | Bowtie2 | Trimming | RE fragments | Optimized ICE | Command-line based and easy to use. Provides a complete workflow from mapping to normalized matrix, can handle SNP information
HiCUP [17] | Bowtie, Bowtie2 | Pre-truncation | RE fragments | - | Command-line based with an incomplete workflow, needs other tools such as HiCpipe [11] to finish normalization and other processes
HiCdat [13] | Subread, Bowtie2 | - | RE fragments | Three options | GUI and R based, with mapping command provided but not piped. Provides comprehensive functions for high-order analysis and integration with epigenetic datasets
TADbit [19] | GEM | Iterative / Trimming | RE fragments | ICE | No standalone pipeline provided. Can call and compare TADs between samples. No published paper with details
Juicer [15] | BWA | - | RE fragments | Matrix balancing | Command-line based. Provides many high-order functions such as calling TADs, loops and compartments, and displaying them with Juicebox
Tab.1  Tools for Hi-C data processing pipeline.
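Several pipelines in Table 1 list ICE or matrix balancing as their normalization. The sketch below shows the general idea on a small dense symmetric matrix: each bin's bias is estimated from its row sum relative to the mean row sum, and the matrix is divided by the product of the two bin biases until the row sums become equal. This is a simplified illustration in pure Python, not the implementation used by hiclib, HiC-Pro, Juicer or any other listed tool; the function name, the fixed iteration count and the toy data are assumptions.

```python
def iterative_correction(matrix, n_iter=100):
    """Balance a symmetric contact matrix so all non-empty rows sum equally.

    Returns the corrected matrix and the accumulated per-bin bias.
    Rows that sum to zero are left untouched (bias factor 1.0).
    """
    n = len(matrix)
    corrected = [row[:] for row in matrix]  # work on a copy
    bias = [1.0] * n
    for _ in range(n_iter):
        row_sums = [sum(row) for row in corrected]
        nonzero = [s for s in row_sums if s > 0]
        if not nonzero:
            break  # empty matrix, nothing to balance
        mean_sum = sum(nonzero) / len(nonzero)
        # Per-bin correction factor for this round.
        scale = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                corrected[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
    return corrected, bias


# Toy example: a symmetric "true" matrix distorted by per-bin biases.
true_contacts = [[4.0, 2.0, 1.0], [2.0, 4.0, 2.0], [1.0, 2.0, 4.0]]
bin_bias = [1.0, 2.0, 0.5]
observed = [
    [true_contacts[i][j] * bin_bias[i] * bin_bias[j] for j in range(3)]
    for i in range(3)
]
balanced, estimated_bias = iterative_correction(observed)
balanced_row_sums = [sum(row) for row in balanced]
```

After correction the row sums of `balanced` agree with each other to within numerical tolerance. Production tools work on sparse genome-wide matrices and usually filter low-coverage bins first, which this sketch does not attempt.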
Category | Tool name | Model assumption | Description
--- | --- | --- | ---
Normalization | HiCNorm [22] | Three systematic biases | Generalized linear regression-based method, much faster than Yaffe's method [24]
Normalization | Hi-Corrector [42] | Matrix balancing | Parallelized and memory-controllable ICE, very fast
Normalization | HiFive [43] | Three options | GUI based and integrated into Galaxy
Normalization | HiCNormCis [10] | Three systematic biases | Poisson-regression-based method for local regions, result can be used to call FIREs. Not publicly available
Calling TADs | DI-HMM [9] | Directional indexes bias with HMM | Insensitive to parameters and hence it is hard to identify sub-TADs
Calling TADs | Arrowhead [15] | Dynamic programming | Can call sub-TADs, integrated in Juicer
Calling TADs | Armatus [27] | Multi-scale approach | Can call TADs at different scales, but it is not easy to choose fine scale ranges
Calling TADs | HiCseg [28] | Linear segmentation | Turns 2D into 1D, can model the uncertainty
Calling TADs | CHDF [29] | Dynamic programming | Robust to different resolutions, but users need to control the total number of TADs for each chromosome
Tab.2  Tools for post-processing Hi-C data (tools that provide a complete workflow are listed in Table 1).
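The directionality index underlying the DI-HMM entry in Table 2 can be sketched as follows. For each bin, A is the sum of contacts with a window of upstream bins, B is the sum with the same number of downstream bins, and E = (A + B) / 2; the index is the sign of (B - A) multiplied by (A - E)^2 / E + (B - E)^2 / E. The code below computes only this index on a toy matrix. The hidden Markov model that turns the index into TAD calls is not included, and the function name, window size and toy data are assumptions for illustration.

```python
def directionality_index(matrix, window):
    """Compute a directionality index for every bin of a contact matrix.

    matrix is a square list of lists of non-negative contact counts;
    window is the number of bins summed on each side of a bin.
    Positive values indicate a downstream interaction bias, negative
    values an upstream bias.
    """
    n = len(matrix)
    di = []
    for i in range(n):
        # A: contacts with upstream bins, B: contacts with downstream bins.
        a = sum(matrix[i][j] for j in range(max(0, i - window), i))
        b = sum(matrix[i][j] for j in range(i + 1, min(n, i + window + 1)))
        e = (a + b) / 2.0
        if a == b or e == 0:
            di.append(0.0)  # no bias (also covers bins with no contacts)
            continue
        sign = 1.0 if b > a else -1.0
        di.append(sign * ((a - e) ** 2 / e + (b - e) ** 2 / e))
    return di


# Toy matrix with two 3-bin domains: 10 contacts within a domain, 1 between.
domain = [0, 0, 0, 1, 1, 1]
toy = [[10 if domain[i] == domain[j] else 1 for j in range(6)] for i in range(6)]
di = directionality_index(toy, window=2)
```

In the toy matrix the index is negative for the last bin of the first domain and positive for the first bin of the second domain, which is the sign change that marks a candidate domain boundary.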
1 D. U. Gorkin, D. Leung, and B. Ren (2014) The 3D genome in transcriptional regulation and pluripotency. Cell Stem Cell, 14, 762–775
https://doi.org/10.1016/j.stem.2014.05.017 pmid: 24905166
2 J. E. Phillips-Cremins, M. E. Sauria, A. Sanyal, T. I. Gerasimova, B. R. Lajoie, J. S. Bell, C. T. Ong, T. A. Hookway, C. Guo, Y. Sun, et al. (2013) Architectural protein subclasses shape 3D organization of genomes during lineage commitment. Cell, 153, 1281–1295
https://doi.org/10.1016/j.cell.2013.04.053 pmid: 23706625
3 J. Dekker, K. Rippe, M. Dekker, and N. Kleckner (2002) Capturing chromosome conformation. Science, 295, 1306–1311
https://doi.org/10.1126/science.1067799 pmid: 11847345
4 M. Simonis, P. Klous, E. Splinter, Y. Moshkin, R. Willemsen, E. de Wit, B. van Steensel, and W. de Laat (2006) Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat. Genet., 38, 1348–1354
https://doi.org/10.1038/ng1896 pmid: 17033623
5 J. Dostie, T. A. Richmond, R. A. Arnaout, R. R. Selzer, W. L. Lee, T. A. Honan, E. D. Rubio, A. Krumm, J. Lamb, C. Nusbaum, et al. (2006) Chromosome conformation capture carbon copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res., 16, 1299–1309
https://doi.org/10.1101/gr.5571506 pmid: 16954542
6 E. Lieberman-Aiden, N. L. van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling, I. Amit, B. R. Lajoie, P. J. Sabo, M. O. Dorschner, et al. (2009) Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science, 326, 289–293
https://doi.org/10.1126/science.1181369 pmid: 19815776
7 M. J. Fullwood, M. H. Liu, Y. F. Pan, J. Liu, H. Xu, Y. B. Mohamed, Y. L. Orlov, S. Velkov, A. Ho, P. H. Mei, et al. (2009) An oestrogen-receptor-alpha-bound human chromatin interactome. Nature, 462, 58–64
https://doi.org/10.1038/nature08497 pmid: 19890323
8 R. Jäger, G. Migliorini, M. Henrion, R. Kandaswamy, H. E. Speedy, A. Heindl, N. Whiffin, M. J. Carnicer, L. Broome, N. Dryden, et al. (2015) Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci. Nat. Commun., 6, 6178
https://doi.org/10.1038/ncomms7178 pmid: 25695508
9 J. R. Dixon, S. Selvaraj, F. Yue, A. Kim, Y. Li, Y. Shen, M. Hu, J. S. Liu, and B. Ren (2012) Topological domains in mammalian genomes identified by analysis of chromatin interactions. Nature, 485, 376–380
https://doi.org/10.1038/nature11082 pmid: 22495300
10 A. D. Schmitt, M. Hu, I. Jung, Z. Xu, Y. Qiu, C. L. Tan, Y. Li, S. Lin, Y. Lin, C. L. Barr, et al. (2016) A Compendium of chromatin contact maps reveals spatially active regions in the human genome. Cell Rep., 17, 2042–2059
https://doi.org/10.1016/j.celrep.2016.10.061 pmid: 27851967
11 G. Castellano, F. Le Dily, A. Hermoso Pulido, M. Beato, and G. Roma (2015) Hi-Cpipe: a pipeline for high-throughput chromosome capture. bioRxiv, doi: https://doi.org/10.1101/020636
12 HiC-Box. available from
13 M. W. Schmid, S. Grob, and U. Grossniklaus (2015) HiCdat: a fast and easy-to-use Hi-C data analysis tool. BMC Bioinformatics, 16, 277
https://doi.org/10.1186/s12859-015-0678-x pmid: 26334796
14 Y. C. Hwang, C. F. Lin, O. Valladares, J. Malamon, P. P. Kuksa, Q. Zheng, B. D. Gregory, and L. S. Wang (2015) HIPPIE: a high-throughput identification pipeline for promoter interacting enhancer elements. Bioinformatics, 31, 1290–1292
https://doi.org/10.1093/bioinformatics/btu801 pmid: 25480377
15 N. C. Durand, M. S. Shamim, I. Machol, S. S. Rao, M. H. Huntley, E. S. Lander, and E. L. Aiden (2016) Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst., 3, 95–98
https://doi.org/10.1016/j.cels.2016.07.002 pmid: 27467249
16 M. Imakaev, G. Fudenberg, R. P. McCord, N. Naumova, A. Goloborodko, B. R. Lajoie, J. Dekker, and L. A. Mirny (2012) Iterative correction of Hi-C data reveals hallmarks of chromosome organization. Nat. Methods, 9, 999–1003
https://doi.org/10.1038/nmeth.2148 pmid: 22941365
17 S. Wingett, P. Ewels, M. Furlan-Magaril, T. Nagano, S. Schoenfelder, P. Fraser, and S. Andrews (2015) HiCUP: pipeline for mapping and processing Hi-C data. F1000Res, 4, 1310
pmid: 26835000
18 N. Servant, N. Varoquaux, B. R. Lajoie, E. Viara, C. J. Chen, J. P. Vert, E. Heard, J. Dekker, and E. Barillot (2015) HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol., 16, 259
https://doi.org/10.1186/s13059-015-0831-x pmid: 26619908
19 F. Serra, D. Baù, G. Filion, and M. A. Marti-Renom (2016) Structural features of the fly chromatin colors revealed by automatic three-dimensional modeling. bioRxiv, doi: https://doi.org/10.1101/036764
20 H. Li, B. Handsaker, A. Wysoker, T. Fennell, J. Ruan, N. Homer, G. Marth, G. Abecasis, R. Durbin, and the 1000 Genome Project Data Processing Subgroup (2009) The sequence alignment/map format and SAMtools. Bioinformatics, 25, 2078–2079
https://doi.org/10.1093/bioinformatics/btp352 pmid: 19505943
21 W. Ma, F. Ay, C. Lee, G. Gulsoy, X. Deng, S. Cook, J. Hesson, C. Cavanaugh, C. B. Ware, A. Krumm, et al. (2015) Fine-scale chromatin interaction maps reveal the cis-regulatory landscape of human lincRNA genes. Nat. Methods, 12, 71–78
https://doi.org/10.1038/nmeth.3205 pmid: 25437436
22 M. Hu, K. Deng, S. Selvaraj, Z. Qin, B. Ren, and J. S. Liu (2012) HiCNorm: removing biases in Hi-C data via Poisson regression. Bioinformatics, 28, 3131–3133
https://doi.org/10.1093/bioinformatics/bts570 pmid: 23023982
23 P. A. Knight and D. Ruiz (2013) A fast algorithm for matrix balancing. IMA J. Numer. Anal., 33, 1029–1047
https://doi.org/10.1093/imanum/drs019
24 E. Yaffe and A. Tanay (2011) Probabilistic modeling of Hi-C contact maps eliminates systematic biases to characterize global chromosomal architecture. Nat. Genet., 43, 1059–1065
https://doi.org/10.1038/ng.947 pmid: 22001755
25 S. S. Rao, M. H. Huntley, N. C. Durand, E. K. Stamenova, I. D. Bochkov, J. T. Robinson, A. L. Sanborn, I. Machol, A. D. Omer, E. S. Lander, et al. (2014) A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell, 159, 1665–1680
https://doi.org/10.1016/j.cell.2014.11.021 pmid: 25497547
26 T. Sexton, E. Yaffe, E. Kenigsberg, F. Bantignies, B. Leblanc, M. Hoichman, H. Parrinello, A. Tanay, and G. Cavalli (2012) Three-dimensional folding and functional organization principles of the Drosophila genome. Cell, 148, 458–472
https://doi.org/10.1016/j.cell.2012.01.010 pmid: 22265598
27 D. Filippova, R. Patro, G. Duggal, and C. Kingsford (2014) Identification of alternative topological domains in chromatin. Algorithms Mol. Biol., 9, 14
https://doi.org/10.1186/1748-7188-9-14 pmid: 24868242
28 C. Lévy-Leduc, M. Delattre, T. Mary-Huard, and S. Robin (2014) Two-dimensional segmentation for analyzing Hi-C data. Bioinformatics, 30, i386–i392
https://doi.org/10.1093/bioinformatics/btu443 pmid: 25161224
29 Y. Wang, Y. Li, J. Gao, and M. Q. Zhang (2015) A novel method to identify topological domains using Hi-C data. Quant. Biol., 3, 81–89
https://doi.org/10.1007/s40484-015-0047-9
30 X. Zhou, R. F. Lowdon, D. Li, H. A. Lawson, P. A. Madden, J. F. Costello, and T. Wang (2013) Exploring long-range genome interactions using the WashU Epigenome Browser. Nat. Methods, 10, 375–376
https://doi.org/10.1038/nmeth.2440 pmid: 23629413
31 The 3D Genome Browser.
32 D. Karolchik, G. P. Barber, J. Casper, H. Clawson, M. S. Cline, M. Diekhans, T. R. Dreszer, P. A. Fujita, L. Guruvadoo, M. Haeussler, et al. (2014) The UCSC Genome Browser database: 2014 update. Nucleic Acids Res., 42, D764–D770
https://doi.org/10.1093/nar/gkt1168 pmid: 24270787
33 T. M. Asbury, M. Mitman, J. Tang, and W. J. Zheng (2010) Genome3D: a viewer-model framework for integrating and visualizing multi-scale epigenomic information within a three-dimensional genome. BMC Bioinformatics, 11, 444
https://doi.org/10.1186/1471-2105-11-444 pmid: 20813045
34 T. E. Lewis, I. Sillitoe, A. Andreeva, T. L. Blundell, D. W. Buchan, C. Chothia, D. Cozzetto, J. M. Dana, I. Filippis, J. Gough, et al. (2015) Genome3D: exploiting structure to help users understand their sequences. Nucleic Acids Res., 43, D382–D386
https://doi.org/10.1093/nar/gku973 pmid: 25348407
35 T. E. Lewis, I. Sillitoe, A. Andreeva, T. L. Blundell, D. W. Buchan, C. Chothia, A. Cuff, J. M. Dana, I. Filippis, J. Gough, et al. (2013) Genome3D: a UK collaborative project to annotate genomic sequences with predicted 3D structures based on SCOP and CATH domains. Nucleic Acids Res., 41, D499–D507
https://doi.org/10.1093/nar/gks1266 pmid: 23203986
36 TADkit. available from
37 F. Ay and W. S. Noble (2015) Analysis methods for studying the 3D architecture of the genome. Genome Biol., 16, 183
https://doi.org/10.1186/s13059-015-0745-7 pmid: 26328929
38 A. D. Schmitt, M. Hu, and B. Ren (2016) Genome-wide mapping and analysis of chromosome architecture. Nat. Rev. Mol. Cell Biol., 17, 743–755
https://doi.org/10.1038/nrm.2016.104 pmid: 27580841
39 N. Ashish, P. Dewan, J. L. Ambite, and A. W. Toga (2015) GEM: the GAAIN entity mapper. Data Integr. Life Sci., 9162, 13–27
https://doi.org/10.1007/978-3-319-21843-4_2 pmid: 26665184
40 S. Marco-Sola, M. Sammeth, R. Guigó, and P. Ribeca (2012) The GEM mapper: fast, accurate and versatile alignment by filtration. Nat. Methods, 9, 1185–1188
https://doi.org/10.1038/nmeth.2221 pmid: 23103880
41 N. C. Durand, J. T. Robinson, M. S. Shamim, I. Machol, J. P. Mesirov, E. S. Lander, and E. L. Aiden (2016) Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst., 3, 99–101
https://doi.org/10.1016/j.cels.2015.07.012 pmid: 27467250
42 W. Li, K. Gong, Q. Li, F. Alber, and X. J. Zhou (2015) Hi-Corrector: a fast, scalable and memory-efficient package for normalizing large-scale Hi-C data. Bioinformatics, 31, 960–962
https://doi.org/10.1093/bioinformatics/btu747 pmid: 25391400
43 M. E. Sauria, J. E. Phillips-Cremins, V. G. Corces, and J. Taylor (2015) HiFive: a tool suite for easy and efficient HiC and 5C data analysis. Genome Biol., 16, 237
https://doi.org/10.1186/s13059-015-0806-y pmid: 26498826
44 A. T. Lun and G. K. Smyth (2015) diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data. BMC Bioinformatics, 16, 258
https://doi.org/10.1186/s12859-015-0683-0 pmid: 26283514