Computational tools for Hi-C data analysis
Zhijun Han1,2, Gang Wei1
1. CAS Key Laboratory of Computational Biology, Collaborative Innovation Center for Genetics and Developmental Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences, Shanghai 200031, China
2. University of Chinese Academy of Sciences, Beijing 100049, China
Abstract
Background: In eukaryotic genomes, chromatin is not randomly distributed in the cell nucleus but is organized into higher-order structures. Emerging evidence indicates that these higher-order chromatin structures play important roles in regulating genome functions such as transcription and DNA replication. With advances in 3C (chromosome conformation capture)-based technologies, Hi-C has been widely used to investigate genome-wide long-range chromatin interactions during cellular differentiation and oncogenesis. Since the first publication of the Hi-C assay in 2009, many bioinformatic tools have been implemented for processing Hi-C data, from mapping raw reads to normalizing contact matrices and higher-level interpretation; some provide a complete workflow pipeline, while others focus on a particular step.
Results: This article reviews the general Hi-C data processing workflow and the currently popular Hi-C data processing tools. We highlight how these tools are used for a full interpretation of Hi-C results.
Conclusions: The Hi-C assay is a powerful tool for investigating higher-order chromatin structure. Continued development of novel methods for Hi-C data analysis will be necessary to better understand the regulatory function of genome organization.
Author Summary
Hi-C, a derivative of the chromosome conformation capture (3C) technology, has been widely used to dissect chromatin architecture and has greatly contributed to our understanding of the relationship between genome organization and genome function. Computational methods for data analysis are essential for a full interpretation of Hi-C data. In this article, we review the general Hi-C data processing workflow and popular Hi-C data processing tools. We also discuss the challenges and future perspectives regarding the improvement of Hi-C data analysis.
Keywords
3D genome structure
Hi-C data processing tool
chromatin interactions
Corresponding Author(s):
Gang Wei
Online First Date: 01 August 2017
Issue Date: 24 August 2017