Effectiveness of machine learning at modeling the relationship between Hi‐C data and copy number variation
Yuyang Wang1,2, Yu Sun1, Zeyu Liu3, Bijia Chen1, Hebing Chen1, Chao Ren1, Xuanwei Lin2, Pengzhen Hu4, Peiheng Jia5, Xiang Xu1, Kang Xu6, Ximeng Liu2, Hao Li1,3, Xiaochen Bo1
1. Institute of Health Service and Transfusion Medicine, Beijing, China
2. College of Computer and Data Science, Fuzhou University, Fuzhou, China
3. Beijing Institute of Radiation Medicine, Beijing, China
4. School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
5. School of Mathematics and Computer Science, Shanxi Normal University, Taiyuan, China
6. School of Software, Shandong University, Qingdao, China
Abstract
Copy number variation (CNV) refers to variation in the number of copies of a specific sequence in a genome and is a type of chromatin structural variation. The development of the Hi‐C technique has empowered research on the spatial structure of chromatin by capturing interactions between DNA fragments. We used machine‐learning methods, including a linear transformation model and a graph convolutional network (GCN), to detect CNV events from Hi‐C data and to reveal how CNV is related to three‐dimensional interactions between genomic fragments, in terms of both the one‐dimensional read count signal and features of the chromatin structure. The experimental results demonstrated a chromosome‐specific linear relation between the Hi‐C read count and CNV that can be quantified well by the linear transformation model. In addition, the GCN‐based model could accurately extract features of the spatial structure from Hi‐C data and infer the corresponding CNV across different chromosomes in a cancer cell line. We performed a series of experiments, including dimension reduction, transfer learning, and Hi‐C data perturbation, to comprehensively evaluate the utility and robustness of the GCN‐based model. This work provides a benchmark for using machine learning to infer CNV from Hi‐C data and serves as a necessary foundation for a deeper understanding of the relationship between Hi‐C data and CNV.
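As an illustration of the linear relation between the one‐dimensional Hi‐C read count signal and CNV described above, the following is a minimal sketch and not the authors' implementation. It assumes the read count signal is the per‐bin row sum of a square contact matrix and that one slope and intercept are fitted per chromosome by ordinary least squares. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def read_count_signal(contacts: Sequence[Sequence[float]]) -> List[float]:
    """Collapse a square Hi-C contact matrix to a 1D signal (one row sum per bin).

    Assumption: the row sum stands in for the "one-dimensional read count signal".
    """
    return [float(sum(row)) for row in contacts]


def fit_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant, so the slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(x: Sequence[float], slope: float, intercept: float) -> List[float]:
    """Apply the fitted linear map to a read count signal."""
    return [slope * xi + intercept for xi in x]


# Toy data for a single hypothetical chromosome with four bins (made-up numbers).
toy_contacts = [
    [4, 2, 1, 1],
    [2, 8, 4, 2],
    [1, 4, 8, 3],
    [1, 2, 3, 2],
]
toy_cnv = [2.0, 4.0, 4.0, 2.0]  # hypothetical copy number per bin

signal = read_count_signal(toy_contacts)
slope, intercept = fit_linear(signal, toy_cnv)
inferred_cnv = predict(signal, slope, intercept)
```

Fitting one `(slope, intercept)` pair per chromosome mirrors the chromosome‐specific relation reported in the abstract. A graph‐based model such as a GCN would instead treat bins as nodes and contacts as weighted edges, which this sketch does not attempt.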
Keywords
copy number variant
deep learning
graph convolution network
Hi‐C
Corresponding Author(s):
Ximeng Liu, Hao Li, Xiaochen Bo
Issue Date: 09 October 2024