Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal Subscription Code 80-971

Quant. Biol. 2024, Vol. 12, Issue 3: 231-244. https://doi.org/10.1002/qub2.52
Effectiveness of machine learning at modeling the relationship between Hi‐C data and copy number variation
Yuyang Wang1,2, Yu Sun1, Zeyu Liu3, Bijia Chen1, Hebing Chen1, Chao Ren1, Xuanwei Lin2, Pengzhen Hu4, Peiheng Jia5, Xiang Xu1, Kang Xu6, Ximeng Liu2, Hao Li1,3, Xiaochen Bo1
1. Institute of Health Service and Transfusion Medicine, Beijing, China
2. College of Computer and Data Science, Fuzhou University, Fuzhou, China
3. Beijing Institute of Radiation Medicine, Beijing, China
4. School of Life Sciences, Northwestern Polytechnical University, Xi’an, China
5. School of Mathematics and Computer Science, Shanxi Normal University, Taiyuan, China
6. School of Software, Shandong University, Qingdao, China
Abstract

Copy number variation (CNV) refers to variation in the number of copies of a specific sequence in a genome and is a type of chromatin structural variation. The development of the Hi‐C technique has empowered research on the spatial structure of chromatin by capturing interactions between DNA fragments. We used machine‐learning methods, including a linear transformation model and a graph convolutional network (GCN), to detect CNV events from Hi‐C data and to reveal how CNV relates to three‐dimensional interactions between genomic fragments, in terms of both the one‐dimensional read count signal and features of the chromatin structure. The experimental results demonstrated a chromosome‐specific linear relation between the Hi‐C read count and CNV that is well described by the linear transformation model. In addition, the GCN‐based model could accurately extract features of the spatial structure from Hi‐C data and infer the corresponding CNV across different chromosomes in a cancer cell line. We performed a series of experiments, including dimension reduction, transfer learning, and Hi‐C data perturbation, to comprehensively evaluate the utility and robustness of the GCN‐based model. This work provides a benchmark for using machine learning to infer CNV from Hi‐C data and serves as a foundation for a deeper understanding of the relationship between Hi‐C data and CNV.
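
The idea of a linear relation between the one‐dimensional Hi‐C read count signal and CNV can be illustrated with a minimal sketch. This is not the authors' linear transformation model, whose exact form is not given in the abstract. It assumes, purely for illustration, that the read count of a genomic bin is the row sum of a contact matrix and that CNV is fitted by ordinary least squares on that signal. The toy data, the noise range, and the function names `bin_coverage`, `fit_line`, and `pearson` are all hypothetical.

```python
import math
import random


def bin_coverage(contacts):
    """Collapse a square contact matrix into a 1D signal: total contacts per bin."""
    return [sum(row) for row in contacts]


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


# Toy chromosome (assumption): each bin has an integer copy number, and the
# contact count between two bins scales with the product of their copy
# numbers, with a small multiplicative noise term.
random.seed(0)
n_bins = 40
copy_number = [random.choice([1, 2, 3, 4]) for _ in range(n_bins)]
contacts = [[0.0] * n_bins for _ in range(n_bins)]
for i in range(n_bins):
    for j in range(i, n_bins):
        value = copy_number[i] * copy_number[j] * random.uniform(0.9, 1.1)
        contacts[i][j] = contacts[j][i] = value  # keep the matrix symmetric

coverage = bin_coverage(contacts)
slope, intercept = fit_line(coverage, copy_number)
predicted = [slope * c + intercept for c in coverage]
r = pearson(predicted, copy_number)
print(f"slope={slope:.4f}, intercept={intercept:.4f}, Pearson r={r:.3f}")
```

In this toy setting the fit is done on a single simulated chromosome. Repeating it per chromosome would give one slope and intercept each, which is one simple way to read the "chromosome‐specific" relation described above.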

Keywords: copy number variant, deep learning, graph convolution network, Hi‐C
Corresponding Author(s): Ximeng Liu, Hao Li, Xiaochen Bo
Issue Date: 09 October 2024
 Cite this article:   
Yuyang Wang,Yu Sun,Zeyu Liu, et al. Effectiveness of machine learning at modeling the relationship between Hi‐C data and copy number variation[J]. Quant. Biol., 2024, 12(3): 231-244.
 URL:  
https://academic.hep.com.cn/qb/EN/10.1002/qub2.52
https://academic.hep.com.cn/qb/EN/Y2024/V12/I3/231