Prediction of chromatin looping using deep hybrid learning (DHL)
Mateusz Chiliński (1,2), Anup Kumar Halder (1,2), Dariusz Plewczynski (1,2)
1. Faculty of Mathematics and Information Sciences, Warsaw University of Technology, 00-662 Warsaw, Poland
2. Laboratory of Functional and Structural Genomics, Centre of New Technologies, University of Warsaw, 02-097 Warsaw, Poland
Abstract
Background: With the development of rapid and cheap sequencing techniques, the cost of whole-genome sequencing (WGS) has dropped significantly. However, the complexity of the human genome is not limited to its linear sequence, and additional experiments are required to understand how the genome influences complex traits. One of the most exciting topics for scientists today is the spatial organisation of the genome, which can be studied using chromatin conformation experiments (e.g., Hi-C, ChIA-PET). Information about spatial contacts supports such analyses and brings new insights into our understanding of disease development.
Methods: We used an ensemble that combines deep learning with classical machine learning algorithms. The deep learning network was DNABERT, which adapts the transformer-based BERT language model to genomic DNA sequences. The classical machine learning models included support vector machines (SVMs), random forests (RFs), and k-nearest neighbours (KNN). We refer to the combined approach as deep hybrid learning (DHL).
Results: We found that DNABERT can predict the chromatin interactions detected in ChIA-PET experiments with high precision. Additionally, the DHL approach improved the evaluation metrics on the CTCF and RNAPII datasets.
Conclusions: The DHL approach should be considered for models that rely on deep learning. Although conceptually straightforward, it can improve results significantly.
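The abstract does not state exactly how DNABERT's outputs are combined with the SVM, RF and KNN models. The sketch below assumes one common hybrid design: each sequence is turned into a feature vector, several classical classifiers are trained on those vectors, and their predictions are combined by majority vote. To keep it self-contained and runnable without model downloads, a normalised k-mer frequency vector stands in for the DNABERT embedding (DNABERT itself tokenises DNA into overlapping k-mers), the SVM is a linear one fitted with Pegasos-style subgradient steps, and a nearest-centroid classifier replaces the random forest as the third voter. All function names (`embed`, `dhl_predict`, etc.) are hypothetical and not taken from the authors' code.

```python
import math
import random
from collections import Counter
from itertools import product


def kmer_tokens(seq, k):
    """Split a DNA sequence into overlapping k-mers (the token scheme DNABERT uses)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def embed(seq, k=3):
    """Hypothetical stand-in for a DNABERT embedding: normalised k-mer frequencies."""
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(kmer_tokens(seq, k))
    total = sum(counts[w] for w in vocab) or 1  # k-mers containing N etc. are ignored
    return [counts[w] / total for w in vocab]


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(X, y, x, k=3):
    """k-nearest neighbours: majority label among the k closest training vectors."""
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i], x))[:k]
    return Counter(y[i] for i in nearest).most_common(1)[0][0]


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Linear SVM fitted by Pegasos-style subgradient descent on the hinge loss.

    Labels must be +1/-1. A constant feature of 1.0 acts as the bias term
    (for simplicity it is regularised along with the other weights).
    """
    rng = random.Random(seed)
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(Xb)), len(Xb)):  # shuffled pass over the data
            t += 1
            eta = 1.0 / (lam * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, Xb[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink weights (L2 penalty)
            if margin < 1.0:  # hinge loss is active: step towards this example
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def svm_predict(w, x):
    """Sign of the linear decision function w . [x, 1]."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def centroid_predict(X, y, x):
    """Nearest-centroid classifier, used here as a simple third voter."""
    best_label, best_d = None, float("inf")
    for label in set(y):
        members = [X[i] for i in range(len(X)) if y[i] == label]
        centroid = [sum(col) / len(members) for col in zip(*members)]
        d = dist(centroid, x)
        if d < best_d:
            best_label, best_d = label, d
    return best_label


def dhl_predict(train_seqs, labels, query_seq):
    """Hybrid prediction: embed sequences, then majority-vote three classical models."""
    X = [embed(s) for s in train_seqs]
    x = embed(query_seq)
    w = train_linear_svm(X, labels)
    votes = [knn_predict(X, labels, x), svm_predict(w, x), centroid_predict(X, labels, x)]
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic toy data (not from the paper): GC-rich vs AT-rich sequences.
    train = ["GCGCGGCCGCGC", "GGCCGCGCGGCC", "CGCGCCGGCGCG",
             "ATATTAATATAT", "AATTATATAATT", "TATAATTATATA"]
    labels = [1, 1, 1, -1, -1, -1]
    print(dhl_predict(train, labels, "GCCGCGGCGCGC"))  # expected: 1
```

With three voters and binary labels the vote cannot tie. In a real pipeline the `embed` stand-in would be replaced by features or class probabilities from a fine-tuned DNABERT model, and the voters by full SVM, RF and KNN implementations.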
Keywords: deep learning; 3D genomics; transformers; spatial organisation of nucleus; ChIA-PET; DNA-Seq
Corresponding author: Dariusz Plewczynski
Author note: * These authors contributed equally to this work.
Just Accepted Date: 10 February 2023
Online First Date: 13 March 2023
Issue Date: 21 June 2023