Quantitative Biology

ISSN 2095-4689

ISSN 2095-4697(Online)

CN 10-1028/TM

Postal distribution code 80-971

2024, Volume 12, Issue 3. Publication date: 2024-09-15

Effectiveness of machine learning at modeling the relationship between Hi‐C data and copy number variation
Yuyang Wang, Yu Sun, Zeyu Liu, Bijia Chen, Hebing Chen, Chao Ren, Xuanwei Lin, Pengzhen Hu, Peiheng Jia, Xiang Xu, Kang Xu, Ximeng Liu, Hao Li, Xiaochen Bo
Quantitative Biology. 2024, 12 (3): 231-244.  
https://doi.org/10.1002/qub2.52

Copy number variation (CNV) refers to variation in the number of copies of a specific sequence in a genome and is a type of chromatin structural variation. The development of the Hi‐C technique has empowered research on the spatial structure of chromatin by capturing interactions between DNA fragments. We used machine‐learning methods, including a linear transformation model and a graph convolutional network (GCN), to detect CNV events from Hi‐C data and to reveal how CNV is related to three‐dimensional interactions between genomic fragments in terms of the one‐dimensional read count signal and features of the chromatin structure. The experimental results demonstrated a chromosome‐specific linear relation between the Hi‐C read count and CNV that is well described by the linear transformation model. In addition, the GCN‐based model could accurately extract features of the spatial structure from Hi‐C data and infer the corresponding CNV across different chromosomes in a cancer cell line. We performed a series of experiments, including dimension reduction, transfer learning, and Hi‐C data perturbation, to comprehensively evaluate the utility and robustness of the GCN‐based model. This work can provide a benchmark for using machine learning to infer CNV from Hi‐C data and serve as a foundation for a deeper understanding of the relationship between Hi‐C data and CNV.
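
The per‐chromosome linear relation described in this abstract can be illustrated with an ordinary least‐squares fit. The following is a minimal sketch, not the authors' linear transformation model: the function names, the toy read counts, and the toy copy numbers are all hypothetical, and a simple one‐variable fit is assumed.

```python
from statistics import mean


def fit_linear(read_counts, copy_numbers):
    """Least-squares fit of copy_number ~ slope * read_count + intercept."""
    x_bar, y_bar = mean(read_counts), mean(copy_numbers)
    sxx = sum((x - x_bar) ** 2 for x in read_counts)
    if sxx == 0:
        raise ValueError("read counts must not all be identical")
    sxy = sum((x - x_bar) * (y - y_bar)
              for x, y in zip(read_counts, copy_numbers))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(read_counts, slope, intercept):
    """Apply a fitted line to per-bin read counts."""
    return [slope * x + intercept for x in read_counts]


# Hypothetical toy data: (per-bin Hi-C read counts, per-bin copy numbers).
toy_bins = {
    "chr1": ([100.0, 200.0, 300.0, 400.0], [1.0, 2.0, 3.0, 4.0]),
    "chr2": ([100.0, 200.0, 300.0], [2.0, 3.0, 4.0]),
}

# One independent fit per chromosome, mirroring the chromosome-specific relation.
models = {chrom: fit_linear(x, y) for chrom, (x, y) in toy_bins.items()}
```

Fitting each chromosome separately keeps the slope and intercept free to differ between chromosomes, which is the property the abstract reports.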

Comprehensive cross cancer analyses reveal mutational signature cancer specificity
Rui Xin, Limin Jiang, Hui Yu, Fengyao Yan, Jijun Tang, Yan Guo
Quantitative Biology. 2024, 12 (3): 245-254.  
https://doi.org/10.1002/qub2.49

Mutational signatures are distinct patterns of DNA mutations that occur in a specific context or under certain conditions. They are a powerful tool for describing cancer etiology. We conducted a study to show cancer heterogeneity and cancer specificity from the perspective of mutational signatures through collinearity analysis and machine learning techniques. Through thorough training and independent validation, our results show that while the majority of the mutational signatures are distinct, similarities between certain mutational signature pairs can be observed in both mutation patterns and mutational signature abundance. This observation may help determine the etiology of as‐yet elusive mutational signatures. Further analysis using machine learning approaches demonstrated moderate mutational signature cancer specificity. Among all cancer types, skin cancer demonstrated the strongest mutational signature specificity.
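
One common way to compare two signature profiles is cosine similarity between their mutation‐pattern vectors. The sketch below uses that measure as a stand‐in; the abstract does not say how its collinearity analysis is computed, so this is not presented as the authors' method. The signature names, the four‐channel toy vectors, and the 0.9 threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equally long, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must not be all zeros")
    return dot / (norm_a * norm_b)


def similar_pairs(signatures, threshold):
    """Return (name_1, name_2, score) for every pair scoring at or above threshold."""
    names = sorted(signatures)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = cosine_similarity(signatures[first], signatures[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Hypothetical toy profiles over four mutation channels (each sums to 1).
toy_signatures = {
    "sig_A": [0.7, 0.2, 0.1, 0.0],
    "sig_B": [0.6, 0.3, 0.1, 0.0],
    "sig_C": [0.0, 0.0, 0.1, 0.9],
}

pairs = similar_pairs(toy_signatures, threshold=0.9)
```

With these toy vectors only sig_A and sig_B exceed the threshold, which mirrors the abstract's point that most signatures are distinct while a few pairs resemble each other.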

A substructure‐aware graph neural network incorporating relation features for drug–drug interaction prediction
Liangcheng Dong, Baoming Feng, Zengqian Deng, Jinlong Wang, Peihao Ni, Yuanyuan Zhang
Quantitative Biology. 2024, 12 (3): 255-270.  
https://doi.org/10.1002/qub2.66

Identifying drug–drug interactions (DDIs) is an important aspect of drug design research, and predicting DDIs is crucial for avoiding potential adverse effects. Current substructure‐based prediction methods still have some limitations: (ⅰ) The process of substructure extraction does not fully exploit the graph structure information of drugs, as it only evaluates the importance of different radius substructures from a single perspective. (ⅱ) The process of constructing drug representations overlooks the significant impact of relation embedding on optimizing drug representations. In this work, we propose a substructure‐aware graph neural network incorporating relation features (RFSA‐DDI) for DDI prediction, which introduces a directed message passing neural network with a substructure attention mechanism based on graph self‐adaptive pooling (GSP‐DMPNN) and a substructure‐aware interaction module incorporating relation features (RSAM). GSP‐DMPNN uses graph self‐adaptive pooling to comprehensively consider node features and local drug information for adaptive extraction of substructures. RSAM lets drug features and relation representations interact to enhance each of them, highlighting substructures that significantly impact predictions. RFSA‐DDI is evaluated on two real‐world datasets. Compared to existing methods, RFSA‐DDI demonstrates certain advantages in both transductive and inductive settings, effectively handling the task of predicting DDIs for unseen drugs and exhibiting good generalization capability. The experimental results show that RFSA‐DDI captures valuable structural information of drugs more accurately for DDI prediction and provides more reliable assistance for detecting potential DDIs in the drug development and treatment stages.

Characterizing diseases using genetic and clinical variables: A data analytics approach
Madhuri Gollapalli, Harsh Anand, Satish Mahadevan Srinivasan
Quantitative Biology. 2024, 12 (3): 271-285.  
https://doi.org/10.1002/qub2.46

Predictive analytics is crucial in precision medicine for personalized patient care. To aid in precision medicine, this study identifies a subset of genetic and clinical variables that can serve as predictors for classifying diseased tissues/disease types. To achieve this, experiments were performed on diseased tissues obtained from the L1000 dataset to assess differences in the functionality and predictive capabilities of genetic and clinical variables. In this study, the k‐means technique was used for clustering the diseased tissue types, and the multinomial logistic regression (MLR) technique was applied for classifying the diseased tissue types. Dimensionality reduction techniques, including principal component analysis and Boruta, were used extensively to reduce the dimensionality of genetic and clinical variables. The results showed that landmark genes performed slightly better in clustering diseased tissue types than any random set of 978 non‐landmark genes, and the difference was statistically significant. Furthermore, it was evident that both clinical and genetic variables were important in predicting the diseased tissue types. The top three clinical predictors for predicting diseased tissue types were identified as morphology, gender, and age at diagnosis. Additionally, this study explored the possibility of using the latent representations of the clusters of landmark and non‐landmark genes as predictors for an MLR classifier. The classification models built using MLR revealed that landmark genes can serve as a subset of genetic variables and/or as a proxy for clinical variables. This study concludes that combining predictive analytics with dimensionality reduction effectively identifies key predictors in precision medicine, enhancing diagnostic accuracy.

Mathematical modeling of evolution of cell networks in epithelial tissues
Ivan Krasnyakov
Quantitative Biology. 2024, 12 (3): 286-300.  
https://doi.org/10.1002/qub2.62

Epithelial cell networks imply a packing geometry characterized by various cell shapes and by distributions of the number of cell neighbors and of cell areas. Although such simple characteristics describe cell sheets, the formation of bubble‐like cells during the morphogenesis of epithelial tissues remains poorly understood. This study proposes a topological mathematical model of morphogenesis in a squamous epithelium. We introduce a new potential that takes into account not only the elasticity of cell perimeter and area but also the elasticity of the cells' internal angles. Additionally, we incorporate an integral equation for chemical signaling, allowing us to consider chemo‐mechanical cell interactions. The model also takes into account essential processes in real epithelia, such as cell proliferation and intercalation. The presented mathematical model has yielded novel insights into the packing of epithelial sheets. It has been found that there are two main states: one consists of cells of the same size, and the other consists of “bubble” cells. An example is provided of how chemo‐mechanical interactions can be accounted for in a multicellular environment. The introduction of a parameter determining the flexibility of cell shapes enables the modeling of more complex cell behaviors, such as changes in cell phenotype. The developed mathematical model of morphogenesis of squamous epithelium allows progress in understanding the processes by which cell networks form. The results obtained from mathematical modeling are important for understanding the mechanisms of morphogenesis and development of epithelial tissues. Additionally, the results can be applied in developing methods to influence morphogenetic processes in medical applications.
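
To make the idea of a potential with perimeter, area, and internal‐angle elasticity concrete, the sketch below evaluates one such energy for a single polygonal cell. The functional form is an assumption, not the paper's potential: each term is taken to be harmonic, the stiffness constants `k_area`, `k_perimeter`, and `k_angle` are hypothetical, the target angle is assumed to be that of a regular polygon with the same number of vertices, and the cell is assumed to be convex.

```python
import math


def polygon_area(vertices):
    """Shoelace formula for a simple polygon given as (x, y) tuples."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def polygon_perimeter(vertices):
    """Sum of edge lengths around the polygon."""
    n = len(vertices)
    return sum(math.dist(vertices[i], vertices[(i + 1) % n]) for i in range(n))


def interior_angles(vertices):
    """Interior angle at each vertex; valid for convex polygons only."""
    n = len(vertices)
    angles = []
    for i in range(n):
        px, py = vertices[i - 1]
        cx, cy = vertices[i]
        nx, ny = vertices[(i + 1) % n]
        ux, uy = px - cx, py - cy
        vx, vy = nx - cx, ny - cy
        cross = ux * vy - uy * vx
        dot = ux * vx + uy * vy
        angles.append(math.atan2(abs(cross), dot))
    return angles


def cell_energy(vertices, target_area, target_perimeter,
                k_area=1.0, k_perimeter=1.0, k_angle=1.0):
    """Assumed harmonic energy: area term + perimeter term + angle term."""
    n = len(vertices)
    target_angle = math.pi * (n - 2) / n  # regular-polygon angle (assumption)
    area_term = 0.5 * k_area * (polygon_area(vertices) - target_area) ** 2
    perimeter_term = 0.5 * k_perimeter * (
        polygon_perimeter(vertices) - target_perimeter) ** 2
    angle_term = 0.5 * k_angle * sum(
        (a - target_angle) ** 2 for a in interior_angles(vertices))
    return area_term + perimeter_term + angle_term


# Hypothetical cells: a unit square at its targets, and a stretched 2 x 1 cell.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
rectangle = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
e_square = cell_energy(square, target_area=1.0, target_perimeter=4.0)
e_rectangle = cell_energy(rectangle, target_area=1.0, target_perimeter=4.0)
```

In this sketch a cell that matches all three targets has zero energy, and any deviation in area, perimeter, or corner angles raises it. Lowering `k_angle` would let cells deform their corners more freely, which loosely corresponds to the shape‐flexibility parameter the abstract mentions.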

In silico designing and optimization of anti‐epidermal growth factor receptor scaffolds by complementary‐determining regions‐grafting technique
Razieh Rezaei Adriani, Seyed Latif Mousavi Gargari, Hamid Bakherad, Jafar Amani
Quantitative Biology. 2024, 12 (3): 301-312.  
https://doi.org/10.1002/qub2.63

Monoclonal antibodies, which bind specifically to their targets through their complementary‐determining regions (CDRs), are attractive therapeutic agents for a wide range of human disorders. Small proteins with structurally preserved CDRs are promising antibody mimetics. In this in silico study, we present new antibody mimetics against the cancer marker epidermal growth factor receptor (EGFR) created by the CDR‐grafting technique. Ten potential graft acceptor sites that efficiently immobilize the grafted CDR loops were selected computationally from three small protein scaffolds. The three CDR loops most involved in antibody‐receptor interactions, extracted from the crystal structure of the panitumumab antibody bound to EGFR domain III, were then grafted to the selected scaffolds through the loop randomization technique. The combination of three CDR loops and 10 grafting sites revealed that three of the 36 combinations showed specific binding to EGFR DIII by binding energy calculations. Thus, the present strategy and selected small protein scaffolds are promising tools in the design of new binders against EGFR with high binding energy.

A penalized integrative deep neural network for variable selection among multiple omics datasets
Yang Li, Xiaonan Ren, Haochen Yu, Tao Sun, Shuangge Ma
Quantitative Biology. 2024, 12 (3): 313-323.  
https://doi.org/10.1002/qub2.51

Deep learning has become increasingly popular in omics data analysis. Recent works incorporating variable selection into deep learning have greatly enhanced model interpretability. However, because deep learning requires a large sample size, the existing methods may produce uncertain findings when the dataset has a small sample size, as is common in omics data analysis. With the explosion and availability of omics data from multiple populations/studies, the existing methods naively pool them into one dataset to enlarge the sample size while ignoring that variable structures can differ across datasets, which might lead to inaccurate variable selection results. We propose a penalized integrative deep neural network (PIN) to simultaneously select important variables from multiple datasets. PIN directly aggregates multiple datasets as input and considers both homogeneity and heterogeneity among multiple datasets in an integrative analysis framework. Results from extensive simulation studies and applications of PIN to gene expression datasets from elders with different cognitive statuses or ovarian cancer patients at different stages demonstrate that PIN outperforms existing methods with considerably improved performance among multiple datasets. The source code is freely available on GitHub (rucliyang/PINFunc). We speculate that the proposed PIN method will promote the identification of disease‐related important variables based on multiple studies/datasets from diverse origins.

Assessing the inhibition efficacy of clinical drugs against the main proteases of SARS‐CoV‐2 variants and other coronaviruses
Wenlong Zhao, Cecylia S. Lupala, Shifeng Hou, Shuxin Yang, Ziqi Yan, Shujie Liao, Xuefei Li, Nan Li
Quantitative Biology. 2024, 12 (3): 324-328.  
https://doi.org/10.1002/qub2.60

CShaperApp: Segmenting and analyzing cellular morphologies of the developing Caenorhabditis elegans embryo
Jianfeng Cao, Lihan Hu, Guoye Guan, Zelin Li, Zhongying Zhao, Chao Tang, Hong Yan
Quantitative Biology. 2024, 12 (3): 329-334.  
https://doi.org/10.1002/qub2.47

Caenorhabditis elegans has been widely used as a model organism in developmental biology due to its invariant development. In this study, we developed a desktop application, CShaperApp, to segment fluorescence‐labeled images of cell membranes and analyze cellular morphologies interactively during C. elegans embryogenesis. Based on the previously proposed framework CShaper, CShaperApp empowers biologists to automatically and efficiently extract quantitative cellular morphological data with either an existing deep learning model or a fine‐tuned one adapted to their in‐house dataset. Experimental results show that it takes about 30 min to process a three‐dimensional time‐lapse (4D) dataset, which consists of 150 image stacks at a ~1.5‐min interval and covers C. elegans embryogenesis from the 4‐cell to 350‐cell stages. The robustness of CShaperApp is also validated with datasets from different laboratories. Furthermore, the modularized implementation increases flexibility in multi‐task applications and makes future enhancements easier. As cell morphology over development has emerged as a focus of interest in developmental biology, CShaperApp is anticipated to pave the way for those studies by accelerating the high‐throughput generation of systems‐level quantitative data. The software can be freely downloaded from GitHub (cao13jf/CShaperApp) and is executable on Windows, macOS, and Linux operating systems.

9 articles