Applications of integrative OMICs approaches to gene regulation studies
Jing Qin1, Bin Yan2,3, Yaohua Hu4, Panwen Wang5, Junwen Wang5,6
1. School of Life Sciences, The Chinese University of Hong Kong, Hong Kong SAR 999077, China
2. Laboratory for Food Safety and Environmental Technology, Institutes of Biomedicine and Biotechnology, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
3. School of Biomedical Sciences, The University of Hong Kong, Hong Kong SAR 999077, China
4. College of Mathematics and Statistics, Shenzhen University, Shenzhen 518060, China
5. Department of Health Sciences Research and Center for Individualized Medicine, Mayo Clinic, Scottsdale, AZ 85259, USA
6. Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ 85259, USA
Background: Functional genomics employs dozens of OMICs technologies to explore the functions of DNA, RNA and protein regulators in gene regulation processes. Although each of these technologies is powerful on its own, any single technology, like a blind man in the parable of the blind men and the elephant, has a limited ability to depict the complex regulatory system. Integrative OMICs approaches have emerged and become an important area in biology and medicine. They provide a precise and effective way to study gene regulation.

Results: This article reviews currently popular OMICs technologies, OMICs data integration strategies, and bioinformatics tools used for multi-dimensional data integration. We highlight the advantages of these methods, particularly in elucidating the molecular basis of biological regulatory mechanisms.

Conclusions: To better understand the complexity of biological processes, we need powerful bioinformatics tools to integrate these OMICs data. Integrating multi-dimensional OMICs data will generate novel insights into system-level gene regulation and serve as a foundation for further hypothesis-driven research.

Author Summary: Dozens of OMICs technologies have been developed to dissect the functions of DNA, RNA and protein in gene regulation. Each of these technologies is powerful on its own. However, like a blind man in the parable of the blind men and the elephant, any single technology has a limited ability to depict the whole complex regulatory system. Integrative approaches have emerged and become an important area in biology and medicine. They provide a precise and effective way to elucidate regulatory mechanisms. This article reviews the popular OMICs technologies, OMICs data integration strategies, and the bioinformatics tools used for multi-dimensional data integration.
Keywords: gene regulatory networks; integrative analysis; OMICs; ChIP-seq; RNA-seq
Fig.1  High-throughput OMICs data describing the complex gene regulatory system.
Technology | Data type | Description
Protein array | Kinase-substrate interactome | Protein arrays track the interactions and activities of proteins [2]
IP-MS | Protein-protein interactome | Immunoprecipitation followed by mass spectrometry identifies interacting partners of a protein of interest [14]
Y2H | Protein-protein interactome | Yeast two-hybrid screening discovers protein-protein interactions [4,15]
TAP-MS | Protein-protein interactome | Tandem affinity purification combined with mass spectrometry identifies components of protein complexes [3]
MS | Metabolome | Mass spectrometry measures the consumption and release of metabolites [16]
ChIP-seq/chip | Protein-DNA interactome | Chromatin immunoprecipitation coupled with sequencing or microarray reveals the repertoire of in vivo protein (TF, chromatin modifier, etc.) binding sites on the genome [17]
ChIP-exo | Protein-DNA interactome | Chromatin immunoprecipitation combined with exonuclease digestion achieves high-resolution mapping of protein binding sites [18]
BS-seq | DNA methylome | Bisulfite sequencing uses bisulfite treatment of DNA to determine its methylation pattern [19]
MSCC | DNA methylome | Methyl-sensitive cut counting, a whole-genome methylation profiling method based on the sensitivity of CCGG sites to restriction enzymes [20]
MeDIP | DNA methylome | Methylated DNA immunoprecipitation enriches methylated DNA sequences, which are then detected by microarray or sequencing [21]
BSPP | DNA methylome | Bisulfite padlock probes, a targeted method that isolates selected locations for methylation profiling [20]
RRBS | DNA methylome | Reduced representation bisulfite sequencing combines restriction enzymes and bisulfite sequencing to enrich CpG-rich sequences and measure their methylation levels [22]
Methyl-MAPS | DNA methylome | Methylation mapping analysis by paired-end sequencing [23]
DNase-seq | Open chromatin region | DNase I hypersensitive sites sequencing identifies the location of regulatory regions, based on genome-wide sequencing of regions hypersensitive to cleavage by DNase I [24]
Sono-seq | Open chromatin region | Method for isolating protein-free genomic regions by sonication of chromatin combined with a size-selection step and massively parallel short-read sequencing [25]
ATAC-seq | Open chromatin region | Assay for detecting transposase-accessible chromatin using sequencing [26]
FAIRE-seq | Open chromatin region | Formaldehyde-assisted isolation of regulatory elements determines the sequences of regulatory regions [27]
NOMe-seq | Nucleosome occupancy and methylome | A high-resolution single-molecule mapping approach to simultaneously investigate endogenous DNA methylation and nucleosome occupancy [28]
MNase-seq | Nucleosome occupancy | Paired-end sequencing of micrococcal nuclease-digested chromatin determines genome-wide nucleosome occupancy [29]
Hi-C/4C/5C | Long-range DNA interactome | Techniques to detect chromatin interactions based on chromosome conformation capture [6]
ChIA-PET | Long-range DNA interactome | Chromatin interaction analysis by paired-end tag sequencing determines long-range interactions [6]
RNA-seq | Transcriptome | RNA sequencing measures the abundance of RNA transcripts [30]
Microarray | Transcriptome | Microarrays contain probes designed for RNA transcripts, on which hybridization signals indicate the abundance of RNA transcripts [31]
GRO-seq | Transcriptome | Global run-on sequencing identifies the genes that are being transcribed [32]
CLIP-seq | Protein-RNA interactome | Sequencing of RNA isolated by crosslinking immunoprecipitation maps protein-RNA binding sites in vivo [33]
Degradome-seq | Protein-RNA interactome | Degradome sequencing, also known as parallel analysis of RNA ends (PARE) sequencing, detects cDNA ends, which indicate mRNAs degraded under miRNA regulation [34]
RIP-seq | Protein-RNA interactome | RNA-immunoprecipitation sequencing captures protein-bound RNAs [12]
CLASH | miRNA-RNA interactome | Crosslinking, ligation, and sequencing of hybrids directly maps miRNA-RNA interactions [35]
dChIRP-seq | RNA-RNA, RNA-DNA and protein-RNA interactome | Domain-specific chromatin isolation by RNA purification followed by sequencing dissects pairwise RNA-RNA, RNA-protein and RNA-chromatin interactions that reveal lncRNA architecture and function [11]
Structure-seq | RNA structure | A high-throughput, quantitative method that provides genome-wide information on RNA structure at single-nucleotide resolution [36]
CGH | Genome | Comparative genomic hybridization surveys copy number variations [37]
DNA-seq | Genome | DNA sequencing for genome assembly, or resequencing for detection of variations [38]
Exome-seq | Genome | DNA sequencing of exomes for detection of variations [39]
SILAC | Proteome | Stable isotope labeling with amino acids in cell culture quantitatively detects differences in protein abundance among samples using non-radioactive isotopic labeling [40]
PRISM | Proteome | Proteomic investigation strategy for mammals [41]
DEEP SEQ MS | Proteome | Deep efficient peptide sequencing and quantification mass spectrometry [42]
Ribo-seq | Proteome | Ribosome profiling determines the mRNAs that are being actively translated and measures translation efficiency [8]
Fig.2  Statistics of OMICs studies.
Signaling pathway/network analysis
iPEAP [79] | Integrative Pathway Enrichment Analysis Platform aggregates transcriptome, proteome, metabolome and GWAS data to detect enriched signaling pathways
PathFinder [80] | Starting from a known PPI network, PathFinder uses transcriptome data, protein subcellular localization and sequence information to filter false positives, and incorporates protein families to recover false-negative pairs
bioPIXIE [81] | An integrative system that combines PPI network and gene expression data to find pathways
SPINE [82] | Signaling-regulatory Pathway INferencE constructs PPI and protein-DNA interaction networks, and uses an integer programming solver to derive final pathways from knockout expression data
ResponseNet [83,84] | ResponseNet uses a network-optimization approach to detect both signaling and regulatory networks by integrating genetic screen, transcriptome and PPI network data
MINDy [85] | Modulator inference by network dynamics (MINDy) facilitates genome-wide identification of cell-specific post-translational modulators of TF activity, which helps dissect the cross-talk between signaling pathways and transcriptional regulation
Zhu and Guan [86] | Markov chain theory is applied to detect signaling networks using known phosphorylation networks
CEASAR [2] | A comprehensive integrative approach incorporating functional protein microarrays, mass spectrometry and bioinformatics to detect cell-specific genome-wide phosphorylation networks
SNPLS [87] | Sparse Network-regularized Partial Least Square identifies gene-drug modules from large-scale pairwise gene-expression and drug-response data

TF-gene regulation
ChIP-Array [88] | A web server that integrates ChIP-seq/chip and expression data to detect direct and indirect target genes of a TF of interest
ChIP-Array 2 [89] | An enhanced version of ChIP-Array that integrates long-range chromatin interaction, open chromatin region and histone modification data to dissect more comprehensive GRNs involving diverse regulatory components
BETA [90] | Binding and Expression Target Analysis ranks the direct targets of a TF based on two probability rankings derived from ChIP-seq/chip and transcriptome data
LpRGNI [91] | Infers gene regulatory networks by integrating ChIP-seq/chip and transcriptome data via LASSO-type regularization methods
EMBER [92] | Expectation Maximization of Binding and Expression pRofiles, an unsupervised machine learning algorithm that searches for enriched gene expression patterns around TFBSs
ChIPXpress [93] | An R package that ranks TF targets by integrating ChIP-seq/chip data with large amounts of publicly available gene expression data
Tang et al. [94] | Bayesian statistical modeling and modularity analysis integrate time-series ChIP-seq and gene expression data to construct a dynamic regulatory network for any given TF
Yan et al. [95] | A bioinformatics method to uncover interactive relationships between TFs or microRNAs and genes based on matrix decomposition modeling under the joint constraints of sparseness and regulator-target connectivity
Pique-Regi et al. [96] | A probabilistic framework that integrates epigenetic data with genomic information to draw a genome-wide map of tissue-specific TFBSs

miRNA-gene regulation
GenMiR++ [97] | GenMiR++ uses a Bayesian model to predict miRNA targets based on both sequence features of the mRNA 3′ UTR and the correlation between the expression of a miRNA and that of its targets
mirAct [98] | A tool designed to investigate miRNA activity from gene-expression data by exploiting the negative regulatory relationship between miRNAs and their target genes
MMIA [99] | MiRNA and MRNA Integrated Analysis is a web server that integrates miRNA and mRNA expression data with predicted miRNA target information for analyzing miRNA-associated phenotypes and biological functions
MAGIA [100] | MiRNA and Genes Integrated Analysis is a web tool for the integrative analysis of miRNA target predictions
ProteoMirExpress [101] | A web server that combines proteome and transcriptome data to infer miRNA-centered regulatory networks

Epigenetic regulation
EpiRegNet [102] | A web server that detects important histone modifications affecting the expression of genes
CMGRN [103] | Constructing Multilevel Gene Regulatory Networks uses Bayesian network modeling to infer causal interrelationships among transcription factors and epigenetic modifications

Multi-dimensional integration
mirConnX [104] | Condition-specific mRNA-miRNA network integrator uses TF binding in the promoter regions of miRNAs and mRNAs, as well as predicted miRNA targets, to construct a TF-miRNA-gene regulatory network
SNMNMF [105] | A Sparse Network-Regularized Multiple Non-negative Matrix Factorization framework integrates miRNA and mRNA profiles, as well as miRNA-mRNA, TF-gene and PPI networks, to identify modular patterns
jNMF [106] | Joint Non-negative Matrix Factorization framework identifies multi-dimensional modules
sMBPLS [107] | Sparse Multi-Block Partial Least Squares regression method identifies regulatory modules from multiple OMICs data
PTHGRN [108] | Post-translational hierarchical GRN constructs a multi-layer network using a graphical Gaussian model with a partial least squares regression-based methodology
LRAcluster [109] | A method using a low-rank approximation based integrative probabilistic model to perform fast dimension reduction and unsupervised clustering of large-scale multi-OMICs data
Tab.2  Current bioinformatics tools for integrative analysis of OMICs data.
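
Several of the TF-gene tools in Tab. 2 combine a binding-derived ranking with an expression-derived ranking to prioritize target genes. The following Python sketch illustrates that general idea with a plain normalized rank product. It is not the scoring scheme of BETA, ChIPXpress or any other listed tool; the gene names, scores and the function names `rank_desc` and `rank_product` are hypothetical and exist only for illustration.

```python
def rank_desc(scores):
    """Return 1-based ranks (1 = largest score); ties keep input order."""
    order = sorted(scores, key=lambda gene: -scores[gene])
    return {gene: i + 1 for i, gene in enumerate(order)}


def rank_product(binding, expression_change):
    """Combine two evidence rankings for genes present in both inputs.

    binding: gene -> binding score (assumed: higher = stronger TF binding)
    expression_change: gene -> log fold change after TF perturbation
    Returns (gene, score) pairs sorted so the strongest candidates come first.
    """
    shared = [g for g in binding if g in expression_change]
    n = len(shared)
    b_rank = rank_desc({g: binding[g] for g in shared})
    # Use the magnitude of change, so activated and repressed genes both count.
    e_rank = rank_desc({g: abs(expression_change[g]) for g in shared})
    # Normalized rank product: a smaller value means stronger combined evidence.
    combined = {g: (b_rank[g] / n) * (e_rank[g] / n) for g in shared}
    return sorted(combined.items(), key=lambda item: item[1])


# Hypothetical toy data, not taken from any study.
binding = {"geneA": 9.0, "geneB": 2.0, "geneC": 5.0, "geneD": 0.5}
expression_change = {"geneA": -3.0, "geneB": 0.3, "geneC": 2.0, "geneD": -0.2}

ranked = rank_product(binding, expression_change)
for gene, score in ranked:
    print(gene, score)
```

A gene scores well only when it ranks highly in both data types, which is the rationale for integrating binding and expression evidence rather than relying on either alone.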
Fig.3  Integrative OMICs approaches have profoundly changed the strategy of basic biological research and are playing significant roles in medical fields.