v0.1 cell line networks
v0.2 aggregate tissue networks
v0.3 TF enrichment tool
v0.4 single-sample tissue networks
v0.5 CLUEreg tool
v0.6 colon cancer networks
v0.7 drug networks
v0.7.1 API v1
v0.8 liver, cervix, and breast cancer networks
v0.9 miRNA tissue networks
v0.9.1 bootstrap 5
v1.0 glioblastoma networks
v1.0.1 drug combinations
v1.1 Cell line networks
v1.2 User upload network
v1.2.1 Gene targeting table
v1.2.2 DRAGON miRNA network
v1.3 Network comparison
The networks can be reconstructed using the tool specified in the tool field of the data table. For example, the LCL cell line network can be built using the PANDA tool as implemented in the package netZooM version 0.1. The other fields in the data tables refer to the arguments or data sources needed by the tool. For example, PANDA requires a TF-TF PPI network, gene expression samples, and TF motif binding sites.
These data files, along with their sources, are provided so that the network can be reconstructed. Since PANDA is a deterministic algorithm, reconstructing the network using the same prior data and the same software release should yield exactly the same network as the one available for visualization and download.
The networks can be downloaded with two different properties.
Edg: stands for edges format. The network is written in a text or csv file with the following format:
node1 node2 edge_weight
e.g., A B 1.0
or in the following format for multi-sample files:
Adj: stands for Adjacency matrix format. The bipartite network is saved as the weighted adjacency matrix W(TFs,Genes).
gt: stands for gene targeting. Gene targeting is the sum of weighted in-degrees in the network.
tt: stands for TF targeting. TF targeting is the sum of weighted out-degrees in the network.
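To make the relation between the Edg and Adj formats concrete, here is a minimal Python sketch that reads edge lines and rebuilds the weighted adjacency matrix W(TFs,Genes). It rests on assumptions not stated above: node1 is the TF and node2 is the gene, the file has no header row, and fields are separated by whitespace or commas. The function names are illustrative and are not part of GRAND or netZoo.

```python
def edges_to_adjacency(lines):
    """Parse 'node1 node2 edge_weight' lines into a nested dict W[tf][gene]."""
    adjacency = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: node1 is the TF and node2 is the gene.
        # Commas are turned into spaces so text and csv files both parse.
        tf, gene, weight = line.replace(",", " ").split()
        adjacency.setdefault(tf, {})[gene] = float(weight)
    return adjacency


def to_matrix(adjacency):
    """Return (tfs, genes, matrix) with matrix[i][j] = weight of tfs[i] -> genes[j]."""
    tfs = sorted(adjacency)
    genes = sorted({gene for row in adjacency.values() for gene in row})
    # The networks are complete bipartite graphs, so every (TF, gene) pair
    # is expected to be present; a missing edge raises KeyError.
    matrix = [[adjacency[tf][gene] for gene in genes] for tf in tfs]
    return tfs, genes, matrix
```

For a real download, the lines could come from iterating over an open file handle; large networks are better handled with a dedicated numerical library.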
The edge weights are computed by the PANDA, PUMA, OTTER, and LIONESS algorithms. They usually vary between -20 and 20.
A large positive value indicates a high likelihood that an edge exists between two nodes; a large negative value indicates a low likelihood of interaction between the two nodes. For DRAGON, edge weights are partial correlations between the nodes.
Download options are available in different formats. For example, one option lets you download a network in edges format with the original edge weights.
Important! All gene regulatory networks are complete bipartite graphs of genes*TFs edges.
The following brief guides show how to access a specific network and browse the phenotypic variables.
Small molecule drugs
The analysis section lets you analyze a set of TFs or genes in the disease or small molecule category. It exploits the duality between TFs and genes in bipartite gene regulatory networks.
The small molecule analysis section finds compounds that optimally reverse the gene targeting or the transcription factor activity patterns in the query set.
Gene targeting refers to the weighted in-degree of a given gene. Since PANDA is usually validated against ChIP-seq data, targeting can be interpreted as the binding profile of TFs for a given gene in a particular experiment. Transcription factor targeting refers to the weighted out-degree of a given transcription factor. This tool serves for hypothesis generation: compounds that reverse or aggravate the gene/TF targeting patterns of a given experiment are hypothetical candidates for experimental validation.
The overlap score is equal to
card(intersect(Input_Genes_Up, Drug_Genes_Down)) + card(intersect(Input_Genes_Down, Drug_Genes_Up)) - card(intersect(Input_Genes_Up, Drug_Genes_Up)) - card(intersect(Input_Genes_Down, Drug_Genes_Down)),
where card is the number of elements in a set.
The cosine similarity compares the input vector to the vector of each drug. It is a signed measure that captures the direction of the vectors rather than their amplitude. Here, we are interested in vectors pointing in opposite directions, that is, those with negative cosine similarity.
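The two scores can be illustrated with a short Python sketch. It follows the overlap formula as written and the standard definition of cosine similarity. How GRAND encodes the input and drug vectors is not specified here, so the plain lists of numbers below are an assumption, and the function names are illustrative.

```python
import math


def overlap_score(input_up, input_down, drug_up, drug_down):
    """Count reversed genes minus genes changed in the same direction."""
    input_up, input_down = set(input_up), set(input_down)
    drug_up, drug_down = set(drug_up), set(drug_down)
    # Genes up in the input and down under the drug (or the reverse).
    reversed_count = len(input_up & drug_down) + len(input_down & drug_up)
    # Genes moving in the same direction in the input and under the drug.
    matched_count = len(input_up & drug_up) + len(input_down & drug_down)
    return reversed_count - matched_count


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)
```

A drug that fully reverses the query has a positive overlap score and a cosine similarity close to -1, while a drug that mimics the query has a negative overlap score and a cosine similarity close to 1.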
To enable network visualization, please use Google Chrome while navigating GRAND. The visualization module has two components. The first plots and queries subgraphs of gene regulatory networks based on user selection. The second computes targeting scores from gene regulatory networks.
PUMA is an implementation of PANDA that infers miRNA-to-gene bipartite networks by integrating two data sources: 1) gene coexpression and 2) miRNA target networks. Details and examples can be found here.
LIONESS estimates single-sample gene regulatory networks from one aggregate network such as a PANDA, PUMA, or coexpression network. Details and examples can be found here.
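As a rough illustration of the idea, the published LIONESS method estimates the network of sample q by linear interpolation between the aggregate network built from all N samples and the one built without sample q: e(q) = N * (e(all) - e(all without q)) + e(all without q). The sketch below applies that equation to any aggregation function. The toy `mean_network` aggregator is a hypothetical stand-in for PANDA, PUMA, or coexpression, chosen only because it is linear and therefore recovers each sample exactly; it is not how netZoo implements LIONESS.

```python
def lioness(samples, aggregate):
    """Estimate one network per sample from an aggregate network method.

    samples   : list of per-sample inputs
    aggregate : function mapping a list of samples to {edge: weight}
    """
    n = len(samples)
    if n < 2:
        raise ValueError("at least two samples are needed")
    full = aggregate(samples)
    networks = []
    for q in range(n):
        # Aggregate network rebuilt without sample q.
        without_q = aggregate(samples[:q] + samples[q + 1:])
        networks.append({
            edge: n * (full[edge] - without_q[edge]) + without_q[edge]
            for edge in full
        })
    return networks


def mean_network(samples):
    """Toy aggregator (hypothetical): average edge weight across samples."""
    return {
        edge: sum(sample[edge] for sample in samples) / len(samples)
        for edge in samples[0]
    }
```

With a nonlinear aggregator such as Pearson correlation or PANDA, the result is an estimate of each sample's contribution rather than an exact reconstruction.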
DRAGON allows the simultaneous inference of multi-layer Gaussian Graphical Model (GGM) omic networks. DRAGON estimates partial correlations from a low number of samples by implementing covariance shrinkage. Details and examples can be found here.
DRAGON was compared to GGM for its accuracy in identifying edges in a single-cell multiomic dataset of simultaneously measured transcriptome and epitopes (CITE-seq) at 6 different sample sizes.
Targeting is a score for directed bipartite networks. For source nodes (TFs/miRNA), targeting is the weighted outdegree, i.e., the sum of edge weights originating from the node. For target nodes (Genes), targeting is the weighted indegree, i.e., the sum of edge weights reaching the node.
This score has been detailed in Weighill et al. 2021.
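The definition translates directly into a few lines of Python. This is a minimal sketch under the assumption that the network is available as (source, target, weight) triples; the function name is illustrative and is not the GRAND or netZoo implementation.

```python
def targeting_scores(edges):
    """Return (source_targeting, target_targeting) for a directed bipartite network.

    edges : iterable of (source, target, weight) triples, where sources are
            TFs or miRNAs and targets are genes.
    """
    source_targeting = {}  # weighted outdegree of each TF/miRNA
    target_targeting = {}  # weighted indegree of each gene
    for source, target, weight in edges:
        source_targeting[source] = source_targeting.get(source, 0.0) + weight
        target_targeting[target] = target_targeting.get(target, 0.0) + weight
    return source_targeting, target_targeting
```

Because edge weights can be negative, a targeting score can be negative as well.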
GTEx is a project that collected samples from 38 normal, non-diseased tissues in humans and measured gene expression in each tissue. More details can be found here.
TCGA hosts gene expression data as well as other genomic information such as mutations, copy number variation and methylation for several cancer types in different tissues. More details can be found here.
CCLE datasets characterize more than 1500 human cell lines that model several cancer types across a large array of omic layers including copy number variation, mutations, methylation, protein and metabolite levels, alternative splicing, and chromatin marks quantification. More details can be found here.
The Connectivity Map measured the activity of more than 20,000 approved and experimental small molecule compounds in normal and cancerous cell lines. The expression of 987 genes was measured using the L1000 assay, and the expression of close to 11,000 genes was inferred from that set of roughly 1,000 genes. More details can be found here.
You can access the database programmatically through the API. Here we provide an example using the requests library in Python 3. Additionally, you need to install the awscli library to download networks from the command line.
We perform a GET operation to access the drugs database, but we can also query cancer types, genes, cells, and tissues. We need to make sure the returned status code is 200.
Filtering can be done server-side by querying the attributes of the network of interest. The full list of attributes is available on the documentation page. For example, to query by drug name, we can use the following command:
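A minimal sketch of such a request is given below. It only relies on `requests.get`, `Response.status_code`, and `Response.json`, which exist in the requests library. The endpoint URL and the names of the filter attributes are not reproduced here and should be taken from the documentation page; the `fetch_page` helper and its `get` argument (which lets the HTTP call be swapped out) are illustrative choices, not part of the GRAND API.

```python
def fetch_page(url, params=None, get=None):
    """GET one page of API results and return the decoded JSON payload.

    url    : endpoint address (take it from the documentation page)
    params : dict of server-side filters sent as query-string parameters
    get    : callable with the same interface as requests.get (optional)
    """
    if get is None:
        import requests  # third-party library: pip install requests
        get = requests.get
    response = get(url, params=params)
    # Any status code other than 200 means the query did not succeed.
    if response.status_code != 200:
        raise RuntimeError(
            f"request to {url} failed with status {response.status_code}"
        )
    return response.json()
```

A filtered query would then look like `fetch_page(drug_endpoint, params={attribute: value})`, where `drug_endpoint`, `attribute`, and `value` are placeholders for the address and filter listed in the documentation.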
Since the API results are paginated for faster access, the previous command returns the first page with the first 50 drugs. To get the address of the next page, please use the following command:
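One way to walk through all pages is sketched below. It assumes that each page is a JSON object with a `results` list and a `next` field holding the address of the following page (empty or null on the last page). These field names are an assumption about the payload and should be checked against an actual response. The `fetch` argument is any function that takes a URL and returns the decoded page.

```python
def iter_results(first_url, fetch):
    """Yield every record across all pages, following the 'next' links.

    first_url : address of the first page
    fetch     : function mapping a URL to the decoded JSON page (a dict)
    """
    url = first_url
    while url:
        page = fetch(url)
        # Assumed payload layout: {"results": [...], "next": <url or None>}
        for record in page["results"]:
            yield record
        url = page.get("next")
```

Because the function is a generator, pages are requested only as the records are consumed, so stopping early avoids downloading the remaining pages.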
Then, we parse the result data as JSON for easier handling.
We can print the names of the first 50 drugs in the database, among other attributes.
for drug in drugs:
    print(drug)  # each record holds the drug name among other attributes
We can also download the drug-induced gene regulatory networks one after another from the command line. You can wrap this command in a for loop:
for drug in drugs:
    os.system('curl -O ' + drug['network'])  # needs "import os"; curl -O saves the file in the current directory
C. Tutorial 2
It is possible to rebuild the networks hosted in the GRAND database using the metadata provided with each network. For example, we can rebuild the LCL cell line network using the provided prior data.
A closer look at the LCL cell line entry shows that the network has been reconstructed using the tool PANDA, as implemented in netZooM version 0.1. PANDA takes as input a PPI network, gene expression data, and TF DNA binding motif data. These priors are provided along with the network. Since the network has been reconstructed in MATLAB, the following MATLAB tutorial goes through all the steps of downloading the priors through programmatic access and reconstructing the network: Using GRAND database's API for reproducible network reconstruction.
The code is provided through a dedicated button. For networks generated using MATLAB, the .m script can be downloaded. For open-source languages (R and Python), the code is provided as Jupyter notebooks hosted on a cloud server called netbooks.