GRAND has been tested on Google Chrome (V92.0.4515.107), the Brave browser (V1.27.108), Firefox (V90.0.2), and Safari (V14.1.2). However, we do recommend Google Chrome for optimal user experience.
Network graph does not display after clicking submit.
GRAND uses third-party libraries that are accessed remotely. Therefore, please make sure that your browser security settings and firewall allow inbound channels.
Network graph does not display in Brave browser
The Brave browser blocks GRAND by default because its security setting defaults to 'shields up'. Set the security setting to 'shields down' in the top right corner of the browser to allow network visualization.
Exporting the network to png downloads a low resolution file
The 'Save as png' button in the network visualization exports the network as a PNG file. However, the image resolution depends on the screen resolution. To get higher quality images, please increase your screen resolution.
1. Networks
A. Reconstruction
The networks can be reconstructed using the tool specified in the tool field of the data table. For example, the LCL cell line network can be built using the PANDA tool as implemented in the package netZooM version 0.1. The other fields in the data table refer to the arguments or data sources needed by the tool. For example, PANDA requires a TF-TF PPI network, gene expression samples, and TF motif binding sites.
These data files along with their sources are provided to reconstruct the network. Since PANDA is a deterministic algorithm, reconstructing the network using the data priors and the same software release should yield the exact same network as the one available for visualization and download.
B. Download
The networks can be downloaded in two different formats: Edg and Adj.
Format:
Edg: stands for edges format. The network is written on a file with the following format:
node1 node2 edge_weight
e.g., A B 1.0
or in the following format for multi-sample files:
Adj: stands for Adjacency matrix format. The bipartite network is saved as the weighted adjacency matrix W(TFs,Genes).
gt: stands for gene targeting. Gene targeting is the sum of weighted in-degrees in the network.
tt: stands for TF targeting. TF targeting is the sum of weighted out-degrees in the network.
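As a rough illustration of how the Edg and Adj formats relate, the sketch below parses edge-format lines into the weighted adjacency matrix W(TFs,Genes), stored here as a nested dictionary. The function name, the toy node names, and the assumption of whitespace-separated lines without a header row are for illustration only; actual GRAND files may use a different delimiter or include a header.

```python
from collections import defaultdict

def edges_to_adjacency(lines):
    """Convert 'node1 node2 edge_weight' lines into a nested dict W[tf][gene]."""
    adjacency = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tf, gene, weight = line.split()
        adjacency[tf][gene] = float(weight)
    return dict(adjacency)

# Toy network (hypothetical names): two TFs (A, B) and two genes (g1, g2).
example = ["A g1 1.0", "A g2 -0.5", "B g1 2.0", "B g2 0.25"]
W = edges_to_adjacency(example)
print(W["A"]["g2"])
```

Each row of W corresponds to one TF and each column to one gene, so the same information is stored once per TF-gene pair in either format.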
Networks are saved with two kinds of file extensions. Most of the networks in GRAND are clear text .csv or .txt files. More recently, we started adopting binary .h5 files, which use only one third of the space of clear text files.
Edge weights:
The edge weights are computed by the PANDA, PUMA, OTTER, and LIONESS algorithms. They usually vary between -20 and 20.
A large positive value means a high likelihood that an edge exists between two nodes; a large negative value means a low likelihood of interaction between the two nodes. For DRAGON, edge weights are partial correlations between the nodes.
Download options are available in different formats. For example, one option downloads a network in edges format with the original edge weights.
Important! All gene regulatory networks are complete bipartite graphs of genes*TFs edges.
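One practical consequence is that an edge-format file should contain exactly (number of TFs) x (number of genes) rows, with no pair missing or repeated. A minimal sketch of such a sanity check follows; the function name and the toy edges are hypothetical.

```python
def is_complete_bipartite(edges):
    """Return True if every TF is linked to every gene exactly once."""
    tfs = {tf for tf, _, _ in edges}
    genes = {gene for _, gene, _ in edges}
    pairs = {(tf, gene) for tf, gene, _ in edges}
    # No duplicates (edges == pairs) and no missing pair (pairs == TFs x genes).
    return len(edges) == len(pairs) == len(tfs) * len(genes)

edges = [("A", "g1", 1.0), ("A", "g2", -0.5), ("B", "g1", 2.0), ("B", "g2", 0.25)]
complete = is_complete_bipartite(edges)        # all 2 x 2 pairs present
incomplete = is_complete_bipartite(edges[:3])  # the pair (B, g2) is missing
print(complete, incomplete)
```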
2. Navigation
The following brief animated guides show how to access a specific network and browse the phenotypic variables. Clicking the button on each page takes you through a step-by-step guide to the interface of that page or tool.
Small molecule drugs
Cancer types
Tissues
Cell lines
Use case: drug repurposing in Melanoma
To illustrate a specific use case of GRAND, we will perform an integrated differential gene regulatory network analysis in melanoma and predict small molecule drugs that reverse this condition.
First, to build a differential gene regulatory network, we need to navigate to the network comparison tab from the Analysis dropdown menu.
Then, we need to select normal skin as our first (control) network and skin cutaneous melanoma (SKCM) as our second network. The number of genes is set to 250, and the options 'largest' and 'absolute value' are checked to compute the 250 largest targeting scores by absolute value. We can do the same for TFs. Finally, we can click on the 'submit' button.
Once the targeting scores are computed and displayed in the browser, we can export these results for drug repurposing. The drug repurposing tool (CLUEreg) works on both TFs and genes to reverse the targeting pattern. We can select the 'by gene' option and then click 'CLUEreg' to export the results.
This leads to the CLUEreg page with prefilled forms. We can remove the 'investigational drugs' and then click 'submit'.
The top 100 small molecules among nearly 20,000 possible compounds are displayed in the browser. We can see that melatonin ranks among those 100 compounds. Melatonin is a naturally occurring compound that controls the circadian rhythm and is taken as a supplement to improve sleep quality. In addition, it has been suggested that melatonin plays an important role in melanoma control.
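The selection made by the 'largest' and 'absolute value' options can be sketched as follows. This is only an illustration of the ranking idea, not GRAND's implementation: the function name, the definition of the differential score as second network minus control, and the toy targeting values are all assumptions.

```python
def top_differential_targeting(control, case, k=250, largest=True, absolute=True):
    """Rank nodes by the difference in targeting score (case - control)."""
    shared = control.keys() & case.keys()  # only nodes present in both networks
    diff = {node: case[node] - control[node] for node in shared}
    # 'absolute' ranks by magnitude; 'largest' picks the top rather than the bottom.
    key = (lambda n: abs(diff[n])) if absolute else (lambda n: diff[n])
    ranked = sorted(diff, key=key, reverse=largest)
    return [(node, diff[node]) for node in ranked[:k]]

# Toy gene targeting scores (hypothetical values).
normal_skin = {"g1": 3.0, "g2": -1.0, "g3": 0.5, "g4": 2.0}
melanoma = {"g1": 3.5, "g2": 4.0, "g3": -2.5, "g4": 2.1}
top2 = top_differential_targeting(normal_skin, melanoma, k=2)
print(top2)
```

With the absolute value option, genes that gain targeting and genes that lose targeting both appear in the result, which is what a reversal analysis needs.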
3. Analysis
The analysis section analyses a set of TFs or genes in the disease or small molecule category. It exploits the duality between TFs and genes in bipartite gene regulatory networks.
The small molecule analysis section finds compounds that optimally reverse the gene targeting or the transcription factor activity patterns in the query set.
Gene targeting refers to the weighted in-degree of a given gene. Since PANDA is usually validated against ChIP-seq data, targeting can be interpreted as the binding profile of TFs for a given gene in a particular experiment. Transcription factor targeting activity refers to the weighted out-degree of a given transcription factor. This tool serves for hypothesis generation, whereby compounds that reverse or aggravate the gene/TF targeting patterns of a given experiment are hypothetical candidates for experimental validation.
The overlap score is equal to
card(intersect(Input_Genes_Up,Drug_Genes_Down)) + card(intersect(Input_Genes_Down,Drug_Genes_Up)) - card(intersect(Input_Genes_Up,Drug_Genes_Up)) - card(intersect(Input_Genes_Down,Drug_Genes_Down)),
where card is the number of elements in a set. The cosine similarity compares the input vector to the vectors of all the drugs. It is a signed measure that depends on the direction of the vectors rather than their amplitude. In our case, we are interested in vectors that point in opposite directions and therefore have a negative cosine similarity.
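Both scores can be sketched in a few lines. The overlap score follows the formula above directly; for the cosine similarity, encoding each gene as +1 (up), -1 (down), or 0 (absent) is an assumption made here for illustration, as are the function names and the toy gene sets.

```python
import math

def overlap_score(input_up, input_down, drug_up, drug_down):
    """Count reversing overlaps minus mimicking overlaps between two signatures."""
    reversing = len(input_up & drug_down) + len(input_down & drug_up)
    mimicking = len(input_up & drug_up) + len(input_down & drug_down)
    return reversing - mimicking

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy signatures over genes g1..g4 (hypothetical).
input_up, input_down = {"g1", "g2"}, {"g3"}
drug_up, drug_down = {"g3"}, {"g1", "g4"}
score = overlap_score(input_up, input_down, drug_up, drug_down)

# The same signatures encoded as +1 (up), -1 (down), 0 (absent) for g1..g4.
cos = cosine_similarity([1, 1, -1, 0], [-1, 0, 1, -1])
print(score, cos)
```

A positive overlap score and a negative cosine similarity both indicate that the drug signature tends to reverse the input signature.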
4. Visualization
To enable network visualization, please make sure to use Google Chrome while navigating GRAND. The visualization module has two components. The first one plots and queries subgraphs of gene regulatory networks based on user selection. The second one computes targeting scores from gene regulatory networks. Clicking the button on each page explains the parameters of the subnetwork selection and visualization.
PUMA
PUMA is an implementation of PANDA that infers miRNA to gene bipartite networks by integrating two data sources: 1) gene coexpression and 2) miRNA target networks. Details and examples can be found here.
LIONESS
LIONESS estimates single-sample gene regulatory networks from one aggregate network such as a PANDA, PUMA, or coexpression network. Details and examples can be found here.
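A minimal sketch of the idea, assuming the linear interpolation formula from the LIONESS publication: the edge for sample q is N * (e_all - e_without_q) + e_without_q, where N is the number of samples, e_all is the edge weight in the aggregate network built from all samples, and e_without_q is the edge weight when sample q is left out. Pearson correlation is swapped in here as the aggregate method to keep the example short; the function names and expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def lioness_edge(x, y, q):
    """Single-sample estimate of the x-y edge for the sample at index q."""
    n = len(x)
    all_samples = pearson(x, y)
    # Aggregate edge recomputed without sample q.
    without_q = pearson(x[:q] + x[q + 1:], y[:q] + y[q + 1:])
    return n * (all_samples - without_q) + without_q

# Toy expression of two genes across five samples (hypothetical values).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [2.0, 1.0, 4.0, 3.0, 6.0]
per_sample = [lioness_edge(gene_a, gene_b, q) for q in range(len(gene_a))]
print(per_sample)
```

Applying the same formula to every edge of the aggregate network yields one full network per sample.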
DRAGON
DRAGON allows the simultaneous inference of multi-layer Gaussian Graphical Model (GGM) omic networks. DRAGON estimates partial correlations from a low number of samples by implementing covariance shrinkage. Details and examples can be found here.
DRAGON was compared to GGM for its accuracy in identifying edges in a single-cell multiomic dataset of simultaneously measured transcriptome and epitopes (CITE-seq) for 6 different sample sizes.
EGRET
EGRET builds genotype-specific gene regulatory networks by integrating genomic variant information such as eQTLs and their effects on TF binding. EGRET seeds the PANDA algorithm with additional data that make it possible to reconstruct a network specific to a given genotype. These additional inputs are genotype data (vcf files), Qbic-pred predictions of TF binding alteration for each genotype, and eQTL data. Additional details and examples can be found here.
Gene/TF/miRNA targeting
Targeting is a score for directed bipartite networks. For source nodes (TFs/miRNA), targeting is the weighted out-degree, i.e., the sum of edge weights originating from the node. For target nodes (genes), targeting is the weighted in-degree, i.e., the sum of edge weights reaching the node.
This score has been detailed in Weighill et al. 2021.
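The two definitions translate into row sums and column sums of the adjacency matrix. The sketch below assumes the network is held as a nested dictionary W[tf][gene]; the function name and the toy weights are hypothetical.

```python
def targeting_scores(W):
    """Return (tf_targeting, gene_targeting) from a nested dict W[tf][gene]."""
    # Weighted out-degree: sum of the weights leaving each TF (row sums).
    tf_targeting = {tf: sum(row.values()) for tf, row in W.items()}
    # Weighted in-degree: sum of the weights reaching each gene (column sums).
    gene_targeting = {}
    for row in W.values():
        for gene, weight in row.items():
            gene_targeting[gene] = gene_targeting.get(gene, 0.0) + weight
    return tf_targeting, gene_targeting

W = {"A": {"g1": 1.0, "g2": -0.5}, "B": {"g1": 2.0, "g2": 0.25}}
tt, gt = targeting_scores(W)
print(tt, gt)
```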
GTEx
GTEx is a project that collected samples from 38 normal, non-diseased tissues in humans and measured gene expression in each tissue. More details can be found here.
TCGA
TCGA hosts gene expression data as well as other genomic information such as mutations, copy number variation and methylation for several cancer types in different tissues. More details can be found here.
CCLE
CCLE datasets characterize more than 1500 human cell lines that model several cancer types across a large array of omic layers including copy number variation, mutations, methylation, protein and metabolite levels, alternative splicing, and chromatin marks quantification. More details can be found here.
Connectivity Map
The Connectivity Map measured the activity of more than 20,000 approved and experimental small molecule compounds in normal and cancerous cell lines. The expression of 987 genes was measured using the L1000 assay, and the expression of close to 11,000 genes was inferred from this set of about 1000 genes. More details can be found here.
You can access the database programmatically through the API. Here we provide an example using the requests library of Python3. Additionally, you need to install the library awscli to download networks through the command line.
import requests
import os
We perform a GET operation to access the drugs database, but we can also query cancer types, genes, cells, and tissues. We need to make sure that the returned status code is 200.
Filtering can be done server-side by querying the attributes of the network of interest. The full list of attributes is available on the documentation page. For example, to query by drug name, we can use the following command:
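The request and the server-side filter can be sketched as below. The endpoint URL, the `drug` query parameter, and the value `Aspirin` are assumptions used for illustration and are not confirmed by this page; please check the API documentation page for the actual paths and attribute names. The helper names are hypothetical as well.

```python
from urllib.parse import urlencode

# ASSUMPTION: illustrative endpoint; confirm the real path in the API documentation.
DRUG_ENDPOINT = "https://grand.networkmedicine.org/api/v1/drugapi/"

def build_query_url(endpoint, **filters):
    """Append server-side filters to an endpoint as a query string."""
    return endpoint + ("?" + urlencode(filters) if filters else "")

def get_json(url):
    """GET a URL and return the parsed JSON body, raising on a non-200 status."""
    import requests  # imported lazily so that URL building works without it
    response = requests.get(url, timeout=30)
    if response.status_code != 200:
        raise RuntimeError(f"GET {url} returned status {response.status_code}")
    return response.json()

# ASSUMPTION: 'drug' as the filter attribute and 'Aspirin' as its value.
url = build_query_url(DRUG_ENDPOINT, drug="Aspirin")
print(url)
# data = get_json(url)  # uncomment to query the live API
```

Checking the status code before parsing avoids confusing errors when the server returns an error page instead of JSON.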
Then, we parse the JSON response into Python objects for easier handling.
data=response.json()
drugs=data['results']
Since the API results are paginated for faster access, the request above returns only the first page with the first 50 drugs. The address of the next page is available in:
data['next']
We can print the name of the first 50 drugs in the database among other attributes.
for drug in drugs:
    print(drug['drug'])
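To go beyond the first 50 drugs, the 'next' address can be followed page by page. The sketch below takes the fetching function as a parameter so that it can be tried without network access; the helper name and the fake pages are hypothetical, and it assumes that the last page reports an empty 'next' value.

```python
def iter_results(first_url, fetch):
    """Yield every record across pages, following each page's 'next' address."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")  # assumed to be None on the last page

# Stand-in for the live API: two fake pages keyed by their address.
fake_pages = {
    "page1": {"results": [{"drug": "drug_a"}, {"drug": "drug_b"}], "next": "page2"},
    "page2": {"results": [{"drug": "drug_c"}], "next": None},
}
names = [record["drug"] for record in iter_results("page1", fake_pages.__getitem__)]
print(names)
```

Against the real API, `fetch` could be a small function that calls `requests.get(url).json()`.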
We can also serially download the drug-induced gene regulatory networks from the command line (the command below uses curl). You can wrap this command in a for loop:
os.system('curl -O ' + drug['network'])
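As an alternative to building a shell string, the loop can pass the arguments to curl as a list through subprocess, which avoids quoting problems with unusual characters in the address. This is a sketch under assumptions: the function name is hypothetical, the example.org address is a placeholder, and the runner is injected so the example performs a dry run instead of a real download.

```python
import subprocess

def download_networks(drugs, run=subprocess.run):
    """Download each record's network file into the current directory with curl."""
    commands = []
    for drug in drugs:
        command = ["curl", "-O", drug["network"]]  # -O keeps the remote file name
        run(command, check=True)  # check=True raises if curl exits with an error
        commands.append(command)
    return commands

# Dry run with a fake runner so that nothing is downloaded here.
records = [{"drug": "drug_a", "network": "https://example.org/drug_a.csv"}]
issued = download_networks(records, run=lambda command, check: None)
print(issued)
```

Calling `download_networks(drugs)` with the default runner executes curl once per record.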
C. Tutorial 2
It is possible to rebuild the networks hosted in the GRAND database using the metadata provided with each network. For example, we can rebuild the LCL cell line network using the provided prior data.
A closer look at the LCL cell line entry shows that the network has been reconstructed using the tool PANDA, as implemented in netZooM version 0.1. PANDA takes as input a PPI network, gene expression data, and TF DNA binding motif data. These priors are provided along with the network. Since the network has been reconstructed in MATLAB, the following MATLAB tutorial goes through all the steps of downloading the priors through programmatic access and reconstructing the network: Using GRAND database's API for reproducible network reconstruction.
The code is provided through a button. For networks generated using MATLAB, the .m script can be downloaded. For open-source languages (R and Python), the code is provided as Jupyter notebooks hosted on a cloud server called netbooks.