1. Networks

A. Reconstruction

The networks can be reconstructed using the tool specified in the tool field of the data table. For example, the LCL cell line network can be built using the PANDA tool as implemented in the package netZooM version 0.1. The other fields in the data tables refer to the arguments or data sources needed by the tool. For example, PANDA requires a TF-TF PPI network, gene expression samples, and TF motif binding sites. These data files are provided along with their sources so that the network can be reconstructed. Since PANDA is a deterministic algorithm, reconstructing the network using the same prior data and the same software release should yield exactly the same network as the one available for visualization and download.

B. Download

The networks can be downloaded with two different properties.
Format:
Edg: stands for edges format. The network is written in a text or csv file with the following format:
                node1    node2    edge_weight 
    
e.g., A B 1.0

or in the following format for multi-sample files:
                node1:node2    edge_weight_sample_1   edge_weight_sample_n
    
e.g., A:B 1.0 2.0

Adj: stands for Adjacency matrix format. The bipartite network is saved as the weighted adjacency matrix W(TFs,Genes).
gt: stands for gene targeting. The gene targeting score of a gene is its weighted in-degree, i.e., the sum of the weights of its incoming edges.
tt: stands for TF targeting. The TF targeting score of a TF is its weighted out-degree, i.e., the sum of the weights of its outgoing edges.
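
To illustrate how the Edg and Adj formats relate, here is a minimal Python sketch that parses a few edge lines and pivots them into a TF-by-gene matrix. The whitespace delimiter, the toy node names, the helper names (`parse_edges`, `parse_multisample_edges`, `to_adjacency`), and the choice to fill missing edges with 0.0 are all assumptions made for illustration. Actual GRAND files may use another delimiter or include a header line.

```python
def parse_edges(lines):
    """Parse 'node1 node2 edge_weight' lines into {(tf, gene): weight}."""
    edges = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        tf, gene, weight = fields
        edges[(tf, gene)] = float(weight)
    return edges


def parse_multisample_edges(lines):
    """Parse 'node1:node2 w_1 ... w_n' lines into {(tf, gene): [w_1, ..., w_n]}."""
    edges = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        tf, gene = fields[0].split(":", 1)
        edges[(tf, gene)] = [float(w) for w in fields[1:]]
    return edges


def to_adjacency(edges):
    """Return (tfs, genes, W) where W[i][j] is the weight of tfs[i] -> genes[j]."""
    tfs = sorted({tf for tf, _ in edges})
    genes = sorted({gene for _, gene in edges})
    # Assumption: an edge absent from the list gets weight 0.0.
    W = [[edges.get((tf, gene), 0.0) for gene in genes] for tf in tfs]
    return tfs, genes, W


# Hypothetical toy data in the two edge formats described above.
single = ["A X 1.0", "A Y -0.5", "B X 2.0", "B Y 0.25"]
multi = ["A:X 1.0 2.0", "B:X 0.5 -1.5"]

tfs, genes, W = to_adjacency(parse_edges(single))
samples = parse_multisample_edges(multi)
print(tfs, genes, W)
print(samples[("A", "X")])
```

For full-size networks, a dataframe or array library would be more practical than nested lists. The sketch only shows the correspondence between the two layouts.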
Edge weights:
Edge weights are computed by the PANDA, PUMA, OTTER, and LIONESS algorithms. They usually vary between -20 and 20. A large positive value means a high likelihood that an edge exists between two nodes; a large negative value means a low likelihood of interaction between them. For DRAGON, edge weights are partial correlations between the nodes.
Download options are available in different formats. For example, a network can be downloaded in edges format with the original edge weights.



The following brief guides show how to access a specific network and browse the phenotypic variables.

Small molecule drugs

Cancer types

Tissues

Cell lines



3. Analysis

The analysis section lets you analyse a set of TFs or genes in the disease or small molecule category. It exploits the duality between TFs and genes in bipartite gene regulatory networks.

The small molecule analysis section finds compounds that optimally reverse the gene targeting or the transcription factor targeting patterns in the query set.

Gene targeting refers to the weighted in-degree of a given gene. Since PANDA is usually validated against ChIP-seq data, targeting can be interpreted as the binding profile of TFs for a given gene in a particular experiment. Transcription factor targeting refers to the weighted out-degree of a given transcription factor. This tool serves for hypothesis generation: compounds that reverse or aggravate the gene/TF targeting patterns of a given experiment are hypothetical candidates for experimental validation.

The overlap score is equal to card(intersect(Input_Genes_Up, Drug_Genes_Down)) + card(intersect(Input_Genes_Down, Drug_Genes_Up)) - card(intersect(Input_Genes_Up, Drug_Genes_Up)) - card(intersect(Input_Genes_Down, Drug_Genes_Down)), where card denotes the number of elements in a set. The cosine similarity compares the input vector to the vector of each drug. It is a signed measure that captures the direction of the vectors rather than their amplitude. Here, we are interested in vectors pointing in opposite directions, that is, pairs with negative cosine similarity.
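
The two scores can be sketched in a few lines of Python. This is an illustration written from the definitions above, not the code that GRAND runs; the function names, the toy gene sets, and the convention of returning 0.0 for a zero vector are assumptions.

```python
import math


def overlap_score(input_up, input_down, drug_up, drug_down):
    """Count reversed genes (+1 each) minus genes changed in the same direction (-1 each)."""
    input_up, input_down = set(input_up), set(input_down)
    drug_up, drug_down = set(drug_up), set(drug_down)
    return (len(input_up & drug_down) + len(input_down & drug_up)
            - len(input_up & drug_up) - len(input_down & drug_down))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: treat a zero vector as uninformative
    return dot / (norm_u * norm_v)


# Hypothetical query and drug signatures.
score = overlap_score(
    input_up={"G1", "G2"}, input_down={"G3"},
    drug_up={"G2", "G3"}, drug_down={"G1"},
)
print(score)  # G1 and G3 are reversed, G2 is not: 1 + 1 - 1 - 0 = 1

# A drug vector pointing exactly opposite to the query gives a cosine close to -1.
query = [1.0, -2.0, 0.5]
drug = [-2.0, 4.0, -1.0]
cosine = cosine_similarity(query, drug)
print(cosine)
```

Because the cosine ignores amplitude, scaling the drug vector by any positive factor leaves the score unchanged. Only the sign pattern and the relative magnitudes matter.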



4. Visualization

To enable network visualization, please make sure to use Google Chrome while navigating GRAND. The visualization module has two components. The first plots and queries subgraphs of gene regulatory networks based on user selection. The second computes targeting scores from gene regulatory networks.

Network view

Targeting view



5. Wiki

PANDA

PANDA is a method that infers a TF-to-gene bipartite gene regulatory network by integrating three data sources: 1) gene coexpression, 2) TF PPI network, and 3) TF motif network. Details and examples can be found here. PANDA was compared to four other inference methods using ChIP-chip data in yeast in three conditions: TF knockout (KO), cell cycle (CC), and stress response (SR).

OTTER

OTTER is an implementation of PANDA that uses convex optimization to infer the gene regulatory network. OTTER assumes that the PPI and coexpression networks are projections of the regulatory network on the TF and gene spaces, and infers the network by optimizing graph matching between the three input networks. Details and examples can be found here. The accuracy of OTTER networks has been benchmarked against ChIP-seq data in breast cancer, liver cancer, and cervical cancer.

PUMA

PUMA is an implementation of PANDA that infers miRNA to gene bipartite networks by integrating two data sources: 1) gene coexpression and 2) miRNA target networks. Details and examples can be found here.

LIONESS

LIONESS estimates single-sample gene regulatory networks from one aggregate network such as PANDA, PUMA, or coexpression networks. Details and examples can be found here.

DRAGON

DRAGON allows the simultaneous inference of multi-layer Gaussian Graphical Model (GGM) omic networks. DRAGON estimates partial correlations from a low number of samples by implementing covariance shrinkage. Details and examples can be found here. DRAGON was compared to GGM for its accuracy in identifying edges in a single-cell multiomic dataset of simultaneously measured transcriptome and epitopes (CITE-seq) for 6 different sample sizes.

Gene/TF/miRNA targeting

Targeting is a score for directed bipartite networks. For source nodes (TFs/miRNA), targeting is the weighted outdegree, i.e., the sum of edge weights originating from the node. For target nodes (Genes), targeting is the weighted indegree, i.e., the sum of edge weights reaching the node. This score has been detailed in Weighill et al. 2021.
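
As a small worked example of this definition, the following Python sketch accumulates both scores in one pass over a list of weighted edges. The function name `targeting_scores` and the toy edges are hypothetical, and the code is a sketch of the definition rather than the implementation used by GRAND.

```python
from collections import defaultdict


def targeting_scores(edges):
    """Return (source_targeting, target_targeting) for (source, target, weight) edges."""
    out_degree = defaultdict(float)  # weighted outdegree of TFs/miRNAs
    in_degree = defaultdict(float)   # weighted indegree of genes
    for source, target, weight in edges:
        out_degree[source] += weight
        in_degree[target] += weight
    return dict(out_degree), dict(in_degree)


# Hypothetical toy network with two TFs and two genes.
edges = [
    ("TF1", "G1", 1.5),
    ("TF1", "G2", -0.5),
    ("TF2", "G1", 2.0),
    ("TF2", "G2", 0.25),
]
tf_targeting, gene_targeting = targeting_scores(edges)
print(tf_targeting)    # {'TF1': 1.0, 'TF2': 2.25}
print(gene_targeting)  # {'G1': 3.5, 'G2': -0.25}
```

Since edge weights can be negative, a targeting score can be negative too, as for G2 here.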

GTEx

GTEx is a project that collected samples from 38 normal, non-diseased tissues in humans and measured gene expression in each tissue. More details can be found here.

TCGA

TCGA hosts gene expression data as well as other genomic information such as mutations, copy number variation and methylation for several cancer types in different tissues. More details can be found here.

CCLE

CCLE datasets characterize more than 1500 human cell lines that model several cancer types across a large array of omic layers including copy number variation, mutations, methylation, protein and metabolite levels, alternative splicing, and chromatin marks quantification. More details can be found here.

Connectivity Map

The Connectivity Map measured the activity of more than 20,000 approved and experimental small molecule compounds in normal and cancerous cell lines. The expression of 987 genes was measured using the L1000 assay, and the expression of close to 11,000 genes was inferred from this set of about 1,000 genes. More details can be found here.

6. API

A. Documentation

GRAND has an API that allows accessing and filtering the data programmatically to perform large-scale downloads. The full API documentation is available at https://grand.networkmedicine.org/redoc/. A web view of the API is also available for cancer type, genes, tissues, and drug networks.

B. Tutorial 1

You can access the database programmatically through the API. Here we provide an example using the requests library of Python 3. Additionally, a command-line download tool is needed to fetch networks from the command line (the example below uses curl).

    import requests
    import os      
    
We perform a GET operation to access the drugs database but we can also query cancer type, genes, cells, and tissues. We need to make sure the returned status code is 200.

    response=requests.get('https://grand.networkmedicine.org/api/v1/drugapi/')
    
    
Filtering can be done server-side by querying the attributes of the network of interest. The full list of attributes is available on the documentation page. For example, to query by drug name, we can use the following command:
    
    responseFiltered=requests.get('https://grand.networkmedicine.org/api/v1/drugapi/?drug=1-phenylbiguanide')
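
When a filter value contains spaces or other special characters, it is safer to let a library build the query string. The sketch below uses the standard library for this. The helper name `filtered_url` is hypothetical, and the only filter attribute shown is `drug`, taken from the example above.

```python
from urllib.parse import urlencode

BASE_URL = "https://grand.networkmedicine.org/api/v1/drugapi/"


def filtered_url(base_url, **filters):
    """Append URL-encoded filter attributes to an API endpoint."""
    if not filters:
        return base_url
    return base_url + "?" + urlencode(filters)


url = filtered_url(BASE_URL, drug="1-phenylbiguanide")
print(url)
```

The requests library offers the same convenience through its `params` argument, e.g. `requests.get(BASE_URL, params={'drug': '1-phenylbiguanide'})`.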
    
    
To check that the data was collected correctly, we can run the following test.

    if response.status_code == 200:
        print('Success!')
    elif response.status_code == 404:
        print('Not Found.')
    
Next, we parse the JSON response into Python objects for easier handling.

    data=response.json()
    drugs=data['results']

Since the API results are paginated for faster access, the previous request returns only the first page, with the first 50 drugs. The address of the next page is stored in the following field:

    data['next']
    
We can print the name of the first 50 drugs in the database among other attributes.

    for drug in drugs:
        print(drug['drug'])
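
To go beyond the first 50 entries, the `next` address can be followed in a loop until no further page is announced. The sketch below assumes that the last page reports an empty `next` value (null in JSON), which we have not verified against the API documentation. The helper names `fetch_json` and `iter_results` are hypothetical, and the two fake pages only serve to demonstrate the loop offline.

```python
def fetch_json(url):
    """Fetch one page of results and decode it; raises on HTTP errors."""
    import requests  # imported here so that iter_results can be tried offline
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def iter_results(first_url, fetch=fetch_json):
    """Yield every record, following the 'next' address until it is empty."""
    url = first_url
    while url:
        page = fetch(url)
        yield from page["results"]
        url = page.get("next")  # assumption: empty on the last page


# Offline demonstration with two hypothetical pages.
fake_pages = {
    "page1": {"results": [{"drug": "a"}, {"drug": "b"}], "next": "page2"},
    "page2": {"results": [{"drug": "c"}], "next": None},
}
names = [d["drug"] for d in iter_results("page1", fetch=fake_pages.__getitem__)]
print(names)  # ['a', 'b', 'c']
```

Against the live API, the call would be `iter_results('https://grand.networkmedicine.org/api/v1/drugapi/')`. Because the function is a generator, pages are requested only as the records are consumed.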
    
We can also serially download the drug-induced gene regulatory networks with curl. You can wrap this command in a for loop over the drugs:

    os.system('curl -O ' + drug['network'])
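
One possible way to write that loop is sketched below. It passes the arguments to curl as a list rather than as a concatenated string, which avoids shell quoting problems. The helper names, the output directory handling, and the example URL are hypothetical. The `network` field is the one used in the command above.

```python
import os
import subprocess


def curl_command(url, out_dir="."):
    """Build the curl argument list that saves `url` under its own file name."""
    filename = os.path.basename(url.split("?", 1)[0])
    return ["curl", "--fail", "--silent", "--show-error",
            "-o", os.path.join(out_dir, filename), url]


def download_networks(drugs, out_dir=".", run=subprocess.run):
    """Download the network of each drug record, one after the other."""
    os.makedirs(out_dir, exist_ok=True)
    for drug in drugs:
        # check=True stops the loop with an error if curl fails
        run(curl_command(drug["network"], out_dir), check=True)


# Show the command that would be run for a hypothetical record.
example = {"network": "https://example.org/networks/drug1.csv"}
print(curl_command(example["network"]))
```

Calling `download_networks(drugs, 'networks')` with the `drugs` list obtained earlier would then fetch the first page of networks into a local folder.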
    

C. Tutorial 2

It is possible to rebuild the networks hosted in the GRAND database using the metadata provided with each network. For example, we can rebuild the LCL cell line network using the provided prior data. A closer look at the LCL cell line entry shows that the network has been reconstructed using the tool PANDA, as implemented in netZooM version 0.1. PANDA takes as input a PPI network, gene expression data, and TF DNA binding motif data. These priors are provided along with the network. Since the network has been reconstructed in MATLAB, the following MATLAB tutorial goes through all the steps of downloading the priors through programmatic access and reconstructing the network: Using GRAND database's API for reproducible network reconstruction.
The code is provided through a button. For networks generated using MATLAB, the .m script can be downloaded; for open-source languages (R and Python), the code is provided as Jupyter notebooks hosted on a cloud server called netbooks.

7. Image credit