Networks

Reconstruction

The networks can be reconstructed using the tool specified in the tool field of the data table. For example, the LCL cell line network can be built using the PANDA tool as implemented in the package netZooM version 0.1. The other fields in the data table refer to the arguments or data sources needed by the tool. For example, PANDA requires a TF-TF PPI network, gene expression samples, and TF motif binding sites. These data files, along with their sources, are provided to reconstruct the network. Since PANDA is a deterministic algorithm, reconstructing the network using the same prior data and the same software release should yield exactly the same network as the one available for direct download.

Download

Each network download is defined by two properties: the file format and the type of edge weights.
Format:
p: stands for .pairs format. The network is written in a text or csv file with the following format:
                node1    node2    edge_weight 
    
e.g., A B 1.0

or in the following format for multi-sample files:
                node1:node2    edge_weight_sample_1   edge_weight_sample_n
    
e.g., A:B 1.0 2.0

c: stands for .csv format. The bipartite network is saved as the weighted adjacency matrix W(TFs,Genes).
m: stands for .mat format. The bipartite network is saved as the weighted adjacency matrix W(TFs,Genes) in MATLAB format.
gt: stands for gene targeting. Gene targeting is the weighted in-degree of each gene in the network.
tt: stands for TF targeting. TF targeting is the weighted out-degree of each TF in the network.
Edge weights:
o: stands for original edge weights. These are the original edge weights computed by the PANDA algorithm. They usually vary between -20 and 20. A large positive value means a high likelihood that an edge exists between two nodes; a strongly negative value means a low probability of interaction between the two nodes.
t1: stands for transformation 1. This is the transformation of the original edge weights into a positive quantity. The transformation is detailed in Sonawane et al., Cell Reports, 2017.
t2: stands for transformation 2. This is a transformation of the original edge weights into a quantity between 0 and 1 using a logistic regression of parameter 0.3.
Download options are named <format><edge weights>. For example, po downloads a network in .pairs format with original edge weights.
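As an illustration of the .pairs layouts described above, here is a minimal sketch of how such files could be read in Python. The function names read_pairs and read_multisample_pairs are hypothetical, and the tab delimiter is an assumption (pass another delimiter if your file uses commas or spaces). The sample rows are toy data, not real GRAND edges.

```python
import csv
import io
from collections import defaultdict


def read_pairs(handle, delimiter="\t"):
    """Parse single-sample rows (node1, node2, edge_weight) into {node1: {node2: weight}}."""
    network = defaultdict(dict)
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        source, target, weight = row
        try:
            network[source][target] = float(weight)
        except ValueError:
            continue  # skip a possible header line with non-numeric weights
    return dict(network)


def read_multisample_pairs(handle, delimiter="\t"):
    """Parse multi-sample rows (node1:node2, w_1, ..., w_n) into {(node1, node2): [w_1, ..., w_n]}."""
    edges = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if len(row) < 2 or ":" not in row[0]:
            continue  # skip blank or malformed lines
        # Assumption: node names themselves do not contain ':'
        source, target = row[0].split(":", 1)
        try:
            edges[(source, target)] = [float(w) for w in row[1:]]
        except ValueError:
            continue  # skip a possible header line with non-numeric weights
    return edges


# Toy data standing in for downloaded files
single = io.StringIO("A\tB\t1.0\nA\tC\t-0.5\n")
multi = io.StringIO("A:B\t1.0\t2.0\n")

pairs_network = read_pairs(single)
multi_network = read_multisample_pairs(multi)
print(pairs_network)
print(multi_network)
```

With a real download, the io.StringIO objects would be replaced by open file handles.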

Analysis

The analysis section lets you analyse a set of TFs or genes in the disease or small molecule category. It exploits the duality between TFs and genes in bipartite gene regulatory networks.

The small molecule analysis section finds compounds that optimally reverse the gene targeting or the transcription factor activity patterns in the query set.

Gene targeting refers to the weighted in-degree of a given gene. Since PANDA is usually validated against ChIP-seq data, targeting can be interpreted as the binding profile of TFs for a given gene in a particular experiment. Transcription factor targeting refers to the weighted out-degree of a given transcription factor. This tool serves for hypothesis generation: compounds that reverse or aggravate the gene/TF targeting patterns of a given experiment are hypothetical candidates for experimental validation.

The overlap score is equal to card(intersect(Input_Genes_Up, Drug_Genes_Down)) + card(intersect(Input_Genes_Down, Drug_Genes_Up)) - card(intersect(Input_Genes_Up, Drug_Genes_Up)) - card(intersect(Input_Genes_Down, Drug_Genes_Down)), where card denotes the number of elements in a set. The cosine similarity compares the input vector to the vector of each drug. It is a signed measure that compares the direction of the vectors rather than their amplitude. In our case, we are interested in vectors that point in opposite directions and therefore have a negative cosine similarity.
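The two scores above can be sketched in a few lines of Python. This is only an illustration of the formulas as written here, not the code used by GRAND. The function names, the gene names, and the way genes are encoded as numeric vectors for the cosine similarity are all assumptions made for the example.

```python
import math


def overlap_score(input_up, input_down, drug_up, drug_down):
    """Count reversed gene sets minus matched gene sets, following the formula above."""
    input_up, input_down = set(input_up), set(input_down)
    drug_up, drug_down = set(drug_up), set(drug_down)
    reversed_count = len(input_up & drug_down) + len(input_down & drug_up)
    matched_count = len(input_up & drug_up) + len(input_down & drug_down)
    return reversed_count - matched_count


def cosine_similarity(u, v):
    """Signed cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Toy query: G1 and G2 are up, G3 is down; the toy drug pushes G1 down and G3 up
score = overlap_score({"G1", "G2"}, {"G3"}, {"G3", "G4"}, {"G1", "G5"})
print(score)

# Toy vectors pointing in exactly opposite directions
cos = cosine_similarity([1.0, -1.0, 0.5], [-2.0, 2.0, -1.0])
print(cos)
```

A drug that reverses the query pattern gets a positive overlap score and a negative cosine similarity, which is the case of interest here.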

Wiki

Gene/TF/miRNA targeting

Targeting is a score for directed bipartite networks. For source nodes (TFs/miRNAs), targeting is the weighted out-degree, i.e., the sum of the weights of the edges originating from the node. For target nodes (genes), targeting is the weighted in-degree, i.e., the sum of the weights of the edges reaching the node. This score is detailed in Weighill et al. 2021.
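To make the definition concrete, the following sketch computes both targeting scores from a list of weighted edges. The function name targeting_scores and the toy edges are hypothetical; the edge list is assumed to hold (source, target, weight) triples.

```python
from collections import defaultdict


def targeting_scores(edges):
    """Return (out_targeting, in_targeting) for an iterable of (source, target, weight) edges."""
    out_targeting = defaultdict(float)  # weighted out-degree of source nodes (TFs/miRNAs)
    in_targeting = defaultdict(float)   # weighted in-degree of target nodes (genes)
    for source, target, weight in edges:
        out_targeting[source] += weight
        in_targeting[target] += weight
    return dict(out_targeting), dict(in_targeting)


# Toy bipartite network: two TFs regulating two genes
toy_edges = [
    ("TF1", "GeneA", 2.0),
    ("TF1", "GeneB", -1.0),
    ("TF2", "GeneA", 0.5),
]
tf_targeting, gene_targeting = targeting_scores(toy_edges)
print(tf_targeting)
print(gene_targeting)
```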

API

Documentation

GRAND has an API that allows accessing and filtering the data programmatically to perform large-scale downloads. The full API documentation is available at https://grand.networkmedicine.org/redoc/. A web view of the API is also available for cells, cancer type, genes, tissues, and drug networks.

Tutorial 1

You can access the database programmatically through the API. Here we provide an example using the requests library in Python 3. Additionally, you need to install the awscli library to download networks from the command line.

    import requests
    import os      
    
We perform a GET operation to access the drugs database, but we can also query cancer types, genes, cells, and tissues. We need to make sure the returned status code is 200.

    response = requests.get('https://grand.networkmedicine.org/api/v1/drugapi/')

    # Check that the request succeeded before using the response
    if response.status_code == 200:
        print('Success!')
    elif response.status_code == 404:
        print('Not Found.')

Filtering can be done server-side by querying the attributes of the network of interest. The full list of attributes is available on the documentation page. For example, to query by drug name, we can use the following command:

    responseFiltered = requests.get('https://grand.networkmedicine.org/api/v1/drugapi/?drug=1-phenylbiguanide')
    
Next, we parse the JSON response into a Python dictionary for easier handling.

    data = response.json()
    drugs = data['results']

Since the API results are paginated for faster access, the GET request above returns only the first page, which contains the first 50 drugs. The address of the next page is stored under the following key:

    data['next']
    
We can print the names of the first 50 drugs in the database; the other attributes can be accessed in the same way.

    for drug in drugs:
        print(drug['drug'])
    
We can also download the drug-induced gene regulatory networks one by one using the awscli library. You can wrap this command in a for loop:

    os.system('aws s3 cp ' + drug['network'] + ' .')
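The pagination and the download loop can be combined as in the sketch below. The helper names iter_results, fetch_json, and download_network are hypothetical. We assume that the 'next' field is empty (None) on the last page and that drug['network'] holds an address that aws s3 cp accepts, as in the command above. The page fetcher is passed in as an argument so the loop can be tried offline with fake pages.

```python
import subprocess


def iter_results(first_url, get_json):
    """Yield every record, following the 'next' link of each page until it is empty."""
    url = first_url
    while url:
        page = get_json(url)
        yield from page["results"]
        url = page.get("next")  # assumption: None (or missing) on the last page


def fetch_json(url):
    """Fetch one API page with requests and raise an error on a failed request."""
    import requests  # imported here so iter_results can be used without requests installed

    response = requests.get(url, timeout=30)
    response.raise_for_status()
    return response.json()


def download_network(s3_uri, destination="."):
    """Copy one network with the aws command-line tool (awscli must be installed)."""
    subprocess.run(["aws", "s3", "cp", s3_uri, destination], check=True)


# Example usage (this would download every drug network, which may take a long time):
# for drug in iter_results("https://grand.networkmedicine.org/api/v1/drugapi/", fetch_json):
#     download_network(drug["network"])
```

Passing the command as a list to subprocess.run avoids building a shell string by hand, and check=True stops the loop if a download fails.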
    

Tutorial 2

It is possible to rebuild the networks hosted in the GRAND database using the metadata provided with each network. For example, we can rebuild the LCL cell line network using the provided prior data. A closer look at the LCL cell line entry shows that the network was reconstructed using the tool PANDA, as implemented in netZooM version 0.1. PANDA takes as input a PPI network, gene expression data, and TF DNA binding motif data. These priors are provided along with the network. Since the network was reconstructed in MATLAB, the following MATLAB tutorial goes through all the steps, from downloading the priors through programmatic access to reconstructing the network: Using GRAND database's API for reproducible network reconstruction.

Image credit