.pairs format. The network is written in a text or CSV file with the following format:
node1 node2 edge_weight
e.g., A B 1.0
node1:node2 edge_weight_sample_1 edge_weight_sample_n
e.g., A:B 1.0 2.0
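The two layouts above can be read with a few lines of Python. This is a minimal sketch, assuming fields are whitespace-separated and the file has no header row; `parse_pairs` and `parse_multisample_pairs` are illustrative names, not part of any published tool.

```python
from collections import defaultdict


def parse_pairs(lines):
    """Parse 'node1 node2 edge_weight' lines into {node1: {node2: weight}}."""
    network = defaultdict(dict)
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        source, target, weight = fields
        network[source][target] = float(weight)
    return dict(network)


def parse_multisample_pairs(lines):
    """Parse 'node1:node2 w_1 ... w_n' lines into {(node1, node2): [weights]}."""
    edges = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2 or ":" not in fields[0]:
            continue  # skip blank or malformed lines
        source, target = fields[0].split(":", 1)
        edges[(source, target)] = [float(w) for w in fields[1:]]
    return edges


single = parse_pairs(["A B 1.0"])
multi = parse_multisample_pairs(["A:B 1.0 2.0"])
```

For a file on disk, pass the open file object in place of the list of strings.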
.csv format. The bipartite network is saved as the weighted adjacency matrix W(TFs,Genes).
.mat format. The bipartite network is saved as the weighted adjacency matrix W(TFs,Genes) in MATLAB format.
gt: stands for gene targeting. Gene targeting is the sum of weighted in-degrees in the network.
tt: stands for TF targeting. TF targeting is the sum of weighted out-degrees in the network.
o: stands for original edge weights. These are the original edge weights computed by the PANDA algorithm. They usually vary between -20 and 20. A large value means a high likelihood that an edge exists between two nodes; a large negative value means a small probability of interaction between them.
t1: stands for transformation 1. This is the transformation of the original edge weights into a positive quantity. The transformation is detailed in Sonawane et al., Cell Reports, 2017.
t2: stands for transformation 2. This is a transformation of the original edge weights into a quantity between 0 and 1 using a logistic function with parameter 0.3.
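As an illustration of transformation 2, the sketch below maps an edge weight into the interval (0, 1) with a logistic curve. It assumes that the parameter 0.3 is the steepness of the curve and that the curve is centred at 0; the text does not spell out either detail, so treat `logistic_transform` as a hypothetical helper rather than the exact formula used by the database.

```python
import math


def logistic_transform(weight, steepness=0.3):
    """Map an edge weight to (0, 1); steepness=0.3 is an assumed reading of the text."""
    return 1.0 / (1.0 + math.exp(-steepness * weight))


# Weights near the usual extremes of -20 and 20 land close to 0 and 1.
low = logistic_transform(-20.0)
mid = logistic_transform(0.0)
high = logistic_transform(20.0)
```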
<format><edge weights>. For example, combining .pairs with o gives a network in .pairs format with original edge weights.
The analysis section lets you analyse a set of TFs or genes in the disease or small molecule category. It exploits the duality between TFs and genes in bipartite gene regulatory networks.
The small molecule analysis section finds compounds that optimally reverse the gene targeting or the transcription factor activity patterns in the query set.
Gene targeting refers to the weighted in-degree of a given gene. Since PANDA is usually validated against ChIP-seq data, targeting can be interpreted as the binding profile of TFs for a given gene in a particular experiment. Transcription factor targeting refers to the weighted out-degree of a given transcription factor. This tool serves for hypothesis generation: compounds that reverse or aggravate the gene/TF targeting patterns of a given experiment are hypothetical candidates for experimental validation.
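The two targeting scores can be sketched from the weighted adjacency matrix W(TFs,Genes), with TFs as rows and genes as columns. The function name `targeting_scores` and the toy matrix are made up for illustration.

```python
def targeting_scores(weights, tfs, genes):
    """Return (gene_targeting, tf_targeting) for a TF-by-gene weight matrix.

    weights[i][j] is the edge weight from tfs[i] to genes[j].
    """
    # TF targeting: weighted out-degree, i.e. the sum over each row.
    tf_targeting = {tf: sum(row) for tf, row in zip(tfs, weights)}
    # Gene targeting: weighted in-degree, i.e. the sum over each column.
    gene_targeting = {gene: sum(col) for gene, col in zip(genes, zip(*weights))}
    return gene_targeting, tf_targeting


tfs = ["TF1", "TF2"]
genes = ["G1", "G2", "G3"]
W = [
    [1.0, -0.5, 2.0],
    [0.5, 1.5, -1.0],
]
gene_targeting, tf_targeting = targeting_scores(W, tfs, genes)
```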
The overlap score is equal to card(intersect(Input_Genes_Up, Drug_Genes_Down)) + card(intersect(Input_Genes_Down, Drug_Genes_Up)) - card(intersect(Input_Genes_Up, Drug_Genes_Up)) - card(intersect(Input_Genes_Down, Drug_Genes_Down)). The cosine similarity compares the input vector to the vector of each drug. It is a signed measure that compares the direction of the vectors rather than their amplitude. In our case, we are interested in drug vectors that point in the opposite direction to the input, that is, those with negative cosine similarity.
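Both scores can be written out directly from the definitions above. This is a minimal sketch using plain Python sets and lists; the gene names and vectors are made up for illustration.

```python
import math


def overlap_score(input_up, input_down, drug_up, drug_down):
    """Count reversed gene pairs minus gene pairs regulated in the same direction."""
    input_up, input_down = set(input_up), set(input_down)
    drug_up, drug_down = set(drug_up), set(drug_down)
    return (
        len(input_up & drug_down)
        + len(input_down & drug_up)
        - len(input_up & drug_up)
        - len(input_down & drug_down)
    )


def cosine_similarity(u, v):
    """Signed cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# The drug lowers A (up in the input) and raises C (down in the input).
score = overlap_score({"A", "B"}, {"C"}, {"C"}, {"A", "D"})
# A drug vector pointing the opposite way gives a negative similarity.
similarity = cosine_similarity([3.0, 4.0], [-3.0, -4.0])
```

A higher overlap score and a more negative cosine similarity both point to a compound that reverses the input pattern.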
We perform a GET request to access the drugs database, but we can also query cancer types, genes, cells, and tissues. We need to make sure the returned status code is 200.
import requests
import os
Filtering can be done server-side by querying the attributes of the network of interest. The full list of attributes is available on the documentation page. For example, to query by drug name, we can use the following command:
Since the API results are paginated for faster access, the previous command returns the first page with the first 50 drugs. To get the address of the next page, please use the following command:
if response.status_code == 200:
    print('Success!')
elif response.status_code == 404:
    print('Not Found.')
Then, we parse the result data as JSON for easier handling.
We can print the names of the first 50 drugs in the database, among other attributes.
We can also serially download the drug-induced gene regulatory networks through the awscli library. You can wrap this command into a for loop:
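To go beyond the first page, the pages can be walked one by one. The sketch below assumes each page is a JSON object with a "results" list and a "next" field that holds the address of the following page (or None on the last one). This layout is common for paginated REST APIs but is an assumption here, so check the documentation page for the actual field names. `iter_paginated` and `fetch_json` are hypothetical names, and the in-memory pages stand in for real responses.

```python
def iter_paginated(first_url, fetch_json):
    """Yield every record across pages, following the assumed 'next' links."""
    url = first_url
    while url:
        page = fetch_json(url)
        yield from page["results"]
        url = page.get("next")  # None or missing ends the loop


# Stand-in for the API: two fake pages keyed by made-up addresses.
FAKE_PAGES = {
    "page-1": {"results": [{"drug": "drug-a"}, {"drug": "drug-b"}], "next": "page-2"},
    "page-2": {"results": [{"drug": "drug-c"}], "next": None},
}

all_drugs = list(iter_paginated("page-1", FAKE_PAGES.__getitem__))
names = [record["drug"] for record in all_drugs]
```

Against a live endpoint, `fetch_json` could be `lambda url: requests.get(url).json()`, ideally after checking the status code as shown earlier.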
for drug in drugs:
    print(drug['drug'])
os.system('aws s3 cp ' + drug['network'] + ' .')
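One possible way to wrap the download in a loop is sketched below. It swaps `os.system` for `subprocess.run` with an argument list, which avoids building a shell string by hand; that is a design choice of this sketch, not something the original command requires. The `network` key comes from the command above, while the function names, the `dry_run` flag, and the S3 address are made up for illustration.

```python
import subprocess


def build_download_commands(drugs, destination="."):
    """Build one 'aws s3 cp <network> <destination>' argument list per record."""
    return [["aws", "s3", "cp", drug["network"], destination] for drug in drugs]


def download_networks(drugs, destination=".", dry_run=True):
    """Run the copy commands one after another, or only print them if dry_run."""
    for command in build_download_commands(drugs, destination):
        if dry_run:
            print(" ".join(command))
        else:
            # check=True raises CalledProcessError if a copy fails.
            subprocess.run(command, check=True)


# Hypothetical record; real records come from the API response.
example_drugs = [{"drug": "drug-a", "network": "s3://example-bucket/drug-a.csv"}]
commands = build_download_commands(example_drugs)
download_networks(example_drugs)  # dry run: prints the command only
```

Set `dry_run=False` once the printed commands look right and the AWS CLI is installed.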