Tissue-spEcific mrNa iSoform functional Networks (TENSION)

This README file describes the structure and contents of TENSION.

Requirements: R, Python, STAR (https://github.com/alexdobin/STAR), StringTie (https://ccb.jhu.edu/software/stringtie/), and a Linux-based operating system.

Data Structure (an * represents multiple files in a series):

- Data: All raw data files, such as Gene Ontology annotations, the Gene Ontology structure, and protein-protein interactions, are provided here.
    Encode_MMU_Experiments_Types.txt.gz : This file describes the relation between ENCODE experiment IDs and tissues.
    go_*.obo.gz : These contain the Gene Ontology structure used for functional label generation.
    mm_ref_GRCm38.p4_chr*.fa.edit.gz : This set of files contains the mouse genome (build GRCm38.p4). The headers have been modified to make them compatible with the annotation file.
    MMU_APID_L2_*.txt.gz : These contain the gene/protein level interactions obtained from the Agile Protein Interactomes DataServer (APID) that are supported by at least two pieces of experimental evidence (level 2 dataset).
    MMU_BioGrid_*.txt.gz : These contain the gene/protein level interactions obtained from the Biological General Repository for Interaction Datasets (BioGRID).
    MMU_FPKM_Transcript.txt.gz : This contains the mRNA isoform expression profiles from all 359 RNA-Seq samples used in this study. Each row represents an mRNA isoform, and each column holds the FPKM values for one sample.
    MMU_GO_*.gz : These contain the mouse Gene Ontology annotations downloaded for generating mRNA isoform pair labels.
    MMU_IID_*.txt.gz : These contain the gene/protein level interactions obtained from the Integrated Interactions Database (IID). Interactions supported only by orthologous evidence have been removed.
    MMU_IntAct_Web_*.txt.gz : These contain the gene/protein level interactions obtained from IntAct.
    MMU_Kegg_*.txt.gz : These contain the gene/protein level pathway annotations obtained from the Kyoto Encyclopedia of Genes and Genomes (KEGG).
    MMU_Mentha_*.txt.gz : These contain the gene/protein level interactions obtained from Mentha. Interactions with a score less than 0.2 have been removed.
    MouseCyc_GeneName_*.txt.gz : These contain the gene/protein level pathway annotations obtained from BioCyc.
    protein_modified_header.fa.gz : The protein sequences for the mouse genome (build GRCm38.p4). The headers have been modified to match the mRNA that produces each protein.
    ref_GRCm38.p4_top_level.gff3.gz : This contains the mouse genome annotations (build GRCm38.p4).
    rna_modified_header.fa.gz : The mRNA sequences for the mouse genome (build GRCm38.p4). The headers have been modified to contain only the NCBI RefSeq IDs.
    seq_gene.md.gz : This mapview file from NCBI contains the mapping of the EntrezID in KEGG to GeneName (build GRCm38.p4).
    STAR_Index.tar.gz : The mouse genome index generated by STAR and used throughout this study.

- Predictions: This folder contains the input/output files for TENSION.
    MMU_*_Cases1.csv.gz : These contain the tissue-specific functional (Case2) and non-functional (Case3) mRNA isoform pairs identified by TENSION. The rows represent mRNA isoform pairs, and the columns contain the probability and the predicted label (Negative:0; Positive:1) for each tissue.
    MMU_functionalNetworks13Nov18.RData : This contains the filtered tissue-specific functional mRNA isoform pairs for all 17 tissues.
    MMU_modelJoblib_13Nov18.pkl : This contains the original training and testing datasets, the final models, and the predictions and performance metrics for the testing dataset.
    MMU_nonFunctionalNetworks13Nov18.RData : This contains the filtered tissue-specific non-functional mRNA isoform pairs for all 17 tissues.
    MMU_Predict_*.gz : These contain the features for all the mRNA isoform pairs. The first two columns contain the mRNA isoform IDs, while the remaining columns contain the z-score for each of the 27 features.
    MMU_Predictions_13Nov18.csv.gz : The predictions for the original testing dataset.
    MMU_Test_13Nov18.csv.gz : The original testing file used throughout the study.
    MMU_Train_13Nov18.csv.gz : The original training file used throughout the study.
    MMU_TrueLabels_*.txt.gz : These contain the positive and negative mRNA isoform pairs from the original and validation datasets.
    MMU_Truelabels...csv.gz : This file contains the features for all the positive/negative mRNA isoform pairs used to generate the training/testing datasets.
    pairsToValidate13Nov18.txt.gz : This file contains the validation mRNA isoform pairs generated using the newer GO annotations, PPIs, and pathway data.
    prediction_cut1.01._fold1._5.rna.gz : This contains the predictions from the Multi-Instance Learning based Bayesian Network Classifier method on the original testing dataset.
    Predictions_MMU_*.csv.gz : These contain the predictions from TENSION for all the mRNA isoform pairs. The first two columns contain the mRNA isoform IDs, while the remaining columns contain the probability and the predicted label (Negative:0; Positive:1) for each tissue.
    validatedTPPairsLabelled13Nov18.txt.gz : This file contains the predictions and actual labels for the validation dataset.

- Scripts: This folder contains all the scripts used to build TENSION.
    extractNewTPPairs.R : An R script to extract the predictions for the validation dataset.
    extractTissueCases.py : A Python script to extract the tissue-specific functional and non-functional mRNA isoform pairs.
    fpkmPairs.R : An R script to calculate the mRNA isoform pair features using all the RNA-Seq samples.
    fpkmTissuePairs.R : An R script to calculate the tissue-level mRNA isoform pair features using the RNA-Seq samples.
    generateGOPairs.R : An R script to extract the mRNA isoform pair labels for TENSION.
    generatePairs.sh : A shell script to run the R scripts (fpkmPairs.R, fpkmTissuePairs.R, protSeqPairs.R, and mrnaSeqPairs.R) that calculate the mRNA isoform pair features.
    generateValidationGOPairs.R : An R script to extract the mRNA isoform pair labels for the validation dataset of TENSION.
    mergeFeatures.sh : A shell script to merge all the feature files into a single large data matrix, extract the features for the labelled mRNA isoform pairs, and split the large data matrix into smaller file chunks.
    mergeRNASeq.R : An R script to merge the mRNA isoform expression profiles from StringTie.
    methodComparison.py : A Python script to compare the predictions of TENSION with a previously published method and plot the performance curves.
    mrnaSeqPairs.R : An R script to calculate the mRNA isoform level features using the mRNA sequences.
    protSeqPairs.R : An R script to calculate the mRNA isoform level features using the protein sequences.
    RFLabelShuffleBoxplot.py : A Python script to generate randomized datasets with shuffled class labels and plot a boxplot of the performance metrics.
    RFPredict.py : A Python script to make predictions on all mRNA isoform pairs.
    RFRandomizedBoxplot.py : A Python script to generate randomized datasets and plot a boxplot of the performance metrics.
    RFSaveModels.py : A Python script to make predictions on the testing dataset, save the final model, and plot performance curves.
    RNASeq.sh : A shell script with the arguments used to run STAR and StringTie for processing RNA-Seq samples.
    stratifiedKFold.py : A Python script to perform stratified K-fold validation and plot performance curves.
    validatePredictionsCurve.py : A Python script to plot the performance curves for the validation dataset.
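
Example: reading a pair table

The pair tables in the Predictions folder are described above as having the two mRNA isoform IDs in the first two columns and numeric values (feature z-scores, or per-tissue probabilities and 0/1 labels) in the remaining columns. The sketch below shows one way such a file could be read with the Python standard library. It is not part of the TENSION scripts. It assumes a comma-delimited file with a header row and fully numeric value columns, none of which is confirmed by this README; the function names and the column name "label_liver" in the usage note are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator, List, Tuple


def read_pair_table(
    path: str, delimiter: str = ","
) -> Iterator[Tuple[str, str, Dict[str, float]]]:
    """Yield (isoform_a, isoform_b, values) for each row of a gzipped pair table.

    Assumption: the file has a header row, the first two columns hold the
    mRNA isoform IDs, and every remaining column is numeric.
    """
    with gzip.open(path, mode="rt", newline="") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        value_names = header[2:]
        for row in reader:
            if not row:
                continue  # skip blank lines
            values = {
                name: float(cell) for name, cell in zip(value_names, row[2:])
            }
            yield row[0], row[1], values


def positive_pairs(path: str, label_column: str) -> List[Tuple[str, str]]:
    """Return the isoform pairs whose label in `label_column` is 1 (Positive)."""
    return [
        (iso_a, iso_b)
        for iso_a, iso_b, values in read_pair_table(path)
        if values[label_column] == 1.0
    ]
```

For instance, positive_pairs("some_predictions.csv.gz", "label_liver") would list the pairs predicted positive in one tissue, provided the file actually has a column with that name. Check the real header of each file first; if the delimiter differs, pass it through the delimiter argument of read_pair_table.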