OncoMiner Pipeline | Help Page

Purpose of the OncoMiner Pipeline

The pipeline accepts nucleotide variant data in the format provided by Otogenetics Corporation's exome sequencing services. Otogenetics provides its data in Microsoft Excel format; the files must be saved in comma-separated-value (.CSV) format before they can be submitted to the OncoMiner pipeline. For each variant in the input data, the pipeline computes the results described in the output files listed below.

A submitted job will produce four output files:

  1. jobname.PROVEAN.csv: contains rows of the same format as the input file, except that each row is duplicated for each isoform of the affected protein, and columns are added indicating the amino acid (AA) variant in Human Genome Variation Society (HGVS) format, the PROVEAN score for the specific isoform variant (as compared with the GenBank human genome reference assembly, build 37), the BBRC FASTA header name, and the protein length.
  2. jobname.summary.csv: contains rows of the same format as the input file, with one row per input row. Columns are added indicating the minimum and maximum PROVEAN scores across all isoforms resulting from the variant, along with a column containing all isoforms' PROVEAN scores separated by vertical bars.
  3. jobname.annotated.csv: this file is in the same format as the jobname.PROVEAN.csv file, with additional columns giving the Gene Ontology terms associated with each row's gene name and hyperlinks to any PubMed references that were found for the dbSNP or HGVS variant. Rows with high (non-damaging) PROVEAN scores are omitted.
  4. jobname.PROVEAN.full.annotated.csv: this file is like the jobname.annotated.csv file, except that it contains all rows; no filtering by PROVEAN score is performed.
Click here to see a specification of the required format of input files.
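
The relationship between the per-isoform rows and the summary and filtered files described above can be illustrated with a short Python sketch. It is not the pipeline's own code. The column names variant_id and PROVEAN_score are hypothetical stand-ins for the real headers, and the cutoff of -2.5 is PROVEAN's published default; the threshold the pipeline actually applies is not stated on this page.

```python
import csv
import io
from collections import defaultdict

# Hypothetical column names; the real headers come from the pipeline's file format.
VARIANT_COL = "variant_id"
SCORE_COL = "PROVEAN_score"
# PROVEAN's published default cutoff (assumption: the pipeline may use another value).
CUTOFF = -2.5


def summarize(rows):
    """Collapse per-isoform rows into min, max, and bar-separated scores per variant."""
    scores = defaultdict(list)
    for row in rows:
        scores[row[VARIANT_COL]].append(float(row[SCORE_COL]))
    return {
        variant: {
            "min": min(vals),
            "max": max(vals),
            "all": "|".join(str(v) for v in vals),
        }
        for variant, vals in scores.items()
    }


def keep_damaging(rows, cutoff=CUTOFF):
    """Keep rows at or below the cutoff; higher scores are treated as non-damaging."""
    return [row for row in rows if float(row[SCORE_COL]) <= cutoff]


# Three made-up per-isoform rows: two isoforms for var1, one for var2.
sample = io.StringIO(
    "variant_id,PROVEAN_score\n"
    "var1,-4.1\n"
    "var1,-0.3\n"
    "var2,0.5\n"
)
rows = list(csv.DictReader(sample))
summary = summarize(rows)
damaging = keep_damaging(rows)
```

Here summarize mirrors the one-row-per-variant layout of the summary file, and keep_damaging mirrors the omission of high-scoring rows from the annotated file.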

Click the "Submit a nucleotide variant job" link on any page to submit a nucleotide variant file for processing. You will be asked to choose a file for upload, and to supply an email address to which a link to the output files will be mailed when the job is complete. You will also be able to choose lists of gene names to be included or excluded from processing. You can upload a gene list using the "Upload a gene-list file" link, and it will then be available for use as an include/exclude list.

A whole-exome data set for a single individual typically takes about 24 hours to process. When your job has completed, you will receive an email at the address you supplied when submitting the job. That email will contain a link to download the results from the pipeline server.

Circos heat-map generation from pipeline output

An additional service supplied by the pipeline server is the ability to combine the results of several pipeline output files and create a heat map showing how damaging the affected protein variants are predicted to be. To use the heat-map generator, you must download the pipeline results to your local computer, unzip the result file to obtain the pipeline output files, and then upload the files from which you want to create a heat map.
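
If you prefer to script the unzip step, the following Python sketch shows one way to do it with the standard library. The function name extract_outputs is illustrative, and the sketch assumes the downloaded result file is a standard ZIP archive containing the .csv output files.

```python
import zipfile
from pathlib import Path


def extract_outputs(archive_path, dest_dir):
    """Extract the CSV members of a results archive and return their paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for name in archive.namelist():
            # Only the .csv output files are needed for the heat-map upload.
            if name.lower().endswith(".csv"):
                extracted.append(Path(archive.extract(name, dest)))
    return sorted(extracted)
```

For example, extract_outputs("jobname.zip", "results") would place the CSV files under a results folder, where jobname.zip is a placeholder for whatever file name the download link provides.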

Use the "Upload scored data sets for Circos heat-map generation" link to submit pipeline result files to the heat-map generator.

