OncoMiner Pipeline | Help Page
Purpose of the OncoMiner Pipeline
The pipeline accepts nucleotide variant data in the format provided by Otogenetics Corporation's
exome sequencing services. Otogenetics provides its data in Microsoft Excel format; the files must be saved in comma-separated-value (.csv) format before they can be submitted to the OncoMiner pipeline. The pipeline computes the following results for each
variant in the input data:
- The Human Genome Variation Society (HGVS) notation describing the protein
variant induced by the nucleotide variant on each isoform of the protein encoded by the affected gene.
- An estimate of the functional impact of the variant on each protein isoform, using the
J. Craig Venter Institute's PROVEAN protein-variation scoring tool.
- Gene Ontology terms associated with the affected gene.
- Hyperlinks to any PubMed publications found by searching for the gene name in conjunction with
either the nucleotide variant's dbSNP ID, or the HGVS notation associated with the induced
protein variants.
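The PubMed lookup in the last item can be pictured as combining the gene name with either the dbSNP ID or the HGVS protein notation. The following is a minimal sketch, not the pipeline's actual code: the function name `pubmed_search_url`, the exact query syntax, and the example identifiers are assumptions made for illustration.

```python
from urllib.parse import urlencode

PUBMED_BASE = "https://pubmed.ncbi.nlm.nih.gov/"


def pubmed_search_url(gene, dbsnp_id=None, hgvs_protein=None):
    """Build a PubMed search URL for: gene AND (dbSNP ID OR HGVS variant)."""
    variant_terms = [t for t in (dbsnp_id, hgvs_protein) if t]
    if not variant_terms:
        raise ValueError("need a dbSNP ID or an HGVS protein notation")
    quoted = " OR ".join(f'"{t}"' for t in variant_terms)
    term = f'"{gene}" AND ({quoted})'
    # urlencode percent-escapes quotes and parentheses and turns spaces into '+'.
    return PUBMED_BASE + "?" + urlencode({"term": term})


# Placeholder identifiers, chosen only to show the shape of the query.
url = pubmed_search_url("GENE1", dbsnp_id="rs123456", hgvs_protein="p.Arg97Gly")
print(url)
```

Whether the pipeline issues one combined query or separate queries per identifier is not stated on this page; the sketch combines them for brevity.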
A submitted job will produce four output files:
- jobname.PROVEAN.csv: contains rows in the same format as the input file, except that each row is duplicated for each isoform of the affected protein. Columns are added giving the amino-acid (AA) variant in Human Genome Variation Society format, the PROVEAN score for the specific isoform variant (as compared with the GenBank human genome reference assembly, build 37), the BBRC FASTA header name, and the protein length.
- jobname.summary.csv: contains rows in the same format as the input file, with one row per input row. Columns are added giving the minimum and maximum PROVEAN scores across all isoforms affected by the variant, plus a column containing all isoforms' PROVEAN scores separated by vertical bars.
- jobname.annotated.csv: has the same format as the jobname.PROVEAN.csv file, with additional columns giving the Gene Ontology terms associated with each row's gene name and hyperlinks to any PubMed references found for the dbSNP or HGVS variant. Rows with high (non-damaging) PROVEAN scores are omitted.
- jobname.PROVEAN.full.annotated.csv: This is like the jobname.annotated.csv file, except it contains all rows; no filtering by PROVEAN score is performed.
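The relationship between the per-isoform file, the summary file, and the filtered annotated file can be sketched as follows. This is an illustration under stated assumptions, not the pipeline's implementation: the column names `variant_id` and `provean_score` are hypothetical, the sample rows are made up, and the cutoff of -2.5 is PROVEAN's documented default rather than a value confirmed for OncoMiner.

```python
import csv
import io

# Assumed column names; the real headers in jobname.PROVEAN.csv may differ.
VARIANT_COL = "variant_id"
SCORE_COL = "provean_score"

# -2.5 is PROVEAN's documented default cutoff; whether the pipeline uses
# the same value is an assumption.
DAMAGING_CUTOFF = -2.5


def summarize(rows):
    """Collapse per-isoform rows into one min/max/all-scores entry per variant."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row[VARIANT_COL], []).append(row[SCORE_COL])
    summary = {}
    for variant, scores in grouped.items():
        values = [float(s) for s in scores]
        summary[variant] = {
            "min": min(values),
            "max": max(values),
            "all": "|".join(scores),  # vertical-bar-separated, as in the summary file
        }
    return summary


def keep_damaging(rows, cutoff=DAMAGING_CUTOFF):
    """Keep rows whose score is at or below the cutoff (lower = more damaging)."""
    return [row for row in rows if float(row[SCORE_COL]) <= cutoff]


# Made-up per-isoform rows standing in for a jobname.PROVEAN.csv file.
sample = io.StringIO(
    "variant_id,isoform,provean_score\n"
    "v1,iso1,-3.1\n"
    "v1,iso2,-0.4\n"
    "v2,iso1,-5.0\n"
)
rows = list(csv.DictReader(sample))
summary = summarize(rows)
damaging = keep_damaging(rows)
```

Here `summary` mirrors the extra columns of jobname.summary.csv, and `damaging` mirrors the row filtering that distinguishes jobname.annotated.csv from jobname.PROVEAN.full.annotated.csv.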
Click here to see a specification of the required format of input files.
Click the "Submit a nucleotide variant job" link on
any page to submit a nucleotide variant file for processing. You will
be asked to choose a file for upload, and to supply an email address to
which a link to the output files will be mailed when the job is
complete. You will also be able to choose lists of gene names to be
included or excluded from processing. You can upload a gene list using
the "Upload a gene-list file" link, and it
will then be available for use as an include/exclude list.
A typical whole-exome data set for a single individual takes
about 24 hours to process.
When your job has completed, you will receive an email at the address you supplied
when submitting the job. That email will contain a link to download the results
from the pipeline server.
Circos heat-map generation from pipeline output
An additional service supplied by the pipeline server is the ability to combine the
results of several pipeline output files and create a heat map showing how damaging
the affected protein variants are predicted to be. To use the heat-map generator, you must
download the pipeline results to your local computer, unzip the result file to
obtain the pipeline output files, and then upload the files from which you want to
create a heat map.
Use the "Upload scored data sets for Circos heat-map generation"
link to submit pipeline result files to the heat-map generator.
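The unzip step before uploading can be done with any archive tool; the sketch below shows one way to do it in Python and list the scored files that were extracted. The function name `extract_scored_files` and the archive name in the usage note are hypothetical, and the choice of the `.PROVEAN.csv` suffix is an assumption, since this page does not say which of the four output files the heat-map generator expects.

```python
import zipfile
from pathlib import Path


def extract_scored_files(zip_path, dest_dir, suffix=".PROVEAN.csv"):
    """Unzip a result archive and return the extracted paths ending in `suffix`."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)  # creates dest if it does not exist
        names = archive.namelist()
    return sorted(dest / name for name in names if name.endswith(suffix))
```

For example, `extract_scored_files("results.zip", "results")` would return the paths of the matching files under `results/`, ready to be selected in the upload form.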