OncoMiner Pipeline | Required File Format
Required input format for nucleotide variant files
- File names should not contain spaces, and should end with ".csv".
- The first line of the file must contain a comma-separated list of column names,
indicating the names of the data in the corresponding positions on each
successive line.
- Successive lines must contain comma-separated lists of data items in the
order implied by the first line.
- The order of columns is not important, as long as all data lines are
consistent with the column names in the header line.
- The following columns must be present; other columns will be ignored:
- var_index
- Numeric index of the data row
- chrom
- Chromosome on which the variant appears, e.g., chr3, chrX
- left
- Locus within the chromosome at which the variant appears, in base pairs from the 5' end
- right
- Locus immediately after the end of the variant; for SNPs this will be left+1
- ref_seq
- Nucleotides in the reference genome; for example, C if the reference genome contains C at the variant locus
- var_seq1
- First nucleotide variation; for example, A
- var_seq2
- Second nucleotide variation; for example, C. var_seq1 and var_seq2 may be the same if the variant affects both copies of the chromosome in the same way.
- var_score
- Read score of the variant; this is a measure of the reliability of the variant read, and ranges from 0 to 35, with 35 being the most reliable
- gene_name
- GenBank name of the gene in or near which the variant appears
- where_in_transcript
- Location of the variant within the transcript: CDS, Intron, etc. Only rows with the value CDS will be processed
- change_type1
- Synonymous or Non-Synonymous, indicating the peptide change induced by the variant
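The rules above can be checked mechanically before a file is submitted. The following is a minimal sketch in Python, not part of the OncoMiner service itself; the names `REQUIRED_COLUMNS`, `check_file_name`, and `missing_columns` are hypothetical and chosen only for illustration. It assumes the ".csv" suffix and the column names are matched exactly as written, which the rules above do not spell out.

```python
import csv
import io

# Required column names, copied from the list above.
REQUIRED_COLUMNS = [
    "var_index", "chrom", "left", "right", "ref_seq",
    "var_seq1", "var_seq2", "var_score", "gene_name",
    "where_in_transcript", "change_type1",
]


def check_file_name(name):
    """Return True if the name contains no spaces and ends with '.csv'.

    Assumption: the suffix is compared case-sensitively.
    """
    return " " not in name and name.endswith(".csv")


def missing_columns(csv_text):
    """Return the required columns that are absent from the header line.

    Column order is irrelevant and extra columns are ignored, so only
    membership in the header is tested.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    present = set(header)
    return [column for column in REQUIRED_COLUMNS if column not in present]
```

An empty list returned by `missing_columns` means every required column is present; a non-empty list names the columns that still need to be added.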
Example
Here is an example of the text of a valid input file containing two data rows. Note that the example has additional columns (e.g., var_peptide1) that
are not among the required columns. That is fine; the OncoMiner service will ignore those columns. The example data is extracted directly
from a data set provided by Otogenetics Corp.
var_index,chrom,left,right,ref_seq,var_type,zygosity,var_seq1,var_seq2,var_score,not_ref_score,coverage,read_count1,read_count2,conservation,gene_name,transcript_name,where_in_transcript,change_type1,ref_peptide1,var_peptide1,change_type2,ref_peptide2,var_peptide2,dbsnp,dbsnp_build
1,chr1,150199051,150199057,ttcctc,DEL,Het,,ttcctc,35.00000000,35.00000000,60,13,44,,ANP32E,NM_001136478,CDS,Non-Synonymous,EE,,,,,rs56692627,129
2,chr10,76854564,76854565,c,SNP,Hom,t,t,35.00000000,35.00000000,81,80,80,,DUSP13,NM_001007272,CDS,Non-Synonymous,C,Y,Non-Synonymous,C,Y,rs3088142,102
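As an illustration of how a file in this format might be read, the sketch below parses rows with Python's standard `csv` module and keeps only those whose where_in_transcript value is CDS, mirroring the rule stated for that column. It is an example under stated assumptions, not the OncoMiner implementation: the function name `cds_rows` is hypothetical, and the sample text is abbreviated to a few columns, with the second row invented purely to show a row being skipped.

```python
import csv
import io


def cds_rows(csv_text):
    """Yield each data row, as a dict keyed by column name, whose
    where_in_transcript value is exactly 'CDS'."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["where_in_transcript"] == "CDS":
            yield row


# Abbreviated sample: the first row reuses values from the example above;
# the second row is made up for illustration only.
SAMPLE = (
    "var_index,chrom,left,right,where_in_transcript\n"
    "1,chr10,76854564,76854565,CDS\n"
    "2,chr3,100,101,Intron\n"
)

kept = list(cds_rows(SAMPLE))
```

Because `csv.DictReader` looks values up by the names in the header line, this approach does not depend on the order of the columns, which matches the format's rule that column order is not important.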