
Feature to filter out known SNPs (SNPs with rs#) from the vcf file using SNP analysis package?

asked 2013-05-16 15:45:34 -0600

Bhakti Dwivedi

Given a vcf file, is there a feature in the SNP analysis package to filter out known SNPs (SNPs with an associated rs#)? If so, how is it done? Is it simply concordance between the variant call in the vcf file and the known SNP in the dbSNP 137 build, by chromosome#, position, reference, and alternate allele?

I want to filter out any known and common SNPs present in dbSNP or Thousand Genomes from my vcf file to get to the novel SNPs. I am noticing discrepancies between the tools out there that provide dbSNP validation and the SNPs listed in the dbSNP 137 build file itself (downloaded from the NCBI ftp). For example, for a given variant that I do not find in the dbSNP 137 file, I end up finding an rs# associated with the same variant in a tool such as Polyphen-2 or SeattleSeq while I am searching for protein predictions. The SNP analysis package by Golden Helix integrates these multiple small steps in one environment, and I like that. However, I want to know: if I use the SNP analysis package and don't find an rs# associated with my variant call, does that really mean no rs# exists?

Any feedback or comment will be appreciated!

Thank you!

Regards, BD


1 answer


answered 2013-05-21 09:25:31 -0600

Hi Bhakti,

The tool in our SNP and Variation Suite (SVS7) software that allows you to filter out known SNPs uses matching criteria between the annotation information of your dataset and the data contained in the probe track. The most recent track we have available for this tool is the SNPs 137 track, which was created using the downloadable file from NCBI.

With the most basic options for the tool, if the chromosome# and position of a particular marker in your data match a probe in the track, this is considered a match. If filtering is selected, the marker in your data will then be either kept or removed, depending on the user-selected filter parameters.

There are also options for more stringent matching criteria. One option is to match only when the alleles of the genotype columns match the alleles for the probe in the track. There is also a matching option for InDels, which requires a reference allele field in the marker map of your dataset. You can find more information about this tool in our online manual (SVS Manual).
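To make the two matching levels concrete, here is a minimal Python sketch of position-only versus allele-aware filtering against a dbSNP-style VCF. This is not SVS code and does not reflect how SVS is implemented; the function names, the `match_alleles` flag, the rs numbers, and the sample records are all made up for illustration. It assumes both inputs are plain tab-delimited VCF lines and that a leading "chr" prefix is the only difference in chromosome naming.

```python
def parse_vcf_records(lines):
    """Yield (chrom, pos, ref, alts, raw_line) for each VCF data line."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        chrom, pos, _id, ref, alt = fields[:5]
        # Assumption: "chr1" and "1" name the same chromosome.
        if chrom.lower().startswith("chr"):
            chrom = chrom[3:]
        yield chrom, int(pos), ref.upper(), alt.upper().split(","), line


def build_known_index(dbsnp_lines):
    """Index known variants by position and by position plus alleles."""
    by_position, by_allele = set(), set()
    for chrom, pos, ref, alts, _ in parse_vcf_records(dbsnp_lines):
        by_position.add((chrom, pos))
        for alt in alts:
            by_allele.add((chrom, pos, ref, alt))
    return by_position, by_allele


def filter_novel(sample_lines, dbsnp_lines, match_alleles=False):
    """Return the sample VCF lines that do not match a known variant."""
    by_position, by_allele = build_known_index(dbsnp_lines)
    novel = []
    for chrom, pos, ref, alts, line in parse_vcf_records(sample_lines):
        if match_alleles:
            # Stringent: known only if every ALT allele is in the reference set.
            known = all((chrom, pos, ref, a) in by_allele for a in alts)
        else:
            # Basic: a chromosome and position match is enough.
            known = (chrom, pos) in by_position
        if not known:
            novel.append(line)
    return novel


# Made-up example data (the rs numbers are placeholders, not real lookups).
DBSNP = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT",
    "1\t1000\trs111\tA\tG",
    "2\t2000\trs222\tC\tT",
]
SAMPLE = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t50\tPASS\t.",
    "chr2\t2000\t.\tC\tA\t50\tPASS\t.",
    "3\t3000\t.\tG\tT\t50\tPASS\t.",
]

novel_by_position = filter_novel(SAMPLE, DBSNP)
novel_by_allele = filter_novel(SAMPLE, DBSNP, match_alleles=True)
```

In this toy data the chr2 record shares a position with a known SNP but carries a different alternate allele, so it is dropped under position-only matching and kept under allele matching. That difference alone can account for some of the disagreement between tools about whether a variant "has an rs#".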

With SVS we can only annotate using the downloads provided by the database sources, so these tracks may not contain the most up-to-date information available directly from the source. It is therefore always possible that the tool in SVS will not find an rs# for a particular variant call even though one exists.

Please let us know if you have any more questions.

Thanks, Jami...




Seen: 35,962 times

Last updated: May 21 '13