Given a VCF file, is there a feature in the SNP analysis package to filter out known SNPs (those with an associated rs#)? If so, how is it done? Is it simply concordance between the variant call in the VCF file and the known SNP in dbSNP build 137 by chromosome, position, reference allele, and alternate allele?
I want to filter out any known and common SNPs present in dbSNP or 1000 Genomes from my VCF file to get to the novel SNPs. I am noticing discrepancies between the tools that report dbSNP annotations and the SNPs listed in the dbSNP build 137 file itself (downloaded from the NCBI FTP site). For example, for a given variant that I cannot find in the dbSNP 137 file, I end up finding an rs# associated with the same variant in a tool such as PolyPhen-2 or SeattleSeq while I am looking up protein predictions. The SNP analysis package by Golden Helix integrates these small steps in one environment, and I like that. However, if I use the SNP analysis package and do not find an rs# associated with my variant call, can I trust that the variant really has no rs#?
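To make the question concrete, the matching I have in mind looks roughly like the sketch below. It is only an illustration of exact matching on chromosome, position, reference, and alternate allele, not a description of how Golden Helix or any other tool does it. The function names are my own (hypothetical), and the assumptions are mine as well: both files are on the same genome build, chromosome names differ at most by a "chr" prefix, and a record counts as known only when every one of its ALT alleles is found in the known set. It does not handle indels that are normalized differently in the two files.

```python
import gzip
from typing import Iterable, Iterator, List, Set, Tuple

# A variant is identified by (chromosome, position, reference, alternate).
Key = Tuple[str, int, str, str]


def open_text(path: str):
    """Open a plain or gzip-compressed VCF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def norm_chrom(chrom: str) -> str:
    # Assumption: the two files differ at most by a leading "chr".
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def record_keys(line: str) -> List[Key]:
    """Return one key per ALT allele of a VCF data line."""
    chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
    return [(norm_chrom(chrom), int(pos), ref.upper(), a.upper())
            for a in alt.split(",")]


def load_known(lines: Iterable[str]) -> Set[Key]:
    """Collect the keys of all variants in a known-SNP VCF (e.g. dbSNP)."""
    known: Set[Key] = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        known.update(record_keys(line))
    return known


def novel_records(lines: Iterable[str], known: Set[Key]) -> Iterator[str]:
    """Yield header lines plus records with at least one unknown ALT allele."""
    for line in lines:
        if line.startswith("#"):
            yield line  # keep the header so the output is still a VCF
        elif line.strip() and any(k not in known for k in record_keys(line)):
            yield line
```

With real files, I would call `load_known(open_text(...))` on the dbSNP 137 VCF and then write the lines from `novel_records(open_text(...), known)` for my own VCF to a new file. Loading all of dbSNP into a Python set needs a lot of memory, so in practice this would probably have to be done one chromosome at a time.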
Any feedback or comment will be appreciated!