
Custom reference: Unable to find reference sequence source for build

asked 2014-06-30 04:38:41 -0600

Yadhu Kumar

Hi there, I am having a problem generating coverage files from a custom reference. Could you please shed some light on this?

Build: GenomeBrowse-Lin64-2.0.2

Command: gautil coverage sample.sort.bam -ref sample.ref.fa
Problem: Unable to find reference sequence source for build

I am using the exact reference that was used to generate the BAM file.

Thanks!


5 answers


answered 2014-08-19 11:14:50 -0600

Hi Michael,

I was able to take a look at the GRCh37g1k FASTA, and it looks like our Convert Wizard has some issues with the GZ (gzip-compressed) version of the file. When you load it into the Convert Wizard and scroll to the bottom of the segment list after the scanning phase completes, it appears to pick up some strange encoding from the file's compression.
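If you would rather not load the .gz file at all, one option is to decompress it before opening it in the Convert Wizard. Running gunzip on the command line does this; the Python sketch below does the same with only the standard library. It is an illustration, not part of GenomeBrowse, and the function names are hypothetical. As far as I know, Python's gzip module stops with an error if non-gzip bytes follow the compressed data, so if that happens, plain gunzip may be the easier route.

```python
import gzip
import shutil
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # every gzip stream starts with these two bytes


def is_gzipped(path) -> bool:
    """Return True if the file begins with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def gunzip_fasta(src, dest=None) -> Path:
    """Decompress a gzipped FASTA to plain text and return the output path.

    When dest is omitted, the trailing '.gz' is dropped
    (e.g. ref.fasta.gz -> ref.fasta).
    """
    src = Path(src)
    if dest is None:
        if src.suffix != ".gz":
            raise ValueError(f"{src} has no .gz suffix; pass dest explicitly")
        dest = src.with_suffix("")
    dest = Path(dest)
    # Stream in chunks so the whole genome never has to sit in memory.
    with gzip.open(src, "rb") as f_in, open(dest, "wb") as f_out:
        shutil.copyfileobj(f_in, f_out)
    return dest
```

The unzipped file that gunzip_fasta returns is what you would then add in the Convert Wizard.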


If you load the unzipped file into the Convert Wizard, the extra data at the bottom disappears, and once you rename the FASTA segments to the standard 1, 2, 3, etc., this reference file appears to match the GRCh37_g1k TSF we have available.
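The renaming can also be done on the FASTA itself before conversion. Here is a minimal Python sketch, assuming the target names are simply the first word of each header line with any "chr" prefix dropped. That rule is a hypothetical example; the names you actually need are whatever your BAM header uses.

```python
from typing import Iterable, Iterator


def normalize_header(header: str, strip_chr: bool = True) -> str:
    """Reduce a FASTA header to its first word, optionally dropping a 'chr' prefix.

    '>1 dna:chromosome chromosome:GRCh37:1:1:249250621:1' -> '>1'
    '>chr2' -> '>2' (when strip_chr is True)
    """
    tokens = header[1:].split()
    name = tokens[0] if tokens else ""
    if strip_chr and name.lower().startswith("chr"):
        name = name[3:]
    return ">" + name


def rename_fasta_lines(lines: Iterable[str], strip_chr: bool = True) -> Iterator[str]:
    """Yield FASTA lines with every header normalized; sequence lines pass through."""
    for line in lines:
        if line.startswith(">"):
            yield normalize_header(line.rstrip("\n"), strip_chr) + "\n"
        else:
            yield line


# Hypothetical usage on real files:
# with open("ref.fasta") as src, open("ref.renamed.fasta", "w") as dst:
#     dst.writelines(rename_fasta_lines(src))
```

Contigs whose names differ by more than a prefix, such as chrM versus MT, would still need to be mapped by hand.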

Let me know if you continue to have issues or if you have any further questions.

Thanks, Jami...


answered 2014-08-14 11:55:04 -0600

Michael Imbeault

updated 2014-08-15 03:54:13 -0600

Thanks for the amazing level of support, as always. I got it to work with your instructions, but it behaves strangely: it only works with the .tsf from the GenomeBrowse distribution, not with the .tsf I generated from the GRCh37g1k.fasta file from ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/humang1kv37.fasta.gz. I'll try to investigate the difference.

Also, I had to copy the file to /Data, because it was in /CommonData, which gautil does not search by default.

Also, from a technical point of view, it's odd that the path given to --refFolder must be enclosed in quotes ("") while the source is not; I expected to be able to write --refFolder G:\

Anyway, it works now, so I can precalculate coverages :)

Thank you so much, Michael


answered 2014-08-14 10:49:08 -0600

Hi Michael,

I am sorry you are also having issues with the gautil functionality.

When precomputing BAM coverage with the gautil "coverage" function, gautil by default looks for the reference sequence TSF file in the ../GenomeBrowse/Application/Data folder. If the reference file is saved in a different directory, you can use the --refFolder="folder location" option to point to the new location.

For example, if your BAM and index files (sample.bam and sample.bam.bai) were stored on a network drive (M:) and the reference file on an external hard drive (G:), the command would look something like the following:

gautil coverage M:/sample.bam --refFolder="G:/"

or, if you are on a Linux machine:

./gautil.exe coverage M:/sample.bam --refFolder="G:/"
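If you are scripting coverage for many BAM files, here is a hedged Python sketch of the same call. It assumes gautil is on your PATH (or that you pass its full path as exe) and uses only the coverage and --refFolder arguments shown above; the helper names are hypothetical. Passing an argument list to subprocess avoids shell quoting entirely: the element --refFolder=G:/ is what a POSIX shell hands to the program after stripping the quotes from --refFolder="G:/".

```python
import subprocess
from typing import List


def build_coverage_command(bam_path: str, ref_folder: str, exe: str = "gautil") -> List[str]:
    """Assemble a gautil coverage call as an argument list (one element per argument)."""
    # "--refFolder=<folder>" is one argument, written with '=' as in the
    # examples above; no quotes are needed because no shell is involved.
    return [exe, "coverage", bam_path, f"--refFolder={ref_folder}"]


def run_coverage(bam_path: str, ref_folder: str, exe: str = "gautil") -> int:
    """Run the coverage command and return the process exit code."""
    cmd = build_coverage_command(bam_path, ref_folder, exe)
    return subprocess.run(cmd, check=False).returncode


# Hypothetical batch usage:
# for bam in ["M:/sample1.bam", "M:/sample2.bam"]:
#     run_coverage(bam, "G:/")
```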

The tool will then pick the correct reference TSF file for computing coverage based on the chromosome names and lengths listed in the header of your BAM file. The names and lengths in the BAM must exactly match those in the TSF file for the coverage tool to identify the correct reference.
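Since the match has to be exact, it can help to check the BAM header against the FASTA the TSF was built from before converting. The Python sketch below is a hypothetical helper, not a Golden Helix tool, and it does not read TSF files. It parses @SQ lines (for example the output of samtools view -H sample.sort.bam, assuming samtools is installed), computes sequence lengths from a FASTA, and lists every name or length that differs in either direction.

```python
from typing import Dict, Iterable, List


def sq_from_sam_header(header_lines: Iterable[str]) -> Dict[str, int]:
    """Map SN -> LN for each @SQ line of a SAM/BAM text header."""
    contigs = {}
    for line in header_lines:
        if not line.startswith("@SQ"):
            continue
        # Tab-separated TAG:VALUE fields after the record type, e.g. SN:1  LN:249250621
        fields = dict(f.split(":", 1) for f in line.rstrip("\n").split("\t")[1:] if ":" in f)
        contigs[fields["SN"]] = int(fields["LN"])
    return contigs


def lengths_from_fasta(lines: Iterable[str]) -> Dict[str, int]:
    """Map sequence name (first word of the header) -> total sequence length."""
    lengths = {}
    name = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def compare_contigs(bam: Dict[str, int], ref: Dict[str, int]) -> List[str]:
    """Return readable mismatches; an empty list means names and lengths agree."""
    problems = []
    for name, length in bam.items():
        if name not in ref:
            problems.append(f"{name}: in BAM header but not in reference")
        elif ref[name] != length:
            problems.append(f"{name}: BAM LN={length}, reference length={ref[name]}")
    for name in ref:
        if name not in bam:
            problems.append(f"{name}: in reference but not in BAM header")
    return problems
```

An empty result from compare_contigs suggests the FASTA and BAM agree; any reported line points at a contig to rename or a reference build mismatch.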

Please let me know if you continue to have any issues. We will also look into making the gautil documentation more comprehensive and accurate.

Thanks, Jami...


answered 2014-08-14 07:19:52 -0600

Michael Imbeault

I am encountering the same problem, even when passing the .tsf file as the -ref parameter. It doesn't work on Unix or Windows with the latest builds. I built the .tsf file from the FASTA I mapped to, GRCh37g1k. The documentation of the gautil functions could use some work: there is a lot of ambiguity about the parameters it expects, as well as many typos.


answered 2014-07-01 08:44:14 -0600

Hi Yadhu,

I am sorry you are having issues computing the coverage for your BAM file.

The gautil coverage function does not directly support a FASTA-formatted reference sequence. As a first step, the reference file must be converted to the Golden Helix TSF format using the Convert Data Wizard before it can be used with gautil functions.

If you launch GenomeBrowse and open the Add dialog, you can click Convert... in the lower-left corner of the Data Source Library to launch the converter. Then add your FASTA file and follow the prompts to create the TSF reference sequence file. You can find details on the conversion options in our manual at the following link: http://doc.goldenhelix.com/GenomeBrowse/2.0.2/convertsourcewizard.html

Please let us know if you have any further issues.

Thanks, Jami...



Stats

Asked: 2014-06-30 04:38:41 -0600

Seen: 10,772 times

Last updated: Aug 19 '14