Converting fasta for reference sequence

asked 2014-05-27 12:38:35 -0600

Krystel Duval

updated 2014-05-27 12:41:18 -0600

Hi,

When I try to convert my own FASTA sequence into a .tsf file to use as my reference sequence, one of two things happens:

- GenomeBrowse crashes once the conversion is finished and I select my genome in the drop-down menu, or
- GenomeBrowse shows an "error code 400" warning on the track and cannot display my genome sequence.

Do you know what the problem might be?

I am using GenomeBrowse 2.0.1 on a MacBook Pro. The genome I want to load is that of Enterobacteria phage lambda, a virus of 48,502 bp. I also tried recreating the Mycobacterium tuberculosis genome that is already present in GenomeBrowse, and I could not get through the conversion and visualization with it either.

Thank you very much.

answered 2014-05-29 16:17:51 -0600

Jami Bartole

Hi Krystel,

I am sorry you are having issues creating your reference sequence track.

Our Data Convert Wizard is very specific about the supported file formats and extensions for each type of conversion. In particular, for the FASTA converter the files must be named *.fa or *.fasta (or the *.gz versions of these) for the wizard to pick the correct convert options.

If you are downloading the reference sequence for the Enterobacteria phage lambda genome from the NCBI FTP site, the file with the closest supported format is NC_001416.fna. Before adding the file to the converter, rename its extension to .fa.
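
If you prefer to script the rename, the following is a minimal sketch in Python. The function names are made up for illustration and are not part of GenomeBrowse; the list of extensions is simply the one given above. It copies the file rather than renaming it, so the original download is left untouched.

```python
from pathlib import Path
import shutil

# Extensions the FASTA converter is described as accepting (see above).
SUPPORTED_SUFFIXES = (".fa", ".fasta", ".fa.gz", ".fasta.gz")


def has_supported_extension(path):
    """Return True if the file name ends in one of the accepted extensions."""
    return Path(path).name.lower().endswith(SUPPORTED_SUFFIXES)


def copy_with_fa_extension(path):
    """Copy e.g. NC_001416.fna to NC_001416.fa and return the new path.

    Hypothetical helper: files that already have a supported extension
    are returned unchanged, and a trailing .gz is preserved.
    """
    src = Path(path)
    if has_supported_extension(src):
        return src
    name, gz = src.name, ""
    if name.lower().endswith(".gz"):
        name, gz = name[:-3], name[-3:]
    stem = name.rsplit(".", 1)[0]  # drop the old extension, if any
    dst = src.with_name(stem + ".fa" + gz)
    shutil.copyfile(src, dst)
    return dst
```

For example, `copy_with_fa_extension("NC_001416.fna")` would write `NC_001416.fa` next to the original file.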

Then, on the next dialog, you will need to rename the segments (chromosomes/scaffolds) present in the file, since the headers of NCBI FASTA files contain more than just the chromosome names.
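
To see in advance what the wizard will ask you to rename, it can help to list the header lines of the file. The sketch below is only an illustration: the function names are hypothetical, and it assumes headers that are either pipe-delimited in the NCBI style (`gi|...|ref|ACCESSION| description`) or start directly with the accession.

```python
def fasta_headers(lines):
    """Return the header text (without the leading '>') of every record."""
    return [line[1:].strip() for line in lines if line.startswith(">")]


def suggest_segment_name(header):
    """Suggest a short name from a FASTA header.

    Assumption: if the first token is pipe-delimited and contains a 'ref'
    field, the accession follows it; otherwise the first
    whitespace-delimited token is used.
    """
    tokens = header.split()
    if not tokens:
        return ""
    fields = [f for f in tokens[0].split("|") if f]
    if "ref" in fields and fields.index("ref") + 1 < len(fields):
        return fields[fields.index("ref") + 1]
    return tokens[0]


# Illustrative header only; the GI number here is a placeholder.
example = [
    ">gi|0000000|ref|NC_001416.1| Enterobacteria phage lambda, complete genome",
    "GGGCGGCGACCTCGCGGG",
]
for header in fasta_headers(example):
    print(suggest_segment_name(header), "<-", header)
```

With a real file you would pass an open file handle instead of the `example` list, for instance `fasta_headers(open("NC_001416.fa"))`.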

Additionally, if you have BAM files for this species, make sure the segment names you choose in the convert wizard exactly match the names in the header of the BAM file. For example, I renamed the segment to "1" and listed the NCBI name "NC_001416.1" as the Alias Name. Either the "Segment" or the "Alias Name" can be used to match the BAM file naming convention.
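
One way to check the names before converting is to compare them against the `@SQ` lines of the BAM header. The sketch below assumes you have samtools installed and have saved the header text, for example with `samtools view -H your.bam > header.txt`. The function names are hypothetical and only illustrate the comparison.

```python
def sq_names(sam_header_text):
    """Return the SN: values of all @SQ lines in a SAM/BAM header."""
    names = []
    for line in sam_header_text.splitlines():
        fields = line.split("\t")
        if fields[0] != "@SQ":
            continue
        for field in fields[1:]:
            if field.startswith("SN:"):
                names.append(field[3:])
    return names


def unmatched_references(sam_header_text, segment_names):
    """Return BAM reference names that match neither a segment nor an alias."""
    known = set(segment_names)
    return [name for name in sq_names(sam_header_text) if name not in known]


# Example header text, tab-separated as in the SAM format.
header = "@HD\tVN:1.4\tSO:coordinate\n@SQ\tSN:NC_001416.1\tLN:48502\n"

# Segment "1" with alias "NC_001416.1", as in the answer above.
print(unmatched_references(header, ["1", "NC_001416.1"]))
```

An empty list means every reference name in the BAM header is covered by a segment or alias name; any names that are printed would need to be added as a segment or alias.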

At this point you will also want to create a Build Name for your genome. We generally use the assembly name given by NCBI, but no assembly name exists for this genome, so you can give it any informative name that you will recognize. Keep in mind that if you create further data sources for this genome, you will want to use the same Build Name.

If you continue to have issues creating and using the reference sequence for this genome please let me know. If you could provide a link to the file you are trying to convert or provide a copy of the file to me at [email protected] that would be most useful.

Thanks, Jami...
