How can I load a file in CRAM format into my GH browser?

asked 2020-08-07 17:57:06 -0700

Whit Athey

I have used the GH browser successfully on a number of BAM files. Now I have a genome in CRAM format and I can't seem to do anything with it. GH apparently won't load it. I don't understand the Unix tools that may be available for converting files. Is there a simple way around this problem? Is there any way to make GH recognize and load a CRAM file? Maybe I don't have the most recent version?

2 answers

answered 2021-12-03 08:48:22 -0700

Christophe Lambert

updated 2021-12-03 09:22:20 -0700

A workaround I used was to install the open-source samtools software and convert the CRAM file to BAM format with the command:

samtools view -b -o outfile.bam infile.cram

You can speed this up by using multiple threads, say 24, via:

samtools view -@ 24 -b -o outfile.bam infile.cram

This works without a local reference file if the header of your CRAM file provides MD5 sums for the reference sequences. In that case samtools will even download the needed reference data for you automatically (into ~/.cache/hts-ref/). See this post: https://www.biostars.org/p/489646/
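To see whether a CRAM header actually carries those MD5 sums, you can inspect its @SQ lines for M5: tags. The function below is only a sketch: the name all_sq_have_md5 is made up for illustration, and it assumes the header is piped in as tab-separated text, as printed by samtools view -H.

```shell
#!/bin/sh
# Hypothetical helper (not part of samtools): reads a SAM/CRAM header on
# stdin and succeeds only if there is at least one @SQ line and every
# @SQ line carries an M5: (reference MD5) tag.
all_sq_have_md5() {
    awk -F '\t' '
        BEGIN { total = 0; with_md5 = 0 }
        /^@SQ/ {
            total++
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^M5:/) { with_md5++; break }
            }
        }
        END { exit !(total > 0 && with_md5 == total) }
    '
}

# Usage (needs samtools installed; infile.cram is the file from the answer):
#   samtools view -H infile.cram | all_sq_have_md5 \
#       && echo "header has MD5 sums for every reference sequence" \
#       || echo "MD5 sums missing; pass the reference with -T"
```

If the check fails, the conversion most likely needs the original reference FASTA supplied with -T.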

If you have the reference sequence FASTA file that was used to build the CRAM file, you can specify it (say it was called ref.fa) by running the command:

samtools view -@ 24 -b -T ref.fa -o outfile.bam infile.cram
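If you have several CRAM files, the commands above can be wrapped in a small shell function. This is a rough sketch, not a tested pipeline: the function names and the THREADS variable are hypothetical. It also assumes that the input is coordinate-sorted (needed by samtools index), that your samtools is recent enough to have quickcheck, and that the browser wants a .bai index next to the BAM, which this thread does not confirm.

```shell
#!/bin/sh
# Derive the output BAM path from a CRAM path (sample.cram -> sample.bam).
bam_name_for() {
    printf '%s\n' "${1%.cram}.bam"
}

# Convert one CRAM file to BAM, sanity-check the result, then index it.
#   $1 = input CRAM file
#   $2 = optional reference FASTA (passed to samtools via -T)
# THREADS is a hypothetical environment variable; it defaults to 4.
cram_to_bam() {
    cram=$1
    ref=${2:-}
    bam=$(bam_name_for "$cram")
    if [ -n "$ref" ]; then
        samtools view -@ "${THREADS:-4}" -b -T "$ref" -o "$bam" "$cram" || return 1
    else
        samtools view -@ "${THREADS:-4}" -b -o "$bam" "$cram" || return 1
    fi
    # quickcheck confirms the BAM looks intact (header present, EOF block found)
    samtools quickcheck "$bam" || return 1
    # index requires a coordinate-sorted BAM
    samtools index "$bam"
}

# Usage:
#   THREADS=24 cram_to_bam infile.cram ref.fa
#   for f in *.cram; do cram_to_bam "$f" ref.fa; done
```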

samtools is available here: http://www.htslib.org/

Under Ubuntu I was able to install samtools (though not the latest version) via:

sudo apt-get install samtools

Under RedHat / CentOS I was able to install samtools via:

sudo yum install samtools
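Since the packaged samtools may not be the latest version, it is worth checking what you got. As far as I know, CRAM support arrived with the samtools 1.x series, so a 0.1.x package would not be able to read the file. The helper names below are made up for illustration, and the sketch assumes the first line of samtools --version looks like "samtools 1.10".

```shell
#!/bin/sh
# Hypothetical helper: pull the version number out of `samtools --version`
# output read from stdin (the first line is expected to be "samtools X.Y").
samtools_version_from() {
    awk 'NR == 1 && $1 == "samtools" { print $2 }'
}

# Hypothetical helper: succeed if the major version is 1 or higher.
# Assumption: CRAM reading needs samtools 1.0 or newer.
version_supports_cram() {
    major=${1%%.*}
    case $major in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric: treat as unsupported
    esac
    [ "$major" -ge 1 ]
}

# Usage (needs samtools installed):
#   v=$(samtools --version | samtools_version_from)
#   version_supports_cram "$v" && echo "samtools $v should read CRAM"
```

Very old builds may not understand --version at all, in which case the parsed version is empty and the check fails.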
answered 2020-08-11 08:30:51 -0700

Hi Whit,

We have plans to make upgrades that will enable us to support reading CRAM files directly, but at the moment GenomeBrowse does not read them. CRAM is a complicated format and there is really only one good reference implementation of the code to read the file, so we have to make some infrastructure upgrades to get to the point where we can use that library directly.

Gabe

