Download and Install


You can install Bellerophontes in two different ways.

The easiest (and preferred) way is to install the provided .deb file. On Debian-based distributions you can simply open the bellerophontes.deb file; once the automated procedure completes, the tool is installed.

Or, if you prefer, you can install it by typing:
sudo dpkg -i bellerophontes.deb


If you don't have a Debian-based Linux distribution, or if you prefer to install it manually, download and decompress the archive into a local folder:
tar xzf bellerophontes_0.4.0.tgz

If you decompress the archive, you must manually add the installation directory to your PATH and follow the instructions in the README.txt file.

Bellerophontes is built on top of the following programs:

  • TopHat (1.0.14) - optional
  • Cufflinks (0.9.3) - optional
  • Bowtie (> 0.12.5)
  • Blast

Later versions of these programs introduced changes, and Bellerophontes was developed against the versions listed above, so we recommend using those versions. Note that TopHat and Cufflinks are optional, but strongly recommended in order to generate a new transcriptome based on the sample under study.
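Before the first run it can help to confirm that the external programs are reachable. The following Python sketch looks each one up on your PATH with shutil.which. The executable names (bowtie, blastall, tophat, cufflinks) are assumptions based on the usual command names of these tools and may differ on your system.

```python
import shutil

# Assumed executable names: "bowtie" matches the default bowtie_exec in
# properties.config and "blastall" comes from the blast2 package; "tophat"
# and "cufflinks" are the usual command names of those optional tools.
REQUIRED = ["bowtie", "blastall"]
OPTIONAL = ["tophat", "cufflinks"]


def check_tools(required=REQUIRED, optional=OPTIONAL):
    """Return (missing_required, missing_optional) by looking each name up on PATH."""
    missing_required = [name for name in required if shutil.which(name) is None]
    missing_optional = [name for name in optional if shutil.which(name) is None]
    return missing_required, missing_optional


if __name__ == "__main__":
    missing_req, missing_opt = check_tools()
    for name in missing_req:
        print(f"ERROR: required program '{name}' not found on PATH")
    for name in missing_opt:
        print(f"warning: optional program '{name}' not found on PATH")
```

Only the lookup is done here; the sketch does not check which version of each program is installed.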

Requirements

 

Make sure Java 6 (both the JDK and the JRE) is installed on your machine. On Ubuntu 11.04 you can do:

 sudo vi /etc/apt/sources.list

Add the following line at the end of the file:

 deb http://archive.canonical.com/ubuntu maverick partner

Then type:

sudo apt-get update
sudo apt-get install sun-java6-jre sun-java6-jdk

The BLAST program is also needed. Specifically, Bellerophontes needs the blastall command, which is part of the blast2 package available in the Ubuntu distribution:

sudo apt-get install blast2

Install the EMBOSS suite:

sudo apt-get install emboss


If you are using an operating system other than Ubuntu, check the Oracle website for instructions on how to install Java.
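To confirm which Java version is installed, you can parse the output of java -version. This is a minimal Python sketch with hypothetical function names: it only reports the major version number, it does not distinguish a JDK from a JRE, and whether Java releases newer than 6 work with Bellerophontes is not covered here.

```python
import re
import subprocess


def java_major_version(version_text):
    """Extract the Java major version from `java -version` output.

    Old releases report e.g. "1.6.0_26" (major = 6); Java 9 and later report
    e.g. "11.0.2" (major = 11). Returns None if no version string is found.
    """
    match = re.search(r'version "([^"]+)"', version_text)
    if match is None:
        return None
    parts = match.group(1).split(".")
    if parts[0] == "1" and len(parts) > 1:
        return int(parts[1])  # legacy "1.x" numbering
    digits = re.match(r"\d+", parts[0])
    return int(digits.group()) if digits else None


def installed_java_major():
    """Run `java -version` (it prints to stderr) and parse the result.

    Raises FileNotFoundError if no java executable is on PATH.
    """
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    return java_major_version(result.stderr)
```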

Data Sets

 

To run Bellerophontes you need to provide a Bowtie index of the human genome. The Bellerophontes distribution includes a pre-built index of the human genome (hg19) under Bellerophontes/reference.

To build an index for a different reference, follow the bowtie-build instructions.
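bowtie-build takes the FASTA file and an output prefix (bowtie-build <reference_in> <ebwt_base>). The Python sketch below wraps that call; the helper names are hypothetical. Note that bowtie-build does not create the .fa.fai file: that FASTA index is usually produced with a separate tool such as samtools faidx, which is an assumption since this guide does not name one.

```python
import subprocess
from pathlib import Path


def bowtie_build_command(fasta_path, index_prefix=None):
    """Build the bowtie-build argument list for one FASTA file.

    By default the index prefix is the FASTA path without its extension, so
    reference/hg19/hg19.fa yields reference/hg19/hg19.1.ebwt, hg19.2.ebwt, ...
    next to the FASTA file.
    """
    fasta = Path(fasta_path)
    prefix = Path(index_prefix) if index_prefix else fasta.with_suffix("")
    return ["bowtie-build", str(fasta), str(prefix)]


def build_index(fasta_path, index_prefix=None):
    """Run bowtie-build; raises CalledProcessError if it exits with an error."""
    subprocess.run(bowtie_build_command(fasta_path, index_prefix), check=True)
```

Building a whole-genome index takes a long time and a lot of memory, which is one reason the pre-built hg19 index is shipped with the distribution.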

We also provide a test data set of paired-end 75 bp RNA-Seq reads from a Chronic Myelogenous Leukaemia sample. The two mate FASTQ files are included in the Bellerophontes/samples folder of the distribution. The sample is a subset of the reads of the original sample, but it still reveals the BCR-ABL1 fusion.
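Paired-end FASTQ files conventionally list the two mates of each pair in the same order, so a quick consistency check can catch truncated or mismatched files before a run. This Python sketch assumes plain 4-line FASTQ records (no wrapped sequence lines) and mate IDs that differ only by a trailing /1 or /2; the function names are hypothetical.

```python
from itertools import zip_longest


def read_ids(fastq_path):
    """Yield the ID of every 4-line FASTQ record, without a trailing /1 or /2."""
    with open(fastq_path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 != 0:
                continue  # only the '@' header line of each record carries the ID
            fields = line[1:].split()
            if not fields:
                continue  # tolerate blank lines at the end of the file
            read_id = fields[0]
            if read_id.endswith(("/1", "/2")):
                read_id = read_id[:-2]
            yield read_id


def check_mates(fastq1, fastq2):
    """Return the number of read pairs, raising ValueError on the first mismatch."""
    count = 0
    for id1, id2 in zip_longest(read_ids(fastq1), read_ids(fastq2)):
        if id1 is None or id2 is None:
            raise ValueError(f"mate files have different read counts (after {count} pairs)")
        if id1 != id2:
            raise ValueError(f"pair {count + 1}: '{id1}' does not match '{id2}'")
        count += 1
    return count
```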

Configuration

 

A configuration file, properties.config, is included in the Bellerophontes distribution. If you want to run the Bellerophontes test data set (recommended the first time), use the default configuration. A configuration file looks like this:

# Configuration File
#AutoGenerated
#Fri May 18 12:24:40 CEST 2012
intersectBED_exec=intersectBED
min_enco_read=8
trim_size=0
bowtie_exec=bowtie
skip_bowtie=yes
min_word_length_spanning=15
mate_length=50
max_gap_distance=5000
maximum_inner_length=400
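The file appears to use the Java properties key=value layout (its comment header resembles what java.util.Properties writes). A minimal Python reader for the simple form shown above might look like this. It handles only plain key=value lines and '#' or '!' comments, not the full properties syntax (escapes, ':' separators, line continuations), and it does not interpret what each parameter means; the function names are hypothetical.

```python
def parse_config(text):
    """Parse key=value lines, skipping blank lines and '#'/'!' comment lines.

    All values are returned as strings; convert them where needed,
    e.g. int(config["min_enco_read"]) or config["skip_bowtie"] == "yes".
    """
    config = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw_line!r}")
        config[key.strip()] = value.strip()
    return config


def load_config(path="properties.config"):
    """Read and parse a configuration file from disk."""
    with open(path) as handle:
        return parse_config(handle.read())
```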

Gene Filtering Candidates (optional)

Fusion candidates can be filtered and evaluated in detail using the optional file gene_filter_list.txt.

In this file, a list of gene IDs separated by "|" is treated as an "or" condition, while a pair of gene IDs separated by "&" (the two genes involved in a fusion) is treated as an "and" condition, in the following way:

<gene_id1>|<gene_id2>|<gene_id3>|<gene_id4>|<gene_id5>
<gene_id6>&<gene_id7>
<gene_id8>&<gene_id9>
<gene_id(N)>&<gene_id(M)>
For example:
ENSG00000149212|ENSG0000019885|ENSG000002422996|ENSG00000249193|ENSG00000121594
ENSG00000121594&ENSG00000114631

Using gene filtering candidates, you can focus Bellerophontes on a subset of genes or on specific fusions.
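One way to picture these rules is the Python sketch below. The matching semantics are an assumption drawn from the description above: a candidate fusion is kept if either partner gene appears in an "or" line, or if the two partners match an "and" pair in either order. The function names are hypothetical, and the sketch does not model how Bellerophontes behaves when the file is absent.

```python
def parse_gene_filter(lines):
    """Split gene_filter_list.txt lines into a set of single genes ("|" lines)
    and a set of unordered gene pairs ("&" lines)."""
    single_genes = set()
    gene_pairs = set()
    for raw_line in lines:
        line = raw_line.strip()
        if not line:
            continue
        if "&" in line:
            # Exactly two IDs are expected; more raise ValueError on unpacking.
            gene_a, gene_b = (part.strip() for part in line.split("&"))
            gene_pairs.add(frozenset((gene_a, gene_b)))
        else:
            single_genes.update(part.strip() for part in line.split("|") if part.strip())
    return single_genes, gene_pairs


def keep_candidate(gene1, gene2, single_genes, gene_pairs):
    """Assumed semantics: keep a fusion if either partner is in an "or" line,
    or if the two partners form one of the "and" pairs (in any order)."""
    return (gene1 in single_genes or gene2 in single_genes
            or frozenset((gene1, gene2)) in gene_pairs)
```

Storing each "and" pair as a frozenset makes the order of the two genes irrelevant, which matches the idea of a fusion between two partners.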

Run

 

To run Bellerophontes, launch the bellerophontes executable with the following options:

bellerophontes
 -a,--annotation-file [arg]   GTF annotation file
 -d,--working-dir [arg]       Working directory
 -f1,--fastq1 [arg]           First mate file
 -f2,--fastq2 [arg]           Second mate file
 -g,--genome-file [arg]       Genome file
 -h,--help                    Print this help message
 -n,--de-novo                 Generate a new reference using the genome
                              coverage computed from the provided samples
 -t,--thread-level [n]        Number of threads to use (default: 1)
 -u1,--unmapped1 [arg]        Unmapped mate file (optional)
 -u2,--unmapped2 [arg]        Unmapped mate file (optional)
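If you launch Bellerophontes from a script, it can be convenient to assemble the command line programmatically. This Python sketch uses only the options listed in the help text; the function names and keyword arguments are hypothetical, and it assumes bellerophontes is on your PATH.

```python
import subprocess


def bellerophontes_command(annotation, genome, working_dir, fastq1, fastq2,
                           threads=1, de_novo=False, unmapped1=None, unmapped2=None):
    """Assemble a bellerophontes command line from the documented options."""
    cmd = ["bellerophontes",
           "-t", str(threads),
           "-a", annotation,
           "-g", genome,
           "-d", working_dir,
           "-f1", fastq1,
           "-f2", fastq2]
    if unmapped1:
        cmd += ["-u1", unmapped1]  # optional unmapped mate files
    if unmapped2:
        cmd += ["-u2", unmapped2]
    if de_novo:
        cmd.append("--de-novo")
    return cmd


def run_bellerophontes(**kwargs):
    """Run Bellerophontes; raises CalledProcessError on a non-zero exit."""
    subprocess.run(bellerophontes_command(**kwargs), check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with paths that contain spaces.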

Note: the genome reference folder must contain both the Bowtie index files and the original FASTA file. For instance, if the FASTA file is hg19.fa (provided in the Bellerophontes distribution under Bellerophontes/reference), the folder should contain the following files:

 hg19.1.ebwt
 hg19.2.ebwt
 hg19.3.ebwt
 hg19.4.ebwt
 hg19.fa
 hg19.fa.fai
 hg19.rev.1.ebwt
 hg19.rev.2.ebwt
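A quick way to verify the folder before a long run is to check for each expected file. The Python sketch below derives the file names from a prefix such as reference/hg19/hg19 (the same form passed to -g in the example run); the suffix list is taken from the hg19 listing, and the function name is hypothetical.

```python
from pathlib import Path

# Suffixes from the hg19 listing: four forward and two reverse Bowtie
# index files, plus the FASTA file and its .fai index.
INDEX_SUFFIXES = [".1.ebwt", ".2.ebwt", ".3.ebwt", ".4.ebwt",
                  ".rev.1.ebwt", ".rev.2.ebwt", ".fa", ".fa.fai"]


def missing_reference_files(prefix):
    """Return the expected files for a prefix like reference/hg19/hg19 that do not exist."""
    base = Path(prefix)
    expected = [base.parent / (base.name + suffix) for suffix in INDEX_SUFFIXES]
    return [str(path) for path in expected if not path.is_file()]
```

An empty result means every file in the listing is present; it does not check that the index actually matches the FASTA file.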

A default annotation file, Homo_sapiens.GRCh37.60.chr.gtf.tar.gz, is also provided in the data set. Alternatively, the annotation file can be retrieved from Ensembl as follows:

 wget ftp://ftp.ensembl.org/pub/current/gtf/homo_sapiens/Homo_sapiens.GRCh37.60.gtf.gz
 gunzip Homo_sapiens.GRCh37.60.gtf.gz
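Report entries include both gene IDs and gene names, and the gene filter list uses gene IDs, so it can be handy to look up the name for an ID in the annotation. This Python sketch reads the attribute column of a GTF file (plain or .gz, but not the .tar.gz archive, which must be extracted first). It assumes Ensembl-style gene_id and gene_name attributes; genes without a gene_name keep their ID, and the function name is hypothetical.

```python
import gzip
import re

# Matches attribute pairs such as: gene_id "ENSG00000121594";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')


def gene_names(gtf_path):
    """Map gene_id -> gene_name using the 9th (attribute) column of a GTF file."""
    opener = gzip.open if gtf_path.endswith(".gz") else open
    names = {}
    with opener(gtf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # header or comment line
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue
            attributes = dict(ATTRIBUTE.findall(fields[8]))
            gene_id = attributes.get("gene_id")
            if gene_id and gene_id not in names:
                names[gene_id] = attributes.get("gene_name", gene_id)
    return names
```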

For example, from any directory you can run:

 bellerophontes -t 8 -a Homo_sapiens.GRCh37.60.chr.gtf -g reference/hg19/hg19 -d working_dir/ -f1 samples/s_7_1_sequence_chr22chr9.fq -f2  samples/s_7_2_sequence_chr22chr9.fq --de-novo

All temporary files and results are stored in the "working_dir" folder.

Bellerophontes currently produces two report files:

  • report.txt: list of detected fusions. Each fusion is reported in the following format:
    gene1_name gene1_id strand1 chr_gene1 gene2_name gene2_id strand2 chr_gene2 start_spanning_gene1 breakpoint1 | breakpoint2 end_spanning_gene2 #spanning #encompassing
    breakpoint_sequence

  • FilterHS_final_result.txt: list of valid encompassing regions.
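If you want to post-process report.txt, the Python sketch below parses entries laid out as in the format description for that file. It assumes whitespace-separated fields, no header line, a '|' between the two breakpoints, and exactly two lines per fusion (the fields line followed by the breakpoint sequence). These are assumptions about the file, and the names used are hypothetical.

```python
from dataclasses import dataclass

# Field order from the report.txt format description (the '|' is dropped).
FIELDS = ["gene1_name", "gene1_id", "strand1", "chr_gene1",
          "gene2_name", "gene2_id", "strand2", "chr_gene2",
          "start_spanning_gene1", "breakpoint1", "breakpoint2",
          "end_spanning_gene2", "spanning", "encompassing"]


@dataclass
class Fusion:
    fields: dict              # field name -> value (as a string)
    breakpoint_sequence: str


def parse_report(lines):
    """Parse report.txt assuming two non-blank lines per fusion."""
    content = [line.strip() for line in lines if line.strip()]
    if len(content) % 2:
        raise ValueError("expected an even number of non-blank lines")
    fusions = []
    for header, sequence in zip(content[0::2], content[1::2]):
        tokens = header.replace("|", " ").split()  # works with or without spaces around '|'
        if len(tokens) != len(FIELDS):
            raise ValueError(f"unexpected number of fields: {header!r}")
        fusions.append(Fusion(dict(zip(FIELDS, tokens)), sequence))
    return fusions
```

Values are kept as strings; convert spanning and encompassing with int() if you want to sort or filter fusions by read support.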