Download and Install

Download the archive VDJSeq-Solver.tar.gz from the 'Releases' section and decompress it into a local folder:

 tar xzf VDJSeq-Solver.tar.gz

VDJSeq-Solver is built on top of the following programs:

You need to install the aforementioned programs to run VDJSeq-Solver. If you don't want to install them on your machine, you can instead use the TOOLS directory included in the VDJSeq-Solver.tar.gz distribution by adding the programs it contains to your PATH. Newer versions of all these programs have since been released, but because VDJSeq-Solver was developed against the reported versions, we recommend using those.
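As a minimal sketch of the PATH approach, the lines below prepend the bundled TOOLS directory to PATH for the current shell session. The location is an assumption: the archive is taken to have been extracted into a VDJSeq-Solver folder under the current directory, with the executables sitting directly inside TOOLS. If the programs live in subdirectories of TOOLS, each subdirectory would need to be added in the same way.

```shell
# Assumption: the archive was extracted into ./VDJSeq-Solver
# and the bundled executables sit directly inside TOOLS.
VDJ_HOME="$PWD/VDJSeq-Solver"

# Prepend TOOLS so the bundled versions take precedence over
# any other versions already installed on the machine.
export PATH="$VDJ_HOME/TOOLS:$PATH"
```

The change only lasts for the current session; adding the same export line to your shell start-up file makes it permanent.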


Requirements

The whole VDJSeq-Solver pipeline is written in Java and Perl, so make sure that both are installed on your machine. The recommended versions are 1.6.0_16 and 5.8.8, respectively.

You can check the installed versions of both by typing:

 perl -version
 java -version


Datasets

In order to run VDJSeq-Solver you need to provide the bowtie index of the human genome, the VJ and D gene sequences in FASTA format, and a .bed file containing the locations of the heavy chain locus genes. The VDJSeq-Solver distribution includes these files in the directory called REFERENCE.

In particular:

It is, however, possible to build the reference index yourself by following the bowtie-build instructions, whereas the other files can be easily downloaded from online repositories.
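For readers who want to rebuild the index, a hedged sketch of a bowtie-build call follows. bowtie-build takes a reference FASTA file and an output basename; both names used here (genome.fa and my_genome_index) are placeholders, not files from the distribution. The basename that VDJSeq-Solver expects is not stated in this section, so compare with the index files shipped in REFERENCE before replacing them.

```shell
# Placeholder names: adjust both to your own files.
GENOME_FASTA="genome.fa"          # human reference genome in FASTA format
INDEX_BASENAME="my_genome_index"  # prefix for the index files bowtie-build writes

# Only attempt the build when the tool and the input are both available.
if ! command -v bowtie-build >/dev/null 2>&1; then
    echo "bowtie-build not found on PATH" >&2
elif [ ! -f "$GENOME_FASTA" ]; then
    echo "reference FASTA not found: $GENOME_FASTA" >&2
else
    bowtie-build "$GENOME_FASTA" "$INDEX_BASENAME"
fi
```

Building an index for a whole human genome can take hours and several gigabytes of memory, which is one reason to prefer the files already provided in REFERENCE.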

We also provide a test dataset of paired-end 100 bp RNA-Seq reads from MCL (Mantle Cell Lymphoma), located in the VDJSeq-Solver main folder (sequence_1.fastq, sequence_2.fastq). It is a subset of the reads from a real sample, reduced because the complete dataset had not yet been published. Even in this reduced subset, the main clonal recombination involves the IGHV2-70*11, IGHD1-26*01 and IGHJ4*02 alleles. The reconstructed recombined sequence (with g3 set to 4, as detailed below) is:

CGGGTCACCTTGAGGGAGTCTGGTCCTGCGCTGGTGAAACCCACACAGACCCTCACACTGACCTGCACCTTCTCTGGGTTCTCACTCAG CACTAGTGGAATGTGTGTGAGCTGGATCCGTCAGCCCCCAGGGAAGGCCCTGGAGTGGCTTGCACGCATTGATTGGGATGATGATAAAT ACTACAGCACATCTCTGAAGACCAGGCTCACCATCTCCAAGGACACCTCCAAAGACCAGGTGGTCCTTACAATGACCAACATGGACCCT GTGGACACAGCCACGTATTACTGTGCACGGTCTGCAGGTGGGAGCTACCCCGATGACTACTGGGGCCAGGGAACCCTGGTCACCGTCTC CTCAG




Run

To run VDJSeq-Solver, launch the VDJSeq-Solver_run.pl script from inside the VDJSeq-Solver directory with the following options:

-g1: First mate file in .fastq format
-g2: Second mate file in .fastq format
-g3: Number of VJ recombinations for which the D analysis has to be performed
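Putting the three options together, a run on the bundled test dataset might look like the sketch below. The file names and the value 4 for -g3 come from the Datasets section; passing each value after its option separated by a space, and invoking the script through perl, are assumptions about the command-line syntax rather than documented behaviour.

```shell
# Test dataset shipped in the VDJSeq-Solver main folder.
MATE1="sequence_1.fastq"   # first mate file (-g1)
MATE2="sequence_2.fastq"   # second mate file (-g2)
NUM_VJ=4                   # number of VJ recombinations to analyse for D (-g3)

# The script must be launched from inside the VDJSeq-Solver directory.
if [ -f VDJSeq-Solver_run.pl ]; then
    # Assumed syntax: option and value separated by a space.
    perl VDJSeq-Solver_run.pl -g1 "$MATE1" -g2 "$MATE2" -g3 "$NUM_VJ"
else
    echo "VDJSeq-Solver_run.pl not found: run this from inside the VDJSeq-Solver directory" >&2
fi
```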

An out directory will be automatically created in the main folder. It contains:

num_supporting_reads J_allele-0-V_allele

1) output_classify.txt file: a .txt file reporting every D allele identified after Shrimp alignment as recombinant for the VJ couple, with its score (the number of mates supporting it). For instance:

  dir_name

  num_supporting_VJ_reads VJ_couple

  D_allele_name num_supporting_D_mates

  If a D allele cannot be detected from the reads, the last line of the file will report this information.

2) output_def.txt file: a sorted .txt file containing the number of mates supporting each recombination sequence created by VDJSeq-Solver that involves the highest-scoring D allele for the specific VJ recombination. For instance:

  num_supporting_VDJ_sequence_reads VDJ_alleles VDJ_sequence
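To pull the best-supported recombination out of output_def.txt, a small helper such as the one below may be convenient. It is a sketch under two assumptions: the fields are separated by whitespace, and the first field is an integer read count. The function name top_vdj is hypothetical. Because the sort direction of the file is not stated here, the helper scans every line instead of relying on the first or last one.

```shell
# top_vdj FILE: print the record with the highest supporting-read count.
# Assumes whitespace-separated fields with an integer count in field 1.
top_vdj() {
    awk '
        # Skip lines that do not look like "count alleles sequence".
        NF >= 3 && $1 ~ /^[0-9]+$/ && (!seen || $1 + 0 > best) {
            best = $1 + 0   # highest count so far
            line = $0       # full record carrying that count
            seen = 1
        }
        END { if (seen) print line }
    ' "$1"
}
```

Called as top_vdj path/to/output_def.txt (the path is a placeholder), it prints the full line, so the allele names and the reconstructed sequence stay together. On ties, the first record encountered is kept.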

 
