===== STRViper: Short Tandem Repeat Variation Identification from Paired-End Reads =====

=== Introduction ===

STRViper is a bioinformatics tool for detecting short tandem repeat (STR) variation from paired-end next-generation sequencing data. STRViper makes variant calls based on deviations in sequenced fragment sizes, which allows it to analyse repeats up to the length of the fragment. This strategy also helps avoid false calls caused by errors arising from repetitive DNA.

STRViper is written in Java and can therefore run on any platform with a Java runtime. We provide scripts for conveniently running STRViper on Unix, Linux and macOS; scripts for Windows are under development.

If you are interested in knowing more about STRViper, please contact Minh Duc Cao (minhduc.cao@gmail.com).

If you find STRViper useful for your research, please cite it in your publications:

Cao MD, Tasker E, Willadsen K, Imelfort M, Vishwanathan S, Sureshkumar S, Balasubramanian S and Bodén M (2014) Inferring Short Tandem Repeat Variation from Paired-End Short Reads, //Nucleic Acids Research//. Feb;42(3):e16. DOI: [[http://dx.doi.org/10.1093/nar/gkt1313|10.1093/nar/gkt1313]].

=== Installation (original version) ===

  - Download the STRViper package: http://bioinf.scmb.uq.edu.au:8080/STRViper/strviper.tar.gz
  - Extract the package to $PATH_TO_STRV (e.g., tar zxvf strviper.tar.gz)
  - Add $PATH_TO_STRV/scripts to your $PATH

=== Installation (JAPSA version) ===

  - Download JAPSA from https://github.com/mdcao/japsa
  - Follow the installation instructions there

=== Call variation (original version) ===

Preprocessing: obtain a list of STRs in the reference genome using trf (Tandem Repeats Finder), and convert it to the format that STRViper reads:

  $ trf TAIR10.fas 2 5 5 80 10 40 6 -h
  $ jsat.str parseTRF --input TAIR10.fas.2.5.5.80.10.40.6.dat --output TAIR10.str --format str

Align the reads using an aligner such as bwa, and sort the BAM file.
It is important to ensure that the chromosome names and their order in the reference genome are identical to those in the BAM file.

  $ bwa index TAIR10.fas
  $ bwa mem TAIR10.fas read.1.fq.gz read.2.fq.gz > lib.sam
  $ samtools view -bS lib.sam > lib.bam
  $ samtools sort -o lib.sort.bam lib.bam

(Older samtools releases use the legacy form samtools sort lib.bam lib.sort, which also writes lib.sort.bam.)

Extract fragment sizes from the sorted BAM file:

  $ jsat.str sam2fragment --input lib.sort.bam --output lib.fragment

Sort the fragment list:

  $ jsat.str sortFragment --input lib.fragment --output lib.sort.fragment

Make the variation calls:

  $ jsat.str fragment2var --trfFile TAIR10.str --output lib.strv lib.sort.fragment

(Optional) Convert the variation calls from strv format to VCF format:

  $ jsat.str strv2vcf --input lib.strv --output lib.vcf --reference TAIR10.fas

=== Call variation (JAPSA version) ===

The steps are the same as above, but the equivalent programs are:

  * jsa.trv.parseTRF
  * jsa.trv.sam2fragment
  * jsa.trv.sortFragment
  * jsa.trv.fragment2var
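In the preprocessing step, the seven numbers passed to trf are its Match, Mismatch, Delta, PM, PI, Minscore and MaxPeriod parameters, which is why they reappear in the name of the .dat file that parseTRF reads. To look at the trf output directly, the minimal Python sketch below reads repeat records from such a .dat file. It assumes the usual layout of that file (a "Sequence:" line naming each sequence, followed by 15-column data lines that start with start, end, period size and copy number, with the consensus pattern in column 14). TandemRepeat and parse_trf_dat are hypothetical names, and the sketch does not write STRViper's .str format, whose layout is not documented here.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class TandemRepeat:
    """One repeat record from a trf .dat file (hypothetical helper type)."""
    chrom: str      # first word of the 'Sequence:' line
    start: int      # 1-based start position reported by trf
    end: int        # 1-based inclusive end position
    period: int     # period size of the repeat unit
    copies: float   # number of copies of the unit
    pattern: str    # consensus repeat unit


def parse_trf_dat(lines: Iterable[str]) -> List[TandemRepeat]:
    """Collect repeat records, assuming the 15-column trf .dat data layout."""
    repeats: List[TandemRepeat] = []
    chrom = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("Sequence:"):
            parts = line.split()
            chrom = parts[1] if len(parts) > 1 else ""
            continue
        fields = line.split()
        # Data lines have 15 columns and begin with a numeric start position;
        # program headers, 'Parameters:' lines and blank lines are skipped.
        if chrom is None or len(fields) != 15 or not fields[0].isdigit():
            continue
        repeats.append(TandemRepeat(
            chrom=chrom,
            start=int(fields[0]),
            end=int(fields[1]),
            period=int(fields[2]),
            copies=float(fields[3]),
            pattern=fields[13],
        ))
    return repeats
```

For example, parse_trf_dat(open("TAIR10.fas.2.5.5.80.10.40.6.dat")) returns one record per repeat, which can be used to inspect the STRs (for instance, to count them by period size) before running parseTRF.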
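Because a mismatch in chromosome names or order between TAIR10.fas and the BAM file is easy to miss, a quick check before calling variation can help. The minimal Python sketch below, a set of hypothetical helpers that are not part of STRViper, compares the sequence names in the FASTA headers with the @SQ SN: entries of the BAM header, which samtools view -H lib.sort.bam prints as text.

```python
from typing import Iterable, List, Optional, Tuple


def fasta_names(lines: Iterable[str]) -> List[str]:
    """Sequence names (first word after '>') in FASTA header order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            names.append(words[0] if words else "")
    return names


def sam_header_names(lines: Iterable[str]) -> List[str]:
    """Reference names from @SQ SN: fields, in SAM header order."""
    names = []
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def first_mismatch(ref: List[str], bam: List[str]) -> Optional[Tuple[int, str, str]]:
    """Return (index, reference name, BAM name) at the first difference, or None."""
    for i in range(max(len(ref), len(bam))):
        r = ref[i] if i < len(ref) else "<missing>"
        b = bam[i] if i < len(bam) else "<missing>"
        if r != b:
            return i, r, b
    return None
```

With the header saved by samtools view -H lib.sort.bam > header.txt, first_mismatch(fasta_names(open("TAIR10.fas")), sam_header_names(open("header.txt"))) returns None when names and order agree, or the first position at which they differ.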
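The sam2fragment and sortFragment steps reduce each read pair to a fragment and order the fragments along the genome. The exact criteria STRViper applies are not described here, so the following is only a minimal Python sketch of the idea under stated assumptions: it keeps properly paired, primary alignments (SAM flag 0x2 set, 0x100 and 0x800 unset), takes the fragment size from the TLEN column of the leftmost mate, and sorts fragments by the reference order from the header and then by start position. Fragment, fragments_from_sam and sort_fragments are hypothetical names, and the output is not STRViper's .fragment format.

```python
from typing import Iterable, List, NamedTuple

PROPER_PAIR = 0x2
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


class Fragment(NamedTuple):
    chrom: str
    start: int  # 1-based leftmost mapped position of the pair
    end: int    # start + TLEN - 1
    size: int   # observed template length (SAM TLEN column)


def fragments_from_sam(lines: Iterable[str]) -> List[Fragment]:
    """One fragment per properly paired, primary read pair (assumed criteria)."""
    frags = []
    for line in lines:
        if line.startswith("@"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue  # not a complete SAM alignment line
        flag, pos, tlen = int(f[1]), int(f[3]), int(f[8])
        if not (flag & PROPER_PAIR) or (flag & (SECONDARY | SUPPLEMENTARY)):
            continue
        # TLEN is positive for the leftmost mate and negative for the other,
        # so keeping tlen > 0 normally records each pair once.
        if tlen > 0:
            frags.append(Fragment(f[2], pos, pos + tlen - 1, tlen))
    return frags


def sort_fragments(frags: Iterable[Fragment], ref_order: List[str]) -> List[Fragment]:
    """Sort by reference order (e.g. from @SQ lines), then start, then end."""
    rank = {name: i for i, name in enumerate(ref_order)}
    return sorted(frags, key=lambda fr: (rank.get(fr.chrom, len(rank)), fr.start, fr.end))
```

SAM text for this sketch can be produced with samtools view lib.sort.bam. Ranking chromosomes by their header order, rather than alphabetically, keeps the fragments in the same chromosome order as the BAM file.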
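The Introduction states that STRViper calls variants from deviations in fragment sizes. The arithmetic behind that idea can be sketched as follows; this is a simplified illustration, not STRViper's statistical model, which is described in the paper cited above. If the sequenced genome carries d extra bases in a repeat, a fragment that spans the whole repeat covers d fewer bases of the reference, so its apparent size is about d bases shorter than the library average; a contraction makes it longer. The Python sketch below collects the fragments that span a given STR interval and estimates d as the library mean minus the mean apparent size of those fragments. Fragment, spanning_sizes and estimate_length_change are hypothetical names.

```python
from statistics import mean
from typing import Iterable, List, NamedTuple, Optional, Sequence


class Fragment(NamedTuple):
    chrom: str
    start: int  # 1-based leftmost reference position of the fragment
    end: int    # 1-based inclusive rightmost reference position


def spanning_sizes(frags: Iterable[Fragment], chrom: str,
                   str_start: int, str_end: int) -> List[int]:
    """Apparent sizes of fragments that cover the whole repeat interval."""
    return [f.end - f.start + 1 for f in frags
            if f.chrom == chrom and f.start < str_start and f.end > str_end]


def estimate_length_change(library_sizes: Sequence[int],
                           spanning: Sequence[int]) -> Optional[float]:
    """Library mean minus spanning mean: positive suggests an expansion."""
    if not library_sizes or not spanning:
        return None  # nothing to compare
    # An expansion of d bp in the sample makes a spanning fragment cover
    # d fewer reference bases, so its apparent size drops by about d.
    return mean(library_sizes) - mean(spanning)
```

For instance, with a library mean of 300 bp and two spanning fragments of apparent size 280 bp, the estimate is an expansion of 20 bp. A real caller also has to account for the spread of the fragment size distribution and for heterozygous loci, which this sketch ignores.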