Tool for Alignment of Pyrosequencing Reads

Command Line Options

Build index:
./tapyr I <reference-file>
The indexing operation only needs to be performed once for each reference file.
The input reference genome file should be in the FASTA format.
The output index is a binary file with the same name as the genome file but with extension ".fmi"
Multiple reference sequences/contigs inside the same file are supported.
Characters other than A/C/G/T are discarded (support is planned for v1.3).
Genomes larger than 2 Gbp are not yet supported (support is planned for v1.3).
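
The naming rule and the character filtering described above can be checked before indexing. The following is a minimal Python sketch, not part of tapyr itself: the helper names index_path and count_non_acgt are hypothetical, and treating lowercase a/c/g/t as valid bases is an assumption.

```python
from pathlib import Path


def index_path(reference_file: str) -> str:
    """Return the reference file name with its extension replaced by ".fmi"."""
    return str(Path(reference_file).with_suffix(".fmi"))


def count_non_acgt(fasta_text: str) -> int:
    """Count sequence characters other than A/C/G/T, skipping FASTA header lines.

    Assumption: lowercase bases are counted as valid.
    """
    total = 0
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            continue  # header line, not sequence data
        total += sum(1 for ch in line.strip() if ch.upper() not in "ACGT")
    return total
```

For example, index_path("MyGenome.fasta") gives "MyGenome.fmi", and a non-zero result from count_non_acgt indicates how many reference characters would be dropped.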

Align reads:
./tapyr <index-file> <reads-file> <options>
The index file is the file with extension ".fmi" created in the previous step.
The input reads file(s) should be in FASTA, FASTQ or SFF formats.
Multiple input reads files can be provided.
The mapped reads are returned according to the SAM format specification.
The output file will have the same name as the first/only reads file but with extension ".sam"
For paired-end reads, provide both reads files instead of one (see below).

Optional parameters:
 -B output only the best hit for each read instead of all hits
 -I minimum identity percentage required for each read (default is 80%)
 -E maximum number of errors (mismatches and indels) allowed for each read
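
To relate the -I and -E options, the sketch below converts a minimum identity percentage into an error budget for a given read length. It assumes identity is computed as (read length - errors) / read length; the exact formula used by tapyr is not stated here, and the function names are hypothetical.

```python
import math


def max_errors_for_identity(read_length: int, min_identity_pct: float = 80.0) -> int:
    """Largest error count that still meets the identity threshold.

    Assumption: identity = 100 * (read_length - errors) / read_length.
    """
    return math.floor(read_length * (100.0 - min_identity_pct) / 100.0)


def meets_identity(read_length: int, errors: int, min_identity_pct: float = 80.0) -> bool:
    """Check whether a read with the given error count meets the threshold."""
    identity = 100.0 * (read_length - errors) / read_length
    return identity >= min_identity_pct
```

Under this assumption, a 100 bp read at the default 80% identity tolerates up to 20 mismatches and indels combined.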

Align paired-end reads:
./tapyr <index-file> <reads-file-1> <reads-file-2> <options>
If an insertion size is provided, only pairs of hits satisfying that distance will be accepted.
Multiple pairs of reads files with different insertion sizes can be provided.
Single files in SFF format containing paired-end reads will be auto-detected and processed accordingly.

Optional parameters:
 -P force handling of all files as paired-end reads files, if they are not auto-detected
 -D average insertion size between both mates (distance in basepairs)
 -S standard deviation in insertion size (distance in basepairs) (default is 25% of size)
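
The default for -S can be illustrated with a small Python sketch. Only the 25% default comes from the text above; the acceptance window of k standard deviations around the mean, and the names used, are assumptions made for illustration.

```python
from typing import Optional, Tuple


def default_stddev(insertion_size: float) -> float:
    """Default standard deviation: 25% of the average insertion size."""
    return 0.25 * insertion_size


def insertion_window(insertion_size: float,
                     stddev: Optional[float] = None,
                     k: float = 1.0) -> Tuple[float, float]:
    """Return (lowest, highest) accepted distance between mates.

    Assumption: a pair is accepted within k standard deviations of the mean;
    the tolerance actually used by tapyr is not documented here.
    """
    if stddev is None:
        stddev = default_stddev(insertion_size)
    return (insertion_size - k * stddev, insertion_size + k * stddev)
```

With -D 3000 and no -S, the default deviation is 750 bp, so this sketch would accept distances from 2250 to 3750 bp when k is 1.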

Generate consensus:
./tapyr C <reference-file> <mapped-reads-file>
The reference file should be in the FASTA format and the mapped reads file in the SAM format.
The positions that are not covered by any reads are replaced by N's.
Two files in the FASTA format will be produced, containing the consensus and the isolated contigs.
A SNP or indel is assumed only if:
 · the number of reads is more than half the average reference coverage
 · the mapping quality of the reads is more than half the average mapping quality
 · there are reads from both strands
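
The three conditions above can be sketched as a single predicate in Python. This is an illustration rather than tapyr's implementation: it assumes "the number of reads" means the reads supporting the variant, that their mapping quality is taken as a mean, and that strands are encoded as "+" and "-".

```python
from typing import List, Tuple


def accept_variant(reads: List[Tuple[int, str]],
                   avg_coverage: float,
                   avg_mapq: float) -> bool:
    """Decide whether a SNP or indel is accepted.

    reads: (mapping_quality, strand) for each read supporting the variant,
           with strand given as "+" or "-" (assumed encoding).
    """
    if not reads:
        return False
    # Condition 1: more reads than half the average reference coverage.
    enough_reads = len(reads) > avg_coverage / 2
    # Condition 2: mean mapping quality above half the average mapping quality.
    mean_mapq = sum(q for q, _ in reads) / len(reads)
    good_quality = mean_mapq > avg_mapq / 2
    # Condition 3: support from both strands.
    both_strands = {"+", "-"} <= {s for _, s in reads}
    return enough_reads and good_quality and both_strands
```

A variant supported only by forward-strand reads is rejected regardless of depth or quality.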

Visualize alignment:
./tapyr V <reference-file> <mapped-reads-file> <start-position> <end-position>
If no start and end positions are given, a global coverage mapping plot will be generated.
If the range between the start and end positions is less than 1000 bp, a local mapping plot is created.
The reference file should be in the FASTA format and the mapped reads file in the SAM format.
The alignment map image is returned in the BMP format.
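
The choice between plot types can be summarised in a short Python sketch. The function name is hypothetical, the range is assumed to be end minus start, and the behaviour for ranges of 1000 bp or more is not described above, so the sketch reports it as unspecified.

```python
from typing import Optional


def plot_mode(start: Optional[int] = None, end: Optional[int] = None) -> str:
    """Return which plot the documented rules select for the given positions."""
    if start is None and end is None:
        return "global"  # no positions given: global coverage mapping plot
    if start is None or end is None:
        raise ValueError("start and end positions must be given together")
    if end < start:
        raise ValueError("end position must not be before start position")
    if end - start < 1000:
        return "local"  # range under 1000 bp: local mapping plot
    return "unspecified"  # behaviour for larger ranges is not documented here
```

For instance, plot_mode() returns "global" and plot_mode(100, 600) returns "local".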


Examples:
./tapyr I MyGenome.fasta
./tapyr MyGenome.fmi MyReads.fasta
./tapyr C MyGenome.fasta MyReads.sam
./tapyr V MyGenome.fasta MyReads.sam