INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
-
technology from seed

kdbio

Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa
Home
 
 

Sublinear alignment

02/27/2004 - 16:30
02/27/2004 - 17:30
Etc/GMT

Classical algorithms for computing the similarity between two sequences use a dynamic programming matrix, and compares two strings of size n in O(n2) time. We address the challenge of computing the similarity of two strings in sub-quadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global similarity computations. The speed-up is achieved by using the Lempel-Ziv parsing of both strings. It leads to an O(h n2 / logn) algorithm for an input on a constant-size alphabet, where h < 1 is the entropy of strings.