INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
-
technology from seed

kdbio

Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa
Home
 
 

An Efficient Algorithm for Generating Super Condensed Neighborhoods

02/10/2005 - 16:30
02/10/2005 - 17:30
Etc/GMT

Approximate string matching is the problem of finding a pattern in a text allowing for some errors to occur.
Indexing methods for the approximate string matching problem spend a considerable effort generating condensed neighborhoods. In this session we point out that condensed neighborhoods are not a minimal representation of a pattern neighborhood. We show that we can restrict our attention to super condensed neighborhoods which are minimal. We then present a basic algorithm for generating Super Condensed Neighborhoods. The algorithm runs in O(m s), where m is the pattern size and s is the size of the super condensed neighborhood. Previous algorithms took O(m c) time, where c is the size of the condensed neighborhood. We present an implementation of the algorithm using modern Bit-Parallelism and Increased Bit-Parallelism techniques.