INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
-
technology from seed

kdbio

Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa
Home
 
 

Improved Indexing of Text using the Ziv-Lempel Trie

01/06/2006 - 14:30
01/06/2006 - 15:30
Etc/GMT

A full-text index provides fast search for any pattern in a text. Tradicional full-text indexes, however, have a serious problem with space usage. A recent trend is to develop indexes that exploit the compressibility of the text so that their size is a function of the size of the compressed text. This field has introduced the concept of self-indexes, which, in addition to providing index capabilities, are capable of replacing the original text. The two most successful lines of research are the ones exploring compressed suffix arrays and the Burrows-Wheeler transform. In this talk we will describe an algorithm that builds an index based on the well known Ziv-Lempel data compression technique. This algorithm represents a significant improvement over previous methods. We expect that it will be very competitive against the two mainstream approaches.