INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technology from seed


Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Succinct structures for self-indexing text

09/28/2012 - 11:00
09/28/2012 - 12:00

Applications that manage large text collections need indexing
methods that allow efficient retrieval over the text. Several indexes
have been proposed that try to strike a good trade-off between the
space needed to store both the text and the index and the efficiency
of searching.

Self-indexes have become increasingly popular in recent years. Not
only do they index the text, but they also keep enough information to
recover any portion of it, so the text need not be stored explicitly.
In effect, they replace the text.
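
To make the idea of "replacing the text" concrete, here is a minimal
sketch of one well-known family of self-indexes, based on the
Burrows-Wheeler transform (BWT). It is not necessarily one of the
structures presented in the talk. The class name TinyFMIndex, its
methods, and the sentinel character are all hypothetical names chosen
for illustration, and this toy version is uncompressed and uses slow
linear scans; practical self-indexes replace those with compressed
rank structures.

```python
class TinyFMIndex:
    """Toy self-index (illustrative only): keeps the BWT, not the text."""

    def __init__(self, text: str, sentinel: str = "\0"):
        # Assumption: the sentinel does not occur in the text and sorts
        # before every text character.
        if sentinel in text:
            raise ValueError("sentinel must not occur in the text")
        s = text + sentinel
        # Naive suffix sorting; acceptable for a small example only.
        suffix_array = sorted(range(len(s)), key=lambda i: s[i:])
        # BWT: the character preceding each sorted suffix.
        self.bwt = "".join(s[i - 1] for i in suffix_array)
        # C[ch] = number of characters in s strictly smaller than ch.
        counts = {}
        for ch in self.bwt:
            counts[ch] = counts.get(ch, 0) + 1
        self.C = {}
        total = 0
        for ch in sorted(counts):
            self.C[ch] = total
            total += counts[ch]

    def _occ(self, ch: str, i: int) -> int:
        # Occurrences of ch in bwt[:i]; a linear scan in this sketch.
        return self.bwt.count(ch, 0, i)

    def count(self, pattern: str) -> int:
        """Number of occurrences of pattern, found by backward search."""
        lo, hi = 0, len(self.bwt)
        for ch in reversed(pattern):
            if ch not in self.C:
                return 0
            lo = self.C[ch] + self._occ(ch, lo)
            hi = self.C[ch] + self._occ(ch, hi)
            if lo >= hi:
                return 0
        return hi - lo

    def extract(self) -> str:
        """Rebuild the original text from the BWT alone (LF-mapping)."""
        out = []
        row = 0  # row 0 is the suffix consisting of the sentinel only
        for _ in range(len(self.bwt) - 1):
            ch = self.bwt[row]
            out.append(ch)
            row = self.C[ch] + self._occ(ch, row)
        return "".join(reversed(out))


index = TinyFMIndex("abracadabra")
print(index.count("abra"))
print(index.extract())
```

The sketch rebuilds the whole text; extracting an arbitrary portion
efficiently would additionally require sampled positions, which are
omitted here.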

In this talk I will present two useful self-indexes with good
properties. They need only about 35% of the space of the plain text,
yet they can efficiently answer retrieval queries thanks to their
indexing capabilities.