INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technology from seed


Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Compressing Web Graphs as Texts

09/28/2007 - 16:30
09/28/2007 - 17:30

The need to run different kinds of algorithms over large Web graphs motivates the research for compressed graph representations that permit accessing without decompressing them. At this point there exist a few such compression proposals, some of them very effective in practice. In this talk we introduce a novel approach to graph compression, based on regarding the graph as a text and using existing techniques for text compression/indexing. This permits accessing the graph efficiently without decompressing it, and in addition brings in new functionalities over the compressed graph. Our experimental results show that our technique has the potential of being competitive with the best alternative techniques, yet not fully satisfactory. Then we introduce a second approach, where we go back to pure compression. By far the best current result is the technique by Boldi and Vigna, which takes advantage of several particular properties of Web graphs. We show that the same properties can be exploited with a different and elegant technique, built on Re-Pair compression, which achieves about the same space but much faster navigation of the graph. Moreover, the technique has the potential of adapting well to secondary memory. Finally, we comment on ongoing work to combine those approaches. The successful scheme can be enriched with succinct data structures so as to permit further graph traversal operations.