INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technology from seed


Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

G-Tries: an efficient data-structure for counting subgraphs

12/02/2011 - 14:30
12/02/2011 - 15:30

Complex networks are ubiquitous in real-world systems. In order to understand their design principles, the concept of network motifs emerged. These are recurrent overrepresented patterns of interconnections that can be seen as building blocks of networks. Algorithmically, discovering these motifs is a hard problem, which limits their practical applicability.

I will give an overview of the state of the art in algorithms for finding these patterns. I will then present a novel data structure, g-tries, designed to efficiently represent a collection of graphs and to search for them as induced subgraphs of another larger graph. I will explain how it takes advantage of common substructure and how symmetry breaking conditions can be used to avoid redundant computations. I will also briefly introduce a sampling methodology capable of trading accuracy for even better execution times, and give some notes on the scalability of the methods, showing that they are suitable for a parallel computation.

Finally, I will show an extensive empirical evaluation of the developed algorithms on a set of diversified complex networks, showing that g-tries can outperform all previously existing competitor algorithms.