INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa


Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Forests of Decision Trees for Continuous Data Streams

10/30/2003 - 11:00

This work presents a hybrid adaptive system for inducing a forest of trees from data streams.
Our system is designed for continuous data. It uses analytical techniques to choose the splitting criteria and information gain to estimate the merit of each candidate splitting test. The number of examples required to evaluate the splitting criteria is based on the Hoeffding bound. For multi-class problems, the algorithm builds a binary tree for each possible pair of classes, which leads to a forest of trees. We study the behavior of the system on different problems and demonstrate its utility on medium and large data sets.
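
As a rough illustration of two ideas mentioned in the abstract, the sketch below computes the Hoeffding bound and lists the class pairs that would each receive a binary tree. The abstract does not say how the bound is applied, so the split rule here (compare the gap between the two best information gains against the bound, as is common in Hoeffding-tree learners) is an assumption, and all function and parameter names are hypothetical.

```python
import math
from itertools import combinations


def hoeffding_bound(value_range, delta, n):
    """Return epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean observed
    over n independent examples."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def can_split(best_gain, second_gain, n_classes, delta, n):
    """Assumed decision rule: split once the gap between the two best
    information gains exceeds the Hoeffding bound."""
    # Information gain lies in [0, log2(n_classes)].
    value_range = math.log2(n_classes)
    return (best_gain - second_gain) > hoeffding_bound(value_range, delta, n)


def class_pairs(classes):
    """One binary tree per unordered pair of classes: k classes give
    k * (k - 1) / 2 trees."""
    return list(combinations(classes, 2))


if __name__ == "__main__":
    print(hoeffding_bound(1.0, 1e-7, 1000))
    print(can_split(0.30, 0.15, n_classes=2, delta=1e-7, n=1000))
    print(class_pairs(["a", "b", "c"]))
```

Under these assumptions, a smaller delta or a wider value range calls for more examples before a split is accepted, and a four-class problem would yield a forest of six binary trees.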