INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa

kdbio

Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Kernel methods for the prioritization of candidate genes

12/19/2008, 11:00 - 12:00 (Etc/GMT)

Hunting disease genes is a problem of primary importance in biomedical research. Biologists usually approach this problem in two steps: first, a set of candidate genes is identified using traditional positional cloning or high-throughput genomics techniques; second, these genes are further investigated and validated in the wet lab, one by one. To speed up discovery and limit the number of costly wet-lab experiments, biologists must test the candidate genes starting with the most probable ones. So far, biologists have relied on literature studies, extensive queries to multiple databases, and hunches about the expected properties of the disease gene to determine such an ordering. Recently, the data mining tool ENDEAVOUR was introduced, which performs this task automatically by relying on different genome-wide data sources, such as Gene Ontology, literature, microarray and sequence data, among others. A novel kernel method that operates in the same setting is presented: based on a number of different views on a set of training genes, a prioritization of test genes is obtained. A thorough theoretical analysis of the guaranteed performance of the method will also be presented. Finally, the application of the method to the disease data sets on which ENDEAVOUR has been benchmarked will be reported, showing a considerable improvement in empirical performance.
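
To make the setting concrete, the following is a minimal sketch of multi-view prioritization, and not the method presented in the seminar: each data source ("view") yields a kernel between test and training genes, and test genes are ranked by their average kernel similarity to the training genes. The Gaussian kernel, the uniform averaging over views, the function names `rbf_kernel` and `prioritize`, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(A, B, gamma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))


def prioritize(train_views, test_views, gamma=1.0):
    """Rank test genes by average kernel similarity to the training genes.

    train_views, test_views: lists with one array per view, of shapes
    (n_train, d_v) and (n_test, d_v). Hypothetical interface.
    Returns (ranking, scores); ranking lists test indices, best first.
    """
    n_test = test_views[0].shape[0]
    scores = np.zeros(n_test)
    for X_train, X_test in zip(train_views, test_views):
        K = rbf_kernel(X_test, X_train, gamma)  # shape (n_test, n_train)
        scores += K.mean(axis=1)                # mean similarity in this view
    scores /= len(train_views)                  # uniform weight per view (assumption)
    ranking = np.argsort(-scores)               # highest score first
    return ranking, scores


# Toy example: two views, five training genes clustered near the origin.
rng = np.random.default_rng(0)
train_views = [rng.normal(0.0, 0.1, (5, 3)), rng.normal(0.0, 0.1, (5, 4))]
# Test gene 0 lies near the training genes in both views; test gene 1 is far away.
test_views = [
    np.vstack([np.zeros((1, 3)), np.full((1, 3), 5.0)]),
    np.vstack([np.zeros((1, 4)), np.full((1, 4), 5.0)]),
]
ranking, scores = prioritize(train_views, test_views)
print(ranking, scores)
```

In this toy run the test gene close to the training set is ranked first. A real system would use view-specific kernels and learn how to weight the views rather than averaging them uniformly.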