INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa


Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Forests of Decision Trees for Data Streams

10/30/2003 - 11:00

This work presents a hybrid adaptive system for inducing forests of trees from data streams.

A parallel algorithm for the extraction of structured motifs

10/17/2003 - 13:30
10/17/2003 - 14:30

We present a parallel algorithm for the efficient extraction of binding-site consensus from genomic sequences. This algorithm is based on an existing approach for extracting structured motifs. A structured motif consists of an ordered collection of p boxes, p substitution rates and p-1 distances between successive boxes. The contents of the boxes, which represent the extracted motifs, are unknown at the start of the process and are found by the algorithm using a suffix tree as the fundamental data structure.
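
The definition of a structured motif given above can be illustrated with a minimal sketch. It is not the parallel suffix-tree algorithm of the talk: it only checks, by brute force, where a motif that is already known occurs in a sequence. The names `StructuredMotif`, `occurs_at` and `find_occurrences` are hypothetical, and the sketch assumes that each substitution rate is an integer mismatch budget per box and that each distance is an exact gap length between consecutive boxes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StructuredMotif:
    """p boxes, p mismatch budgets and p-1 gaps (all names are illustrative)."""
    boxes: List[str]           # contents of the p boxes
    max_mismatches: List[int]  # assumed: allowed substitutions per box
    distances: List[int]       # assumed: exact gap between box i and box i+1

    def __post_init__(self) -> None:
        p = len(self.boxes)
        if len(self.max_mismatches) != p or len(self.distances) != p - 1:
            raise ValueError("expected p boxes, p budgets and p-1 distances")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def occurs_at(motif: StructuredMotif, seq: str, start: int) -> bool:
    """Check whether the motif occurs in seq with its first box at start."""
    pos = start
    for i, box in enumerate(motif.boxes):
        window = seq[pos:pos + len(box)]
        if len(window) < len(box):
            return False  # ran off the end of the sequence
        if hamming(window, box) > motif.max_mismatches[i]:
            return False  # too many substitutions in this box
        pos += len(box)
        if i < len(motif.distances):
            pos += motif.distances[i]  # skip the gap before the next box
    return True


def find_occurrences(motif: StructuredMotif, seq: str) -> List[int]:
    """Brute-force scan over every start position (no suffix tree)."""
    return [s for s in range(len(seq)) if occurs_at(motif, seq, s)]


# Two boxes separated by a gap of exactly 3 bases, one substitution allowed each.
motif = StructuredMotif(boxes=["TTGACA", "TATAAT"],
                        max_mismatches=[1, 1],
                        distances=[3])
hits = find_occurrences(motif, "GGTTGACACCCTATAATGG")
```

The extraction problem described in the abstract is harder than this check, because the box contents are unknown at the start and must be discovered.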

Machine Learning Methods for Computational Proteomics and Beyond

07/29/2003 - 15:00
07/29/2003 - 16:00

Predicting protein structure is a fundamental problem in biology, especially in the genomic era where over one third of newly discovered genes have unknown structure and function. Because sequence and structure data (hence training sets) continue to grow exponentially, this area is ideally suited for machine learning approaches. Neural networks, in particular, have had remarkable success and have led, for instance, to the construction of the best secondary structure predictors.

SAT Methods for Multiple Sequence Alignment

07/15/2003 - 11:00
07/15/2003 - 12:00

Multiple sequence alignment is a central and challenging problem in Bioinformatics. Several approaches to it have been tried, some very specialised (heuristic search based on progressive alignment) and some using generic techniques (genetic algorithms, dynamic programming, branch-and-cut). I describe a prototype SAT-based approach that sometimes finds better alignments than standard alignment packages. It is currently much slower than those packages, but this will be improved in future work.

Clustering, Fuzzy Clustering and Biclustering: An Overview

06/27/2003 - 13:30
06/27/2003 - 14:30

Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects called clusters. According to this definition a cluster is a collection of objects similar to one another within the same cluster and dissimilar to the objects in other clusters. In gene expression data analysis, and by using a microarray gene expression matrix, clustering can be used to group genes according to their expression under multiple conditions, group conditions based on the expression of a number of genes, or even to group genes and conditions simultaneously.
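
As a minimal sketch of the first two uses described above, the same routine can cluster genes (rows of the expression matrix) or conditions (columns, by transposing the matrix). The routine is a plain k-means with a deterministic start, chosen here only for brevity and not taken from the talk. The matrix values and all function names are made up for illustration. Biclustering, which groups genes and conditions simultaneously, is not covered by this sketch.

```python
from typing import List, Sequence


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_profile(profiles: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of a non-empty list of profiles."""
    n = len(profiles)
    return [sum(col) / n for col in zip(*profiles)]


def kmeans(rows: List[Sequence[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; the first k rows are the initial centroids (an assumption)."""
    centroids = [list(r) for r in rows[:k]]
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assign each row to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(r, centroids[c]))
                  for r in rows]
        # Recompute each centroid from its current members.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                centroids[c] = mean_profile(members)
    return labels


def transpose(matrix: List[Sequence[float]]) -> List[List[float]]:
    """Swap rows and columns so that conditions become the clustered objects."""
    return [list(col) for col in zip(*matrix)]


# Hypothetical expression matrix: rows are genes, columns are conditions.
expression = [
    [5.0, 5.1, 0.2, 0.1],
    [0.1, 0.3, 4.9, 5.2],
    [4.8, 5.3, 0.0, 0.3],
    [0.2, 0.0, 5.1, 4.8],
]

gene_labels = kmeans(expression, k=2)                    # group genes
condition_labels = kmeans(transpose(expression), k=2)    # group conditions
```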

Microarray data normalization and transformation

06/06/2003 - 13:30
06/06/2003 - 14:30

Microarray experiments analyse biological systems under controlled conditions and try to infer biologically meaningful information from the differences observed between gene expression profiles. In practice, one obtains spot fluorescence profiles, which are related to, but not identical with, the mRNA concentrations in the samples studied. This talk will focus on the relation between the spot fluorescence values and the desired mRNA concentration, the factors that affect this relation, and how to deal with them during the data analysis step. Data normalization and transformation procedures will be discussed.
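
The abstract does not say which procedures the talk covers. As a hedged illustration only, the sketch below shows one widely used pair of steps for two-channel arrays: a log2 transformation of the channel ratio, followed by median centering. The two-channel setup, the intensity values and the function names are assumptions made for the example.

```python
import math
from statistics import median
from typing import List, Sequence


def log_ratios(red: Sequence[float], green: Sequence[float]) -> List[float]:
    """Transformation: log2 of the per-spot ratio of the two channel intensities."""
    return [math.log2(r / g) for r, g in zip(red, green)]


def median_center(values: Sequence[float]) -> List[float]:
    """Normalization: subtract the array-wide median so the typical log ratio is 0."""
    m = median(values)
    return [v - m for v in values]


# Hypothetical background-corrected intensities for three spots.
red = [200.0, 400.0, 800.0]
green = [100.0, 100.0, 100.0]

normalized = median_center(log_ratios(red, green))
```

Median centering assumes that most genes are not differentially expressed, so a systematic shift of the log ratios is attributed to the measurement and not to biology.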

Information Extraction from the Biological Literature

05/09/2003 - 13:30
05/09/2003 - 14:30

The results obtained in molecular biology, as in other fields of study, have mostly been published in the scientific literature of the field. The literature is an enormous body of unstructured information, which makes access to the results documented in it laborious. To deal with this problem, databases that organize these results systematically have been created over the last few years.

Pattern analysis of microarray data: gene clustering, gene selection, and sample classification

02/11/2003 - 11:00
02/11/2003 - 12:00

Modern microarray technology provides thousands of gene expression values for each sample. This large amount of data can be analyzed from several perspectives and with different goals. Although standard pattern recognition, machine learning, or statistical analysis methods can be called into action, gene expression data have specific characteristics which demand some special care. For example, in sample classification, one often has to deal with just a few samples (say 10 to 100) in a very high dimensional space (i.e., number of genes, say 1000 to 10000).

Hermes: Server and library of information retrieval models

02/21/2003 - 13:30
02/21/2003 - 14:30

Hermes is a digital library component that retrieves relevant information using different information retrieval models, text and query processors, and access to different collections. Its architecture keeps each level independent, so that it can later be extended with new models, collections, or text and query processors.

DNA Chips

01/29/2003 - 11:00
01/29/2003 - 12:00

The field of DNA microarrays or DNA chips can potentially revolutionize the acquisition and analysis of genetic information. In these devices, DNA hybridization may occur in a massively parallel manner with different single-strand DNA capture probes immobilized at specific sites on a microarray.