This work presents a hybrid adaptive system for the induction of a forest of trees from data streams.
We present a parallel algorithm for the efficient extraction of binding-site consensus from genomic sequences. This algorithm is based on an existing approach for extracting structured motifs. A structured motif consists of an ordered collection of p boxes, p substitution rates and p-1 distances between successive boxes. The contents of the boxes, which represent the extracted motifs, are unknown at the start of the process and are found by the algorithm using a suffix tree as the fundamental data structure.
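The structured-motif definition above can be illustrated with a minimal Python sketch. It rests on assumptions not stated in the abstract: each substitution rate is modelled as a maximum number of mismatches per box, and each distance as a (min, max) range of gap lengths between successive boxes. The names StructuredMotif and find_occurrences are hypothetical. The sketch only checks where an already known motif occurs, using a naive scan; it does not reproduce the suffix-tree search that discovers unknown box contents, nor the parallelisation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StructuredMotif:
    boxes: List[str]             # contents of the p boxes
    max_subs: List[int]          # assumed: max mismatches allowed per box (p values)
    gaps: List[Tuple[int, int]]  # assumed: (min, max) gap between successive boxes (p-1 values)

    def __post_init__(self) -> None:
        p = len(self.boxes)
        if len(self.max_subs) != p or len(self.gaps) != p - 1:
            raise ValueError("expected p boxes, p substitution bounds and p-1 gaps")


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def box_matches(seq: str, start: int, box: str, max_sub: int) -> bool:
    """True if the box occurs at `start` with at most `max_sub` substitutions."""
    window = seq[start:start + len(box)]
    return len(window) == len(box) and hamming(window, box) <= max_sub


def find_occurrences(seq: str, motif: StructuredMotif) -> List[int]:
    """Start positions of the first box for which the whole motif occurs."""

    def extend(i: int, start: int) -> bool:
        if not box_matches(seq, start, motif.boxes[i], motif.max_subs[i]):
            return False
        if i == len(motif.boxes) - 1:
            return True
        end = start + len(motif.boxes[i])
        lo, hi = motif.gaps[i]
        # Try every allowed distance to the next box.
        return any(extend(i + 1, end + d) for d in range(lo, hi + 1))

    return [s for s in range(len(seq)) if extend(0, s)]


# Two boxes: the first tolerates one substitution, the second none;
# the boxes are separated by 2 or 3 bases.
motif = StructuredMotif(boxes=["TTGA", "TATA"], max_subs=[1, 0], gaps=[(2, 3)])
exact_hits = find_occurrences("CCTTGACCTATAGG", motif)
fuzzy_hits = find_occurrences("TCGACCCTATA", motif)
```

The naive scan is quadratic in the worst case and is meant only to make the roles of boxes, substitution bounds and distances concrete.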
Predicting protein structure is a fundamental problem in biology, especially in the genomic era where over one third of newly discovered genes have unknown structure and function. Because sequence and structure data (hence training sets) continue to grow exponentially, this area is ideally suited for machine learning approaches. Neural networks, in particular, have had remarkable success and have led, for instance, to the construction of the best secondary structure predictors.
Multiple sequence alignment is a central and challenging problem in Bioinformatics. Several approaches to it have been tried, some very specialised (heuristic search based on progressive alignment) and some using generic techniques (genetic algorithms, dynamic programming, branch-and-cut). I describe a prototype SAT-based approach that sometimes finds better alignments than standard alignment packages. It is much slower but will be improved in future work.
Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects called clusters. According to this definition, a cluster is a collection of objects that are similar to one another and dissimilar to the objects in other clusters. In gene expression data analysis, clustering can be applied to a microarray gene expression matrix to group genes according to their expression under multiple conditions, to group conditions based on the expression of a number of genes, or even to group genes and conditions simultaneously.
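As a concrete illustration of grouping genes by their expression across conditions, the following minimal Python sketch clusters the rows of a small expression matrix. The choice of Pearson correlation distance and average-linkage agglomerative clustering is an assumption made for the example, not a method named in the abstract; the function names and the toy data are hypothetical.

```python
from math import sqrt
from typing import Dict, List


def pearson_distance(a: List[float], b: List[float]) -> float:
    """1 - Pearson correlation; 0 for perfectly correlated profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # flat profile: treat as uncorrelated
    return 1.0 - cov / sqrt(var_a * var_b)


def agglomerate(profiles: Dict[str, List[float]], k: int) -> List[List[str]]:
    """Merge the two closest clusters (average linkage) until k remain."""
    clusters = [[name] for name in profiles]

    def linkage(c1: List[str], c2: List[str]) -> float:
        total = sum(pearson_distance(profiles[a], profiles[b])
                    for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        pairs = ((i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters)))
        i, j = min(pairs, key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]


# Toy matrix: rows are genes, columns are four conditions.
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
    "g4": [8.0, 6.0, 4.0, 3.0],
}
gene_clusters = agglomerate(expression, k=2)
```

Applying the same function to the transposed matrix (one profile per condition) would group conditions instead of genes; grouping both simultaneously requires a different technique and is not covered by this sketch.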
Microarray experiments analyse biological systems under controlled conditions and try to infer biologically meaningful information from the differences observed between gene expression profiles. In practice, one obtains spot fluorescence profiles, which are related to, but not identical to, the mRNA concentrations in the samples studied. This talk will focus on the relation between the spot fluorescence values and the desired mRNA concentration, the factors that affect this relation, and how to deal with them during the data analysis step. Data normalization and transformation procedures will be discussed.
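The abstract does not say which normalization and transformation procedures the talk covers. As one commonly used possibility, the minimal Python sketch below applies a log2 transformation to the ratio of two fluorescence channels and then centres the log ratios on their median. The two-channel setup, the function names and the toy intensities are all assumptions made for illustration.

```python
from math import log2
from statistics import median
from typing import List


def log_ratios(red: List[float], green: List[float]) -> List[float]:
    """Log2 ratio of the two channel intensities for each spot."""
    return [log2(r / g) for r, g in zip(red, green)]


def median_center(values: List[float]) -> List[float]:
    """Shift values so that their median is zero (global normalization)."""
    m = median(values)
    return [v - m for v in values]


# Toy spot intensities (hypothetical, background already subtracted).
red = [200.0, 400.0, 800.0]
green = [100.0, 100.0, 100.0]

raw = log_ratios(red, green)      # [1.0, 2.0, 3.0]
normalized = median_center(raw)   # [-1.0, 0.0, 1.0]
```

Median centring assumes that most genes are not differentially expressed, so a systematic offset between channels reflects a technical bias rather than biology.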
The results obtained in molecular biology, as in other fields of study, have for the most part been published in the scientific literature of the field. The literature is a huge body of unstructured information, which makes it laborious to access the results documented in it. To deal with this problem, databases that organize these results systematically have been created over recent years.
Modern microarray technology provides thousands of gene expression values for each sample. This large amount of data can be analyzed from several perspectives and with different goals. Although standard pattern recognition, machine learning, or statistical analysis methods can be called into action, gene expression data have specific characteristics which demand some special care. For example, in sample classification, one often has to deal with just a few samples (say 10 to 100) in a very high dimensional space (i.e., number of genes, say 1000 to 10000).
Hermes is a digital library component that retrieves relevant information using different information retrieval models, text and query processors, and access to different collections. Hermes proposes an architecture in which each level is independent, allowing future expansion by adding new models, collections, or text and query processors.
The field of DNA microarrays or DNA chips can potentially revolutionize the acquisition and analysis of genetic information. In these devices, DNA hybridization may occur in a massively parallel manner with different single-strand DNA capture probes immobilized at specific sites on a microarray.