kdbio - Knowledge Discovery and Bioinformatics, Inesc-ID Lisboa

Seminars

Shortest Path Search Algorithms with Heuristic and Bidirectional Searches

10/21/2004 - 16:00
10/21/2004 - 17:30
Etc/GMT

Path-finding algorithms have been studied since antiquity, but they were developed formally, on the mathematical model of the graph, mainly during the last century. Dijkstra's algorithm, presented in 1959, expands nodes outward from the origin in order of increasing cost until the destination is found. In 1968, Hart, Nilsson and Raphael presented the A* algorithm, which uses information about the structure of the graph to decide which nodes are the most promising to explore. A* inaugurated "heuristic search", yet it still finds the optimal path. It was also proven to be extremely efficient and, under certain conditions, optimal, in the sense that no other unidirectional search algorithm can be more efficient. "Bidirectional" heuristic search has been investigated since the 1970s, but only in the late 1990s did it achieve results better than A*. Kaindl and Kainz, in 1996 and 1997, presented non-simultaneous bidirectional heuristic search and showed how to make the best possible use of the memory available on a machine. The LCS* algorithm, presented by Johann, Caldwell, Kahng and Reis in 2000, is the first simultaneous bidirectional heuristic search algorithm with better average performance than A*. In this talk we present these algorithms and other recent advances in heuristic search: their principles of operation, their formal properties, and their performance. These algorithms have numerous applications across computer science, including artificial intelligence, VLSI circuit synthesis, and traffic routing.
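
As a concrete illustration, here is a minimal Python sketch of A* search, assuming a graph given as an adjacency dict of edge weights and an admissible heuristic h; the names and representation are illustrative, not taken from the talk:

    import heapq

    def a_star(graph, h, start, goal):
        """A* search: expand nodes in order of f(n) = g(n) + h(n)."""
        frontier = [(h(start), 0, start, [start])]  # (f, g, node, path)
        best_g = {start: 0}
        while frontier:
            f, g, node, path = heapq.heappop(frontier)
            if node == goal:
                return g, path
            for nbr, w in graph[node].items():
                ng = g + w
                if ng < best_g.get(nbr, float('inf')):
                    best_g[nbr] = ng
                    heapq.heappush(frontier, (ng + h(nbr), ng, nbr, path + [nbr]))
        return None  # goal unreachable

With h(n) = 0 for every node, the sketch degenerates into Dijkstra's algorithm, which is exactly the relationship between the two algorithms described above.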

Gene Function Prediction by Mining Biomedical Literature

10/07/2004 - 16:00
10/07/2004 - 17:30
Etc/GMT

This seminar will discuss the application of text mining to automate the identification of the function of large sets of genes from the biomedical literature. An approach will be presented to obtain this knowledge as annotations that associate biological entities with Gene Ontology terms.
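
A deliberately naive Python stand-in for the idea: associate a gene with a GO term whenever the two co-occur in the same abstract (the mining approach presented in the seminar is more sophisticated; all names here are hypothetical):

    def annotate(abstracts, genes, go_terms):
        """Map each gene to the GO terms it co-occurs with in some abstract."""
        annotations = {}
        for text in abstracts:
            low = text.lower()
            found_genes = [g for g in genes if g.lower() in low]
            found_terms = [t for t in go_terms if t.lower() in low]
            for g in found_genes:
                for t in found_terms:
                    annotations.setdefault(g, set()).add(t)
        return annotations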

Estimating the Conformational Energy of Biological Macromolecules

06/29/2004 - 14:00
06/29/2004 - 15:00
Etc/GMT

The conformational energy of a molecule in solution consists of two components: the intramolecular potential energy of the molecule and the solvation free energy of the molecule, the latter describing its interaction with the solvent. Although the intramolecular potential energy of small organic molecules can be calculated to a high degree of accuracy with modern Quantum Mechanical methods, biological macromolecules such as proteins and nucleic acids are far too large for the application of these methods. Alternative strategies for estimating the intramolecular potential energy have had to be devised for these molecules. These can be broadly classified as physical effective energy function (PEEF) methods and statistical effective energy function (SEEF) methods. PEEFs attempt to reproduce the true Born-Oppenheimer surface of the molecule; this class contains the various Molecular Mechanics force fields. SEEFs are derived from the statistical analysis of the known 3D structures of biological macromolecules and attempt to reproduce the observed frequencies of given features in these structures; this class contains, among many others, the protein threading potentials.

As for solvation, the most direct strategy, modelling solvent molecules explicitly, is usually too slow for most applications. Therefore, several implicit-solvent models have been developed to estimate the solvation free energy of a molecule.

In this talk, I describe the general features of PEEF and SEEF methods, as well as the directions current research on the development of these methods is taking. I will also give an overview of the solvation methods most commonly applied to biological macromolecules, including, among others, the Poisson-Boltzmann method and the Langevin Dipoles method.
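
For a flavor of the PEEF approach, here is a minimal Python sketch of the nonbonded part of a Molecular Mechanics force field (Lennard-Jones plus Coulomb terms); using a single sigma/epsilon for every atom pair is a simplification, and the constant is only indicative:

    import math

    def nonbonded_energy(coords, charges, sigma, epsilon):
        """Pairwise Lennard-Jones + Coulomb energy over all atom pairs."""
        K_E = 332.06  # Coulomb constant, kcal*Angstrom/(mol*e^2)
        energy = 0.0
        for i in range(len(coords)):
            for j in range(i + 1, len(coords)):
                r = math.dist(coords[i], coords[j])
                lj = 4 * epsilon * ((sigma / r) ** 12 - (sigma / r) ** 6)
                coulomb = K_E * charges[i] * charges[j] / r
                energy += lj + coulomb
        return energy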

Secure Computing: Tamper Evident and Tamper Resistant Processing

06/25/2004 - 14:00
06/25/2004 - 15:00
Etc/GMT

Usually security protects the computer owner from attack, but sometimes the computer owner is the potential "enemy". For example, in digital rights management the owner of a computer platform is motivated to break its security to make illegal copies of protected digital content. In mobile agent applications sensitive electronic transactions are performed at untrusted hosts.

With the proliferation and increasing usage of embedded, portable and wearable devices, in addition to protecting against attacks from malicious software, we also have to be concerned with physical attacks that corrupt data, discover private data or violate copy protection. In this presentation, key concepts towards a single-chip secure processor are discussed, including new ideas regarding memory integrity checking, processor architecture, and controlled physical random functions.
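
A minimal Python sketch of one of these ideas, memory integrity checking with a Merkle hash tree: the root hash is kept on-chip (trusted), while the data blocks live in untrusted memory. This is the standard construction, not necessarily the exact scheme presented, and it assumes the number of blocks is a power of two:

    import hashlib

    def sha(data: bytes) -> bytes:
        return hashlib.sha256(data).digest()

    def merkle_tree(blocks):
        """Return tree levels, root level first; block count must be a power of two."""
        level = [sha(b) for b in blocks]
        levels = [level]
        while len(level) > 1:
            level = [sha(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
            levels.insert(0, level)
        return levels

    def memory_intact(trusted_root, blocks):
        """Detect tampering by recomputing the root over untrusted memory."""
        return merkle_tree(blocks)[0][0] == trusted_root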

Model Checking

05/28/2004 - 14:00
05/28/2004 - 15:00
Etc/GMT

The process of verifying the correctness of a reactive system consists of establishing that the system behaves as expected. This is usually a crucial task, with key economic significance. Most often, and in particular for hardware systems, the correct behavior of the system is verified by model checking. The objective of this talk is to provide a brief tutorial on the verification of reactive systems by model checking. The main topics covered will be the use of model checking in the verification of reactive systems, the syntax and semantics of computation tree logic (CTL), the analysis of a simple model checking algorithm, symbolic model checking, and satisfiability-based model checking algorithms.
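
As a small taste of the algorithmic side, here is a Python sketch of explicit-state model checking for the CTL formula EF p, computed as a least fixed point (the Kripke structure encoding and names are illustrative):

    def check_EF(states, succ, labels, p):
        """States satisfying EF p: p holds now or in some reachable future."""
        sat = {s for s in states if p in labels.get(s, set())}
        changed = True
        while changed:  # least fixed point of: sat = p-states U EX(sat)
            changed = False
            for s in states:
                if s not in sat and any(t in sat for t in succ.get(s, ())):
                    sat.add(s)
                    changed = True
        return sat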

Feature Selection for Supervised Learning

05/14/2004 - 14:00
05/14/2004 - 15:00
Etc/GMT

It has recently been shown that feature selection in supervised learning can be embedded in the learning algorithm by using sparsity-promoting priors/penalties that encourage the coefficient estimates to be either significantly large or exactly zero. In the first half of this talk, I will review this type of approach (which includes the well-known LASSO criterion for regression) and present some recent developments: (i) simple and efficient algorithms (with both parallel and sequential updates) for both binary and multi-class problems; (ii) generalization bounds; (iii) feature selection "inside" the kernel for kernel-based formulations. Experimental results (on standard benchmark data sets and also on gene expression data) reveal that this class of methods achieves state-of-the-art performance.
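
To make the sparsity mechanism concrete, here is a Python sketch of coordinate-descent LASSO built on the soft-thresholding operator; this is the textbook derivation, not the speaker's specific parallel/sequential algorithms:

    import numpy as np

    def soft_threshold(z, t):
        """Proximal operator of the L1 penalty: shrink z toward zero by t."""
        return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

    def lasso_cd(X, y, lam, n_iter=100):
        """Coordinate descent for min_w 0.5*||y - Xw||^2 + lam*||w||_1.
        Assumes X has no all-zero columns."""
        n, d = X.shape
        w = np.zeros(d)
        col_sq = (X ** 2).sum(axis=0)
        for _ in range(n_iter):
            for j in range(d):
                r = y - X @ w + X[:, j] * w[j]  # residual excluding feature j
                w[j] = soft_threshold(X[:, j] @ r, lam) / col_sq[j]
        return w  # many coefficients end exactly at zero: the rest are the selected features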

Rényi continuous entropy of DNA sequences

04/16/2004 - 14:00
04/16/2004 - 15:00
Etc/GMT

Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. The L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words, but has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed: Rényi's quadratic entropy, calculated with the Parzen window density estimation method applied to the Chaos Game Representation/Universal Sequence Map of DNA sequences, constitutes a novel technique to evaluate the global randomness of a sequence without some of the drawbacks of the former method. The asymptotic behavior of this new measure was deduced analytically, and entropies were calculated for several synthetic and experimental biological sequences. This new technique can be very useful in the study of DNA sequence complexity and provides additional tools for DNA entropy estimation.
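
For Gaussian Parzen windows, Rényi's quadratic entropy of the estimated density has a closed form, which is what makes the method practical. A minimal Python sketch on the CGR of a DNA sequence (the kernel width sigma is an arbitrary illustrative choice):

    import math
    import numpy as np

    CORNERS = {'A': (0.0, 0.0), 'C': (0.0, 1.0), 'G': (1.0, 1.0), 'T': (1.0, 0.0)}

    def cgr_points(seq):
        """Chaos Game Representation: step halfway toward each base's corner."""
        x, y = 0.5, 0.5
        pts = []
        for base in seq:
            cx, cy = CORNERS[base]
            x, y = (x + cx) / 2, (y + cy) / 2
            pts.append((x, y))
        return np.array(pts)

    def renyi2_entropy(points, sigma=0.05):
        """H2 = -log of the integral of p^2; for a Gaussian Parzen estimate this
        equals -log of the mean pairwise Gaussian N(x_i - x_j; 0, 2*sigma^2*I)."""
        sq = ((points[:, None, :] - points[None, :, :]) ** 2).sum(-1)
        d = points.shape[1]
        norm = (4 * math.pi * sigma ** 2) ** (d / 2)
        return -math.log(np.exp(-sq / (4 * sigma ** 2)).mean() / norm)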

Protein Tertiary Structure Prediction: Algorithms and Complexity

03/26/2004 - 14:00
03/26/2004 - 15:00
Etc/GMT

The prediction of the tertiary structure of proteins is one of the central problems of biochemistry. Beyond the academic interest in solving this problem, there is great interest from industry, namely the pharmaceutical industry. The main approach to this problem is to determine the global minimum of a given energy function. Several mathematical models and methods have been proposed. However, this optimization problem turns out to be NP-hard, and even the simplified models lead to problems in the same class. On the other hand, it is well known that nature solves this problem in a fraction of a second! At the same time, several alternative methods based on machine learning and data mining have been proposed, namely using classifiers such as SVMs. In this presentation we will discuss the main approaches to the problem, in particular classification with SVMs.
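
As an example of a simplified model in this NP-hard class, here is a Python sketch of the energy function of the 2D HP lattice model (H = hydrophobic, P = polar), where folding amounts to minimizing the count of non-bonded H-H contacts:

    def hp_energy(sequence, coords):
        """Energy of a 2D HP conformation: -1 per pair of non-consecutive
        H residues sitting on adjacent lattice sites."""
        energy = 0
        n = len(sequence)
        for i in range(n):
            for j in range(i + 2, n):  # skip chain neighbours
                if sequence[i] == 'H' and sequence[j] == 'H':
                    (xi, yi), (xj, yj) = coords[i], coords[j]
                    if abs(xi - xj) + abs(yi - yj) == 1:
                        energy -= 1
        return energy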

Sublinear alignment

02/27/2004 - 16:30
02/27/2004 - 17:30
Etc/GMT

Classical algorithms for computing the similarity between two sequences use a dynamic programming matrix and compare two strings of length n in O(n²) time. We address the challenge of computing the similarity of two strings in sub-quadratic time, for metrics which use a scoring matrix of unrestricted weights. Our algorithm applies to both local and global similarity computations. The speed-up is achieved by using the Lempel-Ziv parsing of both strings, leading to an O(hn²/log n) algorithm for inputs over a constant-size alphabet, where h < 1 is the entropy of the strings.
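
For reference, here is a Python sketch of the classical quadratic-time baseline that the Lempel-Ziv technique accelerates; the unit scores are illustrative, whereas the talk's result holds for arbitrary scoring matrices:

    def global_similarity(a, b, match=1, mismatch=-1, gap=-1):
        """Needleman-Wunsch-style dynamic programming, O(len(a)*len(b))."""
        n, m = len(a), len(b)
        D = [[0] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            D[i][0] = D[i - 1][0] + gap
        for j in range(1, m + 1):
            D[0][j] = D[0][j - 1] + gap
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                s = match if a[i - 1] == b[j - 1] else mismatch
                D[i][j] = max(D[i - 1][j - 1] + s,
                              D[i - 1][j] + gap,
                              D[i][j - 1] + gap)
        return D[n][m]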

Genetic Programming

02/06/2004 - 13:00
02/06/2004 - 14:00
Etc/GMT

Genetic Programming (GP) solves complex problems by evolving populations of computer programs, using Darwinian evolution and Mendelian genetics as inspiration. Bloat is excess code growth caused by the genetic operators in their search for better solutions, without a corresponding improvement in fitness. After a brief overview of GP, we will focus on the causes of bloat and possible solutions to it, including a recently published technique by the authors.
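
One common bloat-control measure, shown here only as a generic Python illustration (not the authors' published technique), is to reject offspring that exceed a size cap, returning a parent instead:

    import random

    def size(tree):
        """Node count of a program tree encoded as nested tuples,
        e.g. ('+', ('x',), ('1',))."""
        return 1 + sum(size(child) for child in tree[1:])

    def crossover_with_cap(p1, p2, crossover, max_size=50):
        """Apply a user-supplied crossover, discarding oversized children."""
        child = crossover(p1, p2)
        return child if size(child) <= max_size else random.choice((p1, p2))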