INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technology from seed

kdbio

Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Seminars

Constraint networks as a possible framework for ncRNA signature modeling and searching

06/06/2011 - 16:30
06/06/2011 - 17:30
Etc/GMT

Our understanding of the role of RNA has changed considerably in recent decades. Advances in molecular biology have shown that some so-called non-coding RNAs (ncRNAs) play different roles in several stages of the life of the cell. These ncRNAs are transcribed from DNA but, unlike messenger RNAs, they do not code for a protein and are functional in their own right. It is now a major challenge to find these new ncRNAs in sequenced genomes, and several approaches can be used. One of these approaches focuses on locating new members of a known ncRNA family. The approach presupposes prior knowledge of the signature of the family, i.e., the conserved sequence and structural elements of its known members. The aim is to find all the regions in a genome that match the signature. However, none of the existing software tools for this task is based on a precise formalism, their efficiency is variable, and they often lack a clear scoring system for ranking solutions. Moreover, none of them, including Infernal, accepts RNA-RNA inter-molecular interactions, which are required to accurately describe some ncRNA families. We first proposed MilPat, which makes it possible to model RNA-RNA interactions, as well as several other sequence and structural elements. MilPat is based on the constraint network framework and gives good results on the ncRNA localization problem. However, MilPat does not support costs, which is the major drawback of the approach. In this talk, I will present Darn!, which extends the approach implemented in MilPat by integrating costs, as well as some other mechanisms.
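
To give a feel for what searching a signature under constraints with costs can look like, here is a minimal Python sketch. It is not the MilPat or Darn! model: the toy signature (a single hairpin), the function name `find_hairpins` and all parameter values are assumptions made for illustration. The start position and loop length act as variables, the stem and loop sizes are hard constraints, and each non-pairing stem position adds one unit of cost, so matches can be ranked.

```python
# Illustrative only: a toy hairpin signature, not the MilPat or Darn! model.
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def find_hairpins(seq, stem_len=4, min_loop=3, max_loop=6, max_cost=1):
    """Return a sorted list of (cost, start, loop_len) hairpin candidates.

    Variables: the start position and the loop length.
    Hard constraints: the stem length and the loop length bounds.
    Soft constraint: each non-pairing stem position adds 1 to the cost.
    """
    hits = []
    for start in range(len(seq)):
        for loop_len in range(min_loop, max_loop + 1):
            end = start + 2 * stem_len + loop_len  # exclusive end of the motif
            if end > len(seq):
                break  # longer loops cannot fit either
            cost = 0
            for k in range(stem_len):
                five_prime = seq[start + k]
                three_prime = seq[end - 1 - k]
                if (five_prime, three_prime) not in PAIRS:
                    cost += 1
                    if cost > max_cost:
                        break  # prune: the cost bound is already exceeded
            if cost <= max_cost:
                hits.append((cost, start, loop_len))
    return sorted(hits)


if __name__ == "__main__":
    # Hypothetical sequence: stem GGGC, loop UUUU, stem GCCC, flanked by AA.
    for hit in find_hairpins("AAGGGCUUUUGCCCAA"):
        print(hit)
```

Sorting by cost is what a scoring system buys here: perfect matches come first, and near matches are still reported instead of being silently dropped.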

Stochastic Modeling of Stem Cell Induction Protocols

04/15/2011 - 14:00
04/15/2011 - 15:00
Etc/GMT

Generation of pluripotent stem cells starting from adult human cells using induction processes is a technology that has the potential to revolutionize regenerative medicine. However, the production of these so-called iPS cells is still quite inefficient and may be dominated by stochastic effects. In this work we build mass action models of the core circuitry controlling stem cell induction and maintenance. The model includes not only the network of transcription factors NANOG, OCT4 and SOX2, but also important epigenetic regulatory features of DNA methylation and histone modifications. We are able to show that the network topology reported in the literature is consistent with the observed experimental behavior of bistability and inducibility. Based on simulations of stem cell generation protocols, we show that cooperative and independent reaction mechanisms have experimentally identifiable differences in the dynamics of reprogramming, and we analyze such differences and their biological basis. It has been argued that stochastic and elite models of stem cell generation represent distinct fundamental mechanisms. The work presented here illustrates the possibility that they instead represent differences in the amount of information we have about the distribution of cellular states before and during reprogramming protocols. We show that unpredictability decreases as the cell moves through the necessary induction stages, and that identifiable groups of cells with elite-like behavior can come about through stochastic processes. We also show how different mechanisms and kinetic properties impact the prospects of improving the efficiency of iPS cell generation protocols.
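
As a rough illustration of what a stochastic simulation of a mass action model involves, the following Python sketch runs Gillespie's direct method on a single species that is produced at a constant rate and degraded in proportion to its copy number. This is not the model from the talk: the NANOG/OCT4/SOX2 circuitry and the epigenetic features are not represented, and the function name and rate constants are assumptions chosen only for the example.

```python
import random


def gillespie_birth_death(k_prod, k_deg, x0, t_end, seed=0):
    """Simulate 0 -> X (rate k_prod) and X -> 0 (rate k_deg * x).

    Returns the reaction times and the copy number after each event.
    """
    rng = random.Random(seed)
    t, x = 0.0, x0
    times, counts = [t], [x]
    while True:
        a_prod = k_prod          # propensity of production
        a_deg = k_deg * x        # propensity of degradation (mass action)
        a_total = a_prod + a_deg
        if a_total == 0:
            break                # no reaction can fire any more
        t += rng.expovariate(a_total)  # waiting time to the next reaction
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_prod:
            x += 1
        else:
            x -= 1
        times.append(t)
        counts.append(x)
    return times, counts


if __name__ == "__main__":
    # Hypothetical rate constants; two seeds give two different trajectories.
    for seed in (1, 2):
        times, counts = gillespie_birth_death(10.0, 0.5, x0=0, t_end=20.0, seed=seed)
        print(seed, len(times), counts[-1])
```

Running the same protocol with different seeds yields different trajectories, which is the kind of cell-to-cell variability a stochastic model captures and a deterministic rate equation does not.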

Computational Methods for DNA Resequencing: A Survey

03/18/2011 - 14:00
03/18/2011 - 15:00
Etc/GMT

Recent developments in next-generation sequencing technologies allow constantly increasing throughput and shorter running times while reducing the costs of the sequencing process. This leads to the production of huge amounts of data, which raises important computational challenges not only due to the large volume of information but also due to the increase in read length and sequencing errors. Several assembly and mapping tools have recently been developed for generating assemblies from short, unpaired sequencing reads. However, the need for faster and more accurate algorithmic approaches to keep up with the demand of frequently emerging resequencing projects justifies the growing number of short read mapping tools that have surfaced in the last couple of years. In this report we present an overview of state-of-the-art software applications, detailing their algorithms and data structures.

A Tutorial on Genetic Programming

02/15/2011 - 13:00
02/15/2011 - 14:00
Etc/GMT

Genetic Programming (GP) is the youngest paradigm within Evolutionary Computation, a field of Artificial Intelligence. Created by John Koza in 1992, it can be regarded as a powerful generalization of Genetic Algorithms, but unfortunately it is still poorly understood outside the GP community. The goal of this tutorial is to provide motivation, intuition and practical advice about GP, along with very few technical details.

Methods for the Detection of Multilocus Interactions

02/11/2011 - 14:00
02/11/2011 - 15:00
Etc/GMT

In recent years there has been intense research to find genetic factors that influence common complex traits. The approach that is commonly followed to discover associations between genetic factors and complex traits such as diseases is to perform a Genome-Wide Association Study (GWAS). It has been pointed out that there is no single marker for disease risk and no single protective marker but, rather, a collection of markers that confer a graded risk of disease. As an example of this, it has been suggested that many genes with small effects, rather than few genes with strong effects, contribute to the development of asthma. For human height, the heritability explained by SNPs discovered with GWAS is about 5%. However, a recent study showed that it is possible to explain around 45% of the phenotypic variance for height with GWAS data. The problem is that the individual effects of the interacting SNPs are too small to be detected with common statistical methods. This shows that there is a need for powerful methods that are able to consider interactions between SNPs with low marginal effects. In this document we describe a wide range of methods that have been proposed to detect interactions between SNPs in association study data. We will give examples of statistical methods (also explaining how to deal with the multiple testing problem), search methods (deterministic and stochastic) and machine learning methods.
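
The scale of the multiple testing problem for interactions can be made concrete with a short calculation. The Python sketch below assumes an exhaustive test of every SNP combination and a Bonferroni correction, which is only one of several possible corrections and is not necessarily the one discussed in the talk. The function name and the panel size of 500,000 SNPs are hypothetical.

```python
from math import comb


def bonferroni_threshold(n_snps, order=2, alpha=0.05):
    """Return (number of tests, per-test threshold) for exhaustive testing.

    order=2 means every pair of SNPs is tested, order=3 every triple, etc.
    """
    n_tests = comb(n_snps, order)
    return n_tests, alpha / n_tests


if __name__ == "__main__":
    # Hypothetical panel size, chosen only to show the orders of magnitude.
    for order in (1, 2, 3):
        n_tests, threshold = bonferroni_threshold(500_000, order)
        print(f"order {order}: {n_tests:.3e} tests, threshold {threshold:.3e}")
```

The number of tests grows combinatorially with the interaction order, so the per-test threshold shrinks accordingly. That is one reason why SNPs with low marginal effects are hard to detect by exhaustive testing alone.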

Biclustering-based Classification of Clinical Expression Time Series: A Case Study in Patients with Multiple Sclerosis

02/04/2011 - 15:00
02/04/2011 - 16:00
Etc/GMT

In recent years, the constant drive towards a more personalized medicine has led to an increasing interest in temporal gene expression analyses. In fact, considering the temporal aspect is a great advantage for better understanding disease progression and treatment results at a molecular level. In this work, we analyse multiple gene expression time series in order to classify the response of Multiple Sclerosis patients to the standard treatment with Interferon-β, to which nearly half of the patients respond negatively. In this context, obtaining a highly predictive model of a patient’s response would definitely improve the patient’s quality of life, avoiding useless and possibly harmful therapies for the non-responder group. We propose new strategies for time series classification based on biclustering. Preliminary results achieved a prediction accuracy of 94.23% and reveal potential to be further explored in classification problems involving other (clinical) time series.

Network-Based Disease Candidate Gene Prioritization: Towards Global Diffusion in Heterogeneous Association Networks

01/28/2011 - 14:00
01/28/2011 - 15:00
Etc/GMT

Disease candidate gene prioritization addresses the association of genes with disease susceptibility. Network-based approaches have successfully exploited the connectivity of biological networks to compute a disease-relatedness score between candidate and known disease genes. Nonetheless, available strategies raise three major concerns: (1) most networks used rely exclusively on curated physical interactions, resulting in poor genome coverage and sparsity issues; (2) devised scores are often local and thus restrict the search to a limited neighborhood around known genes and ignore potentially informative indirect paths; (3) some methods disregard interaction confidence weights, which could confer extra reliability. We hypothesized that capturing disease-relatedness at the interactome scale, based on weighted gene associations integrated from heterogeneous sources, is likely to outperform current methods lacking one of these features, and proposed to combine a particular personalized ranking method with data from STRING. Our claim was confirmed in comparative leave-one-out cross-validation case studies assessing the impact of network density and coverage, score globality and confidence weights on the prioritization of candidate genes for 29 diseases. Finally, the proposed method was applied to Parkinson’s disease and proved effective at recovering prior knowledge and uncovering interesting genes which could be linked to several pathological mechanisms of the disease.
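
The abstract does not name the personalized ranking method it uses. As an assumption, the Python sketch below uses a random walk with restart (a personalized PageRank variant) to show how a global, weight-aware score can be computed from known disease genes. The toy graph, the gene names and the restart probability are hypothetical, and no STRING data is involved.

```python
def random_walk_with_restart(graph, seeds, restart=0.3, n_iter=100):
    """Score every node by its proximity to the seed nodes.

    graph: {node: {neighbor: weight}}, with each undirected edge listed in
    both directions. seeds: known disease genes, all present in graph.
    """
    nodes = list(graph)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    out_weight = {n: sum(graph[n].values()) for n in nodes}
    for _ in range(n_iter):
        nxt = {n: restart * p0[n] for n in nodes}  # jump back to the seeds
        for u in nodes:
            mass = (1.0 - restart) * p[u]
            if out_weight[u] == 0:
                # Isolated node: return its mass to the seeds.
                for s in seeds:
                    nxt[s] += mass / len(seeds)
                continue
            # Spread the remaining mass in proportion to the edge weights.
            for v, w in graph[u].items():
                nxt[v] += mass * w / out_weight[u]
        p = nxt
    return p


if __name__ == "__main__":
    # Hypothetical network; the weights stand in for confidence scores.
    toy = {
        "geneA": {"geneB": 0.9, "geneE": 0.1},
        "geneB": {"geneA": 0.9, "geneC": 0.5},
        "geneC": {"geneB": 0.5},
        "geneE": {"geneA": 0.1},
    }
    scores = random_walk_with_restart(toy, seeds={"geneA"})
    for gene, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(gene, round(score, 3))
```

Because the walk propagates over the whole graph, a candidate such as `geneC` receives a non-zero score through an indirect path, and the edge weights steer more probability towards high-confidence neighbours.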

A Computational Device Based on Regulation

01/14/2011 - 14:00
01/14/2011 - 15:00
Etc/GMT

Nature is a great designer and problem solver. The theories posited by Darwin, Mendel and all those who contributed to the modern synthesis, based on molecular biology, explained to us how this could happen. Some decades ago, computer scientists started proposing computational models, called evolutionary algorithms, based on some of the processes used by nature, in order to solve problems that either do not have an analytical solution or are too costly to solve with exact methods. Over time, many complex problems were satisfactorily solved by those algorithms, even though these nature-inspired heuristic methods are very simplistic and based on a basic separation between the genotype and the phenotype. In recent years, biological understanding has increased with the comprehension of the multitude of regulatory mechanisms that are fundamental in the processes of both inheritance and development, and some researchers advocate the need to explore this new understanding computationally. One of the outcomes was the Artificial Gene Regulatory model, first proposed by Wolfgang Banzhaf. In this talk, we will present a modification of this model, aimed at solving some of its limitations, and show experimentally that it is effective in solving a set of benchmark problems. We will also discuss some future developments of the model.

In-silico strategies in drug design

12/13/2010 - 12:30
12/13/2010 - 13:30
Etc/GMT