INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
-
technology from seed

kdbio

Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Seminars

Motif representation and discovery

01/08/2010 - 14:00
01/08/2010 - 15:00
Etc/GMT

Location(s)

INESC-ID
Portugal

An important part of gene regulation is mediated by specific proteins, called transcription factors (TFs), which influence the transcription of a particular gene by binding to specific sites on DNA sequences, called transcription factor binding sites (TFBSs). Such binding sites are relatively short stretches of DNA, normally 5 to 25 nucleotides long. A commonly used representation of a TFBS is the position-specific scoring matrix (PSSM), which assumes that the nucleotides in a binding site are independent of one another. Recently, some works have argued for non-additivity in protein-DNA interactions, making way for more complex models that account for nucleotide interactions. We propose to model TFBSs by representing nucleotide interactions with consistent k-graph Bayesian networks (where k represents the maximum number of interactions between nucleotides), jointly with a set of features, scored directly from each base sequence, that appear to be relevant for TFBS characterization. The model is flexible enough to incorporate any set of features scored from base sequences. We consider discriminative learning of such models, since it outperforms generative learning in the context of classification with a large set of features.
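As a point of reference for the baseline the abstract contrasts itself with, the following is a minimal sketch of a PSSM built from aligned binding sites and used to scan a sequence. It is not the speakers' k-graph Bayesian network model. The function names, the pseudocount, the uniform background frequency and the example sites are all assumptions made for illustration.

```python
import math

BASES = "ACGT"

def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM (one dict per position) from aligned sites.

    Each column is estimated independently of the others, which is exactly
    the independence assumption mentioned in the abstract.
    """
    length = len(sites[0])
    pssm = []
    for pos in range(length):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pssm.append({b: math.log2(counts[b] / total / background) for b in BASES})
    return pssm

def score(pssm, window):
    """Score a window as the sum of per-position log-odds values."""
    return sum(column[base] for column, base in zip(pssm, window))

def best_hit(pssm, sequence):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pssm)
    hits = [(score(pssm, sequence[i:i + width]), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)

# Hypothetical aligned sites, invented for this example only.
example_sites = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
example_pssm = build_pssm(example_sites)
top_score, top_start = best_hit(example_pssm, "GGCGTATAATGCC")
```

Because the score is a plain sum over positions, a PSSM cannot reward or penalize specific combinations of nucleotides at different positions, which is the limitation that models with nucleotide interactions aim to address.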

Management and analysis of heterogeneous biological data: how the web can help

12/04/2009 - 14:00
12/04/2009 - 15:00
Etc/GMT

Location(s)

INESC-ID
Portugal

The World Wide Web has revolutionized how researchers from various disciplines collaborate over long distances. This is nowhere more important than in the Life Sciences, where interdisciplinary approaches are becoming increasingly powerful as a driver of both integration and discovery. In this talk I will focus on new data management solutions for the Life Sciences, showing the key features desired in a web-based data management system. Examples of Web 2.0 applications, data standards and semantic web projects in the Life Sciences will be presented.

Optimization and Control for Metabolic Networks

11/30/2009 - 14:00
11/30/2009 - 15:00
Etc/GMT

Location(s)

INESC-ID
Portugal

The increasing availability of metabolic network models and data poses new challenges for optimization. Due to the high level of complexity and uncertainty associated with these networks, the suggested models often lack the detail and reliability required to determine proper optimization strategies. A possible approach to overcome this limitation is to combine kinetic and stoichiometric models. In the first part of this paper, three control optimization methods are presented: Direct Optimization, and Bi-level optimization using two different inner-optimization procedures. The methods have different levels of complexity and assume various degrees of process information, and their results are compared using a prototype network. The results obtained show that bi-level optimization provides a good approximation for networks with incomplete kinetic information. Formulating metabolic network models and estimating their parameters is a complex process, and there is no defined framework for obtaining valid solutions. In the second part of this paper, a procedure to estimate parameters using data sets from different experiments is presented. The procedure is illustrated by a case study on the effect of nisin on mannitol production by Lactococcus lactis. The obtained results are encouraging, providing a consistent estimate of the model parameters.
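The idea of estimating one set of parameters from several experiments can be sketched as follows. This is only an illustration under strong assumptions: the abstract does not state the kinetic model or the estimation method, so a Michaelis-Menten rate law, a pooled sum of squared errors and a plain grid search are used here as stand-ins, and the data are synthetic.

```python
def rate(substrate, vmax, km):
    """Michaelis-Menten rate law (an assumed model, not taken from the talk)."""
    return vmax * substrate / (km + substrate)

def pooled_sse(params, experiments):
    """Sum of squared errors over all experiments, sharing one parameter set."""
    vmax, km = params
    return sum((rate(s, vmax, km) - v) ** 2
               for data in experiments
               for s, v in data)

def grid_fit(experiments, vmax_grid, km_grid):
    """Return the (vmax, km) grid point that minimizes the pooled error."""
    candidates = ((vmax, km) for vmax in vmax_grid for km in km_grid)
    return min(candidates, key=lambda p: pooled_sse(p, experiments))

# Synthetic, noise-free data from two hypothetical experiments that probe
# different substrate ranges; generated with vmax = 2.0 and km = 0.5.
low_range = [(s, rate(s, 2.0, 0.5)) for s in (0.1, 0.2, 0.4)]
high_range = [(s, rate(s, 2.0, 0.5)) for s in (2.0, 5.0, 10.0)]

vmax_grid = [i / 10 for i in range(5, 41)]   # 0.5 .. 4.0
km_grid = [i / 10 for i in range(1, 21)]     # 0.1 .. 2.0
estimate = grid_fit([low_range, high_range], vmax_grid, km_grid)
```

Pooling helps because the low-substrate experiment mostly constrains the ratio vmax/km while the high-substrate experiment mostly constrains vmax, so together they pin down both parameters.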

Hacking life: how to build a new life form in your computer

10/23/2009 - 14:00
Etc/GMT

Synthetic biology is a new field of research that combines computer models of biological systems with DNA synthesis and genetic engineering techniques in order to design and build new biological functions, systems and organisms. While still in its infancy, this area of research is expected to develop rapidly, so that very soon researchers, companies and hackers will be able to design, build and release new organisms into the wild. In this talk, I will address some questions and challenges posed by this technology and, in particular, the role that research areas such as Systems Biology, Bioinformatics and Information Systems will play in the design of artificial life forms.

Preparing a cyanobacterial chassis for H2 production: a synthetic biology approach

10/09/2009 - 14:00
Etc/GMT

Location(s)

INESC-ID
Portugal

Molecular hydrogen (H2) is an environmentally clean energy carrier that can be a valuable alternative to today's limited fossil fuel resources. The BioModularH2 project aims at designing reusable, standardized molecular building blocks that, when integrated into a “chassis”, will result in a photosynthetic bacterium containing engineered chemical pathways for competitive, clean and sustainable hydrogen production. The unicellular cyanobacterium Synechocystis sp. PCC 6803 (Synechocystis) is being used as the photoautotrophic “chassis” for this project. To prepare the chassis for optimal H2 production, the native bidirectional hydrogenase of Synechocystis was inactivated. Later on, a synthetic circuit containing a highly efficient heterologous hydrogenase will be introduced into the “chassis”. Because hydrogenases are sensitive to molecular oxygen, and to provide the anaerobic environment required for optimal heterologous hydrogenase activity, synthetic oxygen-consuming devices based on native and heterologous enzymes that use O2 as a substrate are being prepared and will subsequently be tested. Finally, the integration of the designed synthetic circuits into the “chassis” will provide an anaerobic environment within the cell for an optimized and highly active hydrogenase.

Neurodynamic Optimization with Its Application for Model Predictive Control

09/29/2009 - 11:00
Etc/GMT

Location(s)

INESC-ID
Portugal

Optimization problems arise in a wide variety of scientific and engineering applications. They become computationally challenging when the optimization has to be performed in real time to optimize the performance of dynamical systems. For such applications, classical optimization techniques may not be adequate because of the problem dimensionality and the stringent requirements on computation time. One very promising approach to dynamic optimization is to apply artificial neural networks. Because neural networks process information in an inherently parallel and distributed way, the convergence rate of the solution process does not decrease as the size of the problem increases. Neural networks can be implemented physically in dedicated hardware such as ASICs, where optimization is carried out in a truly parallel and distributed manner. This feature is particularly desirable for dynamic optimization in decentralized decision-making situations, which arise frequently in control and robotics. In this talk, I will present a historical review and the state of the art of neurodynamic optimization models, together with selected applications in robotics and control. Specifically, starting from the motivation for neurodynamic optimization, we will review various recurrent neural network models for optimization. Theoretical results on the stability and optimality of the neurodynamic optimization models will be given, along with illustrative examples and simulation results. It will be shown that many problems in control systems, such as model predictive control, can be readily solved using neurodynamic optimization models. Specifically, linear and nonlinear model predictive control based on neurodynamic optimization will be delineated.
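To give a flavour of what a neurodynamic optimization model looks like, here is a minimal sketch of a projection-type recurrent network applied to a box-constrained quadratic program, simulated with forward Euler steps. This is one commonly described form and is not claimed to be the model presented in the talk. The function names, step sizes and the example problem are assumptions.

```python
def clip(value, low, high):
    """Project a scalar onto the interval [low, high]."""
    return max(low, min(high, value))

def projection_network(Q, c, lower, upper, alpha=0.1, dt=0.1, steps=5000):
    """Simulate dx/dt = -x + P(x - alpha * (Q x + c)) with Euler steps.

    P is the projection onto the box [lower, upper]. An equilibrium of these
    dynamics is a minimizer of 0.5 * x'Qx + c'x over the box (for convex Q).
    """
    n = len(c)
    x = [0.0] * n
    for _ in range(steps):
        grad = [sum(Q[i][j] * x[j] for j in range(n)) + c[i] for i in range(n)]
        target = [clip(x[i] - alpha * grad[i], lower[i], upper[i])
                  for i in range(n)]
        # Each state moves toward its projected target; on parallel hardware
        # all components could be updated simultaneously.
        x = [x[i] + dt * (target[i] - x[i]) for i in range(n)]
    return x

# Hypothetical problem: minimize x1^2 + x2^2 - 2*x1 - 5*x2 on [0, 2] x [0, 2].
Q = [[2.0, 0.0], [0.0, 2.0]]
c = [-2.0, -5.0]
solution = projection_network(Q, c, lower=[0.0, 0.0], upper=[2.0, 2.0])
```

In this example the unconstrained minimum lies at (1, 2.5), so the second state saturates at its upper bound and the network settles near (1, 2). In model predictive control, a problem of this kind would be re-solved at every sampling instant.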

Apt-pbo: Solving the Software Dependency Problem using Pseudo-Boolean Optimization

09/25/2009 - 13:00
Etc/GMT

Location(s)

INESC-ID
Portugal

The installation of software packages (on Linux as well as on other package-driven platforms, such as Eclipse plugins) depends on the correct resolution of dependencies and conflicts between packages. Because the problem is NP-complete, this is a hard task that today's technology does not address in an acceptable way. This seminar introduces a new approach to solving the software dependency problem in a Linux environment, devising a way to resolve dependencies according to the available packages and user preferences. We present the “apt-pbo” tool, the first available tool that solves dependencies in a complete and optimal way. The contribution is threefold. Our main finding is an efficient encoding of the dependencies and conflicts as a pseudo-Boolean optimization problem without the need for extra ILP or SAT steps. Second, we achieve this goal without sacrificing performance, a critical issue for a tool with user interaction. Finally, the developed tool is available under a free license, allowing enhancement and benchmarking.
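The shape of such a formulation can be sketched on a toy instance. The code below is not the apt-pbo encoding and does not call a pseudo-Boolean solver: it states dependencies and conflicts as 0/1 constraints over one variable per package and then finds the cheapest satisfying assignment by exhaustive enumeration, which only works for a handful of packages. The package names, the cost function and the `solve` helper are hypothetical.

```python
from itertools import product

def solve(packages, depends, conflicts, wanted, cost):
    """Return (total_cost, installed_set) for the cheapest valid installation.

    depends:   list of (package, [alternatives]); if the package is installed,
               at least one alternative must be installed too.
    conflicts: list of (a, b) pairs that cannot both be installed.
    wanted:    packages the user asked for.
    """
    best = None
    for bits in product([0, 1], repeat=len(packages)):
        x = dict(zip(packages, bits))
        if not all(x[p] for p in wanted):
            continue
        # Dependency constraint: sum(alternatives) - x[p] >= 0.
        if any(x[p] and not any(x[a] for a in alts) for p, alts in depends):
            continue
        # Conflict constraint: x[a] + x[b] <= 1.
        if any(x[a] + x[b] > 1 for a, b in conflicts):
            continue
        total = sum(cost[p] * x[p] for p in packages)
        if best is None or total < best[0]:
            best = (total, {p for p in packages if x[p]})
    return best

# Hypothetical repository, invented for this example.
packages = ["app", "libfoo1", "libfoo2", "libbar"]
depends = [("app", ["libfoo1", "libfoo2"]), ("libfoo2", ["libbar"])]
conflicts = [("libfoo1", "libfoo2")]
cost = {p: 1 for p in packages}   # objective: install as few packages as possible
result = solve(packages, depends, conflicts, wanted=["app"], cost=cost)
```

Changing the cost dictionary is how user preferences would enter in this sketch, for example by making some packages cheaper to select than others.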

Next-generation sequencing (for dummies)

09/10/2009 - 14:00
09/10/2009 - 15:00
Etc/GMT

Location(s)

INESC-ID
Portugal

We present the basics of the new high-throughput sequencing technologies and discuss some of their applications and associated research problems from a bioinformatics perspective.

Single nucleotide polymorphisms characterization in a Portuguese Caucasian breast cancer and control population

07/24/2009 - 14:00
07/24/2009 - 15:00
Etc/GMT

Cancer is a complex somatic genetic disease that is caused mainly by environmental factors. However, a few inherited mutations in some critical genes can be associated with cancer development. Breast cancer accounts for one in four of all female cancers, making it the leading cause of cancer deaths in women in the western world. Numerous epidemiological factors affect the likelihood of developing breast cancer, but no other predictor is as powerful as an inherited mutation in the tumour-suppressor genes BRCA1 or BRCA2. TP53 was deemed a plausible candidate as well. Hereditary breast cancer accounts for only 5–10% of all breast cancer cases, and individuals carrying mutations in one of these genes have a 40–80% chance of developing breast cancer, making these mutations the strongest breast cancer predictors known. The other 90–95% of breast cancer cases are sporadic and occur in women without mutations in these susceptibility genes. Identifying a plausible cause for these sporadic cases is therefore challenging. Recent evidence shows that there are probably background genetic factors that contribute to the development of sporadic breast cancer, such as single nucleotide polymorphisms (SNPs). The emergence of comprehensive high-density maps of SNPs and affordable genotyping platforms has made association studies feasible. Due to linkage disequilibrium, a panel of a few hundred thousand reporter SNPs (tSNPs) can be used as tags for the majority of the millions of common variants in the genome. Statistical approaches have been used extensively to infer haplotypes from diploid population data. An alternative, but little-explored, approach is called the Pure-Parsimony approach. This approach finds a solution to the haplotype inference problem that minimizes the total number of distinct haplotypes used, relying on the well-known fact that haplotypes are, in general, much less numerous than genotypes.
In order to obtain real data for developing satisfiability models and algorithms for the problem of haplotype inference by pure parsimony, a set of breast cancer patients and control populations was genotyped. To achieve this goal, approximately 100 breast cancer patients were recruited in the oncology units of several Lisbon hospitals. Each cancer patient was matched, when possible, with two healthy control individuals of the same age, tobacco smoking status and alcohol consumption habits. A second control population (about 50 individuals), characterized by the absence of breast cancer, was also used to help ascertain the possible role of the gene polymorphisms under study. This population was identified in the Indian reserve of Sangradouro (Mato Grosso, Brazil), where the predominant ethnic group is Xavante. For each of the cancer and control populations, 7 SNPs in BRCA1, 19 SNPs in BRCA2 and 6 SNPs in TP53 were genotyped using real-time PCR, in particular TaqMan® SNP Genotyping Assays from Applied Biosystems. Since the majority of the genotyped SNPs are tags for other SNPs, the real number of SNPs analyzed is much greater than the 32 genotyped across the three genes. These experiments are expected to give sufficient data to clarify the effects of SNP variation on breast cancer susceptibility, and to explain specific characteristics of the populations under study that are of great interest to science.
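The pure-parsimony formulation described in this abstract can be illustrated on a tiny instance. The sketch below uses brute-force enumeration rather than the satisfiability models mentioned above, so it is only practical for a few short genotypes. The genotype encoding (0 and 1 for homozygous sites, 2 for heterozygous sites), the function names and the example genotypes are assumptions for illustration.

```python
from itertools import product

def resolutions(genotype):
    """List every unordered haplotype pair that explains a genotype.

    Sites coded 0 or 1 are homozygous (both haplotypes carry that allele);
    sites coded 2 are heterozygous (the haplotypes carry different alleles).
    """
    het_sites = [i for i, g in enumerate(genotype) if g == "2"]
    pairs = set()
    for bits in product("01", repeat=len(het_sites)):
        h1 = list(genotype)
        h2 = list(genotype)
        for site, bit in zip(het_sites, bits):
            h1[site] = bit
            h2[site] = "1" if bit == "0" else "0"
        pairs.add(tuple(sorted(("".join(h1), "".join(h2)))))
    return sorted(pairs)

def pure_parsimony(genotypes):
    """Return (haplotype_set, chosen_pairs) using the fewest distinct haplotypes."""
    best = None
    for choice in product(*(resolutions(g) for g in genotypes)):
        used = {h for pair in choice for h in pair}
        if best is None or len(used) < len(best[0]):
            best = (used, choice)
    return best

# Hypothetical genotypes over three SNPs, invented for this example.
haplotypes, chosen = pure_parsimony(["022", "202", "221"])
```

Here each genotype has two possible resolutions, and only one combination explains all three individuals with three distinct haplotypes. The number of combinations grows exponentially with the number of heterozygous sites, which is why dedicated solvers are needed for real data.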

CSI: are Mendel's data too good to be true?

07/17/2009 - 14:00
07/17/2009 - 15:00
Etc/GMT

Gregor Mendel (1822–1884) is almost unanimously recognized as the founder of modern genetics. However, long ago, a shadow of doubt was cast on his integrity by another eminent scientist, the statistician and geneticist Sir Ronald Fisher (1890–1962), who questioned the honesty of the data that form the core of Mendel's work. This issue, nowadays called the Mendel-Fisher controversy, can be traced back to 1911, when Fisher first presented his doubts about Mendel's results, though he only published a paper with his analysis of Mendel's data in 1936. A large number of papers have been published about this controversy, culminating in the publication in 2008 of a book (Franklin et al., Ending the Mendel-Fisher controversy) that aimed to end the issue by definitively rehabilitating Mendel's image. However, quoting from Franklin et al., the issue of the "too good to be true" aspect of Mendel's data found by Fisher still stands. We have submitted Mendel's data and Fisher's statistical analysis to extensive computations and simulations, attempting to discover a hidden explanation or hint that could help answer the questions: is Fisher right or wrong, and, if Fisher is right, is there any reasonable explanation for the "too good to be true" results other than deliberate fraud? In this talk, some results of this investigation and the conclusions obtained will be presented.
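The kind of test at the heart of the controversy can be sketched with a chi-square goodness-of-fit computation for a single experiment. The counts used (5474 round and 1850 wrinkled seeds, against an expected 3:1 ratio) are the widely quoted figures from Mendel's seed-shape experiment and are not taken from the talk. Fisher's argument rested on aggregating many experiments, which this sketch does not attempt, and the function name is an assumption.

```python
import math

def chi_square_two_classes(observed, ratios):
    """Goodness-of-fit test for two classes (one degree of freedom).

    Returns (statistic, upper_tail_p, lower_tail_p). The lower tail is the
    probability of a fit at least this close under the expected ratio, which
    is the relevant quantity for a "too good to be true" argument.
    """
    total = sum(observed)
    ratio_sum = sum(ratios)
    expected = [total * r / ratio_sum for r in ratios]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For one degree of freedom, P(X > s) = erfc(sqrt(s / 2)).
    upper = math.erfc(math.sqrt(statistic / 2))
    return statistic, upper, 1.0 - upper

statistic, p_upper, p_lower = chi_square_two_classes([5474, 1850], [3, 1])
```

A single close fit like this one is unremarkable on its own. The suspicion arises only when the agreement is consistently close across many independent experiments, so that the combined statistic falls far into the lower tail.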