INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa
technology from seed


Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa


Identification of microRNAs and analysis of their expression in Eucalyptus globulus

04/12/2013 - 11:00
04/12/2013 - 12:00

Portugal is one of the largest producers of pulp derived from Eucalyptus
globulus, making it a fundamental species for the country. The
selection of adequate genotypes would make the exploitation of
cultivation areas more efficient. A key objective is to understand the
regulatory mechanisms impacting wood characteristics. Here we focus on
microRNA-mediated regulation. MicroRNAs are endogenous molecules that
act by silencing targeted messenger RNAs. Although approximately
21,000 microRNAs have been identified for many species, none is
documented for the Eucalyptus genus. Here, we propose a pipeline that
makes use of Cravela, a single-genome miRNA finding tool, and a new
NGS data analysis algorithm that provides a novel scoring function to
evaluate the expression profile of candidates. This approach produced
a short list of candidates, including both conserved and non-conserved
sequences. Experimental validation showed amplification in 4 out of 5
candidates chosen from the best-scoring non-conserved sequences.

The NEUROCLINOMICS project - Part II: the ALS case study

03/22/2013 - 11:00
03/22/2013 - 12:00

After the general introduction of the NEUROCLINOMICS project and
discussion of the Alzheimer's Disease case study, we now focus on the
Amyotrophic Lateral Sclerosis problem. The relevant questions we wish to
address are discussed, starting from a diagnosis problem, and then
considering a prognostic prediction. The available datasets are shown,
with their particular characteristics, which often present a real
challenge in themselves. Finally, we present and discuss the chosen
approaches to deal with the different problems, ending with the outline of
future work.


03/01/2013 - 11:00
03/01/2013 - 12:00

The need for integrative approaches to provide a broader understanding of brain-related pathologies in general, and neurodegenerative diseases in particular, has been widely recognised. These approaches should infer relationships between omics, clinical, and personal data. In NEUROCLINOMICS, we are interested in the development of innovative approaches to understanding neurodegenerative diseases through heterogeneous data integration. We are developing a knowledge discovery system that integrates powerful data mining algorithms to unravel potentially relevant links between omics and clinical data. We tackle disease diagnostic and prognostic markers, disease progression rates, and patient profiles. Together with the challenging task of studying complex diseases, we also embrace the challenging topic of developing efficient and effective mining algorithms for biomedical data integration. We currently use Amyotrophic Lateral Sclerosis and Alzheimer's disease as case studies.


02/01/2013 - 11:00
02/01/2013 - 12:00

Abstract: This talk introduces a method to design a distributed sensor network for field reconstruction that is minimal with respect to a communication cost function. This cost function is given by the sum of the cost of communication between sensors and that of a subset of sensors used for backbone communication.
To achieve this goal, we want to create an observable distributed sensor network, in which the central authority can recover the initial parameters at the different sensor locations from the measurements it collects (at most as many as the number of sensors). We first need to decide which sensors should communicate, and then design the weights by which each sensor updates its state with those of its neighbors; in other words, the distributed sensor network dynamics. In addition, we need to identify a subset of sensors that can report their state to a central location, which corresponds to the design of the backbone reporting function. The joint design of the sensor network dynamics and the backbone reporting function to recover the initial state of the dynamic system justifies the notion of an observable distributed sensor network.
We show an efficient algorithm for designing the optimal observable distributed sensor network for a given set of sensors and cost function, and provide an illustrative example.
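
The notion of observability used in this abstract can be illustrated with a small check. The sketch below assumes linear dynamics x(k+1) = W x(k) and a reporting matrix C that selects the backbone sensors, and tests the standard rank condition on the observability matrix. The names `W`, `C` and `is_observable`, and the three-sensor example, are illustrative assumptions; this is not the design algorithm presented in the talk.

```python
from fractions import Fraction

def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def rank(rows):
    """Rank by exact Gauss-Jordan elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_observable(W, C):
    """True if the initial state of x(k+1) = W x(k) can be recovered from y(k) = C x(k)."""
    n = len(W)
    block, stacked = C, []
    for _ in range(n):  # stack C, CW, ..., CW^(n-1)
        stacked.extend(block)
        block = mat_mul(block, W)
    return rank(stacked) == n

half, quarter = Fraction(1, 2), Fraction(1, 4)
# Hypothetical path network 1 - 2 - 3 with averaging weights.
W = [[half, half, 0],
     [quarter, half, quarter],
     [0, half, half]]

end_sensor_reports = is_observable(W, [[1, 0, 0]])
middle_sensor_reports = is_observable(W, [[0, 1, 0]])
print(end_sensor_reports, middle_sensor_reports)
```

In this toy network, reporting from an end sensor is enough to recover all three initial values, whereas reporting from the middle sensor alone is not, because the two end sensors influence it symmetrically.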

Understanding the mechanisms of virulence and resistance

01/18/2013 - 11:00
01/18/2013 - 12:00

Infectious diseases remain among the major causes of human death in the world. Several hospital infections are due to opportunistic pathogens: microorganisms that rarely infect healthy people, but are a frequent cause of infection in people with underlying diseases who are immunodepressed or debilitated. Environmental bacteria, frequently antibiotic resistant, constitute a large percentage of those pathogens. Our work focuses on understanding the mechanisms of virulence and resistance, as well as possible crosstalk between them, in these pathogens. Within this scope, over the last two years we have been identifying genes whose mutation changes the antibiotic susceptibility phenotype. As a result, we have selected nearly three hundred genes for future analysis and are currently studying whether the mutations that challenge intrinsic resistance also alter the virulence of Pseudomonas aeruginosa and Stenotrophomonas maltophilia. We found that mutations in several genes encoding proteins from different categories, including multidrug efflux pumps, two-component systems, metabolic enzymes and global regulators, simultaneously alter the antibiotic susceptibility and the virulence of P. aeruginosa. Another opportunistic pathogen we are working with is S. maltophilia, which is characterized by its intrinsically low susceptibility to several antibiotics. Part of this low susceptibility relies on the expression of chromosomally encoded multidrug efflux pumps. In addition, the metagenomic approach suggests new pathways to explain the transmission of antibiotic resistance genes by horizontal gene transfer.

Semantics and Fitness Landscapes in Genetic Programming

12/07/2012 - 11:00
12/07/2012 - 12:00

Abstract: Moraglio et al. have recently introduced new genetic
operators for genetic programming, called geometric semantic
operators. These operators induce a unimodal fitness landscape for all
problems that consist of matching data with known target outputs (such
as regression and classification). This facilitates genetic
programming evolvability, which makes these operators extremely
promising. Nevertheless, Moraglio et al. leave one big open problem:
these operators, by construction, always produce offspring that are
larger than their parents, causing an exponential growth in the size
of the individuals, which actually renders them useless in practice.
In this seminar, I offer a general introduction to optimization, to
fitness landscapes and to evolutionary computation. After that, I
present geometric semantic operators and I show that they induce a
unimodal fitness landscape on every possible instance of regression
and classification. Finally, after discussing the limitation of
geometric semantic operators, I show a new efficient implementation of
them, recently proposed by myself in collaboration with Sara Silva,
Mauro Castelli and Luca
This allows us, for the first time, to use them on complex real-life
problems, like the two problems in pharmacokinetics that I discuss in
the seminar. The presented experiments confirm the excellent
evolvability of geometric semantic operators, demonstrated by the good
results obtained on training data. Furthermore, I show that we have
also achieved a good generalization ability, and I discuss the fact
that it can be explained by considering some properties of geometric
semantic operators, which makes them even more appealing than before.
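
The geometric property behind the unimodal landscape can be seen on a toy example. The sketch below applies a crossover directly to the semantics of two parents (their vectors of outputs on the training cases), using a single random coefficient in [0, 1]. Working on output vectors, the function names and the data are illustrative assumptions, not necessarily the implementation discussed in the seminar.

```python
import math
import random

def semantic_crossover(sem_a, sem_b, rng):
    """Return a convex combination of two parents' output vectors."""
    r = rng.random()  # random coefficient in [0, 1)
    return [r * a + (1 - r) * b for a, b in zip(sem_a, sem_b)]

def error(sem, target):
    """Euclidean distance between an output vector and the target outputs."""
    return math.dist(sem, target)

rng = random.Random(0)
target = [1.0, 2.0, 3.0, 4.0]    # known target outputs (made-up data)
parent_a = [0.0, 2.5, 1.0, 4.0]  # outputs of a hypothetical parent program
parent_b = [2.0, 1.0, 3.5, 6.0]  # outputs of another hypothetical parent

child = semantic_crossover(parent_a, parent_b, rng)
worst_parent = max(error(parent_a, target), error(parent_b, target))
print(error(child, target) <= worst_parent)
```

Because the child lies on the segment between its parents in output space, its error can never exceed that of the worse parent. The sketch leaves out the cost named in the abstract: at the level of program trees, the child contains both parents, which is where the growth in size comes from.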

Evolutionary reaction systems

11/23/2012 - 11:00
11/23/2012 - 12:00

Abstract: In recent years, many bio-inspired computational methods
have been defined and successfully applied to real-life problems.
Examples of those methods are particle swarm optimization, ant colony
optimization, evolutionary algorithms, and many others. At the same
time, computational formalisms inspired by natural systems have been
defined, and their suitability to represent different functions
efficiently has been studied. One of those is a formalism known as
reaction systems. The aim of this work is to establish, for the first
time, a relationship between evolutionary algorithms and reaction
systems, by proposing an evolutionary version of reaction systems. In
this paper we show that the resulting new genetic programming system
performs better than, or at least comparably to, a set of well-known
machine learning methods on a set of problems that also includes
real-life applications. Furthermore, we discuss the expressiveness of
the solutions evolved by the presented evolutionary reaction systems.
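
The abstract does not define the formalism, so the following is a minimal sketch of the usual definition of a reaction system: each reaction has a set of reactants, a set of inhibitors and a set of products, and it is enabled on a state when all reactants are present and no inhibitor is. The class and function names and the two example reactions are illustrative assumptions; the evolutionary variant presented in the talk is not reproduced here.

```python
from typing import FrozenSet, Iterable, NamedTuple

class Reaction(NamedTuple):
    reactants: FrozenSet[str]
    inhibitors: FrozenSet[str]
    products: FrozenSet[str]

def enabled(reaction: Reaction, state: FrozenSet[str]) -> bool:
    """A reaction fires when all reactants and no inhibitors are present."""
    return reaction.reactants <= state and not (reaction.inhibitors & state)

def result(reactions: Iterable[Reaction], state: FrozenSet[str]) -> FrozenSet[str]:
    """Next state: the union of the products of all enabled reactions."""
    out = set()
    for rx in reactions:
        if enabled(rx, state):
            out |= rx.products
    return frozenset(out)

# Hypothetical two-reaction system that toggles between "a" and "b".
toggle = [
    Reaction(frozenset({"a"}), frozenset({"b"}), frozenset({"b"})),
    Reaction(frozenset({"b"}), frozenset({"a"}), frozenset({"a"})),
]

after_a = result(toggle, frozenset({"a"}))
after_b = result(toggle, frozenset({"b"}))
after_both = result(toggle, frozenset({"a", "b"}))
print(sorted(after_a), sorted(after_b), sorted(after_both))
```

Entities that no enabled reaction produces disappear from the next state, which is why the state containing both "a" and "b" leads to the empty set here.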

Models and Methods for Transcriptome Alignment

11/09/2012 - 11:00
11/09/2012 - 12:00

Abstract: In recent years, the introduction of new DNA sequencing
platforms dramatically changed the landscape of genetic studies. These
protocols for next-generation sequencing (NGS) are able to generate
massive amounts of data, requiring the creation of new computational
tools to deal with this data quickly and economically. With the
development of the RNA-Seq methodology, which uses the new sequencing
protocols to get information about RNA samples, the study of the
transcriptome gained a new boost. Problems such as the identification of
gene expression levels and alternative splicing can be solved with the
assembly and the study of the transcriptome. At the same time, the use
of this technology has the great advantage of allowing new biological
discoveries and observations. This technology has, however, the downside
of requiring a very considerable computational effort. This work aims to
present a detailed study of the problem of transcriptome alignment,
presenting an efficient computational solution, which requires the
development of heuristics to identify splice junctions using methods and
data structures for an efficient mapping.
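
To make the splice junction problem concrete, the sketch below aligns a read that spans two exons by trying every split point and looking for exact matches of both halves, separated by a gap of at least `min_intron` bases. It is a deliberately naive illustration: the function name, the parameters and the toy sequences are assumptions, and it uses plain string search rather than the heuristics and index structures the work refers to.

```python
def find_spliced_alignments(genome, read, min_anchor=4, min_intron=5):
    """Return (read_start, intron_start, intron_end) for exact split matches."""
    hits = []
    for split in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:split], read[split:]
        start = genome.find(left)
        while start != -1:
            donor = start + split  # first base after the left anchor
            acceptor = genome.find(right, donor + min_intron)
            while acceptor != -1:
                hits.append((start, donor, acceptor))
                acceptor = genome.find(right, acceptor + 1)
            start = genome.find(left, start + 1)
    return hits

# Toy sequences (made up): two exons separated by a 12-base intron.
exon1, intron, exon2 = "ACGTTGCA", "GTAAGTTTTTAG", "CCATGGAT"
genome = "TT" + exon1 + intron + exon2 + "AA"
read = exon1[-5:] + exon2[:5]  # a read crossing the exon-exon junction

hits = find_spliced_alignments(genome, read)
print(hits)
```

Each hit gives the position where the read starts in the genome and the half-open interval of the skipped intron. On real data the same junction can be reported at several neighbouring split points, and reads contain mismatches, which is one reason heuristics are needed.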

Biclustering-based Classification: a case study with Hypnograms

10/26/2012 - 11:00
10/26/2012 - 12:00

Abstract: Hypnograms are a schematic representation of sleep,
depicting its different stages through the night. This work
presents a new strategy for classifying these data, based on
CCC-Biclustering for time series. By exploring local sleep patterns,
one might be able to find significant ones that are characteristic of
a given group of patients. The developed method is applied to
a dataset consisting of hypnograms for six different pathologic
populations and a control group. Although the preliminary
results are not yet satisfactory, future work is discussed, showing
promising directions.
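
The idea of a local sleep pattern shared by a group of patients can be sketched as follows. Hypnograms are encoded as strings of stage symbols, and the code groups patients that show exactly the same stages over the same contiguous time window. This brute-force enumeration is only an illustration of that idea: the function name, the thresholds and the toy data are assumptions, it also reports non-maximal windows, and it is not the biclustering algorithm used in the work.

```python
from collections import defaultdict

def contiguous_biclusters(rows, min_rows=2, min_cols=3):
    """Find groups of rows sharing an identical pattern over contiguous columns.

    rows maps a row name to an equal-length string of discrete symbols.
    Returns tuples (start, end, pattern, names) with end exclusive.
    """
    n_cols = len(next(iter(rows.values())))
    found = []
    for start in range(n_cols):
        for end in range(start + min_cols, n_cols + 1):
            groups = defaultdict(list)
            for name, seq in rows.items():
                groups[seq[start:end]].append(name)
            for pattern, names in groups.items():
                if len(names) >= min_rows:
                    found.append((start, end, pattern, sorted(names)))
    return found

# Made-up hypnograms: W = wake, 1-3 = non-REM stages, R = REM.
hypnograms = {
    "patient_1": "W12332R",
    "patient_2": "W1233RR",
    "patient_3": "WW2332R",
}

biclusters = contiguous_biclusters(hypnograms)
for start, end, pattern, names in biclusters:
    print(start, end, pattern, names)
```

Patterns found this way could then serve as features for a classifier, for example by recording which patients match which pattern.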

Succinct structures for self-indexing text

09/28/2012 - 11:00
09/28/2012 - 12:00

The development of applications that manage large text
collections needs indexing methods which allow efficient retrieval
over text. Several indexes have been proposed which try to reach a
good trade-off between the space needed to store both the text and the
index, and its search efficiency.

Self-indexes have become increasingly popular in recent years. Not
only do they index the text, but they also keep enough information to
recover any portion of it without the need to keep it explicitly.
Therefore, they actually replace the text.

In this talk I will present two useful self-indexes with good
properties. They need only about 35% of the space of the plain text,
but they can efficiently answer retrieval queries thanks to their
indexing capabilities.
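
The talk does not name the two self-indexes here, so the sketch below uses the FM-index, a well-known self-index, purely as an illustration. It builds the Burrows-Wheeler transform of a text, counts pattern occurrences by backward search, and rebuilds the text from the index alone. The class name and the example text are assumptions, the text is assumed not to contain "$" or any character that sorts before it, and the tables are stored uncompressed, so this toy version shows the functionality but not the space savings.

```python
def bwt(text):
    """Burrows-Wheeler transform of text, using '$' as the end marker."""
    s = text + "$"
    order = sorted(range(len(s)), key=lambda i: s[i:])  # naive suffix sort
    return "".join(s[i - 1] for i in order)

class FMIndex:
    def __init__(self, text):
        self.last = bwt(text)
        # first_pos[c]: number of symbols in the text smaller than c
        self.first_pos, total = {}, 0
        for c in sorted(set(self.last)):
            self.first_pos[c] = total
            total += self.last.count(c)
        # occ[c][i]: occurrences of c in last[:i]
        self.occ = {c: [0] for c in self.first_pos}
        for ch in self.last:
            for c, counts in self.occ.items():
                counts.append(counts[-1] + (ch == c))

    def count(self, pattern):
        """Number of occurrences of pattern, found by backward search."""
        lo, hi = 0, len(self.last)
        for c in reversed(pattern):
            if c not in self.first_pos:
                return 0
            lo = self.first_pos[c] + self.occ[c][lo]
            hi = self.first_pos[c] + self.occ[c][hi]
            if lo >= hi:
                return 0
        return hi - lo

    def extract(self):
        """Rebuild the original text from the index alone."""
        out, row = [], 0  # row 0 is the rotation that starts with '$'
        for _ in range(len(self.last) - 1):
            c = self.last[row]
            out.append(c)
            row = self.first_pos[c] + self.occ[c][row]
        return "".join(reversed(out))

index = FMIndex("abracadabra")
print(index.count("abra"), index.count("cad"), index.extract())
```

The `extract` method is what makes this a self-index: the original string is not stored anywhere, yet it can be recovered in full, so the index can replace the text.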