Seminars

Identification and quantification of reachable attractors over asynchronous discrete dynamics

Submitted by lsr on Tue, 12/10/2013 - 09:38.

Start: 12/19/2014 - 14:30

End: 12/19/2014 - 15:30

Timezone: Etc/GMT

Models of discrete concurrent systems often lead to huge and complex
state transition graphs that represent their dynamics.
Here, we are particularly interested in logical models of biological
regulatory networks. Given an initial condition, it is of real interest
to identify the reachable attractors, which denote the potential asymptotic
behaviours of the system. These attractors are described as terminal
strongly connected components, which are either single (stable) states or
sets of states (denoting cyclic behaviours).
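To make the definition concrete, here is a minimal sketch, assuming the state transition graph is small enough to store explicitly as a dictionary from each state to its list of successors. It is a naive quadratic check, not FIREFRONT: a state belongs to a terminal strongly connected component exactly when every state it can reach can reach it back. The function names and the toy graph are hypothetical.

```python
from collections import deque


def reachable(graph, start):
    """Return the set of states reachable from `start`, including `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for succ in graph.get(state, ()):
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return seen


def reachable_attractors(graph, start):
    """Return the terminal SCCs (attractors) reachable from `start`.

    A state lies in a terminal SCC exactly when every state it can reach
    can reach it back; that SCC is then the state's whole reachable set.
    Quadratic in the number of states: only suitable for toy graphs.
    """
    states = reachable(graph, start)
    reach = {s: reachable(graph, s) for s in states}
    attractors, assigned = [], set()
    for s in states:
        if s in assigned:
            continue
        if all(s in reach[t] for t in reach[s]):
            attractors.append(frozenset(reach[s]))
            assigned |= reach[s]
    return attractors


# Toy graph: "D" is a stable state, {"E", "F"} is a cyclic attractor.
toy = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": ["F"], "F": ["E"]}
print(reachable_attractors(toy, "A"))
```

Starting from "A", both attractors are reachable; starting from "C", only the cycle {"E", "F"} is. Real logical models have state spaces that grow exponentially with the number of components, which is why exhaustive or sampling-based methods are needed instead of this brute-force check.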

Beyond attractor identification, we propose to assess the probability of
reaching each of them from an initial condition or from any portion of the
state space, relying on the structure of the state transition graph.
First, we present a solution to the problem with an original algorithm
called FIREFRONT, based on the exhaustive exploration of the reachable
state space. Then, for cases where FIREFRONT is not applicable, we
define a modified Monte Carlo simulation, termed AVATAR.
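For intuition about the sampling side, the following is a plain Monte Carlo estimator, not AVATAR (whose modifications are not reproduced here). It assumes that the attractors are already known, that each transition is chosen uniformly at random among a state's successors, and that walks are truncated after a fixed number of steps. The function name and parameters are hypothetical.

```python
import random
from collections import Counter


def estimate_reach_probabilities(graph, start, attractors, runs=10_000,
                                 max_steps=10_000, seed=0):
    """Plain Monte Carlo estimate of P(reach attractor i | start).

    Assumes `attractors` lists every terminal SCC reachable from `start`
    and that successors are picked uniformly at random. Walks that hit no
    attractor within `max_steps` are left uncounted.
    """
    rng = random.Random(seed)
    # Map each attractor state to the index of its attractor.
    owner = {s: i for i, att in enumerate(attractors) for s in att}
    hits = Counter()
    for _ in range(runs):
        state = start
        for _ in range(max_steps):
            if state in owner:
                hits[owner[state]] += 1
                break
            state = rng.choice(graph[state])
    return [hits[i] / runs for i in range(len(attractors))]


toy = {"A": ["B", "C"], "B": ["D"], "C": ["E"], "D": [], "E": ["F"], "F": ["E"]}
print(estimate_reach_probabilities(toy, "A", [{"D"}, {"E", "F"}]))
```

In the toy graph, state "A" branches evenly towards "D" and the cycle {"E", "F"}, so both estimates should be close to 0.5. A naive walk like this can waste many steps inside large transient cycles, which is one motivation for modifying the basic simulation.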


A data mining approach to study disease presentation patterns in Primary Progressive Aphasia.

Submitted by lsr on Fri, 11/29/2013 - 13:33.

Start: 12/05/2014 - 14:30

End: 12/05/2014 - 15:30

Timezone: Etc/GMT

The world is facing an ageing population and its related challenges, such as the
healthcare burden of diseases that are more prevalent in the elderly, including
neurodegenerative diseases. Primary Progressive Aphasia (PPA) is a neurodegenerative disease
characterized by a gradual dissolution of language abilities. Patients with PPA require
special attention because they have a higher risk of progressing to dementia. Consequently,
identifying the different subtypes of PPA is fundamental for the timely administration
of pharmacological and therapeutic interventions, improving patients' quality of life.

This thesis proposes a data mining approach to extract relevant knowledge from
clinical data, namely to learn the variants of PPA. First, standard clustering algorithms were
applied to study the number of groups present in the dataset and, eventually,
the potential existence of new groups, different from the PPA subtypes
already defined in the literature. Then, in a second phase, supervised learning techniques
were used to analyze patients according to their clinical classification into one of the three PPA
variants and to develop a new, accurate classification model.

The unsupervised learning analysis pointed to the existence of two main groups in the
dataset analyzed in this work. This study included the evaluation of diverse sets of attributes in
order to assess which type or set of attributes produced better results. Finally, two new
methodologies for classifying patients with PPA were developed, reaching good accuracies on
the dataset under study. One of those methodologies enables the identification of instances
that (potentially) do not belong to any of the three previously defined PPA subtypes.


Design and Implementation of a Domain Specific Language for Next Generation Sequence Analysis

Submitted by lsr on Thu, 01/30/2014 - 15:31.

Start: 09/26/2014 - 14:30

End: 09/26/2014 - 15:30

Timezone: Etc/GMT

Next Generation Sequencing (NGS) is a set of molecular biology technologies
which generate, at low cost, many millions of short nucleotide reads. Typical
datasets consist of tens of millions of reads, with each read comprising 35-500
base pairs (different technologies produce different read sizes).

There are many tools for handling these datasets. However, they must still be
combined to build a full analysis pipeline. Current solutions for building these
pipelines are Make-like tools that handle text files and Unix-like
commands. Several GUI-based solutions allow users who are not comfortable with
the command line to build and run these pipelines. However, they still operate
at the semantic level of Make: file dependencies and transformation commands.

Because each problem and each variation on the technology requires a
different processing pipeline, it is impossible to design a single
pipeline for every need. This paper describes a context-aware tool
that supports the first phase of NGS analysis.


Data integration tools for pre-processing biological data

Submitted by ptgm on Thu, 05/22/2014 - 23:47.

Start: 06/26/2014 - 14:30

End: 06/26/2014 - 15:30

Timezone: Etc/GMT

The increasing use of Electronic Health Records (EHRs) enables a better analysis of patient data, improving the quality of medical care. EHRs must be processed in order to provide a variety of services to the physician, such as risk classification and summarization. EHRs are usually stored in unstructured text or Excel files containing different data formats and types, missing information and, sometimes, inconsistent information. Therefore, before analyzing the data, we often need to transform and integrate it. In this presentation, we show some examples of data integration tools that can be used to extract and transform data. As an example, we use an Excel file containing exam information for patients with ALS (Amyotrophic Lateral Sclerosis).


The Biodegradation and Surfactants Database

Submitted by ptgm on Thu, 06/05/2014 - 19:56.

Start: 06/12/2014 - 14:30

End: 06/12/2014 - 15:30

Timezone: Etc/GMT

The Biodegradation and Surfactants Database (BioSurfDB) is a curated relational information system currently integrating 14 metagenomes, 137 organisms, 73 biodegradation-relevant genes, 62 proteins and 6 of their metabolic pathways; 29 documented bioremediation experiments, with treatment efficiencies for specific pollutants by surfactant-producing organisms; and a curated list of 46 biosurfactants, grouped by producing organism, surfactant name and class, and reference.

Our goal is to gather published and novel information on the identification and characterization of genes involved in oil biodegradation and bioremediation of polluted environments, and to provide it in a curated form together with a series of computational tools to support biological studies.


Integrative biomarker discovery in neurodegenerative diseases: a survey

Submitted by ptgm on Thu, 04/17/2014 - 21:24.

Start: 04/24/2014 - 14:30

End: 04/24/2014 - 15:30

Timezone: Etc/GMT

Data mining has been widely applied in biomarker discovery, resulting in
significant findings of different clinical and biological biomarkers. With
developments in technology, from genomics to proteomics analysis, a deluge
of data has become available, as well as standardized data repositories.
Nonetheless, researchers are still facing important challenges in
analyzing the data, especially when considering the complexity of pathways
involved in biological processes or diseases. Data from single sources
seem unable to explain complex processes, such as those involved in
brain-related disorders, thus raising the need for a more comprehensive
perspective. A possible solution relies on data and model integration,
where several data types are combined to provide complementary views,
which in turn can lead to the discovery of previously unknown
biomarkers by unravelling otherwise hidden relationships between data from
different sources. In this work, we review the different single-source
types of data used for biomarker discovery in neurodegenerative diseases,
and then provide an overview of recent efforts to perform
integrative analysis in these disorders, discussing their major challenges and
advantages.


Novel metric for the use of Minimum Spanning Trees in phylogenetic tree studies

Submitted by ptgm on Mon, 03/31/2014 - 09:22.

Start: 04/03/2014 - 14:30

End: 04/03/2014 - 15:30

Timezone: Etc/GMT

The use of trees for phylogenetic representations started in the
middle of the 19th century. One of their most popular uses is Charles
Darwin's sole illustration in "The Origin of Species" [4]. The
simplicity of the tree representation makes it still the method of
choice today to easily convey the diversification and relationships
between species. Yet trees suffer from several drawbacks that are not
always clear to researchers. Since several different algorithms can be
used to infer and draw the tree, one must be aware of each algorithm's
set of assumptions.
In the analysis of sequence-based microbial typing methods, Minimum
Spanning Trees (MSTs) are becoming the standard for representing
relationships between strains. However, MSTs suffer from several
limitations that can mislead the interpretation of the resulting
tree. The fact that a single tree is reported out of a multitude of
possible, equally optimal solutions, and that no statistical metrics
exist to evaluate them, justified a recent heuristic approach to
address these issues.
We present a new edge betweenness metric for undirected, weighted
graphs. This metric is defined as the fraction of minimum spanning
trees in which a given edge is present, and it was motivated by the
need to evaluate phylogenetic trees. Moreover, we provide
results and methods for the exact computation of this metric
based on Kirchhoff's well-known matrix tree theorem.
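Kirchhoff's matrix tree theorem states that the number of spanning trees of a connected graph equals any cofactor of its Laplacian matrix. The sketch below is a simplified illustration, not the authors' method: it assumes a simple, connected graph in which all edge weights are equal, so that every spanning tree is a minimum spanning tree and the fraction of trees containing an edge e is 1 - τ(G - e)/τ(G), where τ counts spanning trees. The general weighted case needs additional work that is not reproduced here, and all function names are hypothetical. Exact rational arithmetic avoids floating-point error in the determinant.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det  # a row swap flips the sign
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def count_spanning_trees(n, edges):
    """Matrix tree theorem: any cofactor of the graph Laplacian.

    Nodes are 0..n-1; `edges` is a list of (u, v) pairs of a simple graph.
    """
    lap = [[0] * n for _ in range(n)]
    for u, v in edges:
        lap[u][u] += 1
        lap[v][v] += 1
        lap[u][v] -= 1
        lap[v][u] -= 1
    minor = [row[1:] for row in lap[1:]]  # drop row and column 0
    return int(determinant(minor))


def edge_tree_fraction(n, edges, edge):
    """Fraction of spanning trees of a connected simple graph containing `edge`.

    `edge` must be written with the same orientation as in `edges`.
    """
    total = count_spanning_trees(n, edges)
    without = count_spanning_trees(n, [e for e in edges if e != edge])
    return Fraction(total - without, total)
```

For a triangle, each edge lies in 2 of the 3 spanning trees, so `edge_tree_fraction(3, [(0, 1), (1, 2), (0, 2)], (0, 1))` returns `Fraction(2, 3)`, while a bridge always yields 1 because every spanning tree must use it.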


Extracting academic data and linked data anonymization

Submitted by ptgm on Fri, 03/14/2014 - 10:39.

Start: 03/20/2014 - 14:30

End: 03/20/2014 - 15:30

Timezone: Etc/GMT

Data is becoming more valuable each day as more diverse and rich
data sources become available, allowing us to discover knowledge
in unprecedented ways.

IST uses the FénixEdu information system to manage most of its internal
data. The system contains data about students, teachers, employees,
courses, and all major aspects of IST as an organization. Such data
may be useful both for external agents and, more importantly, for IST
itself to study our academic environment. Data may be used as input
for state-of-the-art IR and KD technologies to extract newer and deeper
knowledge about academic agents, helping to solve problems in our
community and to understand it better.

Releasing this kind of data publicly requires an additional
step to preserve the privacy of the individuals concerned and,
as has been shown, simple de-identification may not be enough to achieve
this goal. On the other hand, we must deal with both internal and
external data in an evolving environment, where linked data
based approaches can help us deal with such complexity.
In this talk we will discuss a solution for exposing, sharing, and
connecting the data, information, and knowledge available in the IST information
system, taking privacy and anonymity issues into consideration.


Network mining based analysis of whole brain functional connectivity

Submitted by lsr on Fri, 02/28/2014 - 14:52.

Start: 03/06/2014 - 14:30

End: 03/06/2014 - 15:30

Timezone: Etc/GMT

Mapping the human brain has been a topic of interest for the last few
decades. Despite its incredible complexity, it is now possible to
map the brain using a combination of advanced data representation and
data processing algorithms, supported by the huge computational power
available today. In this work we describe an approach for
mapping whole-brain functional connectivity. The starting point of our
work is a set of high-resolution functional magnetic resonance images
(fMRI) obtained with a 7T magnetic field, covering a wider brain
volume than usual. The fMRIs are then used to build the so-called
brain functional connectivity network. These networks can be
represented as graphs, i.e., a set of nodes (regions)
and a set of edges connecting such nodes. With the networks
represented as graphs, we apply network mining techniques to them,
namely clustering and modularity algorithms that allow us, for
instance, to identify functional modules of the brain. Presumably, the
increased resolution will yield more detailed information
and the potential to uncover additional structure. Due to the size of the
graphs, all algorithms must be optimized to minimize
resource usage.
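Modularity algorithms search for partitions of the network that score highly on Newman-Girvan modularity Q. The following minimal sketch only scores a given partition; it assumes an undirected, unweighted graph (for example, regions connected when their functional correlation exceeds some threshold, which is an assumption here) and does not reproduce the optimized algorithms used in the talk. The function name is hypothetical. It uses the per-community form Q = Σ_c [L_c/m - (d_c/2m)²], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a node partition.

    edges: list of (u, v) pairs of an undirected, unweighted simple graph.
    community: dict mapping each node to a community label.
    """
    m = len(edges)
    inside = defaultdict(int)  # L_c: edges with both ends in community c
    degree = defaultdict(int)  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by a single edge, one community per triangle.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, labels))
```

Here each triangle has 3 internal edges and total degree 7 out of m = 7 edges, so Q = 2(3/7 - 1/4) = 5/14 ≈ 0.357, whereas putting all nodes into one community gives Q = 0. Since the score is a single pass over the edge list, it scales linearly with graph size; the expensive part in practice is searching over partitions.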


Computational prediction of microRNA targets in plant genomes

Submitted by lsr on Mon, 02/03/2014 - 09:56.

Start: 02/20/2014 - 14:30

End: 02/20/2014 - 15:30

Timezone: Etc/GMT

MicroRNAs (miRNAs) are important post-transcriptional regulators that
act by recognizing and binding to sites in their target messenger RNAs
(mRNAs). They are present in nearly all eukaryotes, including
plants, where they play important roles in developmental and stress
response processes by targeting mRNAs for cleavage or translational
repression. MiRNAs have been shown to play a crucial role in gene
expression regulation, but so far only a few miRNA targets in plants
have been experimentally validated. Given the number of identified
genes, the number of experimentally validated miRNAs and the
fact that one miRNA often regulates multiple genes, a long list of as yet
unidentified targets is to be expected. Here, we present a novel miRNA
target prediction method for plants that incorporates an evolutionary
approach. With this approach, we aim to determine whether a
transcript shows evidence of a sequence bias towards either
acquiring or avoiding target sites for a particular miRNA.
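As background for what a "target site" means computationally: plant miRNAs typically bind their targets with near-perfect complementarity, so a simple baseline scans a transcript for windows that pair antiparallel with the miRNA up to a few mismatches. The sketch below is only that baseline, not the evolutionary method presented in the talk; the function name, the mismatch threshold and the option to accept G:U wobble pairs are illustrative assumptions.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def candidate_sites(mirna, transcript, max_mismatches=3, allow_gu=True):
    """Return (start, mismatches) for windows of `transcript` that pair with `mirna`.

    Both sequences are read 5'->3'; DNA letters (T) are converted to RNA (U).
    Site base i pairs antiparallel with miRNA base len(mirna) - 1 - i.
    """
    mirna = mirna.upper().replace("T", "U")
    transcript = transcript.upper().replace("T", "U")
    k = len(mirna)
    allowed = (WATSON_CRICK | WOBBLE) if allow_gu else WATSON_CRICK
    hits = []
    for start in range(len(transcript) - k + 1):
        site = transcript[start:start + k]
        mismatches = sum((site[i], mirna[k - 1 - i]) not in allowed
                         for i in range(k))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


# Toy example: the transcript carries a perfect site at position 4.
print(candidate_sites("AUGCUAGC", "AAAAGCUAGCAUAAAA", max_mismatches=0))
```

The toy call reports a single perfect site, `[(4, 0)]`. A scan like this finds sequence matches only; deciding whether transcripts are under selective pressure to acquire or avoid such sites requires additional, comparative information.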


kdbio




© 2005, Inesc-ID. All rights reserved