Seminars

Design and Implementation of a Domain Specific Language for Next Generation Sequence Analysis

Submitted by lsr on Thu, 01/30/2014 - 15:37.

Start: 02/06/2014 - 14:30

End: 02/06/2014 - 15:30

Timezone: Etc/GMT

Next Generation Sequencing (NGS) is a set of molecular biology technologies
which generate, at low cost, many millions of short nucleotide reads. Typical
datasets consist of tens of millions of reads, with each read comprising 35-500
basepairs (depending on the technology used, different read sizes can be
obtained).

There are many tools for handling these datasets. However, they must still be
combined to build a full analysis pipeline. Current solutions to build these
pipelines are Make-like tools which can handle text-files and Unix-like
commands. Several GUI-based solutions allow users who are not comfortable with
the command line to build and run these pipelines. However, they still operate
at the semantic level of Make: file dependencies and transformation commands.
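
As a rough illustration of what "the semantic level of Make" means, the sketch below models a pipeline step as a target file, its dependency files, and a transformation command, and decides whether the step needs to be re-run from file timestamps alone. The `Rule` class, the `is_stale` helper, and any file names used with them are hypothetical and are not taken from the tool presented in the talk.

```python
import os
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    """A Make-style rule: a target file, its dependencies, and a command."""
    target: str
    deps: List[str]
    command: str  # shell command that would rebuild the target (not run here)


def is_stale(rule: Rule) -> bool:
    """Return True if the target is missing or older than any dependency."""
    if not os.path.exists(rule.target):
        return True
    target_mtime = os.path.getmtime(rule.target)
    # Only file modification times are compared; the rule knows nothing
    # about what the files contain (reads, alignments, quality scores...).
    return any(os.path.getmtime(dep) > target_mtime for dep in rule.deps)
```

Nothing in this model knows that a file holds sequencing reads, which is the limitation the abstract points to.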

Because each problem and each variation on the technology requires a
different processing pipeline, it would be impossible to design a single
pipeline for every need. This paper describes a context-aware tool
for the first phase of NGS analysis.

Evaluating differential gene expression using RNA-sequencing data

Submitted by lsr on Mon, 11/25/2013 - 12:13.

Start: 11/28/2013 - 14:30

End: 11/28/2013 - 15:30

Timezone: Etc/GMT

Unlike the genome, the cell transcriptome is dynamic and specific to a given cell developmental stage or physiological condition. Understanding the transcriptome is essential for interpreting the functional elements of the genome and revealing the molecular constituents of cells. Recent developments in high-throughput DNA sequencing have provided a new method to sequence RNA at unprecedentedly high resolution. This method is termed RNA-Seq and has been emerging as the preferred technology for both characterization and quantification of cell transcripts.

Bearing this in mind, in this thesis I propose a bioinformatics pipeline to compare two RNA-Seq samples. This pipeline provides biological insight into the analysed samples by extracting the main biological processes that are differentially active between them. Building on this pipeline, I developed a novel methodology to inspect the activation of a given cellular pathway in a time-course RNA-Seq dataset.

Evaluating a Listeria monocytogenes RNA-Seq dataset with the developed tools confirmed that they work properly. It was possible to identify global changes in the human host transcriptome and associate these changes with different stages of the Listeria monocytogenes infection lifecycle.

MetaGen-FRAME

Submitted by lsr on Thu, 10/24/2013 - 16:21.

Start: 10/31/2013 - 14:30

End: 10/31/2013 - 15:30

Timezone: Etc/GMT

Metagenomics is the study of metagenomes: unprocessed genetic material found in a wide variety of
sites, without separation into individual organisms. Metagenomic approaches to the study of biological
communities are quickly changing our understanding of the function and inter-relationships among
living organisms in ecosystems. The rapid advances in metagenomics are largely due to the fast development
of high-throughput platforms for deoxyribonucleic acid (DNA) sequencing, which need to be
accompanied by significant advances in data analysis techniques.

With this work, I intended to develop and apply new data analysis techniques suited
to the large amounts of data generated by metagenomics. This document presents a proposal to address the
challenges posed by the storage and manipulation of such information and the need to develop
new data analysis techniques that can be applied directly to this problem. To this end, I intended
to harness the power of parallel computing.

The target result of this thesis was MetaGen-FRAME, a metagenomic framework capable of handling
heterogeneous data types (from DNA sequences to genome, proteome and metabolome annotations)
through the use of different data structures and computational approaches.

On Multi-class Classification Problems Using Genetic Programming

Submitted by lsr on Tue, 10/15/2013 - 13:46.

Start: 10/24/2013 - 14:30

End: 10/24/2013 - 15:30

Timezone: Etc/GMT

Genetic Programming (GP) is a field under the umbrella of Evolutionary
Computing that has been successful in addressing a variety of
problems in data mining and machine learning,
including multi-class classification
(mcc). However, that success has been limited to extending
binary GP classifiers to mcc problems, which still leaves
a void: there are no efficient multi-class GP classifiers when
compared to non-GP classifiers. In this work, I will present a novel
algorithm that incorporates some ideas on the representation of the
solution space for a tree-based GP, which lays some foundations for
filling this void and might also lead to future research in
this direction. During the presentation, I shall show the success
and competitiveness of this approach, and discuss future
directions.

Quick Hyper-Volume

Submitted by lsr on Wed, 10/09/2013 - 13:48.

Start: 10/10/2013 - 14:30

End: 10/10/2013 - 15:30

Timezone: Etc/GMT

I will present a new algorithm to calculate exact hypervolumes. Given
a set of d-dimensional points, it computes the
hypervolume of the dominated space. Determining this value is an
important subroutine of Multiobjective Evolutionary Algorithms
(MOEAs). We analyze the "Quick Hypervolume" (QHV) algorithm
theoretically and experimentally. The theoretical results are
a significant contribution to the current state of the art. Moreover,
the experimental performance is also very competitive compared
with existing exact hypervolume algorithms.
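
To make "the hypervolume of the dominated space" concrete, here is a minimal sketch that computes it exactly by inclusion-exclusion. This is not the QHV algorithm from the talk. It is a naive baseline that takes time exponential in the number of points, and it assumes a maximization problem with the reference point at the origin and non-negative coordinates. The function name `hypervolume` is chosen for illustration.

```python
from itertools import combinations
from math import prod
from typing import Sequence


def hypervolume(points: Sequence[Sequence[float]]) -> float:
    """Exact volume of the union of the boxes [0, p] over all points p.

    Assumes maximization, a reference point at the origin, and
    non-negative coordinates. Runs in O(2^n) time for n points.
    """
    total = 0.0
    for k in range(1, len(points) + 1):
        # Inclusion-exclusion: add odd-sized intersections, subtract even ones.
        sign = 1.0 if k % 2 == 1 else -1.0
        for subset in combinations(points, k):
            # The intersection of boxes [0, p] is the box whose far corner
            # is the component-wise minimum of the chosen points.
            corner = [min(coords) for coords in zip(*subset)]
            total += sign * prod(corner)
    return total
```

For the two points (1, 2) and (2, 1) the boxes have area 2 each and overlap in the unit square, so the dominated area is 2 + 2 - 1 = 3.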

Parallel efficient alignment of reads for re-sequencing applications

Submitted by lsr on Mon, 09/23/2013 - 10:55.

Start: 09/26/2013 - 14:30

End: 09/26/2013 - 15:30

Timezone: Etc/GMT

In bioinformatics, in the context of resequencing projects,
the efficient and accurate mapping of reads to a reference
genome is a critical problem. One instance of this problem
is the local alignment of pyrosequencing reads produced
by the 454 GS FLX system against a reference sequence,
an instance for which the software tool TAPyR (Tool for
the Alignment of Pyrosequencing Reads) was developed.
TAPyR implements a methodology to efficiently solve this
problem, which proved to yield results of a quality (both in
terms of content and execution speed) higher than those of
mainstream applications. With the goal of further improving
this platform's results, we produced a parallel implementation
of the query and reference sequence access procedures
of the original version. Through the use of multithreading,
this new version, P-TAPyR, produces considerable
reductions in the processing time of queries, scaling with
the number of hardware-supported threads (not accounting
for hyper-threading) available. For larger data sets, we
were able to observe running times roughly 26 times faster
than serial execution with 30 executing threads, showing
an experimental (progressively decreasing) serial
fraction of 0.8% (determined by the Karp-Flatt metric described
in a later section). Herein we present the modifications
made to this software tool to allow for parallel
querying of reads against an indexed reference, which scales
proportionally to the number of available physical cores.
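
The experimentally determined serial fraction quoted above can be sketched as follows, assuming the metric in question is the Karp-Flatt metric, e = (1/S - 1/p) / (1 - 1/p), where S is the measured speedup and p the number of threads. The function name `karp_flatt` is chosen for illustration. The exact measured speedups behind the 0.8% figure are not given in the abstract, so plugging in the rounded "roughly 26 times with 30 threads" only yields a value of the same order (about 0.5%).

```python
def karp_flatt(speedup: float, threads: int) -> float:
    """Experimentally determined serial fraction (Karp-Flatt metric).

    speedup: measured speedup over serial execution (S).
    threads: number of processing units used (p), at least 2.
    """
    if threads < 2:
        raise ValueError("the metric is undefined for fewer than 2 threads")
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 / speedup - 1.0 / threads) / (1.0 - 1.0 / threads)
```

A value that shrinks as more threads are added, as the abstract reports, suggests that the parallel overhead grows more slowly than the useful parallel work.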

Host-pathogen interaction upon infection with Listeria using NGS techniques

Submitted by lsr on Wed, 05/29/2013 - 08:58.

Start: 06/07/2013 - 11:00

End: 06/07/2013 - 12:00

Timezone: Etc/GMT

Listeria monocytogenes is a model bacterial pathogen that, after internalization, is
capable of disrupting a double-membrane vacuole, replicating in the host cytosol and
manipulating the innate response triggered in the cytosol. Its intracellular lifecycle in the
human host provides insight into the dynamics of general host-pathogen
interactions. The identification of host sequences affected during these interactions is
paramount to our understanding of how pathogens engineer their cellular
environments.

The main goal of this project is, therefore, to understand how pathogens
influence human host cells, by identifying global changes in the host transcriptome
and characterizing the alterations in host nuclear architecture. Furthermore, we aim
to associate these changes with different stages of the Listeria monocytogenes infection
lifecycle. To that end, total RNA was extracted from three different cell populations at four
time-points (after 20, 60, 120 and 240 minutes), chosen to represent
specific stages in the bacterium's lifecycle.

Novel semantic approaches in Genetic Programming.

Submitted by lsr on Wed, 05/15/2013 - 16:42.

Start: 05/24/2013 - 11:00

End: 05/24/2013 - 12:00

Timezone: Etc/GMT

Evolutionary algorithms are stochastic optimization techniques based on the
principles of natural evolution, and Genetic Programming (GP) belongs to this family.

In recent years the study of GP systems has been extended to phenotypic aspects, whereas earlier work focused mainly on genotypic and syntactic aspects.

The phenotype, or semantics, is used to improve the capacity of GP algorithms to explore the solution space effectively, by classifying similar individuals and exploring new semantic areas, which increases the probability of finding an optimal solution and of escaping local optima.

Currently, semantic GP is closely tied to the evaluation of an individual's behavior in the candidate population: this kind of evaluation is mainly obtained through the fitness function itself.

This work introduces a new way of measuring semantic similarity between individuals that is more independent of the fitness itself, allowing a fair comparison even when the fitness values involved are very far apart. This new measure enables a new series of techniques to be used to tackle open problems in GP, such as bloat and over-fitting, and to preserve phenotypic variety, thereby enhancing performance. Preliminary results will be provided.

A new theoretical GP algorithm based on this semantic measure is also introduced, showing its potential advantages. Very early results from a first naive implementation give interesting insight into this potential in comparison with other state-of-the-art algorithms.

Equilibria in a Repeated Epidemic Dissemination Game

Submitted by lsr on Thu, 05/02/2013 - 11:32.

Start: 05/10/2013 - 11:30

End: 05/10/2013 - 12:00

Timezone: Etc/GMT

Abstract: "Epidemic dissemination protocols are known to be extremely
scalable and robust. As a result, they are particularly well suited to
support the dissemination of information in large-scale peer-to-peer
systems. In such an environment, nodes do not belong to the same
administrative domain. On the contrary, many of these systems rely on
resources made available by rational nodes that are not necessarily
obedient to the protocol. There are two main incentive mechanisms that
can be used to deal with rational behavior. One is to rely on balanced
exchanges, which is feasible to implement in epidemic protocols where
interactions are symmetric. For the asymmetric case, incentives based on
a monitoring approach are better suited. Unfortunately, the literature
does not provide any meaningful theoretical results for this last type
of incentives. In this talk, I will present basic results that establish
a tradeoff between the amount of information provided by a monitor and
the ability to sustain cooperation among rational nodes, assuming
perfect monitoring."

Xavier Vilaça is a PhD student at IST and a researcher in the Distributed
Systems Group at INESC-ID. He received an MSc degree in Computer Science and
Engineering from IST in 2011 and a BSc, also in Computer Science and
Engineering, from the University of Minho in 2009.

This work is being presented as a final report for the Complex Network
Analysis course from the PhD program in Computer Science and
Engineering at IST.

Novel semantic approaches in Genetic Programming.

Submitted by lsr on Mon, 04/22/2013 - 08:52.

Start: 04/26/2013 - 11:00

End: 04/26/2013 - 12:00

Timezone: Etc/GMT

Evolutionary algorithms are stochastic optimization techniques based on the
principles of natural evolution, and Genetic Programming (GP) belongs to this family.

In recent years the study of GP systems has been extended to phenotypic aspects, whereas earlier work focused mainly on genotypic and syntactic aspects.

The phenotype, or semantics, is used to improve the capacity of GP algorithms to explore the solution space effectively, by classifying similar individuals and exploring new semantic areas, which increases the probability of finding an optimal solution and of escaping local optima.

Currently, semantic GP is closely tied to the evaluation of an individual's behavior in the candidate population: this kind of evaluation is mainly obtained through the fitness function itself.

This work introduces a new way of measuring semantic similarity between individuals that is more independent of the fitness itself, allowing a fair comparison even when the fitness values involved are very far apart. This new measure enables a new series of techniques to be used to tackle open problems in GP, such as bloat and over-fitting, and to preserve phenotypic variety, thereby enhancing performance. Preliminary results will be provided.

A new theoretical GP algorithm based on this semantic measure is also introduced, showing its potential advantages. Very early results from a first naive implementation give interesting insight into this potential in comparison with other state-of-the-art algorithms.

kdbio




© 2005, Inesc-ID. All rights reserved