INESC-ID   Instituto de Engenharia de Sistemas e Computadores Investigação e Desenvolvimento em Lisboa


Knowledge Discovery and Bioinformatics
Inesc-ID Lisboa

Design and Implementation of a Domain Specific Language for Next Generation Sequence Analysis

09/26/2014 - 14:30
09/26/2014 - 15:30

Next Generation Sequencing (NGS) is a set of molecular biology technologies
which generate, at low cost, many millions of short nucleotide reads. Typical
datasets consist of tens of millions of reads, with each read comprising 35-500
basepairs (depending on the technology used, different read sizes can be

There are many tools for handling these datasets. However, they must still be
combined to build a full analysis pipeline. Current solutions for building these
pipelines are Make-like tools that handle text files and Unix-like
commands. Several GUI-based solutions allow users who are not comfortable with
the command line to build and run these pipelines. However, they still operate
at the semantic level of Make: file dependencies and transformation commands.
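
To make "the semantic level of Make" concrete, here is a minimal sketch of a Make-style rule in Python. It is an illustration only and not taken from any of the tools mentioned: the names Rule, is_stale and build are hypothetical, and the command is modelled as a Python callable standing in for a shell command. The rule knows only about files, their timestamps and an opaque command; it has no notion of reads or sequences.

```python
import os
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    """A Make-style rule: a target file, its dependencies, and a command."""
    target: str
    deps: List[str]
    command: Callable[[], None]  # hypothetical stand-in for a shell command


def is_stale(rule: Rule) -> bool:
    """The target is stale if it is missing or older than any dependency."""
    if not os.path.exists(rule.target):
        return True
    target_mtime = os.path.getmtime(rule.target)
    return any(os.path.getmtime(dep) > target_mtime for dep in rule.deps)


def build(rule: Rule) -> bool:
    """Run the command only when the target is stale; return True if it ran."""
    if is_stale(rule):
        rule.command()
        return True
    return False
```

Everything the rule decides is based on file names and modification times, which is the limitation the paragraph above points to.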

Because each problem and each variation on the technology requires a
different processing pipeline, it would be impossible to design a single
pipeline for every need. This paper describes a context-aware tool
that will support the first phase of NGS analysis.