Metagenomics is the study of metagenomes: genetic material sampled directly from a wide variety of
environments, without separation into individual organisms. Metagenomic approaches to the study of biological
communities are quickly changing our understanding of the function and inter-relationships among
living organisms in ecosystems. The rapid advances in metagenomics are largely due to the fast development
of high-throughput platforms for deoxyribonucleic acid (DNA) sequencing, which must be
accompanied by significant advances in data analysis techniques.
In this work, I set out to develop and apply new data analysis techniques suited to the large
volumes of data generated by metagenomics. This document presents a proposal to address the
challenges posed by the storage and manipulation of such data and the need for analysis
techniques designed specifically for this problem. To this end, the work aimed to harness
the power of parallel computing.
The intended outcome of this thesis was MetaGen-FRAME, a metagenomic framework capable of handling
heterogeneous data types (from DNA sequences to genome, proteome and metabolome annotations)
through the use of different data structures and computational approaches.