There are currently more than 600 bacterial species and 28 vertebrate species, ranging from primates to fishes, for which we know (nearly) their entire DNA sequences. These numbers will continue to increase rapidly over the next few years. Comparing these genome sequences has emerged as one of the most important areas of computational biology. For example, one way to predict functional portions of the human genome is to search among related genomes for sequences that appear to be remarkably similar due to purifying selection.
We do not yet have a semantic web as such; instead, we have a collection of semantic web technologies. These technologies have recently started to deliver on their promise of an interoperable world, particularly in data-driven initiatives that integrate data management with data analysis. In this presentation we will describe our own travails with identifying and putting to use data-driven representations of biomolecular repositories for biomarker studies.
The study of gene regulatory networks, as well as other biological networks, has recently yielded an increase in the number and detail of available models describing specific intracellular processes. The study of these models by means of analysis and simulation tools leads to numerous predictions representing the possible behaviours of the system. In order to validate these predictions, one must confront them with experimental data.
This talk provides an introductory overview of DNA sequencing, as well as of the algorithms and architectures used for sequence alignment. The presentation will start with a brief introduction to the DNA sequencing process. Afterwards, the optimal and heuristic algorithms for sequence alignment will be described, along with the data structures that usually support them. Special attention will be paid to approximate string matching algorithms, due to the considerable speedup that may be obtained by using this type of search.
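As a rough illustration of approximate string matching, the sketch below uses a classic dynamic-programming formulation that reports every position in a text where a pattern ends with at most a given number of edits (substitutions, insertions, deletions). It is not necessarily the algorithm covered in the talk; the function name `approx_find` and its parameters are hypothetical, chosen only for this example.

```python
def approx_find(pattern, text, max_errors):
    """Return (end_position, edits) pairs where `pattern` matches a
    substring of `text` ending at `end_position` (1-based) with at
    most `max_errors` edits. Illustrative sketch, O(len(pattern) * len(text))."""
    m = len(pattern)
    # prev[i] = fewest edits to align pattern[:i] with some substring
    # of the text ending at the previous text position.
    prev = list(range(m + 1))
    hits = []
    for j, ch in enumerate(text, start=1):
        # curr[0] stays 0 because a match may start at any text position.
        curr = [0] * (m + 1)
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            curr[i] = min(
                prev[i - 1] + cost,  # match or substitution
                prev[i] + 1,         # extra character in the text
                curr[i - 1] + 1,     # pattern character missing from the text
            )
        if curr[m] <= max_errors:
            hits.append((j, curr[m]))
        prev = curr
    return hits


exact_hits = approx_find("ACGT", "TTACGTTT", 0)
fuzzy_hits = approx_find("ACGT", "ACCT", 1)
```

Here `exact_hits` records the single exact occurrence, while `fuzzy_hits` tolerates one substitution. Heuristic aligners typically avoid filling the whole table, which is where the speedups mentioned above tend to come from.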
The aim of this work is to benchmark scoring functions used by Bayesian network learning algorithms in the context of classification. We considered both information-theoretic scores, such as LL, AIC, BIC/MDL, NML and MIT, and Bayesian scores, such as K2, BD, BDe and BDeu. We tested the scores in a classification task by learning the optimal TAN classifier with benchmark datasets. We conclude that, in general, information-theoretic scores perform better than Bayesian scores.
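To make the information-theoretic scores more concrete, the sketch below computes LL, AIC and BIC/MDL for a single categorical variable. It assumes the penalised log-likelihood convention in which higher scores are better (AIC = LL - k, BIC = LL - (k/2) ln N, with k free parameters and N samples); the work itself may use a different but equivalent scaling, and the function names are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(samples):
    """Maximised log-likelihood of a categorical distribution fitted
    to `samples` by relative frequencies."""
    counts = Counter(samples)
    n = len(samples)
    return sum(c * math.log(c / n) for c in counts.values())


def aic_score(ll, num_params):
    """AIC in penalised log-likelihood form (assumed convention)."""
    return ll - num_params


def bic_score(ll, num_params, num_samples):
    """BIC/MDL in penalised log-likelihood form (assumed convention)."""
    return ll - 0.5 * num_params * math.log(num_samples)


samples = ["a", "a", "b", "b"]
ll = log_likelihood(samples)
k = len(set(samples)) - 1  # free parameters of one categorical variable
aic = aic_score(ll, k)
bic = bic_score(ll, k, len(samples))
```

For a full Bayesian network, the log-likelihood and the parameter count would be summed over all variables given their parents; this example keeps a single variable to stay short.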
In this seminar I will be honoured to present myself to the local community and talk about the joys of cities with beautiful riverside landscapes. Incidentally, I might be caught talking about my research interests concerning the characterisation of conserved functional gene modules from heterogeneous high throughput data.
As ArrayExpress and other repositories of genome-wide experiments are reaching a mature size, it is becoming more meaningful to search for related experiments, given a particular study. We introduce methods that allow for the search to be based upon measurement data, instead of the more customary annotation data. The goal is to retrieve experiments in which the same biological processes are activated. This can be due either to experiments targeting the same biological question, or to as-yet unknown relationships.
The characterization and engineering of monoclonal antibodies is usually preceded by time-consuming Edman/cDNA sequencing steps for determination of the heavy and light chain sequences – a low-throughput pipeline that does not address post-translational modifications. In a departure from these platforms, we have developed the Comparative Shotgun Protein Sequencing (CSPS) suite of algorithms – a mass spectrometry based protein sequencing approach resulting in over 95% sequence coverage and automatic discovery of unexpected post-translational modifications.
This talk will address methods for the analysis and modeling of HIV evolution, including phylogenetics and the relationship between the genotype and phenotype of HIV.
Hunting disease genes is a problem of primary importance in biomedical research. Biologists usually approach this problem in two steps: first a set of candidate genes is identified using traditional positional cloning or high-throughput genomics techniques; second, these genes are further investigated and validated in the wet lab, one by one. To speed up discovery and limit the number of costly wet lab experiments, biologists must test the candidate genes starting with the most probable candidates.