A parallel algorithm for the extraction of structured motifs
We present a parallel algorithm for the efficient extraction of binding-site consensus from genomic sequences. This algorithm is based on an existing approach for extracting structured motifs. A structured motif consists of an ordered collection of p boxes, p substitution rates and p-1 distances between successive boxes. The contents of the boxes, which represent the extracted motifs, are unknown at the start of the process and are found by the algorithm using a suffix tree as the fundamental data structure.
By partitioning the structured motif searching space we divide the most demanding part of the algorithm by a number of processors that can be loosely coupled. In this way we obtain, under conditions that are easily met, a speedup that is linear on the number of available processing units. This speedup is verified by both theoretical and experimental analysis.