Wednesday, November 19, 2008

Genomic Sequence Search on Blue Gene/P

This afternoon I find myself in a talk about massively parallel sequence search.

I'll take a look at the conference proceedings and edit this to put in a reference to their paper.


As we know, sequence search is a fundamental tool in computational biology. A popular search algorithm is BLAST.

Genomic databases are growing faster than the compute capability of a single CPU (clock scaling has hit the power wall), so it takes more and more processors to complete a search in a timely manner. The BLAST algorithm is O(n^2) in the worst case.

These researchers are using mpiBLAST on the Blue Gene/P, with very high efficiency. mpiBLAST has seen many scalability improvements in the past few years, but it still did not scale well beyond several thousand processors. They have identified key design issues for scalable sequence search and have modified mpiBLAST to improve its performance at massive scales.

One limitation was the fixed worker-to-master mapping; another was the high overhead of fine-grained load balancing. Their optimizations allow arbitrary workers to be mapped to a master, and hide the load-balancing overhead with query prefetching.
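To convince myself why prefetching helps, here is a toy model in Python. It is only my own sketch, not the mpiBLAST code: the function name, the cost list and the fetch latency are all made up for illustration. Workers pull queries one at a time from a master. Without prefetching, every query pays the request latency before work can start. With prefetching, I assume the next query is requested while the current one is still being processed, so only the very first request is exposed.

```python
import heapq

def makespan(costs, n_workers, fetch_latency, prefetch):
    """Toy model of pull-based load balancing (hypothetical, not mpiBLAST).

    costs         -- processing time of each query, handed out in order
    fetch_latency -- time for a worker to obtain a query from its master
    prefetch      -- if True, assume each fetch after the first overlaps
                     fully with the processing of the previous query
    """
    # Each heap entry is the time at which a worker is ready for a new query.
    # With prefetching, a worker pays the fetch latency once, up front.
    start = fetch_latency if prefetch else 0.0
    ready = [start] * n_workers
    heapq.heapify(ready)

    finish = 0.0
    for cost in costs:
        t = heapq.heappop(ready)        # the earliest-free worker takes the query
        if not prefetch:
            t += fetch_latency          # it must wait for the query to arrive
        t += cost                       # then it processes the query
        finish = max(finish, t)
        heapq.heappush(ready, t)
    return finish
```

With costs [4, 1, 1, 1, 1, 4, 1, 1], two workers and a fetch latency of 0.5, this model finishes at 9.0 without prefetching and 8.5 with it. The gap grows with the number of queries per worker, which is why fine-grained balancing is so sensitive to this overhead.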

There are I/O challenges as well. They have implemented asynchronous two-phase I/O to get high throughput without forcing synchronization.
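As I understand the general idea of two-phase I/O, processes first exchange data so that each "aggregator" holds one contiguous region of the output, and then each aggregator issues a single large write. Below is an in-memory Python sketch of that idea, with the writes handed to a thread pool so the caller does not block. This is an assumption-laden illustration and not the authors' implementation: the function names are hypothetical, a bytearray stands in for the output file, and I assume fragments never straddle a region boundary.

```python
from concurrent.futures import ThreadPoolExecutor

def exchange(fragments, region_size, n_aggregators):
    """Phase 1: route each (offset, data) fragment to the aggregator
    that owns the file region containing that offset."""
    buckets = [[] for _ in range(n_aggregators)]
    for offset, data in fragments:
        buckets[offset // region_size].append((offset, data))
    return buckets

def write_region(buf, bucket):
    """Phase 2: one aggregator writes its fragments in offset order.
    `buf` is a bytearray standing in for the shared output file."""
    for offset, data in sorted(bucket):
        buf[offset:offset + len(data)] = data

def two_phase_write_async(pool, buf, fragments, region_size, n_aggregators):
    """Start the writes in the background and return futures, so the
    caller can keep computing and only wait when it needs the output."""
    buckets = exchange(fragments, region_size, n_aggregators)
    return [pool.submit(write_region, buf, b) for b in buckets]
```

A caller would create a ThreadPoolExecutor, call two_phase_write_async with its result fragments, carry on searching, and call result() on the returned futures only when the output actually has to be complete.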

They show 93% efficiency on 32,000 processors with their modified mpiBLAST.
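For a sense of scale, here is the usual parallel-efficiency arithmetic in Python. I did not note which baseline the 93% was measured against, so the single-processor default below is purely my assumption, and the helper names are my own.

```python
def parallel_efficiency(t_base, p_base, t_p, p):
    """Efficiency of a run on p processors relative to a baseline run
    on p_base processors: measured speedup divided by ideal speedup."""
    speedup = t_base / t_p
    ideal_speedup = p / p_base
    return speedup / ideal_speedup

def implied_speedup(efficiency, p, p_base=1):
    """Speedup over the baseline implied by a given efficiency.
    p_base=1 is an assumption; the talk's baseline may differ."""
    return efficiency * p / p_base
```

Under that assumption, implied_speedup(0.93, 32000) comes out at roughly 29,760, meaning the 32,000 processors would be doing the work of about 29,760 perfectly used ones.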
