Monday, November 17, 2008

Sun HPC Consortium

After taking a year off from attending the Sun HPC Consortium, I am glad I chose to attend again this year. (I have attended the Seattle and Tampa meetings, and skipped Reno - but still attended SC07.) It is always good to see what is in the pipeline at Sun, what a few Sun Partners have up their sleeves, and what other customers are doing. As Gregg mentioned over at Mental Burdocks, we had to sign an NDA to attend, so there are details we can't blog about. I will be blogging at a very high level rather than in specific detail, so I should be safe. Sun folks are aware of Gregg's blog, so he is also being careful to honor the NDA and in some cases has asked them what is and isn't OK to blog about.

Gregg is doing a great job of covering the sessions, so I'm just going to elaborate on a few things I found particularly exciting. First up is a potential GPGPU application for a project I have been working on:

For quite some time I have been working on software for the Institute for Molecular Biophysics (IMB), specifically for Joerg Bewersdorf, Dr. rer. nat., at The Jackson Laboratory. The software is a parallel implementation of the 3D sub-diffraction localization software for the Biplane FPALM microscope. The parallel version has been a great improvement over the serial version they run on their lab workstations. For example, a run that had taken over two days on the workstation completed in under an hour on our cluster. I don't remember how many cores that test used, but I don't think it was more than 40.

Despite the apparent success of this parallelization effort, I am nervous that they will eventually overwhelm our small cluster as they scale their problem size up dramatically. There may be some algorithmic optimizations that could help us get more out of our hardware, but right now a very large percentage of the time is spent in the FFTW library, and I have no hope of implementing a faster FFT than the FFTW authors. I may add an option to use single-precision floating point - I don't think double precision is a necessity, but I will have to talk to Dr. Bewersdorf about that. We also have the option of an off-site compute resource, but it would be nice to do the jobs in house, with a fast turnaround, and not spend a ton of money on more nodes that might sit idle when there isn't any biplane data to crunch.

What I have been thinking about this weekend is the NVIDIA Tesla. I have heard a lot here about fast FFTs on the Tesla, and adding some of their 1U 4-Tesla units to a small subset of our cluster nodes could very well give us all the computing power we need for this project. Below is a rough sketch of what that might look like; I will be doing more investigation at SC08.
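To make the idea concrete, here is a minimal sketch of the kind of GPU path I have in mind: one single-precision 3D complex-to-complex transform through cuFFT, which is roughly the operation I would want to benchmark against our current FFTW calls. The volume size, variable names, and the assumption that C2C transforms are what dominate are all placeholders of my own for illustration, not details from the actual Biplane FPALM code.

    /* Hypothetical sketch: a single-precision 3D complex-to-complex FFT on the
     * GPU via cuFFT, standing in for the FFTW calls in the real code.
     * Sizes and names are made up for illustration.
     * Build with something like: nvcc fft_sketch.c -lcufft -o fft_sketch */
    #include <stdio.h>
    #include <stdlib.h>
    #include <cuda_runtime.h>
    #include <cufft.h>

    int main(void)
    {
        const int nx = 256, ny = 256, nz = 64;        /* placeholder volume size */
        const size_t n = (size_t)nx * ny * nz;
        const size_t bytes = n * sizeof(cufftComplex);

        /* Fill a host buffer with dummy data; the real code would load image data. */
        cufftComplex *h_data = (cufftComplex *)malloc(bytes);
        for (size_t i = 0; i < n; i++) {
            h_data[i].x = (float)rand() / RAND_MAX;
            h_data[i].y = 0.0f;
        }

        /* Copy the volume to the GPU. */
        cufftComplex *d_data;
        cudaMalloc((void **)&d_data, bytes);
        cudaMemcpy(d_data, h_data, bytes, cudaMemcpyHostToDevice);

        /* Plan and execute a forward 3D FFT in place, single precision. */
        cufftHandle plan;
        cufftPlan3d(&plan, nx, ny, nz, CUFFT_C2C);
        cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);

        /* Copy the result back and clean up. */
        cudaMemcpy(h_data, d_data, bytes, cudaMemcpyDeviceToHost);
        cufftDestroy(plan);
        cudaFree(d_data);

        printf("First output element: %f + %fi\n", h_data[0].x, h_data[0].y);
        free(h_data);
        return 0;
    }

If timings like the ones I heard about this weekend hold up at our data sizes, and single precision turns out to be acceptable for the localization results, a handful of Tesla-equipped nodes could carry most of the FFT load without a big expansion of the cluster.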
