Wednesday, November 19, 2008

OpenMPI BoF

I am currently attending the OpenMPI BoF, led by Jeff Squyres of Cisco, one of the main OpenMPI developers. Prior to working at Cisco on OpenMPI, Jeff was part of the LAM/MPI project at Indiana University.

For a little background, OpenMPI is a project that was spawned when a bunch of MPI implementers got together and decided to join forces, since they were all working on basically the same thing. The merged projects include LAM/MPI (which we have been using at the Lab), FT-MPI, Sun CT 6, LA-MPI, and PACX-MPI.


What's new in 1.3 (to be released soon):
  • ConnectX XRC support
  • More scalability improvements
  • more compiler and run time environment support
  • fine-grained processor affinity control
  • MPI 2.1 compliant
  • notifier framework
  • better documentation
  • more architectures, more OSes, more batch systems
  • thread safety (some devices, point-to-point only)
  • MPI_REAL16, MPI_COMPLEX32 (optional, no clean way in C)
  • C++ binding improvements
  • valgrind (memchecker) support
  • updated ROMIO version
  • condensed error messages (MPI_Abort() only prints one error message)
  • lots of little improvements
Scalability
  • keep the same on-demand connection setup as prior version
  • decrease memory footprint
  • sparse groups and communicators (see the sketch after this list)
  • many improvements in OpenMPI run time system
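Quick aside (mine, not from the talk): the "sparse groups and communicators" item is about how OpenMPI stores ordinary MPI group/communicator objects internally, so nothing changes for applications - the objects in question are just the usual things you create with standard calls, roughly like this sketch:

    /* Illustrative only: creating a sub-communicator - the kind of object
     * whose group storage OpenMPI 1.3 can keep sparse internally. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, sub_rank;
        MPI_Comm sub;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* Split the world into two halves; each half gets its own communicator. */
        MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &sub);
        MPI_Comm_rank(sub, &sub_rank);
        printf("world rank %d -> sub rank %d\n", rank, sub_rank);

        MPI_Comm_free(&sub);
        MPI_Finalize();
        return 0;
    }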
Point-to-point Message Layer (PML)
  • improved latency
  • smaller memory footprint

collectives
  • more algorithms, more performance
  • special shared memory collective
  • hierarchical collective active by default
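For anyone not following along: these collective components sit behind the ordinary MPI collective calls, so application code doesn't change - the library picks the algorithm at run time (component selection can apparently be influenced with MCA parameters on the mpirun command line, though I didn't note the exact names). A plain sketch of the kind of call they accelerate:

    /* The shared-memory and hierarchical components serve ordinary
     * collective calls like this one; algorithm selection happens
     * inside the library, not in application code. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, local, sum = 0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        local = rank + 1;
        /* Global sum across all ranks - a typical collective operation. */
        MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %d\n", sum);

        MPI_Finalize();
        return 0;
    }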
OpenFabrics: now supports iWARP, not just InfiniBand. XRC support, message coalescing (resisted because it is only really useful for benchmarking). uDAPL improvements by Sun (not really OpenFabrics).

Fault Tolerance
  • coordinated checkpoint/restart
  • support BLCR and self (self means you give a function pointer to call for checkpoint; see the sketch after this list)
  • able to handle real process migration (i.e. change network type during migration)
  • improved message logging
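To make the "self" option a little more concrete, here's a rough sketch of the idea (my own reconstruction - I didn't write down the real OpenMPI callback names and signatures, so treat these as placeholders). With the BLCR path, by contrast, no application changes are needed; the whole process image is checkpointed externally.

    /* Hypothetical sketch of "self"-style checkpointing: the application
     * supplies functions the library calls at checkpoint, continue, and
     * restart time. Names and signatures here are placeholders, not the
     * real OpenMPI API. */
    #include <stdio.h>

    /* Called when a checkpoint is requested: save application state. */
    int my_checkpoint_cb(void)
    {
        FILE *f = fopen("app_state.ckpt", "w");
        if (!f) return -1;
        /* ... write whatever state is needed to resume later ... */
        fclose(f);
        return 0;
    }

    /* Called in the surviving process after the checkpoint is taken. */
    int my_continue_cb(void)
    {
        return 0; /* often nothing to do */
    }

    /* Called in the restarted process: reload the saved state. */
    int my_restart_cb(void)
    {
        FILE *f = fopen("app_state.ckpt", "r");
        if (!f) return -1;
        /* ... read the state back and rebuild in-memory structures ... */
        fclose(f);
        return 0;
    }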
OpenMPI on Roadrunner - scaling to 1 petaflop
  • reduce launch times by an order of magnitude
  • reliability: cleanup, robustness
  • maintainability: cleanup, simplify the code, remove everything not required for OMPI
  • routed out-of-band communications

Roadmap:
v1.4 in planning phase only, feature list not fully decided

run-time usability
  • parameter usability options
  • sysadmin lock certain parameter values
  • spelling checks, validity checks
run-time system improvements
  • next-gen launcher
  • integration with other run-time systems
more processor and memory affinity support, topology awareness

shared memory improvements: allocation sizes, sharing, scalability to manycore

I/O redirection features
  • line by line tagging
  • output multiplexing
  • "screen"-like features
Blocking progress
MPI connectivity map
refresh included software


Upcoming Challenges:
Fault tolerance, first step similar to the FT-MPI approach - if a rank dies, the rest of the ranks are still able to communicate; it is up to the programmer to detect the failure and recover if possible (see the sketch below)
Scalability at run time and MPI level
Collective communication - when to switch between algorithms, take advantage of physical topology
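On the fault-tolerance point above - "up to the programmer to detect and recover" basically means the application has to opt out of the default abort-on-error behavior and check error codes itself. Here is a minimal sketch of the detection side using standard MPI error handlers; it assumes an implementation that actually keeps the surviving ranks alive (which is exactly the future work being described), and it leaves the recovery policy to the application:

    /* Sketch of programmer-driven failure detection: ask MPI to return
     * error codes instead of aborting, then inspect them. Assumes an
     * implementation that keeps surviving ranks alive after a peer dies. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, err, sendval, recvval = -1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        sendval = rank;

        /* Default is MPI_ERRORS_ARE_FATAL; switch to getting codes back. */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

        if (rank < 2) {                 /* ranks 0 and 1 exchange a value */
            int peer = 1 - rank;
            err = MPI_Sendrecv(&sendval, 1, MPI_INT, peer, 0,
                               &recvval, 1, MPI_INT, peer, 0,
                               MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (err != MPI_SUCCESS) {
                char msg[MPI_MAX_ERROR_STRING];
                int len;
                MPI_Error_string(err, msg, &len);
                fprintf(stderr, "rank %d: exchange with %d failed: %s\n",
                        rank, peer, msg);
                /* Recovery (rebuild communicators, respawn, ...) is up to
                 * the application - that part is the open design question. */
            }
        }

        MPI_Finalize();
        return 0;
    }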

MPI Forum
HLRS is selling MPI 2.1 spec at cost $22 (586 pages), both #1353
what do you want in MPI 3.0?
what don't you want in MPI 3.0?


Feedback:
Question regarding combining OpenMPI with OpenMP: Jeff: yes and no - OpenMPI has better threading support now, but they can't guarantee it won't break yet; it should be fine with devices that support MPI_THREAD_MULTIPLE
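"MPI thread multiple" here is the MPI_THREAD_MULTIPLE support level - what you need if several OpenMP threads are going to make MPI calls at the same time (hybrid codes that only call MPI outside parallel regions can get by with lower levels like MPI_THREAD_FUNNELED). A minimal sketch of requesting and checking it, for reference:

    /* Minimal hybrid sketch: request full thread support and verify it
     * before letting multiple OpenMP threads make MPI calls. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int provided, rank;

        MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (provided < MPI_THREAD_MULTIPLE) {
            if (rank == 0)
                fprintf(stderr, "MPI_THREAD_MULTIPLE not available (got %d)\n",
                        provided);
            MPI_Finalize();
            return 1;
        }

        /* With MPI_THREAD_MULTIPLE, each thread may call MPI freely. */
        #pragma omp parallel
        {
            int size;
            MPI_Comm_size(MPI_COMM_WORLD, &size);
            printf("rank %d, thread %d of %d: world size is %d\n",
                   rank, omp_get_thread_num(), omp_get_num_threads(), size);
        }

        MPI_Finalize();
        return 0;
    }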

Can you compare OpenMPI with other MPI implementations? Jeff: We steal from them, they steal from us. Some say competition is good, but having many implementations available, especially on a single cluster, is confusing to users. Jeff would like to see more consolidation.

show of hands how important is...

thread safety (multiple threads making simultaneous MPI calls) - about 10 in a full room
Parallel I/O - only a few hands
one-sided operations - only a couple of users

2 comments:

Jeff Squyres said...

Thanks for the live blog account of the BOF!

I have posted the slides from the BOF here:

http://www.open-mpi.org/papers/sc-2008/

G said...

ha, I didn't know anyone would actually read this other than a handful of my co-workers. I hope I didn't have any mistakes now that I know Jeff read it...