For a little background, OpenMPI is a project that was spawned when a group of MPI implementers realized they were all building basically the same thing and decided to work together. The merged projects include LAM/MPI (which we have been using at the Lab), FT-MPI, Sun CT 6, LA-MPI, and PACX-MPI.
What's new in 1.3 (to be released soon):
- ConnectX XRC support
- More scalability improvements
- more compiler and run-time environment support
- fine-grained processor affinity control
- MPI 2.1 compliant
- notifier framework
- better documentation
- more architectures, more OSes, more batch systems
- thread safety (some devices, point-to-point only)
- MPI_REAL16, MPI_COMPLEX32 (optional; no clean way to support them in C)
- C++ binding improvements
- valgrind (memchecker) support
- updated ROMIO version
- condensed error messages (MPI_Abort() only prints one error message)
- lots of little improvements
- keep the same on-demand connection setup as prior version
- decrease memory footprint
- sparse groups and communicators
- many improvements in OpenMPI run time system
- improved latency
- smaller memory footprint
Collectives
- more algorithms, more performance
- special shared memory collective
- hierarchical collective active by default
Fault Tolerance
- coordinated checkpoint/restart
- supports BLCR and "self" (with self, you give a function pointer that gets called at checkpoint time; see the sketch after this list)
- able to handle real process migration (i.e. change network type during migration)
- improved message logging
- reduce launch times by an order of magnitude
- reliability: cleanup, robustness
- maintainability: cleanup, simplify the code, remove everything not required for OMPI
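To make the "self" checkpoint idea above concrete, here is a tiny sketch of the callback pattern being described. The names and the fake driver below are made up for illustration - this is not OpenMPI's actual checkpoint API, just the shape of "you give a function pointer to call for checkpoint."

/* Hypothetical sketch of the "self" checkpoint/restart callback pattern:
 * the application supplies function pointers that the runtime calls at
 * checkpoint, continue (same process keeps running), and restart (process
 * is rebuilt from the saved image) time.  All names here are invented for
 * illustration; they are not OpenMPI's real API. */
#include <stdio.h>

typedef struct {
    int (*checkpoint)(void); /* called right before state is saved        */
    int (*cont)(void);       /* called when the original process resumes  */
    int (*restart)(void);    /* called in the restarted process           */
} self_cr_callbacks_t;

static int my_checkpoint(void) { printf("flushing/saving app state\n");      return 0; }
static int my_continue(void)   { printf("resuming normal execution\n");      return 0; }
static int my_restart(void)    { printf("rebuilding state after restart\n"); return 0; }

/* Stand-in for the runtime: trigger one checkpoint and then continue. */
static void fake_runtime_checkpoint(const self_cr_callbacks_t *cb)
{
    cb->checkpoint();
    cb->cont();   /* in a real restart, cb->restart() would run instead */
}

int main(void)
{
    self_cr_callbacks_t cb = { my_checkpoint, my_continue, my_restart };
    fake_runtime_checkpoint(&cb);
    return 0;
}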
Roadmap:
v1.4 in planning phase only, feature list not fully decided
run-time usability
- parameter usability options
- sysadmin lock certain parameter values
- spelling checks, validity checks
- next-gen launcher
- integration with other run-time systems
shared memory improvements: allocation sizes, sharing, scalability to manycore
I/O redirection features
- line by line tagging
- output multiplexing
- "screen"-like features
MPI connectivity map
refresh included software
Upcoming Challenges:
Fault tolerance, first step similar to the FT-MPI approach - if a rank dies, the rest of the ranks are still able to communicate; it is up to the programmer to detect the failure and recover if possible (a minimal detection sketch follows this list)
Scalability at run time and MPI level
Collective communication - when to switch between algorithms, take advantage of physical topology
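Since detection and recovery are left to the programmer, here is a minimal sketch of what that looks like today using nothing but standard MPI error handling (no OpenMPI-specific fault-tolerance API assumed): switch the communicator to MPI_ERRORS_RETURN and check return codes instead of letting a failure abort the whole job.

/* Minimal failure-detection sketch using only standard MPI error handling.
 * By default errors are fatal; with MPI_ERRORS_RETURN a failed call hands
 * back an error code and the application decides what to do. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, rc, out, in;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Make communication failures on this communicator come back to the
     * caller instead of aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    out = rank;
    in  = -1;
    if ((rank ^ 1) < size) {           /* exchange with a neighbor rank */
        rc = MPI_Sendrecv(&out, 1, MPI_INT, rank ^ 1, 0,
                          &in,  1, MPI_INT, rank ^ 1, 0,
                          MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        if (rc != MPI_SUCCESS) {
            char msg[MPI_MAX_ERROR_STRING];
            int len;
            MPI_Error_string(rc, msg, &len);
            fprintf(stderr, "rank %d: exchange with %d failed: %s\n",
                    rank, rank ^ 1, msg);
            /* Recovery (retry, redistribute work, clean shutdown) is
             * entirely up to the application, as noted above. */
        }
    }

    MPI_Finalize();
    return 0;
}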
MPI Forum
HLRS is selling the MPI 2.1 spec at cost, $22 (586 pages), at booth #1353
what do you want in MPI 3.0?
what don't you want in MPI 3.0?
Feedback:
Question regarding combining OpenMPI with OpenMP: Jeff: yes and no; OpenMPI has better threading support now, but they can't yet guarantee it won't break - it should be fine with devices that support MPI_THREAD_MULTIPLE (see the sketch below)
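For reference, the way a hybrid code asks for that level of support is MPI_Init_thread; this little sketch (standard MPI plus OpenMP, nothing OpenMPI-specific) requests MPI_THREAD_MULTIPLE and checks what was actually granted before letting several threads call MPI at once.

/* Minimal hybrid MPI + OpenMP sketch: request full thread support and
 * verify what the library granted before multiple threads make
 * concurrent MPI calls. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (provided < MPI_THREAD_MULTIPLE) {
        if (rank == 0)
            fprintf(stderr, "only thread level %d provided; "
                            "keep MPI calls on a single thread\n", provided);
    } else {
        /* With MPI_THREAD_MULTIPLE, these point-to-point calls may legally
         * run concurrently from different threads.  Each thread exchanges
         * a message with its own rank, tagged by thread id. */
        #pragma omp parallel
        {
            int tid = omp_get_thread_num();
            int out = tid, in = -1;
            MPI_Sendrecv(&out, 1, MPI_INT, rank, tid,
                         &in,  1, MPI_INT, rank, tid,
                         MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
    }

    MPI_Finalize();
    return 0;
}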
Can you compare OpenMPI with other MPI implementations? Jeff: We steal from them, they steal from us. Some say competition is good, but having many implementations available, especially on a single cluster, is confusing to users. Jeff would like to see more consolidation.
Show of hands: how important is...
- thread safety (multiple threads making simultaneous MPI calls)? About 10 hands in a full room
- parallel I/O? Only a few hands
- one-sided operations? Only a couple of users (a minimal example of what these look like is below)
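For context on that last item, one-sided operations are MPI-2's put/get on memory windows - one rank writes directly into another rank's exposed memory without a matching receive. A minimal sketch using only standard MPI-2 calls:

/* Minimal MPI-2 one-sided sketch: each rank exposes one int in a window
 * and rank 0 writes into rank 1's window with MPI_Put. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, local = -1, value = 99;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Every rank exposes its `local` variable for remote access. */
    MPI_Win_create(&local, (MPI_Aint)sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);
    if (rank == 0 && size > 1) {
        /* Write `value` into rank 1's window; rank 1 posts no matching
         * receive -- that is the "one-sided" part. */
        MPI_Put(&value, 1, MPI_INT, 1, 0, 1, MPI_INT, win);
    }
    MPI_Win_fence(0, win);

    if (rank == 1)
        printf("rank 1 sees local = %d\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}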
2 comments:
Thanks for the live blog account of the BOF!
I have posted the slides from the BOF here:
http://www.open-mpi.org/papers/sc-2008/
ha, I didn't know anyone would actually read this other than a handful of my co-workers. I hope I didn't have any mistakes now that I know Jeff read it...