Thursday, November 19, 2009

Thursday Keynote

Today, the thrust area highlighted in the keynote is Sustainability. The speaker will be former Vice President Al Gore. Right now we are hearing from the SC '10 chair inviting us all to New Orleans next year.

Technology thrusts for '10 include Climate Simulation, Heterogeneous Computing (e.g. combining x86 processors with GPU processors), and Data Intensive Computing.

SC '10 will have more exhibitor space than ever and 300+ Gbps of network connectivity.


Now up, Al Gore.

"I used to be the next President of the United States" *laughter* "I don't think that is funny"

Al is a "recovering politician", talking about working to help establish funding for supercomputing centers and the Internet during his political career. AL GORE INVENTED SUPERCOMPUTERS. Al Gore was actually present at the first SC conference in 1988 in Orlando, FL.

After a few more funny stories about life after politics (people telling him he looks just like Al Gore at restaurants), and an apology for 1997, when he was supposed to attend but had to fill in for President Clinton at another event, he is now talking about how important the work we are doing is. Technology is important for the discoveries themselves, but also so that developing countries can skip the pollution-intensive phase developed countries have gone through. More urban development will occur in the next 35 years than has occurred in the history of civilization. Modeling will help develop these new urban areas to be more sustainable and less automobile-centric.

Supercomputing was a transformative development, similar to the invention of the telescope or another revolutionary scientific instrument.

90 million tons of global warming pollution go into the atmosphere every day, and we now know how thin our atmosphere really is. It is not as vast as it appears from our perspective; images from space have confirmed it is a very thin shell and not impervious to our actions. Oil has been the dominant source of energy since the world's first oil well was drilled in Pennsylvania in 1859. In the last 100 years the world's population has quadrupled, and it should stabilize at ~9 billion around 2050.

The climate change crisis is actually the second major climate crisis; the first was the effect of CFCs on the ozone layer and the large hole that developed over Antarctica. One year after the discovery of the hole, a treaty was developed to phase out the chemicals responsible. This was criticized as too weak, but tougher and tougher standards were set as alternatives were developed, and now it has been a success. The same holds true for climate change: any politically acceptable treaty will not be enough, but it will be a start. We need to introduce systematic changes that produce much higher levels of efficiency, and supercomputing will be key to replacing terribly inefficient technology such as the internal combustion engine. If you measure the energy used to move a person with an automobile, the car is about 99.92% inefficient once you account for how much energy actually goes to moving the passenger compared to moving the vehicle or what is lost to inefficiencies. As these technologies are developed it will start to make business sense to move to less polluting technology. Coal-fired generation is also inefficient: only about 1/3 of the potential energy becomes electricity, and the rest is lost mostly as waste heat. Solar and wind need supercomputing too.
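As a rough illustration of where a number like that comes from, here is a back-of-envelope calculation with assumed inputs (the engine efficiency and masses below are my guesses, not figures from the talk): multiply the fraction of fuel energy that reaches the wheels by the passenger's share of the total mass being moved.

```cpp
// Back-of-envelope estimate of passenger-moving efficiency for a car.
// All inputs are illustrative assumptions, not figures from the keynote.
#include <cstdio>

int main() {
    const double tank_to_wheel = 0.20;   // assumed fraction of fuel energy reaching the wheels
    const double passenger_kg  = 80.0;   // assumed passenger mass
    const double vehicle_kg    = 1800.0; // assumed vehicle mass

    // Fraction of delivered energy that goes into moving the passenger,
    // treating energy use as roughly proportional to mass moved.
    double useful = tank_to_wheel * (passenger_kg / (vehicle_kg + passenger_kg));

    std::printf("Effective efficiency: %.2f%% (i.e. ~%.2f%% inefficient)\n",
                useful * 100.0, (1.0 - useful) * 100.0);
    return 0;
}
```

With these particular guesses the car comes out roughly 99% inefficient at moving the passenger; different assumptions give figures in the same ballpark as the one quoted in the talk.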

Moore's Law was a self-fulfilling expectation, not a law. It was anticipated how much of a revolution this growth in computational power would be, so R&D resources were allocated to allow the growth. By next year there will be more than 1 billion transistors per person in the world. We can transform our energy system by recognizing the potential and making the necessary R&D investments. As long as we send hundreds of billions of dollars every year to other countries for energy, we are vulnerable. We need to stay the course; after the drop in fuel prices people do not see it as much of a crisis - they are in a temporary trance.

Supercomputing vastly expands our ability to understand complex realities. How do we, as human beings, individually relate to a tool as powerful as supercomputing? The 3D internet (or, as the CTO of Intel told us, more accurately the 3D Web) will revolutionize the interface between humans and incredibly powerful machines. Biocomputing will revolutionize the treatment of disease. Modeling will revolutionize alternative energy.

Climate modeling will help develop political consensus to act quickly to solve the crisis. Without action, climate change can threaten human civilization. It is a planetary emergency.

short advertisement:
repoweramerica.org, 100% of proceeds from the sale of his new book will be donated to environmental causes

The environmental crisis does not trigger our instinctive fear responses to a threat; it requires reasoning. Since the distance between causes and consequences is so long, we need to make the global scale and speed of this crisis obvious. Climate modeling and supercomputing will be critical.

We are not operating in a sustainable manner, understanding and responding is the challenge of our time.

Solar is promising; the key is driving scalability and reductions in cost (a Moore's Law for solar technology). We need a distributed energy architecture. Solar, wind, geothermal, and biomass are all important, as is new nuclear technology, though nuclear's global scalability is limited by the need to control proliferation. Electric vehicles can act as a distributed battery for peak electricity loads.

Reducing deforestation is also an important challenge. Deforestation is responsible for significant CO2 emissions, and the loss of genetic information (biodiversity) matters as well. Selling rain forest for wood is like selling a computer chip for the silicon: the real value is the biodiversity (drugs, biofuels, etc.).

We are challenged to apply our skills to build a consensus to move forward quickly to solve the climate crisis.

Wednesday, November 18, 2009

Data Challenges in Genome Analysis

COMING SOON

The 3D Internet

Apologies Dave/Keith - I am definitely not as thorough blogging these things as Gregg.


One of the "thrust areas" for SC '09 was "the 3D internet", and this was the focus of the opening address Tuesday morning. The speaker was Justin Rattner (CTO Intel), but before Justin could take the stage we were addressed by the SC '09 chair, Wilf Pinfold, gave us a run down of the conference.

Despite some worries about the economic downturn impacting the conference, the numbers were still strong. Wilf informed us that all 350+ booths were sold, and there are 265,000+ square feet of exhibition space. 204 miles of fiber were run throughout the conference center, and with the support of local internet providers, 400 Gbps of internet connectivity was provided to the conference center for the duration of SC 09. This year there was a 22% acceptance rate for technical papers, highlighting that this is a premier conference in high performance computing. This SC also made an effort to be more sustainable - it is the first SC held in a LEED-certified conference center, all plastic is plant-based and biodegradable, and plastic water bottles have been replaced by water coolers. Also, the conference center has recycling bins throughout, which appears to be the norm for Portland.

So on to Intel CTO Justin Rattner and the 3D Web. Justin noted how revenue in HPC has been growing very slowly (almost flat?), and that the field relies mostly on government funding for R&D and on "trickle-up" technology from the consumer space (e.g. Intel/AMD processors, GPUs, ...). He thinks that the 3D Web can be the "killer app" for HPC and really help propel it to the next level. The 3D Web will be continuously simulated, multi-view, immersive, and collaborative. A demo was given of ScienceSim, specifically Utah State's FernLand, a fern lifecycle and population genetics simulation. ScienceSim is based on OpenSim, and the idea is to standardize this technology so we have interoperability between virtual worlds. After a virtual interaction with the FernLand researcher in ScienceSim, Justin welcomed Shenlei Winkler of the Fashion Institute of Technology on stage.

Shenlei discussed how the fashion industry is fairly unique in that it is a more-than-a-trillion-dollar industry yet has avoided heavy computerization. At FIT they are exploring using OpenSim to replace the traditional workflow, where a designer comes up with a design, which is sent to a factory in another country where a prototype run is made and shipped back for evaluation, and this is repeated as the design is refined. With the 3D Web they can do this iterative process in a virtual world, cutting time, cost, and environmental impact. She said one of the important developments would be cloth simulation, so designers can see how different fabrics drape and flow. Justin showed some impressive demonstrations of cloth physics simulations that were computationally intensive (~6 minutes per frame on a small cluster; much more HPC power will be needed for real-time cloth physics in virtual worlds).

Justin wrapped things up with a demonstration of a system with an Intel "Larrabee" coprocessor. Matrix mathematics was offloaded to this coprocessor, achieving 1 teraflop on a single over-clocked chip - supposedly the first time 1 teraflop has been achieved on a single chip. Programming tools also need to improve to take advantage of this type of architecture, and Ct was discussed. Ct is a high-throughput, C++-based language that takes some of the effort of programming this type of architecture off the shoulders of the programmer.
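Ct's actual syntax wasn't shown in detail, so here is a plain C++/OpenMP stand-in (my own illustration, not Ct code) for the kind of dense matrix kernel that was offloaded in the demo. The point of a language like Ct is to express this data-parallel pattern more declaratively so the runtime can retarget it to hardware like Larrabee instead of the programmer hand-tuning loops.

```cpp
// Not Ct code: a plain C++/OpenMP stand-in for the kind of dense
// single-precision matrix kernel offloaded to the coprocessor in the demo.
#include <vector>

// C = A * B for n x n matrices stored row-major.  Compile with -fopenmp
// (or equivalent) to parallelize the outer loop across cores.
void sgemm_naive(const std::vector<float>& A, const std::vector<float>& B,
                 std::vector<float>& C, int n) {
    #pragma omp parallel for
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < n; ++k)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
    }
}
```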

Tuesday, November 17, 2009

FOR THOSE THAT DOUBT ME


Here I am with the father of Beowulf computing, Donald Becker. Don wrote a lot of the early Ethernet drivers for Linux and invented the concept of Beowulf computing. This would be like having your picture taken with Henry Ford if you were in the automotive business. I think I was supposed to hold him up for a keg stand of some special beer that a "secret" Portland-based HPC-related company (cough *Portland Group* cough) commissioned for the Beobash. The only problem is that Josh from Penguin Computing had to leave, and he was the other party involved in this task.

Anyway, we are having a TORQUE meeting tomorrow after the Top 500 announcement. If for some crazy reason anyone reading this is at SC '09, meet me at the Top 500 BoF Tuesday November 17th and you can talk about TORQUE with me and other users/developers at a venue to be determined.

Monday, November 16, 2009

SC '09

I'm here at SC '09. I got into my hotel last night after a long trip from Bangor, Maine. Today I stopped by the convention center and took care of my registration, and then checked out a little bit of downtown via the Max light rail (the convention center and downtown area are in the "free zone" so it is very convenient). The large (~255,000 square foot according to the website) exhibition opens tonight at 5:00PM, and the technical program kicks off tomorrow morning, so I should have some more interesting things to blog about soon. I still need to plan my schedule for the next few days to make sure I get in all of the talks that look interesting.

On a side note, I know Portland, ME is much smaller than Portland, OR and this wouldn't really be feasible, but how cool would it be to have light rail connecting the Old Port, Downtown, PWM, the Amtrak station, the Maine Mall, etc, with spurs out to surrounding communities (South Portland, Scarborough, Saco, Westbrook, Yarmouth, Freeport, etc). It would make greater Portland a very "green" city.

Monday, June 15, 2009

A Rational Design Process

In grad school I took a software engineering course that was based largely on reading classic software engineering papers and discussing them. Occasionally we also implemented some of the example systems described in the papers (for example, we implemented the simulator described in Leslie Lamport's Time, Clocks, and the Ordering of Events in a Distributed System and the KWIC index system described in David Parnas's On the Criteria To Be Used in Decomposing Systems into Modules), but actual coding was not the focus of the course.
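As a refresher, the KWIC system from Parnas's paper is small enough to sketch in a few lines. This is my own minimal version (every circular shift of each input line, sorted alphabetically), not the course's implementation; the interesting part of the paper is of course how you modularize it, not the code itself.

```cpp
// Minimal KWIC (KeyWord In Context) sketch: for each input line, emit every
// circular shift of its words, then print all shifts in sorted order.
#include <algorithm>
#include <cstddef>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

int main() {
    std::vector<std::string> shifts;
    std::string line;
    while (std::getline(std::cin, line)) {
        std::istringstream iss(line);
        std::vector<std::string> words{std::istream_iterator<std::string>(iss),
                                       std::istream_iterator<std::string>()};
        for (std::size_t s = 0; s < words.size(); ++s) {  // one shift per word
            std::string shifted;
            for (std::size_t i = 0; i < words.size(); ++i) {
                if (i) shifted += ' ';
                shifted += words[(s + i) % words.size()];
            }
            shifts.push_back(shifted);
        }
    }
    std::sort(shifts.begin(), shifts.end());
    for (const auto& s : shifts) std::cout << s << '\n';
    return 0;
}
```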

One of the classic papers I remember most from this course is A Rational Design Process: How and Why to Fake It by David Parnas and Paul Clements. This is one of the first classic software engineering papers (before agile programming, extreme programming, or whatever the latest buzzword fad is) where the authors acknowledge that a formal software design process will always be an idealization. We had spent much of the semester reading about the idealized process, including the aforementioned On the Criteria... paper, and this paper really brought out some interesting thoughts during our discussions.



Among the many examples Parnas and Clements give as to why the process is an idealization is that, in most cases, the people who request the software system in the first place do not know exactly what they want and are unable to communicate everything that they do know. I think most software engineers with any experience trying to use a formal development process can attest to this observation.

The authors also note that even if the engineers knew all of the requirements up front, other details needed to implement a system that satisfies those requirements do not become known until the implementation is already underway. Had the engineers known this information up front, it is likely that the design would have been different. One of the most important points made is that even if we did know all of the facts up front, human beings are unable to fully understand all of the details that must be taken into account in order to design and build a correct (and ideal) system.

The process of designing software, according to Parnas and Clements, is a process by which we attempt to separate concerns so that we are working with manageable amounts of information at any one time. However, the catch-22 is that until we have separated the concerns, we are bound to make errors due to the complexity of the system. Furthermore, as long as there are humans involved there will be human error, even after the concerns have been separated by the design process.

Parnas and Clements make several other points, including the fact that even the most trivial projects are subject to change for external reasons, and these changes can invalidate design decisions or make them non-ideal under the new circumstances.

Sometimes we use a project to try out a new idea - an idea that may not be derived from the requirements in a rational process.

However, despite the fact that the rational process is unrealistic, it is still useful. It can be used to drive the project, but deviations and backtracking should be expected. In the end, the authors argue that we should "fake it" and produce quality documentation written as if the final design of the system had been reached by following the ideal rational process. We identify what documents would have been produced had we followed the ideal process, and attempt to produce those documents in the proper order. Any time information is not known, we make a note of it in the documentation where that information would have gone and move on, treating it as something expected to change. Any time a design error is found, all of the documentation (including documents made in "previous steps" of the process) must be corrected. No matter how many times we backtrack, or how often some external influence changes the requirements, the documentation is updated, and in the end the final documents are rational and accurate.

Documentation plays a major role in the process, yet many times we don't seem to get it right.

Most programmers regard documentation as a necessary evil, written as an afterthought only because some bureaucrat requires it. They don't expect it to be useful.

This is a self fulfilling prophecy; documentation that has not been used before it is published, documentation that is not important to its author, will always be poor documentation.

Most of that documentation is incomplete and inaccurate but those are not the main problems. If those were the main problems, the documents could be easily corrected by adding or correcting information. In fact, there are underlying organisational problems that lead to incompleteness and incorrectness and those problems ... are not easily repaired [1]


I think this classic paper is a must-read for anyone trying to implement a strict waterfall process (or trying to impose such a process on their underlings). Despite being written over 20 years ago, it seems there are still lessons to be learned.

[1] Parnas, D. L. and Clements, P. C. 1986. A rational design process: How and why to fake it. IEEE Trans. Softw. Eng. 12, 2 (Feb. 1986), 251-257.

Thursday, November 20, 2008

Traditional HPC vs Distributed HPC

What has changed, what has remained the same. (or is the grid dead?)

I'm just typing the comments/questions that come up during this BoF.



At this BoF, the organizers noted that while TeraGrid has the word "grid" in it, and the systems have grid interfaces (Globus), most users use the resources as individual HPC systems - either a single system or a few in sequence.

CaBIG/CaGrid was brought up by someone in the audience as "barely grid computing" - mostly data access, but a possible example of a successful grid.

One audience member mentions that middleware has fallen behind; Globus is difficult to install and get working.

Another audience member wants to know exactly what grid computing is and what the goals are. Is the goal for it to be automatic? If a user has to specify all the distributed resources, it would be too difficult to use.

One audience member asked who has programmed with the low-level Globus API; three people raised their hands, and he noted that was actually a large number. He says the Web Services APIs that Globus pushes have too much overhead. He also mentions that local schedulers are an issue - it is difficult to coordinate multiple resources to be available at the same time.

One audience member says "a grid project hijacked OGF (Open Grid Forum), pushed Web Services, and then abandoned OGF." He wouldn't mention the project by name, but it was understood he meant Globus. He also asks whether the additional level of complexity is worth it, since distributed computing does add complexity. No one really knows the answers.

Another audience member asks, "do we see a future in grid computing given all the problems?" One of the organizers responds that she doesn't see a big difference between distributed computing and traditional HPC; she has done distributed computing, and the Grid vision is to automate some of it and make it more accessible.

Comment from the audience: "a lot of people are doing ad-hoc distributed computing." He sees a future in grid computing for sharing medical information.

Can we do everything on large systems, or do we really need to span multiple systems? TACC says it is hard to move data around the country, and that drives people to use a single system. File systems are the biggest inhibitors to performance. If the data is naturally distributed (gathered or generated in separate locations), that helps drive people to use distributed computing.

One audience member put up a slide:

Why use distributed resources:
  • higher availability
  • peak requirement not met on any one system (e.g. MPICH-G2)
  • easy to substitute a single long-running simulation with multiple smaller runs
  • ease of modular and incremental growth
  • automatic spread of resource requirements

One audience member says nothing exciting has happened in the industry, specifically in grid computing; all the problems are still the same. Another audience member says that the advances have been in the applications, which are much more complicated, and workflows are much more complicated - perhaps this is what makes progress on grid computing so slow.


One problem noted is the lack of funding for middleware - some of the middleware being developed is business-oriented and not targeted at science users.

Big argument about why we are stuck with MPI, which is at such a low level. There were parallel languages in the 70s-80s that some people argue were much better than MPI; another person argues that they just didn't work - there was too much overhead and they didn't scale. MPI may be less elegant to program in, but it does work and it can scale. "So these advanced tools didn't work, so we're stuck with MPI? Well, that is a sorry state."
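To see what "such a low level" means in practice, here is a minimal MPI point-to-point example (a generic sketch of mine, not code from the BoF): every buffer, datatype, rank, and tag is spelled out by hand, which is exactly the bookkeeping the higher-level parallel languages tried to hide.

```cpp
// Minimal MPI ping: rank 0 sends a buffer to rank 1, which receives it.
// Every buffer, type, destination rank, and tag is explicit -- the
// "low level" programming model the discussion is complaining about.
#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> buf(1024, 0.0);
    const int tag = 42;

    if (rank == 0) {
        MPI_Send(buf.data(), (int)buf.size(), MPI_DOUBLE, 1, tag, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Status status;
        MPI_Recv(buf.data(), (int)buf.size(), MPI_DOUBLE, 0, tag, MPI_COMM_WORLD, &status);
        std::printf("rank 1 received %zu doubles\n", buf.size());
    }

    MPI_Finalize();
    return 0;
}
```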


Not everyone belongs in distributed computing - it may take longer to prepare a problem for the grid or distributed computing than it would take to just run it on a cluster.


Overall themes:

The continuing improvements in price/performance of HPC reduce the need for distributed computing.

Lack of standards in middleware hinders use of distributed machines.

Globus is hard to use. Need for better tools.

Is the effort worth the return?

Wednesday, November 19, 2008

The G word isn't Grid anymore

Green500 BoF

Grid isn't the hot buzzword in HPC anymore; it is "Green". This is evident walking around the show floor and seeing how many vendors are touting the "greenness" of their solutions.

With the increased pressure on reducing our carbon footprint, there is now a movement to not only improve the speed of the fastest computers in the world, but to also improve the efficiency with regards to power consumption. If power consumption continues to grow at the current rate (linear or even superlinear) with respect to performance, in the near future a large top-10 cluster could conceivably require its own power plant to operate. There is a desire to level off the power consumption of these large supercomputers and be able to increase performance without increasing power requirements.

The Green 500:
The metric is flops per watt, measured while running the Linpack benchmark, and a system must make the regular Top500 list to be ranked. This list, which I believe was started last year, lets us know how the fastest computers in the world rank when taking power consumption into account. Perhaps making the top of the Green 500 will come with the same prestige as making the top of the Top 500. The major topic of this BoF is how the Green 500 should be redesigned to provide maximum utility. What benchmark, or how many benchmarks, should be run? Should we have multiple metrics? If so, how would they be weighted in order to rank the systems? Or should they be condensed into a single score? What should be measured? Just the computer? The entire data center?
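For concreteness, the current ranking boils down to a single ratio. A minimal sketch with made-up numbers (the 450 TFLOPS and 2.3 MW figures below are hypothetical, not from the BoF):

```cpp
// The Green500 ranking metric: sustained Linpack performance divided by
// measured power draw, usually quoted in MFLOPS per watt.
#include <cstdio>

double mflops_per_watt(double rmax_gflops, double power_kw) {
    return (rmax_gflops * 1000.0) / (power_kw * 1000.0);  // MFLOPS / W
}

int main() {
    // Hypothetical system: 450 TFLOPS sustained Linpack at 2.3 MW.
    std::printf("%.1f MFLOPS/W\n", mflops_per_watt(450000.0, 2300.0));
    return 0;
}
```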

Update:
Fist fights
At the Green 500 BoF, a fist fight almost broke out (not really, but the discussion got very "lively") between someone who advocated using single-precision math for the early part of a computation, then switching to double precision to converge to the final solution, and someone who thought this was gaming the system. The idea is "green" because many processors have much better single-precision performance than double-precision - over 10x better on a Cell, 4x better on a GPGPU, probably even 2x on an x86. This lets more work get done with fewer processors or in less time, consuming less power. Should "green algorithms" be allowed, or even encouraged, for the benchmark?
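To make the single-versus-double argument concrete, here is a minimal sketch of mixed-precision iterative refinement, the standard trick behind the proposal: do the expensive solve in single precision, then compute residuals and corrections in double until the answer converges. This is my own illustration, not anyone's benchmark code; the matrix is made up, and a real code would use a pivoted LU from a library rather than naive elimination.

```cpp
// Mixed-precision iterative refinement sketch for Ax = b:
// solve in FLOAT (fast on Cell/GPU-era hardware), refine in DOUBLE.
#include <cstdio>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<std::vector<double>>;

// Solve Ax = b with naive Gaussian elimination carried out in single precision.
Vec solve_single(const Mat& A, const Vec& b) {
    const int n = (int)b.size();
    std::vector<std::vector<float>> M(n, std::vector<float>(n + 1));
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) M[i][j] = (float)A[i][j];
        M[i][n] = (float)b[i];
    }
    for (int k = 0; k < n; ++k)                  // forward elimination
        for (int i = k + 1; i < n; ++i) {
            float f = M[i][k] / M[k][k];
            for (int j = k; j <= n; ++j) M[i][j] -= f * M[k][j];
        }
    Vec x(n);
    for (int i = n - 1; i >= 0; --i) {           // back substitution
        float s = M[i][n];
        for (int j = i + 1; j < n; ++j) s -= M[i][j] * (float)x[j];
        x[i] = s / M[i][i];
    }
    return x;
}

int main() {
    Mat A = {{4, 1, 0}, {1, 3, 1}, {0, 1, 2}};   // small illustrative system
    Vec b = {1, 2, 3}, x = solve_single(A, b);

    for (int iter = 0; iter < 5; ++iter) {       // refinement loop in double
        Vec r(b.size());
        for (size_t i = 0; i < b.size(); ++i) {
            r[i] = b[i];
            for (size_t j = 0; j < b.size(); ++j) r[i] -= A[i][j] * x[j];
        }
        Vec d = solve_single(A, r);              // correction solved in single
        double rnorm = 0.0;
        for (size_t i = 0; i < b.size(); ++i) { x[i] += d[i]; rnorm += r[i] * r[i]; }
        std::printf("iter %d residual %.3e\n", iter, std::sqrt(rnorm));
    }
    return 0;
}
```

The bulk of the flops land in the single-precision solves, while the double-precision residual loop recovers full accuracy, which is why the approach saves power without (its advocates argue) gaming the answer.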

There was also discussion of creating various classes of systems, either by price or performance. One problem is that a megaflops per watt ranking favors small systems.

Gregg (who tagged along to steal power) posted his thoughts over at his blog. He tried to think of it from a CIO type perspective.

Genomic Sequence search on Blue Gene/P

This afternoon I find myself in a talk about massively parallel sequence search.

I'll take a look at the conference proceedings and edit this to put in a reference to their paper.


As we know, sequence search is a fundamental tool in computational biology. A popular search algorithm is BLAST.

Genomic databases are growing faster than the compute capability of single CPUs (clock scaling has hit the power wall). This requires more and more processors to complete a search in a timely manner. The BLAST algorithm is O(n^2) in the worst case.

These researchers are using mpiBLAST on the Blue Gene/P with very high efficiency. There have been many scalability improvements to mpiBLAST in the past few years, but it still didn't scale well beyond several thousand processors. They have identified key design issues in scalable sequence search and made modifications to mpiBLAST to improve its performance at massive scales.

One limitation was the fixed worker-to-master mapping and the high overhead of fine-grained load balancing. Their optimizations include allowing arbitrary workers to map to a master and hiding load-balancing overhead with query prefetching.
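For context, the master/worker pattern being optimized looks roughly like the sketch below. This is not mpiBLAST's actual code, just a generic MPI skeleton of mine: the master hands out query indices on demand so faster workers naturally get more work, and prefetching the next task while the current search runs is what hides the load-balancing overhead.

```cpp
// Generic MPI master/worker skeleton (not mpiBLAST's actual code): the master
// hands out query indices on demand; workers ask for more work as they finish.
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int num_queries = 1000, TAG_WORK = 1, TAG_REQ = 2, STOP = -1;

    if (rank == 0) {                                   // master
        int next = 0, active = size - 1, dummy;
        MPI_Status st;
        while (active > 0) {
            MPI_Recv(&dummy, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQ, MPI_COMM_WORLD, &st);
            int task = (next < num_queries) ? next++ : STOP;
            if (task == STOP) --active;
            MPI_Send(&task, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
        }
    } else {                                           // worker
        int task = 0, request = 0;
        while (true) {
            MPI_Send(&request, 1, MPI_INT, 0, TAG_REQ, MPI_COMM_WORLD);   // ask for work
            MPI_Recv(&task, 1, MPI_INT, 0, TAG_WORK, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (task == STOP) break;
            /* run the sequence search for query 'task' here; a prefetching
               version would request the next task before searching this one */
        }
    }

    MPI_Finalize();
    return 0;
}
```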

There are I/O challenges as well. They have implemented asynchronous two-phase I/O to get high throughput without forcing synchronization.

They show 93% efficiency on 32,000 processors with their modified mpiBLAST.

OpenMPI BoF

I am currently attending the OpenMPI BoF, being led by Jeff Squyres of Cisco, one of the main OpenMPI developers. Prior to working at Cisco on OpenMPI, Jeff was part of the LAM/MPI project at Indiana University.

For a little background, OpenMPI is a project that was spawned when a bunch of MPI implementers got together and decided to work together, since they were all working on basically the same thing. The founding projects included LAM/MPI (which we have been using at the Lab), FT-MPI, Sun CT 6, LA-MPI, and PACX-MPI.


What's new in 1.3 (to be released soon):
  • ConnectX XRC support
  • More scalability improvements
  • more compiler and runtime environment support
  • fine-grained processor affinity control
  • MPI 2.1 compliant
  • notifier framework
  • better documentation
  • more architectures, more OSes, more batch systems
  • thread safety (some devices, point-to-point only)
  • MPI_REAL16, MPI_COMPLEX32 (optional, no clean way in C)
  • C++ binding improvements
  • valgrind (memchecker) support
  • updated ROMIO version
  • condensed error messages (MPI_Abort() only prints one error message)
  • lots of little improvements
Scalability
  • keep the same on-demand connection setup as prior version
  • decrease memory footprint
  • sparse groups and communicators
  • many improvements in OpenMPI run time system
Point-to-point Message Layer (PML)
  • improved latency
  • smaller memory footprint

Collectives
  • more algorithms, more performance
  • special shared memory collective
  • hierarchical collective active by default
OpenFabrics: now supports iWARP, not just InfiniBand. XRC support, message coalescing (resisted because it is only really useful for benchmarking). uDAPL improvements by Sun (not really OpenFabrics)

Fault Tolerance
  • coordinated checkpoint/restart
  • support BLCR and self (self means you give function pointer to call for checkpoint)
  • able to handle real process migration (i.e. change network type during migration)
  • improved message logging
OpenMPI on Roadrunner - scaling to 1 petaflop
  • reduce launch times by order of magnitude
  • reliability: cleanup, robustness
  • maintainability: cleanup, simplify program. remove everything not required for OMP
Routed out-of-band communications

Roadmap:
v1.4 in planning phase only, feature list not fully decided

run-time usability
  • parameter usability options
  • sysadmin lock certain parameter values
  • spelling checks, validity checks
run-time system improvements
  • next-gen launcher
  • integration with other run-time systems
more processor and memory affinity support, topology awareness

Shared memory improvements: allocation sizes, sharing, scalability to manycore

I/O redirection features
  • line by line tagging
  • output multiplexing
  • "screen"-like features
Blocking progress
MPI connectivity map
refresh included software


Upcoming Challenges:
Fault tolerance: first step similar to the FT-MPI approach - if a rank dies, the rest of the ranks are still able to communicate, and it is up to the programmer to detect and recover if possible (see the sketch after this list)
Scalability at run time and MPI level
Collective communication - when to switch between algorithms, take advantage of physical topology
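As a rough illustration of what "up to the programmer" means, the sketch below (generic MPI, not OpenMPI-specific) switches MPI_COMM_WORLD from the default abort-on-error behavior to returning error codes, so the application can at least detect a failed operation and decide what to do. Actually surviving a dead rank still requires FT-MPI-style runtime support; this only shows the detect-and-decide part left to the application.

```cpp
// Sketch of programmer-driven fault detection: switch MPI errors from
// "abort the job" to "return an error code", then check calls by hand.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    int rank, token = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int rc = MPI_SUCCESS;
    if (rank == 0)
        rc = MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        rc = MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    if (rc != MPI_SUCCESS) {
        char msg[MPI_MAX_ERROR_STRING];
        int len = 0;
        MPI_Error_string(rc, msg, &len);
        std::fprintf(stderr, "rank %d: communication failed (%s)\n", rank, msg);
        // application-specific recovery or shutdown would go here
    }

    MPI_Finalize();
    return 0;
}
```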

MPI Forum
HLRS is selling the MPI 2.1 spec at cost, $22 (586 pages), at booth #1353
what do you want in MPI 3.0?
what don't you want in MPI 3.0?


Feedback:
Question regarding combining OpenMPI with OpenMP. Jeff: yes and no - OpenMPI has better threading support now, but they can't guarantee it won't break yet; it should be fine with devices that support MPI_THREAD_MULTIPLE.
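For reference, a hybrid MPI+OpenMP code negotiates this at startup. The sketch below (generic MPI, not tied to a particular OpenMPI release) requests MPI_THREAD_MULTIPLE and checks what level the library actually granted before letting multiple threads make MPI calls.

```cpp
// How a hybrid MPI+OpenMP code asks for full thread support and checks what
// the library actually granted before letting multiple threads call MPI.
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    int provided = MPI_THREAD_SINGLE;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    if (provided < MPI_THREAD_MULTIPLE) {
        // Fall back: only make MPI calls from the main thread (e.g. treat it
        // as MPI_THREAD_FUNNELED), or refuse to run the threaded code path.
        std::printf("MPI_THREAD_MULTIPLE not available (got level %d)\n", provided);
    }

    #pragma omp parallel
    {
        // With MPI_THREAD_MULTIPLE, each thread may call MPI concurrently.
    }

    MPI_Finalize();
    return 0;
}
```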

Can you compare OpenMPI with other MPI implementations? Jeff: We steal from them, they steal from us. Some say competition is good, but having many implementations available, especially on a single cluster, is confusing to users. Jeff would like to see more consolidation.

Show of hands - how important is...

  • thread safety (multiple threads making simultaneous MPI calls): about 10 hands in a full room
  • parallel I/O: only a few hands
  • one-sided operations: only a couple of users

Predictive Medicine with HPC

I just attended a talk about using computer modeling to make medical predictions - things such as modeling an aneurysm, predicting how it will grow, the blood flow through it, and the forces applied to the vessel wall. With these computer models they can determine what the dangers are and what course of action could be taken. They also showed computer models of drugs dispersed directly into the bloodstream near the heart through a catheter; they wanted to see how well the drugs would be absorbed into the vessel wall.

The major theme was that eventually they will be using these computer models to make predictions about your future health and take proactive approaches to managing it, rather than waiting for something bad to happen and treating reactively.

During the Q&A, someone from the audience asked how much computational power is needed for these models (specifically the turbulence models in aneurysms and cardiac arteries) - is it something a doctor could do in his office? The presenter said it isn't something that could be done on a laptop yet, but it can be done with a small cluster - they may run on 128 processors or as few as 16, depending on what they are doing or how quickly they need the computations. They don't require massive systems. He says ideally computing power will get to the point where this can be coupled with a medical imaging device.