Beyond Beowulf: Clusters, Cores, and a New Era of TOP500

As published in Top500

This year at the SC conference, the iconic Beowulf Bash (Monday, Nov. 17, 9:00 p.m. to midnight, at Mardi Gras World) celebrates the 20-year history of Beowulf clusters, the 1994 achievement that would
transform the HPC industry, lowering the cost-per-flop barrier for anyone ambitious enough to tackle the parallelism challenge. Today the parallel
is palpable, as HPC faces new architectural revolutions that carry software implications of their own.

Supercomputing has already transformed through several discrete architectural eras, in which advancements in scalability went beyond generational speed-ups
of last year’s model to build something fundamentally new. Vector processing systems gave way to RISC-based symmetric multi-processing (SMP), shared-memory
systems based on proprietary versions of UNIX, such as AIX, HP-UX, Solaris, and IRIX. Beowulf took over from there, standardizing on lower-cost
“commodity” components, x86 architectures running Linux. Each phase change led to greater democratization of supercomputing, the ability for more
users to buy and deploy HPC-caliber systems, thanks to standardization.

Today we have entered an era beyond Beowulf, with the advent of many-core processors. Users are increasingly testing and deploying systems built with
NVIDIA GPUs, AMD APUs, or Intel Xeon Phi processors, which will eventually redefine approaches to parallelism and efficiency optimization throughout
the industry. As we look back on the achievements of Beowulf, there are some apparent similarities that are worthy of note, as well as some critical
differences that would be foolhardy to ignore.

The many-core era takes many of the benefits of Beowulf clusters to another level, while also introducing many of the same challenges, again
at another level. Beowulf clusters offered all of those beautiful, delicious flops to anyone who wanted them, provided they were willing to assemble
the clusters themselves (at first). Many-core is the same attractive offer, writ larger. Thanks to NVIDIA’s early efforts in this space, anyone with access to an
NVIDIA GPU on a laptop has been able to download the CUDA drivers needed to amplify the performance available from the microprocessor alone.

The arguments against many-core computing (that it requires changes to the programming model, that there aren’t applications optimized
for it, that true performance may be far off from theoretical peak performance, that administrators don’t know how to support or optimize it) are
the exact arguments we considered when Beowulf clusters were new. (I personally worked in product marketing at SGI from 1997-2003, and therefore
I was one of the counter-revolutionaries, failing to convince the market of the superiority of shared-memory systems.) Eventually the value proposition
for Beowulf was too tempting, and the industry gradually, collectively took the plunge in adopting the so-called “commodity cluster.”

Certainly there have been evolutions in clusters, most notably with the addition of 64-bit capabilities. AMD took a leadership position over Intel
for nearly two years as it drove its x86-64 Athlon 64 and Opteron processors into the market gap between Intel’s x86-32 Pentiums and its native
64-bit Itaniums. InfiniBand offered a high-end networking standard above Ethernet. Nevertheless, the concept of a commodity cluster endured. Although
there came to be different features available in them, clusters in general were deemed to be “industry-standard.”

Many-core is not a single industry-standard offering, and this is the biggest point of difference with respect to Beowulf. The varied emerging processing
architecture options—multi-core x86, accelerated many-core x86, GPU computing, POWER (with or without GPU), ARM, FPGAs, DSPs, etc.—are
anything but standard. The new era of supercomputing is swinging back toward proprietary solutions. Each vendor’s processing solution (NVIDIA Tesla,
Intel Xeon Phi, etc.) is proprietary to that vendor and carries its own software tools (NVIDIA CUDA, Intel developer tools, etc.) along with it.
Some industry standards like OpenCL do exist, but these are not the predominant solutions in the market, nor do they cover the gamut of processing
options available.

Furthermore, there is another school of next-generation computing that is not compute-centric at all. For applications that are more I/O-bound than
compute-bound, there is a movement toward data-centric computing architectures that de-emphasize the computational elements. Certainly these computers
still contain processors, but in markets driven by Big Data trends, there is the potential of a paradigm shift that will change the focus of the
discussion.

The implied changes to programming and optimization are the single biggest challenge in scaling applications into the Exascale era, as was cited in
the recent “Solve” report from the U.S. Council on Competitiveness.
A major concern for HPC users is choosing the right architectures for the future of their applications.

As the HPC market continues to change, iconic rankings like TOP500 become even more essential, if only because they simplify complicated trends into
achievements and aspirations that the world can understand, and invest in. We will continue to need faster supercomputers, and we will continue
to apply them to a panoply of scientific, engineering, research, and business issues. The TOP500 list gives us a visible, simple way to track what
is achievable at the high end.

TOP500, meanwhile, is undergoing evolutions of its own, to give the HPC industry a more complete look at itself. There are more ways to measure
performance, and issues affecting scalability and performance go beyond the lists. TOP500 will cover these developments, and Intersect360 Research
is thrilled to be the market research partner in this endeavor, providing the lenses to examine and forecast the market in all its segmentations.

The nature of the supercomputing industry is that it is driven forward to new levels of scalability and performance by ever more demanding problems.
Models can be developed with greater realism, simulations run with higher fidelity, predictions made with greater specificity. New eras in
architecture herald new breakthroughs and innovations, and there is always more work to be done.

But first let’s get a drink at the Beowulf Bash.
