As published in HPCWire

Server partitioning – one of the many implementations of IT virtualization – has for the last half decade seen strong interest within commercial computing
environments. This interest has been driven in large part by the population explosion of small servers that occurred in the first part
of the decade. Confronted with the availability of inexpensive, powerful PC-based servers, many users adopted the strategy of “a server for every
application, and an application for every server.”

This approach worked well until those pesky little boxes began to form herds, with each member requiring “care and feeding,” which added up to some
major system management expenses. To make matters worse, these herds of servers were generally underutilized (what does a print server do when
no one is printing, other than generate heat?). Thus, users looked to virtualization and partitioning as a way to keep the herd of killer micros
under control. By dividing an underutilized server into multiple logical partitions, each configured to meet the requirements of a specific set
of applications, users began to consolidate their servers (i.e., cull the herd). They have never looked back.

While this saga played out in the commercial computing world, HPC users have gone about their business, showing no great interest in the charms of
partitioning — in our recent HPC Site Census, only 18 of the 124 systems in the survey were partitioned. I think there are several major reasons
why a technology that has been a superstar in the commercial sector is only getting third or fourth billing in HPC environments:

  • HPC applications tend to be resource hungry: scientists, engineers and their kin can always find a use for additional computer cycles. Consequently,
    sites may not want to pay the overhead costs associated with partitioning.
  • HPC systems generally operate in batch mode where the batch queues can provide some of the resource management benefits of partitions.
  • HPC sites often use what might be called “time partitioning” where development and testing applications are run during the day, and production
    codes are run on the second and third shifts.
  • Configuration level partitioning — There is a tendency to configure clusters up to a “manageability limit” and then to start building a second
    cluster. The manageability limit may be close to the point where partitioning might be necessary.

That said, we are hearing about partitioning more often, and there are several ways that partitioning might begin to move toward HPC’s center stage:

  • “Development partitions” — System software/applications developers can designate a partition for code testing, which allows a free hand without
    interfering with a larger “production partition” running at the same time.
  • Certified applications partitions — Industrial engineering/design applications often require certification for various regulatory/liability reasons.
    Certified applications codes may require specific versions of the O/S, libraries, etc. In cases where the certified environment differs from
    the current production environment, separate partitions may be called for.
  • Small and medium businesses — Our research indicates that small and medium engineering businesses tend to include HPC applications as part of
    their overall IT infrastructure, as opposed to the separate systems found in larger industrial sites. In this case, partitioning may be the best
    way to optimize total company resource use between technical and business applications.
  • Impact of Grids/Clouds — As virtual computing infrastructures come online, the geographic distribution of users begins to work against time partitioning,
    as there will always be someone in the world looking for development time or production time on a system. In this case, a server partitioning
    scheme may be necessary.
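
The last point can be made concrete with a rough sketch. The site locations (given as UTC offsets) and the 09:00 to 17:00 day shift below are illustrative assumptions, not data from the census. The sketch counts how many hours of the day no site is in its development shift, which is the window that time partitioning relies on for production runs.

```python
# Illustrative sketch only: the UTC offsets and shift hours are assumptions.

def dev_demand_by_utc_hour(utc_offsets, day_start=9, day_end=17):
    """For each UTC hour, count the sites that are inside their local day shift."""
    demand = []
    for utc_hour in range(24):
        count = 0
        for offset in utc_offsets:
            local_hour = (utc_hour + offset) % 24
            if day_start <= local_hour < day_end:
                count += 1
        demand.append(count)
    return demand


def free_production_hours(utc_offsets):
    """Number of UTC hours in which no site wants development time."""
    return sum(1 for count in dev_demand_by_utc_hour(utc_offsets) if count == 0)


single_site = [0]          # one site, hypothetical UTC+0
global_grid = [-8, 0, 8]   # three hypothetical sites, eight hours apart

print("single site:", free_production_hours(single_site), "hours free for production")
print("global grid:", free_production_hours(global_grid), "hours free for production")
```

Under these assumptions the single site has a long nightly window with no development demand, while the three-site grid has none at all, so development and production would have to share the machine at the same time.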

A final point worth noting is that the different levels of interest in partitioning between commercial and HPC users illustrate the divide between
them. These two branches of computing have fundamental differences in workflow patterns and end-user requirements that can lead to different strategies
for configuring and managing computer room environments…. [Editor's note: At this point the author began a manic rant, which quickly mounted
to the incomprehensible. He has been sedated (we keep a dart gun for this purpose), and has been given into the care of professionals. We expect
him to be back to (an approximation of) normal shortly.]
