By Koos Huijssen
What has happened around High Performance Computing (HPC) systems in the past 10 years? And what will we see in the field of systems for large-scale computing in the coming years? We asked two experts, who gave us a bird's-eye view of the developments in the field of HPC platforms.
The hardware of HPC platforms
Daniël Mantione is High Performance Computing (HPC) Benchmark Specialist at ClusterVision. Marc van Schijndel is Country Manager NL/Benelux at ttec. Both have over 10 years of experience in designing (Daniël) and selling (Marc) HPC platforms. These range from large-scale systems for universities, research institutes and hospitals to smaller systems for companies and engineering firms.
Daniël (ClusterVision): “Even 10 years ago, the main configuration that we delivered was already the so-called Beowulf cluster: a number of Linux compute nodes connected by a fast network.” According to Daniël, the changes are mostly in the components. The number of nodes has grown and the network is faster, but most importantly, the nodes themselves are far more powerful. The number of cores per processor has grown ten-fold, and the amount of memory and disk space has grown exponentially.
Marc (ttec): “An important change is that HPC platforms used to be built from special head nodes and compute nodes with a central data storage system. These days, systems are becoming ‘hyper-converged’: every node provides both processing power and storage.” This makes managing and scaling a cluster much easier. Such systems are also better suited for applications that crunch large amounts of data.
Both experts think that the power use of HPC platforms is increasingly a limiting factor. Sustainability and efficiency are essential, as the power bill becomes infeasible for larger systems. Marc (ttec): “These days, lower clock frequencies are normal. The trend is towards more, but less powerful, cores.” The use of graphics cards is a good example: at this moment, around 10-30% of the HPC platforms that are installed use GPUs. This development towards an ever larger number of cores with a lower clock speed will certainly continue in the coming years.
The software for HPC platforms
Strikingly, the software running on compute clusters often lasts longer than the hardware. Daniël (ClusterVision): “I regularly see codes that have been in use for more than thirty years and are still updated constantly. They are used for ever larger and more detailed problems. The software that runs on our systems is some 60% FORTRAN; the rest is primarily C++ and C.” The software is optimized to run on compute clusters. Daniël: “A cluster is a fairly generic configuration. That is different from graphics cards, where you have to modify your software significantly. One solution is the availability of libraries with optimized algorithms. This has been a major factor in the acceptance of alternative HPC platforms.”
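The idea behind such optimized libraries can be sketched with a small example. The article does not name specific libraries, so NumPy serves here as an illustration: the same matrix multiplication can be written as a textbook loop or delegated to a tuned BLAS routine, without changing the surrounding program.

```python
# Sketch: the same matrix multiplication written naively and via an
# optimized library (NumPy, which delegates to a tuned BLAS backend).
# NumPy is our illustrative choice; the article names no specific library.
import numpy as np

def naive_matmul(a, b):
    """Textbook triple loop: correct, but far slower than tuned BLAS."""
    rows, inner, cols = len(a), len(b), len(b[0])
    c = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            s = 0.0
            for l in range(inner):
                s += a[i][l] * b[l][j]
            c[i][j] = s
    return c

a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]

c_naive = naive_matmul(a, b)
c_blas = np.array(a) @ np.array(b)   # dispatched to an optimized BLAS routine

assert np.allclose(c_naive, c_blas)  # same result, very different speed at scale
```

For small matrices the difference is invisible; on the matrix sizes typical for HPC codes, the library version is orders of magnitude faster, which is why such libraries ease porting to new platforms.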
The customers that work with in-house and open-source software are mostly universities and research institutes. Apart from those, there is a small number of large companies with in-house software, such as Shell and ASML. Applications include weather prediction, seismic exploration and fluid flow simulation. A relatively new but quickly growing HPC application is bioinformatics. Marc (ttec): “This is all about analysing DNA and genes, identifying proteins and determining their structure, and simulating all sorts of interactions in cells.” Bioinformatics works with huge amounts of data, which has consequences for the HPC platform. The data is not kept in central storage, because it would then have to be transferred across the network. Instead, it is distributed over the compute nodes, which can then operate on local data.
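This data-local pattern can be sketched in a few lines. The shards and the DNA-counting task below are illustrative inventions, not from the article; the point is that each worker operates only on the data it already holds, and only small summaries travel back, instead of the full dataset crossing the network.

```python
# Sketch of data-local processing: each worker handles the shard "on its
# own node" and returns only a small result. The shards and the motif
# counting task are illustrative assumptions, not from the article.
from multiprocessing import Pool

def count_motif(shard):
    """Work done locally on one node's shard; returns only a small count."""
    sequence, motif = shard
    return sequence.count(motif)

if __name__ == "__main__":
    # Imagine each string living on a different compute node's local disk.
    shards = [("ACGTACGTGA", "ACG"),
              ("TTACGACGAC", "ACG"),
              ("GGGACGTTTT", "ACG")]
    with Pool(processes=3) as pool:
        local_counts = pool.map(count_motif, shards)  # per-node work
    total = sum(local_counts)                         # only counts are aggregated
    print(local_counts, total)
```

In a real cluster the workers would be processes on separate nodes and the aggregation would run over the network, but the principle is the same: ship the computation to the data, not the data to the computation.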
Apart from customers with in-house software, there are also companies and engineering firms that work with compute software under a commercial license. They often purchase smaller systems, because the license structure does not allow the use of a large number of nodes. But it seems to work for them. Daniël (ClusterVision): “Many companies don’t see themselves as HPC users. They tie some standard hardware together, but the result is mostly far from optimal. There is a lot of room for improvement here.”
The emergence of cloud computing in the general IT world has not left the HPC world untouched. Marc (ttec): “Cloud offers new, uniform and flexible access to compute capacity. In the past, virtualisation caused a serious performance degradation, but these days there are cloud solutions that handle this much better. In the first place, it is about the use of a cloud platform (OpenStack, for example) as a resource management system for the local private cluster. If you want to start a compute job, you get a sort of mini cluster that you can configure yourself. This works with so-called Docker containers, a lightweight version of a virtual machine. An important advantage is that you can configure everything an application needs inside the container. That makes system administration much easier.”
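What "configuring everything inside the container" means can be sketched with a hypothetical container recipe. All names below (base image, packages, directories, the `simulation` binary) are assumptions for illustration; the point is that compilers, libraries and the application travel together in one image, so the cluster nodes themselves stay generic.

```dockerfile
# Hypothetical Dockerfile: the application and all of its dependencies
# are baked into one image. Every name here is an illustrative assumption.
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y gfortran libopenmpi-dev make
COPY simulation/ /opt/simulation/
WORKDIR /opt/simulation
RUN make
CMD ["mpirun", "./simulation"]
```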
The use of public clouds like Amazon or Azure is still limited for HPC. Marc (ttec): “The public cloud has a nice cost model because you only pay for what you use. You don’t have to own and manage your own hardware. But it is still often more expensive if you use it regularly. And the confidentiality of data in the public cloud is still a nasty issue.”
Daniël (ClusterVision) sees software developers for HPC applications still working quite traditionally: a shell with a command line, manual compilation, configuring compute jobs through text files and submitting them to the queue by hand. Such a workflow is often held together by a collection of shell scripts, simply because the software has always worked that way. Daniël: “But the cloud model will certainly grow. We also offer a cloud platform for allocating compute capacity on your local cluster. That makes it easier to burst to the cloud in periods of peak demand.”
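The traditional workflow described above typically looks like the batch script below, submitted by hand to the queue. The article names no scheduler, so this sketch assumes Slurm; the module name, binary and input file are likewise illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical Slurm batch script: resources are requested in a text file
# and submitted to the queue by hand with "sbatch job.sh". The scheduler
# (Slurm), module and application names are assumptions, not from the text.
#SBATCH --job-name=flow_sim
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=16
#SBATCH --time=12:00:00

module load openmpi            # assumed environment-module name
mpirun ./flow_sim input.dat    # assumed application binary and input file
```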
The development of HPC platforms in the coming years
In short, the next few years will see HPC hardware platforms grow more powerful, but with much more attention to efficiency. That will make parallel computing and the adaptation of compute codes ever more important. In terms of system administration, the management platforms will increasingly be based on the cloud concept.