Architecture

Cluster architecture

HPC is often equated with a cluster computer: a collection of computers linked together over a high-speed network to perform highly complex computing tasks. These connected computers (usually called servers or nodes) work together to provide the processing power needed to analyze large data sets, simulate complex systems, and solve demanding scientific and engineering problems.

../../_images/architecture-shematic.png

An overview of the HPC cluster architecture.
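A minimal sketch of how work is spread across such connected servers, using MPI (the de facto message-passing standard in HPC). Each process reports the node it runs on; the file name and process count below are illustrative choices, not part of any particular cluster setup:

    /* mpi_hello.c - each MPI process (rank) reports which node it runs on.
     * Compile: mpicc mpi_hello.c -o mpi_hello
     * Run (e.g. with 4 processes): mpirun -np 4 ./mpi_hello */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank, size, len;
        char host[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?          */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many of us?    */
        MPI_Get_processor_name(host, &len);     /* which node am I on? */

        printf("Rank %d of %d running on node %s\n", rank, size, host);

        MPI_Finalize();
        return 0;
    }

When the scheduler places the processes on different compute nodes, the printed hostnames differ, which is exactly the "collection of computers working together" described above.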

The main components of an HPC cluster

  • compute - a collection of computers or servers that provide the main processing power of the cluster.

  • memory - high-speed, high-capacity memory and storage systems for handling large amounts of data.

  • network - high-speed network systems that connect all parts of the cluster together.

Compute

The computer cluster consists of a large number of compute nodes (machines) organized into blades and racks; multiple racks together make up the cluster.

../../_images/hpc-architecture.png

A supercomputer is built from a large number of compute nodes.

Modern HPC systems are hybrid computing systems equipped with conventional CPUs (AMD or Intel) and various accelerators. Nowadays the dominant accelerators are graphics processors (NVIDIA and AMD), but FPGAs, Intel's Many Integrated Core (MIC) architecture (e.g. the Xeon Phi, now discontinued), and Tensor Processing Units (TPUs) are also used.
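A sketch of how application code targets such hybrid nodes, using OpenMP target directives (one common, portable approach; CUDA, HIP and others are equally valid). The array size is arbitrary, and whether the loop actually runs on an accelerator depends on the compiler's offload support:

    /* offload.c - offload a loop to an accelerator with OpenMP, if present.
     * Compile: gcc -fopenmp offload.c -o offload
     * (add e.g. -foffload=nvptx-none for NVIDIA GPU offload, if your GCC
     *  was built with offloading support; otherwise it runs on the CPU) */
    #include <stdio.h>

    #define N (1 << 20)

    static double x[N];   /* static to keep the large array off the stack */

    int main(void) {
        /* run the loop on the accelerator if available, else on the host */
        #pragma omp target teams distribute parallel for map(tofrom: x[0:N])
        for (int i = 0; i < N; i++)
            x[i] = 2.0 * i;

        printf("x[42] = %.1f\n", x[42]);
        return 0;
    }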

To improve the energy efficiency and scalability of HPC systems, which consume large amounts of electric power, ARM processors are becoming increasingly popular in the HPC world. Examples include the Fujitsu A64FX, the SiPearl Rhea and the NVIDIA Grace processors.

../../_images/supek-gpu-server.png

An example of a hybrid CPU-GPU compute server: the supercomputer Supek at the University Computing Centre in Zagreb. Picture taken from https://wiki.srce.hr/display/NR/Arhitektura+Supeka.

Memory

The memory in modern HPC systems (and in every other computer system) is structured in levels, each with a different type, speed and capacity.

Do you know why there are multiple types and levels of memory?

The goal of the memory hierarchy is to provide fast access to frequently used data while maintaining large storage capacities for less frequently accessed data. This hierarchy is critical for optimizing performance in HPC systems, where data access patterns and computational intensity vary widely across workloads.

../../_images/Memory-Hierarchy.jpg

Picture taken from https://computerscience.chemeketa.edu/

Memory hierarchy in modern HPC systems

Level               | Type                | Size              | Speed             | Location
------------------- | ------------------- | ----------------- | ----------------- | ----------------
Registers           | CPU registers       | ~KB               | 1-2 cycles        | On-chip (CPU)
L1 Cache            | SRAM                | 32–64 KB/core     | 2–4 cycles        | On-chip (CPU)
L2 Cache            | SRAM                | 256 KB–1 MB/core  | 10–20 cycles      | On-chip (CPU)
L3 Cache            | SRAM                | 10–100 MB/CPU     | 20–60 cycles      | On-chip (CPU)
Main Memory         | DRAM (DDR4/DDR5)    | 128 GB–4 TB/node  | 20–100 ns         | Off-chip
GPU Memory          | HBM/GDDR6           | 16–80 GB/GPU      | 1–2 TB/s (HBM)    | On GPU
Non-Volatile Memory | Intel Optane/NVDIMM | 100s GB–TB        | ~DRAM speed       | Off-chip
Local Storage       | NVMe SSD/HDD        | TBs–PBs           | μs–ms             | Node-local
Shared Storage      | Lustre/BeeGFS       | PBs–EBs           | Slower than local | Network-attached
Tape Storage        | Tape drives         | PBs–EBs           | Seconds–minutes   | Offline
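The effect of the hierarchy in the table above can be made visible with a small sketch: summing the same array with unit stride (cache-friendly) versus a large stride (nearly every access misses in cache) does identical arithmetic but typically differs markedly in runtime. The array size and stride are arbitrary choices for illustration:

    /* stride.c - illustrate the cost of cache-unfriendly access patterns.
     * Compile: cc -O2 stride.c -o stride */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (64 * 1024 * 1024)   /* 64 Mi ints, far larger than any cache */

    static void walk(const int *a, size_t stride) {
        clock_t t0 = clock();
        long sum = 0;
        for (size_t s = 0; s < stride; s++)          /* cover every element */
            for (size_t i = s; i < N; i += stride)   /* ... stride apart    */
                sum += a[i];
        double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
        printf("stride %4zu: sum=%ld, %.3f s\n", stride, sum, secs);
    }

    int main(void) {
        int *a = malloc(N * sizeof *a);
        if (!a) return 1;
        for (size_t i = 0; i < N; i++) a[i] = 1;
        walk(a, 1);      /* sequential: cache lines and prefetcher help */
        walk(a, 1024);   /* strided: each access likely misses in cache */
        free(a);
        return 0;
    }

Both calls touch every element exactly once; only the order changes, so any timing gap is attributable to where the data sits in the hierarchy at access time.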

The main memory (also known as RAM, Random Access Memory) in an HPC cluster or supercomputer is a critical component that directly impacts the performance of computational workloads. It serves as the primary workspace for data that the CPU and accelerators (e.g. GPUs) actively use during computation. Main memory is distributed and visible only to the local CPUs and/or GPUs (i.e. within one physical computer or compute node).
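A sketch of what "visible only to the local node" means in practice: in the MPI distributed-memory model each process owns its own allocation, and no rank can dereference another rank's pointer, so data must be copied explicitly with messages. The value sent here is arbitrary:

    /* distributed_memory.c - each rank owns its memory; data moves via messages.
     * Compile: mpicc distributed_memory.c -o distributed_memory
     * Run (needs at least 2 ranks): mpirun -np 2 ./distributed_memory */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = 0.0;   /* lives in this process's (node's) RAM only */
        if (rank == 0) {
            local = 3.14;
            MPI_Send(&local, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD); /* explicit copy */
        } else if (rank == 1) {
            MPI_Recv(&local, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Rank 1 received %.2f over the network\n", local);
        }

        MPI_Finalize();
        return 0;
    }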

Storage consists of a large number of dedicated servers (computers) providing high volume and high read and write speeds. Storage systems are distributed and visible to many (or all) compute and login nodes. HPC storage has the following characteristics:

  • Scalability: highly scalable to accommodate the growing data volumes generated and processed by HPC applications.

  • Performance: ensures that data can be accessed, read and written at high speeds to keep pace with computational requirements; supports high-performance protocols such as parallel file systems (e.g. parallel NFS) and parallel I/O.

  • Parallelism: supports parallel access to data from multiple nodes simultaneously; leverages parallel file systems and distributed storage architectures to maximize throughput and minimize contention (see the sketch after this list).

  • Reliability and Availability: secures uninterrupted access to data and prevents data loss or corruption through features such as redundancy, data replication and snapshotting.

  • Security: when dealing with sensitive or proprietary information, provides robust security features such as access control, encryption and authentication.

Distributed file systems commonly used in HPC today include Lustre, GPFS, Ceph and BeeGFS.
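As a sketch of the parallel access mentioned above, MPI-IO lets every process write its own disjoint block of one shared file, which a parallel file system such as Lustre can service concurrently. The file name and block size are arbitrary choices:

    /* parallel_write.c - all ranks write disjoint blocks of one shared file.
     * Compile: mpicc parallel_write.c -o parallel_write
     * Run: mpirun -np 4 ./parallel_write */
    #include <mpi.h>

    #define BLOCK 1024   /* doubles per rank, arbitrary */

    int main(int argc, char **argv) {
        int rank;
        double buf[BLOCK];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < BLOCK; i++) buf[i] = rank;   /* fill with rank id */

        MPI_File_open(MPI_COMM_WORLD, "output.dat",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        /* each rank writes at its own offset - no two ranks overlap */
        MPI_File_write_at_all(fh, (MPI_Offset)rank * BLOCK * sizeof(double),
                              buf, BLOCK, MPI_DOUBLE, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);

        MPI_Finalize();
        return 0;
    }

The collective write_at_all call gives the file system one coordinated request from all ranks, which is exactly where parallel file systems deliver their throughput advantage over plain NFS.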

Network

The major bottleneck in large-scale and data-intensive applications is not computation but data movement! Modern processors can process data at much higher speeds than the data can be transferred (accessed or read) over the network. Therefore, the availability of fast interconnects is of paramount importance for HPC applications.

HPC network

The interconnect inside an HPC cluster has to have high bandwidth and ultra-low latency!

../../_images/latency-bandwidth.jpeg

Figure taken from https://www.dnsstuff.com/latency-throughput-bandwidth
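A minimal ping-pong sketch of the kind commonly used to measure both quantities: the round-trip time of a tiny message approximates latency, while the transfer rate of a large message approximates bandwidth. The message sizes and repetition count are arbitrary choices:

    /* pingpong.c - estimate latency and bandwidth between two MPI ranks.
     * Compile: mpicc pingpong.c -o pingpong
     * Run (exactly 2 ranks): mpirun -np 2 ./pingpong */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define REPS 1000
    #define BIG  (1 << 22)   /* 4 MiB message for the bandwidth estimate */

    static double pingpong(char *buf, int bytes, int rank) {
        double t0 = MPI_Wtime();
        for (int i = 0; i < REPS; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        return (MPI_Wtime() - t0) / (2.0 * REPS);  /* one-way time per message */
    }

    int main(int argc, char **argv) {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        char *buf = malloc(BIG);
        double t_small = pingpong(buf, 1, rank);     /* ~latency   */
        double t_big   = pingpong(buf, BIG, rank);   /* ~bandwidth */
        if (rank == 0)
            printf("latency ~ %.2f us, bandwidth ~ %.2f GB/s\n",
                   t_small * 1e6, BIG / t_big / 1e9);

        free(buf);
        MPI_Finalize();
        return 0;
    }

Running the two ranks on the same node versus on different nodes makes the difference between shared-memory and network transfers directly visible.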

Today, a few types of interconnect predominate in HPC systems, namely InfiniBand and Ethernet; more recently, interconnects such as Slingshot, Omni-Path and Tofu have also emerged.

../../_images/interconnect-system-share.png

Graph taken from https://www.top500.org/statistics/list/

Supercomputers

A supercomputer is an extremely large cluster computer comprising specially designed hardware optimized for parallel processing, including specialized processors such as GPUs, high-speed interconnects and large-scale shared storage systems. Supercomputers usually consist of millions of processing cores and can deliver sustained performance measured in petaflops.
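As a back-of-the-envelope illustration of the petaflops claim, theoretical peak performance is roughly nodes × cores per node × clock rate × FLOPs per cycle per core. All figures below are assumed round numbers for illustration, not the specification of any particular machine:

    /* peak_flops.c - back-of-the-envelope theoretical peak (assumed figures).
     * Compile: cc peak_flops.c -o peak_flops */
    #include <stdio.h>

    int main(void) {
        double nodes       = 2000;   /* assumed node count                   */
        double cores       = 128;    /* assumed CPU cores per node           */
        double ghz         = 2.5;    /* assumed clock rate in GHz            */
        double flops_cycle = 16;     /* e.g. a wide vector fused multiply-add */

        double peak = nodes * cores * ghz * 1e9 * flops_cycle;
        printf("Theoretical peak: %.2f PFLOPS\n", peak / 1e15);  /* ~10.24 */
        return 0;
    }

Sustained application performance is typically well below this theoretical peak, which is precisely why the memory and network bottlenecks discussed above matter so much.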

Explore supercomputers (3D virtual tour)

Fugaku - 6th largest supercomputer in the world

LUMI - 8th in the world, 3rd in Europe (Finland)