Architecture
Cluster architecture
HPC is often equated with a cluster computer: a collection of computers linked over a high-speed network to perform highly complex computing tasks. These connected computers (usually called servers or nodes) work together to provide the processing power needed to analyze large data sets, simulate complex systems, and solve demanding scientific and engineering problems.

An overview of the HPC cluster architecture.
The main components of an HPC cluster:
compute - a collection of computers (servers) that provide the processing power of the cluster.
memory - high-speed, high-capacity memory and storage systems for handling large amounts of data.
network - high-speed network systems that connect all parts of the cluster together.
Compute
The computer cluster consists of a large number of compute nodes (machines) organized into blades and racks; multiple racks together make up the cluster.

A supercomputer is built from a large number of compute nodes.
Modern HPC systems are hybrid computing systems equipped with conventional CPUs (AMD or Intel) and various accelerators. Nowadays, the dominant accelerators are graphics processors (NVIDIA and AMD), but FPGAs, Intel's Many Integrated Core (MIC) architecture (e.g. the now discontinued Xeon Phi), and Tensor Processing Units (TPUs) are also used.
To improve the energy efficiency and scalability of HPC systems, which consume large amounts of electric power, ARM processors are becoming increasingly popular in the HPC world. Examples are the Fujitsu A64FX, the SiPearl Rhea, and the NVIDIA Grace processors.

An example of a hybrid CPU-GPU compute server: the supercomputer Supek at the University Computing Centre in Zagreb. Picture taken from https://wiki.srce.hr/display/NR/Arhitektura+Supeka.
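To make this concrete, below is a minimal hybrid MPI + OpenMP "hello" sketch (not tied to any particular system; the file name `hello.c` is just an example): MPI starts one process per allocated slot across the compute nodes, while OpenMP spawns threads on the cores of each node's CPU. It can be compiled with an MPI compiler wrapper, e.g. `mpicc -fopenmp hello.c`.

```c
/* Minimal sketch: how a program sees the cluster.
 * MPI gives one process (rank) per allocated slot across compute nodes,
 * OpenMP spawns threads on the cores of each node's CPU. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size, namelen;
    char node[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in total */
    MPI_Get_processor_name(node, &namelen); /* which compute node am I on? */

    #pragma omp parallel
    {
        printf("node %s | MPI rank %d/%d | OpenMP thread %d/%d\n",
               node, rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```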
Memory
The memory in modern HPC (and every other) systems is structured in multiple levels, each with a different type, speed, and capacity.
Do you know why there are multiple types and levels of memory?
The goal of the memory hierarchy is to provide fast access to frequently used data while maintaining large storage capacities for less frequently accessed data. This hierarchy is critical for optimizing performance in HPC systems, where data access patterns and computational intensity vary widely across workloads.

Picture taken from https://computerscience.chemeketa.edu/
Memory hierarchy in modern HPC systems
Level | Type | Size | Speed | Location
---|---|---|---|---
Registers | CPU Registers | ~KB | 1–2 cycles | On-chip (CPU)
L1 Cache | SRAM | 32–64 KB/core | 2–4 cycles | On-chip (CPU)
L2 Cache | SRAM | 256 KB–1 MB/core | 10–20 cycles | On-chip (CPU)
L3 Cache | SRAM | 10–100 MB/CPU | 20–60 cycles | On-chip (CPU)
Main Memory | DRAM (DDR4/DDR5) | 128 GB–4 TB/node | 20–100 ns | Off-chip
GPU Memory | HBM/GDDR6 | 16–80 GB/GPU | 1–2 TB/s | On-GPU (HBM)
Non-Volatile Memory | Intel Optane/NVDIMM | 100s GB–TB | ~DRAM speed | Off-chip
Local Storage | NVMe SSD/HDD | TBs–PBs | μs–ms | Node-local
Shared Storage | Lustre/BeeGFS | PBs–EBs | Slower than local | Network-attached
Tape Storage | Tape Drives | PBs–EBs | Seconds–minutes | Offline
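The effect of this hierarchy is easy to observe. The small C sketch below (illustrative only; exact timings, array size, and stride are assumptions and depend on the machine) sums the same array twice: once sequentially, reusing every cache line it fetches, and once with a 64-byte stride, so that almost every access touches a new cache line and the array is streamed from main memory many times over.

```c
/* Sketch of cache-friendly vs. cache-hostile access (illustrative numbers).
 * Compile with: cc -O2 stride.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)   /* 64 Mi ints = 256 MB, larger than any L3 cache */
#define STRIDE 16              /* 16 * 4 B = 64 B, i.e. one cache line apart     */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    int *a = malloc((size_t)N * sizeof *a);
    for (long i = 0; i < N; i++) a[i] = 1;

    double t0 = now();
    long sum = 0;
    for (long i = 0; i < N; i++)          /* sequential: reuses each cache line */
        sum += a[i];
    double t1 = now();

    for (long s = 0; s < STRIDE; s++)     /* strided: every access hits a new   */
        for (long i = s; i < N; i += STRIDE)   /* cache line, so the whole array */
            sum += a[i];                       /* is streamed STRIDE times       */
    double t2 = now();

    printf("sequential: %.3f s, strided: %.3f s (sum=%ld)\n",
           t1 - t0, t2 - t1, sum);
    free(a);
    return 0;
}
```

Both loops perform the same number of additions; the difference in runtime comes entirely from how the memory hierarchy is used.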
The main memory (also known as RAM - Random Access Memory) in an HPC cluster system or supercomputer is a critical component that directly impacts the performance of computational workloads. It serves as the primary workspace for data that the CPU and accelerators (e.g., GPUs) actively use during computation. The main memory is distributed and visible only to the local CPUs and/or GPUs (i.e., within one physical computer or compute node).
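The following minimal MPI sketch illustrates this distributed-memory model: each rank (process) owns its own main memory, so a value written by one rank is invisible to the others until it is communicated explicitly (here with `MPI_Bcast`; the value 42 is just an example).

```c
/* Sketch of the distributed-memory model: each MPI process has its own
 * main memory, so data must be exchanged by explicit communication. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0)
        value = 42;               /* only rank 0's local memory holds 42 here */

    printf("before broadcast: rank %d sees value = %d\n", rank, value);

    /* Explicit communication copies the data into every rank's local memory. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("after  broadcast: rank %d sees value = %d\n", rank, value);

    MPI_Finalize();
    return 0;
}
```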
Storage consists of a large number of dedicated servers (computers) providing high capacity and high read and write speeds. The storage systems are distributed and visible to many (or all) compute and login nodes. The storage has the following characteristics:
Scalability: highly scalable to accommodate the growing data volumes generated and processed by HPC applications.
Performance: ensures that data can be accessed, read, and written at high speeds to keep pace with computational requirements; supports high-performance protocols such as parallel file systems (e.g. parallel NFS) and parallel I/O (see the MPI-IO sketch below).
Parallelism: supports parallel access to data from multiple nodes simultaneously, leveraging parallel file systems and distributed storage architectures to maximize throughput and minimize contention.
Reliability and Availability: ensures uninterrupted access to data and prevents data loss or corruption by incorporating features such as redundancy, data replication, and snapshotting.
Security: when dealing with sensitive or proprietary information, provides robust security features such as access control, encryption, and authentication.
Distributed file systems used in HPC today include Lustre, GPFS, Ceph, and BeeGFS.
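As a sketch of what parallel I/O looks like from the application side, the MPI-IO example below has every rank write its own block of one shared file collectively; the file name `out.dat` and the block size are arbitrary examples. On a parallel file system such as Lustre or BeeGFS, such writes can be serviced by many storage servers at once instead of funnelling through a single node.

```c
/* Sketch of parallel I/O with MPI-IO: each rank writes a disjoint block
 * of one shared file in a collective operation. */
#include <mpi.h>
#include <stdio.h>

#define COUNT 1024   /* integers written per rank (illustrative size) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf[COUNT];
    for (int i = 0; i < COUNT; i++)
        buf[i] = rank;                       /* each rank writes its own id */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank targets a disjoint offset; the _all variant is collective,
       letting the MPI library and file system coordinate the I/O. */
    MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```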
Network
The major bottleneck in large-scale and data-intensive applications is not computation but data movement! Modern processors can process data much faster than it can be transferred (accessed or read) over the network. Therefore, the availability of fast interconnects is of paramount importance for HPC applications.
HPC network
The interconnect inside an HPC cluster has to have high bandwidth and ultra-low latency!
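Latency and bandwidth can be estimated with a simple ping-pong test between two MPI ranks, ideally placed on different nodes. The sketch below is illustrative only (message size and repetition count are arbitrary choices); dedicated benchmarks such as the OSU micro-benchmarks measure this much more carefully.

```c
/* Ping-pong sketch for estimating interconnect latency and bandwidth.
 * Run with at least 2 ranks, e.g.: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 1000
#define NBYTES (1 << 20)   /* 1 MiB message for the bandwidth estimate */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(NBYTES);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0) {
        double round_trip = elapsed / REPS;                   /* seconds            */
        double bandwidth  = 2.0 * NBYTES / round_trip / 1e9;  /* GB/s, both directions */
        printf("round trip: %.1f us, bandwidth: %.2f GB/s\n",
               round_trip * 1e6, bandwidth);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```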

Figure taken from https://www.dnsstuff.com/latency-throughput-bandwidth
Today, the predominant interconnect types in HPC systems are InfiniBand and Ethernet; however, newer interconnects such as Slingshot, Omni-Path, and Tofu have recently emerged.

Graph taken from https://www.top500.org/statistics/list/
Supercomputers
Supercomputers are extremely large cluster computers built from specially designed hardware optimized for parallel processing, including specialized processors such as GPUs, high-speed interconnects, and large-scale shared storage systems. Supercomputers usually consist of millions of processing cores and can deliver sustained performance measured in petaflops.
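As a rough illustration of how such performance figures add up, the sketch below computes a theoretical peak from assumed (not real) machine parameters: number of nodes × cores per node × clock frequency × floating-point operations per cycle.

```c
/* Back-of-the-envelope peak performance; all numbers are illustrative
 * assumptions, not the specifications of any real machine. */
#include <stdio.h>

int main(void)
{
    double nodes           = 1000;   /* compute nodes in the cluster       */
    double cores_per_node  = 128;    /* CPU cores per node                 */
    double clock_ghz       = 2.0;    /* clock frequency in GHz             */
    double flops_per_cycle = 16;     /* e.g. wide SIMD fused multiply-add  */

    double peak = nodes * cores_per_node * clock_ghz * 1e9 * flops_per_cycle;
    printf("theoretical peak: %.1f PFLOP/s\n", peak / 1e15);
    /* 1000 * 128 * 2e9 * 16 = 4.096e15 FLOP/s, i.e. about 4.1 PFLOP/s */
    return 0;
}
```

Sustained application performance is typically well below this theoretical peak, precisely because of the memory and network bottlenecks discussed above.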