Architecture
Cluster architecture
HPC is often equated with a cluster computer: a collection of computers linked over a high-speed network to perform highly complex computing tasks. These connected computers (usually called servers or nodes) work together to provide the processing power needed to analyze large data sets, simulate complex systems, and solve demanding scientific and engineering problems.

An overview of the HPC cluster architecture.
The main components of an HPC cluster:
compute - a collection of computers (servers) that provide the processing power of the cluster.
memory - high-speed, high-capacity memory and storage systems for handling large amounts of data.
network - high-speed network systems that connect all parts of the cluster together.
Compute
The computer cluster consists of a large number of compute nodes (machines) organized into blades and racks; multiple racks together make up the cluster.

A supercomputer is built from a large number of compute nodes.
Modern HPC systems are hybrid computing systems equipped with conventional CPUs (AMD or Intel) and various accelerators. Nowadays, the dominant accelerators are graphics processors (NVIDIA and AMD), but FPGAs, Intel's Many Integrated Core (MIC) architecture (e.g. the now discontinued Xeon Phi), and Tensor Processing Units (TPUs) are also used.
To improve the energy efficiency and scalability of HPC systems, which consume large amounts of electric power, ARM processors are becoming increasingly popular in the HPC world. Examples are the Fujitsu A64FX, the SiPearl Rhea, and the NVIDIA Grace processors.

An example of a hybrid CPU-GPU compute server: the supercomputer Supek at the University Computing Centre in Zagreb. Picture taken from https://wiki.srce.hr/display/NR/Arhitektura+Supeka.
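To make this concrete, below is a minimal hybrid MPI + OpenMP "hello" sketch (not tied to any particular system; the file name `hello.c` is just an example): MPI starts one process per allocated slot across the compute nodes, while OpenMP spawns threads on the cores of each node's CPU. It can be compiled with an MPI compiler wrapper, e.g. `mpicc -fopenmp hello.c`.

```c
/* Minimal sketch: how a program sees the cluster.
 * MPI gives one process (rank) per allocated slot across compute nodes,
 * OpenMP spawns threads on the cores of each node's CPU. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size, namelen;
    char node[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?         */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes in total */
    MPI_Get_processor_name(node, &namelen); /* which compute node am I on? */

    #pragma omp parallel
    {
        printf("node %s | MPI rank %d/%d | OpenMP thread %d/%d\n",
               node, rank, size, omp_get_thread_num(), omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}
```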
Memory
The memory in modern HPC (and every other) systems is structured in multiple levels, each with a different type, speed, and capacity.
Do you know why there are multiple types and levels of memory?
The goal of the memory hierarchy is to provide fast access to frequently used data while maintaining large storage capacities for less frequently accessed data. This hierarchy is critical for optimizing performance in HPC systems, where data access patterns and computational intensity vary widely across workloads.

Picture taken from https://computerscience.chemeketa.edu/
Memory hierarchy in modern HPC systems
Level | Type | Size | Speed | Location
---|---|---|---|---
Registers | CPU Registers | ~KB | 1–2 cycles | On-chip (CPU)
L1 Cache | SRAM | 32–64 KB/core | 2–4 cycles | On-chip (CPU)
L2 Cache | SRAM | 256 KB–1 MB/core | 10–20 cycles | On-chip (CPU)
L3 Cache | SRAM | 10–100 MB/CPU | 20–60 cycles | On-chip (CPU)
Main Memory | DRAM (DDR4/DDR5) | 128 GB–4 TB/node | 20–100 ns | Off-chip
GPU Memory | HBM/GDDR6 | 16–80 GB/GPU | 1–2 TB/s | On-GPU (HBM)
Non-Volatile Memory | Intel Optane/NVDIMM | 100s GB–TB | ~DRAM speed | Off-chip
Local Storage | NVMe SSD/HDD | TBs–PBs | μs–ms | Node-local
Shared Storage | Lustre/BeeGFS | PBs–EBs | Slower than local | Network-attached
Tape Storage | Tape Drives | PBs–EBs | Seconds–minutes | Offline
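The effect of this hierarchy is easy to observe. The small C sketch below (illustrative only; exact timings, array size, and stride are assumptions and depend on the machine) sums the same array twice: once sequentially, reusing every cache line it fetches, and once with a 64-byte stride, so that almost every access touches a new cache line and the array is streamed from main memory many times over.

```c
/* Sketch of cache-friendly vs. cache-hostile access (illustrative numbers).
 * Compile with: cc -O2 stride.c */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (64 * 1024 * 1024)   /* 64 Mi ints = 256 MB, larger than any L3 cache */
#define STRIDE 16              /* 16 * 4 B = 64 B, i.e. one cache line apart     */

static double now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec + ts.tv_nsec * 1e-9;
}

int main(void)
{
    int *a = malloc((size_t)N * sizeof *a);
    for (long i = 0; i < N; i++) a[i] = 1;

    double t0 = now();
    long sum = 0;
    for (long i = 0; i < N; i++)          /* sequential: reuses each cache line */
        sum += a[i];
    double t1 = now();

    for (long s = 0; s < STRIDE; s++)     /* strided: every access hits a new   */
        for (long i = s; i < N; i += STRIDE)   /* cache line, so the whole array */
            sum += a[i];                       /* is streamed STRIDE times       */
    double t2 = now();

    printf("sequential: %.3f s, strided: %.3f s (sum=%ld)\n",
           t1 - t0, t2 - t1, sum);
    free(a);
    return 0;
}
```

Both loops perform the same number of additions; the difference in runtime comes entirely from how the memory hierarchy is used.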
The main memory (also known as RAM - Random Access Memory) in an HPC cluster system or supercomputer is a critical component that directly impacts the performance of computational workloads. It serves as the primary workspace for data that the CPU and accelerators (e.g., GPUs) actively use during computation. The main memory is distributed and visible only to the local CPUs and/or GPUs (i.e., within one physical computer or compute node).
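The following minimal MPI sketch illustrates this distributed-memory model: each rank (process) owns its own main memory, so a value written by one rank is invisible to the others until it is communicated explicitly (here with `MPI_Bcast`; the value 42 is just an example).

```c
/* Sketch of the distributed-memory model: each MPI process has its own
 * main memory, so data must be exchanged by explicit communication. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0)
        value = 42;               /* only rank 0's local memory holds 42 here */

    printf("before broadcast: rank %d sees value = %d\n", rank, value);

    /* Explicit communication copies the data into every rank's local memory. */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("after  broadcast: rank %d sees value = %d\n", rank, value);

    MPI_Finalize();
    return 0;
}
```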
Storage consists of a large number of dedicated servers (computers) providing high capacity and high read and write speeds. The storage systems are distributed and visible to many (or all) compute and login nodes. The storage has the following characteristics:
Scalability: highly scalable to accommodate the growing data volumes generated and processed by HPC applications.
Performance: ensures that data can be accessed, read, and written at high speeds to keep pace with computational requirements; supports high-performance protocols such as parallel file systems (e.g. parallel NFS) and parallel I/O (see the MPI-IO sketch below).
Parallelism: supports parallel access to data from multiple nodes simultaneously, leveraging parallel file systems and distributed storage architectures to maximize throughput and minimize contention.
Reliability and Availability: ensures uninterrupted access to data and prevents data loss or corruption by incorporating features such as redundancy, data replication, and snapshotting.
Security: when dealing with sensitive or proprietary information, provides robust security features such as access control, encryption, and authentication.
Distributed file systems used in HPC today include Lustre, GPFS, Ceph, and BeeGFS.
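As a sketch of what parallel I/O looks like from the application side, the MPI-IO example below has every rank write its own block of one shared file collectively; the file name `out.dat` and the block size are arbitrary examples. On a parallel file system such as Lustre or BeeGFS, such writes can be serviced by many storage servers at once instead of funnelling through a single node.

```c
/* Sketch of parallel I/O with MPI-IO: each rank writes a disjoint block
 * of one shared file in a collective operation. */
#include <mpi.h>
#include <stdio.h>

#define COUNT 1024   /* integers written per rank (illustrative size) */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int buf[COUNT];
    for (int i = 0; i < COUNT; i++)
        buf[i] = rank;                       /* each rank writes its own id */

    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank targets a disjoint offset; the _all variant is collective,
       letting the MPI library and file system coordinate the I/O. */
    MPI_Offset offset = (MPI_Offset)rank * COUNT * sizeof(int);
    MPI_File_write_at_all(fh, offset, buf, COUNT, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```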
Network
The major bottleneck in large-scale and data-intensive applications is not computation but data movement! Modern processors can process data much faster than it can be transferred (accessed or read) over the network. Therefore, the availability of fast interconnects is of paramount importance for HPC applications.
HPC network
The interconnect inside an HPC cluster has to have high bandwidth and ultra-low latency!
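Latency and bandwidth can be estimated with a simple ping-pong test between two MPI ranks, ideally placed on different nodes. The sketch below is illustrative only (message size and repetition count are arbitrary choices); dedicated benchmarks such as the OSU micro-benchmarks measure this much more carefully.

```c
/* Ping-pong sketch for estimating interconnect latency and bandwidth.
 * Run with at least 2 ranks, e.g.: mpirun -np 2 ./pingpong */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define REPS 1000
#define NBYTES (1 << 20)   /* 1 MiB message for the bandwidth estimate */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char *buf = malloc(NBYTES);
    double t0 = MPI_Wtime();
    for (int i = 0; i < REPS; i++) {
        if (rank == 0) {
            MPI_Send(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, NBYTES, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, NBYTES, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double elapsed = MPI_Wtime() - t0;

    if (rank == 0) {
        double round_trip = elapsed / REPS;                   /* seconds            */
        double bandwidth  = 2.0 * NBYTES / round_trip / 1e9;  /* GB/s, both directions */
        printf("round trip: %.1f us, bandwidth: %.2f GB/s\n",
               round_trip * 1e6, bandwidth);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}
```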

Figure taken from https://www.dnsstuff.com/latency-throughput-bandwidth
Today, the predominant interconnect types in HPC systems are InfiniBand and Ethernet; however, newer interconnects such as Slingshot, Omni-Path, and Tofu have recently emerged.

Graph taken from https://www.top500.org/statistics/list/
Supercomputers
Supercomputers are extremely large cluster computers built from specially designed hardware optimized for parallel processing, including specialized processors such as GPUs, high-speed interconnects, and large-scale shared storage systems. Supercomputers usually consist of millions of processing cores and can deliver sustained performance measured in petaflops.
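As a rough illustration of how such performance figures add up, the sketch below computes a theoretical peak from assumed (not real) machine parameters: number of nodes × cores per node × clock frequency × floating-point operations per cycle.

```c
/* Back-of-the-envelope peak performance; all numbers are illustrative
 * assumptions, not the specifications of any real machine. */
#include <stdio.h>

int main(void)
{
    double nodes           = 1000;   /* compute nodes in the cluster       */
    double cores_per_node  = 128;    /* CPU cores per node                 */
    double clock_ghz       = 2.0;    /* clock frequency in GHz             */
    double flops_per_cycle = 16;     /* e.g. wide SIMD fused multiply-add  */

    double peak = nodes * cores_per_node * clock_ghz * 1e9 * flops_per_cycle;
    printf("theoretical peak: %.1f PFLOP/s\n", peak / 1e15);
    /* 1000 * 128 * 2e9 * 16 = 4.096e15 FLOP/s, i.e. about 4.1 PFLOP/s */
    return 0;
}
```

Sustained application performance is typically well below this theoretical peak, precisely because of the memory and network bottlenecks discussed above.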