This post was originally written by Murugan Sekar and Krishnaprasad K from the Dell Hypervisor Engineering team.
In NUMA-enabled systems, memory channels are distributed across the processors. All memory-related operations require snoop operations in order to maintain cache coherency. Snooping probes the caches of both local and remote processors to determine whether a copy of the requested data resides in any of them. If NUMA is disabled (Node Interleaving enabled in BIOS), snoop mode is disabled automatically.
Three snoop modes are available in the Intel Haswell microarchitecture. Dell 13th generation (13G) servers support all three:
1) Early snoop
2) Home snoop
3) Cluster On Die
In this blog we discuss the Cluster-On-Die (COD) snoop mode in the context of VMware ESXi. The blog covers what COD is, the prerequisites for enabling it, and how to check the COD status from VMware ESXi.
Before we get into the details of COD, it helps to understand how processors based on the Intel Haswell microarchitecture are classified by core count.
Intel classifies Haswell processors into the following types:
1) LCC: Low core count [4-8 cores]
2) MCC: Medium core count [10-12 cores]
3) HCC: High core count [14-18 cores]
NOTE: These core count ranges vary across Intel microarchitectures.
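The classification above can be expressed as a small lookup. This is a minimal sketch that uses only the Haswell ranges listed in this post; the function name `haswell_category` is hypothetical and chosen for illustration.

```python
from typing import Optional

# Core-count ranges exactly as listed above for Intel Haswell.
# Other microarchitectures use different ranges.
HASWELL_CATEGORIES = {
    "LCC": range(4, 9),    # 4-8 cores
    "MCC": range(10, 13),  # 10-12 cores
    "HCC": range(14, 19),  # 14-18 cores
}


def haswell_category(cores: int) -> Optional[str]:
    """Return "LCC", "MCC" or "HCC", or None if the count is outside the listed ranges."""
    for name, core_range in HASWELL_CATEGORIES.items():
        if cores in core_range:
            return name
    return None


if __name__ == "__main__":
    for cores in (8, 12, 18):
        print(cores, haswell_category(cores))
```

Counts that fall between the listed ranges (for example 9) return `None` rather than being guessed into a category.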
What is Cluster-On-Die (COD) mode?
COD is a new snoop mode introduced with the Intel Haswell processor family, available on processors that have 10 or more cores. For the MCC and HCC processor categories, Intel has incorporated two memory controllers on a single processor socket, whereas an LCC processor has only one memory controller. Each memory controller in a processor socket acts as one Home Agent (HA).
On COD-enabled servers, each processor socket is logically split into two NUMA nodes. Each NUMA node has half of the total number of physical cores and half of the last level cache (LLC), with one home agent. The term "cluster" comes from grouping the processor cores with their corresponding memory controller on the socket die. Each home agent uses two memory channels and sees requests from fewer logical processors, thus providing higher memory bandwidth and lower latency. This operating mode is mainly used for optimizing NUMA workloads. The operating system determines the number of NUMA nodes by reading the ACPI SRAT table.
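To make the split concrete, here is a minimal sketch that models what each cluster receives when a socket runs in COD mode. It follows the description above (half the cores, half the LLC, one home agent, two memory channels per cluster). The `Cluster` type, the `cod_clusters` function and the 12-core, 30 MB example values are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Cluster:
    cores: int            # physical cores in this NUMA node
    llc_mb: float         # share of the last level cache, in MB
    home_agents: int      # one home agent (memory controller) per cluster
    memory_channels: int  # memory channels served by that home agent


def cod_clusters(cores_per_socket: int, llc_mb: float) -> List[Cluster]:
    """Split one socket into the two clusters described for COD mode."""
    if cores_per_socket < 10:
        raise ValueError("COD applies to processors with 10 or more cores")
    if cores_per_socket % 2:
        raise ValueError("expected an even core count so it can be halved")
    half = Cluster(
        cores=cores_per_socket // 2,
        llc_mb=llc_mb / 2,
        home_agents=1,
        memory_channels=2,
    )
    return [half, half]


if __name__ == "__main__":
    # Illustrative values: a 12-core socket with 30 MB of LLC.
    for node, cluster in enumerate(cod_clusters(12, 30.0)):
        print(node, cluster)
```

For the example values, each of the two clusters ends up with 6 cores and 15 MB of LLC.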
A graphical representation of COD is as follows:
It can be seen in the second image that the single processor socket die is divided into two logical nodes when COD is enabled.
In this section, we discuss the prerequisites from both the hardware and the VMware ESXi point of view.
Let’s take an example to better understand the above prerequisite. For a server with only two memory modules per channel populated, the following slots need to be populated for a specific channel:
With 4 memory modules,
With 8 memory modules,
NOTE: A minimum of two memory modules must be populated in order to enable COD.
How do I check COD status from VMware ESXi?
VMware ESXi reads the ACPI SRAT (System Resource Affinity Table) and SLIT (System Locality Information Table) to identify and map the available hardware resources, including the NUMA nodes. This section covers a few command-line options that users can use to see the COD state from VMware ESXi.
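One way to check the reported NUMA node count in a script is to parse the text printed by `esxcli hardware memory get`, which includes a `NUMA Node Count` line. The sketch below is illustrative only: the sample text is hand-written in that format rather than captured from a real host, and the helper name `numa_node_count` is hypothetical.

```python
import re
from typing import Optional

# Hand-written sample in the format of `esxcli hardware memory get`.
# The values are illustrative, not captured from a real host.
SAMPLE_OUTPUT = """\
   Physical Memory: 137438953472 Bytes
   Reliable Memory: 0 Bytes
   NUMA Node Count: 4
"""


def numa_node_count(text: str) -> Optional[int]:
    """Extract the NUMA node count from the command output, or None if absent."""
    match = re.search(r"^\s*NUMA Node Count:\s*(\d+)\s*$", text, re.MULTILINE)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(numa_node_count(SAMPLE_OUTPUT))
```

On a two-socket server, a count of 4 would be consistent with COD being enabled, and a count of 2 with COD being disabled.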
The following screenshots are taken from a system with two processor sockets and 128GB system memory. In the default configuration without COD enabled, esxtop would display two NUMA nodes with 64GB allocated per NUMA node. The following figure shows the esxtop command output in VMware ESXi with COD disabled.
With COD enabled, esxtop lists four NUMA nodes instead of two, because each processor socket die is divided into two.
In COD mode, the operating system sees two NUMA nodes per socket. COD has the best local latency. Each home agent sees requests from fewer threads, potentially offering higher memory bandwidth. COD mode supports in-memory directory bits. This mode is best for highly NUMA-optimized workloads. Refer to the blog published by the Dell HPC team for details on the different snoop modes.
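The node counts above follow from simple arithmetic. This is a minimal sketch that assumes memory is populated evenly across all sockets and memory controllers; the function name `numa_layout` is hypothetical.

```python
from typing import Tuple


def numa_layout(sockets: int, total_memory_gb: float, cod_enabled: bool) -> Tuple[int, float]:
    """Return (NUMA node count, memory per node in GB).

    Assumes memory is spread evenly across all nodes.
    """
    nodes = sockets * (2 if cod_enabled else 1)
    return nodes, total_memory_gb / nodes


if __name__ == "__main__":
    # The example system from this post: two sockets, 128 GB of memory.
    print(numa_layout(2, 128, cod_enabled=False))
    print(numa_layout(2, 128, cod_enabled=True))
```

For the two-socket, 128 GB example system, this gives 2 nodes of 64 GB with COD disabled and 4 nodes of 32 GB with COD enabled.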
Memory information of Dell PowerEdge 13th generation of servers
VMware KB calling out Intel COD support