11 septiembre 2013

Do you know what NUMA is? really? - Part 1

This post is about NUMA concept, because many people speaks about NUMA and don´t know "really" what is or how to use it.

By the way, NUMA is refered to the server platforms with more than one system bus and dedicates different memory banks to different processors. 

See an example with 2 bus (2 sockets) which  access to their own memmory banks internally and  access the rest memmory banks when interleavng is disable on the BIOS :

Each CPU (socket) + local memory = NUMA node

In the past, there were servers with only one bus and the CPU was increased in Ghz  more and more, and the memmory consumtion grow up; it´s teh cause nowadays the servers have more than one bus and you must install the memmory on banks pairing the bus.

If the memory is not populated correctly and distributed equally across the nodes the O.S. stop responding and display a purple screen (PSOD) with the following NUMA node error message:

In a NUMA architecture, processors may access local memory quickly and remote memory more slowly. This can dramatically improve memory throughput as long as the data are localized to specific processes (and thus processors). On the downside, NUMA makes the cost of moving data from one processor to another, as in workload balancing, more expensive.  The high latency of remote memory accesses can leave the processors under-utilized, constantly waiting for data to be transferred to the local node, and the NUMA connection can become a bottleneck for applications with high-memory bandwidth demands.

However, an advanced memory controller allows a node to use memory on all other nodes, creating a single system image. When a processor accesses memory that does not lie within its own node (remote memory), the data must be transferred over the NUMA connection, which is slower than accessing local memory. Memory access times are not uniform and depend on the location of the memory and the node from which it is accessed, as the technology’s name implies.

Memory interleaving refers to the way the system maps its memory addresses to the physical  memory locations in the memory channels and DIMMs. Typically, consecutive system memory addresses are staggered across the DIMM ranks and across memory channels in the following manner:

>Rank Interleaving. Every consecutive memory cache line (64 bits) is mapped to a different DIMM rank.
>Channel Interleaving. Every consecutive memory cache line is mapped to a different memory channel.

disabling Node Interleaving = NUMA active
At least, NUMA options is only avaliable on Intel Nehalem and AMD Opteron 

Next post "Part 2" will review the NUMA use on VMware.