With today’s multi-socket, multi-core, highly threaded PowerEdge servers, the operating system, applications, and drivers are expected to be written to take advantage of this massively parallel architecture.

  • While most industry-standard benchmarks and tools (e.g., SPECrate®, SPECweb®, VMware® VMmark™, and database benchmarks from the Transaction Processing Performance Council) can be configured and optimized to saturate all of the processing power of these servers, they typically measure maximum throughput (transactions, I/O operations, or pages per second), reported as IOPS, tpmC, tpsE, OPM, and similar metrics.
  • Many organizations, however, especially in the financial industry (where high-frequency trading occurs), focus instead on reducing the time it takes to complete a single task. In these cases, the goal is to reduce system latency rather than increase throughput, and results are typically measured in sub-second response times: milliseconds (ms), microseconds (µs), or nanoseconds (ns).
  • To reduce system latency, the entire solution must be taken into consideration:
    • The server, including processor and memory architecture and BIOS tuning
    • The network stack—especially network driver tunings such as coalesce settings
    • Operating system (OS) selection and tuning (e.g., kernel/registry settings and binding/pinning interrupts of high-I/O devices)
    • Application tuning (e.g., affinitizing processes/threads to local memory in a Non-Uniform Memory Access, or NUMA, environment)
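As a rough illustration of the last three items, the tunings might look like the following on a Linux host. This is a sketch, not a definitive recipe: the interface name `eth0`, the CPU/NUMA numbers, and the application name `trading_app` are placeholders, and the exact `ethtool` options supported vary by NIC driver.

```shell
# Reduce interrupt coalescing on the NIC so received packets are
# delivered immediately rather than batched (trades CPU for latency).
# "eth0" is a placeholder interface name; requires root.
ethtool -C eth0 rx-usecs 0 tx-usecs 0

# Bind the NIC's interrupt to a specific core (here CPU 2) so its
# cache stays warm. Look up the IRQ number in /proc/interrupts first.
IRQ=$(awk -F: '/eth0/ {gsub(/ /,"",$1); print $1; exit}' /proc/interrupts)
echo 4 > /proc/irq/$IRQ/smp_affinity   # bitmask 0x4 = CPU 2

# Launch the latency-sensitive application pinned to the NUMA node
# local to the NIC, with its memory allocated from that same node.
numactl --cpunodebind=0 --membind=0 ./trading_app
```

Pinning the interrupt and the application to the same NUMA node avoids cross-socket memory accesses, which is the point of the NUMA-affinity bullet above.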

Update May 2012: The low-latency white paper has been updated for PowerEdge 12th Generation Servers:

Link to blog

Link to whitepaper

Update Nov 2012: Limiting processor C-state usage by the OS is often desirable in latency-sensitive environments. A brief white paper discussing ways of controlling C-state usage in Linux is available here.
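As a minimal sketch of two common C-state controls on Linux (both require root; the boot parameters shown apply to Intel platforms and should be verified against your kernel's documentation):

```shell
# Option 1: restrict C-states at boot by appending kernel parameters
# to the kernel command line in the GRUB configuration, e.g.:
#   intel_idle.max_cstate=0 processor.max_cstate=1
# (optionally idle=poll for the lowest latency at maximum power draw)

# Option 2: at runtime, hold /dev/cpu_dma_latency open with a target
# of 0 microseconds; the kernel keeps CPUs out of deep C-states for
# as long as the file descriptor remains open.
exec 3<> /dev/cpu_dma_latency
printf '\x00\x00\x00\x00' >&3    # 32-bit little-endian 0 = 0 us
# ... run the latency-sensitive workload while fd 3 is open ...
exec 3>&-                        # closing the fd restores defaults
```

The runtime approach is often preferable because deep C-states are re-enabled automatically when the process exits, whereas the boot-parameter approach is persistent until the next reconfiguration.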