Measuring L1 cache cold read latency in C++
This medium-difficulty coding problem tests your ability to write a low-level microbenchmark that accurately measures CPU cache behavior. It combines systems knowledge (cache hierarchy, memory access patterns) with practical C++ techniques (compiler barriers, high-resolution timing) — the kind of work quant and HFT firms rely on to tune execution infrastructure.
The core challenge is isolating the latency of a single cold load while preventing the compiler from optimizing away your measurement or the memory access itself. You'll need to understand cache eviction, choose a timing source with nanosecond-scale resolution, and structure your loop so that each read genuinely misses L1. Because reading a timer costs more than one load, the usual approach is to time a long chain of dependent loads and divide by the number of loads. The final estimate should be reasonable relative to typical CPU specifications, though exact values depend on the hardware: an L1 hit usually costs about 4 to 5 cycles (roughly 1 ns at 3 to 5 GHz), while a load that misses L1 and is served from L2 typically takes a few nanoseconds.
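One way to approach this is a pointer chase, sketched below under stated assumptions rather than as a reference solution. The names `Node`, `link_random_cycle`, and `chase_ns_per_load` are hypothetical, the 64-byte line size is an assumption to check against your target, and the code needs C++17 so that `std::vector` honors the over-aligned type. Each node fills one cache line, and the nodes are linked into a single random cycle so that every load depends on the previous one and the prefetcher has little to predict.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Assumed value: 64-byte lines are common on x86-64, but verify on your CPU.
constexpr std::size_t kCacheLine = 64;

// One pointer per cache line; alignas pads the struct to a full line.
struct alignas(kCacheLine) Node {
    Node* next = nullptr;
};

// Link all nodes into one random cycle: shuffle the visiting order, then
// point each node at its successor in that order (the last wraps to the first).
inline void link_random_cycle(std::vector<Node>& nodes, std::uint64_t seed) {
    assert(!nodes.empty());
    std::vector<std::size_t> order(nodes.size());
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937_64 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);
    for (std::size_t i = 0; i < order.size(); ++i) {
        nodes[order[i]].next = &nodes[order[(i + 1) % order.size()]];
    }
}

// Follow `steps` dependent loads and return the average nanoseconds per load.
inline double chase_ns_per_load(const Node* start, std::size_t steps) {
    assert(start != nullptr && steps > 0);
    const Node* p = start;
    const auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < steps; ++i) {
        p = p->next;  // the address of each load comes from the previous load
    }
    const auto t1 = std::chrono::steady_clock::now();
    // Store the final pointer through a volatile object so the compiler
    // cannot discard the chain of loads as dead code.
    static const Node* volatile sink = nullptr;
    sink = p;
    const std::chrono::duration<double, std::nano> elapsed = t1 - t0;
    return elapsed.count() / static_cast<double>(steps);
}
```

To use it, size the vector so the working set is larger than L1 data cache but still fits in L2 (for example 4096 nodes, which is 256 KiB with 64-byte lines, an illustrative figure only), call `link_random_cycle`, walk the cycle once untimed to bring the lines into L2, and then call `chase_ns_per_load` with a step count in the millions. With a working set this size, TLB misses can also contribute to the result, so treat the number as an estimate.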
- Cache line size and associativity
- Compiler optimization and volatile/asm barriers
- High-resolution timing sources (RDTSC, clock_gettime, std::chrono)
- Noise, variance, and averaging in microbenchmarks
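The timing and noise topics above can be illustrated with a small harness. This is a minimal sketch, not a prescribed method: `Summary`, `summarize`, and `time_trials` are hypothetical names, and `std::chrono::steady_clock` is used only because it is portable (RDTSC or `clock_gettime` are alternatives with different trade-offs). It repeats a workload several times and reports the minimum, median, and maximum per-operation time, since a single run can be distorted by interrupts, frequency changes, or scheduling.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct Summary {
    double min_ns;
    double median_ns;
    double max_ns;
};

// Summarize per-trial timings. The vector is taken by value because the
// function sorts its own copy.
inline Summary summarize(std::vector<double> samples) {
    assert(!samples.empty());
    std::sort(samples.begin(), samples.end());
    const std::size_t n = samples.size();
    const double median = (n % 2 == 1)
        ? samples[n / 2]
        : 0.5 * (samples[n / 2 - 1] + samples[n / 2]);
    return Summary{samples.front(), median, samples.back()};
}

// Run `work()` `trials` times and return nanoseconds per operation for each
// trial, where one call to `work()` performs `ops_per_trial` operations.
template <typename Work>
std::vector<double> time_trials(Work&& work, std::size_t trials,
                                std::size_t ops_per_trial) {
    assert(ops_per_trial > 0);
    std::vector<double> per_op_ns;
    per_op_ns.reserve(trials);
    for (std::size_t t = 0; t < trials; ++t) {
        const auto t0 = std::chrono::steady_clock::now();
        work();
        const auto t1 = std::chrono::steady_clock::now();
        const std::chrono::duration<double, std::nano> elapsed = t1 - t0;
        per_op_ns.push_back(elapsed.count() /
                            static_cast<double>(ops_per_trial));
    }
    return per_op_ns;
}
```

The median is less sensitive to outliers than the mean, and the minimum is often read as the least-disturbed run. Reporting both, along with the spread, makes it easier to judge whether a measurement is stable enough to trust.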