Memory Layers Latency Difference

There’s a term in computer architecture called memory hierarchy. In simple terms, it’s about arranging different memory/storage types by response time. In software design & architecture, the memory hierarchy directly affects performance and can make or break the design of a particular software system. That is why the latency difference between memory layers matters so much in software design.

Basics of memory hierarchy

In a nutshell, applying the memory hierarchy to software design comes down to two principles:

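  • There are 4 memory/storage layers, in order of increasing latency: CPU cache / RAM / disk / network.
  • As a rule of thumb, each layer is roughly 10 times slower than the previous one. That means RAM is ~10x slower than CPU cache and network is ~10x slower than disk. Actual gaps depend on hardware and workload, and for raw latency they are often much larger.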
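A practical consequence of the 10x rule is that a faster layer only helps as much as its hit rate allows. Below is a minimal sketch of a simplified average access time calculation, assuming the article’s rule-of-thumb ratio (fast layer = 1 time unit, slower layer = 10 units). The function name `effective_latency` is our own, and the cost of the fast-layer lookup on a miss is ignored for simplicity:

```python
def effective_latency(hit_rate: float, fast: float, slow: float) -> float:
    """Average access time when a fast layer caches a slower one.

    Hits are served by the fast layer; misses pay the slow layer's cost.
    The fast-layer lookup cost on a miss is ignored (simplifying assumption).
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * fast + (1.0 - hit_rate) * slow


# Rule-of-thumb units: the slower layer costs 10x the faster one.
FAST, SLOW = 1.0, 10.0

for rate in (0.0, 0.5, 0.9, 0.99):
    print(f"hit rate {rate:.0%}: {effective_latency(rate, FAST, SLOW):.2f} units")
```

Even a 90% hit rate leaves the average at about 1.9 units, nearly twice the cost of the fast layer, so every extra percent of hits matters.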

There are numerous applications of these principles in the software we use daily. Most of them have become so natural that we barely recognize them:

  • Compressing files before transferring them over the network. This reduces download time roughly in proportion to the compression ratio, at the cost of some CPU time to compress and decompress.
  • Anything downloaded over the network can be cached on disk, which immediately gives ~10x lower response time for the cached resource. Modern web servers & browsers rely on this heavily thanks to the caching built into HTTP’s fundamental design (e.g., the Cache-Control and ETag headers).
  • Anything stored on disk can be cached in RAM for a similar ~10x reduction in fetch time. That’s how OSes speed up apps after the initial load. New apps you run that depend on the same shared libraries benefit immediately because those libraries are already cached.
  • Algorithms optimized for performance may aim for a higher rate of CPU cache hits to avoid falling back to fetching data from RAM.
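To make the compression point concrete, here is a small sketch that compresses a sample payload with Python’s standard `zlib` module and estimates transfer time over an assumed 1 Gbit/s (~125 MB/s) link. The repetitive payload and the `transfer_seconds` helper are illustrative assumptions, and the model ignores latency, protocol overhead, and CPU time:

```python
import zlib

NETWORK_BYTES_PER_SEC = 125_000_000  # assumed 1 Gbit/s link, ~125 MB/s


def transfer_seconds(num_bytes: int, bytes_per_sec: float = NETWORK_BYTES_PER_SEC) -> float:
    """Idealized transfer time: payload size divided by link throughput."""
    return num_bytes / bytes_per_sec


# Highly repetitive sample payload (log-like lines); real ratios depend on the data.
payload = b"GET /index.html 200 OK\n" * 100_000
compressed = zlib.compress(payload, level=6)

raw_time = transfer_seconds(len(payload))
zipped_time = transfer_seconds(len(compressed))
ratio = len(payload) / len(compressed)

print(f"raw: {len(payload)} bytes, {raw_time * 1000:.2f} ms")
print(f"compressed: {len(compressed)} bytes, {zipped_time * 1000:.2f} ms")
print(f"compression ratio {ratio:.1f}x, transfer time cut by the same factor")

# Round-trip check: decompression must restore the original bytes.
assert zlib.decompress(compressed) == payload
```

Text-like data such as logs, HTML, or JSON usually compresses well; already-compressed formats (JPEG, ZIP) gain little, so the trade-off is worth measuring per workload.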

Memory hierarchy in modern software

Modern software systems are extremely dynamic & distributed: many different components with constantly changing configurations spread over the network (think k8s). That shift in software design has dramatically increased reliance on the network layer of the memory hierarchy. Because of that, the latency difference between memory layers may seem to play a much less significant role. In other words, the network would seem to be the de facto limiting factor of overall system performance.

That’s actually too simplistic. 1 Gbit/s network transfer speed (125 MB/s) is comparable to HDD transfer speed (100-200 MB/s). However, it still complies with the 10x principle when compared with SSD transfer speed (~2 GB/s for NVMe drives). An in-memory cache layer (Redis or Memcached) is very fast, but the app layer talks to it over the network. The number of possible interaction types has exploded compared with the standalone apps of the 90s, where applying the memory hierarchy was simple and explicit.
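The throughput comparison above can be checked with simple arithmetic. This sketch computes how long it takes to move a 1 GB file (decimal units) at each quoted speed; using 150 MB/s as the HDD midpoint is our own assumption, and real throughput varies with hardware, file layout, and protocol overhead:

```python
FILE_MB = 1000  # a 1 GB file, decimal units

# Sequential throughput figures quoted in the text, in MB/s.
THROUGHPUT_MB_S = {
    "1 Gbit/s network": 125,
    "HDD (midpoint of 100-200 MB/s)": 150,
    "SSD (NVMe)": 2000,
}


def seconds_to_read(size_mb: float, mb_per_sec: float) -> float:
    """Idealized time to move size_mb at a constant throughput."""
    return size_mb / mb_per_sec


for name, speed in THROUGHPUT_MB_S.items():
    print(f"{name:32} {seconds_to_read(FILE_MB, speed):6.2f} s")

ssd_vs_net = THROUGHPUT_MB_S["SSD (NVMe)"] / THROUGHPUT_MB_S["1 Gbit/s network"]
print(f"SSD is {ssd_vs_net:.0f}x faster than the network link")
```

The file takes 8 s over the network, about 6.7 s from the HDD, and 0.5 s from the SSD: the network and HDD are in the same league, while the SSD is 16x faster than the link, in line with the 10x principle.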

The bottom line? Like the secret sauce of a scalable backend, the memory hierarchy is a subtle but very important concept to be mindful of. Apply it carefully to your system’s workloads:

  • Is it justified to fall back to the network every time for this particular interaction?
  • If we can cache a network-based resource, how long will the cache stay fresh?
  • Where should a network-based resource be cached – on disk or in RAM?
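These questions map onto a caching policy: when to fall back to the slow layer, how long an entry stays fresh (a time-to-live, or TTL), and which tier holds it. Here is a minimal sketch of an in-RAM read-through cache with a TTL. The `TTLCache` class and the `fetch_from_network` function are hypothetical stand-ins rather than a specific library’s API, the clock is injectable so expiry can be demonstrated deterministically, and a disk tier is left out for brevity:

```python
import time
from typing import Callable, Dict, Generic, Hashable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class TTLCache(Generic[K, V]):
    """In-RAM read-through cache that refetches entries older than ttl seconds."""

    def __init__(
        self,
        fetch: Callable[[K], V],
        ttl: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch  # slow path, e.g. a network call (hypothetical)
        self._ttl = ttl
        self._clock = clock
        self._entries: Dict[K, Tuple[float, V]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1  # fresh copy in RAM: no network round trip
            return entry[1]
        self.misses += 1  # absent or stale: fall back to the slow layer
        value = self._fetch(key)
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    fake_now = [0.0]  # manually advanced clock for a deterministic demo

    def fetch_from_network(url: str) -> str:  # hypothetical stand-in
        return f"body of {url}"

    cache = TTLCache(fetch_from_network, ttl=60.0, clock=lambda: fake_now[0])
    cache.get("https://example.com/a")  # miss: fetched from the "network"
    cache.get("https://example.com/a")  # hit: served from RAM
    fake_now[0] = 61.0
    cache.get("https://example.com/a")  # miss: entry expired, refetched
    print(f"hits={cache.hits} misses={cache.misses}")
```

The TTL answers the freshness question; choosing it is a trade-off between serving stale data and paying for more network round trips. Putting the same structure in front of a disk store instead of a plain dict would answer the “disk or RAM” question differently.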

Every inefficiency of every compute node adds up to the overall inefficiency of the whole system. The opposite is also true – efficient use of each storage type adds up to the efficiency of the whole system. For modern systems, the network effects of orchestrating their many parts are more significant than ever.