CS 473 - The Memory Hierarchy

Some cold hard facts:

  1. smaller is faster
  2. smaller takes fewer bits to address (so indirectly faster)
  3. cheaper is slower

These basic facts are responsible for the memory hierarchy as we see it today. It's normally represented as a pyramid (note that these numbers are really rough, and changing all the time):

  Size      Latency          Bandwidth    Level                                       Speed
  smaller   sub-nanosecond   GB/sec       registers                                   faster
            couple nsec      GB/sec       cache (typically several levels of cache)
            50 nsec          100 MB/sec   main memory
  bigger    8 msec           20 MB/sec    disk (VM)                                   slower

As of April, 2005, information I can find on the web says that Intel's original P4 L1 cache took two cycles, and the new Prescott core takes three. The L2 cache latency is around 20-30 cycles for those processors that don't have an L3 cache; for those that do, L2 latency is around ten cycles and L3 seems to be around 20. In fact, it looks from the specs as if the L1 and L3 caches on these processors are the L1 and L2 from the machines that only have two levels, with a new level inserted "between" them. Memory latency has been stalled for a long, long time (which, for processors running at more than a GHz, works out to hundreds of cycles!). Current memory technologies work really hard to use prefetching and split-transaction techniques to hide this.

Further note: reviewing this information in March, 2006 doesn't show much change.
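
A hedged illustration of what software prefetching looks like from the program's side: the C sketch below uses the GCC/Clang __builtin_prefetch intrinsic to start fetching array elements a little ahead of where they are used, so the memory latency overlaps with useful work. The look-ahead distance of 16 elements is an invented number, not a tuned one, and for a simple sequential scan like this the hardware prefetcher would usually do the job on its own.

    #include <stddef.h>

    /* Look-ahead distance, in elements; an arbitrary assumption. */
    #define PREFETCH_AHEAD 16

    double sum_with_prefetch(const double *a, size_t n)
    {
        double s = 0.0;
        for (size_t i = 0; i < n; i++) {
            /* Hint: we'll want a[i + PREFETCH_AHEAD] soon (read access,
               low temporal locality).  This is only a hint; correctness
               does not depend on it. */
            if (i + PREFETCH_AHEAD < n)
                __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 1);
            s += a[i];
        }
        return s;
    }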

Address Spaces

It tends to be helpful to talk about the memory hierarchy both in terms of the actual levels of storage present, and in terms of the "address spaces" present. First, an address: for our purposes, an address is a means of specifying a location (is that broad enough?). An address space is a set of addresses, and consequently a means of specifying a set of locations. So, for instance, the register numbers (0-31 on MIPS) are an address space; the virtual addresses your program can generate (0x00000000-0xffffffff on a 32-bit machine; we'll talk about what a virtual address is in just a second) are an address space; and the file system is an address space.

The full memory hierarchy makes use of these three address spaces. Transfers between address spaces are always managed by software; transfers between levels within a single address space may be managed by either hardware or software. Transfers between main memory and cache, or between cache levels, will be managed by hardware, while transfers between disk and memory will be managed by software.

A very important concept for any discussion about memory hierarchies is the notion of "virtual memory." While this term is usually used to refer specifically to the memory address space seen by a program, and its partial mapping to main memory, it's really a much more general notion: a mapping from an address as seen by a program (the virtual address) to a hardware (physical) address. These days the concept is used at all levels of the memory hierarchy: the computer supports virtual memory (in the narrow sense), of course; there is a different mapping from either virtual or physical addresses to cache locations; and there is even typically renaming of registers within the CPU.
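
To make the virtual-to-physical mapping concrete, here is a minimal sketch in C of the address arithmetic involved, assuming a toy single-level page table, 4 KB pages, and 32-bit addresses (all invented parameters; real translation hardware adds valid/protection bits, a TLB, and usually multiple table levels):

    #include <stdint.h>

    #define PAGE_SHIFT 12                     /* 4 KB pages (assumption)      */
    #define PAGE_SIZE  (1u << PAGE_SHIFT)
    #define NUM_PAGES  (1u << 20)             /* 32-bit addresses, 20-bit VPN */

    /* Toy page table: virtual page number -> physical frame number. */
    static uint32_t page_table[NUM_PAGES];

    uint32_t translate(uint32_t vaddr)
    {
        uint32_t vpn    = vaddr >> PAGE_SHIFT;      /* virtual page number */
        uint32_t offset = vaddr & (PAGE_SIZE - 1);  /* byte within page    */
        uint32_t frame  = page_table[vpn];          /* physical frame      */
        return (frame << PAGE_SHIFT) | offset;      /* physical address    */
    }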

When discussing the memory hierarchy, we normally just talk about the part of the hierarchy involving the memory address space: so we discuss mappings from virtual addresses to physical addresses, and from either virtual or physical addresses to cache addresses.
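
The mapping from a (physical) address to a cache location is the same kind of bit-level split. The sketch below assumes a direct-mapped cache with 64-byte blocks and 256 sets (again, invented parameters): the low bits pick the byte within a block, the next bits pick the set, and the remaining bits form the tag that the cache stores and compares to detect a hit.

    #include <stdint.h>

    #define BLOCK_BITS 6    /* 64-byte blocks (assumption) */
    #define INDEX_BITS 8    /* 256 sets       (assumption) */

    /* Split an address into the three fields a direct-mapped cache uses. */
    void split_address(uint32_t paddr,
                       uint32_t *offset, uint32_t *index, uint32_t *tag)
    {
        *offset = paddr & ((1u << BLOCK_BITS) - 1);
        *index  = (paddr >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1);
        *tag    = paddr >> (BLOCK_BITS + INDEX_BITS);
    }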

Criteria

Cost: the sum of the costs of the parts.

Mean access time: the weighted average of the access times of the levels, weighted by how often each level satisfies an access.

We want to minimize the average time to access data, while also minimizing cost. This implies that the more frequently we access some memory, the closer to the top of the hierarchy it should be: when we think we'll be using something frequently, we move it toward the top.
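
A rough worked example of that weighted average, using made-up but plausible numbers (a 2 ns cache that hits 98% of the time, backed by a 50 ns main memory):

    #include <stdio.h>

    int main(void)
    {
        /* Invented numbers, for illustration only. */
        double hit_rate   = 0.98;   /* fraction of accesses that hit in cache */
        double cache_time = 2.0;    /* ns                                     */
        double mem_time   = 50.0;   /* ns, paid in addition on a cache miss   */

        double mean = cache_time + (1.0 - hit_rate) * mem_time;
        printf("mean access time = %.2f ns\n", mean);   /* 2 + 0.02*50 = 3 ns */
        return 0;
    }

Even a 2% miss rate already adds half again the cache's own latency to the average; the weighting is dominated by how rarely we have to go down the hierarchy.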

Characteristics of Levels

Registers: only a few of them (some architectures, like MIPS, have quite a few more than others, like IA-32, but compared to the memory space the number is still tiny), so only a few bits are needed to address them. Also, notice that registers are shared between programs, so it is the OS's responsibility to keep programs from squishing each other's registers. Registers and on-chip cache actually have very similar latency (within a factor of ten), but registers will have higher bandwidth (since they'll have more ports). Data movement between the memory address space and the register address space is managed by the program.
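
A small C illustration of that last point (where "the program" in practice usually means the compiler): in the loop below, an optimizing compiler will normally keep the running sum in a register the whole time, so the only traffic between the memory and register address spaces is the load of each array element.

    #include <stddef.h>

    long sum(const long *a, size_t n)
    {
        long s = 0;          /* normally lives in a register for the whole loop */
        for (size_t i = 0; i < n; i++)
            s += a[i];       /* one load per element; the add itself is         */
                             /* register-to-register                            */
        return s;
    }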

Cache/main memory/disk: all in same, per-program address space. Caches and main memory have similar speed (at least compared to disk), so hardware can (and must) manage transfers between them. Disk is much slower than memory, so software can (and must) manage transfers between disk and memory (this is normally managed by the OS. There have been experiments with giving processes control over their own virtual memory; logically, one would expect that this could yield higher performance, but it hasn't become widespread).
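
One place this division of labor is visible from a program is a memory-mapped file: the program just dereferences pointers, and the OS moves pages between disk and memory on demand. A minimal POSIX sketch (error handling kept short; "data.bin" is an invented file name):

    #include <stdio.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/stat.h>

    int main(void)
    {
        int fd = open("data.bin", O_RDONLY);          /* invented file name */
        if (fd < 0) { perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

        /* Map the file into the virtual address space.  Nothing is read
           yet; the OS pages data in from disk as the loop touches it. */
        unsigned char *p = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        unsigned long checksum = 0;
        for (off_t i = 0; i < st.st_size; i++)
            checksum += p[i];    /* a miss here is a page fault the OS services */

        printf("checksum = %lu\n", checksum);
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }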


Problem: what gives us the idea that we'll have useful hit rates? Consider a 2nd-level cache vs. main memory. A 512K cache and 64M of main memory are reasonable sizes. Since 512K/64M = 1/128, this says that, if memory accesses were random, less than 1% of accesses would hit in cache!

Works (at all levels) because accesses are not random. Principle of locality:

  1. temporal locality: something accessed recently is likely to be accessed again soon
  2. spatial locality: something near a recently accessed location is likely to be accessed soon

The key to effective memory system design is to identify good predictors of future accesses, and put things that are likely to be used soon into "closer" levels of the hierarchy.
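
The classic demonstration of spatial locality is the order in which a two-dimensional array is walked. In the C sketch below (array size is arbitrary), the first pair of loops visits consecutive addresses, so each cache block fetched gets used several times; the second pair strides across the rows, so nearly every access touches a different block and misses far more often, even though both compute the same sum.

    #include <stdio.h>

    #define N 1024

    static double a[N][N];           /* C stores this row by row */

    int main(void)
    {
        double s1 = 0.0, s2 = 0.0;

        /* Row-major walk: good spatial locality. */
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                s1 += a[i][j];

        /* Column-major walk: consecutive accesses are N*8 bytes apart. */
        for (int j = 0; j < N; j++)
            for (int i = 0; i < N; i++)
                s2 += a[i][j];

        printf("%f %f\n", s1, s2);   /* keep the loops from being optimized out */
        return 0;
    }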


Last modified: Wed Mar 29 11:01:28 MST 2006