Let's go on to the next level of the memory hierarchy: Virtual Memory. Then we'll need to go back and tie it all together, including some more cache info.
VM has a lot in common with cache, but there are some points that make it quite different; we'll see them as we go.
Virtual Memory gives us three major abilities, and a host of minor ones. The three major abilities are (1) efficient management of the memory resource, (2) separation of processes' memory (and memory protection), and (3) controlled memory sharing between processes.
Denning developed the concept of a "working set" in 1968. The fundamental notion here is that a process works in phases characterized by the code and data used; when it is in its first phase it uses one set of code and data; in its second it uses another set; and so forth. The code and data being used at the moment is the process's working set.
Virtual memory gives us the ability to approximate the program's working set, and only use up the memory resource on code and data likely to be used soon.
A big advantage to virtual memory is that it keeps the code and data seen by each process completely separate (except as modified in the next point!). One of the steps in performing a context switch (i.e. stopping execution of one process and starting execution of another) is to change the VM translations. In the case of the Intel VM system, this means loading a new address into the PDBR. Ordinarily, the translations are completely different between the two processes; P1 and P2 may both have an address 0x1234abcd, but it's mapped to different places in physical memory in each. This is about as complete a memory protection as you can imagine; two processes not only can't step on each other's memory, they can't even see each other.
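We can actually watch this from user level. Here's a small C sketch (the printed values are illustrative, and x landing at the same virtual address relies on fork() duplicating the parent's layout): after the fork(), parent and child print the same address for x, but a write in one is invisible to the other.

    /* Sketch: two processes using the very same virtual address
       for different contents. */
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int x = 1;

    int main(void) {
        if (fork() == 0) {      /* child: same virtual address for x... */
            x = 2;              /* ...but this write lands in the child's
                                   own physical page */
            printf("child:  &x=%p x=%d\n", (void *)&x, x);
            _exit(0);
        }
        wait(NULL);
        printf("parent: &x=%p x=%d\n", (void *)&x, x);  /* still 1 */
        return 0;
    }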
Even if we can see a piece of memory (in the sense of having a mapping for it), VM gives us a very convenient means of controlling what access we have: the same table that we use to store the translation from virtual to physical addresses is also used to represent what we can do to a piece of memory. This is important for program reliability even within a single program (tracking down bugs that involve over-writing your code is very time-consuming; it's worse than bugs that trample the stack). At a minimum, the VM scheme will separate read access from write access, and user access from system access. So we can have space that we can read but not write (typically the program code), or space that we can only access while running in kernel mode (typically the operating system itself). Some systems have had finer-grained variations in the memory protection; for instance, we might distinguish reading a location for an instruction fetch from reading it as data, and prohibit reading the code that's to be executed. Likewise, some systems have had a finer granularity on system state than just user/kernel mode, and have additional "rings" of protection for things like the math or standard IO libraries.
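To make the "same table" point concrete, here's a sketch of the low flag bits of a 32-bit Intel page table entry (just the three protection-relevant bits; the real entry has several more): one word gives both the translation and what we're allowed to do with it.

    #include <stdint.h>

    #define PTE_PRESENT  (1u << 0)   /* translation is valid */
    #define PTE_WRITABLE (1u << 1)   /* write access allowed */
    #define PTE_USER     (1u << 2)   /* accessible from user mode */

    /* Read-only user code page: present, user, but not writable. */
    uint32_t code_pte(uint32_t frame) {
        return (frame << 12) | PTE_PRESENT | PTE_USER;
    }

    /* Kernel data page: present and writable, but system-only. */
    uint32_t kernel_pte(uint32_t frame) {
        return (frame << 12) | PTE_PRESENT | PTE_WRITABLE;
    }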
If, for some reason, there is a desire to share space between two processes, that can be done as well: if we just set entries in two page tables to map to the same physical memory, we can share data between them. There are several reasons we might want to do this.
Remember the semantics of an interrupt: stripped to its essentials, a new value is stuffed into the program counter so the machine starts executing at the entry point to the device's interrupt service routine, and changes to system mode.
This has an implication: the interrupt service routine has to be in the same address space as the program that was executing (we could, in principle, have a reserved address space that's unique to interrupts. But that would make an interrupt a heavier-weight operation, and we may have a device that wants service now). And since an interrupt can occur at any time, the interrupt service routine has to be shared between the address spaces of all the processes. Consequently, the OS has to be in space shared between all processes.
You can also see how protection is used here: when the user process is executing, the CPU is in user mode. We mark the OS pages as system pages, so user code can't access them. We can certainly generate a pointer into the OS and dereference it, but we'll get a protection violation when we do. The only way to change from user mode to kernel mode is with an interrupt; that means that when we change modes, we don't get to choose where in the system we start executing.
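Here's a sketch of what that looks like from user code. The particular address is an assumption (on the traditional 32-bit Linux layout, the kernel lives above 0xC0000000); on other systems it may simply be unmapped, but either way the dereference is a protection violation, delivered to the process as SIGSEGV.

    #include <stdio.h>

    int main(void) {
        volatile int *p = (int *)0xC0100000;  /* hypothetical kernel address */
        printf("about to dereference %p\n", (void *)p);
        int x = *p;          /* protection violation: dies with SIGSEGV */
        printf("%d\n", x);   /* never reached */
        return 0;
    }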
All programs executing on a system have a lot of very similar code: the IO libraries, the math libraries, the X libraries... lots of stuff. By using memory sharing between the processes, we can keep a single copy present in memory, shared by all of them. This time, we use the memory protections to allow user code read access, but not write access.
If we share data pages between processes, they can work together on that data; with a multi-CPU system, the processes can even be running on different processors at the same time. In this case everybody has both read and write access to the shared pages, and that requires careful programming!
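Here's a minimal sketch of that kind of sharing using POSIX mmap(). MAP_ANONYMOUS is a common extension rather than part of the oldest standards, and the wait() is standing in for real synchronization.

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        /* One page visible to both parent and child through shared PTEs. */
        int *shared = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        if (shared == MAP_FAILED) return 1;

        *shared = 0;
        if (fork() == 0) {          /* child */
            *shared = 42;           /* write goes to the shared physical page */
            _exit(0);
        }
        wait(NULL);                 /* crude synchronization for the sketch */
        printf("parent sees %d\n", *shared);   /* prints 42 */
        return 0;
    }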
Think about the Unix fork() system call for a moment. Its semantics are defined as making an exact copy of the calling process, and starting the child at the same point in the copy as the parent's current location. Most of the time this is really inefficient, since a fork() is usually followed immediately by an exec() to start up a new program (before somebody asks, I have no idea why it was done this way instead of combining the two operations. It seems like a design error to me, too).
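For reference, the classic idiom looks like this (running /bin/ls is just an example):

    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void) {
        pid_t pid = fork();         /* duplicate the current process */
        if (pid == 0) {
            /* Child: throw away the copy we just made, run a new program. */
            execl("/bin/ls", "ls", "-l", (char *)NULL);
            _exit(127);             /* only reached if execl() fails */
        }
        wait(NULL);                 /* parent waits for the child to finish */
        return 0;
    }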
BSD Unix introduced a new system call to make the fork-and-exec sequence more efficient: vfork() is documented as copying only as much data as necessary for a fork-and-exec to work; if you do anything else, the results are undefined.
Copy-on-write pages get the same efficiency without the restrictions: fork() sets up the page tables to make everything shared between parent and child, but marks the pages so that if either the parent or the child tries to write to one, a copy of that page is made at that moment.
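Here's a sketch of the OS's side of this. Every type and helper function here (pte_t, alloc_frame(), and so on) is made up for illustration; real kernels differ in the details.

    #include <string.h>

    #define PAGE_SIZE 4096

    typedef struct {
        unsigned frame;       /* physical frame number */
        unsigned writable:1;  /* hardware write-permission bit */
        unsigned cow:1;       /* OS bookkeeping: "shared, copy on write" */
    } pte_t;

    extern void *frame_addr(unsigned frame);  /* frame number -> kernel address */
    extern unsigned alloc_frame(void);        /* grab a free physical frame */
    extern void tlb_flush_page(void *va);     /* invalidate a stale translation */
    extern void deliver_segv(void);           /* genuine protection violation */

    void handle_write_fault(pte_t *pte, void *faulting_va) {
        if (!pte->cow) {           /* not ours to fix: a real violation */
            deliver_segv();
            return;
        }
        unsigned new_frame = alloc_frame();
        /* Copy the page now, at first write, rather than at fork() time. */
        memcpy(frame_addr(new_frame), frame_addr(pte->frame), PAGE_SIZE);
        pte->frame = new_frame;    /* this process gets its own copy... */
        pte->writable = 1;         /* ...which it may now write freely */
        pte->cow = 0;
        tlb_flush_page(faulting_va);
    }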
User processes actually have a surprising amount of control over their own virtual memory mappings in Unix. The system calls that provide this are mmap and mprotect: mmap is used to establish virtual memory mappings, and mprotect is used to control your access to those mappings.
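A small sketch of both calls together; again, MAP_ANONYMOUS is assumed as a common extension.

    #include <stdio.h>
    #include <sys/mman.h>

    int main(void) {
        char *p = mmap(NULL, 4096, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) return 1;

        p[0] = 'x';                       /* fine: page is read/write */
        mprotect(p, 4096, PROT_READ);     /* now take away our own write access */
        printf("can still read: %c\n", p[0]);
        /* p[0] = 'y';  would now die with a protection violation (SIGSEGV) */
        munmap(p, 4096);
        return 0;
    }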
So, we use a simple lookup table to get from virtual to physical. We divide the address into two parts: a Virtual Page Number and a Page Offset. The width of the page offset is determined by the size of a page; the virtual page number is everything left over.
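A worked example, assuming 32-bit virtual addresses and 4KB pages (so the offset is the low 12 bits, and the VPN is the remaining 20), using the address from earlier:

    #include <stdio.h>
    #include <stdint.h>

    #define PAGE_SHIFT 12                    /* 4KB pages */
    #define PAGE_OFFSET(va) ((va) & ((1u << PAGE_SHIFT) - 1))
    #define VPN(va)         ((va) >> PAGE_SHIFT)

    int main(void) {
        uint32_t va = 0x1234abcd;
        printf("VPN = 0x%05x, offset = 0x%03x\n",
               (unsigned)VPN(va), (unsigned)PAGE_OFFSET(va));
        /* prints: VPN = 0x1234a, offset = 0xbcd */
        return 0;
    }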
We still have the concept of a hit or a miss. Hits are basically the same as in a cache; misses turn things over to the operating system, in the form of a page fault interrupt. All the choices that the cache made in hardware (where does the new page go? what gets evicted to make room? etc.) are made in software instead. Once we complete bringing in the new page, we go back to the process that faulted and re-execute the faulting instruction.
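Putting the pieces together, the translation step looks roughly like this. The types and the page_fault() entry point are hypothetical, and on real hardware this check happens in the MMU, not in software:

    #include <stdint.h>

    #define PAGE_SHIFT 12

    typedef struct {
        unsigned valid:1;     /* is this page present in physical memory? */
        unsigned frame:20;    /* physical frame number, if so */
    } pte_t;

    extern pte_t page_table[];            /* hypothetical: one entry per VPN */
    extern void page_fault(uint32_t va);  /* hypothetical: trap to the OS */

    uint32_t translate(uint32_t va) {
        pte_t pte = page_table[va >> PAGE_SHIFT];
        if (!pte.valid)
            page_fault(va);   /* miss: OS picks a frame, brings the page in,
                                 then re-executes the faulting instruction */
        return ((uint32_t)pte.frame << PAGE_SHIFT)
             | (va & ((1u << PAGE_SHIFT) - 1));
    }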