Basic idea: the various steps in executing an instruction don't use the same parts of the CPU. This means we can overlap the steps of different instructions - doing this is called pipelining. The text illustrates this nicely with its "laundry analogy."
In the five basic steps of executing a MIPS instruction (four, for instructions that skip the memory step), we in fact never have to reuse a piece of hardware. This is what lets us pipeline it. The steps, and the hardware each one uses, are:
| Step | Hardware |
|---|---|
| Instruction Fetch | PC, Instruction Memory |
| Decode/Register Read | Registers (read ports) |
| Arithmetic | ALU |
| Memory Read/Write | Data Memory |
| Writeback | Registers (write port) |
We implement the pipeline by putting registers, called pipeline registers, between the stages of the pipeline. Now, on each clock cycle, each instruction's data moves forward through the CPU by one stage. The result is a substantial improvement in performance at very little cost.
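To make that flow concrete, here is a toy sketch (in Python, not from the text) that treats each pipeline register as a slot in a list and shifts every instruction forward one stage per clock cycle. The stage abbreviations and the sample instructions are just illustrative.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def run_pipeline(instructions, cycles):
    # One slot per stage; each slot stands in for the pipeline register feeding that stage.
    pipeline = [None] * len(STAGES)
    to_issue = list(instructions)
    for cycle in range(1, cycles + 1):
        # Shift everything one stage forward: WB retires, IF accepts a new instruction.
        pipeline = [to_issue.pop(0) if to_issue else None] + pipeline[:-1]
        occupancy = ", ".join(f"{s}:{i or '-'}" for s, i in zip(STAGES, pipeline))
        print(f"cycle {cycle}: {occupancy}")

run_pipeline(["lw", "add", "sub", "beq"], cycles=8)
```

Once the pipeline is full (cycle 4 in this run), a new instruction finishes every cycle even though each individual instruction still takes five cycles to get through.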
How much speedup can we expect from pipelining? Let's take a not-too-unreasonable case: a processor whose five stages take the following times (these numbers are pulled out of a hat, so they shouldn't be quoted as any sort of real estimate of how long these steps take):
| Time (ns) | Stage |
|---|---|
| 5 | Instruction Fetch |
| 2 | Register Read |
| 3 | Execute |
| 5 | Memory |
| 2 | Register Write |
For a single-cycle implementation, this machine will take 5 + 2 + 3 + 5 + 2 = 17 nanoseconds to execute an instruction.
To pipeline it, we need to insert the pipeline registers, which themselves take time (let's call it 1 nanosecond - it's reasonable that this takes less time than the register file, since no addressing is needed). This adds one nanosecond to the time for each stage. The clock rate is now limited by the slowest stage, so we can run at 5 + 1 = 6 nanoseconds per cycle: a speedup by a factor of almost three (17/6 ≈ 2.8).
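A quick check of that arithmetic, using the made-up stage times from the table above and the assumed 1 ns pipeline-register overhead:

```python
stage_times = {"IF": 5, "ID": 2, "EX": 3, "MEM": 5, "WB": 2}  # ns, from the table
register_overhead = 1  # ns added to every stage by the pipeline register (assumed)

single_cycle_time = sum(stage_times.values())                     # 17 ns per instruction
pipelined_cycle = max(stage_times.values()) + register_overhead   # 6 ns per cycle

print(single_cycle_time, pipelined_cycle)       # 17 6
print(single_cycle_time / pipelined_cycle)      # ~2.83, "almost three"
```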
But there is still more to the story: this is the rate with the pipeline full. If the pipeline has to be drained and refilled, the speedup goes down.
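A rough sketch of the fill/drain effect, reusing the 17 ns and 6 ns figures from above: with five stages, n instructions need n + 4 cycles, because the first instruction takes four extra cycles to reach writeback.

```python
def speedup(n, stages=5, single_cycle=17, pipelined_cycle=6):
    # Single-cycle: n instructions at 17 ns each.
    # Pipelined: (n + stages - 1) cycles at 6 ns each, counting the fill time.
    return (n * single_cycle) / ((n + stages - 1) * pipelined_cycle)

for n in (1, 5, 100, 1_000_000):
    print(n, round(speedup(n), 2))
# 1:          0.57  (a single instruction is actually slower than single-cycle)
# 5:          1.57
# 100:        2.72
# 1,000,000:  2.83  (approaches the full-pipeline rate)
```

The speedup only approaches the ideal factor of ~2.8 when long runs of instructions flow through without the pipeline being drained.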