The basic idea is that the instruction is decoded at the same time the registers are being fetched, and the control signals for each pipeline stage are sent down the pipe with the data.
In principle, any control signals that need to be used in the RR stage can be consumed in that stage. In practice, the text's example pipeline doesn't use any control signals in the RR stage (this will change later, when the branch unit gets optimized).
In the EX stage, we need signals controlling the input muxes to the ALU, the ALU itself, and the mux selecting whether Rt or Rd will be used to select the destination register.
In the MEM stage, we need signals controlling whether we read from memory, write to memory, or neither.
In the WB stage, we need signals controlling whether we are going to write to a result register, and if we do, whether the data comes from the memory or the "memory bypass."
We can label a signal according to what pipeline register it's in at the moment, and what it's going to control. So, for instance, ID/EX.RegDst is the field in the ID/EX pipeline register that controls whether the destination register is selected by the instruction's Rt field or Rd field.
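The idea of control signals riding down the pipe with their instruction can be sketched in software. This is only an illustration, not the book's hardware: the signal names (RegDst, ALUSrc, ALUOp, MemRead, MemWrite, RegWrite, MemtoReg) are the standard ones, but the class-based plumbing is my own framing.

```python
from dataclasses import dataclass

@dataclass
class IDEX:
    # EX-stage controls, consumed in EX
    RegDst: int = 0    # 0: destination is Rt, 1: destination is Rd
    ALUSrc: int = 0
    ALUOp: int = 0
    # MEM-stage controls, just passed through EX
    MemRead: int = 0
    MemWrite: int = 0
    # WB-stage controls, passed through EX and MEM
    RegWrite: int = 0
    MemtoReg: int = 0

@dataclass
class EXMEM:
    MemRead: int = 0
    MemWrite: int = 0
    RegWrite: int = 0
    MemtoReg: int = 0

def ex_stage_clock(idex: IDEX) -> EXMEM:
    # At the clock edge, the EX-stage controls have been consumed and are
    # dropped; the MEM- and WB-stage controls move on with the instruction.
    return EXMEM(MemRead=idex.MemRead, MemWrite=idex.MemWrite,
                 RegWrite=idex.RegWrite, MemtoReg=idex.MemtoReg)
```

So ID/EX.RegDst in the notation above corresponds to the RegDst field of the IDEX object, and after one clock it no longer exists: the EX stage has already used it.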
The text presents forwarding in terms of a centralized "forwarding unit" that detects all of the forwarding issues in the CPU, and controls muxes as needed to fix them.
I prefer a slightly different presentation that's actually functionally equivalent. Consider every consumer of data in the CPU (so the ALU consumes data on its two inputs, the memory consumes a data element, etc.). Also, consider every producer of data (so the ALU is a producer, the memory is a producer, etc.). Every producer feeds the data it produces out to a pipeline register; the register provides it to a consumer.
Now imagine a mux in front of every consumer, with a "watchdog" examining every producer. The watchdog is constantly checking the producers to see if any of them is about to produce data that the consumer needs on this cycle. If so, the watchdog tells the mux to take its data directly from that producer instead of from its default input.
From this perspective, the ForwardA watchdog corresponding to the first EX hazard on page 480 evaluates the following:
if ((EX/MEM.RegWrite) and (EX/MEM.RegisterRd != 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
(This actually ends up being exactly the same code as in the book, but to me this is a clearer way to get there. YMMV.)
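The same check can be written as runnable code. A minimal Python sketch of this one watchdog; the field names follow the book, but the dict-based pipeline registers are my own framing:

```python
def forward_a_ex(ex_mem: dict, id_ex: dict) -> int:
    """EX-hazard watchdog for the ALU's first input.

    Returns the mux select: 0b10 means take the EX/MEM ALU result
    instead of the value read from the register file; 0b00 means the
    register-file value is fine.
    """
    if (ex_mem["RegWrite"]                 # producer will write a register,
            and ex_mem["RegisterRd"] != 0  # and it isn't $0,
            and ex_mem["RegisterRd"] == id_ex["RegisterRs"]):  # and it's our source
        return 0b10
    return 0b00
```

The $0 test matters: $0 is hardwired to zero, so a "write" to it must never be forwarded as if it produced a usable value.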
Question: suppose two instructions in a row write to $1, and then an instruction reads that same register. Which of the writers should "win?"
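The more recent writer has to win, since that's what a non-pipelined machine would have delivered. In the book's full ForwardA logic this is handled by guarding the MEM-hazard case so it only fires when the EX-hazard case does not. A sketch (field names from the book, the rest invented for illustration):

```python
def forward_a(ex_mem: dict, mem_wb: dict, id_ex: dict) -> int:
    # Writer one instruction ahead (newest value of the register).
    ex_hazard = (ex_mem["RegWrite"] and ex_mem["RegisterRd"] != 0
                 and ex_mem["RegisterRd"] == id_ex["RegisterRs"])
    # Writer two instructions ahead -- but it defers to the newer result.
    mem_hazard = (mem_wb["RegWrite"] and mem_wb["RegisterRd"] != 0
                  and not ex_hazard
                  and mem_wb["RegisterRd"] == id_ex["RegisterRs"])
    if ex_hazard:
        return 0b10   # forward the EX/MEM result
    if mem_hazard:
        return 0b01   # forward the MEM/WB result
    return 0b00       # no hazard: use the register file

# Both older instructions wrote $1; the reader must see the second write.
ex_mem = {"RegWrite": 1, "RegisterRd": 1}   # second writer, one cycle ahead
mem_wb = {"RegWrite": 1, "RegisterRd": 1}   # first writer, two cycles ahead
id_ex  = {"RegisterRs": 1}
assert forward_a(ex_mem, mem_wb, id_ex) == 0b10  # the newer writer wins
```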
To stall an instruction, it's necessary to keep it from proceeding down the pipeline. In most cases (including the MIPS pipeline in the text), it's possible to just delay the instruction in the ID stage until it can go the rest of the way down the pipeline unimpeded. The text's "hazard detection unit" shows an example of how to do this. It works similarly to the forwarding unit, except that it detects hazards which will need to be resolved by stalling (like the lw situation from last time). In this case, it cancels the write of a new instruction into the IF/ID pipeline register, cancels the load of a new PC, and inserts the control signals for a dummy instruction that accomplishes nothing into the pipeline. When the lw has gone far enough down the pipeline (only one cycle, with the MIPS), the stalled instruction can be allowed to proceed.
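The load-use check itself is small. A sketch of the text's hazard detection condition, with the surrounding plumbing invented for illustration:

```python
def must_stall(id_ex: dict, if_id: dict) -> bool:
    """Stall if the instruction in EX is a load whose destination (Rt)
    is a source register of the instruction currently being decoded."""
    return bool(id_ex["MemRead"] and
                id_ex["RegisterRt"] in (if_id["RegisterRs"],
                                        if_id["RegisterRt"]))

def stall_actions() -> dict:
    # On a stall: hold the PC, hold the IF/ID register, and zero the
    # control signals entering ID/EX so a harmless bubble flows down.
    return {"PCWrite": 0, "IFIDWrite": 0, "ControlToZero": 1}
```

Only loads trigger this check, because only a load's result appears too late (after MEM) to be forwarded in time for the very next instruction's EX stage.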
This always seems like a dangerous thing to do: what if an instruction causes a stall that never clears? Can the CPU hang? In a word, no. By definition, a hazard only exists because of the pipelined implementation; if the CPU waited for each instruction to completely finish before starting the next, there would be no hazard. So the absolute worst possible case is for the second instruction to have to wait for the first one to completely exit the pipeline.