CS473 HW#5 Solution


Superscalar Execution (25 points)

Draw a Gantt chart showing how the following code is executed in the new superscalar machine:
	lw   $2, 100($3)
lw $4, 200($5)
add $6, $2, $4
sw $6, 300($7)
addi $2, $2, 4
addi $5, $5, 4
addi $7, $7, 4
solution
                             0 1 2 3 4 5 6 7 8 9 0    

nop |_|_|_|_|
lw $2, 100($3) |_|_|_|_|_|
nop |_|_|_|_|
lw $4, 200($5) |_|_|_|_|_|
add $6, $2, $4 |_|___|_|_|
sw $6, 300($7) |_|___|_|_|_|
addi $2, $2, 4 |_|_|_|_|
addi $5, $5, 4 |_|_|_|_|_|
addi $7, $7, 4 |_|_|_|_|
nop |_|_|_|_|_|
0 1 2 3 4 5 6 7 8 9 0 (total: 10 cycles)

Out-of-Order Execution (15 points)

Show how the following code would be executed on a CDC 6600. Assume all additions are floating-point.

	X1 <- X6 + X7
X1 <- X2 * X3
X5 <- X1 * X4
X4 <- X3 + X2

Solution
                          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3   

X1 <- X6 + X7 |_|_|_|_|
X1 <- X2 * X3 |_|_|_|_|_|_|_|_|_|_|
X5 <- X1 * X4 |_________________|_|_|_|_|_|_|_|_|_|
X4 <- X3 + X2 |_|_|_|___________|
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 (total: 23 cycles)

Cache Problems (10 points each)

1. Text, question 7.9: Here is a series of address references given as word addresses: 2,3,11,16,21,13,64,48,19,11,2,22,4,27,6 and 11. Assuming a direct-mapped cache with 16 one-word blocks that is initially empty, label each reference in the list as a hit or a miss and show the final contents of the cache.

Solution : (No spatiality is explored. Each time we bring only one word into cache.)

Address Decomposition:

 

26

4

2


  tag

  index

  byte offset

(Note: Since the references are word addresses and smallest room in cache is a word, so we just do not touch the byte offsets and act as if the addresses are 30 bits long.)

                

Hit or Miss:

Reference tag index    Miss/Hit
2 0 2 M
3 0 3 M
11 0 11 M
16 1 0 M
21 1 5 M
13 0 13 M
64 4 0 M
48 3 0 M
19 1 3 M
11 0 11 H
3 0 3 M
22 1 6 M
4 0 4 M
27 1 11 M
6 0 6 M
11 0 11 M


Cache history:

Index References
0 16  64  48
1   
2 2
3 3 19 3
4 4
5 21
6 22  6
7   
8   
9   
10   
11 11 11 27 11
12   
13 13
14   
15   


Final content:

Index Valid bit Tag    Data
0 1 3  mem[48]
1 0

2 1 0  mem[2]
3 1 1  mem[19]
4 1 0  mem[4]
5 1 1  mem[21]
6 1 0  mem[6]
7 0

8 0

9 0

10 0

11 1 1  mem[11]
12 0

13 1 0  mem[13]
14 0

15 0



2. Text, question 7.10: Do the same exercise as those in the previous question except for a direct-mapped cache with four-word blocks and a total size of 16 words.

Solution : (This time we take advantages of spatiality. Each time we bring 4 words into cache.)

Address Decomposition:

26

2

 2

2


  tag

  index

  word offset

  byte offset

(Note: Since the references are word addresses and smallest room in cache is a word, so we just do not touch the byte offsets and act as if the addresses are 30 bits long.)


Hit or Miss:

Reference tag index block offset    Miss/Hit
2 0 0 2 M
3 0 0 3 H
11 0 2 3 M
16 1 0 0 M
21 1 1 1 M
13 0 3 1 M
64 4 0 0 M
48 3 0 0 M
19 1 0 3 M
11 0 2 3 H
3 0 0 3 M
22 1 1 2 H
4 0 1 0 M
27 1 2 3 M
6 0 1 2 H
11 0 2 3 M


Cache history:

Index   word 0   word 1   word 2   word 3
0  4 16 64 48 20 4  5 17 65 49 21 5  2 18 66 50 22 6  3 19 67 51 19 3 
1  24 4  21 5  22 6  23 7
2  12 28 12  13 29 13  14 30 14  11 27 11
3  16  13  14  15


Final content:

Index    tag   word 0   word 1   word 2   word 3
0     0   mem[4]   mem[5]   mem[6]   mem[3]
1     0   mem[4]   mem[5]   mem[6]   mem[7]
2     0   mem[12]   mem[13]   mem[14]   mem[11]
3     0   mem[16]   mem[13]   mem[14]   mem[15]


3. Also, perform the same exercise as above except assuming a two-way set-associative cache using LRU replacement, with a total of 16 one-word blocks.

Solution : We take advantages of associativity. Each time we bring one words into cache. But we use LRU placement policy instead of direct mapping. There are 16/2=8 sets of blocks in the cache.

Address Decomposition:

27

3

2


  tag

  set index

  byte offset

(Note: Since the references are word addresses and smallest room in cache is a word, so we just do not touch the byte offsets and act as if the addresses are 30 bits long.)


Hit or Miss:

Reference tag set index    Miss/Hit
2 0 2 M
3 0 3

M

11 1 3 M
16 2 0 M
21 2 5 M
13 1 5 M
64 8 0 M
48 6 0 M
19 2 3 M
11 1 3 H
3 0 3 M
22 2 6 M
4 0 4 M
27 3 3 M
6 0 6 M
11 1 3 M

  

Cache history:

Index       Tag      Data       Tag        Data
0       6 16 48       8 64
1



2       0 2

3       1 3 19 3 11       3 11 11 27
4       0 4

5       2 21       1 13
6       2 22       0 6
7




Final content:

Index     Tag    Data     Tag   Data
0      6   mem[48]     8   mem[64]
1           
2      0   mem[2]      
3      1   mem[11]     3   mem[27]
4      0   mem[4]      
5      2   mem[21]     1   mem[13]
6      2   mem[22]     0   mem[6]
7            

  

  

  

Cache Simulation! (50 points)