Computer Organization & Design 5th solution
Computer Architecture (3220)
Konkuk University
Solutions
1
Chapter 1 Solutions S-
1 Personal computer (includes workstation and laptop): Personal computers emphasize delivery of good performance to single users at low cost and usually execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated with wireless connectivity to the Internet and typically cost hundreds of dollars, and, like PCs, users can download software ("apps") to run on them. Unlike PCs, they no longer have a keyboard and mouse, and are more likely to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a network.
Warehouse scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors and terabytes of memory.
Embedded computer: Computer designed to run one application or one set of related applications and integrated into a single system.
- a. Performance via Pipelining b. Dependability via Redundancy c. Performance via Prediction d. Make the Common Case Fast e. Hierarchy of Memories f. Performance via Parallelism g. Design for Moore’s Law h. Use Abstraction to Simplify Design
1 The program is compiled into an assembly language program, which is then assembled into a machine language program.
a. 1280 × 1024 pixels = 1,310,720 pixels ⇒ 1,310,720 × 3 = 3,932,160 bytes/frame.
b. 3,932,160 bytes × (8 bits/byte) / 100E6 bits/second ≈ 0.31 seconds
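The frame-buffer arithmetic above can be checked with a short script. This is a minimal sketch: the constant and function names are illustrative, and the 3 bytes per pixel and the 100 Mbit/s link are taken from the figures in the solution.

```python
# Frame-buffer size and transfer time for the display in the solution above.
WIDTH, HEIGHT = 1280, 1024        # pixels
BYTES_PER_PIXEL = 3               # one byte per primary color
LINK_BITS_PER_SECOND = 100e6      # 100 Mbit/s network


def frame_bytes(width, height, bytes_per_pixel):
    """Bytes needed to store one frame."""
    return width * height * bytes_per_pixel


def transfer_seconds(num_bytes, bits_per_second):
    """Time to push num_bytes over a link of the given bit rate."""
    return num_bytes * 8 / bits_per_second


size = frame_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
seconds = transfer_seconds(size, LINK_BITS_PER_SECOND)
print(size, round(seconds, 2))
```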
a. performance of P1 (instructions/sec) = 3 × 10^9 / 1.5 = 2 × 10^9
performance of P2 (instructions/sec) = 2.5 × 10^9 / 1.0 = 2.5 × 10^9
performance of P3 (instructions/sec) = 4 × 10^9 / 2.2 = 1.8 × 10^9
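Instructions per second is clock rate divided by CPI. The sketch below assumes the clock rates and CPIs from the exercise statement (3 GHz / 1.5, 2.5 GHz / 1.0, 4 GHz / 2.2), which the solution text does not restate. The dictionary and function names are illustrative.

```python
def instructions_per_second(clock_hz, cpi):
    """Throughput of a processor: cycles per second divided by cycles per instruction."""
    return clock_hz / cpi


# (clock rate in Hz, CPI); values assumed from the exercise statement
processors = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}

perf = {name: instructions_per_second(f, cpi) for name, (f, cpi) in processors.items()}
fastest = max(perf, key=perf.get)

for name, value in perf.items():
    print(f"{name}: {value:.2e} instructions/sec")
print("highest performance:", fastest)
```

Under these assumptions P2 comes out ahead even though P3 has the highest clock rate, because P3's CPI is more than twice as large.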
1.8 C = 2 × DP/(V^2 × F)
Pentium 4: C = 3.2E-8 F
Core i5 Ivy Bridge: C = 2.9E-8 F
1.8 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
1.8 (Snew + Dnew)/(Sold + Dold) = 0.90
Dnew = C × Vnew^2 × F
Sold = Vold × I
Snew = Vnew × I
Therefore:
Vnew = [Dnew/(C × F)]^(1/2)
Dnew = 0.90 × (Sold + Dold) - Snew
Snew = Vnew × (Sold/Vold)
Pentium 4:
Snew = Vnew × (10/1.25) = Vnew × 8
Dnew = 0.90 × 100 - Vnew × 8 = 90 - Vnew × 8
Vnew = [(90 - Vnew × 8)/(3.2E-8 × 3.6E9)]^(1/2)
Vnew ≈ 0.85 V
Core i5:
Snew = Vnew × (30/0.9) = Vnew × 33.3
Dnew = 0.90 × 70 - Vnew × 33.3 = 63 - Vnew × 33.3
Vnew = [(63 - Vnew × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
Vnew ≈ 0.65 V
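The implicit equation for Vnew is a quadratic in Vnew and can be solved directly. The sketch below follows the derivation as written, with Dnew = C × Vnew^2 × F and leakage current I = Sold/Vold. The Pentium 4 inputs (C = 3.2E-8 F, F = 3.6 GHz, Vold = 1.25 V, 10 W static, 90 W dynamic) are assumptions taken from the exercise data, and the function name is illustrative.

```python
import math


def solve_new_voltage(c_farads, freq_hz, static_old_w, dynamic_old_w, v_old, target=0.90):
    """Solve C*F*V^2 + (S_old/V_old)*V - target*(S_old + D_old) = 0 for the positive root V."""
    a = c_farads * freq_hz                          # coefficient of V^2 (D_new = C * V^2 * F)
    b = static_old_w / v_old                        # leakage current I = S_old / V_old
    c = -target * (static_old_w + dynamic_old_w)    # required total power, moved to the left side
    return (-b + math.sqrt(b * b - 4 * a * c)) / (2 * a)


# Pentium 4 values assumed from the exercise statement
v_p4 = solve_new_voltage(3.2e-8, 3.6e9, static_old_w=10, dynamic_old_w=90, v_old=1.25)
print(round(v_p4, 2))
```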
p   # arith inst.   # L/S inst.   # branch inst.   cycles   ex. time   speedup
1   2               1             2                7        39         1
2   1               9             2                5        28         1.
4   9               4             2                2        14         2.
8   4               2             2                1        7          5.
p   ex. time
1   41.
2   29.
4   14.
8   7.
1.9 3
1.10 die area_15cm = wafer area/dies per wafer = pi × 7.5^2/84 = 2.10 cm^2
yield_15cm = 1/(1 + (0.020 × 2.10/2))^2 = 0.9593
die area_20cm = wafer area/dies per wafer = pi × 10^2/100 = 3.14 cm^2
yield_20cm = 1/(1 + (0.031 × 3.14/2))^2 = 0.9093
1.10 cost/die_15cm = 12/(84 × 0.9593) = 0.1489
cost/die_20cm = 15/(100 × 0.9093) = 0.1650
1.10 die area_15cm = wafer area/dies per wafer = pi × 7.5^2/(84 × 1.1) = 1.91 cm^2
yield_15cm = 1/(1 + (0.020 × 1.15 × 1.91/2))^2 = 0.9575
die area_20cm = wafer area/dies per wafer = pi × 10^2/(100 × 1.1) = 2.86 cm^2
yield_20cm = 1/(1 + (0.031 × 1.15 × 2.86/2))^2 = 0.9053
1.10 defects per area_0.92 = (1 - y^.5)/(y^.5 × die_area/2) = (1 - 0.92^.5)/(0.92^.5 × 2/2) = 0.043 defects/cm^2
defects per area_0.95 = (1 - y^.5)/(y^.5 × die_area/2) = (1 - 0.95^.5)/(0.95^.5 × 2/2) = 0.026 defects/cm^2
1.11 CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
CPI(bzip2) = 3 × 10^9 × 750/(2389 × 10^9) = 0.94
1.11 SPEC ratio = ref. time/execution time
SPEC ratio(bzip2) = 9650/750 = 12.87
1.11 CPU time = No. instr. × CPI/clock rate
If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is 10%.
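The die area, yield, and cost-per-die formulas used in 1.10 can be sketched as below. The inputs for the 15 cm wafer (84 dies, wafer cost 12, 0.020 defects/cm^2) are assumptions taken from the exercise data, and the function names are illustrative.

```python
import math


def die_area(wafer_diameter_cm, dies_per_wafer):
    """Average die area: wafer area divided by the number of dies."""
    return math.pi * (wafer_diameter_cm / 2) ** 2 / dies_per_wafer


def die_yield(defects_per_cm2, area_cm2):
    """Yield model used in the solution: 1 / (1 + defects * area / 2)^2."""
    return 1 / (1 + defects_per_cm2 * area_cm2 / 2) ** 2


def cost_per_die(wafer_cost, dies_per_wafer, yield_fraction):
    """Wafer cost spread over the dies that actually work."""
    return wafer_cost / (dies_per_wafer * yield_fraction)


# 15 cm wafer; values assumed from the exercise statement
area_15 = die_area(15, 84)
yield_15 = die_yield(0.020, area_15)
cost_15 = cost_per_die(12, 84, yield_15)
print(round(area_15, 2), round(yield_15, 4), round(cost_15, 4))
```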
MIPS(P2) = 3 × 10^9 × 10^-6/0.75 = 4.0 × 10^3
MIPS(P1) > MIPS(P2), performance(P1) < performance(P2) (from 11a)
1.12 MFLOPS = No. FP operations × 10^-6/T
MFLOPS(P1) = .4 × 5E9 × 1E-6/1.125 = 1.78E3
MFLOPS(P2) = .4 × 1E9 × 1E-6/.25 = 1.60E3
MFLOPS(P1) > MFLOPS(P2), performance(P1) < performance(P2) (from 11a)
1.13 Tfp = 70 × 0.8 = 56 s. Tnew = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
1.13 Tnew = 250 × 0.8 = 200 s, Tfp + Tl/s + Tbranch = 165 s, Tint = 35 s. Reduction time INT: 58.8%
1.13 Tnew = 250 × 0.8 = 200 s, Tfp + Tint + Tl/s = 210 s. NO
1.14 Clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.
TCPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
clock cycles = 512 × 10^6; TCPU = 0.256 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPIimproved fp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr. = clock cycles/2
CPIimproved fp = (clock cycles/2 - (CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.)) / No. FP instr.
CPIimproved fp = (256 - 462)/50 < 0 ⇒ not possible
1.14 Using the clock cycle data from a.
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIimproved l/s × No. L/S instr. + CPIbranch × No. branch instr. = clock cycles/2
CPIimproved l/s = (clock cycles/2 - (CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIbranch × No. branch instr.)) / No. L/S instr.
CPIimproved l/s = (256 - 198)/80 = 0.725
1.14 Clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.
TCPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPIint = 0.6 × 1 = 0.6; CPIfp = 0.6 × 1 = 0.6; CPIl/s = 0.7 × 4 = 2.8; CPIbranch = 0.7 × 2 = 1.4
TCPU (before improv.) = 0.256 s; TCPU (after improv.) = 0.171 s
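The clock-cycle sum used throughout 1.14 can be sketched as a small helper. The instruction counts (50, 110, 80, and 16 million) and base CPIs (1, 1, 4, 2) are assumptions taken from the exercise statement, chosen because they reproduce the 512 × 10^6 cycles quoted in the solution. All names are illustrative.

```python
CLOCK_RATE_HZ = 2e9

# class -> (instruction count, CPI); counts and CPIs assumed from the exercise statement
MIX = {"fp": (50e6, 1.0), "int": (110e6, 1.0), "ls": (80e6, 4.0), "branch": (16e6, 2.0)}


def clock_cycles(mix):
    """Total cycles: sum over classes of instruction count times CPI."""
    return sum(count * cpi for count, cpi in mix.values())


def required_cpi(mix, cls, target_cycles):
    """CPI that class `cls` would need for the whole program to take target_cycles."""
    others = sum(count * cpi for name, (count, cpi) in mix.items() if name != cls)
    return (target_cycles - others) / mix[cls][0]


total = clock_cycles(MIX)
t_before = total / CLOCK_RATE_HZ

# Halving the cycle count through FP alone would need a negative CPI, so it is not possible.
fp_cpi_needed = required_cpi(MIX, "fp", total / 2)

# INT and FP CPI reduced by 40%, L/S and branch CPI reduced by 30%.
improved = {
    "fp": (50e6, 0.6 * 1.0),
    "int": (110e6, 0.6 * 1.0),
    "ls": (80e6, 0.7 * 4.0),
    "branch": (16e6, 0.7 * 2.0),
}
t_after = clock_cycles(improved) / CLOCK_RATE_HZ
print(total, t_before, fp_cpi_needed, round(t_after, 3))
```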
processors   exec. time/processor   time w/overhead   speedup             actual speedup/ideal speedup
1            100
2            50                     54                100/54 = 1.85       1.85/2 = 0.93
4            25                     29                100/29 = 3.45       3.45/4 = 0.86
8            12.5                   16.5              100/16.5 = 6.06     6.06/8 = 0.76
16           6.25                   10.25             100/10.25 = 9.76    9.76/16 = 0.61
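The rows of the speedup table can be regenerated with a few lines. This sketch assumes a 100 s single-processor time and a fixed 4 s overhead whenever more than one processor is used, which is the difference between the second and third columns above. The names are illustrative.

```python
BASE_TIME = 100.0   # seconds on one processor
OVERHEAD = 4.0      # seconds added when the work is split across processors


def speedup_row(p):
    """Return (time per processor, time with overhead, speedup, speedup / ideal speedup)."""
    per_processor = BASE_TIME / p
    with_overhead = per_processor + (OVERHEAD if p > 1 else 0.0)
    speedup = BASE_TIME / with_overhead
    return per_processor, with_overhead, speedup, speedup / p


for p in (1, 2, 4, 8, 16):
    per_proc, total, s, efficiency = speedup_row(p)
    print(f"{p:>2}  {per_proc:6.2f}  {total:6.2f}  {s:5.2f}  {efficiency:4.2f}")
```

The last column falls as p grows because the fixed overhead becomes a larger share of the shrinking per-processor time.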
Chapter 2 Solutions S-
2 addi f, h, -5 (note, no subi) add f, f, g
2 f = g + h + i
2 sub $t0, $s3, $s4 add $t0, $s6, $t0 lw $t1, 16($t0) sw $t1, 32($s7)
2 B[g] = A[f] + A[1+f];
2 add $t0, $s6, $s0 add $t1, $s7, $s1 lw $s0, 0($t0) lw $t0, 4($t0) add $t0, $t0, $s0 sw $t0, 0($t1)
- 2.6 temp = Array[0]; temp2 = Array[1]; Array[0] = Array[4]; Array[1] = temp; Array[4] = Array[3]; Array[3] = temp2;
2.6 lw $t0, 0($s6) lw $t1, 4($s6) lw $t2, 16($s6) sw $t2, 0($s6) sw $t0, 4($s6) lw $t0, 12($s6) sw $t0, 16($s6) sw $t1, 12($s6)
Little-Endian        Big-Endian
Address   Data       Address   Data
12        ab         12        12
8         cd         8         ef
4         ef         4         cd
0         12         0         ab
2 2882400018
2 sll $t0, $s1, 2 # $t0 <-- 4*g
add $t0, $t0, $s7 # $t0 <-- Addr(B[g])
lw $t0, 0($t0) # $t0 <-- B[g]
addi $t0, $t0, 1 # $t0 <-- B[g]+1
sll $t0, $t0, 2 # $t0 <-- 4*(B[g]+1) = Addr(A[B[g]+1])
lw $s0, 0($t0) # f <-- A[B[g]+1]
2 f = 2*(&A);
                     type     opcode   rs   rt   rd   immed
addi $t0, $s6, 4     I-type   8        22   8         4
add $t1, $s6, $0     R-type   0        22   0    9
sw $t1, 0($t0)       I-type   43       8    9         0
lw $t0, 0($t0)       I-type   35       8    8         0
add $s0, $t1, $t0    R-type   0        9    8    16
2.12 50000000
2.12 overflow
2.12 B
2.12 no overflow
2.12 D
2.12 overflow
2.13 128 + x > 2^31 - 1, x > 2^31 - 129 and 128 + x < -2^31, x < -2^31 - 128 (impossible)
2.13 128 - x > 2^31 - 1, x < -2^31 + 129 and 128 - x < -2^31, x > 2^31 + 128 (impossible)
2.13 x - 128 < -2^31, x < -2^31 + 128 and x - 128 > 2^31 - 1, x > 2^31 + 127 (impossible)
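The byte ordering in the endianness table, and the decimal value 2882400018, can be reproduced with Python's built-in integer conversions. This is a small sketch that assumes the word is 0xabcdef12, as the table's data bytes suggest. The offsets printed are byte offsets within the word, lowest address first.

```python
WORD = 0xABCDEF12

little = WORD.to_bytes(4, byteorder="little")   # least significant byte at the lowest address
big = WORD.to_bytes(4, byteorder="big")         # most significant byte at the lowest address


def layout(data):
    """Map byte offset (lowest address first) to the byte stored there, as two hex digits."""
    return {offset: f"{byte:02x}" for offset, byte in enumerate(data)}


print("little-endian:", layout(little))
print("big-endian:   ", layout(big))
print("decimal value:", WORD)
```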
2.25 i-type
2.25 addi $t2, $t2, -1
beq $t2, $0, loop
2.26 20
2.26 i = 10; do { B += 2; i = i - 1; } while (i > 0)
2.26 5*N
2 addi $t0, $0, 0
beq $0, $0, TEST1
LOOP1: addi $t1, $0, 0
beq $0, $0, TEST2
LOOP2: add $t3, $t0, $t1
sll $t2, $t1, 4
add $t2, $t2, $s2
sw $t3, ($t2)
addi $t1, $t1, 1
TEST2: slt $t2, $t1, $s1
bne $t2, $0, LOOP2
addi $t0, $t0, 1
TEST1: slt $t2, $t0, $s0
bne $t2, $0, LOOP1
2 14 instructions to implement and 158 instructions executed
2 for (i=0; i<100; i++) { result += MemArray[s0]; s0 = s0 + 4; }
2 addi $t1, $s0, 400 LOOP: lw $s1, 0($t1) add $s2, $s2, $s1 addi $t1, $t1, -4 bne $t1, $s0, LOOP
2 fib: addi $sp, $sp, -12 # make room on stack
sw $ra, 8($sp) # push $ra
sw $s0, 4($sp) # push $s0
sw $a0, 0($sp) # push $a0 (N)
bgt $a0, $0, test2 # if n>0, test if n=1
add $v0, $0, $0 # else fib(0) = 0
j rtn #
test2: addi $t0, $0, 1 #
bne $t0, $a0, gen # if n>1, gen
add $v0, $0, $t0 # else fib(1) = 1
j rtn
gen: addi $a0, $a0, -1 # n-1 (MIPS has no subi)
jal fib # call fib(n-1)
add $s0, $v0, $0 # copy fib(n-1)
addi $a0, $a0, -1 # n-2
jal fib # call fib(n-2)
add $v0, $v0, $s0 # fib(n-1)+fib(n-2)
rtn: lw $a0, 0($sp) # pop $a0
lw $s0, 4($sp) # pop $s0
lw $ra, 8($sp) # pop $ra
addi $sp, $sp, 12 # restore sp
jr $ra
fib(0) = 12 instructions, fib(1) = 14 instructions, fib(N) = 26 + 18N instructions for N >= 2. Due to the recursive nature of the code, it is not possible for the compiler to in-line the function call.
2 after calling function fib:
old $sp -> 0x7ffffffc   ???
           -4           contents of register $ra for fib(N)
           -8           contents of register $s0 for fib(N)
$sp ->     -12          contents of register $a0 for fib(N)
there will be N-1 copies of $ra, $s0 and $a0
DONE: add $v0, $s0, $ lw $ra, ($sp) addi $sp, $sp, 4 jr $ra
2 0x
2 Generally, all solutions are similar:
lui $t1, top_16_bits ori $t1, $t1, bottom_16_bits
2 No, jump can go up to 0x0FFFFFFC.
2 No, range is 0x604 + 0x1FFFC = 0x0002 0600 to 0x604 - 0x20000 = 0xFFFE 0604.
2 Yes, range is 0x1FFFF004 + 0x1FFFC = 0x2001F000 to 0x1FFFF004 - 0x20000 = 0x1FFDF004.
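The two branch ranges above follow from the MIPS conditional-branch encoding: a signed 16-bit word offset added to PC + 4. The sketch below assumes the branch instructions sit at 0x600 and 0x1FFFF000 (inferred from the PC + 4 values 0x604 and 0x1FFFF004 used in the answers). The function name is illustrative.

```python
def branch_range(pc):
    """Lowest and highest byte addresses reachable by a MIPS conditional branch at `pc`.

    The 16-bit signed offset counts words relative to PC + 4, so the reach is
    -0x20000 to +0x1FFFC bytes. Results are wrapped to 32 bits.
    """
    base = pc + 4
    lowest = (base - 0x20000) & 0xFFFFFFFF
    highest = (base + 0x1FFFC) & 0xFFFFFFFF
    return lowest, highest


for pc in (0x600, 0x1FFFF000):   # branch addresses assumed from the PC + 4 values above
    lowest, highest = branch_range(pc)
    print(f"PC = 0x{pc:08X}: 0x{lowest:08X} to 0x{highest:08X}")
```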
2 trylk: li $t1,1
ll $t0,0($a0)
bnez $t0,trylk
sc $t1,0($a0)
beqz $t1,trylk
lw $t2,0($a1)
slt $t3,$t2,$a2
bnez $t3,skip
sw $a2,0($a1)
skip: sw $0,0($a0)
2 try: ll $t0,0($a1)
slt $t1,$t0,$a2
bnez $t1,skip
move $t0,$a2
sc $t0,0($a1)
beqz $t0,try
skip:
2 It is possible for one or both processors to complete this code without ever reaching the SC instruction. If only one executes SC, it completes successfully. If both reach SC, they do so in the same cycle, but one SC completes first and then the other detects this and fails.
2.46 Answer is no in all cases. Slows down the computer.
CCT = clock cycle time
ICa = instruction count (arithmetic)
ICls = instruction count (load/store)
ICb = instruction count (branch)
new CPU time = 0.75 × old ICa × CPIa × 1.1 × old CCT + old ICls × CPIls × 1.1 × old CCT + old ICb × CPIb × 1.1 × old CCT
The extra clock cycle time adds sufficiently to the new CPU time such that it is not quicker than the old execution time in all cases.
2.46 107%, 113%
2.47 2.
2.47 0.
2.47 0.
Solutions
3
Chapter 3 Solutions S-
3 5730
3 5730
3 0101111011010100
The attraction is that each hex digit contains one of 16 different characters (0-9, A-F). Since with 4 binary bits you can represent 16 different patterns, in hex each digit requires exactly 4 binary bits. And bytes are by definition 8 bits long, so two hex digits are all that are required to represent the contents of 1 byte.
3 753
3 7777 (3777)
3 Neither (63)
3 Neither (65)
3 Overflow (result 179, which does not fit into an SM 8-bit format)
3 -105 - 42 = -128 (-147)
3 -105 + 42 = -63
3 151 + 214 = 255 (365)
3 62 × 12
Step   Action            Multiplier   Multiplicand      Product
0      Initial Vals      001 010      000 000 110 010   000 000 000 000
       lsb=0, no op      001 010      000 000 110 010   000 000 000 000
1      Lshift Mcand      001 010      000 001 100 100   000 000 000 000
       Rshift Mplier     000 101      000 001 100 100   000 000 000 000
       Prod=Prod+Mcand   000 101      000 001 100 100   000 001 100 100
2      Lshift Mcand      000 101      000 011 001 000   000 001 100 100
       Rshift Mplier     000 010      000 011 001 000   000 001 100 100
       lsb=0, no op      000 010      000 011 001 000   000 001 100 100
3      Lshift Mcand      000 010      000 110 010 000   000 001 100 100
       Rshift Mplier     000 001      000 110 010 000   000 001 100 100
       Prod=Prod+Mcand   000 001      000 110 010 000   000 111 110 100
4      Lshift Mcand      000 001      001 100 100 000   000 111 110 100
       Rshift Mplier     000 000      001 100 100 000   000 111 110 100
       lsb=0, no op      000 000      001 100 100 000   000 111 110 100
5      Lshift Mcand      000 000      011 001 000 000   000 111 110 100
       Rshift Mplier     000 000      011 001 000 000   000 111 110 100
       lsb=0, no op      000 000      011 001 000 000   000 111 110 100
6      Lshift Mcand      000 000      110 010 000 000   000 111 110 100
       Rshift Mplier     000 000      110 010 000 000   000 111 110 100
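The shift-and-add steps traced in the table can be sketched in a few lines. This is a software model of the same procedure, not the hardware itself: it assumes a 6-bit multiplier and a 12-bit multiplicand and product, with the operands 62 and 12 read as octal. The function name is illustrative.

```python
def shift_add_multiply(multiplicand, multiplier, bits=6):
    """Unsigned multiply by repeated test-add, left-shift multiplicand, right-shift multiplier."""
    mask = (1 << (2 * bits)) - 1          # product and multiplicand registers are 2*bits wide
    product = 0
    for _ in range(bits):
        if multiplier & 1:                # lsb = 1: add the multiplicand into the product
            product = (product + multiplicand) & mask
        multiplicand = (multiplicand << 1) & mask   # Lshift Mcand
        multiplier >>= 1                            # Rshift Mplier
    return product


result = shift_add_multiply(0o62, 0o12)
print(oct(result), format(result, "012b"))
```

The final product, 000 111 110 100 in binary, matches the last Product column of the table (octal 764, decimal 500).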