Computer Organization & Design 5th solution

Answer key for the assignments in the Computer Architecture course.
Course: Computer Architecture (3220)

Academic year: 2019/2020
Chapter 1 Solutions

1.1
Personal computer (includes workstation and laptop): Personal computers emphasize delivery of good performance to single users at low cost and usually execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated with wireless connectivity to the Internet and typically cost hundreds of dollars, and, like PCs, users can download software ("apps") to run on them. Unlike PCs, they no longer have a keyboard and mouse, and are more likely to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a network.
Warehouse scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors and terabytes of memory.
Embedded computer: Computer designed to run one application or one set of related applications and integrated into a single system.

1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Design for Moore's Law
h. Use Abstraction to Simplify Design

1.3 The program is compiled into an assembly language program, which is then assembled into a machine language program.

1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame.
b. 3,932,160 bytes × (8 bits/byte)/100E6 bits/second = 0.31 seconds
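The frame-size and transfer-time arithmetic above can be checked with a short Python sketch. The function and parameter names are illustrative assumptions; the 3 bytes per pixel and the 100E6 bits/second rate are the figures used in the solution.

```python
def frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


def transfer_seconds(num_bytes, bits_per_second):
    """Time to send num_bytes over a link with the given bit rate."""
    return num_bytes * 8 / bits_per_second


size = frame_bytes(1280, 1024)
print(size)                                     # 3932160
print(round(transfer_seconds(size, 100e6), 2))  # 0.31
```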

1.5
a. performance of P1 (instructions/sec) = 3 × 10^9/1.5 = 2 × 10^9
   performance of P2 (instructions/sec) = 2.5 × 10^9/1.0 = 2.5 × 10^9
   performance of P3 (instructions/sec) = 4 × 10^9/2.2 = 1.8 × 10^9


1.8

1.8.1 C = 2 × DP/(V^2 × F)
  Pentium 4: C = 3.2E-8 F
  Core i5 Ivy Bridge: C = 2.9E-8 F

1.8.2 Pentium 4: 10/100 = 10%
  Core i5 Ivy Bridge: 30/70 = 42.9%

1.8.3 (Snew + Dnew)/(Sold + Dold) = 0.90
  Dnew = C × Vnew^2 × F
  Sold = Vold × I
  Snew = Vnew × I
  Therefore:
  Vnew = [Dnew/(C × F)]^(1/2)
  Dnew = 0.90 × (Sold + Dold) - Snew
  Snew = Vnew × (Sold/Vold)
  Pentium 4:
  Snew = Vnew × (10/1.25) = Vnew × 8
  Dnew = 0.90 × 100 - Vnew × 8 = 90 - Vnew × 8
  Vnew = [(90 - Vnew × 8)/(3.2E-8 × 3.6E9)]^(1/2)
  Vnew = 0.85 V
  Core i5:
  Snew = Vnew × (30/0.9) = Vnew × 33.3
  Dnew = 0.90 × 70 - Vnew × 33.3 = 63 - Vnew × 33.3
  Vnew = [(63 - Vnew × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
  Vnew = 0.65 V

1.9.1
p   # arith inst.   # L/S inst.   # branch inst.   cycles   ex. time   speedup
1   2               1             2                7        39         1
2   1               9             2                5        28         1.
4   9               4             2                2        14         2.
8   4               2             2                1        7          5.


1.9.2
p   ex. time
1   41.
2   29.
4   14.
8   7.

1.9.3 3

1.10

1.10.1 die area_15cm = wafer area/dies per wafer = pi × 7.5^2/84 = 2.10 cm^2
  yield_15cm = 1/(1 + (0.020 × 2.10/2))^2 = 0.9593
  die area_20cm = wafer area/dies per wafer = pi × 10^2/100 = 3.14 cm^2
  yield_20cm = 1/(1 + (0.031 × 3.14/2))^2 = 0.9093

1.10.2 cost/die_15cm = 12/(84 × 0.9593) = 0.1489
  cost/die_20cm = 15/(100 × 0.9093) = 0.1650

1.10.3 die area_15cm = wafer area/dies per wafer = pi × 7.5^2/(84 × 1.1) = 1.91 cm^2
  yield_15cm = 1/(1 + (0.020 × 1.15 × 1.91/2))^2 = 0.9575
  die area_20cm = wafer area/dies per wafer = pi × 10^2/(100 × 1.1) = 2.86 cm^2
  yield_20cm = 1/(1 + (0.031 × 1.15 × 2.86/2))^2 = 0.9053

1.10.4 defects per area_0.92 = (1 - y^.5)/(y^.5 × die_area/2) = (1 - 0.92^.5)/(0.92^.5 × 2/2) = 0.043 defects/cm^2
  defects per area_0.95 = (1 - y^.5)/(y^.5 × die_area/2) = (1 - 0.95^.5)/(0.95^.5 × 2/2) = 0.026 defects/cm^2

1.11

1.11.1 CPI = clock rate × CPU time/instr. count
  clock rate = 1/cycle time = 3 GHz
  CPI(bzip2) = 3 × 10^9 × 750/(2389 × 10^9) = 0.94

1.11.2 SPEC ratio = ref. time/execution time
  SPEC ratio(bzip2) = 9650/750 = 12.87

1.11.3 CPU time = No. instr. × CPI/clock rate
  If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is 10%.
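As a quick check of the CPI and SPEC ratio formulas above, here is a minimal Python sketch. The function names are assumptions chosen for illustration; the inputs (3 GHz clock, 750 s CPU time, 2389 × 10^9 instructions, 9650 s reference time) are the bzip2 figures quoted in the solution.

```python
def cpi(clock_rate_hz, cpu_time_s, instruction_count):
    """CPI = clock rate x CPU time / instruction count."""
    return clock_rate_hz * cpu_time_s / instruction_count


def spec_ratio(reference_time_s, execution_time_s):
    """SPEC ratio = reference time / execution time."""
    return reference_time_s / execution_time_s


print(round(cpi(3e9, 750, 2389e9), 2))   # 0.94
print(round(spec_ratio(9650, 750), 2))   # 12.87
```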

MIPS(P2) = 3 × 10^9 × 10^-6/0.75 = 4.0 × 10^3

MIPS(P1) > MIPS(P2), performance(P1) < performance(P2) (from 11a)

1.12.3 MFLOPS = No. FP operations × 10^-6/T
  MFLOPS(P1) = .4 × 5E9 × 1E-6/1.125 = 1.78E3
  MFLOPS(P2) = .4 × 1E9 × 1E-6/.25 = 1.60E3
  MFLOPS(P1) > MFLOPS(P2), performance(P1) < performance(P2) (from 11a)

1.13

1.13.1 Tfp = 70 × 0.8 = 56 s. Tnew = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%

1.13.2 Tnew = 250 × 0.8 = 200 s, Tfp + Tl/s + Tbranch = 165 s, Tint = 35 s. Reduction time INT: 58.8%

1.13.3 Tnew = 250 × 0.8 = 200 s, Tfp + Tint + Tl/s = 210 s. NO

1.14

1.14.1 Clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.
  TCPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
  clock cycles = 512 × 10^6; TCPU = 0.256 s
  To halve the number of clock cycles by improving the CPI of FP instructions:
  CPIimproved fp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr. = clock cycles/2
  CPIimproved fp = (clock cycles/2 - (CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.))/No. FP instr.
  CPIimproved fp = (256 - 462)/50 < 0 ==> not possible

1.14.2 Using the clock cycle data from a.
  To halve the number of clock cycles by improving the CPI of L/S instructions:
  CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIimproved l/s × No. L/S instr. + CPIbranch × No. branch instr. = clock cycles/2
  CPIimproved l/s = (clock cycles/2 - (CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIbranch × No. branch instr.))/No. L/S instr.
  CPIimproved l/s = (256 - 198)/80 = 0.725

1.14.3 Clock cycles = CPIfp × No. FP instr. + CPIint × No. INT instr. + CPIl/s × No. L/S instr. + CPIbranch × No. branch instr.
  TCPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
  CPIint = 0.6 × 1 = 0.6; CPIfp = 0.6 × 1 = 0.6; CPIl/s = 0.7 × 4 = 2.8; CPIbranch = 0.7 × 2 = 1.4
  TCPU (before improv.) = 0.256 s; TCPU (after improv.) = 0.171 s
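The execution-time arguments above (scale one instruction class, then ask whether a target total is reachable) can be sketched in Python as follows. The per-class times are inferred from the sums quoted in the solution (70 s FP, 85 s INT, 55 s L/S, 40 s branch, 250 s in total); the dictionary keys and function names are assumptions for illustration.

```python
# Per-class execution times in seconds, inferred from the sums in the solution.
times = {"fp": 70.0, "int": 85.0, "ls": 55.0, "branch": 40.0}


def total_after_scaling(times, kind, factor):
    """Total time after multiplying the time of one class by factor."""
    return sum(t * factor if k == kind else t for k, t in times.items())


def required_time(times, kind, target_total):
    """Time the given class must take for the total to equal target_total.

    A negative result means the target cannot be reached by speeding up
    this class alone.
    """
    others = sum(t for k, t in times.items() if k != kind)
    return target_total - others


total = sum(times.values())
new_total = total_after_scaling(times, "fp", 0.8)
print(round(new_total, 1), round(100 * (total - new_total) / total, 1))

target = 0.8 * total
print(round(required_time(times, "int", target), 1))     # INT time needed
print(round(required_time(times, "branch", target), 1))  # negative: not possible
```

The negative value for the branch class corresponds to the "NO" answer: the other three classes already take longer than the target total.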

1.15
processors   exec. time/processor   time w/overhead   speedup             actual speedup/ideal speedup
1            100
2            50                     54                100/54 = 1.85       1.85/2 = .93
4            25                     29                100/29 = 3.45       3.45/4 = 0.86
8            12.5                   16.5              100/16.5 = 6.06     6.06/8 = 0.76
16           6.25                   10.25             100/10.25 = 9.76    9.76/16 = 0.61
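The speedup table above follows one pattern per row, which a small Python sketch can reproduce. The 4-second overhead is inferred from the table (54 versus 100/2 = 50); the function name and tuple layout are assumptions.

```python
def speedup_table(base_time, overhead, processor_counts):
    """Return (p, time per processor, time with overhead, speedup, efficiency) rows."""
    rows = []
    for p in processor_counts:
        per_processor = base_time / p
        with_overhead = per_processor + overhead
        speedup = base_time / with_overhead
        rows.append((p, per_processor, with_overhead, speedup, speedup / p))
    return rows


for p, per_proc, with_ovh, speedup, efficiency in speedup_table(100.0, 4.0, [2, 4, 8, 16]):
    print(p, per_proc, with_ovh, round(speedup, 2), round(efficiency, 2))
```

The last column is the ratio of actual to ideal speedup; it falls as the fixed overhead becomes a larger share of each processor's time.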

Chapter 2 Solutions

2.1 addi f, h, -5 (note, no subi)
    add f, f, g

2.2 f = g + h + i

2.3 sub $t0, $s3, $s4
    add $t0, $s6, $t0
    lw $t1, 16($t0)
    sw $t1, 32($s7)

2.4 B[g] = A[f] + A[1+f];

2.5 add $t0, $s6, $s0
    add $t1, $s7, $s1
    lw $s0, 0($t0)
    lw $t0, 4($t0)
    add $t0, $t0, $s0
    sw $t0, 0($t1)

2.6

2.6.1 temp = Array[0];
      temp2 = Array[1];
      Array[0] = Array[4];
      Array[1] = temp;
      Array[4] = Array[3];
      Array[3] = temp2;

2.6.2 lw $t0, 0($s6)
      lw $t1, 4($s6)
      lw $t2, 16($s6)
      sw $t2, 0($s6)
      sw $t0, 4($s6)
      lw $t0, 12($s6)
      sw $t0, 16($s6)
      sw $t1, 12($s6)
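To see that the C statements and the load/store sequence above perform the same rearrangement, the following Python sketch runs both on a list. The sample values and the helper names `lw` and `sw` are assumptions for illustration; byte offsets are divided by 4 because each array element is one word.

```python
def reorder_c(array):
    """Apply the six C assignments to a copy of a five-element array."""
    a = list(array)
    temp = a[0]
    temp2 = a[1]
    a[0] = a[4]
    a[1] = temp
    a[4] = a[3]
    a[3] = temp2
    return a


def reorder_mips(array):
    """Replay the lw/sw sequence; word i lives at byte offset 4*i from $s6."""
    mem = list(array)

    def lw(offset):
        return mem[offset // 4]

    def sw(value, offset):
        mem[offset // 4] = value

    t0 = lw(0)
    t1 = lw(4)
    t2 = lw(16)
    sw(t2, 0)
    sw(t0, 4)
    t0 = lw(12)
    sw(t0, 16)
    sw(t1, 12)
    return mem


sample = [2, 4, 3, 6, 1]  # hypothetical contents, chosen only for the demo
print(reorder_c(sample))     # [1, 2, 3, 4, 6]
print(reorder_mips(sample))  # [1, 2, 3, 4, 6]
```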


2.7
Little-Endian        Big-Endian
Address   Data       Address   Data
12        ab         12        12
8         cd         8         ef
4         ef         4         cd
0         12         0         ab

2.8 2882400018

2.9 sll $t0, $s1, 2      # $t0 <-- 4*g
    add $t0, $t0, $s7    # $t0 <-- Addr(B[g])
    lw $t0, 0($t0)       # $t0 <-- B[g]
    addi $t0, $t0, 1     # $t0 <-- B[g]+1
    sll $t0, $t0, 2      # $t0 <-- 4*(B[g]+1) = Addr(A[B[g]+1])
    lw $s0, 0($t0)       # f <-- A[B[g]+1]

2.10 f = 2*(&A);

2.11
                     type     opcode   rs   rt   rd   immed
addi $t0, $s6, 4     I-type   8        22   8         4
add $t1, $s6, $0     R-type   0        22   0    9
sw $t1, 0($t0)       I-type   43       8    9         0
lw $t0, 0($t0)       I-type   35       8    8         0
add $s0, $t1, $t0    R-type   0        9    8    16

2.12
2.12.1 50000000
2.12.2 overflow
2.12.3 B0000000
2.12.4 no overflow
2.12.5 D0000000
2.12.6 overflow

2.13
2.13.1 128 + x > 2^31 - 1, x > 2^31 - 129 and 128 + x < -2^31, x < -2^31 - 128 (impossible)
2.13.2 128 - x > 2^31 - 1, x < -2^31 + 129 and 128 - x < -2^31, x > 2^31 + 128 (impossible)
2.13.3 x - 128 < -2^31, x < -2^31 + 128 and x - 128 > 2^31 - 1, x > 2^31 + 127 (impossible)
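The little-endian and big-endian byte layouts above, and the decimal value 2882400018, can be reproduced with Python's built-in integer conversions. This is a sketch: the word 0xABCDEF12 is read off the table, the function name is an assumption, and the list index is a byte offset from the lowest address rather than the address labels printed in the table.

```python
def byte_layout(value, byteorder):
    """Hex bytes of a 32-bit word, listed from lowest to highest address."""
    return [f"{b:02x}" for b in value.to_bytes(4, byteorder)]


word = 0xABCDEF12
print(byte_layout(word, "little"))  # ['12', 'ef', 'cd', 'ab']
print(byte_layout(word, "big"))     # ['ab', 'cd', 'ef', '12']
print(word)                         # 2882400018
```

In the little-endian layout the least significant byte (12) sits at the lowest address; in the big-endian layout the most significant byte (ab) does.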


2.25
2.25.1 I-type
2.25.2 addi $t2, $t2, -1
       beq $t2, $0, loop

2.26
2.26.1 20
2.26.2 i = 10;
       do {
           B += 2;
           i = i - 1;
       } while (i > 0)
2.26.3 5*N

2.27
       addi $t0, $0, 0
       beq $0, $0, TEST1
LOOP1: addi $t1, $0, 0
       beq $0, $0, TEST2
LOOP2: add $t3, $t0, $t1
       sll $t2, $t1, 4
       add $t2, $t2, $s2
       sw $t3, ($t2)
       addi $t1, $t1, 1
TEST2: slt $t2, $t1, $s1
       bne $t2, $0, LOOP2
       addi $t0, $t0, 1
TEST1: slt $t2, $t0, $s0
       bne $t2, $0, LOOP1

2.28 14 instructions to implement and 158 instructions executed

2.29 for (i=0; i<100; i++) {
         result += MemArray[s0];
         s0 = s0 + 4;
     }


2.30
      addi $t1, $s0, 400
LOOP: lw $s1, 0($t1)
      add $s2, $s2, $s1
      addi $t1, $t1, -4
      bne $t1, $s0, LOOP

2.31
fib:   addi $sp, $sp, -12   # make room on stack
       sw $ra, 8($sp)       # push $ra
       sw $s0, 4($sp)       # push $s0
       sw $a0, 0($sp)       # push $a0 (N)
       bgt $a0, $0, test2   # if n>0, test if n=1
       add $v0, $0, $0      # else fib(0) = 0
       j rtn                #
test2: addi $t0, $0, 1      #
       bne $t0, $a0, gen    # if n>1, gen
       add $v0, $0, $t0     # else fib(1) = 1
       j rtn
gen:   addi $a0, $a0, -1    # n-1
       jal fib              # call fib(n-1)
       add $s0, $v0, $0     # copy fib(n-1)
       addi $a0, $a0, -1    # n-2
       jal fib              # call fib(n-2)
       add $v0, $v0, $s0    # fib(n-1)+fib(n-2)
rtn:   lw $a0, 0($sp)       # pop $a0
       lw $s0, 4($sp)       # pop $s0
       lw $ra, 8($sp)       # pop $ra
       addi $sp, $sp, 12    # restore sp
       jr $ra

fib(0) = 12 instructions, fib(1) = 14 instructions, fib(N) = 26 + 18N instructions for N >= 2
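For readers who want to check the control flow of the recursive routine above against a high-level version, here is a minimal Python sketch with the same three cases (n <= 0, n == 1, general case). It mirrors only the logic, not the register use or the instruction counts.

```python
def fib(n):
    """Recursive Fibonacci with the same case split as the assembly routine."""
    if n <= 0:          # "bgt $a0, $0, test2" not taken: return 0
        return 0
    if n == 1:          # "bne $t0, $a0, gen" not taken: return 1
        return 1
    first = fib(n - 1)  # result kept in $s0 across the second call
    return first + fib(n - 2)


print([fib(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```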

2.32 Due to the recursive nature of the code, it is not possible for the compiler to in-line the function call.

2.33 after calling function fib:
old $sp ->   0x7ffffffc   ???
             -4           contents of register $ra for fib(N)
             -8           contents of register $s0 for fib(N)
$sp ->       -12          contents of register $a0 for fib(N)
there will be N-1 copies of $ra, $s0 and $a0


DONE: add $v0, $s0, $
      lw $ra, ($sp)
      addi $sp, $sp, 4
      jr $ra

2 0x

2.39 Generally, all solutions are similar:

     lui $t1, top_16_bits
     ori $t1, $t1, bottom_16_bits
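The lui/ori pattern above builds a 32-bit constant from two 16-bit halves. A short Python sketch of the split and the recombination is given below; the function names are assumptions, and the sample constant is arbitrary.

```python
def split_constant(value):
    """Return (top_16_bits, bottom_16_bits) of a 32-bit constant."""
    value &= 0xFFFFFFFF
    return value >> 16, value & 0xFFFF


def lui_ori(top_16_bits, bottom_16_bits):
    """Model lui followed by ori on a 32-bit register."""
    reg = (top_16_bits << 16) & 0xFFFFFFFF  # lui: load upper half, lower half zero
    reg |= bottom_16_bits & 0xFFFF          # ori: immediate is zero-extended
    return reg


top, bottom = split_constant(0x12345678)
print(hex(top), hex(bottom))      # 0x1234 0x5678
print(hex(lui_ori(top, bottom)))  # 0x12345678
```

ori is used rather than addi for the lower half because ori zero-extends its immediate, so a bottom half with bit 15 set does not disturb the upper half.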

2.40 No, jump can go up to 0x0FFFFFFC.

2.41 No, range is 0x604 + 0x1FFFC = 0x0002 0600 to 0x604 - 0x20000 = 0xFFFE 0604.

2.42 Yes, range is 0x1FFFF004 + 0x1FFFC = 0x2001F000 to 0x1FFFF004 - 0x20000 = 0x1FFDF004.
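The two branch ranges above follow from a 16-bit signed word offset applied to PC + 4. The Python sketch below reproduces them under the assumption that the branch instructions sit at 0x600 and 0x1FFFF000, which is inferred from the PC + 4 values 0x604 and 0x1FFFF004 in the solutions; the function name is hypothetical.

```python
def branch_range(branch_addr):
    """Lowest and highest targets reachable by a MIPS conditional branch.

    The 16-bit signed offset counts words relative to PC + 4, and results
    wrap around to 32 bits.
    """
    base = branch_addr + 4
    lowest = (base - (1 << 15) * 4) & 0xFFFFFFFF
    highest = (base + ((1 << 15) - 1) * 4) & 0xFFFFFFFF
    return lowest, highest


for addr in (0x600, 0x1FFFF000):
    low, high = branch_range(addr)
    print(f"{low:#010x} to {high:#010x}")
# 0xfffe0604 to 0x00020600
# 0x1ffdf004 to 0x2001f000
```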

2.43
trylk: li $t1, 1
       ll $t0, 0($a0)
       bnez $t0, trylk
       sc $t1, 0($a0)
       beqz $t1, trylk
       lw $t2, 0($a1)
       slt $t3, $t2, $a2
       bnez $t3, skip
       sw $a2, 0($a1)
skip:  sw $0, 0($a0)

2.44
try:   ll $t0, 0($a1)
       slt $t1, $t0, $a2
       bnez $t1, skip
       mov $t0, $a2
       sc $t0, 0($a1)
       beqz $t0, try
skip:

2.45 It is possible for one or both processors to complete this code without ever reaching the SC instruction. If only one executes SC, it completes successfully. If both reach SC, they do so in the same cycle, but one SC completes first and then the other detects this and fails.


2.46

2.46.1 Answer is no in all cases. Slows down the computer.
  CCT = clock cycle time
  ICa = instruction count (arithmetic)
  ICls = instruction count (load/store)
  ICb = instruction count (branch)
  new CPU time = 0.75 × old ICa × CPIa × 1.1 × oldCCT + oldICls × CPIls × 1.1 × oldCCT + oldICb × CPIb × 1.1 × oldCCT
  The extra clock cycle time adds sufficiently to the new CPU time such that it is not quicker than the old execution time in all cases.

2.46.2 107%, 113%

2.47
2.47.1 2.
2.47.2 0.
2.47.3 0.

Chapter 3 Solutions

3.1 5730

3.2 5730

3.3 0101111011010100

The attraction is that each hex digit contains one of 16 different characters (0-9, A-F). Since with 4 binary bits you can represent 16 different patterns, in hex each digit requires exactly 4 binary bits. And bytes are by definition 8 bits long, so two hex digits are all that are required to represent the contents of 1 byte.

3.4 753

3.5 7777 (-3777)

3.6 Neither (63)

3.7 Neither (65)

3.8 Overflow (result = -179, which does not fit into an SM 8-bit format)

3.9 -105 - 42 = -128 (-147)

3.10 -105 + 42 = -63

3.11 151 + 214 = 255 (365)

3.12 62 × 12
Step   Action            Multiplier   Multiplicand      Product
0      Initial Vals      001 010      000 000 110 010   000 000 000 000
1      lsb=0, no op      001 010      000 000 110 010   000 000 000 000
       Lshift Mcand      001 010      000 001 100 100   000 000 000 000
       Rshift Mplier     000 101      000 001 100 100   000 000 000 000
2      Prod=Prod+Mcand   000 101      000 001 100 100   000 001 100 100
       Lshift Mcand      000 101      000 011 001 000   000 001 100 100
       Rshift Mplier     000 010      000 011 001 000   000 001 100 100
3      lsb=0, no op      000 010      000 011 001 000   000 001 100 100
       Lshift Mcand      000 010      000 110 010 000   000 001 100 100
       Rshift Mplier     000 001      000 110 010 000   000 001 100 100
4      Prod=Prod+Mcand   000 001      000 110 010 000   000 111 110 100
       Lshift Mcand      000 001      001 100 100 000   000 111 110 100
       Rshift Mplier     000 000      001 100 100 000   000 111 110 100
5      lsb=0, no op      000 000      001 100 100 000   000 111 110 100
       Lshift Mcand      000 000      011 001 000 000   000 111 110 100
       Rshift Mplier     000 000      011 001 000 000   000 111 110 100
6      lsb=0, no op      000 000      011 001 000 000   000 111 110 100
       Lshift Mcand      000 000      110 010 000 000   000 111 110 100
       Rshift Mplier     000 000      110 010 000 000   000 111 110 100
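The shift-and-add steps tabulated above can be replayed with a short Python sketch. The operands are the octal values 62 and 12 from the exercise, with a 6-bit multiplier; the function name and the trace tuple layout are assumptions for illustration.

```python
def shift_add_multiply(multiplicand, multiplier, bits=6):
    """Unsigned shift-and-add multiply, one step per multiplier bit.

    Each trace entry is (step, action, multiplier, multiplicand, product)
    taken after the shifts of that step.
    """
    product = 0
    trace = []
    for step in range(1, bits + 1):
        if multiplier & 1:
            product += multiplicand
            action = "Prod=Prod+Mcand"
        else:
            action = "lsb=0, no op"
        multiplicand <<= 1  # Lshift Mcand
        multiplier >>= 1    # Rshift Mplier
        trace.append((step, action, multiplier, multiplicand, product))
    return product, trace


product, trace = shift_add_multiply(0o62, 0o12)
for step, action, mplier, mcand, prod in trace:
    print(step, action, f"{mplier:06b}", f"{mcand:012b}", f"{prod:012b}")
print(oct(product))  # 0o764
```

The final product 0o764 is binary 000 111 110 100, the value in the last rows of the table.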

University: Konkuk University

Was this document helpful?
Solutions
1