Term
| What are two types of units - logical elements? |
|
Definition
Combinational elements - outputs depend on inputs
- boolean equation
- has latency
- no internal storage
- E.G. ALU
Sequential or State elements - have state, i.e. memory (while power is present)
- need to be intialized (on applying power)
- inputs
- at least one level input
- a clock
- output depends on state and input(s) |
|
|
Term
| If all instructions were performed in 1 cycle, we would have to slow clock down to accomodate ___________ instruction (load from memory). |
|
Definition
|
|
Term
| What is the idea of Multicycle Implementation and what are a few of its attributes? |
|
Definition
Idea:
1. Shareable functional units (FU)
- single memory
- single ALU (instead of 3)
- register
2. Break instruction into multiple cycles
Some Attributes:
1. Can share FUs as long as in a different cycle
2. Extra registers
- have to save results to be used during the next clock
- need
- ALUOut
- A, B (source registers)
- Memory data register
- Instruction Register
- not visible to ISA
3. Need extra multiplexers
- to select register for multiple purposes |
|
|
Term
| What is the MIPS 5-step Fetch-Execute Cycle? |
|
Definition
1. Instruction Fetch (IF)
- fetch instruction from memory
2. instruction decode/register fetch (ID)
- read registers while decoding the instruction. the format of
MIPS instructions allows reading and decoding to occur sim.
3. execution/effective address (EX)
- execute the operation or calculate an address
4. memory access (MEM)
- access an operand in data memory
5. write-back (WB)
- write the result into a register |
|
|
Term
| What are the ideal condition for speedup to take place? |
|
Definition
| perfectly balanced stages & large number of instructions |
|
|
Term
| What is the equation for, time between instructionspipelined? |
|
Definition
| Time between instructionsnonpipelined divided by the number of pipe stages (which is the speedup) |
|
|
Term
| When designing an ISA for pipelined architecture, what's a good idea to have? |
|
Definition
instructions be the same length
to have few instruction formats
memory is accessed only in loads & stores
operands must be aligned in memory |
|
|
Term
| what are three pipeline hazards and what can cause them? |
|
Definition
structural (due to limited hardware resource)
data (due to data depend. between instruct.)
control (due to branching) |
|
|
Term
| what are three strategies to handle branches in a control pipeline hazard? |
|
Definition
stall on branch
branch prediction
delayed branch |
|
|
Term
| pipelining improves ____________ but does not improve __________. |
|
Definition
| throughput, instruction latency/execution time |
|
|
Term
| what's the difference between temporal and spatial locality? |
|
Definition
temporal locality - if a memory location is referenced, it will tend to be referenced again very soon
spatial locality - if a memory location is referenced, it is likely that a nearby lcoation will be referenced soon |
|
|
Term
| what is the memory hierarchy? |
|
Definition
SRAM .5-2.5 ns $2k-$5k
DRAM 50-70 ns $20-$75
Magnetic disk 5mil-20mil ns $.20-$2 |
|
|
Term
| What are the concepts of data migration? |
|
Definition
block/line
hit, miss
hit rate/ ratio
hit time
miss rate/ ratio
miss penalty |
|
|
Term
| in data migration, what is "hit time" and "miss penalty"? |
|
Definition
hit time - time to establish if hit or miss, and then to access faster mem.
miss penalty - time to retrieve block, and to replace it in faster mem. |
|
|
Term
| how many total bits are required for a direct-mapped cache with 16 KB of data and 4-word blocks, assuming a 32-bit address? |
|
Definition
16KB = 4K words or 212 words
a block size of 4 words (22)
so (210) blocks
each block has 4x32 or 128 bits plus tag bit (32-10-2-2) and a valid bit.
so the total cache size is =
210x(128+(32-10-2-2)+1) = 210x147 = 147Kits |
|
|
Term
| consider a cache with 64 blocks and a block size of 16 bytes. what block number does byte address 1200 map to? |
|
Definition
1200/16 = 75
75%64 = 11
11 |
|
|
Term
| increasing block size will eventually _______ miss rate |
|
Definition
|
|
Term
| what happens on a cache read miss? |
|
Definition
1. send the address to the mem
2. instruct main mem to perform a read and wait for the mem to complete its access
3. write the cache entry, putting the data from mem in the data portion of the entry, writing the upper bits of the address (from the ALU) into the tag field, and turning the valid bit on
4. restart the instruction execution at the first step, which will refetch the instruction, this time finding it in the cache. |
|
|
Term
| what is the two strategies to write to cache and what are the differences? |
|
Definition
write-through- writes to cache and mem, caches is never inconsistent, large mem latency (uses write buffer)
write-back- writes only to cache, better performance, update mem on cache block replacement (more difficult to implement) |
|
|
Term
|
Definition
| instead of making the entire path between the mem and cache wider, the memory chips can be organized in banks to read or write multiple words in one access time rather reading or writing a single-word each time. |
|
|
Term
| how do you calculate the CPU time? |
|
Definition
| (CPU execution clock cycles + memory-stall clock cycles) x clock cycle time |
|
|
Term
| how do you calc. memory stall clock cycles? |
|
Definition
| read-stall cycles + write-stall cycles |
|
|
Term
| how do you calc read-stall and write-stall cycles? |
|
Definition
read = reads/program x read miss rate x read miss penalty
write = writes/program x write miss rate x write miss penalty + write buffer stall cycles |
|
|
Term
| what is AMAT and how do you calc it? |
|
Definition
| average memory access time = time for hit + miss rate x miss penalty |
|
|
Term
| assume an instruction caches miss rate for a program is 2% and a data cache miss rate is 4%. if a processor has a CPI of 2 w/o any memory stalls and the miss penalty is 100 cycles for all misses, determine how much faster a processor would run with a perfect cache that never missed. assume frequency of loads & stores is 36%. |
|
Definition
I=number of instructions
instruction miss cycles= I x .02 x 100 = 2 x I
data miss cycles=I x .36 x .04 x 100=1.44x I
total number of stalls = 3.44 x I
total CPI = 2 + 3.44 = 5.44
performance ratio = 5.44/2 = 2.72 |
|
|
Term
| what are the 3 categories of block placement and distinguish between them? |
|
Definition
1. direct-mapped - block address % # of blocks in cache, 1-way set
2. full associative - anywhere in cache
3. set associative - block address % # of sets in cache, n-way set
advantages:performance is close to FA
complexity is close to DM |
|
|
Term
| what are the write policys of write through and write back? |
|
Definition
write through - cache & mem, cache is always clean, when processor must wait for mem to finish write, we have a write stall
write back - cache only, write to mem occurs at replacement, less mem bandwidth used, less power used (embedded) |
|
|
Term
|
Definition
| this is a bit put in place in order to track whether a page has been written since it was read into mem |
|
|
Term
| 3 forms of block replacement? |
|
Definition
random
LRU (least recently used)
good but hard to implement
FIFO |
|
|
Term
| what are the 2-level mem heirarchy issues? |
|
Definition
block placement
block identification
block replacement
write strat |
|
|
Term
| what is a page and a page fault? |
|
Definition
page - equivalent to "block" in cache term
- virtual page number in mem
- physical page num
page fault - equivalent to "miss" in cache term (very expensive) |
|
|
Term
| high cost of page faults can lead to what? |
|
Definition
pages should be large enough to amortize page fault cost (typically 4-16KB)
fully associativ organization (to reduce page faults)
page faults (handled in software, can afford clever page replacement algorithms)
write back (write-through is way too slow) |
|
|
Term
|
Definition
| located in mem, entries in a page table for every virtual page (no tags), page table is not a cache (doesnt store a subset of entries) |
|
|
Term
| what does Swap Space mean? |
|
Definition
| this is reserved space (by OS) on disk for all pages |
|
|
Term
| how many cycles does a cache miss take? a page fault? |
|
Definition
CM = 10's to 100's
PF = Millions |
|
|
Term
| what does a memory access (for hit) require? |
|
Definition
looking up page table = 1 mem acc
getting word = 1 mem acc |
|
|
Term
| What is a TLB and what is it used for? |
|
Definition
translation lookaside buffer = used for memory page hits
in the case of a tlb miss = stores valid/dirty/ref bit back to page table, much more frequent than page faults
size = 16-512 entries
block size = 1-2 page tab ent (4-8bytes)
hit time = .5-1 clock cyc
miss pen = 10-100 clock cyc
miss rate = .01%-1%
associativity = full |
|
|
Term
| when interfacing I/O devices to the OS, what are some functions provided by the OS? |
|
Definition
1. provides level of protection (user rights)
2. provides abstractions for I/O access
3. provides fair scheduling of shared I/O devices
4. types of communication
- OS can give commands to I/O dev
- I/O dev can give notif to OS
- transfer of data |
|
|
Term
| What are the characteristics of I/O devices? its performances? |
|
Definition
behavior - input, output, storage
partner - human, machine
date rate - peak rate
streaming - data bandwidth
transactions - response time |
|
|
Term
| what two states do device/systems alternate between? |
|
Definition
| service accomplishment and service intteruption |
|
|
Term
| how do you calc the MTBF (mean time between failures)? how do you calc the Availability (in I/O terms)? |
|
Definition
MTBF = MTTF + MTTR
Avail = MTTF / MTBF |
|
|
Term
| how can you improve MTTF? MTTR? |
|
Definition
MTTF - fault avoidance (selection of quality, maintenance), fault tolerance (redundancy (RAID)), fault forecasting (replace component before it fails)
MTTR - better tools |
|
|
Term
| what are some components of a disk storage device? |
|
Definition
platter (magnetic surface - nonvolatile storage)
tracks (10,000 - 50,000)
sectors & gaps (512 bytes, transfer time,
error correction code)
read-write heads on Actuator arm
controller (controller time) |
|
|
Term
| how do you calc the disk read time? |
|
Definition
| latency + seek + transfer+ controller |
|
|
Term
| what are some types of buses? |
|
Definition
processor-mem bus- address, data
I/O bus - connect via proc-mem bus or backplane
graphics bus |
|
|
Term
| what physical limits on bandwidth do buses have? |
|
Definition
| length, coupling, capacitance, clock slew |
|
|
Term
| what are two types of connections? |
|
Definition
synchronous - bus has associated clock, protocol implemented with small finite state machine, longer bus => slower, all devices run at bus speed
asynchronous - no clock (no speed, length limitations on specific devices), handshaking protocol with additional control lines |
|
|