INSTRUCTION LEVEL PARALLELISM
TWO MARK QUESTIONS
1. Differentiate desktop, embedded and server computers.
2. Define the following terms
a. Execution time
b. CPU time
3. What are Dhrystone benchmarks?
4. What are Whetstone benchmarks?
6. What are the different levels of programs used for evaluating the performance of a machine?
7. What is SPEC?
8. What is a kernel?
9. Mention the differences between desktop, embedded and server benchmarks.
10. Define Total execution time
11. Define Weighted execution time
12. Define Normalized execution time
13. State Amdahl’s law
14. Define speedup.
15. Give the CPU performance equation and define the following
b. Instruction count
16. What is the principle of locality?
17. What are the various classes of instruction set architecture?
18. What are little-endian and big-endian byte orderings?
19. What are an effective address and a PC-relative address?
20. What are the various addressing modes?
21. What are modulo and bit reverse addressing modes?
22. Comment on the type and size of operands.
23. Explain the operand for media and signal processing
24. Give the various categories of instruction operators with example for each?
25. Comment on the operations for media and signal processing.
26. What are the different types of control flow instructions?
27. Give the major methods of evaluating branch condition, their advantages and disadvantages
28. Explain instruction coding and its type?
29. What are the various compiler optimizations available?
30. What is a media processor? Give example
31. Compare the MIPS and TM32 processors.
32. What is a vector processor?
33. What is Flynn’s taxonomy?
34. Explain the various methods by which data level parallelism is obtained?
35. Compare RISC and CISC machines.
36. Differentiate von Neumann and Harvard architectures.
37. What is pipelining?
38. What are the basics of a RISC instruction set architecture?
39. What are the different stages of pipelined architecture?
40. Briefly describe basic performance issues in pipelining?
41. What are hazards? Mention its types?
42. How can data hazards be minimized?
43. What are structural hazards? How can they be minimized?
44. What are control hazards?
45. How is pipelining implemented?
46. What makes pipelining hard to implement?
47. Mention the various exceptions and methods to deal with exceptions?
48. What is scoreboarding?
49. What are linear pipeline processors?
50. What is clock skewing?
51. Differentiate static and dynamic pipelining?
52. What are nonlinear pipeline processors?
53. What is latency?
54. What is a reservation table?
55. What are forbidden and permissible latencies? Give an example.
56. What is a constant cycle?
57. What is a collision vector?
58. Explain pipeline throughput and efficiency
59. How do you compute pipeline CPI?
60. What is a basic block?
61. What is ILP?
62. What are forwarding and bypassing techniques?
63. What is loop-level parallelism?
64. What are the various dependences? How can they be overcome?
65. How can hazards be avoided?
66. What are the different name dependences?
67. What is a control dependence?
68. What is a data dependence?
69. What is dynamic scheduling? Compare dynamic scheduling with static pipeline scheduling?
70. Differentiate in-order and out-of-order execution of instruction?
71. What is an imprecise exception?
72. Explain Tomasulo’s algorithm briefly?
73. Explain WAR hazards?
74. Explain WAW hazards?
75. Explain RAW hazards?
76. What is a reservation station? Mention its fields.
77. Give the merits of Tomasulo’s algorithm?
78. How to remove control dependences?
79. Compare 1-bit and 2-bit prediction schemes.
80. Give the merits and demerits of the 2-bit prediction scheme.
81. What are correlating branch predictors?
82. What is register renaming?
83. What is the commit stage?
84. How can more ILP be exploited with multiple issue?
85. Compare superscalar and VLIW processors?
86. What are statically scheduled superscalar processors?
87. How is multiple instruction issue handled with dynamic scheduling?
88. What are limitations of ILP?
89. Explain the P6 microarchitecture.
90. Compare the Pentium III and Pentium 4 processors.
91. What is thread-level parallelism (TLP)?
92. Explain how to exploit TLP using an ILP data path.
93. Give the practical limitations on exploiting more ILP.
- What is the role of compiler in exploiting ILP?
- Give the typical latencies of FP operations and loads and stores.
- What is loop unrolling?
- Give the summary of loop unrolling and scheduling.
- What is register pressure?
- How can loop unrolling and pipeline scheduling be used with static multiple issue?
- What is static branch prediction?
- What are the various methods available for static branch prediction?
- What is static multiple issue?
16 MARKS QUESTIONS
1. With reference to linear processors, explain pipelining in detail.
2. With non-linear processors, explain pipelining with latency analysis, make use of relevant state diagrams whenever required.
3. Explain in detail the various pipeline hazards and the methods to overcome them.
4. Describe the classical 5-stage pipelining for a RISC processor.
5. Explain in detail how pipelining is implemented with reference to a MIPS processor.
6. Describe in detail what makes pipelining hard to implement.
7. What are exceptions? Mention their types. Explain the requirements and need for maintaining precise exceptions.
8. What is dynamic scheduling? Explain Tomasulo’s algorithm for a MIPS processor with suitable examples.
9. What is branch prediction? Explain the various schemes in detail.
10. Explain in detail hardware-based speculation for a MIPS processor, and explain how multiple issue is handled with speculation.
11. Compare Tomasulo’s algorithm and hardware-based speculation.
12. Explain in detail the limitations of ILP, with special mention of realizable processors.
13. Identify and justify the following fallacies/pitfalls
a. Processors with lower CPI will always be faster
b. Processors with faster clock rate will always be better
14. With suitable illustrative examples, explain how compiler techniques can be exploited for achieving ILP?
15. Explain in detail about static branch prediction.
MULTIPLE ISSUE PROCESSORS
TWO MARK QUESTIONS
- Explain static multiple issue with respect to the VLIW approach.
- Compare local scheduling and global scheduling.
- What is trace scheduling?
- Discuss the various problems associated with the VLIW processor and measures for their mitigation.
- What are loop-carried dependence and dependence distance?
- What is the need to detect loop dependences? How does the compiler detect it?
- What is interprocedural analysis?
- What is copy propagation?
- What is tree height reduction?
- What are recurrences?
- What is symbolic loop unrolling and mention its techniques?
- Explain software pipelining and trace scheduling. Compare them.
- What is critical path?
- Explain the methods involved in trace scheduling.
- What is a superblock?
- What is tail duplication?
- What are predicated instructions? What are its limitations? Give the processors that support conditional move.
- What are the capabilities required to speculate ambitiously?
- Mention the methods to speculate ambitiously preserving the exception behavior.
- What is fast mode?
- What are poison bits?
- What is a sentinel?
- Compare hardware and software speculation mechanisms.
- What is the IA-64 ISA?
- What are the various components of IA-64 register model?
- What is a register stack engine?
- What is a bundle?
- What are NaTs and NaTVals?
- How can a deferred exception be resolved?
- What are advanced loads and ALAT?
- Explain the Itanium processor.
- What are the pipeline stages available in an Itanium processor?
- What is a Crusoe processor?
16 MARKS QUESTIONS
- With an example, explain static multiple issue in a VLIW processor.
- Explain in detail multiple issue in an EPIC processor.
- Explain in detail how compiler support can be used to increase the amount of parallelism that can be exploited in a program.
- With examples, explain how to detect and enhance loop-level parallelism.
- Explain software pipelining techniques in detail.
- Explain the need for hardware support for exposing more parallelism at compile time.
- Explain briefly about conditional or predicated instructions and the limiting factors affecting their complete usefulness.
- Explain compiler speculation with hardware support.
- Compare Hardware and Software speculation mechanisms.
- Explain Intel IA-64 Architecture in detail with suitable reference to Itanium processor.
- Discuss the role of ILP in embedded and mobile applications.
MULTIPROCESSORS AND THREAD LEVEL PARALLELISM
TWO MARK QUESTIONS
- Give the taxonomy of parallel architectures.
- What are the merits of MIMD multiprocessors?
- What is a thread? Explain thread level parallelism.
- What are centralized shared memory architectures and symmetric shared memory multiprocessors?
- What are distributed memory architectures?
- What is distributed shared memory architecture?
- What is a multicomputer?
- What are message-passing multiprocessors?
- What is RPC?
- What are the performance metrics for communication mechanisms?
- What are the advantages of different communication mechanisms?
- What are the major advantages for message passing communication?
- What is a shared virtual memory?
- What are MPPs?
- What are the challenges involved in parallel processing?
- What are OLTP, DSS and AltaVista?
- What are the FFT kernel, the LU kernel, and the Barnes and Ocean applications?
- How do you estimate the performance on parallel multiprocessors?
- What are private and shared data?
- What is multiprocessor Cache Coherence?
- What is cache coherence problem and when do you say a memory system is coherent? What are cache coherence protocols?
- What is cache consistency?
- What is write serialization?
- What is snooping? What are the various snooping protocols?
- What are write invalidate and write update protocols?
- What are the performance differences between write update and write invalidate protocols?
- What are write through and write back caches?
- What are ownership misses and coherence misses?
- Compare true sharing and false sharing misses.
- What are cold misses, coherence misses and conflict misses?
- What is a working set effect?
- Give a performance summary of snooping cache schemes.
- Give an example of a multiprocessor.
- What is the need for distributed shared-memory architectures?
- What is a directory protocol?
- What are local node, home node and remote node?
- Give a summary of performance of distributed shared memory multiprocessors.
- What is an atomic exchange?
- What are the various atomic synchronization primitives?
- What are load-locked and store-conditional instructions?
- What are spin locks?
- What is barrier synchronization?
- What is a sense-reversing barrier?
- Explain a spin lock with exponential backoff.
- What are queuing locks? How do they work?
- What are gather stage and release stage in barrier synchronization technique?
- What is a combining tree?
- What is sequential consistency?
- What are data races?
- Explain relaxed consistency models.
- What is multithreading? Compare its types.
- What is SMT? What are the design challenges in SMT processors?
- What are the potential advantages from SMT?
- What is multilevel inclusion and how is it implemented?
- What happens to automatic enforcement of inclusion when the block size differs?
- What are nonblocking caches and latency hiding?
- Why is nonbinding prefetch critical?
- What are the complications that arise while implementing prefetch?
- How is speculation used to hide latency in strict consistency models?
- How is virtual memory support used to build shared memory?
- What are DVMs and SVMs?
- How is the performance of parallel processors measured?
- Compare memory constrained and time constrained scaling.
- Explain the Sun Wildfire prototype.
- What are coherent memory replication and COMA? What are the types of COMA?
16 MARKS QUESTIONS
- Give an overview of the taxonomy of parallel architectures.
- Explain in detail the various performance metrics for communication mechanisms, and discuss the advantages of these mechanisms and the challenges of parallel processing.
- Explain in detail how parallel processing in various workloads affects their performance characteristics.
- Explain in detail the symmetric shared memory architectures with reference to multiprocessor cache coherence problem.
- Explain in detail the schemes available for enforcing coherence. Discuss its implementation techniques with suitable state diagrams.
- With relevant graphs, discuss the performance of symmetric shared-memory multiprocessors for various workloads.
- Explain in detail the distributed shared memory architecture highlighting the directory based cache coherence protocol. Substantiate your explanation with suitable examples and state diagrams.
- With relevant graphs, discuss the performance of distributed shared memory multiprocessors.
- Explain in detail the need for synchronization and how it is achieved in a multiprocessor. Discuss the associated implementation issues.
- Discuss the synchronization mechanisms for larger scale multiprocessors.
- Explain in detail the memory consistency models.
- Explain how thread-level parallelism within a processor can be exploited. With suitable diagrams, explain simultaneous multithreading, its design challenges and potential performance enhancements.
- Explain inclusion and its implementation.
- Briefly explain how virtual memory can be used to build shared memory units.
- How do you measure performance of parallel processors?
- Explain Sun’s Wildfire architecture and its performance characteristics.
- “Multiprocessors are free” -- Explain this fallacy.
MEMORY AND I/O
TWO MARK QUESTIONS
- What are the levels in a typical memory hierarchy in embedded, desktop and server computers?
- Define the terms: Cache, Cache hit and Cache miss, Miss rate and Miss penalty.
- Compare temporal and spatial locality.
- What is a page fault?
- What is a memory stall cycle? Give its formula.
- What are the four common questions for the first level of memory hierarchy? Summarize their solutions.
- What are the different cache configurations?
- What is a valid bit with reference to cache?
- Define the terms: Block address and Block offset, Tag and Index field.
- What is the LRU algorithm?
- Compare write through and write back.
- What is a dirty bit?
- What are the two options available on a write miss?
- How do you calculate the width of index field for a set associative cache?
- What is a victim buffer?
- What is average memory access time? Give its formula.
- Give the relation between average memory access time and processor performance.
- For an out-of-order execution processor, how do you define miss penalty?
- What are the various cache optimizations available? Summarize them categorically.
- How to reduce cache miss penalty? What are the various optimizations available?
- Should the cache be made faster to keep pace with the CPU speed or should its size be made larger to overcome the widening gap between CPU and main memory? Which is optimal?
- Compare local and global miss rate.
- What is multilevel inclusion?
- What are critical word first and early restart techniques?
- Should read misses be given priority over writes? Comment.
- What is write merging?
- What is a victim cache?
- Summarize the miss penalty reduction techniques.
- How to reduce miss rate? Categorize misses.
- What are the different types of conflict misses?
- What is a thrash?
- Larger block sizes will reduce compulsory misses. Is the statement true or false? Justify your answer.
- How to reduce the capacity misses and what are the associated drawbacks?
- How do miss rates improve with higher associativity?
- What is the 2:1 cache rule of thumb?
- What are way prediction and pseudo associative caches?
- Give the relationship between regular hit time, pseudohit time and miss penalty.
- What are loop interchange and blocking?
- What is the blocking factor?
- Summarize how cache miss rate can be reduced.
- How to reduce cache miss penalty or miss rate by parallelism?
- What is a lockup-free cache?
- What is a “hit under multiple miss” or “miss under miss”?
- Explain hardware prefetching of instructions and data.
- What is compiler controlled prefetching?
- Compare register and cache prefetch.
- What is a nonbinding prefetch?
- Give a summary of how cache miss penalty or miss rate can be reduced by parallelism.
- How does avoiding address translation during indexing of the cache reduce hit time?
- What is a virtual cache?
- What is page level protection?
- What is a processor-identifier tag?
- What is page coloring?
- Explain how pipelined cache access can be used to reduce hit time.
- What is a trace cache?
- Explain the various techniques available for improving the performance of main memory.
- How does interleaving improve the performance of a main memory unit? What is interleaving factor?
- How do you determine the number of banks for main memory? What are the demerits of memory banks?
- How do independent memory banks support higher bandwidth for main memory?
- Compare access time and cycle time.
- Compare DRAM and SRAM.
- What are RAS and CAS in DRAM?
- Give the internal organization of a 64M DRAM.
- What is DIMM?
- What are the various memory modules available for embedded processors?
- How do you improve memory performance in a DRAM?
- What is fast page mode?
- What is SDRAM?
- What are RAMBUS, RDRAM and DRDRAM?
- Compare RAMBUS and DDR SDRAM.
- What is virtual memory?
- How do cache and virtual memory differ?
- What are pages, segments and paged segments with respect to virtual memory?
- Where can a block be placed in main memory?
- How is a block found if it is in main memory?
- What are a page table and an inverted page table?
- What is a translation lookaside buffer?
- Which block should be replaced on a virtual memory miss?
- What is a use bit?
- What happens on a write in main memory?
- Explain the techniques for fast address translation.
- What is a PTE?
- How is the page size selected?
- What is internal fragmentation in a virtual memory?
- What is a process? Compare process switch and context switch.
- How are processes protected?
- What are base and bound?
- What are a kernel process and a system call?
- What is a paged virtual memory?
- What are the various protection fields available in the Alpha architecture?
- What is a segmented virtual memory?
- Explain protection in the Intel Pentium.
- What are Trojan horses?
- What is a descriptor table and segment descriptor in an Intel processor?
- What are the various PTEs in Intel processor?
- Compare global and local address space in a Pentium processor.
- What is a call gate?
- In a protection mechanism, what happens if caller and callee are “mutually suspicious” so that neither trusts the other?
- Compare protection model for virtual memory in Alpha and IA-32.
100. How does speculative execution of conditional instructions affect the memory system?
101. What is PAL mode in the Alpha memory hierarchy?
102. Does I/O performance matter?
103. What is throughput?
104. Does CPU performance matter?
105. Does performance matter?
106. What are the different storage devices available?
107. Define the terms: (a) Seek time (b) Rotational latency with reference to a hard disk.
108. What is queuing delay?
109. How do you compute the areal density of a magnetic disk?
110. What is flash memory?
111. What is a split transaction?
112. What are the various bus standards available?
113. What is SCSI?
114. What are memory-mapped I/O and interrupt-driven I/O?
115. What are the done bit and error bit used for?
116. What is polling?
117. How do you interface storage devices to the CPU?
118. What are reliability, availability and dependability with reference to storage systems?
119. What are a latent error and latent error processing? Compare latent error processing with effective error processing.
120. What are module reliability and module availability with reference to storage systems?
121. What are the various storage faults? How can reliability be ensured?
122. Explain the following terms: RAID, MTTF, hot swapping.
123. What is mirroring?
124. What are bit-interleaved parity and block-interleaved parity?
125. What is P+Q redundancy?
126. What is Berkeley’s Tertiary Disk?
127. Compare transient faults and hard faults.
128. What are Tandem and VAX?
129. How do you measure I/O performance?
130. Which I/O devices can connect to a computer system?
131. How many I/O devices can connect to a computer system?
132. What is response time?
133. Compare throughput and response time.
134. How do you categorize the interaction with a computer?
135. What is transaction time?
136. What is queuing theory?
137. What is a queue or waiting line?
138. What is Little’s law?
139. How do you measure server utilization?
140. How do you measure average residual service time?
141. What is SPEC SFS?
142. What are DMA and virtual DMA?
143. What is NAS?
144. Briefly explain the design of an I/O system.
16 MARKS QUESTIONS
- Describe in detail how the four memory hierarchy questions can be handled. Illustrate the answers with an example.
- Explain how cache performance is measured and how it can be improved.
- Discuss in detail the various techniques available for reducing cache miss penalty.
- Discuss how cache behavior can be improved by reducing the miss rate.
- Discuss how reducing cache miss penalty or miss rate by parallelism would provide scope for performance improvement in a memory module.
- Explain how reducing hit time in a cache would speed up the memory module.
- Describe in detail how the performance of main memory can be improved.
- Elaborate on the various memory technologies that you know and give a comparative study.
- With a neat hypothetical memory hierarchy diagram, describe the concept of virtual memory and analyze the memory hierarchy questions with reference to it. Explain how performance improvement is achieved.
- Explain the paged and segmented virtual memory protections each with a suitable example. Compare them.
- Discuss the vital issues involved in designing a memory hierarchy.
- Discuss in detail the various types of storage devices.
- Explain the bus standards and interfaces. With timing diagrams, explain the read and write operations occurring in a typical bus.
- Explain, with a neat diagram, the interfacing of storage devices to the CPU.
- Discuss reliability, availability and dependability for storage devices in detail.
- Explain RAID architecture in detail.
- Discuss the errors and failures of storage devices.
- Explain in detail how the I/O performance of storage systems can be measured.
- Using queuing theory, explain how server utilization can be computed.
- Comment on the various benchmarks available for performance measurement of storage systems.
- Describe in detail the process involved in designing an I/O system.
- Discuss the role of a storage device in a digital camera.
TWO MARK QUESTIONS
1. What is multithreading?
2. Write short notes on multi-core processors.
3. What are the different hardware multithreading techniques?
4. What is cycle-by-cycle interleaving?
5. Write short notes on block interleaving.
6. What is simultaneous multithreading?
7. Discuss Flynn’s taxonomy.
8. Write about SISD.
9. Write about MISD.
10. Write about SIMD.
11. Define MIMD.
12. What is simultaneous multithreading?
13. Define chip multiprocessing.
16 MARKS QUESTIONS
1. Explain multicore architectures in detail.
2. Discuss in detail the applications that benefit from multicore.
3. Explain software multithreading in detail.
4. Discuss Flynn’s taxonomy.
5. Explain hardware multithreading techniques in detail.
6. Explain the CMP and SMT architectures in detail.
7. Discuss the design issues of the CMP and SMT architectures.
8. Discuss the Intel multi-core architecture.
9. Describe the Sun CMP architecture in detail.
10. Discuss heterogeneous multi-core processors in detail.
11. Explain the IBM Cell processor.