CSC258H: Week 10 Reading Guide

Week 10: Parallelism and Pipelining

Topic        Reading        Recommended Exercises
Parallelism  DDCA 3.6       Question 3.6
Pipelining   DDCA 7.4-7.5   Exercises 7.15, 7.17-7.21, 7.24-7.29, 7.31; Questions 7.1-7.3

We're taking a step back this week. On the surface, this week's reading introduces techniques for making the processor more efficient. The problem with the single-cycle design (from week 7) is that a single cycle takes a long time: the speed of the processor is determined by the longest path through the entire processor. In multi-cycle and pipelined designs, we introduce additional registers and control structures to decrease the length of paths through the processor and, eventually, to execute multiple instructions simultaneously.

That idea -- that we can get better performance by executing tasks simultaneously -- is the deeper idea underlying the reading. Parallelism can be applied in both hardware and software, and it offers significant performance benefits. However, to take advantage of parallelism, we often need to introduce additional complexity, in the form of control circuitry or software. Parallelism is one of the "big ideas" in the course, and we'll revisit this idea in databases (CSC343), operating systems (CSC369), networks (CSC358), and a newly introduced course (CSC367: parallel programming).

Next week, we'll take a look at another level of the machine architecture: the memory system.

Parallelism: Section 3.6

This section introduces two kinds of parallelism: spatial and temporal. These are key terms, and you should develop definitions for them. However, many tasks exhibit both spatial and temporal parallelism. Try to connect the ideas to real-world experiences or problems you've encountered, so you develop an intuitive feel for the two types of parallelism and how they interact.
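To build that intuition, it can help to see how the two kinds of parallelism affect latency and throughput differently. Here is a small sketch (my own illustration, not from the reading) using made-up step times for a two-step task: spatial parallelism duplicates the whole unit, while temporal parallelism (pipelining) overlaps the steps of successive tasks.

```python
# Illustrative sketch: latency and throughput under spatial vs. temporal
# parallelism. The step times below are invented for illustration.

def throughput_and_latency(step_times, spatial_copies=1, pipelined=False):
    """Return (latency, throughput) for a task made of sequential steps.

    step_times: duration of each step of one task
    spatial_copies: number of identical units working side by side
    pipelined: if True, overlap steps of successive tasks (temporal parallelism)
    """
    latency = sum(step_times)  # one task still takes the full time
    if pipelined:
        # A new task can start every max(step_times) units -- the slowest
        # step sets the rate, just as the slowest stage sets the clock.
        interval = max(step_times)
    else:
        interval = latency
    throughput = spatial_copies / interval
    return latency, throughput

# One task = two steps of 5 and 15 time units.
steps = [5, 15]
print(throughput_and_latency(steps))                    # sequential baseline
print(throughput_and_latency(steps, spatial_copies=2))  # spatial: 2x throughput
print(throughput_and_latency(steps, pipelined=True))    # temporal: same latency
```

Notice that neither form of parallelism improves the latency of a single task; both improve throughput, which is why performance analyses of pipelined processors focus on total execution time for many instructions.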

Question 3.6 is the key question for the week. Be sure you have a good answer for it before moving into the details of pipelining a processor.

Pipelining: Sections 7.4-7.5

Section 7.4 introduces a multi-cycle processor. Section 7.5 introduces a pipelined processor. Both break the processor into a series of stages, each of which can be computed quickly. Frankly, I'm mainly interested in the pipelined processor in 7.5, but in order to understand that design, we need to think about how to build the less complex multi-cycle design first.

The multi-cycle processor gains a performance advantage over the single-cycle processor by allowing instructions to omit stages they do not need. Only one instruction is executed at a time, and that instruction will take multiple cycles to complete. The sum of the cycle times required to execute an instruction should, in general, be smaller than the single cycle time required to execute an instruction on our single-cycle design. The design is introduced in stages, just like the single-cycle design was introduced. As before, I recommend reviewing the figures before reading any text, asking yourself, "What does the hardware that was just introduced do? Why is it necessary?" Pay special attention to the location of the registers that have been introduced, since they break the design into stages. Can you name the stages that have been created? Once you are comfortable with the processor design, switch your focus to 7.4.4, which discusses performance. The key question for this section is, "Can you understand (and can you perform) a performance analysis like the one in example 7.8?"
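The skeleton of an analysis like example 7.8 is the equation: execution time = (number of instructions) x (average CPI) x (cycle time), where the average CPI is weighted by the instruction mix. Here is a sketch of that calculation; the cycle counts, mix, and cycle time below are my own illustrative assumptions, not the textbook's exact figures.

```python
# Sketch of a multi-cycle performance analysis:
#   execution time = instructions x (average CPI) x cycle time
# All numbers below are assumed for illustration.

def execution_time(num_instructions, mix, cycles_per_class, cycle_time):
    """mix: instruction class -> fraction; cycles_per_class: class -> cycles."""
    avg_cpi = sum(mix[c] * cycles_per_class[c] for c in mix)
    return num_instructions * avg_cpi * cycle_time

# Assumed instruction mix and cycles per instruction class.
mix    = {"load": 0.25, "store": 0.10, "r_type": 0.52, "branch": 0.13}
cycles = {"load": 5,    "store": 4,    "r_type": 4,    "branch": 3}

t = execution_time(100_000_000, mix, cycles, cycle_time=325e-12)  # 325 ps clock
print(f"{t:.4f} s")
```

Working through a calculation like this by hand, then varying the cycle time and CPI, is good preparation for comparing the single-cycle, multi-cycle, and pipelined designs on the exercises.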

The pipelined processor gains a performance advantage by executing multiple instructions simultaneously -- but each instruction must pass through every stage. (Figure 7.43 is key. Make sure to study it.) Unlike the previous two processors, the pipelined processor is designed in a single step, and most of the work occurs in the control system (rather than in the datapath). Make sure you are comfortable with the figures in 7.5.1 and 7.5.2. The key question you should aim to answer is, "Why is the extra control necessary?"
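If you find the instruction-versus-time diagram hard to internalize, it can help to generate one yourself. This toy sketch (my own illustration, in the spirit of Figure 7.43) assumes a five-stage pipeline and shows how each instruction occupies a different stage in the same cycle.

```python
# Toy sketch of a pipeline occupancy diagram: instruction i starts
# in cycle i and advances one stage per cycle, so up to five
# instructions are in flight at once. Stage names are the usual
# five-stage breakdown (an assumption, matching common convention).

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(num_instructions):
    """Return one row per instruction; column j is its activity in cycle j."""
    rows = []
    for i in range(num_instructions):
        # Instruction i idles for i cycles, then runs the five stages.
        rows.append([""] * i + STAGES)
    return rows

for i, row in enumerate(pipeline_diagram(3)):
    print(f"instr {i}: " + " ".join(f"{s:>3}" for s in row))
```

Reading down any one column of the printed diagram shows what every piece of hardware is doing in a single cycle -- which is exactly the view you need in order to see why extra control (and, later, hazard detection) is necessary.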

As you might expect from the extra control structures, executing instructions in parallel introduces significant problems: an instruction being executed may need information that is being produced by an earlier instruction that has not yet completed. These are called "hazards"; make sure you understand the term "hazard" and can identify both data and control hazards. Once you're comfortable with that term, look at the solutions proposed: forwarding and stalls. Two key questions: "What does (forwarding/stalling) do, in terms of solving issues with hazards?" and "What are the drawbacks of each of these proposals?" Finally, Section 7.5.5 introduces a performance analysis. As before, "Can you understand and can you perform an analysis like the one in example 7.10?"
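The core of an analysis like example 7.10 is that an ideal pipeline achieves a CPI of 1, and every stall or flush cycle adds to that average. Here is a sketch of the calculation; the instruction mix, stall penalties, and branch behaviour below are illustrative assumptions, not the textbook's exact numbers.

```python
# Sketch of a pipelined performance analysis: CPI = 1 (ideal) plus the
# expected stall cycles per instruction from hazards. All fractions
# and penalties below are assumed for illustration.

def pipelined_cpi(load_frac, load_use_frac, stall_per_load_use,
                  branch_frac, taken_frac, flush_per_taken):
    """Average CPI = 1 + expected stall/flush cycles per instruction."""
    # Data hazards: loads immediately followed by a dependent use stall.
    load_stalls = load_frac * load_use_frac * stall_per_load_use
    # Control hazards: taken branches flush instructions fetched in error.
    branch_stalls = branch_frac * taken_frac * flush_per_taken
    return 1 + load_stalls + branch_stalls

cpi = pipelined_cpi(load_frac=0.25, load_use_frac=0.40, stall_per_load_use=1,
                    branch_frac=0.13, taken_frac=0.60, flush_per_taken=2)
print(cpi)  # multiply by instruction count and cycle time for execution time
```

Note how forwarding and stalling show up in this model: forwarding reduces the stall penalty for most data hazards to zero (at the cost of extra hardware), while the remaining load-use and branch cases still inflate the CPI above the ideal value of 1.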