Lecture Recording
- Original lecture video: Google Drive
- Lark Minutes w/ Transcription: https://rong.feishu.cn/minutes/obcnty6e6i36cj297q5lc1ig
The Lark Suite link might not work for all, see Resources for more details.
Text Transcription
October 13, 2021, 8:39 PM | 56 minutes 6 seconds
Keywords: instruction, orange colored components, form expense structure, stage power component, FFP rinse cycle, instruction Maps machine, duplicate hardware components, pipeline diagram, stage pipeline jester, latency execution stage, execution status multiplier, tap pipeline table, adding Addi instruction, value storing register, stalling decode stage, Control flow instruction, branch predictor mechanisms, store instruction
Transcript:
Jishen Zhao 00:04
All right. So you already know how this pipeline works, basically, and so forth. Next I'm going to show you an animation. The goal of this animation is to show, for a program with a bunch of instructions feeding through the pipeline, which part of the pipeline each instruction occupies in each cycle. OK, so in the next few slides of this animation I'm going to
Jishen Zhao 00:44
color the instructions in this program, in the upper right corner, with different colors, and I'm going to use the same color to light up the components in the pipeline diagram that each instruction is occupying. So the first instruction is this one, the add instruction, when it first gets into the pipeline. This is the fetch stage,
Jishen Zhao 01:14
the F stage. It starts at this register called the PC, and those orange-colored components, including the instruction memory, this mux, and this adder, are all occupied by this instruction. That means other instructions cannot use those components at the same time during this cycle.
Jishen Zhao 01:38
So cycle 2, the next cycle: this orange instruction enters the second stage, the decode stage. During the decode stage you can see it is actually using half of this D pipeline register. Which half? The half connected to the output. So it is using the output of the decode-stage pipeline register to get the output from the fetch stage. Then it is using the register file to help with decoding this add instruction, and it is also using half of the X pipeline register, the input half of that pipeline register. So those are the components being occupied by this instruction.
Jishen Zhao 02:27
But in cycle 2 another thing is happening at the same time: the second instruction gets into the pipeline, the same as the first instruction did in cycle 1.
Jishen Zhao 02:40
The second instruction gets into the pipeline and occupies the fetch stage, that is, the components related to the fetch stage. I should have colored this part blue instead of grey; I'll update it after this class.
Jishen Zhao 02:55
And then the same thing keeps going. In cycle 3, three things happen at the same time with the three instructions. The orange instruction has already entered the execution stage, so this is an ALU, which does the calculation, along with other logic components; this is the output part of the pipeline register, and this is the input part of the pipeline register. The same goes for the second instruction, which is in the decode stage, and the third instruction has just entered the pipeline in the fetch stage. Those all happen in
Jishen Zhao 03:34
cycle 3. OK. And cycle 4, keep going. And cycle 5. And so on until all of the instructions are done after cycle 7. OK, so hopefully with the animation you can see that at the same time the pipeline is occupied by different instructions; different components are occupied by different instructions. But if you look closely, you can see that each component can only be occupied by one instruction, or rather each stage, the components grouped into one stage, can only be occupied by one instruction at a time. It's the same as our laundry example: my dryer can only be occupied by one load at a time. We cannot squeeze more clothes in there because my dryer only has that much capacity.
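The occupancy pattern shown in the animation can be reproduced with a short Python sketch. This is only an illustration, assuming an ideal five-stage pipeline (F, D, X, M, W) with no stalls; the function names `ideal_schedule` and `occupancy_by_cycle` are hypothetical and not from the lecture.

```python
STAGES = ["F", "D", "X", "M", "W"]

def ideal_schedule(num_instructions):
    """Return {instruction index: {stage: cycle}} for a stall-free pipeline.

    Instruction i (0-based) enters F in cycle i + 1 and advances one stage
    per cycle. This is an assumed ideal model, not the lecture's slides.
    """
    return {
        i: {stage: i + s + 1 for s, stage in enumerate(STAGES)}
        for i in range(num_instructions)
    }

def occupancy_by_cycle(schedule):
    """Map each cycle to {stage: instruction index}.

    Asserts that no stage is occupied by two instructions in the same cycle.
    """
    cycles = {}
    for instr, stages in schedule.items():
        for stage, cycle in stages.items():
            slot = cycles.setdefault(cycle, {})
            assert stage not in slot, "one instruction per stage per cycle"
            slot[stage] = instr
    return cycles

occupancy = occupancy_by_cycle(ideal_schedule(3))
for cycle in sorted(occupancy):
    print(cycle, occupancy[cycle])
```

With three instructions, the last cycle in `occupancy` is cycle 7, and cycle 3 has instruction 0 in X, instruction 1 in D, and instruction 2 in F, matching the animation.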
Jishen Zhao 04:29
OK. So, like I said, you're not going to be asked to draw those complex pipeline diagrams all the time. Most of the time, actually almost all the time, we'll use a simple table to reason about pipelining. So most of the time I'll give you a program sequence, either in a high-level language or already compiled into assembly code, and ask you to reason about how the pipeline runs it.
Jishen Zhao 04:59
Well, this is a program. There are several conventions I'm going to use in this table. If your previous classes used different conventions, in this class we're going to use this one. So pay attention if there is a small difference.
Jishen Zhao 05:18
OK, so for the convention, let's take a look at how the table works. The columns of the table are the different cycles, so cycle 1, 2, 3, and in total nine cycles here
Jishen Zhao 05:31
in this table. And the rows are the instructions. So instruction number one gets into the pipeline exactly the same as in our previous animation. The program has three instructions: instruction 1, instruction 2, instruction 3. The cells in this table show, for each cycle, which stage this particular instruction is in. And our convention is that the stage letter I put here means that this particular instruction finishes this stage in this particular cycle. So in cycle 1, the first instruction should finish the fetch stage at the end of cycle 1. Or take the X I highlighted: in this example, the second instruction should finish the execution stage by the end of cycle 4. So this is what this table means. You will use this format of table a lot during this quarter. So this is the basics of pipelining. Then for the rest of today
Jishen Zhao 06:41
we'll talk about hazards, the various types of hazards and how we solve them. Hazards are closely related to dependences; actually a lot of hazards are triggered or caused by dependences. So we'll take a look at that, but first we will see what categories,
Jishen Zhao 07:04
or types, of hazards we can encounter in a pipelined machine. First of all, before we get into each type of hazard or dependence, I want to emphasize the difference between dependences and hazards. They look closely related because a lot of hazards are actually caused by dependences. But in terms of naming, or concept, the difference between the two is this. A dependence, I think, is very easy to understand: there are two instructions and they depend on each other; that is a dependence. A hazard is a dependence plus the possibility of a wrong instruction order. If there is a dependence that will not cause any problem, we don't need to deal with it, and we typically don't call it a hazard. A hazard means there is a dependence and it may cause some problem. OK, so just pay attention to this small conceptual difference
Jishen Zhao 08:02
here. So now we're going to look at the different types of hazards. In computer architecture we care more about hazards than about pure dependences without hazards, because hazards are what we need to deal with. The first type, a very easy-to-understand type of hazard, is the data hazard, which is caused by a data dependence. OK. So let's take a look at a couple of examples. Here is the same example we used in our animation just now. There is no dependence. You can take a look at the registers: they are all different, 6, 7, 4, 5, 1, 2, 3. They don't have any dependence on each other; they're completely independent. If I change the naming of the registers in the instructions here just a little bit, those are still independent registers, so there is still no dependence and no hazard. But suppose I change it a little more, say I change the second instruction so that it now uses register 3 here. If you still recall the format of instructions in assembly code, the first register is the destination and the next two are the inputs. So this add instruction adds the values stored in registers 1 and 2 and stores
Jishen Zhao 09:33
the result into register 3. The same operand order applies to the second instruction. That means when the second instruction wants to use the value stored in register 3, that value, in correct program order, should come from the result written into register 3 by the first instruction. But let's take a look at what happens if we put this into a context where we hopefully have perfect pipelining, which means each
Jishen Zhao 10:04
instruction gets into the pipeline smoothly. What will happen? We're going to use this table. You will see how we use this table, and you should use it the same way for your exercises later this quarter.
Jishen Zhao 10:18
So, in this table, the first instruction will be like this, and if we have perfect pipelining the second instruction will be laid out like this: fetch, decode, and so on. It's a perfect, smoothly running pipeline; there is no stall in any cycle. But there is a problem, because the second instruction needs
Jishen Zhao 10:39
register 3. That is the one inside our register file. If you still remember our pipeline diagram, the register file is only written during the writeback stage, which is cycle 5 for our first instruction. But in cycle 4 the second instruction already needs the value of register 3 to perform its execution, the add. So in cycle 4, when it does the add, what it uses as the register 3 input is still an old, out-of-date value that has not been updated by the first instruction yet.
Jishen Zhao 11:19
So here the second instruction may get a wrong result because it has the wrong input value. This is a hazard: there is a dependence and there is a possible error in the output of our program. So now we have a problem, and it's called a data hazard because the dependence is on the data. So how do we solve it? The solution is to stall the instruction, that is, to have the instruction wait until its execution stage can get the right value of register 3. So the correct pipeline
Jishen Zhao 11:59
table is shown right here. The second instruction should stall, D, D, D, so that in cycle 6 we get the right value of register 3, because after cycle 5 register 3 has already been written by instruction 1.
Jishen Zhao 12:19
So those are the cycles. Cycles 3, 4, 5 are the cycles we stall; sorry, cycles 3 and 4 are the cycles we stall, waiting for the value of register 3. In cycles 3 and 4 this instruction is not doing anything; it's just idling, waiting in the decode stage. It still occupies all the components in the decode stage, but it's doing nothing. In cycle 5 it can actually decode, now that it has the value of register 3. OK.
Jishen Zhao 12:55
Now we have solved the problem between the first and second instructions by stalling the second instruction in cycles 3 and 4, waiting there in the decode stage. But let's take a look at the next instruction: is there any problem here? Oh, we have a question. The question was: isn't the register read happening in the decode stage? Can we get the correct value of the register in the fifth cycle?
Jishen Zhao 13:27
Now let's see. In the fifth cycle, the register read is at the end of the decode stage. At the end of the decode stage, that is, at the end of cycle 5, the register file should already have the right value for register 3, calculated by instruction
Jishen Zhao 13:52
one. And in the decode stage we don't actually read the value out of the register file, or rather, we don't directly consume the value of register 3 from the register file. We just look at the register file values, OK, and then feed them into the input.
Jishen Zhao 14:11
OK, so the actual time we need to use the value of register 3 is at the beginning of cycle 6, and this line marks exactly that time. So we need the value of the register right at the boundary between the two cycles. So if you see this, the arrow where we feed the
Jishen Zhao 14:40
register file output into the execution stage input, then we're good. OK. All right, so let's get back to look at our store instruction. Let me finish this table and then I'll get to the next question. OK, so let's take a look at the third instruction. Can you see the problem here? It's not a data hazard; there's no data dependence. But take a look at what's happening in cycle 4.
Jishen Zhao 15:18
Both of these instructions are in the decode stage, but that's not allowed, because in a single cycle we can only have one instruction occupying the components of one stage. So having this store instruction and the other instruction in the decode stage together is illegal. We call this kind of hazard a structural hazard, or a dependence on hardware components. OK, so a structural hazard means two instructions intend to acquire the same stage's hardware components in the same cycle. And this hazard, again, we need to solve. So we have introduced the second type of hazard, called the structural hazard. OK, let me take a look at a question we have now. The question was:
Jishen Zhao 16:09
maybe we cannot have two instructions in the decode stage at the same time? Yes, you got it. OK, great. So we do the same as our first trick: we just stall here. Because the second instruction stalled, the third instruction also needs to stall, and it needs to stall during the fetch stage.
Jishen Zhao 16:33
So we cannot finish the fetch stage until the end of cycle 5. That makes sure that in cycle 6 the third instruction enters the decode stage when the decode-stage components have already been freed up. OK. But what to pay attention to here is that
Jishen Zhao 16:55
there are two ways to represent this case in the pipeline table. You can put it like in this example: fetch, fetch, fetch; the instruction tries to fetch but is not able to finish the fetch stage. Or, if the pipeline is implemented in a smarter way, the instruction will not actually enter the fetch stage earlier and will just start to fetch here in cycle 5. So you can put F, F, F in cycles 3, 4, and 5, or you can leave cycles 3 and 4 empty and just start from cycle 5 for this instruction. Both are correct. OK. All right, so we already have our first solution for hazards. For both data and structural hazards, and for more hazard types we will see later, stalling is always our
Jishen Zhao 17:50
ultimate trick. If no other measure works, we can always stall the pipeline. Stalling means having one instruction wait at a certain stage, and once the dependence goes away we continue. OK.
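The stalling rules described above can be sketched as a small Python scheduler. This is a minimal sketch under stated assumptions: no bypassing, a dependent instruction finishes D no earlier than the cycle in which its producer does W (as in the lecture's example), and an instruction cannot leave a stage until the older instruction has vacated the next one. Only the dependence on r3 comes from the lecture; the other register names, the `program` encoding, and the function names are hypothetical.

```python
STAGES = ["F", "D", "X", "M", "W"]

def schedule_with_stalls(program):
    """program: list of (dest, sources). Returns one dict per instruction
    mapping stage -> cycle in which the instruction finishes that stage."""
    done = []
    for i, (_dest, sources) in enumerate(program):
        prev = done[-1] if done else None
        finish = {}
        for s, stage in enumerate(STAGES):
            if s > 0:
                earliest = finish[STAGES[s - 1]] + 1
            else:
                earliest = prev["F"] + 1 if prev else 1
            if prev:
                if s + 1 < len(STAGES):
                    # Structural hazard: cannot leave this stage until the
                    # older instruction has finished the next stage.
                    earliest = max(earliest, prev[STAGES[s + 1]])
                else:
                    earliest = max(earliest, prev[stage] + 1)
            if stage == "D":
                # Data hazard, no bypassing: wait in D until the producer's W.
                for j in range(i):
                    if program[j][0] is not None and program[j][0] in sources:
                        earliest = max(earliest, done[j]["W"])
            finish[stage] = earliest
        done.append(finish)
    return done

def render(done):
    """One row per instruction; a repeated letter marks stall cycles."""
    lines = []
    for i, finish in enumerate(done):
        start = done[i - 1]["F"] + 1 if i else 1
        cells = [" "] * finish["W"]
        for stage in STAGES:
            for cycle in range(start, finish[stage] + 1):
                cells[cycle - 1] = stage
            start = finish[stage] + 1
        lines.append(" ".join(cells))
    return lines

# Hypothetical three-instruction program: the second reads r3 from the first.
program = [("r3", ["r1", "r2"]), ("r6", ["r3", "r4"]), (None, ["r7", "r8"])]
table = schedule_with_stalls(program)
print("\n".join(render(table)))
```

Under these assumptions the second row shows D in cycles 3, 4, and 5 and X in cycle 6, and the third row shows F in cycles 3, 4, and 5, using the "F, F, F" representation rather than leaving those cells empty.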
Jishen Zhao 18:09
Another thing I want to emphasize for the MIPS machine is that stalling always happens in the fetch or decode stage. Or, if you don't count the fetch case as stalling, since the instruction has not really started to execute until it finishes the fetch stage in cycle 5, then we can say the MIPS machine always stalls in the decode stage. That is because this kind of machine always checks dependences during the decode stage and determines whether the instruction is good to go through the rest of the stages.
Jishen Zhao 18:48
So the decode stage is the only stage that checks dependences and hazards and stalls if needed. OK. So for the MIPS machine we don't stall in the execution stage if we are waiting for memory,
Jishen Zhao 19:00
and we don't stall in the memory stage if we are waiting for W. We always stall in the decode stage. OK, so pay attention to that when you reason using this table. So we have learned that we have data hazards because of data dependences, and structural hazards because we have limited hardware resources. We can always stall the pipeline to solve a structural hazard. But another way of avoiding structural hazards is to add more hardware components to the pipeline. If our pipeline has two register files, say, or a bigger register file, then we can actually solve the dependence on the registers. This,
Jishen Zhao 19:51
as one type of measure to avoid structural hazards, is already used in modern computers. We'll talk more about those tricks later on, about how to duplicate hardware components to solve structural hazards. For now, just keep in mind that there are solutions other than stalling to avoid structural hazards.
Jishen Zhao 20:16
All right. This is a question actually raised by one of my former students in this class, he or she, I cannot remember. The question was why MIPS always needs to take five cycles to finish an instruction, because some instructions, like this add instruction here, do not really have a memory stage: it does not have any memory access, it does not have a bracket. Why not have only four cycles to finish this instruction? Now that you have learned about structural hazards, the answer is structural hazards. If we make the number of stages different for different instructions, the pipeline will be harder to reason about, and in this case we will immediately get a structural hazard in the writeback stage. So the MIPS machine makes sure that every instruction takes five stages. Even though this add instruction does not do any real work in the memory stage, it still waits in the memory stage for one cycle and then advances to the next stage, writeback. OK. So where are we so far? We've discussed two types of hazards, structural hazards and data hazards, and we've discussed the ultimate trick for solving hazards:
Jishen Zhao 21:46
stalling. But stalling always reduces performance. We've already discussed that the base CPI is 1; stalling will push this base CPI up, so the actual CPI becomes much larger than 1 if you have a lot of stalling. For example, if you're stalling for a data hazard for one instruction, then all of the following instructions need to stall as well, because of the structural hazards that would otherwise be raised. So we'll take a break now. It's 2:34; we'll take a break for 10 minutes. 2:35, so we'll get back at 2:45 and talk about some of the measures to solve structural and data hazards without stalling the pipeline. OK, so we'll take a break and get back at 2:45.
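The effect of stalling on CPI can be put into numbers with a small Python sketch. It assumes a five-stage pipeline with a base CPI of 1 and counts every stall cycle as one extra cycle; the function names and the 100-instruction figure are hypothetical illustrations, not numbers from the lecture.

```python
def total_cycles(num_instructions, stall_cycles, num_stages=5):
    """Cycles to drain the pipeline: one per instruction, plus the
    one-time fill of (num_stages - 1) cycles, plus all stall cycles."""
    return num_instructions + (num_stages - 1) + stall_cycles

def effective_cpi(num_instructions, stall_cycles):
    """Steady-state CPI: base CPI of 1 plus average stalls per instruction.
    The one-time pipeline fill is ignored here (an assumption)."""
    return 1 + stall_cycles / num_instructions

# Three instructions with two stall cycles, as in the stalled table earlier.
print(total_cycles(3, 2))
# Hypothetical longer run: 100 instructions with 50 stall cycles in total.
print(effective_cpi(100, 50))
```

The first call gives 9 cycles, which agrees with the nine-column table for the three-instruction example, and the second gives a CPI of 1.5.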
Jishen Zhao 22:48
OK, so stalling is always bad for performance; it hurts performance because it takes more cycles to finish one instruction, and maybe the later instructions as well. So what is a better solution than stalling to solve the data hazard? It is something called bypassing, or, by another name, forwarding. Bypassing and forwarding are essentially the same thing, but we'll call it bypassing for now.
Jishen Zhao 23:29
So what is bypassing, and how do we use bypassing to resolve the data hazard? Let's take a look at the same, or a similar, example of two instructions. The first instruction is an add and the second instruction is a load word. Because instructions run through the pipeline from left to right, the instruction on the right-hand side is the earlier instruction, and the one on the left-hand side is the later instruction. And obviously they have a data dependence on register 3. The first instruction calculates a result and puts it into register 3, and the second instruction needs to calculate with it. Pay attention: a load needs to calculate based on the value of register 3.
Jishen Zhao 24:18
It needs to calculate the address it uses to access memory by adding 8 to the value stored in register 3. So during its execution stage the second instruction needs to add 8 and register 3. But in the same cycle, at this particular point in time,
Jishen Zhao 24:50
the first instruction is still in the memory stage. It is just waiting; it has not written back to register 3 yet. So that's a hazard. And what bypassing buys us is this:
Jishen Zhao 25:05
although the first instruction has not performed the writeback stage yet, since it is still in the memory stage, the value of register 3, the result of the calculation on registers 1 and 2, is already done, because the first instruction is done with its execution stage. So think about where the result of register 1 plus register 2 is: it is sitting in a temporary location, the M pipeline register.
Jishen Zhao 25:46
So if you take a look at this pipeline diagram, everything else stays the same as before; the only things that differ I have highlighted in red: this wire and this block, which is a mux. If you don't know what a mux is, it's a selector; you can think of it that way. What this mux does is select: there are two inputs, and you can control which one of the two is forwarded to the output. So the output of this block could be this one or this one. There's a control signal, which I didn't draw here, that tells this block
Jishen Zhao 26:25
which one to select. OK. If I select this input to forward to the output, that means this ALU, this adder, will get its operand from this input, and this input is connected to the output of this register, the M pipeline register. And the M pipeline register already holds the value that is supposed to be written into register 3 for the first instruction. So if we put the wire here, we can actually fast-forward
Jishen Zhao 26:57
the result that is supposed to be written into register 3 to the input of this ALU, which performs 8 plus register 3. So now we have solved the dependence problem, because we have a shortcut carrying the first instruction's result to the input of the execution stage of the second instruction. This is called bypassing. Bypassing means we're adding a wire and a mux to solve the data hazard problem by forwarding some results from an earlier instruction to some stage of a later instruction. OK. So we call this the forwarding,
Jishen Zhao 27:46
or bypassing, method. In this particular case, sorry, it should be on an earlier slide, we call it MX bypassing or MX forwarding. If you read other articles, the naming may be a little bit different. But for this class,
Jishen Zhao 28:09
let's just keep it consistent and always call it MX bypassing. MX bypassing means we're bypassing, or forwarding, the output of the M pipeline register to the input of the execution stage.
Jishen Zhao 28:27
OK, so this is called MX bypassing. Let's take a look at another example, and hopefully this naming convention will become clear to you. This one is called WX bypassing, and this one is still MX bypassing.
Jishen Zhao 28:41
Where are the wire and the mux for WX bypassing? We're adding another one: this is the new wire we're adding. This is the wire that enables WX bypassing. And the reason it's called WX bypassing is that, if you look at how this wire is connected, it's connected this way, and it comes from here, which is the output of our
Jishen Zhao 29:04
W-stage pipeline register. Now, this is another mux; just don't worry about it. It still connects to the output of the W-stage pipeline register. So we connect the output of the W-stage pipeline register to the input of the
Jishen Zhao 29:19
execution stage, where it is fed into the ALU, this adder here. That is why it's called WX bypassing, or forwarding. There's a question here. The question was: what if both registers at the X stage need bypassed values, one from WX and one from MX? Things will be more complex. In this example we don't consider that case, but you can use your creativity and think about what will happen. You would need more complex wiring and muxes to enable that. But typically I don't see many cases where you will need to do that. OK. You will need either MX or WX bypassing, one of those.
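The mux selection described for MX and WX bypassing can be sketched in Python for a single ALU operand. This is a minimal sketch, not the lecture's actual control logic: the `PipelineRegister` class, the function name, and the rule that the M-stage value takes priority over the W-stage value when both match are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

@dataclass
class PipelineRegister:
    """Hypothetical view of a pipeline register output: the register the
    instruction in that stage will write (if any) and the value to write."""
    dest: Optional[str]
    value: int = 0

def select_alu_operand(
    src: str,
    regfile: Dict[str, int],
    m_reg: PipelineRegister,
    w_reg: PipelineRegister,
) -> Tuple[str, int]:
    """Choose where one ALU operand comes from, like the mux control signal.

    Assumption: the M-stage value is newer than the W-stage value, so it is
    checked first.
    """
    if m_reg.dest == src:
        return "MX", m_reg.value       # bypass from the M pipeline register
    if w_reg.dest == src:
        return "WX", w_reg.value       # bypass from the W pipeline register
    return "regfile", regfile[src]     # no hazard: normal register file read

# The add wrote 30 for r3 into the M pipeline register; the register file
# still holds a stale 0. The load then computes its address as 8 + r3.
regfile = {"r1": 10, "r2": 20, "r3": 0}
path, r3_value = select_alu_operand(
    "r3", regfile, PipelineRegister("r3", 30), PipelineRegister(None)
)
address = 8 + r3_value
print(path, address)
```

In this example the operand arrives over the MX path, so the load uses 30 rather than the stale 0 and computes address 38.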
Jishen Zhao 30:18
Another question asks whether WX bypassing makes sense, given that W is the last stage. So let's take a look at this example. OK. For MX bypassing I don't think there's any problem, right? So let's take a look: there are three example programs
Jishen Zhao 30:36
we put in here. Because the pipeline diagram is too complex to always see clearly, we're going to use the pipeline table to reason about it. So MX bypassing is like this. One thing I want to mention is another convention for when you do your homework or exam questions.
Jishen Zhao 30:51
I like to use an arrow to represent that bypassing is happening. It's like a wire in my pipeline diagram; in the pipeline table I represent it as an arrow. For this MX bypassing, the bypassing happens at the boundary between cycles 3 and 4, and I always like to put an arrow in front of cycle 4 to represent that. OK. You can draw it another way; sometimes, if you read other articles, people draw the arrow from here, the output at the end of cycle 3, to the beginning of cycle 4, or it will be like this.
Jishen Zhao 31:31
But in this class I like to use the arrow like this, so if you use this format it will make our grading much easier. OK, so try to follow that convention. So MX bypassing is good.
Jishen Zhao 31:44
I think our earlier question was about whether WX bypassing is needed. WX bypassing is needed when you have two instructions that have a dependence on a register, but there's another, independent instruction sandwiched between them. In this case, if I don't have WX bypassing, the execution stage of the third instruction cannot start until cycle 6, because only by the end of cycle 5 is the W stage done and register 1 written. So using WX bypassing can save one cycle, and using MX bypassing will save two cycles. OK, think about that.
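The cycle savings quoted above can be checked with a short Python sketch. It assumes the producer runs F, D, X, M, W in cycles 1 to 5, that the consumer is `distance` instructions later with no other stalls, that MX lets the consumer's X coincide with the producer's M cycle, and that WX lets it coincide with the producer's W cycle. The function name and the `bypasses` argument are hypothetical.

```python
from itertools import count

# Producer instruction: the cycle in which it occupies each stage.
PRODUCER = {"F": 1, "D": 2, "X": 3, "M": 4, "W": 5}

def earliest_execute(distance, bypasses=()):
    """Earliest cycle in which a dependent instruction can be in X.

    distance: 1 if the consumer immediately follows the producer,
              2 if one independent instruction sits between them.
    bypasses: any subset of {"MX", "WX"} available in the pipeline.
    """
    ideal = PRODUCER["X"] + distance  # X cycle if there were no hazard
    for cycle in count(ideal):
        if cycle == PRODUCER["M"] and "MX" in bypasses:
            return cycle              # value taken from the M pipeline register
        if cycle == PRODUCER["W"] and "WX" in bypasses:
            return cycle              # value taken from the W pipeline register
        if cycle > PRODUCER["W"]:
            return cycle              # register file has been written

for distance, bypasses in [(1, ()), (1, ("MX",)), (2, ()), (2, ("WX",))]:
    print(distance, bypasses, earliest_execute(distance, bypasses))
```

Under these assumptions the adjacent consumer executes in cycle 4 with MX instead of cycle 6 (two cycles saved), and the consumer at distance 2 executes in cycle 5 with WX instead of cycle 6 (one cycle saved). With only MX available, the distance-2 consumer still waits until cycle 6, which is why the WX path is needed.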
Jishen Zhao 32:30
And the third example is a question I leave for you after class. The answer is actually already in the PowerPoint file, but don't look at it; first try to figure out what this question mark is, what kind of instruction example you can think of that needs WM bypassing. OK, try it yourself, and then you can take a look at my example in the PowerPoint file, in the notes, and see if you got it right. There could be other
Jishen Zhao 33:00
examples as well; my answer is just one example. OK. So for bypassing and forwarding, we've learned about the wiring and components we added to the pipeline diagram for MX and WX bypassing, but all kinds of other bypassing can happen, and you can implement the pipeline to perform other types of bypassing, like WM or anything else, as long as they are needed, but that may make the pipeline diagram even more complex.
Jishen Zhao 33:38
For this class I don't require you to draw the whole pipeline diagram. But I may have some homework or exam questions that ask you to modify the pipeline diagram by
Jishen Zhao 33:51
adding certain wires or muxes to implement a particular bypassing logic. OK, so the parts highlighted in red are those wires and muxes, the selectors, that I put in the examples on the earlier slides. Take a look at them and try to reason about them; make sure you understand why I'm adding each wire or mux and how it works, and then you can try working on examples later on. All right, so now we have solved the data hazard with a better solution, which is bypassing, or forwarding. Bypassing is the same as forwarding; they're just different names.
Jishen Zhao 34:39
Now we'll step out of the hazard discussion a little bit, and I'll talk about something else that is related to some of the questions some of you asked during the last lecture.
Jishen Zhao 35:00
That is: what if there are some instructions for which some stage, say the floating-point execution or calculation stage, cannot really fit into one cycle? If we set the cycle time of the pipelined machine to the maximum latency among all the stages of such an instruction, the cycle time will be too long, which wastes time for the other stages. For this kind of instruction,
Jishen Zhao 35:35
we have another solution, called multi-cycle operations. It's similar to the way we solved structural hazards without stalling, by adding more hardware components. This time we use the same method to handle multi-cycle operations. OK, so let's take a look at an example; hopefully I'll make it quick. OK.
Jishen Zhao 36:01
In addition to floating-point instructions, which may take a super long time during the execution stage, a multiply or divide instruction may also take much longer during the execution stage than a simple add instruction. In the MIPS machine in particular, we actually create a side path in the pipeline to accommodate those super-long-latency execution stages; that is a side path instead of the main path.
Jishen Zhao 36:37
So the red part I highlight here is the components I added on top of our basic pipeline diagram. I removed all the bypassing wires just to keep the diagram clean and simple; the rest is the original diagram. Now consider running an instruction that has a super long execute stage. Say the multiply instruction has 4x the latency of an add instruction in the execute stage. So we actually perform the execution stage of the multiply instruction in the side path.
Jishen Zhao 37:14
So it's as if we have two dryers, dryer one and dryer two, to accommodate two loads at the same time. And this second dryer is super powerful and can take the multiply instruction during execution. OK, so what happens is that the multiply instruction goes through the pipeline the same as other instructions in the fetch stage and decode stage, but during the execution stage it goes through this path.
Jishen Zhao 37:41
When it finishes, if you look at this diagram closely, notice that it does not go through the memory stage; it does not wrap around here to the memory stage. Instead it goes directly to the writeback stage. This is particular to the MIPS machine, where arithmetic operations do not deal directly with memory the way loads and stores do. A multiply instruction always operates on registers; it does not have any memory access. So, considering that the multiplier's execution stage is already super long, here four cycles, we don't want to waste time idling in the memory stage, so it goes to the writeback stage directly after the four cycles are done. The multiply instruction is already longer than an add instruction's execute plus memory stages anyway. And for multiply instructions, particularly for MIPS
Jishen Zhao 38:47
machines, we even pipeline the execution stage: we divide the multiplier's execution stage into four separate smaller stages, multiply stage number one, number two, three, and four, and each one takes one cycle. The benefit is that if we have two or more multiply instructions one after another in my program, I can even pipeline those multiply instructions during the execution stage. So let's take a look at an example of how that works. If we have one multiply instruction and one add instruction, and they're independent, you can see that the multiply instruction has F, D, and an execution stage that takes four cycles,
Jishen Zhao 39:40
four separate cycles in a separate pipeline with different components. I use different letters, because P is not X: they are different hardware components, although they're both ALUs doing execution. Using different letters will make the pipeline table easier to read when we reason about it later on. And P0 and P1 are also different hardware components: this is
Jishen Zhao 40:07
multiplier stage number zero, and there is multiplier stage number one. They are different components, which means we do not have a structural hazard. If we take a look at the second example, with two multiply instructions, they can actually be pipelined in the execution
Jishen Zhao 40:26
stage. The second instruction can start P0 directly in cycle 4; it does not need to wait until the whole execution stage, all four cycles, is complete. Also, there is no structural hazard, because P0 and P1 are different hardware.
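The two-multiply example can be sketched as a small pipeline table. This is only an illustrative model, not the lecture's slide: the function names are hypothetical, the stage labels P2 and P3 extend the P0 and P1 named in the lecture, and it assumes one instruction is fetched per cycle starting in cycle 1 with no stalls.

```python
def multiply_rows(count, stages=4):
    """Return one pipeline-table row per back-to-back independent multiply.

    Each row maps cycle number -> stage label. Assumption: instruction i is
    fetched in cycle i + 1 and never stalls; W directly follows the last P stage.
    """
    rows = []
    for i in range(count):
        labels = ["F", "D"] + [f"P{s}" for s in range(stages)] + ["W"]
        first_cycle = i + 1
        rows.append({first_cycle + k: label for k, label in enumerate(labels)})
    return rows


def has_structural_hazard(rows):
    """True if two instructions occupy the same stage label in the same cycle."""
    last_cycle = max(max(row) for row in rows)
    for cycle in range(1, last_cycle + 1):
        used = [row[cycle] for row in rows if cycle in row]
        if len(used) != len(set(used)):
            return True
    return False


rows = multiply_rows(2)
for row in rows:
    print(" ".join(f"{cycle}:{row[cycle]}" for cycle in sorted(row)))
print("structural hazard:", has_structural_hazard(rows))
```

Under these assumptions the second multiply is in P0 in cycle 4 while the first is in P1, so no stage is shared in any cycle.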
Jishen Zhao 40:44
Another benefit of multi-cycle instructions occupying a side path, with a different hardware component than X, is this: if the two instructions, multiply and add here in this example, have operands that are completely independent, with no shared registers and no data dependency,
Jishen Zhao 41:05
I can actually write my pipeline table like this: I can write back the addi earlier than the writeback of the multiply. A later instruction, if there is no dependency, can write back earlier. The earliest time I can write back is here, in cycle 8. I do not need to wait for more cycles. OK, so that is kind of the benefit. But if there is a dependency, things will be different. So if I change
Jishen Zhao 41:36
the addi instruction just a little bit, so that there is sharing of a register, register 4, that means there is a data dependency, and then we need to stall the second instruction, the same as before. Even though we can perform forwarding, or bypassing, in cycle 7 (this is WX bypassing), we still need to stall, because the multiply instruction's execution stage is four cycles, which is too long. OK.
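A rough sketch of that dependent case follows. It is a model under stated assumptions, not the slide itself: the multiply is fetched in cycle 1 and runs F, D, four P stages, then W, so its W falls in cycle 7; WX bypassing is taken to mean the dependent instruction's X stage runs in the same cycle as the multiply's W stage. The function name and the "stall" label are hypothetical.

```python
def dependent_add_row(mul_fetch_cycle=1, mul_stages=4):
    """Pipeline-table row for an add that reads the multiply's result.

    Assumptions: the add is fetched one cycle after the multiply, stalls in
    decode, and receives the operand through WX bypassing.
    """
    mul_writeback = mul_fetch_cycle + 2 + mul_stages  # F, D, then P stages, then W
    add_fetch = mul_fetch_cycle + 1
    add_decode = add_fetch + 1
    add_execute = mul_writeback  # WX bypass: X overlaps the producer's W

    row = {add_fetch: "F", add_decode: "D"}
    for cycle in range(add_decode + 1, add_execute):
        row[cycle] = "stall"  # held in decode until the operand can be bypassed
    row[add_execute] = "X"
    row[add_execute + 1] = "M"
    row[add_execute + 2] = "W"
    return row


row = dependent_add_row()
stall_cycles = [cycle for cycle in sorted(row) if row[cycle] == "stall"]
print("X in cycle", [c for c in row if row[c] == "X"][0], "after stalls in", stall_cycles)
```

With a four-cycle multiplier the add still waits in decode even with bypassing, which is the point made above; a shorter execution stage (smaller `mul_stages`) shrinks the stall.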
Jishen Zhao 42:16
And, as I already mentioned, if we have a store instruction, whether there is a dependency or no dependency, just pay attention to the structural hazard. When you reason about a program that has multi-cycle instructions, pay attention: the pipeline diagram can be very messy, and we need to reason about it really carefully to avoid any structural hazard. OK? What is the question we have? So I have a question: do we always stall in the decode stage? Is it viable to always stall in the F stage instead? We stall in the F stage only if we have a structural hazard or a dependency such that we cannot decode. Otherwise we always stall in the decode stage.
Jishen Zhao 43:06
OK. You will see that most of the time, for the MIPS machine, you stall in the decode stage. Here, of course, the structural hazard is in the W stage, so the solution is stalling in the decode stage. I keep using different letters for the stalling. This delay can be called a delay cycle, a delay slot, or whatever, if you read other articles. I just use different letters, or different ways of representing stalling, so that when
Jishen Zhao 43:41
you read articles, you will not be surprised that there is not always a capital D put in here. People can put in all kinds of representations. This is something you will see a lot in computer architecture articles and materials: there is some variance here and there, but they are basically saying the same thing. Things can be even messier
Jishen Zhao 44:10
if you have a multi-cycle operation in your program, so be super, super careful when you reason about the pipeline table. Here is another example. You have two instructions. They do not really have a kind of data dependency, because these are all outputs. But if a later instruction needs to use the value stored in register 4, there is
Jishen Zhao 44:36
actually a data dependency here, and then you need to be careful. This one cannot write back earlier than cycle 7, because otherwise this instruction will get a value which is an old value of register 4 calculated here. OK.
Jishen Zhao 44:57
OK, there was another question about the decode stage. I know a lot of questions will be about stalling in the decode stage, based on my experience with my past classes. I think I forgot to mention that before. By the way, the question was: how do we know when to stall in the decode stage, since we do not see the structural hazard until cycle 7? No, we actually know about the structural hazard at a certain stage already, during the decode stage, because during the decode stage you already know the names of the registers you will need and whether those registers are occupied by a previous instruction, because the previous instruction has already gone through the decode stage as well. So during the decode stage we know, for this instruction, which registers it will need, and, for the previous instruction, which registers it is using in any of the stages, because everything is kind of deterministic
Jishen Zhao 46:01
in the five-stage pipeline, always, even with the multi-cycle instructions. So in the decode stage we can make a pretty deterministic prediction like that. That is why, for MIPS instructions, we always stall in the decode stage. But again, for other types of pipelined machines, with more complex or deeper pipelines, things may be different, because those pipelines may be harder to reason about. For the MIPS machine, though, it is the decode stage. If you still have doubts, you can try it yourself: reason about an example program with the pipeline table and see. OK.
Jishen Zhao 46:48
Yeah, so I already mentioned that for the five-stage pipeline it is easier, but for deeper pipelines things will be more complex. All right, what time is it? OK, we still have 10 minutes, so I think we can finish the multi-cycle operations; that is pretty good progress for today. So we deal with this, the example in the previous slide, again by delaying the decode stage, or stalling in the decode stage, and then making sure the order of instructions is correct, to run the program without any incorrect output. All right, so putting it all together, there is another
Jishen Zhao 47:40
example, a summary slide, to summarize with a couple of examples of multi-cycle operations. For multiply instructions, the number of cycles, or the number of stages in the execution stage of the multiplier, could be different. I just put an example here; our example here is four cycles. But I can easily come up with an example with six cycles, seven cycles, or whatever. OK, just pay attention to
Jishen Zhao 48:08
what the actual implementation is for the multiplier. But one thing I want to emphasize about multi-cycle operations is that things will be different in the execution stage. For multipliers, it is easy to pipeline: in the hardware design, really in the circuit design, it is easy to
Jishen Zhao 48:32
split a single large multiplier ALU into multiple stages of small ALUs. But division is different: for divide instructions, the ALU is hard to separate. So just pay attention: for divide instructions, the execution stage, although it still runs on the side path, is not pipelined. So if I am going to put this into a pipeline table,
Jishen Zhao 49:00
for multiply instructions I will put in different P's, so they represent different hardware. But for a divide instruction, I will put in all P's. That means all four cycles are occupied by just a single ALU. So if I have a second instruction that is also a divide, with no data dependency, I will still have a structural hazard on P. That means I can only start the execution stage of the second divide instruction in cycle 7, after P is free. OK. I kind of use a different letter to represent a delay, or stalling in the decode stage.
Jishen Zhao 49:47
Again, I use S to stall for the structural hazard, so S, S, S. This is another way of representing it, but it says essentially the same thing. OK. All right, so this is what we have learned, and what I actually planned for today.
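The two-divide case can be sketched the same way. This is a minimal model assuming the first divide is fetched in cycle 1, the second in cycle 2, and the divider is one unpipelined unit labeled P that is busy for four cycles; the function name is hypothetical, and the model covers only two instructions.

```python
def two_divides(exec_cycles=4):
    """Pipeline-table rows for two independent, back-to-back divides.

    Assumption: a single unpipelined divider P, so the second divide is held
    in decode (marked S) until the first divide has left P.
    """
    # First divide: F in cycle 1, D in cycle 2, then P for exec_cycles cycles.
    first = {1: "F", 2: "D"}
    for cycle in range(3, 3 + exec_cycles):
        first[cycle] = "P"
    p_free = 3 + exec_cycles  # first cycle in which P is available again
    first[p_free] = "W"

    # Second divide: F in cycle 2, D in cycle 3, then wait for P.
    second = {2: "F", 3: "D"}
    start = max(4, p_free)
    for cycle in range(4, start):
        second[cycle] = "S"  # structural-hazard stall in decode
    for cycle in range(start, start + exec_cycles):
        second[cycle] = "P"
    second[start + exec_cycles] = "W"
    return first, second


first, second = two_divides()
for row in (first, second):
    print(" ".join(f"{cycle}:{row[cycle]}" for cycle in sorted(row)))
```

With four execution cycles this gives three S entries for the second divide and its first P in cycle 7, matching the table described above. Compared with the pipelined multiplier, the only change is that every execution cycle uses the same label P, which is what creates the conflict.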
Jishen Zhao 50:01
But I think we still have some time, so we can continue with the control hazard. So basically, this is what we have learned so far about pipelining altogether. You can always use
Jishen Zhao 50:14
this kind of slide. I have these index slides as an index, if you want to review the materials we covered during class afterwards. So far we have talked about hazards. We talked about data hazards, we talked about structural hazards, and we talked about the ultimate trick, which is stalling, to solve the data and structural hazards. We also talked about better solutions, without stalling, to solve the two hazards. For structural hazards, the smarter solution is adding more hardware components, but that is costly: you need to spend money to buy more hardware components. For data hazards, the solution is that
Jishen Zhao 51:01
we perform bypassing, or forwarding, so that a stage that was supposed to wait and stall, that was supposed to wait for the output of a previous instruction, can start execution earlier.
Jishen Zhao 51:21
We also talked about multi-cycle operations, which are kind of outside the topic of hazards, but since we were talking about data and structural hazards, they fit in there. Next is the third type, the last type of hazard we are going to talk about in this class, which is called the control hazard. A control hazard means there is a control flow instruction involved: a branch instruction or a jump instruction. For this kind of hazard, let's take a look at one example today to motivate the problem, and then I think we are pretty much done for today. A control hazard is caused by a control dependency. A control dependency means there is a
Jishen Zhao 52:05
control flow instruction. Say there is an instruction here, bnez. By the way, do you still remember what bnez does? This assembly code, bnez, means that if register 3 is not equal to zero, then jump to target. The target is stored in,
Jishen Zhao 52:28
well, what is at target? What it stores, typically for MIPS, is the PC of that particular line of instruction. OK? So if we take a look at this example, if there is a branch instruction like this in my pipeline, then whatever the next instruction is,
Jishen Zhao 52:47
I need to decide whether I am going to execute the very next instruction, by adding 4 to the PC (at PC plus 4 I will find this next instruction), or jump to target. I cannot decide that up front, even though I have forwarding.
Jishen Zhao 53:03
Now, we actually have MF forwarding to allow us to decide after the execution stage. In the execution stage I have a comparison of r3 and zero. After the execution stage, I know whether I want to jump or not, since the condition has been calculated. So in here I actually stall for one, two cycles, even if I have forwarding. Two cycles of stalling will definitely reduce performance. So, first of all, with the ultimate trick of stalling, with the help of data forwarding, we have definitely already improved on, or solved, this control hazard. But again, same as before, we think: can we do better? Is there a better solution so that we can reduce these two cycles of stalling as well? This is something we are going to talk about in the next lecture. Next week we will talk about what is called branch speculation, and later I will talk about how we are going to
Jishen Zhao 54:23
add hardware components to perform branch prediction. Next week we will also talk about some advanced branch prediction mechanisms, which will be related to one option of your project. So pay attention to next week's class if you choose the project option
Jishen Zhao 54:42
to implement a branch predictor. OK, we will introduce those branch predictor mechanisms next week. OK, so we have a couple of minutes; let me answer our last question. The question was: does M mean reading data from memory, like r3? Yes or no? So during the M stage, if you are executing a load instruction,
Jishen Zhao 55:14
yes, it is reading. But if you are executing a store instruction, it is writing. So basically the M stage, M meaning memory, deals with memory access, which can be a read or a write depending on whether you are executing a load or a store instruction. For the MIPS machine, only load and store deal with memory. Other instructions do not deal with memory. Only those two instructions do a useful job in the memory stage; other instructions are just wandering around during the memory stage. All right, I think that is all for today. See you next week. Don't forget to work on your homework one.