Lecture Recording
- Original lecture video: Google Drive
- Lark Minutes w/ Transcription: https://rong.feishu.cn/minutes/obcntydk8r6239nxi7o242sg
The Lark Suite link might not work for everyone; see Resources for more details.
Text Transcription
October 13, 2021, 8:39 PM | 1 hour 3 minutes 58 seconds
Keywords: forward time flow, instruction, branch instruction, branch, execution stage, branch prediction, branch outcome, branch predictor, branch predictor design, pipeline diagram, prediction and speculation, BTB design, BTB, return address stack, prediction, branch history table, prediction accuracy, branch target buffer
Transcript:
Jishen Zhao 00:02
OK, so today we're going to continue talking about pipelining. Specifically, we'll spend most of our time talking about branch prediction, which is part of the solution to control hazards.
Jishen Zhao 00:17
And then, if we have time later today, we'll start to talk about optimizations of pipelining; in particular, we'll talk about hardware-level optimizations. Alright, so let's get started.
Jishen Zhao 00:33
This is the list of contents. We've mostly talked about the basic ideas of pipelining, so we've pretty much covered everything except for control hazards and branch prediction, which we'll talk about today. And when you review later on, this index will help you with your review.
Jishen Zhao 00:54
What we discussed last class, especially in the second half, was dealing with all kinds of hazards. So far we've talked about two types of hazards in pipelining: one is data hazards, the other is structural hazards. So remind yourself:
Jishen Zhao 01:17
what are those? In particular, for data hazards, one solution we have is bypassing, or forwarding. We modify the pipeline diagram to add wires and muxes to create a shortcut, a forwarding path, from a later stage to an earlier stage. So when you draw a pipeline diagram to enable some sort of bypassing, make sure you wire it in the direction where your data flows from a later stage to an earlier stage.
Jishen Zhao 01:55
Going from an earlier stage to a later stage is the normal direction of forward time flow; in that case we don't need bypassing or forwarding. Wherever we need bypassing or
Jishen Zhao 02:08
forwarding, we need to go in reverse of time flow: we need to forward the output of a later stage, from an instruction that has already entered that later stage, to an instruction still in an earlier stage.
Jishen Zhao 02:19
Here's the example that we went through last time: three types of bypassing, or forwarding. The last one I left to you after the last class: what kind of instruction
Jishen Zhao 02:32
can you think of as an example where we need W-to-M bypassing, that is, forwarding the input of the WB stage, the value temporarily stored in the WB-stage pipeline register, to the input of the memory stage? The answer is already in the PowerPoint file: it's the store instruction. Whenever you need the M stage, when you actually need the data in the M stage, is when you actually do the real work in the M stage, and the only instructions that do real work in the M stage are load word and store word in this instruction set. So in this case, in this particular example, I can think of a store instruction, but you may think of others if you can.
Jishen Zhao 03:22
Also, these are only three types of bypassing in the examples; other types of bypassing or forwarding could also help you reduce other kinds of data hazards that could happen as well.
Jishen Zhao 03:41
We also talked about multi-cycle operations last time, particularly to deal with instructions that need a very long time to go through the execution stage, where it's inefficient to set the clock cycle to be as long as the latency of that super-long execution stage. So for those kinds of instructions we create a side path,
Jishen Zhao 04:11
indicated in red here, to deal with those particular types of instructions in the execution stage. We went through two examples: one is the multiply instruction, the other is the divide instruction. They go through the same fetch and decode stages, and then in the execution stage they go through this side path. Pay close attention: they don't have a memory stage, because they already take too many cycles to complete the execution stage, so they go through their own execution stages here and go directly to the write-back stage, skipping the memory stage.
Jishen Zhao 04:52
For a typical multiply or divide instruction, the total latency of a single instruction will be longer than for typical 5-stage MIPS instructions like add or load/store. The other thing to pay attention to is the difference between the multiply instruction and the divide instruction: for the multiply instruction the execution stages are pipelined, so you can pipeline two multiply instructions one after another. But for the divide instruction the execution stage cannot be pipelined, because you cannot divide a single divider into multiple smaller divider stages. So the execution stage of the second divide instruction needs to start after the completion of the execution stage of the first divide instruction.
Jishen Zhao 05:50
This is how you should draw your pipeline table to reason about pipelining multiply and divide instructions back to back in a program; a sketch follows below. So this is mainly what we learned and discussed last time about pipelining, and today, like I said, we'll focus on control hazards and branch prediction. Alright, we actually started a little bit on branch prediction last time. Branch prediction is one type of measure to deal with control hazards, but remember that the ultimate measure to solve all kinds of hazards, including structural hazards, data hazards, and control hazards, is to stall the pipeline. Stalling can always make sure the pipeline runs correctly and you preserve the program's operation, but stalling may introduce performance overhead. So these measures, including forwarding and bypassing and, here, branch prediction, are there to improve the performance in the presence of various hazards.
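A sketch of such a pipeline table, assuming (hypothetically) a 4-stage pipelined multiplier (M1-M4) and a 4-cycle unpipelined divider, both skipping MEM and going straight to WB; the actual latencies used on the slide are not captured in the transcript:

```
            c1  c2  c3  c4  c5  c6  c7  c8  c9  c10 c11 c12 c13
MUL (1st)   IF  ID  M1  M2  M3  M4  WB
MUL (2nd)       IF  ID  M1  M2  M3  M4  WB                        <- multiplier is pipelined
DIV (1st)           IF  ID  DV  DV  DV  DV  WB
DIV (2nd)               IF  ID  --  --  --  DV  DV  DV  DV  WB    <- must wait for the 1st divide's EX
```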
Jishen Zhao 07:11
OK. So dependencies lead to hazards, and here a control hazard is due to a control dependency that can lead to a wrong program order. Let's take a look at this example we started to discuss last time: this branch instruction, BNEZ R3, target. You should be very familiar with this instruction, what it does, and how you can tell what it is; in the homework you will need that kind of skill to read the assembly code. So for this instruction,
Jishen Zhao 07:53
I wrote a pipeline table like this. I was kind of lazy, I didn't draw the whole pipeline table with the cycle numbers, but you should be able to see what it means. In the first cycle the first instruction gets into the pipeline: it's fetched, then decoded, but we cannot decide whether to start fetching
Jishen Zhao 08:14
the next instruction starting from the second cycle, because it's a branch instruction and it may jump to the target. In this example we actually need to calculate the branch outcome, the condition comparing R3 with zero, to decide whether we should jump. And in particular,
Jishen Zhao 08:36
in this example the condition does not hold, so we will not jump and will execute the next instruction. But we cannot fetch the next instruction until after the execution stage of this branch instruction, so in this case we introduce a 2-cycle penalty, that is, 2 cycles of delay, which we call the performance penalty, because we have a control dependency, a control hazard, here, even with forwarding (without forwarding we might need to delay even more). Even with forwarding available, we still need to delay 2 cycles; we still introduce a 2-cycle penalty here. So the question is: how can we improve the performance by reducing those 2 cycles of penalty, of delay, per branch in a program that has branch instructions?
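A minimal sketch of that timing, assuming the branch outcome is known at the end of the EX stage of the lecture's 5-stage MIPS pipeline:

```
                  c1   c2   c3   c4   c5   c6   c7
BNEZ R3, target   IF   ID   EX   MEM  WB
next instruction       **   **   IF   ID   EX   ...

** = the fetch of the next instruction waits until the branch outcome is known
     at the end of EX (cycle 3), so it starts in cycle 4 instead of cycle 2:
     a 2-cycle penalty.
```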
Jishen Zhao 09:47
One of the measures we introduce is called branch speculation. Speculation means that we proactively, speculatively, while we don't know the outcome yet, kind of guess, or
Jishen Zhao 10:00
we kind of arbitrarily make a decision about which instruction to fetch before the branch outcome can be known. So we will proactively start to fetch some instruction, maybe here or here, to
Jishen Zhao 10:15
reduce that 2-cycle penalty. The best case we can do is to start fetching here, the next instruction, whatever the branch outcome will be, target or fall-through; we just fetch that instruction regardless. But if we're wrong, then we will need to delete the instruction that we already fetched. Say we start to fetch the next instruction, but after the execution stage of the branch instruction we learn that we're going to jump to the target. Then we will need to correct our error; this process is called branch recovery. We need to recover from our previous error, because we already fetched, and started to decode, the wrong instruction.
Jishen Zhao 11:10
So after 3 cycles, after the execution stage of the branch instruction, we recover the branch by deleting, or flushing away, the instructions that already got into the pipeline, and we start to fetch the correct instruction, the one at the target. In order to do that, I added some wires and muxes to this pipeline diagram to enable this branch recovery, because if we don't add anything, the pipeline will just keep going on and on.
Jishen Zhao 11:49
But we need to recover the branch, so we need to make sure we have a selection, a choice between two options: continue to execute the instruction, or flush it away. The wires I added here: this wire is a control signal to these muxes. You should have already seen this from our previous class, when we talked about data hazards and solving data hazards by bypassing; this is a selector, a mux that selects between two signals. The select signal, the whole line, is the same for these two muxes. One input to each of the two muxes is the normal pipeline execution, so if the selector selects that input to flow to the output of the mux, the pipeline will continue to execute the current instructions already in the pipeline.
Jishen Zhao 12:49
That means this second instruction. But if we're wrong, our selector signal will select this other input, and the thing we put in here is a NOP. A NOP essentially does nothing; we use this NOP input to flush the instruction that is already in the pipeline. So this is what we do: if we select the NOP, both of these two stages, the decode stage and the fetch stage, become NOPs, so they flush the instruction away and will not continue the rest of the stages with this instruction. And in that case, we can fetch a new instruction from the next cycle, OK? And if we represent it in a pipeline diagram, sorry, a pipeline table, we will use this kind of pipeline table.
Jishen Zhao 13:45
When we execute our branch instruction, which is here in this program (I laid out the program with the branch instruction, the same branch instruction we have been using in our examples), when we do branch speculation we will speculatively predict that the branch will always not jump to the target; instead, it will always execute the next instruction. Because before the branch instruction completes execution we don't know what target it would jump to, or even whether it will jump, we just predict it's not going to jump and execute the next instruction. That's kind of the easiest way for us to speculate. So we speculate that it will execute the next instruction, and we start to fetch it. And pay attention that, unlike in the previous slide, here we fetch the next instruction
Jishen Zhao 14:43
right after this one; in the next cycle we already start to fetch it. In that case, if our prediction for the branch is correct, the program just goes on without any penalty, without any delay. But if we make a wrong prediction, if we actually need to jump, we find that out in this cycle, the cycle where the execution stage of this branch instruction is done. Then we need to flush this instruction away, but don't forget we need to flush this instruction as well, because
Jishen Zhao 15:20
both instructions, after 2 cycles, have already entered the pipeline. That's why we actually have two NOPs to flush those two instructions. So in our case, this is the way we deal with recovery: if our prediction, our speculation, is wrong, we flush both instructions and then start to fetch the correct instruction, the one at the target, which is here (a sketch of this table follows below). But think about why this is still a good method, always predicting
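A sketch of that pipeline table for a mispredicted branch, under the same assumptions (outcome known at the end of EX; the two fall-through instructions are fetched speculatively and then flushed):

```
                      c1   c2   c3   c4   c5   c6
BNEZ R3, target       IF   ID   EX   MEM  WB
next (speculative)         IF   ID   NOP  NOP          <- flushed on the mispredict
next+1 (speculative)            IF   NOP  NOP  NOP     <- flushed on the mispredict
target instruction                   IF   ID   EX ...  <- correct fetch starts in cycle 4
```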
Jishen Zhao 16:03
that it's not going to jump, always executing the next instruction, as our speculation, our prediction. If you think of it that way, look at the pipeline table here: even though our prediction is wrong, even though it will need to jump,
Jishen Zhao 16:24
and we need to flush those two instructions away, we have nothing to lose if we think about it like that, because those 2 cycles that we wasted by fetching the next and next-next instructions would still be 2 cycles of delay if we didn't do the speculation; we would still need to start fetching the correct instruction at the target in cycle 5. So by always speculating that the branch is not going to jump, by always speculatively starting to fetch the next instruction, we don't actually pay any extra penalty on top of stalling. I think there's a question here; the question was: is there forwarding between the first two instructions? Forwarding between the first two instructions, oh yes, I guess so, because the first instruction puts its result in R3, and the second instruction needs R3.
Jishen Zhao 17:36
Yes, good to point it out, thank you. I didn't put the arrow here; we should actually put an arrow here. Alright. So by speculating, by speculatively predicting that it's not going to jump, always, we have the potential to improve the performance of our program with branch instructions, in the cases where we predict correctly.
Jishen Zhao 18:09
This one is a wrong prediction, but in some cases we may be right, maybe it will not jump, so we can still improve some performance, and we will definitely not further degrade performance compared to always stalling. So this is the benefit of always predicting not to jump, and we call this scheme, where we always speculatively fetch the next instruction:
Jishen Zhao 18:33
we always predict the branch to be not taken. For a branch instruction there are two outcomes: one is taken, the other is not taken. Not taken means it's not going to jump, while taken means it will jump to the target. Thanks.
Jishen Zhao 18:53
Also, about the bypassing, there's a follow-up question; the previous question was whether the bypassing is from the EX stage of the first instruction to the second instruction, yes. I will put an arrow on it in the class lecture notes, so we will see it there. So this speculation is called: we always predict that the branch is not taken.
Jishen Zhao 19:22
By predicting the branch is always not taken, we opportunistically may be able to improve performance, if in some cases we're right. OK, so let's do an exercise to see how always predicting branches as not taken, as our speculation, improves the performance.
Jishen Zhao 19:46
I think we have a question here as well. So, for those of you who don't have any questions, please take a look at this exercise; I'll give you a few minutes to work on it. In this exercise we have a program that has various types of instructions: 20% branch instructions, with the other percentages of other types of instructions listed here.
Jishen Zhao 20:17
And our speculation is the same as what we have learned so far, always not taken, but actually 75% of the branches are taken; actually 75% of the time we have
Jishen Zhao 20:31
a misspeculation and need to recover the branch. So let's do the calculation to see how much performance we get, in terms of CPI, with this always-not-taken speculation. Calculate the CPI as we already learned, with the base CPI equal to 1 as the basic assumption. I guess this is the first time we calculate CPI after we have learned about pipelining, so calculate the CPI to see what the CPI is now, with these given conditions, OK? While you're working on your exercise I'm going to answer one question here. So there was a question: in MIPS,
Jishen Zhao 21:15
since wrong speculations always get flushed before the MEM stage, is there no security issue, no Spectre or Meltdown, in MIPS? I haven't actually thought about that in detail; it looks like it could be about that. Let me think about it; maybe we can discuss this later on when we talk about caches, and if we have time we'll talk about it all together. It's good to connect what you have learned with current topics in security. OK, just let me think about it. So I'll give you a couple of minutes to work on the exercise, and then we'll go through it together.
Jishen Zhao 23:05
Alright, I think this exercise is kind of straightforward if you follow what we have discussed about CPI and connect it with what we have learned about pipelining and branch speculation so far. So let's go through it together; if you're still working on it, feel free to continue after class and try your calculation. So the CPI
Jishen Zhao 23:35
will be... well, you can make your own assumption, like I did earlier when we first started to learn how to calculate CPI: you can assume that a program has 100 instructions in total, of which 20 are branch instructions, and then you can calculate the number of cycles to completely execute all those 100 instructions.
Jishen Zhao 24:01
But here I will use this other method, which I think makes the calculation easier. I will start with the base CPI, which is 1, and then the actual CPI here will be the base CPI plus the branch penalty. The branch penalty means the penalty, the delay, that we pay to recover the branch. So what is the branch penalty here? If we had a perfect pipeline, the CPI would be 1, and then we add the penalty: 20% of our program is branch instructions, and among those 20%, 75% of the branches need to recover, because we have a misprediction, a misspeculation, and for each of those
Jishen Zhao 24:46
misspeculations we need to delay for 2 cycles; we need to pay the 2-cycle penalty that I already showed you in the previous slide. That's the branch recovery: we need to flush those 2 instructions in those 2 cycles. So, adding them together, the base CPI plus the penalty is 1.3.
Jishen Zhao 25:08
This is our actual CPI. So feel free to use either method: make the assumption of a 100-instruction program, or use this method to calculate. Either way, as long as you get a CPI of 1.3, whichever method is easier for you is fine, OK. So in that case, instead of 100% of the branch instructions, which are 20% of the program, paying the 2-cycle penalty, now only 75% of that 20% need to pay the penalty. So definitely the CPI will be lower compared to not doing the speculation: as long as some of our branch speculations are correct, we can improve the performance on top of just stalling. But people always try to do better, so now let's think about whether we can do better on top of that; can we save even more penalty
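Written out, the calculation from the lecture is:

```
CPI = base CPI + branch penalty
    = 1 + (20% branches) x (75% mispredicted) x (2-cycle recovery)
    = 1 + 0.20 x 0.75 x 2
    = 1.3
```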
Jishen Zhao 26:20
compared to always predicting the branch to be not taken, OK. So, in terms of speculation: speculation means that we proactively make some prediction and proactively execute some instructions before we're actually sure. There is some risk, but there's always some chance that we gain a benefit. The better case is: how can we do smarter speculation that gives a larger profit, that is, increase the benefit, increase the cases where we expect our speculation to be correct? So those are the basic ideas of speculation.
Jishen Zhao 27:09
You can read these yourself; I will not go through them line by line and read them out, so after class feel free to read them yourself. But in that case, how do we do smarter speculation, smarter prediction? Instead of always predicting that it's not taken, that it's not going to jump, maybe we can make a better prediction for our program: predict that the branch is sometimes taken and sometimes not taken, and in that way improve the prediction accuracy. So in that case we can do dynamic branch prediction, to predict that sometimes it's taken and sometimes it's not taken. And we can even guess, whenever it's taken, where it will jump to, before we complete the branch instruction. And after the branch outcome is known, that means after the execution stage of the branch instruction, we can verify our guess, our prediction, whether it's right or wrong. Then we can do the same as before: we either recover, flushing the instructions, if we have a wrong prediction, or we just continue execution without any penalty.
Jishen Zhao 28:35
In a 5-stage pipeline this is not that hard, but if the pipeline is deeper it's a little bit harder. In a 5-stage pipeline, like the previous question one of you asked about security issues: the security issues with Spectre and Meltdown are related to the fact that the attacker can peek into your memory state. In the 5-stage pipeline, typically, if you do this branch speculation, the two instructions already in the pipeline have not entered the memory stage yet, so in that case the speculated value will not be stored in memory. What makes things easier for the 5-stage pipeline is that you haven't modified the memory's values yet; that means the memory state is not changed, and whatever is in the pipeline registers is just temporary values, so you haven't overwritten whatever was previously in memory. You haven't made permanent damage to the data the program is accessing yet, so that's
Jishen Zhao 29:56
also why, for a 5-stage pipeline like MIPS, branch speculation is much, much simpler than for pipelines with deeper stages. But in this class, like I said, we'll just keep it to 5 stages, so the basic ideas we discuss are all based on the 5-stage pipeline. If you're interested, feel free to take a look at those deeper pipelines to see how the same ideas apply and how to deal with those more complex cases later on, OK. So let's get back to this 5-stage pipeline.
Jishen Zhao 30:42
If we want to make dynamic branch predictions, I mean sometimes predicting taken and sometimes predicting not taken, we need some logic this time to make this decision, to make this prediction. And this logic, you can think of it this way, needs to be able to monitor, to observe, what the program has been executing, particularly around this
Jishen Zhao 31:13
Branch instruction that area. And then need to have some logic or state machine kind of thing to make that prediction based on the information so that’s why I put up. A big black box or Big Red Box your code.
Jishen Zhao 31:35
the branch predictor. So now, instead of just adding wires and muxes, to implement this branch prediction we need a more complex component added to the pipeline. So later in today's class we'll take a look at what's inside; we'll poke through this red box to see what's inside, the main components in this branch predictor, OK. But before that, before we take a break, we'll do another exercise to show the benefit of doing the dynamic branch prediction that may sometimes predict taken and sometimes predict not taken as our
Jishen Zhao 32:19
prediction, to motivate the need to design this more complex branch predictor; after the break we'll start to take a look at the details of the branch predictor. So I'll give you a couple of minutes to work on this question during the break, and we're going to take the break at the same time, for 10 minutes. So now it's 2:39; let's take 10 minutes and come back.
Jishen Zhao 32:49
Let's do 11 minutes so we have a round number: we come back at 2:50 PM, and I'll go through this question together and continue to talk about the details of branch prediction. So let's take a break; while you're at it, feel free to take several minutes to work on the exercise, and then come back at 2:50.
Jishen Zhao 33:24
OK, let's get back to our class. I already put the answer up: the CPI is 1.02. I've seen some of your answers during the break and they were all correct, so I assume there are no issues with the calculation; it's basically the same method as the previous exercise, OK. How is my voice? Right.
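The parameters of this second exercise aren't captured in the transcript, but assuming the same 20% branch mix and 2-cycle recovery penalty as before, a CPI of 1.02 would correspond to a misprediction rate of about 5%, i.e. roughly 95% prediction accuracy:

```
1 + 0.20 x m x 2 = 1.02   ->   m = 0.05
```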
Jishen Zhao 33:54
So now we're going to get into more detail on this red block, this branch predictor, and how it is designed. That will be relevant to some of you: if you choose to do the course project related to branch predictor design, what we'll discuss will be helpful for your project. First of all, we need to decide what functionalities this branch predictor should have. Although we know its main functionality, of course, is to do the dynamic branch prediction, when we get into more detail the first question that comes up is:
Jishen Zhao 34:40
where should we do the branch prediction? That means: where should the branch predictor sit, at which stage? If you paid attention to my previous pipeline diagram, you should have already seen that the branch predictor is actually sitting in the fetch stage. But why is that? Say we have the two instructions, the branch instruction I listed here, the same as in the previous example, and this is the target. Think about it: if we do the branch prediction
Jishen Zhao 35:17
in the decode stage, that means we already know this is a branch instruction, it's decoded, so we know we actually hit a BNEZ instruction, a branch, and then we know for sure we need to do the branch prediction. But the benefit of having the branch predictor in the fetch stage is that we can do
Jishen Zhao 35:41
the branch prediction even before we know it's a branch instruction. Think about that: before the decode stage, at the fetch stage of this branch instruction, if we can do the branch prediction, what's good about it is that we can already fetch the predicted next instruction in cycle 2. This is the same as what we did previously, but that's only possible if we can do the branch prediction
Jishen Zhao 36:10
at the fetch stage. If we wait until we know it's a branch instruction to do the branch prediction, the fetch has to wait for one cycle, and we can only fetch the predicted instruction in cycle 3. So that's one of the reasons that branch predictors typically sit in the fetch stage and start to do the branch prediction during the fetch stage already, so that we can completely save the 2 cycles
Jishen Zhao 36:40
if we predict correctly, OK. So here we come back to our previous diagram, with the branch prediction in the fetch stage. You can see this is the fetch stage, because this is the decode stage, the register file at the start of the decode stage, so we are here:
Jishen Zhao 36:59
this is the fetch stage. Now you can take a look at the details of the wires and components we added to the pipeline diagram. After class you can spend some time to reason about how all those wires are connected. I think it's the most challenging pipeline diagram modification we have so far: we've done forwarding and bypassing, we've done
Jishen Zhao 37:28
branch speculation to always predict not taken, but those were not as complex as this one. This one actually combines both: you can see here the original wires and muxes we added to recover the branch from the misprediction, the misspeculation, where we always predicted not taken; now they can be used to recover the branch whichever way we predicted, not taken or taken, because we can always flush those instructions by feeding in NOPs. But in addition to that,
Jishen Zhao 38:02
now we have our branch predictor here, and then we need to connect all those wires together to enable the branch predictor to work. So I'll leave it to you after class: try to read a little bit, by yourself, about how those wires work, where the signals come from and where the signals go, and then in Thursday's class I will go through the wires together with you again during the review. Hopefully that will help you better understand
Jishen Zhao 38:38
how the branch prediction works in this pipeline diagram, OK. So, in that case, let's zoom in and get into the details of this block, this branch predictor: what does it have? OK, so this is what's inside. This whole thing is our original red block, the branch predictor, but inside it there are several components. The PC here is not part of the predictor, but inside there are at least 3 main components: the BTB, the RAS, and the BHT. We're not going to talk too much about the RAS; we're going to spend our time on the BTB design and the BHT design. So what do those stand for? Yeah, I had it spelled out: BTB means branch target buffer, and BHT is the branch history table. I'll give you examples to go through them later on, actually shortly. But on a high level, what are those components for? Those two components are for answering 3 main questions in order to do the branch prediction. So earlier
Jishen Zhao 40:06
I mentioned that you should think about what the main functionalities, the goals, of the branch predictor are; they are actually about answering those questions. Step one, or question one, is to answer whether an instruction is a branch instruction. If it's not a branch instruction, we don't need to do anything special; we can just go on regularly, execute the instruction and fetch the next instruction, and we don't need to do branch prediction at all. So the first thing we do in the fetch stage is decide, or rather predict, whether
Jishen Zhao 40:44
this instruction is a branch instruction or not. The second question: if it is a branch instruction, then we need to answer whether this branch will be taken or not taken; that decides whether we should fetch the next instruction or jump to the target. And the third question: if this branch is taken... if it's not taken, the answer is just to fetch the next instruction, but if it's taken, where does it jump to? So we should predict the target. This is also part of the goal of the branch predictor: we should also predict where it will jump to, predict where the target is. So these are the key questions
Jishen Zhao 41:27
the branch predictor needs to answer. I said we will particularly not cover this component called RAS; it's better to spell it out as well in case you're curious: it's called the return address stack. It is used in particular if your program has function calls, or some other type of code that has a call and a return; it stores the target, the branch target, that the return from the function will need to go back to. But that's a little bit more complex than the basic ideas of
Jishen Zhao 42:07
branch prediction, so we will not get into too much detail. If you ever encounter that kind of code, that kind of example code, you can dig into it in more detail, OK. So I also already marked which components answer those 3 questions, or those 3 steps; 3 steps means that we're going to answer the 3 questions one by one, the first question first, then the second question, and then the third question.
Jishen Zhao 42:38
OK, and I already marked which component, BTB or BHT, answers which questions. Actually, the first component, the branch target buffer, is used to answer both question 1 and question 3, so it will help us decide whether it's a branch instruction or not, and decide where the target is if it will jump. And the branch history table is
Jishen Zhao 43:09
the component that will help us predict whether this branch will be taken or not taken, OK. And if you're interested (I will typically not require you to get into that much detail), you can actually take a look at the other muxes, components, and wire connections inside the branch predictor, but that's completely up to you.
Jishen Zhao 43:35
Now that we know those are the two components, and the goals the two components are designed for, which questions they answer, we'll take a look at the detailed implementation, the design, of those two components; a small sketch of the overall interface follows below.
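As a way to summarize the three questions in code, here is a minimal sketch, in C, of the kind of per-PC answer a fetch-stage predictor produces. The struct and function names are hypothetical, not from the lecture, and the example predictor shown is just the "always not taken" scheme from earlier; in a full design the BTB would answer questions 1 and 3 and the BHT question 2.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One answer per fetched PC: the three questions from the lecture. */
typedef struct {
    bool     is_branch;   /* Q1: is this instruction (predicted to be) a branch? */
    bool     taken;       /* Q2: if so, taken or not taken?                      */
    uint32_t target;      /* Q3: if taken, which PC should be fetched next?      */
} prediction_t;

/* The simplest possible predictor, matching the "always not taken" speculation
 * from earlier in the lecture: never claim a taken branch, always fetch PC + 4. */
static prediction_t predict_always_not_taken(uint32_t pc) {
    prediction_t p = { .is_branch = false, .taken = false, .target = pc + 4 };
    return p;
}

int main(void) {
    prediction_t p = predict_always_not_taken(0x00400000u);
    printf("predict taken=%d, next fetch PC=0x%08x\n", p.taken, (unsigned)p.target);
    return 0;
}
```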
Jishen Zhao 43:49
OK. So first of all we'll start with the second component, the branch history table. If you still remember the last slide I marked, the branch history table is down there; the block down there is to help us predict whether a branch will be taken or not taken. So let's forget about the branch target buffer for now, and assume we have a branch instruction like the one I take as an example here. I actually put the high-level-language code, the C code, so that it's more straightforward for us to see the pattern from the code.
Jishen Zhao 44:28
If we have this branch instruction here, and we don't worry about the target, where we jump to, yet, we are only at the second step: to decide, or to predict, whether it's going to be taken or not taken. So this is our question, our goal, and this is the component we design for it. I'm going to introduce a couple of designs, types of designs, that people have been using, as basic designs. They are not sophisticated designs, but they are the basic designs to help you
Jishen Zhao 45:05
learn the basic idea of the branch history table. So we're going to introduce the first one, called the one-bit-history-based branch predictor. That means that in the branch history table we maintain one bit of history. The history means: previously, when this branch instruction was executed, was it taken or not taken, what was the actual outcome of that execution? And the rationale behind using the history to predict the branch outcome is that history is usually a good predictor of the future. For example, you're here today, you're listening to my Zoom virtual class today, so it's highly likely that you're going to be here on Thursday as well. So it's kind of using the previous history to predict the next outcome, OK.
Jishen Zhao 46:03
So let's take a look at how this branch history table is designed, and how we reason about the branch history to predict the future using this table.
Jishen Zhao 46:16
OK, so I'm going to use this instruction as an example. Don't worry about the outer loop; it's just the inner loop: for j less than 3, j++. If we take a look at this line of code, we already have a basic intuition in our mind, but the machine is not as smart as us, so the machine will still use the branch history table to make the prediction.
Jishen Zhao 46:46
But if we take a look at this code, we know the pattern: this branch instruction will be taken, taken, taken, not taken; taken, taken, taken, not taken; and so on, because it's j less than 3, so the outcome is always:
Jishen Zhao 47:03
3 times we stay in the loop and then we get out once; 3 times executing this instruction we'll be inside the loop, and then we get out. So this is our pattern, and we list it in the table,
Jishen Zhao 47:17
the branch history table, in a column called outcome, because this is the actual outcome of this branch instruction being executed. And this column, time, is not a clock time; it's the number of times the branch instruction has been executed. So this is the first time we come to this instruction, this is the second time we come to this instruction, the third time, and so on, OK.
Jishen Zhao 47:43
And the other columns in this branch history table are our prediction, which is definitely important because this is going to be our prediction when we use it to guide our speculation, and a state, which is the kind of state the table maintains, and which ties our prediction to the outcome. So let's take a look at how, each time this same branch instruction is executed, we reason about the branch
Jishen Zhao 48:14
history table. The first time this branch instruction is encountered, we come upon it for the first time; we haven't seen it before, so we initialize both the state and the prediction to zero, not taken. But unfortunately, the outcome after it's executed... if I draw a line here dividing those columns,
Jishen Zhao 48:40
these two columns are for after the instruction has completed: this outcome is known after the instruction completes, after the execution stage is done, so we know the definite outcome. So the outcome is taken. Then, the second time this branch instruction is encountered, when we hit it the second time, we copy the state from the outcome and then copy the prediction from the state. So the first-time outcome goes to the second-time state, and the second-time state goes to the second-time prediction. And that goes on and on: the third time, the outcome of the second time goes to the third-time state, and the third-time state goes to the third-time prediction, and so on and so forth. If we take a look here,
Jishen Zhao 49:36
this outcome, not taken at the fourth time, goes to the fifth-time state, and then that not-taken state we copy to the fifth-time prediction. So this is, row by row, how this branch history table works. We can also take a look at the prediction and the outcome, those two columns, to see whether they match or not; that means whether we have a correct prediction, or a wrong prediction that needs a branch recovery. The first time we are wrong: we initialized it to not taken, but actually the outcome is taken. The second and third times we are correct, because both of them,
Jishen Zhao 50:19
the prediction and the outcome, match. Then next we have two wrong, incorrect, predictions, and so on; you can see the pattern: we have 2 correct, 2 wrong, 2 correct, 2 wrong, and so on and so forth, and it can go on and on, because the outer loop runs 100 times. I only listed 12 iterations here because my slide is not long enough, but you can go on and you'll see the pattern, right? So this is the one-bit history table, the one-bit branch history table, because we only use one bit to maintain the history, the state here; this is our history. We only use one bit, one or zero, to maintain whether it's taken or not taken; one bit is enough, so it's called one bit. So you can see the outcome, the accuracy, of our prediction is like this for the 12 iterations. We can come up with a better prediction by increasing the number of bits we maintain in the history.
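A minimal simulation sketch, in C, of that one-bit scheme on the lecture's taken/taken/taken/not-taken loop pattern (a toy model of the table, not the actual slide):

```c
#include <stdbool.h>
#include <stdio.h>

int main(void) {
    /* Outcome pattern of the inner-loop branch: taken 3 times, then not taken. */
    const bool outcome[12] = { 1,1,1,0, 1,1,1,0, 1,1,1,0 };

    bool history = false;   /* one-bit state, initialized to "not taken" */
    int correct = 0;

    for (int t = 0; t < 12; t++) {
        bool prediction = history;          /* predict what happened last time */
        if (prediction == outcome[t]) correct++;
        history = outcome[t];               /* update the single history bit   */
    }

    /* Matches the lecture's pattern: 2 correct and 2 wrong per 4 iterations. */
    printf("1-bit predictor: %d/12 correct\n", correct);  /* prints 6/12 */
    return 0;
}
```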
Jishen Zhao 51:22
So we have the same columns, but for the state column we now have 2 bits. Although here I only have one column to record both bits, if we take a look here,
Jishen Zhao 51:38
we have four states. Instead of one or zero, we can use 0, 1, 2, 3; we need 2 bits to maintain strong not taken, weak not taken, weak taken, and strong taken. So now, instead of the 2 extreme ends, not taken versus taken, we introduce 2 weak, unsure states into our history: a weak not taken and a weak taken. So let's reason about this table again.
Jishen Zhao 52:09
Now it's 2 bits that we maintain in the state, the history, OK. So, for the same program, if you still remember, the pattern should be taken, taken, taken, not taken; taken, taken, taken, not taken. And then, the same as before, we go to
Jishen Zhao 52:27
the second time this instruction is executed. The first time, we initialize both of them to be strong not taken, this time, for the history; and for the prediction we only ever predict taken or not taken, there's no prediction called weak not taken, the prediction is always just taken or not taken. The second time we encounter this instruction, we're going to make a note in our history, and this time it's a little bit different from last time, because now we have a weak-not-taken state: we change from strong not taken to weak not taken. So instead of changing directly to the other end, taken, we are just less sure about not taken, so we take a small, tiny step
Jishen Zhao 53:21
away from not taken. So we actually change to weak not taken when the outcome is not the same as our prediction. Because it's weak not taken, the prediction is still considered not taken, but the outcome is again taken, so the third time it's executed we're even more unsure about not taken, so we move a little bit more toward taken. But we're not at strong taken yet, so we change to weak taken, and because weak taken is still considered taken, the prediction will be taken.
Jishen Zhao 54:06
Now the outcome matches our prediction, so that kind of reinforces our prediction; our prediction is good, so it encourages our prediction. The next time we encounter the same instruction, we will be even more sure that we're going to go in the direction of taken.
Jishen Zhao 54:26
So now we're at strong taken. You can see that, over time, it takes 4 steps to move from strong not taken to strong taken, step by step; every time we just move a little bit, because we have 2 bits. And you can imagine that if we had 3 bits we would have 8 states, and then we would move even more slowly; you can do that as well, OK. But with 2 bits we have 4 states, so we need 4 steps to completely move to the other end. So now our prediction will be taken, but the outcome is not taken; now we're wrong, so we're going to repeat the same thing: we'll be less sure about taken, so it will be weak taken, and so on, so forth. So in that case, after 12 iterations, the 12 times we encounter the
Jishen Zhao 55:24
same branch instruction, the outcomes will be like this: at first we have more wrong predictions, but later on we actually have more correct ones. So compared to our previous one-bit history table, we improve our accuracy by having 2 bits. Right. OK, so if you kind of forget later on, or if it's harder for you to reason about the 2-bit branch predictor, this is from one of my former students: he actually came up with a correlation between the 2-bit branch predictor and the Super Mario game. It works the same as this table, the way Super Mario moves step by step from one power state to the next, and so on, so forth. So hopefully that will help you interpret and remember how the 2-bit branch predictor works. Alright, so that's the basic idea of how the branch history table works, the design. Then how do we actually implement it? The previous tables we have seen and reasoned about are just a logical design; what is the physical view, how is it actually implemented?
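A minimal sketch of that 2-bit saturating-counter scheme in C, run on the same taken/taken/taken/not-taken pattern (again a toy model, not the slide's exact table):

```c
#include <stdbool.h>
#include <stdio.h>

/* 2-bit states: 0 = strong not taken, 1 = weak not taken,
 *               2 = weak taken,       3 = strong taken.    */
int main(void) {
    const bool outcome[12] = { 1,1,1,0, 1,1,1,0, 1,1,1,0 };

    int state = 0;          /* initialize to strong not taken */
    int correct = 0;

    for (int t = 0; t < 12; t++) {
        bool prediction = (state >= 2);     /* weak/strong taken -> predict taken */
        if (prediction == outcome[t]) correct++;

        /* Move one small step toward the actual outcome, saturating at the ends. */
        if (outcome[t] && state < 3) state++;
        else if (!outcome[t] && state > 0) state--;
    }

    /* More wrong predictions at first, mostly correct afterwards: 7/12 here,
     * versus 6/12 for the one-bit scheme above. */
    printf("2-bit predictor: %d/12 correct\n", correct);
    return 0;
}
```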
Jishen Zhao 56:52
We'll take a look at that maybe later on, but for now let's forget about the branch history table for a moment and instead go to the other component. The naming can be confusing, BHT versus BTB, but try to get familiar with the difference.
Jishen Zhao 57:14
Now, I've shown you the second component; the first component of our branch predictor design is the branch target buffer. So what the branch target buffer does, its goal, is to answer two of the questions: question one, whether this instruction is a branch instruction or not, and the third question, if the branch instruction is taken, where is the target? So how do we implement it? It's actually implemented more like a cache, if you still remember, from your undergraduate-level architecture class, what a cache design is: a cache has a tag, a data portion, and maybe other bits. The same goes for the implementation, the structure, of the branch target buffer. How it works is that you have the PC of an instruction, and to tell whether it's a branch instruction or not you just search through this branch target buffer, this cache, to see if there's a match on the PC, or part of the PC. If we find that a particular instruction's PC is stored in here, that means we have seen it before in history, because once we have encountered a branch instruction and it has been executed, we store it in this cache, this target buffer. So the next time we see it, we know this PC definitely indicates a branch instruction. In addition to storing its PC, we also store its target: if this branch instruction jumped the first time it was encountered, we make a note of the target as well, so the next time,
Jishen Zhao 59:12
in the fetch stage, when we search this branch target buffer, we know two pieces of information. One: the PC is there, which means it's a branch instruction. Second: the target correlated with that PC is in the same row as well, so we know that if it jumps, where it will jump to. So this is roughly how the branch target buffer is implemented, but in this particular implementation example I also have one more column that stores the prediction state. This is actually part of the logic of our previous table, the branch history table, where we maintain a prediction, or rather update a history, from the outcomes of our branch instructions. So that first column I marked as state in our branch history table, in this implementation, is actually stored here in the branch target buffer as well, correlated with each PC. So
Jishen Zhao 01:00:16
when we encounter a PC, we search this target buffer and we not only get the information about whether it's a branch instruction and where it will jump to, we also get the history of the last time, or the last couple of times: whether it was taken or not taken. So we get all 3 pieces of information at once out of this branch target buffer with this implementation.
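A minimal sketch, in C, of a cache-like BTB lookup of that flavor, assuming (hypothetically) a small direct-mapped table indexed by low PC bits, with the PC as the tag and the 2-bit state stored alongside the target; the slide's actual organization and sizes are not captured in the transcript:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define BTB_ENTRIES 16              /* toy size; real BTBs are much larger */

typedef struct {
    bool     valid;
    uint32_t tag;                   /* here simply the branch's full PC     */
    uint32_t target;                /* where the branch jumped last time    */
    uint8_t  state;                 /* 2-bit taken/not-taken history (0..3) */
} btb_entry_t;

static btb_entry_t btb[BTB_ENTRIES];

/* Fetch-stage lookup: a hit means "we predict this PC is a branch";
 * the stored state says taken or not, and the stored target says where. */
static bool btb_lookup(uint32_t pc, bool *taken, uint32_t *target) {
    btb_entry_t *e = &btb[(pc >> 2) % BTB_ENTRIES];   /* index by low PC bits */
    if (!e->valid || e->tag != pc)
        return false;               /* miss: assume not a branch, fetch PC+4 */
    *taken  = (e->state >= 2);
    *target = e->target;
    return true;
}

/* Update after the branch resolves in EX (allocate on first encounter). */
static void btb_update(uint32_t pc, bool was_taken, uint32_t actual_target) {
    btb_entry_t *e = &btb[(pc >> 2) % BTB_ENTRIES];
    if (!e->valid || e->tag != pc) { e->valid = true; e->tag = pc; e->state = 1; }
    e->target = actual_target;
    if (was_taken  && e->state < 3) e->state++;
    if (!was_taken && e->state > 0) e->state--;
}

int main(void) {
    bool taken; uint32_t target;
    uint32_t pc = 0x00400010u;

    printf("first lookup hit? %d\n", btb_lookup(pc, &taken, &target));  /* 0: cold miss */
    btb_update(pc, true, 0x00400040u);                                  /* branch resolved taken */
    if (btb_lookup(pc, &taken, &target))
        printf("hit: predict taken=%d target=0x%08x\n", taken, (unsigned)target);
    return 0;
}
```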
Jishen Zhao 01:00:48
So we have a question here. The question was: how does the branch predictor know that the current instruction is a branch instruction the very first time? It doesn't know; it just assumes it's not a branch instruction, because the PC is not there. So it's a cold miss, yeah, you're right, it's called a miss in the cache. On a miss we just predict it's not a branch instruction, so the first time we'll need to pay the penalty when it executes. And this is the time flow, the full flowchart, for the branch target buffer and how it works.
Jishen Zhao 01:01:25
I will not get into the details; you can read it yourself: what it does during each stage, in the fetch stage, the decode stage, and the execution stage. I won't go into detail here because that's related to the pipeline diagram and the wires and components
Jishen Zhao 01:01:38
I added. So take a look at this flowchart and correlate it to the pipeline diagram; we can kind of reason about them together, and hopefully you will get the whole picture, OK, after class.
Jishen Zhao 01:01:56
Alright, I think that's pretty much it for today. We introduced the basic idea of the branch predictor. On Thursday we'll talk about it as a help for your project on the branch predictor, and I also think it's good advanced material that will help you better see how modern branch predictors are implemented and designed. We'll talk about the branch predictors you may need in our project, and I'll briefly talk about other types of branch predictor research, like using machine learning to do branch prediction, OK. But basically, so far, this is the
Jishen Zhao 01:02:44
performance, in terms of accuracy rather than speed, of the 2-bit branch predictor with the 2-bit history table. You can see it pretty much saturates at around 94% with this benchmark; again, the accuracy depends on the workload you're running. This is SPEC CPU; we introduced the SPEC benchmarks when we talked about performance earlier, and these are some of the SPEC 89 benchmarks. If you're interested and have time, you can run some of the later SPEC CPU 2017 benchmarks to see the benefits, the accuracy, of the 2-bit branch predictor. Alright, so that's pretty much it for today.
Jishen Zhao 01:03:35
Today I talked about the branch predictor; I'll see you on Thursday, and I still have office hours today. But again, let me post the virtual conference room link for everyone; I need to clean up and finish the recording of the class.