Lecture Recording

Text Transcription

October 13, 2021, 8:39 PM | 1 hour 2 minutes 27 seconds



Jishen Zhao 00:03
Last time we discussed ISA design, and this is a two-slide summary. I will use two slides to summarize pretty much everything about ISAs that we have learned and what you need to know. First of all, you should have a very clear concept of what an ISA is and what you will typically see when people describe an ISA and its specification. You should also be very clear about the difference between the program execution model and the instruction execution model. In particular, for this class we care more about the

Jishen Zhao 00:44
execution model of instructions. That is on the lower right-hand side. The graph shows the steps an instruction will typically go through when it executes. Different ISAs may take a different number of steps, but these are the major steps that a typical instruction will definitely go through, and these are also the steps that MIPS instructions go through. Throughout the quarter we will mostly be using MIPS as the example to discuss computer design. We also

Jishen Zhao 01:26
briefly talked about ISA design goals. The three design goals you should know are programmability, performance and implementability, and compatibility. There are a lot of tricks and requirements to consider when you design an ISA in order to achieve those three ISA design goals.

Jishen Zhao 01:48
OK, so if you want to review, you can use this as an index to help you through what you have learned. Last time we talked about the aspects of an ISA. Basically, these are the things to consider and care about when you design an ISA. We went through the steps of executing an instruction, and for each step, or each couple of steps,

Jishen Zhao 02:19
we talked about which aspects of the ISA you want to take care of when you design it. I already put the arrows here, so you know which aspect is concerned with which step. The first one is instruction length. You should now know that to fetch the instructions in the program, what you need is a PC and also the length of the instruction currently being executed. So for the next instruction, if there is no control transfer, the PC will be equal to the current PC plus the length of the current instruction.
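The next-PC rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `next_pc` and the `taken_target` parameter are hypothetical, and the 4-byte length used in the usage note assumes a fixed-length ISA such as MIPS.

```python
def next_pc(pc, instr_length, taken_target=None):
    """Return the address of the next instruction to fetch.

    pc            -- address of the instruction currently executing
    instr_length  -- length of that instruction in bytes
    taken_target  -- target address if a control transfer is taken,
                     or None for plain sequential execution (hypothetical
                     parameter, used here only for illustration)
    """
    if taken_target is not None:
        # A taken branch or jump overrides sequential fetch.
        return taken_target
    # No control transfer: next PC = current PC + current instruction length.
    return pc + instr_length
```

For a fixed-length ISA with 4-byte instructions, `next_pc(0x1000, 4)` gives `0x1004`; in a variable-length ISA the second argument changes from instruction to instruction.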

Jishen Zhao 03:05
Then instruction format, which is concerned with the decode stage. Most of you should know that for MIPS instructions there are three types: R-type, I-type, and J-type. If you are wondering which kinds of instructions are R-type or I-type, for each type you can take a look at the MIPS ISA specification I posted on our class Google site. And for many of you, or

Jishen Zhao 03:40
some of you, who are also interested in x86 ISA design, I also talked about the ISA design concept of x86. It is a variable-length ISA, and an example of the x86 instruction format is also provided. You can see there is a difference: the x86 instruction format is more complex than that of MIPS instructions.

Jishen Zhao 04:07
Where data lives is concerned with the steps that deal with data input and output. We discussed addressing modes quite a bit. What you want to know, and how much you want to know, is this: if I give you a list of addressing modes, you should be able to match an instruction with the addressing mode it is using. You don't need to memorize the addressing modes. I don't like memorizing things, but you need to understand the addressing modes so you can pick one. It will be a similar situation when you

Jishen Zhao 04:45
later on go to work. If you need to design an ISA or design an architecture, you will have specifications and materials on hand, but you should be able to understand the addressing modes and be able to use them. And control transfers: we talked about different types of

Jishen Zhao 05:05
control transfer instructions and how they are typically represented in the MIPS ISA. This is for finding the next instruction address when it is not as straightforward as just the PC plus the length, but actually jumps somewhere else. This is what we learned last time. Today we will finish up ISA design by discussing the last item: the performance rules to follow when you design an ISA to improve performance. Architecture design throughout the whole quarter will be focused on performance. There are two common performance rules. We talked about "make the common case fast", and we will finish that up today. Then we'll also briefly talk about "make the fast case common". You can see these two rules are complementary: make common things fast and make the fast case common. Usually the fast case and the common case come together.

Jishen Zhao 06:08
And then we'll start to discuss the next unit of this class, pipelining, which is part of the CPU core design. All right, let's get started with today's new material. We left off with examples of performance rule number one for ISA design.

Jishen Zhao 06:33
The last couple of examples concern the register versus memory representation in ISA instructions. As usual, I will use a RISC-type ISA versus a CISC-type ISA, x86, as examples. I think you should already know the difference between the two in terms of register and memory access. For RISC-type instructions, like MIPS and SPARC, register accesses are just register accesses. If you have a computation that needs to take an input from memory or write an output to memory, you first need to perform a memory access to get that data into a register and then compute on registers. So computation does not typically apply to

Jishen Zhao 07:34
memory locations. x86-style ISA designs, on the other hand, allow computation to apply to memory locations. You can take a look at this x86 instruction example. You should have already seen it, or a similar

Jishen Zhao 07:51
example, before. The bracket means that you can add 100 to data that is stored in memory using a single instruction. This is the key difference. And you can see that both kinds of ISA exist today. The common case actually varies depending on the applications typically running on a machine. So if you pick one ISA design for your common case, your whole architecture design should try to make this common case fast and be optimized for this common case, including the ISA design.
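To make the contrast concrete, here is a toy Python model of the two styles. It is a sketch under stated assumptions, not a description of either real ISA: the function names, the `tmp` scratch register, and the assumption that the result is written back to the same memory location are all illustrative.

```python
def risc_style_add(mem, regs, base_reg, offset, imm):
    """Load/store style: computation only happens on registers."""
    addr = regs[base_reg] + offset
    regs["tmp"] = mem[addr]            # instruction 1: load memory into a register
    regs["tmp"] = regs["tmp"] + imm    # instruction 2: add, register to register
    mem[addr] = regs["tmp"]            # instruction 3: store the result back
    return 3                           # number of instructions used

def cisc_style_add(mem, regs, base_reg, offset, imm):
    """Memory-operand style: one instruction computes directly on memory."""
    addr = regs[base_reg] + offset
    mem[addr] = mem[addr] + imm        # single instruction: read, add, write
    return 1                           # number of instructions used
```

Both functions leave memory in the same state; the difference is how many instructions the program needs and how much work each instruction does.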

Jishen Zhao 08:37
For x86, one more thing I just want to mention is accumulators. Because x86 can apply computation directly to memory locations, it actually provides an extra tool to help with input and output data access, called the accumulator. Here is the program example. At a high level, accumulators are like intermediate locations where you can store data temporarily. For example, the EAX register could be a register that temporarily stores data. That can make things faster, so this is one of the mechanisms that x86 is designed with to make this common case fast. OK. All right. Then another example of making the common case fast is the address size. OK, I think we have a question in the comments; let me take a look. Talk slower? OK, I will try to talk slower. But if you feel like I'm too fast, feel free to watch the video again after class.

Jishen Zhao 09:59
I'll try to talk as slowly as possible, but we still need to finish the material that we planned for today in order to squeeze everything for this class into this quarter. All right, so the address size: the common case can change over time and across different applications. Well, I'll try to talk slower; I'm going to adjust my pace and think slower as well. All right. So,

Jishen Zhao 10:46
for the common case in terms of address size, in the old days it could be just a few bits, like 4 bits or 8 bits. But you can see that nowadays the programs we write typically run on a 32-bit machine, or most likely even on a 64-bit machine, so we can say that today this is the common case. But one thing I want to emphasize, or point out, is what a lot of people think a 64-bit machine is, that is, what is meant by a 64-bit ISA.

Jishen Zhao 11:25
They see that this is x86-64, a 64-bit machine, and think that means the arithmetic operations are 64 bits. But actually what a 64-bit ISA means is that your virtual address is 64 bits long. When we talk about virtual memory later in this quarter, we will get into more detail about what is meant by a 64-bit virtual address and what the relationship is between virtual addresses and physical addresses.

Jishen Zhao 12:11
This is performance rule number one. Performance rule number two is the reverse: make the fast case common. You can think of it this way: in general, if I already have a design that I can make fast for a particular case, then in order to make my design really efficient, I should use it in the particular area where that case is commonly used. Let's take a look at examples again. The examples are fairly straightforward. For CISC, taking x86 as an example, the design has been optimized for certain operations, and people

Jishen Zhao 13:13
found a use case, a common case, that is able to leverage that design: desktop and server machines. That means x86 machines are commonly used in server or desktop applications. In contrast,

Jishen Zhao 13:34
for RISC-type designs, you can see that the market or use case is different from that of CISC-type designs, because the fast case of a RISC-style design actually fits embedded and mobile device applications better. So in that case you can see that nowadays,

Jishen Zhao 13:54
for most RISC designs, ARM and RISC-V, if you have heard of those architectures, the common use case people have been exploring is the embedded area. OK, so comparing these two examples, you can see how people apply rule number two, make the fast case common, in a realistic way. All right, so this is all about ISA. Again, this is a list of things you want to know about ISA. It is the same kind of list as the one I showed earlier in today's class. Even if you forget most of the rest, at least you should know these now, and hopefully you will still remember them after a couple of years. That would be your success in

Jishen Zhao 14:57
learning ISA, OK, even if you're not going to be an architect. All right, so after performance and ISA, the third unit of this class is pipelining. I think most of you have already heard about, and hopefully still remember, the basic idea of pipelining in computer architecture design or organization. In general, what is being processed in pipelining is instructions, so pipelining in computer architecture,

Jishen Zhao 15:32
in a processor core, means processing instructions that come in from one end, one by one, and then go out from the other end of the pipeline. Again, we will start with a review of pipelining and some material you should have already learned in your undergraduate-level computer architecture class. But before we get into more detailed and more advanced material, we are going to start with the review, and hopefully remind you of what you have learned.

Jishen Zhao 16:10
All right, as an overview, this is what we're going to talk about in this unit. I don't think we will finish everything today, but we'll get started and hopefully get through

Jishen Zhao 16:23
several bullets in this list. We're going to talk about the single-cycle datapath, latency versus throughput, and other performance metrics applied to pipelining, so we will do some calculations again. Then I'll review the basic pipelining concept for you. If we have time today, we will start to talk about dependencies between instructions that get into the pipeline, and then about how to deal with those dependencies to keep the pipeline running smoothly.

Jishen Zhao 17:06
All right, so first of all, the main concept of pipelining. Recall what you learned earlier about pipelining in your undergraduate-level class. The main thing about pipelining is that it provides the illusion that instructions are executed sequentially. Remember, it is an illusion; it is not real. What pipelining does is make instructions actually execute in parallel as much as possible.

Jishen Zhao 17:43
To illustrate pipelining in a program, you can look at how a set of instructions is executed over time. I have three instructions in this example, and you can imagine there is a pipeline. Over time, instructions 1, 2, and 3 get into this pipeline one by one and get out one by one.

Jishen Zhao 18:11
But there is a substantial amount of time during which instructions 1, 2, and 3 overlap with each other. This is the time when they actually stay in the pipeline together and run in parallel. So this is the main concept at a high level.

Jishen Zhao 18:30
But in order to better illustrate the idea of pipelining and some of the tricky cases in pipelining, let's do some calculations on a small daily-life example. The pipeline is not a new invention of computer architecture; you can see pipelining everywhere. This is a simple example of pipelining that everyone should be familiar with. If you're going to do laundry, you need to go through multiple steps. You have a washing machine, then a dryer following the washing machine, and then a folding step. But I'm kind of lazy, so instead of doing the folding myself, I actually use a robot to do the folding for me. So now I have this cute folding robot.

Jishen Zhao 19:30
You see, in this pipeline I have three steps: washing machine, dryer, and folding. Let's do this calculation. I'll give you some time to work on several questions based on this scenario, and then we will go through them together.

Jishen Zhao 19:53
So here are the numbers I have for the parameters of the steps in this pipeline, and I have a few assumptions. First of all, I'm going to give the time each step takes per load. The washing machine takes 30 minutes. The dryer takes 60 minutes, a little bit longer, and my folding robot is super efficient, more efficient than me: it takes 15 minutes to fold one load of laundry. There are three questions for you to calculate. First, how long does one load take in total? Second, how long do two loads of laundry take in total? And third, how long do 100 loads of laundry take in total?

Jishen Zhao 20:51
When you calculate those, there is a tricky case you want to consider in pipelining; you will see it in our calculations. If I'm going too slow or too fast, just let me know by typing a comment in the chat. I'll try to adapt my pace while still fitting the agenda for today; I'll try my best. All right, so let's do this calculation. I'll give you a couple of minutes to calculate the three questions.

Jishen Zhao 22:21
All right, let's go through the questions together. If you still need more time, feel free to do it after class. I am going to do this by illustrating it visually with this graph. One unit of time is one block here, and one unit of time is 15 minutes, because the smallest amount of time among all those steps is 15 minutes. That is our folding robot. So,

Jishen Zhao 22:55
the first question is how long one load takes in total to finish. The first step for my first load is to go through my washing machine. After the washing machine is done with the wash, I put the load into my dryer, which takes 60 minutes; we can count four blocks, so 60 minutes in total. Then, after my dryer has done its work, I put the load into my folding robot to fold it. Adding them together, it's 105 minutes for one load.

Jishen Zhao 23:38
The trick of pipelining starts when we want to finish the second load. In this pipeline there are three steps in total, and I cannot have two loads in the same step at the same time, because one washing machine can only wash one load, and the same goes for the dryer and my folding robot. So in that case,

Jishen Zhao 24:03
my second load can only get into my washing machine after the first load is done with washing. So here is the starting point of my second load: 30 minutes after the first load started, I can use the washing machine.

Jishen Zhao 24:19
But I cannot have the second load get into the dryer before the dryer is done with the first load, so I actually have a bubble here, a gap of time. I can only start the dryer for the second load after the dryer is done with the first load. That is the tricky case I mentioned earlier. Then, once the dryer is done, because the folding robot is already done with the first load, I can use the folding robot to perform

Jishen Zhao 24:53
folding for my second load. So the second load takes longer in total; it has a longer latency to finish than the first load. But for two loads in total, the total time to finish comes from adding the blocks together, and what you get is 165 minutes. I do the calculation by breaking down the total for two loads into the first load plus the extra time the second load takes after the first load finishes. The extra time the second load takes is 60 minutes.

Jishen Zhao 25:39
That will be helpful for calculating 100 loads, because every subsequent load, the third, the fourth, and so on, will always take an extra 60 minutes per load. So to calculate 100 loads, it will be 105 minutes for the first load, and each of the 99 subsequent loads will take 60 minutes after that.

Jishen Zhao 26:11
Adding them together, it's 6045 minutes. OK. This is how I calculate the total amount of time spent on 100 loads. If I apply this to program execution on my computer instead of doing laundry, there should be no difference.
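The laundry arithmetic above can be checked with a small Python simulation. This is a minimal sketch: the function name `total_time` is made up for illustration, and it assumes each step handles one load at a time and that a finished load can wait between steps until the next step is free.

```python
def total_time(num_loads, stage_minutes=(30, 60, 15)):
    """Total minutes to push num_loads through wash, dry, and fold in order."""
    free_at = [0] * len(stage_minutes)   # time at which each step becomes free
    finish = 0
    for _ in range(num_loads):
        ready = 0                        # time this load is ready for its next step
        for step, duration in enumerate(stage_minutes):
            # A load starts a step only when both the load and the step are
            # free; waiting for the step is the "bubble".
            start = max(ready, free_at[step])
            ready = start + duration
            free_at[step] = ready
        finish = ready                   # the last load's folding finishes last
    return finish

print(total_time(1), total_time(2), total_time(100))
```

The result matches the shortcut used in the lecture: 105 minutes for the first load plus 60 minutes, the time of the slowest step, for every additional load.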

Jishen Zhao 26:32
Let's take a look at the case of executing a program on a computer and how exactly the same pipelining idea applies. We're going to introduce the pipelined datapath, the basic idea, and after that we can take a break. All right. Like I said, every time we learn a new component, we're going to

Jishen Zhao 27:02
come back to our basic performance metrics and calculate performance, because the goal of computer design for this whole quarter is to optimize performance as our key objective. Hopefully you still remember the difference between latency and throughput, and you should remember it, because we're going to use it throughout the whole quarter. In our earlier example, the laundry case, I already mentioned latency as the time to finish one load, two loads, and 100 loads in total, so we have been calculating latency already. Throughput is another concept in terms of

Jishen Zhao 27:58
performance evaluation. We've got a question here: how is this going to work with a single CPU? Doesn't a CPU only do one instruction at a time? I will show you shortly. It's a good question; it actually motivates my next slide. OK, I'll show that next. All right, so this is a review of our performance metrics, latency and throughput and their difference; you will use it shortly. All right, so this is, I guess, what the student just asked about:

Jishen Zhao 28:44
instructions should be executed one by one in a CPU. But I think I mentioned the word "illusion" earlier today; this is an illusion in today's, or at least modern, machines. In the old days, or in the simplified case, it is the real case that you execute one instruction and then, when it is finished, the next instruction. People actually have a name for that: single-cycle instruction execution, or single-cycle instructions. What does "single cycle" mean? Because

Jishen Zhao 29:20
computer execution is measured cycle by cycle: the clock actually goes tick, tick, tick. One clock period is one cycle, so a single-cycle machine executes one instruction per clock cycle. In each clock cycle

Jishen Zhao 29:35
I run one instruction, and then in the next clock cycle I can start to run the next instruction. That is single-cycle instruction execution. So the total amount of time to execute

Jishen Zhao 29:51
a number of instructions, say there are two instructions, instruction 0 and instruction 1, on a single-cycle machine is simply the latencies of the instructions added together, or the number of instructions times the number of cycles per instruction times the cycle time. Since each instruction takes one cycle, that is the number of instructions times the cycle time, and that is your latency. But like I said, this is typically not the common case in modern machines. In modern machines, we use a pipeline. So the second row in this graph is a pipelined machine.

Jishen Zhao 30:29
Compared to the single-cycle machine, you can see that to execute the two instructions altogether, the total latency is shorter. Why shorter? For the same reason as in what we just calculated with the laundry example: we actually overlap the two instructions. Although they get into and out of the CPU one by one, there is a time when they coexist simultaneously in the CPU. How do we do that? If we take a closer look, compared to the single-cycle machine, the pipelined machine actually breaks down

Jishen Zhao 31:12
an instruction into multiple cycles. So a pipelined machine is no longer single-cycle instruction execution; one instruction can actually take multiple cycles. In this particular graph, instructions 0 and 1 each take three cycles, and one cycle finishes one step of the instruction.

Jishen Zhao 31:36
So this is a super simple example with three steps. But if you consider the example we were using when we discussed ISA, how many steps did we have? Six steps. In that case, one instruction would take six cycles in total. But let's keep it simple.

Jishen Zhao 31:55
These are just three-cycle instructions. Why break it down? Because computers run on clock ticks; at each clock tick something can happen, and we can do something. That means when the first cycle finishes and the second cycle starts, we can do something: we throw our second instruction into the pipeline. In the next cycle, we can throw in the next instruction. If we have a third instruction, we can throw it into the pipeline right after the fetch stage of instruction 1 finishes. In that case, you can see the graph actually looks like an actual pipeline and how it works: we can

Jishen Zhao 32:48
put instructions into it one by one; each cycle we put in one instruction. That is why you can see that, compared with the single-cycle machine, the pipelined machine runs much faster if you have a lot of instructions. But if you only look at one instruction here, instruction 0 or instruction 1, the latency of each of them is exactly the same as with single-cycle instructions,

Jishen Zhao 33:15
in this simplified case. Again, this is a super simple example. You can connect this case with the difference between latency and throughput. What is the trick of pipelining actually optimizing for? It actually only optimizes throughput, not latency, because the latency of each instruction is exactly the same as with single-cycle instructions. A pipelined machine only optimizes throughput, when you have many instructions to execute. Or you can consider it the other way: a pipelined machine can optimize the latency of a whole program that has a lot of instructions.
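The latency-versus-throughput point can be put into numbers with a short Python sketch. It makes idealized assumptions that go beyond the lecture's picture: every stage takes the same time, the single-cycle machine's clock period spans all stages, and the pipeline never stalls. The function names are hypothetical.

```python
def single_cycle_total(num_instructions, stages, stage_time=1):
    """Total time when each instruction occupies one long clock cycle."""
    cycle_time = stages * stage_time      # one cycle covers every stage
    return num_instructions * cycle_time

def pipelined_total(num_instructions, stages, stage_time=1):
    """Total time when a new instruction enters the pipeline every cycle."""
    if num_instructions == 0:
        return 0
    # The first instruction needs `stages` cycles; each later one finishes
    # one cycle after its predecessor.
    return (stages + num_instructions - 1) * stage_time

def per_instruction_latency(stages, stage_time=1):
    """Latency of one instruction; the same for both machines in this model."""
    return stages * stage_time
```

With three stages, two instructions take 6 time units on the single-cycle machine and 4 on the pipelined one, while each individual instruction still takes 3 units in both. For 100 instructions the totals are 300 versus 102, which is where the throughput gain shows up.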

Jishen Zhao 34:07
Correct. So let's put this main concept together and talk about where pipelining is implemented, along with where we can store data in a machine. This diagram comes up again: pipelining exists inside my CPU, actually inside the core of the CPU. A CPU or processor can have a lot of components. It can have caches,

Jishen Zhao 34:38
which I will talk about later on. We can have memory controllers, which we're going to talk about later on as well. But there is a core, this area of the CPU, and that is where the pipeline exists. All right, let's see. OK, so it's 2:41. Let's take a break for 10 minutes. We're going to come back at 2:51 and continue by introducing the example we're going to use later on when we discuss pipelining: a five-stage pipeline example with a MIPS machine. OK.

Jishen Zhao 35:18
Questions. So there are more questions. This question I can talk about with you privately; it is not about today's class. All right, let's take a break and come back at 2:51.

Jishen Zhao 35:41
All right, so now it's 2:52. Let's go back to our class and continue. I've seen a lot of questions about homework one access permission. I'll talk to the TA, and we will try to fix that permission problem as soon as possible, so you should be able to access it by today. I also saw a lot of questions about pipelining during the break, and those are good questions. Actually, they are all things we're going to discuss later on or right now, but that means you are following and actually thinking at the same time, and that's a good sign for the way you are listening to the class, so keep doing that.

Jishen Zhao 36:35
All right, there is one question, though, about cycles. When I talk about a cycle, I think you're right: it's a clock cycle. So the cycle time is one clock period. When we discuss pipelining, we will most of the time use a cycle as the unit of execution time or latency.

Jishen Zhao 37:01
So I think I answered one of your questions from the break right in this slide about cycle time. In this class, when we discuss pipelining, we will pretty much use the five-stage MIPS pipeline as the example, and I think this is in line with your undergraduate-level computer architecture class, at least 141, which you would take here. So in MIPS there are five stages for each instruction to execute, and since the instruction goes through the pipeline, that means the pipeline also has five stages.

Jishen Zhao 37:53
From now on, I'm going to use this diagram to describe a pipeline. This is the diagram of the pipeline components and how the wires are connected to form a pipeline. Don't be scared by so many wires and components. I'm not going to ask you to draw this diagram, and most of the time you don't need to recall every detail in it. But there are certain components and wires that you do need to know and be able to modify or manipulate. I'm going to introduce a couple of components today, and then later this week, as we talk about pipelining, I'm going to introduce more.

Jishen Zhao 38:41
OK, so the first set of components I'm going to introduce in this diagram is called pipeline registers. Those red rectangles all represent registers, and they are special registers, not like the common registers

Jishen Zhao 38:58
we saw when we talked about ISA. They are registers only for storing the temporary results of each pipeline stage. For example, this pipeline register I'm pointing at here sits between two pipeline stages, this stage and this stage. I'm going to talk about what those stages are shortly. I'm going to name this pipeline register after those two stages. OK, the same goes for all of those pipeline registers.

Jishen Zhao 39:39
The only exception is this register here, the PC register. This is where the first pipeline stage starts. You should already be able to imagine that the first stage is the fetch stage, fetching the instruction, so this PC register is the starting point of my fetch stage. OK.

Jishen Zhao 40:01
All right. Although this is the pipelined machine, the pipeline diagram, imagine that we don't have a pipeline implemented in this diagram and we still have a single-cycle machine, where one instruction

Jishen Zhao 40:20
takes one cycle. What is the cycle time? That is, what is the clock period of this machine that I should design, if I want to design a single-cycle machine, like an old machine? That means one instruction takes one cycle. So how long exactly does one instruction take to execute? My instruction is going to get in from one end and get out from the other end, and that makes my instruction finish. So this is the total latency, marked here as T single cycle,

Jishen Zhao 40:54
the total latency to execute one instruction. So think about it in reverse: if I ask you to design the machine, what you are supposed to decide is what the clock cycle is, or what the clock frequency is, or what the clock period is; these are the same question. What value should I set for my clock period, or clock cycle, if I want to design a single-cycle machine and I know how long my instruction typically takes to execute in full?

Jishen Zhao 41:28
My answer should be that my cycle time, or my clock period, is exactly this amount of time. So I'm going to design the machine with a clock frequency that gives a clock cycle this long.

Jishen Zhao 41:42
All right, so this is for the single-cycle machine. If I'm the one to design or determine the clock period, I should determine it like this: design for the common case. This is the common case: I have a single cycle for executing instructions, and this is how long my instruction takes in the common case. If we know this, then we can ask whether the cycle

Jishen Zhao 42:12
time will be the same for the pipelined machine, and what the cycle time should be and how to decide it. We're going to look at the same diagram, but now imagine I have pipelining implemented. I'm going to include the pipeline, and again I'm the one asked to determine what the cycle time is, or what the clock frequency is. To answer the question of what the cycle time is for my pipelined machine, I need to figure out how long each stage takes, not only how long it takes in total for the instruction to execute, that is, to complete all of the work for the instruction. We already know that in the instruction execution model, an instruction needs to go through

Jishen Zhao 43:02
fetch, decode, input and output, and execute. In this diagram, we can now talk about where those five stages are and which components of the pipelined machine are in use for each stage. This stage is the fetch stage; you can see it has the instruction memory.

Jishen Zhao 43:26
We fetch an instruction from the instruction memory, as the instruction is just data stored in my memory. So this is my fetch stage, with all the components we need in the fetch stage. Don't worry about

Jishen Zhao 43:41
exactly what the components are at this moment. This is just the fetch stage, and there are two pipeline registers, one before and one after the fetch stage. I'm going to fetch this instruction, finish my stage one, and then store my instruction temporarily in a pipeline register. For this pipeline register, I draw a divider here splitting it in half. Half of it accommodates the input from my fetch stage, and the other half, the output port, accommodates the output from this temporary register that feeds into my second stage, which is the decode stage.

Jishen Zhao 44:32
In the decode stage, I need to understand what the instruction does. That means I need to figure out which registers will be used. For example, if this instruction accesses registers, I check whether the register being used is ready in my pool of registers. This register file is a pool of registers; you can imagine it as an array of registers, and there are a lot of them in this pool.

Jishen Zhao 44:56
So this is my decode stage. When my decode stage is done, I'm going to store the temporary result. After the decode stage, I already know which register it is and the value stored in that register, and this goes again into a temporary location, the pipeline register. It is for the decode stage and is also shared by this stage, the third stage. After decode,

Jishen Zhao 45:24
I can execute. For MIPS, it’s fetch and decode and then execute, so there’s no separate input and output stage. This is the execution stage, my third stage. Remember, the MIPS machine only has a five-stage pipeline, so this third stage goes to execution directly. The execution stage is going to execute the instruction: if it’s an add, it adds; if it’s a subtract, it subtracts; and so forth. After my execution is done, I’m going to do the memory stage, where I deal with memory. If I want to write a value out to memory, I’m going to access memory during this stage. After the memory stage is done, I’m going to do, again, a register file access. It’s the writeback stage. Say, for an add instruction, I’m going to add the values from two registers and write the result out to a register, so that means we still need to output our results to a register. You can see a little bit of detail in this diagram: the output is here. In this stage the key component is, again, the register file. Half of it is here and half of it is here, so this writeback stage writes our results back to the register file. I have also marked the latency to complete each stage.

Jishen Zhao 46:59
The task that needs to be done in each stage is here, beside the stage name, so that is the key component of the stage. So this is the time for the fetch stage to complete its job. This is the time, a little bit longer, to complete the decoding job, and this is the time to complete execution. This is the time to complete the memory access. It could be even longer, but I don’t have space, so I squeezed it a little bit. And this is the time to complete the writeback stage.

Jishen Zhao 47:31
So, in this five-stage pipeline, this is the latency of each stage. We need that much latency to finish the job that needs to be completed in each stage. We talked a lot to explain this pipeline diagram. Now we can come back to our original question: how do we determine what the cycle time of this pipelined machine should be? We have already seen our previous example comparing the pipelined machine with the single-cycle machine in that simplified three-stage block graph. In that graph, the first thing we can see is that the cycle time for each stage should be the same. Each stage should take exactly one cycle, and the cycle time should be consistent, because my clock always ticks at a given pace. Second, the cycle time of my pipelined machine should be shorter than that of my single-cycle machine, because otherwise the pipelined machine will not have any

Jishen Zhao 48:41
Benefits. So, assuming this pipelined machine has five stages, each stage should take exactly one cycle, and the cycle time should be the same for all of the stages. That means what we should pick as our cycle time is the maximum of the latencies of all the stages. Think of it this way: suppose my cycle time were shorter than that maximum. From this graph, maybe the ALU takes a little bit longer than the register file. So let’s say this is the longest time to finish

Jishen Zhao 49:28
Out of all of the stages: the execution stage takes the maximum latency compared to the other stages. If my cycle time is shorter than this one, then my execution stage cannot fit into one cycle. That’s not allowed, because we need to fit every stage in exactly one cycle. This is the rule for pipelined machine design.

Jishen Zhao 49:57
So in this case, when we determine the clock cycle or clock frequency, we take the longest latency out of all my pipeline stages and use that as my clock period, or cycle time. In that case the execution stage will finish its job exactly within the cycle time. But other stages, say the fetch stage, may take a shorter time. It’s done, but the clock has not ticked yet, so it’s not the next clock cycle yet, and it just waits until

Jishen Zhao 50:42
The next cycle, and then it enters the next stage. It’s the same for the other stages that only need a shorter latency. So it’s fine that some of the stages will not need the whole cycle time. But we need to fit our maximum latency into the cycle time; then we’re good.
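The rule just described (cycle time equals the longest stage latency, and shorter stages simply wait for the clock) can be sketched in a few lines of Python. The latency values and the names `stage_latency_ns`, `cycle_time`, and `idle_time_per_stage` are made up for illustration and do not come from the lecture slides.

```python
# Hypothetical per-stage latencies in nanoseconds (illustrative values only,
# not taken from the lecture diagram).
stage_latency_ns = {
    "fetch": 2.0,
    "decode": 1.5,
    "execute": 3.0,
    "memory": 2.5,
    "writeback": 1.0,
}


def cycle_time(latencies):
    """Clock period of the pipeline: the latency of the slowest stage."""
    return max(latencies.values())


def idle_time_per_stage(latencies):
    """Time each stage sits finished while waiting for the next clock tick."""
    period = cycle_time(latencies)
    return {stage: period - t for stage, t in latencies.items()}


period_ns = cycle_time(stage_latency_ns)
frequency_ghz = 1.0 / period_ns  # 1 / nanoseconds gives GHz
slack = idle_time_per_stage(stage_latency_ns)

print("cycle time (ns):", period_ns)
print("clock frequency (GHz):", round(frequency_ghz, 3))
print("idle time per stage (ns):", slack)
```

With these assumed numbers the execute stage sets the clock period, and every other stage finishes early and waits.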

Jishen Zhao 51:11
OK, there’s a question: what if the time each stage takes varies a lot? For example, floating-point execution could be very long, while regular arithmetic could be very fast. You guys are super good; you actually asked the question way in advance. For those kinds of instructions, say floating point, one stage is too long, and then the whole cycle will be too long. We use some other tricks in pipeline design, which I’ll talk about maybe on Thursday. That’s going to be a multi-cycle pipeline design. OK, but today we’ll just keep it simple. We will not consider those instructions that have a larger variation in the latency to finish each stage. OK. So, with that, since you have already determined the cycle time for the pipelined machine,

Jishen Zhao 52:20
I will introduce one very important concept called base CPI. Now we can connect what we have learned about pipelining back to our original knowledge about CPU performance evaluation. CPI, if you remember, is a very important metric to evaluate

Jishen Zhao 52:41
Latency of instruction execution on a CPU, to measure the performance of the CPU. But for pipeline design, we define base CPI as one. OK, so what do we mean by “base” here? Basically, the definition of base CPI is the number of cycles I need per instruction, on average, for instructions to enter and leave the pipeline, if we have an infinite number of instructions in a program. So basically you can think of it as the ideal case.

Jishen Zhao 53:27
In the ideal case, we can put instructions one by one perfectly into the pipeline and there are no bubbles. If you still remember our laundry example, we had bubbles when we tried to run our second laundry load. That is not our ideal case. In our ideal case there are no bubbles: each instruction is densely packed with the next instruction, and we also have an infinite number of instructions, which makes a big difference. In that ideal case you can imagine the CPI for running the program on the ideal pipeline is going to be one. That means every cycle we can complete one instruction on average.
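To see why an infinite number of instructions matters, here is a minimal sketch that counts cycles in an ideal, bubble-free pipeline. It assumes the usual counting: the first instruction needs one cycle per stage to pass through, and every later instruction completes one cycle after the one before it. The function names `ideal_cycles` and `ideal_cpi` are illustrative, not from the lecture.

```python
def ideal_cycles(num_instructions, num_stages=5):
    """Total cycles for a bubble-free pipeline.

    The first instruction takes num_stages cycles to flow through; each
    following instruction finishes exactly one cycle later.
    """
    return num_stages + (num_instructions - 1)


def ideal_cpi(num_instructions, num_stages=5):
    """Average cycles per instruction in the ideal case."""
    return ideal_cycles(num_instructions, num_stages) / num_instructions


# As the program gets longer, the average CPI approaches the base CPI of 1.
for n in (1, 10, 1000, 1_000_000):
    print(n, ideal_cpi(n))
```

For a single instruction the average is the full five cycles, but as the instruction count grows the fill time is spread over more and more instructions and the average approaches one.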

Jishen Zhao 54:15
OK, but think about the actual case, which is the same as the situation in the laundry example. Some stages need to wait. Maybe not because our stages have different latencies, because now every stage takes one cycle, but because of some dependencies that we’re going to talk about later.

Jishen Zhao 54:39
Say two instructions have a dependency. If you write programs a lot, you may have already noticed that dependences exist all the time in your programs. Then one instruction coming into the pipeline later on needs to wait one cycle, or multiple cycles, for an earlier instruction. In that case, we need to stall the pipeline for one or several cycles. That’s not the ideal case. In that case, the actual CPI will be larger than one, because the total number of cycles on average to run each instruction will be more than one cycle. That’s not the ideal case. OK, but we’re going to use this base CPI equal to one as kind of the baseline later on

Jishen Zhao 55:26
When we do the calculation of CPI in the pipelined case. So just make sure you understand base CPI, what it is like, and that the base CPI is one in the common case. That will be useful later on. OK. Right.
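As a rough illustration of how stalls push the actual CPI above the base CPI, the sketch below adds a count of stall cycles on top of the baseline. It assumes a long program so that pipeline fill time can be ignored; the function name `actual_cpi` and the stall counts are hypothetical, and the lecture’s own CPI calculation comes in a later class.

```python
def actual_cpi(num_instructions, stall_cycles, base_cpi=1.0):
    """Average cycles per instruction with stalls added to the base CPI.

    Assumes a long-running program, so the few cycles needed to fill the
    pipeline at the start are ignored.
    """
    total_cycles = base_cpi * num_instructions + stall_cycles
    return total_cycles / num_instructions


# No stalls: the program runs at the base CPI.
print(actual_cpi(1000, stall_cycles=0))

# Hypothetical program where dependencies cost 250 stall cycles in total.
print(actual_cpi(1000, stall_cycles=250))
```

The second call gives a CPI above one, which is the non-ideal situation described above.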

Jishen Zhao 55:47
And then I will introduce another concept. Pipeline overhead here means that a pipeline stage that does not need the full cycle time has to wait for the clock to tick at the end of the cycle time. So pipeline overhead means that, compared to T_ALU here, which is the maximum time, the fetch stage could actually be shorter, but because we need the cycle time to be uniform, the same as T_ALU, we actually make the total latency of instruction execution longer. OK, so this is pipeline overhead.
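One way to put a number on this overhead is to compare the latency of a single instruction with and without the uniform clock. The sketch below assumes that the unpipelined latency is simply the sum of the raw stage latencies, and that a pipelined instruction spends one full cycle in each stage. The latency values and function names are hypothetical.

```python
# Hypothetical stage latencies in nanoseconds (illustrative values only).
stage_latency_ns = {"F": 2.0, "D": 1.5, "X": 3.0, "M": 2.5, "W": 1.0}


def unpipelined_latency(latencies):
    """Latency of one instruction if each stage took only its own time."""
    return sum(latencies.values())


def pipelined_latency(latencies):
    """Latency of one instruction when every stage is stretched to the
    cycle time, i.e. the latency of the slowest stage."""
    return len(latencies) * max(latencies.values())


def pipeline_overhead(latencies):
    """Extra latency per instruction caused by the uniform cycle time."""
    return pipelined_latency(latencies) - unpipelined_latency(latencies)


print("unpipelined latency (ns):", unpipelined_latency(stage_latency_ns))
print("pipelined latency (ns):", pipelined_latency(stage_latency_ns))
print("overhead (ns):", pipeline_overhead(stage_latency_ns))
```

Each individual instruction takes longer end to end in the pipelined machine, even though the machine as a whole completes instructions faster.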

Jishen Zhao 56:38
Later, we’ll talk about how, in the pipelined machine, instructions need to wait for each other because of dependencies. That is not pipeline overhead. What time is it? So we have five minutes left. I’ll introduce a convention for how we represent the pipeline, the five-stage pipeline in this machine, and then I think we’re good to go for today. We have a question here: is the cycle time equal to that of the stage which takes the longest time? Yes, exactly.

Jishen Zhao 57:18
OK, so for the rest of the time when we talk about pipelining, we’re going to use the MIPS machine as the example. First, the MIPS machine, like I talked about in this diagram, has five stages: fetch, decode, execution is here, memory, where you deal with memory, and then writeback, partially here and then wrapping around.

Jishen Zhao 57:43
And here. OK, and because spelling all of them out is kind of complex, most of the time we’ll just call them the F stage, D stage, X stage, M stage, and W stage.
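Using these one-letter stage names, an ideal pipeline can be drawn as a diagram with one row per instruction and one column per cycle. The following is a small sketch that builds such a diagram, assuming no stalls; the instruction names `add`, `sub`, and `lw` and the helper names are only placeholders.

```python
STAGES = ["F", "D", "X", "M", "W"]


def pipeline_diagram(instructions):
    """Build an ideal (no-stall) pipeline diagram.

    Returns a list of (instruction, cells) pairs, where cells[c] is the
    stage the instruction occupies in cycle c, or "" if it is not in the
    pipeline during that cycle.
    """
    total_cycles = len(instructions) + len(STAGES) - 1
    rows = []
    for i, name in enumerate(instructions):
        cells = [""] * total_cycles
        # Instruction i enters fetch in cycle i and moves one stage per cycle.
        for s, stage in enumerate(STAGES):
            cells[i + s] = stage
        rows.append((name, cells))
    return rows


def render(rows):
    """Format the diagram as text, using '.' for empty cycles."""
    lines = []
    for name, cells in rows:
        lines.append(f"{name:<6}" + " ".join(c or "." for c in cells))
    return "\n".join(lines)


rows = pipeline_diagram(["add", "sub", "lw"])
print(render(rows))
```

Each row is shifted one cycle to the right of the row above it, which is the densely packed, bubble-free case.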

Jishen Zhao 57:56
So F, D, X, M, W: an instruction goes directly through the five stages, F, D, X, M, W, of the five-stage MIPS pipeline. But like I mentioned, other types of pipelined machines will have various numbers of stages. For example, Intel machines, x86 machines, will have very deep pipelines. Deep means you have more stages: the deeper the pipeline, the more stages you have. They can have as many as 22 stages, so that’s a very deep pipeline, and that makes it hard to reason about the pipelined machine.

Jishen Zhao 58:32
Reasoning about how instructions go through the pipeline gets super complex. But for this class, let’s just keep things simple. I don’t want to make it too complex, so we’ll just use the five-stage pipelined machine as the example all the way to the final class. OK. Another thing, which is not required: if you are going to read

Jishen Zhao 58:54
Other articles about pipelining, especially articles touching on low-level hardware design, they will call these red rectangles latches, because our pipeline registers are many times implemented in hardware as latches. So you don’t need to worry about what latches are. To keep things simple enough, you can just consider the latches in the pipeline to be pipeline registers.

Jishen Zhao 59:28
But for this class, I’m going to use a convention, just for this class, to name each pipeline register, so that things will be easier for us to understand and to communicate with each other in the homework and in the exams. So our convention

Jishen Zhao 59:47
Here is that I’m going to name each pipeline register after the stage that it starts. OK, except for the PC. Like I mentioned, the PC is a special register, you know, so I’m going to name the one at the start of the fetch stage PC, and then

Jishen Zhao 01:00:08
From here, starting from here, I’m going to name each pipeline register after the stage that it starts. So I’m going to name this one the D pipeline register. This is the X pipeline register.

Jishen Zhao 01:00:22
This is the M pipeline register and this is the W pipeline register, so the next time I say “W pipeline register,” you know it’s actually this one. OK. I think this is pretty much what we’re going to cover today. I have a question in the chat. Let’s see: are those latches or flip-flops? Those are latches. But what’s the difference? We don’t require that for this class. You can either read articles or talk to our TAs if you’re interested, but I will not require that for this class. Alright, awesome. Or VSI. Right, so I think this is pretty much what I’m going to cover for today.
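The naming convention for pipeline registers described above can be written down as a small lookup. This is only a sketch of the convention as stated in the lecture (each register is named after the stage it starts, with PC in front of fetch); the function names `input_register` and `output_register` are hypothetical.

```python
STAGES = ["F", "D", "X", "M", "W"]


def input_register(stage):
    """Name of the pipeline register that feeds the given stage.

    Convention: a pipeline register is named after the stage it starts,
    except the register in front of the fetch stage, which is the PC.
    """
    if stage not in STAGES:
        raise ValueError(f"unknown stage: {stage!r}")
    return "PC" if stage == "F" else stage


def output_register(stage):
    """Name of the pipeline register the given stage writes into.

    This is the input register of the next stage. The W stage writes to
    the register file rather than to a pipeline register, so it has none.
    """
    index = STAGES.index(stage)
    if index == len(STAGES) - 1:
        return None
    return input_register(STAGES[index + 1])


for stage in STAGES:
    print(stage, "reads", input_register(stage), "writes", output_register(stage))
```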

Jishen Zhao 01:01:15
And next time we’ll get into more detail and more complex cases: reasoning about what will happen if we feed a program, a sequence of instructions, to the pipeline, what kinds of problems can happen, and how to solve those.

Jishen Zhao 01:01:33
OK, so this is it for today’s class. Don’t forget to check the homework by today, after we fix the permission issue, and then also check the press zero solutions. Feel free to ask questions. Some parents are enjoying ourselves, so I’m going to take a break myself for 10 minutes and I’m going to start my office hour 10 minutes late.

Jishen Zhao 01:01:58
Request to end this conference room. But for now, just to clean up the Zoom conference room, I’m going to close this Zoom conference room for everyone, and then I’m going to restart after a few minutes. OK, alright, see you on Zoom on Thursday.