Lecture Recording

Text Transcription

October 13, 2021, 8:37 PM | 1 hour 12 minutes 43 seconds



Jishen Zhao 00:00
Today we will finish up the discussion of performance and start to talk about the next section, which is instruction set architecture, or ISA. First of all, let's review what we learned during the last class. During that class we talked about, first, performance metrics. You should be very clear about the difference between latency and throughput, and about which metrics are latency and which are throughput. The rule of thumb is that latency has a unit of time, while throughput has the reverse, a unit of one over time. Based on latency and throughput, we talked about speedup, so there is latency speedup and throughput speedup. You should be able to calculate speedup given certain information.

Jishen Zhao 00:50
We went through this in the examples. On top of the performance metrics, we also discussed how to calculate the average of the performance metrics. We have used three types of average calculation: arithmetic mean, harmonic mean, and geometric mean. If you want to test yourself, think about which performance metrics should use the arithmetic mean, which ones should use the harmonic mean and what the formula of the harmonic mean is, and which ones should use the geometric mean to calculate the average.

Jishen Zhao 01:30
And based on the basic performance metrics we have learned, we also discussed how to measure the performance of CPUs specifically. There are two basic metrics we usually use to evaluate the performance of CPUs, which are CPI and IPC. So think about what CPI is and what IPC is. CPI is cycles per instruction, while IPC is the reverse, instructions per cycle. If you want to review what we have learned about CPI and IPC, you can use this example. This is a very typical example, and you may have similar questions in your homework and also exams, so if you can calculate this example by yourself, you should be good with your homework and exams most of the time.

Jishen Zhao 02:27
Today I will continue to discuss the rest of the material on performance. We will specifically talk about the other CPU performance metrics people usually use, which are built on the latency and throughput we have learned, and we're also going to talk about two important performance laws.

Jishen Zhao 02:47
And finally we'll be discussing benchmarks, and which benchmark suites people usually use to measure the performance of computers. In the rest of the time we'll start to talk about ISA.

Jishen Zhao 03:04
So first of all, in addition to the CPI and IPC we have discussed, sometimes we use other CPU performance metrics. There are two of them we will be discussing during this class: MIPS and FLOPS. MIPS means millions of instructions per second. It's actually similar to IPC, so it is a form of throughput. It's just a version that uses different units to measure IPC: instead of instructions per cycle, we measure

Jishen Zhao 03:40
Millions of instructions per second. The other performance metric we also use is FLOPS. FLOPS means floating-point operations per second. This is also a measure of throughput. Similar to IPC, it measures how many instructions you can execute on the computer during a certain amount of time, but in this case we only care about floating-point instructions. As an alternative, people also use GFLOPS, which is giga floating-point operations per second. Toward the end of this quarter we're going to discuss GPU architecture, and we are going to use FLOPS and GFLOPS a lot to measure the performance of GPUs. So we will not discuss FLOPS too much here during this class; we will leave it mostly to the end of this quarter.

Jishen Zhao 04:37
But I'll give you a couple of examples of MIPS, millions of instructions per second. The first example is here. Similar to the example we had during the last class to calculate CPI, here we again have a program with different types of instructions (ALU, branch, loads, and stores), each with a different percentage in the program. Now we're going to have two computers run this program. Because of the performance difference, each computer may use a different number of cycles to execute each type of instruction, and those are the numbers here. For example, an ALU instruction running on computer 1 takes only one cycle, and on computer 2 an ALU instruction needs two cycles to finish, and so on and so forth. So the question in this example is: what is the average MIPS for computer 1 and computer 2, respectively? I have provided an assumption about the clock frequency of the two computers, which is 1 GHz. That means one cycle is one nanosecond.

Jishen Zhao 05:56
Because MIPS is millions of instructions per second, you need this information, cycles per second, to calculate MIPS. So take a couple of minutes to calculate this example.

Jishen Zhao 06:27
Feel free to take more time and stop this video to calculate the MIPS yourself, and when you're done, let's go over how we calculate it together. For computer 1, or machine 1, the CPI is easy to calculate because we already learned how to calculate it in the last class's example. To calculate the CPI of machine 1: 50% at one cycle, 15% at two cycles, 20% at two cycles, and 15% of instructions at one cycle. Add them together, and for computer 1 the CPI of running this program is 1.35.

Jishen Zhao 07:13
Now, how do we calculate MIPS for computer 1? We already have the CPI, 1.35, so we're going to do this conversion for 1 million instructions. One cycle is one nanosecond, so 1 million instructions will take 1.35 multiplied by 10 to the 6 nanoseconds. That means it takes 1.35 milliseconds, or 1.35 multiplied by 10 to the -3 seconds, to run. Then MIPS is that 1 million instructions divided by the time, which is one over 1.35 multiplied by 10 to the -3, so it's 741 MIPS. This is the performance of computer 1. So for computer 2, first,

Jishen Zhao 08:05
We're going to calculate the CPI the same way as we did last class. The CPI of machine 2 is 1.5. Its MIPS is one over 1.5 multiplied by 10 to the -3, and the MIPS of machine 2 is therefore 667. So hopefully you know how to calculate MIPS. Perhaps we're going to have a homework question on calculating MIPS, and you will know how to do it. All right, I'm going to give another example to help you better understand the calculation of MIPS. The same information is provided for the program and the two computers, but this time we are not calculating the average MIPS.
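The average-MIPS calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are made up for this sketch, the pairing of percentages with ALU, branch, load, and store follows the order in which they are read out in the lecture, and for machine 2 only the stated CPI of 1.5 is used because its per-instruction cycle counts are not spelled out in the transcript.

```python
def average_cpi(mix):
    """Weighted average of cycles per instruction.

    mix is a list of (fraction_of_instructions, cycles) pairs.
    """
    return sum(fraction * cycles for fraction, cycles in mix)


def mips(cpi, clock_hz):
    """Millions of instructions per second for a given CPI and clock rate."""
    instructions_per_second = clock_hz / cpi
    return instructions_per_second / 1e6


CLOCK_HZ = 1e9  # 1 GHz, so one cycle takes one nanosecond

# Machine 1, in the order read out in the lecture: ALU, branch, load, store.
machine1_mix = [(0.50, 1), (0.15, 2), (0.20, 2), (0.15, 1)]

cpi1 = average_cpi(machine1_mix)
mips1 = mips(cpi1, CLOCK_HZ)

# Machine 2: only the resulting CPI (1.5) is given in the transcript.
mips2 = mips(1.5, CLOCK_HZ)

print(f"machine 1: CPI = {cpi1:.2f}, MIPS = {mips1:.0f}")
print(f"machine 2: CPI = 1.50, MIPS = {mips2:.0f}")
```

With a 1 GHz clock the conversion reduces to 1000 divided by the CPI, which matches the lecture's 741 and 667.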

Jishen Zhao 08:55
Now we're going to calculate peak MIPS for the two computers. So what is peak? Peak means the maximum, so it means the maximum MIPS that we can achieve for computer 1 and computer 2, respectively. Give yourself a couple of minutes to calculate the peak MIPS for the two machines in this example.

Jishen Zhao 09:28
Again, feel free to take more time if you need it by stopping this video. When you're done, we're going to go over it together. Again, for both of the machines we need to figure out the maximum MIPS we can achieve. Maximum MIPS is achieved when we have the lowest number of cycles per instruction. Of all the cycle counts in this example, the lowest is one cycle. So for machine 1,

Jishen Zhao 10:05
Peak MIPS, or the lowest cycle count, is achieved when this machine is executing an ALU instruction or a store instruction. At the moment the machine is executing an ALU instruction or a store instruction, it achieves the minimum cycle time, sorry, the minimum number of cycles, and it achieves the peak MIPS. As for machine 2, the peak MIPS is achieved when machine 2 is executing an instruction that only needs one cycle to finish. So for both machines, the minimum number of cycles is one, so MIPS is one over

Jishen Zhao 10:55
10 to the minus 3. So the peak MIPS is 1000. All right, so this is what we have learned so far about performance metrics. In summary, we have learned two basic performance metrics, latency and throughput, and on top of them we calculate speedup: there is latency speedup and throughput speedup. And on top of the three metrics, latency, throughput, and speedup,

Jishen Zhao 11:29
We calculate averages using the arithmetic mean, harmonic mean, and geometric mean. We use the arithmetic mean to calculate average latency. The harmonic mean is used to calculate average throughput. The geometric mean is used to calculate the average of speedups. On top of the basic performance metrics, we also learned how to calculate CPU performance specifically. For CPU performance, we use CPI, IPC, MIPS, and FLOPS most of the time. They are all ways of measuring latency or throughput, respectively. So you should know very clearly which one is latency and which one is throughput. In particular, CPI is latency, and IPC, MIPS, and FLOPS are all throughput.
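As a quick self-test for the three averages, here is a minimal Python sketch that writes out each formula directly. The sample numbers are hypothetical and chosen only so that the results are easy to check by hand; they do not come from the lecture.

```python
import math


def arithmetic_mean(values):
    """Sum divided by count; the lecture uses it for latencies (units of time)."""
    return sum(values) / len(values)


def harmonic_mean(values):
    """Count divided by the sum of reciprocals; used for throughputs (rates)."""
    return len(values) / sum(1.0 / v for v in values)


def geometric_mean(values):
    """n-th root of the product; used for unitless ratios such as speedups."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical sample data, for illustration only.
latencies_ns = [2.0, 4.0, 6.0]   # average latency
throughputs = [100.0, 400.0]     # average throughput, e.g. in MIPS
speedups = [2.0, 8.0]            # average speedup

print(arithmetic_mean(latencies_ns))          # 4.0
print(harmonic_mean(throughputs))             # close to 160
print(round(geometric_mean(speedups), 6))     # 4.0
```

Python's standard `statistics` module also provides `harmonic_mean` and `geometric_mean`; the explicit versions are used here only to make the formulas visible.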

Jishen Zhao 12:28
All right, next we're going to talk about two important performance laws. The first performance law is Amdahl's law. You perhaps already learned about it during your undergraduate-level computer architecture class. Here we're mostly going to review what we learned in that class. So, Amdahl's law: can you recall what it is? This is the equation you perhaps already learned. Ask yourself what P is, what S is, and what the physical meaning of this equation is.

Jishen Zhao 13:09
So here it is. Amdahl's law measures how much an optimization in your computer design will improve performance. Here P is the proportion of running time affected by the optimization, and S is the speedup of the part we optimize. I'm going to give you two examples to calculate, to remind yourself of what we learned before. The first question is: what if I speed up 25% of a program's execution by 2 times? What speedup am I going to get? The second question is: what if I speed up 25% of the program's execution by infinity? How much speedup can I get in this case? Give yourself a couple of minutes to calculate the two examples. You can stop the video whenever you need. Once you're finished, I will give you the answers. For the first question the answer is 1.14, and the second answer is a 1.33 speedup. To calculate, you simply apply the equation. For the first question, P equals 25% and S equals 2. So it's one over (1 - 25% + 25% divided by 2), and your answer will be 1.14. For the second question, simply replace S, going from 2 to an infinite number, so the second term in the denominator is zero. So it's one over (1 - 25%), and you get 1.33. Amdahl's law is not only used to calculate how much speedup you can get from an optimization; you can also use it to calculate how much improvement you can get by writing programs with parallelism.
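The two answers above can be checked with a short Python sketch of the equation as it is read out in the lecture, speedup = 1 / ((1 - P) + P / S). The function name `amdahl_speedup` is just a label chosen for this illustration.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the running time is sped up s times.

    p is between 0 and 1; s may be float('inf'), in which case p / s is 0.0.
    """
    return 1.0 / ((1.0 - p) + p / s)


# Question 1: speed up 25% of the execution by 2x.
q1 = amdahl_speedup(0.25, 2)

# Question 2: speed up 25% of the execution infinitely.
q2 = amdahl_speedup(0.25, float("inf"))

print(f"{q1:.2f}")  # 1.14
print(f"{q2:.2f}")  # 1.33
```

Even an infinite speedup of the optimized part leaves the other 75% of the running time untouched, which is why the overall gain is capped at 1.33.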

Jishen Zhao 15:45
Amdahl's law can be used not only to calculate how much improvement you can get from computer design, but also to calculate how much improvement you can get by writing a program with a certain amount of parallelism.

Jishen Zhao 16:00
In terms of parallelism, I give the example of a multithreaded program, where the number of threads you have increases the parallelism. In this equation, instead of S, we use N, and N equals the number of threads in your program. I'm going to give you again two examples of calculating the speedup of your program using Amdahl's law. The first question is: what is the maximum speedup you can get for a program that is 10% serial? That means that for 10% of the running time of the program, you cannot divide the work into multiple threads; you need to execute it serially. An example could be some initialization of the program, or calculating a summary at the end of your program.

Jishen Zhao 16:58
The second question will be: what if I only have 1% serial? I increase the parallelism of my program and reduce the serial portion of the program to be only 1%. By maximum, what I mean is that you get an infinite number of threads, so N here is close to infinity. Then your job will be using this equation to calculate the speedup of the program. Again, I will give you a couple of minutes to calculate by stopping the video, and once you're done, continue and I will give you the answer. The answer for question one is 10x, and for question two it is 100x. It's easy to do: you simply replace N with infinity, and then you set P to the percentage based on the information given in the question.
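To see how the speedup approaches those limits as threads are added, here is a small sketch of the thread-count form of the law. It assumes the parallel part scales perfectly with the number of threads, which is the idealization the equation makes; the function name and the list of thread counts are choices made for this illustration.

```python
def parallel_speedup(serial_fraction, n_threads):
    """Amdahl's law with S replaced by N, the number of threads.

    The parallel fraction is 1 - serial_fraction and is assumed to
    scale perfectly with the thread count.
    """
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_threads)


for serial in (0.10, 0.01):
    limit = parallel_speedup(serial, float("inf"))
    print(f"serial = {serial:.0%}, maximum speedup = {limit:.0f}x")
    for n in (2, 8, 64, 1024):
        print(f"  {n:5d} threads -> {parallel_speedup(serial, n):6.2f}x")
```

The limit is simply one over the serial fraction, so cutting the serial part from 10% to 1% raises the ceiling from 10x to 100x.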

Jishen Zhao 18:01
I'm going to give a more concrete example of computer design. In this case again, similar information is provided as in the earlier examples. We have a computer running a program, and we have the breakdown of each type of instruction. In this question, memory instructions take 40% of the instructions, adds 50%, and multiplies 10%, and the cycle counts of each type of instruction are given as well. Suppose your boss asks you to improve the performance of this computer with this program running, and you have the power or capability to make one category of instructions twice as fast. That means you can reduce the cycles of one type of instruction by half. Which type of instruction would you pick? Again, give yourself a couple of minutes to calculate.

Jishen Zhao 19:19
Once you're done, we're going to go over it together. My way of doing this question will be calculating CPI instead of using the Amdahl's law equation. For the rest of the time during this quarter,

Jishen Zhao 19:36
I believe most of the time you will not use the Amdahl's law equation; CPI and IPC are what you will be using most of the time to calculate whatever questions you have about performance.

Jishen Zhao 19:49
The first step is to calculate the base CPI without any optimizations. For the base CPI, you already know how to calculate it: 40% at four cycles, 50% at two cycles, 10% at 16 cycles. Or you can do it the other way, assuming there are 100 instructions in total in the program:

Jishen Zhao 20:10
40 memory instructions, 50 add instructions, 10 multiply instructions. Calculate how many cycles in total it takes to run these instructions: add together 40 multiplied by 4, plus 50 multiplied by 2, plus 10 multiplied by 16, and divide by 100 instructions. Either way, you will get the answer that the base CPI is 4.2.

Jishen Zhao 20:38
That means on average, running one instruction takes 4.2 cycles. Then what I'm going to do is calculate the CPI of each optimization I can take. For option one, I'm going to make memory instructions twice as fast, so I'm going to reduce the cycles of a memory instruction by half, from four cycles to two cycles.

Jishen Zhao 21:04
So now the new CPI will be 3.4. Now I can do the same with the other options. Option two is to make the add instructions twice as fast, and option three is to make the multiply instructions twice as fast. In each case, we halve the number of cycles for that type of instruction. Option two gets 3.7 as its CPI, and option three gets a CPI of 3.4 as well. So what is the answer? The lower the CPI, the faster the computer, so the better option is the optimization with the lowest CPI. In this case, we actually have two winners.
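The comparison of the three options can be sketched in Python as follows. The instruction mix and cycle counts are the ones given in the lecture; the dictionary layout and function names are assumptions made for this illustration.

```python
# (fraction of instructions, cycles per instruction), from the lecture example.
base_mix = {
    "memory": (0.40, 4),
    "add": (0.50, 2),
    "multiply": (0.10, 16),
}


def cpi(mix):
    """Weighted average cycles per instruction for an instruction mix."""
    return sum(fraction * cycles for fraction, cycles in mix.values())


def cpi_with_halved(mix, name):
    """CPI after making one instruction type twice as fast (half the cycles)."""
    tweaked = dict(mix)
    fraction, cycles = tweaked[name]
    tweaked[name] = (fraction, cycles / 2)
    return cpi(tweaked)


base_cpi = cpi(base_mix)
options = {name: cpi_with_halved(base_mix, name) for name in base_mix}

print(f"base CPI = {base_cpi:.1f}")
for name, new_cpi in options.items():
    print(f"halve {name:8s}: CPI = {new_cpi:.1f}, speedup = {base_cpi / new_cpi:.2f}x")

best = min(options.values())
winners = [name for name, value in options.items() if abs(value - best) < 1e-9]
print("winners:", winners)
```

Memory and multiply tie because each contributes the same 1.6 cycles to the base CPI (0.40 × 4 and 0.10 × 16), so halving either one removes 0.8 cycles per instruction.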

Jishen Zhao 22:01
All right, so let's summarize what we have learned. This equation is Amdahl's law. What it is useful for is helping people better understand how much an optimization can improve performance, either in computer design or in the programs you write. But note that we will not use this equation

Jishen Zhao 22:29
Most of the time during this class. Most of the time, when you need to calculate speedup or do a calculation based on Amdahl's law, use CPI, like we just did in our previous example.

Jishen Zhao 22:47
All right, so this is the first performance law we have learned today. The next performance law we're going to talk about is about queues. Everyone has experienced waiting in a queue, and in computer design a computer has a lot of queues as well. You have a lot of requests going on, say memory access requests or computation requests. A lot of instructions wait in queues as well. This law helps you better understand how long you have to wait in a queue, or how long a queue will be,

Jishen Zhao 23:31
Given a certain number of requests and a certain speed at which the computer can run. This law is Little's law, and this is the equation for Little's law: L equals lambda multiplied by W. The three parameters in this equation mean the following. L is the number of items in a system, or the number of items in a queue. Lambda is the average arrival rate of your requests, so how many requests arrive, say, per second.

Jishen Zhao 24:14
And W means the average wait time, so how long one request needs to wait in a queue or in a system. There are certain assumptions made when people calculate with this equation. First of all, this equation is used in a steady state. What steady state means is that the system has a steady arrival rate, and the arrival rate equals the departure rate. That means your queue has already reached a state that is quite stable.

Jishen Zhao 25:04
We can use Little's law in various ways. There are three parameters in there: the length of the queue L, the request arrival rate lambda, and the wait time of one request W. Given two of the parameters, you can use the law to calculate the last one. For example, if you want to design a computer and need to design a queue, you can ask how long the queue needs to be. The length of the queue at least needs to hold the number of requests waiting in the queue in a steady state. That means you can calculate L and use L as a guideline to design your queue in terms of queue length.
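As a small illustration of solving Little's law for any one of its three parameters, here is a Python sketch. The numbers are hypothetical (a queue receiving one request every two cycles, with each request waiting 40 cycles); they are not from the lecture, and the function names are made up for this example.

```python
def items_in_system(arrival_rate, wait_time):
    """L = lambda * W: average number of requests in the queue or system."""
    return arrival_rate * wait_time


def average_wait(items, arrival_rate):
    """W = L / lambda: average time one request spends waiting."""
    return items / arrival_rate


def arrival_rate_for(items, wait_time):
    """lambda = L / W: arrival rate the queue sustains in steady state."""
    return items / wait_time


# Hypothetical design question: how many entries should the queue have?
rate_per_cycle = 0.5   # assumed: one request arrives every two cycles
wait_cycles = 40       # assumed: each request waits 40 cycles on average

entries_needed = items_in_system(rate_per_cycle, wait_cycles)
print(entries_needed)  # 20.0
```

Under these assumed numbers, a queue with fewer than 20 entries could not hold the requests that are waiting in steady state.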

Jishen Zhao 25:54
You can use this in other ways as well, and there are several options on the slide that you can read yourself, so I will not get into too much detail here. In summary, the two basic laws we have learned about performance are, first, Amdahl's law, for which I gave examples of what kind of questions you could have and how to calculate them, and second, Little's law. Because the Little's law equation is so simple, we're not giving examples.

Jishen Zhao 26:32
All right, the last piece of performance is about benchmarking. We learned about performance metrics and how to measure the performance of your computer. But in order to measure the performance of a computer, we need to know what kind of applications the computer typically runs, and with certain types of applications, how well the computer can perform. So in order to understand the performance of a computer, we need some workload to drive our evaluation. Without workloads or applications, the performance of the computer means nothing, because we just don't have enough information about how we should measure performance. We don't even know whether we should use latency or throughput.

Jishen Zhao 27:25
So this is where benchmarks come into play. When people design a computer, they typically use a set of benchmarks to evaluate it and decide whether the performance is good enough. There are certain types of benchmarks; in this class, we're going to categorize them into only two types. One of them is called microbenchmarks, and the other is called real applications, or standard benchmarks. Both types of benchmarks have been widely used, and they are actually very important and critical, because the two types are used in different situations. Microbenchmarks typically are nonstandard workloads, and you can write them quickly, maybe over a weekend or even in a few minutes. They are very simple and small workloads. I would say they are tiny programs, and typically they are not representative of what people are going to use in real scenarios with real applications. But they are useful because they're so simple to write, and we can control what the program behavior looks like, so we can use them to do evaluation that targets a particular behavior of the computer. And the evaluation can be done really quickly.

Jishen Zhao 29:11
But we definitely still need to evaluate the computer while it's running real workloads, so we know how the computer performs in the real world. In this case, we're going to use standard workloads, or standard benchmarks. Standard workloads are bigger programs, so they take many people and a long time to develop. Sometimes we directly grab workloads or applications being used on real-world machines and then adapt them into benchmarks. Those types of programs or benchmarks are more representative of the actual programs people will run or use daily, so they are more representative of how the computer will perform in real life. But, first of all, they take a long time to develop, so it takes a little more effort to use or develop those benchmarks. Second, because they are so big, they take longer to run than microbenchmarks, so the turnaround time of evaluating computer performance will be longer. So there are trade-offs, pros and cons, between the two types of benchmarks. That's why, depending on your situation or your needs, you should pick either type of benchmark, or actually evaluate your computer with both. Next I'm going to give you several examples of benchmark suites people typically use to measure the performance of computers. The first is very standard and very widely used in both academia and industry. It is called SPEC CPU. Some of you may have already heard about these benchmarks. It is a collection of benchmarks.

Jishen Zhao 31:06
In 2017, this benchmark suite got a new version, so it's SPEC CPU 2017, and before that it was SPEC CPU 2006. Even now, both versions of the benchmarks are used in today's performance measurement. There are different types of SPEC benchmarks in the suite. One of them is the set of benchmarks used to evaluate the latency of a machine, so it's called latency SPEC.

Jishen Zhao 31:47
What it does is that for each benchmark you take a number of latency samples, compute a speedup based on your machine's execution, and then take an average. So remind yourself what kind of average we should use to calculate the average of speedups.

Jishen Zhao 32:10
It's the geometric mean. Since we have latency and throughput, besides latency SPEC there is definitely also throughput SPEC. On the website of the SPEC benchmarks, they actually publish the winners, or the leaders, in performance. If you're interested, you can take a look at their website and find out the recent latency or throughput leaders.

Jishen Zhao 32:43
The second example of benchmarks we can talk about is PARSEC. This one was developed in academia, but it is also a widely used benchmark suite. The difference between PARSEC and SPEC is that the SPEC benchmarks are single-threaded, as far as I know, while the PARSEC benchmarks are multithreaded. So if you want to measure the multithreading performance of your machine, perhaps you want to use some multithreaded benchmarks, and one example will be PARSEC.

Jishen Zhao 33:25
Here is a list of the workloads in the PARSEC benchmark suite. You can see there are a lot of emerging workloads that are widely used on today's various machines, for example financial analysis, computer vision, data analysis, and media processing, so it is a diverse set. If you want to measure how a specific type of application runs on your machine, you can pick any one of them. PARSEC really emphasizes parallelism, because these are mostly

Jishen Zhao 34:08
Multithreaded programs in this benchmark suite. In terms of parallelism, you can find two types. One is data parallelism, where multiple threads access data at the same time, and the other is pipeline parallelism, where the program behaves in a pipelined manner.

Jishen Zhao 34:32
The third example of a benchmark suite is CloudSuite. CloudSuite is used to measure a different type of computer, which is the cloud. In this benchmark suite, there is a

Jishen Zhao 34:46
Bunch of benchmarks or programs that are widely used in cloud services. You can see there is data analytics, data caching, which is Memcached, if you have heard about Memcached, Cassandra, which is a NoSQL database or key-value store, and a lot of others. So if you want to measure the performance of a cloud computer or cloud service, you can consider using CloudSuite.

Jishen Zhao 35:15
The last example I'm going to introduce is MLPerf. This is a rather recent one; I think it was published maybe last year. This is a machine learning benchmark suite, so all of the benchmarks in this suite are machine learning workloads. You can see some of the introduction I listed over here. The types of applications include speech recognition,

Jishen Zhao 35:41
Text to speech, image recognition, and machine translation. It was actually started based on the Baidu DeepBench suite, which was developed starting from 2016, but now a lot of industry and also academic researchers help develop this benchmark suite. So if you want to test or measure your computer's performance, either for a cloud computer or for your mobile device, for training or inference, you can use this benchmark suite.

Jishen Zhao 36:28
All right, so this is all about performance that we have discussed in this class, and here's a summary you can use to review what you have learned. Basically we've learned about performance metrics, latency and throughput, the averages of performance metrics, where you know how to calculate the means, and measuring CPU performance. There are two basic metrics, CPI and IPC (IPC is the reverse of CPI), and then there are some others like MIPS and FLOPS. You should know how to calculate MIPS. If you

Jishen Zhao 37:11
Have questions, go back to our examples and try to do them again. We learned about two performance laws, Amdahl's law and Little's law, and finally benchmarks. We will not have any questions about benchmarks in the homework or exams, but they are more for you to know, if you want to measure a

Jishen Zhao 37:32
Computer running a specific type of application, how to choose the type of benchmark you should use. All right, this is all about performance. In the rest of today's class, we can start to talk about the second section of computer architecture, and that is instruction set architecture, or ISA. So first of all, what is an ISA? The ISA is the abstraction layer between the software and hardware of a computer. I actually define it more like a contract between the software and hardware. Software talks to hardware via the ISA, and vice versa.

Jishen Zhao 38:26
So in the rest of today's class I'm going to introduce some high-level ideas of ISA. If we have time, we're going to talk about what it means to design a good ISA. And if we have even more time, we're going to talk about compatibility of ISAs. So first of all,

Jishen Zhao 38:49
What is an ISA? I'm going to explain it starting with the execution model of a program: how a program finally gets executed on a computer, and how the program gets compiled and assembled. First of all, I have a program written in C, C++, Java, C#, anything. We call those languages high-level programming languages because they are closer to natural language, which the machine cannot understand. They are for human beings to read and interpret rather than for a machine. But in order for the machine to understand and be able to run those programs,

Jishen Zhao 39:35
We need to take a few steps. First of all, we need to compile it, and we compile it into assembly. If you have ever learned about assembly code, it looks like this on the right-hand side. Assembly language is still a human-readable representation. On the right-hand side of this piece of code, you can still read that this one adds two numbers and puts the result back into the first number, and that this one is a branch, and so forth. But the machine cannot actually understand this

Jishen Zhao 40:20
Code. Machines can only understand zeros and ones, so in order for the machine to be able to run the program, we need to further translate it into what we call machine code, a machine-readable representation of ones and zeros, shown on the left-hand side of this piece of code. So to execute a program on a computer, we first compile it and then translate it into machine code.

Jishen Zhao 40:59
So if you think about what an ISA is, an instruction set architecture pretty much defines the machine's ability to run a certain kind of code. Here I give one example, but all ISAs will be similar at a high level. The ISA will define how many operations there are; how long an operation is, say 32 bits or 64 bits; how many registers we have, which means how many variables you can store at the same time in the machine for the program or instructions to use; and how many different types of instructions there are, such as add, multiply, branch, move, subtraction, and so forth. Those are all types of instructions. How many of those different types of instructions are supported on a certain type of machine,

Jishen Zhao 41:54
This is all defined in the ISA. Then I'm going to give you a more detailed description of the ISA. Like I mentioned, I define the ISA more like a contract between software and hardware. For software to talk to hardware in this contract, software provides information about what it needs: if you write a program, what kind of support will it need from hardware? Say my program needs adds, my program needs branches, or my program needs multiplication. Those are the types of operations that I need hardware to support, so this is all written in the contract.

Jishen Zhao 42:46
And vice versa, for hardware to talk to software, it says: OK, you need that support, so I'm going to let you know what kind of support I can provide.

Jishen Zhao 42:58
So what the hardware writes in the contract is the capability of the hardware: what types of support it can provide. Putting them together, this contract will have both the software information and the hardware information, so you will have both the software requirements and the hardware capabilities. This is basically what is written in an ISA, or this ISA contract. I give an example of ISA documentation you can find online. This documentation is for the ARM architecture. They have several versions of the ISA.

Jishen Zhao 43:39
For MIPS, and for x86, there is ISA documentation as well. What this ISA documentation, or ISA contract, contains is shown on the bottom left of the slide. This is one page of the ISA documentation, and the first half of the page defines the software needs. For example, the software will need the add operation supported by hardware.

Jishen Zhao 44:12
Then the hardware provides information about what kind of functionality it can support, say how to implement this add: it needs a destination register and some source registers. And at the bottom of this one page of ISA documentation (I didn't have any space left on the slide, so I put it on the bottom right) is what the machine's capability will be and how it is actually implemented in the machine. This is pure hardware, and what this documentation defines will be,

Jishen Zhao 44:55
Say, that this instruction is implemented in a 32-bit format, so this instruction is 32 bits, and which portion of this bit string is mapped to which part of the instruction. Say I have an add instruction: where the operands are, where the destination register is stored, which bits hold the ID of the destination register, and where the operation is defined in this bit string.

Jishen Zhao 45:28
So you can see that in this documentation you have both a software part and a hardware part. The software part defines what add is and the functionality of add, with this short paragraph, and the hardware part defines how it is actually implemented and what its capability is. Again, we only have 32 bits, and we cannot exceed that capability.
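As a minimal sketch of what mapping portions of a 32-bit string to parts of an instruction can look like, the Python below packs and unpacks an add instruction. The field layout, the opcode value, and the function names are all made up for illustration; they are not taken from the ARM documentation shown on the slide.

```python
# Hypothetical 32-bit layout for a register-register ADD (not any real ISA):
#   bits 31..26  opcode  (6 bits)
#   bits 25..21  rd      (5 bits, destination register ID)
#   bits 20..16  rs1     (5 bits, first source register ID)
#   bits 15..11  rs2     (5 bits, second source register ID)
#   bits 10..0   unused  (always zero in this sketch)
OPCODE_ADD = 0b000001  # made-up opcode value


def encode_add(rd, rs1, rs2):
    """Pack an ADD into one 32-bit word using the layout above."""
    for reg in (rd, rs1, rs2):
        if not 0 <= reg < 32:
            raise ValueError("register ID must fit in 5 bits")
    return (OPCODE_ADD << 26) | (rd << 21) | (rs1 << 16) | (rs2 << 11)


def decode(word):
    """Split a 32-bit word back into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rd": (word >> 21) & 0x1F,
        "rs1": (word >> 16) & 0x1F,
        "rs2": (word >> 11) & 0x1F,
    }


word = encode_add(rd=3, rs1=1, rs2=2)
print(f"{word:032b}")  # the instruction as a 32-bit string
print(decode(word))
```

The 5-bit register fields are one way to see the capability limit: with this assumed layout the hardware can name at most 32 registers, and everything has to fit in the 32 bits.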

Jishen Zhao 45:52
All right, so based on that example, I'm going to give a more formal definition of ISA. ISA, again, refers to instruction set architecture. It's an interface between the software and hardware, or, the idea is, it's a contract between hardware and software. Mostly it has a functional definition of various aspects, like the registers and memory addresses, and it also defines a set of

Jishen Zhao 46:26
operations: add, multiply, branch, load, store. Those are required by software and implemented by hardware based on its capability. And the ISA needs to have a very accurate description.

Jishen Zhao 46:50
And you need to pay attention to what is not defined in the ISA, that is, what is not defined in this contract or ISA documentation. As an example, in some homework or exam question, I may give you a bunch of options and ask you which ones should be defined in the

Jishen Zhao 47:12
ISA and which should not. What should not be defined in the ISA you should know very clearly. First is how the operations are implemented in the low-level hardware: how the circuit is implemented is not defined in the ISA. The implementation information in the ISA is high level, in terms of things like the bit string. The second example is which operation is faster or slower, or what the performance of an operation is. That is pure hardware design, not in the ISA. And how much power it burns

Jishen Zhao 47:50
for each operation is not defined in the ISA either. So the rule of thumb is that the ISA contains only high-level hardware information; the low-level hardware implementation is not defined in the ISA. And if you want a more concrete example or an analogy

Jishen Zhao 48:17
of ISA, you can think of language. An ISA is a kind of language. We have different languages: English, French, Japanese, Chinese, and so forth. We have different ISAs as well, depending on where it is implemented, which company implements it, and which kind of architecture or hardware implements it. x86, ARM, and RISC-V are different ISAs, like different languages. And the same as a language,

Jishen Zhao 48:54
an ISA also has a narrative, which is a program: with a bunch of ISA instructions, we can write a program. Each sentence in a program is an instruction. You can think of verbs as operations: add, multiply, load, and branch are operations, and they're like verbs in human language. Nouns are like the data items we can operate on. We have the source and destination of each operation; those are nouns. And we can even have adjectives. So later,

Jishen Zhao 49:34
in this class or the next class, when we talk about addressing modes in ISA, you will have a better idea of what those are. And the same as language,

Jishen Zhao 49:49
we have different languages, so we have different ISAs as well, as we mentioned earlier. But a key difference between language and ISA is that language can have a certain amount of ambiguity, while an ISA needs to be very precise and very clear, because a machine still cannot think like a human being. A machine cannot tolerate any ambiguity.

Jishen Zhao 50:21
OK, so I already gave you the add example of what an ISA is. This is a bigger, larger view of it, so hopefully it's clear. In summary, this is one page. You can imagine this is one page in an ISA documentation for one instruction, which is just the add instruction. The first half of this page is defined by software: add, and the functionality description of add, with this short paragraph. The second half is given by hardware: it mostly defines how this add instruction is implemented and what the capability of the hardware is in terms of implementing this instruction. All right, so this is the high-level idea of ISA.

Jishen Zhao 51:15
And for the rest of today, we're going to go into a little more detail on how an ISA is implemented and designed. So this is the basic structure of running an instruction. Previously, I talked about the program execution model. For ISA, and for computer architecture more generally, we care more about the instruction execution model, which means how an instruction is executed, while the program execution model cares about how one program is executed. So now we only focus on

Jishen Zhao 51:53
one instruction here. One instruction needs to go through a lot of steps in order to be executed. On the left-hand side of the slide, I show a very basic example of such steps. There are six steps in total. In order to execute one instruction, we first fetch the instruction from memory, because your instructions or programs are simply a stream of bits, like a binary file stored in your memory. So we first fetch this one instruction, and then decode. Decode means that the machine needs to understand what the instruction does. Then we read the inputs, execute the instruction (say, for add, execute the add), write the output somewhere, and find the next instruction. So in order to find one instruction, to fetch an instruction,

Jishen Zhao 52:53
we need to identify where the instruction is stored in memory. In particular, we use what is called the PC. Have you heard about the PC? It's the program counter. It's called a counter, but actually it's more like a register, a special register implemented in the computer. The way an instruction is stored is that it sits at the location pointed to by a PC, so the PC is like an index to instructions. Each instruction is associated with one PC. So if you already know the PC of an instruction, you will be able to find the instruction by looking it up in memory with that index. And you see on the left-hand side of the slide that one instruction actually has to go through so many steps. To execute an instruction,

Jishen Zhao 54:10
we need to ensure that the execution is atomic. Atomic means that all of the steps need to either happen all together or not at all. We can describe atomic as all or nothing. That means at each point in time during machine execution, we need to make sure that either this particular instruction is completely executed, with all steps done, or none of them are done. We're going to talk about the atomicity of instruction execution later on. So, in summary, the instruction execution model means how one instruction is

Jishen Zhao 55:00
executed on a machine. There are a few aspects. First, the computer itself is a finite state machine of registers, memory, and the program counter. In particular, the program counter is a very special type of register, and it's important because we use it as the instruction pointer, or index, to find where an instruction is stored in memory.

Jishen Zhao 55:27
And to execute one instruction, we need to go through many steps. A simple example is that we at least fetch the instruction from memory using the PC, the program counter, and then decode the instruction. So imagine you are the machine: you need to first fetch the instruction

Jishen Zhao 55:48
and decode the instruction, so you understand what the instruction asks you to do. Then you read the inputs stored in registers or memory and execute it. After executing the instruction, you get a result, and you write the result back to either a register or memory. Then you use the PC again, the program counter of the next instruction, to identify where the next instruction is, and you loop back to fetch the next instruction: decode, read, execute, write, and so on and so forth. This is how an instruction is executed.
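The six steps above can be sketched as a small loop. This is only a toy model under stated assumptions: the tuple instruction format, the operation names, the register names, and the rule that the next PC is always the current PC plus one are all invented for illustration and do not correspond to any real ISA.

```python
# Toy machine: memory maps a PC value to an instruction, and each
# instruction is a tuple (operation, destination, source1, source2).
# This format is hypothetical and only illustrates the six steps.
def run(program, regs, start_pc=0):
    pc = start_pc
    while pc in program:
        instr = program[pc]               # 1. fetch, using the PC as the index
        op, dst, src1, src2 = instr       # 2. decode: what does it ask for?
        a, b = regs[src1], regs[src2]     # 3. read inputs from registers
        if op == "add":                   # 4. execute
            result = a + b
        elif op == "mul":
            result = a * b
        else:
            raise ValueError(f"unknown operation: {op}")
        regs[dst] = result                # 5. write the output back
        pc += 1                           # 6. find the next instruction
    return regs


program = {
    0: ("add", "r3", "r1", "r2"),  # r3 = r1 + r2
    1: ("mul", "r4", "r3", "r3"),  # r4 = r3 * r3
}
final = run(program, {"r1": 2, "r2": 3, "r3": 0, "r4": 0})
print(final)
```

The loop stops when the PC points at a location that holds no instruction, which is a simplification chosen for this sketch.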

Jishen Zhao 56:34
All right, so this is the instruction execution model. You should distinguish it from the program execution model. In the program execution model, you compile, then use an assembler to translate into machine code, and then the code gets executed on the machine, while the instruction execution

Jishen Zhao 56:53
model is mostly about the steps you need to go through for one instruction to be executed. So now we have a sort of high-level, basic idea of what an ISA looks like. Before we talk about how an ISA is designed and how we can design a better ISA, I'll talk about the goals. Whenever we need to improve or optimize something, we first need to be clear about what our goals are.

Jishen Zhao 57:27
So what are the goals of ISA design? What is a good ISA? Mostly, for ISA design, what we need to improve or optimize is programmability, performance, implementability, and compatibility. We're going to talk about them one by one. First of all, programmability. The ISA is associated with both software and hardware. The software part is that we need to make sure the software can actually write down its needs in the contract and can use it. That's the programmability part. It means the ISA design needs to make sure that it's easy to express programs, and to express them efficiently.

Jishen Zhao 58:22
But for ISA design, which piece of software exactly interacts with the ISA is actually different, and it changes over time. Earlier, it was the human, because with early computers it was too hard for code to be assembled automatically, so most of the time it was actually hand-assembled. In that case, the ISA design needed to make sure that a human would be able to understand and deal with it easily and efficiently. Today, it is mostly the compiler that directly interacts with the ISA design, because after a program is written, the compiler compiles the code into assembly, which is then translated into machine code. So the compiler needs to be able to interact with the ISA really easily and efficiently.

Jishen Zhao 59:29
So pay attention here: what we call programmability can actually change over time. Implementability means how easily the ISA can be implemented by hardware, so this is more of the hardware part. There are various tricks or techniques in hardware that we can use to implement an ISA. Also, in this class

Jishen Zhao 59:56
we will talk about several of them, like pipelining and out-of-order execution, which we'll talk about later this quarter. Those are a set of techniques that people use to improve the performance of an ISA. But in order to use those techniques,

Jishen Zhao 01:00:19
the ISA design needs to be really efficient and implementable. Several types of ISA design actually make hardware implementation harder. For example, some ISAs have variable instruction lengths. The example we saw before, the add instruction on that one page of ISA documentation, has a fixed length for each instruction: 32 bits, fixed. But some ISAs actually have variable lengths, so some instructions are longer and some instructions are shorter. That actually makes the hardware design more challenging.
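One way to see why variable-length instructions are harder on the hardware is to look at how instruction boundaries are found. The sketch below is an illustration only: the 4-byte fixed width matches the 32-bit example, but the variable-length encoding (first byte of each instruction holds its total length) and both function names are invented for this sketch and are not how any particular ISA encodes length.

```python
def fixed_length_address(index, base=0, width=4):
    """Fixed 4-byte instructions: the address of the i-th instruction
    is simple arithmetic, with no need to look at earlier instructions."""
    return base + index * width


def variable_length_addresses(code):
    """Hypothetical variable-length encoding: the first byte of each
    instruction is its total length in bytes. Boundaries can only be
    found by walking the byte stream from the start."""
    addresses = []
    pc = 0
    while pc < len(code):
        addresses.append(pc)
        length = code[pc]
        if length == 0:
            raise ValueError("invalid instruction length")
        pc += length
    return addresses


# Four instructions of lengths 2, 4, 1, and 3 bytes.
stream = bytes([2, 0xAA, 4, 0xBB, 0xCC, 0xDD, 1, 3, 0xEE, 0xFF])
print(fixed_length_address(3))            # address of the 4th fixed-width instruction
print(variable_length_addresses(stream))  # start address of each variable-length instruction
```

With the fixed width, the next PC is known before the current instruction is even decoded; with the variable encoding, the length has to be decoded first.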

Jishen Zhao 01:01:01
And there are other examples, listed here, that make hardware implementation more challenging for specific types of ISAs. And performance: all computer designs need to ensure high performance. That is actually one of the reasons why we talked about performance and metrics at the beginning of this quarter, so we know what the goals are and how to achieve them. Specifically for performance,

Jishen Zhao 01:01:29
let's get back to what we learned last class about the CPU performance equation. There are several terms that factor into this latency equation. If you remember, we mentioned ISA multiple times. At least two out of the three terms have the ISA as one of the factors: instruction count is related to the ISA, and CPI is also related to the ISA. I mean, if you develop a really good ISA, you have a chance of improving instruction count and CPI, which are circled here. I'm going to give you an example of how the ISA impacts these two, instruction count and CPI. There are two flavors of ISA design we can think of: one is called CISC and one is called RISC. CISC means complex instruction set computing, with more complex instructions. In CISC-style ISA designs, each instruction is very fat. That means it completes a lot of work in one instruction, so each instruction is heavyweight. RISC means reduced instruction set computer. In contrast with CISC, in RISC each instruction is very simple and just completes a very simple job. So there are trade-offs. First of all, for CISC, because one instruction completes a very complex job, in order to implement one program or one task you need fewer instructions. So the instruction count for a CISC-style instruction set is lower for a given program compared to RISC.

Jishen Zhao 01:03:48
While for CPI, on the other hand, because one instruction is so complex, executing that instruction typically takes more cycles with CISC-style ISAs, while for RISC, because one instruction is so simple, it typically takes fewer cycles. So the CPI of CISC is higher than the CPI of RISC for each instruction.

Jishen Zhao 01:04:20
Let me give you a more concrete example. This is a very typical instruction for one of the CISC machines. It's an add, but think of what this particular instruction does. It actually does a lot of work in a single instruction. It adds the content of the memory location pointed to by A3 to the component of an array starting at address 100, where the index number of the component is in A2, and the content of A3 is then automatically increased by one. So there's a lot of work being done in a single instruction. If I translate the amount of work done by this one instruction in a CISC-style architecture into RISC, it's going to be five instructions in total. So here you can see the difference: to complete the same job, for CISC there is only one instruction, and for RISC there are five instructions. But

Jishen Zhao 01:05:24
for CISC, this one instruction will take, say, 10 cycles, while for RISC, because it completes the same amount of work, assume it also takes 10 cycles. The CPI of this RISC-based machine is going to be 10 over 5, which is 2, while for CISC it's going to be 10 over 1, which is 10. So the CPI of the CISC-style machine is higher than that of the RISC-style machine, while the instruction count is the reverse. So you can see the trade-off. And because of the trade-off, there's no single best or better design, CISC or RISC. Either way, these designs

Jishen Zhao 01:06:16
have their pros and cons and are used in different types of machines, and both are useful depending on what your requirements are. You can read the debate and the pros and cons between the two. I'm not going to spend the time to go through them; if you're interested, just read them.
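The numbers in the CISC versus RISC example above can be checked with a few lines of Python. The instruction counts and cycle counts are the ones assumed in the lecture (1 instruction in 10 cycles versus 5 instructions in 10 cycles); the function and variable names are only illustrative.

```python
def cpi(total_cycles, instruction_count):
    """Cycles per instruction: total cycles divided by instructions executed."""
    return total_cycles / instruction_count


# Same job, two ISA styles, both assumed to take 10 cycles in total.
cisc_instructions, cisc_cycles = 1, 10
risc_instructions, risc_cycles = 5, 10

cisc_cpi = cpi(cisc_cycles, cisc_instructions)
risc_cpi = cpi(risc_cycles, risc_instructions)

print("CISC: instructions =", cisc_instructions, "CPI =", cisc_cpi)
print("RISC: instructions =", risc_instructions, "CPI =", risc_cpi)

# Instruction count times CPI gives the same total cycle count for both,
# which is why neither style wins on these numbers alone.
print(cisc_instructions * cisc_cpi == risc_instructions * risc_cpi)
```

Lower instruction count with higher CPI, or higher instruction count with lower CPI: under these assumed numbers the product is identical, so the clock period and the real per-instruction costs decide which design is faster.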

Jishen Zhao 01:06:42
As we've talked about, the ISA design directly interacts with the compiler, so the compiler can actually improve the instruction count and CPI through certain optimizations. For example, you can reduce the number of instructions, the instruction count, by optimizing during compilation. There are also other improvements the compiler can make, like reducing branches and reducing cache misses, which we're going to talk about later. They can all improve computer performance for a given ISA design. All right, so the last goal for ISA design is compatibility. Compatibility means that if you have a new version of the ISA developed later on for the same type of architecture, say x86, it needs to be compatible with the earlier versions. This is important because, for the same type of machine, people have already written a lot of programs on top of earlier versions. If changing to a new version meant we had to completely reimplement all of those programs, that would just be too much work. So we need to ensure compatibility, so that legacy code can still run on the new

Jishen Zhao 01:08:20
version of the ISA. Actually, almost all ISA families ensure compatibility. As an example, Intel's x86 architecture has really, really good compatibility support across versions. For compatibility, the very obvious one you probably think of is backward compatibility, which means the new version of the ISA design needs to be compatible with old versions of the ISA design. But for ISA design we also need to ensure forward compatibility. That means that for the ISA design currently being developed, we need to ensure that in the future

Jishen Zhao 01:09:06
you can use the same set of basic structures. A simple example of ensuring forward compatibility is to create some redundancy. For the x86 ISA in particular, each version actually has several overloaded no-ops, and those no-ops will actually turn into certain operations in future versions. There are some other tricks to ensure compatibility, and some of them

Jishen Zhao 01:09:40
are binary translation and emulation. Binary translation means that you statically transform a static image of your program and run it natively, while emulation is dynamic: emulation is dynamic binary translation, so while the program is running, it actually does the binary translation online. An example you may be familiar with, if you write program code and pay attention to the compilation options, is just-in-time compilation; that is an example of emulation. You can also adopt a virtual ISA. Pay attention: this is not a virtual machine or virtual memory. A virtual ISA is particularly for ISA design, to ensure compatibility. We're not going into too much detail on those during class. If you're interested, you can talk to me and I can give you a couple of examples. All right, and the last trick for compatibility is to actually have a piece of hardware. People have actually done this before. If you play games, the PlayStation 2 can actually support PlayStation 1 games. The way they did it is they actually have a piece of hardware in there to execute those old instructions to support compatibility.

Jishen Zhao 01:11:36
All right, so this is a summary of what we have learned so far about ISA. We have learned what an ISA is, and then we talked about the program execution model versus the instruction execution model. You should be clear that for the program execution model, we care about how the program is compiled and then translated into machine code so that it can finally run on a computer, while for the instruction execution model, we care about how one instruction is executed: what stages or steps an instruction goes through, and how to identify one instruction using a PC. We also talked about ISA design goals: programmability, performance, and compatibility. So far we have learned the high-level idea of ISA, some basic introduction. Then we'll be discussing more details and aspects of ISA in the next