EITF20: Computer Architecture
Part1.1.1: Introduction

Liang Liu<br>liang.liu@eit.lth.se

## Course Facts

$\square$ Computer Architecture (7.5HP)

## http://www.eit.lth.se/kurs/eitf20

$\square$ EIT's Course Service Desk (studerandeexpedition)

- Course secretary: Anne Andersson, Room 3152B
- e-mail: anne.andersson@eit.lth.se

Datorsystemimplementering


## Outline

$\square$ Computers
$\square$ Computer Architecture
$\square$ This Course
$\square$ Trends
$\square$ Performance
$\square$ Quantitative Principles


## Computer is everywhere



## Build a Computer...

Desktop computer

| Part | Example price in SEK (2012-10-02) |
| :--- | ---: |
| Case | 490 |
| Power supply | 399 |
| Motherboard | 790 |
| CPU | 2490 |
| Memory | 698 |
| Disk | 790 |
| DVD/Blu-ray | 590 |
| Graphics | - |
| Sound, net, ... | - |
| Keyboard, mouse, cables, ... | ? |

$\sum=3000-5000-10000$ SEK

Power Consumption: 65 to 250 watts

## Build a Computer...



3965 in stock for delivery within 1 working day
Previously 347.75 kr
298.19 kr
Price (excl. VAT) each


## Build a Computer...



Digilent FPGA
Zybo Zynq-7000 ARM/FPGA SoC Trainer Board, \$189.00 (SKU: 410-279)



## Build a Computer...

## CC2640 Bluetooth low energy



## Quick Facts

## Ultra-low Power Consumption

- $65 \mu \mathrm{~A} / \mathrm{MHz}$ ARM Cortex M3
- $8.2 \mu \mathrm{~A} / \mathrm{MHz}$ Sensor Controller
- $0.7 \mu \mathrm{~A}$ sleep with retention and RTC
- 5.9 mA RX (single-ended)
- 6.5 mA TX (single-ended)


## SoC Key Features

- Autonomous sensor controller engine
- $4 \times 4 \mathrm{~mm}$ to $7 \times 7 \mathrm{~mm}$ QFN
- $1.65-3.8 \mathrm{~V}$ supply range
- 128 kB Flash +8 kB Cache
- 20 kB RAM


## RF Key Features

- +5 dBm output power
- -97 dBm sensitivity
- $2360 \mathrm{MHz}-2500 \mathrm{MHz}$
- Pin compatible with CC13xx in $4 \times 4$ and $5 \times 5$ QFN (BLE + Sub-1 GHz prop.)

Price: \$2.98


## Build a Computer...



## 80 Megawatts



## Class of Computers

| Feature | Personal mobile device (PMD) | Desktop | Server | Clusters/warehouse-scale computer | Embedded |
| :---: | :---: | :---: | :---: | :---: | :---: |
| Price of system | \$100-\$1000 | \$300-\$2500 | \$5000-\$10,000,000 | \$100,000-\$200,000,000 | \$10-\$100,000 |
| Price of microprocessor | \$10-\$100 | \$50-\$500 | \$200-\$2000 | \$50-\$250 | \$0.01-\$100 |
| Critical system design issues | Cost, energy, media performance, responsiveness | Price-performance, energy, graphics performance | Throughput, availability, scalability, energy | Price-performance, throughput, energy proportionality | Price, energy, application-specific performance |

## Intel vs. ARM

## ARM is Pervasive and Open




## IoT - ARM




## Time-line

$\square$ Mid-1800s Programmable computer

- Charles Babbage (analytical engine)
- Ada Lovelace (programmer)
$\square$ 1940s First modern computers
- Zuse, MARK, ENIAC, ...
$\square$ 1960s Mainframe
- 1964 IBM System/360
$\square$ 1970s Minicomputer
- 1971 First microprocessor



## Time-line

$\square$ 1980s Desktop

- 1977 Apple II
- 1981 IBM PC
$\square$ 1990s PDA
$\square$ 2000s Embedded computers
$\square$ 2010s Cloud computing
$\square$ 2020s Boundless computing, Edge computing



## The first electronic computer



ENIAC, 1946: 18000 vacuum tubes, 30 ton, $150 \mathrm{~m}^{2}$, 140 kW



"I think there is a world market for maybe five computers."
-- Thomas Watson, chairman of IBM, 1943

"Computers in the future may weigh no more than 1.5 tons."
-- Popular Mechanics, forecasting the relentless march of science, 1949

"640K ought to be enough for anybody."
-- Bill Gates, 1981

## Interlude: The imitation game




## Interlude: Alan Turing




## Interlude: Debug

In 1947, Rear Admiral Grace Murray Hopper and her associates were working on the Mark II when the machine began experiencing problems. An investigation showed that a moth was trapped in a relay. The operators removed the moth and affixed it to the log. The computer had been "debugged".



## Development of Microprocessor

| Processor | Year | Transistors | Frequency | Cores | Cache |
| :---: | :---: | :---: | :---: | :---: | :--- |
| Intel 4004 | 1971 | 2300 | 108 kHz | "1" | None |
| Z80 | 1976 | 8500 | 2.5 MHz | 1 | None |
| Intel 386 | 1985 | 280000 | 16 MHz | 1 | None |
| Intel 486 | 1989 | 1185000 | 20-50 MHz | 1 | 8 kB |
| Pentium 4 | 2000 | 44000000 | 1-2 GHz | 1 | 256 kB |
| Nehalem | 2008 | 731000000 | >3.6 GHz | 4 | 8 MB |
| Sandy Bridge | 2011 | 995000000 | 3.8 GHz | 4+ | 8+1 MB |
| Haswell | 2013 | 1860000000 | >3.6 GHz | 6 | 15+1.5 MB |
| Itanium 9560 | 2012 | 3100000000 | 2.5 GHz | 8 | 32+6 MB |



## Outline

$\square$ Computer Architecture


# The art of designing computers is based on engineering principles and quantitative performance evaluation 

## Computer abstraction levels

| Application |  |  |  |
| :---: | :---: | :---: | :---: |
| Assembler | Operating System |  |  |
| Linker | Loader | Scheduler | Device Drivers |
| Instruction Set Architecture (Interface SW/HW) |  |  |  |
| Processor | Memory | I/O System |  |
| Datapath \& Control Design |  |  |  |
| Digital Logic Design |  |  |  |
| Circuit Design |  |  |  |
| Physical (IC Layout) Design |  |  |  |

## Computer Architecture

Computer architecture is a set of disciplines that describe the functionality, organization and implementation of computer systems.
$\square$ ISA: Instruction-set architecture
$\square$ Computer organization: microarchitecture
$\square$ Specific implementation


## ISA

An instruction set architecture (ISA) is the interface between the computer's software and hardware and also can be viewed as the programmer's view of the machine.

MIPS32 Add Immediate Instruction

| 001000 | 00001 | 00010 | $\mathbf{0 0 0 0 0 0 0 1 0 1 0 1 1 1 1 0}$ |
| :--- | :--- | :--- | :--- |
| OP Code | Addr 1 | Addr 2 | Immediate value |

Equivalent mnemonic: addi \$r1, \$r2, 350
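The bit layout in the table above can be checked with a few shifts and masks. The following is a minimal sketch, assuming the standard 32-bit MIPS I-type field widths (6 + 5 + 5 + 16 bits); the function name `decode_i_type` and the dictionary keys are hypothetical and simply mirror the labels used in the table.

```python
def decode_i_type(word: int) -> dict:
    """Split a 32-bit MIPS I-type word into the four fields shown in the table."""
    return {
        "opcode": (word >> 26) & 0x3F,     # bits 31..26 (6 bits)
        "addr1": (word >> 21) & 0x1F,      # bits 25..21 (5 bits)
        "addr2": (word >> 16) & 0x1F,      # bits 20..16 (5 bits)
        "immediate": word & 0xFFFF,        # bits 15..0  (16 bits)
    }


# The bit pattern from the table, written field by field.
word = int("001000" "00001" "00010" "0000000101011110", 2)
fields = decode_i_type(word)
print(fields)
```

The immediate field decodes to 350, the two register fields to 1 and 2, and the opcode field to 8, which matches the constant and the register numbers in the mnemonic.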



## Microarchitecture

Microarchitecture is the way a given instruction set architecture (ISA) is implemented on a processor.




Intel Core 2 Architecture

## Microarchitecture

## ARM® Cortex®-A72

ARM CoreSight™ Multicore Debug and Trace


128-bit AMBA 4 ACE or AMBA 5 CHI Coherent Bus Interface

## Implementation




## The role of computer architecture?

Make design decisions across the interface between hardware and software in order to meet functional and performance goals.


## Why computer architecture?

$\square$ Understand how to evaluate and choose

- What do we mean by "one computer is faster than another"?
- How can Gene Amdahl help you decide which enhancement is the best?
- Is a larger cache better than a higher clock frequency?
- Why is pipelining faster than combinational circuits?
- Different levels of caches - why?
$\square$ Design your own specialized architecture
- Embedded special purpose processors
- Axis Communications/Ericsson/Nokia/ARM/SAAB/...
$\square$ Write better programs


## What is computer architecture?

$\square$ Design and analysis

- ISA
- Organization (microarchitecture)
- Implementation
$\square$ To meet requirements of
- Functionality (application, standards...)
- Price
- Performance
- Power
- Reliability
- Dependability
- Compatibility
- ...


## What affects computer architecture?

Technology

Software


## X86 Architecture



## Architecture change due to new applications







## Outline

$\square$ This Course

Lund University / EITF20/ Liang Liu 2016


## Course Objectives

## After this course, you will...

$\square$ Have a thorough knowledge about the design principles for modern computer systems
$\square$ Have an understanding of the relations between

- The design of the instruction set of a processor
- The microarchitecture of a processor
$\square$ Be able to evaluate design alternatives towards design goals using quantitative evaluation methods
$\square$ Side effects...
- Better digital IC designer
- Better understanding of compilers, operating systems, and high-performance programming


## Book Recommendation

$\square$ Computer Architecture - A Quantitative Approach

- Hennessy, Patterson
- 5th Edition




## Course Content \& Schedule


$\square$ Overview
$\square$ Instruction set architecture
$\square$ Pipeline
$\square$ Memory System
$\square$ Storage System
$\square$ I/Os
$\square$ Multiprocessor

## Teachers

$\square$ Lecture

- Liang Liu, Associate Professor
- Email: liang.liu@eit.lth.se
- Room: E2342
- Homepage: http://www.eit.lth.se/staff/Liang.Liu
$\square$ Teaching Assistants
- Mojtaba Mahdavi
- Steffen Malkowsky



## Lectures and Labs

$\square$ Lectures (10)

- Tuesday : 13:15-15:00 E:B (V:B)
- Thursday: 08:15-10:00 E:1406 (E:B)
- Covers design principles and analysis methodology
- Read the literature before each lecture
- Does not cover all of the literature
- Ask many questions!
$\square$ Labs (4)
- Tuesday: 08:15-12:00 E:4118-E:4119
- Friday: 08:15-12:00 (except for last one) E:4118-E:4119
- 2 students/group
- Read manual and literature before the lab
- Do Home Assignments before lab (or be sent home)
- Experiment and discuss with assistants
- Understand what you have done (or FAIL the exam)
- Finish Lab before DEADLINE


## Examination (Written)

$\square$ Anonymous exam
$\square$ Pass all labs to be able to attend written exam
$\square$ Five problems

- Highly lab related
- Problem solving
- Descriptive nature



## Questions?

## Outline

$\square$ Trends


## Moore's Law

The experts look ahead

Cramming more components onto integrated circuits

With unit cost falling as the number of components per circuit rises, by 1975 economics may dictate squeezing as many as 65,000 components on a single silicon chip

By Gordon E. Moore
Director, Research and Development Laboratories, Fairchild Semiconductor
division of Fairchild Camera and Instrument Corp.


$\square$ Electronics, Apr. 19, 1965: Gordon Moore (later co-founder of Intel) described a doubling every year in the number of components per integrated circuit



## Moore's Law

1975: Moore reformulates to a doubling every 2 years.
Interview, 2000: "...change the doubling time again... to maybe four or five years."


Gordon Moore Co-founder of Intel

ca. 1 billion transistors in 2007


## Performance of Microprocessor



## Does not Apply to All

$\square$ Processing power doubles every 18 months
$\square$ Memory size doubles every 18 months
$\square$ Disk capacity doubles every 18 months
$\square$ Disk positioning rate (seek \& rotate) doubles every ten years!
$\square$ Speed of DRAM and disk improves a few \% per year
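To compare the doubling times listed above, it helps to convert them into yearly growth rates. The sketch below assumes pure exponential growth; the helper names `annual_growth` and `growth_over` are hypothetical and not part of the course material.

```python
def annual_growth(doubling_time_years: float) -> float:
    """Yearly growth rate implied by a given doubling time."""
    return 2 ** (1 / doubling_time_years) - 1


def growth_over(years: float, doubling_time_years: float) -> float:
    """Total improvement factor after `years` of exponential growth."""
    return 2 ** (years / doubling_time_years)


fast = annual_growth(1.5)    # doubling every 18 months
slow = annual_growth(10.0)   # doubling every ten years

print(f"18-month doubling: {fast:.1%} per year")
print(f"10-year doubling:  {slow:.1%} per year")
print(f"After 10 years: x{growth_over(10, 1.5):.0f} vs x{growth_over(10, 10):.0f}")
```

Under these assumptions, a quantity that doubles every 18 months improves roughly a hundredfold in ten years, while disk positioning only doubles, so the gap between the two keeps widening.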



## Moore's Law: power density

Power Consumption, w (Burn)


Pentium IV chip area (in 130 nm technology)
$1.3 \mathrm{~cm}^{2}$

This gives about $100 \mathrm{~W} / \mathrm{cm}^{2}$ that needs to be transported away (cooling)

Comparison: This little thing operates at about $10 \mathrm{~W} / \mathrm{cm}^{2}$.







## Outline

$\square$ Performance

## Performance

$$
\operatorname{Performance}(X)=\frac{1}{T_{\text {exe }}(X)}
$$

" $X$ is $n$ times faster than $Y$ " means:

$$
\frac{T_{\text {exe }}(Y)}{T_{\text {exe }}(X)}=\frac{\operatorname{Performance}(X)}{\operatorname{Performance}(Y)}=n
$$
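The two definitions above translate directly into code. This is a minimal sketch; the execution times used for X and Y are hypothetical values chosen only to illustrate the ratio.

```python
def performance(t_exe: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / t_exe


def times_faster(t_exe_x: float, t_exe_y: float) -> float:
    """Return n such that 'X is n times faster than Y'."""
    return t_exe_y / t_exe_x


# Hypothetical measurements: X needs 10 s, Y needs 15 s for the same program.
t_x, t_y = 10.0, 15.0
n = times_faster(t_x, t_y)
ratio = performance(t_x) / performance(t_y)
print(f"X is {n:.2f} times faster than Y (performance ratio {ratio:.2f})")
```

Both ways of computing n give the same value, which is the point of the second equation: the ratio of execution times equals the inverse ratio of performances.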

## How to define execution time?

## Performance



MIPS = millions of instructions per second
MFLOPS = millions of FP operations per second

## Program to evaluate performance

$\square$ Real programs: e.g. TeX, spice, SPEC benchmarks, ...
$\square$ Kernels - small, key pieces of real applications
$\square$ Toy programs - sort, prime number generation

- Typically programs of around 100 lines
$\square$ Synthetic benchmarks - "The average program"
- Fake programs written to match the behaviour of real applications
- Real programs are the only true measurement objects
$\square$ SPEC benchmarks will be used here (plus some toy programs)
- Real programs modified to be portable and to minimize the effect of I/O


## Which Computer is Faster?

| Execution time |  |  |  |
| :--- | ---: | ---: | ---: |
| Computer | A | B | C |
| Program P1 | 1 | 10 | 20 |
| Program P2 | 1000 | 100 | 20 |
| Total time | 1001 | 110 | 40 |

$\square$ A is 10 times faster than B for P1
$\square$ B is 10 times faster than A for P2
$\square$ A and B are faster than C for P1
$\square$ C is faster than A and B if both P1 and P2 are run


- Arithmetic mean of execution time: $\frac{\sum T_{i}}{n}$ or weighted execution time $\frac{\sum W_{i} * T_{i}}{n}$
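The execution-time table can be summarized in a few lines of code. The sketch below uses the times from the table; the weights in `run_counts` are a hypothetical workload (P1 run 1000 times for every run of P2), and the weighted figure is the plain sum of weight times execution time. Dividing that sum by n, as in the formula above, would not change the ranking.

```python
# Execution times from the table (same time unit for all entries).
times = {
    "A": {"P1": 1, "P2": 1000},
    "B": {"P1": 10, "P2": 100},
    "C": {"P1": 20, "P2": 20},
}

# Hypothetical workload mix: how often each program is run.
run_counts = {"P1": 1000, "P2": 1}

totals = {m: sum(t.values()) for m, t in times.items()}
means = {m: totals[m] / len(times[m]) for m in times}
weighted = {
    m: sum(run_counts[p] * t for p, t in progs.items())
    for m, progs in times.items()
}

fastest_unweighted = min(totals, key=totals.get)
fastest_weighted = min(weighted, key=weighted.get)

print("Total time:     ", totals)
print("Arithmetic mean:", means)
print("Weighted time:  ", weighted)
print("Fastest (equal runs):", fastest_unweighted)
print("Fastest (P1-heavy):  ", fastest_weighted)
```

With one run of each program C wins, but under the assumed P1-heavy mix A wins, which shows that the answer to "which computer is faster" depends on the workload.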


## Outline

$\square$ Quantitative Principles

## Quantitative Principles

$\square$ This is an introduction to design and analysis

- Take advantage of parallelism
  - ILP, DLP, TLP, ...
- Principle of locality
  - 90% of execution time is spent in only 10% of the code
- Focus on the common case
  - In making a design trade-off, favor the frequent case over the infrequent case
- Amdahl's Law
  - The performance improvement gained from using a faster mode is limited by the fraction of the time the faster mode can be used
- The Processor Performance Equation


## Amdahl's Law

Enhancement $E$ accelerates a fraction $F$ of a program by a factor $S$


Speedup due to enhancement $E$:

$$
\operatorname{Speedup}(E)=\frac{T_{\text {exe }}(\text { without } E)}{T_{\text {exe }}(\text { with } E)}=\frac{\operatorname{Performance}(\text { with } E)}{\operatorname{Performance}(\text { without } E)}
$$

$$
T_{\text {exe }}(\text { with } E)=T_{\text {exe }}(\text { without } E) *[(1-F)+F / S]
$$

$$
\operatorname{Speedup}(E)=\frac{T_{\text {exe }}(\text { without } E)}{T_{\text {exe }}(\text { with } E)}=\frac{1}{(1-F)+F / S}
$$

Best you could ever hope to do:

$$
\text { Speedup }_{\text {maximum }}=\frac{1}{\left(1-\text { Fraction }_{\text {enhanced }}\right)}
$$

## Amdahl's Law: example

$\square$ New CPU is 10 times faster!
$\square$ 60% of the time is spent on I/O, which remains almost the same, so only 40% of the execution time is enhanced

$$
\begin{aligned}
\text { Speedup }_{\text {overall }} & =\frac{1}{\left(1-\text { Fraction }_{\text {enhanced }}\right)+\frac{\text { Fraction }_{\text {enhanced }}}{\text { Speedup }_{\text {enhanced }}}} \\
& =\frac{1}{(1-0.4)+\frac{0.4}{10}}=\frac{1}{0.64}=1.56
\end{aligned}
$$

Apparently, it's human nature to be attracted by "10X faster" rather than keeping in perspective that it's just 1.6X faster overall
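The example can be reproduced numerically. This is a minimal sketch of Amdahl's Law as stated above; the function name `amdahl_speedup` and the list of swept speedup values are illustrative assumptions.

```python
def amdahl_speedup(fraction_enhanced: float, speedup_enhanced: float) -> float:
    """Overall speedup when a fraction F of the time is accelerated by a factor S."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


F = 0.4  # 60% of the time is I/O, so only 40% is enhanced

example = amdahl_speedup(F, 10)   # the 10x faster CPU from the example
upper_bound = 1.0 / (1.0 - F)     # limit for an infinitely fast CPU

print(f"10x faster CPU: overall speedup {example:.2f}")
print(f"Upper bound:    overall speedup {upper_bound:.2f}")

# Illustrative sweep: a faster CPU gives rapidly diminishing returns.
for s in (2, 10, 100, 1000):
    print(f"S = {s:>4}: speedup = {amdahl_speedup(F, s):.3f}")
```

Even an arbitrarily fast CPU cannot push the overall speedup past about 1.67 here, because the unenhanced I/O time dominates.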



## Aspect of CPU performance

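$$
\text { CPUtime }=\text { Execution time }=\frac{\text { seconds }}{\text { program }}=\frac{\text { instructions }}{\text { program }} * \frac{\text { cycles }}{\text { instruction }} * \frac{\text { seconds }}{\text { cycle }}=I C * C P I * T_{c}
$$

|  | IC | CPI | $T_{c}$ |
| :--- | :---: | :---: | :---: |
| Program | X |  |  |
| Compiler | X | (X) |  |
| Instr. Set | X | X |  |
| Organization |  | X | X |
| Technology |  |  | X |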
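CPU time is commonly written as the product of the three columns in the table: instruction count (IC), cycles per instruction (CPI), and clock cycle time. The sketch below illustrates that product; the instruction count, CPI values, and clock frequency are hypothetical numbers, not measurements from the course.

```python
def cpu_time(instruction_count: float, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds = IC * CPI * T_c."""
    return instruction_count * cpi * cycle_time_s


# Hypothetical program and machine.
ic = 2e9               # executed instructions
clock_hz = 2e9         # 2 GHz clock
t_c = 1.0 / clock_hz   # clock cycle time in seconds

baseline = cpu_time(ic, 1.5, t_c)
better_organization = cpu_time(ic, 1.2, t_c)  # same ISA and clock, lower CPI

print(f"Baseline:            {baseline:.2f} s")
print(f"Better organization: {better_organization:.2f} s")
```

Each factor can be changed on its own, which is why the table attributes IC, CPI, and cycle time to different design levels.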

## Instructions are not created equally

"Average Cycles per Instruction"

$C P I_{o p}=$ Cycles per instruction of type op
$I C_{o p}=$ Number of executed instructions of type op

$$
\text { CPUtime }=T_{c} * \sum\left(C P I_{o p} * I C_{o p}\right)
$$

"Instruction frequency"

$$
\overline{C P I}=\sum\left(C P I_{o p} * F_{o p}\right) \text { where } F_{o p}=I C_{o p} / I C
$$



## Average CPI: example

| Op | $F_{o p}$ | $C P I_{o p}$ | $F_{o p} * C P I_{o p}$ | \% time |
| :--- | :---: | :---: | :---: | :---: |
| ALU | $50 \%$ | 1 | 0.5 | $(33 \%)$ |
| Load | $20 \%$ | 2 | 0.4 | $(27 \%)$ |
| Store | $10 \%$ | 2 | 0.2 | $(13 \%)$ |
| Branch | $20 \%$ | 2 | 0.4 | $(27 \%)$ |

$\overline{C P I}=\sum\left(C P I_{o p} * F_{o p}\right)=0.5+0.4+0.2+0.4=1.5$

Invest resources where time is spent!
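The table can be recomputed from the instruction mix alone. The sketch below takes the frequencies and per-type CPI values from the table; the variable names are illustrative.

```python
# Instruction mix from the table: op -> (frequency F_op, CPI_op)
mix = {
    "ALU": (0.50, 1),
    "Load": (0.20, 2),
    "Store": (0.10, 2),
    "Branch": (0.20, 2),
}

# Average CPI is the frequency-weighted sum of the per-type CPI values.
avg_cpi = sum(f * cpi for f, cpi in mix.values())

# Share of execution time spent in each instruction type.
time_share = {op: f * cpi / avg_cpi for op, (f, cpi) in mix.items()}

print(f"Average CPI = {avg_cpi:.2f}")
for op, share in time_share.items():
    print(f"{op:<6} {share:.0%} of execution time")
```

ALU instructions make up half of the instruction count but only about a third of the time, while loads and branches together account for more than half, which is where an optimization pays off most.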

