

# EITF35: Introduction to Structured VLSI Design

### Part 3.1.1: FSMD

Liang Liu liang.liu@eit.lth.se



Lund University / EITF35/ Liang Liu 2016

### **Digital VLSI**

#### Lecture 2.2.1

- Data storage
- Setup & hold timing

Z

No Latch





### Outline

#### **FSMD** Overview

- Algorithmic state machine with data-path (ASMD)
- **Germitary Series and Series and**
- **Timing analysis of FSMD**



### Why FSMD? Start with algorithm

Task: sums four elements of an array, divides the sum by 8 and rounds the result to the closest integer

```
size = 4
sum = 0;
for i in (0 to size-1) do {
    sum = sum + a(i);}
q = sum / 8;
r = sum rem 8;
if (r > 3) {
    q = q+1;}
outp = q;
```

#### Two characteristics of an algorithm:

- Use of variables
   e.g., sum, or q = q + 1
- Sequential execution

e.g., sum must be finished before division



### **Converting algorithm to hardware**

#### "Dataflow" implementation in VHDL

5

Convert the algorithm in to combinational circuit





### **Dataflow Implementation: Drawbacks**

#### Problems with dataflow implementation:

- Can only be applied to simple trivial algorithm
- Not flexible
  - What if size=10, 100, 1000 ...
  - □ or size = n, i.e., size is determined by an external input
  - or changing operation depending on instructions



### **Alternatively?**

Hardware resembles the variable and sequential execution model

• Use **register** to store *intermediate data* and imitate *variable* 

e.g. sum=sum+a => sum\_reg+a\_reg->sum\_reg

Basic format of <u>RT operation</u>

 $r_{\text{dest}} \leftarrow f(r_{\text{src1}}, ..., r_{\text{srcn}})$ 

• Sequence of data manipulation and transfer among registers (RTL)





### **RT Operation: Timing**

$$\mathbf{r}_{dest} \leftarrow f(\mathbf{r}_{src1}, \mathbf{r}_{src2}, \dots, \mathbf{r}_{srcn})$$

#### **Timing**:

- Hardware! major difference between a variable and a register is that a explicit clock is embedded in an RT operation
- Rising edge of clk: outputs of source reg  $r_{src1}$   $r_{src2}$  etc. are available
- The output are passed to a combinational circuit that performs f()
- At the **NEXT** rising edge of the clock, the result is stored into r<sub>dest</sub>





### **RT Operation: Timing**



 $\boldsymbol{\varsigma}$ 

C

### Hardware Mapping of RT: Example 1

#### □ E.g. r1 ← r1+r2

- C1: r1\_next<=r1\_reg+r2\_reg
- C2: r1\_reg<=r1\_next</li>



### Hardware Mapping of RT: Example 1

#### □ E.g. r1 ← r1+r2

- C1: r1\_next<=r1\_reg+r2\_reg
- C2: r1\_reg<=r1\_next</li>



### Hardware Mapping of RT: Example 2

#### Multiple RT operations

How can we organize multiple operations on one register (in a time-multiplexing way)?



### **MIPS** pipeline



\* S

C

### **FSM as Control Path**

#### **FSMD: FSM with data path**

- Use a data path to realize all the required RT operations
- Use a control path (FSM) to specify the order of RT operation





### FSMD (FSM with Date Path)



C

### FSMD (FSM with Date Path)

#### **Control Path: FSM**

- •Command: the external command signal to the FSMD
- •Internal status: signal from the data path.
- •Control signal: output, used to control data path operation.
- External status: output, used to indicate the status of the FSMD



### FSMD (FSM with Date Path)



#### **Data Path:** perform all the required RT operations

- Data registers: store the intermediate results.
- •Functional units: perform RT operations
- Routing circuit: connection, selection (multiplexers)



### Outline

### Overview of FSMD

#### **Algorithmic state machine with data-path (ASMD)**

# FSMD design of a repetitive-addition multiplierTiming analysis of FSMD



### ASM (algorithmic state machine)

#### **ASM** (algorithmic state machine) chart

•Flowchart-like diagram, provide the same information as an FSM

•More descriptive, better for complex algorithm

•Can easily be transformed to VHDL code



### State Diagram and ASM Chart: Example 1





### **State Diagram and ASM Chart: Example 2**

21



### ASMD

#### **ASMD**:

Extend ASM chart to incorporate RT operations

RT operations are treated as another type of activity and be placed where the output signals are used

S0: r1←r1+1 if a>b r2←r2+a else r2←r2+b





### **ASMD: Timing**

Value is available at the register input before the NEXT clock tick

Z



## Suggestion: use meaningful names for your signals (direction\_function\_type)

### Outline

Overview of FSMD
Algorithmic state machine with data-path (ASMD)
FSMD design of a repetitive-addition multiplier
Timing analysis of FSMD



### Map Algorithm to FSMD

### **Example: Repetitive addition multiplier**

□ Basic algorithm: 7\*5 = 7+7+7+7

```
if (a_in=0 or b_in=0) then {
if (a_in=0 or b_in=0) then {
                                                r = 0; \}
   r = 0;
                                             else {
else {
                                                a = a_{in};
   a = a_{in};
                                                n = b_{in};
   n = b_{in};
                                                r = 0;
   r = 0;
                                      op:
                                            r = r + a;
                                               n = n - 1;
   while (n != 0){
                                                if (n = 0) then {
      r = r + a;
                                                   goto stop;}
      n = n-1;
                                                else {
}
                                                  goto op;}
return(r)
                                             }
                                      stop: return(r);
```

#### Pseudo code

#### **ASMD-friendly code**



### **ASMD** Chart

#### □Input:

•a\_in, b\_in: 8-bit unsigned

#### •clk, reset

start: command

#### **Output:**

•r: 16-bit unsigned

ready: ready for new input

### ASMD chart

- •3 registers (n,a,r)
- 4 states
- Data-path: RT operations
- •FSM: state transition

#### **Translate ASMD to Hardware**



### **Construction of FSMD**

#### Construction of the data path

- List all possible RT operations
- Group RT operation according to the destination register
- Add combinational circuit/mux
  - RT operations with the r register:
    - $r \leftarrow r$  (in the idle state)
    - $r \leftarrow 0$  (in the load and abo states)
    - $\mathbf{r} \leftarrow \mathbf{r} + \mathbf{a}$  (in the op state)
  - RT operations with the n register:
    - $n \leftarrow n$  (in the idle state)
    - $n \leftarrow b_{in}$  (in the load and ab0 states)
    - $n \leftarrow n 1$  (in the op state)
  - RT operations with the a register:
    - $\mathbf{a} \leftarrow \mathbf{a}$  (in the idle and op states)
    - $= \mathbf{a} \leftarrow \mathbf{a}_{in}$  (in the load and ab0 states)

### Grouping RT Operations

### **Construction of the Date Path**

#### Circuit associated with r register



٠

### **Construction of the Date Path**



### **Construction of the Control Path**

#### □Input of FSM



### **VHDL Follow the Block Diagram**

```
Entity
```

```
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
entity seq_mult is
   port (
      clk, reset: in std_logic;
      start: in std_logic;
      a_in, b_in: in std_logic_vector(7 downto 0);
      ready: out std_logic;
      r: out std_logic_vector(15 downto 0)
      );
end seq_mult;
```

### **FSM** (state registers)



### FSM (next-state/output logic)



### Data Path (Data Registers)

```
--- data path: data register
process (clk, reset)
begin
   if reset='1' then
      a_reg <= (others=>'0');
      n_reg <= (others => '0');
      r_reg <= (others => '0');
   elsif (clk'event and clk='1') then
      a_reg <= a_next;
      n_reg <= n_next;
      r_reg <= r_next;</pre>
   end if;
end process;
```





### Data Path (Multiplexer Routing)

idle

ab0 load

00

idle

ab0 load

op

idle

ab0

load

out

-1



### **Design Flow**





### Outline

Overview of FSMD
Algorithmic state machine with data-path (ASMD)
FSMD design of a repetitive-addition multiplier
Timing analysis of FSMD



### **Timing and Performance of FSMD**

#### Maximal clock rate

•More difficult to analyze because of *two interactive loops*, depend on the specified design

•The boundary of the clock rate can be found, *best/worse-case* 



N7

### **Clock-Rate Boundary: Best-Case**

#### Best-case scenario:

- •Control signals needed at late stage
- •Status signal available at early stage





### **Clock-Rate Boundary: Best-Case**

#### Best-case scenario

- •Output logic overlaps with the data path, no extra delay
- •Next-state logic of the control path and the data path are in parallel
- •Clock rate dominates by the data-path (most of the case)



41 Lund University / EITF35/ Liang Liu 2016

### **Clock-Rate Boundary: Best-Case**

#### Best-case scenario

- •Output logic overlaps with the data path, no extra delay
- •Next-state logic of the control path and the data path are in parallel
- •Clock rate dominates by the data-path (most of the case)



7

42 Lund University / EITF35/ Liang Liu 2016

### **Clock-Rate Boundary: Worst-Case**

#### Worst-case scenario:

•Control signals needed at early stage

•Status signal available at late stage



# Thanks!



Lund University / EITF35/ Liang Liu 2016

### **Clock-Rate Boundary: Worst-Case**

#### Worst-case scenario

- •The data path must wait for the FSM to generate the output signals
- •The control path must wait for status signals to generate the next-state
- •Clock period includes the delays of all combinational components



### **Clock-Rate Boundary: Worst-Case**

#### Worst-case scenario

- •The data path must wait for the FSM to generate the output signals
- •The control path must wait for status signals to generate the next-state
- •Clock period includes the delays of all combinational components

