

# EITF35: Introduction to Structured VLSI Design

## Introduction to FPGA design

Steffen Malkowsky

Steffen.Malkowsky@eit.lth.se

Slides from Chenxin Zhang



- What is an FPGA?
  - Field Programmable Gate Array

[Figure labels: configurable logic blocks, configuration memory, interconnects, IO blocks]



- What is an FPGA?
  - Field Programmable Gate Array
  - Configurable logic blocks + interconnects + IOs + memory
- Why do we use it?
  - High performance & Flexible
  - Shorter time to market





#### **Design Starts**



Source: VLSI Research, Inc.

- Where do we use it?
  - Prototyping
  - Computer vision
  - Medical imaging
  - Software-defined radio
  - ...





# FPGA vs. Microprocessor

|                              | Intel Itanium 2                                  | Xilinx Virtex-II Pro (XC2VP100) |
|------------------------------|--------------------------------------------------|---------------------------------|
| Technology                   | 0.13 μm                                          | 0.13 μm                         |
| Clock speed                  | 1.6 GHz                                          | 180 MHz                         |
| Internal memory bandwidth    | 102 GB/s                                         | 7.5 TB/s                        |
| # Processing units           | 5 FPU (2 MACs+1 FPU)<br>6 MMU<br>6 Integer units | 212 FPU or 300+Integer units or |
| Power consumption            | 130 W                                            | 15 W                            |
| Peak performance             | 8 GFLOPS                                         | 38 GFLOPS                       |
| Sustained performance        | ~2 GFLOPS                                        | ~19 GFLOPS                      |
| IO/External memory bandwidth | 6.4 GB/s                                         | 67 GB/s                         |

(Courtesy: Nallatech)
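The table invites a performance-per-watt comparison. The short Python sketch below only does arithmetic on the sustained-performance and power figures quoted in the table; the dictionary layout and the function name `gflops_per_watt` are illustrative choices, and the sustained figures are approximate in the source.

```python
# Figures copied from the comparison table (sustained GFLOPS are approximate).
devices = {
    "Intel Itanium 2": {"peak": 8.0, "sustained": 2.0, "power_w": 130.0},
    "Xilinx Virtex-II Pro": {"peak": 38.0, "sustained": 19.0, "power_w": 15.0},
}


def gflops_per_watt(gflops, power_w):
    """Energy efficiency as throughput divided by power consumption."""
    return gflops / power_w


# Sustained efficiency for each device.
efficiency = {
    name: gflops_per_watt(d["sustained"], d["power_w"])
    for name, d in devices.items()
}

for name, value in efficiency.items():
    print(f"{name}: {value:.3f} sustained GFLOPS/W")

# How many times more efficient the FPGA is, according to these figures.
ratio = efficiency["Xilinx Virtex-II Pro"] / efficiency["Intel Itanium 2"]
print(f"FPGA advantage: roughly {ratio:.0f}x")
```

With these numbers the FPGA comes out roughly 80 times more efficient in sustained GFLOPS per watt, despite a clock almost nine times slower.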

Department of Electrical and Information Technology, Lund University

#### **FPGA** devices

- Manufacturers:
  - Xilinx: Virtex, Kintex, Artix, Spartan
  - Altera: Cyclone, Arria, Stratix
  - Lattice Semiconductor: flash, low power
  - Microsemi (Actel): antifuse, mixed-signal
  - Achronix: high speed
  - QuickLogic: application-specific (handheld)



















#### Some FPGA boards

- ERICSSON F500
- Xilinx Virtex-5 OpenSPARC Evaluation Platform
   http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,795&Prod=XUPV5
- Xilinx Virtex-II Pro Development System
   http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,794&Prod=XUPV2P
- We use the Xilinx Nexys 4 Artix-7 FPGA board in this course!
   http://store.digilentinc.com/nexys-4-artix-7-fpga-trainer-board-limited-time-see-nexys4-ddr/

## Xilinx Nexys 4 Artix-7





#### **FPGA** architectures

- Early FPGAs
  - N x N array of unit cells (CLB + routing)
  - Special routing along center axis
- Next Generation FPGAs
  - M x N unit cells
  - Small block RAMs around edges
- More recent FPGAs
  - Added block RAM arrays
  - Added multiplier cores
  - Added processor cores







#### **FPGA** architecture trends

- Memories
  - Single & Dual-port RAMs
- Digital Signal Processor Engines
- Embedded Processors
  - Hard core (dedicated processors)
  - Soft core (synthesized from an HDL)
- High speed/performance I/O connectivity
  - PCIe interface block
  - I/O transceiver
- Clock management blocks



## **Programming technology**

| Feature              | SRAM                    | Antifuse                       | Flash/E²PROM                   |
|----------------------|-------------------------|--------------------------------|--------------------------------|
| Technology           | State-of-the-art        | One or more generations behind | One or more generations behind |
| Reprogrammable       | Yes (in system)         | No                             | Yes (in system or offline)     |
| Reprogramming speed  | Fast                    | N/A                            | 3x slower than SRAM            |
| Volatile             | Yes                     | No                             | No                             |
| Instant-on           | No                      | Yes                            | Yes                            |
| Security             | Acceptable              | Very good                      | Very good                      |
| Size of config. cell | Large (six transistors) | Very small                     | Medium-small (two transistors) |
| Power consumption    | Medium                  | Low                            | Medium                         |

#### Xilinx FPGA architecture

#### SRAM-based FPGA



## Configurable logic block (CLB) (I)

One CLB contains four slices



## Configurable logic block (CLB) (II)

- One CLB contains four slices
- Each slice:
  - Four Look-up tables (LUTs)
  - Eight D Flip-Flops (DFFs)
  - Multiplexers and arithmetic gates
  - Carry logic
- Left-hand slice (SLICEM)
  - Distributed RAM
  - Shift register





## Look-up table (LUT) (I)



## Look-up table (LUT) (II)

- The inputs are used as an address (pointer) into the LUT.
- The address is decoded using a hierarchy of transmission-gate multiplexers.
- A transmission gate either passes its input or is high-impedance.
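The lookup mechanism can be sketched in a few lines of Python. This is a behavioural model only: the configuration bits act as the truth table, and each input bit drives one level of 2:1 multiplexers until a single output remains. The names `mux2`, `lut_read`, and `build_config`, and the example function `(a and b) or (c xor d)`, are illustrative assumptions.

```python
def mux2(sel, d0, d1):
    """2:1 multiplexer: passes d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def lut_read(config_bits, inputs):
    """Read a k-input LUT.

    config_bits: list of 2**k stored bits (the truth table).
    inputs: list of k bits, inputs[0] being the least significant address bit.
    """
    level = list(config_bits)
    for sel in inputs:
        # Each input bit halves the number of candidates via a row of muxes.
        level = [mux2(sel, level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


def build_config(func, k):
    """Fill the truth table by evaluating func for every input combination."""
    return [func(*[(addr >> i) & 1 for i in range(k)]) & 1
            for addr in range(1 << k)]


# Example: a 4-input LUT configured as (a and b) or (c xor d).
config = build_config(lambda a, b, c, d: (a & b) | (c ^ d), 4)
print(lut_read(config, [1, 1, 0, 0]))
```

Changing the function only changes the stored bits; the multiplexer tree stays the same, which is what makes the logic block configurable.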



## LUT based RAM (Distributed RAM)

- A normal LUT only performs the "read" operation.
- The "write" operation requires address decoders and a write enable.
- LUT RAMs can be concatenated to create larger RAMs.
- Some of the LUTs can also be used as shift registers.
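A minimal behavioural sketch of these three uses, in Python. The class name `LutRam`, the one-bit-wide organisation, and the default of four address bits are assumptions made for illustration; real devices differ in LUT size and in which slices support these modes.

```python
class LutRam:
    """Behavioural model of a LUT whose storage cells can be rewritten."""

    def __init__(self, address_bits=4):
        self.cells = [0] * (1 << address_bits)

    def read(self, address):
        # Same operation as an ordinary LUT lookup.
        return self.cells[address]

    def write(self, address, data, write_enable):
        # The decoded address selects one cell; it is updated only when enabled.
        if write_enable:
            self.cells[address] = data & 1

    def shift_in(self, data):
        # Shift-register mode: every cell moves one position, new bit enters cell 0.
        self.cells = [data & 1] + self.cells[:-1]


def wide_read(rams, address):
    """Concatenate several 1-bit RAMs side by side into one wider word."""
    return sum(ram.read(address) << i for i, ram in enumerate(rams))


# Four 1-bit RAMs form a 16 x 4 memory; store the value 0b1010 at address 3.
rams = [LutRam() for _ in range(4)]
for i, ram in enumerate(rams):
    ram.write(3, (0b1010 >> i) & 1, write_enable=True)
print(wide_read(rams, 3))

# Shift-register use: reading address n returns the bit shifted in n + 1 steps ago.
srl = LutRam()
for bit in [1, 0, 1]:
    srl.shift_in(bit)
print(srl.read(2))
```

The example concatenates in width; deeper memories would additionally need an output multiplexer selected by the extra address bits.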



### Xilinx Artix-7 FPGAs

#### XC7A100T:

- 15,850 Slices (each containing four LUTs and 8 flip-flops)
- 240 DSP slices
- Max Block RAM: 4,860 Kb, Max Distributed RAM: 1,188 Kb

| Device   | Logic Cells | CLB Slices | Max Distributed RAM (Kb) | DSP48E1 Slices | Block RAM 18 Kb | Block RAM 36 Kb | Block RAM Max (Kb) | CMTs | PCIe | GTPs | XADC Blocks | Total I/O Banks | Max User I/O |
|----------|-------------|------------|--------------------------|----------------|-----------------|-----------------|--------------------|------|------|------|-------------|-----------------|--------------|
| XC7A12T  | 12,800         | 2,000                               | 171                            | 40                    | 40                  | 20    | 720         | 3                   | 1                   | 2    | 1         | 3                    | 150                |
| XC7A15T  | 16,640         | 2,600                               | 200                            | 45                    | 50                  | 25    | 900         | 5                   | 1                   | 4    | 1         | 5                    | 250                |
| XC7A25T  | 23,360         | 3,650                               | 313                            | 80                    | 90                  | 45    | 1,620       | 3                   | 1                   | 4    | 1         | 3                    | 150                |
| XC7A35T  | 33,280         | 5,200                               | 400                            | 90                    | 100                 | 50    | 1,800       | 5                   | 1                   | 4    | 1         | 5                    | 250                |
| XC7A50T  | 52,160         | 8,150                               | 600                            | 120                   | 150                 | 75    | 2,700       | 5                   | 1                   | 4    | 1         | 5                    | 250                |
| XC7A75T  | 75,520         | 11,800                              | 892                            | 180                   | 210                 | 105   | 3,780       | 6                   | 1                   | 8    | 1         | 6                    | 300                |
| XC7A100T | 101,440        | 15,850                              | 1,188                          | 240                   | 270                 | 135   | 4,860       | 6                   | 1                   | 8    | 1         | 6                    | 300                |
| XC7A200T | 215,360        | 33,650                              | 2,888                          | 740                   | 730                 | 365   | 13,140      | 10                  | 1                   | 16   | 1         | 10                   | 500                |

## Programmable Interconnects (I)



## **Programmable Interconnects (II)**

- Programmable switches, also called programmable interconnect points (PIPs).
- Implemented using transmission gates.
- Several types of PIPs:







## **FPGA Design flow**

- Synthesis
  - Parses HDL design
  - Infers Xilinx primitives
  - Generates design netlist
- Translate
  - Merges incoming netlists and constraints into a design file
- Map
  - Maps (places) design into the available resources on the target device
- Place and Route
  - Places and routes design



## Synthesis constraints







## **Are FPGAs perfect?**



#### **FPGAs** are inefficient

- Compared to ASICs, penalties in FPGAs:
  - Area: 17-54x
  - Speed: 3-7x
  - Power: 6-62x
- Main culprit: INTERCONNECT!





# Coarse-grained reconfigurable architecture

- Currently in FPGA
  - Dedicated building blocks: multiplier, DSP core, processor
  - Partial configuration
- Moving towards coarse-grained architecture:
  - Block-level instead of bit manipulations
  - Lower area and power consumption
  - High-level programming: e.g. Xilinx Vivado
  - Run-time configuration



#### References

- Clive "Max" Maxfield, "The Design Warrior's Guide to FPGAs – Devices, Tools and Flows", Elsevier, 2004.
- Bill Jason P. Tomas, "Introduction to Field Programmable Gate Arrays (FPGAs)".
- Xilinx, "Artix-7 FPGA Family Data Sheet".

