

# EITF35: Introduction to Structured VLSI Design

# Introduction to FPGA design

Steffen Malkowsky

Steffen.Malkowsky@eit.lth.se

Slides from Chenxin Zhang



- What is an FPGA?
  - Field Programmable Gate Array
  - Configurable logic blocks + interconnects + IOs + memory
- Why do we use it?
  - High performance & Flexible
  - Shorter time to market
- Where do we use it?
  - Prototyping
  - Computer vision
  - Medical imaging
  - Software-defined radio

#### **FPGA vs. Microprocessor**

|                                 | Intel Itanium 2                                    | Xilinx Virtex-II Pro<br>(XC2VP100)     |
|---------------------------------|----------------------------------------------------|----------------------------------------|
| Technology                      | 0.13 µm                                            | 0.13 µm                                |
| Clock speed                     | 1.6 GHz                                            | 180 MHz                                |
| Internal memory bandwidth       | 102 GBytes/s                                       | 7.5 TBytes/s                           |
| Number of processing units      | 5 FPU (2 MACs + 1 FPU)<br>6 MMU<br>6 integer units | 212 FPU or<br>300+ integer units or<br> |
| Power consumption               | 130 W                                              | 15 W                                   |
| Peak performance                | 8 GFLOPs                                           | 38 GFLOPs                              |
| Sustained performance           | ~2 GFLOPs                                          | ~19 GFLOPs                             |
| I/O / external memory bandwidth | 6.4 GBytes/s                                       | 67 GBytes/s                            |

(Courtesy: Nallatech)
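To make the comparison concrete, the table's figures can be turned into performance per watt. The following is a minimal sketch that only reuses the numbers quoted in the table above; the sustained values are approximate in the source ("~2" and "~19"), and the dictionary layout and the helper name `gflops_per_watt` are illustrative assumptions.

```python
# Figures copied from the comparison table above (sustained values are approximate).
chips = {
    "Intel Itanium 2": {"peak_gflops": 8.0, "sustained_gflops": 2.0, "power_w": 130.0},
    "Xilinx Virtex-II Pro (XC2VP100)": {"peak_gflops": 38.0, "sustained_gflops": 19.0, "power_w": 15.0},
}


def gflops_per_watt(chip, key="sustained_gflops"):
    """Return the chosen performance figure divided by power consumption."""
    return chip[key] / chip["power_w"]


for name, chip in chips.items():
    print(f"{name}: {gflops_per_watt(chip):.3f} sustained GFLOPs/W")

# Ratio of FPGA to CPU energy efficiency, using the sustained figures.
efficiency_ratio = (
    gflops_per_watt(chips["Xilinx Virtex-II Pro (XC2VP100)"])
    / gflops_per_watt(chips["Intel Itanium 2"])
)
print(f"FPGA / CPU efficiency ratio: {efficiency_ratio:.1f}x")
```

With these table values the FPGA delivers far more sustained throughput per watt despite a clock roughly nine times slower, because the work is spread across many parallel units.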

## **FPGA** devices

- Manufacturers:
  - Xilinx: Virtex, Kintex, Artix, Spartan
  - Altera: Cyclone, Arria, Stratix
  - Lattice Semiconductor: flash, low power
  - Microsemi (Actel): antifuse, mixed-signal
  - Achronix: high speed
  - QuickLogic: application-specific (handheld)
























# **FPGA** architectures

- Early FPGAs
  - N x N array of unit cells (CLB + routing)
  - Special routing along center axis
- Next Generation FPGAs
  - M x N unit cells
  - Small block RAMs around edges
- More recent FPGAs
  - Added block RAM arrays
  - Added multiplier cores
  - Added processor cores





#### **FPGA** architecture trends

- Memories
  - Single & Dual-port RAMs
- Digital Signal Processor Engines
- Embedded Processors
  - Hard core (dedicated processors)
  - Soft core (synthesized from an HDL)
- High speed/performance I/O connectivity
  - PCIe interface block
  - I/O transceiver
- Clock management blocks



# **Programming technology**

| Feature              | SRAM                       | Antifuse                       | Flash/E²PROM                      |
|----------------------|----------------------------|--------------------------------|-----------------------------------|
| Technology           | State-of-the-art           | One or more generations behind | One or more generations behind    |
| Reprogrammable       | Yes<br>(in system)         | No                             | Yes<br>(in system or offline)     |
| Reprogramming speed  | Fast                       | N/A                            | 3x slower than SRAM               |
| Volatile             | Yes                        | No                             | No                                |
| Instant-on           | No                         | Yes                            | Yes                               |
| Security             | Acceptable                 | Very good                      | Very good                         |
| Size of config. cell | Large<br>(six transistors) | Very small                     | Medium-small<br>(two transistors) |
| Power consumption    | Medium                     | Low                            | Medium                            |


#### Xilinx FPGA architecture

**SRAM-based FPGA** 



# Configurable logic block (CLB) (I)

One CLB contains four slices



# Configurable logic block (CLB) (II)

- One CLB contains four slices
- Each slice:
  - Two Look-up tables (LUTs)
  - Two D Flip-Flops (DFFs)
  - Multiplexers and arithmetic gates
  - Carry logic
- Left-hand slice (SLICEM)
  - Distributed RAM
  - Shift register





# Look-up table (LUT) (I)



# Look-up table (LUT) (II)

- The inputs are used as a pointer (address) into the LUT.
- The address is decoded using a hierarchy of transmission-gate MUXs.
- A transmission gate either "passes" its input or is "high-impedance".
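The lookup described above can be sketched in software. This is a behavioural model only, not a description of the actual circuit: the function name `make_lut` and the convention that the first input is the least significant address bit are assumptions made for illustration. Each loop iteration plays the role of one level of the MUX hierarchy, halving the set of candidate bits.

```python
def make_lut(truth_table):
    """Return a function modelling a k-input LUT built from 2**k stored bits."""
    size = len(truth_table)
    k = size.bit_length() - 1
    if size == 0 or size != 1 << k:
        raise ValueError("truth table length must be a power of two")

    def lut(*inputs):
        if len(inputs) != k:
            raise ValueError(f"expected {k} inputs, got {len(inputs)}")
        candidates = list(truth_table)
        # One MUX level per input: inputs[0] is the least significant address bit.
        for bit in inputs:
            candidates = [
                candidates[i + 1] if bit else candidates[i]
                for i in range(0, len(candidates), 2)
            ]
        return candidates[0]

    return lut


# A 2-input AND gate: only address 0b11 stores a 1.
and2 = make_lut([0, 0, 0, 1])

# A 4-input parity (XOR) function stored as 16 configuration bits.
parity4 = make_lut([bin(addr).count("1") & 1 for addr in range(16)])

print(and2(1, 1), and2(1, 0))
print(parity4(1, 0, 1, 1))
```

Any Boolean function of four inputs fits in the same 16 bits; only the stored contents change, which is what makes the LUT configurable.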





#### LUT based RAM (Distributed RAM)

- A normal LUT performs only the "read" operation.
- The "write" operation requires address decoders and a write enable.
- Can be concatenated to create larger RAMs.
- Some of the LUTs can also be used as shift registers.
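The read, write, and concatenation behaviour listed above can be illustrated with a small behavioural model. This is a sketch under stated assumptions: the class names `LutRam16` and `LutRam32` are hypothetical, writes take effect immediately (clocking is not modelled), and the upper address bit is assumed to select between the two LUTs.

```python
class LutRam16:
    """Behavioural model of one 4-input LUT used as a 16x1-bit RAM."""

    def __init__(self):
        self.cells = [0] * 16  # the LUT's 16 configuration bits

    def read(self, addr):
        # Same lookup as a normal LUT: the inputs act as the address.
        return self.cells[addr & 0xF]

    def write(self, addr, bit, write_enable=True):
        # The stored bit changes only when write enable is asserted.
        if write_enable:
            self.cells[addr & 0xF] = bit & 1


class LutRam32:
    """Two 16x1 LUT RAMs concatenated into a 32x1 RAM."""

    def __init__(self):
        self.banks = [LutRam16(), LutRam16()]

    def read(self, addr):
        # Address bit 4 selects the LUT; bits 3..0 address inside it.
        return self.banks[(addr >> 4) & 1].read(addr)

    def write(self, addr, bit, write_enable=True):
        self.banks[(addr >> 4) & 1].write(addr, bit, write_enable)


ram = LutRam32()
ram.write(21, 1)                      # lands in the second LUT, cell 5
ram.write(5, 1, write_enable=False)   # ignored: write enable is low
print(ram.read(21), ram.read(5))
```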



# Xilinx Spartan-3 FPGAs

- XC3S200:
  - $480 \text{ CLBs} = 480 \times 4 \text{ slices} = 480 \times 4 \times 2 = 3{,}840 \text{ (4-input LUT + register) pairs}$
  - 12 18-Kbit dual-port BRAMs = 12 × 18 Kb = 216 Kb
  - Distributed RAM: $480 \times 2 \times 2 \times 2^{4} = 30{,}720 \text{ b} = 30 \text{ Kb}$ (only the two SLICEM slices per CLB, two LUTs each)
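The XC3S200 arithmetic can be checked with a few lines of code. This is a minimal sketch; the constant names are illustrative, and the values come from the slide and from the data sheet table and notes in this section (K = 1024, 1.125 "effectiveness" factor for equivalent logic cells).

```python
CLBS = 480             # total CLBs in the XC3S200
SLICES_PER_CLB = 4
LUTS_PER_SLICE = 2
SLICEM_PER_CLB = 2     # only these slices support distributed RAM
BITS_PER_LUT = 2 ** 4  # a 4-input LUT stores 16 bits
BRAM_BLOCKS = 12
BRAM_KBITS_EACH = 18

slices = CLBS * SLICES_PER_CLB
luts = slices * LUTS_PER_SLICE
logic_cells = CLBS * SLICES_PER_CLB * LUTS_PER_SLICE * 1.125
block_ram_kbits = BRAM_BLOCKS * BRAM_KBITS_EACH
distributed_bits = CLBS * SLICEM_PER_CLB * LUTS_PER_SLICE * BITS_PER_LUT
distributed_kbits = distributed_bits // 1024

print(f"slices: {slices}, LUT/register pairs: {luts}")
print(f"equivalent logic cells: {logic_cells:.0f}")
print(f"block RAM: {block_ram_kbits} Kb")
print(f"distributed RAM: {distributed_bits} b = {distributed_kbits} Kb")
```

The results agree with the XC3S200 row of the data sheet table: 4,320 equivalent logic cells, 30K distributed RAM bits, and 216K block RAM bits.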

| Device                | System Gates | Equivalent Logic Cells <sup>1</sup> | CLB Rows | CLB Columns | Total CLBs | Distributed RAM Bits<br>(K=1024) | Block RAM Bits<br>(K=1024) | Dedicated Multipliers | DCMs | Maximum User I/O | Maximum Differential I/O Pairs |
|-----------------------|--------------|-------------------------------------|----------|-------------|------------|----------------------------------|----------------------------|-----------------------|------|------------------|--------------------------------|
| XC3S50 <sup>2</sup>   | 50K          | 1,728                               | 16       | 12          | 192        | 12K                              | 72K                        | 4                     | 2    | 124              | 56                             |
| XC3S200 <sup>2</sup>  | 200K         | 4,320                               | 24       | 20          | 480        | 30K                              | 216K                       | 12                    | 4    | 173              | 76                             |
| XC3S400 <sup>2</sup>  | 400K         | 8,064                               | 32       | 28          | 896        | 56K                              | 288K                       | 16                    | 4    | 264              | 116                            |
| XC3S1000 <sup>2</sup> | 1M           | 17,280                              | 48       | 40          | 1,920      | 120K                             | 432K                       | 24                    | 4    | 391              | 175                            |
| XC3S1500              | 1.5M         | 29,952                              | 64       | 52          | 3,328      | 208K                             | 576K                       | 32                    | 4    | 487              | 221                            |
| XC3S2000              | 2M           | 46,080                              | 80       | 64          | 5,120      | 320K                             | 720K                       | 40                    | 4    | 565              | 270                            |
| XC3S4000              | 4M           | 62,208                              | 96       | 72          | 6,912      | 432K                             | 1,728K                     | 96                    | 4    | 633              | 300                            |
| XC3S5000              | 5M           | 74,880                              | 104      | 80          | 8,320      | 520K                             | 1,872K                     | 104                   | 4    | 633              | 300                            |

#### Notes:

1. Logic Cell = 4-input Look-Up Table (LUT) plus a 'D' flip-flop. "Equivalent Logic Cells" equals "Total CLBs" x 8 Logic Cells/CLB x 1.125 effectiveness.
2. These devices are available in Xilinx Automotive versions as described in **DS314**: Spartan-3 Automotive XA FPGA Family.

#### **Programmable Interconnects (I)**



#### **Programmable Interconnects (II)**

- Programmable switches, also called programmable interconnect points (PIPs).
- Implemented using transmission gates.
- Several types of PIPs:







# **FPGA Design flow**

- Synthesis
  - Parses HDL design
  - Infers Xilinx primitives
  - Generates design netlist
- Translate
  - Merges incoming netlists and constraints into a design file
- Map
  - Maps (places) design into the available resources on the target device
- Place and Route
  - Places and routes design



#### Synthesis constraints







#### **Are FPGAs perfect?**



#### **FPGAs are inefficient**

- Compared to ASICs, penalties in FPGAs:
  - Area: 17-54x
  - Speed: 3-7x
  - Power: 6-62x

#### Main culprit: INTERCONNECT!



# **Tabula Spacetime**

- Ultra-rapid full/partial reconfiguration at multi-GHz rates, which makes it possible to fold more functions onto the same hardware
- Their claim:
  - 2.5x logic density
  - 3.7x DSP performance

#### www.tabula.com





# **Coarse-grained reconfigurable architecture**

- Currently in FPGAs
  - Dedicated building blocks: multiplier, DSP core, processor
  - Partial configuration
- Moving towards coarse-grained architectures:
  - Block-level instead of bit-level manipulations
  - Lower area and power consumption
  - High-level programming, e.g. Xilinx Vivado
  - Run-time configuration



#### References

- Clive "Max" Maxfield, "The Design Warrior's Guide to FPGAs – Devices, Tools and Flows", Elsevier, 2004.
- Bill Jason P. Tomas, "Introduction to Field Programmable Gate Arrays (FPGAs)".
- Xilinx, "Spartan-3 FPGA Family Data Sheet".

