

# EITF35: Introduction to Structured VLSI Design

## Introduction to FPGA design

#### Steffen Malkowsky

Steffen.Malkowsky@eit.lth.se



https://semiengineering.com/big-trouble-at-3nm/

## WWW.FPGA

- What is FPGA?
  - Field Programmable Gate Array
  - Configurable logic blocks + interconnects + IOs + memory
- Why do we use it?
  - High performance & Flexible
  - Shorter time to market
- Where do we use it?
  - Prototyping
  - Computer vision
  - Medical imaging
  - Software-defined radio

#### **FPGA vs. Microprocessor**

|                                 | Intel i9-9900K  | Xilinx Virtex 7<br>UltraScale |
|---------------------------------|-----------------|-------------------------------|
| Technology                      | 14 nm           | 16 nm                         |
| Clock speed                     | 3.6 GHz / 5 GHz | ~500 MHz*                     |
| Power consumption               | 95 W            | 30 W*                         |
| Dhrystone performance           | 400 GIPS        |                               |
| FP performance                  | 240 GFLOPS      | ~1000 GFLOPS*                 |
| I/O / external memory bandwidth | 41.6 GB/s       | 200 GB/s                      |

\* Depending on deployed hardware



#### Microprocessors including FPGAs

#### Skylake + FPGA on Purley



- Power for FPGA is drawn from socket & requires modified Purley platform specs
- Platform Modifications include Stackup, Clock, Power Delivery, Debug, Power up/down sequence, Misc IO pins (see BOM cost section)

|                                                                               |                                                                                                  |                                                                                      |
|-------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------|
| Cores                                                                         | Up to 28C                                                                                        |                                                                                      |
| FPGA                                                                          | Altera®                                                                                          |                                                                                      |
| Memory                                                                        | 6 channels<br>RDIMM, LRDIMM,<br>2666 1DPC,<br>2133, 2400 2DPC                                    |                                                                                      |
| Intel® UPI                                                                    | 2 channels<br>(10.4, 9.6 GT/s)                                                                   |                                                                                      |
| PCIe*                                                                         | PCIe* 3.0 (8.0, 5.0, 2.5 GT/s)<br>32 lanes per CPU<br>Bifurcation support: x16, x8, x4           | PCIe* 3.0 (8.0, 5.0, 2.5 GT/s)<br>16 lanes per FPGA<br>Bifurcation support: x8       |
| High Speed Serial Interface<br>(different board design based on HSSI config)  | N/A                                                                                              | 2x PCIe 3.0 x8<br>Direct Ethernet (4x10 GbE, 2x40 GbE, 10x10 GbE, 2x25 GbE)          |

Courtesy: Anandtech





### **FPGA** devices

- Manufacturers:
  - Xilinx: Virtex, Kintex, Artix, Spartan
  - Intel (formerly Altera): Cyclone, Arria, Stratix
  - Lattice Semiconductor: flash, low power
  - Microsemi (Actel): antifuse, mixed-signal
  - Achronix: high speed
  - QuickLogic: application-specific (handheld)











#### Some FPGA boards

#### Xilinx ZYnq BOard



https://reference.digilentinc.com/reference/programmable-logic/zybo/start



#### Some FPGA boards

#### Xilinx Virtex-5 OpenSPARC Evaluation Platform



http://www.digilentinc.com/Products/Detail.cfm?NavPath=2,400,795&Prod=XUPV5



#### Xilinx Nexys 4 Artix-7

#### We use the Xilinx Nexys 4 Artix-7 FPGA board in this course!



http://store.digilentinc.com/nexys-4-artix-7-fpga-trainer-board-limited-time-see-nexys

### **FPGA** architectures

- Early FPGAs
  - N x N array of unit cells (CLB + routing)
  - Special routing along center axis
- Next Generation FPGAs
  - M x N unit cells
  - Small block RAMs around edges
- More recent FPGAs
  - Added block RAM arrays
  - Added multiplier cores
  - Added processor cores





#### **FPGA** architecture trends

- Memories
  - Single & Dual-port RAMs
- Digital Signal Processor Engines
- Embedded Processors
  - Hard core (dedicated processors)
  - Soft core (synthesized from an HDL)
- High speed/performance I/O connectivity
  - PCIe interface block
  - I/O transceiver
- Clock management blocks
- Transceiver units





#### Xilinx 7-Series FPGA architecture

Based on: Advanced Silicon Modular Block (ASMBL)

- Allows varying feature mixes optimized for different domains
- Lowers the dependency between I/O count and array size
- IP blocks can be scaled independently of surrounding resources





## 7-series Configurable logic block (CLB) (I)

- One CLB contains two slices
- Resource for combinational logic and flip-flops
- Switch matrix allows routing to other FPGA resources
- Carry chain propagates vertically from one slice to another



Courtesy: Xilinx



## Configurable logic block (CLB) (II)

- CLBs are arranged pairwise in a symmetrical fashion
  - Higher density
  - Clock lines can be shared
- Routing becomes easier



# Slice Resources (I)

- Each slice contains:
  - Four six-input Look-up tables (LUTs)
  - Eight D Flip-Flops (DFFs)
  - Multiplexers and arithmetic gates
  - Carry logic
- Left-hand slice (SLICEM)
  - May be used for memory aka Distributed RAM



Courtesy: Xilinx

# Slice Resources (II)

- A 6-input LUT may be split into two 5-input LUTs
  - Negligible effect on performance
  - One or two outputs
  - Any function of six variables, or two independent functions of five shared variables, may be mapped
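The dual-output behaviour can be sketched in Python. This is a simplified model, not a vendor primitive: the function names and the convention that the lower 32 table bits feed the second output (here called `o5`) while the upper 32 bits feed the main output (`o6`) are assumptions made for illustration.

```python
def truth_table(func, n_inputs):
    """Pack func(index) for every input combination into an integer truth table."""
    table = 0
    for index in range(1 << n_inputs):
        if func(index):
            table |= 1 << index
    return table


def lut6(init, inputs):
    """Single-output mode: the six input bits select one of 64 stored bits."""
    return (init >> (inputs & 0x3F)) & 1


def lut6_dual(init, inputs5):
    """Dual-output mode (assumed convention): both halves see the same five inputs.

    The lower 32 bits of `init` give o5, the upper 32 bits give o6.
    """
    address = inputs5 & 0x1F
    o5 = (init >> address) & 1
    o6 = (init >> (address | 0x20)) & 1
    return o5, o6


# One function of six variables: a 6-input AND.
and6 = truth_table(lambda i: i == 0x3F, 6)

# Two functions of the same five variables: parity (o5) and AND (o6).
parity5 = truth_table(lambda i: bin(i).count("1") & 1, 5)
and5 = truth_table(lambda i: i == 0x1F, 5)
dual = parity5 | (and5 << 32)
```

For example, `lut6_dual(dual, 0b00111)` returns `(1, 0)`: three of the five inputs are high, so parity is 1 and the AND is 0.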





## Slice Resources: Multiplexers

- Each Mux can implement
  - Arbitrary 7-input function
  - 8-1 multiplexer
- Middle Mux can implement
  - Arbitrary 8-input function
  - 16-1 multiplexer
- Mux output can drive a combinational output or a flip-flop
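A minimal Python sketch of how the wide multiplexers combine LUT outputs. It assumes each 6-input LUT is configured as a 4-1 multiplexer (four data inputs plus two select inputs); the function names are hypothetical, and the comments use the F7/F8 names common in Xilinx documentation rather than anything stated on the slide.

```python
def mux4_in_lut(data4, select):
    """A 4-1 mux fits in one 6-input LUT: four data bits plus two select bits."""
    return data4[select & 0b11]


def mux8(data8, select):
    """Two LUT-based 4-1 muxes combined by one slice mux (F7) give an 8-1 mux."""
    low = mux4_in_lut(data8[0:4], select)
    high = mux4_in_lut(data8[4:8], select)
    return high if (select >> 2) & 1 else low


def mux16(data16, select):
    """Two 8-1 muxes combined by the middle mux (F8) give a 16-1 mux."""
    low = mux8(data16[0:8], select)
    high = mux8(data16[8:16], select)
    return high if (select >> 3) & 1 else low
```

The same structure explains the function sizes on the slide: each mux level adds one select input, so two 6-input LUTs plus one mux cover any 7-input function, and four LUTs plus two mux levels cover any 8-input function.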



## Slice Resources: Carry Chain

- Carry chain implements fast arithmetic addition and subtraction
  - Propagates vertically through the four LUTs
  - Propagates vertically to slice above in same column
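The carry behaviour can be sketched as follows. This is a simplified model under the assumption that each LUT computes a propagate bit `p = a XOR b`, a carry multiplexer forwards the incoming carry when `p` is 1 and otherwise takes operand bit `a`, and an XOR gate forms the sum bit. The function names are hypothetical.

```python
def carry4(a_bits, b_bits, carry_in):
    """Model one slice: four bit positions, least significant bit first."""
    sums = []
    carry = carry_in
    for a, b in zip(a_bits, b_bits):
        propagate = a ^ b               # computed in the LUT
        sums.append(propagate ^ carry)  # sum bit from the XOR gate
        carry = carry if propagate else a  # carry mux
    return sums, carry


def add(a, b, width=8):
    """Chain slices vertically; `width` must be a multiple of four."""
    carry = 0
    result = 0
    for base in range(0, width, 4):
        a_bits = [(a >> (base + i)) & 1 for i in range(4)]
        b_bits = [(b >> (base + i)) & 1 for i in range(4)]
        sums, carry = carry4(a_bits, b_bits, carry)
        for i, bit in enumerate(sums):
            result |= bit << (base + i)
    return result, carry
```

When `p` is 0 both operand bits are equal, so taking `a` as the carry-out is the same as computing `a AND b`; this is why a single multiplexer per bit is enough.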



## Slice Resources: Flip-Flops

- Each slice has four flip-flops/latches
  - Configurable functionality
  - Input from LUT, carry chain, multiplexer or external
- Each slice has four additional flip-flops
  - Fixed functionality
  - Inputs come from LUTs
  - No access to carry chain, wide multiplexers or slice inputs



## Look-up table (LUT) (I)



# Look-up table (LUT) (II)

- Inputs are used as a pointer into the LUT.
- Decoded using a hierarchy of transmission-gate MUXes.
- Transmission gate: "pass" or "high-impedance".
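The decoding hierarchy can be sketched as a tree of 2-1 multiplexers. This is an illustrative model only: `cells` stands for the stored configuration bits and `lut_read` is a hypothetical helper, with the least significant input controlling the first mux level.

```python
def lut_read(cells, inputs):
    """Select one stored bit using a tree of 2-1 muxes.

    cells  -- list of 2**k configuration bits
    inputs -- list of k input bits, least significant first
    """
    level = list(cells)
    for bit in inputs:
        # Each mux level passes one bit of every adjacent pair and blocks the other.
        level = [level[i + 1] if bit else level[i] for i in range(0, len(level), 2)]
    return level[0]


# A 3-input XOR stored as eight configuration bits.
xor3_cells = [0, 1, 1, 0, 1, 0, 0, 1]
```

Each input halves the number of candidate cells, so a k-input LUT needs k mux levels between its storage cells and its output.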





#### LUT based RAM (Distributed RAM)

- A normal LUT only performs a "read" operation.
- A "write" operation additionally requires address decoders and a write enable.
- LUTs can be concatenated to create larger RAMs.
- Some of the LUTs can also be used as shift registers.
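A minimal Python sketch of these ideas. The class names, the default depths (64 bits for the RAM, 32 for the shift register) and the word helpers are assumptions for illustration; clocking is reduced to one method call per write or shift.

```python
class LutRam:
    """One LUT used as a 1-bit-wide RAM."""

    def __init__(self, depth=64):
        self.cells = [0] * depth

    def read(self, address):
        # Reading is the normal LUT function: the address selects a cell.
        return self.cells[address]

    def write(self, address, bit, write_enable):
        # Writing needs the decoded address plus a write enable.
        if write_enable:
            self.cells[address] = bit & 1


class LutShiftRegister:
    """One LUT used as an addressable shift register."""

    def __init__(self, depth=32):
        self.cells = [0] * depth

    def shift(self, bit_in):
        self.cells = [bit_in & 1] + self.cells[:-1]

    def tap(self, address):
        return self.cells[address]


def write_word(rams, address, value):
    """Concatenate LUT RAMs side by side to store a multi-bit word."""
    for position, ram in enumerate(rams):
        ram.write(address, (value >> position) & 1, True)


def read_word(rams, address):
    return sum(ram.read(address) << position for position, ram in enumerate(rams))
```

Here widening is done by placing several 1-bit RAMs in parallel; deepening would instead need an extra multiplexer level on the read outputs.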



# DSP Cells (I)

- All 7 series FPGAs contain DSP48E1 cells
- The DSP48E1 cell has the following features
  - 25x18 signed multiplier
  - 48-bit add/subtract/accumulate
  - Pipeline registers for high speed
  - Pattern detector
  - SIMD operators
  - Cascade paths
  - 25-bit pre-adder
- Allows high-speed arithmetic to be mapped onto the FPGA without using LUTs
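The arithmetic path listed above (pre-adder, multiplier, accumulator) can be modelled roughly as follows. This is a behavioural sketch only: `dsp_mac` and `wrap_signed` are hypothetical helpers, the bit widths are taken from the list, and pipeline registers, the pattern detector and SIMD modes are ignored.

```python
def wrap_signed(value, bits):
    """Interpret `value` as a two's-complement number of the given width."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def dsp_mac(accumulator, a, b, d=0):
    """Compute accumulator + (a + d) * b with the cell's bit widths."""
    pre_sum = wrap_signed(a + d, 25)          # 25-bit pre-adder
    product = pre_sum * wrap_signed(b, 18)    # 25x18 signed multiplier
    return wrap_signed(accumulator + product, 48)  # 48-bit accumulate
```

For example, `dsp_mac(10, -2, 5, d=1)` evaluates `10 + (-2 + 1) * 5` and returns `5`.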

## **DSP Cells (II)**



#### **DSP Cells Example: Six-Tap FIR Filter**



#### Uses six legacy DSP slices (without pre-adder)

Example from:

https://www.xilinx.com/support/documentation/user_guides/ug479_7Series_DSP48E1.pdf

Courtesy: Xilinx
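As a rough behavioural reference for a six-tap FIR filter, the sketch below computes y(n) as the sum of h(k) * x(n-k) for k = 0 to 5, with one multiply-accumulate per tap standing in for one DSP slice. The function name and the coefficient values in the usage note are made up for illustration, and the pipeline latency of a real DSP cascade is not modelled.

```python
def fir6(samples, coefficients):
    """Six-tap FIR filter: one multiply-accumulate per tap and sample."""
    if len(coefficients) != 6:
        raise ValueError("expected exactly six coefficients")
    delay_line = [0] * 6
    outputs = []
    for sample in samples:
        delay_line = [sample] + delay_line[:-1]   # shift in the new sample
        accumulator = 0
        for h, x in zip(coefficients, delay_line):
            accumulator += h * x                  # work done by one DSP slice
        outputs.append(accumulator)
    return outputs
```

Feeding a unit impulse returns the coefficients themselves: `fir6([1, 0, 0, 0, 0, 0, 0], [1, 2, 3, 4, 5, 6])` gives `[1, 2, 3, 4, 5, 6, 0]`.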


## Xilinx Artix-7 FPGAs

- XC7A100T:
  - 15,850 slices (each containing four LUTs and eight flip-flops)
  - 240 DSP slices
  - Max Block RAM: 4,860 Kb, Max Distributed RAM: 1,188 Kb

| Device   | Logic Cells | Slices<sup>(1)</sup> | Max Distributed RAM (Kb) | DSP48E1 Slices<sup>(2)</sup> | Block RAM 18 Kb<sup>(3)</sup> | Block RAM 36 Kb<sup>(3)</sup> | Block RAM Max (Kb) | CMTs<sup>(4)</sup> | PCIe<sup>(5)</sup> | GTPs | XADC Blocks | Total I/O Banks<sup>(6)</sup> | Max User I/O<sup>(7)</sup> |
|----------|-------------|----------------------|--------------------------|------------------------------|-------------------------------|-------------------------------|--------------------|--------------------|--------------------|------|-------------|-------------------------------|----------------------------|
| XC7A12T  | 12,800      | 2,000                | 171                      | 40                           | 40                            | 20                            | 720                | 3                  | 1                  | 2    | 1           | 3                             | 150                        |
| XC7A15T  | 16,640      | 2,600                | 200                      | 45                           | 50                            | 25                            | 900                | 5                  | 1                  | 4    | 1           | 5                             | 250                        |
| XC7A25T  | 23,360      | 3,650                | 313                      | 80                           | 90                            | 45                            | 1,620              | 3                  | 1                  | 4    | 1           | 3                             | 150                        |
| XC7A35T  | 33,280      | 5,200                | 400                      | 90                           | 100                           | 50                            | 1,800              | 5                  | 1                  | 4    | 1           | 5                             | 250                        |
| XC7A50T  | 52,160      | 8,150                | 600                      | 120                          | 150                           | 75                            | 2,700              | 5                  | 1                  | 4    | 1           | 5                             | 250                        |
| XC7A75T  | 75,520      | 11,800               | 892                      | 180                          | 210                           | 105                           | 3,780              | 6                  | 1                  | 8    | 1           | 6                             | 300                        |
| XC7A100T | 101,440     | 15,850               | 1,188                    | 240                          | 270                           | 135                           | 4,860              | 6                  | 1                  | 8    | 1           | 6                             | 300                        |
| XC7A200T | 215,360     | 33,650               | 2,888                    | 740                          | 730                           | 365                           | 13,140             | 10                 | 1                  | 16   | 1           | 10                            | 500                        |
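The XC7A100T figures can be cross-checked with a few lines of arithmetic. The variable names below are ad hoc; the numbers come from the table row and the per-slice resources stated above.

```python
slices = 15_850
luts = slices * 4            # four 6-input LUTs per slice
flip_flops = slices * 8      # eight flip-flops per slice

block_ram_36kb = 135
block_ram_18kb = 270
max_block_ram_kb = block_ram_36kb * 36   # equals block_ram_18kb * 18

logic_cells = 101_440

print(f"LUTs:         {luts:,}")
print(f"Flip-flops:   {flip_flops:,}")
print(f"Block RAM Kb: {max_block_ram_kb:,}")
print(f"Logic cells per slice: {logic_cells / slices:.1f}")
```

This gives 63,400 LUTs, 126,800 flip-flops and 4,860 Kb of block RAM, and shows that the "logic cells" column corresponds to 6.4 logic cells per slice.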



#### **Programmable Interconnects (I)**



#### **Programmable Interconnects (II)**

- Programmable switches, also called programmable interconnect points (PIPs).
- Implemented using transmission gates.
- Several types of PIPs exist.







## **FPGA** Design flow

- Synthesis
  - Parses HDL design
  - Infers Xilinx primitives
  - Generates design netlist
- Translate
  - Merges incoming netlists and constraints into a design file
- Map
  - Maps (places) design into the available resources on the target device
- Place and Route
  - Places and routes design



## Schematic vs. Technology view (I)

- Full adder
  - Translate the HDL description into a schematic





## Schematic vs. Technology view (II)

- Full adder
  - Map the design to the resources on the FPGA (LUTs, flip-flops etc.) and route the design
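To illustrate what mapping to LUTs means for the full adder, the sketch below derives the truth-table contents a LUT would need for the sum and carry outputs. `lut_init` is a hypothetical helper, and the input ordering (a as the least significant address bit, then b, then cin) is an assumption; an actual tool may pick a different ordering or use the dedicated carry chain instead.

```python
def lut_init(func, n_inputs):
    """Build the LUT contents: bit `index` holds func applied to the bits of index."""
    init = 0
    for index in range(1 << n_inputs):
        bits = [(index >> i) & 1 for i in range(n_inputs)]
        if func(*bits):
            init |= 1 << index
    return init


# Full adder outputs as Boolean functions of (a, b, cin).
sum_init = lut_init(lambda a, b, cin: a ^ b ^ cin, 3)
carry_init = lut_init(lambda a, b, cin: (a & b) | (cin & (a ^ b)), 3)

print(f"sum LUT contents:   0x{sum_init:02X}")
print(f"carry LUT contents: 0x{carry_init:02X}")
```

With this ordering the sum output needs the contents `0x96` and the carry output `0xE8`; the schematic's gates disappear into two small truth tables.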





#### **Are FPGAs perfect?**



#### **FPGAs are inefficient**

- Compared to ASICs, penalties in FPGAs:
  - Area: 17x to 54x
  - Speed: 3x to 7x
  - Power: 6x to 62x

#### Main culprit: INTERCONNECT!



#### References

- Clive "Max" Maxfield, "The Design Warrior's Guide to FPGAs – Devices, Tools and Flows", Elsevier, 2004.
- Bill Jason P. Tomas, "Introduction to Field Programmable Gate Arrays (FPGAs)".
- Xilinx, "Artix-7 FPGA Family Data Sheet".
- Xilinx, "7-series FPGAs Configurable Logic Block User Guide", https://www.xilinx.com/support/documentation/user_guides/ug474_7Series_CLB.pdf
- Xilinx, "7-series DSP48E1 Slice User Guide", https://www.xilinx.com/support/documentation/user_guides/ug479_7Series_DSP48E1.pdf

