## Configurable, scalable single-ended sense amplifier with additional auxiliary blocks for low-power two-port memories in advanced FinFET technologies

LIMITHA SUBBAIAH KUMAR NANGARU MASTER'S THESIS DEPARTMENT OF ELECTRICAL AND INFORMATION TECHNOLOGY FACULTY OF ENGINEERING | LTH | LUND UNIVERSITY




Limitha Subbaiah Kumar Nangaru li0886su-s@student.lu.se

Department of Electrical and Information Technology Lund University

Academic Supervisor: Baktash Behmanesh Ph.D., Assistant Professor, EIT LTH

Supervisor: Babak Mohammadi Ph.D., Xenergic AB

Examiner: Pietro Andreani Ph.D., Senior Lecturer, EIT LTH

September 20, 2022





© 2022 Printed in Sweden Tryckeriet i E-huset, Lund

### Abstract

System on Chip (SoC) designs contain a variety of Intellectual Property (IP) cores, including digital signal processing blocks, media and graphics processing units, as well as processing core units that employ multiple-port memories to enhance performance and bandwidth. These memories allow parallel read/write operations on the same memory block from different ports. Due to the enormous number of on-chip memories in modern SoCs, area efficiency is critical. Moving to smaller technology nodes is one way to reduce the area of these memories and increase the computational density. However, as transistors were continuously scaled down, weaker gate control and higher leakage current became major concerns. This led the semiconductor industry to reinvent the underlying transistor architecture and manufacturing processes. Today, Fin Field Effect Transistors (FinFETs) are among the most advanced transistors commercially available. They are multigate transistors, designed primarily for high-speed, high-density applications, that effectively increase gate control.

In addition to area and power constraints, improving access time has always been a challenge in memories. Sense amplifiers are read circuit elements that interpret the data bit stored in memory by amplifying a small bit line signal to recognizable logic levels, thereby improving the read access time. The goal of this thesis project is to use state-of-the-art finFET technology to design a low-power, configurable and compiler-friendly single-ended sense amplifier that can be easily scaled up or down based on the size of the memory block. This design dynamically produces a reference voltage based on charge redistribution from the high bit-line capacitance to low-capacitance nodes. Apart from the single-ended sense amplifier, other memory sub-blocks such as a D flip-flop, a multiplexer and a differential sense amplifier were also designed.

The designed circuits also had to be verified before and after the layout phase to understand the effects of parasitics on the design. The design verification flow was automated using a Python script that performs statistical analysis on transistor parameter variations, accumulates the results of the simulation tests and evaluates the failure probability. The thesis project was carried out at Xenergic AB.

## Acknowledgements

In appreciation of this thesis opportunity and guidance, I would like to thank my industrial supervisor, Babak Mohammadi, CEO of Xenergic. I am indebted for his constructive feedback and massive encouragement despite his busy schedule throughout the project. I would like to thank my academic supervisor, Baktash Behmanesh, for his patience and confidence in me throughout the entire project. I would also like to extend my gratitude to Adam, Xiao, Tom, Reda and Reza for their constant support and collaboration during the project. Heartfelt appreciation to Hemanth, Allan, Ajay, Menglin, Yaojie and Tharald for their incessant help, supervision during other projects at Xenergic, and personal advice.

Hat tip to my family and friends for always backing me up and for the enormous support throughout my master's program. As a final note, I would like to thank my professor Joachim Rodrigues for his support and opportunities these past two years.

## Popular Science Summary

Recent trends have seen feature sizes in semiconductor chips shrink to a few nanometers. Downscaling the process node increases computation density while reducing area. Owing to their structure, finFETs generate lower leakage power and allow for a more compact design. They also operate at a lower voltage and offer a high drive current. This means that much more performance can be packed into a smaller area, which in turn reduces the cost per unit of performance.

Present-day SoCs feature multiple embedded processors, memory subsystems, and application-specific peripherals. Rather than relying on off-chip memory communication with limited inputs and outputs (I/Os) to exchange data, large memories are integrated into the chips themselves to prevent high look-up latencies. System power and performance of embedded SoCs are heavily influenced by their memory architecture. Back-to-back data processing requests in multi-core processing and multimedia applications demand multi-port memories. However, integrating the additional bit lines and word lines leads to a massive area expense. Reducing the area and power overhead consumed by memory is a primary concern in SoC design.

With the ongoing research in this field, several methods are used to reduce power consumption in memories. One of the primary methods is to reduce the supply voltage. Alternatively, special read circuits called sense amplifiers were proposed, which reduce the signal swing on the bit lines and thereby cut the power dissipated in charging and discharging them. Sense amplifiers not only reduce power consumption considerably but also enhance the read performance by minimizing sensing delay. This is done mainly by detecting and accelerating small bit line transitions. The sense amplifier's design strongly influences memory reliability (endurance, retention) and performance (access time).

The goal of this research project is to design peripheral blocks for two-port memories, with a focus on a configurable, compiler-friendly, scalable single-ended sense amplifier using the latest finFET technology. The proposed single-ended sense amplifier design uses the parasitic bit line capacitances as the source for generating the reference voltage during a read operation. Additionally, the configurability of the design avoids the area cost that would otherwise be incurred in adding dummy bit line columns. The thesis project mainly aims at achieving high speed, high sensitivity, low area, and low power consumption at the system level.

## Table of Contents

- 1 Introduction
  - 1.1 Project Specification
  - 1.2 Thesis Organization
- 2 Background
  - 2.1 FinFETs
  - 2.2 Two-Port SRAM
  - 2.3 Sense Amplifier
  - 2.4 Capacitive Charge Sharing Concept
  - 2.5 Statistical analysis
- 3 Sense Amplifier Architecture
  - 3.1 StrongARM Topology [13]
- 4 Circuit Design
  - 4.1 Differential-ended Sense Amplifier
  - 4.2 Single-ended Sense Amplifier
  - 4.3 D Flip-flop
- 5 Layout
  - 5.1 Sense Amplifiers
  - 5.2 D Flip-flop
- 6 Verification
  - 6.1 Characterization of D Flip-flop
  - 6.2 Verification of sense amplifier design characteristics
  - 6.3 Automized Characterization flow of sense amplifier
  - 6.4 Timing Scheme
- 7 Results
  - 7.1 D Flip-flop
  - 7.2 Sensitivity and Reliability check
  - 7.3 Failure rate
  - 7.4 Power Delay Product
  - 7.5 Design Comparison
- 8 Conclusion
- 9 Future Work
- References
- 10 Appendix A

## List of Figures

| Figure | Title |
|--------|-------|
| 1.1 | Two-Port SRAM Memory Architecture |
| 2.1 | Two-Port Memory Cell |
| 2.2 | Basic structure of differential voltage sense amplifier |
| 2.3 | Capacitive Charge Sharing Concept |
| 2.4 | Transistor Parameter Gaussian Distribution |
| 2.5 | Process Corner Distribution |
| 3.1 | StrongARM latch topology [13] |
| 3.2 | Phase 2 - Amplification Phase |
| 3.3 | Phase 3 - Cross-coupled NMOS pair activation Phase |
| 3.4 | Phase 4 - Cross-coupled PMOS pair activation Phase |
| 4.1 | Single-ended Sense Amplifier design architecture |
| 4.2 | Proposed Single-ended Sense Amplifier Design [12] |
| 4.3 | Bit cell array with end cells |
| 4.4 | Switch logic for configurability |
| 4.5 | D Flip-flop Architecture |
| 5.1 | Proposed Floorplan |
| 6.1 | Testbench |
| 7.1 | Setup Rise time variation with layout parasitics |
| 7.7 | Timing diagram for read operation |
| 7.8 | Failure rate of the differential-ended sense amplifier design |
| 7.9 | Failure rate of the single-ended sense amplifier design |
| 7.10 | Total power consumption during read operation in single-ended sense amplifier |
| 7.11 | Average power consumption during read operation in differential-ended sense amplifier |
| 7.12 | PDP in different process corners in single-ended sense amplifier for read 0 and read 1 operation |
| 7.13 | PDP in different process corners in differential-ended sense amplifier |
| 10.1 | Differential Half-Circuit of StrongARM latch circuit |

## List of Tables

| Table | Title |
|-------|-------|
| 7.1 | Monte Carlo Output for read 0 operation |
| 7.2 | Monte Carlo Output for read 1 operation |
| 7.3 | Monte Carlo Output |
| 7.4 | Design Comparison |
| 10.1 | Characterization of Sense Amplifier |

## List of Abbreviations

- **CMOS** Complementary Metal Oxide Semiconductors
- **DRC** Design Rule Check
- **eDRAM** Embedded Dynamic Random Access Memories
- **FF** Fast Fast
- **FinFET** Fin Field Effect Transistor
- **FS** Fast Slow
- **IC** Integrated Circuit
- **IP** Intellectual Property
- **LVS** Layout Versus Schematic
- **MC** Monte Carlo
- **NMOS** N-type Metal Oxide Semiconductors
- **NV** Nominal Voltage
- **PDK** Process Design Kit
- **PDP** Power Delay Product
- **PMOS** P-type Metal Oxide Semiconductors
- **RBL** Read Bit Line
- **RWL** Read Word Line
- **SF** Slow Fast
- **SoC** System on Chip
- **SRAM** Static Random Access Memory
- **SS** Slow Slow
- **TT** Typical Typical
- **VTH** Threshold Voltage


## Chapter 1 Introduction

With the advancement in process technology, finFET devices dominate over planar MOSFETs. Below the 22nm process node, the traditional 2-D MOSFET transistors are replaced with 3-D finFETs. Additionally, the low power specifications and reduction in short channel effects in finFETs help improve the power and energy efficiency of the entire chip [1].

Since the massive proliferation of handheld battery-operated devices, designing low-power Integrated Circuits (ICs) has become increasingly significant. On average, up to 70% of a contemporary SoC is composed of embedded memories, primarily Static Random Access Memory (SRAM) [5]. SRAMs offer high speed, high power efficiency, and robustness. However, these memories consume more silicon area and have less storage capacity [5]. SRAMs use latching circuitry to store a single bit of data. A bit cell, here an SRAM cell, is the fundamental unit of memory that stores 1 bit of binary information. These bit cells are laid out in an array on a chip. Word lines and bit lines run across this matrix of cells, connecting to each bit cell. The data stored in the bit cells is read or written through two complementary, precharged bit lines running vertically across the array. To enhance the accessibility and speed of operation, an additional pair of bit lines and word lines can be employed; such memories are referred to as multi-port memories. One such multi-port memory is a two-port memory that has two separate read ports and one write port for multiple read access to the memory cell.

In memories, sense amplifiers are essential analog circuits attached to the end of the bit lines, supplemental to the bit cell, that assist in the read operation. Sense amplifier topologies are strongly influenced by the type of memory, the voltage levels, and the overall memory architecture. SRAM bit cells produce a true differential output; hence, integrated sense amplifiers are in general based on differential amplifiers, that is, they sense the differential voltage between the two bit lines and output a full logic signal. The function of a sense amplifier is to sense any discharge on the bit lines once the read operation is initiated. The sense amplifier is enabled when a sufficient differential voltage has been established. The amplifier then evaluates the data being read, which makes the read operation of the memory faster. A sense amplifier performs better if it can detect a smaller differential offset voltage.

A two-port memory cell uses three bit lines, as illustrated in Figure 1.1. Two of these bit lines are connected to the differential sense amplifier, while the third bit line needs a single-ended sense amplifier. If the current in the third bit line drops below a certain threshold value, the sense amplifier interprets it as logic '0'; otherwise it is a logic '1'. The threshold value can be tuned by sizing the transistors. Single-ended sense amplifiers display very good behavior at low supply voltages and their implementation is straightforward [6]. Figure 1.1 includes the array of memory cells, pre-charge circuitry, control and timing circuitry, bit lines, word lines, D flip-flops, address decoder, and sense amplifiers. The address decoder is used to select a specific bit cell from the array. The D flip-flops latch the data input to be written into the memory.



Figure 1.1: Two-Port SRAM Memory Architecture

### 1.1 Project Specification

The main goal of the thesis project was to design a configurable, scalable single-ended sense amplifier with additional auxiliary blocks for low-power two-port memories in advanced FinFET technologies. The proposed single-ended sense amplifier had to work with a bit-line offset voltage of 70 mV, and the differential sense amplifier with 50 mV, when integrated with the bit cell array and other peripheral blocks.

Specifically, the single-ended sense amplifier design had to be scalable and configurable. Additionally, the layout of all the blocks had to be compiler-friendly, such that the design is Design Rule Check (DRC) clean when automatically placed by the compiler. The sense amplifiers and the D flip-flop were also designed to meet the pitch of two bit cells.

### 1.2 Thesis Organization

The report is organized as follows:

- Chapter 2: Background with reference to finFET technology, the two-port memory cell, the charge sharing concept and statistical analysis
- Chapter 3: Elaboration on the StrongARM topology used in the design of the sense amplifiers
- Chapter 4: Circuit design of the memory peripheral blocks
- Chapter 5: Layout techniques adopted for the physical design
- Chapter 6: Verification flow employed to verify the design, with a focus on automation of the entire flow
- Chapter 7: Results in terms of reliability, failure rate and performance, followed by a comparison study
- Chapter 8: Conclusion of the project
- Chapter 9: Future work based on the current architecture

A detailed description of all the major concepts used in this thesis project is presented in the next chapter.

## Chapter 2 Background

### 2.1 FinFETs

With continuous downscaling, Complementary Metal Oxide Semiconductors (CMOS) technology has advanced towards higher density, better performance and lower power consumption. However, this scaling results in detrimental short-channel effects. As 2-D planar transistors have scaled smaller, electrons have a higher probability of passing between the source and drain regions due to quantum tunneling. This leads to a higher leakage current when a source-drain voltage is applied, even if the gate is turned off. Even with a perfect gate dielectric of zero thickness, this leakage current cannot be controlled because the leakage path lies far away from the gate interface. In finFET technology, the channel is enclosed on three sides by a wrap-around gate electrode; therefore, finFETs are also referred to as 'trigate' transistors. As a result of this wrap-around structure, designers have better control over leakage currents. By wrapping the gate electrode around the channel, the electric field in the channel can be made more uniform, thereby improving the electrostatic control. In addition, there is no doping variation throughout the body, which reduces threshold voltage variations due to substrate bias (body effect).

FinFETs have the following advantages:

- Suppressed short channel effects due to better channel control
- Lower static leakage current
- Faster switching speed
- Lower power consumption

However, the 3-D finFET structure increases the parasitics, especially the capacitances. The fabrication costs of finFETs are also high.

### 2.2 Two-Port SRAM

An 8-transistor two-port SRAM bit cell is presented in Figure 2.1. An additional Read Word Line (RWL) and Read Bit Line (RBL) allow for two simultaneous read operations, as opposed to single-port memories that allow only one read/write operation in one clock cycle. A single-port SRAM uses complementary bit lines and a cross-coupled inverter pair accessed through a write word line, while a two-port bit cell adds the RWL and RBL to the 6-transistor (single-port) SRAM bit cell. It is possible to read simultaneously through both word lines because the read port is decoupled, that is, it prevents charge sharing with the internal storage nodes when the read word line is activated. Generally, the transistors M7 and M8 in the read port are stronger than the other transistors. Also, the read transistor M7 is connected to the bit storage node through its gate terminal; thus, the read operation is faster through this read port.



Figure 2.1: Two-Port Memory Cell

### 2.3 Sense Amplifier

A sense amplifier is an important component that assists in reading the data from a bit cell. As mentioned in Chapter 1, the SRAM bit cells produce a true differential output due to the complementary write bit lines. This differential offset voltage is given as an input to the sense amplifier, which is designed to sense it. Depending on the data read from the bit cell, current flows either into or out of the bit storage node through the access transistors (refer to Figure ??). The difference in the currents flowing through the bit lines appears as a differential input voltage across the input transistors due to the RC effect present in the bit lines. The sense amplifier used in this thesis project is a differential voltage sense amplifier. Figure 2.2 shows the basic schematic of the sense amplifier design used.

In addition to the two sensing schemes, there are two sensing models, dual-ended sensing and single-ended sensing, based on the memory topology. In dual-ended sensing, complementary bit lines are available for reading the value stored in a bit cell; hence, it is mainly employed in single-port SRAMs. In single-ended sensing, by contrast, data is available on only one bit line, so a reference voltage needs to be generated in order to read the correct logic level. This type of sensing is used mainly in two-port memories and in Embedded Dynamic Random Access Memories (eDRAM).

In this thesis project a dynamic charge sharing circuitry is used in order to generate a reference voltage. This circuitry is based on capacitive charge sharing which is explained in detail in the next section.



Figure 2.2: Basic structure of differential voltage sense amplifier

### 2.4 Capacitive Charge Sharing Concept

Figure 2.3 illustrates the charge sharing concept. The bit line, initially precharged to $V_{DD}$, is modelled by the capacitor C\_Bitline. Capacitor C\_Share, which is predischarged to ground when switch SW2 is closed, is the capacitor with which C\_Bitline shares its charge. Once switch SW1 is closed, charge starts flowing from C\_Bitline to C\_Share. This establishes the reference voltage across C\_Share, which, by charge conservation, can be estimated as

$$V_{ref} \approx \frac{C_{Bitline}}{C_{Bitline} + C_{Share}}\, V_{DD}$$



Figure 2.3: Capacitive Charge Sharing Concept
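The charge sharing step can be sketched numerically. The snippet below is a minimal illustration that assumes ideal capacitors and ideal switches, so that the total charge is conserved when C\_Bitline (at $V_{DD}$) is connected to the predischarged C\_Share. The function name and all capacitance and voltage values are hypothetical and are not taken from the design.

```python
def charge_share_voltage(c_bitline, c_share, v_dd, v_share_init=0.0):
    """Common voltage after an ideal charge share between two capacitors.

    c_bitline is precharged to v_dd and c_share starts at v_share_init
    (0 V when predischarged to ground). Ideal switches are assumed.
    """
    if c_bitline <= 0 or c_share < 0:
        raise ValueError("capacitances must be positive")
    total_charge = c_bitline * v_dd + c_share * v_share_init
    return total_charge / (c_bitline + c_share)


# Hypothetical example values (not from the thesis):
# 20 fF bit line, 2 fF share capacitor, 0.8 V supply.
v_ref = charge_share_voltage(c_bitline=20e-15, c_share=2e-15, v_dd=0.8)
print(f"reference voltage: {v_ref * 1e3:.1f} mV")
print(f"drop below V_DD:   {(0.8 - v_ref) * 1e3:.1f} mV")
```

A larger share capacitor relative to the bit-line capacitance pulls the reference further below $V_{DD}$, which is the knob the reference generation relies on.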

Once the design is made, it needs to be verified for its yield and functionality. Statistical analysis is employed for this evaluation. This is further elaborated in the next section.

### 2.5 Statistical analysis

Because manufacturing variations in integrated circuits have grown in recent years, adequate manufacturing yield can no longer be taken for granted and must be explicitly optimized during the design process [7]. All designs are subject to these variations, which can affect their functional behavior.

Transistors form the basic building blocks of memory cell arrays. With continuous technological advancements and the ever-increasing number of components in integrated circuits, these variations in transistor parameters must be considered during the design process. There are two types of variations to be considered:

- Process variation, which is observed between different chips on a single wafer or between wafers
- Mismatch variation, which refers to random deviations between identical devices present on the same chip

Transistor parameters are primarily affected by these statistical variations. The variations follow a Gaussian distribution, and it is important to analyze their effect on yield.

An algorithm based on repeated random sampling of the transistor parameter distributions, referred to as the Monte Carlo (MC) method, is typically employed for statistical analysis of these variations. Based on the given configuration, a random sample is drawn for each parameter and a simulation is evaluated with these values. The simulation results are collected and analyzed to estimate the failure probability as the number of failures relative to the number of simulations. As seen in Figure 2.4, the probability distribution curve is Gaussian, which means that parameter values are far more likely to lie near the mean than at the edges of the curve; hence, most samples are drawn around the mean of the distribution.



Figure 2.4: Transistor Parameter Gaussian Distribution
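The Monte Carlo procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the circuit simulation is replaced by a hypothetical pass/fail check on a single Gaussian-distributed parameter (an input offset compared against a 50 mV limit), and the function names, the 15 mV sigma and the run count are invented for the example. It is not the verification script used in this project.

```python
import random


def estimate_failure_probability(passes, mean, sigma, n_runs, seed=0):
    """Estimate failure probability as failures / runs.

    Each run draws one sample from a Gaussian parameter distribution and
    evaluates it with the user-supplied pass criterion `passes`.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    failures = 0
    for _ in range(n_runs):
        sample = rng.gauss(mean, sigma)
        if not passes(sample):
            failures += 1
    return failures / n_runs


def offset_ok(offset_v):
    """Hypothetical stand-in for a simulation: pass if |offset| < 50 mV."""
    return abs(offset_v) < 0.050


p_fail = estimate_failure_probability(offset_ok, mean=0.0, sigma=0.015,
                                      n_runs=100_000)
print(f"estimated failure probability: {p_fail:.2e}")
```

Because failures sit in the tails of the distribution, the number of runs needed grows quickly as the target failure probability shrinks.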

These variations in transistor parameters are captured by process corners. The process corners are represented by the carrier mobility of the N-type Metal Oxide Semiconductor (NMOS) and P-type Metal Oxide Semiconductor (PMOS) transistors. Typical (T), Fast (F) and Slow (S) refer to normal, high and low carrier mobility, respectively [8]. The speed is a function of the Threshold Voltage (VTH) as well: the lower the VTH, the faster the transistor. The combination of these cases defines the process corners, as illustrated in Figure 2.5. The first letter refers to the NMOS and the second to the PMOS transistors.



Figure 2.5: Process Corner Distribution
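As a small illustration of the naming convention, the sketch below decodes a two-letter corner name into the NMOS and PMOS speed. The corner names are the ones listed in the abbreviations (TT, FF, SS, FS, SF); the helper function and dictionary are hypothetical.

```python
# Speed letters as used in the corner names
SPEED = {"T": "typical", "F": "fast", "S": "slow"}

# Process corners named in this thesis
CORNERS = ["TT", "FF", "SS", "FS", "SF"]


def describe_corner(corner):
    """First letter is the NMOS speed, second letter the PMOS speed."""
    nmos, pmos = corner
    return f"NMOS {SPEED[nmos]}, PMOS {SPEED[pmos]}"


for corner in CORNERS:
    print(corner, "->", describe_corner(corner))
```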

## Chapter 3 Sense Amplifier Architecture

Both the dual-ended and the single-ended sense amplifiers employ the StrongARM current-mode topology. However, in the single-ended sense amplifier, a capacitive charge sharing scheme is employed to generate the reference voltage.

This architecture was chosen primarily for the following reasons:

- Zero static power consumption
- Produces rail-to-rail outputs
- Less sensitive with respect to process, voltage and temperature (PVT) variations

### 3.1 StrongARM Topology [13]

Known as "StrongARM", this circuit was used in the StrongARM microprocessor of Digital Equipment Corporation, but the basic structure was first developed by Toshiba's Kobayashi [15]. Figure 3.1 shows the StrongARM latch topology used in the design of the dual-ended sense amplifier. The latch consists of a differential pair (M1 and M2), two cross-coupled inverters (M3-M6), one NMOS transistor (M7) that activates the sense amplifier when the sense amplifier enable (SAE) signal arrives, four PMOS transistors (M8-M11) that act as switches to precharge the connected nodes to $V_{DD}$ based on a precharge signal, and two PMOS transistors (M12-M13) that equalize the voltages between the two branches of the latch before the sense amplifier is activated.

StrongARM architecture exhibits a symmetric behavior where the transistor properties, like size, transconductance etc., on either side of the branches are identical. Hence, when a differential voltage is applied to the amplifier in a balanced manner, the performance can be determined by considering only half the circuit.

Initially, when the precharge (PCH) signal is low, the PMOS transistors M8-M13 are switched on and the nodes D1, D2, D3 and D4 are precharged to $V_{DD}$. Transistors M1-M6 are in the off state in this phase.



Figure 3.1: StrongARM latch topology [13]

In the next phase, the SAE signal, initially low, is pulled high along with the PCH signal. When a differential voltage is applied across the transistors M1 and M2, they draw a differential current proportional to this voltage. With M3-M6 turned off, this current increases the voltage difference $|V_{D1} - V_{D2}|$. Since the transistors M1 and M2 are matched and the current through transistor M7 is constant, the voltage difference between the nodes D1 and D2 can be determined as:

$$|V_{D1} - V_{D2}| \approx \frac{g_{m1,2}\,|V_{in1} - V_{in2}|}{C_{1,2}}\, t$$

where,

$g_{m1,2}$ is the transconductance of M1 and M2

$C_{1,2}$ is the capacitance at the drains of M1 and M2 (nodes D1 and D2)
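To get a feel for the amplification-phase relation, the following sketch evaluates it for a set of hypothetical numbers. The 50 mV input corresponds to the differential offset target in Section 1.1; the transconductance, node capacitance and integration time are assumptions made purely for illustration, and the function names are not from the thesis.

```python
def amplification_delta_v(gm, delta_vin, c_node, t):
    """|V_D1 - V_D2| ~ gm * |V_in1 - V_in2| * t / C_1,2 (linear approximation)."""
    return gm * abs(delta_vin) * t / c_node


def time_to_reach(delta_v_target, gm, delta_vin, c_node):
    """Time needed for the drain difference to grow to delta_v_target."""
    return delta_v_target * c_node / (gm * abs(delta_vin))


# Hypothetical values: gm = 200 uS, 50 mV input difference,
# 2 fF node capacitance, 20 ps of integration.
dv = amplification_delta_v(gm=200e-6, delta_vin=0.050, c_node=2e-15, t=20e-12)
print(f"drain voltage difference after 20 ps: {dv * 1e3:.0f} mV")

t_needed = time_to_reach(0.2, gm=200e-6, delta_vin=0.050, c_node=2e-15)
print(f"time to build 200 mV: {t_needed * 1e12:.0f} ps")
```

The linear growth only holds while M3-M6 remain off; once the cross-coupled devices turn on, regeneration takes over.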

Figure 3.2a highlights the transistors that are active during the amplification phase. The current flow from the capacitors $C_1$ and $C_2$ is depicted in Figure 3.2b.



(a) Transistors active in the Amplification phase [13]



(b) Current flow during the Amplification phase [13]

Figure 3.2: Phase 2 - Amplification Phase

Next, as the voltages at D1 and D2 fall to $V_{DD} - V_{THN}$, the cross-coupled NMOS transistors turn on, and current starts flowing from nodes D3 and D4 (refer to Figure 3.3b below) for $(C_{1,2} / I_{CM})\,V_{THN}$ seconds, where $I_{CM}$ represents the common-mode current drawn from each capacitance. The node capacitances and the transistors active during this phase are shown in Figure 3.3a.

The output voltages at nodes D1 and D2 fall until they reach $V_{DD}-V_{THP}$, at which point the PMOS transistors M5 and M6 are turned on. The equivalent circuit during this phase is shown in Figure 3.4a. Under the influence of positive feedback, these transistors pull the voltage at one output branch up to $V_{DD}$ while allowing the other branch to fall to zero, depending on the bit read from the memory. This enables a full voltage swing at the branches of the inverter couple, as seen in Figure 3.4b. The current in each branch that causes this full voltage swing is explained in Appendix A.



(b) Current flow when cross-coupled NMOS pair is activated [13]

Figure 3.3: Phase 3 - Cross-coupled NMOS pair activation Phase

One of the most important design metrics of a differential amplifier is its overall gain. This gain is calculated by analyzing the half-circuit of the sense amplifier. The calculations and theory are explained in Appendix A.



(a) Activation of cross-coupled PMOS pair [13]

![](_page_31_Figure_3.jpeg)

(b) Output Voltage Swing [13]

Figure 3.4: Phase 4 - Cross-coupled PMOS pair activation Phase

## Chapter 4 Circuit Design

As illustrated in Figure 1.1, a memory unit comprises several peripheral blocks, such as the D flip-flop, differential-ended sense amplifier, single-ended sense amplifier and multiplexers. The thesis project focused on designing these memory peripheral blocks.

### 4.1 Differential-ended Sense Amplifier

The differential-ended sense amplifier follows the StrongARM architecture described in Section 3.1. The differential transistors M1 and M2 need to be highly sensitive and strong since they are attached to the bit lines, which are highly capacitive. Hence, these transistors were made bigger than the other transistors. The transistor M7, connected to the sense amplifier enable signal, is also one of the most critical transistors since the whole operation starts at the arrival of this signal. Transistors M3 and M4 prevent static current from flowing from $V_{DD}$ to ground by cutting off the DC path between the two supply rails, thereby reducing static power dissipation. Apart from these transistors, M5 and M6 play an important role in restoring the output high level to $V_{DD}$. Without them, the voltage level at D3 or D4 would reach only a degraded $V_{DD}$ (depending on the polarity of the differential voltage at the transistors M1 and M2), and it would not be possible to achieve a full output voltage swing. Thus, the transistors M3-M6 were also made stronger. The sizing also depends on the number of transistors on the same DC path; to reduce the resistance in this path, the transistors are widened. The precharge transistors establish $V_{DD}$ at the nodes and equalize this voltage on both branches of the latch, which also prevents the transistors M1 and M2 from entering the triode region. Since these transistors basically act as switches, they were chosen to be minimum sized. The total power consumption is given by

$$P_{\text{total}} = f_{\text{CLK}} (2C_{1,2} + C_{3,4}) V_{\text{DD}}^2$$

where,

 $f_{CLK}$  is the clock frequency.

 $C_{1,2}$ and $C_{3,4}$ are the capacitances at nodes D1, D2 and D3, D4, respectively.

#### 4.2 Single-ended Sense Amplifier

The proposed single-ended sense amplifier uses the differential-ended sense amplifier architecture together with a capacitive charge sharing circuit that internally generates a reference voltage, as shown in Figure 4.1. Due to its symmetrical design, it can read from either bit line column, using the other bit line column as the reference. When the selected bit line reads a '0', it remains undischarged and therefore higher than the reference voltage, and the output is evaluated as '0'. Similarly, when a '1' is read, the bit line discharges below the reference voltage, which is evaluated as '1'.

![](_page_34_Figure_3.jpeg)

Figure 4.1: Single-ended Sense Amplifier design architecture

The final proposed circuit design is presented in Figure 4.2. Transistors M15-M20 form the reference generation circuit. The reference voltage is generated every clock cycle using a charge sharing mechanism between the unselected bit line and a charge share capacitance, which is initially predischarged to ground through the transistors M16/M19 enabled by the signals PREDISLFT/PREDISRGT. Transistors M21-M23 precharge and equalize the bit lines to $V_{DD}$. If a memory cell connected to the right bit line has to be read, the LFT\_EN signal is enabled along with C\_SHARE, which activates the reference generation circuit on the left through CSH\_LFT\_B, generated by a NAND logic gate. Transistors M17/M20 determine the amount of charge shared, thereby setting the reference voltage. The branches of the inverter couple give an inverted output, so this output is inverted again to obtain the bit read from the memory cell.

![](_page_35_Figure_1.jpeg)

Figure 4.2: Proposed Single-ended Sense Amplifier Design [12]
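The charge sharing mechanism described above can be summarized by a first-order charge conservation argument. The following is a sketch under idealized assumptions (an ideal switch, no leakage, no charge injection), and the symbols $C_{BL}$ (capacitance of the unselected bit line, precharged to $V_{DD}$), $C_{SH}$ (charge share capacitance, predischarged to ground) and $V_{REF}$ are introduced here for illustration only:

$$C_{BL} V_{DD} = (C_{BL} + C_{SH}) V_{REF} \quad\Rightarrow\quad V_{REF} = V_{DD} \frac{C_{BL}}{C_{BL} + C_{SH}}$$

$$\Delta V = V_{DD} - V_{REF} = V_{DD} \frac{C_{SH}}{C_{BL} + C_{SH}}$$

Under these assumptions the reference sits $\Delta V$ below $V_{DD}$, and the ratio of $C_{SH}$ to $C_{BL}$ sets that offset.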

Since sense amplifiers are connected to the end of the bit line columns, the capacitance of the read bit-line transistors in the bit cells and the wiring capacitance (which depends on the size of the memory array) add up to form the bit-line capacitance. In this thesis project, the existing parasitic bit-line capacitance is used as the precharged capacitance from which the reference voltage is derived. In other words, no additional capacitance was created for this purpose. Thus, the size of the memory array is critical in determining the reference voltage generated. The bit cell array is surrounded by end cells. These end cells are special boundary cells which give a DRC clean array. Figure 4.3 shows the bit cell array surrounded by end cells.

| Corner<br>endcell | WL endcell        | Corner<br>endcell |
|-------------------|-------------------|-------------------|
| Bitline endcell   | BIT CELL<br>ARRAY | Bitline endcell   |
| Corner<br>endcell | WL endcell        | Corner<br>endcell |

Figure 4.3: Bit cell array with end cells

To get a more accurate value of the bit-line capacitance, parasitic extraction was performed on the 128x32 bit cell matrix including the end cells. However, the Process Design Kit (PDK) did not provide schematics for the bit cells or the end cells, so the schematic was created from the layout, after which Layout Versus Schematic (LVS) and parasitic extraction were run.

A further goal of this thesis project was to make this sense amplifier scalable and configurable. The following subsections elaborate on how these criteria were met.

#### 4.2.1 Scalable

Sense amplifiers are often designed to be enabled only when a particular offset voltage is reached, that is, after the bit line voltage has dropped by a certain amount. Therefore, the sense amplifier design must be adapted to the desired threshold voltage. For instance, the sense amplifier here was designed to give accurate results at a 60mV threshold voltage across all corners and temperature variations. As part of this thesis, the sense amplifier was intended to be configurable to any threshold value, so that the design can be modified automatically for the desired voltage. The reference voltage generated depends on the bit line capacitance, which in turn depends on the size of the bit cell macro. Since the designed sense amplifier relies on dynamic charge sharing, the MOSCAP can be scaled easily to maintain the same reference voltage for a memory of any capacity.
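The scaling argument above can be illustrated with a short calculation. This is a minimal sketch assuming ideal charge sharing between a bit line precharged to the supply and a fully discharged share capacitance. The function names, the supply voltage and the bit line capacitances are hypothetical values chosen for illustration and are not taken from the thesis.

```python
def share_cap_for_offset(c_bl, v_dd, offset):
    """Share capacitance that pulls a bit line precharged to v_dd down by `offset`.

    Ideal charge conservation gives v_ref = v_dd * c_bl / (c_bl + c_sh),
    so offset = v_dd - v_ref leads to c_sh = c_bl * offset / (v_dd - offset).
    """
    if not 0 < offset < v_dd:
        raise ValueError("offset must lie between 0 and v_dd")
    return c_bl * offset / (v_dd - offset)


def reference_voltage(c_bl, c_sh, v_dd):
    """Reference voltage after ideal charge sharing."""
    return v_dd * c_bl / (c_bl + c_sh)


# Hypothetical numbers: 0.8 V supply, 60 mV target offset,
# and assumed bit line capacitances for two column heights.
V_DD = 0.8
OFFSET = 0.060
for rows, c_bl in ((128, 10e-15), (256, 20e-15)):
    c_sh = share_cap_for_offset(c_bl, V_DD, OFFSET)
    v_ref = reference_voltage(c_bl, c_sh, V_DD)
    print(f"{rows} rows: C_SH = {c_sh * 1e15:.3f} fF, V_REF = {v_ref:.3f} V")
```

In this idealized model the required share capacitance grows in direct proportion to the bit line capacitance, which is why only the MOSCAP size needs to change when the column height changes.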

#### 4.2.2 Configurable

As mentioned previously in section 4.2, reference voltage generation requires a fully precharged bit line capacitance. This can be established by placing a dummy bit line column. This, however, would lead to reference voltage degradation when the memory size is scaled up with many sense amplifiers placed next to each other. The single dummy bit line column would fail to deliver the same signal strength throughout. An alternative would be to place dummy bit line columns at regular intervals. This will lead to a massive area trade-off. Hence, a novel idea is proposed in this thesis project to use the existing active bit line columns to act as reference voltage generators.

The single-ended sense amplifier is attached to the end of the bit line column. It was designed so that the sense amplifier is repeated every two bit cells in the final placement of the entire memory block. The two differential input lines of the sense amplifier are connected to the read bit lines of the bit cell columns. As seen in Figure 1.1, selecting a word line activates the entire row of bit cells. To act as a reference bit line, a bit cell column must always settle to a particular voltage, which means that the selected bit cell must store a '0'. In practice, however, this cannot be guaranteed. Hence, to read from a particular bit line column, the adjacent bit line must be cut off from its bit cell irrespective of the bit stored, thus ensuring a stable reference voltage.

In a two-port memory cell, the ground reference of the read port, called 'RVSS', is separate from the ground connected to the inverter couple (Figure 2.1). This makes it possible to cut off the transistor M7 by manipulating the ground connection using the switch configuration presented in Figure 4.4.

![](_page_37_Figure_5.jpeg)

Figure 4.4: Switch logic for configurability

To elaborate, if we read from the right read bit line, the left bit line acts as the reference and the control signal 'LFT\_EN' goes high (refer to Figure 4.2). This signal also activates the switch logic, shown in Figure 4.4 as 'REFERENCE\_SELECT', which connects the ground supply 'RVSS' to $V_{DD}$ (refer to Figure 2.1). Since the source of transistor M7 is pulled up to the supply voltage, the transistor enters the cut-off region, and the read port is decoupled from the 6-T bit cell whose nodes store the bit value. Thus, the already precharged bit line can discharge into the charge share capacitance to give a stable reference voltage. In parallel, since 'RGT\_EN' is the inverse of 'LFT\_EN', the right reference generation circuit is deactivated and the 'RVSS' pin of the right bit column is connected to ground.
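The control behaviour described above can be written down as a small truth-table model. This is only a behavioural sketch: LFT\_EN, RGT\_EN, C\_SHARE, CSH\_LFT\_B and RVSS come from the text, while the name `CSH_RGT_B`, the dictionary keys and the assumption that the right side mirrors the left side are illustrative.

```python
def reference_control(lft_en: bool, c_share: bool) -> dict:
    """Behavioural model of the reference-select logic for one sense amplifier.

    lft_en high means the left bit line is the reference and the right one is read.
    """
    rgt_en = not lft_en  # RGT_EN is the inverse of LFT_EN
    return {
        "RGT_EN": rgt_en,
        # Active-low enables of the charge share circuits (NAND outputs).
        "CSH_LFT_B": not (lft_en and c_share),
        "CSH_RGT_B": not (rgt_en and c_share),  # assumed mirror of the left side
        # Read-port ground of each column: tied to VDD on the reference side
        # so that M7 is cut off, tied to ground on the side being read.
        "RVSS_LEFT": "VDD" if lft_en else "GND",
        "RVSS_RIGHT": "VDD" if rgt_en else "GND",
        "READ_SIDE": "right" if lft_en else "left",
    }


for lft_en in (True, False):
    print(lft_en, reference_control(lft_en, c_share=True))
```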

#### 4.3 D Flip-flop

The Data Flip-flop, or D Flip-flop, is a digital peripheral block that stores the value presented on the data line before it is written into the memory array. The proposed design follows the standard master-slave architecture shown in Figure 4.5. The D Flip-flop circuitry has a multiplexer attached to it, which selects either the data input to be written into the memory or the test data input used to test the memory. The master follows the data input (D) while the clock is high, and latches the value of the input at the output of the master on the trailing edge of the clock pulse. The master is then disabled and remains so until the clock goes high again. When the clock goes low, the inverted clock signal at the clock input of the slave enables it, and the output of the master is transferred to the output of the slave. When the clock next goes high, the slave is disabled and remains so until the clock goes low again [11].

![](_page_38_Figure_4.jpeg)

Figure 4.5: D Flip-flop Architecture
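The master-slave behaviour described above can be captured in a few lines of behavioural code. This is a logic-level sketch only, with no timing or transistor detail. The class name and the `test_d`/`test_en` arguments for the input multiplexer are hypothetical names.

```python
class MasterSlaveDFF:
    """Logic-level model: master transparent while clk is high, slave while clk is low."""

    def __init__(self):
        self.master = 0
        self.q = 0

    def step(self, clk, d, test_d=0, test_en=0):
        data = test_d if test_en else d  # input multiplexer: test data or normal data
        if clk:
            self.master = data    # master follows the input, slave holds its value
        else:
            self.q = self.master  # master holds, slave passes the latched value
        return self.q


ff = MasterSlaveDFF()
outputs = [ff.step(clk, d) for clk, d in [(1, 1), (0, 1), (1, 0), (0, 0)]]
print(outputs)  # [0, 1, 1, 0]
```

The output changes only when the clock goes low, which matches the description of the slave being enabled by the inverted clock.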

# Chapter 5 Layout

Once the design phase is completed, the transistor design circuit is translated into physical layout design. The layout phase is an intermediate step between circuit design and fabrication process. The layout illustrates what a chip-fabricated design would look like. During design of the layout, matching, symmetry and area should be considered carefully.

#### 5.1 Sense Amplifiers

Specifically, in sense amplifier design, which is an analog circuit, the layout demands more precautions to minimize effects of device mismatches, noise, etc [22]. The transistors are placed symmetrically such that environment effects are uniformly distributed. To maximize device matching certain design techniques are employed:

- FinFET transistor symmetry allows the interdigitation technique, which reduces the source-drain junction area and gate resistance by sharing common source-drain terminals
- An Euler path helps in reducing the area by giving a unique pathway along which to place the transistors without breaking the diffusion layer
- A common centroid layout improves the matching between the differential transistors
- The finFET technology used in this thesis project employs an intermediate connecting layer which is highly capacitive. Care must be taken while drawing this layer, as the fins of a transistor can experience different capacitances if it is drawn carelessly.
- Dummy transistors are inserted in the Euler path to avoid diffusion breaks, which would otherwise cost two additional fingers per break

Additionally, the maximum width of the sense amplifier layout was limited to be two times the pitch of bit cell as shown in Figure 5.1.

![](_page_40_Figure_1.jpeg)

Figure 5.1: Proposed Floorplan

#### 5.2 D Flip-flop

The D flip-flop architecture proposed in section 4.3 is a digital logic block with many transistors, hence its area consumption is slightly higher. However, unlike in the sense amplifier, the transistors in this logic block are minimum sized. As mentioned previously, a single diffusion break consumes two fingers, which can be avoided by using a single Euler path. To this end, dummy transistors are inserted, or the strength of some critical transistors is increased by increasing their number of fingers. These two methods enable a single Euler path, which reduces the area consumption to some extent.

However, these add-on transistors can lead to an increase in parasitics, which increases the setup and hold times. Additionally, the D Flip-flop was designed as a standard cell.

These peripherals along with the bit cell array would be integrated by a memory compiler. A memory compiler is a software tool, which allows a designer to generate memories of various capacities as well as types such as single-port or two-port SRAM memories [9]. Using a compiler results in a faster, more efficient, and more customizable design flow. Since the thesis also focuses on making the design scalable, producing memories of different sizes requires minimum changes.

One of the goals specified in this thesis project is that the design should be compiler friendly. This is taken care of in the layout phase. A layout that is DRC clean at the cell level must also be DRC clean at the chip level once all the other blocks are assembled together with the memory block.

# Chapter 6 Verification

This chapter elaborates on the automated design validation flow employed for the verification process.

#### 6.1 Characterization of D Flip-flop

As mentioned in section 5.2, dummy transistors or additional fingers of critical transistors are inserted in the Euler path to avoid diffusion breaks and reduce the area. These, however, increase the parasitics in the design, which affects the setup time (the amount of time for which the input to a flip-flop must be stable before a clock edge) and the hold time (the minimum amount of time for which the input to a flip-flop must be stable after a clock edge) of the flip-flop.

Apart from the parasitics introduced by the additional transistors, the layout itself adds several parasitics. Hence, it is important to verify the functionality of the design post layout. To understand the effects of the additional transistor fingers and the layout parasitics, the design was simulated pre layout and post layout to extract setup and hold timing information. Monte Carlo simulations were run in various process corners to check the setup and hold requirements between the data and clock signals. It was observed that the setup and hold times were higher due to the additional parasitics.

#### 6.2 Verification of sense amplifier design characteristics

The goal of this project specifies the proposed design to be scalable, configurable and compiler-friendly. These requirements had to be verified.

**Scalable** - As mentioned earlier in Section 4.2.1, the design is said to be scalable when it can be easily adapted to generate the required bit line offset voltage. To verify this, the design was simulated with two different bit cell macro sizes: 2Kb with 8-bit word size and 1Kb with 8-bit word size. The changes in the design were limited to the size of the MOSCAP, which can be easily changed manually or by the compiler without creating new DRC errors.

Additionally, the width of the sense amplifier was twice the pitch of the bit cells, a configuration referred to as MUX-2. However, according to the floor plan, the design is flexible enough to fit other configurations such as MUX-4 or MUX-8.

**Compiler-Friendly** - Multiple sense amplifier design cells were placed next to each other with a memory macro of size 1Kb and multiplexers. A DRC check was run after the system integration. A clean DRC result at this level confirms that when the design is tailored by the memory compiler along with other blocks and bit cell macro (regardless of the macro size), the design would be layout compatible.

**Configurable** - The ground switch logic introduced in section 4.2.2 gives the sense amplifier its configurable property. Hence, the design was verified by simulating the different cases of reading 0 and 1. The test bench consisted of the sense amplifier to be verified and 256 two-port bit cells connected in a column on either side of the bit lines of the sense amplifier. The multiplexer logic created for switching between the right and left reference generation circuits was also included. To give more accurate and realistic simulation results, the parasitics extracted from the system level layout were added to the design. The block diagram of the testbench is shown in Figure 6.1.

![](_page_42_Figure_4.jpeg)

Figure 6.1: Testbench

This simulation was performed in all PVT conditions. The verification flow was therefore automated, as discussed in the next section.

#### 6.3 Automated Characterization flow of sense amplifier

After the design phase is completed, it is necessary to validate the design for its functionality before entering the layout phase. It is imperative that the design developed is resilient to PVT variations, i.e., generating the correct output at every process corner, supply voltage variations and temperature variations.

For more accurate and realistic results of the validity of the circuit design, performing several hundred thousand runs of simulations in all PVT variations is required. A manual approach is highly inflexible and laborious even to analyze the results.

A Python script was developed to perform automatic verification of the sense amplifiers. To verify the proper functionality of the sense amplifier, the response time is recorded, i.e., the time required for the output voltage on a branch of the inverter couple to drop to 20% of $V_{\rm DD}$ (depending on the bit read from the memory cell), measured from the time the sense amplifier is enabled. The script takes certain keywords as command line arguments to perform various functions. The syntax is as shown below:

**Syntax**: python script.py command input file

The flow chart presented in Figure 6.2 illustrates the entire verification flow with the command lines. The 'setup' command creates the simulation folders. When running this command, the sense amplifier netlist, the PVT list file and an input file that specifies the simulation environment are passed as inputs. These input files are copied into each newly created folder. To run the simulations in these folders, the script is run with the 'sim' command and the PVT list file. The estimated time required to run the simulation in each folder can be evaluated using the 'eval' command. It reports the time required for netlist simulation for each evaluated sigma or Monte Carlo value. The 'collect' command assembles the response time values for each PVT case and creates a csv file. The 'help' command displays the help text, which describes the purpose of the tool, the syntax of the commands and the points to take care of while running the simulation. Lastly, the 'clear' command can be used to delete all inessential files and folders.
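The command structure described above could be organised along the following lines. This is a hypothetical skeleton and not the script developed in the thesis: only the six command keywords come from the text, while the function names, the argument name `input_file`, the example file name and the placeholder dispatch are assumptions.

```python
import argparse

# Command keywords taken from the text; the descriptions paraphrase it.
COMMANDS = {
    "setup": "create simulation folders and copy the netlist, PVT list and input file into each",
    "sim": "run the simulation in each folder listed in the PVT list file",
    "eval": "estimate the simulation time needed in each folder",
    "collect": "gather the response times of every PVT case into a csv file",
    "help": "display the help text",
    "clear": "delete inessential files and folders",
}


def build_parser():
    parser = argparse.ArgumentParser(
        prog="script.py", description="Sense amplifier verification flow (sketch)"
    )
    parser.add_argument("command", choices=list(COMMANDS), help="task to perform")
    parser.add_argument("input_file", nargs="?", default=None, help="input or PVT list file")
    return parser


def main(argv):
    args = build_parser().parse_args(argv)
    if args.command == "help":
        for name, text in COMMANDS.items():
            print(f"{name:8s} {text}")
        return 0
    # Placeholder: a real tool would dispatch to the selected task here.
    print(f"command={args.command} input_file={args.input_file}")
    return 0


# Example invocation with a hypothetical PVT list file name.
main(["sim", "pvt_list.txt"])
```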

Physical implementation of the design introduces parasitics (resistance, capacitance and inductance) at various nodes. Thus, once the layout of the design is completed, parasitic extraction is performed to give a more precise analog model. In this thesis project an RC-coupled extraction was used so that detailed simulations can emulate real-life analog circuit responses. The output of this extraction is a netlist with transistor definitions and lumped parasitic information in DSPF format. This DSPF file is the input to the Python script for the post-layout simulations carried out in this project. To capture more accurate results, the design was simulated under industry-standard conditions, that is, with a supply voltage variation of $\pm 10\%$ from the Nominal Voltage (NV), at a nominal temperature of 25°C, a minimum temperature of -40°C and a maximum temperature of 125°C. This ensures that the design works over a wide range of temperatures. Slow Slow (SS), Fast Slow (FS), Slow Fast (SF), Fast Fast (FF), and Typical Typical (TT) were the five main process corners chosen for the simulation. The design was simulated in all cross conditions to check its functionality.
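The cross conditions described above can be enumerated programmatically. The sketch below assumes that the supply is simulated at three points (0.9, 1.0 and 1.1 times NV) and uses a hypothetical nominal voltage of 0.8 V. The text gives only the $\pm 10\%$ range, the three temperatures and the five corners.

```python
from itertools import product

CORNERS = ("SS", "FS", "SF", "FF", "TT")
SUPPLY_SCALES = (0.9, 1.0, 1.1)   # assumed sampling of the +/-10 % supply range
TEMPERATURES_C = (-40, 25, 125)


def pvt_cases(nominal_voltage):
    """Return every process/voltage/temperature combination as a dict."""
    return [
        {"corner": corner, "vdd": round(nominal_voltage * scale, 4), "temp_c": temp}
        for corner, scale, temp in product(CORNERS, SUPPLY_SCALES, TEMPERATURES_C)
    ]


cases = pvt_cases(nominal_voltage=0.8)  # hypothetical nominal voltage
print(len(cases))  # 45
print(cases[0])
```

Each dictionary could then be written to one line of a PVT list file, with one simulation folder per entry.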

A testbench file was developed to test the functionality of the sense amplifier design. The file is essentially the Spectre model of the testbench schematic illustrated in Figure 6.1. The file also specifies the simulator to be used (Spectre in this project) and the simulator configuration options.

Specifically, in differential sense amplifiers the offset voltage is set by the bit read from the memory cell. That is, one of the complementary bit lines discharges depending on whether a '0' or a '1' is read from the bit cell. Since the analysis is the same for both conditions due to symmetry, the design is treated as a black box and one of the bit lines is discharged to maintain a stable 50mV offset voltage with respect to its complementary bit line.

![](_page_44_Figure_4.jpeg)

Figure 6.2: Sense Amplifier Verification Flow

#### 6.4 Timing Scheme

The proposed design was simulated at system level with bit cell arrays of size 1Kb (8-bit word size) and 2Kb (8-bit word size), including RC-coupled parasitics, to determine the frequency at which the design can operate. The timing diagram in Figure 6.3 shows the timing scheme adopted during the simulation. As seen in Figure 6.3, a buffer time of a few picoseconds was allowed for the decoder operation before starting the read operation. Once the decoder operation is completed, the bit line precharge is deactivated and the virtual ground and charge share circuit (CSC) are activated. Once a stable ground and reference voltage have been generated, 'RWL' is activated to establish the required offset between the read bit line and the reference bit line. Once the required offset is reached, the sense amplifier is fired. The timing scheme shown in Figure 6.3 was derived for a bit cell array of size 1Kb, for the slowest bit cell in the worst corner and temperature. A similar approach was followed for the 2Kb array. The total duration of this timing scheme gave the maximum frequency at which the entire system can operate for both bit cell array sizes.

![](_page_45_Figure_3.jpeg)

Figure 6.3: Timing Scheme for system level simulation

# Chapter 7 Results

This chapter is a summary of the results obtained from the implementation of the peripheral block designs.

#### 7.1 D Flip-flop

As elaborated in section 6.1, the effects of the parasitics introduced by the dummy transistors and the layout on the setup and hold timing have to be analysed. The setup and hold times of the data signal 'D' are checked relative to the rising edge of the clock signal 'CLK'. Two different slew rates (change in signal voltage with respect to time) were chosen arbitrarily. For a better analysis, the transition times of the data and clock signals were varied from a few picoseconds to nanoseconds. The setup and hold times were checked with both the rising and the falling edge of the data signal.

This analysis with parasitics gives a more realistic understanding of the D flip-flop behaviour. Figures 7.1 and 7.2 show the effect of layout parasitics on the setup time when the data signal is rising and falling, respectively. Similarly, Figures 7.3 and 7.4 show the effect of layout parasitics on the hold time.

![](_page_47_Figure_5.jpeg)

Figure 7.1: Setup Rise time variation with layout parasitics

![](_page_48_Figure_1.jpeg)

Figure 7.2: Setup Fall time variation with layout parasitics

![](_page_48_Figure_3.jpeg)

Figure 7.3: Hold Rise time variation with layout parasitics

![](_page_48_Figure_5.jpeg)

Figure 7.4: Hold Fall time variation with layout parasitics


#### 7.2 Sensitivity and Reliability check

The two sense amplifiers operate on a certain differential offset voltage between the bit lines. The reliability of a sense amplifier is determined by this voltage. The design is said to be reliable as long as it gives the right output regardless of variations in the offset voltage. The lower the offset voltage the sense amplifier can detect, the higher its sensitivity. This offset voltage can vary due to transistor parameter variations, particularly in the single-ended sense amplifier because of its dynamic reference voltage generation. Hence, it is necessary to determine the minimum offset voltage down to which the sense amplifier works reliably in the presence of PVT variations.

This thesis project aims at a minimum offset voltage of about 70mV for the single-ended sense amplifier and 50mV for the differential-ended sense amplifier. This was verified by running several Monte Carlo simulations on the design with parasitics and randomly selecting a particular condition set to check whether the design fails. The failure of the design is determined by the response time. Here, response time refers to the time taken by the output node to rise or fall by 20% of the supply voltage, measured from the moment the SAE signal rises to 50% of the supply voltage and enables the sense amplifier. Design reliability at the desired minimum offset voltage is ensured by maintaining symmetry and adjusting transistor sizes. However, achieving higher reliability comes at the cost of area due to the transistor sizing. The response time variation across process corners for the two sense amplifier designs is presented in Figure 7.5 (single-ended sense amplifier) and Figure 7.6 (differential-ended sense amplifier).

![](_page_49_Figure_4.jpeg)

Figure 7.5: Response time variation in different process corners in single-ended sense amplifier
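The response time measurement defined above can be sketched as a threshold-crossing calculation on sampled waveforms. This is an illustrative example, not the thesis script. It assumes piecewise-linear waveforms and a falling output that is timed when it reaches 20% of the supply, one possible reading of the definition. The sample waveforms are made up.

```python
def crossing_time(times, values, level, rising):
    """First time a piecewise-linear waveform crosses `level` in the given direction."""
    for i in range(len(times) - 1):
        t0, t1 = times[i], times[i + 1]
        v0, v1 = values[i], values[i + 1]
        crosses_up = rising and v0 < level <= v1
        crosses_down = (not rising) and v0 > level >= v1
        if crosses_up or crosses_down:
            # Linear interpolation between the two samples around the crossing.
            return t0 + (level - v0) * (t1 - t0) / (v1 - v0)
    return None


def response_time(times, sae, out, v_dd, out_fraction=0.2):
    """Delay from SAE reaching 50 % of v_dd to the output falling to out_fraction of v_dd."""
    t_sae = crossing_time(times, sae, 0.5 * v_dd, rising=True)
    t_out = crossing_time(times, out, out_fraction * v_dd, rising=False)
    if t_sae is None or t_out is None or t_out < t_sae:
        return None  # treated as a failed evaluation in this sketch
    return t_out - t_sae


# Made-up waveforms, time in picoseconds and voltage in volts.
times = [0.0, 10.0, 20.0, 30.0, 40.0]
sae = [0.0, 0.0, 0.8, 0.8, 0.8]
out = [0.8, 0.8, 0.8, 0.4, 0.0]
print(response_time(times, sae, out, v_dd=0.8))
```

A run in which the output never reaches the threshold returns `None`, which a collection step could count as a failure.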

Specifically, for the differential sense amplifier, as mentioned in section 6.3, the design was treated as a black box with one of the complementary bit lines deliberately discharged to maintain the specified offset voltage. Analysis of one such condition was sufficient to verify the functionality. In this test, the read '0' operation in particular was verified.

![](_page_50_Figure_1.jpeg)

Figure 7.6: Response time variation in different process corners in differential-ended sense amplifier

The subsections below elaborate on the response time evaluation using Monte Carlo analysis and on the evaluation setup specific to each sense amplifier.

#### 7.2.1 Single-ended Sense Amplifier

First, the results obtained from reading a '1' and a '0' are depicted in Figure 7.7. In Figure 7.7a, it can be seen that the read bit line starts to discharge, as reading a '1' turns on M7 (refer to Figure 2.1). The sense amplifier enable signal is then activated exactly when the read bit line is at a 60mV offset with respect to the reference bit line. While reading a '0' the bit line does not discharge but, as in the read '1' operation, the reference voltage settles at a stable 60mV offset with respect to the read bit line.

The simulations shown above were run in the 'FS' corner at $\pm 10\%$ of the nominal voltage and 125°C. However, as mentioned in Chapter 6, more accurate and realistic results on the validity of the circuit design require several hundred thousand simulation runs across all PVT variations. Here, a total of 100000 Monte Carlo simulations were run. The supply voltage was varied by $\pm 10\%$ of the nominal voltage at the three temperature points. The results are summarized in Table 7.1 and Table 7.2. The actual results obtained were tabulated and stored in csv file format using the Python script. The format is elaborated in Appendix A, Section 10.1.

![](_page_51_Figure_1.jpeg)

(a) Read-1 operation

(b) Read-0 operation

Figure 7.7: Timing diagram for read operation

Table 7.1: Monte Carlo Output for read 0 operation

| Operation | Bitline Offset   | Monte Carlo | Pass       |
|-----------|------------------|-------------|------------|
| Read 0    | 60mV             | 100000      | 100 %      |

Table 7.2: Monte Carlo Output for read 1 operation

| Operation | Bitline Offset | Monte Carlo | Pass  |
|-----------|----------------|-------------|-------|
| Read 1    | 60mV           | 100000      | 100 % |

#### 7.2.2 Differential-ended sense amplifier

A similar analysis was used to test the functionality of the differential-ended sense amplifier. Identical conditions were chosen, that is, a total of 100000 Monte Carlo simulations were run. The supply voltage was varied by $\pm 10\%$ of the nominal voltage at the three temperature points. However, unlike in the single-ended sense amplifier, the differential-ended scheme has a stable offset voltage between the two bit lines. Depending on the bit read from the memory cell, one of the two bit lines is pulled down. Thus, the analysis of the read 0 and read 1 operations is the same. Here, one of the bit lines was set to the supply voltage while the other was offset by 50mV from the supply voltage. The results obtained are tabulated in Table 7.3.

Table 7.3: Monte Carlo Output

| Operation | Bitline Offset | Monte Carlo | Pass  |
|-----------|----------------|-------------|-------|
| Read 0    | 50mV           | 100000      | 100 % |

#### 7.3 Failure rate

As tabulated in Tables 7.1 and 7.3, the minimum offset voltage achieved was 60mV for the single-ended sense amplifier and 50mV for the differential-ended sense amplifier. However, different PVT conditions can result in slightly different differential offset voltages. Iterative Monte Carlo simulations revealed the worst PVT condition, at which the design is most likely to fail due to the change in offset voltage. The failure rate of the design was analysed in all process corners at the worst-case temperature and supply voltage (as observed from the iterative simulations). The results are plotted in Figure 7.8 and Figure 7.9. The designs were run in all conditions to reveal the worst-case condition, which in this case was 0.9 times NV at a high temperature of 125°C. As seen in the figures, the differential sense amplifier design recorded at most two failures in 100000 MC runs, in the FF, SS and SF process corners, for an offset voltage of less than 50mV. Similarly, in the FF corner the single-ended sense amplifier design failed 3 out of 100000 times when the differential offset was less than 60mV, making it the worst failing corner. The design was modified by resizing the critical transistors, controlling the effects of parasitics and strengthening the supply connections to achieve a 100% pass rate at the specified offset voltage.

![](_page_52_Figure_3.jpeg)

Figure 7.8: Failure rate of the differential-ended sense amplifier design

![](_page_53_Figure_1.jpeg)

Figure 7.9: Failure rate of the single-ended sense amplifier design
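To put the failure counts quoted in this section into perspective, they can be converted into failure rates. The sketch below uses only the counts stated in the text (2 and 3 failures in 100000 Monte Carlo runs). The function name and the dictionary labels are illustrative.

```python
def failure_rate(failures, runs):
    """Fraction of Monte Carlo runs that failed."""
    if runs <= 0 or not 0 <= failures <= runs:
        raise ValueError("need runs > 0 and 0 <= failures <= runs")
    return failures / runs


MC_RUNS = 100_000
reported = {
    "differential-ended, offset below 50mV": 2,
    "single-ended (FF corner), offset below 60mV": 3,
}
for label, fails in reported.items():
    rate = failure_rate(fails, MC_RUNS)
    print(f"{label}: {rate * 1e6:.0f} ppm failing, {100 * (1 - rate):.3f} % passing")
```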

#### 7.4 Power Delay Product

Layout parasitics have an impact on the propagation delay of the output signal. The performance of the design during the two read operations can be assessed with the Power Delay Product (PDP). As a figure of merit, PDP estimates the energy consumed during the read operation by linking propagation delay and power consumption. Faster switching of signals leads to higher energy consumption. This metric is estimated by multiplying the power consumed by the resolution time.

The total power consumption of the design was obtained by averaging the dynamic power over the simulation time period and adding the leakage power consumed. However, this leakage current, which occurs due to the finite slope of the input signal during switching, is minute compared to the dynamic power. Figure 7.10 shows the average total power consumed during the read-0 and read-1 operations in different process corners. It can be deduced that FF was the worst corner, while SS was the best corner with the lowest power consumption. Additionally, it can be noted that the read-1 operation dissipates more power. This is mainly because the read bit line starts discharging while reading a '1', and the bit line is allowed to discharge until the required offset voltage is established before the sense amplifier is enabled.

A similar analysis was conducted for the differential-ended sense amplifier, as illustrated in Figure 7.11. As seen in the bar graph, the design consumed the most power in the FF corner and the least in the SS corner.

Using the results presented in Figures 7.5, 7.6, 7.10 and 7.11, the power delay product was calculated for both designs. The variation of this metric across process corners is summarized in Figure 7.12 for the read-0 and read-1 operations of the single-ended sense amplifier, and in Figure 7.13 for the differential-ended sense amplifier.

![](_page_54_Figure_1.jpeg)

Figure 7.10: Total power consumption during read operation in single-ended sense amplifier

![](_page_54_Figure_3.jpeg)

Figure 7.11: Average power consumption during read operation in differential-ended sense amplifier

![](_page_54_Figure_5.jpeg)

Figure 7.12: PDP in different process corners in single-ended sense amplifier for read 0 and read 1 operation

![](_page_55_Figure_1.jpeg)

Figure 7.13: PDP in different process corners in differential-ended sense amplifier
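The PDP calculation described in this section amounts to a per-corner multiplication of average power and response time. The sketch below uses placeholder numbers that are not results from the thesis. The function name and the units (watts, seconds, femtojoules) are assumptions made for illustration.

```python
def power_delay_product(avg_power_w, response_time_s):
    """Energy-like figure of merit: average power multiplied by response time."""
    return avg_power_w * response_time_s


# Placeholder (average power in W, response time in s) pairs per corner.
# These values are invented for illustration and are NOT thesis results.
corners = {
    "SS": (20e-6, 40e-12),
    "TT": (25e-6, 30e-12),
    "FF": (32e-6, 22e-12),
}
for corner, (power, delay) in corners.items():
    pdp = power_delay_product(power, delay)
    print(f"{corner}: PDP = {pdp * 1e15:.3f} fJ")
```

Because a fast corner tends to raise power while shortening the response time, the product need not rank the corners in the same order as either quantity alone.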

#### 7.5 Design Comparison

Finally, a comparison with designs in other technologies was made. Table 7.4 compares the performance of the designed single-ended sense amplifier with designs in the 16nm and 14nm technology nodes. As seen, the proposed design worked at a maximum frequency of 3.63GHz for 128 bits per bit line and at 2.94GHz for 256 bits per bit line. However, this frequency was obtained in the simulation environment, whereas the compared results were measured on silicon. The obtained results might differ on silicon.

Table 7.4: Design Comparison

| Source         | ISSCC 2015 [17] | JSSC 2017 [12] | This work      |
|----------------|-----------------|----------------|----------------|
| Technology     | 16nm            | 14nm           | -              |
| Sensing Scheme | Large Signal    | Small Signal   | Small Signal   |
| # Bits/BL      | 16              | 256            | 128/256        |
| Performance    | 1.67GHz         | 2.21GHz        | ~3.63/2.94GHz  |

# Chapter 8 Conclusion
In this thesis project work, the main focus was to design a single-ended sense amplifier with dynamic reference voltage generation using the latest finFET technology. With its configurable and scalable properties, the proposed design architecture proved a suitable candidate for two-port memories, despite the increased parasitics and complex layout design requirements associated with the new finFET process node.

One of the goals of the thesis project was to achieve a sensitivity of 70mV for the single-ended sense amplifier and 50mV for the differential sense amplifier. As stated in Chapter 7, the designed single-ended sense amplifier could generate a stable offset voltage of 60mV and reliably read from the memory block across all process corners, temperatures and supply voltage variations. Similarly, the differential sense amplifier was designed to be sensitive to a 50mV bit line offset voltage. The functionality was verified using Monte Carlo simulations.

The design was also verified against the requirements of being scalable, configurable and compiler-friendly. Multiple sense amplifier design cells were tailored with a memory macro of size 1Kb and multiplexers. A DRC check verified its compiler-friendly trait. The design also proved to be configurable: data can be read from either the left or the right port of the sense amplifier by assigning one of the bit lines as the reference bit line. For the scalability check, the design was simulated with memory macros of two different sizes, namely 2Kb and 1Kb, each with an 8-bit word size.

In simulation, a high performance of about 3.63GHz and 2.94GHz was achieved for 128 and 256 bits per bit line, respectively.

The D flip-flop designed as part of the thesis project was tested for its setup and hold time variation. The design fulfilled the required setup and hold timing requirements, and the effects of parasitics appeared to be negligible.

Other performance metrics, such as PDP and resolution time, were also discussed in this report.

# Chapter 9 Future Work

It is imperative to explore the viability of the implementation in the worst process corner under varying supply voltage. The Monte Carlo method is not efficient for designs with high replication and low error rates. For this reason, importance sampling needs to be employed to speed up heavy simulation runs. It is also important to investigate other single-ended sense amplifier architectures that are less sensitive to node capacitance, considering the highly capacitive finFETs.
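
To illustrate why importance sampling helps with rare failures, the following toy sketch estimates the probability that a standard normal variable exceeds a threshold. This is a stand-in for a rare failure event, not a model of the sense amplifier, and the function names `tail_probability_is` and `exact_tail` are hypothetical. Samples are drawn from a proposal distribution shifted onto the threshold and re-weighted by the likelihood ratio.

```python
import math
import random


def tail_probability_is(threshold, n_samples, seed=0):
    """Estimate P(X > threshold) for X ~ N(0, 1) by importance sampling.

    Samples are drawn from the shifted proposal N(threshold, 1), so roughly
    half of them land in the failure region instead of almost none.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        y = rng.gauss(threshold, 1.0)
        if y > threshold:
            # Likelihood ratio of N(0, 1) to N(threshold, 1) evaluated at y.
            total += math.exp(-threshold * y + 0.5 * threshold ** 2)
    return total / n_samples


def exact_tail(threshold):
    """Closed-form P(X > threshold) for a standard normal variable."""
    return 0.5 * math.erfc(threshold / math.sqrt(2.0))


estimate = tail_probability_is(4.0, 20000)
print(f"importance sampling: {estimate:.3e}, exact: {exact_tail(4.0):.3e}")
```

With plain Monte Carlo, a failure probability of a few times 1e-5 would yield only a handful of failing samples in 100000 runs, whereas the shifted proposal places about half of the samples in the failure region.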

Apart from the memory peripheral circuits designed in this thesis project, other peripheral components such as the decoder, pre-charge and timing circuitry need to be designed for complete memory generation.

## References

- [1] R. S. Pal, S. Sharma, and S. Dasgupta, "Recent trend of FinFET devices and its challenges: A review," 2017 Conference on Emerging Devices and Smart Systems (ICEDSS), 2017, pp. 150-154, doi: 10.1109/ICEDSS.2017.8073675
- [2] P. U. Jain and V. K. Tomar, "FinFET Technology: As A Promising Alternatives for Conventional MOSFET Technology," 2020 International Conference on Emerging Smart Computing and Informatics (ESCI), 2020, pp. 43-47, doi: 10.1109/ESCI48226.2020.9167646
- [3] L. Lu, T. Yoo, L. Van Loi and T. T. -H. Kim, "An Ultra-low Power 8T SRAM with Vertical Read Word Line and Data Aware Write Assist," 2018 IEEE Asian Solid-State Circuits Conference (ASSCC) 2018, pp. 1-2, doi: 10.1109/ASSCC.2018.8579292
- [4] A. Fritsch et al., "24.1 A 6.2 GHz Single Ended Current Sense Amplifier (CSA) Based Compileable 8T SRAM in 7nm FinFET Technology," 2021 IEEE International Solid- State Circuits Conference (ISSCC), 2021, pp. 334-336, doi: 10.1109/ISSCC42613.2021.9365812
- [5] C. -C. Wang, R. G. B. Sangalang and I. -T. Tseng, "A Single-Ended Low Power 16-nm FinFET 6T SRAM Design with PDP Reduction Circuit," in *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 68, no. 12, pp. 3478-3482, Dec. 2021, doi: 10.1109/TCSII.2021.3123676
- [6] C. Papaix and J. M. Daga, "A new single ended sense amplifier for low voltage embedded EEPROM non volatile memories," *Proceedings of the 2002 IEEE International Workshop on Memory Technology, Design and Testing* (*MTDT2002*), 2002, pp. 149-153, doi: 10.1109/MTDT.2002.1029776
- [7] A. Singhee and R. A. Rutenbar, *Extreme Statistics in Nanoscale Memory Design*, USA: Springer Science+Business Media LLC, 2010, pp. 12-15, 72-85
- [8] N. H. E. Weste and D. Harris, *CMOS VLSI Design: A Circuit and Systems Perspective*, 3rd ed., Addison-Wesley, 2005, pp. 231-235
- [9] J. Gustavsson and A. Andersson, "Design of memory compiler," http://lup.lub.lu.se/student-papers/record/8896304

- [10] Wen-Chieh Wu, Ming-Chuen Shiau, Chien-Cheng Yu, and Ching-Chih Tsai, "Two-Port SRAM Cell with Improved Write Operation", *International Journal of Information and Electronics Engineering*, Vol. 8, No. 3, September 2018
- [11] B. Holdsworth, R.C. Woods, *Digital Logic Design*, fourth ed., 2003, doi: https://doi.org/10.1016/B978-0-7506-4582-9.X5000-8
- [12] J. P. Kulkarni et al., "5.6 Mb/mm<sup>2</sup> 1R1W 8T SRAM Arrays Operating Down to 560 mV Utilizing Small-Signal Sensing With Charge Shared Bitline and Asymmetric Sense Amplifier in 14 nm FinFET CMOS Technology," in *IEEE Journal of Solid-State Circuits*, vol. 52, no. 1, pp. 229-239, Jan. 2017, doi: 10.1109/JSSC.2016.2607219
- [13] B. Razavi, "The StrongARM Latch [A Circuit for All Seasons]," in *IEEE Solid-State Circuits Magazine*, vol. 7, no. 2, pp. 12-17, Spring 2015, doi: 10.1109/MSSC.2015.2418155
- [14] X. Dong, C. Xu, Y. Xie, and N. P. Jouppi, "NVSim: A circuit-level performance, energy, and area model for emerging nonvolatile memory," *IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems*, vol. 31, pp. 994-1007, July 2012.
- [15] T. Kobayashi, K. Nogami, T. Shirotori, and Y. Fujimoto, "A current-mode latch sense amplifier and a static power saving input buffer for low-power architecture," in *Proc. VLSI Circuits Symp. Dig. Technical Papers*, June 1992, pp. 28-29.
- [16] S. Yan, D. Li, L. Wang, Xiao and M. Tang, "A Novel Methodology of Layout Design by Applying Euler Path," 10th IEEE International Conference on Solid-State and Integrated Circuit Technology, 2010
- [17] H. Fujiwara et al., "A 64kb 16nm asynchronous disturb current free 2-port SRAM with PMOS pass-gates for finFET technologies," in *IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers*, Feb. 2015, pp. 1-3.
- [18] C. Toumazou, F. Lidgey, and D. Haigh, Analogue IC Design: The Current mode Approach, IEEE circuits systems series, Peregrinus, 1992
- [19] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ, USA: Prentice-Hall, Inc., 1996
- [20] B. Razavi, Design of Analog CMOS Integrated Circuits. McGraw-Hill, first ed., 2001
- [21] A. Prieto, "Statistical approach for the design of refresh-free edram with retention timing constraint," 2019. Student Paper.
- [22] Srinivasan Muthukrishnan, "Design of High Speed in Memory Serializer/Deserializer with Integrated Sense Amplifier," 2019. Student Paper.

![](_page_63_Figure_0.jpeg)

### 10.1 Characterization of Sense Amplifier

Table 10.1 below tabulates the results of running 100000 Monte Carlo simulations on both the differential and the single-ended sense amplifier designs. The Python script developed for characterization of the design outputs a CSV file in the following format.

#### Table 10.1: Characterization of Sense Amplifier

| Corners | $V_{DD}$ (V) | Temp (°C) | Bitline<br>Voltage (V) | Offset (mV)         | MC     | Fail |
|---------|--------------|-----------|------------------------|---------------------|--------|------|
| FF      | 1.1*NV      | 125      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| FS      | 1.1*NV      | 125      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| TT      | 1.1*NV      | 125      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| SF      | 1.1*NV      | 125      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| SS      | 1.1*NV      | 125      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| FF      | 1.1*NV      | 25       | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| FS      | 1.1*NV      | 25       | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| TT      | 1.1*NV      | 25       | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| SF      | 1.1*NV      | 25       | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| SS      | 1.1*NV      | 25       | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| FF      | 1.1*NV      | -40      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| FS      | 1.1*NV      | -40      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| TT      | 1.1*NV      | -40      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| SF      | 1.1*NV      | -40      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| SS      | 1.1*NV      | -40      | 1.1*NV                 | Specified<br>offset | 100000 | 0    |
| FF      | 0.9*NV      | 125      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| FS      | 0.9*NV      | 125      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| TT      | 0.9*NV      | 125      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| SF      | 0.9*NV      | 125      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| SS      | 0.9*NV      | 125      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| FF      | 0.9*NV      | 25       | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| FS      | 0.9*NV      | 25       | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| TT      | 0.9*NV      | 25       | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| SF      | 0.9*NV      | 25       | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| SS      | 0.9*NV      | 25       | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| FF      | 0.9*NV      | -40      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| FS      | 0.9*NV      | -40      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| TT      | 0.9*NV      | -40      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| SF      | 0.9*NV      | -40      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
| SS      | 0.9*NV      | -40      | 0.9*NV                 | Specified<br>offset | 100000 | 0    |
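
The sweep behind this table can be sketched as follows. This is not the characterization script used in the thesis; it is a minimal illustration, assuming the corner, temperature and supply values listed in Table 10.1. The names `build_rows`, `to_csv` and `stub_simulator` are hypothetical, and `stub_simulator` is a placeholder for a call to the circuit simulator that would return the number of failing Monte Carlo runs.

```python
import csv
import io
from itertools import product

CORNERS = ["FF", "FS", "TT", "SF", "SS"]
TEMPS_C = [125, 25, -40]
VDD_SCALES = [1.1, 0.9]  # multiples of NV, as written in Table 10.1


def build_rows(offset_mv, count_failures, mc_runs=100_000):
    """Build one result row per (supply, temperature, corner) combination."""
    rows = []
    # product() varies the last iterable fastest, matching the table order.
    for scale, temp, corner in product(VDD_SCALES, TEMPS_C, CORNERS):
        vdd = f"{scale}*NV"
        rows.append({
            "Corners": corner,
            "VDD(V)": vdd,
            "Temp(C)": temp,
            "Bitline Voltage (V)": vdd,
            "Offset(mV)": offset_mv,
            "MC": mc_runs,
            "Fail": count_failures(corner, scale, temp, offset_mv, mc_runs),
        })
    return rows


def to_csv(rows):
    """Serialize the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def stub_simulator(corner, scale, temp, offset_mv, mc_runs):
    """Placeholder only: a real flow would launch the simulator here."""
    return 0


rows = build_rows(offset_mv=50, count_failures=stub_simulator)
print(to_csv(rows))
```

Passing the simulator as a callable keeps the sweep logic separate from the tool invocation, so the same loop can serve both sense amplifier designs with different offsets.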

### 10.2 Voltage gain of sense-amplifier

Ignoring transistors M12 and M13 (refer to Figure 3.1) and analyzing only the half circuit of the chosen StrongARM architecture (shown in Figure 10.1), the overall voltage gain is given by:

$$A_{\rm V} = \frac{V_{\rm OD}}{V_{\rm ID}}$$

where,

 $V_{OD}$  is the differential output voltage

 $V_{ID}$  is the differential input voltage

We can also deduce voltage gain as,

$$A_{\rm V} = g_{\rm m1,2}(R_{\rm ON} \parallel R_{\rm OP})$$

This half circuit acts as a differential pair with cascoding load M3 and current source transistor M5 applied to the amplifying transistors M1 and M2. Hence,

$$R_{\rm ON} = (g_{\rm m3} r_{\rm o3}) r_{\rm o1}$$
$$R_{\rm OP} = r_{\rm o5}$$

where,

$r_{\rm o1},\,r_{\rm o3},\,r_{\rm o5}$ are the output resistances of transistors M1, M3 and M5, respectively, and $g_{\rm m1,2}$ and $g_{\rm m3}$ are the transconductances of M1/M2 and M3.
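
The gain expression above can be evaluated numerically as in the sketch below. The small-signal values are hypothetical placeholders, not parameters extracted from the design, and the helper names `parallel` and `half_circuit_gain` are introduced only for this illustration.

```python
def parallel(r1, r2):
    """Equivalent resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def half_circuit_gain(gm1, gm3, ro1, ro3, ro5):
    """Voltage gain A_V = gm1 * (R_ON || R_OP) of the half circuit."""
    r_on = gm3 * ro3 * ro1  # cascoded branch: (g_m3 * r_o3) * r_o1
    r_op = ro5              # current-source branch: r_o5
    return gm1 * parallel(r_on, r_op)


# Hypothetical small-signal values (siemens and ohms), for illustration only.
gain = half_circuit_gain(gm1=1e-3, gm3=1e-3, ro1=20e3, ro3=20e3, ro5=20e3)
print(f"A_V = {gain:.2f}")
```

With these placeholder values, $R_{\rm ON}$ is twenty times larger than $R_{\rm OP}$, so the parallel combination, and therefore the gain, is set mostly by $r_{\rm o5}$.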

![](_page_65_Figure_13.jpeg)

Figure 10.1: Differential Half-Circuit of StrongARM latch circuit

### 10.3 Currents in the branches of sense-amplifier

Referring to Figure 3.1, the currents in the branches can be calculated as shown below.

Let  $+\Delta I$  and  $-\Delta I$  represent the differential currents produced by M1 and M2 respectively. The current at D3 and D4 can be written as,

$$-C_{\rm D3} \frac{{\rm d}(V_{\rm D3})}{{\rm d}t} = g_{\rm m3}(V_{\rm D4} - V_{\rm D1})$$
$$-C_{\rm D4} \frac{{\rm d}(V_{\rm D4})}{{\rm d}t} = g_{\rm m4}(V_{\rm D3} - V_{\rm D2})$$

Then the currents at the drains of transistors M1 and M2 are,

$$-C_{\mathrm{D1}}\frac{\mathrm{d}(V_{\mathrm{D1}})}{\mathrm{d}t} = C_{\mathrm{D3}}\frac{\mathrm{d}(V_{\mathrm{D3}})}{\mathrm{d}t} + \Delta I$$
$$-C_{\mathrm{D2}}\frac{\mathrm{d}(V_{\mathrm{D2}})}{\mathrm{d}t} = C_{\mathrm{D4}}\frac{\mathrm{d}(V_{\mathrm{D4}})}{\mathrm{d}t} - \Delta I$$
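
As a worked step, the expressions for the currents at D3 and D4 can be substituted into the drain equations of M1 and M2. This only rearranges the four equations above, using the same symbols and signs, and introduces no new assumptions:

$$C_{\rm D1} \frac{{\rm d}(V_{\rm D1})}{{\rm d}t} = g_{\rm m3}(V_{\rm D4} - V_{\rm D1}) - \Delta I$$
$$C_{\rm D2} \frac{{\rm d}(V_{\rm D2})}{{\rm d}t} = g_{\rm m4}(V_{\rm D3} - V_{\rm D2}) + \Delta I$$

In this form the differential currents $+\Delta I$ and $-\Delta I$ appear with opposite signs in the two drain equations, which shows how the input difference pushes $V_{\rm D1}$ and $V_{\rm D2}$ apart.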

![](_page_67_Picture_1.jpeg)

Series of Master's theses Department of Electrical and Information Technology LU/LTH-EIT 2022-893 http://www.eit.lth.se