# A Novel Variation-Aware Low-Power Keeper Architecture for Wide Fan-in Dynamic Gates

Hamed F. Dadgour, Rajiv V. Joshi\* and Kaustav Banerjee

Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106

\*IBM T.J. Watson Research Center, Yorktown Heights, NY

e-mail: {hamed, kaustav}@ece.ucsb.edu, rvjoshi@us.ibm.com

## ABSTRACT

Substantial increases in leakage current and threshold voltage fluctuations are making the design of robust wide fan-in dynamic gates a challenging task. Traditionally, a PMOS keeper transistor has been employed to compensate for the leakage current of the pull-down (NMOS) network. However, to maintain an acceptable noise margin in sub-100 nm technologies, a large PMOS keeper is necessary, which results in substantial contention (during pull-down) and severe loss of performance. In this paper, a novel keeper architecture is proposed that significantly reduces this contention and improves both performance and power consumption. Circuit simulations demonstrate the superior characteristics of the proposed keeper in comparison to those of the traditional as well as state-of-the-art keepers. It is shown that for an 8-input OR gate, in the presence of 15%  $V_{th}$  fluctuations, the proposed architecture can lead to 20%, 15%, and more than 40% reduction in power consumption, mean delay, and standard deviation of delay, respectively, when compared to the traditional keeper circuit.

## **Categories and Subject Descriptors**

B.7.1 [Integrated Circuits]: Types and Design Styles – *VLSI* (very large scale integration).

## **General Terms**

Performance, Design, Reliability.

## Keywords

Dynamic gates, process variation, keeper design, low-power design, reliability, robustness, VLSI.

## **1. INTRODUCTION**

High fan-in dynamic OR gates are commonly employed in the design of high-performance register files, which are among the most important modules in the critical path of modern microprocessors [1]. A dynamic implementation of a wide fan-in OR gate offers low latency because, unlike its static CMOS counterpart, it does not require a PMOS transistor stack. However, the major disadvantage of dynamic gates is their low noise margin, which is becoming more severe with increased leakage current and process variation in sub-100 nm technologies [2].

DAC 2006, July 24-28, 2006, San Francisco, California, USA.

Copyright 2006 ACM 1-59593-381-6/06/0007...\$5.00.

To increase the noise margin of dynamic gates, a small PMOS keeper transistor is conventionally used to compensate for the leakage current drawn out of the dynamic node and so preserve an acceptable voltage level during the evaluation phase. However, the keeper stays on from the time the NMOS network begins to pull down the dynamic node until the output node voltage rises high enough to turn off the PMOS keeper (Figure 1). This contention increases both the delay of the OR gate and its power consumption, and it forms a trade-off between achievable performance, power consumption, and noise margin of the dynamic circuit [3]. This trade-off is becoming increasingly demanding in sub-100 nm technologies because, as technology scales, transistor leakage current increases sharply, which means that wide dynamic gates require much larger keepers. Moreover, process variation [4] leads to significant variation in the leakage current of gates located in different regions of a die. As a result, to maintain an appropriate noise margin for different gates spread over a chip, designers must use large PMOS keepers so that sufficient current is supplied to the dynamic node even in worst-case scenarios.



Figure 1. Dynamic OR gate with traditional keeper.

In the literature, a few attempts have been made to address the robustness problem of dynamic gates. These papers can be classified into two groups. Designs in the first group try to decrease the leakage current by re-engineering the pull-down network [5], [6]. Papers in the second group, including this work, focus on the design of innovative keeper circuits [2], [7], [8], since redesigning the keeper usually has less overhead than modifying the pull-down network. In [7], a programmable keeper circuit has been proposed whose strength can be adjusted using a 3-bit digital enable input. This keeper has 3 stages with relative strengths of 1, 2, and 4. Parameter variation is measured with a sensor, and then a keeper of appropriate strength is chosen by applying the 3-bit input. However, in this approach, the dynamic node is heavily loaded by the gate and junction capacitances of the control and keeper transistors. In [8], the authors split the keeper circuit into two parts so that during the evaluation phase, the first part is always on and the second part turns on after a delay. The delay is used to decrease the contention between the PMOS keeper and the pull-down network throughout the switching transition. The drawback of this approach is that nearly all of the noise occurs during the transition of the pre-charge/evaluation clock signal, when this keeper's strength is at its smallest, and therefore the dynamic node does not receive enough current in that period. In [2], the authors have proposed a new approach in which, instead of a PMOS transistor, a (non-CMOS) semiconductor device with negative resistance characteristics is used; such a device is not easily integrable in a CMOS process.

In this paper, a novel variation-aware low-power/high-performance keeper architecture is presented, which is capable of improving the noise margin of dynamic gates with minimal performance penalty. Circuit simulation is used to show the superior performance of the proposed architecture in comparison to existing works and the traditional keeper. It is shown that in the presence of 15%  $V_{th}$  fluctuations, an 8-input OR gate with the new keeper, compared to the same gate with the traditional keeper, achieves 20%, 15%, and more than 40% reduction in power consumption, mean delay, and standard deviation of delay, respectively.

The rest of the paper is organized as follows. Section 2 characterizes the trade-off between noise margin and performance of dynamic gates through a graphical presentation, leading to the main idea behind the proposed circuit. In Section 3, the architecture of the proposed keeper is presented and details of its implementation are discussed. Section 4 presents simulation results and a comparison of the proposed keeper with other keeper circuits. Finally, concluding remarks are made in Section 5.

## 2. Keeper Sizing for Dynamic OR Gates

Higher performance of wide fan-in dynamic OR gates comes at the cost of lower noise margin. This trade-off is shown graphically in Figure 2 for an 8-input dynamic OR gate with a simple PMOS keeper, where Z, Y, and X-axis represent normalized worst-case delay, process variation measured in terms of standard deviation of threshold voltage ( $\sigma_{Vth}$ ) as percentage of its nominal value ( $\mu_{Vth}$ ), and noise margin in volts, respectively.



Figure 2. 3D graph showing trade-offs in PMOS keeper sizing: Z, Y and X axis are normalized delay, noise margin, and process variation, respectively.

As can be observed, performance degrades significantly as either the process variation or the required noise margin increases. The idea behind our proposed keeper is based on a graphical representation of this trade-off (Section 3), which requires a proper description of the noise margin and performance metrics, as summarized below.

#### 2.1 Noise Margin and Its Characterization

In the literature, several metrics have been proposed to determine noise margin of dynamic gates [3]. In this work, a noise margin metric called *Unity Noise Gain (UNG)* has been used [2]. UNG is defined as the voltage level, which if applied to all inputs of a dynamic OR gate, results in a signal of same amplitude at the output node. This is shown in Figure 3, where voltage source of amplitude  $V_{NM}$  is applied to all the NMOS transistors of the pull down network and a keeper circuit, which is modeled by a current source, supplies enough current to the dynamic node so that voltage of output node ( $V_{OUT}$ ) is equal to  $V_{NM}$ . In this configuration, for a given threshold voltage of transistors, amplitude of current source  $I_{Min\_Req}$  is called *minimum required current* for UNG equal to  $V_{NM}$ .



Figure 3. Measurement of minimum required current in presence of V<sub>th</sub> variations for a particular noise margin.

In the presence of process variations, the leakage current drawn out of the dynamic node exhibits a large deviation; therefore, the minimum required current that an ideal keeper circuit must supply for the same UNG level differs from gate to gate. To account for the impact of variation, HSPICE simulations of the configuration shown in Figure 3 have been used in which, for a particular noise margin (UNG), the threshold voltage of the transistors is swept over the entire variation range and the minimum required current is measured.
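The sweep described above can be mimicked with a toy analytical model in place of HSPICE. The sketch below is only an illustration: it assumes that each pull-down NMOS conducts a sub-threshold current of the exponential form that the paper later adopts from [11], and every numeric value (`i0`, `eta`, `n`, `v_dyn`, the nominal threshold `v_th0`) is a hypothetical placeholder rather than a value from the paper.

```python
import math


def min_required_current(v_th, v_nm=0.2, v_dyn=0.9, n_inputs=8,
                         i0=1e-6, eta=0.1, n=1.5, v_t=0.033):
    """Toy estimate of the keeper current needed to hold the dynamic node.

    Assumption: each of the n_inputs pull-down NMOS devices sees
    V_gs = v_nm and V_ds = v_dyn and conducts a sub-threshold current.
    i0 lumps mobility, oxide capacitance, W/L and V_T^2 into one
    illustrative constant; none of the defaults come from the paper.
    """
    exponent = (v_nm - (v_th - eta * v_dyn)) / (n * v_t)
    return n_inputs * i0 * math.exp(exponent)


def sweep(v_th0=0.25, sigma_frac=0.05, points=7):
    """Sweep V_th from V_th0 - 3*sigma to V_th0 + 3*sigma.

    Returns a list of (v_th, required_current) pairs.
    """
    sigma = sigma_frac * v_th0
    lo = v_th0 - 3 * sigma
    hi = v_th0 + 3 * sigma
    step = (hi - lo) / (points - 1)
    return [(lo + k * step, min_required_current(lo + k * step))
            for k in range(points)]


if __name__ == "__main__":
    for v_th, current in sweep():
        print(f"V_th = {v_th:.4f} V -> I_min_req = {current:.3e} A")
```

In this toy model the required current falls exponentially as the threshold voltage rises, so the low-threshold corner dominates keeper sizing.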



A sketch of the minimum required current is shown in Figure 4, where the vertical axis shows current and the horizontal axis represents the variation in threshold voltage. The broken line in this figure displays the minimum current required to sustain a constant UNG over the threshold voltage range from  $V_{th0}-3\sigma$  to  $V_{th0}+3\sigma$ . The solid lines represent the drive current of traditional PMOS keepers, which varies linearly with threshold voltage. To meet the minimum current criterion over the whole  $V_{th}$  variation range, the PMOS keeper must be enlarged; in this figure, only the top solid line satisfies the requirement. The minimum required current curve, which represents the behavior of an ideal keeper, is our reference for evaluating the efficiency of other keeper circuits under process variation. Note that it is very difficult to design a keeper that provides exactly the minimum required current over the entire range of threshold voltage variations; however, it is desirable to propose circuitry that does not provide much more current than is necessary.

The hashed area in Figure 4 corresponds to the excess current injected into the dynamic node (current above the minimum required level), which can be reduced with proper keeper design. Based on Figure 4, a novel architecture is presented in this paper to reduce the amount of contention. In the proposed architecture, as shown in Figure 5, instead of using a single large PMOS, the minimum required current is generated through the superposition of two curves. Consequently, the proposed keeper is composed of two parts, which are connected to the dynamic node in parallel as shown in Figure 6. The current characteristics of each part of the keeper are represented with dashed lines in Figure 5. The first component of the proposed keeper is a small PMOS whose current curve  $(I_1)$  is a straight line with a small slope, while the second part is a process-variation-coupled circuit, which has insignificant current up to the point  $V_X$  and begins to supply current afterwards  $(I_2)$  with a large slope. The bold solid line is the total current supplied to the dynamic node by these two components. Also shown in Figure 5 are the circuit parameters that affect each of the  $I_1$  and  $I_2$  curves; these are discussed in Section 3.



Although redesigning keeper circuit reduces contention current, its implementation requires additional circuitry compared to traditional single PMOS keeper, which can diminish the potentially achievable performance enhancement. Therefore, the proposed keeper circuitry must be designed carefully considering its impact on the performance of dynamic gates.

#### 2.2 Performance of a Dynamic OR Gate

To maintain high performance, two major factors should be considered when designing a keeper circuit. First, the keeper and its control circuitry add loading to the gate. Second, the keeper should be capable of switching off very quickly, because the longer it remains on during evaluation, the longer it competes with the NMOS network during pull-down. Most of the existing keeper designs in the literature either heavily load the OR-gate nodes or involve control circuitry that responds so slowly that the overall design fails to fully exploit the potential improvements [7], [8].

Proposed keeper in this work addresses these two issues with its uncomplicated and fast architecture. As shown in Figure 6, loading-wise, this approach only requires two additional connections to the dynamic node. Additionally, the control circuit is very fast so that it can shut down the variation coupled keeper almost instantly, even before the small (fixed) PMOS keeper goes off. Precise design and sizing of transistors provide significant performance improvements as will be discussed in Section 3.



Figure 6. Block diagram of proposed keeper architecture.

## 3. Proposed Keeper Circuit

The proposed architecture has three major modules: two keepers, which are called fixed and variation-coupled, and one sensor for process variation [6]. In the following sub-sections details about the operation of these circuits are presented.

#### 3.1 Process Variations

Process variation is generally classified into random and systematic parameter fluctuations [9]. Random variation affects each device individually and independently of other nearby devices. In systematic variation, however, adjacent devices are affected in the same manner and there is a strong correlation between the parameters of transistors located in a neighborhood. In this work, we focus only on systematic variation, as it is assumed that the threshold voltages of all transistors in a dynamic gate have been altered in the same way. To account for random variations, one can target a noise margin slightly higher than the actual desired UNG level. This noise margin guard band must be chosen according to the level of random variation at a given technology node. However, note that the noise margin overhead is likely to be low, since worst-case scenarios in which random variation imposes a lower threshold voltage on all NMOS devices and a higher threshold voltage on the PMOS keeper are very unlikely.

#### 3.2 Process Variation Sensor

The variation sensor used in this work is based on the DIBL (Drain Induced Barrier Lowering) effect, in which the threshold voltage of a short-channel device is modulated by its drain voltage [6]. In other words, a high drain voltage can lower the source-channel barrier so that the channel is formed at lower gate voltages (lower threshold voltage). Therefore, assuming a fixed bias condition ( $V_{Bias}$  and  $I_{REF}$ ), the drain voltage of  $M_2$  in Figure 6 is a function of systematic process variations. If variation imposes a higher threshold voltage on the gate, a higher drain voltage is needed to carry the same  $I_{REF}$ , so  $V_{DIBL}$  will increase, and vice versa; this is the dependence expressed by the relation for  $V_{DIBL}$  derived in Section 3.4. The role of the bias circuitry is critical here because the bias condition is assumed to be the same over the entire threshold voltage range. Such process-variation-insensitive circuitry has been proposed in [10] and is used in our keeper circuit. The reference current and voltage generated by the bias circuitry are only a function of the thermal voltage and the widths of the transistors in that circuit; hence, they are effectively independent of channel length variations. Moreover, the process variation sensor can be shared among multiple adjacent dynamic OR gates using current mirror circuits, which avoids further area overhead and power consumption.

#### 3.3 Fixed Keeper

In this sub-section we derive an analytical equation for the drive current of the traditional keeper versus  $V_{th}$  ( $I_1$ - $V_{th}$  curve) using the circuit configuration of Figure 3, where the current source is replaced by the traditional keeper. The drive current of the fixed PMOS keeper  $(M_1)$  is approximately a linear function of its threshold voltage (1).

$$I_1 = I_p = \mu_p C_{ox} \left(\frac{W}{L}\right)_{M1} (V_{SGp} + V_{thp}) \tag{1}$$

Note that in this circuit configuration, the source-gate voltage of the PMOS keeper is constant, because this equation is derived under the condition specified by the definition of UNG. Therefore, with  $V_{SGp}=V_{DD}-V_{out}$  (where  $V_{out}=V_{NM}$ ) and  $V_{thp}=-V_{th}$ , (1) can be rewritten as:

$$I_{1} = K_{1}(V_{DD} - V_{out}) - K_{1}V_{th} \tag{2}$$

Where,

$$K_1 = \mu_p C_{ox} \left(\frac{W}{L}\right)_{M1} \tag{3}$$

Equation (2) explains why increasing the size of the PMOS to meet the minimum required current level at  $V_{th0}-3\sigma$  also raises its current around  $V_{th0}+3\sigma$  (Figure 4): the size enters only through  $K_1$ , which scales the whole line.

#### 3.4 Variation-Coupled Keeper

In this section we derive the current curve ( $I_2$ - $V_{th}$  curve) of the variation-coupled keeper. The proposed architecture (Figure 6) is chosen so that only one of  $M_3$  and  $M_4$  conducts at a given time. If the dynamic node is high,  $M_4$  is off and the gate of  $M_3$  is controlled by the drain voltage of  $M_2$ . Whenever the dynamic node switches from high to low,  $M_4$  turns on, pulling up the gate voltage of  $M_3$  and turning it off. Note that  $M_2$  is in its sub-threshold region, so  $M_4$  can easily lift the gate of  $M_3$ . While the keeper is conducting, the gate of  $M_3$  is controlled by the drain voltage of  $M_2$ . Therefore, to find the drain current of  $M_3$  ( $I_2$ ), we first need to derive an equation for the drain voltage of  $M_2$ . Since  $V_{Bias}$  is chosen to be less than the threshold voltage of  $M_2$ , this transistor is in the sub-threshold region. The sub-threshold leakage current of a MOSFET can be modeled as [11]:

$$I_{ds} = \mu_n C_{ox} \frac{W}{L} V_T^2 e^{\frac{V_{gs} - (V_{th} - \eta V_{ds})}{nV_T}} \tag{4}$$

Where  $\mu_n$  is the effective mobility,  $C_{ox}$  is the gate-oxide capacitance, *L* is the effective channel length, *W* is the effective width,  $V_T$  is the thermal voltage,  $V_{th}$  is the threshold voltage of the transistor,  $n$  is the sub-threshold swing coefficient, and  $\eta$  is the DIBL constant. Since  $I_{REF}$  is independent of process variations, all of the parameters in (4) are constant except for  $V_{th}$  and  $V_{ds}$  (in this circuit, the voltage of the DIBL node). As a result, any variation in the threshold voltage of  $M_2$  can only be compensated by a change in  $V_{ds}$ . The voltage of the DIBL node can be obtained in terms of the other parameters from (4):

$$V_{DS} = V_{DIBL} = \frac{1}{\eta} \left( V_{th} + K_2 \right) \tag{5}$$

Where,

$$K_2 = nV_T \cdot \ln \left[ \frac{I_{REF}}{\mu_n C_{ox} \left( \frac{W}{L} \right)_{M2} V_T^2} \right] - V_{Bias} \tag{6}$$
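The step from the sub-threshold model (4) to the DIBL-node voltage can be spelled out as follows. This is a reconstruction from the definitions above, assuming that  $M_2$  carries  $I_{ds}=I_{REF}$  with  $V_{gs}=V_{Bias}$  and  $V_{ds}=V_{DIBL}$ :

$$
\begin{aligned}
I_{REF} &= \mu_n C_{ox} \left(\frac{W}{L}\right)_{M2} V_T^2 \, e^{\frac{V_{Bias} - V_{th} + \eta V_{DIBL}}{nV_T}} \\
nV_T \cdot \ln \left[ \frac{I_{REF}}{\mu_n C_{ox} \left( \frac{W}{L} \right)_{M2} V_T^2} \right] &= V_{Bias} - V_{th} + \eta V_{DIBL} \\
\eta V_{DIBL} &= V_{th} + K_2 \\
V_{DIBL} &= \frac{1}{\eta}\left(V_{th} + K_2\right)
\end{aligned}
$$

Under these assumptions the DIBL node moves by  $1/\eta$  volts per volt of threshold shift; since  $\eta$  is typically well below one, the sensor amplifies threshold variations.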

The gate-source voltage of  $M_3$  is the voltage difference between the DIBL node and  $V_{DD}$ ; therefore  $V_{GSp}=V_{DIBL}-V_{DD}$  (that is,  $V_{SGp}=V_{DD}-V_{DIBL}$ ) and  $V_{thp}=-V_{th}$ . As a result, the current characteristic of the variation-coupled keeper ( $I_2$ ) is obtained as:

$$I_2 = I_p = \mu_p C_{ox} \left(\frac{W}{L}\right)_{M3} (V_{SGp} + V_{thp}) \tag{7}$$

$$I_{2} = K_{3}\left(V_{DD} - \frac{K_{2}}{\eta}\right) - K_{3}\left(1 + \frac{1}{\eta}\right)V_{th} \tag{8}$$

Where,

$$K_3 = \mu_p C_{ox} \left(\frac{W}{L}\right)_{M3} \tag{9}$$

Equation (8) is notable because it describes a straight line in the  $I_2$ - $V_{th}$  plane whose intercept with the  $V_{th}$  axis (a function of  $M_2$  size,  $V_{Bias}$ , and  $I_{REF}$ ) and whose slope (a function of  $M_3$  size) can be set independently of each other. The circuit parameters affecting the keeper current curves are shown in Figure 5.
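The independence of intercept and slope can be checked by substituting  $V_{DIBL}=(V_{th}+K_2)/\eta$  into (7) and solving for the zero crossing. Identifying that zero crossing with the turn-on point  $V_X$  of Figure 5 is an assumption of this sketch:

$$
\begin{aligned}
V_{SGp} + V_{thp} &= (V_{DD} - V_{DIBL}) - V_{th} = V_{DD} - \frac{V_{th} + K_2}{\eta} - V_{th} \\
I_2 &= K_3\left(V_{DD} - \frac{K_2}{\eta}\right) - K_3\left(1 + \frac{1}{\eta}\right)V_{th} \\
I_2 = 0 \;\Rightarrow\; V_X &= \frac{\eta V_{DD} - K_2}{1 + \eta}
\end{aligned}
$$

The crossing point depends only on  $\eta$ ,  $V_{DD}$ , and  $K_2$  (hence on  $M_2$  size,  $V_{Bias}$ , and  $I_{REF}$ ), while  $K_3$  (the size of  $M_3$ ) scales the slope  $K_3(1+1/\eta)$  without moving  $V_X$ .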

#### 3.5 Keeper Design Framework

It was found that the optimal value for  $V_X$  is the nominal threshold voltage ( $V_{th0}$ ) and that  $M_4$  should be minimum sized to ensure the smallest possible loading on the dynamic node. The size of  $M_1$  must be chosen so that the fixed keeper ( $I_1$  in Figure 5) can maintain the minimum required current over the  $V_{th}$  spread from  $V_{th0}$  to  $V_{th0}+3\sigma$ , as the variation-coupled keeper is off in this range ( $I_2=0$ ). Accordingly, the size of  $M_2$ ,  $V_{Bias}$ , and  $I_{REF}$  should be chosen such that the variation-coupled keeper remains off from  $V_{th0}$  to  $V_{th0}+3\sigma$  and conducts only for the rest of the  $V_{th}$  range. Finally, the size of  $M_3$  should be chosen such that the total current of the two keepers ( $I_1+I_2$ ) stays above the minimum required current in the  $V_{th}$  range from  $V_{th0}-3\sigma$  to  $V_{th0}$  (Figure 5).
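The sizing steps above can be illustrated numerically. The following is a minimal sketch under stated assumptions, not the authors' procedure: the required-current curve `i_req` is a hypothetical exponential, `V_TH0` and `SIGMA` are placeholder values, `k1` plays the role of  $K_1$  in (2), and `slope` stands for the combined factor  $K_3(1+1/\eta)$  in (8), with the turn-on point fixed at  $V_X=V_{th0}$ .

```python
import math

V_DD, V_NM = 1.0, 0.2          # supply and UNG target used in the paper
V_TH0, SIGMA = 0.25, 0.0125    # hypothetical nominal V_th and 5% sigma
V_SG = V_DD - V_NM             # source-gate drive of the fixed keeper


def i_req(v_th):
    """Hypothetical minimum required current (arbitrary units)."""
    return math.exp(-(v_th - V_TH0) / 0.05)


def grid(m=30):
    """2*m + 1 evenly spaced V_th points spanning V_TH0 +/- 3*SIGMA."""
    return [V_TH0 + (k - m) * 3 * SIGMA / m for k in range(2 * m + 1)]


def design(points):
    """Return (k1, slope, k_trad) following the three sizing steps."""
    upper = [v for v in points if v >= V_TH0]
    lower = [v for v in points if v < V_TH0]
    # Step 1: the fixed keeper alone covers V_TH0 .. V_TH0 + 3*SIGMA.
    k1 = max(i_req(v) / (V_SG - v) for v in upper)
    # Step 2: the variation-coupled keeper turns on below V_X = V_TH0
    # and supplies whatever the fixed keeper is missing there.
    gap = max((i_req(v) - k1 * (V_SG - v)) / (V_TH0 - v) for v in lower)
    slope = max(0.0, gap)
    # Reference: a single traditional PMOS sized for the whole range.
    k_trad = max(i_req(v) / (V_SG - v) for v in points)
    return k1, slope, k_trad


def supplied(v, k1, slope):
    """Total current I_1 + I_2 of the two-part keeper at threshold v."""
    return k1 * (V_SG - v) + slope * max(0.0, V_TH0 - v)


def excess(points, current):
    """Sum of current supplied above the requirement (hashed-area proxy)."""
    return sum(current(v) - i_req(v) for v in points)


if __name__ == "__main__":
    pts = grid()
    k1, slope, k_trad = design(pts)
    proposed = excess(pts, lambda v: supplied(v, k1, slope))
    traditional = excess(pts, lambda v: k_trad * (V_SG - v))
    print(f"k1 = {k1:.3f}, slope = {slope:.3f}, k_trad = {k_trad:.3f}")
    print(f"excess: proposed = {proposed:.3f}, traditional = {traditional:.3f}")
```

With this toy curve the fixed keeper comes out noticeably smaller than the single traditional keeper, and the summed excess current of the two-part keeper is lower, which is the qualitative behavior the design framework aims for.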

## 4. Implementation and Results

To study its relative performance in terms of delay and power consumption, the proposed keeper is compared to the traditional circuit and three existing works in the literature using SPICE simulations. Three metrics are used for comparison: (1) mean value of worst-case delay ( $\mu_{Delay}$ ), (2) standard deviation of worst-case delay ( $\sigma_{Delay}$ ) over the  $V_{th}$  range, and (3) mean value of power consumption. We have used BSIM models [12] for 90 nm technology with  $V_{DD}$ =1 V, a temperature of 110 °C, and UNG=0.2 V for all simulations. Other than Figure 8, all simulations were carried out for 8-input dynamic OR gates. As per [13], the standard deviation of  $V_{th}$  ( $\sigma_{Vth}$ ) is assumed to be 5% of nominal  $V_{th}$  ( $\mu_{Vth}$ ) for all simulations, except for Table 2, in which we vary  $\sigma_{Vth}$ .



Figure 7 shows the minimum required current along with the current supplied by the traditional and proposed keepers for an 8-input OR gate. The vertical and horizontal axes are normalized current and threshold voltage, respectively. Figure 7 agrees with the sketched curves presented in Figure 5, where the traditional keeper has linear behavior and the proposed circuit provides a curved supply current. This figure suggests that the new keeper creates less contention over almost the entire variation range, especially around nominal  $V_{th}$  and values higher than it. Also, it can be seen that the proposed keeper supplies slightly more current than the traditional circuit around the  $V_{th0}-3\sigma$  region.

The superior performance of both 8- and 16-input OR gates that employ the proposed keeper, compared to gates with the traditional keeper, is shown in Figure 8, where the y-axis is normalized worst-case delay and the x-axis shows  $V_{th}$ . Due to lower contention, the new keeper offers higher performance for both 8- and 16-input OR gates over almost the entire range of  $V_{th}$  variation. Only in a small region around  $V_{th0}-3\sigma$  does our circuit have slightly more delay, because of higher contention there, as could be predicted from Figure 7. Moreover, over the entire  $V_{th}$  spread, the worst-case delay of the proposed keeper stays roughly on a horizontal line and exhibits less variation than that of the traditional circuit, which indicates higher robustness to process variation.



Figure 8. Performance comparison between 8- and 16-input dynamic OR gates with proposed and traditional keeper.

To further demonstrate the efficiency of our approach, three existing designs from the literature, shown in Figure 9, have been simulated along with the proposed and traditional keepers. Alvandpour et al. [8] (Figure 9 (b)) split the keeper into two parts so that, in the evaluation phase, the first part is always on and the second part turns on with a delay, resulting in reduced contention. Kim et al. [7] (Figure 9 (c)) use a 3-bit digital input to adjust keeper strength according to process variation. Krishnamurthy et al. [5] (Figure 9 (d)) try to decrease leakage by re-engineering the NMOS pull-down network.



Figure 10 and Figure 11 show the normalized delay and power consumption of the above five circuits over the entire spread of  $V_{th}$ . All OR gates are optimized to have a noise margin of 0.2 V under process variation of  $\sigma_{Vth}=5\%\mu_{Vth}$ . Table 1 lists the transistor sizes used for simulations of the various circuits. Transistors are referred to by their corresponding numbers, from Figure 6 for this work and Figure 9 for the rest of the circuits. The pull-down NMOS network transistors in all the designs (including the proposed circuit) have a W/L ratio of 5, and the bias circuitry for the proposed keeper is chosen with  $I_{REF}=5\,\mu A$  and  $V_{Bias}=0.2\,V$ .

Table 1. Sizing of transistors used for simulation of different circuits. All circuits were sized to have the same noise margin.

| Circuit      | Transistor sizes (W/L)                                   |
|--------------|----------------------------------------------------------|
| This Work    | W/L(M1) = 1.5, W/L(M2) = 2, W/L(M3) = 2, and W/L(M4) = 1 |
| Trad. Keeper | W/L(M1) = 2.5                                            |
| Ref. [8]     | W/L(M1) = 1.5 and W/L(M2) = 1.5                          |
| Ref. [7]     | W/L(M1) = 0.5, W/L(M2) = 1, and W/L(M3) = 2              |
| Ref. [5]     | W/L(M1) = 0.25                                           |

Figure 10 suggests that the proposed architecture has the lowest delay over almost the entire range of  $V_{th}$  variation. Although the implementation of [8] has less delay around the  $V_{th0}-3\sigma$  region, our keeper offers lower delay around  $V_{th0}$ , which is of greater interest because statistically the majority of transistors fall in this region. The circuit of [5] is designed to be very low-power and consequently has high latency.



Figure 10. Comparison of mean delay versus threshold voltage between 8-input OR gates with proposed keeper and previous works.



Figure 11. Comparison of power consumption versus threshold voltage for 8-input OR gates with proposed keeper and previous works.

Figure 11 shows the normalized power consumption of the simulated circuits. In this simulation, the power consumption of the process variation sensor as well as the bias circuitry is included for all implementations. It can be observed that for most of the  $V_{th}$  variation range, the new keeper consumes less power than the traditional keeper and those presented in [8] and [7]. Only the circuit of [5] has lower power consumption, which comes at the cost of its very low speed, as shown in Figure 10.



Figure 12. Delay distribution for traditional, proposed, and two previous works, obtained from a Monte Carlo simulation.

Besides mean delay and power, another important factor is standard deviation of delay over range of  $V_{th}$  variation, which essentially shows impact of process variation on fluctuation of gate delay. The proposed keeper not only reduces mean value of delay, but also significantly lowers delay deviations. This is shown in Figure 12 through a delay distribution obtained from a Monte Carlo simulation of 50 samples. As can be observed, distribution of delay for new keeper is both shifted to left (lower delay) and is narrower (lower deviation).

Table 2. Comparison between mean delay, standard deviation of delay, and power consumption for proposed keeper and previous works for  $\sigma_{Vth} = 1\%$ , 3%, 5%, 7%, and 10%  $\mu_{Vth}$ .

| $\sigma_{Vth}$  | Metric           | This Work | Trad. | Ref. [8] | Ref. [7] | Ref. [5] |
|-----------------|------------------|-----------|-------|----------|----------|----------|
| $1\%\mu_{Vth}$  | Power            | 1.10      | 1.00  | 0.95     | 1.91     | 0.56     |
|                 | $\mu_{Delay}$    | 0.95      | 1.00  | 0.88     | 1.08     | 1.86     |
|                 | $\sigma_{Delay}$ | 0.28      | 1.00  | 0.84     | 16.82    | 3.53     |
| $3\%\mu_{Vth}$  | Power            | 1.14      | 1.20  | 1.11     | 1.91     | 0.56     |
|                 | $\mu_{Delay}$    | 1.07      | 1.07  | 0.94     | 1.08     | 1.86     |
|                 | $\sigma_{Delay}$ | 2.08      | 3.22  | 2.71     | 14.66    | 10.65    |
| $5\%\mu_{Vth}$  | Power            | 1.14      | 1.42  | 1.28     | 1.91     | 0.56     |
|                 | $\mu_{Delay}$    | 0.96      | 1.14  | 1.00     | 1.08     | 1.86     |
|                 | $\sigma_{Delay}$ | 3.44      | 5.75  | 4.85     | 12.51    | 17.85    |
| $7\%\mu_{Vth}$  | Power            | 1.18      | 1.94  | 1.69     | 1.91     | 0.56     |
|                 | $\mu_{Delay}$    | 0.97      | 1.32  | 1.14     | 1.08     | 1.86     |
|                 | $\sigma_{Delay}$ | 5.65      | 9.22  | 7.76     | 10.40    | 25.19    |
| $10\%\mu_{Vth}$ | Power            | 1.26      | 2.88  | 2.52     | 1.91     | 0.56     |
|                 | $\mu_{Delay}$    | 0.99      | 1.70  | 1.43     | 1.08     | 1.86     |
|                 | $\sigma_{Delay}$ | 7.56      | 16.80 | 13.84    | 7.47     | 36.59    |

To illustrate the impact of different levels of parameter fluctuation  $(\sigma_{Vth}/\mu_{Vth})$ , each circuit is simulated for  $\sigma_{Vth}=1\%$ , 3%, 5%, 7% and 10% of  $\mu_{Vth}$ . Results are presented in Table 2, in which all numbers are normalized to the simulation results of the traditional keeper circuit under  $\sigma_{Vth} = 1\%\mu_{Vth}$  variation. In this table, mean worst-case delay  $(\mu_{Delay})$ , standard deviation of worst-case delay  $(\sigma_{Delay})$ , and power consumption are summarized for different  $\sigma_{Vth}$ . The circuit of [5] has the highest mean delay and lowest power consumption for all  $\sigma_{Vth}$  values and shows high  $\sigma_{Delay}$ . On the other hand, [7] has high mean delay, delay deviation, and power consumption for lower variations and compares more favorably at higher variation ( $\sigma_{Vth} > 5\% \mu_{Vth}$ ). The keeper of Alvandpour et al. [8] has low mean delay and power consumption with reasonable delay deviation for lower variations  $(\sigma_{Vth} < 7\% \mu_{Vth})$ , but its characteristics degrade rapidly at higher variations. The overall performance of the traditional keeper is acceptable at low  $\sigma_{Vth}$ , but for higher  $\sigma_{Vth}$  values it is not an attractive choice. In contrast, the proposed keeper has the lowest mean delay for  $\sigma_{Vth} \geq 5\%\mu_{Vth}$  and the lowest delay deviation at every variation level except  $10\%\mu_{Vth}$  (where [7] is marginally lower), and for  $\sigma_{Vth} \geq 5\%\mu_{Vth}$  its power consumption is higher only than that of [5].
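The headline reductions quoted in the abstract can be cross-checked against Table 2. The short script below is a sketch that assumes the "15%  $V_{th}$  fluctuations" figure corresponds to the  $3\sigma$  spread at  $\sigma_{Vth}=5\%\mu_{Vth}$ ; the numbers are copied from the corresponding rows of the table, and the dictionary and function names are illustrative.

```python
# Normalized values copied from Table 2 for sigma_Vth = 5% of mu_Vth.
PROPOSED = {"power": 1.14, "mean_delay": 0.96, "sigma_delay": 3.44}
TRADITIONAL = {"power": 1.42, "mean_delay": 1.14, "sigma_delay": 5.75}


def reduction_percent(new, old):
    """Relative reduction of `new` with respect to `old`, in percent."""
    return 100.0 * (old - new) / old


reductions = {
    metric: reduction_percent(PROPOSED[metric], TRADITIONAL[metric])
    for metric in PROPOSED
}

if __name__ == "__main__":
    for metric, value in reductions.items():
        print(f"{metric}: {value:.1f}% lower than the traditional keeper")
```

Under that assumption the table gives reductions of roughly 19.7% in power, 15.8% in mean delay, and 40.2% in delay deviation, in line with the quoted 20%, 15%, and more than 40%.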

## 5. Conclusions

A novel approach for designing low-power variation-aware keeper circuits for wide fan-in dynamic gates has been presented. The trade-off between noise margin, power consumption, and performance of dynamic gates was discussed, and through a graphical representation it was shown that conventional keeper circuits generate unnecessary excess contention that can be avoided with proper keeper design. HSPICE simulation results show that the proposed architecture offers the smallest delay deviation over the  $V_{th}$  range considered in this study, compared to the existing works examined. Also, among high-performance designs, its power consumption is the lowest. Results of Monte Carlo simulations suggest that the delay distribution for gates that employ the proposed keeper has both lower mean delay and lower deviation. Hence, the proposed keeper architecture could be very effective in increasing the robustness of dynamic gates to process variations.

## 6. REFERENCES

- [1] Hwang, W., Joshi, R. V., and Henkels, W. H. A 500-MHz, 32-word×64-bit, eight-port self-resetting CMOS register file. IEEE Journal of Solid-State Circuits, 1999, pp. 56 – 67.
- [2] Li, D., and Mazumder, P. On circuit techniques to improve noise immunity of CMOS dynamic logic. Very Large Scale Integration (VLSI) Systems, 2004, pp. 910 – 925.
- [3] Shanbhag, N., Soumyanath, K., and Martin, S. Reliable low-power design in the presence of deep submicron noise. ISLPED, 2000, pp. 295 – 302.
- [4] Borkar, S., Karnik, T., Narendra, S., Tschanz, J., Keshavarzi, A., and De, V. Parameter variations and impact on circuits and microarchitecture. DAC, 2003, pp. 338 – 342.
- [5] Krishnamurthy, R., Alvandpour, A., Balamurugan, G., Shanbhag, N., Soumyanath, K., and Borkar, S. A 0.13 μm 6 GHz 256×32b leakage-tolerant register file. VLSI Circuits, 2001, pp. 25 – 26.
- [6] Kuroda, T., Fujita, T., Mita, S., Nagamatu, T., Yoshioka, S., Sano, F., Norishima, M., Murota, M., Kako, M., Kinugawa, M., Kakumu, M., and Sakurai, T. A 0.9 V 150 MHz 10 mW 4 mm2 2-D discrete cosine transform core processor with variable-threshold-voltage scheme. ISSCC, 1996, pp. 166 – 167.
- [7] Kim, C. H., Hsu, S., Krishnamurthy, R., Borkar, S., and Roy, K. Self calibrating circuit design for variation tolerant VLSI systems. On-Line Testing Symposium, 2005, pp. 100 – 105.
- [8] Alvandpour, A., Krishnamurthy, R., Soumyanath, K., and Borkar, S. A conditional keeper technique for sub-0.13μ wide dynamic gates. VLSI Circuits, 2001, pp. 29 – 30.
- [9] Bowman, K., Duval, S.G., and Meindl, J.D. Impact of die-to-die and within-die parameter fluctuations on the maximum clock frequency distribution for gigascale integration. IEEE Journal of Solid-State Circuits, 2002, pp. 183 – 190.
- [10] Griffin, M.M., Zerbe, J., Tsang, G., Ching, M., and Portmann, C.L. A process-independent, 800-MB/s, DRAM byte-wide interface featuring command interleaving and concurrent memory operation. Journal of Solid-State Circuits, IEEE, 1998, pp. 1741 – 1751.
- [11] Taur, Y., and Ning, T.H. Fundamentals of Modern VLSI Devices. Cambridge University Press, New York, NY, 1998.
- [12] http://www-device.eecs.berkeley.edu/~ptm/mosfet.html.
- [13] Nassif, S. Delay variability: sources, impacts and trends. ISSCC, 2000, pp. 368 – 369.