A Formulation of Fast Carry Chains Suitable for Efficient Implementation with Majority Elements



#### Behrooz Parhami (2nd author)

Dept. Electrical & Computer Eng. Univ. of California, Santa Barbara

#### **Ghassem Jaberipur**

Shahid Beheshti Univ. & IPM, Iran

#### Dariush Abedi

Shahid Beheshti Univ., Iran

## **Continual Reassessment of Designs**

- Change in cost/delay models with advent of ICs Transistors became faster/cheaper; wires costlier/slower
- Adaptation to CMOS, domino logic, and the like Optimal design for one technology not best with another
- Power and energy-efficiency considerations Voltage levels and number of transitions became important
- Quantum computing and reversible circuits Fan-out; managing constant inputs and garbage outputs
- Nanotech and process uncertainty / unreliability Designs for a wide range of circuit parameters and failures
- Novel circuit elements and design paradigms From designs optimized for FPGAs to biological computing

## Threshold, Majority, Median

Threshold logic extensively studied since the 1940s

"Fires" if weighted sum of the inputs equals or exceeds the threshold value



Majority is a special case with unit weights and t = (n+1)/2

For 3-input majority gate:  $w_1 = w_2 = w_3 = 1$ ; t = 2

For 0-1 inputs, majority is the same as median

Axioms defining a median algebra 
$$\begin{split} \mathsf{M}(a,b,b) &= b\\ \mathsf{M}(a,b,c) &= \mathsf{M}(a,c,b)\,; \qquad \mathsf{M}(a,b,c) = \mathsf{M}(c,a,b)\\ \mathsf{M}(\mathsf{M}(a,x,b),x,c) &= \mathsf{M}(a,x,\mathsf{M}(b,x,c)) \end{split}$$

## **Emerging Majority-Based Technologies**

- Quantum-dot cellular automata (QCA) The basic cell has four electron place-holders ("dots")
- Single-electron tunneling (SET) Based on controlled transfer of individual electrons
- Tunneling phase logic (TPL) Capacitively-coupled inputs feed a load capacitance
- Magnetic tunnel junction (MTJ) Uses two ferromagnetic thin-film layers, free and fixed
- Nano-scale bar magnets (NBM) Scaled-down adaptation of fairly old magnetic logic
- Biological embodiments of majority function Basis for neural computation in human / animal brains

## **Quantum-dot Cellular Automata (QCA)**

The basic cell has four electron place-holders ("dots")







"0"

Three QCA cell configurations



A robust QCA Inverter QCA M gates with 2 sets of inputs

## Single-Electron Tunneling (SET)

**Based on controlled transfer of individual electrons** 



SET circuits for M (left) and inversion (right) [28]

## **Tunneling Phase Logic (TPL)**



The basic TPL gate implements the minority function

$$inv(a) = \overline{a} = minority(a, 0, 1)$$



## **Magnetic Tunnel Junction (MTJ)**



## Nano-scale Bar Magnets (NBM)



Two types of nanomagnet wires

## **The Carry Recurrence and Operator**

 $c_{i+1} = a_i b_i \vee (a_i \vee b_i) c_i \qquad 0 \le i \le n-1$ 

With generate  $g_i = a_i b_i$  and propagate  $p_i = a_i \lor b_i$  signals:  $c_{i+1} = g_i \lor p_i c_i$ 

With group-generate  $G_{i:i}$  and group-propagate  $P_{i:i}$  signals:

$$(G_{i:j}, P_{i:j}) = (G_{i:k} \lor P_{i:k} G_{k-1:j}, P_{i:k} P_{k-1:j})$$

 $c_{i+1} = G_{i:j} \lor P_{i:j}c_j$ 



Carry generation using a majority gate:

$$c_{i+1} = M(a_i, b_i, c_i)$$

## The Full-Adder (FA) Building Block

$$s_i = a_i \oplus b_i \oplus c_i$$
$$c_{i+1} = a_i b_i + (a_i + b_i) c_i$$



FA has been widely studied and optimized Implementation with seven 2-input gates:



## **Majority-Gate Implementations of FA**

Blind mapping: Seven partially utilized M-gates, 2 inverters:



Slide # 012

## Parallel-Prefix Kogge-Stone-Like CGN



#### **Exploiting Fully Utilized M-Gates:** First Attempt by Pudi *et al.*



[61% fewer M-gates than with blind mapping]

#### **Exploiting Fully Utilized M-Gates:** Second Attempt by Perri *et al.*

Two-bit CGN with 1 M CDP in  $c_i$ -to- $c_{i+2}$  path

Total for 8-bit adder: 24 [67% fewer M-gates than with blind mapping]



$$c_{i+2} = M(M(a_{i+1}, b_{i+1}, p_i), M(a_{i+1}, b_{i+1}, g_i), a_{i+1}, a_{i+1}, a_{i+1}, a_{i+1})$$

Conventional (2M delay, 2 FUM):  $c_{i+2} = M(a_{i+1}, b_{i+1}, M(a_i, b_i, c_i))$ 

#### Our Compromise Solution (1M carry-path delay, 3 FUM)

 $c_{i+2} = M(M(a_{i+1}, b_{i+1}, a_i), M(a_{i+1}, b_{i+1}, b_i), c_i)$ 

$$A_{i+1:i} = M(a_{i+1}, b_{i+1}, a_i)$$
  

$$B_{i+1:i} = M(a_{i+1}, b_{i+1}, b_i)$$
  

$$c_{i+2} = M(A_{i+1:i}, B_{i+1:i}, c_i)$$

Think of  $A_{i+1:i}$  and  $B_{i+1:i}$ , as representing 2-bit inputs  $a_{i+1}a_i$  and  $b_{i+1}b_i$ 

Example:

$$a_{i+1}a_i = c_i = 1 \Longrightarrow a_i = c_i = 1 \Longrightarrow$$
$$c_{i+1} = 1 \text{ and } a_{i+1} = 1 \Longrightarrow c_{i+2} = 1$$

## **Generalizing the Compromise Solution**

Twin M-gate:

 $(A_{j:i}, B_{j:i}): (M(a_j, b_j, A_{j-1:i}), M(a_j, b_j, B_{j-1:i}))$ 

Majority group generate and propagate:

$$\begin{split} & \Gamma_{j:i} = A_{j:i} B_{j:i} & \Pi_{j:i} = A_{j:i} + B_{j:i} \\ & \Gamma_{j:i} = g_j + p_j \Gamma_{j-1:i} & \Pi_{j:i} = g_j + p_j \Pi_{j-1:i} \end{split}$$



 $(A, B): (M(A_1, B_1, A_r), M(A_1, B_1, B_r))$ 

Properties:

$$\begin{split} \Gamma_{j:i} &= g_j + p_j \Gamma_{j-1:i} \\ \Pi_{j:i} &= g_j + p_j \Pi_{j-1:i} \\ c_{i+j+1} &= M(A_{i+j:i}, B_{i+j:i}, c_i) \end{split}$$

Associativity:

$$A_{k+j:i} = M(A_{k+j:j}, B_{k+j:j}, A_{j-1:i}), B_{k+j:i} = M(A_{k+j:j}, B_{k+j:j}, B_{j-1:i})$$

# KS-Like and LF-Like M-Based CGNs (with C<sub>in</sub>)



 $(a_7,b_7)$   $(a_6,b_6)$   $(a_5,b_5)$   $(a_4,b_4)$   $(a_3,b_3)$   $(a_2,b_2)$   $(a_1,b_1)$   $(a_0,b_0)$   $c_{in}$ 



#### KS-Like M-Based CGNs (with C<sub>in</sub>) (% of FUM: 100)



#### LF-Like M-Based CGNs (with C<sub>in</sub>) (% of FUM: 100)



## Scaling up to 16-bit KS-Like Design



## **QCA Implementation: 8-Bit LF-Like**



B. Parhami

ARITH-23: Fast Carry Chains with Majority Elements Slide

Slide # 022

## Comparison with Previous Work (8-bit CGN)

|             | Delay<br>(clock zone) | PUM* | FUM* | Total M |
|-------------|-----------------------|------|------|---------|
| New KS-like | 6                     | 0    | 30   | 30      |
| New LF-like | 6                     | 0    | 20   | 20      |
| [13]        | 9                     | 28   | 7    | 35      |
| [15]        | 9                     | 15   | 13   | 28      |

#### \* Partially / Fully-Utilized M-Gates

## **Conclusions and Future Work**

- Best M-based carry-network designs to date
  - More efficient use of (fully utilized) M-gates
  - Applicable to a variety of PPN design styles
  - Benefits over naïve designs and prior attempts
- Majority-friendly tech's becoming important
  - Improve, assess, and fine-tune implementations
  - Extend designs to several other word widths
  - Obtain generalized cost / latency formulas
  - Pursue design methods for other technologies

# **Questions or Comments?**

parhami@ece.ucsb.edu http://www.ece.ucsb.edu/~parhami/ jaberipur@sbu.ac.ir

