Received 15 December 2014; revised 5 February 2015; accepted 20 March 2015. Date of publication 2 April 2015; date of current version 16 July 2015.

Digital Object Identifier 10.1109/JXCDC.2015.2418033

# Benchmarking of Beyond-CMOS Exploratory Devices for Logic Integrated Circuits

DMITRI E. NIKONOV (Senior Member, IEEE), AND IAN A. YOUNG (Fellow, IEEE)

Components Research, Intel Corporation, Hillsboro, OR 97124 USA

CORRESPONDING AUTHOR: D. E. NIKONOV (dmitri.e.nikonov@intel.com)

**ABSTRACT** A new benchmarking of beyond-CMOS exploratory devices for logic integrated circuits is presented. It includes new devices with ferroelectric, straintronic, and orbitronic computational state variables. Standby power treatment and memory circuits are included. The set of circuits is extended to sequential logic, including arithmetic logic units. The conclusion that tunneling field-effect transistors are the leading low-power option is reinforced. Ferroelectric transistors may present an attractive option with faster switching delay. Magnetoelectric effects are more energy efficient than spin transfer torque, but the switching speed of magnetization is a limitation. This article enables a better focus on promising beyond-CMOS exploratory devices.

**INDEX TERMS** Adder, arithmetic logic unit (ALU), beyond-CMOS, computational throughput, electronics, ferroelectric, integrated circuits, logic, magnetoelectric, power dissipation, spintronics.

## I. INTRODUCTION

Successful scaling of CMOS field-effect transistors (FETs) has occurred over more than four decades according to the cadence of Moore's law [1] and has now reached gate lengths of 20 nm [2]. The projection of scaling limits and quantum limits on the size of electronic transistors [3] has increased the urgency of research for finding beyond-CMOS device options. Various research centers, including the Nanoelectronics Research Initiative (NRI), were therefore organized [4].

Benchmarking devices became an important effort within the NRI. It not only provided a comparison between various device options, but also demanded a clear articulation of how each device performs digital computation with an integrated circuit. The first pass at beyond-CMOS benchmarking [5] (which we call BCB 1.0) was based on estimates made by each device group that were neither reviewed by others nor validated through an audit process. The next release (BCB 2.0) [6], [7] applied a simple, transparent, and uniform methodology to all the beyond-CMOS devices to derive estimates of logic circuit performance: area, switching delay, and energy. It took a few device parameters from device groups as inputs and calculated the performance estimates using consistent equations and assumptions. BCB 2.0 received positive feedback and assurances of usefulness from the research community. Since then, the thrust of NRI has been renewed and a new program, STARnet, was added with a common mission. Their exploratory research scope included a wider range of exploratory materials, switching mechanisms, and devices. In general, the understanding of important aspects of beyond-CMOS circuit operation and requirements improved. Several review papers [8], [9] focused on different classes of beyond-CMOS devices. We decided to prepare a new release of benchmarking (BCB 3.0) to reflect the improved understanding, to update the input parameters, and to add new beyond-CMOS devices. Several changes were made compared with the previous BCB 2.0 release [7]. We have included 11 additional devices. New switching mechanisms are included: 1) spin Hall effect; 2) ferroelectric switching; and 3) piezoelectric switching. A better understanding of the role of the power supply and clocking of spintronic devices by charge currents has been achieved. We now include not only active power, but also standby power in circuits.
Most importantly, we drastically extended the functionality of the benchmarking circuits, which now comprise sequential logic up to arithmetic logic units (ALUs). A number of minor improvements were also made throughout the benchmarking methodology. One can follow all the changes and utilize BCB 3.0 by downloading the new version of the MATLAB code [10]. We maintain the approach of BCB 2.0, preferring simpler analytical estimates of performance and recognizing that the performance estimates can only be approximate at this stage of the research into beyond-CMOS devices. The goal of this paper is to capture the changes made and the new insights gained since the previous BCB release.

## II. NEW DEVICES

The concept of computational variables for devices was described in [7] and [11]. In BCB 3.0 we include additional computational variables (Table 1). Ferroelectric devices rely on electric polarization. Piezoelectronic devices are switched through stress and strain. Metal-insulator transition devices are switched between states with different orbital states of electrons within the crystal lattice cell. Based on their approach to the circuit architecture, we divide beyond-CMOS circuits into two major groups: transistor circuits (TCs) and majority-gate circuits (MGCs). The first mostly includes electronic, ferroelectric, straintronic, and orbitronic devices, while the second is mostly composed of spintronic devices. However, there are exceptions: even though a spinFET is a spintronic device, we use only its transistor-like operation in making circuits. Also, the spintronic device STT/DW has electrical interconnects and is connected into circuit topologies reminiscent of CMOS. The reason why majority gates are used for most spintronic devices is that there is no analog of a transistor for magnetization. A transistor uses charge on a gate to control a current, which can switch another gate. However, the magnetization cannot directly control a flux of any physical quantity that can switch magnetization (e.g., spin current). Instead one can use the vector property of magnetization, where the addition of three input magnetizations gives a vector with a definite direction. Note that majority-gate logic is a subclass of threshold logic [12]. Below we relate many estimates back to an intrinsic switching element. For TC, the intrinsic element is a transistor. For MGC, the intrinsic element is a single nanomagnet, rather than a whole majority gate.

TABLE 1. Computational variables and corresponding devices.

| Class           | Variables            | Example       |
|-----------------|----------------------|---------------|
| Charge          | Q, I, V              | CMOS, TFET    |
| Electric Dipole | P                    | FeFET         |
| Magnetic Dipole | M, I<sub>spin</sub>  | ASL, SWD, NML |
| Orbital State   | Orb, Bose condensate | BisFET        |
| Strain          | σ                    | PiezoFET      |

Q=charge, I=current, V=voltage, P=electric polarization, M=magnetization, $I_{spin}$=spin current, Orb=orbital state, $\sigma$=stress.

The list of considered devices changed compared with [7]. One device, STTtriad, has been removed due to generally worse performance estimates and a lack of continued research on it. Several devices currently under study at NRI and STARnet centers have been added: ITFET [13] (also known as SymFET [14]), FEFET [15], NCFET [16], PiezoFET [17], MITFET [18], CSL [19], ExFET [5], ThinTFET [20], GaNTFET [21], TMDTFET [22], and van der Waals FET (vdWFET) [23]. Also the mLogic [24] device concept has been merged with that of the STT/DW [25] due to the similarity of the essential elements of the devices. We do not include semiconductor spintronic devices, such as the spin gain transistor [26]. The physical principles of the operation of the devices are discussed in [27]. In addition, a list summarizing the devices under consideration and their attributes is given in Table 2. The cells are colored according to their computational variables with a coloring scheme consistently used in this paper. The newly included devices involve a larger variety of switching mechanisms and provide more options for beyond-CMOS logic.

TABLE 2. List of devices under consideration with their employed computational variables and classification.

| Device name                          | Acronym  | Input(s) | Control | Internal state | Output | Material                            |
|--------------------------------------|----------|----------|---------|----------------|--------|-------------------------------------|
| Si MOSFET high performance           | CMOS HP  | V        | Vg      | Q              | V      | silicon                             |
| Si MOSFET low voltage                | CMOS LV  | V        | Vg      | Q              | V      | InAs                                |
| van der Waals FET                    | vdWFET   | V        | Vg      | Q              | V      | MoS<sub>2</sub>                     |
| Homojunction III-V TFET              | HomJTFET | V        | Vg      | Q              | V      | InAs                                |
| Heterojunction III-V TFET            | HetJTFET | V        | Vg      | Q              | V      | GaSb/InAs                           |
| Graphene nanoribbon TFET             | gnrTFET  | V        | Vg      | Q              | V      | graphene                            |
| Interlayer tunneling FET             | ITFET    | V        | Vg      | Q              | V      | graphene                            |
| 2-D heterojunction interlayer TFET   | ThinTFET | V        | Vg      | Q              | V      | WTe<sub>2</sub>/SnSe<sub>2</sub>    |
| GaN TFET                             | GaNTFET  | V        | Vg      | Q              | V      | GaN                                 |
| Transition metal dichalcogenide TFET | TMDTFET  | V        | Vg      | Q              | V      | WTe<sub>2</sub>                     |
| Graphene pn-junction                 | GpnJ     | V        | Vg      | Q              | V      | graphene                            |
| Ferroelectric FET                    | FEFET    | V        | Vg      | P              | V      | PZT                                 |
| Negative capacitance FET             | NCFET    | V        | Vg      | P              | V      | PZT                                 |
| Piezoelectric FET                    | PiezoFET | V        | V       | σ              | V      | AlN                                 |
| Bilayer pseudospin FET               | BisFET   | V        | Vg      | BC             | V      | graphene                            |
| Excitonic FET                        | ExFET    | V        | Vg      | BC             | V      | MoS<sub>2</sub>/MoSe<sub>2</sub>    |
| Metal-insulator transistor           | MITFET   | V        | Vg      | Orb            | V      | NdNiO<sub>3</sub>                   |
| SpinFET (Sugahara-Tanaka)            | SpinFET  | V        | Vg, Vm  | Q, M           | V      | CoFeB                               |
| All-spin logic                       | ASL      | M        | V       | M              | M      | CoPtCrB                             |
| Charge-spin logic                    | CSL      | I        | V       | M              | I      | CoPtCrB                             |
| Spin torque domain wall              | STT/DW   | I        | V       | M              | I      | CoFeB                               |
| Spin majority gate                   | SMG      | M        | V       | M              | M      | PMN-PT                              |
| Spin torque oscillator               | STO      | I        | V       | M              | I      | CoPtCrB                             |
| Spin wave device                     | SWD      | M        | I or V  | M              | M      | PMN-PT                              |
| Nanomagnetic logic                   | NML      | M        | B or V  | M              | M      | PMN-PT                              |

B=magnetic field, Vg=gate voltage, Vm=magnetic switching voltage, BC=Bose condensate. The cell color designates the computational variable: blue=electronic, orange=ferroelectric, yellow=straintronic, purple=orbitronic, red=spintronic devices. Key materials are also listed.

## III. SWITCHING MECHANISMS AND MATERIAL PARAMETERS

Devices with various computational variables require appropriate physical effects for their switching. We distinguish four switching mechanisms: 1) electronic: charging of a gate capacitor by current; 2) ferroelectric: similar to electronic, but accompanied by switching of electric polarization; 3) current-driven magnetization switching; and 4) voltage-driven magnetization switching. The first two are employed in TC; the last two underpin MGC. Models for these switching mechanisms employ material parameters collected in [27]. Several parameters were kept the same as in BCB 2.0 (refer to [7] for their definition and values). Here we focus on the models that are new or updated in BCB 3.0.

Models for switching mechanisms are built to obtain the switching delay and energy for intrinsic elements (transistors or nanomagnets). While beyond-CMOS devices are very diverse, we reduce their operation to a few cases based on fundamental principles and switching mechanisms.

FIGURE 1. Scheme of driving switching of (a) electric device, (b) ferroelectric device, (c) ferromagnetic device, and (d) magnetoelectric device.

In electronic switching [Fig. 1(a)], current generated by a transistor is used to charge the gate of a transistor in the next logic stage. The characteristic switching delay (determined by charging of the gate) and energy are obtained via well-known expressions [28]

$$t_{\rm el} \approx C V_{\rm dd} / I, \quad E_{\rm el} \approx C V_{\rm dd}^2 \tag{1}$$

where $V_{dd}$ is the power supply voltage, *I* is the ON-current in a transistor, and *C* is the capacitance that is driven in switching. This capacitance includes the capacitance of the gate dielectric, the semiconductor capacitance, and the parasitic capacitance (such as fringing) for one transistor. The nontrivial aspect of applying this model is the need for a thorough accounting of the gate, interconnect, and parasitic capacitances.
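As a quick numerical illustration of (1), the following minimal Python sketch evaluates the delay and energy of an electronic switching event. The function name and the input values (0.1 fF, 0.7 V, 10 µA) are illustrative assumptions chosen for this example; they are not parameters taken from [27].

```python
def electronic_switching(c_farad, vdd_volt, i_amp):
    """Return (delay, energy) per Eq. (1): t ~ C*Vdd/I, E ~ C*Vdd^2."""
    t_el = c_farad * vdd_volt / i_amp   # time to charge the gate, s
    e_el = c_farad * vdd_volt ** 2      # switching energy, J
    return t_el, e_el


# Assumed, illustrative inputs: C = 0.1 fF, Vdd = 0.7 V, I = 10 uA.
t_el, e_el = electronic_switching(c_farad=0.1e-15, vdd_volt=0.7, i_amp=10e-6)
print(f"t_el = {t_el * 1e12:.2f} ps, E_el = {e_el * 1e18:.1f} aJ")
```

With these assumed inputs the delay is on the order of picoseconds and the energy on the order of tens of attojoules; the result scales linearly with whatever total capacitance is accounted for.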

Ferroelectric switching [Fig. 1(b)] is treated similarly to the electronic one. The difference is that the total transfer charge includes a term proportional to the saturation polarization of the ferroelectric  $P_{fe}$ 

$$Q = P_{\text{fe}}A + CV_{\text{dd}}, \quad t_{\text{ch}} \approx Q/I, \quad E_{\text{el}} \approx QV_{\text{dd}}. \tag{2}$$

Also, the switching delay is limited by the intrinsic response time of a ferroelectric  $\tau_{fe}$  rather than the charging time  $t_{ch}$ . Experiments [29] provide a value of ~70 ps for large area samples. We adopt optimistic estimates of the material parameters of nanoscale ferroelectric materials [27]. We also treat piezoelectric switching using this ferroelectric switching model, though we assume that polarization is only partially switched.
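The ferroelectric model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: we model "limited by the intrinsic response time" as taking the larger of $t_{ch}$ and $\tau_{fe}$, which is our reading of the text rather than a formula given in it, and all numerical inputs except the 70 ps response time are made-up values.

```python
def ferroelectric_switching(p_fe, area, c_farad, vdd_volt, i_amp, tau_fe):
    """Charge, charging time, delay, and energy following Eq. (2)."""
    q = p_fe * area + c_farad * vdd_volt   # polarization charge + capacitor charge, C
    t_ch = q / i_amp                       # charging time, s
    energy = q * vdd_volt                  # switching energy, J
    # Assumption: the delay cannot be shorter than the intrinsic response time.
    delay = max(t_ch, tau_fe)
    return q, t_ch, delay, energy


# Assumed inputs: P_fe = 0.1 C/m^2, 15 nm x 15 nm area, C = 0.1 fF,
# Vdd = 0.3 V, I = 10 uA; tau_fe = 70 ps as quoted from [29].
q, t_ch, delay, energy = ferroelectric_switching(
    p_fe=0.1, area=15e-9 * 15e-9, c_farad=0.1e-15,
    vdd_volt=0.3, i_amp=10e-6, tau_fe=70e-12,
)
print(f"Q = {q:.3e} C, t_ch = {t_ch:.3e} s, delay = {delay:.3e} s, E = {energy:.3e} J")
```

In this example the charging time is much shorter than $\tau_{fe}$, so the intrinsic response time sets the delay.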

Current-driven magnetization switching [Fig. 1(c)] is primarily based on the spin transfer torque effect. In such a case, the current is spin polarized in a pinned magnet and the polarized spin is used to switch a free magnet. It can be modeled as in the previous BCB 2.0 release [30], [31]. The spin-polarized current  $I_s = PI$  needs to exceed the critical value

$$I_{\rm cs} = e\alpha\mu_0 M_{\rm s}^2 v_{\rm nm}/\hbar \tag{3}$$

where  $v_{nm}$  is the volume of a nanomagnet. Material parameters are defined in [27]. The switching time and energy are

$$t_{\rm stt} \approx \frac{eM_s v_{\rm nm}}{g\mu_B (I_s - I_{\rm cs})} \log\left(\frac{2\sqrt{2\pi}}{\sqrt{\Delta}}\right), \quad E_{\rm stt} = IV_{\rm dd} t_{\rm stt}. \tag{4}$$

The spin torque is zero when there is zero angle between the injected spin polarization and the magnetization in the nanomagnet. Thermal fluctuations provide an initial angle difference to start the switching, which is thus a stochastic process. The above switching delay calculation corresponds to a 50% probability of switching. To account for the thermal spread, we introduce an additional factor of three into the delay estimate. Once magnetization is switched in one part of a ferromagnetic wire, it can propagate as a domain wall or a spin wave to the next logic stage. Switching by the spin Hall effect is described similarly. In this case, the spin-polarized current is generated by and flows perpendicular to the charge current in a material with strong spin-orbit coupling (such as Pt, W, or Ta). The spin current injected in the nanomagnet of width  $w_{nm}$  is approximately

$$I_s = \theta_{\rm she} w_{\rm nm} I / d_{\rm she}. \tag{5}$$
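To make the threshold condition concrete, the sketch below evaluates the critical spin current (3) and the spin Hall current (5) and checks whether switching is possible. It is a hedged illustration: we read $\theta_{she}$ as the spin Hall angle and $d_{she}$ as the thickness of the spin Hall layer, and every material and geometry value is an assumed placeholder rather than a parameter from [27].

```python
E_CHARGE = 1.602176634e-19   # elementary charge, C
MU_0 = 1.25663706212e-6      # vacuum permeability, T*m/A
HBAR = 1.054571817e-34       # reduced Planck constant, J*s


def critical_spin_current(alpha, m_s, v_nm):
    """Eq. (3): I_cs = e * alpha * mu0 * Ms^2 * v_nm / hbar, in amperes."""
    return E_CHARGE * alpha * MU_0 * m_s ** 2 * v_nm / HBAR


def spin_hall_current(theta_she, w_nm, d_she, i_charge):
    """Eq. (5): I_s = theta_she * w_nm * I / d_she, in amperes."""
    return theta_she * w_nm * i_charge / d_she


# Assumed inputs: damping 0.01, Ms = 1e6 A/m, 15 x 15 x 2 nm^3 nanomagnet,
# spin Hall angle 0.3, 3 nm thick spin Hall layer, 10 uA charge current.
i_cs = critical_spin_current(alpha=0.01, m_s=1e6, v_nm=15e-9 * 15e-9 * 2e-9)
i_s = spin_hall_current(theta_she=0.3, w_nm=15e-9, d_she=3e-9, i_charge=10e-6)
print(f"I_cs = {i_cs * 1e6:.2f} uA, I_s = {i_s * 1e6:.2f} uA, switches: {i_s > i_cs}")
```

The geometric ratio $w_{nm}/d_{she}$ lets the injected spin current exceed the charge current times the spin Hall angle, which is why a thin spin Hall layer is favorable in this model.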

Voltage-driven magnetization switching [Fig. 1(d)] is performed by a magnetoelectric effect [32]. When a capacitor is charged, and electric polarization of the material is switched, bulk or surface magnetoelectric coupling causes switching of the magnetization. We consider four cases of the magnetoelectric effect in various material combinations and treat each of them in a similar manner.

Material parameters, including the measured electric field and the corresponding effective magnetic field, are provided in [27] and are distinguished by corresponding subscripts. The cases are: mf = exchange bias exerted by a multiferroic material (such as bismuth-iron-oxide) [33]; me = exchange bias from the linear magnetoelectric effect (such as in chromia, $Cr_2O_3$) [34]; ms = strain exerted by a piezoelectric material (such as lead magnesium niobate-lead titanate) on a ferromagnet with high magnetostrictive coupling [35]; and su = electrically switchable surface anisotropy at the interface between a ferromagnet and a dielectric (such as MgO) [36]. The required electric field, the switching delay of a nanomagnet, the charge, the charging time, and the magnetoelectric charging energy are

$$\mathbf{E}_r = \mathbf{E}_{\mathrm{mf}} B_c / B_{\mathrm{mf}}, \quad t_{\mathrm{mag}} = \pi / (\gamma B_c) \tag{6}$$

$$Q = A_{\rm me}(\varepsilon_0 \varepsilon_{\rm mf} E_r + P_{\rm mf}) \tag{7}$$

$$t_{\rm mf} \approx Q/I, \quad E_{\rm me} \approx Q E_r t_{\rm mf}. \tag{8}$$

Here the area of the magnetoelectric surface is  $A_{\rm me}$ , and the critical magnetic field  $B_c \approx 0.1$  T for switching the nanomagnets is obtained from micromagnetic simulations.
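A short numerical sketch of (6) and (7) follows. It assumes the free-electron gyromagnetic ratio $\gamma \approx 1.76 \times 10^{11}$ rad/(s·T); the measured field pair, permittivity, polarization, and area are invented placeholders for illustration and do not come from [27].

```python
import math

GAMMA = 1.76e11            # gyromagnetic ratio, rad/(s*T) (assumed free-electron value)
EPS_0 = 8.8541878128e-12   # vacuum permittivity, F/m


def required_field_and_delay(e_mf, b_mf, b_c=0.1):
    """Eq. (6): scale the measured field pair (e_mf, b_mf) to the critical field b_c."""
    e_r = e_mf * b_c / b_mf            # required electric field, V/m
    t_mag = math.pi / (GAMMA * b_c)    # magnetization switching delay, s
    return e_r, t_mag


def magnetoelectric_charge(a_me, eps_mf, e_r, p_mf):
    """Eq. (7): charge delivered to the magnetoelectric capacitor, C."""
    return a_me * (EPS_0 * eps_mf * e_r + p_mf)


# Assumed inputs: 1e7 V/m produces 0.05 T; relative permittivity 50;
# polarization 0.3 C/m^2; 15 nm x 15 nm magnetoelectric area.
e_r, t_mag = required_field_and_delay(e_mf=1e7, b_mf=0.05)
q_me = magnetoelectric_charge(a_me=15e-9 * 15e-9, eps_mf=50.0, e_r=e_r, p_mf=0.3)
print(f"E_r = {e_r:.2e} V/m, t_mag = {t_mag * 1e12:.0f} ps, Q = {q_me:.2e} C")
```

Under the assumed $\gamma$, the critical field of 0.1 T corresponds to a magnetization switching delay of roughly 0.18 ns, independent of the magnetoelectric material; this is the sense in which the switching speed of magnetization is a limitation.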

Note that all of these switching mechanisms involve a short pulse of charge current. However, the current plays a different role in each. In electronic and ferroelectric devices, current carries the computational variable from one logic stage to the next. In spintronic devices, magnetization is the computational variable passed to the next stage. The current pulse serves both as the power supply to perform the switching and as a clock to time the operation. This realization is a major change compared with the initial intent of NRI. It was believed that beyond-CMOS devices needed to avoid charge current in order to be energy efficient. Since then, charge current has turned out to be an indispensable aspect of beyond-CMOS devices. For this reason, the energy of clocking individual circuits is included in our estimates, but the contribution of clock distribution over the whole chip is not.

## IV. DEVICE PARAMETERS

In BCB 3.0, we keep the values for physical geometry parameters consistent with those presented in the 2011 edition of the International Technology Roadmap for Semiconductors for its 2018 node (F = 15 nm) [37]. CMOS HP parameters are taken directly from [37] for high-performance transistors. CMOS LV is an InAs device [38] of the same geometry simulated [39], [40] with the supply voltage scaled down to 0.3 V. Other parameters (summarized in [27]) are specific to each device and come from the simulations of these devices (with methods ranging from drift-diffusion to quantum transport) performed by university groups participating in NRI and STARnet. The geometries of these devices are not necessarily the same as that of CMOS. For example, the gate length of tunneling FETs (TFETs) is chosen to be longer than that of CMOS. We determine the capacitance of the gates from the common intrinsic gate capacitance and by using device-specific adjustment factors [27, Tables 1 and 5]

$$C_{\rm tot} = C_g (M_{\rm cpar} + M_{\rm cadj}). \tag{9}$$

The contact resistance is accounted for by a simple rescaling [41]. The resistance of the ON-state of a transistor and the rescaled ON-current with contact resistance  $R_{\text{cont}}$  are

$$R_{\rm ON} = V_{\rm dd}/I_{\rm ON}, \quad \tilde{I}_{\rm ON} = I_{\rm ON}/(1 + R_{\rm cont}/R_{\rm ON}). \tag{10}$$
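The rescaling in (10) is easy to check numerically. The sketch below is a minimal illustration; the function name and the numbers (0.7 V, 1 mA/µm, 175 Ω·µm) are assumptions for the example, with currents per unit width in A/µm and resistances in Ω·µm.

```python
def rescale_on_current(vdd, i_on, r_cont):
    """Eq. (10): ON resistance and ON-current rescaled for contact resistance."""
    r_on = vdd / i_on                           # ohm*um when i_on is in A/um
    i_on_rescaled = i_on / (1.0 + r_cont / r_on)
    return r_on, i_on_rescaled


# Assumed inputs: Vdd = 0.7 V, I_ON = 1e-3 A/um, R_cont = 175 ohm*um.
r_on, i_on_rescaled = rescale_on_current(vdd=0.7, i_on=1e-3, r_cont=175.0)
print(f"R_ON = {r_on:.0f} ohm*um, rescaled I_ON = {i_on_rescaled * 1e3:.2f} mA/um")
```

In this example the contact resistance is one quarter of the channel resistance, so the ON-current drops by a factor of 1.25; when $R_{cont}$ equals $R_{ON}$ the current is halved.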

Note that we denote the total current through the devices by I, while  $\tilde{I}_{ON}$  is current per unit width. For spintronic devices, CMOS transistors are needed for controlling the power supply of current. They can operate with voltage bias between the source and the drain much smaller than 1 V. We approximate the resistance per unit width of a CMOS transistor in the linear portion of the I-V characteristic by

$$R_L = (R_{\rm ON} + R_{\rm cont})/3. \tag{11}$$

The resistance of this CMOS transistor channel can be larger than the resistance of both the ferromagnetic to normal metal junction (with the resistance-area product, RA) and the short interconnect wires  $R_{wir}$ . Therefore, it is the power supply transistor (and its width  $w_X$ ) that determines the current available for switching. For a magnet with area  $a_{mag}$ , the resistance of the total current path in current-driven spintronics devices is

$$R_{\rm stt} = RA \cdot a_{\rm mag} + R_{\rm wir} + R_L w_X. \tag{12}$$

We also design the width of the transistors such that the second and third terms are approximately equal. We start with the minimum acceptable voltage for spintronic devices  $V_{dd} = V_{st}$ . If the resistance proves to be too high to ensure sufficient switching current  $I_{dev}$ , we choose a higher voltage to maintain the required current  $V_{dd} = R_{stt}I_{dev}$ .
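The supply-voltage rule described above can be written as a small decision function. This is a sketch only: it takes the total path resistance $R_{stt}$ from (12) as a given input, and the function name and numerical values are hypothetical.

```python
def spintronic_supply_voltage(v_st, r_stt, i_dev):
    """Pick Vdd: keep the minimum voltage v_st if it delivers at least i_dev
    through r_stt; otherwise raise Vdd to r_stt * i_dev."""
    if v_st / r_stt >= i_dev:
        return v_st
    return r_stt * i_dev


# Assumed inputs: minimum voltage 10 mV, required switching current 10 uA.
v_low_r = spintronic_supply_voltage(v_st=0.01, r_stt=500.0, i_dev=10e-6)
v_high_r = spintronic_supply_voltage(v_st=0.01, r_stt=2000.0, i_dev=10e-6)
print(f"Vdd (500 ohm) = {v_low_r * 1e3:.0f} mV, Vdd (2 kohm) = {v_high_r * 1e3:.0f} mV")
```

With the lower resistance the minimum voltage already supplies enough current; with the higher resistance the voltage must be doubled to maintain the same switching current.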

## V. AREA ESTIMATES

Consistent with the analysis in BCB 2.0, the layout pitch is  $p_m = 4F$ , where *F* is the metal-1 half-pitch [37]. We set F = 15 nm, corresponding to the 2018 node. We start with areas of simpler circuits and build up to estimates of more complicated circuits. For details, see [27].

## VI. TREATMENT OF STANDBY POWER

One aspect missing in the previous BCB 2.0 release was the treatment of standby power. In logic circuits, standby power is dissipated due to leakage current in transistors, which flows between the power and ground networks even when no input voltages are switched. In this analysis of leakage power, we leave out the clocking circuit power, though it too can contribute to standby power. In transistors there are two components of the leakage current in the OFF state: current from source to drain and leakage current through the gate dielectric. They are quantified by the drain current (per unit width) in the OFF state $I_{OFF}$ and by the gate leakage per unit area, $J_g$, respectively. Therefore, the two components are

$$S_{\rm sd} = I_{\rm OFF} V_{\rm dd} w_X, \quad S_g = J_g L_{\rm ch} w_X V_{\rm dd}. \tag{13}$$

The situation becomes more complex in the context of a circuit, in which various voltages are applied to the terminals of transistors and thus, they can be in ON or OFF states.

FIGURE 2. Schemes of an inverter (a) and a NAND gate with various inputs (b)–(d). Resistances of the transistors are shown for the purposes of standby power calculation: $R_L$ = low resistance in the ON state, $R_H$ = high resistance in the OFF state, and 'block' = extremely high resistance due to this transistor's negative gate-source bias.

In an inverter [Fig. 2(a)], depending on the input, one transistor is in a low-resistance, $R_L$, ON state, while the other is in a high-resistance, $R_H$, OFF state. The voltage drop is insignificant across the ON transistor, and thus, the overall leakage current is limited by the high-resistance OFF transistor. The standby power for the inverter is the same as for a single transistor and is given by the sum of the two leakage components

$$S_{\rm int} = S_{\rm sd} + S_g. \tag{14}$$
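Equations (13) and (14) translate directly into code. The sketch below uses SI units throughout (A/m for the OFF current per unit width, A/m² for the gate leakage density); all input values are assumed for illustration and are not taken from [27] or [37].

```python
def standby_power(i_off, j_g, l_ch, w_x, vdd):
    """Eqs. (13)-(14): source-drain leakage, gate leakage, and their sum (W)."""
    s_sd = i_off * vdd * w_x        # source-to-drain leakage power
    s_g = j_g * l_ch * w_x * vdd    # gate-dielectric leakage power
    return s_sd, s_g, s_sd + s_g


# Assumed inputs: I_OFF = 0.1 A/m (100 nA/um), J_g = 1e3 A/m^2,
# channel length 15 nm, transistor width 60 nm, Vdd = 0.7 V.
s_sd, s_g, s_int = standby_power(i_off=0.1, j_g=1e3, l_ch=15e-9, w_x=60e-9, vdd=0.7)
print(f"S_sd = {s_sd:.2e} W, S_g = {s_g:.2e} W, S_int = {s_int:.2e} W")
```

For these assumed values the source-drain component dominates; the balance depends entirely on the actual $I_{OFF}$ and $J_g$ of the device.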

In multi-input gates, like the NAND in Fig. 2(b)–(d), nFET transistors in the pull-down network have larger widths to compensate for their resistances being in series. The ON and OFF states are determined by combinations of the input conditions. The overall standby power needs to be weighted by the probabilities of these input conditions. For example, the input combinations in Fig. 2(c) and (d) each have a probability of 0.25. The input combination in Fig. 2(b) and the equivalent one with inputs exchanged together have a probability of 0.5. Standby powers in the cases shown in Fig. 2(b) and (c) are equal to $2S_{int}$. The combination shown in Fig. 2(d) is a special case. One of the transistors (designated as block) is in a state of negative gate-to-source bias (this is due to the so-called stacking effect [42]) and thus has an even higher resistance OFF state that effectively blocks any leakage current. The standby power in this case is much smaller than with the other input combinations. Instead of analytically summing standby powers for the above combinations, we obtained the adjustment factors from circuit simulations. For CMOS and other non-tunneling transistors, these factors [43] are listed in Table 3. For tunneling transistors, the stacking effect proves to be even more pronounced and the factors are different.

TABLE 3. Circuit parameters.

| Parameter                                 | Symbol                   | Value |
|-------------------------------------------|--------------------------|-------|
| Standby factor for 2-input NAND           | $M_{S\text{nand}2}$      | 1.625 |
| Standby factor for 3-input NAND           | $M_{S\text{nand}3}$      | 0.897 |
| Standby factor for 4-input NAND           | $M_{S\text{nand}4}$      | 0.625 |
| Standby factor for tunneling 2-input NAND | $M_{S\text{nand}2}$      | 1.535 |
| Standby factor for tunneling 3-input NAND | $M_{S\text{nand}3}$      | 0.837 |
| Standby factor for tunneling 4-input NAND | $M_{S\text{nand}4}$      | 0.6   |
| Inverter delay factor                     | $M_{t\text{inv}}$        | 0.9   |
| Inverter energy factor                    | $M_{E\text{inv}}$        | 1.4   |
| NAND delay factor                         | $M_{t\text{nand}}$       | 2     |
| NAND energy factor                        | $M_{E\text{nand}}$       | 2     |
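For orientation, the probability weighting described for the two-input NAND can be carried out by hand. The sketch below is a rough cross-check, not the method used to produce Table 3: it assumes the blocked (stacked) input combination contributes zero leakage, which is our simplification, and compares the result with the simulated factor.

```python
def weighted_standby_factor(cases):
    """Probability-weighted standby power, in units of S_int.

    cases: list of (probability, standby power / S_int) pairs.
    """
    total_probability = sum(p for p, _ in cases)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("input-combination probabilities must sum to 1")
    return sum(p * s for p, s in cases)


# Two-input NAND: two combinations leak 2*S_int (probabilities 0.5 and 0.25);
# the stacked combination (probability 0.25) is assumed to leak nothing.
analytic = weighted_standby_factor([(0.5, 2.0), (0.25, 2.0), (0.25, 0.0)])
simulated = 1.625   # two-input NAND standby factor from Table 3
print(f"analytic estimate = {analytic}, simulated factor = {simulated}")
```

Under this simplification the hand estimate comes out somewhat below the simulated factor, which is consistent with the blocked combination leaking a small but nonzero amount.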

Nanomagnets in spintronic circuits are nonvolatile. In the time intervals when they do not need to be switched, power can be turned OFF and thus, no current is flowing through the nanomagnets; the nanomagnets would also still retain their states. So spintronic circuits can in theory have zero standby power. In reality, standby power is dominated by the current driving transistors. Even if the gate voltage is set such that driving transistors do not transmit any current, there will be leakage current. We assume that for current-driven circuits, standby power is equal to *S*<sub>int</sub> per nanomagnet. It is slightly different for voltage-driven circuits. Their current does not flow to the ground, but instead it charges a capacitor with negligible leakage. We assume that only the gate leakage part remains for these circuits.

## VII. ESTIMATES FOR TRANSISTOR CIRCUITS

We start with simple combinational logic gates: inverters with fan-out of one or four, and NANDs with two, three, and four inputs. Then they are used as building blocks for more complicated circuits. In BCB 3.0 we add sequential logic: a state element and a register bit. Most TCs have the complementary transistor implementation [28]. We make the approximation for the ON-current of pFET to be equal to that of nFET. In multi-input gates, transistors in the pull-up and pull-down networks are sized to have the same ON-current. The estimates for switching delay and energy of simple circuits are proportional to those for intrinsic elements, $t_{int}$, $E_{int}$, which are given by (1)–(8) for specific switching mechanisms. In a more complicated circuit, a stage is driven by the previous stage and drives the next stage. First we estimate the delay and energy of circuits being driven by a minimum-size inverter. For an inverter driving an inverter with fan-out (FO)

$$t_{\rm inv}({\rm FO}) = ({\rm FO} + 1) \cdot M_{t \rm inv} t_{\rm int} + L_{\rm inv} t_{\rm ic} \tag{15}$$

$$E_{\rm inv}({\rm FO}) = {\rm FO} \cdot (M_{E\rm inv}E_{\rm int} + L_{\rm inv}E_{\rm ic}), \quad S_{\rm inv}({\rm FO}) = {\rm FO} \cdot S_{\rm int} \tag{16}$$

see (14) and (31)–(36) [27] for symbols used. For an inverter driving an input of a NAND gate with number of inputs (NI)


$$t_{\text{NAND}}(\text{NI}) = \text{NI} \cdot M_{t\text{NAND}} t_{\text{int}} + L_{\text{NAND}} t_{\text{ic}} \tag{17}$$

$$E_{\text{NAND}}(\text{NI}) = \text{NI} \cdot M_{E\text{NAND}} E_{\text{int}} + L_{\text{NAND}} E_{\text{ic}} \tag{18}$$

$$S_{\text{NAND}}(\text{NI}) = M_{S\text{NAND},\text{NI}}S_{\text{int}}. \tag{19}$$
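Equations (15)–(19) can be collected into two small functions. The following Python sketch uses the factor values from Table 3; the function names are ours, and the interconnect terms default to zero purely to keep the example short (an assumption, since the interconnect lengths and per-length delay and energy are defined in [27]).

```python
M_T_INV, M_E_INV = 0.9, 1.4      # inverter delay and energy factors (Table 3)
M_T_NAND, M_E_NAND = 2.0, 2.0    # NAND delay and energy factors (Table 3)
M_S_NAND = {2: 1.625, 3: 0.897, 4: 0.625}        # non-tunneling standby factors
M_S_NAND_TUNNEL = {2: 1.535, 3: 0.837, 4: 0.6}   # tunneling standby factors


def inverter(fo, t_int, e_int, s_int, l_inv=0.0, t_ic=0.0, e_ic=0.0):
    """Eqs. (15)-(16): delay, energy, standby power of an inverter with fan-out fo."""
    t = (fo + 1) * M_T_INV * t_int + l_inv * t_ic
    e = fo * (M_E_INV * e_int + l_inv * e_ic)
    s = fo * s_int
    return t, e, s


def nand(ni, t_int, e_int, s_int, l_nand=0.0, t_ic=0.0, e_ic=0.0, tunneling=False):
    """Eqs. (17)-(19): delay, energy, standby power of an ni-input NAND."""
    factors = M_S_NAND_TUNNEL if tunneling else M_S_NAND
    t = ni * M_T_NAND * t_int + l_nand * t_ic
    e = ni * M_E_NAND * e_int + l_nand * e_ic
    s = factors[ni] * s_int
    return t, e, s


# Results in units of the intrinsic delay, energy, and standby power.
print("FO4 inverter:", inverter(4, 1.0, 1.0, 1.0))
print("NAND2:", nand(2, 1.0, 1.0, 1.0))
```

Passing unit intrinsic values, as in the usage lines, returns each estimate as a multiple of $t_{int}$, $E_{int}$, and $S_{int}$, which makes the relative cost of the gates easy to compare.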

The empirical factors were chosen to approximately match the results of SPICE simulation with Arizona compact predictive technology models [44], [45] of a 10-nm high-performance FinFET transistor. This is a change from BCB 2.0, where the PETE circuit simulator [46] was used. The factors are in order-of-magnitude agreement with analytical equations [28]. The numerical values for these constants are in Table 3. We assume the estimates for NOR to be the same as for NAND. Then more complicated circuits are built up from these simple parts. We adopt a straightforward approach: summing delays of all parts on the critical path and summing energies of all parts in the circuit, as shown by diagrams in [27]. For an exclusive or (XOR), 1:4 mux and 4:1 demux

$$t_{\rm XOR} = 3t_{\rm NAND2} + L_{\rm XOR}t_{\rm ic} \tag{20}$$

$$E_{\text{XOR}} = 4E_{\text{NAND2}} + L_{\text{XOR}}E_{\text{ic}}, \quad S_{\text{XOR}} = 4S_{\text{NAND2}} \tag{21}$$

$$t_{\text{mux}} = t_{\text{NAND4}} + t_{\text{NAND3}} + t_{\text{inv1}} + L_{\text{mux}}t_{\text{ic}} \tag{22}$$

$$E_{\text{mux}} = E_{\text{NAND4}} + 4E_{\text{NAND3}} + 2E_{\text{inv1}} + L_{\text{mux}}E_{\text{ic}} \tag{23}$$

$$S_{\text{mux}} = S_{\text{NAND4}} + 4S_{\text{NAND3}} + 2S_{\text{inv1}} \tag{24}$$

$$t_{\rm dem} = t_{\rm NAND3} + t_{\rm inv1} + L_{\rm dem} t_{\rm ic} \tag{25}$$

$$E_{\rm dem} = 4E_{\rm NAND3} + 2E_{\rm inv1} + 4L_{\rm dem}E_{\rm ic} \tag{26}$$

$$S_{\rm dem} = 4S_{\rm NAND3} + 2S_{\rm inv1}. \tag{27}$$
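The composition rule (delays summed along the critical path, energies and standby powers summed over all parts) is sketched below for the XOR, mux, and demux of (20)–(27). The `Est` container and the function names are hypothetical helpers introduced for this example, and the interconnect terms again default to zero as a simplifying assumption.

```python
from collections import namedtuple

# Delay t, energy e, and standby power s of one circuit block.
Est = namedtuple("Est", ["t", "e", "s"])


def xor_gate(nand2, l_xor=0.0, t_ic=0.0, e_ic=0.0):
    """Eqs. (20)-(21): three NAND2 delays on the critical path, four NAND2 gates."""
    return Est(3 * nand2.t + l_xor * t_ic,
               4 * nand2.e + l_xor * e_ic,
               4 * nand2.s)


def mux(nand4, nand3, inv1, l_mux=0.0, t_ic=0.0, e_ic=0.0):
    """Eqs. (22)-(24)."""
    return Est(nand4.t + nand3.t + inv1.t + l_mux * t_ic,
               nand4.e + 4 * nand3.e + 2 * inv1.e + l_mux * e_ic,
               nand4.s + 4 * nand3.s + 2 * inv1.s)


def demux(nand3, inv1, l_dem=0.0, t_ic=0.0, e_ic=0.0):
    """Eqs. (25)-(27)."""
    return Est(nand3.t + inv1.t + l_dem * t_ic,
               4 * nand3.e + 2 * inv1.e + 4 * l_dem * e_ic,
               4 * nand3.s + 2 * inv1.s)


# Unit-valued building blocks make the gate counts in each formula visible.
unit = Est(1.0, 1.0, 1.0)
print(xor_gate(unit), mux(unit, unit, unit), demux(unit, unit))
```

With unit-valued inputs the returned numbers are simply the counts of gates on the critical path and in the whole circuit, which is a convenient way to check a formula against its schematic in [27].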

For the sequential circuits, a memory cell and a state element are treated similarly. With non-TFETs and the 6T-SRAM cell (assuming that access transistors do not contribute to standby power)

$$t_{\rm rb} = 2t_{\rm inv1} + L_{\rm ram}t_{\rm ic} \tag{28}$$

$$E_{\rm rb} = 3E_{\rm inv1} + 2L_{\rm ram}E_{\rm ic}, \quad S_{\rm rb} = 2S_{\rm inv1}. \tag{29}$$

With TFETs and the 8T-SRAM cell, additional transistors contribute to the switching energy and leakage, but do not slow down writing of the cell

$$t_{\rm rb} = 2t_{\rm inv1} + L_{\rm ram}t_{\rm ic} \tag{30}$$

$$E_{\rm rb} = 4E_{\rm inv1} + 2L_{\rm ram}E_{\rm ic}, \quad S_{\rm rb} = 3S_{\rm inv1}. \tag{31}$$
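The two register-bit variants differ only in their energy and standby multipliers, which a short sketch makes explicit. The function name and the `tunneling` flag are our own conventions for this illustration; interconnect terms default to zero as an assumption.

```python
def register_bit(t_inv1, e_inv1, s_inv1, tunneling, l_ram=0.0, t_ic=0.0, e_ic=0.0):
    """Eqs. (28)-(31): 6T-SRAM cell (non-TFET) or 8T-SRAM cell (TFET)."""
    # The extra transistors of the 8T cell add energy and leakage, not delay.
    e_mult, s_mult = (4, 3) if tunneling else (3, 2)
    t_rb = 2 * t_inv1 + l_ram * t_ic
    e_rb = e_mult * e_inv1 + 2 * l_ram * e_ic
    s_rb = s_mult * s_inv1
    return t_rb, e_rb, s_rb


# In units of the fan-out-one inverter estimates:
print("6T:", register_bit(1.0, 1.0, 1.0, tunneling=False))
print("8T:", register_bit(1.0, 1.0, 1.0, tunneling=True))
```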

For the state element (gated D-latch) (see [27] for schematics)

$$t_{\rm se} = 2t_{\rm inv1} + 3t_{\rm nan2} + L_{\rm se}t_{\rm ic} \tag{32}$$

$$E_{\rm se} = 3E_{\rm inv1} + 4E_{\rm nan2} + 2L_{\rm ram}E_{\rm ic}, \quad S_{\rm se} = 3S_{\rm inv1} + 4S_{\rm nan2}. \tag{33}$$

For the 1-bit full adder, using the activity factors from [46]

$$t_{\text{add1}} = 3t_{\text{XOR}}/2 + 5t_{\text{nan2}}/2 + L_{\text{add1}}t_{\text{ic}} \tag{34}$$

$$E_{\text{add1}} = 7E_{\text{XOR}}/8 + 351E_{\text{nan2}}/512 + 2L_{\text{add1}}E_{\text{ic}} \tag{35}$$

$$S_{\text{add1}} = 2S_{\text{XOR}} + 3S_{\text{nan2}}. \tag{36}$$

We use the most straightforward circuit implementation for an adder: the ripple carry adder. In FETs, all delays and energies are just multiplied by the number of bits. This implies that every bit dissipates energy only when the carry bit propagates and it is switched. BisFET and ITFET are a special case. They belong to negative differential resistance (NDR) logic [47]. This logic needs to be clocked on every cycle, regardless of whether a certain gate is switched or not. This suggests an additional factor of 32 in energy. Here the distinction between the active and standby power is blurred: both are dissipated on every clock cycle. Due to their dynamic nature, such circuits are somewhat reminiscent of dynamic logic [28]. The need to clock NDR logic in order to operate the gate probably makes register files (RFs) impossible to implement with them.
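The adder estimates can be sketched as follows. The 1-bit function follows (34)–(36) directly; the ripple-carry function multiplies delay and energy by the number of bits as stated above. Treating the NDR penalty as an extra multiplicative factor of 32 on top of the bit count is our reading of "an additional factor of 32", and the function names and defaults are assumptions for this example.

```python
def full_adder_1bit(t_xor, e_xor, s_xor, t_nan2, e_nan2, s_nan2,
                    l_add1=0.0, t_ic=0.0, e_ic=0.0):
    """Eqs. (34)-(36): 1-bit full adder with the activity factors of [46]."""
    t = 3 * t_xor / 2 + 5 * t_nan2 / 2 + l_add1 * t_ic
    e = 7 * e_xor / 8 + 351 * e_nan2 / 512 + 2 * l_add1 * e_ic
    s = 2 * s_xor + 3 * s_nan2
    return t, e, s


def ripple_carry_adder(t_add1, e_add1, n_bits=32, ndr=False, ndr_energy_factor=32):
    """Delay and energy of an n-bit ripple carry adder built from 1-bit adders."""
    t = n_bits * t_add1
    e = n_bits * e_add1
    if ndr:
        # Assumption: NDR logic is clocked every cycle, adding this energy factor.
        e *= ndr_energy_factor
    return t, e


t1, e1, s1 = full_adder_1bit(1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
print("1-bit adder:", (t1, e1, s1))
print("32-bit FET adder:", ripple_carry_adder(t1, e1))
print("32-bit NDR adder:", ripple_carry_adder(t1, e1, ndr=True))
```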

## **VIII. ESTIMATES FOR MAJORITY GATE CIRCUITS**

As stated before, most spintronic circuits are MGC. They are also mostly nonvolatile. Therefore, every node potentially has the functionality of a latch. An inverter is actually simpler than a majority gate and requires just one intrinsic element, a nanomagnet [27]

$$t_{\text{inv1}} = t_{\text{int}} + L_{\text{inv1}}t_{\text{ic}}, \quad E_{\text{inv1}} = E_{\text{int}} + L_{\text{inv1}}E_{\text{ic}}, \quad S_{\text{inv1}} = S_{\text{int}}. \tag{37}$$

The register bit and the state element are equivalent to an inverter, i.e., they contain just an output and an input magnet

$$t_{\rm rb} = t_{\rm se} = t_{\rm inv1}, \quad E_{\rm rb} = E_{\rm se} = E_{\rm inv1}, \quad S_{\rm rb} = S_{\rm se} = S_{\rm inv1}. \tag{38}$$

An inverter with a fan-out of two can be implemented with just one majority gate with one input and three outputs; thus, all its characteristics are the same as those for the two-input NAND (below). A fan-out of four requires two such MGs cascaded and four output nanomagnets driving the next stages of the circuit

$$t_{\rm inv4} = 2t_{\rm int} + L_{\rm inv4}t_{\rm ic} \tag{39}$$

$$E_{\text{inv4}} = 5E_{\text{int}} + 4L_{\text{inv4}}E_{\text{ic}}, \quad S_{\text{inv4}} = 5S_{\text{int}}. \tag{40}$$

Two-input NAND or NOR gates are obtained by fixing one of the inputs of a three-input majority gate to 1 or 0, respectively. Three- or four-input gates require more majority gates

$$t_{\text{NAND}}(\text{NI}) = (\text{NI} - 1)t_{\text{int}} + L_{\text{nan}}t_{\text{ic}} \tag{41}$$

$$E_{\text{NAND}}(\text{NI}) = (3\text{NI} + 1)E_{\text{int}} + L_{\text{nan}}E_{\text{ic}}, \quad S_{\text{nan}}(\text{NI}) = (3\text{NI} + 1)S_{\text{int}}. \tag{42}$$
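For majority gate circuits, the inverter of (37) and the NI-input NAND of (41) and (42) depend only on the intrinsic-element estimates and the interconnect terms. A minimal Python sketch follows, with hypothetical names and arbitrary consistent units.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    t: float  # delay
    E: float  # switching energy
    S: float  # standby power


def mg_inverter(intr: Metrics, t_ic: float, E_ic: float,
                L_inv1: float) -> Metrics:
    """Eq. (37): one intrinsic element (a nanomagnet) plus interconnect."""
    return Metrics(intr.t + L_inv1 * t_ic, intr.E + L_inv1 * E_ic, intr.S)


def mg_nand(intr: Metrics, t_ic: float, E_ic: float, L_nan: float,
            n_inputs: int = 2) -> Metrics:
    """Eqs. (41)-(42): NAND with n_inputs inputs built from majority gates."""
    if n_inputs < 2:
        raise ValueError("a NAND gate needs at least two inputs")
    n_elements = 3 * n_inputs + 1  # element count used for energy and standby
    return Metrics(
        t=(n_inputs - 1) * intr.t + L_nan * t_ic,
        E=n_elements * intr.E + L_nan * E_ic,
        S=n_elements * intr.S,
    )
```

The register bit and the state element of (38) then reuse `mg_inverter` directly, since they are stated to be equivalent to an inverter.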

For the exclusive OR [27]

$$t_{\rm XOR} = 2t_{\rm NAND2} + t_{\rm inv1} + L_{\rm XOR}t_{\rm ic} \tag{43}$$

$$E_{\rm XOR} = 3E_{\rm NAND2} + 2E_{\rm inv1} + L_{\rm XOR}E_{\rm ic}, \quad S_{\text{XOR}} = 3S_{\text{NAND2}} + 2S_{\text{inv1}}. \tag{44}$$

Here a majority gate is equivalent to a 2-input NAND. For 1:4 mux and demux

$$t_{\rm mux} = 4t_{\rm NAND2} + 2t_{\rm inv2} + L_{\rm mux}t_{\rm ic} \tag{45}$$

$$E_{\rm mux} = 9E_{\rm NAND2} + 3E_{\rm inv2} + L_{\rm mux}E_{\rm ic}, \quad S_{\rm mux} = 9S_{\rm nan2} + 3S_{\rm inv2} \tag{46}$$

$$t_{\rm dem} = 2t_{\rm NAND2} + t_{\rm inv2} + L_{\rm dem}t_{\rm ic} \tag{47}$$

$$E_{\rm dem} = 8E_{\rm NAND2} + 2E_{\rm inv2} + 4L_{\rm dem}E_{\rm ic}, \quad S_{\rm dem} = 8S_{\rm NAND2} + 2S_{\rm inv2}. \tag{48}$$
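The composite gates of (43)-(48) are weighted sums of the NAND2 and inverter estimates. The sketch below is illustrative only, with hypothetical names. It defaults the fan-out-of-two inverter to the NAND2 estimates, following the statement that the two have the same characteristics.

```python
from typing import NamedTuple, Optional


class Metrics(NamedTuple):
    t: float  # delay
    E: float  # switching energy
    S: float  # standby power


def mg_xor(nand2: Metrics, inv1: Metrics, t_ic: float, E_ic: float,
           L_xor: float) -> Metrics:
    """Eqs. (43)-(44)."""
    return Metrics(
        t=2 * nand2.t + inv1.t + L_xor * t_ic,
        E=3 * nand2.E + 2 * inv1.E + L_xor * E_ic,
        S=3 * nand2.S + 2 * inv1.S,
    )


def mg_mux(nand2: Metrics, t_ic: float, E_ic: float, L_mux: float,
           inv2: Optional[Metrics] = None) -> Metrics:
    """Eqs. (45)-(46) for the 1:4 multiplexer."""
    inv2 = nand2 if inv2 is None else inv2  # fan-out-of-two inverter = NAND2
    return Metrics(
        t=4 * nand2.t + 2 * inv2.t + L_mux * t_ic,
        E=9 * nand2.E + 3 * inv2.E + L_mux * E_ic,
        S=9 * nand2.S + 3 * inv2.S,
    )


def mg_demux(nand2: Metrics, t_ic: float, E_ic: float, L_dem: float,
             inv2: Optional[Metrics] = None) -> Metrics:
    """Eqs. (47)-(48) for the 1:4 demultiplexer."""
    inv2 = nand2 if inv2 is None else inv2
    return Metrics(
        t=2 * nand2.t + inv2.t + L_dem * t_ic,
        E=8 * nand2.E + 2 * inv2.E + 4 * L_dem * E_ic,
        S=8 * nand2.S + 2 * inv2.S,
    )
```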

The one-bit full adder is defined by the number of majority gates, i.e., the total number $M_{\rm mg}$ and the number on the critical path $M_{\rm cmg}$ [27, Table 3]

$$t_{\rm add1} = M_{\rm cmg} t_{\rm NAND2} + L_{\rm add1} t_{\rm ic} \tag{49}$$

$$E_{\rm add1} = M_{\rm mg} E_{\rm NAND2} + (M_{\rm mg} - 1) E_{\rm inv1} + 2L_{\rm add1} E_{\rm ic} \tag{50}$$

$$S_{\text{add1}} = M_{\text{mg}}S_{\text{NAND2}} + (M_{\text{mg}} - 1)S_{\text{inv1}}. \tag{51}$$

The adder is again of the ripple-carry kind. The values for the full adder are obtained by multiplying by the number of bits, in the present case 32.
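Equations (49)-(51) and the 32-bit scaling can be sketched as below. The function names are hypothetical, and the gate counts used in any call (for example, `m_mg=3, m_cmg=2`) are placeholders rather than values taken from [27, Table 3]. Applying the multiplication by the number of bits to standby power as well as to delay and energy is an assumption.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    t: float  # delay
    E: float  # switching energy
    S: float  # standby power


def mg_full_adder(nand2: Metrics, inv1: Metrics, m_mg: int, m_cmg: int,
                  t_ic: float, E_ic: float, L_add1: float) -> Metrics:
    """Eqs. (49)-(51): m_mg majority gates in total, m_cmg on the critical path."""
    if not 1 <= m_cmg <= m_mg:
        raise ValueError("expected 1 <= m_cmg <= m_mg")
    return Metrics(
        t=m_cmg * nand2.t + L_add1 * t_ic,
        E=m_mg * nand2.E + (m_mg - 1) * inv1.E + 2 * L_add1 * E_ic,
        S=m_mg * nand2.S + (m_mg - 1) * inv1.S,
    )


def mg_ripple_adder(add1: Metrics, bits: int = 32) -> Metrics:
    """Ripple-carry adder: every 1-bit value is multiplied by the bit count."""
    return Metrics(bits * add1.t, bits * add1.E, bits * add1.S)
```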

## **IX. ARITHMETIC LOGIC UNIT**

With the circuits described above, one can construct an ALU, which is an example of a state machine, even a rudimentary processor. For the purposes of this estimate, we consider the structure and operation of the ALU shown in Figs. 3 and 4, which, at this level of description, are common to TC and MGC logic.

FIGURE 3. Scheme of the circuit block performing the AOs. Ctrl1 inputs enable the XOR blocks to choose between add or subtract operations. Ctrl0 inputs select the result of which AO is directed to the output. Lines represent 32-bit buses.

FIGURE 4. Scheme of the whole ALU. Latches are opened on the rising edge of the clocks: Clk0 and Clk1 are offset by half the clock cycle. All blocks contain 32 bits. AO = arithmetic operation unit as in Fig. 3. RF = register files.

At the heart of an ALU is the block performing arithmetic operations (AOs): addition, subtraction, NAND, and NOR, as per Fig. 3. All of the operations are performed on two input 32-bit numbers *A* and *B*. NAND and NOR are done in parallel. Addition and subtraction require propagating the carry from one bit to another. Therefore, the delay of propagating the carry across the adder limits the operational delay of this block. This whole block is based on sequential logic and the operation function options are controlled by two signals Ctrl0 and Ctrl1. The delay, energy, and standby power for this block are

$$t_{\rm ao} = t_{\rm se} + t_{\rm XOR} + t_{\rm add32} + t_{\rm mux} + L_{\rm ao}t_{\rm ic} \tag{52}$$

$$E_{\rm ao} = E_{\rm add32} + 32(2E_{\rm se} + E_{\rm XOR} + E_{\rm mux} + L_{\rm ao}E_{\rm ic}) \tag{53}$$

$$S_{\rm ao} = S_{\rm add32} + 32(2S_{\rm se} + S_{\rm XOR} + S_{\rm mux}). \tag{54}$$
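The AO-block estimates of (52)-(54) combine the component estimates as follows. This is a sketch with hypothetical names. The bus width of 32 is exposed as a parameter purely for illustration, and `add_n` is expected to be the adder estimate for that same width.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    t: float  # delay
    E: float  # switching energy
    S: float  # standby power


def ao_block(se: Metrics, xor: Metrics, add_n: Metrics, mux: Metrics,
             t_ic: float, E_ic: float, L_ao: float,
             bits: int = 32) -> Metrics:
    """Eqs. (52)-(54) for the arithmetic-operation block.

    `add_n` is the full ripple-carry adder (add32 for bits=32); the other
    inputs are single-bit estimates.
    """
    return Metrics(
        # The critical path passes once through each stage; the adder dominates.
        t=se.t + xor.t + add_n.t + mux.t + L_ao * t_ic,
        # Per-bit energy of two state elements, one XOR, one mux, and wiring.
        E=add_n.E + bits * (2 * se.E + xor.E + mux.E + L_ao * E_ic),
        S=add_n.S + bits * (2 * se.S + xor.S + mux.S),
    )
```

The delay is a simple sum along one bit slice, while energy and standby power accumulate over all 32 slices, so the adder dominates the former and the per-bit elements can dominate the latter.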

The entire ALU (Fig. 4) performs the following functions: it stores two 32-bit numbers in RFs, retrieves them and passes them to the AO block, receives the result from the AO block, sends it to the output, and writes the result into one of the RFs. Each of the RFs is a  $1 \times 32$  array of memory cells. Clocking is required to transfer data between the memory and the AO block. We chose a two-phase clock with the signals Clk0 and Clk1 shifted by half of the clock period relative to each other. On the rising edge of Clk0, the latches transmit the outputs of the RFs to the AO block. On the rising edge of Clk1, the output is transferred to one of the RFs. Simultaneously, on the falling edge of Clk0, the inputs to AO are isolated from the RFs. On the falling edge of Clk1, the inputs to RFs are isolated from the outputs of AO. The AOs (addition is the limiting one) must fit into the half of the clock cycle between the rising edges of Clk0 and Clk1. The ALU switches in one complete clock cycle, i.e., one period of Clk0

$$t_{\rm alu} = 2(t_{\rm ao} + t_{\rm se} + L_{\rm alu}t_{\rm ic}) \tag{55}$$

$$E_{\text{alu}} = E_{\text{ao}} + 32(E_{\text{se}} + E_{\text{rb}} + L_{\text{alu}}E_{\text{ic}}) \tag{56}$$

$$S_{\rm alu} = S_{\rm ao} + 32(S_{\rm se} + S_{\rm rb}). \tag{57}$$
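A short sketch of (55)-(57) is given below, with hypothetical names. The helper `max_clock_frequency` is an inference from the statement that the ALU switches in one complete clock cycle, not a formula taken from the text.

```python
from typing import NamedTuple


class Metrics(NamedTuple):
    t: float  # delay
    E: float  # switching energy
    S: float  # standby power


def alu(ao: Metrics, se: Metrics, rb: Metrics, t_ic: float, E_ic: float,
        L_alu: float, bits: int = 32) -> Metrics:
    """Eqs. (55)-(57) for the whole ALU with a two-phase clock."""
    return Metrics(
        # The AO block plus a latch must fit in half a period, hence the factor 2.
        t=2 * (ao.t + se.t + L_alu * t_ic),
        E=ao.E + bits * (se.E + rb.E + L_alu * E_ic),
        S=ao.S + bits * (se.S + rb.S),
    )


def max_clock_frequency(alu_metrics: Metrics) -> float:
    """Assumption: t_alu is one clock period, so the maximum rate is 1 / t_alu."""
    return 1.0 / alu_metrics.t
```

With delays expressed in seconds, `max_clock_frequency` returns hertz. Any other unit choice must be kept consistent by the caller.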

## X. RESULTS AND DISCUSSION

Now we use the models described above to obtain estimates of energy and delay for a selected set of benchmark circuits. The complete set of plots is shown in [27]. A good representation of the performance of combinational circuits is given by the adder (Fig. 5). Consistent with the previous BCB 2.0 release, we note that spintronic devices switch slower than electronic ones. Among these, spin transfer torque (current-driven) spintronic devices require higher energy to switch state. Magnetoelectric (voltage-driven) spintronic devices have lower energies, an order of magnitude lower than that of CMOS HP. The van der Waals FET is the only device slightly faster than CMOS HP, though a more thorough study may change this assessment. Multiple TFET devices have lower switching energy than CMOS HP, though they switch slower. They still have  $2 \times$  to  $6 \times$  lower energy-delay products than CMOS HP. Ferroelectric transistors (such as FEFET and MITFET) are faster than the other nonvolatile option, spintronic devices. These qualitative relations remain the same for a sequential logic circuit, such as the 32-bit ALU (Fig. 6).



FIGURE 5. Switching energy versus delay of a 32-bit adder.



FIGURE 6. Switching energy versus delay of a 32-bit ALU.

When we look at active and standby power in Fig. 7, we notice that spintronic devices have lower standby power, especially the magnetoelectric ones. Magnetoelectric devices also have a significantly lower active power. Electronic devices, in general, have both higher active and higher standby power. Among them, TFETs have standby power in the range between nontunneling electronic and spintronic devices. TFETs also have active power between CMOS HP and magnetoelectric devices. We also show the metric introduced in BCB 2.0, computational throughput with capped power (Fig. 8). Under constraints of dissipated power density, CMOS HP cannot achieve its maximal throughput. Other devices (vdWFET, ExFET, and HetJFET) have better energy efficiency and are projected to exceed CMOS in throughput. TFETs enable relatively high throughput with even lower power. Magnetoelectric devices have about an order of magnitude lower throughput than CMOS HP, which is counterbalanced by smaller power dissipation. Ferroelectric devices are predicted to have comparable throughput. Spin torque devices are limited by power dissipation and have low throughput.



FIGURE 7. Active power versus standby power of a 32-bit adder.



FIGURE 8. Dissipated power versus computational throughput (capped at 10 W/cm<sup>2</sup>) related to a 32-bit ALU.

## **XI. CONCLUSION**

A benchmarking methodology has been introduced to compare beyond-CMOS devices which rely on new computational variables with new principles for transduction. A better understanding of electronic power supplies, even for nonelectronic devices in circuits, is laid out. We treated a wide set of benchmarking circuits, including sequential logic and an ALU. We introduced estimates for standby power, which is an extremely important metric, especially for mobile and wearable devices. Ferroelectrics were identified as a promising class of nonvolatile devices, but they have their own switching speed limitation. With our latest, more sophisticated treatment of circuits, we find that interconnects dominate switching energy and delay. Spintronics was found to have simplicity and size advantages for some circuits, such as the register bit cell and the state element. Spintronic circuits have a much lower standby power (if they are clocked). With all of the above, we provide an approach for researchers to better focus on promising beyond-CMOS devices and to seek methods of improving their power versus performance.

## ACKNOWLEDGMENT

The authors would like to thank their colleagues from NRI and STARnet who participated in the recent round of benchmarking: A. Naeemi, C. Pan, N. Kani, S.-C. Chang, R. Lake, J. Bird, P. Dowben, C. Kim, J. Wang, J. Kim, A. Seabaugh, J. Nahas, V. Narayanan, A. Khan, and S. Salahuddin. They would also like to thank their Intel colleagues, U. Avci, S. Manipatruni, R. Kim, G. Allen, D. Morris, G. Bourianoff, B. Doyle, R. Chau, I. Karpov, M. Mayberry, M. Bohr, and K. Zhang, for their helpful feedback.

#### REFERENCES

- [1] G. E. Moore, "Cramming more components onto integrated circuits," *Electronics*, vol. 38, no. 8, pp. 114–117, 1965.
- [2] S. Natarajan *et al.*, "A 14 nm logic technology featuring 2<sup>nd</sup>-generation FinFET, air-gapped interconnects, self-aligned double patterning and a 0.0588 μm<sup>2</sup> SRAM cell size," in *Proc. IEEE IEDM*, Dec. 2014, pp. 3.7.1–3.7.3.
- [3] V. V. Zhirnov, R. K. Cavin, J. A. Hutchby, and G. I. Bourianoff, "Limits to binary logic switch scaling—A gedanken model," *Proc. IEEE*, vol. 91, no. 11, pp. 1934–1939, Nov. 2003.
- [4] J. J. Welser, G. I. Bourianoff, V. V. Zhirnov, and R. K. Cavin, III, "The quest for the next information processing technology," *J. Nanoparticle Res.*, vol. 10, no. 1, pp. 1–10, 2008.
- [5] K. Bernstein, R. K. Cavin, III, W. Porod, A. Seabaugh, and J. Welser, "Device and architecture outlook for beyond CMOS switches," *Proc. IEEE*, vol. 98, no. 12, pp. 2169–2184, Dec. 2010.
- [6] D. E. Nikonov and I. A. Young, "Uniform methodology for benchmarking beyond-CMOS logic devices," in *Proc. IEEE IEDM*, Dec. 2012, pp. 25.4.1–25.4.4.
- [7] D. E. Nikonov and I. A. Young, "Overview of beyond-CMOS devices and a uniform methodology for their benchmarking," *Proc. IEEE*, vol. 101, no. 12, pp. 2498–2533, Dec. 2013.
- [8] A. C. Seabaugh and Q. Zhang, "Low-voltage tunnel transistors for beyond CMOS logic," *Proc. IEEE*, vol. 98, no. 12, pp. 2095–2110, Dec. 2010.
- [9] W. Kang *et al.*, "An overview of spin-based integrated circuits," in *Proc. 19th Asia South Pacific Design Autom. Conf. (ASP-DAC)*, Singapore, Jan. 2014, pp. 676–683.
- [10] (2014). Benchmarking of devices in the Nanoelectronics Research Initiative. [Online]. Available: https://nanohub.org/tools/nribench/browser/trunk/src

- [11] G. Bourianoff, "The future of nanocomputing," *Computer*, vol. 36, no. 8, pp. 44–53, Aug. 2003.
- [12] M. Dertouzos, *Threshold Logic: A Synthesis Approach*. Cambridge, MA, USA: MIT Press, 1965.
- [13] L. F. Register, private communication, 2014.
- [14] P. Zhao, R. M. Feenstra, G. Gu, and D. Jena, "SymFET: A proposed symmetric graphene tunneling field effect transistor," in *Proc. 70th Annu. Device Res. Conf. (DRC)*, Jun. 2012, pp. 33–34.
- [15] S. L. Miller and P. J. McWhorter, "Physics of the ferroelectric nonvolatile memory field effect transistor," *J. Appl. Phys.*, vol. 72, no. 12, pp. 5999–6010, 1992.
- [16] S. Salahuddin and S. Datta, "Use of negative capacitance to provide voltage amplification for low power nanoscale devices," *Nano Lett.*, vol. 8, no. 2, pp. 405–410, 2008.
- [17] D. Newns, B. Elmegreen, X. H. Liu, and G. Martyna, "A low-voltage high-speed electronic switch based on piezoelectric transduction," *J. Appl. Phys.*, vol. 111, no. 8, p. 084509, 2012.
- [18] J. Son, S. Rajan, S. Stemmer, and S. J. Allen, "A heterojunction modulation-doped Mott transistor," *J. Appl. Phys.*, vol. 110, no. 8, p. 084503, 2011.
- [19] S. Datta, S. Salahuddin, and B. Behin-Aein, "Non-volatile spin switch for Boolean and non-Boolean logic," *Appl. Phys. Lett.*, vol. 101, no. 25, p. 252411, 2012.
- [20] M. O. Li, D. Esseni, G. Snider, D. Jena, and H. G. Xing, "Single particle transport in two-dimensional heterojunction interlayer tunneling field effect transistor," *J. Appl. Phys.*, vol. 115, no. 7, p. 074508, 2014.
- [21] P. Fay, private communication, 2014.
- [22] S. Das, A. Prakash, R. Salazar, and J. Appenzeller, "Toward low-power electronics: Tunneling phenomena in transition metal dichalcogenides," *ACS Nano*, vol. 8, no. 2, pp. 1681–1689, Jan. 2014.
- [23] L. Liu, Y. Lu, and J. Guo, "On monolayer MoS<sub>2</sub> field-effect transistors at the scaling limit," *IEEE Trans. Electron Devices*, vol. 60, no. 12, pp. 1433–1439, Dec. 2013.
- [24] D. Morris, D. Bromberg, J.-G. Zhu, and L. Pileggi, "mLogic: Ultra-low voltage non-volatile logic circuits using STT-MTJ devices," in *Proc. 49th ACM/EDAC/IEEE DAC*, San Francisco, CA, USA, Jun. 2012, pp. 486–491.
- [25] J. A. Currivan, Y. Jang, M. D. Mascaro, M. A. Baldo, and C. A. Ross, "Low energy magnetic domain wall logic in short, narrow, ferromagnetic wires," *IEEE Magn. Lett.*, vol. 3, p. 3000104, Apr. 2012.
- [26] D. E. Nikonov and G. I. Bourianoff, "Spin gain transistor in ferromagnetic semiconductors-the semiconductor Bloch-equations approach," *IEEE Trans. Nanotechnol.*, vol. 4, no. 2, pp. 206–214, Mar. 2005.
- [27] D. E. Nikonov and I. A. Young. (Mar. 2014). Benchmarking of Beyond-CMOS Exploratory Devices for Logic Integrated Circuits. [Online]. Available: http://ieeexplore.ieee.org/xpl/abstractMultimedia.jsp?arnumber=7076743.
- [28] N. H. E. Weste and D. M. Harris, CMOS VLSI Design: A Circuits and Systems Perspective, 4th ed. Boston, MA, USA: Addison-Wesley, 2011.
- [29] J. Li, B. Nagaraj, H. Liang, W. Cao, C. H. Lee, and R. Ramesh, "Ultrafast polarization switching in thin-film ferroelectrics," *Appl. Phys. Lett.*, vol. 84, no. 7, pp. 1174–1176, 2004.
- [30] J. Z. Sun, "Spin angular momentum transfer in current-perpendicular nanomagnetic junctions," *IBM J. Res. Develop.*, vol. 50, no. 1, pp. 81–100, Jan. 2006.
- [31] R. H. Koch, J. A. Katine, and J. Z. Sun, "Time-resolved reversal of spin-transfer switching in a nanomagnet," *Phys. Rev. Lett.*, vol. 92, p. 088302, Feb. 2004.
- [32] D. E. Nikonov and I. A. Young, "Benchmarking spintronic logic devices based on magnetoelectric oxides," *J. Mater. Res.*, vol. 29, no. 18, pp. 2109–2115, Sep. 2014.
- [33] J. T. Heron *et al.*, "Electric-field-induced magnetization reversal in a ferromagnet-multiferroic heterostructure," *Phys. Rev. Lett.*, vol. 107, no. 21, p. 217202, 2011.
- [34] X. He *et al.*, "Robust isothermal electric control of exchange bias at room temperature," *Nature Mater.*, vol. 9, pp. 579–585, Jun. 2010.
- [35] T. Wu *et al.*, "Giant electric-field-induced reversible and permanent magnetization reorientation on magnetoelectric Ni/(011)  $[Pb(Mg_{1/3}Nb_{2/3})O_3]_{(1-x)}$ -[PbTiO<sub>3</sub>]<sub>x</sub> heterostructure," *Appl. Phys. Lett.*, vol. 98, no. 1, p. 012504, 2011.

- [36] Y. Shiota, T. Nozaki, F. Bonell, S. Murakami, T. Shinjo, and Y. Suzuki, "Induction of coherent magnetization switching in a few atomic layers of FeCo using voltage pulses," *Nature Mater.*, vol. 11, pp. 39–43, Nov. 2011.
- [37] (2011). International Technology Roadmap for Semiconductors. [Online]. Available: http://www.itrs.net/
- [38] R. Kim, U. E. Avci, and I. A. Young, "Source/drain doping effects and performance analysis of ballistic III–V n-MOSFETs," *IEEE J. Electron Devices Soc.*, vol. 3, no. 1, pp. 37–43, Jan. 2015.
- [39] M. Luisier, A. Schenk, and W. Fichtner, "Atomistic treatment of interface roughness in Si nanowire transistors with different channel orientations," *Appl. Phys. Lett.*, vol. 90, no. 10, p. 102103, 2007.
- [40] M. Luisier, A. Schenk, W. Fichtner, and G. Klimeck, "Atomistic simulation of nanowires in the  $sp^3d^5s^*$  tight-binding formalism: From boundary conditions to strain calculations," *Phys. Rev. B*, vol. 74, no. 20, p. 205323, 2006.
- [41] M. Lundstrom. (2008). ECE 612: Nanoscale Transistors (Fall 2008). [Online]. Available: https://nanohub.org/resources/5328
- [42] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, "Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits," *Proc. IEEE*, vol. 91, no. 2, pp. 305–327, Feb. 2003.
- [43] D. H. Morris, U. E. Avci, R. Rios, and I. A. Young, "Design of low voltage tunneling-FET logic circuits considering asymmetric conduction characteristics," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 4, no. 4, pp. 380–388, Dec. 2014.
- [44] (2014). *Predictive Technology Model*. [Online]. Available: http://ptm.asu.edu/
- [45] S. Sinha, G. Yeric, V. Chandra, B. Cline, and Y. Cao, "Exploring sub-20 nm FinFET design with predictive technology models," in *Proc. 49th ACM/EDAC/IEEE Design Autom. Conf.*, San Francisco, CA, USA, Jun. 2012, pp. 283–288.
- [46] C. Augustine, A. Raychowdhury, Y. Gao, M. Lundstrom, and K. Roy, "PETE: A device/circuit analysis framework for evaluation and comparison of charge based emerging devices," in *Proc. Int. Symp. Quality Electron. Design*, 2009, pp. 80–85.
- [47] D. Reddy, L. F. Register, E. Tutuc, and S. K. Banerjee, "Bilayer pseudospin field-effect transistor: Applications to Boolean logic," *IEEE Trans. Electron Devices*, vol. 57, no. 4, pp. 755–764, Apr. 2010.



**DMITRI E. NIKONOV** (M'99–SM'06) received the M.S. degree in aeromechanical engineering from the Moscow Institute of Physics and Technology, Moscow, Russia, in 1992, and the Ph.D. degree in physics from Texas A&M University, College Station, TX, USA, in 1996.

He joined Intel, Santa Clara, CA, USA, in 1998. He is currently a Principal Engineer with the Components Research Group, Hillsboro, OR, USA, doing simulation and benchmarking of beyond-CMOS logic devices and managing research programs with universities on nanotechnology, 79 publications, and 48 issued patents.



**IAN YOUNG** (M'78–SM'96–F'99) received the B.E.E and M.Eng.Sci. degrees from the University of Melbourne, Melbourne, VIC, Australia, and the Ph.D. in electrical engineering from the University of California at Berkeley, Berkeley, CA, USA.

He is currently a Senior Fellow and the Director of the Exploratory Integrated Circuits with the Technology and Manufacturing Group, Intel Corporation, Hillsboro, OR, USA. He leads a research group exploring the future options for the integrated circuit in the beyond CMOS era.