Microelectronics Journal

journal homepage: www.elsevier.com/locate/mejo

# CMOS voltage-mode quaternary look-up tables for multi-valued FPGAs

R.C.G. da Silva<sup>a</sup>, C. Lazzari<sup>b</sup>, H. Boudinov<sup>c</sup>, L. Carro<sup>c,\*</sup>

<sup>a</sup> CEITEC, Brazil

<sup>b</sup> INESC Lisboa, Portugal

<sup>c</sup> Universidade Federal do Rio Grande do Sul, Brazil

### ARTICLE INFO

Article history: Received 2 April 2009; received in revised form 15 July 2009; accepted 21 July 2009; available online 12 August 2009

*Keywords:* Quaternary logic; Low power circuits; Reconfigurable circuits; Circuits for future technologies

### ABSTRACT

The use of field programmable gate arrays has been growing steadily for years. Their popularity stems from the fact that they can be reprogrammed to implement any function, with any amount of parallelism. Unfortunately, precisely because of this flexibility, FPGAs require a huge amount of resources, in the form of LUTs and routing switches, and these can take up to 90% of the chip area. In this paper we present the development of a low-power, full CMOS multiple-valued logic to build a LUT for FPGAs. Several circuits are mapped to quaternary LUTs and compared to their binary counterparts. Results show great improvements in terms of area and power consumption. Moreover, we show the positive impact of the proposed architecture on the global reduction of routing switches and wiring, and hence on the total FPGA area.

© 2009 Elsevier Ltd. All rights reserved.

## 1. Introduction

With the advent of deep submicron technologies, interconnections are becoming the dominant component of circuit delay in state-of-the-art circuits, and this becomes more significant with each new technology generation [1]. The reason is that gate speed, density and power scale according to Moore's law, while the interconnection resistance–capacitance product increases with each technology node, with a consequent increase in the network delay [2,3]. In field programmable gate arrays (FPGAs), interconnections play an even more crucial role, because they not only dominate the delay [4], but also severely impact power consumption [5] and area [6]. For million-gate FPGAs, as much as 90% of the chip area is devoted to interconnections [7].

Multiple-valued logic (MVL) targeted to FPGAs was already shown in [8–10]. Most efforts to accomplish multiple-valued FPGAs have presented current-mode circuits or hybrid implementations of CMOS and quantum devices to be used in configurable logic blocks (CLB). These circuits succeed in reducing area, but their excessive static power consumption and realization complexity have precluded their acceptance as a viable alternative to standard CMOS designs.

In order to deal with the static power dissipation problem using a standard CMOS process, and still maintain the logic compaction allowed by MVL, this paper presents a new methodology that uses quaternary look-up tables implemented using voltage-mode CMOS logic. The look-up table circuit presented is a transmission gate (T-gate) like circuit that uses down literal circuits, binary inverters and pass transistor gates.

To cope with interconnection costs, we propose the use of multiple-valued logic to compact information, since a single wire can transmit more than two distinct signal levels. This can lead to a reduction in the total interconnection costs, hence reducing area, power and delay, which is especially important in FPGA designs.

In order to evaluate the proposed technique, this paper presents simulations of quaternary LUTs of 1, 2 and 3 quaternary control inputs, and compares them to their corresponding binary LUTs, the basic logic block used in several commercial FPGAs. Circuits are simulated with the SPICE tool using TSMC 0.18 µm technology. Logic mapping of quaternary functions into look-up tables is shown for a set of functions to emulate the use of such circuits in the configurable logic block of an FPGA.

<sup>\*</sup> Corresponding author. Tel.: +55 5133086806; fax: +55 5133087036. E-mail address: carro@inf.ufrgs.br (L. Carro).

0026-2692/\$ - see front matter © 2009 Elsevier Ltd. All rights reserved. doi:10.1016/j.mejo.2009.07.001

## 2. Binary and quaternary look-up tables

Field programmable gate arrays are two-dimensional arrays of logic blocks and flip-flops with electrically programmable interconnections among them. In several modern FPGAs, look-up tables are used as logic blocks to implement any number of different functionalities. The inputs of the desired function feed the input signals of the look-up tables. The output of the look-up table gives the result of the logic function that it implements. In digital logic, an n-bit look-up table can be implemented with a multiplexer whose select lines are the inputs of the LUT, and whose primary inputs are mapped at design time to a constant value. A *k*-input LUT (*k*LUT) can encode any *k*-input Boolean logic function by modeling such functions as truth tables. Available FPGAs use LUTs with up to 8 binary inputs (8LUT) or combinations of LUTs with different numbers of inputs [11]. A *k*-input LUT can implement

$$F = b^{(b^k)}$$

functions [9], where F is the number of functions, b the base or radix and k the number of LUT inputs. For a LUT with N outputs, the number of select lines (M) allowed is

$$M = N \cdot b^k.$$

Comparing these numbers for binary and quaternary logic, one can find equivalent logic complexities. A quaternary LUT with 1, 2 and 3 inputs and 1 output can multiplex 4, 16 and 64 quaternary signals, respectively. Equivalent binary multiplexer circuits must deal with 2 outputs to represent a quaternary output, so that for each quaternary LUT, 2 binary LUTs are necessary to evaluate the same logic. To multiplex each 4, 16 and 64 signals to the output, the binary LUTs require 2, 4 and 6 control inputs. These numbers are compared in Table 1.
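The capacity figures above can be checked with a short calculation. The sketch below is only illustrative: the function names are hypothetical, and it assumes, as the text describes, that the binary equivalent of a k-input quaternary LUT is a pair of binary LUTs with 2k inputs each.

```python
def select_lines(radix, k, outputs=1):
    """Number of select lines M = N * b**k for a k-input LUT with N outputs."""
    return outputs * radix ** k


def function_count(radix, k):
    """Number of single-output functions F = b**(b**k) of a k-input LUT."""
    return radix ** (radix ** k)


def table1_row(k):
    """Capacity of a k-input quaternary LUT and of its binary equivalent.

    Assumption: the binary equivalent is a pair of (2*k)-input binary LUTs,
    one per bit of the quaternary output, as described in the text.
    """
    quaternary_m = select_lines(4, k, outputs=1)
    binary_m = select_lines(2, 2 * k, outputs=2)
    quaternary_f = function_count(4, k)
    # Two independent binary LUTs: their function counts multiply.
    binary_f = function_count(2, 2 * k) ** 2
    return quaternary_m, binary_m, quaternary_f, binary_f


for k in (1, 2, 3):
    q_m, b_m, q_f, b_f = table1_row(k)
    # bit_length() - 1 is the exact base-2 logarithm of a power of two.
    print(f"k={k}: M(quaternary)={q_m}, M(binary)={b_m}, "
          f"F=2^{q_f.bit_length() - 1}, same as binary pair: {q_f == b_f}")
```

For k = 1, 2 and 3 this gives 4, 16 and 64 quaternary select lines against 8, 32 and 128 binary ones, with the same number of implementable functions on both sides.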

A quaternary look-up table (QLUT) can have four quaternary select lines using only one quaternary input (the control signal). Accordingly, the simplest QLUT can implement all possible functions of one variable allowed in this logic, i.e.,  $2^8$  functions. The block representation of QLUT is shown in Fig. 1.

#### Table 1

Comparison of LUT capacity.

| Quaternary k | Quaternary M | Quaternary N | Equivalent binary M | Equivalent binary N | Functions        |
|--------------|--------------|--------------|---------------------|---------------------|------------------|
| 1            | 4            | 1            | 8                   | 2                   | 2<sup>8</sup>    |
| 2            | 16           | 1            | 32                  | 2                   | 2<sup>32</sup>   |
| 3            | 64           | 1            | 128                 | 2                   | 2<sup>128</sup>  |



Fig. 1. Representation of the quaternary LUT with 4 quaternary inputs.



Fig. 2. Block diagram of a two input quaternary look-up table.

The QLUT can be used to implement higher order QLUTs, e.g., a 2-input quaternary look-up table (2QLUT) as shown in Fig. 2, which is equivalent to two binary 4LUTs.

Indeed, modern FPGAs present higher order LUTs. As examples, the Xilinx Spartan-3E [12] and Virtex-5 [13] families use binary 4LUTs and 6LUTs in their configurable logic blocks, respectively. Likewise, Altera Cyclone III [14] and Stratix III [15] present 4LUTs and 6LUTs, respectively. For this reason, in this work we focus on the realization of high order LUTs, targeting complex circuits with several inputs and outputs.

## 3. Quaternary LUT design

The proposed QLUT is based on the voltage-mode quaternary logic CMOS circuits presented in a previous work [16]. This new family of circuits uses several transistors with different threshold voltages, and operates with four voltage levels, corresponding to logic levels '0', '1', '2' and '3'. The QLUT presented in this work is designed using down literal circuits (DLCs), binary inverters and pass transistor gates. There are three possible down literal circuits in quaternary logic [17], named DLC1, DLC2 and DLC3. Table 2 shows the output logic level of each DLC for each of the 4 quaternary inputs.

Current-mode MVL circuits use current values as logic levels, which are in general integer multiples of a reference current. Voltage-mode MVL circuits, on the other hand, use different voltage values, usually fractions of VDD, as logic levels. In order to favor maximum noise margins, each logic level is equally spaced from its neighboring levels. The down literal circuits are designed in CMOS technology with 3 different threshold voltages for PMOS transistors and 3 different threshold voltages for NMOS transistors. Table 3 shows the $V_t$ values relative to the transistors' source-bulk voltages for circuits with a VDD of 1.8 V. In this case, the voltage levels 0, 0.6, 1.2 and 1.8 V correspond to logic levels '0', '1', '2' and '3', respectively. As mentioned, one should pick the electrical levels so as to allow maximum noise margins. As the minimum  $V_t^0$  for this technology was 0.6 V, the final partitioning was done as described.
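The equal spacing of the logic levels can be written down in a few lines. This is a minimal sketch with hypothetical names; voltages are kept in integer millivolts only to avoid floating-point rounding, and the binary case is included for comparison under the same 1.8 V supply.

```python
VDD_MV = 1800  # supply voltage in millivolts (1.8 V, as in the text)
RADIX = 4      # quaternary logic


def level_to_millivolts(level, vdd_mv=VDD_MV, radix=RADIX):
    """Voltage of a logic level when levels are equally spaced from 0 to VDD."""
    if not 0 <= level < radix:
        raise ValueError(f"logic level must be in 0..{radix - 1}")
    return level * vdd_mv // (radix - 1)


quaternary_mv = [level_to_millivolts(q) for q in range(RADIX)]
binary_mv = [level_to_millivolts(b, radix=2) for b in range(2)]

# Distance between adjacent levels; a smaller distance leaves less room for noise.
quaternary_spacing = quaternary_mv[1] - quaternary_mv[0]
binary_spacing = binary_mv[1] - binary_mv[0]

print(quaternary_mv, quaternary_spacing, binary_spacing)
```

With a 1.8 V supply, adjacent quaternary levels are 0.6 V apart, one third of the 1.8 V separation between the two binary levels.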

Each DLC uses one PMOS and one NMOS transistor from Table 3, connected as shown in Fig. 3. DLC1 is designed by substituting the PMOS transistor of a binary inverter with T1 and the NMOS with T4. DLC2 is designed using T3 and T6, while DLC3 uses T5 and T2. In all these circuits, only one transistor is conducting for each of the four input values. In DLC1, with 0 V at the input, T1 is conducting and T4 is off, so that the output is connected to 1.8 V (logic value 3). With 0.6, 1.2 and 1.8 V at the input, T1 is off and T4 is conducting, and the output goes to ground. In DLC2, T3 is conducting for 0 and 0.6 V at the input, while T6 is conducting for 1.2 and 1.8 V at the input. For DLC3, T5 is on for 0, 0.6 and 1.2 V, while T2 is conducting for 1.8 V at the input.

Table 2

Down literal circuits truth table (down literal outputs for each input level).

| In | DLC1 | DLC2 | DLC3 |
|----|------|------|------|
| 0  | 3    | 3    | 3    |
| 1  | 0    | 3    | 3    |
| 2  | 0    | 0    | 3    |
| 3  | 0    | 0    | 0    |

Table 3

Transistor  $V_{\rm t}$  values related to  $V_{\rm S}$.

|                    | T1   | T2   | T3   | T4   | T5   | T6   |
|--------------------|------|------|------|------|------|------|
| Type               | PMOS | NMOS | PMOS | NMOS | PMOS | NMOS |
| V<sub>t</sub> (V)  | -2.2 | 2.2  |      | 0.2  | -0.2 | 1.2  |

Fig. 3. Schematics of (a) DLC1, (b) DLC2 and (c) DLC3.

Fig. 4. Quaternary look-up table schematic circuit.

The QLUT was designed using the 3 DLCs, 3 binary inverters and 6 pairs of pass transistor gates, as shown schematically in Fig. 4. This circuit has 4 quaternary select lines (A, B, C and D), one quaternary output (Out) and one quaternary control input (Z). The select lines are used in the exact same sense as their binary counterparts, that is, they decide which input gets connected to the output of the multiplexer, by controlling the transmission gates. The control input is split into 3 different binary signals by the DLCs. These binary signals are applied to PMOS-NMOS pairs of pass transistor gates, which are used to transmit any signal without degradation.

Each control input value defines a different set of DLC outputs that are used to control the pass transistor gates in such a way that, when the control input voltage is 0, 0.6, 1.2 or 1.8 V, the output is connected to A, B, C or D, respectively. When the control input (Z) is at 0 V, all DLC outputs are high. The output of DLC1 and its complement are applied to the NMOS pass transistor N1 and the PMOS pass transistor P1, respectively, setting the output to the A value. At the same time, these two signals are also applied to the PMOS P2 and the NMOS N2, respectively, turning them off and therefore opening the path between B and the output. The DLC2 and DLC3 outputs and their complements are used to turn off P4, N4, P6 and N6, opening the paths between the output and C and D. For the other 3 values that can be applied to the control input, the output is set in a similar way. The QLUT circuit is composed of 24 transistors.
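The selection mechanism can be mimicked with a purely logical model. The sketch below is not the authors' netlist: it works on logic levels rather than voltages, the function names are hypothetical, and the way the third and fifth pass-gate pairs are driven (by DLC2 for the B path and by DLC3 for the C path) is an assumption inferred from the six pairs described in the text.

```python
HIGH, LOW = 3, 0  # binary-valued DLC outputs, expressed as quaternary levels


def dlc(n, z):
    """Down literal circuit DLCn: HIGH while z < n, LOW otherwise (Table 2)."""
    return HIGH if z < n else LOW


def qlut(z, a, b, c, d):
    """Behavioral QLUT: connect exactly one of A, B, C, D to the output."""
    if z not in (0, 1, 2, 3):
        raise ValueError("control input must be a quaternary level 0..3")
    d1, d2, d3 = (dlc(n, z) == HIGH for n in (1, 2, 3))
    # A path conducts only when every pass-gate pair on it is turned on.
    # Assumed gating: A by DLC1; B by NOT DLC1 and DLC2;
    # C by NOT DLC2 and DLC3; D by NOT DLC3.
    paths = [
        (d1, a),
        ((not d1) and d2, b),
        ((not d2) and d3, c),
        (not d3, d),
    ]
    connected = [value for conducting, value in paths if conducting]
    if len(connected) != 1:
        raise RuntimeError("exactly one path should conduct")
    return connected[0]


# Quaternary inverter: select lines A..D tied to 3, 2, 1, 0.
print([qlut(z, 3, 2, 1, 0) for z in range(4)])
```

Tying the select lines to 3, 2, 1 and 0 yields the quaternary inversion function, and for every control value exactly one of the four paths conducts.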

The realization of high order quaternary LUTs is based on the low order QLUT. The 2QLUT is essentially developed by replicating the QLUT five times. As shown in Fig. 2, one of the inputs (Z1) controls four QLUTs, and each of their outputs is connected to the fifth QLUT, which is controlled by the Z2 input. Each combination of Z1 and Z2 connects one of the sixteen select lines (A to P in Fig. 2) to the 2QLUT output. To implement a 3QLUT, another sixteen QLUTs are required, and they are controlled by a third control input, and so on.

Fig. 5. QLUT output transients for all possible transitions of the input for a quaternary inverter function.
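The tree construction can be sketched as a behavioral model. The names below are hypothetical, and the ordering of the select lines (Z1 as the least significant digit of the index) is an assumption. The transistor estimate additionally assumes that the three DLCs and three inverters are shared by all QLUTs driven by the same control input; the text does not state this, but it reproduces the transistor counts reported in Table 4.

```python
def qlut(z, inputs):
    """Behavioral 1-input QLUT: control level z picks one of four inputs."""
    if len(inputs) != 4 or z not in (0, 1, 2, 3):
        raise ValueError("expected four inputs and a control level in 0..3")
    return inputs[z]


def k_qlut(controls, table):
    """k-input QLUT built as a tree of 1-input QLUTs.

    controls: (Z1, ..., Zk); table: 4**k select-line values.
    Assumption: Z1 drives the first stage, so the selected entry is
    table[Z1 + 4*Z2 + 16*Z3 + ...].
    """
    if len(table) != 4 ** len(controls):
        raise ValueError("table must have 4**k entries")
    signals = list(table)
    for z in controls:
        # One stage: every group of four signals goes through one QLUT.
        signals = [qlut(z, signals[i:i + 4]) for i in range(0, len(signals), 4)]
    return signals[0]


def qlut_count(k):
    """Number of 1-input QLUTs in the tree: 1, 5, 21, ..."""
    return sum(4 ** stage for stage in range(k))


def transistor_count(k):
    """Estimate assuming 12 pass transistors per QLUT plus 12 shared
    DLC/inverter transistors per control input (an inferred assumption)."""
    return 12 * qlut_count(k) + 12 * k


print([qlut_count(k) for k in (1, 2, 3)])
print([transistor_count(k) for k in (1, 2, 3)])
```

Under these assumptions the model needs 1, 5 and 21 QLUTs for k = 1, 2 and 3, and the transistor estimate gives 24, 84 and 288.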

## 4. Simulation results

Simulations of look-up table circuits were carried out with the SPICE tool using TSMC 0.18 µm technology. In this work we present results for binary 2LUT, 4LUT and 6LUT and quaternary QLUT, 2QLUT and 3QLUT to estimate and compare their power consumption, area and performance.

Circuits were simulated for all possible transitions of the control inputs. As an example, Fig. 5 shows the output transients of the QLUT mapping the quaternary inversion function, i.e., for '0', '1', '2' and '3' at the input it returns '3', '2', '1' and '0' at the output. A binary inverter with dimensions of  $L = 0.18 \,\mu\text{m}$  and  $W = 0.81 \,\mu\text{m}$  was connected at the output of all circuits as a load.

Simulations of all possible combinations of input signals were carried out in quaternary and binary look-up tables. To compare the quaternary and binary results using the same logic complexity, one must duplicate the binary LUT, i.e., a QLUT is equivalent to two 2LUTs, the 2QLUT is equivalent to two 4LUTs, and the 3QLUT to two 6LUTs. When the circuits are duplicated, the number of transistors and the power consumption are doubled; however, the propagation delay remains roughly the same, as the two LUTs work in parallel. A summary of these simulations is shown in Table 4.
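The relative figures implied by Table 4 can be computed directly from its entries. The snippet below only restates the table values; the dictionary layout and function name are illustrative choices.

```python
# (transistors, delay in ps, power at 500 MHz in microwatts), from Table 4.
TABLE4 = {
    "QLUT": (24, 287, 28),
    "2x2LUT": (32, 67, 62),
    "2QLUT": (84, 503, 155),
    "2x4LUT": (136, 146, 289),
    "3QLUT": (288, 592, 507),
    "2x6LUT": (528, 261, 1010),
}

PAIRS = [("QLUT", "2x2LUT"), ("2QLUT", "2x4LUT"), ("3QLUT", "2x6LUT")]


def ratios(quaternary, binary):
    """Quaternary-to-binary ratios for transistors, delay and power."""
    q = TABLE4[quaternary]
    b = TABLE4[binary]
    return tuple(q_value / b_value for q_value, b_value in zip(q, b))


for quaternary, binary in PAIRS:
    transistors, delay, power = ratios(quaternary, binary)
    print(f"{quaternary} vs {binary}: transistors x{transistors:.2f}, "
          f"delay x{delay:.2f}, power x{power:.2f}")
```

For the largest pair, the 3QLUT uses about 55% of the transistors and about 50% of the power of two 6LUTs, at roughly 2.3 times the delay.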

## 5. Logic mapping

Using only quaternary look-up tables as logic elements, several quaternary functions were simulated by mapping their truth tables into a set of QLUTs, correctly setting all inputs and interconnections. A logic mapping was also performed for the corresponding binary function using binary look-up tables, with the ABC logic minimization software [18].

Our benchmark set is composed of the functions named sum3, sum4, sum5, sum6, sum22 and prod22, which are the summation of 3, 4, 5 and 6 one-digit quaternary numbers, and the sum and product of 2 two-digit quaternary numbers, respectively. The simulation includes mapping in 2LUT, 4LUT, 6LUT, 2QLUT and 3QLUT, and the results for power consumption, critical delay and number of transistors used are shown in Figs. 6–8, respectively.

Table 4

Comparison between binary and quaternary LUTs.

| LUT      | Transistors | Delay (ps) | Power at 500 MHz (µW) |
|----------|-------------|------------|-----------------------|
| QLUT     | 24          | 287        | 28                    |
| 2 × 2LUT | 32          | 67         | 62                    |
| 2QLUT    | 84          | 503        | 155                   |
| 2 × 4LUT | 136         | 146        | 289                   |
| 3QLUT    | 288         | 592        | 507                   |
| 2 × 6LUT | 528         | 261        | 1010                  |

Fig. 6. Power consumption of simulated functions mapped using different look-up tables.

Fig. 7. Critical delay of simulated functions mapped using different look-up tables.

Fig. 8. Number of transistors used by simulated functions mapped using different look-up tables.
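To illustrate what mapping a truth table into quaternary LUTs involves, the sketch below generates the select-line contents for a sum of one-digit quaternary operands. It is a hypothetical illustration, not the mapping flow used in this work: the function names and the operand ordering are assumptions. For sum3, each output digit is a function of three quaternary inputs, so each table has 64 entries, which is the number of select lines of one 3QLUT.

```python
def to_quaternary(value, digits):
    """Quaternary digits of value, least significant first."""
    out = []
    for _ in range(digits):
        out.append(value % 4)
        value //= 4
    return out


def output_digits(n_inputs):
    """Quaternary digits needed for the largest sum, 3 * n_inputs."""
    digits = 1
    while 4 ** digits <= 3 * n_inputs:
        digits += 1
    return digits


def sum_truth_tables(n_inputs):
    """One truth table per output digit of a sum of one-digit operands.

    Assumption: entry i holds the result for operands equal to the
    quaternary digits of i, least significant operand first.
    """
    n_out = output_digits(n_inputs)
    tables = [[] for _ in range(n_out)]
    for index in range(4 ** n_inputs):
        operands = to_quaternary(index, n_inputs)
        result = to_quaternary(sum(operands), n_out)
        for digit, table in zip(result, tables):
            table.append(digit)
    return tables


low_digit, high_digit = sum_truth_tables(3)
print(len(low_digit), len(high_digit))
print(low_digit[:8], high_digit[:8])
```

Each of the two 64-entry tables could be loaded into the select lines of one 3QLUT; functions with more than three inputs, such as sum6, would need a network of several LUTs.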

Regarding circuit areas, results show a great reduction in the number of transistors used when one employs quaternary logic in place of binary logic. As an example, one can consider the function sum6, mapped with 3QLUT, which uses 2016 transistors, while it takes 50,160 transistors to map the same function to a binary equivalent 6LUT, as shown in Fig. 8.

Comparing performances, the quaternary basic logic blocks are not as fast as their corresponding binary blocks. The critical delay for the 3QLUT is about 2.3 times greater than the corresponding delay for the 6LUT. However, since quaternary circuits have lower area and power consumption than the equivalent binary circuits, improvements in performance can still be achieved by transistor resizing in the quaternary circuits.

Moreover, in FPGAs, the connection delay is far more important than the LUT delay. Hence, by reducing the number of connections, one reduces the number of switches within a given path, and this way one can increase total system performance.

Regarding dynamic power consumption, the quaternary logic blocks show a great improvement compared to the binary logic blocks. The power consumption of quaternary circuits can be as low as 50% of the power consumption of the corresponding binary circuit. This improvement is due to the reduction in the number of transistors and to the lower average voltage, since some signals are fractions of the binary VDD voltage and few transitions are rail to rail, unlike in the binary case.

The results indicate an increase in the overall quality of quaternary implementations as the complexity of the functions to be mapped increases, in terms of the number of inputs and outputs. The area and power consumption advantages of quaternary circuits are greater for more complex functions. The critical delay, which is higher for all quaternary implementations, does not present a clear trend of improvement; but as it does not worsen either, the overall improvement as a function of circuit complexity is evident for quaternary circuits.

## 6. Discussions

Simulation results confirm the functionality of the developed quaternary logic circuits, and also show that these circuits can achieve a satisfactory performance and excellent improvements in area and power consumption. Moreover, it is worth emphasizing that the major gain from using quaternary logic instead of binary logic is the ability to compact more information in a single gate or wire, which favors cost reduction in the interconnect network, since the number of wires can potentially be halved, with obvious reductions in the cost of switching.

Analyzing the obtained results regarding area, one can see an improvement in the number of transistors from binary to quaternary LUTs. The area saving in the logic block is substantial, and together with the reduction in interconnect area, this can lead to a great reduction in the total chip area. In modern FPGAs, the interconnections are very area consuming due to reprogrammability requirements, and MVL has already been shown to reduce this interconnection demand [19].

Another point to consider regarding FPGA architectures is the power consumption in the CLB output latches. Since every active latch switches as a function of the clock signal, the power dissipation of latches represents a considerable share of the total power consumption. Using quaternary logic in each CLB, the number of latches is halved, hence further reducing the power consumption. Voltage-mode quaternary inverters and pass transistors were already demonstrated [16], and they can be used in the latches.

Comparing performances, one can observe an increase in the propagation delay for quaternary circuits. The propagation delay for 3QLUT is about 2.3 times larger than its binary counterpart. This higher delay is due to the quaternary LUT structure itself, since it has, for each input, an additional stage for down literal circuit evaluations, which does not exist in binary circuits. Another point to consider in quaternary LUTs is that critical paths have two pass transistors coupled in series for each quaternary input, instead of only one transistor for the binary counterpart. Hence, the path length through pass transistors is not halved. Nevertheless, despite this extra delay, a quaternary FPGA can still show improvements in performance due to the reduced wire number and average length. Moreover, transistor resizing can be done since extra area and power slack are available.

The quaternary design presented in this work has some sensitive points, including the need for additional implants (since one must adjust the threshold voltages of the circuits with extra steps in the chip fabrication process), reduced noise margins and the need for three different voltage sources. However, all these extra wires have an impact that can be expected to be solved at design time. During the programming and execution phase of an FPGA application, the advantages of shorter and more information driven connections can be significant. Despite the mentioned features, it is worth emphasizing the full compatibility of this quaternary design with present state-of-the-art CMOS technology. The additional implants needed for threshold adjustments do not necessarily require an increase in the processing steps or costs, because the reduction in metal layers may compensate for it.

The reduced noise margin is a drawback inherent to multiple-valued logic whenever one uses fractions of VDD to power the circuit up. However, extensive Monte Carlo simulations have already demonstrated that the proposed quaternary circuits can effectively be used in future technologies [20].

Regarding dynamic power dissipation, quaternary circuits dissipate about half as much as their binary counterparts, as seen in Table 4. This is not the case for static power. We have performed some experiments with a 45 nm technology, where static power is more significant, and the quaternary LUT dissipated 134 nA, against 110 nA for the equivalent binary LUT. Hence, although quaternary circuits show extra static power, the huge gains obtained in terms of dynamic power largely compensate for this effect.

Regarding the need for three power supplies, the additional wiring required does not increase considerably the total amount of metals and area, and certainly does not impact routing capabilities of the underlying circuit. Since the most important wiring is the user-programmable connection in the FPGA, the power distribution will not be a limitation in the project. Modern circuits already use more than one power supply and even if two additional metal power lines should be needed, this amount of metal layers can be less than the layers saved by the inherent compaction of data allowed by quaternary circuits.

In this work we cover only arithmetic circuits and although conceptually simple, they are fairly important. As an example, Xilinx provides embedded and hardwired multipliers inside their FPGAs. Hence, we conclude that the set of chosen examples can point to the right direction regarding the goals of this research, while we work to cover a broader class of circuits in future publications.

## 7. Conclusion

This work presented some steps towards a low-power, reduced-area quaternary FPGA. We presented low-power quaternary look-up tables intended to be used in configurable logic blocks for high density FPGA designs. Simulation results showed a reduction of the LUT area and power consumption compared to equivalent binary LUTs. Logic mapping of quaternary functions into quaternary LUTs showed a high improvement in overall characteristics compared to the equivalent binary mapping. Results also showed a clear trend of overall improvement for more complex functions, regarding not only LUT area and power, but also interconnection reduction and acceleration.

# References

- [1] W.J. Dally, Computer architecture is all about interconnect, in: The 8th International Symposium on High-Performance Computer Architecture, Cambridge, Massachusetts, February 2002.
- [2] G. Metze, M. Khbels, N. Goldsman, B. Jacob, Heterogeneous integration, Technol. Trend Notes 12 (2) (2003) 3.
- [3] B. Havemann, The interconnect challenge, 16th Biennial University Government Industry Microelectronics Symposium June 2006.
- [4] R. Pragasan, "Spartan FPGA—The Gate Array Solution" Xilinx Application Notes, August. 2001.
- [5] Fei Li, Yan Lin, Lei He, Power modeling and characteristics of field programmable gate arrays, IEEE Trans. Computer-Aided Design Integrated Circuits Syst. 24 (11) (November 2005).
- [6] A. Singh, M. Marek-Sadowska, Efficient circuit clustering for area and power reduction in FPGAs, FPGA, February. 2002.
- [7] Xilinx, The Programmable Logic Databook 2003, Xilinx Inc., San Jose, California, 2003.
- [8] Z. Zilic, Z.G. Vranesic, Multiple-valued logic in FPGAs, in: Proceedings of the 36th Midwest Symposium on Circuits and Systems, Detroit, Mi., August 1993.
- [9] P.M. Kelly, T.M. McGinnity, L.P. Maguire, L. McDaid, Exploiting binary functionality in quaternary look-up tables for increased functional density in multiple-valued logic FPGAs, Electron. Lett. 41 (6) (March 2005) 300–302.
- [10] A. Sheikholeslami, R. Yoshimura, P.G. Gulak, Look-up tables (LUTs) for multiple-valued, combinational logic, in: Proceedings of ISMVL 98, pp 264–269.
- [11] Stratix II: 8-Input Fracturable LUT in the ALM. [Online] March 23, 2007. <http://www.altera.com/products/devices/stratix2/features/architecture/st2-lut.html>.
- [12] Spartan II Architecture, [Online] March 23, 2007. <http://www.xilinx.com/products/silicon_solutions/fpgas/spartan_series/spartan2_fpgas/capabilities/architecture.htm>.
- [13] ExpressFabric Architecture, [Online] March 23, 2007. <http://www.xilinx.com/products/silicon_solutions/fpgas/virtex/virtex5/capabilities/ExpressFabric.htm>.
- [14] Cyclone III device handbook. Retrieved March 23, 2007, from <http://www.altera.com/literature/lit-cyc3.jsp>.
- [15] Stratix III device handbook. Retrieved March 23, 2007, from <http://www.altera.com/literature/lit-stx3.jsp>.
- [16] R.C.G. da Silva, H. Boudinov, L. Carro, A novel voltage-mode CMOS quaternary logic design, IEEE Trans. Electron Devices 53 (6) (June 2006).
- [17] M. Inaba, Experiment result of down literal circuit and analog inverter on CMOS double-polysilicon process, in: Proceedings of ISMVL 07.
- [18] ABC: A system for sequential synthesis and verification. Retrieved April 05, 2008, from <http://www.eecs.berkeley.edu/~alanmi/abc/>.
- [19] P.M. Kelly, T.M. McGinnity, Maguire, Reducing interconnection resource overhead in nano-scale FPGAs through MVL signal systems, in: Proceedings of ASAP 2005.
- [20] E. Rhod, L. Carro, A low cost low power quaternary LUT cell for fault tolerant applications in future technologies, in: Proceedings of ISVLSI 2009, Tampa, Florida.