$See \ discussions, stats, and \ author \ profiles \ for \ this \ publication \ at: \ https://www.researchgate.net/publication/3349213$ 

# Mixed analogue-digital artificial-neural-network architecture with on-chip learning

Article in IEE Proceedings - Circuits Devices and Systems · January 2000

DOI: 10.1049/ip-cds:19990685 · Source: IEEE Xplore



A 16-channel 1.1mm2 implantable seizure control SoC with sub-µW/channel consumption and closed-loop stimulation in 0.18µm CMOS View project

# A Mixed Analog-Digital Artificial Neural Network Architecture with On-Chip Learning

Alexandre Schmid, Yusuf Leblebici and Daniel Mlynek

Abstract—This paper presents a novel artificial neural network architecture with on-chip learning capability. The issue of straightforward design-flow integration of an autonomous unit is addressed with a mixed analog-digital approach, by implementing a charge-based artificial neural network which interacts with digital control and processing units. We demonstrate the circuit architecture and designflow approach for the case of a Hamming network performing pixel-pattern recognition.

Keywords— Charge-based ANN, mixed-mode ANN hardware architecture, ANN integration design-flow.

#### I. INTRODUCTION

THE ABILITY of artificial neural networks (ANN) to acquire knowledge of their surrounding environment and adapt to it, as well as their use of a high degree of computing parallelism makes them very efficient in many application fields including process and quality control, consumer products, optical character and speech recognition, and complex forecasting tasks among many others [1].

Silicon implementation of ANNs as an integrated circuit (IC) [2] aims at providing a final product with desirable low-area, low-power and low-cost properties. Several purely analog ICs, most of them belonging to the chargebased or current-based families, have been developed in order to meet the criteria of minimal area and fast throughput. Yet the main drawbacks of analog systems include sensitivity to ambient noise and to temperature, as well as the lack in efficient automated synthesis methods and tools. On the other hand, purely digital realizations have the advantage of a limited but well defined precision that is given by the quantification of all neuron parameters. One main characteristic of purely digital realizations is their straightforward design-flow; some realizations start from a high-level hardware language description such as VHDL, to be synthesized into a standard-cells based architecture or a FPGA. The extensive reuse of precharacterized modules, may these be VHDL-based descriptions or mask layouts, is yet another possible solution to speeding up the IC development process.

Combining the advantages of both analog and digital realizations into a novel mixed-mode architecture is the purpose of the implementation described in this paper. We will focus on developing a simple design-flow aiming at the integration of artificial neural networks with on-chip learning into an autonomous and easily reconfigurable integrated circuit architecture. We will show the silicon integration of a Hamming network [3] of 20 charge-based neurons, interacting with a purely digital unit which the on-chip-learning and circuit control tasks are dedicated to. Section II gives an overview on the architecture of the realized IC. The integrated ANN architecture and operation is described in Section III. The algorithmic aspects of the implemented training algorithm are explored in Section IV, and Section V presents the IC realization.

# II. The Circuit Architecture and its Design-Flow

#### A. The Main Building Blocks

The overall circuit architecture is divided into two main parts with regard to their operating modes, i.e. analog and digital. The analog ANN unit executes the neural function processing based on a charge-based circuit structure; it is composed of a 20 neuron layer, each with 10-bit vector inputs. The winner-take-all (WTA) [4] unit is devoted to the task of selecting one neuron as the winner on the criteria of best degree of matching between the stored pixel pattern and the current input vector. On the other hand, the error correction unit (OLU), the circuit control (CCU) and clock generator (CGU) units perform purely digital operations (see Fig. 1).



Fig. 1. Block-diagram of the implemented IC.

This mixed analog-digital architecture is consistent with our objectives to construct a flexible, straightforward

Manuscript received September 19, 1997; revised February 26, 1998 and June 18, 1998.

The authors are with the Integrated Systems Center, Department of Electrical Engineering, Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland. Y. Leblebici is currently with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA, 01609.

design-flow with reusability properties, and also to address the issues of efficient and compact design. The ANN operates in the analog domain and thus inherits most of advantages associated with it, especially speed of neural function execution and compact design. All the digital parts on the other hand were designed, simulated and then synthesized from a VHDL high-level hardware language description. This design-flow significantly simplifies such issues as fast prototyping of a new algorithm into an IC, fast integration of the selected architecture, and easy layout floorplanning. A dedicated multipurpose memory unit (MMU) which has a scan-path architecture with parallel and serial read/write ability is devoted to the task of loading the initial weights and observing the new processed weights. This unit is an operating and test structure that together with others allow full testability and observability of the IC.

# B. The Control Signals and Data Flow

The CCU is the master circuit controller; i.e. all other units are subordinated to this unit. Its tasks mainly include the synchronization of all processing units among themselves and with the circuit supervisor, as well as the input/output protocol implementation. The control and dataflow are represented on Fig. 2. Notice that the ANN and WTA are completely controlled by the CGU that has the ability of generating the clock signals  $\phi_1$  through  $\phi_5$ , asynchronously from the circuit master clock.

| Initiali   | zation                  | Trainin         | ıg            | Forward                                    |        |  |
|------------|-------------------------|-----------------|---------------|--------------------------------------------|--------|--|
|            | For                     | ward H          | Backprop.     | Processing                                 |        |  |
|            | Pa                      | ass             | Pass          |                                            |        |  |
| supervisor | INITS START             |                 |               | (START)                                    |        |  |
| CCU        | ( <u></u> ( <u>s_CG</u> | Ū∕ <b>(</b> ∕s_ | <u>olu</u> >  | DONE (s CGU)                               | (DONE) |  |
| CGU        | ((                      | • ø,cycle –     |               | $( \sqrt{\phi_{,-}\phi_{,} \text{cycle}})$ | 5      |  |
| ANN -      |                         | wd pass         | <u> </u>      | fwd pass                                   | -      |  |
| OLU-       |                         | ¥               | backprop pass | /                                          |        |  |

| Signal       | Consequence                                         |
|--------------|-----------------------------------------------------|
| ld_INITS     | IC initialization sequence. IC in wait mode until   |
|              | START                                               |
| START        | A new vector is available for processing. One for-  |
|              | ward processing pass is allowed                     |
| s_CGU        | Makes the CGU produce one cycle of $\phi_1$ through |
|              | $\phi_1$ signal clocks                              |
| s_OLU        | Makes the OLU start one error correction cycle      |
| ld <b>_w</b> | Loads the new processed weights into the ANN        |
| DONE         | The launched process has finished. When in for-     |
|              | ward processing mode : a new result is available    |

Fig. 2. Control signals and dataflow between the main blocks.

The external dataflow consists in presenting a new data on the START event; and sampling the result on the DONE event. The internal dataflow is simplified; no complex dataflow control structure are required as any new vector is immediately processed by the ANN. This data is lost at the end of a cycle since no internal framing is available.

#### C. The Circuit Environment

The developed architecture needs to interact with a supervisor to download several process control and data signals in order to allow proper work. This global control system may either be a dedicated microcontroller in the case of an embedded microsystem or a piece of control software driving a conventional microprocessor in a computer architecture. This requirement obviously reduces the autonomy of the overall circuit architecture and assumes that it has to be included in a complete system such as a computer board. Nevertheless, the ability to modify some algorithmic parameters and decision criteria in real time significantly improves the efficiency of the system.

For example, the learning rate parameter  $\eta$  has a significant influence on the ANN convergence and on its ability to properly acquire knowledge. As stated in Section IV its value may be downloaded into the IC at any time. A signal indicating whether or not error correction was applied during the last cycle is sent to the supervisor in order to keep the decision of accepting or rejecting the convergence condition outside the chip. Following the same idea, the selection of the neuron to be trained is also devoted to the supervisor. The threshold value also has to be produced externally in the form of an analog voltage  $V_{\theta}$ .

All of these features could be easily integrated into a fully autonomous version of the developed architecture, which, however, would result in loss of flexibility due to the impossibility to modify any parameters.

# III. THE MIXED ANALOG-DIGITAL ANN Architecture and Operation

### A. The ANN Circuit Architecture

A Hamming network is a two layer feed-forward ANN with the ability to classify noise corrupted patterns. Its internal architecture consists in a first layer of neurons performing in parallel the Hamming distance of a m-bit digital input vector with n previously stored exemplar patterns - this is the quantifier subnet; the second layer is devoted to the selection of the winner neuron which is the one with smallest Hamming distance to the input vector (see Fig. 3) - this is the discriminator subnet. This network performs efficient classification for relatively low complexity, and always converges to one of the previously stored combinations.



Fig. 3. General structure and functional description of a Hamming network.

The number of independent neurons (n) corresponds to the number of patterns to be sorted out, and the number of synapses (m) associated with each neuron corresponds to the number of input vector components.

For the realization of the Hamming network, we use a modified version of the charge-based circuit architecture first presented in [5], which was originally designed with fixed weights. In particular, the circuit architecture was modified to allow simple programming of the input weights. Since this paper is primarily focused on the overall system architecture, a detailed analysis of the charge-based quantifier and discriminator subnets is not presented here. The fundamental circuit architecture of the capacitive Hamming network is essentially identical to the fixed-weight classifier circuit published earlier, the operation and limitations of which were well documented in [5]. It has also been experimentally demonstrated earlier that charge-based circuit architectures offer the advantages of high integration density, high speed and low power dissipation, while sensitivity limitations (discriminator offset) that may stem from circuit/device mismatch still allow a relatively large input vector size [5], [6]. For a detailed description and electrical analysis of the charge-based capacitive Hamming network architecture, the reader is referred to [5].

Each charge-based synapse is composed of four binary weighted capacitors as well as four memory latches to support programmability of the device. The capacitor values associated with each synapse are chosen as  $C_i = 2^n C_u$ , where  $n = 0, \dots, 3$  and  $C_u$  is unit capacitance. Thus the modified configurable circuit architecture of the chargebased Hamming network allows four-bit weight programming.

#### B. The ANN Circuit Operation

The circuit operates in two distinctive modes (forward processing mode and training mode) which can be selected by an external triggering signal. The circuit operates in the following sequence when in *forward processing mode* (also called *recall mode*):

- Initialization phase: the initial weights (or newly processed weights) are downloaded into the internal synaptic memory.
- Quantification phase: the input vector is applied. Depending on the programmed weight values, all dendritic voltages in the ANN structure assume their new level.
- Discrimination phase: the WTA processes to winner selection. The result may be sampled when convergence of the WTA is reached.

The circuit controller flags the availability of a new maximum likelihood classification result. All these steps repeat every time a new forward processing pass is required under the control of the circuit supervisor unit.

The circuit has to be trained in order to acquire experience of the patterns to be sorted out. This happens during the *training mode* which is divided into two passes: one *forward pass* and one *backpropagation pass* (error correction pass).

- Forward pass: the training forward pass is identical to the normal forward processing mode, with the exception that only one neuron is activated at a time (each neuron is trained separately). The neuron to be trained is selected by the supervisor and is activated through a forward processing pass using the training pattern, while all other neurons are kept in idle mode to prevent undesired interaction.
- Back-propagation pass: given the current input vector, the current processing weights and the binary result of the forward pass, the circuit controller activates the OLU, the digital unit which computes the weight values the ANN will have in the next cycle, to process the learning algorithm. The OLU computes the new weights to be downloaded into the synaptic memory. The circuit controller flags the end of the cycle to the outside, and indicates whether or not error correction was to be applied during the current training pass. The circuit supervisor may then decide on the necessity of a refining training pass with the same or another threshold value, or to train another neuron because convergence was satisfactorily achieved.

# IV. The Learning Algorithm and Algorithmic Considerations

Hardware implementations of ANNs are typically subject to restrictions in terms of area, power and time which may complicate the realization of a chosen learning algorithm. The so-called hardware-friendly algorithms [7] are intended to yield a simple hardware realization, yet also achieve a high degree of efficiency despite of limited precision of computation, approximation of the implied functions, and perturbing effects of quantization.

The training algorithm was chosen as a hardwarefriendly adaptation (2) of the error correction learning algorithm (1) [8].

$$\mathbf{w}(n+1) = \mathbf{w}(n) + \eta [d(n) - y(n)]\mathbf{x}(n)$$
(1)

Here **w** stands for the weight vector, **x** for the input vector, d is the expected output and y the actual neuron output result,  $\eta$  is the learning rate, n is the time increment.

$$\mathbf{w}(n+1) = \mathbf{w}(n) \pm \zeta \tag{2}$$

where  

$$\zeta = \eta [d(n) - y(n)] \mathbf{x}(n) \text{ if } x_j = 1$$
or  

$$\zeta = 0 \text{ if } x_j = 0$$

The use of a WTA unit restricts the input vector  $\mathbf{x}$  to be purely binary; thus all of its components belong to the binary set  $\{0,1\}$ . This fact, together with the hard limiting activation function in the ANN produces a purely binary result to the  $[d(n) - y(n)]\mathbf{x}(\mathbf{n})$  operation. Hence, the influence of the  $\eta$  parameter is enhanced as it remains the only non-binary parameter to be multiplied with one of the logical values  $\{0,1\}$ . Thus the system was designed so as to allow the  $\zeta$  value to be changed at any time by the supervisor controller.

Prior to the design of the unit, C simulations were run to validate the hardware oriented algorithms. A specific simulation tool was developed in order to produce a realistic high-level characterization, which is based on the model of a neuron that optimally reproduces the analog behavior of the real implementation in the integer domain. The simulations were run on a network consisting of nine neurons with 9-bit vectors to classify. The small size of the network does not affect in any way the quality of the results; expanding the network to a larger one would result in a longer delay to reach convergence (in a general case). The training set and simulation parameters can be seen on Fig. 4.

|   | Рат. 1                                                 | Pat.2                                                  | Pat. 3                                                | Pat.4                                                  | Рат. 5                                                 | Pat. 6                                                                               | Pat.7                                                                | Рат. 8                                                 | Рат.9                                                  |
|---|--------------------------------------------------------|--------------------------------------------------------|-------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------------------------------------|----------------------------------------------------------------------|--------------------------------------------------------|--------------------------------------------------------|
| A |                                                        |                                                        |                                                       |                                                        |                                                        |                                                                                      |                                                                      |                                                        |                                                        |
| в |                                                        |                                                        |                                                       |                                                        |                                                        |                                                                                      |                                                                      |                                                        |                                                        |
| D |                                                        |                                                        |                                                       |                                                        |                                                        |                                                                                      |                                                                      |                                                        |                                                        |
| с | $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$ | $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$ | $\begin{array}{c ccccccccccccccccccccccccccccccccccc$ | $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$ | $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$ | $\begin{array}{c ccc} 0 & 2 & 4 \\ \hline 3 & 1 & 4 \\ \hline 4 & 3 & 1 \end{array}$ | $\begin{array}{c c} 3 & 4 & 2 \\ 1 & 0 & 2 \\ 4 & 0 & 1 \end{array}$ | $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$ | $ \begin{array}{c ccccccccccccccccccccccccccccccccccc$ |
| D | 15 0 15<br>0 15 0<br>14 0 14                           | 13 0 14<br>14 3 12<br>14 1 11                          | 131312<br>061<br>151213                               | 4 15 2<br>15 15 15<br>4 15 0                           | 8 0 0<br>14 3 0<br>14 14 8                             | $\begin{array}{c ccccccccccccccccccccccccccccccccccc$                                | 9 14 14<br>0 3 14<br>0 0 7                                           | 0 0 9<br>0 4 14<br>8 14 14                             | 9 8 8<br>11 3 9<br>7 11 9                              |
| Е | 1000                                                   | 105                                                    | 1050                                                  | 900                                                    | 900                                                    | 900                                                                                  | 900                                                                  | 900                                                    | 1000                                                   |

| FIGURE | LEGEND                                        |  |  |  |
|--------|-----------------------------------------------|--|--|--|
| А      | Noiseless patterns to be recognized PAT. 1 TO |  |  |  |
|        | Pat. 9                                        |  |  |  |
| В      | Noisy patterns                                |  |  |  |
| С      | Initial weights                               |  |  |  |
| D      | Resulting weights                             |  |  |  |
| Е      | Threshold $\theta$                            |  |  |  |

Fig. 4. Simulation of the implemented algorithms for a pattern recognition/classification example.

Some simulations were run to test the behavior of the network when confronted to unknown patterns, which highlighted the efficiency of the network in generalizing (see Fig. 5).



Fig. 5. PAT. 1 perturbed by noise. All of these patterns were correctly classified.

#### V. REALIZATION OF THE ANN INTEGRATED CIRCUIT

A test chip implementing the developed architecture was designed and realized in AMS (Austria Micro Systems) CMOS  $0.8\mu$  2-Poly technology [9], [10]. The layout can be seen on Fig. 6. The die size is less than  $13mm^2$ ; the functional modules (ANN, WTA, CCU, CGU and OLU) occupy less than  $5mm^2$ . The number of pins is 100, several pins being attributed to additional test structures.

The test features were integrated into the design so as to make each main building element testable independently from all others. As previously mentioned the multipurpose memory unit is fully accessible in read/write mode, which allows to load the weights to be read by the ANN or the OLU, or download them to check the computation of the OLU. The binary result of the WTA output in forward processing mode (is current presented pattern recognized as being the one stored in current trained neuron?) is also fully accessible in read/write mode, which allows for testability of WTA and control of the OLU in test mode. The WTA outputs are all connected to output pins which ensures full testability over the ANN and WTA. The reason for observing all the WTA outputs lies in the internal operation of the WTA that may produce multiple winner selection. The ANN and WTA can be tested independently; in this test mode all the driving clock signal are provided by external means via the CLK\_DVR unit. Finally one single neuron with full external access was integrated to allow sensitivity and speed tests.



Fig. 6. Microphotograph of the realized IC. All operative units occupy an area less than  $5mm^2$ ; the overall die size is less than  $13mm^2$ , with a 100 PGA package. Several test structures and test pins were implemented to allow easy testability of the chip.

All of the major modules on chip were tested separately to confirm their functionality. The digital error correction unit (OLU), circuit control unit (CCU), multi-purpose memory unit (MMU) and the clock generator unit (CGU) were tested using the HP82000 testing environment and were found to be fully functional. Measurements were also performed to verify the operation of the analog ANN and WTA modules. The WTA was found to operate correctly for all cases with a minimum Hamming distance of two bits or more. Discrimination of a winner neuron was found to become problematic in cases where the minimum Hamming distance is only one bit, which indicates that the unit weight capacitance of 17 fF actually remains below the limit value dictated by the process-dependent quantifier offset voltage. Extensive measurements for a complete characterization of the entire ANN architecture are currently continuing.

The main effort was not put on developing a high-speed architecture. Nevertheless, a speed of 4M inferences per second is expected in forward processing mode with an external control. Internal control processing is limited by the slowest clock signal to be produced by the CGU, by the circuit master clock and by the control path which makes it difficult to evaluate. A realistic estimation gives an expected speed of 100K inferences per second with a master circuit clock reaching 10MHz and an ANN driving clock reaching 1MHz.

# VI. CONCLUSIONS

We have demonstrated in this paper the integration of a novel artificial neural network architecture. The proposed mixed analog-digital realization is based on an analog ANN block which interacts with a purely digital learning unit, implementing the error correction learning algorithm, as well as the circuit control part. The ANN is a Hamming network including a first layer of charge-based neurons driving a WTA unit.

A test chip containing 20 neurons of 10 synapses each has been designed using a AMS CMOS 0.8 2-poly technology. It has an active area of less than  $5mm^2$  for a die size of  $13mm^2$ .

The general idea in this development was to establish a valid design-flow for a ANN-based integrated circuit to be reusable in some other applications, rather than focus on integrating a high throughput processing unit.

The mixed analog-digital architecture presented in this work can be used in applications where the main focus is the on-chip learning ability of the ANN rather than a high processing/inference capability. This includes all autonomous systems with relatively slow time constant but a very long lifetime. Possible applications may be found in medical engineering, automotive engineering and consumer products.

#### References

[1] H. J. Kappen, An overview of neural network applications, Proceedings of the 6th International Congress for Computer Tech-

nology in Agriculture, Wageningen, The Netherlands, pp. 75-79, 1996.

- [2] Sied Mehdi Fakhraie, VLSI-Compatible Implementations for Artificial Neural Networks, Kluwer Academic Publishers, Dordrecht, The Netherlands, 1997.
- [3] Richard P. Lippmann, An Introduction to Computing with Neural Nets, IEEE ASSP Magazine, pp. 4-20, April 1987.
- [4] Z. Sezgin Günay and Edgar Sánchez-Sinencio, CMOS Winner-Take-All Circuits: A Detail Comparison, 1997 International Symposium on Circuits and Systems ISCAS'97, Hong Kong, pp.41-44, June 1997.
- [5] Uğur Çilingiroğlu, A Charge-Based Neural Hamming Classifier, IEEE Journal of Solid-State Circuits, Vol. 28, No. 1, pp. 59-67, January 1993.
- [6] Hakan Özdemir, Asim Kepkep, Banu Pamir, Yusuf Leblebici, and Uğur Çilingiroğlu, A Capacitive Threshold-Logic Gate, IEEE Journal of Solid-State Circuits, Vol. 31, No. 8, pp. 1141-1150, August 1996.
- [7] Perry Moerland, Emile Fiesler, Neural Network Adaptations to Hardware Implementations, Handbook of Neural Computation, E. Fiesler and R. Beale Editors, Institute of Physics Publishing and Oxford University Publishing, New York, E1.2:1-13, 1996.
- [8] Simon Haykin, Neural Networks: A Comprehensive Fundation, Macmillan College Publishing Company, New-York, 1994.
- [9] Alexandre Schmid, Yusuf Leblebici and Daniel Mlynek, A Charge-Based Artificial Neural Network with On-Chip Learning Ability, 5th European Congress on Intelligent Techniques & Soft Computing EUFIT'97, Aachen, Germany, pp. 250-254, September 1997.
- [10] Alexandre Schmid, Yusuf Leblebici and Daniel Mlynek, Hardware Realization of a Hamming Neural Network with On-Chip Learning, 1998 IEEE International Symposium on Circuits and Systems ISCAS'98, Monterey, CA, June 1998.