

# **CENICS 2015**

The Eighth International Conference on Advances in Circuits, Electronics and Micro-electronics

ISBN: 978-1-61208-430-5

August 23 - 28, 2015

Venice, Italy

## **CENICS 2015 Editors**

Falk Salewski, Muenster University of Applied Sciences, Germany

Sergey Yurish, IFSA, Spain

## Foreword

The Eighth International Conference on Advances in Circuits, Electronics and Microelectronics (CENICS 2015), held between August 23-28, 2015 in Venice, Italy, continued a series of events initiated in 2008, capturing the advances on special circuits, electronics, and microelectronics on both theory and practice, from fabrication to applications using these special circuits and systems. The topics cover fundamentals of design and implementation, techniques for deployment in various applications, and advances in signal processing.

Innovations in special circuits, electronics and micro-electronics are the key support for a large spectrum of applications. The conference focuses on several complementary aspects and targets the advances in each of them: signal processing and electronics for high speed processing, micro- and nano-electronics, special electronics for implantable and wearable devices, sensor related electronics focusing on low energy consumption, and special application domains of telemedicine and ehealth, bio-systems, navigation systems, automotive systems, home-oriented electronics, etc. These applications have led to special design and implementation techniques and to reconfigurable and self-reconfigurable devices, and they require particular methodologies to be integrated with already existing Internet-based communications and applications. Special care is required for particular devices intended to work directly with the human body (implantable, wearable, eHealth), or in a human-close environment (telemedicine, house-oriented, navigation, automotive). The miniature size required by such devices confronts scientists with special signal processing requirements.

We take here the opportunity to warmly thank all the members of the CENICS 2015 Technical Program Committee, as well as the numerous reviewers. The creation of such a high quality conference program would not have been possible without their involvement. We also kindly thank all the authors who dedicated much of their time and efforts to contribute to CENICS 2015. We truly believe that, thanks to all these efforts, the final conference program consisted of top quality contributions.

Also, this event could not have been a reality without the support of many individuals, organizations, and sponsors. We are grateful to the members of the CENICS 2015 organizing committee for their help in handling the logistics and for their work to make this professional meeting a success.

We hope that CENICS 2015 was a successful international forum for the exchange of ideas and results between academia and industry and for the promotion of progress in the field of circuits, electronics and micro-electronics.

We hope Venice provided a pleasant environment during the conference and everyone saved some time for exploring this beautiful city.

## **CENICS 2015 Chairs:**

Vladimir Privman, Clarkson University - Potsdam, USA

Sergey Y. Yurish, Technical University of Catalonia (UPC-Barcelona), Spain

Martin Horauer, University of Applied Sciences Technikum Wien, Austria

Adrian Muscat, University of Malta, Malta

## **CENICS 2015 Research/Industry Chairs**

Ravi M. Yadahalli, PES Institute of Technology & Management - Karnataka, India

### **CENICS 2015 Industry Liaison Chairs**

Falk Salewski, Muenster University of Applied Sciences, Germany

### **CENICS 2015 Publicity Chair**

Sandra Sendra Compte, Universidad Politécnica de Valencia, Spain

## **CENICS 2015 Special Area Chairs**

Formalisms: Peeter Ellervee, Tallinn University of Technology, Estonia

Application-oriented: Josu Etxaniz Marañon, University of the Basque Country / Universidad del País Vasco / Euskal Herriko Unibertsitatea - Bilbao, Spain

Sensors: Yulong Zhao, Xi'an Jiaotong University, China

## Committee

#### **CENICS Advisory Chairs**

Vladimir Privman, Clarkson University - Potsdam, USA

Sergey Y. Yurish, Technical University of Catalonia (UPC-Barcelona), Spain

Martin Horauer, University of Applied Sciences Technikum Wien, Austria

Adrian Muscat, University of Malta, Malta

#### **CENICS 2015 Research/Industry Chairs**

Ravi M. Yadahalli, PES Institute of Technology & Management - Karnataka, India

#### **CENICS 2015 Industry Liaison Chairs**

Falk Salewski, Muenster University of Applied Sciences, Germany

#### **CENICS 2015 Publicity Chair**

Sandra Sendra Compte, Universidad Politécnica de Valencia, Spain

#### **CENICS 2015 Special Area Chairs**

#### Formalisms

Peeter Ellervee, Tallinn University of Technology, Estonia

#### **Application-oriented**

Josu Etxaniz Marañon, University of the Basque Country / Universidad del País Vasco / Euskal Herriko Unibertsitatea - Bilbao, Spain

#### Sensors

Yulong Zhao, Xi'an Jiaotong University, China

## **CENICS 2015 Technical Program Committee**

Amr Abdel-Dayem, Laurentian University, Canada Amir Shah Abdul Aziz, TM Research & Development, Malaysia Adel Al-Jumaily, University of Technology, Sydney Said Al-Sarawi, The University of Adelaide, Australia Mohammad Amin Amiri, Iran University of Science and Technology, Iran Henri Basson, University of Lille North of France (Littoral), France Lotfi Bendaouia, ETIS-ENSEA, France Yngvar Berg, Vestfold University College, Norway Madhu Bhaskaran, RMIT University, Australia Manuel José Cabral dos Santos Reis, University of Trás-os-Montes e Alto Douro, Portugal Javier Calpe, University of Valencia, Spain James M. Conrad, University of North Carolina at Charlotte, USA Jose Carlos Meireles Monteiro Metrolho, Polytechnic Institute of Castelo Branco, Portugal David Cordeau, CNRS-XLIM, UMR 7252, University of Poitiers, France Marc Daumas, Université de Perpignan, France Javier Diaz-Carmona, Technological Institute of Celaya, Mexico Gordana Jovanovic Dolecek, Institute INAOE - Puebla, Mexico Rolf Drechsler, University of Bremen, Germany Peeter Ellervee, Tallinn University of Technology, Estonia Ykhlef Fayçal, Centre de Développement des Technologies Avancées, Algeria Sérgio Adriano Fernandes Lopes, Universidade do Minho, Portugal Francisco V. Fernández, IMSE, CSIC and University of Sevilla, Spain Joaquim Filipe, EST Setubal, Portugal Patrick Girard, LIRMM, France Luis Gomes, Universidade Nova de Lisboa, Portugal Petr Hanáček, Brno University of Technology, Czech Republic Houcine Hassan, Polytechnic University of Valencia, Spain Martin Horauer, University of Applied Sciences Technikum Wien, Austria Chun-Hsi Huang, University of Connecticut, U.S.A. Wen-Jyi Hwang, National Taiwan Normal University, Taiwan Emilio Jiménez Macías, University of La Rioja, Spain Anastasia N. 
Kastania, Athens University of Economics and Business, Greece Kenneth Blair Kent, University of New Brunswick, Canada Eric Kerherve, University of Bordeaux, France Israel Koren, University of Massachusetts at Amherst, USA Tomas Krilavicius, Vytautas Magnus University - Kaunas & Baltic Institute of Advanced Technologies -Vilnius, Lithuania Junghee Lee, University of Texas at San Antonio, USA Kevin Lee, Murdoch University, Australia Hongen Liao, Tsinghua University, China Diego Liberati, National Research Council of Italy, Italy Alie Eldin Mady, University College Cork (UCC) - Cork, Ireland Cesare Malagu', University of Ferrara and Istituto di acustica e sensoristica Orso Maria Corbino CNR-IDASC, Italy José Carlos Metrôlho, Instituto Politécnico de Castelo Branco, Portugal Harris Michail, Cyprus University of Technology, Cyprus Yoshikazu Miyanaga, Hokkaido University, Japan Bartolomeo Montrucchio, Politecnico di Torino, Italy Adrian Muscat, University of Malta, Malta Shinobu Nagayama, Hiroshima City University, Japan Arnaldo Oliveira, Universidade de Aveiro, Portugal Adam Pawlak, Silesian University of Technology - Gliwice, Poland George Perry, University of Texas at San Antonio, USA Angkoon Phinyomark, Prince of Songkla University, Thailand Eduardo Correia Pinheiro, Instituto de Telecomunicações - Lisboa, Portugal Katalin Popovici, MathWorks, USA Adam Postula, University of Queensland, Australia Anton Satria Prabuwono, Universiti Kebangsaan Malaysia, Malaysia

Vladimir Privman, Clarkson University - Potsdam, USA Càndid Reig, University of Valencia, Spain Marcos Rodrigues, Sheffield Hallam University, U.K. Julio Sahuquillo, Universitat Politècnica de València, Spain Falk Salewski, Muenster University of Applied Sciences, Germany Marc Sevaux, Université de Bretagne-Sud, France Arvind K. Srivastava, NanoSonix Inc., USA Ephraim Suhir, University of California – Santa Cruz, USA Ivo Stachiv, National Chung-Cheng University / Institute of Physics - Czech Academy of Sciences, Taiwan / & Czech Republic João Manuel R. S. Tavares, Universidade do Porto, Portugal Felix Toran, European Space Agency, Germany Francisco Torrens, Institut Universitari de Ciencia Molecular / Universitat de Valencia, Spain Carlos M. Travieso-González, University of Las Palmas de Gran Canaria, Spain Miroslav Velev, Aries Design Automation, USA Manuela Vieira, UNINOVA/ISEL, Portugal Thomas Webster, Northeastern University, USA Chin-Long Wey, National Central University, Taiwan Robert Wille, University of Bremen, Germany Ravi M. Yadahalli, PES Institute of Technology & Management - Karnataka, India Sergey Y. Yurish, IFSA, Spain David Zammit-Mangion, University of Malta – Msida, Malta

## **Copyright Information**

For your reference, this is the text governing the copyright release for material published by IARIA.

The copyright release is a transfer of publication rights, which allows IARIA and its partners to drive the dissemination of the published material. This allows IARIA to give articles increased visibility via distribution, inclusion in libraries, and arrangements for submission to indexes.

I, the undersigned, declare that the article is original, and that I represent the authors of this article in the copyright release matters. If this work has been done as work-for-hire, I have obtained all necessary clearances to execute a copyright release. I hereby irrevocably transfer exclusive copyright for this material to IARIA. I give IARIA permission to reproduce the work in any media format such as, but not limited to, print, digital, or electronic. I give IARIA permission to distribute the materials without restriction to any institutions or individuals. I give IARIA permission to submit the work for inclusion in article repositories as IARIA sees fit.

I, the undersigned, declare that to the best of my knowledge, the article does not contain libelous or otherwise unlawful contents, invade the right of privacy, or infringe on a proprietary right.

Following the copyright release, any circulated version of the article must bear the copyright notice and any header and footer information that IARIA applies to the published article.

IARIA grants royalty-free permission to the authors to disseminate the work, under the above provisions, for any academic, commercial, or industrial use. IARIA grants royalty-free permission to any individuals or institutions to make the article available electronically, online, or in print.

IARIA acknowledges that rights to any algorithm, process, procedure, apparatus, or articles of manufacture remain with the authors and their employers.

I, the undersigned, understand that IARIA will not be liable, in contract, tort (including, without limitation, negligence), pre-contract or other representations (other than fraudulent misrepresentations) or otherwise in connection with the publication of my work.

Exception to the above is made for work-for-hire performed while employed by the government. In that case, copyright to the material remains with the said government. The rightful owners (authors and government entity) grant unlimited and unrestricted permission to IARIA, IARIA's contractors, and IARIA's partners to further distribute the work.

## **Table of Contents**

| Paper | Authors |
|-------|---------|
| An Efficient Spike Detection VLSI Architecture Based on Normalized Correlator | Wen-Jyi Hwang, Chun-Fu Lin, and Szu-Huai Wang |
| COTS or Custom Made? Design Decisions for Industrial Control Systems | Falk Salewski |
| Design Guidelines for Designing High Gain Patch Antenna in the Ku-band | Qasim Umar Khan and Mojeeb Bin Ihsan |
| Filtering of Magnetic Noise Induced in Magnetometers by Motors of Micro-Rotary Aerial Vehicle | Nathan Unwin and Adam Postula |
| Implementation and Comparison of Conventional and Ordering Based RO-PUFs for Secret Key Generation | Giray Komurcu, Ali Emre Pusane, and Gunhan Dundar |
| Hopf Bifurcation Analysis and Implementation of Single Tunnel Diode Oscillator Circuit | Mustafa Fayez, Mohammad Awwad, and Hassan El-Hamouly |
| Reconfigurable Hyper-Structures for Intrinsic Digital Circuit Evolution | Spyros Kazarlis, John Kalomiros, Vassilios Kalaitzis, Dimitrios Bogas, Paris Mastorokostas, Anastasios Balouktsis, and Vassilios Petridis |
| Design and Implementation of a 94 GHz CMOS Down-Conversion Mixer for Image Radar Sensors | Yo-Sheng Lin, Chien-Chin Wang, Guo-Hao Li, and Jay-Min Liu |

## An Efficient Spike Detection VLSI Architecture Based on Normalized Correlator

Wen-Jyi Hwang, Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei, 117, Taiwan. Email: whwang@csie.ntnu.edu.tw

Chun-Fu Lin, Instrument Technology Research Center, National Applied Research Laboratories, Taiwan. Email: vincent@itrc.narl.org.tw

Szu-Huai Wang, Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei, 117, Taiwan. Email: a0919779123@gmail.com

*Abstract*—This paper presents an effective circuit for noisy spike detection. The circuit detects spikes by means of normalized correlators. The operations of the correlators involve filtering, block energy computation, normalized correlation, and thresholding. All the computations are carried out in a pipelined fashion. The circuit has been implemented on field programmable gate arrays (FPGAs). The circuit is used as a hardware accelerator in a network-on-chip (NOC) platform for performance evaluation. Experimental results reveal that the proposed circuit provides realtime computation for noisy spike detection with high true positive and low false alarm rates.

Keywords-Spike Sorting; Spike Detection; FPGA; Network on Chip

#### I. INTRODUCTION

Spike sorting [1] is often desired for the design of brain machine interface (BMI) [2]. It receives spike trains from extracellular recording systems. Each spike train is a mixture of the trains from neurons near the recording electrodes. Spike sorting is able to segregate the spike trains of individual neurons from this mixture. It usually involves detection, feature extraction, and classification operations. Spike detection is the first step of the spike sorting. The goal of spike detection is to separate spikes from background noise. Extracellularly recorded signals are inevitably corrupted by noise from a number of sources such as the recording hardware and electromagnetic interference. In the presence of large noise, successful spike detection is essential for subsequent accurate feature extraction and classification.

One way to perform spike detection is based on the energy of spike trains. An example of energy-based spike detection is the nonlinear energy operator (NEO) [3], which computes the energy difference between the signal's current power and the power in adjacent time intervals. The energy of coefficients in the wavelet domain may also be useful for spike detection [4]. The energy-based methods are simple and efficient. However, when noise becomes large, proper selection of threshold values for these algorithms may be difficult. Therefore, their performance may deteriorate rapidly as noise energy increases. An alternative to the energy-based methods is to utilize the templates of spikes for detection. A typical technique using templates is based on matched filters [5]. A drawback of matched filters is their high computational complexity. Realtime spike detection may then be difficult when matched filters are implemented in software. In addition, similar to the energy-based methods, it may be difficult to find effective threshold levels for matched filters when noise becomes large.
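As a rough illustration of the energy-based approach, the sketch below applies the commonly used discrete form of the NEO, psi[n] = x[n]^2 - x[n-1]x[n+1], followed by a fixed threshold. The formula is the standard textbook form rather than one quoted from this paper, and the function names, the sample signal, and the threshold value are hypothetical.

```python
def neo(x):
    """Discrete nonlinear energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    # The first and last samples lack a two-sided neighbourhood and are left at 0.
    psi = [0.0] * len(x)
    for n in range(1, len(x) - 1):
        psi[n] = x[n] * x[n] - x[n - 1] * x[n + 1]
    return psi


def neo_detect(x, threshold):
    """Return the sample indices whose NEO output exceeds the threshold."""
    return [n for n, value in enumerate(neo(x)) if value > threshold]


# Hypothetical signal: small background fluctuations with one large excursion.
signal = [0.0, 0.1, -0.1, 0.0, 2.0, 0.0, 0.1, -0.1]
hits = neo_detect(signal, threshold=1.0)
print(hits)
```

With larger noise the NEO output of the background grows as well, which is the threshold selection difficulty described above.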

A number of hardware implementations have been proposed for realtime spike sorting. Some hardware implementations [6] are based on NEO because of its simplicity and low area costs, so that the circuits may be implantable at the front end for online detection. Nevertheless, these circuits may not be suited for detection at high noise levels. In addition, hardware designs are also beneficial for offline spike sorting [8] because of the requirement for processing large amounts of data. With the relaxation of the implantation requirement for offline processing, development and implementation of more efficient spike detection algorithms in hardware may be desired.

The objective of this paper is to present a novel VLSI architecture for realtime spike detection in noisy spike trains. The architecture is based on a normalized correlator for enhancing detection performance. Segments of spike trains are normalized prior to the correlation computation. The normalization allows the output of the correlators to lie inside a range that is independent of the input spike trains and noise levels. This is beneficial for selecting effective threshold levels for spike detection as signal-to-noise ratios (SNRs) become low.

The proposed architecture can be separated into four units: the filter unit, the block energy computation unit, the correlator unit, and the thresholding unit. All the units are operated in a pipelined fashion to enhance the throughput of the circuit. The filter unit consists of a bandpass Butterworth filter capable of removing DC and high frequency components of spike trains. The filter is helpful for noise removal prior to correlation computation and detection. The block energy computation unit is used for calculating block energy of segments of spike trains. The normalized correlation is then carried out in the correlator unit. The thresholding unit then detects spikes based on the results produced by correlator unit.

The proposed architecture can be simplified for the design of implantable circuits. By retaining only the block energy computation unit and the thresholding unit, the proposed architecture becomes a noncoherent energy detector, which performs the generalized likelihood ratio test (GLRT) [7] for spike detection. The noncoherent energy detector has the advantages of low area costs and low power consumption, while attaining higher throughput for spike detection.

The proposed architecture has been implemented on field programmable gate arrays (FPGAs). The circuit is employed as a hardware accelerator in a network-on-chip (NOC) platform for performance evaluation. Experimental results show that the proposed architecture is able to attain high speed detection with high true positive rate and low false alarm rate even when the SNR becomes -3 dB. Its simplified version, which performs noncoherent energy detection, has the additional advantage of lower area costs at the expense of slightly inferior detection performance. Both are effective alternatives for spike sorting applications requiring real-time computation with superior spike detection performance.

The remaining parts of this paper are organized as follows. Section 2 gives a brief review of the normalized correlation algorithm. Section 3 describes the proposed spike detection architecture. Experimental results are included in Section 4. Finally, the concluding remarks are given in Section 5.

#### II. THE NORMALIZED CORRELATION ALGORITHM FOR SPIKE DETECTION

We start with the basic matched filter technique for spike sorting, which can be implemented by convolving the spike trains with the pre-stored templates. For the sake of simplicity, we assume the matched filter contains only one template. Let x[n] be the *n*-th sample of the input spike train. Let $\mathbf{x}_n = [x[n], x[n-1], ..., x[n-N+1]]^T$ be the *n*-th segment of the spike train, where N is the length of the segment. The template for matched filtering also contains N elements, denoted by $\mathbf{t} = [t[0], ..., t[N-1]]^T$. The matched filter output at n, denoted by y[n], is computed from the convolution

$$y[n] = \sum_{k=0}^{N-1} x[n-k]t[k] = \mathbf{x}_n^T \mathbf{t}. \tag{1}$$

Note that the convolution is equivalent to the inner product of segment  $\mathbf{x}_n$  and template  $\mathbf{t}$ , which indicates the correlation between these two vectors. The segment  $\mathbf{x}_n$  is detected as a spike when y[n] is larger than a pre-specified threshold  $\eta$ .
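The computation in (1) can be sketched in a few lines of Python. This is only an illustrative software model, not the hardware described later: the helper names, the zero padding before the first sample, and the example values are assumptions.

```python
def segment(x, n, N):
    """Return x_n = [x[n], x[n-1], ..., x[n-N+1]], zero-padded before sample 0."""
    return [x[n - k] if n - k >= 0 else 0.0 for k in range(N)]


def matched_filter(x, t):
    """y[n] = sum_k x[n-k] * t[k], i.e. the inner product of x_n and t."""
    N = len(t)
    return [
        sum(xk * tk for xk, tk in zip(segment(x, n, N), t))
        for n in range(len(x))
    ]


# Hypothetical spike train and (symmetric) template.
x = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
t = [1.0, 2.0, 1.0]
y = matched_filter(x, t)

eta = 5.0  # hypothetical threshold
detections = [n for n, value in enumerate(y) if value > eta]
print(y, detections)
```

Because the segment is stored newest sample first, the template entries are indexed in the same reversed order, which is why the convolution reduces to a plain inner product.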

A drawback of matched filter technique is that the threshold  $\eta$  alone cannot be used to determine the squared distance for template matching. To see this fact, we first observe that the squared distance between  $\mathbf{x}_n$  and  $\mathbf{t}$ , denoted by  $d(\mathbf{x}_n, \mathbf{t})$ , is given by

$$d(\mathbf{x}_n, \mathbf{t}) = ||\mathbf{x}_n||^2 + ||\mathbf{t}||^2 - 2\mathbf{x}_n^T \mathbf{t}. \tag{2}$$

Therefore, when  $\mathbf{x}_n^T \mathbf{t} > \eta$ ,

$$d(\mathbf{x}_n, \mathbf{t}) \le ||\mathbf{x}_n||^2 + ||\mathbf{t}||^2 - 2\eta. \tag{3}$$

Therefore, when $\mathbf{x}_n$ is detected as a spike (i.e., $\mathbf{x}_n^T \mathbf{t} > \eta$), we see from (3) that the upper bound of $d(\mathbf{x}_n, \mathbf{t})$ is determined by $||\mathbf{x}_n||^2$, $||\mathbf{t}||^2$ and $\eta$, where $||\mathbf{x}_n||^2$ depends on the input spike trains. When $||\mathbf{x}_n||^2$ is large, it is possible that $d(\mathbf{x}_n, \mathbf{t})$ is still large even when $\mathbf{x}_n^T \mathbf{t} > \eta$. In this case, a false alarm may occur.
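A small numeric example may make this failure mode concrete. The template, threshold, and segments below are made-up values chosen only to show that a high-energy segment can exceed the threshold while remaining far from the template in the sense of (2).

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def squared_distance(x_seg, t):
    """Equation (2): ||x||^2 + ||t||^2 - 2 x^T t."""
    return dot(x_seg, x_seg) + dot(t, t) - 2.0 * dot(x_seg, t)


t = [1.0, 2.0, 1.0]            # hypothetical template
eta = 5.0                      # hypothetical matched filter threshold
spike = [1.0, 2.0, 1.0]        # segment identical to the template
artifact = [10.0, 0.0, 0.0]    # large-amplitude segment unlike the template

for name, seg in (("spike", spike), ("artifact", artifact)):
    print(name, dot(seg, t) > eta, squared_distance(seg, t))
```

Both segments exceed the threshold, yet the squared distance is 0 for the first and 86 for the second, so the second detection would be a false alarm.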

One way to overcome this problem is to normalize  $\mathbf{x}_n$  and  $\mathbf{t}$  before computing the correlation. Define  $\bar{\mathbf{x}}_n$  and  $\bar{\mathbf{t}}$  as the normalized version of  $\mathbf{x}_n$  and  $\mathbf{t}$ , respectively. That is,

$$\bar{\mathbf{x}}_n = \frac{\mathbf{x}_n}{||\mathbf{x}_n||}, \quad \bar{\mathbf{t}} = \frac{\mathbf{t}}{||\mathbf{t}||}. \tag{4}$$

Therefore,

$$d(\bar{\mathbf{x}}_n, \bar{\mathbf{t}}) = 2 - 2\bar{\mathbf{x}}_n^T \bar{\mathbf{t}}. \tag{5}$$
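For readers who want the intermediate step, (5) follows by applying (2) to the normalized vectors in (4), assuming $\mathbf{x}_n$ is nonzero so that the normalization is defined. Both normalized vectors then have unit norm:

$$d(\bar{\mathbf{x}}_n, \bar{\mathbf{t}}) = ||\bar{\mathbf{x}}_n||^2 + ||\bar{\mathbf{t}}||^2 - 2\bar{\mathbf{x}}_n^T \bar{\mathbf{t}} = 1 + 1 - 2\bar{\mathbf{x}}_n^T \bar{\mathbf{t}}, \quad \text{since } ||\bar{\mathbf{x}}_n|| = ||\bar{\mathbf{t}}|| = 1.$$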

Because $d(\bar{\mathbf{x}}_n, \bar{\mathbf{t}}) \ge 0$, it can be easily shown that

$$\bar{\mathbf{x}}_n^T \bar{\mathbf{t}} \le 1. \tag{6}$$

Our normalized correlator is based on  $\bar{\mathbf{x}}_n$  and  $\bar{\mathbf{t}}$ . When  $\bar{\mathbf{x}}_n^T \bar{\mathbf{t}} > \eta$ , then  $\mathbf{x}_n$  is detected as a spike. From (6), it follows that

$$\eta \le 1. \tag{7}$$

In addition, when  $\bar{\mathbf{x}}_n^T \bar{\mathbf{t}} > \eta$ , from (5) we see that

$$d(\bar{\mathbf{x}}_n, \bar{\mathbf{t}}) \le 2(1 - \eta),\tag{8}$$

which is dependent only on the threshold value $\eta$. Therefore, the threshold value for correlation computation uniquely determines the upper bound of the squared distance for template matching after a spike is detected. In addition, a larger $\eta$ implies a smaller squared distance $d(\bar{\mathbf{x}}_n, \bar{\mathbf{t}})$. The upper bound of $\eta$ is 1, which is independent of the input spike trains.

The normalized correlator gives the threshold value $\eta$ a more meaningful interpretation, because $\eta \leq 1$ and the upper bound of the squared distance for template matching for a detected spike is $2(1-\eta)$. When $\eta = 1.0$ is selected for detection, only the segments having *full* correlation with the template $\mathbf{t}$ are considered as spikes, and their squared distance from $\mathbf{t}$ is 0. When $\eta = 0.5$, all the segments having *half* correlation (or above) with $\mathbf{t}$ are detected as spikes, and the upper bound of their squared distances is 1. When $\eta = 0$, even the segments having no correlation with $\mathbf{t}$ are detected as spikes, and the upper bound of their squared distances increases to 2. In the presence of noise, it may be impractical to require the detected spikes to be segments having full correlation (i.e., $\eta = 1.0$). In our experiments, the requirement of 70 % correlation (i.e., $\eta = 0.7$) may be sufficient for the normalized correlator to attain high detection hit rate, low miss rate, and low false alarm rate even for high noise levels. Detailed discussions of the normalized correlator can be found in our earlier work [9].
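The behaviour described by (4) to (8) can be checked with a short software sketch. It is an illustrative model only: the function names, the handling of an all-zero segment, and the example segments are assumptions, while the default threshold of 0.7 is the value mentioned above.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def normalized_correlation(x_seg, t):
    """Inner product of the unit-norm versions of x_seg and t."""
    energy = dot(x_seg, x_seg)
    if energy == 0.0:
        return 0.0  # assumption: an all-zero segment is treated as uncorrelated
    return dot(x_seg, t) / (math.sqrt(energy) * math.sqrt(dot(t, t)))


def detect(x_seg, t, eta=0.7):
    """Return (is_spike, squared distance between the normalized vectors)."""
    rho = normalized_correlation(x_seg, t)
    return rho > eta, 2.0 - 2.0 * rho  # distance from equation (5)


t = [1.0, 2.0, 1.0]          # hypothetical template
scaled = [3.0, 6.0, 3.0]     # template shape at three times the amplitude
artifact = [10.0, 0.0, 0.0]  # large-amplitude segment unlike the template

print(detect(scaled, t))
print(detect(artifact, t))
```

The scaled segment is accepted regardless of its amplitude, and the high-energy segment is rejected, because only the shape of the segment enters the decision.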

Although the normalized correlator simplifies the selection of threshold values, it has a higher computational complexity for spike detection than the basic matched filter technique. This is because the block energy of each segment needs to be computed prior to the correlation computation. Hardware implementation of the normalized correlator may be beneficial for enhancing its throughput for realtime spike sorting.

#### III. THE PROPOSED ARCHITECTURE

Figure 1 shows the block diagram of the proposed architecture, which contains the filter unit, the block energy computation unit, the correlator unit, and the thresholding unit. The filter unit is the pre-processing unit for the spike detection. It removes both the DC offset and noise before the detection operation. The goal of the block energy computation unit is to compute the block energy $||\mathbf{x}_n||^2$. The correlator unit then calculates $\bar{\mathbf{x}}_n^T \bar{\mathbf{t}}$. The detection results are then produced by the thresholding unit.

#### A. Filter Unit and Block Energy Computation Unit

Figure 1. The Block Diagram of the Proposed Architecture for q templates

Figure 2. The Architecture of the Block Energy Computation Unit

In the implementation, a bandpass Butterworth filter is used for the preprocessing operations. The filter can be implemented by shift registers, multipliers and adders. For the sake of simplicity, the details of the implementation are not included. The direct implementation of the block energy computation involving N multiplications is also straightforward. Although N multipliers can be employed for the multiplications, the area costs can be high. An alternative is based on the observation that

$$||\mathbf{x}_n||^2 = ||\mathbf{x}_{n-1}||^2 + x^2[n] - x^2[n-N]. \tag{9}$$

Therefore, when the block energy of the previous block (i.e., $||\mathbf{x}_{n-1}||^2$) is known, the computation of the block energy of the current block needs only two multiplications, for the squares of the samples x[n] and x[n-N], as shown in Figure 2. There are one N-stage shift register, two multipliers, and two adders in the block energy computation unit. The shift register is used to hold the values of the past samples (i.e., x[k], k = n-1, ..., n-N) in a first-in-first-out (FIFO) fashion. In addition to providing the value x[n-N] for the computation of $x^2[n-N]$, the shift register is beneficial for the correlation computation in the correlator unit.
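The recursion in (9) can be modelled in software with a fixed-length FIFO. The sketch below is a behavioural illustration, not the circuit itself; the class name, the zero-initialized register, and the test samples are assumptions.

```python
from collections import deque


class BlockEnergy:
    """Running block energy over the last N samples, following equation (9)."""

    def __init__(self, N):
        # N-stage shift register holding x[n-1], ..., x[n-N]; starts at zero.
        self.fifo = deque([0.0] * N, maxlen=N)
        self.energy = 0.0

    def push(self, sample):
        oldest = self.fifo[0]  # x[n-N], about to leave the register
        # Two multiplications and two additions per new sample.
        self.energy += sample * sample - oldest * oldest
        self.fifo.append(sample)  # maxlen drops the oldest entry automatically
        return self.energy


unit = BlockEnergy(N=3)
samples = [1.0, 2.0, 3.0, 4.0, 5.0]
energies = [unit.push(s) for s in samples]
print(energies)
```

Each update costs the same regardless of N, in contrast to the N multiplications of the direct computation. In a floating-point model, rounding errors can accumulate over long runs, so a periodic recomputation of the full sum may be worth considering.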

#### B. Correlator Unit

In addition to multiplications, the correlator for the computation of $\bar{y}[n] = \bar{\mathbf{x}}_n^T \bar{\mathbf{t}}$ requires normalization operations. Although the normalized template $\bar{\mathbf{t}}$ can be obtained offline, the computation of the normalized $\bar{\mathbf{x}}_n$ should be carried out online. A direct implementation of the circuit for the computation of $\bar{\mathbf{x}}_n$ is to divide each sample of $\mathbf{x}_n$ by $||\mathbf{x}_n||$. This would require N dividers, because the dimension of the block $\mathbf{x}_n$ is N. An alternative is based on the post-normalization technique, in which the inner product $\mathbf{x}_n^T \bar{\mathbf{t}}$ is computed first. Because the inner product is a scalar, we can then use only one divider to compute $\bar{\mathbf{x}}_n^T \bar{\mathbf{t}}$ by dividing $\mathbf{x}_n^T \bar{\mathbf{t}}$ by $||\mathbf{x}_n||$. Figure 3 shows the architecture of the correlator unit for the case of two templates. Correlators for any q > 0 templates can be constructed in a similar fashion. As shown in the figure, there are 2N multipliers, two accumulators, one square root circuit, and one divider. The samples of $\mathbf{x}_n$ are obtained from the shift register in the block energy computation unit. The normalized templates $\bar{\mathbf{t}}_1$ and $\bar{\mathbf{t}}_2$ are pre-stored in the registers of the unit. To accelerate the correlation computation, there are N multipliers for the computation of each $\bar{\mathbf{x}}_n^T \bar{\mathbf{t}}_i$, i = 1, 2. In addition, the accumulation of the multiplication results is carried out in a pipelined fashion. The output of each accumulator is then divided by $||\mathbf{x}_n||$. Observe from Figure 2 that the output of the block energy computation unit is $||\mathbf{x}_n||^2$. Therefore, the square root (SQRT) circuit can be used to compute $||\mathbf{x}_n||$, as shown in Figure 3.
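The equivalence between direct normalization and post-normalization can be checked numerically. The following sketch is a software analogy with hypothetical function names and sample values; it only illustrates that moving the division after the inner product leaves the result unchanged.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(p * q for p, q in zip(a, b))


def correlate_pre_normalized(x_seg, t_bar):
    """Direct form: divide every sample by ||x_n|| (N divisions), then correlate."""
    norm_x = math.sqrt(dot(x_seg, x_seg))
    x_bar = [v / norm_x for v in x_seg]
    return dot(x_bar, t_bar)


def correlate_post_normalized(x_seg, t_bar, block_energy):
    """Post-normalization: correlate first, then one square root and one division."""
    return dot(x_seg, t_bar) / math.sqrt(block_energy)


t = [1.0, 2.0, 1.0]                  # hypothetical template
norm_t = math.sqrt(dot(t, t))
t_bar = [v / norm_t for v in t]      # normalized offline, as in the text

x_seg = [0.5, 2.5, 1.5]              # hypothetical nonzero segment
block_energy = dot(x_seg, x_seg)     # would come from the block energy unit

pre = correlate_pre_normalized(x_seg, t_bar)
post = correlate_post_normalized(x_seg, t_bar, block_energy)
print(pre, post)
```

With several templates the square root of the block energy is shared, so only the inner products scale with the number of templates.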

#### C. Thresholding Unit

Although the thresholding operations can be easily accomplished by a simple comparison circuit, the detection accuracy may be further improved by taking the detection results of the neighboring blocks into consideration. Because the neighboring blocks are overlapping, it is likely that these blocks have similar normalized correlation values. A number of neighboring blocks may then have normalized correlation values larger than a pre-specified threshold. Consequently, it is possible that multiple hits may be declared for the occurrence of a single spike.

One way to solve this problem is not to declare a hit as soon as the normalized correlation value of a block is above the threshold. The normalized correlation values of the previous blocks are also checked. Among its K preceding blocks, if at least k of them are also above the threshold, a hit is then declared. This may effectively reduce the false alarm rate for the detection. The architecture of the thresholding unit is shown in Figure 4. It can be observed from the figure that a K-stage shift register is used to store the thresholding results of the K previous blocks. Each stage contains 1-bit information, where 0 and 1 indicate that the corresponding block has a correlation value below and above the threshold $\eta$, respectively. Consequently, when the sum of the outputs of all the K stages is equal to or above k, at least k of the K preceding blocks have correlation values above the threshold. A hit is then issued.

Figure 3. The Architecture of the Correlator Unit for q = 2 Templates

Figure 4. The Architecture of the Thresholding Unit for q = 2 templates
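The voting rule over the K preceding blocks can be sketched as follows. This is one possible reading of the description, in which the current block must itself exceed the threshold and at least k of the K stored bits must be set. The class name and the example values of eta, K, and k are hypothetical.

```python
from collections import deque


class VotingThreshold:
    """Declare a hit only if enough of the K preceding blocks were above eta."""

    def __init__(self, eta, K, k):
        self.eta = eta
        self.k = k
        # K-stage shift register of 1-bit thresholding results, initially all 0.
        self.history = deque([0] * K, maxlen=K)

    def step(self, correlation):
        above = 1 if correlation > self.eta else 0
        # Sum of the K stages counts how many preceding blocks were above eta.
        hit = bool(above) and sum(self.history) >= self.k
        self.history.append(above)
        return hit


unit = VotingThreshold(eta=0.7, K=4, k=2)
correlations = [0.1, 0.8, 0.2, 0.75, 0.8, 0.9, 0.3]
hits = [unit.step(c) for c in correlations]
print(hits)
```

The isolated crossing at the second block is ignored, while the sustained run of high correlation values later in the sequence produces hits.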

Note that we may be able to further reduce the false alarm rate, at the expense of a slight decrease in true positive rate, by imposing the assumption that spikes are at least M samples apart. The enforcement of the assumption can be carried out by an additional M-stage shift register recording the location of the previous hit. Each stage also holds a value of 0 or 1. If the previous hit is less than M samples away, one of the stages in the shift register contains the value 1, which disables the hit. A hit is allowed to be issued only when all the stages contain the value 0.
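A behavioural sketch of this minimum-spacing rule is given below. It assumes that the register records issued hits only and that a recorded hit blocks the M samples that follow it; the class name and the example sequence are hypothetical.

```python
from collections import deque


class RefractoryGate:
    """Suppress candidate hits while a recent hit is still in the M-stage register."""

    def __init__(self, M):
        # M-stage shift register; a 1 marks the position of an issued hit.
        self.recent = deque([0] * M, maxlen=M)

    def step(self, candidate):
        # A hit is issued only when every stage of the register holds 0.
        allowed = bool(candidate) and not any(self.recent)
        self.recent.append(1 if allowed else 0)
        return allowed


gate = RefractoryGate(M=3)
candidates = [False, True, True, True, False, True, True]
issued = [gate.step(c) for c in candidates]
print(issued)
```

A run of consecutive candidate hits caused by overlapping blocks collapses to a single issued hit, and a genuine spike arriving inside the blocked window would be missed, which is the cost in true positive rate noted above.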

#### D. Noncoherent Energy Detector

The proposed circuit can be simplified by removing the correlator unit. In this case, the output  $||\mathbf{x}_n||^2$  of the block energy computation unit is connected directly to the thresholding unit. The circuit will declare a hit when  $||\mathbf{x}_n||^2$  is above the threshold. This is the noncoherent energy detector proposed in [7]. As compared with the proposed circuit, the noncoherent energy detector has the advantages of lower area costs and power consumption at the expense of slightly lower true positive rates and/or higher false alarm rates. The circuit is advantageous for applications where both speed and area costs are important concerns.
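A minimal software sketch of this simplified detector is given below. It assumes that blocks of N samples advance one sample at a time and that the block energy is updated incrementally; both are assumptions of this illustration, as are the function names `block_energies` and `energy_hits`.

```python
def block_energies(samples, N):
    """Return ||x_n||^2 for every length-N block, one block per new sample."""
    energies = []
    running = 0
    for i, s in enumerate(samples):
        running += s * s                     # newest sample enters the block
        if i >= N:
            running -= samples[i - N] ** 2   # oldest sample leaves the block
        if i >= N - 1:
            energies.append(running)
    return energies


def energy_hits(samples, N, eta):
    """Indices of blocks whose energy is above the threshold eta."""
    return [n for n, e in enumerate(block_energies(samples, N)) if e > eta]


example = [0, 0, 3, 4, 0, 0]
example_energies = block_energies(example, N=2)
example_hits = energy_hits(example, N=2, eta=20)
```

The raw comparison shown here can be followed by the same neighbor-based thresholding as in the full circuit, since only the quantity fed to the thresholding unit changes.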

#### IV. EXPERIMENTAL RESULTS

This section presents some experimental results of the proposed architecture. The simulator developed in [10] is adopted to generate extracellular recordings. The simulation gives access to ground truth about spiking activity in the recording. This facilitates the quantitative assessment of the proposed architecture, since the features of the spike trains are known a priori. All the spikes are recorded with a sampling rate of 24,000 samples/s. Each spike has 64 samples (i.e., N = 64), and the length of each spike is 2.67 ms.

| SNR (dB) | Metric | Normalized Correlator | Noncoherent Energy Detector | NEO     | SWT     | Matched Filter |
|----------|--------|-----------------------|-----------------------------|---------|---------|----------------|
| 10       | TPR    | 93.64 %               | 91.37 %                     | 93.10 % | 94.82 % | 89.65 %        |
|          | FAR    | 0.40 %                | 5.35 %                      | 3.57 %  | 6.77 %  | 2.80 %         |
| 1        | TPR    | 90.04 %               | 88.03 %                     | 87.21 % | 92.43 % | 82.90 %        |
|          | FAR    | 0.92 %                | 6.36 %                      | 22.49 % | 79.36 % | 3.02 %         |
| -3       | TPR    | 82.71 %               | 82.60 %                     | 80.53 % | 86.66 % | 80.31 %        |
|          | FAR    | 1.06 %                | 9.52 %                      | 57.87 % | 82.43 % | 8.92 %         |

TABLE I. THE TPR AND FAR VALUES OF VARIOUS SPIKE DETECTION ALGORITHMS FOR SPIKE TRAINS WITH VARIOUS SNR LEVELS.

Figure 5. An example of the proposed normalized correlator for noisy spike detection with SNR=-3 dB for q = 2 templates.

We first consider the true positive rate (TPR) and false alarm rate (FAR) of the proposed architecture. The TPR is defined as the number of detected true spikes divided by the total number of true spikes. The FAR is defined as the number of silent segments, which are detected as spikes, divided by the total number of detected segments. Table I shows the TPR and FAR of the normalized correlator, the noncoherent energy detector, NEO, stationary wavelet transform (SWT), and matched filter for various SNR levels. The number of neurons is 2. The proposed normalized correlator architecture therefore uses 2 templates (i.e., q = 2).
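The two definitions above translate into the following small helper. It is a sketch for illustration only: the function name `tpr_far` and the example counts are hypothetical and are not taken from the experiments.

```python
def tpr_far(true_spikes, detected_true, detected_false):
    """Compute (TPR, FAR) from raw counts.

    true_spikes    : total number of true spikes in the recording
    detected_true  : detected segments that are true spikes
    detected_false : silent segments that were detected as spikes
    """
    detected_total = detected_true + detected_false
    tpr = detected_true / true_spikes
    # FAR is relative to all detected segments, not to all silent segments.
    far = detected_false / detected_total if detected_total else 0.0
    return tpr, far


# Hypothetical counts, chosen only to illustrate the definitions.
example_tpr, example_far = tpr_far(true_spikes=200, detected_true=180,
                                   detected_false=20)
```

Note that, with this definition, the FAR depends on how many segments the detector reports, so a detector that fires very often can reach a high TPR and a high FAR at the same time.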

It can be observed from Table I that the normalized correlator has the lowest FAR at every SNR level, and a TPR higher than those of all the other algorithms except SWT, whose higher TPR comes with a much higher FAR. This is because the correlation is beneficial for identifying real spikes and ignoring silent segments. This can be further observed in Figure 5, where the noisy spike train with SNR = -3 dB and the normalized correlation values  $\bar{y}_i[n], i = 1, 2$ , are shown. It can be observed from Figure 5 that it is difficult to locate spikes directly in the noisy spike train due to large noise corruption. Nevertheless, the normalized correlation values shown in Figure 5 still provide useful information revealing the location of true spikes. It is also interesting to note that the noncoherent energy detector has TPR and FAR values comparable to those of the matched filter. These results show that the energy is also effective for spike detection.

Next we evaluate the area complexities. Because adders, multipliers, dividers, comparators and registers are the basic building blocks of the architecture, the area complexities are separated into five categories: the numbers of adders, multipliers, dividers, comparators and registers. Table II shows the area complexities of the proposed architecture. It can be observed from Table II that the numbers of adders, multipliers, and dividers are fixed, and independent of the block dimension N and the number of templates q, in the filter unit, block energy computation unit and thresholding unit. Although the number of adders and the number of multipliers grow with N and q in the correlator unit, only a single divider is used in the unit because of the employment of the postnormalization technique. This is beneficial for lowering the area costs of the circuit.

TABLE II. THE AREA COMPLEXITIES OF THE PROPOSED ARCHITECTURE

|             | Filter Unit | Block Energy Computation Unit | Correlator Unit | Thresholding Unit |
|-------------|-------------|-------------------------------|-----------------|-------------------|
| Adders      | O(1)        | O(1)                          | O(qN)           | O(1)              |
| Multipliers | O(1)        | O(1)                          | O(qN)           | O(1)              |
| Dividers    | 0           | 0                             | O(1)            | 0                 |
| Comparators | 0           | 0                             | 0               | O(1)              |
| Registers   | O(1)        | O(N)                          | O(qN)           | O(1)              |

TABLE III. HARDWARE UTILIZATION OF THE FPGA IMPLEMENTATION OF THE PROPOSED NORMALIZED CORRELATOR ARCHITECTURE

|             | Filter Unit | Block Energy Computation Unit | Correlator Unit | Thresholding Unit | Total |
|-------------|-------------|-------------------------------|-----------------|-------------------|-------|
| ALUTs       | 750         | 649                           | 4571            | 89                | 6059  |
| Registers   | 236         | 866                           | 2788            | 13                | 3903  |
| Memory Bits | 0           | 0                             | 0               | 0                 | 0     |
| DSP Blocks  | 24          | 3                             | 528             | 0                 | 555   |

We further consider the hardware utilization of the proposed normalized correlation architecture implemented on an FPGA. In the experiments, we set the dimension of the spikes to be N = 64. There are q = 2 templates. The target FPGA in the experiments is the Altera Stratix III EP3SE80F780C2, which contains 64,000 adaptive lookup tables (ALUTs), 64,000 registers, 6,331,392 memory bits, and 672 DSP blocks. The FPGA design platform is Altera Quartus II 13.0. Table III shows the number of ALUTs, the number of registers, the number of memory bits, and the number of DSP blocks consumed by each unit of the proposed circuit. It can be observed from Table III that most of the ALUTs, registers and DSP blocks used by the design are consumed by the correlator unit, because the inner product operations are required in the unit.
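To put the figures of Table III in perspective, the short script below relates them to the device capacity quoted above and to the design totals. It only re-uses numbers stated in the text and in Table III; the variable and function names are illustrative.

```python
# Device capacity as quoted in the text and usage as listed in Table III.
CAPACITY = {"ALUTs": 64000, "Registers": 64000, "DSP Blocks": 672}
TOTAL_USED = {"ALUTs": 6059, "Registers": 3903, "DSP Blocks": 555}
CORRELATOR_USED = {"ALUTs": 4571, "Registers": 2788, "DSP Blocks": 528}


def percent(part, whole):
    """Share of `part` in `whole`, in percent, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


# Share of the device consumed by the whole design.
device_share = {r: percent(TOTAL_USED[r], CAPACITY[r]) for r in CAPACITY}
# Share of the design consumed by the correlator unit alone.
correlator_share = {r: percent(CORRELATOR_USED[r], TOTAL_USED[r])
                    for r in TOTAL_USED}

for resource in CAPACITY:
    print(resource, device_share[resource], correlator_share[resource])
```

Under these numbers the DSP blocks are the resource closest to the device limit, and the correlator unit accounts for the bulk of every resource type used by the design.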

When only noncoherent energy detection is necessary, the correlator can be removed, and the area costs can therefore be effectively lowered. Table IV shows the hardware utilization of the proposed normalized correlation architecture and the proposed noncoherent energy detection architecture. It can be observed from Table IV that the noncoherent energy detection architecture has lower hardware utilization. In particular, the utilization of DSP blocks is 3, which is only 0.54 % (i.e., 3/555) of that utilized by the normalized correlator architecture.

TABLE IV. COMPARISONS OF HARDWARE UTILIZATION OF THE NORMALIZED CORRELATOR AND NONCOHERENT ENERGY DETECTOR FPGA IMPLEMENTATIONS

|                             | ALUTs | Registers | Memory Bits | DSP Blocks |
|-----------------------------|-------|-----------|-------------|------------|
| Normalized Correlator       | 6059  | 3903      | 0           | 555        |
| Noncoherent Energy Detector | 1488  | 1115      | 0           | 3          |

The proposed architecture is used as a hardware accelerator in a NOC platform for the speed evaluation. The NOC is designed by Altera Qsys 13.1. The NOC consists of a NIOS II softcore processor, an embedded RAM, and the proposed circuit. The noisy spike sequences are stored in the embedded RAM. The NIOS II processor activates the delivery of the spike sequence from the RAM to the proposed circuit for spike detection. Upon the completion of the spike detection operations, it also collects the results of the spike detection for subsequent spike sorting operations. When operating at a clock rate of 50 MHz, the proposed architecture is able to complete the detection operation in 52 ms for a spike sequence with a length of 100 seconds. By contrast, the computation time of its software counterpart running on a 1.7 GHz Intel i7 processor for the same spike sequence is 1.58 seconds. The speedup of the hardware acceleration is therefore 30.38 (i.e., 1.58 seconds vs. 52 ms). All these facts demonstrate the effectiveness of the proposed architecture.
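The quoted speedup follows directly from the two measured run times. The second ratio is an additional observation derived from the same figures, relating the 100-second duration of the spike sequence to the hardware processing time:

$$
\text{speedup} = \frac{t_{\text{software}}}{t_{\text{hardware}}} = \frac{1.58\ \text{s}}{0.052\ \text{s}} \approx 30.38,
\qquad
\frac{T_{\text{sequence}}}{t_{\text{hardware}}} = \frac{100\ \text{s}}{0.052\ \text{s}} \approx 1923 .
$$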

#### V. CONCLUSION

The proposed normalized correlator architecture has been implemented on an FPGA for performance evaluation. Experimental results show that the architecture is effective for spike detection. It has the advantages of high TPR, low FAR, and fast computation. For spike trains with SNR = -3 dB, the proposed normalized correlator is able to achieve a TPR of 82.71 % and a FAR of 1.06 %. In addition, the speedup of the proposed architecture in the NOC operating at 50 MHz over its software counterpart is 30.38. The proposed architecture can also be simplified to a noncoherent energy detector when lower hardware costs are desired at the expense of a slight degradation in detection performance.

#### REFERENCES

- [1] S. Gibson, J. W. Judy, and D. Markovic, "Spike sorting: the first step in decoding the brain," IEEE Signal Processing Magazine, 2012, pp. 124-143.
- [2] M. A. Lebedev and M. A. L. Nicolelis, "Brain-machine interfaces: past, present and future," Trends in Neurosciences, Vol. 29, 2006, pp. 536-546.
- [3] S. Mukhopadhyay and G. C. Ray, "A new interpretation of nonlinear energy operator and its efficacy in spike detection," IEEE Trans. Biomed. Eng., Vol. 45, 1998, pp. 180-187.
- [4] K. Kim and S. Kim, "A wavelet-based method for action potential detection from extracellular neural signal recording with low signal-to-noise ratio," IEEE Trans. Biomed. Eng., Vol. 50, 2003, pp. 999-1011.
- [5] N. Mtetwa and L. S. Smith, "Smoothing and thresholding in neuronal spike detection," Neurocomputing, Vol. 69, 2006, pp. 1366-1370.
- [6] J. Drolet, H. Semmaoui, and M. Sawan, "Low-power energy-based CMOS digital detector for neural recording arrays," IEEE Biomedical Circuits and Systems Conference, 2011, pp. 13-16.
- [7] K. Oweiss and M. Aghagolzadeh, Detection and classification of extracellular action potential recordings, Chapter 2 of Statistical Signal Processing for Neuroscience, 2010, pp. 15-74.
- [8] S. Gibson, J. W. Judy, and D. Markovic, "An FPGA-based platform for accelerated offline spike sorting," Journal of Neuroscience Methods, Vol. 215, 2013, pp. 1-11.
- [9] W. J. Hwang, S. H. Wang, and Y. T. Hsu, "Spike Detection Based on Normalized Correlation with Automatic Template Generation," Sensors, 2014, pp. 11049-11069.
- [10] L. S. Smith and N. Mtetwa, "A tool for synthesizing spike trains with realistic interference," Journal of Neuroscience Methods, Vol. 159, 2007, pp. 170-180.

## **COTS or Custom Made? Design Decisions for Industrial Control Systems**

Falk Salewski

Department of Electrical Engineering and Computer Science Muenster University of Applied Sciences Germany

Email: falk.salewski@fh-muenster.de

*Abstract*—In the area of industrial control systems, the choice between *custom made* (CM) electronics and the use of *commercial off the shelf* (COTS) components is often not trivial, especially when required quantities or specific requirements do not clearly favor one option. From a pure cost point of view (development costs and product costs), a decision might look trivial, but a broader view helps to make a sound decision. In this work, decision criteria and a decision method are presented for industrial control systems targeting COTS devices, CM devices or a combination of both. Moreover, a case study with three industrial control systems is presented showing the application of the approach.

Keywords-commercial off the shelf; electronic design decisions; industrial control units

#### I. INTRODUCTION

In industrial automation, commercial off the shelf (COTS) components such as programmable logic controllers (PLCs) and industrial PCs (IPCs) are widely used as control units (in this paper, we use the following definition of COTS: a COTS device can be bought from a catalog without modifications [1]). In some applications, companies are faced with the decision whether a custom made (CM) design of a control unit might be beneficial for their products and systems. In other applications, a change from a custom made design of control units to COTS components is discussed.

A custom made development often comes with an optimized functionality and an attractive price of the final product, but involves much more than in-house development activities. Especially in case of safety or mission critical systems, it has to be assured that specific requirements (temperature range, failure rate, electrical robustness, etc.) are met over the complete product life cycle (and not only with a prototype during development). While a custom made design allows full control of the final product, all relevant aspects have to be verified. These activities are performed on the basis of prototypes and first series devices, but also have to be reconsidered in case of changes (e.g., obsolete memory chips require replacement).

On the other hand, the use of COTS components often requires more than applying a plug and play procedure. In the example of COTS components in critical applications, it could be required to establish specific relationships with the suppliers and/or to perform additional tests on the COTS components (examples can be found in [1]).

In both cases, the complete life cycle of the product has to be considered for a sound selection. An approach for such a selection is the so called Total Cost of Ownership (TCO) [2] that aims to consider all cost factors of a product during product life. To supplement existing approaches with the required technical data, this paper deals with the differences of the following approaches for industrial control units:

- 1) Commercial off the shelf (COTS)
- 2) Custom made (CM)
- 3) Combination of 1 and 2.

The main focus of this paper is on electronic control units (including their software), but not on pure software products as discussed for example in [3].

As a basis for a systematic selection procedure, we collect relevant selection criteria in the following Section II. Next, the specialties of the three approaches are analyzed based on their product life cycle in Section III. Based on these two sections, a selection procedure is presented in Section IV, followed by a case study in Section V. After a discussion in Section VI, the paper ends with a conclusion.

#### II. TARGETS FOR SELECTION

Before having a closer look at the different approaches, it is necessary to define the key targets to be fulfilled by the devices. Common targets often cited are fast time to market, improved costs and competitive advantages [4]. These competitive advantages describe product properties besides the price and differ between application domains. In previous work, we already identified a set of impact factors for hardware platforms [5]. For this work, we take a system view on the control units (electronics + software + mechanical). Moreover, we assume that the functional requirements are fulfilled for industrial environments in case of all candidates. The resulting set of impacts is presented in Figure 1 and will be further described below.

#### A. Time to Market

A fast *time to market* is an obvious target. As soon as the product is on the market, amortization of non-recurrent costs can start. Moreover, a fast *time to market* can be a competitive advantage over competitors.

#### B. Costs

As with *time to market*, it is an obvious target to keep the *costs* low. However, several aspects have an impact on the overall costs for a product. In case of *recurrent costs*, it is the cost of purchasing or manufacturing the product itself. In addition, license costs for software (drivers, operating systems, etc.) and/or hardware modules (e.g., inclusion of externally developed modules in custom made products) as well as costs resulting from later maintenance activities have to be considered.

Figure 1. Targets for selection of electronic control units

The non-recurrent costs for a custom made control unit include development costs (including costs for prototypes and test activities during development) as well as costs for the preparation of the series production (creation and test of tooling, such as soldering frames, adapters for automatic assembly, programs for test equipment such as automated optical inspection (AOI), in circuit tester (ICT), and/or functional tester, test adapters and specific test electronics). Further non-recurrent costs that also appear for COTS systems are the costs of integrating the electronic control system into the target system as well as those for verification, validation and certification activities (performed before and/or after integration in the target system). Often, at least certification activities are executed on system level, but benefit from pre-certified components. Finally, costs resulting from required documentation activities (product + development process) have to be considered.
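The interplay of recurrent and non-recurrent costs can be illustrated with a deliberately simplified cost model. The sketch below is not a method from this paper: the function names are illustrative, license and maintenance costs are assumed to be folded into the per-unit cost, and all monetary values are hypothetical.

```python
def total_cost(non_recurrent, unit_cost, annual_quantity, years):
    """Simplified total cost: one-time costs plus per-unit costs over the product life."""
    return non_recurrent + unit_cost * annual_quantity * years


def break_even_quantity(nre_cm, unit_cm, nre_cots, unit_cots):
    """Total number of units above which CM becomes cheaper than COTS.

    Returns None if the CM unit cost is not lower than the COTS unit cost,
    i.e., if the higher non-recurrent costs can never be amortized.
    """
    saving_per_unit = unit_cots - unit_cm
    if saving_per_unit <= 0:
        return None
    return max(0.0, (nre_cm - nre_cots) / saving_per_unit)


# Hypothetical figures, for illustration only.
NRE_CM, UNIT_CM = 120_000, 80      # custom made: high one-time, low unit cost
NRE_COTS, UNIT_COTS = 20_000, 280  # COTS: low one-time, high unit cost

units_to_break_even = break_even_quantity(NRE_CM, UNIT_CM, NRE_COTS, UNIT_COTS)
low_volume = (total_cost(NRE_COTS, UNIT_COTS, 50, 5),
              total_cost(NRE_CM, UNIT_CM, 50, 5))
high_volume = (total_cost(NRE_COTS, UNIT_COTS, 1000, 5),
               total_cost(NRE_CM, UNIT_CM, 1000, 5))
```

With these assumed values, the COTS option is cheaper at 50 units per year and the CM option is cheaper at 1000 units per year, which shows why the required quantity is a first indicator but needs to be weighed against the remaining targets.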

#### C. Product Properties

While we assume that all candidates can fulfill the functional requirements, further properties could make a difference.

A first important property is the *availability* of the product (availability in this context is not the operational availability but the possibility to purchase or manufacture the product). For any application, it is important that the required control electronics are available for new products and for replacements of defective units.

As many industrial control electronics perform safety and/or mission critical tasks, their *reliability* and *functional safety* are further important factors. As evaluated in previous work, the choice of the hardware platform has an impact on the safety properties of the overall system [6]. The specific needs have to be analyzed for each application individually.

Security is another important property. Especially the increasing interconnection of industrial automation systems via the internet requires corresponding measures [7], [8], [9]. Additionally, a protection of the *intellectual property* (IP: firmware, electronics, design, etc.) is often desirable to protect one's own products from plagiarism. As with functional safety and reliability, the requirements depend on the individual application.

For applications that evolve during their life time (e.g., an industrial plant undergoing modernization) or those in which a control unit should be applied in several different target applications (perhaps not all of them defined today), it is desirable to work with systems that can be *adapted* to different or changing requirements. Examples are modular PLCs, which allow a variety of different plug-in modules to be added (analog and digital I/O, communication interfaces, special function modules). Another approach is to define major parts of the product via software or reconfigurable hardware (e.g., FPGAs).

While *energy efficiency* of control units was predominantly an issue in mobile and battery powered devices in the past, it is now also an issue in all industrial applications (especially if a high number of control units is applied). Additionally, *size* and/or *weight* is an issue in several applications.

#### D. Customer Perception

Finally, an impact that could be important is the customer perception. While a decision might not be the optimum choice from a technical point of view, it still might be the optimum solution from the customer's perspective. As an example, the use of a COTS device with a good reputation might increase the customer's confidence in the product although it does not differ from alternatives from a technical point of view.

#### III. PRODUCT LIFE CYCLE

In this section, a typical product life cycle is presented in Figure 2 for a design based on COTS control units, a design with custom made control units and a combination of COTS and CM components.

Following accepted processes, the product life cycle starts with a specification. While the creation of a sound specification is a major task, we assume it is already existent for the next step. Based on the specification, an implementation could be realized in the three ways presented above. Additionally, each product life cycle ends with some *end of life* activities, typically decommissioning. As the impact of this phase is considered low for the selection process, end of life activities are not considered in this paper. The following subsections deal with the remaining phases for the three approaches.



Figure 2. Product life cycle for different approaches (length of phases does not necessarily reflect the effort required for this phase)

#### A. COTS

In case of a COTS design, a suitable device has to be selected. The aim is to identify an existing product that fulfills the requirements given in the specification. Moreover, further aspects as those presented above could be important for the selection, although often not explicitly stated in the specification. Depending on the application, it might be useful to reconsider the specification, if no suitable COTS device could be identified. Moreover, the fulfillment of the requirements is often not only determined by the product itself and related aspects (e.g., documentation), but also by the relationship to the supplier of this device (support during integration, operation, maintenance, long time availability, insight into verification and validation activities, willingness to perform further verification and validation activities if needed, etc.). Especially for critical applications, additional verification activities could be required to apply COTS devices (see [1] as an example for military applications). If these verification activities are required and cannot be performed by the supplier, own verification activities have to be performed with the COTS device.

In the next phase, the selected COTS device has to be integrated into the application (for this approach, we assume that no modifications are required to integrate the COTS device). In this phase, the knowledge of the COTS device's properties is of great importance. Gaining this knowledge could be time consuming, but could be eased by support given by the supplier (good documentation, qualified hotline support, tools supporting integration, etc.).

While verification and validation of the control unit itself has already been targeted, it is the overall system that has to fulfill the requirements. Thus, verification and validation activities also have to be performed on system level. Depending on the application, certifications are also required or recommended (e.g., functional safety applications). Several COTS devices come with some pre-certification for certain applications (such as the mentioned safety applications). These pre-certifications typically ease the certification activities on system level.

#### B. Custom Made

The CM approach requires development and manufacturing activities. During development, prototypes are implemented and verified on the basis of the specification. Design decisions have to consider functional aspects, as well as further impacts (see Figure 1). Some aspects for COTS apply here for specific integrated circuits used in the design. They can simplify design and verification activities, but also lead to the challenges listed in the COTS section (e.g., availability). Especially in complex designs, several prototype stages are often required until verification and validation activities are passed successfully. Additionally, an ideal design is optimized for later manufacturing, reducing manufacturing times and tooling costs. Generally speaking, the aim is to deal with the complexity in development and manufacturing [10]. For optimum time-to-market, the preparation for manufacturing is started before the development activities are finished. The required synchronization between development and manufacturing activities is often challenging [11]. Moreover, to determine the start time of preparation activities, a tradeoff between the risk of changes in the product relevant for production and a reduced preparation time is necessary.

In the following steps, optimizations of the manufacturing process take place, mostly to optimize manufacturing time and quality. Integration, verification and validation activities can start with prototypes, but final tests and certification typically require first samples from the serial manufacturing process.

In case of a COTS product, the analysis of defective products, obsolete components or changes in regulatory requirements (e.g., EMC requirements) is typically performed by the supplier. In case of a CM design, this analysis also has to be performed periodically to check if changes in the product are required. While these activities could be outsourced, the effort for these activities has to be considered. Moreover, required changes could result in costly redesign activities (new verification, validation and certification activities might be needed), a risk borne by the supplier in case of COTS components.

#### C. Combination

The process of combining COTS components with a custom made design follows a combination of both processes. Typically, the product core is implemented with a COTS component and the interfaces are custom made, but other parts such as interfaces or power supplies can also be implemented with COTS parts. Thus, during development all aspects of a custom made design have to be followed in addition to a selection of suitable COTS components (lower part of Figure 2, only the differences to the CM process are displayed). While the use of COTS components comes with some challenges to be considered (see section above), it can simplify the remaining development significantly. An example is the use of a COTS single board PC on a custom made printed circuit board (PCB) populated with interface and power supply circuits (and some application specific functions if needed, see also Section V). This combination can simplify the manufacturing process if the main PCB is populated with comparatively simple components only. Furthermore, PC parts tend to become obsolete quickly, a problem now covered by the supplier of the PC board. In addition, the supplier of the PC board benefits from its high production volume. Thus, the resulting price of the PC module could be lower than that of manufacturing low quantities in house.

#### IV. SELECTION PROCEDURE

Based on the targets presented in Section II and the product life cycles presented in the preceding Section III, a systematic selection is feasible. For an objective evaluation, it is recommended to evaluate each factor in a team (at least technical and sales representatives). In case a custom made design might be the desired choice, experts from the area of electronic development and manufacturing should be consulted (internal or external partners). This way, quantitative data can be obtained for costs and time-to-market aspects. However, for reliable data, a sound specification and "trustworthy" experts are required.

Besides costs and time to market, the targets are of qualitative nature. While a qualitative analysis is probably sufficient in many cases, a rating system can be applied to all qualitative aspects if needed (e.g., rating of product availability from 1 to 10), for example in the form of a decision matrix. The rating can be agreed on in the team or it can be built from a set of individual ratings. Further approaches to this so called *multi-criteria decision analysis* (MCDA) can be found in the literature (e.g., [12]).
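A decision matrix of this kind can be sketched as a weighted sum of ratings. The example below is one simple MCDA variant chosen for illustration, not a procedure prescribed in this paper; the criteria, weights and ratings are hypothetical, and the function names are assumptions of this sketch.

```python
from statistics import mean


def combine_ratings(individual_ratings):
    """Average the 1-10 ratings of several team members, per criterion."""
    criteria = individual_ratings[0].keys()
    return {c: mean(r[c] for r in individual_ratings) for c in criteria}


def weighted_score(weights, ratings):
    """Weighted average of the ratings (higher is better)."""
    total_weight = sum(weights.values())
    return sum(weights[c] * ratings[c] for c in weights) / total_weight


# Hypothetical weights and ratings (1 = poor, 10 = excellent).
weights = {"availability": 3, "adaptability": 2, "size": 1}
candidates = {
    # Built from two individual ratings.
    "COTS": combine_ratings([
        {"availability": 8, "adaptability": 9, "size": 4},
        {"availability": 6, "adaptability": 7, "size": 4},
    ]),
    # Agreed on directly in the team.
    "CM": {"availability": 5, "adaptability": 4, "size": 9},
}

scores = {name: weighted_score(weights, r) for name, r in candidates.items()}
best = max(scores, key=scores.get)
```

The weights make explicit how important each target is for the application at hand, so changing them (for example, giving size a high weight for a space-constrained device) can change which candidate scores best.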

#### V. CASE STUDY

In this section, three existing control units are evaluated based on the criteria defined before. The emphasis of the following description is on the properties of the selected system and not on the selection process (devices already exist).

#### A. Three Control Units

1. Machine for sorting metal parts: The control unit is required to switch electric motors and pneumatic valves and read several position sensors and an analogue input for measuring the metal parts. Moreover, the status of the machine has to be displayed on a screen. The volume of this machine is  $\leq 50$ per year.

2. User terminal for embroidery machine: The control unit has to read the required embroidery pattern from a USB stick and display it on the screen of the terminal. Moreover, user commands have to be read from the terminal. A set of commands is computed and sent to the embroidery machine via a proprietary interface. The volume is 800 units per year.

3. Window control unit: This electronic unit has to control a DC motor (PWM, encoders) based on sensor information and a proprietary bus interface. Moreover, the available space for this device is limited to 100 x 40 x 18 mm. The volume here is  $\geq 1000$  units per year.

#### B. Evaluation

An overview of the evaluation can be found in Figure 3 while details will be described below.

1) Case A: The low quantity of required products indicates a COTS device as the best choice. However, a conflict could arise from the remaining targets. The non-recurrent costs, as well as the required time to market, clearly benefit from the use of a COTS component. The recurrent price is probably higher than with a custom made approach, but a quantity of 50 units per year in most cases does not allow the non-recurrent costs of a custom made design incl. verification to be amortized. Finally, product properties have to be considered. Size and weight targets, as well as energy efficiency, which could be a tough challenge for COTS approaches, are not critical here. For this application, a modular programmable logic controller (PLC) has been chosen. This approach allows the control units to be adapted in case of later changes. Moreover, it allows similar approaches to be used in different machines. During the selection of the device, the availability of this device or potential replacements is crucial. Well established systems, as well as individual contracts, can mitigate the risks. Additionally, the use of standardized components (including the programming languages) eases the migration to alternative systems when needed. Finally, no specific safety, security or reliability requirements were given in this application. However, specific PLC systems targeting these requirements are available. Based on this brief evaluation, a COTS approach is the optimum solution for this application.

2) Case B: In this application, the need for a proprietary interface requires at least some custom made design. Moreover, the visualization requirements for the terminal screen require a certain amount of processing power. In this application, a COTS processor board was chosen in combination with a custom made main board implementing the power supply and required interfaces. The use of the COTS board was driven by the following aspects:

- simplifies design and manufacturing of main board (no fine pitch components, less high speed design)
- in required quantities, COTS board has an attractive price compared to CM approach.

| | Case | Case A | Case B | Case C |
|---|---|---|---|---|
| | Description | Machine for sorting metal parts | User terminal for embroidery machine | Window control unit |
| | Assumed annual quantity | ≤ 50 | 800 | ≥ 1000 |
| | Targets $\downarrow$ Choice $\rightarrow$ | COTS | COTS + CM | CM |
| Recurring costs | Product price | high, but due to the low quantity best with a COTS device (here PLC) | combination of a COTS processor board with a custom made main board allows a competitive product price | custom made design allows a cost optimized approach (for given constraints) |
| Recurring costs | Licenses | no licence for operation | open source operating system | none |
| Recurring costs | Maintenance | diagnosis features supported by PLC, modular PLC allows replacement/repair of modules | individual repair/replacement of processor and main board possible, maintenance features have to be custom made | diagnosis features implemented via bus interface |
| Non-recurring costs | Development | HW: only selection & integration<br>SW: based on PLC operating system => application only | HW: only main board + selection of processor board & integration<br>SW: operating system has to be adapted to custom design + application SW | full development of electronics and software |
| Non-recurring costs | Manufacturing setup | none | manufacturing of main board + integration of processor board + test in manufacturing; separate processor board, no fine pitch devices on main board => simplifies manufacturing process | full manufacturing setup incl. test required |
| Non-recurring costs | Integration | HW setup with COTS IDE + wiring of sensors and actuators | 1) main and processor board<br>2) operating system and HW<br>3) application | HW/SW integration in development, integration with remaining system via bus interface |
| Non-recurring costs | V&V | focus on SW + overall system | complete system | complete system |
| Non-recurring costs | Certification | not required for control unit | EMC test for CE marking | EMC test for CE marking, further tests with complete system |
| Non-recurring costs | Documentation | SW + wiring (hardware configuration saved in project data) | full documentation, existing documentation for processor board and operating system can be included | full documentation |
| Product properties | Availability | depends on PLC supplier, long term industrial availability provided | depends on supplier of processor board (long term contract), processor board can be replaced (redesign of main board + comparable alternative processor board) | depends only on components used, obsolescence can be handled with 2nd source components, if needed in combination with redesign (HW or HW+SW) |
| Product properties | Reliability & Safety | no specific requirements, COTS HW is assumed to be well tested, COTS devices typically = black box, but reliability and safety data is available for certain devices | complete reliability analysis possible for main board, data for processor board available from supplier. No specific safety requirements (implementation on main board could be an option if required) | complete reliability analysis possible for electronics. Specific safety requirements could be implemented in SW and HW (emergency stop, life beat) |
| Product properties | Security & IP protection | supported, setting via COTS IDE | processor supports protection of program memory | processor supports protection of program memory |
| Product properties | Adaptability | modular PLC system allows further modules to be added (I/O, special function), other devices can be added via bus interface | full control of SW, custom main board allows adaptations, but these changes require redesigns of the hardware (incl. verification and certification) | full control of SW, full control of HW, but changes require redesign (incl. verification and certification) |
| Product properties | Energy efficiency | COTS devices with acceptable energy efficiency are available | the custom made design and the selection of a suitable processor board allow an optimized design | standby < 0.4 W => low power controller in combination with suitable HW and SW design (sleep modes) |
| Product properties | Size & Weight | no specific requirements | size of PCB determined by 10" screen (not critical), no specific weight requirements | critical => only achievable with custom design |
| | Time to market | fast (weeks) | medium (months), with COTS processor board the SW development can start before custom made HW is ready, risk of design iterations | medium-high (months), with evaluation board the SW development can start before custom made HW is ready, risk of design iterations |
| | Customer perception | selected brand of COTS device supports image of high quality product | | customized solution allows the targets for size and product price to be met |

Figure 3. Case Study

- components such as memory chips change frequently. In the COTS approach, the qualification of new chips is done by the supplier.
- a complete COTS user terminal in combination with an interface converter resulted in a significantly higher product price.

Also the remaining cost related factors show no disadvantage of this approach compared to a full custom made design. With respect to time to market this approach benefits from the COTS components in comparison to the CM approach, as a major part of the design could be implemented as a pretested module. The product properties are influenced as follows:

As the COTS board has a major impact on the availability, a long term contract was set up with the supplier. Nevertheless, a migration to another processor board is possible (this probably involves a redesign). Reliability analysis is possible as the complete design is known. Optimizations could have been performed if required, as could the implementation of safety functions on the main board. Protection of the program memory is supported by the processor; no further security or IP protection requirements exist. Adaptability can be achieved by modifications of the main board. However, this approach requires redesigns (incl. verification activities). In this application, all modifications are expected to be handled via SW. Customization allows optimization of energy, size and weight properties. However, none of these are considered critical here.

Finally, a custom made design allows significant separation from competitors (customer perception). In summary, the application benefits from the chosen combination of COTS and CM.

3) Case C: Size and product price restrictions have a major impact on this application and could not be fulfilled with available COTS components. The non-recurring costs for the required design and manufacturing activities are significantly higher than with a COTS approach, but could be amortized by the expected quantity in an acceptable period. Costs for verification and certification activities could be kept at a moderate level, as the complete system was already undergoing sufficient procedures. With full control of the HW and SW design, specific project properties could be fulfilled. The time to market was long (almost a year) compared to a COTS approach, but not critical, as the development of the complete system took a similar time.
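The amortization argument used in Cases A and C can be made concrete with a simple break-even calculation. The following is a minimal sketch, not part of the case study: the function names and all cost figures are hypothetical and only illustrate how a higher non-recurring cost of a CM design is offset by a lower recurring cost per unit.

```python
import math


def total_cost(non_recurring, per_unit, quantity):
    """Total cost of one approach for a given number of units."""
    return non_recurring + per_unit * quantity


def break_even_quantity(nre_cots, unit_cots, nre_cm, unit_cm):
    """Smallest quantity at which CM is no more expensive than COTS.

    Returns None if the CM design never catches up.
    """
    extra_nre = nre_cm - nre_cots          # additional one-time effort for CM
    saving_per_unit = unit_cots - unit_cm  # recurring saving per unit with CM
    if extra_nre <= 0:
        return 0
    if saving_per_unit <= 0:
        return None
    return math.ceil(extra_nre / saving_per_unit)


# Hypothetical figures (currency units), chosen for illustration only.
n = break_even_quantity(nre_cots=5_000, unit_cots=400, nre_cm=80_000, unit_cm=60)
print(f"CM pays off from {n} units onwards")
```

With these assumed figures, a quantity of 50 units stays far below the break-even point, while an annual quantity of 1000 units exceeds it within the first year. Cost is only one of the targets, so such a calculation complements the qualitative evaluation rather than replacing it.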

#### VI. DISCUSSION

While the emphasis of the presented case study is on the differences between the three cases, the compiled data can be used as the basis for a systematic decision process. Even in a purely qualitative approach, the collection and evaluation of the proposed targets and the consideration of the design process prevent important factors from being neglected during the decision process. If required, quantitative approaches as described in Section IV can be applied to further formalize the selection.

A first impression could be that the decision for or against COTS devices is solely driven by the quantity of the required units. Certainly, in extreme cases (fewer than 10 units, more than 100000 units) the decision is probably simple. However, for medium quantities, and depending on further targets to be fulfilled by the control unit, the decision process differs. As an example, a product with a quantity of 1000 units/year could be better implemented with COTS (a high volume product that perfectly matches the requirements), and a unit only needed a few hundred times a year might be better implemented as CM (e.g., when other targets do not allow a pure COTS approach). Additionally, the importance of the different targets could be rated very differently for different applications. If, for example, the availability of a product is rated very high and a CM design is possible with standard components (all with 2nd source), a high independence from suppliers could be achieved by a CM design.
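One way to express differently rated targets is a weighted sum model, one of the multi-criteria decision making methods. The sketch below is an illustration under stated assumptions: the target names, the 1 to 5 scores and both weight sets are hypothetical and are not taken from the case study.

```python
def weighted_sum(scores, weights):
    """Normalized weighted sum of target scores (higher is better)."""
    total_weight = sum(weights.values())
    return sum(weights[t] * scores[t] for t in weights) / total_weight


def best_option(options, weights):
    """Name of the option with the highest weighted sum."""
    return max(options, key=lambda name: weighted_sum(options[name], weights))


# Hypothetical scores from 1 (poor) to 5 (good) for each target.
options = {
    "COTS":      {"cost": 2, "time_to_market": 5, "availability": 3, "size_weight": 1},
    "COTS + CM": {"cost": 4, "time_to_market": 3, "availability": 3, "size_weight": 3},
    "CM":        {"cost": 5, "time_to_market": 2, "availability": 5, "size_weight": 5},
}

# Two hypothetical applications that rate the same targets differently.
weights_fast = {"cost": 2, "time_to_market": 6, "availability": 2, "size_weight": 1}
weights_small = {"cost": 4, "time_to_market": 1, "availability": 2, "size_weight": 5}

for label, weights in (("fast delivery", weights_fast), ("small and cheap", weights_small)):
    print(label, "->", best_option(options, weights))
```

The same scores lead to different choices once the weights change, which mirrors the observation that quantity alone does not determine the decision.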

#### VII. CONCLUSION

Comparing COTS and CM approaches (or combinations of both) requires more than just an analysis of cost and time to market. Moreover, the overall costs (recurring and non-recurring) are compiled from several aspects. This paper presents a set of important targets to be considered in the decision process, as well as impacts on the product life cycle of the different approaches. A systematic selection process can be based on this evaluation as demonstrated in a case study with three industrial control units.

#### REFERENCES

- [1] J. Hall and R. Naff, "The cost of COTS," in Proceedings of the Digital Avionics Systems Conference. IEEE, 2000, pp. 20–24.
- [2] F. Wynstra and K. Hurkens, "Total cost and total value of ownership," in Perspektiven des Supply Management, M. Eßig, Ed. Springer Berlin Heidelberg, 2005, pp. 463–482.
- [3] K. Megas, G. Belli, W. B. Frakes, J. Urbano, and R. Anguswamy, "A study of COTS integration projects: Product characteristics, organization, and life cycle models," in Proceedings of the 28th Annual ACM Symposium on Applied Computing, ser. SAC '13. New York, NY, USA: ACM, 2013, pp. 1025–1030.
- [4] E. R. Hnatek, Practical Reliability of Electronic Equipment and Products. Marcel Dekker, 2003.
- [5] F. Salewski and S. Kowalewski, "Hardware platform design decisions in embedded systems - a systematic teaching approach," in Special Issue on the Second Workshop on Embedded System Education (WESE), vol. 4, no. 1, SIGBED Review. ACM, Jan. 2007, pp. 27–35.
- [6] F. Salewski and S. Kowalewski, "The effect of hardware platform selection on safety-critical software in embedded systems: Empirical evaluations," in IEEE Symposium on Industrial Embedded Systems (SIES'07). IEEE, July 2007, pp. 78–85.
- [7] E. Byres and J. Lowe, "The myths and facts behind cyber security risks for industrial control systems," in Proceedings of the VDE Kongress, vol. 116, 2004, pp. 213–218.
- [8] K. Stouffer, J. Falco, and K. Scarfone, "Guide to industrial control systems (ICS) security," NIST Special Publication 800-82, 2011.
- [9] C. Alcaraz, R. Roman, P. Najera, and J. Lopez, "Security of industrial sensor network-based remote substations in the context of the internet of things," Ad Hoc Networks, vol. 11, no. 3, 2013, pp. 1091–1104.
- [10] W. ElMaraghy, H. ElMaraghy, T. Tomiyama, and L. Monostori, "Complexity in engineering design and manufacturing," CIRP Annals - Manufacturing Technology, vol. 61, no. 2, 2012, pp. 793–814.
- [11] E. Puik, P. Gielen, D. Telgen, L. van Moergestel, and D. Ceglarek, "A generic systems engineering method for concurrent development of products and manufacturing equipment," in Precision Assembly Technologies and Systems, ser. IFIP Advances in Information and Communication Technology, S. Ratchev, Ed. Springer Berlin Heidelberg, 2014, vol. 435, pp. 139–146.
- [12] E. Triantaphyllou, "Multi-criteria decision making methods: A comparative study," Applied Optimization, vol. 44, 2000.

## Design Guidelines for Designing High Gain Patch Antenna in the Ku-band

Qasim Umar Khan National University of Sciences and Technology, (College of Electrical and Mechanical Engineering) Islamabad, Pakistan e-mail: qasimumarkhan@gmail.com qasimumar.khan@ceme.nust.edu.pk

Mojeeb Bin Ihsan National University of Sciences and Technology, (College of Electrical and Mechanical Engineering) Islamabad, Pakistan e-mail: mojeeb-eme@nust.edu.pk

Abstract- In this paper, a general method is proposed to design patch antennas with high gain in the Ku-band. The design method is illustrated with the help of two examples. It is shown that high gain antennas can be designed with satisfactory performance in the desired band. The designed antennas can be used in applications relating to the Ku-band such as satellite communication, radar, point to point communication, etc.

Keywords- Antenna radiation patterns; patch antenna; high gain antenna.

#### I. INTRODUCTION

Patch antennas have been studied rigorously and widely for the last three decades. This class of antenna offers advantages such as low cost, light weight, easy fabrication and conformability, but these antennas have narrow bandwidth and low gain [1]. Various approaches have been used to overcome the problems of narrow bandwidth and low gain [2]. Recently, the authors proposed the use of partial Koch boundaries to improve the gain of a triangular patch antenna [3]. Similarly, a star shaped patch antenna has been designed to overcome the problem of low gain at higher order modes in the Ku-band [4][5]. The design of the star shaped antenna in [4][5] is generalized in this paper, and guidelines are provided for the design procedure at any frequency in the Ku-band. Furthermore, the use of slots to improve antenna performance is discussed by the authors in [6]. This paper generalizes the design procedure of [4]-[6] to design a single layer patch that achieves high gain at higher order modes without having to resort to air gaps, parasitic patches, superstrates, stacked patches or arrays. The design guidelines have been verified by designing two antennas in the bands 16 GHz-16.3 GHz [4]-[6] and 24 GHz-24.25 GHz. These two frequency bands have been selected for their usage in practical applications like point to point communication, radar, amateur radio and radiolocation services, and because of the growing trend towards the usage of higher microwave frequencies. The paper is organized as follows: Section 2 presents the existing literature on high gain antennas. Section 3 presents the proposed method guidelines, and Section 4 presents a design example based on the proposed method. The discussion on the design is presented in Section 5, while the conclusion is drawn in Section 6.

#### II. STATE OF THE ART

The low gain problem is tackled by different techniques. These techniques can be broadly classified into the following categories: 1) use of substrates-superstrates, 2) use of stacked/parasitic patches, 3) lossless feeding techniques, and 4) higher order mode excitation. Among these, categories 2 and 3 were initially used to tackle the bandwidth problem, but later they were also used for gain enhancement. Alexopoulos et al. [7][8] presented the resonance conditions for a substrate-superstrate printed antenna geometry for high gain. Lee et al. [9] used same-size parasitic elements to achieve high gain in addition to bandwidth. Parasitic elements obtain energy from the fed patch by near field coupling and function as radiating elements in an array. Along with the parasitic elements concept, the stacked elements concept [10] is used to achieve high gain. The L-probe feed [11] has been shown to increase the bandwidth and gain of the fed patch. The L-probe acts as a series resonant element with a resonant frequency close to the dominant mode, thereby increasing the gain of the patch. The use of higher order modes of patch antennas to increase gain, however, has been discussed very little. This is due to the difficulty and challenge associated with the excitation of higher order modes in given dimensions of the patch. Furthermore, high side lobe levels, undesirable radiation patterns and narrow impedance bandwidth are associated with higher order modes [1]. The star shaped antenna alleviates this problem and hence facilitates the antenna design without having to resort to complex and thick designs.

#### III. PROPOSED METHOD GUIDELINES

The equation relating the resonant frequency and the patch dimensions at particular mode [1] is given below.

$$f_{mn} = \frac{k_{mn}c}{2\pi\sqrt{\varepsilon_r}}$$
(1)

where $\varepsilon_r$ is the relative dielectric constant, c is the speed of light and $k_{mn}$ is the wave number of the mn<sup>th</sup> mode, given as:

$$k_{mn} = \sqrt{\left(\frac{m\pi}{L}\right)^2 + \left(\frac{n\pi}{L}\right)^2}$$

where L is the side length of the square patch. Using Eq. (1), one can find the dimension L by plugging in the mode numbers m and n, the desired frequency (here $f_{70}$) and the relative dielectric constant. With the calculated dimension L, the dominant mode frequency is then easily found for the initial design. This is explained by two examples. Example 1 (frequency = 16 GHz, dielectric constant = 2.33, m = 7, n = 0): the dimension L comes out to be 42.12 mm. Putting L = 42.12 mm and the dominant mode m = 1, n = 0, the frequency is found to be 2.4 GHz for a dielectric constant of 2.33. Example 2 (frequency = 24 GHz, dielectric constant = 2.33, m = 7, n = 0): L is found to be 28.6 mm and the dominant mode frequency is 3.4 GHz. These calculations are summarized in Table 1.

TABLE 1 FREQUENCIES AND DIMENSIONS CALCULATION SUMMARY

| Freq $f_{70}$ | $\varepsilon_r$ | Length L | Freq $f_{10}$ |
|----------------------|------|----------|----------------------|
| 16GHz                | 2.33 | 42.12mm  | 2.4GHz               |
| 24GHz                | 2.33 | 28.6mm   | 3.4GHz               |
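The calculation behind Table 1 can be sketched in a few lines of Python. This is a minimal illustration that evaluates Eq. (1) and the wave number expression exactly as written above, with no fringing or other correction; the function names are assumptions introduced here and do not come from the paper.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def wave_number(m, n, length):
    """Wave number k_mn of a square patch with side `length` in metres."""
    return math.hypot(m * math.pi / length, n * math.pi / length)


def resonant_frequency(m, n, length, eps_r):
    """Eq. (1): f_mn = k_mn * c / (2 * pi * sqrt(eps_r)), in Hz."""
    return wave_number(m, n, length) * C / (2 * math.pi * math.sqrt(eps_r))


def side_length(m, n, freq, eps_r):
    """Eq. (1) solved for the side length L, in metres."""
    return C * math.hypot(m, n) / (2 * freq * math.sqrt(eps_r))


for f70 in (16e9, 24e9):
    L = side_length(7, 0, f70, 2.33)          # size the patch for the (7,0) mode
    f10 = resonant_frequency(1, 0, L, 2.33)   # dominant mode of the same patch
    print(f"f70 = {f70 / 1e9:.0f} GHz: L = {L * 1e3:.2f} mm, f10 = {f10 / 1e9:.2f} GHz")
```

For the 24 GHz case this direct evaluation gives roughly 28.6 mm and 3.43 GHz, in line with Table 1. For the 16 GHz case it gives about 43.0 mm and 2.29 GHz, slightly different from the tabulated 42.12 mm and 2.4 GHz; the text does not state what accounts for the difference.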

The calculated dominant mode frequencies are used for the initial design of the square patch [1], which resonates at the $f_{70}$ frequency once it has undergone the surface modification [4]-[6]. The initial antenna design [4] and its improvement [6] have been reported by the authors; however, the design procedure was not generalized in [4]-[6]. Here we present the general design guidelines, which are as follows:

- 1. Using the method listed above, calculate the dominant frequency from the desired frequency $f_{70}$.
- 2. Design a square patch at the dominant frequency with side length L [1].
- 3. Cut squares of side L/4 from the four edges of the square patch designed in step 2, resulting in the cross shaped antenna [4].
- 4. Cut L/2 equilateral triangles from the center of the cross shaped antenna to obtain the star shaped antenna [4].

These guidelines are illustrated in Fig. 1 (left).

#### IV. EXAMPLES TO ILLUSTRATE DESIGN PROCEDURE AND SIMULATION RESULTS

Example 1 can be found in [4][5], where the simulated and measured results are discussed. Here, as Example 2, the square patch is designed at 3.4 GHz, as calculated in Table 1, on an RT/duroid 5870 substrate with a relative permittivity of 2.33 and a height h = 1.5 mm. The antenna is fed with a coaxial probe. The design of the 24 GHz star antenna follows the guidelines listed above. The final design is shown in Fig. 1 (right), and its S<sub>11</sub> parameter and radiation patterns at 24 GHz are shown in Fig. 2 (left) and Fig. 2 (right), respectively. The simulated gain at 24 GHz is found to be 12.9 dBi. The other resonant modes in Fig. 2 (left) can be suppressed by the use of suitable slots [6], which in turn enhances the antenna performance.

#### V. DISCUSSION AND EXPLANATION OF HIGH GAIN

For an antenna of the same size, increasing the frequency decreases the wavelength, and hence the gain of the antenna increases. This can be verified through Eq. (2) [1]:

$$G = \frac{4\pi A_{eff}}{\lambda^2}$$
(2)

where  $A_{eff}$  is the effective aperture area.
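The frequency dependence in Eq. (2) can be checked numerically. The sketch below is an idealization: it assumes that the effective aperture $A_{eff}$ stays the same when the patch is operated at $f_{70}$ instead of $f_{10}$, which a real patch will not satisfy exactly, and the aperture value itself is hypothetical.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def aperture_gain(a_eff, freq):
    """Eq. (2): G = 4 * pi * A_eff / lambda^2 (linear, not in dB)."""
    wavelength = C / freq
    return 4 * math.pi * a_eff / wavelength ** 2


def to_db(ratio):
    """Convert a linear power ratio to decibels."""
    return 10 * math.log10(ratio)


a_eff = 1e-4      # hypothetical effective aperture of 1 cm^2, held constant
f70 = 24e9        # operating frequency of Example 2
f10 = f70 / 7     # dominant mode frequency of the same patch, from Eq. (1)

gain_step_db = to_db(aperture_gain(a_eff, f70) / aperture_gain(a_eff, f10))
print(f"gain increase from f10 to f70 at fixed aperture: {gain_step_db:.1f} dB")
```

Under this assumption the gain scales with the square of the frequency, so moving from the (1,0) to the (7,0) mode corresponds to a factor of 49. This is an upper-bound style estimate rather than a prediction of the simulated gain.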

A basic square patch can be viewed as a two element array, consisting of two radiating edges of length $\lambda/2$ separated by a distance $\lambda/2$ [6]. For a square patch resonating at the $f_{70}$ frequency, the distance between the radiating edges becomes greater than $\lambda/2$. However, when the surface modification leading to the star shaped antenna is applied, this distance decreases, so the surface current densities become favorably in phase, leading to high gain [6].

Microstrip patch antennas have been studied extensively in the last three decades to alleviate the problems of low gain and narrow bandwidth. However, the higher order modes of patch antennas have not been discussed significantly, due to the difficulty of their excitation and the deteriorated performance associated with them. The approach of surface modification leading to the star shaped patch antenna provides a method to excite the higher order modes properly. Furthermore, the performance of the antenna is highly improved in terms of gain, radiation patterns and bandwidth. Thus, the design procedure facilitates the design of high gain patch antennas at higher frequencies by utilizing higher order modes, without resorting to the complex designs and techniques already reported in the literature.

#### VI. CONCLUSION

The paper presents a generalized method to design a high gain patch antenna suitable for applications relating to the Ku-band. The method is verified through two design examples. The performance of the designed antennas can be further improved by the use of suitable slots. The proposed method alleviates the problems associated with higher order modes, such as unstable radiation patterns and high side lobe levels (SLLs). The method provides a procedure, based on surface modification of the patch, to excite higher order modes in the patch and hence achieve high gain without complex designs.

#### REFERENCES

- [1] C. A. Balanis, Antenna Theory: Analysis and Design, 3rd ed. Wiley-Interscience, 2005, ch. 14.
- [2] I. J. Bahl and P. Bhartia, Microstrip Antennas. Norwood, MA: Artech House, 2001.
- [3] D. Fazal, Q. U. Khan, and M. B. Ihsan, "Use of partial Koch boundaries for improved return loss, gain and sidelobe levels of triangular patch antenna," IET Electronics Letters, vol. 48, no. 15, July 2012, pp. 902-903.
- [4] Q. U. Khan and M. B. Ihsan, "Higher order mode excitation for high gain microstrip patch antenna," AEUE - International Journal of Electronics and Communications, vol. 68, no. 11, Nov. 2014, pp. 1073-1077.
- [5] Q. U. Khan and M. B. Ihsan, "A new microstrip star shaped patch antenna," in Proceedings of IEEE TENCON Spring Conference, 2013, pp. 53-56.
- [6] Q. U. Khan, D. Fazal, and M. B. Ihsan, "Use of slots to improve performance of patch in terms of gain and side lobes reduction," IEEE Antennas and Wireless Propagation Letters, vol. 14, Oct. 2014, pp. 422-425, doi: 10.1109/LAWP.2014.2365588.
- [7] N. G. Alexopoulos and D. R. Jackson, "Fundamental superstrate (cover) effects on printed circuit antennas," IEEE Trans. Antennas Propagat., vol. 32, no. 8, 1984.
- [8] N. G. Alexopoulos and D. R. Jackson, "Gain enhancement methods for printed circuit antennas," IEEE Trans. Antennas Propagat., vol. AP-33, Sept. 1985, pp. 976-987.
- [9] R. Q. Lee, R. Acosta, and K. F. Lee, "Radiation characteristics of microstrip arrays with parasitic elements," Electron. Lett., vol. 23, 1987, pp. 835-837.
- [10] M. T. Islam, M. N. Shakib, and N. Misran, "High gain microstrip patch antenna," Euro. Jour. of Scientific Research, vol. 32, no. 2, 2009, pp. 187-193.
- [11] C. L. Mak, K. M. Luk, and K. F. Lee, "Experimental study of a microstrip patch antenna with an L-shaped probe," IEEE Trans. Antennas Propagat., vol. 48, no. 5, May 2000, pp. 777–783.



Figure 1. (left) Design procedure illustration, (right) designed star shaped antenna at 24 GHz

Figure 2. (left) Simulated S<sub>11</sub> of the star shaped antenna at 24 GHz, (right) radiation patterns of the star patch at 24 GHz

## Filtering of Magnetic Noise Induced in Magnetometers by Motors of Micro-Rotary Aerial Vehicle.

Nathan J. Unwin, Adam J. Postula School of Information Technology and Electrical Engineering University of Queensland Brisbane, Australia email: n.unwin@uq.edu.au, a.postula@uq.edu.au

Abstract— Avionics systems of micro aerial vehicles (MAV) pose unique problems in system design, sensor signal handling and control. This is evident in micro-rotary aircraft as their whole body rotates with the sensors of the flight control. The precise calculation of attitude and heading from magnetometer readings is complex due to the body rotation. It is made even more difficult by noise induced in the geomagnetic signal by fluctuating magnetic field of the closely positioned motors. Filtering that noise is challenging since the rotation speed of motors and the vehicle can be very close. This paper presents analysis of motor induced noise, based on experimental data of brushless micro motors. A novel time domain filter is proposed, designed, implemented in FPGA hardware, tested and compared to other filters. This filter provides good performance even when the rotational rate of the motor and vehicle are close and traditional frequency domain filters would perform poorly.

#### Keywords - magnetic noise, magnetometer, rotary body UAV

#### I. INTRODUCTION

A rotary body aircraft is unique in that it is both a rotary wing and a fixed wing aircraft, producing lift by spinning like a maple seed. The Papin-Rouilly Gyroptère [1], built in 1915 as a manned airplane, is the first example of a "monocopter", a type of rotary body aircraft. While the Gyroptère did not fly, it is the basis for contemporary designs of rotary body unmanned micro-aircraft. Figure 1 shows some of the latest developments of such micro-aircraft in industry and academia [2], [3], [4], [5].

Figure 1. a. Lockheed-Martin Samarai prototype [2], b. Lockheed-Martin patent drawing [3], c. University of Maryland aircraft [4], d. University of Queensland aircraft [5].

The interesting property of the rotary body aircraft is that the core set of sensors of the flight control system, the inertial measurement unit containing the magnetometer, is always rotating as it is affixed to the body of the aircraft. While this is not a problem for the sensors, it is an issue for calculation of the attitude and heading of the vehicle since this rotation must be filtered out of the geomagnetic signal. This is compounded by the relatively high rotation rate of these vehicles of up to 10 Hz [6], [7].

As the scale of a monocopter decreases, the speed at which it rotates needs to increase if efficiency of flight is to be maintained [7], [8]. In a small, fast spinning aircraft, the on-board magnetometer is placed close to the motors and is exposed to a high level of magnetic noise generated by the motors, which rotate at a speed close to the spin rate of the vehicle. Filtering that noise with traditional frequency domain filters is difficult since the frequency separation between noise and signal is small, necessitating a complex high order filter and raising the question of whether a standard frequency based filter could be effective at all.

This paper presents an alternative: using a recorded or constructed signal to null the signal generated by the motor.

The most widely used motor for micro aerial vehicles is the BrushLess Direct Current (BLDC) motor. The brushless DC motor is a permanent magnet synchronous motor designed to be used with a square wave input generated by a DC powered speed controller [9]. The motor is comprised of a permanently magnetised "rotor" that rotates and the electro-magnetic "stator" that remains stationary. This paper focuses on the most popular out-runner motor type where the rotor is positioned around the outside of the stator as shown in Figure 2.

The out-runner motor has a number of magnets arranged with alternating poles on the faces of a ring outside the stator. The ring is connected to a centre shaft that runs inside the stator, so that the stator is sandwiched between the rotor and the shaft [10], [11].



Figure 2. 1. BLDC simplified configuration. 2: Illustration of localised demagnetisation (not to scale).

As the motor rotates, the magnets are presented to different parts of the stator. By energising the windings to pull the magnet towards the winding or inverting the power to winding to push the magnet away, torque is applied to the rotor [12].

The same movement of the magnets generates an alternating field outside the rotor. This field is used by some speed controllers to sense the position of the rotor to determine the optimum way to energise the stator at that instant. This field also is measured by magnetometers as noise superimposed on the geomagnetic measurements.

The strength of the field is dependent on the construction of the magnet, size of the magnet, construction around the magnet and distance to the magnet.

Permanent magnets exhibit a tendency to demagnetise over time. Demagnetisation exhibits relationships with temperature, time and subjected magnetic fields [13], [14]. However it has been noted that there is behaviour where regions of the magnet will demagnetise in preference to surrounding regions resulting in poles with non-uniform strength within the pole region.

Localised demagnetisation is of interest for sensing applications as it adds higher frequency components to the signal generated by the rotation of the motor. Frequencies of these components are approximately odd multiples of the number of poles. As each magnet may not deteriorate identically, the frequency multiple may not be an integer value (but as it is a periodic signal, will be a waveform with an even number of poles).

This paper is structured as follows: Section II provides an analysis of the noise generated by motors, Section III analyses options for filtering and outlines the design of the time domain inverse filter, Section IV discusses filter performance, Section V examines design options and limitations, and Section VI concludes and outlines possible extensions to this work.

#### II. EXTERNAL MAGNETIC FIELD OF PERMANENT MAGNET SYNCHRONOUS MOTORS

Before a method of correction could be attempted, the properties of the interference due to the motors needed to be determined. To do this, a motor was operated with a moderate load and the resulting magnetic interference was recorded by a magnetometer in close proximity. The experiments were conducted on a number of small motors of different types, such as the Cyclone 440, Scanner RC SCM3213-1750, Turnigy C2826-1650 and Turnigy C3542-1100, with wear ranging from brand new to a few hours of continuous load (typical for micro aerial vehicles). The effects of the motor were measured at various speeds. During this test it was noted that the noise was fairly constant across the different speeds. Representative measurement results are shown in Figure 3.

Figure 3. Raw measured magnetic field of rotating motor

The external magnetic field with the motor running (for the tested motor) is approximately 0.05 gauss. The field generated purely due to permanent magnetic field is also approximately 0.05 gauss for this motor. The above observation indicates that the field measurements can be performed with the motor rotated by an external drive e.g. a stepper motor. Such an arrangement allows for much better control and more precise measurements.

The result of the magnetic field measurement is a periodic waveform, presented in Figure 4. It can be observed that the waveform appears to have both higher and lower frequency components.



Figure 4. Example magnetic field measurement with a rotating motor

Closer analysis using the Fast Fourier Transform identifies three dominant frequency components, as shown in Figure 5.

The lowest frequency component $f_1$ corresponds to the speed of the motor. From this it can be established that the motor forms a pair of strong poles, possibly due to imbalances in the magnets. This pair of poles forms the strongest field present in the motor when considering only the permanent magnetic field.

The medium frequency component $f_7$ corresponds to the magnets embedded inside the rotor of the motor. The motor under test was a 14 pole motor, with each pole corresponding to either a north or south orientation. As the motor rotates, the poles result in a waveform with a number of peaks per revolution equal to the number of poles and a frequency equal to half the number of poles times the rotation rate, i.e., seven times the rotation rate for this motor.



Figure 5. FFT analysis of magnetic field of rotating BLDC

The high frequency component $f_{35}$ is due to localised demagnetisation. Because the demagnetisation is not the same for all magnets, this part of the spectrum has a wider distribution. Different motors exhibited different centre frequencies and distributions, but all were around 35 times the rotation rate.
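The three components described in this section can be illustrated with a synthetic signal. The following is a minimal sketch and not the authors' measurement code: the amplitudes, the 800-sample resolution, and the names `synth_motor_field` and `dft_amplitude` are assumptions chosen for illustration. Only the harmonic numbers 1, 7 and 35 are taken from the text.

```python
import cmath
import math

def synth_motor_field(n_samples, amplitudes):
    """Sample one mechanical revolution of a synthetic motor field.

    `amplitudes` maps a harmonic number (multiple of the rotation rate)
    to an amplitude in gauss.
    """
    return [
        sum(a * math.sin(2 * math.pi * h * k / n_samples)
            for h, a in amplitudes.items())
        for k in range(n_samples)
    ]

def dft_amplitude(signal, harmonic):
    """Amplitude of a single harmonic, taken from one direct DFT bin."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * harmonic * k / n)
              for k, x in enumerate(signal))
    return 2 * abs(acc) / n

# Assumed amplitudes (gauss); only the harmonic numbers come from the text.
components = {1: 0.05, 7: 0.02, 35: 0.005}
field = synth_motor_field(800, components)
spectrum = {h: dft_amplitude(field, h) for h in range(1, 41)}
# The three strongest bins, listed in ascending harmonic order.
dominant = sorted(sorted(spectrum, key=spectrum.get, reverse=True)[:3])
print(dominant)
```

A real measurement would show a broadened peak around the 35th harmonic rather than a single bin, since the demagnetisation differs from magnet to magnet.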

#### III. SELECTION AND DESIGN OF FILTERS

#### A. Frequency domain Filters

Frequency domain filters are the current choice for filtering noise for sensors on UAVs.

The low pass filter is almost universally used on all sensors (typically  $2^{nd}$  or  $5^{th}$  order) to remove high frequency noise from the desired signal. Low pass filters perform poorly when the desired signal is close to the noise and unfortunately, that is the case when the vehicle rotational speed is close to the motor speed.

A more sophisticated method that allows the speed of the vehicle to approach the speed of the motor is to use a combination of a tracking notch filter and a low pass filter. Assuming that the motor and vehicle rotation speeds do not overlap for long periods, tracking the motor speed and centring a notch filter on it may provide a filter of superior performance. A low pass filter would remove high frequency noise outside the maximum vehicle dynamics.

We developed and tested a novel method based on combination of the notch filter tracking the motor and the band pass filter tracking gyroscope measurements of the vehicle. This is based on the observation that if the gyroscope signal is approximately correct then the gyroscope data can be used to estimate the frequency of changes in the geomagnetic field directly related to the motion of the vehicle. We used a notch filter tracking motor speed (4<sup>th</sup> order Butterworth) and a band pass filter tracking gyroscope measurements (4<sup>th</sup> order Butterworth).
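The idea of a notch that follows the motor speed can be sketched in a few lines. This is an illustrative stand-in only: it uses a second-order biquad notch (the widely used audio-EQ-cookbook form) rather than the 4th order Butterworth filters described above, and the sample rate, quality factor and the names `TrackingNotch` and `peak_after_settling` are assumptions.

```python
import math

class TrackingNotch:
    """Second-order notch whose centre frequency can be retuned every sample."""

    def __init__(self, sample_rate_hz, q=5.0):
        self.fs = sample_rate_hz
        self.q = q
        self.x1 = self.x2 = self.y1 = self.y2 = 0.0

    def step(self, x, notch_hz):
        # Recompute the coefficients from the currently measured motor speed.
        w0 = 2 * math.pi * notch_hz / self.fs
        alpha = math.sin(w0) / (2 * self.q)
        a0 = 1 + alpha
        b0, b1, b2 = 1 / a0, -2 * math.cos(w0) / a0, 1 / a0
        a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
        # Direct form I difference equation.
        y = b0 * x + b1 * self.x1 + b2 * self.x2 - a1 * self.y1 - a2 * self.y2
        self.x2, self.x1 = self.x1, x
        self.y2, self.y1 = self.y1, y
        return y

def peak_after_settling(freq_hz, notch_hz, fs=1000.0, n=4000):
    """Peak output for a unit sinusoid, measured after the transient has decayed."""
    filt = TrackingNotch(fs)
    out = [filt.step(math.sin(2 * math.pi * freq_hz * k / fs), notch_hz)
           for k in range(n)]
    return max(abs(v) for v in out[n // 2:])

print(peak_after_settling(50.0, 50.0))  # tone at the notch frequency
print(peak_after_settling(5.0, 50.0))   # tone well below the notch
```

In a tracking arrangement, `notch_hz` would be updated from the measured motor speed on every call. The narrower the separation between the two frequencies, the more of the desired signal such a notch removes, which is the limitation discussed in this section.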

#### B. Inverse filter in time domain

The principle of the inverse filter is that, for periodic noise signals such as those generated by motors, one period of the noise signal known not to contain the desired signal is recorded and then applied as an inverse signal, superimposed on the measured signal in the time domain. The expected result is the desired signal with only residues of the noise components. This approach has been used in magnetic tape playback [15].

In our application, this method has the potential of providing optimal results assuming that the noise is only dependent on the angular position of the motor's rotor. The noise signal can be divided into position dependent elements that are stored in a look up table. Each element corresponds to a small arc of the motor's motion; as the motor rotates, successive elements in the table are used to provide a correction. As this method requires an accurate position of the rotor, an optical encoder is added to the motor. The encoder used for experiments generates a signal twice every 1/800<sup>th</sup> of a rotation (two edges, spaced apart by 1/1600<sup>th</sup> of a rotation). This signal is used to estimate the position of the rotor and to step to the next element in the look up table.
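The look up table principle can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the FPGA design: the table size of 800, the noise amplitudes, the constant geomagnetic value and the names `record_noise_table`, `inverse_filter` and `motor_noise` are all hypothetical.

```python
import math

TABLE_SIZE = 800  # assumed: one table element per rotor position step

def record_noise_table(noise_samples):
    """Store one revolution of motor noise, one element per rotor position."""
    assert len(noise_samples) == TABLE_SIZE
    return list(noise_samples)

def inverse_filter(sample, rotor_index, table):
    """Subtract the stored noise element for the current rotor position."""
    return sample - table[rotor_index % TABLE_SIZE]

def motor_noise(i):
    """Illustrative position dependent noise: fundamental plus 7th harmonic."""
    theta = 2 * math.pi * i / TABLE_SIZE
    return 0.05 * math.sin(theta) + 0.02 * math.sin(7 * theta)

table = record_noise_table([motor_noise(i) for i in range(TABLE_SIZE)])

geomagnetic = 0.3  # assumed constant field component, in gauss
corrected = [
    inverse_filter(geomagnetic + motor_noise(i), i, table)
    for i in range(3 * TABLE_SIZE)  # three revolutions of the motor
]
residual = max(abs(c - geomagnetic) for c in corrected)
print(residual)
```

Because the correction depends only on rotor position, it is unaffected by how close the motor and vehicle rotation rates are, as long as the position index is correct.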



Figure 6. Inverse filter structure

The design shown in Figure 6 is the core of the filter. The index is the current lookup location, Din is the sensor input and Dout is the corrected output. The filter was implemented on an FPGA and is optimised to make use of the available resources. The BRAM (Block RAM) acts as the look up table and is loaded with the correction values. Half wave and quarter wave symmetry were applied to reduce the BRAM usage. Symmetry requires that the index value be folded into a subset of ranges, and an offset is needed to shift the waveform. This makes the control part of the design more complex. Together with the loss of accuracy when the waveform is not truly symmetric, as shown in Figure 7, this makes the approach less attractive.
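The index folding needed for half wave symmetry, and the accuracy loss when the waveform is not symmetric, can be illustrated with a short sketch. It assumes the usual definition of half wave symmetry, $x(\theta + \pi) = -x(\theta)$, and a hypothetical table of 800 positions; the function names and waveforms are illustrative and do not reproduce the FPGA control logic.

```python
import math

FULL = 800         # assumed number of rotor positions per revolution
HALF = FULL // 2   # only this many elements are stored

def build_half_table(waveform):
    """Store only the first half revolution of the waveform."""
    return [waveform(i) for i in range(HALF)]

def lookup_half_wave(index, half_table):
    """Fold the index into the stored half and negate for the second half."""
    index %= FULL
    if index < HALF:
        return half_table[index]
    return -half_table[index - HALF]

def max_fold_error(waveform):
    """Largest difference between the folded lookup and the true waveform."""
    table = build_half_table(waveform)
    return max(abs(lookup_half_wave(i, table) - waveform(i)) for i in range(FULL))

def symmetric(i):
    """Odd harmonics only, so the waveform is half wave symmetric."""
    theta = 2 * math.pi * i / FULL
    return 0.05 * math.sin(theta) + 0.02 * math.sin(7 * theta)

def asymmetric(i):
    """Adds an even harmonic, which breaks half wave symmetry."""
    theta = 2 * math.pi * i / FULL
    return symmetric(i) + 0.01 * math.sin(2 * theta)

print(max_fold_error(symmetric))
print(max_fold_error(asymmetric))
```

In this sketch the folded table reproduces the symmetric waveform to within rounding, while the even harmonic in the asymmetric case produces an error of twice its amplitude over the second half of the revolution.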



Figure 7. Waveform of half wave symmetric table

Table 1 shows that the logic size nearly doubles when symmetry is used; however, as the filter is a very light design, the utilisation of FPGA resources remains very small.

| Resource                   | Used | Total available (Spartan 3e500) |
|----------------------------|------|---------------------------------|
| Logic slices (no symmetry) | 33   | 4656                            |
| Logic slices (symmetry)    | 61   | 4656                            |
| Block RAM (1024x16bit)     | 1    | 20                              |

TABLE 1. FPGA IMPLEMENTATION RESOURCES

The throughput of the filter is very high, requiring only 5 clock cycles to perform a correction. The correction time is as follows:

- 1 cycle to load in the sensor data and index address
- 1 cycle to convert the index address to LUT address
- 1 cycle to fetch the correction from the LUT
- 1 cycle to add the correction and sensor data
- 1 cycle to output the result

This filter is comparable to the simplest frequency domain filter (2 point window filter) in both speed and implementation size. Compared to the band stop/pass filters, the implementation size is significantly smaller as it requires at most 3 addition operations rather than iterative addition and multiplication operations. Given a moderate 50 MHz clock speed, the latency is 100 nanoseconds, which is more than adequate for this application, where filtering in the range of kilohertz is required.
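The latency figure follows directly from the five pipeline steps and the clock rate. The short check below reproduces that arithmetic; the 1 kHz sample rate used for comparison is an assumption standing in for "the range of kilohertz".

```python
CLOCK_HZ = 50e6          # clock speed quoted in the text
PIPELINE_CYCLES = 5      # load, index conversion, LUT fetch, add, output

latency_ns = PIPELINE_CYCLES / CLOCK_HZ * 1e9

# Assumed sensor sample rate, used only to put the latency in context.
SAMPLE_RATE_HZ = 1e3
sample_period_ns = 1e9 / SAMPLE_RATE_HZ
corrections_per_sample_period = sample_period_ns / latency_ns

print(latency_ns, corrections_per_sample_period)
```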

#### IV. FILTER PERFORMANCE

For all filter designs except the look up table approach, the frequency separation between signal and noise is important. The performance was compared for several conditions:

- 1. "Traditional" separation that could be expected for most vehicles: vehicle rotation is significantly slower than the motor speed (typically Fn/Fs > 10).
- 2. Narrow separation, where the vehicle rotation is still slower than the motor, but they are within an order of magnitude (10 > Fn/Fs > 1).
- 3. Reversed separation, where the motor speed is slower than the vehicle rotational velocity. This would likely only be seen if the motor power was reduced to slow the vehicle (Fn/Fs < 1).
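The three conditions can be expressed as a simple classification on the ratio of noise frequency Fn (motor) to signal frequency Fs (vehicle rotation). The sketch below is illustrative: the function name is hypothetical, and the handling of ratios exactly equal to 10 or 1 is an assumption, since the text leaves those boundaries open.

```python
def separation_regime(noise_hz, signal_hz):
    """Classify the frequency separation between motor noise and vehicle rotation."""
    ratio = noise_hz / signal_hz
    if ratio > 10:
        return "traditional"
    if ratio > 1:
        # Assumption: a ratio of exactly 10 is treated as narrow separation.
        return "narrow"
    # Assumption: a ratio of exactly 1 is grouped with the reversed case.
    return "reversed"

for motor_hz, vehicle_hz in ((200.0, 10.0), (50.0, 10.0), (5.0, 10.0)):
    print(motor_hz, vehicle_hz, separation_regime(motor_hz, vehicle_hz))
```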



Figure 8. Results of filters at different noise to signal frequency factors

The results in Figure 8 show that the low pass filters perform poorly when vehicle and motor rotation speeds are close. Both the band pass and notch filters worked well. Their difference in performance, as shown in Figure 8, is explained by the fact that the notch filter tracks motor rotation, which has some variance, while the band pass filter tracks vehicle rotation, which is much more stable due to larger inertia. Their main disadvantage is a much larger implementation cost than the inverse filter.

The look up table based inverse filter showed good performance, and exceptional performance was achieved with the addition of a low pass filter for filtering the $f_{35}$ component.

Another advantage of the look up table based method is constant latency and linear phase delay, which would be achievable with standard FIR filter but at much higher implementation cost.

#### V. INVERSE FILTER CONSIDERATIONS

As Figure 5 shows, the spectrum of magnetic noise from the motor contains high frequency component  $f_{35}$ . In a straightforward approach the look up table of inverse filter would need to have sufficient number of samples/elements to cover that spectrum. A more efficient method is to augment inverse filter with a simple 2<sup>nd</sup> order low pass filter (LPF in Figure 8) to filter out that part of the spectrum.

Results presented in Figure 8 were obtained with the highest resolution implemented for the inverse filter; however, depending on the platform, it may be desirable to decrease the number of elements in the look up table. Tables with 800, 400, 200, 100 and 50 elements were tested to determine the impact.

The results in Figure 9 show that for 800 to 200 look up elements the deterioration of performance is limited, while 100 and 50 element table causes significant decrease in effectiveness. The x axis is the rotor position error measured in the number of elements miscounted by the rotor position encoder. The larger the number of samples, the lesser is the impact of the position error since the angle which each sample represents is smaller. FPGA BRAMs make implementation of look up tables very efficient, even if large number of elements is required.
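The relation between table size and the effect of a miscount can be made concrete. The sketch below is not the heading error of Figure 9; it only computes the rotor angle represented by a miscount and, as an illustration, the peak residual left when a single sinusoidal noise harmonic of amplitude $A$ is cancelled with a copy shifted by that angle, which is $2A\,|\sin(h\delta/2)|$ for harmonic $h$ and angular error $\delta$. The function names are hypothetical.

```python
import math

def rotor_angle_error_deg(table_size, miscounted_elements):
    """Rotor angle error caused by a given number of miscounted table elements."""
    return 360.0 * miscounted_elements / table_size

def worst_case_residual(amplitude, harmonic, table_size, miscounted_elements):
    """Peak residual of a sinusoidal harmonic cancelled with a shifted copy."""
    delta = math.radians(rotor_angle_error_deg(table_size, miscounted_elements))
    return 2 * amplitude * abs(math.sin(harmonic * delta / 2))

for size in (800, 400, 200, 100, 50):
    print(size,
          rotor_angle_error_deg(size, 1),
          worst_case_residual(1.0, 7, size, 1))
```

For a single miscounted element, the angle per element grows from 0.45 degrees with 800 elements to 7.2 degrees with 50 elements, so the same miscount leaves a much larger residual in the smaller tables.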



Figure 9. Heading error due to rotor encoder miscounts

Our experiments with a number of motors show significant variations in the rotating magnetic field patterns, even for the same type of motor. This is caused by manufacturing imperfections, demagnetisation and/or wear and tear. The magnets vary in strength (or lose strength over time at differing rates), resulting in the waveform being compressed for a portion of the cycle and expanded for another portion, and in the individual peaks having varying magnitude. The localised demagnetisation effect adds a high frequency component that is not instantaneously constant when the motor starts.

These phenomena mean that the look up table must be prepared for a specific motor and updated as the motor ages. This can be done automatically with the help of an additional calibration module in the aircraft control system that is activated at start-up, when the vehicle is stationary.
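One way such a calibration step could work is sketched below. This is an assumption for illustration, not the authors' calibration module: it supposes that the vehicle is stationary so the geomagnetic contribution is constant, that the motor noise has zero mean over a revolution, and that every table index is visited at least once. The name `calibrate_table` is hypothetical.

```python
import math

def calibrate_table(samples, indices, table_size):
    """Average the samples seen at each rotor index, then remove the constant offset."""
    sums = [0.0] * table_size
    counts = [0] * table_size
    for sample, index in zip(samples, indices):
        sums[index] += sample
        counts[index] += 1
    # Assumption: every index was visited, so no count is zero.
    means = [sums[i] / counts[i] for i in range(table_size)]
    # With a stationary vehicle the geomagnetic field appears as a constant offset.
    offset = sum(means) / table_size
    return [m - offset for m in means]

SIZE = 100  # small illustrative table

def noise(i):
    theta = 2 * math.pi * i / SIZE
    return 0.05 * math.sin(theta) + 0.02 * math.sin(7 * theta)

# Four revolutions recorded while stationary in an assumed 0.3 gauss field.
indices = [k % SIZE for k in range(4 * SIZE)]
samples = [0.3 + noise(i) for i in indices]
table = calibrate_table(samples, indices, SIZE)
worst = max(abs(table[i] - noise(i)) for i in range(SIZE))
print(worst)
```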

#### VI. CONCLUSIONS AND FUTURE RESEARCH

We analysed and experimented with the motor induced noise in magnetometer measurements on board micro rotary body aircraft. Methods of filtering that noise were investigated and the properties of various filters were assessed for this application. An inverse filter in the time domain, based on the look up table principle, has been designed and implemented on an FPGA. The filter was tested and proven to have superior performance in filtering signals and noise of very close frequencies. This filter demonstrates a small implementation cost while offering speed either matching or exceeding the performance of optimised frequency domain based filters.

The research presented in this paper focused on steady state operation of the motor. This was justified as the motor speed of a rotary body vehicle does not vary significantly during operation. However, for applications where the motor speed needs to change rapidly, further research is needed to assess the applicability of our filtering method.

A possible application of our research is also for assessing the health of a motor by monitoring the spectrum of motor induced noise in magnetometer readings. It has been demonstrated that as a motor ages or is subjected to high loads, the localised demagnetisation increases, changing its noise spectrum. This relationship could be possibly used to estimate the health of the motor.

#### REFERENCES

- [1] W. Pearce, "Papin-Rouilly Gyroptère (Gyropter)". [Online]. Available from: https://oldmachinepress.wordpress.com/2012/09/06/papinrouilly-gyroptere-gyropter/. [Retrieved 03/06/2015]
- [2] K. Fregene and C. L. Bolden, "Dynamics and Control of a Biomimetic Single-Wing Nano Air Vehicle", 2010 American Control Conference Marriott Waterfront, Baltimore, MD, USA, June 30-July 02, 2010, pp51-56.
- [3] S. M. Jameson, B. P. Boesch, and E. H. Allen, "Active maple seed flyer", United States of America Patent US7766274 B1, Lockheed-Martin, Aug3, 2010.
- [4] E. R. Ulrich, D. J. Pines, and J. S. Humbert, "From Falling to Flying: The Path to Powered Flight of a Robotic Samara Nano Air Vehicle," Bioinspiration and Biomimetics, Vol. 5, No. 4, 2010, pp 3-16.
- [5] Pounds P. and Singh S. "Samara: Biologically inspired self-deploying sensor networks", IEEE Potentials, Vol 34, No.2, 2015, pp10—14.
- [6] C. Hockley and B. Butka, "The Samareye: A Biologically Inspired Autonomous Vehicle," in Digital Avionics Systems Conference, Salt Lake City, Oct. 2010, pp 5.C.1-1 - 5.C.1-9.
- [7] H. Youngren, S. Jameson, and B. Satterfield "Design of the SAMARAI Monowing Rotorcraft Nano Air Vehicle," [Online].Available from: http://www.atl.lmco.com/papers/1628.pdf. [Retrieved 14/04/2015].
- [8] A.R.S. Bramwell, G. Done, and D. Balmford, "Bramwell's Helicopter Dynamics", Butterworth-Heinemann, 2001.

- [9] Microsemi User Guide, "Field Oriented Control of Permanent Magnet Synchronous Motors". [Online]. Available from: http://www.microsemi.com/document-portal/doc\_view/130909-sffoc-pmsm-hall-ug [Retrieved 03/06/2015]
- [10] G. H. Jang, J. H. Chang, D. P. Hong, and K. S. Kim "Finite-Element Analysis of an Electromechanical Field of a BLDC Motor Considering Speed Control and Mechanical Flexibility," IEEE Transactions On Magnetics, vol. 38, no. 2, 2002, pp. 945-948.
- [11] N. Bianchi, S. Bolofa, and F. Luise, "Analysis and design of a brushless motor for high speed operation," in Electric Machines and Drives Conference, Madison, Wisconsin, 2003, pp-44-5.
- [12] J. Rais and M.P. Donsión, "Permanent Magnet Synchronous Motors (PMSM). Parameters influence on the synchronization process of a PMSM," International Conference Renewable Energies and Power Quality (ICREPQ'8), pp.409, Santander, March 2008
- [13] M. Ooshima, S. Miyazawa, A. Chiba, F. Nakamura, and T. Fukao "A Rotor Design of a Permanent Magnet-Type Bearingless Motor Considering Demagnetization," in Power Conversion Conference, Nagaoka, 1997, pp. 655-660.
- [14] S. Touati, R. Ibtiouen, O. Touhami, and A. Djerdir "Experimental investigation and optimisation of permanent magnet motor based on coupling boundry element method with permeances network," Progress in Electromagnetics Research, vol. 111, no. -, pp. 71-90,
- [15] R. W. Kruppa, "Method, and apparatus and article of manufacture for filtering periodic noise from a magnetic read head". United States of America Patent 5,887,075, 23 March 1999.

## Implementation and Comparison of Conventional and Ordering Based RO-PUFs for Secret Key Generation

Giray Kömürcü, National Research Institute of Electronics and Cryptology, TÜBİTAK, Kocaeli, Turkey, Email: giray.komurcu@tubitak.gov.tr

Ali Emre Pusane, Günhan Dündar, Bogazici University, Dept. of Electrical and Electronics Eng., Istanbul, Turkey, Email: {ali.pusane, dundar}@boun.edu.tr

Abstract—Physical Unclonable Functions (PUFs) are security primitives that have the capability of key generation on the fly. Ordering based Ring Oscillator (RO) PUFs are one of the best performing structures in terms of robustness, since key generation requires error-free bit streams. Even though many aspects of ordering based RO-PUFs have been analyzed in considerable detail in the literature, a full implementation has not been presented yet. Hence, the total area cost of the system is still in question. In this work, we first implement a conventional RO-PUF including an Error Correction Coding (ECC) block. Then, we present a full implementation of an ordering based RO-PUF. Finally, conventional and ordering based RO-PUFs are compared and their advantages and disadvantages are discussed.

#### Keywords-PUF, Physical Unclonable Functions, Reliability, Robustness, Ring Oscillator, FPGA, Key Generation

#### I. INTRODUCTION

Physical Unclonable Functions (PUFs) provide economic and secure solutions in the areas of cryptographic key generation, IP protection, authentication, and ID generation with their capability of signature generation on the fly [1]. With this property, they eliminate the need for a nonvolatile memory for ID and key storage purposes. Even though Optical PUFs and Coating PUFs are the first two structures proposed in the literature, their impracticality and expensive equipment requirements prevented wide usage of these primitives [2][3]. In contrast, Silicon PUFs, such as Arbiter PUFs, SRAM PUFs, Ring Oscillator (RO) PUFs, Butterfly PUFs, and Glitch PUFs, have drawn significant attention with their ease of integration and low cost [4]-[8].

The main working principle of PUFs depends on small mismatches present in the manufacturing process, which lead to the deviation of parameters such as doping concentration, threshold voltage, and oxide thickness. These deviations are the basis for the uniqueness, robustness, unclonability, and unpredictability properties of PUF structures. Certain PUF types, such as RO-PUFs, are convenient for FPGA implementations as well, since manufacturing imperfections are also present in FPGAs [9]. Robustness is a key feature of PUF circuits, which aims at minimizing the number of unstable bits at the output [10]. Since PUF outputs are generated depending on small imperfections in the IC, any temporal variation present in the system may easily result in generating unstable outputs [11]. Almost all PUF structures, except for ordering based RO-PUFs, are vulnerable to internal and external effects and generate noisy outputs. However, certain applications, such as key generation, require 100% robust outputs for correct operation. Adding an Error Correction Coding (ECC) block is a proper but costly solution for key generation systems that utilize noisy PUF circuits.

RO-PUFs, which are the most convenient type of PUFs for FPGA implementation, work relatively reliably under changing environmental conditions and are suitable for key generation applications [9][12]. A conventional RO-PUF compares the frequencies of two identical ROs for one bit output generation. In these systems, the output bit can be set to 0, if  $RO_1$  is faster than  $RO_2$ , and can be set to 1, otherwise. Since applications require generation of certain length bitstreams, a number of identical ROs are implemented in the circuit and different pairs are selected via multiplexers for each output bit generation. Ordering based RO-PUFs generate outputs using the frequency ordering of a group of ROs. During the grouping step, ROs whose frequencies are adequately apart from each other are grouped together in order to prevent ordering changes due to environmental variations and noise. Despite the noisy nature of conventional RO-PUFs, ordering based RO-PUFs enable 100 % robust, noise-free outputs, and avoid the need for ECC in key generation [13]. In addition to this, they have the capability of high entropy extraction, enabling higher area and power efficiency than conventional RO-PUFs [13][14]. Another advantage of ordering based RO-PUFs is their high number of CRP support that has been introduced recently [15]. Despite these advantages of ordering based RO-PUFs, a full hardware implementation including the output generation mechanisms has not been presented in the literature yet.
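The two output generation principles can be contrasted with a short sketch. The pairwise comparison follows the rule stated above (0 if $RO_1$ is faster, 1 otherwise). The grouping function is a simple greedy pass written for illustration and is not the grouping algorithm of [13]; the frequency values, the minimum gap and the function names are assumptions.

```python
def conventional_bits(freqs, pairs):
    """One bit per RO pair: 0 if the first RO is faster, 1 otherwise."""
    return [0 if freqs[a] > freqs[b] else 1 for a, b in pairs]

def greedy_groups(freqs, min_gap):
    """Group ROs so that members of a group are at least min_gap apart.

    ROs are visited in increasing frequency, so it is enough to compare
    each RO with the last (fastest) member of a candidate group.
    """
    groups = []
    for ro in sorted(range(len(freqs)), key=lambda i: freqs[i]):
        for group in groups:
            if freqs[ro] - freqs[group[-1]] >= min_gap:
                group.append(ro)
                break
        else:
            groups.append([ro])
    return groups

# Hypothetical RO frequencies in MHz.
freqs = [100.0, 100.4, 103.0, 101.5, 106.2, 103.1]
print(conventional_bits(freqs, [(0, 1), (2, 3)]))
print(greedy_groups(freqs, min_gap=1.0))
```

In the conventional case, a pair such as ROs 0 and 1, which differ by only 0.4 MHz here, can flip under noise. The grouping step avoids this by never placing such close ROs in the same group.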

In this work, our main aim is to determine the area cost of ordering based RO-PUFs with all required components and compare their area efficiency with conventional RO-PUFs. For this purpose, we first present an implementation of conventional RO-PUFs with an ECC block for 100% robustness that is required for key generation in Section II.



Figure 1. Block Structure of Conventional RO-PUFs.

Table I Area Utilization of Frequency Detection Circuitry for Spartan3 and Virtex5 Devices.

| FPGA Type    | 96 ROs    | 128 ROs    | 160 ROs    | 192 ROs    | 224 ROs    | 256 ROs    |
|--------------|-----------|------------|------------|------------|------------|------------|
| Spartan3S    | 40        | 48         | 57         | 65         | 73         | 81         |
| Virtex5      | 31        | 44         | 44         | 57         | 62         | 68         |

Next, a full implementation of ordering based RO-PUFs is presented in Section III. Performances of conventional and ordering based RO-PUFs are compared and their advantages and disadvantages are discussed in Section IV. Finally, Section V concludes the paper.

#### II. IMPLEMENTATION OF CONVENTIONAL RO-PUFS AND ERROR CORRECTION CODES

The block structure of conventional RO-PUFs is presented in Figure 1. As can be seen from the figure, the frequencies of the implemented ROs are detected and output bits are generated depending on these frequencies. Frequency detection is a common step for both conventional and ordering based RO-PUFs and is composed of a multiplexer and a counter. In this step, the oscillation counts of all ROs are detected within a certain measurement time, $t_m$. Each RO is selected one-by-one with the multiplexer and its frequency is detected with the counter. Six sample structures are implemented using combinatorial circuits for systems composed of 96, 128, 160, 192, 224, and 256 ROs. Area utilization results for Xilinx Spartan3 and Virtex5 FPGAs are presented in Table I. Maximum achievable frequencies for the proposed frequency detection circuit are 230 MHz for Spartan3 and 430 MHz for Virtex5 devices, which are significantly higher than the oscillation frequencies of 5-stage RO structures in both FPGA types. The output generation step is composed of a comparator to compare the oscillation counts and is implemented using 9 and 5 slices for Spartan3 and Virtex5 devices, respectively.
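The multiplexer and counter arrangement can be modelled behaviourally as below. This is a sketch under stated assumptions: the RO frequencies and the 100 microsecond window are hypothetical values, and the function names are illustrative. Integer arithmetic is used so that the counts are exact.

```python
def measure_counts(ro_freqs_hz, t_m_us):
    """Oscillation count of each RO over a window of t_m_us microseconds."""
    counts = []
    for freq_hz in ro_freqs_hz:                       # multiplexer: one RO at a time
        counts.append(freq_hz * t_m_us // 1_000_000)  # counter: whole cycles in the window
    return counts

def total_measurement_time_us(n_ros, t_m_us):
    """Sequential selection means the windows add up."""
    return n_ros * t_m_us

# Hypothetical values: two ROs near 200 MHz and a 100 microsecond window.
print(measure_counts([200_000_000, 201_000_000], 100))
print(total_measurement_time_us(128, 100))
```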

The last block required for 100% robust output generation using conventional RO-PUFs is ECC. The use of ECC in PUF implementations is illustrated in Figure 2. As can be seen from the figure, the PUF output is applied to the ECC encoder, and helper data is generated and recorded to a database during the initialization phase. Then, during the usage phase, the ECC decoder removes the noise present in the PUF output by using the information stored in the helper data. Bose, Chaudhuri, and Hocquenghem (BCH) codes are convenient for data recovery in PUF circuits with their guaranteed error recovery for multiple errors. In this study, BCH codes are implemented and analyzed in terms of area and timing performance.

Figure 2. Key Generation Schematic with Conventional RO-PUFs.

Table II Area Utilization of Error Correction Codes for Spartan3 and Virtex5 Devices.

| Err. Cor. Capability | (255, 231, 3) | (255, 207, 6) | (255, 187, 9) | (255, 163, 12) | (255, 139, 15) | (255, 131, 18) |
|----------------------|---------------|---------------|---------------|----------------|----------------|----------------|
| Encoder (Spartan3)   | 20            | 31            | 36            | 44             | 58             | 60             |
| Decoder (Spartan3)   | 223           | 334           | 471           | 581            | 705            | 843            |
| Encoder (Virtex5)    | 17            | 19            | 21            | 25             | 33             | 33             |
| Decoder (Virtex5)    | 148           | 178           | 272           | 288            | 363            | 427            |

The capabilities of multi-bit correcting ECC are shown with a three item notation, (a, b, c). In this format, a represents the total number of data and helper data bits, b represents the total number of data bits, and c represents the maximum number of erroneous bits that the ECC can recover successfully in noisy data. As the maximum number of erroneous bits that can be recovered increases, the complexity, and hence the area, time, and power consumption of both the ECC encoder and decoder, increases as well.

In order to determine the area overhead of ECC on PUF systems, BCH encoders and decoders for different error correction capabilities are implemented and their area usages are analyzed. In all systems considered, *a* is selected as 255 bits. Results are presented in Table II. As can be seen from the table, area usage increases as the error correction capability increases. For instance, a 3 bit correcting BCH decoder consumes 223 slices, whereas an 18 bit correcting BCH decoder consumes 843 slices on Spartan3 FPGAs. Since the implemented conventional RO-PUF may result in up to 18 bits of errors, a (255, 131, 18) BCH encoder and decoder seem ideal for this case [10].
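The (a, b, c) notation and the Spartan3 decoder figures from Table II can be combined in a short calculation of helper data size and code rate. The helper names below are illustrative; the numbers are taken from the table.

```python
# (a, b, c): total bits, data bits, correctable errors.
BCH_CODES = [(255, 231, 3), (255, 207, 6), (255, 187, 9),
             (255, 163, 12), (255, 139, 15), (255, 131, 18)]
# Decoder area on Spartan3 from Table II, in slices.
DECODER_SLICES_SPARTAN3 = [223, 334, 471, 581, 705, 843]

def helper_bits(code):
    """Number of helper data (redundancy) bits, a - b."""
    a, b, _ = code
    return a - b

def code_rate(code):
    """Fraction of the codeword that carries data, b / a."""
    a, b, _ = code
    return b / a

for code, slices in zip(BCH_CODES, DECODER_SLICES_SPARTAN3):
    print(code, helper_bits(code), round(code_rate(code), 3), slices)
```

For the (255, 131, 18) code chosen above, 124 of the 255 bits are helper data, so roughly half of each codeword is redundancy.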

#### III. IMPLEMENTATION OF ORDERING BASED RO-PUFS

As mentioned previously, the main advantages of ordering based RO-PUFs are their 100% robust output generation capability and high entropy extraction. Even though the number of ROs required to generate outputs of a given length is significantly lower for ordering based RO-PUFs than for the conventional structures, a fair comparison also requires an analysis of the output generation mechanisms in terms of area and speed. For this purpose, ordering and output generation circuits are developed and implemented for different numbers of ROs and group lengths.

The output generation mechanism of the proposed ordering based RO-PUF is illustrated in Figure 3. In this structure, it is assumed that grouping is either done by a PC during the initialization step, with the resulting groups stored in an on-chip or off-chip memory, or done by a microprocessor present on the IC. Determining the ordering of the ROs in a group and generating the output from this ordering are mandatory steps in ordering based RO-PUFs and are critical for the performance and cost of the system. These steps can be performed by a microprocessor already present in the system or by dedicated hardware. Assuming a microprocessor is not present in the system, dedicated hardware blocks are designed and implemented for the ordering and output generation steps. Ordering of the oscillation counts is performed sequentially. RO IDs and their counts are stored in an array of registers in increasing order of the oscillation counts. The ordering of four ROs is illustrated in Figure 4. The execution time of the ordering circuit is upper-bounded by $m^2/2$ for a group of m oscillators. However, since the ordering can overlap with the frequency detection of the ROs, only the ordering time of the last group reduces the speed of the operation.
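The sequential ordering step can be modeled in software as an insertion into a sorted register array. The sketch below is an illustrative Python model, not the hardware description used in this work. The function name is hypothetical; the first three counts follow Figure 4, while the fourth measurement (RO 3 with count 10101) is invented for the example.

```python
def order_group(measurements):
    """Insert (ro_id, osc_count) pairs one at a time, keeping the
    register array sorted by increasing oscillation count."""
    registers = []      # models the array of registers
    comparisons = 0     # rough proxy for execution time
    for ro_id, count in measurements:
        pos = len(registers)
        # Walk down from the largest stored count to find the slot.
        while pos > 0:
            comparisons += 1
            if registers[pos - 1][1] <= count:
                break
            pos -= 1
        registers.insert(pos, (ro_id, count))
    return registers, comparisons


group = [(0, 0b10100), (1, 0b10110), (2, 0b10001), (3, 0b10101)]
ordered, steps = order_group(group)
print([ro_id for ro_id, _ in ordered], steps)
```

In this model the number of comparisons never exceeds m(m-1)/2, which is consistent with the $m^2/2$ bound quoted above.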

Output generation of the ordering based RO-PUF is performed by mapping each ordering to a different bitstream using a sequential circuit. In this step, RO IDs and ordering information are used together. Pseudo code of the output generation is presented in Figure 5, and the output generation of a group of four ROs is illustrated in Figure 6. The execution time of the output generation circuit is upper-bounded by m for a group of m oscillators. Similar to the ordering case, only the output generation time of the last group reduces the speed of the operation.

#### IV. IMPLEMENTATION RESULTS AND COMPARISON

Since measuring the ROs one by one is good design practice to prevent the inter-locking of ROs, implementing a single ordering detection and output generation circuit, sized for the largest group present in the system, is the most convenient approach for ordering based RO-PUFs. In this method, an upper bound for the group lengths is set and the grouping step forms the groups according to this upper bound. The proposed ordering and output generation circuits are implemented for different group lengths in the range of 3



Figure 3. Block Structure of Ordering Based RO-PUFs.

[Figure 4 panels: contents of the OSC_CNT and RO_ID registers after Steps 1 to 4. Step 1: (10100, 0). Step 2: (10100, 0), (10110, 1). Steps 3 and 4: the array begins with (10001, 2), (10100, 0).]

Figure 4. Ordering circuit sample execution.

to 10 and their area utilization results are presented in Tables III and IV. As can be seen from the tables, the resources required for the ordering and output generation circuits increase steeply as the group lengths increase.

The total numbers of slices for the generation of 128 bit outputs using conventional RO-PUFs and ordering based RO-PUFs with different maximum group lengths are presented in Tables III and IV and in Figure 7. As can be seen from the tables, the required number of ROs decreases as the maximum allowed group length increases, because more entropy is extracted from each group. These values are obtained from a Matlab analysis and rounded up for a safety margin. According to the presented results, ordering based RO-PUFs with maximum group lengths of 3 and 4 seem to be the optimum cases for Spartan3 and Virtex5 devices, respectively, in terms of area. Increasing the group lengths beyond these values does not improve the overall performance, due to the increasing cost of the ordering and output generation circuits. It should also be noted that the area of the conventional circuit is significantly larger than that of the ordering based structure due to the high cost of the ECC implementation. However, the ECC step cannot be eliminated in applications that require 100% reliable outputs.
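The totals and the optimum group lengths quoted above can be rechecked directly from the table entries. The Python sketch below copies the per-block slice counts from Tables III and IV and recomputes the totals; the variable and function names are hypothetical.

```python
# Slice counts copied from Tables III and IV:
# (PUF type, RO, frequency detection, ordering, output generation, ECC)
SPARTAN3 = [
    ("Conv.", 512, 81, 0, 9, 903),
    ("OB(3)", 390, 73, 28, 10, 0),
    ("OB(4)", 370, 65, 57, 16, 0),
    ("OB(5)", 350, 65, 97, 36, 0),
    ("OB(6)", 340, 65, 128, 57, 0),
    ("OB(7)", 330, 65, 163, 82, 0),
    ("OB(8)", 320, 57, 213, 98, 0),
    ("OB(9)", 310, 48, 260, 175, 0),
    ("OB(10)", 300, 48, 336, 210, 0),
]
VIRTEX5 = [
    ("Conv.", 512, 68, 0, 5, 460),
    ("OB(3)", 390, 62, 10, 7, 0),
    ("OB(4)", 370, 57, 26, 10, 0),
    ("OB(5)", 350, 57, 49, 15, 0),
    ("OB(6)", 340, 57, 54, 23, 0),
    ("OB(7)", 330, 57, 71, 45, 0),
    ("OB(8)", 320, 44, 114, 56, 0),
    ("OB(9)", 310, 44, 117, 98, 0),
    ("OB(10)", 300, 44, 181, 123, 0),
]


def totals(rows):
    """Sum the per-block slice counts of every PUF type."""
    return {name: sum(parts) for name, *parts in rows}


def smallest(rows):
    """Return the PUF type with the lowest total slice count."""
    t = totals(rows)
    return min(t, key=t.get)


for label, rows in (("Spartan3", SPARTAN3), ("Virtex5", VIRTEX5)):
    best = smallest(rows)
    print(label, best, totals(rows)[best], totals(rows)["Conv."])
```

On these numbers, the smallest ordering based configuration uses about one third of the slices of the conventional design on Spartan3 and somewhat less than half on Virtex5.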

Data: List of RO IDs in a group sorted according to their frequencies, RO[m].

Result: Output bitstream.

```
for i ← 1 to m − 1 do
    Output = Output + RO[i] * (m − i)!
    for j ← i to m − 1 do
        if RO[i] < RO[j] then
            Decrement RO[i]
        end
    end
end
```

Figure 5. Output generation in pseudo code.
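One way to read the pseudo code of Figure 5 is as an encoding of the ordering in the factorial number system, so that each of the m! orderings of a group receives a distinct integer. The Python sketch below follows that reading under stated assumptions: RO IDs within a group are relabeled 0 to m-1, and after an ID is consumed, every later ID that is larger is decremented. The decrement target and the function name are assumptions made for this illustration.

```python
from itertools import permutations
from math import factorial


def ordering_to_index(order):
    """Map an ordering of the IDs 0..m-1 to an integer in [0, m!)."""
    ro = list(order)
    m = len(ro)
    output = 0
    for i in range(m - 1):
        # Weight of position i (0-based) is (m - 1 - i)!.
        output += ro[i] * factorial(m - 1 - i)
        # Re-rank the remaining IDs among the values still unused.
        for j in range(i + 1, m):
            if ro[j] > ro[i]:
                ro[j] -= 1
    return output


print(ordering_to_index([2, 0, 3, 1]))
indices = sorted(ordering_to_index(p) for p in permutations(range(4)))
print(indices == list(range(24)))
```

For a group of four ROs the 24 orderings map onto the integers 0 to 23, which a sequential circuit can then emit as a bitstream.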



Figure 6. Output generation sample execution.

#### V. CONCLUSION

Ordering based RO-PUFs are recently developed promising structures with their 100% robust output generation capability, high entropy extraction, and suitability for FPGA implementations. However, a full implementation has not yet been presented, preventing a fair comparison with conventional RO-PUFs. In this work, we have investigated the area cost of both conventional and ordering based RO-PUFs in detail for two different FPGA types. According to the analysis results, ordering based RO-PUFs with small groups seem to be the best performing structures for generating robust outputs.

#### REFERENCES

- [1] G. E. Suh and S. Devadas, "Physical unclonable functions for device authentication and secret key generation," in Design Automation Conference (DAC), 2007, pp. 9–14.
- [2] R. S. Pappu, "Physical one-way functions." Ph.D. dissertation, Massachusetts Institute of Technology, Massachusetts, 2001.
- [3] P. Tuyls, G. J. Shrijen, B. Skoric, J. V. Geloven, N. Verhaegh, and R. Walters, "Read proof hardware from protective coatings," in 18th Annual Computer Security Applications Conference (CHES), vol. 4249, 2006, pp. 369–383.

 Table III

 AREA UTILIZATION OF RO-PUFS FOR SPARTAN3 DEVICES.

| PUF<br>Type | RO<br>Num | RO<br>Slice | F. Det.<br>Slice | Ord.<br>Slice | O. Gen.<br>Slice | ECC<br>Slice | Total<br>Slice |
|-------------|-----------|-------------|------------------|---------------|------------------|--------------|----------------|
| Conv.  | 256 | 512   | 81      | 0     | 9       | 903   | 1505  |
| OB(3)  | 195 | 390   | 73      | 28    | 10      | 0     | 501   |
| OB(4)  | 185 | 370   | 65      | 57    | 16      | 0     | 508   |
| OB(5)  | 175 | 350   | 65      | 97    | 36      | 0     | 548   |
| OB(6)  | 170 | 340   | 65      | 128   | 57      | 0     | 590   |
| OB(7)  | 165 | 330   | 65      | 163   | 82      | 0     | 640   |
| OB(8)  | 160 | 320   | 57      | 213   | 98      | 0     | 688   |
| OB(9)  | 155 | 310   | 48      | 260   | 175     | 0     | 793   |
| OB(10) | 150 | 300   | 48      | 336   | 210     | 0     | 894   |

 Table IV

 AREA UTILIZATION OF RO-PUFS FOR VIRTEX5 DEVICES.

| PUF<br>Type | RO<br>Num | RO<br>Slice | F. Det.<br>Slice | Ord.<br>Slice | O. Gen.<br>Slice | ECC<br>Slice | Total<br>Slice |
|-------------|-----------|-------------|------------------|---------------|------------------|--------------|----------------|
| Conv.       | 256       | 512         | 68               | 0             | 5                | 460          | 1045           |
| OB(3)       | 195       | 390         | 62               | 10            | 7                | 0            | 469            |
| OB(4)       | 185       | 370         | 57               | 26            | 10               | 0            | 463            |
| OB(5)       | 175       | 350         | 57               | 49            | 15               | 0            | 471            |
| OB(6)       | 170       | 340         | 57               | 54            | 23               | 0            | 474            |
| OB(7)       | 165       | 330         | 57               | 71            | 45               | 0            | 503            |
| OB(8)       | 160       | 320         | 44               | 114           | 56               | 0            | 534            |
| OB(9)       | 155       | 310         | 44               | 117           | 98               | 0            | 569            |
| OB(10)      | 150       | 300         | 44               | 181           | 123              | 0            | 648            |

- [4] D. Lim, J. Lee, B. Gassend, G. E. Suh, M. V. Dijk, and S. Devadas, "Extracting secret keys from integrated circuits," IEEE Transactions on VLSI Systems, vol. 13, no. 10, 2005, pp. 1200–1205.
- [5] B. Gassend, D. Clarke, M. V. Dijk, and S. Devadas, "Delay-based circuit authentication and applications," in ACM Symposium on Applied Computing, 2003, pp. 294–301.
- [6] B. Gassend, "Physical random functions," M.S. Thesis, Massachusetts Institute of Technology, Massachusetts, 2003.
- [7] J. Guajardo, S. Kumar, G. Schrijen, and P. Tuyls, "FPGA intrinsic PUFs and their use for IP protection," in 18th Annual Computer Security Applications Conference (CHES), vol. 4727, 2007, pp. 63–80.
- [8] D. Suzuki and K. Shimizu, "The glitch PUF: A new delay-PUF architecture exploiting glitch shapes," in Cryptographic Hardware and Embedded Systems (CHES), 2010, pp. 366–382.
- [9] A. Maiti and P. Schaumont, "Improved ring oscillator PUF: An FPGA-friendly secure primitive," Journal of Cryptology, vol. 24, no. 2, 2011, pp. 375–397.
- [10] G. Komurcu and G. Dundar, "Determining the quality metrics for PUFs and performance evaluation of two RO-PUFs," in IEEE 10th International New Circuits and Systems Conference, (NEWCAS), 2012, pp. 73–76.
- [11] A. Maiti, L. McDougall, and P. Schaumont, "The impact of aging on an FPGA-based physical unclonable function," in International Conference on Field Programmable Logic and Applications (FPL), 2011, pp. 151–156.



Figure 7. Area Utilization of RO-PUFs.

- [12] C. Yin and G. Qu, "Temperature aware cooperative ring oscillator PUF," in IEEE International Workshop on Hardware Oriented Security and Trust (HOST), 2009, pp. 36–42.
- [13] C. Yin and G. Qu, "LISA: Maximizing RO-PUF's secret extraction," in IEEE International Symposium on Hardware Oriented Security and Trust (HOST), 2010, pp. 100–105.
- [14] G. Komurcu, A. E. Pusane, and G. Dundar, "Dynamic programming based grouping method for RO-PUFs," in 9th Conference on Ph. D. Research in Microelectronics and Electronics (PRIME), 2013, pp. 329–332.
- [15] G. Komurcu, A. E. Pusane, and G. Dundar, "Enhanced challenge-response set and secure usage scenarios for ordering based RO-PUFs," IET Circuits, Devices and Systems (IET-CDS), vol. 9, no. 2, 2014, pp. 87–95.

# Hopf Bifurcation Analysis and Implementation of Single Tunnel Diode Oscillator Circuit

Mustafa Fayez, Mohammad Awwad, Hassan El-Hamouly Department of Electronic Engineering Military Technical College Cairo, Egypt Mustafa.Fayez.EG@IEEE.org, Mohammad.Awwad.EG@IEEE.org, H.Hamouly49@gmail.com

*Abstract*—In this paper, the simple LC tunnel diode circuit is shown to oscillate when the bias voltage lies in the negative resistance region of the tunnel diode. The Hopf bifurcation theorem is employed for the theoretical proof. The analysis has been verified by circuit simulations and confirmed by experimental measurements. The results show that, for the 1N3716 tunnel diode, oscillations occur for bias voltages from 65 mV to 500 mV.

#### Keywords—Hopf Bifurcation; Tunnel Diode; Oscillations.

#### I. INTRODUCTION

One of the most powerful methods for studying periodic solutions in autonomous nonlinear systems is the Hopf bifurcation theorem. It shows that oscillations near an equilibrium point can be understood by looking at the eigenvalues of the linearized equations for perturbations from equilibrium and at certain crucial derivatives of the equation [1]. The problem discussed in this paper is applying Hopf analysis to an electronic circuit and verifying this using both simulation and hardware implementation and measurement.

Tunnel diodes are heavily doped p-n junctions only some 10 nm (100 Å) wide. The heavy doping results in a broken bandgap, where conduction band electron states on the n-side are more or less aligned with valence band hole states on the p-side. Under normal forward bias operation, as voltage begins to increase, electrons at first tunnel through the p-n junction barrier because electron states in the conduction band on the n-side become aligned with valence band hole states on the p-side of the p-n junction. As voltage increases further, these states become more misaligned and the current drops. This is called negative resistance, because current decreases with increasing voltage. As voltage increases yet further, the diode begins to operate as a normal diode, where electrons travel by conduction across the p-n junction, and no longer by tunneling through the p-n junction barrier. Thus, the most important operating region for a tunnel diode is the negative resistance region.

This paper is organized as follows. In Section II, Hopf bifurcation analysis is applied to the oscillator circuit [2]. In Section III, a SPICE model for the tunnel diode is built [3][4] and used in the OrCAD software. The simulated I-V characteristics of the model are compared with the actual I-V characteristics measured with an I-V characterization device. The circuit is then simulated using PSPICE and implemented, and the simulated results are compared with the measured ones.

#### II. CIRCUIT SCHEMATIC AND HOPF BIFURCATION ANALYSIS

The circuit consists of a single tunnel-diode, an inductor, and a capacitor as shown in Figure 1.



Figure 1. Circuit schematic of a single tunnel diode LC oscillator

The state equations of this circuit are [5]:

$$\dot{i}_L = \frac{1}{L} v_C \tag{1}$$

$$\dot{v_C} = \frac{1}{C} \left( g \left( v_B - v_C \right) - i_L \right) \tag{2}$$

Where $g(v)$ describes the relation between current and voltage at the tunnel diode terminals as follows:

$$I_D = I_{excess} + I_{tunnel} + I_{Diode} \tag{3}$$

$$I_{D} = \frac{v_{D}}{R_{v}} e^{(\frac{v_{D} - v_{v}}{v_{ex}})} + \frac{v_{D}}{R_{o}} e^{-(\frac{v_{D}}{v_{o}})} + I_{s} e^{(\frac{v_{D}}{V_{T}})} \tag{4}$$

Let:

$$x_1 = i_L, \quad x_2 = v_C \tag{5}$$

With $v_B$ chosen as the parameter $\mu$, (1) and (2) become:

$$x_1' = \frac{1}{L}x_2 \tag{6}$$

$$x_2' = \frac{1}{C} \left( g \left( \mu - x_2 \right) - x_1 \right) \tag{7}$$

The equilibrium point is

$$x_2 = 0, \quad x_1 = g\left(\mu\right) \tag{8}$$

Let:

$$y_2 = x_2, \quad y_1 = x_1 - g(\mu) \tag{9}$$

The new ODE system will be:

$$y_1' = \frac{1}{L} y_2 \tag{10}$$

$$y_2' = \frac{1}{C} \left[ g(\mu - y_2) - y_1 - g(\mu) \right] \tag{11}$$

This system has a fixed point at the origin, and the matrix of the linearized system is:

$$A = \begin{bmatrix} 0 & \frac{1}{L} \\ -\frac{1}{C} & -f(\mu) \end{bmatrix} \tag{12}$$

Where $f(\mu)$ is the linear component of $g(\mu)$, and the eigenvalues of $A$ are:

$$\lambda = \frac{-f(\mu) \pm \sqrt{f(\mu)^2 - \frac{4}{LC}}}{2} \tag{13}$$

At the value $\mu_o$ where $f$ vanishes:

$$f(\mu_{o}) = 0, \quad \lambda = \pm i \frac{1}{\sqrt{LC}} \tag{14}$$

The derivative of the eigenvalues is positive for all values of $\mu$. The index is calculated and proved to be negative. The bifurcation occurs in the negative resistance region of the tunnel diode.
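The behavior of the eigenvalues in (13) around $f = 0$ can be checked numerically. The Python sketch below evaluates (13) for a positive, a zero, and a negative value of $f$, treating $f$ simply as a given number. The component values and the function name are hypothetical and are not the values used in the experiments.

```python
import cmath
import math


def eigenvalues(f, L, C):
    """Roots of lambda^2 + f*lambda + 1/(L*C) = 0, i.e. equation (13)."""
    root = cmath.sqrt(f * f - 4.0 / (L * C))
    return (-f + root) / 2.0, (-f - root) / 2.0


L_H, C_F = 1e-6, 1e-9   # hypothetical inductance (H) and capacitance (F)
for f in (1e6, 0.0, -1e6):
    lam1, lam2 = eigenvalues(f, L_H, C_F)
    print(f, lam1, lam2)
```

For $f = 0$ the pair is purely imaginary with magnitude $1/\sqrt{LC}$, as in (14). The real part is negative for $f > 0$ and positive for $f < 0$, so the pair crosses the imaginary axis as $f$ changes sign.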

#### III. CIRCUIT SIMULATION AND IMPLEMENTATION

In this section, circuit simulation is performed using OrCAD to verify the range of oscillation of the circuit. Then the circuit is implemented and measured to compare simulation results with experimental measurements.

#### A. Tunnel Diode Spice Model

PSPICE has no built-in tunnel diode model, so an analog behavioral model (ABM) is used to simulate the tunnel diode, as shown in Figure 2, using the model equations in [3].



Figure 2. Analog behavioral model of a tunnel diode

The ABM is a superposition of three currents:

$$I = I_{diode} + I_{tunnel} + I_{excess} \tag{15}$$

• The diode current ($I_{diode}$) is a regular p-n junction forward current due to injection of free electrons and holes from the conduction band (CB) in n-type to CB in p-type, and from the valence band (VB) in p-type to VB in n-type, respectively. This current is given by:

$$I_{diode} \approx I_{s} \left[ e^{\left(\frac{V}{\eta V_{th}}\right)} - 1 \right] \tag{16}$$

Where: $I_s$ denotes the saturation current, $\eta$ represents the ideality factor, and $V_{th} = kT/q$.

• The tunneling current ($I_{tunnel}$) is the current due to tunneling of free electrons from CB in the n-type to free holes of the same energy in VB of the p-type. This current increases in the positive resistance region (ohmic region) due to the increase of the aligned energy states below the Fermi level in CB with those above the Fermi level in VB. In the negative resistance region, the number of aligned energy states begins to decrease, which reduces this current. This current is given by:

$$I_{tunnel} = \frac{V}{R_0} e^{\left[-\left(\frac{V}{V_0}\right)^m\right]} \tag{17}$$

Where: $R_0$ is the tunnel diode resistance in the ohmic region, and m is a factor whose value ranges from 1 to 3. Also, $V_0$ ranges from 0.1V

• The excess current ($I_{excess}$) is an additional tunneling current related to parasitic tunneling via impurities and is given by:

$$I_{excess} = \frac{V}{R_V} e^{\left[\left(\frac{V-V_V}{V_{ex}}\right)\right]} \tag{18}$$

Where: $V_V$ denotes the valley voltage (the voltage at the local minimum of the current). $R_V$ and $V_{ex}$ are empirical parameters; in high quality diodes $R_V \gg R_0$, and $V_{ex}$ ranges from 1 to 5 V.
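The three-current model can be evaluated numerically to locate the region where the current falls with increasing voltage. The Python sketch below sums the diode, tunneling, and excess currents of (15) and scans a voltage grid for that region. The diode term is written in the usual form $I_s(e^{V/\eta V_{th}} - 1)$. All parameter values and function names are hypothetical; the values were chosen only so that the curve has a peak and a valley, and they are not fitted to the 1N3716.

```python
import math

# Hypothetical parameter values for illustration only.
P = {"Is": 1e-12, "eta": 1.0, "Vth": 0.02585,
     "R0": 5.0, "V0": 0.065, "m": 1.0,
     "RV": 1000.0, "VV": 0.35, "Vex": 1.0}


def tunnel_diode_current(v, p=P):
    """Total current of the behavioral model, equation (15)."""
    i_diode = p["Is"] * (math.exp(v / (p["eta"] * p["Vth"])) - 1.0)
    i_tunnel = (v / p["R0"]) * math.exp(-((v / p["V0"]) ** p["m"]))
    i_excess = (v / p["RV"]) * math.exp((v - p["VV"]) / p["Vex"])
    return i_diode + i_tunnel + i_excess


def negative_resistance_region(v_max=0.6, step=0.001, p=P):
    """Return (start, end) of the span where current decreases."""
    n = int(round(v_max / step))
    volts = [k * step for k in range(n + 1)]
    amps = [tunnel_diode_current(v, p) for v in volts]
    falling = [k for k in range(n) if amps[k + 1] < amps[k]]
    if not falling:
        return None
    return volts[falling[0]], volts[falling[-1] + 1]


print(negative_resistance_region())
```

With these illustrative values the falling span starts just above $V_0$ and ends near the valley, where the excess and diode currents begin to dominate.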

I-V characteristics were measured using an Agilent I-V characterization system. Figure 3 shows the measured and simulated I-V characteristics.



Figure 3. Simulated I-V characteristics of tunnel diode ABM on PSPICE vs measured characteristics using I-V characterization system

#### B. Simulation

The circuit was simulated using PSPICE, and a time-domain analysis was performed. It was found that if the bias exceeds $\mu_o$, the voltage across the inductor and capacitor oscillates with a frequency $\omega_o = \frac{1}{\sqrt{LC}}$, as mentioned before. This is shown in Figure 4. When the bias exceeds $V_V$, the oscillation begins to die out.



Figure 4. Output voltage versus time in PSPICE, showing oscillations at frequency $\omega_o$

#### C. Circuit Implementation

The circuit is implemented using a simple breadboard, and measured using an oscilloscope.



Figure 5. Measured output voltage of the real circuit using an oscilloscope with its internal digital filter applied.

It was found that the output voltage was very close to the simulated one except for some noise that appeared on the signal, so an internal digital filter built in the oscilloscope is used and the filtered output voltage is shown in Figure 5. We see that it is almost the same as the simulated circuit.

#### IV. CONCLUSION

The oscillation of a simple tunnel diode LC circuit was proved mathematically using the Hopf bifurcation theorem. The results showed that oscillation occurs at a bias voltage in the negative resistance region of the tunnel diode. An ABM of the tunnel diode was built to simulate the circuit in PSPICE. Finally, the real circuit was implemented, and its measured output was very close to the simulated one.

#### REFERENCES

- [1] A. Mees and L. Chua, "The Hopf Bifurcation Theorem and Its Applications to Nonlinear Oscillations in Circuits and Systems", IEEE, 1979.
- [2] R. Munoz, "Introduction to Bifurcations and the Hopf Bifurcation Theorem for Planar Systems", Colorado State University, 2011, pp. 11 - 14.
- [3] M. Lotfi and D. Zohir, "A Spice Behavioral Model of Tunnel Diode: Simulation and Application", International Conference on Automation, Control, Engineering and Computer Science (ACECS'14), Sousse, Tunisia, 2014, pp. 190 - 204.
- [4] D. Neculoiu and T. Tebeanu, "SPICE implementation of double barrier resonant tunnel diode model", in Semiconductor Conference, 1996, International, Sinaia, Romania, pp. 181 - 184 vol.1.
- [5] J. E. Marsden and M. McCracken, "The Hopf Bifurcation and Its Applications", Springer-Verlag, pp. 95 - 104.
- [6] GeneralElectric, "Tunnel Diodes" 1N3712-21 datasheet.

# **Reconfigurable Hyper-Structures for Intrinsic Digital Circuit Evolution**

S. Kazarlis, J. Kalomiros, V. Kalaitzis, D. Bogas, P. Mastorokostas, A. Balouktsis Dept. of Informatics Engineering Technological Educational Institute of Central Macedonia, 62124 Serres, Greece email: kazarlis@teicm.gr, ikalom@teicm.gr

V. Petridis Dept. of Electrical and Computer Engineering Aristotle University of Thessaloniki, 54124 Thessaloniki, Greece email: petridis@eng.auth.gr

Abstract— A workbench for intrinsic evolution of digital circuits is presented, based on a Cartesian Genetic Programming algorithm running on a personal computer and a reconfigurable platform suitable for run-time reconfiguration. Two types of Cartesian cell structures are proposed, based on a cylindrical interconnection grid. In addition to a feed-forward network, the cylindrical grid can allow feedback loops as well. The proposed structures are combined with dedicated communication and control logic, producing automatically a fitness result for each circuit configuration. The proposed system is tested with known digital circuits and evaluated in terms of resource usage and configuration speed.

Keywords - Evolvable Hardware; intrinsic evolution; reconfigurable hardware; Cartesian structures.

#### I. INTRODUCTION

A lot of research has been directed in recent years towards the study of evolvable hardware (EHW), which is a field of evolutionary computation that employs evolutionary algorithms for the building of electronic circuits [1]-[3]. Evolvable hardware is an offspring of Genetic Programming, an evolutionary technique originally proposed for the evolution of software. In EHW, the circuits are encoded into genotypes, traditionally using tree structures, and more recently using Cartesian lattices or other forms, like binary strings. From the genotype the actual circuit or phenotype is constructed and tested, either in a simulator, as in the case of extrinsic evolution [4] [5] or in a reconfigurable device, as in intrinsic evolution [6]-[8]. Evolvable hardware can have a number of important applications, most notably in the automatic design of adaptive and fault-tolerant systems [3] and in the design of digital circuits, where new unconventional forms of known circuits can be found and new design principles can be derived [9] [10].

A variation of Genetic Programming, called Cartesian Genetic Programming (CGP), encodes a digital circuit as a directed graph, where functional units are represented by a rectangular array of nodes connected together to perform a computational task on binary input data [9] [11]. The genotype is a binary string that represents connections and gate functions. Based on this concept, evolvable hardware platforms have been proposed, both for extrinsic and for intrinsic evolution of digital circuits [5] [8] [9]. Also, following the notion of a Cartesian node array, a new type of reconfigurable platform has been introduced, the Virtual Reconfigurable Circuit, or VRC [12] [13]. A VRC is a new reconfigurable device realized on top of an ordinary Field Programmable Gate Array (FPGA), consisting of an array of Programmable Elements, an interconnection network and a configuration memory, all implemented on the available resources of a common FPGA device. The VRC concept has been utilized for the evolution of combinational circuits [14], and the evolution of components for image and signal processing [8].

The simple merit of such circuits is that while they adhere to the basic LUT cell structure of an FPGA chip, they are still open to full run-time reconfiguration by the user, through well determined configuration rules set by the matrix designer. In this way, the VRC reconfiguration circumvents the need for low-level configuration. The latter requires complicated low-level knowledge of the particular FPGA chip and the development of custom compilation tools. Both tasks are daunting and are usually hindered by undisclosed information or by the advent of new devices that revolutionize the field.

In this paper, a workbench for intrinsic digital evolution experiments is designed and implemented in a Field Programmable Gate Array. The system includes a host computer running a genetic programming application and a communication channel that allows the run-time reconfiguration of the evolvable platform. The configuration string is composed of the genotype encoded according to the CGP principles, while the phenotype is implemented and evaluated in the reconfigurable device.

The concept implemented in the proposed workbench is based on reconfigurable hyper-structures following the general idea of the VRCs. They form two-dimensional arrays of cells, which are interconnected with a predefined fixed or programmable switching array. The proposed structures adhere to specific interconnection properties derived from a cylindrical interconnection grid. In addition to the feed-forward network, the cylindrical grid can allow feedback loops as well. The proposed Configurable Cylindrical Structures or CCS are combined with custom communication and control logic, implemented as finite state machines. The peripheral logic allows communication with a PC host application over a serial port. An embedded register file is used in order to store the configuration values. Additional logic automatically produces a fitness result for each circuit configuration. The controller returns this fitness result to the host computer and the host CGP application proceeds to reconfigure the CCS.

In this preliminary phase, the proposed workbench is tested using configuration strings corresponding to typical test-benches for evolutionary design. The overall time for CCS configuration and fitness response is measured as a function of CCS dimensions. The required FPGA resources for the implementation of the CCS are also measured as a function of circuit complexity. In this way, the suitability of the proposed workbench for intrinsic evolution experiments is evaluated.

The remainder of the paper is organized as follows. In Section II, two alternative CCS circuits are reported and their differences are discussed. In Section III, the overall architecture, including the dedicated controllers and fitness logic, is presented. In Section IV, test configurations are conducted and evaluation results are reported, while in Section V, the paper is concluded.

#### II. THE CONFIGURABLE CARTESIAN STRUCTURES

#### A. CCS-1: A feed-forward Cartesian structure

The proposed configurable structures are developed as parameterizable blocks using the hardware description language VHDL, where external parameters are the required number of rows and columns in the Cartesian structure and the number of inputs and outputs in the CCS device. In Fig. 1, the first hyper-structure (CCS-1) implemented in the proposed workbench is presented. It is a two-dimensional lattice of two-input one-output cells connected with a fixed feed-forward interconnection grid. Each output can feed two separate forward inputs. In addition, the interconnection grid has a cylindrical structure, meaning that the lower-row cells are seamlessly interconnected with the upper-row cells. As a result, all cells of the hyper-structure receive inputs adhering to the same interconnection rules and the structure can automatically expand using a FOR GENERATE statement in VHDL.

The first column in the design of Fig. 1 is a set of multiplexers whose role is to distribute the input signals to the front-end cells. There are two *p*-input multiplexers per cell, where *p* is the number of inputs of the target circuit. For a target circuit with q outputs, q N:1 multiplexers in the output stage each select one among the N possible outputs.

Each cell consists of a 2-input LUT implemented by a four-to-one multiplexer, as shown in Fig. 2. The LUT is able to implement in total sixteen different two-input functions, including the basic digital gates.



Figure 1. A simple 4x4 Cartesian structure (CCS-1) with a fixed grid of interconnections.

An embedded register file is used to store the configuration scheme. In future expansions, each cell can easily be enhanced with a flip-flop for sequential circuit design. A 4-bit register, where the configuration bits are stored, corresponds to each cell in the hyper-structure. Additionally, configuration registers are attributed to the selection bits of the input multiplexers. The register file is rewritten at run-time whenever the genetic algorithm updates the evolving circuit. Table I presents all possible gates and logic functions that a cell can implement, along with their corresponding binary configuration patterns. A and B are the cell inputs. In order to configure the four-input, four-output 4x4 lattice of Fig. 1, a total of eighty-eight configuration bits is nominally required. These bits are distributed among the selection bits of the eight 4:1 input multiplexers (2x8=16 bits), the sixteen lattice cells (4x16=64 bits) and the four output multiplexers (2x4=8 bits).
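The bit count above can be generalized to other lattice sizes. The Python sketch below reproduces the figure of eighty-eight bits for the 4x4 lattice; it assumes that each multiplexer needs the smallest number of selection bits covering its inputs and that each output multiplexer selects among as many candidates as there are rows. Both assumptions and the function name are introduced for this illustration.

```python
from math import ceil, log2


def nominal_config_bits(rows, cols, p, q):
    """Nominal configuration bits for a rows x cols lattice with
    p circuit inputs and q circuit outputs."""
    # Two p:1 input multiplexers per front-end cell (one cell per row).
    input_mux_bits = 2 * rows * ceil(log2(p))
    # One 4-bit LUT pattern per lattice cell.
    cell_bits = 4 * rows * cols
    # q output multiplexers, assumed to choose among `rows` outputs.
    output_mux_bits = q * ceil(log2(rows))
    return input_mux_bits + cell_bits + output_mux_bits


print(nominal_config_bits(rows=4, cols=4, p=4, q=4))
```

The cell term grows with the product of rows and columns, so it dominates the total for larger lattices.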



Figure 2. Four-to-one multiplexer implementing the 2-input LUT for each cell of the proposed hyper-structure.

| X3 | X2 | X1 | X0 | Implemented logic   | Boolean function            |
|----|----|----|----|---------------------|-----------------------------|
| 0  | 0  | 0  | 0  | Always outputs zero | $F = 0$                     |
| 0  | 0  | 0  | 1  | F = A NOR B         | $F = \overline{A + B}$      |
| 0  | 0  | 1  | 0  | F = A AND NOT(B)    | $F = A \cdot \overline{B}$  |
| 0  | 0  | 1  | 1  | F = NOT(B)          | $F = \overline{B}$          |
| 0  | 1  | 0  | 0  | F = NOT(A) AND B    | $F = \overline{A} \cdot B$  |
| 0  | 1  | 0  | 1  | F = NOT(A)          | $F = \overline{A}$          |
| 0  | 1  | 1  | 0  | F = A XOR B         | $F = A \oplus B$            |
| 0  | 1  | 1  | 1  | F = A NAND B        | $F = \overline{A \cdot B}$  |
| 1  | 0  | 0  | 0  | F = A AND B         | $F = A \cdot B$             |
| 1  | 0  | 0  | 1  | F = A XNOR B        | $F = \overline{A \oplus B}$ |
| 1  | 0  | 1  | 0  | Transfers A         | $F = A$                     |
| 1  | 0  | 1  | 1  | If B then F = A     | $F = A + \overline{B}$      |
| 1  | 1  | 0  | 0  | Transfers B         | $F = B$                     |
| 1  | 1  | 0  | 1  | If A then F = B     | $F = \overline{A} + B$      |
| 1  | 1  | 1  | 0  | F = A OR B          | $F = A + B$                 |
| 1  | 1  | 1  | 1  | Always outputs 1    | $F = 1$                     |

TABLE I. THE SIXTEEN LOGIC FUNCTIONS CORRESPONDING TO 4-BIT CONFIGURATION PATTERNS
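The mapping of Table I can be reproduced by treating the 4-bit pattern as the data inputs of a 4:1 multiplexer whose select lines are the cell inputs. The following is a minimal sketch under one assumption that the text does not state explicitly: the select index is taken as 2·B + A, which is the ordering that reproduces every row of Table I. The function names are illustrative only.

```python
def lut_cell(config: int, a: int, b: int) -> int:
    """Evaluate a 2-input LUT cell; `config` holds bits X3..X0 (X0 is the LSB)."""
    # The cell inputs act as the select lines of a 4:1 multiplexer.
    # Assumed ordering, inferred from Table I: select index = 2*B + A.
    return (config >> (2 * b + a)) & 1


def truth_table(config: int) -> list[int]:
    """Cell outputs for (A, B) = (0,0), (1,0), (0,1), (1,1), in that order."""
    return [lut_cell(config, a, b) for b in (0, 1) for a in (0, 1)]
```

For instance, `truth_table(0b0110)` yields the XOR column and `truth_table(0b1011)` yields A + NOT(B), in agreement with the corresponding rows of the table.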




The configuration file grows with the dimensions of the Cartesian structure and the number of inputs and outputs. In the present implementation, the register file consists of 8-bit registers, since they are compatible with 8-bit communication over the serial port. The proposed register file architecture is shown in Fig. 3. Following this scheme, the configuration of the hyper-structure of Fig. 1 requires four bytes for input routing and sixteen bytes for cell configuration. If the circuit produces two outputs, then two additional bytes are needed. In this way, the configuration file includes many redundant bits, which can however be used in future expansions. For example, attributing one byte to each pair of input multiplexers allows for up to four useful selection bits per multiplexer, or up to sixteen input channels. This is more than the number of inputs required in most of our present evolution tests. Also, according to Fig. 3, one 8-bit register is attributed to each lattice cell. Although only the four lower bits are useful in the present design, the higher bits can be used in later upgrades in order to support function generators with 3-input LUTs. The role of input, output and configuration bits in the basic 2-input LUT cell is shown in Fig. 2.
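The bit and byte counts quoted above can be checked with a short calculation. This is a sketch under stated assumptions rather than the authors' tooling: it assumes two input multiplexers per lattice row (the A and B inputs of the first-column cell) and output multiplexers that each select one of the rows; the function names are hypothetical.

```python
def nominal_config_bits(rows: int, cols: int, n_inputs: int, n_outputs: int) -> int:
    """Bits strictly needed: input-mux selects + cell LUT bits + output-mux selects."""
    in_sel = (n_inputs - 1).bit_length()   # select bits of one input multiplexer
    out_sel = (rows - 1).bit_length()      # assumed: each output mux picks one row
    input_muxes = 2 * rows                 # assumed: two input muxes (A, B) per row
    return input_muxes * in_sel + 4 * rows * cols + n_outputs * out_sel


def register_file_bytes(rows: int, cols: int, n_outputs: int) -> int:
    """8-bit registers used: one per input-mux pair, one per cell, one per output."""
    return rows + rows * cols + n_outputs
```

With these assumptions, `nominal_config_bits(4, 4, 4, 4)` gives the 88 bits quoted for the lattice of Fig. 1, and `register_file_bytes(4, 4, 2)` gives 4 + 16 + 2 = 22 bytes for a two-output circuit on the same lattice.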

#### B. CCS-2: A more General Cartesian Structure

An alternative Cartesian Structure (CCS-2) is presented in Fig. 4. The configurable cells belong again to an NxM lattice; however the interconnection grid is more flexible than that of CCS-1, since it is implemented by multiplexers allowing sets of predefined connections. The output of each cell can be selected to provide input to four different neighboring cells, namely to three forward cells in the next column and to the adjacent cell on the row below.

Figure 3. Architecture of the 8-bit register file used for the configuration of the CCS structure of Fig. 1. Indices correspond to the cells of the 2D lattice.

Each cell input can be connected to one of two possible outputs. The selection process is achieved by two-to-one multiplexers. The interconnection grid has again a cylindrical structure as indicated by the arrows in Fig. 4. In this case, the cylindrical interconnections allow the creation of feedback loops, since an output can be transferred through a column and return as input to the same cell. For example, the output of cell 3 can go through cells 7, 11, 15 and return as input to cell 3.
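The feedback path in this example can be traced with modular index arithmetic. The sketch below assumes a 4x4 lattice whose cells are numbered row by row, so that the cell on the row below is reached by adding the number of columns, and the cylindrical wrap-around is a modulo operation. The numbering convention is an assumption, chosen because it reproduces the cell sequence given in the text.

```python
def cell_below(cell: int, rows: int = 4, cols: int = 4) -> int:
    """Index of the cell one row below, wrapping from the last row to the first."""
    # Assumed numbering: row-major, so moving down one row adds `cols`.
    return (cell + cols) % (rows * cols)


def column_loop(start: int, rows: int = 4, cols: int = 4) -> list[int]:
    """Follow the row-below connection until the path returns to `start`."""
    path = [start]
    nxt = cell_below(start, rows, cols)
    while nxt != start:
        path.append(nxt)
        nxt = cell_below(nxt, rows, cols)
    return path
```

Under this numbering, `column_loop(3)` visits cells 3, 7, 11 and 15 before returning to cell 3.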



Figure 4. The hyperstructure CCS-2. Two-input multiplexers are used for the routing of interconnections between cells.

| b7 b6    | b5 b4    | b3 b2 b1 b0 |
|----------|----------|-------------|
| Mux      | in1-B    | Mux in1-A   |
| muxb-1,1 | muxa-1,1 | Config(1,1) |
| muxb-1,2 | muxa-1,2 | Config(1,2) |
| ...      | ...      | ...         |
| muxb-1,M | muxa-1,M | Config(1,M) |
| Mux      | in2-B    | Mux in2-A   |
| muxb-2,1 | muxa-2,1 | Config(2,1) |
| muxb-2,2 | muxa-2,2 | Config(2,2) |
| ...      | ...      | ...         |
| muxb-2,M | muxa-2,M | Config(2,M) |
| ...      | ...      | ...         |
| Mux      | inN-B    | Mux inN-A   |
| muxb-N,1 | muxa-N,1 | Config(N,1) |
| ...      | ...      | ...         |
| muxb-N,M | muxa-N,M | Config(N,M) |
|          | Mux out1 |             |
|          | Mux out2 |             |

Figure 5. Architecture of the 8-bit register file used for the configuration of CCS-2. Indices correspond to the cells of the 2D lattice.

The register file employed for the configuration of CCS-2 is shown in Fig. 5. In this file, each 8-bit cell register is divided into a four-bit nibble for cell configuration (b3 down to b0) and a nibble for multiplexer configuration, of which only bits b4 and b6 are used. Input multiplexers have dedicated registers at the beginning of each row, while first-column cells use only the lower nibble of a configuration register.
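The cell register layout described above can be sketched as a pair of pack and unpack helpers. The assignment of b4 to the A-input multiplexer and b6 to the B-input multiplexer is an assumption read off the column order of Fig. 5; the helper names are illustrative.

```python
def pack_cell_register(lut: int, mux_a: int, mux_b: int) -> int:
    """Build one 8-bit CCS-2 cell register from its three fields."""
    if not 0 <= lut <= 0xF:
        raise ValueError("lut must fit in 4 bits")
    if mux_a not in (0, 1) or mux_b not in (0, 1):
        raise ValueError("multiplexer selects are single bits")
    # b3..b0: cell function; b4: select for input A (assumed); b6: select for input B (assumed).
    return (mux_b << 6) | (mux_a << 4) | lut


def unpack_cell_register(reg: int) -> tuple[int, int, int]:
    """Return (lut, mux_a, mux_b) from an 8-bit register; b5 and b7 are ignored."""
    return reg & 0xF, (reg >> 4) & 1, (reg >> 6) & 1
```

Bits b5 and b7 are left at zero here, which matches the statement that they are unused in the present design.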

Variations of the above Cartesian structures can lead to a trade-off between interconnection flexibility and reduced complexity. More interconnection options increase the possibility to reach a solution. At the same time, the search space is expanded and complexity is increased. A fixed interconnection grid can reduce complexity for some problems but it may also require a larger grid in order to implement a solution.

#### III. CONTROLLER ARCHITECTURE

The main setup of the proposed workbench consists of a PC running the evolutionary algorithm and an FPGA board. The FPGA is configured to implement the CCS design and supportive hardware logic for configuration, testing and control. A dedicated custom controller and datapath were designed for the configuration of the Cartesian structure in the FPGA device. The datapath includes logic which produces the test patterns for the evaluation of each circuit configuration and returns a fitness result to the evolutionary algorithm. The overall architecture is based on 8-bit registers and is presented in Fig. 6. It includes a UART peripheral controller, which supports communication with the host PC application over the serial port, and a streaming controller, which implements the algorithmic steps of the configuration and testing procedure in the form of a finite state machine. Other system blocks are the register file for the storage of configuration data, the CCS structure which is configured by the evolutionary algorithm, and a "ground truth" block, where the target logic is implemented. Finally, a computational block extracts the Hamming distance between the truth tables of the target logic and the CCS logic under test.
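The fitness measure described above, the Hamming distance between two truth tables, can be illustrated in software. This is a minimal sketch, not the hardware block itself: the candidate and target circuits are modelled as Python functions returning a tuple of output bits, and the full adder serves as a hypothetical ground truth.

```python
from itertools import product
from typing import Callable, Tuple

Circuit = Callable[..., Tuple[int, ...]]


def full_adder(a: int, b: int, cin: int) -> Tuple[int, int]:
    """Ground-truth logic: returns (sum, carry-out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def hamming_fitness(candidate: Circuit, target: Circuit, n_inputs: int) -> int:
    """Count mismatching output bits over all 2**n_inputs test patterns."""
    distance = 0
    for bits in product((0, 1), repeat=n_inputs):
        got, want = candidate(*bits), target(*bits)
        distance += sum(g != w for g, w in zip(got, want))
    return distance
```

A distance of zero means that the candidate reproduces the target truth table exactly; a candidate whose carry output ignores `cin`, for example, differs from the full adder in two output bits.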

Figure 6. Block diagram of the implemented system architecture.

The heart of the system is the streaming controller. It produces clock and control signals for all other blocks and makes data available to them through the system bus. It can initiate a UART "receive" or "transmit" operation, it clocks successive test inputs to the CCS, and it returns the Hamming distance to the host computer as a fitness result for circuit evaluation. Then, the genetic algorithm evaluates the result and produces a new genotype in the form of a new configuration array. This procedure is repeated until the genetic algorithm reaches a predefined number of generations. Fig. 7 presents the basic state diagram of the streaming controller, between successive configurations. At the beginning, the controller is in the "idle" state, waiting for a protocol character that signals the beginning of a configuration stream. The controller then enters the "receive" state and counts the received data. It repeats the reception until all expected data in the configuration array have been received. Then, a "test" process begins, where the controller employs a finite state machine in order to create successive test patterns as input to the CCS and the ground truth blocks. At each repetition, a clock pulse is sent to the Hamming distance block, where the Hamming distance is accumulated. When all test patterns have been applied, the total Hamming distance is transmitted back to the computer via the serial port. The controller returns to the "idle" state, waiting for a new configuration array.
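The idle, receive, test and transmit sequence described above can be modelled in software as a small state machine. The sketch below is a behavioural illustration only: the value of the protocol character (`START_CHAR`), the class name and the `evaluate` callback are hypothetical, and the test phase is collapsed into a single function call rather than a clocked sequence of patterns.

```python
from enum import Enum, auto
from typing import Callable, List, Optional


class State(Enum):
    IDLE = auto()
    RECEIVE = auto()
    TEST = auto()
    TRANSMIT = auto()


START_CHAR = 0x02  # hypothetical protocol character; the paper does not specify it


class StreamingController:
    def __init__(self, n_config_bytes: int, evaluate: Callable[[List[int]], int]):
        self.n_config_bytes = n_config_bytes
        self.evaluate = evaluate  # maps a configuration array to a Hamming distance
        self.state = State.IDLE
        self.buffer: List[int] = []

    def on_byte(self, byte: int) -> Optional[int]:
        """Feed one received byte; return the fitness when a full cycle completes."""
        if self.state is State.IDLE:
            if byte == START_CHAR:          # start of a configuration stream
                self.buffer = []
                self.state = State.RECEIVE
            return None
        self.buffer.append(byte)            # RECEIVE: count incoming data
        if len(self.buffer) < self.n_config_bytes:
            return None
        self.state = State.TEST             # apply test patterns, accumulate distance
        distance = self.evaluate(self.buffer)
        self.state = State.TRANSMIT         # hardware would send `distance` via UART
        self.state = State.IDLE             # wait for the next configuration array
        return distance
```

Each call to `on_byte` corresponds to one byte arriving from the UART; the return value is `None` until the last configuration byte has been received and evaluated.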

#### IV. TESTS AND EVALUATION

At the present stage, the proposed workbench is used to configure a number of test circuits in the CCS. The system is evaluated in terms of the required hardware resources and total response time. The response time is significant in evolution experiments, since the configuration cycle is repeated hundreds of thousands of times.



Figure 7. State diagram of the implemented controller.



Figure 8. Example configuration of the full-adder, implemented with the Cartesian structure of Fig. 1 (CCS-1).

The structures were verified with a number of test configurations. The following widely used test circuits were implemented: (a) the half adder, (b) the full adder, (c) the 2:4 binary decoder, (d) the 2:1 and 4:1 multiplexers, and (e) the 2-bit multiplier. These circuits can be effectively implemented by both hyper-structures, employing grids of variable sizes. The possibility of feedback loops in CCS-2 can be used to implement latches. The list of our test circuits is therefore completed with (f) the S-R latch and (g) the D-latch.

An interesting implementation is that of the full adder. CCS-1 can implement the full adder using a 4x3 cell grid configured as shown in Fig. 8. Several cells are configured as "transfer" gates. Eighteen configuration bytes are required in this example. Two bytes correspond to the output multiplexers. CCS-2 can implement the same circuit in a 3x3 grid. An implementation of the S-R latch is shown in Fig. 9.

The resource requirements of the overall system shown in Fig. 6 are quite low. As shown in Table II, the supportive control-and-test logic requires 220 logic elements (LE) and 150 registers, while the CCS structures require an increasing amount of LE out of a Cyclone II 2C35 FPGA device.



Figure 9. S-R latch implemented using the CCS-2 structure.

TABLE II. RESOURCE USAGE (CYII 2C35F672)

| Hardware block         | Logic Elements | Total registers |
|------------------------|----------------|-----------------|
| Control and test logic | 220            | 150             |
| CCS-1 (2x2)            | 19             | 18              |
| CCS-1 (4x4)            | 126            | 79              |
| CCS-1 (8x8)            | 438            | 293             |
| CCS-1 (16x16)          | 1521           | 1099            |
| CCS-2 (2x2)            | 79             | 40              |
| CCS-2 (4x4)            | 208            | 127             |
| CCS-2 (8x8)            | 591            | 429             |
| CCS-2 (16x16)          | 2003           | 1603            |


CCS-1 and CCS-2 refer to the structures of Figures 1 and 4, respectively. The number of required LEs follows an almost linear dependence on the number of cells in the structure. The FPGA device used in our experiments provides a total of 33216 LE; therefore, very large structures can be implemented. The system was clocked at 100 MHz.
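The scaling of the logic-element count can be inspected directly from the figures of Table II. The following sketch only re-arranges the published numbers; the dictionary layout and function names are illustrative.

```python
# Logic elements per structure and grid side, copied from Table II.
TABLE_II_LE = {
    ("CCS-1", 2): 19, ("CCS-1", 4): 126, ("CCS-1", 8): 438, ("CCS-1", 16): 1521,
    ("CCS-2", 2): 79, ("CCS-2", 4): 208, ("CCS-2", 8): 591, ("CCS-2", 16): 2003,
}
DEVICE_LE = 33216   # total LEs of the FPGA device, as quoted in the text
CONTROL_LE = 220    # control and test logic, from Table II


def le_per_cell(structure: str, side: int) -> float:
    """Average number of logic elements per lattice cell."""
    return TABLE_II_LE[(structure, side)] / (side * side)


def device_utilisation(structure: str, side: int) -> float:
    """Fraction of the device used by the structure plus the control logic."""
    return (TABLE_II_LE[(structure, side)] + CONTROL_LE) / DEVICE_LE
```

For the largest tabulated case, `device_utilisation("CCS-2", 16)` is below 7%, which is consistent with the remark that much larger structures fit in the device.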



Figure 10. Total time for configuration and fitness response, as a function of grid size.

Another test concerns the response time for the full CCS configuration and response loop. Fig. 10 shows the total response time, measured from the beginning of the transmission of the configuration string until the reception of the Hamming distance, for various sizes of the cell array. The implemented baud rate is 115 kbps. Since the total response time is within several milliseconds, the system can implement and test a large number of phenotypes within a reasonable time interval.
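A rough lower bound on the serial transfer part of this response time can be estimated from the size of the configuration array. The sketch below rests on assumptions that the paper does not state: a line rate of exactly 115200 baud, 8N1 framing (10 bits on the wire per byte), and one protocol byte plus one result byte per cycle. It ignores the on-chip test phase and any host-side latency.

```python
def serial_time_ms(n_bytes: int, baud: int = 115200, bits_per_byte: int = 10) -> float:
    """Time to move `n_bytes` over a UART, in milliseconds (assumed 8N1 framing)."""
    return 1000.0 * n_bytes * bits_per_byte / baud


def cycle_time_ms(config_bytes: int, overhead_bytes: int = 2) -> float:
    """Serial time for one configuration cycle; `overhead_bytes` is an assumption."""
    return serial_time_ms(config_bytes + overhead_bytes)
```

For the 22-byte configuration of a two-output 4x4 lattice, `cycle_time_ms(22)` comes to about 2.1 ms, which is of the same order as the response times described above.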

#### V. CONCLUSIONS

A workbench for intrinsic evolution of digital circuits is proposed. Genotypes are encoded following the principles of Cartesian Genetic Programming, while phenotypes are implemented in a reconfigurable device, making use of expandable 2D arrays of cells. As opposed to previous implementations, the proposed hyper-structures are based on a cylindrical interconnection grid, which reduces complexity and increases interconnection flexibility. Also, the proposed grids allow for feed-forward as well as for feed-back connections between the matrix cells.

A custom embedded controller configures the hyperstructures at run time while additional supportive task logic produces the required test patterns for fitness evaluation. The system is verified by implementing a series of test circuits and is evaluated in terms of the required resources and response time, for various matrix dimensions.

#### ACKNOWLEDGMENT

This work has been co-financed by the European Union (European Social Fund – ESF) and Greek national funds through the Operational Program "Education and Lifelong Learning" of the National Strategic Reference Framework (NSRF) - Research Funding Program: ARCHIMEDES III, Investing in knowledge society through the European Social Fund.

#### REFERENCES

- [1] Evolvable Hardware, T. Higuchi, Y. Liu, and X. Yao, Eds. Springer Science & Business Media, vol. 11, 2006.
- [2] P.C. Haddow and A. M. Tyrrell, "Challenges of evolvable hardware: past, present and the path to a promising future", Genetic Programming and Evolvable Machines, vol. 12, no. 3, 2011, pp. 183-215.
- [3] A. Thompson, P. Layzell, and R.S. Zebulum, "Exploration in design space: unconventional electronics design through artificial evolution", IEEE Transactions on Evolutionary Computation, vol. 3, no. 3, 1999, pp. 167-196.
- [4] J. Miller, P. Thomson, and T. Fogarty, "Designing electronic circuits using evolutionary algorithms. Arithmetic circuits: a case study", Genetic Algorithm and Evolution Strategies in Engineering and Computer Science, D. Quagliarella, J. Periaux, C. Poloni, and G. Winter, Eds. Chichester, UK: Wiley, 1997, pp. 105-131.
- [5] S. Kazarlis, J. Kalomiros, A. Balouktsis, and V. Kalaitzis, "Evolving optimal digital circuits using Cartesian genetic programming with solution repair methods", in Proc. of the 2015 International Conference on Systems, Control, Signal Processing and Informatics (SCSI 2015), Barcelona, Spain, April 7-9, 2015, pp. 39-44.
- [6] Z. Vasicek and L. Sekanina, "An evolvable hardware system in Xilinx Virtex II Pro FPGA", International Journal of Innovative Computing and Applications, vol. 1, no. 1, 2007, pp. 63-73.
- [7] A. Thompson, Hardware evolution: Automatic design of electronic circuits in reconfigurable hardware by artificial evolution, Springer Science & Business Media, 2012.
- [8] L. Sekanina, "Evolvable computing by means of evolvable components", Natural Computing, vol. 3, 2004, pp. 323-355.
- [9] J. Miller, D. Job, and V. Vassilev, "Principles in evolutionary design of digital circuits - part I", Genetic Programming and Evolvable Machines, vol. 1, 2000, pp. 7-35.
- [10] J. R. Koza, M. A. Keane, and M. J. Streeter, "What's AI done for me lately? Genetic programming's human-competitive results", IEEE Intelligent Systems, vol. 18, no. 3, 2003, pp. 25-31.
- [11] J. F. Miller and P. Thomson, "Cartesian genetic programming", in LNCS, EuroGP 2000, vol. 1802, R. Poli, W. Banzhaf, W.B. Langdon, J. Miller, P. Nordin, and T.C. Fogarty, Eds. Heidelberg: Springer, 2000, pp. 121-132.
- [12] L. Sekanina and R. Ruzicka, "Design of the special fast reconfigurable chip using common FPGA", in Proc. of the IEEE Design and Diagnostics of Electronic Circuits and Systems Workshop, Bratislava, Smolenice, 2000, pp. 161-168.
- [13] L. Sekanina, Evolvable components: from theory to hardware implementations, Springer Science & Business Media, 2012.
- [14] L. Sekanina and S. Friedl, "An evolvable combinational unit for FPGAs", Computing & Informatics, vol. 23, no. 5, 2004, pp. 461-486.

# Design and Implementation of a 94 GHz CMOS Down-Conversion Mixer for Image Radar Sensors

Yo-Sheng Lin, Chien-Chin Wang, Guo-Hao Li, and Jay-Min Liu

Department of Electrical Engineering, National Chi Nan University, Puli, Taiwan, ROC

Emails: stephenlin@ncnu.edu.tw, s99323901@ncnu.edu.tw, s101323507@ncnu.edu.tw, s102323518@ncnu.edu.tw

Abstract- A 94 GHz down-conversion mixer for image radar sensors using standard 90 nm CMOS technology is reported. The down-conversion mixer comprises a double-balanced Gilbert cell with peaking inductors between the RF transconductance stage and the LO switching transistors for Conversion Gain (CG) enhancement and Noise Figure (NF) suppression, a Marchand balun for converting the single-ended RF input signal to a differential signal, another Marchand balun for converting the single-ended LO input signal to a differential signal, and an IF amplifier. The mixer consumes 22.5 mW and achieves excellent RF-port input reflection coefficient of -7.1~ -35.9 dB for frequencies of 82.6~96 GHz, and LO-port input reflection coefficient of -10~ -35.9 dB for frequencies of 88.2~110 GHz. In addition, the mixer achieves CG of -3.4~ -6.4 dB for frequencies of 85~97 GHz (the corresponding 3-dB CG bandwidth is 12 GHz) and LO-RF isolation of 41~47.2 dB for frequencies of 90~100 GHz, one of the best CG and LO-RF isolation results ever reported for a down-conversion mixer with operation frequency around 94 GHz. Furthermore, the mixer achieves an excellent input third-order intercept point (IIP3) of -3 dBm at 94 GHz. These results demonstrate that the proposed down-conversion mixer architecture is promising for 94 GHz image radar sensors.

Keywords-CMOS; down-conversion mixer; conversion gain; noise figure; LO-RF isolation

#### I. INTRODUCTION

Recently, several excellent GaAs down-conversion mixers for operation frequencies around 94 GHz have been reported [1]-[3]. For example, in [1], a 90~112 GHz image-reject down-conversion mixer with an improved Lange coupler in a 0.15 µm GaAs PHEMT process is demonstrated. Though a wide bandwidth of 22 GHz was achieved, its performance figures, such as CG of -10 dB, LO-RF isolation of 30 dB, and chip area of 4 mm<sup>2</sup>, are not good enough. In [2], a 90~97 GHz single-balanced down-conversion mixer using a rat-race hybrid ring with five ports and two GaAs Schottky diodes is reported. Though a wide bandwidth of 7 GHz was achieved, its conversion gain of -12.6 dB is not satisfactory. In [3], a 94 GHz single-balanced down-conversion mixer using branch line couplers in a 0.1 µm GaAs process is demonstrated. Similarly, its performance figures, such as CG of -14.7 dB, LO-RF isolation of 34.2 dB, and chip area of 3.38 mm<sup>2</sup>, are not good enough. In this work, to demonstrate that low power dissipation (< 25 mW), high CG (> -5 dB), excellent LO-RF isolation (> 40 dB) and small chip area (< 1 mm<sup>2</sup>) can be achieved simultaneously for a CMOS down-conversion mixer with operation frequency around 94 GHz, we report a miniature low-power 94 GHz down-conversion mixer with excellent CG, 3-dB bandwidth ( $\omega_{3dB}$ ) and port-to-port isolation properties using cost-effective standard 90 nm CMOS technology. In Section II, the circuit design is introduced. In Section III, we present the measurement results and provide some discussion. Section IV presents the conclusion.

Figure 1. Block diagram of the proposed 94 GHz down-conversion mixer.

Figure 2. (a) Schematic, and (b) chip microphotograph of the 94 GHz CMOS down-conversion mixer.

#### II. CIRCUIT DESIGN

The 94 GHz down-conversion mixer was designed and implemented in a standard 90 nm CMOS process provided by a commercial foundry. This technology offers 9 metal layers, named  $MT_1$  to  $MT_9$  from bottom to top. The thickness of  $MT_9$  is 3.4 µm, and that of  $MT_8$ ,  $MT_7 \sim MT_2$  and  $MT_1$  is 0.85 µm, 0.31 µm and 0.24 µm, respectively. The interconnection lines as well as the microstrip-line (MSL) inductors were implemented with the 3.4-µm-thick topmost metal to minimize the resistive loss. Figure 1 shows the block diagram of the proposed 94 GHz down-conversion mixer.

Figure 2(a) shows the schematic of the 94 GHz CMOS down-conversion mixer. The mixer comprises a double-balanced Gilbert cell, a miniature wideband Marchand balun for converting the single-ended RF input signal to a differential signal, another miniature wideband Marchand balun for converting the single-ended LO input signal to a differential signal, and an IF amplifier (which consists of a resistive source-degeneration IF differential amplifier followed by a source-follower IF buffer amplifier). Note that the double-balanced Gilbert cell has peaking inductors between the RF transconductance stage and the LO switching transistors for CG improvement and NF suppression. With the addition of the tail current source comprising transistor $M_7$ and resistor $R_9$, the RF transconductance stage operates as an elegant, yet robust differential pair. The current of the tail current source is also mirrored to the IF buffer amplifier, which comprises transistors $M_{10}$~$M_{13}$ and resistors $R_{12}$~$R_{13}$. The driving current of the IF buffer amplifier can be tuned by varying the resistance of resistors $R_{12}$~$R_{13}$. The chip micrograph of the mixer is shown in Figure 2(b). The chip area is only $0.69 \times 0.84 \text{ mm}^2$ excluding the test pads.

Figure 3(a) shows the schematic diagram of the Marchand balun used in the mixer. It is designed based on the "lumped-element" Marchand balun structure proposed in [4]. Such a balun structure is advantageous in terms of its excellent amplitude/phase match and broadband response compared with traditional single-to-differential transformers. Instead of the area-consuming straight-line or U-shaped MSL structures, a miniature spiral coil MSL structure, i.e. with a patterned MT<sub>1</sub> ground plane (with MT<sub>1</sub> density of about 56%) underneath and around the MSL structure, as shown in Figure 3(b), is adopted to implement the needed inductor elements in the baluns. The metal width and space are 4 $\mu$m and 2 $\mu$m, respectively. The balun consists of an unbalanced input (Port 1) with 50 $\Omega$ terminal impedance, an open terminal (O.C.), two short terminals (GND) and two balanced outputs (Port 2 and Port 3) with 50 $\Omega$ terminal impedance. Note that the coils of the balun are implemented in the 3.4-$\mu$m-thick topmost metal (MT<sub>9</sub>) to minimize the resistive loss. Only the underpass interconnection lines are realized in MT<sub>8</sub>.

Figure 3. (a) Schematic diagram, (b) metal-1 patterned ground plane, and (c) lumped-element equivalent circuit of the proposed 94-GHz-band Marchand balun.

Figure 4. Simulated input reflection coefficients of the down-conversion mixer (a) at RF-port and LO-port and (b) at IF-port.

Figure 5. Simulated CG versus frequency characteristics of the down-conversion mixer.

Figure 6. Measured (a) input reflection coefficients at RF-port ( $S_{11}$ ), and (b) input reflection coefficients at LO-port ( $S_{22}$ ) versus frequency characteristics of the down-conversion mixer.

Figure 7. Measured CG versus RF frequency characteristics of the down-conversion mixer.

Figure 8. Measured LO-RF isolation versus frequency characteristics of the down-conversion mixer.

Figure 9. Measured and simulated NF versus frequency characteristics of the down-conversion mixer.

Figure 3(c) shows the lumped-element equivalent circuit of the balun [4]. The spiral coil coupled line is modeled by the lumped inductor L, and the capacitor C models the coupling capacitance produced by the spiral coil coupled line. That is, the capacitors are realized as the parasitic components of the inductors. Port 1 is the unbalanced RF (or LO) input port, and Port 2 and Port 3 are the balanced RF+ and RF– (or LO+ and LO–) output ports, respectively. In a network, this lumped-element balun can be regarded as an out-of-phase power splitter, comprising a parallel-connected high-pass filter and band-pass filter. The signals at the output ports of the ideal balun have equal power but are 180° out of phase; all ports (except the O.C. port) have an input impedance of 50 $\Omega$ (i.e. Z<sub>0</sub>).
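The ideal-balun condition stated above (equal output power, 180° phase difference) is commonly quantified by the amplitude and phase imbalance between the two transmission coefficients. The following is a minimal sketch of that calculation; the function name and the use of complex $S_{21}$ and $S_{31}$ values as inputs are illustrative choices, not part of the reported design flow.

```python
import cmath
import math


def balun_imbalance(s21: complex, s31: complex) -> tuple[float, float]:
    """Return (amplitude imbalance in dB, phase deviation from 180 degrees)."""
    amplitude_db = 20.0 * math.log10(abs(s21) / abs(s31))
    # Phase difference between the two outputs, folded into the range 0..180 degrees.
    phase_deg = abs(math.degrees(cmath.phase(s21 / s31)))
    return amplitude_db, 180.0 - phase_deg
```

For an ideal lossless balun, $S_{21} = -S_{31}$ with magnitude $1/\sqrt{2}$, so both returned values are zero.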

| References          | Topology                                               | RF Frequency<br>(GHz) | IF Frequency<br>(GHz) | CG<br>(dB) | LO-RF<br>Isolation (dB) | Power<br>(mW) | Chip Area<br>(mm <sup>2</sup> ) | Technology<br>(nm)     | f <sub>T</sub> /f <sub>max</sub><br>(GHz) |
|---------------------|--------------------------------------------------------|-----------------------|-----------------------|------------|-------------------------|---------------|---------------------------------|------------------------|-------------------------------------------|
| This Work           | Gilbert-Cell with Source-<br>Follower Output Buffer    | 94                    | 0.1                   | -3.4       | 47.5                    | 22.5          | 0.58                            | CMOS<br>(90)           | 152/157                                   |
| [1]<br>(2011 MTT-S) | Image-Reject with<br>Improved Lange Coupler            | 94                    | 6                     | -10        | 30                      | NA            | 4                               | GaAs PHEMT<br>(150)    | NA                                        |
| [2]<br>(2010 ICMMT) | Rat-Race Hybrid-Ring<br>with Five Ports and Two Diodes | 94                    | 0.5                   | -12.6      | NA                      | NA            | NA                              | GaAs Schottky<br>Diode | NA                                        |
| [3]<br>(2012 GSMM)  | Single-Balanced Using<br>Branch Line Couplers          | 94                    | 0.325                 | -14.7      | 34.2                    | NA            | 3.38                            | GaAs MHEMT<br>(100)    | 189/334                                   |

TABLE I. SUMMARY OF THE IMPLEMENTED 94 GHZ CMOS DOWN-CONVERSION MIXER, AND RECENTLY REPORTED STATE-OF-THE-ART DOWN-CONVERSION MIXERS WITH OPERATION FREQUENCY AROUND 94 GHZ.

Figure 4(a) shows the simulated input reflection coefficient at the RF-port (S<sub>11</sub>) versus RF frequency for the down-conversion mixer. The mixer achieves S<sub>11</sub> of -10.8 dB at 94 GHz, and S<sub>11</sub> smaller than -10 dB for RF frequencies of 90.5~96.2 GHz. That is, the simulated -10 dB input matching bandwidth at the RF-port is 5.7 GHz. Figure 4(a) also shows the simulated input reflection coefficient at the LO-port ( $S_{22}$ ) versus LO frequency. The mixer achieves $S_{22}$ of -12.8 dB at 94 GHz, and $S_{22}$ smaller than -10 dB for LO frequencies of 85~110.2 GHz. That is, the simulated -10 dB input matching bandwidth at the LO-port is 25.2 GHz.
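To put the -10 dB matching criterion used above into perspective, a reflection coefficient in dB can be converted into the fraction of incident power that is reflected and into the equivalent voltage standing wave ratio. These are standard conversions; the function names are illustrative.

```python
def reflected_power_fraction(s_db: float) -> float:
    """Fraction of incident power reflected for a reflection coefficient in dB."""
    return 10.0 ** (s_db / 10.0)


def vswr(s_db: float) -> float:
    """Voltage standing wave ratio for a reflection coefficient in dB (s_db < 0)."""
    gamma = 10.0 ** (s_db / 20.0)   # magnitude of the reflection coefficient
    return (1.0 + gamma) / (1.0 - gamma)


def band_width_ghz(f_low: float, f_high: float) -> float:
    """Width of a frequency band given its edges in GHz."""
    return f_high - f_low
```

At -10 dB, one tenth of the incident power is reflected, corresponding to a VSWR of about 1.92. The quoted bandwidths follow from the band edges, e.g. `band_width_ghz(90.5, 96.2)` for the RF-port and `band_width_ghz(85, 110.2)` for the LO-port.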

Figure 4(b) shows the simulated input reflection coefficients at IF-port ( $S_{33}$ ) versus IF frequency characteristics of the down-conversion mixer. The mixer achieves excellent  $S_{33}$  of  $-10.3 \sim -15.3$  dB for IF frequencies of  $0 \sim 3$  GHz.

Figure 5 shows the simulated CG versus frequency characteristics of the mixer both with and without the IF amplifier. RF input power is -40 dBm and LO input power is 4 dBm. The mixer achieves CG of 8.7 dB at 94 GHz, and CG of 7.82~8.72 dB for frequencies of 90~100 GHz, one of the best CG results ever reported for a down-conversion mixer with operation frequency around 94 GHz. The corresponding 3-dB bandwidth is larger than 21.8 GHz (78.2~100 GHz). In the case without the IF amplifier, the mixer achieves inferior CG of 0.84 dB at 94 GHz, and CG of 0.36~0.85 dB for frequencies of 90~100 GHz. The corresponding 3-dB bandwidth is larger than 21.1 GHz (78.9~100 GHz).

#### III. MEASUREMENT RESULTS AND DISCUSSIONS

On-wafer measurements were performed with an Agilent 110 GHz RFIC measurement system. The down-conversion mixer is biased at $V_{DD} = 1.3$ V and $I_{DD} = 17.3$ mA. That is, the corresponding power consumption of the mixer is 22.5 mW. Figure 6(a) shows the measured S<sub>11</sub>. The mixer achieves S<sub>11</sub> of -8.3 dB at 94 GHz, and S<sub>11</sub> of -7.1~ -35.9 dB for frequencies of 82.6~96 GHz. Figure 6(b) shows the measured S<sub>22</sub>. The mixer achieves excellent S<sub>22</sub> of -14.9 dB at 94 GHz, and S<sub>22</sub> of -10~ -35.9 dB for frequencies of 88.2~110 GHz. That is, the measured -10 dB LO input matching bandwidth is larger than 21.8 GHz.
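As a quick consistency check of the figures quoted above, the DC power follows from the bias point, and the lower bound on the LO matching bandwidth follows from the band edges (the upper edge being the 110 GHz limit of the measurement system):

$$
P_{DC} = V_{DD}\, I_{DD} = 1.3\ \text{V} \times 17.3\ \text{mA} \approx 22.5\ \text{mW}, \qquad
BW^{LO}_{-10\,\text{dB}} \geq 110\ \text{GHz} - 88.2\ \text{GHz} = 21.8\ \text{GHz}.
$$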

Figure 7 shows the measured CG versus frequency characteristics of the down-conversion mixer. The mixer achieves maximum CG of -3.4 dB at 91 GHz and CG of  $-3.4 \sim -6.4$  dB for frequencies of 85~97 GHz, one of the best CG results ever reported for a down-conversion mixer with operation frequency around 94 GHz. The corresponding 3-dB bandwidth is 12 GHz (85~97 GHz).

Figure 8 shows the measured LO-RF isolation versus frequency characteristics of the mixer. The mixer achieves LO-RF isolation of 41~47.2 dB for frequencies of 90~100 GHz, one of the best LO-RF isolation results ever reported for a down-conversion mixer with operation frequency around 94 GHz. Furthermore, the mixer achieves an excellent IIP3 of -3 dBm at 94 GHz (not shown here).

Figure 9 shows the measured and simulated NF versus frequency characteristics of the down-conversion mixer. As can be seen, the measured results agree well with the simulated ones. The mixer achieves NF of 23.2 dB at 94 GHz, and NF of 22.4~24.4 dB for frequencies of 90~100 GHz.

Table I is a summary of the implemented 90~100 GHz CMOS down-conversion mixer and recently reported state-of-the-art down-conversion mixers with operation frequency around 94 GHz. Compared with the 90~112 GHz image-reject down-conversion mixer in [1], the proposed mixer exhibits better CG and LO-RF isolation, and smaller chip area. Compared with the 90~97 GHz single-balanced GaAs down-conversion mixer in [2], the proposed mixer exhibits better CG. Compared with the 94 GHz single-balanced GaAs down-conversion mixer in [3], the proposed mixer exhibits better CG and LO-RF isolation, and smaller chip area. These results indicate that our proposed down-conversion mixer is suitable for W-band transceiver systems.

#### IV. CONCLUSION

In this work, we reported a 90~100 GHz CMOS down-conversion mixer comprising a double-balanced Gilbert cell, a miniature wideband RF Marchand balun, a miniature wideband LO Marchand balun, and an IF amplifier. The mixer consumes 22.5 mW and achieves excellent CG of -3.4~ -6.4 dB for frequencies of 85~97 GHz; that is, the corresponding 3-dB RF bandwidth is 12 GHz. Moreover, excellent LO-RF isolation is also achieved. These results highlight the potential application of the proposed down-conversion mixer architecture in 94 GHz and even higher frequency communication systems.

#### REFERENCES

- [1] Y. C. Wu, S. K. Lin, C. C. Chiong, Z. M. Tsai, and H. Wang, "A W-Band Image Reject Mixer for Astronomical Observation System," IEEE MTT-S International Microwave Symposium, 2011, pp. 1-4.
- [2] W. Zhao, Y. Zhang, and M. Z. Zhan, "Design and Performance of a W-Band Microstrip Rat-Race Balanced Mixer," International Conference on Microwave and Millimeter Wave Technology, 2010, pp. 713-716.
- [3] S. J. Lee, T. J. Baek, M. Han, S. G. Choi, D. S. Ko, and J. K. Rhee, "94 GHz MMIC Single-Balanced Mixer for FMCW Radar Sensor Application," Global Symposium on Millimeter Waves, 2012, pp. 351-354.
- [4] P. C. Yeh, W. C. Liu, and H. K. Chiou, "Compact 28-GHz Subharmonically Pumped Resistive Mixer MMIC Using a Lumped-Element High-Pass/Band-Pass Balun," IEEE Microwave and Wireless Components Letters, vol. 15, no. 2, Feb. 2005, pp. 62-64.