LOW POWER CIRCUITS FOR SMART FLEXIBLE ECG SENSORS

YANG ZHAO

A DISSERTATION SUBMITTED TO
THE FACULTY OF GRADUATE STUDIES
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY

GRADUATE PROGRAMME IN ELECTRICAL ENGINEERING AND
COMPUTER SCIENCE
YORK UNIVERSITY
TORONTO, ONTARIO

AUGUST 2019

© YANG ZHAO, 2019
ABSTRACT

Cardiovascular diseases (CVDs) are the world's leading cause of death. In-home heart condition monitoring effectively reduces the hospitalization rate of CVD patients. Flexible electrocardiogram (ECG) sensors provide an affordable, convenient, and comfortable in-home monitoring solution. This research studies the three critical building blocks of the ECG sensor: the analog front-end (AFE), the QRS detector, and the cardiac arrhythmia classifier (CAC).

A fully differential difference amplifier (FDDA) based AFE that employs a DC-coupled input stage increases the input impedance and improves the CMRR. A parasitic capacitor reuse technique is proposed to improve the noise/area efficiency and CMRR. An on-body DC bias scheme is introduced to deal with the input DC offset. Implemented in a 0.35μm CMOS process with an area of 0.405mm², the proposed AFE consumes 0.9μW at 1.8V and achieves a noise efficiency factor (NEF) of 2.55 and a CMRR of 76dB. Experiments show that the proposed AFE not only picks up a clean ECG signal with electrodes placed as close as 2cm apart under both resting and walking conditions, but also captures the distinct α-wave after an eye blink in EEG recordings.

A personalized QRS detection algorithm is proposed that achieves an average positive prediction rate of 99.39% and a sensitivity of 99.21%. The user-specific template avoids the complicated models and parameters used in existing algorithms while covering most situations in practical applications. Detection is based on the correlation coefficient between the user-specific template and the ECG segment under detection. The proposed one-target clustering reduces the number of loops required.

A continuous-in-time discrete-in-amplitude (CTDA) artificial neural network (ANN) based CAC is proposed for the smart ECG sensor. The proposed CAC achieves over 98% classification accuracy for four types of beats defined by the AAMI (Association for the Advancement of Medical Instrumentation). The CTDA scheme significantly reduces the number of input samples and simplifies the sample representation to one bit. Thus, the number of arithmetic operations is greatly reduced and the ANN structure is greatly simplified. The proposed CAC is verified on an FPGA and implemented in a 0.18μm CMOS process. Simulation results show that it can operate at clock frequencies from 10kHz to 50MHz. The average power for a patient with a heart rate of 75bpm is 13.34μW.
ACKNOWLEDGEMENTS

I would first like to thank my supervisor, Dr. Yong (Peter) Lian, for his extensive guidance in my research and throughout my four years of academic life. I still remember his warm welcome and thorough introduction the first time I came here. His insightful and patient support in my course study, scholarship applications, research, industrial cooperation, and paper writing inspired me and contributed to the success of this research. The invaluable experience of working with Dr. Lian opened the door to scientific study in my life.

I would also like to acknowledge Prof. Ebrahim Ghafar-Zadeh and Prof. Sebastian Magierowski, the members of my supervisory committee. Their important suggestions and comments on my research proposal and topic shaped the final structure of this study.

I would like to express my thanks to my colleagues. I am grateful for the great help from Zhongxia Shang in each design. The testing arrangements assisted by Chen Xi facilitated this research; I appreciate his time and effort. Without the valuable design tips from Dr. Chundong Wu and the low-power design experience of Dr. Xiaoyang Zhang, this research would not have gone so smoothly. Many thanks also go to Qingsong Xie for the discussions that helped improve the efficiency of my algorithm.

In addition, I would like to express my gratitude to Prof. Guoxing Wang for providing testing equipment during my internship. I would also like to thank Dr. Yu Pu for his patient assistance with the ASIC implementation of the CAC.

Finally, I would like to thank my family and friends for their support and counsel. The encouragement of my parents dispelled my research anxieties. Parties, sports, and many other activities with my friends gave me a rich and colourful research life.
LIST OF PUBLICATIONS


TABLE OF CONTENTS

Abstract

Acknowledgements

List of Publications

Table of Contents

List of Tables

List of Figures

List of Acronyms

Chapter 1 Introduction

1.1 Motivation

1.2 Flexible ECG sensor

1.3 Existing Issues in the Designing of Flexible ECG Sensor

1.4 Objectives of Research

Chapter 2 Review of ECG Interface for Wearable Applications

2.1 Brief Introduction of ECG Signal Analog Front-End

2.1.1 Design Challenges for AFE

2.1.2 Resistive Three-Amplifier AFE

2.1.3 AC-coupled Pseudo Resistor AFE

2.1.4 DC-coupled Pseudo Resistor AFE

2.1.5 Chopper Stabilization

2.1.6 Differential Difference Amplifier

2.1.7 Summary of AFE

2.2 QRS detection

2.3 Machine Learning based Cardiac Arrhythmia Classification

2.4 Summary

Chapter 3 A 2.55 NEF 76dB CMRR DC-Coupled FDDA AFE

3.1 FDDA DC-coupled AFE

3.1.1 FDDA Instrumentation Amplifier

3.1.2 Back-to-back Connected Pseudo Resistor

3.1.3 On-body DC Biasing

3.1.4 Design Considerations of Proposed FDDA

3.1.5 Noise Analysis

3.1.6 Layout of Large Size Input Transistor

3.1.7 PGA

3.2 Simulation Results

3.3 Measurement Results

3.4 Conclusion

Chapter 4 Personalized QRS Detection Based on One Target Clustering and Correlation Coefficient

4.1 User Adaptive Detection

4.1.1 Peak Detection

4.1.2 One Target Clustering

4.1.3 Correlation Coefficient

4.2 Adaptive Thresholding

4.3 Experiment Results

4.4 Conclusion

Chapter 5 An Event-driven Patient Specific ANN-CAC

5.1 Data Preparation

5.1.1 Identity of ECG

5.1.2 Conditional Data Grouping Scheme (CGS)

5.2 Proposed CTDA ANN-CAC

5.2.1 CTDA vs Nyquist Sampling

5.2.2 Processing CTDA Signal for ANN-CAC

5.2.3 Topology of Proposed CTDA ANN-CAC

5.3 Design Considerations

5.3.1 SRAM

5.3.2 SPI

5.3.3 Multiplier

5.3.4 Top FSM

5.3.5 FSM for Conditional Accumulation (FSM-CA)
LIST OF TABLES

Table 2-1-1. Minimum requirements of medical grade ECG AFE.
Table 2-1-2. Summary of different AFEs used in biomedical sensor interface circuits.
Table 3-3-1. Measured gain for the 9 AFE chips.
Table 3-3-2. AFE performance comparison with other related works.
Table 4-1-1. Pseudo code of proposed one target clustering.
Table 4-3-1. Performance Evaluation on MIT-BIH Database.
Table 4-3-2. Comparison with Other Algorithms.
Table 5-1-1. Heart beat label and corresponding meaning in MIT-BIH
Table 5-1-2. Mapping of MIT-BIH labels to AAMI types.
Table 5-1-3. Statistics of heartbeat number in each record and the proposed CGS
Table 5-4-1. Details of the proposed biased training.
Table 5-5-1. Current summary at 75bpm heart rate.
Table 5-5-2. Classification fusion matrix based on simulation and FPGA verification.
Table 5-5-3. Classification accuracy comparison.
Table 5-5-4. Performance comparison with other implementations.
LIST OF FIGURES

Fig. 1-1-1. The diseases diagnose expenditures in United States from 2014 to 2015.
Fig. 1-1-2. Projected CVD cost for different age in USA.
Fig. 1-1-3. Conduction potentials at different regions of heart in one cardiac cycle [6]
Fig. 1-1-4. Preventive oriented cardiac healthcare system.
Fig. 1-2-1. From portable, wearable, to flexible ECG sensor.
Fig. 1-3-1. Block diagram of the smart flexible ECG sensor.
Fig. 2-1-1. Models of different electrodes: wet Ag/AgCl electrode with 100KΩ impedance; dry and insulated electrode with 10MΩ impedance; non-contact electrode with over 200MΩ impedance [32].
Fig. 2-1-2. Illustration of the 12-lead electrodes’ position [33].
Fig. 2-1-3. Influence of electrode impedance to the input signal acquisition.
Fig. 2-1-4. Block diagram for ECG measurement regarding the power line interference [36].
Fig. 2-1-5. Testbench for ECG AFE measurement [40].
Fig. 2-1-6. Topology of AFE with DRL [46].
Fig. 2-1-7. Two configurations of diode connected pseudo resistor [52].
Fig. 2-1-8. TBA and PGA of programmable AC-coupled AFE [56].
Fig. 2-1-9. Comparison of AC-coupled AFE with its variant with passive input resistor network [57].
Fig. 2-1-10. Active electrode and negative capacitance for impedance boosting [61].

Fig. 2-1-11. Topology of three-amplifier DC-coupled AFE.

Fig. 2-1-12. DC-coupled AFE with baseline stabilizer [66].

Fig. 2-1-13. Feedback filter techniques to suppress the DC offset [67].

Fig. 2-1-14. Operational principles of CS amplifier [79].

Fig. 2-2-1. A typical ECG pattern.

Fig. 3-1-1. Topology of proposed FDDA DC-coupled IA.

Fig. 3-1-2. Simulated resistance for diode connected PR in Fig. 2-1-7(a).

Fig. 3-1-3. Simulated resistance for diode connected PR in Fig. 2-1-7(b).

Fig. 3-1-4. On-body DC bias for EEG and ECG monitoring.

Fig. 3-1-5. Schematic of proposed FDDA.

Fig. 3-1-6. Simulated integrated input referred noise at different transistor size and bias current.

Fig. 3-1-7. Overview of 417μm×263μm input transistor layout.

Fig. 3-1-8. Symmetrically placed transistor unit.

Fig. 3-1-9. Topology of PGA.

Fig. 3-2-1. AFE transient simulation at different gain.

Fig. 3-2-2. Transient current with 8 duplicated AFE at different gain.

Fig. 3-2-3. Simulated frequency responses of the proposed AFE.

Fig. 3-2-4. Monte Carlo simulation of frequency response at minimum gain.

Fig. 3-2-5. Noise simulation of the AFE.
Fig. 3-3-1. AFE chip micrograph.
Fig. 3-3-2. Measured AFE frequency response at three different gain settings.
Fig. 3-3-3. Tested AFE CMRR at three different gain settings.
Fig. 3-3-4. Measured AFE input referred noise.
Fig. 3-3-5. Analysis of harmonic distortion of the Proposed AFE.
Fig. 3-3-6. Monitored ECG signal using the proposed AFE.
Fig. 3-3-7. Acquired EEG with distinct α component after eye blink.
Fig. 3-3-8. Results of 2-hour ECG recording.
Fig. 4-1-1. Diagram of proposed user adaptive QRS detection algorithm.
Fig. 4-1-2. Diagram of the proposed one target clustering.
Fig. 4-1-3. Extracted normal and PVC QRS template by OTC.
Fig. 4-1-4. PCC values on record 208 with baseline drift and PVC distortions.
Fig. 4-2-1. Diagram of adaptive thresholding.
Fig. 4-3-1. GUI of the implemented QRS detection in MATLAB.
Fig. 5-1-1. Illustration of ECG identity.
Fig. 5-2-1. CTDA vs Nyquist sampling [177].
Fig. 5-2-2. Topology of CTDA ECG sensor.
Fig. 5-2-3. Mapping of LC-ADC output and adjacent RR intervals to ANN-CAC input.
Fig. 5-2-4. Topology of the proposed patient specific CTDA ANN-CAC.
Fig. 5-3-1. Operational principle of SRAM.
Fig. 5-3-2. Timing of customized SPI.
Fig. 5-3-3. Generated PP for 16-bit unsigned multiplicand and signed multiplier.
Fig. 5-3-4. Radix-4 booth encoding diagram.
Fig. 5-3-5. Delay reduction with rearrange of input order.
Fig. 5-3-6. Rearrange the position of HA in TDM multiplier.
Fig. 5-3-7. Structure and pipeline data flow of the customized multiplier.
Fig. 5-3-8. Transition diagram for top FSM.
Fig. 5-3-9. Data flow of FSM-CA.
Fig. 5-3-10. Transition diagram for FSM-CA.
Fig. 5-3-11. Transition diagram for FSM-NN.
Fig. 5-4-1. Comparison between conventional training and the proposed biased training.
Fig. 5-5-1. Implementation verified on Pynq-Z2 board.
Fig. 5-5-2. Resource utilization on Artix-7 FPGA verification.
Fig. 5-5-3. Power statistic for the FPGA implementation at 2.5MHz.
Fig. 5-5-4. Layout and performance summary of the proposed CTDA ANN-CAC.
Fig. 5-5-5. Simulated classification speed and power peaks at different clock frequency.
Fig. 5-5-6. Energy summary at 25MHz and 10KHz for 75bpm heart rate.
Fig. 5-5-7. Classification accuracy details on each records in none (N) groups.
LIST OF ACRONYMS

<table>
<thead>
<tr>
<th>Acronym</th>
<th>Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>AAMI</td>
<td>Association for the Advancement of Medical Instrumentation</td>
</tr>
<tr>
<td>ADC</td>
<td>Analog to digital converter</td>
</tr>
<tr>
<td>AFE</td>
<td>Analog front-end</td>
</tr>
<tr>
<td>ANN</td>
<td>Artificial neural network</td>
</tr>
<tr>
<td>ASIC</td>
<td>Application specific integrated circuit</td>
</tr>
<tr>
<td>BT</td>
<td>Biased training</td>
</tr>
<tr>
<td>CAC</td>
<td>Cardiac arrhythmia classifier</td>
</tr>
<tr>
<td>CGS</td>
<td>Conditional grouping scheme</td>
</tr>
<tr>
<td>CMRR</td>
<td>Common mode rejection ratio</td>
</tr>
<tr>
<td>CNN</td>
<td>Convolutional neural network</td>
</tr>
<tr>
<td>CPA</td>
<td>Carry propagate adder</td>
</tr>
<tr>
<td>CSD</td>
<td>Canonical signed digit</td>
</tr>
<tr>
<td>CTDA</td>
<td>Continuous-in-time discrete-in-amplitude</td>
</tr>
<tr>
<td>CVD</td>
<td>Cardiovascular disease</td>
</tr>
<tr>
<td>DDA</td>
<td>Differential difference amplifier</td>
</tr>
<tr>
<td>DRL</td>
<td>Driven right leg</td>
</tr>
<tr>
<td>ECG</td>
<td>Electrocardiogram</td>
</tr>
<tr>
<td>EEG</td>
<td>Electroencephalogram</td>
</tr>
<tr>
<td>FA</td>
<td>Full adder</td>
</tr>
<tr>
<td>FDDA</td>
<td>Fully differential difference amplifier</td>
</tr>
<tr>
<td>FN</td>
<td>False negative</td>
</tr>
<tr>
<td>FP</td>
<td>False positive</td>
</tr>
<tr>
<td>FSM</td>
<td>Finite state machine</td>
</tr>
<tr>
<td>HA</td>
<td>Half adder</td>
</tr>
<tr>
<td>IA</td>
<td>Instrumentation amplifier</td>
</tr>
<tr>
<td>LA</td>
<td>Left arm</td>
</tr>
<tr>
<td>LC</td>
<td>Level crossing</td>
</tr>
<tr>
<td>MBE</td>
<td>Modified booth encoding</td>
</tr>
<tr>
<td>ML</td>
<td>Machine learning</td>
</tr>
<tr>
<td>MLII</td>
<td>Modified limb Lead II</td>
</tr>
<tr>
<td>NEF</td>
<td>Noise efficiency factor</td>
</tr>
<tr>
<td>OTC</td>
<td>One target clustering</td>
</tr>
<tr>
<td>PCB</td>
<td>Printed circuit board</td>
</tr>
<tr>
<td>PCC</td>
<td>Pearson correlation coefficient</td>
</tr>
<tr>
<td>PD</td>
<td>Peak detection</td>
</tr>
<tr>
<td>PGA</td>
<td>Programmable gain amplifier</td>
</tr>
<tr>
<td>PMU</td>
<td>Power management unit</td>
</tr>
<tr>
<td>PP</td>
<td>Partial product</td>
</tr>
<tr>
<td>PR</td>
<td>Pseudo resistor</td>
</tr>
<tr>
<td>PVC</td>
<td>Premature ventricular contraction</td>
</tr>
<tr>
<td>RA</td>
<td>Right arm</td>
</tr>
<tr>
<td>ReLU</td>
<td>Rectified linear unit</td>
</tr>
<tr>
<td>RMS</td>
<td>Root mean square</td>
</tr>
<tr>
<td>SE</td>
<td>Sensitivity</td>
</tr>
<tr>
<td>TDM</td>
<td>Three dimensional reduction</td>
</tr>
<tr>
<td>THD</td>
<td>Total harmonic distortion</td>
</tr>
<tr>
<td>TP</td>
<td>True positive</td>
</tr>
<tr>
<td>VCS</td>
<td>Vertical compress slice</td>
</tr>
<tr>
<td>WHO</td>
<td>World Health Organization</td>
</tr>
<tr>
<td>WPD</td>
<td>Windowed peak detection</td>
</tr>
</tbody>
</table>
Chapter 1
Introduction

Cardiovascular diseases (CVDs) are the world's leading cause of death. Early detection helps improve the quality of life of CVD patients. In-home monitoring provides a convenient and affordable solution for detecting CVD symptoms. The wearable electrocardiogram (ECG) sensor is one of the best candidates for in-home heart monitoring. The flexible ECG sensor is the future of wearable ECG devices because of its non-invasiveness, close fit to the body, and compactness. This research aims to design the critical building blocks, i.e., the analog front-end (AFE), the QRS detector, and the cardiac arrhythmia classifier (CAC), for such a flexible ECG sensor.

1.1 Motivation

Diseases of the heart or of the blood vessels, e.g., veins, arteries, and capillaries, are referred to as CVDs. They include coronary heart disease, stroke, congestive heart failure, and other conditions that affect the cardiovascular system [1]. People with an unhealthy lifestyle, such as lack of sleep and exercise, an unhealthy diet, heavy drinking and smoking, and a high level of mental stress, are at high risk of CVDs.

According to the World Health Organization (WHO), CVDs are the number one cause of death globally [2]. An estimated 17.9 million people died from CVDs in 2016, representing 31% of all global deaths [2]. According to the American Heart Association 2019 statistics, the direct and indirect costs of CVDs in the USA between 2014 and 2015 were $351.2 billion [3]. In these statistics, the total annual diagnosis expenditures due to CVDs from 2014 to 2015 are higher than those for any other major cause, as shown in Fig. 1-1-1 [3]. The number of CVD patients is rising as the population ages, which increases the total CVD cost, as shown in Fig. 1-1-2 [3]. WHO predicts that 43.9% of the US population will have some form of CVD and that the total direct cost of CVDs will increase to over $1100 billion by 2035 [3]. In Canada, the direct and indirect costs of CVDs are reported to total $21.2 billion per year, and this number may reach $293 billion by 2040 [4].

Fig. 1-1-1. The diseases diagnose expenditures in United States from 2014 to 2015.

Fig. 1-1-2. Projected CVD cost for different age in USA.
An electrocardiogram (ECG) is a graphical representation of the continuously changing electrical field of the heart and reflects the heart's electrical activity [5]. The generation of the ECG can be summarized as follows: (1) initiation of the electrical stimulus at the sinoatrial (SA) node, (2) depolarization and contraction of the atria, (3) depolarization of the atrioventricular (AV) node, (4) depolarization of the ventricles, (5) start of repolarization from the epicardial surface of the left ventricle, and (6) spread of repolarization inwards to finish one cardiac cycle. The typical potentials at different heart regions and the corresponding ECG are presented in Fig. 1-1-3 [6].

Early detection and prevention help manage CVDs and improve the quality of life of CVD patients. Detection based on perceived warning symptoms of CVDs, such as chest pain, shortness of breath, and angina, is not effective: it normally comes too late for preventive action and can even delay clinical treatment. Monitoring the heart's activities, including electrical depolarization and repolarization and the mechanical contractions that respond to these electrical signals, provides a direct and timely approach to detecting CVDs, because abnormal heart activity can be detected at a very early stage from the ECG. Meanwhile, analysis of the heart's activities enables real-time monitoring of the heart condition and provides sufficient detail for further diagnosis. Thus, monitoring the heart's electrical activity by examining the subject's ECG is an efficient and effective way of preventing CVDs.

Real-time ECG monitoring brings enormous benefits to CVD patients. A recent study shows that home monitoring effectively lowers heart failure hospitalization rates [7], which not only reduces healthcare costs but also improves the patients' quality of life. A preventive oriented cardiac healthcare system for real-time heart condition monitoring is illustrated in Fig. 1-1-4. The body-worn sensors collect vital signs and send raw data to the data center via a body gateway. The data center analyzes the raw data and detects any potential problems. If an abnormality is found and confirmed by doctors at the medical center, a warning message is sent to the person under monitoring and a follow-up action may be taken. In such a system, the portable ECG sensor is the foundation, since it collects the raw ECG signal and interacts directly with the patients.

Fig. 1-1-4. Preventive oriented cardiac healthcare system.

1.2 Flexible ECG sensor

Portable ECG sensors are not new. The well-known Holter monitor, shown at the top of Fig. 1-2-1 [8], is a typical example. It is a 3-lead ECG system. However, it has several drawbacks, such as limited usage time, a low patient acceptance rate due to its many wires and electrodes, and skin irritation caused by the additional tape needed for good skin-electrode contact. Moreover, the swinging of the wires leads to a low diagnostic yield because of the noise and distortion introduced during ECG recording, i.e., motion artifacts caused by body movement. With the advancement of technology, wearable ECG sensors are getting smaller and lighter and produce better recording quality. As illustrated in the middle of Fig. 1-2-1, the wearable ECG sensor is normally a single-lead ECG system [9]. Compared with the portable ECG sensor, it is much easier to use and allows real-time reading of the ECG waveform through a built-in wireless transmitter and a mobile phone. Although it is lighter than the Holter monitor and requires no wires attached to the chest, the weight of the electronics and battery still pulls on the electrodes, causing significant motion artifacts under non-resting conditions.

Flexible ECG sensors, such as epidermal electronics [10], are recent developments that are very thin and comfortable to wear. As shown at the bottom of Fig. 1-2-1 [11], flexible ECG sensors are built with bendable, breathable, and stretchable thin materials that conform to the body shape without loss of functionality [12]. In this way, the dry electrodes on the thin substrate can be comfortably attached to the chest, leading to better user acceptance. The recorded ECG signals have far fewer motion artifacts than those of wearable sensors, especially during exercise. Moreover, the skin-friendly, thin, and light substrate makes wearing a flexible ECG sensor almost imperceptible. Among all types of ECG sensors, the flexible ECG sensor is therefore the most attractive, since it offers not only real-time, high-quality recording and high diagnostic yield but also the best user experience. Flexible ECG sensors are thus gaining wide interest among researchers and clinicians and are ideal candidates for the preventive oriented cardiac healthcare system.

Fig. 1-2-1. From portable, wearable, to flexible ECG sensor.

1.3 Existing Issues in the Designing of Flexible ECG Sensor

As shown in Fig. 1-3-1, a typical smart flexible ECG sensor is composed of an ECG application specific integrated circuit (ASIC), a battery, and electrodes. The ASIC consists of an analog front-end (AFE), an analog to digital converter (ADC), a heart-rate detector or QRS detector (QRS-D), a cardiac arrhythmia classifier (CAC), a power management unit (PMU), and a wireless transceiver. It performs the main functions of ECG recording: (1) conditioning the ECG signal with the AFE; (2) converting the analog signal to its digital representation with the ADC; (3) extracting the heart rate with the QRS detector; and (4) detecting the arrhythmia beat types with the CAC. The AFE performance directly affects the quality of the acquired ECG signal. QRS detection is the first step in arrhythmia classification and also provides heart rate information. In the whole wireless ECG sensor, the wireless transmission of raw ECG data consumes most of the power, i.e., more than 80% of the total [13]. Wireless transmission should therefore be used only when ECG abnormalities are detected by the classifier, so the on-chip CAC plays an important role in reducing wireless data transmission. The CAC should consume much less power than a wireless transceiver while maintaining high classification accuracy, so as to reduce the raw ECG data transmission caused by false alarms. This research aims to develop these three essential components of the ECG ASIC for flexible ECG sensors: the AFE, the QRS detector, and the CAC.
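
The power argument above can be illustrated with a simple duty-cycle estimate. The sketch below is only a rough model under stated assumptions: the function name `average_power_uw` and all power values are hypothetical, chosen so that the radio accounts for 80% of the total when it streams continuously; they are not measurements from this work.

```python
def average_power_uw(p_local_uw: float, p_radio_uw: float, tx_fraction: float) -> float:
    """Average sensor power when the radio is active only part of the time.

    p_local_uw  : always-on power of AFE, ADC, QRS detector and CAC (uW)
    p_radio_uw  : radio power while streaming raw ECG data (uW)
    tx_fraction : fraction of time the radio is on (0.0 to 1.0)
    """
    if not 0.0 <= tx_fraction <= 1.0:
        raise ValueError("tx_fraction must be between 0 and 1")
    return p_local_uw + tx_fraction * p_radio_uw


# Hypothetical values (assumptions): the radio takes 80% of the total
# power when it is always on.
P_LOCAL_UW = 20.0
P_RADIO_UW = 80.0

always_on = average_power_uw(P_LOCAL_UW, P_RADIO_UW, tx_fraction=1.0)
event_driven = average_power_uw(P_LOCAL_UW, P_RADIO_UW, tx_fraction=0.05)
print(f"continuous streaming: {always_on:.1f} uW")
print(f"radio on 5% of the time: {event_driven:.1f} uW")
```

In this simplified model the saving depends on how rarely the classifier triggers the radio, which is why false alarms from the CAC translate directly into extra transmission power.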

Fig. 1-3-1. Block diagram of the smart flexible ECG sensor.

The design of the three main building blocks for a flexible ECG sensor faces many more challenges than the design for wearable ones. Firstly, the dry electrode used in a flexible ECG sensor shows a contact impedance of over 10MΩ. It affects the recording quality of the AFE in two ways: one is the signal attenuation due to the large impedance; the other is the distortion induced by variation and mismatch of the skin-electrode impedance due to body movement. Increasing the input impedance of the AFE helps preserve the input signal amplitude while minimizing the impact of variation in the skin-electrode contact impedance. For example, if the AFE input impedance is 1GΩ, the signal loss across a 10MΩ electrode impedance is smaller than 1%. A high input impedance also helps reduce the input mismatch and thus improves the CMRR (common mode rejection ratio), which suppresses power line interference. Thus, an input impedance greater than 1GΩ is preferred for the AFE of the flexible ECG sensor.
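
The 1% figure above follows from treating the electrode impedance and the AFE input impedance as a voltage divider. The short sketch below reproduces that arithmetic; it assumes purely resistive impedances (real skin-electrode impedances are complex and frequency dependent), and the function name `electrode_attenuation` and the 100MΩ comparison value are illustrative assumptions rather than values from this work.

```python
def electrode_attenuation(z_in_ohm: float, z_electrode_ohm: float) -> float:
    """Fraction of the source signal lost across the electrode impedance.

    Models the skin-electrode contact and the AFE input as a purely
    resistive voltage divider (a simplifying assumption).
    """
    return z_electrode_ohm / (z_in_ohm + z_electrode_ohm)


# Values quoted in the text: 10 MOhm dry electrode, 1 GOhm AFE input.
loss = electrode_attenuation(z_in_ohm=1e9, z_electrode_ohm=10e6)
print(f"signal loss with 1 GOhm input: {loss:.2%}")

# For comparison, a hypothetical 100 MOhm input impedance (not from the text).
loss_low = electrode_attenuation(z_in_ohm=100e6, z_electrode_ohm=10e6)
print(f"signal loss with 100 MOhm input: {loss_low:.2%}")
```

With a 1GΩ input the loss is 10/1010 of the signal, just under 1%, whereas the assumed 100MΩ input would lose about 9%, which also makes the recording more sensitive to changes in contact impedance.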

Secondly, flexible ECG sensors normally adopt dry electrodes so that the electrodes can be printed on a thin substrate. The use of dry electrodes introduces an imbalance in skin-electrode contact resistance between a pair of electrodes attached to the chest. As a result, large power line noise can be picked up by the AFE if the CMRR is not high. Furthermore, because of the large gain used in flexible ECG sensors (for the reason given in the next paragraph), the power line noise could saturate the AFE. Thus, a high CMRR of greater than 70dB is highly desirable for reducing power line noise.
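
To give a feel for what a given CMRR means, the sketch below converts a common-mode interference voltage into the equivalent input-referred differential error using the standard definition of CMRR in decibels. It models only the amplifier's own CMRR, not the additional degradation caused by electrode impedance mismatch. The 10mV common-mode amplitude, the 50dB reference point, and the function name `residual_interference` are assumptions for illustration; 70dB and 76dB are the target and the measured value quoted in this work.

```python
def residual_interference(v_cm_volts: float, cmrr_db: float) -> float:
    """Input-referred differential error left after common-mode rejection.

    Uses CMRR_dB = 20*log10(A_diff / A_cm), so a common-mode voltage v_cm
    appears at the input as v_cm / 10**(CMRR_dB / 20).
    """
    return v_cm_volts / (10 ** (cmrr_db / 20))


V_CM = 10e-3  # hypothetical 10 mV power-line common-mode voltage (assumption)

# 50 dB is an assumed reference point; 70 dB and 76 dB come from the text.
for cmrr in (50, 70, 76):
    v_err = residual_interference(V_CM, cmrr)
    print(f"CMRR {cmrr} dB -> {v_err * 1e6:.2f} uV input-referred")
```

Every additional 20dB of CMRR reduces the residual interference by a factor of ten, which matters here because the residual is subsequently multiplied by the large AFE gain.
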
Thirdly, the portable ECG sensor, such as Holter, places electrodes far away from each other. For wearable and flexible ECG sensor, the distance between two electrodes is limited, normally within 10cm. For example, in a Lead-II configuration using Holter, the two electrodes can be placed at the right and left shoulders with distance greater than 2cm where the standard ECG signal can be recorded. In the case of flexible ECG sensor, the distance can be just 2cm. The electrode distance affects the strength of picked up ECG signal, i.e. the longer the distance the larger the ECG amplitude. Therefore, the ECG amplitude by using flexible ECG sensor is weaker than the standard recording. To obtain clear signal, higher gain and lower noise of the AFE is required.

What’s more, a bulky battery cannot be integrated into a flexible ECG sensor. A microwatt or nanowatt AFE circuit, QRS detector and CAC are necessary for the use of a tiny on-chip battery. The design of ultralow power ECG circuits is a tough task for two reasons. On one hand, there is a limit on how far the power of an AFE can be reduced while keeping acceptable noise performance, so an AFE with the best power/noise efficiency should be designed for the flexible ECG sensor. On the other hand, the accuracy of the QRS detector and the CAC should be trustworthy. Machine learning (ML) algorithms are required for confidence in arrhythmia alerting. However, it is almost impossible to achieve a microwatt ML based CAC using the conventional Nyquist based signal flow, calling for innovation in the design of the on-chip CAC.

Last but not least, discrete components are undesirable in a flexible ECG sensor because they are difficult to integrate on a thin substrate and reduce the reliability of the sensor under stretching, bending or deformation of the flexible material. Ideally, there should be only one ASIC on the flexible substrate. Moreover, the size of the ASIC and the number of footprints should be small for easy integration and manufacturing of the flexible ECG sensor. In addition, more discrete components mean larger size and higher cost. Thus, the flexible ECG sensor should be designed in a compact, fully integrated way, which challenges the sensor circuit design.

1.4 Objectives of Research

This research aims to solve several critical issues in the design of sensor interface circuits for the smart flexible ECG sensor, as described in the preceding section. Firstly, a fully integrated AFE with programmable gain and bandwidth should be designed for the compactness and robustness of the flexible ECG sensor. Secondly, high input impedance of the AFE should be achieved to handle the motion artefact and closely located electrodes. Thirdly, the ultra-low power AFE, QRS detector and CAC have to be implemented within the tight power budget of the flexible ECG sensor. Lastly, a patient-specific ML based QRS detector and CAC are essential for the smart flexible ECG sensor.

With our proposed solutions, we aim to eventually achieve: an ultralow power, high performance AFE with 1GΩ input impedance to handle the dry electrodes (most existing designs are around 200MΩ), over 75dB CMRR (the state-of-the-art ultralow power designs show around 70dB) to suppress power line interference, over 60dB gain (most designs present gain of less than 60dB) and less than 1.5μVrms input referred noise (state-of-the-art designs without chopper stabilization show several μVrms of noise) to pick up a clear ECG signal over a short electrode distance, and a band-pass filter covering 0.5-100Hz to filter out unwanted components; a robust and simple QRS detector that achieves 99% heart beat detection accuracy; and a μW machine learning based CAC with classification accuracy of over 95%. The goal is to implement a fully integrated, intelligent ECG processing ASIC with tens of μW power and less than 2mm² area for the flexible ECG sensor.

The thesis is organized as follows. After the Introduction, we present the literature review and the current research progress on AFE, QRS detector and CAC in Chapter 2.

In the design of the AFE, we focus on improving the immunity to motion artefacts and reducing power consumption without sacrificing too much noise and area performance. Inspired by the extremely high input impedance of the DC-coupled AFE and the reported high CMRR of the DDA, a simple yet acceptable fully differential DDA DC-coupled AFE will be presented in Chapter 3 [15].

For the QRS detector, a user-specific light-weight algorithm with detection accuracy of over 99% is proposed to handle the diversity of ECG signals in practical prediction, as reported in Chapter 4 [16].

A 13.34μW patient-specific artificial neural network (ANN) CAC with classification accuracy of over 90%, following the evaluation metrics recommended by the Association for the Advancement of Medical Instrumentation (AAMI), is implemented for arrhythmia classification. The network structure and the optimized multiplier will be presented in Chapter 5.
Chapter 2
Review of ECG Interface for Wearable Applications

AFE, QRS detector, and CAC are three critical building blocks of an ultra-low power ECG-on-Chip that provides high recording quality and cardiac arrhythmia classification accuracy for flexible ECG sensor. In this chapter, literature reviews of the three blocks are presented.

The organization of this chapter is as follows. In Section 2.1, a brief introduction to the AFE is presented. In Section 2.2, QRS detection algorithms are reviewed. In Section 2.3, machine learning based CACs are discussed. The summary is given in Section 2.4.

2.1 Brief Introduction of ECG Signal Analog Front-End

The quality of the ECG signal depends heavily on the analog front-end (AFE) of the interface chip. The requirements include high input impedance, low noise, high common mode rejection ratio (CMRR) as well as low power. There have been many attempts to design interface chips that satisfy these requirements. In the following, we present why these requirements are important for a wearable sensor interface chip and how to achieve these design goals.
2.1.1 Design Challenges for AFE

The design of ECG sensor interface circuits faces enormous challenges. In this section, we start with the introduction of skin-electrode interface, and then highlight the challenges as the result of skin-electrode interface.

An ECG signal, with its distinguishing features of 1-5mV amplitude and 0.05-100Hz bandwidth [17], should be sensed by an AFE with large input dynamic range, high input impedance, high common mode rejection ratio, good noise performance, and an ultra-low high-pass cut-off corner frequency, in order to reduce baseline drift, handle motion artefact, and minimize the effects of electrode contact impedance mismatch and electrical interference [18], [19].

To transport the bio-potential, electrodes are used to connect the body to the AFE [20]. The commonly used wet electrode, e.g. the silver/silver chloride (Ag/AgCl) electrode, offers simplicity, good signal stability/quality and low cost. It was once the prevalent choice of physicians. However, during prolonged recordings [21], [22], the wet gel of the electrodes brings drawbacks: (1) the gel has to be reapplied to keep the signal stable during long-term recordings [23], (2) skin preparation has to be carried out to improve the interfacial impedance [24], (3) the use of electrolytic gel can cause inflammation [25], [26]. Recently, dry electrodes [27]-[29], insulated electrodes [30] and non-contact electrodes [31] have been studied to improve the signal quality and user acceptance. Electrical models of the skin-electrode interface for different kinds of electrodes are summarized in Fig. 2-1-1 [32]. Generally, the interface can be taken as a series of RC elements with finite impedance.
Fig. 2-1-1. Models of different electrodes: wet Ag/AgCl electrode with 100kΩ impedance; dry and insulated electrode with 10MΩ impedance; non-contact electrode with over 200MΩ impedance [32].

![Diagram of different electrodes](image)

Fig. 2-1-2. Illustration of the 12-lead electrodes’ position [33].

The source signal of an ECG recording system is composed of ECG, electromyography (EMG), skin potential, and the polarization potential induced at the electrode interface by DC current. Besides the unwanted source signals, the ECG is simultaneously distorted by instrument noise, by electric and magnetic interference from the power line and radio frequency sources, and by body movement. When the muscle cells at the electrode site are electrically or neurologically activated, the electric potential generated by those activations is added in series with the ECG. To obtain an excellent ECG recording during exercise, the 12-lead diagnostic ECG is proposed [33]. The recommended 12-lead position is shown in Fig. 2-1-2.

It is commonly agreed by physiologists that the skin potential varies from -70mV to 10mV in the static state [34]. Thakor and Webster hypothesized that it is induced by the difference in metabolic activity between the dead cells lying in the stratum corneum layer and the cells lying in the stratum germinativum layer [35]. When the skin is stretched, the skin potential increases by several millivolts [36]. The deformation of skin during an ECG recording, e.g. pressing, squeezing or stretching, changes the initial skin potential. This deformation drifts the baseline of the recorded ECG, leading to unreadable or failed recordings. Not only the skin potential but also the skin impedance variations during movement [34] distort the ECG. These distortions are called motion artefacts. In early ECG monitoring systems, cable movement was also a cause of motion artefact [36], which can be easily solved by shielding the cable. The motion artefact introduced by skin deformation can be dramatically reduced by using dry or non-contact electrodes. High impedance is the common feature of those electrodes. To reduce the input signal attenuation and common mode interference, the input impedance of the AFE should be extremely high.

As shown in Fig. 2-1-3 for a low frequency case, one electrode has an impedance mismatch of $\Delta Z$ compared with the other electrode, whose impedance is $Z_{el}$. Assuming the two input impedances $Z_{in}$ of the designed AFE are finite and equal, $V_{in,CM}$, the differential input produced by the common mode AC signal $V_{CM}$, is the difference between the two input voltages

$$V_{in+} = V_{CM} \frac{Z_{in}}{Z_{el} + \Delta Z + Z_{in}} \quad \text{and} \quad V_{in-} = V_{CM} \frac{Z_{in}}{Z_{el} + Z_{in}}. \quad (2-1-1)$$

This phenomenon, in which differences in the skin-electrode impedance and/or the common mode input impedance convert the common mode signal into a differential signal, is called the ‘potential divider effect’. Similarly, $V_{in,D}$, the differential input resulting from the original differential input $V_d$, is determined by

$$V_{in,D} = \frac{Z_{in}}{Z_{el} + Z_{in}} V_d. \quad (2-1-2)$$

It indicates that the desired differential signal is attenuated by the finite input impedance of the AFE. A typical dry electrode composed of conductive graphite lightly impregnated with aluminum features an impedance of 1.36MΩ at 0.1Hz and 1Hz [37]. To keep the attenuation below 1% of the original differential input, the input impedance should be about 100 times larger than the electrode impedance, i.e. about 136MΩ. In this case, \( Z_{in} \) is much larger than \( Z_{el} \), so (2-1-1) can be approximately rewritten as

\[
\frac{V_{in,CM}}{V_{CM}} = \frac{\Delta Z}{Z_{in}}. \quad (2-1-3)
\]

Thus, the inherent CMRR\(_{\Delta Z}\) decided by the input impedance is given as

\[
\text{CMRR}_{\Delta Z} = 20 \log \frac{V_{CM} \, \Delta Z}{V_{in,D} \, Z_{in}}. \quad (2-1-4)
\]
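The potential divider effect can be checked numerically by comparing the exact difference of the two input dividers with the approximation $\Delta Z / Z_{in}$. The sketch below assumes purely resistive impedances; the 20% electrode mismatch and the 1GΩ input impedance are illustrative assumptions, while the 1.36MΩ electrode impedance is the value quoted from [37].

```python
import math


def cm_to_diff_exact(z_el: float, delta_z: float, z_in: float) -> float:
    """Exact |V_in+ - V_in-| / V_CM for the two input voltage dividers."""
    v_plus = z_in / (z_el + delta_z + z_in)
    v_minus = z_in / (z_el + z_in)
    return abs(v_plus - v_minus)


def cm_to_diff_approx(delta_z: float, z_in: float) -> float:
    """Approximation delta_Z / Z_in, valid when Z_in is much larger than Z_el."""
    return delta_z / z_in


if __name__ == "__main__":
    z_el = 1.36e6           # electrode impedance from [37]
    delta_z = 0.2 * z_el    # assumed 20 % mismatch between the two electrodes
    z_in = 1e9              # assumed AFE input impedance
    exact = cm_to_diff_exact(z_el, delta_z, z_in)
    approx = cm_to_diff_approx(delta_z, z_in)
    print(f"exact  = {exact:.4e} ({20 * math.log10(exact):.1f} dB)")
    print(f"approx = {approx:.4e} ({20 * math.log10(approx):.1f} dB)")
```

For these values the two results differ by well under 1%, which supports using the approximation whenever the input impedance dominates.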

The common mode AC signal generated by the motion artefact contributes part of \( V_{CM} \), while another portion comes from power line interference. As modeled in [38], the block diagram is illustrated in Fig. 2-1-4. The total interference is composed of the interference current through the body, the interference current into the amplifier and the interference current into the measurement cables. A driven-right-leg (DRL) circuit is required to reduce the interference.

Fig. 2-1-4. Block diagram for ECG measurement regarding the power line interference [36].

If we assume electrodes with zero impedance, the interference that appears at the output due to the common mode AC signal is purely determined by the CMRR of the AFE. As discussed in [39], the total CMRR of a cascaded system is produced by the combination of each stage, which is given as

$$\frac{1}{\text{CMRR}} = \frac{1}{\text{CMRR}_{\Delta Z}} + \frac{1}{\text{CMRR}_{AFE}}. \quad (2-1-5)$$

To achieve high system CMRR, both CMRR$_{\Delta Z}$ and CMRR$_{AFE}$ should be sufficiently large. Practically, one of these will be the limiting factor. CMRR$_{AFE}$ is mainly determined by the architecture of the designed AFE and the mismatch of its components. For CMRR$_{\Delta Z}$, we can use (2-1-4) to calculate the inherent value by applying the interference $V_{CM}$ (coming from motion artefact and power line interference) and the mismatch of the electrode impedance. Once the mismatch and $V_{CM}$ are fixed, the contribution of the electrode mismatch can be effectively reduced by increasing the input impedance of the AFE. So an effective way to achieve a proper CMRR is to combine a moderate CMRR$_{AFE}$ with an extremely high CMRR$_{\Delta Z}$, obtained through an extremely high AFE input impedance. Therefore, whether for handling motion artefact or for suppressing power line interference, high input impedance is one of the key performance indicators in a high performance ECG AFE design.
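The reciprocal sum in (2-1-5) operates on linear ratios, so CMRR values quoted in dB have to be converted before they are combined. The following sketch illustrates this; the function names and the example dB values are hypothetical and chosen only to show which term limits the total.

```python
import math


def db_to_ratio(value_db: float) -> float:
    """Convert a CMRR in dB to a linear voltage ratio."""
    return 10.0 ** (value_db / 20.0)


def ratio_to_db(ratio: float) -> float:
    """Convert a linear voltage ratio to dB."""
    return 20.0 * math.log10(ratio)


def combined_cmrr_db(*stage_cmrr_db: float) -> float:
    """Combine stage CMRRs with the reciprocal sum 1/CMRR = sum(1/CMRR_i)."""
    inverse_sum = sum(1.0 / db_to_ratio(value) for value in stage_cmrr_db)
    return ratio_to_db(1.0 / inverse_sum)


if __name__ == "__main__":
    # Assumed example: electrode-mismatch term of 120 dB, amplifier term of 76 dB.
    print(f"{combined_cmrr_db(120.0, 76.0):.2f} dB")
    # Two equal terms lose about 6 dB relative to either one alone.
    print(f"{combined_cmrr_db(80.0, 80.0):.2f} dB")
```

When one term is much larger than the other, the total is set almost entirely by the smaller one, which is the "limiting factor" referred to above.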

Input DC offset is another challenge in ECG AFE design. When the electrode contacts a conducting medium such as an electrolyte, a polarization potential is observed at the boundary region. This phenomenon is called electrode polarization, and a DC offset appears as a result. According to the British and American standards IEC 60601-2-47:2001 and ANSI/AAMI EC38:2007, the ECG AFE should be able to handle an offset of up to ±300mV [40], [41].
Trade-offs among noise, power dissipation and area are another challenge. The continuing pursuit of better user experience leads to the development of near zero power ECG sensors with a tiny battery or energy harvester [42]. When the power budget of a CMOS amplifier drops to the nanowatt level, flicker noise (or 1/f noise) can overtake the thermal noise and dominate the total amplifier noise, which is especially severe for the low frequency ECG signal [43]. The dominant flicker noise limits the minimum detectable signal [44]. It can be effectively reduced either by an auxiliary technique such as chopper modulation or by large-size PMOS input pairs. The reduction comes at the price of either larger area or more power. So how to reduce the 1/f noise at the lowest power and smallest area should be considered.

Table 2-1-1. Minimum requirements of medical grade ECG AFE.

<table>
<thead>
<tr>
<th>Requirement</th>
<th>Range</th>
<th>Unit</th>
<th>Standard</th>
</tr>
</thead>
<tbody>
<tr>
<td>Input Dynamic Range</td>
<td>10</td>
<td>mV p-v</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>Electrode Offset</td>
<td>±300</td>
<td>mV</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>Input Impedance</td>
<td>&gt;10</td>
<td>MΩ</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>CMRR</td>
<td>60 (50-60Hz)</td>
<td>dB</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td></td>
<td>30 (100-120Hz)</td>
<td>dB</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>Gain Accuracy</td>
<td>Error&lt;10 and Amplitude&lt;±10</td>
<td>%</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>Gain Stability over 24h</td>
<td>&lt;3</td>
<td>%</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>Noise</td>
<td>&lt;50</td>
<td>µV</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>Crosstalk</td>
<td>Error&lt;5 and Amplitude&lt;0.2mV</td>
<td>%</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>Frequency Response</td>
<td>0.67-40/50 or 0.05-55</td>
<td>Hz</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>Timing Accuracy</td>
<td>&lt;30 over 24h</td>
<td>s</td>
<td>IEC 60601-2-47</td>
</tr>
<tr>
<td>Temporal Alignment</td>
<td>Error&lt;20</td>
<td>ms</td>
<td>IEC 60601-2-47</td>
</tr>
</tbody>
</table>
Table 2-1-1 summarizes the minimum requirements for an ECG AFE [40]. Those requirements focus on the performance needed to guarantee the transient response (timing accuracy), frequency response, tolerance of electrode polarization, rejection of interference (CMRR, input impedance) and the crosstalk from the defibrillation. The recommended performance testbench for the AFE is given in Fig. 2-1-5. The recorder and the entire test circuit are shielded by a wrapped foil connected to earth. The electrodes are shielded by the common mode signal coming from the DRL to reduce interference. The capacitors $C_1$, between the power line and the driven shield, and $C_x$, between earth and the driven shield, are used to model the power line interference. The switch-controlled battery simulates the electrode polarization, and a parallel connection of a 51kΩ resistor $R$, a 47nF capacitor $C$ and a switch imitates the electrode.
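The electrode model in the testbench is a parallel RC network, so its impedance falls as the frequency rises. A short sketch of the magnitude at a few frequencies is given below; it assumes the switch is open and ignores every other element of the testbench, and the function name is illustrative.

```python
import math


def parallel_rc_impedance(r_ohm: float, c_farad: float, f_hz: float) -> complex:
    """Complex impedance of a resistor in parallel with a capacitor."""
    return r_ohm / complex(1.0, 2.0 * math.pi * f_hz * r_ohm * c_farad)


if __name__ == "__main__":
    r, c = 51e3, 47e-9  # electrode model values from the testbench description
    for f in (0.05, 10.0, 50.0, 100.0):
        z = parallel_rc_impedance(r, c, f)
        print(f"{f:6.2f} Hz -> |Z| = {abs(z) / 1e3:.1f} kOhm")
```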

![Testbench for ECG AFE measurement](image)

Fig. 2-1-5. Testbench for ECG AFE measurement [40].
In summary, high input impedance, high CMRR, low input DC-offset, and high power/noise/area efficiency are main design challenges of AFE.

### 2.1.2 Resistive Three-Amplifier AFE

In the early stage, the AC-coupled AFE with three amplifiers and gain-ratio resistors was the best choice [45]. However, it has three drawbacks. First, power line interference is pronounced due to poor CMRR. Second, μF off-chip capacitors are required for a sub-Hz high-pass cut-off frequency because on-chip gain-ratio resistors are limited to the MΩ range. Last, the resistive feedback contributes extra thermal noise.

The driven-right-leg (DRL) circuit is commonly used to address the CMRR issue of the three-amplifier AC-coupled AFE. The overall topology of an AFE with DRL is simplified in Fig. 2-1-6 [46].

![Diagram](image)

**Fig. 2-1-6.** Topology of AFE with DRL [46].
The parasitic input common mode signal introduced by the power line is
determined by the coupling impedance between the power line and body, $Z_{inf}$, and the
grounding electrode impedance, $Z_{gnd}$ as in (2-1-6).

$$v_{cm}^{60Hz} = v_{60Hz} \frac{Z_{gnd}}{Z_{gnd} + Z_{inf}}.$$ (2-1-6)

Introducing the active grounding electrode as in Fig. 2-1-6, the impedance of the grounding
electrode is attenuated by a factor of $1+A$, where $A$ is the loop gain. Then $v_{cm}^{60Hz}$ is
effectively reduced to the value

$$v_{cm}^{60Hz} = v_{60Hz} \frac{Z_{gnd}/(1+A)}{Z_{gnd}/(1+A) + Z_{inf}}.$$ (2-1-7)

The active grounding is also called DRL.
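The benefit of the active grounding can be estimated by evaluating the divider of (2-1-6) with the grounding impedance divided by $1+A$. The sketch below treats all impedances as magnitudes only; the line voltage, grounding impedance, coupling impedance and loop gain are assumed example values, not measured data.

```python
def cm_pickup(v_line: float, z_gnd: float, z_inf: float, loop_gain: float = 0.0) -> float:
    """Common mode voltage on the body from the power line divider.

    A loop gain of zero models a passive grounding electrode; a positive
    loop gain A divides the grounding impedance by (1 + A).
    """
    z_effective = z_gnd / (1.0 + loop_gain)
    return v_line * z_effective / (z_effective + z_inf)


if __name__ == "__main__":
    v_line = 120.0   # assumed mains amplitude in volts
    z_gnd = 100e3    # assumed grounding electrode impedance in ohms
    z_inf = 1.3e9    # assumed body-to-mains coupling impedance in ohms
    passive = cm_pickup(v_line, z_gnd, z_inf)
    active = cm_pickup(v_line, z_gnd, z_inf, loop_gain=100.0)
    print(f"passive ground: {passive * 1e3:.2f} mV")
    print(f"DRL, A = 100:   {active * 1e6:.1f} uV")
```

Because the coupling impedance is much larger than the grounding impedance, the common mode pickup drops by almost exactly the factor $1+A$.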

2.1.3 AC-coupled Pseudo Resistor AFE

To remove the off-chip capacitor in the resistive three-amplifier AFE, GΩ feedback
resistor is necessary. However, the off-chip passive GΩ resistor is quite expensive. The
study of “adaptive element” that takes a diode connected transistor as a resistor [47]
enables on-chip GΩ resistor.

Inspired by the adaptive element, authors of [48], [49] proposed the subthreshold
biased transistors to achieve hundreds of GΩ pseudo resistor (PR) and [50] applied the
diode connected transistor with 15.9GΩ resistance on the biomedical amplifiers to
achieve the low high-pass cut-off frequency for the first time. After the typical topology
of the PR AFE reported in [51] for neural recording front-end, PR has been widely used
in biomedical signal acquisition system.

Sixteen different passive pseudo resistors and their corresponding resistances are studied in [52]. It is demonstrated that both diode connected pseudo resistors, shown in Fig. 2-1-7(a) and Fig. 2-1-7(b), exhibit over 20TΩ under a small voltage across them, e.g. 0.15V. When the voltage across the pseudo resistor reaches 1.5V, the configuration in Fig. 2-1-7(a) remains over 20TΩ while the configuration in Fig. 2-1-7(b) drops to 100MΩ. As the output swing in the PGA stage may be rail-to-rail, adopting Fig. 2-1-7(b) in the PGA would introduce large non-linearity. Fig. 2-1-7(a) is a simple yet consistent choice for the AFE.

A typical AC-coupled PR AFE is composed of a low noise amplifier (LNA), feedback capacitors, and diode connected PRs [51]. Thanks to the on-chip GΩ PR, it is easy to achieve a fully integrated AFE with sub-Hertz high-pass cut-off frequency using pF capacitors. Due to the limited output headroom of the LNA in [51], the noise effective factor (NEF) cannot go below 2.9. To address the issue, the flipped voltage follower (FVF) based OTA was proposed in [53].
To achieve programmable gain and bandwidth, a tunable bandwidth amplifier (TBA) connected to a programmable gain amplifier (PGA) is adopted in [54]-[56]. The TBA and PGA used in [56] are given in Fig. 2-1-8(a) and Fig. 2-1-8(b) respectively.
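The link between the pseudo resistor value and the achievable high-pass corner can be illustrated with the first-order relation $f_c = 1/(2\pi RC)$. The sketch below assumes a single-pole RC high-pass; the 100GΩ and 10pF values and the 0.5Hz target are illustrative assumptions.

```python
import math


def highpass_corner_hz(r_ohm: float, c_farad: float) -> float:
    """Corner frequency of a first-order RC high-pass filter."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


def capacitor_for_corner(r_ohm: float, f_corner_hz: float) -> float:
    """Capacitance needed to place the corner at f_corner_hz with resistance r_ohm."""
    return 1.0 / (2.0 * math.pi * r_ohm * f_corner_hz)


if __name__ == "__main__":
    # Pseudo resistor of 100 GOhm (assumed) with a 10 pF on-chip capacitor.
    print(f"corner = {highpass_corner_hz(100e9, 10e-12):.3f} Hz")
    # A 1 MOhm on-chip resistor would need an off-chip capacitor for 0.5 Hz.
    print(f"C = {capacitor_for_corner(1e6, 0.5) * 1e6:.2f} uF")
```

The second result shows why MΩ-range resistors force capacitors of a fraction of a μF, which cannot be integrated on chip.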

Fig. 2-1-8. TBA and PGA of programmable AC-coupled AFE [56].

The high-pass cut-off of the TBA in Fig. 2-1-8(a) can be changed by adjusting either $C_1$ or the pseudo resistor $R$. As $C_1$ also contributes to the overall gain, it is more practical to tune the pseudo resistor via the bias voltage. The currents through M3 and M4 are tuned by the gating signals Ctrl$_{HFP1}$, Ctrl$_{HFP2}$, and Ctrl$_{HFP3}$. For a symmetrical wave across the biased pseudo resistor consisting of M1 and M2, M1 and M2 are activated alternately in each half cycle. This presents a symmetrical tunable pseudo resistance, which eliminates the signal distortion seen with the single biased unbalanced PR used in [54], [55].

Fig. 2-1-8(b) takes the off-state switch resistance $R_s$ into consideration. The transfer function for the PGA is given by

$$H(s) = \frac{C_1}{C_2} \cdot \frac{sC_x R_s + 1}{sC_x R_s + 1 + C_x/C_2}. \tag{2-1-8}$$

As $C_x$ is in picofarad range and $R_s$ is comparable to $10^{13}$ ohm, the non-overlapping pole and zero will distort the high pass band. The flip-over-capacitor scheme shown at the bottom of Fig. 2-1-8(b) eliminates this distortion by excluding $R_s$ out of the feedback loop through using the switch-controlled capacitors.
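The pole and zero of (2-1-8) can be located numerically to see how they shape the low frequency response. The sketch below evaluates the transfer function as printed, with the off-state switch resistance written as `rs`; all capacitor values are assumptions chosen for illustration.

```python
import math


def pga_response(f_hz: float, c1: float, c2: float, cx: float, rs: float) -> complex:
    """Evaluate the PGA transfer function of (2-1-8) at s = j*2*pi*f."""
    s = complex(0.0, 2.0 * math.pi * f_hz)
    return (c1 / c2) * (s * cx * rs + 1.0) / (s * cx * rs + 1.0 + cx / c2)


def pole_zero_hz(c2: float, cx: float, rs: float) -> tuple:
    """Return (pole, zero) frequencies in Hz implied by (2-1-8)."""
    f_zero = 1.0 / (2.0 * math.pi * cx * rs)
    f_pole = (1.0 + cx / c2) / (2.0 * math.pi * cx * rs)
    return f_pole, f_zero


if __name__ == "__main__":
    c1, c2, cx, rs = 10e-12, 1e-12, 1e-12, 1e13  # assumed example values
    f_pole, f_zero = pole_zero_hz(c2, cx, rs)
    print(f"zero at {f_zero:.4f} Hz, pole at {f_pole:.4f} Hz")
    for f in (0.001, 0.02, 1.0, 100.0):
        print(f"{f:8.3f} Hz -> |H| = {abs(pga_response(f, c1, c2, cx, rs)):.3f}")
```

Under these assumptions the gain moves from $C_1/(C_2+C_x)$ at DC to $C_1/C_2$ above the pole, so the separated pole and zero produce a step in the low frequency gain.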

Though the classical AC-coupled PR AFEs inherently reject the DC drift, their low input impedance is not good for handling motion artefact. Two main techniques can be applied to the AC-coupled PR AFE to boost the input impedance: a passive input resistor network [57] and input positive feedback [58]-[65].

![Fig. 2-1-9. Comparison of AC-coupled AFE with its variant with passive input resistor network [57].](image)

The bias resistor $R_2$ in Fig. 2-1-9(a) is replaced by the series connected resistors $R_1$ and $R'_1$ coming from the input terminals in Fig. 2-1-9(b). Therefore, there is no DC path to ground seen by the common mode signal, and the common mode resistance is thus raised to over 100MΩ [57].

![Diagram](image)

(a)

![Diagram](image)

(b)

Fig. 2-1-10. Active electrode and negative capacitance for impedance boosting [61].

Utilizing feedback circuits, active shielding (or the active electrode) and negative capacitance are proposed to boost the input impedance to the GΩ level. As in Fig. 2-1-10(a) [61], the active electrode keeps the voltage across the external parasitic capacitance constant. Assuming the open loop gain of the amplifier in the unity gain buffer is $A_V$, $C_{ext}$ is reduced to $C_{ext}/(1+A_V)$. The negative capacitance generated by the positive feedback circuit is shown in Fig. 2-1-10(b). In this circuit, $G(s)$ is designed with constant amplitude in the signal pass-band such that the generated impedance is frequency independent. The transfer function in Fig. 2-1-10(b) is given as

$$H(s) = \frac{C_{EL}}{C_{EL} + C_{INT} - C_{PF}(G(s) - 1)} \quad (2-1-9)$$
where $C_{INT}$ is the internal input capacitance to be cancelled, and $-C_{PF}(G(s)-1)$ is the generated negative capacitance. The negative capacitance is designed to equal $-C_{INT}$ to achieve extremely large input impedance. As reported in [62], the impedance can be as large as 400GΩ.
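Both boosting mechanisms can be illustrated with a few lines of arithmetic. The sketch below assumes that $G(s)$ is a real constant in the pass-band, as the text states for its amplitude, and uses illustrative capacitance values; the function names are hypothetical.

```python
def shielded_capacitance(c_ext: float, a_v: float) -> float:
    """Effective external capacitance seen with active shielding."""
    return c_ext / (1.0 + a_v)


def electrode_gain(c_el: float, c_int: float, c_pf: float, g: float) -> float:
    """Pass-band value of (2-1-9) with a constant positive feedback gain g."""
    return c_el / (c_el + c_int - c_pf * (g - 1.0))


if __name__ == "__main__":
    c_el, c_int = 10e-12, 5e-12  # assumed electrode and internal capacitances
    print(f"no cancellation:   {electrode_gain(c_el, c_int, 0.0, 2.0):.3f}")
    # Choosing C_PF * (G - 1) equal to C_INT cancels the internal capacitance.
    print(f"full cancellation: {electrode_gain(c_el, c_int, 5e-12, 2.0):.3f}")
    print(f"shielded C_ext:    {shielded_capacitance(10e-12, 999.0) * 1e15:.1f} fF")
```

Without cancellation the electrode capacitance and the internal capacitance form a divider that attenuates the signal; with full cancellation the gain returns to unity.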

### 2.1.4 DC-coupled Pseudo Resistor AFE

Complex impedance boosting is essential to enhance the input impedance of the AC-coupled AFE. However, the positive feedback suffers from stability issues, so an external tuning capacitor [62] or complicated auto-tuning circuits [61] are required. The feedback amplifier and the tuning circuits also consume extra power and area. Since the input is directly connected to the amplifier transistor gate, the DC-coupled AFE provides a simpler way to improve the input impedance. There have been several attempts at DC-coupled AFEs in recent years [45], [66]-[69].
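For a gate-connected input, the impedance seen by the electrode is essentially the reactance of the gate parasitic capacitance. The following rough sketch shows the order of magnitude, assuming a purely capacitive input with no leakage; the capacitance values are illustrative.

```python
import math


def capacitive_reactance_ohm(c_farad: float, f_hz: float) -> float:
    """Magnitude of the reactance of a capacitor at frequency f_hz."""
    return 1.0 / (2.0 * math.pi * f_hz * c_farad)


if __name__ == "__main__":
    for c in (0.5e-12, 1e-12, 5e-12):  # assumed gate parasitic capacitances
        x = capacitive_reactance_ohm(c, 100.0)
        print(f"C = {c * 1e12:.1f} pF -> |X| at 100 Hz = {x / 1e9:.2f} GOhm")
```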

![Fig. 2-1-11. Topology of three-amplifier DC-coupled AFE.](image)
The typical topology of a DC-coupled pseudo resistor AFE is illustrated in Fig. 2-1-11 [66]. The circuit topology is quite similar to the three-amplifier topology except that all the resistors are replaced by capacitors and diode connected PRs (as in Fig. 2-1-7). Since it is DC-coupled, the input impedance is inversely proportional to the input transistor parasitic capacitance of the low noise amplifier (LNA). For picofarad range parasitic capacitance and sub-100Hz operating frequency, the reactance of the parasitic capacitance is at the GΩ level, which is high enough to reject the potential divider effect. The input DC bias is provided by the PR. The AFE has two amplification branches, which amplify both the common mode and the differential signal with the same gain. Assume the input common mode signal is $v_{cm}^{body}$, and the monitored bioelectrical signal is $v_{bio}$, then we have

$$v_{out} = \frac{C_2+C_1 C_5}{C_1} \left(v_{cm}^{body} + \frac{v_{bio}}{2}\right) - \frac{C_4+C_3 C_7}{C_3} \left(v_{cm}^{body} - \frac{v_{bio}}{2}\right) = \left(\frac{C_2+C_1 C_5}{C_1} - \frac{C_4+C_3 C_7}{C_3}\right) v_{cm}^{body} + \frac{1}{2} \left(\frac{C_2+C_1 C_5}{C_1} + \frac{C_4+C_3 C_7}{C_3}\right) v_{bio}. \quad (2-1-10)$$

Thus, taking the ratio of the differential gain to the common mode gain, the CMRR can be derived as

$$CMRR = \frac{1}{2} \left(\frac{C_2+C_1 C_5}{C_1} + \frac{C_4+C_3 C_7}{C_3}\right) / \left(\frac{C_2+C_1 C_5}{C_1} - \frac{C_4+C_3 C_7}{C_3}\right). \quad (2-1-11)$$

CMRR is determined by the mismatch of the gain ratio capacitors of the AFE. Since the three-amplifier topology requires an extra LNA compared with the AC-coupled approach, the area left for the ratio capacitors is smaller than in the AC-coupled one under the same area constraint, resulting in larger mismatch and worse CMRR. In [69], an open-loop DC-coupled AFE with complicated analog-digital mixed feedback loops is used to improve CMRR and noise performance, achieving 75dB CMRR. However, the complex compensation scheme consumes more power.
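The sensitivity of the CMRR to branch mismatch can be illustrated by treating the two branch gains in (2-1-10) as plain numbers. The sketch below takes the CMRR as the ratio of the differential gain to the common mode gain; the names `g1` and `g2` and the mismatch values are hypothetical stand-ins for the capacitor-ratio gains of the two branches.

```python
import math


def cmrr_db(g1: float, g2: float) -> float:
    """CMRR in dB for two branch gains g1 and g2.

    Differential gain is (g1 + g2) / 2 and common mode gain is |g1 - g2|.
    """
    common_mode_gain = abs(g1 - g2)
    if common_mode_gain == 0.0:
        return math.inf
    differential_gain = 0.5 * (g1 + g2)
    return 20.0 * math.log10(differential_gain / common_mode_gain)


if __name__ == "__main__":
    nominal_gain = 100.0  # assumed nominal branch gain
    for mismatch in (0.01, 0.001, 0.0001):
        g2 = nominal_gain * (1.0 + mismatch)
        print(f"mismatch {100 * mismatch:.2f} % -> CMRR = {cmrr_db(nominal_gain, g2):.1f} dB")
```

Each tenfold reduction in gain mismatch buys about 20dB of CMRR, which is why the area available for matched ratio capacitors matters.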

![Diagram of DC-coupled AFE with baseline stabilizer](image)

**Fig. 2-1-12.** DC-coupled AFE with baseline stabilizer [66].

Since bioelectrical signals have extremely weak drive capability, the input DC bias for the DC-coupled AFE is commonly provided by the GΩ PR $R_2$ as in Fig. 2-1-11. It is worth noting that the bias provided by the PR is unpredictable due to leakage. Furthermore, the leakage of the input transistor is not negligible if its size is made large for better noise performance. This leakage reduces the input impedance to a level close to that of the bias PR. In such a case, the unpredictable bias resistance would produce an uncontrollable input DC offset. To address this, a baseline stabilizer is added to the typical DC-coupled AFE in [66]. The circuit architecture is depicted in Fig. 2-1-12. When a motion artifact event occurs, the inputs encounter a sudden baseline imbalance that saturates the PGA output; this drift is detected by the baseline stabilizer. The reset action then quickly brings the saturated state back to the normal state by shorting the unbalanced input of the PGA to the system DC common mode bias.

Different from the reset method discussed above, feedback techniques with an analog filter [70]-[72], an analog-digital mixed filter [73], and a digital filter [74]-[75], [67] have been reported to address the DC offset at the input of the DC-coupled AFE. Fig. 2-1-13(a) shows the topology of the analog filter feedback path that cancels the offset [67]. The integrator contributes to the following overall transfer function

\[ H(s) = \frac{sR_{eq} C_I}{sR_{eq} C_I + 1}, \]

where \( G \) is the gain of A1. When \( s=0 \), the overall gain is 0, which means the DC components are fully suppressed by this feedback loop. A differential version of this topology is discussed in [72]. To achieve the required response speed, A2 in Fig. 2-1-13(a) consumes a large amount of power. In addition, the feedback loop contributes extra noise to the input.
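The high-pass behaviour of the servo loop can be checked by evaluating the transfer function as printed above at a few frequencies. In this sketch the values of `r_eq` and `c_i` are assumptions chosen to give a time constant of one second; they are not taken from [67].

```python
import math


def servo_highpass_gain(f_hz: float, r_eq: float, c_i: float) -> float:
    """Magnitude of H(s) = s*R_eq*C_I / (s*R_eq*C_I + 1) at s = j*2*pi*f."""
    s_tau = complex(0.0, 2.0 * math.pi * f_hz * r_eq * c_i)
    return abs(s_tau / (s_tau + 1.0))


if __name__ == "__main__":
    r_eq, c_i = 1e9, 1e-9  # assumed values, time constant of 1 s
    f_corner = 1.0 / (2.0 * math.pi * r_eq * c_i)
    print(f"corner frequency = {f_corner:.3f} Hz")
    for f in (0.0, f_corner, 1.0, 50.0):
        print(f"{f:7.3f} Hz -> |H| = {servo_highpass_gain(f, r_eq, c_i):.4f}")
```

The gain is zero at DC, about 0.707 at the corner frequency and close to unity across the ECG band, which matches the offset rejection described in the text.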

Fig. 2-1-13. Feedback filter techniques to suppress the DC offset [67].
Fig. 2-1-13(b) gives the model of the mixed signal filter. The DC offset extracted from $V_{\text{OUT}}$ is fed to a coarse and a fine controlled current source to cancel the offset at the input. Both Fig. 2-1-13(c) and Fig. 2-1-13(d) present topologies of purely digitally controlled feedback loops. They utilize an on-chip or off-chip low-pass FIR or IIR digital filter ($H(z)$) to extract the offset as digital codes from the output of the ADC. The difference is that Fig. 2-1-13(c) modulates the input transistor width by the filtered codes to suppress the offset, while Fig. 2-1-13(d) reconstructs the analog offset from the filtered codes via a DAC consisting of a capacitor array.

2.1.5 Chopper Stabilization

In conventional AC-coupled and DC-coupled AFEs, the only way to reduce the 1/f noise is to use input transistors with large gate area. The area is costly and inefficient for processes with high surface state densities [76]. Another technique is to use lateral bipolar mode input transistors, which reduce the 1/f noise by 40dB but result in an offset between 1 and 10mV [77]. Circuit techniques, e.g., correlated double sampling (CDS) [78] and chopper stabilization (CS) [79], are also applied to reduce the 1/f noise.

CDS, also called auto-zero, cancels the 1/f noise by subtracting from the current noise a correlated sample of it taken at a previous time. Due to the residual noise, which corresponds to the folded-over broadband white noise and flicker noise, the noise reduction is limited [79]. Another issue of CDS is that the input-output shorting action requires a high slew rate to slew the output back and forth in each sampling period. Moreover, the speed of the amplifier should be much higher than the sampling frequency. Overall, high power consumption is required.

Fig. 2-1-14. Operational principles of CS amplifier [79].

The operation principles of a CS amplifier are illustrated in Fig. 2-1-14 [79]. Firstly, the input is multiplied by the chopping square wave via the modulator inserted between the input and the amplifier. In this way, the input is modulated to the frequencies around the odd harmonics of the chopping frequency. After the second modulation, the output is demodulated back to the even harmonics of the chopping signal. The noise and offset of the amplifier, however, are modulated only once, by the second multiplier, and hence most of the noise and offset power is transported to the odd harmonics of the chopper square wave, as illustrated by the second noise waveform in Fig. 2-1-14. Lastly, after the low-pass filter, the main signal power is recovered while the $1/f$ noise and offset power are removed. To avoid signal aliasing due to the chopping, the chopping frequency should be higher than twice the signal bandwidth. Also, the gain of the amplifier at the chopping frequency should be high enough to guarantee the designed amplification, so the cut-off frequency of the amplifier should be higher than the chopping frequency. Due to the introduction of the four multiplier switches, extra spikes (bottom right of Fig. 2-1-14) are added by charge injection during the switch transitions, which results in parasitic noise.
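The offset rejection of the chopping scheme can be reproduced with a small discrete-time model. The sketch below is a simplified illustration: it models only a constant amplifier offset (no $1/f$ noise, finite bandwidth or charge injection), uses an ideal square wave, and replaces the low-pass filter with an average over whole chopping periods. All names and values are assumptions.

```python
def chopping_wave(n_samples: int, half_period: int) -> list:
    """Square wave alternating between +1 and -1 every `half_period` samples."""
    return [1.0 if (k // half_period) % 2 == 0 else -1.0 for k in range(n_samples)]


def chopper_amplify(v_in: list, v_offset: float, gain: float, chop: list) -> list:
    """Modulate the input, amplify it together with the offset, then demodulate."""
    out = []
    for x, c in zip(v_in, chop):
        modulated = x * c                          # input modulator
        amplified = gain * (modulated + v_offset)  # offset enters after the modulator
        out.append(amplified * c)                  # output demodulator
    return out


def block_average(samples: list, window: int) -> list:
    """Crude low-pass filter: average over consecutive non-overlapping windows."""
    return [sum(samples[i:i + window]) / window
            for i in range(0, len(samples) - window + 1, window)]


if __name__ == "__main__":
    half_period = 8
    chop = chopping_wave(64, half_period)
    v_in = [1e-3] * 64  # 1 mV input, constant over the observation window
    demodulated = chopper_amplify(v_in, v_offset=5e-3, gain=100.0, chop=chop)
    filtered = block_average(demodulated, 2 * half_period)
    print(filtered)  # each value is close to gain * v_in; the offset averages out
```

Without chopping, the same amplifier would output the amplified sum of the input and the offset, so the 5mV offset would dominate the 1mV signal.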

In the CS AFE, spikes are introduced by the input modulator. Clock edge staggering is proposed in [80] to mitigate the distortion. In the past few years, many designs have adopted the CS technique in biomedical sensors [81]-[88] for the sake of high CMRR, low offset, and low noise. In [81], the basic CS AFE structure is proposed using a selective band-pass $g_m$-C cell. To further improve the offset cancellation, a DC servo-loop is proposed in [82]. To relax the amplifier bandwidth and hence reduce the power of the CS amplifier, [83]-[85] proposed the integrator feedback CS (IFCS). In the IFCS structure, the distortions due to the switching of the chopper modulator are also reduced by subtracting the averaged distortion power from the input. Heterodyne chopper readout circuits for ECG extraction and QRS complex detection are also possible [86]. Chopper stabilization combined with the CDS noise cancelling technique has been discussed in [87], which places a common gate (CMG) stage in parallel with a common source (CMS) transimpedance stage.

### 2.1.6 Differential Difference Amplifier

The differential difference amplifier (DDA) is a candidate for implementing a high-performance AFE because it offers inherently DC-coupled high input impedance, circuit simplicity, and low power. Over the past two decades, many DDA IAs have been reported for their distinguished CMRR [89]-[97]. The programmable resistive DDA AFE proposed in [89] introduces extra thermal noise and requires an extra band-pass filter for bandwidth control. The offset is also not well treated in this initial DDA AFE structure. To eliminate the effect of the amplifier offset and improve the noise performance, the CS technique has been applied to the DDA topology [90]. To suppress the slowly changing baseline drift, an AC-coupled feedback AFE is proposed in [91]. In [92], a dual DC servo-loop is added to compensate for the baseline drift. A sub-Hz high-pass filter, formed by the input parasitic capacitance in parallel with a PR, is introduced in [93] to suppress the slowly changing baseline drift, but it turns the original differential-input DDA into a single-ended DDA and thus halves the utility of the DDA. A two-stage AFE composed of a DDA, an OTA, and capacitive feedback combined with a PR was proposed in [94] to achieve a band-pass AFE. DDA AFEs generally show excellent CMRR. In [95], a chopper-stabilized DDA AFE with a CMRR of 85dB is proposed for bio-potential acquisition. The rail-to-rail DDA reported in [96] achieves 137dB CMRR for biosensors. A CMRR as high as 150dB is possible with a DDA, as shown by the simulation result in [97]. These results prompt us to design a simple yet acceptable AFE that addresses the CMRR issue of the three-amplifier DC-coupled ones.

2.1.7 Summary of AFE

Characteristics of the different AFE structures for biomedical sensors are summarized in Table 2-1-2. The AC-coupled AFE shows the lowest input impedance. Impedance boosting techniques enhance the input impedance significantly but require extra power and area. Thus, an AC-coupled AFE with auxiliary circuits, such as impedance boosting and a DC servo-loop, is not the best candidate for ultra-low power flexible ECG interfacing. The DC-coupled AFE shows inherently high input impedance but suffers from poor CMRR. Without a chopper, the noise performance can be improved only by enlarging the input transistors or by increasing their bias current. With a chopper, the modulator degrades the input impedance, and the clocking and auxiliary compensation circuits require the highest power among all topologies. Therefore, a DC-coupled AFE without a chopper should be the best choice for ultra-low power operation. However, the poor CMRR of the three-amplifier DC-coupled AFE must be addressed to suppress power-line interference. The DDA AFE shows the highest CMRR among the structures without a chopper. Combining the DDA with DC coupling may provide a simple AFE with high input impedance, ultra-low power, low noise, and acceptable CMRR.

Table 2-1-2. Summary of different AFE structures used in biomedical sensor interface circuits.

<table>
<thead>
<tr>
<th>Techniques</th>
<th>Power</th>
<th>Noise</th>
<th>CMRR</th>
<th>Anti-offset</th>
<th>Input Impedance</th>
<th>Area</th>
</tr>
</thead>
<tbody>
<tr>
<td>AC-coupled</td>
<td>H</td>
<td>M</td>
<td>M</td>
<td>M</td>
<td>L</td>
<td>M</td>
</tr>
<tr>
<td>AC-coupled with DC-servo loop</td>
<td>M</td>
<td>M</td>
<td>M</td>
<td>H</td>
<td>L</td>
<td>L<sup>*2</sup></td>
</tr>
<tr>
<td>AC-coupled with impedance boosting</td>
<td>M</td>
<td>M</td>
<td>M</td>
<td>M</td>
<td>H</td>
<td>M</td>
</tr>
<tr>
<td>AC-coupled with chopper</td>
<td>L</td>
<td>H</td>
<td>H</td>
<td>H</td>
<td>L</td>
<td>H</td>
</tr>
<tr>
<td>DC-coupled</td>
<td>M</td>
<td>M</td>
<td>L</td>
<td>L</td>
<td>H</td>
<td>M</td>
</tr>
<tr>
<td>DC-coupled with auto reset</td>
<td>M</td>
<td>M</td>
<td>L</td>
<td>M</td>
<td>H</td>
<td>M</td>
</tr>
<tr>
<td>DDA</td>
<td>M</td>
<td>M</td>
<td>H</td>
<td>M</td>
<td>M</td>
<td>M</td>
</tr>
</tbody>
</table>
*1. All the techniques use a pseudo resistor to achieve an acceptable high-pass cut-off frequency, and all the amplifiers operate in the subthreshold region with μA-level current to suppress noise.
*2. Off-chip capacitors may be required to implement the DC servo-loop.
*3. The benchmark of each technique is indicated by one of three letters: H, M and L stand for high, middle and low performance, respectively.

2.2 QRS detection

As shown in Fig. 2-2-1, a normal ECG mainly contains five waves (P, Q, R, S and T), which are further described by the PR interval, PR segment, QRS complex, ST segment and QT interval for the convenience of medical analysis. The characteristics of these waves, intervals, complexes and segments are given below.

![Fig. 2-2-1. A typical ECG pattern.](image)

P wave: The P wave indicates the depolarization of the atria prior to atrial contraction and generally has positive polarity. In the time domain, the P wave has an average duration of less than 0.12s and an average amplitude of 300μV. The spectrum of the P wave lies within 15Hz.

PR interval: The PR interval is the duration between the beginning of the P wave and the beginning of the QRS complex. It reflects the time the stimulation pulse takes to travel from the atria to the AV node. The normal PR interval varies from 0.12s to 0.2s among different persons. A PR interval shorter than 0.12s indicates a stimulus that bypasses the AV node, while a PR interval longer than 0.2s suggests a first-degree heart block.

QRS complex: The QRS complex reflects the rapid depolarization of the ventricles. Because the ventricles have the largest muscle mass, the QRS complex shows the highest amplitude, ranging from 1mV to 3mV, and hence it is commonly used for automatic heart rate detection. The normal duration of the QRS complex is in the range of 0.06s to 0.12s. A QRS complex wider than 0.12s indicates a disruption of the heart conduction system or a ventricular rhythm such as ventricular tachycardia. A low-amplitude QRS complex, much less than 1mV, suggests pericardial effusion or infiltrative disease, while an unusually high QRS complex indicates left ventricular hypertrophy.

ST segment: The ST segment runs from the end of the S wave to the beginning of the T wave. Its typical duration ranges from 0.08s to 0.12s. An elevated ST segment suggests ischemia while a depressed ST segment indicates myocardial infarction.

T wave: The T wave represents the repolarization of the ventricles, and its duration is less than 0.3s. A peaked T wave can be a sign of very early myocardial infarction or hyperkalemia, and an inverted T wave generally suggests myocardial ischemia, metabolic abnormalities, etc.

RR interval: The RR interval is the duration between two adjacent R peaks. It spans one entire cardiac cycle and hence gives the instantaneous heart rate. It is the critical parameter for computing the heart rate, which is related to heart arrhythmias. The variation of the RR interval is called the heart rate variability (HRV), which is related to the subject's heart health and stress level.
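
The relation between R-peak times, RR intervals, and heart rate can be illustrated with a short sketch. The R-peak times below are hypothetical. The standard deviation of the RR intervals (often called SDNN) is used here as one common HRV statistic, which is an assumption of this example rather than a metric defined in this chapter.

```python
import numpy as np

def rr_metrics(r_peak_times_s):
    """Return RR intervals (s), instantaneous heart rate (bpm), and SDNN (ms)."""
    r = np.asarray(r_peak_times_s, dtype=float)
    rr = np.diff(r)                           # time between adjacent R peaks
    heart_rate_bpm = 60.0 / rr                # one beat per RR interval
    sdnn_ms = 1000.0 * np.std(rr, ddof=1)     # sample standard deviation
    return rr, heart_rate_bpm, sdnn_ms

# Hypothetical R-peak times in seconds
r_peaks = [0.00, 0.80, 1.62, 2.40, 3.22]
rr, hr, sdnn = rr_metrics(r_peaks)
print("RR intervals (s):", np.round(rr, 3))
print("Heart rate (bpm):", np.round(hr, 1))
print("SDNN (ms):", round(float(sdnn), 2))
```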

The morphology features of the ECG signal mentioned above constrain the decision rules of QRS detection algorithms. The study of QRS detection can be traced back to the 1970s [98], [99]. Detection accuracy was the most important performance metric for early QRS detectors, since the algorithms were designed for ECG machines, such as those in intensive care units, where power is not constrained. QRS detectors can be grouped into the following types [100]:

(A) Digital differentiation and filter based methods: the digitized ECG is differentiated to reflect the signal slope [101]-[103]. As the R peak produces the highest peak in the ECG derivative, the derivative is compared with a threshold to locate the R peak. The main tasks are to reduce the noise interference and to obtain a proper threshold. Nonlinear digital filters, such as multiplication of backward differences [104], were studied to suppress the noise. An adaptive threshold was commonly used to update the threshold dynamically, together with a refractory period restriction that two beats must be at least 200ms apart [98]. These methods are simple yet effective for hardware implementation.

(B) Wavelet transform based methods: the discrete wavelet transform (DWT) decomposes the ECG waveform into different scales, which reflect different frequency components. The local maxima of the decomposed coefficients on each scale represent the positions of the signal peaks, which is known as singularity detection [105]. Pure DWT based QRS detection methods rely on singularity detection by finding the local maxima on the decomposed scales [106]-[108]. A QRS detector based on DWT singularity detection can be implemented with filter banks [109]. The DWT is used not only as the detection method but also as a pre-processing technique in the detector, e.g., for ECG denoising [110] and feature extraction [111].

(C) Artificial neural network (ANN) based methods: Compared with conventional methods, an ANN can be trained to mimic functions with extreme nonlinearity. The detection accuracy and robustness are generally better than those of handcrafted model based methods [112]. However, the training requires a very large number of labeled samples, and the trained network needs thousands of multiplications and accumulations. Thus, neural network methods are more commonly adopted for arrhythmia classification [113] than for detection. Nevertheless, there are still reports on QRS detection using ANNs for their superior accuracy [114], [115]. Besides classification and QRS detection, the ANN can also be used for pre-processing, e.g., separating the noise and the QRS complex in otherwise unusable noisy ECG signals [116].

(D) Other methods: Inspired by other mathematical prediction models, methods such as Hidden Markov Chain (HMC) [117], Genetic Algorithm (GA) [118], Maximum a Posteriori (MAP) [98], Hilbert Transform (HT) [119], Shannon Energy Envelope [120], Mathematical Morphology [121], and Zero-Crossing Counts [122] have been applied to QRS detection. These less common methods either lack detection accuracy, such as the HMC, GA, and HT, or are too computation intensive, e.g., the MAP.
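
The differentiation-and-threshold approach in (A) can be sketched as follows. This is a simplified illustration, not the algorithm of any cited work: the threshold ratio, the threshold update rule, the search window, and the synthetic test signal (Gaussian R and T waves plus a small 50Hz component) are all assumptions. Only the 200ms refractory period comes from the text.

```python
import numpy as np

def detect_qrs(ecg, fs, refractory_s=0.2, threshold_ratio=0.5, search_s=0.1):
    """Slope-threshold QRS detector with an adaptive threshold and a
    refractory period. Returns the sample indices of the detected R peaks."""
    ecg = np.asarray(ecg, dtype=float)
    slope = np.abs(np.diff(ecg, prepend=ecg[0]))   # backward difference
    peak_est = slope[: int(fs)].max()              # initialise from the first second
    refractory = int(round(refractory_s * fs))
    search = int(round(search_s * fs))
    beats = []
    i = 0
    while i < len(ecg):
        if slope[i] > threshold_ratio * peak_est:
            stop = min(len(ecg), i + search)
            r = i + int(np.argmax(ecg[i:stop]))    # R peak near the steep slope
            beats.append(r)
            # Adaptive threshold: track the slope peak of recent beats
            peak_est = 0.875 * peak_est + 0.125 * slope[i:stop].max()
            i = r + refractory                     # skip the refractory period
        else:
            i += 1
    return beats

# Synthetic test signal (hypothetical): R and T waves as Gaussian pulses
fs = 250
t = np.arange(5 * fs) / fs
beat_times = [0.5, 1.3, 2.1, 2.9, 3.7, 4.5]
ecg = 0.01 * np.sin(2 * np.pi * 50 * t)            # small mains interference
for tb in beat_times:
    ecg = ecg + 1.0 * np.exp(-0.5 * ((t - tb) / 0.012) ** 2)         # R wave, mV
    ecg = ecg + 0.3 * np.exp(-0.5 * ((t - tb - 0.25) / 0.05) ** 2)   # T wave, mV

beats = detect_qrs(ecg, fs)
true_peaks = [int(round(tb * fs)) for tb in beat_times]
print("detected:", beats)
print("expected:", true_peaks)
```

The T waves are ignored because their slope stays well below the threshold, and the falling edge of each R wave is skipped by the refractory period.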

Evaluated on the MIT-BIH database [123], the accuracy of a QRS detector can easily surpass 99%. Thus, low power design is currently the main challenge for real-time QRS detector implementation.
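
Detector accuracy of this kind is usually reported as sensitivity, TP/(TP+FN), and positive predictivity, TP/(TP+FP). The sketch below shows how the two figures could be computed from annotated and detected beat positions. The greedy matching rule, the tolerance, and the sample values are assumptions for illustration only.

```python
def score_detections(true_peaks, detected_peaks, tol):
    """Match detections to annotated beats within +/- tol samples and
    return (sensitivity, positive_predictivity)."""
    remaining = sorted(detected_peaks)
    tp = 0
    for beat in sorted(true_peaks):
        # Greedy match: first unused detection close enough to this beat
        match = next((d for d in remaining if abs(d - beat) <= tol), None)
        if match is not None:
            remaining.remove(match)
            tp += 1
    fn = len(true_peaks) - tp        # annotated beats that were missed
    fp = len(detected_peaks) - tp    # detections with no annotated beat
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    positive_predictivity = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, positive_predictivity

# Hypothetical sample indices: one missed beat (500) and one false detection (420)
annotated = [100, 300, 500, 700, 900]
detected = [102, 298, 420, 703, 900]
se, ppv = score_detections(annotated, detected, tol=10)
print(f"sensitivity = {se:.2f}, positive predictivity = {ppv:.2f}")
```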

Combining data compression with digital differentiation, a joint QRS detection scheme is proposed in [124], where a detection accuracy of over 99.6% is achieved with just 133nW power consumption for the detector. ECG compression with QRS detection is a favourable choice for power reduction, and many other designs adopt a similar architecture [125]. A 20.9nW real-time event-driven QRS detector is proposed in [126]; it works with a level-crossing ADC and achieves a detection accuracy of over 97%. The event-driven detector is attractive because it is compatible with state-of-the-art low power ADC designs for ECG acquisition, in which a level-crossing sampling scheme is adopted for the ECG signal. Using the curve length transform and WT, the authors of [127] designed a 642nW ECG feature extractor that can extract the P wave, QRS complex, and T wave of ECG segments for wearable devices. There are many other low power implementations [128]-[133] that use simple, power-efficient QRS detection algorithms, e.g., level-crossing sampling [131] and wavelet down-sampling [133], and low power circuit techniques such as burst transfer and multi-voltage [130].

Except for the computation-intensive ANN based QRS detection, it is worth noting that all other existing algorithms use predefined parameters or constraints for all types of patients. To be robust enough to handle special cases, they end up with complicated models and a large number of parameters. The diversity of the ECG waveform due to the user's age, gender, region, and heart condition, together with the monitoring environment, limits the use of feature or model based QRS detection algorithms, not to mention the evolution of the waveform over time caused by variations in physical condition during long-term monitoring. Patient-specific methods are becoming a new trend for ECG waveform classification [134] and for heart rate detection based on the ballistocardiogram [135], but they are rarely studied for QRS complex detection. A patient-specific QRS detector with low computational complexity is needed.

2.3 Machine Learning based Cardiac Arrhythmia Classification

Similar to the QRS detector, the design considerations for Cardiac Arrhythmia Classification (CAC) lie in two aspects: the classification accuracy and the energy efficiency. The classification accuracy is mainly decided by the selected algorithm, while the energy efficiency is related to both the algorithm and the implementation details.

The implementation of ultralow power CAC has been studied for quite a long time. Because of their inefficient code execution, approaches running on general purpose processor cores, such as the MSP430 [136] and a customized RISC core [137], only perform feature extraction with simple algorithms, e.g., quad-level vector and waveform skeleton analysis, to meet the power requirement of ambulatory devices. For better processing performance and energy efficiency, the discrete wavelet transform (DWT) and simple machine learning (ML) methods with low computational complexity, e.g., the support vector machine (SVM), have been proposed in recent years to realize CAC application-specific integrated circuits (ASICs) [138]-[143]. In [141], a pseudo down-sampling wavelet transform and an inverse wavelet transform were adopted to detect and analyze the ECG features. Both architecture-level and circuit-level low power design techniques, e.g., dynamic clocking and ultralow voltage operation, were used to achieve a 457nW feature extraction processor. However, classification is still unavailable in that design. Classifiers consuming several tens of microwatts, using maximum likelihood, SVM, and $k$-nearest neighbour techniques, were proposed to classify two types of cardiac arrhythmia by analyzing the features extracted by a multivariate autoregressive feature extractor [138]. To further save power, a single ML method is implemented, e.g., the 2.78µW Bayes classifier with 86% accuracy for the detection of ventricular arrhythmia [140], [141], and the classifier with an energy dissipation of 48.99nJ/classification [142], which identifies three types of cardiac arrhythmia using a cascaded SVM with granular resampling and adaptive speculative mechanisms. In [143], a 96pJ/clock classifier was implemented to detect atrioventricular block using the time domain P-QRS-T complex features. Due to the limited power budget, the reported ASIC based CACs are merely able to classify one or two types of cardiac arrhythmia with acceptable accuracy.

According to the *Association for the Advancement of Medical Instrumentation* (AAMI) recommendation, there are five types of heart beats in total: normal beats (N), supraventricular ectopic beats (S), ventricular ectopic beats (V), fusion beats (F) and unclassified beats (Q) [144]. Leaving out type Q, one normal and three abnormal beat types should be classified. High-performance CACs that cover all classification types with high accuracy are commonly based on deep learning models, i.e., the artificial neural network (ANN) and its variants [134], [145]-[149]. Due to the heavy computation burden, ANN based CACs (ANN-CACs) run on workstations, coprocessors and separate accelerators. To reduce the computational load while accelerating the training and classification process, morphology feature extraction is used to reduce the number of input samples [148]. However, the choice of features depends on the specific patient and thus cannot cover the diversity and evolution of the ECG due to the gender, age, region, race, clinical environment, and recording time of the patients [149]. To overcome the non-ideal performance of a global network, raw patient-specific training data, instead of extracted features, are adopted to train a network for each patient and improve the accuracy [134], i.e., the patient-specific network.

Using raw data samples as the network input offers better practical applicability but requires more computational resources, since the number of network inputs is much higher than in networks based on selected features. The input data volume can be reduced via down-sampling and a lower resolution analog-to-digital converter (ADC), at the cost of part of the information. However, there is a limit to the reduction of the sample amplitude resolution, and down-sampling only marginally relaxes the computational complexity. The level-crossing ADC (LC-ADC) has been demonstrated to be the most energy-efficient ADC for bursty ECG signals [150]-[168]. Unlike Nyquist sampling, where samples are evenly distributed in time, the LC-ADC generates a continuous-in-time and discrete-in-amplitude (CTDA) signal, i.e., the number of samples is proportional to the slope of the signal, so more information is extracted from fast-changing parts while few or even no samples are taken on slowly changing or steady signal segments.
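
The behaviour of level-crossing sampling can be illustrated with a behavioural sketch. It approximates continuous time with a finely sampled waveform and emits an event whenever the input moves a full level spacing away from the last crossed level. The level spacing, the test waveform, and the function name are assumptions of this example and do not describe any cited LC-ADC circuit.

```python
import numpy as np

def level_crossing_sample(x, t, lsb):
    """Return a list of (time, level) events, one per crossed level.
    An event fires when x moves one full lsb away from the current level."""
    events = []
    k = int(np.round(x[0] / lsb))        # index of the starting level
    for xi, ti in zip(x, t):
        while xi >= (k + 1) * lsb:       # upward crossing
            k += 1
            events.append((ti, k * lsb))
        while xi <= (k - 1) * lsb:       # downward crossing
            k -= 1
            events.append((ti, k * lsb))
    return events

# Hypothetical bursty waveform: flat baseline with one narrow pulse (QRS-like)
fs_fine = 10_000                          # fine grid standing in for continuous time
t = np.arange(fs_fine) / fs_fine          # one second
x = 0.03 + 1.0 * np.exp(-0.5 * ((t - 0.5) / 0.012) ** 2)

events = level_crossing_sample(x, t, lsb=0.1)
print("uniform samples:", len(t))
print("level-crossing events:", len(events))
print("first and last event time (s):", events[0][0], events[-1][0])
```

All events cluster around the pulse, and the flat baseline produces none, which is the property that reduces the data volume for bursty signals such as the ECG.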

Using a CTDA signal flow, the ANN structure as well as the arithmetic operations of the CAC can be dramatically simplified. To the best of our knowledge, such a CTDA CAC has not been studied.

Besides the power consideration, imbalanced data can lead to poor detection accuracy. To address the imbalance issue in the global ANN-CAC, Chazal proposed the cross-validation data sets [146], in which the 44 records from the MIT-BIH database [123] are divided into two data sets with the same number of records and approximately the same proportion of beats, each mixing routine and complex arrhythmias: the 50000 heart beats of the first data set (DS1) are used for evaluating the CAC performance, while the other 50000 heart beats from the second data set (DS2) are used for training the ANN CAC. In the patient-specific CAC, the network is trained individually for each record, i.e., each patient has his or her own CAC weights. In [134], the patient-specific CAC is trained for each record in DS1. The training samples come from two parts: common heart beat samples from DS2 and patient-specific heart beats taken from the first 5 minutes of the specific record from DS1. In this way, the training samples are balanced. The remaining heart beats in the specific record are used for evaluation. Although the number of training samples is balanced, the patterns within each labeled type are diverse due to the ECG identity. A proper grouping scheme for the training samples is therefore necessary for the patient-specific ANN CAC.

2.4 Summary
In this chapter, the design considerations and challenges for the AFE, the QRS detector, and the CAC are discussed through a review of existing reports. AC-coupled AFEs with auxiliary circuits, e.g., impedance boosting, chopper stabilization, and the DC servo-loop, are not the best choice for ultralow power flexible ECG sensors. Thus, a DC-coupled AFE is adopted in this research to achieve a medical grade AFE with low noise, ultralow power, high input impedance, and acceptable CMRR. The design of QRS detectors is a mature topic, and recent research mainly focuses on optimizing the power consumption of the detector. Due to the ECG diversity, a personalized QRS detector is needed but rarely studied. For the CAC design, the deep learning based patient-specific CAC outperforms other CACs. How to reduce the computational complexity and how to balance the training samples are the two main challenges, which will be the focus of this research.
Chapter 3
A 2.55 NEF 76dB CMRR DC-Coupled FDDA AFE

Wearable ECG sensors are cost-effective tools for managing cardiovascular diseases. The analog front-end (AFE) is one of the critical components in biomedical sensors. A medical grade AFE should follow a medical equipment standard, such as IEC 60601-2-47 for ambulatory electrocardiographic (ECG) systems [40], in which the required gain, bandwidth, noise, input offset, input impedance, common mode rejection ratio (CMRR), etc., are defined. For flexible ECG sensors, the design of the AFE faces more challenges for two reasons. First, motion artefacts due to body movements can saturate the AFE output. Second, wearable biomedical sensors need a low power solution for long battery life and small sensor size. It is necessary to increase the AFE input impedance to handle motion artefacts. It is also important to achieve high CMRR to minimize ambient noise. For low power implementation, the circuits should be kept as simple as possible.

An AC-coupled AFE with impedance boosting circuits falls short in power consumption, and the clocking and auxiliary compensation of a chopper-stabilized AFE face a similar problem. Thus, for our ultralow power flexible ECG sensor, the DC-coupled AFE with its inherently high input impedance is the best choice. In this chapter, several techniques are presented to achieve an AFE with ultralow power, high input impedance, low noise, and high CMRR. The proposed AFE uses a fully differential difference amplifier (FDDA) to replace the three-amplifier topology and improve the CMRR. An on-body common mode DC voltage is introduced to bias the bioelectrical signal and handle the otherwise uncontrollable input DC offset. Large input transistors are often used to reduce flicker noise; we propose to reuse the parasitic capacitance of these large input transistors to improve noise performance and area efficiency. We also use a current mirror load in the first stage of the FDDA to replace the common mode feedback (CMFB) for better power efficiency.

The rest of this chapter is organized as follows. Details of the proposed FDDA DC-coupled AFE and the design considerations are presented in Section 3.1. Simulation results are provided in Section 3.2. Measurement results and the performance summary are given in Section 3.3. Concluding remarks are drawn in Section 3.4.

3.1 FDDA DC-coupled AFE

3.1.1 FDDA Instrumentation Amplifier

The conventional three-amplifier DC-coupled Instrumentation Amplifier (IA) has two amplification branches. The input DC bias is provided by \( V_{cm} \) via a PR [66], as in Fig. 2-1-11. The CMRR is mainly determined by the mismatch of the gain ratio capacitors. Because the chip area is limited, the feedback capacitors are commonly selected as one or a few times the minimum allowable MIM capacitor; otherwise the gain ratio capacitors would be unacceptably large. Hence, this topology generally presents poor CMRR.

The proposed IA, shown in Fig. 3-1-1, isolates the common-mode input from the amplification branches by using an FDDA, in which only the differential signal is amplified. The inputs of the FDDA are defined by:

\[
\begin{bmatrix}
v_d \\
v_{cp} \\
v_{cn} \\
v_{cd}
\end{bmatrix}
=
\begin{bmatrix}
1 & -1 & -1 & 1 \\
1/2 & 1/2 & 0 & 0 \\
0 & 0 & 1/2 & 1/2 \\
1/2 & -1/2 & 1/2 & -1/2
\end{bmatrix}
\begin{bmatrix}
v_{p1} \\
v_{n1} \\
v_{p2} \\
v_{n2}
\end{bmatrix}
\quad (3-1-1)
\]

where \(v_d\), \(v_{cp}\), \(v_{cn}\), and \(v_{cd}\) are respectively the differential input signal, the non-inverting common mode signal, the inverting common mode signal, and the differential common mode signal.

Fig. 3-1-1. Topology of proposed FDDA DC-coupled IA.

The linear model of the FDDA is given by:

\[v_{op} - v_{on} = A_d(v_d + v_{cm}/CMRR), \quad (3-1-2)\]

where $A_d$ represents the differential gain and $v_{cm}$ is the common mode signal. The second term in the parentheses can be rewritten as [169]

$$
\frac{v_{cm}}{CMRR} = \frac{v_{cp}}{CMRR_p} + \frac{v_{cn}}{CMRR_n} + \frac{v_{cd}}{CMRR_d}. \quad (3-1-3)
$$

In (3-1-3), the CMRR terms enter as linear ratios rather than in dB; CMRR$_p$ and CMRR$_n$ are related only to the mismatch of the input pairs, while CMRR$_d$ is determined by both the input pair mismatch (CMRR$_{d,p}$) and the tail current mismatch (CMRR$_{d,c}$), as given by (3-1-4) and (3-1-5) [169], respectively.

$$
CMRR_{d,p} = \frac{1}{1 - \sqrt{\beta_n/\beta_p}} \quad (3-1-4)
$$

$$
CMRR_{d,c} = \frac{1}{1 - I_{cn}/I_c} \cdot \frac{\left(2 - \frac{\beta}{I_c}v_{cd}^2\right)^2}{2 + \frac{\beta}{I_c}v_{cd}^2} \quad (3-1-5)
$$

In (3-1-4), $\beta_n$ and $\beta_p$ are the geometry-dependent amplification factors of the inverting and non-inverting pairs, respectively. In (3-1-5), $\beta$ and $I_c$ are the expected ideal values, while $I_{cn}$ and $I_{cp}$ are the tail currents of the practical inverting and non-inverting branches, respectively. The impact of tail current mismatch can be easily suppressed by increasing the output impedance of the tail current source, either with a cascode current mirror or by increasing the transistor length to several times (e.g., 4×) the default length. To relax the mismatch of the input transistors, large size and symmetrical layout are necessary. The large transistors provide extra benefits in addition to better CMRR in the proposed FDDA. The parasitic capacitance $(C_{GS} + C_{GB})$, referred to as $C_5$ in Fig. 3-1-1, contributes to better gain consistency among different chips without extra area cost. This is because the unit size of a transistor is much smaller than that of a MIM capacitor, i.e., many more symmetrically placed units fit in the same area for transistors than for capacitors.

For a single device, the absolute value of a MOS capacitor is less precise than that of a MIM capacitor. In this process, however, the minimum size of a MIM capacitor is 5μm×5μm, and the area taken by 100 minimum MIM capacitors can hold over 800 symmetrically placed transistors with a size of W/L=0.5μm/2μm, so the matching of the MOS capacitors is better than that of the MIM capacitors. In addition to the larger number of units, which reduces the geometry mismatch, the on-body bias relaxes the input DC offset and reduces non-ideal effects. Thus, the gain variation from chip to chip is smaller with MOS capacitors than with MIM capacitors.

Furthermore, the area of capacitor \( C_2 \) can be reduced since part of the capacitance is provided by \( C_5 \), i.e., the input transistors contribute part of the ratio capacitance. In this design, four large input transistors with a size of 5mm/2μm are used to improve the noise performance and offset. From simulation, the parasitic capacitance contributes about 1/3 of the gain, i.e., the ratio of \( C_5 \) to \( C_2 \) is around 1/2. This helps to reduce the chip area. Although the reuse of \( C_5 \) improves the area utilization in terms of CMRR and noise performance, the MIM capacitors are not totally replaced by \( C_5 \) for two reasons. The first is the area budget: for the same capacitance, \( C_5 \) occupies roughly 7/3 of the area of \( C_2 \). The second is the increased gate leakage: the impedance at DC is targeted at over 1GΩ in this design, so \( C_5 \) cannot be too large.
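
The step from "about 1/3 of the gain" to "a ratio of around 1/2" can be written out as follows. This is a sketch that assumes the gain-setting capacitance is dominated by \( C_2 + C_5 \), so that the share of the gain supplied by the parasitic capacitance equals its share of that sum:

\[
\frac{C_5}{C_2 + C_5} \approx \frac{1}{3}
\;\Rightarrow\;
3C_5 \approx C_2 + C_5
\;\Rightarrow\;
\frac{C_5}{C_2} \approx \frac{1}{2}.
\]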

Assuming that the two \( G_m \) cells of the FDDA are matched and ignoring the CMRR term in (3-1-3), we have

$$v_{op} - v_{on} = A_d \left( (v_{p1} - v_{n1}) - (v_{p2} - v_{n2}) \right). \quad (3-1-6)$$

According to the circuit in Fig. 3-1-1, (3-1-6) can be transformed to:

$$v_{op} - v_{on} = A_d \left( (v_{p1} - v_{n1}) - \frac{1+j\omega R_1 C_1}{1+j\omega R_1 (C_1 + C_2 + C_5)} (v_{op} - v_{on}) \right). \quad (3-1-7)$$

Simplifying (3-1-7) under the assumption that $A_d$ is very large (the designed value is 70dB), the transfer function of the FDDA IA is given by:

$$H(j\omega) = 1 + \frac{j\omega R_1 (C_2 + C_5)}{1+j\omega R_1 C_1}. \quad (3-1-8)$$

That is, the FDDA IA has unity DC gain, a high-pass cut-off frequency of $1/(2\pi R_1 C_1)$, and a midband gain of $(C_2 + C_5 + C_1) / C_1$. $R_1$ is a back-to-back connected pseudo resistor [66] with a resistance of hundreds of GΩ. The high-pass cut-off is meant to be sub-Hertz, and its precise value is obtained from chip measurement. The low-pass cut-off frequency is determined by the FDDA bandwidth and the load ($C_1$, the Miller capacitor in the FDDA, and the input capacitor of the PGA).
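
The three properties stated above (unity DC gain, the high-pass corner, and the midband gain) can be checked numerically by evaluating (3-1-8) with $C_2 + C_5$ in the numerator, consistent with (3-1-7). The component values below are hypothetical and chosen only to give a sub-Hertz corner and a round midband gain; they are not the values of the fabricated chip.

```python
import numpy as np

# Hypothetical component values (not the chip values)
R1 = 500e9     # pseudo resistor, "hundreds of GOhm"
C1 = 1e-12     # feedback capacitor
C2 = 66e-12    # MIM ratio capacitor
C5 = 33e-12    # reused parasitic capacitance, C5 / C2 = 1/2

def fdda_ia_gain(f_hz):
    """Evaluate H(jw) = 1 + jw*R1*(C2 + C5) / (1 + jw*R1*C1)."""
    s = 1j * 2 * np.pi * np.asarray(f_hz, dtype=float)
    return 1 + s * R1 * (C2 + C5) / (1 + s * R1 * C1)

f_hp = 1 / (2 * np.pi * R1 * C1)       # high-pass cut-off frequency
midband = (C1 + C2 + C5) / C1          # expected midband gain

print(f"high-pass cut-off: {f_hp:.3f} Hz")
print(f"|H| at DC:    {abs(fdda_ia_gain(0.0)):.2f}")
print(f"|H| at f_hp:  {abs(fdda_ia_gain(f_hp)):.2f}")
print(f"|H| at 1 kHz: {abs(fdda_ia_gain(1e3)):.2f} (expected about {midband:.0f})")
```

At the corner frequency the gain is close to the midband gain divided by √2, as expected for a first-order high-pass response. The low-pass roll-off set by the FDDA bandwidth is not modeled.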

### 3.1.2 Back-to-back Connected Pseudo Resistor

As discussed in Section 2.1.3, different diode connected pseudo resistors have different properties, and the resistance also depends on the fabrication process. The extremely large resistance cannot be calculated precisely, but its order of magnitude can be approximated via simulation. The simulated behaviours of the diode connected pseudo resistors in Fig. 2-1-7(a) and Fig. 2-1-7(b) are given in Fig. 3-1-2 and Fig. 3-1-3, respectively. For both configurations, the W/L is set to 0.7μm/1μm.

Fig. 3-1-2. Simulated resistance for diode connected PR in Fig. 2-1-7(a).

Fig. 3-1-3. Simulated resistance for diode connected PR in Fig. 2-1-7(b).

The simulation resolutions for absolute current and $g_m$ are set to $10^{-16}$ A and $10^{-16}$ S, respectively, to handle the TΩ-level resistance evaluation. In Fig. 3-1-2, the resistance stays above 10 TΩ and is proportional to the voltage across the resistor. Because the output swing of the PGA is large, this resistance above 10 TΩ is suitable for suppressing the output nonlinearity variation, so the configuration in Fig. 2-1-7(a) is chosen for the PGA. In Fig. 3-1-3 the situation is different: the resistance drops as the voltage across the resistor increases. When the voltage reaches 1V, the resistance is about 770MΩ, whereas for voltages of several tens of millivolts the resistance stays around 11TΩ. This is a useful feature in the IA. Normally, the output of the IA is several tens of millivolts, i.e., once stable feedback is established in the IA, the voltage variation across the feedback pseudo resistor is no more than tens of millivolts. When motion artefacts are encountered, the voltage across the pseudo resistor becomes large, and the feedback should recover quickly. The reduced resistance of Fig. 2-1-7(b) helps here because the low-resistance feedback path carries a strong current. Thus, the configuration in Fig. 2-1-7(b) is selected for the IA stage.

It should be noted that the simulated TΩ-level resistance is not realistic: the measured resistance is tens of GΩ. The discrepancy is due to the limited accuracy of the simulation model. Nevertheless, the simulation result is helpful for analysis since the simulated variation of the resistance with the voltage across the resistor coincides with the measurement results.

3.1.3 On-body DC Biasing

Employing large input transistors increases the leakage current, making the leakage resistance comparable with the pseudo resistor. In this case, an inaccurate pseudo resistor bias, such as the bias via $R_2$ in Fig. 2-1-11, results in an uncontrollable input offset. We introduce an extra DC electrode, as shown in Fig. 3-1-4, to deal with the DC offset. The DC electrode is placed away from the two input electrodes to provide the DC bias $V_{cm}$ for the bioelectrical signal.

Fig. 3-1-4. On-body DC bias for EEG and ECG monitoring.

Two examples of electrode placement are illustrated in Fig. 3-1-4, one for EEG and one for ECG monitoring. For ECG, as the DC-coupled input impedance is high enough, the two input electrodes placed on the chest can be as close as 2cm. The $V_{cm}$ electrode is placed on the left upper arm to provide the DC bias voltage via a path consisting of the body tissue ($R_{\text{body}}$) and the input electrodes (10nF in parallel with 1MΩ for a dry electrode [33]). The equivalent resistance, $R_{\text{body}}$ in series with 1MΩ, is significantly smaller than the pseudo resistor and the equivalent input resistance of the gate leakage, so the resulting DC offset is negligible. The position of the $V_{cm}$ electrode is not limited to the left upper arm. It can be placed at any point that is at least 5cm away from the input electrodes, e.g., the point used for a driven right leg (DRL) electrode. The distance between the $V_{cm}$ electrode and the input electrodes was found from our experiments: a 5cm distance guarantees the acquired signal quality, whereas at smaller distances the signal amplitude may be attenuated because the DC biasing electrode absorbs part of the signal. For EEG acquisition, the electrode connected to the non-inverting terminal $V_{\text{inp}}$ is placed on the forehead, and the other input electrode is placed on an earlobe. As the amplitude of the EEG signal is much smaller than that of the ECG, the two input electrodes cannot be placed as close together as for the ECG. The $V_{\text{cm}}$ electrode is attached to the other earlobe for biasing.

### 3.1.4 Design Considerations of Proposed FDDA

Besides the large-size input transistors (MN$_7$-MN$_{10}$), two more modifications are made to the conventional FDDA for better power efficiency and circuit stability, as shown by the red dashed boxes in Fig. 3-1-5. Firstly, the CMFB module that supplies the bias of $MP_1$ and $MP_2$ is replaced by the current mirror load connection, as denoted by the top box. If a CMFB is used, the loop runs from VP and VN to the gate of $MP_1$ and $MP_2$, and contains two main poles. One is at the node VP (or VN) and the other is at the gate of $MP_1$ and $MP_2$. Assuming it employs the same CMFB structure as the output stage, shown on the right side of the red dotted line in Fig. 3-1-5, the two poles can be approximated by

$$p_1 \propto \frac{1}{(r_{o,n17}|r_{o,p1})[(1+A_0)c_M+c_{gd,n}\|c_{db,n}\|c_{gd,p1}+c_{db,p1}]}$$

(3-1-9)

$$p_2 \propto \frac{1}{(r_{o,n12}|r_{o,p6})[(1+1/A_0)c_M+2(c_{gd,p1}+c_{db,p1})]}$$

(3-1-10)

where $A_0$ is the gain of the output stage. Due to the Miller effect and the large-size input transistors, the capacitance in (3-1-9) is much larger than that in (3-1-10), but the resistance in (3-1-10) is over ten times that in (3-1-9) because the bias current of the CMFB (10nA) is much smaller than that of the first stage of the core circuit (150nA). That is, the two poles are very close to each other and appear at relatively low frequency. Thus, the phase margin quickly degrades with the decrease of the loop gain. With one feedback amplifier, it is hard to maintain sufficient loop phase. An extra isolated feedback amplifier is required to achieve the required loop phase and gain [170], which needs extra current. Using the current mirror connection, the CMFB loop stability issue is mitigated. Moreover, this configuration wastes no power on the CMFB.

Fig. 3-1-5. Schematic of proposed FDDA.

The proposed asymmetrical connection deteriorates the CMRR of the FDDA. Even with extremely large input transistors and a well-matched layout, the CMRR does not reach that of the conventional FDDA topology, which is higher than 80dB [95]-[97]. Nevertheless, it is still better than that of the conventional three-amplifier DC-coupled topology, and meets the requirements of the ECG standard.

The insertion of MN₃ and MN₄ at the output stage stabilizes the biasing current. Without MN₃ and MN₄, the current in the conventional FDDA output stage is determined by the feedback voltage $V_{FB}$ generated at the output of the CMFB, which is the classic one-stage amplifier (the five-transistor cell on the right side). This bias is sensitive to process variation. With MN₃ and MN₄, the bias current depends on the fixed $V_B$ rather than on the unpredictable $V_{FB}$. In this way, $V_{FB}$ only regulates the output common-mode voltage to $V_{CM}$ and has no effect on the bias current.

3.1.5 Noise Analysis

The input referred noise of the FDDA can be written as

\[
\begin{aligned}
n_{in,eq}^2 &= n_{out}^2 / A_v^2 \\
&= \frac{A_v^2 n_{in,n7}^2 + A_v^2 n_{in,n9}^2 + \left( g_{m,p3} r_{out} \right)^2 n_{in,p3}^2 + \left( g_{m,n3} r_{out} \right)^2 n_{in,n3}^2}{\left[ g_{m,n7} \left( r_{o,n7} \| r_{o,p1} \right) g_{m,p3} r_{out} \right]^2} + \left( g_{m,p1} / g_{m,n7} \right)^2 n_{in,p1}^2 \\
&= 2 n_{in,n7}^2 + n_{in,p3}^2 / \left( g_{m,n7} r_{o1} \right)^2 + \left( g_{m,p1} / g_{m,n7} \right)^2 n_{in,p1}^2 + \left( g_{m,n3} / \left( g_{m,p3} g_{m,n7} r_{o1} \right) \right)^2 n_{in,n3}^2 \\
&\approx 2 n_{in,n7}^2 + \left( g_{m,p1} / g_{m,n7} \right)^2 n_{in,p1}^2, \quad (3-1-11)
\end{aligned}
\]

where \( r_{out} \) is the output equivalent resistance of the output stage, and \( r_{o1} \) is the output equivalent resistance of the first stage. In (3-1-11), we assume MN₉ is the same as MN₇. Note that the 2nd and 4th terms are attenuated by \( \left( g_{m,n7} r_{o1} \right)^2 \), which leads to the final approximation. When a transistor operates in the subthreshold region, the relation between \( I_D \) and \( V_{GS} \) is given by [171]

\[ I_D \approx I_{D0} \frac{W}{L} e^{V_{GS} / (n V_t)}. \quad (24) \]

where \( V_t \) is the thermal voltage, \( I_{D0} \) is the characteristic current, and \( n \) is the slope factor; both \( I_{D0} \) and \( n \) are process dependent. The transconductance of a transistor in the subthreshold region can then be expressed as

\[ g_m = \frac{dI_D}{dV_{GS}} = \frac{1}{nV_t} I_{D0} \frac{W}{L} e^{V_{GS}/(nV_t)} = \frac{I_D}{nV_t}. \]

(3-1-12)

Considering both thermal and flicker noises, the equivalent voltage noise source seen from the transistor gate is defined by

\[ n_{eq}^2 = \left( \frac{8kT(1+\eta)}{3g_m} + \frac{B}{2fWL} \right) \Delta f, \]  

(3-1-13)

where \( B \) is flicker noise constant and purely decided by process, \( \eta \) is determined by the body coefficient, the surface potential, and the source-body potential.

If the design is ideal, the current in MP1 is twice that in MN7. Assuming that \( \eta \) and \( g_m \) are independent of the transistor size, from (3-1-11), (3-1-12), and (3-1-13), the equivalent input referred noise is obtained as

\[ n_{eq}^2 = \left( \frac{32nkT(1+\eta)V_t}{3I_{D,n7}} + \frac{B}{fW_{n7}l_{n7}} + \frac{2B}{fW_{p1}l_{p1}} \right) \Delta f, \]  

(3-1-14)
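
The step from (3-1-11) to (3-1-14) can be sketched as follows. This is only an outline under an additional simplification that is not stated explicitly above: MN7 and MP1 are assumed to share the same slope factor \( n \), the same \( \eta \) and the same \( B \), so that \( g_{m,p1} = 2 g_{m,n7} \) follows from the 2:1 current ratio via (3-1-12).

\[
\begin{aligned}
n_{eq}^2 &\approx 2 n_{in,n7}^2 + \left( \frac{g_{m,p1}}{g_{m,n7}} \right)^2 n_{in,p1}^2 = 2 n_{in,n7}^2 + 4 n_{in,p1}^2 \\
&= \left[ 2 \left( \frac{8kT(1+\eta)}{3 g_{m,n7}} + \frac{B}{2 f W_{n7} l_{n7}} \right) + 4 \left( \frac{8kT(1+\eta)}{3 \cdot 2 g_{m,n7}} + \frac{B}{2 f W_{p1} l_{p1}} \right) \right] \Delta f \\
&= \left( \frac{32kT(1+\eta)}{3 g_{m,n7}} + \frac{B}{f W_{n7} l_{n7}} + \frac{2B}{f W_{p1} l_{p1}} \right) \Delta f
\end{aligned}
\]

Substituting \( g_{m,n7} = I_{D,n7}/(nV_t) \) from (3-1-12) into the first term gives the thermal term \( 32nkT(1+\eta)V_t/(3I_{D,n7}) \) of (3-1-14).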

To achieve good noise performance, the bias current and the transistor sizes of MN7-MN10, MP1, and MP2 should be large enough. The simulated integrated input referred noise at different currents and transistor sizes is provided in Fig. 3-1-6. The noise drops quickly as the transistor area increases up to 10000μm², and stays steady or even rises slightly beyond that point, where the flicker noise is much lower than the thermal noise. Thus, in this design, the size of MN7-MN10 is 5mm/2μm, and the size of MP1 and MP2 is 800μm/10μm. Furthermore, the noise level drops as the bias current increases. However, in ECG and EEG applications, a low bias current is preferred to achieve the narrow low-pass cut-off frequency.

Fig. 3-1-6. Simulated integrated input referred noise at different transistor size and bias current.

The load of IA is composed of feedback ratio capacitor, $C_1$ (as in Fig. 3-1-1), the Miller capacitor, $C_M$, and the input ratio capacitor in PGA, $C_4$. If the gain of IA is $G_1$, then the low-pass cut-off frequency of the IA can be derived as

$$f_{LP} = \frac{g_{m,n7}}{G_1[(1 + 1/A_o)C_M + C_1 + C_4]}.$$  \hspace{1cm} (3-1-15)

As $g_{m,n7}$ is proportional to the bias current, the capacitor area needed to achieve an $f_{LP}$ around 100Hz becomes extremely large if the current is set to hundreds of nano-amperes. Also, a higher current is not desirable for wearable sensors. Thus, the bias current for the non-inverting branch and the inverting branch is kept at 150nA.

In summary, the large-size input transistors and load transistors improve the CMRR, gain consistency, and noise. As the parasitic capacitance of the input transistors is reused for the gain ratio capacitors, as discussed earlier, increasing the transistor size helps to reduce the ratio capacitor in the IA. In other words, the large-size transistors do not pose a threat to the area budget; rather, they help to improve performance. The increased leakage current due to the larger transistor size degrades the input impedance, so pseudo resistor biasing is not suitable. The on-body DC biasing is thus proposed to mitigate the resulting DC offset.

Fig. 3-1-7. Overview of 417μm×263μm input transistor layout.

3.1.6 Layout of Large Size Input Transistor

An overview of the large-size input transistors MN7-MN10 is given in Fig. 3-1-7. The area is 417μm by 263μm. To make a regular and compact layout, 10 units of centrosymmetric transistors are placed vertically on the left side and 16 units are placed on the right side. As each unit contains 80 small transistors, the orientation of the units has little effect on mismatch. The details of a unit are shown in Fig. 3-1-8, where all transistors are placed centrosymmetrically with equal-length metal connections.

Fig. 3-1-8. Symmetrically placed transistor unit.

### 3.1.7 PGA

The PGA has 4 gain settings controlled by two digital control bits, as shown in Fig. 3-1-9. The default gain is the maximum value, which is determined by $C_4/C_3$. The tunable capacitor can be set to $C_3$, $2C_3$, or $3C_3$, i.e., when switch S is closed, the gain can be set to $1/2$, $1/3$, or $1/4$ of the default gain. Different gain settings present different loads to the amplifier OP2. Thus, to avoid the bandwidth changing when the gain is tuned, the minimum low-pass cut-off frequency of the PGA stage, i.e., the one at the minimum gain setting, should be higher than that of the IA stage.

As the AFE is the cascade of the IA and the PGA, the gain of the IA stage is generally set higher than the gain of the PGA to relax the design of OP2. In this design, the gain of the IA and the maximum gain of the PGA are 75 and 36, respectively. As part of the gain of the IA is contributed by the parasitic capacitance of the input transistors, the gain of the IA stage is estimated by simulation. Although the absolute gain cannot be precisely calculated at the design stage, the gain variation is smaller than that obtained with pure MIM capacitors, as discussed before. With this gain distribution, the classic differential-input, single-ended-output two-stage amplifier with Miller compensation is adopted for OP2, as shown on the right side of Fig. 3-1-9.
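
As a rough illustration of the four gain settings, the sketch below computes the ideal overall AFE gain from the stated IA gain of 75 and maximum PGA gain of 36. It assumes ideal capacitor ratios and ignores parasitics and loading, so the results are only nominal values; the function name `afe_gain_db` and the parameter `extra_units` are hypothetical.

```python
import math

IA_GAIN = 75        # IA gain stated in the text
PGA_MAX_GAIN = 36   # maximum PGA gain (C4/C3) stated in the text


def afe_gain_db(extra_units):
    """Ideal overall gain in dB.

    extra_units is the number of C3-sized units the tunable capacitor adds
    (0 = switch open, 1..3 = C3, 2C3, 3C3), which divides the PGA gain
    by (1 + extra_units). This ideal-ratio model is an assumption.
    """
    pga_gain = PGA_MAX_GAIN / (1 + extra_units)
    return 20 * math.log10(IA_GAIN * pga_gain)


gains_db = [round(afe_gain_db(k), 1) for k in range(4)]
print(gains_db)
```

Under these assumptions the nominal settings are about 68.6dB, 62.6dB, 59.1dB and 56.6dB; the simulated and measured values depend on the parasitic capacitance and will differ somewhat.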

To drive an on-chip module, generally an ADC with a capacitive input of several pico-farads, a 20nA bias is sufficient in the output stage of the PGA. In this design, for testing purposes, an extra 100nA is added to drive the oscilloscope probe, which is 10MΩ || 4pF, to avoid the use of an extra buffer.

Fig. 3-1-9. Topology of PGA.

3.2 Simulation Results

The transient, frequency, noise and current simulation results are presented in this section. All simulations are based on post-layout schematics with extracted parasitic RC. The transient responses of the four gain settings of the proposed AFE are given in Fig. 3-2-1 with a 100μV peak-to-peak 40Hz input signal. A clean sinusoidal signal with the expected gain can be observed at the output.

To evaluate the current, the transient responses of 8 duplicated AFEs at three gain settings are simulated, as shown in Fig. 3-2-2. The average current for all gain settings is 3.94μA, i.e. the current of a single AFE is 492.5nA. The current variation is within 1μA, i.e. 125nA for a single AFE.

Fig. 3-2-1. AFE transient simulation at different gain.

Fig. 3-2-2. Transient current with 8 duplicated AFE at different gain.

The simulated frequency responses of the AFE at the four gain settings are shown in Fig. 3-2-3. The gain is given both as a linear value and in dB. The mid-band gain can be programmed to 57dB, 59dB, 63dB, and 65dB, which coincides with the transient responses in Fig. 3-2-1.

Fig. 3-2-3. Simulated frequency responses of the proposed AFE.

A Monte Carlo simulation with 100 samples is given in Fig. 3-2-4. Both process variation and layout mismatch are considered in the simulation. The minimum gain is 55.5dB and the maximum gain is 57dB, i.e. a variation of 2.6%, which meets the requirement of the IEC standard [40].

The simulated integrated noise from 0.1 to 250Hz is 2.21μVrms. Due to the flicker noise and the band-pass selectivity of the AFE, the highest noise appears around 0.1Hz, where it is 11.5μV². As the frequency increases, the noise drops quickly, e.g., the noise at 100Hz is just 75.8nV², as shown in Fig. 3-2-5.

Fig. 3-2-4. Monte Carlo simulation of frequency response at minimum gain.

Fig. 3-2-5. Noise simulation of the AFE.
3.3 Measurement Results

The design is fabricated in a 0.35µm CMOS process with a core area of 450µm×900µm. The chip micrograph is shown in Fig. 3-3-1. The current consumed by the entire AFE is 499nA, of which 350nA is taken by the IA, and 120nA is used by the output stage of the PGA to drive the oscilloscope probe with its 10MΩ and 4pF load. Except for the maximum gain, the other three gain settings are verified by measuring the frequency response, CMRR, and gain variation, as shown in Fig. 3-3-2, Fig. 3-3-3, and Table 3-3-1, respectively. The different gain settings have the same bandwidth from 0.4Hz to 120Hz and a CMRR of 76dB. The gain variation measured on 9 chips is within 3%, which is slightly higher than the simulation result in Fig. 3-2-4 but still much smaller than the 10% requirement in the IEC standard. The input referred noise is $1.02\mu V_{\text{rms}}$ integrated over the pass-band, as shown in Fig. 3-3-4. As the noise is integrated from 0.4 to 120Hz in the measurement, the result is close to but smaller than the simulated one, which is integrated from 0.1 to 250Hz. Two distortion peaks occur at 50Hz and 150Hz due to power line interference, as shown in Fig. 3-3-5. The total harmonic distortion (THD) is smaller than 1% excluding the components caused by the power line interference.

Fig. 3-3-1. AFE chip micrograph.

Fig. 3-3-2. Measured AFE frequency response at three different gain settings.

Fig. 3-3-3. Tested AFE CMRR at three different gain settings.
Table 3-3-1. Measured gain for the 9 AFE chips.

<table>
<thead>
<tr>
<th>Chip</th>
<th>1</th>
<th>2</th>
<th>3</th>
<th>4</th>
<th>5</th>
<th>6</th>
<th>7</th>
<th>8</th>
<th>9</th>
</tr>
</thead>
<tbody>
<tr>
<td>Gain (dB)</td>
<td>55.6</td>
<td>56.0</td>
<td>56.8</td>
<td>57.0</td>
<td>56.4</td>
<td>56.0</td>
<td>56.3</td>
<td>57.0</td>
<td>56.1</td>
</tr>
<tr>
<td></td>
<td>58.1</td>
<td>58.8</td>
<td>58.5</td>
<td>58.6</td>
<td>58.1</td>
<td>58.6</td>
<td>59.1</td>
<td>58.7</td>
<td>58.6</td>
</tr>
<tr>
<td></td>
<td>61.8</td>
<td>62.1</td>
<td>63.2</td>
<td>62.3</td>
<td>61.8</td>
<td>62.1</td>
<td>62.4</td>
<td>63.4</td>
<td>62.2</td>
</tr>
</tbody>
</table>
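
The "within 3%" figure can be cross-checked against Table 3-3-1 with a short script. The definition of variation used here, (max - min) / mean of the gain expressed in dB, is an assumption chosen because it is consistent with the 2.6% quoted for the 55.5dB to 57dB Monte Carlo range; the names `MEASURED_GAIN_DB` and `spread_percent` are hypothetical.

```python
# Rows of Table 3-3-1: one list per gain setting, nine chips each (dB).
MEASURED_GAIN_DB = [
    [55.6, 56.0, 56.8, 57.0, 56.4, 56.0, 56.3, 57.0, 56.1],
    [58.1, 58.8, 58.5, 58.6, 58.1, 58.6, 59.1, 58.7, 58.6],
    [61.8, 62.1, 63.2, 62.3, 61.8, 62.1, 62.4, 63.4, 62.2],
]


def spread_percent(gains_db):
    """Peak-to-peak spread relative to the mean, in percent (assumed metric)."""
    mean = sum(gains_db) / len(gains_db)
    return 100.0 * (max(gains_db) - min(gains_db)) / mean


variations = [spread_percent(row) for row in MEASURED_GAIN_DB]
for row_index, value in enumerate(variations, start=1):
    print(f"gain setting {row_index}: {value:.2f}%")
```

With this metric, all three settings stay below 3%.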

Fig. 3-3-4. Measured AFE input referred noise.

Fig. 3-3-5. Analysis of harmonic distortion of the Proposed AFE.
Noise efficiency factor (NEF) was first introduced by [171] to quantify the trade-off between the noise and power, which is defined by

\[
NEF = v_{\text{ni, rms}} \sqrt{\frac{2I_{\text{tot}}}{\pi \cdot U_T \cdot 4kT \cdot BW}}
\]

(3-3-1)

where \(v_{\text{ni, rms}}\) is the input referred RMS noise voltage; \(I_{\text{tot}}\) is the total current supplied to the OTA; \(U_T\) is the thermal voltage \(kT/q\) and \(BW\) is the OTA bandwidth in Hertz. The NEF of the proposed FDDA DC-coupled IA is 1.98, and 2.55 for entire AFE counting the current to drive oscilloscope probe.
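
As a plausibility check of the AFE figure, the sketch below evaluates the standard NEF expression with the measured values quoted in this section (1.02μVrms, 499nA, 0.4Hz to 120Hz). The temperature of 300K and the use of the pass-band width as \(BW\) are assumptions, and the function name is hypothetical.

```python
import math

K_BOLTZMANN = 1.380649e-23      # J/K
Q_ELECTRON = 1.602176634e-19    # C


def noise_efficiency_factor(v_ni_rms, i_tot, bandwidth_hz, temperature_k=300.0):
    """NEF = v_ni_rms * sqrt(2*I_tot / (pi * U_T * 4kT * BW))."""
    u_t = K_BOLTZMANN * temperature_k / Q_ELECTRON  # thermal voltage kT/q
    four_kt = 4.0 * K_BOLTZMANN * temperature_k
    return v_ni_rms * math.sqrt(
        2.0 * i_tot / (math.pi * u_t * four_kt * bandwidth_hz)
    )


# Entire AFE: 1.02 uVrms, 499 nA, pass-band 0.4 Hz to 120 Hz (assumed BW).
nef_afe = noise_efficiency_factor(1.02e-6, 499e-9, 120.0 - 0.4)
print(f"NEF (entire AFE) = {nef_afe:.2f}")
```

Under these assumptions the result is close to the reported 2.55; small differences come from the assumed temperature and bandwidth definition.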

Table 3-3-2. AFE performance comparison with other related works.

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Current</td>
<td>1.19µA</td>
<td>0.361µA</td>
<td>5.3µA</td>
<td>31nA</td>
<td>160nA</td>
<td>8.25µA</td>
<td>18µA</td>
<td>50µA</td>
<td>499nA</td>
</tr>
<tr>
<td>VDD</td>
<td>1.2V</td>
<td>0.8V</td>
<td>2V</td>
<td>0.6V</td>
<td>2V</td>
<td>1.2V</td>
<td>5V</td>
<td>1.8V</td>
<td>1.8V</td>
</tr>
<tr>
<td>Gain</td>
<td>38-55dB</td>
<td>26-52dB</td>
<td>50-62dB</td>
<td>51-96dB</td>
<td>40dB</td>
<td>52dB</td>
<td>0-20dB</td>
<td>9.5-40dB</td>
<td>68dB</td>
</tr>
<tr>
<td>Bandwidth</td>
<td>0.5-150Hz</td>
<td>1-400Hz</td>
<td>0-170Hz</td>
<td>0.1-250Hz</td>
<td>0.2-200Hz</td>
<td>1Hz-6.5KHz</td>
<td>0.26-100Hz</td>
<td>N/A</td>
<td>0.4-120Hz</td>
</tr>
<tr>
<td>Input Noise</td>
<td>3.06µV</td>
<td>8.26µV</td>
<td>1.7µV</td>
<td>6.52µV</td>
<td>2.05µV</td>
<td>5µV</td>
<td>3.7µV</td>
<td>0.8µV</td>
<td>1.02µV</td>
</tr>
<tr>
<td>CMRR</td>
<td>64.9dB</td>
<td>66dB</td>
<td>105dB</td>
<td>55dB</td>
<td>65dB</td>
<td>65dB</td>
<td>70dB</td>
<td>82dB</td>
<td>76dB</td>
</tr>
<tr>
<td>THD</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
<td>2.87%</td>
<td>1%</td>
<td>0.95%</td>
<td>N/A</td>
<td>N/A</td>
<td>1%</td>
</tr>
<tr>
<td>Area (mm²)</td>
<td>0.35*</td>
<td>0.86*</td>
<td>5.2*</td>
<td>1.1*</td>
<td>0.18</td>
<td>0.018</td>
<td>1.23</td>
<td>6.48</td>
<td>0.405</td>
</tr>
<tr>
<td>Input Impedance</td>
<td>3.6GΩ</td>
<td>200Ω</td>
<td>N/A</td>
<td>110MΩ</td>
<td>20MΩ</td>
<td>N/A</td>
<td>400Ω</td>
<td>2Ω</td>
<td>1GΩ</td>
</tr>
<tr>
<td>Process</td>
<td>0.13µm</td>
<td>0.18µm</td>
<td>0.5µm</td>
<td>65nm</td>
<td>0.35µm</td>
<td>0.13µm</td>
<td>0.18µm</td>
<td>0.18µm</td>
<td>0.35µm</td>
</tr>
<tr>
<td>NEF</td>
<td>10.6</td>
<td>8.43</td>
<td>11.7</td>
<td>2.64</td>
<td>2.26</td>
<td>7</td>
<td>N/A</td>
<td>12.3</td>
<td>2.55, 1.98</td>
</tr>
</tbody>
</table>

* The area is approximated from the chip micrograph

Compared with the state-of-the-art works in Table 3-3-2, this design presents the best NEF, which is 1.98, and the best CMRR of 76dB among the designs without chopper stabilization (the two designs from JSSC and TBioCAS in 2011 utilize chopper stabilization while all the others do not). Thanks to the DC-coupling, an input impedance of 1GΩ at DC is achieved (the JSSC 2016 design shows 3.6GΩ using DC-coupling but has higher power, larger noise and smaller CMRR, which are respectively 1.19μA×1.2V, 3.06μV and 65dB). The chip area is as small as 0.405mm², benefiting from the reuse of the parasitic capacitance, even with almost the highest gain setting.

To demonstrate the capability of the chip in acquiring ECG and EEG signals, we conducted measurements on a human subject using the electrode configuration shown in Fig. 3-1-4. The input electrodes are placed just 2cm apart for ECG recording. The top and bottom ECG waveforms in Fig. 3-3-6 are obtained under resting and walking conditions, respectively. The motion artefact during walking slightly distorts the ECG baseline but does not saturate the AFE output. For the EEG measurement, a one-second EEG signal after an eye blink and its FFT are presented in Fig. 3-3-7. The signal is filtered by a notch filter cascaded with a low-pass filter with a cut-off frequency of 30Hz to remove power line interference. Fig. 3-3-7 shows an oscillation with an energy peak at 12Hz. This coincides with the well-known phenomenon that a dominant α-wave appears after an eye blink.

Fig. 3-3-6. Monitored ECG signal using the proposed AFE.

To check the baseline drift, a 2-hour ECG recording is conducted. A 2-minute part of the recording is shown in Fig. 3-3-8. The baseline is steady.

Fig. 3-3-8. Results of 2-hour ECG recording.
3.4 Conclusion

An FDDA-based DC-coupled AFE for flexible biomedical sensors is presented in this chapter. Using an FDDA with improved power efficiency and stability, the AFE achieves significant improvements in input impedance, CMRR and NEF, i.e., $1\text{G}\Omega$ at DC, $76\text{dB}$ CMRR, and an NEF of 2.55. Utilizing the parasitic capacitance reuse technique, this implementation shows great noise/area efficiency, i.e. $0.405\text{mm}^2$ in a $0.35\mu\text{m}$ process. With the high input impedance, the chip is able not only to pick up a clear ECG signal with an electrode separation of 2cm under both resting and walking conditions, but also to acquire the dominant $\alpha$-wave after an eye blink in EEG testing.
Chapter 4
Personalized QRS Detection Based on One Target Clustering and Correlation Coefficient

ECG gives direct and robust evidence for the diagnosis of cardiovascular diseases such as arrhythmia, ischemia and ventricular hypertrophy. The diagnosis is derived from the study of heart rate features, like heart rate variability (HRV) via R-R interval information, and ECG morphology characteristics, like ECG pattern classification. The accuracy of the QRS detection algorithm in a system determines the reliability of the HRV analysis. In the cardiac arrhythmia classification process, the R peak position from the QRS detection is a prime input parameter. Thus, in an ECG processor, the QRS detection algorithm is the foundation of further analysis.

QRS detection algorithms have been studied for a long time. It is worth noting that existing algorithms try to use predefined parameters or constraints for all types of patients. That is, they must be robust enough to handle special cases, which leads to complicated models with a large number of parameters.

A computationally efficient personalized QRS detection algorithm for flexible ECG sensors is presented in this chapter. The proposed user-specific QRS template avoids the complicated models and parameters used in existing algorithms while covering most situations in practical applications. The detection is based on the correlation coefficient between the user-specific template extracted from the individual user and the input ECG signal segment under detection. To reduce the computation, a novel one-target clustering is proposed, which reduces the required loops from \((K+1)K/2\) to 1 compared to a K-target clustering. Meanwhile, the detection judgement is triggered by possible peaks to avoid a continuous-time correlation calculation, and can be implemented as a one-point FIR filter operation followed by a division by the standard deviation of the input ECG signal segment. It achieves an average positive prediction rate of 99.39% and a sensitivity rate of 99.21%, evaluated on the 48 tapes of the MIT-BIH arrhythmia database.

This chapter is organized as follows. Section 4.1 presents the details of the proposed algorithm, including one target clustering (OTC) and the detection based on Pearson Correlation Coefficients (PCC). Section 4.2 shows the adaptive thresholding for accuracy enhancement. Section 4.3 provides experiment results. Concluding remarks are drawn in Section 4.4.

### 4.1 User Adaptive Detection

The diagram of proposed algorithm is shown in Fig. 4-1-1. It is divided into two stages: template extraction stage and the detection stage (shown by the dotted and solid arrows respectively).

Fig. 4-1-1. Diagram of proposed user adaptive QRS detection algorithm.

At the beginning, the patient’s ECG signal (the ECG clip marked in Fig. 4-1-1), containing several or tens of QRS complexes, is sent to the windowed peak detector (WPD). During this period the patient is asked to sit still to avoid motion artefacts. Potential R peaks in the ECG clip are first identified by the WPD, and the ECG segments centered at the potential R peaks are then sent to the customized OTC, which employs PCC to calculate the similarity between each pair of segments to obtain the user-specific QRS template. Since the segments sent for clustering are QRS candidates selected by the WPD, most of them contain only an R peak, so the number of segments sent to the OTC is effectively reduced, which improves the computing efficiency. Meanwhile, the simple OTC optimized for the QRS segments further contributes to the lightweight computation. The WPD and OTC are one-time processing, i.e., they are only activated at the beginning to generate the template.

Once the algorithm switches to the detection stage, the peak detector (PD) and PCC are activated. The PD is always on and detects any local peak by examining each input point. Whenever a peak is detected, the stand-by PCC compares the segment centered at the detected peak with the stored template. If the similarity is high enough, this segment is taken as a QRS and the peak is labeled as an R peak; otherwise, the PCC discards this segment and waits for the next comparison. That is, in practical detection, the PCC is idle most of the time and only the simple PD requires real-time computing.
4.1.1 Peak Detection

The data is firstly processed by the Savitzky–Golay (SG) filter with a span of 15 and degree of 6 and then smoothed by a moving sum filter with length of 26, the same pre-processing and peak detection method proposed in [124]. The peak detection is given by (4-1-1) and (4-1-2).

\[
\bigwedge_{j=0}^{2} S(i - j) - S(i - j - 1) > 0 \quad (4-1-1)
\]

\[
\bigwedge_{j=0}^{2} S(i + j + 1) - S(i + j) < 0 \quad (4-1-2)
\]

The continuously rising edge is defined by (4-1-1): the signal amplitudes at the three points after \(i-3\) satisfy \(S(i) > S(i-1) > S(i-2) > S(i-3)\). The continuously falling edge is defined similarly by (4-1-2). If \(S(i)\) is the end of a continuously rising edge and the start of a continuously falling edge, then \(S(i)\) is taken as a valid peak.

Clearly, all peaks, whether P, R, T or even peaks caused by noise, are detected if (4-1-1) and (4-1-2) are applied alone. Thus, the windowed local maximum given by (4-1-3) is applied to the peak detection to remove unwanted P, T and other small peaks, which simplifies the clustering.

\[
S(p_m) = \max\{S(p_l), \ldots, S(p_m), \ldots, S(p_r)\} \quad (4-1-3)
\]

In (4-1-3), the window is centered at \(S(p_m)\), and \(S(p_l)\) and \(S(p_r)\) are the left and right boundaries. If \(S(p_m)\) is the highest among all the peaks in the window, then it is a local maximum peak. The window length is chosen between the longest QRS length and the shortest R-R interval. It is longer than the QRS to contain enough waveform information, and shorter than the R-R interval to avoid multiple R peaks in one segment, as multiple R peaks may lead to a multiple-target issue in clustering. As the normal QRS length is from 80ms to 100ms and the normal R-R interval is from 600ms to 1.5s, the window is set to 400ms to cover the entire or most of the QRS complex and any heart rate under 150bpm.
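
The peak conditions (4-1-1) and (4-1-2) and the windowed local maximum (4-1-3) can be sketched as below. This is a minimal illustration, not the implementation used in this work: it operates on an already pre-processed signal (the SG and moving sum filters are omitted), the function names are hypothetical, and the default window of 144 samples assumes the 400ms window at a 360Hz sampling rate.

```python
def is_peak(s, i, edge=3):
    """True if s[i] ends a strictly rising run and starts a strictly falling run.

    Implements (4-1-1) and (4-1-2) with three consecutive differences on
    each side (edge=3), i.e. S(i) > S(i-1) > S(i-2) > S(i-3) and likewise
    for the falling side.
    """
    if i - edge < 0 or i + edge >= len(s):
        return False  # not enough samples on one side
    rising = all(s[i - j] - s[i - j - 1] > 0 for j in range(edge))
    falling = all(s[i + j + 1] - s[i + j] < 0 for j in range(edge))
    return rising and falling


def local_peaks(s):
    """Indices of all samples satisfying both edge conditions."""
    return [i for i in range(len(s)) if is_peak(s, i)]


def windowed_local_maxima(s, window=144):
    """Keep a peak only if it is the highest peak inside its centered window.

    Sketch of (4-1-3); window is in samples (144 = 400 ms at 360 Hz, assumed).
    """
    peaks = local_peaks(s)
    half = window // 2
    return [
        p for p in peaks
        if all(s[p] >= s[q] for q in peaks if abs(q - p) <= half)
    ]
```

For example, a small bump followed closely by a larger one yields two entries from `local_peaks`, while `windowed_local_maxima` keeps only the larger one as long as both fall inside the same window.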

4.1.2 One Target Clustering

In conventional clustering, the clustering loop searches for the balanced state in which each group remains the same as in the last loop. The nested loops lead to heavy computation. What is worse, an initial kernel has to be determined for each group.

Table 4-1-1. Pseudo code of proposed one target clustering.

<table>
<thead>
<tr>
<th>Pseudo Code of OTC</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Step 1</strong>: Initialize $T$, $GT$, $G$, $r$, $K$, $G_{1...K}$</td>
</tr>
<tr>
<td><strong>Step 2</strong>: for $i$ from 1 to $K$</td>
</tr>
<tr>
<td>for $j$ from $i+1$ to $K$</td>
</tr>
<tr>
<td>if $\text{cov}(S_i, S_j) &gt; r$</td>
</tr>
<tr>
<td>add $S_j$ to $G_i$, and $S_i$ to $G_j$</td>
</tr>
<tr>
<td>end if</td>
</tr>
<tr>
<td>end for</td>
</tr>
<tr>
<td>end for</td>
</tr>
<tr>
<td><strong>Step 3</strong>: $G =$ the group with max member in $G_{1...K}$</td>
</tr>
<tr>
<td><strong>Step 4</strong>: update $GT$ with $G$, get $T$ by averaging $GT$, empty $G$</td>
</tr>
<tr>
<td><strong>Step 5</strong>: for $i$ from 1 to $K$</td>
</tr>
<tr>
<td>if $\text{cov}(S_i, T) &gt; r$</td>
</tr>
<tr>
<td>add $S_i$ to $G$</td>
</tr>
<tr>
<td>end if</td>
</tr>
<tr>
<td>end for</td>
</tr>
<tr>
<td><strong>Step 6</strong>: $G == GT$? Stop: Step 4</td>
</tr>
<tr>
<td><strong>Stop</strong>: Store $T$</td>
</tr>
</tbody>
</table>

Note that only one target group needs to be found from the segments output by the WPD. One target clustering (OTC) is therefore proposed to reduce the computation. The simplified OTC is shown in Table 4-1-1 and Fig. 4-1-2. First, the template kernel $T$, the temporary groups $G_{1...K}$ (where $K$ is the total number of segments), the target group $GT$ and its updated version $G$ are initialized to empty, and the PCC threshold $r$, which determines the similarity among segments in each group, is set to a proper value (from 0.6 to 0.8; 0.8 is chosen in this work). Then, a general clustering method is run once, as in Step 2, to get the initial kernel. Since we know neither the kernel of the group nor how to group the segments, we assume that each segment stands for one group and the corresponding kernel, which guarantees lossless grouping and also provides the initial kernels.

Fig. 4-1-2. Diagram of the proposed one target clustering.

One-time clustering with $K$ groups and corresponding kernels is conducted by applying the nested for loop to evaluate the similarity of each segment with every other segment. Once the PCC, i.e. $cov(S_i, S_j)$, is equal to or higher than the expected threshold value $r$, the segments $S_i$ and $S_j$ are added to each other’s group as new group members. Since there is only one major group in the input data segments, the group with the maximum number of members is taken as the initial target group. Meanwhile, we get the initial target kernel by averaging the target group in Step 3 and Step 4.

After obtaining the initial target group and kernel, the OTC is conducted by iterating Steps 4 to 6. In Step 5, just one loop is sufficient to evaluate the similarity between the current kernel and each segment. In Step 6, the algorithm decides whether to stop or continue the iteration based on whether the updated target group $G$ is equal to the last target group $GT$. Comparing Step 2 and Step 5, OTC requires only one loop from 1 to $K$ to calculate each update, while $K$-target clustering needs $(K + 1)K/2$ loops from 1 to $K$. Moreover, fewer targets mean that fewer updates are required to reach the stable state. For the proposed OTC, several updates are enough to get the final kernel as the extracted template.
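
The steps of Table 4-1-1 can be sketched in Python as follows. This is an illustrative reading of the pseudo code rather than the implementation used in this work. Three details are assumptions: each temporary group is seeded with its own segment, the strict comparison `> r` of the table is used, and a `max_updates` guard is added so the loop always terminates. All function names are hypothetical.

```python
import math


def pcc(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0.0:
        return 0.0  # flat segment: treat as uncorrelated (assumption)
    return sum(x * y for x, y in zip(da, db)) / denom


def average(segments):
    """Point-by-point mean of a non-empty list of equal-length segments."""
    return [sum(column) / len(segments) for column in zip(*segments)]


def one_target_clustering(segments, r=0.8, max_updates=20):
    """Return (template, member indices) for the dominant group of segments."""
    k = len(segments)  # assumes at least one segment
    # Step 2: one-time pairwise clustering; each segment seeds its own group.
    groups = [{i} for i in range(k)]
    for i in range(k):
        for j in range(i + 1, k):
            if pcc(segments[i], segments[j]) > r:
                groups[i].add(j)
                groups[j].add(i)
    # Step 3: the largest temporary group becomes the initial target group.
    g = max(groups, key=len)
    gt = g
    template = average([segments[i] for i in gt])
    for _ in range(max_updates):
        # Step 4: accept G as the target group GT and average it into T.
        gt = g
        template = average([segments[i] for i in gt])
        # Step 5: a single pass comparing every segment with the kernel T.
        g = {i for i in range(k) if pcc(segments[i], template) > r}
        # Step 6: stop when the group is unchanged (or, as a guard, empty).
        if g == gt or not g:
            break
    return template, sorted(gt)
```

Given a handful of similar peak-shaped segments plus a few dissimilar ones, the function returns the averaged peak shape and the indices of the similar segments, while the outliers are left out of the target group.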

The MIT-BIH arrhythmia database [128] is used for verification. Fig. 4-1-3 gives the template extraction results of the proposed OTC for a 10s ECG clip from a patient with premature ventricular contractions (PVCs), shown in the top plot, where the deep valley is labeled as Type 5 in the database, standing for PVC, and the narrow peak is labeled as Type 1, standing for normal QRS. As seen from the middle plot, several abnormal wide peaks after the PVC are detected by the WPD with the 400ms window, but the OTC correctly recognizes the major target and extracts the 1.2s template for the normal QRS. Besides, by simply inverting the amplitude of the clip, the PVC template can be extracted for PVC detection for this patient, as shown in the bottom plot, with the same window and template length settings.

Fig. 4-1-3. Extracted normal and PVC QRS template by OTC.
4.1.3 Correlation Coefficient

With the user-specific template, a valid QRS can be detected by comparing the similarity of incoming real-time ECG segment with the extracted user-specific template via the PCC as given by:

\[ r = \frac{\sum_{i=1}^{n} (S_i - \bar{S})(T_i - \bar{T})}{\sqrt{\sum_{i=1}^{n} (S_i - \bar{S})^2} \sqrt{\sum_{i=1}^{n} (T_i - \bar{T})^2}}, \]

(4-1-4)

where \( S_i \) is the \( i \)th point of the current peak segment \( S \), and \( T_i \) is the \( i \)th point of the template; \( n \) is the segment length, which is 144 for a 400ms window at a 360Hz sampling frequency; \( \bar{S} \) and \( \bar{T} \) are the mean values of \( S \) and \( T \), respectively. PCC indicates the linear correlation of the two samples and has a value between -1 and 1. When it is between 0.6 and 0.8, the two samples are linearly correlated; when it is higher than 0.8, the two samples are strongly linearly correlated. For the ECG wave within one record, the condition does not change much, thus the QRS amplitude and duration are linearly correlated. In this application, since the template has already been extracted, the terms in (4-1-4) that depend only on the template are constants, and (4-1-4) can be simplified as

\[ r = \frac{\sum_{i=1}^{n} b_i (S_i - \bar{S})}{\sqrt{\sum_{i=1}^{n} (S_i - \bar{S})^2}} = \frac{\sum_{i=1}^{n} b_i (S_i - \bar{S})}{\sigma_S}. \]  

(4-1-5)

That is, the PCC is calculated by applying an FIR filter with coefficients \( b_i \) to the segment \( (S_i - \bar{S}) \), and then dividing the filter output by the standard deviation of the same segment. During this calculation, the mean value of the sample is removed, which helps to relax the effect of baseline drift.
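
A minimal sketch of this simplification is given below. The coefficients are taken as \( b_i = (T_i - \bar{T}) / \sqrt{\sum_j (T_j - \bar{T})^2} \), which is what (4-1-4) implies when the template terms are precomputed; this explicit form, like the function names, is an assumption for illustration. As in (4-1-5), \( \sigma_S \) is computed as the square root of the sum of squared deviations.

```python
import math


def template_coefficients(template):
    """Precompute b_i once from the stored template (assumed non-flat)."""
    mean_t = sum(template) / len(template)
    centered = [t - mean_t for t in template]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def pcc_from_coefficients(b, segment):
    """PCC of a segment against the template, following the form of (4-1-5)."""
    mean_s = sum(segment) / len(segment)
    centered = [s - mean_s for s in segment]
    sigma_s = math.sqrt(sum(c * c for c in centered))
    if sigma_s == 0.0:
        return 0.0  # flat segment: treat as uncorrelated (assumption)
    # One FIR-style dot product, then one division per detected peak.
    return sum(bi * ci for bi, ci in zip(b, centered)) / sigma_s


def pcc_direct(segment, template):
    """Direct evaluation of (4-1-4), used here only as a cross-check."""
    mean_s = sum(segment) / len(segment)
    mean_t = sum(template) / len(template)
    num = sum((s - mean_s) * (t - mean_t) for s, t in zip(segment, template))
    den = math.sqrt(sum((s - mean_s) ** 2 for s in segment)) * math.sqrt(
        sum((t - mean_t) ** 2 for t in template)
    )
    return num / den
```

Because the coefficients sum to zero, a constant offset added to the segment does not change the result, which is the baseline drift tolerance mentioned above.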

Detection results for Record 208, with baseline drift and PVC distortion, are shown in Fig. 4-1-4. The PCCs of most of the normal QRS complexes are higher than 0.8. Several QRS complexes with large shape distortion, e.g. from 12s to 14s, have PCCs between 0.6 and 0.8. The PCCs of most of the false peak segments are lower than 0.4. Two noticeable points with PCCs close to 0.6 belong to segments whose shapes are close to the template. One is located around 2s and has a tiny amplitude; it can be eliminated by comparing it with the common R peak. The other is caused by the two adjacent PVCs from 6s to 7s, which is hard to detect even for an expert without knowledge of the nearby information. Except for this segment, all others can be correctly detected. Moreover, utilizing the PVC template extracted by inverting the signal amplitude, the algorithm can be used to detect the PVCs. By combining the PVC positions with the ECG R-R interval constraints, those abnormal or false segments can be easily separated.

In practice, the PCC threshold can be set from 0.6 to 0.9 to achieve a proper detection rate based on the template length. A short template requires a high PCC threshold to avoid too loose a separation, while a long template needs a low PCC threshold to avoid too strict a rule that may miss correct QRS complexes. As shown in the bottom plot of Fig. 4-1-3, because of the long 1.2s template length, the tail parts of the segments do not exactly overlap, which leads to the small oscillation before 0.2s. This small oscillation reduces the similarity between the template and the PVC segment. Thus, a small PCC threshold should be used in this case. As the normal QRS template is totally different from the PVCs, paced beats, and other beat types with distinct segment shapes, the PCC is rather small when comparing the extracted main heart beat template with other rarely occurring heart beats. That is, the proposed algorithm can separate the major heart beats from other types based on the segment similarity judgment, and vice versa.

Fig. 4-1-4. PCC values on record 208 with baseline drift and PVC distortions.

### 4.2 Adaptive Thresholding

To further improve the QRS detection accuracy, we make use of ECG signal properties to eliminate wrongly detected peaks. These wrong peaks are mainly caused by motion artefacts and noise. As ECG morphology varies from person to person, a fixed set of rules is not effective in dealing with these variations. Thus, we propose an adaptive thresholding scheme, in which the PCC reference, the R-R interval reference, and the template are adaptively adjusted to enhance the detection accuracy.

Fig. 4-2-1. Diagram of adaptive thresholding.

As the PCC reference, the R-R interval reference, and the template are critical to the accuracy of the QRS detection, they are updated only under the strictest condition, i.e., only if the PCC is over 0.8, which means that the segment under detection shows the highest similarity with the stored template. In this way, the updated template is guaranteed to be a segment centered at an R peak. For the RR interval, we set ±20% as the maximum variation between two adjacent intervals [174] to remove abnormal RR values. This conservative updating strategy prevents the judgment conditions from being updated by wrong detections.

As shown in Fig. 4-2-1, the adaptive updating of the reference RR interval \( TT \), reference PCC \( ST \), and the stored template \( C \) for next detection is as follows.

Step 1: \( TT \), \( ST \), and \( C \) are initialized from the OTC stage, where the subject is asked to sit still. We treat the ECG signal from the OTC stage as the “ideal” ECG signal. Thus, the extracted template, the average RR interval, and the average PCC of the QRS complex segments detected during the OTC are taken as the initial values of \( C \), \( TT \), and \( ST \), respectively. The PCCs and the time information of the last two QRS complexes are also stored as the initial \( S1 \), \( S2 \), \( T1 \), and \( T2 \), respectively.

Step 2: Calculate the PCC of the current segment under detection, \( SC \).

Step 3: If \( SC>ST \) (the current PCC is greater than the stored average PCC), the current segment is strongly correlated with the stored template, i.e., it is very likely a QRS complex. To make sure that the current segment is indeed a QRS complex, we apply RR interval constraints under three cases:

Case 1: the detected RR interval falls within ±20% variation, i.e., the RR interval is between 80% \( TT \) and 120% \( TT \).

Case 2: the detected RR interval exceeds +20% variation, i.e., RR interval > 120% \( TT \).

Case 3: the detected RR interval is below -20% variation, i.e., RR interval < 80% \( TT \).

In these three cases, we use three sets of rules, given in Step 4.1, Step 4.2, and Step 4.3, respectively, to determine whether the current segment is a QRS complex.

Step 4.1: the first case after Step 3, when \( SC>ST \) (the current PCC is greater than the stored PCC). In this case, the current RR interval \( T \) is within the reasonable range set by the reference value \( TT \). The segment is reported as an R peak since it satisfies both the similarity and the time constraints. The thresholds are updated only when the similarity is higher than the strictest setting, 0.8.

Step 4.2: the second case after Step 3. In this case, the current RR interval \( T \) is longer than the upper limit of the reference range, \( (1+F)TT \), where \( F \) is the allowed fractional variation (20% here). That means the current segment is a special one. If the similarity is higher than 0.8, the segment is reported as an R peak and the thresholds are updated to follow the RR interval variation. If the similarity is less than 0.8, the segment is still reported as an R peak, but the thresholds are not updated, to avoid any possible risk.

Step 4.3: the third case after Step 3. In this case, the current RR interval \( T \) is shorter than the lower limit of the reference range, \( (1-F)TT \). That means either the current or the previous detection is wrong, so the decision rules are applied to remove one of the two detections. No threshold is updated in this case.

Step 4.4: if \( SC < ST \), it is still possible that the current segment is a QRS complex, considering the distortion caused by motion artifacts. In this case, \( ST \) is decreased with a slope of \( 0.4/TT \) until it reaches the minimum value of 0.4.
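The branching in Steps 3 to 4.4 can be summarized in a small Python sketch. It is only an illustration under stated assumptions: the function names `decide` and `decay_st`, the string labels, and the `elapsed` argument (time over which the reference has been decaying) are hypothetical, the case \( SC = ST \) is treated like \( SC < ST \), and the decision rules that remove one of two close detections in Step 4.3 are left as a placeholder because they are not restated here.

```python
HIGH_PCC = 0.8  # strictest similarity level quoted in the text
MIN_ST = 0.4    # floor of the decayed PCC reference (Step 4.4)
F = 0.2         # allowed RR variation, +/-20% of TT

def decide(sc, rr, st, tt):
    """Return (decision, update_refs) for one candidate segment.

    sc: PCC of the current segment, rr: RR interval to the previous R peak,
    st: stored PCC reference, tt: stored RR interval reference.
    """
    if sc <= st:
        return ("below_reference", False)  # Step 4.4: ST is decayed elsewhere
    if rr > (1 + F) * tt:
        return ("r_peak", sc > HIGH_PCC)   # Step 4.2: report, update only if PCC > 0.8
    if rr < (1 - F) * tt:
        return ("arbitrate", False)        # Step 4.3: one of two detections is removed
    return ("r_peak", sc > HIGH_PCC)       # Step 4.1: report, update only if PCC > 0.8

def decay_st(st, tt, elapsed):
    """Step 4.4: lower ST with slope 0.4/TT, never below 0.4."""
    return max(MIN_ST, st - (0.4 / tt) * elapsed)
```

For instance, `decide(0.75, 1.3, 0.7, 1.0)` falls under Step 4.2: the segment is reported as an R peak, but the references are left untouched because the PCC does not exceed 0.8.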

### 4.3 Experiment Results

The algorithm is implemented in MATLAB, and the GUI for detection and analysis is shown in Fig. 4-3-1. The passband of the FIR pre-filter is set to 0.7Hz to 40Hz, with transition widths of 0.5Hz and 1Hz, respectively. A 2-hour recording acquired with the proposed AFE (as in Section 3.3) is used for functionality verification. As shown in the main window, a detection label is accurately placed at each R peak of the filtered ECG. The detection results and the filtered ECG can be saved to a text file for further study. The implementation also calculates the HRV parameters, e.g., the estimated heart rate (HR) is 82bpm.

To evaluate the detection performance, sensitivity (SE) and positive prediction (+P) are used, as defined by (4-3-1) and (4-3-2). They are calculated from the false positives (FP), false negatives (FN), and true positives (TP), where an FP is a declared QRS where there is none, an FN is a missed actual QRS event, and a TP is a correctly detected QRS.

$$\text{SE(\%)} = \frac{\text{TP}}{\text{TP+FN}} \times 100 \tag{4-3-1}$$

$$\text{+P(\%)} = \frac{\text{TP}}{\text{TP+FP}} \times 100 \tag{4-3-2}$$
The SE and +P for each record in the MIT-BIH arrhythmia database are summarized in Table 4-3-1. Our algorithm can accurately separate the majority heartbeat type from the total beat labels, with an average +P of 99.39% and SE of 99.21%. The lowest performance appears at Record 203, which contains many PVCs, leading to the lowest +P and SE.

The performance comparison of the proposed QRS detection algorithm with other algorithms is given in Table 4-3-2. The sensitivity and positive prediction are comparable to or slightly lower than those of the state-of-the-art algorithms (by at most 0.60% in SE and 0.42% in +P relative to the best entries in Table 4-3-2).
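As a quick check of how the two metrics are computed, the following Python sketch evaluates them for Record 102 of Table 4-3-1. It assumes that TP equals the number of major-type beats minus FN, which is an interpretation of the table rather than a statement from the text; the function names are hypothetical.

```python
def sensitivity(tp, fn):
    """SE in percent: share of actual QRS events that were detected."""
    return 100.0 * tp / (tp + fn)

def positive_prediction(tp, fp):
    """+P in percent: share of declared QRS events that are real."""
    return 100.0 * tp / (tp + fp)

# Record 102 in Table 4-3-1: 2183 major-type beats, FN = 24, FP = 24.
major_beats, fn, fp = 2183, 24, 24
tp = major_beats - fn  # assumption: every major-type beat is either a TP or an FN
se = sensitivity(tp, fn)
pp = positive_prediction(tp, fp)
```

With this reading, both values round to the 98.90% listed for Record 102.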

Table 4-3-1. Performance Evaluation on MIT-BIH Database

<table>
<thead>
<tr>
<th>Tape</th>
<th>Major/Total</th>
<th>FN</th>
<th>FP</th>
<th>+P</th>
<th>SE</th>
</tr>
</thead>
<tbody>
<tr>
<td>100</td>
<td>2273/2273</td>
<td>2</td>
<td>0</td>
<td>100%</td>
<td>99.91%</td>
</tr>
<tr>
<td>101</td>
<td>1865/1865</td>
<td>6</td>
<td>0</td>
<td>100%</td>
<td>99.68%</td>
</tr>
<tr>
<td>102</td>
<td>2183/2187</td>
<td>24</td>
<td>24</td>
<td>98.90%</td>
<td>98.90%</td>
</tr>
<tr>
<td>103</td>
<td>2084/2084</td>
<td>4</td>
<td>0</td>
<td>100%</td>
<td>99.81%</td>
</tr>
<tr>
<td>104</td>
<td>1380/2229</td>
<td>55</td>
<td>44</td>
<td>96.76%</td>
<td>95.98%</td>
</tr>
<tr>
<td>105</td>
<td>2526/2572</td>
<td>84</td>
<td>23</td>
<td>99.06%</td>
<td>96.65%</td>
</tr>
<tr>
<td>106</td>
<td>1507/2027</td>
<td>1</td>
<td>6</td>
<td>99.60%</td>
<td>99.93%</td>
</tr>
<tr>
<td>107</td>
<td>2078/2137</td>
<td>2</td>
<td>1</td>
<td>99.95%</td>
<td>99.90%</td>
</tr>
<tr>
<td>108</td>
<td>1746/1763</td>
<td>41</td>
<td>39</td>
<td>97.76%</td>
<td>97.65%</td>
</tr>
<tr>
<td>109</td>
<td>2494/2532</td>
<td>11</td>
<td>12</td>
<td>99.52%</td>
<td>99.56%</td>
</tr>
<tr>
<td>111</td>
<td>2123/2124</td>
<td>38</td>
<td>38</td>
<td>98.21%</td>
<td>98.21%</td>
</tr>
<tr>
<td>112</td>
<td>2539/2539</td>
<td>0</td>
<td>0</td>
<td>100%</td>
<td>100%</td>
</tr>
<tr>
<td>113</td>
<td>1795/1795</td>
<td>8</td>
<td>0</td>
<td>100%</td>
<td>99.95%</td>
</tr>
<tr>
<td>114</td>
<td>1836/1879</td>
<td>4</td>
<td>1</td>
<td>99.95%</td>
<td>99.78%</td>
</tr>
<tr>
<td>115</td>
<td>1953/1953</td>
<td>1</td>
<td>0</td>
<td>100%</td>
<td>99.95%</td>
</tr>
<tr>
<td>116</td>
<td>2303/2412</td>
<td>29</td>
<td>0</td>
<td>100%</td>
<td>98.72%</td>
</tr>
<tr>
<td>117</td>
<td>1535/1535</td>
<td>2</td>
<td>0</td>
<td>100%</td>
<td>99.87%</td>
</tr>
<tr>
<td>118</td>
<td>2262/2278</td>
<td>6</td>
<td>2</td>
<td>99.91%</td>
<td>99.73%</td>
</tr>
<tr>
<td>119</td>
<td>1543/1987</td>
<td>0</td>
<td>0</td>
<td>100%</td>
<td>100%</td>
</tr>
<tr>
<td>121</td>
<td>1862/1863</td>
<td>2</td>
<td>1</td>
<td>99.95%</td>
<td>99.89%</td>
</tr>
<tr>
<td>122</td>
<td>2476/2476</td>
<td>2</td>
<td>0</td>
<td>100%</td>
<td>99.92%</td>
</tr>
<tr>
<td>123</td>
<td>1515/1518</td>
<td>2</td>
<td>0</td>
<td>100%</td>
<td>99.87%</td>
</tr>
<tr>
<td>124</td>
<td>1572/1619</td>
<td>6</td>
<td>6</td>
<td>99.62%</td>
<td>99.62%</td>
</tr>
<tr>
<td>200</td>
<td>1775/2601</td>
<td>14</td>
<td>24</td>
<td>98.66%</td>
<td>99.22%</td>
</tr>
<tr>
<td>201</td>
<td>1765/1963</td>
<td>22</td>
<td>7</td>
<td>99.6%</td>
<td>98.74%</td>
</tr>
<tr>
<td>202</td>
<td>2117/2136</td>
<td>1</td>
<td>4</td>
<td>99.81%</td>
<td>99.95%</td>
</tr>
<tr>
<td>203</td>
<td>2536/2980</td>
<td>163</td>
<td>166</td>
<td>93.45%</td>
<td>93.41%</td>
</tr>
<tr>
<td>205</td>
<td>2585/2656</td>
<td>1</td>
<td>1</td>
<td>99.96%</td>
<td>99.96%</td>
</tr>
<tr>
<td>207</td>
<td>1457/1543</td>
<td>14</td>
<td>10</td>
<td>99.31%</td>
<td>99.04%</td>
</tr>
<tr>
<td>208</td>
<td>1586/2955</td>
<td>22</td>
<td>26</td>
<td>98.68%</td>
<td>98.88%</td>
</tr>
<tr>
<td>209</td>
<td>3005/3006</td>
<td>0</td>
<td>0</td>
<td>100%</td>
<td>100%</td>
</tr>
<tr>
<td>210</td>
<td>2446/2640</td>
<td>19</td>
<td>12</td>
<td>99.50%</td>
<td>99.22%</td>
</tr>
<tr>
<td>212</td>
<td>2748/2748</td>
<td>0</td>
<td>0</td>
<td>100%</td>
<td>100%</td>
</tr>
</tbody>
</table>
Table 4-3-2. Comparison with Other Algorithms

<table>
<thead>
<tr>
<th>Method</th>
<th>SE(%)</th>
<th>+P(%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Multiscale Morphology [121]</td>
<td>99.81</td>
<td>99.80</td>
</tr>
<tr>
<td>Quadratic Spline wavelet [175]</td>
<td>99.31</td>
<td>99.70</td>
</tr>
<tr>
<td>Joint QRS [124]</td>
<td>99.64</td>
<td>99.81</td>
</tr>
<tr>
<td>Event Driven [126]</td>
<td>97.76</td>
<td>98.59</td>
</tr>
<tr>
<td>Genetic Algorithm [176]</td>
<td>99.60</td>
<td>99.51</td>
</tr>
<tr>
<td>Proposed Algorithm</td>
<td>99.21</td>
<td>99.39</td>
</tr>
</tbody>
</table>

### 4.4 Conclusion

In this chapter, the details of the proposed user-adaptive QRS detection algorithm are presented. It employs PD triggering, simplified OTC, simplified PCC, and adaptive thresholding. The proposed OTC requires \( (K^2 + K - 2)/2 \) fewer loops than the \( K \)-target group clustering. The simplified PCC is equivalent to a one-point FIR operation followed by a division by the standard deviation of the triggered input segment. The average +P and SE of the proposed algorithm are 99.39% and 99.21%, respectively. The GUI of the algorithm is developed in the MATLAB environment, in which the pre-filtering parameters can be set. Moreover, HRV analysis is conducted based on the detected QRS positions.
Chapter 5
An Event-driven Patient Specific ANN-CAC

Deep learning methods, e.g., artificial neural networks (ANN) and convolutional neural networks (CNN), are favoured in cardiac arrhythmia classification (CAC) for their high accuracy. However, insufficient abnormal beats and extremely unbalanced data sets limit the ANN-CAC performance. Moreover, implementing such methods is challenging for wearable electrocardiogram (ECG) sensors due to the heavy computational cost.

An event-driven ANN-CAC is presented in this chapter to address these challenges. A continuous-in-time discrete-in-amplitude (CTDA) signal flow is adopted to reduce the multiplication operations in the ANN-CAC. A customized three-stage pipelined multiplier and customized finite state machines (FSMs) are adopted to enhance the execution speed and efficiency. To handle the unbalanced data set and the ECG identity, a conditional grouping scheme (CGS) together with biased training (BT) is proposed to enhance the classification accuracy. The design is verified on an FPGA and implemented in a 0.18μm CMOS process. The simulated average power is 13μW for a patient with a heart rate of 75bpm. Verified on the MIT-BIH arrhythmia database, the design shows over 99% classification accuracy, 97% sensitivity, and 94% positive predictivity on the 5 types defined by the AAMI standard.

This chapter is organized as follows. In Section 5.1, data preparation and the details of the CGS are described. Feature selection, implementation considerations, and the topology of the proposed CTDA patient-specific ANN-CAC are presented in Section 5.2. The BT process is shown in Section 5.3. Implementation details, including the customized multiplier and FSMs, are presented in Section 5.4. Benchmark results of the implemented event-driven ANN-CAC are given in Section 5.5. Conclusions are discussed in Section 5.6.

5.1 Data Preparation

Table 5-1-1. Heart beat label and corresponding meaning in MIT-BIH

<table>
<thead>
<tr>
<th>Symbol</th>
<th>Label</th>
<th>Meaning</th>
</tr>
</thead>
<tbody>
<tr>
<td>N</td>
<td>1</td>
<td>Normal beat</td>
</tr>
<tr>
<td>L</td>
<td>2</td>
<td>Left bundle branch block beat</td>
</tr>
<tr>
<td>R</td>
<td>3</td>
<td>Right bundle branch block beat</td>
</tr>
<tr>
<td>a</td>
<td>4</td>
<td>Aberrated atrial premature beat</td>
</tr>
<tr>
<td>V</td>
<td>5</td>
<td>Premature ventricular contraction</td>
</tr>
<tr>
<td>F</td>
<td>6</td>
<td>Fusion of ventricular and normal beat</td>
</tr>
<tr>
<td>J</td>
<td>7</td>
<td>Nodal (junctional) premature beat</td>
</tr>
<tr>
<td>A</td>
<td>8</td>
<td>Atrial premature beat</td>
</tr>
<tr>
<td>S</td>
<td>9</td>
<td>Premature or ectopic supraventricular beat</td>
</tr>
<tr>
<td>E</td>
<td>10</td>
<td>Ventricular escape beat</td>
</tr>
<tr>
<td>j</td>
<td>11</td>
<td>Nodal (junctional) escape beat</td>
</tr>
<tr>
<td>/ (P)</td>
<td>12</td>
<td>Paced beat</td>
</tr>
<tr>
<td>Q</td>
<td>13</td>
<td>Unclassified beat</td>
</tr>
<tr>
<td>e</td>
<td>34</td>
<td>Atrial escape beat</td>
</tr>
<tr>
<td>f</td>
<td>38</td>
<td>Fusion of paced and normal beat</td>
</tr>
</tbody>
</table>

Table 5-1-2. Mapping of MIT-BIH labels to AAMI types

<table>
<thead>
<tr>
<th>AAMI</th>
<th>Meaning</th>
<th>MIT-BIH</th>
<th>Redefined Label</th>
</tr>
</thead>
<tbody>
<tr>
<td>N</td>
<td>Any beat not in S, V, F, and Q</td>
<td>1, 2, 3, 34,</td>
<td>[1, 0, 0, 0, 0]</td>
</tr>
<tr>
<td>S</td>
<td>Supraventricular ectopic beats</td>
<td>4, 7, 8, 9</td>
<td>[0, 1, 0, 0, 0]</td>
</tr>
<tr>
<td>V</td>
<td>Ventricular ectopic beats</td>
<td>5, 10</td>
<td>[0, 0, 1, 0, 0]</td>
</tr>
<tr>
<td>F</td>
<td>Fusion beats</td>
<td>6</td>
<td>[0, 0, 0, 1, 0]</td>
</tr>
<tr>
<td>Q</td>
<td>Unknown beats</td>
<td>12, 13, 38</td>
<td>[0, 0, 0, 0, 1]</td>
</tr>
</tbody>
</table>

For consistency of comparison with other works in benchmarking, the standard MIT-BIH arrhythmia database is used for network training and performance evaluation. It contains 48 half-hour ECG records from 47 subjects [123]. Modified limb Lead II (MLII) and modified Lead V ECG signals are stored in each record. Only the MLII data is used in this study to keep consistency with other works. All records are digitized with 11-bit resolution over a 10mV range at a 360Hz sampling frequency. The entire database provides 15 types of heartbeats. The beat-type labels and their annotation meanings as defined in MIT-BIH are shown in Table 5-1-1. As recommended by the AAMI practice [144], the 4 paced records, i.e., 102, 104, 107, and 217, should be removed from the study. Except for the Q type, i.e., paced beat (label 12), unclassified beat (label 13), and fusion of paced and normal beat (label 38), which is not necessary for the evaluation of CAC under AAMI, all other types (N, S, V, and F) should be evaluated from the MIT-BIH labels. For future updates and to cover all situations in practice, all 5 types are considered in the proposed ANN-CAC even though the Q type is not required by the AAMI standard, i.e., this design has five output types. The mapping of each AAMI heartbeat type and the redefined labels for classification purposes are provided in Table 5-1-2.
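The label mapping of Tables 5-1-1 and 5-1-2 can be written as a small lookup, sketched below in Python. Following the "any beat not in S, V, F, and Q" definition of the N type, every beat label of Table 5-1-1 that is not listed under S, V, F, or Q falls back to N; this fallback and the names `MITBIH_TO_AAMI`, `aami_type`, and `redefined_label` are assumptions made for illustration.

```python
AAMI_ORDER = ["N", "S", "V", "F", "Q"]

# Beat labels listed in Table 5-1-1.
BEAT_LABELS = set(range(1, 14)) | {34, 38}

# Explicit S, V, F, and Q entries from Table 5-1-2.
MITBIH_TO_AAMI = {
    4: "S", 7: "S", 8: "S", 9: "S",
    5: "V", 10: "V",
    6: "F",
    12: "Q", 13: "Q", 38: "Q",
}

def aami_type(label):
    """Map a MIT-BIH beat label to its AAMI type (N by default)."""
    if label not in BEAT_LABELS:
        raise ValueError(f"not a beat label from Table 5-1-1: {label}")
    return MITBIH_TO_AAMI.get(label, "N")

def redefined_label(label):
    """One-hot vector in the order [N, S, V, F, Q], as in Table 5-1-2."""
    target = aami_type(label)
    return [1 if name == target else 0 for name in AAMI_ORDER]
```

For example, `redefined_label(5)` gives the V-type vector `[0, 0, 1, 0, 0]`.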

5.1.1 Identity of ECG

The morphologies of the same AAMI beat type from different patients are quite different. Take type S as an example, as shown in Fig. 5-1-1. The S-type beats from the 44 records that exclude paced beats can be divided into sub-morphologies. Six of them, from records 100, 207, 220, 118, 228, and 124, with obviously different waveforms are presented. In record 100, the QRS complex shows symmetrical small Q and S valleys. Except for the flat T peak, the entire waveform is similar to a normal heartbeat, whereas the shapes in the other records are quite different from the normal one. The heartbeats in records 207, 220, 118, 228, and 124 have a remarkably deep R valley, a distinctly sharp and deep S valley, a deep but wide S valley, an obvious Q valley, and a relatively short PQ segment, respectively. It should be noted that the morphology differences of the S type are not limited to the six illustrated records; each record has its own details. Not only the S type but also the V type has its own properties in each record. We call this diversity of the ECG morphology the “identity” of the ECG.

Fig. 5-1-1. Illustration of ECG identity.

Apparently, a global classifier cannot handle the identity well, which leads to low classification accuracy. The joint patient-specific set [134] also introduces interference during training. For example, in the training of the ANN-CAC for patient 207, no S-type beat appears in the first 5 minutes. Thus, all the S-type beats are borrowed from the shared (in-common) beats during training. In this case, the trained network is biased toward shapes like those of records 100 and 124, which are totally different from those of record 207, finally leading to poor accuracy in the detection of the S type for patient 207. In addition, the inconsistency of the morphology within the same type across the in-common records generally results in error oscillations during training. Therefore, a more reasonable data grouping and training strategy should be adopted.

5.1.2 Conditional Data Grouping Scheme (CGS)

We propose the CGS data grouping method to address the issues related to the identity of ECG, i.e., the data are grouped according to the heartbeat statistics of each record in the database so that records with too few abnormal beats are removed from training and classification. The threshold is set at 1% of the total beats in the record. If the number of abnormal beats of a type is below this threshold, the record is not included in the group that contains the corresponding label.

The statistics of the beat types for both the entire record and the first 5 minutes of each record are provided in Table 5-1-3. We cluster the dataset into 5 groups, i.e., (N, S), (N, V), (N, S, V), (N, V, F), and (N). The records in (N) provide a negligible number of abnormal beats (less than 1% of the total beats), e.g., record 113 contains a total of 6 S-type beats out of 1789, and only 3 of them occur in the first 5 minutes. If we train the ANN-CAC using only their own abnormal beats, the samples are far from enough, while borrowing from other records will cause some of the N-type beats to be treated as other types due to the ECG identity.
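The 1% criterion can be expressed as a short Python sketch. It assumes that the total is the sum of the N, S, V, and F counts of a record and that a count exactly at the threshold is kept; the function name `cgs_group` is hypothetical, and the beat counts in the test values are taken from Table 5-1-3.

```python
def cgs_group(n, s, v, f, threshold=0.01):
    """Return the beat types a record contributes under the 1% criterion."""
    total = n + s + v + f          # assumption: Q-type beats are not counted
    limit = threshold * total
    group = ["N"]
    for name, count in (("S", s), ("V", v), ("F", f)):
        if count > 0 and count >= limit:
            group.append(name)
    return tuple(group)

# Whole-record beat counts (N, S, V, F) from Table 5-1-3.
group_232 = cgs_group(398, 1382, 0, 0)
group_208 = cgs_group(1586, 2, 992, 373)
group_201 = cgs_group(1635, 128, 198, 2)
group_113 = cgs_group(1789, 6, 0, 0)
```

The sketch can in principle return combinations outside the five groups named above, such as (N, S, F); such a combination would need separate handling.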
Table 5-1-3. Statistics of heartbeat number in each record and the proposed CGS

<table>
<thead>
<tr>
<th>Record</th>
<th>N (entire record)</th>
<th>S (entire record)</th>
<th>V (entire record)</th>
<th>F (entire record)</th>
<th>1% of total beats</th>
</tr>
</thead>
<tbody>
<tr>
<td>232</td>
<td>398</td>
<td>1382</td>
<td>0</td>
<td>0</td>
<td>18</td>
</tr>
<tr>
<td>209</td>
<td>2621</td>
<td>383</td>
<td>1</td>
<td>0</td>
<td>30</td>
</tr>
<tr>
<td>222</td>
<td>2274</td>
<td>209</td>
<td>0</td>
<td>0</td>
<td>25</td>
</tr>
<tr>
<td>118</td>
<td>2166</td>
<td>96</td>
<td>16</td>
<td>0</td>
<td>23</td>
</tr>
<tr>
<td>220</td>
<td>1954</td>
<td>94</td>
<td>0</td>
<td>0</td>
<td>20</td>
</tr>
<tr>
<td>202</td>
<td>2061</td>
<td>55</td>
<td>19</td>
<td>1</td>
<td>21</td>
</tr>
<tr>
<td>234</td>
<td>2700</td>
<td>50</td>
<td>3</td>
<td>0</td>
<td>28</td>
</tr>
<tr>
<td>100</td>
<td>2239</td>
<td>33</td>
<td>1</td>
<td>0</td>
<td>23</td>
</tr>
<tr>
<td>233</td>
<td>2230</td>
<td>7</td>
<td>831</td>
<td>11</td>
<td>31</td>
</tr>
<tr>
<td>106</td>
<td>1507</td>
<td>0</td>
<td>520</td>
<td>0</td>
<td>20</td>
</tr>
<tr>
<td>203</td>
<td>2529</td>
<td>2</td>
<td>444</td>
<td>1</td>
<td>30</td>
</tr>
<tr>
<td>119</td>
<td>1543</td>
<td>0</td>
<td>444</td>
<td>0</td>
<td>20</td>
</tr>
<tr>
<td>221</td>
<td>2031</td>
<td>0</td>
<td>396</td>
<td>0</td>
<td>24</td>
</tr>
<tr>
<td>228</td>
<td>1688</td>
<td>3</td>
<td>362</td>
<td>0</td>
<td>21</td>
</tr>
<tr>
<td>214</td>
<td>2003</td>
<td>0</td>
<td>256</td>
<td>1</td>
<td>23</td>
</tr>
<tr>
<td>210</td>
<td>2423</td>
<td>22</td>
<td>195</td>
<td>10</td>
<td>27</td>
</tr>
<tr>
<td>215</td>
<td>3195</td>
<td>3</td>
<td>164</td>
<td>1</td>
<td>34</td>
</tr>
<tr>
<td>116</td>
<td>2302</td>
<td>1</td>
<td>109</td>
<td>0</td>
<td>24</td>
</tr>
<tr>
<td>205</td>
<td>2571</td>
<td>3</td>
<td>71</td>
<td>11</td>
<td>27</td>
</tr>
<tr>
<td>219</td>
<td>2082</td>
<td>7</td>
<td>64</td>
<td>1</td>
<td>22</td>
</tr>
<tr>
<td>114</td>
<td>1820</td>
<td>12</td>
<td>43</td>
<td>4</td>
<td>19</td>
</tr>
<tr>
<td>105</td>
<td>2526</td>
<td>0</td>
<td>41</td>
<td>0</td>
<td>26</td>
</tr>
<tr>
<td>109</td>
<td>2492</td>
<td>0</td>
<td>38</td>
<td>2</td>
<td>25</td>
</tr>
<tr>
<td>201</td>
<td>1635</td>
<td>128</td>
<td>198</td>
<td>2</td>
<td>20</td>
</tr>
<tr>
<td>207</td>
<td>1543</td>
<td>107</td>
<td>210</td>
<td>0</td>
<td>19</td>
</tr>
<tr>
<td>200</td>
<td>1743</td>
<td>30</td>
<td>826</td>
<td>2</td>
<td>26</td>
</tr>
<tr>
<td>223</td>
<td>2045</td>
<td>73</td>
<td>473</td>
<td>14</td>
<td>26</td>
</tr>
<tr>
<td>124</td>
<td>1536</td>
<td>31</td>
<td>47</td>
<td>5</td>
<td>16</td>
</tr>
<tr>
<td>208</td>
<td>1586</td>
<td>2</td>
<td>992</td>
<td>373</td>
<td>30</td>
</tr>
<tr>
<td>213</td>
<td>2641</td>
<td>28</td>
<td>220</td>
<td>362</td>
<td>33</td>
</tr>
<tr>
<td>113</td>
<td>1789</td>
<td>6</td>
<td>0</td>
<td>0</td>
<td>18</td>
</tr>
<tr>
<td>108</td>
<td>1740</td>
<td>4</td>
<td>17</td>
<td>2</td>
<td>18</td>
</tr>
<tr>
<td>101</td>
<td>1860</td>
<td>3</td>
<td>0</td>
<td>0</td>
<td>19</td>
</tr>
<tr>
<td>103</td>
<td>2082</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>21</td>
</tr>
<tr>
<td>112</td>
<td>2537</td>
<td>2</td>
<td>0</td>
<td>0</td>
<td>25</td>
</tr>
<tr>
<td>231</td>
<td>1568</td>
<td>1</td>
<td>2</td>
<td>0</td>
<td>16</td>
</tr>
<tr>
<td>121</td>
<td>1861</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>19</td>
</tr>
<tr>
<td>117</td>
<td>1534</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>15</td>
</tr>
<tr>
<td>123</td>
<td>1515</td>
<td>0</td>
<td>3</td>
<td>0</td>
<td>15</td>
</tr>
<tr>
<td>111</td>
<td>2123</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>21</td>
</tr>
<tr>
<td>230</td>
<td>2255</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>23</td>
</tr>
<tr>
<td>115</td>
<td>1953</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>20</td>
</tr>
<tr>
<td>122</td>
<td>2476</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>25</td>
</tr>
<tr>
<td>212</td>
<td>2748</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>27</td>
</tr>
</tbody>
</table>

In the proposed CGS, the first four groups are considered for classification. Each group has its own training data and classification task. For example, the N, V, and F samples from an individual record form the training ensemble and evaluation set for that record in (N, V, F), which avoids interference from other records. Because of the 1% criterion, the records considered in each group contain at least several tens of the corresponding beats for training. The training set comes entirely from the patient-specific record. The same principle is applied to the other groups.

5.2 Proposed CTDA ANN-CAC
Continuous-in-time and discrete-in-amplitude (CTDA) is a type of signal that features both analog and digital signal properties. It is a non-uniform signal processing concept that was first reported in 1966 but not implemented until 1996 [151]-[153]. CTDA produces a number of samples proportional to the change in signal amplitude instead of sampling evenly in time. It is free from aliasing, unlike the Nyquist sampling scheme. When the amplitude changes slowly, few or even no samples are generated by the CTDA scheme. This inherent adaptive sampling significantly reduces the number of samples needed to recover bursty signals such as ECG and music. The level-crossing analog-to-digital converter (LC-ADC) is the circuit implementation of the CTDA signal flow. In recent years, the LC-ADC has become an excellent candidate for the design of ultralow-power wearable ECG sensors.
5.2.1 CTDA vs Nyquist Sampling

The Nyquist sampling scheme acquires samples that are evenly distributed in time regardless of the amplitude change, which is also the basic operating principle of general-purpose ADCs. In contrast, the CTDA sampling scheme acquires samples that are evenly distributed in amplitude, which can be done by a level-crossing ADC. Fig. 5-2-1 shows the two sampling schemes for an ECG signal. The Nyquist sampling scheme gives the baseline part and the P-QRS-T segment of the ECG the same weight per unit length, i.e., the length of the signal decides the number of samples. As illustrated in Fig. 5-2-1(A), 23 and 12 samples are acquired for the baseline signal and the P-QRS-T signal, respectively, during one ECG cycle. The sampling effort at each point is the same, but the baseline samples contribute little useful morphology information to the classification of cardiac arrhythmia.

Unlike Nyquist sampling, the CTDA scheme samples the ECG signal based on the change of amplitude, i.e., it obtains samples evenly distributed in amplitude. The number of samples is sensitive to the change of the amplitude. Sampling occurs whenever the signal crosses the upper or lower threshold of an amplitude interval window. For the ECG signal shown in Fig. 5-2-1(B), 26 samples are obtained for the ECG complex, with no samples at the baseline. Comparing Fig. 5-2-1(A) and Fig. 5-2-1(B), the Nyquist and CTDA schemes respectively assign 2 and 2 samples to the P peak, 4 and 20 samples to the QRS complex, and 5 and 4 samples to the T peak. That is, the sampling effort at the baseline is suppressed while more samples are assigned to the fast-changing part of the ECG signal in CTDA. Thus, the computations for the baseline samples are saved in the ANN-CAC.
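The level-crossing behaviour can be mimicked in software with the Python sketch below. It is a behavioural illustration only: a real LC-ADC works on the continuous-time signal, whereas this sketch walks over a densely sampled list, and the function name `level_crossing_events` and the integer millivolt values are hypothetical. Upward crossings are reported as 1 and downward crossings as 0.

```python
def level_crossing_events(samples, window):
    """Return (index, direction) events; direction is 1 for up, 0 for down.

    An event is emitted each time the signal moves one full window away
    from the last crossed level, so a flat baseline produces no events.
    """
    events = []
    level = samples[0]  # last crossed level
    for i in range(1, len(samples)):
        value = samples[i]
        while value - level >= window:   # crossed the upper threshold
            level += window
            events.append((i, 1))
        while level - value >= window:   # crossed the lower threshold
            level -= window
            events.append((i, 0))
    return events

# A flat baseline followed by a small triangular peak, in mV, with a 20 mV window.
signal_mv = [0, 0, 0, 20, 40, 60, 40, 20, 0, 0]
events = level_crossing_events(signal_mv, 20)
bits = [direction for _, direction in events]
```

In this toy signal, the three rising steps and three falling steps each produce one event, while the baseline samples produce none.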

Consider a person with a 75bpm heart rate, a 160ms P-QRS-T segment, a 1mV R peak, and S and T amplitudes at 1/5 of R, whose ECG signal is monitored with an AFE with a gain of 60dB. The numbers of samples generated in one ECG cycle by a 250Hz 8-bit Nyquist ADC and by an LC-ADC with a 20mV level window are then 200 and 140, respectively, i.e., 1600 bits for Nyquist and 140 bits for CTDA. Moreover, just 20% of the 1600 bits in the Nyquist data are for the useful P-QRS-T segment, while almost all of the 140 bits of CTDA data are for the rapidly changing P-QRS-T segment. The LC-ADC therefore reduces the data volume by about 90% (from 1600 bits to 140 bits).
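The numbers in this example can be reproduced with a few lines of Python. The sketch assumes that each of the R, S, and T waves rises from and returns to the baseline, so that each contributes twice its amplified height divided by the level window, that the P wave is neglected, and that each level-crossing event costs one direction bit; these are one reading of the example, not details stated in the text.

```python
heart_rate_bpm = 75
fs_hz = 250                 # Nyquist ADC sampling rate
adc_bits = 8                # Nyquist ADC resolution
gain = 10 ** (60 // 20)     # 60 dB voltage gain = 1000
window_mv = 20              # LC-ADC level window

r_mv = 1 * gain             # 1 mV R peak after amplification, in mV
s_mv = r_mv // 5            # S at 1/5 of R
t_mv = r_mv // 5            # T at 1/5 of R

# Nyquist: samples per ECG cycle (60/75 s = 0.8 s) and their bit count.
nyquist_samples = fs_hz * 60 // heart_rate_bpm
nyquist_bits = nyquist_samples * adc_bits

def crossings(peak_mv):
    """Level crossings for a wave that rises to peak_mv and falls back."""
    return 2 * (peak_mv // window_mv)

# LC-ADC: one event, and one direction bit, per level crossing.
lc_samples = crossings(r_mv) + crossings(s_mv) + crossings(t_mv)
lc_bits = lc_samples

reduction = 1 - lc_bits / nyquist_bits
```

Under these assumptions, the sketch gives 200 Nyquist samples (1600 bits), 140 LC-ADC samples (140 bits), and a reduction in bits of about 91%.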
5.2.2 Processing CTDA Signal for ANN-CAC

As shown in Fig. 5-2-2, the state-of-the-art ultralow-power wearable ECG sensor with CTDA signal flow is composed of a power management unit (PMU), an analog frontend (AFE), an LC-ADC, an event-driven QRS detector, an encoder, and a transmitter with an on-chip antenna [66]. The instrumentation amplifier (IA), which commonly has a fixed gain and low noise, buffers the vulnerable on-body ECG signal (the RA and LA leads are used for illustration) and passes its output to the following programmable gain amplifier (PGA), which amplifies the weak ECG to a proper amplitude. The acquired analog signal is then converted by the LC-ADC into pulses with direction and time labels. Based on the labeled pulses from the LC-ADC, the event-driven QRS detector accurately detects the position of the R peak and extracts the two consecutive RR intervals. The raw samples from the LC-ADC output are encoded into the format preferred by the transmitter and then wirelessly transmitted to the gateway via the on-chip power amplifier and antenna. The ECG visualization terminal, such as a smartphone, tablet, or laptop, captures the over-the-air signal and interacts with the user.

The proposed ANN-CAC requires only the samples from an LC-ADC and the two adjacent RR intervals extracted by the event-driven QRS detector, rather than complex feature extraction. The pulses with positive and negative directions from the LC-ADC are mapped to 1s and 0s, respectively, at the input of the ANN-CAC, as shown in Fig. 5-2-3 with the R peak as an example. With a stream of 1s and 0s and the RR interval information, the presented ANN-CAC with CTDA signal flow can be seamlessly integrated into an ultralow-power ECG sensor.

Fig. 5-2-2. Topology of CTDA ECG sensor.

Fig. 5-2-3. Mapping of LC-ADC output and adjacent RR intervals to ANN-CAC input.
5.2.3 Topology of Proposed CTDA ANN-CAC

Benefiting from the CTDA LC-ADC sampling scheme, the input of the ANN is a vector of pure 1s and 0s, of which 22 bits come from the two RR intervals and 74 bits are the series mapped from the LC-ADC output to represent the morphology, as illustrated in Fig. 5-2-3. As shown in Fig. 5-2-4(A), the network size is 32×16×5. The 32 neurons in the first layer need no multiplications because each input can only be 1 or 0. Thus, the arithmetic operations before the activation can be expressed as:

\[ x_i = b_i + \sum w_{i,j}, \quad \text{for } S_j \neq 0, \quad (5-2-1) \]

where \( w_{i,j} \) is the weight for the \( j^{\text{th}} \) input to the \( i^{\text{th}} \) neuron in the first layer, \( S_j \) represents the \( j^{\text{th}} \) input sample, and \( b_i \) and \( x_i \) are respectively the bias and the activation input of the \( i^{\text{th}} \) neuron in the first layer.
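A minimal Python sketch of (5-2-1) is given below: because every input is 0 or 1, a weight is simply added when its input is 1 and skipped otherwise. The function name `first_layer_preactivation` and the small integer values, which stand in for raw fixed-point words, are hypothetical.

```python
def first_layer_preactivation(bits, weights, bias):
    """x_i = b_i + sum of w_ij over the inputs j with S_j != 0 (no multiplications)."""
    acc = bias
    for s, w in zip(bits, weights):
        if s:            # input is 1: accumulate the weight
            acc += w     # input is 0: the accumulator is left untouched
    return acc

# Toy example with 5 inputs instead of 96; integers stand in for fixed-point words.
bits = [1, 0, 1, 1, 0]
weights = [3, -2, 5, -1, 4]
bias = 2
x = first_layer_preactivation(bits, weights, bias)
reference = bias + sum(s * w for s, w in zip(bits, weights))
```

The result equals the ordinary multiply-accumulate `reference`, which is why the first layer can be built from an adder and a conditional select alone.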

The 16-bit weights and biases are in fixed-point format with a 1-bit sign and a 1-bit integer part. Since the inputs and target outputs are 1s and 0s, the absolute values of all the weights and biases after training are less than or close to 2, so a 1-bit integer part is enough for this scenario. Minimizing the integer bits leaves maximum room for representing the fractional part of the weights and biases. Although each input is either 1 or 0, the accumulation in the first layer is over the weights and biases of the neurons, i.e., the activations of the first layer are fractional numbers. Thus, the operations ahead of the activation of the neurons in the second and third layers should be in full word length:

\[ x_i = b_i + \sum (a_j \cdot w_{i,j}), \quad (5-2-2) \]
in which \(a_j\) is the activation of the \(j^{th}\) neuron in the previous layer, \(w_{i,j}\) stands for the weight from the \(j^{th}\) neuron in the previous layer to the \(i^{th}\) neuron in the current layer, and \(b_i\) and \(x_i\) respectively represent the bias and the activation input of the \(i^{th}\) neuron in the current layer.
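The multiply-accumulate of (5-2-2) in a 16-bit format with a 1-bit sign and a 1-bit integer part can be sketched in Python as follows. The two's-complement encoding with 14 fractional bits, the saturation on conversion, the wide accumulator, and the final truncating shift are assumptions made for illustration; the names `to_fixed` and `mac_preactivation` are hypothetical.

```python
FRAC_BITS = 14                      # 16 bits = 1 sign + 1 integer + 14 fractional (assumed)
SCALE = 1 << FRAC_BITS
Q_MIN, Q_MAX = -(1 << 15), (1 << 15) - 1

def to_fixed(value):
    """Quantize a real value to a saturated 16-bit raw word."""
    raw = round(value * SCALE)
    return max(Q_MIN, min(Q_MAX, raw))

def mac_preactivation(activations, weights, bias):
    """x_i = b_i + sum_j a_j * w_ij on raw words, returned in the same format."""
    acc = bias << FRAC_BITS         # align the bias with the 28-fractional-bit products
    for a, w in zip(activations, weights):
        acc += a * w                # full-precision product in a wide accumulator
    return acc >> FRAC_BITS         # drop the extra fractional bits

a_raw = [to_fixed(0.5), to_fixed(1.0)]
w_raw = [to_fixed(0.25), to_fixed(-0.5)]
b_raw = to_fixed(0.125)
x_raw = mac_preactivation(a_raw, w_raw, b_raw)   # 0.125 + 0.125 - 0.5 = -0.25
```

With 14 fractional bits the resolution is 2^-14, and the representable range is from -2 up to just below 2, which matches the observation that the trained weights and biases stay below or close to 2 in magnitude.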


**Fig. 5-2-4.** Topology of the proposed patient specific CTDA ANN-CAC.

There are two main types of activation functions used in ANNs. One is the fully non-linear functions, such as Sigmoid, TanH, and SoftPlus. The other is the piecewise-linear functions, e.g., rectified linear unit (ReLU), bipolar ReLU, and leaky ReLU. Each type of activation has its own advantages and disadvantages that affect the training process. For example, the Sigmoid suffers from the vanishing gradient issue, may saturate the gradient, and leads to slow convergence in training. ReLU handles the vanishing gradient well, but it may lead to dead neurons in training. However, these disadvantages are not limitations in the implementation of the ANN once the issues are addressed in the training stage, where the final fixed weights and biases are obtained.

For power-constrained wearable applications, the main consideration in the implementation should be the simplicity of the function. The sigmoid function, as defined in (5-2-3), contains division and exponential operations. To simplify the calculations, the exponential operation can be approximated by (5-2-4) and (5-2-5) [142]. In (5-2-5), floor(x) is the greatest integer not exceeding x and delt is the fractional part of x. Nevertheless, the Sigmoid activation still needs at least one FSM and tens of register shifts for the division, plus several shift operations, one multiplication, and one addition for the exponential approximation, which has a marginal error. Unlike the sigmoid and other fully non-linear functions, the ReLU defined in (5-2-6) requires just a MUX. Hence, ReLU is selected in the implementation.

\[
a_i = \sigma(x_i) = \frac{1}{1 + e^{-x_i}}, \tag{5-2-3}
\]

\[
e^x = 2^{x \log_2 e} \approx 2^{1.5x}, \tag{5-2-4}
\]

\[
2^x = 2^{\text{floor}(x) + \text{delt}} \approx 2^{\text{floor}(x)} (1 + \text{delt}). \tag{5-2-5}
\]

\[
a_i = \text{ReLU}(x_i) = \begin{cases} x_i & \text{for } x_i \geq 0 \\ 0 & \text{for } x_i < 0 \end{cases}. \tag{5-2-6}
\]
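
The difference in arithmetic cost between (5-2-3) to (5-2-5) and (5-2-6) can be illustrated with a minimal Python sketch. It works in floating point, the function names are hypothetical, and it does not model the register-level division mentioned above; it only shows which operations each activation needs.

```python
import math

def approx_exp(x):
    """Approximate e**x using (5-2-4) and (5-2-5)."""
    y = 1.5 * x                  # (5-2-4): x*log2(e) is approximated by 1.5*x
    n = math.floor(y)            # integer part, a shift amount in hardware
    delt = y - n                 # fractional part, 0 <= delt < 1
    return math.ldexp(1.0 + delt, n)   # (5-2-5): 2**n * (1 + delt)

def approx_sigmoid(x):
    """Sigmoid of (5-2-3) built on the approximated exponential."""
    return 1.0 / (1.0 + approx_exp(-x))   # still needs one division

def relu(x):
    """ReLU of (5-2-6): a sign test selecting x or 0, i.e. a MUX."""
    return x if x >= 0 else 0.0
```

The approximated sigmoid still requires a multiplication, an addition, a shift and a division per activation, whereas the ReLU reduces to a selection on the sign of the input.
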
The operating principle of the implementation is shown in Fig. 5-2-4(C). Once the QRS detector reports an R peak, the Top FSM is activated to classify the heartbeat centered at the previous R peak. The Top FSM sends a “start” instruction to the FSM Conditional Accumulating (FSM-CA), which reads the 96-bit input stored in the first address sector of the SRAM and, based on the current input value, controls the MUX to pass either the current weight or the last weight to the input of the adder. If the input is 0, the adder’s inputs and output keep their last state; otherwise, the adder input is updated with the current weight. In this way, no dynamic power is wasted on a 0 input. After 96 accumulations and the bias addition, the activation MUX controlled by the FSM-CA passes the final accumulation to the SRAM if its sign bit is 0; otherwise, it sends 0 to the SRAM. One such cycle executes (5-2-1) and (5-2-6) for one neuron of the first layer. When 32 cycles are done, the FSM-CA handshakes with the Top FSM and goes to sleep. After the handshake, the Top FSM activates the FSM Normal Neuron Function (FSM-NN). The accumulation and activation MUX in the second and third layers are the same as in the first layer, so a single adder and activation MUX are reused for all neurons in the three layers. Unlike in the first layer, the conditional MUX at the input of the adder is replaced by a multiplier to perform the operations in (5-2-2) and (5-2-6). Similarly, a single multiplier is reused for the second and third layers. After 21 multiplication-accumulation cycles (16 for layer 2 and 5 for layer 3), the FSM-NN informs the Top FSM to activate the final FSM Max Out (FSM-MO), which finds the position of the maximum value among the final 5 activations of the network and sets the value at that position to 1 and the others to 0, i.e., it produces the redefined heartbeat type label in Table 5-1-2. Once this is finished, the Top FSM sets the ANN_ready signal to 0 to indicate that the value in the 5-bit output register is ready for reading. Then, the classifier switches to idle mode to save power. Reusing a single multiplier and adder avoids unnecessary leakage power. Thanks to the simplified network operations, the classification finishes in a short time even with a single multiplier and adder. The reserved redundancy sector of the SRAM is for testing purposes, and the entire SRAM is programmable through the on-chip SPI slave module.
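
The first-layer operation described above can be sketched in Python as follows. This is a behavioural illustration under the assumption that, for the binary input, the first-layer sum reduces to adding the weights whose input bit is 1; the function name and the returned addition count are hypothetical and only serve to show where switching activity is avoided.

```python
def first_layer_neuron(bits, weights, bias):
    """Conditional accumulation and ReLU for one first-layer neuron."""
    acc = 0
    additions = 0
    for bit, w in zip(bits, weights):
        if bit:                 # MUX passes the current weight to the adder
            acc += w
            additions += 1
        # bit == 0: the adder inputs are held, so nothing is accumulated
    acc += bias                 # bias addition after all inputs are visited
    activation = acc if acc >= 0 else 0   # activation MUX driven by the sign
    return activation, additions
```

For example, with inputs `[1, 0, 1, 0]`, weights `[3, -5, 2, 7]` and bias `-1`, only two of the four weights are accumulated before the bias is added.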

Note that accumulating a long run of positive or negative values risks overflow. Thus, 3 more bits are given to the integer part in the adder, i.e., the adder data format is 1-bit sign, 7-bit integer and 16-bit fraction. To match the multiplier, a 3-bit extension is added above the most significant bit (MSB) of the multiplication result, as shown in Fig. 5-2-4(B). The activation is taken from the successive 2-bit integer and 14-bit fractional part of the accumulation result.

5.3 Design Considerations

Dynamic power dominates the total power consumption when the chip operates at high speed (i.e., at a high clock frequency). As the clock slows, each classification takes longer, so the leakage contribution per classification increases while the dynamic contribution stays almost the same because the switching activity is constant. A high clock frequency is therefore preferred to minimize the consumption per classification. The clock frequency is limited by the execution speed of the sub-modules, e.g., the SRAM, multiplier and adder, so optimizing the sub-modules is important. The adder is much faster than the multiplier, and the SRAM is a hard IP that cannot be modified. In this design, the default 24-bit Sklansky tree adder is used, and the three-dimensional optimized (TDM) multiplier is enhanced to operate at the same speed as the adder. Besides the speed of the adder and multiplier, an efficient schedule that reduces the total number of clock cycles is necessary. That is, the finite state machines (FSMs) that control parameter reading and arithmetic execution should be well designed. Since thousands of weights need to be stored in the SRAM for processing, a customized SPI protocol with a 48-bit frame per cycle is designed.

5.3.1 SRAM

The block diagram and operating principle of the SRAM are shown in Fig. 5-3-1. The 1024-word by 32-bit SRAM with a single input and output data port has a 10-bit address port ADR[9:0], a 32-bit data-in port Din[31:0], a 32-bit data-out port Dout[31:0], a memory enable port ME, a write enable WE, a read enable OE, and a clock CLK. The memory is activated only if ME is asserted. As shown in Fig. 5-3-1(B), writing happens at the second rising edge of CLK, where ME and WE are both high, and the value of Din is written to the memory location at address 0x002. Assuming the content at ADR 0x008 is 0x00000000, at the third rising clock edge, where both ME and OE are high, the content at address 0x008 is read out to Dout.
The maximum operating frequency of the SRAM is 166MHz, with a typical average reading power of 35μW/MHz and writing power of 38μW/MHz. Due to sub-threshold and junction leakage, the typical average static current is 2.75μA when ME is high. When ME is disabled, the SRAM takes zero power. Thus, it is important to disable ME whenever there is no memory operation.
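
The port behaviour described above can be summarized by a small behavioural model in Python. It is a sketch only: the class and method names are hypothetical, and two points are assumptions not stated in the text, namely that WE takes priority when WE and OE are both high and that Dout holds its last value while ME is low.

```python
class SyncSram:
    """Behavioural model of a 1024 x 32-bit single-port synchronous SRAM."""

    def __init__(self, words=1024, width=32):
        self.mem = [0] * words
        self.mask = (1 << width) - 1
        self.dout = 0

    def rising_edge(self, me, we, oe, adr, din=0):
        """Apply one rising CLK edge and return the value on Dout."""
        if not me:
            return self.dout              # memory disabled: no operation
        if we:
            self.mem[adr] = din & self.mask   # write Din at ADR
        elif oe:
            self.dout = self.mem[adr]         # read ADR out to Dout
        return self.dout
```

Keeping `me` low on every cycle without a memory access mirrors the recommendation to disable ME whenever there is no memory operation.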

5.3.2 SPI

The SPI is customized to a 48-bit frame that also controls the SRAM. As shown in Fig. 5-3-2, 32 bits of the master in slave out (MISO) signal, presented at the positive edges of SCK, carry the SRAM output data Dout. For simplicity of testing, reading and writing share the address represented by the first 10 bits of master out slave in (MOSI): the reading address uses the represented value, while the writing address is always higher than the represented value by one, e.g., 10’h001 for the reading address and 10’h002 for the writing address. In this way, each SPI cycle can both read the value written to the SRAM during the last SPI cycle and write the value represented by MOSI (12th to 43rd bit) to the SRAM via the SPI slave. The last bit of MOSI controls the execution of the classifier, e.g., the value in Fig. 5-3-2 is 1, which triggers the execution of the classifier.

![Timing of customized SPI](image)

**Fig. 5-3-2. Timing of customized SPI.**

During the first 11 cycles, the address is fetched by the on-chip SPI slave module, and the writing address is calculated at the end of the 44th cycle. Two 1024-word SRAMs are used, thus the total number of address bits is 11. The SPI slave assigns the 10-bit ADR to the corresponding SRAM according to the first bit. At the beginning of the 12th cycle (12th negative edge of SCK), memory enable (ME) and read enable (OE) are activated by the SPI slave, so Dout is ready slightly after the end of the 12th cycle (12th positive edge of SCK). Then, the 32-bit Dout is presented sequentially on MISO at the positive edges of SCK for reading by the off-chip SPI master. The data to be written, i.e., Din, follows the address on MOSI and is fetched by the SPI slave at each negative edge. At the beginning of the 45th cycle, all 32 bits of Din are ready, and ME and write enable (WE) are asserted by the SPI slave to write Din to the SRAM. The SPI slave generates the start pulse based on the last valid bit on MOSI, as indicated by the red solid circle in Fig. 5-3-2, i.e., “1” generates the pulse that triggers the classification and “0” does nothing.
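
The MOSI frame layout described in this section can be sketched in Python as below. The function names are hypothetical, and three details are assumptions: the first transmitted bit is treated as the most significant bit of the frame, bits 44 to 47 (not described in the text) are sent as 0, and the write address wraps within the 10-bit range.

```python
def pack_mosi(sram_sel, read_adr, din, start):
    """Pack one 48-bit MOSI frame, first transmitted bit as the MSB."""
    assert sram_sel in (0, 1) and 0 <= read_adr < 1024
    assert 0 <= din < (1 << 32) and start in (0, 1)
    frame = sram_sel                    # bit 1: selects one of the two SRAMs
    frame = (frame << 10) | read_adr    # bits 2-11: 10-bit read address
    frame = (frame << 32) | din         # bits 12-43: data to be written
    frame = frame << 4                  # bits 44-47: assumed unused, sent as 0
    frame = (frame << 1) | start        # bit 48: classifier start flag
    return frame

def unpack_mosi(frame):
    """Recover the fields as the SPI slave would interpret them."""
    start = frame & 1
    din = (frame >> 5) & 0xFFFFFFFF
    read_adr = (frame >> 37) & 0x3FF
    sram_sel = (frame >> 47) & 1
    write_adr = (read_adr + 1) & 0x3FF  # write address = read address + 1
    return sram_sel, read_adr, write_adr, din, start
```

With a read address of `10'h001`, the slave in this sketch writes to `10'h002`, matching the example given for the shared address field.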

5.3.3 Multiplier
The operation of an ANN is based on thousands of multiplications and accumulations. For wearable and mobile applications, a digital multiplier array is not preferred due to the limits on power and area. Thus, optimizing the multiplier is critical for implementing an ANN in wearable devices. The optimization focuses on two aspects: reducing the number of partial products (PP) [178], [179], and shortening the critical path of the PP addition [180], [181]. In this section, the customized TDM multiplier with a three-stage pipeline is presented [182].

To reduce the number of PPs, two encoding schemes are widely used. One is canonical signed digit (CSD), which transforms a zero followed by ones into a one followed by zeros. In CSD encoding, the probability of a digit being zero is about 16% higher than in the two’s complement representation. Thus, the number of partial products can be reduced by 16%. However, CSD encoding is suited only to constant values, since the encoding does not follow a uniform rule, i.e., each number must be manually encoded. In the proposed ANN-CAC, each neuron has individual weights, thus CSD is not a good choice. For a general-purpose multiplier, modified Booth encoding (MBE) is the best choice.

The MBE reduces the number of PPs from \(L_{MR}\) (the length of the multiplier) to \((L_{MR}+2)/2\) or \((L_{MR}+1)/2\) by encoding the radix-2 operation into radix-4 \([179]\), which can be summarized as

\[
S_i = X_{2i-1} \oplus X_{2i}, \tag{5-3-1}
\]

\[
D_i = \overline{X_{2i+1}} \cdot X_{2i} \cdot X_{2i-1} + X_{2i+1} \cdot \overline{X_{2i}} \cdot \overline{X_{2i-1}}, \tag{5-3-2}
\]

\[
N_i = X_{2i+1}, \tag{5-3-3}
\]

\[
Y2_i = \{\{L_{MD}\{D_i\}\} \,\&\, Y,\ 1'b0\}, \tag{5-3-4}
\]

\[
Y1_i = \{1'b0,\ \{L_{MD}\{S_i\}\} \,\&\, Y\}, \tag{5-3-5}
\]

\[
YP_i = (Y2_i \,|\, Y1_i) \oplus \{(L_{MD}+1)\{N_i\}\}, \tag{5-3-6}
\]

where \(X\) and \(Y\) are the multiplier and multiplicand, respectively; \(L_{MD}\) is the length of the multiplicand; \(\{L_{MD}\{D_i\}\}\) stands for the concatenation of \(L_{MD}\) copies of \(D_i\); \(X_i\) is the \(i^{th}\) bit of \(X\); \(S_i\) is the single-\(Y\) indicator used to generate \(Y1_i\); and \(D_i\) is the double-\(Y\) indicator used to generate \(Y2_i\). The \(i^{th}\) single and double \(Y\), \(Y1_i\) and \(Y2_i\), and the \(i^{th}\) sign extension \(N_i\) are used to generate the \(i^{th}\) partial product \(PP_i\) by

\[
PP_0 = \{1'b0,\ \overline{N_0},\ \{2\{N_0\}\},\ YP_0\} \quad \text{for } i = 0, \tag{5-3-7}
\]

\[
PP_i = \{1'b1,\ \overline{N_i},\ YP_i,\ 1'b0,\ N_{i-1}\} \quad \text{for } i \neq 0. \tag{5-3-8}
\]
In this design, the ReLU activation function is selected, so every multiplication in layer 2 and layer 3 has one non-negative input, because the output of ReLU is non-negative by (5-2-6). Thus, an MBE scheme with an unsigned multiplicand and a signed multiplier is designed to generate the PPs. Taking the 16-bit length as an example, the details of the encoding are illustrated in Fig. 5-3-3. First, the multiplier is extended by appending one 0 below the least significant bit and one sign bit (if $L_{MR}$ is odd) or two sign bits (if $L_{MR}$ is even) above the most significant bit (MSB). The number of PPs, $N_{PP}$, is determined by

$$N_{PP} = \begin{cases} \frac{L_{MR}+2}{2} & \text{if } L_{MR} \text{ is even} \\ \frac{L_{MR}+1}{2} & \text{if } L_{MR} \text{ is odd} \end{cases}. \tag{5-3-9}$$

Fig. 5-3-3. Generated PP for 16-bit unsigned multiplicand and signed multiplier.
Because the sign extension of the multiplicand is handled in the encoding, the sign bit of the multiplicand does not need to be stored, so the stored multiplicand gains one more bit of resolution than the original one. The radix-4 Booth encoder derived from (5-3-1) to (5-3-8) is shown in Fig. 5-3-4 [180].
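
The radix-4 recoding can be checked numerically with a short Python sketch. It uses the standard Booth digit \(d_i = -2X_{2i+1} + X_{2i} + X_{2i-1}\), which is the value that the single, double and sign indicators jointly encode, and the number of digits from (5-3-9). The function names are hypothetical, and the bit-level layout of the PPs with their sign-extension constants is not modelled.

```python
def booth_radix4_digits(x, l_mr):
    """Radix-4 Booth digits of a signed l_mr-bit multiplier x."""
    n_pp = (l_mr + 2) // 2 if l_mr % 2 == 0 else (l_mr + 1) // 2   # (5-3-9)

    def bit(k):
        if k < 0:
            return 0             # the 0 appended below the LSB
        return (x >> k) & 1      # Python's arithmetic shift sign-extends x

    # each digit looks at the bit triplet (X[2i+1], X[2i], X[2i-1])
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1)
            for i in range(n_pp)]

def booth_multiply(x, y, l_mr):
    """Sum the partial products d_i * y * 4**i for an unsigned multiplicand y."""
    digits = booth_radix4_digits(x, l_mr)
    return sum(d * y * 4 ** i for i, d in enumerate(digits))
```

For a 16-bit signed multiplier the sketch produces 9 digits, each in the range -2 to 2, so every partial product is 0, ±Y or ±2Y.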

![Fig. 5-3-4. Radix-4 booth encoding diagram.](image)

To shorten the PP addition, a TDM with rearranged half-adder positions is proposed to further shorten the critical path [181], [183]. The TDM takes all the PPs as input and sorts them so that early-ready signals are connected to the slow paths of the half-adders (HA) or full-adders (FA), which shortens the critical path as shown in Fig. 5-3-5. Assume the delay from inputs \(a\) and \(b\) to the outputs is 2 and the delay from the carry input, \(cin\), to the outputs is 1; then the output sum, \(s\), and carry out, \(co\), are ready at 5. Swapping the connections of \(a\) and \(cin\) shortens the ready time to 4, as shown on the right side of Fig. 5-3-5.
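
The effect of the input ordering can be reproduced with a minimal Python sketch. The delays of 2 and 1 are taken from the text, but the signal arrival times used in the example (3, 2 and 2) are hypothetical values chosen only to reproduce the ready times of 5 and 4; the actual values in Fig. 5-3-5 may differ. The function names are also hypothetical.

```python
def fa_ready_time(t_a, t_b, t_cin, d_ab=2, d_cin=1):
    """Time at which the sum and carry-out of a full adder settle."""
    return max(t_a + d_ab, t_b + d_ab, t_cin + d_cin)

def best_assignment(arrivals, d_ab=2, d_cin=1):
    """Route the latest-arriving signal to cin, the fastest input."""
    early1, early2, late = sorted(arrivals)
    return fa_ready_time(early1, early2, late, d_ab, d_cin)
```

With arrival times of 3, 2 and 2, connecting the late signal to `a` gives `fa_ready_time(3, 2, 2)`, whereas `best_assignment([3, 2, 2])` routes it to `cin` and settles one time unit earlier.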

![Fig. 5-3-5. Delay reduction with rearrange of input order.](image)
As shown in Fig. 5-3-6, the TDM divides the PPs into vertical compress slices (VCS) and then arranges the input connection order of the HAs and FAs. Finally, the outputs of the VCSs are added in the output carry propagate adder (CPA). Fig. 5-3-6 shows the details of VCS_{16}. In the original TDM, the HA is placed at the beginning to minimize the vertical delay. As the bit width increases, the horizontal delay dominates the critical path. In this design, the HA is therefore placed at the end, as indicated by the red arrow. According to the simulation, this is 0.8ns faster than the original position.
Although both MBE and TDM effectively reduce the length of the critical path of the multiplier, the multiplier is still slower than the adder. One way to accelerate the multiplication is parallel implementation, but it increases the silicon area and leakage power. Another way is pipelined multiplication, at the price of additional registers, a negligible system delay and added design complexity. Thus, the pipeline is adopted in this design, as shown in Fig. 5-3-7.

In Fig. 5-3-7(a), the TDM is divided into three modules, one per stage, with almost the same propagation delay, $T_d$, by examining the delay information calculated from the port connections and the HA and FA delays. After the division, the outputs of the first-stage module are sorted into three groups: \{s12, c12\} (sum and carry sent to the second module), \{s13, c13\} (sum and carry sent to the third module), and o1, which goes directly to output o12. The outputs of the second module are sorted in the same way. Assuming the registers pass data at the positive edge of CLK, the operating principle of the three-stage pipeline is illustrated in Fig. 5-3-7(b). At the first positive edge of CLK, the first valid pp_1 is passed to register array pp1. At the second edge, the first valid output group s12_1, c12_1, c13_1, s13_1 and o1_1 from the output ports s12, c12, c13, s13 and o1 of module one are passed to the register arrays reg1, reg2, reg7, reg5 and o11, respectively. Meanwhile, pp_1 is passed to pp2, and pp_2 is ready to be passed to pp1. At the next edge, pp_1, pp_2, pp_3, s12_2, c12_2, s13_2, s13_1, c13_1, c13_2, o1_1 and o1_2 propagate to the register arrays pp3, pp2, pp1, reg1, reg2, reg6, reg5, reg8, reg7, o12 and o11. Before this edge, the valid values at s23 and c23 are produced by the second module with pp_1, s12_1 and c12_1 as inputs, i.e., s23_1 and c23_1 are generated before the edge and propagate to reg3 and reg4 at the edge. After the edge, the inputs of the third module are s13_1, c13_1, s23_1 and c23_1. Thus, o3_1 can be generated ahead of the fourth edge. Meanwhile, o2_1 and o1_1 are ready at register arrays o21 and o12. That is, the first result is generated at the fourth positive CLK edge by passing the contents of register arrays o12 and o21 and the wires o3 to the output register. From the above analysis, for continuous multiplications the throughput is 3 times that of the TDM without pipelining. The system delay introduced by the pipeline is 3 clock cycles.
The simulation result shows the designed three-stage pipelined TDM multiplier can operate over 100MHz and takes 0.050mm² area in 0.18μm process.

5.3.4 Top FSM

The Top FSM controls the execution of each sub-module. Its transition diagram is presented in Fig. 5-3-8. After reset, the Top FSM stays in the idle state until the start pulse from the SPI arrives. The start pulse moves the FSM to state start1. In start1, ANN_ready is set to 1 to indicate that the ANN is executing, and at the same time start is set to 1 to trigger the FSM for conditional accumulation (FSM-CA) for the first layer. The state then moves unconditionally from start1 to stop1. In stop1, if ready1=1 (ready1 is generated by the FSM-CA when the execution of the first layer is finished), the FSM moves to start2 to trigger the FSM for normal neuron (FSM-NN) for the second layer; otherwise, it stays in stop1. On receiving ready2=1 from the FSM-NN of the second layer in stop2, the Top FSM knows that the second-layer calculations are done and moves to start3 to trigger the FSM-NN for the third layer. Similarly, the FSM for final max out (FSM-MO) is started on receiving ready3=1 in stop3. Once ready4 is received from the FSM-MO, the FSM moves to ready and ANN_ready is set to 0 to tell the external module that the ANN-CAC is in the idle state.
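
The transitions described above can be written as a small next-state function in Python. This is a sketch under stated assumptions: the state names start4 and stop4 for the FSM-MO phase are hypothetical (the text names only the states up to stop3 and ready), start2 and start3 are assumed to move unconditionally like start1, and the return from ready to idle is assumed.

```python
def top_fsm_next(state, start=0, ready1=0, ready2=0, ready3=0, ready4=0):
    """Next state of the Top FSM for the given handshake inputs."""
    if state == "idle":
        return "start1" if start else "idle"
    if state in ("start1", "start2", "start3", "start4"):
        return "stop" + state[-1]          # unconditional transition
    if state == "stop1":
        return "start2" if ready1 else "stop1"   # wait for FSM-CA
    if state == "stop2":
        return "start3" if ready2 else "stop2"   # wait for FSM-NN, layer 2
    if state == "stop3":
        return "start4" if ready3 else "stop3"   # wait for FSM-NN, layer 3
    if state == "stop4":
        return "ready" if ready4 else "stop4"    # wait for FSM-MO
    if state == "ready":
        return "idle"                      # assumed return to idle
    raise ValueError(state)

def ann_ready(state):
    """ANN_ready is 1 while the network executes and 0 otherwise."""
    return 0 if state in ("idle", "ready") else 1
```
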
5.3.5 FSM for Conditional Accumulation (FSM-CA)

The FSM-CA serves the 32 neurons of the first layer, which need no multiplications. As discussed in Section 5.2.3, the 96-bit input is stored in the first three addresses of the 32-bit SRAM, and the 16-bit weights and biases are programmed sequentially as shown on the right side of Fig. 5-3-9. Since the SRAM is 32 bits wide, each SRAM address stores two weights for better utilization of the memory space. Because the number of biases is much smaller than the number of weights (in this design, there are 3664 weights but just 53 biases), each bias is stored alone in one SRAM address to simplify the control logic. Thus, each SRAM read requires at most two weight accumulations. The ANN-CAC input register input[95:0] is therefore circularly shifted by two bits whenever the SRAM address is increased by 1 after each two-weight read, as illustrated in Fig. 5-3-9. In this way, the two weights at the current address always correspond to the two inputs currently in input[95:94]. In other words, the corresponding inputs (the accumulation conditions) always appear in the fixed register positions input[95:94], enabling the reuse of the conditioning circuits. According to the value of \( \text{input}[95:94] \), the operations can be divided into 4 cases: Case 1) \( \text{input}[95:94]=2'b00 \), the two inputs are zeros, so neither accumulation nor SRAM reading is needed; Case 2) \( \text{input}[95:94]=2'b01 \), one weight from Dout[15:0] (the low 16 bits of the SRAM output) is accumulated; Case 3) \( \text{input}[95:94]=2'b10 \), the other weight from Dout[31:16] (the high 16 bits of the SRAM output) is accumulated; Case 4) \( \text{input}[95:94]=2'b11 \), both weights are accumulated, which requires two additions. To control this data flow, a customized FSM-CA is proposed in Fig. 5-3-10.

![Diagram](image)

**Fig. 5-3-9. Data flow of FSM-CA.**
The FSM-CA has three counters. Counter1 records how many inputs have been fetched. Counter2 indicates how many input shifts have occurred. Counter3 counts how many neurons have been executed. The signal start1=1 from the Top FSM moves the FSM-CA from state idle to sr_in, which prepares the SRAM read. Then, during the following three clock cycles in state r_in, in[95:0] at the first three addresses of the SRAM is passed to the input register input[95:0]. The 4 chains judge1-> add1, judge1-> add2, judge1-> add3, and judge1-> add4-> add5 execute the 4 operation cases based on input[95:94]. Counter2 is always increased by 1 in state judge1 to record one more input shift. When counter2 reaches 48, the accumulations for the 96 inputs are done and the input register has been shifted back to its initial value, ready for the next accumulation cycle. In that case, at the end of the chain, the state goes to rb to accumulate the bias. Otherwise, the state returns to judge1 for the next input shift and operation.

Fig. 5-3-10. Transition diagram for FSM-CA.
In judge1-> add1, there is no SRAM read and no addition, i.e., no dynamic power is spent on accumulation because both inputs are zero. In judge1-> add2 and judge1-> add3, one SRAM read and one accumulation are required. In judge1-> add4-> add5, one SRAM read and two accumulations are needed. In rb, the bias of the current neuron is accumulated; meanwhile, counter2 is reset to zero for the accumulations of the next neuron, and counter3 is increased by 1 to indicate that one more neuron is done. In judge2, if counter3 equals 32, all 32 neurons are done, and the FSM-CA moves to ready1 to set the signal ready1=1, informing the Top FSM that the FSM-CA is done; otherwise, it returns to judge1 for the conditional accumulation of the next neuron.
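
The data flow of the four cases can be sketched for one neuron in Python. The helper names are hypothetical. The sketch assumes that the circular shift moves the register left so that the next two inputs arrive in input[95:94], that the 16-bit weights are two's complement, and it counts only the weight reads (the bias read in rb is not counted).

```python
MASK96 = (1 << 96) - 1

def to_signed16(v):
    """Interpret a 16-bit field as a two's-complement value."""
    return v - 0x10000 if v & 0x8000 else v

def rotate_left_96(value, n=2):
    """Circular shift of the 96-bit input register by n bits."""
    return ((value << n) | (value >> (96 - n))) & MASK96

def fsm_ca_neuron(inp, words, bias):
    """Conditional accumulation over 48 words holding two weights each."""
    acc, weight_reads, adds = 0, 0, 0
    for word in words:                     # counter2 counts these 48 steps
        top2 = (inp >> 94) & 0b11          # input[95:94]
        if top2 != 0b00:                   # case 1 skips the SRAM read
            weight_reads += 1
            if top2 & 0b10:                # cases 3 and 4: Dout[31:16]
                acc += to_signed16(word >> 16)
                adds += 1
            if top2 & 0b01:                # cases 2 and 4: Dout[15:0]
                acc += to_signed16(word & 0xFFFF)
                adds += 1
        inp = rotate_left_96(inp)          # bring the next pair to [95:94]
    acc += bias                            # state rb
    adds += 1
    return max(acc, 0), weight_reads, adds, inp
```

After the 48 two-bit rotations the register returns to its initial value, which is why the same input can be reused for the next neuron without reloading it.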

5.3.6 FSM for Normal Neural Operation (FSM-NN)

Layer 2 and layer 3 use the same FSM for the normal neural operation in (5-2-2), but with different parameters, since the two layers have different numbers of neurons. The FSM-NN involves the three-stage pipelined multiplier, so the lag of the pipelined multiplication must be considered. Taking layer 2 (16 neurons) as an example, the transition diagram of the FSM-NN is shown in Fig. 5-3-11.

Four counters, counter1, counter2, counter3 and counter4, control the signal flow. Each multiplication increases counter1 by 1 until it reaches 4, which means the pipeline delay of the multiplier has passed. Counter2 records the number of weights that have been read. When counter2 reaches 32, the next value from the SRAM is the bias, which is not passed to the multiplier. Counter3 records how many valid multiplication-accumulations have been executed. Due to the pipeline lag, counter2 reaches 32 earlier than counter3, i.e., the final multiplication-accumulation happens after the bias accumulation. Thus, when counter3 reaches 32, all operations for one neuron are done. Counter4 records how many neurons have been executed. Each address stores two weights, with the same memory layout as in Fig. 5-3-9, i.e., each SRAM read returns two weights.

![Transition Diagram for FSM-NN](image)

Fig. 5-3-11. Transition diagram for FSM-NN.

For the first neuron in layer 2, the FSM first runs the loop rds->rd1->rd2->mp1->add1->rds 16 times, after which counter2 equals 32 but counter3 equals 28 due to the delay of the multiplier pipeline. Since counter2 has reached 32, the next SRAM read returns the bias, so the FSM goes through rd1->rd2->mp2->rb2->add3 to read the bias and accumulate it onto the intermediate multiplication-accumulation result. At this point, counter4 is 1 because this is the first neuron, so the FSM returns to rds to continue reading the weights of the next neuron. Due to the pipeline delay, the output of the multiplier still belongs to the last four weights of the current neuron until counter3 reaches 32. Then, the FSM takes the branch rb1->add2->rds to write back the result and reset the parameters.

The other neurons follow the same procedure as the first one, except the last neuron, which loops four times through mp3->add4->mp3 without reading anything, only multiplying and accumulating to obtain the final result. It then goes to rb3 to generate the handshake pulse and write back the result.

5.4 Training of ANN-CAC
The training is conducted on a desktop with an i7-4790 CPU, so the resolution of the multiplier and adder is not limited during training. The quadratic cost, \( C \), is used in each epoch update, as defined by

\[
C = \frac{1}{2n} \sum \sum (a_i - y_i)^2, \quad (5-4-1)
\]

where \( n \) is the number of samples in each mini-batch, \( y_i \) is the expected value at the \( i^{th} \) position of the sample label, and \( a_i \) is the activation of the \( i^{th} \) neuron of the output layer; the first sum is over the \( n \) samples and the second sum is over the output neurons. The cost evaluates the mean square error between the expected label and the activations of the output neurons. Hence, by minimizing the cost, the training drives the output towards the redefined label: the value at the labelled position gets closer to 1 while the others get closer to 0. The max out module therefore finds the position of the maximum value among the output activations to obtain the classified heartbeat label.

A regularization factor, $\lambda$, is introduced in the training process to reduce overfitting [184]. The regularized cost, $C_r$, is defined in (5-4-2) by adding a weight decay term to the cost, where the sum is over all weights, $w$. Meanwhile, a learning rate, $\eta$, is adopted to control the size of each update in the backpropagation.

$$C_r = C + \frac{\lambda}{2n} \sum w^2$$  \hspace{1cm} (5-4-2)
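
As a minimal sketch, the quadratic cost of (5-4-1) and a regularized cost can be evaluated in Python as below. The function names are hypothetical, and the penalty is written in the usual weight-decay form, in which the term \(\frac{\lambda}{2n}\sum w^2\) is added to the cost so that large weights are penalized.

```python
def quadratic_cost(activations, labels):
    """(5-4-1): both arguments are lists of per-sample output vectors."""
    n = len(activations)                       # samples in the mini-batch
    total = sum((a - y) ** 2
                for a_vec, y_vec in zip(activations, labels)
                for a, y in zip(a_vec, y_vec))
    return total / (2 * n)

def regularized_cost(activations, labels, weights, lam):
    """Quadratic cost plus the weight decay term over a flat weight list."""
    n = len(activations)
    penalty = lam / (2 * n) * sum(w * w for w in weights)
    return quadratic_cost(activations, labels) + penalty
```

For two samples with outputs `[[1, 0], [0, 1]]` and labels `[[1, 0], [1, 0]]`, only the second sample contributes to the cost.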

### 5.4.1 Backpropagation

Assume that the input of the $j^{th}$ neuron in layer $l$ has a small change $\Delta x_j^l$. The resulting cost variation, $\Delta C(\Delta x_j^l)$, can be expressed through the partial derivative of the cost with respect to $x_j^l$ (the observed error at this neuron, $\delta_j^l$) as:

$$\Delta C(\Delta x_j^l) = (\partial C/\partial x_j^l) \Delta x_j^l = \delta_j^l \Delta x_j^l.$$  \hspace{1cm} (5-4-3)

Applying the chain rule, $\delta_j^l$ can be rewritten as:

$$\delta_j^l = \sum_m \left( \frac{\partial C}{\partial x_m^{l+1}} \frac{\partial x_m^{l+1}}{\partial a_j^l} \right) \left( \frac{da_j^l}{dx_j^l} \right) = \text{ReLU}'(x_j^l) \sum_m \delta_m^{l+1} w_{m,j}^{l+1}.$$  \hspace{1cm} (5-4-4)

In (5-4-4), the sum is over the neurons in layer $l+1$, and $w_{m,j}^{l+1}$ is the weight from the $j^{th}$ neuron in layer $l$ to the $m^{th}$ neuron in layer $l+1$. The observed error can thus be computed layer by layer, from the last layer back to the first. Since the calculation runs from the output to the input, the procedure is called backpropagation [185].

Relation (5-4-4) holds only for non-output layers. For the output layer, the error is defined by

\[ \delta_j^o = \text{ReLU}'(x_j^o)(\partial C / \partial a_j^o). \tag{5-4-5} \]

Given the relation between the neuron input and the weights and bias defined in (5-2-2), the gradients of the cost with respect to the bias and the weight are respectively given as

\[
\frac{\partial C}{\partial b_j^l} = \left(\frac{\partial C}{\partial x_j^l}\right)\left(\frac{\partial x_j^l}{\partial b_j^l}\right) = \delta_j^l, \tag{5-4-6}
\]

\[
\frac{\partial C}{\partial w_{j,k}^l} = \left(\frac{\partial C}{\partial x_j^l}\right)\left(\frac{\partial x_j^l}{\partial w_{j,k}^l}\right) = \delta_j^l a_k^{l-1}, \tag{5-4-7}
\]

where \(a_k^{l-1}\) is the activation of the \(k^{th}\) neuron in layer \(l-1\), as in (5-2-2).

The updates of the bias, \( \Delta b_j^l \), and the weight, \( \Delta w_{j,k}^l \), are obtained by applying the learning rate according to the gradient descent rule, as expressed in (5-4-8) and (5-4-9), respectively, where \(a_k^{l-1}\) is the activation of the \(k^{th}\) neuron in layer \(l-1\).

\[
\Delta b_j^l = -\eta \left(\frac{\partial C}{\partial b_j^l}\right) = -\eta \delta_j^l, \tag{5-4-8}
\]

\[
\Delta w_{j,k}^l = -\eta \left(\frac{\partial C}{\partial w_{j,k}^l}\right) = -\eta \delta_j^l a_k^{l-1}. \tag{5-4-9}
\]
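
The backpropagation relations of this section can be sketched in plain Python for a single training sample. This is an illustration under stated assumptions, not the training code used in this work: the function names are hypothetical, the cost is taken as \(C = \frac{1}{2}\sum (a_i - y_i)^2\) for one sample (so \(\partial C/\partial a = a - y\)), regularization is omitted, and `weights[l][j][k]` holds the weight from neuron \(k\) of one layer to neuron \(j\) of the next.

```python
def relu(x):
    return x if x > 0 else 0.0

def relu_prime(x):
    return 1.0 if x > 0 else 0.0

def forward(weights, biases, a0):
    """Per-layer inputs x and activations a, following (5-2-2) and (5-2-6)."""
    xs, acts = [], [a0]
    for W, b in zip(weights, biases):
        x = [b_j + sum(w * a for w, a in zip(row, acts[-1]))
             for row, b_j in zip(W, b)]
        xs.append(x)
        acts.append([relu(v) for v in x])
    return xs, acts

def backprop(weights, biases, a0, y):
    """Gradients of the single-sample quadratic cost for all layers."""
    xs, acts = forward(weights, biases, a0)
    # (5-4-5): observed error of the output layer
    delta = [relu_prime(x) * (a - t) for x, a, t in zip(xs[-1], acts[-1], y)]
    grads_w, grads_b = [], []
    for l in range(len(weights) - 1, -1, -1):
        grads_b.insert(0, delta)                                      # (5-4-6)
        grads_w.insert(0, [[d * a for a in acts[l]] for d in delta])  # (5-4-7)
        if l > 0:
            # (5-4-4): propagate the error back to the previous layer
            delta = [relu_prime(xs[l - 1][j]) *
                     sum(delta[m] * weights[l][m][j] for m in range(len(delta)))
                     for j in range(len(xs[l - 1]))]
    return grads_w, grads_b

def sgd_step(weights, biases, a0, y, eta):
    """Apply the updates of (5-4-8) and (5-4-9) in place."""
    gw, gb = backprop(weights, biases, a0, y)
    for W, G in zip(weights, gw):
        for row, grow in zip(W, G):
            for k in range(len(row)):
                row[k] -= eta * grow[k]
    for b, g in zip(biases, gb):
        for j in range(len(b)):
            b[j] -= eta * g[j]
```
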

### 5.4.2 Biased Training

Each record of the MIT-BIH database contains only a small portion of abnormal heartbeats (types S, V, and F). Conventional mini-batch sampling draws all samples with the same probability, so the proportion of the different types in a mini-batch is the same as in the entire training set. Sampling unbalanced training data with even probability therefore produces unbalanced mini-batches (evenly sampled training). Since type N dominates the unbalanced data, the trained weights and biases are strongly shaped by type N beats and only weakly by the abnormal types. Hence, a biased training (BT) process is proposed to address the issue.

The details of the BT are provided in Table 5-4-1.

Table 5-4-1. Details of the proposed biased training.

<table>
<thead>
<tr>
<th>Stochastic gradient descent algorithm with biased sampling update</th>
</tr>
</thead>
<tbody>
<tr>
<td><strong>Required:</strong> Learning rate $\eta$, regularization factor $\lambda$, beat type grouping, minibatch size $m$, total training samples $ts$, and number of training epochs $ne$.</td>
</tr>
<tr>
<td><strong>Required:</strong> Randomly initialize weights $w_{j,k}^l$, and $b_j^l$ with Gaussian distribution and normalize them.</td>
</tr>
<tr>
<td><strong>while</strong> target training accuracy not reached <strong>do:</strong></td>
</tr>
<tr>
<td>Updating times $T \leftarrow \text{floor}(ts/m)$.</td>
</tr>
<tr>
<td><strong>for</strong> 1 to $T$ <strong>do:</strong></td>
</tr>
<tr>
<td><strong>Obtain</strong> $m$ samples with an equal proportion of each beat type.</td>
</tr>
<tr>
<td><strong>Calculate</strong> inputs and activations of each neuron according to (5-2-2) and (5-2-6) by forward propagation.</td>
</tr>
<tr>
<td><strong>Get</strong> cost based on (5-4-1) and (5-4-2).</td>
</tr>
<tr>
<td><strong>Derive</strong> output observed error $\delta_j^o$ from (5-4-5), and the updating amounts for the biases and weights of the output layer, $\Delta b_j^o$ and $\Delta w_{j,k}^o$, from (5-4-6) to (5-4-9).</td>
</tr>
<tr>
<td><strong>Backpropagate</strong> $\delta_j^l$ for non-output layers based on (5-4-4).</td>
</tr>
<tr>
<td><strong>Update</strong> $b_j^l \leftarrow b_j^l + \Delta b_j^l$ and $w_{j,k}^l \leftarrow w_{j,k}^l + \Delta w_{j,k}^l$.</td>
</tr>
<tr>
<td><strong>end for</strong></td>
</tr>
<tr>
<td><strong>end while</strong></td>
</tr>
</tbody>
</table>

As shown in the first step of the **for** loop, the proposed BT guarantees that each beat type takes the same proportion of the $m$ samples. For example, in our CGS, record 208 belongs to (N, V, F). First, the samples in record 208 are divided into three subsets with the individual labels N, V, and F. When collecting the $m$ samples, e.g., with $m$ = 12, the biased training takes 4 samples from each subset to form the balanced $m$ samples for the update. The comparison of CGS-BT with the conventional cross-validation scheme based on evenly sampled training (CV-ET) over 200 training epochs is shown in Fig. 5-4-1. For both methods, $\eta$ and $\lambda$ are selected as 0.001 and 0.0001, respectively. Two notable conclusions can be drawn. First, the CGS-BT converges faster and reaches a lower cost than the CV-ET on the training dataset. Second, with the CGS-BT the cost on the evaluation dataset follows the trend on the training dataset, whereas with the CV-ET the training convergence contributes little to the evaluation result and the evaluation cost even oscillates across the whole 200 epochs. Thus, in cooperation with the CGS, the BT handles the imbalanced database well and shows excellent performance.
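
The balanced sampling step of the BT can be sketched in Python as follows. The function name is hypothetical, and drawing with replacement is an assumption made here so that a rare beat type can always fill its share of the mini-batch; the sampling details of the actual training are not specified in the text.

```python
import random

def balanced_minibatch(samples_by_type, m, rng=random):
    """Draw m samples with the same number taken from each beat type."""
    types = sorted(samples_by_type)
    per_type = m // len(types)            # equal share for every type
    batch = []
    for t in types:
        # assumption: sampling with replacement within each subset
        batch.extend((t, rng.choice(samples_by_type[t]))
                     for _ in range(per_type))
    rng.shuffle(batch)                    # mix the types inside the batch
    return batch
```

In contrast, evenly sampled training would draw the $m$ samples from the pooled record, so the mini-batch would inherit the dominance of type N.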

![CGS-BT vs CV-ET](image)

Fig. 5-4-1. Comparison between conventional training and the proposed biased training.

### 5.5 Experiment Results

During the experiments, all patients in groups (N, V), (N, S), (N, S, V) and (N, V, F) are evaluated. Considering the ECG identity, each patient uses their own ECG data for both training and evaluation without borrowing from others. A randomly selected 70% of the samples of a record is taken as the training data set of the patient, and the remaining 30% is used for evaluation. Taking patient 209 as an example, the record contains 2621 type N and 383 type S beats. Thus, the training set consists of 1835 type N and 268 type S beats, randomly chosen from all type N and type S beats, respectively. The remaining 30%, i.e., 786 type N and 115 type S beats, is used for evaluation.
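
A per-type 70/30 split of one record can be sketched in Python as below. The function name and the seed are hypothetical, and rounding the training count to the nearest integer is an assumption; it is chosen because it yields 1835 and 268 training beats for a record with 2621 type N and 383 type S beats.

```python
import random

def split_record(beats_by_type, train_fraction=0.7, seed=0):
    """Randomly split each beat type of one record into training/evaluation."""
    rng = random.Random(seed)
    train, evaluation = {}, {}
    for beat_type, beats in beats_by_type.items():
        shuffled = list(beats)
        rng.shuffle(shuffled)
        n_train = round(train_fraction * len(shuffled))   # assumed rounding
        train[beat_type] = shuffled[:n_train]
        evaluation[beat_type] = shuffled[n_train:]
    return train, evaluation
```

Splitting each type separately keeps the type proportions of the record in both the training and the evaluation set.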

5.5.1 Verification on FPGA

A Pynq-Z2 board with a Xilinx Artix-7 family FPGA is the target for the verification, as shown in Fig. 5-5-1. The Zynq-7020 chip contains Artix-7 programmable logic and a dual-core ARM Cortex-A9. In this verification, it is programmed via the JTAG protocol and the ARM core is bypassed, as circled by the yellow box. There are in total 4 types of beats to be classified in the MIT-BIH database, i.e., types N, S, V, and F. The 4 LEDs and 4 buttons in the bottom yellow box respectively indicate the classified type and trigger the classification. According to the labels in the database, the ECG segments of the specific patient are stored in the memory sectors N, S, V, and F to serve as the input of the implemented classifier. The trained weights and biases of the patient are programmed via the SPI master and slave into the two instantiated 1K × 32-bit SRAM modules, exactly as in the ASIC implementation.

The 4 buttons transfer the input from the corresponding 4 memory sectors to the classifier in sequence and trigger the classification, as in the bottom yellow box of Fig. 5-5-1. That is, pressing a button is expected to light the LED above it. For example, to verify the accuracy of type N detection, the number of times each LED lights is recorded while the first button is pressed repeatedly. Once the classification of type N is finished, the other buttons are pressed in the same way to obtain the confusion matrix of the classification for the patient. When the current patient is done, the specific weights, biases, and ECG segments of the next patient are reprogrammed. The final confusion matrix is the same as that of the simulations in Modelsim and MATLAB, which will be discussed in Section 5.5.3.

Fig. 5-5-1. Implementation verified on Pynq-Z2 board.

The Artix-7 FPGA on the Zynq-7020 has 13.3K logic slices, 630KB RAM, and 220 DSP slices. In this design, only the logic slices are used, to keep the implementation the same as the ASIC implementation, where a customized multiplier is used for the arithmetic operations. As shown in Fig. 5-5-2, 5269 look-up tables (LUTs), 1024 LUT-based RAMs (LUTRAM), 1331 flip-flops (FFs), and 3 buffer gates are utilized for this implementation. It takes only about 10% of the logic slices thanks to the single multiplier and adder. The power consumption at 2.5MHz is shown in Fig. 5-5-3, where the dynamic power of the classification is about 3mW, much smaller than the device static power. The clock speed can be increased up to 125MHz.

![Fig. 5-5-2. Resource utilization on Artix-7 FPGA verification.](image)

Fig. 5-5-3. Power statistic for the FPGA implementation at 2.5MHz.

### 5.5.2 ASIC Implementation

The design is also implemented in a 0.18µm CMOS process, where a 64Kb SRAM is used for the storage of weights, biases, inputs, and activations. To shorten the paths, two separate 32Kb SRAM blocks are placed symmetrically, as shown at the left side of Fig. 5-5-4. The area is 0.9246mm², of which approximately 2/3 is taken by the SRAM and 1/3 by the FSMs, multiplier, adder, and SPI slave module.

The synthesis, placement, and routing target up to 50MHz from 1.8V to 3.3V VDD with a minimum slack of 100ps. Each classification takes 6298 clock cycles with the single multiplier and adder. The proposed 96×32×16×5 ANN-CAC requires 3717 additions, 592 multiplications, 3 cycles to fetch the input, and 40 cycles to calculate the max-out layer, i.e., theoretically a minimum of 4352 clock cycles can finish the classification if each multiplication and addition takes one clock. That is, there is still 30.9% room for reducing the clock cycles for better speed and energy efficiency. Nevertheless, the implemented design finishes each beat in 629.8ms down to 251.92µs at operating frequencies from 10KHz to 25MHz, as illustrated by the black line in Fig. 5-5-5.

Fig. 5-5-5. Simulated classification speed and power peaks at different clock frequency.
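The cycle and timing figures quoted above follow from simple arithmetic, which the short sketch below reproduces. The function names are illustrative, and the duty-cycle helper is an added assumption (one classification per beat, with the classifier asleep for the rest of the beat period) rather than a quantity reported in this section.

```python
CYCLES_PER_CLASSIFICATION = 6298  # cycle count of the implemented design

# Theoretical minimum if every addition and multiplication took one clock:
# additions + multiplications + input fetch + max-out layer.
minimum_cycles = 3717 + 592 + 3 + 40
headroom = 1 - minimum_cycles / CYCLES_PER_CLASSIFICATION

def classification_time_s(clock_hz, cycles=CYCLES_PER_CLASSIFICATION):
    """Time to classify one beat at the given clock frequency."""
    return cycles / clock_hz

def duty_cycle(clock_hz, heart_rate_bpm=75):
    """Assumed model: fraction of one beat period during which the classifier is active."""
    beat_period_s = 60.0 / heart_rate_bpm
    return classification_time_s(clock_hz) / beat_period_s

print(minimum_cycles)      # 4352
print(f"{headroom:.1%}")   # 30.9%
for clock_hz in (10e3, 500e3, 25e6, 50e6):
    print(f"{clock_hz:>12.0f} Hz: {classification_time_s(clock_hz) * 1e6:12.2f} us, "
          f"duty cycle {duty_cycle(clock_hz):.6f}")
```

At 10KHz the classifier is busy for most of each 0.8s beat period, whereas at 500KHz and above it is active for only a small fraction of the period.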

Table 5-5-1. Current summary at 75bpm heart rate.

<table>
<thead>
<tr>
<th>Frequency (MHz)</th>
<th>0.01</th>
<th>0.05</th>
<th>0.1</th>
<th>0.5</th>
<th>1</th>
<th>10</th>
<th>25</th>
</tr>
</thead>
<tbody>
<tr>
<td>Current (μA)</td>
<td>15.51</td>
<td>8.993</td>
<td>8.18</td>
<td>7.526</td>
<td>7.444</td>
<td>7.371</td>
<td>7.366</td>
</tr>
</tbody>
</table>

Since the circuits and clock tree are fixed to meet the maximum 50MHz frequency and the CAC goes to sleep after the task, a longer processing time means longer leakage but has no effect on the dynamic power, as the total number of clock cycles is the same. As shown in Table 5-5-1, the average current at a 75bpm heart rate decreases as the clock frequency increases. From 10KHz to 500KHz the current reduction is significant, i.e., almost 50%. Beyond 500KHz the current reduction is small. It should be noted that the power peak increases almost linearly with the clock frequency, as shown by the red line in Fig. 5-5-5. A high power peak requires a power management unit (PMU) with strong instantaneous load capacity. Thus, 500KHz is recommended for tasks that are not time sensitive, while 50MHz is the choice for applications that are expected to finish each classification within 126µs.

With a fixed architecture and almost the same arithmetic operations, the energy consumed on the detection of different types is very close, as shown in Fig. 5-5-6. The top plot shows that, at 25MHz, the current for the different types is around 7.35µA and the energy per clock cycle is close to 935pJ. The bottom plot is for 10KHz, where the current and energy per clock cycle are approximately 15.51µA and 1.97nJ. These power and energy results are for a 3.3V supply and decrease with the supply voltage, e.g., at 1.8V the energy per cycle is around 510pJ at 25MHz. With the CTDA signal flow, the remarkable reduction of the network size and arithmetic operations enables a micropower ANN-CAC.

![Energy Summary](image)

Fig. 5-5-6. Energy summary at 25MHz and 10KHz for 75bpm heart rate.

### 5.5.3 Benchmarking

Following the CGS divisions in Table 5-1-3, the records in the non-(N) groups are taken into consideration. During training, randomly selected 70% and 30% of the samples from each record are used for training and verification, respectively. The classification confusion matrix is given in Table 5-5-2, and the accuracy of the proposed classifier on each individual record is summarized in Fig. 5-5-7. Thanks to the CGS, BT, and the plentiful P-QRS-T complex details in the CTDA signal flow, the ACC, SE, and +P for Type N are over 99.3%. The ACC_S, SE_S, +P_S, ACC_V, SE_V, and +P_V reach 99.70%, 97.75%, 94.39%, 99.68%, 98.67%, and 98.07%, respectively. For Type F, the ACC and +P reach 98.87% and 87.80%, although the SE is only 49.03%.

Table 5-5-2. Classification confusion matrix based on simulation and FPGA verification.

<table>
<thead>
<tr>
<th>Detected Label</th>
<th>N</th>
<th>S</th>
<th>V</th>
<th>F</th>
</tr>
</thead>
<tbody>
<tr>
<td>N</td>
<td>59874</td>
<td>150</td>
<td>104</td>
<td>75</td>
</tr>
<tr>
<td>S</td>
<td>57</td>
<td>2609</td>
<td>3</td>
<td>0</td>
</tr>
<tr>
<td>V</td>
<td>67</td>
<td>5</td>
<td>6805</td>
<td>20</td>
</tr>
<tr>
<td>F</td>
<td>23</td>
<td>0</td>
<td>27</td>
<td>684</td>
</tr>
</tbody>
</table>
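The per-type ACC, SE, and +P values can be derived from a confusion matrix with the usual one-versus-rest definitions, as sketched below. The sketch assumes that the rows of Table 5-5-2 are the annotated labels and the columns are the detected labels; this orientation is inferred from the quoted figures and is not stated in the table. The function name `per_class_metrics` is illustrative. The loop covers Types N, S, and V, whose quoted figures the table reproduces under this assumption.

```python
LABELS = ["N", "S", "V", "F"]

# Table 5-5-2, assuming rows = annotated (true) label, columns = detected label.
CONFUSION = [
    [59874, 150, 104, 75],
    [57, 2609, 3, 0],
    [67, 5, 6805, 20],
    [23, 0, 27, 684],
]

def per_class_metrics(matrix, k):
    """One-versus-rest accuracy, sensitivity and positive predictivity (in %) for class k."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                 # true class k, detected as something else
    fp = sum(row[k] for row in matrix) - tp  # other classes detected as class k
    tn = total - tp - fn - fp
    acc = 100.0 * (tp + tn) / total
    se = 100.0 * tp / (tp + fn)
    ppv = 100.0 * tp / (tp + fp)
    return acc, se, ppv

for label in ("N", "S", "V"):
    acc, se, ppv = per_class_metrics(CONFUSION, LABELS.index(label))
    print(f"Type {label}: ACC={acc:.2f}%  SE={se:.2f}%  +P={ppv:.2f}%")
```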

<table>
<thead>
<tr>
<th>Record</th>
<th>Type N (%)</th>
<th>Type S (%)</th>
<th>Type V (%)</th>
<th>Type F (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>N</td>
<td>ACC, SE, +P</td>
<td>ACC, SE, +P</td>
<td>ACC, SE, +P</td>
<td>ACC, SE, +P</td>
</tr>
<tr>
<td>100</td>
<td>99.91, 100, 100</td>
<td>99.91, 100, 100</td>
<td>99.91, 100, 100</td>
<td>99.91, 100, 100</td>
</tr>
<tr>
<td>118</td>
<td>99.96, 100, 100</td>
<td>99.96, 100, 100</td>
<td>99.96, 100, 100</td>
<td>99.96, 100, 100</td>
</tr>
</tbody>
</table>

Fig. 5-5-7. Classification accuracy details for each record in the non-(N) groups.

Table 5-5-3. Classification accuracy comparison.

<table>
<thead>
<tr>
<th>Method</th>
<th>Type V</th>
<th></th>
<th></th>
<th>Type S</th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>ACC</td>
<td>SE</td>
<td>+P</td>
<td>ACC</td>
<td>SE</td>
<td>+P</td>
</tr>
<tr>
<td>Hu et al. [145]</td>
<td>94.8</td>
<td>78.9</td>
<td>75.8</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>de Chazal et al. [147]</td>
<td>93.6</td>
<td>78.9</td>
<td>76.0</td>
<td>N/A</td>
<td>N/A</td>
<td>N/A</td>
</tr>
<tr>
<td>Jiang et al. [148]</td>
<td>98.1</td>
<td>86.6</td>
<td>93.3</td>
<td>96.6</td>
<td>50.6</td>
<td>67.9</td>
</tr>
<tr>
<td>Ince et al. [149]</td>
<td>97.6</td>
<td>83.4</td>
<td>87.4</td>
<td>96.1</td>
<td>81.8</td>
<td>63.4</td>
</tr>
<tr>
<td>Kiranyaz et al. [134]</td>
<td>98.6</td>
<td>95.0</td>
<td>89.5</td>
<td>96.4</td>
<td>64.6</td>
<td>62.1</td>
</tr>
<tr>
<td>This Work</td>
<td>99.6</td>
<td>98.6</td>
<td>98.0</td>
<td>99.7</td>
<td>97.7</td>
<td>94.3</td>
</tr>
</tbody>
</table>

Table 5-5-4. Performance comparison with other implementations.

<table>
<thead>
<tr>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr>
<td>Process</td>
<td>0.13μm</td>
<td>65nm</td>
<td>0.18μm</td>
<td>40nm</td>
<td>0.18μm</td>
</tr>
<tr>
<td>Frequency (Hz)</td>
<td>4K</td>
<td>10K</td>
<td>250-500</td>
<td>1M</td>
<td>10K-50M</td>
</tr>
<tr>
<td>Area (mm²)</td>
<td>1.21</td>
<td>0.112</td>
<td>N/A</td>
<td>0.135</td>
<td>0.925</td>
</tr>
<tr>
<td>VDD (V)</td>
<td>0.9</td>
<td>1</td>
<td>0.5</td>
<td>1</td>
<td>1.8</td>
</tr>
<tr>
<td>ACC_N</td>
<td>Doesn't follow AAMI</td>
<td>86%</td>
<td rowspan="4">Just differentiates P, QRS, and T</td>
<td>95.84%</td>
<td>99.32%</td>
</tr>
<tr>
<td>ACC_S</td>
<td>N/A</td>
<td></td>
<td>89.69%</td>
<td>99.70%</td>
</tr>
<tr>
<td>ACC_V</td>
<td>N/A</td>
<td></td>
<td>78.67%</td>
<td>99.68%</td>
</tr>
<tr>
<td>ACC_F</td>
<td>N/A</td>
<td></td>
<td>N/A</td>
<td>98.87%</td>
</tr>
<tr>
<td>Energy (pJ/cycle)</td>
<td>96</td>
<td>278</td>
<td>1740-870</td>
<td>94</td>
<td>1075-510</td>
</tr>
<tr>
<td>Class Number</td>
<td>2</td>
<td>2</td>
<td>N/A</td>
<td>3</td>
<td>4</td>
</tr>
</tbody>
</table>

As shown in Table 5-5-3, with the help of CTDA, CGS, and BT, this work shows the best classification accuracy compared with other deep learning based classifiers. More importantly, those deep learning methods all run on a workstation or PC. The proposed implementation consumes an average power of 13.34μW with a classification time of 126μs at 1.8V VDD and a 50MHz clock. Compared with other implemented classifiers, this design presents the best classification accuracy with the largest number of classes and acceptable energy consumption, as summarized in Table 5-5-4.

### 5.6 Conclusions

In this chapter, an ultralow power CTDA based ANN-CAC is presented. Because abnormal beats occur rarely and the dataset is unbalanced, CGS-BT is proposed to improve the accuracy. The CTDA signal flow significantly simplifies the ANN topology and the arithmetic operations, especially in the first layer, where only conditional accumulations are needed. For ultralow power implementation, a customized three-stage pipelined TDM multiplier is proposed to replace the multiplier arrays. Besides the multiplier, other on-chip modules, e.g., the SPI slave and FSMs, are customized to cooperate with the SRAM and multiplier to achieve excellent execution efficiency.

The classification performance of the proposed ANN-CAC is verified by both simulation and FPGA. The classification accuracies for Types N, S, V, and F are 99.32%, 99.70%, 99.68%, and 98.87%, respectively. The simulated power of the ASIC implementation is 13.34μW, with a classification time as short as 126μs. Compared with the state-of-the-art pure software implementations, this design shows comparable accuracy while requiring no CPU or GPU. Compared with other hardware CACs, this design shows the best classification accuracy and is capable of classifying all 4 types of beats, at the cost of slightly higher power.

Chapter 6
Conclusions and Future Works

### 6.1 Fulfilled Objectives and Future Works

For the low-cost flexible ECG patches, belts, and other types of wireless wearable ECG sensors, power is the main constraint due to the compact size (or limited battery). For ambulatory ECG recording, high input impedance is necessary to handle the motion artefact for better diagnostic yield. Besides power and input impedance, the International Standards set the requirements on gain variation, CMRR, and noise for ambulatory ECG systems.

In this research, a novel DC-coupled analog frontend based on an FDDA is proposed to achieve high input impedance and high CMRR. In the conventional three-amplifier DC-coupled AFE, both common-mode and differential-mode signals are amplified in the two branches, which is the reason for the poor CMRR. We propose to replace the two branches with an FDDA to suppress the common-mode amplification, leading to a high CMRR of 76dB while maintaining an input impedance of 1GΩ. To improve the noise performance and area utilization, the parasitic capacitance of the large input transistor is reused as part of the gain ratio capacitor, leading to an excellent noise performance of 1.02μV<sub>rms</sub> with a noise efficiency factor (NEF) of 2.55 and a smaller size of 0.405mm². An on-body DC bias is proposed to stabilize the DC bias while handling the DC offset. Verified in a 0.35μm CMOS process, the designed AFE can pick up a clear ECG signal with an electrode distance as close as 2cm under both static and walking conditions. It also records an EEG signal that shows clear oscillations at around 10Hz, i.e., the α-wave observed after eye blinking during EEG testing. Performance testing shows that the power of the AFE is 900nW under 1.8V VDD and the gain variation is smaller than 3%, with the gain programmable from 56dB to 68dB and a fixed bandwidth from 0.4Hz to 120Hz.

High input impedance helps to handle the motion artefact by relaxing the impact of the skin-electrode contact impedance variation (potential divider effect). Note that the contact impedance variation is not the only source of motion artefact; the skin potential variation caused by the deformation of the skin during body movement is another. No research has been reported on how to address the skin potential variation, and it is not addressed in the proposed AFE design either. Thus, in the future, circuits that are tolerant to the skin potential variation should be studied. The designed AFE shows 1GΩ at DC, which can be further improved. A possible solution is to combine impedance boosting with the DC-coupled IA.

Existing QRS detectors can easily achieve over 99% detection accuracy on the MIT-BIH database. In practice, the detection accuracy is much lower than the reported results because of noise-corrupted ECG and differences in the ECG morphology of each individual. Current research on QRS detection for wearable applications focuses on improving the detector's power efficiency. However, complex handcrafted models and parameters are required due to the diversity of the subjects. User-specific methods, which are tailored to each individual, have become the mainstream in ECG classification but are rarely studied in QRS detection.

A personalized QRS detection method is proposed in this study to cover the ECG diversity. We propose to generate an ECG template for each individual, which can be done at first use by recording a 20 second ECG segment. One-target clustering is applied to the segment to extract the user-specific ECG template. After the extraction, the segment under detection is compared against the user-specific template by calculating the Pearson Correlation Coefficient (PCC). Whenever the PCC surpasses a pre-set threshold, a QRS complex centred at the R peak is generated. To enhance the accuracy, adaptive thresholding is also proposed for the real-time updating of the RR interval, QRS template, and threshold.
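The template comparison step can be illustrated with a minimal sketch. It is not the implementation evaluated in this work: it uses a fixed threshold of 0.8 (an assumed value), omits the one-target clustering and the adaptive updating, and assumes that the R peak sits at the centre of the template. The names `pearson` and `detect_qrs` and the Gaussian stand-in for a QRS template are illustrative only.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length windows (0 for a flat window)."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return 0.0 if denom == 0 else float((a * b).sum() / denom)

def detect_qrs(ecg, template, threshold=0.8):
    """Return the centre sample of every window whose PCC with the template exceeds the threshold."""
    ecg = np.asarray(ecg, dtype=float)
    template = np.asarray(template, dtype=float)
    n = len(template)
    pcc = np.array([pearson(ecg[i:i + n], template) for i in range(len(ecg) - n + 1)])
    peaks = []
    i = 0
    while i < len(pcc):
        if pcc[i] > threshold:
            j = i
            while j < len(pcc) and pcc[j] > threshold:
                j += 1                               # end of this above-threshold run
            best = i + int(np.argmax(pcc[i:j]))      # best-matching offset in the run
            peaks.append(best + n // 2)              # report the window centre
            i = j
        else:
            i += 1
    return peaks

# Synthetic check: two pulses of different amplitude on a flat baseline.
k = np.arange(21)
template = np.exp(-0.5 * ((k - 10) / 2.0) ** 2)     # stand-in for a user-specific template
ecg = np.zeros(300)
ecg[50:71] += template
ecg[200:221] += 0.5 * template
print(detect_qrs(ecg, template))                     # [60, 210]
```

Because the PCC is normalised, the half-amplitude pulse is detected as reliably as the full-amplitude one; only the shape of the segment relative to the template matters.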

Evaluated on the 48 records of the MIT-BIH database, the majority 99127 normal beats are detected from the total of 109369 beats. The total false positives and false negatives are 786 and 604, respectively. An average sensitivity of 99.21% and positive prediction of 99.39% are achieved. Compared with other algorithms, the proposed one shows comparable performance. In addition, it is capable of separating the majority QRS type from other types, which is not available in existing algorithms.

The proposed personalized QRS detection is implemented in software only. In the future, an ASIC implementation can be considered. In addition, it would be interesting to explore a CTDA version of the algorithm to obtain better energy efficiency.

The CAC gives patients direct information on their cardiac conditions, enabling a smart ECG sensor. Currently, the design of high-performance CACs is a hot research topic. There are two main branches. The first is the low power hardware implementation of CAC algorithms with linear models or simple machine learning methods, which presents limited detection accuracy but is suitable for wearable applications thanks to its ultralow power consumption. The second is the patient-specific algorithms using neural network based models, where the best detection accuracy can be achieved but more computational power, such as a CPU or GPU, is required.

With the introduction of CTDA signal flow, the use of an LC-ADC to generate event-based samples is becoming more popular in ultralow power ECG sensor research. Different from evenly distributed Nyquist sampling, the number of samples generated by the LC-ADC is proportional to the signal slope. This is preferable in processing ECG signals, since fewer samples are generated on the baseline while more details are available for the useful P-wave, QRS complex, and T-wave.
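The slope-dependent behaviour of level-crossing sampling can be seen in a small discrete-time sketch. It approximates the continuous-time LC-ADC by scanning a densely sampled input, and the function name `level_crossing_events`, the integer amplitude units, and the test waveform are assumptions made for illustration.

```python
def level_crossing_events(samples, lsb):
    """Emit (index, +1/-1) each time the input moves one level (lsb) from the last crossed level."""
    events = []
    reference = samples[0]            # last level that was crossed
    for index, value in enumerate(samples):
        while value - reference >= lsb:
            reference += lsb
            events.append((index, +1))
        while reference - value >= lsb:
            reference -= lsb
            events.append((index, -1))
    return events

# Flat baseline, a steep rise and fall (a crude stand-in for a QRS complex), flat baseline.
baseline = [0] * 50
rise = [100 * k for k in range(1, 11)]           # 100, 200, ..., 1000
fall = [1000 - 100 * k for k in range(1, 11)]    # 900, 800, ..., 0
signal = baseline + rise + fall + baseline

events = level_crossing_events(signal, lsb=50)
print(len(signal), len(events))                  # 120 40
```

All 40 events fall on the 20 steep samples and none on the 100 baseline samples, which is the property that makes the event stream compact for ECG.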

In this study, a CTDA ANN-CAC is proposed to reduce the hardware cost, and thus the power consumption, while achieving classification performance comparable with the software implementations. Two main benefits result from CTDA. First, the total number of samples at the ANN-CAC input is reduced, which dramatically simplifies the ANN structure. Second, because the input samples in a CTDA data stream are pure “1” and “0”, the multiplications of the first layer can be removed. A fully connected three-layer ANN with a structure of 32×16×5 and 96-bit input samples is trained. For such a structure, each classification requires 3717 additions and 592 multiplications. Besides the simplification of the ANN-CAC structure, imbalanced training samples are also considered to improve the classification accuracy. According to the statistics of the MIT-BIH database, the records are divided into groups with sufficient abnormal beats. Then the biased training method is proposed to cooperate with the grouping scheme for quicker training convergence.
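The operation counts quoted for the 96-input 32×16×5 network can be reproduced with the short sketch below. The breakdown it uses (one accumulation per weight, one bias addition per neuron, and no multiplications in the first layer because its inputs are single bits) is an assumption that happens to match the quoted totals; the function name `count_operations` is illustrative.

```python
def count_operations(n_inputs, layer_sizes):
    """Count additions and multiplications for one inference of a fully connected network
    whose first layer receives 1-bit inputs."""
    additions = 0
    multiplications = 0
    fan_in = n_inputs
    for depth, n_neurons in enumerate(layer_sizes):
        additions += fan_in * n_neurons          # one accumulation per weight
        additions += n_neurons                   # one bias addition per neuron
        if depth > 0:                            # binary inputs need no multiplier in layer 1
            multiplications += fan_in * n_neurons
        fan_in = n_neurons
    return additions, multiplications

print(count_operations(96, [32, 16, 5]))         # (3717, 592)
```

Under the same counting, a first layer fed with multi-bit samples would need 96 × 32 = 3072 further multiplications, which shows how much of the arithmetic the binary input removes.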

Several considerations are applied in the hardware implementation of the CTDA ANN-CAC. Firstly, ReLU activation is adopted to simplify the circuits, as this simple activation function achieves inference performance similar to that of complex activation functions with proper training. With ReLU, the outputs of the neurons are non-negative, so the customized Booth encoding saves 1 bit in the design of the multiplier. Since the number of arithmetic operations is fixed, more multipliers and adders result in more area and leakage power. By sharing one multiplier and one adder, we minimize the area while guaranteeing a classification time within 1ms. For the arithmetic operations, 16-bit fixed-point multiplication and 24-bit accumulation are used to guarantee the accuracy and avoid overflow of the accumulation. To resolve the speed limitation of the multiplier, three-dimensional reduction (TDM) for optimizing the addition of partial products and three-stage pipelining are used in the multiplier design, which results in a multiplier with a speed similar to that of the adder. With the customized multiplier, the corresponding control logic for the different layers is developed.
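The data path described above (conditional accumulation in the first layer, ReLU, and a shared multiply-accumulate with a 24-bit accumulator) can be mimicked in integer arithmetic as follows. This is a behavioural sketch only: the number of fractional bits (`FRAC_BITS = 8`), the truncation of each product before accumulation, taking the largest output as the class, and the toy weights are all assumptions, since the fixed-point format is not specified here.

```python
ACC_BITS = 24    # accumulator width quoted in the text
FRAC_BITS = 8    # assumed fixed-point format: 256 represents 1.0

def check_acc(value):
    """Raise if the running sum does not fit a signed 24-bit accumulator."""
    if not -(1 << (ACC_BITS - 1)) <= value < (1 << (ACC_BITS - 1)):
        raise OverflowError("24-bit accumulator overflow")
    return value

def relu(value):
    return value if value > 0 else 0

def first_layer(bits, weights, biases):
    """Binary inputs: add the weight when the input bit is 1, skip it otherwise."""
    outputs = []
    for row, bias in zip(weights, biases):
        acc = bias
        for bit, weight in zip(bits, row):
            if bit:
                acc = check_acc(acc + weight)    # conditional accumulation, no multiplier
        outputs.append(relu(acc))
    return outputs

def dense_layer(inputs, weights, biases, activation=relu):
    """One multiply-accumulate per step, as with a single shared multiplier and adder."""
    outputs = []
    for row, bias in zip(weights, biases):
        acc = bias
        for x, weight in zip(inputs, row):
            acc = check_acc(acc + ((x * weight) >> FRAC_BITS))
        outputs.append(activation(acc))
    return outputs

def classify(bits, layers):
    (w1, b1), (w2, b2), (w3, b3) = layers
    hidden1 = first_layer(bits, w1, b1)
    hidden2 = dense_layer(hidden1, w2, b2)
    scores = dense_layer(hidden2, w3, b3, activation=lambda v: v)
    return scores.index(max(scores))             # assumed max-out: index of the largest score

# Toy 4-input 2x2x2 network with hypothetical weights, for illustration only.
toy_layers = [
    ([[256, 256, -128, 64], [-256, 512, 128, 0]], [0, 64]),
    ([[256, 256], [-256, 256]], [0, 128]),
    ([[128, 0], [256, 0]], [0, 0]),
]
print(classify([1, 0, 1, 1], toy_layers))        # 1
```

The nested loops perform exactly one addition or one multiply-accumulate per step, which mirrors the time-shared use of a single adder and a single multiplier.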

The classification accuracy is verified by simulation in MATLAB, implementation on FPGA, and simulation of the ASIC in a 0.18μm process, and equivalent results are obtained. Evaluated on the 44 records of the MIT-BIH database, the average ACC, SE, +P, and FPR for Types N, S, V, and F are comparable to the current state-of-the-art. Compared with the state-of-the-art software implementations, the design shows the best performance except for Type F.

Implemented on the Artix-7 FPGA, the design takes about 10% of the total FPGA logic resources. The estimated dynamic power of the classification is about 3mW. Implemented in a 0.18μm CMOS process, the design can be operated from 10KHz to 50MHz at 1.8V to 3.3V. The shortest classification time for a patient with a 75bpm heart rate is within 126μs. The average power can be as low as 13.34μW. Compared with other implementations, this design shows the best classification performance with slightly higher power.

The CTDA ANN-CAC has been verified in simulation and FPGA. However, pruning is not conducted during the implementation. It is possible to reduce the power by pruning and sub-threshold techniques.

For now, the three modules are independent of each other. In the next stage, all three modules will be implemented on a single chip to form a smart ECG-on-Chip.

### 6.2 Research Contributions

This research has made several contributions to the building blocks of flexible/wearable ECG sensors. For the ECG analog front-end circuit, we demonstrated a DC-coupled FDDA analog front-end that achieves 1GΩ input impedance, 76dB CMRR, an excellent noise performance of 1.02μV<sub>rms</sub> with a noise efficiency factor (NEF) of 2.55, a small size of 0.405mm<sup>2</sup>, a tuneable gain of 56-68dB, and 900nW of power. The high input impedance and high CMRR are the result of the proposed DC-coupled FDDA circuit topology. The good noise performance and small area are achieved by the proposed parasitic capacitor sharing scheme. The proposed on-body biasing scheme stabilizes the DC offset caused by the skin-electrode interface and eliminates the need for a driven-right-leg circuit, leading to significant power saving. The AFE achieves the best NEF among all existing state-of-the-art ECG AFEs.

For the QRS detector, we proposed a personalized ECG template based detection method, which is capable of tracking the variation in ECG morphology from person to person. This feature enables better detection accuracy and is more robust in dealing with people of different race, gender, age, and weight. The proposed one-target clustering method reduces the computational load of personalized template generation, which makes it suitable for embedding into flexible ECG sensors.

For the cardiac arrhythmia classifier, we introduced the continuous-in-time discrete-in-amplitude signal flow to machine learning, which reduces the number of arithmetic operations by more than 100 times compared with a Nyquist sampling based scheme. The use of a level-crossing ADC to convert the ECG signal into a continuous-in-time discrete-in-amplitude signal results in over 90% input sample reduction. Based on a simple ANN, we are able to achieve more than 98% classification accuracy for all four types of arrhythmias at a power consumption of 13.34μW, the best among all state-of-the-art implementations of cardiac arrhythmia classifiers.

Bibliography


Biomedical Engineering Society] [Engineering in Medicine and Biology, 2002, pp. 2111-2112 vol.3.


[52]. Xiaoyang Zhang, “Ultra low power circuits for wearable biomedical sensors,” Ph.D Theses (Open), Available:
http://scholarbank.nus.edu.sg/handle/10635/119211


