#### Paper 2017-54-6-11

# Measuring Ultrasonic TOF Using Zynq Baremetal Multiprocessing

## Moon Ho Kang\*<sup>©</sup>

#### Abstract

In this research, the TOF (time of flight) of an ultrasonic signal is measured using Xilinx's Zynq SoC (system on chip). The TOF is calculated from the difference between the times that an RF (radio frequency) reference signal and an ultrasonic signal take to travel a given distance, and the travelling distance is then obtained by multiplying the TOF by the speed of ultrasound in air. For this purpose, an ultrasonic pulse is generated by the Zynq's internal ADC, an FIR (finite impulse response) filter, and a Kalman filter, and an RF reference pulse is generated by an RF interface. Based on baremetal multiprocessing, the Kalman filter and the RF interface are programmed in C on the Zynq's dual processor cores, while the other components are implemented in the Zynq's FPGA, realizing an HW/SW co-design. Compared with a pure HW design, this co-design gave lower resource utilization and a much shorter design time. Vivado IDE (integrated design environment) is used as the design tool to design the whole signal processing system as hierarchical block diagrams.

Keywords: Ultrasonic signal TOF, Zynq SoC, Baremetal Multiprocessing, HW/SW co-design, Vivado

#### I. Introduction

With the development of digital signal processing algorithms, high-performance devices are being developed that enable HW/SW co-design by integrating both an FPGA and processors in a single chip. As an example, the Zynq-7000 SoC (system on chip)<sup>[1]</sup>, which integrates both Xilinx's 7-series FPGA and a dual-core ARM processor, has been applied in fields that need high-performance digital signal processing<sup>[2~6]</sup>. In addition, the Zynq can implement multiprocessing on its dual cores, which enhances its usability. With asymmetric multiprocessing without an OS (operating system), called baremetal, two applications can run independently without interference from an OS<sup>[7~8]</sup>.

In this study, an ultrasonic signal processing system for calculating the ultrasonic TOF (time of flight)<sup>[9]</sup> is built and test results are presented. The system is composed of an ultrasonic transmitter and a receiver. The ultrasonic transmitter periodically produces 40 kHz ultrasonic waves and 2.4 GHz RF (radio frequency) signals. The ultrasonic receiver, consisting of an ultrasonic sensor, an RF module, and a Zynq board, detects both the ultrasonic waves and the RF signals, then generates an ultrasonic pulse and a timing RF pulse. The TOF is calculated from the time difference between these pulses. All design elements are configured on a single Zynq SoC: the ADC interface, FIR filter, absolute value calculator, Kalman filter, RF interface, TOF calculation module, and so on.

<sup>\*</sup> Regular member, Department of Information Communication & Display Engineering, Sunmoon University

<sup>©</sup> Corresponding Author (E-mail: mhkang@sunmoon.ac.kr). Received: February 1, 2017; Revised: March 27, 2017; Accepted: May 23, 2017

Fig. 1. Ultrasonic signal multiprocessing system using Zynq (a) Ultrasonic sender (b) Ultrasonic receiver.

The whole system was designed using baremetal multiprocessing and is divided into three parts: ultrasonic pulse generation, RF pulse generation, and TOF calculation. Components are either implemented on the FPGA using suitable IPs (intellectual property cores)<sup>[2]</sup> or programmed in C on the processor cores. The ultrasonic pulse generation is implemented on the Zynq's FPGA and the first processor core (cpu0), and both the RF pulse generation and the TOF calculation are implemented on the second processor core (cpu1). With this HW/SW co-design, it was possible to obtain both lower resource utilization and a much shorter design time than with the HW design, as well as favorable TOF accuracy. The entire system is designed with Xilinx's Vivado IDE (Integrated Design Environment)<sup>[10]</sup> in the form of hierarchical block diagrams.

#### II. Ultrasonic signal multiprocessing system

Fig. 1 shows the block diagram of the entire system. The ultrasonic transmitter consists of an ultrasonic sender module and an RF module, and the ultrasonic receiver consists of an ultrasonic sensor, an RF module, and a Zynq-7010 board<sup>[11]</sup>. The transmitter periodically transmits a 40 kHz ultrasonic signal and a 2.4 GHz RF signal towards the receiver.

#### 1. Ultrasound pulse generation

When the ultrasonic signal arrives at the ultrasonic receiver, it is amplified and then sampled by the Zynq's internal 12-bit ADC, which has a maximum sampling frequency of 1 MHz. The sampled signal is applied to an FIR band-pass filter with a center frequency of 40 kHz to remove the dc offset. The signal is then rectified by an absolute value calculator (ABS) and filtered by a Kalman filter to remove ripples, which yields the ultrasonic wave envelope<sup>[12]</sup>. Finally, an ultrasonic pulse is generated by comparing the ultrasound envelope with a reference level. Components from the ADC interface to the ABS module are implemented on the FPGA, and the Kalman filter is programmed in C on cpu0.
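The chain from rectification to pulse generation can be illustrated with a minimal Python sketch. It is not the paper's C code: the random-walk state model, the initial covariance, the synthetic 40 kHz burst, and the names `kalman_envelope` and `comparator` are all assumptions made for illustration. Only the filter order (1), the R/Q values (2500/1), and a 0.05 V reference level are taken from the paper.

```python
import math

def kalman_envelope(rectified, q=1.0, r=2500.0):
    """First-order (scalar) Kalman filter, assuming a random-walk state model."""
    x, p = 0.0, 1.0  # initial estimate and error covariance (assumed values)
    out = []
    for z in rectified:
        p = p + q            # predict: state unchanged, uncertainty grows by Q
        k = p / (p + r)      # Kalman gain for a unit measurement model
        x = x + k * (z - x)  # correct the estimate with the new sample
        p = (1.0 - k) * p    # update the error covariance
        out.append(x)
    return out

def comparator(envelope, vref):
    """Return 1 where the envelope exceeds the reference level, else 0."""
    return [1 if v > vref else 0 for v in envelope]

FS = 1_000_000  # sample rate in Hz (assumed for this sketch)
F0 = 40_000     # ultrasonic carrier frequency in Hz

# Synthetic band-pass filter output: silence, then a 40 kHz burst (hypothetical data)
bpf_out = [0.2 * math.sin(2 * math.pi * F0 * n / FS) if n >= 500 else 0.0
           for n in range(3000)]

rectified = [abs(v) for v in bpf_out]    # ABS stage
envelope = kalman_envelope(rectified)    # ripple removal
pulse = comparator(envelope, vref=0.05)  # ultrasonic pulse
first_high = pulse.index(1)              # sample index where the pulse first goes high
```

With a large R/Q ratio the filter behaves like a slow low-pass stage, so the pulse goes high some samples after the burst actually starts; that lag depends on the reference level.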

#### 2. RF pulse generation and TOF calculation

To calculate the TOF at the receiver, the start time of the ultrasonic wave transmission must be known, so an RF signal is transmitted along with the ultrasound. Ignoring the propagation delay of the RF signal in air, the TOF can be calculated from the difference between the arrival times of the ultrasound and the RF signal at the receiver. For this purpose, an RF pulse is generated synchronously with the arrival of the RF signal; this is programmed in C on cpu1. Fig. 2 shows a flow diagram of the baremetal multiprocessing on the Zynq: cpu0, as the master, is programmed to generate the ultrasonic pulse, and cpu1, woken by cpu0, is programmed to initialize the RF module and calculate the TOF along with the travelling distance.
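As a rough illustration of this calculation, the following Python sketch takes the TOF as the time between the rising edges of the RF pulse and the ultrasonic pulse and converts it to distance. The speed of sound (343 m/s, roughly 20 °C air), the 1 µs sample period, the pulse data, and the helper names are assumptions and do not come from the paper.

```python
SPEED_OF_SOUND = 343.0          # m/s in air at about 20 degrees C (assumed value)
SPEED_OF_LIGHT = 299_792_458.0  # m/s, used only to estimate the ignored RF delay

def rising_edge(pulse):
    """Index of the first 0 -> 1 transition, or None if there is none."""
    for i in range(1, len(pulse)):
        if pulse[i - 1] == 0 and pulse[i] == 1:
            return i
    return None

def tof_and_distance(rf_pulse, us_pulse, sample_period_s):
    """TOF from the two rising edges, and the distance travelled by the ultrasound."""
    t_rf = rising_edge(rf_pulse)
    t_us = rising_edge(us_pulse)
    if t_rf is None or t_us is None:
        return None
    tof = (t_us - t_rf) * sample_period_s
    return tof, tof * SPEED_OF_SOUND

# Hypothetical pulses: RF edge at sample 100, ultrasonic edge 6700 samples later
rf = [0] * 100 + [1] * 50 + [0] * 7850
us = [0] * 6800 + [1] * 1200
tof, dist = tof_and_distance(rf, us, 1e-6)

# RF propagation delay over the same distance, which the method neglects
rf_delay = dist / SPEED_OF_LIGHT
```

For a path of a couple of meters, the neglected RF delay is on the order of nanoseconds, while the ultrasonic TOF is on the order of milliseconds, which is why ignoring it has no practical effect on the result.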



Fig. 2. Flow of baremetal multiprocessing.

All blocks of Fig. 1 are designed using Xilinx Vivado, and the outputs of the blocks are collected in real time by an integrated logic analyzer (ILA) IP<sup>[13]</sup> and transferred to a PC. Fig. 3 shows the PCBs of the ultrasonic transmitter and receiver.

#### III. System design by Vivado

#### 1. Vivado system schematic

Fig. 4 shows the system schematic designed in the Vivado IDE. The front-end XAdc\_SysMon block interfaces with the Zynq's built-in ADC and samples the received ultrasonic signal. The FIR\_BPF block implements the FIR band-pass filter. The ABSfnc block calculates the absolute value, and the Kalman\_Rf block performs the Kalman filtering along with the RF interface. Table 1 shows the major design specifications of each system component. Fig. 5 shows the internal structure of the Kalman\_Rf block of Fig. 4. The Zynq's dual-core processor block (ZYNQ\_PS7) accepts the output of the ABSfnc block via the axigpio module and performs the Kalman filtering, the ultrasound and RF pulse generation, and the TOF and travelling distance calculations. For the remaining blocks, please refer to reference<sup>[12]</sup>.

Fig. 3. Ultrasonic transmitter and receiver PCB.

Table 1. System design specifications.

| Component                     | Specification                                                 |
|-------------------------------|---------------------------------------------------------------|
| ADC (@Z-7010)                 | sampling: 333 kHz; input channels: 14                         |
| FIR BPF                       | window: Hamming; BW: 35~45 kHz; sampling: 1 MHz; order: 100   |
| Kalman filter                 | order: 1; R/Q: 2500/1; execution cycle: 0.5 µs                |
| Ultrasonic sender (MA40S4S)   | nominal freq.: 40 kHz; input volt.: 18 Vp-p (square)          |
| Ultrasonic receiver (MA40S4R) | nominal freq.: 40 kHz; input volt.: -                         |
| RF module (CC2500)            | 2.4 GHz ISM/SRD band transceiver                              |

R: measurement noise covariance. Q: process noise covariance.
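To make the FIR BPF entry of Table 1 concrete, the sketch below designs a band-pass filter with the listed parameters (Hamming window, 35~45 kHz, 1 MHz sampling, order 100) using the textbook windowed-sinc method. The paper does not state how its coefficients were generated, so the method and the function names `bandpass_taps` and `gain` are assumptions.

```python
import cmath
import math

def bandpass_taps(order, f_lo, f_hi, fs):
    """Windowed-sinc band-pass taps with a Hamming window (order + 1 taps)."""
    mid = order / 2.0
    taps = []
    for n in range(order + 1):
        m = n - mid
        if m == 0:
            ideal = 2.0 * (f_hi - f_lo) / fs  # center tap of the ideal response
        else:
            ideal = (math.sin(2 * math.pi * f_hi * m / fs)
                     - math.sin(2 * math.pi * f_lo * m / fs)) / (math.pi * m)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / order)  # Hamming
        taps.append(ideal * window)
    return taps

def gain(taps, f, fs):
    """Magnitude of the filter's frequency response at frequency f."""
    return abs(sum(h * cmath.exp(-2j * math.pi * f * n / fs)
                   for n, h in enumerate(taps)))

taps = bandpass_taps(100, 35e3, 45e3, 1e6)
gain_dc = gain(taps, 0.0, 1e6)      # dc offset should be strongly attenuated
gain_40k = gain(taps, 40e3, 1e6)    # carrier should pass
gain_100k = gain(taps, 100e3, 1e6)  # out-of-band component
```

The taps are symmetric, so the filter has linear phase. This sketch does not normalize the taps for unity passband gain; a practical design would usually rescale them.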

#### 2. File building procedure

Fig. 6 shows the file building procedure during system design. After the entire system schematic is created, a bitstream file is generated to configure the FPGA part of the Zynq. Then a first stage boot loader (FSBL) is generated, which is responsible for loading the application C code into the two cores, cpu0 and cpu1. After the application code is written, the memory areas for cpu0 and cpu1 are defined in linker script files. Finally, the FSBL, the bitstream file, and the application code are integrated into a boot image, from which an executable file (mcs file) is generated and downloaded onto the flash memory of the target board.

Fig. 5. Internal configuration of Kalman\_Rf block.

#### IV. Test and analysis

System specifications are shown in Table 1 of Section III. The ADC samples the 40 kHz ultrasonic signal at a sampling frequency of 333 kHz. The FIR filter's bandwidth is 35~45 kHz and its sampling frequency is set to 1 MHz. The Kalman filter's R and Q are set to 2500 and 1, respectively, and the execution cycle of the Kalman filtering is about 0.5 µs. The RF module is built around a 2.4 GHz transceiver<sup>[14]</sup>.
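The effect of the chosen R/Q ratio can be estimated with a short calculation. The sketch below assumes a scalar random-walk state model with a unit measurement model, which the paper does not state explicitly, and computes the steady-state Kalman gain for R = 2500 and Q = 1. The function name `steady_state_gain` is hypothetical.

```python
import math

def steady_state_gain(q, r):
    """Steady-state gain of a scalar Kalman filter with a random-walk model.

    The predicted covariance P satisfies P**2 = q * P + q * r in steady state.
    """
    p_pred = (q + math.sqrt(q * q + 4.0 * q * r)) / 2.0
    return p_pred / (p_pred + r)

k = steady_state_gain(1.0, 2500.0)

# In steady state the filter is x += k * (z - x), a one-pole low-pass filter
# whose time constant, in filter updates, is:
tau_samples = -1.0 / math.log(1.0 - k)
```

Under these assumptions the gain settles at about 0.02, so the filter acts as a smoother with a time constant of roughly 50 updates. A larger R/Q ratio would remove more ripple at the cost of a slower envelope rise.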

#### 1. Resource utilization

Fig. 7 shows the schematic for detecting the ultrasonic envelope, along with the percentage utilization of the Zynq FPGA resources needed to implement it, where the Kalman filter block is coded in C on cpu0 and the other blocks are implemented on the FPGA (HW/SW co-design). Fig. 8 shows an alternative schematic in which all elements are designed on the FPGA (HW design) and a low-pass filter (FIR\_LPF) replaces the Kalman filter of Fig. 7. According to the utilization results, DSP slice (DSP48) utilization in the HW design is double that of the HW/SW co-design, and, considering the other resources as well, the HW/SW co-design is more efficient than the HW design in resource utilization.

Fig. 6. System file generation procedure.

Fig. 7. Ultrasonic-envelope detecting with Kalman filter and FPGA utilization (HW/SW co-design).

Fig. 8. Ultrasonic-envelope detecting with Low Pass Filter and FPGA utilization (HW design).

#### 2. Design period

Fig. 9 shows the steps of the system design tasks. The dotted lines indicate the paths of the HW design and the solid lines show the paths of the HW/SW co-design. In the HW design, every redesign requires synthesis, implementation, and generation and download of the bitstream, whereas in the HW/SW co-design only the build and download of the C application file are needed. Table 2 shows the measured times for these tasks. The results show that the HW/SW co-design is much more efficient than the HW design in terms of work time. The PC used in this analysis has a CPU clock of 2.5 GHz.

Fig. 9. Flow of system design tasks.

Table 2. Time required for each system design task.

| design type     | tasks          | periods [s] | sum [s] |
|-----------------|----------------|-------------|---------|
| HW design       | Synthesis      | 186         | 479     |
| HW design       | Implementation | 197         |         |
| HW design       | Bitstream gen. | 96          |         |
| HW/SW co-design | Rebuild        | 3           | 3       |

Xilinx Vivado 2014.1, Intel Core 2.5 GHz

Fig. 10. System output waveforms. (a) ADC (b) BPF (c) ABS (d) Kalman filter (e) RF pulses (f) Ultrasonic pulses.

Fig. 11. Distances according to comparator reference levels (Vref) [V]. (a) 0.1 (b) 0.075 (c) 0.05 (d) 0.025 (1000 samples, 15 ms/sample)

#### 3. TOF/Distance measurement

Fig. 10 shows the output waveforms of the ultrasonic processing system of Fig. 4, collected over 32 ms by the ILA. Waveforms (a) and (b) show the outputs of the ADC and the BPF, respectively. Waveform (c) shows the output of the ABS. The ultrasonic envelope, (d), is the result of filtering (c) with the Kalman filter. Waveforms (e) and (f) show the RF and ultrasonic pulses, respectively, where the ultrasonic pulses are generated by comparing the envelope waveform (d) with a reference level. The TOF is calculated from the time difference between the rising edges of an RF pulse and an ultrasonic pulse. Fig. 11 shows the travelling distances of the ultrasonic waves, calculated by multiplying the TOF by the velocity of ultrasound in air. With the ultrasonic transmitter and receiver placed about 2.3 meters apart, distances are calculated 1000 times with a 15 ms sample time for each of several reference levels. The figure shows that the measured distance varies within about 3 cm depending on the reference level, and increases as the level increases. Future research will aim at more accurate measurements by compensating for the variation due to the comparison level and by designing digital filters.
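The dependence of the measured distance on the reference level can be put in perspective with a small calculation. The Python sketch below converts the roughly 3 cm spread into a timing error and adds a simple linear-ramp model of the envelope to show why a higher reference level reports a longer distance. The speed of sound (343 m/s), the ramp model, and the peak and rise-time values are assumptions chosen for illustration, not measurements from the paper.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed value)
CARRIER_HZ = 40_000.0   # ultrasonic carrier frequency

def distance_to_delay(distance_m):
    """Timing error that corresponds to a given distance error."""
    return distance_m / SPEED_OF_SOUND

def crossing_offset_m(vref, peak_v, rise_time_s):
    """Extra distance reported when a linearly rising envelope crosses vref.

    Hypothetical model: the envelope ramps from 0 to peak_v in rise_time_s.
    """
    return (vref / peak_v) * rise_time_s * SPEED_OF_SOUND

# About 3 cm of spread expressed as time and as carrier cycles
spread_delay = distance_to_delay(0.03)
spread_cycles = spread_delay * CARRIER_HZ

# Offsets for the four reference levels of Fig. 11, with hypothetical
# envelope parameters (0.2 V peak, 250 microsecond rise time)
offsets = [crossing_offset_m(v, peak_v=0.2, rise_time_s=250e-6)
           for v in (0.025, 0.05, 0.075, 0.1)]
```

A 3 cm spread corresponds to a timing difference of a little under 90 µs, or about three and a half carrier cycles. In the ramp model the offset grows in proportion to the reference level, which is consistent with the trend reported for Fig. 11 and suggests that a level-dependent correction could be a reasonable form of compensation.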

#### V. Conclusion

In this study, an ultrasonic TOF measuring system based on baremetal multiprocessing on the Zynq SoC is proposed and its test results are presented. The system is designed with Vivado and implemented using only a single Zynq SoC. System components are either implemented on the Zynq's FPGA or programmed on the dual processor cores (cpu0 and cpu1), so an HW/SW co-design is realized. With this baremetal HW/SW co-design, lower resource utilization and a much shorter design time were obtained than with the HW design. In addition, both the TOF accuracy and the factors that degrade it were analyzed from the output waveforms.

#### REFERENCES

- [1] "Zynq-7000 All programmable SoC overview," DS190 (v1.6) Xilinx, December 2, 2013.
- [2] M. J. Sarmah and C. Murphy, "Implementation of signal processing IP on Zynq-7000 AP SoC to post-process XADC samples," XAPP1203 (v1.0) Xilinx, April 2014.
- [3] P. Wehner, M. Ferger, D. Gohringer and M. Hubner, "Rapid prototyping of a portable HW/SW co-design on the virtual zynq platform using SystemC," IEEE 26th International Conference on SOC (SOCC), pp. 296–300, 2013.

- [4] S. Gilliland, P. Govindan, T. Gonnot and J. Saniie, "Performance evaluation of FPGA based embedded ARM processor for ultrasonic imaging," IEEE International Ultrasonics Symposium (IUS), pp. 519–522, 2013.
- [5] H. P. Bruckner, C. Spindeldreier and H. Blume, "Energy-efficient inertial sensor fusion on heterogeneous FPGA-fabric/RISC system on chip," Seventh International Conference on Sensing Technology (ICST), pp. 506–511, 2013.
- [6] A. Astarloa, J. Lazaro, U. Bidarte, A. Zuloaga and M. Idirin, "System-on-Chip implementation of Reliable Ethernet Networks nodes," 39th Annual Conference of the IEEE Industrial Electronics Society, IECON, pp. 2329–2334, 2013.
- [7] A. Schmidt, "Profiling bare-metal cores in AMP systems," System, Software, SoC and Silicon Debug Conference, pp. 1–4, 2012.
- [8] J. McDougall, "Simple AMP: bare-metal system running on both Cortex-A9 processors," Application Note: Zynq-7000 AP SoC, Xilinx, Jan. 24, 2014.
- [9] J. C. Jackson, R. Summan, S. M. Whiteley, S. G. Pierce, and G. Hayward, "Time-of-flight measurement techniques for airborne ultrasonic ranging," IEEE Trans. Ultrason. Ferroelectr. Freq. Control, Vol. 60, no. 2, pp. 343–355, 2013.
- [10] Vivado design suite user guide, programming and debugging, Xilinx, Apr. 2014.
- [11] ZYBO reference manual, digilent, Feb. 2014.
- [12] B. G. Lim and M. H. Kang, "HW/SW co-design for an ultrasonic signal processing system using Zynq SoC," *Journal of The Institute of Electronics Engineers of Korea*, Vol. 51, no. 8, pp. 148–155, August 2014.
- [13] Integrated Logic Analyzer v6.0, LogiCORE IP product guide, Vivado design suite, Nov. 2015.
- [14] CC2500 low-cost low-power 2.4 GHz RF transceiver, Datasheet, TI, 2014.



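#### About the author

Moon Ho Kang (regular member)

- 1990: M.S. in Electrical Engineering, Korea University
- 1995: Ph.D. in Electrical Engineering, Korea University
- Present: Professor, Department of Information Communication & Display Engineering, Sunmoon University
- Research interests: mobile and embedded systems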