# Implementation of Acoustic Echo Canceller with FPGA

Un-Cheon Lim\*, Dai-Tchul Moon\*

\*Department of Information & Telecommunication Engineering, Hoseo University (Received December 8, 2003; accepted November 2, 2004)

#### Abstract

In this paper, an acoustic echo canceller (AEC) is designed and implemented using VHDL (VHSIC hardware description language). The designed echo canceller employs a pipeline and master-slave structure and is realized on an FPGA. The normalized LMS (NLMS) algorithm is used as the adaptive algorithm. For coefficient adjustment, the stochastic iteration algorithm (SIA), which uses only current residual values, is used. By using the EAB of the FPGA for the transversal FIR filter structure, the number of registers is markedly reduced and the convergence speed is improved compared with existing methods. The designed echo canceller is verified on a test board built for this work. In timing simulation, the echo signal converges after about 1500 samples and ERLE is improved by about 42 dB.

Keywords: Echo Canceller, NLMS, FPGA, EAB, SIA

### I. Introduction

Although one drawback of the FPGA (field programmable gate array) is its low operating speed, much research has been done to overcome this problem, and gate capacity keeps increasing. The acoustic echo canceller is used to remove acoustic echoes in mobile communication systems and remote video conferencing systems. To increase processing speed, the echo canceller is designed in a pipelined structure using the VHSIC (very high speed integrated circuit) hardware description language and is prototyped on an FPGA for reusability. As the main algorithms of the echo canceller, SIA and NLMS (normalized least mean square) are adopted in consideration of system performance and hardware complexity [1]. Processing the echo canceller coefficients and the received data requires many filter taps. The EAB (embedded array block) is used for the transversal FIR filter, which greatly decreases the number of flip-flops. Moreover, using the EAB improves system performance because of its fast access time without routing delay. In this paper, full-duplex communication is not considered, the sampling rate is 8 kHz, and the filter has 256 taps. The system operation is verified with the ModelSim simulator and synthesized with FPGA Express. The FPGA device is an Altera FLEX10K50RC240.

### II. Algorithms for Echo Canceller

Corresponding author: Dai Tchul Moon (dtmoon@office.hoseo.ac.kr) Hoseo University, 336-795 BaeBang-Myeon, Asan-Si, Chungnam, Korea

Figure 1. Structure of SIA

The adaptive filtering method finds optimum values by adjusting the filter coefficients to minimize a cost function, without prior information about the environment. Different choices of cost function lead to different adaptive filtering algorithms. The most common algorithm is the LMS (least mean square), which minimizes the average power of the error [1]. The convergence characteristics of the LMS are well known, and stability is easily achieved with little computation. The convergence speed of the LMS algorithm is strongly affected by the convergence constant $\alpha$: the size of $\alpha$ determines both the convergence and the ERLE value, and $\alpha$ is directly related to the stability of the system. When the input voice signal has power with a large dynamic range, fixing the value of $\alpha$ can degrade system performance. To overcome this problem, the normalized LMS algorithm is used, which is represented by the following equations [1,2].

$$c_{k+1} = c_k + 2\alpha_N E[r(k)a_k] \tag{1}$$

$$\alpha_N = \frac{\alpha}{p_k} \tag{2}$$

$$\hat{e}(k) = a_k \cdot c_k \tag{3}$$

where $\hat{e}(k)$ is the estimated echo signal and $r(k)$ is the residual, that is, the difference between the acoustic echo signal and the output signal of the AEC. $p_k$ is the power of the $k$th input signal, given by:

$$p_{k+1} = (1-\beta)p_k + \beta x_k^2 \tag{4}$$

where $\beta$ is a positive constant less than 1, called the forgetting factor. The convergence constant $\alpha_N$ is adjusted to a proper value at every sampling instant. In many practical applications, the second term of equation (1) is difficult to obtain because it requires an average. In general, the stochastic iteration algorithm is used instead, which omits the process of finding the average value.

$$c_{k+1} = c_k + 2\alpha_N r(k)a_k \tag{5}$$

This form suits many adaptive applications when convergence speed, ERLE (echo return loss enhancement), and hardware complexity are considered. As the coefficient control algorithm in the AEC, the SIA, which uses only the current residual value for the coefficient update, is used. Figure 1 shows the structure of the SIA.

Figure 2. Structure of AEC system
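The update in equations (2) to (5) can be illustrated with a small software model. The following is a minimal sketch, not the authors' VHDL or C model: the function names `nlms_sia_step` and `identify_echo_path`, the uniform random input, and the initial power of 1.0 are assumptions made for illustration. The power recursion is written as an exponentially weighted average with weight $(1-\beta)$ on the previous estimate, which is an assumption about the intended form of equation (4). Because $p_k$ here tracks per-sample power, $\alpha$ has to be small relative to the reciprocal of the number of taps for the update to remain stable.

```python
import random


def nlms_sia_step(coeffs, taps, echo, power, alpha, beta):
    """One coefficient update using only the current residual.

    coeffs: current coefficients c_k
    taps:   most recent input samples a_k, newest first
    echo:   current echo sample
    power:  running input power estimate p_k (must be > 0)
    """
    estimate = sum(c * a for c, a in zip(coeffs, taps))   # eq. (3)
    residual = echo - estimate                             # r(k)
    alpha_n = alpha / power                                # eq. (2)
    new_coeffs = [c + 2.0 * alpha_n * residual * a         # eq. (5)
                  for c, a in zip(coeffs, taps)]
    # Assumed form of eq. (4): exponentially weighted power average.
    new_power = (1.0 - beta) * power + beta * taps[0] ** 2
    return new_coeffs, residual, new_power


def identify_echo_path(path, n_samples, alpha, beta, seed=0):
    """Adapt toward a known (hypothetical) echo path and return the result."""
    rng = random.Random(seed)
    n = len(path)
    taps = [0.0] * n
    coeffs = [0.0] * n
    power = 1.0  # nonzero start so that alpha / power is defined
    residuals = []
    for _ in range(n_samples):
        taps = [rng.uniform(-1.0, 1.0)] + taps[:-1]        # newest sample first
        echo = sum(h * a for h, a in zip(path, taps))      # noise-free echo
        coeffs, r, power = nlms_sia_step(coeffs, taps, echo, power, alpha, beta)
        residuals.append(r)
    return coeffs, residuals


if __name__ == "__main__":
    c, res = identify_echo_path([0.5, -0.3, 0.2, 0.1], 4000, alpha=0.02, beta=0.05)
    print([round(x, 4) for x in c])
```

In this noise-free setting the coefficients approach the assumed echo path, and the residual shrinks as adaptation proceeds.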

### III. Circuit Design of AEC

The proposed AEC consists of a counter circuit, a control circuit, an ALU (adder, multiplier, accumulator, and A/D converter interface), a data RAM, and a coefficient RAM. As shown in Figure 2, the overall system is designed in a pipeline structure for speed-up [3].

The master clock frequency is 4.096 MHz, which is twice the product of the 8 kHz sampling rate and the 256 filter taps (2 × 8 kHz × 256 = 4.096 MHz). The required clock signals are derived with a divider, and the control signals are produced by decoding the clock outputs with combinational logic.

The ALU consists of an adder, a multiplier, an accumulator (ACC), and an A/D converter interface circuit. The adder uses CSA (carry save adder) and CLA (carry look-ahead) structures to reduce carry delay. The multiplier uses the modified Booth's algorithm to reduce the partial-product operations in multiplication, which improves multiplication speed. To handle overflow and underflow in the ACC, an overflow/underflow detector circuit is added [4,5]. For the ADC, an AD7813 with 8/10-bit sampling is used. An A/D converter interface circuit is designed to read the input as 8 MSBs and 2 LSBs. Figure 3 shows the timing chart for the AD7813.

Figure 3. Parallel Port Timing

Table 1. Primitive Reference Count

| SUMMARY             | Shift register | EAB |
|---------------------|----------------|-----|
| A_21MUX             | 9              | 9   |
| CARRY               | 50             | 50  |
| DFFE                | 14444          | 177 |
| INV                 | 2              | 2   |
| LUT                 | 754            | 754 |
| LUT_CARRY           | 50             | 50  |
| Syn_ram_256x10_irou | 0              | 1   |
| Syn_ram_256x18_irou | 0              | 1   |
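As a rough illustration of why Booth recoding speeds up the multiplier, the sketch below assumes radix-4 recoding, the usual meaning of "modified Booth". It is a software model only, and the function names and the 8-bit width in the example are hypothetical choices, not taken from the design. Each pair of multiplier bits becomes one digit in {-2, -1, 0, 1, 2}, so an n-bit multiplier yields n/2 partial products instead of n.

```python
def booth_radix4_digits(multiplier, bits):
    """Recode a two's-complement multiplier into radix-4 Booth digits, LSB first."""
    if bits % 2 != 0:
        raise ValueError("bits must be even for radix-4 recoding")
    pattern = multiplier & ((1 << bits) - 1)  # two's-complement bit pattern
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (pattern >> i) & 1
        b1 = (pattern >> (i + 1)) & 1
        digits.append(b0 + prev - 2 * b1)     # digit in {-2, -1, 0, 1, 2}
        prev = b1
    return digits


def booth_multiply(multiplicand, multiplier, bits):
    """Multiply by summing one partial product per radix-4 digit."""
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    product = 0
    for j, digit in enumerate(booth_radix4_digits(multiplier, bits)):
        product += digit * multiplicand * (4 ** j)  # shifted partial product
    return product


if __name__ == "__main__":
    print(booth_radix4_digits(7, 8))   # four digits for an 8-bit multiplier
    print(booth_multiply(-25, 7, 8))
```

In hardware, the digits ±1 and ±2 correspond to the multiplicand passed through, shifted by one bit, or negated, so no general multiplication is needed to form each partial product.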

The data RAM and coefficient RAM must hold as many entries as there are filter taps. In this paper, the EAB of the FPGA is used to implement the storage of the transversal FIR filter for the received data and the AEC coefficients. As a result, the number of flip-flops is reduced by 14,267 compared with a shift-register implementation. Figure 4 shows the block diagram of the designed AEC.
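The difference between a shift-register delay line and a RAM-based one can be sketched in software. The class below is a hypothetical model, not the authors' circuit: instead of moving every stored sample on each clock, it keeps the samples in an addressable memory and moves only a write pointer, which is the idea that lets a block RAM stand in for a long chain of flip-flops.

```python
class RamDelayLine:
    """FIR delay line held in addressable memory with a moving write pointer."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.head = 0  # address of the newest sample

    def push(self, sample):
        # Only the pointer moves; stored samples stay where they are.
        self.head = (self.head - 1) % len(self.mem)
        self.mem[self.head] = sample

    def tap(self, i):
        """Return the sample delayed by i steps (0 is the newest)."""
        return self.mem[(self.head + i) % len(self.mem)]


def fir_output(line, coeffs):
    """Sum of coefficient times delayed sample, read one address at a time."""
    return sum(c * line.tap(i) for i, c in enumerate(coeffs))


if __name__ == "__main__":
    line = RamDelayLine(4)
    for s in (1, 2, 3):
        line.push(s)
    print(fir_output(line, [1, 10, 100, 1000]))
```

Reading one tap per clock from memory fits a design whose master clock runs at a multiple of the sampling rate times the number of taps.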



Figure 4. Block diagram of designed echo canceller

Figure 5. Block diagram of FLEX10K Device

The FLEX10K device contains EABs (embedded array blocks) for building memory, and each EAB holds 2048 bits. Using EABs improves system performance through fast access time. The FLEX10K50RC240 has 10 EABs, so up to 20480 bits of memory can be implemented. In this paper, 7168 bits of EAB are used: 256×10 bits as data RAM and 256×18 bits as coefficient RAM. A block diagram of the FLEX 10K device is shown in Figure 5 [6].
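The memory figures quoted here can be checked with a short calculation that uses only the sizes stated in the text:

$$
\begin{aligned}
\text{Data RAM} &= 256 \times 10 = 2560 \text{ bits} \\
\text{Coefficient RAM} &= 256 \times 18 = 4608 \text{ bits} \\
\text{Total used} &= 2560 + 4608 = 7168 \text{ bits} \\
\text{Available} &= 10 \times 2048 = 20480 \text{ bits} \\
\text{Utilization} &= 7168 / 20480 = 0.35
\end{aligned}
$$

This agrees with the 35% memory utilization reported for the EAB method in Table 2.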

In this paper, each block is designed in VHDL and logically synthesized. Circuit synthesis is performed with FPGA Express from Synopsys, and the design is prototyped on an Altera FLEX 10K50RC240. The overall circuit is synthesized in structural style. Table 1 shows the primitive reference counts synthesized with the shift register method and the EAB method, and Figure 6 shows the overall circuit synthesized with the EAB method.

### IV. Simulation Results and Design Verification

Figure 6. Synthesized total circuit

Table 2. Device Utilization

| DEVICE SUMMARY      | Shift register | EAB  |
|---------------------|----------------|------|
| Input Pins          | 23             | 23   |
| Output Pins         | 44             | 44   |
| Memory Bits         | 0              | 7168 |
| Memory Utilized (%) | 0%             | 35%  |
| LCs                 | 15122          | 900  |

To evaluate the algorithm used in this paper and verify the operating features of the adaptive filter, a C-model simulation is programmed. For the virtual voice signal, the random number generation function provided by the C language library is used. A virtual impulse response with zero average value is created as the impulse response of the echo path. The flow chart of the C model is shown in Figure 7.

Figure 7. C-modeling flow chart (steps: Initialization; Echo creation; Coefficient update; Sampling signal; Echo estimate creation; Residual = echo - estimate; ERLE computation; Size++; loop test on Size)
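The signal generation steps of the C model can be sketched as follows. This is a hedged illustration, not the authors' program: Python's `random` module stands in for the C library random function, and the function name `make_test_signals`, the uniform distribution, and the sizes used are assumptions. The impulse response is made zero-mean by subtracting its average, and the echo is the convolution of the virtual voice signal with that response.

```python
import random


def make_test_signals(n_samples, n_taps, seed=0):
    """Create a random voice signal, a zero-mean echo path, and the echo."""
    rng = random.Random(seed)
    voice = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

    raw = [rng.uniform(-1.0, 1.0) for _ in range(n_taps)]
    mean = sum(raw) / n_taps
    impulse = [h - mean for h in raw]          # zero average value

    echo = []
    for k in range(n_samples):
        # Convolution; samples before the start of the signal count as zero.
        echo.append(sum(impulse[i] * voice[k - i]
                        for i in range(n_taps) if k - i >= 0))
    return voice, impulse, echo


if __name__ == "__main__":
    voice, impulse, echo = make_test_signals(n_samples=1000, n_taps=256)
    print(len(echo), abs(sum(impulse)) < 1e-9)
```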







Figure 10. The ERLE curve

Table 3. Comparison of simulation results with references

|                   | Ref. [3]      | Ref. [8]   | Ref. [9]  | This paper     |
|-------------------|---------------|------------|-----------|----------------|
| Target technology | Xilinx XC4000 | TMS320C54X | ADSP-2101 | Altera FLEX10K |
| Filter taps       | 8             | 256        | 256       | 256            |
| ADC resolution    | 8 bit         | 14 bit     | 14 bit    | 10 bit         |
| Sample number     | 300           | 1,000      | 1,000     | 1,200          |
| ERLE              |               | 40 dB      | 40 dB     | 83 dB          |

The synthesized circuit is prototyped on the FLEX10K50RC240 and simulated for timing. Simulation is done using ModelSim, covering both block-level simulation and timing simulation of the AEC. The simulation environment generates the echo signals and received signals through virtual echo paths, so that it closely resembles a real environment.

Table 2 lists the device utilization for the shift register method and the EAB method.

Figure 8 shows the residual signal waveform obtained from timing simulation.

In Figure 9, the signal begins to converge at about the 1500th sample.

The performance of an AEC is generally evaluated by ERLE, which is also used in this paper. Equation (6) defines the ERLE, where $e$ is the echo signal and $\hat{e}$ is the estimated echo signal.

$$\mathrm{ERLE} = 10 \log_{10} \left[ \frac{\sum_{i=0}^{N-1} e^2(n-i)}{\sum_{i=0}^{N-1} \left[ e(n-i) - \hat{e}(n-i) \right]^2} \right] \tag{6}$$

The ERLE curve of the designed AEC is shown in Figure 10. The curve is computed from the estimated echo signals and the echo signals obtained from one second of timing simulation. According to the result, a good ERLE of 83 dB is reached after 1,500 sample computations. The simulation assumes no noise. The received-signal inputs are generated as randomly as possible, although some periodicity appears in the signal. We expect similar results in an environment with noise and random signals. Table 3 compares the simulation results with the references.
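Equation (6) translates directly into a few lines of code. The sketch below is illustrative only; the function name `erle_db` is hypothetical, and the caller is assumed to pass the last $N$ samples of the echo and of its estimate as two equal-length lists.

```python
import math


def erle_db(echo, estimate):
    """ERLE in dB over one window: echo energy over residual energy."""
    echo_energy = sum(e * e for e in echo)
    residual_energy = sum((e - eh) ** 2 for e, eh in zip(echo, estimate))
    if echo_energy == 0.0:
        raise ValueError("echo window has zero energy")
    if residual_energy == 0.0:
        return math.inf  # perfect cancellation within the window
    return 10.0 * math.log10(echo_energy / residual_energy)


if __name__ == "__main__":
    echo = [1.0, -1.0, 1.0, -1.0]
    estimate = [0.9, -0.9, 0.9, -0.9]  # 10% amplitude error in every sample
    print(round(erle_db(echo, estimate), 3))
```

A residual at one tenth of the echo amplitude corresponds to an energy ratio of 100, that is, 20 dB.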



(b) FPGA test board

Figure 11. Test board of AEC

[Logic analyzer screen capture: RESID waveform, sample period 16.00 us, 5.00 ms/div, delay 200.0 ms]

(a) Echo signal without echo canceller



(b) Echo removed signal with echo canceller

Figure 12. Residual waveform using Logic Analyzers

### V. Verification with Test Board

To test the designed FPGA, a test board is built and the FPGA is operated on the board. The board test is performed by forming a closed loop in which the residual signal is used as the input signal. The test verifies that echo signals are removed. Figure 11 shows the FPGA test board, and Figure 12 shows the test results of the AEC.

### VI. Conclusions

In this paper, an AEC is designed in a pipeline structure using VHDL and implemented on an FPGA. An Altera FLEX 10K50RC240 is used as the FPGA, and a 256-tap filter is designed using the EAB for input signals sampled at 8 kHz. Overall system performance is improved by the pipeline structure and the EAB of the FPGA, and the number of flip-flops is reduced by 14,267 compared with a shift-register implementation. In timing simulation, the echo signal converges after about 1500 samples and ERLE is improved by about 42 dB. Because of the generality of VHDL and the modularity of the AEC design, we expect the results of this paper to be readily applicable to other application areas, with reduced design time and cost.

#### REFERENCES

1. S. Haykin, Adaptive Filter Theory, 4th ed., Prentice-Hall, 2002.
2. "Implementing a Line-Echo Canceller on the TMS320C54X", Telecommunications Applications, Texas Instruments, April 1997, 2-13, Literature number SPRA188.
3. LK Ting, RF Woods, CFN Cowan, P Cork, C Sprigings, "High-Performance Fine-Grained Pipelined LMS Algorithm In Virtex FPGA", *Proceedings of SPIE 4*, 116, 2000.
4. Louis P. Rubinfield, "A Proof of the Modified Booth's Algorithm for Multiplication", *IEEE Transactions on Computers*, 1975.
5. C.C. Nagendra, M. J. Irwin, and R. M. Owens, "Area-Time-Power Tradeoffs in Parallel Adders", *IEEE Transactions on Circuits and Systems*, 43(10), Oct. 1996.
6. "FLEX 10K Embedded Programmable Logic Device Family Data Sheet", Altera, 2001.
7. P.C. Yip and D.M. Etter, "An Adaptive Multiple Echo Canceller for Slowly Time-Varying Echo Path", *IEEE Trans. Commun.*, 1693-1698, Oct. 1990.
8. T.H. You, "Implementation of Echo Canceller for CDMA Mobile Communication Systems Using a Fixed-Point DSP", Kookmin University, 1999.

## (Profile)

#### •Dai-Tchul Moon

He received the Ph.D. degree in electronic engineering from Korea University in 1987. Since 1984 he has been a professor in the Department of Information & Telecommunication Engineering at Hoseo University. His current interests include VLSI signal processing, ASIC design, VLSI design for wireless communications, and applications of DSP.

#### •Un-Cheon Lim

The Journal of Acoustical Society of Korea, Vol.8, No.6, 1989. Un-Cheon Lim was born in Korea in 1955. He received the B.E., M.S., and Ph.D. degrees from Seoul National University