# 1. Introduction

The goal of Blind Source Separation (BSS) is to separate source signals only using the mixtures, which has been widely used in the fields of audio, biomedical, digital communications, and array signal processing [1-3]. Recently, many researches focus on the Underdetermined Blind Source Separation (UBSS) problems when the number of observations is less than the sources [4, 5], in which conditions the Independent Component Analysis (ICA) methods are no longer applicable. Instead, “two-step” method is introduced, which estimates the mixing matrix first and then separates the sources [6, 7].

Obviously, mixing matrix estimation is the first and crucial step in the whole process of UBSS, and some classical methods have been developed, such as Sparse Component Analysis (SCA) and statistical-based methods, etc. SCA methods aim to exploit the sparse nature of sources in Time-Frequency (TF) domain and then use the clustering algorithms to estimate the mixing matrix, which have some requirements for sparsity [8, 9]. Differently, statistical-based methods further exploit the mutual statistical independence between sources, and formulate the problem in terms of a decomposition of a tensor (such as Canonical Decomposition), which has closed form solutions in some cases [10, 11]. For example, X. Luciani et al. [12] formulated the Second Characteristic Function (CAF) of the observations, and blindly estimated the underdetermined mixing matrix successfully using tensor decomposition. However, even if the mixing matrix has been estimated, the original signals are still not easily recovered in the underdetermined condition because the model has more unknowns than equations. To solve the ill-conditioned problem, additional assumptions must be set, then sources can be extracted by minimum norm solution using lp -norm criterion [13], matrix diagonalization [14], and subspace method [15], etc.

Among the existing algorithms, Single-Source-Point (SSP) detection [16-18] is a kind of simple and efficient method for mixing matrix estimation, which achieves good performance in both over and underdetermined cases. Specifically, it sets no conditions on the stationarity, independence or non-Gaussianity of the sources, which made it be widely used. F. Abrard et al. [19] first introduced the concept of single-source regions, and developed the Time-Frequency Ratio of Mixtures (TIFROM) algorithm for instantaneous mixtures. Based on this, M. Puigt et al. [20] got two extensions, which take both attenuation and delay into account. However, TIFROM-based methods have their limitations. They suppose single-source TF regions can be found, but if only discrete SSPs exist, the performances will turn worse.

Recently, exploitation of prior information and latent component has been proved to be quite powerful for blind separation in some complicated conditions. V. G. Reju [17] found that the directions of real and imaginary parts of Fourier transform coefficients of the same source should stay the same when the sources are linear instantaneous mixed. In general, Reju’s method is simple and practicable, but it works only when the elements of mixing matrix are real-valued, which limits their applications. Lihui et al. [18] determined another criterion to identify SSPs. After that, binary masking and K-means clustering methods are implemented to get the clustering centers. However, K-means algorithm needs to know the number of sources to ensure the accuracy, which is not able to be guaranteed in non-cooperative environment.

In this paper, a new mixing matrix estimation method for UBSS from delayed mixtures is introduced. Firstly, we exploit prior information carried on by the special structured complex mixing matrix, and then derive the new criterion for SSPs detecting. Then, modified Agglomerative Hierarchical Clustering (AHC) is used for automatic clustering and mixing matrix calculating. Finally, subspace projection method is used to extract the original signals.

The main contribution of this paper is to propose a new algorithm based on SSP detection for complex-valued mixing matrix estimation. Comparing with the existing algorithms, the proposed prior-aided algorithm plays a more rigid role in SSPs detecting, which achieves better performance for mixing matrix estimation, especially in noisy environment.

The remainder of this paper is organized as following. Section 2 provides problem formulation. Section 3 studies the priors and derives the criterion for SSPs detecting. In section 4, AHC is introduced for mixing matrix estimation. Section 5 proposes the separation method based on subspace. Simulation results are given in section 6. Section 7 concludes this paper.

# 2. Problem Formulation

## 2.1 Signal mixing model

In this paper, the anechoic mixing model is used taking the time delay of sources into account. Uniform Linear Antenna (ULA) is used here to receive the signals, as shown in Fig. 1. It is a delayed mixture model by considering the time delay τ . ϕn is the incident angle of n-th signal. d is the element spacing, which is set to d = λ/2 , where λ is wavelength.

**Fig. 1.**Uniform Linear Antenna with M elements

The received mixture at i-th element by linear delayed mixing can be written as

where

xi(t) observed i-th mixture signal, sn(t) n-th source signal, wi(t) additive white Gaussian noise at i-th element, τin time delay for n-th source transmitting to thei-th element, and for ULA antenna, τin = (i-1)d cos ϕn / c.

For narrow-band signals, (1) can be changed into

where fn is the carrier frequency of each source signal [14]. Particularly, if source signals are real-valued, Hilbert transformation must be applied first to get the analytic signals. Then, rewrite (2) in the form of matrix operation

where x(t) = [x1(t), x2(t), ⋯, xM(t)]T ∈ ℂM,

s(t) = [s1(t), s2(t), ⋯ sN(t)]T ∈ ℝN and

w(t) = [w1(t), w2(t), ⋯, sN(t)]T ∈ ℂM. A ∈ ℂM×N is the complex-valued mixing matrix with element e−j2πfcτin.

## 2.2 Short-Time Fourier Transform (STFT)

Short-time fourier transform is adopted to exploit the sparsity of signals in TF domain. The STFT of source and mixture signals are defined as

where h(t) is the window function.

Considering that A is a constant, and applying STFT to (3) at both sides, the mixing model in TF domain becomes

## 2.3 Assumptions

The sources are supposed to be non-Gaussian and statistically independent. In the underdetermined case, the number of sources is larger than the number of sensors, i.e., N > M . To separate all the source signals successfully, the ill-conditioned UBSS model should satisfy the following assumptions.

Assumption 1: For each source, there exist some TF points where the source exists alone. This assumption is necessary only for mixing matrix estimation since we develop the algorithm based on SSP detection.

Assumption 2: Any M × M submatrix of the mixing matrix A is of full rank. This assumption helps us in the mixing matrix estimation. To maintain this, sources are assumed to come from different directions in the simulation.

Assumption 3: The number K of sources occur at any TF point is less than the number M of observations, i.e., K < M . This assumption is required for original signals extraction using subspace projection method.

# 3. Proposed SSPs Detection Method

## 3.1 Prior information exploitation

Prior information can help to improve the performance especial in the complicated environments. Usually, they are derived from the analysis of source signals, which cannot be acquired in the premise of “blind”. Consequently, we exploit the information from the receiver end.

In section 2, time-delayed mixture is introduced. Different from the commonly studied instantaneous mixtures, mixing matrix A in (3) is a complex-valued with a specific structure, which can be expressed as

The objective of mixing matrix estimation is to calculate all the elements in (6). The column vector of mixing matrix corresponds to every source signal, which is

where ai,n denotes the element at i-th row and n-th column of A . Obviously, all the elements at the first row are equal to 1, whereas the others can be written as complex numbers, with modulus equal to 1. Let ai,n = Ri,n + jIi,n , then the prior information can be achieved as

Different from the commonly referred “prior information” exploited from the internal characteristics of sources, (8) is derived only using the parameters of receiver antennas, which is feasible in practice.

## 3.2 Proposed method for SSPs detecting

To estimate the mixing matrix, we detect the SSPs in the TF domain where only one source is active first, and then classify them by clustering method to achieve the estimation of mixing matrix. In this section, a new SSPs detection based on prior information is introduced as follows.

Temporarily ignoring the noises, (5) can be expressed as

Suppose TF point (t1, f1) is a SSP, where only signal sn(t) is active, then (9) becomes

Apply (7) into (10), and considering the particularity of a1n equals to 1, we get

Obviously, Xi(t1, f1) ,Sn(t1, f1) and ai,n = Ri,n + jIi,n are all complex numbers. Rewrite (11) in the form of complex multiplication as

where Re( X ) and Im( X ) denote the real and imaginary part of X , respectively. So

Apply (13) into (14) we can get

Then, considering that Ri,n2 + Ii,n2 = 1, (15) can be rewritten as

which can be changed to the form of matrix operation as

Obviously, only Ri,n and Ii,n are unknown, which can be calculated by (18) directly. Each SSP corresponds to a vector [Ri,n , Ii,n]T, which satisfies ai,n = Ri,n + jIi,n = 1 under ideal conditions. However, because of the noise and computational errors, |ai,n| may not equal to 1 strictly, then the threshold ε (which is very close to 0) is introduced, and the SSPs should satisfy the following inequation as in (19).

## 3.3 Feasibility analysis

Remark 1: Multi-Source-Points (MSPs) do not satisfy (19).

Derivation in Section 3.1 proves that criterion (19) holds when a TF point is SSP. However, the actual SSPs detection is a reverse process, that is, a TF point is determined to be a SSP if it satisfies (19). Before doing this, we have to ensure MSPs will not meet the criterion.

Suppose sαp(t), (p = 1,2,⋯ Nα) are the Np sources occur at TF point (t2, f2). That is

Then, similar with (12), (21) can be got as

So that,

Then, as for 2 ≤ i ≤ M , following equation can be got

Suppose (19) also holds at MSPs, equations like (16) can also be got with the help of (22), that is

It is obvious that (23) and (24) hold at the same time only when the following conditions are satisfied.

Considering that Re2(ai,αq) + Im2 (ai,αq) = 1 and Re2(ai,αp) + Im2 (ai,αp) = 1, (26) can be obtained as

Eq. (26) reveals that there are two different column vectors with same values, that is, bαp = bαq, (p ≠ q). It is impossible because Assumption 2 ensures that the mixing matrix proposed in (4) is column full rank, which means the sources come from different directions. Therefore, (26) can not meet, and MSPs can not satisfy criterion (19).

Remark 2: Algorithm can be applied to wideband signals.

For wideband signals, (2) is no longer exist, apply SFTF to (1) directly, so that

Let t' = t −τin, (27) can be changed into

Rewrite (28) in the form of matrix operation as

where Ai,n (f) = e−j2πfτin. Different from (5), the mixing matrix A(f) is not a constant in the whole TF domain, whereas a function of f instead. However, given a fixed TF point (t2, f2), A(f) is determined immediately. Then, (29) is equivalent to the standard narrowband model, which means that the proposed SSPs detection method is also available for wideband signals.

Definition 1: Any TF point (t0, f0) is determined as a SSP only when criterion (19) is satisfied. And SSPs can be collected as a series of data vector as

The only thing to note here is that the proposed SSP detection method works under the condition with at least two observed signals, i.e., M ≥ 2 , which makes sure (30) can be used to get the data for clustering.

# 4. Mixing Matrix Estimation Based on AHC

Every detected SSP corresponds to a matrix element a i,n (2≤ i ≤ M, 1≤ n ≤ N), and it is a complex of which modulus value is 1. They should distribute on a unit circle when setting X and Y axis correspond to the real and imaginary part respectively, and show a clustering feature to indicate different sources. After applying clustering method, clustering centers can be obtained, which denote different elements of mixing matrix. Once estimate all the matrix elements , the mixing matrix can be formulated as

K-means clustering method is commonly used for its fast computation and high accuracy especially for large data. However, it needs to know the clustering number and initial cluster centers before hand, which is difficult in noncooperative environments. It may converge to a local solution if inappropriate initialization is used. Besides, K-means method ascribes every scatter point to one class including the scatter points caused by noises, which undoubtedly increases the estimation error. Fig. 2 shows the distribution of SSPs. C1, C2, and C3 refer to the actual cluster centers. In fact, scatter points usually can not be avoided, which usually locate far from the true centers. They may be caused by noises or calculation errors, which should be discarded.

**Fig. 2.**Schematic plot of SSPs’ distribution histogram.

To overcome the shortage of general clustering methods, a special agglomerative hierarchical clustering is used for automatic clustering. The so-called “agglomerative” refers to the algorithm initialization. It initializes every SSP as a class and merges the two closest clusters at each step, and results in a series of classes. At last, the final clusters are determined after removing the scattered classes caused by noises or computation errors. Here, two parameters are adopted, th_d denotes the minimum Euclidean distance to determine when the clustering is completed, and th_N is used to drop discrete error classes. In this way, the number of sources is estimated together with the elements of mixing matrix. To further eliminate the influence of noise, the singular value decomposition method is applied, which takes the singular vectors with maximum singular values as the estimation of mixing matrix elements. The advantage of the AHC method is that in the non-transition merger cases, the noise points have their own clusters and discarded at last, which reduces the interferences for center calculating.

Suppose Ul = [Rl, Il]T (l=1,2⋯, L) denotes the detected L SSPs. Then, some important parameters are defined as follows. Pl denotes different classes with Nl elements, that is, Pl = {U1, U2, ⋯, UNl}. Define the center of Pl and distance between every two classes as

The detailed algorithm for mixing matrix estimation is provided in Table 1.

**Table 1.**Mixing matrix estimation based on SSP and AHC

# 5. Source Signals Extraction

After estimating the mixing matrix, source separation algorithm is then used to extract the original signals. However, source signals can not be directly acquired since we can not calculate the inverse matrix of mixing matrix in the underdetermined case. The UBSS model is ill-conditioned, which has an infinite number of possible solutions exactly. However, after exploiting the sparsity of sources and under some assumptions, it can be solved successfully. Aissa-El-Bey et al. [15] first introduced the subspace projection method for signal extraction supposing the number of signals appeared at any TF point was less than the number of sensors.

Suppose the number of active source at TF point ( t0, f0 ) is K, and the signals are sα1(t), ⋯, sαK(t), respectively, K

The orthogonal projection matrix Q for sα1 (t), ⋯, sαK (t) projecting to noise space is calculated by

In real environment, because of the noises and calculation errors, Qai (i ∈ {α1, ⋯, αK}) are not strictly equal to 0, whereas very close to 0 instead. Considering that A has been estimated, the column vectors of AK can be detected by minimizing

Then the real mixing matrix at TF point (t0, f0 ) can be formed as AK = [aβ1, ⋯, aβK].

The STFT values of K sources at (t0, f0) are estimated by pseudo-inverse calculation

At last, transform back to time domain using inverse STFT.

In general, the subspace-based method works well in most cases. However, the authors [21] have proved that it may “over-estimated”, which means the estimated mixing matrix at TF point (t0, f0) is larger than the actual ones. To solve the problem, a modified subspace method based on convex model is proposed by replace (36) with new objective function

where γ is the balance factor, which is set to 0.4 in the simulation.

# 6. Simulation and Results

Estimation error EA and average Signal to Interference Ratio (SIR) are adopted to measure the performance of mixing matrix estimation and signal extraction, respectively, as in (39).

where N is the number of sources, A and are the actual and estimation of mixing matrix. ||•||F calculates the Frobenius norm. In the following simulations, speech signals are used. The time window length of STFT is 32, overlapping with 0.75, FFT size is 256, and threshold for SSPs detecting is ε = 0.001. Set th_d and th_N to 0.3 and 90, respectively.

## 6.1 Performance of mixing matrix estimation

In this part, ULA with 2 sensors is adopted for simplicity. Three speech signals are taken as sources, and the DOAs are π/8, π/4 and 5π/8, respectively, SNR is 25dB, the actual mixing matrix can be calculated using (6) as

Fig. 3 shows the scatter plot of detected SSPs. Theoretically, all the column vectors of mixing matrix should be different when setting different DOAs, and then clustering methods can be used to distinguish. In Fig. 3(a), all the SSPs locate on a unit circle, and obvious clustering characteristic is shown according to the SSPs’ density of distribution. More points stay around the actual centers. Fig. 3(b) shows the SSPs’ distribution after AHC, with most discrete points removed. Clearly, the proposed method can cluster the SSPs into three groups automatically, and the estimated centers almost have the same position as the actual centers. The estimated mixing matrix is as in (41), with EA equals to -19.5298dB.

**Fig. 3.**Performance of AHC for SSPs: (a) SSPs before clustering; (b) SSPs after AHC. Blue and red points represent the estimated and actual centers, respectively.

Fig. 4 shows the performance of mixing matrix estimation with different SNRs, in comparison with the TIFROM [20] and the algorithms proposed by Reju [16] and Lihui [18]. When the SNR increases, the estimation errors of all the algorithms reduce. Lihui’s method seems a little different, it achieves the best performance with large SNRs, but the degradation is obvious in low SNRs conditions. It is reasonable, because the scatter points caused by noises greatly reduced in high SNR conditions, which is more convenient for clustering. On the contrary, lower SNR results in more error scatter points, and K-means method ascribes every scatter point to one class, which undoubtedly increases the estimation error. The algorithm proposed in this paper has the most stable and reliable performance, with error changing less than 5dB when SNR decreases from 40dB to 15dB. It outperforms the ones developed by Reju and TIFROM, with about 6dB error reduction when SNR is lower than 30dB. Particularly, the proposed method plays a stricter rule in detecting the SSPs by exploiting prior information of mixing matrix, which makes it more effective in low SNR environments. The advantage is more obvious when the number of sources increases, and the optimal SNR range expands.

**Fig. 4.**Performances with different SNRs, M=2: (a) N=3; (b) N=4.

Fig. 5 describes the performance with different number of sources. The increase of source number results in the estimation error expansion. Lihui’s method has the least estimation error when N=2, but when N increases, its performance drops together with the others. The proposed method has better performance when the source number continues to increase, and gets the lowest EA at last. In general, our method has significant advantages in low SNRs and more source environment, which make it more reliable in real applications.

**Fig. 5.**Performances with different number of sources, M=2. (a) SNR=15dB; (b) SNR=25dB.

## 6.2 Performance of source signal extraction

In this part, the number N of source is four and the number M of sensor is three. The DOAs of the signals are π/11, 3π/11, 5π/11 and 7π/11, respectively. Fig. 6 shows that all the signals have been reconstructed successfully. The top four plots show the original signals; the middle three plots represent the three mixtures and the bottom four plots describe the estimation of sources by the proposed algorithm and subspace method. The recovered waves almost have the same shapes as the original ones.

**Fig. 6.**Separation results with speech signals, SNR=25dB. (a)-(d) are three original signals; (e)-(g) are two observed mixtures; (h)-(k) represent the estimated source signals.

Fig. 7 shows the SIR comparison of different methods. It is clear that the proposed method achieves highest SIR when SNR is less than 30dB. It is reasonable because our method estimate the mixing matrix more accurately in the low SNR environments, which is consistent with the result expressed in Fig. 4(b). In particular, the performance tends towards stability when the SNR is higher than 30dB, which almost have the same performance as Lihui’s method.

**Fig. 7.**SIR comparison with different SNRs.

# 7. Conclusion

This paper focuses on the underdetermined blind source separation from linear time-delayed mixtures, and a new complex-valued mixing matrix estimation method based on prior information exploitation is developed. The proposed method relaxes the sparsity of signals as long as there are SSPs in TF domain, no matter continuous or discrete, and does not need the number of sources before hand, which makes it more reliable in the real non-cooperative environment. Simulation results indicate that the proposed method can estimate the mixing matrix and extract the original sources with higher accuracy, and has significant advantages especially in low SNR and more sources conditions. In the future work, our attention should be focused on exploiting more effective methods for underdetermined blind source separation in the non-disjoint cases. Another trend for blind source separation is to extract more prior information about true nature, morphology or structure of latent variables or sources, to get better performance.