1. Introduction
Automatic modulation recognition is an important technique for identifying unauthorized users in non-cooperative communications. In addition, accurate recognition of the modulation type provides essential technical support for parameter estimation, demodulation, and interference suppression of received signals [1]-[4]. Modulation recognition has therefore been widely studied since it was first proposed.
Existing automatic modulation classification techniques mainly fall into two categories: maximum-likelihood detection methods and feature-based classification methods. In [5], J. B. Tamakuwala proposes a maximum-likelihood-based modulation recognition method; however, its computational complexity is very high, and it requires considerable prior information. The authors of [6] propose a novel decision-theoretic approach to modulation classification for MIMO systems in which the channel state information (CSI), the number of transmit antennas, and the noise variance may be unknown. This approach reduces the amount of prior information required by the maximum-likelihood method, but the computational complexity is still not effectively addressed. Therefore, more and more scholars address modulation recognition with pattern recognition methods, which remove the reliance on prior information. A pattern recognition method mainly consists of two stages: feature extraction and feature classification. In [7] and [8], the extracted features include instantaneous features, high-order cumulant (HOC) features, and cyclic spectral features. HOC features can effectively suppress additive white Gaussian noise (AWGN), and cyclic spectral features also offer good noise immunity owing to their cyclostationary characteristics. In addition, some scholars have made important contributions to research under non-Gaussian noise environments [9]-[10]. The authors of [11] use an adaptive weighted myriad filter to preprocess the impulse noise in the received signal and propose an evolutionary neural network based on the quantum elephant herding algorithm (QEHA) to classify the modulated signal. These methods usually perform well in single-input single-output (SISO) systems [12]-[13], but the space-time aliasing caused by the MIMO channel seriously degrades their effectiveness. Therefore, traditional pattern recognition methods can no longer adequately meet the needs of modulation recognition in MIMO systems.
Recently, deep learning (DL) has been widely used in modulation recognition due to its robust feature extraction and accurate classification capabilities. T. O'Shea and J. Hoydis [14] first combined a convolutional neural network (CNN) with the modulation recognition task and achieved far better performance than traditional algorithms, a milestone in the research on modulation recognition. Different from conventional pattern recognition methods, feature extraction evolves from handcrafted features to neural network features. The sampled data used in modulation recognition mainly include the in-phase/quadrature (I/Q) information of the signal and the constellation map obtained from the amplitude-phase mapping. In [15]-[18], Long Short-Term Memory (LSTM) networks and the Convolutional Long Short-Term Deep Neural Network (CLDNN) are proposed to take the signal I/Q information as input. Y. Mao [19] proposed a graph neural network based on constellation map information that achieves excellent performance in phase-modulated signal recognition. However, all these methods require a large number of training samples, and the models are computationally intensive. Moreover, existing networks use SISO received signals as the sample dataset for recognition; the modulation characteristics of signals received in MIMO systems are severely faded and cannot be reliably classified by these networks. Therefore, the modulation recognition problem in MIMO systems requires a novel network that can not only recognize the signal modulation type with high accuracy but also keep the network complexity low.
In response to the above problems, Y. Wang proposed a cooperative decision algorithm based on a CNN in [20]. The cooperative decision algorithm effectively reduces large errors in judging the modulation type, but it also produces divergent recognition results among different sub-decisions, which severely limits the improvement of recognition at low SNRs and makes it difficult to reach 100% accuracy even at high SNRs. In [21], a CNN-based zero-forcing (ZF) equalization method was proposed for MIMO systems. However, this approach is strongly affected by the accuracy of channel prior information estimation.
Summarizing the above problems, this paper designs a lightweight network that depends little on prior information by fully considering the two critical factors of a MIMO system: classification accuracy and algorithm complexity. The input samples of the lightweight network are the I/Q information of the signals received by all antennas. Additionally, it has been found in [18]-[23] that adding an attention module to a deep learning network can significantly improve classification accuracy. The attention module effectively guides the network to extract the features needed for classification, just as humans consciously focus on essential knowledge. Therefore, this paper designs a two-dimensional interactive attention mechanism (TDIA). The TDIA module extracts the time-series interaction information and the channel interaction information, where different channels correspond to different receiving antennas. Then, the dimensional interactive lightweight network (DilNet) designed in this paper introduces depth-wise separable convolution (DSC) to reduce the complexity of the residual structure. The DSC drastically reduces the complexity of the algorithm at the cost of a small loss in classification accuracy, which greatly improves the lightness of DilNet. Finally, DilNet is trained with the penalized statistical entropy loss function. The simulation results show that DilNet can accurately classify the modulation types while greatly reducing complexity. The main contributions of this paper are as follows.
• The TDIA extracts the cooperative features of the signals received by different antennas. The experimental results show that the TDIA module efficiently extracts the interaction information of the spatial and channel dimensions. The extracted network features thus have two-dimensional interaction characteristics, which significantly improves the classification accuracy of digitally modulated signals in MIMO systems.
• We design a dimensional interactive lightweight network (DilNet) with four residual layers and reduce the complexity of the convolutional operations through depth-wise separable convolution (DSC). The TDIA module is embedded in DilNet. Compared with existing networks, DilNet has higher classification accuracy and lower complexity, although it usually requires more training epochs to obtain a good classification model.
• We define a novel way of calculating the loss function. This method uses penalized statistical entropy to accumulate the output information of each residual layer and weight it. Compared with calculating the loss function by cross-entropy alone, the proposed method significantly improves the classification accuracy of DilNet.
2. MIMO Signal Model And Dataset Generation
In Fig. 1, the MIMO system consists of six processes: random generation of symbols, baseband modulation, power normalization, signal format reconstruction, antenna transmission, and antenna reception. The receiver guarantees random reception of all transmitted signals.
Fig. 1. MIMO system signal generation
2.1 MIMO System And Signal Generation
Digital signal modulation obtains different modulated signals by controlling the frequency, amplitude, and phase of the carrier wave. A modulated signal carries the same information as the original signal while improving transmission efficiency. The mathematical models of the modulation types considered in this paper are given below.
A phase-shift keying (PSK) signal is a phase-modulated signal. An MPSK signal shaped by a rectangular pulse g(.) can be expressed as
\(\begin{aligned}X_{M P S K}(t)=\left[\sum_{n} g\left(t-n T_{s}\right)\right] \cos \left(2 \pi f_{c} t+\varphi_{n}\right)\\\end{aligned}\) (1)
where φn represents the phase of the n-th symbol in the signal, fc is the carrier frequency, and Ts is the symbol period.
Quadrature amplitude modulation (QAM) signals are phase and amplitude modulated signals that can be expressed as
\(\begin{aligned}S_{M Q A M}(t)=\left[\sum_{n} a_{n} g\left(t-n T_{s}\right)\right] \cos \left(2 \pi f_{c} t+\varphi_{n}\right)\\\end{aligned}\) (2)
where an represents the amplitude of the n-th symbol and other parameters have the same meaning as (1).
It is assumed that the MIMO channel has Nt transmit antennas and Nr receive antennas ( Nt ≤ Nr ) and that the channel is flat-fading and time-invariant. The received signal at time k can be expressed as
\(\begin{aligned}\boldsymbol{y}_{k}=\boldsymbol{R} \boldsymbol{x}_{k}+\boldsymbol{G}_{k}\\\end{aligned}\) (3)
where xk = [xk(1), xk(2),…, xk(Nt)]T is the modulated signal vector (Nt ×1) , yk = [yk(1), yk(2),…, yk(Nr)]T is the received baseband signal vector (Nr ×1) , Gk is additive white Gaussian noise.
\(\begin{aligned}\boldsymbol{R}=\left[\begin{array}{llll}r_{1,1} & r_{1,2} & \ldots & r_{1, N_{t}} \\ r_{2,1} & r_{2,2} & \ldots & r_{2, N_{t}} \\ & & \ldots & \\ r_{N_{r}, 1} & r_{N_{r}, 2} & \ldots & r_{N_{r}, N_{t}}\end{array}\right]\\\end{aligned}\) (4)
where R is the (Nr × Nt) MIMO channel matrix whose entries have zero mean and unit variance, and rNr,Nt represents the channel parameter. In addition, the entries of R obey a complex Gaussian distribution.
2.2 Dataset Generation
This paper generates the dataset with modulation types (2PSK, 4PSK, 8PSK, 16QAM) according to the MIMO system signal model. The sample dataset generation process is shown in Fig. 1. The vector of modulated symbols is X = [X1, X2,…, XN], where N is the length of the signal vector. To reasonably compare X across different modulation types, the power of the signal is normalized to obtain x. Assuming Nt transmitting antennas, the normalized signal x is reshaped into [x(1), x(2),..., x(Nt)]T with dimension Nt × N / Nt. The signal transmitted by the j-th antenna is x(j) = [x1(j), x2(j),…, xN/Nt(j)]T, j ∈ [1, Nt]; that is, the j-th transmit antenna carries N / Nt continuously transmitted symbols. The length-N / Nt streams transmitted by the Nt antennas pass through a MIMO channel, where the channel matrix R obeys a complex Gaussian distribution, and the noise environment is additive white Gaussian noise. The signal is received by Nr antennas, and the baseband signal is obtained by frequency conversion. The receiver-sampled signal can be expressed as y = [y(1), y(2),…, y(Nr)]T with dimension Nr × (fD × N / Nt). The signal sampled at the i-th receive antenna is y(i) = [y1(i), y2(i),…, yfD×N/Nt(i)], i ∈ [1, Nr], with length fD × N / Nt, where fD is the number of sampling points per symbol.
In the MIMO system, the signals received by different antennas correspond to different channel dimensions, and the in-phase and quadrature parts of the signals correspond to the spatial dimensions. The format of the sample dataset is Y=y, \(\begin{aligned}\boldsymbol{y} \in R^{N_{r} \times 2 \times\left(f_{D} \times \frac{N}{N_{t}}\right)} \in R^{C \times W \times H}\\\end{aligned}\), C = Nr is the number of sample channels (number of receiving antennas ), W = 2 is the sample width (the in-phase and quadrature information of the received signal) and H = fD × N / Nt is the sample length (received signal length). Because the signals transmitted by the transmit antennas are modulated in the same way, extracting the cooperative features of the signals received by all antennas helps to correctly classify the signal modulations.
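As a concreteness aid, the following is a minimal NumPy sketch of how one received sample of this format could be generated for an M-PSK signal under the model above (rectangular pulse shaping, flat complex-Gaussian MIMO channel, AWGN); the function and variable names are illustrative and not taken from the original implementation.

```python
import numpy as np

def generate_sample(N=1024, Nt=2, Nr=4, fD=4, snr_db=0, M=4):
    """Sketch: one M-PSK sample for the MIMO dataset (illustrative, not the
    authors' code). Output shape is (Nr, 2, fD * N // Nt) = (C, W, H)."""
    # Random M-PSK symbols with unit average power (power normalization).
    phases = 2 * np.pi * np.random.randint(M, size=N) / M
    x = np.exp(1j * phases)                          # length-N modulated vector X
    # Reshape onto Nt transmit antennas: (Nt, N // Nt) symbols per antenna.
    x = x.reshape(Nt, N // Nt)
    # Rectangular pulse shaping: fD samples per symbol on each antenna.
    x = np.repeat(x, fD, axis=1)                     # (Nt, fD * N // Nt)
    # Flat, time-invariant complex Gaussian MIMO channel R (Nr x Nt).
    R = (np.random.randn(Nr, Nt) + 1j * np.random.randn(Nr, Nt)) / np.sqrt(2)
    y = R @ x                                        # (Nr, fD * N // Nt)
    # Additive white Gaussian noise at the requested SNR.
    sig_power = np.mean(np.abs(y) ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = np.sqrt(noise_power / 2) * (np.random.randn(*y.shape)
                                        + 1j * np.random.randn(*y.shape))
    y = y + noise
    # Stack I/Q along the width dimension: (Nr, 2, fD * N // Nt).
    return np.stack([y.real, y.imag], axis=1).astype(np.float32)
```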
3. Proposed Dimensional Interactive Lightweight Network Method
3.1 Two-dimensional Interactive Attention Mechanism
In a MIMO system, the signals are mainly affected by environmental noise and by space-time aliasing among the signals from different transmit antennas. Since all transmit antennas use the same modulation, space-time aliasing leads to severe fading of the amplitude and phase variation features of the signal. Antenna diversity uses two or more receive channels whose fading is uncorrelated, so it is very unlikely that all of them experience the same deep fade at the same time; combining the signals from the antennas therefore reduces the degree of fading, and this space-diversity approach can overcome space-selective fading. Therefore, classifying the modulation in a MIMO system needs to consider the received signals of all antennas.
Liang [18] proposes a two-dimensional attention mechanism module that can effectively extract channel and spatial features, which verifies the positive impact of attention modules on feature extraction. Therefore, this paper designs a TDIA module to help the network extract the collaborative interaction features of signals from different antennas. The interaction information extracted by the TDIA module includes spatial interaction information among different sampling points, channel interaction information among different receiving antennas, and cross-dimensional channel-spatial interaction information.
In Fig. 2, TDIA consists of two parts, I and II. Part I extracts the channel and spatial interaction features from the sampled signals through two branches. In the first branch, the spatial information is average-pooled by (5) to obtain independent channel information.
Fig. 2. Two-dimensional interactive attention mechanism
\(\begin{aligned}\boldsymbol{Y}_{g a p}=\frac{1}{W H} \sum_{w=1}^{W} \sum_{h=1}^{H} \boldsymbol{Y}_{w h}\\\end{aligned}\) (5)
where Ygap ∈ RC×1×1 is the global average pooling of the input feature Y ∈ RC×W×H. Channel information is then extracted by a sliding-window method with window size SW. Through the sliding-window processing in Fig. 3, the SW adjacent channels are placed in the same spatial dimension, so the feature size becomes YSW ∈ RSW×C×1, and a transpose converts it to YSW ∈ RC×SW; each row then contains the information of SW adjacent channels. The interaction information of adjacent channels can be extracted by convolution, calculated as follows:
Fig. 3. The principle of sliding window extraction of interactive information.
\(\begin{aligned}\boldsymbol{g}\left(\boldsymbol{Y}^{C \times S_{w}^{n}}\right)=\left[\begin{array}{c}\boldsymbol{Y}_{(n, n+k)}^{1} \\ \boldsymbol{Y}_{(n, n+k)}^{2} \\ \ldots \\ \boldsymbol{Y}_{(n, n+k)}^{C}\end{array}\right] \otimes\left[\begin{array}{ccc}\alpha_{1}^{1} & \ldots & \alpha_{K}^{1} \\ \alpha_{2}^{2} & \ldots & \alpha_{K+1}^{2} \\ & \ldots & { } \\ \alpha_{C-K+1}^{C} & \ldots & \alpha_{C}^{C}\end{array}\right]^{T}\\\end{aligned}\) (6)
In (6), g(YC×SWn) represents the n-th extraction of interaction information while traversing the window of size SW, and ⊗ is the matrix operation applied within the same channel (Y(n,n+K)C × [ αC-K+1C… αCC]T). The interaction information of each channel within the same window is extracted by one-dimensional convolution (Conv1D), where α is the convolution weight parameter and K is the convolution kernel size. One convolution extracts the interaction information of K adjacent channels.
\(\begin{aligned}\boldsymbol{Y}^{C \times\left(S_{W}-K+1\right)}=f_{\left(1,\left(S_{W}-K+1\right)\right)}\left(\boldsymbol{g}\left(\boldsymbol{Y}^{C \times S_{W}}\right)\right)\\\end{aligned}\) (7)
where f(1,(SW-K+1)) aggregates all interaction information in the compressed window after one traversal, and the feature size of the channel information becomes Y ∈ RC×(SW-K+1). When the sliding window is larger than the convolution kernel, one convolution traversal cannot extract the interaction information of all SW adjacent channels; the traversal is therefore repeated until the channel interaction feature is reduced to Y ∈ RC×1, which completes the extraction of the SW-channel interaction information.
In the second branch of Fig. 2, the input feature is reshaped to Y ∈ RH×W×C and the spatial information is compressed into Ygap ∈ RH×1×1 by average pooling. The feature extracted by the sliding window is then YSW ∈ RH×SW. After traversing all the information in the window through the convolution operation, the feature size becomes Y ∈ RH×(SW-K+1). Unlike the first branch, when the lengths of H and C differ, the output feature Y ∈ RH×1 after the loop traversal is expanded to Y ∈ RC×1 by a convolution operation with kernel size 1.
\(\begin{aligned}\boldsymbol{Y}^{C \times 2}=\operatorname{cat}\left(\boldsymbol{Y}^{C}, \boldsymbol{Y}^{H}\right)\\\end{aligned}\) (8)
Part II performs the interaction of the channel and spatial features. In formula (8), the channel interaction information and spatial interaction information obtained by the two branches in Part I are concatenated (cat) into YC×2 ∈ RC×2. Next, the interaction information of the concatenated feature is extracted by Conv1D. The feature contains one column of spatial information and one column of channel information, so only one convolution is needed to complete the interaction of the spatial and channel information. The output feature size is YC×1 ∈ RC×1.
\(\begin{aligned}\boldsymbol{Y}_{\text {weight }}=\delta\left(\boldsymbol{Y}^{C \times 1}\right)\\\end{aligned}\) (9)
\(\begin{aligned}\boldsymbol{Y}_{\text {out }}=\boldsymbol{Y} \times \boldsymbol{Y}_{\text {weight }}\\\end{aligned}\) (10)
Finally, the interactive features are output through the sigmoid activation function δ. The output feature is expanded from Yweight ∈ RC×1 to Yweight ∈ RC×1×1 and applied to the input information as a weight. Yweight ∈ RC×1×1 and Y ∈ RC×W×H have the same size in the first dimension, so they are multiplied along the first dimension. The output feature size of TDIA is Yout ∈ RC×W×H, unchanged compared with the input feature. However, the output of the TDIA module realizes channel interaction, spatial interaction, and channel-spatial interaction by adding weight information. We thus obtain weighted output features with two-dimensional interaction characteristics and realize the extraction of the cooperative features of the signals received by different antennas. In Section 4, we fully verify the effectiveness of the TDIA module through comparative experiments.
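For readers who prefer code, the following PyTorch sketch gives one simplified reading of the TDIA module described above. A circularly padded Conv1D of kernel size SW stands in for the sliding-window traversal of Part I, a linear layer aligns the spatial branch to length C when H differs from C, and a second Conv1D followed by a sigmoid implements the fusion of Part II; all class and parameter names are assumptions for illustration rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class TDIA(nn.Module):
    """Simplified sketch of the two-dimensional interactive attention module."""
    def __init__(self, channels, height, sw=5):
        super().__init__()
        # Branch 1: interaction of sw adjacent channels (circular sliding window).
        self.chan_conv = nn.Conv1d(1, 1, kernel_size=sw, padding=sw // 2,
                                   padding_mode='circular', bias=False)
        # Branch 2: interaction of sw adjacent spatial positions.
        self.spat_conv = nn.Conv1d(1, 1, kernel_size=sw, padding=sw // 2,
                                   padding_mode='circular', bias=False)
        # Align the spatial-branch length H with the channel count C if they differ.
        self.align = nn.Linear(height, channels) if height != channels else nn.Identity()
        # Part II: fuse the channel and spatial columns (C x 2 -> C x 1).
        self.fuse = nn.Conv1d(2, 1, kernel_size=1, bias=False)

    def forward(self, y):                       # y: (B, C, W, H)
        b, c, w, h = y.shape
        y_c = y.mean(dim=(2, 3)).unsqueeze(1)   # (B, 1, C) pooled channel descriptor
        y_s = y.mean(dim=(1, 2)).unsqueeze(1)   # (B, 1, H) pooled spatial descriptor
        y_c = self.chan_conv(y_c)               # channel interaction, (B, 1, C)
        y_s = self.align(self.spat_conv(y_s))   # spatial interaction -> (B, 1, C)
        fused = self.fuse(torch.cat([y_c, y_s], dim=1))   # (B, 1, C)
        weight = torch.sigmoid(fused).view(b, c, 1, 1)    # per-channel weight
        return y * weight                       # weighted output, same shape as input
```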
3.2 Depth-wise Separable Convolution
The difference between depth-wise separable convolution (DSC) and standard convolution is that standard convolution performs the whole operation in a single convolution step, whereas DSC consists of two steps: depth-wise convolution and point-wise convolution. Depth-wise convolution only processes the spatial information within one channel, while point-wise convolution is a standard convolution with a 1 × 1 kernel that aggregates the channel information. DSC completes one round of channel and spatial processing by concatenating depth-wise convolution and point-wise convolution.
By comparing Fig. 4 and Fig. 5, we can see the structural difference between depth-wise separable convolution and standard convolution. Given an input feature Y ∈ RCi×Wi×Hi and an output feature Y ∈ RCo×Wo×Ho produced by standard convolution with a spatial kernel of size DW × DH, the standard convolution requires Ci × Co × DW × DH parameters and Ci × Co × DW × DH × Wo × Ho computations. For the same input and output sizes, the parameter count of depth-wise convolution is Ci × DW × DH and that of point-wise convolution is Ci × Co; the corresponding computation amounts are Ci × DW × DH × Wo × Ho and Ci × Co × Wo × Ho. Therefore, the parameter ratio of depth-wise separable convolution to standard convolution can be expressed as
Fig. 4. Standard convolution
Fig. 5. Depth-wise separable convolution (a) Depth-wise convolution (b) Point-wise convolution
\(\begin{aligned}P_{D / S}=\frac{C_{i} \times D_{W} \times D_{H}+C_{i} \times C_{o}}{C_{i} \times C_{o} \times D_{W} \times D_{H}}=\frac{1}{C_{o}}+\frac{1}{D_{W} \times D_{H}}\\\end{aligned}\) (11)
the calculation ratio can be expressed as:
\(\begin{aligned}C_{D / S}=\frac{C_{i} \times D_{W} \times D_{H} \times W_{o} \times H_{o}+C_{i} \times C_{o} \times W_{o} \times H_{o}}{C_{i} \times C_{o} \times D_{W} \times D_{H} \times W_{o} \times H_{o}}=\frac{1}{C_{o}}+\frac{1}{D_{W} \times D_{H}}\\\end{aligned}\) (12)
It can be seen from (11) and (12) that the DSC method splits the convolution into depth-wise convolution and point-wise convolution, which reduces the number of convolution parameters and calculations. We apply DSC to the network in this paper in a reasonable way.
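As a quick numerical check of (11) and (12), the snippet below evaluates the ratio 1/Co + 1/(DW × DH) for an assumed 3 × 3 kernel and 64 output channels (values chosen only for illustration).

```python
def dsc_ratio(c_out, d_w, d_h):
    """Parameter (and FLOP) ratio of depth-wise separable vs. standard
    convolution, from Eq. (11)/(12): 1/C_o + 1/(D_W * D_H)."""
    return 1.0 / c_out + 1.0 / (d_w * d_h)

# Example: 64 output channels, 3x3 spatial kernel -> roughly 1/8 of the
# parameters and multiply-adds of a standard convolution.
print(f"{dsc_ratio(64, 3, 3):.3f}")   # 0.127
```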
3.3 Dimensional Interactive Lightweight Network Structure
The residual structure usually performs excellently in classification problems [27]-[29]. Based on the residual structure, this paper designs a dimensional interactive lightweight network (DilNet) for modulation recognition in MIMO systems. The residual structure can efficiently improve the classification accuracy of a network, but its high complexity often makes it difficult to deploy in practice. Therefore, depth-wise separable convolution is introduced to replace the standard convolution in the residual structure to reduce the network complexity. In addition, deep neural network models designed for image recognition perform only moderately on signal recognition: in such models the number of channels grows with the number of network layers, but the features of a signal are comparatively easy to extract, and increasing the number of channels in the hidden layers is not conducive to network convergence. Therefore, in the network proposed in this paper, the number of channels decreases as the depth of the model increases.
The proposed network consists of downsampling, feature extraction, and feature classification, as shown in Fig. 6. The downsampling stage is composed of two-dimensional standard convolution (Conv2D) to prevent serious information loss during downsampling. Feature extraction, the core of the network, consists of residual blocks and the TDIA module and is used to extract efficient cross-dimensional interaction features. The feature classifier outputs the classification results through a fully connected layer.
Fig. 6. Dimensional interactive lightweight network structure
As shown in Fig. 7 (a) and Fig. 7 (b), the difference between the DSC residual block and the standard residual block is that the DSC residual block uses depth-wise convolution to process the spatial information. The number of channels is not expanded in its first two-dimensional convolution (Conv2D), which ensures that the output feature sizes are the same as those of standard convolution. In contrast, standard convolution processes the spatial information and completes the channel expansion in the first Conv2D. The corresponding ratios, calculated according to formulas (11) and (12), are as follows:
Fig. 7. (a) Standard residual block (b) DSC residual block
\(\begin{aligned}P_{D / S}=\frac{C_{i} \times D_{W} \times D_{H}+C_{i} \times C_{o}}{\left(C_{i}+C_{o}\right) \times C_{o} \times D_{W} \times D_{H}}=\left(\frac{C_{i}}{C_{i}+C_{o}}\right)\left(\frac{1}{C_{o}}+\frac{1}{D_{W} \times D_{H}}\right)\\\end{aligned}\) (13)
\(\begin{aligned}C_{D / S}=\frac{C_{i} \times D_{W} \times D_{H} \times W_{o} \times H_{o}+C_{i} \times C_{o} \times W_{o} \times H_{o}}{\left(C_{i}+C_{o}\right) \times C_{o} \times D_{W} \times D_{H} \times W_{o} \times H_{o}}=\left(\frac{C_{i}}{C_{i}+C_{o}}\right)\left(\frac{1}{C_{o}}+\frac{1}{D_{W} \times D_{H}}\right)\end{aligned}\) (14)
It can be seen from (13) and (14) that, compared with the standard residual block, the DSC residual block designed in this paper has significantly fewer parameters and calculations. The TDIA module is then embedded behind the DSC, so that the features extracted by each DSC residual block have two-dimensional interactive characteristics.
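To make the structure of Fig. 7 (b) concrete, the sketch below assembles a DSC residual block with an embedded TDIA module (reusing the TDIA sketch from Section 3.1); the kernel size, normalization placement, and shortcut handling are illustrative assumptions and only loosely follow Table 1.

```python
import torch.nn as nn

class DSCResidualBlock(nn.Module):
    """Sketch of a depth-wise separable residual block with an embedded TDIA
    module (illustrative layout, not the exact configuration of Table 1)."""
    def __init__(self, in_ch, out_ch, height, kernel=(1, 3)):
        super().__init__()
        pad = (kernel[0] // 2, kernel[1] // 2)
        self.body = nn.Sequential(
            # Depth-wise convolution: spatial processing within each channel.
            nn.Conv2d(in_ch, in_ch, kernel, padding=pad, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # Point-wise convolution: 1x1 mixing across channels.
            nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),
        )
        # Match the shortcut when the channel count changes.
        self.shortcut = (nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
                         if in_ch != out_ch else nn.Identity())
        self.attn = TDIA(out_ch, height)   # TDIA module from the earlier sketch
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.body(x) + self.shortcut(x)
        return self.act(self.attn(out))
```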
In Table 1, we provide the specific layout of DilNet, where Conv2D is a standard convolutional layer, BN is batch normalization (used to prevent overfitting), Linear is a fully connected layer, and ReLU is the activation function.
Table 1. The structural parameters of dimensional interactive lightweight network
3.4 Penalized Statistical Entropy
As one of the common loss functions, cross-entropy is widely used for training modulation recognition models. The softmax function converts the network output into a probability distribution. This process can be defined as
\(\begin{aligned}p_{n i}=\frac{e^{\left(p_{n i}\right)^{\prime}}}{\sum_{i=1}^{I} e^{\left(p_{n i}\right)^{\prime}}}\\\end{aligned}\) (15)
where (pni)' is the network prediction and pni is the predicted probability after processing by the softmax function, with i denoting one of the classification labels of the n-th sample.
\(\begin{aligned}{\text {loss}}=-\frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{I} y_{n i} \log \left(p_{n i}\right)\\\end{aligned}\) (16)
where N is the total number of samples, I is the number of all classification categories. yni is the true sample distribution probability.
The cross-entropy loss function can effectively aggregate the output information, but it only calculates the cross-entropy of the output layer. In the lightweight network, because the number of layers is small, the features extracted by each residual layer after adding the attention module have an important impact on the classification result, so it is necessary to gather loss information from the output features of every residual layer. Therefore, this paper proposes penalized statistical entropy to calculate the loss function, as described in Table 2.
Table 2. Penalized Statistical Entropy
losslj represents the cross-entropy of the l-th network layer during the j-th iteration. If the loss of a certain layer suddenly increases, it affects the loss of the corresponding layer in the next iteration: we regard this as an unqualified feature extraction result for that layer and use the loss to penalize the corresponding layer in the next iteration. [loss1j-1, loss2j-1,…, lossLj-1] is the penalty term of the hidden layers whose loss information is counted during the j-th iteration. To reduce the loss, the network learns to avoid the configurations that lead to penalties. When the loss of each layer is minimized in this way, the loss of the entire network is also reduced, each layer extracts efficient classification features, and the recognition rate of the network is significantly improved.
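Table 2 defines the exact procedure; since it is not reproduced here, the sketch below shows only one plausible reading of the description above: each residual layer is given an auxiliary classification head, its cross-entropy is computed, and each layer's loss is re-weighted by a penalty derived from that layer's loss in the previous iteration. Every detail, from the auxiliary heads to the normalization of the penalty, is an assumption for illustration.

```python
import torch.nn.functional as F

def penalized_statistical_entropy(layer_logits, target, prev_losses):
    """Sketch: sum of per-layer cross-entropies, each weighted by a penalty
    derived from the same layer's loss in the previous iteration.
    layer_logits: list of logits, one per residual layer (incl. output layer).
    prev_losses:  list of detached per-layer losses from the previous iteration,
                  or None on the first iteration."""
    losses = [F.cross_entropy(logits, target) for logits in layer_logits]
    if prev_losses is None:
        weights = [1.0] * len(losses)
    else:
        total_prev = sum(prev_losses) + 1e-8
        # Layers that contributed more loss last time receive a larger weight.
        weights = [1.0 + p / total_prev for p in prev_losses]
    total = sum(w * l for w, l in zip(weights, losses))
    new_prev = [l.detach().item() for l in losses]   # carried to the next iteration
    return total, new_prev
```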
4. Simulation Results
We fully verify the performance of the DilNet proposed in this paper for modulation classification in MIMO systems. The modulated signal types are {2PSK, 4PSK, 8PSK, 16QAM}. We consider two MIMO channel cases: one in which the number of transmit antennas equals the number of receive antennas (Nt = 4, Nr = 4), and one in which the number of transmit antennas is smaller than the number of receive antennas (Nt = 2, Nr = 4).
The noise environment in the MIMO channel is white Gaussian noise, and the SNR is defined as follows:
\(\begin{aligned}S N R=10 \lg \left(\frac{E_{s}}{E_{G}}\right)\\\end{aligned}\) (17)
where SNR is the signal-to-noise ratio, Es is signal power, and EG is noise power. The two classification performance evaluation rules can be expressed as:
\(\begin{aligned}P_{c c}^{s n r}=\frac{S_{\text {correct }}^{\text {snr }}}{S_{\text {test }}} \times 100 \%\\\end{aligned}\) (18)
\(\begin{aligned}P_{c c}^{a v e}=\frac{\sum_{s n r=-10}^{10} S_{\text {correct }}^{s n r}}{S_{\text {test }} \times N_{s n r}}\\\end{aligned}\) (19)
where Pccsnr is the accuracy at a single SNR, Pccave is the average accuracy over all SNRs, Stest is the number of samples in the test set, Scorrectsnr is the number of correctly classified test samples at that SNR, and Nsnr is the number of SNRs. In addition, we also verify the classification accuracy of DilNet in the absence of SNR priors and its excellent computational-complexity performance. The simulation parameters of the modulated signals and the environment parameters of DilNet are presented in Table 3 and Table 4.
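For reference, the two evaluation rules (18) and (19) reduce to the following small helper functions (the names and data layout are illustrative):

```python
def per_snr_accuracy(correct_by_snr, n_test):
    """Eq. (18): classification accuracy at each SNR, in percent.
    correct_by_snr maps an SNR value to the number of correct test samples."""
    return {snr: 100.0 * c / n_test for snr, c in correct_by_snr.items()}

def average_accuracy(correct_by_snr, n_test):
    """Eq. (19): accuracy averaged over all SNRs in [-10, 10] dB."""
    return sum(correct_by_snr.values()) / (n_test * len(correct_by_snr))
```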
Table 3. The simulation parameters of modulation signals
Table 4. The environment parameters of DilNet
4.1 Performance Comparisons of TDIA
In this section, we compare the proposed TDIA module with other attention modules. For a fair comparison, all attention modules are embedded in the same network.
Fig. 8(a) and Fig. 8(b) show the classification performance of the same network embedded with different attention modules. When the number of transmit antennas is smaller than the number of receive antennas, the classification accuracy of the digitally modulated signals is higher. It can be observed that the TDIA module achieves higher classification performance than the CBAM and CPAM modules. The CBAM module can help the network extract effective features from the channel and spatial dimensions, but it cannot extract interaction information. Modulation recognition in a MIMO system requires a cooperative decision over the signals received by all antennas, so extracting the cross-dimensional interaction information is very important; this explains the poorer performance of the CBAM module. The CPAM module interacts with all channel and spatial information, resulting in information redundancy, which is an important reason why its recognition accuracy is lower than that of TDIA. Besides, DilNet is trained with the penalized statistical entropy loss function. Compared with the TDIA network trained with the cross-entropy loss function, the experimental results show that penalized statistical entropy improves the overall recognition accuracy of the signal, which proves that supervising the hidden layers is effective. The classification accuracy is improved by about 1%-5% at low SNRs.
Fig. 8. The classification performance of different modules: (a) Nt = 2 Nr = 4 (b) Nt = 4 Nr = 4
4.2 Performance Comparisons of DilNet with Other Automatic Modulation Classification Method
In this section, we compare the proposed network with typical modulation recognition methods. In the network comparison, we follow the same guidelines for the number of network layers and the hidden-layer parameters. Since the high-order cumulants (HOC) of Gaussian noise are zero, HOC features can effectively reduce the interference of noise; therefore, higher-order cumulant features are used to distinguish the modulated signals [12]. The traditional HOC feature extraction method is as follows:
\(\begin{aligned}M_{p q}=E\left[y_{j}(t)^{p-q} y_{j}^{*}(t)^{q}\right]\\\end{aligned}\) (20)
where yj(t) is the signal received by the j-th antenna at time t, Mpq is the p-th order mixed moment, and E[.] is the mathematical expectation. Next, HOC features are extracted from the mixed moments.
\(\begin{aligned}C_{40}&=M_{40}-3 M_{20}^{2} \\ C_{41}&=M_{41}-3 M_{20} M_{21} \\ C_{42}&=M_{42}-\left|M_{20}\right|^{2}-2 M_{21}^{2}\end{aligned}\) (21)
According to (21), the theoretical values of the HOC features can be calculated. The theoretical values of the four signals are shown in Table 5.
Table 5. Higher order cumulant theoretical value
It can be seen from Table 5 that the HOC features can distinguish the four kinds of modulated signals. This traditional method usually uses a support vector machine (SVM) as the classifier, so this paper compares against it as well.
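The moment and cumulant estimates in (20) and (21) can be computed from one antenna's received sequence with a short NumPy routine such as the sketch below (the SVM classification stage is omitted; the function name is illustrative):

```python
import numpy as np

def hoc_features(y):
    """Estimate M20, M21, M40, M41, M42 and the cumulants C40, C41, C42 of
    Eq. (21) from a complex baseband sequence y (one receive antenna)."""
    def M(p, q):                     # sample estimate of E[y^(p-q) * conj(y)^q]
        return np.mean(y ** (p - q) * np.conj(y) ** q)
    M20, M21 = M(2, 0), M(2, 1)
    M40, M41, M42 = M(4, 0), M(4, 1), M(4, 2)
    C40 = M40 - 3 * M20 ** 2
    C41 = M41 - 3 * M20 * M21
    C42 = M42 - np.abs(M20) ** 2 - 2 * M21 ** 2
    return np.array([np.abs(C40), np.abs(C41), np.abs(C42)])
```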
The performance of different typical methods is shown in Fig. 9 (a) and Fig. 9 (b). The classification accuracy of the traditional method based on HOC features is significantly lower than that of the methods based on DL-extracted convolutional features, and when Nt = Nr it cannot classify accurately even at high SNRs, because the signal fades severely in the MIMO channel and the traditional features are distorted. The Co-proposed method computes the independent output of each CNN model and then makes a cooperative decision; however, cooperative features are not extracted during feature extraction, so there is no information compensation among different antennas. Therefore, the DL methods that extract cooperative features are significantly better than the Co-proposed method. In addition, comparing the lightweight network proposed in this paper with the other three existing networks (Resnet10, CNN, MobilenetV2), it can be found that the classification accuracy of MobilenetV2, a typical lightweight network, is lower than that of the Resnet10 and CNN networks, because existing lightweight methods reduce network complexity by sacrificing classification accuracy. In this paper, embedding the attention module and training the network with penalized statistical entropy compensates for the classification accuracy without greatly increasing the complexity of the network. The experimental results show that the lightweight network proposed in this paper has the best classification accuracy.
Fig. 9. The classification performance of different methods: (a) Nt = 2 Nr = 4 (b) Nt = 4 Nr = 4
To observe the effect of DilNet on each signal class, Fig. 10 and Fig. 11 show the experimental results as confusion matrices. Fig. 10 (a) and Fig. 11 (a) show that the 8PSK and 16QAM signals have lower classification accuracy: these two signals have many phase states, the phase fades severely during MIMO channel transmission, and the phase characteristics become unclear and difficult to distinguish. Fig. 10 (b) and Fig. 11 (b) show that when the SNR is greater than -6 dB and -2 dB, respectively, all signals are classified with accuracy greater than 99%.
Fig. 10. Nt = 2 Nr = 4, DilNet confusion matrix for the modulation classification. (a) SNR = -10dB (b) SNR = -6dB
Fig. 11. Nt = 4 Nr = 4, DilNet confusion matrix for the modulation classification. (a) SNR = -2dB (b) SNR = -2dB
4.3 Performance of DilNet in Mixed SNR and Small Sample Datasets
We compare the performance of some classical modulation recognition methods in the absence of SNR priors, where SNR ∈ [-10, 10] dB with a step size of 2 dB, i.e., 11 SNR values in this paper. We mix the samples of all SNRs for training; the average recognition rate Pccave can be calculated by (19), and Pccsnr gives the performance of the mixed-SNR model at each SNR. The training dataset in the mixed-SNR experiment is 20% of the total sample dataset, and the ratio of the training set, test set, and validation set is 4:1:4. The performance of different typical methods is shown in Table 6 and Table 7.
Table 6. Nt = 2 Nr = 4, the performance of different typical methods in mixed SNR
Table 7. Nt = 4 Nr = 4, the performance of different typical methods in mixed SNR
In Table 6 and Table 7, the experimental results show that DilNet has the highest classification accuracy. When SNR ∈ [-8, -2] dB, the traditional and MobilenetV2 methods exhibit a decrease in accuracy: the traditional method can hardly meet the requirements of MIMO signal recognition through traditional feature classification, and the MobilenetV2 method only seeks to reduce the complexity of the network, which weakens its feature extraction ability, so its classification accuracy is not guaranteed to increase with SNR. While reducing the network complexity, DilNet ensures efficient feature extraction through the TDIA module and penalized statistical entropy, and it performs well in the absence of SNR priors. The average recognition rate of DilNet is higher than that of Resnet10, but its recognition rate at some SNRs is lower than that of Resnet10, because reducing the model complexity through DSC also affects the classification ability. Besides, the performance of Co-proposed in the mixed-SNR setting also proves that cooperative feature classification is superior to cooperative decision.
Table 8 and Table 9 show the recognition accuracy of different methods on small-sample datasets. When the training dataset is larger than 3% of the sample dataset, DilNet has the highest recognition rate; when it is smaller than 3%, the recognition rate of DilNet is lower than that of Resnet10. Making the network lightweight sacrifices part of the classification accuracy. When the samples are sufficient, this accuracy can be recovered through efficient supervised training and a well-designed attention module; when the number of samples is small, this disadvantage of DilNet is difficult to eliminate. The simulation experiments show that when the training dataset is larger than 5% of the sample dataset, DilNet can guarantee classification accuracy.
Table 8. Nt = 2 Nr = 4, the performance of different typical methods in small sample datasets
Table 9. Nt = 4 Nr = 4, the performance of different typical methods in small sample datasets
4.4 Performance of DilNet on Computational Complexity and Convergence
Table 10 shows the computational complexity of different methods. Parameters is the model parameter size, and FLOPs is the amount of computation required by the model to classify a single sample. The model train and test times are the average training and testing times per epoch. The number of transmitting antennas does not affect the complexity trend, so this paper analyzes the case of Nt = 4, Nr = 4.
Table 10. Computational complexity performance
TDIA completes the interaction of the channel and spatial information in the channel dimension and effectively reduces the complexity of the attention module through dimensional reshaping and Conv1D. Compared with CBAM and CPAM, the time and space complexity of TDIA are significantly reduced. Training with penalized statistical entropy collects the loss of each hidden layer based on the output information of the fully connected layer, so the time and space complexity are slightly higher than with cross-entropy. Besides, compared with Resnet10, which has the same number of residual layers, our network reduces the computational time and space complexity by 78.78% and 94.75%, respectively. The network proposed in this paper also has obvious advantages over CNN and MobilenetV2. However, its training and testing times are higher than those of the CNN network; the residual structure is usually an important reason for the increased time complexity.
Fig. 12 shows how the training accuracy of the models changes as the number of training epochs increases. Compared with CPAM, Resnet10, and CNN, the TDIA module requires more training epochs to obtain the best training model. However, the TDIA module is prone to overfitting when there are too many training epochs, resulting in a decrease in classification accuracy. After adding penalized statistical entropy to our network, the overfitting problem is solved and the training accuracy is significantly improved. This experiment confirms that DilNet requires more training epochs to obtain a well-trained model; we will study this issue further in future work.
Fig. 12. The training accuracy of the different methods as the number of training epochs increases.
5. Conclusion
This paper proposes a dimensional interactive lightweight network for modulation recognition in MIMO systems. The lightweight network based on the TDIA module is used to extract cooperative features, and the network is trained with penalized statistical entropy. The experimental results demonstrate that the TDIA module has the best classification accuracy and the lowest computational complexity among the three attention modules in the same network structure. When the lightweight network uses penalized statistical entropy as the loss function, the training results are significantly improved over the original baseline, with only a slight amount of extra computation. Additionally, the lightweight network also performs better than the traditional HOC method, the cooperative modulation recognition method, and other neural networks on mixed-SNR and small-sample datasets.
Acknowledgment
The authors gratefully acknowledge the funding support of the National Natural Science Foundation of China (No. 62001138, No. 62001139) and the Heilongjiang Provincial Natural Science Foundation of China (No. LH2021F009) for parts of this work.
References
- Dhamyaa H. Al-Nuaimi, Ivan A. Hashim, Intan S. Zainal Abidin, Laith B. Salman, and Nor Ashidi Mat Isa, "Performance of Feature-Based Techniques for Automatic Digital Modulation Recognition and Classification-A Review," Electronics, vol. 8, no. 12, p. 1407, Nov. 2019.
- C. Lin, W. Yan, L. Zhang, and Y. Wang, "An Overview of Communication Signals Modulation Recognition," J. China. Academy. Electron. Inform. Technol., vol. 16, no. 11, pp. 1074-1085, Nov 2021.
- J. He, and W. Zhang, "Communication Signal Modulation Recognition Technology and Its Development," High. Technol. lett., vol. 26, no. 2, pp. 157-165, Feb 2016.
- Y. Wang, J. Yang, M. Liu, and G. Gui, "LightAMC: Lightweight Automatic Modulation Classification Via Deep Learning and Compressive Sensing," IEEE Trans. Veh. Technol., vol. 69, no. 3, pp. 3491-3495, March 2020. https://doi.org/10.1109/tvt.2020.2971001
- J. B. Tamakuwala, "New Low Complexity Variance Method for Automatic Modulation Classification and Comparison with Maximum Likelihood Method," in Proc. of 2019 International Conference on Range Technology (ICORT), Balasore, India, pp. 1-5, February. 2019.
- M. Turan, M. Oner, and H. A. Cirpan, "Joint Modulation Classification and Antenna Number Detection for MIMO Systems," IEEE Commun. lett., vol. 20, no. 1, pp. 193-196, Jan. 2016. https://doi.org/10.1109/LCOMM.2015.2500898
- X. Zhao, C. Guo, and J. Li, "Mixed Recognition Algorithm for Signal Modulation Schemes by High-order Cumulants and Cyclic Spectrum," (in Chinese), J. Electron. Inform. Technol., vol. 38, no. 3, pp. 674-680, Mar 2016.
- C. Wu, and G. Feng, "New Automatic Modulation Classifier Using Cyclic-spectrum Graphs with Optimal Training Features," IEEE Commun. lett., vol. 22, no. 6, pp. 1204-1207, June 2018. https://doi.org/10.1109/lcomm.2018.2819991
- X. Yan, G. Liu, H. Wu, G. Zhang, Q. Wang, and Y. Wu, "Robust Modulation Classification Over α-Stable Noise Using Graph-Based Fractional Lower-Order Cyclic Spectrum Analysis," IEEE Trans. Veh. Technol., vol. 69, no. 3, pp. 2836-2849, January. 2020. https://doi.org/10.1109/tvt.2020.2965137
- T. V. R. O. Camara, A. D. L. Lima, B. M. M. Lima, A. I. R. Fontes, A. D. M. Martins, and L. F. Q. Silveira, "Automatic Modulation Classification Architectures Based on Cyclostationary Features in Impulsive Environments," IEEE Access, vol. 7, pp. 138512-138527, Sep. 2019. https://doi.org/10.1109/access.2019.2943300
- H. Gao, S. Wang, Y. Su, H. Sun, and Z. Zhang, "Evolutionary Neural Network based on Quantum Elephant Herding Algorithm for Modulation Recognition in Impulse Noise," KSII Trans. Internet. Inform. Systems., vol. 15, pp. 2356-2376, July. 2021.
- X. Zhang, J. Sun, and X. Zhang, "Automatic Modulation Classification Based on Novel Feature Extraction Algorithms," IEEE Access, vol. 8, pp. 16362-16371, 2020. https://doi.org/10.1109/access.2020.2966019
- D. Das, P. K. Bora, and R. Bhattacharjee, "Blind Modulation Recognition of the Lower Order PSK Signals under the MIMO Keyhole Channel," IEEE Commun. lett., vol. 22, no. 9, pp. 1834-1837, Sept. 2018. https://doi.org/10.1109/lcomm.2018.2853638
- T. O'Shea, and J. Hoydis, "An Introduction to Deep Learning for The Physical Layer," IEEE Trans. Cogn. Commun. Netw., vol. 3, no. 4, pp. 563-575, Dec. 2017. https://doi.org/10.1109/tccn.2017.2758370
- S. Hu, Y. Pei, P. P. Liang, and Y. Liang, "Deep Neural Network for Robust Modulation Classification under Uncertain Noise Conditions," IEEE Trans. Veh. Technol., vol. 69, no. 1, pp. 564-577, Jan. 2020. https://doi.org/10.1109/tvt.2019.2951594
- T. J. O'Shea, T. Roy, and T. C. Clancy, "Over-the-air Deep Learning Based Radio Signal Classification," IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 168-179, Feb. 2018. https://doi.org/10.1109/jstsp.2018.2797022
- Y. Wang, G. Gui, T. Ohtsuki, and F. Adachi, "Multi-task Learning for Generalized Automatic Modulation Classification under Non-gaussian Noise with Varying SNR Conditions," IEEE Trans. Wireless Commun., vol. 20, no. 6, pp. 3587-3596, June 2021. https://doi.org/10.1109/TWC.2021.3052222
- Z. Liang, M. Tao, L. Wang, J. Su, and X. Yang, "Automatic Modulation Recognition Based on Adaptive Attention Mechanism and ResNeXt WSL Model," IEEE Commun. lett., vol. 25 pp. 2953-2957, Sept 2021. https://doi.org/10.1109/LCOMM.2021.3093485
- Y. Mao, Y. -Y. Dong, T. Sun, X. Rao, and C. -X. Dong, "Attentive Siamese Networks for Automatic Modulation Classification Based on Multitiming Constellation Diagrams," IEEE Trans. Neural Netw. Learn Systems., pp. 1-15, 2021.
- Y. Wang, J. Wang, W. Zhang, J. Yang, and G. Gui, "Deep Learning-based Cooperative Automatic Modulation Classification Method for MIMO Systems," IEEE Trans. Veh. Technol., vol. 69, no. 4, pp. 4575-4579, April 2020. https://doi.org/10.1109/tvt.2020.2976942
- Y. Wang, J. Gui, Y. Yin, J. Wang, and G. Gui, "Automatic Modulation Classification for MIMO Systems via Deep Learning and Zero-Forcing Equalization," IEEE Trans. Veh. Technol., vol. 69, no. 5, pp. 5688-5692, May 2020. https://doi.org/10.1109/tvt.2020.2981995
- J. Fu, J. Liu, and H. Tian, "Dual Attention Network for Scene Segmentation," in Proc. of IEEE/CVF. Conf. Computer Vision. Pattern Recognition, pp. 3141-3149, June 2019.
- M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L. -C. Chen, "MobileNetV2: Inverted Residuals and Linear Bottlenecks," in Proc. of IEEE/CVF. Conf. Computer Vision. Pattern Recognition, pp. 4510-4520, June 2018.
- S. Kalluri and G. R. Arce, "Adaptive Weighted Myriad Filter Algorithms for Robust Signal Processing in /Spl Alpha/-Stable Noise Environments," IEEE Trans Signal Process., vol. 46, no. 2, pp. 322-334, Feb. 1998. https://doi.org/10.1109/78.655418
- W. Shi, D. Liu, X. Cheng, Y. Li, and Y. Zhao, "Particle Swarm Optimization-Based Deep Neural Network for Digital Modulation Recognition," IEEE Access, vol. 7, pp. 104591-104600, 2019. https://doi.org/10.1109/access.2019.2932266
- C. Zhang, S. Yu, G. Li, and Y. Xu, "The Recognition Method of MQAM Signals Based on BP Neural Network and Bird Swarm Algorithm," IEEE Access, vol. 9, pp. 36078-36086, 2021. https://doi.org/10.1109/ACCESS.2021.3061585
- K. Bu, Y. He, X. Jing, and J. Han, "Adversarial Transfer Learning for Deep Learning Based Automatic Modulation Classification," IEEE Signal Process Lett., vol. 27, pp. 880-884, 2020. https://doi.org/10.1109/lsp.2020.2991875
- C. -F. Teng, C. -Y. Chou, C. -H. Chen, and A. -Y. Wu, "Accumulated Polar Feature-Based Deep Learning for Efficient and Lightweight Automatic Modulation Classification with Channel Compensation Mechanism," IEEE Trans. Veh. Technol., vol. 69, no. 12, pp. 15472-15485, Dec. 2020. https://doi.org/10.1109/TVT.2020.3041843
- L. Li, J. Huang, Q. Cheng, H. Meng, and Z. Han, "Automatic Modulation Recognition: A Few-Shot Learning Method Based on the Capsule Network," IEEE Wireless Commun. Lett., vol. 10, no. 3, pp. 474-477, March 2021. https://doi.org/10.1109/LWC.2020.3034913