
Heo, Jungwoo;Shim, Hye-jin;Kim, Ju-ho;Yu, Ha-Jin
- The Journal of the Acoustical Society of Korea, v.40 no.2, pp.148-154, 2021
Text-independent speaker verification requires speaker embeddings that do not depend on the spoken text in order to generalize well. However, deep neural networks, which depend on their training data, may overfit to text information rather than learn speaker information when they are trained repeatedly on identical time series. To prevent this overfitting, we propose a segment unit shuffling layer that divides the input layer or a hidden layer into segments along the time axis and rearranges them, thereby mixing the time-series information. Because the layer can be applied to hidden layers as well as to the input layer, it can serve as a hidden-layer generalization technique, which is known to be more effective than generalization at the input layer, and it can be used together with data augmentation. In addition, the degree of distortion can be controlled through the unit size of the segment. Applying the proposed segment unit shuffling layer improves text-independent speaker verification performance over the baseline.
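The divide-and-rearrange step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `segment_unit_shuffle`, the parameter `segment_size`, the (time, features) layout, and the choice to leave a trailing partial segment in place are all assumptions made for illustration.

```python
import numpy as np


def segment_unit_shuffle(x, segment_size, rng=None):
    """Shuffle fixed-size segments of x along the time axis (axis 0).

    Hypothetical helper, not taken from the paper.
    x: array of shape (time, features, ...).
    segment_size: number of frames per segment (assumed parameter name).
    Frames inside a segment keep their order; only the segment order changes.
    A trailing remainder shorter than segment_size stays at the end (assumption).
    """
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    rng = np.random.default_rng() if rng is None else rng

    n_segments = x.shape[0] // segment_size
    usable = n_segments * segment_size

    # View the usable frames as (n_segments, segment_size, ...) blocks.
    blocks = x[:usable].reshape(n_segments, segment_size, *x.shape[1:])

    # Draw a random segment order and put the blocks back on the time axis.
    order = rng.permutation(n_segments)
    shuffled = blocks[order].reshape(usable, *x.shape[1:])

    return np.concatenate([shuffled, x[usable:]], axis=0)


# Example: 10 frames of 2 features, shuffled in segments of 3 frames.
features = np.arange(20).reshape(10, 2)
mixed = segment_unit_shuffle(features, segment_size=3, rng=np.random.default_rng(0))
```

In this sketch, a smaller `segment_size` breaks the sequence into more pieces and so distorts the time order more, while a larger value preserves longer stretches of the original order, which mirrors the abstract's remark that the unit size controls the degree of distortion. The same function could be applied to an input feature matrix or to a hidden-layer activation with a time axis.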
https://doi.org/10.7776/ASK.2021.40.2.148