• Title/Summary/Keyword: speech data clustering

Search Result 28, Processing Time 0.022 seconds

Automatic Clustering of Speech Data Using Modified MAP Adaptation Technique (수정된 MAP 적응 기법을 이용한 음성 데이터 자동 군집화)

  • Ban, Sung Min;Kang, Byung Ok;Kim, Hyung Soon
    • Phonetics and Speech Sciences
    • /
    • v.6 no.1
    • /
    • pp.77-83
    • /
    • 2014
  • This paper proposes a speaker and environment clustering method in order to overcome the degradation of the speech recognition performance caused by various noise and speaker characteristics. In this paper, instead of using the distance between Gaussian mixture model (GMM) weight vectors as in the Google's approach, the distance between the adapted mean vectors based on the modified maximum a posteriori (MAP) adaptation is used as a distance measure for vector quantization (VQ) clustering. According to our experiments on the simulation data generated by adding noise to clean speech, the proposed clustering method yields error rate reduction of 10.6% compared with baseline speaker-independent (SI) model, which is slightly better performance than the Google's approach.

Effective Acoustic Model Clustering via Decision Tree with Supervised Decision Tree Learning

  • Park, Jun-Ho;Ko, Han-Seok
    • Speech Sciences
    • /
    • v.10 no.1
    • /
    • pp.71-84
    • /
    • 2003
  • In the acoustic modeling for large vocabulary speech recognition, a sparse data problem caused by a huge number of context-dependent (CD) models usually leads the estimated models to being unreliable. In this paper, we develop a new clustering method based on the C45 decision-tree learning algorithm that effectively encapsulates the CD modeling. The proposed scheme essentially constructs a supervised decision rule and applies over the pre-clustered triphones using the C45 algorithm, which is known to effectively search through the attributes of the training instances and extract the attribute that best separates the given examples. In particular, the data driven method is used as a clustering algorithm while its result is used as the learning target of the C45 algorithm. This scheme has been shown to be effective particularly over the database of low unknown-context ratio in terms of recognition performance. For speaker-independent, task-independent continuous speech recognition task, the proposed method reduced the percent accuracy WER by 3.93% compared to the existing rule-based methods.

  • PDF

Speaker Adaptation Using i-Vector Based Clustering

  • Kim, Minsoo;Jang, Gil-Jin;Kim, Ji-Hwan;Lee, Minho
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.7
    • /
    • pp.2785-2799
    • /
    • 2020
  • We propose a novel speaker adaptation method using acoustic model clustering. The similarity of different speakers is defined by the cosine distance between their i-vectors (intermediate vectors), and various efficient clustering algorithms are applied to obtain a number of speaker subsets with different characteristics. The speaker-independent model is then retrained with the training data of the individual speaker subsets grouped by the clustering results, and an unknown speech is recognized by the retrained model of the closest cluster. The proposed method is applied to a large-scale speech recognition system implemented by a hybrid hidden Markov model and deep neural network framework. An experiment was conducted to evaluate the word error rates using Resource Management database. When the proposed speaker adaptation method using i-vector based clustering was applied, the performance, as compared to that of the conventional speaker-independent speech recognition model, was improved relatively by as much as 12.2% for the conventional fully neural network, and by as much as 10.5% for the bidirectional long short-term memory.

Speaker Identification Using GMM Based on Local Fuzzy PCA (국부 퍼지 클러스터링 PCA를 갖는 GMM을 이용한 화자 식별)

  • Lee, Ki-Yong
    • Speech Sciences
    • /
    • v.10 no.4
    • /
    • pp.159-166
    • /
    • 2003
  • To reduce the high dimensionality required for training of feature vectors in speaker identification, we propose an efficient GMM based on local PCA with Fuzzy clustering. The proposed method firstly partitions the data space into several disjoint clusters by fuzzy clustering, and then performs PCA using the fuzzy covariance matrix in each cluster. Finally, the GMM for speaker is obtained from the transformed feature vectors with reduced dimension in each cluster. Compared to the conventional GMM with diagonal covariance matrix, the proposed method needs less storage and shows faster result, under the same performance.

  • PDF

Modified Phonetic Decision Tree For Continuous Speech Recognition

  • Kim, Sung-Ill;Kitazoe, Tetsuro;Chung, Hyun-Yeol
    • The Journal of the Acoustical Society of Korea
    • /
    • v.17 no.4E
    • /
    • pp.11-16
    • /
    • 1998
  • For large vocabulary speech recognition using HMMs, context-dependent subword units have been often employed. However, when context-dependent phone models are used, they result in a system which has too may parameters to train. The problem of too many parameters and too little training data is absolutely crucial in the design of a statistical speech recognizer. Furthermore, when building large vocabulary speech recognition systems, unseen triphone problem is unavoidable. In this paper, we propose the modified phonetic decision tree algorithm for the automatic prediction of unseen triphones which has advantages solving these problems through following two experiments in Japanese contexts. The baseline experimental results show that the modified tree based clustering algorithm is effective for clustering and reducing the number of states without any degradation in performance. The task experimental results show that our proposed algorithm also has the advantage of providing a automatic prediction of unseen triphones.

  • PDF

Efficient Triphone Clustering Using Monophone Distance (모노폰 거리를 이용한 트라이폰 클러스터링 방법 연구)

  • Bang Kyu-Seop;Yook Dong-Suk
    • Proceedings of the KSPS conference
    • /
    • 2006.05a
    • /
    • pp.41-44
    • /
    • 2006
  • The purpose of state tying is to reduce the number of models and to use relatively reliable output probability distributions. There are two approaches: one is top down clustering and the other is bottom up clustering. For seen data, the performance of bottom up approach is better than that of top down approach. In this paper, we propose a new clustering technique that can enhance the undertrained triphone clustering performance. The basic idea is to tie unreliable triphones before clustering. An unreliable triphone is the one that appears in the training data too infrequently to train the model accurately. We propose to use monophone distance to preprocess these unreliable triphones. It has been shown in a pilot experiment that the proposed method reduces the error rate significantly.

  • PDF

Construction of Customer Appeal Classification Model Based on Speech Recognition

  • Sheng Cao;Yaling Zhang;Shengping Yan;Xiaoxuan Qi;Yuling Li
    • Journal of Information Processing Systems
    • /
    • v.19 no.2
    • /
    • pp.258-266
    • /
    • 2023
  • Aiming at the problems of poor customer satisfaction and poor accuracy of customer classification, this paper proposes a customer classification model based on speech recognition. First, this paper analyzes the temporal data characteristics of customer demand data, identifies the influencing factors of customer demand behavior, and determines the process of feature extraction of customer voice signals. Then, the emotional association rules of customer demands are designed, and the classification model of customer demands is constructed through cluster analysis. Next, the Euclidean distance method is used to preprocess customer behavior data. The fuzzy clustering characteristics of customer demands are obtained by the fuzzy clustering method. Finally, on the basis of naive Bayesian algorithm, a customer demand classification model based on speech recognition is completed. Experimental results show that the proposed method improves the accuracy of the customer demand classification to more than 80%, and improves customer satisfaction to more than 90%. It solves the problems of poor customer satisfaction and low customer classification accuracy of the existing classification methods, which have practical application value.

Speech Recognition Performance Improvement using a convergence of GMM Phoneme Unit Parameter and Vocabulary Clustering (GMM 음소 단위 파라미터와 어휘 클러스터링을 융합한 음성 인식 성능 향상)

  • Oh, SangYeob
    • Journal of Convergence for Information Technology
    • /
    • v.10 no.8
    • /
    • pp.35-39
    • /
    • 2020
  • DNN error is small compared to the conventional speech recognition system, DNN is difficult to parallel training, often the amount of calculations, and requires a large amount of data obtained. In this paper, we generate a phoneme unit to estimate the GMM parameters with each phoneme model parameters from the GMM to solve the problem efficiently. And it suggests ways to improve performance through clustering for a specific vocabulary to effectively apply them. To this end, using three types of word speech database was to have a DB build vocabulary model, the noise processing to extract feature with Warner filters were used in the speech recognition experiments. Results using the proposed method showed a 97.9% recognition rate in speech recognition. In this paper, additional studies are needed to improve the problems of improved over fitting.

Wideband Speech Reconstruction Using Modular Neural Networks (모듈화한 신경 회로망을 이용한 광대역 음성 복원)

  • Woo Dong Hun;Ko Charm Han;Kang Hyun Min;Jeong Jin Hee;Kim Yoo Shin;Kim Hyung Soon
    • MALSORI
    • /
    • no.48
    • /
    • pp.93-105
    • /
    • 2003
  • Since telephone channel has bandlimited frequency characteristics, speech signal over the telephone channel shows degraded speech quality. In this paper, we propose an algorithm using neural network to reconstruct wideband speech from its narrowband version. Although single neural network is a good tool for direct mapping, it has difficulty in training for vast and complicated data. To alleviate this problem, we modularize the neural networks based on appropriate clustering of the acoustic space. We also introduce fuzzy computing to compensate for probable misclassification at the cluster boundaries. According to our simulation, the proposed algorithm showed improved performance over the single neural network and conventional codebook mapping method in both objective and subjective evaluations.

  • PDF

Simultaneous Speaker and Environment Adaptation by Environment Clustering in Various Noise Environments (다양한 잡음 환경하에서 환경 군집화를 통한 화자 및 환경 동시 적응)

  • Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.6
    • /
    • pp.566-571
    • /
    • 2009
  • This paper proposes noise-robust fast speaker adaptation method based on the eigenvoice framework in various noisy environments. The proposed method is focused on de-noising and environment clustering. Since the de-noised adaptation DB still has residual noise in itself, environment clustering divides the noisy adaptation data into similar environments by a clustering method using the cepstral mean of non-speech segments as a feature vector. Then each adaptation data in the same cluster is used to build an environment-clustered speaker adapted (SA) model. After selecting multiple environmentally clustered SA models which are similar to test environment, the speaker adaptation based on an appropriate linear combination of clustered SA models is conducted. According to our experiments, we observe that the proposed method provides error rate reduction of $40{\sim}59%$ over baseline with speaker independent model.