• Title/Summary/Keyword: TIMIT


A Generalized Subspace Approach for Enhancing Speech Corrupted by Colored Noise Using Voice Activity Detector(VAD) (음성활동영역검색을 사용하는 유색잡음에 오염된 음성의 향상을 위한 일반화 부공간 접근)

  • Son, Kyung-Sik;Kim, Hyun-Tae
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.8
    • /
    • pp.1769-1776
    • /
    • 2013
  • In this paper, we propose a modified YL (Yi and Loizou) algorithm that uses a voice activity detector (VAD) to enhance speech corrupted by colored noise. The performance of the proposed algorithm is compared with that of the YL algorithm and the LS (Lee and Son, etc.) algorithm by computer simulation. The colored noises used in the experiment were car noise and multi-talker babble from the AURORA database, and the speech was taken from the TIMIT database. The proposed algorithm is confirmed to outperform the two previous approaches in terms of SNR (signal-to-noise ratio) and SSD (speech spectral distortion).

Speaker Normalization using Gaussian Mixture Model for Speaker Independent Speech Recognition (화자독립 음성인식을 위한 GMM 기반 화자 정규화)

  • Shin, Ok-Keun
    • The KIPS Transactions:PartB
    • /
    • v.12B no.4 s.100
    • /
    • pp.437-442
    • /
    • 2005
  • For the purpose of speaker normalization in speaker-independent speech recognition systems, experiments are conducted on a method based on a Gaussian mixture model (GMM). The method, which improves on a previous study based on a vector quantizer, consists of modeling the probability distribution of canonical feature vectors by a GMM with an appropriate number of clusters, and of estimating the warp factor of a test speaker using the resulting probabilistic model. The purpose of this study is twofold: improving the existing ML-based methods, and comparing the performance of the so-called 'soft decision' method with that of the previous study based on a vector quantizer. The effectiveness of the proposed method is investigated by recognition experiments on the TIMIT corpus. The experimental results showed that a small improvement could be obtained by adjusting the number of clusters in the GMM appropriately.

Rank-weighted reconstruction feature for a robust deep neural network-based acoustic model

  • Chung, Hoon;Park, Jeon Gue;Jung, Ho-Young
    • ETRI Journal
    • /
    • v.41 no.2
    • /
    • pp.235-241
    • /
    • 2019
  • In this paper, we propose a rank-weighted reconstruction feature to improve the robustness of a feed-forward deep neural network (FFDNN)-based acoustic model. In the FFDNN-based acoustic model, an input feature is constructed by vectorizing a submatrix that is created by slicing the feature vectors of frames within a context window. In this type of feature construction, the appropriate context window size is important because it determines the amount of trivial or discriminative information, such as redundancy or temporal context, in the input features. However, we examined whether a single parameter is sufficient to control the quantity of information. Therefore, we investigated the input feature construction from the perspectives of rank and nullity, and propose a rank-weighted reconstruction feature that allows for the retention of speech information components and the reduction of trivial components. The proposed method was evaluated in the TIMIT phone recognition and Wall Street Journal (WSJ) domains. The proposed method reduced the phone error rate of the TIMIT domain from 18.4% to 18.0%, and the word error rate of the WSJ domain from 4.70% to 4.43%.
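The context-window input construction described in this abstract can be sketched as follows. This is only an illustration of the baseline step (stacking neighbouring frames into one vector); it does not implement the rank-weighted reconstruction itself. The function name `splice_frames` and the edge-padding choice (repeating the first and last frame) are assumptions made for this example, not details taken from the paper.

```python
def splice_frames(frames, context):
    """Stack each frame with its `context` left and right neighbours.

    `frames` is a list of equal-length feature vectors (lists of floats).
    Returns one flattened vector per frame, of length
    (2 * context + 1) * len(frames[0]).
    """
    num_frames = len(frames)
    spliced = []
    for t in range(num_frames):
        window = []
        for offset in range(-context, context + 1):
            # Clamp the index so that edge frames are repeated
            # (an assumed padding choice for this sketch).
            idx = min(max(t + offset, 0), num_frames - 1)
            window.extend(frames[idx])
        spliced.append(window)
    return spliced


# Three 2-dimensional frames, one frame of context on each side.
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
spliced = splice_frames(frames, context=1)
```

A larger `context` adds more temporal information per input vector but also more redundancy, which is the trade-off the abstract refers to.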

Speaker Identification with Estimating the Number of Cluster Based on Boundary Subtractive Clustering (경계 차감 클러스터링에 기반한 클러스터 개수 추정 화자식별)

  • Lee, Youn-Jeong;Choi, Min-Jung;Seo, Chang-Woo;Hahn, Hern-Soo
    • The Journal of the Acoustical Society of Korea
    • /
    • v.26 no.5
    • /
    • pp.199-206
    • /
    • 2007
  • In this paper, we propose a new clustering algorithm that clusters the feature vectors for speaker identification. Unlike typical clustering approaches, the proposed method performs the clustering without initial guesses of the cluster center locations or a priori information about the number of clusters. Cluster centers are obtained incrementally by adding one cluster center at a time through the boundary subtractive clustering algorithm. The number of clusters is obtained by investigating the mutual relationship between clusters. The experimental results for artificial data and the TIMIT database show the effectiveness of the proposed algorithm as compared with conventional methods.

Improvement of Reliability based Information Integration in Audio-visual Person Identification (시청각 화자식별에서 신뢰성 기반 정보 통합 방법의 성능 향상)

  • Tariquzzaman, Md.;Kim, Jin-Young;Hong, Joon-Hee
    • MALSORI
    • /
    • no.62
    • /
    • pp.149-161
    • /
    • 2007
  • In this paper, we propose a modified reliability function for improving bimodal speaker identification (BSI) performance. The conventional reliability function, used by N. Fox[1], is extended by introducing an optimization factor. We evaluated the proposed method in the BSI domain. A BSI system was implemented based on GMM and tested using the VidTIMIT database. Through speaker identification experiments, we verified the usefulness of the proposed method. The experiments showed improved performance, i.e., a 39% reduction in error rate.

A Novel Algorithm for Discrimination of Voiced Sounds (유성음 구간 검출 알고리즘에 관한 연구)

  • Jang, Gyu-Cheol;Woo, Soo-Young;Yoo, Chang-D.
    • Speech Sciences
    • /
    • v.9 no.3
    • /
    • pp.35-45
    • /
    • 2002
  • A simple algorithm for discriminating voiced sounds in speech is proposed. In addition to low-frequency energy and zero-crossing rate (ZCR), both of which have been widely used in the past for identifying voiced sounds, the proposed algorithm incorporates pitch variation to improve the discrimination rate. Evaluation on the TIMIT corpus shows an improvement of 13% in the discrimination of voiced phonemes over the traditional algorithm using only energy and ZCR.
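The two traditional cues named in this abstract, frame energy and zero-crossing rate, can be sketched as below. This is a minimal illustration of the energy-plus-ZCR baseline only: it uses full-band rather than low-frequency energy, omits the pitch-variation cue the paper adds, and the function names and threshold values are hypothetical choices for this example.

```python
import math


def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(
        1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(frame) - 1)


def is_voiced(frame, energy_threshold=0.01, zcr_threshold=0.25):
    """Label a frame voiced if it has high energy and a low ZCR.

    Both thresholds are illustrative assumptions, not values from the paper.
    """
    return (
        frame_energy(frame) > energy_threshold
        and zero_crossing_rate(frame) < zcr_threshold
    )


# A 100 Hz sine at 8 kHz (20 ms frame) stands in for a voiced sound;
# a weak, rapidly alternating signal stands in for an unvoiced one.
voiced_like = [0.5 * math.sin(2 * math.pi * 100 * n / 8000) for n in range(160)]
unvoiced_like = [0.01 * (-1) ** n for n in range(160)]
```

With these inputs, `is_voiced(voiced_like)` is true and `is_voiced(unvoiced_like)` is false, because the sine has high energy and few sign changes while the alternating signal has low energy and changes sign at every sample.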

Adaptive Noise Cancellation Based on NLMS Algorithm

  • Li, Shicong;Seo, Ji-Hun;Lee, Seok-Pil
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2014.06a
    • /
    • pp.179-180
    • /
    • 2014
  • The main goal of this paper is to present an adaptive filter system that uses the NLMS (normalized least mean square) adaptive algorithm for noise cancellation. The proposed algorithm has lower computational complexity and better convergence properties than earlier algorithms such as spectral subtraction. We use TIMIT reference speech and Noisex-92 for the experiment. The experimental results show the feasibility of our algorithm for effectively filtering noise from speech.
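A textbook NLMS noise canceller can be sketched as follows. This is a generic illustration of the NLMS update under assumed settings, not the authors' system: the function name `nlms_cancel`, the filter length, the step size `mu`, and the synthetic signals are all choices made for this example. The filter estimates the noise component of the primary input from a reference noise input, and the error signal is taken as the cleaned output.

```python
import math
import random


def nlms_cancel(primary, reference, num_taps=4, mu=0.5, eps=1e-8):
    """Adaptive noise cancellation with the NLMS update.

    primary:   samples of speech plus filtered noise
    reference: samples of the noise source alone
    Returns (error signal, final filter weights).
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # most recent reference samples, newest first
    output = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        # Filter output: current estimate of the noise in the primary input.
        y = sum(w * h for w, h in zip(weights, history))
        e = d - y
        # Normalise the step by the input power; eps avoids division by zero.
        norm = eps + sum(h * h for h in history)
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        output.append(e)
    return output, weights


# Synthetic demo: a slow sine stands in for speech, and the noise reaches
# the primary input through an assumed 2-tap path (0.8, -0.3).
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
speech = [0.1 * math.sin(2 * math.pi * 0.01 * n) for n in range(2000)]
primary = [
    s + 0.8 * noise[n] - 0.3 * (noise[n - 1] if n > 0 else 0.0)
    for n, s in enumerate(speech)
]
cleaned, weights = nlms_cancel(primary, noise)
```

Dividing the update by the input power is what distinguishes NLMS from plain LMS: it makes the effective step size independent of the reference signal level.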

A New Speaker Adaptation Technique using Maximum Model Distance

  • Tahk, Min-Jea
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2001.10a
    • /
    • pp.154.2-154
    • /
    • 2001
  • This paper presents an adaptation approach based on the maximum model distance (MMD) method. The method shares the framework used for training speech recognizers with abundant training data. The MMD method can adapt all the models, whether or not they have adaptation data. If a large amount of adaptation data is available, the adapted models can gradually approximate the speaker-dependent ones. The approach is evaluated on a phoneme recognition task using the TIMIT corpus. In the speaker adaptation experiments, up to 65.55% phoneme error reduction is achieved. The MMD method reduces phoneme error by 16.91% even when ...

A New Speaker Adaptation Technique using Maximum Model Distance

  • Lee, Man-Hyung;Hong, Suh-Il
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2001.10a
    • /
    • pp.99.1-99
    • /
    • 2001
  • This paper presents an adaptation approach based on the maximum model distance (MMD) method. The method shares the framework used for training speech recognizers with abundant training data. The MMD method can adapt all the models, whether or not they have adaptation data. If a large amount of adaptation data is available, the adapted models can gradually approximate the speaker-dependent ones. The approach is evaluated on a phoneme recognition task using the TIMIT corpus. In the speaker adaptation experiments, up to 65.55% phoneme error reduction is achieved. The MMD method reduces phoneme error by 16.91% even when only one adaptation utterance is used.

A nonlinear transformation methods for GMM to improve over-smoothing effect

  • Chae, Yi Geun
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.38 no.2
    • /
    • pp.182-187
    • /
    • 2014
  • We propose nonlinear GMM-based transformation functions to address the over-smoothing effect of linear transformation in voice processing. The proposed methods adopt RBF networks as local transformation functions to overcome the drawbacks of global nonlinear transformation functions. To obtain high-quality modifications of speech signals, our voice conversion is implemented using the Harmonic plus Noise Model analysis/synthesis framework. Experimental results are reported on the English corpus MOCHA-TIMIT.