• Title/Summary/Keyword: speech parameter

EM Algorithm with Initialization Based on Incremental $k$-means for GMM and Its Application to Speaker Identification (GMM을 위한 점진적 $k$-means 알고리즘에 의해 초기값을 갖는 EM알고리즘과 화자식별에의 적용)

  • Seo Changwoo; Hahn Hernsoo; Lee Kiyong; Lee Younjeong
    • The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.141-149 / 2005
  • In general, a Gaussian mixture model (GMM) is used to estimate the speaker model from speech for speaker identification. The parameters of the GMM are estimated with the Expectation-Maximization (EM) algorithm for maximum likelihood (ML) estimation. However, the EM algorithm has two drawbacks: it depends heavily on the initialization, and it requires the number of mixtures to be known in advance. To address these problems, we propose an EM algorithm whose initialization is based on incremental $k$-means for GMM. The proposed method increases the number of mixtures one at a time until the optimum number is found. Whenever a mixture is added, we calculate the mutual relationship between it and each of the other mixtures. Based on these mutual relationships, we estimate the optimal number of mixtures that are statistically independent. The effectiveness of the proposed method is shown by an experiment on artificial data. We also performed speaker identification with the proposed method and compared it with other approaches.
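
The grow-one-mixture-at-a-time idea in the abstract above can be illustrated with a minimal 1-D sketch. It is not the authors' implementation: the abstract does not define the mutual-relationship test, so this sketch swaps in the Bayesian information criterion (BIC) as the stopping rule, and the splitting heuristic (split the widest cluster around its center) is likewise an assumption. All function names are hypothetical.

```python
import math
import random


def gauss(x, m, v):
    """Density of a 1-D Gaussian with mean m and variance v."""
    return math.exp(-(x - m) ** 2 / (2 * v)) / math.sqrt(2 * math.pi * v)


def assign(xs, centers):
    """Group points by their nearest center."""
    groups = [[] for _ in centers]
    for x in xs:
        j = min(range(len(centers)), key=lambda i: abs(x - centers[i]))
        groups[j].append(x)
    return groups


def variance(g, c):
    """Mean squared distance of the points in g from c (0.0 for an empty group)."""
    return sum((x - c) ** 2 for x in g) / len(g) if g else 0.0


def kmeans(xs, centers, iters=20):
    """Lloyd's k-means in 1-D from the given starting centers."""
    for _ in range(iters):
        groups = assign(xs, centers)
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers


def em(xs, w, mu, var, iters=50):
    """Standard EM updates for a 1-D GMM; returns parameters and log-likelihood."""
    n, k = len(xs), len(w)
    for _ in range(iters):
        resp = []
        for x in xs:  # E-step: posterior responsibility of each component
            p = [w[j] * gauss(x, mu[j], var[j]) for j in range(k)]
            s = sum(p) or 1e-300
            resp.append([pj / s for pj in p])
        for j in range(k):  # M-step: re-estimate weight, mean, variance
            nj = sum(r[j] for r in resp) or 1e-12
            w[j] = nj / n
            mu[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var[j] = max(sum(r[j] * (x - mu[j]) ** 2 for r, x in zip(resp, xs)) / nj, 1e-6)
    ll = sum(math.log(sum(w[j] * gauss(x, mu[j], var[j]) for j in range(k)) or 1e-300)
             for x in xs)
    return w, mu, var, ll


def fit_incremental(xs, max_k=6):
    """Grow the mixture one component at a time, initializing EM from k-means.

    ASSUMPTION: BIC is used as the stopping rule here; it is a stand-in,
    not the mutual-relationship criterion described in the abstract.
    """
    n = len(xs)
    centers = [sum(xs) / n]
    best = None
    for k in range(1, max_k + 1):
        centers = kmeans(xs, centers)
        groups = assign(xs, centers)
        w = [max(len(g), 1) / n for g in groups]
        var = [max(variance(g, c), 1e-3) for g, c in zip(groups, centers)]
        w, mu, var, ll = em(xs, w, list(centers), var)
        bic = -2 * ll + (3 * k - 1) * math.log(n)
        if best is not None and bic >= best[0]:
            break  # the extra component did not pay for itself
        best = (bic, w, mu, var)
        # Add one component: split the widest cluster around its center.
        j = max(range(k), key=lambda i: variance(groups[i], centers[i]))
        s = math.sqrt(variance(groups[j], centers[j]))
        centers = centers[:j] + [centers[j] - s, centers[j] + s] + centers[j + 1:]
    return best[1], best[2], best[3]
```

Calling `fit_incremental(data)` on a list of floats returns the weights, means, and variances of the selected mixture. Because each new model is seeded from the previous k-means solution, EM does not start from a random point, which is the initialization sensitivity the abstract sets out to avoid.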

Acoustic Characteristics of Sound Field in Partially Opened Rooms -Emphasis on Vertical Coupling of Diffuse and Free Field- (실내공간의 부분적 개방에 따른 음향특성변화 II -확산음장과 자유음장의 수직적 결합을 중심으로-)

  • Jeong, Dae-Up; Choi, Young-Ji
    • Journal of Korean Association for Spatial Structures / v.7 no.5 / pp.75-82 / 2007
  • The present work measured and analyzed changes in the acoustics of a sound field with a retractable ceiling. A 1/20 scale model of an openable space was built, and measurements were carried out while varying the opened area of the ceiling. The most widely used room acoustic design parameters, RT, EDT, and D50, were investigated. The results suggest that RT may not be an appropriate acoustic design parameter for an openable space and that it is likely to mislead the initial acoustic design of such spaces. This is mainly because RT is obtained by fitting a straight line to decay processes that are non-exponential. Early decay times were found to decrease in proportion to the ratio of opened area. D50, an index of speech intelligibility, effectively showed the influence of openings on the acoustics. EDT and D50 at seats not directly exposed to the opened part of the ceiling decreased almost linearly with the ratio of opened area, while at seats directly exposed to the opening, little further influence was found for opening ratios larger than 40%.
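
For readers unfamiliar with the three parameters, the following sketch shows how D50, EDT, and a T30-style RT are commonly derived from a measured impulse response. It follows the usual textbook definitions (energy ratio for D50, a straight-line fit to the backward-integrated decay curve for EDT and RT) and is not taken from the paper; the function names and the fit ranges of 0 to -10 dB for EDT and -5 to -35 dB for RT are assumptions of this example.

```python
import math


def d50(h, fs):
    """D50: energy arriving in the first 50 ms divided by the total energy."""
    n50 = int(round(0.050 * fs))
    energy = [x * x for x in h]
    return sum(energy[:n50]) / sum(energy)


def schroeder_db(h):
    """Backward-integrated energy decay curve in dB, 0 dB at t = 0."""
    total, curve = 0.0, []
    for x in reversed(h):
        total += x * x
        curve.append(total)
    curve.reverse()
    # The curve is non-increasing, so zeros can only occur at the tail.
    return [10 * math.log10(c / curve[0]) for c in curve if c > 0]


def decay_time(curve_db, fs, hi, lo):
    """Fit a least-squares line between hi and lo dB and extrapolate to -60 dB."""
    pts = [(i / fs, y) for i, y in enumerate(curve_db) if lo <= y <= hi]
    n = len(pts)
    mt = sum(t for t, _ in pts) / n
    my = sum(y for _, y in pts) / n
    slope = (sum((t - mt) * (y - my) for t, y in pts)
             / sum((t - mt) ** 2 for t, _ in pts))
    return -60.0 / slope


if __name__ == "__main__":
    # Synthetic, purely exponential decay (hypothetical test signal).
    fs = 1000
    h = [math.exp(-6.9078 * n / fs) for n in range(3 * fs)]
    curve = schroeder_db(h)
    print("D50:", d50(h, fs))
    print("EDT:", decay_time(curve, fs, 0.0, -10.0))
    print("RT :", decay_time(curve, fs, -5.0, -35.0))
```

For a purely exponential decay the two fits give the same value. When the decay is non-exponential, as the abstract reports for the partially opened room, the early and late slopes differ, so a single straight-line RT no longer describes the whole curve.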

A study on the lip shape recognition algorithm using 3-D Model (3차원 모델을 이용한 입모양 인식 알고리즘에 관한 연구)

  • 배철수
    • Journal of the Korea Institute of Information and Communication Engineering / v.3 no.1 / pp.59-68 / 1999
  • Recently, research on communication systems has moved toward using voice data together with the face image of the speaker, in order to achieve a higher recognition rate than with voice data alone. We therefore present a method of lipreading in speech image sequences using a 3-D facial shape model. The method uses feature information from the face image, such as the opening level of the lips, the movement of the jaw, and the projection height of the lips. First, we fit the 3-D face model to the image sequence of the speaking face. Then, to obtain the feature information, we compute the amount of variation of the fitted 3-D shape model over the image sequence and use it as the recognition parameters. We use the intensity gradient values obtained from the variation of the 3-D feature points to separate recognition units in the image sequence. For recognition, we use a discrete HMM algorithm with multiple observation sequences that take full account of the variation of the 3-D feature points. In a recognition experiment with 8 Korean vowels and 2 Korean consonants, we obtained a recognition rate of about 80% for the plosives and vowels. Because the recognition experiment with these recognition parameters on the 10 Korean vowels yielded a high recognition rate, we propose that the feature vector is usable as a visual distinguishing factor.
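
The recognition step with discrete HMMs can be sketched as follows: each recognition unit gets its own model, and an observation sequence is assigned to the model with the highest likelihood under the forward algorithm. This is a generic illustration, not the paper's system. The model layout, the function names, and the idea that observations are integer codes obtained by quantizing the lip features are all assumptions.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """Scaled forward algorithm: log P(obs | model) for a discrete HMM.

    start[i]    : probability of starting in state i
    trans[i][j] : probability of moving from state i to state j
    emit[i][o]  : probability of emitting symbol o in state i
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
        log_p += math.log(scale)
    return log_p


def recognize(obs, models):
    """Return the name of the model that gives obs the highest likelihood."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical two-state models over a two-symbol codebook.
MODELS = {
    "a": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.9, 0.1], [0.8, 0.2]]),
    "o": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [[0.1, 0.9], [0.2, 0.8]]),
}
```

With these toy models, `recognize([0, 0, 0, 1, 0], MODELS)` selects `"a"`, since that model emits symbol 0 far more often. In practice the model parameters would be trained from multiple observation sequences per unit, as the abstract describes.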
