Search | Korea Science

The Reduction or computation in MLLR Framework using PCA or ICA for Speaker Adaptation (화자적응에서 PCA 또는 ICA를 이용한 MLLR알고리즘 연산량 감소)

김지운;정재호
- The Journal of the Acoustical Society of Korea / v.22 no.6 / pp.452-456 / 2003
We discuss how to reduce the number and the dimension of the matrix inversions required in the MLLR framework for speaker adaptation. To find a smaller set of variables with less redundancy, we apply PCA (principal component analysis) and ICA (independent component analysis), which give as good a representation as possible. The additional computation incurred by applying PCA or ICA is small enough to be disregarded. 10 components for ICA and 12 components for PCA achieve performance similar to that of 36 components in the ordinary MLLR framework. If the dimension of the SI model parameters is n, the computation required for matrix inversion in MLLR is proportional to O(n⁴). Compared with ordinary MLLR, the total computation required for speaker adaptation is therefore reduced to about 1/81 in MLLR with PCA and 1/167 in MLLR with ICA.
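The quoted reduction factors can be checked with a short calculation. This is a minimal sketch that assumes the cost scales exactly as n⁴, ignoring constant factors and the PCA/ICA overhead that the abstract describes as negligible; the function name `inversion_cost_ratio` is hypothetical.

```python
def inversion_cost_ratio(n_full: int, n_reduced: int, order: int = 4) -> float:
    """Ratio of full to reduced cost, assuming cost grows as n**order."""
    return (n_full / n_reduced) ** order


# Component counts taken from the abstract: 36 (ordinary MLLR), 12 (PCA), 10 (ICA).
pca_ratio = inversion_cost_ratio(36, 12)  # (36/12)**4 = 3**4
ica_ratio = inversion_cost_ratio(36, 10)  # (36/10)**4 = 3.6**4

print(f"PCA: cost reduced to about 1/{pca_ratio:.2f}")
print(f"ICA: cost reduced to about 1/{ica_ratio:.2f}")
```

Under this assumption the PCA ratio is exactly 81 and the ICA ratio is just under 168, consistent with the figures of about 1/81 and 1/167 given in the abstract.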

A Comparative Study of Speech Parameters for Speech Recognition Neural Network (음성 인식 신경망을 위한 음성 파라키터들의 성능 비교)

Kim, Ki-Seok;Im, Eun-Jin;Hwang, Hee-Yung
- The Journal of the Acoustical Society of Korea / v.11 no.3 / pp.61-66 / 1992
Many studies have used neural network models for automatic speech recognition, but the main trend has been to find neural network models and learning rules suited to the task. However, the choice of input speech parameters is as important as the neural network model itself for improving the performance of a neural-network-based automatic speech recognition system. In this paper we select 6 speech parameters from a survey of speech recognition papers that use neural networks, and analyze their performance on the same data with the same neural network model. We use 8 sets of 9 Korean plosives and 18 sets of 8 Korean vowels. We use a recurrent neural network and compare the performance of the 6 speech parameters while keeping the number of nodes constant. The delta cepstrum of the linear predictive coefficients showed the best result, with recognition rates of 95.1% for the vowels and 100.0% for the plosives.

On Codebook Design to Improve Speaker Adaptation (음성 인식 시스템의 화자 적응 성능 향상을 위한 코드북 설계)

Yang, Tae-Young;Shin, Won-Ho;Kim, Weon-Goo;Youn, Dae-Hee
- The Journal of the Acoustical Society of Korea / v.15 no.2 / pp.5-11 / 1996
This paper proposes a method for improving the performance of a semi-continuous hidden Markov model (SCHMM) speaker adaptation system that uses a Bayesian parameter re-estimation approach. The performance of Bayesian speaker adaptation can degrade when the features of a new speaker differ severely from those of the reference codebook. Excess codewords of the reference codebook remain after the adaptation process and cause confusion during recognition. To solve these problems, the proposed method uses formant information extracted from the cepstral coefficients of the reference codebook and the adaptation data. The reference codebook is adapted to represent the formant distribution of the new speaker and is then used as the initial codebook for Bayesian speaker adaptation. The proposed method provides accurate correspondence between the reference codebook and the adaptation data. It was observed that the excess codewords were not selected during recognition. The experimental results showed that the proposed method improved recognition performance.

A study on the design of ensemble reflector in a concert hall (콘서트홀 무대반사판의 설계에 관한 연구)

Kim, Min Ae;Oh, Yang Ki
- The Journal of the Acoustical Society of Korea / v.37 no.5 / pp.356-362 / 2018
The stage in a classical shoebox-type concert hall occupies one side of the hall and receives many early reflections from the nearby surrounding walls and ceiling. In contrast, the stage in a vineyard terrace concert hall, which is surrounded by terrace seats instead of walls and ceiling, lacks early reflections, which may hinder communication among the players. A vineyard hall stage is enclosed by the front walls of the terrace seats, but the players located on the stage risers keep the walls off, as the walls have limited height. An ensemble reflector installed above the stage is an effective way for the players to monitor the sound produced on the stage, which may help them achieve a good ensemble. An ensemble reflector over the stage of a large vineyard terrace hall with 2,000 seats was designed with its location, shape, and area as variables. The effectiveness of the ensemble reflector was verified using the stage support parameter.
https://doi.org/10.7776/ASK.2018.37.5.356

Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

Lee Ki-Seung
- The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.150-159 / 2005
This paper addresses a voice personality transformation algorithm that makes one person's voice sound like another person's. In the proposed method, a person's voice is represented by the LPC cepstrum, pitch period, and speaking rate, and an appropriate transformation rule is constructed for each parameter. A Gaussian Mixture Model (GMM) is used to model one speaker's LPC cepstra, and a conditional probability is used to model the relationship between the two speakers' LPC cepstra. A Maximum Likelihood (ML) estimation method is employed to obtain the parameters of each probabilistic model. The transformed LPC cepstra are obtained using a Minimum Mean Square Error (MMSE) criterion. Pitch period and speaking rate are used as the parameters for prosody transformation, which is implemented using the ratio of the average values. The proposed method outperforms the previous VQ-based method in objective measures, including the average cepstrum distance reduction ratio and the likelihood increasing ratio. In subjective tests, we obtained almost the same correct identification ratio as the previous method, and we also confirmed that high-quality transformed speech is obtained, owing to spectral contours that evolve smoothly over time.
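The prosody step described above, scaling by the ratio of average values, might look like the following. This is only an illustrative sketch of one plausible reading of the abstract, not the authors' implementation: the function name, the argument names, and the sample pitch-period values are all hypothetical.

```python
from statistics import mean


def scale_by_average_ratio(source_values, source_training, target_training):
    """Scale source prosody values by (target average / source average).

    Assumption: the averages are taken over training data from each speaker.
    """
    ratio = mean(target_training) / mean(source_training)
    return [value * ratio for value in source_values]


# Hypothetical pitch periods in milliseconds, chosen only for illustration.
source_pitch_ms = [8.0, 8.5, 7.5]
converted = scale_by_average_ratio(
    source_pitch_ms,
    source_training=[8.0, 8.0],
    target_training=[4.0, 4.0],
)
print(converted)
```

The same scaling could be applied to speaking rate, since the abstract says both parameters use the ratio of the average values.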

A study of estimation for excess attenuation of Noise propagated on the ground (지표면상을 전파하는 소음의 초과감쇠 산정방법에 관한 연구)

Oh, J.E.;Kim, D.G.;Yim, T.K.
- The Journal of the Acoustical Society of Korea / v.7 no.2 / pp.20-25 / 1988
This study explains the characteristics of excess attenuation over the ground through an outdoor noise propagation experiment and a reduced-scale acoustic model experiment. The outdoor experiment on the attenuation of propagating noise used a small engine with a large acoustic output, and it confirmed a relationship between the excess attenuation, calculated from the measured distance attenuation, and $\log(D/(H_s+H_r))$. The attenuation of propagating noise was found to depend on wind direction and frequency and could be fitted by a straight regression line. The numerical value of the excess attenuation over the ground could be calculated by treating $\log(D/(H_s+H_r))$ as a parameter together with a flow resistance $\sigma$. The optimal $\sigma$ was taken as the value that minimizes the mean square error between the excess attenuation calculated from measurement and the value calculated from the formula $L=-20\log\left|1+(r_1/r_2)\,Q\exp(ik\,\Delta r)\right|$ for an arbitrary $\sigma$. The model experiment was carried out at a reduced scale of 1 to 40, and the distance attenuation measured in a large anechoic chamber was confirmed to agree well with the outdoor measurements.
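The formula for L quoted in the abstract can be evaluated numerically as follows. This is a minimal sketch under several assumptions that the abstract does not state: the logarithm is base 10 (the usual decibel convention), r1 and r2 are the direct and ground-reflected path lengths, the path difference is r2 - r1, k is the wavenumber, and Q is a complex reflection coefficient supplied by the caller. How Q depends on $\sigma$ is not given in the abstract and is not modeled here.

```python
import cmath
import math


def excess_attenuation_db(r1: float, r2: float, q: complex, k: float) -> float:
    """Evaluate L = -20*log10|1 + (r1/r2) * Q * exp(i*k*delta_r)|.

    Assumptions: delta_r = r2 - r1, and Q is a complex reflection coefficient.
    Raises ValueError if the two terms cancel exactly (log of zero).
    """
    delta_r = r2 - r1
    total = 1 + (r1 / r2) * q * cmath.exp(1j * k * delta_r)
    return -20.0 * math.log10(abs(total))


# No reflection (Q = 0): the formula gives zero excess attenuation.
print(excess_attenuation_db(10.0, 10.5, 0.0, 2.0))

# Perfect in-phase reflection over an equal path: a gain of about 6 dB,
# which the formula reports as a negative attenuation.
print(excess_attenuation_db(10.0, 10.0, 1.0, 2.0))
```

Fitting $\sigma$ as the abstract describes would wrap this function in a search that minimizes the mean square error against measured values, given a model for Q.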

Improved Synthesis Method of Negative Inter-channel Correlation Parameter Based on Anti-phase Primary Component (반위상 주요성분에 기반을 둔 개선된 음수 채널간 상관도 파라미터 합성 기법)

Hyun, Dong-Il;Lee, Seok-Pil;Park, Young-Cheol;Youn, Dae-Hee
- The Journal of the Acoustical Society of Korea / v.31 no.6 / pp.410-418 / 2012
Parametric stereo (PS) and MPEG Surround (MPS) are major spatial audio coding (SAC) tools. In this paper, the problem of inter-channel correlation (ICC) synthesis in conventional SAC is analyzed. Conventional methods assume that the ambient components mixed into the two output channels are in anti-phase, while the primary components are in phase. This assumption can cause excessive ambient mixing for a negative-valued ICC. To remedy this problem, we propose a new ICC synthesis method based on the assumption that, for a negative ICC, the primary components are in anti-phase with each other. The proposed method is also applied to the approximation that is used in practice. The performance of the proposed method was evaluated by computer simulations, and subjective listening tests verified that it is effective for both headphone and loudspeaker playback.
https://doi.org/10.7776/ASK.2012.31.6.410

Noisy Environmental Adaptation for Word Recognition System Using Maximum a Posteriori Estimation (최대사후확률 추정법을 이용한 단어인식기의 잡음환경적응화)

Lee, Jung-Hoon;Lee, Shi-Wook;Chung, Hyun-Yeol
- The Journal of the Acoustical Society of Korea / v.16 no.2 / pp.107-113 / 1997
To achieve a Korean word recognition system that is robust to both channel distortion and additive noise, maximum a posteriori (MAP) adaptation is proposed, and the effectiveness of environmental adaptation for improving recognition performance is investigated. Recognition experiments using MAP adaptation are carried out on three types of speech: 1) speech with channel distortion, 2) speech with added environmental noise, and 3) speech with both channel distortion and additive noise. The effectiveness of additional feature parameters, such as regression coefficients and durations, for environmental adaptation is also investigated. In speaker-independent 100-word recognition tests, recognition improved by 9.0% for case 1), by more than 75% for case 2), and by 11% to 61.4% for case 3), showing that MAP environmental adaptation is effective for both channel-distorted and noise-added speech recognition. However, duration information used as an additional feature parameter did not play an important role in the tests.

A study on the acoustic performance of a silencer according to the change of properties of absorbing material (흡음재 물성치 변화에 따른 소음기 음향성능 연구)

Lee, Yongbeom;Yang, Haesang
- The Journal of the Acoustical Society of Korea / v.40 no.4 / pp.278-289 / 2021
In this study, the acoustic performance of a dissipative silencer, a type used in ships that performs well relative to its size, was predicted and analyzed numerically with the aim of reducing pipe noise. To this end, the performance of a single-expansion-chamber silencer was first verified using experimental and numerical methods. The acoustic performance of the silencer was expressed as the Transmission Loss (TL), an indicator of the silencer's own performance, and the experimental result was derived using the two-load method, in which measurements are taken while changing the impedance at the end of the pipe. For the numerical analysis, a general-purpose finite element analysis program was used, and the Delany-Bazley-Miki model, with the flow resistivity of the sound-absorbing material as an input parameter, was applied. Finally, we compared the experimental and simulated acoustic performance of the single-expansion-chamber silencer and the dissipative silencer to confirm their consistency, and predicted and analyzed the simulation results for four cases that differ in the properties of the sound-absorbing material.
https://doi.org/10.7776/ASK.2021.40.4.278

A quantitative analysis of synthetic aperture sonar image distortion according to sonar platform motion parameters (소나 플랫폼의 운동 파라미터에 따른 합성개구소나 영상 왜곡의 정량적 분석)

Kim, Sea-Moon;Byun, Sung-Hoon
- The Journal of the Acoustical Society of Korea / v.40 no.4 / pp.382-390 / 2021
Synthetic aperture sonars, like side scan sonars and multibeam echo sounders, have been commercialized and are widely used for seafloor imaging. In Korea, related research such as the development of a towed synthetic aperture sonar system is underway. Obtaining high-resolution synthetic aperture sonar images requires accurate estimation of the motion of the platform on which the sonar is installed, and therefore a precise underwater navigation system. This paper provides reference data for determining the required navigation accuracy and the precision of navigation sensors by quantitatively analyzing how much the sonar images are distorted by the motion characteristics of the platform carrying the synthetic aperture sonar. Five types of motion are considered, and a normalized root mean square error is defined for the quantitative analysis. Error-analysis simulations in which the motion parameters are varied show that yaw and sway motions cause the largest image distortion, whereas the effect of pitch and heave motions is not significant.
https://doi.org/10.7776/ASK.2021.40.4.382
