Search | Korea Science

An Objective Speech Quality Measure using Masking Effect under Digital Mobile Telephone Network Environment (디지털 이동통신망 환경 하에서 마스킹 효과를 이용한 객관적 음질 평가 척도)

김광수;김민정;석수영;정호열;정현일
- Journal of Korea Multimedia Society / v.5 no.4 / pp.405-414 / 2002
In this paper, we propose a new objective speech quality measure that uses the noise masking threshold to assess speech quality in mobile telephone network environments, and we verify the effectiveness of the proposed method through experiments. For this purpose, well-known objective speech quality measures such as BSD and PSQM are first evaluated in digital mobile telephone network environments. However, these conventional methods do not perform as well under mobile network environments as the results reported in the literature suggest. To be a more effective objective speech quality measure under mobile telephone environments, the proposed method employs the human psychoacoustic masking effect. The DMOS, instead of the MOS, is used as the subjective speech quality measure for performance evaluation. The performance comparisons are carried out with speech data collected from digital mobile telephone environments. As a result, the proposed measure shows, on average, 4% higher performance in terms of correlation than existing objective speech quality measures such as BSD and PSQM.
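The abstract above reports performance "in terms of correlation" between an objective measure and the subjective DMOS. A minimal sketch of that kind of comparison, assuming the Pearson correlation coefficient is meant (the abstract does not name the coefficient, and the function name is illustrative):

```python
import math


def pearson_correlation(objective_scores, subjective_scores):
    """Pearson correlation between two equally long score lists.

    Assumption: one objective score and one subjective score (e.g. DMOS)
    per test condition; the paper's exact procedure is not given here.
    """
    n = len(objective_scores)
    if n != len(subjective_scores) or n < 2:
        raise ValueError("need two lists of equal length >= 2")

    mean_obj = sum(objective_scores) / n
    mean_sub = sum(subjective_scores) / n

    # Accumulate the covariance and both variances (unnormalized).
    cov = sum((o - mean_obj) * (s - mean_sub)
              for o, s in zip(objective_scores, subjective_scores))
    var_obj = sum((o - mean_obj) ** 2 for o in objective_scores)
    var_sub = sum((s - mean_sub) ** 2 for s in subjective_scores)

    if var_obj == 0 or var_sub == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_obj * var_sub)
```

For a distortion-type measure, where larger values mean worse quality, the magnitude of the coefficient is typically what gets compared across measures.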

Speech Activity Detection using Lip Movement Image Signals (입술 움직임 영상 선호를 이용한 음성 구간 검출)

Kim, Eung-Kyeu
- Journal of the Institute of Convergence Signal Processing / v.11 no.4 / pp.289-297 / 2010
This paper presents a method that prevents external acoustic noise from being misrecognized as a speech recognition target during the speech activity detection process, by checking lip movement image signals in addition to acoustic energy. First, successive images are captured with a PC camera, and the presence or absence of lip movement is determined. Next, the lip movement image signal data are stored in shared memory and shared with the speech recognition process. Meanwhile, in the speech activity detection process, which is the preprocessing stage of speech recognition, the data stored in shared memory are checked to verify whether the acoustic energy originates from a speaker's utterance. Finally, in an experiment linking the speech recognition processor and the image processor, the speech recognition result was output normally when the speaker faced the camera and spoke, and no result was output when the speaker spoke without facing the camera. In addition, the initial feature values obtained off-line are replaced by values obtained on-line. Similarly, the initial template image captured off-line is replaced with a template image captured on-line, which improves the discrimination of lip movement image tracking. An image processing test bed was implemented to confirm the lip movement image tracking process visually and to analyze the related parameters in real time. With the speech and image processing systems linked, the interworking rate was 99.3% under various illumination conditions.

The study on the information compression by coding method and its performance (파형 부호와 방식에 의한 정보압축과 퍼포먼스에 관한 연구)

안동순
- Proceedings of the Acoustical Society of Korea Conference / 1985.10a / pp.68-71 / 1985
In this paper, Sentence-Sip E Il Ka Gi Seo U1 E Gan Da was spoken by 4 men and 3 see sound is used for the experiment. The A/D conversion time is 30 sec. Data are obtained using a microcomputer and compressed by ADPCM. The compression rate is 1/8. Data compressed by ADPCM are synthesized and compared with the original sound. The rate of speech identification is analysed using the sound pressure and white noise. ADPCM coding is done with 5 bits. With the starting voltage fixed at 2.6 V, it is ascertained that the variable value increases in the initial speech signal and that the process then proceeds with the minimum value "3". From the result of processing, the synthesized sound is almost equal to the original sound. Minimum values cause distortion. A Dummy Head System is used in this experiment.
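The abstract above mentions 5-bit ADPCM coding but does not describe the predictor or the step-size rule. The following is only a minimal sketch of the general technique, assuming a previous-sample predictor and a simple multiplicative step adaptation; the function names, the initial step, and the adaptation factors are illustrative assumptions, not the paper's settings.

```python
import math


def _adapt(step, code, max_code):
    # Assumed rule: grow the step after large codes, shrink it otherwise.
    factor = 1.5 if abs(code) > max_code // 2 else 0.9
    return min(1.0, max(1e-4, step * factor))


def adpcm_encode(samples, bits=5, step=0.02):
    """Encode samples (floats in about [-1, 1]) into signed `bits`-bit codes."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1  # -16..15 for 5 bits
    predicted, codes = 0.0, []
    for x in samples:
        # Quantize the prediction error with the current step size.
        code = max(lo, min(hi, round((x - predicted) / step)))
        codes.append(code)
        # Track the decoder's reconstruction so both sides stay in sync.
        predicted += code * step
        step = _adapt(step, code, hi)
    return codes


def adpcm_decode(codes, bits=5, step=0.02):
    """Rebuild the waveform by mirroring the encoder's state updates."""
    hi = (1 << (bits - 1)) - 1
    predicted, out = 0.0, []
    for code in codes:
        predicted += code * step
        out.append(predicted)
        step = _adapt(step, code, hi)
    return out
```

Because the decoder repeats the encoder's step updates from the codes alone, no step-size information has to be transmitted alongside the 5-bit codes.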

Modification of pitch Algorithm and Its Application to Noise (피치 알고리즘의 수정 및 소음에의 적용)

Shin, Sung-Hwan;Ih, Jeong-Guon
- Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2002.11b / pp.511-516 / 2002
Pitch is a perception related to the subjective frequency, one of the psychological attributes of tones. Together with loudness and timbre, it is also an important factor in determining sound quality. Although pitch has been studied actively in the field of speech communication, its application to product sound quality remains limited. In this study, Zwicker's empirical data are used to modify the currently available pitch extraction model based on the place theory. The applicability of the modified model is suggested by applying it to various sound samples composed of tonal or banded components. As a demonstration, the algorithm is used for the sound quality analysis of a product noise having a fundamental frequency and harmonics. The result shows that pitch should be regarded as an important subjective cue in sound quality analysis.

A New Speech Waveform Coding Based on the Nonuniform Sampling Method with Separated to High-Low Band (대역분리-비균일표본화 방법을 이용한 새로운 음성신호의 파형부호화 연구)

Bae, Myung-Jin;Lee, Joo-Hun;Im, Sung-Bin;Lee, Won-Cheol
- The Journal of the Acoustical Society of Korea / v.14 no.5 / pp.89-93 / 1995
To reduce the redundancy among samples that results from uniform sampling, nonuniform sampling or nonredundant-sample coding methods can be considered. However, it is well known that when conventional nonuniform sampling methods are applied directly to a speech signal, the required amount of data is comparable to or more than that required by a uniform sampling method such as PCM. To overcome this problem, a new nonuniform sampling method is proposed, in which nonuniform sampling is applied to the low-pass filtered speech signal and the higher band is compensated by 8 colored Gaussian random noise with various noise levels. With this method, the speech signal waveform can be encoded with a compression ratio 1.8 times larger than that of the conventional nonuniform sampling method.

Effects of reverberation time on binaural Korean monosyllabic word recognition in normal hearing subjects (잔향시간이 양이를 사용한 한국어 단음절 인지에 미치는 영향)

Lim, Dukhwan
- The Journal of the Acoustical Society of Korea / v.40 no.6 / pp.678-682 / 2021
Reverberation Time (RT), together with noise levels, can affect speech recognition ability in a listening environment. The degree of influence may depend on reverberation times and modes of binaural hearing. In this study, Korean monosyllabic Word Recognition Scores (WRS) were investigated in 10 young normal hearing subjects under binaural conditions. An RT of 3.4 s and a signal to noise ratio of 0 dB were used at 55 dB HL for diotic (noise with the same phase) and dichotic (noise with a fixed phase difference, π) conditions. An improvement in WRS was noted in dichotic hearing (p < 0.05), while a similar trend was not observed in diotic hearing. These data may be useful in analyzing psychoacoustic effects of RTs under noisy conditions.
https://doi.org/10.7776/ASK.2021.40.6.678

An Acoustical Study of Korean Diphthongs (한국어 이중모음의 음향학적 연구)

Yang Byeong-Gon
- MALSORI / no.25_26 / pp.3-26 / 1993
The goals of the present study were (1) to collect and analyze sets of fundamental frequency (F0) and formant frequency (F1, F2, F3) data of Korean diphthongs from ten linguistically homogeneous male speakers of Korean, and (2) to make a comparative study of Korean monophthongs and diphthongs. Various definitions, kinds, and previous studies of diphthongs were examined in the introduction. Procedures for screening subjects to form a linguistically homogeneous group, time point selection, and formant determination were explained in the following section. The principal findings were as follows: 1. Much variation was observed in the ongliding part of diphthongs. 2. F2 values of the [j] group descended while those of the [w] group ascended. 3. The average duration of diphthongs was about 110 msec, and there was not much variation between speakers and diphthongs. 4. In a comparative study of monophthongs and diphthongs, F1 and F2 values of the same offgliding part at the third time point almost converged. 5. The gliding of diphthongs was very short beginning from the h-noise. Perceptual studies using speech synthesis are desirable to find major parameters for diphthongs. The results of the present study will be useful in the areas of automated speech recognition and computer synthesis of speech.

Performance Assessment of Speech Recognizer using Lombard Speech (롬바드 음성을 이용한 음성인식기의 성능 평가)

Jung, Sung-Yun;Chung, Hyun-Yeol;Kim, Kyung-Tae
- The Journal of the Acoustical Society of Korea / v.13 no.5 / pp.59-68 / 1994
This paper describes a performance assessment test, and an analysis of its results, for a Korean speech recognizer operating on speech affected by the Lombard effect in noisy environments, as basic research on performance assessment. In the assessment test, standard speech data were first manipulated to approximate speech uttered in a noisy environment, and performance assessment tests were then carried out over the assessment items (the type of noise and the SNR) in two ways: one with Lombard-effect speech (LES) and the other without it (NLES). As a result, when a recognition rate of 90% is set as the recognition limit, it was achieved at an SNR of 10 dB with LES but at 30 dB with NLES. This 20 dB SNR difference indicates that the Lombard effect should be considered in real-world assessment tests. The type of noise did not affect the performance of the recognizers in our tests. An ANOVA, used to evaluate several kinds of recognizers, showed that every assessment item affecting recognition performance could be quantified.
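The abstract above uses the SNR as one of its assessment items. As a minimal sketch of how test material at a chosen SNR is commonly prepared, the code below scales a noise signal so that the speech-to-noise power ratio matches a target in dB; the function names are illustrative, and the paper's actual data preparation is not described here.

```python
import math


def mean_power(signal):
    """Average power (mean of squared samples) of a sample list."""
    return sum(s * s for s in signal) / len(signal)


def measured_snr_db(speech, noise):
    """SNR in dB of a speech signal relative to a noise signal."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Return (mixture, scaled_noise) with the noise scaled to the target SNR."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech, p_noise = mean_power(speech), mean_power(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise must have non-zero power")

    # Solve 10*log10(p_speech / (gain^2 * p_noise)) = target_snr_db for gain.
    gain = math.sqrt(p_speech / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise
```

Calling `mix_at_snr(speech, noise, 10)` and `mix_at_snr(speech, noise, 30)` would produce the two SNR points named in the abstract, given any equally long speech and noise sample lists.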

Simultaneous Speaker and Environment Adaptation by Environment Clustering in Various Noise Environments (다양한 잡음 환경하에서 환경 군집화를 통한 화자 및 환경 동시 적응)

Kim, Young-Kuk;Song, Hwa-Jeon;Kim, Hyung-Soon
- The Journal of the Acoustical Society of Korea / v.28 no.6 / pp.566-571 / 2009
This paper proposes a noise-robust fast speaker adaptation method based on the eigenvoice framework for various noisy environments. The proposed method focuses on de-noising and environment clustering. Since the de-noised adaptation DB still contains residual noise, environment clustering divides the noisy adaptation data into similar environments by a clustering method that uses the cepstral mean of non-speech segments as a feature vector. The adaptation data in each cluster are then used to build an environment-clustered speaker adapted (SA) model. After multiple environment-clustered SA models similar to the test environment are selected, speaker adaptation is conducted based on an appropriate linear combination of the clustered SA models. In our experiments, the proposed method provides an error rate reduction of 40% to 59% over the baseline with a speaker independent model.
https://doi.org/10.7776/ASK.2009.28.6.566

A study on the clinical usefulness and improvement of hearing in noise test in evaluating central auditory processing (중추 청각 처리 기능 평가에서 hearing in noise test의 임상적 유용성과 개선점 고찰)

Han, Soo-Hee
- The Journal of the Acoustical Society of Korea / v.41 no.1 / pp.108-113 / 2022
Speech recognition in noisy situations is an important skill for effective communication. The Hearing In Noise Test (HINT) has been suggested as a clinical tool to evaluate this ability. However, the tool has not been used widely in domestic clinics. In this study, psychophysical aspects of HINT and burdens in clinical application were analyzed to improve the applicability of the tool. The difficulty in understanding speech in the elderly population is due to hearing loss based on aging of peripheral and central auditory pathways. As typical clinical cases, HINT scores for young and elderly listeners (20s vs 70s) were compared. The four conditions of the HINT test were Quiet (Q), Noise Front (NF), Noise Right (NR), and Noise Left (NL). Quantitative scores showed that the elderly listener required higher Signal to Noise Ratio (SNR) values than the younger counterpart in noisy situations. Although both showed the Binaural Masking Level Difference (BMLD) effect, its strength was smaller in the elderly listener. However, age-matched normalized data have not been established in detail for clinical application. The confirmed usefulness of HINT and related improvements in the clinical measuring procedure were suggested.
https://doi.org/10.7776/ASK.2022.41.1.108
