Search | Korea Science

Vector Quantization based Speech Recognition Performance Improvement using Maximum Log Likelihood in Gaussian Distribution (가우시안 분포에서 Maximum Log Likelihood를 이용한 벡터 양자화 기반 음성 인식 성능 향상)

Chung, Kyungyong;Oh, SangYeob
- Journal of Digital Convergence / v.16 no.11 / pp.335-340 / 2018
Commercial speech recognition systems with accurate recognition rates use a learning model built from speaker-dependent isolated data. However, their speech recognition performance decreases in noisy environments, depending on the quantity of data. In this paper, we propose a vector quantization based method for improving speech recognition performance using maximum log likelihood in a Gaussian distribution. The proposed method configures the learning model to increase recognition accuracy for similar speech by combining vector quantization and maximum log likelihood with a speech feature extraction method. Speech features are extracted based on the hidden Markov model. The method can improve the accuracy of inaccurate speech models produced by an existing system, and it can yield a robust model for speech recognition. The proposed method shows improved recognition accuracy in a speech recognition system.
https://doi.org/10.14400/JDC.2018.16.11.335

Towards better acoustic conditions in school buildings in Korea-a need for Korean standard for classroom acoustics (국내 교육시설의 음향기준 제정의 필요성 제고)

Young-Ji Choi
- The Journal of the Acoustical Society of Korea / v.42 no.2 / pp.113-123 / 2023
This paper describes the acoustical conditions of elementary school, high school, and university classrooms in Korea and argues for Korean acoustic standards and guidelines for classroom design. Current standards and guidelines for classroom acoustics in several countries are briefly introduced to explain their performance criteria for background noise levels and reverberation times, and their noise isolation design requirements for various types of classrooms. The results of several acoustic surveys of domestic elementary school, high school, and university classrooms are described and compared to characterize the acoustics of Korean school classrooms. The surveys include occupied and unoccupied data on the acoustical conditions, noise levels, and noise isolation performance in the classrooms. Acoustical parameter values for achieving 'good' speech intelligibility in active university classrooms are also presented.
https://doi.org/10.7776/ASK.2023.42.2.113

Dimension Reduction Method of Speech Feature Vector for Real-Time Adaptation of Voice Activity Detection (음성구간 검출기의 실시간 적응화를 위한 음성 특징벡터의 차원 축소 방법)

Park Jin-Young;Lee Kwang-Seok;Hur Kang-In
- Journal of the Institute of Convergence Signal Processing / v.7 no.3 / pp.116-121 / 2006
In this paper, we propose a dimension reduction method for multi-dimensional speech feature vectors, intended for real-time adaptation in various noisy environments. The method reduces dimensions non-linearly to map the likelihood of the speech feature vector and the noise feature vector. The LRT (Likelihood Ratio Test) is used to classify speech and non-speech. The implementation results are similar to those obtained with the multi-dimensional speech feature vector. Speech recognition results on the detected speech data are also similar to those obtained with the multi-dimensional (10th-order MFCC, Mel-Frequency Cepstral Coefficient) speech feature vector.
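
The likelihood ratio test mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single scalar feature per frame and one Gaussian model each for speech and noise; the model parameters, the threshold, and all function names below are hypothetical and are not taken from the paper.

```python
import math

def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D Gaussian with the given mean and variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_likelihood_ratio(x, speech, noise):
    """log p(x | speech) - log p(x | noise); each model is a (mean, var) pair."""
    return gaussian_log_pdf(x, *speech) - gaussian_log_pdf(x, *noise)

def is_speech(x, speech, noise, threshold=0.0):
    """Classify a frame as speech when the log-likelihood ratio exceeds the threshold."""
    return log_likelihood_ratio(x, speech, noise) > threshold

# Hypothetical (mean, variance) models for a reduced one-dimensional feature.
SPEECH_MODEL = (4.0, 1.0)
NOISE_MODEL = (0.0, 1.0)

frames = [0.2, 3.9, 4.5, -0.3]
decisions = [is_speech(f, SPEECH_MODEL, NOISE_MODEL) for f in frames]
print(decisions)
```

With equal variances the decision boundary sits midway between the two means, so frames near 4.0 are labeled speech and frames near 0.0 are labeled noise. Raising the threshold trades missed speech for fewer false alarms.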

A Multi-Model Based Noisy Speech Recognition Using the Model Compensation Method (다 모델 방식과 모델보상을 통한 잡음환경 음성인식)

Chung, Young-Joo;Kwak, Seung-Woo
- MALSORI / no.62 / pp.97-112 / 2007
Speech recognizers generally operate in noisy acoustical environments, and much research has been done to cope with these acoustical variations. Among the proposed approaches, the multiple-HMM model approach seems to be quite effective compared with conventional methods. In this paper, we consider a multiple-model approach combined with the model compensation method and investigate the necessary number of HMM model sets through noisy speech recognition experiments. By using data-driven Jacobian adaptation for the model compensation, the multiple-model approach with only a few model sets for each noise type could achieve results comparable to those of the re-training method.

Hybrid Companding Delta Modulation with Silence Detection (묵음 검출 기능을 사용한 하이브리드 압신 델타 변조기)

조동호;은종관
- Journal of the Korean Institute of Telematics and Electronics / v.19 no.6 / pp.84-90 / 1982
In this paper, we exploit the intermittent nature of speech to reduce the transmission rate or to increase the signal-to-quantization noise ratio (SQNR) when coding speech by hybrid companding delta modulation (HCDM). In this scheme, silence in speech is detected by a speech/silence discriminator, and HCDM coding is done only for the speech portion. For silence that is detected in every block of 5 ms, only the information indicating that the [...] Since the HCDM coder transmits a binary signal synchronously at a fixed rate, the use of a buffer and its efficient control is essential. By using HCDM with silence detection in coding speech, we could improve the SQNR by as much as 6 dB over conventional HCDM or reduce the transmission rate by one third of the HCDM rate.

Building a Korean conversational speech database in the emergency medical domain (응급의료 영역 한국어 음성대화 데이터베이스 구축)

Kim, Sunhee;Lee, Jooyoung;Choi, Seo Gyeong;Ji, Seunghun;Kang, Jeemin;Kim, Jongin;Kim, Dohee;Kim, Boryong;Cho, Eungi;Kim, Hojeong;Jang, Jeongmin;Kim, Jun Hyung;Ku, Bon Hyeok;Park, Hyung-Min;Chung, Minhwa
- Phonetics and Speech Sciences / v.12 no.4 / pp.81-90 / 2020
This paper describes a method of building Korean conversational speech data in the emergency medical domain and proposes an annotation method for the collected data in order to improve speech recognition performance. To suggest future research directions, baseline speech recognition experiments were conducted by using partial data that were collected and annotated. All voices were recorded at 16-bit resolution at 16 kHz sampling rate. A total of 166 conversations were collected, amounting to 8 hours and 35 minutes. Various information was manually transcribed such as orthography, pronunciation, dialect, noise, and medical information using Praat. Baseline speech recognition experiments were used to depict problems related to speech recognition in the emergency medical domain. The Korean conversational speech data presented in this paper are first-stage data in the emergency medical domain and are expected to be used as training data for developing conversational systems for emergency medical applications.
https://doi.org/10.13064/KSSS.2020.12.4.081

Speech Basis Matrix Using Noise Data and NMF-Based Speech Enhancement Scheme (잡음 데이터를 활용한 음성 기저 행렬과 NMF 기반 음성 향상 기법)

Kwon, Kisoo;Kim, Hyung Young;Kim, Nam Soo
- The Journal of Korean Institute of Communications and Information Sciences / v.40 no.4 / pp.619-627 / 2015
This paper presents a speech enhancement method using non-negative matrix factorization (NMF). In the training phase, the basis matrix of each source signal is obtained from a suitable database, and these basis matrices are then used for source separation. The performance of speech enhancement therefore relies heavily on the basis matrices. The proposed method, in which the speech basis matrix is trained to yield a high reconstruction error for noise signals, performs better than standard NMF, in which each basis matrix is trained independently. For comparison, we propose another method and also evaluate one previous method. In the experiments, performance is evaluated by perceptual evaluation of speech quality and signal-to-distortion ratio, and the proposed method outperforms the other methods.
https://doi.org/10.7840/kics.2015.40.4.619

Speech Activity Decision with Lip Movement Image Signals (입술움직임 영상신호를 고려한 음성존재 검출)

Park, Jun;Lee, Young-Jik;Kim, Eung-Kyeu;Lee, Soo-Jong
- The Journal of the Acoustical Society of Korea / v.26 no.1 / pp.25-31 / 2007
This paper describes an attempt to prevent external acoustic noise from being misrecognized as the speech recognition target. To this end, the speech activity detection process for speech recognition checks the lip movement image signal of the speaker in addition to the acoustic energy. First, successive images are obtained through a PC camera, and the presence or absence of lip movement is determined. The lip movement image signal data is then stored in shared memory and shared with the recognition process. Meanwhile, in the speech activity detection process, which is the preprocessing phase of speech recognition, the data stored in the shared memory is checked to verify whether the acoustic energy comes from the speaker's speech. The speech recognition processor and the image processor were connected and tested successfully. When the speaker faced the camera and spoke, processing proceeded normally through to the output of the speech recognition result. When the speaker spoke without facing the camera, no speech recognition result was output. That is, if no lip movement image is identified even though acoustic energy is input, the input is regarded as acoustic noise.
https://doi.org/10.7776/ASK.2007.26.1.025

Statistical Investigation on Airborne Noise Levels of Navy Shipboard Compartments (함정의 격실 소음도에 관한 통계적 조사 연구)

김종철;박일권;조대승
- Transactions of the Korean Society for Noise and Vibration Engineering / v.13 no.8 / pp.637-644 / 2003
Airborne noise is a significant environmental factor for navy ship personnel, who must accomplish their tasks in restricted ship spaces despite the adverse health effects of noise. However, studies on the actual airborne noise conditions of navy ships are hard to find. In this study, we measured airborne noise levels in 379 compartments of 33 Korean navy ships. Using the measured data, we present the mean values and standard deviations of the measured noise levels by ship type, compartment category, and operation mode. Additionally, we compare the mean values and cumulative probabilities of compartment airborne noise levels with those of US Navy ships. These results can serve as a rational basis for establishing proper airborne noise criteria for future navy ships.
https://doi.org/10.5050/KSNVN.2003.13.8.637

Analysis of Eigenvalues of Covariance Matrices of Speech Signals in Frequency Domain (음성 신호의 주파수 영역에서의 공분산행렬의 고유값 분석)

Kim, Seonil
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.47-50 / 2015
Speech signals consist of consonants and vowels, but vowels last much longer than consonants. It can therefore be assumed that the correlation between signal blocks in a speech signal is very high. Each speech signal is divided into blocks of 128 samples, and an FFT is applied to each block. The low-frequency part of the FFT result is taken, the covariance matrix between blocks in a speech signal is computed, and finally the eigenvalues of that matrix are obtained. The distribution of eigenvalues across various speech files is studied. The differences between speech signals and noise signals from cars are also studied.

