Search | Korea Science

A Study on the Effective Command Delivery of Commanders Using Speech Recognition Technology (국방 분야에서 전장 소음 환경 하에 음성 인식 기술 연구)

Yeong-hoon Kim;Hyun Kwon
- Convergence Security Journal / v.24 no.2 / pp.161-165 / 2024
Recently, speech recognition models have advanced alongside various speech processing technologies for obtaining high-quality data. In the defense sector, efforts are under way to integrate technologies that effectively remove noise from speech data in noisy battlefield situations and enable efficient speech recognition. This paper proposes a method for effective speech recognition amid diverse battlefield noise, so that commanders can convey orders. The proposed method removes noise from noisy speech and then converts the speech to text using OpenAI's Whisper model. Experimental results show that the proposed method reduces the Character Error Rate (CER) by 6.17% compared to the existing method that does not remove noise. Potential applications of the proposed method in the defense sector are also discussed.
https://doi.org/10.33778/kcsa.2024.24.2.161
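The abstract above reports its result as a Character Error Rate but does not spell out the computation. A minimal sketch of the usual definition (character-level edit distance divided by the reference length) follows; the function names are illustrative and not taken from the paper, and the abstract does not say whether the 6.17% figure is an absolute or a relative reduction.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition a CER can exceed 1.0 when the hypothesis contains many insertions, which is why it is normalized by the reference length rather than the hypothesis length.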

Reduction Algorithm of Environmental Noise by Multi-band Filter (멀티밴드필터에 의한 환경잡음억압 알고리즘)

Choi, Jae-Seung
- Journal of the Korea Society of Computer and Information / v.17 no.8 / pp.91-97 / 2012
This paper first proposes a speech recognition algorithm based on detecting the speech and noise sections at each frame, and then proposes an environmental noise reduction algorithm based on a multi-band filter, which removes background noise at each frame according to the detected speech and noise sections. The proposed algorithm reduces background noise in the filter-bank sub-band domain after extracting features from the speech data. The proposed noise reduction algorithm is evaluated at each frame using speech and noise data. Spectral distortion measurements confirm that the proposed algorithm is effective for speech corrupted by noise.
https://doi.org/10.9708/jksci.2012.17.8.091

Estimation method of noise intensity by neural network for application in speech enhancement (음성강조에의 응용을 위한 신경회로망에 의한 잡음량의 추정법)

Choi Jae-Seung
- Journal of the Institute of Electronics Engineers of Korea SP / v.42 no.3 s.303 / pp.129-136 / 2005
To reduce the noise in noisy speech and reproduce good-quality speech, it is desirable to change the parameters of the speech processing system according to the noise intensity. This paper proposes a method for estimating noise intensity using a three-layer neural network trained on speech degraded to three grades by white noise or road noise. Experimental results demonstrate that the neural network can estimate the noise intensity. Even when the speakers and speech data differ from the training data, the neural network estimates the noise intensity with an average accuracy of 95% or more for white noise.

Improved Acoustic Modeling Based on Selective Data-driven PMC

Kim, Woo-Il;Kang, Sun-Mee;Ko, Han-Seok
- Speech Sciences / v.9 no.1 / pp.39-47 / 2002
This paper proposes an effective method to remedy the acoustic modeling problem inherent in the usual log-normal Parallel Model Composition (PMC) intended for robust speech recognition. In particular, the Gaussian kernels under the prescribed log-normal PMC cannot sufficiently express the corrupted speech distributions. The proposed scheme corrects this deficiency by judiciously selecting the 'fairly' corrupted components and re-estimating each as a mixture of two distributions using data-driven PMC. As a result, some components are merged while an equal number of components are split. The decision to split or merge is made by measuring the similarity of the corrupted speech model to the clean model and the noise model. The experimental results indicate that the suggested algorithm is effective in representing the corrupted speech distributions and attains consistent improvement across various SNRs and noise types.

Data Augmentation for DNN-based Speech Enhancement (딥 뉴럴 네트워크 기반의 음성 향상을 위한 데이터 증강)

Lee, Seung Gwan;Lee, Sangmin
- Journal of Korea Multimedia Society / v.22 no.7 / pp.749-758 / 2019
This paper proposes a data augmentation algorithm to improve the performance of DNN (deep neural network) based speech enhancement. Many deep learning studies explore algorithms that maximize performance with a limited amount of data. The most commonly used is data augmentation, a technique that artificially increases the amount of data. For effective data augmentation, we used a formant enhancement method that assigns different weights to the formant frequencies. The DNN model trained with the proposed data augmentation algorithm was evaluated in various noise environments. Its speech enhancement performance was compared with that of a DNN model trained with conventional data augmentation and a DNN model trained without data augmentation. The proposed data augmentation algorithm showed higher speech enhancement performance than the other two.
https://doi.org/10.9717/kmms.2019.22.7.749

Performance Improvement in the Multi-Model Based Speech Recognizer for Continuous Noisy Speech Recognition (연속 잡음 음성 인식을 위한 다 모델 기반 인식기의 성능 향상에 대한 연구)

Chung, Yong-Joo
- Speech Sciences / v.15 no.2 / pp.55-65 / 2008
Recently, the multi-model based speech recognizer has been used quite successfully for noisy speech recognition. To select the reference HMM (hidden Markov model) that best matches the noise type and SNR (signal-to-noise ratio) of the input test speech, the multi-model framework has estimated the SNR with a VAD (voice activity detection) algorithm and classified the noise type with a GMM (Gaussian mixture model) as separate steps. Because the SNR estimation process is vulnerable to errors, we propose an efficient method that classifies the SNR values and noise types simultaneously. The classification uses the KL (Kullback-Leibler) distance between the single Gaussian distributions of the noise signal in training and in testing. Recognition experiments on the Aurora 2 database show the usefulness of the model compensation method in the multi-model based speech recognizer. Further performance improvement was achieved by combining the probability density function of MCT (multi-condition training) with that of the reference HMM compensated by D-JA (data-driven Jacobian adaptation) in the multi-model based speech recognizer.
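The abstract above classifies the noise condition by the KL distance between single Gaussians fitted to training and test noise. A minimal sketch of that idea follows, assuming diagonal covariances, the one-directional divergence KL(test || reference), and a hypothetical dictionary of reference Gaussians keyed by (noise type, SNR); the abstract does not confirm any of these choices.

```python
import math


def kl_diag_gauss(mu_p, var_p, mu_q, var_q):
    """KL(p || q) for two Gaussians with diagonal covariance (assumed form)."""
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total


def classify_noise(test_mu, test_var, references):
    """Return the label of the reference Gaussian closest to the test noise.

    `references` is a hypothetical mapping: label -> (mean list, variance list),
    where a label could be a (noise type, SNR in dB) pair.
    """
    return min(
        references,
        key=lambda label: kl_diag_gauss(test_mu, test_var, *references[label]),
    )
```

Keying the references by (noise type, SNR) pairs means a single nearest-Gaussian lookup returns both quantities at once, which matches the abstract's stated goal of avoiding a separate SNR estimation step.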

On-line model compensation using noise masking effect for robust speech recognition (잡음 차폐를 이용한 온라인 모델 보상)

Jung Gue-Jun;Cho Hoon-Young;Oh Yung-Hwan
- Proceedings of the KSPS conference / 2003.05a / pp.215-218 / 2003
In this paper, we apply PMC (parallel model combination) online in a speech recognition system. As a representative model-based noise compensation technique, PMC compensates for environmental mismatch by combining pretrained clean speech models with noise information estimated in real time. This approach is very effective for compensating extreme environmental mismatch, but its heavy computational cost makes it unsuitable for on-line systems. To reduce the computational cost and apply PMC online, we use the noise masking effect in the model compensation process: the energy in a frequency band is dominated either by clean speech energy or by noise energy. Experiments on artificially produced noisy speech data confirm that the proposed technique is fast and effective for on-line model compensation.
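One common reading of the noise masking effect described above is the log-max approximation: in the log-spectral domain, each band of the compensated model takes the larger of the clean-speech and noise values instead of their exact log-sum. The sketch below contrasts the two on per-band log-energy means; it is an assumption about the general idea, not the paper's actual compensation procedure, and the function names are illustrative.

```python
import math


def compensate_log_add(speech_log, noise_log):
    """Exact additive combination in the log-energy domain: log(e^s + e^n)."""
    return [math.log(math.exp(s) + math.exp(n))
            for s, n in zip(speech_log, noise_log)]


def compensate_masking(speech_log, noise_log):
    """Masking approximation: each band is dominated by speech or by noise."""
    return [max(s, n) for s, n in zip(speech_log, noise_log)]
```

The masking version needs no exponentials or logarithms, and per band it underestimates the log-add result by at most log 2, with the largest gap where speech and noise energies are equal.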

Noise Reduction Algorithm in Speech by Wiener Filter (위너필터에 의한 음성 중의 잡음제거 알고리즘)

Choi, Jae-Seung
- The Journal of the Korea institute of electronic communication sciences / v.8 no.9 / pp.1293-1298 / 2013
This paper proposes a noise reduction algorithm that uses a Wiener filter to remove noise components from noisy speech and thereby improve the speech signal. The proposed algorithm first removes the white noise spectrum from the noisy signal at each frame using a noise reshaping and reduction method. It then enhances the speech signal using a Wiener filter based on linear predictive coding analysis. The algorithm is evaluated using speech data from a Japanese male speaker and noise data. Spectral distortion (SD) measurements confirm that the proposed algorithm is effective for speech contaminated by white noise. The maximum improvement in output SD was 4.94 dB for white noise compared with a conventional Wiener filter.
https://doi.org/10.13067/JKIECS.2013.8.9.1293
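For orientation, the textbook frequency-domain Wiener gain can be sketched as below. It assumes per-bin power spectra of the noisy frame and a noise estimate are already available, and it is the generic form only, not the LPC-based variant or the noise reshaping step the abstract describes; `wiener_gain` and its `floor` parameter are illustrative names.

```python
def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Per-bin Wiener gain G = max(Py - Pn, 0) / Py, limited below by `floor`."""
    gains = []
    for py, pn in zip(noisy_power, noise_power):
        if py <= 0.0:
            gains.append(floor)  # silent bin: nothing to scale
            continue
        clean_estimate = max(py - pn, 0.0)  # estimated clean-speech power
        gains.append(max(clean_estimate / py, floor))
    return gains
```

Multiplying each bin of the noisy spectrum by its gain attenuates bins where the noise estimate accounts for most of the power; a small positive `floor` is often used to limit musical-noise artifacts.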

Speech perception difficulties and their associated cognitive functions in older adults (노년층의 말소리 지각 능력 및 관련 인지적 변인)

Lee, Soo Jung;Kim, HyangHee
- Phonetics and Speech Sciences / v.8 no.1 / pp.63-69 / 2016
The aims of the present study are two-fold: 1) to explore differences in speech perception between younger and older adults across noise conditions; and 2) to investigate which cognitive domains are correlated with speech perception. Data were acquired from 15 younger adults and 15 older adults. A sentence recognition test was conducted in four noise conditions (i.e., in quiet, +5 dB SNR, 0 dB SNR, -5 dB SNR). All participants completed auditory and cognitive assessments. After controlling for hearing thresholds, the older group showed significantly poorer performance than the younger group only under the high-noise condition at -5 dB SNR. For the older group, performance on the Seoul Verbal Learning Test (immediate recall) was significantly correlated with speech perception performance after controlling for hearing thresholds. In older adults, working memory and verbal short-term memory are the best predictors of speech-in-noise perception. The current study suggests that cognitive function should be considered when assessing speech perception in older adults, because of its effect on speech perception under background noise.
https://doi.org/10.13064/KSSS.2016.8.1.063

Fast Speaker Adaptation Based on Eigenspace-based MLLR Using Artificially Distorted Speech in Car Noise Environment (차량 잡음 환경에서 인위적 왜곡 음성을 이용한 Eigenspace-based MLLR에 기반한 고속 화자 적응)

Song, Hwa-Jeon;Jeon, Hyung-Bae;Kim, Hyung-Soon
- Phonetics and Speech Sciences / v.1 no.4 / pp.119-125 / 2009
This paper proposes a fast speaker adaptation method based on eigenspace-based maximum likelihood linear regression (ES-MLLR) that uses artificially distorted speech, for a telematics terminal in a car noise environment. The artificially distorted speech is built by adding various car noise signals collected from a driving car to the speech signal collected from an idling car. Then, for every environment, the transformation matrix is estimated by ES-MLLR using the artificially distorted speech corresponding to that noise environment. In test mode, an online model is built as a weighted sum of the environment transformation matrices depending on the driving condition. In a 3k-word recognition task on the telematics terminal, we achieve performance superior to that of ES-MLLR, even when ES-MLLR uses adaptation data collected in the driving condition.
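The abstract above builds its adaptation data by adding recorded car noise to speech. A minimal sketch of additive mixing follows; scaling the noise to a target SNR is an assumption added here for illustration (the abstract does not say whether the noise level was adjusted), and the function names are hypothetical.

```python
import math


def signal_power(samples):
    """Mean squared amplitude of a sample sequence."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(speech, noise, snr_db):
    """Add `noise` to `speech`, scaled so the mixture has the requested SNR in dB."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    speech_power = signal_power(speech)
    noise_power = signal_power(noise)
    if noise_power == 0.0:
        raise ValueError("noise must have non-zero power")
    # Choose scale so that speech_power / (scale**2 * noise_power) == 10**(snr_db / 10).
    scale = math.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(speech, noise)]
```

Calling `mix_at_snr(speech, noise, 5.0)` with equal-length sample lists returns a mixture whose added noise component sits 5 dB below the speech power.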

