• Title/Summary/Keyword: Speech Recognition Safety

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments (네트워크 환경에서 서버용 음성 인식을 위한 MFCC 기반 음성 부호화기 설계)

  • Lee, Gil-Ho; Yoon, Jae-Sam; Oh, Yoo-Rhee; Kim, Hong-Kook
    • MALSORI / no.54 / pp.27-43 / 2005
  • Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that operate on the speech reconstructed by the coders. The main cause of the degradation is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for speech recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to provide better speech recognition performance than linear prediction coefficients (LPCs), the typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCCs instead of LPCs to improve the performance of a server-based speech recognition system in network environments. The main challenge of using MFCCs, however, is developing an efficient MFCC quantization scheme at a low bit rate. First, we exploit the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, a safety-net scheme is proposed to make the MFCC-based speech coder robust to channel errors. As a result, we propose an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed coder has speech quality comparable to 8 kbps G.729, while speech recognition with the proposed coder performs better than with G.729.
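
As a rough illustration of the interframe predictive quantization idea mentioned in the abstract above, the following Python sketch predicts each MFCC frame from the previously reconstructed frame and uniformly quantizes only the prediction residual. The prediction coefficient, step size, and synthetic MFCC stream are assumptions for illustration, not the paper's actual quantizer design.

```python
# Minimal sketch of interframe predictive MFCC quantization (illustrative only).
# The prediction coefficient and quantization step are arbitrary assumptions,
# not the values used in the paper.
import numpy as np

def predictive_quantize(mfcc_frames, alpha=0.8, step=0.5):
    """mfcc_frames: (num_frames, num_coeffs) array of MFCC vectors."""
    prev_rec = np.zeros(mfcc_frames.shape[1])    # previously reconstructed frame
    indices, reconstructed = [], []
    for frame in mfcc_frames:
        prediction = alpha * prev_rec            # first-order interframe prediction
        residual = frame - prediction            # only the residual is quantized
        idx = np.round(residual / step).astype(int)
        rec = prediction + idx * step            # reconstruction as the decoder would see it
        indices.append(idx)
        reconstructed.append(rec)
        prev_rec = rec
    return np.array(indices), np.array(reconstructed)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Fake MFCC stream: 100 frames x 13 coefficients with strong frame-to-frame correlation.
    frames = np.cumsum(rng.normal(scale=0.2, size=(100, 13)), axis=0)
    idx, rec = predictive_quantize(frames)
    print("mean reconstruction error:", np.abs(frames - rec).mean())
```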

Speech Emotion Recognition Using 2D-CNN with Mel-Frequency Cepstrum Coefficients

  • Eom, Youngsik; Bang, Junseong
    • Journal of information and communication convergence engineering / v.19 no.3 / pp.148-154 / 2021
  • With the advent of context-aware computing, many attempts have been made to understand emotions. Among them, Speech Emotion Recognition (SER) recognizes the speaker's emotions from speech information. SER succeeds when distinctive features are selected and classified in an appropriate way. In this paper, the performance of SER using neural network models (e.g., a fully connected network (FCN) and a convolutional neural network (CNN)) with Mel-Frequency Cepstral Coefficients (MFCC) is examined in terms of the accuracy and distribution of emotion recognition. On the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), after tuning model parameters, a two-dimensional Convolutional Neural Network (2D-CNN) with MFCC showed the best performance, with an average accuracy of 88.54% over five emotions (anger, happiness, calm, fear, and sadness) of men and women. In addition, an examination of the distribution of recognition accuracies across the neural network models shows that the 2D-CNN with MFCC can be expected to achieve an overall accuracy of 75% or more.
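
A minimal sketch of a 2D-CNN operating on MFCC feature maps, in the spirit of the model described above. The layer sizes, the assumed input shape (13 MFCCs x 200 frames), and the five-class output are illustrative assumptions; the paper's exact architecture and tuned hyperparameters are not reproduced here.

```python
# Small 2D-CNN over MFCC "images" for speech emotion classification (sketch only).
import torch
import torch.nn as nn

class MfccCnn2d(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d((3, 3)), nn.Flatten(),
            nn.Linear(32 * 3 * 3, 64), nn.ReLU(),
            nn.Linear(64, n_classes),
        )

    def forward(self, x):                  # x: (batch, 1, n_mfcc, n_frames)
        return self.classifier(self.features(x))

if __name__ == "__main__":
    model = MfccCnn2d()
    dummy = torch.randn(4, 1, 13, 200)     # batch of MFCC feature maps
    print(model(dummy).shape)              # -> torch.Size([4, 5])
```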

Speech Noise Cancellation using Time Adaptive Threshold Value in Wavelet Transform

  • Lee Chul-Hee; Lee Ki-Hoon; Hwang Hyang-Ja; Moon In-Seob; Kim Chong-Kyo
    • Proceedings of the IEEK Conference / summer / pp.244-248 / 2004
  • This paper proposes a new noise cancellation method for speech recognition in noisy environments. We determine a time-adaptive threshold value from the standard deviations of the wavelet coefficients obtained by a frame-by-frame wavelet transform. The threshold is set to the sum of the standard deviation of the cA3 coefficients and a weighted standard deviation of the cD1 coefficients: the cA3 coefficients represent voiced sound with lower frequency components, and the cD1 coefficients represent unvoiced sound with higher frequency components. In the experiments, we removed noise after adding white Gaussian noise and colored noise to the original speech. The proposed method improved SNR and MSE more than the wavelet transform and wavelet packet transform do. In a speech recognition experiment on a noisy-speech database, recognition performance improved by 2-4%.
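
A minimal sketch of frame-wise wavelet denoising with a time-adaptive threshold built from std(cA3) plus a weighted std(cD1), following the rule described above. The wavelet (db4), frame length, and weight are assumptions, not the paper's settings.

```python
# Frame-wise wavelet denoising with a time-adaptive threshold (illustrative sketch).
import numpy as np
import pywt

def denoise(signal, frame_len=1024, w=0.5, wavelet="db4"):
    out = []
    for start in range(0, len(signal), frame_len):
        frame = signal[start:start + frame_len]
        coeffs = pywt.wavedec(frame, wavelet, level=3)       # [cA3, cD3, cD2, cD1]
        thr = np.std(coeffs[0]) + w * np.std(coeffs[-1])     # time-adaptive threshold per frame
        # Soft-threshold only the detail coefficients; keep the approximation (voiced) part.
        denoised = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
        out.append(pywt.waverec(denoised, wavelet)[:len(frame)])
    return np.concatenate(out)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 8000)
    clean = np.sin(2 * np.pi * 200 * t)
    noisy = clean + rng.normal(scale=0.3, size=t.shape)
    print("noisy MSE:", np.mean((noisy - clean) ** 2),
          "denoised MSE:", np.mean((denoise(noisy) - clean) ** 2))
```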

Optimal Feature Parameters Extraction for Speech Recognition of Ship's Wheel Orders (조타명령의 음성인식을 위한 최적 특징파라미터 검출에 관한 연구)

  • Moon, Serng-Bae; Chae, Yang-Bum; Jun, Seung-Hwan
    • Journal of the Korean Society of Marine Environment & Safety / v.13 no.2 s.29 / pp.161-167 / 2007
  • The goal of this paper is to develop a speech recognition system that can control a ship's autopilot. Feature parameters predicting the speaker's intention were extracted from sample wheel orders written in the SMCP (IMO Standard Marine Communication Phrases), and a post-recognition procedure based on these parameters was designed to make a final decision from the list of candidate words. To evaluate the effectiveness of the parameters and the procedure, a basic experiment was conducted with a total of 525 wheel orders. The experimental results show that the proposed pattern recognition procedure improved performance by about 42.3% over the pre-recognition procedure.
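
A toy illustration of a post-recognition decision step that matches recognizer hypotheses against a small sample of SMCP wheel orders. The vocabulary below and the difflib similarity measure are assumptions used only for illustration; they are not the paper's feature parameters or decision procedure.

```python
# Pick the closest SMCP wheel order for a list of recognizer hypotheses (sketch only).
from difflib import SequenceMatcher

SMCP_WHEEL_ORDERS = [
    "midships", "port five", "port ten", "port twenty", "hard-a-port",
    "starboard five", "starboard ten", "starboard twenty", "hard-a-starboard",
    "steady", "steady as she goes",
]

def decide(hypotheses):
    """hypotheses: candidate strings from the recognizer (e.g., an N-best list)."""
    best_order, best_score = None, -1.0
    for hyp in hypotheses:
        for order in SMCP_WHEEL_ORDERS:
            score = SequenceMatcher(None, hyp.lower(), order).ratio()
            if score > best_score:
                best_order, best_score = order, score
    return best_order, best_score

print(decide(["star board ten", "stopboard ten"]))   # -> ('starboard ten', ...)
```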

A study on Autonomous Travelling Control of Mobile Robot (이동로봇의 자율주행제어에 관한 연구)

  • Lee, Woo-Song; Shim, Hyun-Seok; Ha, Eun-Tae; Kim, Jong-Soo
    • Journal of the Korean Society of Industry Convergence / v.18 no.1 / pp.10-17 / 2015
  • This paper describes research on the remote control of a mobile robot based on voice commands. Using real-time remote control and a wireless network, unmanned remote-control and home-security experiments were carried out with the robot: voice commands captured by a microphone on a PC are recognized and transmitted wirelessly to control the robot. In this research, the speed and direction of the self-driving robot were controlled by speech recognition over the wireless remote link in order to verify the performance of the mobile robot with two drive wheels.
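
A toy sketch of mapping recognized voice commands to wheel speeds for a two-wheel (differential-drive) robot, in the spirit of the speed and direction control described above. The command names and speed values are assumptions, not the paper's command set.

```python
# Map a recognized voice command to (left, right) wheel speeds for a differential-drive robot.
COMMANDS = {
    "forward":  ( 0.5,  0.5),   # (left wheel m/s, right wheel m/s)
    "backward": (-0.5, -0.5),
    "left":     (-0.2,  0.2),   # turn in place to the left
    "right":    ( 0.2, -0.2),
    "stop":     ( 0.0,  0.0),
}

def to_wheel_speeds(recognized_word):
    # Unknown or misrecognized commands default to "stop" for safety.
    return COMMANDS.get(recognized_word, COMMANDS["stop"])

print(to_wheel_speeds("left"))
```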

Defending and Detecting Audio Adversarial Example using Frame Offsets

  • Gong, Yongkang; Yan, Diqun; Mao, Terui; Wang, Donghua; Wang, Rangding
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.4 / pp.1538-1552 / 2021
  • Machine learning models are vulnerable to adversarial examples generated by adding a deliberately designed perturbation to a benign sample. In particular, for an automatic speech recognition (ASR) system, a benign audio clip that sounds normal could be decoded as a harmful command under an adversarial attack. In this paper, we focus on countermeasures against audio adversarial examples. By analyzing the characteristics of ASR systems, we find that frame offsets, created by appending a silence clip to the beginning of an audio signal, can degrade adversarial perturbations into ordinary noise. For various scenarios, we exploit frame offsets with different strategies: defending, detecting, and a hybrid of the two. Compared with previous methods, the proposed method can defend against audio adversarial examples in a simpler, more generic, and more efficient way. Evaluated on three state-of-the-art adversarial attacks against different ASR systems, the experimental results demonstrate that the proposed method effectively improves the robustness of ASR systems.
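
A minimal sketch of the frame-offset idea: prepend a short silence clip and compare the transcription before and after the offset. The asr_transcribe callable is a placeholder for any ASR back end, and the 0.1 s offset and the simple decision rule are assumptions rather than the paper's exact procedure.

```python
# Frame-offset based detection/defense sketch for audio adversarial examples.
import numpy as np

def detect_adversarial(audio, sample_rate, asr_transcribe, offset_sec=0.1):
    """asr_transcribe(audio, sample_rate) -> str is a placeholder for an ASR system."""
    silence = np.zeros(int(offset_sec * sample_rate), dtype=audio.dtype)
    shifted = np.concatenate([silence, audio])       # frame offset via leading silence
    original_text = asr_transcribe(audio, sample_rate)
    shifted_text = asr_transcribe(shifted, sample_rate)
    # A benign utterance decodes consistently; a perturbation crafted for fixed frame
    # boundaries tends to break under the offset, so the two transcripts diverge.
    is_adversarial = original_text != shifted_text
    return is_adversarial, shifted_text              # shifted_text as the "defended" output
```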

A Study on Development of ECS for Severely Handicapped (중증 장애인을 위한 생활환경 제어장치개발에 관한 연구)

  • 임동철; 이행세; 홍석교; 이일영
    • Journal of Biomedical Engineering Research / v.24 no.5 / pp.427-434 / 2003
  • In this paper, we present a speech-based Environmental Control System (ECS) and its application. Specifically, an ECS using speech recognition and a portable wheelchair-lift control system with speech synthesis were developed through simulation and implementation. The developed system was applied to a quadriplegic man, and its physical and mental effects were evaluated. The speech recognition system is built from real-time modules using an HMM model. For clinical application of the device, we evaluated it with a 54-year-old quadriplegic man over one week using the Beck Depression Inventory and Activity Pattern Indicator questionnaires. The motor drive control system of the portable wheelchair lift was also implemented, and its mechanical durability was tested by structural analysis. The speech recognition rate exceeded 95% in the experiment. The questionnaire results show higher satisfaction and a lower nursing load, and the subject's depression tendency decreased. The portable wheelchair lift shows a good fatigue life cycle for the material supporting the upper wheelchair and a safe range of centroid movement. This paper thus presents an example of an ECS consisting of a real-time speech recognition system and a portable wheelchair lift, and the experiments show the need for such an ECS adapted to the Korean environment. This study will serve as the basis for commercial use.

Multi-layer Speech Processing System for Point-Of-Interest Recognition in the Car Navigation System (차량용 항법장치에서의 관심지 인식을 위한 다단계 음성 처리 시스템)

  • Bhang, Ki-Duck; Kang, Chul-Ho
    • Journal of Korea Multimedia Society / v.12 no.1 / pp.16-25 / 2009
  • In the car environment, where safety is the first priority, a large-vocabulary isolated-word recognition system over the POI domain is required as the optimal HMI technique. For a telematics terminal with highly limited processing time and memory capacity, it is impossible to process more than 100,000 words in the terminal with general speech recognition methods. Therefore, we propose a phoneme recognizer based on a phonetic GMM, combined with a PDM Levenshtein distance in a multi-layer architecture, for POI recognition on the telematics terminal. With the proposed methods, we obtained high performance on a telematics terminal with low processing speed and small memory capacity: a recognition rate of up to 94.8% in an indoor environment and up to 92.4% in car navigation environments.
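
A minimal sketch of ranking POI entries by Levenshtein distance between phoneme sequences, which is one layer of the multi-layer matching described above. The phoneme inventory and POI lexicon are toy examples, and the paper's PDM weighting of the Levenshtein distance is omitted.

```python
# Levenshtein distance over phoneme sequences, used to rank POI entries (sketch only).
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, 1):
        cur = [i]
        for j, pb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                   # deletion
                           cur[j - 1] + 1,                # insertion
                           prev[j - 1] + (pa != pb)))     # substitution
        prev = cur
    return prev[-1]

# Toy POI lexicon: POI name -> reference phoneme sequence.
poi_lexicon = {"seoul station": ["s", "eo", "u", "l", "s", "t", "ei", "sh", "eo", "n"],
               "city hall":     ["s", "i", "t", "i", "h", "o", "l"]}

recognized = ["s", "eo", "u", "l", "s", "t", "e", "sh", "eo", "n"]   # noisy phoneme output
best = min(poi_lexicon, key=lambda poi: levenshtein(recognized, poi_lexicon[poi]))
print(best)   # -> 'seoul station'
```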

Implementing Onetime Password based Access Control System for Secure Sharing Service

  • Kang, Namhi
    • International Journal of Internet, Broadcasting and Communication / v.13 no.3 / pp.1-11 / 2021
  • The development of ICT technologies has led to exponential growth of various sharing economies over the last couple of years. The intuitive advantage of the sharing economy is the efficient utilization of idle goods and services, but there are safety and security concerns. In this paper, we propose a one-time-password-based access control system to support a secure accommodation sharing service and show the implementation results. To provide a secure service to both the provider and the user, the proposed system issues a one-time access password that is valid only during the sharing period reserved by the user; thereafter, access returns to the accommodation owner. In particular, our system provides secure user access by combining two elements, voice-based speaker recognition and a one-time password, to open and close the door lock. We present accommodation sharing as a use case, but the proposed system is applicable to various sharing services that involve security-sensitive facilities.
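
A minimal sketch of issuing and verifying a one-time access code that is accepted only during the reserved sharing period. The HMAC-based construction, the shared secret, and the field names are assumptions; the paper's actual protocol, speaker-recognition step, and door-lock integration are not shown here.

```python
# Time-bound one-time access code for a reserved sharing period (illustrative sketch).
import hmac, hashlib, time

SECRET = b"door-lock-shared-secret"           # assumed secret provisioned in the door lock

def issue_code(booking_id, start_ts, end_ts):
    msg = f"{booking_id}:{start_ts}:{end_ts}".encode()
    digest = hmac.new(SECRET, msg, hashlib.sha256).hexdigest()
    return digest[:6].upper()                  # short code handed to the guest

def verify(booking_id, start_ts, end_ts, code, now=None):
    now = time.time() if now is None else now
    if not (start_ts <= now <= end_ts):        # outside the reservation: always reject
        return False
    return hmac.compare_digest(code, issue_code(booking_id, start_ts, end_ts))

start, end = time.time() - 3600, time.time() + 3600
code = issue_code("booking-42", start, end)
print(verify("booking-42", start, end, code))  # True only during the reserved period
```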

A Design of the Emergency-notification and Driver-response Confirmation System(EDCS) for an autonomous vehicle safety (자율차량 안전을 위한 긴급상황 알림 및 운전자 반응 확인 시스템 설계)

  • Son, Su-Rak; Jeong, Yi-Na
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.2 / pp.134-139 / 2021
  • Currently, the autonomous vehicle market is commercializing level 3 autonomous vehicles, but these still require the driver's attention. Beyond level 3, the most notable aspect of level 4 autonomous vehicles is vehicle stability, because, unlike level 3, vehicles at level 4 and above must continue autonomous driving even when the driver is inattentive. Therefore, in this paper, we propose the Emergency-notification and Driver-response Confirmation System (EDCS) for autonomous-vehicle safety, which notifies the driver of an emergency situation and recognizes the driver's reaction while the driver is inattentive. The EDCS uses an emergency-situation delivery module to convert the emergency situation into text and deliver it to the driver by voice, and a driver-response confirmation module to recognize the driver's reaction to the emergency and decide whether to pass control to the driver. In the experiments, the HMM of the emergency delivery module learned speech 25% faster than an RNN and 42.86% faster than an LSTM, and the Tacotron 2 of the driver-response confirmation module converted text to speech about 20 ms faster than Deep Voice and 50 ms faster than DeepMind's model. Therefore, the EDCS can train its neural network models efficiently and check the driver's response in real time.
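
A high-level sketch of the two EDCS modules described above, with synthesize and recognize as placeholders for the TTS (e.g., Tacotron 2) and speech recognition back ends. The message wording and the keyword-based acknowledgement rule are assumptions for illustration, not the paper's implementation.

```python
# Sketch of the two-module EDCS flow: notify the driver, then confirm the response.
def notify_driver(event, synthesize):
    """Delivery module: turn an emergency event into text and speak it to the driver."""
    text = f"Emergency: {event}. Please take control of the vehicle."
    audio = synthesize(text)               # placeholder TTS call (e.g., a Tacotron 2 back end)
    return audio

def confirm_driver_response(mic_audio, recognize):
    """Confirmation module: recognize the driver's reply and decide whether to pass control."""
    reply = recognize(mic_audio).lower()   # placeholder speech recognition call
    acknowledged = any(word in reply for word in ("yes", "okay", "taking control"))
    return acknowledged                    # True: hand control back to the driver
```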