Integrated Search | Korea Science

A MFCC-based CELP Speech Coder for Server-based Speech Recognition in Network Environments

이길호;윤재삼;오유리;김홍국
- 대한음성학회지:말소리
- No. 54
- pp.27-43
- 2005
Existing standard speech coders provide high-quality speech communication, but they degrade the performance of speech recognition systems that operate on the reconstructed speech. The main cause is that the spectral envelope parameters in speech coding are optimized for speech quality rather than for recognition performance. For example, mel-frequency cepstral coefficients (MFCCs) are generally known to give better recognition performance than linear prediction coefficients (LPCs), the typical parameter set in speech coding. In this paper, we propose a speech coder that uses MFCCs instead of LPCs to improve the performance of a server-based speech recognition system in network environments. The main difficulty in using MFCCs is quantizing them efficiently at a low bit rate. First, we exploit the interframe correlation of MFCCs, which leads to predictive quantization of the MFCCs. Second, we propose a safety-net scheme that makes the MFCC-based coder robust to channel errors. The result is an 8.7 kbps MFCC-based CELP coder. A PESQ test shows that the proposed coder has speech quality comparable to that of 8 kbps G.729, and speech recognition using the proposed coder performs better than recognition using G.729.
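The idea of combining predictive quantization with a safety net can be illustrated with a minimal sketch. The abstract does not describe the actual quantizer design, so everything here is an assumption made for illustration: a first-order predictor with coefficient `rho`, uniform scalar quantizers with a clamped index range, and the hypothetical function names `quantize` and `encode_frame`. The encoder tries a predictive path (fine steps, short range, depends on the previous frame) and a memoryless safety-net path (coarse steps, wide range), then keeps whichever reconstructs the frame with less error.

```python
def quantize(value, step, levels):
    """Uniform scalar quantizer with the index clamped to [-levels, levels]."""
    index = max(-levels, min(levels, round(value / step)))
    return index * step


def encode_frame(x, prev_recon, rho=0.8,
                 step_pred=0.05, levels_pred=8,
                 step_safe=0.25, levels_safe=16):
    """Quantize one coefficient vector and pick the better of two paths.

    Returns (mode, reconstruction): mode 'P' is the predictive path,
    mode 'S' is the memoryless safety-net path.
    All parameter values are illustrative assumptions, not the paper's.
    """
    # Predictive path: quantize the residual against a first-order prediction.
    pred = [rho * p for p in prev_recon]
    recon_p = [pr + quantize(xi - pr, step_pred, levels_pred)
               for xi, pr in zip(x, pred)]
    # Safety-net path: quantize the vector directly, ignoring earlier frames.
    recon_s = [quantize(xi, step_safe, levels_safe) for xi in x]
    # Keep the path with the smaller squared reconstruction error.
    err_p = sum((a - b) ** 2 for a, b in zip(x, recon_p))
    err_s = sum((a - b) ** 2 for a, b in zip(x, recon_s))
    return ("P", recon_p) if err_p <= err_s else ("S", recon_s)
```

A frame coded in mode `'S'` does not depend on the previous reconstruction, so a decoder that lost or corrupted an earlier frame can resynchronize at that point, which is the usual motivation for a safety net.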

Speech Emotion Recognition Using 2D-CNN with Mel-Frequency Cepstrum Coefficients

Eom, Youngsik;Bang, Junseong
- Journal of information and communication convergence engineering
- Vol. 19, No. 3
- pp.148-154
- 2021
With the advent of context-aware computing, many attempts have been made to understand emotions. Among these, Speech Emotion Recognition (SER) is a method of recognizing a speaker's emotions from speech information. Successful SER depends on selecting distinctive features and classifying them appropriately. In this paper, the performance of SER using neural network models (e.g., fully connected network (FCN), convolutional neural network (CNN)) with Mel-Frequency Cepstral Coefficients (MFCC) is examined in terms of the accuracy and distribution of emotion recognition. On the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset, after tuning model parameters, a two-dimensional Convolutional Neural Network (2D-CNN) model with MFCC showed the best performance, with an average accuracy of 88.54% for five emotions (anger, happiness, calm, fear, and sadness) across male and female speakers. In addition, an examination of the distribution of emotion recognition accuracies across the neural network models indicates that the 2D-CNN with MFCC can be expected to reach an overall accuracy of 75% or more.
https://doi.org/10.6109/jicce.2021.19.3.148

Speech Noise Cancellation using Time Adaptive Threshold Value in Wavelet Transform

Lee Chul-Hee;Lee Ki-Hoon;Hwang Hyang-Ja;Moon In-Seob;Kim Chong-Kyo
- 대한전자공학회:학술대회논문집
- 대한전자공학회 2004 ICEIC The International Conference on Electronics Informations and Communications
- pp.244-248
- 2004
This paper proposes a new noise cancellation method for speech recognition in noisy environments. After a frame-by-frame wavelet transform, we determine a time-adaptive threshold value from the standard deviations of the wavelet coefficients. The threshold is the sum of the standard deviation of the cA3 coefficients and the weighted standard deviation of the cD1 coefficients. The cA3 coefficients represent voiced sound with lower-frequency components, and the cD1 coefficients represent unvoiced sound with higher-frequency components. In experiments, we removed noise after adding white Gaussian noise and colored noise to the original speech. The proposed method improved SNR and MSE more than the wavelet transform and the wavelet packet transform do. In a speech recognition experiment using a noisy speech DB, recognition performance improved by 2 to 4%.
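The per-frame threshold described in the abstract above can be sketched as follows. The abstract does not name the wavelet, the weight, or how the threshold is applied, so this sketch assumes a Haar wavelet, an arbitrary `weight=0.5`, and soft thresholding of the level-1 detail coefficients; the function names are hypothetical. Frame lengths are assumed to be multiples of 8 (at least 16) so that three decomposition levels exist and cA3 has more than one coefficient.

```python
import math
import statistics

SQRT2 = math.sqrt(2.0)


def haar_dwt(signal):
    """One Haar analysis step: returns (approximation, detail) coefficients."""
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal) - 1, 2)]
    approx = [(a + b) / SQRT2 for a, b in pairs]
    detail = [(a - b) / SQRT2 for a, b in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuilds the signal from one decomposition level."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out


def frame_threshold(frame, weight=0.5):
    """Time-adaptive threshold for one frame: std(cA3) + weight * std(cD1)."""
    cA1, cD1 = haar_dwt(frame)
    cA2, _ = haar_dwt(cA1)
    cA3, _ = haar_dwt(cA2)
    return statistics.pstdev(cA3) + weight * statistics.pstdev(cD1)


def soft_threshold(coeff, threshold):
    """Shrink a coefficient toward zero by `threshold` (assumed rule)."""
    return math.copysign(max(abs(coeff) - threshold, 0.0), coeff)


def denoise_frame(frame, weight=0.5):
    """Soft-threshold the level-1 detail coefficients of one frame."""
    threshold = frame_threshold(frame, weight)
    cA1, cD1 = haar_dwt(frame)
    return haar_idwt(cA1, [soft_threshold(d, threshold) for d in cD1])
```

Because the threshold is recomputed for every frame, it follows the local signal statistics over time, which is the sense in which it is time adaptive.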

Optimal Feature Parameters Extraction for Speech Recognition of Ship's Wheel Orders

문성배;채양범;전승환
- 해양환경안전학회지
- Vol. 13, No. 2
- pp.161-167
- 2007
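This paper is a basic study toward a system that controls a ship's autopilot by speech recognition. We analyzed the structure of the wheel orders given in the IMO Standard Marine Communication Phrases (SMCP) and extracted feature parameters that can predict the speaker's intention. Using these parameters, we designed a post-processing recognition procedure that determines the final word from the set of candidate words produced by the first-stage pattern recognition. To verify the usefulness of this procedure, we collected a total of 525 wheel-order utterances for speech recognition and ran a comparative experiment against the recognition rate of the reference-pattern-based recognition process. The results show that the recognition procedure using the intention-prediction feature parameters improved the recognition rate by about 42.3%, confirming its effectiveness.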

A Study on Autonomous Travelling Control of Mobile Robot

이우송;심현석;하언태;김종수
- 한국산업융합학회 논문집
- Vol. 18, No. 1
- pp.10-17
- 2015
This paper describes research on the remote control of a mobile robot by voice command. Using real-time remote control and wireless networking, we carried out unmanned remote-control and home-security experiments in which voice commands captured by a microphone on a PC are recognized and transmitted to direct the robot to a target point. The robot can thus be controlled remotely through speech recognition. To verify the performance of a mobile robot with two drives, the speed and direction of the self-driving robot were controlled by speech recognition over a wireless remote link.
https://doi.org/10.21289/KSIC.2015.18.1.010

Defending and Detecting Audio Adversarial Example using Frame Offsets

Gong, Yongkang;Yan, Diqun;Mao, Terui;Wang, Donghua;Wang, Rangding
- KSII Transactions on Internet and Information Systems (TIIS)
- Vol. 15, No. 4
- pp.1538-1552
- 2021
Machine learning models are vulnerable to adversarial examples generated by adding a deliberately designed perturbation to a benign sample. In particular, for automatic speech recognition (ASR) systems, audio that sounds normal could be decoded as a harmful command because of adversarial attacks. In this paper, we focus on countermeasures against audio adversarial examples. By analyzing the characteristics of ASR systems, we find that frame offsets, introduced by appending a silence clip at the beginning of an audio signal, can degrade adversarial perturbations to normal noise. For various scenarios, we exploit frame offsets through different strategies: defending, detecting, and a hybrid of the two. Compared with previous methods, our proposed method can defend against audio adversarial examples in a simpler, more generic, and more efficient way. Evaluated on three state-of-the-art adversarial attacks against different ASR systems, the experimental results demonstrate that the proposed method can effectively improve the robustness of ASR systems.
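A frame offset of this kind is easy to sketch. The abstract does not give the offset lengths or the detection rule, so the following is only an illustration under stated assumptions: `transcribe` is a hypothetical callable standing in for any ASR system, the offsets and the similarity cutoff are arbitrary, and detection is assumed to work by checking whether the transcription changes once silence is prepended.

```python
import difflib


def add_frame_offset(samples, offset):
    """Prepend `offset` zero-valued samples (a silence clip) to the audio."""
    return [0.0] * offset + list(samples)


def similarity(a, b):
    """Character-level similarity of two transcriptions, in [0, 1]."""
    return difflib.SequenceMatcher(None, a, b).ratio()


def detect_adversarial(samples, transcribe, offsets=(40, 80, 120),
                       min_similarity=0.8):
    """Flag audio whose transcription changes when silence is prepended.

    `transcribe` is a hypothetical function mapping samples to text.
    The offsets and the cutoff are illustrative assumptions.
    """
    baseline = transcribe(samples)
    for offset in offsets:
        shifted = transcribe(add_frame_offset(samples, offset))
        if similarity(baseline, shifted) < min_similarity:
            return True
    return False
```

In the defending strategy, the same `add_frame_offset` call would simply be applied before recognition, on the premise that benign speech transcribes the same way while a perturbation tuned to a specific framing loses its effect.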
https://doi.org/10.3837/tiis.2021.04.019

A Study on Development of ECS for Severely Handicapped

임동철;이행세;홍석교;이일영
- 대한의용생체공학회:의공학회지
- Vol. 24, No. 5
- pp.427-434
- 2003
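This paper addresses the development and application of a voice-based environment control system (ECS). Specifically, we implemented a device that controls the living environment through speech recognition and a controller for a voice-assisted mobile wheelchair lift. The system was applied to a quadriplegic patient with a cervical spinal cord injury, and we evaluated the problems encountered in use and the physical and psychological effects of the device on the disabled user and the family. The speech recognition system is an HMM-based real-time recognizer. To evaluate its clinical application, we implemented a simplified speech recognition module for controlling home appliances and observed a 54-year-old man with a complete C4 spinal cord injury for one week. A questionnaire was used to assess changes in the Beck depression scale and in recreation and social relationships. We also developed the motor drive controller for the voice-assisted mobile wheelchair lift, evaluated its mechanical stability through structural mechanics analysis, and built and performance-tested a prototype. In experiments, the system showed a recognition rate of 95% or higher. The questionnaire showed that the user's satisfaction with the device was high and confirmed improvements in nursing dependency and social relationships. In addition, the caregiver's burden and depressive tendency decreased. The structural analysis of the wheelchair lift confirmed that the upper wheelchair wheel support has good fatigue life as a material and that the shift in the center of gravity remains stable. This paper presents one example of an environment control system for people with disabilities that combines a real-time speech recognition system with a mobile wheelchair lift. The experiments confirm the need to develop a Korean environment control system in rehabilitation medicine, an important application area for real-time speech recognition, and the work is expected to serve as a stepping stone toward commercialization.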

Multi-layer Speech Processing System for Point-Of-Interest Recognition in the Car Navigation System

방기덕;강철호
- 한국멀티미디어학회논문지
- Vol. 12, No. 1
- pp.16-25
- 2009
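In the automotive environment, where safety must come first, a large-vocabulary isolated-word recognition system for the point-of-interest (POI) domain calls for the best possible human-machine interface (HMI) technology. However, a telematics terminal with very limited processing power and memory cannot handle more than 100,000 words with conventional speech recognition methods. This paper therefore proposes a multi-stage large-vocabulary isolated-word recognition system for POI recognition on telematics terminals. To improve its performance, we propose a phoneme recognizer based on per-phoneme Gaussian mixture models (GMM) and a Levenshtein distance that uses a phoneme-distance matrix (PDM). The proposed method showed good recognition performance on large-vocabulary isolated words even on a telematics terminal with low processing speed and little memory. With the proposed multi-stage recognition system, we obtained recognition rates of up to 94.8% indoors and up to 92.4% in the car environment.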
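A Levenshtein distance driven by a phoneme-distance matrix can be sketched as a weighted edit distance in which the substitution cost depends on how acoustically close two phonemes are. The abstract does not give the matrix values or the insertion and deletion costs, so the `pdm` dictionary, the default cost of 1.0, and the function names below are assumptions for illustration only.

```python
def weighted_levenshtein(a, b, pdm, indel_cost=1.0):
    """Edit distance between two phoneme sequences.

    Substituting phoneme p by q costs pdm[(p, q)] (looked up in either
    order); pairs missing from the matrix cost 1.0. Insertions and
    deletions cost `indel_cost`. All costs are illustrative assumptions.
    """
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, p in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, q in enumerate(b, start=1):
            sub = 0.0 if p == q else pdm.get((p, q), pdm.get((q, p), 1.0))
            curr.append(min(prev[j] + indel_cost,      # delete p
                            curr[j - 1] + indel_cost,  # insert q
                            prev[j - 1] + sub))        # substitute or match
        prev = curr
    return prev[-1]


def best_match(recognized, lexicon, pdm):
    """Return the lexicon entry whose phoneme string is closest."""
    return min(lexicon,
               key=lambda name: weighted_levenshtein(recognized,
                                                     lexicon[name], pdm))
```

With `pdm = {("b", "p"): 0.2}`, a recognized sequence `["b", "a", "k"]` is much closer to `["p", "a", "k"]` than a plain Levenshtein distance would suggest, so confusable phonemes from the first-stage recognizer are penalized less than unrelated ones.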

Implementing Onetime Password based Access Control System for Secure Sharing Service

Kang, Namhi
- International Journal of Internet, Broadcasting and Communication
- Vol. 13, No. 3
- pp.1-11
- 2021
The development of ICT has led to exponential growth of the sharing economy over the last couple of years. The intuitive advantage of the sharing economy is the efficient use of idle goods and services, but it raises safety and security concerns. In this paper, we propose a one-time password based access control system to support a secure accommodation sharing service, and we show the implementation results. To provide a secure service to both the provider and the user, the proposed system issues a one-time access password that is valid only during the sharing period reserved by the user; after that, access returns to the accommodation owner. In particular, our system provides secure user access by combining two elements, speaker recognition using voice and a one-time password, to open and close the door lock. We present accommodation sharing as a use case, but the proposed system is applicable to various sharing services that use security-sensitive facilities.
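A password that is valid only during the reserved sharing period can be sketched with a keyed hash over the reservation details. The abstract does not describe how the password is generated or checked, so this is a minimal sketch under assumptions: the HMAC-SHA256 construction, the six-digit length, the integer timestamps, and the function names are all hypothetical, and the speaker-recognition factor is not modeled.

```python
import hashlib
import hmac


def issue_password(secret, reservation_id, start, end, digits=6):
    """Derive a numeric password bound to one reservation and its period.

    `secret` is a bytes key shared by the server and the door lock
    (an assumption); `start` and `end` are integer timestamps.
    """
    message = f"{reservation_id}:{start}:{end}".encode()
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    number = int.from_bytes(digest[:4], "big") % 10 ** digits
    return str(number).zfill(digits)


def verify(secret, reservation_id, start, end, candidate, now):
    """Accept the password only inside the reserved period [start, end)."""
    if not (start <= now < end):
        return False
    expected = issue_password(secret, reservation_id, start, end)
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(expected.encode(), candidate.encode())
```

Because the period is part of the hashed message and is checked again at verification time, the same digits stop working once `now` passes `end`, at which point access effectively returns to the owner.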
https://doi.org/10.7236/IJIBC.2021.13.3.1

A Design of the Emergency-notification and Driver-response Confirmation System (EDCS) for an autonomous vehicle safety

손수락;정이나
- 한국정보전자통신기술학회논문지
- Vol. 14, No. 2
- pp.134-139
- 2021
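The autonomous vehicle market is currently commercializing Level 3 autonomous vehicles, which still require the driver's attention. Beyond Level 3, the most notable concern for Level 4 autonomous vehicles is vehicle safety, because unlike Level 3, vehicles at Level 4 and above must drive autonomously even when the driver is inattentive. This paper therefore proposes the Emergency-notification and Driver-response Confirmation System (EDCS) for autonomous vehicle safety, which announces an emergency when the driver is inattentive and recognizes the driver's response. The EDCS uses an emergency-delivery module to convert the emergency into text and deliver it to the driver by voice, and a driver-response confirmation module to recognize the driver's response to the emergency and decide whether to hand driving control over to the driver. In experiments, the HMM in the emergency-delivery module learned speech 25% faster than an RNN and 42.86% faster than an LSTM. Tacotron2 in the driver-response confirmation module converted text to speech about 20 ms faster than deep voice and about 50 ms faster than deep mind. The EDCS can therefore train neural network models efficiently and confirm the driver's response in real time.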
https://doi.org/10.17661/jkiiect.2021.14.2.134
