• Title/Abstract/Keywords: Sound activity detection

13 search results (processing time: 0.03 s)

Snoring sound detection method using attention-based convolutional bidirectional gated recurrent unit

  • 김민수;이기용;김형국
    • 한국음향학회지 / Vol. 40, No. 2 / pp.155-160 / 2021
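  • This paper proposes an automatic method for detecting snoring sounds, one of the important symptoms of sleep apnea. The proposed method takes sound signals recorded during sleep as input, detects the segments in which sound occurs, and feeds spectrograms computed from the detected segments to a classifier based on an attention-based convolutional bidirectional gated recurrent unit. The attention mechanism extends the convolutional bidirectional gated recurrent unit model so that it learns discriminative feature representations of snoring sounds, which improves snoring detection performance. Experimental results show that the proposed method improves accuracy by about 3.1 % to 5.5 % over existing methods.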
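The first stage described in the abstract above, finding the segments in which sound occurs before classification, can be illustrated with a short-time energy threshold. The abstract does not say which detector the authors used, so this is only a minimal sketch: the function name `detect_sound_segments`, the frame length, and the threshold are all assumptions made for illustration.

```python
def detect_sound_segments(samples, frame_len=4, threshold=0.1):
    """Return (start, end) sample indices of runs of frames whose
    mean energy exceeds `threshold`. Hypothetical helper, not the
    detector used in the paper."""
    segments = []
    start = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        energy = sum(x * x for x in frame) / frame_len
        active = energy > threshold
        if active and start is None:
            # A new sound segment begins at this frame.
            start = i * frame_len
        elif not active and start is not None:
            # The current segment ended at the previous frame.
            segments.append((start, i * frame_len))
            start = None
    if start is not None:
        # The signal ended while a segment was still open.
        segments.append((start, n_frames * frame_len))
    return segments


# Silence, a burst of sound, then silence again.
signal = [0.0] * 8 + [1.0, -1.0] * 4 + [0.0] * 8
print(detect_sound_segments(signal))
```

In a pipeline like the one the abstract describes, each returned segment would then be converted to a spectrogram and passed to the classifier.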

CNN based Sound Event Detection Method using NMF Preprocessing in Background Noise Environment

  • Jang, Bumsuk;Lee, Sang-Hyun
    • International journal of advanced smart convergence / Vol. 9, No. 2 / pp.20-27 / 2020
  • Sound event detection in real-world environments suffers from interference by non-stationary, time-varying noise. This paper presents an adaptive noise reduction method for sound event detection based on non-negative matrix factorization (NMF). We propose a deep learning model that integrates a convolutional neural network (CNN) with NMF. To improve the separation quality of the NMF, the method includes a noise update technique that learns and adapts to the characteristics of the current noise in real time. The noise update technique analyzes the sparsity and activity of the noise bias at the present time and decides on update training based on the noise candidate group obtained every frame in the previous noise reduction stage. Noise bias ranks selected as candidates for update training are updated in real time with discrimination NMF training. This NMF was applied to a CNN and a hidden Markov model (HMM) to improve sound event detection performance. Because the CNN showed the clearer performance improvement, the method can be widely used in CNN-based sound source algorithms.
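For readers unfamiliar with NMF, the factorization that the abstract above builds on can be sketched with the standard multiplicative update rules for the Euclidean cost. This is plain NMF only; it does not implement the paper's noise update technique or its discrimination training, and the function name `nmf`, the rank, and the iteration count are assumptions chosen for illustration.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factor a non-negative matrix V (freq x frames) as W @ H using
    multiplicative updates. Illustrative sketch, not the paper's model."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # per-frame activations
    for _ in range(n_iter):
        # Each update keeps the factors non-negative because it only
        # multiplies by ratios of non-negative quantities.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


# A synthetic rank-2 "spectrogram" stands in for real audio features.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 10))
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H))
```

In a noise reduction setting, some columns of `W` would typically be assigned to noise and the rest to the target sound, so that the target can be reconstructed from its own bases and activations.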

Robust Voice Activity Detection Using the Spectral Peaks of Vowel Sounds

  • Yoo, In-Chul;Yook, Dong-Suk
    • ETRI Journal / Vol. 31, No. 4 / pp.451-453 / 2009
  • This letter proposes the use of vowel sound detection for voice activity detection. Vowels have distinctive spectral peaks, which are likely to remain higher than their surroundings even after severe corruption. Therefore, by detecting the spectral peaks of vowel sounds in corrupted signals, voice activity can be detected even under low signal-to-noise ratio (SNR) conditions. Experimental results indicate that the proposed algorithm performs reliably under various noise and low SNR conditions. This method is suitable for mobile environments where the characteristics of noise may not be known in advance.

Human-Robot Interaction in Real Environments by Audio-Visual Integration

  • Kim, Hyun-Don;Choi, Jong-Suk;Kim, Mun-Sang
    • International Journal of Control, Automation, and Systems / Vol. 5, No. 1 / pp.61-69 / 2007
  • In this paper, we developed a reliable sound localization system that uses three microphones and includes a VAD (Voice Activity Detection) component, together with a face tracking system that uses a vision camera. Moreover, we proposed a way to integrate three systems in human-robot interaction to compensate for errors in the localization of a speaker and to effectively reject unnecessary speech or noise signals entering from undesired directions. To verify the performance of our system, we installed the proposed audio-visual system in a prototype robot, called IROBAA (Intelligent ROBot for Active Audition), and demonstrated how to integrate the audio-visual system.

Voice Activity Detection Based on Entropy in Noisy Car Environment

  • 노용완;이규범;이우석;홍광석
    • 융합신호처리학회논문지 / Vol. 9, No. 2 / pp.121-128 / 2008
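  • Accurate voice activity detection strongly affects the performance of speech applications such as speech recognition, speech coding, and speech communication systems. This paper proposes a voice activity detection method for various car noise environments recorded during actual driving. Conventional voice activity detection has used a variety of features, including time-domain energy, frequency-domain energy, zero-crossing rate, and spectral entropy, but its performance degrades sharply in noisy environments. Building on conventional spectral entropy, this paper proposes voice activity detection methods that use MFB (Mel-frequency Filter Banks) spectral entropy, gradient FFT (Fast Fourier Transform) spectral entropy, and gradient MFB spectral entropy. The MFB is the product of the mel scale and the FFT; the mel scale is a nonlinear frequency scale that models how humans perceive sound, and it reflects the characteristics of speech well. The proposed MFB spectral entropy method improves speech/non-speech discrimination in various car noise environments and achieved a voice activity detection rate of 93.21% in experiments. Compared with the conventional spectral entropy method, the MFB-based method improves the detection rate by 3.2%.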

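The conventional spectral entropy feature that the abstract above starts from can be sketched as follows. The idea is that a frame whose energy is concentrated in a few frequency bins has low entropy, while broadband noise has high entropy. This sketch covers only the plain FFT variant; the paper's MFB and gradient variants are not reproduced here, and the function names and the threshold parameter are assumptions for illustration.

```python
import numpy as np


def entropy_of_energies(energies, eps=1e-12):
    """Shannon entropy (bits) of a non-negative energy vector after
    normalizing it to sum to one."""
    energies = np.asarray(energies, dtype=float)
    p = energies / (energies.sum() + eps)
    return float(-np.sum(p * np.log2(p + eps)))


def spectral_entropy(frame):
    """Spectral entropy of one frame, computed from its FFT power spectrum."""
    power = np.abs(np.fft.rfft(frame)) ** 2
    return entropy_of_energies(power)


def is_low_entropy(frame, threshold):
    """Hypothetical decision rule: flag frames whose spectrum is
    concentrated (entropy below `threshold`)."""
    return spectral_entropy(frame) < threshold


n = np.arange(256)
tone = np.sin(2 * np.pi * 8 * n / 256)             # energy in a single bin
noise = np.random.default_rng(0).normal(size=256)  # energy spread over all bins
print(spectral_entropy(tone), spectral_entropy(noise))
```

`entropy_of_energies` accepts any non-negative vector, so mel filter bank energies could be passed in place of the raw power spectrum to approximate an MFB-style feature, although the paper's exact formulation may differ.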

Applying the Bi-level HMM for Robust Voice-activity Detection

  • Hwang, Yongwon;Jeong, Mun-Ho;Oh, Sang-Rok;Kim, Il-Hwan
    • Journal of Electrical Engineering and Technology / Vol. 12, No. 1 / pp.373-377 / 2017
  • This paper presents a voice-activity detection (VAD) method for sound sequences with various SNRs. For real-time VAD applications, post-processing to remove burst clippings from the VAD output decision is inadequate. To tackle this problem, we build on the bi-level hidden Markov model, in which a state layer is inserted into a typical hidden Markov model (HMM), and formulate a robust VAD method that requires no additional post-processing. In the method, a forward-inference-ratio test was devised to detect the speech endpoints, and Mel-frequency cepstral coefficients (MFCC) were used as the features. Our experimental results show that the proposed approach outperforms conventional methods across different SNRs.

The Edge Computing System for the Detection of Water Usage Activities with Sound Classification

  • 현승호;지영준
    • 대한의용생체공학회:의공학회지 / Vol. 44, No. 2 / pp.147-156 / 2023
  • Efforts have been made to employ smart home sensors to monitor the indoor activities of elderly single residents and to assess the feasibility of a safe and healthy lifestyle. However, the bathroom remains a blind spot. In this study, we developed and evaluated a new edge computing device that automatically detects water usage activities in the bathroom and records the activity log on a cloud server. Three kinds of sound generated during water usage (flushing, showering, and washing at the wash basin) were recorded and cut into 1-second scenes. These sound clips were then converted into 2-dimensional images using the mel-spectrogram. Sound data augmentation techniques were adopted to obtain a better learning effect from a smaller number of data sets. These techniques, some applied in the time domain and others in the frequency domain, increased the size of the training data set by 30 times. A deep learning model called CRNN, which combines a Convolutional Neural Network and a Recurrent Neural Network, was employed. The edge device was implemented on a Raspberry Pi 4 equipped with a condenser microphone and amplifier to run the pre-trained model in real time. The detected activities were recorded as text-based activity logs on a Firebase server. Performance was evaluated in two bathrooms for the three water usage activities, yielding accuracies of 96.1% and 88.2% and F1 scores of 96.1% and 87.8%, respectively. Most of the classification errors were observed in the water sound from washing. In conclusion, this system demonstrates the potential for recording the activities of elderly single residents as a long-term lifelog on a cloud server.
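The data preparation steps mentioned in the abstract above, cutting recordings into 1-second scenes and augmenting them in the time domain, can be sketched as below. The abstract does not list the specific augmentations used, so the circular time shift and additive noise here are assumed examples, and all function names and parameters are hypothetical.

```python
import numpy as np


def cut_into_clips(signal, sample_rate, clip_seconds=1.0):
    """Split a 1-D recording into fixed-length clips, dropping any
    incomplete clip at the end."""
    clip_len = int(sample_rate * clip_seconds)
    n_clips = len(signal) // clip_len
    return np.asarray(signal)[:n_clips * clip_len].reshape(n_clips, clip_len)


def time_shift(clip, shift):
    """Time-domain augmentation (assumed example): circularly shift
    the clip by `shift` samples."""
    return np.roll(clip, shift)


def add_noise(clip, noise_std, rng):
    """Time-domain augmentation (assumed example): add Gaussian noise."""
    return clip + rng.normal(0.0, noise_std, size=np.shape(clip))


# A toy "recording" of 10 samples at 4 samples per second gives two
# complete 1-second clips.
clips = cut_into_clips(np.arange(10), sample_rate=4)
print(clips)
print(time_shift(clips[0], 1))
```

Each clip and its augmented copies would then be converted to a mel-spectrogram before training, as the abstract describes.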

Generalized cross correlation with phase transform sound source localization combined with steered response power method

  • 김영준;오민재;이인성
    • 한국음향학회지 / Vol. 36, No. 5 / pp.345-352 / 2017
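  • This paper proposes a method that improves the accuracy of sound source localization with two microphones by modeling a real environment containing reverberation and noise. VAD (Voice Activity Detection) is applied to the input signal so that only speech segments are used and silent segments are excluded, and for frames that fall outside the measurable range because of the limited sampling frequency, the delay time is re-estimated through up-sampling. The time delay of arrival computed in this way is compared, by reference to a time-table, with the delay values of neighboring candidate positions, and the delay with the maximum power value is selected, which increases the accuracy of the source position. In addition, using the correlation between frames, consecutive speech frames with a large estimation difference are located and replaced with the average of the neighboring frames, which improves source localization performance.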
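The basic GCC-PHAT delay estimate underlying the method in the abstract above can be sketched as follows. This covers only the textbook two-microphone delay estimate; the paper's up-sampling, time-table lookup, steered response power combination, and frame smoothing are not implemented, and the function name `gcc_phat_delay` and its parameters are assumptions for illustration.

```python
import numpy as np


def gcc_phat_delay(sig, ref, max_shift=None, eps=1e-12):
    """Estimate how many samples `sig` lags behind `ref` using
    generalized cross correlation with phase transform weighting."""
    n = len(sig) + len(ref)           # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + eps      # PHAT: keep phase, discard magnitude
    cc = np.fft.irfft(cross, n=n)
    if max_shift is None:
        max_shift = n // 2
    max_shift = min(max_shift, n // 2)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return int(np.argmax(np.abs(cc)) - max_shift)


# The same source reaches the second microphone 5 samples later.
rng = np.random.default_rng(0)
src = rng.normal(size=256)
mic_ref = np.concatenate((src, np.zeros(5)))
mic_sig = np.concatenate((np.zeros(5), src))
print(gcc_phat_delay(mic_sig, mic_ref, max_shift=20))
```

A positive result means the first argument lags the second; dividing the lag by the sampling rate gives the time delay of arrival in seconds.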

Distant-talking of Speech Interface for Humanoid Robots

  • 이협우;육동석
    • 대한음성학회: Conference Proceedings / 대한음성학회, Proceedings of the 2007 한국음성과학회 Joint Conference / pp.39-40 / 2007
  • For efficient interaction between humans and robots, the speech interface is a core problem, especially in noisy and reverberant conditions. This paper analyzes the main issues of spoken language interfaces for humanoid robots, such as sound source localization, voice activity detection, and speaker recognition.


Adaptive Post Processing of Nonlinear Amplified Sound Signal

  • Lee, Jae-Kyu;Choi, Jong-Suk;Seok, Cheong-Gyu;Kim, Mun-Sang
    • 제어로봇시스템학회: Conference Proceedings / 제어로봇시스템학회 ICCAS 2005 / pp.872-876 / 2005
  • We propose real-time post-processing of nonlinearly amplified signals to improve voice recognition in remote talk. In previous research, we found that nonlinear amplification has a unique advantage for both voice activity detection and sound localization in remote talk. However, the original signal becomes distorted by the nonlinear amplification and, as a result, subsequent stages such as speech recognition give less satisfactory results. To remedy this problem, we implement a linearization algorithm that recovers the voice signal's linear characteristics after localization has been done.
