• Title/Summary/Keyword: Speech activity detection

Implementation of Adaptive Multi Rate (AMR) Vocoder for the Asynchronous IMT-2000 Mobile ASIC (IMT-2000 비동기식 단말기용 ASIC을 위한 적응형 다중 비트율 (AMR) 보코더의 구현)

  • 변경진;최민석;한민수;김경수
    • The Journal of the Acoustical Society of Korea / v.20 no.1 / pp.56-61 / 2001
  • This paper presents the real-time implementation of an AMR (Adaptive Multi-Rate) vocoder included in an asynchronous International Mobile Telecommunication (IMT)-2000 mobile ASIC. The implemented AMR vocoder is a multi-rate coder with 8 modes operating at bit rates from 12.2 kbps down to 4.75 kbps. In addition to the encoder and decoder, which are the basic functions of the vocoder, we also implemented VAD (Voice Activity Detection), SCR (Source Controlled Rate) operation, and frame-structuring blocks for the system interface. The DSP used for the AMR vocoder is a 16-bit fixed-point DSP based on the TeakLite core; it consists of a memory block, a serial interface block, register files for the parallel interface with the CPU, and interrupt control logic. By managing the memory structure efficiently, we reduced the maximum computational load to 24 MIPS. The AMR vocoder was verified against all the test vectors provided by 3GPP, and its stable operation was also confirmed on a real-time test board.

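Because AMR codes speech in 20 ms frames, each of the eight bit rates above corresponds directly to a fixed per-frame bit budget, and SCR lowers the average rate by replacing speech frames with sparse comfort-noise (SID) updates during silence. The Python sketch below tabulates the modes and estimates that average rate for a given voice-activity factor. The mode labels follow the style of the 3GPP reference code and are illustrative; the SID size and interval (39 bits every 8 frames) are assumptions, not figures taken from the paper.

```python
# Illustrative sketch: per-frame bit budgets of the eight AMR modes and the
# average rate under source-controlled (VAD-driven) transmission.
# ASSUMPTIONS: mode labels are illustrative; SID size/interval are defaults
# chosen for illustration, and VAD hangover frames are ignored.

FRAME_MS = 20  # one AMR speech frame covers 20 ms of audio

AMR_MODES_KBPS = {
    "MR122": 12.2,
    "MR102": 10.2,
    "MR795": 7.95,
    "MR74": 7.4,
    "MR67": 6.7,
    "MR59": 5.9,
    "MR515": 5.15,
    "MR475": 4.75,
}


def bits_per_frame(kbps: float, frame_ms: int = FRAME_MS) -> int:
    """1 kbit/s equals 1 bit/ms, so bits per frame = rate (kbps) * frame length (ms)."""
    return round(kbps * frame_ms)


def average_rate_kbps(mode: str, activity: float,
                      sid_bits: int = 39, sid_interval: int = 8) -> float:
    """Average bit rate when full speech frames are sent only during voice activity.

    ASSUMPTION: during silence, one SID frame of `sid_bits` is sent every
    `sid_interval` frames and nothing is sent in the frames in between.
    """
    if not 0.0 <= activity <= 1.0:
        raise ValueError("activity must be in [0, 1]")
    speech_kbps = AMR_MODES_KBPS[mode]
    silence_kbps = sid_bits / (sid_interval * FRAME_MS)  # bits per ms == kbps
    return activity * speech_kbps + (1.0 - activity) * silence_kbps


if __name__ == "__main__":
    for name, rate in AMR_MODES_KBPS.items():
        print(f"{name}: {rate} kbps -> {bits_per_frame(rate)} bits/frame")
    print(f"MR122 at 50% activity: {average_rate_kbps('MR122', 0.5):.3f} kbps")
```

With these assumed SID settings, silence costs well under 1 kbps, so the average rate is dominated by the activity factor times the selected mode's rate.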

Electroglottographic Measurements of Glottal Function in Voice according to Gender and Age

  • Ko, Do-Heung
    • Phonetics and Speech Sciences / v.3 no.1 / pp.97-102 / 2011
  • Electroglottography (EGG) is a common method for the non-invasive measurement of glottal activity. EGG has been used in vocal pathology as a clinical and research tool to measure vocal fold contact. This paper presents the results of pitch, jitter, and closed quotient (CQ) measurements from electroglottographic signals of young (mean age 22.7 years) and elderly (mean age 74.3 years) male and female subjects. The sustained corner vowels /i/, /a/, and /u/ were recorded at around 70 dB SPL, because phonation intensity, the most notable factor affecting EGG variables, correlates positively with the closed phase. The aim of this paper was to measure EGG data according to age and gender. For CQ, there was a significant difference between young and elderly female subjects, but no significant difference between young and elderly male subjects. The mean CQ for young males was higher than that for elderly males, whereas the mean CQ for young females was lower than that for elderly females. Thus, in terms of mean values, CQ increased with age in females but decreased with age in males. Although age-related laryngeal degeneration seems to occur to a lesser extent in females, the significant increase of CQ in elderly female voices could not be explained in terms of age-related physiological changes. For the standard deviation of pitch and for jitter, the mean values for young and elderly males were higher than those for young and elderly females; that is, male subjects showed higher mean values on these voice variables than female subjects. This result could be considered a sign of vocal instability in males. These results may provide powerful insights into the control and regulation of normal phonation and into the detection and characterization of pathology.

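Once an EGG recording has been segmented into glottal cycles, the three measures used in this study reduce to simple arithmetic. The Python sketch below assumes cycle lengths are already available and that larger EGG amplitude means more vocal-fold contact. Local jitter uses the common mean-absolute-difference definition, and CQ uses a criterion-level method whose 35% threshold is an assumption; the paper's own CQ criterion is not stated here.

```python
import statistics
from typing import Sequence, Tuple


def f0_stats_hz(periods_s: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard deviation of per-cycle F0 (Hz) from cycle lengths in seconds."""
    f0 = [1.0 / p for p in periods_s]
    return statistics.mean(f0), statistics.stdev(f0)


def local_jitter_percent(periods_s: Sequence[float]) -> float:
    """Mean absolute difference of consecutive periods, relative to the mean period (%)."""
    if len(periods_s) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods_s, periods_s[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_period = sum(periods_s) / len(periods_s)
    return 100.0 * mean_diff / mean_period


def closed_quotient(cycle: Sequence[float], level: float = 0.35) -> float:
    """Fraction of one EGG cycle spent above a criterion level (criterion-level CQ).

    ASSUMPTIONS: `cycle` holds the samples of exactly one glottal cycle,
    higher values mean more contact, and the 35% level is illustrative.
    """
    lo, hi = min(cycle), max(cycle)
    if hi == lo:
        raise ValueError("flat cycle: no contact variation")
    threshold = lo + level * (hi - lo)
    closed_samples = sum(1 for x in cycle if x > threshold)
    return closed_samples / len(cycle)
```

Changing `level` changes the absolute CQ values, so comparisons between groups, as in this study, are only meaningful when every recording is analysed with the same criterion.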

Statistical Model-Based Voice Activity Detection Using Spatial Cues for Dual-Channel Noisy Speech Recognition (이중채널 잡음음성인식을 위한 공간정보를 이용한 통계모델 기반 음성구간 검출)

  • Shin, Min-Hwa;Park, Ji-Hun;Kim, Hong-Kook
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 2010.07a / pp.150-151 / 2010
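  • This paper proposes a statistical model-based voice activity detection (VAD) method for dual-channel speech recognition in noisy environments. In the proposed method, speech presence and absence probability models are obtained from spatial information extracted from the multichannel input signals, and voice activity detection is performed with these models. The spatial information consists of the inter-channel time difference and the inter-channel level difference between the two channels, and the speech presence and absence probabilities are represented by probability models based on Gaussian kernel density estimation. Speech segments are then detected by estimating, for each time frame, the ratio of the speech presence probability to the speech absence probability. To evaluate the proposed VAD method, we measure the performance of speech recognition using only the detected segments as input. Experimental results show that the proposed statistical model-based VAD method using spatial information achieves relative error rate reductions of 15.6% and 15.4% compared with a statistical model-based VAD method using frequency energy and a VAD method based on frequency spectral density, respectively.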

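The detector described above computes two spatial cues per frame and compares kernel-density likelihoods for speech and non-speech. The Python sketch below is a minimal, time-domain illustration under several assumptions: the time difference is taken as the cross-correlation peak lag in samples, the level difference as the left/right energy ratio in dB, the labelled training features for each class are supplied by the caller, and the bandwidths and decision threshold are hypothetical. The paper may compute its cues differently, for example per frequency band.

```python
import math
from typing import Sequence, Tuple

Feature = Tuple[float, float]  # (inter-channel time difference in samples, level difference in dB)


def spatial_features(left: Sequence[float], right: Sequence[float], max_lag: int) -> Feature:
    """Time difference = lag maximising cross-correlation (positive: right lags left);
    level difference = left/right frame energy ratio in dB."""
    n = len(left)
    if len(right) != n:
        raise ValueError("channels must have equal frame length")
    best_lag, best_corr = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        # Pair left[i] with right[i + lag] over the overlapping indices only.
        corr = sum(left[i] * right[i + lag] for i in range(max(0, -lag), min(n, n - lag)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    eps = 1e-12  # avoids log of zero on silent frames
    e_left = sum(x * x for x in left) + eps
    e_right = sum(x * x for x in right) + eps
    return float(best_lag), 10.0 * math.log10(e_left / e_right)


def kde(x: Feature, samples: Sequence[Feature], bw: Feature) -> float:
    """Gaussian product-kernel density estimate at x from training feature vectors."""
    norm = 1.0 / (2.0 * math.pi * bw[0] * bw[1] * len(samples))
    total = 0.0
    for s in samples:
        z0 = (x[0] - s[0]) / bw[0]
        z1 = (x[1] - s[1]) / bw[1]
        total += math.exp(-0.5 * (z0 * z0 + z1 * z1))
    return norm * total


def is_speech_frame(x: Feature, speech_samples: Sequence[Feature],
                    noise_samples: Sequence[Feature],
                    bw: Feature = (1.0, 1.0), threshold: float = 1.0) -> bool:
    """Declare speech when p(x | speech) / p(x | absence) exceeds the threshold."""
    floor = 1e-300  # keeps the ratio finite if both densities underflow
    ratio = (kde(x, speech_samples, bw) + floor) / (kde(x, noise_samples, bw) + floor)
    return ratio > threshold
```

A kernel density model places no parametric shape on the cue distributions, which suits spatial cues whose clusters depend on talker and noise-source positions; the cost is that every frame is compared against every stored training vector.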

A Study on the Voice Traffic Efficiency and Buffer Management by Priority Control in ATM Multiplexer (ATM 멀티플렉서에서 우선순위 제어에 의한 음성전송효율 및 버퍼관리에 관한 연구)

  • 이동수;최창수;강준길
    • The Journal of Korean Institute of Communications and Information Sciences / v.19 no.2 / pp.354-363 / 1994
  • This paper describes a method for serving voice traffic efficiently in B-ISDN. Voice consists of talkspurts and silent periods, and speech activity detection makes it possible to transmit only the talkspurts. The paper presents a voice traffic control algorithm for an ATM network in which cell discarding is applied to embedded ADPCM voice data. For traffic control, low-priority cells are discarded when the queue length exceeds a threshold. To evaluate the efficiency of the traffic control algorithm, computer simulations were performed with cell loss probability, queue length, and mean delay as performance parameters. Embedded ADPCM voice coding combined with cell discarding improved voice cell traffic efficiency and provided dynamic control over network congestion.

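Threshold-based discarding of low-priority cells can be illustrated with a small discrete-time simulation. The Python sketch below rests on illustrative assumptions: Bernoulli cell arrivals from a fixed number of sources, one cell transmitted per slot, and each cell marked low priority with probability `p_low` as a stand-in for cells carrying the droppable bits of embedded ADPCM. It reports the same three quantities the paper uses: cell loss probability (per priority class), mean queue length, and mean delay. All parameter values are hypothetical, not the paper's simulation settings.

```python
import random
from collections import deque


def simulate(n_slots: int = 20_000, n_sources: int = 10, arrival_prob: float = 0.11,
             p_low: float = 0.5, capacity: int = 50, threshold: int = 20,
             seed: int = 0) -> dict:
    """Single FIFO multiplexer with priority-threshold cell discarding (illustrative)."""
    rng = random.Random(seed)
    queue = deque()  # arrival slot of each buffered cell, in FIFO order
    arrived = {"high": 0, "low": 0}
    lost = {"high": 0, "low": 0}
    delays = []
    queue_len_sum = 0
    for t in range(n_slots):
        for _ in range(n_sources):
            if rng.random() >= arrival_prob:
                continue  # this source sends no cell in this slot
            prio = "low" if rng.random() < p_low else "high"
            arrived[prio] += 1
            # Discard if the buffer is full, or if a low-priority cell
            # arrives while the queue is at or above the threshold.
            if len(queue) >= capacity or (prio == "low" and len(queue) >= threshold):
                lost[prio] += 1
            else:
                queue.append(t)
        if queue:  # the output link transmits one cell per slot
            delays.append(t - queue.popleft())
        queue_len_sum += len(queue)
    return {
        "loss_high": lost["high"] / max(arrived["high"], 1),
        "loss_low": lost["low"] / max(arrived["low"], 1),
        "mean_queue": queue_len_sum / n_slots,
        "mean_delay": sum(delays) / max(len(delays), 1),
    }


if __name__ == "__main__":
    print(simulate())
```

With the default offered load slightly above link capacity, the queue settles near the threshold: low-priority cells absorb nearly all the loss while high-priority cells are almost never discarded, which is the behaviour the priority scheme is meant to produce.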

Voice Activity Detection Algorithm using Wavelet Band Entropy Ensemble Analysis in Car Noisy Environments (Development of a Voice Command-Based Mobile Application for Improving Document Editing Accessibility)

  • Park, Joo Hyun;Park, Seah;Lee, Muneui;Lim, Soon-Bum
    • Journal of Korea Multimedia Society / v.21 no.11 / pp.1342-1352 / 2018
  • Voice command systems are an important means of ensuring accessibility to digital devices in situations where the user's hands are not free, and for people with disabilities. Interest in services using speech recognition technology has been increasing. In this study, we developed a mobile writing application using voice recognition and voice command technology that helps people create and edit documents easily. The application minimizes touch input on the screen and lets users write memos by voice. We systematically designed a mode that distinguishes voice writing from voice commands, so that writing and command execution can be used together in a single voice interface. The application provides shortcut functions for controlling the cursor by voice, making document editing as convenient as possible. This allows people to access the writing application conveniently by voice under both physical and environmental constraints.
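The core design idea here, one voice channel that is routed either to dictation or to command execution depending on the current mode, can be sketched as a small state machine. The Python example below is hypothetical throughout: the trigger phrases and the command vocabulary (`go to start`, `move left N`, `delete previous word`, and so on) are invented for illustration and are not the application's actual commands. It assumes the speech recognizer has already turned each utterance into text.

```python
from dataclasses import dataclass

# HYPOTHETICAL trigger phrases; the application's real vocabulary is not given here.
SWITCH_TO_COMMAND = "command mode"
SWITCH_TO_WRITING = "writing mode"


@dataclass
class VoiceEditor:
    text: str = ""
    cursor: int = 0
    mode: str = "writing"  # "writing" (dictation) or "command"

    def handle(self, utterance: str) -> None:
        """Route one recognised utterance according to the current mode."""
        key = utterance.strip().lower()
        if key == SWITCH_TO_COMMAND:
            self.mode = "command"
        elif key == SWITCH_TO_WRITING:
            self.mode = "writing"
        elif self.mode == "writing":
            # Dictated text is inserted verbatim at the cursor position.
            self.text = self.text[:self.cursor] + utterance + self.text[self.cursor:]
            self.cursor += len(utterance)
        else:
            self._run_command(key)

    def _run_command(self, cmd: str) -> None:
        """Hypothetical cursor and editing shortcuts; unrecognised commands are ignored."""
        words = cmd.split()
        if cmd == "go to start":
            self.cursor = 0
        elif cmd == "go to end":
            self.cursor = len(self.text)
        elif len(words) == 3 and words[0] == "move" and words[2].isdigit():
            step = int(words[2])
            if words[1] == "left":
                self.cursor = max(0, self.cursor - step)
            elif words[1] == "right":
                self.cursor = min(len(self.text), self.cursor + step)
        elif cmd == "delete previous word":
            head = self.text[:self.cursor].rstrip()
            cut = head.rfind(" ") + 1  # 0 when no space precedes the word
            self.text = head[:cut] + self.text[self.cursor:]
            self.cursor = cut
```

One trade-off of explicit trigger phrases is that an utterance identical to a trigger can never be dictated as text; a real system would need an escape mechanism or less ambiguous triggers.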