• Title/Abstract/Keywords: Speech signals

Speech Activity Decision with Lip Movement Image Signals

  • 박준;이영직;김응규;이수종
    • The Journal of the Acoustical Society of Korea / Vol. 26, No. 1 / pp. 25-31 / 2007
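  • This paper aims to prevent external acoustic noise from being misrecognized as input to a speech recognizer by having the speech-segment detection stage check not only the acoustic energy but also an image signal of the speaker's lip movement. First, video is captured with a PC webcam and lip movement is identified. The lip-movement data are then stored in shared memory, which is shared with the speech recognition process. In the speech-segment detection step, the preprocessing stage of speech recognition, the recognizer checks the data in shared memory to determine whether the acoustic energy comes from a human utterance. In experiments linking the speech recognizer with the image processor, speaking while facing the webcam led normally through to the output of a recognition result, whereas speaking without facing the webcam produced no recognition output. This is because any acoustic energy is treated as noise unless lip movement is confirmed in the video.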
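The decision rule described above (accept acoustic energy as speech only when lip movement is confirmed) amounts to a logical AND of an acoustic test and a visual test. The sketch below assumes a simple per-frame energy threshold and one boolean lip-movement flag per audio frame; both are hypothetical stand-ins, and the shared-memory exchange between the video and recognition processes is not modeled.

```python
from typing import List, Sequence


def frame_energy(frame: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame."""
    return sum(x * x for x in frame) / len(frame)


def gated_speech_activity(
    frames: Sequence[Sequence[float]],
    lip_moving: Sequence[bool],
    energy_threshold: float,
) -> List[bool]:
    """Mark a frame as speech only if its energy exceeds the (hypothetical)
    threshold AND the per-frame lip-movement flag from the video side is set."""
    if len(frames) != len(lip_moving):
        raise ValueError("need one lip-movement flag per audio frame")
    return [
        frame_energy(frame) > energy_threshold and bool(lip)
        for frame, lip in zip(frames, lip_moving)
    ]


# Silence, speech with lips moving, and equally loud sound with no lip movement
frames = [[0.0] * 4, [0.5, -0.5, 0.5, -0.5], [0.5, -0.5, 0.5, -0.5]]
print(gated_speech_activity(frames, [False, True, False], energy_threshold=0.01))
# [False, True, False]
```

The third frame is as loud as the second but is rejected, which mirrors the reported behavior when the speaker does not face the camera.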

An Acoustic Feedback Canceller for Hearing Aids Using Improved Orthogonal Projection Algorithm

  • 이행우
    • Journal of the Korea Society of Digital Industry and Information Management / Vol. 8, No. 2 / pp. 49-58 / 2012
  • This paper presents an improved orthogonal projection method that cancels acoustic feedback signals in digital hearing aids. Compared with the NLMS algorithm, which is widely used for its simplicity and stability, the method converges better while requiring little computation for signals with large autocorrelation, such as speech. The improved orthogonal projection algorithm works by reducing the correlation of the input signals. To verify the convergence characteristics of the proposed algorithm, we ran simulations with various input signals. The acoustic feedback canceller uses 12-bit resolution and a 64-tap adaptive FIR filter, and we compared its simulation results with those of the NLMS algorithm. The results show that the feedback canceller using the proposed algorithm achieves about 3.5 dB higher SNR than the NLMS algorithm for colored input signals.
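For reference, the NLMS baseline the abstract compares against can be sketched as a 64-tap adaptive FIR filter that learns the feedback path from the loudspeaker signal to the microphone. This is the standard NLMS update, not the author's improved orthogonal projection algorithm; the step size `mu`, the regularizer `eps`, and the short `true_path` used in the demo are assumptions, and the 12-bit quantization is not modeled.

```python
import random
from typing import List, Sequence, Tuple


def nlms_feedback_canceller(
    x: Sequence[float],
    d: Sequence[float],
    num_taps: int = 64,
    mu: float = 0.5,
    eps: float = 1e-6,
) -> Tuple[List[float], List[float]]:
    """Adapt an FIR estimate of the feedback path with NLMS.

    x: loudspeaker (receiver) signal that drives the feedback path
    d: microphone signal containing the fed-back component
    Returns the final tap weights and the residual (error) signal.
    """
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # buf[0] holds the newest input sample
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))       # estimated feedback
        e = dn - y                                        # residual after cancellation
        step = mu * e / (eps + sum(b * b for b in buf))   # normalize by input power
        w = [wi + step * bi for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


if __name__ == "__main__":
    random.seed(0)
    true_path = [0.3, -0.2, 0.1, 0.05]  # hypothetical feedback path
    x = [random.gauss(0.0, 1.0) for _ in range(3000)]
    d = [sum(h * x[n - k] for k, h in enumerate(true_path) if n - k >= 0)
         for n in range(len(x))]
    w, _ = nlms_feedback_canceller(x, d)
    print([round(v, 3) for v in w[:4]])  # should be close to true_path
```

With white input such as the demo's Gaussian noise, NLMS converges quickly; with strongly correlated input like speech its convergence slows, which is the weakness that decorrelating methods such as the one proposed here target.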

Post Processing using Blind Signal Separation in Stereo Acoustic Echo Canceller

  • 이행우
    • Journal of the Korea Society of Digital Industry and Information Management / Vol. 10, No. 1 / pp. 131-138 / 2014
  • This paper presents a stereo acoustic echo canceller that uses blind signal separation (BSS) as post-processing. The convergence speed of a stereo acoustic echo canceller deteriorates because the two residual signals, which serve as the update signals of the individual echo cancellers, are mixed. To solve this problem, we apply a BSS method that separates the mixed signals after the echo cancellers. BSS extracts the source signals through iterative computation on the two input signals. We verified the performance of the proposed stereo acoustic echo canceller through simulations. The results show that the canceller operates stably without divergence in the normal state. With speech input, it achieved about 2 dB higher ERLE with BSS post-processing than without it, and it performed best when real voice signals were input.
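The gains above are reported as ERLE (echo return loss enhancement). A minimal sketch, assuming the common definition as the ratio of microphone-signal power to residual-echo power in dB, also shows what a few dB of extra ERLE means in linear terms: 2 dB leaves about 63% of the residual echo power (roughly 37% less), and 3 dB leaves about half.

```python
import math
from typing import Sequence


def erle_db(mic: Sequence[float], residual: Sequence[float], eps: float = 1e-12) -> float:
    """ERLE in dB: 10*log10(sum of mic power / sum of residual-echo power)."""
    p_mic = sum(v * v for v in mic)
    p_res = sum(v * v for v in residual)
    return 10.0 * math.log10((p_mic + eps) / (p_res + eps))


def residual_power_ratio(erle_gain_db: float) -> float:
    """Residual echo power after an extra ERLE gain, relative to before."""
    return 10.0 ** (-erle_gain_db / 10.0)


mic = [1.0, -1.0] * 50
residual = [0.1, -0.1] * 50
print(round(erle_db(mic, residual), 2))        # 20.0
print(round(residual_power_ratio(2.0), 3))     # 0.631
print(round(residual_power_ratio(3.0), 3))     # 0.501
```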

An Acoustic Echo Canceller for Stereo Using Blind Signal Separation

  • 이행우
    • Journal of the Korea Society of Digital Industry and Information Management / Vol. 8, No. 3 / pp. 125-131 / 2012
  • This paper presents a stereo acoustic echo canceller that uses blind signal separation (BSS). The convergence speed of a stereo acoustic echo canceller deteriorates because two residual signals are mixed in the update signal of each echo canceller. To solve this problem, we use a BSS method to separate the mixed signals. BSS extracts the source signals through iterative computation on the two input signals. We verified the performance of the proposed stereo acoustic echo canceller through simulations. The results show that the canceller operates stably without divergence in the normal state. With speech input, it achieved about 3 dB higher ERLE with the BSS algorithm than without it. However, it did not perform well when white noise was used as the stereo input.

Categorization and production in lexical pitch accent contrasts of North Kyungsang Korean

  • Kim, Jungsun
    • Phonetics and Speech Sciences / Vol. 10, No. 1 / pp. 1-7 / 2018
  • Categorical production in language processing helps speakers produce phonemic contrasts. The present study applies this categorization and production to a production-based and an imitation-based approach. Contrastive signals in speakers' speech reflect the shapes of boundaries with categorical characteristics. Signals that carry information about lexical pitch accent contrasts can introduce categorical distinctions for productive and cognitive selection. The experiment was conducted with nine North Kyungsang speakers in a production task and nine North Kyungsang speakers in an imitation task. The first finding of the present study is the rigidity of categorical production, which controls the boundaries of lexical pitch accent contrasts. Categorization in North Kyungsang speakers' production allows them to classify minimal pitch accent contrasts. Categorical production in imitation appeared in two clusters, representing two meaningful contrasts. The second finding is that speakers differ individually in their production and imitation responses; individual speakers' performances showed a variety of curves. For the HL-LH pattern, categorical production tended to be highly distinctive compared with the other pitch accent patterns (HH-HL and HH-LH), which showed more continuous curves than categorical curves. Finally, the present study shows that, for North Kyungsang speakers, imitative production is the core type of categorical production for determining the existence of the lexical pitch accent system. However, several questions about how to define that categorical production remain open and point to future research.

Voice Personality Transformation Using a Multiple Response Classification and Regression Tree

  • 이기승
    • The Journal of the Acoustical Society of Korea / Vol. 23, No. 3 / pp. 253-261 / 2004
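  • This paper proposes a new voice personality transformation technique that converts the speaker-dependent feature parameters of a speech signal. The method takes as transformation targets the cepstral vector, which reflects the characteristics of the vocal tract transfer function, and the pitch value, which reflects the characteristics of the excitation signal, and transforms them with a multiple-response classification and regression tree (CART). A multiple-response CART is a multidimensional extension of the conventional CART in which the response takes the form of a vector. The proposed technique was evaluated against the conventional codebook mapping method, and the relationship between tree complexity and transformation performance was analyzed quantitatively by varying the observations fed to the tree. In voice personality transformation experiments with four speakers, the method outperformed conventional codebook mapping on objective measures, and listening tests showed that the transformed speech resembled the target speaker's voice.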
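The key idea of a multiple-response CART is that each leaf stores a target vector (for example a cepstral vector) instead of a scalar, and a split is chosen to minimize squared error summed over all output dimensions. The sketch below grows such a tree greedily; the feature layout, `max_depth`, `min_leaf`, and the toy data are assumptions, and it does not reproduce the paper's choice of observations or its pitch transformation.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def _mean(vectors: Sequence[Sequence[float]]) -> Vector:
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def _sse(vectors: Sequence[Sequence[float]]) -> float:
    """Squared error around the mean vector, summed over all dimensions."""
    m = _mean(vectors)
    return sum((v[d] - m[d]) ** 2 for v in vectors for d in range(len(m)))


def build_tree(X: Sequence[Vector], Y: Sequence[Vector],
               max_depth: int = 3, min_leaf: int = 2) -> Dict:
    """Grow a regression tree whose leaves hold mean target vectors."""
    if max_depth == 0 or len(X) < 2 * min_leaf:
        return {"leaf": _mean(Y)}
    best = None
    for f in range(len(X[0])):
        for thr in sorted({x[f] for x in X}):
            left = [i for i, x in enumerate(X) if x[f] <= thr]
            right = [i for i, x in enumerate(X) if x[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            # Multiple-response cost: error over every output dimension
            cost = _sse([Y[i] for i in left]) + _sse([Y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, f, thr, left, right)
    if best is None:
        return {"leaf": _mean(Y)}
    _, f, thr, left, right = best
    return {
        "feature": f,
        "threshold": thr,
        "left": build_tree([X[i] for i in left], [Y[i] for i in left], max_depth - 1, min_leaf),
        "right": build_tree([X[i] for i in right], [Y[i] for i in right], max_depth - 1, min_leaf),
    }


def predict(tree: Dict, x: Sequence[float]) -> Vector:
    """Walk from the root to a leaf and return its target vector."""
    while "leaf" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["leaf"]


# Toy source-feature / target-vector pairs (hypothetical data)
X = [[0.0], [0.1], [1.0], [1.1]]
Y = [[1.0, 2.0], [1.0, 2.0], [5.0, 6.0], [5.0, 6.0]]
tree = build_tree(X, Y, max_depth=1, min_leaf=1)
print(predict(tree, [0.05]), predict(tree, [2.0]))  # [1.0, 2.0] [5.0, 6.0]
```

Deeper trees (larger `max_depth`) fit the training pairs more closely at the cost of complexity, which is the trade-off the abstract says was analyzed quantitatively.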

Multi-channel Speech Enhancement Using Blind Source Separation and Cross-channel Wiener Filtering

  • Jang, Gil-Jin;Choi, Chang-Kyu;Lee, Yong-Beom;Kim, Jeong-Su;Kim, Sang-Ryong
    • The Journal of the Acoustical Society of Korea / Vol. 23, No. 2E / pp. 56-67 / 2004
  • Despite abundant research on blind source separation (BSS) in many kinds of simulated environments, its performance is still not good enough for real environments. The major obstacles appear to be the finite filter length of the assumed mixing model and nonlinear sensor noise. This paper presents a two-step speech enhancement method with multiple microphone inputs. The first step applies a frequency-domain BSS algorithm to produce multiple outputs without any prior knowledge of the mixed source signals. The second step further removes the remaining cross-channel interference by spectral cancellation using a probabilistic source absence/presence detection technique. The desired primary source is detected in every frame of the signal, and the secondary source is estimated in the power spectral domain using the other BSS output as a reference interfering source. The estimated secondary source is then subtracted to reduce the cross-channel interference. Experiments on real recordings of speech and music signals show better separation performance than conventional BSS methods.
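The second step above can be sketched per frequency bin: while the primary source is absent, learn how much of the reference (the other BSS output) leaks into the primary output; then subtract that scaled reference power from the primary power spectrum with a gain floor. This is a simplified sketch, not the paper's estimator: a hard `primary_absent` flag stands in for its probabilistic absence/presence detector, and the smoothing constant and gain floor are hypothetical.

```python
from typing import List, Sequence


def update_leakage(
    leakage: Sequence[float],
    primary_power: Sequence[float],
    reference_power: Sequence[float],
    primary_absent: bool,
    smoothing: float = 0.9,
    eps: float = 1e-12,
) -> List[float]:
    """Per-bin leakage of the reference into the primary output.

    Updated only when the primary source is absent, so whatever remains in the
    primary channel can be attributed to cross-channel interference.
    """
    if not primary_absent:
        return list(leakage)
    return [
        smoothing * a + (1.0 - smoothing) * p / (r + eps)
        for a, p, r in zip(leakage, primary_power, reference_power)
    ]


def cancel_cross_channel(
    primary_power: Sequence[float],
    reference_power: Sequence[float],
    leakage: Sequence[float],
    floor: float = 0.05,
    eps: float = 1e-12,
) -> List[float]:
    """Power spectral subtraction of the estimated secondary source, with a gain floor."""
    out = []
    for p, r, a in zip(primary_power, reference_power, leakage):
        gain = max(1.0 - a * r / (p + eps), floor)
        out.append(gain * p)
    return out


leak = update_leakage([0.0, 0.0], [1.0, 1.0], [2.0, 4.0], primary_absent=True, smoothing=0.5)
print([round(v, 3) for v in leak])  # [0.25, 0.125]
print([round(v, 3) for v in cancel_cross_channel([4.0, 1.0], [2.0, 2.0], [0.5, 0.5])])  # [3.0, 0.05]
```

The gain floor keeps bins dominated by interference from being driven to zero, a common way to limit musical-noise artifacts in spectral subtraction.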

Pitch Detection Using Variable Bandwidth LPF

  • 금홍;백금란;배명진;장호성
    • The Journal of the Acoustical Society of Korea / Vol. 13, No. 5 / pp. 77-82 / 1994
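  • Accurate pitch detection is very important in speech signal processing. Although many pitch detection methods have been proposed, finding the correct pitch across a wide range of speakers and diverse speech data remains difficult. This paper therefore proposes a new pitch detection algorithm based on G-peak detection. The method detects the pitch of a speech signal by using the MZCI (maximum zero-crossing interval) of the G-peak to set the cutoff bandwidth of an LPF (low-pass filter). The algorithm is robust to noise, with a gross error of 3.36% at 0 dB SNR. For noise-free speech the gross error was 0.18%, and the whole procedure can run at high speed.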
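The results above are stated as gross error rates. The abstract does not give the exact definition, so the sketch below assumes a common one: the percentage of voiced frames whose pitch estimate deviates from the reference by more than a relative tolerance (20% here, an assumption). Under this definition, 3.36% means roughly 34 of every 1,000 voiced frames are badly wrong, typically through pitch doubling or halving.

```python
from typing import Sequence


def gross_pitch_error(
    estimated_hz: Sequence[float],
    reference_hz: Sequence[float],
    tolerance: float = 0.2,
) -> float:
    """Percentage of voiced frames (reference > 0) whose estimate deviates
    from the reference by more than `tolerance` (relative)."""
    voiced = [(e, r) for e, r in zip(estimated_hz, reference_hz) if r > 0]
    if not voiced:
        raise ValueError("no voiced reference frames")
    gross = sum(1 for e, r in voiced if abs(e - r) > tolerance * r)
    return 100.0 * gross / len(voiced)


# 0 Hz marks an unvoiced reference frame, which is skipped
est = [100.0, 210.0, 0.0, 150.0, 300.0]
ref = [100.0, 200.0, 0.0, 100.0, 150.0]
print(gross_pitch_error(est, ref))  # 50.0
```

Here the 210 Hz estimate for a 200 Hz reference is within tolerance, while the 150 Hz and 300 Hz estimates (including a doubling error) count as gross errors.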

Efficient Noise Estimation for Speech Enhancement in Wavelet Packet Transform

  • Jung, Sung-Il;Yang, Sung-Il
    • The Journal of the Acoustical Society of Korea / Vol. 25, No. 4E / pp. 154-158 / 2006
  • In this paper, we propose a noise estimation method for speech enhancement in nonstationary noisy environments. The method consists of two main processes. First, to reduce the influence of signal variation, it uses a best-fitting regression line obtained by applying the least squares method to the coefficient magnitudes in a node of a uniform wavelet packet transform. Next, to update the noise estimate efficiently, it uses a differential forgetting factor and a per-subband correlation coefficient, where the subband is used to apply weights according to changes in the signal. In particular, the method updates the noise estimate using only the noise estimated at the previous frame, without statistical information from long stretches of past frames or explicit non-speech frames from a voice activity detector. In objective assessments, the proposed method performed better than the compared methods (minima controlled recursive averaging and weighted average), and it gave reliable results even at low SNR.
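The update step described above can be illustrated with first-order recursive averaging per subband, where the forgetting factor depends on a correlation coefficient and only the previous frame's estimate is carried forward. This is a sketch under stated assumptions, not the paper's algorithm: the mapping from correlation to forgetting factor (`alpha_min`, `alpha_max`, and the rule that higher correlation means faster tracking) is hypothetical, and the regression-line step and the wavelet packet transform itself are not modeled.

```python
import math
from typing import List, Sequence


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0
    return cov / math.sqrt(va * vb)


def update_noise(
    prev_noise: Sequence[float],
    prev_coeffs: Sequence[Sequence[float]],
    cur_coeffs: Sequence[Sequence[float]],
    alpha_min: float = 0.8,
    alpha_max: float = 0.99,
) -> List[float]:
    """One frame of per-subband recursive noise averaging.

    prev_noise[k]: noise power estimate of subband k from the previous frame
    prev_coeffs[k], cur_coeffs[k]: coefficient magnitudes of subband k
    Hypothetical rule: the more a subband resembles the previous frame
    (higher correlation), the smaller the forgetting factor, so the
    estimate follows the current power faster.
    """
    new_noise = []
    for n_prev, p, c in zip(prev_noise, prev_coeffs, cur_coeffs):
        rho = max(0.0, correlation(p, c))
        alpha = alpha_max - (alpha_max - alpha_min) * rho
        power = sum(x * x for x in c) / len(c)
        new_noise.append(alpha * n_prev + (1.0 - alpha) * power)
    return new_noise


print(update_noise([1.0], [[1.0, 2.0, 3.0]], [[1.0, 2.0, 3.0]]))
```

Because each call needs only `prev_noise` and the previous frame's coefficients, no long history buffer or separate voice activity detector is required, which matches the property the abstract emphasizes.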

Audio Mixer Algorithm for Enhancing Speech Quality of Multi-party Audio Telephony

  • 류상현;김형국
    • The Journal of the Acoustical Society of Korea / Vol. 32, No. 6 / pp. 541-547 / 2013
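  • In multi-party calls among two, three, or more participants, speech quality degrades because of volume imbalance, volume saturation, and rising noise levels. To solve this problem, this paper proposes an improved audio mixing algorithm for a software-based multipoint control unit. The proposed scheme combines voice activity detection with gain control and consists of algorithms for speech signal classification, volume estimation, gain application, and mixing of the speech signals of all channels. The proposed audio mixing algorithm is computationally efficient, delivers high-quality speech, and is suitable for practical multi-party voice calls.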
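The four stages listed above (classification, volume estimation, gain application, mixing) can be sketched for a single frame as follows. This is an illustrative sketch, not the paper's algorithm: the per-channel VAD flags are taken as given, and the target RMS level, gain cap, and final clipping are hypothetical choices that address the three problems named in the abstract (imbalance, saturation, accumulated noise).

```python
import math
from typing import List, Sequence


def mix_channels(
    channels: Sequence[Sequence[float]],
    active: Sequence[bool],
    target_rms: float = 0.1,
    max_gain: float = 4.0,
    limit: float = 1.0,
) -> List[float]:
    """Mix one frame from each participant.

    Channels flagged inactive by VAD are left out so their background noise
    does not accumulate; active channels are scaled toward a common RMS level
    (gain capped at max_gain) to reduce volume imbalance; the sum is clipped
    to [-limit, limit] as a last-resort guard against saturation.
    """
    length = len(channels[0])
    mixed = [0.0] * length
    for samples, is_active in zip(channels, active):
        if not is_active:
            continue
        rms = math.sqrt(sum(s * s for s in samples) / len(samples))  # volume estimate
        if rms == 0.0:
            continue
        gain = min(target_rms / rms, max_gain)                        # gain control
        for i, s in enumerate(samples):
            mixed[i] += gain * s
    return [max(-limit, min(limit, v)) for v in mixed]


loud = [0.2, -0.2, 0.2, -0.2]
quiet = [0.05, -0.05, 0.05, -0.05]
noise_only = [0.9, 0.9, 0.9, 0.9]
out = mix_channels([loud, quiet, noise_only], [True, True, False])
print([round(v, 3) for v in out])  # [0.2, -0.2, 0.2, -0.2]
```

The loud and quiet talkers end up contributing equally, and the noise-only channel is excluded; a practical mixer would also smooth gains across frames to avoid audible pumping.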