• Title/Abstract/Keywords: Speech signals

On a Pitch Detection using Low Pass Filter with Variable Bandwidth Preprocessed

  • 한진희
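    • The Acoustical Society of Korea: Conference Proceedings / Proceedings of the 12th Workshop on Speech Communication and Signal Processing, The Acoustical Society of Korea, 1995 (SCAS Vol. 12, No. 1) / pp.221-224 / 1995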
  • Speech signal processing requires accurate pitch detection. The pitch extraction algorithms proposed so far have difficulty detecting pitch across a wide range of speech signals. This paper therefore proposes a new pitch detection algorithm that uses a low-pass filter (LPF) with variable bandwidth. As preprocessing, the method finds the first formant of the speech signal with an FFT at each frame, then detects the pitch from the signal low-pass filtered at a cutoff frequency set according to that first formant. Applying the method, we obtained pitch contours with improved pitch detection accuracy in some noisy environments.
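
The pipeline described in this abstract (FFT per frame, first-formant estimate, low-pass filtering at a formant-dependent cutoff, pitch detection) can be illustrated with a minimal sketch. The abstract does not say how the formant is picked from the FFT or which pitch detector follows, so the choices here are assumptions: the strongest spectral peak between 200 and 1000 Hz stands in for the first formant, a Hamming-windowed sinc FIR filter is the LPF, and autocorrelation is the pitch detector. All function names are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Naive DFT magnitude for bins 0..N/2 (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame)))
            for k in range(n // 2 + 1)]


def estimate_first_formant(frame, fs, lo=200.0, hi=1000.0):
    """Crude F1 guess: strongest spectral bin between lo and hi (assumption)."""
    mags = magnitude_spectrum(frame)
    bin_hz = fs / len(frame)
    k_lo, k_hi = int(math.ceil(lo / bin_hz)), int(hi / bin_hz)
    return max(range(k_lo, k_hi + 1), key=lambda k: mags[k]) * bin_hz


def lowpass_fir(cutoff_hz, fs, taps=101):
    """Hamming-windowed sinc low-pass coefficients, normalised to unity DC gain."""
    fc = cutoff_hz / fs  # cutoff in cycles per sample
    mid = (taps - 1) / 2
    h = []
    for i in range(taps):
        t = i - mid
        ideal = 2 * fc if t == 0 else math.sin(2 * math.pi * fc * t) / (math.pi * t)
        h.append(ideal * (0.54 - 0.46 * math.cos(2 * math.pi * i / (taps - 1))))
    gain = sum(h)
    return [c / gain for c in h]


def convolve_same(x, h):
    """Filter x with h and keep the centred part with the same length as x."""
    half = len(h) // 2
    out = []
    for n in range(len(x)):
        acc = 0.0
        for j, c in enumerate(h):
            idx = n + half - j
            if 0 <= idx < len(x):
                acc += c * x[idx]
        out.append(acc)
    return out


def autocorr_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Pitch (Hz) at the lag with the largest autocorrelation in the search range."""
    def r(lag):
        return sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
    best = max(range(int(fs / f_max), int(fs / f_min) + 1), key=r)
    return fs / best


def detect_pitch(frame, fs):
    """Set the LPF cutoff from the F1 estimate, then detect pitch on the filtered frame."""
    f1 = estimate_first_formant(frame, fs)
    return f1, autocorr_pitch(convolve_same(frame, lowpass_fir(f1, fs)), fs)


fs = 8000
# Synthetic voiced frame: 125 Hz harmonics shaped by a resonance near 500 Hz.
frame = [sum(math.sin(2 * math.pi * 125 * k * n / fs) / (1 + ((125 * k - 500) / 150) ** 2)
             for k in range(1, 9))
         for n in range(512)]
f1, f0 = detect_pitch(frame, fs)
print(f1, f0)
```

On real speech a spectral-envelope method (for example LPC or cepstral smoothing) would likely give a more reliable formant estimate than a single strongest bin.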

Real-Time Implementation of Speech Vocoder For Video Telephony

  • 남일룡;서성대;남현도
    • The Korean Institute of Electrical Engineers: Conference Proceedings / Proceedings of the 1998 KIEE Summer Conference, G / pp.2414-2416 / 1998
  • This paper presents a real-time implementation of a speech vocoder for PSTN video telephony using the ITU G.723 16 kbps ADPCM algorithm. The ADPCM encoder accepts 8-bit compressed PCM signals and expands them to 14 bits per sample. The predicted values are subtracted from the encoded signals to produce difference signals. Adaptive quantization is then performed on the difference signal to produce a 2-bit output for transmission over the channel. Computer simulations and experiments were performed to evaluate the performance of the speech vocoder.
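
The predict, subtract, and adaptively quantize loop in this abstract can be shown with a toy 2-bit ADPCM codec. This is not the ITU algorithm: the predictor is simply the previous reconstructed sample, the step-size multipliers (0.8 and 1.6) and step limits are illustrative assumptions, and the 8-bit companded PCM expansion stage is omitted. All names are hypothetical.

```python
import math

STEP_SCALE = (0.8, 1.6)  # assumed multipliers for the inner and outer quantizer levels


def _reconstruct(code, step):
    """Map a 2-bit code (bit 1 = sign, bit 0 = magnitude level) to a difference value."""
    magnitude = (1.5 if code & 1 else 0.5) * step
    return -magnitude if code & 2 else magnitude


def _adapt(code, step):
    """Shrink the step after an inner-level code, grow it after an outer-level code."""
    return min(max(step * STEP_SCALE[code & 1], 1.0), 2048.0)


def encode(samples):
    """Quantize each prediction error to 2 bits while tracking the decoder state."""
    predicted, step, codes = 0.0, 16.0, []
    for x in samples:
        diff = x - predicted
        code = (2 if diff < 0 else 0) | (1 if abs(diff) >= step else 0)
        codes.append(code)
        predicted += _reconstruct(code, step)
        step = _adapt(code, step)
    return codes


def decode(codes):
    """Rebuild the signal by accumulating the dequantized differences."""
    predicted, step, out = 0.0, 16.0, []
    for code in codes:
        predicted += _reconstruct(code, step)
        out.append(predicted)
        step = _adapt(code, step)
    return out


fs = 8000
signal = [1000.0 * math.sin(2 * math.pi * 100 * n / fs) for n in range(400)]
codes = encode(signal)
decoded = decode(codes)
mean_abs_error = sum(abs(a - b) for a, b in zip(signal, decoded)) / len(signal)
print(mean_abs_error)
```

At 8000 samples per second, 2 bits per sample gives the 16 kbps rate mentioned in the abstract. Because the encoder updates its prediction from the quantized difference, encoder and decoder stay in step without side information.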

Adaptive Speech Streaming Based on Packet Loss Prediction Using Support Vector Machine for Software-Based Multipoint Control Unit over IP Networks

  • Kang, Jin Ah;Han, Mikyong;Jang, Jong-Hyun;Kim, Hong Kook
    • ETRI Journal / Vol. 38, No. 6 / pp.1064-1073 / 2016
  • An adaptive speech streaming method to improve the perceived speech quality of a software-based multipoint control unit (SW-based MCU) over IP networks is proposed. First, the proposed method predicts whether the speech packet to be transmitted is lost. To this end, the proposed method learns the pattern of packet losses in the IP network, and then predicts the loss of the packet to be transmitted over that IP network. The proposed method classifies the speech signal into different classes of silence, unvoiced, speech onset, or voiced frame. Based on the results of packet loss prediction and speech classification, the proposed method determines the proper amount and bitrate of redundant speech data (RSD) that are sent with primary speech data (PSD) in order to assist the speech decoder to restore the speech signals of lost packets. Specifically, when a packet is predicted to be lost, the amount and bitrate of the RSD must be increased through a reduction in the bitrate of the PSD. The effectiveness of the proposed method for learning the packet loss pattern and assigning a different speech coding rate is then demonstrated using a support vector machine and adaptive multirate-narrowband, respectively. The results show that as compared with conventional methods that restore lost speech signals, the proposed method remarkably improves the perceived speech quality of an SW-based MCU under various packet loss conditions in an IP network.

An Optimality Theoretic Approach to the Feature Model for Speech Understanding

  • Kim, Kee-Ho
    • Speech Sciences / Vol. 2 / pp.109-124 / 1997
  • This paper shows how a distinctive feature model can be effectively implemented in speech understanding within the framework of Optimality Theory (OT); that is, it shows how distinctive features can be optimally extracted from given speech signals and how segments can be chosen as the optimal ones among plausible candidates. It also shows how the sequence of segments can be successfully matched with optimal words in a lexicon.

Speech Enhancement Using Phase-Dependent A Priori SNR Estimator in Log-Mel Spectral Domain

  • Lee, Yun-Kyung;Park, Jeon Gue;Lee, Yun Keun;Kwon, Oh-Wook
    • ETRI Journal / Vol. 36, No. 5 / pp.721-729 / 2014
  • We propose a novel phase-based method for single-channel speech enhancement to extract and enhance the desired signals in noisy environments by utilizing the phase information. In the method, a phase-dependent a priori signal-to-noise ratio (SNR) is estimated in the log-mel spectral domain to utilize both the magnitude and phase information of input speech signals. The phase-dependent estimator is incorporated into the conventional magnitude-based decision-directed approach that recursively computes the a priori SNR from noisy speech. Additionally, we reduce the performance degradation owing to the one-frame delay of the estimated phase-dependent a priori SNR by using a minimum mean square error (MMSE)-based and maximum a posteriori (MAP)-based estimator. In our speech enhancement experiments, the proposed phase-dependent a priori SNR estimator is shown to improve the output SNR by 2.6 dB for both the MMSE-based and MAP-based estimator cases as compared to a conventional magnitude-based estimator.

An Adaptive Wind Noise Reduction Method Based on a priori SNR Estimation for Speech Enhancement

  • 서지훈;이석필
    • The Transactions of the Korean Institute of Electrical Engineers / Vol. 64, No. 12 / pp.1756-1760 / 2015
  • This paper focuses on an a priori signal-to-noise ratio (SNR) estimation method for speech enhancement. Many studies have addressed speech enhancement using various ambient noise cancellation methods. Spectral subtraction (SS), which is widely used for noise reduction, involves a trade-off between noise reduction performance and signal distortion. There is therefore a growing need for adaptive methods, such as those based on an estimated a priori SNR, that achieve high performance with low distortion. The decision-directed (DD) approach is used to determine the a priori SNR in noisy speech signals. It estimates the a priori SNR using only the magnitude components, and the estimate consequently follows the a posteriori SNR with a one-frame delay. We propose a modified a priori SNR estimator and a weighted rational transfer function for enhancing speech corrupted by wind noise. Experimental results show that the proposed estimator achieves better Perceptual Evaluation of Speech Quality (PESQ, ITU-T P.862) scores than conventional DD approach-based systems and other noise reduction methods.
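
The conventional decision-directed recursion that this abstract takes as its baseline can be written compactly. The sketch below is the textbook DD estimator paired with a Wiener gain, not the modified estimator or the weighted rational transfer function proposed in the paper. The smoothing factor `alpha`, the floor `xi_min`, the first-frame initialisation, and the assumption that the noise power is known are all illustrative choices.

```python
def decision_directed(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Conventional decision-directed a priori SNR estimate with a Wiener gain.

    noisy_power: list of frames, each a list of |Y(k)|^2 values per frequency bin.
    noise_power: list of noise power estimates per bin (assumed known here).
    """
    prev_clean = None  # |S_hat(k)|^2 from the previous frame
    xi_frames, gain_frames = [], []
    for frame in noisy_power:
        xi, gain = [], []
        for k, (y2, d2) in enumerate(zip(frame, noise_power)):
            ml = max(y2 / d2 - 1.0, 0.0)  # a posteriori SNR minus one, floored at zero
            if prev_clean is None:
                est = ml  # first frame: no history yet (assumed initialisation)
            else:
                est = alpha * prev_clean[k] / d2 + (1.0 - alpha) * ml
            est = max(est, xi_min)
            xi.append(est)
            gain.append(est / (1.0 + est))  # Wiener gain from the a priori SNR
        prev_clean = [g * g * y2 for g, y2 in zip(gain, frame)]
        xi_frames.append(xi)
        gain_frames.append(gain)
    return xi_frames, gain_frames


# One bin with unit noise power: the a posteriori SNR jumps from 1 to 11 at frame 1.
xi, gain = decision_directed([[1.0], [11.0], [11.0]], [1.0])
print(xi)
```

In this toy input the a priori estimate rises only gradually after the jump, which illustrates how the estimate trails the a posteriori SNR, as the abstract notes.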

New Parameter on Speech and EGG; Glottal Closure Delay Ratio

  • 최종민;권택균;정은정;이명철;김광현;성명훈;박광석
    • Journal of the Korean Society of Laryngology, Phoniatrics and Logopedics / Vol. 18, No. 1 / pp.22-25 / 2007
  • Background and Objectives: Biomedical signals such as speech, the electroglottograph (EGG), and airflow have commonly been used to diagnose laryngeal function. In most cases, however, these signals have been analysed separately. Here, we propose a new interchannel parameter, the Glottal Closure Delay Ratio (GCDR), which is estimated from simultaneously measured speech and EGG. Materials and Method: Speech and EGG signals were recorded simultaneously from 13 normal subjects and 39 patients. The patient data comprised 16 cases of polyps and 23 cases of vocal fold palsy. The time difference between the glottal closing instant on the EGG and the first maximum peak of the speech signal within a pitch period was calculated. The glottal closing instant was defined as the maximum peak of the first derivative of the EGG signal (dEGG). Results: The standard deviation and jitter were calculated from 20-30 GCDRs extracted from each recording, and they differed significantly between the normal and vocal fold paralysis groups. Conclusion: The GCDR may be the first index reflecting both speech and EGG characteristics, and the perturbation of this parameter differed significantly between the normal and vocal fold paralysis groups.

A Novel Two-Level Pitch Detection Approach for Speaker Tracking in Robot Control

  • Hejazi, Mahmoud R.;Oh, Han;Kim, Hong-Kook;Ho, Yo-Sung
    • Institute of Control, Robotics and Systems: Conference Proceedings / Institute of Control, Robotics and Systems, ICCAS 2005 / pp.89-92 / 2005
  • Using natural speech commands for controlling a human-robot is an interesting topic in the field of robotics. In this paper, our main focus is on verifying the speaker who gives a command, in order to decide whether he or she is authorized to issue commands. Among the possible dynamic features of natural speech, the pitch period is one of the most important for characterizing speech signals, and it usually differs from person to person. However, current pitch detection techniques have not yet reached the desired level of accuracy and robustness. When the signal is noisy or contains multiple pitch streams, the performance of most techniques degrades. In this paper, we propose a two-level approach to pitch detection which, in comparison with standard pitch detection algorithms, not only increases accuracy but also makes the performance more robust to noise. In the first level of the proposed approach, we discriminate voiced from unvoiced signals with a neural classifier that uses cepstrum sequences of speech as its input feature set. Voiced signals are then processed in the second level using a modified standard AMDF-based pitch detection algorithm to determine their pitch periods precisely. The experimental results show that the proposed system is more accurate than conventional pitch detection algorithms for speech signals in both clean and noisy environments.
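
The second level of this approach builds on the average magnitude difference function (AMDF). A minimal sketch of the standard AMDF pitch detector follows. The abstract does not detail the authors' modification, and the neural voiced/unvoiced classifier of the first level is omitted, so the frame is assumed to be voiced. The search range of 60 to 400 Hz and the function names are assumptions.

```python
import math


def amdf(frame, lag):
    """Average magnitude difference between the frame and its lagged copy."""
    n = len(frame) - lag
    return sum(abs(frame[i] - frame[i + lag]) for i in range(n)) / n


def amdf_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Return the pitch (Hz) at the lag with the deepest AMDF valley."""
    lag_lo, lag_hi = int(fs / f_max), int(fs / f_min)
    best_lag = min(range(lag_lo, lag_hi + 1), key=lambda lag: amdf(frame, lag))
    return fs / best_lag


fs = 8000
# Synthetic voiced frame: 100 Hz fundamental plus two harmonics.
frame = [math.sin(2 * math.pi * 100 * n / fs)
         + 0.5 * math.sin(2 * math.pi * 200 * n / fs)
         + 0.25 * math.sin(2 * math.pi * 300 * n / fs)
         for n in range(400)]
pitch_hz = amdf_pitch(frame, fs)
print(pitch_hz)
```

The AMDF has valleys at every multiple of the pitch period, so the upper lag bound matters: here it stops short of two periods, which keeps the detector from locking onto half the true pitch.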

Mixed Noise Cancellation by Independent Vector Analysis and Frequency Band Beamforming Algorithm in 4-channel Environments

  • 최재승
    • The Journal of the Korea Institute of Electronic Communication Sciences / Vol. 14, No. 5 / pp.811-816 / 2019
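  • This paper first proposes a technique that separates clean speech signals from mixed noise signals in noisy four-channel source signals, using an independent vector analysis algorithm in the frequency band. An enhanced output speech signal is then obtained from the source signals separated by the proposed independent vector analysis algorithm, using the cross-correlation between the output of a frequency-band delay-and-sum beamformer and the output separated by independent vector analysis. In experiments on noisy input speech containing white noise at SNRs of 0 dB and -5 dB, the proposed algorithm improved the SNR by up to 10.90 dB and the segmental SNR by up to 10.02 dB compared with using the frequency-band delay-and-sum beamformer algorithm alone. The experiments and discussion thus confirm that the proposed algorithm improves speech quality relative to the frequency-band delay-and-sum beamformer.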
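
The delay-and-sum beamformer used as the comparison baseline here can be sketched for four channels. The paper works in the frequency band and combines the beamformer with independent vector analysis. The sketch below is a simpler time-domain version with known integer sample delays and independent white noise on each channel, all of which are assumptions, as are the function names.

```python
import math
import random


def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average the results."""
    n = min(len(ch) - d for ch, d in zip(channels, delays))
    return [sum(ch[i + d] for ch, d in zip(channels, delays)) / len(channels)
            for i in range(n)]


def snr_db(clean, noisy):
    """SNR in dB of a noisy signal against its clean reference."""
    signal = sum(s * s for s in clean)
    noise = sum((y - s) ** 2 for s, y in zip(clean, noisy))
    return 10 * math.log10(signal / noise)


random.seed(0)
fs, n = 8000, 2000
source = [math.sin(2 * math.pi * 200 * i / fs) for i in range(n)]
delays = [0, 3, 6, 9]  # assumed known arrival delays, in samples

# Each microphone sees the source delayed by d samples plus its own white noise.
channels = []
for d in delays:
    channels.append([(source[i - d] if i >= d else 0.0) + random.gauss(0.0, 1.0)
                     for i in range(n + d)])

output = delay_and_sum(channels, delays)
snr_single = snr_db(source, channels[0])
snr_beamformed = snr_db(source, output)
print(snr_single, snr_beamformed)
```

With four channels and uncorrelated noise, averaging cuts the noise power to roughly a quarter, so the output SNR should be about 6 dB above that of a single channel.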

Multi Mode Harmonic Transform Coding for Speech and Music

  • Kim, Jonghark;Shin, Jae-Hyun;Lee, Insung
    • The Journal of the Acoustical Society of Korea / Vol. 22, No. 3E / pp.101-109 / 2003
  • A multi-mode harmonic transform coding (MMHTC) scheme for speech and music signals is proposed. Its structure is organized as a linear prediction model with an input of harmonic and transform-based excitation. The proposed coder also utilizes harmonic prediction and an improved quantizer of the excitation signal. To efficiently quantize the excitation of music signals, the modulated lapped transform (MLT) is introduced. In other words, the coder combines time-domain (linear prediction) and frequency-domain techniques to achieve the best perceptual quality. At a bit rate of 4 kbps, the proposed coder showed better speech quality than the 8 kbps QCELP coder.