• Title/Summary/Keyword: speech signals

Search Result 499, Processing Time 0.022 seconds

Implementation of Quad Variable Rates ADPCM Speech CODEC on C6000 DSP considering the Environmental Noise (배경잡음을 고려한 4배 가변 압축률을 갖는 ADPCM의 C6000 DSP 실시간 구현)

  • Kim Dae-Sung;Han Kyong-ho
    • Proceedings of the KIPE Conference
    • /
    • 2002.07a
    • /
    • pp.727-729
    • /
    • 2002
  • In this paper, we proposed quad variable rates ADPCM coding method and its implementation on C6000 DSP, which is modified from the standard ADPCM of ITU G.726 for speech quality improvement considering the environmental noise Four coding rates, 16Kbps, 24Kbps, 32Kbps and 40Kbps are used for speech window samples and the rate decision threshold is decided by the environmental noise level. The object of the proposed method is to reduce the coding rate while retaining the speech quality and the speech quality is considerably close to 40Kbps single rate coder with the coding rate close to 16Kbps single rate coder under the environmental noise. The environmental noise level affects the coding rate and the noise level is calculated per every speech window samples. At high noise level, more samples are coded at higher rates to enhance the quality, but at low noise level, only the big speech signals are coded at higher rates and more speech samples are coded at lower coding rates to reduce the coding rates. The influence of the noise on tile speech signal is considerably high for small signals and the small signal has the higher ZCR (zero crossing rate). The method is simulated in PC and to be implemented on C6000 floating point DSP board in real time operations.

  • PDF

A Study on Objective Speech Quality Measure under CDMA Telephone Networks Environment (CDMA 통신망에서의 객관적 음질 평가 척도에 관한 연구)

  • 김광수;김민정;석수영;정호열;정현열
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.2 no.4
    • /
    • pp.53-58
    • /
    • 2001
  • In this paper to develop objective speech quality measure for CDMA telephone network environments, recent developed measures are investigated first. But those measures show low performances in CDMA telephone networks. To solve this problem, new objective speech quality measure adopting noise masking threshold is proposed and studied. To acquire better performance, scaled noise masking threshold calculation for speech signals is employed instead of conventional tone signals. To verify effectiveness of proposed method performance comparison experiments are carried out for CDMA telephone network speech databases, for the results proposed methods show improved performances compared to existing meaures.

  • PDF

An Emotion Recognition Technique using Speech Signals (음성신호를 이용한 감정인식)

  • Jung, Byung-Wook;Cheun, Seung-Pyo;Kim, Youn-Tae;Kim, Sung-Shin
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.18 no.4
    • /
    • pp.494-500
    • /
    • 2008
  • In the field of development of human interface technology, the interactions between human and machine are important. The research on emotion recognition helps these interactions. This paper presents an algorithm for emotion recognition based on personalized speech signals. The proposed approach is trying to extract the characteristic of speech signal for emotion recognition using PLP (perceptual linear prediction) analysis. The PLP analysis technique was originally designed to suppress speaker dependent components in features used for automatic speech recognition, but later experiments demonstrated the efficiency of their use for speaker recognition tasks. So this paper proposed an algorithm that can easily evaluate the personal emotion from speech signals in real time using personalized emotion patterns that are made by PLP analysis. The experimental results show that the maximum recognition rate for the speaker dependant system is above 90%, whereas the average recognition rate is 75%. The proposed system has a simple structure and but efficient to be used in real time.

Speech Recognition through Speech Enhancement (음질 개선을 통한 음성의 인식)

  • Cho, Jun-Hee;Lee, Kee-Seong
    • Proceedings of the KIEE Conference
    • /
    • 2003.11c
    • /
    • pp.511-514
    • /
    • 2003
  • The human being uses speech signals to exchange information. When background noise is present, speech recognizers experience performance degradations. Speech recognition through speech enhancement in the noisy environment was studied. Histogram method as a reliable noise estimation approach for spectral subtraction was introduced using MFCC method. The experiment results show the effectiveness of the proposed algorithm.

  • PDF

A Fixed Rate Speech Coder Based on the Filter Bank Method and the Inflection Point Detection

  • Iem, Byeong-Gwan
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.16 no.4
    • /
    • pp.276-280
    • /
    • 2016
  • A fixed rate speech coder based on the filter bank and the non-uniform sampling technique is proposed. The non-uniform sampling is achieved by the detection of inflection points (IPs). A speech block is band passed by the filter bank, and the subband signals are processed by the IP detector, and the detected IP patterns are compared with entries of the IP database. For each subband signal, the address of the closest member of the database and the energy of the IP pattern are transmitted through channel. In the receiver, the decoder recovers the subband signals using the received addresses and the energy information, and reconstructs the speech via the filter bank summation. As results, the coder shows fixed data rate contrary to the existing speech coders based on the non-uniform sampling. Through computer simulation, the usefulness of the proposed technique is confirmed. The signal-to-noise ratio (SNR) performance of the proposed method is comparable to that of the uniform sampled pulse code modulation (PCM) below 20 kbps data rate.

Fundamental Frequency Estimation of Voiced Speech Signals Based on the Inflection Point Detection (변곡점 검출에 기반한 음성의 기본 주파수 추정)

  • Byeonggwan Iem
    • Journal of IKEEE
    • /
    • v.27 no.4
    • /
    • pp.472-476
    • /
    • 2023
  • Fundamental frequency/pitch period are major characteristics of speech signals. They are used in many speech applications like speech coding, speech recognition, speaker identification, and so on. In this paper, some of inflection points are used to estimate the pitch which is the inverse of the fundamental frequency. The inflection points are defined as points where local maxima, local minima or the slope changes occur. The speech signal is preprocessed to remove unnecessary inflection points due to the high frequency components using a low pass filter. Only the inflection points from local maxima are used to get the pitch period. While the existing pitch estimation methods process speech signals in blockwise, the proposed method detects the inflection points in sample and produces the pitch period/fundamental frequency estimates along the time. Computer simulation shows the usefulness of the proposed method as a fundamental frequency estimator.

The Korean Text-to-speech Using Syllable Units (음절 단위를 이용한 한국어 음성 합성)

  • 김병수;윤기선;박성한
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.27 no.1
    • /
    • pp.143-150
    • /
    • 1990
  • In this paper, a rule-based method for improving the intelligibility of synthetic speech is proposed. A 12-pole linear prediction coding method is used to model syllable speech signals. A syllable concatenation rule for pause and frame rejection between syllables is developed to improve the naturalness of the synthetic speech. In addition, phonoligical structure transform rule and prosody rule are applied to the synthetic speech by LPC. The illustrative results demonstrate that the synthetic speech obtained by applying these rules has better naturalness than the synthetic speech by LPC.

  • PDF

The High Speed Pitch Extraction of Speech Signals Using the Area Comparison Method (면적 비교법에 의한 고속 PITCH 추출)

  • 배명진;안수결
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.22 no.2
    • /
    • pp.13-17
    • /
    • 1985
  • In this paper, a new pitch extraction method, the area comparison method, is proposed. By the speech production model, the area of the first peak on a pitch interval of speech signals is emphasized. By using the above characteristics, this method have more advantages than the others for pitch extraction. The defective decision caused by an impulsive noise is minimized and the pre-filtering is not necessary for this method, because the intergration of signals takes place in the process.

  • PDF

A Study on the Relation Between the LSF's and Spectral Distribution of Speech Signals (Line Spectral Frequency와 음성신호의 주파수 분포에 관한 연구)

  • 이동수;김영화
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.25 no.4
    • /
    • pp.430-436
    • /
    • 1988
  • LSF(Line Spectral Frequency) derived from LPC has known as a very useful transmission parameter of speech signals, for it has a good linear interpolation characteristics and a low spectrum distortion at low bit rates coding. This paper presents that it is possible to extract directly the formant frequencies of speech signals from LSF parameter without application of FFT algorithm by comparing the distribution of LSF parameter with the frequency distribution of analysis filter. This paper suggests the advanced algorithm that results in improving the speed of convergence at analytic solution method. Also, for the flexibility of parameters, the process that transforms from LSF to LPC is presented.

  • PDF

Speech Enhancement Using Blind Signal Separation Combined With Null Beamforming

  • Nam Seung-Hyon;Jr. Rodrigo C. Munoz
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.4E
    • /
    • pp.142-147
    • /
    • 2006
  • Blind signal separation is known as a powerful tool for enhancing noisy speech in many real world environments. In this paper, it is demonstrated that the performance of blind signal separation can be further improved by combining with a null beamformer (NBF). Cascading the blind source separation with null beamforming is equivalent to the decomposition of the received signals into the direct parts and reverberant parts. Investigation of beam patterns of the null beamformer and blind signal separation reveals that directional null of NBF reduces mainly direct parts of the unwanted signals whereas blind signal separation reduces reverberant parts. Further, it is shown that the decomposition of received signals can be exploited to solve the local stability problem. Therefore, faster and improved separation can be obtained by removing the direct parts first by null beamforming. Simulation results using real office recordings confirm the expectation.