• Title/Summary/Keyword: Speech signal processing

Phonetic Acoustic Knowledge and Divide And Conquer Based Segmentation Algorithm (음성학적 지식과 DAC 기반 분할 알고리즘)

  • Koo, Chan-Mo;Wang, Gi-Nam
    • The KIPS Transactions: Part B / v.9B no.2 / pp.215-222 / 2002
  • This paper presents a reliable, fully automatic labeling system suited to languages with well-developed syllable structure, such as Korean. The ASL system uses a segmentation algorithm based on DAC (Divide and Conquer), a control mechanism, to exploit phonetic and acoustic information more efficiently. The algorithm divides the speech signal into speechlets, which are localized pieces of the speech signal, and then segments each speechlet to find speech boundaries. While the HMM method has uniform and definite efficiency, the suggested method provides a framework in which specific acoustic knowledge can be steadily developed and improved as a component. The new method uses only phonetic-acoustic information, without statistical methods such as HMM. It is therefore fast, remains consistent as the specific acoustic knowledge component is extended, and can be applied efficiently. Experimental results verifying the suggested method are presented at the end.

Improvement of the Linear Predictive Coding with Windowed Autocorrelation (윈도우가 적용된 자기상관에 의한 선형예측부호의 개선)

  • Lee, Chang-Young;Lee, Chai-Bong
    • The Journal of the Korea institute of electronic communication sciences / v.6 no.2 / pp.186-192 / 2011
  • In this paper, we propose a new procedure for improving linear predictive coding. To reduce the error power incurred by the coding, we interchange the order of the two steps of windowing the signal and linear prediction. This scheme corresponds to LPC extraction with a windowed autocorrelation. The proposed method requires more computation time because it requires matrix inversion over more parameters than the conventional technique, where the efficient Levinson-Durbin recursion applies with fewer parameters. Experimental tests over various speech phonemes showed, however, that our procedure yields about 5% less power distortion than the conventional technique. The proposed method is therefore considered preferable to the conventional technique as far as fidelity is concerned. In a separate speaker-dependent speech recognition test on 50 isolated words pronounced by 40 people, our approach also yielded better performance.
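
The conventional pipeline that this abstract compares against (window the signal, take the autocorrelation, then run the Levinson-Durbin recursion) can be sketched as follows. This is a minimal illustration only: the Hamming window, the frame length, the prediction order of 4, and the synthetic two-sinusoid frame are assumptions, and the sketch does not implement the authors' proposed variant with the two steps interchanged.

```python
import math

def hamming(n):
    """Hamming window of length n (an assumed window choice)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def autocorrelation(x, order):
    """Autocorrelation of x for lags 0..order."""
    return [sum(x[i] * x[i + k] for i in range(len(x) - k))
            for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations recursively.

    Returns (a, err) where a[0] == 1 and the prediction-error filter is
    A(z) = sum_k a[k] z^-k, and err is the final prediction error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Synthetic frame (hypothetical test signal, not speech data from the paper).
frame = [math.sin(0.3 * i) + 0.5 * math.sin(0.7 * i) for i in range(240)]
windowed = [w * s for w, s in zip(hamming(len(frame)), frame)]
r = autocorrelation(windowed, 4)
coeffs, err = levinson_durbin(r, 4)
```

The recursion needs only the order + 1 autocorrelation values, which is the efficiency the abstract attributes to the conventional technique.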

A Study on Pseudo N-gram Language Models for Speech Recognition (음성인식을 위한 의사(疑似) N-gram 언어모델에 관한 연구)

  • 오세진;황철준;김범국;정호열;정현열
    • Journal of the Institute of Convergence Signal Processing / v.2 no.3 / pp.16-23 / 2001
  • In this paper, we propose pseudo n-gram language models for medium-vocabulary speech recognition, in contrast to large-vocabulary speech recognition using statistical n-gram language models. The proposed method is very simple: it keeps the standard ARPA structure and sets the word probabilities arbitrarily. First, the 1-gram sets the word occurrence probability to 1 (a log likelihood of 0.0). Second, the 2-gram also sets the word occurrence probability to 1 and allows only the connections between the word start symbol and WORD, and between WORD and the word end symbol. Finally, the 3-gram also sets the word occurrence probability to 1 and allows only the connection of the word start symbol, WORD, and the word end symbol. To verify the effectiveness of the proposed method, word recognition experiments were carried out. Preliminary off-line experiments show an average word accuracy of 97.7% for 452 words uttered by 3 male speakers. On-line word recognition experiments show an average word accuracy of 92.5% for 20 words uttered by 20 male speakers on a task of 1,500 stock names. These experiments verify the effectiveness of the pseudo n-gram language models for speech recognition.
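
The structure described in this abstract can be sketched as a small generator for an ARPA-style file in which every listed n-gram has log probability 0.0. This is an illustrative sketch, not the authors' tool: the function name `pseudo_arpa`, the start and end symbols `<s>` and `</s>` (the common ARPA convention, not stated in the abstract), and the example words are assumptions, and back-off weights are left out for brevity even though some toolkits may expect them.

```python
def pseudo_arpa(words, bos="<s>", eos="</s>"):
    """Build ARPA-style text where every n-gram has log10 probability 0.0.

    bos/eos default to the usual ARPA sentence markers (an assumption).
    """
    unigrams = [(bos,), (eos,)] + [(w,) for w in words]
    # 2-grams: only start->WORD and WORD->end are allowed.
    bigrams = [(bos, w) for w in words] + [(w, eos) for w in words]
    # 3-grams: only start, WORD, end.
    trigrams = [(bos, w, eos) for w in words]
    sections = [unigrams, bigrams, trigrams]

    lines = ["\\data\\"]
    for n, grams in enumerate(sections, start=1):
        lines.append(f"ngram {n}={len(grams)}")
    for n, grams in enumerate(sections, start=1):
        lines.append("")
        lines.append(f"\\{n}-grams:")
        for gram in grams:
            lines.append("0.0\t" + " ".join(gram))
    lines.append("")
    lines.append("\\end\\")
    return "\n".join(lines)

# Hypothetical two-word vocabulary.
arpa_text = pseudo_arpa(["ALPHA", "BETA"])
```

Because every probability is 1, such a model acts as a flat isolated-word grammar rather than a statistical language model, which matches the medium-vocabulary word recognition setting described above.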

Performance Improvement of Stereo Acoustic Echo Canceller Using MINT Filtering (MINT 필터링에 의한 스테레오 음향 반향 제거기의 성능 향상)

  • 차경환
    • The Journal of the Acoustical Society of Korea / v.21 no.1 / pp.42-46 / 2002
  • In this paper, a new pre-processing algorithm is proposed to improve the performance of a stereo acoustic echo canceller. The proposed algorithm improves performance by reducing the estimation error of the filter coefficients, using an input signal whose room reverberation has been reduced by MINT (Multiple-input/output Inverse Theorem) filtering. Simulations with real stereo speech signals and real room impulse responses show that the proposed method improves ERLE (Echo Return Loss Enhancement) by 3 to 5 dB with both the NLMS (Normalized Least Mean Square) and projection adaptive algorithms.
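
To make the two quantities named in this abstract concrete, the sketch below runs a single-channel NLMS adaptive filter against a known echo path and reports ERLE as the ratio of echo power to residual power in dB. It is a simplified illustration under stated assumptions: it is mono rather than stereo, it contains no MINT pre-processing, and the three-tap echo path, step size, and white-noise far-end signal are hypothetical.

```python
import math
import random

def nlms(x, d, taps, mu=0.5, eps=1e-8):
    """Single-channel NLMS: adapt w so that w * x tracks d; return (w, errors)."""
    w = [0.0] * taps
    buf = [0.0] * taps                      # most recent input sample first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y                          # residual echo
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors

def erle_db(d, e, eps=1e-12):
    """ERLE in dB: echo power over residual power (eps avoids division by zero)."""
    num = sum(v * v for v in d) + eps
    den = sum(v * v for v in e) + eps
    return 10.0 * math.log10(num / den)

rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3, 0.1]                # hypothetical room response
echo = [sum(h * far_end[n - k] for k, h in enumerate(echo_path) if n - k >= 0)
        for n in range(len(far_end))]

weights, residual = nlms(far_end, echo, taps=len(echo_path))
gain_db = erle_db(echo[-500:], residual[-500:])
```

In this noise-free toy case the filter converges almost exactly, so the ERLE is far higher than anything achievable in a real room; the point is only to show how the metric is computed.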

A study on the design of new floating resistor and it′s application (새로운 CMOS Floating저항의 설계와 그 응용에 대한연구)

  • 이영훈
    • Journal of the Korea Society of Computer and Information / v.5 no.3 / pp.76-83 / 2000
  • Continuous-time signal systems have been receiving considerable attention with the development of CMOS technology. In this paper, a low-pass filter using a new CMOS floating resistor is designed with a cutoff frequency suited to speech signal processing. In particular, a new floating resistor consisting entirely of CMOS devices in saturation has been developed. Linearity within $\pm$0.04% is achieved through nonlinearity cancellation via current mirrors over an applied range of $\pm$1 V. The frequency response exceeds 10 MHz, and the resistors are expected to be useful in implementing integrated-circuit active RC filters. The low-pass filter designed with this method has a simpler structure than a switched-capacitor filter and therefore reduces chip area. The characteristics of the designed low-pass filter are simulated with the PSpice program.

Variational autoencoder for prosody-based speaker recognition

  • Starlet Ben Alex;Leena Mary
    • ETRI Journal / v.45 no.4 / pp.678-689 / 2023
  • This paper describes a novel end-to-end deep generative model-based speaker recognition system using prosodic features. The usefulness of variational autoencoders (VAE) in learning the speaker-specific prosody representations for the speaker recognition task is examined herein for the first time. The speech signal is first automatically segmented into syllable-like units using vowel onset points (VOP) and energy valleys. Prosodic features, such as the dynamics of duration, energy, and fundamental frequency (F0), are then extracted at the syllable level and used to train/adapt a speaker-dependent VAE from a universal VAE. The initial comparative studies on VAEs and traditional autoencoders (AE) suggest that the former can efficiently learn speaker representations. Investigations on the impact of gender information in speaker recognition also point out that gender-dependent impostor banks lead to higher accuracies. Finally, the evaluation on the NIST SRE 2010 dataset demonstrates the usefulness of the proposed approach for speaker recognition.

Detection of Glottal Closure Instant using the property of G-peak (G-peak의 특성을 이용한 성문폐쇄시점 검출)

  • Keum, Hong;Kim, Dae-Sik;Bae, Myung-Jin;Kim, Young-Il
    • The Journal of the Acoustical Society of Korea / v.13 no.1E / pp.82-88 / 1994
  • Exact detection of the GCI (Glottal Closure Instant) is important in speech signal processing. A few methods for detecting the GCI of voiced speech have been proposed so far, but they have difficulty detecting the GCI across a wide range of speakers and/or various vowel signals. In this paper, we propose a new method for GCI detection using the G-peak. The speech waveforms are passed through an LPF of variable bandwidth, and the GCIs of voiced speech are then detected by the G-peak in the filtered signals. We compared the detected GCIs with eye-checked GCIs for clean speech and at SNRs of 20 dB and 0 dB, counting a detection as correct when it fell within 1 ms of the eye-checked GCI. The detection rates were 97.9% for clean speech, 96.5% at 20 dB SNR, and 94.8% at 0 dB SNR.

A Study on the Robust Pitch Period Detection Algorithm in Noisy Environments (소음환경에 강인한 피치주기 검출 알고리즘에 관한 연구)

  • Seo Hyun-Soo;Bae Sang-Bum;Kim Nam-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2006.05a / pp.481-484 / 2006
  • Pitch period detection algorithms are applied in various speech signal processing fields such as speech recognition, speaker identification, and speech analysis and synthesis, and many time-domain and frequency-domain pitch detection algorithms have been studied. The AMDF (average magnitude difference function), one such algorithm, takes the time interval from one valley point to the next as the pitch period. The AMDF is fast to compute, but selecting the valley points used to detect the pitch period increases the complexity of the algorithm. To be applied in the real world, pitch period detection algorithms must be robust against noise such as that generated in subway environments. In this paper, we propose a modified AMDF algorithm that detects the global minimum valley point as the pitch period of the speech signal, and we use speech signals from noisy environments as test signals.
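
The basic AMDF idea in this abstract can be sketched as follows: compute the average magnitude difference for each candidate lag and take the lag at the global minimum as the pitch period. This is a minimal sketch rather than the authors' modified algorithm; the lag search range, the 8 kHz sampling rate, and the synthetic 100 Hz test tone are assumptions chosen for illustration.

```python
import math

def amdf(signal, max_lag):
    """Average magnitude difference function for lags 0..max_lag."""
    n = len(signal)
    values = []
    for lag in range(max_lag + 1):
        count = n - lag
        total = sum(abs(signal[i] - signal[i + lag]) for i in range(count))
        values.append(total / count)
    return values

def pitch_period(signal, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] at the global AMDF minimum."""
    values = amdf(signal, max_lag)
    return min(range(min_lag, max_lag + 1), key=lambda lag: values[lag])

# Synthetic tone: 100 Hz sine sampled at 8 kHz, so one period is 80 samples.
fs = 8000
tone = [math.sin(2 * math.pi * 100 * t / fs) for t in range(800)]
period = pitch_period(tone, min_lag=20, max_lag=120)
pitch_hz = fs / period
```

The lower bound on the lag excludes the trivial minimum at lag 0, and the upper bound is kept below two periods so that the search does not lock onto a multiple of the true period.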

A Study on Frequency-Time Plane Analysis of Wavelet (웨이브렛의 주파수-시간 평면 해석에 관한 연구)

  • Bae, Sang-Bum;Ryu, Ji-Goo;Kim, Nam-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / v.9 no.2 / pp.451-454 / 2005
  • Many methods for analyzing signals have been proposed recently, and the Fourier transform and the wavelet transform are representative. The Fourier transform represents a signal as a combination of cosines and sines at all locations in the frequency domain. However, it does not provide information on when a particular frequency occurs in the signal, and it depends only on the global features of the signal. To address these limitations, the wavelet transform, which is capable of multiresolution analysis, has been applied in many fields such as speech processing, image processing, and computer vision. The wavelet transform uses a window that changes with the scale parameter and thus provides time-frequency localization. In this paper, we propose a new approach using a wavelet of cosine and sine type and analyze the features of a signal at a limited point of the frequency-time plane.

Artifact Cancellation due to Rotational Motion in MRI (MRI내 회전운동에 기인한 아티팩트 제거)

  • Kim, Eung-Kyeu;Lee, Soo-Jong
    • Proceedings of the Korea Institute of Convergence Signal Processing / 2005.11a / pp.155-158 / 2005
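  • Rotation of the imaged object within the image plane during an MRI scan causes phase errors and non-uniform sampling in the MRI signal. A model of the phase error and non-uniform sampling problem shows that a phase difference exists between MRI signals degraded by rotation about an arbitrary center in the image plane and those degraded by rotation about the origin. To improve the quality of MR images containing such artifacts, we propose the following methods: first, an algorithm that corrects artifacts based on phase correction for the case where the rotation angle of the 2-D rotational motion is known but the position of the rotation center is unknown; second, an algorithm that corrects artifacts for 2-D rotational motion in which both the rotation center and the angle are unknown. To estimate the unknown motion parameters, we use the property that the energy of an ideal MR image is minimal outside the boundary of the imaged object and that the measured energy increases when the object rotates. Using this property, an evaluation function is introduced to estimate the magnitude of the unknown rotation angle at each phase-encoding step. Finally, the effectiveness of the proposed method is confirmed by applying it to simulated and real images.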
