• Title/Abstract/Keywords: cepstral distance

Search Result 41, Processing Time 0.028 seconds

Speech Enhancement Using the Adaptive Noise Canceling Technique with a Recursive Time Delay Estimator (재귀적 지연추정기를 갖는 적응잡음제거 기법을 이용한 음성개선)

  • 강해동;배근성
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.31B no.7
    • /
    • pp.33-41
    • /
    • 1994
  • A single channel adaptive noise canceling (ANC) technique with a recursive time delay estimator (RTDE) is presented for removing effects of additive noise on the speech signal. While the conventional method makes a reference signal for the adaptive filter using the pitch estimated on a frame basis from the input speech, the proposed method makes the reference signal using the delay estimated recursively on a sample-by-sample basis. As the RTDEs, the recursion formulae of autocorrelation function (ACF) and average magnitude difference function (AMDF) are derived. The normalized least mean square (NLMS) and recursive least square (RLS) algorithms are applied for adaptation of filter coefficients. Experimental results with noisy speech demonstrate that the proposed method improves the perceived speech quality as well as the signal-to-noise ratio and cepstral distance when compared with the conventional method.

  • PDF
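
The abstract above names the NLMS algorithm as one of the two coefficient update rules. A minimal sketch of a generic NLMS adaptive filter follows. It is not the authors' implementation: the function name `nlms_filter`, the tap count, the step size, and the toy system-identification signal are all assumptions made for illustration, and the recursive delay estimator that would supply the reference signal is not reproduced here.

```python
import random


def nlms_filter(reference, desired, num_taps=4, step_size=0.5, eps=1e-8):
    """Adapt an FIR filter so that its output tracks `desired`.

    Returns (outputs, errors, weights). All parameter names and default
    values are illustrative assumptions, not taken from the paper.
    """
    weights = [0.0] * num_taps
    outputs, errors = [], []
    for n in range(len(desired)):
        # Tap-delay line: newest reference sample first, zero-padded at the start.
        x = [reference[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(w * xi for w, xi in zip(weights, x))
        e = desired[n] - y
        # Dividing the update by the input energy is what makes this NLMS
        # rather than plain LMS; eps guards against division by zero.
        energy = sum(xi * xi for xi in x) + eps
        weights = [w + step_size * e * xi / energy for w, xi in zip(weights, x)]
        outputs.append(y)
        errors.append(e)
    return outputs, errors, weights


# Toy check (hypothetical data): desired[n] = 0.5*ref[n] - 0.3*ref[n-1].
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
des = [0.5 * ref[n] - 0.3 * (ref[n - 1] if n > 0 else 0.0) for n in range(len(ref))]
_, _, w = nlms_filter(ref, des)
```

In this noiseless toy case the learned weights should approach 0.5 and -0.3 for the first two taps and zero for the rest. In a noise-canceling setup the reference would instead be derived from the noisy speech itself, for example a delayed copy of the input.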

A Study on the Consonant Classification Using Fuzzy Inference (퍼지추론을 이용한 한국어 자음분류에 관한 연구)

  • 박경식
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1992.06a
    • /
    • pp.71-75
    • /
    • 1992
  • This paper proposes an algorithm for classifying Korean consonant phonemes such as plosives, fricatives, and affricates into lax sounds, glottalized sounds, and aspirated sounds. These three kinds of sounds are a distinctive characteristic of the Korean language that does not exist in languages such as English. This is a study on the classification of 14 Korean consonants (k, t, p, s, c, k', t', p', s', c', kh, ph, ch) as a preliminary stage for Korean phone recognition. LPC cepstral analysis is used to obtain the feature sets for classification. The experiments consist of two stages. First, using short-time speech signal analysis and the Mahalanobis distance, consonant segments are detected from the original speech signal; then the consonants are classified by fuzzy inference. In computer simulations, the classification rate on the speech data reached 93.75%.

  • PDF

Denoising of Speech Signal Using Wavelet Transform (웨이브렛 변환을 이용한 음성신호의 잡음제거)

  • 한미경;배건성
    • The Journal of the Acoustical Society of Korea
    • /
    • v.19 no.5
    • /
    • pp.27-34
    • /
    • 2000
  • This paper deals with speech enhancement methods using the wavelet transform. A cycle-spinning scheme and the undecimated wavelet transform are used for denoising speech signals, and their results are compared with those of the conventional wavelet transform. We apply a soft-thresholding technique to remove additive background noise from noisy speech. The symlets 8-tap wavelet and the pyramid algorithm are used for the wavelet transform. Performance assessments based on average SNR, cepstral distance, and an informal subjective listening test are carried out. Experimental results demonstrate that both the cycle-spinning denoising (CSD) method and the undecimated wavelet denoising (UWD) method outperform the conventional wavelet denoising (CWD) method in the objective performance measures as well as in the subjective listening test. The two methods also produce fewer "clicks", which usually appear in the neighborhood of signal discontinuities.

  • PDF
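
The soft-thresholding step mentioned in the abstract above can be sketched as follows. This shows only the standard soft-threshold rule applied to a list of wavelet coefficients; the function name `soft_threshold` and the example values are assumptions, and the wavelet transform itself and the paper's threshold selection are not reproduced.

```python
def soft_threshold(coeffs, threshold):
    """Shrink each coefficient toward zero by `threshold`.

    Coefficients whose magnitude is at most `threshold` become 0.0; the
    rest keep their sign and lose `threshold` in magnitude.
    """
    shrunk = []
    for c in coeffs:
        magnitude = abs(c) - threshold
        if magnitude <= 0:
            shrunk.append(0.0)
        else:
            shrunk.append(magnitude if c > 0 else -magnitude)
    return shrunk


# Hypothetical detail coefficients and threshold.
example = soft_threshold([3.0, -2.0, 0.5, -0.5], 1.0)
```

Unlike hard thresholding, which keeps surviving coefficients unchanged, this rule is continuous at the threshold, which is one common reason for preferring it in denoising.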

A STUDY ON THE SPEECH SYNTHESIS-BY-RULE SYSTEM APPLIED MULTIBAND EXCITATION SIGNAL

  • Kyung, Younjeong;Kim, Geesoon;Lee, Hwangsoo;Lee, Yanghee
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1994.06a
    • /
    • pp.1098-1103
    • /
    • 1994
  • In this paper, we design and implement a Korean speech synthesis-by-rule system. This system applies the multiband excitation signal to voiced sounds. The multiband excitation signal is obtained by mixing an impulse spectrum and a white noise spectrum. We find that the quality of synthesized speech is improved by this approach. We also classify the voiced sounds using a cepstral Euclidean distance measure to reduce memory overhead. The representative excitation signal of each group of voiced sounds is used as the excitation signal in synthesis. This method does not affect the quality of synthesized speech. Experimental results show that this method eliminates the "buzziness" of synthesized speech and reduces its spectral distortion.

  • PDF

A hybrid structural health monitoring technique for detection of subtle structural damage

  • Krishansamy, Lakshmi;Arumulla, Rama Mohan Rao
    • Smart Structures and Systems
    • /
    • v.22 no.5
    • /
    • pp.587-609
    • /
    • 2018
  • Identifying incipient damage in structures at the time of its initiation is of great significance, as timely rectification of these minor incipient cracks can save huge maintenance costs. However, the change in the global dynamic characteristics of a structure due to such subtle damage is too insignificant to detect using the majority of current damage diagnostic techniques. Keeping this in view, we propose a hybrid damage diagnostic technique for detecting minor incipient damage in structures. In the proposed automated hybrid algorithm, the raw dynamic signatures obtained from the structure are decomposed into uni-modal signals, and the dynamic signature is reconstructed by identifying and combining only the uni-modal signals altered by the minor incipient damage. We use these reconstructed signals for damage diagnostics with an ARMAX model. Numerical simulation studies are carried out to investigate and evaluate the proposed hybrid damage diagnostic algorithm and its capability to identify minor/incipient damage from noisy measurements. Finally, experimental studies on a beam are presented to complement the numerical simulations and to demonstrate the practical application of the proposed algorithm.

Multimodal Emotion Recognition using Face Image and Speech (얼굴영상과 음성을 이용한 멀티모달 감정인식)

  • Lee, Hyeon Gu;Kim, Dong Ju
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.8 no.1
    • /
    • pp.29-40
    • /
    • 2012
  • A challenging research issue of growing importance to those working in human-computer interaction is to endow a machine with emotional intelligence. Emotion recognition technology therefore plays an important role in human-computer interaction research, and it allows more natural and more human-like communication between human and computer. In this paper, we propose a multimodal emotion recognition system using face and speech to improve recognition performance. In face-based emotion recognition, the distance measurement is calculated by 2D-PCA of the MCS-LBP image and a nearest neighbor classifier; in speech-based emotion recognition, the likelihood measurement is obtained by a Gaussian mixture model based on pitch and mel-frequency cepstral coefficient features. The individual matching scores obtained from face and speech are combined using a weighted-summation operation, and the fused score is used to classify the human emotion. In the experimental results, the proposed method exhibits an improvement in recognition accuracy of about 11.25% to 19.75% compared to most uni-modal approaches. These results confirm that the proposed approach achieves a significant performance improvement.
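
The weighted-summation fusion described in the abstract above can be sketched as follows. The abstract does not say how the face distances and speech likelihoods are brought to a common scale, so the min-max normalization, the flip from distance to similarity, the weight value, and the names `min_max` and `fuse_scores` are all assumptions made for illustration.

```python
def min_max(scores):
    """Rescale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def fuse_scores(face_distances, speech_likelihoods, face_weight=0.5):
    """Weighted sum of per-class scores from the two modalities.

    Distances are flipped so that, for both modalities, larger is better.
    """
    face_similarity = [1.0 - d for d in min_max(face_distances)]
    speech_similarity = min_max(speech_likelihoods)
    return [
        face_weight * f + (1.0 - face_weight) * s
        for f, s in zip(face_similarity, speech_similarity)
    ]


# Hypothetical per-class scores for three emotion classes.
fused = fuse_scores([0.2, 0.8, 0.5], [-10.0, -4.0, -7.0], face_weight=0.7)
predicted_class = max(range(len(fused)), key=lambda i: fused[i])
```

Here the face modality favors class 0 and the speech modality favors class 1; with a face weight of 0.7 the fused decision follows the face modality.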

Speech Enhancement Using Multiresolutional Signal Analysis Methods (다해상도 신호해석 방법을 이용한 음성개선)

  • Seok, Jong-Won;Han, Mi-Kyung;Bae, Keun-Sung
    • Journal of the Korean Institute of Telematics and Electronics S
    • /
    • v.36S no.7
    • /
    • pp.134-135
    • /
    • 1999
  • This paper presents a speech enhancement method based on spectral subtraction using the wavelet, wavelet packet, and cosine packet transforms, which are known as multiresolutional signal analysis methods. The performance of each method is compared with that of the conventional spectral subtraction method. Performance assessments based on average SNR, cepstral distance, and an informal subjective listening test are carried out. Experimental results demonstrate that the cosine packet transform gives the best result in the objective performance measures as well as in the subjective listening test, and shows less musical noise than the conventional spectral subtraction method after removing the noise components.

  • PDF
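
Several abstracts in this listing report cepstral distance as an objective measure. A minimal sketch of one commonly used definition, expressed in dB, is given below. The abstracts do not state which variant the authors used, so the formula, the exclusion of the zeroth coefficient, and the name `cepstral_distance_db` are assumptions.

```python
import math


def cepstral_distance_db(c_ref, c_test):
    """Cepstral distance in dB between two cepstral coefficient vectors.

    Index 0 is assumed to hold c0 (overall gain) and is skipped, so the
    measure compares spectral shape only.
    """
    if len(c_ref) != len(c_test):
        raise ValueError("coefficient vectors must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(c_ref[1:], c_test[1:]))
    return (10.0 / math.log(10.0)) * math.sqrt(2.0 * squared)


# Hypothetical coefficient vectors for one frame of clean and enhanced speech.
frame_distance = cepstral_distance_db([0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
```

In an evaluation, this per-frame value would typically be averaged over all frames of the clean and processed signals; a smaller average indicates less spectral distortion.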

Audio Fingerprint Retrieval Method Based on Feature Dimension Reduction and Feature Combination

  • Zhang, Qiu-yu;Xu, Fu-jiu;Bai, Jian
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.2
    • /
    • pp.522-539
    • /
    • 2021
  • In order to solve the problems of existing audio fingerprint methods when extracting audio fingerprints from long speech segments, such as excessively large fingerprint dimension, poor robustness, and low retrieval accuracy and efficiency, a robust audio fingerprint retrieval method based on feature dimension reduction and feature combination is proposed. First, the Mel-frequency cepstral coefficients (MFCC) and linear prediction cepstrum coefficients (LPCC) of the original speech are extracted, and the MFCC feature matrix and LPCC feature matrix are combined. Second, a feature dimension reduction method based on information entropy is used for column dimension reduction, and the resulting feature matrix then undergoes row dimension reduction using an energy-based feature dimension reduction method. Finally, the audio fingerprint is constructed from the feature combination matrix after dimension reduction. When a user retrieves speech, the normalized Hamming distance algorithm is used for matching. Experimental results show that the proposed method has a smaller audio fingerprint dimension and better robustness for long speech segments, and has higher retrieval efficiency while maintaining a higher recall rate and precision rate.
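
The matching step named in the abstract above, the normalized Hamming distance, can be sketched as follows. The fingerprint construction is not reproduced; the function name `normalized_hamming` and the example bit sequences are assumptions.

```python
def normalized_hamming(fingerprint_a, fingerprint_b):
    """Fraction of positions at which two equal-length fingerprints differ."""
    if len(fingerprint_a) != len(fingerprint_b):
        raise ValueError("fingerprints must have the same length")
    if not fingerprint_a:
        raise ValueError("fingerprints must not be empty")
    mismatches = sum(a != b for a, b in zip(fingerprint_a, fingerprint_b))
    return mismatches / len(fingerprint_a)


# Hypothetical binary fingerprints of a query and a stored item.
query = [1, 0, 1, 1]
stored = [1, 1, 1, 0]
distance = normalized_hamming(query, stored)
```

A retrieval system would typically declare a match when this distance falls below a chosen threshold; the threshold value is application dependent and is not given in the abstract.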

Multi-constrained optimization combining ARMAX with differential search for damage assessment

  • K, Lakshmi;A, Rama Mohan Rao
    • Structural Engineering and Mechanics
    • /
    • v.72 no.6
    • /
    • pp.689-712
    • /
    • 2019
  • Time-series models such as AR-ARX and ARMAX provide a robust way to capture the dynamic properties of structures, and their residuals can be used effectively as features for damage detection. Although several research papers discuss the implementation of AR-ARX and ARMAX models for damage diagnosis, these models have so far been exploited mainly for detecting the time instant and the spatial location of damage. However, the inverse problem associated with damage quantification, i.e., estimating the extent of damage using time series models, has not been reported in the literature. In this paper, an approach is presented to detect the extent of damage by combining the ARMAX model with a formulation of the inverse problem as a multi-constrained optimization problem, which is solved using a newly developed hybrid adaptive differential search with dynamic interaction. The proposed variant of the differential search technique employs multiple small populations that perform the search independently and exchange information with a dynamic neighborhood. Adaptive features and local search ability are built into the algorithm in order to improve the convergence characteristics and the overall performance of the technique. The multi-constrained optimization formulations of the inverse problem associated with damage quantification using time series models, attempted here for the first time, can considerably improve the robustness of the search process. Numerical simulation studies on three numerical examples demonstrate the effectiveness of the proposed technique in robustly identifying the extent of damage. Issues related to modeling errors and measurement noise are also addressed.

A Signal Processing Technique for Predictive Fault Detection based on Vibration Data (진동 데이터 기반 설비고장예지를 위한 신호처리기법)

  • Song, Ye Won;Lee, Hong Seong;Park, Hoonseok;Kim, Young Jin;Jung, Jae-Yoon
    • The Journal of Society for e-Business Studies
    • /
    • v.23 no.2
    • /
    • pp.111-121
    • /
    • 2018
  • Many problems in rotating machinery such as aircraft engines, wind turbines, and motors are caused by bearing defects. Bearing abnormalities can be detected by analyzing signal data such as vibration or noise, but proper pre-processing with signal processing techniques is required to analyze their frequencies. In this paper, we introduce a condition monitoring method for diagnosing failures of rotating machines by analyzing the vibration signal of the bearing. The normal state is learned from the collected signal data, and new data are then classified as normal or abnormal based on the trained normal state. For preprocessing, a Hamming window is applied to eliminate the leakage generated in this process, and cepstrum analysis is performed to obtain the original signal of the signal data, called the formant. From the vibration data of the IMS bearing dataset, we extracted six statistical indicators from the cepstral coefficients and showed that a Mahalanobis distance classifier can monitor the bearing status and detect failure in advance.
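
The preprocessing chain described in the abstract above (Hamming window followed by cepstrum analysis) can be sketched as follows. This computes a standard real cepstrum with a naive DFT from the standard library. It is not the authors' pipeline: the function names, the frame length, the log floor, and the synthetic test frame are assumptions, and the six statistical indicators and the Mahalanobis classifier are not reproduced.

```python
import cmath
import math
import random


def hamming_window(n):
    """Symmetric Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def dft(x):
    """Naive O(n^2) discrete Fourier transform; fine for short frames."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def real_cepstrum(frame, floor=1e-12):
    """Real cepstrum: inverse DFT of the log-magnitude spectrum."""
    n = len(frame)
    windowed = [s * w for s, w in zip(frame, hamming_window(n))]
    # The floor avoids log(0) for bins with vanishing magnitude.
    log_mag = [math.log(max(abs(v), floor)) for v in dft(windowed)]
    return [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


# Hypothetical vibration frame: a sinusoid plus a little noise, 64 samples.
rng = random.Random(1)
frame = [math.sin(2.0 * math.pi * 5 * i / 64) + 0.1 * rng.uniform(-1.0, 1.0) for i in range(64)]
cepstrum = real_cepstrum(frame)
```

For long frames an FFT routine would replace the naive `dft`; the naive version is used here only to keep the sketch free of third-party dependencies.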