• Title/Summary/Keyword: Speech signals

An Acoustic Echo Canceller By Using the Reduced Lattice Filter Structure (축소격자필터 구조를 사용한 음향반향제거기)

  • 유재하;조성호;윤대희;차일환
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.32B no.11
    • /
    • pp.1473-1480
    • /
    • 1995
  • When the LMS algorithm is employed in a transversal filter structure, the computational complexity can be kept reasonably low. However, if the impulse response to be estimated is very long, or if the signals involved are highly correlated, as speech is, convergence becomes slow. The lattice filter is an excellent alternative for improving convergence speed, since the lattice structure inherently orthogonalizes the backward prediction errors, but it does so at the expense of an excessive computational load. If the input signal can be modeled sufficiently well as a ρ-th order autoregressive (AR) process, the reflection coefficients after the ρ-th stage will be close to zero. Then, instead of employing the full lattice structure, a joint structure can be implemented in which the lattice stages are followed by a transversal filter after the ρ-th stage. In this paper we propose this new joint lattice/transversal structure, which we call the reduced lattice filter. With the reduced lattice filter, we achieve performance as good as that of the lattice filter while keeping the complexity as low as that of the transversal filter. The proposed filter is particularly useful for acoustic echo cancellers because speech is highly correlated and echo paths are long and change frequently.
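
The statement that reflection coefficients vanish beyond the AR model order can be checked numerically. The following is a minimal sketch, not the authors' implementation: it runs the standard Levinson-Durbin recursion on the autocorrelation of a hypothetical first-order AR process. The function names, the AR coefficient, and the sign convention of the reflection coefficients are assumptions made for illustration.

```python
def ar1_autocorrelation(a, num_lags):
    # Normalized autocorrelation of x[n] = a * x[n-1] + v[n]: r[k] = a**k.
    return [a ** k for k in range(num_lags + 1)]


def reflection_coefficients(r, order):
    """Levinson-Durbin recursion; returns the reflection coefficients k_1..k_order."""
    a = [1.0]      # prediction-error filter coefficients, a[0] is always 1
    err = r[0]     # prediction-error power of the current stage
    ks = []
    for m in range(1, order + 1):
        acc = sum(a[i] * r[m - i] for i in range(m))
        k = -acc / err
        ks.append(k)
        a_ext = a + [0.0]
        # Order update: a_m[i] = a_{m-1}[i] + k * a_{m-1}[m - i]
        a = [a_ext[i] + k * a_ext[m - i] for i in range(m + 1)]
        err *= 1.0 - k * k
    return ks


def demo_reflection(a=0.5, order=4):
    # Hypothetical AR(1) input: only the first lattice stage should be non-trivial.
    return reflection_coefficients(ar1_autocorrelation(a, order), order)
```

For this AR(1) example only the first coefficient is non-zero, so every lattice stage after the first contributes nothing, which is the situation in which replacing the later stages with a transversal section is attractive.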

Identification and Detection of Emotion Using Probabilistic Output SVM (확률출력 SVM을 이용한 감정식별 및 감정검출)

  • Cho, Hoon-Young;Jung, Gue-Jun
    • The Journal of the Acoustical Society of Korea
    • /
    • v.25 no.8
    • /
    • pp.375-382
    • /
    • 2006
  • This paper addresses how to identify emotional information in speech signals and how to detect a specific emotion from them. For the emotion identification and detection tasks, we use long-term acoustic feature parameters and select the optimal parameters using a feature selection technique based on the F-score. We transform the conventional SVM into a probabilistic output SVM for our emotion identification and detection system. We propose three approximation methods for the log-likelihoods in a hypothesis test and compare their performance. Experimental results on the SUSAS database showed the effectiveness of both feature selection and the probabilistic output SVM in the emotion identification task. The proposed methods could detect the emotion of anger with 91.3% correctness.
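
The abstract does not give the F-score formula. The sketch below assumes the commonly used two-class definition, in which each feature is scored by the ratio of its between-class scatter to its within-class scatter, and then ranks features by that score. The function names and the toy data are hypothetical.

```python
from statistics import mean, variance


def f_score(pos, neg):
    """F-score of one feature from its values in the positive and negative class."""
    overall = mean(pos + neg)
    between = (mean(pos) - overall) ** 2 + (mean(neg) - overall) ** 2
    within = variance(pos) + variance(neg)  # sample variances (n - 1 denominator)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(pos_rows, neg_rows):
    """Return (feature indices sorted by decreasing F-score, list of scores)."""
    num_features = len(pos_rows[0])
    scores = [
        f_score([row[j] for row in pos_rows], [row[j] for row in neg_rows])
        for j in range(num_features)
    ]
    order = sorted(range(num_features), key=lambda j: scores[j], reverse=True)
    return order, scores
```

A feature selection step would then keep only the top-ranked features before training the classifier; how many to keep is a tuning choice that the abstract does not specify.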

Optimization of Memristor Devices for Reservoir Computing (축적 컴퓨팅을 위한 멤리스터 소자의 최적화)

  • Kyeongwoo Park;HyeonJin Sim;HoBin Oh;Jonghwan Lee
    • Journal of the Semiconductor & Display Technology
    • /
    • v.23 no.1
    • /
    • pp.1-6
    • /
    • 2024
  • Artificial neural networks have recently come to play a crucial role in many fields. They are typically categorized into feedforward neural networks and recurrent neural networks. Feedforward neural networks are used primarily for processing static spatial patterns, as in image recognition and object detection, and are not suitable for handling temporal signals. Recurrent neural networks, on the other hand, involve complex training procedures and require significant computational power. In this paper, we propose memristors suitable for reservoir computing systems, an advanced form of recurrent neural network, utilizing a mask processor. Using the characteristic equations of Ti/TiOx/TaOy/Pt, Pt/TiOx/Pt, and Ag/ZnO-NW/Pt memristors, we generated current-voltage curves and verified memristive behavior by confirming hysteresis. We then trained reservoir computing systems based on these memristors and ran inference with them on the NIST TI-46 database. Among these systems, the one based on Ti/TiOx/TaOy/Pt memristors reached an accuracy of 99%, confirming the suitability of the Ti/TiOx/TaOy/Pt memristor structure for speech recognition inference.

Performance Improvement of Speaker Recognition by MCE-based Score Combination of Multiple Feature Parameters (MCE기반의 다중 특징 파라미터 스코어의 결합을 통한 화자인식 성능 향상)

  • Kang, Ji Hoon;Kim, Bo Ram;Kim, Kyu Young;Lee, Sang Hoon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.21 no.6
    • /
    • pp.679-686
    • /
    • 2020
  • This paper proposes an enhanced method for extracting features from vocal source signals, together with a score combination method that uses MCE-based weight estimation for the scores of multiple feature vectors, to improve the performance of speaker recognition systems. The proposed feature vector is composed of perceptual linear predictive cepstral coefficients, skewness, and kurtosis extracted from lowpass-filtered glottal flow signals; the lowpass filtering eliminates the flat spectrum region, which carries no meaningful information. The proposed feature was used to improve a conventional speaker recognition system based on Gaussian mixture models and on mel-frequency cepstral coefficients and perceptual linear predictive cepstral coefficients extracted from the speech signals. In addition, to increase the reliability of the estimated scores, instead of estimating the weights from the probability distribution of the scores as in the conventional approach, the scores from the conventional vocal tract features and from the proposed feature are fused by the MCE-based score combination method to find the optimal speaker. The experimental results showed that the proposed feature vectors contain information that is valid for recognizing the speaker. In addition, when speaker recognition is performed by combining the MCE-based multiple feature parameter scores, the recognition system outperforms the conventional one, particularly when the number of Gaussian mixtures is low.

Selection of Auditory Icons in Ship Bridge Alarm Management System Using the Sensibility Evaluation (감성평가를 이용한 선교알람관리시스템의 청각아이콘 평가)

  • Oh, Seungbin;Jang, Jun-Hyuk;Park, Jin Hyoung;Kim, Hongtae
    • Journal of Navigation and Port Research
    • /
    • v.37 no.4
    • /
    • pp.401-407
    • /
    • 2013
  • In parallel with the development of ship equipment, bridge systems have been improved, but marine accidents due to human error have not decreased. Recent research on nautical bridge equipment has focused on suitable ergonomic designs in order to reduce errors due to human factors. On the bridge of a ship, numerous auditory signals deliver important information to the sailors. However, only a few studies have examined how humans recognize these auditory signals. There are three types of auditory signals: voice alarms, abstract sounds, and auditory icons. This study was conducted in order to design more appropriate auditory icons using a sensibility evaluation method. The auditory icons were rated for five warning situations (engine failure, fire, steering failure, low power, and collision) using the Semantic Differential Method. It is expected that the results of this study will be used as basic data for auditory displays inside bridges and for integrated bridge alarm systems.

Evaluation of a signal segregation by FDBM (FDBM의 음원분리 성능평가)

  • Lee, Chai-Bong
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.8 no.12
    • /
    • pp.1793-1802
    • /
    • 2013
  • Various approaches to sound source segregation have been proposed. Among them, the frequency domain binaural model (FDBM) has the advantages of a low computational load and effective howling cancellation. A binaural hearing assistance system based on FDBM has been proposed; this system can enhance the desired signal based on directivity information. Although FDBM has been evaluated in terms of signal-to-noise ratio (SNR) and the coherence function, the evaluation results do not always agree with human impressions. These evaluation methods provide physical measures and do not take human perception into account. Since a binaural hearing assistance system is one of the major applications, the quality of the segregated sound should be kept at a sufficient level. In this paper, the signal segregation performance of FDBM is evaluated by three objective methods, namely SNR, coherence, and Perceptual Evaluation of Speech Quality (PESQ), in order to discuss the characteristics of FDBM with respect to sound source segregation performance. The simulation results show that FDBM improves the quality of the left and right channel signals to an equivalent level. The results also suggest that PESQ may provide a more useful measure than SNR and coherence of the segregation performance of FDBM. The PESQ evaluation results show the effects of the segregation parameters and indicate appropriate parameters under the given conditions.
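
As a point of reference for the physical measures mentioned above, the following minimal sketch computes SNR in decibels between a clean reference signal and a segregated estimate. The abstract does not state the exact SNR definition used in the paper, so the formula here (reference energy over the energy of the difference signal) and the function name are assumptions.

```python
import math


def snr_db(reference, estimate):
    """SNR in dB, treating (reference - estimate) as the noise component."""
    signal_power = sum(s * s for s in reference)
    noise_power = sum((s - e) ** 2 for s, e in zip(reference, estimate))
    if noise_power == 0.0:
        return math.inf  # estimate matches the reference exactly
    return 10.0 * math.log10(signal_power / noise_power)
```

A measure of this kind depends only on sample-wise differences, which is one reason it can disagree with perceptual impressions and why a perceptually motivated measure such as PESQ is evaluated alongside it.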

Functional MRI of Language Area (언어영역의 기능적 자기공명영상)

  • 유재욱;나동규;변홍식;노덕우;조재민;문찬홍;나덕렬;장기현
    • Investigative Magnetic Resonance Imaging
    • /
    • v.3 no.1
    • /
    • pp.53-59
    • /
    • 1999
  • Purpose: To evaluate the usefulness of functional MR imaging (fMRI) for language mapping and determination of language lateralization. Materials and Methods: Functional maps of the language area were obtained during word generation tasks and a decision task in ten volunteers (7 right-handed, 3 left-handed). MR examinations were performed on a 1.5T scanner with the EPI BOLD technique. Each task consisted of three resting periods and two activation periods, each lasting 30 seconds. Total acquisition time was 162 sec. The SPM program was used for postprocessing of the images. Statistical comparisons were performed using t-statistics on a pixel-by-pixel basis after global normalization by ANCOVA. Activation areas were topographically analyzed (p<0.001), and activated pixels in each hemisphere were compared quantitatively using a lateralization index. Results: Significant activation signals were demonstrated in 9 of 10 volunteers. Activation signals were found in the premotor and motor cortices and in the inferior frontal, inferior parietal, and mid-temporal lobes during stimulation tasks. In the seven right-handed volunteers, activation of language areas was lateralized to the left side. The verb generation task produced stronger activation in the language areas and a higher lateralization index than the noun generation task or the decision task. Conclusion: fMRI could be a useful non-invasive method for language mapping and determination of language dominance.
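
The abstract does not define the lateralization index. A widely used form, assumed here, is the normalized difference between the numbers of activated pixels in the left and right hemispheres, so that +1 indicates purely left-sided and -1 purely right-sided activation. The function name and the example counts are hypothetical.

```python
def lateralization_index(left_pixels, right_pixels):
    """Return (L - R) / (L + R) for activated pixel counts L and R."""
    total = left_pixels + right_pixels
    if total == 0:
        raise ValueError("no activated pixels in either hemisphere")
    return (left_pixels - right_pixels) / total
```

For example, 80 activated pixels on the left and 20 on the right give an index of 0.6 under this definition, indicating left lateralization.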

On the Behavior of the Signed Regressor Least Mean Squares Adaptation with Gaussian Inputs (가우시안 입력신호에 대한 Signed Regressor 최소 평균자승 적응 방식의 동작 특성)

  • 조성호
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.18 no.7
    • /
    • pp.1028-1035
    • /
    • 1993
  • The signed regressor (SR) algorithm applies one-bit quantization to the input regressor (or tap input), so that the quantized input sequence takes the values +1 or -1. The algorithm is inherently more computationally efficient than the popular least mean square (LMS) algorithm. Unfortunately, the behavior of the SR algorithm depends heavily on the characteristics of the input signal, and there are some inputs for which the SR algorithm becomes unstable. It is known, however, that this stability problem does not arise when the input signal is Gaussian, as in the case of speech processing. In this paper, we present a statistical analysis of the SR algorithm. Under the assumption that the signals involved are zero-mean and Gaussian, and further employing the commonly used independence assumption, we derive a set of nonlinear evolution equations that characterizes the mean and mean-squared behavior of the SR algorithm. Experimental results that show very good agreement with our theoretical derivations are also presented.
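
The update rule described above can be sketched as follows. This is an illustrative system identification example, not the paper's experimental setup: the unknown system, the step size, the white Gaussian input, and all function names are assumptions. The only difference from standard LMS is that the regressor is replaced by its sign in the weight update.

```python
import random


def sign(v):
    # One-bit quantizer: +1 for non-negative samples, -1 otherwise.
    return 1.0 if v >= 0.0 else -1.0


def sr_lms_identify(x, d, num_taps, mu):
    """Signed regressor LMS: w <- w + mu * e * sign(u), with u the tap-input vector."""
    w = [0.0] * num_taps
    for n in range(num_taps - 1, len(x)):
        u = [x[n - k] for k in range(num_taps)]      # current regressor
        y = sum(wk * uk for wk, uk in zip(w, u))     # filter output
        e = d[n] - y                                 # estimation error
        w = [wk + mu * e * sign(uk) for wk, uk in zip(w, u)]
    return w


def demo_sr_lms(seed=0):
    rng = random.Random(seed)
    h = [0.5, -0.3, 0.2]                             # hypothetical unknown system
    x = [rng.gauss(0.0, 1.0) for _ in range(5000)]   # zero-mean Gaussian input
    d = [
        sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
        for n in range(len(x))
    ]
    return h, sr_lms_identify(x, d, num_taps=len(h), mu=0.01)
```

Because the update multiplies the error by ±1 instead of by the input sample, each tap update needs no multiplication by the regressor, which is the source of the computational saving over LMS.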

A Study on the Acoustic Characteristics of the Pansori by Voice Signals Analysis (음성신호 분석에 의한 판소리의 음성학적 특징 연구)

  • Kim, HyunSook
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.14 no.7
    • /
    • pp.3218-3222
    • /
    • 2013
  • Pansori is a traditional Korean vocal art whose originality and excellence as a combined art of speech and gesture have earned it global recognition as a world intangible heritage. In particular, pansori is valued for its satirical and humorous expression and its audience participation, which give it high artistic value, and for serving a socially integrating function as an art enjoyed by all social classes. In this paper, speech signal analysis techniques are applied to the five madang (the five pansori works) to analyze the acoustic features of pansori and to study how its expression correlates with society and era. The experiments analyzed the spectrogram, pitch, stability, and intensity of the five madang. The results indicate that, in keeping with the need to hold the audience's attention and interest while delivering a comical story, pansori is expressed with a stable, loud voice that shows large variations in voice energy and vocal cord vibration.

Double-talk Control using Blind Signal Separation based on Geometric Concept in Acoustic Echo Canceller (음향반향제거기에서 기하학적 개념의 BSS를 이용한 동시통화 제어)

  • Lee, Haeng-Woo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.12 no.3
    • /
    • pp.419-426
    • /
    • 2017
  • This paper describes an acoustic echo canceller that handles double-talk using blind signal separation (BSS) based on a geometric concept. The performance of an acoustic echo canceller may degrade, or its coefficients may diverge, during double-talk periods. We therefore use blind signal separation to detect double-talk by separating the near-end speech signal from the mixed microphone signal. In a closed reverberant environment, the blind signal separation extracts the near-end signal from the unknown signals by means of a transformation and rotation based on the geometric concept. With this method, the acoustic echo canceller operates irrespective of double-talk. We verified the performance of the proposed acoustic echo canceller by computer simulations. The results show that the acoustic echo canceller with this algorithm detects the double-talk periods reliably and operates stably in the normal state, without divergence of the coefficients, after double-talk ends.