• Title/Summary/Keyword: Auditory Signal

Content Based Classification of Audio Signal using Discriminant Function (식별함수를 이용한 오디오신호의 내용기반 분류)

  • Kim, Young-Sub;Lee, Kwang-Seok;Koh, Si-Young;Hur, Kang-In
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2007.06a / pp.201-204 / 2007
  • In this paper, we study content-based analysis and classification based on the composition of a feature-parameter pool for auditory signals, with the aim of implementing an auditory indexing and search system. The auditory data are first classified into primitive auditory types. We describe the analysis and feature-extraction methods for the parameters applicable to auditory data classification, compose the feature-parameter pool per indexing group, and then compare and analyze the auditory data with respect to inclusion level and indexing criteria across the audio categories. Based on these results, we compose feature vectors of the audio data according to the classification categories and run classification experiments using a discriminant function.

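A minimal sketch of the final classification step described above, assuming feature vectors have already been extracted; the paper's actual feature-parameter pool, dimensionality, and discriminant form are not specified, so the 8-dimensional toy features, the three hypothetical categories, and the use of Fisher's linear discriminant are all illustrative assumptions.

```python
# Toy discriminant-function classification of audio feature vectors.
# The three categories and 8-dimensional features are hypothetical.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Synthetic feature vectors (e.g., energies, spectral moments) per category.
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(50, 8))
               for m in (0.0, 2.0, 4.0)])
y = np.repeat(["speech", "music", "environment"], 50)

clf = LinearDiscriminantAnalysis().fit(X, y)   # fit the discriminant function
print(clf.predict(X[:3]))                      # predicted categories
print(f"training accuracy: {clf.score(X, y):.2f}")
```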

The Evaluation Structure of Auditory Images on the Streetscapes - The Semantic Issues of Soundscape based on the Students' Fieldwork - (거리경관에 대한 청각적 이미지의 평가구조 - 대학생들의 음풍경 체험을 통한 의미론적 고찰 -)

  • Han Myung-Ho
    • The Journal of the Acoustical Society of Korea / v.24 no.8 / pp.481-491 / 2005
  • The purpose of this study is to interpret the evaluation structure of auditory images of streetscapes in an urban area from the semantic view of soundscapes. Using the caption evaluation method, a relatively new method, a total of 45 college students participated in fieldwork from 2001 to 2005 to record the images of sounds while walking along the main streets of Namwon city. This yielded varied data covering the elements, features, impressions, and preferences of the auditory scene. In Namwon city, the elements forming auditory images are classified into natural sounds and artificial sounds, the latter including machinery sounds, community sounds, and signal sounds. The features of the auditory scene are classified by kind of sound, behavior, condition, character, relationship to the surroundings, and image. The impressions of the auditory scene fall into three categories: human emotions, the atmosphere of the streets, and the characteristics of the sound itself. Regarding the relationship between the auditory scene and its evaluation, the elements, features, and impressions each comprise items with positive, neutral, and negative images. The evaluation model of the streetscapes of Namwon city also made it possible to grasp the characteristics of the auditory image of a place or space.

Digital Watermarking Using Psychoacoustic Model

  • Poomdaeng, S.;Toomnark, S.;Amornraksa, T.
    • Proceedings of the IEEK Conference / 2002.07b / pp.872-875 / 2002
  • A digital watermarking technique applying a psychoacoustic model to audio signals is proposed in this paper. In the watermarking scheme, a pseudo-random bit stream used as the watermark signal is embedded into the audio signal, for both speech and music. The strength of the embedded signal is governed by the human auditory system such that the disturbance to the host audio signal remains imperceptible to human ears. The experimental results show that the quality of the watermarked audio signal, in terms of signal-to-noise ratio, can be improved by up to 3.2 dB.

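The embedding scheme above can be approximated by a plain spread-spectrum sketch; a fixed strength `alpha` stands in for the gain the paper derives from the psychoacoustic model, and the printed SNR comes from toy data, not from the paper's 3.2 dB result.

```python
# Toy spread-spectrum watermark embedding with a fixed strength alpha,
# standing in for the gain a psychoacoustic model would supply per band.
import numpy as np

rng = np.random.default_rng(1)
host = rng.normal(size=16000)                        # stand-in audio frame
watermark = rng.choice([-1.0, 1.0], size=host.size)  # pseudo-random bit stream
alpha = 0.01                                         # fixed embedding strength
marked = host + alpha * watermark

# Quality of the watermarked signal in terms of signal-to-noise ratio (dB).
snr_db = 10 * np.log10(np.sum(host ** 2) / np.sum((marked - host) ** 2))
print(f"SNR of watermarked signal: {snr_db:.1f} dB")

# Blind detection: correlate against the known pseudo-random sequence.
corr = np.dot(marked, watermark) / marked.size
print("watermark detected:", corr > alpha / 2)
```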

Speech Recognition Performance Improvement using Gamma-tone Feature Extraction Acoustic Model (감마톤 특징 추출 음향 모델을 이용한 음성 인식 성능 향상)

  • Ahn, Chan-Shik;Choi, Ki-Ho
    • Journal of Digital Convergence / v.11 no.7 / pp.209-214 / 2013
  • To improve the recognition performance of speech recognition systems, a method that incorporates human listening skills into the system is proposed. In noisy environments, the desired speech signal is selected by separating it from the noise, but practical speech recognition systems still suffer because speech detection becomes inaccurate under changing noise conditions and the trained model no longer matches the input. In this paper, feature extraction using gammatone filters and a learning method based on an acoustic model are proposed to improve speech recognition. The proposed method reflects human auditory perception by applying auditory scene analysis to feature extraction in the process of training the recognition model. In a performance evaluation in noisy environments, removing noise from signals at -10 dB and -5 dB SNR yielded improvements of 3.12 dB and 2.04 dB, respectively.
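
A gammatone channel, the building block of this kind of feature extraction, can be sketched as below; the 4th-order filter with Glasberg-Moore ERB bandwidth is the common textbook parameterization, assumed here rather than taken from the paper.

```python
# 4th-order gammatone impulse response with Glasberg-Moore ERB bandwidth
# (a common parameterization, assumed here rather than taken from the paper).
import numpy as np

def gammatone_ir(fc, fs, order=4, duration=0.025):
    """Impulse response of a gammatone filter centred at fc Hz."""
    t = np.arange(int(duration * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000.0 + 1.0)  # equivalent rectangular bandwidth
    b = 1.019 * erb                          # bandwidth parameter
    return t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)

fs = 16000
ir = gammatone_ir(fc=1000.0, fs=fs)
# One auditory channel is then the convolution of the signal with this IR.
x = np.random.default_rng(2).normal(size=fs)   # 1 s test signal
channel = np.convolve(x, ir, mode="same")
print(ir.shape, channel.shape)
```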

Speech Feature Extraction based on Spikegram for Phoneme Recognition (음소 인식을 위한 스파이크그램 기반의 음성 특성 추출 기술)

  • Han, Seokhyeon;Kim, Jaewon;An, Soonho;Shin, Seonghyeon;Park, Hochong
    • Journal of Broadcast Engineering / v.24 no.5 / pp.735-742 / 2019
  • In this paper, we propose a method of extracting speech features for phoneme recognition based on the spikegram. Fourier-transform-based features are widely used in phoneme recognition, but they are not extracted in a biologically plausible way and cannot achieve high temporal resolution because of their frame-based operation. For better phoneme recognition, therefore, it is desirable to have a new method of extracting speech features that analyzes the speech signal at high temporal resolution, following a model of the human auditory system. In this paper, we analyze the speech signal based on a spikegram, which models feature extraction and transmission in the auditory system, and then propose a method of extracting features from the spikegram for phoneme recognition. We evaluate the proposed features using a DNN-based phoneme recognizer and confirm that they outperform Fourier-transform-based features for short phonemes. This result verifies the feasibility of new speech features extracted from an auditory model for phoneme recognition.
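
A spikegram is commonly computed by matching pursuit over a dictionary of auditory kernels; the toy decomposition below uses generic window-shaped kernels and a fixed spike budget in place of the paper's auditory atoms and stopping rule, so it only illustrates the spike-extraction loop.

```python
# Toy matching pursuit, the core operation behind a spikegram: the signal
# is greedily approximated by time-shifted kernels, and each selected
# (kernel, time, amplitude) triple is one "spike".
import numpy as np

def matching_pursuit(x, kernels, n_spikes=20):
    residual = x.copy()
    spikes = []                        # (kernel index, time index, amplitude)
    for _ in range(n_spikes):
        best = None
        for k, g in enumerate(kernels):
            # Cross-correlate the residual with the unit-norm kernel.
            c = np.correlate(residual, g, mode="valid")
            i = int(np.argmax(np.abs(c)))
            if best is None or abs(c[i]) > abs(best[2]):
                best = (k, i, c[i])
        k, i, a = best
        residual[i:i + len(kernels[k])] -= a * kernels[k]  # remove the spike
        spikes.append((k, i, a))
    return spikes, residual

rng = np.random.default_rng(3)
x = rng.normal(size=512)
kernels = [np.hanning(L) / np.linalg.norm(np.hanning(L)) for L in (16, 32, 64)]
spikes, residual = matching_pursuit(x, kernels)
print(len(spikes), np.linalg.norm(residual) / np.linalg.norm(x))
```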

Masking Level Difference: Performance of School Children Aged 7-12 Years

  • de Carvalho, Nadia Giulian;do Amaral, Maria Isabel Ramos;de Barros, Vinicius Zuffo;dos Santos, Maria Francisca Colella
    • Journal of Audiology & Otology / v.25 no.2 / pp.65-71 / 2021
  • Background and Objectives: In masking level difference (MLD), the masked detection threshold for a signal is determined as a function of the relative interaural differences between the signal and the masker. Study 1 analyzed the results of school-aged children with good school performance in the MLD test, and study 2 compared their results with those of a group of children with poor academic performance. Subjects and Methods: Study 1 was conducted with 47 school-aged children with good academic performance (GI), and study 2 was carried out with 32 school-aged children with poor academic performance (GII). The inclusion criterion adopted for both studies was hearing thresholds within normal limits in the basic audiological evaluation. Study 1 also required normal performance in the central auditory processing test battery and the absence of auditory complaints and/or attention, language, or speech issues. The MLD test was administered with a pulsed pure tone of 500 Hz, presented binaurally at an intensity of 50 dB SL, using a CD player and an audiometer. Results: In study 1, no significant correlation was observed for the influence of age and sex on the results obtained in the homophasic (SoNo), antiphasic (SπNo), and MLD threshold conditions. The final mean MLD threshold was 13.66 dB. In study 2, these variables did not influence test performance either. There was a significant difference between the two groups in the SπNo condition, while no differences were found in either the SoNo condition or the final MLD value. Conclusions: In study 1, the cut-off criterion for school-aged children in the MLD test was 9.3 dB. Sex and age did not interfere with the MLD results. In study 2, school performance did not affect the final MLD result; the GII group showed inferior results to the GI group only in the SπNo condition.
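
For reference, the MLD itself is just the release from masking: the homophasic (SoNo) masked threshold minus the antiphasic (SπNo) one, in dB. The values in the sketch below are invented for illustration and are not the study's data.

```python
# The MLD is the release from masking between the two listening conditions.
def masking_level_difference(sono_threshold_db, spino_threshold_db):
    """SoNo (homophasic) minus SpiNo (antiphasic) masked threshold, in dB."""
    return sono_threshold_db - spino_threshold_db

# Illustrative values only: a lower antiphasic threshold gives a positive MLD.
print(masking_level_difference(-18.0, -31.7))  # -> 13.7 dB
```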

CASA Based Approach to Estimate Acoustic Transfer Function Ratios (CASA 기반의 마이크간 전달함수 비 추정 알고리즘)

  • Shin, Minkyu;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea / v.33 no.1 / pp.54-59 / 2014
  • Identification of the RTF (Relative Transfer Function) between sensors is essential for multichannel speech enhancement systems. In this paper, we present an approach for estimating the relative transfer function of a speech signal. The method adapts a CASA (Computational Auditory Scene Analysis) technique to the conventional OM-LSA (Optimally-Modified Log-Spectral Amplitude) based approach. The proposed approach is evaluated under simulated stationary and nonstationary WGN (White Gaussian Noise). Experimental results confirm the advantages of the proposed approach.
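
A common baseline for RTF identification, which the paper's CASA/OM-LSA weighting refines, is the ratio of cross- to auto-power spectral densities between the two microphones; the sketch below implements only that baseline on a toy two-microphone signal.

```python
# Baseline RTF estimate as the ratio of cross- to auto-PSD (Welch method).
import numpy as np
from scipy.signal import csd, welch

fs = 16000
rng = np.random.default_rng(4)
x1 = rng.normal(size=4 * fs)                     # reference microphone
h = np.array([0.0, 0.6, 0.3, 0.1])               # toy relative impulse response
x2 = np.convolve(x1, h, mode="full")[:x1.size]   # second microphone

f, S12 = csd(x1, x2, fs=fs, nperseg=512)         # cross-PSD between the mics
f, S11 = welch(x1, fs=fs, nperseg=512)           # auto-PSD of the reference
rtf = S12 / S11                                  # frequency-domain RTF estimate
print(rtf[:4])                                   # approximates the DFT of h
```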

Emotion recognition from speech using Gammatone auditory filterbank

  • Le, Ba-Vui;Lee, Young-Koo;Lee, Sung-Young
    • Proceedings of the Korean Information Science Society Conference / 2011.06a / pp.255-258 / 2011
  • An application of a Gammatone auditory filterbank to emotion recognition from speech is described in this paper. The Gammatone filterbank is a bank of Gammatone filters used as a preprocessing stage before feature extraction, to obtain the features most relevant to emotion recognition from speech. In the feature-extraction step, the energy of the output signal of each filter is computed and combined across all filters to produce a feature vector for the learning step. A feature vector is estimated over a short time window of the input speech signal to exploit its time-domain dependence. Finally, in the learning step, a Hidden Markov Model (HMM) is used to create a model for each emotion class and to recognize the emotion of a given input speech. In the experiments, feature extraction based on the Gammatone filterbank (GTF) shows better outcomes than features based on Mel-Frequency Cepstral Coefficients (MFCC), a well-known feature extraction method for speech recognition as well as emotion recognition from speech.
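
The per-channel energy features described above reduce to summing squared filter outputs per frame; the sketch below assumes 24 channels and standard 25 ms / 10 ms framing at 16 kHz, none of which are stated in the abstract, and fakes the filterbank outputs with noise.

```python
# Sketch of an energy-based feature vector: the per-frame log energy of
# each filterbank channel, stacked into one vector per frame.
import numpy as np

def band_energy_features(channel_outputs, frame_len=400, hop=160):
    """channel_outputs: (n_channels, n_samples) filterbank output."""
    n_ch, n = channel_outputs.shape
    starts = range(0, n - frame_len + 1, hop)
    feats = np.array([
        [np.log(np.sum(ch[s:s + frame_len] ** 2) + 1e-12)
         for ch in channel_outputs]
        for s in starts
    ])
    return feats                          # shape: (n_frames, n_channels)

rng = np.random.default_rng(5)
outputs = rng.normal(size=(24, 16000))    # stand-in for 24 Gammatone channels
F = band_energy_features(outputs)
print(F.shape)                            # one feature vector per frame
```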

Quasi-Optimal Linear Recursive DOA Tracking of Moving Acoustic Source for Cognitive Robot Auditory System (인지로봇 청각시스템을 위한 의사최적 이동음원 도래각 추적 필터)

  • Han, Seul-Ki;Ra, Won-Sang;Whang, Ick-Ho;Park, Jin-Bae
    • Journal of Institute of Control, Robotics and Systems / v.17 no.3 / pp.211-217 / 2011
  • This paper proposes a quasi-optimal linear DOA (Direction-of-Arrival) estimator, which is necessary for developing a real-time robot auditory system that tracks a moving acoustic source. It is well known that conventional nonlinear filtering schemes may severely degrade DOA estimation performance and are not preferable for real-time implementation, mainly because of the inherent nonlinearity of the acoustic signal model used for DOA estimation. This motivates us to consider a new uncertain linear acoustic signal model based on the linear prediction relation of a noisy sinusoid. Using the suggested measurement model, the resulting DOA estimation problem is cast as an NCRKF (Non-Conservative Robust Kalman Filtering) problem [12]. The NCRKF-based DOA estimator provides reliable DOA estimates of a fast-moving acoustic source despite using a noise-corrupted measurement matrix in the filter recursion, and it is suitable for real-time implementation because of its linear recursive filter structure. The computational efficiency and DOA estimation performance of the proposed method are evaluated through computer simulations.
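
For contrast with the paper's NCRKF, a plain constant-velocity Kalman filter over the DOA angle shows the linear recursive structure being exploited; the NCRKF's handling of the noise-corrupted measurement matrix is not reproduced, and all noise parameters below are invented.

```python
# Constant-velocity Kalman filter tracking a DOA angle (degrees).
import numpy as np

dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])   # state: [angle, angular rate]
H = np.array([[1.0, 0.0]])              # we measure the angle only
Q = 1e-3 * np.eye(2)                    # process noise covariance (assumed)
R = np.array([[0.5]])                   # measurement noise covariance (assumed)

x = np.array([[0.0], [0.0]])            # initial state estimate
P = np.eye(2)

rng = np.random.default_rng(6)
true_angle = 10.0
for k in range(50):
    true_angle += 2.0 * dt                                  # source at 2 deg/s
    z = np.array([[true_angle + rng.normal(scale=0.7)]])    # noisy DOA reading
    # Predict.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update.
    K = P @ H.T @ np.linalg.inv(H @ P @ H.T + R)
    x = x + K @ (z - H @ x)
    P = (np.eye(2) - K @ H) @ P
print(float(x[0, 0]), true_angle)       # estimate vs. true angle
```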