• Title/Summary/Keyword: Acoustic Phonetics (음향음성학)

Histogram Equalization Using Centroids of Fuzzy C-Means of Background Speakers' Utterances for Majority Voting Based Speaker Identification (다수 투표 기반의 화자 식별을 위한 배경 화자 데이터의 퍼지 C-Means 중심을 이용한 히스토그램 등화기법)

  • Kim, Myung-Jae;Yang, Il-Ho;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.33 no.1 / pp.68-74 / 2014
  • In previous work, we proposed a novel histogram equalization approach that uses a supplement set composed of the centroids of Fuzzy C-Means clustering of the background utterances. The performance of that method depends on the size of the supplement set, but it is difficult to find the best size at recognition time. In this paper, we propose histogram equalization using a supplement set for majority voting based speaker identification. The proposed method identifies test utterances by majority voting over histogram equalization methods with supplement sets of various sizes. It is compared with conventional feature normalization methods such as CMN (Cepstral Mean Normalization), MVN (Mean and Variance Normalization), and HEQ (Histogram Equalization), and with the histogram equalization method using a supplement set.
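
The voting step described in this abstract can be illustrated with a minimal sketch. It assumes each histogram equalization system (one per supplement-set size) has already produced a speaker label for the test utterance. The labels, the function name `majority_vote`, and the tie-breaking rule are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_vote(decisions):
    """Return the most frequent label; ties go to the label that appears first."""
    if not decisions:
        raise ValueError("at least one decision is required")
    counts = Counter(decisions)
    best = max(counts.values())
    for label in decisions:
        if counts[label] == best:
            return label


# Hypothetical decisions from HEQ systems with different supplement-set sizes.
decisions = ["spk03", "spk07", "spk03", "spk03", "spk11"]
result = majority_vote(decisions)
print(result)
```

Voting over several sizes avoids having to pick a single supplement-set size at recognition time, which is the difficulty the abstract points out.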

Sound Enhancement of Low Sample Rate Audio Using LMS in DWT Domain (DWT영역에서 LMS를 이용한 저 샘플링 비율 오디오 신호의 음질 향상)

  • 백수진;윤원중;박규식
    • The Journal of the Acoustical Society of Korea / v.23 no.1 / pp.54-60 / 2004
  • To mitigate the storage space and network bandwidth problems of full CD-quality audio, current digital audio is always restricted in sampling rate and bandwidth. This restriction normally results in low sample rate audio or calls for a data compression scheme such as MP3. However, such audio can only reproduce a lower frequency range than regular CD quality because of the Nyquist sampling theorem. Consequently, it loses the rich spatial information embedded in the high frequencies. The purpose of this paper is to propose an efficient high-frequency enhancement of low sample rate audio using adaptive filtering and DWT analysis and synthesis. The proposed algorithm uses the LMS adaptive algorithm to estimate the missing high-frequency content in the DWT domain and then reconstructs the spectrally enhanced audio using the DWT synthesis procedure. Several experiments with real speech and audio are performed and compared with other algorithms. From the experimental results of the spectrogram and sonic tests, we confirm that the proposed algorithm outperforms the other algorithms and works reasonably well for most audio cases.
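
As a rough illustration of the LMS adaptive algorithm named in this abstract, the sketch below runs a plain time-domain LMS filter that learns an unknown two-tap system. It does not reproduce the paper's DWT-domain estimation of missing high-frequency content. The step size `mu`, the tap count, and the test signal are assumptions chosen only for the example.

```python
import random


def lms_identify(x, d, num_taps, mu):
    """Adapt FIR weights w so that the filter output tracks the desired signal d."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps  # most recent input sample first
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))  # filter output
        e = dn - y                                  # estimation error
        # LMS update: move each weight along the instantaneous gradient.
        w = [wi + mu * e * bi for wi, bi in zip(w, buf)]
        errors.append(e)
    return w, errors


# Hypothetical setup: the "unknown" system is d[n] = 0.5*x[n] - 0.3*x[n-1].
rng = random.Random(0)
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [0.5 * x[n] - 0.3 * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]
w, errors = lms_identify(x, d, num_taps=2, mu=0.1)
print([round(wi, 3) for wi in w])
```

In the setting of the abstract, the same update rule would be applied to wavelet coefficients rather than to raw samples.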

A Study on Korean Phoneme Classification using Recursive Least-Square Algorithm (Recursive Least-Square 알고리즘을 이용한 한국어 음소분류에 관한 연구)

  • Kim, Hoe-Rin;Lee, Hwang-Su;Un, Jong-Gwan
    • The Journal of the Acoustical Society of Korea / v.6 no.3 / pp.60-67 / 1987
  • In this paper, a phoneme classification method for Korean speech recognition is proposed and its performance is studied. Phoneme classification is based on phonemic features extracted by the prewindowed recursive least-square (PRLS) algorithm, which is a kind of adaptive filter algorithm. By applying the PRLS algorithm to the input speech signal, phoneme boundaries are detected precisely. Reference patterns of Korean phonemes are generated by ordinary vector quantization (VQ) of feature vectors obtained manually from prototype regions of each phoneme. To evaluate the proposed phoneme classification method, it was tested using the spoken names of seven Korean cities, which contain eleven different consonants and eight different vowels. In speaker-dependent phoneme classification, the accuracy is about 85% when simple phonemic rules of the Korean language are considered, while the accuracy in the speaker-independent case is far lower than in the speaker-dependent case.

Korean Word Recognition using the Transition Matrix of VQ-Code and DHMM (VQ코드의 천이 행렬과 이산 HMM을 이용한 한국어 단어인식)

  • Chung, Kwang-Woo;Hong, Kwang-Seok;Park, Byung-Chul
    • The Journal of the Acoustical Society of Korea / v.13 no.4 / pp.40-49 / 1994
  • In this paper, we propose methods for improving the performance of a word recognition system. The key strategy of the first method is to apply inertia to the feature vector sequences of the speech signal to stabilize the transitions between VQ codes. The second method generates new observation probabilities by using the transition matrix of VQ codes as weights on the observation probability of the output symbol, so as to take into account the time relation between neighboring frames in the DHMM. By applying inertia to the feature vector sequences, we can reduce the overlap of the probability distributions of the response paths for each word and stabilize state transitions in the HMM. By using the transition matrix of VQ codes as weights in the conventional DHMM, we can divide the probability distribution of feature vectors more finely and restrict the feature distribution to a suitable region, so that the performance of the recognition system improves. To evaluate the proposed methods, we carried out experiments on 50 DDD area names. As a result, the proposed methods improved the recognition rate by 4.2% in the speaker-dependent test and 12.45% in the speaker-independent test, respectively, compared with the conventional DHMM.

Performance Improvement of Mel-Cepstrum Through Optimizing Filter Banks (필터 뱅크 최적화에 의한 멜켑스트럼의 성능 향상)

  • 현동훈;이철희
    • The Journal of the Acoustical Society of Korea / v.18 no.1 / pp.78-85 / 1999
  • In this paper, we propose a method to improve the performance of the mel-cepstrum, which is widely used in speech recognition. Typically, the mel-cepstrum is obtained with critical band filters that have fixed center spacing and bandwidth. However, different filter characteristics produce a different mel-cepstrum, resulting in different performance. In this paper, we analyze triangular-shaped and rectangular-shaped filters. By changing filter characteristics such as center frequency and bandwidth, we analyze the performance of the mel-cepstrum. Then, utilizing the simplex method, we propose a method to optimize the critical band filters. Using dynamic time warping, we performed speaker-independent recognition experiments with Korean digit words pronounced by 10 males and 10 females. The experiments show that the rectangular-shaped filters perform well and that the mel-cepstrum obtained with the optimized filters performs better than that obtained with filters that have fixed center spacing and bandwidth.
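
The fixed-spacing baseline that this abstract starts from can be sketched as follows. The sketch places band edges at equal intervals on the mel scale and defines triangular and rectangular weighting functions. The mel formula used here is one common variant and is an assumption, as are the filter count and the 4 kHz upper limit. The simplex optimization of center frequency and bandwidth is not reproduced.

```python
import math


def hz_to_mel(f):
    """Common mel-scale mapping (assumed variant, not confirmed by the paper)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_edges(num_filters, f_low, f_high):
    """Band edges in Hz, equally spaced on the mel scale (num_filters + 2 points)."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (hi - lo) / (num_filters + 1)
    return [mel_to_hz(lo + i * step) for i in range(num_filters + 2)]


def triangular_weight(f, left, center, right):
    """Weight rises linearly from left to center, then falls to zero at right."""
    if left < f <= center:
        return (f - left) / (center - left)
    if center < f < right:
        return (right - f) / (right - center)
    return 0.0


def rectangular_weight(f, left, center, right):
    """Flat weight across the whole band (center is unused, kept for symmetry)."""
    return 1.0 if left <= f <= right else 0.0


# Hypothetical configuration: 20 filters between 0 Hz and 4 kHz.
edges = mel_edges(num_filters=20, f_low=0.0, f_high=4000.0)
# Filter i spans edges[i] .. edges[i + 2] with its center at edges[i + 1].
print([round(e) for e in edges[:4]])
```

An optimizer such as the simplex method would treat the entries of `edges` as free parameters instead of fixing them by equal mel spacing.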

An Enhancement of Speaker Location System Using the Low-frequency Phase Restoration Algorithm and Its Implementation (저주파 위상 복원 알고리듬을 이용한 화자 위치 추적 시스템의 성능 개선과 구현)

  • 이학주;차일환;윤대희;이충용
    • The Journal of the Acoustical Society of Korea / v.20 no.4 / pp.22-28 / 2001
  • This paper describes the implementation of a robust speaker position location system using the voice signal received by a microphone array. For robustness to reverberation, which is the major cause of performance degradation, a low-frequency phase restoration algorithm is proposed that eliminates the influence of reverberation using the low-frequency information of the CPSP function. The implemented real-time system consists of a general-purpose DSP (TMS320C31 of Texas Instruments), an analog part that contains amplifiers and filters, and a digital part composed of external memory and a 12-bit A/D converter. In a real conference room environment, the system built with the proposed algorithms performed better than the conventional system. The TDOA estimation error was reduced by more than 15 samples.
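
To make the TDOA estimation mentioned in this abstract concrete, here is a minimal sketch that estimates the delay between two signals from the phase of their cross-power spectrum. It assumes CPSP denotes the cross-power spectrum phase function, that is, a cross-correlation whitened by its magnitude. The paper's low-frequency phase restoration step is not reproduced, and the naive DFT and test signal are only for illustration.

```python
import cmath
import random


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform, adequate for short frames."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n) for m in range(n))
        out.append(s / n if inverse else s)
    return out


def cpsp_delay(ref, sig):
    """Delay of sig relative to ref, in samples, from the whitened cross-spectrum."""
    X1, X2 = dft(ref), dft(sig)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]
    # Keep only the phase of each bin; guard against near-zero magnitudes.
    whitened = [c / abs(c) if abs(c) > 1e-12 else 0.0 for c in cross]
    corr = [v.real for v in dft(whitened, inverse=True)]
    peak = max(range(len(corr)), key=lambda i: corr[i])
    n = len(corr)
    return peak if peak <= n // 2 else peak - n  # map upper half to negative lags


# Hypothetical test: the second microphone hears the same frame 3 samples later
# (circular shift, so the frame length stays fixed).
rng = random.Random(1)
N = 64
ref = [rng.uniform(-1.0, 1.0) for _ in range(N)]
sig = [ref[(n - 3) % N] for n in range(N)]
delay = cpsp_delay(ref, sig)
print(delay)
```

With a known microphone spacing and sampling rate, the delay in samples converts to an arrival-angle estimate, which is the quantity a speaker location system ultimately needs.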

Gaussian Density Selection Method of CDHMM in Speaker Recognition (화자인식에서 연속밀도 은닉마코프모델의 혼합밀도 결정방법)

  • 서창우;이주헌;임재열;이기용
    • The Journal of the Acoustical Society of Korea / v.22 no.8 / pp.711-716 / 2003
  • This paper proposes a method for selecting the optimal number of mixtures in each state of a Continuous Density HMM (Hidden Markov Model). Previously, researchers used the same number of mixture components in each state of the HMM regardless of the spectral characteristics of the speaker. To model each speaker as accurately as possible, we propose using a different number of mixture components for each state. The selection of mixture components considers the mixture probability values in each state, which strongly affect parameter estimation of the continuous density HMM. We also use PCA (principal component analysis) to reduce correlation and maintain the system's stability when the number of mixture components is reduced. In the experiments, the proposed method used on average 10% fewer mixture components than the conventional HMM. With mixture-component selection alone, the proposed method achieved similar performance. With principal component analysis, performance decreased by 0.35% on average for 16th-order feature vectors and improved by 0.65% on average for 25th-order feature vectors.

A Study on the Performance of Music Retrieval Based on the Emotion Recognition (감정 인식을 통한 음악 검색 성능 분석)

  • Seo, Jin Soo
    • The Journal of the Acoustical Society of Korea / v.34 no.3 / pp.247-255 / 2015
  • This paper presents a study on the performance of music search based on automatically recognized music-emotion labels. As with other media data, such as speech, image, and video, a song can evoke certain emotions in listeners. When people look for songs to listen to, the emotions evoked by songs can be important points to consider. However, very little study has been done on how music-emotion labels perform in music search. In this paper, we utilize the three axes of human music perception (valence, activity, tension) and the five basic emotion labels (happiness, sadness, tenderness, anger, fear) to measure music similarity for music search. Experiments were conducted on both genre and singer datasets. The search accuracy of the proposed emotion-based music search was up to 75% of that of the conventional feature-based music search. By combining the proposed emotion-based method with the feature-based method, we achieved up to a 14% improvement in search accuracy.

A Study on the Acoustical Characteristics of Last Consonants in Korean (국어 종성 자음의 음성학적 특징에 관한 연구)

  • Kim, Seon-Il;Hong, Ki-Won;Lee, Haing-Sei
    • The Journal of the Acoustical Society of Korea / v.14 no.1 / pp.65-72 / 1995
  • Auditory experiments were conducted for the 14 Korean consonants on the phonetic value of last consonants when the signal is sent through the amplifier from the end to the beginning, in short, as a time-reversed waveform. The last consonant then becomes the first consonant in the time-reversed waveform. The listeners who heard the 14 reversed consonants recorded the phonetic values they perceived. We analyzed these results by manner of articulation and place of articulation. According to the results, the phonetic values of the last consonants /n/, /l/ and /m/ are the same as those of the corresponding first consonants. Last consonant /d/ is heard like first consonant /n/. Last consonant /ng/ is heard like first consonant /m/. Last consonants /k/ and /b/ do not have any particular phonetic values. These results were tested by the experiments and analyzed according to the principles of articulation.

Korean Word Recognition Using Vector Quantization Speaker Adaptation (벡터 양자화 화자적응기법을 사용한 한국어 단어 인식)

  • Choi, Kap-Seok
    • The Journal of the Acoustical Society of Korea / v.10 no.4 / pp.27-37 / 1991
  • This paper proposes ESFVQ (energy subspace fuzzy vector quantization), which employs energy subspaces to reduce the quantizing distortion below that of fuzzy vector quantization. The ESFVQ is applied to a speaker adaptation method by which Korean words spoken by unknown speakers are recognized. By generating mapped codebooks with a fuzzy histogram for each energy subspace in the training procedure and by decoding a spoken word through the ESFVQ in the recognition procedure, we attempt to improve the recognition rate. The performance of the ESFVQ is evaluated by measuring the quantizing distortion and the speaker-adaptive recognition rate for DDD telephone area names uttered by 2 males and 1 female. The quantizing distortion of the ESFVQ is 22% lower than that of vector quantization and 5% lower than that of fuzzy vector quantization, and the speaker-adaptive recognition rate of the ESFVQ is 26% higher than that without speaker adaptation and 11% higher than that of vector quantization.
