Integrated Search | Korea Science

Detection of Laryngeal Pathology in Speech Using Multilayer Perceptron Neural Networks

강현민;김유신;김형순
- 대한음성학회:학술대회논문집 / 대한음성학회 conference proceedings, November 2002 / pp.115-118 / 2002
Neural networks are known to have great discriminative power in pattern classification problems. In this paper, multilayer perceptron neural networks are employed to automatically detect laryngeal pathology in speech. New feature parameters that reflect the periodicity of speech and its perturbation are also introduced. These parameters and cepstral coefficients are used as inputs to the multilayer perceptron neural networks. In experiments on a Korean disordered speech database, combining the new parameters with cepstral coefficients outperforms using cepstral coefficients alone.

Snorer-Dependent Snore Recognition Using LPC Cepstral Coefficients

최호선;장원규;이경중
- 대한전기학회논문지:시스템및제어부문D / Vol. 52, No. 9 / pp.554-559 / 2003
This paper suggests the possibility of snorer-dependent snore recognition using cepstral coefficients. Assuming that snoring and speech sounds have some similarities, we used cepstral coefficients, which are widely used for speech recognition. Snoring data were acquired from 18 persons, including 5 patients diagnosed with snoring. To evaluate the performance of the proposed method, the distance ratio based on LPC cepstral coefficients was selected as an index for snorer-dependent snore recognition. A distance ratio of 3 was selected as the optimal value, giving the most efficient snorer-dependent snore recognition with a high average accuracy of 95.05%. In conclusion, the proposed method shows potential for clinical application in snorer-dependent snore recognition.
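
As background for the LPC cepstral coefficients used here, the sketch below shows the standard recursion that converts linear prediction coefficients into cepstral coefficients, plus a plain Euclidean distance between two cepstral vectors. It assumes the all-pole model H(z) = 1 / (1 - sum_k a_k z^-k); the function names are illustrative, and the paper's "distance ratio" index is not defined in the abstract, so it is not reproduced.

```python
import math


def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients a_1..a_p into cepstral coefficients c_1..c_n.

    Assumption: all-pole model H(z) = 1 / (1 - sum_k a_k z^-k).
    a[0] holds a_1, a[1] holds a_2, and so on.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        # The direct term a_n only exists while n <= p.
        acc = a[n - 1] if n <= p else 0.0
        # Recursive term: sum over k of (k/n) * c_k * a_(n-k), with n-k in 1..p.
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c


def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(c1, c2)))


if __name__ == "__main__":
    # Hypothetical first-order predictor; c_n should equal 0.5**n / n.
    print(lpc_to_cepstrum([0.5], 4))
```

For a single-pole model the recursion reduces to c_n = a^n / n, which gives a quick sanity check on the sign convention.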

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition

최영호;반성민;김경화;김형순
- 말소리와 음성과학 / Vol. 7, No. 1 / pp.3-10 / 2015
In this paper, different frequency scales in cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to the speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results with both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.
https://doi.org/10.13064/KSSS.2015.7.1.003
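
For reference, the conventional delta and delta-delta features that the abstract uses as a baseline are commonly computed by a regression over neighboring frames. The sketch below assumes that common regression formula, with `n_side` frames on each side and edge frames repeated; the function name and window size are illustrative and not taken from the paper, and the CTM-based features are not shown.

```python
def delta(frames, n_side=2):
    """Regression-based delta features for a list of per-frame coefficient vectors.

    d_t = sum_{n=1..N} n * (c_{t+n} - c_{t-n}) / (2 * sum_{n=1..N} n^2)
    Frames beyond either end are replaced by the first or last frame (assumption).
    """
    denom = 2 * sum(n * n for n in range(1, n_side + 1))
    num_frames = len(frames)
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[0])):
            num = sum(
                n * (frames[min(num_frames - 1, t + n)][d] - frames[max(0, t - n)][d])
                for n in range(1, n_side + 1)
            )
            row.append(num / denom)
        out.append(row)
    return out


if __name__ == "__main__":
    # Hypothetical one-dimensional cepstral track that rises by 1 per frame.
    ramp = [[float(t)] for t in range(5)]
    first_order = delta(ramp)          # delta features
    second_order = delta(first_order)  # delta-delta features
    print(first_order, second_order)
```

Applying the same function twice gives the delta-delta features; on a linear ramp the interior delta equals the slope.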

Filtering of Filter-Bank Energies for Robust Speech Recognition

Jung, Ho-Young
- ETRI Journal / Vol. 26, No. 3 / pp.273-276 / 2004
We propose a novel feature processing technique that provides a cepstral liftering effect in the log-spectral domain. Cepstral liftering aims to equalize the variance of cepstral coefficients for distance-based speech recognizers and, as a result, provides robustness to additive noise and speaker variability. However, in the popular hidden Markov model based framework, cepstral liftering has no effect on recognition performance. We derive a filtering method in the log-spectral domain that corresponds to cepstral liftering. The proposed method performs high-pass filtering based on the decorrelation of filter-bank energies. We show that in noisy speech recognition, the proposed method reduces the error rate by 52.7% relative to the conventional feature.

A 3-Level Endpoint Detection Algorithm for Isolated Speech Using Time and Frequency-based Features

Eng, Goh Kia;Ahmad, Abdul Manan
- 제어로봇시스템학회:학술대회논문집 / 제어로봇시스템학회 ICCAS 2004 / pp.1291-1295 / 2004
This paper proposes a new approach to endpoint detection for isolated speech that significantly improves endpoint detection performance. The proposed algorithm relies on the root mean square (RMS) energy, the zero crossing rate, and spectral characteristics of the speech signal, where a Euclidean distance measure on cepstral coefficients is used to accurately detect the endpoints of isolated speech. The algorithm offers better performance than the traditional energy-based algorithm. The vocabulary for the experiment consists of the English digits from one to nine. The experiments were conducted on 360 utterances from a male speaker. Experimental results show that the accuracy of the algorithm is quite acceptable. Moreover, the computational overhead of the algorithm is low, since the cepstral coefficients are reused later in the feature extraction stage of the speech recognition procedure.
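
The two time-domain features named in the abstract, RMS energy and zero crossing rate, can be sketched per frame as follows. This is only an illustration of those two features under assumed framing parameters; the helper names are hypothetical, and the paper's three-level decision logic and thresholds are not given in the abstract, so they are not reproduced.

```python
import math


def frame_signal(x, frame_len, hop):
    """Split a sample list into overlapping frames; a trailing partial frame is dropped."""
    return [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, hop)]


def rms_energy(frame):
    """Root mean square amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as non-negative)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


if __name__ == "__main__":
    # Hypothetical toy signal: silence followed by a rapidly alternating burst.
    signal = [0.0] * 8 + [1.0, -1.0] * 4
    for frame in frame_signal(signal, frame_len=4, hop=4):
        print(rms_energy(frame), zero_crossing_rate(frame))
```

Silent frames give low values for both features, while the alternating burst gives high RMS energy and a zero crossing rate of 1.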

Sound Reinforcement Based on Context Awareness for Hearing Impaired

최재훈;장준혁
- 대한전자공학회논문지SP / Vol. 48, No. 5 / pp.109-114 / 2011
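In this paper, we propose a sound reinforcement algorithm for the hearing impaired that uses acoustic data and is based on a context-awareness system built on a Gaussian mixture model (GMM). Mel-frequency cepstral coefficient (MFCC) feature vectors are extracted from the acoustic signal data to build the GMM, and sound reinforcement is applied when the context-awareness result indicates a dangerous sound. Experimental results show that the proposed context-awareness-based sound reinforcement algorithm performs well in various acoustic environments.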

Text-Independent Speaker Identification System Based On Vowel And Incremental Learning Neural Networks

Heo, Kwang-Seung;Lee, Dong-Wook;Sim, Kwee-Bo
- 제어로봇시스템학회:학술대회논문집 / 제어로봇시스템학회 ICCAS 2003 / pp.1042-1045 / 2003
In this paper, we propose a speaker identification system that uses vowels, which carry the speaker's characteristics. The system is divided into a speech feature extraction part and a speaker identification part. The speech feature extraction part extracts the speaker's features. Voiced speech has characteristics that distinguish speakers. For vowel extraction, formants obtained from voiced speech through frequency analysis are used. The vowel 'a', which has distinct formants, is extracted from the text. Pitch, formants, intensity, log area ratios, LP coefficients, and cepstral coefficients are candidate methods for extracting the characteristics. The cepstral coefficients, which show the best speaker identification performance among these methods, are used. The speaker identification part distinguishes speakers using a neural network. 12th-order cepstral coefficients are used as the learning input data. The neural network is an MLP trained with BP (backpropagation). Hidden nodes and output nodes are added incrementally. The nodes in the incremental learning neural network are interconnected via weighted links, and each node in a layer is generally connected to each node in the succeeding layer, with the output nodes providing the output of the network. Through vowel extraction and incremental learning, the proposed system uses little training data, reduces learning time, and improves the identification rate.

Mel-Frequency Cepstral Coefficients Using Formants-Based Gaussian Distribution Filterbank

손영우;홍재근
- 한국음향학회지 / Vol. 25, No. 8 / pp.370-374 / 2006
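Mel-frequency cepstral coefficients (MFCCs) are the most widely used feature vectors for speech recognition. In MFCC extraction, the input speech signal is Fourier transformed, a filter is applied to each frequency band to obtain its energy, and a discrete cosine transform is applied to obtain the coefficients. In this paper, when applying the mel-scaled band filters, the output energy of each filter is computed by applying a per-band weight obtained from a weighting function. The weighting function gives the bands adjacent to a band containing a formant a Gaussian distribution centered on that band. In experiments with the proposed method, the recognition rate for speech signals with almost no noise is similar to that of conventional MFCCs, and the more noise there is, the greater the improvement in recognition rate from the weighted method.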
https://doi.org/10.7776/ASK.2006.25.8.370
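
The last stages of the extraction described in this abstract (per-band weighting of filter energies, logarithm, discrete cosine transform) can be sketched as below. The abstract does not state how the Gaussian weights are combined with the band energies, so the multiplicative form `1 + gain * weight`, the `gain` and `sigma` parameters, and all function names are assumptions made for illustration only. The mel filterbank and formant estimation are assumed to be done elsewhere.

```python
import math


def gaussian_band_weights(n_bands, formant_band, sigma):
    """Gaussian weights over band indices, peaking at the band that holds a formant."""
    return [math.exp(-0.5 * ((m - formant_band) / sigma) ** 2) for m in range(n_bands)]


def dct_ii(values, n_coeffs):
    """Unnormalized DCT-II, as commonly used for the final MFCC step."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (m + 0.5) / n) for m, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def weighted_mfcc(band_energies, formant_band, sigma, n_coeffs, gain=1.0):
    """Cepstral coefficients from positive band energies with formant-centred weighting.

    Assumption: each band energy is scaled by (1 + gain * gaussian_weight), so bands
    far from the formant keep roughly their original energy.
    """
    weights = gaussian_band_weights(len(band_energies), formant_band, sigma)
    log_energies = [
        math.log((1.0 + gain * w) * e) for w, e in zip(weights, band_energies)
    ]
    return dct_ii(log_energies, n_coeffs)


if __name__ == "__main__":
    # Hypothetical energies for 6 mel bands, with a formant assumed in band 2.
    energies = [0.8, 1.5, 4.0, 1.2, 0.6, 0.3]
    print(weighted_mfcc(energies, formant_band=2, sigma=1.0, n_coeffs=4))
```

Setting `gain=0.0` recovers the unweighted log-energy cepstrum, which makes it easy to compare the weighted and conventional features on the same input.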

A Study on Comfortableness Evaluation Technique of Chairs using Electroencephalogram

김동준
- 대한전기학회논문지:시스템및제어부문D / Vol. 52, No. 12 / pp.702-707 / 2003
This study describes a new technique for human sensibility evaluation using the electroencephalogram (EEG). EEG production is assumed to be linear. The linear predictor coefficients and the linear cepstral coefficients of the EEG are used as feature parameters of sensibility, and their pattern classification performances are compared. Using the better parameter, a human sensibility evaluation algorithm is designed. The results are as follows. The linear predictor coefficients showed better pattern classification performance than the linear cepstral coefficients. Then, using the linear predictor coefficients as the feature parameter, a human sensibility evaluation algorithm was developed based on a multilayer neural network. This algorithm achieved 90% accuracy in comfortableness evaluation despite fluctuations in the statistics of the EEG signal.

Speech/Music Discrimination Using Mel-Cepstrum Modulation Energy

김봉완;최대림;이용주
- 대한음성학회지:말소리 / No. 64 / pp.89-103 / 2007
In this paper, we introduce mel-cepstrum modulation energy (MCME) as a feature for discriminating speech from music. MCME is a mel-cepstrum domain extension of modulation energy (ME). MCME is extracted from the time trajectory of mel-frequency cepstral coefficients, while ME is based on the spectrum. As cepstral coefficients are mutually uncorrelated, we expect MCME to perform better than ME. To find the best modulation frequency for MCME, we perform experiments with modulation frequencies from 4 Hz to 20 Hz. To show the effectiveness of the proposed feature, we compare its discrimination accuracy with the results obtained from ME and the cepstral flux.

114 search results (processing time: 0.025 s)
