Search | Korea Science

Evaluation of Frequency Warping Based Features and Spectro-Temporal Features for Speaker Recognition (화자인식을 위한 주파수 워핑 기반 특징 및 주파수-시간 특징 평가)

Choi, Young Ho;Ban, Sung Min;Kim, Kyung-Wha;Kim, Hyung Soon
Phonetics and Speech Sciences, v.7 no.1, pp.3-10, 2015
In this paper, different frequency scales in cepstral feature extraction are evaluated for text-independent speaker recognition. To this end, mel-frequency cepstral coefficients (MFCCs), linear frequency cepstral coefficients (LFCCs), and bilinear warped frequency cepstral coefficients (BWFCCs) are applied to a speaker recognition experiment. In addition, the spectro-temporal features extracted by the cepstral-time matrix (CTM) are examined as an alternative to the delta and delta-delta features. Experiments on the NIST speaker recognition evaluation (SRE) 2004 task are carried out using the Gaussian mixture model-universal background model (GMM-UBM) method and the joint factor analysis (JFA) method, both based on the ALIZE 3.0 toolkit. Experimental results with both methods show that BWFCC with an appropriate warping factor yields better performance than MFCC and LFCC. It is also shown that the feature set including the spectro-temporal information based on the CTM outperforms the conventional feature set including the delta and delta-delta features.
https://doi.org/10.13064/KSSS.2015.7.1.003
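The abstract above does not state the warping function behind the BWFCC features. As a rough illustration only, the sketch below assumes the commonly used first-order all-pass (bilinear) warp of the frequency axis; the function name `bilinear_warp`, the parameter name `alpha`, and the value 0.4 are illustrative assumptions, not taken from the paper.

```python
import math

def bilinear_warp(omega, alpha):
    """Map a normalized angular frequency omega (0..pi) through a first-order
    all-pass (bilinear) warp. alpha = 0 leaves the axis linear.
    Assumed form; the paper's exact definition is not given in the abstract."""
    return omega + 2.0 * math.atan2(alpha * math.sin(omega),
                                    1.0 - alpha * math.cos(omega))

# Warp a few points on the frequency axis for two illustrative factors.
for alpha in (0.0, 0.4):
    warped = [round(bilinear_warp(k * math.pi / 4, alpha), 3) for k in range(5)]
    print(alpha, warped)
```

Under this assumed form, the endpoints 0 and pi stay fixed, and a positive `alpha` stretches the low-frequency region, which is how a single factor can move the scale between a linear axis and a mel-like one.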

Comparison of the recognition performance of Korean connected digit telephone speech depending on channel compensation methods and feature parameters (채널보상기법 및 특징파라미터에 따른 한국어 연속숫자음 전화음성의 인식성능 비교)

Jung Sung Yun;Kim Min Sung;Son Jong Mok;Bae Keun Sung;Kim Sang Hun
Proceedings of the KSPS conference, 2002.11a, pp.201-204, 2002
As a preliminary study for improving the recognition performance of connected digit telephone speech, we investigate feature parameters as well as channel compensation methods for telephone speech. CMN and RTCN are examined for telephone channel compensation, and MFCC, DWFBA, SSC and their delta features are examined as feature parameters. Recognition experiments with a database we collected show that, at the feature level, DWFBA is better than MFCC and that, for channel compensation, RTCN is better than CMN. The DWFBA+Delta_ Mel-SSC feature shows the highest recognition rate.

Robust Feature Normalization Scheme Using Separated Eigenspace in Noisy Environments (분리된 고유공간을 이용한 잡음환경에 강인한 특징 정규화 기법)

Lee Yoonjae;Ko Hanseok
The Journal of the Acoustical Society of Korea, v.24 no.4, pp.210-216, 2005
We propose a new feature normalization scheme based on eigenspace for achieving robust speech recognition. In general, mean and variance normalization (MVN) is performed in the cepstral domain. However, another MVN approach using eigenspace was recently introduced, in which the normalization procedure is performed in a single eigenspace. This procedure consists of a linear PCA matrix feature transformation followed by mean and variance normalization of the transformed cepstral feature. In this method, the 39-dimensional feature distribution is represented using only a single eigenspace. However, it is observed to be insufficient to represent the whole data distribution using only a single eigenvector. For a more specific representation, we apply unique and independent eigenspaces to the cepstra, delta and delta-delta cepstra, respectively, in this paper. We also normalize the training data in eigenspace and obtain the model from the normalized training data. Finally, a feature space rotation procedure is introduced to reduce the mismatch between the training and test data distributions in noisy conditions. As a result, we obtained a substantial recognition improvement over the basic eigenspace normalization.
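The mean and variance normalization step mentioned above can be sketched as follows. This is a minimal illustration that covers only the per-dimension MVN; the PCA projection into an eigenspace and the feature space rotation described in the abstract are not shown. The function name `mean_variance_normalize` and the toy data are assumptions for illustration.

```python
import math

def mean_variance_normalize(frames):
    """Normalize each feature dimension to zero mean and unit variance
    across an utterance. frames: list of equal-length lists of floats."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    stds = []
    for d in range(dims):
        var = sum((f[d] - means[d]) ** 2 for f in frames) / n
        # Guard against a constant dimension to avoid division by zero.
        stds.append(math.sqrt(var) or 1.0)
    return [[(f[d] - means[d]) / stds[d] for d in range(dims)] for f in frames]

utterance = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]  # toy 2-dimensional features
normalized = mean_variance_normalize(utterance)
```

In an eigenspace variant, the same function would presumably be applied to features after they have been projected by a PCA matrix, rather than to the raw cepstra.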

A Study on Automatic Classification of Fingerprint Images (지문 영상의 자동 분류에 관한 연구)

Lim, In-Sic;Sin, Tae-Min;Park, Goo-Man;Lee, Byeong-Rae;Park, Kyu-Tae
Proceedings of the KIEE Conference, 1988.07a, pp.628-631, 1988
This paper describes fingerprint classification based on feature points (whorl, core) and a feature vector, and uses a syntactic approach to identify the shape of the flow line around the core. The fingerprint image is divided into 8 by 8 subregions, and the fingerprint region is separated from the background. For each subregion of the fingerprint region, the dominant ridge direction is obtained using a slit window quantized in 8 directions, and relaxation is performed to correct the ridge direction code. Feature points (whorl, core, delta) are found from the ridge direction code. The first classification procedure divides the fingerprint types into 4 classes based on whorls and cores. For fingerprints that have one or two cores, the shape of the flow line around the core is obtained by tracing and is represented as a string. If the string is accepted by an LR(1) parser, a feature vector is obtained from the feature points (whorl, core, delta) and the shape of the flow line around the core. The feature vector is used hierarchically and linearly to classify the fingerprint again. The experiment resulted in 97.3 percent successful classification for 71 fingerprint impressions.

Effective Combination of Temporal Information and Linear Transformation of Feature Vector in Speaker Verification (화자확인에서 특징벡터의 순시 정보와 선형 변환의 효과적인 적용)

Seo, Chang-Woo;Zhao, Mei-Hua;Lim, Young-Hwan;Jeon, Sung-Chae
Phonetics and Speech Sciences, v.1 no.4, pp.127-132, 2009
The feature vectors used in conventional speaker recognition (SR) systems may be highly correlated with their neighbors. To improve SR performance, many researchers have adopted linear transformation methods such as principal component analysis (PCA). In general, the linear transformation of the feature vectors is based on the concatenated form of the static features and their dynamic features. However, a linear transformation based on both the static features and their dynamic features is more complex than one based on the static features alone because of the high order of the features. To overcome this problem, we propose an efficient method that applies linear transformation and temporal information of the features to reduce complexity and improve performance in speaker verification (SV). The proposed method first performs a linear transformation by PCA coefficients. The delta parameters for temporal information are then obtained from the transformed features. The proposed method requires a covariance matrix only 1/4 the size of the one needed when the static and dynamic features are combined for the PCA coefficients. Also, the delta parameters are extracted from the linearly transformed features after the dimension of the static features has been reduced. Compared with the PCA and conventional methods in terms of equal error rate (EER) in SV, the proposed method shows better performance while requiring less storage space and complexity.
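The 1/4 figure follows from covariance size: a D-dimensional static vector needs a D x D covariance matrix, while static plus delta features concatenated (2D dimensions) need a 2D x 2D matrix. The sketch below checks that arithmetic and computes delta parameters with the widely used regression formula, which is an assumption here since the abstract does not say how the deltas are computed. The PCA projection itself is not shown; `frames` stands for features that have already been transformed, and `static_dim = 12` is a hypothetical order.

```python
def delta(frames, window=2):
    """Regression-based delta features with edge frames repeated as padding.
    frames: list of equal-length lists (assumed already PCA-transformed)."""
    denom = 2 * sum(n * n for n in range(1, window + 1))
    last = len(frames) - 1
    out = []
    for t in range(len(frames)):
        row = []
        for d in range(len(frames[0])):
            acc = 0.0
            for n in range(1, window + 1):
                ahead = frames[min(t + n, last)][d]
                behind = frames[max(t - n, 0)][d]
                acc += n * (ahead - behind)
            row.append(acc / denom)
        out.append(row)
    return out

def covariance_entries(dim):
    """Number of entries in a full dim x dim covariance matrix."""
    return dim * dim

static_dim = 12  # hypothetical static feature order
ratio = covariance_entries(static_dim) / covariance_entries(2 * static_dim)
print(ratio)  # 0.25
```

Computing the deltas after the transform means PCA only ever sees the static vectors, which is where the reduction in covariance size comes from.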

RECOGNIZING SIX EMOTIONAL STATES USING SPEECH SIGNALS

Kang, Bong-Seok;Han, Chul-Hee;Youn, Dae-Hee;Lee, Chungyong
Proceedings of the Korean Society for Emotion and Sensibility Conference, 2000.04a, pp.366-369, 2000
This paper examines three algorithms for recognizing a speaker's emotion from speech signals. The target emotions are happiness, sadness, anger, fear, boredom and the neutral state. MLB (Maximum-Likelihood Bayes), NN (Nearest Neighbor) and HMM (Hidden Markov Model) algorithms are used as the pattern matching techniques. In all cases, pitch and energy are used as the features. The feature vectors for MLB and NN are composed of pitch mean, pitch standard deviation, energy mean, energy standard deviation, etc. For HMM, vectors of delta pitch with delta-delta pitch and delta energy with delta-delta energy are used. We recorded a corpus of emotional speech data and performed a subjective evaluation of the data. The subjective recognition result was 56% and was compared with the classifiers' recognition rates. The MLB, NN, and HMM classifiers achieved recognition rates of 68.9%, 69.3% and 89.1%, respectively, for speaker-dependent and context-independent classification.

A Study on Image Recognition based on the Characteristics of Retinal Cells (망막 세포 특성에 의한 영상인식에 관한 연구)

Cho, Jae-Hyun;Kim, Do-Hyeon;Kim, Kwang-Baek
Journal of the Korea Institute of Information and Communication Engineering, v.11 no.11, pp.2143-2149, 2007
The visual cortex stimulator is one type of artificial retinal prosthesis for blind people; it stimulates brain cells directly, bypassing the processing of information from the retina to the visual cortex. In this paper, we propose an image construction and recognition model that resembles human visual processing by recognizing feature data with orientation information, which is a characteristic of the visual cortex. After image features are extracted with the Kirsch edge detector, a back-propagation algorithm based on delta-bar-delta is used for recognition. Various numerical patterns are used to analyze the performance of the proposed method. In the experiments, the proposed recognition model, which extracts image characteristics using the orientation information passed from retinal cells to the visual cortex, makes little difference in recognition rate but, like the human visual system, is not sensitive to a variety of learning rates.
https://doi.org/10.6109/jkiice.2007.11.11.2143

Intra-and Inter-frame Features for Automatic Speech Recognition

Lee, Sung Joo;Kang, Byung Ok;Chung, Hoon;Lee, Yunkeun
ETRI Journal, v.36 no.3, pp.514-517, 2014
In this paper, alternative dynamic features for speech recognition are proposed. The goal of this work is to improve speech recognition accuracy by deriving a representation of distinctive dynamic characteristics from a speech spectrum. This work was inspired by two temporal dynamics of a speech signal. One is the highly non-stationary nature of speech, and the other is the inter-frame change of a speech spectrum. We adopt a sub-frame spectrum analyzer to capture very rapid spectral changes within a speech analysis frame. In addition, we attempt to measure spectral fluctuations in a more complex manner than traditional dynamic features such as delta or double-delta do. To evaluate the proposed features, speech recognition tests over smartphone environments were conducted. The experimental results show that the feature streams simply combined with the proposed features are effective in improving the recognition accuracy of a hidden Markov model-based speech recognizer.
https://doi.org/10.4218/etrij.14.0213.0181

New Data Extraction Method using the Difference in Speaker Recognition (화자인식에서 차분을 이용한 새로운 데이터 추출 방법)

Seo, Chang-Woo;Ko, Hee-Ae;Lim, Yong-Hwan;Choi, Min-Jung;Lee, Youn-Jeong
Speech Sciences, v.15 no.3, pp.7-15, 2008
This paper proposes a method for extracting new feature vectors using the difference between the cepstrum, which represents static characteristics, and the delta cepstrum, which represents dynamic characteristics, in speaker recognition (SR). The proposed difference vector (DV) is an intermediate feature vector built from the difference between the static and dynamic characteristics, so it contains both simultaneously and can serve as a new feature vector. Compared with the conventional method, the proposed method obtains a new feature vector without adding parameters; it only needs to compute the difference between the cepstrum and the delta cepstrum. Experimental results show that the proposed method improves performance by more than 2.03% on average compared with the conventional method in speaker identification (SI).
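A minimal sketch of the difference vector idea, assuming an element-wise subtraction of the delta cepstrum from the static cepstrum (the abstract does not state the order of subtraction or any scaling, so both are assumptions). The function name `difference_vector` and the toy values are illustrative only.

```python
def difference_vector(cepstrum, delta_cepstrum):
    """Element-wise difference between static and delta cepstral coefficients.
    The output has the same dimension as either input."""
    if len(cepstrum) != len(delta_cepstrum):
        raise ValueError("static and delta vectors must have the same length")
    return [c - d for c, d in zip(cepstrum, delta_cepstrum)]

static = [1.5, -0.25, 0.75]   # toy static cepstral coefficients
dynamic = [0.5, 0.25, -0.25]  # toy delta cepstral coefficients
dv = difference_vector(static, dynamic)
```

Because the result keeps the dimension of the static vector, it mixes static and dynamic information without increasing the number of parameters, which matches the claim in the abstract.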

A study on Effective Feature Parameters Comparison for Speaker Recognition (화자인식에 효과적인 특징벡터에 관한 비교연구)

Park TaeSun;Kim Sang-Jin;Kwang Moon;Hahn Minsoo
Proceedings of the KSPS conference, 2003.05a, pp.145-148, 2003
In this paper, we carry out a comparative study of various feature parameters for effective speaker recognition, such as LPC, LPCC, MFCC, Log Area Ratio, Reflection Coefficients, Inverse Sine, and Delta Parameter. We also adopt cepstral liftering and cepstral mean subtraction to check their usefulness. Our recognition system is HMM-based and uses a 4-connected-Korean-digit speech database. The various experimental results will help in selecting the most effective parameter for speaker recognition.
