• Title/Summary/Keyword: Gaussian Mixture Models


Compromised feature normalization method for deep neural network based speech recognition (심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식)

  • Kim, Min Sik;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.12 no.3 / pp.65-71 / 2020
  • Feature normalization reduces the effect of environmental mismatch between training and test conditions by normalizing the statistical characteristics of acoustic feature parameters. It yields excellent performance improvements in traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition systems. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute this phenomenon to information loss caused by excessive feature normalization. We investigate whether there is a feature normalization method that maximizes speech recognition performance by properly reducing the impact of environmental mismatch while preserving information useful for training acoustic models. To this end, we introduce mean and exponentiated variance normalization (MEVN), which is a compromise between mean normalization (MN) and mean and variance normalization (MVN), and compare the performance of a DNN-based speech recognition system in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that MEVN obtains a slight performance improvement over MN and MVN, depending on the degree of variance normalization.
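
The abstract above does not give the MEVN formula. As a rough sketch, assuming the "degree of variance normalization" is an exponent `alpha` applied to the per-dimension standard deviation, MN and MVN fall out as the two end points (`alpha = 0` and `alpha = 1`). The function name `mevn` and the parameter `alpha` are hypothetical and not taken from the paper.

```python
import math


def mevn(frames, alpha):
    """Normalize each feature dimension over one utterance.

    frames: list of equal-length feature vectors, one per frame.
    alpha:  ASSUMED exponent on the standard deviation.
            alpha = 0 reduces to mean normalization (MN),
            alpha = 1 reduces to mean and variance normalization (MVN).
    """
    n = len(frames)
    dim = len(frames[0])
    # Per-dimension mean over all frames.
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    # Per-dimension (population) standard deviation.
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    eps = 1e-8  # guards against division by zero for constant dimensions
    return [
        [(f[d] - means[d]) / (max(stds[d], eps) ** alpha) for d in range(dim)]
        for f in frames
    ]


if __name__ == "__main__":
    toy = [[1.0, 10.0], [3.0, 30.0]]  # two frames, two feature dimensions
    for a in (0.0, 0.5, 1.0):
        print(a, mevn(toy, a))
```

Intermediate values of `alpha` shrink the variance only partially, which is one way to read the "compromise" the abstract describes.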

Fire-Smoke Detection Based on Video using Dynamic Bayesian Networks (동적 베이지안 네트워크를 이용한 동영상 기반의 화재연기감지)

  • Lee, In-Gyu;Ko, Byung-Chul;Nam, Jae-Yeol
    • The Journal of Korean Institute of Communications and Information Sciences / v.34 no.4C / pp.388-396 / 2009
  • This paper proposes a new fire-smoke detection method that uses features extracted from camera images and a pattern recognition technique. First, moving regions are detected by analyzing the frame difference between two consecutive images, and candidate smoke regions are generated by applying a smoke color model. A smoke region generally has a few characteristics, such as similar color, simple texture, and upward motion. Based on these characteristics, we extract brightness, wavelet high-frequency components, and motion vectors as features. Probability density functions of the three features are then generated from training data. The probabilistic models of the smoke region are applied to the observation nodes of our proposed Dynamic Bayesian Network (DBN) to account for temporal continuity. The proposed algorithm was successfully applied to various fire-smoke tasks, covering not only forest smoke but also real-world smoke, and showed better detection performance than the previous method.

CRNN-Based Korean Phoneme Recognition Model with CTC Algorithm (CTC를 적용한 CRNN 기반 한국어 음소인식 모델 연구)

  • Hong, Yoonseok;Ki, Kyungseo;Gweon, Gahgene
    • KIPS Transactions on Software and Data Engineering / v.8 no.3 / pp.115-122 / 2019
  • For Korean phoneme recognition, hidden Markov model-Gaussian mixture model (HMM-GMM) systems or hybrid models that combine an artificial neural network with an HMM have mainly been used. However, these approaches are limited in that they require force-aligned training corpora that are manually annotated by experts. Recently, researchers have used neural network-based phoneme recognition models that combine a recurrent neural network (RNN)-based structure with the connectionist temporal classification (CTC) algorithm to overcome the problem of obtaining manually annotated training data. Yet, in terms of implementation, these RNN-based models have another difficulty: the amount of data required grows as the structure becomes more sophisticated. This need for large amounts of data is particularly problematic for the Korean language, which lacks refined corpora. In this study, we apply the CTC algorithm, which does not require force-alignment, to create a Korean phoneme recognition model. Specifically, the phoneme recognition model is based on a convolutional neural network (CNN), which requires a relatively small amount of data and can be trained faster than RNN-based models. We present the results of two different experiments and the resulting best-performing phoneme recognition model, which distinguishes 49 Korean phonemes. The best-performing model combines a CNN with a 3-hop bidirectional LSTM and achieves a final phoneme error rate (PER) of 3.26. This PER is a considerable improvement over existing Korean phoneme recognition models, which report PERs ranging from 10 to 12.
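
For readers unfamiliar with the two ingredients named above, here is a minimal sketch of greedy CTC decoding (merge repeated labels, then drop blanks) and of a phoneme error rate computed as edit distance over reference length. The function names, the blank symbol `"_"`, and the choice to express PER as a percentage are assumptions for illustration. The abstract does not state how its PER was computed or in what units.

```python
def ctc_greedy_collapse(frame_labels, blank="_"):
    """Collapse a per-frame label sequence: merge repeats, then remove blanks."""
    collapsed = []
    previous = None
    for label in frame_labels:
        if label != previous and label != blank:
            collapsed.append(label)
        previous = label
    return collapsed


def edit_distance(ref, hyp):
    """Levenshtein distance between two label sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref, hyp):
    """Edit distance divided by reference length, as a percentage (assumed unit)."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    frames = ["k", "k", "_", "a", "a", "_", "n", "s", "s", "_"]
    hyp = ctc_greedy_collapse(frames)            # ['k', 'a', 'n', 's']
    ref = ["k", "a", "m", "s", "a"]
    print(hyp, phoneme_error_rate(ref, hyp))     # one substitution + one deletion
```

Because the collapse step maps many frame-level paths to one label sequence, training with CTC needs only the phoneme sequence per utterance, not frame-level alignments.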

Convergence performance comparison using combination of ML-SVM, PCA, VBM and GMM for detection of AD (알츠하이머 병의 검출을 위한 ML-SVM, PCA, VBM, GMM을 결합한 융합적 성능 비교)

  • Alam, Saurar;Kwon, Goo-Rak
    • Journal of the Korea Convergence Society / v.7 no.4 / pp.1-7 / 2016
  • Structural MRI (sMRI) is used to extract morphometric features after Grey Matter (GM), White Matter (WM), and Cerebro-spinal Fluid (CSF) segmentation for several univariate and multivariate methods. A new approach is applied to the diagnosis of very mild to mild AD. We propose a method for classifying Alzheimer's disease patients against normal controls by combining morphometric features and Gaussian Mixture Model parameters with the MMSE (Mini Mental State Examination) score. The combined features are fed into a multi-kernel SVM classifier after principal component analysis is applied to mitigate the curse of dimensionality. The experimental results show that the proposed diagnosis method yields up to 96% stratification accuracy with the multi-kernel SVM, along with sensitivity and specificity above 90%.

Noise Robust Speaker Verification Using Sub-Band Weighting (서브밴드 가중치를 이용한 잡음에 강인한 화자검증)

  • Kim, Sung-Tak;Ji, Mi-Kyong;Kim, Hoi-Rin
    • The Journal of the Acoustical Society of Korea / v.28 no.3 / pp.279-284 / 2009
  • Speaker verification determines whether a claimed speaker is accepted based on the score of the test utterance. In recent years, methods based on Gaussian mixture models and a universal background model have been the dominant approaches for text-independent speaker verification. Systems based on these methods provide very good performance under laboratory conditions. However, in real situations, their performance degrades dramatically. To overcome this degradation, the feature recombination method was proposed, but it has the drawback that the whole set of sub-band feature vectors is used to compute the likelihood scores. To deal with this drawback, our previous research proposed a modified feature recombination method that can use each sub-band likelihood score independently. In this paper, we propose a sub-band weighting method based on the sub-band signal-to-noise ratio, combined with the previously proposed modified feature recombination. The proposed method reduces errors by 28% compared with the conventional feature recombination method.
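
The abstract does not specify how sub-band SNR is mapped to weights. The sketch below shows one plausible scheme, labeled here as an assumption: clip each sub-band SNR (in dB) at a floor, normalize the clipped values to sum to one, and use them to combine independent sub-band scores. All names (`subband_weights`, `weighted_score`, `floor_db`) are hypothetical.

```python
def subband_weights(snr_db, floor_db=0.0):
    """Map per-sub-band SNR values (dB) to weights that sum to 1.

    ASSUMED scheme: SNR below `floor_db` contributes zero weight;
    the remainder is normalized linearly.
    """
    clipped = [max(s - floor_db, 0.0) for s in snr_db]
    total = sum(clipped)
    if total == 0.0:
        # Every band is at or below the floor: fall back to uniform weights.
        return [1.0 / len(snr_db)] * len(snr_db)
    return [c / total for c in clipped]


def weighted_score(subband_scores, snr_db, floor_db=0.0):
    """Combine independent sub-band likelihood scores with SNR-based weights."""
    weights = subband_weights(snr_db, floor_db)
    return sum(w * s for w, s in zip(weights, subband_scores))


if __name__ == "__main__":
    scores = [1.0, 2.0, -8.0, 0.0]     # hypothetical per-band log-likelihood ratios
    snr = [20.0, 10.0, -5.0, 10.0]     # hypothetical per-band SNR estimates in dB
    print(subband_weights(snr))        # the noisy third band gets zero weight
    print(weighted_score(scores, snr))
```

The point of scoring each sub-band independently is visible here: a band dominated by noise can be down-weighted without discarding the evidence from cleaner bands.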

Railway Track Extraction from Mobile Laser Scanning Data (모바일 레이저 스캐닝 데이터로부터 철도 선로 추출에 관한 연구)

  • Yoonseok, Jwa;Gunho, Sohn;Jong Un, Won;Wonchoon, Lee;Nakhyeon, Song
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.33 no.2 / pp.111-122 / 2015
  • This study introduces a new automated solution for detecting railway tracks and reconstructing track models from mobile laser scanning data. The proposed solution proceeds as follows. First, a potential railway region, called the Region Of Interest (ROI), is detected, and the orientation of the railway track trajectory is approximated from the raw data. Next, knowledge-based detection of railway tracks is performed to localize track candidates in the first strip. Here, a strip, which refers to the local track search region, is generated in the direction orthogonal to the orientation of the track trajectory. Lastly, an initial track model, generated over the candidate points detected by GMM-EM (Gaussian Mixture Model-Expectation & Maximization)-based clustering, grows strip-wise to capture all track points of interest and is thus converted into a geometric track model in a tracking-by-detection framework. The proposed track tracking process has the following key features. It reduces the complexity of detecting track points by using a hypothetical track model. It also enhances the efficiency of the track modeling process by simultaneously capturing track points and modeling tracks, which minimizes data processing time and cost. The proposed method was implemented in C++ and evaluated on LiDAR data acquired from an MMS over an urban railway track area with a complex railway scene.
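
To illustrate the GMM-EM clustering step in isolation, the following is a minimal one-dimensional sketch, not the authors' C++ implementation. It assumes, purely for illustration, that points within a strip are clustered by a single scalar such as height, so that rail-head points separate from the track bed. The data values, function names, and the deterministic initialization are all hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, k=2, iters=50, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM (requires k >= 2)."""
    lo, hi = min(data), max(data)
    # Deterministic initialization: spread the means evenly over the data range.
    means = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    centre = sum(data) / len(data)
    spread = max(sum((x - centre) ** 2 for x in data) / len(data), var_floor)
    variances = [spread] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each point.
        resp = []
        for x in data:
            p = [weights[j] * gaussian_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means, and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            variances[j] = max(
                sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj,
                var_floor,
            )
    return weights, means, variances


def assign(data, weights, means, variances):
    """Hard-assign each point to its most probable component."""
    k = len(means)
    return [
        max(range(k), key=lambda j: weights[j] * gaussian_pdf(x, means[j], variances[j]))
        for x in data
    ]


if __name__ == "__main__":
    # Hypothetical heights (m) in one strip: track bed near 0, rail head near 0.18.
    heights = [0.00, 0.02, -0.01, 0.01, 0.18, 0.17, 0.19, 0.18]
    w, m, v = fit_gmm_1d(heights)
    print(m, assign(heights, w, m, v))
```

The variance floor keeps a component from collapsing onto a single point, which matters for tightly clustered laser returns.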

A Method of Detecting the Aggressive Driving of Elderly Driver (노인 운전자의 공격적인 운전 상태 검출 기법)

  • Koh, Dong-Woo;Kang, Hang-Bong
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.537-542 / 2017
  • Aggressive driving is a major cause of car accidents. Previous studies have mainly analyzed the aggressive driving tendencies of young drivers, and they relied only on pure clustering or classification techniques from machine learning. However, since elderly people have different driving habits due to their fragile physical condition, a new method, such as one that enhances the characteristics of driving data, is needed to properly analyze the aggressive driving of elderly drivers. In this study, acceleration data collected from a smartphone in a moving vehicle are analyzed by a newly proposed ECA (Enhanced Clustering method for Acceleration data) technique, coupled with conventional clustering techniques (K-means clustering and the expectation-maximization (EM) algorithm). ECA selects high-intensity data from the cluster groups detected through K-means and EM across all subjects' data and models the characteristic data through scaled values. Using this method, unlike with pure clustering, aggressive driving data were collected for all young and elderly participants. We further found that K-means clustering has higher detection efficiency than the EM method. The K-means results also show that young drivers have a driving strength 1.29 times higher than that of elderly drivers. In conclusion, the proposed method can detect aggressive driving maneuvers in data from elderly drivers, who have low operating intensity, and can be used to construct a customized safe-driving system for them. In the future, it will be possible to detect abnormal driving conditions and to use the collected data for early warnings to drivers.
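
The abstract does not describe ECA in enough detail to reproduce it. The sketch below shows only the conventional part it builds on: plain K-means over acceleration magnitudes, followed by selecting the cluster with the largest centroid as the "high-intensity" group. That selection rule, the function names, and the sample values are assumptions for illustration, not the paper's method.

```python
def kmeans_1d(values, k=2, iters=20):
    """Plain K-means on scalar values (requires k >= 2)."""
    lo, hi = min(values), max(values)
    # Deterministic initialization: spread centroids evenly over the data range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: nearest centroid for each value.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels


def high_intensity(values, k=2):
    """Return the values in the cluster with the largest centroid (assumed rule)."""
    centroids, labels = kmeans_1d(values, k)
    top = max(range(k), key=lambda j: centroids[j])
    return [v for v, lab in zip(values, labels) if lab == top]


if __name__ == "__main__":
    # Hypothetical acceleration magnitudes (m/s^2) from a smartphone log.
    accel = [0.2, 0.3, 0.25, 2.1, 1.9, 0.1, 2.0]
    print(kmeans_1d(accel)[0])
    print(high_intensity(accel))
```

A driver whose strongest maneuvers are still mild would form a high-intensity cluster with a low centroid, which is consistent with the abstract's motivation for rescaling data from elderly drivers.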

Clustering of sediment characteristics in South Korean rivers and its expanded application strategy to H-ADCP based suspended sediment concentration monitoring technique (한국 하천의 지역별 유사특성의 군집화와 H-ADCP 기반 부유사 농도 관측 기법에의 활용 방안)

  • Noh, Hyoseob;Son, GeunSoo;Kim, Dongsu;Park, Yong Sung
    • Journal of Korea Water Resources Association / v.55 no.1 / pp.43-57 / 2022
  • Advances in measurement techniques have reduced measurement costs and enhanced safety, resulting in less uncertainty. For example, the acoustic Doppler current profiler (ADCP)-based suspended sediment concentration (SSC) measurement technique is being accepted as an alternative to the conventional data collection method. In Korean rivers, horizontal ADCPs (H-ADCPs) are mounted at automatic discharge monitoring stations, where SSC can be measured using the ADCP backscatter. However, automatic discharge monitoring stations and sediment monitoring stations do not always coincide, which makes the new technique infeasible at some stations. This work presents and analyzes H-ADCP-SSC models for 9 discharge monitoring stations in Korean rivers. Applying the Gaussian mixture model (GMM) to sediment-related variables (catchment area, particle size distributions of suspended sediment and bed material, and water discharge-sediment discharge curves) from 44 sediment monitoring stations reveals that these characteristics can distinguish sediment monitoring stations by region. Linking the two results, we propose a protocol for determining the H-ADCP-SSC model at stations where no H-ADCP-SSC model is available.