• Title/Summary/Keyword: GMM model (GMM모델)

Search Result 131, Processing Time 0.025 seconds

An Improved Speech Absence Probability Estimation based on Environmental Noise Classification (환경잡음분류 기반의 향상된 음성부재확률 추정)

  • Son, Young-Ho;Park, Yun-Sik;An, Hong-Sub;Lee, Sang-Min
    • The Journal of the Acoustical Society of Korea / v.30 no.7 / pp.383-389 / 2011
  • In this paper, we propose an improved speech absence probability estimation algorithm that applies environmental noise classification for speech enhancement. In the previous approach, the speech absence probability, which requires the a priori probability of speech absence, was derived from the microphone input signal and the noise signal, based on a threshold on the estimated a posteriori SNR. The proposed algorithm instead estimates the speech absence probability using a noise classification algorithm based on a Gaussian mixture model, so that the optimal parameters can be applied to each noise type, unlike the conventional fixed threshold and smoothing parameter. The performance of the proposed enhancement algorithm is evaluated by ITU-T P.862 PESQ (perceptual evaluation of speech quality) and a composite measure under various noise environments. The results verify that the proposed algorithm performs better than the conventional speech absence probability estimation algorithm.

Infrared Image Segmentation by Extracting and Merging Region of Interest (관심영역 추출과 통합에 의한 적외선 영상 분할)

  • Yeom, Seokwon
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.6 / pp.493-497 / 2016
  • Infrared (IR) imaging can detect targets that are not visible at night, so it has been widely used in security and defense systems. However, the quality of IR images is often degraded by low resolution and noise corruption. This paper addresses target segmentation in IR images. Multiple regions of interest (ROIs) are extracted by multi-level segmentation, and targets are segmented from the individual ROIs. Each level of the multi-level segmentation is composed of a k-means clustering algorithm, an expectation-maximization (EM) algorithm, and a decision process. The k-means clustering algorithm initializes the parameters of the Gaussian mixture model (GMM), and the EM algorithm iteratively estimates those parameters. Each pixel is assigned to one of the clusters during the decision process. This paper proposes the selection and merging of the extracted ROIs. ROIs are selectively merged in order to include the overlapping ROI windows. In the experiments, the proposed method is tested on an IR image capturing two pedestrians at night. A comparison with conventional methods shows that the proposed method outperforms them.
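
The k-means initialization, EM refinement, and per-pixel decision described in the abstract above can be sketched for scalar pixel intensities as follows. This is a minimal illustration, not the author's implementation: the function names (`kmeans_1d`, `fit_gmm_1d`, `assign`), the evenly spaced initial centers, the variance floor, and the sample intensities are all assumptions made for the example.

```python
import math

def kmeans_1d(values, k, iters=20):
    """Plain k-means on scalars; returns one cluster label per value."""
    lo, hi = min(values), max(values)
    # Assumed initialization: centers spread evenly over the value range.
    centers = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, l in zip(values, labels) if l == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def fit_gmm_1d(values, k=2, em_iters=50, var_floor=1e-6):
    """Fit a 1-D GMM: k-means sets the initial parameters, EM refines them."""
    n = len(values)
    labels = kmeans_1d(values, k)
    weights, means, variances = [], [], []
    for j in range(k):
        # Fall back to all values if k-means left a cluster empty.
        members = [v for v, l in zip(values, labels) if l == j] or list(values)
        mean = sum(members) / len(members)
        var = sum((v - mean) ** 2 for v in members) / len(members)
        weights.append(len(members) / n)
        means.append(mean)
        variances.append(max(var, var_floor))
    total_w = sum(weights)
    weights = [w / total_w for w in weights]
    for _ in range(em_iters):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [w * gaussian_pdf(v, m, s) for w, m, s in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var, var_floor)
    return weights, means, variances

def assign(values, weights, means, variances):
    """Decision step: give each value to the component with the largest weighted density."""
    k = len(weights)
    return [max(range(k), key=lambda j: weights[j] * gaussian_pdf(v, means[j], variances[j]))
            for v in values]

# Hypothetical intensities: a dark background and a bright target.
pixels = [10, 12, 11, 9, 200, 205, 198, 202]
w, m, v = fit_gmm_1d(pixels, k=2)
labels = assign(pixels, w, m, v)
```

In a real image the values would be the pixels of one ROI, and the component with the higher mean would typically be treated as the warm target.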

Comparison of drone-based hyperspectral and multispectral imagery for bathymetry mapping (드론기반 초분광영상과 다분광영상을 활용한 수심산정 비교)

  • Yeonghwa Gwon;Dongsu Kim;Siyoon Kwon;Hojun You
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.54-54 / 2023
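  • River basin surveys are conducted under the relevant legislation to provide the basic information needed to establish water management policy, and they cover the main survey items required for basin management, such as general status, water use, flood control, and environment and ecology. Among the survey methods, remote sensing uses drone monitoring imagery and satellite imagery for the safety management of flood control facilities such as dams and levees, water quality monitoring, river topography surveys, and riverbed change surveys. Recently, river survey research has used not only ordinary RGB imagery but also hyperspectral imagery containing hundreds of spectral bands. Hyperspectral imagery has high spectral resolution, which makes it usable for multi-item surveys, but because it contains a large amount of spectral information, the initially collected data are very large and the preprocessing required for analysis is demanding. In contrast, multispectral imagery, which collects spectral information in ten or fewer bands, can monitor the normalized difference vegetation index (NDVI) immediately from two bands and can be used to analyze crop growth status, so it is widely used in agriculture and forestry. Bathymetry studies using hyperspectral imagery have relied on optimal band ratio analysis (OBRA), which builds a depth map from the band ratio most highly correlated with the measured depth. In this study, the existing hyperspectral bathymetry technique was applied to multispectral imagery to examine whether depth can be estimated from data with a reduced number of spectral bands (lightweight data). Both hyperspectral and multispectral images were captured at the same site, a depth map was built from each, and the usefulness of multispectral imagery in the river field was evaluated. In addition, to overcome the limitations of the existing OBRA, the images were clustered with a Gaussian mixture model (GMM), which improved the accuracy of depth estimation.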
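
The OBRA step mentioned in this abstract can be illustrated with a brute-force search over band pairs. The sketch below assumes the commonly used formulation in which measured depth is regressed on the log of a band ratio and the pair with the highest coefficient of determination is kept. The function names and the synthetic three-band data are hypothetical, and the GMM clustering step of the study is not reproduced here.

```python
import math

def r_squared(x, y):
    """R^2 of a simple linear fit of y on x (squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)

def obra(spectra, depths):
    """Return (best R^2, band i, band j) over all band pairs i < j.

    spectra: one list of positive reflectances per sample, one value per band.
    depths:  measured depth for each sample.
    """
    n_bands = len(spectra[0])
    best = (-1.0, None, None)
    for i in range(n_bands):
        # ln(a/b) = -ln(b/a), so (i, j) and (j, i) give the same R^2.
        for j in range(i + 1, n_bands):
            x = [math.log(s[i] / s[j]) for s in spectra]
            r2 = r_squared(x, depths)
            if r2 > best[0]:
                best = (r2, i, j)
    return best

# Synthetic example: band 0 decays exponentially with depth, band 2 is constant.
depths = [0.5, 1.0, 1.5, 2.0, 2.5]
band1 = [0.2, 0.5, 0.1, 0.4, 0.3]
spectra = [[0.1 * math.exp(-0.5 * d), b1, 0.1] for d, b1 in zip(depths, band1)]
best_r2, band_i, band_j = obra(spectra, depths)
```

With hundreds of hyperspectral bands the pair search grows quadratically, which is one practical reason a multispectral sensor with ten or fewer bands is attractive for this kind of analysis.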

Compromised feature normalization method for deep neural network based speech recognition (심층신경망 기반의 음성인식을 위한 절충된 특징 정규화 방식)

  • Kim, Min Sik;Kim, Hyung Soon
    • Phonetics and Speech Sciences / v.12 no.3 / pp.65-71 / 2020
  • Feature normalization is a method to reduce the effect of environmental mismatch between the training and test conditions by normalizing the statistical characteristics of acoustic feature parameters. It yields substantial performance improvements in traditional Gaussian mixture model-hidden Markov model (GMM-HMM)-based speech recognition systems. However, in a deep neural network (DNN)-based speech recognition system, minimizing the effects of environmental mismatch does not necessarily lead to the best performance improvement. In this paper, we attribute this phenomenon to information loss caused by excessive feature normalization. We investigate whether there is a feature normalization method that maximizes speech recognition performance by properly reducing the impact of environmental mismatch while preserving information useful for training acoustic models. To this end, we introduce mean and exponentiated variance normalization (MEVN), which is a compromise between mean normalization (MN) and mean and variance normalization (MVN), and compare the performance of DNN-based speech recognition systems in noisy and reverberant environments according to the degree of variance normalization. Experimental results reveal that a slight performance improvement is obtained with MEVN over MN and MVN, depending on the degree of variance normalization.
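
One way to read the compromise between MN and MVN is as an exponent on the standard deviation, so that an exponent of 0 reduces to MN and an exponent of 1 to MVN. The sketch below follows that reading. The exact formula is an assumption inferred from the method's name, not taken from the paper, and `mevn` and `alpha` are hypothetical names.

```python
import math

def mevn(frames, alpha):
    """Normalize each feature dimension as (x - mean) / std**alpha.

    frames: list of feature vectors (one per frame) from a single utterance.
    alpha:  assumed degree of variance normalization; 0 gives MN, 1 gives MVN.
    """
    n = len(frames)
    dim = len(frames[0])
    out = [[0.0] * dim for _ in range(n)]
    for d in range(dim):
        column = [f[d] for f in frames]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        # Guard against constant dimensions, where the variance is zero.
        std = math.sqrt(var) if var > 0 else 1.0
        scale = std ** alpha
        for t in range(n):
            out[t][d] = (frames[t][d] - mean) / scale
    return out

# Two frames, two feature dimensions with very different spreads.
frames = [[1.0, 10.0], [3.0, 30.0]]
mn = mevn(frames, alpha=0.0)    # mean normalization only
mvn = mevn(frames, alpha=1.0)   # full mean and variance normalization
half = mevn(frames, alpha=0.5)  # partial variance normalization
```

Intermediate values of `alpha` keep part of the per-dimension scale, which matches the abstract's point that full normalization may discard information useful to a DNN.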

A New Face Tracking and Recognition Method Adapted to the Environment (환경에 적응적인 얼굴 추적 및 인식 방법)

  • Ju, Myung-Ho;Kang, Hang-Bong
    • The KIPS Transactions:PartB / v.16B no.5 / pp.385-394 / 2009
  • Face tracking and recognition are difficult problems because the face is a non-rigid object. The main causes of failure in tracking and recognizing faces are changes in face pose and environmental illumination. To solve these problems, we propose a nonlinear manifold framework for face pose and face illumination normalization. Specifically, to track and recognize a face in video with various pose variations, we approximate each face pose density by a single Gaussian density obtained by PCA (principal component analysis) on images sampled from training video sequences, and then construct a GMM (Gaussian mixture model) for each person. To solve the illumination problem, we decompose the face images into reflectance and illuminance using the SSR (single scale retinex) model. To obtain the normalized reflectance, the reflectance is rescaled by histogram equalization over a defined range. We then approximate the illuminance with the trained manifold, since the illuminance carries most of the variation caused by illumination. By combining these two features in our manifold framework, we obtain efficient face tracking and recognition results on indoor and outdoor video. To improve the video-based tracking results, we update the weights of each face pose density at each frame from the tracking result of the previous frame using the EM algorithm. Our experimental results show that our method is more efficient than other methods.

Digitally Modulated Signal Classification based on Higher Order Statistics of Cyclostationary Process (순환정상 프로세스의 고차 통계 특성을 이용한 디지털 변조인식)

  • Ahn, Woo-Hyun;Nah, Sun-Phil;Seo, Bo-Seok
    • Journal of Broadcast Engineering / v.19 no.2 / pp.195-204 / 2014
  • In this paper, we propose an automatic modulation classification method for ten digitally modulated baseband signals (2-FSK, 4-FSK, 8-FSK, MSK, BPSK, QPSK, 8-PSK, 16-QAM, 32-QAM, and 64-QAM) based on higher order statistics of a cyclostationary process. The first order cyclic moments and higher order cyclic cumulants of the signal are used as features of the modulated signals. The proposed method consists of two stages. In the first stage, we classify the modulated signals as M-FSK or non-FSK using the peaks of the first order cyclic moment. In the second stage, we apply a Gaussian mixture model-based classifier to classify the non-FSK signals. Simulation results are presented to evaluate the proposed scheme. The results show a high probability of correct classification even in the presence of frequency and phase offsets.
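
To give a feel for why higher order cumulants separate modulation types, the sketch below computes two fourth-order cumulants of unit-power symbol sequences. It is a simplification of the abstract's method: it uses ordinary (stationary) cumulants at zero lag rather than cyclic cumulants, and omits both the FSK stage and the GMM classifier. The function names and the ideal noise-free constellations are assumptions made for the example.

```python
def moment(x, p, q):
    """Mixed moment M_pq = E[x^(p-q) * conj(x)^q] of a complex sequence."""
    return sum((s ** (p - q)) * (s.conjugate() ** q) for s in x) / len(x)

def cumulants(x):
    """Fourth-order cumulants C40 and C42 of a zero-mean complex sequence.

    Uses the standard relations
        C40 = M40 - 3 * M20^2
        C42 = M42 - |M20|^2 - 2 * M21^2
    and assumes the sequence already has unit average power (M21 = 1).
    """
    m20 = moment(x, 2, 0)
    m21 = moment(x, 2, 1)
    m40 = moment(x, 4, 0)
    m42 = moment(x, 4, 2)
    c40 = m40 - 3 * m20 ** 2
    c42 = m42 - abs(m20) ** 2 - 2 * m21 ** 2
    return c40, c42

# Ideal, equiprobable, noise-free constellations with unit power.
bpsk = [1 + 0j, -1 + 0j]
qpsk = [1 + 0j, 1j, -1 + 0j, -1j]

bpsk_c40, bpsk_c42 = cumulants(bpsk)
qpsk_c40, qpsk_c42 = cumulants(qpsk)
```

Because the two constellations produce different cumulant values, features of this kind can be fed to a classifier, for example one GMM per modulation class.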

Window Production Method based on Low-Frequency Detection for Automatic Object Extraction of GrabCut (GrabCut의 자동 객체 추출을 위한 저주파 영역 탐지 기반의 윈도우 생성 기법)

  • Yoo, Tae-Hoon;Lee, Gang-Seong;Lee, Sang-Hun
    • Journal of Digital Convergence / v.10 no.8 / pp.211-217 / 2012
  • The conventional GrabCut algorithm is semi-automatic: the user must set a rectangular window that surrounds the object. This paper studies automatic object detection to solve this problem by detecting the salient region based on the human visual system. A saliency map is computed using the Lab color space, which is based on the color opponent theory of 'red-green' and 'blue-yellow'. Saliency points are then computed from the boundaries of the low-frequency regions extracted from the saliency map. Finally, rectangular windows are obtained from the coordinates of the saliency points, and these windows are used in the GrabCut algorithm to extract objects. Various experiments confirm that the proposed algorithm computes rectangular windows for salient regions and extracts the objects.

Feature Extraction Algorithm for Underwater Transient Signal Using Cepstral Coefficients Based on Wavelet Packet (웨이브렛 패킷 기반 캡스트럼 계수를 이용한 수중 천이신호 특징 추출 알고리즘)

  • Kim, Juho;Paeng, Dong-Guk;Lee, Chong Hyun;Lee, Seung Woo
    • Journal of Ocean Engineering and Technology / v.28 no.6 / pp.552-559 / 2014
  • In general, the number of underwater transient signals available for research on automatic recognition is very limited. Data-dependent feature extraction is one of the most effective methods in this case. Therefore, we suggest WPCC (wavelet packet cepstral coefficients) as a feature extraction method. A wavelet packet best tree for each data set is formed using an entropy-based cost function. Then, every terminal node of the best trees is counted to build a common wavelet best tree. It corresponds to a flexible, non-uniform filter bank that reflects the characteristics of the data set. A GMM (Gaussian mixture model) is used to classify five classes of underwater transient data sets. The error rate of WPCC is compared with that of MFCC (Mel-frequency cepstral coefficients). The error rates of WPCC-db20, WPCC-db40, and MFCC are 0.4%, 0%, and 0.4%, respectively, when the training data consist of six out of the nine pieces of data in each class. However, WPCC-db20 and WPCC-db40 show rates of 2.98% and 1.20%, respectively, while MFCC shows a rate of 7.14%, when the training data consist of only three pieces. This shows that WPCC is less sensitive to the number of training data pieces than MFCC. Thus, it could be a more appropriate method for underwater transient recognition. These results may be helpful in developing an automatic recognition system for underwater transient signals.

A Study of Sensor Fusion using Radar Sensor and Vision Sensor in Moving Object Detection (레이더 센서와 비전 센서를 활용한 다중 센서 융합 기반 움직임 검지에 관한 연구)

  • Kim, Se Jin;Byun, Ki Hun;Won, In Su;Kwon, Jang Woo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.16 no.2 / pp.140-152 / 2017
  • This paper presents a study of sensor fusion using a radar sensor and a vision sensor for moving object detection. A radar sensor has some problems in detecting objects: when the sensor is moved by wind or a similar disturbance, it can wrongly detect objects such as buildings or trees. A vision sensor is useful in many areas and is widely used, but it has weaknesses: it is easily affected by the ambient light, shaking of the sensor device, weather, and so on. This paper therefore proposes fusing these sensors to detect objects. Each sensor can compensate for the other's weaknesses, so this kind of sensor fusion makes object detection much more powerful.

Voice Personality Transformation Using a Probabilistic Method (확률적 방법을 이용한 음성 개성 변환)

  • Lee Ki-Seung
    • The Journal of the Acoustical Society of Korea / v.24 no.3 / pp.150-159 / 2005
  • This paper addresses a voice personality transformation algorithm that makes one person's voice sound like another person's voice. In the proposed method, a person's voice is represented by the LPC cepstrum, pitch period, and speaking rate, and appropriate transformation rules are constructed for each parameter. A Gaussian mixture model (GMM) is used to model one speaker's LPC cepstra, and a conditional probability is used to model the relationship between the two speakers' LPC cepstra. A maximum likelihood (ML) estimation method is employed to obtain the parameters of each probabilistic model. The transformed LPC cepstra are obtained using a minimum mean square error (MMSE) criterion. Pitch period and speaking rate are used as the parameters for prosody transformation, which is implemented using the ratio of their average values. The proposed method outperforms the previous VQ-based method in objective measures, including the average cepstrum distance reduction ratio and the likelihood increasing ratio. In the subjective test, we obtained almost the same correct identification ratio as the previous method, and we also confirmed that high-quality transformed speech is obtained, owing to spectral contours that evolve smoothly over time.
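
A GMM combined with an MMSE criterion is commonly realized as a posterior-weighted sum of per-component linear regressions from the source feature to the target feature. The sketch below shows that standard form for a single scalar feature. It is not claimed to be the exact conversion function of the paper, and the dictionary keys (`w`, `mx`, `my`, `vx`, `cxy`) and parameter values are hypothetical.

```python
import math

def gaussian_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def convert(x, components):
    """MMSE estimate of the target feature y given the source feature x.

    Each component describes a joint Gaussian over (x, y):
      w   mixture weight
      mx  mean of the source feature
      my  mean of the target feature
      vx  variance of the source feature
      cxy covariance between source and target
    The estimate is sum_i p(i | x) * (my_i + cxy_i / vx_i * (x - mx_i)).
    """
    likelihoods = [c["w"] * gaussian_pdf(x, c["mx"], c["vx"]) for c in components]
    total = sum(likelihoods)
    y = 0.0
    for c, lik in zip(components, likelihoods):
        posterior = lik / total  # p(i | x)
        y += posterior * (c["my"] + c["cxy"] / c["vx"] * (x - c["mx"]))
    return y

# Hypothetical two-component model of one cepstral coefficient.
model = [
    {"w": 0.5, "mx": -1.0, "my": -10.0, "vx": 1.0, "cxy": 0.0},
    {"w": 0.5, "mx": 1.0, "my": 10.0, "vx": 1.0, "cxy": 0.0},
]
converted = convert(0.3, model)
```

Because the posteriors change smoothly with `x`, the converted value also changes smoothly, unlike a VQ codebook mapping that jumps between discrete entries.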