• Title/Summary/Keyword: unsupervised model

Intrusion Detection Method Using Unsupervised Learning-Based Embedding and Autoencoder (비지도 학습 기반의 임베딩과 오토인코더를 사용한 침입 탐지 방법)

  • Junwoo Lee;Kangseok Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.8 / pp.355-364 / 2023
  • As advanced cyber threats have continued to increase in recent years, it has become difficult to detect new types of cyber attacks with existing pattern- or signature-based intrusion detection methods. Research on anomaly detection methods using data-driven artificial intelligence technology is therefore increasing. Supervised learning-based anomaly detection methods, however, are difficult to use in real environments because they require sufficient labeled data for training. For this reason, unsupervised learning-based methods that learn from normal data and detect anomalies by finding patterns in the data itself have been actively studied. This study aims to extract a latent vector that preserves useful sequence information from sequence log data and to develop an anomaly detection model using the extracted latent vector. Word2Vec was used to create a dense vector representation corresponding to the characteristics of each sequence, and unsupervised autoencoders were developed to extract latent vectors from the sequence data expressed as dense vectors. Three autoencoder models were developed: a denoising autoencoder based on the GRU (Gated Recurrent Unit) recurrent neural network, which is suitable for sequence data; an autoencoder based on a one-dimensional convolutional neural network, intended to address the limited short-term memory problem that GRU can have; and an autoencoder combining GRU and one-dimensional convolution. The experiments used the time-series-based NGIDS (Next Generation IDS Dataset) data. The autoencoder combining GRU and one-dimensional convolution performed better than the GRU-based or one-dimensional convolution-based autoencoders: it was more efficient in terms of the training time needed to extract useful latent patterns from the training data, and it showed more stable anomaly detection performance with smaller fluctuations.

Performance of Pseudomorpheme-Based Speech Recognition Units Obtained by Unsupervised Segmentation and Merging (비교사 분할 및 병합으로 구한 의사형태소 음성인식 단위의 성능)

  • Bang, Jeong-Uk;Kwon, Oh-Wook
    • Phonetics and Speech Sciences / v.6 no.3 / pp.155-164 / 2014
  • This paper proposes a new method for determining the recognition units for Korean large vocabulary continuous speech recognition (LVCSR) by applying unsupervised segmentation and merging. In the proposed method, a text sentence is segmented into morphemes, and position information is added to the morphemes. Submorpheme units are then obtained by splitting the morpheme units through maximization of posterior probability terms, which are computed from the morpheme frequency distribution, the morpheme length distribution, and the morpheme frequency-of-frequency distribution. Finally, the recognition units are obtained by sequentially merging the submorpheme pair with the highest frequency. Computer experiments are conducted using a Korean LVCSR system with a 100k-word vocabulary and a trigram language model trained on a 300-million-eojeol (word phrase) corpus. The proposed method is shown to reduce the out-of-vocabulary rate to 1.8% and to achieve a 14.0% relative reduction in the syllable error rate.
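
The final merging step described in this abstract, repeatedly joining the most frequent adjacent pair of units, can be sketched as follows. This is a minimal illustration in the style of byte-pair merging, not the authors' implementation: the function name `merge_most_frequent_pairs`, the toy units, and the fixed merge count are assumptions, and the posterior-probability splitting step that precedes merging in the paper is not shown.

```python
from collections import Counter


def merge_most_frequent_pairs(sequences, num_merges):
    """Repeatedly merge the most frequent adjacent pair of units.

    `sequences` is a list of unit lists (hypothetical submorpheme units).
    Returns the rewritten sequences and the list of merges applied.
    """
    seqs = [list(s) for s in sequences]
    merges = []
    for _ in range(num_merges):
        # Count every adjacent pair across all sequences.
        pair_counts = Counter()
        for seq in seqs:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break  # nothing left to merge
        (left, right), _count = pair_counts.most_common(1)[0]
        merges.append((left, right))
        merged_unit = left + right
        # Rewrite each sequence, replacing the chosen pair by the merged unit.
        new_seqs = []
        for seq in seqs:
            out, i = [], 0
            while i < len(seq):
                if i + 1 < len(seq) and seq[i] == left and seq[i + 1] == right:
                    out.append(merged_unit)
                    i += 2
                else:
                    out.append(seq[i])
                    i += 1
            new_seqs.append(out)
        seqs = new_seqs
    return seqs, merges


# Toy usage with placeholder units (not real Korean submorphemes).
units, applied = merge_most_frequent_pairs(
    [["a", "b", "c"], ["a", "b", "d"], ["a", "b", "c"]], num_merges=2
)
```

In practice the number of merges would presumably be chosen to reach a target vocabulary size; the abstract does not state the stopping rule.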

Determining the Optimal Number of Signal Clusters Using Iterative HMM Classification

  • Ernest, Duker Junior;Kim, Yoon Joong
    • International journal of advanced smart convergence / v.7 no.2 / pp.33-37 / 2018
  • In this study, we propose an iterative clustering algorithm that automatically groups a set of unlabeled voice signal data into an optimal number of clusters and generates a hidden Markov model (HMM) for each cluster. In the clustering process, cluster likelihoods are computed through iterative HMM training and testing while the number of clusters is varied for the given data, and maximum likelihood estimation is used to determine the optimal number of clusters. We tested the effectiveness of the algorithm on a small-vocabulary digit clustering task by mapping the unsupervised decoded output of the optimal clustering to the ground-truth transcription, and found that the two were highly correlated.

Cluster Analysis Algorithms Based on the Gradient Descent Procedure of a Fuzzy Objective Function

  • Rhee, Hyun-Sook;Oh, Kyung-Whan
    • Journal of Electrical Engineering and information Science / v.2 no.6 / pp.191-196 / 1997
  • Fuzzy clustering has played an important role in solving many problems, and the fuzzy c-means (FCM) algorithm is the method most frequently used for it. However, some fixed points of the FCM algorithm, known as Tucker's counterexample, are not reasonable solutions. Moreover, the FCM algorithm cannot perform on-line learning because it is basically a batch learning scheme. This paper presents unsupervised learning networks as an attempt to overcome these shortcomings of the conventional clustering algorithm. The model integrates the optimization function of the FCM algorithm into unsupervised learning networks. The learning rule of the proposed scheme is formally derived from the gradient descent procedure on a fuzzy objective function. From this derivation, two fuzzy cluster analysis algorithms are devised: a batch learning version and an on-line learning version. They are tested on several data sets and compared with FCM. The experimental results show that the proposed algorithms find a reasonable solution on Tucker's counterexample.
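
As a rough illustration of on-line learning by gradient descent on a fuzzy objective, the sketch below takes one gradient step per sample on the standard FCM objective J = Σ_i Σ_k u_ik^m ‖x_k − v_i‖², with memberships computed in closed form. It is an assumption-laden sketch, not the learning rule derived in the paper: the function names, the learning rate `lr`, the fuzzifier `m = 2`, and the toy data are all hypothetical.

```python
import random


def memberships(x, centers, m=2.0, eps=1e-12):
    """Standard FCM membership of point x in each cluster (values sum to 1)."""
    d2 = [sum((a - b) ** 2 for a, b in zip(x, c)) + eps for c in centers]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def online_fcm(points, centers, m=2.0, lr=0.1, epochs=50, seed=0):
    """On-line gradient descent on the FCM objective, one sample at a time.

    For a fixed membership u, the gradient of u**m * ||x - v||**2 with
    respect to v is -2 * u**m * (x - v); the factor 2 is absorbed into lr.
    """
    rng = random.Random(seed)
    centers = [list(c) for c in centers]
    order = list(range(len(points)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in random order each epoch
        for k in order:
            x = points[k]
            u = memberships(x, centers, m)
            for i, c in enumerate(centers):
                step = lr * (u[i] ** m)
                for dim in range(len(c)):
                    c[dim] += step * (x[dim] - c[dim])
    return centers


# Toy usage: two well-separated 1-D groups and rough initial centers.
data = [[0.0], [0.2], [-0.2], [10.0], [10.2], [9.8]]
learned = online_fcm(data, [[1.0], [9.0]])
```

Because each sample updates the centers immediately, this variant can process data as it arrives, which is the property that batch FCM lacks.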

A Classification Technique for Panchromatic Imagery Using Independent Component Analysis Feature Extraction

  • Byoun, Seung-Gun;Lee, Ho-Yong;Kim, Min;Lee, Kwae-Hi
    • Proceedings of the KSRS Conference / 2002.10a / pp.23-28 / 2002
  • Among methods for effective feature extraction from sets of small image patches, independent component analysis (ICA) has recently become well known as a stochastic method for finding informative basis images. ICA simultaneously learns both the basis images and the independent components using higher-order statistics, because the information underlying the relationships between pixels is sensitive to higher-order statistical models. Our experiment uses the topographic ICA model. This paper deals with an unsupervised classification strategy that uses the learned ICA basis images. In the experiments, the proposed classification technique outperforms classic texture analysis techniques on panchromatic KOMPSAT imagery.

Unsupervised Change Detection Using Iterative Mixture Density Estimation and Thresholding

  • Park, No-Wook;Chi, Kwang-Hoon
    • Proceedings of the KSRS Conference / 2003.11a / pp.402-404 / 2003
  • We present two methods for the automatic selection of threshold values in unsupervised change detection. Both methods consist of the same two procedures: 1) determining the parameters of Gaussian mixtures from a difference image or ratio image, and 2) determining threshold values using the Bayesian rule for minimum error. In the first method, the Expectation-Maximization algorithm is applied to estimate the parameters of the Gaussian mixtures. The second method is based on iterative thresholding, which successively applies thresholding and estimation of the model parameters. The effectiveness and applicability of the proposed methods are illustrated by an experiment on multi-temporal KOMPSAT-1 EOC images.
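
The first method, EM estimation of a Gaussian mixture followed by a minimum-error Bayes threshold, can be sketched as below. This is a minimal sketch under stated assumptions rather than the authors' code: it assumes exactly two components (no-change and change), one-dimensional difference values, a quartile-based initialization, and synthetic data; the function names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_two_gaussians(values, iters=50):
    """EM for a two-component 1-D Gaussian mixture."""
    n = len(values)
    lo, hi = min(values), max(values)
    mean_all = sum(values) / n
    var_all = sum((v - mean_all) ** 2 for v in values) / n
    weights = [0.5, 0.5]
    means = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]  # assumed initialization
    variances = [var_all, var_all]
    for _ in range(iters):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in values:
            p = [weights[j] * normal_pdf(v, means[j], variances[j]) for j in range(2)]
            total = max(p[0] + p[1], 1e-300)
            resp.append((p[0] / total, p[1] / total))
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(2):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / n
            means[j] = sum(r[j] * v for r, v in zip(resp, values)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, values)) / nj
            variances[j] = max(var_j, 1e-6)
    return weights, means, variances


def bayes_threshold(weights, means, variances, steps=60):
    """Minimum-error threshold: where the two weighted densities are equal.

    Found by bisection between the two means; assumes a single crossing there.
    """
    low, high = sorted(range(2), key=lambda j: means[j])

    def log_weighted(j, x):
        return (math.log(weights[j]) - 0.5 * math.log(2 * math.pi * variances[j])
                - (x - means[j]) ** 2 / (2 * variances[j]))

    a, b = means[low], means[high]
    for _ in range(steps):
        mid = 0.5 * (a + b)
        if log_weighted(low, mid) > log_weighted(high, mid):
            a = mid  # still on the side where the lower-mean class is more probable
        else:
            b = mid
    return 0.5 * (a + b)


# Synthetic difference values: mostly "no change" near 0, some "change" near 6.
rng = random.Random(0)
diff = [rng.gauss(0.0, 1.0) for _ in range(800)] + [rng.gauss(6.0, 1.0) for _ in range(200)]
w, mu, var = fit_two_gaussians(diff)
threshold = bayes_threshold(w, mu, var)
```

Pixels whose difference value exceeds `threshold` would be labeled as changed. With unequal priors the threshold shifts away from the midpoint of the two means toward the less frequent class.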

Improving Adversarial Domain Adaptation with Mixup Regularization

  • Bayarchimeg Kalina;Youngbok Cho
    • Journal of information and communication convergence engineering / v.21 no.2 / pp.139-144 / 2023
  • Engineers prefer deep neural networks (DNNs) for solving computer vision problems. However, DNNs pose two major problems. First, neural networks require large amounts of well-labeled data for training. Second, the covariate shift problem is common in computer vision problems. Domain adaptation has been proposed to mitigate this problem. Recent work on adversarial-learning-based unsupervised domain adaptation (UDA) has explained transferability and enabled the model to learn robust features. Despite this advantage, current methods do not guarantee the distinguishability of the latent space unless they consider class-aware information of the target domain. Furthermore, source and target examples alone cannot efficiently extract domain-invariant features from the encoded spaces. To alleviate the problems of existing UDA methods, we propose applying mixup regularization to the adversarial discriminative domain adaptation (ADDA) method. We validated the effectiveness and generality of the proposed method by performing experiments under three adaptation scenarios: MNIST to USPS, SVHN to MNIST, and MNIST to MNIST-M.
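
For readers unfamiliar with mixup, the basic operation is a convex combination of two examples and their labels with a coefficient drawn from a Beta distribution. The sketch below shows only this generic operation; how it is wired into ADDA is not specified in the abstract, and the function name, the `alpha` value, and the toy vectors are assumptions.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Generic mixup: blend two examples and their label vectors.

    lam is drawn from Beta(alpha, alpha); alpha here is an assumed value.
    """
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


# Toy usage: two 2-D feature vectors with one-hot labels.
mixed_x, mixed_y, lam = mixup([0.0, 1.0], [1.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                              rng=random.Random(0))
```

In a domain adaptation setting the two inputs could plausibly be a source and a target example, with the label vectors replaced by domain labels, but that pairing is an assumption here.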

Range Detection of Wa/Kwa Parallel Noun Phrase using a Probabilistic Model and Modification Information (확률모형과 수식정보를 이용한 와/과 병렬사구 범위결정)

  • Choi, Yong-Seok;Shin, Ji-Ae;Choi, Key-Sun
    • Journal of KIISE:Software and Applications / v.35 no.2 / pp.128-136 / 2008
  • Recognizing parallel structures at an early stage of sentence parsing can reduce the complexity of parsing. In this paper, we propose an unsupervised, language-independent probabilistic model for the recognition of parallel noun structures. The proposed model is based on the idea of swapping constituents, which relies on the properties of symmetry (two or more identical constituents are repeated) and reversibility (the order of constituents is interchangeable) in parallel structures. Non-symmetric patterns that cannot be captured by the general symmetry rule are additionally resolved using modifier information. In particular, this paper shows how the proposed model is applied to recognize Korean parallel noun phrases connected by the "wa/kwa" particle. Our model is compared with other models, including supervised models, and performs better at recognizing parallel noun phrases.

Unsupervised Clustering of Multivariate Time Series Microarray Experiments based on Incremental Non-Gaussian Analysis

  • Ng, Kam Swee;Yang, Hyung-Jeong;Kim, Soo-Hyung;Kim, Sun-Hee;Anh, Nguyen Thi Ngoc
    • International Journal of Contents / v.8 no.1 / pp.23-29 / 2012
  • Multiple expression levels of genes obtained from time series microarray experiments have been exploited effectively to enhance understanding of a wide range of biological phenomena. However, microarray data usually take the form of large, high-dimensional matrices of gene expression values. Among the huge number of genes present in microarrays, only a small number are expected to be relevant to a given task. Hence, discarding the majority of unaffected genes is the crucial goal of gene selection to improve the accuracy of disease diagnosis. In this paper, a non-Gaussian weight matrix obtained from an incremental model is proposed to extract useful features from multivariate time series microarrays. The proposed method can automatically identify a small number of significant features by discovering hidden variables among a huge number of features. A representative unsupervised hierarchical clustering method is then used to evaluate the effectiveness of the proposed methodology. Compared with existing analysis methods, the proposed method achieves promising results in terms of the predictive accuracy of clustering. Furthermore, the proposed method offers a robust approach with low memory and computation costs.

A Statistically Model-Based Adaptive Technique to Unsupervised Segmentation of MR Images (자기공명영상의 비지도 분할을 위한 통계적 모델기반 적응적 방법)

  • Kim, Tae-Woo
    • The Transactions of the Korea Information Processing Society / v.7 no.1 / pp.286-295 / 2000
  • We present a novel statistically adaptive method that uses the Minimum Description Length (MDL) principle for unsupervised segmentation of magnetic resonance (MR) images. In the method, Markov random field (MRF) modeling of tissue regions accounts for random noise. Intensity measurements in the local region defined by a window are modeled by a finite Gaussian mixture, which accounts for image inhomogeneities. The segmentation algorithm is based on the iterated conditional modes (ICM) algorithm; it approximately finds the maximum a posteriori (MAP) estimate and estimates the model parameters on the local region. The size of the window for parameter estimation and segmentation is estimated from the image using the MDL principle. In the experiments, the technique reflected the image characteristics of the local region well and showed better results than conventional methods, especially in the segmentation of MR images with inhomogeneities.
