• Title/Summary/Keyword: unsupervised model

Recognition Performance Improvement of Unsupervised Limabeam Algorithm using Post Filtering Technique

  • Nguyen, Dinh Cuong;Choi, Suk-Nam;Chung, Hyun-Yeol
    • IEMEK Journal of Embedded Systems and Applications, v.8 no.4, pp.185-194, 2013
  • In distant-talking environments, speech recognition performance degrades significantly due to noise and reverberation. Recent work by Michael L. Seltzer shows that in microphone array speech recognition, the word error rate can be significantly reduced by adapting the beamformer weights to generate a sequence of features that maximizes the likelihood of the correct hypothesis. One way to implement this approach, called the Likelihood Maximizing Beamforming algorithm (Limabeam), is Unsupervised Limabeam (USL), which can improve recognition performance in any environment. Our investigation of USL showed that, because the optimization depends strongly on the transcription output of the first recognition step, the output can become unstable, which may lower performance. To improve the recognition performance of USL, post-filtering techniques can be employed to obtain a more accurate transcription in the first step. In this work, as a post-filtering technique for the first recognition step of USL, we propose adding a Wiener filter combined with a Feature Weighted Mahalanobis Distance. We also suggest an alternative, efficient way to implement the Limabeam algorithm for a Hidden Markov Network (HM-Net) speech recognizer. Speech recognition experiments performed in a real distant-talking environment confirm the efficacy of the Limabeam algorithm in the HM-Net speech recognition system and the improved performance of the proposed method.
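The two post-filter ingredients named above can be illustrated in their generic textbook forms. This is a minimal sketch, not the authors' method: it assumes a per-frequency-bin Wiener gain H = SNR / (1 + SNR) with the SNR estimated by simple power spectral subtraction, and a feature-weighted Mahalanobis distance under an assumed diagonal covariance. The function names, the `gain_floor` parameter, and the weights are illustrative assumptions; the abstract does not specify the paper's estimator or weighting scheme.

```python
import math


def wiener_gains(noisy_power, noise_power, gain_floor=1e-3):
    """Per-bin Wiener gain H = SNR / (1 + SNR).

    Assumption: the SNR is estimated by power spectral subtraction,
    SNR = max(|Y|^2 - |N|^2, 0) / |N|^2, and noise_power entries are > 0.
    A small floor avoids zeroing bins completely.
    """
    gains = []
    for p_y, p_n in zip(noisy_power, noise_power):
        snr = max(p_y - p_n, 0.0) / p_n
        gains.append(max(snr / (1.0 + snr), gain_floor))
    return gains


def weighted_mahalanobis(x, mean, var, weights):
    """Feature-weighted Mahalanobis distance, assuming a diagonal covariance.

    Each squared, variance-normalized deviation is scaled by a per-feature
    weight (hypothetical weights; the paper's weighting is not given here).
    """
    return math.sqrt(
        sum(w * (xi - m) ** 2 / v for xi, m, v, w in zip(x, mean, var, weights))
    )
```

For example, `wiener_gains([2.0, 1.0, 5.0], [1.0, 1.0, 1.0])` gives gains of about 0.5, 0.001 (floored), and 0.8. Bins dominated by noise are strongly attenuated before features are passed to the first recognition pass.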

Detecting response patterns of zooplankton to environmental parameters in shallow freshwater wetlands: discovery of the role of macrophytes as microhabitat for epiphytic zooplankton

  • Choi, Jong-Yun;Kim, Seong-Ki;Jeng, Kwang-Seuk;Joo, Gea-Jae
    • Journal of Ecology and Environment, v.38 no.2, pp.133-143, 2015
  • Freshwater macrophytes increase the structural heterogeneity of microhabitats in water, often providing an important habitat for zooplankton. Some studies have focused on the overall influence of macrophytes on zooplankton, but the effects of macrophytes in relation to the different habitat characteristics of zooplankton (e.g., epiphytic and pelagic) have not been intensively studied. We hypothesized that different habitat structures (i.e., macrophyte habitat) would strongly affect zooplankton distribution. We investigated zooplankton density and diversity, macrophyte characteristics (dry weight and species number), and environmental parameters in 40 shallow wetlands in South Korea. Patterns in the data were analyzed using a self-organizing map (SOM), which extracts information through competitive and adaptive properties. A total of 20 variables (11 environmental parameters and 9 zooplankton groups) were patterned onto the SOM. Based on a U-matrix, three clusters were identified from the model. Zooplankton assemblages were positively related to macrophyte characteristics (i.e., dry weight and species number). In particular, epiphytic species (i.e., epiphytic rotifers and cladocerans) exhibited a clear relationship with macrophyte characteristics, with large macrophyte biomass and greater numbers of macrophyte species supporting high zooplankton assemblages. Consequently, habitat heterogeneity in the macrophyte bed was recognized as an important factor determining zooplankton distribution, particularly for epiphytic species. The results indicate that macrophytes are critical for heterogeneity in lentic freshwater ecosystems, and including diverse plant species in wetland construction or restoration schemes is expected to generate ecologically healthy food webs.
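The "competitive and adaptive properties" of a SOM correspond to two steps: finding the best-matching unit (competition), then pulling that unit and its grid neighbours toward the sample (adaptation). The following is a minimal pure-Python sketch of a generic online SOM. The grid size, learning-rate schedule, Gaussian neighbourhood, and toy data are assumptions for illustration and are not the settings used in the study.

```python
import math
import random


def best_matching_unit(weights, x):
    """Competitive step: return (row, col) of the prototype closest to x."""
    best, best_d = (0, 0), float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((a - b) ** 2 for a, b in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(data, rows=3, cols=3, epochs=50, lr0=0.5, seed=0):
    """Online SOM training with a Gaussian neighbourhood that shrinks over time."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Initialise each prototype from a randomly chosen sample.
    weights = [[list(rng.choice(data)) for _ in range(cols)] for _ in range(rows)]
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(data)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1.0 - frac)  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighbourhood shrinks
            br, bc = best_matching_unit(weights, x)
            # Adaptive step: pull the BMU and its grid neighbours toward x.
            for r in range(rows):
                for c in range(cols):
                    h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                    w = weights[r][c]
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

After training on two well-separated groups of samples, samples from different groups map to different units. Cluster boundaries such as those in the study are then usually read from a U-matrix of distances between neighbouring prototypes, which this sketch does not compute.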

Metabolomic Analysis of Ethyl Acetate and Methanol Extracts of Blueberry

  • Jo, Young-Hee;Kim, Sugyeong;Kwon, Da-Ae;Lee, Hong Jin;Choi, Hyung-Kyoon;Auh, Joong-Hyuck
    • Journal of the Korean Society of Food Science and Nutrition, v.43 no.3, pp.419-424, 2014
  • Metabolite profiling of blueberry (cultivar "Spartan") extracts prepared with two different solvents, methanol and ethyl acetate, was performed by metabolomic analysis using LC-MS/MS. An unsupervised classification method (PCA) and a supervised prediction model (OPLS-DA) provided good separation of the metabolites according to extraction solvent. Metabolites of the anthocyanin family, including delphinidin hexoside, delphinidin, 5-O-feruloylquinic acid, malvidin hexoside, malvidin-3-arabinoside, petunidin-3-arabinoside, and petunidin hexoside, were mainly detected in the methanol fractions, whereas those of the flavonoid family, including chlorogenic acid, chlorogenic acid dimer, 6,8-di-C-arabinopyranosyl-luteolin, and luteolin, were successfully recovered in the ethyl acetate fraction. Thus, metabolomic analysis of blueberry extracts allows simple profiling of both the overall and the distinctive metabolites for future applications.
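PCA, the unsupervised method mentioned above, projects samples onto the directions of greatest variance. The sketch below is a minimal pure-Python illustration of the first principal component: it centres the data, builds the sample covariance matrix, and finds the leading eigenvector by power iteration. It assumes a small feature table (rows are samples, columns are metabolite intensities). Real metabolomics pipelines typically also scale features, and the study's exact preprocessing is not given in the abstract.

```python
import math
import random


def first_principal_component(rows, iters=200, seed=0):
    """Return (loading vector, per-sample scores) for the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration from a random start converges to the leading eigenvector.
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: no variance in the data
            break
        v = [x / norm for x in w]
    # Scores: projection of each centred sample onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

The sign of a principal component is arbitrary, so only the relative positions of the scores matter. In a score plot, samples from the two solvent extracts would be expected to fall on opposite sides if solvent is the dominant source of variance.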

Multi-view learning review: understanding methods and their application

  • Bae, Kang Il;Lee, Yung Seop;Lim, Changwon
    • The Korean Journal of Applied Statistics, v.32 no.1, pp.41-68, 2019
  • Multi-view learning considers data from various viewpoints and attempts to integrate the different information they provide. Multi-view learning has been actively studied in recent years and has shown performance superior to that of models learned from a single view. With the introduction of deep learning techniques, multi-view learning has shown good results in various fields such as image, text, voice, and video. In this study, we introduce how multi-view learning methods solve various problems in human behavior recognition, medical areas, information retrieval, and facial expression recognition. In addition, we review the data integration principles of multi-view learning by classifying traditional multi-view learning methods into data integration, classifier integration, and representation integration. Finally, we examine how CNN, RNN, RBM, autoencoder, and GAN, which are commonly used deep learning methods, are applied in multi-view learning algorithms. We categorize CNN- and RNN-based methods as supervised learning, and RBM-, autoencoder-, and GAN-based methods as unsupervised learning.
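Two of the integration levels named above can be contrasted in a few lines. In data integration (early fusion), the per-view feature vectors are concatenated before a single model is trained. In classifier integration (late fusion), each view has its own classifier and their outputs are combined. This is a minimal sketch: the plain concatenation and the weighted averaging of class probabilities are common illustrative choices, not methods prescribed by the review. Representation integration, which learns a shared latent space, is not shown.

```python
def early_fusion(views):
    """Data integration: concatenate the feature vectors of all views."""
    return [x for view in views for x in view]


def late_fusion(view_probs, weights=None):
    """Classifier integration: weighted average of per-view class probabilities.

    view_probs[v][c] is the probability of class c from the classifier trained
    on view v; equal weights are assumed when none are given.
    """
    if weights is None:
        weights = [1.0] * len(view_probs)
    total = sum(weights)
    n_classes = len(view_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, view_probs)) / total
            for c in range(n_classes)]
```

For instance, averaging an image-view prediction of `[0.2, 0.8]` with a text-view prediction of `[0.6, 0.4]` gives roughly `[0.4, 0.6]`.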

Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas

  • Dashdondov, Khongorzul;Kim, Mi-Hye
    • Journal of the Korea Convergence Society, v.11 no.12, pp.23-30, 2020
  • In this study, the relationship between natural gas (NG) data and gas-related environmental elements was analyzed using machine learning algorithms to predict the level of gas leakage risk without directly measuring gas leakage. The study was based on open data provided by a server for an IoT-based, remotely controlled Picarro gas sensor. Natural gas leaking into the air is a serious problem for air pollution, the environment, and health. The proposed method combines multivariate outlier removal with Random Forest (RF) classification to predict the risk of NG leakage. After unsupervised k-means clustering, the experimental dataset was imbalanced, so we focused on making the proposed models predict medium and high risk as well as possible. We compared the receiver operating characteristic (ROC) curve, accuracy, area under the ROC curve (AUC), and mean standard error (MSE) of each classification model. In our experiments, MOL_RF achieved an accuracy of 99.71%, an AUC of 99.57%, and an MSE of 0.0016.
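The abstract mentions unsupervised k-means clustering before classification, which suggests that risk labels were derived from clusters. The sketch below shows one hypothetical way to do this. It runs one-dimensional k-means on a single measurement (for example, methane concentration) and names the clusters "low", "medium", and "high" in order of their centroids. The single-feature setup, the deterministic quantile-style initialisation, and the label names are all assumptions. The abstract does not state which features were clustered or how many clusters were used.

```python
def kmeans_1d(values, k=3, iters=100):
    """Lloyd's k-means on scalar values (k >= 2), deterministic initialisation."""
    xs = sorted(values)
    # Spread the initial centroids evenly across the sorted data.
    centroids = [xs[round(i * (len(xs) - 1) / (k - 1))] for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            j = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[j].append(v)
        # Keep the old centroid if a cluster becomes empty.
        new = [sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new == centroids:
            break
        centroids = new
    return centroids


def label_risk(values, names=("low", "medium", "high")):
    """Hypothetical risk labelling: name clusters by ascending centroid."""
    centroids = sorted(kmeans_1d(values, k=len(names)))
    return [names[min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))]
            for v in values]
```

Labels produced this way are often imbalanced (few "high" readings), which matches the imbalance the abstract reports after clustering.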

Building Energy Time Series Data Mining for Behavior Analytics and Forecasting Energy consumption

  • Balachander, K;Paulraj, D
    • KSII Transactions on Internet and Information Systems (TIIS), v.15 no.6, pp.1957-1980, 2021
  • The main aim of this research is to evaluate mechanisms for efficient, consumption-aware use of energy by in-home devices, thereby improving the information that smart metering systems provide about the usage of selected homes and the time of use. Advances in information processing are commonly used to quantify large volumes of building activity data in order to improve the efficiency of building energy systems. Here, several data mining models are presented to measure and predict energy time series in order to reveal different temporal patterns of energy use. These patterns relate appliance use to time (hour of day, time of day, week, month, and year) within a household, and they are key to capturing and separating the effect of consumer behavior on energy use and to predicting energy use patterns. Multiple relations among the usage of different appliances must be determined from simultaneous data streams, and relations among interval-based instances, in which several appliances remain in use for some duration, are particularly difficult to determine. To address these difficulties, the study presents unsupervised clustering of energy time-series data, frequent pattern mining, and a deep learning technique for estimating energy use. Extensive tests were conducted on real data sets rich in smart meter data. The appliance usage patterns recognized by the proposed model were passed to deep Convolutional Neural Networks (CNN) and Recurrent Neural Networks (LSTM and GRU) at each stage, yielding consolidated accuracies of 94.79%, 97.99%, and 99.61% for 25%, 50%, and 75%, respectively.
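Frequent pattern mining over appliance usage can be illustrated by treating each time window as a "transaction": the set of appliances active in that window. The sketch below uses brute-force itemset counting, which is not Apriori or FP-growth. It reports appliance sets whose support (the fraction of windows in which they co-occur) meets a threshold. The window construction, appliance names, and `min_support` value are hypothetical, and the study's actual mining algorithm is not specified in the abstract.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute-force frequent itemset counting up to max_size items.

    transactions: iterable of sets of appliances active in one time window.
    Returns {itemset (sorted tuple): support} for itemsets meeting min_support.
    """
    transactions = list(transactions)
    n = len(transactions)
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                counts[combo] += 1
    return {combo: c / n for combo, c in counts.items() if c / n >= min_support}
```

With hourly windows such as `{"kettle", "toaster"}`, `{"kettle", "toaster", "tv"}`, `{"tv"}`, and `{"kettle"}`, a support threshold of 0.5 keeps the pair `("kettle", "toaster")` as a co-usage pattern and drops the pairs involving `"tv"`.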

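Sensitivity of abacus and Chasdaq in the Chinese stock market through analysis of Weibo sentiment related to Corona-19 (Korean title: A Technique for Predicting the Sensitivity of the Main Board and ChiNext (Chasdaq) in the Chinese Stock Market through Analysis of COVID-19-Related Weibo Sentiment)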

  • Li, Jiaqi;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.1, pp.1-7, 2021
  • Investor mood on social media is gaining increasing attention as a driver of price movements in the stock market. Based on behavioral finance theory, this study argues that sentiment extracted from social media using big data techniques can predict real-time (short-run) price momentum in the Chinese stock market. Sina Weibo posts related to COVID-19 were collected using a keyword method, and a daily influence-weighted sentiment factor was extracted from more than 2 million raw posts. We examine one supervised and four unsupervised sentiment analysis models and use the best-performing word-frequency and BiLSTM models. The results show that stock price changes and the sentiment factor move similarly. This indicates that public mood extracted from social media can, to some extent, represent investor sentiment and influence stock market fluctuations when people are focused on special events that can affect the stock market.
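A word-frequency (lexicon) sentiment model and a "daily influence-weighted sentiment factor" can be sketched together. Each post is scored by counting positive and negative words, and each day's factor is the weighted mean of its post scores. Everything concrete here is an assumption for illustration: the tiny English lexicons (the study used Chinese Weibo text), the score formula (pos - neg) / (pos + neg), and the influence weight 1 + log(1 + reposts). The abstract does not define how influence was weighted.

```python
import math
from collections import defaultdict

# Hypothetical toy lexicons; a real model would use a Chinese sentiment lexicon.
POSITIVE = {"recover", "gain", "hope"}
NEGATIVE = {"crash", "fear", "loss"}


def post_score(tokens):
    """Word-frequency polarity in [-1, 1]; 0 when no lexicon word occurs."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def daily_sentiment(posts):
    """posts: iterable of (day, tokens, reposts).

    Returns {day: influence-weighted mean score}, using the assumed weight
    1 + log(1 + reposts) so widely shared posts count more.
    """
    num, den = defaultdict(float), defaultdict(float)
    for day, tokens, reposts in posts:
        w = 1.0 + math.log1p(reposts)
        num[day] += w * post_score(tokens)
        den[day] += w
    return {day: num[day] / den[day] for day in num}
```

The resulting daily series can then be compared with daily index returns, for example through correlation or lagged regression, to test the co-movement described above.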

Development of Security Anomaly Detection Algorithms using Machine Learning

  • Hwangbo, Hyunwoo;Kim, Jae Kyung
    • The Journal of Society for e-Business Studies, v.27 no.1, pp.1-13, 2022
  • With the development of network technologies, security that protects organizational resources from internal and external intrusions and threats has become more important. In recent years, therefore, anomaly detection algorithms that detect and prevent security threats from various security log events have been actively studied. Security anomaly detection algorithms, which in the past were developed based on rules or statistical learning, are gradually evolving toward models based on machine learning and deep learning. In this study, we apply various machine learning analysis methodologies and propose a deep autoencoder model, a modification of the LSTM autoencoder, as the optimal algorithm for detecting insider threats in advance. This study is academically significant in that it improves the prospects for adaptive security by developing an anomaly detection algorithm based on unsupervised learning, and it reduces the false positive rate compared with the existing algorithm through supervised true-positive labeling.
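Autoencoder-based anomaly detectors, including LSTM autoencoders, usually share the same decision rule. The model is trained to reconstruct normal activity, and an event is flagged when its reconstruction error exceeds a threshold fitted on normal data. The sketch below shows only that rule and does not reproduce the LSTM autoencoder. The mean-squared reconstruction error and the "mean + k standard deviations" threshold with k = 3 are common conventions assumed here, not the paper's stated settings.

```python
import statistics


def reconstruction_error(x, x_hat):
    """Mean squared error between an input vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def fit_threshold(normal_errors, k=3.0):
    """Threshold = mean + k * population std of errors on normal (benign) data."""
    return statistics.mean(normal_errors) + k * statistics.pstdev(normal_errors)


def flag_anomalies(errors, threshold):
    """True where the reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]
```

Lowering `k` catches more insider-threat events but raises the false positive rate. This trade-off is where the supervised true-positive labeling described above can help tune the detector.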

Malware Detection Using Deep Recurrent Neural Networks with no Random Initialization

  • Amir Namavar Jahromi;Sattar Hashemi
    • International Journal of Computer Science & Network Security, v.23 no.8, pp.177-189, 2023
  • Malware detection is an increasingly important operational focus in cyber security, particularly given the fast pace of such threats (e.g., new malware variants introduced every day). There has been great interest in using machine learning techniques to automate malware detection and analysis and to make them more effective. In this paper, we present a deep recurrent neural network solution: a stacked Long Short-Term Memory (LSTM) network with pre-training as a regularization method that avoids random network initialization. Our approach uses both global and short-range dependencies of the inputs. With pre-training, we avoid random initialization and improve the accuracy and robustness of malware threat hunting. The proposed method speeds up convergence (compared with a stacked LSTM) by reducing the length of malware OpCode or bytecode sequences, which also reduces the complexity of the final method. This yields better accuracy, a higher Matthews Correlation Coefficient (MCC), and a higher Area Under the Curve (AUC) than a standard LSTM with similar detection time. Our method can be applied to real-time malware threat hunting, particularly for safety-critical systems such as eHealth or the Internet of Military Things, where poor model convergence could lead to catastrophic consequences. We evaluate the method on Windows, ransomware, Internet of Things (IoT), and Android malware datasets using both static and dynamic analysis. For IoT malware detection, we also compare the performance of our method and the standard stacked LSTM on an IoT-specific dataset. More specifically, our method achieves an accuracy of 99.1% in detecting IoT malware samples, with an AUC of 0.985 and an MCC of 0.95, outperforming standard LSTM-based methods on these key metrics.
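The Matthews Correlation Coefficient reported above summarizes all four cells of a binary confusion matrix. This makes it more informative than accuracy when malware and benign samples are imbalanced. The sketch below implements the standard formula, MCC = (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). It returns 0 when a marginal total is zero, which is a common convention assumed here.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when any row or column total is zero
    return (tp * tn - fp * fn) / denom
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect prediction). A perfect detector such as `mcc(5, 5, 0, 0)` scores 1.0, while a detector that is right only half the time on balanced data, as in `mcc(1, 1, 1, 1)`, scores 0.0.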

Comparison and Analysis of Unsupervised Contrastive Learning Approaches for Korean Sentence Representations

  • Young Hyun Yoo;Kyumin Lee;Minjin Jeon;Jii Cha;Kangsan Kim;Taeuk Kim
    • Annual Conference on Human and Language Technology, 2022.10a, pp.360-365, 2022
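  • Sentence representations are one of the key tools for solving various problems and developing applications in natural language processing. However, sentence representations derived from pre-trained language models, which have recently been widely adopted, are known to perform below expectations on tasks such as Semantic Textual Similarity (STS) because of inherent characteristics such as pronounced anisotropy. To address this problem, applying contrastive learning to pre-trained language models has been actively studied in the literature, and unsupervised contrastive learning methods that use unlabeled data have attracted particular attention. However, most existing studies have focused on improving English sentence representations, and research on Korean sentence representations is relatively scarce. In this paper, we apply representative unsupervised contrastive learning methods (ConSERT, SimCSE) to various Korean pre-trained language models (KoBERT, KR-BERT, KLUE-BERT) and evaluate them on sentence similarity tasks (KorSTS, KLUE-STS). We found that Korean generally shows trends similar to English, and we also made the following new observations. First, with both unsupervised contrastive learning methods, KLUE-BERT performed better and more stably than KoBERT and KR-BERT. Second, among the data augmentation methods introduced in ConSERT, token shuffling generally performed well. Third, with both unsupervised contrastive learning methods, performance overfit to the KLUE-STS training data used as validation data. In conclusion, this study verified that, as with English, the performance of Korean sentence representations can be improved by applying unsupervised contrastive learning, and we hope these results will serve as a foundation for future research on Korean sentence representations.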
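Unsupervised SimCSE trains on an InfoNCE objective. Each sentence is encoded twice, in SimCSE with two different dropout masks, and each encoding must pick out its own partner among all second-view encodings in the batch. ConSERT uses a similar contrastive loss but creates the two views with explicit augmentations such as token shuffling. The pure-Python sketch below computes only the loss on precomputed embeddings, with no encoder or dropout. The temperature of 0.05 follows the value commonly used with SimCSE and is an assumption here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce(z1, z2, tau=0.05):
    """Mean InfoNCE loss over a batch of paired embeddings.

    z1[i] and z2[i] are two views of sentence i (the positive pair); every
    z2[j] with j != i acts as an in-batch negative for z1[i].
    """
    losses = []
    for i, a in enumerate(z1):
        logits = [cosine(a, b) / tau for b in z2]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        lse = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(lse - logits[i])  # equals -log softmax(logits)[i]
    return sum(losses) / len(losses)
```

When each view is closest to its own partner, the loss approaches zero. When the pairs are swapped, the loss is large (about 20 for orthogonal toy vectors at tau = 0.05). Minimizing this loss spreads unrelated sentences apart, which counteracts the anisotropy described above.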