• Title/Summary/Keyword: unsupervised analysis


High Resolution Satellite Image Segmentation Algorithm Development Using Seed-based region growing (시드 기반 영역확장기법을 이용한 고해상도 위성영상 분할기법 개발)

  • Byun, Young-Gi;Kim, Yong-Il
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.28 no.4
    • /
    • pp.421-430
    • /
    • 2010
  • Image segmentation techniques are becoming increasingly important in remote sensing image analysis, for example in object-oriented image classification, where object regions of interest must be extracted from images. This paper presents a new method for segmenting high-resolution remote sensing images based on Improved Seeded Region Growing (ISRG) and region merging. First, multi-spectral edge detection was performed on pan-sharpened QuickBird imagery using an entropy operator. Then, initial seeds were automatically selected from the resulting multi-spectral edge map. After the significant seeds were selected, an initial segmentation was obtained by applying ISRG, which considers both spectral and edge information. Finally, a region merging process integrating region texture and spectral information was carried out to produce the final segmentation. Accuracy was assessed with an unsupervised objective evaluation method to gauge the effectiveness of the proposed method. Experimental results demonstrated that the proposed method has good potential for segmenting high-resolution satellite images.
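The core idea of seeded region growing can be illustrated with a minimal sketch: each seed claims 4-connected neighbours whose value lies within a tolerance of the seed's value, processed breadth-first. This is plain seeded region growing on a single-band image, not the paper's ISRG; the entropy edge map, automatic seed selection, and texture-aware merging are not reproduced, and the function name `region_grow` and the `tolerance` parameter are hypothetical.

```python
from collections import deque


def region_grow(image, seeds, tolerance):
    """Grow each (row, col) seed into 4-connected pixels whose value is within
    `tolerance` of that seed's value. Returns a label grid; 0 = unlabelled.
    Growth is breadth-first, so a pixel keeps the first label that reaches it."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    queue = deque()
    for label, (r, c) in enumerate(seeds, start=1):
        labels[r][c] = label
        queue.append((r, c, label, image[r][c]))
    while queue:
        r, c, label, seed_value = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and labels[nr][nc] == 0:
                # Homogeneity criterion: closeness to the original seed value.
                if abs(image[nr][nc] - seed_value) <= tolerance:
                    labels[nr][nc] = label
                    queue.append((nr, nc, label, seed_value))
    return labels
```

Comparing against the seed value (rather than the running region mean) keeps the sketch short; region-mean criteria are a common alternative.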

Feature-based Gene Classification and Region Clustering using Gene Expression Grid Data in Mouse Hippocampal Region (쥐 해마의 유전자 발현 그리드 데이터를 이용한 특징기반 유전자 분류 및 영역 군집화)

  • Kang, Mi-Sun;Kim, HyeRyun;Lee, Sukchan;Kim, Myoung-Hee
    • Journal of KIISE
    • /
    • v.43 no.1
    • /
    • pp.54-60
    • /
    • 2016
  • Brain gene expression information is closely related to the structural and functional characteristics of the brain. Thus, extensive research has been carried out on the relationship between gene expression patterns and the brain's structural organization. In this study, Principal Component Analysis (PCA) was used to extract features of gene expression patterns, and genes were automatically classified by their spatial distribution. Voxels were then clustered using the genes classified as expressed in specific regions. Finally, we visualized the clustering results for gene expression in the mouse hippocampal region together with the Allen Brain Atlas. This experiment allowed us to classify region-specific gene expression in the mouse hippocampal region and to visualize the clustering results and a brain atlas in an integrated manner. This study may allow neuroscientists to search for experimental groups of genes more quickly and to design effective tests suited to this new form of data. It is also expected to enable the discovery of more specific sub-regions beyond the currently known anatomical regions of the brain.
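PCA-based feature extraction can be sketched with power iteration on the sample covariance matrix, which yields the leading principal axis and each row's score along it. The sketch assumes each row is one gene's spatial expression profile (genes × grid voxels); that layout, the function name, and the iteration count are assumptions, only the first component is computed, and the paper's gene classification and voxel clustering are not reproduced.

```python
import math


def first_principal_component(rows, iters=200):
    """Return (unit principal axis, per-row scores) for the leading principal
    component, using power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # starting vector; must not be orthogonal to the top axis
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered row onto the principal axis.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

For real voxel grids a library SVD would be the practical choice; power iteration is used here only to keep the example dependency-free.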

A Comparative Analysis of land Cover Changes Among Different Source Regions of Dust Emission in East Asia: Gobi Desert and Manchuria (동아시아의 황사발원지들에 대한 토지피복 비교 연구: 고비사막과 만주)

  • Pi, Kyoung-Jin;Han, Kyung-Soo;Park, Soo-Jae
    • Korean Journal of Remote Sensing
    • /
    • v.25 no.2
    • /
    • pp.175-184
    • /
    • 2009
  • This study analyzes differences in the variation of ecological distribution between the Gobi Desert and Manchuria using satellite-based land cover classification. It was motivated by two well-known facts: 1) the Gobi Desert, an old source region, has gradually expanded eastward; and 2) Manchuria, located east of the Gobi Desert, has been observed to be a new source region of yellow dust. An unsupervised classification method, ISODATA clustering, was employed to detect land cover change and to characterize the status and expanding trends of desertification, using the NDVI (Normalized Difference Vegetation Index) derived from the VEGETATION sensor onboard the SPOT satellite for 1999 and 2007. We analyzed the annual NDVI variation pattern for every class and divided the classes into five levels according to vegetation density based on NDVI. As a result, the Gobi Desert showed a positive change, with a decrease of $78{,}066\,\mathrm{km}^2$ in level-0 in the central Gobi Desert and its outskirts, whereas the Manchuria area deteriorated compared with the earlier period, with an increase of $25{,}744\,\mathrm{km}^2$.
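NDVI is the standard ratio (NIR − Red) / (NIR + Red). The sketch below computes it per pixel, bins values into five density levels, and measures the change in area of one level between two dates. The level boundaries in `LEVEL_BOUNDS` and the pixel area are hypothetical placeholders, since the abstract does not give the cut-offs, and the ISODATA clustering step itself is not reproduced.

```python
from bisect import bisect_right

# Hypothetical boundaries (not from the paper): level 0 is NDVI < 0.1,
# level 4 is NDVI >= 0.5, giving five levels in total.
LEVEL_BOUNDS = [0.1, 0.2, 0.3, 0.5]


def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def vegetation_level(value, bounds=LEVEL_BOUNDS):
    """Map an NDVI value to a level in 0..len(bounds)."""
    return bisect_right(bounds, value)


def level_area_change(levels_before, levels_after, level, pixel_area_km2):
    """Change in area (km^2) covered by `level` between two classified maps,
    given as flat lists of per-pixel levels on the same grid."""
    before = sum(1 for v in levels_before if v == level)
    after = sum(1 for v in levels_after if v == level)
    return (after - before) * pixel_area_km2
```

A negative result for level 0 corresponds to the kind of shrinking sparsely vegetated area reported for the Gobi Desert.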

A survey on unsupervised subspace outlier detection methods for high dimensional data (고차원 자료의 비지도 부분공간 이상치 탐지기법에 대한 요약 연구)

  • Ahn, Jaehyeong;Kwon, Sunghoon
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.3
    • /
    • pp.507-521
    • /
    • 2021
  • Detecting outliers in high-dimensional data poses the challenging problem of screening variables, since the relevant information is often contained in only a few of them. Moreover, when many irrelevant variables are included in the data, the distances between all observations tend to become similar, which makes the degrees of outlierness of all observations alike. Subspace outlier detection methods overcome this problem by measuring the degree of outlierness of each observation on relevant subsets of the variables. In this paper, we survey recent subspace outlier detection techniques, classifying them into three major types according to their subspace selection method. We summarize the techniques of each type in terms of how they select relevant subspaces and how they measure the degree of outlierness. In addition, we introduce some computing tools for implementing subspace outlier detection techniques and present results from a simulation study and a real data analysis.
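The effect described above can be seen with a simple outlierness measure: the distance from each observation to its k-th nearest neighbour, computed either on all variables or only on a chosen subset. In this sketch the subspace is supplied by hand; selecting it automatically is what distinguishes the surveyed methods, none of which is reproduced here, and `knn_score` is a hypothetical helper.

```python
import math


def knn_score(data, subspace, k):
    """Outlierness of each row = Euclidean distance to its k-th nearest
    neighbour, using only the column indices listed in `subspace`."""
    scores = []
    for i, x in enumerate(data):
        dists = sorted(
            math.sqrt(sum((x[j] - y[j]) ** 2 for j in subspace))
            for m, y in enumerate(data) if m != i
        )
        scores.append(dists[k - 1])
    return scores


# Row 3 is unusual only in column 0; columns 1 and 2 are irrelevant noise.
data = [
    [0.0, 5.0, -3.0],
    [0.1, -4.0, 2.0],
    [0.2, 3.0, 4.0],
    [5.0, -2.0, -4.0],
]
```

On the relevant subspace `[0]`, row 3 scores about 48 times higher than the others; on all three columns, every score lies between roughly 7.3 and 8.0, so the outlier is barely distinguishable.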

Pattern analysis of lower limb magnetic resonance images in Korean patients with distal myopathy

  • Park, Hyung Jun;Shin, Ha Young;Kim, Seung Min;Park, Kee Duk;Choi, Young-Chul
    • Annals of Clinical Neurophysiology
    • /
    • v.21 no.2
    • /
    • pp.79-86
    • /
    • 2019
  • Background: Magnetic resonance (MR) images are useful for diagnosing myopathy. The purpose of this study was to determine the usefulness of lower-limb MR images in Korean patients with distal myopathy. Methods: We reviewed medical records in the myopathy database from January 2002 to October 2016. We selected 21 patients from 91 unrelated families with distal myopathy: four with GNE myopathy, 11 with dysferlinopathy, and six with ADSSL1 myopathy. Results: Ten (48%) of the 21 patients were men. The ages of the participants at symptom onset and at imaging were $19.2 \pm 9.5$ and $30.4 \pm 9.0$ years (mean $\pm$ standard deviation), respectively. Their modified Gardner-Medwin and Walton grade was $3.3 \pm 1.7$. The strength grade of the knee extensors was not correlated with the Mercuri scale for the quadriceps (r = -0.247, p = 0.115). However, the Medical Research Council grades of the knee flexors, ankle dorsiflexors, and ankle plantar flexors were significantly correlated with the Mercuri scale ratings of the knee flexors (r = -0.497, p = 0.001), tibialis anterior (r = -0.727, p < 0.001), and ankle plantar flexors (r = -0.620, p < 0.001), respectively. T1-weighted MR images showed characteristic fatty replacement patterns consistent with the causative genes. Unsupervised hierarchical clustering of the Mercuri scale scores showed that the main factors contributing to the dichotomy were the causative gene and clinical severity. Conclusions: This study is the first to reveal the usefulness of lower-limb MR images in the differential diagnosis of distal myopathy in Korea.
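Unsupervised hierarchical clustering of per-muscle Mercuri scores can be sketched as agglomerative clustering: start with one cluster per patient and repeatedly merge the closest pair. The sketch uses average linkage with Euclidean distance; the abstract does not state the linkage or distance actually used, and the score vectors in the test are invented toy values, not patient data.

```python
import math
from itertools import combinations


def agglomerate(vectors, n_clusters):
    """Average-linkage agglomerative clustering. Merges the closest pair of
    clusters until `n_clusters` remain; returns sorted lists of row indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        # Mean Euclidean distance over all cross-cluster pairs.
        total = sum(math.dist(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: linkage(clusters[p[0]], clusters[p[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]  # safe: combinations() guarantees i < j
    return sorted(clusters)
```

Cutting the tree at two clusters corresponds to the kind of dichotomy the abstract reports; which factors drive the split must then be read off the members of each cluster.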

Prediction of the Probability of Job Loss due to Digitalization and Comparison by Industry: Using Machine Learning Methods

  • Park, Heedae;Lee, Kiyoul
    • Journal of Korea Trade
    • /
    • v.25 no.5
    • /
    • pp.110-128
    • /
    • 2021
  • Purpose - The essential purpose of this study is to analyze the possibility that individual jobs will be substituted as a result of technological development represented by the Fourth Industrial Revolution, considering the different effects of digital transformation on the labor market. Design/methodology - To estimate the substitution probability, this study used two data sets: the job characteristics data for individual occupations provided by KEIS, and the information on the substitution status of occupations provided by Frey and Osborne (2013). In total, 665 occupations were considered in this study. Of these, 80 occupations had data labeled with substitution status. The primary goal of estimation was to predict the degree of substitution for 607 of the 665 occupations (excluding 58 with markers). Three methods were used: principal component analysis (PCA), an unsupervised machine learning method, and Ridge and Lasso, which are supervised learning methods. After extracting significant variables with these three methods, this study carried out logistic regression to estimate the probability of substitution for each occupation. Findings - The probability of substitution for other occupational groups did not vary significantly across individual models, and the rank order of the probabilities across occupational groups was similar across models. The mean substitution probability across the three methods was 45.3%. The highest value was obtained with the PCA method, and the lowest with the LASSO method. The average substitution probability of the trading industry was 45.1%, very similar to the overall average. Originality/value - This study is significant in that it estimates the job substitution probability using various machine learning methods. The results of substitution probability estimation were compared by industry sector.
In addition, this study compares the trade business with other industry sectors.
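The final step, turning selected variables into a substitution probability, is ordinary logistic regression. The sketch fits it by batch gradient descent on labelled rows and then scores unlabelled ones; the learning rate, epoch count, and toy feature values are hypothetical, and this stands in for whichever fitting routine the authors actually used.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Minimize the mean log-loss by batch gradient descent.
    X: list of feature rows, y: list of 0/1 labels. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_proba(w, b, x):
    """Estimated probability of the positive class (here: substitution)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In the study's setting, the rows would be occupations described by the variables retained by PCA, Ridge, or Lasso, fitted on the labelled occupations and applied to the rest.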

Building Energy Time Series Data Mining for Behavior Analytics and Forecasting Energy consumption

  • Balachander, K;Paulraj, D
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.6
    • /
    • pp.1957-1980
    • /
    • 2021
  • The main aim of this research is to evaluate mechanisms for the efficient, energy-aware use of home appliances, thereby improving the information that smart metering systems provide about usage in selected homes and the time of use. Advances in information processing are commonly used to quantify large volumes of building activity data in order to improve the operational efficiency of building energy systems. Here, several smart data mining models are proposed to measure and predict energy time series in order to expose different temporal patterns of energy use. These models capture appliance use in relation to time, such as hour of day, time of day, week, month, and year, within a household; such relationships are key to capturing and separating the effect of consumer behavior on energy use and to predicting energy use patterns. It is necessary to determine the multiple relations among the usage of different appliances from simultaneous information streams. In particular, specific relations among interval-based instances, in which several appliances remain in use for a certain duration, are difficult to determine. To resolve these difficulties, unsupervised clustering of energy time-series data, frequent pattern mining, and a deep learning technique for forecasting energy use are presented. Extensive experiments were conducted on real data sets rich in smart meter data. The appliance usage patterns recognized by the proposed model were fed to deep convolutional neural networks (CNN) and recurrent neural networks (LSTM and GRU) at each stage, with consolidated accuracies of 94.79%, 97.99%, and 99.61% for 25%, 50%, and 75%, respectively.
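Frequent pattern mining over appliance usage can be illustrated with a support count: which sets of appliances are active together in a sufficient fraction of time windows. This brute-force sketch enumerates itemsets of up to `max_size` appliances; it is a simplified stand-in for Apriori-style mining, not the authors' interval-based method, and the appliance names in the test are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(windows, min_support, max_size=2):
    """Return {itemset: support} for appliance sets (size 1..max_size) active
    together in at least `min_support` fraction of the time windows.
    Each window is an iterable of appliance names active in that window."""
    counts = Counter()
    for active in windows:
        items = sorted(set(active))  # canonical order so tuples match
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    n = len(windows)
    return {items: c / n for items, c in counts.items() if c / n >= min_support}
```

Enumerating every combination is exponential in `max_size`; Apriori avoids this by extending only itemsets whose subsets are already frequent.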

Anomaly Detection In Real Power Plant Vibration Data by MSCRED Base Model Improved By Subset Sampling Validation (Subset 샘플링 검증 기법을 활용한 MSCRED 모델 기반 발전소 진동 데이터의 이상 진단)

  • Hong, Su-Woong;Kwon, Jang-Woo
    • Journal of Convergence for Information Technology
    • /
    • v.12 no.1
    • /
    • pp.31-38
    • /
    • 2022
  • This paper applies MSCRED (Multi-Scale Convolutional Recurrent Encoder-Decoder), an expert-independent multivariate time series analysis model based on unsupervised neural network learning. Because MSCRED is based on an autoencoder, its training data must not be contaminated; to overcome this limitation, the paper uses a training data sampling technique called Subset Sampling Validation. Using labeled vibration data from power plant equipment, the classification performance of MSCRED is evaluated with the anomaly score in two cases: 1) when abnormal data are mixed into the training data, and 2) when the abnormal data are removed from the training data of case 1. On this basis, the paper presents an expert-independent anomaly diagnosis framework that is robust to erroneous data and offers a concise and accurate solution for multivariate time series data in various fields.

Development of Security Anomaly Detection Algorithms using Machine Learning (기계 학습을 활용한 보안 이상징후 식별 알고리즘 개발)

  • Hwangbo, Hyunwoo;Kim, Jae Kyung
    • The Journal of Society for e-Business Studies
    • /
    • v.27 no.1
    • /
    • pp.1-13
    • /
    • 2022
  • With the development of network technologies, security that protects organizational resources from internal and external intrusions and threats has become more important. In recent years, therefore, anomaly detection algorithms that detect and prevent security threats from various security log events have been actively studied. Security anomaly detection algorithms, developed in the past on the basis of rules or statistical learning, are gradually evolving toward models based on machine learning and deep learning. In this study, we propose a deep autoencoder model, modified from the LSTM autoencoder, as an optimal algorithm for detecting insider threats in advance, drawing on various machine learning analysis methodologies. This study has academic significance in that it improves the potential for adaptive security by developing an anomaly detection algorithm based on unsupervised learning, and it reduces the false positive rate compared with the existing algorithm through supervised true positive labeling.

Comparison and Analysis of Unsupervised Contrastive Learning Approaches for Korean Sentence Representations (한국어 문장 표현을 위한 비지도 대조 학습 방법론의 비교 및 분석)

  • Young Hyun Yoo;Kyumin Lee;Minjin Jeon;Jii Cha;Kangsan Kim;Taeuk Kim
    • Annual Conference on Human and Language Technology
    • /
    • 2022.10a
    • /
    • pp.360-365
    • /
    • 2022
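  • Sentence representations are one of the key tools that can be usefully applied to solving various problems and developing applications in natural language processing. However, sentence representations derived from the pre-trained language models that have recently been widely adopted are known to perform below expectations on tasks such as Semantic Textual Similarity (STS) because of their inherent characteristics, such as pronounced anisotropy. To address this problem, studies applying contrastive learning to pre-trained language models have been actively pursued in the literature, and among them, unsupervised contrastive learning methods that use unlabeled data have drawn particular attention. However, most existing studies have focused on improving English sentence representations, and corresponding research on Korean sentence representations remains relatively scarce. In this paper, we therefore apply representative unsupervised contrastive learning methods (ConSERT, SimCSE) to various Korean pre-trained language models (KoBERT, KR-BERT, KLUE-BERT) and evaluate them on sentence similarity tasks (KorSTS, KLUE-STS). The results confirm that Korean generally shows trends similar to English, and we additionally observed the following new findings. First, with both unsupervised contrastive learning methods, KLUE-BERT performed more stably and better than KoBERT and KR-BERT. Second, among the data augmentation methods introduced in ConSERT, token shuffling achieved high performance overall. Third, with both unsupervised contrastive learning methods, performance overfit to the KLUE-STS training data used as validation data. In conclusion, this study verifies that, as in English, the performance of Korean sentence representations can be improved by applying unsupervised contrastive learning, and we hope these results will serve as a cornerstone for future research on Korean sentence representations.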
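Unsupervised SimCSE trains with an InfoNCE-style contrastive loss: two encodings of the same sentence (obtained with different dropout masks) form a positive pair, and the other sentences in the batch act as negatives. The sketch computes that loss for a batch of precomputed embedding pairs using cosine similarity; the encoder is omitted, the default temperature of 0.05 is an assumption here, and the toy vectors in the test are illustrative only.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def info_nce_loss(anchors, positives, temperature=0.05):
    """Mean InfoNCE loss: row i of `anchors` should be most similar to row i of
    `positives`; every other row of `positives` serves as an in-batch negative."""
    total = 0.0
    for i, h in enumerate(anchors):
        logits = [cosine(h, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[i]  # cross-entropy with target index i
    return total / len(anchors)
```

When all embeddings are identical, the loss equals log of the batch size; it falls toward zero as each anchor becomes much closer to its own positive than to the negatives.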
