• Title/Summary/Keyword: supervised and unsupervised classification

Search Result 100, Processing Time 0.023 seconds

Identifying potential mergers of globular clusters: a machine-learning approach

  • Pasquato, Mario
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.39 no.2
    • /
    • pp.89-89
    • /
    • 2014
  • While the current consensus view holds that galaxy mergers are commonplace, it is sometimes speculated that Globular Clusters (GCs) may also have undergone merging events, possibly resulting in massive objects with a strong metallicity spread such as Omega Centauri. Galaxies are mostly far, unresolved systems whose mergers are most likely wet, resulting in observational as well as modeling difficulties, but GCs are resolved into stars that can be used as discrete dynamical tracers, and their mergers might have been dry, therefore easily simulated with an N-body code. It is however difficult to determine the observational parameters best suited to reveal a history of merging based on the positions and kinematics of GC stars, if evidence of merging is at all observable. To overcome this difficulty, we investigate the applicability of supervised and unsupervised machine learning to the automatic reconstruction of the dynamical history of a stellar system. In particular we test whether statistical clustering methods can classify simulated systems into monolithic versus merger products. We run direct N-body simulations of two identical King-model clusters undergoing a head-on collision resulting in a merged system, and other simulations of isolated King models with the same total number of particles as the merged system. After several relaxation times elapse, we extract a sample of snapshots of the sky-projected positions of particles from each simulation at different dynamical times, and we run a variety of clustering and classification algorithms to classify the snapshots into two subsets in a relevant feature space.

  • PDF

An Outlier Detection Using Autoencoder for Ocean Observation Data (해양 이상 자료 탐지를 위한 오토인코더 활용 기법 최적화 연구)

  • Kim, Hyeon-Jae;Kim, Dong-Hoon;Lim, Chaewook;Shin, Yongtak;Lee, Sang-Chul;Choi, Youngjin;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.33 no.6
    • /
    • pp.265-274
    • /
    • 2021
  • Outlier detection research in ocean data has traditionally been performed using statistical and distance-based machine learning algorithms. Recently, AI-based methods have received a lot of attention and so-called supervised learning methods that require classification information for data are mainly used. This supervised learning method requires a lot of time and costs because classification information (label) must be manually designated for all data required for learning. In this study, an autoencoder based on unsupervised learning was applied as an outlier detection to overcome this problem. For the experiment, two experiments were designed: one is univariate learning, in which only SST data was used among the observation data of Deokjeok Island and the other is multivariate learning, in which SST, air temperature, wind direction, wind speed, air pressure, and humidity were used. Period of data is 25 years from 1996 to 2020, and a pre-processing considering the characteristics of ocean data was applied to the data. An outlier detection of actual SST data was tried with a learned univariate and multivariate autoencoder. We tried to detect outliers in real SST data using trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. As a result of quantitatively evaluating the performance of these methods, the multivariate/univariate accuracy was about 96%/91%, respectively, indicating that the multivariate autoencoder had better outlier detection performance. Outlier detection using an unsupervised learning-based autoencoder is expected to be used in various ways in that it can reduce subjective classification errors and cost and time required for data labeling.

K-means based Clustering Method with a Fixed Number of Cluster Members

  • Yi, Faliu;Moon, Inkyu
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.10
    • /
    • pp.1160-1170
    • /
    • 2014
  • Clustering methods are very useful in many fields such as data mining, classification, and object recognition. Both the supervised and unsupervised grouping approaches can classify a series of sample data with a predefined or automatically assigned cluster number. However, there is no constraint on the number of elements for each cluster. Numbers of cluster members for each cluster obtained from clustering schemes are usually random. Thus, some clusters possess a large number of elements whereas others only have a few members. In some areas such as logistics management, a fixed number of members are preferred for each cluster or logistic center. Consequently, it is necessary to design a clustering method that can automatically adjust the number of group elements. In this paper, a k-means based clustering method with a fixed number of cluster members is proposed. In the proposed method, first, the data samples are clustered using the k-means algorithm. Then, the number of group elements is adjusted by employing a greedy strategy. Experimental results demonstrate that the proposed clustering scheme can classify data samples efficiently for a fixed number of cluster members.

Unsupervised Classification of Landsat-8 OLI Satellite Imagery Based on Iterative Spectral Mixture Model (자동화된 훈련 자료를 활용한 Landsat-8 OLI 위성영상의 반복적 분광혼합모델 기반 무감독 분류)

  • Choi, Jae Wan;Noh, Sin Taek;Choi, Seok Keun
    • Journal of Korean Society for Geospatial Information Science
    • /
    • v.22 no.4
    • /
    • pp.53-61
    • /
    • 2014
  • Landsat OLI satellite imagery can be applied to various remote sensing applications, such as generation of land cover map, urban area analysis, extraction of vegetation index and change detection, because it includes various multispectral bands. In addition, land cover map is an important information to monitor and analyze land cover using GIS. In this paper, land cover map is generated by using Landsat OLI and existing land cover map. First, training dataset is obtained using correlation between existing land cover map and unsupervised classification result by K-means, automatically. And then, spectral signatures corresponding to each class are determined based on training data. Finally, abundance map and land cover map are generated by using iterative spectral mixture model. The experiment is accomplished by Landsat OLI of Cheongju area. It shows that result by our method can produce land cover map without manual training dataset, compared to existing land cover map and result by supervised classification result by SVM, quantitatively and visually.

MODIS Data-based Crop Classification using Selective Hierarchical Classification (선택적 계층 분류를 이용한 MODIS 자료 기반 작물 분류)

  • Kim, Yeseul;Lee, Kyung-Do;Na, Sang-Il;Hong, Suk-Young;Park, No-Wook;Yoo, Hee Young
    • Korean Journal of Remote Sensing
    • /
    • v.32 no.3
    • /
    • pp.235-244
    • /
    • 2016
  • In large-area crop classification with MODIS data, a mixed pixel problem caused by the low resolution of MODIS data has been one of main issues. To mitigate this problem, this paper proposes a hierarchical classification algorithm that selectively classifies the specific crop class of interest by using their spectral characteristics. This selective classification algorithm can reduce mixed pixel effects between crops and improve classification performance. The methodological developments are illustrated via a case study in Jilin city, China with MODIS Normalized Difference Vegetation Index (NDVI) and Near InfRared (NIR) reflectance datasets. First, paddy fields were extracted from unsupervised classification of NIR reflectance. Non-paddy areas were then classified into corn and bean using time-series NDVI datasets. In the case study result, the proposed classification algorithm showed the best classification performance by selectively classifying crops having similar spectral characteristics, compared with traditional direct supervised classification of time-series NDVI and NIR datasets. Thus, it is expected that the proposed selective hierarchical classification algorithm would be effectively used for producing reliable crop maps.

Analysis of Forest Types and Estimation of the Forest Carbon Stocks Using Landsat Satellite Images in Chungcheongnam-do, South Korea (Landsat 위성영상을 이용한 충청남도 임상 분석 및 산림 탄소저장량 추정)

  • Kim, Sung Hoon;Jang, Dong-Ho
    • Journal of the Korean association of regional geographers
    • /
    • v.20 no.2
    • /
    • pp.206-216
    • /
    • 2014
  • In this study, forest types in Chungheongnam-do were analyzed using Landsat satellite images and digital forest type map as a means to estimate forest carbon stocks. NDVI and Tasseled Cap, ISODATA, and supervised classification among others were used to analyze the forest types. The forest carbon stocks of Chungcheongnam-do were estimated utilizing forest statistical data derived from the classified results. The results indicate that the analysis of forest types through supervised classification yielded the highest overall accuracy in analyzing forest types using satellite images. Coniferous forests(49.3%) accounted for the highest proportion in all the forest types of Chungcheongnam-do, followed by deciduous forests(28.0%) and mixed forests(22.7%). The results of a comparative analysis between forest carbon stocks estimates made using the modified digital forest type map and other estimation methods showed that the method using Tasseled Cap and unsupervised classification yielded the most similar forest carbon stock estimates. The most significant difference, though, was made when only the digital forest type map was used. It is expected that if carbon stocks are estimated by integrating satellite images and digital forest type maps in the future, more accurate results can be derived in estimating forest carbon stocks at a national level.

  • PDF

A Comparison of the Land Cover Data Sets over Asian Region: USGS, IGBP, and UMd (아시아 지역 지면피복자료 비교 연구: USGS, IGBP, 그리고 UMd)

  • Kang, Jeon-Ho;Suh, Myoung-Seok;Kwak, Chong-Heum
    • Atmosphere
    • /
    • v.17 no.2
    • /
    • pp.159-169
    • /
    • 2007
  • A comparison of the three land cover data sets (United States Geological Survey: USGS, International Geosphere Biosphere Programme: IGBP, and University of Maryland: UMd), derived from 1992-1993 Advanced Very High Resolution Radiometer(AVHRR) data sets, was performed over the Asian continent. Preprocesses such as the unification of map projection and land cover definition, were applied for the comparison of the three different land cover data sets. Overall, the agreement among the three land cover data sets was relatively high for the land covers which have a distinct phenology, such as urban, open shrubland, mixed forest, and bare ground (>45%). The ratios of triple agreement (TA), couple agreement (CA) and total disagreement (TD) among the three land cover data sets are 30.99%, 57.89% and 8.91%, respectively. The agreement ratio between USGS and IGBP is much greater (about 80%) than that (about 32%) between USGS and UMd (or IGBP and UMd). The main reasons for the relatively low agreement among the three land cover data sets are differences in 1) the number of land cover categories, 2) the basic input data sets used for the classification, 3) classification (or clustering) methodologies, and 4) level of preprocessing. The number of categories for the USGS, IGBP and UMd are 24, 17 and 14, respectively. USGS and IGBP used only the 12 monthly normalized difference vegetation index (NDVI), whereas UMd used the 12 monthly NDVI and other 29 auxiliary data derived from AVHRR 5 channels. USGS and IGBP used unsupervised clustering method, whereas UMd used the supervised technique, decision tree using the ground truth data derived from the high resolution Landsat data. The insufficient preprocessing in USGS and IGBP compared to the UMd resulted in the spatial discontinuity and misclassification.

Hierarchical Ann Classification Model Combined with the Adaptive Searching Strategy (적응적 탐색 전략을 갖춘 계층적 ART2 분류 모델)

  • 김도현;차의영
    • Journal of KIISE:Software and Applications
    • /
    • v.30 no.7_8
    • /
    • pp.649-658
    • /
    • 2003
  • We propose a hierarchical architecture of ART2 Network for performance improvement and fast pattern classification model using fitness selection. This hierarchical network creates coarse clusters as first ART2 network layer by unsupervised learning, then creates fine clusters of the each first layer as second network layer by supervised learning. First, it compares input pattern with each clusters of first layer and select candidate clusters by fitness measure. We design a optimized fitness function for pruning clusters by measuring relative distance ratio between a input pattern and clusters. This makes it possible to improve speed and accuracy. Next, it compares input pattern with each clusters connected with selected clusters and finds winner cluster. Finally it classifies the pattern by a label of the winner cluster. Results of our experiments show that the proposed method is more accurate and fast than other approaches.

Network Anomaly Detection Technologies Using Unsupervised Learning AutoEncoders (비지도학습 오토 엔코더를 활용한 네트워크 이상 검출 기술)

  • Kang, Koohong
    • Journal of the Korea Institute of Information Security & Cryptology
    • /
    • v.30 no.4
    • /
    • pp.617-629
    • /
    • 2020
  • In order to overcome the limitations of the rule-based intrusion detection system due to changes in Internet computing environments, the emergence of new services, and creativity of attackers, network anomaly detection (NAD) using machine learning and deep learning technologies has received much attention. Most of these existing machine learning and deep learning technologies for NAD use supervised learning methods to learn a set of training data set labeled 'normal' and 'attack'. This paper presents the feasibility of the unsupervised learning AutoEncoder(AE) to NAD from data sets collecting of secured network traffic without labeled responses. To verify the performance of the proposed AE mode, we present the experimental results in terms of accuracy, precision, recall, f1-score, and ROC AUC value on the NSL-KDD training and test data sets. In particular, we model a reference AE through the deep analysis of diverse AEs varying hyper-parameters such as the number of layers as well as considering the regularization and denoising effects. The reference model shows the f1-scores 90.4% and 89% of binary classification on the KDDTest+ and KDDTest-21 test data sets based on the threshold of the 82-th percentile of the AE reconstruction error of the training data set.

A Detection Model using Labeling based on Inference and Unsupervised Learning Method (추론 및 비교사학습 기법 기반 레이블링을 적용한 탐지 모델)

  • Hong, Sung-Sam;Kim, Dong-Wook;Kim, Byungik;Han, Myung-Mook
    • Journal of Internet Computing and Services
    • /
    • v.18 no.1
    • /
    • pp.65-75
    • /
    • 2017
  • The Detection Model is the model to find the result of a certain purpose using artificial intelligent, data mining, intelligent algorithms In Cyber Security, it usually uses to detect intrusion, malwares, cyber incident, and attacks etc. There are an amount of unlabeled data that are collected in a real environment such as security data. Since the most of data are not defined the class labels, it is difficult to know type of data. Therefore, the label determination process is required to detect and analysis with accuracy. In this paper, we proposed a KDFL(K-means and D-S Fusion based Labeling) method using D-S inference and k-means(unsupervised) algorithms to decide label of data records by fusion, and a detection model architecture using a proposed labeling method. A proposed method has shown better performance on detection rate, accuracy, F1-measure index than other methods. In addition, since it has shown the improved results in error rate, we have verified good performance of our proposed method.