• Title/Summary/Keyword: Unsupervised machine learning.

A Study on the Construction of Stable Clustering by Minimizing the Order Bias (순서 바이어스 최소화에 의한 안정적 클러스터링 구축에 관한 연구)

  • Lee, Gye-Seong
    • The Transactions of the Korea Information Processing Society / v.6 no.6 / pp.1571-1580 / 1999
  • When a hierarchical structure is derived from a data set for data mining and machine learning using a conceptual clustering algorithm, one of the unsupervised learning paradigms, the outcome often depends on the order in which the data objects are processed. To overcome this problem, a first classification pass constructs an initial partition, which is expected to indicate the possible range of the number of final classes. We apply center sorting to the data objects in the classes of the partition to obtain a new data ordering, and build a new partition using the ITERATE clustering procedure. We developed an algorithm, REIT, that leads to a final partition with a stable and best partition score. A number of experiments were performed to show that the algorithm minimizes order bias effects.
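The order dependence described in this abstract can be illustrated with a much simpler incremental procedure. The sketch below uses threshold-based "leader" clustering as a stand-in; it is not ITERATE or REIT, and the function names, data values, and `radius` parameter are assumptions made only for illustration.

```python
from typing import List


def leader_clustering(values: List[float], radius: float) -> List[List[float]]:
    """Assign each value to the first cluster whose leader is within `radius`.

    The leader of a cluster is its first member, so the result depends on
    the order in which the values arrive.
    """
    clusters: List[List[float]] = []
    for v in values:
        for cluster in clusters:
            if abs(v - cluster[0]) <= radius:  # cluster[0] is the leader
                cluster.append(v)
                break
        else:
            clusters.append([v])  # no leader close enough: start a new cluster
    return clusters


def as_partition(clusters: List[List[float]]) -> List[List[float]]:
    """Order-independent view of a clustering, so two runs can be compared."""
    return sorted(sorted(c) for c in clusters)


# Same four objects, two processing orders (hypothetical values).
ascending = as_partition(leader_clustering([1.0, 2.0, 3.0, 4.0], radius=1.5))
shuffled = as_partition(leader_clustering([2.0, 1.0, 3.0, 4.0], radius=1.5))

print(ascending)  # [[1.0, 2.0], [3.0, 4.0]]
print(shuffled)   # [[1.0, 2.0, 3.0], [4.0]]
```

The two runs see identical data and differ only in presentation order, yet they end in different partitions. This is the kind of instability that a re-ordering step such as the center sorting mentioned in the abstract is meant to reduce.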

A Stay Detection Algorithm Using GPS Trajectory and Points of Interest Data

  • Eunchong Koh;Changhoon Lyu;Goya Choi;Kye-Dong Jung;Soonchul Kwon;Chigon Hwang
    • International Journal of Internet, Broadcasting and Communication / v.15 no.3 / pp.176-184 / 2023
  • Points of interest (POIs) are widely used in tourism recommendations and to provide information about areas of interest. Currently, situation judgement using POI and GPS data is mainly rule-based, which limits inference to predefined POI information. In this study, we propose an algorithm that uses POI data, GPS data, and schedule information to calculate the current speed, location, schedule matching, movement trajectory, and POI coverage, and uses machine learning to determine whether the user is staying or moving. The input data are clustered and labelled with the k-means algorithm, an unsupervised learning method, and the result is used as the input vector to train an SVM model that calculates the probability of moving and staying. We thus implemented an algorithm that can adjust the schedule using the travel schedule, POI data, and GPS information. The results show that the algorithm does not rely on predefined information but can make judgements from GPS and POI data in real time, which is more flexible and reliable than traditional rule-based approaches. The stay detection algorithm developed in this study can therefore help optimize tourism scheduling, provides important information for tourism schedule planning, and is expected to provide much value for tourism services.
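A minimal sketch of the k-means labelling step, assuming a single feature (speed in km/h) and two clusters. The feature choice, the sample values, and the names `kmeans_1d` and `pseudo_labels` are hypothetical; the abstract lists more inputs than this, and the subsequent SVM training is not shown.

```python
from typing import List, Tuple


def kmeans_1d(values: List[float], k: int = 2, iters: int = 50) -> Tuple[List[float], List[int]]:
    """Lloyd's algorithm in one dimension (k >= 2), with evenly spread initial centers."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value goes to its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center becomes the mean of its members
        # (an empty cluster keeps its previous center).
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return centers, labels


def to_pseudo_labels(centers: List[float], labels: List[int]) -> List[str]:
    """Name the slower cluster 'stay' and every other cluster 'move'."""
    stay_cluster = centers.index(min(centers))
    return ["stay" if lab == stay_cluster else "move" for lab in labels]


# Hypothetical GPS-derived speeds in km/h.
speeds_kmh = [0.2, 0.5, 0.1, 4.8, 5.2, 0.3, 6.0, 0.4]
centers, labels = kmeans_1d(speeds_kmh, k=2)
pseudo_labels = to_pseudo_labels(centers, labels)
print(pseudo_labels)
```

Labels produced this way require no manual annotation, so they can serve as training targets for a supervised classifier such as the SVM mentioned in the abstract.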

An intelligent cooling control system for mitigating the cracking risks of mass concretes during bridge construction

  • Ruinan An;Peng Lin;Daoxiang Chen;Jianshu Ouyang;Zichang Li;Zheng Zhang;Yuanguang Liu
    • Advances in concrete construction / v.17 no.5 / pp.257-271 / 2024
  • During any construction involving mass concrete, it is crucial to control cracking during the placement and curing process. This study develops an intelligent cooling control system that regulates water temperature and flow based on concrete hydration heat, effectively preventing cracking in bridge construction. The system consists of hardware, a neural network-based control algorithm, and an information management system. An optimal cooling control strategy is proposed that dynamically regulates water flow and temperature to prevent cracking, using real-time temperature data, target control curves, neural network algorithms, and cloud-based computing. The system has been successfully used to control cracking risks during bridge construction. It not only mitigates the risk but also provides a convenient management strategy for bridge construction projects. The optimal cooling control strategy ensures high accuracy and stability under unsupervised learning conditions. The system can be applied to similar structures that involve mass concrete, such as bridges, dams, and buildings.

Analysis of deep learning-based deep clustering method (딥러닝 기반의 딥 클러스터링 방법에 대한 분석)

  • Hyun Kwon;Jun Lee
    • Convergence Security Journal / v.23 no.4 / pp.61-70 / 2023
  • Clustering is an unsupervised learning method that groups data based on features such as distance metrics, using data without known labels or ground truth values. It has the advantage of being applicable to various types of data, including images, text, and audio, without the need for labeling. Traditional clustering techniques apply dimensionality reduction or extract specific features before clustering. With the advancement of deep learning, however, research has emerged on deep clustering techniques that use models such as autoencoders and generative adversarial networks to represent input data as latent vectors. In this study, we propose a deep clustering technique based on deep learning. We use an autoencoder to transform the input data into latent vectors, then construct a vector space according to the cluster structure and perform k-means clustering. Experiments were conducted on the MNIST and Fashion-MNIST datasets, with the PyTorch machine learning library as the experimental environment. The model is a convolutional neural network-based autoencoder. The experimental results show an accuracy of 89.42% for MNIST and 56.64% for Fashion-MNIST when k is set to 10.
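Cluster ids produced by k-means are arbitrary, so an accuracy figure for a clustering needs some mapping from cluster ids to class labels. The abstract does not say how this was done. The sketch below assumes one common convention, the best one-to-one mapping, and the function name and toy labels are hypothetical.

```python
from itertools import permutations
from typing import Sequence


def clustering_accuracy(true_labels: Sequence[int], cluster_ids: Sequence[int], k: int) -> float:
    """Best accuracy over all one-to-one mappings from cluster id to class label.

    Brute force over k! permutations, so this is only practical for small k.
    """
    best = 0
    for mapping in permutations(range(k)):
        # mapping[c] is the class label assigned to cluster c.
        correct = sum(1 for t, c in zip(true_labels, cluster_ids) if mapping[c] == t)
        best = max(best, correct)
    return best / len(true_labels)


# Toy example: cluster 2 matches class 0, cluster 0 mostly matches class 1.
true_labels = [0, 0, 1, 1, 2, 2]
cluster_ids = [2, 2, 0, 0, 1, 0]
accuracy = clustering_accuracy(true_labels, cluster_ids, k=3)
print(round(accuracy, 4))  # 0.8333
```

For k = 10, as in the experiments above, enumerating all permutations is slow. An assignment solver such as the Hungarian algorithm finds the same optimum far more efficiently.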

An Outlier Detection Using Autoencoder for Ocean Observation Data (해양 이상 자료 탐지를 위한 오토인코더 활용 기법 최적화 연구)

  • Kim, Hyeon-Jae;Kim, Dong-Hoon;Lim, Chaewook;Shin, Yongtak;Lee, Sang-Chul;Choi, Youngjin;Woo, Seung-Buhm
    • Journal of Korean Society of Coastal and Ocean Engineers / v.33 no.6 / pp.265-274 / 2021
  • Outlier detection in ocean data has traditionally been performed using statistical and distance-based machine learning algorithms. Recently, AI-based methods have received a lot of attention, and supervised learning methods, which require classification information for the data, are mainly used. Supervised learning is costly and time-consuming because the classification information (label) must be assigned manually to all the data required for learning. In this study, an autoencoder based on unsupervised learning was applied to outlier detection to overcome this problem. Two experiments were designed: univariate learning, in which only SST data from the observations at Deokjeok Island were used, and multivariate learning, in which SST, air temperature, wind direction, wind speed, air pressure, and humidity were used. The data cover 25 years, from 1996 to 2020, and pre-processing that considers the characteristics of ocean data was applied. We tried to detect outliers in real SST data using the trained univariate and multivariate autoencoders. To compare model performance, various outlier detection methods were applied to synthetic data with artificially inserted errors. A quantitative evaluation showed accuracies of about 96% for the multivariate and 91% for the univariate model, indicating that the multivariate autoencoder had better outlier detection performance. Outlier detection using an unsupervised learning-based autoencoder is expected to be widely useful because it can reduce subjective classification errors and the cost and time required for data labeling.
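Autoencoder-based outlier detection is commonly implemented by flagging samples whose reconstruction error is large. The abstract does not state the decision rule used, so the sketch below only illustrates that general idea. The reconstructed values are hard-coded stand-ins for an autoencoder's output, and the threshold of 1.0 is an arbitrary assumption.

```python
from typing import List


def flag_outliers(observed: List[float], reconstructed: List[float], threshold: float) -> List[int]:
    """Return the indices whose absolute reconstruction error exceeds `threshold`."""
    return [
        i
        for i, (x, x_hat) in enumerate(zip(observed, reconstructed))
        if abs(x - x_hat) > threshold
    ]


# Hypothetical SST readings (degrees C) and stand-in reconstructions.
sst_observed = [14.1, 14.3, 14.2, 25.0, 14.4, 14.6]
sst_reconstructed = [14.0, 14.2, 14.3, 14.4, 14.5, 14.5]

outlier_indices = flag_outliers(sst_observed, sst_reconstructed, threshold=1.0)
print(outlier_indices)  # [3]
```

A model trained mostly on normal data reconstructs typical values well and anomalous values poorly, which is why the spike at index 3 stands out.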

A Novel of Data Clustering Architecture for Outlier Detection to Electric Power Data Analysis (전력데이터 분석에서 이상점 추출을 위한 데이터 클러스터링 아키텍처에 관한 연구)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Young Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • KIPS Transactions on Software and Data Engineering / v.6 no.10 / pp.465-472 / 2017
  • In the past, researchers mainly used supervised machine learning techniques to analyze power data and investigated the identification of patterns through data mining. These older classification and analysis techniques face limitations today, as the size of electric power data has grown with the possibility of real-time data provision. This study therefore proposes a clustering architecture for analyzing large-scale electric power data. The proposed clustering process addresses problems of the K-means algorithm, an unsupervised learning technique, and is capable of automating the entire process from the collection of electric power data to their analysis. Power data were categorized and analyzed at three levels: the raw data level, the clustering level, and the user interface level. In addition, K, the ideal number of clusters, was identified based on principal component analysis and the normal distribution, and an altered K-means algorithm was proposed that reduces the data that would be categorized as outliers in order to increase the efficiency of clustering.

Gaussian mixture model for automated tracking of modal parameters of long-span bridge

  • Mao, Jian-Xiao;Wang, Hao;Spencer, Billie F. Jr.
    • Smart Structures and Systems / v.24 no.2 / pp.243-256 / 2019
  • Determining the most meaningful structural modes and gaining insight into how these modes evolve are important issues for long-term structural health monitoring of long-span bridges. To address this, modal parameters identified throughout the life of the bridge need to be compared and linked with each other, a process known as mode tracking. The modal frequencies of a long-span bridge are typically closely spaced and sensitive to the environment (e.g., temperature, wind, and traffic), which makes automated tracking of modal parameters difficult and often dependent on human intervention. Machine learning methods are well suited to uncovering complex underlying relationships between processes and thus have the potential to realize accurate and automated modal tracking. In this study, the Gaussian mixture model (GMM), a popular unsupervised machine learning method, is employed to automatically determine and update baseline modal properties from the identified unlabeled modal parameters. On this foundation, a new method is proposed for automated mode tracking of long-span bridges. First, a numerical example of a three-degree-of-freedom system is used to validate the feasibility of using GMM to automatically determine the baseline modal properties. Next, field monitoring data from a long-span bridge are used to illustrate the practical use of GMM for automated determination of the baseline list. Finally, continuously monitored bridge acceleration data recorded during strong typhoon events are used to validate the reliability of the proposed method in tracking the changing modal parameters. Results show that the proposed method can automatically track the modal parameters in disastrous scenarios and provide valuable references for condition assessment of the bridge structure.
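As a rough illustration of how a Gaussian mixture can group unlabeled modal frequencies into baseline modes, the sketch below fits a one-dimensional, two-component GMM with the EM algorithm. It is a generic textbook EM under simplifying assumptions, not the tracking method of the paper. The frequency values, the initialization, and the variance floor are all hypothetical choices.

```python
import math
from typing import List, Tuple


def normal_pdf(x: float, mean: float, var: float) -> float:
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs: List[float], k: int = 2, iters: int = 100) -> Tuple[List[float], List[float], List[float]]:
    """EM for a 1-D Gaussian mixture (k >= 2); returns (weights, means, variances)."""
    lo, hi = min(xs), max(xs)
    spread = (hi - lo) / k or 1.0
    means = [lo + (hi - lo) * j / (k - 1) for j in range(k)]  # evenly spread start
    variances = [spread ** 2] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each sample.
        resp = []
        for x in xs:
            dens = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a degenerate component
            weights[j] = nj / len(xs)
    return weights, means, variances


# Hypothetical identified frequencies (Hz) scattered around two modes.
frequencies = [0.098, 0.099, 0.100, 0.101, 0.102,
               0.148, 0.149, 0.150, 0.151, 0.152]
weights, means, variances = fit_gmm_1d(frequencies, k=2)
print([round(m, 3) for m in means])
```

Each fitted component mean can be read as a baseline frequency for one mode, and the responsibilities computed in the E-step indicate which mode a newly identified frequency most likely belongs to.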

Performance of Investment Strategy using Investor-specific Transaction Information and Machine Learning (투자자별 거래정보와 머신러닝을 활용한 투자전략의 성과)

  • Kim, Kyung Mock;Kim, Sun Woong;Choi, Heung Sik
    • Journal of Intelligence and Information Systems / v.27 no.1 / pp.65-82 / 2021
  • Stock market investors are generally split into foreign investors, institutional investors, and individual investors. Compared with individual investors, professional investor groups such as foreign investors have an advantage in information and financial power, and foreign investors are therefore known to show good investment performance among market participants. The purpose of this study is to propose an investment strategy that combines investor-specific transaction information with machine learning, and to analyze the portfolio performance of the proposed model using actual stock price and investor-specific transaction data. The Korea Exchange provides securities firms with daily information on each investor group's purchase and sale volumes. We developed a data collection program in C# using an API provided by Daishin Securities Cybosplus, and collected daily opening prices, closing prices, and investor-specific net purchase data for 151 of the 200 KOSPI stocks from January 2, 2007 to July 31, 2017. The self-organizing map model is an artificial neural network that performs clustering by unsupervised learning and was introduced by Teuvo Kohonen in 1984. It implements competition among the artificial neurons within a layer, and all connections are non-recurrent, running from bottom to top. It can be expanded to multiple layers, although a single layer is commonly used. The activation function of the artificial neurons is linear, and the learning rule uses the Instar rule as well as general competitive learning. The backpropagation model is an artificial neural network that performs classification by supervised learning. We grouped and transformed the investor-specific transaction volume data with the self-organizing map model to train the backpropagation model. Based on the estimates for the verification data after training, the portfolios were rebalanced monthly. For performance analysis, a passive portfolio was designated, and the KOSPI 200 and KOSPI index returns were obtained as proxies for market returns. Performance analysis was conducted using the equally-weighted portfolio return, compound interest rate, annual return, Maximum Draw Down (MDD), standard deviation, and Sharpe Ratio. The buy-and-hold return of the top 10 market capitalization stocks was designated as the benchmark, since buy and hold is the best strategy under the efficient market hypothesis. The prediction rate of the backpropagation model was very high on the training data, at 96.61%, and relatively high on the verification data, at 57.1%. The performance of the self-organizing map grouping can be judged from the results of the backpropagation model, because poor grouping by the self-organizing map would have led to poor learning by the backpropagation model. On this basis, the machine learning model is judged to have learned better than in previous studies. Our portfolio doubled the return of the benchmark and outperformed the market returns of the KOSPI and KOSPI 200 indexes. Compared with the benchmark, the portfolio risk indicators, MDD and standard deviation, were also better. The Sharpe Ratio was higher than those of the benchmark and the stock market indexes. We thus presented a direction for portfolio composition programs that use machine learning and investor-specific transaction information, and showed that the approach can be used to develop programs for real stock investment. The reported return is the result of monthly portfolio composition and asset rebalancing to equal proportions. Better outcomes are expected if, when the monthly portfolio is formed, the system rebalances the suggested stocks continuously instead of selling and re-buying them. The strategy therefore appears applicable to real transactions.
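Two of the risk measures named in this abstract, Maximum Draw Down and the Sharpe Ratio, can be computed as sketched below. The abstract does not give the exact formulas used, so these are the standard textbook definitions. The zero risk-free rate, the monthly annualization factor, and the sample values are assumptions.

```python
import math
from typing import List


def max_drawdown(values: List[float]) -> float:
    """Largest peak-to-trough decline of a value series, as a fraction of the peak."""
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                    # highest value seen so far
        worst = max(worst, (peak - v) / peak)  # decline from that peak
    return worst


def sharpe_ratio(returns: List[float], risk_free: float = 0.0, periods_per_year: int = 12) -> float:
    """Annualized Sharpe ratio from periodic returns, using the sample standard deviation."""
    excess = [r - risk_free for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)


# Hypothetical month-end portfolio values.
portfolio_values = [100.0, 110.0, 99.0, 120.0, 90.0, 130.0]
mdd = max_drawdown(portfolio_values)
print(mdd)  # 0.25, the drop from 120 to 90

# Hypothetical monthly returns.
sharpe = sharpe_ratio([0.01, 0.03])
print(round(sharpe, 4))
```

A lower MDD and a higher Sharpe Ratio both indicate a better risk profile, which is the sense in which the abstract compares the portfolio with its benchmark.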

Forecasting the Precipitation of the Next Day Using Deep Learning (딥러닝 기법을 이용한 내일강수 예측)

  • Ha, Ji-Hun;Lee, Yong Hee;Kim, Yong-Hyuk
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.2 / pp.93-98 / 2016
  • For accurate precipitation forecasts, the choice of weather factors and prediction method is very important. Recently, machine learning has been widely used for forecasting precipitation, and artificial neural networks, one class of machine learning techniques, have shown good performance. In this paper, we suggest a new method for forecasting precipitation using a DBN, a deep learning technique. A DBN has the advantage that its initial weights are set by unsupervised learning, which compensates for a defect of artificial neural networks. We used past precipitation, temperature, and the parameters of the sun's and moon's motion as features. The dataset consists of observation data measured over 40 years by AWS in Seoul. Experiments were based on 8-fold cross validation. Because the estimation yields probabilities for the test dataset, a threshold was used to decide whether precipitation occurs. CSI and Bias were used to indicate the precision of the precipitation forecasts. Our experimental results showed that the DBN performed better than an MLP.
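The thresholding and scoring steps can be sketched as follows, using the standard definitions of the critical success index, CSI = hits / (hits + misses + false alarms), and the frequency bias, Bias = (hits + false alarms) / (hits + misses). The probabilities, observations, and the 0.5 threshold are hypothetical, and the abstract does not state the threshold actually used.

```python
from typing import List, Tuple


def contingency(probs: List[float], observed: List[bool], threshold: float) -> Tuple[int, int, int]:
    """Count hits, misses and false alarms after thresholding the probabilities."""
    hits = misses = false_alarms = 0
    for p, obs in zip(probs, observed):
        forecast = p >= threshold  # forecast rain when the probability is high enough
        if forecast and obs:
            hits += 1
        elif obs:
            misses += 1
        elif forecast:
            false_alarms += 1
    return hits, misses, false_alarms


def csi(hits: int, misses: int, false_alarms: int) -> float:
    """Critical success index: fraction of rain events or forecasts that were hits."""
    return hits / (hits + misses + false_alarms)


def frequency_bias(hits: int, misses: int, false_alarms: int) -> float:
    """Ratio of forecast rain events to observed rain events (1.0 is unbiased)."""
    return (hits + false_alarms) / (hits + misses)


# Hypothetical model probabilities and next-day observations.
probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
observed = [True, True, True, False, False, False]

counts = contingency(probs, observed, threshold=0.5)
print(counts)                   # (2, 1, 1)
print(csi(*counts))             # 0.5
print(frequency_bias(*counts))  # 1.0
```

Raising the threshold trades false alarms for misses, so the two scores are usually inspected together when the threshold is chosen.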

EEG Signal Classification based on SVM Algorithm (SVM(Support Vector Machine) 알고리즘 기반의 EEG(Electroencephalogram) 신호 분류)

  • Rhee, Sang-Won;Cho, Han-Jin;Chae, Cheol-Joo
    • Journal of the Korea Convergence Society / v.11 no.2 / pp.17-22 / 2020
  • In this paper, we measured users' EEG signals, classified them using the Support Vector Machine (SVM) algorithm, and measured the classification accuracy. EEG signals were measured separately for men and women using a single-channel EEG device, and the measurements were analyzed using R. The data were split 80:20 into training and test data, and prediction was performed with the combination of feature vectors that gave the highest SVM classification performance, yielding a recognition rate of 93.2%. The results suggest that a user's EEG signal can be recognized with about 93.2% accuracy using only simple linear classification with the SVM algorithm, which can be used in various ways for biometrics based on EEG signals.