• Title/Summary/Keyword: k-Means algorithm

A Study on the Development of Prediction Method of Ozone Formation for Ozone Forecast System (오존예보시스템을 위한 오존 발생량의 예측기법 개발에 관한 연구)

  • Oh, Sea Cheon;Yeo, Yeong-Koo
    • Clean Technology / v.8 no.1 / pp.27-37 / 2002
  • To verify the performance and effectiveness of a bilinear model for an ozone prediction system, simulation experiments on model identification for ozone formation were performed using bilinear and linear models. The ozone formation predicted by the bilinear model was then compared with that predicted by the linear model and with measured data from Seoul. An ARMA (autoregressive moving average) model was used for model identification, and a recursive parameter estimation algorithm based on an equation error method was used to estimate the model parameters. In the model identification experiments, the ozone formation given by the bilinear model showed good agreement with that from the simulator. The comparison of the predictions with the measured data suggests that the proposed method is a reasonable means of developing real-time, short-term prediction of ozone formation for an ozone forecast system.
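
The abstract above does not give the model equations, so the following is only a minimal sketch of recursive parameter estimation with an equation-error update (recursive least squares). It assumes a hypothetical first-order bilinear model y[t] = a*y[t-1] + b*u[t-1] + c*y[t-1]*u[t-1]; the model order, the function names, and the synthetic data are illustrative assumptions, not the authors' setup.

```python
import random

def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step; returns the updated (theta, P)."""
    n = len(theta)
    Pphi = [sum(P[i][j] * phi[j] for j in range(n)) for i in range(n)]
    denom = lam + sum(phi[i] * Pphi[i] for i in range(n))
    gain = [v / denom for v in Pphi]
    # Equation error: measured output minus one-step prediction.
    err = y - sum(phi[i] * theta[i] for i in range(n))
    theta = [theta[i] + gain[i] * err for i in range(n)]
    # P is symmetric, so phi^T P equals (P phi)^T.
    P = [[(P[i][j] - gain[i] * Pphi[j]) / lam for j in range(n)]
         for i in range(n)]
    return theta, P

def identify_bilinear(u, y, lam=1.0):
    """Estimate (a, b, c) of the assumed first-order bilinear model."""
    theta = [0.0, 0.0, 0.0]
    P = [[1e4 if i == j else 0.0 for j in range(3)] for i in range(3)]
    for t in range(1, len(y)):
        # The bilinear term y*u is the only difference from a linear model.
        phi = [y[t - 1], u[t - 1], y[t - 1] * u[t - 1]]
        theta, P = rls_update(theta, P, phi, y[t], lam)
    return theta

# Synthetic, noise-free data from hypothetical "true" parameters.
rng = random.Random(0)
a, b, c = 0.5, 1.0, 0.2
u = [rng.uniform(-1.0, 1.0) for _ in range(500)]
y = [0.0]
for t in range(1, 500):
    y.append(a * y[t - 1] + b * u[t - 1] + c * y[t - 1] * u[t - 1])

estimate = identify_bilinear(u, y)
print(estimate)
```

Dropping the third regressor turns this into the corresponding linear model, which is one way the two model classes could be compared on the same data.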

A Study on Detecting of an Anonymity Network and an Effective Counterstrategy in the Massive Network Environment (대용량 네트워크 환경에서 익명 네트워크 탐지 및 효과적 대응전략에 관한 연구)

  • Seo, Jung-woo;Lee, Sang-jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.3 / pp.667-678 / 2016
  • With the development of wired and wireless network infrastructure, far more traffic than in the past is now served over the Internet, and traffic grows every year with shifts in the network paradigm such as the Internet of Things; about 1.6 zettabytes of traffic is expected to be carried over the network in 2018. As network traffic increases, the performance of security infrastructure is improving as well so that security equipment can handle bulk terabyte-scale traffic, and this equipment generates hundreds of thousands of security events every day, such as hacking attempts and malicious code. Efficiently analyzing and responding to attack-attempt events detected by a company's various security equipment is one of the most important tasks in providing a stable Internet service. This study attempts to overcome the limits of existing studies, such as those on detecting low-latency Tor network traffic, by applying the proposed algorithm to events detected in the security infrastructure in order to classify anonymity networks.

Analysis of Hyperspectral Radiometer and Water Constituents Data for Remote Estimation of Water Quality (원격 수질 측정을 위한 현장 초분광 복사계 및 수중 구성성분 관측 자료 분석)

  • Kim, Wonkook;Choi, Jun Myoung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.36 no.4 / pp.205-211 / 2018
  • Remote estimation of water quality via radiometric instruments provides a convenient means of monitoring environmental changes in water bodies over wide areas. Combined with platforms such as satellites and manned or unmanned vehicles, it reduces the cost and time of acquiring water quality information for target areas of interest. To develop accurate retrieval algorithms, however, it is critical to acquire in-situ measurements from various optical environments. In this study, hyperspectral radiometric measurements, coincident water quality variables, and their optical properties were obtained to analyze the optical environment of the study area. Field data collected around the Tongyeong area showed that the area has an optically complex environment, with occasional outbreaks of red tide in summer. The effects of water constituents on the optical variables (remote sensing reflectance and absorption coefficients) were qualitatively analyzed.

Design of Echo Classifier Based on Neuro-Fuzzy Algorithm Using Meteorological Radar Data (기상레이더를 이용한 뉴로-퍼지 알고리즘 기반 에코 분류기 설계)

  • Oh, Sung-Kwun;Ko, Jun-Hyun
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.5 / pp.676-682 / 2014
  • In this paper, precipitation echo (PRE) and non-precipitation echo (N-PRE, including ground echo and clear echo) in weather radar data are identified with the aid of a neuro-fuzzy algorithm. The accuracy of radar information is lowered because meteorological radar data contain a mixture of PRE and N-PRE. This problem is resolved by using an RBFNN and a judgement module. The structure of the weather radar data is analyzed in order to classify PRE and N-PRE. Input variables such as the standard deviation of reflectivity (SDZ), vertical gradient of reflectivity (VGZ), spin change (SPN), frequency (FR), cumulative reflectivity over 1 hour (1hDZ), and cumulative reflectivity over 2 hours (2hDZ) are derived from the weather radar data, and the characteristics of each input variable are analyzed. The input data are built from those input variables that have a critical effect on the classification between PRE and N-PRE. An echo judgement module is developed to classify echoes as PRE or N-PRE using a testing dataset. Polynomial-based radial basis function neural networks (RBFNNs) are used as the neuro-fuzzy algorithm, and the proposed neuro-fuzzy echo pattern classifier is designed by combining the RBFNN with the echo judgement module. Finally, the results of the proposed classifier are compared with CZ and DZ, as well as QC data, and analyzed from the viewpoint of output performance.

Creation and clustering of proximity data for text data analysis (텍스트 데이터 분석을 위한 근접성 데이터의 생성과 군집화)

  • Jung, Min-Ji;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.451-462 / 2019
  • A document-term frequency matrix is a type of data used in text mining. This matrix is often based on various documents provided by the objects to be analyzed. When analyzing objects using this matrix, researchers generally select as keywords only those terms that are common to the documents belonging to one object, and then use the keywords to analyze the object. However, this method loses the unique information of individual documents and also removes potential keywords that occur frequently in a specific document. In this study, we define data that can overcome this problem as proximity data. We introduce twelve methods that generate proximity data and cluster the objects through two clustering methods: multidimensional scaling and k-means cluster analysis. Finally, we choose the method best suited to clustering the objects.

Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas (메탄 가스 기반 가스 누출 위험 예측을 위한 다변량 특이치 제거)

  • Dashdondov, Khongorzul;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.11 no.12 / pp.23-30 / 2020
  • In this study, the relationship between natural gas (NG) data and gas-related environmental elements was analyzed using machine learning algorithms to predict the level of gas leakage risk without directly measuring gas leakage data. The study was based on open data provided by a server using the IoT-based remote control Picarro gas sensor specification. When natural gas leaks into the air, it poses a serious problem for air pollution, the environment, and health. The proposed method is a multivariate outlier removal method based on Random Forest (RF) classification for predicting the risk of NG leaks. After unsupervised k-means clustering, the experimental dataset was imbalanced. Therefore, we focus on how well the proposed models predict medium and high risk. We compared the receiver operating characteristic (ROC) curve, accuracy, area under the ROC curve (AUC), and mean standard error (MSE) for each classification model. In our experiments, MOL_RF achieved an accuracy of 99.71%, an AUC of 99.57%, and an MSE of 0.0016.

Prediction of Housing Price Index using Data Mining and Learning Techniques (데이터마이닝과 학습기법을 이용한 부동산가격지수 예측)

  • Lee, Jiyoung;Ryu, Jae Pil
    • Journal of the Korea Convergence Society / v.12 no.8 / pp.47-53 / 2021
  • With increasing interest in the 4th industrial revolution, data-driven scientific methodologies have developed. However, data collection in real estate research is limited. In addition, as the public becomes more knowledgeable about the real estate market, qualitative sentiment plays a bigger role in that market. Therefore, we propose a method that uses text mining and the k-means algorithm to collect quantitative data reflecting sentiment, rather than relying on the existing source data, and that predicts the direction of the housing price index through artificial neural network learning based on the collected data. Data from 2012 to 2019 are used as the training period and 2020 as the prediction period. We expect this study to help real estate market participants use scientific methods such as artificial neural networks, rather than classical methodology, in their decision-making process.

Improve reliability of SSD through cluster analysis based on error rate of 3D-NAND flash memory and application of differentiated protection policy (3D-NAND 플래시 메모리의 오류율 기반 군집분석과 차별화된 보호정책 적용을 통한 SSD의 신뢰성 향상 방안)

  • Son, Seung woo;Oh, Min jin;Kim, Jaeho
    • Proceedings of the Korean Society of Computer Information Conference / 2021.07a / pp.1-2 / 2021
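  • 3D NAND flash memory provides high capacity per unit area by stacking planar (2D) NAND cells. However, owing to the nature of the stacking process, the error frequency can differ by layer or by cell position. This phenomenon becomes more pronounced as the number of program/erase (P/E) cycles of the flash memory increases. Most flash-based storage devices, such as SSDs, use ECC for error correction. Because this method provides a fixed protection strength for all flash memory pages, it shows limitations on 3D NAND flash memory, where the error rate varies with physical location. In this paper, we therefore classify pages and layers that differ in error rate and apply a differentiated protection strength to each region. Based on error rates measured at 3K P/E cycles, where the error rate differs markedly across pages and layers, we classify pages and layers and provide differentiated protection strength by adding parity data to error-prone regions. We use the K-Means machine learning algorithm to divide regions according to the number of errors. We show that such a differentiated protection policy has the potential to improve the reliability and lifetime of 3D NAND flash memory.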
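
As a rough illustration of the region classification described above, the sketch below runs k-means (Lloyd's algorithm) on per-page error counts and marks the pages in the highest-error cluster as candidates for extra parity. The error counts, the choice of k = 2, the initialization, and all names are hypothetical assumptions; the abstract does not specify the features, the number of clusters, or the parity scheme.

```python
def kmeans_1d(values, k, iters=100):
    """Lloyd's algorithm on scalar values; returns (centroids, labels)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic init: spread the starting centroids over the sorted data.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: abs(v - centroids[c]))
                      for v in values]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, l in zip(values, new_labels) if l == c]
            new_centroids.append(
                sum(members) / len(members) if members else centroids[c])
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return centroids, labels

# Hypothetical error counts per page, indexed by page number.
errors_per_page = [3, 5, 4, 40, 42, 6, 38, 2]
centroids, labels = kmeans_1d(errors_per_page, k=2)

# Pages in the cluster with the largest centroid are treated as error-prone.
weak = max(range(len(centroids)), key=lambda c: centroids[c])
extra_parity_pages = [p for p, l in enumerate(labels) if l == weak]
print(centroids, extra_parity_pages)
```

With more than two clusters, each cluster could be mapped to its own protection level instead of a single strong/weak split.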

A Study on Clustering of Core Competencies to Deploy in and Develop Courseworks for New Digital Technology (카드소팅을 활용한 디지털 신기술 과정 핵심역량 군집화에 관한 연구)

  • Ji-Woon Lee;Ho Lee;Joung-Huem Kwon
    • Journal of Practical Engineering Education / v.14 no.3 / pp.565-572 / 2022
  • Card sorting is a useful data collection method for understanding users' perceptions of the relationships between items. In general, it is an intuitive and cost-effective technique that is very useful for user research and evaluation. In this study, the core competencies of each field were used as competency cards in the card sorting stage of course development, and clustering results were derived by applying the K-means algorithm to the sorting results. The resulting clustering of core competencies for each occupation in each field was verified based on Participant-Centric Analysis (PCA). For the core competency cards of each occupation, the number of participants who agreed on a clustering and the degree of card similarity were derived relative to the number of sorting participants.

Development of Sentiment Analysis Model for the hot topic detection of online stock forums (온라인 주식 포럼의 핫토픽 탐지를 위한 감성분석 모형의 개발)

  • Hong, Taeho;Lee, Taewon;Li, Jingjing
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.187-204 / 2016
  • Document classification based on emotional polarity has become a welcome emerging task owing to the explosion of data on the Web. In the big data age, there are too many information sources to consult when making decisions. For example, when considering travel to a city, a person may search for reviews through a search engine such as Google or on social networking services (SNSs) such as blogs, Twitter, and Facebook. The emotional polarity of positive and negative reviews helps a user decide whether or not to make a trip. Sentiment analysis of customer reviews has become an important research topic as data mining technology is widely accepted for text mining of the Web. Sentiment analysis has been used to classify documents through machine learning techniques such as decision trees, neural networks, and support vector machines (SVMs). It is used to determine the attitude, position, and sensibility of people who write articles about various topics published on the Web. Regardless of their polarity, emotional reviews are very helpful material for analyzing customers' opinions. Sentiment analysis helps reveal what customers really want almost instantly, with the help of automated text mining techniques. It applies text mining techniques to text on the Web to extract the subjective information it contains, and it is used to determine the attitude or position of the person who wrote an article and presented an opinion about a particular topic. In this study, we developed a model that selects hot topics from user posts on China's online stock forum by using the k-means algorithm and a self-organizing map (SOM). In addition, we developed a detection model to predict hot topics by using machine learning techniques such as logit, decision trees, and SVM. 
We employed sentiment analysis to develop our model for the selection and detection of hot topics from China's online stock forum. Sentiment analysis calculates a sentiment value for a document based on contrast and classification according to a polarity sentiment dictionary (positive or negative). The online stock forum was an attractive site because of its information about stock investment. Users post numerous texts about stock movements, analyzing the market according to government policy announcements, market reports, reports from research institutes on the economy, and even rumors. We divided the online forum's topics into 21 categories for sentiment analysis. One hundred forty-four topics were selected from the 21 categories of the online stock forum. The posts were crawled to build a database of positive and negative text. After preprocessing the text from March 2013 to February 2015, we ultimately obtained 21,141 posts on 88 topics. An interest index was defined to select the hot topics, and the k-means algorithm and SOM produced equivalent results on this data. We developed decision tree models to detect hot topics with three algorithms: CHAID, CART, and C4.5. The results of CHAID were subpar compared to the others. We also employed SVM to detect the hot topics from negative data. The SVM models were trained with the radial basis function (RBF) kernel, with parameters chosen by grid search. Detecting hot topics through sentiment analysis gives investors the latest trends and hot topics in the stock forum, so that they no longer need to search the vast amount of information on the Web. Our proposed model also helps to rapidly determine customers' signals or attitudes towards government policy and firms' products and services.