• Title/Summary/Keyword: K means clustering


Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • This article proposes predicting natural gas (NG) leakage levels through feature selection based on factor analysis (FA) of an integrated data set that combines Korea Meteorological Administration data with natural gas leakage data, so that complex factors can be considered. The method consists of three modules. First, missing data in the integrated data set are filled by linear interpolation, and essential features are selected using FA with OrdinalEncoder (OE)-based normalization. Second, the data set is labeled by K-means clustering. Finally, four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), are used to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean squared error (MSE). The test results indicate that the OrdinalEncoder-factor analysis (OE-F)-based classification method improves performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
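
The first module described above fills missing values by linear interpolation. The following is a minimal sketch of that step only, assuming a single evenly spaced series in which gaps are marked as `None`; the function name and the sample readings are hypothetical and are not taken from the paper.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps (None) by linear interpolation between known neighbours.

    Leading and trailing gaps have no neighbour on one side and stay None.
    """
    filled = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of consecutive known positions and fill the gap between them.
    for left, right in zip(known, known[1:]):
        span = right - left
        for i in range(left + 1, right):
            step = (values[right] - values[left]) * (i - left) / span
            filled[i] = values[left] + step
    return filled


# Hypothetical readings with two interior gaps and one trailing gap.
readings = [10.0, None, 14.0, None, None, 20.0, None]
print(interpolate_linear(readings))
# prints [10.0, 12.0, 14.0, 16.0, 18.0, 20.0, None]
```

The trailing gap is left unfilled on purpose: extrapolating beyond the last known value would be a separate modelling decision.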

Modeling of the Cluster-based Multi-hop Sensor Networks (클거스터 기반 다중 홉 센서 네트워크의 모델링 기법)

  • Choi Jin-Chul;Lee Chae-Woo
    • Journal of the Institute of Electronics Engineers of Korea TC / v.43 no.1 s.343 / pp.57-70 / 2006
  • A wireless sensor network, consisting of a number of small sensors each with a transceiver and a data processor, is an effective means of gathering data in a variety of environments. The data collected by each sensor are transmitted to a processing center that uses all reported data to estimate characteristics of the environment or to detect an event. This process must be designed to conserve the limited energy resources of the sensors. Since neighboring sensors generally hold similar data, a clustering scheme that sends aggregated information to the processing center may save energy. Existing multi-hop cluster energy consumption models cannot estimate the exact energy consumption of an individual sensor. In this paper, we propose a new cluster energy consumption model that addresses this problem. By using Voronoi tessellation, we can estimate the total energy consumption more accurately as a function of the number of clusterheads, and thus realize an energy-efficient cluster formation. Compared with simulation, our model has an accuracy of over 90%, considerably superior to the existing modeling scheme (about 60%). We also confirmed that the proposed model becomes more accurate as the sensor density increases.

Hydrogeochemistry and Statistical Analysis for Low and Intermediate Level Radioactive Waste Disposal Site in Gyeongju (경주 중·저준위 방폐장의 수리지화학 및 통계 분석)

  • Soon-Il Ok;Sieun Kim;Seongyeon Jung;Chung-Mo Lee
    • Journal of the Korean earth science society / v.44 no.6 / pp.629-642 / 2023
  • Currently, low and intermediate level radioactive waste is being disposed of at the Gyeongju disposal site for permanent isolation. Since 2006, the Korea Radioactive Waste Agency has been conducting site characteristics surveys, continuously verifying changes in the site based on the site monitoring and investigation plan. The hydrogeochemical environment of the disposal site is considered in the evaluation of natural barriers. However, seawater must also be considered because of the regional characteristics of Gyeongju, which is near the East Sea. Therefore, this study collected 30 samples from seven wells to derive groundwater quality data and compared them with two seawater samples collected from October 2017 to June 2022. Additionally, the study explores groundwater monitoring methods that use statistical tools such as clustering and background concentration analysis. Using statistical analysis, molar ratios, and K-means clustering, the groundwater samples in the study area were classified into two to four clusters depending on their chemical constituents (especially EC, HCO3, Na, and Cl).
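
The classification step above relies on K-means clustering. A minimal sketch of the standard Lloyd iteration is given here for orientation; it is not the study's actual workflow. The two-feature samples and the choice of initial centroids are made up for illustration, and real hydrochemical data would normally be standardized first.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], initial_centroids: Sequence[Point],
           max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [tuple(c) for c in initial_centroids]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no assignment changed, so the result is stable
            break
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, labels


# Hypothetical two-feature samples forming two well-separated groups.
samples = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels = kmeans(samples, initial_centroids=[samples[0], samples[-1]])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

Passing the initial centroids explicitly keeps the sketch deterministic; library implementations usually choose them randomly and repeat the run several times.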

A Statistical Approach for Improving the Embedding Capacity of Block Matching based Image Steganography (블록 매칭 기반 영상 스테가노그래피의 삽입 용량 개선을 위한 통계적 접근 방법)

  • Kim, Jaeyoung;Park, Hanhoon;Park, Jong-Il
    • Journal of Broadcast Engineering / v.22 no.5 / pp.643-651 / 2017
  • Steganography is an information hiding technology that differs from cryptography in that it focuses on preventing third parties from detecting the existence of the hidden information, rather than on protecting it from being decoded. In this paper, as an image steganography method, which uses images as the cover medium, we propose a new block matching method that embeds information in the discrete wavelet transform (DWT) domain. Based on a statistical analysis, the proposed method reduces the loss of embedding capacity caused by uneven use of candidate blocks. It computes the variance of each candidate block, preserves candidate blocks with high-frequency components, and reduces the number of candidate blocks with low-frequency components by compressing them with the k-means clustering algorithm. Compared with the previous block matching method, the proposed method can reconstruct secret images with similar PSNRs while embedding higher-capacity information.
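
The abstract above describes computing the variance of each candidate block before deciding which blocks to keep and which to compress. The sketch below illustrates only that variance step, under simplifying assumptions: it works on raw values of a tiny made-up image instead of DWT coefficients, and the `threshold` used to separate the two groups is hypothetical, whereas the paper compresses the low-frequency group with k-means.

```python
from statistics import pvariance
from typing import List

Image = List[List[float]]


def block_variances(image: Image, block: int) -> List[float]:
    """Population variance of each non-overlapping block x block tile, row-major order."""
    rows, cols = len(image), len(image[0])
    variances = []
    for r in range(0, rows - block + 1, block):
        for c in range(0, cols - block + 1, block):
            tile = [image[r + i][c + j] for i in range(block) for j in range(block)]
            variances.append(pvariance(tile))
    return variances


# Hypothetical 4x4 image: flat tiles on the left, textured tiles on the right.
image = [
    [5.0, 5.0, 0.0, 9.0],
    [5.0, 5.0, 9.0, 0.0],
    [7.0, 7.0, 1.0, 8.0],
    [7.0, 7.0, 8.0, 1.0],
]
variances = block_variances(image, block=2)
threshold = 1.0  # assumed cut-off, not taken from the paper
high = [i for i, v in enumerate(variances) if v > threshold]   # blocks to preserve
low = [i for i, v in enumerate(variances) if v <= threshold]   # blocks to compress
print(variances, high, low)
# prints [0.0, 20.25, 0.0, 12.25] [1, 3] [0, 2]
```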

Design and Assessment of an Ozone Potential Forecasting Model using Multi-regression Equations in Ulsan Metropolitan Area (중회귀 모형을 이용한 울산지역 오존 포텐셜 모형의 설계 및 평가)

  • Kim, Yoo-Keun;Lee, So-Young;Lim, Yun-Kyu;Song, Sang-Keun
    • Journal of Korean Society for Atmospheric Environment / v.23 no.1 / pp.14-28 / 2007
  • This study selected ozone ($O_3$) potential factors and designed and assessed an $O_3$ potential prediction model using multiple linear regression equations for the Ulsan area during springtime (April to June) from 2000 to 2004. $O_3$ potential factors were selected by analyzing the relationship between meteorological parameters and surface $O_3$ concentrations. In addition, cluster analysis (e.g., average linkage and K-means clustering techniques) was performed to identify three major synoptic patterns (P1 to P3) for the $O_3$ potential prediction model. P1 is characterized by the presence of a low-pressure system over northeastern Korea, with Ulsan influenced by northwesterly synoptic flow that delays sea breeze development. P2 is characterized by a weakening high-pressure system over Korea, and P3 is clearly associated with a migratory anticyclone. Stepwise linear regression was performed to develop models for predicting the highest 1-h $O_3$ concentration in Ulsan. The model results were rather satisfactory: the high-$O_3$ simulation accuracy for synoptic patterns P1 to P3 was 79, 85, and 95%, respectively (2000 to 2004). The $O_3$ potential prediction model for P1 to P3, driven by the predicted meteorological data for 2005, showed good high-$O_3$ prediction performance, with 78, 75, and 70%, respectively. Therefore, the regression models can be a useful tool for forecasting local $O_3$ concentrations.

A change of the public's emotion depending on Temperature & Humidity index (온습도에 따른 대중의 감성(감정+감각) 활동 변화)

  • Yang, Junggi;Kim, Geunyoung;Lee, Youngho;Kang, Un-Gu
    • Journal of Digital Convergence / v.12 no.10 / pp.243-252 / 2014
  • Much research is in progress on the effects of social media on politics, economics, and sociocultural phenomena. The authors used NAVER Trend, the best-known web browsing service in Korea, the NAVER Blog and NAVER Cafe social media services, and Open Data (API), together with temperature and humidity index data from the Korea Meteorological Administration. This study analyzed changes in the public's emotion in Korea using cluster analysis of taste-related vocabulary drawn from words for feelings and senses. The number of groups was first decided using a chi-square goodness-of-fit test and Ward analysis, and K-means clustering was then performed. Eight groups of sensibility vocabulary were formed. According to discriminant analysis, the eight groups determined by cluster analysis have 98.9% accuracy. Changes in the public's emotion can help predict people's activity, so that people can share sensibilities and develop a bond of sympathy.

The Habitat Classification of mammals in Korea based on the National Ecosystem Survey (전국자연환경조사를 활용한 포유류 서식지 유형의 분류)

  • Lee, Hwajin;Ha, Jeongwook;Cha, Jinyeol;Lee, Junghyo;Yoon, Heenam;Chung, Chulun;Oh, Hongshik;Bae, Soyeon
    • Journal of Environmental Impact Assessment / v.26 no.2 / pp.160-170 / 2017
  • The purpose of this study is to cluster habitat types and to identify the characteristics of the species in each habitat type using mammal data (70,562) from the 3rd National Ecosystem Survey, conducted from 2006 to 2012. The 15 habitat types recorded on the field sheets of the 3rd National Ecosystem Survey were reclassified, followed by statistical analysis of the mammal habitat types. For the habitat type cluster analysis, non-hierarchical cluster analysis (k-means cluster analysis), hierarchical cluster analysis, and non-metric multidimensional scaling were applied to the 14 habitat types recorded more than 30 times. A total of 7 orders, 16 families, and 39 species of mammals were identified in the nationwide data of the 3rd National Ecosystem Survey. The simple structure index was highest (ssi = 0.07) when the habitat types were classified into 11 clusters. According to the similarities and hierarchies between habitat types suggested by the hierarchical cluster analysis, residential areas were the most distinct habitat type for mammals, followed by a cluster combining rivers and coasts. The non-metric multidimensional scaling analysis showed that both Mus musculus and Rattus norvegicus appeared only in residential areas, the most discriminating habitat type, while Lutra lutra appeared only in coastal and river areas. In summary, according to our results, mammalian habitat can be divided into the following four types: (1) the forest type (using forest as the main habitat and migration route); (2) the river type (using water as the main habitat); (3) the residential type (living near residential areas); and (4) the lowland type (consuming grain or seeds as the main food resource).

Analysis method of patent document to Forecast Patent Registration (특허 등록 예측을 위한 특허 문서 분석 방법)

  • Koo, Jung-Min;Park, Sang-Sung;Shin, Young-Geun;Jung, Won-Kyo;Jang, Dong-Sik
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.4 / pp.1458-1467 / 2010
  • Recently, imitation and infringement of intellectual property rights have been recognized as impediments to a nation's industrial growth. To prevent the huge losses caused by these impediments, many researchers are studying the protection and efficient management of intellectual property in various ways. In particular, predicting patent registration is very important for protecting and asserting intellectual property rights. In this study, we propose a patent document analysis method that uses text mining to predict whether a patent will be registered or rejected. First, the proposed method builds a database from the word frequencies of rejected patent documents. Comparing this database with other patent documents then yields a similarity value between each patent document and the database. We used k-means, a partitioning clustering algorithm, to select the criterion value for patent rejection. As a result, we conclude that patents similar to rejected patents have a strong possibility of rejection. The experimental data were U.S. patent documents on Bluetooth, solar battery, and display technologies.
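
The method above compares a word-frequency database built from rejected patents with each new document to obtain a similarity value. The abstract does not name the similarity measure, so the sketch below uses cosine similarity purely as an assumed stand-in; the tokenizer (lowercase, whitespace split) and the example texts are also hypothetical.

```python
import math
from collections import Counter


def word_frequencies(text: str) -> Counter:
    """Count lowercase whitespace-separated tokens (a deliberately naive tokenizer)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two word-frequency vectors; 0.0 if either is empty."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical word-frequency database of rejected documents and one candidate document.
rejected_db = word_frequencies("wireless pairing method for a wireless device")
candidate = word_frequencies("method for pairing a display device")
print(round(cosine_similarity(rejected_db, candidate), 3))
```

In the study, similarity values of this kind are then split by k-means to find the criterion value for rejection; that step is not reproduced here.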

Analyzing K-POP idol popularity factors using music charts and new media data using machine learning (머신러닝을 활용한 음원 차트와 뉴미디어 데이터를 활용한 K-POP 아이돌 인기 요인 분석)

  • Jiwon Choi;Dayeon Jung;Kangkyu Choi;Taein Lim;Daehoon Kim;Jongkyn Jung;Seunmin Rho
    • Journal of Platform Technology / v.12 no.1 / pp.55-66 / 2024
  • The K-POP market has become influential not only in culture but also in society as a whole, including diplomacy and environmental movements. As a result, various machine learning studies have sought to identify the success factors of idols using traditional data such as music and recordings. However, previous studies are limited in that they do not reflect the influence of new media platforms such as Instagram releases, YouTube shorts, TikTok, and Twitter on the popularity of idols. Because the existing studies do not consider daily changing media trends, it is difficult to clarify the causal relationships behind recent idol success. To solve these problems, this paper proposes a data collection system and analysis methodology for idol-related data. By developing a container-based, automated real-time data collection system that reflects the specific characteristics of idol data, we secure the stability and scalability of idol data collection, and we compare and analyze the clusters of successful idols through a K-Means clustering-based outlier detection model. As a result, we identified commonalities among successful idols, such as gender, time to success after album release, and association with new media. These findings are expected to support planning optimal comeback promotions for each idol, album type, and comeback period to improve the chances of idol success.
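
The abstract above mentions a K-Means clustering-based outlier detection model without giving details. One common construction, shown here only as an assumed illustration, scores each point by its distance to the nearest cluster centroid and flags points beyond a cut-off. The centroids, points, and `threshold` below are all hypothetical, and the centroids are taken as already fitted.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest_centroid_distance(point: Point, centroids: Sequence[Point]) -> float:
    """Distance from a point to the closest centroid (its outlier score)."""
    return min(math.dist(point, c) for c in centroids)


def flag_outliers(points: Sequence[Point], centroids: Sequence[Point],
                  threshold: float) -> List[int]:
    """Indices of points whose outlier score exceeds the threshold."""
    return [
        i for i, p in enumerate(points)
        if nearest_centroid_distance(p, centroids) > threshold
    ]


# Hypothetical fitted centroids and observations.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.0, 1.0), (10.0, 9.0), (5.0, 5.0), (20.0, 20.0)]
print(flag_outliers(points, centroids, threshold=3.0))  # prints [2, 3]
```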


Transcriptome Analyses for the Anti-Adipogenic Mechanism of an Herbal Composition (생약복합물의 지방세포형성억제 기전규명을 위한 전사체 분석)

  • Lee, Hae-Yong;Kang, Ryun-Hwa;Bae, Sung-Min;Chae, Soo-Ahn;Lee, Jung-Ju;Oh, Dong-Jin;Park, Suk-Won;Cho, Soo-Hyun;Shim, Yae-Jie;Yoon, Yoo-Sik
    • Journal of Life Science / v.20 no.7 / pp.1054-1065 / 2010
  • SH21B is a natural composition of seven herbs: Scutellaria baicalensis Georgi, Prunus armeniaca Maxim, Ephedra sinica Stapf, Acorus gramineus Soland, Typha orientalis Presl, Polygala tenuifolia Willd, and Nelumbo nucifera Gaertner (ratio 3:3:3:3:3:2:2). In our previous study, we reported that SH21B inhibited adipogenesis and fat accumulation in 3T3-L1 cells through modulation of various regulators in the adipogenesis pathway. The aim of this study was to analyze the transcriptome profiles underlying the anti-adipogenic effects of SH21B in 3T3-L1 cells. Total RNAs from SH21B-treated 3T3-L1 cells were reverse-transcribed into cDNAs and hybridized to the Affymetrix Mouse Gene 1.0 ST array. From the microarray analyses, we identified 2,568 genes whose expression was changed more than two-fold by SH21B, and clustering analyses of these genes resulted in 9 clusters. During the adipogenesis of 3T3-L1 cells, three of the 9 clusters were down-regulated by SH21B (clusters 4, 6, and 9), and two were up-regulated by SH21B (clusters 7 and 8). Many genes related to cell proliferation and adipogenesis were included in these clusters. Clusters 4, 6, and 9 included genes related to adipogenesis induction and cell cycle arrest. Clusters 7 and 8 included genes related to cell proliferation as well as adipogenesis inhibition. These results suggest that the anti-adipogenic effects of SH21B may be mediated by the modulation of genes involved in cell proliferation and adipogenesis.
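
The gene selection above keeps genes whose expression changed more than two-fold. A minimal sketch of such a filter follows, assuming linear-scale, strictly positive intensities for a control and a treated sample; the gene names and values are invented, and the study's actual normalization and replicate handling are not described in the abstract.

```python
import math
from typing import Dict, Tuple


def two_fold_changed(expression: Dict[str, Tuple[float, float]],
                     min_fold: float = 2.0) -> Dict[str, float]:
    """Return log2 fold changes for genes changed by more than min_fold, up or down.

    `expression` maps a gene name to (control, treated) linear-scale intensities.
    """
    cutoff = math.log2(min_fold)
    changed = {}
    for gene, (control, treated) in expression.items():
        log_fc = math.log2(treated / control)
        if abs(log_fc) > cutoff:  # strictly more than two-fold in either direction
            changed[gene] = log_fc
    return changed


# Hypothetical intensities: one up-regulated, one unchanged, one down-regulated gene.
expression = {
    "gene_a": (100.0, 450.0),
    "gene_b": (200.0, 210.0),
    "gene_c": (400.0, 100.0),
}
changed = two_fold_changed(expression)
print(sorted(changed))  # prints ['gene_a', 'gene_c']
```

Working on the log2 scale makes up- and down-regulation symmetric: a four-fold decrease gives -2.0, just as a four-fold increase gives 2.0.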