• Title/Summary/Keyword: K means clustering

Clustering-based Statistical Machine Translation Using Syntactic Structure and Word Similarity (문장구조 유사도와 단어 유사도를 이용한 클러스터링 기반의 통계기계번역)

  • Kim, Han-Kyong;Na, Hwi-Dong;Li, Jin-Ji;Lee, Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.37 no.4 / pp.297-304 / 2010
  • Clustering based on sentence type or document genre is a technique for improving the translation quality of statistical machine translation (SMT) through domain-specific translation. However, no previous research has used sentence type and document genre information simultaneously. In this paper, we propose an integrated clustering method that classifies sentence type by syntactic structure similarity and document genre by word similarity. We interpolate the domain-specific models obtained from the clusters with general models to improve the translation quality of the SMT system. A kernel function and the cosine measure are applied to calculate structural similarity and word similarity, respectively. Using these similarities, we cluster the data with machine learning algorithms similar to K-means. On a Japanese-English patent translation corpus, we obtained a 2.5% point relative improvement in translation quality in the best case.
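The two ingredients named in this abstract, a cosine measure over words and interpolation of a domain-specific model with a general one, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the bag-of-words representation, the linear form of the interpolation, the weight `lam`, and the toy centroids are all assumptions made for the example.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two bag-of-words count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_cluster(sentence: str, centroids: dict) -> str:
    """Return the name of the centroid most similar to the sentence."""
    bag = Counter(sentence.lower().split())
    return max(centroids, key=lambda name: cosine_similarity(bag, centroids[name]))


def interpolate(p_domain: float, p_general: float, lam: float) -> float:
    """Linear interpolation (assumed form) of domain and general model scores."""
    return lam * p_domain + (1.0 - lam) * p_general


# Toy centroids: hypothetical word counts, not data from the paper.
centroids = {
    "chemistry": Counter("the compound is mixed with the solvent".split()),
    "electronics": Counter("the circuit is connected to the transistor".split()),
}
cluster = nearest_cluster("a transistor in the circuit", centroids)
score = interpolate(p_domain=0.4, p_general=0.1, lam=0.5)
print(cluster, score)
```

In a K-means-style loop, `nearest_cluster` would play the role of the assignment step, with the centroids recomputed from the sentences assigned to each cluster.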

Machine Learning-Based Transactions Anomaly Prediction for Enhanced IoT Blockchain Network Security and Performance

  • Nor Fadzilah Abdullah;Ammar Riadh Kairaldeen;Asma Abu-Samah;Rosdiadee Nordin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.7 / pp.1986-2009 / 2024
  • The integration of blockchain technology with the rapid growth of Internet of Things (IoT) devices has enabled secure and decentralised data exchange. However, security vulnerabilities and performance limitations remain significant challenges in IoT blockchain networks. This work proposes a novel approach that combines transaction representation and machine learning techniques to address these challenges. Various clustering techniques, including k-means, DBSCAN, Gaussian Mixture Models (GMM), and Hierarchical clustering, were employed to effectively group unlabelled transaction data based on their intrinsic characteristics. Anomaly transaction prediction models based on classifiers were then developed using the labelled data. Performance metrics such as accuracy, precision, recall, and F1-measure were used to identify the minority class representing specious transactions or security threats. The classifiers were also evaluated on their performance using balanced and unbalanced data. Compared to unbalanced data, balanced data resulted in an overall average improvement of approximately 15.85% in accuracy, 88.76% in precision, 60% in recall, and 74.36% in F1-score. This demonstrates the effectiveness of each classifier as a robust classifier with consistently better predictive performance across various evaluation metrics. Moreover, the k-means and GMM clustering techniques outperformed other techniques in identifying security threats, underscoring the importance of appropriate feature selection and clustering methods. The findings have practical implications for reinforcing security and efficiency in real-world IoT blockchain networks, paving the way for future investigations and advancements.
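The reason the abstract reports precision, recall, and F1 for the minority class, rather than accuracy alone, can be illustrated with a small sketch. The label encoding (1 for an anomalous transaction, 0 for a normal one) and the toy predictions are assumptions for the example; none of the numbers come from the paper.

```python
def minority_class_metrics(y_true, y_pred, positive=1):
    """Accuracy plus precision/recall/F1 computed for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Unbalanced toy data: 95 normal transactions (0) and 5 anomalous ones (1).
y_true = [0] * 95 + [1] * 5

# A classifier that always predicts "normal" looks accurate but finds nothing.
always_normal = minority_class_metrics(y_true, [0] * 100)

# A classifier that catches 4 of 5 anomalies with one false alarm.
better = minority_class_metrics(y_true, [0] * 94 + [1] + [1] * 4 + [0])

print(always_normal)
print(better)
```

The first classifier reaches 95% accuracy while its recall on the anomalous class is zero, which is why minority-class metrics are the more informative comparison on unbalanced data.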

Analysis of Apartment Power Consumption and Forecast of Power Consumption Based on Deep Learning (공동주택 전력 소비 데이터 분석 및 딥러닝을 사용한 전력 소비 예측)

  • Yoo, Namjo;Lee, Eunae;Chung, Beom Jin;Kim, Dong Sik
    • Journal of IKEEE / v.23 no.4 / pp.1373-1380 / 2019
  • To increase energy efficiency, advanced metering infrastructure (AMI) for smart grid technology has recently been under active development. An essential part of AMI is analyzing power consumption and forecasting consumption patterns. In this paper, we analyze power consumption data and summarize the data errors. Monthly power consumption patterns are also analyzed using the k-means clustering algorithm. Forecasting the consumption pattern of each individual household is difficult. Therefore, we first classify the data into 100 clusters and then predict the next-day average as the daily average of the clusters using a deep neural network. Using AMI data collected in practice, we analyzed the data errors and successfully conducted power forecasting based on a clustering technique.

A Hybrid Multiuser Detection Algorithm for Outer Space DS-UWB Ad-hoc Network with Strong Narrowband Interference

  • Yin, Zhendong;Kuang, Yunsheng;Sun, Hongjian;Wu, Zhilu;Tang, Wenyan
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.5 / pp.1316-1332 / 2012
  • Formation flying is an important technology that enables highly cost-effective organization of outer-space aircraft. The ad-hoc wireless network based on direct-sequence ultra-wideband (DS-UWB) techniques is seen as an effective means of establishing wireless communication links between aircraft. In this paper, based on the theory of matched filtering and error-bit correction, a hybrid detection algorithm is proposed for realizing multiuser detection (MUD) when the DS-UWB technique is used in the ad-hoc wireless network. The matched filter is used to generate a candidate code set, which may contain several error bits. The error bits are then recognized and corrected by a novel error-bit corrector, which consists of two steps: code mapping and clustering. In the former step, based on the modified optimum MUD decision function, a novel mapping function is presented that maps the output candidate codes into a feature space for differentiating the right and wrong codes. In the latter step, the codes are clustered into right and wrong sets using the K-means clustering approach. Additionally, to prevent some right codes from being wrongly classified, a sign judgment method is proposed that reduces the bit error rate (BER) of the system. Compared with traditional detection approaches, e.g., the matched filter, minimum mean square error (MMSE), and decorrelation receiver (DEC), the proposed algorithm can considerably improve the BER performance of the system because of its high probability of recognizing wrong codes. Simulation results show that the proposed algorithm can almost achieve the BER performance of the optimum MUD (OMD). Furthermore, compared with OMD, the proposed algorithm has lower computational complexity, and its BER performance is less sensitive to the number of users.

An Image Segmentation Method and Similarity Measurement Using fuzzy Algorithm for Object Recognition (물체인식을 위한 영상분할 기법과 퍼지 알고리듬을 이용한 유사도 측정)

  • Kim, Dong-Gi;Lee, Seong-Gyu;Lee, Moon-Wook;Kang, E-Sok
    • Transactions of the Korean Society of Mechanical Engineers A / v.28 no.2 / pp.125-132 / 2004
  • In this paper, we propose a new two-stage segmentation method for effective object recognition that uses a region-growing algorithm and the k-means clustering method. First, an image is segmented into many small regions by the region-growing algorithm. The small regions are then merged into several larger regions using the typical k-means clustering method, so that the regions belonging to one object end up in the same region. This paper also establishes a similarity measure that is useful for object recognition in an image. Similarity is measured by a fuzzy system whose input variables are geometrical features of the object image: compactness, magnitude of biasness, and orientation of biasness. To verify the effectiveness of the proposed two-stage segmentation method and similarity measure, object recognition experiments were carried out, and the results show that both are applicable to object recognition under normal as well as abnormal circumstances.
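The first stage described here, region growing, can be sketched as below. This is a generic textbook variant and not the authors' algorithm: the 4-connectivity, the rule of comparing each pixel with the seed value, the `threshold` parameter, and the toy image are all assumptions. The second stage (merging regions with k-means) is not shown.

```python
from collections import deque


def region_growing(image, threshold):
    """Label 4-connected regions whose pixels lie within `threshold` of the seed.

    Returns (labels, n_regions), where labels has the same shape as image.
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c] != -1:
                continue  # pixel already belongs to a region
            seed_value = image[r][c]
            labels[r][c] = next_label
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - seed_value) <= threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label


# Toy 4x4 grayscale image with three intensity bands (hypothetical data).
image = [
    [10, 12, 11, 200],
    [11, 10, 13, 205],
    [90, 92, 12, 198],
    [91, 89, 93, 202],
]
labels, n_regions = region_growing(image, threshold=10)
for row in labels:
    print(row)
```

On a real image this step typically produces many small regions, and a feature such as each region's mean intensity could then serve as the input to the k-means merging stage.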

An expanded Matrix Factorization model for real-time Web service QoS prediction

  • Hao, Jinsheng;Su, Guoping;Han, Xiaofeng;Nie, Wei
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.11 / pp.3913-3934 / 2021
  • Real-time prediction of Web service quality of service (QoS) makes web services in cloud environments more convenient, but it faces severe challenges, especially in the cold-start situation. The existing literature on real-time QoS prediction ignores the fact that the QoS of a user/service is related to the QoS of other users/services. For example, users/services belonging to the same group or category will have similar QoS values. Existing methods ignore this group relationship because of the complexity of the model. We therefore propose a real-time Matrix Factorization based Clustering model (MFC), which uses category information as a new regularization term in the loss function. Specifically, to meet the real-time requirement of the prediction model and to minimize model complexity, we first map the QoS values of a large number of users/services to a lower-dimensional space with PCA, then use the K-means algorithm to calculate user/service category information, and use the average result to obtain a stable final clustering result. Extensive experiments on real-world datasets demonstrate that MFC outperforms other state-of-the-art prediction algorithms.

A Study on Classifications and Characteristics of Declined Rural Area in Chungcheong Region

  • Jo, Jinhee;Park, Hyungkeun;Seo, Sedeok
    • International conference on construction engineering and project management / 2015.10a / pp.468-471 / 2015
  • This study aims to identify the degree and types of spatial decline in Eup/Myun units within the Chungcheong region of South Korea, as a contribution to efforts to diagnose rural decline and potential. To this end, we analyzed 27 Sis and Guns to identify the degree of decline and the potential of rural areas in the Chungcheong region. We also carried out the diagnosis and K-means clustering on 274 Eups and Myuns, the smallest administrative units, to determine the types and characteristics of rural decline. The clustering analysis carried out on the 166 Eups and Myuns yielded five distinct clusters: areas with housing deterioration (29), areas with poor economic foundation (16), areas with poor accessibility to central areas (42), areas with poor residential environment (51), and areas with aged population (28). The findings are likely to serve as a basis for the design and implementation of forthcoming rural revitalization policies. It is also strongly recommended that a more comprehensive diagnosis be conducted from a community-level perspective and that policy suggestions and strategies tailored to rural communities be discussed further.

Data Preprocessing and ML Analysis Method for Abnormal Situation Detection during Approach using Domestic Aircraft Safety Data (국내 항공기 위치 데이터를 활용한 이착륙 접근 단계에서의 항공 위험상황 탐지를 위한 데이터 전처리 및 머신 러닝 분석 기법)

  • Sang Ho Lee;Ilrak Son;Kyuho Jeong;Nohsam Park
    • Journal of Platform Technology / v.11 no.5 / pp.110-125 / 2023
  • In this paper, we use time-series aircraft position data recorded at domestic airports in 2019 to analyze Go-Around and UOC_D situations during the approach phase. Various clustering-based machine learning techniques are applied, and the most appropriate analysis method for domestic aviation data is determined through experiments. Aircraft positions are measured solely with the ADS-B sensor. We designed a model using clustering algorithms such as K-Means, GMM, and DBSCAN to classify abnormal situations. The RF model has shown the best performance overseas, but our experiments confirmed that GMM showed the highest classification performance for domestic aviation data, reflecting characteristics specific to domestic terrain.

Optimal Arrangement of Patrol Ships based on k-Means Clustering for Quick Response of Marine Accidents (해양사고 신속대응을 위한 k-평균 군집화 기반 경비함정 최적배치)

  • Yoo, Sang-Lok;Jung, Cho-Young
    • Journal of the Korean Society of Marine Environment & Safety / v.23 no.7 / pp.775-782 / 2017
  • The positions of existing patrol ships have been decided by subjective judgment rather than by reasonable or scientific criteria, because of a lack of access to marine accident positions. In this study, the optimal location of patrol ships is quantitatively determined based on historical marine accident data. The study area included the coastal sea of Pohang in South Korea. A k-means clustering algorithm was used to derive the locations of patrol ships, and a Voronoi diagram was then used to divide the region around each patrol ship. As a result, the average navigation distance for patrol ships was improved by 4.4 nautical miles, and the average arrival time was improved by 13.2 minutes per marine accident. Moreover, if the locations of patrol ships need to be changed flexibly, the technique developed in this study makes it possible to arrange limited resources optimally and ensure a fast rescue.
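The combination described in this abstract, k-means to place the ships and a Voronoi partition to assign each position to its nearest ship, can be sketched in a few lines. This is a minimal illustration under stated assumptions: positions are treated as planar coordinates (real latitude/longitude data would need a projection or a geodesic distance), the initial centers are chosen by hand, and the accident positions are invented for the example.

```python
import math


def nearest(point, centers):
    """Index of the closest center, i.e. the Voronoi cell containing `point`."""
    return min(range(len(centers)), key=lambda i: math.dist(point, centers[i]))


def k_means(points, initial_centers, max_iter=100):
    """Lloyd's algorithm: alternate nearest-center assignment and mean update."""
    centers = [tuple(c) for c in initial_centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break  # assignments no longer change
        centers = new_centers
    return centers


# Hypothetical accident positions in planar coordinates (not data from the study).
accidents = [(0.0, 0.0), (0.0, 2.0), (2.0, 0.0), (2.0, 2.0),
             (10.0, 10.0), (10.0, 12.0), (12.0, 10.0), (12.0, 12.0)]
bases = k_means(accidents, initial_centers=[accidents[0], accidents[4]])

# A new accident is handled by the ship whose Voronoi cell contains it.
cell = nearest((3.0, 4.0), bases)
print(bases, cell)
```

Assigning each position to its nearest center is exactly what a Voronoi diagram of the centers encodes, so the explicit diagram is mainly needed for drawing the cell boundaries on a chart.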

A Research for Clustering of Conflict in Public Construction Project (군집분석을 통한 공공 건설사업 갈등 유형화 연구)

  • Lee, Jiseop;Kim, Doyun;Lee, Changjun;Lee, Jeonghun;Han, Seungheon
    • Korean Journal of Construction Engineering and Management / v.19 no.2 / pp.61-72 / 2018
  • Conflicts in public construction projects lead to increased social costs as well as higher construction costs and schedule delays. Therefore, it is important to evaluate conflicts in construction projects and find appropriate solutions based on previous cases. In this research, conflict factors and criteria for evaluating conflict are derived, and 30 cases are evaluated by 11 conflict experts. Using k-means clustering, the cases are grouped into three clusters. The cases were analyzed according to the characteristics of each cluster, and the clusters were labeled 'NIMBY and harmful facility conflict cluster', 'environmental and pollution conflict cluster', and 'PIMFY and small conflicts'. In the future, when a conflict occurs in a public construction project, it can be evaluated using this clustering to identify its characteristics. This will help mitigate the conflict quickly and effectively by finding, through the appropriate cluster, previous cases that are suitable for resolving it.