• Title/Summary/Keyword: Clustering Problem

A Dual Processing Load Shedding to Improve The Accuracy of Aggregate Queries on Clustering Environment of GeoSensor Data Stream (클러스터 환경에서 GeoSensor 스트림 데이터의 집계질의의 정확도 향상을 위한 이중처리 부하제한 기법)

  • Ji, Min-Sub;Lee, Yeon;Kim, Gyeong-Bae;Bae, Hae-Young
    • Journal of the Korea Society of Computer and Information / v.17 no.1 / pp.31-40 / 2012
  • u-GIS DSMSs have been studied as a way to handle the varied sensor data produced by GeoSensors in ubiquitous environments, and high availability has become increasingly important for them. GeoSensor data can grow explosively, which may lead to memory overflow and data loss. To address this problem, various load shedding methods have been studied. Traditional methods drop overloaded tuples on a single server according to a particular criterion, so queries that are sensitive to tuple deletion, such as aggregation, struggle to remain accurate. This paper proposes a dual processing load shedding method that improves the accuracy of aggregation in a clustered environment. In this method, two nodes hold replicated stream data for high availability. Because they share the same stream data, the two nodes can process a stream jointly. The stream data are synchronized between the nodes one window at a time, and the processed results are then merged. The method improves query accuracy without data loss.
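The window-level idea in this abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the per-node capacity, the even/odd split of a replicated window between two nodes, and all function names are assumptions made for illustration only.

```python
def shed_single(window, capacity):
    """Single-server load shedding: keep only the first `capacity` tuples."""
    return window[:capacity]

def partial_sum(tuples):
    """Partial aggregate (sum, count) computed by one node."""
    return sum(tuples), len(tuples)

def dual_process(window, capacity):
    """Two nodes hold the same replicated window; each handles half of it.

    Node A takes even positions and node B odd positions (hypothetical split).
    Each node still sheds tuples if its share exceeds `capacity`.
    """
    share_a = shed_single(window[0::2], capacity)
    share_b = shed_single(window[1::2], capacity)
    sum_a, n_a = partial_sum(share_a)
    sum_b, n_b = partial_sum(share_b)
    # Merge the two partial aggregates into one result for the window.
    return sum_a + sum_b, n_a + n_b

window = [3, 5, 2, 8, 6, 4, 7, 1]  # one window of sensor readings
capacity = 4                       # tuples one node can process per window

single_sum, single_n = partial_sum(shed_single(window, capacity))
dual_sum, dual_n = dual_process(window, capacity)
```

With these inputs the single node aggregates only four of the eight tuples, while the two nodes together cover the whole window, so the merged sum equals the true sum.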

Modeling of the Cluster-based Multi-hop Sensor Networks (클러스터 기반 다중 홉 센서 네트워크의 모델링 기법)

  • Choi Jin-Chul;Lee Chae-Woo
    • Journal of the Institute of Electronics Engineers of Korea TC / v.43 no.1 s.343 / pp.57-70 / 2006
  • A wireless sensor network, consisting of a number of small sensors each with a transceiver and a data processor, is an effective means of gathering data in a variety of environments. The data collected by each sensor are transmitted to a processing center that uses all reported data to estimate characteristics of the environment or to detect an event. This process must be designed to conserve the sensors' limited energy. Because neighboring sensors generally hold similar data, a clustering scheme that sends aggregated information to the processing center can save energy. The existing multi-hop cluster energy consumption model cannot estimate the exact energy consumption of an individual sensor. In this paper, we propose a new cluster energy consumption model that addresses this problem. Using Voronoi tessellation, we can estimate the total energy consumption more accurately as a function of the number of clusterheads, and can therefore form energy-efficient clusters. Our model has an accuracy of over $90\%$ when compared with simulation and is considerably more accurate than the existing modeling scheme (about $60\%$). We also confirmed that the energy consumption estimated by the proposed model becomes more accurate as sensor density increases.
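As a rough illustration of how Voronoi cells relate clusterheads to energy estimates, the sketch below assigns each sensor to its nearest clusterhead (which is exactly membership in that head's Voronoi cell) and sums transmission costs. The radio cost `e_elec + e_amp * d**2`, the single-hop links inside each cluster, and every name here are assumptions for illustration; the paper's multi-hop model is not reproduced.

```python
import math

def nearest_head(sensor, heads):
    """Index of the clusterhead whose Voronoi cell contains `sensor`."""
    return min(range(len(heads)), key=lambda i: math.dist(sensor, heads[i]))

def total_energy(sensors, heads, sink, e_elec=1.0, e_amp=0.01):
    """Sum of sensor-to-head and head-to-sink transmission costs.

    One transmission over distance d costs e_elec + e_amp * d**2
    (an assumed first-order radio model, not taken from the paper).
    """
    def cost(d):
        return e_elec + e_amp * d ** 2

    energy = 0.0
    for s in sensors:
        head = heads[nearest_head(s, heads)]
        energy += cost(math.dist(s, head))
    for head in heads:
        energy += cost(math.dist(head, sink))
    return energy

sensors = [(0, 0), (10, 0)]
heads = [(1, 0), (9, 0)]
estimate = total_energy(sensors, heads, sink=(5, 0))
```

Calling `total_energy` with different candidate sets of clusterheads gives a simple way to compare how the estimate changes with the number of heads.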

The Model of Network Packet Analysis based on Big Data (빅 데이터 기반의 네트워크 패킷 분석 모델)

  • Choi, Bomin;Kong, Jong-Hwan;Han, Myung-Mook
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.5 / pp.392-399 / 2013
  • Due to the development of IT and the arrival of the information age, much of our lives has come to depend on networks. Although networks give us access to a variety of useful information and services, they also have the negative effect of exposing vulnerable entry points to intruders. In other words, we urgently need to cope with serious security problems in which various packet information is exploited to disable services or disrupt systems connected to the network. Many security experts are working to develop solutions that respond to these threats, but existing solutions suffer from problems such as insufficient storage capacity and performance degradation as the volume of packet data grows massively. We therefore propose a packet analysis model that applies emerging Big Data technology to the security field. Specifically, we used NoSQL, a technology for massive data storage, to collect the rapidly growing packet data, and implemented a packet analysis model based on K-means clustering using MapReduce, a distributed programming framework. Experiments showed that the model achieves high performance.
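K-means fits the MapReduce pattern naturally: the map step assigns each record to its nearest centroid, and the reduce step averages the records grouped under each centroid. The following single-process Python sketch mimics that structure without any real MapReduce or NoSQL framework; the two-dimensional packet features and all function names are hypothetical.

```python
from collections import defaultdict
import math

def map_phase(points, centroids):
    """Map: emit (index of nearest centroid, point) for every point."""
    for p in points:
        idx = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        yield idx, p

def reduce_phase(pairs, centroids):
    """Reduce: average the points grouped under each centroid index."""
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centroids = list(centroids)  # an empty cluster keeps its old centroid
    for idx, pts in groups.items():
        new_centroids[idx] = tuple(sum(axis) / len(pts) for axis in zip(*pts))
    return new_centroids

def kmeans(points, centroids, iterations=10):
    """Repeat the map and reduce steps a fixed number of times."""
    for _ in range(iterations):
        centroids = reduce_phase(map_phase(points, centroids), centroids)
    return centroids

# Hypothetical packet features, e.g. (packet size, inter-arrival time)
packets = [(1.0, 1.0), (1.0, 2.0), (9.0, 9.0), (10.0, 8.0)]
centers = kmeans(packets, [(0.0, 0.0), (10.0, 10.0)])
```

In a distributed setting the map step would run in parallel over partitions of the packet store, and only the per-centroid sums and counts would need to be shuffled to the reducers.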

Region Based Image Similarity Search using Multi-point Relevance Feedback (다중점 적합성 피드백방법을 이용한 영역기반 이미지 유사성 검색)

  • Kim, Deok-Hwan;Lee, Ju-Hong;Song, Jae-Won
    • The KIPS Transactions:PartD / v.13D no.7 s.110 / pp.857-866 / 2006
  • The performance of an image retrieval system is usually very low because of the semantic gap between low-level features and the high-level concept in a query image. Semantically relevant images may exhibit very different visual characteristics and may be scattered across several clusters. In this paper, we propose a content-based image retrieval approach that combines region-based image retrieval with a new relevance feedback method using adaptive clustering. Our main goal is to find semantically related clusters in order to narrow the semantic gap. Our method consists of region-based clustering processes and a cluster-merging process. All segmented regions of relevant images are organized into semantically related hierarchical clusters, and clusters are merged by finding the number of latent clusters. In the cluster-merging process, the method applies $T_v^2$, which uses v principal components, instead of the classical Hotelling's $T^2$ [1] to find the unknown number of clusters and to resolve the singularity problem in high dimensions, and we demonstrate that there is little difference between the performance of $T^2$ and that of $T_v^2$. Experiments have demonstrated that the proposed approach is effective in improving the performance of an image retrieval system.

Evaluation of Multivariate Stream Data Reduction Techniques (다변량 스트림 데이터 축소 기법 평가)

  • Jung, Hung-Jo;Seo, Sung-Bo;Cheol, Kyung-Joo;Park, Jeong-Seok;Ryu, Keun-Ho
    • The KIPS Transactions:PartD / v.13D no.7 s.110 / pp.889-900 / 2006
  • Even though sensor networks differ in user requests and data characteristics from one application area to another, existing research on the stream data transmission problem focuses on improving the performance of individual methods rather than on the original characteristics of the stream data. In this paper, we introduce a hierarchical or distributed sensor network architecture and data model, and then evaluate multivariate data reduction methods suited to user requirements and data features so that reduction methods can be applied selectively. To assess the relative performance of the multivariate data reduction methods, we used conventional techniques such as Wavelet, HCL (Hierarchical Clustering), Sampling, and SVD (Singular Value Decomposition), together with experimental data sets such as multivariate time series, synthetic data, and robot execution failure data. The experimental results show that the SVD and Sampling methods are superior to Wavelet and HCL with respect to relative error ratio and execution time. In particular, because the relative error ratio of each reduction method varies with the data characteristics, applying reduction methods selectively performs well on the experimental data sets. The findings reported in this paper can serve as a useful guideline for the design and construction of sensor network applications that involve multivariate stream data.

Headword Finding System Using Document Expansion (문서 확장을 이용한 표제어 검색시스템)

  • Kim, Jae-Hoon;Kim, Hyung-Chul
    • Journal of Information Management / v.42 no.4 / pp.137-154 / 2011
  • A headword finding system is defined as an information retrieval system that uses a word gloss as a query. To implement such a system, we use the gloss as a document. A gloss is generally very short, which makes it very difficult to find the most appropriate headword for a given query. To alleviate this problem, we expand the document using the concept of query expansion from information retrieval. In this paper, we use two document expansion methods: gloss expansion and similar word expansion. The former inserts the glosses of words that appear in the document into a seed document. The latter inserts similar words into a seed document. We use a featureless clustering algorithm to obtain the similar words. The performance (r-inclusion rate) reaches almost 100% when the queries are word glosses and r is 16, and 66.9% when the queries are written by users themselves. Through several experiments, we have observed that the document expansions are very useful for the headword finding system. In the future, new measures, including the r-inclusion rate proposed here, are required for evaluating the performance of headword finding systems, and new evaluation sets are also needed for objective assessment.
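The gloss expansion step and the evaluation measure can be sketched briefly in Python. The abstract does not define the r-inclusion rate, so the code assumes it is the fraction of queries whose correct headword appears among the top r retrieved headwords; the whitespace tokenization, the toy glossary, and the function names are likewise assumptions.

```python
def expand_gloss(seed_gloss, glossary):
    """Gloss expansion: append the glosses of words found in the seed gloss."""
    tokens = seed_gloss.split()
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(glossary.get(token, "").split())
    return expanded

def r_inclusion_rate(ranked_results, answers, r):
    """Assumed definition: share of queries whose answer is in the top r results."""
    hits = sum(1 for ranked, answer in zip(ranked_results, answers)
               if answer in ranked[:r])
    return hits / len(answers)

glossary = {"canine": "dog-like animal"}  # hypothetical dictionary entry
expanded = expand_gloss("domestic canine", glossary)

ranked = [["dog", "cat", "wolf"], ["river", "lake", "sea"], ["red", "blue", "green"]]
gold = ["cat", "sea", "pink"]
rate_top2 = r_inclusion_rate(ranked, gold, 2)
rate_top3 = r_inclusion_rate(ranked, gold, 3)
```

Under this assumed definition the rate can only stay the same or rise as r grows, which is consistent with the abstract reporting its best figure at r = 16.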

Development of the Shortest Path Algorithm for Multiple Waypoints Based on Clustering for Automatic Book Management in Libraries (도서관의 자동 도서 관리를 위한 군집화 기반 다중경유지의 최단 경로 알고리즘 개발)

  • Kang, Hyo Jung;Jeon, Eun Joo;Park, Chan Jung
    • The Journal of the Korea Contents Association / v.21 no.1 / pp.541-551 / 2021
  • Among a librarian's many duties, arranging books must be done by hand, one book at a time, so the cost in labor and time is large. To solve this problem, interest in book-arranging robots based on artificial intelligence has recently increased. In this paper, we propose the K-ACO algorithm, a shortest path algorithm for multiple waypoints that can be applied to library book-arranging robots. The K-ACO algorithm assumes multiple robots rather than a single robot. It improves the ANT algorithm to create K clusters and provides the shortest path for each cluster. We analyzed the performance of the proposed algorithm in terms of book arrangement time. The K-ACO algorithm was applied to a university library and compared with the current book arrangement algorithm. Through simulation, we found that the proposed algorithm allocates the book-arranging work fairly, without bias, and ultimately reduces the time needed to complete the entire task significantly. Based on these results, we expect to improve service quality in libraries by reducing the labor and time costs of arranging books.
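The overall shape of "split the waypoints into K groups, then route each group" can be shown with a small Python sketch. It deliberately swaps in simpler techniques: waypoints are grouped by nearest robot start position (a single k-means style assignment step), and a greedy nearest-neighbour tour stands in for the ant colony search. None of this is the K-ACO algorithm itself, and the coordinates and names are hypothetical.

```python
import math

def assign_clusters(points, centers):
    """Group points by nearest center (one k-means style assignment step)."""
    clusters = [[] for _ in centers]
    for p in points:
        i = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        clusters[i].append(p)
    return clusters

def greedy_route(start, stops):
    """Nearest-neighbour tour; a stand-in for the ant colony search."""
    route, here, remaining = [], start, list(stops)
    while remaining:
        nxt = min(remaining, key=lambda s: math.dist(here, s))
        remaining.remove(nxt)
        route.append(nxt)
        here = nxt
    return route

shelves = [(1, 1), (2, 1), (1, 3), (9, 9), (8, 7), (9, 6)]
robot_starts = [(0, 0), (10, 10)]  # K = 2 robots, used as fixed centers
groups = assign_clusters(shelves, robot_starts)
routes = [greedy_route(start, group) for start, group in zip(robot_starts, groups)]
```

Each robot receives three shelves in this toy layout, which hints at the fair allocation the abstract reports, although a greedy tour gives no guarantee of a shortest path.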

Analysis of Characteristics of Clusters of Middle School Students Using K-Means Cluster Analysis (K-평균 군집분석을 활용한 중학생의 군집화 및 특성 분석)

  • Jaebong, Lee
    • Journal of The Korean Association For Science Education / v.42 no.6 / pp.611-619 / 2022
  • As interest in educational data mining grows, this study explores the possibility of applying big data analysis to evaluation data in science education in order to give students appropriate feedback. We use the evaluation data of 2,576 students who answered 24 questions of the national assessment of educational achievement, and apply K-means cluster analysis, an unsupervised machine learning method, for clustering. As a result, the students were divided into six clusters. Middle-ranking students are spread across more clusters than upper-ranking or lower-ranking students. According to the results of the cluster analysis, the most important factor influencing clustering is academic achievement, and each cluster shows different characteristics in terms of content domains, subject competencies, and affective characteristics. In the lower-ranking achievement cluster, learning motivation is important among the affective domains, and scientific inquiry and problem-solving competency, as well as scientific communication competency, have a major influence among the subject competencies. In the content domain, achievement in motion and energy and in matter is an important factor in distinguishing the characteristics of a cluster. As a result, we can provide students with customized learning feedback based on the characteristics of each cluster. We discuss the implications of these results for science education, such as possible uses of the results of this study, balanced learning across content domains, enhancement of subject competency, and improvement of scientific attitude.

Location Classification and Its Utilization for Illegal Parking Enforcement: Focusing on the Case of Gyeonggi (불법주정차 단속을 위한 지역(장소) 분류 및 활용 방안: 경기도를 중심으로)

  • Hyeon Han;So-yeon Choe;So-Hyun Lee
    • Information Systems Review / v.25 no.4 / pp.113-130 / 2023
  • Due to economic development and increasing gross national income, the number of automobiles continues to rise, leading to a serious issue of illegal parking due to limited road conditions and insufficient parking facilities. Illegal parking causes significant inconvenience and displeasure to people and can even result in accidents and loss of lives. The severity of accidents and their consequences, related to the growing number of vehicles and illegal parking, is escalating, particularly in the metropolitan areas. Consequently, efforts are being made to address this problem as a cause of social issues and come up with measures to reduce illegal parking. In particular, half of the public complaints in the metropolitan area are related to illegal parking, and the highest physical and human damage occurs in Gyeonggi. Thus, this study aims to use machine learning techniques based on data related to illegal parking in Suwon city, Gyeonggi, to categorize regional characteristics and propose effective measures to crack down on illegal parking. Additionally, practical, social, policy, and legal measures to decrease illegal parking in the metropolitan area are suggested. This study has academic significance in that it solved the problem of illegal parking, which is mentioned as one of the social problems that cause traffic congestion, by classifying regional characteristics using K-prototype, a machine learning algorithm. Furthermore, the results of this study contribute to practical and social aspects by providing measures to decrease illegal parking in the metropolitan area.
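K-prototype clustering handles records that mix numeric and categorical fields by combining two dissimilarity terms: squared Euclidean distance on the numeric part plus a weight `gamma` times the number of categorical mismatches. The Python sketch below shows only that dissimilarity and the nearest-prototype assignment; the location features, the prototypes, and the value of `gamma` are hypothetical and not taken from the study.

```python
def mixed_distance(a_num, a_cat, b_num, b_cat, gamma=1.0):
    """Squared Euclidean distance on numeric fields plus
    gamma times the count of categorical mismatches."""
    numeric = sum((x - y) ** 2 for x, y in zip(a_num, b_num))
    categorical = sum(x != y for x, y in zip(a_cat, b_cat))
    return numeric + gamma * categorical

def nearest_prototype(num, cat, prototypes, gamma=1.0):
    """Index of the prototype with the smallest mixed dissimilarity."""
    return min(range(len(prototypes)),
               key=lambda i: mixed_distance(num, cat, *prototypes[i], gamma=gamma))

# Hypothetical location record: (complaints per day, road width), (zone, street type)
prototypes = [
    ([2.0, 3.0], ["residential", "one-way"]),
    ([8.0, 6.0], ["commercial", "two-way"]),
]
d = mixed_distance([2.0, 3.0], ["residential", "one-way"],
                   [4.0, 3.0], ["commercial", "one-way"], gamma=0.5)
label = nearest_prototype([7.0, 6.0], ["commercial", "one-way"], prototypes, gamma=0.5)
```

The weight `gamma` controls how much a categorical mismatch counts relative to numeric differences, so numeric fields are usually standardized before clustering.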

Dynamic Link Recommendation Based on Anonymous Weblog Mining (익명 웹로그 탐사에 기반한 동적 링크 추천)

  • Yoon, Sun-Hee;Oh, Hae-Seok
    • The KIPS Transactions:PartC / v.10C no.5 / pp.647-656 / 2003
  • In Webspace, mining traversal patterns aims to understand users' path traversal patterns. A distinctive characteristic of this mining is that objects (for example, URLs) may be visited because of their position rather than their content, since users move between objects according to how information services are provided. As a consequence, extracting meaningful information from these data becomes very complex. Discovering traversal patterns has recently become an important problem in data mining, as research activity on improving the quality of information services has increased. This paper presents a Dynamic Link Recommendation (DLR) algorithm that recommends link sets on a Web site by mining frequent traversal patterns. It can be applied to any Web site with massive amounts of data. Our experiments with two real Weblog data sets clearly show that our method outperforms the traditional method.
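A much-simplified Python sketch of recommending links from frequent traversal patterns follows. It only counts single page-to-page transitions across anonymous sessions and keeps those that meet a minimum support; the DLR algorithm itself is not described in enough detail here to reproduce, so the support threshold, the session data, and the function names are all assumptions.

```python
from collections import Counter, defaultdict

def mine_transitions(sessions, min_support=2):
    """Keep page-to-page transitions that occur in at least `min_support` sessions."""
    counts = Counter()
    for path in sessions:
        # Count each transition at most once per session.
        counts.update(set(zip(path, path[1:])))
    frequent = defaultdict(list)
    for (src, dst), n in counts.items():
        if n >= min_support:
            frequent[src].append((n, dst))
    return frequent

def recommend(frequent, current, k=2):
    """Top-k frequent next links for the page the user is currently on."""
    ranked = sorted(frequent.get(current, []), key=lambda t: (-t[0], t[1]))
    return [dst for _, dst in ranked[:k]]

sessions = [
    ["/", "/news", "/sports"],
    ["/", "/news", "/weather"],
    ["/", "/shop"],
    ["/news", "/sports"],
]
frequent = mine_transitions(sessions)
links_from_news = recommend(frequent, "/news")
```

Longer patterns could be mined the same way by counting subpaths of length three or more, at the cost of a larger candidate space.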