• Title/Summary/Keyword: cluster method

Distributed data deduplication technique using similarity based clustering and multi-layer bloom filter (SDS 환경의 유사도 기반 클러스터링 및 다중 계층 블룸필터를 활용한 분산 중복제거 기법)

  • Yoon, Dabin;Kim, Deok-Hwan
    • The Journal of Korean Institute of Next Generation Computing / v.14 no.5 / pp.60-70 / 2018
  • Software-defined storage (SDS) is being deployed in cloud environments to let multiple users virtualize physical servers, but a way to optimize space efficiency with limited physical resources is still needed. Conventional data deduplication systems have difficulty deduplicating redundant data uploaded to distributed storage. In this paper, we propose a distributed deduplication method using similarity-based clustering and a multi-layer Bloom filter. A Rabin hash is applied to determine the degree of similarity between virtual machine servers and to cluster similar virtual machines. Deduplicating within these clusters improves deduplication efficiency compared with deduplicating each storage node individually. In addition, a multi-layer Bloom filter is incorporated into the deduplication process to shorten processing time by reducing the number of false positives. Experimental results show that the proposed method improves the deduplication ratio by 9% compared with a deduplication method using IP-address-based clusters, with no difference in processing time.
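
The abstract does not say how the Bloom filter layers are arranged, so the following Python sketch shows only one plausible reading: a chunk is treated as already seen only when every layer reports it. The class names, layer sizes, and the `deduplicate` helper are hypothetical, and SHA-256 fingerprints are an assumption made for illustration (the paper applies a Rabin hash for similarity detection).

```python
import hashlib


class BloomFilter:
    """Plain Bloom filter over a fixed-size bit array."""

    def __init__(self, num_bits, num_hashes, salt):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.salt = salt
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{self.salt}:{i}:".encode() + item).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


class MultiLayerBloomFilter:
    """Hypothetical layering: report 'seen' only if every layer agrees."""

    def __init__(self, layer_bits=(1 << 16, 1 << 18), num_hashes=3):
        self.layers = [
            BloomFilter(bits, num_hashes, salt=f"layer{idx}")
            for idx, bits in enumerate(layer_bits)
        ]

    def add(self, item):
        for layer in self.layers:
            layer.add(item)

    def __contains__(self, item):
        # Stops at the first layer that rules the item out.
        return all(item in layer for layer in self.layers)


def deduplicate(chunks):
    """Return the chunks kept after dropping those the filter reports as seen."""
    seen = MultiLayerBloomFilter()
    unique = []
    for chunk in chunks:
        fingerprint = hashlib.sha256(chunk).digest()
        if fingerprint in seen:
            continue  # probable duplicate; a real system would confirm against an index
        seen.add(fingerprint)
        unique.append(chunk)
    return unique
```

A Bloom filter never produces false negatives, so true duplicates are always caught; the extra layer only lowers the chance that a new chunk is mistaken for an old one.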

Automation Review of Road Design Standard using Visual Programming (비주얼 프로그래밍 기법을 활용한 도로설계기준 자동검토 방안)

  • Hyoun-seok Moon;Hyeoun-seung Kim
    • Journal of the Society of Disaster Information / v.18 no.4 / pp.891-898 / 2022
  • Purpose: Little time remains before BIM becomes mandatory for all sectors and stages of the construction industry, so technology that substantially improves the productivity of BIM work needs to be secured. In this research, we propose a method that automatically checks the related construction standards for the major objects produced by BIM modeling procedures, so that engineers can verify those standards during the BIM-based design process. Method: We defined a modeling procedure for BIM-based road design and prepared a method for building the related design standards into a database. We also presented a process map for developing a BIM-based system that automates design standard review. Result: A BIM-based design standard review automation module was developed using Civil3D and Dynamo. A test application confirmed that the module can quickly judge whether a BIM object created in the design process conforms to the construction design standard. Conclusion: BIM-based design standard review automation can improve the productivity of BIM model production and help secure the quality of the BIM model.

The Proposal Method of ARINC-429 Linkage for Efficient Operation of Tactical Stations in P-3C Maritime Patrol Aircraft (P-3C 해상초계기용 전술컴퓨터의 효율적 운영을 위한 ARINC-429 연동 방법)

  • Byoung-Kug Kim;Yong-Hoon Cha
    • Journal of Advanced Navigation Technology / v.27 no.2 / pp.167-172 / 2023
  • The P-3C maritime patrol aircraft operated by the Republic of Korea Navy is equipped with various sensor devices (LRUs, line replaceable units) for tactical data collection. Depending on their characteristics, the sensor devices use different communication protocols, such as IEEE 802.3, MIL-STD-1553A/B, and ARINC-429. The collected tactical data is processed in the tactical stations used by mission operators; these stations form a clustering network on Gigabit Ethernet and operate in a distributed processing manner. To communicate with a sensor device, a specific tactical station is fitted with a peripheral device (e.g., an ARINC-429 interface card). The problem is that controlling the peripheral device and relaying its communication degrades the performance of the entire distributed processing system, and if that tactical station stops operating, communication with the associated sensor device is lost. In this paper, we propose mounting a separate gateway to solve this problem, and we demonstrate the validity of the proposal through the operating results of this gateway.
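
A gateway of this kind has to unpack ARINC-429 words before relaying their contents over Ethernet. The Python sketch below is not the authors' gateway; it only illustrates the standard 32-bit word layout (label in bits 1-8, SDI in bits 9-10, data in bits 11-29, SSM in bits 30-31, odd parity in bit 32). It assumes the interface card hands over each word as an integer with bit 1 in the least significant position and the label already in numeric order; the `Arinc429Word` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Arinc429Word:
    label: int  # bits 1-8
    sdi: int    # bits 9-10, source/destination identifier
    data: int   # bits 11-29
    ssm: int    # bits 30-31, sign/status matrix


def has_odd_parity(word):
    """True when the 32-bit word contains an odd number of 1 bits."""
    return bin(word & 0xFFFFFFFF).count("1") % 2 == 1


def decode(word):
    """Split a raw 32-bit word into its fields, rejecting parity errors."""
    if not has_odd_parity(word):
        raise ValueError("ARINC-429 parity error")
    return Arinc429Word(
        label=word & 0xFF,
        sdi=(word >> 8) & 0x3,
        data=(word >> 10) & 0x7FFFF,
        ssm=(word >> 29) & 0x3,
    )


def encode(fields):
    """Pack fields into a 32-bit word and set bit 32 so parity is odd."""
    word = (
        (fields.label & 0xFF)
        | ((fields.sdi & 0x3) << 8)
        | ((fields.data & 0x7FFFF) << 10)
        | ((fields.ssm & 0x3) << 29)
    )
    if not has_odd_parity(word):
        word |= 1 << 31
    return word
```

Decoding in a dedicated gateway keeps the parity check and field extraction off the tactical stations, which matches the motivation given in the abstract.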

Study on the Application of Big Data Mining to Activate Physical Distribution Cooperation : Focusing AHP Technique (물류공동화 활성화를 위한 빅데이터 마이닝 적용 연구 : AHP 기법을 중심으로)

  • Young-Hyun Pak;Jae-Ho Lee;Kyeong-Woo Kim
    • Korea Trade Review / v.46 no.5 / pp.65-81 / 2021
  • Technological development in the era of the 4th industrial revolution is changing the paradigm of various industries. Technologies such as big data, cloud, artificial intelligence, virtual reality, and the Internet of Things create synergies with existing industries, driving radical development and value creation. Among these industries, the logistics sector has long relied on quantitative data and has continuously accumulated and managed it, so it is well suited to big data analysis and stands to benefit greatly from it. Data mining technology has developed alongside these advances to discover hidden patterns and new correlations in such big data, and meaningful results are being derived from it. Data mining is therefore an important part of big data analysis, and this study analyzes data mining techniques that can contribute to the logistics field and to common logistics. Using the AHP technique, we attempted to derive priorities among data mining types for efficient logistics cooperation, with the R program and RStudio as the analysis tools. The AHP criteria were association analysis, cluster analysis, the decision tree method, the artificial neural network method, web mining, and opinion mining. The alternatives were common transport and delivery, a common logistics center, a common logistics information system, and a common logistics partnership.
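
To make the AHP step concrete, the sketch below derives priority weights from a pairwise comparison matrix and checks its consistency. It is written in Python rather than the R tools named in the abstract, it uses the row geometric-mean approximation (the abstract does not state which estimation method the authors used), and the example judgments for three of the criteria are hypothetical, not taken from the paper.

```python
import math


def ahp_priorities(matrix):
    """Approximate AHP priority weights with the row geometric-mean method."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


def consistency_ratio(matrix, weights):
    """Saaty's consistency ratio; values below about 0.1 are usually accepted."""
    n = len(matrix)
    # Estimate lambda_max as the average of (A w)_i / w_i over all rows.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    random_index = {3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}[n]  # Saaty's RI table
    return consistency_index / random_index


# Hypothetical judgments: association analysis vs. cluster analysis vs. decision tree.
judgments = [
    [1, 2, 4],
    [1 / 2, 1, 2],
    [1 / 4, 1 / 2, 1],
]
weights = ahp_priorities(judgments)
ratio = consistency_ratio(judgments, weights)
```

Because the hypothetical matrix is perfectly consistent, the weights come out proportional to 4 : 2 : 1 and the consistency ratio is essentially zero; real survey judgments would normally give a small positive ratio.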

Analysis of deep learning-based deep clustering method (딥러닝 기반의 딥 클러스터링 방법에 대한 분석)

  • Hyun Kwon;Jun Lee
    • Convergence Security Journal / v.23 no.4 / pp.61-70 / 2023
  • Clustering is an unsupervised learning method that groups data based on features such as distance metrics, using data without known labels or ground-truth values. It has the advantage of being applicable to various types of data, including images, text, and audio, without the need for labeling. Traditional clustering techniques apply dimensionality reduction or extract specific features before clustering. With the advancement of deep learning models, however, research has emerged on deep clustering techniques that use models such as autoencoders and generative adversarial networks to represent input data as latent vectors. In this study, we propose a deep clustering technique based on deep learning. We use an autoencoder to transform the input data into latent vectors, construct a vector space according to the cluster structure, and then perform k-means clustering. We conducted experiments on the MNIST and Fashion-MNIST datasets using the PyTorch machine learning library. The model is a convolutional neural network-based autoencoder. The experimental results show an accuracy of 89.42% for MNIST and 56.64% for Fashion-MNIST when k is set to 10.
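
The final stage of the pipeline described above, k-means on latent vectors, can be sketched in plain Python as follows. The autoencoder itself is omitted: the `latent` values are hypothetical stand-ins for encoder outputs, and this Lloyd's-algorithm implementation is an illustration rather than the authors' PyTorch code.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iterations=100, seed=0):
    """Cluster vectors with Lloyd's algorithm; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iterations):
        # Assignment step: attach each vector to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids


# Hypothetical 2-D latent vectors forming two well-separated groups.
latent = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(latent, k=2)
```

In the setting of the abstract, `vectors` would hold one latent vector per image and `k` would be 10, one cluster per digit or clothing class.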

Integrating physics-based fragility for hierarchical spectral clustering for resilience assessment of power distribution systems under extreme winds

  • Jintao Zhang;Wei Zhang;William Hughes;Amvrossios C. Bagtzoglou
    • Wind and Structures / v.39 no.1 / pp.1-14 / 2024
  • Widespread damage from extreme winds has drawn considerable attention to the resilience assessment of power distribution systems. With many related environmental parameters and numerous power infrastructure components, such as poles and wires, the growing challenge of power asset management before, during, and after extreme events has to be addressed to prevent possible cascading failures in the power distribution system. Extreme winds from weather events such as hurricanes cause widespread damage to the economy, social security, and infrastructure management. The livelihoods of residents in the affected areas are devastated largely by the lack of vital utilities such as electricity. To address the challenge of power grid asset management, power system clustering is needed to partition a complex power system into several stable clusters and so prevent cascading failure. Traditionally, system clustering uses the Binary Decision Diagram (BDD) to derive the clustering result, which is time-consuming and inefficient. Meanwhile, previous studies that considered weather hazards did not include detailed meteorological parameters, which is inappropriate because the heterogeneity of these parameters can strongly affect system performance. Therefore, a fragility-based network hierarchical spectral clustering method is proposed. In the present paper, the fragility curve and surfaces for a power distribution subsystem are obtained first. The fragility of the subsystem under typical failure mechanisms is calculated as a function of wind speed and pole characteristic dimension (diameter or span length).
Second, the proposed fragility-based hierarchical spectral clustering method (F-HSC) integrates the physics-based fragility analysis into the Hierarchical Spectral Clustering (HSC) technique from graph theory to obtain the clustering result for the power distribution system under extreme weather events. The vulnerability analysis shows that system performance is better after clustering than before. With the F-HSC method, the impact of extreme weather events can be considered together with topology when clustering different power distribution systems, helping to prevent power blackouts.

Analysis on the Users′ Behaviors and Satisfaction on the Actual Conditions of Management in Chiri Mountain National Park (지리산국립공원의 이용자 행태분석과 관리실태에 대한 만족도 조사에 관한 연구)

  • 김광래;진희성;김세천
    • Journal of the Korean Institute of Landscape Architecture / v.16 no.2 / pp.43-57 / 1988
  • The purpose of this thesis is to provide objective basic data for park management proposals through a quantitative analysis of users' behaviors and their satisfaction with the actual conditions of management in Chiri Mountain National Park. To this end, users' behaviors and socio-economic characteristics were cross-analyzed. Specifically, the study investigates users' expectations and degree of satisfaction by applying Expectancy Theory with a Likert attitude scale. Users' behavior patterns at each site were analyzed by the factor analysis algorithm, and the factor scores of the sites were clustered by the cluster method. Users' satisfaction with the actual conditions of management was analyzed using multiple regression. The major user groups were students and youth groups accompanied by 3 to 10 friends. Users' post-occupancy evaluation values for activities such as rockwall climbing and praying on the mountain at each site were higher than anticipated, but the evaluation values of other activities were lower. Users' behaviors at each site were reduced to five factors by the factor analysis algorithm. Using the control method for the number of factors, a T.V. of 50.58% was obtained. The factor covering the behavior patterns of students and youth yielded high E.V. and C.V. In the cluster analysis using factor scores, factor IV at the Hwaomsa temple site and the Ssanggyesa temple site, factors II and V at the Jungsanri valley site, factors I and III at the Bangmudong valley site, and factors I and IV at the Baemsagol valley site showed very high values. According to the multiple regression analysis, the major variables related to satisfaction with the actual conditions of vegetation and landscape management were preservation of groundcover, recovery of artificial injury, and the surroundings of camping and temple sites.
For park facilities and operation, the major variables related to satisfaction were the management conditions of amenity facilities such as privies, signboards, junk yards, and camping sites, and the guidance of excursions, campaigns, and the preservation of nature.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, growing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, the development of IT and the increased penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to gain insights through data analysis keep increasing. Big data analysis will therefore become more important across industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request it. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading, so big data analysis is expected to be performed by the requesters themselves. Along with this, interest in various kinds of unstructured data is continually increasing, with particular attention on text data. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it, and the results of text analysis have been used in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Of the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique because it reflects the semantic elements of the documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire document set must be analyzed at once to identify the topic of each document. This makes the analysis slow when topic modeling is applied to a large number of documents. It also causes a scalability problem: processing time increases exponentially with the number of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling: a large number of documents is divided into sub-units, and topics are derived by repeating topic modeling on each unit. This method allows topic modeling on a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, because documents can be analyzed at each location without first being combined. Despite these advantages, however, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear: local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established; that is, taking the global topics as the ideal answer, the deviation of a local topic from a global topic needs to be measured. Because of these difficulties, this approach has not been studied as thoroughly as other topic modeling approaches. In this paper, we propose a topic modeling approach that solves these two problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced entire document cluster (RGS, reduced global set) that consists of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. We then verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global and local results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also proposed a reasonable method for comparing the results of the two methods.
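
The abstract does not state which similarity measure links local topics to RGS topics, so the Python sketch below assumes cosine similarity over term-weight dictionaries purely for illustration. The topic identifiers, term weights, and function names are hypothetical. The `agreement` helper mirrors the verification idea of counting documents that land on the same topic in both the local and the global results.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_local_to_global(local_topics, global_topics):
    """Map each local topic id to the most similar global (RGS) topic id."""
    return {
        local_id: max(global_topics, key=lambda g: cosine(terms, global_topics[g]))
        for local_id, terms in local_topics.items()
    }


def agreement(local_assignment, global_assignment, mapping):
    """Share of documents whose mapped local topic equals their global topic."""
    shared = [doc for doc in local_assignment if doc in global_assignment]
    if not shared:
        return 0.0
    hits = sum(mapping[local_assignment[doc]] == global_assignment[doc] for doc in shared)
    return hits / len(shared)


# Hypothetical topics expressed as term -> weight.
local_topics = {
    "L0": {"election": 0.5, "vote": 0.3, "party": 0.2},
    "L1": {"match": 0.6, "goal": 0.4},
}
global_topics = {
    "G0": {"goal": 0.5, "league": 0.3, "match": 0.2},
    "G1": {"vote": 0.4, "election": 0.4, "policy": 0.2},
}
mapping = map_local_to_global(local_topics, global_topics)

# Hypothetical per-document topic assignments from the two runs.
local_docs = {"d1": "L0", "d2": "L1", "d3": "L1"}
global_docs = {"d1": "G1", "d2": "G0", "d3": "G1"}
score = agreement(local_docs, global_docs, mapping)
```

Each local set can be mapped independently, which fits the divide-and-conquer motivation: only the compact topic descriptions, not the documents, need to be compared across locations.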

A Method of Detecting the Aggressive Driving of Elderly Driver (노인 운전자의 공격적인 운전 상태 검출 기법)

  • Koh, Dong-Woo;Kang, Hang-Bong
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.537-542 / 2017
  • Aggressive driving is a major cause of car accidents. Previous studies have mainly analyzed the aggressive driving tendencies of young drivers, and only through pure clustering or classification techniques from machine learning. However, since elderly people have different driving habits because of their frailer physical condition, a new method, such as one that enhances the characteristics of driving data, is needed to properly analyze aggressive driving by elderly drivers. In this study, acceleration data collected from a smartphone in a moving vehicle is analyzed by a newly proposed ECA (Enhanced Clustering method for Acceleration data) technique, coupled with conventional clustering techniques (K-means clustering and the expectation-maximization algorithm). ECA selects high-intensity data from the cluster groups detected through K-means and EM across all subjects' data and models the characteristic data through a scaled value. Using this method, unlike the pure clustering method, the aggressive driving data of all young and elderly participants were collected. We further found that K-means clustering has higher detection efficiency than the EM method. The K-means results also show that young drivers have a driving strength 1.29 times higher than that of elderly drivers. In conclusion, the proposed method can detect aggressive driving maneuvers in data from elderly drivers, whose operating intensity is low, and can be used to build a customized safe driving system for elderly drivers. In the future, it will be possible to detect abnormal driving conditions and to use the collected data for early warnings to drivers.

Dietary Behaviors of Adults for Health in Ulsan City (울산시민의 건강실천을 위한 식생활 행태)

  • Shin, Ae-Sook;Kim, Kwang-Kee
    • Journal of the Korean Society of Food Culture / v.15 no.1 / pp.17-28 / 2000
  • This paper describes the dietary behaviors that adults practice to stay healthy. A probability sample was drawn from residents aged 15 to 60 living in the Ulsan City area through multi-stage cluster sampling. Data were collected from 1,232 respondents by face-to-face interview. Both univariate and bivariate analyses were employed to describe the dietary behaviors. The dietary behaviors in this study include preferences regarding fatty meat, fried food, salty food, and spicy food; coffee and milk drinking; and taking supplementary medicine. About half of the respondents reported removing the fat when eating meat, and more than 68% preferred not to eat any kind of fried food. With respect to salty and spicy food, 39.6% of the respondents take medium-salty food and 39.4% take spicy food. A third of the respondents drink two to four cups of coffee a day. Those who reported not drinking milk at all were more prevalent (37.4% of the respondents) than expected. However, fewer than 20% of the respondents reported taking any kind of supplementary health food in a year. These dietary behaviors were examined by sociodemographic characteristics in the bivariate analyses.
