• Title/Summary/Keyword: clustering problem

Noise Averaging Effect on Privacy-Preserving Clustering of Time-Series Data (시계열 데이터의 프라이버시 보호 클러스터링에서 노이즈 평준화 효과)

  • Moon, Yang-Sae;Kim, Hea-Suk
    • Journal of KIISE: Computing Practices and Letters / v.16 no.3 / pp.356-360 / 2010
  • Recently, there have been many research efforts on privacy-preserving data mining, in which preserving the accuracy of mining results is as important as preserving privacy. The random perturbation technique for privacy-preserving data mining is known to preserve privacy well. However, it destroys the distance orders among time series. In this paper, we propose the notion of the noise averaging effect of piecewise aggregate approximation (PAA), which preserves clustering accuracy as much as possible in time-series data clustering. Based on the noise averaging effect, we define a PAA distance for computing distances between time series, and we show that this PAA distance alleviates the problem of destroyed distance orders in randomly perturbed time series.
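The noise averaging effect can be illustrated with a small sketch, which is not the authors' implementation. It makes two assumptions. First, random perturbation adds i.i.d. Gaussian noise to every point. Second, the "PAA distance" takes the common lower-bounding form sqrt(n/w) times the Euclidean distance between PAA vectors. The paper's exact definition may differ. All function names are hypothetical.

```python
import math
import random


def paa(series, w):
    """Piecewise aggregate approximation: the mean of each of w equal-length segments."""
    n = len(series)
    if n % w != 0:
        raise ValueError("this sketch assumes len(series) is divisible by w")
    seg = n // w
    return [sum(series[i * seg:(i + 1) * seg]) / seg for i in range(w)]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def paa_distance(x, y, w):
    """Assumed PAA distance: sqrt(n / w) times the Euclidean distance of the PAA vectors."""
    return math.sqrt(len(x) / w) * euclidean(paa(x, w), paa(y, w))


def perturb(series, sigma, rng):
    """Random perturbation (assumed model): add i.i.d. N(0, sigma^2) noise to every point."""
    return [v + rng.gauss(0.0, sigma) for v in series]


if __name__ == "__main__":
    rng = random.Random(0)
    n, w = 1024, 16
    noise = perturb([0.0] * n, 1.0, rng)
    # Each PAA coefficient averages n / w = 64 noise samples, so its standard
    # deviation shrinks from sigma to roughly sigma / sqrt(64) = sigma / 8.
    print(max(abs(v) for v in noise), max(abs(v) for v in paa(noise, w)))
```

Averaging within each segment shrinks the injected noise by a factor of about the square root of the segment length. The underlying signal's trend survives, which is why distance orders computed on PAA coefficients are less disturbed than those computed on the raw perturbed points.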

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report our observations on user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of how companies collect data about customer needs. Most companies have failed to properly uncover customers' needs for products or services from demographic data such as age, income level, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with business information appropriate to current circumstances. Part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic and transaction data harder to collect and is a significant hurdle for traditional recommendation approaches, because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper is our strong belief, backed by evidence, that most customers' requirements for products can be analyzed effectively and efficiently from unstructured textual data such as Internet news text. To derive users' requirements from textual data obtained online, the proposed approach constructs two two-mode networks, a user-news network and a news-issue network, and integrates them into a single quasi-network that serves as the input for issue clustering. One contribution of this research is a methodology that uses enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis techniques. Building multi-layered two-mode networks from news logs requires tools for text mining and topic analysis. We used SAS Enterprise Miner 12.1, which provides a text miner module and a cluster module for textual data analysis, and NetMiner 4 for network visualization and analysis.
Our approach to user-perspective issue clustering consists of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, a crawler collects visit logs for news sites. After the unstructured news article data are gathered, the topic analysis phase extracts issues from each news article to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed from access patterns derived from web transaction logs. The two two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue clustering phase, we classify issues by structural equivalence and compare the results with the clustering results from statistical tools and network analysis. We performed an experiment on a large dataset to build a multi-layer two-mode network and then compared the issue clustering results from SAS with those from network analysis. The experimental dataset came from a website ranking site and from the largest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles from 5,000 panel members over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue clustering, which applied the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), produced results consistent with those of SAS clustering. Despite extensive efforts to provide user information through recommendation systems, most projects succeed only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support for decision-making in companies because it enriches user-related data using unstructured textual data.
To overcome the data insufficiency of traditional approaches, our methodology infers customers' real interests from web transaction logs. In addition, we suggest topic analysis and issue clustering as practical means of issue identification.
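The merge-then-cluster pipeline can be sketched in a few dozen lines. This sketch rests on several assumptions. The two two-mode networks are merged by a plain matrix product, so entry (u, i) counts the articles read by user u that carry issue i. Structural equivalence is measured as the Euclidean distance between issues' user profiles. PAM is reduced to a greedy BUILD step followed by SWAP. The toy matrices and all function names are hypothetical. The authors used SAS and NetMiner, not this code.

```python
import math


def merge_two_mode(user_article, article_issue):
    """Merge U x A and A x I incidence matrices into a U x I quasi-network (matrix product)."""
    n_articles = len(article_issue)
    n_issues = len(article_issue[0])
    return [
        [sum(row[a] * article_issue[a][i] for a in range(n_articles)) for i in range(n_issues)]
        for row in user_article
    ]


def structural_distance(ui, i, j):
    """Structural equivalence: Euclidean distance between the user profiles of issues i and j."""
    return math.sqrt(sum((row[i] - row[j]) ** 2 for row in ui))


def pam(dist, k):
    """Minimal Partitioning Around Medoids: greedy BUILD, then SWAP while the total cost drops."""
    n = len(dist)

    def cost(medoids):
        return sum(min(dist[p][m] for m in medoids) for p in range(n))

    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: cost(medoids + [c])))
    current = cost(medoids)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:slot] + [c] + medoids[slot + 1:]
                trial_cost = cost(trial)
                if trial_cost < current:
                    medoids, current, improved = trial, trial_cost, True
    # Each issue gets the index of its nearest medoid as its cluster label.
    labels = [min(range(k), key=lambda s: dist[p][medoids[s]]) for p in range(n)]
    return medoids, labels


if __name__ == "__main__":
    user_article = [[1, 1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1]]   # hypothetical access logs
    article_issue = [[1, 1, 0, 0], [0, 1, 0, 0], [0, 0, 1, 1], [0, 0, 1, 0]]  # hypothetical topics
    ui = merge_two_mode(user_article, article_issue)
    d = [[structural_distance(ui, i, j) for j in range(4)] for i in range(4)]
    print(ui, pam(d, 2))
```

Issues that attract the same readers end up with similar columns in the quasi-network. Structural equivalence, and therefore PAM, groups them together even if the issues never co-occur in a single article.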

Heuristics for vehicle routing problem to minimize total delivery waiting time (고객 대기 시간 최소화 차량 경로 문제의 발견적 해법)

  • 강경환;이영훈
    • Proceedings of the Korean Operations and Management Science Society Conference / 2001.10a / pp.279-282 / 2001
  • This paper is concerned with the Vehicle Routing Problem of minimizing customers' total delivery waiting time. The delivery waiting time is the time taken to travel from the depot to the customer, which matters for delivering perishable products or foods that require freshness. We formulate the problem as a mixed integer linear program and propose two heuristics: CR (Clustering first, Routing second) and SPH (Set Partitioning Heuristic). Computational experiments show that these heuristics find good solutions in much shorter time than is needed to obtain optimal solutions.
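A cluster-first, route-second heuristic for this objective can be sketched as follows. This is a generic stand-in, not the authors' CR heuristic. It uses sweep clustering by polar angle and nearest-neighbor routing within each cluster. It assumes Euclidean travel times at unit speed, no service times, and a hypothetical `capacity` limit on the number of customers per vehicle.

```python
import math


def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])


def route_waiting_time(depot, route):
    """Sum of customers' arrival times along one route (unit speed, no service time)."""
    t, pos, total = 0.0, depot, 0.0
    for customer in route:
        t += dist(pos, customer)  # elapsed travel time from the depot to this customer
        total += t
        pos = customer
    return total  # the return leg to the depot does not count toward waiting time


def sweep_clusters(depot, customers, capacity):
    """Cluster first: sort by polar angle around the depot, cut every `capacity` customers."""
    order = sorted(customers, key=lambda c: math.atan2(c[1] - depot[1], c[0] - depot[0]))
    return [order[i:i + capacity] for i in range(0, len(order), capacity)]


def nearest_neighbor_route(depot, cluster):
    """Route second: repeatedly visit the closest unvisited customer."""
    remaining, pos, route = list(cluster), depot, []
    while remaining:
        nxt = min(remaining, key=lambda c: dist(pos, c))
        remaining.remove(nxt)
        route.append(nxt)
        pos = nxt
    return route


def cluster_first_route_second(depot, customers, capacity):
    routes = [nearest_neighbor_route(depot, c) for c in sweep_clusters(depot, customers, capacity)]
    return routes, sum(route_waiting_time(depot, r) for r in routes)


if __name__ == "__main__":
    depot = (0.0, 0.0)
    customers = [(1.0, 0.0), (2.0, 0.0), (0.0, 1.0), (0.0, 2.0)]
    print(cluster_first_route_second(depot, customers, 2))
```

This objective sums arrival times rather than route lengths, so a customer's position in the route matters. Each customer's waiting time includes every leg driven before that customer is reached. Visiting nearby customers early therefore tends to lower the total.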

Retrieve System for Performance support of Vocabulary Clustering Model In Continuous Vocabulary Recognition System (연속 어휘 인식 시스템에서 어휘 클러스터링 모델의 성능 지원을 위한 검색 시스템)

  • Oh, Sang Yeob
    • Journal of Digital Convergence / v.10 no.9 / pp.339-344 / 2012
  • The established continuous vocabulary recognition system improved its recognition rate by using a decision-tree-based tying modeling method. However, because the system model cannot support retrieval of phoneme data, it is hard to secure accuracy. To address this problem, we remodeled the system so that it can retrieve probabilistic models from the continuous vocabulary clustering model down to the phoneme unit. As a result, the system achieved a recognition rate of 95.88%.

Group Model Clustering Method for Model Downsizing (모델 축소를 위한 그룹 모델 클러스터링 방법에 대한 연구)

  • Park, Mi-Na;Ha, Jin-Young
    • Journal of Industrial Technology / v.28 no.A / pp.185-189 / 2008
  • Practical pattern recognition systems must cope with a very large number of classes. Sometimes it is almost impossible to build a model for every class because of memory and time constraints, and in such cases grouping similar models helps. In this paper, we propose GMC (Group Model Clustering) to build a Chinese character recognition system with a large number of classes. We built hidden Markov models for 10% of all classes and then classified the remaining classes into the already trained group classes. Finally, group models are trained on the group-model-clustered data. Recognition is performed using only the group models, which reduces model size and improves recognition speed.
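The grouping steps can be sketched as follows. Train seed models for a fraction of the classes, assign every other class to its closest seed, retrain one model per group on pooled data, and recognize with group models only. As a loud simplification, each HMM is replaced by a nearest-centroid model (the mean feature vector). The seed-selection rule (first classes in sorted order) is also hypothetical, because the abstract does not state it.

```python
import math


def class_mean(vectors):
    """Component-wise mean of a list of equal-length feature vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def group_model_clustering(class_samples, seed_fraction=0.1):
    """class_samples maps each class label to a list of feature vectors.

    Hypothetical stand-in for GMC: a centroid replaces each hidden Markov model.
    """
    labels = sorted(class_samples)
    n_seeds = max(1, round(len(labels) * seed_fraction))
    seeds = labels[:n_seeds]  # assumption: the paper's seed-selection rule is not given
    seed_means = {s: class_mean(class_samples[s]) for s in seeds}
    groups = {s: [s] for s in seeds}
    # Assign every remaining class to the closest already-trained seed model.
    for label in labels[n_seeds:]:
        m = class_mean(class_samples[label])
        nearest = min(seeds, key=lambda s: math.dist(m, seed_means[s]))
        groups[nearest].append(label)
    # Retrain one model per group on the pooled data of all its member classes.
    group_models = {
        s: class_mean([v for member in members for v in class_samples[member]])
        for s, members in groups.items()
    }
    return groups, group_models


def recognize(x, group_models):
    """Recognition uses only the group models: return the id of the closest group."""
    return min(group_models, key=lambda g: math.dist(x, group_models[g]))


if __name__ == "__main__":
    samples = {"a": [[0.0, 0.0]], "b": [[10.0, 10.0]], "c": [[1.0, 0.0]], "d": [[9.0, 10.0]]}
    groups, models = group_model_clustering(samples, seed_fraction=0.5)
    print(groups, recognize([0.2, 0.1], models))
```

In a real system the recognized group would then be resolved into an individual character by some further step. The abstract does not describe that step, so it is not sketched here.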

A Hybrid Genetic Algorithm for K-Means Clustering

  • Jun, Sung-Hae;Han, Jin-Woo;Park, Minjae;Oh, Kyung-Whan
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.330-333 / 2003
  • For partitioning clustering methods, the initial cluster size strongly affects the clustering result. In the K-means algorithm, the result of cluster analysis changes with the choice of the cluster size K. Usually, the initial cluster size is determined from prior, subjective information, which may not be optimal, so a more objective method is needed. In our research, we propose a hybrid genetic algorithm, a tree-induction-based evolutionary algorithm, for determining the optimal cluster size. The initial population of this algorithm is determined by the number of terminal nodes of the induced tree, and the optimal cluster size is generated from this decision-tree-based initial population. Our fitness function is defined as the inverse of a dissimilarity measure, and a bagging approach is used to save computational time.

Design and Implementation of Spatial Clustering Method using Regular Grid (균등 격자를 이용한 공간 클러스터링 기법의 설계 및 구현)

  • 문상호
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2003.05a / pp.485-489 / 2003
  • Several clustering methods for spatial data mining have been devised in the literature, but they share a drawback: high cost due to calculating distances among objects. To solve this problem, we propose a spatial clustering method using regular grid cells. In this paper, we design and implement the file structures, data structures, and algorithms that realize the proposed method, and we show experimental results from applying test data to the implemented method.
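The general idea of grid-based spatial clustering can be sketched as follows, assuming a generic design rather than the paper's file and data structures. Points are hashed into regular cells. Cells holding at least `min_pts` points count as dense. Dense cells that touch (8-neighbourhood) are merged by breadth-first search. `cell_size`, `min_pts`, and the neighbourhood rule are hypothetical parameters. No point-to-point distance is ever computed.

```python
from collections import defaultdict, deque


def grid_cluster(points, cell_size, min_pts):
    """Grid-based spatial clustering sketch (assumed design, not the paper's exact algorithm)."""
    # 1. Hash every point into a regular cell; no point-to-point distances are computed.
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))
    # 2. Keep only dense cells (at least min_pts points); the rest is treated as noise.
    dense = {c for c, pts in cells.items() if len(pts) >= min_pts}
    # 3. Merge dense cells that touch (8-neighbourhood) using breadth-first search.
    cell_label, next_id = {}, 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_id
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_id
                        queue.append(nb)
        next_id += 1
    # 4. Collect the points of every cell under its cluster id.
    clusters = [[] for _ in range(next_id)]
    for cell, cid in cell_label.items():
        clusters[cid].extend(cells[cell])
    return clusters


if __name__ == "__main__":
    pts = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.5, 0.5), (5.1, 5.1), (5.2, 5.3), (9.5, 9.5)]
    print(grid_cluster(pts, cell_size=1.0, min_pts=2))
```

The cost is one pass to bin the points plus work proportional to the number of occupied cells. This is the kind of saving over pairwise distance computation that the abstract describes.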

ASVMRT: Materialized View Selection Algorithm in Data Warehouse

  • Yang, Jin-Hyuk;Chung, In-Jeong
    • Journal of Information Processing Systems / v.2 no.2 / pp.67-75 / 2006
  • To obtain precise and quick responses to analytical queries, proper selection of the views to materialize in a data warehouse is crucial. In traditional view selection algorithms, all relations are considered as candidates for materialized views. However, materializing all relations rather than only a part of them results in much worse performance in terms of time and space costs. Therefore, we present an improved algorithm for selecting views to materialize that uses clustering to overcome this problem of conventional view selection algorithms. In the proposed algorithm, ASVMRT (Algorithm for Selection of Views to Materialize using Reduced Table), we first generate reduced tables in the data warehouse using clustering based on attribute-value density, and then we consider combinations of reduced tables, instead of combinations of the original base relations, as materialized views. To justify the proposed algorithm, we present experimental results showing that both time and space costs improve by a factor of approximately 1.8 over conventional algorithms.

A clustering method using the Coulomb Energy Network (쿨롱네트워크를 이용한 집락분석)

  • 이석훈;박래현;김응환
    • The Korean Journal of Applied Statistics / v.8 no.1 / pp.39-50 / 1995
  • This article addresses the problem that statistical clustering methods do not supply a clustering rule after the analysis. We modify the Coulomb Energy Network model, originally developed in physics, propose a variant suited to our purpose, and demonstrate its implementation on actual data. Finally, the proposed method is compared with a well-known method, the K-means algorithm, using Rand's C statistic.

Clustering of transmission system using the electrical distance (전기적 거리를 이용한 송전계통 클러스터링)

  • Hwang, Ho-Yoon;Kim, Hyun-Hong;Won, Jong-Jip;Park, Jong-Bae;Shin, Joong-Rin
    • Proceedings of the KIEE Conference / 2009.07a / pp.591-592 / 2009
  • To send similar price signals to local consumers, it is important to form regional clusters that can transmit a uniform signal to buses that each have a different bus-based price. If the LMP (locational marginal price) and regional information are considered at the same time without weighting either one, buses on a borderline can be assigned to a cluster arbitrarily. To solve this problem, this paper proposes a clustering method that applies a new concept called 'electrical distance'.
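One common way to define electrical distance is as the Thevenin impedance between two buses, taken from the bus impedance matrix Z: d_ij = Z_ii + Z_jj - 2 Z_ij. The sketch below computes it under a DC approximation. Lines are given as reactances and one reference bus is grounded. It then groups buses by a hypothetical distance threshold using single linkage. The paper's own definition of electrical distance, and how it is combined with the LMP, may differ. All function names and parameters are illustrative.

```python
def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (matrix assumed non-singular)."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def electrical_distances(n_buses, lines, ref=0):
    """d_ij = Z_ii + Z_jj - 2 Z_ij; lines are (from_bus, to_bus, reactance), bus `ref` grounded."""
    b = [[0.0] * n_buses for _ in range(n_buses)]
    for i, j, x in lines:  # build the DC susceptance matrix
        y = 1.0 / x
        b[i][i] += y
        b[j][j] += y
        b[i][j] -= y
        b[j][i] -= y
    keep = [k for k in range(n_buses) if k != ref]
    z_red = invert([[b[r][c] for c in keep] for r in keep])
    z = [[0.0] * n_buses for _ in range(n_buses)]  # reference row/column stay zero
    for ri, r in enumerate(keep):
        for ci, c in enumerate(keep):
            z[r][c] = z_red[ri][ci]
    return [[z[i][i] + z[j][j] - 2.0 * z[i][j] for j in range(n_buses)]
            for i in range(n_buses)]


def threshold_clusters(d, threshold):
    """Group buses linked by a chain of pairs within `threshold` (single linkage, union-find)."""
    parent = list(range(len(d)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(d)):
        for j in range(i + 1, len(d)):
            if d[i][j] <= threshold:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(d)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    d = electrical_distances(4, [(0, 1, 0.1), (1, 2, 1.0), (2, 3, 0.1)])
    print(threshold_clusters(d, 0.2))
```

Because the distance is derived from network impedances rather than from geography or prices alone, each border bus has a definite distance to every other bus. It is therefore not assigned to a cluster arbitrarily.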
