• Title/Abstract/Keywords: clustering problem


Document Clustering Using Semantic Features and Fuzzy Relations

  • Kim, Chul-Won;Park, Sun
    • Journal of information and communication convergence engineering
    • /
    • Vol. 11, No. 3
    • /
    • pp.179-184
    • /
    • 2013
  • Traditional clustering methods are usually based on the bag-of-words (BOW) model. A disadvantage of the BOW model is that it ignores the semantic relationship among terms in the data set. To resolve this problem, ontology or matrix factorization approaches are usually used. However, a major problem of the ontology approach is that it is usually difficult to find a comprehensive ontology that can cover all the concepts mentioned in a collection. This paper proposes a new document clustering method using semantic features and fuzzy relations for solving the problems of ontology and matrix factorization approaches. The proposed method can improve the quality of document clustering because the clustered documents use fuzzy relation values between semantic features and terms to distinguish clearly among dissimilar documents in clusters. The selected cluster label terms can represent the inherent structure of a document set better by using semantic features based on non-negative matrix factorization, which is used in document clustering. The experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.

Clustering Characteristics and Class Hierarchy Generation in Object-Oriented Development

  • 이건호
    • 정보처리학회논문지D
    • /
    • Vol. 11D, No. 7
    • /
    • pp.1443-1450
    • /
    • 2004
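  • In the early stages of object-oriented software development, determining classes is a complex problem that involves clustering the attributes associated with many objects. Registering classes in a library for reuse has relied on repeated trial and error. This paper discusses the traditional approach to registering classes and an integrated approach to defining classes and their hierarchy in the modeling or design phase. For the attribute clustering problem, it presents a 0-1 integer programming model based on the attribute similarity of objects and proposes a clustering algorithm that uses a network technique. It also presents rules for generating the class hierarchy and proposes an algorithm for generating the hierarchy graph. A real-world problem is presented as a case study using the results.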

Fuzzy Clustering with Genre Preference for Collaborative Filtering

  • Lee, Soojung
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 25, No. 5
    • /
    • pp.99-106
    • /
    • 2020
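  • The scalability problem inherent in collaborative filtering-based recommender systems has been a research issue for decades. Clustering is a well-known technique for addressing this problem, but it has not been actively studied because of its low performance. This paper adopts a clustering technique to overcome the scalability problem, a chronic weakness of collaborative filtering systems. To mitigate the performance degradation caused by clustering, it uses two strategies: the first is fuzzy clustering, and the second is a proposed similarity measure based on users' preferences for movie genres. The proposed method was compared experimentally with several existing related methods and evaluated on a range of major performance metrics. The results show that it outperforms them in prediction and rank accuracy and performs on par with the best of the compared methods in recommendation accuracy.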
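The abstract does not give the paper's membership or similarity formulas, so the following is only a minimal sketch of what "fuzzy" means in this setting: a user belongs to every cluster to some degree rather than to exactly one. It assumes the standard fuzzy c-means membership rule with Euclidean distance; the function name, the fuzzifier `m`, and the toy genre-preference vectors are hypothetical.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy c-means style membership degrees of one point in each cluster.

    Assumption: Euclidean distance and fuzzifier m > 1. The paper's own
    genre-preference similarity measure is not reproduced here.
    """
    dists = [math.dist(point, center) for center in centers]
    # A point lying exactly on one or more centers belongs fully to them.
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        return [h / sum(hits) for h in hits]
    power = 2.0 / (m - 1.0)
    # u_i = 1 / sum_k (d_i / d_k)^(2 / (m - 1)); the degrees sum to 1.
    return [1.0 / sum((d_i / d_k) ** power for d_k in dists) for d_i in dists]

# Hypothetical 2-D genre-preference vector for one user and two cluster centers.
user = (1.0, 0.0)
centers = [(0.0, 0.0), (4.0, 0.0)]
degrees = fuzzy_memberships(user, centers)
print(degrees)  # the nearer center receives the larger degree
```

With soft degrees like these, a recommender can draw neighbors from every cluster a user partly belongs to, which is one plausible reason fuzzy clustering loses less accuracy than a hard partition.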

Clustering Algorithm for Sequences of Categorical Values

  • 오승준;김재련
    • 한국산업경영시스템학회: Conference Proceedings
    • /
    • 한국산업경영시스템학회 2002 Spring Conference
    • /
    • pp.125-132
    • /
    • 2002
  • We study a clustering algorithm for sequences of categorical values. Clustering is a data mining problem that has received significant attention from the database community. Traditional clustering algorithms deal with numerical or categorical data points. However, many important databases store sequences of categorical data. In this paper we introduce a new similarity measure and develop a hierarchical clustering algorithm. An experimental section shows the performance of the proposed approach.


Sample Based Algorithm for k-Spatial Medians Clustering

  • Jin, Seo-Hoon;Jung, Byoung-Cheol
    • 응용통계연구
    • /
    • Vol. 23, No. 2
    • /
    • pp.367-374
    • /
    • 2010
  • As an alternative to k-means clustering, k-spatial medians clustering has many merits owing to the advantages of the spatial median. However, it has not been widely used because it requires heavy computation. When the number of objects and the number of variables are large, the computation time becomes a serious problem. In this study we propose a fast algorithm for k-spatial medians clustering. The practical applicability of the algorithm is shown with some numerical studies.
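To illustrate why the spatial median is attractive yet costly, here is a minimal sketch of computing one spatial median (the point minimizing the sum of Euclidean distances) with Weiszfeld's iteration. This is a standard textbook procedure, not the sample-based algorithm the paper proposes, which the abstract does not describe. The function name, iteration limit, and tolerance are assumptions.

```python
import math

def spatial_median(points, iterations=200, tol=1e-9):
    """Approximate the spatial (geometric) median with Weiszfeld's iteration.

    Simplification (an assumption of this sketch): a data point that coincides
    with the current estimate is skipped to avoid division by zero.
    """
    dim = len(points[0])
    # Start from the coordinate-wise mean, i.e. the k-means style center.
    current = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(iterations):
        weight_sum = 0.0
        weighted = [0.0] * dim
        for p in points:
            dist = math.dist(p, current)
            if dist < tol:
                continue
            w = 1.0 / dist  # closer points pull harder on the estimate
            weight_sum += w
            for d in range(dim):
                weighted[d] += w * p[d]
        if weight_sum == 0.0:
            break  # every point coincides with the estimate
        updated = [v / weight_sum for v in weighted]
        shift = math.dist(updated, current)
        current = updated
        if shift < tol:
            break
    return current

# One far outlier drags the mean to about x = 3.67, while the spatial
# median stays at the middle point of these collinear observations.
data = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
print(spatial_median(data))
```

Each center update is itself an iterative loop over all points, in contrast to the single averaging pass of k-means, which is consistent with the heavy computation the abstract mentions.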

Refinement of Document Clustering by Using NMF

  • Shinnou, Hiroyuki;Sasaki, Minoru
    • 한국언어정보학회: Conference Proceedings
    • /
    • 한국언어정보학회 2007 Annual Conference
    • /
    • pp.430-439
    • /
    • 2007
  • In this paper, we use non-negative matrix factorization (NMF) to refine document clustering results. NMF is a dimensionality reduction method and is effective for document clustering, because a term-document matrix is high-dimensional and sparse. The initial matrix of the NMF algorithm can be regarded as a clustering result, so NMF can be used as a refinement method. We first perform min-max cut (Mcut), a powerful spectral clustering method, and then refine the result via NMF. In principle, this should yield an accurate clustering result. However, NMF often fails to improve the given clustering result. To overcome this problem, we use the Mcut objective function to decide when to stop the NMF iteration.


Heuristic algorithm to raise efficiency in clustering

  • 이석환;박승헌
    • 대한안전경영과학회지
    • /
    • Vol. 11, No. 3
    • /
    • pp.157-166
    • /
    • 2009
  • In this study, we developed a heuristic algorithm that achieves better clustering efficiency than conventional algorithms. Conventional clustering algorithms had lower clustering efficiency because they had no solid method for selecting the initial cluster centers and had difficulty searching for a clustering solution. The EMC (Expanded Moving Center) heuristic algorithm is proposed to resolve this low efficiency. It selects the initial cluster centers and searches for a solution systematically. Clustering experiments were performed to evaluate the EMC heuristic algorithm. In a real case study, the EMC heuristic algorithm achieved a better squared error than the other algorithms, and its advantage grew markedly as the number of clusters increased.

New Sequential Clustering Combination for Rule Generation System

  • 김승석;최호진
    • 인터넷정보학회논문지
    • /
    • Vol. 13, No. 5
    • /
    • pp.1-8
    • /
    • 2012
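  • This paper proposes a sequential clustering method for systems that generate rules from numerical data. A single clustering technique may fail to produce the desired result in a large, complex space. To address this, the proposed method runs different clustering techniques in sequence so that their strengths are exploited and their weaknesses are compensated for. Mountain clustering and Chen clustering are used to form clusters autonomously in a non-parametric space, with the two dividing the work between the global space and the local space to estimate the clusters. The estimated clusters can be used to determine the structure and initial parameters of intelligent systems such as neural networks and fuzzy systems and, by extension, can support the training of decision support systems in healthcare and medicine. The usefulness of the proposed method is demonstrated through simulation.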

Clustering Algorithm for Time Series with Similar Shapes

  • Ahn, Jungyu;Lee, Ju-Hong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 12, No. 7
    • /
    • pp.3112-3127
    • /
    • 2018
  • Since time series clustering is performed without prior information, it is used for exploratory data analysis. In particular, clusters of time series with similar shapes can be used in various fields, such as business, medicine, finance, and communications. However, existing time series clustering algorithms have a problem in that time series with different shapes are included in the clusters. The reason for such a problem is that the existing algorithms do not consider the limitations on the size of the generated clusters, and use a dimension reduction method in which the information loss is large. In this paper, we propose a method to alleviate the disadvantages of existing methods and to find a better quality of cluster containing similarly shaped time series. In the data preprocessing step, we normalize the time series using z-transformation. Then, we use piecewise aggregate approximation (PAA) to reduce the dimension of the time series. In the clustering step, we use density-based spatial clustering of applications with noise (DBSCAN) to create a precluster. We then use a modified K-means algorithm to refine the preclusters containing differently shaped time series into subclusters containing only similarly shaped time series. In our experiments, our method showed better results than the existing method.
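The preprocessing steps named in this abstract, z-normalization followed by piecewise aggregate approximation (PAA), can be sketched as below. This is a minimal illustration rather than the authors' code: the function names are hypothetical, the population standard deviation is assumed, and the series length is assumed to be divisible by the number of segments.

```python
import math

def z_normalize(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        # A constant series has no shape to compare; map it to all zeros.
        return [0.0] * n
    return [(x - mean) / std for x in series]

def paa(series, segments):
    """Piecewise aggregate approximation: the mean of each equal-width frame.

    Assumption: len(series) is divisible by `segments`.
    """
    n = len(series)
    if segments <= 0 or n % segments != 0:
        raise ValueError("series length must be a positive multiple of segments")
    width = n // segments
    return [sum(series[i * width:(i + 1) * width]) / width
            for i in range(segments)]

# Two series with the same shape but different offset and scale.
a = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
b = [x * 3.0 + 100.0 for x in a]
reduced_a = paa(z_normalize(a), 4)
reduced_b = paa(z_normalize(b), 4)
print(reduced_a)
print(reduced_b)  # nearly identical to reduced_a after normalization
```

After normalization the two series reduce to almost the same 4-value vector, which is what lets a density-based step such as DBSCAN group them by shape rather than by amplitude.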

Clustering Algorithm Considering Sensor Node Distribution in Wireless Sensor Networks

  • Yu, Boseon;Choi, Wonik;Lee, Taikjin;Kim, Hyunduk
    • Journal of Information Processing Systems
    • /
    • Vol. 14, No. 4
    • /
    • pp.926-940
    • /
    • 2018
  • In clustering-based approaches, cluster heads closer to the sink are usually burdened with much more relay traffic and thus tend to die early. To address this problem, distance-aware clustering approaches, such as energy-efficient unequal clustering (EEUC), that adjust the cluster size according to the distance between the sink and each cluster head have been proposed. However, the network lifetime of such approaches is highly dependent on the distribution of the sensor nodes because, in randomly distributed sensor networks, the approaches do not guarantee that the cluster energy consumption will be proportional to the cluster size. To address this problem, we propose a novel approach called CACD (Clustering Algorithm Considering node Distribution), which is not only distance-aware but also node density-aware. In CACD, clusters are allowed to have a limited number of member nodes, which is determined by the distance between the sink and the cluster head. Simulation results show that CACD is 20%-50% more energy-efficient than previous work under various operational conditions considering the network lifetime.
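The idea of adjusting cluster size by distance to the sink can be illustrated with a simple linear rule. The sketch below is hypothetical: the abstract gives neither the EEUC nor the CACD formula, so the function name, the shrink factor `c`, and the sample distances are all assumptions chosen only to show the direction of the effect.

```python
def cluster_radius(dist_to_sink, d_min, d_max, max_radius, c=0.5):
    """Hypothetical distance-aware cluster radius.

    Cluster heads near the sink get a smaller radius (fewer members),
    leaving them more energy for relay traffic. `c` in [0, 1] controls
    how strongly the radius shrinks; c = 0 gives equal-sized clusters.
    """
    if d_max <= d_min:
        raise ValueError("d_max must be greater than d_min")
    # 1.0 for the head closest to the sink, 0.0 for the farthest one.
    closeness = (d_max - dist_to_sink) / (d_max - d_min)
    return (1.0 - c * closeness) * max_radius

# Assumed deployment: heads lie between 20 m and 100 m from the sink.
for d in (20.0, 60.0, 100.0):
    print(d, cluster_radius(d, d_min=20.0, d_max=100.0, max_radius=40.0))
```

A rule like this depends on distance alone, so in a randomly deployed network a small-radius cluster can still contain many nodes. That is the gap the abstract says CACD addresses by also limiting the number of member nodes.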