• Title/Summary/Keyword: cluster validity (군집 적합도)

Search Result 336, Processing Time 0.032 seconds

Screening and Clustering for Time-course Yeast Microarray Gene Expression Data using Gaussian Process Regression (효모 마이크로어레이 유전자 발현데이터에 대한 가우시안 과정 회귀를 이용한 유전자 선별 및 군집화)

  • Kim, Jaehee;Kim, Taehoun
    • The Korean Journal of Applied Statistics / v.26 no.3 / pp.389-399 / 2013
  • This article introduces Gaussian process regression and applies it to time-course microarray gene expression data. Genes in yeast cell cycle microarray expression data are screened by a ratio of log marginal likelihoods computed from Gaussian process regression with a squared exponential covariance kernel. A Gaussian process regression is fitted to each gene, and the fits are shown for the nine top-ranked genes. Gaussian model-based clustering is then applied to the screened data, and silhouette values are calculated to assess cluster validity.
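
The silhouette value used above for cluster validity can be illustrated with a minimal sketch. This is not the authors' code: the function name `silhouette_values`, the Euclidean distance, and the toy data are assumptions made for illustration. For each point, `a` is the mean distance to the other members of its own cluster and `b` is the smallest mean distance to any other cluster; the silhouette is `(b - a) / max(a, b)`.

```python
import math


def silhouette_values(points, labels):
    """Return one silhouette value per point, using Euclidean distance.

    Hypothetical helper for illustration; needs at least two clusters.
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    values = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            # Common convention: a singleton cluster gets silhouette 0.
            values.append(0.0)
            continue
        # a: mean distance to the other points in the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the points of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        values.append(0.0 if denom == 0 else (b - a) / denom)
    return values


# Toy data (assumed): two tight, well-separated groups.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
toy_labels = [0, 0, 1, 1]
toy_scores = silhouette_values(toy_points, toy_labels)
```

Values near 1 indicate that a point sits well inside its cluster, values near 0 indicate a point on a cluster boundary, and negative values suggest a likely misassignment.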

Document Clustering using Term reweighting based on NMF (NMF 기반의 용어 가중치 재산정을 이용한 문서군집)

  • Lee, Ju-Hong;Park, Sun
    • Journal of the Korea Society of Computer and Information / v.13 no.4 / pp.11-18 / 2008
  • Document clustering is an important method for document analysis and is used in many information retrieval applications. This paper proposes a new document clustering model that uses term re-weighting based on NMF (non-negative matrix factorization) to cluster documents relevant to a user's requirement. The proposed model re-weights terms using user feedback to reduce the gap between the user's requirement for document classification and the document clusters produced by the machine. The proposed method can improve the quality of document clustering because the re-weighted terms, the semantic feature matrix, and the semantic variable matrix used in document clustering can better represent the inherent structure of the document set. The experimental results demonstrate that applying the proposed method to document clustering methods achieves better performance than the original document clustering methods.

Sparse Document Data Clustering Using Factor Score and Self Organizing Maps (인자점수와 자기조직화지도를 이용한 희소한 문서데이터의 군집화)

  • Jun, Sung-Hae
    • Journal of the Korean Institute of Intelligent Systems / v.22 no.2 / pp.205-211 / 2012
  • Retrieved documents have to be transformed into a proper data structure for the clustering algorithms of statistics and machine learning. A popular data structure for document clustering is the document-term matrix, in which each entry holds the frequency with which a term occurs in a document. This matrix has a sparsity problem because most of its frequencies are 0, and this sparseness degrades clustering performance. This research therefore uses factor scores obtained by factor analysis to solve the sparsity problem in document clustering. The document-term matrix is transformed into a document-factor score matrix, which is then used as the input data for document clustering. To compare clustering performance between the document-term matrix and the document-factor score matrix, this research applies both types of matrix to self-organizing map (SOM) clustering.
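
The sparsity problem described above can be made concrete with a small sketch. It is illustrative only: the helper names `document_term_matrix` and `sparsity`, the whitespace tokenization, and the three toy documents are assumptions, and the factor analysis step of the paper is not reproduced here.

```python
from collections import Counter


def document_term_matrix(docs):
    """Build a document-term count matrix with naive whitespace tokenization."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        # One row per document, one column per vocabulary term.
        rows.append([counts[term] for term in vocab])
    return vocab, rows


def sparsity(matrix):
    """Fraction of matrix cells that are zero."""
    cells = [value for row in matrix for value in row]
    return sum(1 for value in cells if value == 0) / len(cells)


# Toy corpus (assumed) with little shared vocabulary.
toy_docs = [
    "cluster analysis of documents",
    "neural network training",
    "cluster validity index",
]
toy_vocab, toy_matrix = document_term_matrix(toy_docs)
toy_sparsity = sparsity(toy_matrix)
```

Even with three short documents, most cells are zero, and the proportion of zeros generally grows as the vocabulary grows with the corpus.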

Design of customized product recommendation model on correlation analysis when using electronic commerce (전자상거래 이용시 연관성 분석을 통한 맞춤형 상품추천 모델 설계)

  • Yang, MingFei;Park, Kiyong;Choi, Sang-Hyun
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.203-216 / 2022
  • In the recent business environment, purchase patterns are changing under the influence of COVID-19 and the online market. This study performed cluster analysis and correlation analysis based on purchase and product information. A new approach to cluster analysis was attempted by creating customer, product, and cross-bonding clusters, with the cross-bonding cluster analysis based on the results of the individual cluster analyses. The correlation analysis showed that more association rules were derived from the cross-bonding cluster and that their overlap rate was lower. The cross-bonding cluster was found to be highly efficient and the most suitable model for recommending products according to customer needs. The cross-bonding cluster model can save time and provide useful information to consumers, and it is expected to bring positive effects for companies, such as increased sales.

A Particle Swarm Optimization based Control Scheme for Super peer Ratio in Unstructured Peer-to-Peer System (비구조적 피어-투-피어 시스템에서 입자 군집 최적화를 이용한 우수 피어 비율 조절 기법)

  • Jang Hyung-Keun;Han Sung-Min;Park Sung-Yong
    • Proceedings of the Korean Information Science Society Conference / 2006.06d / pp.163-165 / 2006
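  • Unstructured peer-to-peer systems are better suited to dynamic conditions than structured peer-to-peer systems, but because a search message travels across many different peers, search time is long and the search success rate is low. To solve this problem, hierarchical peer-to-peer systems that use super peers have been studied. To build an efficient hierarchical peer-to-peer system, it matters which peers, and how many of them, are selected as super peers. Building on a previously studied self-organizing ring structure technique, this paper proposes a system that adapts the super peer ratio to the environment. To adjust the ratio to the environment, we used particle swarm optimization (PSO), a technique known to find optimal or near-optimal solutions efficiently, and the performance evaluation showed improved performance in the system to which PSO was applied.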

A Technique of Angle-of-Arrival Estimation in an Ultra wide Band(UWB) Indoor Wireless Communication (초광대역 옥내 무선 통신에서 신호 도착 방향 추정 기법)

  • Lee Yong-Up;Seo Young-Jun;Cho Gin-Kyu
    • The Journal of Korean Institute of Communications and Information Sciences / v.31 no.3C / pp.279-285 / 2006
  • In this study, a new signal model suitable for UWB indoor environments with random angle spread is proposed to estimate the angles of arrival (AOAs) of UWB cluster signals in UWB wireless communication. A subspace-based estimation technique adopted for this model is investigated, and estimates of the AOA and the distribution parameter of the received UWB cluster signals are obtained. The proposed model and estimation technique are verified by computer simulation, and the estimation error performance is analyzed.

Construction of Onion Sentiment Dictionary using Cluster Analysis (군집분석을 이용한 양파 감성사전 구축)

  • Oh, Seungwon;Kim, Min Soo
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2917-2932 / 2018
  • Much research has gone into developing production forecasting models to resolve the supply imbalance of onions, a vegetable closely tied to Korean food. However, because onions can be stored, it is very difficult to resolve the supply imbalance by forecasting production alone. This paper therefore aims to build a sentiment dictionary for predicting onion prices from internet articles, which are easily accessible in daily life and contain information on onion production and various price factors. The articles about onions date from 2012 to 2016, and four kinds of TF-IDF are compared through document classification by onion wholesale price. Positive and negative price words were classified with the partitional clustering methods k-means, DBSCAN (density-based spatial clustering of applications with noise), and GMM (Gaussian mixture model); GMM clustering yielded three meaningful dictionaries. To assess the validity of the resulting dictionaries, articles classified by price rise or fall were applied to logistic regression, which showed 85.7% accuracy.

Forecasting Electric Power Demand Using Census Information and Electric Power Load (센서스 정보 및 전력 부하를 활용한 전력 수요 예측)

  • Lee, Heon Gyu;Shin, Yong Ho
    • Journal of Korea Society of Industrial Information Systems / v.18 no.3 / pp.35-46 / 2013
  • To develop an accurate analytical model for domestic electricity demand forecasting, we propose a method for predicting electric power demand patterns that combines SMO classification with subspace clustering, a technique built on dimension reduction and suited to cluster analysis of high-dimensional data. For demand pattern prediction, hourly electricity load patterns and demographic and geographic characteristics can be analyzed by integrating wireless load monitoring data with census information at the sub-regional level. The prediction of sub-regional demand patterns using census information and power load for the Seoul metropolitan area yielded a total of 18 characteristic clusters. The power demand pattern prediction accuracy was approximately 85%.

Comparison between at-site frequency analysis and regional frequency analysis at Gangwon Province (강원도에서의 지점빈도분석과 지역빈도분석의 비교)

  • Seo, Dong Il;Kim, Sang Ug;Jeon, Young Il;Han, Jae Wook
    • Proceedings of the Korea Water Resources Association Conference / 2023.05a / pp.205-205 / 2023
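  • Regional frequency analysis and at-site frequency analysis are methods for estimating probability rainfall by return period in river master plans and in the design of hydraulic structures. At-site frequency analysis has difficulty estimating probability rainfall for long return periods because data are scarce. The regional frequency analysis method, in use since 2019, compensates for this weakness. The most important step in regional frequency analysis is confirming the homogeneity of a region, and cluster analysis such as K-means and the L-moment method are used to judge this homogeneity. Because of these differences, the accuracy of the two methods is difficult to compare; this study instead draws on their respective advantages and disadvantages and on the differences between their results to determine an appropriate method for estimating probability rainfall in regions with many mountainous areas, such as Gangwon Province. For the regional frequency analysis, rainfall data were collected from 48 stations in Gangwon Province; elevation, location, and rainfall by duration were specified as variables; and the stations were divided into six clusters by K-means analysis. The heterogeneity measure was determined from the observed data and 500 simulation runs. Where an analyzed cluster was homogeneous, it was then fitted to a probability distribution to estimate probability rainfall. The at-site frequency analysis used data from the stations with the maximum and minimum rainfall in the clusters determined by the regional frequency analysis. This study compared the results of at-site and regional frequency analysis and presents, as its conclusions, the causes and characteristics of the differences between the two methods.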

Determining the number of Clusters in On-Line Document Clustering Algorithm (온라인 문서 군집화에서 군집 수 결정 방법)

  • Jee, Tae-Chang;Lee, Hyun-Jin;Lee, Yill-Byung
    • The KIPS Transactions:PartB / v.14B no.7 / pp.513-522 / 2007
  • Clustering divides given data and automatically uncovers the hidden meaning in the data. It analyzes data that are difficult for people to check in detail and forms clusters of data with similar characteristics. An on-line document clustering system, which groups similar documents from search engine results, aims to make information retrieval more convenient. Document clustering is done automatically without human intervention, so the number of clusters, which affects the clustering result, should also be decided automatically. In addition, one characteristic of an on-line system is that it must guarantee a fast response time. This paper proposes a method of determining the number of clusters automatically from geometrical information. The proposed method consists of two stages. In the first stage, the cluster centers are projected onto a low-dimensional plane; in the second stage, clusters are combined according to the distances between cluster centers in that plane. Experiments with real data showed that clustering performance improved and that the response time was suitable for on-line use.
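
The second stage described above, combining clusters whose centers lie close together in the low-dimensional plane, might look roughly like the sketch below. The abstract does not state the projection method or the merge rule, so this sketch assumes the centers have already been projected to 2-D and merges any pair closer than a hypothetical `threshold`; the function name `merge_close_centers` and the toy coordinates are likewise assumptions.

```python
import math


def merge_close_centers(centers_2d, threshold):
    """Group projected cluster centers that lie within `threshold` of each other.

    Returns one merged-cluster label per input center. Merging is transitive:
    if A is close to B and B is close to C, all three end up together.
    """
    parent = list(range(len(centers_2d)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(centers_2d)):
        for j in range(i + 1, len(centers_2d)):
            if math.dist(centers_2d[i], centers_2d[j]) <= threshold:
                parent[find(j)] = find(i)

    roots = [find(i) for i in range(len(centers_2d))]
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {root: k for k, root in enumerate(dict.fromkeys(roots))}
    return [relabel[root] for root in roots]


# Toy projected centers (assumed): two close pairs and one isolated center.
toy_centers = [(0.0, 0.0), (0.5, 0.0), (5.0, 5.0), (5.2, 5.1), (10.0, 0.0)]
toy_groups = merge_close_centers(toy_centers, threshold=1.0)
toy_cluster_count = len(set(toy_groups))
```

Under these assumptions, the number of distinct labels after merging serves as the automatically determined number of clusters, and the pairwise pass over a small set of centers is cheap enough for an on-line setting.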