• Title/Summary/Keyword: over-clustering


A Study on Classification Evaluation Prediction Model by Cluster for Accuracy Measurement of Unsupervised Learning Data (비지도학습 데이터의 정확성 측정을 위한 클러스터별 분류 평가 예측 모델에 대한 연구)

  • Jung, Se Hoon;Kim, Jong Chan;Kim, Cheeyong;You, Kang Soo;Sim, Chun Bo
    • Journal of Korea Multimedia Society / v.21 no.7 / pp.779-786 / 2018
  • In this paper, we apply a neural network that reflects the overall form of the data by learning from clustered data rather than learning in stages; we then select a neural network model and analyze its variables through per-cluster learning. The CkLR algorithm is proposed to analyze the response variables of clustering outcomes through an approach to the initialization of K-means clustering, and to build a model that assesses the prediction rate of clustering and the accuracy rate of prediction for new data inputs. The performance evaluation results show that the per-class accuracy on the test data was over 92%, which was the mean accuracy rate of the entire test data, thus confirming the advantages of the specialized structure of the proposed per-class learning neural network.

3D Radar Objects Tracking and Reflectivity Profiling

  • Kim, Yong Hyun;Lee, Hansoo;Kim, Sungshin
    • International Journal of Fuzzy Logic and Intelligent Systems / v.12 no.4 / pp.263-269 / 2012
  • The ability to characterize feature objects from radar readings is often limited when only their still-frame reflectivity, differential reflectivity, and differential phase data are examined. In many cases, a time-series study of these objects' reflectivity profiles is required to properly characterize the feature objects of interest. This paper introduces a novel technique to automatically track multiple 3D radar structures in the C- and S-bands in real time using Doppler radar and to profile their characteristic reflectivity distribution as a time series. The extraction of reflectivity profiles from different radar cluster structures is done in three stages: (1) static frame (zone-linkage) clustering, (2) dynamic frame (evolution-linkage) clustering, and (3) characterization of clusters through the time-series profile of their reflectivity distribution. The two clustering schemes proposed here are applied to composite multi-layer CAPPI (Constant Altitude Plan Position Indicator) radar data, which cover an altitude range of 0.25 to 10 km and an area spanning hundreds of thousands of $km^2$. Discrete numerical simulations show the validity of the proposed technique and that fast and accurate profiling of the time-series reflectivity distribution for deformable 3D radar structures is achievable.

The clustering of critical points in the evolving cosmic web

  • Shim, Junsup;Codis, Sandrine;Pichon, Christophe;Pogosyan, Dmitri;Cadiou, Corentin
    • The Bulletin of The Korean Astronomical Society / v.46 no.1 / pp.47.2-47.2 / 2021
  • Focusing on both small separations and baryonic acoustic oscillation scales, the cosmic evolution of the clustering properties of peak, void, wall, and filament-type critical points is measured using two-point correlation functions in ΛCDM dark matter simulations as a function of their relative rarity. A qualitative comparison to the corresponding theory for Gaussian random fields allows us to understand the following observed features: (i) the appearance of an exclusion zone at small separation, whose size depends both on rarity and signature (i.e. the number of negative eigenvalues) of the critical points involved; (ii) the amplification of the baryonic acoustic oscillation bump with rarity and its reversal for cross-correlations involving negatively biased critical points; (iii) the orientation-dependent small-separation divergence of the cross-correlations of peaks and filaments (respectively voids and walls) that reflects the relative loci of such points in the filament's (respectively wall's) eigenframe. The (cross-) correlations involving the most non-linear critical points (peaks, voids) display significant variation with redshift, while those involving less non-linear critical points seem mostly insensitive to redshift evolution, which should prove advantageous to model. The ratios of distances to the maxima of the peak-to-wall and peak-to-void over that of the peak-to-filament cross-correlation are $\sim\sqrt{2}$ and $\sim\sqrt{3}$, respectively, which could be interpreted as the cosmic crystal being on average close to a cubic lattice. The insensitivity to redshift evolution suggests that the absolute and relative clustering of critical points could become a topologically robust alternative to standard clustering techniques when analysing upcoming surveys such as Euclid or Large Synoptic Survey Telescope (LSST).
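
The cubic-lattice reading of the distance ratios can be checked with elementary geometry. The following is only a sketch under an assumption that the abstract does not state: peaks are placed at the vertices of a cube of side $a$, filaments at edge midpoints, walls at face centres, and voids at the cube centre. The symbols $a$, $d_{pf}$, $d_{pw}$ and $d_{pv}$ are introduced here for illustration.

```latex
\begin{align}
  d_{pf} &= \frac{a}{2}, &
  d_{pw} &= \frac{\sqrt{2}}{2}\,a, &
  d_{pv} &= \frac{\sqrt{3}}{2}\,a, \\
  \frac{d_{pw}}{d_{pf}} &= \sqrt{2}, &
  \frac{d_{pv}}{d_{pf}} &= \sqrt{3}.
\end{align}
```

Under that assumed placement, the peak-to-wall and peak-to-void distances relative to the peak-to-filament distance come out as $\sqrt{2}$ and $\sqrt{3}$, which is the pattern the abstract associates with a cubic lattice.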


Similarity Measurement with Interestingness Weight for Improving the Accuracy of Web Transaction Clustering (웹 트랜잭션 클러스터링의 정확성을 높이기 위한 흥미가중치 적용 유사도 비교방법)

  • Kang, Tae-Ho;Min, Young-Soo;Yoo, Jae-Soo
    • The KIPS Transactions:PartD / v.11D no.3 / pp.717-730 / 2004
  • Recently, many studies on the personalization of web sites have been actively conducted. Web personalization predicts the set of the most interesting URLs for each user through data mining approaches such as clustering techniques. Most existing clustering-based methods represent web transactions as bit vectors that indicate whether or not a user visited a certain URL. The similarity of web transactions is then decided according to the degree of match between bit vectors. However, since the existing methods consider only whether users visit a certain URL or not, users' interestingness in the URL is excluded from the clustering of web transactions. That is, web transactions with different visit purposes or inclinations may be classified into the same group. In this paper, we propose an enhanced transaction model with an interestingness weight to solve such problems, and a new similarity measuring method that exploits the proposed transaction model. Performance evaluation shows that our similarity measuring method improves the accuracy of web transaction clustering over the existing method.
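
The contrast between bit-vector matching and weighted matching can be illustrated with a small sketch. It is not the paper's method: the abstract does not give the weighting formula or the similarity measure, so the interest scores below are hypothetical values in [0, 1] and cosine similarity is used as a stand-in.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def to_bits(transaction):
    """Collapse interest weights to visited (1) / not visited (0)."""
    return [1 if w > 0 else 0 for w in transaction]

# Hypothetical transactions over the same four URLs.
# Each value is an assumed interest score; 0 means the URL was not visited.
t1 = [0.9, 0.1, 0.0, 0.8]
t2 = [0.1, 0.9, 0.0, 0.1]

bit_sim = cosine(to_bits(t1), to_bits(t2))  # both users visited URLs 0, 1 and 3
weighted_sim = cosine(t1, t2)               # but their interest is distributed differently

print(f"bit-vector similarity: {bit_sim:.3f}")
print(f"weighted similarity:   {weighted_sim:.3f}")
```

The two users visit the same URLs, so the bit vectors are identical, while the weighted vectors show that their interests differ. This is the kind of case the abstract says bit vectors cannot separate.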

Similarity-based Dynamic Clustering Using Radar Reflectivity Data (퍼지모델을 이용한 유사성 기반의 동적 클러스터링)

  • Lee, Han-Soo;Kim, Su-Dae;Kim, Yong-Hyun;Kim, Sung-Shin
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2011.10a / pp.219-222 / 2011
  • There are a number of methods that track the movement of an object or its change of state, such as the Kalman filter, the particle filter, and dynamic clustering. Among these methods, dynamic clustering is a useful way to track clusters across multiple data frames and analyze their trends. In this paper we propose a similarity-based dynamic clustering method and verify its performance by simulation. The proposed method determines which clusters in consecutive frames are the same, on the premise that the same cluster has similar characteristics across adjacent frames. The pattern of change in each cluster's characteristics from one time frame to the next is thoroughly studied. Clusters in each time frame are matched against one another to assess their similarity, and a Mamdani fuzzy model is used for the similarity-based matching algorithm. The proposed algorithm is applied to radar reflectivity data in the time domain, and we were able to observe the time-dependent characteristics of the clusters.


Analysis of forest types and stand structures over Korean peninsula Using NOAA/AVHRR data

  • Lee, Seung-Ho;Kim, Cheol-Min;Oh, Dong-Ha
    • Proceedings of the KSRS Conference / 1999.11a / pp.386-389 / 1999
  • In this study, the visible and near-infrared channels of NOAA/AVHRR data were used to classify land use and vegetation types over the Korean peninsula. The analysis of forest stand structures and the prediction of forest productivity using satellite data were also reviewed. Land use and land cover classification was performed by unsupervised clustering methods. Monthly Normalized Difference Vegetation Index (NDVI) composite images were derived for April to November 1998 and used as temporal feature vectors in the clustering analysis. On visual interpretation, the classification result was satisfactory overall, as it matched well with the general land cover patterns. However, the subclassification of forests into coniferous, deciduous, and mixed forests was considerably confused, owing to the low ground resolution of AVHRR data and the lack of a defined classification scheme. To investigate the forest stand structures, digital forest type maps were used as ancillary data. The forest type maps, which were compiled and digitized by the Forestry Research Institute, were registered to AVHRR image coordinates. The two data sets were compared, and the percent forest cover over the whole region was estimated by multiple regression analysis. Using this method, other forest stand structure characteristics within the primary data pixels are expected to be extracted and estimated.
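
As a rough illustration of the temporal feature vector described in this abstract, the sketch below computes NDVI from near-infrared and visible (red) reflectance using the standard definition NDVI = (NIR - VIS) / (NIR + VIS) and stacks eight monthly values for one pixel. The reflectance numbers are invented for illustration, and the study's compositing procedure is not reproduced.

```python
def ndvi(nir, vis):
    """Normalized Difference Vegetation Index for one pixel."""
    denom = nir + vis
    return (nir - vis) / denom if denom else 0.0

def temporal_feature_vector(monthly_bands):
    """Turn a sequence of (nir, vis) pairs, one per month, into an NDVI vector."""
    return [ndvi(nir, vis) for nir, vis in monthly_bands]

# Hypothetical (nir, vis) reflectances for one pixel, April to November.
pixel_by_month = [
    (0.30, 0.12), (0.38, 0.10), (0.45, 0.08), (0.50, 0.07),
    (0.50, 0.07), (0.44, 0.09), (0.35, 0.12), (0.28, 0.14),
]

features = temporal_feature_vector(pixel_by_month)
print([round(v, 3) for v in features])
```

Each pixel then becomes one eight-dimensional point that an unsupervised clustering method can group with pixels sharing a similar seasonal pattern.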


An Adjustment for a Regional Incongruity in Global land Cover Map: case of Korea

  • Park, Youn-Young;Han, Kyung-Soo;Yeom, Jong-Min;Suh, Yong-Cheol
    • Korean Journal of Remote Sensing / v.22 no.3 / pp.199-209 / 2006
  • The Global Land Cover 2000 (GLC 2000) project, a recent initiative, aims to provide a harmonized land cover database for the year 2000 over the whole globe. The classifications were performed at continental or regional scales by the corresponding organizations using data from the VEGETATION sensor onboard the SPOT4 satellite. Although the global land cover classification for Asia provided by Chiba University showed good accuracy over Asia as a whole, some problems were detected in the Korean region. Therefore, the construction of a new land cover database over Korea using a more recent data set is strongly required. The present study focuses on the development of a new, upgraded land cover map at 1 km resolution over Korea, using the widely used K-means clustering, an unsupervised classification technique that uses a distance function for land surface pattern classification, together with the principal components transformation. It is based on data sets from the Earth-observing system SPOT4/VEGETATION. The newly classified land cover was compared with GLC 2000 for the Korean peninsula using a confusion matrix to assess how well the classification performed.
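
The map comparison mentioned at the end of this abstract can be sketched as a pixel-by-pixel confusion matrix. The class names and pixel labels below are made up for illustration, and placing GLC 2000 on the row axis is an assumption rather than something the abstract specifies.

```python
from collections import Counter

def confusion_matrix(rows_map, cols_map, labels):
    """Count co-occurring labels; rows follow rows_map, columns follow cols_map."""
    counts = Counter(zip(rows_map, cols_map))
    return [[counts[(r, c)] for c in labels] for r in labels]

def overall_agreement(matrix):
    """Share of pixels on the diagonal, i.e. labelled identically in both maps."""
    total = sum(sum(row) for row in matrix)
    diagonal = sum(matrix[i][i] for i in range(len(matrix)))
    return diagonal / total if total else 0.0

# Hypothetical class labels for six co-registered 1 km pixels.
labels = ["forest", "cropland", "urban"]
glc2000 = ["forest", "forest", "cropland", "urban", "cropland", "forest"]
new_map = ["forest", "cropland", "cropland", "urban", "cropland", "forest"]

cm = confusion_matrix(glc2000, new_map, labels)
for label, row in zip(labels, cm):
    print(f"{label:>9}: {row}")
print(f"overall agreement: {overall_agreement(cm):.3f}")
```

Off-diagonal cells show which classes the two maps disagree on, which is more informative for a regional adjustment than the single agreement figure.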

User Oriented clustering of news articles using Tweets Heterogeneous Information Network (트위트 이형 정보 망을 이용한 뉴스 기사의 사용자 지향적 클러스터링)

  • Shoaib, Muhammad;Song, Wang-Cheol
    • Journal of Internet Computing and Services / v.14 no.6 / pp.85-94 / 2013
  • With the emergence of the World Wide Web, in particular Web 2.0, the rapidly growing number of news articles has made it difficult for users to select news articles according to their requirements. To overcome this problem, different clustering mechanisms have been proposed to broadly categorize news articles. However, these techniques are entirely machine-oriented and lack users' participation in deciding cluster membership. To overcome this zero-participation issue, in this paper we propose a framework for clustering news articles that combines the judgments users post on Twitter with the news articles themselves. We employ Twitter hashtags for this purpose. Furthermore, we compute the credibility of users based on the frequency of retweets of their tweets in order to enhance the accuracy of the clustering membership function. To test the performance of the proposed methodology, we performed experiments on tweets posted during the 2013 general election in Pakistan. Our results confirmed our claim that using users' output achieves better outcomes than ordinary clustering algorithms.

Reorganizing Social Issues from R&D Perspective Using Social Network Analysis

  • Shun Wong, William Xiu;Kim, Namgyu
    • Journal of Information Technology Applications and Management / v.22 no.3 / pp.83-103 / 2015
  • The rapid development of internet technologies and social media over the last few years has generated a huge amount of unstructured text data, which contains a great deal of valuable information and issues. Therefore, text mining, the extraction of meaningful information from unstructured text data, has gained attention from many researchers in various fields. Topic analysis is a text mining application that is used to determine the main issues in a large volume of text documents. However, it is difficult to identify related issues or meaningful insights because the number of issues derived through topic analysis is too large. Furthermore, traditional issue-clustering methods can only be performed based on the co-occurrence frequency of issue keywords across many documents. As a result, an association between issues that have a low co-occurrence frequency cannot be recognized using traditional issue-clustering methods, even if those issues are strongly related from other perspectives. In this research, we therefore propose a methodology to reorganize social issues from a research and development (R&D) perspective using social network analysis. Using an R&D perspective lexicon, issues that consistently share the same R&D keywords can be further identified through social network analysis. In this study, the R&D keywords associated with a particular issue represent the key technology elements needed to solve that issue. Issue clustering can then be performed based on the analysis results. Furthermore, the relationships between issues that share the same R&D keywords can be reorganized more systematically by grouping them into clusters according to the R&D perspective lexicon. We expect that our methodology will contribute to establishing efficient R&D investment policies at the national level by enhancing the reusability of R&D knowledge, based on issue clustering using the R&D perspective lexicon. In addition, companies could use the results to align their R&D with their business strategy plans, helping them develop innovative products and new technologies that sustain innovative business models.

A Study on the Comparison of Cluster Analysis Programs Using Artificial Data (인위적 데이터를 이용한 군집분석 프로그램간의 비교에 대한 연구)

  • 김성호;백승익
    • Journal of Intelligence and Information Systems / v.7 no.2 / pp.35-49 / 2001
  • Over the years, cluster analysis has become a popular tool for marketing and segmentation researchers. There are various methods for cluster analysis; among them, K-means partitioning is the most popular segmentation method. However, because cluster analysis is very sensitive to the initial configuration of the data set at hand, it is important to select an appropriate starting configuration that is comparable with the clustering of the whole data, so as to improve the reliability of the clustering results. Many programs for K-means cluster analysis employ various methods to choose the initial seeds and compute the centroids of clusters. In this paper, we suggest a methodology to evaluate various clustering programs. Furthermore, to explore the usability of the methodology, we use it to evaluate four clustering programs.
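
The sensitivity to initial seeds noted in this abstract is easy to reproduce on a toy data set. The sketch below is a plain Lloyd-style K-means written for illustration; it is not one of the four programs the paper evaluates, and the four points and both seed choices are invented.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, seeds, max_iter=100):
    """Run Lloyd's algorithm from the given seeds; return (centroids, SSE)."""
    centroids = [tuple(s) for s in seeds]
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sq_dist(p, c) for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    sse = sum(sq_dist(p, c) for cl, c in zip(clusters, centroids) for p in cl)
    return centroids, sse

# Four corners of a 4-by-1 rectangle: the natural split is left versus right.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]

good_centroids, good_sse = kmeans(points, seeds=[(0, 0), (4, 0)])
bad_centroids, bad_sse = kmeans(points, seeds=[(2, 0), (2, 1)])

print("left/right seeds:", good_centroids, "SSE =", good_sse)
print("top/bottom seeds:", bad_centroids, "SSE =", bad_sse)
```

Both runs converge, but the second one stays in a top/bottom split with a much larger within-cluster sum of squares. That is why the seed-selection strategy of a clustering program matters when programs are compared.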
