• Title/Summary/Keyword: Supervised clustering


Microblog Sentiment Analysis Method Based on Spectral Clustering

  • Dong, Shi;Zhang, Xingang;Li, Ya
    • Journal of Information Processing Systems
    • /
    • v.14 no.3
    • /
    • pp.727-739
    • /
    • 2018
  • This study evaluates viewpoints on incidents that attract user attention using microblog sentiment analysis, which has been actively researched in academia. Most existing works have adopted traditional supervised machine learning methods to analyze emotions in microblogs; however, these approaches may not be suitable for Chinese due to linguistic differences. This paper proposes a new microblog sentiment analysis method that mines the emotions of microblogs associated with a popular microblog through user-building, combined with spectral clustering to analyze microblog content. Experimental results on a public microblog benchmark corpus show that the proposed method improves identification accuracy and reduces the time spent on manual labeling compared to existing methods.

Unsupervised Speaker Adaptation Based on Sufficient HMM Statistics (SUFFICIENT HMM 통계치에 기반한 UNSUPERVISED 화자 적응)

  • Ko Bong-Ok;Kim Chong-Kyo
    • Proceedings of the KSPS conference
    • /
    • 2003.05a
    • /
    • pp.127-130
    • /
    • 2003
  • This paper describes an efficient method for unsupervised speaker adaptation. The method selects a subset of speakers who are acoustically close to a test speaker and calculates adapted model parameters from the previously stored sufficient HMM statistics of the selected speakers' data. Only a small amount of unlabeled data from the test speaker is required for the adaptation. Using the sufficient HMM statistics of the selected speakers' data also makes the adaptation fast. Compared with a pre-clustering method, the proposed method can obtain a better-suited speaker cluster because the clustering result is determined on-line from the test speaker's data. Experimental results show that the proposed method improves on the speaker-independent model more than MLLR does. Moreover, the proposed method uses only one unsupervised sentence utterance, while MLLR usually uses more than ten supervised sentence utterances.


Improving Clustering Performance Using Gene Ontology (유전자 온톨로지를 활용한 클러스터링 성능 향상 기법)

  • Ko, Song;Kang, Bo-Yeong;Kim, Dae-Won
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.19 no.6
    • /
    • pp.802-808
    • /
    • 2009
  • Many recent studies have sought to improve the clustering performance of gene expression data by incorporating Gene Ontology (GO) into the clustering process. In particular, Kustra et al. showed that exploiting the Biological Process Ontology yields a larger performance improvement than typical expression-based clustering. This paper extends the work of Kustra et al. with extensive experiments on how GO structures are incorporated. To this end, we used three ontological distance measures (Lin's, Resnik's, Jiang's) and three GO structures (BP, CC, MF) on yeast expression data. Across all test cases, we found that clustering performance improved remarkably when GO was incorporated; in particular, Resnik's distance measure based on the Biological Process Ontology performed best.
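As a rough illustration of the Resnik measure named in this abstract, the sketch below computes Resnik similarity as the information content, IC(t) = -log p(t), of the most informative common ancestor of two terms. The ontology, the term names, and the annotation counts are hypothetical toy values, not real GO data, and how the paper converts this similarity into a clustering distance is not stated in the abstract.

```python
import math

# Hypothetical toy ontology (child -> parents); these are not real GO terms.
PARENTS = {
    "root": [],
    "metabolism": ["root"],
    "transport": ["root"],
    "glycolysis": ["metabolism"],
    "lipid_metabolism": ["metabolism"],
}

# Hypothetical number of genes annotated directly to each term.
DIRECT_COUNTS = {
    "root": 0,
    "metabolism": 2,
    "transport": 4,
    "glycolysis": 3,
    "lipid_metabolism": 1,
}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC(t) = -log p(t); p(t) counts annotations to t or any descendant."""
    total = sum(DIRECT_COUNTS.values())
    cumulative = sum(n for t, n in DIRECT_COUNTS.items() if term in ancestors(t))
    return -math.log(cumulative / total)


def resnik_similarity(a, b):
    """Information content of the most informative common ancestor."""
    common = ancestors(a) & ancestors(b)
    return max(information_content(t) for t in common)
```

Two terms that share only the root get a similarity of zero, because the root covers every annotation and so carries no information.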

Accessing the Clustering of TNM Stages on Survival Analysis of Lung Cancer Patient (폐암환자 생존분석에 대한 TNM 병기 군집분석 평가)

  • Choi, Chulwoong;Kim, Kyungbaek
    • Smart Media Journal
    • /
    • v.9 no.4
    • /
    • pp.126-133
    • /
    • 2020
  • The treatment policy and prognosis for lung cancer patients are determined by their final stage, which is assigned using the T, N, and M stage classification table provided by the American Joint Committee on Cancer (AJCC). However, the AJCC final stage has limitations when used for purposes such as patient treatment, prognosis, and prediction of survival days. In this paper, a clustering algorithm, a type of unsupervised learning algorithm, was assessed to check whether using only the T, N, and M stages with a data science method is effective for grouping patients in terms of survival days. The final stage groups and the T, N, M stage clustering groups of lung cancer patients were compared using the Cox proportional hazards model. The results confirm that survival days are predicted more accurately from the T, N, and M stages alone than from the patients' final stages. In particular, the prediction accuracy with clustering of the T, N, and M stages improves when the number of clusters is larger or smaller than seven, the number of AJCC final stages.

The classified method for overlapping data

  • Kruatrachue, Boontee;Warunsin, Kulwarun;Siriboon, Kritawan
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2004.08a
    • /
    • pp.2037-2040
    • /
    • 2004
  • In this paper we introduce a new prototype-based classifier for overlapping data, where training patterns can overlap in the feature space. The proposed classifier is based on the prototype from the neural network classifier (NNC) [1] for overlapping data. The method automatically chooses an initial center and two radii for each class. The center serves as the mean representative of the training data for each class. An unclassified pattern is classified by measuring its distance from the class center. If the distance is within the lower (shorter) radius, the unknown pattern has a high probability of belonging to this class. If the distance is between the lower and upper (longer) radii, the pattern may belong to this class or to others. If the distance is beyond the upper radius, the pattern is not in this class. We borrow the terms upper and lower from rough set theory to represent the region of certainty [3]. A training algorithm that finds the number of clusters and their parameters (center, lower, upper) is presented. The clustering result is tested using patterns from Thai handwritten letters and is very similar to clustering done by the human eye.

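The decision rule described in this abstract (one center and two radii per class) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the Euclidean distance, the `PROTOTYPES` values, and the function name `classify` are all hypothetical, and the training algorithm that finds the centers and radii is not shown.

```python
import math

# Hypothetical prototypes: label -> (center, lower radius, upper radius).
PROTOTYPES = {
    "A": ((0.0, 0.0), 1.0, 2.0),
    "B": ((3.0, 0.0), 1.0, 2.0),
}


def classify(pattern, prototypes=PROTOTYPES):
    """Return (likely, possible) lists of class labels for one pattern."""
    likely, possible = [], []
    for label, (center, lower, upper) in prototypes.items():
        distance = math.dist(pattern, center)  # Euclidean distance (assumed)
        if distance <= lower:
            likely.append(label)      # inside the lower radius
        elif distance <= upper:
            possible.append(label)    # between the lower and upper radii
        # beyond the upper radius: the pattern is not in this class
    return likely, possible
```

A pattern lying between both centers, such as `(1.5, 0.0)`, falls in the overlap region and is reported as possibly belonging to either class, which is the case the two-radius scheme is meant to handle.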

An Ensemble Clustering Algorithm based on a Prior Knowledge (사전정보를 활용한 앙상블 클러스터링 알고리즘)

  • Ko, Song;Kim, Dae-Won
    • Journal of KIISE:Software and Applications
    • /
    • v.36 no.2
    • /
    • pp.109-121
    • /
    • 2009
  • Although prior knowledge can improve clustering performance, the benefit depends on how it is used. In particular, when prior knowledge is employed to construct the initial centroids of cluster groups, the similarities among the objects in the prior knowledge should be taken into account. Even when some objects in the prior knowledge share the same label, objects with low similarity to each other should be separated. Separating them prevents the initial centroids from being distorted by collisions between objects with low similarity. The separated prior knowledge can then be used in various ways, such as various initializations. To apply association rules, the proposed method creates a sufficient number of cluster groups, and the centroids of the initial groups are then constructed from the separated prior knowledge. The ensemble of the resulting clusterings outperforms results obtained without separating the prior knowledge.

An Ensemble Model for Machine Failure Prediction (앙상블 모델 기반의 기계 고장 예측 방법)

  • Cheon, Kang Min;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.43 no.1
    • /
    • pp.123-131
    • /
    • 2020
  • Many studies have addressed methods for predicting machine failure, and recently much research and many applications have emerged that diagnose the physical condition of machines and parts and estimate remaining life through various methods. Survival models are also used to predict plant failures based on past anomaly cycles. In particular, special machines that reflect the fluid flow and process characteristics of chemical plants are connected to hundreds or thousands of sensors, so many factors need to be considered, such as process and material data as well as the application of derived variables. In this paper, the data were preprocessed through time series anomaly detection based on unsupervised learning to predict abnormalities of these special machines. Next, clustering results reflecting the characteristics of the data were used to produce additional variables, and a training data set was created from the history of past facility abnormalities. Finally, a prediction methodology based on a supervised learning algorithm was applied, and updating the model was confirmed to improve the accuracy of facility failure prediction. This approach is expected to improve the efficiency of facility operation by allowing maintenance timing and parts supply and demand to be adjusted flexibly, through predicting machine abnormalities and extracting key factors.

A New Supervised Competitive Learning Algorithm and Its Application to Power System Transient Stability Analysis (새로운 지도 경쟁 학습 알고리즘의 개발과 전력계통 과도안정도 해석에의 적용)

  • Park, Young-Moon;Cho, Hong-Shik;Kim, Gwang-Won
    • Proceedings of the KIEE Conference
    • /
    • 1995.07b
    • /
    • pp.591-593
    • /
    • 1995
  • An artificial neural network based pattern recognition method is one of the most promising candidates for on-line power system transient stability analysis. In particular, the Kohonen layer is a suitable neural network for this purpose. The nodes of the Kohonen layer compete on the basis of which of them has its clustering center closest to an input vector. This paper discusses Kohonen's LVQ (Learning Vector Quantization) and points out a defect of the algorithm when it is applied to transient stability analysis. Only the clustering centers located near the decision boundary of the stability region are needed for the stability criterion; the centers far from the decision boundary are redundant. This paper presents a new algorithm, called boundary searching algorithm II, which assigns only the points near the boundary in the input space to nodes of the Kohonen layer as their clustering centers. The algorithm is demonstrated satisfactorily on a 4-generator, 6-bus sample power system.

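For context on the competitive learning this abstract builds on, here is a minimal sketch of one update step of Kohonen's basic LVQ1 rule: the codebook vector nearest to the input wins, and it moves toward the input if their labels match and away from it otherwise. This is the standard baseline only, not the paper's boundary searching algorithm II; the function name, the learning rate, and the sample labels are assumptions made for illustration.

```python
import math


def lvq1_step(codebook, x, label, lr=0.1):
    """Apply one LVQ1 update in place and return the winning node's index.

    codebook is a list of (weight_vector, class_label) pairs.
    """
    # Competition: the node whose center is closest to the input wins.
    winner = min(range(len(codebook)), key=lambda i: math.dist(codebook[i][0], x))
    weights, node_label = codebook[winner]
    # Attract the winner if the labels agree, repel it otherwise.
    sign = 1.0 if node_label == label else -1.0
    updated = [w + sign * lr * (xi - w) for w, xi in zip(weights, x)]
    codebook[winner] = (updated, node_label)
    return winner
```

Because only the winning node is updated, centers far from the decision boundary keep absorbing training effort, which is consistent with the redundancy the abstract points out.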

Generation of Efficient Fuzzy Classification Rules for Intrusion Detection (침입 탐지를 위한 효율적인 퍼지 분류 규칙 생성)

  • Kim, Sung-Eun;Khil, A-Ra;Kim, Myung-Won
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.6
    • /
    • pp.519-529
    • /
    • 2007
  • In this paper, we investigate the use of fuzzy rules for efficient intrusion detection. We use an evolutionary algorithm to optimize the set of fuzzy rules for intrusion detection by constructing fuzzy decision trees. To run the evolutionary algorithm efficiently, we use supervised clustering to generate an initial set of membership functions for the fuzzy rules. In our method, both the performance and the complexity of the fuzzy rules (or fuzzy decision trees) are taken into account in fitness evaluation. We also use evaluation with data partitioning, membership degree caching, and zero-pruning to reduce the time needed to construct and evaluate fuzzy decision trees. For performance evaluation, we tested our method on the intrusion detection data of KDD'99 Cup and confirmed that it outperformed the existing methods. Compared with the KDD'99 Cup winner, the accuracy increased by 1.54% while the cost was reduced by 20.8%.

Improved Algorithm of Hybrid c-Means Clustering for Supervised Classification of Remote Sensing Images (원격탐사 영상의 감독분류를 위한 개선된 하이브리드 c-Means 군집화 알고리즘)

  • Jeon, Young-Joon;Kim, Jin-Il
    • Journal of the Institute of Convergence Signal Processing
    • /
    • v.8 no.3
    • /
    • pp.185-191
    • /
    • 2007
  • Remote sensing images are multispectral image data collected from several bands divided by wavelength range. Classification of remote sensing images, an important task in this field, groups together the pixels of an image that have similar spectral characteristics. This paper presents a pattern classification method for remote sensing images that applies a possibilistic fuzzy c-means (PFCM) algorithm. The PFCM algorithm is a hybrid of the FCM algorithm, which assigns membership degrees depending on the distance between data and the center of a given cluster, and the PCM algorithm, which considers the class typicality of the pattern sets. In the proposed method, we select training data for each class and perform supervised classification using the PFCM algorithm with the spectral signatures of the training data. The application of the PFCM algorithm is tested and verified using Landsat TM and IKONOS remote sensing satellite images. The overall accuracy was better than that of the FCM algorithm, the PCM algorithm, or the conventional maximum likelihood classification (MLC) algorithm.

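The distance-based membership degree that this abstract attributes to FCM can be sketched with the standard fuzzy c-means membership formula, u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)). The sketch covers only that FCM component; the PCM typicality term and the full PFCM update used in the paper are not shown, and the function name, the fuzzifier default `m=2.0`, and the Euclidean distance are assumptions.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Return the standard FCM membership of one point in each cluster."""
    dists = [math.dist(point, center) for center in centers]
    # A point that coincides with a center belongs fully to that cluster.
    zeros = [i for i, d in enumerate(dists) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in dists) for d_i in dists]
```

In a supervised setting like the one described, the centers could be taken from the mean spectral signature of each class's training pixels; that choice is an assumption here rather than something the abstract specifies.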