• Title/Summary/Keyword: Automatic Clustering

Automatic word sense clustering using collocation for practical sense boundaries (의미 경계의 현실화를 위한 공기정보의 자동 군집화)

  • 신사임;최기선
    • Proceedings of the Korean Information Science Society Conference / 2004.04b / pp.559-561 / 2004
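  • This paper addresses how to determine realistic sense distributions for polysemous words. Manually constructed sense inventories such as dictionaries and thesauri have vague and often unrealistic sense boundaries, which has been pointed out as a problem when they are applied to language processing systems. We therefore propose a method that discovers practical sense boundaries of polysemous words using collocation information extracted from a large corpus together with automatic clustering methods. Comparing the results with the sense distributions of polysemous words in a manually built dictionary and in a corpus-based dictionary, we confirmed that the proposed method produces results very similar to the sense distribution of the corpus-based dictionary.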

A Method of Descriptor Extraction for Automatic Document Clustering (자동 문서 클러스터링을 위한 디스크립터 추출 방안)

  • Yun, Bo-Hyun;Kang, Hyun-Kyu;Ko, Hyung-Dae
    • Annual Conference of KIPS / 2000.04a / pp.230-233 / 2000
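  • Conventional search engines list search results in order of relevance, which makes it difficult for users to find the documents they want. As a solution, automatic clustering is performed on the retrieved documents so that documents with similar content fall into the same cluster. This paper proposes a descriptor extraction method needed for clustering search result documents. To extract descriptors within each cluster, we use the term weighting methods employed in the indexing process of information retrieval.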
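
The abstract does not say which term weighting formula is used, so the following is only a minimal sketch of the general idea, assuming a TF-IDF-style weight computed across clusters. The function name `extract_descriptors`, the `top_k` parameter, and the toy data are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

def extract_descriptors(clusters, top_k=3):
    """Return the top_k highest-weighted terms for each cluster.

    clusters: list of clusters, each a list of tokenized documents.
    Weight = (term frequency in the cluster) * log(N / number of clusters
    containing the term). This TF-IDF variant is an assumption made for
    illustration, not the paper's formula.
    """
    # Term frequency per cluster, pooling all documents in the cluster.
    cluster_tf = [Counter(tok for doc in cluster for tok in doc) for cluster in clusters]
    n = len(clusters)
    # Number of clusters in which each term appears.
    cluster_freq = Counter(term for tf in cluster_tf for term in tf)
    descriptors = []
    for tf in cluster_tf:
        weights = {t: c * math.log(n / cluster_freq[t]) for t, c in tf.items()}
        # Sort by descending weight, then alphabetically so ties are deterministic.
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        descriptors.append(ranked[:top_k])
    return descriptors

# Toy search results for the query "apple", already grouped into two clusters.
clusters = [
    [["apple", "fruit", "juice"], ["apple", "pie", "fruit"]],
    [["apple", "iphone", "mac"], ["mac", "laptop", "apple"]],
]
print(extract_descriptors(clusters, top_k=2))
```

In this sketch a term that occurs in every cluster (here "apple") gets weight zero, so the descriptors are the terms that distinguish one cluster from the others.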

Evolutionary Nonlinear Regression Based Compensation Technique for Short-range Prediction of Wind Speed using Automatic Weather Station (AWS 지점별 기상데이타를 이용한 진화적 회귀분석 기반의 단기 풍속 예보 보정 기법)

  • Hyeon, Byeongyong;Lee, Yonghee;Seo, Kisung
    • The Transactions of The Korean Institute of Electrical Engineers / v.64 no.1 / pp.107-112 / 2015
  • This paper introduces an evolutionary nonlinear regression based compensation technique for short-range wind speed prediction using AWS (Automatic Weather Station) data. An efficient MOS (Model Output Statistics) is needed to correct the systematic errors of the forecast model, but a linear regression based MOS has difficulty handling the irregular nature of weather prediction. To address this problem, a nonlinear symbolic regression method using GP (Genetic Programming) is proposed for developing MOS wind forecast guidance. FCM (Fuzzy C-Means) clustering is also adopted to mitigate bias in the wind speed data. The purpose of this study is to evaluate the estimation accuracy of a GP based nonlinear MOS for 3-day wind speed prediction in South Korean regions. The method is compared with the UM model and shows superior results. Data from 2007-2009 and 2011 are used for training, and data from 2012 for testing.
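
For readers unfamiliar with FCM, the sketch below shows the standard Fuzzy C-Means update rules on one-dimensional data. How the paper applies FCM to the wind speed data is not described in the abstract, so the function name `fuzzy_c_means_1d`, the fuzzifier `m=2.0`, the initial centers, and the sample values are all assumptions made for illustration.

```python
def fuzzy_c_means_1d(values, centers, m=2.0, iterations=50):
    """Standard Fuzzy C-Means on 1-D data.

    Returns (centers, memberships), where memberships[i][j] is the degree
    to which values[i] belongs to cluster j, as computed in the last iteration.
    """
    centers = list(centers)
    memberships = []
    power = 2.0 / (m - 1.0)
    for _ in range(iterations):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2 / (m - 1)).
        memberships = []
        for x in values:
            dists = [abs(x - c) for c in centers]
            if min(dists) == 0.0:
                # A point lying exactly on a center belongs fully to it
                # (shared equally if several centers coincide there).
                row = [1.0 if d == 0.0 else 0.0 for d in dists]
                total = sum(row)
                row = [r / total for r in row]
            else:
                row = [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
            memberships.append(row)
        # Center update: mean of the data weighted by u_ij^m.
        # Assumes every cluster receives some nonzero membership.
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            new_centers.append(sum(w * x for w, x in zip(weights, values)) / sum(weights))
        centers = new_centers
    return centers, memberships

# Hypothetical wind speeds (m/s) forming a calm group and a windy group.
speeds = [1.0, 1.2, 0.8, 9.0, 9.5, 8.5]
centers, memberships = fuzzy_c_means_1d(speeds, centers=[0.0, 10.0])
print(centers)
```

Unlike hard clustering, each observation keeps a graded membership in every cluster, and the memberships of one observation sum to 1.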

A Text Summarization Model Based on Sentence Clustering (문장 클러스터링에 기반한 자동요약 모형)

  • 정영미;최상희
    • Journal of the Korean Society for Information Management / v.18 no.3 / pp.159-178 / 2001
  • This paper presents an automatic text summarization model that selects representative sentences from sentence clusters to create a summary. Summary generation experiments were performed on two sets of test documents after learning the optimum environment from a training set. The centroid clustering method turned out to be the most effective for clustering sentences, and sentence weight was found to be more effective than the similarity value between sentence and cluster centroid vectors for selecting a representative sentence from each cluster. The experimental results also show that inverse sentence weight and title word weight for terms, together with location weight for sentences, are effective in improving summarization performance.
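
The selection step described above (one representative sentence per cluster, chosen by sentence weight) can be sketched as follows. The weighting used here, a sum of document-level term frequencies plus a bonus for title words, is a simplified stand-in and not the paper's formula; `summarize`, `title_terms`, and `title_bonus` are hypothetical names, and the clusters are assumed to be given.

```python
from collections import Counter

def summarize(sentences, clusters, title_terms=frozenset(), title_bonus=1.0):
    """Pick the highest-weighted sentence from each cluster, in document order.

    sentences: list of token lists.
    clusters: list of lists of sentence indices (clustering is assumed done).
    """
    # Document-level term frequencies, used as a crude term weight.
    term_freq = Counter(tok for sent in sentences for tok in sent)

    def weight(idx):
        # Sum the weights of the distinct terms, plus a bonus per title term.
        terms = set(sentences[idx])
        return sum(term_freq[t] for t in terms) + title_bonus * len(terms & title_terms)

    # max() keeps the earliest sentence of a cluster when weights tie.
    chosen = [max(cluster, key=weight) for cluster in clusters]
    return sorted(chosen)

sentences = [
    ["clustering", "groups", "sentences"],
    ["clustering", "sentences", "summary"],
    ["weather", "is", "nice"],
    ["weather", "today"],
]
clusters = [[0, 1], [2, 3]]
print(summarize(sentences, clusters, title_terms={"summary"}))
```

Returning the indices in document order keeps the summary readable even when the clusters themselves are unordered.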

A Design of an Improved Linguistic Model based on Information Granules (정보 입자에 근거한 개선된 언어적인 모델의 설계)

  • Han, Yun-Hee;Kwak, Keun-Chang
    • Journal of the Institute of Electronics Engineers of Korea CI / v.47 no.3 / pp.76-82 / 2010
  • In this paper, we develop a Linguistic Model (LM) based on information granules as a systematic approach to generating fuzzy if-then rules from given input-output data. The LM introduced by Pedrycz is built through fuzzy information granulation obtained from Context-based Fuzzy Clustering (CFC). This clustering estimates clusters by preserving the homogeneity of the clustered patterns associated with the input and output data. Although the effectiveness of the LM has been demonstrated in previous works, its performance still needs improvement. Therefore, we focus on the automatic generation of linguistic contexts, the addition of a bias term, and a transformed form of the consequent parameter to improve both the approximation and generalization capability of the conventional LM. The experimental results revealed that the improved LM performed better than the LM and conventional methods on automobile MPG (miles per gallon) prediction and Boston housing data.

Extraction of Basic Insect Footprint Segments Using ART2 of Automatic Threshold Setting (자동 임계값 설정 ART2를 이용한 곤충 발자국의 인식 대상 영역 추출)

  • Shin, Bok-Suk;Cha, Eui-Young;Woo, Young-Woon
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.8 / pp.1604-1611 / 2007
  • In insect footprint recognition, basic footprint segments must be extracted from the whole footprint image in order to find appropriate features for classification. In this paper, we use a clustering method as a preprocessing stage for extracting basic insect footprint segments. In general, the sizes and strides of footprints differ according to the type and size of the insect to be recognized. We therefore propose an improved ART2 algorithm that extracts basic insect footprint segments regardless of the size and stride of the footprint pattern. In the proposed ART2 algorithm, the threshold value for clustering is determined automatically from the contour shape of the graph created by accumulating the distances between all the spots of the footprint pattern. In experiments applying the proposed method to two kinds of insect footprint patterns, all clustering results were correct.

Automatic e-mail Hierarchy Classification using Dynamic Category Hierarchy and Principal Component Analysis (PCA와 동적 분류체계를 사용한 자동 이메일 계층 분류)

  • Park, Sun
    • Journal of Advanced Navigation Technology / v.13 no.3 / pp.419-425 / 2009
  • The volume of incoming e-mail is increasing rapidly due to the widespread use of the Internet, so classifying incoming e-mails efficiently and accurately is increasingly necessary. Current e-mail classification techniques focus on two-way classification that filters spam from normal mail, based mainly on Bayesian and rule-based methods. Clustering methods have been used for multi-way classification of e-mails, but they suffer from low classification accuracy and provide no category labels. Classification methods, in turn, require the user to train the classifier and set the category labels. In this paper, we propose a novel multi-way e-mail hierarchy classification method that uses PCA for automatic category generation and a dynamic category hierarchy for high classification accuracy. It classifies a large volume of incoming e-mails automatically, efficiently, and accurately.

A Statistical Approach for Extracting and Naming Relation between Concepts (개념간 관계의 추출과 명명을 위한 통계적 접근방법)

  • Kim Hee-soo;Choi Ikkyu;Kim Minkoo
    • The KIPS Transactions:PartB / v.12B no.4 s.100 / pp.479-486 / 2005
  • Ontology was proposed as the logical basis of the semantic web. An ontology represents domain knowledge in a formal form, which enables machines to understand that knowledge and provide appropriate intelligent services in response to user requests. However, constructing and maintaining an ontology requires considerable cost and human effort. This paper proposes an automatic ontology construction method for defining relations between concepts in documents. The proposed method works in three steps. First, we find concept pairs that form association rules, based on the concepts in domain-specific documents. Next, we find patterns that describe the relation between concepts by clustering the contexts between the two concepts of each association rule. Last, we find generalized pattern names by clustering the clustered patterns. To verify the proposed method, we extract relations between concepts and evaluate the results using a document set provided by TREC (Text Retrieval Conference). The results show that the proposed method can provide useful information that describes the relation between concepts.
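
The first step above (finding concept pairs that form association rules) can be sketched with the usual support and confidence measures. The abstract gives no thresholds or data format, so the function name `concept_pair_rules`, the threshold values, and the toy concept sets are assumptions for illustration only.

```python
from collections import Counter
from itertools import combinations

def concept_pair_rules(docs, min_support=0.5, min_confidence=0.6):
    """Return sorted (a, b, support, confidence) tuples for rules a -> b.

    docs: iterable of concept collections, one per document.
    support(a, b) = fraction of documents containing both concepts.
    confidence(a -> b) = documents containing both / documents containing a.
    """
    docs = [set(doc) for doc in docs]
    n = len(docs)
    single = Counter(c for doc in docs for c in doc)
    pair = Counter(p for doc in docs for p in combinations(sorted(doc), 2))
    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # Check the rule in both directions, since confidence is asymmetric.
        for x, y in ((a, b), (b, a)):
            confidence = count / single[x]
            if confidence >= min_confidence:
                rules.append((x, y, support, confidence))
    return sorted(rules)

docs = [
    {"virus", "infection"},
    {"virus", "infection", "vaccine"},
    {"virus", "vaccine"},
    {"infection"},
]
for rule in concept_pair_rules(docs):
    print(rule)
```

The pairs that survive both thresholds would then feed the later steps, in which the text between the two concepts is clustered to find and name relation patterns.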

A Study on the Applicability of Safety Performance Indicators using the Density-Based Ship Domain (밀도기반 선박 도메인을 이용한 안전 성능 지표 활용성 연구)

  • Yeong-Jae Han;Sunghyun Sim;Hyerim Bae
    • The Journal of Bigdata / v.7 no.1 / pp.89-97 / 2022
  • Ship collisions can cause serious harm such as economic losses and casualties, so a range of efforts is needed to prevent them. Research on accident prevention is being actively conducted, and this study proposes new leading indicators for preventing ship collision accidents. In previous studies, collision risk was expressed in terms of the distance between ships in a specific sea area, which has the disadvantage that a new model must be developed before the approach can be applied to other sea areas. In this study, a density-based ship domain, DESD (Density-based Empirical Ship Domain), which reflects the environment and operating characteristics of a sea area, is defined using AIS (Automatic Identification System) data, that is, ship operation information. Deep clustering is applied to the two-dimensional DESDs created for each sea area to group sea areas with similar operating environments. An analysis of the relationship between the clustered sea areas and ship collision accidents showed, through statistical testing, that the occurrence of accidents varies with the characteristics of each sea area and that DESD can be used as a leading indicator of accidents.

An Automatic Classification System of Korean Documents Using Weight for Keywords of Document and Word Cluster (문서의 주제어별 가중치 부여와 단어 군집을 이용한 한국어 문서 자동 분류 시스템)

  • Hur, Jun-Hui;Choi, Jun-Hyeog;Lee, Jung-Hyun;Kim, Joong-Bae;Rim, Kee-Wook
    • The KIPS Transactions:PartB / v.8B no.5 / pp.447-454 / 2001
  • Automatic document classification is a method that assigns unlabeled documents to existing classes. It can be applied to classifying newsgroup articles and web documents, and to producing more precise Information Retrieval results by learning from users. In this paper, we use a weighted Bayesian classifier that assigns weights to the keywords of a document to improve classification accuracy. If the system cannot classify a document properly because the document contains too few words to serve as features, it uses relevance word clusters to supplement the document's features. The clusters are built by automatic word clustering from the corpus. As a result, the proposed system outperformed the existing classification system in classification accuracy on Korean documents.
