• Title/Summary/Keyword: document clustering (문서군집)

Search Result 127, Processing Time 0.033 seconds

Technology Development Strategy of Piggyback Transportation System Using Topic Modeling Based on LDA Algorithm

  • Jun, Sung-Chan;Han, Seong-Ho;Kim, Sang-Baek
    • Journal of the Korea Society of Computer and Information / v.25 no.12 / pp.261-270 / 2020
  • In this study, we identify promising technologies for the Piggyback transportation system by analyzing relevant patent information. To this end, we first build a patent database by extracting relevant technology keywords from pioneering research papers on the Piggyback flatcar system. We then apply text mining to identify frequently occurring words in the patent database and, using these words, apply the LDA (Latent Dirichlet Allocation) algorithm to identify "topics" that correspond to "key" technologies for the Piggyback system. Finally, we use the ARIMA model to forecast the trends of these "key" technologies and identify the promising ones for the Piggyback system. The results show that data-driven integrated management systems, operation planning systems, and special cargo (especially fluid and gas) handling/storage technologies are the "key" promising technologies for the future of the Piggyback system, and that data reception/analysis techniques must be developed to improve system performance. The proposed procedure and analysis method provide useful insights for developing the R&D strategy and the technology roadmap for the Piggyback system.

The Evaluation Measure of Text Clustering for the Variable Number of Clusters (가변적 클러스터 개수에 대한 문서군집화 평가방법)

  • Jo, Tae-Ho
    • Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.233-237 / 2006
  • This study proposes a measure for evaluating the performance of text clustering. When the K-means algorithm or Kohonen Networks are used for text clustering, the number of clusters is fixed in advance as a parameter, whereas with the single pass algorithm the number of clusters is not predictable. Given labeled documents, the result of clustering with K-means or a Kohonen Network can be evaluated by setting the number of clusters to the number of target categories, mapping each cluster to a target category, and applying the evaluation measures of text categorization. With the single pass algorithm, however, if the number of clusters differs from the number of target categories, such measures are useless for evaluating the clustering result. This study therefore proposes an evaluation measure for text clustering based on intra-cluster similarity and inter-cluster similarity, called CI (Clustering Index) in this article.
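
The two quantities that the abstract above names as the basis of CI can be sketched as follows. This is only an illustration under stated assumptions: documents are toy term-frequency vectors, similarity is cosine similarity, and the function names are hypothetical. The abstract does not give the formula that combines the two values into CI, so the sketch stops at computing them separately.

```python
from itertools import combinations
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def intra_cluster_similarity(clusters):
    """Mean pairwise similarity between documents that share a cluster."""
    sims = [cosine(u, v) for docs in clusters for u, v in combinations(docs, 2)]
    return sum(sims) / len(sims) if sims else 0.0


def inter_cluster_similarity(clusters):
    """Mean pairwise similarity between documents in different clusters."""
    sims = [
        cosine(u, v)
        for c1, c2 in combinations(clusters, 2)
        for u in c1
        for v in c2
    ]
    return sum(sims) / len(sims) if sims else 0.0


# Toy data (assumption): three clusters, so no fixed category count is needed.
clusters = [
    [[2, 1, 0, 0], [3, 1, 0, 0]],
    [[0, 0, 1, 2], [0, 0, 2, 3]],
    [[1, 0, 1, 0]],
]
intra = intra_cluster_similarity(clusters)
inter = inter_cluster_similarity(clusters)
print(f"intra={intra:.3f} inter={inter:.3f}")
```

Neither function needs the number of clusters to match a set of target categories, which is the property the abstract asks of an evaluation measure for single pass clustering.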

An Analysis of Performance Improvement Algorithm for Personalized Recommender System (개인화 추천시스템의 성능 향상 적용 알고리즘 분석)

  • Yun Sujin;Yoon Heebyung
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.04a / pp.181-184 / 2005
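  • A recommender system helps a particular user by recommending, from among a vast amount of information, the information judged most useful to that user. The algorithm that has been applied successfully to such systems is collaborative filtering, which collects web documents already rated by other users, accumulates them, and returns them to the user. However, this algorithm suffers from problems such as initial rating, sparsity, and scalability. This paper therefore compares and analyzes recent algorithms for personalized recommender systems that have been applied to solve these problems and improve performance. To this end, we first present a technical analysis of recently published collaborative filtering and nearest-neighbor algorithms, algorithms based on artificial intelligence techniques, and clustering algorithms. We then present a comparative analysis of the performance improvements obtained by combining these algorithms, together with the strengths and weaknesses of each combination.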
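
Collaborative filtering with nearest neighbors, as mentioned in the abstract above, can be illustrated with a minimal user-based sketch. The rating data, the function names, the cosine similarity over co-rated items, and the choice of `k` are all assumptions made for illustration; none of them come from the paper.

```python
from math import sqrt


def similarity(ratings_a, ratings_b):
    """Cosine similarity over the items that both users have rated."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in shared)
    norm_a = sqrt(sum(ratings_a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(ratings_b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(user, item, ratings, k=2):
    """Similarity-weighted mean rating from the k most similar users who rated item."""
    neighbours = sorted(
        (
            (similarity(ratings[user], other_ratings), other_ratings[item])
            for other, other_ratings in ratings.items()
            if other != user and item in other_ratings
        ),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour: the initial-rating and sparsity problems
    return sum(sim * rating for sim, rating in neighbours) / total


# Hypothetical ratings of web documents on a 1-5 scale.
ratings = {
    "ann": {"doc1": 5, "doc2": 4},
    "bob": {"doc1": 5, "doc2": 4, "doc3": 5},
    "cho": {"doc1": 1, "doc2": 5, "doc3": 2},
    "dan": {"doc4": 3},
}
print(predict("ann", "doc3", ratings))
print(predict("dan", "doc3", ratings))
```

The second call returns `None` because `dan` shares no rated document with anyone else, a small instance of the initial-rating and sparsity problems that the abstract lists.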

A Study on Shot Change Detection Applying the Law of Inertia (관성의 법칙을 적용시킨 장면 전환 검출에 관한 연구)

  • Kim, Kyong-Wook;Lee, Hyo-Jong
    • Proceedings of the Korea Information Processing Society Conference / 2003.05a / pp.515-518 / 2003
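  • Shot change detection, which can be regarded as the very first step in implementing a multimedia database system, is useful not only in video database systems but also in many other areas such as video detection, video compression, and clustering of video documents, and many algorithms have already been developed for it. While implementing and comparing these existing algorithms, we found that they detect shot changes accurately in some cases but produce wrong results when noise is inserted or in special situations. We therefore sought to apply the characteristics of the law of inertia, Newton's first law as it applies in the real world, to shot change detection. To demonstrate the performance of the proposed algorithm, this paper compares the algorithm based on the law of inertia with several previously published algorithms.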

Comparison of Algorithms for Shot Change Detection (장면전환 검출 알고리즘의 구현 및 비교)

  • Kim, Kyong-Wook;Lee, Hyo-Jong
    • Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.625-628 / 2002
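  • Shot change detection in video is useful in many applications, such as detection of specific objects, video compression, clustering of video documents, and video database systems. In particular, shot change detection is very important as the first step of image retrieval in multimedia databases. Several algorithms for shot change detection have already been developed and published. In this paper, considering the large size of video data, we implement an algorithm based on histogram distribution and an algorithm that uses the mean and variance of the image, and compare their performance in order to examine the trade-off between detection time and detection accuracy.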
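
The two families of detectors compared in the abstract above can be sketched as follows. This is a simplified illustration, not the authors' implementation: frames are assumed to be flat lists of 8-bit grayscale pixels of equal size, and the bin count, thresholds, and function names are hypothetical.

```python
from statistics import fmean, pvariance


def histogram(frame, bins=8, max_value=256):
    """Count grayscale pixels per intensity bin."""
    counts = [0] * bins
    for pixel in frame:
        counts[pixel * bins // max_value] += 1
    return counts


def histogram_cuts(frames, threshold=0.5):
    """Flag frame i when its histogram differs strongly from that of frame i-1."""
    cuts = []
    for i in range(1, len(frames)):
        h_prev, h_cur = histogram(frames[i - 1]), histogram(frames[i])
        # Normalised L1 distance in [0, 1] for frames of equal size.
        diff = sum(abs(a - b) for a, b in zip(h_prev, h_cur)) / (2 * len(frames[i]))
        if diff > threshold:
            cuts.append(i)
    return cuts


def mean_variance_cuts(frames, mean_tol=30.0, var_tol=500.0):
    """Flag frame i when its mean or variance jumps relative to frame i-1."""
    cuts = []
    for i in range(1, len(frames)):
        d_mean = abs(fmean(frames[i]) - fmean(frames[i - 1]))
        d_var = abs(pvariance(frames[i]) - pvariance(frames[i - 1]))
        if d_mean > mean_tol or d_var > var_tol:
            cuts.append(i)
    return cuts


# Toy sequence (assumption): two dark frames followed by two bright frames.
dark = [20, 25, 30, 22] * 4
bright = [200, 210, 220, 205] * 4
frames = [dark, dark, bright, bright]
print(histogram_cuts(frames), mean_variance_cuts(frames))
```

The mean/variance detector keeps only two numbers per frame, while the histogram detector keeps one count per bin; that difference in per-frame work is one place where a time versus accuracy trade-off of the kind the abstract studies can arise.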

Cluster-Based Selection of Diverse Query Examples for Active Learning (능동적 학습을 위한 군집화 기반의 다양한 복수 문의 예제 선정 방법)

  • Kang, Jae-Ho;Ryu, Kwang-Ryel;Kwon, Hyuk-Chul
    • Journal of Intelligence and Information Systems / v.11 no.1 / pp.169-189 / 2005
  • In order to derive a better classifier with a limited number of training examples, active learning alternately repeats the querying stage for category labeling and the subsequent learning stage for rebuilding the classifier with the newly expanded training set. To relieve the user of the burden of labeling, especially in an on-line environment, it is important to minimize the number of querying steps as well as the total number of query examples. We can derive a good classifier in a small number of querying steps, using only a small number of examples, if we can select multiple diverse, representative, and ambiguous examples to present to the user at each querying step. In this paper, we propose a cluster-based batch query selection method that selects diverse, representative, and highly ambiguous examples for efficient active learning. Experiments with various text data sets show that our method derives a better classifier than other methods that take only ambiguity into account as the criterion for selecting multiple query examples.

A R&D strategies for development using structured association map (구조화된 연관맵을 이용한 연구개발 전략 수립)

  • Song, Wonho;Lee, Junseok;Park, Sangsung
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.3 / pp.190-195 / 2016
  • Technology develops continuously in a rapidly changing global market, and a company requires an appropriate R&D strategy to adapt to this environment. That is, the technologies owned by the company need to be thoroughly analyzed to improve its competitiveness. Recently, technology classification using IPC codes has been carried out in an objective and quantitative way. The International Patent Classification (IPC) is an internationally specified classification system, so it is helpful for conducting an objective and quantitative patent analysis of technology. In this study, all of the patents owned by company C are investigated and a matrix representing the IPC codes of each patent is created. Then, a structured association map of the patents is built through association rule mining based on confidence. The association map can be used to inspect the current patent situation of a company. It also allows highly associated technologies to be clustered. Using the association map, this study analyzes the technologies of company C and how they change over time. A strategy for future technologies is established based on the result.
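
Confidence-based association rule mining over IPC codes, as described in the abstract above, can be illustrated with a short sketch. The confidence of a rule a -> b is the number of patents carrying both codes divided by the number carrying a. The patent data, the threshold, and the function name are hypothetical; the abstract does not state the threshold or the map-building procedure the authors used.

```python
from itertools import permutations


def rule_confidences(patents, min_confidence=0.6):
    """Return {(a, b): confidence} for rules a -> b over sets of IPC codes."""
    support = {}
    pair_support = {}
    for codes in patents:
        for code in codes:
            support[code] = support.get(code, 0) + 1
        # Count every ordered pair of distinct codes that co-occur in one patent.
        for a, b in permutations(sorted(codes), 2):
            pair_support[(a, b)] = pair_support.get((a, b), 0) + 1
    rules = {}
    for (a, b), count in pair_support.items():
        confidence = count / support[a]
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules


# Hypothetical portfolio: each patent is the set of IPC subclasses assigned to it.
patents = [
    {"G06F", "G06Q"},
    {"G06F", "G06Q", "H04L"},
    {"G06F", "H04L"},
    {"H04L"},
]
rules = rule_confidences(patents)
for (a, b), confidence in sorted(rules.items()):
    print(f"{a} -> {b}: {confidence:.2f}")
```

Each surviving rule can be read as a directed edge between two IPC codes, so the rule set as a whole forms a graph in which strongly associated technologies sit close together.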

Three-Level Color Clustering Algorithm for Binarizing Scene Text Images (자연영상 텍스트 이진화를 위한 3단계 색상 군집화 알고리즘)

  • Kim Ji-Soo;Kim Soo-Hyung
    • The KIPS Transactions:PartB / v.12B no.7 s.103 / pp.737-744 / 2005
  • In this paper, we propose a three-level color clustering algorithm for the binarization of text regions extracted from natural scene images. The proposed algorithm consists of three phases of color segmentation. First, ordinary images, in which the text is well separated from the background, are binarized. Then, in the second phase, the input image is passed through a high pass filter to deal with images affected by natural or artificial light. Finally, the image is passed through a low pass filter to deal with texture in the text and/or background. We show that the proposed algorithm is more effective than binarization algorithms that use gray-level information. To evaluate the effectiveness of the proposed algorithm, we use the commercial OCR software ARMI 6.0 to observe the recognition accuracy on the binarized images. The experimental results on word and character recognition show that the proposed approach is more accurate than conventional methods by over 35%.

A Comparative Study on Clustering Methods for Grouping Related Tags (연관 태그의 군집화를 위한 클러스터링 기법 비교 연구)

  • Han, Seung-Hee
    • Journal of the Korean Society for Library and Information Science / v.43 no.3 / pp.399-416 / 2009
  • This study discusses clustering methods for related tags, with the aim of improving search and exploration in the tag space. The experiments were performed on 10 Delicious tags and the strongly related tags extracted from 300 documents for each, and hierarchical and non-hierarchical clustering methods were applied based on tag co-occurrences. To evaluate the experimental results, cluster relevance was measured. The results showed that Ward's method with the cosine coefficient, which performs well for term clustering, performed best and showed a consistent clustering tendency. Furthermore, the analysis showed that cluster membership among related tags is based on users' tagging purposes or interests and can disambiguate word sense. Therefore, tag clusters would be helpful for improving search and exploration in the tag space.

News Topic Extraction based on Word Similarity (단어 유사도를 이용한 뉴스 토픽 추출)

  • Jin, Dongxu;Lee, Soowon
    • Journal of KIISE / v.44 no.11 / pp.1138-1148 / 2017
  • Topic extraction is a technology that automatically extracts a set of topics from a set of documents, and it has been a major research topic in the area of natural language processing. Representative topic extraction methods include Latent Dirichlet Allocation (LDA) and word clustering-based methods. However, these methods have problems such as repeated topics and mixed topics. The problem of repeated topics is one in which a specific topic is extracted as several topics, while the problem of mixed topics is one in which several topics are mixed in a single extracted topic. To solve these problems, this study proposes a method that extracts topics using LDA, which is robust against the problem of repeated topics, and then corrects the extracted topics by separating and merging them using the similarity between words. In experiments, the proposed method showed better performance than the conventional LDA method.