Search | Korea Science

The Document Clustering using LSI of IR (LSI를 이용한 문서 클러스터링)

고지현;최영란;유준현;박순철
- Proceedings of the Korea Society for Industrial Systems Conference / 2002.06a / pp.330-335 / 2002
The most critical issue in an information retrieval system is returning results that match user requests. When all documents related to a user's query are retrieved, it is difficult to find the document the user actually wants among them. For this reason, clustering methods that group related documents have been widely used. In this paper, we cluster documents on the basis of their meaning rather than their index terms, and apply the LSI method for this purpose. Furthermore, we identify and analyze how the results differ from clustering with the widely used K-Means algorithm.
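The abstract gives no implementation details, so the following is only a minimal sketch of the general LSI idea it refers to: documents are projected onto the leading singular directions of the term-document matrix, so that documents with no index terms in common can still end up close together. The toy corpus, the function names, and the use of power iteration with deflation in place of a full SVD are all assumptions made for illustration, not the authors' method.

```python
import math

# Hypothetical toy corpus: documents 0 and 1 share no terms with each other,
# but both overlap with document 2.
docs = [
    "car engine",
    "automobile motor",
    "car automobile engine motor",
    "flower petal",
    "flower garden",
]

def term_doc_vectors(docs):
    """Raw term-frequency vector for each document over the shared vocabulary."""
    vocab = sorted({t for d in docs for t in d.split()})
    return [[d.split().count(t) for t in vocab] for d in docs]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def top_eigenpairs(matrix, k, iters=300):
    """Leading k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for _component in range(k):
        v = [1.0] * n
        for _step in range(iters):
            w = [dot(row, v) for row in m]
            norm = math.sqrt(dot(w, w))
            v = [x / norm for x in w]
        lam = dot(v, [dot(row, v) for row in m])
        pairs.append((lam, v))
        # Deflate: subtract the component just found before searching for the next.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def lsi_doc_coords(docs, k):
    """Rank-k document coordinates (rows of V_k * Sigma_k) from G = A^T A."""
    vecs = term_doc_vectors(docs)
    g = [[dot(a, b) for b in vecs] for a in vecs]  # documents x documents
    pairs = top_eigenpairs(g, k)
    return [[math.sqrt(max(lam, 0.0)) * v[i] for lam, v in pairs]
            for i in range(len(docs))]

raw = term_doc_vectors(docs)
latent = lsi_doc_coords(docs, k=2)
raw_sim = cosine(raw[0], raw[1])           # similarity on index terms
latent_sim = cosine(latent[0], latent[1])  # similarity in the latent space
print(raw_sim, latent_sim)
```

In this toy corpus the first two documents are unrelated when compared on index terms, yet nearly identical in the rank-2 latent space. A clustering algorithm such as K-Means could then be run on the latent coordinates instead of the raw term vectors.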

The Analysis of Clustering Result with Weight Change using LSI (LSI 를 이용한 가중치 변화에 따른 클러스터링 결과 분석)

Goh, Ji-Hyun;Oh, Hyung-Jin;Park, Soon-Cheol
- Annual Conference of KIPS / 2002.04b / pp.1009-1012 / 2002
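The most important task of an information retrieval system is to produce results that match the user's needs. To this end, all documents related to the user's query are retrieved, but only a few of these many result documents are the ones the user wants, and finding them is not easy. Clustering methods that group related documents together are therefore widely used to derive appropriate result documents. In this paper, we examine how one of the factors affecting clustering, the weights of the index terms in each document, influences the clustering. To do so, we use LSI to visualize the clustering results obtained as the weights change, and analyze those results.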
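The abstract does not say which weighting schemes were compared. As a hedged illustration of why index-term weights matter for clustering, the sketch below assumes two common schemes, raw term frequency and tf-idf, and shows that the same pair of documents receives a different cosine similarity under each. The corpus and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus.
docs = [
    "search engine query search",
    "search cluster document",
    "cluster document document weight",
]

def tf_vectors(docs):
    """Raw term-frequency weights: one sparse vector per document."""
    return [Counter(doc.split()) for doc in docs]

def tfidf_vectors(docs):
    """tf-idf weights, with idf = log(N / df) (one of several common variants)."""
    tfs = tf_vectors(docs)
    n = len(docs)
    df = Counter(term for tf in tfs for term in tf)  # document frequency per term
    return [{t: c * math.log(n / df[t]) for t, c in tf.items()} for tf in tfs]

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

tf = tf_vectors(docs)
tfidf = tfidf_vectors(docs)
sim_tf = cosine(tf[0], tf[1])
sim_tfidf = cosine(tfidf[0], tfidf[1])
print(sim_tf, sim_tfidf)
```

Here the shared term "search" occurs in two of the three documents, so tf-idf discounts it and the two documents appear less similar than under raw term frequency. Since clustering groups documents by such similarities, a change of weighting scheme can change which documents end up in the same cluster.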

An Experimental Study on Multi-Document Summarization for Question Answering (질의응답을 위한 복수문서 요약에 관한 실험적 연구)

Choi, Sang-Hee;Chung, Young-Mee
- Journal of the Korean Society for Information Management / v.21 no.3 / pp.289-303 / 2004
This experimental study proposes a multi-document summarization method that produces summaries in which users can find answers to their queries. To identify the most effective method for this purpose, the performance of three summarization methods was compared: sentence clustering, passage extraction through spreading activation, and a clustering-passage extraction hybrid. The effectiveness of each method was evaluated by two criteria that measure the accuracy and the redundancy of a summary. The passage extraction method using the sequential bnb search algorithm proved most effective at summarizing multiple documents with regard to summarization precision. This study therefore proposes passage extraction as the optimal multi-document summarization method.
https://doi.org/10.3743/KOSIM.2004.21.3.289

Policies of Trajectory Clustering in Index based on R-trees for Moving Objects (이동체를 위한 R-트리 기반 색인에서의 궤적 클러스터링 정책)

Ban ChaeHoon;Kim JinGon;Jun BongGi;Hong BongHee
- The KIPS Transactions:PartD / v.12D no.4 s.100 / pp.507-520 / 2005
R-trees are commonly used to index trajectories in moving-object databases. However, because they consider only spatial proximity, they must access many nodes to trace a single trajectory. Overlaps and dead space should be minimized to improve the performance of range queries in moving-object indexes, and the trajectories of moving objects should be preserved to improve the performance of trajectory queries. In this paper, we propose the TP3DR-tree (Trajectory Preserved 3DR-tree), which uses clusters of trajectories to support range and trajectory queries. The TP3DR-tree uses two split policies: a spatial split that divides the same trajectory by clustering, and a time split that increases space utilization. In addition, we store connecting information in non-leaf nodes to improve the performance of combined queries. Our experiments show that the new index outperforms the others in processing queries on various datasets.
https://doi.org/10.3745/KIPSTD.2005.12D.4.507

Declustering of High-dimensional Data by Cyclic Sliced Partitioning (주기적 편중 분할에 의한 다차원 데이터 디클러스터링)

Kim Hak-Cheol;Kim Tae-Wan;Li Ki-Joune
- Journal of KIISE:Databases / v.31 no.6 / pp.596-608 / 2004
Much work has been done to reduce disk access time in I/O-intensive systems, which store and handle massive amounts of data, by distributing data across multiple disks and accessing them in parallel. Most previous work has focused on an efficient mapping from a grid cell to a disk number, on the assumption that the data space is partitioned into a regular grid. Although grid-like partitioning performs well for low-dimensional data, its performance degrades as the dimension of the data grows, even with a good disk allocation scheme. This is because such schemes partition the entire data space equally, regardless of how the data objects are distributed. Most of the data in a high-dimensional space lie near the surface of the space. For that reason, we propose a new declustering algorithm based on a partitioning scheme that partitions the data space from the surface. Several experimental results show that, with this unbalanced partitioning scheme, the number of data blocks touched by a query is remarkably reduced as the dimension of the data and the query size grow. In this paper, we propose disk allocation schemes based on the layout of the data blocks that result from the partitioning. To evaluate the proposed algorithm, we performed several experiments with data of different dimensions and a wide range of numbers of disks. The proposed disk allocation method performs within 10 additional disk accesses of a strictly optimal allocation scheme. We also compared our algorithm with the Kronecker sequence based declustering algorithm, which is reported to be the best among the declustering algorithms based on grid partitioning and mapping functions. Declustering performance improves by up to 14 times as the dimension of the data grows.
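The observation that high-dimensional data concentrate near the surface can be checked with a short calculation. The sketch below assumes points distributed uniformly in the unit hypercube (an assumption, since the abstract does not state the data distribution) and computes the fraction of the volume lying within a distance eps of the boundary, which is 1 - (1 - 2*eps)^d. The function name is hypothetical.

```python
def surface_fraction(dim, eps):
    """Fraction of the unit hypercube [0, 1]^dim lying within eps of its boundary.

    The interior region at least eps away from every face is a cube of side
    (1 - 2 * eps), so the boundary shell holds the remaining volume.
    """
    if not 0 <= eps <= 0.5:
        raise ValueError("eps must be in [0, 0.5]")
    return 1.0 - (1.0 - 2.0 * eps) ** dim

# Share of the volume within 0.05 of the surface for several dimensions.
for dim in (2, 10, 20, 50):
    print(dim, round(surface_fraction(dim, 0.05), 4))
```

With eps = 0.05, the shell holds about 19% of the volume in 2 dimensions but about 88% in 20 dimensions, which illustrates why partitioning the space equally, as a regular grid does, fits high-dimensional data poorly.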

Weighting and Query Structuring Scheme for Disambiguation in CLTR (교차언어 문서검색에서 중의성 해소를 위한 가중치 부여 및 질의어 구조화 방법)

Jeong, Eui-Heon;Kwon, Oh-Woog;Lee, Jong-Hyeok
- Annual Conference on Human and Language Technology / 2001.10d / pp.175-182 / 2001
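This paper proposes a query-term weighting and structuring method for resolving translation ambiguity in dictionary-based query-translation cross-language document retrieval. The query translation process of the proposed method consists of three steps. First, the appropriate sense of each query word is determined through clustering of its translation candidates. Second, the relationships among the candidate translations are analyzed using contextual information and local information. Third, the candidate translations are combined into candidate queries, each of which is assigned a weight, and a weighted Boolean query is generated. In this way, we aim to present a dictionary-based query-translation method for cross-language document retrieval that is simple and economical yet achieves high performance.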

An Experimental Study on Enhancing the Retrieval Performance for the Web Documents Using Link-Based Clustering Technique (링크기반 클러스터링을 이용한 웹 문서 검색의 성능 향상에 관한 실험적 연구)

김혜진;문성빈
- Proceedings of the Korean Society for Information Management Conference / 2002.08a / pp.247-252 / 2002
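This study proposes a web document clustering technique that draws on link information, based on the premise that documents connected by links in hypertext or web document retrieval are topically related to one another, and investigates how to improve web document retrieval performance by using this technique to rank the retrieved results by proximity to the query. The web document collection used in this study was gathered directly from the Web (WWW). A link from a web document to another web document is classified as an OutLink, and a link received from another web document as an InLink. The experimental results show that both the technique that clusters by referring to OutLinks and the technique that clusters by referring to InLinks improved retrieval performance.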
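The abstract does not specify how link information is turned into a similarity between documents. As a rough sketch under that caveat, the code below builds OutLink and InLink sets from a hypothetical list of (source, target) links and compares documents by the Jaccard overlap of those sets. The Jaccard measure, the edge list, and the function names are all assumptions for illustration.

```python
from collections import defaultdict

def link_sets(edges):
    """Build OutLink and InLink sets from (source, target) link pairs."""
    out_links = defaultdict(set)  # pages that a document links to
    in_links = defaultdict(set)   # pages that link to a document
    for src, dst in edges:
        out_links[src].add(dst)
        in_links[dst].add(src)
    return out_links, in_links

def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical link graph among five web documents.
edges = [
    ("A", "C"), ("A", "D"),
    ("B", "C"), ("B", "D"), ("B", "E"),
    ("C", "E"), ("D", "E"),
]
out_links, in_links = link_sets(edges)

# Documents that link to similar pages (OutLink-based similarity).
out_sim = jaccard(out_links["A"], out_links["B"])
# Documents that are linked from similar pages (InLink-based similarity).
in_sim = jaccard(in_links["C"], in_links["D"])
print(out_sim, in_sim)
```

Pairwise similarities of this kind could feed any standard clustering procedure, with OutLink-based and InLink-based similarities giving two separate clusterings to compare.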

Automatic Categorization of Real World FAQs Using Hierarchical Document Clustering (계층적 문서 클러스터링을 이용한 실세계 질의 메일의 자동 분류)

류중원;조성배
- Proceedings of the Korean Institute of Intelligent Systems Conference / 2001.05a / pp.187-190 / 2001
With the recent proliferation of the Internet, it is widely accepted that the need for automatic document categorization is rising. Because processing and classifying documents manually is time-consuming and labor-intensive, we need a system that categorizes them automatically according to their contents. In this paper, we propose an automatic e-mail response system based on two hierarchical document clustering methods. The first clusters the whole document set into 3 groups so that a first classifier assigns each input document to the corresponding group, and then obtains the final result from classifiers trained separately within each group. The second classifies the most distinct classes first, according to their similarity, and proceeds successively. Neural networks are adopted as classifiers, and dendrograms are used to show the hierarchical structure of similarities between classes. A comparison of the performance of hierarchical and non-hierarchical classifiers shows that the clustering methods improve classification efficiency.

Query Optimization for an Advanced Keyword Search on Relational Data Stream (관계형 데이터 스트림에서 고급 키워드 검색을 위한 질의 최적화)

Joo, Jin-Ung;Kim, Hak-Soo;Hwang, Jin-Ho;Son, Jin-Hyun
- The KIPS Transactions:PartD / v.16D no.6 / pp.859-870 / 2009
Despite the surge in research on keyword search over relational databases, little attention has been devoted to relational data streams. Keyword search over relational data streams is of considerable interest because streaming data has recently become a major research topic in data management. In this regard, we first analyze the research related to keyword search over relational data streams, and then focus on minimizing the join cost incurred while processing keyword search queries. As a result, we propose an advanced keyword search method that can yield more meaningful results for users on relational data streams. We also propose a query optimization method using layered clustering for efficient query processing.
https://doi.org/10.3745/KIPSTD.2009.16D.6.859

Clustering of MPEG-7 Data for Efficient Management (MPEG-7 데이터의 효율적인 관리를 위한 클러스터링 방법)

Ahn, Byeong-Tae;Kang, Byeong-Shoo;Diao, Jianhua;Kang, Hyun-Syug
- Journal of Korea Multimedia Society / v.10 no.1 / pp.1-12 / 2007
To use multimedia data within the restricted resources of a mobile environment, a method for managing MPEG-7 documents is needed. Existing XML clustering methods can be used for this purpose. However, to improve performance further, a new clustering method that exploits the characteristics of MPEG-7 documents is needed. Such a method improves query processing speed in multimedia search and makes it possible to store documents in a way suited to various applications. In this paper, we propose a new clustering method for MPEG-7 documents that uses semantic relationships among the elements of MPEG-7 documents, for the effective management of large-volume multimedia data. We also compare it with existing clustering methods.

Search Result 154, Processing Time 0.026 seconds