• Title/Summary/Keyword: Cluster Retrieval

Gathering Common-word and Document Reclassification to improve Accuracy of Document Clustering (문서 군집화의 정확률 향상을 위한 범용어 수집과 문서 재분류 알고리즘)

  • Shin, Joon-Choul;Ock, Cheol-Young;Lee, Eung-Bong
    • The KIPS Transactions:PartB / v.19B no.1 / pp.53-62 / 2012
  • Clustering technology is used to handle large numbers of retrieved documents efficiently in information retrieval systems, but its accuracy meets the requirements of only some domains. This paper proposes two methods to increase clustering accuracy. First, we define a common-word as a word that is used frequently but carries low weight during clustering, and we propose a method that automatically gathers common-words and calculates their weights from the retrieved documents. In experiments, the clustering error rate using common-words is reduced to 34% compared with clustering using stop-words. Second, after generating initial clusters from the retrieved documents using average-link clustering, we propose an algorithm that reevaluates the similarity between each document and the clusters and reclassifies the document into a more similar cluster. In experiments using Naver JiSikIn categories, the accuracy of the reclassified clusters is increased to 1.81% compared with the initial clusters without reclassification.
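
The reclassification step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not state the similarity measure, so cosine similarity against cluster centroids is assumed here, and the function names and toy documents are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Mean term-weight vector of a non-empty list of sparse vectors."""
    total = defaultdict(float)
    for v in vectors:
        for t, w in v.items():
            total[t] += w
    return {t: w / len(vectors) for t, w in total.items()}

def reclassify(docs, labels):
    """One pass: move each document to the cluster with the most similar centroid."""
    groups = defaultdict(list)
    for doc, label in zip(docs, labels):
        groups[label].append(doc)
    centroids = {label: centroid(vs) for label, vs in groups.items()}
    # sorted() makes tie-breaking deterministic (lowest label wins).
    return [max(sorted(centroids), key=lambda c: cosine(doc, centroids[c]))
            for doc in docs]

# Toy data (assumed): the last document starts in the wrong cluster.
docs = [
    {"python": 1.0, "code": 1.0},
    {"python": 1.0, "bug": 1.0},
    {"recipe": 1.0, "soup": 1.0},
    {"recipe": 1.0, "salt": 1.0},
    {"soup": 1.0, "salt": 1.0},
]
initial = [0, 0, 1, 1, 0]
print(reclassify(docs, initial))  # -> [0, 0, 1, 1, 1]
```

In this sketch the initial labels would come from average-link clustering; only the reassignment pass is shown.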

Selection of Cluster Hierarchy Depth in Hierarchical Clustering using K-Means Algorithm (K-means 알고리즘을 이용한 계층적 클러스터링에서의 클러스터 계층 깊이 선택)

  • Lee, Won-Hee;Lee, Shin-Won;Chung, Sung-Jong;An, Dong-Un
    • Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.2 / pp.150-156 / 2008
  • Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means has lower time complexity when the number of variables is large. Considering simplicity, quality, and efficiency, we combine the two approaches into a new system named CONDOR, which builds a hierarchical structure based on document clustering using the K-means algorithm. We evaluate its performance at different hierarchy depths and with an initial number of centroids that is not fixed but varies with the relative number of documents corresponding to the given queries. Compared with the regular method, in which the initial centroids are established in advance, the performance of our method is substantially improved.
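
The idea of building a cluster hierarchy by applying K-means recursively can be sketched as below. This is a generic illustration under stated assumptions, not the CONDOR system: the deterministic centroid initialization, the fixed `k` at every level, and all function names are simplifications introduced here.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points (tuples)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, iters=20):
    """Plain K-means; centroids start at evenly spaced input points (assumed)."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the previous centroid if a group ends up empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return [g for g in groups if g]

def build_hierarchy(points, k, depth):
    """Split recursively with K-means until the requested depth is reached."""
    if depth == 0 or len(points) <= k:
        return points  # leaf: a flat list of points
    return [build_hierarchy(g, k, depth - 1) for g in kmeans(points, k)]

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
print(build_hierarchy(points, k=2, depth=1))
```

Varying `depth` in such a sketch is one way to explore the trade-off the abstract evaluates between hierarchy depth and clustering quality.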

Document Clustering using Term reweighting based on NMF (NMF 기반의 용어 가중치 재산정을 이용한 문서군집)

  • Lee, Ju-Hong;Park, Sun
    • Journal of the Korea Society of Computer and Information / v.13 no.4 / pp.11-18 / 2008
  • Document clustering is an important method for document analysis and is used in many information retrieval applications. This paper proposes a new document clustering model that uses term re-weighting based on NMF (non-negative matrix factorization) to cluster documents relevant to a user's requirement. The proposed model re-weights terms using user feedback to reduce the gap between the user's requirement for document classification and the clusters produced by the machine. The proposed method can improve the quality of document clustering because the re-weighted terms, the semantic feature matrix, and the semantic variable matrix used in document clustering can better represent the inherent structure of the document set. The experimental results demonstrate that applying the proposed method to document clustering methods achieves better performance than the document clustering methods alone.
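
A rough sketch of clustering with NMF on a re-weighted term-document matrix is given below. It assumes the standard multiplicative update rules for the squared-error objective and a simple per-term scaling as the re-weighting; how the paper derives term weights from user feedback is not described in the abstract, so `term_weights` and the toy matrix are purely illustrative.

```python
import random

def matmul(a, b):
    """Matrix product of two lists-of-lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(r) for r in zip(*m)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, rank, iters=200, seed=0, eps=1e-9):
    """Factor V (terms x docs) into non-negative W (terms x rank), H (rank x docs)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(rank)]
             for i in range(n)]
    return w, h

# Rows are terms, columns are documents (toy counts, assumed).
term_doc = [[3, 2, 0, 0], [2, 3, 0, 1], [0, 0, 4, 3], [0, 1, 3, 4]]
term_weights = [1.5, 1.0, 1.0, 1.5]  # hypothetical feedback-derived weights
v = [[term_weights[i] * x for x in row] for i, row in enumerate(term_doc)]

w, h = nmf(v, rank=2)
# Assign each document to the semantic feature with the largest coefficient.
labels = [max(range(2), key=lambda a: h[a][j]) for j in range(len(v[0]))]
print(labels)
```

Here `w` plays the role of the semantic feature matrix and `h` the semantic variable matrix; the cluster labels depend on the random initialization, so no particular output is claimed.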

Identification of ginseng root using quantitative X-ray microtomography

  • Ye, Linlin;Xue, Yanling;Wang, Yudan;Qi, Juncheng;Xiao, Tiqiao
    • Journal of Ginseng Research / v.41 no.3 / pp.290-297 / 2017
  • Background: The use of X-ray phase-contrast microtomography for the investigation of Chinese medicinal materials is advantageous because of its nondestructive, in situ, and three-dimensional quantitative imaging properties. Methods: The X-ray phase-contrast microtomography quantitative imaging method was used to investigate the microstructure of ginseng, and the phase-retrieval method was employed to process the experimental data. Four different ginseng samples were collected and investigated; these were classified according to their species, production area, and sample growth pattern. Results: The quantitative internal characteristic microstructures of ginseng were extracted successfully. The size and position distributions of the calcium oxalate cluster crystals (COCCs), important secondary metabolites that accumulate in ginseng, were revealed by the three-dimensional quantitative imaging method. The volume and number of COCCs in the different species of ginseng were obtained by a quantitative analysis of the three-dimensional microstructures, which showed obvious differences among the four species of ginseng. Conclusion: This study is the first to provide evidence of the distribution characteristics of COCCs to identify four types of ginseng, with regard to species authentication and age identification, by X-ray phase-contrast microtomography quantitative imaging. This method is also expected to reveal important relationships between COCCs and the occurrence of the effective medicinal components of ginseng.

High-Dimensional Image Indexing based on Adaptive Partitioning and Vector Approximation (적응 분할과 벡터 근사에 기반한 고차원 이미지 색인 기법)

  • Cha, Gwang-Ho;Jeong, Jin-Wan
    • Journal of KIISE:Databases / v.29 no.2 / pp.128-137 / 2002
  • In this paper, we propose the LPC+-file for efficient indexing of high-dimensional image data. With the proliferation of multimedia data, there is an increasing need to support the indexing and retrieval of high-dimensional image data. Recently, the LPC-file (5), which is based on vector approximation, has been developed for indexing high-dimensional data. The LPC-file performs well, especially when the dataset is uniformly distributed. However, its performance degrades when the dataset is clustered. We improve the performance of the LPC-file for strongly clustered image datasets. The basic idea is to adaptively partition the data space to find subspaces with high-density clusters and to assign more bits to them than to others, which increases the discriminatory power of the vector approximations. The total number of bits used to represent vector approximations is somewhat smaller than that of the LPC-file, since the partitioned cells in the LPC+-file share bits. An empirical evaluation shows that the LPC+-file yields significant performance improvements for real image datasets that are strongly clustered.
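
The intuition behind giving dense subspaces their own, finer approximation can be illustrated with plain uniform scalar quantization. This sketch does not reproduce the LPC-file or LPC+-file encoding; the function names, the bounding boxes, and the bit counts are assumptions chosen only to show why a tighter cell with more bits separates vectors that a coarse global grid maps to the same code.

```python
def quantize(vec, lo, hi, bits):
    """Map each coordinate to one of 2**bits equal-width slots in [lo, hi]."""
    slots = 1 << bits
    codes = []
    for x, l, h in zip(vec, lo, hi):
        idx = int((x - l) / (h - l) * slots)
        codes.append(min(max(idx, 0), slots - 1))  # clamp to the valid range
    return codes

# Two nearby vectors inside a dense cluster (toy values, assumed).
a = (0.52, 0.52)
b = (0.57, 0.57)

# Coarse global approximation: 2 bits per dimension over the whole space.
global_a = quantize(a, (0.0, 0.0), (1.0, 1.0), bits=2)
global_b = quantize(b, (0.0, 0.0), (1.0, 1.0), bits=2)

# Finer approximation restricted to the dense cell, with 3 bits per dimension.
cell_lo, cell_hi = (0.5, 0.5), (0.6, 0.6)
local_a = quantize(a, cell_lo, cell_hi, bits=3)
local_b = quantize(b, cell_lo, cell_hi, bits=3)

print(global_a == global_b)  # the coarse codes cannot tell a and b apart
print(local_a == local_b)    # the cell-local codes can
```

When the approximations of two vectors coincide, a query cannot filter either of them out without reading the full vectors, which is why discriminatory power matters for clustered data.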

A study on online WOM search behavior based on shopping orientation (의복쇼핑성향에 따른 온라인 구전 정보탐색행동에 관한 연구)

  • Lee, Angie;Rhee, YoungJu
    • Journal of the Korea Fashion and Costume Design Association / v.20 no.4 / pp.57-71 / 2018
  • Since consumers have become more comfortable with providing and receiving information online, 'online word of mouth' (WOM) has been gaining consideration as one of the major information sources. The shopping orientation of consumers has also been shown to be an important determinant of consumer behavior. Therefore, this study investigated the differences in online WOM behavior based on shopping orientation. Hedonic, loyal, and syntonic styles were the types of shopping orientation considered, and the study focused on information retrieval tendencies, the motivation for online WOM search, the online WOM sources searched, and the contents of online WOM. An off-line survey was conducted targeting females in their twenties. A total of 125 data sets were used in the empirical study, and these were analyzed with SPSS 20.0 using factor analysis, Cronbach's ${\alpha}$, k-means clustering, ANOVA, Duncan's multiple range test, Kruskal-Wallis, Mann-Whitney, and Bonferroni correction. The participants were divided into three shopping orientation groups named 'trend-pursuit', 'passive', and 'loyal'. Significant differences in online WOM behavior were found between the groups. First, the 'trend-pursuit' group had the highest number of ongoing searches, while the 'loyal' group had the highest number of pre-purchase searches. Second, the 'trend-pursuit' and 'loyal' groups both showed hedonic and utility motivations for online WOM search, whereas the 'passive' group had the lowest levels of both motivations. Third, the 'loyal' group frequently referred to reviews on shopping malls as online WOM sources. The research provides a better understanding of the online WOM behavior of present consumers and suggests that fashion-related corporations map out marketing strategies with an understanding of these behaviors.

A Mobile Landmarks Guide : Outdoor Augmented Reality based on LOD and Contextual Device (모바일 랜드마크 가이드 : LOD와 문맥적 장치 기반의 실외 증강현실)

  • Zhao, Bi-Cheng;Rosli, Ahmad Nurzid;Jang, Chol-Hee;Lee, Kee-Sung;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.1-21 / 2012
  • In recent years, mobile phones have evolved extremely quickly. They are equipped with high-quality color displays, high-resolution cameras, and real-time accelerated 3D graphics, and other features include a GPS sensor and a digital compass. This evolution helps application developers use the power of smartphones to create rich environments that offer a wide range of services and exciting possibilities. In outdoor mobile AR research to date, there are many popular location-based AR services, such as Layar and Wikitude. These systems have a major limitation: the AR contents are hardly overlaid on the real target. Another line of research is context-based AR services using image recognition and tracking, in which the AR contents are precisely overlaid on the real target. However, real-time performance is restricted by the retrieval time, and such services are hard to implement over a large-scale area. In our work, we combine the advantages of location-based AR with those of context-based AR. The system first finds the surrounding landmarks and then performs recognition and tracking on them. The proposed system consists of two major parts: a landmark browsing module and an annotation module. In the landmark browsing module, users can view augmented virtual information (information media), such as text, pictures, and video, on their smartphone viewfinder when they point their smartphone at a certain building or landmark. For this, a landmark recognition technique is applied. SURF point-based features are used in the matching process because of their robustness. To ensure that image retrieval and matching are fast enough for real-time tracking, we exploit information from the contextual devices (GPS and digital compass). This information is needed to select from the database the landmarks that are nearest and lie in the pointed orientation. The queried image is matched only against this selected data, so matching speed is significantly increased. 
The second part is the annotation module. Instead of only viewing the augmented information media, users can create virtual annotations based on linked data. Full knowledge of the landmark is not required; users can simply look for the appropriate topic by searching linked data with a keyword. This helps the system find the target URI in order to generate the correct AR contents. In addition, in order to recognize target landmarks, images of the selected building or landmark are captured from different angles and distances. This procedure resembles building a connection between the real building and the virtual information that exists in the Linked Open Data. In our experiments, the search range in the database is reduced by clustering images into groups according to their coordinates. A grid-based clustering method and user location information are used to restrict the retrieval range. In existing research using clustering and GPS information, the retrieval time is around 70~80ms. Experimental results show that our approach reduces the retrieval time to around 18~20ms on average. The total processing time is therefore reduced from 490~540ms to 438~480ms. The performance improvement will be more pronounced as the database grows. This demonstrates that the proposed system is efficient and robust in many cases.
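
Restricting the retrieval range with a coordinate grid and the user's location can be sketched as follows. The cell size, the choice to search the user's cell plus its eight neighbours, and the landmark names and coordinates are all assumptions made for illustration; the abstract does not give these details.

```python
import math
from collections import defaultdict

def cell_of(lat, lon, cell_deg):
    """Grid cell (row, col) containing a coordinate."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))

def build_grid(landmarks, cell_deg=0.01):
    """Group landmark names by the grid cell of their coordinates."""
    grid = defaultdict(list)
    for name, lat, lon in landmarks:
        grid[cell_of(lat, lon, cell_deg)].append(name)
    return grid

def candidates(grid, lat, lon, cell_deg=0.01):
    """Landmarks in the user's cell and the eight surrounding cells."""
    row, col = cell_of(lat, lon, cell_deg)
    found = []
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            found.extend(grid.get((row + dr, col + dc), []))
    return found

# Made-up landmarks as (name, latitude, longitude).
landmarks = [
    ("landmark_a", 37.5512, 126.9882),
    ("landmark_b", 37.5547, 126.9707),
    ("landmark_c", 37.5796, 126.9770),
    ("landmark_d", 35.1000, 129.0000),
]
grid = build_grid(landmarks)
print(sorted(candidates(grid, 37.5520, 126.9890)))  # -> ['landmark_a', 'landmark_b']
```

Only the images attached to the returned candidates would then need to be matched against the query image, which is the source of the speed-up in a scheme like this.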

Term Mapping Methodology between Everyday Words and Legal Terms for Law Information Search System (법령정보 검색을 위한 생활용어와 법률용어 간의 대응관계 탐색 방법론)

  • Kim, Ji Hyun;Lee, Jong-Seo;Lee, Myungjin;Kim, Wooju;Hong, June Seok
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.137-152 / 2012
  • In the era of Web 2.0, as many users create web content themselves (so-called user-created content), the World Wide Web is overflowing with information. Finding meaningful information among these resources has therefore become key. Information retrieval is now important in every field, and several types of search services have been developed and are widely used to retrieve the information that users really want. In particular, legal information search is an indispensable service that lets people find the law relevant to their present situation and serves as a channel for gaining knowledge about it. The Office of Legislation in Korea has provided the Korean Law Information portal service since 2009 to search law information such as legislation, administrative rules, and judicial precedents, so people can conveniently find information related to the law. However, this service is limited because current search engine technology basically returns documents according to whether they contain the query. Therefore, despite the efforts of the Office of Legislation, it is very difficult for general users who are not familiar with legal terms to retrieve law-related information from a search engine that uses simple keyword matching, because there is a large gap between everyday words and legal terms, many of which derive from Chinese words. People generally try to access law information using everyday words, so they have difficulty getting exactly the results they want. In this paper, we propose a term mapping methodology between everyday words and legal terms for general users who lack sufficient background in legal terms, and we develop a search service that can provide law information search results from everyday words. 
This makes it possible to search law information accurately without knowledge of legal terminology. In other words, our research goal is to build a law information search system with which general users can retrieve law information using everyday words. First, this paper uses the tags of internet blogs, applying the concept of collective intelligence, to find the term mapping relationship between everyday words and legal terms. To achieve this goal, we collect tags related to an everyday word from web blog posts. When people write a post on an internet blog, they generally add non-hierarchical keywords or terms, similar to synonyms and called tags, in order to describe, classify, and manage their posts. Second, the collected tags are clustered using the K-means cluster analysis method. Then, we find a mapping relationship between an everyday word and a legal term by using our estimation measure to select the legal term that best matches the everyday word. The selected legal terms are given a definite relationship, and the relations between everyday words and legal terms are described using SKOS, an ontology for describing knowledge related to thesauri, classification schemes, taxonomies, and subject headings. Thus, based on the proposed mapping and searching methodologies, when users try to retrieve law information using an everyday word, our legal information search system finds the legal term mapped to the user query and retrieves law information using the matched legal term. Users can therefore get exact results even if they have no knowledge of legal terms. As a result of our research, we expect that general users without a professional legal background can conveniently and efficiently retrieve legal information using everyday words.
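
The selection of a legal term for an everyday word can be sketched as below. The paper's estimation measure is not specified in the abstract, so a simple count of how many blog posts carry each legal term as a tag is used as a stand-in; the K-means clustering step is omitted, and the function name, tags, and legal vocabulary are hypothetical.

```python
from collections import Counter

def map_to_legal_term(tag_lists, legal_terms):
    """Pick the legal term tagged on the most posts about an everyday word.

    tag_lists: one list of tags per blog post collected for the everyday word.
    legal_terms: set of known legal terms.
    Ties are broken alphabetically; returns None if no legal term appears.
    """
    counts = Counter(
        tag for tags in tag_lists for tag in set(tags) if tag in legal_terms
    )
    if not counts:
        return None
    best = max(counts.values())
    return min(term for term, c in counts.items() if c == best)

# Tags from posts collected for the everyday word "getting fired" (toy data).
posts = [
    ["dismissal", "job", "boss"],
    ["dismissal", "severance pay"],
    ["job", "resignation"],
]
legal_vocabulary = {"dismissal", "severance pay", "resignation"}
print(map_to_legal_term(posts, legal_vocabulary))  # -> dismissal
```

A search system built on such a mapping would rewrite the everyday-word query into the selected legal term before running the keyword search.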