• Title/Summary/Keyword: Latent semantic indexing


An Experimental Study on Opinion Classification Using Supervised Latent Semantic Indexing(LSI) (지도적 잠재의미색인(LSI)기법을 이용한 의견 문서 자동 분류에 관한 실험적 연구)

  • Lee, Ji-Hye;Chung, Young-Mee
    • Journal of the Korean Society for Information Management, v.26 no.3, pp.451-462, 2009
  • The aim of this study is to apply latent semantic indexing (LSI) techniques to the efficient automatic classification of opinionated documents. For the experiments, we collected 1,000 opinionated documents such as reviews and news articles, 500 labelled as positive and the remaining 500 as negative. Sets of content words and sentiment words were extracted using a POS tagger in order to identify the optimal feature set for opinion classification. The findings showed that LSI techniques were more effective than a term indexing method for sentiment classification. The best performance was achieved by a supervised LSI technique.

Latent Semantic Indexing Analysis of K-Means Document Clustering for Changing Index Terms Weighting (색인어 가중치 부여 방법에 따른 K-Means 문서 클러스터링의 LSI 분석)

  • Oh, Hyung-Jin;Go, Ji-Hyun;An, Dong-Un;Park, Soon-Chul
    • The KIPS Transactions:PartB, v.10B no.7, pp.735-742, 2003
  • In an information retrieval system, document clustering provides user convenience and visual effects by rearranging the retrieved documents according to specific topics. In this paper, we cluster documents using the K-Means algorithm and present the effect of the index term weighting scheme on document clustering. To verify the experiment, we applied the Latent Semantic Indexing approach to illustrate the clustering results and analyzed them in 2-dimensional space. Experimental results showed that when local weighting, global weighting, and a normalization factor are applied, the clustering density in 2-dimensional space is higher than with similar or identical weighting schemes. In particular, the logarithmic form of local and global weighting stands out.
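
The combination of local weighting, global weighting, and normalization mentioned in this abstract can be sketched as follows. The abstract does not give the exact formulas, so this sketch assumes a logarithmic local weight `log(1 + tf)`, an inverse-document-frequency global weight `log(N / df)`, and cosine normalization; the function name `weight_matrix` is hypothetical.

```python
import math


def weight_matrix(counts):
    """Weight raw term counts per document.

    counts: list of documents, each a dict mapping term -> raw frequency.
    Returns a list of dicts mapping term -> weight.
    Assumed scheme (not taken from the paper): log local weight,
    idf global weight, cosine normalization.
    """
    n_docs = len(counts)

    # Document frequency: in how many documents each term occurs.
    df = {}
    for doc in counts:
        for term in doc:
            df[term] = df.get(term, 0) + 1

    weighted = []
    for doc in counts:
        # local weight * global weight
        w = {
            term: math.log(1 + tf) * math.log(n_docs / df[term])
            for term, tf in doc.items()
        }
        # Cosine normalization so every non-zero document has unit length.
        norm = math.sqrt(sum(v * v for v in w.values()))
        weighted.append(
            {term: (v / norm if norm > 0 else 0.0) for term, v in w.items()}
        )
    return weighted


docs = [{"a": 1, "b": 2}, {"a": 3, "c": 1}]
weights = weight_matrix(docs)
```

Under these assumptions, a term that occurs in every document (here `"a"`) receives a global weight of zero, so only the discriminating terms contribute to the clustering.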

A Comparative Study between LSI and LDA in Constructing Traceability between Functional and Non-Functional Requirements

  • Byun, Sung-Hoon;Lee, Seok-Won
    • Journal of the Korea Society of Computer and Information, v.24 no.7, pp.19-29, 2019
  • Requirements traceability is regarded as one of the important quality attributes in the software requirements engineering field. If requirements traceability is guaranteed, we can trace a requirement's life through all phases, from the customers' needs in the early stage of the project to requirements specification, deployment, and maintenance. This includes not only tracking the development artifacts that accompany the requirements, but also tracking backwards from the development artifacts to the initial customer requirements associated with them. In this paper, we deal specifically with the traceability between functional requirements and non-functional requirements. Among the many Information Retrieval (IR) techniques, we chose Latent Semantic Indexing (LSI) and Latent Dirichlet Allocation (LDA). We conducted an experiment on constructing traceability with the two techniques, analyzed the results, and provide a comparative study of the two IR techniques in constructing traceability between functional and non-functional requirements.

A Mobile P2P Semantic Information Retrieval System with Effective Updates

  • Liu, Chuan-Ming;Chen, Cheng-Hsien;Chen, Yen-Lin;Wang, Jeng-Haur
    • KSII Transactions on Internet and Information Systems (TIIS), v.9 no.5, pp.1807-1824, 2015
  • As technologies advance, mobile peer-to-peer (MP2P) networks or systems have become one of the major ways to share resources and information. On such a system, information retrieval (IR), including the development of scalable infrastructures for indexing, becomes more complicated due to a huge increase in the amount of information and rapid information change. To keep the systems on MP2P networks reliable and consistent, the index structures need to be updated frequently. For a semantic IR system, the index structure is even more complicated than in a classic IR system and generally has a higher update cost. The best-known indexing technique used in semantic IR systems is Latent Semantic Indexing (LSI), whose index structure is generated by singular value decomposition (SVD). Although LSI performs well, updating the index structure is difficult and time-consuming. In an MP2P environment, which is fully distributed and dynamic, the update becomes more challenging. In this work, we consider how to update the semantic index generated by LSI and keep the index consistent across the whole MP2P network. The proposed Concept Space Update (CSU) protocol, based on a distributed 2-phase locking strategy, can effectively achieve the objectives in terms of two measurements: coverage speed and update cost. Using the proposed synchronization mechanism with efficient updates on the SVD, re-computing the whole index on the P2P overlay can be avoided and consistency can be achieved. Simulated experiments are also performed to validate our analysis of the proposed CSU protocol. The experimental results indicate that CSU is effective at updating the concept space with the LSI/SVD index structure in MP2P semantic IR systems.

Automatic Construction of Reduced Dimensional Cluster-based Keyword Association Networks using LSI (LSI를 이용한 차원 축소 클러스터 기반 키워드 연관망 자동 구축 기법)

  • Yoo, Han-mook;Kim, Han-joon;Chang, Jae-young
    • Journal of KIISE, v.44 no.11, pp.1236-1243, 2017
  • In this paper, we propose a novel way of producing keyword networks, named LSI-based ClusterTextRank, which extracts significant key words from a set of clusters with a mutual information metric, and constructs an association network using latent semantic indexing (LSI). The proposed method reduces the dimension of documents through LSI, decomposes documents into multiple clusters through k-means clustering, and expresses the words within each cluster as a maximal spanning tree graph. The significant key words are identified by evaluating their mutual information within clusters. Then, the method calculates the similarities between the extracted key words using the term-concept matrix, and the results are represented as a keyword association network. To evaluate the performance of the proposed method, we used travel-related blog data and showed that the proposed method outperforms the existing TextRank algorithm by about 14% in terms of accuracy.
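
One step of this pipeline, expressing the words of a cluster as a maximal spanning tree, can be illustrated with a short sketch. It uses Kruskal's algorithm on a weighted word graph; the abstract does not say which algorithm or which edge weights the authors use, so the function name `maximum_spanning_tree`, the edge weights, and the sample words are all assumptions.

```python
def maximum_spanning_tree(words, edges):
    """Kruskal's algorithm, keeping the heaviest edges first.

    words: list of word strings (graph nodes).
    edges: list of (weight, word_a, word_b) tuples; the weights are a
           hypothetical association score between two words.
    Returns the edges of a maximum spanning tree (or forest if the
    graph is disconnected), heaviest first.
    """
    parent = {w: w for w in words}

    def find(w):
        # Union-find lookup with path halving.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    tree = []
    for weight, a, b in sorted(edges, key=lambda e: e[0], reverse=True):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((weight, a, b))
    return tree


cluster_words = ["hotel", "beach", "flight"]
cluster_edges = [
    (0.9, "hotel", "beach"),
    (0.2, "hotel", "flight"),
    (0.5, "beach", "flight"),
]
tree = maximum_spanning_tree(cluster_words, cluster_edges)
```

In this toy cluster the weakest edge (`hotel`, `flight`) is dropped because the two stronger edges already connect all three words.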

Retrieval Model using Subject Classification Table, User Profile, and LSI (전공분류표, 사용자 프로파일, LSI를 이용한 검색 모델)

  • Woo Seon-Mi
    • The KIPS Transactions:PartD, v.12D no.5 s.101, pp.789-796, 2005
  • Because existing information retrieval systems, library retrieval systems in particular, use exact keyword matching against the user's query, they return massive result sets that include irrelevant information, and the user spends extra effort and time finding the relevant items. This paper therefore proposes SULRM, a retrieval model using a Subject Classification Table, a user profile, and LSI (Latent Semantic Indexing), to provide more relevant results. Within the results of keyword-based retrieval, SULRM applies a document filtering technique to classified data and a document ranking technique to non-classified data. The filtering technique uses the Subject Classification Table, and the ranking technique uses the user profile and LSI. We evaluated the filtering technique, the user profile updating method, and the document ranking technique on results from the information retrieval system of our university's digital library. When many documents are retrieved, the proposed techniques can provide the user with data filtered and ranked according to the user's subject and preferences.

Comparison and Analysis of Subject Classification for Domestic Research Data (국내 학술논문 주제 분류 알고리즘 비교 및 분석)

  • Choi, Wonjun;Sul, Jaewook;Jeong, Heeseok;Yoon, Hwamook
    • The Journal of the Korea Contents Association, v.18 no.8, pp.178-186, 2018
  • Subject classification at the level of individual papers is essential for serving scholarly information. To date, however, topic classification has mostly been journal-based, and few article-level subject classification services exist. For domestic academic papers, subject classification can be even more important because it can cover a larger service area and allows services to be scoped to a range. However, classifying topics by field requires experts in various fields, and various verification methods are needed to increase accuracy. In this paper, we classify topics using unsupervised learning algorithms, which look for structure when the correct answer is unknown, and compare the results of the subject classification algorithms using coherence and perplexity. The unsupervised learning algorithms are the well-known Hierarchical Dirichlet Process (HDP), Latent Dirichlet Allocation (LDA), and Latent Semantic Indexing (LSI) algorithms.

Experiments using query expansion in LSI (LSI에서 질의 확장을 이용한 실험)

  • 안성수;김동주;이기영;김한우
    • Proceedings of the Korean Information Science Society Conference, 1999.10b, pp.151-153, 1999
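  • Because it is difficult for a user to express, and to satisfy, every need with a single query, research on query expansion continues. In this paper, we propose two query expansion methods for LSI (Latent Semantic Indexing): one that computes the similarity between the user's query and the terms in the semantic space and expands the query with the top-ranked terms in order, and one that uses LCA (Local Context Analysis). We then analyze the results of applying three weighting schemes to a document collection and discuss the problems of query expansion and directions for future research.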

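
The first expansion method in this abstract, ranking terms by their similarity to the query in the semantic space and adding the top-ranked ones, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the term vectors are invented, the query is represented as the sum of its term vectors, and cosine similarity is assumed as the similarity measure. The names `cosine` and `expand_query` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def expand_query(query_terms, term_vectors, top_n=2):
    """Append the top_n terms most similar to the query in the semantic space.

    term_vectors: dict mapping term -> vector in the reduced LSI space
                  (assumed to be precomputed elsewhere).
    """
    known = [t for t in query_terms if t in term_vectors]
    if not known:
        return list(query_terms)

    # Represent the query as the sum of its term vectors (an assumption).
    dim = len(term_vectors[known[0]])
    query_vec = [sum(term_vectors[t][i] for t in known) for i in range(dim)]

    # Score every term that is not already in the query.
    candidates = [
        (cosine(query_vec, vec), term)
        for term, vec in term_vectors.items()
        if term not in query_terms
    ]
    candidates.sort(key=lambda c: c[0], reverse=True)
    return list(query_terms) + [term for _, term in candidates[:top_n]]


# Invented 2-dimensional term vectors, for illustration only.
vectors = {
    "car": [1.0, 0.0],
    "automobile": [0.9, 0.1],
    "engine": [0.7, 0.3],
    "flower": [0.0, 1.0],
}
expanded = expand_query(["car"], vectors, top_n=2)
```

With these invented vectors, the query `car` is expanded with `automobile` and `engine`, while `flower` is left out because it is orthogonal to the query.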

LSI-Updating Application for Internet-based Information Retrieval - LSI Improvement Using QR Decomposition (인터넷기반 정보 검색을 위한 LSI 활용 - QR 분해를 이용한 LSI 향상)

  • 박유진;송만석
    • Proceedings of the IEEK Conference, 2001.06c, pp.47-50, 2001
  • This paper uses the SVD (singular value decomposition) technique of LSI (Latent Semantic Indexing) to capture the distribution of terms easily. Existing LSI has been applied to static databases; this paper proposes applying it to dynamic databases. However, applying LSI to a dynamic database raises an updating problem. The existing updating methods are the recomputing method, the folding-in method, and the SVD-updating method. This paper proposes a QR decomposition method and shows that it performs better than the three existing methods.

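
For context, the folding-in method named in this abstract as one of the existing baselines can be sketched with NumPy. This is the standard textbook formulation, not the QR-based method the paper proposes: a new document vector d is projected into an existing rank-k space as Σ_k⁻¹ U_kᵀ d without recomputing the SVD. The function names and the sample matrix are assumptions.

```python
import numpy as np


def build_lsi(term_doc, k):
    """Rank-k truncated SVD of a term-document matrix (terms x documents)."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    # U_k: terms x k, s_k: k singular values, V_k: documents x k
    return U[:, :k], s[:k], Vt[:k, :].T


def fold_in(doc_vector, U_k, s_k):
    """Project a new document into the existing k-dimensional space.

    Computes inverse(Sigma_k) @ U_k.T @ d. The existing factors are left
    unchanged, which is why folding-in is cheap but can lose accuracy as
    more documents are added.
    """
    return (U_k.T @ doc_vector) / s_k


# Toy 5-term x 4-document count matrix, invented for illustration.
A = np.array(
    [
        [1, 0, 1, 0],
        [1, 1, 0, 0],
        [0, 1, 0, 1],
        [0, 0, 1, 1],
        [1, 0, 0, 1],
    ],
    dtype=float,
)
U_k, s_k, V_k = build_lsi(A, k=2)

# Fold in a new document that contains the first two terms.
new_doc = np.array([1, 1, 0, 0, 0], dtype=float)
new_coords = fold_in(new_doc, U_k, s_k)
```

A quick sanity check on this sketch: folding in a document that is already in the collection reproduces its row of `V_k`, since U_kᵀ A = Σ_k V_kᵀ.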