• Title/Summary/Keyword: Keywords Similarity

A New Approach to Automatic Keyword Generation Using Inverse Vector Space Model (키워드 자동 생성에 대한 새로운 접근법: 역 벡터공간모델을 이용한 키워드 할당 방법)

  • Cho, Won-Chin;Rho, Sang-Kyu;Yun, Ji-Young Agnes;Park, Jin-Soo
    • Asia pacific journal of information systems / v.21 no.1 / pp.103-122 / 2011
  • Recently, numerous documents have been made available electronically. Internet search engines and digital libraries commonly return query results containing hundreds or even thousands of documents. In this situation, it is virtually impossible for users to examine complete documents to determine whether they might be useful. For this reason, some online documents are accompanied by a list of keywords specified by the authors to guide users and facilitate the filtering process. A set of keywords is thus often considered a condensed version of the whole document and plays an important role in document retrieval, Web page retrieval, document clustering, summarization, text mining, and so on. Since many academic journals ask authors to provide a list of five or six keywords on the first page of an article, keywords are most familiar in the context of journal articles. However, many other types of documents could also benefit from the use of keywords, including Web pages, email messages, news reports, magazine articles, and business papers. Although the potential benefit is large, implementation is the obstacle: manually assigning keywords to all documents is a daunting, even impractical, task because it is extremely tedious and time-consuming and requires a certain level of domain knowledge. Therefore, it is highly desirable to automate the keyword generation process. There are two main approaches to this aim: keyword assignment and keyword extraction. Both approaches use machine learning methods and require, for training purposes, a set of documents with keywords already attached. In the former approach, there is a given vocabulary, and the aim is to match its terms to the texts. In other words, the keyword assignment approach seeks to select the words from a controlled vocabulary that best describe a document. 
Although this approach is domain dependent and is not easy to transfer or expand, it can generate implicit keywords that do not appear in a document. In the latter approach, by contrast, the aim is to extract keywords according to their relevance in the text, without a prior vocabulary. In this approach, automatic keyword generation is treated as a classification task, and keywords are commonly extracted using supervised learning techniques. Thus, keyword extraction algorithms classify candidate keywords in a document as positive or negative examples. Several systems, such as Extractor and Kea, were developed using the keyword extraction approach. The most indicative words in a document are selected as its keywords; as a result, keyword extraction is limited to terms that appear in the document and cannot generate implicit keywords that are not included in it. According to Turney's experimental results, about 64% to 90% of the keywords assigned by authors can be found in the full text of an article. Conversely, 10% to 36% of the author-assigned keywords do not appear in the article and therefore cannot be generated by keyword extraction algorithms. Our preliminary experiment also shows that 37% of the keywords assigned by authors are not included in the full text. This is why we decided to adopt the keyword assignment approach. In this paper, we propose a new approach to automatic keyword assignment, namely IVSM (Inverse Vector Space Model). The model is based on the vector space model, a conventional information retrieval model that represents documents and queries as vectors in a multidimensional space. IVSM generates an appropriate keyword set for a specific document by measuring the distance between the document and the keyword sets. 
The keyword assignment process of IVSM is as follows: (1) calculating the vector length of each keyword set based on each keyword weight; (2) preprocessing and parsing a target document that does not have keywords; (3) calculating the vector length of the target document based on term frequency; (4) measuring the cosine similarity between each keyword set and the target document; and (5) generating the keywords that have high similarity scores. Two keyword generation systems were implemented using IVSM: an IVSM system for a Web-based community service and a stand-alone IVSM system. The first is implemented in a community service for sharing knowledge and opinions on current trends such as fashion, movies, social problems, and health information. The stand-alone IVSM system is dedicated to generating keywords for academic papers and has been tested on a number of academic papers, including those published by the Korean Association of Shipping and Logistics, the Korea Research Academy of Distribution Information, the Korea Logistics Society, the Korea Logistics Research Association, and the Korea Port Economic Association. We measured the performance of IVSM by the number of matches between the IVSM-generated keywords and the author-assigned keywords. According to our experiment, the precision of IVSM applied to the Web-based community service and to academic journals was 0.75 and 0.71, respectively. Both systems perform much better than baseline systems that generate keywords based on simple probability. IVSM also shows performance comparable to that of Extractor, a representative keyword extraction system developed by Turney. As the number of electronic documents increases, we expect that the IVSM proposed in this paper can be applied to many electronic documents in Web-based communities and digital libraries.
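
The five steps listed in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the representation of each candidate keyword as a weighted term vector, the whitespace tokenizer, the sample weights, and the names `cosine`, `assign_keywords`, and `KEYWORD_SETS` are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))  # vector length of a
    norm_b = math.sqrt(sum(w * w for w in b.values()))  # vector length of b
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical keyword sets: candidate keyword -> weighted term vector.
# In a real system these weights would come from documents that already
# carry author-assigned keywords; here they are invented.
KEYWORD_SETS = {
    "information retrieval": {"query": 2.0, "search": 3.0, "document": 1.0},
    "logistics": {"port": 3.0, "shipping": 2.0, "cargo": 1.0},
    "text mining": {"document": 2.0, "keyword": 2.0, "cluster": 1.0},
}


def assign_keywords(text, keyword_sets, top_k=2):
    # Steps 2-3: naive preprocessing and a term-frequency vector
    doc_vec = Counter(text.lower().split())
    # Step 4: cosine similarity between each keyword set and the document
    scored = [(cosine(vec, doc_vec), kw) for kw, vec in keyword_sets.items()]
    # Step 5: keep the keywords with the highest non-zero similarity
    scored.sort(reverse=True)
    return [kw for score, kw in scored[:top_k] if score > 0]


TEXT = "search engines return a document list for each query and search session"
print(assign_keywords(TEXT, KEYWORD_SETS))
```

Because candidates are scored against a fixed vocabulary rather than against the words of the document alone, a keyword can be assigned even when its own wording never appears in the text, which is the property the abstract attributes to keyword assignment.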

A Study on the Knowledge Structure of Cancer Survivors based on Social Network Analysis (네트워크 분석을 통한 암 생존자 지식구조 연구)

  • Kwon, Sun Young;Bae, Ka Ryeong
    • Journal of Korean Academy of Nursing / v.46 no.1 / pp.50-58 / 2016
  • Purpose: The purpose of this study was to identify the knowledge structure of cancer survivors. Methods: For data, 1099 articles were collected, and 365 keywords in the form of noun phrases were extracted from the articles and standardized for analysis. A co-occurrence matrix was generated via a cosine similarity measure, and then network analysis and visualization using PFNet and NodeXL were applied to visualize intellectual interchanges among keywords. Results: According to the results of the content analysis and the cluster analysis of author keywords from cancer survivor articles, keywords such as 'quality of life', 'breast neoplasms', 'cancer survivors', 'neoplasms', and 'exercise' had a high degree centrality. The 9 most important research topics concerning cancer survivors were 'cancer-related symptoms and nursing', 'cancer treatment-related issues', 'late effects', 'psychosocial issues', 'healthy living managements', 'social supports', 'palliative cares', 'research methodology', and 'research participants'. Conclusion: Through this study, the knowledge structure of cancer survivors was identified. The 9 topics identified in this study can provide useful research direction for the development of nursing in cancer survivor research areas. The network analysis used in this study will be useful for identifying the knowledge structure and for identifying general views and current cancer survivor research trends.
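
As a rough illustration of how a cosine similarity measure can be applied to keyword co-occurrence, the sketch below treats each keyword as a binary vector over articles. The article keyword sets and the function name `keyword_cosine` are invented for the example, and the PFNet and NodeXL steps mentioned in the abstract are not reproduced.

```python
import math
from itertools import combinations

# Hypothetical author-keyword sets, one per article.
ARTICLES = [
    {"quality of life", "breast neoplasms", "exercise"},
    {"quality of life", "breast neoplasms"},
    {"quality of life", "exercise"},
    {"neoplasms", "cancer survivors"},
]


def keyword_cosine(articles):
    """Pairwise cosine similarity between keywords.

    Each keyword is a binary vector over articles, so the cosine reduces to
    (number of shared articles) / sqrt(count_a * count_b).
    """
    occurrences = {}
    for index, keywords in enumerate(articles):
        for keyword in keywords:
            occurrences.setdefault(keyword, set()).add(index)

    similarities = {}
    for a, b in combinations(sorted(occurrences), 2):
        shared = len(occurrences[a] & occurrences[b])
        similarities[(a, b)] = shared / math.sqrt(
            len(occurrences[a]) * len(occurrences[b])
        )
    return similarities


SIMS = keyword_cosine(ARTICLES)
for pair, value in sorted(SIMS.items()):
    print(pair, round(value, 3))
```

The resulting keyword-by-keyword similarity values are the kind of input a network-pruning or visualization tool would take as edge weights.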

Font Recommendation Service Based on Emotion Keyword Attribute Value Estimation (감정 기반 키워드 속성값 산출에 따른 글꼴 추천 서비스)

  • Ji, Youngseo;Lim, SoonBum
    • Journal of Korea Multimedia Society / v.25 no.8 / pp.999-1006 / 2022
  • The use of appropriate fonts is not only a matter of aesthetics but also a factor in reinforcing meaning. However, choosing a font that suits their needs and emotions is a difficult and time-consuming process for general users. Therefore, in this study, keywords and fonts to be used in the experiment were selected for emotion-based font recommendation, and keyword values for each font were calculated through an experiment to check the correlation between keywords and fonts. Using the experimental results, a prototype of a keyword-based font recommendation system was designed and the feasibility of the system was tested. In the usability evaluation, the font recommendation system prototype received a positive evaluation compared to the existing font search system, but the number of fonts was limited and users had difficulty thinking of keywords suited to their desired situation. Therefore, we plan to expand the number of fonts and conduct follow-up research to automatically recommend fonts suitable for the user's situation without selecting keywords.

e-Cohesive Keyword based Arc Ranking Measure for Web Navigation (연관 웹 페이지 검색을 위한 e-아크 랭킹 메저)

  • Lee, Woo-Key;Lee, Byoung-Su
    • Journal of KIISE:Databases / v.36 no.1 / pp.22-29 / 2009
  • The World Wide Web has emerged as the largest medium, one that lets even a single user market products and publish information; users, in turn, can access abundant information of any kind. As a result, the Web holds a large amount of related information distributed over multiple web pages. Current search engines search for all the entered keywords in a single web page and rank the resulting set of web pages as an answer to the user query. But this approach fails to retrieve pairs of web pages that together contain more relevant information for the user's search. We introduce a new search paradigm that gives different weights to the query keywords according to their order of appearance. We propose a new arc weight measure that assigns more relevance to a pair of web pages in which alternate keywords are present, so that pairs of web pages containing related but distributed information can be presented to the user. In similarity-search experiments, our e-arc ranking measure outperformed conventional measures.

Semantic-based Keyword Search System over Relational Database (관계형 데이터베이스에서의 시맨틱 기반 키워드 탐색 시스템)

  • Yang, Younghyoo
    • Journal of the Korea Society of Computer and Information / v.18 no.12 / pp.91-101 / 2013
  • One issue with keyword search in general is its ambiguity, which can ultimately reduce the effectiveness of the search in terms of the quality of the search results. This ambiguity arises primarily from the ambiguity of the contextual meaning of each term in the query. In addition to the query ambiguity itself, the relationships between the keywords in the search results are crucial for the proper interpretation of the results by the user and should be clearly presented. We address the keyword search ambiguity issue by adapting some of the existing approaches for keyword mapping from the query terms to the schema terms/instances. The approaches we have adapted for term mapping capture both the syntactic similarity and the semantic similarity between the query keywords and the schema terms; they give better mappings and ultimately yield results that are 50% more accurate. Finally, to address the lack of clear relationships among the terms appearing in the search results, our system leverages semantic web technologies to enrich the knowledge base and to discover the relationships between the keywords.
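
The idea of mapping a query keyword to a schema term by combining syntactic and semantic similarity can be sketched as follows. The abstract does not say which measures the system uses, so everything here is an assumption: string similarity comes from Python's `difflib.SequenceMatcher`, semantic similarity is reduced to a lookup in a hypothetical synonym table, and the schema terms are invented.

```python
from difflib import SequenceMatcher

# Hypothetical schema terms and synonym table, for illustration only.
SCHEMA_TERMS = ["author", "title", "publication_year", "publisher"]
SYNONYMS = {"writer": "author", "name": "title", "year": "publication_year"}


def syntactic(keyword, term):
    """String-level similarity in [0, 1]."""
    return SequenceMatcher(None, keyword.lower(), term.lower()).ratio()


def semantic(keyword, term):
    """1.0 when the synonym table links the keyword to the term, else 0.0."""
    return 1.0 if SYNONYMS.get(keyword.lower()) == term else 0.0


def map_keyword(keyword, schema_terms=SCHEMA_TERMS):
    """Return (score, term) for the best-matching schema term."""
    scored = [
        (max(syntactic(keyword, term), semantic(keyword, term)), term)
        for term in schema_terms
    ]
    return max(scored)


print(map_keyword("writer"))   # matched through the synonym table
print(map_keyword("authors"))  # matched through string similarity
```

Taking the maximum of the two scores is only one possible combination rule; a weighted sum would be an equally plausible choice.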

Hot Keyword Extraction of Sci-tech Periodicals Based on the Improved BERT Model

  • Liu, Bing;Lv, Zhijun;Zhu, Nan;Chang, Dongyu;Lu, Mengxin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.6 / pp.1800-1817 / 2022
  • With the development of the economy and the improvement of living standards, the hot issues in a subject area have become the main research direction, yet mining these hot issues currently faces problems such as a large amount of data and a complex algorithm structure. In response, this study proposes a method for extracting hot keywords in scientific journals based on an improved BERT model. The method, which can also serve as a reference for researchers, improves the overall similarity measure of the ensemble by introducing compound keyword word density and combining word segmentation, word sense set distance, and density clustering to construct an improved BERT (I-BERT) framework, on which a composite keyword heat analysis model is established. Taking the 14420 articles published in 21 social science management periodicals collected by CNKI (China National Knowledge Infrastructure) in 2017-2019 as the experimental data, the superiority of the proposed method is verified using word spacing, class spacing, and the extraction accuracy and recall of hot keywords. The experiments show that the proposed method extracts hot keywords with higher accuracy than other methods, which can ensure the timeliness and accuracy of scientific journals in capturing hot topics in the discipline and ultimately makes it possible to use information technology to track popular keywords.

Engineering Information Search based on Ontology Mapping (온톨로지 매핑 기반 엔지니어링 정보 검색)

  • Jung Min;Suh Hyo-Won
    • Journal of the Korean Society for Precision Engineering / v.23 no.5 s.182 / pp.30-36 / 2006
  • Participants in a collaborative environment want to find the right information or documents. General search systems retrieve only documents that contain the query keywords. To find different expressions that have the same meaning, we perform mapping before searching. Our mapping-based search approach has two parts: ontology-based mapping logic and ontology libraries. The ontology-based mapping consists of three steps: character matching (CM), definition comparing (DC), and similarity checking (SC). First, character matching maps two terminologies that have identical character strings. Second, definition comparing compares the ontological definitions of two terminologies. Third, similarity checking pairs two terminologies that were not mapped by the two prior steps by evaluating the similarity of their ontological definitions. For the ontology libraries, a document ontology library (DOL), a keyword ontology library (KOL), and a mapping result library (MRL) are defined. With these three libraries and three mapping steps, an ontology-based search engine (OntSE) is built, and a use case scenario is discussed to show its applicability.
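
The three mapping steps described in the abstract above (CM, DC, SC) can be read as a fall-through pipeline, sketched below in Python. The abstract does not define how ontological definitions are represented or compared, so this sketch assumes that a definition is a set of descriptor strings, that DC means exact equality of definitions, and that SC uses Jaccard similarity with a threshold; the library contents and the names `map_term` and `LIBRARY` are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical keyword ontology library: term -> ontological definition,
# modelled here as a set of descriptors.
LIBRARY = {
    "bolt": {"fastener", "threaded", "steel"},
    "gear": {"rotating", "toothed", "transmission"},
    "housing": {"enclosure", "casing", "protective"},
}


def map_term(term, definition, library=LIBRARY, threshold=0.5):
    """Return (mapped_term, step), where step is 'CM', 'DC', 'SC', or None."""
    # Step 1, character matching: identical character strings
    if term in library:
        return term, "CM"
    # Step 2, definition comparing: identical definitions (assumed meaning)
    for candidate, candidate_def in library.items():
        if candidate_def == definition:
            return candidate, "DC"
    # Step 3, similarity checking: closest definition above a threshold
    best = max(library, key=lambda c: jaccard(library[c], definition))
    if jaccard(library[best], definition) >= threshold:
        return best, "SC"
    return None, None


print(map_term("bolt", {"fastener"}))
print(map_term("screw", {"fastener", "threaded", "steel"}))
print(map_term("cogwheel", {"rotating", "toothed", "wheel"}))
print(map_term("pipe", {"hollow", "tube"}))
```

Ordering the steps from cheapest and strictest to most permissive means the similarity computation only runs for terms the earlier steps could not map, which matches the abstract's description of SC handling what the two prior steps leave unmapped.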

A Semi-Automatic Semantic Mark Tagging System for Building Dialogue Corpus (대화 말뭉치 구축을 위한 반자동 의미표지 태깅 시스템)

  • Park, Junhyeok;Lee, Songwook;Lim, Yoonseob;Choi, Jongsuk
    • KIPS Transactions on Software and Data Engineering / v.8 no.5 / pp.213-222 / 2019
  • Determining the meaning of a keyword in a speech dialogue system is an important technology for the future implementation of an intelligent speech dialogue interface. After keywords are extracted from a user's utterance to grasp its intention, the intention of the utterance is determined using the semantic marks of the keywords. One keyword can have several semantic marks, and we regard the task of attaching the semantic mark that correctly reflects the user's intention to each keyword as a word sense disambiguation problem. In this study, about 23% of all keywords in the corpus are manually tagged to build a semantic mark dictionary, a synonym dictionary, and a context vector dictionary, and the remaining 77% of the keywords are then automatically tagged. The semantic mark of a keyword is determined by calculating the context vector similarity using the context vector dictionary. For an unregistered keyword, the semantic mark of the most similar keyword is attached using the synonym dictionary. We compare the performance of the system with a manually constructed training set and a semi-automatically expanded training set by selecting 3 high-frequency keywords and 3 low-frequency keywords in the corpus. In experiments, we obtained an accuracy of 54.4% with the manually constructed training set and 50.0% with the semi-automatically expanded training set.
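
A minimal sketch of the tagging rule described in the abstract above: the semantic mark is chosen by context-vector similarity, and an unregistered keyword falls back to a synonym. The dictionary layout, the English sample entries, the use of cosine similarity over word counts, and the names `tag`, `CONTEXT_VECTORS`, and `SYNONYMS` are assumptions for illustration, not the system's actual data or code.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(w * b.get(t, 0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical context vector dictionary: keyword -> semantic mark -> counts
# of words seen near the keyword in manually tagged utterances.
CONTEXT_VECTORS = {
    "bank": {
        "FINANCE": Counter({"money": 3, "account": 2, "loan": 1}),
        "RIVER": Counter({"water": 3, "shore": 2, "fish": 1}),
    },
}

# Hypothetical synonym dictionary: unregistered keyword -> registered keyword.
SYNONYMS = {"lender": "bank"}


def tag(keyword, context_words):
    """Return the semantic mark whose context vector best fits the context."""
    marks = CONTEXT_VECTORS.get(keyword)
    if marks is None:
        # Unregistered keyword: reuse the entry of its synonym, if any
        marks = CONTEXT_VECTORS.get(SYNONYMS.get(keyword))
    if marks is None:
        return None
    context = Counter(context_words)
    return max(marks, key=lambda mark: cosine(marks[mark], context))


print(tag("bank", ["deposit", "money", "account"]))
print(tag("bank", ["fish", "water"]))
print(tag("lender", ["loan", "money"]))
```

Note that when the context shares no words with any stored vector, this sketch simply returns the first mark listed for the keyword; a real system would need an explicit policy for that case.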

Study of Similarity Theory of River Models with Movable Beds and its Application. (이동상 하천모형이론의 수립 및 적용)

  • Seo, Il-Won;Jeong, Tae-Seong;Kim, Young-Han
    • Journal of Korea Water Resources Association / v.31 no.5 / pp.575-586 / 1998
  • A relaxed similarity theory that can be applied to river models with movable beds is established by modifying the existing theory of Einstein and Chien (1954). Experimental data collected from river models with movable beds were used to evaluate the applicability of the proposed theory. The effects of similarity of flow, ΔFΔM, and similarity of sediment movement, ΔFs, were examined by analyzing the behaviour of total river-bed change. The results show that the smaller ΔFΔM or ΔFs is, the larger the total sedimentation is. The modified similarity theory established in this study would be useful and practical whenever it is impossible or very difficult to satisfy strict theoretical requirements concerning river model experiments with movable beds. Keywords: river model, similarity of flow, similarity of sediment movement, sediment transport, river-bed change.

Knowledge Structure of the Korean Journal of Occupational Health Nursing through Network Analysis (네트워크분석을 통한 직업건강간호학회지 논문의 지식구조 분석)

  • Kwon, Sun Young;Park, Eun Jung
    • Korean Journal of Occupational Health Nursing / v.24 no.2 / pp.76-85 / 2015
  • Purpose: The purpose of this study was to identify the knowledge structure of the Korean Journal of Occupational Health Nursing from 1991 to 2014. Methods: 400 articles published between 1991 and 2014 were collected. 1,369 keywords in the form of noun phrases were extracted from the articles and standardized for analysis. A co-occurrence matrix was generated via a cosine similarity measure, and the network was then analyzed and visualized using PFNet. NodeXL was also applied to visualize intellectual interchanges among keywords. Results: According to the results of the content analysis and the cluster analysis of author keywords from the Korean Journal of Occupational Health Nursing articles, the 7 most important research topics of the journal were 'Workers & Work-related Health Problem', 'Recognition & Preventive Health Behaviors', 'Health Promotion & Quality of Life', 'Occupational Health Nursing & Management', 'Clinical Nursing Environment', 'Caregivers and Social Support', and 'Job Satisfaction, Stress & Performance'. Newly emerging topics for each 4-year period were observed as research trends. Conclusion: Through this study, the knowledge structure of the Korean Journal of Occupational Health Nursing was identified. The network analysis used in this study will be useful for identifying the knowledge structure as well as for finding general views and current research trends. Furthermore, the results of this study could be used to seek research directions for the Korean Journal of Occupational Health Nursing.