• Title/Summary/Keyword: Keywords Similarity


A Knowledge-based Interactive Idea Categorizer for Electronic Meeting Systems

  • Kim, Jae-Kyeong;Lee, Jae-Kwang
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2000.04a
    • /
    • pp.333-340
    • /
    • 2000
  • Research on group decision making and electronic meeting systems has been increasing rapidly with the spread of Internet technology. Among the various issues raised in empirical research, we address idea categorizing in the group decision-making process of electronic meeting systems. Idea categorizing in existing group decision support systems was performed in a top-down procedure, mostly through participants' manual work. As a result, categorizing ideas took as long as generating them, a single idea was clustered into multiple categories, and nearly identical redundant categories were produced. Because such methods have critical limitations in electronic meeting systems, we suggest an intelligent idea categorizing methodology that takes a bottom-up approach. The method consists of representing ideas with keywords, identifying keyword affinity, computing similarity among ideas, and clustering the ideas. The methodology allows participants to interact iteratively to clarify ambiguous ideas. We also developed a prototype system, IIC (Intelligent Idea Categorizer), and evaluated its performance through comparison experiments with other systems. IIC is not a general-purpose system, but it produces good results in a given specific domain.
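
The categorizing pipeline described above (ideas as keywords, keyword affinity, idea similarity, clustering) can be illustrated with a minimal sketch. The Jaccard similarity, the greedy grouping, and the threshold below are assumptions for illustration, not the paper's actual affinity measure or clustering procedure:

```python
# Illustrative sketch of bottom-up idea categorizing via keyword similarity.
# The affinity measure (Jaccard) and threshold are assumptions, not the paper's exact formulas.

def idea_similarity(keywords_a: set[str], keywords_b: set[str]) -> float:
    """Jaccard similarity between two ideas represented as keyword sets."""
    if not keywords_a or not keywords_b:
        return 0.0
    return len(keywords_a & keywords_b) / len(keywords_a | keywords_b)

def cluster_ideas(ideas: dict[str, set[str]], threshold: float = 0.5) -> list[set[str]]:
    """Greedy bottom-up grouping: an idea joins the first cluster whose
    representative idea is similar enough, otherwise it starts a new cluster."""
    clusters: list[set[str]] = []          # each cluster is a set of idea ids
    representatives: list[str] = []        # one representative idea per cluster
    for idea_id, kws in ideas.items():
        for i, rep in enumerate(representatives):
            if idea_similarity(kws, ideas[rep]) >= threshold:
                clusters[i].add(idea_id)
                break
        else:
            clusters.append({idea_id})
            representatives.append(idea_id)
    return clusters

ideas = {
    "i1": {"meeting", "agenda", "schedule"},
    "i2": {"agenda", "schedule", "calendar"},
    "i3": {"budget", "cost"},
}
print(cluster_ideas(ideas))   # e.g. [{'i1', 'i2'}, {'i3'}]
```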


A Knowledge-based Interactive Idea Categorizer for Electronic Meeting Systems

  • Kim, Jae-Kyeong;Lee, Jae-Kwang
    • Journal of Intelligence and Information Systems
    • /
    • v.6 no.2
    • /
    • pp.63-76
    • /
    • 2000
  • Research on group decision making and electronic meeting systems has been increasing rapidly with the spread of Internet technology. Among the various issues raised in empirical research, we address idea categorizing in the group decision-making process of electronic meeting systems. Idea categorizing in existing group decision support systems was performed in a top-down procedure, mostly through participants' manual work. As a result, categorizing ideas took as long as generating them, a single idea was clustered into multiple categories, and nearly identical redundant categories were produced. Because such methods have critical limitations in electronic meeting systems, we suggest an intelligent idea categorizing methodology that takes a bottom-up approach. The method consists of representing ideas with keywords, identifying keyword affinity, computing similarity among ideas, and clustering the ideas. The methodology allows participants to interact iteratively to clarify ambiguous ideas. We also developed a prototype system, IIC (Intelligent Idea Categorizer), and evaluated its performance through comparison experiments with other systems. IIC is not a general-purpose system, but it produces good results in a given specific domain.


Representative Keyword Extraction from Few Documents through Fuzzy Inference (퍼지추론을 이용한 소수 문서의 대표 키워드 추출)

  • 노순억;김병만;허남철
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.11 no.9
    • /
    • pp.837-843
    • /
    • 2001
  • In this work, we propose a new method of extracting and weighting representative keywords (RKs) from a few documents that might interest a user. To extract RKs, we first extract candidate terms and then choose a number of terms, called initial representative keywords (IRKs), from them through fuzzy inference. Then, by expanding and reweighting the IRKs using term co-occurrence similarity, the final RKs are obtained. The performance of our approach depends heavily on how effectively the IRKs are selected, so we use fuzzy inference, which is well suited to handling the uncertainty inherent in selecting representative keywords of documents. The problem addressed in this paper can be viewed as one of calculating the center of document vectors. To show the usefulness of our approach, we therefore compare it with two well-known methods, Rocchio and Widrow-Hoff, on a number of document collections. The results show that our approach outperforms the other approaches.
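
The expansion-and-reweighting step can be sketched as follows. The co-occurrence similarity formula, the toy documents, and the assumed IRK weights are illustrative; the fuzzy inference that selects the IRKs is not reproduced:

```python
# Sketch of the expansion/reweighting step: terms that co-occur often with the
# initial representative keywords (IRKs) receive higher weights. The fuzzy-inference
# step that selects the IRKs is not reproduced here; counts and weights are illustrative.
from collections import Counter
from itertools import combinations

docs = [
    ["fuzzy", "inference", "keyword", "extraction"],
    ["keyword", "extraction", "document", "weighting"],
    ["fuzzy", "keyword", "weighting"],
]

# term and pair document frequencies over the small collection
term_freq = Counter(t for doc in docs for t in set(doc))
pair_freq = Counter(frozenset(p) for doc in docs for p in combinations(set(doc), 2))

def cooccurrence_sim(a: str, b: str) -> float:
    """Simple co-occurrence similarity: df(a and b) / min(df(a), df(b))."""
    if a == b:
        return 1.0
    return pair_freq[frozenset((a, b))] / min(term_freq[a], term_freq[b])

irks = {"keyword": 1.0, "fuzzy": 0.8}           # assumed initial keywords and weights
candidates = set(term_freq) - set(irks)

# reweight every candidate by its best similarity to an IRK, scaled by that IRK's weight
expanded = {
    t: max(w * cooccurrence_sim(t, k) for k, w in irks.items())
    for t in candidates
}
print(sorted(expanded.items(), key=lambda kv: -kv[1]))
```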


Clustering of Web Objects with Similar Popularity Trends (유사한 인기도 추세를 갖는 웹 객체들의 클러스터링)

  • Loh, Woong-Kee
    • The KIPS Transactions:PartD
    • /
    • v.15D no.4
    • /
    • pp.485-494
    • /
    • 2008
  • Huge amounts of various web items such as keywords, images, and web pages are being made widely available on the Web. The popularities of such web items continuously change over time, and mining temporal patterns in the popularities of web items is an important problem that is useful for several web applications. For example, temporal patterns in the popularities of search keywords help web search enterprises predict future popular keywords, enabling them to make price decisions when marketing search keywords to advertisers. However, the presence of millions of web items makes it difficult to scale up previous techniques for this problem. This paper proposes an efficient method for mining temporal patterns in the popularities of web items. We treat the popularities of web items as time series and propose a gap measure to quantify the similarity between the popularities of two web items. To reduce the computation overhead of this measure, an efficient method using the Fast Fourier Transform (FFT) is presented. We do not assume that the popularities of web items follow any particular probabilistic distribution or are periodic. To find clusters of web items with similar popularity trends, we propose to use a density-based clustering algorithm based on the gap measure. Our experiments using the popularity trends of search keywords obtained from the Google Trends web site illustrate the scalability and usefulness of the proposed approach in real-world applications.
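
A rough sketch of the overall pipeline (a Fourier-based distance between popularity series fed into density-based clustering over a precomputed distance matrix) is given below. This is not the paper's gap measure; the distance definition, the synthetic series, and the DBSCAN parameters are assumptions:

```python
# Sketch: an FFT-based distance between popularity time series, fed to a
# density-based clustering algorithm over a precomputed distance matrix.
# This is NOT the paper's gap measure; the distance and DBSCAN parameters are assumptions.
import numpy as np
from sklearn.cluster import DBSCAN

def fft_distance(x: np.ndarray, y: np.ndarray, n_coeffs: int = 8) -> float:
    """Euclidean distance between the leading Fourier coefficient magnitudes
    of two (z-normalized) popularity series."""
    def spectrum(s):
        s = (s - s.mean()) / (s.std() + 1e-9)
        return np.abs(np.fft.rfft(s))[:n_coeffs]
    return float(np.linalg.norm(spectrum(x) - spectrum(y)))

rng = np.random.default_rng(0)
t = np.arange(104)                                # e.g. two years of weekly popularity
series = np.stack([
    np.sin(2 * np.pi * t / 52) + 0.1 * rng.normal(size=t.size),   # yearly cycle
    np.sin(2 * np.pi * t / 52) + 0.1 * rng.normal(size=t.size),   # similar cycle
    np.linspace(0, 1, t.size) + 0.1 * rng.normal(size=t.size),    # rising trend
])

n = len(series)
dist = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = fft_distance(series[i], series[j])

labels = DBSCAN(eps=1.0, min_samples=2, metric="precomputed").fit_predict(dist)
print(labels)      # items with similar trends share a cluster label; -1 marks noise
```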

The Study on Recent Research Trend in Korean Tourism Using Keyword Network Analysis (키워드 네트워크를 이용한 국내 관광연구의 최근 연구동향 분석)

  • Kim, Min Sun;Um, Hyemi
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.9
    • /
    • pp.68-73
    • /
    • 2016
  • This study was conducted to identify trends and knowledge structures in recent Korean tourism research from 2010 to 2015 using keyword data. To accomplish this, we constructed a network using keywords extracted from KCI journals. We then built a matrix with papers as rows and keywords as columns. The keyword network showed the connectivity of papers that share one or more keywords. Major keywords were then extracted using the cosine similarity between co-occurring keywords, and the network's components were analyzed to understand research trends and knowledge structure. The results revealed that the subjects of tourism research have changed rapidly and in various directions. A few topics related to 'organization-employee' were major trends for several years, but intrinsic and extrinsic factors have since been further subdivided, and employees in specific fields have been targeted as subjects of research. Component analysis is useful for analyzing concrete research topics and the relationships between them. The results of this study will be useful for researchers attempting to identify new topics.
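
The paper-keyword matrix and the cosine similarity between co-occurring keywords can be sketched as follows; the keyword lists are illustrative, not the study's KCI data:

```python
# Sketch: build a paper-by-keyword incidence matrix and compute cosine similarity
# between keyword columns (i.e., keywords that co-occur in the same papers).
# The keyword lists are illustrative, not from the study's KCI data.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

papers = [
    ["tourism", "organization", "employee"],
    ["tourism", "employee", "job satisfaction"],
    ["tourism", "destination", "image"],
]
keywords = sorted({k for p in papers for k in p})

# rows = papers, columns = keywords, 1 if the keyword appears in the paper
matrix = np.array([[1 if k in p else 0 for k in keywords] for p in papers])

# cosine similarity between keyword columns gives the keyword co-occurrence network
sim = cosine_similarity(matrix.T)
i, j = keywords.index("organization"), keywords.index("employee")
print(f"sim(organization, employee) = {sim[i, j]:.2f}")
```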

A Convergence Study of the Research Trends on Stress Urinary Incontinence using Word Embedding (워드임베딩을 활용한 복압성 요실금 관련 연구 동향에 관한 융합 연구)

  • Kim, Jun-Hee;Ahn, Sun-Hee;Gwak, Gyeong-Tae;Weon, Young-Soo;Yoo, Hwa-Ik
    • Journal of the Korea Convergence Society
    • /
    • v.12 no.8
    • /
    • pp.1-11
    • /
    • 2021
  • The purpose of this study was to analyze the trends and characteristics of 'stress urinary incontinence' research through word frequency analysis and to model the relationships among terms using word embedding. Abstract data from 9,868 papers containing abstracts in PubMed's MEDLINE were extracted using a Python program. Through frequency analysis, 10 keywords were then selected according to their high frequency. The similarity of words related to the keywords was analyzed with the Word2Vec machine learning algorithm. The locations and distances of words were visualized using the t-SNE technique, and the resulting groups were classified and analyzed. The number of studies related to stress urinary incontinence has increased rapidly since the 1980s. The keywords used most frequently in the abstracts were 'woman', 'urethra', and 'surgery'. Through Word2Vec modeling, words such as 'female', 'urge', and 'symptom' were among those that showed the highest relevance to the keywords in research on stress urinary incontinence. In addition, through the t-SNE technique, keywords and related words could be classified into three groups centered on symptoms, anatomical characteristics, and surgical interventions of stress urinary incontinence. This study is the first to examine trends in stress urinary incontinence research using keyword frequency analysis and word embedding of abstracts. The results of this study can be used as a basis for future researchers to select subjects and directions in research fields related to stress urinary incontinence.
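
A minimal sketch of this pipeline, using a toy corpus in place of the 9,868 MEDLINE abstracts, might look like the following; the Word2Vec and t-SNE parameters are assumptions:

```python
# Sketch of the abstract's pipeline: train Word2Vec on tokenized abstracts,
# query terms related to a keyword, and project the vectors with t-SNE.
# The toy corpus stands in for the real PubMed abstracts; parameters are assumptions.
from gensim.models import Word2Vec
from sklearn.manifold import TSNE
import numpy as np

corpus = [                                   # tokenized abstract sentences (toy data)
    ["stress", "urinary", "incontinence", "woman", "urethra"],
    ["surgery", "sling", "urethra", "outcome"],
    ["symptom", "urge", "incontinence", "female", "treatment"],
]

model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50, seed=0)

# words most related to a frequent keyword
print(model.wv.most_similar("urethra", topn=3))

# 2-D projection of all word vectors with t-SNE (perplexity must be < vocabulary size)
words = list(model.wv.index_to_key)
vectors = np.array([model.wv[w] for w in words])
coords = TSNE(n_components=2, perplexity=3, random_state=0).fit_transform(vectors)
print(dict(zip(words, coords.round(2).tolist())))
```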

Research on the development of demand for medical and bio technology using big data (빅데이터 활용 의학·바이오 부문 사업화 가능 기술 연구)

  • Lee, Bongmun;Nam, Gayoung;Kang, Byeong Chul;Kim, CheeYong
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.2
    • /
    • pp.345-352
    • /
    • 2022
  • AI-based convergence businesses have expanded with the growth of ICT-converged medical devices. In addition, AI-based medical devices are helping to shift the existing treatment-centered medical system toward a paradigm of customized treatment, such as preliminary diagnosis and prevention, which in turn is driving change across the medical device industry. Current demand forecasting for the commercialization of medical biotechnology relies on Delphi and AHP methods, but their results fluctuate with the pool of participants, making generalization difficult. Therefore, the purpose of this paper is to forecast demand for identifying promising technologies by building up big data in medical biotechnology. The method takes candidate technologies from keywords extracted from SCOPUS and uses word2vec to derive analysis indicators, technological distance similarity, and recommended technological similarity for top-level items in order to achieve a reasonable result. In addition, the method builds up academic big data covering five years (2016-2020) in order to identify commercializable technologies from a demand perspective. Lastly, the paper draws on global data studies in order to explore domestic and international demand for technology discovery in the medical biotechnology field.
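
One way such a word2vec-based technological distance indicator could be computed is sketched below; the keyword vectors, technology keyword sets, and the distance definition are assumptions for illustration, not the paper's exact indicators:

```python
# Sketch of a "technological distance" indicator: each candidate technology is the
# average of its keyword vectors (e.g., from a word2vec model trained on SCOPUS
# keywords), and distance is one minus cosine similarity. The vectors here are made up.
import numpy as np

def tech_vector(keyword_vectors: dict[str, np.ndarray], keywords: list[str]) -> np.ndarray:
    """Represent a technology as the mean of its keyword embedding vectors."""
    return np.mean([keyword_vectors[k] for k in keywords], axis=0)

def tech_distance(v1: np.ndarray, v2: np.ndarray) -> float:
    """Technological distance = 1 - cosine similarity of the technology vectors."""
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return float(1.0 - cos)

rng = np.random.default_rng(42)
vocab = ["biosensor", "diagnosis", "imaging", "ai", "genome", "sequencing"]
keyword_vectors = {w: rng.normal(size=50) for w in vocab}    # stand-in for word2vec output

tech_a = tech_vector(keyword_vectors, ["biosensor", "diagnosis", "ai"])
tech_b = tech_vector(keyword_vectors, ["genome", "sequencing", "ai"])
print(f"distance(A, B) = {tech_distance(tech_a, tech_b):.3f}")
```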

Deep Learning Based Semantic Similarity for Korean Legal Field (딥러닝을 이용한 법률 분야 한국어 의미 유사판단에 관한 연구)

  • Kim, Sung Won;Park, Gwang Ryeol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.2
    • /
    • pp.93-100
    • /
    • 2022
  • Keyword-oriented search methods are mainly used for data search, but they are not well suited to the legal field, where specialized terms are widely used. In response, this paper proposes an effective data search method for the legal field. We describe embedding methods optimized for determining similarity between sentences in the legal domain. After embedding legal sentences either by keywords using TF-IDF or semantically using the Universal Sentence Encoder, we propose an optimal way to search for data by combining BERT models to check the similarity between sentences in the legal field.
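
The keyword-based (TF-IDF) half of such a search can be sketched as follows; the legal sentences and the query are illustrative, and the Universal Sentence Encoder and BERT components described in the abstract are not reproduced:

```python
# Sketch of the keyword-based half of the search: embed legal sentences with TF-IDF
# and rank them by cosine similarity to a query. Sentences and the query are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

sentences = [
    "The defendant shall compensate the plaintiff for damages arising from the breach.",
    "The lessee must return the premises upon termination of the lease.",
    "Damages for breach of contract are limited to foreseeable losses.",
]
query = ["compensation for breach of contract damages"]

vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(sentences)       # fit on the sentence corpus
query_vector = vectorizer.transform(query)              # embed the query in the same space

scores = cosine_similarity(query_vector, doc_vectors).ravel()
ranked = sorted(zip(scores, sentences), reverse=True)
for score, sent in ranked:
    print(f"{score:.2f}  {sent}")
```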

Applying Genomic Sequence Alignment Methodology for Source Codes Plagiarism Detection (유전체 서열의 정렬 기법을 이용한 소스 코드 표절 검사)

  • 강은미;황미녕;조환규
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.9 no.3
    • /
    • pp.352-367
    • /
    • 2003
  • The syntactic and semantic characteristics of a computer program can be represented by the keyword sequence extracted from its source code. Therefore, the similarities and differences between two programs can be clearly identified by comparing the keyword sequences obtained from the given programs. Various methods for measuring the similarity of two different sequences have already been studied intensively in bioinformatics for manipulating biological genetic sequences. In this paper, we propose a new method for measuring the similarity of two different programs and detecting partial plagiarism by exploiting these sequence alignment techniques. To evaluate the performance of the proposed method, we experimented with actual program code submitted by 70 students attending a Data Structures course in 2001. The experimental results show that the proposed method is more effective and powerful than the fingerprint method, which is the most commonly used method for plagiarism detection.
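
A generic global alignment score over keyword sequences, in the spirit of the genomic alignment techniques mentioned above, can be sketched as follows; the match/mismatch/gap scoring and the token sequences are assumptions, not the paper's exact parameters:

```python
# Sketch: a Needleman-Wunsch-style global alignment score over two keyword
# sequences extracted from source code. The scoring scheme (match/mismatch/gap)
# is an assumption, not the paper's exact parameters.
def alignment_score(seq_a, seq_b, match=2, mismatch=-1, gap=-1):
    """Dynamic-programming global alignment score of two token sequences."""
    n, m = len(seq_a), len(seq_b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = dp[i - 1][j - 1] + (match if seq_a[i - 1] == seq_b[j - 1] else mismatch)
            dp[i][j] = max(diag, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]

# keyword sequences extracted from two submissions (illustrative)
prog_a = ["for", "if", "return", "while", "if", "return"]
prog_b = ["for", "if", "while", "if", "return"]

score = alignment_score(prog_a, prog_b)
best = alignment_score(prog_a, prog_a)
print(f"similarity = {score / best:.2f}")     # normalized by the self-alignment score
```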

Semi-automatic Data Fusion Method for Spatial Datasets (공간 정보를 가지는 데이터셋의 준자동 융합 기법)

  • Yoon, Jong-chan;Kim, Han-joon
    • The Journal of Society for e-Business Studies
    • /
    • v.26 no.4
    • /
    • pp.1-13
    • /
    • 2021
  • With the development of big data-related technologies, it has become possible to process vast amounts of data that could not be processed before. Accordingly, establishing an automated data selection and fusion process for realizing big data-based services has become a necessity, not an option. In this paper, we propose a semi-automated technique for creating meaningful new information by fusing datasets that contain spatial information. First, the given datasets are embedded by using the Node2Vec model and the keywords of each dataset. Then, the semantic similarities among all datasets are obtained by calculating the cosine similarity for the embedding vectors of each pair of datasets. Next, a person intervenes to select candidate datasets that have one or more spatial identifiers from among the dataset pairs with relatively high similarity, and the selected pairs are fused and visualized. Through this semi-automatic data fusion process, we show that significant fused information that cannot be obtained from a single dataset can be generated.
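
The similarity computation and candidate selection steps can be sketched as follows; the dataset names, embedding vectors, and threshold are illustrative assumptions, and the Node2Vec embedding itself is not reproduced:

```python
# Sketch of the candidate-selection step: compute cosine similarity between dataset
# embedding vectors (e.g., produced by Node2Vec over a keyword graph) and keep pairs
# above a threshold in which at least one dataset has a spatial identifier.
# Embeddings, the threshold, and dataset names are illustrative assumptions.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(7)
datasets = {
    "bus_stops":       {"vector": rng.normal(size=32), "spatial": True},
    "traffic_volume":  {"vector": rng.normal(size=32), "spatial": False},
    "air_quality":     {"vector": rng.normal(size=32), "spatial": True},
}

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

THRESHOLD = 0.0   # illustrative; a real pipeline would tune this
candidates = []
for (name_a, a), (name_b, b) in combinations(datasets.items(), 2):
    sim = cosine(a["vector"], b["vector"])
    if sim >= THRESHOLD and (a["spatial"] or b["spatial"]):
        candidates.append((name_a, name_b, round(sim, 3)))

# candidate pairs are then reviewed by a person before fusion and visualization
print(sorted(candidates, key=lambda c: -c[2]))
```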