• Title/Summary/Keyword: Semantic word network

Search Result 114, Processing Time 0.031 seconds

Question Similarity Measurement of Chinese Crop Diseases and Insect Pests Based on Mixed Information Extraction

  • Zhou, Han;Guo, Xuchao;Liu, Chengqi;Tang, Zhan;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.11
    • /
    • pp.3991-4010
    • /
    • 2021
  • The Question Similarity Measurement of Chinese Crop Diseases and Insect Pests (QSM-CCD&IP) aims to judge the user's tendency to ask questions regarding input problems. The measurement is the basis of the Agricultural Knowledge Question and Answering (Q & A) system, information retrieval, and other tasks. However, the corpus and measurement methods available in this field have some deficiencies. In addition, error propagation may occur when the word boundary features and local context information are ignored when the general method embeds sentences. Hence, these factors make the task challenging. To solve the above problems and tackle the Question Similarity Measurement task in this work, a corpus on Chinese crop diseases and insect pests(CCDIP), which contains 13 categories, was established. Then, taking the CCDIP as the research object, this study proposes a Chinese agricultural text similarity matching model, namely, the AgrCQS. This model is based on mixed information extraction. Specifically, the hybrid embedding layer can enrich character information and improve the recognition ability of the model on the word boundary. The multi-scale local information can be extracted by multi-core convolutional neural network based on multi-weight (MM-CNN). The self-attention mechanism can enhance the fusion ability of the model on global information. In this research, the performance of the AgrCQS on the CCDIP is verified, and three benchmark datasets, namely, AFQMC, LCQMC, and BQ, are used. The accuracy rates are 93.92%, 74.42%, 86.35%, and 83.05%, respectively, which are higher than that of baseline systems without using any external knowledge. Additionally, the proposed method module can be extracted separately and applied to other models, thus providing reference for related research.

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu;Lee, Donghoon;Choi, Hochang;Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.42 no.2
    • /
    • pp.471-492
    • /
    • 2017
  • The demand and interest in big data analytics are increasing rapidly. The concepts around big data include not only existing structured data, but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text data have gained particular attention because it is the most representative method to describe and deliver information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be displayed through word cloud, word network, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics from the rapidly increasing text data generated through various social media. Thus, research on and applications of topic modeling have been actively carried out in various fields since topic modeling is able to extract the core topics from a huge amount of unstructured text documents and provide the document groups for each different topic. In this paper, we review the major techniques and research trends of text analysis. Further, we also introduce some cases of applications that solve the problems in various fields by using topic modeling.

Comparing the Structure of Secondary School Students' Perception of the Meaning of 'Experiment' in Science and Biology (중등학생들의 과학과 생물에서의 '실험'의 의미에 대한 인식구조 비교)

  • Lee, Jun-Ki;Shin, Sein;Ha, Minsu
    • Journal of The Korean Association For Science Education
    • /
    • v.35 no.6
    • /
    • pp.997-1006
    • /
    • 2015
  • Perception of the experiment is one of the most important factors of students' understanding of scientific inquiry and the nature of science. This study examined the perception of middle and high school students of the meaning of 'experiment' in the biological sciences. Semantic network analysis (SNA) was especially used to visualize students' perception structure in this study. One hundred and ninety middle school students and 200 high school students participated in this study. Students responded to two questions on the meaning of 'experiment' in science and biology. This study constructed four semantic networks based on the collected response. As a result, middle school students about the 'experiment' in science are 'we', 'direct', 'principle' of such words was aware of the experiments from the center to the active side. The high school students' 'theory', 'true', 'information' were recognized as an experiment that explores the process of creating a knowledge center including the word. In addition, middle school students relative to 'experiment' of the creature around the 'dissection', 'body', high school students were recognized as 'life', 'observation' observation activities dealing with the living organisms and recognized as a core. The results of this study will be used as important evidence in the future to map out an experiment in biological science curriculum.

Perceived Characteristics of Grains during the Choseon Dynasty - A Study Applying Text Frequency Analysis Using the Choseonwangjoshilrok Data - (조선왕조실록 텍스트 빈도 분석을 통한 조선시대 곡물에 관한 인식 특성 고찰)

  • Mi-Hye, Kim
    • Journal of the Korean Society of Food Culture
    • /
    • v.38 no.1
    • /
    • pp.26-37
    • /
    • 2023
  • This study applied the text frequency method to analyze the crops prevalent during the Chosunwangjoshilrok dynasty, and categorized the results by each king. Contemporary perception of grains was observed by examining the staple crop types. Staple species were examined using the word cloud and semantic network analysis. Totally, 101,842 types of crop consumption were recorded during the Chosunwangjoshilrok period. Of these, 51,337 (50.4%) were grains, 50,407 (49.5%) were beans, and 98 (0.1%) were seeds. Rice was the most frequently consumed grain (37.1%), followed by pii (11.9%), millet (11.3%), barley (4.5%), proso (0.8%), wheat (0.6%), buckwheat (0.1%), and adlay (0.05%). Grain chronological frequency in the Choseon dynasty was determined to be 15,520 cases in the 15th century (30.2%), 11,201 cases in the 18th century (21.8%), 9,421 cases in the 17th century (18.4%), 9,113 cases in the 16th century (17.8%), and 6,082 cases in the 19th century (11.8%). Interest in grain amongst the 27 kings of Choseon was evaluated based on the frequency of records. The 15th century King Sejong recorded the maximum interest with 13,363 cases (13.1%), followed by King Jungjo (8,501 cases in the 18th century; 8.4%), King Sungjong (7,776 cases in the 15th century; 7.6%).

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, it has been employed to discover new market and/or technology opportunities and support rational decision making of business participants. The market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for specific product level-market information. However, the information has been generally provided at industry level or broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than that of previously offered. We applied Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into suitable form for applying Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec and then the product groups are derived by extracting similar products names based on cosine similarity calculation. Finally, the sales data on the extracted products is summated to estimate the market size of the product groups. As an experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped in multidimensional vector space by Word2Vec training. We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.

Question Retrieval using Deep Semantic Matching for Community Question Answering (심층적 의미 매칭을 이용한 cQA 시스템 질문 검색)

  • Kim, Seon-Hoon;Jang, Heon-Seok;Kang, In-Ho
    • Annual Conference on Human and Language Technology
    • /
    • 2017.10a
    • /
    • pp.116-121
    • /
    • 2017
  • cQA(Community-based Question Answering) 시스템은 온라인 커뮤니티를 통해 사용자들이 질문을 남기고 답변을 작성할 수 있도록 만들어진 시스템이다. 신규 질문이 인입되면, 기존에 축적된 cQA 저장소에서 해당 질문과 가장 유사한 질문을 검색하고, 그 질문에 대한 답변을 신규 질문에 대한 답변으로 대체할 수 있다. 하지만, 키워드 매칭을 사용하는 전통적인 검색 방식으로는 문장에 내재된 의미들을 이용할 수 없다는 한계가 있다. 이를 극복하기 위해서는 의미적으로 동일한 문장들로 학습이 되어야 하지만, 이러한 데이터를 대량으로 확보하기에는 어려움이 있다. 본 논문에서는 질문이 제목과 내용으로 분리되어 있는 대량의 cQA 셋에서, 질문 제목과 내용을 의미 벡터 공간으로 사상하고 두 벡터의 상대적 거리가 가깝게 되도록 학습함으로써 의사(pseudo) 유사 의미의 성질을 내재화 하였다. 또한, 질문 제목과 내용의 의미 벡터 표현(representation)을 위하여, semi-training word embedding과 CNN(Convolutional Neural Network)을 이용한 딥러닝 기법을 제안하였다. 유사 질문 검색 실험 결과, 제안 모델을 이용한 검색이 키워드 매칭 기반 검색보다 좋은 성능을 보였다.

  • PDF

The Application and Evaluation of Verbal Lexical-Semantic Network Using Automatic Word Clustering (단어클러스터링을 이용한 동사 어휘의미망의 활용 및 평가)

  • Kim, Hae-Gyung;Yoon, Ae-Sun
    • Proceedings of the Korean Society for Cognitive Science Conference
    • /
    • 2006.06a
    • /
    • pp.1-7
    • /
    • 2006
  • 최근 수년간 한국어를 위한 어휘의미망에 대한 관심은 꾸준히 높아지고 있지만, 그 결과물을 어떻게 평가하고 활용할 것인가에 대한 방안은 이루어지지 않고 있다. 본 논문에서는 단어클러스터링 시스템 개발을 통하여, 어휘의미망에 의해 확장되기 전후의 클러스터링을 수행하여 데이터를 서로 비교하였다. 단어클러스터링 시스템 개발을 위해 사용된 학습 데이터는 신문 말뭉치 기사로 총 68,455,856 어절 규모이며, 특성벡터와 벡터공간모델을 이용하여 시스템A를 완성하였다. 시스템B는 구축된 '[-하]동사류' 3,656개의 어휘의미를 포함하는 동사어휘의미망을 포함하여 확장된 것으로 확장대상정보를 선택하여 특성벡터를 재구성한다. 대상이 되는 실험 데이터는 '다국어 어휘의미망-코어넷'으로 클러스터링 결과 나타난 어휘들의 세 번째 층위까지의 노드 동일성 여부로 정확률 검수를 하였다. 같은 환경에서 시스템A와 시스템B를 비교한 결과 단어클러스터링의 정확률이 45.3%에서 46.6%로의 향상을 보였다. 향후 연구는 어휘의미망을 활용하여 좀 더 다양한 시스템에 체계적이고 폭넓은 평가를 통해 전산시스템의 향상은 물론, 연구되고 있는 많은 어휘의미망에 의미 있는 평가 방안을 확대시켜 나가야 할 것이다.

  • PDF

Design and Implementation of Computational Model Simulating Language Phenomena in Lexical Decision Task (어휘판단 과제 시 보이는 언어현상의 계산주의적 모델 설계 및 구현)

  • Park, Kinam;Lim, Heuiseok;Nam, Kichun
    • The Journal of Korean Association of Computer Education
    • /
    • v.9 no.2
    • /
    • pp.89-99
    • /
    • 2006
  • This paper proposes a computational model which can simulate peculiar language phenomena observed in human lexical decision task. The model is designed to mimic major language phenomena such as frequency effect, lexical status effect, word similarity, and semantic priming effect. The experimental results show that the propose model replicated the major language phenomena and performed similar performance with that of human in LDT.

  • PDF

Semantic Information Retrieval using User-Word Intelligent Network (사용자 어휘지능망을 이용한 의미적 정보검색)

  • Kim, Chang-Hwan;Im, Ji-Hui;Choe, Ho-Seop;Yoon, Hwa-Mook;Ock, Cheol-Young
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2006.11a
    • /
    • pp.157-160
    • /
    • 2006
  • 웹 자원이 방대함에 따라, 사용자가 원하는 정보를 얼마나 정확하게 제시하느냐가 정보검색시스템 성능을 판단하는 기준이 된다. 그러나 동형이의어만을 질의어로 이용한 검색 결과는 동형이의어 각 의미에 관련된 문서가 혼재되어 있거나, 특정 의미에 관련된 문서가 집중적으로 나타나는 현상을 볼 수 있다. 이에 본 논문에서는 한국어 사용자 어휘지능망(U-WIN)의 관계정보를 이용하여 질의어의 모호성을 해결하고 의미적 정보검색의 기반을 마련하고자 한다. 우선, 전문분야에 주로 사용되는 동형이의어와 보편적으로 사용하는 동형의어를 구번하여 질의어로 선정하고, '질의어+상위어' 형태의 확장 질의어에 대해 두 개의 포탈사이트(Google, Naver)를 대상으로 웹 문서를 검색하여 정확률이 각각 81.5%(Naver), 65.5%(Google)로 나타났다.

  • PDF

A Big Data Analysis of Public Interest in Defense Reform 2.0 and Suggestions for Policy Completion

  • Kim, Tae Kyoung;Kang, Wonseok
    • Journal of East Asia Management
    • /
    • v.4 no.1
    • /
    • pp.1-22
    • /
    • 2023
  • This study conducted a big data analysis study through text mining and semantic network analysis to explore the perception of defense reform 2.0. The collected data were analyzed with the top 70 keywords as the appropriate range for network visualization. Through word frequency analysis, connection centrality analysis, and an N-gram analysis, we identified issues that received much attention such as troop reduction, shortening of military service period, dismantling of the border area unit, and returning wartime operational control. In particular, the results of clustering words through CONCOR analysis showed that there was a great interest in pursuing the technical group, concerns about military capacity reduction, and reorganization of manpower structure. The results of the analysis through text mining techniques are as follows. First, it was found that there was a lack of awareness about measures to reinforce the reduced troops while receiving much attention to the reduction of troops in Defense Reform 2.0. Second, it was found that it is necessary to actively communicate with the local community due to the deconstruction and movement of the border area units, such as the decrease of the population of the region and the collapse of the local commercial area. Third, it was judged that it is necessary to show substantial results through the promotion of barracks culture and the defense industry, which showed that there was less interest than military structure and defense operation from the people and the introduction of active policies. Through this study, we analyzed the public's interest in defense reform 2.0, which is a representative defense policy, and suggested a plan to draw support for national policy.