• Title/Summary/Keyword: co-occurrence words

A Method for Information Source Selection using Thesaurus for Distributed Information Retrieval

  • Goto, Shoji;Ozono, Tadachika;Shintani, Toramatsu
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2001.01a
    • /
    • pp.272-277
    • /
    • 2001
  • In this paper, we describe a new method for selecting information sources in a distributed environment. Recently, there has been much research on distributed information retrieval, that is, information retrieval (IR) based on a multi-database model in which the existence of multiple sources is modeled explicitly. Distributed IR needs a method for selecting appropriate sources for users' queries. Most existing methods use statistical data such as document frequency. These methods may select inappropriate sources if a query contains polysemous words. In this paper, we describe an information-source selection method using two types of thesaurus. One is a thesaurus automatically constructed from the documents in a source. The other is a hand-crafted general-purpose thesaurus (e.g., WordNet). The terms used in the documents of one source differ from those of another, and the meaning of a term differs depending on the situation in which the term is used. This difference is a characteristic of the source. In our method, the meanings of a term are distinguished by the relationships between the term and other terms, and these relationships appear in the co-occurrence-based thesaurus. In this paper, we describe an algorithm for evaluating the usefulness of a source for a query based on a thesaurus. As a practical application of our method, we have developed Papits, a multi-agent-based information sharing system. A selection experiment shows that our method is effective for selecting appropriate sources.

A study on research trends for gestational diabetes mellitus and breastfeeding: Focusing on text network analysis and topic modeling (임신성 당뇨와 모유수유에 대한 연구 동향 분석: 텍스트네트워크 분석과 토픽모델링 중심)

  • Lee, Junglim;Kim, Youngji;Kwak, Eunju;Park, Seungmi
    • The Journal of Korean Academic Society of Nursing Education
    • /
    • v.27 no.2
    • /
    • pp.175-185
    • /
    • 2021
  • Purpose: The aim of this study was to identify core keywords and topic groups in the 'Gestational diabetes mellitus (GDM) and Breastfeeding' field of research for better understanding research trends in the past 20 years. Methods: This was a text-mining and topic modeling study composed of four steps: 1) collecting abstracts, 2) extracting and cleaning semantic morphemes, 3) building a co-occurrence matrix, and 4) analyzing network features and clustering topic groups. Results: A total of 635 papers published between 2001 and 2020 were found in databases (Web of Science, CINAHL, RISS, DBPIA, RISS, KISS). Among them, 3,639 words extracted from 366 articles selected according to the conditions were analyzed by text network analysis and topic modeling. The most important keywords were 'exposure', 'fetus', 'hypoglycemia', 'prevention' and 'program'. Six topic groups were identified through topic modeling. The main topics of the study were 'cardiovascular disease' and 'obesity'. Through the topic modeling analysis, six themes were derived: 'cardiovascular disease', 'obesity', 'complication prevention strategy', 'support of breastfeeding', 'educational program' and 'management of GDM'. Conclusion: This study showed that over the past 20 years many studies have been conducted on complications such as cardiovascular diseases and obesity related to gestational diabetes and breastfeeding. In order to prevent complications of gestational diabetes and promote breastfeeding, various nursing interventions, including gestational diabetes management and educational programs for GDM pregnancies, should be developed in nursing fields.
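
Step 3 of the method above (building a co-occurrence matrix) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the abstracts have already been reduced to cleaned keyword lists, counts each keyword pair once per abstract, and uses a hypothetical function name `cooccurrence_counts` and made-up sample data.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count how many documents each unordered keyword pair appears in together."""
    counts = Counter()
    for tokens in documents:
        # Use the distinct keywords so a pair is counted once per document;
        # sorting gives every pair a single canonical (a, b) order.
        unique = sorted(set(tokens))
        for a, b in combinations(unique, 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical, already-cleaned keyword lists (illustrative, not the study's data).
docs = [
    ["gdm", "breastfeeding", "obesity"],
    ["gdm", "obesity", "prevention"],
    ["breastfeeding", "program", "gdm"],
]
counts = cooccurrence_counts(docs)
```

The resulting pair counts are the off-diagonal entries of a symmetric keyword-by-keyword matrix, which can then be passed to a network analysis tool.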

Korea's Trade Rules Analysis using Topic Modeling : from 2000 to 2022 (토픽 모델링을 이용한 한국 무역규범 연구동향 분석 : 2000년~2022년)

  • Byeong-Ho Lim;Jeong-In Chang;Tae-Han Kim;Ha-Neul Han
    • Korea Trade Review
    • /
    • v.48 no.1
    • /
    • pp.55-81
    • /
    • 2023
  • The purpose of this study is to analyze the main issues and trends of Korean trade, and to draw implications for future research regarding trade rules. A total of 476 academic journal articles, retrieved from the Korean Journal Citation Index database with the English keyword 'Trade Rules' for the period from 2000 to July 2022, are analyzed. The analysis methodology includes co-occurrence network analysis and topic trend analysis, both of which are text mining methods. The results show that the keywords representing Korea's trade trends fall into four categories in which the number of research articles has rapidly increased: Topic 4 (Investment Treaty), Topic 7 (Trade Security), Topic 8 (China's Protectionism), and Topic 11 (Trade Settlement). The major background for these topics is the tension between the United States and China, which threatens the existing international trade system. Detailed studies of China's protectionism, changes in the trade security system, new investment agreements, and changes in payment methods will be the challenges in the near future.

Analysis of Research Trends of Explosion Accidents Using Co-Occurrence Keyword Analysis (동시출현 핵심단어 분석을 활용한 폭발사고 연구 동향 분석)

  • Youngwoo Lee;Minju Kim;Jeewon Lee;Wusung An;Sangki, Kwon
    • Explosives and Blasting
    • /
    • v.42 no.2
    • /
    • pp.12-28
    • /
    • 2024
  • Explosions involving rapid energy diffusion cause enormous human and economic damage. With the advancement of industry, various and widespread explosion accidents are occurring worldwide, and preventing such accidents must be based on accurate cause analysis. Research analysis of explosion accidents worldwide has been carried out only in a limited range, for some accidents. By conducting a bibliometric analysis of keywords on all the papers published in international journals, this study attempted to derive the overall research trend by period and the latest fields in which future researchers may be interested. The keyword analysis showed that the number of papers and the overall number of keywords were generally small from 2005 to 2014, but that numerical simulation and artificial intelligence have been used to analyze explosion accident cases since 2015, and that various studies in the latest research fields, such as lithium-ion batteries and mixed gas, are currently being actively conducted.

A Study of Secondary Mathematics Materials at a Gifted Education Center in Science Attached to a University Using Network Text Analysis (네트워크 텍스트 분석을 활용한 대학부설 과학영재교육원의 중등수학 강의교재 분석)

  • Kim, Sungyeun;Lee, Seonyoung;Shin, Jongho;Choi, Won
    • Communications of Mathematical Education
    • /
    • v.29 no.3
    • /
    • pp.465-489
    • /
    • 2015
  • The purpose of this study is to suggest implications for the development and revision of future teaching materials for mathematically gifted students by using network text analysis of secondary mathematics materials. The subjects of the analysis were the learning goals of 110 teaching materials used from 2002 to 2014 in a gifted education center in science attached to a university. Key words were selected by analyzing the frequency of the words that appeared in the learning goals. A co-occurrence matrix of the key words was established, and basic network information, centrality, centralization, components, and k-cores were derived. For the analysis, the KrKwic, KrTitle, and NetMiner4.0 programs were used, respectively. The results of this study were as follows. First, in the whole network from 2002 to 2014, there was a pivot of the network formed by core hubs including 'diversity', 'understanding', 'concept', 'method', 'application', 'connection', 'problem solving', 'basic', 'real life', and 'thinking ability'. In addition, the centralization analysis showed that knowledge aspects were well reflected in the teaching materials. Second, network text analysis was conducted for the three periods of the Master Plan for the promotion of gifted education. As a result, a network was built around 'understanding', and there were strong ties among 'question', 'answer', and 'problem solving' regardless of the period. In contrast, the centrality analysis showed that 'communication', 'discovery', and 'proof' appeared only in the first, second, and third periods of the Master Plan, respectively. Therefore, the results of this study suggest that affective aspects and activities involving high-level cognitive processes should be included, and that mannerism and ahistoricism in learning goals should be prevented, when developing and revising teaching materials.

A Network Analysis of the Research Trends in Fingerprints in Korea (네트워크 분석을 활용한 국내 지문인식연구의 동향분석)

  • Jung, Jinhyo;Lee, Chang-Moo
    • Convergence Security Journal
    • /
    • v.17 no.1
    • /
    • pp.15-30
    • /
    • 2017
  • Since the 1990s, fingerprint recognition has attracted much attention among scholars, and there have been numerous studies on it. However, most academic papers have focused mainly on technical advances in fingerprint recognition, and there has been no significant output analyzing the research trends in the field. Describing the overall structure of fingerprint recognition research is essential to make further studies more efficient and effective. To this end, the primary purpose of this article is to deliver an overview of the research trends in fingerprint recognition based on network analysis. This study analyzed the abstracts of 122 academic journal papers published from 1990 to 2015. The data were gathered from RISS, a searchable academic database. After the abstracts were collected, a cleaning process was carried out and key words were selected using Krwords and R; a symmetric co-occurrence matrix of the key words was created with Ktitle; and Netminer was employed to analyze closeness centrality. The results of this work describe the research trends in fingerprint recognition for four periods: 1990 to 2000, 2001 to 2005, 2006 to 2010, and 2011 to 2015.
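
Closeness centrality, the measure named in the abstract above, can be illustrated with a small sketch. The definition used here (number of reachable nodes divided by the sum of shortest-path distances to them, on an unweighted, undirected graph) is one common textbook form and is an assumption; the abstract does not state which variant Netminer applied. The function name and the sample keyword graph are hypothetical.

```python
from collections import deque


def closeness_centrality(graph, node):
    """Closeness of `node` in an unweighted, undirected graph given as adjacency sets."""
    # Breadth-first search yields shortest-path distances in hops.
    distances = {node: 0}
    queue = deque([node])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    total = sum(distances.values())
    if total == 0:
        return 0.0  # isolated node: no other node is reachable
    # Reachable nodes (excluding the node itself) over the total distance to them.
    return (len(distances) - 1) / total


# Hypothetical keyword network (illustrative, not the study's data).
graph = {
    "fingerprint": {"recognition", "sensor", "matching"},
    "recognition": {"fingerprint", "matching"},
    "sensor": {"fingerprint"},
    "matching": {"fingerprint", "recognition"},
}
central = closeness_centrality(graph, "fingerprint")
peripheral = closeness_centrality(graph, "sensor")
```

A keyword directly linked to every other keyword gets the highest possible score, while a keyword attached to the network through a single link scores lower.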

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.53-77
    • /
    • 2012
  • This study analyzes the differences in content and tone of argument among three major Korean newspapers, the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that newspapers in Korea explicitly deliver their own tone of argument when they cover sensitive issues and topics. It could be controversial if readers read the news without being aware of the tone of argument, because the content and the tone of argument can easily affect readers. Thus, it is very desirable to have a new tool that can inform readers of the tone of argument a newspaper has. This study presents the results of clustering and classification techniques as part of text mining analysis. We focus on six main subjects in newspapers, namely Culture, Politics, International, Editorial-opinion, Eco-business, and National issues, and attempt to identify differences and similarities among the newspapers. The basic unit of text mining analysis is a paragraph of a news article. This study uses a keyword-network analysis tool and visualizes relationships among keywords to make the differences easier to see. Newspaper articles were gathered from KINDS, the Korean integrated news database system. KINDS preserves news articles of the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo, and these are open to the public. This study used these three major Korean newspapers from KINDS. About 3,030 articles from 2008 to 2012 were used. The International, National issues, and Politics sections were gathered for specific issues. The International section was collected with the keyword 'Nuclear weapon of North Korea.' The National issues section was collected with the keyword '4-major-river.' The Politics section was collected with the keyword 'Tonghap-Jinbo Dang.' All of the articles from April 2012 to May 2012 in the Eco-business, Culture, and Editorial-opinion sections were also collected.
All of the collected data were processed and split into paragraphs. We removed stop-words using the Lucene Korean Module. We calculated keyword co-occurrence counts from the list of keyword pairs that co-occur in a paragraph, and built a co-occurrence matrix from the list. Once the co-occurrence matrix was built, we used the cosine coefficient matrix as input for PFNet (Pathfinder Network). To analyze the three newspapers and find the significant keywords in each paper, we examined the list of the 10 highest-frequency keywords and the keyword networks of the 20 highest-frequency keywords, in order to closely examine the relationships and show the detailed network map among keywords. We used NodeXL software to visualize the PFNet. After drawing all the networks, we compared the results with the classification results. Classification was first performed to identify how the tone of argument of one newspaper differs from the others. Then, to analyze tones of argument, all the paragraphs were divided into two types of tone, Positive and Negative. To identify and classify the tones of all the paragraphs and articles we had collected, a supervised learning technique was used. The Naïve Bayesian classifier algorithm provided in the MALLET package was used to classify all the paragraphs in the articles. After classification, Precision, Recall, and F-value were used to evaluate the classification results. Based on the results of this study, three subjects, Culture, Eco-business, and Politics, showed some differences in content and tone of argument among the three newspapers. In addition, for National issues, the tones of argument on the 4-major-rivers project differed from each other. It seems that the three newspapers have their own specific tone of argument in those sections. The keyword networks also showed different shapes from each other in the same period and the same section.
This means that the keywords that appear frequently in the articles differ, and that their contents are composed of different keywords. The Positive-Negative classification showed the possibility of classifying a newspaper's tone of argument in comparison with the others. These results indicate that the approach in this study is promising and could be extended into a new tool for identifying the different tones of argument of newspapers.
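
The step from a keyword co-occurrence matrix to a cosine coefficient matrix, described in the abstract above, can be sketched as follows. This is a minimal illustration under stated assumptions: each keyword is represented by its row of co-occurrence counts, the function name `cosine_matrix` and the sample counts are hypothetical, and the Pathfinder Network (PFNet) pruning and NodeXL visualization steps are not reproduced.

```python
import math


def cosine_matrix(matrix):
    """Return pairwise cosine coefficients between the rows of a square matrix."""
    norms = [math.sqrt(sum(v * v for v in row)) for row in matrix]
    n = len(matrix)
    result = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0 or norms[j] == 0:
                continue  # keyword with no co-occurrences: leave similarity at 0
            dot = sum(a * b for a, b in zip(matrix[i], matrix[j]))
            result[i][j] = dot / (norms[i] * norms[j])
    return result


# Hypothetical symmetric co-occurrence counts for three keywords
# (illustrative values, not taken from the study).
cooc = [
    [0, 2, 1],
    [2, 0, 1],
    [1, 1, 0],
]
sim = cosine_matrix(cooc)
```

Two keywords score high when they co-occur with the same other keywords in similar proportions, even if they rarely appear together themselves.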

A Study on the Effect of "ADAPTAGEN"® Korean Ginseng Components, for the Injured Mouse by X-ray ($^{60}$Co) Irradiation (X-방사선($^{60}$Co)에 조사된 새앙쥐의 상해에 대한 "아답태겐"®의 효과에 관한 연구)

  • 공태훈;유성렬
    • Journal of Ginseng Research
    • /
    • v.15 no.3
    • /
    • pp.171-178
    • /
    • 1991
  • The results of feeding experiments in which mice were given ginseng extract, ginseng powder, and ADAPTAGEN for 30 days before and for 40 days after X-ray irradiation at 750 rads were as follows: 1. The 50% lethal days (LD50) after X-ray irradiation were 9 days at 1,000 rads, 10 days at 900 rads, 11 days at 800 rads, 14 days at 760 rads, and 19 days at 750 rads. Therefore, the standard radiation dose was set at 750 rads/8 min. 2. Of the control group mice exposed to X-ray radiation without ginseng feeding, 80% died within 14 to 24 days, and 20~30% of the ginseng extract and ginseng powder feeding groups died, but 100% of the mice fed with ADAPTAGEN survived. 3. The testicles of the control group were lighter than those of the normal group by 26.5 to 29.0%, and those of the ginseng extract and ginseng powder feeding groups were reduced by 44.6 to 60.4%. However, the testicles of the ADAPTAGEN feeding group increased in size by 77.4% to 87.1% and in weight by 61%, showing a recovery approaching the state of the ordinary mice. The ADAPTAGEN feeding group mice were also as active in color as the ordinary ones. 4. An electron micrograph (×8,000×2.2) of the liver cells of the mice 40 days after X-ray irradiation showed the following. In the control group, physiological action appeared to have stopped owing to the frequent occurrence of morphological change of the nucleus and diffusion of chromosomes, reduction in microspores and expansion of microsomes, and endoplasmic change of mitochondria. The liver cells of the ADAPTAGEN feeding group were in a state similar to those of the ordinary mice, restoring to normalcy. In contrast, the liver cells of the ginseng extract and ginseng powder feeding groups were still far from normal. 5.
A serological analysis showed that the control group decreased so sharply in albumin, γ-globulin, and IgG as to cause dystrophy and weaken antibody resistance, and that the ginseng extract and ginseng powder feeding groups, though somewhat more restored than the control group, were still far from the normal group. The ADAPTAGEN feeding group was restored to a state comparable to the normal group in the contents of albumin, γ-globulin, IgG, and serum protein. In other words, it is noteworthy that ADAPTAGEN feeding was effective in revitalizing the destroyed cells of a living body and that it has the function of normalizing antibody components.

Research Trend Analysis Using Bibliographic Information and Citations of Cloud Computing Articles: Application of Social Network Analysis (클라우드 컴퓨팅 관련 논문의 서지정보 및 인용정보를 활용한 연구 동향 분석: 사회 네트워크 분석의 활용)

  • Kim, Dongsung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.195-211
    • /
    • 2014
  • Cloud computing services provide IT resources as services on demand. This is considered a key concept that will lead a shift from an ownership-based paradigm to a new pay-for-use paradigm, which can reduce the fixed cost of IT resources and improve flexibility and scalability. As IT services, cloud services have evolved from earlier similar computing concepts such as network computing, utility computing, server-based computing, and grid computing. Research into cloud computing is therefore highly related to, and combined with, various relevant computing research areas. To seek promising research issues and topics in cloud computing, it is necessary to understand the research trends in cloud computing more comprehensively. In this study, we collect bibliographic and citation information for cloud computing related research papers published in major international journals from 1994 to 2012, and analyze macroscopic trends and network changes in the citation relationships among papers and the co-occurrence relationships of key words by utilizing social network analysis measures. Through the analysis, we can identify the relationships and connections among research topics in cloud computing related areas, and highlight new potential research topics. In addition, we visualize dynamic changes of research topics relating to cloud computing using a proposed cloud computing "research trend map." A research trend map visualizes the positions of research topics in two-dimensional space. The frequencies of key words (X-axis) and the rates of increase in the degree centrality of key words (Y-axis) are used as the two dimensions of the research trend map. Based on the values of the two dimensions, the two-dimensional space of a research map is divided into four areas: maturation, growth, promising, and decline.
An area with high keyword frequency but a low rate of increase in degree centrality is defined as a mature technology area; the area where both keyword frequency and the rate of increase in degree centrality are high is defined as a growth technology area; the area where keyword frequency is low but the rate of increase in degree centrality is high is defined as a promising technology area; and the area where both keyword frequency and the rate of increase in degree centrality are low is defined as a declining technology area. Based on this method, cloud computing research trend maps make it possible to easily grasp the main research trends in cloud computing, and to explain the evolution of research topics. According to the results of an analysis of citation relationships, research papers on security, distributed processing, and optical networking for cloud computing rank highest on the page-rank measure. In the analysis of key words in research papers, cloud computing and grid computing showed high centrality in 2009, and key words dealing with main elemental technologies such as data outsourcing, error detection methods, and infrastructure construction showed high centrality in 2010~2011. In 2012, security, virtualization, and resource management showed high centrality. Moreover, it was found that interest in the technical issues of cloud computing has increased gradually. From the annual cloud computing research trend maps, it was verified that security is located in the promising area, virtualization has moved from the promising area to the growth area, and grid computing and distributed systems have moved to the declining area. The study results indicate that distributed systems and grid computing received a lot of attention as similar computing paradigms in the early stage of cloud computing research.
The early stage of cloud computing was a period focused on understanding and investigating cloud computing as an emergent technology, linking it to relevant established computing concepts. After the early stage, security and virtualization technologies became main issues in cloud computing, which is reflected in the movement of security and virtualization technologies from the promising area to the growth area in the cloud computing research trend maps. Moreover, this study revealed that current research in cloud computing has rapidly shifted from a focus on technical issues to a focus on application issues, such as SLAs (Service Level Agreements).
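
The four-area rule of the research trend map described above can be written as a small decision function. This is a sketch only: the abstract does not say how "high" and "low" are separated, so the two threshold parameters, the function name `classify_keyword`, and the sample values are all assumptions made for illustration.

```python
def classify_keyword(frequency, centrality_growth, freq_threshold, growth_threshold):
    """Place a keyword in one of the four areas of a research trend map.

    frequency          -- keyword frequency (X-axis)
    centrality_growth  -- rate of increase in degree centrality (Y-axis)
    The two thresholds are hypothetical cut-offs between "low" and "high".
    """
    high_freq = frequency >= freq_threshold
    high_growth = centrality_growth >= growth_threshold
    if high_freq and not high_growth:
        return "maturation"
    if high_freq and high_growth:
        return "growth"
    if high_growth:
        return "promising"  # low frequency, high growth
    return "decline"        # low frequency, low growth


# Illustrative values only (not taken from the study).
areas = {
    name: classify_keyword(freq, growth, freq_threshold=20, growth_threshold=0.5)
    for name, (freq, growth) in {
        "keyword_a": (50, 0.1),
        "keyword_b": (50, 0.9),
        "keyword_c": (5, 0.9),
        "keyword_d": (5, 0.1),
    }.items()
}
```

Tracking the area a keyword falls into year by year reproduces the kind of movement the abstract reports, for example a topic moving from the promising area to the growth area.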

Construction of Event Networks from Large News Data Using Text Mining Techniques (텍스트 마이닝 기법을 적용한 뉴스 데이터에서의 사건 네트워크 구축)

  • Lee, Minchul;Kim, Hea-Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.183-203
    • /
    • 2018
  • News articles are the most suitable medium for examining events occurring at home and abroad. In particular, as the development of information and communication technology has brought various kinds of online news media, the amount of news about events occurring in society has increased greatly. Automatically summarizing key events from massive amounts of news data will therefore help users look at many events at a glance. In addition, building and providing an event network based on the relevance of events can greatly help readers understand current events. In this study, we propose a method for extracting event networks from large news text data. To this end, we first collected Korean political and social articles from March 2016 to March 2017, and, through preprocessing using NPMI and Word2Vec, kept only meaningful words and integrated synonyms. Latent Dirichlet allocation (LDA) topic modeling was used to calculate the topic distribution by date, to find the peaks of the topic distribution, and to detect events. A total of 32 topics were extracted by the topic modeling, and the point of occurrence of an event was deduced from the point at which each topic distribution surged. As a result, a total of 85 events were detected, and these were filtered down to a final 16 events using the Gaussian smoothing technique. We also calculated the relevance score between the detected events to construct the event network. Using the cosine coefficient between co-occurring events, we calculated the relevance between the events and connected them to construct the event network. Finally, we set up the event network by assigning each event to a vertex and the relevance score between events to the edge connecting the vertices.
The event network constructed with our method helped us to sort out, in chronological order, the major events in the political and social fields in Korea over the past year, and at the same time to identify which events are related to which. Our approach differs from existing event detection methods in that LDA topic modeling makes it possible to easily analyze large amounts of data and to identify the relevance of events that were difficult to detect with existing event detection. We applied various text mining techniques, including Word2Vec, in the text preprocessing to improve the accuracy of extracting proper nouns and compound nouns, which has been difficult in the analysis of Korean texts. The event detection and network construction techniques of this study have the following advantages in practical application. First, LDA topic modeling, which is unsupervised learning, can easily extract topics, topic words, and their distribution from a huge amount of data. Also, by using the date information of the collected news articles, it is possible to express the distribution by topic as a time series. Second, by calculating relevance scores and constructing the event network from the simultaneous occurrence of topics, we can find the connections between events, which are difficult to grasp in existing event detection, in a summarized form. This can be seen from the fact that the inter-event relevance-based event network proposed in this study was actually constructed in order of occurrence time. It is also possible to identify, through the event network, what happened as the starting point of a series of events. The limitation of this study is that LDA topic modeling gives different results depending on the initial parameters and the number of topics, and that the topic and event names in the analysis results must be assigned by the subjective judgment of the researcher.
Also, since each topic is assumed to be exclusive and independent, the method does not take into account the relevance between topics. Subsequent studies need to calculate the relevance between events that are not covered in this study or that belong to the same topic.
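
The event detection step described above (finding surges in a topic's distribution over time and filtering them with Gaussian smoothing) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the kernel width, the threshold, the local-maximum rule, the function names, and the sample series are all hypothetical, and the per-day topic weights are assumed to come from an already-fitted LDA model.

```python
import math


def gaussian_smooth(series, sigma=1.0):
    """Smooth a 1-D series with a truncated, normalized Gaussian kernel."""
    radius = max(1, int(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in offsets]
    smoothed = []
    for i in range(len(series)):
        total = 0.0
        weight = 0.0
        for offset, w in zip(offsets, kernel):
            j = i + offset
            if 0 <= j < len(series):  # ignore positions outside the series
                total += w * series[j]
                weight += w
        smoothed.append(total / weight)  # renormalize near the boundaries
    return smoothed


def find_peaks(series, threshold):
    """Return interior indices whose value exceeds the threshold and both neighbours."""
    return [
        i
        for i in range(1, len(series) - 1)
        if series[i] > threshold
        and series[i] > series[i - 1]
        and series[i] > series[i + 1]
    ]


# Hypothetical daily weight of one topic (illustrative, not the study's data).
daily_weight = [0.1, 0.1, 0.2, 0.9, 0.3, 0.1, 0.1, 0.15, 0.1, 0.1]
smoothed = gaussian_smooth(daily_weight, sigma=1.0)
events = find_peaks(smoothed, threshold=0.3)
```

In this toy series the large surge on day 3 survives smoothing as a candidate event, while the small bump on day 7 falls below the threshold, which is the kind of filtering that reduces many raw peaks to a shorter list of events.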