• Title/Summary/Keyword: bio text mining

Search Results: 24

Study of the Activation Plan for Rural Tourism of the Jeollabuk-do Using Big Data Analysis (빅데이터 분석을 통한 농촌관광 실태와 활성화 방안 연구: 전라북도를 중심으로)

  • Park, Ro Un;Lee, Ki Hoon
    • The Korean Journal of Community Living Science / v.27 no.spc / pp.665-679 / 2016
  • This study examined the main factors for activating rural tourism in Jeollabuk-do using big data analysis. Tourism big data were gathered from public open data sources and social network services (SNS) and analyzed with opinion mining, text mining, and social network analysis (SNA). Opinion mining and text mining identified the key local contents of the 14 areas of Jeollabuk-do and customers' evaluations of rural tourism. Social network analysis detected the relationships among these contents and determined their importance. The results showed that each location in Jeollabuk-do had its own contents attracting visitors and that the number of contents affected the number of tourists. In addition, visitor numbers tended to be larger when an area's tourism contents were strongly connected with other contents. Hence, strong connections among contents are key to activating rural tourism. Social network analysis divided the contents into several clusters and derived the eigenvector centrality of each content node, indicating its importance in the network. Tourism was active when nodes with high eigenvector centrality were distributed evenly across all clusters; the opposite held when such nodes were concentrated in a few clusters. This study suggests an action plan for expanding rural tourism by developing valuable contents and properly connecting the content clusters.
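
Eigenvector centrality scores a node highly when it is connected to other high-scoring nodes. A minimal power-iteration sketch follows, assuming a connected, undirected network; the toy content network and its node names are hypothetical, and the abstract does not say which SNA software or normalization the authors used.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-10):
    """Eigenvector centrality of a connected undirected graph by power iteration.

    adj maps each node to the set of its neighbours. Iterating with (A + I)
    instead of A keeps the same leading eigenvector but avoids the oscillation
    plain power iteration shows on bipartite graphs. Scores are scaled to
    unit Euclidean length.
    """
    scores = {n: 1.0 for n in adj}
    for _ in range(max_iter):
        # New score = own score + sum of neighbours' scores, i.e. (A + I) x.
        new = {n: scores[n] + sum(scores[m] for m in adj[n]) for n in adj}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        converged = max(abs(new[n] - scores[n]) for n in adj) < tol
        scores = new
        if converged:
            break
    return scores


# Hypothetical content network: an edge means "mentioned together in posts".
contents = {
    "temple": {"festival", "food", "hanok"},
    "festival": {"temple", "food"},
    "food": {"temple", "festival"},
    "hanok": {"temple"},
}
centrality = eigenvector_centrality(contents)
for node, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{node:9s} {score:.3f}")
```

Here "temple" ranks highest because it touches every other node, while "hanok" ranks lowest because its only neighbour is "temple".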

Using Data Mining Techniques for Analysis of the Impacts of COVID-19 Pandemic on the Domestic Stock Prices: Focusing on Healthcare Industry (데이터 마이닝 기법을 통한 COVID-19 팬데믹의 국내 주가 영향 분석: 헬스케어산업을 중심으로)

  • Kim, Deok Hyun;Yoo, Dong Hee;Jeong, Dae Yul
    • The Journal of Information Systems / v.30 no.3 / pp.21-45 / 2021
  • Purpose: This paper analyzed the impact of a global pandemic such as COVID-19 on the domestic stock market. We investigated how the overall pattern of the stock market changed due to the COVID-19 pandemic. In particular, we analyzed stock price patterns in depth and sought the factors affecting the stock market index (KOSPI) in the healthcare industry during the COVID-19 pandemic. Design/methodology/approach: We built a data warehouse from databases in various industrial and economic fields to analyze changes in the KOSPI due to COVID-19, particularly in the healthcare industry centered on biomedicine. We collected daily KOSPI stock price data, centered on the KOSPI-200, from about two years before to one year after the outbreak of COVID-19. We also applied text mining techniques to various stock market news related to COVID-19. We designed four experimental data sets to develop decision tree-based prediction models. Findings: All prediction models built from the four data sets showed significant predictive power while remaining explainable as decision trees. In addition, we derived 10 to 14 significant decision rules for each prediction model. The experimental results showed that these decision rules were sufficient to explain domestic healthcare stock market patterns before and after COVID-19.

A Study on the Promising Future Biotechnology (바이오 미래유망 연구분야 도출에 관한 연구)

  • Kam, Ju-Sik;Kim, Moo-Woong;Par, Sang-Dai;Hyun, Byung-Hwan
    • Journal of Korea Technology Innovation Society / v.15 no.2 / pp.345-368 / 2012
  • Because science and technology are core engines of economic and social development, it is increasingly necessary to explore promising new technologies in order to secure scientific and technological competitiveness, raise the country's overall competitiveness, and promote industrial development. The governments of major advanced countries provide R&D support for promising future technologies. In South Korea, too, studies are under way to establish a model for forecasting future technologies and to strengthen the relevant survey system. This study explores methods for identifying promising future technologies in the bioscience sector, which has emerged as a new growth engine. It uses a text mining technique to collect and analyze papers in the bioscience sector, identifies key research areas by analyzing paper contour maps, and then reviews promising future research subjects through in-depth study.

LitCovid-AGAC: cellular and molecular level annotation data set based on COVID-19

  • Ouyang, Sizhuo;Wang, Yuxing;Zhou, Kaiyin;Xia, Jingbo
    • Genomics & Informatics / v.19 no.3 / pp.23.1-23.7 / 2021
  • The coronavirus disease 2019 (COVID-19) literature has been growing dramatically, and the increased volume of text makes large-scale text mining and knowledge discovery possible. Curation of these texts has therefore become a crucial issue for the biomedical natural language processing (BioNLP) community as a way to retrieve important information about the mechanism of COVID-19. PubAnnotation is an aligned annotation system that provides an efficient platform for biological curators to upload their annotations or merge other external annotations. Motivated by the value of integrating multiple useful COVID-19 annotations, we merged three annotation resources into the LitCovid data set and constructed a cross-annotated corpus, LitCovid-AGAC. The corpus covers 50,018 COVID-19 abstracts in LitCovid and uses 12 labels: Mutation, Species, Gene, and Disease from PubTator; GO and CHEBI from OGER; and Var, MPA, CPA, NegReg, PosReg, and Reg from AGAC. It contains abundant information that may help unveil hidden knowledge about the pathological mechanism of COVID-19.
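
Cross-annotation can be pictured as pooling span annotations from several tools over the same abstract text. The sketch below assumes PubAnnotation-style JSON, where each denotation carries an `id`, a character `span` with `begin`/`end`, and an `obj` label. The merge policy (prefixing IDs with the source name and keeping overlapping spans from different sources) is an illustrative assumption rather than the procedure the authors describe, and the example sentence and spans are invented.

```python
def merge_denotations(text, sources):
    """Pool span annotations from several tools over one abstract.

    sources maps a source name to a list of PubAnnotation-style denotations,
    each {"id": ..., "span": {"begin": int, "end": int}, "obj": label}.
    IDs are prefixed with the source name so they stay unique after merging;
    overlapping spans from different sources are all kept.
    """
    merged = []
    for source, denotations in sources.items():
        for d in denotations:
            begin, end = d["span"]["begin"], d["span"]["end"]
            if not 0 <= begin < end <= len(text):
                raise ValueError(f"{source}:{d['id']} has an out-of-range span")
            merged.append({
                "id": f"{source}_{d['id']}",
                "span": {"begin": begin, "end": end},
                "obj": d["obj"],
            })
    # Stable sort by position; ties keep the order of `sources`.
    merged.sort(key=lambda d: (d["span"]["begin"], d["span"]["end"]))
    return {"text": text, "denotations": merged}


# Invented example sentence and spans (character offsets, end exclusive).
abstract = "SARS-CoV-2 spike mutation D614G increases infectivity."
annotations = {
    "PubTator": [
        {"id": "T1", "span": {"begin": 0, "end": 10}, "obj": "Species"},
        {"id": "T2", "span": {"begin": 26, "end": 31}, "obj": "Mutation"},
    ],
    "AGAC": [
        {"id": "T1", "span": {"begin": 26, "end": 31}, "obj": "Var"},
        {"id": "T2", "span": {"begin": 32, "end": 41}, "obj": "PosReg"},
    ],
}
merged = merge_denotations(abstract, annotations)
for d in merged["denotations"]:
    s = d["span"]
    print(d["id"], d["obj"], abstract[s["begin"]:s["end"]])
```

Keeping both labels on "D614G" is what makes the result cross-annotated: the same span carries a PubTator label and an AGAC label side by side.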

Technology Convergence & Trend Analysis of Biohealth Industry in 5 Countries : Using patent co-classification analysis and text mining (5개국 바이오헬스 산업의 기술융합과 트렌드 분석 : 특허 동시분류분석과 텍스트마이닝을 활용하여)

  • Park, Soo-Hyun;Yun, Young-Mi;Kim, Ho-Yong;Kim, Jae-Soo
    • Journal of the Korea Convergence Society / v.12 no.4 / pp.9-21 / 2021
  • This study aims to identify technology convergence and trends in patent data for the biohealth sector in the IP5 countries (KR, EP, JP, US, CN) and to suggest a direction of development for the industry. We used network analysis based on patent co-classification and TF-IDF-based text mining as the principal methods for understanding the current state of technology convergence. As a result, three technology convergence clusters were derived for the biohealth industry: (A) medical devices for treatment, (B) medical data processing, and (C) medical devices for biometrics. In addition, trend analysis based on the convergence results suggests that Korea, identified as a market leader in (B) medical data processing, is likely to dominate that market in the future with patents of high commercial value. In particular, as the use of medical data by domestic biohealth companies expands following the policy change brought by the "Data 3 Act" passed by the National Assembly in January 2019, the field is expected to require policies that activate technology convergence and R&D support strategies for the technology.
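
TF-IDF weights a term by how often it occurs in one document and how rare it is across the collection, so a term shared by every patent scores zero. A minimal sketch using term frequency normalized by document length and idf = log(N / df) follows; this is one common variant, and the abstract does not state which weighting or tokenization the authors used. The three token lists are invented stand-ins for patent texts.

```python
import math
from collections import Counter


def tf_idf(docs):
    """One {term: weight} dict per tokenised document.

    tf = count / document length, idf = log(N / df). This is one common
    variant; libraries differ in smoothing and normalisation.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per doc
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append(
            {term: (c / len(doc)) * math.log(n_docs / df[term]) for term, c in counts.items()}
        )
    return weights


# Invented token lists standing in for three patent abstracts.
docs = [
    "medical device treatment device".split(),
    "medical data processing data".split(),
    "medical device biometric sensor".split(),
]
for i, w in enumerate(tf_idf(docs)):
    top = max(w, key=w.get)
    print(i, top, round(w[top], 3))
```

"medical" appears in every document and so gets weight 0, while "data", frequent in one document and absent elsewhere, dominates the second document.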

Terminology Recognition System based on Machine Learning for Scientific Document Analysis (과학 기술 문헌 분석을 위한 기계학습 기반 범용 전문용어 인식 시스템)

  • Choi, Yun-Soo;Song, Sa-Kwang;Chun, Hong-Woo;Jeong, Chang-Hoo;Choi, Sung-Pil
    • The KIPS Transactions:PartD / v.18D no.5 / pp.329-338 / 2011
  • Terminology recognition, a prerequisite for text mining, information extraction, information retrieval, the semantic web, and question answering, has been studied intensively in a limited range of domains, especially the biomedical domain. Because previous approaches relied on domain-specific resources and were therefore hard to apply to general domains, we propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. The proposed approach achieved an F-score of 80.8, a 6.5% improvement over C-value, a widely used related approach based on local domain frequencies. In a second experiment with various combinations of unithood features, the method combined with NGD (Normalized Google Distance) performed best, with an F-score of 81.8. Of the three machine learning methods applied (logistic regression, C4.5, and SVMs), the decision tree method C4.5 achieved the best score.
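
NGD estimates how strongly two terms are associated from Web hit counts. With f(x) and f(y) the numbers of pages containing each term, f(x, y) the number containing both, and N the number of pages indexed, NGD(x, y) = (max(log f(x), log f(y)) - log f(x, y)) / (log N - min(log f(x), log f(y))); values near 0 mean the terms almost always co-occur. The sketch takes hit counts as plain numbers. How the authors issued queries or turned NGD into a unithood feature is not described, and the counts in the example are made up.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """NGD from page hit counts (Cilibrasi and Vitanyi's definition).

    fx, fy: pages containing each term; fxy: pages containing both;
    n: total number of indexed pages. The log base cancels out.
    """
    if fxy == 0:
        return math.inf  # the terms never co-occur
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Made-up hit counts for two words that might form a candidate term.
print(normalized_google_distance(1_000, 1_000, 100, 10**6))  # about 0.333
```

For a candidate multi-word term, a low NGD between its component words is evidence that they form a unit rather than co-occurring by chance.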

A Study on Analysis of Patent Information Based Biotechnology Research Trend and Promising Research Themes (특허정보 기반의 바이오 기술개발 트렌드 분석 및 유망기술분야 도출에 관한 연구)

  • Kam, Ju Sik;Kim, Moo Woong;Hyun, Byung Hwan
    • Journal of Technology Innovation / v.21 no.2 / pp.25-56 / 2013
  • As science and technology are emphasized as sources of national competitiveness, major nations designate new growth engine industries and establish effective investment and development strategies to strengthen industrial development and competitiveness through science and technology. New industrial sectors such as biotechnology and renewable energy have been spotlighted as major new growth engines, and competition is growing fiercer. Universities and research institutions in each country regularly select and announce promising future technology fields expected to produce ripple effects. In Korea, various research institutions continue to select and announce promising technology fields. This study examines a method for deriving promising technology fields in biotechnology, which has been spotlighted as a new growth engine, by using patent information. We collect domestic and international biotechnology patents using IPC-code-based technology classification and identify biotechnology trends by applying text mining to the patents, thereby deriving the major technology fields. Patent contour maps of the US and Korea are compared through text mining analysis to derive general technology development fields in biotechnology. Finally, through in-depth analysis of technology fields that are drawing increasing interest in Korea and other countries, we investigate promising research themes in biotechnology development.

A Maximum Entropy-Based Bio-Molecular Event Extraction Model that Considers Event Generation

  • Lee, Hyoung-Gyu;Park, So-Young;Rim, Hae-Chang;Lee, Do-Gil;Chun, Hong-Woo
    • Journal of Information Processing Systems / v.11 no.2 / pp.248-265 / 2015
  • In this paper, we propose a maximum entropy-based model that can mathematically explain the bio-molecular event extraction problem. The proposed model generates an event table, which can represent the relationship between an event trigger and its arguments. Complex sentences with distinctive event structures can also be represented by the event table. Previous approaches intuitively designed pipeline systems that perform trigger detection and argument recognition sequentially, and thus did not clearly explain the relationship between identified triggers and arguments. In contrast, the proposed model generates an event table that can represent triggers, their arguments, and their relationships, from which the desired events can be easily extracted. Experimental results show that the proposed model can cover 91.36% of events in the training dataset and can achieve 50.44% recall on the test dataset by using the event table.
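
The abstract describes the model only at a high level, so the sketch below is one illustrative reading rather than the authors' formulation. Every (trigger, candidate argument) pair becomes a cell of a hypothetical event table, a log-linear (maximum entropy) classifier assigns each cell a role label, and events are read off the filled table. The label set, feature names, and hand-set weights are all assumptions for illustration; a real model would learn weights from annotated data. Using one trigger ("phosphorylation") as another trigger's argument shows how a table can express nested event structures.

```python
import math

LABELS = ("None", "Theme", "Cause")  # hypothetical role labels


def maxent_probs(features, weights):
    """Log-linear (maximum entropy) distribution over role labels.

    p(y | x) = exp(sum of w[y][f] for active features f) / Z.
    """
    scores = {y: sum(weights.get(y, {}).get(f, 0.0) for f in features) for y in LABELS}
    z = sum(math.exp(s) for s in scores.values())
    return {y: math.exp(s) / z for y, s in scores.items()}


def build_event_table(triggers, candidates, featurize, weights):
    """Fill one cell per (trigger, argument candidate) with its most probable role."""
    table = {}
    for t in triggers:
        for a in candidates:
            if a == t:
                continue  # a trigger cannot be its own argument
            probs = maxent_probs(featurize(t, a), weights)
            table[(t, a)] = max(probs, key=probs.get)
    return table


def events_from_table(table):
    """Read events off the table: each trigger with its non-None arguments."""
    events = {}
    for (t, a), role in table.items():
        if role != "None":
            events.setdefault(t, []).append((role, a))
    return events


# Hypothetical hand-set weights for "IL-2 induces phosphorylation of STAT5".
WEIGHTS = {
    "None": {"bias": 0.5},
    "Theme": {"pair=phosphorylation|STAT5": 2.0, "pair=induces|phosphorylation": 2.0},
    "Cause": {"pair=induces|IL-2": 2.0},
}


def featurize(trigger, arg):
    """Hypothetical feature function: a bias plus a trigger-argument pair feature."""
    return {"bias", f"pair={trigger}|{arg}"}


table = build_event_table(
    ["induces", "phosphorylation"], ["IL-2", "STAT5", "phosphorylation"], featurize, WEIGHTS
)
print(events_from_table(table))
```

Because every cell is scored under one probabilistic model instead of by a trigger-then-argument pipeline, triggers and arguments are decided jointly in a single table.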

StrokeMed: an integrated literature database for stroke and the differentiation of stroke syndrome

  • Kim, Young-Uk;Kim, Jin-Ho;Park, Young-Kyu;Kim, Young-Joo
    • Interdisciplinary Bio Central / v.2 no.2 / pp.2.1-2.4 / 2010
  • Complex diseases such as stroke and cancer involve two or more genetic influences and are also affected by environmental factors, which makes them complicated. Because of this complexity, studying these diseases requires searching comprehensive literature resources. Some disease-related literature databases have been developed through specialized journal issues or major websites. Most of them, however, are scattered across websites, and users have difficulty finding accurate and comprehensive information quickly. We developed StrokeMed, an integrated literature database for stroke and the differentiation of stroke syndrome. The system lets users explore PubMed search results categorized by MeSH (Medical Subject Headings), as well as the differentiation of stroke syndrome in Oriental medicine. StrokeMed automatically collects data from major sites such as PubMed, Scirus, and Scopus to keep its content high-quality and up to date. Currently, the system indexes more than 20,000 PubMed abstracts related to stroke, stroke etiology, and Oriental medicine. The system provides valuable literature information for scientific and medical research on stroke.

An Experimental Study on the Relation Extraction from Biomedical Abstracts using Machine Learning (기계 학습을 이용한 바이오 분야 학술 문헌에서의 관계 추출에 대한 실험적 연구)

  • Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science / v.50 no.2 / pp.309-336 / 2016
  • This paper introduces a relation extraction system that identifies and classifies semantic relations between biomedical entities in scientific texts using machine learning methods such as Support Vector Machines (SVM). The proposed system includes functions that extract various linguistic features from sentences containing a pair of biomedical entities and use them to train relation extraction models that maximize performance. Experiments on three internationally representative biomedical collections demonstrate the system's superiority across various biomedical domains. The extensive experimental study in this paper is therefore likely to provide a meaningful foundation for research on machine-learning-based biomedical text analysis.
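
Feature-based relation extraction turns each sentence containing a candidate entity pair into a feature set for a classifier such as an SVM. The sketch below extracts a few common lexical features: the words between the entities, the words just outside them, and a bucketed distance. The feature set is an assumption, since the abstract does not list the system's features. The resulting sets could be converted to `{feature: 1}` dicts, vectorized (for example with scikit-learn's `DictVectorizer`), and fed to a linear SVM, but that step is not shown.

```python
def relation_features(tokens, e1, e2, max_dist=5):
    """Binary lexical features for one candidate entity pair.

    tokens: the sentence as a list of words.
    e1, e2: (start, end) token spans of the two entities, end exclusive,
    assumed not to overlap. Entity words themselves are left out
    ("entity blinding") so a model learns context rather than names.
    """
    (s1, t1), (s2, t2) = sorted([e1, e2])
    between = tokens[t1:s2]
    feats = {f"between={w.lower()}" for w in between}
    feats.add(f"dist={min(len(between), max_dist)}")  # distances above max_dist share one bucket
    if s1 > 0:
        feats.add(f"before={tokens[s1 - 1].lower()}")
    if t2 < len(tokens):
        feats.add(f"after={tokens[t2].lower()}")
    if not between:
        feats.add("adjacent")
    return feats


tokens = "Aspirin inhibits COX-1 activity in platelets".split()
print(sorted(relation_features(tokens, (0, 1), (2, 3))))
# ['after=activity', 'between=inhibits', 'dist=1']
```

Sorting the two spans first makes the features independent of which entity is passed as `e1`; a directed relation type would need an extra order feature.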