• Title/Abstract/Keywords: Scientific data

Performance Assessment of Machine Learning and Deep Learning in Regional Name Identification and Classification in Scientific Documents

  • Jung-Woo Lee;Oh-Jin Kwon
    • The Journal of the Korea institute of electronic communication sciences / Vol. 19, No. 2 / pp. 389-396 / 2024
  • Generative AI has recently been adopted across a wide range of fields, reaching expert-level performance in in-depth data analysis. However, identifying regional names in scientific literature remains challenging because training data are insufficient and AI has seen little application to the task. This study built a standardized dataset for classifying regional names, using the addresses of authors affiliated with Korean institutions as listed in the Web of Science, and tested and evaluated machine learning and deep learning models on this real-world problem. The BERT model performed best, with a precision of 98.41%, recall of 98.2%, and F1 score of 98.31% for metropolitan areas, and a precision of 91.79%, recall of 88.32%, and F1 score of 89.54% for cities. These findings provide a data foundation for future research on regional R&D status, researcher mobility, collaboration, and related topics.
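
As a quick consistency check on the reported metrics, the sketch below computes F1 as the harmonic mean of precision and recall. For the metropolitan-level figures this gives about 98.30%, close to the reported 98.31%, while the city-level figures give about 90.02% rather than the reported 89.54%. The abstract does not say how scores were averaged over classes. One possible explanation, assumed here and not stated by the authors, is that the reported F1 is a macro-average of per-class F1 scores, which in general need not equal the harmonic mean of macro-averaged precision and recall.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision/recall values reported in the abstract, converted to fractions.
metro = f1_score(0.9841, 0.982)
city = f1_score(0.9179, 0.8832)

print(f"metropolitan F1: {metro:.4f}")  # prints "metropolitan F1: 0.9830"
print(f"city F1: {city:.4f}")           # prints "city F1: 0.9002"
```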

Multivariable Bayesian curve-fitting under functional measurement error model

  • Hwang, Jinseub;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 6 / pp. 1645-1651 / 2016
  • Many datasets, particularly in the medical field, contain variables measured with error, such as blood pressure and body mass index. Meanwhile, smoothing methods have recently become common tools for solving complex scientific problems. In this paper, we study Bayesian curve-fitting under a functional measurement error model; in particular, we extend our previous model by incorporating covariates that are free of measurement error. We use penalized splines to capture non-linear patterns and employ a hierarchical Bayesian framework based on Markov chain Monte Carlo (MCMC) methods to fit the model and estimate its parameters. As an application, we use data from the fifth wave (2012) of the Korea National Health and Nutrition Examination Survey, a national population-based survey. Convergence of the MCMC sampling is examined with potential scale reduction factors, and model selection criteria are used to check model performance.
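
The potential scale reduction factor used above compares between-chain and within-chain variance across several MCMC chains; values near 1 suggest the chains have mixed. The sketch below implements the basic Gelman-Rubin form with NumPy. The abstract does not say which variant the authors used (split-chain versions, for example, also exist), and the simulated chains are purely illustrative.

```python
import numpy as np


def potential_scale_reduction(chains: np.ndarray) -> float:
    """Basic Gelman-Rubin PSRF for an (m, n) array: m chains with n draws each."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    # Between-chain variance: B = n / (m - 1) * sum((mean_j - grand_mean)^2)
    b = n * chain_means.var(ddof=1)
    # Within-chain variance: average of the per-chain sample variances
    w = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return float(np.sqrt(var_hat / w))


rng = np.random.default_rng(0)
mixed = rng.normal(0.0, 1.0, size=(4, 2000))             # four chains that agree
stuck = mixed + np.array([[0.0], [0.0], [3.0], [3.0]])   # two chains offset by 3
print(potential_scale_reduction(mixed))  # close to 1
print(potential_scale_reduction(stuck))  # well above 1 (about 2)
```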

Deriving the Properties of Object Types for Research Data Relation Model

  • Kim, Suntae
    • Journal of Information Science Theory and Practice / Vol. 1, No. 2 / pp. 84-92 / 2013
  • In this study, we derive the properties of the object types needed to describe relationships among research data resources that may be generated during the research life cycle. We analyzed the properties of Fedora Commons and DSpace, open-source software for resource management, and the schema properties published by DataCite. Based on the relation names of Fedora Commons, nine new relation names were derived, along with thirty-eight object type properties that consolidate the analyzed properties. The results can serve as basic material for crosswalk studies of object type relation terms to ensure interoperability among systems.

Federated Named Data Networking Testbed for Climate Science

  • Ni, Alexander;Lim, Huhnkuk
    • The Journal of Korean Institute of Communications and Information Sciences / Vol. 42, No. 4 / pp. 780-784 / 2017
  • Data discovery and distribution applications used by the climate, high energy physics, and other scientific communities suffer from performance and large-scale data management problems rooted in the shortcomings of the IP architecture. To address this, new data management applications based on the Named Data Networking (NDN) architecture have been introduced. In this letter, we present a federated NDN testbed running an NDN-based climate science application, together with a set of experiments that measure the performance of the NDN-based climate application with the optimizations we identified and applied.

Management Strategy of Hotspot Temporal Data using Minimum Overlap

  • Kang, Ji-Hyung;Yun, Hong-Won
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / Korea Institute of Maritime Information and Communication Sciences 2005 Spring Conference / pp. 196-199 / 2005
  • We propose a strategy for managing temporal data generated by scientific applications. First, we define LB and RB to partition temporal data, and entity versions are stored in past, current, and future segments. We also describe an algorithm for migrating temporal data with a hotspot distribution among the segments. Performance is evaluated in terms of average response time and space utilization: average response time is similar for the two methods compared, while the proposed method saves space.

Towards Effective Entity Extraction of Scientific Documents using Discriminative Linguistic Features

  • Hwang, Sangwon;Hong, Jang-Eui;Nam, Young-Kwang
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 13, No. 3 / pp. 1639-1658 / 2019
  • Named entity recognition (NER) is an important technique for improving the performance of data mining and big data analytics. In previous studies, NER systems identified named entities using statistical methods based on prior information or linguistic features; however, such methods cannot recognize unregistered or unlearned entities. In this paper, we propose a method that extracts entities such as technologies, theories, or person names by analyzing the collocation relationships among words that co-occur around specific words in the abstracts of academic journals. The method proceeds as follows. First, the data are preprocessed with data cleaning and sentence detection to split the text into single sentences. Next, part-of-speech (POS) tagging is applied to each sentence. Then, the occurrence and collocation of POS tags other than those of the entity candidates (such as nouns) are analyzed. Finally, an entity recognition model is built by analyzing and classifying this sentence-level information.
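
The collocation step can be illustrated with a small sketch: given a sentence already split into (word, POS tag) pairs, count the tags of neighboring non-noun words within a fixed window around each noun candidate. The Penn Treebank-style tags, the window size of 2, and the function name `collocation_features` are assumptions for illustration only; the abstract does not specify the tag set, window, or feature representation, and this is not the authors' implementation.

```python
from collections import Counter


def collocation_features(tagged, window=2, candidate_prefix="NN"):
    """For each noun candidate, count non-candidate POS tags within +/- window words.

    tagged: list of (word, tag) pairs for one sentence (hypothetical input format).
    """
    features = []
    for i, (word, tag) in enumerate(tagged):
        if not tag.startswith(candidate_prefix):
            continue  # only noun-like tokens are treated as entity candidates
        context = Counter()
        lo, hi = max(0, i - window), min(len(tagged), i + window + 1)
        for j in range(lo, hi):
            ctx_tag = tagged[j][1]
            # Skip the candidate itself and other candidates (nouns)
            if j != i and not ctx_tag.startswith(candidate_prefix):
                context[ctx_tag] += 1
        features.append((word, context))
    return features


sentence = [("We", "PRP"), ("apply", "VBP"), ("BERT", "NNP"),
            ("to", "TO"), ("classify", "VB"), ("addresses", "NNS")]
for word, ctx in collocation_features(sentence):
    print(word, dict(ctx))
```

Each candidate's tag counts could then be used as features for a classifier, which corresponds loosely to the final model-building step described in the abstract.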

Five Forces Model of Computational Power: A Comprehensive Measure Method

  • Wu, Meixi;Guo, Liang;Yang, Xiaotong;Xie, Lina;Wang, Shaopeng
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 16, No. 7 / pp. 2239-2256 / 2022
  • In this paper, a model is proposed for comprehensively evaluating computational power. The five forces model of computational power addresses the problem that the indices used in computational power evaluation are measured in inconsistent units. It combines the bidirectional projection method with the TOPSIS method. The model evaluates the overall state of computational power more scientifically and effectively. Finally, an example demonstrates the validity and practicability of the model.
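
TOPSIS ranks alternatives by their relative closeness to an ideal and an anti-ideal solution after normalizing each criterion, which is how it sidesteps the unit-mismatch problem described above. The sketch below is classic TOPSIS with vector normalization only. It does not include the bidirectional projection step the authors combine it with, and the criteria, weights, and scores are hypothetical rather than the paper's five forces indices.

```python
import numpy as np


def topsis(matrix, weights, benefit):
    """Classic TOPSIS: rows are alternatives, columns are criteria.

    benefit[k] is True if larger values of criterion k are better, False for cost criteria.
    Returns a closeness coefficient in [0, 1] per alternative (higher is better).
    """
    x = np.asarray(matrix, dtype=float)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    benefit = np.asarray(benefit, dtype=bool)
    # Vector normalization makes criteria with different units comparable
    v = x / np.linalg.norm(x, axis=0) * w
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_plus = np.linalg.norm(v - ideal, axis=1)   # distance to the ideal solution
    d_minus = np.linalg.norm(v - anti, axis=1)   # distance to the anti-ideal solution
    return d_minus / (d_plus + d_minus)


# Hypothetical systems scored on throughput (TFLOPS), memory bandwidth (TB/s),
# and power draw (kW, a cost criterion).
scores = topsis(
    matrix=[[120.0, 2.0, 300.0],
            [80.0, 1.5, 200.0],
            [150.0, 3.0, 450.0]],
    weights=[0.5, 0.2, 0.3],
    benefit=[True, True, False],
)
print(scores)
```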

Analyzing the Status Quo of Docent Training Program and Searching Its Development Direction in Science Museum of Korea

  • Park, Young-Shin;Lee, Jung-Hwa
    • Journal of the Korean earth science society / Vol. 32, No. 7 / pp. 881-901 / 2011
  • In the past, science museums satisfied visitors simply through interaction with objects and exhibitions, whereas modern science museums are expected to meet visitors' needs through engagement in educational programs. To meet these needs, science museums have trained, educated, and assigned docents who interact with visitors and serve the educational purpose of the visit. In this study, we analyzed the strengths and weaknesses of docent training programs at science museums and science centers in Korea and abroad to draw implications for designing docent training and professional development programs. Programs from four domestic and four international science centers/museums were selected for analysis. Their docent training programs were compared using surveys, interviews, and emails from docents and docent managers/evaluators. Artifacts and documents from the programs were also collected to establish validity in the data analysis, which showed that a well-developed docent training program is critical to enriching science museum education. The results were as follows. First, docents who interact directly with visitors should be recruited and trained, and they should be differentiated from regular volunteers in order to promote science museum education for the popularization of science. Second, docent training programs should let docents experience 'informal learning' exhibition-interpretation strategies in the field through mentoring by experienced senior docents, going beyond 'formal learning' of exhibition content. Third, docents should be equipped with skills that foster scientific literacy at the science museum, such as experiencing scientific ethics through scientific inquiry, which is limited in school education.

Scientific fire investigation by NFPA 921 CODE based on frozen warehouse fire case

  • Park, Kyong-Jin;Lee, Yong-KI;Cha, Sung-Sig;Jung, Dong-Young;Kim, Jang-Oh
    • Journal of the Korea Academia-Industrial cooperation Society / Vol. 19, No. 8 / pp. 78-85 / 2018
  • In this study, we investigated cases among the 20 frozen warehouse fires that occurred in 2017 in which opinions on the cause of ignition differed widely. The research methodology is the scientific fire investigation method prescribed by NFPA 921. Scientific fire investigation determines the cause by logical reasoning through hypothesis setting, minimizing errors in identifying the ignition source. Unscientific fire investigation methods, by contrast, produce many errors because irrational factors such as subjective estimation and speculative reasoning intervene, which eventually leads to problems of human and material liability and to academic deterioration. In particular, unwitnessed fires lead to more errors in identifying the ignition source than witnessed fires. In this study, we set hypothesis A and hypothesis B based on a review of the fire investigation report and a field survey of the cold storage warehouse fire that occurred at ** city ** Mart in 2017, and tested the hypotheses according to NFPA 921. This analytical method is expected to serve as a new paradigm for investigating unwitnessed fires and fires of unknown ignition source. In addition, the experimental data of this study will be used to inform manufacturers and operators of refrigerated warehouses and will serve as basic data for fire prevention.

Exploring the Difficulties of High School Students in Self-Directed Scientific Inquiry

  • Kim, Gahyoung;Ha, Minsu
    • Journal of The Korean Association For Science Education / Vol. 39, No. 6 / pp. 707-715 / 2019
  • Self-directed inquiry is an important teaching method for improving students' core scientific competencies. In carrying out inquiry tasks, students experience a variety of difficulties and sometimes fail to produce the desired results or end up conducting meaningless inquiries. This study was conducted to identify the causes of difficulties and failures in students' self-directed scientific inquiry. Participants were 16 high school students with science research experience at science high schools and science-focused high schools. Data were collected through in-depth interviews built around semi-structured open-ended questions. Qualitative analysis was performed by identifying passages in the interview material that revealed the difficulties and failures participants experienced and the reasons for them. Most failures were attributed to lack of ability, incomplete procedures, and the choice of overly complicated tasks. Various cognitive biases, such as overconfidence, the planning fallacy, and groupthink, were also identified as causes. Based on these results, educational strategies are needed that help students prepare thoroughly so that trial and error in self-directed inquiry is minimized.