• Title/Summary/Keyword: Scientific Dataset


Analysis and Implications of Australian National Data Service (ANDS) (오스트레일리아의 과학데이터 서비스체제(ANDS) 분석과 시사점)

  • Park, Dong-Jin
    • Journal of Digital Convergence / v.9 no.3 / pp.1-10 / 2011
  • Korea does not currently have a concrete national-level policy for managing and preserving scientific datasets. Scientists and research groups carrying out research projects cannot search for or share information about these datasets. As the number of studies that use digital datasets grows sharply, the ability to share and reuse scientific data among researchers is recognized as very important. Korea therefore needs a newly formulated policy for managing scientific data at the national level. This paper draws implications for Korea's strategic planning by analyzing advanced cases from other countries. We selected Australia as our subject because its intensively government-driven research environment, research infrastructure, and information services are very similar to Korea's. Specifically, we analyzed ANDS (Australian National Data Service) and drew out the implications that could also be applied to Korea. Finally, we propose basic principles that should be reflected when formulating a policy on Korea's scientific data.

OryzaGP: rice gene and protein dataset for named-entity recognition

  • Larmande, Pierre;Do, Huy;Wang, Yue
    • Genomics & Informatics / v.17 no.2 / pp.17.1-17.3 / 2019
  • Text mining has become an important research method in biology. Its original purpose was to extract biological entities, such as genes, proteins, and phenotypic traits, in order to extend knowledge from scientific papers. However, few thorough studies on text mining and application development have been performed for plant molecular biology data, especially for rice, resulting in a lack of datasets for named-entity recognition tasks in this species. Because few benchmarks are available for rice, we faced various difficulties in exploiting advanced machine learning methods for accurate analysis of the rice literature. To evaluate several approaches to automatically extracting information on gene/protein entities, we built a new dataset for rice as a benchmark. This dataset is composed of titles and abstracts extracted from scientific papers on rice and downloaded from PubMed. During the 5th Biomedical Linked Annotation Hackathon, a portion of the dataset was uploaded to PubAnnotation for sharing. Our ultimate goal is to use the dataset to offer a shared task on rice gene/protein name recognition through the BioNLP Open Shared Tasks framework, to facilitate open comparison and evaluation of different approaches to the task.

GLOVE: Distributed Shared Memory Based Parallel Visualization Tool for Massive Scientific Dataset (GLOVE: 대용량 과학 데이터를 위한 분산공유메모리 기반 병렬 가시화 도구)

  • Lee, Joong-Youn;Kim, Min Ah;Lee, Sehoon;Hur, Young Ju
    • KIPS Transactions on Software and Data Engineering / v.5 no.6 / pp.273-282 / 2016
  • A visualization tool can be divided into three components: data I/O, visual transformation, and interactive rendering. In this paper, we present the requirements for these three components in visualization tools for massive scientific datasets and propose strategies for developing a tool that satisfies them. In particular, we show how to use open source software to realize this goal efficiently. We also study how to combine several separately developed open source packages into a single visualization tool and optimize it for real-time visualization of massive spatio-temporal scientific datasets. Finally, we propose a distributed shared memory based scientific visualization tool called "GLOVE". We present a performance comparison between GLOVE and well-known open source visualization tools such as ParaView and VisIt.

Scientific and Technical Visualization for Ocean Process Simulations (해양과정시뮬레이션의 과학기술적가시화)

  • Choi Byung Ho
    • Korean Society of Computational Fluids Engineering: Conference Proceedings (한국전산유체공학회:학술대회논문집) / 1999.05a / pp.1-10 / 1999
  • This paper briefly introduces the work done over the twenty years up to 1998 on numerical modeling of ocean processes, focusing on the seas neighboring the Korean Peninsula. Modeling of global ocean dynamics has also been performed as a pathway to understanding the regional ocean dynamics. The ocean simulations produce vast multidimensional, multivariate datasets, so adopting scientific and technical visualization techniques was essential to properly understand the physics involved.


Stock News Dataset Quality Assessment by Evaluating the Data Distribution and the Sentiment Prediction

  • Alasmari, Eman;Hamdy, Mohamed;Alyoubi, Khaled H.;Alotaibi, Fahd Saleh
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.1-8 / 2022
  • This work provides a reliable, classified stock dataset merged with Saudi stock news. The dataset allows researchers to analyze and better understand the realities, impacts, and relationships between stock news and stock fluctuations. The data were collected from the Saudi stock market via the Corporate News (CN) and Historical Data Stocks (HDS) datasets. As their names suggest, CN contains news, and HDS provides information on how stock values change over time. Both datasets cover the period from 2011 to 2019, have 30,098 rows, and have 16 variables, four of which they share and 12 of which differ. The combined dataset presented here therefore includes 30,098 published news pieces and information about stock fluctuations across nine years. Stock news polarity has been interpreted in various ways by native Arabic speakers associated with the stock domain, so the polarity was categorized manually based on Arabic semantics. As the Saudi stock market contributes massively to the international economy, this dataset is essential for stock investors and analysts. The dataset has been prepared for educational and scientific purposes, motivated by the scarcity of data describing the impact of Saudi stock news on stock activity. It will therefore be useful across many fields, including stock market analytics, data mining, statistics, machine learning, and deep learning. The data are evaluated in two ways: by testing the distribution of the data over the categories and by measuring sentiment prediction accuracy. The results show that the distribution of polarity over sectors is balanced. An NB model was developed to evaluate data quality through sentiment classification, and it achieved 68% accuracy, which supports the reliability of the data. The evaluation results therefore indicate that the dataset is reliable, ready, and of high quality for any use.

A Study on Scientific Article Recommendation System with User Profile Applying TPIPF (TPIPF로 계산된 이용자프로파일을 적용한 논문추천시스템에 대한 연구)

  • Zhang, Lingling;Chang, Woo Kwon
    • Journal of the Korean Society for Information Management / v.33 no.1 / pp.317-336 / 2016
  • Because of information overload, users now spend more time and effort finding what they want. To address this problem, scientific article recommendation systems analyze users' needs and recommend suitable articles. However, most such systems have neglected the core component, the user profile. In this paper, instead of the mean that previous studies used to build user profiles, a new measure, TPIPF (Topic Proportion-Inverse Paper Frequency), was applied to a scientific article recommendation system. The accuracy of two recommendation systems built with these two methods was then compared in experiments on a public dataset from the online reference manager CiteULike. The proposed system with TPIPF performed better.

Development of Dataset Evaluation Criteria for Learning Deepfake Video (딥페이크 영상 학습을 위한 데이터셋 평가기준 개발)

  • Kim, Rayng-Hyung;Kim, Tae-Gu
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.193-207 / 2021
  • The deepfake phenomenon is spreading worldwide, mainly through videos on web platforms, and the issue urgently needs to be addressed. Recently, researchers have extensively discussed deepfake video datasets. However, it has been pointed out that existing deepfake datasets do not properly reflect the potential threat and realism because of various limitations. Although research is needed to establish an agreed-upon concept of a high-quality dataset or to suggest evaluation criteria, only a handful of studies have examined this to date. Therefore, this study focused on developing evaluation criteria for deepfake video datasets. The fitness of a deepfake dataset was presented, and evaluation criteria were derived through a review of previous studies. AHP structuring and analysis were performed to refine the evaluation criteria. The results showed that Facial Expression, Validation, and Data Characteristics are important determinants of data quality. This is interpreted as reflecting the importance of minimizing defects and presenting results based on scientific methods when evaluating quality. The contribution of this study is that it proposes the fitness and evaluation criteria for deepfake datasets. Because the evaluation criteria presented here were derived from items considered in previous studies, all of them are expected to be effective for quality improvement. They are also expected to serve as criteria for selecting an appropriate deepfake dataset or as a reference for designing a deepfake data benchmark. A limitation is that this study could not apply the presented criteria to existing deepfake datasets. In future research, the proposed criteria will be applied to existing datasets to evaluate the strengths and weaknesses of each and to consider the implications of using them in deepfake research.
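The AHP step mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' procedure: the pairwise judgments below are hypothetical values chosen for illustration, and the row geometric-mean approximation stands in for whichever weighting method the study used. Only the three criterion names come from the abstract.

```python
import math

def ahp_weights(matrix):
    """Approximate AHP priority weights with row geometric means."""
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]

def consistency_ratio(matrix, weights):
    """Consistency ratio CR = CI / RI for a pairwise comparison matrix."""
    n = len(matrix)
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    ci = (lambda_max - n) / (n - 1)
    # Saaty's random index values for small matrices.
    random_index = {3: 0.58, 4: 0.90, 5: 1.12}
    return ci / random_index[n]

# Criterion names are taken from the abstract; the judgments are HYPOTHETICAL.
criteria = ["Facial Expression", "Validation", "Data Characteristics"]
pairwise = [
    [1.0, 2.0, 3.0],      # Facial Expression compared with each criterion
    [1 / 2, 1.0, 2.0],    # Validation compared with each criterion
    [1 / 3, 1 / 2, 1.0],  # Data Characteristics compared with each criterion
]

weights = ahp_weights(pairwise)
cr = consistency_ratio(pairwise, weights)

for name, weight in zip(criteria, weights):
    print(f"{name}: {weight:.3f}")
print(f"consistency ratio: {cr:.4f}")
```

A consistency ratio below 0.1 is the conventional threshold for accepting a set of pairwise judgments.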

Method for Importance based Streamline Generation on the Massive Fluid Dynamics Dataset (대용량 유동해석 데이터에서의 중요도 기반 스트림라인 생성 방법)

  • Lee, Joong-Youn;Kim, Min Ah;Lee, Sehoon
    • The Journal of the Korea Contents Association / v.18 no.6 / pp.27-37 / 2018
  • Streamline generation is one of the most representative visualization methods for analyzing the flow in fluid dynamics datasets. Determining seed locations for effective streamline visualization, however, is a challenging problem. In addition, computing effective seed locations and streamlines on a massive flow dataset takes considerable time. In this paper, we propose both an importance-based method for determining seed locations for effective streamline placement and a parallel streamline visualization method for a distributed visualization system. We also present case studies on real fluid dynamics datasets using the GLOVE visualization system to evaluate the proposed method.
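As a rough illustration of seeding and tracing streamlines, the Python sketch below ranks candidate seeds by a placeholder importance score and integrates each selected seed through a velocity field with the classical fourth-order Runge-Kutta method. The abstract does not define the paper's importance measure, so local speed is used purely as a stand-in, and the analytic rotation field, step size, and function names are all assumptions.

```python
import math

def velocity(x, y):
    """HYPOTHETICAL analytic field: rigid rotation about the origin."""
    return -y, x

def rk4_step(x, y, h):
    """Advance one point by step h along the field with classical RK4."""
    k1x, k1y = velocity(x, y)
    k2x, k2y = velocity(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
    k3x, k3y = velocity(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
    k4x, k4y = velocity(x + h * k3x, y + h * k3y)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)

def trace_streamline(seed, h=0.05, steps=100):
    """Return the list of points visited when tracing from a seed."""
    points = [seed]
    x, y = seed
    for _ in range(steps):
        x, y = rk4_step(x, y, h)
        points.append((x, y))
    return points

def pick_seeds(candidates, k):
    """Keep the k candidates with the highest stand-in importance (speed)."""
    def importance(point):
        return math.hypot(*velocity(*point))
    return sorted(candidates, key=importance, reverse=True)[:k]

candidates = [(0.5, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 1.5)]
seeds = pick_seeds(candidates, k=2)
streamlines = [trace_streamline(seed) for seed in seeds]
```

Because each streamline is traced independently of the others, the per-seed work could in principle be distributed across workers, which is the kind of parallelism the abstract refers to.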

Collaborative Research Network and Scientific Productivity: The Case of Korean Statisticians and Computer Scientists

  • Kwon, Ki-Seok;Kim, Jin-Guk
    • Asian Journal of Innovation and Policy / v.6 no.1 / pp.85-93 / 2017
  • This paper focuses on the relationship between network characteristics and the productivity of scientists, which has rarely been examined in previous studies. Utilizing a unique dataset from the Korean Citation Index (KCI), we examine the overall characteristics of the research network (e.g., distribution of nodes, density, and mean distance) and analyze whether network centrality is related to scientific productivity. First, we found that the collaborative research network of Korean academics in statistics and computer science is a scale-free network. Second, these research networks differ by discipline: the network of statisticians is denser than that of computer scientists, and computer scientists are located in a more fragmented network than statisticians. Third, a significant relationship between researchers' network position and scientific productivity was observed, and it also differs by discipline. In particular, degree centrality is the strongest predictor of scientists' productivity. Based on these findings, some policy implications are put forward.
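The network measures named in this abstract (density, mean distance, degree centrality) can be computed on a small co-authorship graph as in the Python sketch below. The author labels and edges are hypothetical, not data from the Korean Citation Index, and the mean distance here averages only over pairs that are connected.

```python
from collections import deque

def build_graph(edges):
    """Build an undirected adjacency map from co-authorship pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def density(graph):
    """Share of possible undirected edges that are present."""
    n = len(graph)
    m = sum(len(neighbors) for neighbors in graph.values()) / 2
    return 2 * m / (n * (n - 1))

def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree."""
    n = len(graph)
    return {node: len(neighbors) / (n - 1) for node, neighbors in graph.items()}

def mean_distance(graph):
    """Average shortest-path length over reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in graph[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

# HYPOTHETICAL co-authorship pairs among five researchers.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
graph = build_graph(edges)
centrality = degree_centrality(graph)

print(density(graph), mean_distance(graph), centrality)
```

In a study like this one, each researcher's centrality score would then be compared with a productivity measure such as publication count.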

Accuracy Assessment of Global Land Cover Datasets in South Korea

  • Son, Sanghun;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.34 no.4 / pp.601-610 / 2018
  • The national accuracy of global land cover (GLC) products is of great importance to ecological and environmental research. However, GLC products that are derived from different satellite sensors, with differing spatial resolutions, classification methods, and classification schemes, are certain to show some discrepancies. The goal of this study is to assess the accuracy of four commonly used GLC datasets in South Korea: GLC2000, GlobCover2009, MCD12Q1, and GlobeLand30. First, we compared the areas of seven classes between the four GLC datasets and a reference dataset. Then, we calculated the accuracy of the four GLC datasets based on an aggregated classification scheme containing seven classes, using overall, producer's, and user's accuracies and the kappa coefficient. GlobeLand30 had the highest overall accuracy (77.59%). The overall accuracies of MCD12Q1, GLC2000, and GlobCover2009 were 75.51%, 68.38%, and 57.99%, respectively. These results indicate that GlobeLand30 is the most suitable dataset to support a variety of national scientific endeavors in South Korea.
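The four accuracy measures named in this abstract are all derived from a confusion matrix, as the Python sketch below shows. The three class names and the counts are hypothetical (the study itself uses seven classes), and the matrix is assumed to have map classes in rows and reference classes in columns.

```python
def accuracy_metrics(matrix):
    """Overall, user's, and producer's accuracy plus Cohen's kappa.

    Assumes rows are map (classified) classes and columns are
    reference classes.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    diagonal = [matrix[i][i] for i in range(n)]

    overall = sum(diagonal) / total
    # User's accuracy: correct pixels / all pixels mapped to the class.
    users = [diagonal[i] / row_totals[i] for i in range(n)]
    # Producer's accuracy: correct pixels / all reference pixels of the class.
    producers = [diagonal[j] / col_totals[j] for j in range(n)]
    # Kappa compares observed agreement with agreement expected by chance.
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2
    kappa = (overall - expected) / (1 - expected)
    return overall, users, producers, kappa

# HYPOTHETICAL counts for three illustrative classes.
classes = ["forest", "cropland", "urban"]
confusion = [
    [45, 5, 10],  # mapped as forest
    [5, 40, 5],   # mapped as cropland
    [0, 5, 35],   # mapped as urban
]

overall, users, producers, kappa = accuracy_metrics(confusion)
print(f"overall accuracy: {overall:.2%}, kappa: {kappa:.3f}")
for name, ua, pa in zip(classes, users, producers):
    print(f"{name}: user's {ua:.2%}, producer's {pa:.2%}")
```

User's accuracy reflects how reliable the map is for someone reading it, and producer's accuracy reflects how well each reference class was captured, which is why studies like this one report both alongside overall accuracy.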