• Title/Summary/Keyword: Scientific Computing


KISTI-ML Platform: A Community-based Rapid AI Model Development Tool for Scientific Data (KISTI-ML 플랫폼: 과학기술 데이터를 위한 커뮤니티 기반 AI 모델 개발 도구)

  • Lee, Jeongcheol;Ahn, Sunil
    • Journal of Internet Computing and Services / v.20 no.6 / pp.73-84 / 2019
  • Machine learning as a service (MLaaS) has recently attracted much attention across industry and research. The main reason is that, apart from the data itself, you need no network servers, storage, or even data scientists to build a productive service model. However, machine learning remains difficult for most developers, especially in the traditional sciences, where well-structured big data is scarce. Experimental results are rarely shared among researchers, so building big data in specific research areas is also a major challenge. In this paper, we introduce the KISTI-ML platform, a community-based rapid AI model development tool for scientific data. It provides a user-friendly online development environment in which machine learning beginners can automatically generate code from their own data. Users can share datasets and Jupyter interactive notebooks among authorized community members, including know-how such as data preprocessing for feature extraction, hidden network design, and other engineering techniques.
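
A minimal sketch of the kind of boilerplate such a platform might auto-generate for a beginner's tabular dataset (illustrative only; scikit-learn and the bundled Iris data stand in for a user's uploaded scientific data):

```python
# Sketch of auto-generated training code for a tabular dataset.
# The dataset and model choice are illustrative assumptions.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)  # stand-in for the user's uploaded data
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_tr, y_tr)
acc = accuracy_score(y_te, model.predict(X_te))
print(f"test accuracy: {acc:.2f}")
```

In a notebook-sharing setting like the one described, each cell of such generated code could then be annotated and exchanged among community members.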

A Data Placement Scheme for the Characteristics of Data Intensive Scientific Workflow Applications (데이터 집약 과학 워크플로우 응용의 특성을 고려한 데이터 배치 기법)

  • Ahn, Julim;Kim, Yoonhee
    • KNOM Review / v.21 no.2 / pp.46-52 / 2018
  • For data-intensive scientific workflow applications that leverage cloud computing, large amounts of data can be distributed across multiple data centers, and the generated intermediate data can be transmitted between them. When an application executes, its execution performance changes according to the location of the data, since the generated intermediate data is reused. However, existing data placement strategies do not consider the characteristics of scientific applications. In this paper, we define data-intensive tasks and propose runtime data placement over the intervals in which they execute. Using the proposed scheme, we analyze scenarios that vary the number of data-intensive tasks and derive results. In addition, we compare performance by analyzing runtime data placement time and overhead.
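
As an illustration of runtime data placement (a simple greedy heuristic, not the paper's exact scheme), each intermediate dataset can be placed at the data center where most of its consuming tasks run, reducing cross-center transfers:

```python
# Greedy placement heuristic (illustrative assumption, not the paper's
# algorithm): put each dataset where most of its consumers execute.
from collections import Counter

def place_datasets(consumers):
    """consumers: dataset -> list of data centers of its consuming tasks."""
    placement = {}
    for dataset, centers in consumers.items():
        # pick the data center hosting the most consuming tasks
        placement[dataset] = Counter(centers).most_common(1)[0][0]
    return placement

consumers = {
    "d1": ["dc-A", "dc-A", "dc-B"],  # hypothetical workflow tasks
    "d2": ["dc-B"],
}
print(place_datasets(consumers))
```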

The Effect of Physical Computing Education to Improve the Convergence Capability of Secondary Mathematics-Science Gifted Students (중등 수학과학 영재를 위한 피지컬컴퓨팅 교육이 융합적 역량 향상에 미치는 영향)

  • Kim, Jihyun;Kim, Taeyoung
    • The Journal of Korean Association of Computer Education / v.19 no.2 / pp.87-98 / 2016
  • Our study comprises Arduino robot assembly, board wiring, and collaborative programming learning, and evaluates their effect on improving the convergence capability of secondary mathematics-science gifted students. The results show that interpersonal skills, information-scientific creativity, and integrative thinking disposition improved. Further, by analyzing the relationships among the sub-elements of each thinking element, persistence and imagination in problem solving, interest in scientific information, openness, sense of adventure, a logical attitude, communication, productive skepticism, and so on were extracted as important factors in convergence learning. Thus, we find that the gifted students engaged in various thinking activities while solving problems, and that their convergence competencies also improved significantly.

Analysis of Computer Scientific Attitude of Information Gifted Students in the University of Science Education Institute for Gifted (대학교부설 과학영재교육원의 정보영재 학생들의 컴퓨터 과학적 태도 분석)

  • Chung, Jong-In
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.8 / pp.193-200 / 2018
  • There are 27 university-based science education institutes for the gifted supported by the MSIT (Ministry of Science and ICT), each offering classes in mathematics, physics, chemistry, biology, earth science, and informatics. The authors developed a curriculum incorporating computational thinking components for information-gifted students. To determine whether the curriculum affects their computer-scientific attitude, TOSRA was modified and a test was developed. Information-gifted students at K University's science education institute for the gifted were taught with the developed curriculum for one year, and their computer-scientific attitude was then tested. At the 0.05 level of significance, statistically significant differences were observed in the social implications of computer science, attitudes toward computer-scientific inquiry, and the normality of computer technicians. On the other hand, no significant differences were found in the adoption of computer-scientific attitudes, enjoyment of computer science lessons, leisure interest in computer science, or career interest in computer science.

Priority Data Handling in Pipeline-based Workflow (파이프라인 기반 워크플로우의 우선 데이터 처리 방안)

  • Jeon, Wonpyo;Heo, Daeyoung;Hwang, Suntae
    • KIISE Transactions on Computing Practices / v.23 no.12 / pp.691-697 / 2017
  • Volcanic ash is predicted to be the main source of damage from a potential volcanic disaster around Mount Baekdu and the regions of the Korean peninsula. Computer simulations to predict the diffusion of volcanic ash must be performed under the prevailing meteorological conditions within a predetermined time, so a workflow using pipelining is proposed to parallelize the simulation software. Because the parameters cannot be precisely determined, even at the time of an eruption, the simulations must be carried out for various plausible conditions. Among these, the condition with the highest probability should be computed first, so that an initial response to the disaster can be based on its results, with further action taken later as subsequent results arrive. The computations run on a volcanic disaster damage prediction system hosted on a computing server with limited performance, so an optimal distribution of computing resources is required. We propose a method through which specific data can be processed first in the proposed pipeline-based workflow.
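
The highest-probability-first dispatch described above can be sketched with a priority queue (illustrative only; the condition names and probabilities are hypothetical, and each pop would feed the real pipeline stages):

```python
# Dispatch simulation conditions in order of probability, so the most
# likely scenario's results are available first. Illustrative sketch,
# not the paper's system.
import heapq

def run_by_priority(conditions):
    """conditions: list of (probability, name); returns execution order."""
    heap = [(-p, name) for p, name in conditions]  # max-heap via negation
    heapq.heapify(heap)
    order = []
    while heap:
        _, name = heapq.heappop(heap)
        order.append(name)  # the pipeline stages would run here
    return order

print(run_by_priority([(0.2, "wind-NE"), (0.5, "wind-SW"), (0.3, "wind-S")]))
```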

An Efficient Method for Determining Work Process Number of Each Node on Computation Grid (계산 그리드 상에서 각 노드의 작업 프로세스 수를 결정하기 위한 효율적인 방법)

  • Kim Young-Hak;Cho Soo-Hyun
    • The Journal of the Korea Contents Association / v.5 no.1 / pp.189-199 / 2005
  • Grid computing is a technique for solving large problems, such as those in science and engineering, by sharing the computing power and storage of numerous computers on a distributed network. A grid computing environment is composed of WANs with differing performance and heterogeneous network conditions, so it is important to reflect these heterogeneous performance factors when distributing computational work. In this paper, we propose an efficient method that decides the number of work processes for each node by considering network state information: latency, bandwidth, and a latency-bandwidth mixture. First, using the measured information, we compute a performance ratio and decide the number of work processes for each node. Finally, an RSL file is generated automatically based on the decided number of work processes, and the work is executed. Network performance information is collected by the NWS. Experimental results show that the methods considering network performance information improve on existing methods by 23%, 31%, and 57% in terms of work amount, number of work processes, and number of nodes, respectively.
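
A proportional assignment along these lines can be sketched as follows (the score formula and node figures are assumptions for illustration, not the paper's actual performance-ratio computation):

```python
# Assign work processes in proportion to a per-node score derived from
# measured latency and bandwidth: higher bandwidth and lower latency
# earn more processes. Illustrative sketch only.
def assign_processes(nodes, total_processes):
    """nodes: name -> (latency_ms, bandwidth_mbps)."""
    scores = {n: bw / lat for n, (lat, bw) in nodes.items()}
    total = sum(scores.values())
    # every node gets at least one process
    return {n: max(1, round(total_processes * s / total))
            for n, s in scores.items()}

nodes = {"node-A": (10.0, 100.0), "node-B": (40.0, 100.0)}  # hypothetical
print(assign_processes(nodes, 10))
```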


InterCom : Design and Implementation of an Agent-based Internet Computing Environment (InterCom : 에이전트 기반 인터넷 컴퓨팅 환경 설계 및 구현)

  • Kim, Myung-Ho;Park, Kweon
    • The KIPS Transactions:PartA / v.8A no.3 / pp.235-244 / 2001
  • The development of network and computer technology has led to many studies on using physically distributed computers as a single resource. Generally, these studies have focused on environments based on message passing, which are mainly used for scientific computation and process work in parallel by exploiting the internal parallelism of the given problems. Such environments generally provide high parallelism, but they are difficult to program and use, and they require user accounts on the distributed computers. If a given problem can be divided into completely independent subproblems, a more efficient environment can be provided. Such problems arise in bioinformatics, 3D animation, graphics, and so on, so developing a new environment for them is very important. We therefore propose a new environment called InterCom, based on proxy computing, which can solve these problems efficiently, and describe its implementation. The environment consists of an agent, a server, and a client. Its merits are easy programming, no need for user accounts on the distributed computers, and ease of use through automatic compilation of distributed code.


A Case Study of Drug Repositioning Simulation based on Distributed Supercomputing Technology (분산 슈퍼컴퓨팅 기술에 기반한 신약재창출 시뮬레이션 사례 연구)

  • Kim, Jik-Soo;Rho, Seungwoo;Lee, Minho;Kim, Seoyoung;Kim, Sangwan;Hwang, Soonwook
    • Journal of KIISE / v.42 no.1 / pp.15-22 / 2015
  • In this paper, we present a case study of a drug repositioning simulation based on distributed supercomputing technology, which requires highly efficient processing of large-scale computations. Drug repositioning is the application of known drugs and compounds to new indications (i.e., new diseases), and it requires efficient processing of a large number of docking tasks with relatively short per-task execution times. These are the main characteristics of a Many-Task Computing (MTC) application. As a representative case of MTC applications, we ran the drug repositioning simulation on our HTCaaS system, which can leverage distributed supercomputing infrastructure, and show that efficient task dispatching, dynamic resource allocation and load balancing, reliability, and seamless integration of multiple computing resources are crucial to supporting such challenging scientific applications.
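
The MTC pattern of many short independent tasks dispatched to a pool of workers can be sketched as below (a single-machine toy; real systems like the one described would dispatch across supercomputer nodes, and the `dock` function is a hypothetical stand-in for one docking run):

```python
# Many short, independent tasks dispatched to a worker pool with
# dynamic load balancing. Illustrative single-machine sketch.
from concurrent.futures import ThreadPoolExecutor

def dock(task_id):
    # stand-in for one short docking computation
    return task_id, sum(i * i for i in range(1000))

with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(dock, range(100)))

print(len(results), "tasks completed")
```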

Stock News Dataset Quality Assessment by Evaluating the Data Distribution and the Sentiment Prediction

  • Alasmari, Eman;Hamdy, Mohamed;Alyoubi, Khaled H.;Alotaibi, Fahd Saleh
    • International Journal of Computer Science & Network Security / v.22 no.2 / pp.1-8 / 2022
  • This work provides a reliable, classified stock dataset merged with Saudi stock news, allowing researchers to analyze and better understand the realities, impacts, and relationships between stock news and stock fluctuations. The data were collected from the Saudi stock market via the Corporate News (CN) and Historical Data Stocks (HDS) datasets. As their names suggest, CN contains news, and HDS provides information on how stock values change over time. Both datasets cover 2011 to 2019, have 30,098 rows, and have 16 variables, four of which they share and 12 of which differ. The combined dataset therefore includes 30,098 published news pieces and information about stock fluctuations across nine years. Stock news polarity has been interpreted in various ways by native Arabic speakers in the stock domain, so polarity was categorized manually based on Arabic semantics. As the Saudi stock market contributes massively to the international economy, this dataset is essential for stock investors and analysts. It was prepared for educational and scientific purposes, motivated by the scarcity of data describing the impact of Saudi stock news on stock activities, and will be useful across many sectors, including stock market analytics, data mining, statistics, machine learning, and deep learning. The data are evaluated by testing the distribution of the polarity classes and the sentiment prediction accuracy. The results show that the polarity distribution over sectors is balanced, and a naive Bayes (NB) model developed to evaluate data quality via sentiment classification achieves 68% accuracy, supporting the dataset's reliability, readiness, and quality.
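
A sentiment-classification check of this kind can be sketched with a bag-of-words naive Bayes pipeline (illustrative only; the paper does not detail its NB setup, and toy English headlines stand in for the Arabic stock news):

```python
# Toy naive Bayes sentiment pipeline: vectorize headlines into word
# counts, fit MultinomialNB, and predict polarity of unseen text.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

headlines = ["profits rise sharply", "record quarterly earnings",
             "shares fall on losses", "company reports heavy losses"]
labels = ["pos", "pos", "neg", "neg"]

clf = make_pipeline(CountVectorizer(), MultinomialNB())
clf.fit(headlines, labels)
print(clf.predict(["earnings rise"]))
```

On a real evaluation one would hold out a test split and report accuracy, as the 68% figure above was obtained.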

Cascading Citation Expansion

  • Chen, Chaomei
    • Journal of Information Science Theory and Practice / v.6 no.2 / pp.6-23 / 2018
  • Digital Science's Dimensions is envisaged as a next-generation research and discovery platform for more efficient access to cross-referenced scholarly publications, grants, patents, and clinical trials. As a new addition to the growing open citation resources, it offers opportunities that may benefit a wide variety of stakeholders of scientific publications, from researchers and policy makers to the general public. In this article, we explore and demonstrate some of its practical potential for cascading citation expansion. Given a set of publications, the expansion process can be applied iteratively to extend coverage to more and more relevant articles through citation links. Although the conceptual origin can be traced back to Garfield's citation indexing, such iterative expansion has, until recently, been limited to the few with unrestricted access to a citation database large enough to sustain it. Building on the open application program interface of Dimensions, we integrate cascading citation expansion functions into CiteSpace and demonstrate how one may benefit from these new capabilities. In conclusion, cascading citation expansion has the potential to improve our understanding of the structure and dynamics of scientific knowledge.
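
The iterative expansion through citation links described above amounts to a breadth-first growth of the seed set; a minimal sketch over a hypothetical citation graph (in practice the edges would come from an API such as the Dimensions one):

```python
# Cascading citation expansion: repeatedly grow a seed set of papers
# by following citation links. The toy graph below is hypothetical.
def expand(seed, cites, rounds=1):
    """cites: paper -> list of papers it cites; returns the expanded set."""
    current = set(seed)
    for _ in range(rounds):
        frontier = set()
        for paper in current:
            frontier.update(cites.get(paper, []))
        current |= frontier  # each round reaches one citation hop further
    return current

cites = {"p1": ["p2", "p3"], "p2": ["p4"], "p3": ["p4", "p5"]}
print(sorted(expand({"p1"}, cites, rounds=2)))
```

The same loop could equally follow incoming-citation edges, expanding toward papers that cite the seed set rather than papers it cites.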