• Title/Summary/Keyword: Data Scientists


Performance Analysis of Multilevel Data Structures with Adaptive Data Reorganization (적응적 자료 재구성에 의한 다중레벨 자료구조의 성능 분석)

  • 최창열;정지영;김성수
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.04a
    • /
    • pp.40-42
    • /
    • 2001
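  • To reduce the mean service time of a shared-storage-based cluster system, this paper proposes an Adaptive Data Reorganization scheme that decides when to reorganize according to the type of operation and the load on the cluster system. It also adopts a Deferred Reorganization scheme, which performs data reorganization only when the cluster system has no services to process and no pending service requests. Data reorganization is carried out as either a full or a partial reorganization. A Markov model is used to evaluate the performance of the shared-storage-based cluster system under adaptive data reorganization.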

Scalable Data Visualization with Various Functional Representations (다양한 함수를 이용한 확장성 있는 데이터 가시화)

  • Jang, Yun
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06c
    • /
    • pp.413-414
    • /
    • 2012
  • The amount and variety of data being generated today is unprecedented and dramatically changes the way individuals, groups, and societies act and make decisions. Visualization is one of the most important and commonly used methods of analyzing and interpreting digital assets, and interactive environments are necessary to enable effective discovery and decision making. In this paper we present several examples of, and approaches to, scalable functional representations and interactive visualization and analysis. The functional representations provide unified, compact, continuous, multi-scale, and compressed representations in the data domain.

Hash-based Parallel Join Schemes Supporting Dynamic Load Balancing in Data Sharing Systems (데이터 공유 시스템에서 동적 부하분산을 지원하는 해쉬 기반 병렬 조인 처리 기법)

  • 문애경;조행래
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 1999.10a
    • /
    • pp.249-251
    • /
    • 1999
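  • In hash-based parallel join schemes, which use a hash function to partition work across multiple nodes, data skew concentrates load on particular nodes and degrades system performance. This paper applies basic hash-based join schemes to a data sharing system and, to resolve data skew, proposes dynamic task allocation and a task relocation method that reassigns work from overloaded nodes to other nodes. Simulations were run to analyze the performance of the proposed schemes, and they show that dynamic task allocation and task relocation are effective in a data sharing system, where all nodes share the disks that store the database.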
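
The effect of data skew on a hash-partitioned join can be illustrated with a small sketch. This is not the authors' algorithm: the function names, the greedy "largest bucket to the least-loaded node" rule, and the sample data are all assumptions chosen to show why a static bucket-to-node mapping suffers under skew and how allocating buckets dynamically can even out the load.

```python
from collections import defaultdict
import heapq

def partition(rows, num_buckets):
    """Split (key, value) rows into buckets by hashing the join key."""
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[hash(key) % num_buckets].append((key, value))
    return buckets

def static_assignment(bucket_sizes, num_nodes):
    """Static scheme: bucket b is always processed by node b % num_nodes."""
    loads = [0] * num_nodes
    for bucket, size in bucket_sizes.items():
        loads[bucket % num_nodes] += size
    return loads

def dynamic_assignment(bucket_sizes, num_nodes):
    """Hypothetical dynamic scheme: give the largest remaining bucket
    to whichever node currently has the least work."""
    heap = [(0, node) for node in range(num_nodes)]
    heapq.heapify(heap)
    loads = [0] * num_nodes
    for _, size in sorted(bucket_sizes.items(), key=lambda kv: -kv[1]):
        load, node = heapq.heappop(heap)
        loads[node] = load + size
        heapq.heappush(heap, (loads[node], node))
    return loads

def join_bucket(r_rows, s_rows):
    """Build/probe hash join on one pair of matching buckets."""
    table = defaultdict(list)
    for key, value in r_rows:          # build phase on relation R
        table[key].append(value)
    return [(key, rv, sv)              # probe phase with relation S
            for key, sv in s_rows
            for rv in table.get(key, [])]

# Illustrative skewed input: even keys dominate relation S.
counts = {0: 40, 2: 30, 4: 20, 6: 10, 1: 1, 3: 1, 5: 1, 7: 1}
R = [(k, f"r{k}") for k in range(8)]
S = [(k, f"s{k}_{i}") for k, n in counts.items() for i in range(n)]

r_buckets, s_buckets = partition(R, 8), partition(S, 8)
sizes = {b: len(rows) for b, rows in s_buckets.items()}
static_loads = static_assignment(sizes, 2)
dynamic_loads = dynamic_assignment(sizes, 2)
result = [row for b in s_buckets
          for row in join_bucket(r_buckets.get(b, []), s_buckets[b])]
```

With this input the static mapping sends every heavy bucket to the same node, while the greedy allocation splits the work almost evenly; the join result is the same in both cases because each bucket pair is joined independently.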


A Study on Transport Protocol for High Speed Networking

  • Kwon, Yoon-Joo;Seok, Woo-Jin;Byeon, Ok-Hwan
    • Proceedings of the Korea Society of Information Technology Applications Conference
    • /
    • 2005.11a
    • /
    • pp.211-214
    • /
    • 2005
  • Many eScience applications are emerging. More and more scientists want to use them to collaborate with international partners regardless of location. Because these applications must analyze massive raw data sets, scientists need to send and receive data quickly, so high-speed networking is now a key network requirement. Network performance depends largely on the transport protocol. Both TCP and UDP are available as transport protocols, but we use TCP for its reliable data delivery. However, TCP was designed for low-bandwidth networks, so standard TCP, for example Reno, cannot utilize the full bandwidth of a high-capacity network. Several TCP variants address TCP's problems with high-speed networking. They fall into two groups: loss-based TCP and delay-based TCP. In this paper, I compare these two approaches and propose a hybrid approach for high-speed networking.
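
The difference between the two groups can be sketched as two congestion-window update rules. This is a simplified illustration, not the paper's hybrid proposal: the loss-based rule follows Reno-style additive increase and multiplicative decrease, and the delay-based rule is modeled on TCP Vegas, which the abstract does not name. The function names and the `alpha` and `beta` thresholds are assumptions.

```python
def loss_based_update(cwnd, loss_detected):
    """Reno-style AIMD, applied once per round-trip time.

    Grow the window by one segment per RTT; halve it on packet loss.
    """
    if loss_detected:
        return max(1.0, cwnd / 2.0)
    return cwnd + 1.0

def delay_based_update(cwnd, base_rtt, rtt, alpha=2.0, beta=4.0):
    """Vegas-style rule (assumed here as a representative delay-based TCP).

    Estimate how many segments are queued in the network from the gap
    between the minimum RTT (base_rtt) and the current RTT, then keep
    that estimate between alpha and beta.
    """
    queued = cwnd * (1.0 - base_rtt / rtt)
    if queued < alpha:
        return cwnd + 1.0            # path looks underused: speed up
    if queued > beta:
        return max(1.0, cwnd - 1.0)  # queue is building: back off early
    return cwnd                      # within the target band: hold
```

The loss-based rule reacts only after a packet is dropped, which on a high-capacity path means a long linear climb back to full rate; the delay-based rule reacts to rising round-trip time before any loss occurs.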


UNIVAC No.A DATA COMMUNICATION SYSTEM

  • 金吉昌
    • Communications of the Korean Institute of Information Scientists and Engineers
    • /
    • v.4 no.1
    • /
    • pp.16-25
    • /
    • 1986
  • Data communication is the transmission of coded information between terminals and computers or between multiple computers. Today's large, complex, and geographically dispersed industrial society requires processing real-time inputs that must be collected from many different locations and producing real-time outputs that must be distributed to many different locations. A data communication system is one of many alternatives for meeting this requirement.

Data-Compression-Based Resource Management in Cloud Computing for Biology and Medicine

  • Zhu, Changming
    • Journal of Computing Science and Engineering
    • /
    • v.10 no.1
    • /
    • pp.21-31
    • /
    • 2016
  • With the application and development of biomedical techniques such as next-generation sequencing, mass spectrometry, and medical imaging, the amount of biomedical data has been growing explosively. Processing such data raises the problems of big data, highly intensive computation, and high-dimensional data. Fortunately, cloud computing offers significant advantages in resource allocation, data storage, computation, and sharing, and provides a way to solve the big data problems of biomedical research. To improve the efficiency of resource management in cloud computing, this paper proposes a clustering method and adopts the Radial Basis Function to compress comprehensive biological and medical data sets with high quality, and stores the compressed data under cloud resource management. Experiments show that with this data-compression-based resource management, large data sets from biology and medicine can be stored in less capacity. Furthermore, by applying the reverse operation of the Radial Basis Function, the compressed data can be reconstructed with high accuracy.

An XML Schema-based Semantic Data Integration (XML Schema기반 시맨틱 데이타 통합)

  • Kim Dong-Kwang;Jeong Karp-Joo;Shin Hyo-Seop;Hwang Sun-Tae
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.33 no.9
    • /
    • pp.563-573
    • /
    • 2006
  • Cyber-infrastructures for scientific and engineering applications require integrating heterogeneous legacy data in different formats and from various domains. Such data integration raises challenging issues: (1) support for multiple independently managed schemas, (2) ease of schema evolution, and (3) simple schema mappings. To address these issues, we propose a novel approach to semantic integration of scientific data that uses XML schemas and RDF-based schema mappings. In this approach, XML schemas allow scientists to manage data models intuitively and to use commodity XML DBMS tools. A simple RDF-based ontological representation scheme is used only for the structural relations among independently managed XML schemas from different institutes or domains. We present the design and implementation of a prototype system, called KOCEDgrid (http://www.koced.net), developed for the national cyber-environments for civil engineering research activities in Korea (similar to the NEES project in the USA).

Scientists' Information Behavior for Bridging the Gaps Encountered in the Process of the Scientific Research Lifecycle (과학기술분야 연구활동 단계별 문제상황 극복을 위한 정보행동 연구)

  • Lee, Jung-Yeoun;Chung, Eun-Kyung;Kwon, Na-Hyun
    • Journal of the Korean Society for information Management
    • /
    • v.29 no.3
    • /
    • pp.99-122
    • /
    • 2012
  • This study analyzed scientists' information behaviors when they engage in solving specific research problems in various situations throughout the entire scientific R&D lifecycle. In-depth interviews with a total of 24 scientists were conducted in their research laboratories, which are the scientists' everyday workplace and the context of their scientific research. The theoretical and methodological frameworks employed for this study were Dervin's Sense-making, Savolainen's Everyday Life Information Seeking, and Engestrom's Activity Theory. The findings describe the context-specific research and information behaviors of the scientists in the 14 sub-stages of the five-stage R&D lifecycle. Specifically, the study revealed the research objectives at each sub-stage and the related information behaviors (e.g., information needs, information seeking, information sources and channels, and information barriers) used to achieve them. The results provide essential information for redesigning information services and strategies to accommodate the scientific R&D lifecycle.

A Comparison of Scientists' and Students' Responses to Discrepant Event and Alternative Hypothesis in the Conceptual Change Processes from the Phlogiston Theory to the Oxygen Theory (플로지스톤설에서 산소설로의 개념 변화 과정에서 변칙 사례와 대안 가설에 대한 과학자들과 학생들의 반응 비교)

  • Noh, Tae-Hee;Yun, Jeong-Hyun;Kang, Hun-Sik;Kang, Suk-Jin
    • Journal of The Korean Association For Science Education
    • /
    • v.26 no.7
    • /
    • pp.798-804
    • /
    • 2006
  • In this study, we investigated students' responses to a discrepant event and an alternative hypothesis that had been presented in the conceptual change process from the phlogiston theory to the oxygen theory, and compared them with scientists' responses. The data concerning scientists' responses to the discrepant event and the alternative hypothesis were gathered from the relevant literature on the history of science. The subjects were 148 eighth graders identified through a preconception test as holding the target misconception about combustion. After being presented with the discrepant event and the alternative hypothesis, the students completed a test of responses to the discrepant event. Although similar types of responses were obtained from both scientists and students, there was also a clear difference. Scientists tended to focus on explaining the problems of the discrepant event, whereas students tended to ignore or exclude the discrepant event in order to maintain their previous beliefs. Only a few students changed their beliefs after being presented with the alternative hypothesis.

A Semiotics Framework for Analyzing Data Provenance Research

  • Ram, Sudha;Liu, Jun
    • Journal of Computing Science and Engineering
    • /
    • v.2 no.3
    • /
    • pp.221-248
    • /
    • 2008
  • Data provenance is the background knowledge that enables a piece of data to be interpreted and used correctly within context. The importance of tracking provenance is widely recognized, as witnessed by significant research in various areas including e-science, homeland security, and data warehousing and business intelligence. In order to further advance the research on data provenance, however, one must first understand the research that has been conducted to date and identify specific topics that merit further investigation. In this work, we develop a framework based on semiotics theory to assist in analyzing and comparing existing provenance research at the conceptual level. We provide a detailed review of data provenance research and compare and contrast the research based on the semiotics framework. We conclude with an identification of challenges that will drive future research in this field.