• Title/Summary/Keyword: Large-scale Scientific Data


The BIOWAY System: A Data Warehouse for Generalized Representation & Visualization of Bio-Pathways

  • Kim, Min Kyung;Seo, Young Joo;Lee, Sang Ho;Song, Eun Ha;Lee, Ho Il;Ahn, Chang Shin;Choi, Eun Chung;Park, Hyun Seok
    • Genomics & Informatics
    • /
    • v.2 no.4
    • /
    • pp.191-194
    • /
    • 2004
  • Exponentially increasing biopathway data in recent years provide us with a means to elucidate the large-scale modular organization of the cell. Given the existing information on metabolic and regulatory networks, inferring biopathway information through scientific reasoning or data mining of large-scale array or proteomics data has attracted great attention. Naturally, there is a need for a user-friendly system that allows the user to combine large and diverse pathway data sets from different resources. We built a data warehouse, BIOWAY, for analyzing and visualizing biological pathways by integrating and customizing resources. We have collected many different types of pathway-related data, including metabolic pathway data from KEGG/LIGAND, signaling pathway data from BIND, and protein information from SWISS-PROT. In addition to a general data retrieval mechanism, a successful user interface should provide a convenient visualization mechanism, since biological pathway data are difficult to conceptualize without graphical representations. Still, the visual interfaces of previous systems at best use static images, and only for specific categorized pathways, which makes it difficult to cope with more complex pathways. In the BIOWAY system, all pathway data can be displayed as computer-generated graphical networks rather than manually drawn images. Furthermore, the system is designed so that every pathway map can be expanded or collapsed by introducing the concept of a super node. A subtle graphic layout algorithm has been applied to best display the pathway data.

Design of a Large-scale Task Dispatching & Processing System based on Hadoop (하둡 기반 대규모 작업 배치 및 처리 기술 설계)

  • Kim, Jik-Soo;Cao, Nguyen;Kim, Seoyoung;Hwang, Soonwook
    • Journal of KIISE
    • /
    • v.43 no.6
    • /
    • pp.613-620
    • /
    • 2016
  • This paper presents MOHA (Many-Task Computing on Hadoop), a framework that aims to apply Many-Task Computing (MTC) technologies, originally developed for high-performance processing of many tasks, to the existing Big Data processing platform Hadoop. We present the basic concepts, motivation, preliminary results of a proof of concept (PoC) based on a distributed message queue, and future research directions of MOHA. MTC applications may have relatively low I/O requirements per task. However, a very large number of tasks must be processed efficiently, with potentially heavy file-based communication between tasks. MTC applications can therefore exhibit a different pattern of data-intensive workload from existing Hadoop applications, which are typically based on relatively large data block sizes. Through an effective convergence of MTC and Big Data technologies, the new MOHA framework can support large-scale scientific applications within the Hadoop ecosystem, which is evolving into a multi-application platform.

A guideline for the statistical analysis of compositional data in immunology

  • Yoo, Jinkyung;Sun, Zequn;Greenacre, Michael;Ma, Qin;Chung, Dongjun;Kim, Young Min
    • Communications for Statistical Applications and Methods
    • /
    • v.29 no.4
    • /
    • pp.453-469
    • /
    • 2022
  • The study of immune cellular composition has been of great scientific interest in immunology because of the generation of multiple large-scale data. From the statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive, and all the elements sum to a constant, which can be set to one in general. Standard statistical methods are not directly applicable for the analysis of compositional data because they do not appropriately handle correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and the alternative approach using Dirichlet regression analysis, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
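The log-ratio idea mentioned in the abstract above can be illustrated with a minimal sketch. It uses the centered log-ratio (CLR) transformation, one common member of the log-ratio family; the abstract does not say which transformation the paper uses, and the function names and the three example values are hypothetical.

```python
import math

def closure(parts):
    """Rescale strictly positive parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]

def clr(parts):
    """Centered log-ratio: log of each part minus the mean of the logs."""
    if any(p <= 0 for p in parts):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical counts for three cell types (illustrative values only).
raw = [30.0, 50.0, 20.0]
fractions = closure(raw)      # composition that sums to one
transformed = clr(fractions)  # unconstrained coordinates that sum to zero
```

Because the CLR depends only on ratios between parts, applying it to the raw counts or to the closed fractions gives the same coordinates, which is why the constant that the composition sums to does not matter.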

Federated Named Data Networking Testbed for Climate Science

  • Ni, Alexander;Lim, Huhnkuk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.42 no.4
    • /
    • pp.780-784
    • /
    • 2017
  • Data discovery and distribution applications used by the climate, high-energy physics, and other scientific communities are experiencing performance and large-scale data management problems that are rooted in the shortcomings of the IP architecture. To solve these problems, newly developed data management applications based on the NDN architecture have been introduced. In this letter, we present a federated NDN testbed with an NDN-based climate science application, together with a set of experiments that reflect the general performance of the NDN-based climate application with the optimizations that were determined and applied.

Large eddy simulation of turbulent flow using the parallel computational fluid dynamics code GASFLOW-MPI

  • Zhang, Han;Li, Yabing;Xiao, Jianjun;Jordan, Thomas
    • Nuclear Engineering and Technology
    • /
    • v.49 no.6
    • /
    • pp.1310-1317
    • /
    • 2017
  • GASFLOW-MPI is a widely used, scalable computational fluid dynamics tool for simulating turbulent flow, combustion dynamics, and other related thermal-hydraulic phenomena in nuclear power plant containment. An efficient, scalable linear solver for the large-scale pressure equation is one of the key factors in the computational efficiency of GASFLOW-MPI. Several advanced Krylov subspace methods and scalable preconditioning methods are compared and analyzed to improve computational performance. With this computational capability, the large eddy simulation turbulence model is used to resolve turbulent behavior in more detail. A backward-facing step flow is simulated to study the free shear layer, the recirculation region, and the boundary layer, which are widespread in many scientific and engineering applications. Numerical results are compared with experimental data from the literature and with direct numerical simulation results from GASFLOW-MPI. Both the time-averaged velocity profile and the turbulent intensity agree well with the experimental data and the direct numerical simulation results. Furthermore, the frequency spectrum is presented, and a -5/3 energy decay is observed over a wide range of frequencies, consistent with turbulent energy spectrum theory. Parallel scaling tests are also performed on the KIT/IKET cluster, and linear scaling is achieved for GASFLOW-MPI.
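For context, the -5/3 decay mentioned in the abstract above is usually written as Kolmogorov's inertial-range spectrum. The symbols below are the standard textbook ones and do not come from the abstract: $E(k)$ is the energy spectrum, $k$ the wavenumber, $\varepsilon$ the dissipation rate, and $C_K$ the Kolmogorov constant.

```latex
E(k) = C_K \, \varepsilon^{2/3} \, k^{-5/3}
\quad\Longrightarrow\quad
\log E(k) = -\tfrac{5}{3} \log k + \text{const.}
```

On a log-log plot this is a straight line of slope -5/3. Under Taylor's frozen-turbulence hypothesis the same slope is expected in a frequency spectrum, which is presumably the quantity the abstract reports.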

Runtime Prediction Based on Workload-Aware Clustering (병렬 프로그램 로그 군집화 기반 작업 실행 시간 예측모형 연구)

  • Kim, Eunhye;Park, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.38 no.3
    • /
    • pp.56-63
    • /
    • 2015
  • Several fields of science have demanded large-scale workflow support, which requires thousands of CPU cores or more. To support such large-scale scientific workflows, high-capacity parallel systems such as supercomputers are widely used. To increase the utilization of these systems, most schedulers use a backfilling policy: small jobs are moved ahead to fill holes in the schedule as long as they do not delay large jobs. Since backfilling requires an estimate of each job's runtime, most parallel systems rely on user-supplied runtime estimates. However, these estimates are found to be extremely inaccurate because users overestimate the runtime of their jobs. Therefore, in this paper, we propose a novel system for runtime prediction based on workload-aware clustering, with the goal of improving prediction performance. The proposed method for predicting the runtime of parallel applications consists of three main phases. First, feature selection based on factor analysis is performed to identify important input features. Then, history data are clustered using a self-organizing map, followed by hierarchical clustering to find the cluster boundaries from the weight vectors. Finally, prediction models are constructed using support vector regression on the clustered workload data. Multiple prediction models, one for each clustered data pattern, can reduce the error rate compared with a single model for the whole data set. In the experiments, we use workload logs from parallel systems (i.e., iPSC, LANL-CM5, SDSC-Par95, SDSC-Par96, and CTC-SP2) to evaluate the effectiveness of our approach. Compared with other techniques, experimental results show that the proposed method improves accuracy by up to 69.08%.
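The backfilling policy described in the abstract above can be sketched as follows. This is a deliberately simplified illustration, not the scheduler or prediction system from the paper: the `Job` class, the `pick_backfill` function, and all numbers are hypothetical. A waiting job is allowed to jump ahead only if it fits in the currently free cores and its estimated finish time is no later than the reserved start time of the job at the head of the queue.

```python
from dataclasses import dataclass

@dataclass
class Job:
    name: str
    cores: int
    est_runtime: float  # runtime estimate (user-supplied or predicted)

def pick_backfill(queue, free_cores, now, head_start):
    """Return jobs behind the queue head that can start now without delaying it.

    queue[0] is the head job, which is reserved to start at head_start.
    """
    chosen = []
    for job in queue[1:]:
        fits = job.cores <= free_cores
        done_in_time = now + job.est_runtime <= head_start
        if fits and done_in_time:
            chosen.append(job)
            free_cores -= job.cores  # cores are now taken by the backfilled job
    return chosen

queue = [
    Job("big", 64, 100.0),      # head job, waiting for 64 cores
    Job("small_a", 8, 20.0),
    Job("long_b", 8, 500.0),
    Job("small_c", 16, 30.0),
]
picked = pick_backfill(queue, free_cores=16, now=0.0, head_start=60.0)
print([job.name for job in picked])
```

The check on `est_runtime` shows why estimate quality matters: if the estimate for `small_a` were inflated from 20 to 80, the job would no longer qualify and the free cores would sit idle until the head job starts.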

Bibliometric Analysis of Collaboration Network and the Role of Research Station in Antarctic Science

  • Kim, Hyunuk;Jung, Woo-Sung
    • Industrial Engineering and Management Systems
    • /
    • v.15 no.1
    • /
    • pp.92-98
    • /
    • 2016
  • Due to the large scale of Antarctic science, scientific collaboration is required for conducting research. In this study, we investigate the collaboration network and the role of research stations in Antarctic science based on bibliometric data from 1995 to 2014. By employing community detection in the network, we confirmed that geographical proximity tends to be important for scientific collaboration. This result raises the question of what role research stations play in Antarctica. We tried to reveal this role by focusing on five countries, Belgium, China, Czech Republic, India, and Korea, that constructed new research stations during the last decade. The relative growth rate, a measure of the growth of publications, did not differ much around the construction period compared with other periods for these countries, except Belgium. However, we found that geographical keywords emerged around the time of construction for all five countries. These keywords were used to observe national research activities in Antarctica. They show which locations the countries became concerned with after the construction.

Systems Biology and Emerging Technologies Will Catalyze the Transition from Reactive Medicine to Predictive, Personalized, Preventive and Participatory (P4) Medicine

  • Galas, David J.;Hood, Leroy
    • Interdisciplinary Bio Central
    • /
    • v.1 no.2
    • /
    • pp.6.1-6.4
    • /
    • 2009
  • We stand at the brink of a fundamental change in how medicine will be practiced. Over the next 5-20 years medicine will move from being largely reactive to being predictive, personalized, preventive and participatory (P4). Technology and new scientific strategies have always been the drivers of revolutions, and this is certainly the case for P4 medicine, where a systems approach to disease, new and emerging technologies and powerful computational tools will open new windows for the investigation of disease. Systems approaches are driving the emergence of fascinating new technologies that will permit billions of measurements on each individual patient. The challenge for health information technology will be how to reduce this enormous amount of data to simple hypotheses about health and disease. We predict that emerging technologies, together with the systems approaches to diagnosis, therapy and prevention, will lead to a downturn in the escalating costs of healthcare. In time we will be able to export P4 medicine to the developing world and it will become the foundation of global medicine. The "democratization" of healthcare will come from P4 medicine. Its first real emergence will require the unprecedented integration of biology, medicine, technology and computation, as well as societal issues of major importance: ethical, regulatory, public policy, economic, and others. In order to effectively move the P4 scientific agenda forward, new strategic partnerships are now being created with the large-scale integration of complementary skills, technologies, computational tools, patient records and samples and analysis of societal issues. It is evident that the business plans of every sector of the healthcare industry will need to be entirely transformed over the next 10 years, and the extent to which this will be done by existing companies as opposed to newly created companies is a fascinating question.

The Contribution of University-business Interaction to Innovation: Bibliometric Analysis (대학과 기업 간 상호협력에 따른 혁신창출 -계량서지학적 분석-)

  • Beck, Yeong Ki
    • Journal of the Economic Geographical Society of Korea
    • /
    • v.15 no.4
    • /
    • pp.493-514
    • /
    • 2012
  • Research collaboration between industry and universities is high on many policy agendas nowadays, especially with regard to science-based technological innovation. Nonetheless, there have been few attempts to examine large-scale, systematic, quantitative data on the nature and extent of university-industry collaborations. The objective of this paper is to explore the patterns and trends of research collaboration between universities and companies for scientific knowledge production in seven science-based technologies. This paper uses co-authored articles published in major scientific journals worldwide as an indicator of collaborative scientific research between universities, companies and governmental research institutes. Tens of thousands of co-authored papers from the northeast region of the US over the years 2006 to 2010 were analyzed for collaboration patterns and their spatial characteristics. This paper finds that there were increases both in the proportion of multi-authored papers, particularly those with five or more authors, and in the volume of international collaborations. Examining the types of collaboration between different institutions shows that research collaboration between universities and companies in this region has a relatively high share at the national level. This suggests that the national or even international scale seems more appropriate for innovation policies.


Design of the new parallel processing architecture for commercial applications (상용 응용을 위한 병렬처리 구조 설계)

  • 한우종;윤석한;임기욱
    • Journal of the Korean Institute of Telematics and Electronics B
    • /
    • v.33B no.5
    • /
    • pp.41-51
    • /
    • 1996
  • In this paper, a new parallel processing system is proposed, based on a cluster architecture that provides the scalability of a parallel processing system while maintaining the characteristics of a shared memory multiprocessor. In recent years, low-cost, high-performance microprocessors have led to the construction of large-scale parallel processing systems. Such systems provide high scalability but are mainly used for scientific applications with large data parallelism. A shared memory multiprocessor system such as TICOM is currently used as a server for commercial applications; however, shared memory multiprocessor systems are known to have very limited scalability. The proposed architecture supports the scalability and performance of a parallel processing system while providing adaptability for commercial applications, and can therefore overcome the limitations of the shared memory multiprocessor. The architecture and characteristics of the proposed system are described. A proprietary hierarchical crossbar network is designed for this system, whose protocol, routing and switching technique, and signal transfer technique are optimized for the proposed architecture. The design trade-offs for the network are described in this paper, and simulation using SES/workbench shows that the network fits the proposed architecture.
