• Title/Summary/Keyword: Large scale of data

Statistical Issues in Genomic Cohort Studies (유전체 코호트 연구의 주요 통계학적 과제)

  • Park, So-Hee
    • Journal of Preventive Medicine and Public Health / v.40 no.2 / pp.108-113 / 2007
  • Large-scale cohort studies raise numerous statistical issues across study design, data collection, data analysis, and interpretation. In genomic cohort studies these problems become more complicated and need to be dealt with carefully. Rapid technical advances in genomic studies produce enormous amounts of data, and traditional statistical methods are no longer sufficient to handle them. In this paper, we review several important statistical issues that occur frequently in large-scale genomic cohort studies, including measurement error and its correction methods, cost-efficient design strategies for main cohort and validation studies, inflated Type I error, gene-gene and gene-environment interaction, and time-varying hazard ratios. Appropriate statistical methods are essential to make the best use of valuable cohort data and to produce valid and reliable study results.
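
The abstract names inflated Type I error as one issue but does not say which correction the paper discusses. As a minimal sketch of why the problem arises when many markers are tested at once, the following assumes independent tests and uses the standard Bonferroni correction; the function names are hypothetical and are not taken from the paper.

```python
def familywise_error(alpha, num_tests):
    """Probability of at least one false positive across independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


def bonferroni_reject(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m (Bonferroni)."""
    if not p_values:
        return []
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]


if __name__ == "__main__":
    # With 100 independent tests at alpha = 0.05, a false positive is almost certain.
    print(familywise_error(0.05, 100))
    # Only p-values at or below 0.05 / 3 survive the correction.
    print(bonferroni_reject([0.001, 0.02, 0.04]))
```

Bonferroni is conservative when tests are correlated, as genetic markers often are, so it is shown here only as the simplest illustration.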

Distributed In-Memory based Large Scale RDFS Reasoning and Query Processing Engine for the Population of Temporal/Spatial Information of Media Ontology (미디어 온톨로지의 시공간 정보 확장을 위한 분산 인메모리 기반의 대용량 RDFS 추론 및 질의 처리 엔진)

  • Lee, Wan-Gon;Lee, Nam-Gee;Jeon, MyungJoong;Park, Young-Tack
    • Journal of KIISE / v.43 no.9 / pp.963-973 / 2016
  • Providing a semantic knowledge system using media ontologies requires not only conventional axiom reasoning but also knowledge extension based on various types of reasoning. In particular, spatio-temporal information can be used in a variety of artificial intelligence applications, and the importance of spatio-temporal reasoning and representation is continuously increasing. In this paper, we append the LOD data related to the public address system to large-scale media ontologies in order to utilize spatial inference in reasoning. We propose an RDFS/spatial inference system that uses a distributed in-memory framework to reason about large-scale ontologies annotated with spatial information. In addition, we describe a distributed, parallel SPARQL query processing method designed for large-scale ontology data annotated with spatio-temporal information. To evaluate the performance of our system, we conducted experiments using the LUBM and BSBM data sets as ontology reasoning and query processing benchmarks.
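
To make "RDFS reasoning" concrete, here is a minimal single-machine sketch that forward-chains two standard RDFS entailment rules (rdfs11, subclass transitivity, and rdfs9, type inheritance) to a fixpoint. It is not the distributed in-memory engine the paper describes; the function name and the `ex:` identifiers are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Apply rdfs9 and rdfs11 repeatedly until no new triple is derived."""
    inferred = set(triples)
    while True:
        subclass = [(s, o) for s, p, o in inferred if p == SUBCLASS]
        typed = [(s, o) for s, p, o in inferred if p == TYPE]
        new = set()
        for a, b in subclass:
            # rdfs11: (a subClassOf b), (b subClassOf d) => (a subClassOf d)
            for c, d in subclass:
                if b == c:
                    new.add((a, SUBCLASS, d))
            # rdfs9: (x type a), (a subClassOf b) => (x type b)
            for x, cls in typed:
                if cls == a:
                    new.add((x, TYPE, b))
        if new <= inferred:
            return inferred
        inferred |= new


if __name__ == "__main__":
    facts = {
        ("ex:Movie", SUBCLASS, "ex:Media"),
        ("ex:Media", SUBCLASS, "ex:Work"),
        ("ex:m1", TYPE, "ex:Movie"),
    }
    for triple in sorted(rdfs_closure(facts)):
        print(triple)
```

A distributed engine would partition the triples and run the same joins in parallel; the fixpoint structure stays the same.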

Identification of flutter derivatives from full-scale ambient vibration measurements of the Clifton Suspension Bridge

  • Nikitas, Nikolaos;Macdonald, John H.G.;Jakobsen, Jasna B.
    • Wind and Structures / v.14 no.3 / pp.221-238 / 2011
  • The estimated response of large-scale engineering structures to severe wind loads is prone to modelling uncertainties that can only ultimately be assessed by full-scale testing. To this end ambient vibration data from full-scale monitoring of the historic Clifton Suspension Bridge has been analysed using a combination of a frequency domain system identification method and a more elaborate stochastic identification technique. There is evidence of incipient coupling action between the first vertical and torsional modes in strong winds, providing unique full-scale data and making this an interesting case study. Flutter derivative estimation, which has rarely previously been attempted on full-scale data, was performed to provide deeper insight into the bridge aerodynamic behaviour, identifying trends towards flutter at higher wind speeds. It is shown that, as for other early suspension bridges with bluff cross-sections, single-degree-of-freedom flutter could potentially occur at wind speeds somewhat below requirements for modern designs. The analysis also demonstrates the viability of system identification techniques for extracting valuable results from full-scale data.

HOT GAS HALOS IN EARLY-TYPE GALAXIES AND ENVIRONMENTS

  • Kim, Eunbin;Choi, Yun-Young;Kim, Sungsoo S.
    • Journal of The Korean Astronomical Society / v.46 no.1 / pp.33-40 / 2013
  • We investigate the dependence of the extended X-ray emission from the halos of optically luminous early-type galaxies on the small-scale (the nearest neighbor distance) and large-scale (the average density inside the 20 nearest galaxies) environments. We cross-match the 3rd Data Release of the Second XMM-Newton Serendipitous Source Catalog (2XMMi-DR3) to a volume-limited sample of the Sloan Digital Sky Survey (SDSS) Data Release 7 with $M_r < -19.5$ and $0.020 < z < 0.085$, and find 20 early-type galaxies that have extended X-ray detections. The X-ray luminosity of the galaxies is found to correlate more tightly with the optical and near-infrared luminosities when the galaxy is situated in a region of low large-scale density than in a region of high large-scale density. Furthermore, the X-ray to optical (r-band) luminosity ratio, $L_X/L_r$, shows a clear correlation with the distance to the nearest neighbor and with the large-scale density environment only where the galaxies in a pair interact hydrodynamically, with separations of $r_p < r_{vir}$. These findings indicate that galaxies in the high local density region have mechanisms responsible for their halo X-ray luminosities other than the current presence of a close encounter, or alternatively, that in the high local density region the cooling time of the heated gas halo is longer than the typical time between subsequent encounters.

Stress and strain behavior investigation on a scale model geotextile tube for Saemangeum dike project

  • Kim, Hyeong-Joo;Lee, Kwang-Hyung;Jo, Sung-Kyeong;Jamin, Jay C.
    • Ocean Systems Engineering / v.4 no.4 / pp.309-325 / 2014
  • Geotextile tubes are essentially large sacks filled with sand or dredged soil. They are made of permeable woven or non-woven synthetic fibers (i.e., polyester or PET and polypropylene or PP). The performance of geotextile tubes in strength, dewatering, retention of solid particles, and stacked stability has been studied extensively in the past. However, little research has been done on the deformation behavior of geotextile tubes. In this paper, a large-scale apparatus for geotextile tube experiments is introduced. The apparatus is equipped with a slurry mixing station, a pumping and delivery station, an observation station, and a data station. In this study, the large-scale apparatus was used to investigate the stresses on the geotextile and the deformation behavior of the geotextile tube. Model tests were conducted using custom-made woven geotextile tubes. Load cells were placed at the inner belly of the geotextile tube to monitor the total soil pressure. Strain gauges were also placed on the outer skin of the tube to measure the geotextile strain. The pressure and strain sensors are attached to a data logger that sends the collected data to a desktop computer. The experimental results showed that the maximum geotextile strain occurs at the sides of the tube and that the soil pressure distribution varies at each geotextile tube section.

Design and Implementation of Big Data Platform for Image Processing in Agriculture (농업 이미지 처리를 위한 빅테이터 플랫폼 설계 및 구현)

  • Nguyen, Van-Quyet;Nguyen, Sinh Ngoc;Vu, Duc Tiep;Kim, Kyungbaek
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.50-53 / 2016
  • Image processing techniques play an increasingly important role in many aspects of our daily life. For example, they have been shown to improve agricultural productivity in a number of ways, such as plant pest detection and fruit grading. However, the massive quantities of images generated in real time by multiple devices, such as remote sensors monitoring plant growth, lead to the challenges of big data. Meanwhile, most current image processing systems are designed for small-scale, local computation, and they do not scale well to big data problems with their large requirements for computational resources and storage. In this paper, we propose the IPABigData (Image Processing Algorithm BigData) platform, which provides algorithms to support large-scale image processing in agriculture based on the Hadoop framework. Hadoop provides the MapReduce parallel computation model and the Hadoop Distributed File System (HDFS) module. It can also handle parallel pipelines, which are frequently used in image processing. In our experiment, we show that our platform outperforms a traditional system in an image segmentation scenario.

A Workflow Execution System for Analyzing Large-scale Astronomy Data on Virtualized Computing Environments

  • Yu, Jung-Lok;Jin, Du-Seok;Yeo, Il-Yeon;Yoon, Hee-Jun
    • International Journal of Contents / v.16 no.4 / pp.16-25 / 2020
  • The size of observation data in astronomy has been increasing exponentially with the advent of wide-field optical telescopes. This calls for changes in the way large-scale astronomy data is analyzed. The complexity of analysis tools and the lack of extensibility of computing environments, however, make it difficult and inefficient to deal with such huge volumes of observation data. To address this problem, this paper proposes a workflow execution system for analyzing large-scale astronomy data efficiently. The proposed system is composed of two parts: 1) a workflow execution manager and its RESTful endpoints, which can automate and control data analysis tasks based on workflow templates, and 2) an elastic resource manager as an underlying mechanism that can dynamically add or remove virtualized computing resources (i.e., virtual machines) according to the analysis requests. To realize our workflow execution system, we implement it on a testbed using the OpenStack IaaS (Infrastructure as a Service) toolkit and the HTCondor workload manager. We also perform a broad range of experiments with different resource allocation patterns, system loads, etc. to show the effectiveness of the proposed system. The results show that the resource allocation mechanism works properly according to the number of queued and running tasks, which improves resource utilization, and that the workflow execution manager can handle more than 1,000 concurrent requests within a second with reasonable average response times. Finally, we describe a case study of a data reduction system as an example application of our workflow execution system.
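
The abstract says resources are added or removed according to the number of queued and running tasks, but gives no scaling rule. The sketch below shows one plausible rule under stated assumptions: every parameter (`slots_per_vm`, `min_vms`, `max_vms`) and both function names are hypothetical, and no OpenStack or HTCondor API is called.

```python
import math


def target_vm_count(queued, running, slots_per_vm=4, min_vms=1, max_vms=16):
    """Number of VMs needed to give every queued or running task a slot,
    clamped to the allowed pool size (all limits are assumed values)."""
    needed = math.ceil((queued + running) / slots_per_vm)
    return max(min_vms, min(max_vms, needed))


def scaling_action(current_vms, queued, running, **limits):
    """Positive result: VMs to add. Negative: VMs to remove. Zero: no change."""
    return target_vm_count(queued, running, **limits) - current_vms


if __name__ == "__main__":
    # 10 queued + 6 running tasks at 4 slots per VM need 4 VMs; 2 exist, so add 2.
    print(scaling_action(current_vms=2, queued=10, running=6))
```

A production manager would also add a cooldown between decisions so that short load spikes do not cause VMs to be created and destroyed repeatedly.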

Application of satellite image data to management plan of large-scale irrigation projects

  • Torii, Kiyoshi;Yaota, Kiyoyuki;Hata, Takeshi;Abdelhadi, A.W.
    • Proceedings of the KSRS Conference / 2003.11a / pp.1233-1235 / 2003
  • We are studying large-scale irrigation projects of 100,000 ha to 1,000,000 ha around the world, with emphasis on their historical courses of development, geographical conditions, water management, planting conditions, and land-use changes. Recent advances in GIS and remote sensing techniques have enabled us to pursue studies in this field in more detail. Here, we describe the results of an analysis of the distribution of irrigation canals, the three-dimensional shape of the irrigation area, and the planting conditions of agricultural products investigated last year in the Gezira irrigation project, Sudan, using Landsat 7 ETM data and a DEM supplied by USGS.

A Dynamic Partitioning Scheme for Distributed Storage of Large-Scale RDF Data (대규모 RDF 데이터의 분산 저장을 위한 동적 분할 기법)

  • Kim, Cheon Jung;Kim, Ki Yeon;Yoo, Jong Hyeon;Lim, Jong Tae;Bok, Kyoung Soo;Yoo, Jae Soo
    • Journal of KIISE / v.41 no.12 / pp.1126-1135 / 2014
  • In recent years, RDF partitioning schemes have been studied for the effective distributed storage and management of large-scale RDF data. In this paper, we propose an RDF dynamic partitioning scheme to support load balancing in dynamic environments where the RDF data is continuously inserted and updated. The proposed scheme creates clusters and sub-clusters according to the frequency of the RDF data used by queries to set graph partitioning criteria. We partition the created clusters and sub-clusters by considering the workloads and data sizes for the servers. Therefore, we resolve the data concentration of a specific server, resulting from the continuous insertion and update of the RDF data, in such a way that the load is distributed among servers in dynamic environments. It is shown through performance evaluation that the proposed scheme significantly improves the query processing time over the existing scheme.
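
The abstract describes placing clusters on servers by workload and data size without giving the algorithm. As a rough illustration of load-aware placement only, the sketch below uses a generic greedy heuristic (heaviest cluster first, onto the currently least-loaded server). It is a swapped-in technique, not the paper's scheme, and the single `load` weight per cluster is an assumption.

```python
import heapq


def assign_clusters(clusters, num_servers):
    """Greedy placement: clusters maps name -> load (e.g., query frequency).
    Returns a dict mapping each cluster name to a server index."""
    heap = [(0, server) for server in range(num_servers)]  # (total load, server)
    heapq.heapify(heap)
    placement = {}
    # Heaviest clusters first; ties broken by name for deterministic output.
    for name, load in sorted(clusters.items(), key=lambda kv: (-kv[1], kv[0])):
        total, server = heapq.heappop(heap)
        placement[name] = server
        heapq.heappush(heap, (total + load, server))
    return placement


if __name__ == "__main__":
    print(assign_clusters({"a": 5, "b": 4, "c": 3, "d": 2}, num_servers=2))
```

In a dynamic setting, the placement would be recomputed, or adjusted incrementally, as inserts and updates shift the per-cluster loads.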

A FRAMEWORK FOR QUERY PROCESSING OVER HETEROGENEOUS LARGE SCALE SENSOR NETWORKS

  • Lee, Chung-Ho;Kim, Min-Soo;Lee, Yong-Joon
    • Proceedings of the KSRS Conference / 2007.10a / pp.101-104 / 2007
  • Efficient query processing and optimization are critical for reducing network traffic and query latency when accessing and manipulating sensor data in large-scale sensor networks. These topics have been studied in sensor database projects. Such work has mainly focused on in-network query processing and assumes homogeneous sensor networks, in which every sensor network has the same hardware and software configuration. In this paper, we present a framework for efficient query processing over heterogeneous sensor networks. Our proposed framework introduces a query processing paradigm that considers two heterogeneous characteristics of sensor networks: (1) the data dissemination approach, such as push, pull, or hybrid; and (2) the query processing capability of the sensor networks, i.e., whether they support in-network aggregation and spatial, periodic, and conditional operators. Additionally, we propose multi-query optimization strategies that support cross-translation between data acquisition queries and data stream queries to minimize the total cost of multiple queries. The framework has been implemented in COSMOS, a WSN middleware developed by ETRI.
