• Title/Summary/Keyword: mixed data set

Genetic Mixed Effects Models for Twin Survival Data

  • Ha, Il-Do;Noh, Maengseok;Yoon, Sangchul
    • Communications for Statistical Applications and Methods / v.12 no.3 / pp.759-771 / 2005
  • Twin studies are one of the most widely used methods for quantifying the influence of genetic and environmental factors on traits such as life span or disease. In this paper we propose a genetic mixed linear model for twin survival time data, which allows us to separate the genetic component from the environmental component. Inferences are based upon the hierarchical likelihood (h-likelihood), which provides a statistically efficient and simple unified framework for various random-effect models. We also propose a simple and fast computation method for analyzing large twin survival data sets. The new method is illustrated using survival data from the Swedish Twin Registry. A simulation study is carried out to evaluate its performance.

Development of a Core Set of Korean Soybean Landraces [Glycine max(L.) Merr.]

  • Cho, Gyu-Taek;Yoon, Mun-Sup;Lee, Jeong-Ran;Baek, Hyung-Jin;Kang, Jung-Hoon;Kim, Tae-San;Paek, Nam-Chon
    • Journal of Crop Science and Biotechnology / v.11 no.3 / pp.157-162 / 2008
  • A total of 2,765 accessions with both seed coat color and 100-seed weight data were used as the initial set. Following molecular profiling with six SSR markers and stratification by usage, 335 accessions (12.1%) were selected by UPGMA-based clustering. Because characterization showed 75 of the 335 accessions to be mixed in phenotypic traits, the final core set comprised 260 accessions. This core set showed nearly the same diversity as other reported results on the morphological traits of Korean soybean landraces. In total, 115 alleles (19.2 alleles per locus) were detected in the initial set and 79 alleles (13.2 alleles per locus) in the core set. All 30 major alleles were present in both the initial set and the core set. The allele coverage of the core set was 71.4% of the initial set. These comparisons of allele number, gene diversity, and coverage indicate that the core set represents the entire set well.
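
The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only a consistency check on the reported figures; it assumes one locus per SSR marker (six loci), and the function name `summarize_core_set` is hypothetical. The 71.4% allele coverage is not recomputed because the abstract does not define how coverage was calculated.

```python
# Figures quoted in the abstract.
INITIAL_ACCESSIONS = 2765
SELECTED_BY_CLUSTERING = 335
MIXED_PHENOTYPE = 75
N_LOCI = 6                 # assumption: one locus per SSR marker
ALLELES_INITIAL = 115
ALLELES_CORE = 79


def summarize_core_set():
    """Recompute the derived figures reported in the abstract."""
    return {
        # Share of the initial set selected by clustering, in percent.
        "selected_pct": round(100 * SELECTED_BY_CLUSTERING / INITIAL_ACCESSIONS, 1),
        # Accessions left after removing those with mixed phenotypic traits.
        "core_set_size": SELECTED_BY_CLUSTERING - MIXED_PHENOTYPE,
        # Mean number of alleles per locus in each set.
        "alleles_per_locus_initial": round(ALLELES_INITIAL / N_LOCI, 1),
        "alleles_per_locus_core": round(ALLELES_CORE / N_LOCI, 1),
    }


if __name__ == "__main__":
    for name, value in summarize_core_set().items():
        print(f"{name}: {value}")
```

Under that assumption the recomputed values match the reported 12.1%, 260 accessions, 19.2 and 13.2 alleles per locus.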

Design and Implementation of Cyber Warfare Training Data Set Generation Method based on Traffic Distribution Plan (트래픽 유통계획 기반 사이버전 훈련데이터셋 생성방법 설계 및 구현)

  • Kim, Yong Hyun;Ahn, Myung Kil
    • Convergence Security Journal / v.20 no.4 / pp.71-80 / 2020
  • To provide realistic traffic to a cyber warfare training system, it is necessary to prepare a traffic distribution plan in advance and to create a training data set from normal and threat data sets. This paper presents the design and implementation of a method for creating a traffic distribution plan and a training data set that supply a cyber warfare training system with background traffic resembling a real environment. We propose a method for creating a traffic distribution plan that uses the network topology of the training environment to distribute traffic, together with traffic attribute information collected in real and simulated environments. We also propose a method for generating a training data set according to the traffic distribution plan, using a unit-traffic method and a mixed-traffic method based on protocol ratios. Using the implemented tool, we created a traffic distribution plan and confirmed the training data set generated according to that plan.

Statistical Method for Implementing the Experimenter Effect in the Analysis of Gene Expression Data

  • Kim, In-Young;Rha, Sun-Young;Kim, Byung-Soo
    • Communications for Statistical Applications and Methods / v.13 no.3 / pp.701-718 / 2006
  • In cancer microarray experiments, the experimenter, or the patient nested within each experimenter, often shows quite heterogeneous error variability, which should be estimated to identify sources of variation. Our study describes a Bayesian method that uses clinical information to identify a set of differentially expressed (DE) genes for the class of subtypes, and that assesses the experimenter effect and the patient effect nested within each experimenter as sources of variation. We propose a Bayesian multilevel mixed effect model based on analysis of covariance (ANACOVA). The Bayesian multilevel mixed effect model combines the multilevel mixed effect model and the Bayesian hierarchical model, which provides a flexible way of defining a suitable correlation structure among genes.

Bayesian modeling of random effects precision/covariance matrix in cumulative logit random effects models

  • Kim, Jiyeong;Sohn, Insuk;Lee, Keunbaik
    • Communications for Statistical Applications and Methods / v.24 no.1 / pp.81-96 / 2017
  • Cumulative logit random effects models are typically used to analyze longitudinal ordinal data. The random effects covariance matrix is used in the models to demonstrate both subject-specific and time variations. The covariance matrix may also be homogeneous; however, the structure of the covariance matrix is assumed to be homoscedastic and restricted because the matrix is high-dimensional and should be positive definite. To satisfy these restrictions, two Cholesky decomposition methods were proposed in linear (mixed) models for the random effects precision matrix and the random effects covariance matrix, respectively: the modified Cholesky and moving average Cholesky decompositions. In this paper, we use these two methods to model the random effects precision matrix and the random effects covariance matrix in cumulative logit random effects models for longitudinal ordinal data. The methods are illustrated using a lung cancer data set.
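
As an illustration of the first decomposition named in this abstract, the sketch below computes a modified Cholesky decomposition of a covariance matrix with NumPy. It uses the common textbook form, in which a unit lower triangular matrix T and a diagonal matrix D satisfy T Σ Tᵀ = D, so that the precision matrix is Σ⁻¹ = Tᵀ D⁻¹ T. This is an assumption about the form intended, not the authors' implementation; the function name `modified_cholesky` and the example matrix are hypothetical.

```python
import numpy as np


def modified_cholesky(sigma):
    """Return (T, D) with T unit lower triangular and D diagonal
    such that T @ sigma @ T.T == D (up to rounding)."""
    sigma = np.asarray(sigma, dtype=float)
    L = np.linalg.cholesky(sigma)   # sigma = L @ L.T, L lower triangular
    d = np.diag(L)                  # positive diagonal entries of L
    C = L / d                       # divide column j by d[j] -> unit diagonal
    T = np.linalg.inv(C)            # inverse of unit lower triangular matrix
    D = np.diag(d ** 2)             # innovation variances
    return T, D


# Hypothetical 3x3 covariance matrix with AR(1)-like correlations.
sigma = np.array([
    [1.00, 0.50, 0.25],
    [0.50, 1.00, 0.50],
    [0.25, 0.50, 1.00],
])

T, D = modified_cholesky(sigma)
# The precision matrix follows directly from the two factors.
precision = T.T @ np.linalg.inv(D) @ T
```

Because T is unconstrained below the diagonal and D only needs positive entries, any choice of these parameters yields a positive definite matrix, which is what makes the decomposition convenient for modeling.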

A Numerical Study on the Performance Analysis of the Mixed Flow Pump for FPSO (수치해석을 이용한 FPSO용 사류펌프 성능해석 연구)

  • Kang, Kyung-Won;Kim, Young-Hun;Kim, Young-Ju;Woo, Nam-Sub;Kwon, Jae-Ki;Yoon, Myung-O
    • The KSFM Journal of Fluid Machinery / v.14 no.5 / pp.12-17 / 2011
  • The seawater lift pump system is responsible for maintaining the open canal level at the set point to provide the suction flow for the circulating water pump. The objective of this paper is to design a 2-stage mixed flow pump (for seawater lifting) by the inverse design method and to evaluate the overall performance and the local flow fields of the pump using a commercial CFD code. The rotating speed of the impeller is 1,750 rpm at a flow rate of 2,700 $m^3$/h. A finite volume method with a structured mesh and the realizable k-${\varepsilon}$ turbulence model is used to obtain a more accurate prediction of the turbulent flow in the pump impeller. Numerical results such as static head, brake horsepower, and efficiency of the mixed flow pump are compared with the design data. The simulated results are in good agreement with the design data, with less than 3% error.

Classification of Land Cover over the Korean Peninsula using MODIS Data (MODIS 자료를 이용한 한반도 지면피복 분류)

  • Kang, Jeon-Ho;Suh, Myoung-Seok;Kwak, Chong-Heum
    • Atmosphere / v.19 no.2 / pp.169-182 / 2009
  • Interest in land-atmosphere schemes has increased steadily in recent years as a way to improve the performance of climate and numerical models. Realistic calculation of land-atmosphere interaction requires high-quality land surface information. In this study, a new land cover map over the Korean peninsula was developed using MODIS (MODerate resolution Imaging Spectroradiometer) data. Seven phenological data sets (maximum, minimum, amplitude, average, growing period, growing rate, and shedding rate) derived from the 15-day normalized difference vegetation index (NDVI) were used as the basic input data. ISOData (Iterative Self-Organizing Data Analysis), an unsupervised non-hierarchical clustering method, was applied to the seven phenological data sets. After clustering, a land cover type was assigned to each cluster according to the phenological characteristics of each land cover type defined by the USGS (U.S. Geological Survey). Most of the Korean peninsula is occupied by deciduous broadleaf forest (46.5%), mixed forest (15.6%), and dryland crop (13%). In contrast, the dominant land cover types in South Korea are more diverse: evergreen needleleaf forest (29.9%), mixed forest (26.6%), deciduous broadleaf forest (16.2%), irrigated crop (12.6%), and dryland crop (10.7%). The 38 in-situ observations over South Korea, the Environmental Geographic Information System, and Google Earth were used to validate the new land cover map. In general, the new land cover map over the Korean peninsula appears to be better classified than the USGS land cover map, especially for areas labeled Savanna in the USGS land cover map.
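
Four of the seven phenological inputs listed in this abstract are simple summary statistics of a pixel's NDVI time series and can be sketched as below. The definition of amplitude as maximum minus minimum is an assumption, as are the function name `basic_phenology` and the sample values; the growing period and the growing and shedding rates are omitted because the abstract does not define them.

```python
def basic_phenology(ndvi_series):
    """Summarize one pixel's NDVI time series (e.g. 15-day composites)."""
    values = [float(v) for v in ndvi_series]
    if not values:
        raise ValueError("ndvi_series must contain at least one value")
    vmax = max(values)
    vmin = min(values)
    return {
        "maximum": vmax,
        "minimum": vmin,
        "amplitude": vmax - vmin,          # assumption: max minus min
        "average": sum(values) / len(values),
    }


# Hypothetical NDVI values for one pixel over part of a year.
sample = [0.25, 0.25, 0.5, 0.75, 0.75, 0.5, 0.25, 0.25]
metrics = basic_phenology(sample)
```

Computed per pixel, such metrics form the feature vectors that a clustering method like ISOData would then group.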

An Interconnection Model of ISP Networks (ISP 네트워크간 상호접속 모델)

  • Choi Eunjeong;Tcha Dong-Wan
    • Journal of the Korean Operations Research and Management Science Society / v.30 no.4 / pp.151-161 / 2005
  • For Internet service providers (ISPs), there are three common types of interconnection agreements: private peering, public peering, and transit. One of the most important problems for a single ISP is to determine which other ISPs to interconnect with, and under which agreements. The problem is then to find a set of private peering providers, transit providers, and Internet exchanges (IXs), given the following input data: a set of BGP addresses with traffic demands, and a set of potential service providers (private peering/transit providers and IXs) with routing information, cost functions, and capacities. The objective is to minimize the total interconnection cost. We show that the problem is NP-hard, give a mixed-integer programming model, and propose a heuristic algorithm. Computational experiments on a set of test instances show that the proposed algorithm performs remarkably well, rapidly generating near-optimal solutions.

Bayesian Analysis for the Error Variance in a Two-Way Mixed-Effects ANOVA Model Using Noninformative Priors (무정보 사전분포를 이용한 이원배치 혼합효과 분산분석모형에서 오차분산에 대한 베이지안 분석)

  • 장인홍;김병휘
    • The Korean Journal of Applied Statistics / v.15 no.2 / pp.405-414 / 2002
  • We consider the problem of estimating the error variance in a two-way mixed-effects ANOVA model using noninformative priors. First, we derive Jeffreys' prior, a reference prior, and matching priors. We then provide marginal posterior distributions under those noninformative priors. Finally, we provide graphs of the marginal posterior densities of the error variance and credible intervals for the error variance for two real data sets, and compare these credible intervals.

Charges of TIP4P water model for mixed quantum/classical calculations of OH stretching frequency in liquid water

  • Jeon, Kiyoung;Yang, Mino
    • Rapid Communication in Photoscience / v.5 no.1 / pp.8-10 / 2016
  • The potential curves of the OH bonds in liquid water are inhomogeneous because of a variety of interactions with other molecules. This leads to a wide distribution of vibrational frequencies, which hampers our understanding of the structure and dynamics of water molecules. Mixed quantum/classical (QM/CM) calculation methods are powerful theoretical techniques for analyzing experimental data from various vibrational spectroscopies of such inhomogeneous systems. In one class of these approaches, the interaction energy between an OH bond and other molecules is approximated by the interaction between charges located at the appropriate interaction sites of the water molecules. For this purpose, we re-calculated the charge values by comparing the approximate interaction energies with quantum chemical interaction energies. We determined a set of charges at the TIP4P charge sites that better represents the quantum mechanical potential curve of the OH bonds in liquid water.