• Title/Abstract/Keywords: statistical approach

2,335 search results

A maximum likelihood approach to infer demographic models

  • Chung, Yujin / Communications for Statistical Applications and Methods / Vol. 27, No. 3 / pp. 385-395 / 2020
  • We present a new maximum likelihood approach to estimating demographic history using genomic data sampled from two populations. A demographic model such as the isolation-with-migration (IM) model explains the genetic divergence of two populations that split from their common ancestral population. The standard probability model for an IM model contains a latent variable called the genealogy, which represents gene-specific evolutionary paths and links the genetic data to the IM model. Under an IM model, a genealogy consists of two kinds of evolutionary paths: vertical inheritance paths (coalescent events) through generations and horizontal paths (migration events) between populations. The computational complexity of IM model inference is one of the major limitations in analyzing genomic data. We propose a fast maximum likelihood approach to estimate IM models from genomic data. The first step analyzes the genomic data and maximizes the likelihood of a coalescent tree that contains the vertical paths of the genealogy. The second step analyzes the estimated coalescent trees and finds the parameter values of an IM model that maximize the probability of those trees after taking account of possible migration events. We evaluate the performance of the new method through analyses of simulated data and genomic data from two subspecies of common chimpanzees in Africa.
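
A rough toy sketch of the second step's structure is given below under heavy simplification: per-locus coalescent information is reduced to simulated pairwise coalescence times from a single constant-size population, and the scaled population size is estimated by numerically maximizing the coalescent likelihood with scipy. It is not the authors' two-population IM procedure, which also models migration and full coalescent trees.

```python
# Toy stand-in (not the paper's IM algorithm): given per-locus pairwise coalescence
# times, estimate the scaled population size theta by maximum likelihood. Under a
# constant-size coalescent, pairwise coalescence times are Exponential with mean theta.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta_true = 2.5
times = rng.exponential(theta_true, size=500)    # stand-in for "estimated coalescent trees"

def neg_log_lik(theta):
    return -np.sum(-np.log(theta) - times / theta)

fit = minimize_scalar(neg_log_lik, bounds=(1e-6, 50), method="bounded")
print(f"ML estimate of theta: {fit.x:.3f} (analytic MLE = sample mean = {times.mean():.3f})")
```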

Outlier Detection Using Support Vector Machines

  • 서한손;윤민 / Communications for Statistical Applications and Methods / Vol. 18, No. 2 / pp. 171-177 / 2011
  • To construct an approximating function from data obtained in real applications, it is necessary to remove outliers from the measured raw data before modeling. Conventional outlier detection methods have relied on visualization or on maximum residuals, but they often give poor results for nonlinear functions with multi-dimensional input data. Outlier detection methods based on standard support vector regression perform well for such nonlinear functions, but they have practical drawbacks such as computational cost and the calibration of parameters. In this paper, we propose a practical outlier detection method based on support vector regression that reduces the computational cost and defines the outlier threshold appropriately. We demonstrate the validity of the proposed method by applying it to real data sets.
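
A minimal sketch of the generic support-vector-regression residual idea the abstract builds on is shown below, using scikit-learn: fit SVR, then flag points whose absolute residuals exceed a simple cutoff. The data, kernel settings, and mean-plus-three-standard-deviations threshold are illustrative assumptions, not the paper's cost-reduced method or its specific threshold rule.

```python
# Minimal sketch: flag outliers as points with large support-vector-regression residuals.
# Data, kernel settings, and the cutoff are illustrative assumptions.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=200)
y[:5] += 3.0                                          # inject a few artificial outliers

model = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(X, y)
residuals = np.abs(y - model.predict(X))
threshold = residuals.mean() + 3 * residuals.std()    # simple residual-based cutoff
print("flagged indices:", np.where(residuals > threshold)[0])
```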

Bayesian Variable Selection in the Proportional Hazard Model with Application to Microarray Data

  • Lee, Kyeong-Eun;Mallick, Bani K. / Proceedings of the Korean Statistical Society Conference / 2005 Spring Conference / pp. 17-23 / 2005
  • In this paper we consider the well-known semiparametric proportional hazards models for survival analysis. These models are usually used with few covariates and many observations (subjects). However, in a typical setting of gene expression data from DNA microarrays, we need to consider the case where the number of covariates p exceeds the number of samples n. For a given vector of response values, which are times to event (death or censoring times), and p gene expressions (covariates), we address the issue of how to reduce the dimension by selecting the significant genes. This approach enables us to estimate the survival curve when $n \ll p$. In our approach, rather than fixing the number of selected genes, we assign a prior distribution to this number. The approach creates additional flexibility by allowing the imposition of constraints, such as bounding the dimension via a prior, which in effect works as a penalty. To implement our methodology, we use a Markov chain Monte Carlo (MCMC) method. We demonstrate the methodology on diffuse large B-cell lymphoma (DLBCL) complementary DNA (cDNA) data and breast carcinoma data.

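As a crude, hypothetical illustration of variable selection with a prior on model size (not the paper's Cox-model MCMC), the sketch below runs a Metropolis search over inclusion indicators with a Poisson prior on the number of included covariates, using a BIC-style Gaussian approximation in place of the Cox partial likelihood.

```python
# Crude toy sketch (not the paper's Cox-model MCMC): Metropolis search over inclusion
# indicators with a Poisson prior on model size, using a BIC-style Gaussian
# approximation in place of the Cox partial likelihood.
import math
import numpy as np

rng = np.random.default_rng(2)
n, p = 60, 200                                   # p >> n, as in microarray data
X = rng.normal(size=(n, p))
y = X[:, :3] @ np.array([2.0, -1.5, 1.0]) + rng.normal(size=n)

def log_score(gamma, lam=2.0):
    idx = np.flatnonzero(gamma)
    k = idx.size
    if k >= n - 1:
        return -np.inf
    resid = y if k == 0 else y - X[:, idx] @ np.linalg.lstsq(X[:, idx], y, rcond=None)[0]
    log_lik = -0.5 * n * np.log(np.mean(resid ** 2))   # profile Gaussian log-likelihood
    bic_pen = -0.5 * k * np.log(n)                     # rough stand-in for a marginal likelihood
    log_prior = k * np.log(lam) - math.lgamma(k + 1)   # Poisson prior on model size (up to a constant)
    return log_lik + bic_pen + log_prior

gamma = np.zeros(p, dtype=int)
cur = log_score(gamma)
for _ in range(20000):                           # Metropolis over inclusion indicators
    j = rng.integers(p)
    prop = gamma.copy(); prop[j] ^= 1
    new = log_score(prop)
    if np.log(rng.uniform()) < new - cur:
        gamma, cur = prop, new
print("included covariates:", np.flatnonzero(gamma))
```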

Identifying differentially expressed genes using the Polya urn scheme

  • Saraiva, Erlandson Ferreira;Suzuki, Adriano Kamimura;Milan, Luis Aparecido / Communications for Statistical Applications and Methods / Vol. 24, No. 6 / pp. 627-640 / 2017
  • A common interest in gene expression data analysis is to identify genes that present significant changes in expression levels among biological experimental conditions. In this paper, we develop a Bayesian approach to make a gene-by-gene comparison in the case with a control and more than one treatment experimental condition. The proposed approach is set within a Bayesian framework with a Dirichlet process prior. The comparison procedure is based on a model selection procedure developed using the discreteness of the Dirichlet process and its representation via the Polya urn scheme. The posterior probabilities for the models considered are calculated using a Gibbs sampling algorithm. A numerical simulation study is conducted to understand and compare the performance of the proposed method relative to the usual methods based on analysis of variance (ANOVA) followed by a Tukey test. The comparison among methods is made in terms of true positive rate and false discovery rate. We find that the proposed method outperforms the methods based on ANOVA followed by a Tukey test. We also apply the methodologies to a publicly available data set on Plasmodium falciparum protein.
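
For readers unfamiliar with the Polya urn representation of the Dirichlet process that the comparison procedure relies on, a minimal simulation of the urn scheme is sketched below; the clustering of observations into shared and new "colors" is what drives the gene-by-gene model comparison, although the paper's full posterior computation is not reproduced here.

```python
# Minimal simulation of the Polya urn scheme: each new observation joins an existing
# cluster ("color") with probability proportional to its count, or starts a new cluster
# with probability proportional to the concentration parameter alpha.
import numpy as np

def polya_urn(n_draws, alpha, rng):
    counts, labels = [], []
    for _ in range(n_draws):
        probs = np.array(counts + [alpha], dtype=float)
        c = int(rng.choice(len(probs), p=probs / probs.sum()))
        if c == len(counts):
            counts.append(1)                     # open a new cluster
        else:
            counts[c] += 1
        labels.append(c)
    return labels

rng = np.random.default_rng(3)
print(polya_urn(20, alpha=1.0, rng=rng))         # e.g. [0, 0, 1, 0, 0, 2, ...]
```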

A New Statistical Approach for the Estimation of Range and Degree of Fisheries Damages Caused by Public Undertaking

  • 강용주;김기수;장창익;박청길;이종섭 / 수산경영론집 / Vol. 35, No. 1 / pp. 117-132 / 2004
  • This study attempts to suggest a new approach to estimating the range and degree of fisheries damages caused by a large-scale reclamation undertaken in a coastal area, using the central limit theorem (CLT) in statistics. The key result of the study is the introduction of the new concept of the critical variation of an environmental factor, $d_c$. The study defines $d_c$ as the standard deviation of the sample mean $\bar{X}$ of an environmental factor $X$, in other words, $\sigma/\sqrt{n}$. The inner bound of $d_c$ could be the area of fisheries damages caused by the public coastal undertaking. The study also defines the decreasing rate of fisheries production $\delta_{\varepsilon}$, in other words, the degree of fisheries damages, as the rate of change in the distribution of the sample mean $\bar{X}$ caused by a continuous and constant variation of the environmental factor. Therefore, $\delta_{\varepsilon}$ can be calculated easily using a table of the standard normal distribution.

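A hypothetical numerical illustration of the quantities defined above is given below: $d_c = \sigma/\sqrt{n}$ computed for assumed values of $\sigma$ and $n$, and a constant shift of the factor's mean converted to a standard-normal tail probability (one reading of how $\delta_{\varepsilon}$ is obtained from the normal table).

```python
# Hypothetical numbers illustrating the abstract's definitions: d_c = sigma / sqrt(n),
# and a constant shift of the mean converted to a standard-normal tail probability.
import math
from scipy.stats import norm

sigma, n = 4.0, 25                     # assumed standard deviation and sample size of X
d_c = sigma / math.sqrt(n)             # critical variation of the environmental factor
shift = 1.2                            # assumed constant shift in the factor's mean
z = shift / d_c
print(f"d_c = {d_c:.2f}")                                                    # 0.80
print(f"tail probability for the shifted sample mean: {1 - norm.cdf(z):.4f}")  # ~0.0668
```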

Ranking subjects based on paired compositional data with application to age-related hearing loss subtyping

  • Nam, Jin Hyun;Khatiwada, Aastha;Matthews, Lois J.;Schulte, Bradley A.;Dubno, Judy R.;Chung, Dongjun / Communications for Statistical Applications and Methods / Vol. 27, No. 2 / pp. 225-239 / 2020
  • Analysis approaches for single compositional data are well established; however, effective analysis strategies for paired compositional data remain to be investigated. The current project was motivated by studies of age-related hearing loss (presbyacusis), in which subjects are classified into four audiometric phenotypes and need to be ranked within these phenotypes based on their paired compositional data. We address this challenge by formulating it as a classification problem and integrating a penalized multinomial logistic regression model with compositional data analysis approaches. We use the Elastic Net penalty, while considering average, absolute difference, and perturbation operators for compositional data. We applied the proposed approach to a presbyacusis study of 532 subjects, with probabilities that each ear of a subject belongs to each of four presbyacusis subtypes. We further investigated the ranking of presbyacusis subjects obtained with the proposed approach against previous literature. The data analysis results indicate that the proposed approach is effective for ranking subjects based on paired compositional data.
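
A rough sketch of the modeling ingredients is shown below: centered log-ratio features built from a pair of compositions per subject, fed to an elastic-net penalized multinomial logistic regression in scikit-learn. The synthetic data and the particular feature construction are assumptions for illustration, not the paper's average/difference/perturbation operators.

```python
# Rough sketch: clr features from a pair of compositions per subject, classified into
# four hypothetical phenotypes with elastic-net multinomial logistic regression.
import numpy as np
from sklearn.linear_model import LogisticRegression

def clr(x):
    logx = np.log(x)
    return logx - logx.mean(axis=1, keepdims=True)   # centered log-ratio transform

rng = np.random.default_rng(4)
n, k = 300, 4                                        # subjects, parts per composition
left = rng.dirichlet(np.ones(k), size=n)             # composition for one ear
right = rng.dirichlet(np.ones(k), size=n)            # composition for the other ear
features = np.hstack([clr(left), clr(right), clr(left) - clr(right)])
labels = rng.integers(0, 4, size=n)                   # four hypothetical phenotypes

clf = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000).fit(features, labels)
print("per-class probabilities, first subject:", clf.predict_proba(features[:1]).round(3))
```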

Approximate Life Cycle Assessment of Classified Products Using Artificial Neural Network and Statistical Analysis in Conceptual Product Design

  • 박지형;서광규 / 한국정밀공학회지 / Vol. 20, No. 3 / pp. 221-229 / 2003
  • In the early phases of the product life cycle, Life Cycle Assessment (LCA) is now used to support decision-making for conceptual product design, and the best alternative can be selected based on its estimated LCA and its benefits. Both the lack of detailed information and the time required for a full LCA over a wide range of design concepts call for a new approach to environmental analysis. This paper suggests a novel approximate LCA methodology for the conceptual design stage by grouping products according to their environmental characteristics and by mapping product attributes into an impact driver index. The relationship is statistically verified by exploring the correlation between the total impact indicator and the energy impact category. A neural network approach is then developed to predict an approximate LCA of grouped products in conceptual design. Learning algorithms trained on the known characteristics of existing products quickly give LCA results for new product designs. The training is generalized by using product attributes for an ID in a group as well as product attributes for other IDs in other groups. A neural network model with the backpropagation algorithm is used, and the results are compared with those of multiple regression analysis. The proposed approach does not replace the full LCA, but it provides useful guidelines for the design of environmentally conscious products in the conceptual design phase.
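
The comparison described above (a backpropagation neural network versus multiple regression for predicting an impact indicator from product attributes) can be sketched with synthetic data as follows; the attributes, impact function, and network size are purely illustrative assumptions.

```python
# Illustrative sketch with synthetic data: a backpropagation neural network versus
# multiple linear regression for predicting an impact indicator from product attributes.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(5)
X = rng.uniform(0, 1, size=(400, 5))                     # hypothetical product attributes
y = 3 * X[:, 0] ** 2 + 2 * X[:, 1] * X[:, 2] + 0.05 * rng.normal(size=400)  # impact indicator

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
nn = MLPRegressor(hidden_layer_sizes=(16, 16), max_iter=5000, random_state=0).fit(X_tr, y_tr)
lr = LinearRegression().fit(X_tr, y_tr)
print(f"neural network R^2: {nn.score(X_te, y_te):.3f}")
print(f"linear regression R^2: {lr.score(X_te, y_te):.3f}")
```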

A response surface modelling approach for multi-objective optimization of composite plates

  • Kalita, Kanak;Dey, Partha;Joshi, Milan;Haldar, Salil / Steel and Composite Structures / Vol. 32, No. 4 / pp. 455-466 / 2019
  • Despite the rapid advancement in computing resources, many real-life design and optimization problems in structural engineering involve huge computation costs. To counter such challenges, approximate models are often used as surrogates for the highly accurate but time-intensive finite element models. In this paper, surrogates for first-order shear deformation based finite element models are built using a polynomial regression approach. Using statistical techniques like the Box-Cox transformation and ANOVA, the effectiveness of the surrogates is enhanced. The accuracy of the surrogate models is evaluated using statistical metrics like $R^2$, $R^2_{adj}$, $R^2_{pred}$ and $Q^2_{F3}$. By combining these surrogates with nature-inspired multi-criteria decision-making algorithms, namely multi-objective genetic algorithm (MOGA) and multi-objective particle swarm optimization (MOPSO), the optimal combination of design variables to simultaneously maximize fundamental frequency and frequency separation is predicted. It is seen that the proposed approach is simple, effective, and good at inexpensively producing a host of optimal solutions.
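
A sketch of the surrogate-building step alone is shown below: a second-order polynomial response surface fitted to synthetic responses, with $R^2$ and adjusted $R^2$ reported. The Box-Cox/ANOVA refinement, the remaining validation metrics, and the MOGA/MOPSO optimization stages are not reproduced.

```python
# Sketch of the surrogate-building step only: second-order polynomial response surface
# fitted to synthetic finite-element-style outputs, with R^2 and adjusted R^2.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(6)
X = rng.uniform(-1, 1, size=(80, 3))                     # hypothetical design variables
y = 5 + 2 * X[:, 0] - X[:, 1] ** 2 + 0.5 * X[:, 0] * X[:, 2] + 0.05 * rng.normal(size=80)

Phi = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
model = LinearRegression().fit(Phi, y)
n, k = Phi.shape
r2 = model.score(Phi, y)
r2_adj = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
```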

A tutorial on generalizing the default Bayesian t-test via posterior sampling and encompassing priors

  • Faulkenberry, Thomas J. / Communications for Statistical Applications and Methods / Vol. 26, No. 2 / pp. 217-238 / 2019
  • With the advent of so-called "default" Bayesian hypothesis tests, scientists in applied fields have gained access to a powerful and principled method for testing hypotheses. However, such default tests usually come with a compromise, requiring the analyst to accept a one-size-fits-all approach to hypothesis testing. Further, such tests may not have the flexibility to test the problems the scientist really cares about. In this tutorial, I demonstrate a flexible approach to generalizing one specific default test, the JZS t-test (Rouder et al., Psychonomic Bulletin & Review, 16, 225-237, 2009), which is becoming increasingly popular in the social and behavioral sciences. The approach uses two results, the Savage-Dickey density ratio (Dickey and Lientz, 1980) and the technique of encompassing priors (Klugkist et al., Statistica Neerlandica, 59, 57-69, 2005), in combination with MCMC sampling via an easy-to-use probabilistic modeling package for R called Greta. Through a comprehensive mathematical description of the techniques as well as illustrative examples, the reader is presented with a general, flexible workflow that can be extended to solve problems relevant to his or her own work.
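
A simplified stand-in for the Savage-Dickey computation is sketched below in Python rather than R/Greta: the data are modeled as Normal(delta, 1) with a Cauchy(0, 1) prior on delta (the full JZS model also integrates over the variance), delta is sampled by random-walk Metropolis, and BF01 is the posterior density at zero divided by the prior density at zero.

```python
# Simplified Savage-Dickey sketch (not the full JZS model): data ~ Normal(delta, 1),
# delta ~ Cauchy(0, 1); BF01 = posterior density at delta = 0 / prior density at delta = 0.
import numpy as np
from scipy.stats import cauchy, gaussian_kde, norm

rng = np.random.default_rng(7)
data = rng.normal(0.3, 1.0, size=30)             # hypothetical observations

def log_post(delta):
    return cauchy.logpdf(delta) + np.sum(norm.logpdf(data, loc=delta, scale=1.0))

samples, delta, cur = [], 0.0, log_post(0.0)
for _ in range(20000):                           # random-walk Metropolis sampler
    prop = delta + rng.normal(0.0, 0.3)
    new = log_post(prop)
    if np.log(rng.uniform()) < new - cur:
        delta, cur = prop, new
    samples.append(delta)

post_at_zero = gaussian_kde(samples[2000:])(0.0)[0]   # drop burn-in, KDE evaluated at 0
print(f"Savage-Dickey BF01 estimate: {post_at_zero / cauchy.pdf(0.0):.2f}")
```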

Optimization sensor placement of marine platforms using modified ECOMAC approach

  • Vosoughifar, Hamidreza;Yaghoubi, Ali;Khorani, Milad;Biranvand, Pooya;Hosseininejad, Seyedehzeinab / Earthquakes and Structures / Vol. 21, No. 6 / pp. 587-599 / 2021
  • This research evaluated a modified ECOMAC approach for monitoring and investigating the structural health of marine platforms. The material properties of the structure were defined based on a real platform located in the Persian Gulf. Nonlinear time-history analyses were carried out using natural marine waves. The modified ECOMAC approach was designed to determine the best sensor placement according to the dynamic behavior of the structure. This method uses nonlinear time-history analysis results as the seismic response, whereas common COMAC algorithms utilize eigenvalue (modal) responses. The modified ECOMAC procedure was designed and developed by the authors as a MATLAB toolbox. The results show that utilizing an efficient ECOMAC method in the SHM process detects the critical weak points of sensitive marine platforms and supports better decisions about them. The statistical results indicate that the modified ECOMAC based on seismic wave analysis has acceptable accuracy in identifying sensor locations. On average, the statistical comparison of COMAC and ECOMAC via modal and integrated analysis showed a high MAE of 0.052, an RMSE of 0.057, and a small $R^2$ of 0.504, so there is a significant difference between them.
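
For orientation, the classical COMAC computation that the modified ECOMAC builds on is sketched below from two mode-shape sets; the paper's modification, which works from nonlinear time-history responses rather than mode shapes, is not reproduced.

```python
# Sketch of the classical COMAC computation: per degree of freedom, the squared sum of
# absolute cross-products of two mode-shape sets, normalized by their energies.
import numpy as np

def comac(phi, psi):
    num = np.sum(np.abs(phi * psi), axis=1) ** 2
    den = np.sum(phi ** 2, axis=1) * np.sum(psi ** 2, axis=1)
    return num / den

rng = np.random.default_rng(8)
phi = rng.normal(size=(12, 4))                # e.g. analytical mode shapes, 12 DOFs, 4 modes
psi = phi + 0.1 * rng.normal(size=(12, 4))    # e.g. perturbed/measured mode shapes
values = comac(phi, psi)
print("DOFs ranked by correlation (lowest COMAC first):", np.argsort(values))
```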