• Title/Abstract/Keywords: Bayesian and non-parametric methods

10 search results (processing time: 0.024 seconds)

How are Bayesian and Non-Parametric Methods Doing a Great Job in RNA-Seq Differential Expression Analysis? : A Review

  • Oh, Sunghee
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 22, No. 2
    • /
    • pp.181-199
    • /
    • 2015
  • In its short history, RNA-seq has become a revolutionary tool for directly decoding genome-wide expression profiles, including differential expression quantified at the gene, transcript, isoform, and exon level, as well as genetic and genomic mutations. RNA-seq has been rapidly replacing array-based platforms in experimental settings, offering advantages such as the identification of alternative splicing and allele-specific expression. The distinctive characteristics of high-throughput, large-scale RNA-seq expression profiles lie in expression levels measured as read counts, correlation structure among samples and genes, a number of genes far larger than the sample size, different sampling rates, and unavoidable systematic RNA-seq biases. In this study, we comprehensively review how robust Bayesian and non-parametric methods outperform classical statistical approaches by explicitly incorporating these intrinsic RNA-seq features through more flexible and appropriate assumptions and distributions.

A comparison and prediction of total fertility rate using parametric, non-parametric, and Bayesian models

  • 오진호
    • 응용통계연구 (The Korean Journal of Applied Statistics)
    • /
    • Vol. 31, No. 6
    • /
    • pp.677-692
    • /
    • 2018
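  • Korea's total fertility rate (TFR) in 2017 was 1.05, a return to roughly the 2005 level of 1.08. A TFR of 1.05 is far below both the population replacement level (2.1) and the safety level (1.5); it is a lowest-low fertility level, and there is concern that Korea may fall into a lowest-low fertility trap. Reasonable TFR forecasts, and the useful input they provide for fertility policy, are therefore more important than ever. Various statistical methods have been used to forecast TFR trends: parametric, model-based methods when the data are complete and of good quality; non-parametric methods that apply smoothing and weighting when the trend is interrupted or highly volatile; and Bayesian methods that, when data are scarce or of poor quality, refer to the three-phase fertility transition observed in developed countries and use it as a prior distribution. This study applies parametric, non-parametric, and Bayesian methods to Korea's recently volatile fertility rates for estimation and forecasting, and compares the results to determine which method is reasonable in terms of fit and validity. The forecast TFR values were highest for the Statistics Korea projection, followed by the Bayesian, parametric, and non-parametric models, in that order. Given the 2017 TFR of 1.05, the forecasts from the parametric and non-parametric models are reasonable. In addition, when fertility data are complete and of high quality, the parametric approach to estimation and forecasting outperformed the other methods in terms of computational efficiency and goodness of fit.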

Literature Review on the Statistical Methods in KSQM for 50 Years (KSQM 50th Anniversary Special Issue)

  • 임용빈;김상익;이상복;장대흥
    • 품질경영학회지 (Journal of the Korean Society for Quality Management)
    • /
    • Vol. 44, No. 2
    • /
    • pp.221-244
    • /
    • 2016
  • Purpose: This research reviews papers on statistical methods published in the Journal of the Korean Society for Quality Control (KSQC) and the Journal of the Korean Society for Quality Management (KSQM) since 1965. The literature review covers four fields of statistical methods, and the published articles are categorized into several sub-areas within each field. Methods: The reviewed articles are classified into four main categories: probability model and estimation, Bayesian analysis and non-parametric analysis, regression and time series analysis, and application of data analysis. We examine the contents and relationships of the published articles in the sub-areas of each category. Results: We summarize the reviewed papers in chronological road-maps for each sub-area and outline the relations among connected papers. Comments on the contents and contributions of the reviewed papers are also provided. Conclusion: Over the past 50 years, research on applied statistical methods has addressed a wide range of issues, and many valuable results have been achieved in the theory and application of statistical methods for improving quality in the manufacturing and service industries. The contents of this review can also inform the future direction of research on statistical quality management methods.

Determinacy on a Maximum Resolution in Wavelet Series

  • Park, Chun-Gun;Kim, Yeong-Hwa;Yang, Wan-Youn
    • Journal of the Korean Data and Information Science Society
    • /
    • Vol. 15, No. 2
    • /
    • pp.467-476
    • /
    • 2004
  • Recently, approximation by wavelet series has been developed for the analysis of an unknown function. Most articles have studied thresholding and shrinkage methods for the wavelet coefficients, based on (non)parametric and Bayesian methods, with the sample size taken as the maximum resolution of the wavelet series. In this paper, regardless of the sample size, we focus only on the choice of the maximum resolution in a wavelet series. We propose a Bayesian approach to choosing the maximum resolution based on a linear combination of the wavelet basis functions.

Inverted exponentiated Weibull distribution with applications to lifetime data

  • Lee, Seunghyung;Noh, Yunhwan;Chung, Younshik
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 24, No. 3
    • /
    • pp.227-240
    • /
    • 2017
  • In this paper, we introduce the inverted exponentiated Weibull (IEW) distribution, which contains the exponentiated inverted Weibull distribution, the inverse Weibull (IW) distribution, and the inverted exponentiated distribution as submodels. The proposed distribution is obtained as the inverse form of the exponentiated Weibull distribution. In particular, we explain how the proposed distribution can be interpreted using an idea from Marshall and Olkin's book (Lifetime Distributions: Structure of Non-parametric, Semiparametric, and Parametric Families, 2007, Springer). We derive the cumulative distribution function and hazard function and calculate an expression for its moments. The hazard function of the IEW distribution can be decreasing, increasing, or bathtub-shaped. We obtain the maximum likelihood estimator (MLE) and show its existence and uniqueness. We also obtain Bayesian estimates using the Gibbs sampler with the Metropolis-Hastings algorithm. Applications to a simulated data set and two real data sets demonstrate the flexibility of the IEW distribution. Finally, we present conclusions.

Bayesian quantile regression analysis of Korean Jeonse deposit

  • Nam, Eun Jung;Lee, Eun Kyung;Oh, Man-Suk
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 25, No. 5
    • /
    • pp.489-499
    • /
    • 2018
  • Jeonse is a unique property rental system in Korea in which a tenant pays part of the price of a leased property as a fixed security deposit and gets the entire deposit back when moving out at the end of the tenancy. The Jeonse deposit is very important in the Korean real estate market because it is directly related to residential property sales prices and is a key indicator for predicting future real estate market trends. Jeonse deposit data show a skewed and heteroscedastic distribution, so the commonly used mean regression model may be inappropriate for their analysis. In this paper, we apply a Bayesian quantile regression model, which is non-parametric and does not require any distributional assumptions, to analyze Jeonse deposit data. The results show that the quantile regression coefficients of most explanatory variables change dramatically across quantiles. The regression coefficients of some variables have different signs at different quantiles, implying that the same variable may affect the Jeonse deposit in opposite directions depending on the amount of the deposit.
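
As an illustration of the quantile regression idea in the abstract above, the following minimal Python sketch shows the check (pinball) loss that defines a quantile fit. It does not reproduce the authors' Bayesian model or sampler; the function names `check_loss` and `best_constant` and the sample values are hypothetical, chosen only to show that the constant minimizing the loss moves from the median toward the upper tail as the quantile level `tau` increases.

```python
import numpy as np

def check_loss(residuals, tau):
    """Quantile (check) loss: tau * r for r >= 0, (tau - 1) * r for r < 0."""
    r = np.asarray(residuals, dtype=float)
    return np.where(r >= 0, tau * r, (tau - 1.0) * r)

def best_constant(y, tau):
    """Return the data value that minimizes the total check loss over constants."""
    y = np.asarray(y, dtype=float)
    totals = [check_loss(y - c, tau).sum() for c in y]
    return y[int(np.argmin(totals))]

# Hypothetical right-skewed sample standing in for deposit-like data
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])
median_fit = best_constant(y, 0.5)  # fit at the 0.5 quantile
upper_fit = best_constant(y, 0.9)   # fit at the 0.9 quantile
```

On this skewed sample the two fits differ sharply, which is the kind of quantile-dependent behaviour that a single mean regression cannot express.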

Deep Image Annotation and Classification by Fusing Multi-Modal Semantic Topics

  • Chen, YongHeng;Zhang, Fuquan;Zuo, WanLi
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 12, No. 1
    • /
    • pp.392-412
    • /
    • 2018
  • Due to the semantic gap across different modalities, automatic retrieval from multimedia information remains a major challenge. It is desirable to provide an effective joint model to bridge the gap and organize the relationships between modalities. In this work, we develop a model for deep image annotation and classification by fusing multi-modal semantic topics (DAC_mmst), which can find visual and non-visual topics by jointly modeling an image and its loosely related text for deep image annotation while simultaneously learning and predicting the class label. More specifically, DAC_mmst relies on a non-parametric Bayesian model to estimate the number of visual topics that best explains the image. To evaluate the effectiveness of the proposed algorithm, we collect a real-world dataset and conduct various experiments. The experimental results show that DAC_mmst performs favorably in perplexity, image annotation, and classification accuracy compared to several state-of-the-art methods.

Phrase-based Topic and Sentiment Detection and Tracking Model using Incremental HDP

  • Chen, YongHeng;Lin, YaoJin;Zuo, WanLi
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 11, No. 12
    • /
    • pp.5905-5926
    • /
    • 2017
  • Sentiments can profoundly affect individual behavior as well as decision-making. Given the ever-increasing amount of review information available online, it is desirable to provide an effective sentiment model that both detects and organizes the available information to improve understanding and presents it in a more constructive way for consumers. This study develops a unified phrase-based topic and sentiment detection model, combined with a tracking model using incremental hierarchical Dirichlet allocation (PTSM_IHDP). The model is proposed to discover the evolutionary trend of topic-based sentiments in online reviews. First, the PTSM_IHDP model assumes that each review document is composed of a series of independent phrases, each of which can be represented by both topic information and sentiment information. Second, the model relies on an improved time-dependent non-parametric Bayesian model, integrating incremental hierarchical Dirichlet allocation, to estimate the optimal number of topics by incrementally building an up-to-date model. To evaluate its effectiveness, we tested the model on a collected dataset and compared the results with the predictions of traditional models. The results demonstrate the effectiveness and advantages of our model compared to several state-of-the-art methods.

Estimation of the Regional Future Sea Level Rise Using Long-term Tidal Data in the Korean Peninsula

  • 이철응;김상욱;이영섭
    • 한국수자원학회논문집 (Journal of Korea Water Resources Association)
    • /
    • Vol. 47, No. 9
    • /
    • pp.753-766
    • /
    • 2014
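  • This paper statistically estimates future mean sea level rise due to climate change in the major regions of the Korean Peninsula using long-term tidal data. First, the Mann-Kendall test, a non-parametric trend test, was applied to long-term tidal data from five tide gauge stations to test for trends in the observations, and Bayesian change point analysis was then applied to analyze the trends more quantitatively. In particular, the change point results were used to combine four future mean sea level rise scenarios with the regional mean sea level rise data from the five stations. The proposed procedure determines the starting year of the future mean sea level rise scenarios from the change point results rather than from the 18.6-year cycle, so that regional characteristics are reflected more effectively. When future sea level rise was analyzed by region using the change point results, the Jeju region (Jeju tide gauge station) showed the most pronounced sea level rise. The west coast region (Boryeong tide gauge station) and the south coast region (Busan tide gauge station) were estimated to have the second-highest increase in sea level rise, and the south coast region (Yeosu tide gauge station) and the east coast region (Sokcho tide gauge station) the lowest.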
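
The Mann-Kendall trend test named in the abstract above can be sketched in a few lines of Python. This is a minimal version under stated assumptions: no tie correction (all values are taken to be distinct), the usual continuity-corrected normal approximation, and a hypothetical synthetic series in place of real tide gauge data. The function name `mann_kendall` is illustrative, and the Bayesian change point step is not shown.

```python
import math
import numpy as np

def mann_kendall(x):
    """Mann-Kendall trend test without tie correction (assumes distinct values)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # S = sum over all pairs i < j of sign(x[j] - x[i])
    s = 0.0
    for i in range(n - 1):
        s += np.sign(x[i + 1:] - x[i]).sum()
    # Variance of S under the no-trend null hypothesis (no ties)
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected normal approximation
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return s, z, p

# Hypothetical annual mean sea levels (cm), strictly increasing for illustration
levels = np.array([10.0, 10.4, 10.9, 11.1, 11.8, 12.0, 12.7, 13.1, 13.3, 14.0])
s, z, p = mann_kendall(levels)
```

A large positive `s` with a small `p` indicates a monotonic upward trend. Because the test uses only the signs of pairwise differences, it makes no assumption about the distribution of the sea level values.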

A Study on the Prediction of Power Consumption in the Air-Conditioning System by Using the Gaussian Process

  • 이창용;송근수;김진호
    • 산업경영시스템학회지 (Journal of Society of Korea Industrial and Systems Engineering)
    • /
    • Vol. 39, No. 1
    • /
    • pp.64-72
    • /
    • 2016
  • In this paper, we use a Gaussian process to predict the power consumption of an air-conditioning system. Because this power consumption takes the form of a time series and its prediction is very important for efficient energy management, a time-series model for predicting power consumption is worth investigating. To this end, we apply a Gaussian process, which assigns a prior probability to every possible function and gives higher probabilities to functions that are more consistent with the empirical data. We also discuss how to estimate the hyper-parameters, which are the parameters of the covariance function of the Gaussian process model. We estimated the hyper-parameters with two different methods (marginal likelihood and leave-one-out cross validation) and obtained a model that describes the data well, with results that are more or less independent of the estimation method. We validated the predictions by error analysis using the mean relative error and the mean absolute error. The mean relative error analysis showed that about 3.4% of the predicted value came from the error, and the mean absolute error analysis confirmed that the error is within the standard deviation of the predicted value. We also used the non-parametric Wilcoxon signed-rank test to assess the fit of the proposed model and found that the null hypothesis of uniformity was accepted at the 5% significance level. These results can be applied to more elaborate control of power consumption in the air-conditioning system.
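
To make the Gaussian process idea in the abstract above concrete, here is a minimal NumPy sketch of Gaussian process regression with a squared-exponential covariance. Several things are assumptions for illustration and are not taken from the paper: the kernel choice, the synthetic sine-shaped "power" series, the function names, and the crude grid search that picks the length scale with the highest log marginal likelihood. Leave-one-out cross validation, the other estimation method the abstract mentions, is not shown.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential covariance between 1-D input arrays a and b."""
    d = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_fit_predict(x_train, y_train, x_test, length_scale, signal_var, noise_var):
    """Return posterior mean, posterior variance, and log marginal likelihood."""
    K = rbf_kernel(x_train, x_train, length_scale, signal_var)
    K += noise_var * np.eye(len(x_train))
    L = np.linalg.cholesky(K)  # K = L @ L.T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # K^{-1} y
    K_s = rbf_kernel(x_train, x_test, length_scale, signal_var)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = signal_var - np.sum(v ** 2, axis=0)  # variance of the latent function
    log_ml = (-0.5 * y_train @ alpha
              - np.sum(np.log(np.diag(L)))
              - 0.5 * len(x_train) * np.log(2.0 * np.pi))
    return mean, var, log_ml

# Hypothetical readings over time (synthetic, for illustration only)
x = np.linspace(0.0, 10.0, 25)
y = np.sin(x)
x_new = np.array([2.3, 7.7])

# Crude hyper-parameter choice: keep the candidate length scale with the
# highest log marginal likelihood
candidates = [0.01, 1.5, 100.0]
scores = [gp_fit_predict(x, y, x_new, ls, 1.0, 1e-4)[2] for ls in candidates]
best_ls = candidates[int(np.argmax(scores))]
mean, var, _ = gp_fit_predict(x, y, x_new, best_ls, 1.0, 1e-4)
```

In practice the hyper-parameters would be optimized continuously rather than picked from a small grid, but the grid keeps the role of the marginal likelihood visible: length scales that are far too short or far too long explain the data poorly and score lower.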