• Title/Summary/Keyword: Bayesian and non-parametric methods

How are Bayesian and Non-Parametric Methods Doing a Great Job in RNA-Seq Differential Expression Analysis? : A Review

  • Oh, Sunghee
    • Communications for Statistical Applications and Methods / v.22 no.2 / pp.181-199 / 2015
  • In its short history, RNA-seq has become a revolutionary tool for directly decoding genome-wide expression profiles, covering differential expression at the gene, transcript, isoform, and exon levels as well as genetic and genomic mutations. RNA-seq has been rapidly replacing arrays with sequencing-based platforms, offering advantages such as the identification of alternative splicing and allele-specific expression. The distinguishing characteristics of high-throughput, large-scale RNA-seq expression profiles lie in expression levels measured as read counts, the correlated structure of samples and genes, a much larger number of genes than samples, different sampling rates, and unavoidable systematic RNA-seq biases. In this study, we comprehensively review how robust Bayesian and non-parametric methods outperform classical statistical approaches by explicitly incorporating these intrinsic RNA-seq features through flexible and more appropriate assumptions and distributions.

A comparison and prediction of total fertility rate using parametric, non-parametric, and Bayesian model (모수, 비모수, 베이지안 출산율 모형을 활용한 합계출산율 예측과 비교)

  • Oh, Jinho
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.677-692 / 2018
  • Korea's total fertility rate (TFR) was 1.05 in 2017, a return to roughly the 1.08 level of 2005. A TFR of 1.05 is a very low fertility level, far below both replacement-level fertility and the safety zone of 1.5, and may indicate a low fertility trap. Predicting fertility is therefore more important now than at any other time. To date, the age-specific fertility rate and the total fertility rate have been predicted with various statistical methods. When the data trend is disconnected or fluctuating, a non-parametric method based on smoothing and weighting has been applied. In addition, a Bayesian method has been applied that uses a prior distribution based on fertility rates in advanced countries, with reference to the three-stage fertility transition. This paper examines which method is reasonable in terms of precision and feasibility by estimating, forecasting, and comparing results for the recent variability of the Korean fertility rate with parametric, non-parametric, and Bayesian methods. The analysis showed that the predicted total fertility rates ranked in the order of KOSTAT's total fertility rate, then the Bayesian, parametric, and non-parametric outcomes. Given the TFR of 1.05 in 2017, the predictions derived from the parametric and non-parametric models are the most reasonable. In addition, if the fertility data are highly complete and of good quality, the parametric approach is superior to the other methods in terms of parameter estimation, computational efficiency, and goodness of fit.

Literature Review on the Statistical Methods in KSQM for 50 Years (품질경영학회 50주년 특별호: 통계적 기법 분야 연구 리뷰)

  • Lim, Yong Bin;Kim, Sang Ik;Lee, Sang Bok;Jang, Dae Heung
    • Journal of Korean Society for Quality Management / v.44 no.2 / pp.221-244 / 2016
  • Purpose: This research reviews the papers on statistical methods published since 1965 in the Journal of the Korean Society for Quality Control (KSQC) and the Journal of the Korean Society for Quality Management (KSQM). The literature review covers four fields of statistical methods, and the published articles are categorized into several sub-areas within each field. Methods: The reviewed articles are classified into four main categories: probability models and estimation, Bayesian and non-parametric analysis, regression and time series analysis, and applications of data analysis. We examine the contents and relationships of the published articles in the sub-areas of each category. Results: We summarize the reviewed papers in chronological road maps for each sub-area and outline the relations among connected papers. Comments on the contents and contributions of the reviewed papers are also provided. Conclusion: Over the past 50 years, a wide range of issues in applied statistical methods have been studied and published, and many valuable results have been achieved in both the theory and the application of statistical methods for improving quality in the manufacturing and service industries. The contents of this research can also help in exploring future directions for research on statistical quality management methods.

Determinacy on a Maximum Resolution in Wavelet Series

  • Park, Chun-Gun;Kim, Yeong-Hwa;Yang, Wan-Youn
    • Journal of the Korean Data and Information Science Society / v.15 no.2 / pp.467-476 / 2004
  • Recently, wavelet series approximation has been developed for the analysis of an unknown function. Most articles have studied thresholding and shrinkage methods for the wavelet coefficients, based on (non)parametric and Bayesian methods, with the sample size taken as the maximum resolution of the wavelet series. In this paper, regardless of the sample size, we focus only on the choice of the maximum resolution in a wavelet series. We propose a Bayesian approach to choosing the maximum resolution based on a linear combination of the wavelet basis functions.

Inverted exponentiated Weibull distribution with applications to lifetime data

  • Lee, Seunghyung;Noh, Yunhwan;Chung, Younshik
    • Communications for Statistical Applications and Methods / v.24 no.3 / pp.227-240 / 2017
  • In this paper, we introduce the inverted exponentiated Weibull (IEW) distribution, which contains the exponentiated inverted Weibull distribution, the inverse Weibull (IW) distribution, and the inverted exponentiated distribution as submodels. The proposed distribution is obtained as the inverse form of the exponentiated Weibull distribution. In particular, we explain how the proposed distribution can be interpreted using the ideas in Marshall and Olkin's book (Lifetime Distributions: Structure of Non-parametric, Semiparametric, and Parametric Families, 2007, Springer). We derive the cumulative distribution function and the hazard function and obtain an expression for the moments. The hazard function of the IEW distribution can be decreasing, increasing, or bathtub-shaped. We obtain the maximum likelihood estimator (MLE) and show its existence and uniqueness. We also obtain Bayesian estimates using the Gibbs sampler with the Metropolis-Hastings algorithm. Applications to one simulated data set and two real data sets illustrate the flexibility of the IEW distribution. Finally, we present conclusions.

Bayesian quantile regression analysis of Korean Jeonse deposit

  • Nam, Eun Jung;Lee, Eun Kyung;Oh, Man-Suk
    • Communications for Statistical Applications and Methods / v.25 no.5 / pp.489-499 / 2018
  • Jeonse is a unique property rental system in Korea in which a tenant pays part of the price of a leased property as a fixed security deposit and gets back the entire deposit when moving out at the end of the tenancy. The Jeonse deposit is very important in the Korean real estate market because it is directly related to residential property sales prices and is a key indicator for predicting future real estate market trends. Jeonse deposit data show a skewed and heteroscedastic distribution, so the commonly used mean regression model may be inappropriate for analyzing them. In this paper, we analyze Jeonse deposit data with a Bayesian quantile regression model, which is non-parametric and does not require any distributional assumptions. The results show that the quantile regression coefficients of most explanatory variables change dramatically across quantiles. The regression coefficients of some variables have different signs at different quantiles, implying that the same variable may affect the Jeonse deposit in opposite directions depending on the amount of the deposit.
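
The abstract above does not spell out the model, so the following is only a rough sketch of the idea behind quantile regression, not the authors' method. It uses the standard check (pinball) loss that quantile regression minimizes; Bayesian versions are commonly built on an asymmetric Laplace likelihood whose kernel is this loss, though whether this paper does so is not stated here. The function names and the toy deposit values are assumptions made for illustration.

```python
def check_loss(residual, tau):
    """Check (pinball) loss for one residual y - prediction at quantile level tau."""
    indicator = 1.0 if residual < 0 else 0.0
    return residual * (tau - indicator)


def best_constant_quantile(values, tau):
    """Return the observed value that minimizes the total check loss."""
    def total_loss(candidate):
        return sum(check_loss(y - candidate, tau) for y in values)
    return min(values, key=total_loss)


# Hypothetical, strongly skewed "deposit" values (illustrative only, not real data).
deposits = [1.0, 2.0, 3.0, 4.0, 100.0]

median_fit = best_constant_quantile(deposits, 0.5)  # fit at the 0.5 quantile
upper_fit = best_constant_quantile(deposits, 0.9)   # fit at the 0.9 quantile
print(median_fit, upper_fit)
```

On this toy sample the 0.5-quantile fit stays with the bulk of the data while the 0.9-quantile fit moves to the extreme value, which is the kind of quantile-dependent behavior the abstract reports for regression coefficients.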

Deep Image Annotation and Classification by Fusing Multi-Modal Semantic Topics

  • Chen, YongHeng;Zhang, Fuquan;Zuo, WanLi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.1 / pp.392-412 / 2018
  • Because of the semantic gap across different modalities, automatic retrieval of multimedia information remains a major challenge. It is desirable to provide an effective joint model that bridges the gap and organizes the relationships between modalities. In this work, we develop a deep image annotation and classification model that fuses multi-modal semantic topics (DAC_mmst). The model finds visual and non-visual topics by jointly modeling the image and loosely related text for deep image annotation while simultaneously learning and predicting the class label. More specifically, DAC_mmst relies on a non-parametric Bayesian model to estimate the best number of visual topics that can perfectly explain the image. To evaluate the effectiveness of the proposed algorithm, we collect a real-world dataset and conduct various experiments. The experimental results show that DAC_mmst performs favorably in perplexity, image annotation, and classification accuracy compared with several state-of-the-art methods.

Phrase-based Topic and Sentiment Detection and Tracking Model using Incremental HDP

  • Chen, YongHeng;Lin, YaoJin;Zuo, WanLi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.12 / pp.5905-5926 / 2017
  • Sentiments can profoundly affect individual behavior as well as decision-making. Given the ever-increasing amount of review information available online, it is desirable to provide an effective sentiment model that both detects and organizes the available information to improve understanding and presents it in a more constructive way for consumers. This study developed a unified phrase-based topic and sentiment detection model, combined with a tracking model using incremental hierarchical Dirichlet allocation (PTSM_IHDP). The model was proposed to discover the evolutionary trend of topic-based sentiments from online reviews. First, the PTSM_IHDP model assumes that each review document is composed of a series of independent phrases, each of which can be represented by both topic information and sentiment information. Second, the model relies on an improved time-dependent non-parametric Bayesian model, integrating incremental hierarchical Dirichlet allocation, to estimate the optimal number of topics by incrementally building an up-to-date model. To evaluate its effectiveness, we tested the model on a collected dataset and compared the results with the predictions of traditional models. The results demonstrate the effectiveness and advantages of our model compared with several state-of-the-art methods.

Estimation of the Regional Future Sea Level Rise Using Long-term Tidal Data in the Korean Peninsula (장기 조위자료를 이용한 한반도 권역별 미래 해수면 상승 추정)

  • Lee, Cheol-Eung;Kim, Sang Ug;Lee, Yeong Seob
    • Journal of Korea Water Resources Association / v.47 no.9 / pp.753-766 / 2014
  • This article estimates the future mean sea level rise (MSLR) due to climate change at major harbors of the Korean Peninsula using statistical methods. First, the Mann-Kendall non-parametric trend test was performed to detect trends in the observed long-term tidal data, and Bayesian change point analysis was used to quantify the locations and magnitudes of change points. In particular, the results of the Bayesian change point analysis were used to combine four future MSLR scenario projections with local MSLR data at five tidal gauges. The proposed procedure improves the determination of the starting years of future MSLR scenario projections with the 18.6-year lunar nodal tidal cycle and effectively reflects local characteristics at each gauge. The final results show that the future MSLR is largest in the Jeju region (Jeju tidal gauge), followed by the Western region (Boryeong tidal gauge) and the Southern region (Busan tidal gauge). Finally, the future MSLRs in the Southern region (Yeosu tidal gauge) and the Eastern region (Sokcho tidal gauge) appear to show the smallest growth among the five gauges.
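
As a hedged illustration of the Mann-Kendall trend test named in this abstract, the sketch below computes the S statistic and a two-sided p-value from the usual normal approximation. It is a minimal version that assumes no tied values (no tie correction in the variance); the function name and the sample series are hypothetical and are not taken from the paper.

```python
import math


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, two-sided p-value) using the normal approximation.
    """
    n = len(series)
    s = 0
    # S counts concordant minus discordant pairs over all i < j.
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the null hypothesis of no trend (no ties assumed).
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    # Continuity-corrected Z score.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p_value


# Hypothetical annual mean sea levels in cm (illustrative only, not real data).
levels = [10.1, 10.4, 10.2, 10.9, 11.3, 11.0, 11.8, 12.1, 12.0, 12.6]
s_stat, z_score, p_value = mann_kendall(levels)
print(s_stat, round(z_score, 3), p_value < 0.05)
```

A positive S with a small p-value suggests an upward monotonic trend. Long tidal records with repeated values would need the tie-corrected variance, which this sketch omits.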

A Study on the Prediction of Power Consumption in the Air-Conditioning System by Using the Gaussian Process (정규 확률과정을 사용한 공조 시스템의 전력 소모량 예측에 관한 연구)

  • Lee, Chang-Yong;Song, Gensoo;Kim, Jinho
    • Journal of Korean Society of Industrial and Systems Engineering / v.39 no.1 / pp.64-72 / 2016
  • In this paper, we use a Gaussian process to predict the power consumption of an air-conditioning system. Because this power consumption takes the form of a time series and its prediction is very important for efficient energy management, it is worth investigating a time-series model for predicting it. To this end, we apply a Gaussian process, which assigns a prior probability to every possible function and gives higher probabilities to functions that are more consistent with the empirical data. We also discuss how to estimate the hyper-parameters, which are the parameters of the covariance function of the Gaussian process model. We estimated the hyper-parameters with two different methods (marginal likelihood and leave-one-out cross-validation) and obtained a model that adequately describes the data, with results that are more or less independent of the hyper-parameter estimation method. We validated the predictions by error analysis using the mean relative error and the mean absolute error. The mean relative error analysis showed that the error amounted to about 3.4% of the predicted value, and the mean absolute error analysis confirmed that the error is within the standard deviation of the predicted value. We also used the non-parametric Wilcoxon signed-rank test to assess the fit of the proposed model and found that the null hypothesis of uniformity was not rejected at the 5% significance level. These results can be applied to more elaborate control of power consumption in air-conditioning systems.