• Title/Summary/Keyword: statistical approach

Classification-Based Approach for Hybridizing Statistical and Rule-Based Machine Translation

  • Park, Eun-Jin;Kwon, Oh-Woog;Kim, Kangil;Kim, Young-Kil
    • ETRI Journal
    • /
    • v.37 no.3
    • /
    • pp.541-550
    • /
    • 2015
  • In this paper, we propose a classification-based approach for hybridizing statistical machine translation and rule-based machine translation. Both the training dataset used to learn our proposed classifier and our feature extraction method affect the hybridization quality. To create such a training dataset, a previous approach used automatic evaluation metrics to determine, by comparison, which of a set of component machine translation (MT) systems gave the more accurate translation; the most accurate translation was then labelled with the MT system from which it came. In this previous approach, when the metric evaluation scores were low, there was high uncertainty as to which component MT system actually produced the better translation. To reduce such uncertainty, or error in classification, we propose an alternative labeling approach: a cut-off method. In our experiments, using this cut-off method in our proposed classifier, we achieved a translation accuracy of 81.5%, a 5.0% improvement over existing methods.
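The cut-off idea described above can be sketched in a few lines: keep a training example only when the automatic-metric scores of the two component systems are far enough apart for the label to be trustworthy. This is an illustrative reconstruction, not the paper's implementation; the function name, the scores, and the 0.1 cut-off are all invented.

```python
# Hypothetical sketch of the cut-off labeling method: a sentence is labeled
# with the component MT system whose output scores higher under an automatic
# evaluation metric, but examples whose scores are too close are discarded
# as uncertain. Function name, scores, and cutoff are illustrative.

def label_with_cutoff(smt_score, rbmt_score, cutoff=0.1):
    """Return 'SMT', 'RBMT', or None (example discarded as uncertain)."""
    if abs(smt_score - rbmt_score) < cutoff:
        return None  # metric scores too close: the label would be unreliable
    return "SMT" if smt_score > rbmt_score else "RBMT"

pairs = [(0.62, 0.40), (0.51, 0.49), (0.30, 0.55)]
labels = [label_with_cutoff(s, r) for s, r in pairs]
# labels → ['SMT', None, 'RBMT']
```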

Statistical Techniques for Automatic Indexing and Some Experiments with Korean Documents (자동색인의 통계적기법과 한국어 문헌의 실험)

  • Chung Young Mee;Lee Tae Young
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.9
    • /
    • pp.99-118
    • /
    • 1982
  • This paper first reviews various techniques proposed for automatic indexing, with special emphasis on statistical techniques. Frequency-based statistical techniques are categorized into three approaches on the basis of index term selection criteria: the term frequency approach, the document frequency approach, and the probabilistic approach. In the experimental part of this study, Pao's technique, based on Goffman's transition region formula, and Harter's 2-Poisson distribution model with a measure of the potential effectiveness of index terms were tested. The experimental document collection consists of 30 agriculture-related documents written in Korean. Pao's technique did not yield good results, presumably due to differences in word usage between Korean and English. However, Harter's model holds some promise for Korean document indexing, because the evaluation result from this experiment was similar to Harter's own.
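The term frequency approach mentioned above amounts to keeping mid-frequency terms as index terms. A minimal sketch, assuming an arbitrary frequency band rather than Pao's actual transition-region bounds:

```python
from collections import Counter

# Illustrative term-frequency index term selection: keep terms whose
# frequency falls in a mid-frequency band, dropping very rare and very
# common words. The band (2..3) is arbitrary, not Pao's transition region.

def select_index_terms(tokens, low=2, high=3):
    freq = Counter(tokens)
    return sorted(t for t, f in freq.items() if low <= f <= high)

doc = "the cat sat on the mat the cat ran the dog sat".split()
# 'the' occurs 4 times (too common); 'cat' and 'sat' occur twice (kept)
print(select_index_terms(doc))  # → ['cat', 'sat']
```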

A Bayesian Approach to Detecting Outliers Using Variance-Inflation Model

  • Lee, Sangjeen;Chung, Younshik
    • Communications for Statistical Applications and Methods
    • /
    • v.8 no.3
    • /
    • pp.805-814
    • /
    • 2001
  • The problem of 'outliers', observations which look suspicious in some way, has long been one of the foremost concerns of experimenters and data analysts. We propose a model for the outlier problem and analyze it in the linear regression setting using a Bayesian approach with the variance-inflation model. We use Geweke's (1996) ideas, which are based on the data augmentation method, for detecting outliers in the linear regression model. The advantage of the proposed method is that it finds, via the posterior probability, the subset of the data that is most suspicious under the given model. A sampling-based approach is used to carry out the complicated Bayesian computation. Finally, the proposed methodology is applied to a simulated and a real data set.
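The variance-inflation idea can be illustrated numerically: an outlying observation is modeled as drawn from a normal distribution whose variance is inflated by a factor k, and each residual gets a posterior probability of belonging to the inflated component. The sketch below is a toy two-component calculation, not the paper's data-augmentation sampler; k, the prior, and the residuals are invented.

```python
import math

# Toy variance-inflation calculation: an outlier comes from N(0, k*sigma^2)
# rather than N(0, sigma^2). For each residual we compare the two weighted
# density values. k, sigma^2, the prior, and the residuals are illustrative.

def normal_pdf(x, var):
    return math.exp(-x * x / (2 * var)) / math.sqrt(2 * math.pi * var)

def outlier_probability(residual, sigma2=1.0, k=9.0, prior=0.1):
    """Posterior probability the residual came from the inflated component."""
    w_out = prior * normal_pdf(residual, k * sigma2)
    w_in = (1 - prior) * normal_pdf(residual, sigma2)
    return w_out / (w_out + w_in)

probs = [outlier_probability(r) for r in (0.3, 1.0, 4.0)]
# a residual of 4 standard deviations is flagged with high probability
```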

Strategical Issues in Multiple-Objective Optimal Experimental Design

  • Kim Young-Il;Kahng Myung-Wook
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.1
    • /
    • pp.1-10
    • /
    • 2006
  • Many statistical experimental designs have multiple goals, and it is often impractical to use a single-objective criterion for this purpose; the existing optimum experimental design criteria must be modified. In general, three criteria handle this problem: the compound, constrained, and maxi-min approaches. This paper extends Kahng and Kim's idea to develop another approach that incorporates several experimental design criteria according to their importance in a practical way, and investigates its relationship with the maxi-min approach. We show logically that the often-encountered infeasibility can still be avoided while keeping the rank of importance of the objectives intact.
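The compound and maxi-min strategies named in the abstract can be contrasted on toy numbers: the compound approach picks the design maximizing a weighted average of per-criterion efficiencies, while maxi-min picks the design with the best worst-case efficiency. All designs, efficiencies, and weights below are invented for illustration.

```python
# Invented comparison of two multiple-objective design strategies: each
# candidate design has an efficiency under each criterion (here 'D' and 'A').
# Compound maximizes a weighted average; maxi-min maximizes the minimum.

designs = {
    "d1": {"D": 0.95, "A": 0.60},
    "d2": {"D": 0.70, "A": 0.80},
}
weights = {"D": 0.5, "A": 0.5}

def compound(effs):
    return sum(weights[c] * e for c, e in effs.items())

def maximin(effs):
    return min(effs.values())

best_compound = max(designs, key=lambda d: compound(designs[d]))
best_maximin = max(designs, key=lambda d: maximin(designs[d]))
```

Note that the two strategies can disagree: the compound criterion favors d1's high D-efficiency, while maxi-min prefers the more balanced d2.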

Cash flow Forecasting in Construction Industry Using Soft Computing Approach

  • Kumar, V.S.S.;Venugopal, M.;Vikram, B.
    • International conference on construction engineering and project management
    • /
    • 2013.01a
    • /
    • pp.502-506
    • /
    • 2013
  • Cash flow forecasting is normally done by contractors in the construction industry at the early stages of a project for contractual decisions. Decision making in such situations involves uncertainty about future cash flows, and the assessment of working capital requirements gains importance in projects constrained by cash. The traditional approach to assessing working capital requirements is deterministic in nature and neglects uncertainty. This paper presents an alternative approach to the assessment of a contractor's working capital requirements based on fuzzy set theory, considering the uncertainty and ambiguity involved in payment periods. Statistical methods are used to deal with the uncertainty in working capital curves, and membership functions of the fuzzy sets are developed based on these statistical measures. The advantage of fuzzy peak working capital requirements is demonstrated using peak working capital requirement curves; the fuzzy curves are compared with deterministic curves and the results are analyzed. A fuzzy weighted average methodology is proposed for the assessment of peak working capital requirements.
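A fuzzy weighted average over triangular fuzzy numbers, as the abstract proposes for peak working capital, can be sketched component-wise. The (low, mode, high) estimates and weights below are invented, and a full fuzzy weighted average may instead be computed through alpha-cuts; this is only the simplest triangular case.

```python
# Invented sketch: component-wise fuzzy weighted average of triangular fuzzy
# numbers (low, mode, high) representing peak working-capital estimates.

def fuzzy_weighted_average(fuzzy_numbers, weights):
    total = sum(weights)
    return tuple(
        sum(w * fn[i] for fn, w in zip(fuzzy_numbers, weights)) / total
        for i in range(3)
    )

estimates = [(80, 100, 130), (60, 90, 110)]   # two invented estimates
avg = fuzzy_weighted_average(estimates, weights=[2, 1])
# avg → (220/3, 290/3, 370/3) ≈ (73.3, 96.7, 123.3)
```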

Bayesian Semi-Parametric Regression for Quantile Residual Lifetime

  • Park, Taeyoung;Bae, Wonho
    • Communications for Statistical Applications and Methods
    • /
    • v.21 no.4
    • /
    • pp.285-296
    • /
    • 2014
  • The quantile residual life function has been effectively used to interpret results from the analysis of the proportional hazards model for censored survival data; however, the quantile residual life function is not always estimable with currently available semi-parametric regression methods in the presence of heavy censoring. A parametric regression approach may circumvent the difficulty of heavy censoring, but parametric assumptions on a baseline hazard function can cause a potential bias. This article proposes a Bayesian semi-parametric regression approach for inference on an unknown baseline hazard function while adjusting for available covariates. We consider a model-based approach but the proposed method does not suffer from strong parametric assumptions, enjoying a closed-form specification of the parametric regression approach without sacrificing the flexibility of the semi-parametric regression approach. The proposed method is applied to simulated data and heavily censored survival data to estimate various quantile residual lifetimes and adjust for important prognostic factors.
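The quantile residual life underlying the abstract can be checked numerically: the alpha-quantile residual life at time t is the s solving S(t + s) = (1 - alpha) * S(t). A minimal sketch using bisection and an exponential survival function, whose memorylessness makes the answer independent of t; the rate 0.5 is an arbitrary choice:

```python
import math

# Numerical check of the quantile residual life: solve S(t+s) = (1-alpha)*S(t)
# for s by bisection. S here is exponential with rate 0.5 (illustrative).

def quantile_residual_life(S, t, alpha, hi=1000.0, tol=1e-9):
    target = (1 - alpha) * S(t)
    lo = 0.0
    while hi - lo > tol:           # bisection: S is decreasing in s
        mid = (lo + hi) / 2
        if S(t + mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def S(t):
    return math.exp(-0.5 * t)

q0 = quantile_residual_life(S, 0.0, 0.5)
q2 = quantile_residual_life(S, 2.0, 0.5)
# memorylessness: both equal ln(2)/0.5 ≈ 1.386
```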

Improved Statistical Testing of Two-class Microarrays with a Robust Statistical Approach

  • Oh, Hee-Seok;Jang, Dong-Ik;Oh, Seung-Yoon;Kim, Hee-Bal
    • Interdisciplinary Bio Central
    • /
    • v.2 no.2
    • /
    • pp.4.1-4.6
    • /
    • 2010
  • The most common type of microarray experiment has a simple design using microarray data obtained from two different groups or conditions. A typical method to identify differentially expressed genes (DEGs) between two conditions is the conventional Student's t-test, which is based on a simple estimate of the population variance for a gene from the sample variance of its expression levels. Although the empirical Bayes approach improves on the t-statistic by not ranking genes highly merely because they have a small sample variance, its basic assumption is the same as that of the ordinary t-test: equality of variances across experimental groups. Both the t-test and the empirical Bayes approach suffer from low statistical power because they assume normal, unimodal distributions for the microarray data. We propose a method that addresses these problems and is robust to outliers or skewed data, while maintaining the advantages of the classical t-test or modified t-statistics. The resulting data transformation to fit the normality assumption increases the statistical power of these statistics for identifying DEGs.
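The baseline two-sample Student's t-statistic the abstract starts from can be computed directly with a pooled variance; the expression values below are invented:

```python
import math

# Conventional two-sample Student's t-statistic with pooled variance, the
# baseline the abstract builds on. Expression values are invented.

def student_t(x, y):
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)        # within-group sums of squares
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)          # pooled variance estimate
    return (mx - my) / math.sqrt(sp2 * (1 / nx + 1 / ny))

control = [5.1, 4.9, 5.3, 5.0]                 # invented expression levels
treated = [6.2, 6.0, 6.4, 6.1]
t = student_t(treated, control)                # large |t| suggests a DEG
```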

Statistical Model-Based Voice Activity Detection Using the Second-Order Conditional Maximum a Posteriori Criterion with Adapted Threshold (적응형 문턱값을 가지는 2차 조건 사후 최대 확률을 이용한 통계적 모델 기반의 음성 검출기)

  • Kim, Sang-Kyun;Chang, Joon-Hyuk
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.1
    • /
    • pp.76-81
    • /
    • 2010
  • In this paper, we propose a novel approach to improve the performance of statistical model-based voice activity detection (VAD) based on the second-order conditional maximum a posteriori (CMAP) criterion. In our approach, the VAD decision rule is expressed as the geometric mean of likelihood ratios (LRs) with an adapted threshold, according to the speech presence probability conditioned on both the current observation and the speech activity decisions in the previous two frames. Experimental results show that the proposed approach yields better results than both the statistical model-based VAD and the CMAP-based VAD using the LR test.
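The decision rule described above, a geometric mean of likelihood ratios compared against an adapted threshold, can be sketched as follows. The LR values and the simple threshold-adaptation rule are invented stand-ins for the paper's CMAP-based adaptation.

```python
import math

# Sketch of a geometric-mean-of-LRs VAD decision. The per-band likelihood
# ratios and the threshold-adaptation rule below are illustrative only.

def vad_decide(likelihood_ratios, speech_prob, base_threshold=1.0):
    log_gm = sum(math.log(lr) for lr in likelihood_ratios) / len(likelihood_ratios)
    geometric_mean = math.exp(log_gm)
    # toy adaptation: lower the threshold when prior speech evidence is strong
    threshold = base_threshold * (1.0 - 0.5 * speech_prob)
    return geometric_mean > threshold

speech_frame = [1.8, 2.2, 0.9, 1.5]   # invented per-band LRs
decision = vad_decide(speech_frame, speech_prob=0.7)
```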

A Definition and Criterion on Typhoons Approaching to the Korean Peninsula for the Objective Statistical Analysis (객관적인 태풍 통계자료 구축을 위한 '한반도 근접 태풍'의 정의 및 기준 설정)

  • Moon, Il-Ju;Choi, Eu-Soo
    • Atmosphere
    • /
    • v.21 no.1
    • /
    • pp.45-55
    • /
    • 2011
  • A definition of the tropical cyclone (TC) that influenced the Korean Peninsula (KP), the KP-influence TC, is widely used in the TC community, but its criterion is not clear, mainly due to the ambiguity and subjectivity of terms such as 'influence', which has led to inconsistent TC statistics. This study additionally suggests a definition and criterion for the TC approaching the KP (the KP-approach TC), which is more obvious and objective than the KP-influence TC. Here, a TC from the RSMC best track data is classified as a KP-approach TC when its center enters the box area 28°N-40°N, 120°E-138°E. This range was chosen as the minimum area that includes all official KP-influence TCs except three TCs that affected the KP as tropical depressions (TDs). Statistical analysis reveals that, among the 1,537 TCs that occurred in the western North Pacific during 1951-2008, 472 were KP-approach TCs, 187 were KP-influence TCs, and 87 were KP-landfall TCs. August was the month in which the most TCs approached and influenced the KP. Finally, this paper suggests determining the KP-influence TC from the strong wind and heavy rain advisories issued in the KP, based on observations after the storm's passage.
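The box criterion stated in the abstract translates directly into code: a TC counts as a KP-approach TC if any best-track center position falls inside 28°N-40°N, 120°E-138°E. The track points below are invented.

```python
# KP-approach box check, per the abstract's criterion: the TC's best-track
# center must enter 28-40N, 120-138E. Track coordinates are invented.

def is_kp_approach(track):
    """track: iterable of (lat, lon) best-track center positions."""
    return any(28.0 <= lat <= 40.0 and 120.0 <= lon <= 138.0
               for lat, lon in track)

typhoon_a = [(20.0, 135.0), (25.5, 130.2), (31.0, 127.5)]  # enters the box
typhoon_b = [(18.0, 145.0), (22.0, 150.0)]                 # stays outside
```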

Choice of Statistical Calibration Procedures When the Standard Measurement is Also Subject to Error

  • Lee, Seung-Hoon;Yum, Bong-Jin
    • Journal of the Korean Statistical Society
    • /
    • v.14 no.2
    • /
    • pp.63-75
    • /
    • 1985
  • This paper considers a statistical calibration problem in which the standard as well as the nonstandard measurement is subject to error. Since the classical approach cannot handle this situation properly, a functional relationship model with the additional feature of prediction is proposed. For the analysis of the problem, four approaches, combining two estimation techniques (ordinary and grouping least squares) with two prediction methods (classical and inverse prediction), are considered. By Monte Carlo simulation, the performance of each approach is assessed in terms of the probability of concentration. The simulation results indicate that ordinary least squares with inverse prediction is generally preferred in interpolation, while grouping least squares with classical prediction turns out to be better in extrapolation.
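The two prediction methods compared in the abstract can be sketched on invented calibration data: the classical predictor fits y on x and inverts the fitted line, while the inverse predictor regresses x on y directly.

```python
# Classical vs. inverse prediction in simple linear calibration, on invented
# data. The estimation step here is ordinary least squares only.

def least_squares(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    b = sum((ui - mu) * (vi - mv) for ui, vi in zip(u, v)) / \
        sum((ui - mu) ** 2 for ui in u)
    return mv - b * mu, b                      # intercept, slope

x = [1.0, 2.0, 3.0, 4.0]                       # standard measurements
y = [2.1, 3.9, 6.2, 7.8]                       # nonstandard measurements

a, b = least_squares(x, y)
classical = lambda y0: (y0 - a) / b            # invert fitted y = a + b*x
c, d = least_squares(y, x)
inverse = lambda y0: c + d * y0                # direct regression of x on y
```

Both predictors pass through the sample means but diverge elsewhere; which one is preferable is precisely what the paper's simulation assesses.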
