• Title/Summary/Keyword: Information variable

Search Result 5,187

Two-stage imputation method to handle missing data for categorical response variable

  • Jong-Min Kim;Kee-Jae Lee;Seung-Joo Lee
    • Communications for Statistical Applications and Methods
    • /
    • v.30 no.6
    • /
    • pp.577-587
    • /
    • 2023
  • Conventional categorical data imputation techniques, such as mode imputation, often suffer from overestimation. When the variable has too many categories, the multinomial logistic regression imputation method may be computationally infeasible. To address these limitations, we propose a two-stage imputation method. In the first stage, we apply the Boruta variable selection method to the complete dataset to identify the variables that are significant for the target categorical variable. In the second stage, we use these important variables in logistic regression to impute missing data in binary variables, in polytomous regression to impute missing data in categorical variables, and in predictive mean matching to impute missing data in quantitative variables. Through analysis of both asymmetric and non-normal simulated and real data, we demonstrate that the two-stage imputation method outperforms imputation methods lacking variable selection, as evidenced by accuracy measures. In the analysis of real survey data, we also demonstrate that the suggested two-stage imputation method surpasses the current imputation approach in terms of accuracy.
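
A minimal Python sketch of the two-stage idea on synthetic data, with random-forest importances standing in for the Boruta step and plain logistic regression standing in for the full per-type imputation models (all names and data are illustrative, not the paper's setup):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: x0 and x1 drive the binary response y; x2 is noise.
n = 500
X = rng.normal(size=(n, 3))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=n) > 0).astype(int)

# Knock out 20% of y to simulate missingness in the categorical response.
missing = rng.random(n) < 0.2
y_obs = y.astype(float)
y_obs[missing] = np.nan
complete = ~np.isnan(y_obs)

# Stage 1: variable selection on the complete cases
# (random-forest importances as a stand-in for Boruta).
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X[complete], y_obs[complete].astype(int))
selected = np.argsort(rf.feature_importances_)[::-1][:2]

# Stage 2: logistic regression on the selected variables
# imputes the missing binary responses.
lr = LogisticRegression().fit(X[complete][:, selected],
                              y_obs[complete].astype(int))
y_imp = y_obs.copy()
y_imp[missing] = lr.predict(X[missing][:, selected])

# How well were the held-out true values recovered?
accuracy = (y_imp[missing] == y[missing]).mean()
```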

Exponentiality Test of the Three Step-Stress Accelerated Life Testing Model based on Kullback-Leibler Information

  • Park, Byung-Gu;Yoon, Sang-Chul;Lee, Jeong-Eun
    • Journal of the Korean Data and Information Science Society
    • /
    • v.14 no.4
    • /
    • pp.951-963
    • /
    • 2003
  • In this paper, we propose goodness-of-fit test statistics based on estimated Kullback-Leibler information functions, using data from a three-step stress accelerated life test. The acceleration model is assumed to be a tampered random variable model. The power of the proposed test under various alternatives is compared with that of the Kolmogorov-Smirnov, Cramer-von Mises, and Anderson-Darling statistics.
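
As a rough illustration of a KL-based goodness-of-fit statistic for exponentiality (a generic Vasicek-entropy construction, not the paper's step-stress/tampered-random-variable statistic), one can estimate the KL divergence between the sample distribution and a fitted exponential; larger values indicate departure from exponentiality:

```python
import numpy as np

rng = np.random.default_rng(5)

def kl_exp_statistic(x, m=None):
    """Estimated KL divergence between the sample law and a fitted
    exponential: -H_vasicek + log(mean(x)) + 1 (a generic sketch)."""
    x = np.sort(np.asarray(x, float))
    n = len(x)
    if m is None:
        m = int(np.sqrt(n))          # common window-size choice
    idx = np.arange(n)
    lo = np.clip(idx - m, 0, n - 1)  # clamp spacings at the sample edges
    hi = np.clip(idx + m, 0, n - 1)
    spacings = np.maximum(x[hi] - x[lo], 1e-12)
    H = np.mean(np.log(n / (2.0 * m) * spacings))  # Vasicek entropy estimate
    # -E[log g] under an exponential fitted by its MLE rate 1/mean(x).
    return -H + np.log(x.mean()) + 1.0

stat_exp = kl_exp_statistic(rng.exponential(size=1000))          # near 0
stat_half = kl_exp_statistic(np.abs(rng.normal(size=1000)) + 0.5)  # larger
```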


A Realization Method of the Transfer Functions Containing Variable Parameter

  • Kawakami, Atsushi
    • Proceedings of the IEEK Conference
    • /
    • 2002.07c
    • /
    • pp.1988-1991
    • /
    • 2002
  • In this paper, we propose a method for realizing transfer functions containing a variable parameter by the state-space method. With this method, variable transfer functions (VTFs) can often be realized with a minimal dimension. When a minimal realization cannot be obtained, the realization dimension can still be reduced considerably.
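
For reference, a plain controllable-canonical-form realization shows the state-space route for a transfer function whose numerator carries a variable parameter k; the paper's minimal-dimension VTF construction is more elaborate than this generic sketch:

```python
import numpy as np

def ccf(num, den):
    """Controllable canonical realization (A, B, C) with
    C (sI - A)^{-1} B = num(s)/den(s); den must be monic and the
    fraction strictly proper. Coefficients are high-order-first."""
    den = np.asarray(den, float)
    num = np.asarray(num, float)
    n = len(den) - 1
    A = np.zeros((n, n))
    A[:-1, 1:] = np.eye(n - 1)        # shift structure
    A[-1] = -den[::-1][:-1]           # last row: -a0, -a1, ..., -a_{n-1}
    B = np.zeros((n, 1)); B[-1, 0] = 1.0
    C = np.zeros((1, n))
    C[0, :len(num)] = num[::-1]       # numerator, low order first
    return A, B, C

k = 2.0                                # the variable parameter (assumed value)
A, B, C = ccf([1.0, k], [1.0, 3.0, 2.0])   # H(s) = (s + k)/(s^2 + 3s + 2)

# Evaluate the realized transfer function at s = j to check it.
s = 1.0j
H = (C @ np.linalg.solve(s * np.eye(2) - A, B))[0, 0]
```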


Effects of Variable Block Size Motion Estimation in Transform Domain Wyner-Ziv Coding

  • Kim, Do-Hyeong;Ko, Bong-Hyuck;Shim, Hiuk-Jae;Jeon, Byeung-Woo
    • Proceedings of the Korean Society of Broadcast Engineers Conference
    • /
    • 2009.01a
    • /
    • pp.381-384
    • /
    • 2009
  • In Wyner-Ziv coding, compression performance depends strongly on the quality of the side information, since better side information introduces less channel noise and requires fewer parity bits. However, because the decoder generates side information without any knowledge of the current Wyner-Ziv frame, it lacks an optimal criterion for deciding which block configuration is more advantageous for generating better side information. Hence, fixed block size motion estimation (ME) is generally performed when generating side information. With fixed block size ME, the best coding performance cannot be attained, since some blocks are better motion estimated at different block sizes. Therefore, if there were a way to find the appropriate ME block size for each block, the quality of the side information could be improved. In this paper, we investigate the effects of variable block sizes for ME in generating side information.
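
A toy numpy illustration of why one fixed block size can be suboptimal: when two halves of a frame move differently, splitting the block and estimating motion per sub-block reduces the matching (SAD) cost. The frame contents, wrap-around matching, and search range are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(6)

# Previous frame, and a "current" frame whose top half moved by (0, 2)
# and whose bottom half moved by (0, -1): no single motion vector fits both.
prev = rng.integers(0, 256, size=(16, 16))
cur = np.empty_like(prev)
cur[:8] = np.roll(prev[:8], 2, axis=1)
cur[8:] = np.roll(prev[8:], -1, axis=1)

def sad_cost(dy, dx, y, x, size):
    """SAD between a current block and the reference displaced by (dy, dx)
    (wrap-around displacement, for brevity of the sketch)."""
    ref = np.roll(np.roll(prev, dy, axis=0), dx, axis=1)
    return np.abs(cur[y:y+size, x:x+size].astype(int)
                  - ref[y:y+size, x:x+size].astype(int)).sum()

def best_sad(y, x, size, search=2):
    """Full-search ME: best SAD over all displacements in the window."""
    return min(sad_cost(dy, dx, y, x, size)
               for dy in range(-search, search + 1)
               for dx in range(-search, search + 1))

# Fixed block size: one motion vector for the whole 16x16 block.
fixed = best_sad(0, 0, 16)
# Variable block size: four 8x8 sub-blocks, each with its own vector.
variable = sum(best_sad(y, x, 8) for y in (0, 8) for x in (0, 8))
```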


Algorithms for Handling Incomplete Data in SVM and Deep Learning (SVM과 딥러닝에서 불완전한 데이터를 처리하기 위한 알고리즘)

  • Lee, Jong-Chan
    • Journal of the Korea Convergence Society
    • /
    • v.11 no.3
    • /
    • pp.1-7
    • /
    • 2020
  • This paper introduces two different techniques for dealing with incomplete data, together with algorithms for learning from such data. The first method handles incomplete data by assigning each missing value equal probability over the values the missing variable can take, and learns from this data with an SVM. This technique ensures that the higher the frequency of missingness for a variable, the higher its entropy, so that the variable is not selected in the decision tree. The method is characterized by ignoring all remaining information about the missing variable and assigning a new value. The second method, in contrast, calculates an entropy-based probability from the remaining information, excluding the missing value, and uses it as an estimate of the missing variable. In other words, it exploits the information that is not lost from the incomplete training data to recover some of the missing information, and then learns using deep learning. The two methods are evaluated by selecting one variable at a time from the training data and iteratively comparing the measured results while varying the proportion of data missing in that variable.
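
A toy sketch contrasting the two treatments on a single incomplete row (the data and field names are invented for illustration): the first spreads the missing value uniformly over its categories, while the second estimates a probability from the row's remaining, observed fields.

```python
from collections import Counter

# Toy training rows: (color, shape, label); one row is missing its color.
rows = [("red", "circle", 1), ("red", "square", 1),
        ("blue", "circle", 0), ("blue", "square", 0),
        (None, "circle", 1)]

colors = ["red", "blue"]

# Method 1 (equal probability): spread the missing color uniformly over
# all categories, ignoring the rest of the row.
uniform = {c: 1.0 / len(colors) for c in colors}

# Method 2 (use the remaining information): condition on the observed
# fields of the incomplete row (shape="circle", label=1) and estimate
# the color distribution from matching complete rows.
matches = [r[0] for r in rows
           if r[0] is not None and r[1] == "circle" and r[2] == 1]
counts = Counter(matches)
total = sum(counts.values())
estimated = {c: counts.get(c, 0) / total for c in colors}
```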

On Reliability and UMVUE of Right-Tail Probability in a Half-Normal Variable

  • Woo, Jung-Soo
    • Journal of the Korean Data and Information Science Society
    • /
    • v.18 no.1
    • /
    • pp.259-267
    • /
    • 2007
  • We consider parametric estimation in a half-normal variable and a UMVUE of its right-tail probability. We also consider estimation of reliability for two independent half-normal variables, and derive the k-th moment of the ratio of two such variables.
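
As a quick numerical sketch (using a plug-in MLE estimate rather than the UMVUE derived in the paper), the half-normal right-tail probability P(X > t) = 2(1 - Φ(t/σ)) can be estimated as:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# Half-normal sample with scale sigma = 1: X = |Z|, Z ~ N(0, 1).
x = np.abs(rng.normal(size=5000))
t = 1.5

# The MLE of sigma^2 is mean(X^2); plug it into the tail formula
# P(X > t) = 2 * (1 - Phi(t / sigma)).
sigma_hat = np.sqrt(np.mean(x ** 2))
tail_plugin = 2 * norm.sf(t / sigma_hat)

# Compare with the empirical tail frequency.
tail_empirical = (x > t).mean()
```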


New variable adaptive coefficient algorithm for variable circumstances (가변환경에 적합한 새로운 가변 적응 계수에 관한 연구)

  • 오신범;이채욱
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.4 no.3
    • /
    • pp.79-88
    • /
    • 1999
  • One of the most popular algorithms in adaptive signal processing is the least mean square (LMS) algorithm. Most studies examine the LMS algorithm with a constant step size, where the choice of step size reflects a tradeoff between misadjustment and the speed of adaptation. Subsequent works have discussed optimizing the step size, or varying the step size to improve performance. However, there is as yet no detailed analysis of a variable step size algorithm capable of providing both fast adaptation and convergence. In this paper, we propose a new variable step size algorithm in which the step size adjustment is controlled by the square of the prediction error. Simulation results obtained with the new algorithm on a noise canceller system and on system identification are described, and compared with results obtained for other variable step size algorithms.
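
As a sketch of the general idea (not this paper's exact update rule), a widely used form of variable step size LMS drives the step size from the squared prediction error, mu ← alpha·mu + gamma·e², clipped to a safe range; the constants and the toy identification problem below are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# System identification: recover a 4-tap FIR system with VSS-LMS.
h_true = np.array([0.5, -0.3, 0.2, 0.1])
N, taps = 2000, 4
x = rng.normal(size=N)
d = np.convolve(x, h_true)[:N] + 0.01 * rng.normal(size=N)  # noisy output

w = np.zeros(taps)
mu, alpha, gamma = 0.01, 0.97, 0.01   # assumed VSS constants
mu_min, mu_max = 1e-4, 0.1

for n in range(taps, N):
    u = x[n - taps + 1:n + 1][::-1]   # most recent samples first
    e = d[n] - w @ u                  # prediction error
    # Step size controlled by the squared prediction error.
    mu = np.clip(alpha * mu + gamma * e * e, mu_min, mu_max)
    w = w + mu * e * u                # LMS weight update

err = np.linalg.norm(w - h_true)      # distance from the true system
```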


Bayesian test of homogeneity in small areas: A discretization approach

  • Kim, Min Sup;Nandram, Balgobin;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.6
    • /
    • pp.1547-1555
    • /
    • 2017
  • This paper studies the Bayesian test of homogeneity in contingency tables made by discretizing a continuous variable. When considering events of interest in a small-area setup, we can sometimes adopt discretization approaches for a continuous variable. If we properly discretize the continuous variable, we can find otherwise invisible relationships between areas (groups) and a continuous variable of interest. A proper discretization of the continuous variable can support the alternative hypothesis of the homogeneity test in contingency tables even when the null hypothesis was not rejected by k-sample tests such as one-way ANOVA. In other words, the proportions of observations at a particular level can vary from group to group under the discretization. Once the continuous variable is discretized, the problem can be treated as an analysis of a contingency table, for which the chi-squared test is the most commonly employed method. However, finer discretization gives rise to more cells in the table; as a result, the counts in the cells become smaller and the accuracy of the test decreases. To prevent this, we consider a Bayesian approach and apply it to the setup of the homogeneity test.
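
The frequentist baseline described above (discretize, then apply the chi-squared test to the resulting contingency table) can be sketched as follows; the two "areas" are synthetic, chosen so their means agree but their spreads differ, which a mean-based k-sample test would miss:

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(2)

# Two areas with equal means but different spreads.
area1 = rng.normal(0, 1.0, 300)
area2 = rng.normal(0, 2.0, 300)

# Discretize into 4 bins using common quantile cut points of the pooled data.
pooled = np.concatenate([area1, area2])
edges = np.quantile(pooled, [0.25, 0.5, 0.75])
c1 = np.bincount(np.digitize(area1, edges), minlength=4)
c2 = np.bincount(np.digitize(area2, edges), minlength=4)

# Chi-squared homogeneity test on the 2x4 contingency table.
chi2, p, dof, expected = chi2_contingency(np.array([c1, c2]))
```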

Information Theory and Data Visualization Approach to Poll Analysis (정보이론과 시각화 방법에 의한 여론조사 분석의 새로운 접근방법)

  • Huh, Moon-Yul;Cha, Woon-Ock
    • The Korean Journal of Applied Statistics
    • /
    • v.20 no.1
    • /
    • pp.61-78
    • /
    • 2007
  • A method for poll analysis using information theory and data visualization is proposed in this paper. Opinion poll questions consist of a target variable and many explanation variables, where the explanation variables are either numerical or categorical. In this study, explanation variables of mixed types are ranked according to the magnitude of their effect on the target variable using mutual information. Likewise, the ordering of the explanation variables is evaluated using data visualization. This is the first study to quantify the impact of specific explanation variables on the related target variable.
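
A minimal sketch of the ranking step, using scikit-learn's mutual information estimator on synthetic poll-like data (the variable names and data-generating process are assumptions for illustration):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(3)

# Toy poll: binary target answer; x0 (numerical) drives it, x1 is pure
# noise, x2 is a categorical variable mildly related to the target.
n = 1000
x0 = rng.normal(size=n)
x1 = rng.normal(size=n)
x2 = rng.integers(0, 3, size=n)
target = ((x0 + 0.5 * x2 + rng.normal(scale=0.5, size=n)) > 1).astype(int)

X = np.column_stack([x0, x1, x2])
# Mutual information handles mixed numerical/categorical explanation
# variables via the discrete_features mask.
mi = mutual_info_classif(X, target,
                         discrete_features=np.array([False, False, True]),
                         random_state=0)
ranking = np.argsort(mi)[::-1]   # explanation variables, strongest first
```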

Design of a Condition-based Maintenance Policy Using a Surrogate Variable (대용변수를 이용한 상태기반 보전정책의 설계)

  • Kwon, Hyuck Moo;Hong, Sung Hoon;Lee, Min Koo
    • Journal of Korean Society for Quality Management
    • /
    • v.49 no.3
    • /
    • pp.299-312
    • /
    • 2021
  • Purpose: We provide a condition-based maintenance policy in which a surrogate variable is used for monitoring system performance. We construct a risk function that accounts for the risks and losses accompanying erroneous decisions. Methods: Assuming a unique degradation process for the performance variable and a specific relationship with the surrogate variable, the maintenance policy is determined. A risk function is developed on the basis of the producer's and consumer's risks accompanying each decision. With a strategic safety factor considered, the optimal threshold value for the surrogate variable is determined from the risk function. Results: Condition-based maintenance is analyzed from the standpoint of risk. Under an assumed safety consideration, the optimal threshold value of the surrogate variable for taking a maintenance action is provided. The optimal solution cannot be obtained in closed form; an illustrative numerical example and solution are provided with R source code. Conclusion: The study can be applied to situations where a sensor signal is issued as the system performance begins to degrade gradually and eventually reaches functional failure. The study can be extended to the case where two or more performance variables are connected to the same surrogate variable. Estimation of the distribution parameters and risk coefficients should also be studied further.
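
A sketch of the threshold choice as risk minimization, under assumed Gaussian models for the surrogate signal and invented loss coefficients (the paper's actual risk function, degradation model, and safety factor differ):

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

# Hypothetical setup: the surrogate signal is N(0, 1) while the system is
# healthy and N(2, 1) once performance has degraded; L_fa and L_miss are
# assumed losses for false alarms and missed degradations.
L_fa, L_miss, p_degraded = 1.0, 5.0, 0.3

def risk(c):
    """Expected loss of the policy 'maintain when signal > c'."""
    false_alarm = (1 - p_degraded) * norm.sf(c, loc=0, scale=1)
    missed = p_degraded * norm.cdf(c, loc=2, scale=1)
    return L_fa * false_alarm + L_miss * missed

# No closed-form optimum in general, so minimize numerically.
res = minimize_scalar(risk, bounds=(-2, 4), method="bounded")
threshold = res.x
```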