• Title/Summary/Keyword: Missing Values

Nonstationary Time Series and Missing Data

  • Shin, Dong-Wan;Lee, Oe-Sook
    • The Korean Journal of Applied Statistics / v.23 no.1 / pp.73-79 / 2010
  • Missing values for unit root processes are imputed by the most recent observations. Treating the imputed observations as if they are complete ones, semiparametric unit root tests are extended to missing value situations. Also, an invariance principle for the partial sum process of the imputed observations is established under some mild conditions, which shows that the extended tests have the same limiting null distributions as those based on complete observations. The proposed tests are illustrated by analyzing an unequally spaced real data set.
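
The imputation step described above (filling each gap with the most recent observation) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `locf` and the sample series are hypothetical, and missing values are assumed to be encoded as `None` or NaN.

```python
import math


def locf(series):
    """Replace each missing value (None or NaN) with the most recent observation.

    Leading missing values have no earlier observation and are left as None.
    """
    filled = []
    last = None  # most recent non-missing value seen so far
    for x in series:
        missing = x is None or (isinstance(x, float) and math.isnan(x))
        if not missing:
            last = x
        filled.append(last)
    return filled


# Hypothetical unequally spaced series with gaps.
observed = [1.0, None, None, 1.4, float("nan"), 0.9]
print(locf(observed))  # [1.0, 1.0, 1.0, 1.4, 1.4, 0.9]
```

The filled series can then be passed to a unit root test as if it were complete, which is the setting the abstract analyzes.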

Nonestimability of missing values for $2^K$ and $3^K$ Factorial Designs

  • Jung W. Sim;Park, Sung H.
    • Journal of the Korean Statistical Society / v.13 no.1 / pp.57-68 / 1984
  • A method of missing value estimation for a general design is described. In particular, missing value estimation for $2^k$ and $3^k$ designs is explored and discussed. Examples illustrate both the missing value estimation and the nonestimable cases.

Bootstrap Confidence Intervals of Classification Error Rate for a Block of Missing Observations

  • Chung, Hie-Choon
    • Communications for Statistical Applications and Methods / v.16 no.4 / pp.675-686 / 2009
  • In this paper, we assume two distinct multivariate normal populations with a common covariance matrix. We also assume that the two populations are equally likely and that the costs of misclassification are equal. The classification rule depends on whether or not the training samples include missing values. We consider bootstrap confidence intervals for the classification error rate when a block of observations is missing.

Treatment of Missing Data by Decomposition and Voting with Ordinal Data

  • Chun, Young-M.;Son, Hong-K.;Chung, Sung-S.
    • Journal of the Korean Data and Information Science Society / v.18 no.3 / pp.585-598 / 2007
  • In practice, it is difficult to obtain complete data when conducting a questionnaire survey, and statistical analyses that ignore missing values give inefficient results. Therefore, we use imputation methods to improve the quality of the data. This study proposes an imputation method for ordinal data based on decomposition and voting. First, the data are sorted by each variable. Next, imputation methods are applied at each decomposition level. Finally, values are selected by voting. The proposed method is evaluated by accuracy and RMSE. In conclusion, missing values are related to each variable, and median imputation using decomposition and voting is effective.

An Intelligent System for Filling of Missing Values in Weather Data

  • Maqsood Ali Solangi;Ghulam Ali Mallah;Shagufta Naz;Jamil Ahmed Chandio;Muhammad Bux Soomro
    • International Journal of Computer Science & Network Security / v.23 no.9 / pp.95-99 / 2023
  • Machine learning has recently become one of the active research areas of computer science. Various artificial intelligence techniques are used to solve classification problems in the environmental, biological, and medical sciences. Heterogeneous and malfunctioning weather sensors generate a considerable amount of noisy data with missing values, which is an alarming situation for weather prediction stakeholders. Filling these missing values with a proper method is a significant problem. The data must be cleaned before a prediction model is applied in order to obtain more precise and accurate results. To address the problems stated above, this research proposes a novel weather forecasting system that consists of two steps. The first step prepares the data by reducing noise, and the second step constructs a decision model using a regression algorithm. The confusion matrix will be used to evaluate the proposed classifier.

A Study on the Index Estimation of Missing Real Estate Transaction Cases Using Machine Learning (머신러닝을 활용한 결측 부동산 매매 지수의 추정에 대한 연구)

  • Kim, Kyung-Min;Kim, Kyuseok;Nam, Daisik
    • Journal of the Economic Geographical Society of Korea / v.25 no.1 / pp.171-181 / 2022
  • The real estate price index plays a key role as quantitative data in real estate market analysis. International organizations including the OECD publish real estate price indexes by country, and the Korea Real Estate Board announces metropolitan-level and municipal-level indexes. However, when the index is computed for a spatial unit smaller than the metropolitan or municipal level, a problem arises: missing values. As the spatial scope narrows, some unit periods have few or no transactions, which makes index calculation difficult or even impossible. This study suggests a supervised machine learning model to compensate for missing values that may occur because there are no transactions in a specific range and period. We verify the accuracy of the proposed models in predicting both the existing values and the missing values.

Developing a Method to Define Mountain Search Priority Areas Based on Behavioral Characteristics of Missing Persons

  • Yoo, Ho Jin;Lee, Jiyeong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.37 no.5 / pp.293-302 / 2019
  • In mountain accidents, it is important for the search team commander to determine the search area in order to secure the Golden Time. Within this period, assistance and treatment for the individual concerned will most likely prevent further injury and harm. This paper proposes a method to determine the search priority area based on missing persons' behavior and missing-person incident statistics. GIS (Geographic Information System) and MCDM (Multi Criteria Decision Making) are integrated by applying WLC (Weighted Linear Combination) techniques. Missing persons were classified into five types, and their behavioral characteristics were analyzed to extract seven geographic analysis factors. Next, index values were set for each missing person type and factor according to the behavioral characteristics, and the raster data generated by multiplying by the weight of each factor were superimposed to define models for selecting search priority areas. Each weight is calculated with the AHP (Analytical Hierarchy Process) from pairwise comparisons provided by search operation experts. Finally, the model generated in this study was applied to a missing person case through a virtual missing-person scenario, the priority area was selected, and the behavioral and topographical characteristics of the missing persons were compared with the selected area. The analysis results were verified by mountain rescue experts as 'appropriate' in terms of the behavior analysis, analysis factor extraction, experimental process, and results for the missing persons.
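
The weighted linear combination step mentioned above can be illustrated with a small sketch. It assumes each geographic factor is already a raster of index values on a common grid and that the weights (for example, from AHP) are given. The function name, the two 2x2 layers `factor_a` and `factor_b`, and the weights are hypothetical and are not taken from the paper.

```python
def weighted_linear_combination(layers, weights):
    """Overlay raster layers cell by cell: score = sum_i w_i * x_i.

    layers  : list of equally sized 2-D lists of index values
    weights : one weight per layer; normalized here so they sum to 1
    """
    if len(layers) != len(weights):
        raise ValueError("need exactly one weight per layer")
    total = sum(weights)
    rows, cols = len(layers[0]), len(layers[0][0])
    score = [[0.0] * cols for _ in range(rows)]
    for layer, w in zip(layers, weights):
        for r in range(rows):
            for c in range(cols):
                score[r][c] += (w / total) * layer[r][c]
    return score


# Two hypothetical factor rasters on a 2x2 grid with illustrative weights.
factor_a = [[1.0, 0.0], [0.5, 0.0]]
factor_b = [[0.0, 1.0], [1.0, 0.0]]
priority = weighted_linear_combination([factor_a, factor_b], [0.75, 0.25])
print(priority)  # [[0.75, 0.25], [0.625, 0.0]]
```

Cells with the highest combined score would be the candidates for the search priority area.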

An EM Algorithm-Based Approach for Imputation of Pixel Values in Color Image (색조영상에서 랜덤결측화소값 대체를 위한 EM 알고리즘 기반 기법)

  • Kim, Seung-Gu
    • The Korean Journal of Applied Statistics / v.23 no.2 / pp.305-315 / 2010
  • In this paper, a frequentist approach is provided for imputing the values of the R, G, and B components of randomly missing pixels in a color image. Under the assumption that the given image is a realization of a Gaussian Markov random field, the model is designed so that each neighboring pixel value of a given pixel (independently) follows a normal distribution whose covariance matrix is scaled by a measure of the similarity between the two pixel values, so that the imputation is not affected by neighbors of a different color. An approximate EM-based algorithm that maximizes the underlying likelihood is implemented to estimate the parameters and impute the missing pixel values. Experiments demonstrate its effectiveness through a performance comparison with a popular interpolation method.

Imputation of Missing Data Based on Hot Deck Method Using K-nn (K-nn을 이용한 Hot Deck 기반의 결측치 대체)

  • Kwon, Soonchang
    • Journal of Information Technology Services / v.13 no.4 / pp.359-375 / 2014
  • Researchers cannot avoid missing data when collecting data, because some respondents, arbitrarily or non-arbitrarily, do not answer questions in studies and experiments. Missing data not only increase and distort standard deviations, but also impair the convenience of estimating parameters and the reliability of research results. Although hot deck is widely used, researchers have shown little interest in it because it handles missing data in ambiguous ways. Hot deck can be complemented with k-NN, a machine learning method that can organize donor groups closest to the properties of the missing data. Focusing on the role of k-NN, this study imputes missing data with a hot deck method based on k-NN. Listwise deletion and mean, mode, linear regression, and SVM imputation were compared and verified on nominal and ratio data types, and data closest to the original values were obtained. Simulations with different numbers of neighbors and different distance measures were carried out, and k-NN achieved better performance. This study rediscovers hot deck imputation, which has failed to attract the attention of researchers. As a result, this study can help in selecting non-parametric methods that are less likely to be affected by the structure of missing data and its causes.
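
The idea of forming a donor group with k-NN and borrowing an observed value from it can be sketched as below. This is an illustrative sketch under several assumptions, not the paper's procedure: all fields are numeric, missing entries are `None`, distance is Euclidean over the recipient's observed fields, and one donor is drawn at random from the k nearest. The function name `knn_hot_deck` and the sample records are hypothetical.

```python
import math
import random


def knn_hot_deck(records, k=3, seed=0):
    """Fill None entries with values copied from one of the k nearest complete records."""
    rng = random.Random(seed)  # seeded so that donor draws are reproducible
    donors = [r for r in records if None not in r]
    if not donors:
        raise ValueError("no complete records available as donors")
    completed = []
    for rec in records:
        if None not in rec:
            completed.append(list(rec))
            continue
        observed = [i for i, v in enumerate(rec) if v is not None]

        def dist(donor):
            # Euclidean distance using only the fields the recipient observed.
            return math.sqrt(sum((rec[i] - donor[i]) ** 2 for i in observed))

        nearest = sorted(donors, key=dist)[:k]  # the donor group
        donor = rng.choice(nearest)             # hot deck: borrow real observed values
        completed.append([donor[i] if v is None else v for i, v in enumerate(rec)])
    return completed


# Hypothetical records: the last one is missing its second field.
data = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.2, None]]
print(knn_hot_deck(data, k=1)[3])  # [1.2, 10.0]
```

Because the imputed value is always copied from an actual donor record, the method stays non-parametric. Changing `k` or the distance function corresponds to the simulation settings the abstract mentions.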

Comparison of missing data methods in clustered survival data using Bayesian adaptive B-Spline estimation

  • Yoo, Hanna;Lee, Jae Won
    • Communications for Statistical Applications and Methods / v.25 no.2 / pp.159-172 / 2018
  • In many epidemiological studies, missing values in the outcome arise due to censoring. Such censoring is what makes survival analysis special and distinguishes it from other analytical methods. There are many methods that deal with censored data in survival analysis. However, few studies have dealt with missing covariates in survival data, and such studies are even rarer when the data are clustered. In this paper, we conducted a simulation study to compare the results of several missing data methods when the data are clustered and multi-structured with missing covariates. We modeled the unknown baseline hazard and frailty with a Bayesian B-spline to obtain smoother and more accurate estimates. We also used prior information to achieve more accurate results. We assumed that the missing mechanism is MAR. We compared the performance of five different missing data techniques through simulation studies. We also present results from a multi-center study of Korean IBD patients with Crohn's disease (Lee et al., Journal of the Korean Society of Coloproctology, 28, 188-194, 2012).