• Title/Summary/Keyword: Missing Data Adjustment

Adjustment System for Outlier and Missing Value using Data Storage (데이터 저장소를 이용한 이상치 및 결측치 보정 시스템)

  • Gwangho Kim; Neunghoe Kim
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.23 no.5, pp.47-53, 2023
  • With the advent of the Fourth Industrial Revolution, large and diverse amounts of data are now being accumulated. The agricultural sector has also used sensors to collect environmental data that affect crop growth, both in smart farms and in open fields. Environmental data have different characteristics depending on where and when they are measured. Studies have used the collected agricultural data to predict growth and yield with statistical and artificial intelligence methods. The results of these studies vary greatly depending on the underlying data, so studies to improve data quality have also been conducted continuously to boost performance. High performance requires a large amount of data, but outliers or missing values can strongly affect the results even when the amount of data is sufficient. Adjusting outliers and missing values is therefore an essential step in data preprocessing. This paper integrates data collected from actual farms and proposes an adjustment system for outliers and missing values based on the integrated data.
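The abstract does not spell out the adjustment rules, so the following is only a minimal sketch assuming a common pairing for sensor time series: Tukey's IQR fences flag outliers, flagged readings are treated as missing, and all gaps are filled by linear interpolation between the nearest valid neighbours. The function names and the `k=1.5` multiplier are illustrative assumptions, not the authors' system, and the data-storage component is not modelled.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) Tukey fences computed from the non-missing values."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def adjust_series(values, k=1.5):
    """Treat IQR outliers as missing, then fill every gap by linear interpolation.

    Gaps at the start or end of the series take the nearest valid value.
    """
    low, high = iqr_bounds(values, k)
    cleaned = [v if v is not None and low <= v <= high else None for v in values]
    known = [i for i, v in enumerate(cleaned) if v is not None]
    if not known:
        raise ValueError("no valid observations to interpolate from")
    result = list(cleaned)
    for i, v in enumerate(cleaned):
        if v is not None:
            continue
        left = max((j for j in known if j < i), default=None)
        right = min((j for j in known if j > i), default=None)
        if left is None:
            result[i] = cleaned[right]
        elif right is None:
            result[i] = cleaned[left]
        else:
            frac = (i - left) / (right - left)
            result[i] = cleaned[left] + frac * (cleaned[right] - cleaned[left])
    return result


# Hypothetical temperature readings: one gap (None) and one spike (95.0).
print(adjust_series([20.1, 20.4, None, 21.0, 95.0, 21.6, 21.9]))
```

Here the spike at 95.0 falls outside the upper fence, so it is replaced by roughly 21.3, and the gap becomes roughly 20.7.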

Estimation using response probability when missing data happen on the second occasion

  • Park, Hyeonah; Na, Seongryong
    • Journal of the Korean Data and Information Science Society, v.25 no.1, pp.263-269, 2014
  • When samples are lost in repeated surveys, new samples can often replace the missing units. Estimators using response probabilities can be considered for repeated surveys on two occasions in which new samples are selected to replace missing data on the second occasion. We propose a new estimator that uses both the respondents and the new samples on the second occasion. The simulation considers a setting in which missing values occur on the second occasion and are replaced by new samples. The results show that the proposed estimator is more efficient than an estimator that applies a weighting adjustment to the second-occasion respondents.
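To illustrate what "using response probability" means for the comparison estimator, here is a minimal sketch of a weighting-class adjustment: each unit's response probability is estimated as the response rate within its class, and respondents are weighted by the inverse of that probability (a Hájek-type mean). The class labels and helper names are hypothetical, and the authors' proposed estimator, which also uses the replacement samples, is not reproduced here.

```python
from collections import defaultdict


def class_response_probs(classes, responded):
    """Estimate each unit's response probability as its class's response rate."""
    totals, hits = defaultdict(int), defaultdict(int)
    for c, r in zip(classes, responded):
        totals[c] += 1
        hits[c] += bool(r)
    return [hits[c] / totals[c] for c in classes]


def ipw_mean(y, responded, probs):
    """Hajek-type mean: each respondent is weighted by 1 / estimated response probability."""
    num = den = 0.0
    for yi, ri, pi in zip(y, responded, probs):
        if ri:
            num += yi / pi
            den += 1.0 / pi
    return num / den


classes = ["a", "a", "a", "a", "b", "b"]
responded = [True, True, False, False, True, True]
y = [10, 12, None, None, 20, 22]
probs = class_response_probs(classes, responded)
print(ipw_mean(y, responded, probs))  # class "a" respondents count double
```

The plain respondent mean here is 16, while the weighted mean is 86/6 ≈ 14.33, because the under-responding class "a" is up-weighted to represent its nonrespondents.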

Travel Time Forecasting in an Interrupted Traffic Flow by adopting Historical Profile and Time-Space Data Fusion (히스토리컬 프로파일 구축과 시.공간 자료합성에 의한 단속류 통행시간 예측)

  • Yeo, Tae-Dong; Han, Gyeong-Su; Bae, Sang-Hun
    • Journal of Korean Society of Transportation, v.27 no.2, pp.133-144, 2009
  • In Korea, ITS (Intelligent Transportation Systems) projects have been carried out to improve traffic mobility and safety. They also aim to relieve congestion by supplying real-time travel information to drivers and to promote traffic convenience and safety, so it is important that this information be accurate. This study performed outlier elimination and missing-data adjustment to improve the accuracy of the raw data, and presents a method for raising the reliability of travel-time prediction information. We developed a Historical Profile model and an adjustment formula that reflect the characteristics of interrupted flow. We predicted travel times with the developed Historical Profile model and adjustment formula and verified them by comparing the developed model with existing models such as a Neural Network model and a Kalman Filter model. The comparative analysis showed that the developed model and the Kalman Filter model predicted similarly under normal conditions, but the developed model was more accurate than the other models in incident situations.
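As a minimal sketch of how a historical profile can fill gaps in raw travel-time data, assume the profile is simply the median travel time per time-of-day slot over past days; the median keeps a few outlying historical observations from distorting the profile. The slot encoding and function names are hypothetical, and the paper's actual Historical Profile model and adjustment formula are not given in the abstract.

```python
from collections import defaultdict
from statistics import median


def build_profile(history):
    """history: (time_slot, travel_time_s) pairs from past days; None means missing.

    Returns the median observed travel time for each slot.
    """
    by_slot = defaultdict(list)
    for slot, tt in history:
        if tt is not None:
            by_slot[slot].append(tt)
    return {slot: median(vals) for slot, vals in by_slot.items()}


def fill_from_profile(today, profile):
    """Replace missing travel times today with the profile value for the same slot."""
    return [(slot, tt if tt is not None else profile.get(slot)) for slot, tt in today]


history = [(8, 300), (8, 320), (8, 900), (9, 250), (9, 260), (9, None)]
profile = build_profile(history)          # the 900 s outlier does not move slot 8
print(fill_from_profile([(8, None), (9, 270)], profile))
```

Slot 8's profile stays at 320 s despite the 900 s observation, and today's missing slot-8 value is filled with it.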

Modified BLS Weight Adjustment (수정된 BLS 가중치보정법)

  • Park, Jung-Joon; Cho, Ki-Jong; Lee, Sang-Eun; Shin, Key-Il
    • Communications for Statistical Applications and Methods, v.18 no.3, pp.367-376, 2011
  • BLS weight adjustment is a widely used method for business surveys with nonresponse and outliers. Recent studies have shown that the nonresponse weight adjustment in the BLS method is equivalent to ratio imputation. In this paper, we propose a modified BLS weight adjustment method that imputes missing values instead of adjusting weights for nonresponse. Monthly labor survey data are used for a small Monte Carlo simulation, and we conclude that the proposed method is superior to the original BLS weight adjustment method.
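The equivalence between nonresponse weight adjustment and ratio imputation can be checked numerically. This sketch assumes a single adjustment cell in which the nonresponse factor is the weighted auxiliary total of all sampled units divided by that of respondents, and ratio imputation fills each nonrespondent with $\hat{R}x_j$, where $\hat{R}$ is the respondents' weighted $y/x$ ratio. The exact BLS factor may differ, so treat the helper names and formulas as illustrative assumptions. Because the imputed values sum to $\hat{R}$ times the nonrespondents' weighted $x$, both routes give the same estimated total.

```python
def ratio_imputation_total(w, y, x, responded):
    """Total of y after imputing each nonrespondent as (weighted ratio) * x."""
    r_wy = sum(wi * yi for wi, yi, ri in zip(w, y, responded) if ri)
    r_wx = sum(wi * xi for wi, xi, ri in zip(w, x, responded) if ri)
    ratio = r_wy / r_wx
    return sum(wi * (yi if ri else ratio * xi)
               for wi, yi, xi, ri in zip(w, y, x, responded))


def weight_adjusted_total(w, y, x, responded):
    """Total of y after inflating respondent weights by (all-unit wx) / (respondent wx)."""
    all_wx = sum(wi * xi for wi, xi in zip(w, x))
    r_wx = sum(wi * xi for wi, xi, ri in zip(w, x, responded) if ri)
    factor = all_wx / r_wx
    return sum(factor * wi * yi for wi, yi, ri in zip(w, y, responded) if ri)


w = [10, 10, 10, 10]          # design weights
x = [5, 8, 6, 7]              # auxiliary variable known for every unit
y = [50, 85, None, None]      # study variable, missing for nonrespondents
responded = [True, True, False, False]
print(ratio_imputation_total(w, y, x, responded))
print(weight_adjusted_total(w, y, x, responded))
```

Both calls return 2700 up to floating-point rounding.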

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

  • Hwang, S.Y.; Hahn, H.E.
    • Journal of the Korean Data and Information Science Society, v.15 no.3, pp.555-563, 2004
  • In classification and discrimination, we often face an incomplete group variable, typically arising from many missing values and/or implausible cases. This paper suggests using K-means clustering to pre-adjust this incompleteness, after which classification based on a generalized statistical distance is performed. To illustrate the proposed procedure, a simulation study compares it with CART from data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study shows that the proposed methodology outperforms these alternatives.
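A minimal sketch of the pre-adjustment idea, assuming missing group labels are filled with the majority known label of each point's K-means cluster. It uses plain Lloyd's algorithm with caller-supplied initial centres. The helper names are hypothetical. Handling of implausible (rather than missing) labels and the subsequent generalized-distance classification are not shown.

```python
from collections import Counter
from math import dist


def kmeans(points, centers, iters=100):
    """Plain Lloyd's algorithm starting from the given initial centres."""
    centers = [tuple(c) for c in centers]
    assign = []
    for _ in range(iters):
        # Assign each point to its nearest centre.
        assign = [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
                  for p in points]
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, a in zip(points, assign) if a == k]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep an empty cluster's centre
        if new_centers == centers:
            break
        centers = new_centers
    return assign, centers


def pre_adjust_groups(points, labels, init_centers):
    """Fill missing (None) group labels with the majority known label of the cluster."""
    assign, _ = kmeans(points, init_centers)
    majority = {}
    for k in set(assign):
        known = [lab for lab, a in zip(labels, assign) if a == k and lab is not None]
        if known:
            majority[k] = Counter(known).most_common(1)[0][0]
    return [lab if lab is not None else majority.get(a)
            for lab, a in zip(labels, assign)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = ["A", "A", None, "B", None, "B"]
print(pre_adjust_groups(points, labels, init_centers=[(0, 0), (10, 10)]))
```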

Multiple Imputation Reducing Outlier Effect using Weight Adjustment Methods (가중치 보정을 이용한 다중대체법)

  • Kim, Jin-Young; Shin, Key-Il
    • The Korean Journal of Applied Statistics, v.26 no.4, pp.635-647, 2013
  • Imputation is a commonly used method to handle missing survey data. The performance of an imputation method is influenced by various factors, especially outliers. Removing outliers from a data set is a simple and effective way to reduce their effect. In this paper, to improve the precision of multiple imputation, we study an imputation method that reduces the effect of outliers using various weight adjustment methods, including outlier removal. The regression method of PROC MI in SAS is used for multiple imputation, and the final adjusted weight is used as a weight variable to obtain the imputed values. Simulation studies compare the performance of the various weight adjustment methods, and Monthly Labor Statistics data are used for the real data analysis.
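A simplified sketch of weighted regression imputation repeated `m` times. Here `w` stands in for the final adjusted weight (hypothetical values), so setting an outlier's weight to 0 reproduces the outlier-removal option. Each imputation adds normal noise with the weighted residual standard deviation, and the point estimates are averaged as in Rubin's rules. Unlike the regression method of SAS PROC MI, this sketch does not draw the regression coefficients from their posterior distribution, so it understates between-imputation variability.

```python
import random
from statistics import mean


def weighted_ls(x, y, w):
    """Weighted least squares for y = a + b*x; returns (a, b, residual sd)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    resid_var = sum(wi * (yi - a - b * xi) ** 2 for wi, xi, yi in zip(w, x, y)) / sw
    return a, b, resid_var ** 0.5


def multiple_impute_mean(x, y, w, m=20, seed=0):
    """Impute missing y m times from a weighted regression on x; average the m means."""
    rng = random.Random(seed)
    # Fit only on observed cases with positive weight (weight 0 = outlier removed).
    obs = [(xi, yi, wi) for xi, yi, wi in zip(x, y, w) if yi is not None and wi > 0]
    a, b, s = weighted_ls(*zip(*obs))
    estimates = []
    for _ in range(m):
        completed = [yi if yi is not None else a + b * xi + rng.gauss(0, s)
                     for xi, yi in zip(x, y)]
        estimates.append(mean(completed))
    return mean(estimates), estimates


x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 6, 50, None, None]
w = [1, 1, 1, 0, 1, 1]          # the outlier (50) gets weight 0 in the model
est, draws = multiple_impute_mean(x, y, w)
print(est)
```

With the outlier excluded from the fit, the model is exactly y = 2x, so the imputed values are 10 and 12 and the combined mean is 14.0.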

The Phenomenological study on the Meaning of Family Adjustment Process Experience in Married Immigrant Women (결혼이민여성의 가족적응 과정에 관한 현상학적 연구)

  • Park, Byung-Kum
    • The Journal of the Korea Contents Association, v.13 no.2, pp.277-295, 2013
  • The purpose of this phenomenological study was to explore the meaning of the family adjustment process as perceived by married immigrant women and to enrich our understanding of it. Six married immigrant women participated. Data were collected through in-depth interviews and analyzed using Colaizzi's phenomenological method. The findings identified 37 themes and 8 categories describing the meaning of the family adjustment process experienced by married immigrant women. The 8 categories were "deciding to marry a foreigner", "first meeting and marriage", "starting to live as a Korean", "getting along with husband", "becoming a family with in-laws", "playing one's role as a mother", "missing hometown and family", and "adjusting to living in Korea". Based on these findings, we discuss the meaning of the family adjustment process experience for married immigrant women. Lastly, we make suggestions for social welfare policies and practices for these women and their families.

Handling the nonresponse in sample survey (설문조사에서의 무응답 처리)

  • Lee, Hwa-Jung; Kang, Suk-Bok
    • Journal of the Korean Data and Information Science Society, v.23 no.6, pp.1183-1194, 2012
  • Nonresponse occurs frequently in surveys, so various methods for handling it have been applied in survey analysis. In this paper, the occurrence rates of two types of nonresponse, unit nonresponse and item nonresponse, are presented using real data from previous surveys, and complete data are compared with data containing nonresponse. We also suggest reasons why nonresponse occurs and report nonresponse rates using data collected through group interviews.
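To make the two nonresponse types concrete, this sketch computes a unit nonresponse rate and per-item nonresponse rates from survey records. It assumes, for illustration only, that a record with every item blank is a unit nonrespondent. In practice unit nonresponse is usually recorded directly, as a refusal or non-contact. Item rates are computed among unit respondents. Field names are hypothetical.

```python
def nonresponse_rates(records, items):
    """records: list of dicts mapping item -> answer, with None meaning no answer.

    Returns (unit nonresponse rate, {item: item nonresponse rate among respondents}).
    """
    n = len(records)
    unit_nr = [r for r in records if all(r.get(i) is None for i in items)]
    respondents = [r for r in records if any(r.get(i) is not None for i in items)]
    unit_rate = len(unit_nr) / n
    item_rates = {
        i: (sum(r.get(i) is None for r in respondents) / len(respondents)
            if respondents else float("nan"))
        for i in items
    }
    return unit_rate, item_rates


records = [
    {"age": 30, "income": None},
    {"age": None, "income": None},   # answered nothing: unit nonresponse
    {"age": 45, "income": 5000},
    {"age": 52, "income": None},
]
print(nonresponse_rates(records, ["age", "income"]))
```

In this toy data the unit nonresponse rate is 0.25, and among the three respondents income is missing twice (about 0.67) while age is never missing.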

Evaluation of the Validity of Risk-Adjustment Model of Acute Stroke Mortality for Comparing Hospital Performance (병원 성과 비교를 위한 급성기 뇌졸중 사망률 위험보정모형의 타당도 평가)

  • Choi, Eun Young; Kim, Seon-Ha; Ock, Minsu; Lee, Hyeon-Jeong; Son, Woo-Seung; Jo, Min-Woo; Lee, Sang-il
    • Health Policy and Management, v.26 no.4, pp.359-372, 2016
  • Background: The purpose of this study was to develop risk-adjustment models for acute stroke mortality based on the Health Insurance Review and Assessment Service (HIRA) dataset and to evaluate the validity of these models for comparing hospital performance. Methods: We identified prognostic factors of acute stroke mortality through a literature review. Based on the available data, the following factors were included in the risk-adjustment models: age, sex, stroke subtype, stroke severity, and comorbid conditions. The 2014 survey data were used for model development, and the 2012 dataset was analysed for validation. Prediction models of acute stroke mortality by stroke type were developed using logistic regression. Model performance was evaluated using C-statistics, $R^2$ values, and Hosmer-Lemeshow goodness-of-fit statistics. Results: Some clinical factors, such as mental status, vital signs, and laboratory findings, were excluded from the risk-adjustment models because no data were available for them. The ischemic stroke model with age, sex, and stroke severity (categorical) showed good performance (C-statistic=0.881, Hosmer-Lemeshow test p=0.371). The hemorrhagic stroke model with age, sex, stroke subtype, and stroke severity (categorical) also showed good performance (C-statistic=0.867, Hosmer-Lemeshow test p=0.850). Conclusion: Among the risk-adjustment models, we recommend the model including age, sex, stroke severity, and stroke subtype for HIRA assessment. However, this model may be inappropriate for comparing hospital performance because of several methodological weaknesses, such as the lack of clinical information, variation across hospitals in the coding of comorbidities, the inability to discriminate between comorbidities and complications, missing stroke-severity data, and the small number of cases at some hospitals. Further studies are therefore needed to enhance the validity of the risk-adjustment model of acute stroke mortality.
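The C-statistic reported above is the probability that a randomly chosen patient who died was assigned a higher predicted risk than a randomly chosen survivor, with ties counted as one half; for a binary outcome it equals the area under the ROC curve. The sketch below computes it by direct pairwise comparison. It also includes an observed-to-expected (O/E) mortality ratio, which is one common way such a model is used to compare hospitals. The O/E ratio is an assumption here, since the abstract does not name the comparison measure. The risks shown are made-up values.

```python
def c_statistic(predicted, died):
    """Concordance probability between predicted risk and death (ties count 1/2)."""
    pos = [p for p, d in zip(predicted, died) if d]
    neg = [p for p, d in zip(predicted, died) if not d]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def observed_expected_ratio(predicted, died):
    """Observed deaths divided by expected deaths (sum of model-predicted risks)."""
    return sum(died) / sum(predicted)


risk = [0.1, 0.4, 0.35, 0.8]   # hypothetical model-predicted mortality risks
died = [0, 0, 1, 1]
print(c_statistic(risk, died))              # 3 of 4 death/survivor pairs concordant
print(observed_expected_ratio(risk, died))  # above 1 means more deaths than expected
```

The pairwise loop is O(n_deaths × n_survivors). It is fine for illustration, but a rank-based formula is preferable for large datasets.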

A Network Partition Approach for MFD-Based Urban Transportation Network Model

  • Xu, Haitao; Zhang, Weiguo; Zhuo, Zuozhang
    • KSII Transactions on Internet and Information Systems (TIIS), v.14 no.11, pp.4483-4501, 2020
  • Recent findings have shown that the scatter and shape of the MFD (macroscopic fundamental diagram) are heavily influenced by the spatial distribution of link density in a road network. This implies that the MFD concept can be used to divide a heterogeneous road network with different degrees of congestion into multiple homogeneous subnetworks. Because actual traffic data are usually incomplete and inaccurate, while most network partitioning algorithms rely on complete data, this paper proposes a three-step partitioning algorithm, Iso-MB (Isoperimetric algorithm - Merging - Boundary adjustment), that tolerates incomplete input data. The proposed algorithm was implemented and verified on a simulated urban transportation network. The existence of a well-defined MFD in each subnetwork was revealed and discussed, and the selection of the stopping parameter in the isoperimetric algorithm was explained and analyzed. The effectiveness of the approach with missing input data was also demonstrated.
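The abstract does not detail the Merging step, so the following is a hypothetical, simplified stand-in rather than Iso-MB itself. It greedily merges adjacent regions whose mean link densities differ by at most a tolerance. Densities are averaged only over links with data, so missing detector readings (None) do not block the merge. The region and link identifiers, the tolerance, and the mean-difference criterion are all illustrative assumptions.

```python
from statistics import mean


def region_density(links, density):
    """Mean density over the region's links that have data; None entries are skipped."""
    observed = [density[link] for link in links if density.get(link) is not None]
    return mean(observed) if observed else None


def merge_similar_regions(regions, adjacency, density, tol):
    """Greedily merge adjacent regions whose observed mean densities differ by <= tol.

    regions:   region id -> iterable of link ids
    adjacency: region id -> iterable of neighbouring region ids (symmetric)
    density:   link id -> density, or None when no data was reported
    """
    regions = {r: set(links) for r, links in regions.items()}
    adjacency = {r: set(nbrs) for r, nbrs in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for a in sorted(regions):
            for b in sorted(adjacency[a]):
                da = region_density(regions[a], density)
                db = region_density(regions[b], density)
                if da is None or db is None or abs(da - db) > tol:
                    continue
                # Absorb region b into region a and rewire b's neighbours to a.
                regions[a] |= regions.pop(b)
                adjacency[a] = (adjacency[a] | adjacency.pop(b)) - {a, b}
                for r, nbrs in adjacency.items():
                    if b in nbrs:
                        nbrs.discard(b)
                        if r != a:
                            nbrs.add(a)
                changed = True
                break
            if changed:
                break
    return regions


regions = {1: {"l1", "l2"}, 2: {"l3"}, 3: {"l4", "l5"}}
adjacency = {1: {2}, 2: {1, 3}, 3: {2}}
density = {"l1": 20, "l2": None, "l3": 22, "l4": 60, "l5": 64}  # veh/km, l2 missing
print(merge_similar_regions(regions, adjacency, density, tol=5))
```

Regions 1 and 2 (about 20 and 22 veh/km) merge even though link l2 has no data, while the congested region 3 stays separate.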