Search | Korea Science

HANDLING MISSING VALUES IN FUZZY c-MEANS

Miyamoto, Sadaaki;Takata, Osamu;Unayahara, Kazutaka
- Proceedings of the Korean Institute of Intelligent Systems Conference / 1998.06a / pp.139-142 / 1998
Missing values in data for fuzzy c-means clustering are discussed. Two basic methods of fuzzy c-means, i.e., the standard fuzzy c-means and the entropy method, are considered, and three options for handling missing values are proposed: the first defines a new distance between data with missing values, the second alters a weight in the new distance, and the third fills the missing values with appropriate numbers. Experimental results are shown.
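
The first two options can be illustrated with a minimal sketch. The abstract does not give the distance formula, so the code below assumes the commonly used partial-distance idea: sum squared differences over observed components only, and optionally rescale by the fraction of observed components (a stand-in for the "weight"). The function names are hypothetical and this is not claimed to be the authors' exact definition.

```python
import math

def partial_sq_distance(x, center, rescale=True):
    """Squared Euclidean distance over the observed (non-NaN) components of x.

    Assumption: with rescale=True the sum is multiplied by
    (total components / observed components) to compensate for missing ones.
    """
    observed = [(xi, ci) for xi, ci in zip(x, center) if not math.isnan(xi)]
    if not observed:
        raise ValueError("every component of x is missing")
    d = sum((xi - ci) ** 2 for xi, ci in observed)
    if rescale:
        d *= len(x) / len(observed)
    return d

def memberships(x, centers, m=2.0):
    """Standard fuzzy c-means membership update for one point,
    using the partial distance in place of the usual squared distance."""
    d = [partial_sq_distance(x, c) for c in centers]
    # A point that coincides with a center belongs fully to that center.
    for i, di in enumerate(d):
        if di == 0.0:
            return [1.0 if j == i else 0.0 for j in range(len(centers))]
    power = 1.0 / (m - 1.0)
    inv = [(1.0 / di) ** power for di in d]
    total = sum(inv)
    return [v / total for v in inv]

nan = float("nan")
u = memberships([1.0, nan, 3.0], [[0.0, 5.0, 1.0], [1.0, 0.0, 3.5]])
```

Here the second center is closer on the two observed coordinates, so it receives the larger membership; the missing coordinate never enters the computation.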

Development of Missing Item Detection and Management System under Cell Type Packaging Processes (Cell 방식 포장공정에서의 Missing Item 검사 및 관리 시스템 개발)

Kim, Hyeon-Woo;Choi, Hyun-Eui;An, Ho-Gyun;Yoon, Tae-Sung
- Proceedings of the IEEK Conference / 2009.05a / pp.344-346 / 2009
A cell-type packaging line is more suitable than a conveyor-type packaging line for products with many models and small quantities, such as mobile phones or MP3 players. A cell-type packaging line can package various product models, but it can lead to wrong product compositions and missing items, so an automatic missing item detection system is needed. We designed a missing item detection system with a bar code reader, infrared sensors, and a digital camera, and also developed programs for sensor data acquisition, image data processing, the GUI, and data management.

Interpolation of Missing Groundwater-Level Data at the National Groundwater Monitoring Wells (장기 관측 지하수위 결측자료 보완)

정상용;심병완;강동환;원종호;김규범
- Proceedings of the Korean Society of Soil and Groundwater Environment Conference / 2000.11a / pp.15-22 / 2000
Long-term groundwater-level data often contain missing intervals because of failures of the monitoring systems at the national groundwater monitoring wells. Geostatistical methods are very useful for filling in the missing data. Ordinary kriging was applied to interpolate missing groundwater-level data with a smooth sinusoidal variation. Conditional simulation was used to reproduce missing data with high fluctuations. The two geostatistical methods produced very accurate estimates in the missing intervals and reproduced the original variations, as shown by a cross-validation test and a graphical method, respectively.
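
As a rough illustration of ordinary kriging on a time series with a gap, the sketch below solves the ordinary kriging system for one target time. The abstract does not state which variogram model was fitted, so a linear variogram gamma(h) = |h| is assumed here purely for demonstration; the function names and sample values are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def ordinary_kriging(times, levels, t0, variogram=lambda h: float(abs(h))):
    """Estimate the level at time t0 from observed (times, levels).

    Builds the ordinary kriging system [[Gamma, 1], [1^T, 0]] [w, mu] = [gamma0, 1],
    whose last row forces the weights to sum to one.
    """
    n = len(times)
    a = [[variogram(times[i] - times[j]) for j in range(n)] + [1.0] for i in range(n)]
    a.append([1.0] * n + [0.0])
    b = [variogram(t - t0) for t in times] + [1.0]
    weights = solve(a, b)[:n]  # drop the Lagrange multiplier
    estimate = sum(w * z for w, z in zip(weights, levels))
    return estimate, weights

# Hypothetical water levels with a gap between t=2 and t=5.
times = [0.0, 1.0, 2.0, 5.0, 6.0]
levels = [1.0, 2.0, 4.0, 1.0, 0.5]
estimate, weights = ordinary_kriging(times, levels, 3.0)
```

With the assumed linear variogram in one dimension, the estimate reduces to linear interpolation between the two observations that bracket the gap; a fitted variogram with a sill or nugget would spread weight over more neighbors.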

A Study on the Sensing System Construction of a Missing Roadbed (철도 노반유실검지시스템 구축에 관한 연구)

Kim, Ki-Young;Kang, Kyung-Sik
- Proceedings of the Safety Management and Science Conference / 2009.11a / pp.461-470 / 2009
A railroad offers the benefit of mass transportation of passengers and cargo, but a single accident can cause a huge loss of life and property. In particular, typhoons and localized torrential downpours, which usually occur in the summer season, have caused an average of 38.29 occurrences of missing roadbed, which supports the railroad, in the recent 7 years. If a train passes over track whose roadbed is missing, a major accident with many deaths could result. However, safety is not assured, because detection of missing roadbed depends solely on visual inspection by the person in charge. This study therefore proposes a real-time missing roadbed sensing and train operation system that reduces the possibility of railroad accidents by controlling train operation when a missing roadbed condition is sensed in real time.

A Study on Automatic Missing Value Imputation Replacement Method for Data Processing in Digital Data (디지털 데이터에서 데이터 전처리를 위한 자동화된 결측 구간 대치 방법에 관한 연구)

Kim, Jong-Chan;Sim, Chun-Bo;Jung, Se-Hoon
- Journal of Korea Multimedia Society / v.24 no.2 / pp.245-254 / 2021
We conducted research on an analysis and prediction model that identifies outliers or abnormalities in the data and then imputes missing values effectively and rapidly. This model is expected to analyze problems in the data efficiently based on the calibrated raw data. As a result, a system that can adequately utilize the data was constructed using the introduced KNN + MLE algorithm. This algorithm effectively addresses problems in some existing KNN-based missing data imputation algorithms, such as ignoring the missing values in some data sections or discarding normal observations. To check the effectiveness and efficiency of the proposed algorithm, a comparative evaluation was performed against existing imputation approaches such as K-means, KNN, MEI, and MI under data missing mechanisms including MCAR, MAR, and NI, and its superiority in all aspects was confirmed.
https://doi.org/10.9717/kmms.2020.24.2.245
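
For orientation, a plain KNN imputation baseline of the kind the abstract compares against might look like the sketch below. It is not the proposed KNN + MLE algorithm, whose details the abstract does not give; the distance (root mean squared difference over shared observed columns) and the function name are assumptions.

```python
import math

def knn_impute(rows, k=2):
    """Fill each NaN with the mean of that column over the k nearest rows.

    Assumption: distance between two rows is the root mean squared difference
    over the columns observed in both rows; rows sharing no column are skipped.
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b)
                  if not math.isnan(x) and not math.isnan(y)]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, v in enumerate(row):
            if not math.isnan(v):
                continue
            # Candidate donors: other rows that observe column j.
            donors = sorted(
                (dist(row, other), other[j])
                for t, other in enumerate(rows)
                if t != i and not math.isnan(other[j])
            )
            donors = [d for d in donors if d[0] != math.inf][:k]
            if donors:  # leave the NaN in place when no donor exists
                filled[i][j] = sum(val for _, val in donors) / len(donors)
    return filled

nan = float("nan")
data = [[1.0, 2.0], [1.1, nan], [5.0, 9.0], [0.9, 2.2]]
completed = knn_impute(data, k=2)
```

Distances are always computed on the original rows, so an imputed value never influences another imputation in the same pass.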

Bioequivalence trial with two generic drugs in 2 × 3 crossover design with missing data

Park, Sang-Gue;Kim, Seunghyo;Choi, Ikjoon
- Communications for Statistical Applications and Methods / v.27 no.6 / pp.641-647 / 2020
The 2 × 3 crossover design, a modified version of the 3 × 3 crossover design, is considered to compare the bioavailability of two generic candidates with a reference drug. The 2 × 3 crossover design is more economically favorable than conducting a 3 × 3 crossover trial or two separate 2 × 2 crossover trials, because it reduces the number of sequences. However, when using a higher-order crossover trial, the risk of drop-outs and withdrawals of subjects increases, so suitable statistical inference for missing data is needed. The bioequivalence model of a 2 × 3 crossover trial with missing data is defined and statistical procedures for assessing bioequivalence are proposed. An illustrative example of the 2 × 3 trial with missing data is also presented with discussion.
https://doi.org/10.29220/CSAM.2020.27.6.641

Effect of missing values in detecting differentially expressed genes in a cDNA microarray experiment

Kim, Byung-Soo;Rha, Sun-Young
- Bioinformatics and Biosystems / v.1 no.1 / pp.67-72 / 2006
The aim of this paper is to discuss the effect of missing values in detecting differentially expressed genes in a cDNA microarray experiment in the context of a one-sample problem. We conducted a cDNA microarray experiment to detect differentially expressed genes for the metastasis of colorectal cancer based on twenty patients who underwent liver resection due to liver metastasis from colorectal cancer. Total RNAs from metastatic liver tumor and adjacent normal liver tissue from a single patient were labeled with Cy5 and Cy3, respectively, and competitively hybridized to a cDNA microarray with 7775 human genes. We used $M=\log_2(R/G)$ for the signal evaluation, where R and G denote the fluorescent intensities of the Cy5 and Cy3 dyes, respectively. The statistical problem comprises a one-sample test of E(M)=0 for each gene and involves multiple tests. The twenty cDNA microarray data would comprise a matrix of dimension 7775 by 20 if there were no missing values. However, missing values occur for various reasons. For each gene, the no missing proportion (NMP) was defined to be the proportion of non-missing values out of twenty. In detecting differentially expressed (DE) genes, we used the genes whose NMP is greater than or equal to 0.4 and then sequentially increased NMP by 0.1 to investigate its effect on the detection of DE genes. For each fixed NMP, we imputed the missing values with the K-nearest neighbor method (K=10) and applied the nonparametric t-test of Dudoit et al. (2002), SAM by Tusher et al. (2001), and the empirical Bayes procedure by Lönnstedt and Speed (2002) to find out the effect of missing values on the final outcome. These three procedures yielded substantially agreeing results in detecting DE genes. Of these three procedures we used SAM for exploring the acceptable NMP level. The result showed that the optimum NMP found in this data set turned out to be 80%. It is more desirable to find the optimum level of NMP for each data set by applying the method described in this note, when the plot of (NMP, number of overlapping genes) shows a turning point.
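
The signal definition and the NMP filter described above can be sketched as follows. The data layout (a dict mapping a gene name to a list of per-array M values, with None for a missing spot) and the function names are assumptions made for illustration, and the toy example uses five arrays rather than twenty.

```python
import math

def log_ratio(r, g):
    """M = log2(R/G); returns None when either intensity is missing or non-positive."""
    if r is None or g is None or r <= 0 or g <= 0:
        return None
    return math.log2(r / g)

def non_missing_proportion(values):
    """NMP: share of arrays on which the gene has a non-missing value."""
    return sum(v is not None for v in values) / len(values)

def filter_genes(matrix, threshold):
    """Keep only the genes whose NMP is at least the threshold."""
    return {gene: vals for gene, vals in matrix.items()
            if non_missing_proportion(vals) >= threshold}

# Hypothetical toy data: two genes measured on five arrays.
toy = {
    "geneA": [1.0, None, 0.5, None, 0.2],   # NMP = 0.6
    "geneB": [None, None, None, 0.3, None],  # NMP = 0.2
}
kept_by_threshold = {t: sorted(filter_genes(toy, t)) for t in (0.4, 0.8)}
```

Raising the threshold step by step, as the abstract describes, trades the number of retained genes against the amount of imputation needed for each one.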

Comparison of GEE Estimators Using Imputation Methods (대체방법별 GEE추정량 비교)

김동욱;노영화
- The Korean Journal of Applied Statistics / v.16 no.2 / pp.407-426 / 2003
We consider the missing covariates problem in the generalized estimating equations (GEE) model. If a covariate is partially missing, the GEE cannot be calculated. In this paper, we study the performance of 7 imputation methods for handling missing covariates in GEE models, and the properties of GEE estimators are investigated after missing covariates are imputed for ordinal data of repeated measurements. The 7 imputation methods are i) Naive Deletion, ii) Sample Average Imputation, iii) Row Average Imputation, iv) Cross-wave Regression Imputation, v) Carry-over Imputation, vi) Bayesian Bootstrap, and vii) Approximate Bayesian Bootstrap. A Monte Carlo simulation is used to compare the performance of these methods. For the missing mechanism generating the missing data, we assume ignorable nonresponse. Furthermore, we generate missing covariates with or without considering wave nonresponse patterns.
https://doi.org/10.5351/KJAS.2003.16.2.407
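
Three of the simpler methods named in the abstract can be sketched on a subjects-by-waves table. The abstract does not define them, so the readings used here are assumptions: sample average fills with the mean of the same wave over all subjects, row average fills with the subject's own mean over waves, and carry-over repeats the subject's last observed value. Function names are hypothetical, and None marks a missing covariate.

```python
def sample_average(data):
    """Fill each missing entry with the mean of its column (wave) over all subjects."""
    means = []
    for j in range(len(data[0])):
        observed = [row[j] for row in data if row[j] is not None]
        means.append(sum(observed) / len(observed) if observed else None)
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in data]

def row_average(data):
    """Fill each missing entry with the mean of the same subject's observed values."""
    filled = []
    for row in data:
        observed = [v for v in row if v is not None]
        mean = sum(observed) / len(observed) if observed else None
        filled.append([mean if v is None else v for v in row])
    return filled

def carry_over(data):
    """Fill each missing entry with the subject's most recent observed value.
    A value missing at the first wave stays missing."""
    filled = []
    for row in data:
        last, out = None, []
        for v in row:
            if v is not None:
                last = v
            out.append(last)
        filled.append(out)
    return filled

# Hypothetical covariate table: 2 subjects (rows) by 3 waves (columns).
table = [[1.0, None, 3.0],
         [2.0, 4.0, None]]
```

On this table the three methods fill the first subject's second wave with 4.0, 2.0, and 1.0 respectively, which shows how strongly the choice of method can shape the imputed covariate.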

Comparing Accuracy of Imputation Methods for Categorical Incomplete Data (범주형 자료의 결측치 추정방법 성능 비교)

신형원;손소영
- The Korean Journal of Applied Statistics / v.15 no.1 / pp.33-43 / 2002
Various kinds of estimation methods have been developed for imputation of categorical missing data. They include the category method, logistic regression, and association rules. In this study, we propose two fusion algorithms, based on a neural network and a voting scheme, that combine the results of individual imputation methods. A Monte Carlo simulation is used to compare the performance of these methods. Five factors used to simulate the missing data pattern are (1) input-output function, (2) data size, (3) noise of input-output function, (4) proportion of missing data, and (5) pattern of missing data. Experimental study results indicate the following: when the data size is small and the missing data proportion is large, the modal category method, association rule, and neural network based fusion perform better than the other methods. However, when the data size is small and the correlation between input and missing output is strong, logistic regression and the neural network based fusion algorithm appear better than the others. When the data size is large with a low missing data proportion, a large noise, and strong correlation between input and missing output, the neural network based fusion algorithm turns out to be the best choice.
https://doi.org/10.5351/KJAS.2002.15.1.033
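
A minimal sketch of two ideas from the abstract: modal category imputation and a voting scheme that combines the outputs of several imputation methods. The abstract does not describe the voting rule, so simple majority voting is assumed; the neural network based fusion is not shown, and the method labels in the example are hypothetical.

```python
from collections import Counter

def modal_category(values):
    """Return the most frequent observed category (None marks a missing value)."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed categories")
    return Counter(observed).most_common(1)[0][0]

def impute_modal(values):
    """Replace every missing entry with the modal category."""
    mode = modal_category(values)
    return [mode if v is None else v for v in values]

def vote(predictions):
    """Majority vote over the categories proposed by individual methods.
    Assumption: ties go to the category that was proposed first."""
    return Counter(predictions).most_common(1)[0][0]

column = ["low", "high", None, "low", None, "mid"]
completed = impute_modal(column)

# Hypothetical proposals for one missing cell from three individual methods.
proposals = {"modal": "low", "logistic": "high", "association_rule": "high"}
fused = vote(list(proposals.values()))
```

In this toy case the modal method alone would impute "low", while the vote across three methods settles on "high".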

An Approach to Survey Data with Nonresponse: Evaluation of KEPEC Data with BMI (무응답이 있는 설문조사연구의 접근법 : 한국노인약물역학코호트 자료의 평가)

Baek, Ji-Eun;Kang, Wee-Chang;Lee, Young-Jo;Park, Byung-Joo
- Journal of Preventive Medicine and Public Health / v.35 no.2 / pp.136-140 / 2002
Objectives: A common problem with analyzing survey data involves incomplete data with either nonresponse or missing data. The mail questionnaire survey conducted in 1996 to collect lifestyle variables on the members of the Korean Elderly Pharmacoepidemiologic Cohort (KEPEC) contains some nonresponse or missing data. A proper statistical method was applied to evaluate the missing pattern of a specific KEPEC data set, which had no missing data in the independent variables and missing data in the response variable, BMI. Methods: The number of study subjects was 8,689 elderly people. Initially, the BMI and the significant variables that influenced the BMI were categorized. After fitting the log-linear model, the probabilities of the people in each category were estimated. The EM algorithm was implemented using a log-linear model to determine the missing mechanism causing the nonresponse. Results: Age, smoking status, and a preference for spicy hot food were chosen as variables that influenced the BMI. As a result of fitting the nonignorable and ignorable nonresponse log-linear models considering these variables, the difference in deviance between the two models was 0.0034 (df=1). Conclusion: There is a lot of risk if an inference regarding the variables and large samples is made without considering the pattern of missing data. On the basis of these results, the missing data occurring in the BMI is ignorable nonresponse. Therefore, when analyzing the BMI in KEPEC data, the inference can be made about the data without considering the missing data.
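
To make the reported deviance difference concrete, the sketch below assumes the two nested log-linear models are compared with a likelihood-ratio chi-square test on 1 degree of freedom; the abstract reports only the difference (0.0034, df=1), not the test procedure. For 1 degree of freedom the upper-tail probability has the closed form erfc(sqrt(x/2)).

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

deviance_difference = 0.0034  # value reported in the abstract, df = 1
p_value = chi2_sf_df1(deviance_difference)

# Assumption: a conventional 5% significance level.
prefer_ignorable_model = p_value > 0.05
```

Under these assumptions the p-value is roughly 0.95, far above 0.05, so the nonignorable model offers no detectable improvement in fit, which is consistent with the abstract's conclusion that the BMI nonresponse is ignorable.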

Search Result 2,817, Processing Time 0.029 seconds
