• Title/Summary/Keyword: Imputing

Preparation of Copper Database of Korean Foods and Copper Nutritional Status of Korean Adults Living in Rural Area Assessed by Dietary Intake and Serum Analysis (한국인 상용 식품의 구리영양가표작성과 식이섭취 및 혈청분석에 의한 한국농촌성인의 구리영양상태 평가)

  • 정효지
    • Journal of Nutrition and Health / v.32 no.3 / pp.296-306 / 1999
  • This study was carried out to prepare a copper database of Korean foods that can be used to calculate copper intake from dietary data, and to evaluate the copper nutritional status of Korean adults living in rural areas by dietary intake and serum copper concentrations. A copper database for 1,176 Korean foods was constructed (1) by analysing 112 Korean foods that are frequently consumed by Korean adults living in rural areas, (2) by adapting values from food composition databases of other countries (320 items from the University of Minnesota database, 201 items from the USDA database, and 25 items from the U.K. database), and (3) by imputing values from similar foods for 518 food items. Copper intake of 2,034 Korean adults over the age of 30 living in Yeonchongun, Kyunggi province, Korea was estimated by the 24-hour recall method. Mean daily copper intake of the subjects was 0.98 mg. Mean daily intake of males was 1.11 mg, which was significantly higher than that of females, 0.88 mg. There was a significant difference in the distribution of subjects by level of copper intake and sex (p<0.05). Mean serum copper concentration was 14.8 µmol/L, and the percentages of subjects with low, adequate, and high serum copper concentrations were 23.9%, 69.4%, and 6.6%, respectively. The two food groups that contributed most to the dietary copper intake of the subjects were cereals and grain products, and vegetables, supplying 46.2% and 12.7% of total copper intake, respectively. Individually, rice contributed most, supplying 31% of total copper intake, followed by soybean curd, starch vermicelli, barley, etc. Plant foods contributed 82.1% of the total copper intake. In summary, the results of this study show that the copper intake of Korean adults living in rural areas is low, and that dietary sources of copper are mainly plant foods. Serum levels of copper in the subjects were relatively normal. The copper database for Korean foods constructed in the present study will be a valuable tool for the as-yet limited assessment of copper intake of Koreans. Such studies will contribute to the establishment of a dietary allowance for copper and to clarifying the relationship between copper nutriture and chronic diseases in Koreans.


Modelling Missing Traffic Volume Data using Circular Probability Distribution (순환확률분포를 이용한 교통량 결측자료 보정 모형)

  • Kim, Hyeon-Seok;Im, Gang-Won;Lee, Yeong-In;Nam, Du-Hui
    • Journal of Korean Society of Transportation / v.25 no.4 / pp.109-121 / 2007
  • In this study, an imputation model using circular probability distributions was developed to overcome the problem of missing data in traffic surveys. Existing ad-hoc or heuristic, model-based, and algorithm-based imputation techniques were reviewed through previous studies, and their limitations for imputing missing traffic volume data were identified. The statistical computing language 'R' was employed for model construction, and a mixture of von Mises probability distribution, which is classified as symmetric, and unimodal circular probability were finally fitted on the basis of traffic volume data at survey stations in urban and rural areas, respectively. The circular probability distribution model largely outperformed a dummy variable regression model under various evaluation conditions. Circular probability distribution models were found to depict the circularity of hourly volumes well, to be very cost-effective, and to be robust to changes in the missing-data mechanism.
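
To illustrate the idea of treating hourly traffic volume as a circular quantity, here is a minimal Python sketch (not the authors' R model). It assumes each hour of the day is mapped to an angle on the circle, that the daily profile is described by a mixture of von Mises components with hypothetical parameters, and that a missing hour is filled by scaling the fitted shape to the observed hours. The function names, the two-peak parameters, and the scaling rule are all assumptions made for this example.

```python
import math

def bessel_i0(x, terms=40):
    """Modified Bessel function of the first kind, order 0, via its power series."""
    total, term = 0.0, 1.0
    for k in range(terms):
        total += term
        term *= (x / 2.0) ** 2 / ((k + 1) ** 2)
    return total

def von_mises_pdf(theta, mu, kappa):
    """Von Mises density at angle theta (radians), mean direction mu, concentration kappa."""
    return math.exp(kappa * math.cos(theta - mu)) / (2.0 * math.pi * bessel_i0(kappa))

def mixture_pdf(theta, components):
    """components: list of (weight, mu, kappa) with weights summing to 1."""
    return sum(w * von_mises_pdf(theta, mu, kappa) for w, mu, kappa in components)

def hour_to_angle(hour):
    """Map an hour of day in [0, 24) to an angle in [0, 2*pi)."""
    return 2.0 * math.pi * hour / 24.0

def hourly_shares(components):
    """Share of daily volume per hour, taken from the density at each hour's midpoint."""
    dens = [mixture_pdf(hour_to_angle(h + 0.5), components) for h in range(24)]
    total = sum(dens)
    return [d / total for d in dens]

def impute_hourly(volumes, components):
    """Fill None entries of a 24-value list using the mixture shape (assumed rule)."""
    shares = hourly_shares(components)
    observed = [h for h, v in enumerate(volumes) if v is not None]
    # Daily total implied by the observed hours and their expected shares.
    daily_total = sum(volumes[h] for h in observed) / sum(shares[h] for h in observed)
    return [v if v is not None else daily_total * shares[h]
            for h, v in enumerate(volumes)]

# Hypothetical two-peak profile: morning peak near 08:00, evening peak near 18:00.
COMPONENTS = [(0.5, hour_to_angle(8), 2.0), (0.5, hour_to_angle(18), 2.0)]
```

Because the density lives on a circle, hour 23 and hour 0 are neighbours, which a regression on hour-of-day dummies does not express directly.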

A study on the imputation solution for missing speed data on UTIS by using adaptive k-NN algorithm (적응형 k-NN 기법을 이용한 UTIS 속도정보 결측값 보정처리에 관한 연구)

  • Kim, Eun-Jeong;Bae, Gwang-Soo;Ahn, Gye-Hyeong;Ki, Yong-Kul;Ahn, Yong-Ju
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.13 no.3 / pp.66-77 / 2014
  • UTIS (Urban Traffic Information System) directly collects link travel times in urban areas by using probe vehicles. It can therefore estimate link travel speed more accurately than other traffic detection systems. However, UTIS data include some missing values caused by a lack of probe vehicles and RSEs on the road network, system failures, and other factors. In this study, we suggest a new model, based on the k-NN algorithm, for imputing missing data in order to provide more accurate travel time information. The new imputation model is an adaptive k-NN that can flexibly adjust the number of nearest neighbors (NN) depending on the distribution of candidate objects. The evaluation results indicate that the new model successfully imputed missing speed data and significantly reduced the imputation error compared with other models such as ARIMA. We plan to apply the new imputation model at the UTIS Central Traffic Information Center to improve the traffic information service.
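
The abstract does not say how the number of neighbors is adapted, so the following Python sketch uses an assumed rule purely for illustration: keep every candidate whose distance is within a fixed multiple of the distance to the `k_min`-th nearest candidate, capped at `k_max`, and impute the speed as an inverse-distance weighted mean. The feature vectors, parameter names, and default values are hypothetical.

```python
import math

def adaptive_knn_impute(query, candidates, k_min=2, k_max=10, spread=1.5):
    """Impute a missing speed for `query` from (features, speed) candidates.

    Assumed adaptation rule: use all candidates no farther than
    `spread` times the distance of the k_min-th nearest one, at most k_max.
    Returns (imputed_speed, k_used).
    """
    if len(candidates) < k_min:
        raise ValueError("need at least k_min candidates")
    # Rank candidates by Euclidean distance to the query features.
    ranked = sorted((math.dist(query, feats), speed) for feats, speed in candidates)
    cutoff = spread * ranked[k_min - 1][0]
    k = min(sum(1 for d, _ in ranked if d <= cutoff), k_max)
    neighbours = ranked[:k]
    # Inverse-distance weights; the small constant avoids division by zero.
    weights = [1.0 / (d + 1e-9) for d, _ in neighbours]
    value = sum(w * s for w, (_, s) in zip(weights, neighbours)) / sum(weights)
    return value, k

# Hypothetical history: two features per record and the observed link speed (km/h).
HISTORY = [
    ((0.0, 0.0), 40.0), ((0.1, 0.0), 42.0), ((0.0, 0.1), 41.0), ((0.1, 0.1), 43.0),
    ((5.0, 5.0), 80.0), ((6.0, 6.0), 90.0),
]
```

With this rule, a query inside a dense cluster of candidates draws on more neighbors than a query in a sparse region, which is the behaviour the abstract attributes to the adaptive model.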

Comparisons of Imputation Methods for Wave Nonresponse in Panel Surveys (패널조사 웨이브 무응답의 대체방법 비교)

  • Kim, Kyu-Seong;Park, In-Ho
    • Survey Research / v.11 no.1 / pp.1-18 / 2010
  • We compare various imputation methods for compensating for wave nonresponse that are commonly adopted in panel surveys. Unlike a cross-sectional survey, a panel survey involves a time effect in nonresponse, in the sense that nonresponse may occur in some but not all waves. Thus, responses in neighboring waves can be used as powerful predictors for imputing wave nonresponse, as in longitudinal regression imputation, carry-over imputation, nearest neighbor regression imputation, and the row-column imputation method. For comparison, we carry out a simulation study on several income items from the Korean Welfare Panel Study based on two performance criteria: predictive accuracy and estimation accuracy. Our simulation shows that the ratio and row-column imputation methods are much more effective in terms of both criteria. Regression, longitudinal regression, and carry-over imputation methods performed better in predictive accuracy but worse in estimation accuracy. On the other hand, nearest neighbor, nearest neighbor regression, and hot-deck imputation show higher performance in estimation accuracy but lower predictive accuracy. Finally, mean imputation shows much lower performance on both criteria.
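
Two of the methods named above can be sketched in a few lines of Python. This is a simplified illustration with two waves and made-up values, not the simulation design of the study: carry-over imputation copies a unit's previous-wave value, while ratio imputation (in the form assumed here) scales the previous-wave value by the overall change observed among units that responded in both waves.

```python
def carry_over_impute(prev_wave, curr_wave):
    """Fill a missing current-wave value with the same unit's previous-wave value."""
    return [p if c is None else c for p, c in zip(prev_wave, curr_wave)]

def ratio_impute(prev_wave, curr_wave):
    """Fill a missing current-wave value with previous value times a growth ratio.

    The ratio is the current-wave total divided by the previous-wave total,
    computed only over units that responded in both waves (an assumed form).
    """
    both = [(p, c) for p, c in zip(prev_wave, curr_wave)
            if p is not None and c is not None]
    ratio = sum(c for _, c in both) / sum(p for p, _ in both)
    filled = []
    for p, c in zip(prev_wave, curr_wave):
        if c is not None:
            filled.append(c)
        elif p is not None:
            filled.append(p * ratio)
        else:
            filled.append(None)  # no neighbouring-wave response to borrow from
    return filled

# Hypothetical incomes for four units in two consecutive waves; None marks nonresponse.
WAVE_1 = [100.0, 200.0, 300.0, 400.0]
WAVE_2 = [110.0, None, 330.0, 440.0]
```

Carry-over ignores any change between waves, whereas the ratio form passes the average growth on to the nonrespondent.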


A New Method for Imputation of Missing Genotype using Linkage Disequilibrium and Haplotype Information (결측치가 존재하는 유전형 자료에서의 연관불균형과 일배체형을 사용한 결측치 대치 방법)

  • Park Yun-Ju;Kim Young-Jin;Park Jung-Sun;Kim Kuchan;Koh Insong;Jung Ho-Youl
    • Journal of KIISE:Software and Applications / v.32 no.2 / pp.99-107 / 2005
  • In this paper, we propose new linkage disequilibrium-based and haplotype-based imputation methods that minimize the loss of information by estimating missing values based on the specific structure of single nucleotide polymorphism (SNP) genotype data. A method for imputing data is needed to minimize the loss of information caused by experimentally missing data. In general, missing values in biological data have been imputed with the major allele imputation method, but this approach is not optimal. It has high error rates in missing value estimation because it does not take the specific structure of the genotype data into consideration. In this paper, we show the results of a comparative evaluation of our methods and the major allele imputation method for the estimation of missing values.
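
The contrast between the baseline and a structure-aware approach can be sketched as follows. This Python example is only an illustration under stated assumptions: genotypes are coded 0, 1, or 2 with `None` for missing, the baseline is approximated by filling in the most frequent genotype at the SNP, and the structure-aware variant conditions on the genotype at one neighboring SNP assumed to be in linkage disequilibrium. It is not the method proposed in the paper.

```python
from collections import Counter

def most_frequent_impute(column):
    """Baseline: replace missing genotypes with the most frequent genotype at this SNP."""
    mode = Counter(g for g in column if g is not None).most_common(1)[0][0]
    return [mode if g is None else g for g in column]

def neighbour_conditional_impute(target, neighbour):
    """Fill missing target genotypes using a neighbouring SNP (assumed to be in LD).

    For each sample with a missing target genotype, use the most frequent target
    genotype among samples that share its neighbour genotype; fall back to the
    overall most frequent genotype when no such samples exist.
    """
    overall = Counter(g for g in target if g is not None).most_common(1)[0][0]
    by_neighbour = {}
    for t, n in zip(target, neighbour):
        if t is not None and n is not None:
            by_neighbour.setdefault(n, Counter())[t] += 1
    filled = []
    for t, n in zip(target, neighbour):
        if t is not None:
            filled.append(t)
        elif n in by_neighbour:
            filled.append(by_neighbour[n].most_common(1)[0][0])
        else:
            filled.append(overall)
    return filled

# Hypothetical genotypes for six samples at two adjacent SNPs.
TARGET_SNP = [0, 0, 0, 2, 2, None]
NEIGHBOUR_SNP = [0, 0, 0, 2, 2, 2]
```

In this toy data the baseline fills the gap with genotype 0, the overall majority, while the neighbor-conditioned rule fills it with genotype 2, because samples carrying genotype 2 at the adjacent SNP also carry 2 at the target.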

Probability Estimation Method for Imputing Missing Values in Data Expansion Technique (데이터 확장 기법에서 손실값을 대치하는 확률 추정 방법)

  • Lee, Jong Chan
    • Journal of the Korea Convergence Society / v.12 no.11 / pp.91-97 / 2021
  • This paper applies a data extension technique, originally designed for the rule refinement problem, to the handling of incomplete data. The technique is characterized in that each event can have a weight indicating its importance, and each variable can be expressed as a probability value. Since the key problem in this paper is to find the probability that is closest to the missing value and replace the missing value with that probability, three different algorithms are used to find the probability for the missing value and store it in this data structure format. To evaluate each probability structure, an SVM classification algorithm is trained to classify each information area, and the result is compared with the original information to measure how closely they match. The three algorithms for imputing the probability of a missing value use the same data structure but differ in their approach, so they are expected to be usable for various purposes depending on the application field.

Comparison of Data Reconstruction Methods for Missing Value Imputation (결측값 대체를 위한 데이터 재현 기법 비교)

  • Cheongho Kim;Kee-Hoon Kang
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.603-608 / 2024
  • Nonresponse and missing values are caused by sample dropout and by respondents avoiding survey questions. In such cases, information may be lost and inference may be biased, so missing values need to be replaced with appropriate values. In this paper, we compare several methods for missing value imputation that use the mean, linear regression, random forest, K-nearest neighbor, and deep learning-based autoencoder and denoising autoencoder. These methods of imputing missing values are explained, and the methods are compared using continuous simulation data and real data. The comparison results confirm that, in most cases, the random forest imputation method and the denoising autoencoder imputation method perform better than the others.
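
A comparison of this kind is usually run by hiding values that are actually known, imputing them, and measuring the error against the hidden truth. The Python sketch below does this for the two simplest methods in the list, mean and linear regression imputation, on simulated data. The data-generating model, the 20% masking rate, and all function names are assumptions chosen for the example, not the paper's experimental setup.

```python
import math
import random

def mean_impute(y):
    """Replace missing y values with the mean of the observed y values."""
    observed = [v for v in y if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in y]

def regression_impute(x, y):
    """Replace missing y values with predictions from a least-squares line on x."""
    pairs = [(a, b) for a, b in zip(x, y) if b is not None]
    n = len(pairs)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return [intercept + slope * a if b is None else b for a, b in zip(x, y)]

def rmse(truth, imputed, masked):
    """Root mean squared error over the masked positions only."""
    return math.sqrt(sum((truth[i] - imputed[i]) ** 2 for i in masked) / len(masked))

def compare(n=200, missing_rate=0.2, seed=0):
    """Simulate y = 2x + 1 + noise, hide some y values, return both RMSEs."""
    rng = random.Random(seed)
    x = [rng.uniform(0.0, 10.0) for _ in range(n)]
    truth = [2.0 * a + 1.0 + rng.gauss(0.0, 1.0) for a in x]
    masked = rng.sample(range(n), int(n * missing_rate))
    y = list(truth)
    for i in masked:
        y[i] = None
    return (rmse(truth, mean_impute(y), masked),
            rmse(truth, regression_impute(x, y), masked))
```

Calling `compare()` returns the mean-imputation error followed by the regression-imputation error. The same masking loop could wrap any of the other methods named in the abstract.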

Denoising Self-Attention Network for Mixed-type Data Imputation (혼합형 데이터 보간을 위한 디노이징 셀프 어텐션 네트워크)

  • Lee, Do-Hoon;Kim, Han-Joon;Chun, Joonghoon
    • The Journal of the Korea Contents Association / v.21 no.11 / pp.135-144 / 2021
  • Recently, data-driven decision-making technology has become a key technology leading the data industry, and the machine learning technology behind it requires high-quality training datasets. However, real-world data contain missing values for various reasons, which degrades the performance of prediction models learned from the poor training data. Therefore, in order to build high-performance models from real-world datasets, many studies on automatically imputing missing values in initial training data have been actively conducted. Many conventional machine learning-based imputation techniques for handling missing data involve very time-consuming and cumbersome work because they can be applied only to numeric columns or create an individual predictive model for each column. Therefore, this paper proposes a new data imputation technique called 'Denoising Self-Attention Network (DSAN)', which can be applied to mixed-type datasets containing both numerical and categorical columns. DSAN can learn robust feature expression vectors by combining self-attention and denoising techniques, and can automatically impute multiple missing variables in parallel through multi-task learning. To verify the validity of the proposed technique, data imputation experiments were performed after arbitrarily generating missing values in several mixed-type training datasets. We then show the validity of the proposed technique by comparing the performance of binary classification models trained on the imputed data, together with the errors between the original and imputed values.

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases / v.86 no.3 / pp.203-215 / 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models, the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM), were implemented. The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio. Imputation was done after dataset splitting in set III. Predictive performance was evaluated by R² and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III, and 896 patients were included in set II. In set I, the R² value was 0.27, and in set II, LightGBM was the best model, with the highest R² value of 0.5 and the lowest MSE of 154.95. In set III, LightGBM was the best model, with the highest R² value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
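
Imputing only after the train/test split keeps information from the test set out of the imputation step. The Python sketch below shows that ordering with a 70:30 split. The abstract does not name the imputation method used in set III, so column-mean imputation is used here purely as a stand-in, and the function name and sample rows are hypothetical.

```python
import random

def split_then_impute(rows, test_fraction=0.3, seed=0):
    """Split rows into train/test first, then fill None values in both parts
    with column means computed from the training part only.

    Assumes every column has at least one observed value in the training part.
    Returns (train_filled, test_filled, train_means).
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]

    # Imputation statistics come from the training rows only.
    means = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in train if r[j] is not None]
        means.append(sum(observed) / len(observed))

    def fill(part):
        return [[means[j] if v is None else v for j, v in enumerate(r)] for r in part]

    return fill(train), fill(test), means

# Hypothetical records with two numeric features; None marks a missing value.
ROWS = [
    [1.0, 10.0], [2.0, None], [3.0, 30.0], [None, 40.0], [5.0, 50.0],
    [6.0, 60.0], [7.0, None], [8.0, 80.0], [9.0, 90.0], [10.0, 100.0],
]
```

Computing the means before splitting would let test-set values influence the values filled into the training data, which can make test performance look better than it is.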

Studies on the Character of Forest Purchasers and It's Forestry Activities -A Case Study on the Transfer of Forest Ownership and Forest Investment- (산림취득자본(山林取得資本)의 성격(性格)과 그들의 임업생산(林業生産)에 관(関)한 연구(硏究) -산림(山林)의 소유변동(所有変動)과 그들의 임업투자(林業投資) 사례(事例)-)

  • Park, Myong Kyu;Lee, Tchang Bok
    • Journal of Korean Society of Forest Science / v.55 no.1 / pp.59-67 / 1982
  • The objective of this report is to evaluate the contribution of forest owners' investments to the development of private forests in villages where forest production, especially chestnut production, is active. The results obtained are as follows: 1) Newly purchased forest land of 526 hectares, 71 percent of the 741 hectares of 96 farmers, was replanted with chestnut trees for chestnut production. 2) As chestnut production is considered to be the only source of early capital return in forest management, the selling and buying of forest land in the surveyed area is stimulated in order to reforest the land with chestnut seedlings. 3) Most of the new farmers engaged in planting and producing chestnuts in the forests are employees of private industries and government agencies, and merchants in neighboring towns. 4) All materials and expenses for establishing chestnut orchards are generally supplied by the forest land owners. 5) The active buying and selling of newly established chestnut stands shows that they serve as an estate in the area; thus, trading in stands of young chestnut seedlings also enhances the value of the forest as an estate. 6) The management of forests established as chestnut orchards is a special form of forest investment; it makes it possible to encourage the imputing of capital to the new form of forest, chestnut orchards, and it could be a good model for private forest development as compared with government funding.
