• Title/Summary/Keyword: Missing Values


Development of a fatty acids database using the Korea National Health and Nutrition Examination Survey data

  • Yoon, Mi Ock;Kim, Kirang;Hwang, Ji-Yun;Lee, Hyun Sook;Son, Tae Young;Moon, Hyun-Kyung;Shim, Jae Eun
    • Journal of Nutrition and Health / v.47 no.6 / pp.435-442 / 2014
  • Purpose: The objective of this study was to develop a fatty acid database (DB) for estimating fatty acid intake levels in the Korean population, using data from the Korea National Health and Nutrition Examination Survey (KNHANES). Methods: Analytical values of fatty acids in foods were collected from the food composition tables of national institutions (National Fisheries Research & Development Institute, Rural Development Administration), the Japan Ministry of Education, Culture, Sports, Science and Technology, and the US Department of Agriculture, and from journal articles that previously reported analytical fatty acid contents of some Korean foods. The fatty acids covered were C14:0, C16:0, C18:0, C18:1, C18:2 n-6, C18:3 n-3, C20:5 n-3 (EPA), C22:6 n-3 (DHA), SFA, MUFA, PUFA, and n-3, n-6, and n-9 fatty acids. The fatty acid DB covered a total of 5,144 food items used in the KNHANES nutrition survey. Food items were populated preferentially with analytical values from the collected data sources, and the analytical value for each food item was selected based on priority criteria and a quality evaluation of the data sources. Missing values were replaced with calculated or imputed values derived from the analytical values of similar food items in the data sources. Results: The fatty acid DB included a total of 1,545 analytical values, 2,589 calculated values, and 1,010 imputed values, which together account for all 5,144 food items. The developed DB was applied to the 2,112 food items in the 2011 KNHANES data. Mean intakes of total fatty acids and saturated fatty acids were 40.3 g/day and 13.2 g/day, respectively. Estimated total fatty acid intake accounted for 84.3% (men 83.2%, women 86.0%) of daily total fat intake. Conclusion: This newly developed fatty acid DB should help in examining the association between fatty acid intake and related health concerns in the Korean population.
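The value-selection cascade described above (analytical first, then calculated or imputed from a similar food) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' procedure: the source labels and their order in `SOURCE_PRIORITY` are hypothetical, as is the rule that a "calculated" value rescales a similar food's value by the ratio of total fat contents while an "imputed" value borrows it unchanged.

```python
from typing import Dict, Optional, Tuple

# Hypothetical source labels, most preferred first. The abstract mentions
# priority criteria but does not list them, so this order is an assumption.
SOURCE_PRIORITY = ["KR_national", "KR_journal", "JP_MEXT", "USDA"]


def select_value(
    food: str,
    analytical: Dict[str, Dict[str, float]],  # food -> {source: value}
    total_fat: Dict[str, float],              # food -> total fat (g/100 g)
    similar: Dict[str, str],                  # food -> most similar food
) -> Tuple[Optional[float], str]:
    """Return (value, provenance) for one fatty acid of one food item."""
    # 1. Use an analytical value from the highest-priority source available.
    by_source = analytical.get(food, {})
    for source in SOURCE_PRIORITY:
        if source in by_source:
            return by_source[source], "analytical"

    # 2. Otherwise borrow from a similar food that has an analytical value.
    proxy = similar.get(food)
    if proxy is None:
        return None, "missing"
    # Passing an empty `similar` map limits borrowing to one hop.
    proxy_value, kind = select_value(proxy, analytical, total_fat, {})
    if kind != "analytical":
        return None, "missing"

    # 2a. "Calculated" (assumed rule): rescale by the ratio of total fat contents.
    if total_fat.get(food) and total_fat.get(proxy):
        return proxy_value * total_fat[food] / total_fat[proxy], "calculated"
    # 2b. "Imputed" (assumed rule): take the similar food's value as-is.
    return proxy_value, "imputed"


# Toy inputs (g fatty acid per 100 g food); all values are made up.
analytical = {"mackerel": {"USDA": 2.5, "KR_national": 3.1},
              "pork_belly": {"JP_MEXT": 12.0}}
total_fat = {"pork_belly": 30.0, "pork_shoulder": 15.0}
similar = {"pork_shoulder": "pork_belly", "saury": "mackerel", "tofu": "soybean"}

for item in ["mackerel", "pork_shoulder", "saury", "tofu"]:
    print(item, select_value(item, analytical, total_fat, similar))
```

Returning a provenance label with every value is what makes a breakdown into analytical, calculated, and imputed counts possible once the whole food list has been processed.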

A Non-annotated Recurrent Neural Network Ensemble-based Model for Near-real Time Detection of Erroneous Sea Level Anomaly in Coastal Tide Gauge Observation

  • LEE, EUN-JOO;KIM, YOUNG-TAEG;KIM, SONG-HAK;JU, HO-JEONG;PARK, JAE-HUN
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY / v.26 no.4 / pp.307-326 / 2021
  • Real-time sea level observations from tide gauges include missing and erroneous values, and the latter can be classified as abnormal values through a quality control procedure. Although the 3σ (three standard deviations) rule has generally been applied to eliminate them, it is difficult to apply to sea level data, where extreme values can occur due to weather events and erroneous values can exist even within the 3σ range. The artificial intelligence model designed in this study consists of non-annotated recurrent neural networks and ensemble techniques that do not require pre-labeling of abnormal values. The developed model can identify an erroneous value within 20 minutes of the tide gauge recording an abnormal sea level. In validation, the model separated normal and abnormal values well both in normal conditions and during weather events. It was also confirmed that abnormal values can be detected even in years whose sea level data were not used for training. The artificial neural network algorithm used in this study is not limited to coastal sea level, so it can be extended to detecting erroneous values in various oceanic and atmospheric data.
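The point that erroneous values can hide inside the 3σ band is easy to reproduce. The sketch below implements only the conventional 3σ rule the authors contrast against, not their RNN ensemble; the 10-sample window of readings is made up for illustration.

```python
import statistics
from typing import List


def three_sigma_flags(series: List[float], k: float = 3.0) -> List[bool]:
    """Flag values lying more than k population standard deviations from the mean."""
    mean = statistics.fmean(series)
    sd = statistics.pstdev(series)
    return [abs(x - mean) > k * sd for x in series]


# Hypothetical 10-sample window of sea level readings (cm); index 5 is a spike.
window = [100, 102, 98, 101, 99, 160, 100, 101, 99, 100]
flags = three_sigma_flags(window)
print(flags[5])
```

Here the window mean is 106 cm and the spike deviates from it by 54 cm, but the spike itself inflates the standard deviation to about 18.03 cm, so 3σ is about 54.1 cm and the obviously wrong reading is not flagged. This masking effect is one reason a fixed 3σ threshold is unreliable for sea level quality control.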

A Computational Intelligence Based Online Data Imputation Method: An Application For Banking

  • Nishanth, Kancherla Jonah;Ravi, Vadlamani
    • Journal of Information Processing Systems / v.9 no.4 / pp.633-650 / 2013
  • All the imputation techniques proposed so far in the literature are offline techniques: they require many iterations to learn the characteristics of the data during training, and they consume a lot of computational time. Hence, they are not suitable for applications that require imputation to be performed on demand and in near real time. This paper proposes a computational-intelligence-based architecture for online data imputation, as well as extended versions of an existing offline data imputation method. The proposed online imputation technique has two stages. In Stage 1, the Evolving Clustering Method (ECM) replaces the missing values with cluster centers, as part of a local learning strategy. Stage 2 refines the resulting approximate values using a General Regression Neural Network (GRNN), as part of a global approximation strategy. The extended offline imputation techniques employ K-Means or K-Medoids in Stage 1 and a Multi Layer Perceptron (MLP) or GRNN in Stage 2. Several experiments were conducted on 8 benchmark datasets and 4 bank-related datasets to assess the effectiveness of the proposed online and offline imputation techniques. In terms of Mean Absolute Percentage Error (MAPE), the results indicate that the difference between the best proposed offline imputation method, K-Medoids+GRNN, and the proposed online imputation method, ECM+GRNN, is not statistically significant at the 1% level. Consequently, the proposed online technique, being less expensive and faster, can be used for imputation instead of the existing and proposed offline imputation techniques. This is the significant outcome of the study. Furthermore, GRNN in Stage 2 uniformly reduced MAPE in both the offline and online imputation methods on all datasets.
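The two-stage idea (coarse fill from a cluster center, then a GRNN refinement) can be sketched in plain Python. This is a simplified illustration, not the authors' implementation: ECM itself is not implemented, so the cluster centers are assumed to be precomputed (by ECM or K-Means, for instance); the GRNN is written as its usual kernel-weighted (Nadaraya-Watson) average over complete records, with distances measured on the Stage 1 filled vector; and `sigma`, `centers`, and `complete` are hypothetical inputs.

```python
import math
from typing import List, Optional, Sequence

Row = List[Optional[float]]


def _sq_dist(a: Sequence, b: Sequence, idx) -> float:
    """Squared Euclidean distance restricted to the feature indices in idx."""
    return sum((a[i] - b[i]) ** 2 for i in idx)


def stage1_fill(row: Row, centers: List[List[float]]) -> List[float]:
    """Stage 1: fill gaps from the center nearest on the observed features."""
    observed = [i for i, v in enumerate(row) if v is not None]
    nearest = min(centers, key=lambda c: _sq_dist(row, c, observed))
    return [nearest[i] if v is None else v for i, v in enumerate(row)]


def stage2_grnn(filled: List[float], missing: List[int],
                complete: List[List[float]], sigma: float = 1.0) -> List[float]:
    """Stage 2: GRNN estimate of the missing features from complete records."""
    idx = range(len(filled))
    weights = [math.exp(-_sq_dist(filled, r, idx) / (2 * sigma ** 2)) for r in complete]
    total = sum(weights)
    out = list(filled)
    if total == 0.0:  # all kernel weights underflowed: keep the Stage 1 values
        return out
    for j in missing:
        out[j] = sum(w * r[j] for w, r in zip(weights, complete)) / total
    return out


def impute(row: Row, centers, complete, sigma: float = 1.0) -> List[float]:
    missing = [i for i, v in enumerate(row) if v is None]
    return stage2_grnn(stage1_fill(row, centers), missing, complete, sigma)


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical toy data: two clusters of complete two-feature records.
complete = [[1.0, 2.0], [1.2, 2.4], [5.0, 10.0], [5.2, 10.4]]
centers = [[1.1, 2.2], [5.1, 10.2]]
print(impute([1.05, None], centers, complete))
print(impute([None, 10.1], centers, complete))
```

Neither stage iterates over the training data at query time, which is the property that makes this kind of scheme usable online; the quality of the result then rests on the cluster centers and the kernel width `sigma`.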

The Impact of Food Service Franchisee's Customer-oriented Activities on Hedonic, and Utilitarian Values and Loyalty

  • JANG, Hae-Jin;WOO, Sung-Keun;LEE, Yong-Ki
    • The Korean Journal of Franchise Management / v.11 no.1 / pp.7-17 / 2020
  • Purpose - As competition in the foodservice franchise industry and market intensifies and entry barriers are lowered, foodservice franchisors and franchisees strive to increase their competitive advantage in the market. Therefore, franchisors and franchisees use experience management strategies to enhance the positive experiences of customers visiting their stores. In this regard, this study examines the effects of customer-oriented activities (physical-, social-, health-, and service-oriented activities) on utilitarian and hedonic values, and on loyalty, using the stimulus-organism-response (S-O-R) model and value-expectancy theory. Research design, data, methodology - Data were collected from panelists of an online survey company who had visited a foodservice franchisee within the previous month. The survey was conducted for about 15 days, from March 7, 2019 to March 21, 2019, and about 3,500 e-mails and messages were sent requesting participation. A total of 412 panelists responded and completed the questionnaire. Of the 412 completed questionnaires, 12 were discarded due to missing or inaccurate data, and 400 were retained for further analysis. Results - Social-, health-, and service-oriented activities had positive effects on hedonic value, while physical-oriented activities did not have a significant effect on hedonic value. Health- and service-oriented activities had positive effects on utilitarian value, while physical- and social-oriented activities had no significant effects on utilitarian value. Hedonic and utilitarian values also had positive effects on loyalty. Conclusions - First, foodservice franchises should provide services and menus that take customers' health into consideration. When a customer visits the store, the franchisee should provide more health-oriented food and ingredients and clean, comfortable conditions so as not to threaten the customer's health. Second, foodservice franchises must build a service-oriented system. Foodservice franchisors need to provide continuous service training not only to franchisees but also to franchisees' employees. Third, franchises should design stores where customers can form social exchanges, by providing various opportunities for information exchange and by making the store a local community center.

Imputation Accuracy from 770K SNP Chips to Next Generation Sequencing Data in a Hanwoo (Korean Native Cattle) Population using Minimac3 and Beagle

  • An, Na-Rae;Son, Ju-Hwan;Park, Jong-Eun;Chai, Han-Ha;Jang, Gul-Won;Lim, Dajeong
    • Journal of Life Science / v.28 no.11 / pp.1255-1261 / 2018
  • Whole genome analysis has been made possible by the development of DNA sequencing technologies and the discovery of many single nucleotide polymorphisms (SNPs). Large numbers of SNPs can be analyzed with SNP chips, since SNP chips are available for human as well as livestock genomes. Among the various genotype imputation programs, Minimac3 is reported to be highly accurate and relatively fast, with a simplified workflow. In the present study, we used Minimac3 to impute missing genotypes for 1,226 animals genotyped on the 770K SNP chip, using next generation sequencing data from 311 animals. The accuracy on each chromosome was about 94~96%, and the accuracy for individual samples was about 92~98%. After imputation, the percentages of SNPs with $R^2$ above 0.4, 0.6, and 0.8 were 91%, 84%, and 70%, respectively. Across the seven minor allele frequency (MAF) intervals (0, 0.025), (0.025, 0.05), (0.05, 0.1), (0.1, 0.2), (0.2, 0.3), (0.3, 0.4), and (0.4, 0.5), $R^2$ values ranged from 64% to 88%. The total analysis time was about 12 hr. In future SNP chip studies, as the size and complexity of genomic datasets increase, we expect that genomic imputation using Minimac3 can improve the reliability of chip data for Hanwoo discrimination.
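The two summaries reported above (share of SNPs passing an $R^2$ cutoff, and $R^2$ per MAF interval) can be computed from per-SNP imputation statistics such as the MAF and Rsq values Minimac3 reports for each variant. In this sketch the input is a plain list of hypothetical (MAF, $R^2$) pairs rather than any real output file; treating the cutoffs as `>=` and the intervals as half-open `(lo, hi]` are assumptions, since the abstract does not specify boundary handling.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# The seven MAF intervals listed in the abstract, treated here as (lo, hi].
MAF_BINS = [(0.0, 0.025), (0.025, 0.05), (0.05, 0.1), (0.1, 0.2),
            (0.2, 0.3), (0.3, 0.4), (0.4, 0.5)]


def fraction_passing(rsq: Sequence[float], threshold: float) -> float:
    """Share of imputed SNPs whose R^2 is at least `threshold` (>= is assumed)."""
    return sum(r >= threshold for r in rsq) / len(rsq)


def mean_rsq_by_maf(
    snps: Sequence[Tuple[float, float]],
    bins: List[Tuple[float, float]] = MAF_BINS,
) -> Dict[Tuple[float, float], Optional[float]]:
    """Mean imputation R^2 per MAF interval; None when a bin is empty."""
    summary = {}
    for lo, hi in bins:
        values = [r for maf, r in snps if lo < maf <= hi]
        summary[(lo, hi)] = sum(values) / len(values) if values else None
    return summary


# Hypothetical (MAF, R^2) pairs for four imputed SNPs.
snps = [(0.01, 0.5), (0.02, 0.7), (0.3, 0.9), (0.45, 0.95)]
rsq = [r for _, r in snps]
for t in (0.4, 0.6, 0.8):
    print(f"R^2 >= {t}: {fraction_passing(rsq, t):.0%}")
print(mean_rsq_by_maf(snps))
```

Binning by MAF matters because imputation $R^2$ typically degrades for rare variants, so a single genome-wide average can hide poor accuracy in the low-MAF bins.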

Data-centric XAI-driven Data Imputation of Molecular Structure and QSAR Model for Toxicity Prediction of 3D Printing Chemicals

  • ChanHyeok Jeong;SangYoun Kim;SungKu Heo;Shahzeb Tariq;MinHyeok Shin;ChangKyoo Yoo
    • Korean Chemical Engineering Research / v.61 no.4 / pp.523-541 / 2023
  • As access to 3D printers increases, exposure to chemicals associated with 3D printing is becoming more frequent. However, research on the toxicity and harmfulness of chemicals generated by 3D printing is insufficient, and the performance of in silico toxicity prediction is limited by missing molecular structure data. In this study, a quantitative structure-activity relationship (QSAR) model based on a data-centric AI approach was developed to predict the toxicity of new 3D printing materials by imputing missing values in molecular descriptors. First, the MissForest algorithm was used to impute missing values in the molecular descriptors of hazardous 3D printing materials. Then, based on four machine learning models (decision tree, random forest, XGBoost, SVM), a machine learning (ML)-based QSAR model was developed to predict the bioconcentration factor (Log BCF), octanol-air partition coefficient (Log Koa), and partition coefficient (Log P). Furthermore, the reliability of the data-centric QSAR model was validated with the Tree-SHAP (SHapley Additive exPlanations) method, an explainable artificial intelligence (XAI) technique. The proposed MissForest-based imputation enlarged the molecular structure data approximately 2.5-fold compared with the existing data. On the imputed molecular descriptor dataset, the developed data-centric QSAR model achieved prediction performance of approximately 73%, 76%, and 92% for Log BCF, Log Koa, and Log P, respectively. Lastly, Tree-SHAP analysis showed that the data-centric QSAR model achieved high prediction performance for toxicity information by identifying key molecular descriptors highly correlated with toxicity indices. Therefore, the proposed QSAR model based on the data-centric XAI approach can be extended to predict the toxicity of potential pollutants among emerging printing chemicals and in chemical, semiconductor, and display processes.
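MissForest imputes each incomplete column by regressing it on the other columns with a random forest, starting from a mean fill and repeating until the imputations stabilize. The original MissForest is an R package; this sketch instead uses scikit-learn's `IterativeImputer` with a `RandomForestRegressor`, a commonly used MissForest-style approximation (its stopping rule differs from MissForest's out-of-bag criterion). It is not the authors' code, and the descriptor matrix below is a hypothetical toy example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.experimental import enable_iterative_imputer  # noqa: F401  (unlocks IterativeImputer)
from sklearn.impute import IterativeImputer


def missforest_like_impute(X: np.ndarray, n_estimators: int = 50,
                           max_iter: int = 10, seed: int = 0) -> np.ndarray:
    """Iteratively regress each incomplete column on the others with a random forest."""
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=n_estimators, random_state=seed),
        max_iter=max_iter,
        initial_strategy="mean",  # start from column means, as MissForest does
        random_state=seed,
    )
    return imputer.fit_transform(X)


# Hypothetical descriptor matrix: rows = chemicals, columns = molecular descriptors.
X = np.array([
    [1.0, 2.0, 3.0],
    [2.0, 4.0, np.nan],
    [3.0, np.nan, 9.0],
    [4.0, 8.0, 12.0],
    [np.nan, 10.0, 15.0],
    [6.0, 12.0, 18.0],
])
X_imputed = missforest_like_impute(X)
print(np.round(X_imputed, 2))
```

Only the missing cells are replaced; observed descriptor values pass through unchanged, so the completed matrix can be fed directly to a downstream QSAR model.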

A Methodological Quality Assessment of South Korean Nursing Research using Structural Equation Modeling in South Korea

  • Kim, Jung-Hee;Shin, Sujin;Park, Jin-Hwa
    • Journal of Korean Academy of Nursing / v.45 no.2 / pp.159-168 / 2015
  • Purpose: The purpose of this study was to evaluate the methodological quality of nursing studies using structural equation modeling in Korea. Methods: The KISS, DBPIA, and National Assembly Library databases were searched up to March 2014 using the MeSH terms 'nursing', 'structure', and 'model'. A total of 152 studies were screened. After removal of duplicates and non-relevant titles, 61 papers were read in full. Results: Of the 61 articles retrieved, 14 were published between 1992 and 2000, 27 between 2001 and 2010, and 20 between 2011 and March 2014. The methodological quality of the reviewed studies varied considerably. Conclusion: The findings suggest that more rigorous research is needed to address theoretical identification, the two-indicator rule, sample distribution, treatment of missing values, mediator effects, discriminant validity, convergent validity, post hoc model modification, equivalent models, and alternative models. Further research with robust and consistent methodological designs, from model identification to model respecification, is needed to improve the validity of the research.

The Current State and Determinants of Korean Baby-Boomers' Welfare Consciousness

  • Lee, Hyoung-Ha
    • Journal of the Korea Society of Computer and Information / v.21 no.5 / pp.193-200 / 2016
  • This study was conducted to assess the effects of variables influencing Korean baby-boomers' welfare consciousness. For this purpose, data from the 8th supplementary survey of the Korea Welfare Panel in 2013 were analyzed. The subjects of analysis were 2,035 people born between 1955 and 1965 whose welfare panel data had no missing values for the variables of the research model. First, descriptive statistics of the major variables showed that the sub-factors of baby-boomers' welfare consciousness with relatively high mean scores were 'expansion of expenditure for public assistance' (mean 3.65, SD .557), 'expansion of expenditure for social insurance' (mean 3.53, SD .646), and 'expansion of expenditure for social services' (mean 3.26, SD .424). The mean score of the baby-boomers' overall welfare consciousness was relatively high at 3.45 (SD .428), indicating support for expanding welfare expenditure. Second, the independent variables had an explanatory power of 12.9% for baby-boomers' welfare consciousness. In the regression analysis, the variables with significant effects were gender (B=.100, t=2.573, p<.01), personal responsibility for poverty (B=-.151, t=-3.635, p<.01), social responsibility for poverty (B=.149, t=3.437, p<.001), and recipients' laziness (B=.251, t=6.578, p<.001). Based on these results, major relevant policies were discussed.

Chinese Tourist Shopping Satisfaction and Brand Attitude to Korean Cosmetics : A Disconfirmation Approach

  • Yoon, Ju-Hee;Hwang, Yong-Cheol;Suh, Jaebeom;Kim, Jae-Gyun
    • Journal of Distribution Science / v.15 no.10 / pp.51-63 / 2017
  • Purpose - This study examines the shopping behavior of Chinese tourists who purchase Korean cosmetics when visiting Korea, based on the expectancy-disconfirmation of shopping satisfaction and brand attitude toward Korean cosmetics. A moderating effect of consumer conformity on the relationships between cosmetics selection factors and two dimensions of disconfirmation (expectation and performance) is also examined. Research design, data, and methodology - We surveyed 250 Chinese tourists who visited Jeju, Korea and purchased Korean cosmetics during their stay. After excluding 43 responses with incomplete answers or missing values, 207 responses were used in the final analysis. All hypotheses were tested using structural equation modeling (SEM). Results - Chinese tourists' expectations had a positive impact on their satisfaction, and the cosmetics selection factors had a positive effect on shopping satisfaction and brand attitude. The moderating effect of consumer conformity was significant. Conclusions - Given the significantly increased demand for Korean cosmetics from Chinese tourists, Korean cosmetics firms need to better understand the cosmetics selection attributes and preferences of Chinese tourists, which can provide a guideline for developing retail stores and distribution outlets for Chinese tourists.

Detecting Boundaries between Different Color Regions in Color Codes

  • Kwon B. H.;Yoo H. J.;Kim T. W.
    • Proceedings of the IEEK Conference / 2004.08c / pp.846-849 / 2004
  • Compared with the bar code, which is widely used for commercial product management, the color code has advantages in both appearance and the number of possible combinations, and its application areas complement those of RFID. However, because of severe distortion of the color component values, which can easily exceed 50% of the scale, color codes have had difficulty finding industrial applications. To improve the recognition accuracy of color codes, it is better to process an entire color region statistically and then determine its color than to process a few samples selected from the region. For this purpose, this paper proposes a technique for detecting edges between color regions, which is indispensable for accurate segmentation of color regions. We first transformed the RGB color image into the HSI and YIQ color models and then extracted the I and Y components from them, respectively. We then performed Canny edge detection on each component image. Each edge image usually had some edges missing. However, since the resulting edge images were complementary, we could obtain an optimal edge image by combining them.
