• Title/Summary/Keyword: Missing Values

A Study on Solo Consumers' Shopping Orientation (Solo 소비자의 쇼핑성향에 관한 연구)

  • Suh, Yong-Han
    • Fashion & Textile Research Journal, v.9 no.3, pp.312-318, 2007
  • Changing demographics and social values suggest that marketers will see an increase in the number of solo customers. Solo consumers often have different reasons for shopping alone. Offering a suitable shopping environment and services, and training service providers to recognize these different motivations, will result in a higher level of satisfaction for all solo consumers. The purposes of this study are to identify the demographic characteristics of solo consumers and to investigate differences in solo consumers' shopping orientation in the clothes shopping process. To address the research questions, data were collected by means of a survey questionnaire sent to 250 singles in Pusan and Ulsan. In total, 224 questionnaires were returned, and 25 were unusable because of unacceptable levels of missing data. The results are summarized as follows. First, solo consumers had higher incomes and higher levels of education. Second, the solo consumer and non-solo consumer (companion consumer) groups differed significantly in their ostentation/brand, convenience, and hedonic shopping orientations.
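
Screening returned questionnaires for "unacceptable levels of missing data" is a case-wise deletion rule. The sketch below, in plain Python, assumes unanswered items are stored as None and uses a hypothetical 20% cutoff, since the abstract does not state the threshold the study applied.

```python
from typing import List, Optional, Sequence

# Hypothetical cutoff: the abstract does not say what counted as an
# "unacceptable level" of missing data.
MAX_MISSING_FRACTION = 0.2


def missing_fraction(responses: Sequence[Optional[int]]) -> float:
    """Share of questionnaire items left unanswered (recorded as None)."""
    return sum(r is None for r in responses) / len(responses)


def usable_questionnaires(
    sheets: List[List[Optional[int]]], max_missing: float = MAX_MISSING_FRACTION
) -> List[List[Optional[int]]]:
    """Case-wise screening: keep questionnaires whose missing share is within the cutoff."""
    return [s for s in sheets if missing_fraction(s) <= max_missing]


sheets = [
    [5, 4, 3, 2, 1],           # complete
    [5, None, 3, None, None],  # 60% missing -> discarded
    [4, 4, None, 4, 4],        # 20% missing -> kept at the 0.2 cutoff
]
print(len(usable_questionnaires(sheets)))  # 2
```

With the counts in the abstract, removing the 25 unusable questionnaires from the 224 returned would leave 199 for analysis; the abstract itself does not state the final sample size.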

Predicting Personal Credit Rating with Incomplete Data Sets Using Frequency Matrix technique (Frequency Matrix 기법을 이용한 결측치 자료로부터의 개인신용예측)

  • Bae, Jae-Kwon;Kim, Jin-Hwa;Hwang, Kook-Jae
    • Journal of Information Technology Applications and Management, v.13 no.4, pp.273-290, 2006
  • This study proposes a frequency matrix technique to predict personal credit ratings more efficiently from incomplete data sets. First, multiple discriminant analysis (MDA) and logistic regression analysis are tested for predicting personal credit ratings with incomplete data sets, with the missing values estimated by mean imputation and by regression imputation. An artificial neural network and the frequency matrix technique are also tested on their performance in predicting personal credit ratings. A data set of personal credit information on 8,234 customers of Bank A in 2004 was collected for the test, and the performance of the frequency matrix technique was compared with that of the other methods. The experimental results show that the frequency matrix technique outperforms all other models, namely MDA-mean, Logit-mean, MDA-regression, Logit-regression, and artificial neural networks.
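
The two baseline imputation methods mentioned above can be sketched in a few lines of Python. This is an illustration only: it assumes missing entries are None and, for regression imputation, uses a single hypothetical predictor column; the abstract does not describe the predictors used or the frequency matrix technique itself, so the latter is not shown.

```python
from statistics import mean
from typing import List, Optional


def mean_impute(column: List[Optional[float]]) -> List[float]:
    """Replace each missing entry (None) with the mean of the observed entries."""
    fill = mean(v for v in column if v is not None)
    return [fill if v is None else v for v in column]


def regression_impute(x: List[float], y: List[Optional[float]]) -> List[float]:
    """Fit y = a + b*x by least squares on the complete pairs, then predict each missing y."""
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    mx = mean(xi for xi, _ in pairs)
    my = mean(yi for _, yi in pairs)
    sxx = sum((xi - mx) ** 2 for xi, _ in pairs)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in pairs)
    b = sxy / sxx
    a = my - b * mx
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]


# Hypothetical columns: income (complete) and debt (one missing value).
income = [1.0, 2.0, 3.0, 4.0]
debt = [2.0, 4.0, None, 8.0]
print(mean_impute(debt))                # gap filled with the mean of 2, 4 and 8
print(regression_impute(income, debt))  # gap filled with about 6.0 from the fitted line
```

Mean imputation ignores the other variables, while regression imputation uses them; comparing classifiers trained on both versions is what the "-mean" and "-regression" model names above refer to.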

Anomaly detection in particulate matter sensor using hypothesis pruning generative adversarial network

  • Park, YeongHyeon;Park, Won Seok;Kim, Yeong Beom
    • ETRI Journal, v.43 no.3, pp.511-523, 2021
  • The World Health Organization provides guidelines for managing the particulate matter (PM) level because a high PM level is a threat to human health. Managing the PM level first requires a procedure for measuring it. We use a PM sensor that measures the PM level by the laser-based light scattering (LLS) method because it is more cost-effective than sensors based on a beta attenuation monitor or a tapered element oscillating microbalance. However, an LLS-based sensor is more likely to malfunction than these higher-cost sensors. In this paper, we regard all malfunctions, including the collection of abnormal values and missing data, as anomalies, and we aim to detect these anomalies to support the maintenance of PM measuring sensors. To this end, we propose a novel architecture that we call the hypothesis pruning generative adversarial network (HP-GAN). In comparative experiments, we achieve AUROC and AUPRC values of 0.948 and 0.967, respectively, in detecting anomalies in LLS-based PM measuring sensors. We conclude that HP-GAN is a cutting-edge model for anomaly detection.
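
The abstract does not describe HP-GAN in enough detail to sketch, but the two reported metrics can be illustrated. The Python below is a minimal sketch, assuming each reading has a label (1 = anomaly, 0 = normal) and a detector score where higher means more anomalous; AUPRC is estimated here by average precision, one common estimator, which may differ from the authors' exact computation.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random anomaly scores above a random normal reading (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise AUPRC estimate: mean precision at the rank of each true anomaly."""
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


labels = [0, 0, 1, 1]          # toy ground truth, not data from the paper
scores = [0.1, 0.4, 0.35, 0.8]  # toy anomaly scores
print(auroc(labels, scores))    # 0.75
```

The pairwise AUROC loop is quadratic in the number of readings, which is fine for illustration; production code would use a rank-based formula instead.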

Default Prediction of Automobile Credit Based on Support Vector Machine

  • Chen, Ying;Zhang, Ruirui
    • Journal of Information Processing Systems, v.17 no.1, pp.75-88, 2021
  • The automobile credit business has developed rapidly in recent years, and defaults have become correspondingly frequent. Credit defaults bring great losses to automobile financial institutions, so successfully predicting automobile credit default is of great significance. First, records with missing values are deleted; a random forest is then used for feature selection, and the sample data are randomly grouped. Finally, six prediction models are constructed: support vector machine (SVM), random forest, k-nearest neighbor (KNN), logistic regression, decision tree, and artificial neural network (ANN). The results show that all six machine learning models can be used to predict automobile credit default. Among them, the decision tree has the highest accuracy (0.79), but the SVM has the best overall performance. Random grouping also improves the efficiency of model operation to a certain extent, especially for the SVM.
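
The first and third preprocessing steps (deleting records with missing values, then randomly grouping the samples) can be sketched in plain Python as below. The record fields, the 30% test share, and the seed are hypothetical; random-forest feature selection and the six models are not shown.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")
Record = Dict[str, Optional[float]]


def drop_incomplete(records: List[Record]) -> List[Record]:
    """Listwise deletion: discard any record with at least one missing (None) field."""
    return [r for r in records if all(v is not None for v in r.values())]


def random_split(
    records: Sequence[T], test_fraction: float = 0.3, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and cut it into training and test groups."""
    rng = random.Random(seed)  # fixed seed so the grouping is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


loans = [
    {"income": 3.2, "loan_amount": 1.5},
    {"income": None, "loan_amount": 2.0},  # dropped: missing income
    {"income": 4.1, "loan_amount": 0.9},
]
complete = drop_incomplete(loans)
train, test = random_split(complete, test_fraction=0.5)
```

Listwise deletion is the simplest option and matches the abstract's "missing values are deleted", at the cost of discarding every partially observed record.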

A Study on Development Environments for Machine Learning (머신러닝 자동화를 위한 개발 환경에 관한 연구)

  • Kim, Dong Gil;Park, Yong-Soon;Park, Lae-Jeong;Chung, Tae-Yun
    • IEMEK Journal of Embedded Systems and Applications, v.15 no.6, pp.307-316, 2020
  • The performance of machine learning models is highly affected by the data, and preprocessing is needed to enable analysis of various types of data, such as letters, numbers, and special characters. This paper proposes a development environment that processes categorical and continuous data according to the type of missing values in stage 1, selects the best-performing algorithm in stage 2, and automates the checking of model performance in stage 3. Using this environment, machine learning models can be created without prior knowledge of data preprocessing.
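
Stage 1 handles missing values differently for categorical and continuous columns. The abstract does not give the rules, so the Python below is only a sketch under a common assumption: a column whose observed values are all numeric is treated as continuous and filled with its mean, and any other column is treated as categorical and filled with its most frequent value.

```python
from statistics import mean, mode
from typing import Dict, List, Optional


def impute_by_type(columns: Dict[str, List[Optional[object]]]) -> Dict[str, list]:
    """Fill None entries column by column, choosing the rule from the column's data type.

    Assumes every column has at least one observed (non-None) value.
    """
    filled = {}
    for name, values in columns.items():
        observed = [v for v in values if v is not None]
        numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in observed
        )
        fill = mean(observed) if numeric else mode(observed)  # continuous: mean; categorical: mode
        filled[name] = [fill if v is None else v for v in values]
    return filled


data = {"age": [20, None, 40], "grade": ["A", None, "A", "B"]}  # hypothetical columns
print(impute_by_type(data))  # age gap -> 30, grade gap -> "A"
```

Inferring the column type from the values keeps the sketch self-contained; a real environment would more likely take the types from a schema.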

The Mediating Effects of Subjective Health Perception on the Relationship between Physical Activity, Eating Habits and Mental Health in Gangwon-do Youth

  • Ji-Woo Han
    • International Journal of Advanced Smart Convergence, v.12 no.3, pp.192-199, 2023
  • The purpose of this study was to investigate the structural relationships among eating habits, physical activity, and subjective health perception, which can affect the mental health status of adolescents, and to examine whether subjective health perception mediates these relationships. Raw data from the 17th (2021) Youth Health Behavior Online Survey were used; after records with missing values were excluded, data from a total of 1,998 Gangwon-do adolescents were analyzed. SPSS 25.0 and AMOS 25.0 were used for descriptive statistics, t-tests, and structural equation modeling (SEM). Physical activity had a significant positive effect on mental health status, and subjective health perception mediated the effect of physical activity on mental health status.
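
The study estimated mediation with SEM in AMOS. As a simpler stand-in (a swapped-in technique, not the authors' method), the Python below computes the regression-based product-of-coefficients estimate for a single mediator: path a from X to M, path b from M to Y controlling for X, and the indirect effect a·b. Variable names are hypothetical, and significance testing (for example bootstrapping) is omitted.

```python
from statistics import mean
from typing import Dict, Sequence


def _cross(u: Sequence[float], v: Sequence[float]) -> float:
    """Sum of cross-products of deviations from the mean."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def mediation(x: Sequence[float], m: Sequence[float], y: Sequence[float]) -> Dict[str, float]:
    """Product-of-coefficients mediation estimate from two OLS regressions."""
    sxx, smm = _cross(x, x), _cross(m, m)
    sxm, sxy, smy = _cross(x, m), _cross(x, y), _cross(m, y)
    a = sxm / sxx                           # path a: M regressed on X
    det = sxx * smm - sxm ** 2              # zero only if X and M are collinear
    direct = (smm * sxy - sxm * smy) / det  # c': effect of X on Y controlling for M
    b = (sxx * smy - sxm * sxy) / det       # path b: effect of M on Y controlling for X
    total = sxy / sxx                       # c: Y regressed on X alone
    return {"a": a, "b": b, "indirect": a * b, "direct": direct, "total": total}


# Toy scores: activity (X), health perception (M), mental health (Y).
effects = mediation([1, 2, 3, 4, 5], [2, 1, 4, 3, 6], [1, 3, 2, 6, 7])
print(effects["total"], effects["direct"] + effects["indirect"])  # equal for OLS
```

For ordinary least squares the total effect decomposes exactly as c = c' + a·b, which is a handy check on the arithmetic.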

Classification for Imbalanced Breast Cancer Dataset Using Resampling Methods

  • Hana Babiker, Nassar
    • International Journal of Computer Science & Network Security, v.23 no.1, pp.89-95, 2023
  • Analyzing breast cancer patient files is becoming an exciting area of medical information analysis, especially as the number of patient files increases. In this paper, breast cancer data are collected from Khartoum state hospital, and the dataset is classified into recurrence and no recurrence. The data are imbalanced, meaning that one of the two classes has more samples than the other. Several preprocessing techniques are applied to this imbalanced data, namely resampling, attribute selection, and handling of missing values, and then different classifier models are built. In the first experiment, five classifiers (ANN, REP TREE, SVM, and J48) are used; in the second experiment, meta-learning algorithms (Bagging, Boosting, and Random subspace) are used. Finally, an ensemble model is used. The best result was obtained from the ensemble model (Boosting with J48), with the highest accuracy of 95.2797% among all the algorithms, followed by Bagging with J48 (90.559%) and Random subspace with J48 (84.2657%). The imbalanced breast cancer dataset was thus classified into recurrence and no recurrence with different classification algorithms, and the best result was obtained from the ensemble model.
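
The abstract does not say which resampling method was used. Random oversampling of the minority class is one common choice, and the Python below sketches it under that assumption: minority samples are duplicated at random until every class reaches the majority count. The labels and sample values are hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def random_oversample(
    samples: Sequence[T], labels: Sequence[str], seed: int = 0
) -> Tuple[List[T], List[str]]:
    """Duplicate randomly chosen samples of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_x, out_y = list(samples), list(labels)
    for cls, n in counts.items():
        pool = [x for x, y in zip(samples, labels) if y == cls]
        for _ in range(target - n):
            out_x.append(rng.choice(pool))
            out_y.append(cls)
    return out_x, out_y


patients = [1, 2, 3, 4, 5, 6]  # stand-ins for patient feature vectors
outcome = ["no recurrence"] * 4 + ["recurrence"] * 2
x_bal, y_bal = random_oversample(patients, outcome)
print(Counter(y_bal))  # both classes now have 4 samples
```

Oversampling should be applied to the training portion only; duplicating samples before splitting lets copies leak into the test set and inflates accuracy.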

Performance Comparison of LSTM-Based Groundwater Level Prediction Model Using Savitzky-Golay Filter and Differential Method (Savitzky-Golay 필터와 미분을 활용한 LSTM 기반 지하수 수위 예측 모델의 성능 비교)

  • Keun-San Song;Young-Jin Song
    • Journal of the Semiconductor & Display Technology, v.22 no.3, pp.84-89, 2023
  • In water resource management, artificial intelligence is used for data prediction, and companies, governments, and institutions continue to try to manage resources efficiently in this way. LSTM is a model specialized for processing time series data; it can identify data patterns that change over time and has been applied to predicting groundwater level data. However, groundwater level data can contain sensor errors, missing values, or outliers, which can degrade the performance of the LSTM model, so data quality needs to be improved by handling them in the preprocessing stage. Therefore, in predicting groundwater data, we compare the MSE of the LSTM model with that of the model after normalization through distribution, and discuss the analysis and the important data preprocessing steps according to the comparison results and the changes in the results.
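
The preprocessing named in the title (Savitzky-Golay smoothing and differencing) can be sketched as below in plain Python. Several choices are assumptions, not details from the abstract: interior gaps are filled by linear interpolation, the filter uses a 5-point quadratic window with the standard weights (-3, 12, 17, 12, -3)/35, the end points are left unsmoothed, and the "differential method" is taken to mean first differences. The LSTM itself is not shown.

```python
from typing import List, Optional

# 5-point quadratic Savitzky-Golay smoothing weights (normalised by 35).
SG5 = (-3, 12, 17, 12, -3)


def interpolate_gaps(series: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior None values linearly between the nearest observed neighbours.

    Leading or trailing gaps are left as None.
    """
    out = list(series)
    known = [i for i, v in enumerate(out) if v is not None]
    for left, right in zip(known, known[1:]):
        for i in range(left + 1, right):
            t = (i - left) / (right - left)
            out[i] = out[left] + t * (out[right] - out[left])
    return out


def savgol5(series: List[float]) -> List[float]:
    """Smooth interior points with the 5-point filter; two points at each end are kept as-is."""
    out = list(series)
    for i in range(2, len(series) - 2):
        window = series[i - 2 : i + 3]
        out[i] = sum(w * v for w, v in zip(SG5, window)) / 35
    return out


def first_difference(series: List[float]) -> List[float]:
    """Change between consecutive readings."""
    return [b - a for a, b in zip(series, series[1:])]


levels = interpolate_gaps([1.0, None, 3.0, 4.0, None, 6.0])  # toy water levels
smoothed = savgol5(levels)
changes = first_difference(smoothed)
```

A quadratic Savitzky-Golay filter reproduces any quadratic trend exactly in the interior, so it suppresses sensor noise without flattening gradual rises and falls the way a plain moving average does.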

A Study on Ceramic Restoration Methods with Full Color 3D Printing (풀 컬러 3D 프린팅을 이용한 도자기 복원 방법 연구)

  • Shin, Woo Cheol;Wi, Koang Chul
    • Journal of Conservation Science, v.36 no.5, pp.306-314, 2020
  • The use of synthetic resins in ceramic restoration poses several challenges, including aging and potential damage to artifacts, which has raised the need to investigate new materials and restoration methods. This study set out to incorporate full color 3D printing into 3D digital technology-based restoration, an emerging approach currently being researched, and to print missing parts with color information. After examining the physical properties of the material experimentally, the investigator printed missing parts for a white porcelain vessel and a grayish-blue-powdered celadon plate and compared them in chromaticity and brilliance. The experimental results show that the printed parts had tensile strength comparable to that of the original restoration materials, whereas their compressive strength was approximately 1.4 to 2 times higher. According to the NIST table of color difference values, the difference was visible for the white porcelain vessel at ΔE*ab 1.55 and perceivable for the grayish-blue-powdered celadon plate at ΔE*ab 3.34. Even though printer limitations made it impossible to reproduce the colors accurately, this non-contact approach minimized the possibility of damage. In conclusion, the method can be applied to objects with a high risk of damage, or to create display effects through purposeful color differentiation of missing parts.
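
The ΔE*ab values above are colour differences in CIELAB space. The notation conventionally denotes the CIE 1976 formula, the Euclidean distance between two (L*, a*, b*) triples, and the Python below sketches it under that assumption. The sample colours are hypothetical, not measurements from the study.

```python
import math
from typing import Tuple

Lab = Tuple[float, float, float]  # (L*, a*, b*)


def delta_e_ab(lab1: Lab, lab2: Lab) -> float:
    """CIE 1976 colour difference: Euclidean distance between two CIELAB colours."""
    dl = lab1[0] - lab2[0]
    da = lab1[1] - lab2[1]
    db = lab1[2] - lab2[2]
    return math.sqrt(dl ** 2 + da ** 2 + db ** 2)


original = (50.0, 0.0, 0.0)  # hypothetical reading on the artifact
printed = (53.0, 4.0, 0.0)   # hypothetical reading on the printed part
print(delta_e_ab(original, printed))  # 5.0
```

Later formulas such as CIEDE2000 weight lightness, chroma, and hue differently to track perception more closely; the abstract's ΔE*ab notation points to the simpler 1976 distance.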

Development of a fatty acids database using the Korea National Health and Nutrition Examination Survey data (국민건강영양조사 자료를 이용한 지방산 데이터베이스 구축)

  • Yoon, Mi Ock;Kim, Kirang;Hwang, Ji-Yun;Lee, Hyun Sook;Son, Tae Young;Moon, Hyun-Kyung;Shim, Jae Eun
    • Journal of Nutrition and Health, v.47 no.6, pp.435-442, 2014
  • Purpose: The objective of this study was to develop a fatty acid database (DB) for estimating intake levels of fatty acids in the Korean population, using data from the Korea National Health and Nutrition Examination Survey (KNHANES). Methods: Analytical values of fatty acids in foods were collected from food composition tables of national institutions (National Fisheries Research & Development Institute, Rural Development Administration), the Japan Ministry of Education, Culture, Sports, Science and Technology, the US Department of Agriculture, and journal articles that previously reported the analytical fatty acid content of some Korean foods. The coverage of fatty acids was C14:0, C16:0, C18:0, C18:1, C18:2 n-6, C18:3 n-3, C20:5 n-3 (EPA), C22:6 n-3 (DHA), SFA, MUFA, and PUFA (n-3, n-6, n-9). The fatty acid DB covered a total of 5,144 food items used in the KNHANES nutrition survey. The food items were preferentially filled with analytical values from the collected data sources. An analytical value for each food item was selected based on the priority criteria and the quality evaluation of data sources. Missing values were replaced with calculated or imputed values using the analytical values of similar food items from the data sources. Results: A total of 1,545 analytical values, 2,589 calculated values, and 1,010 imputed values were included in the fatty acid DB. The developed fatty acid DB was applied to 2,112 food items available in the 2011 KNHANES data. Mean intake levels of total fatty acids and saturated fatty acids were 40.3 g/day and 13.2 g/day, respectively. The estimated total fatty acid intake was 84.3% (men 83.2%, women 86.0%) of daily total fat intake. Conclusion: This newly developed fatty acid DB should be helpful in determining the association between fatty acid intake and related health concerns in the Korean population.
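
The filling procedure (take an analytical value from the highest-priority source, otherwise borrow the value of a similar food item) can be sketched as follows. The source ranking, the similar-food mapping, and all numbers are hypothetical; the abstract does not give the actual priority criteria, and the "calculated" category (values derived by computation rather than taken from a similar item) is not modelled here.

```python
from typing import Dict, Optional, Tuple

# Hypothetical ranking of data sources, highest priority first. The abstract
# names the source types but not the exact priority criteria.
SOURCE_PRIORITY = ("korea_national", "japan_mext", "usda", "journal")


def pick_analytical(values_by_source: Dict[str, float]) -> Optional[float]:
    """Return the value from the highest-priority source that has one."""
    for source in SOURCE_PRIORITY:
        if source in values_by_source:
            return values_by_source[source]
    return None


def fatty_acid_value(
    food: str,
    analytical: Dict[str, Dict[str, float]],
    similar_food: Dict[str, str],
) -> Tuple[Optional[float], str]:
    """Return (value, provenance) for one fatty acid in one food item."""
    value = pick_analytical(analytical.get(food, {}))
    if value is not None:
        return value, "analytical"
    proxy = similar_food.get(food)  # hypothetical mapping to a similar food item
    if proxy is not None:
        proxy_value = pick_analytical(analytical.get(proxy, {}))
        if proxy_value is not None:
            return proxy_value, "imputed"
    return None, "missing"


# Illustrative numbers only, not values from the study.
analytical = {"mackerel": {"usda": 2.1, "korea_national": 2.5}}
similar = {"pacific saury": "mackerel"}
print(fatty_acid_value("pacific saury", analytical, similar))  # (2.5, 'imputed')
```

Keeping the provenance label alongside each value makes it possible to report counts like those in the abstract; there, 1,545 analytical + 2,589 calculated + 1,010 imputed values sum to 5,144, the number of food items covered.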