• Title/Summary/Keyword: Machine learning and gender


The study of blood glucose level prediction model using ballistocardiogram and artificial intelligence (심탄도와 인공지능을 이용한 혈당수치 예측모델 연구)

  • Choi, Sang-Ki;Park, Cheol-Gu
    • Journal of Digital Convergence / v.19 no.9 / pp.257-269 / 2021
  • This study collects biosignal data in a non-invasive, non-restrictive manner using a BCG (ballistocardiogram) sensor and applies machine learning algorithms in ICT and high-performance computing environments. It presents and examines a method for developing and validating a data-based blood glucose prediction model. The input nodes of the MLP architecture are heart rate, respiration rate, stroke volume, heart rate variability, SDNN, RMSSD, PNN50, age, and gender, and 7 was used for the hidden layer. In the experiment, the average MSE, MAE, and RMSE on the training data over 5 test runs were 0.5226, 0.6328, and 0.7692, respectively; the corresponding averages on the validation data were 0.5408, 0.6776, and 0.7968; and the coefficient of determination (R²) was 0.9997. If research continues on standardizing data-based blood glucose prediction models and on verifying data set collection and prediction accuracy, the approach is expected to be usable for non-invasive blood glucose management.
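
The four error measures reported in the abstract above can be computed as in the following minimal sketch. The function name `regression_metrics` and the glucose readings are hypothetical illustrations, not the authors' code or data.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, RMSE and R^2 for paired true/predicted values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n        # mean squared error
    mae = sum(abs(e) for e in errors) / n       # mean absolute error
    rmse = math.sqrt(mse)                       # root mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)         # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                  # coefficient of determination
    return {"mse": mse, "mae": mae, "rmse": rmse, "r2": r2}


# Hypothetical blood glucose values (mg/dL), for illustration only.
measured = [100.0, 110.0, 120.0, 130.0]
predicted = [101.0, 109.0, 121.0, 129.0]
metrics = regression_metrics(measured, predicted)
print(metrics)
```

Averaging each metric over repeated runs, as the abstract describes, would simply mean calling this function once per run and taking the mean of each entry.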

AI Comparative Analysis of Trade and Consumption Patterns in Korea and China

  • Chang Hwan Choi;Thi Thanh Tuyen Nguyen;PengYan Wang
    • Journal of Korea Trade / v.27 no.1 / pp.119-138 / 2023
  • Purpose - This research empirically explores the differences in apparel consumption among male and female teenagers and college students in Korea and China. By conducting a survey to understand customers' needs and behaviors, fashion businesses will be able to improve customer satisfaction and avoid redundancy, excess inventory, and the waste of resources, effort, and money. Design/methodology - The research design considers the consumption patterns of male and female high school and college students in Korea and China. To analyze the data, the study employs decision trees, a type of machine learning algorithm. A decision tree model was developed to examine the relationship between the explanatory and response variables, which can be either quantitative or qualitative in nature. Findings - The main findings indicate that shopping behavior differs among customer segments. The results show that men have simpler shopping behavior than women. Additionally, cultural factors and the difference in fashion needs between students and non-students have a significant impact on the shopping choices of Chinese and Korean individuals. Originality/value - Existing studies often assume that the shopping behavior of high school and university students is similar and that there are no significant differences in clothing purchases between men and women across countries. The results provide valuable insights into the distinct shopping behavior of different customer segments and can inform fashion businesses in their efforts to meet the needs of their customers.

A customer credit Prediction Researched to Improve Credit Stability based on Artificial Intelligence

  • MUN, Ji-Hui;JUNG, Sang Woo
    • Korean Journal of Artificial Intelligence / v.9 no.1 / pp.21-27 / 2021
  • Since the 1990s, Korea's credit card industry has steadily developed. As a result, various problems have arisen, such as careless customer information management and loans to low-credit customers. This, in turn, led to a high delinquency rate across the card industry and had a negative impact on the economy. Therefore, in this paper, based on Azure, we analyze and predict the delinquency and delinquency periods of credit loans according to gender, car ownership, property, number of children, education level, marital status, and employment status, using linear regression analysis and a boosted decision tree algorithm. These predictions can reduce the likelihood of reckless credit lending and credit card issuance, reducing the number of bad creditors and the risk to banks. In addition, after the customer base is classified and divided according to the predicted results, the results can serve as a basis for reducing the risk of credit loans by developing a credit product suited to each customer. Of the two algorithms run on Azure, Linear Regression and Boosted Decision Tree, the Boosted Decision Tree algorithm made the more accurate predictions. In future work, we intend to increase the accuracy of the analysis by assigning a number to each data item and predicting again.

Identifying the Expression Patterns of Depression Based on the Random Forest (랜덤 포레스트 기반 우울증 발현 패턴 도출)

  • Jeon, Hyeon Jin;Jihn, Chang-Ho
    • Journal of Korean Society of Industrial and Systems Engineering / v.44 no.4 / pp.53-64 / 2021
  • Depression is one of the most important psychiatric disorders worldwide. Most depression-related data mining and machine learning studies have been conducted to predict the presence of depression or to derive individual risk factors. However, since depression is caused by a combination of various factors, the complex relationships among these factors must be identified in order to establish effective measures for preventing and managing depression. In this study, we propose a methodology for identifying and interpreting depression expression patterns by deriving random forest rules, where each rule consists of the condition under which a depressive pattern manifests and the predicted depression outcome when that condition is met. The analysis was carried out on 4 subgroups to account for the different depressive patterns by gender and age. The depression rules derived by the proposed methodology were validated by comparing them with the results of previous studies. In addition, an AUC comparison test showed that the depression diagnosis performance of the derived rules did not differ from that of the existing PHQ-9 summing method. The significance of this study is that it enables interpretation of the complex relationships among depressive factors, going beyond existing studies that focused on prediction and on identifying major factors.
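
As a rough illustration of the AUC comparison mentioned above, the sketch below computes AUC as the fraction of (positive, negative) pairs that a score ranks correctly, counting ties as half. The labels, the PHQ-9 sums, and the rule outputs are hypothetical values, and the statistical test used in the paper to compare two AUCs is not reproduced here.

```python
def auc(labels, scores):
    """AUC via pairwise comparison: P(score of a positive > score of a negative)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # positive ranked above negative
            elif p == q:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical data: 1 = depression present, 0 = absent.
labels = [1, 1, 1, 0, 0, 0]
phq9_sum = [15, 12, 8, 9, 4, 3]        # summed PHQ-9 item scores (assumed values)
rule_pred = [1, 1, 0, 1, 0, 0]         # binary output of a derived rule (assumed values)

auc_phq9 = auc(labels, phq9_sum)
auc_rule = auc(labels, rule_pred)
print(auc_phq9, auc_rule)
```

The pairwise form is quadratic in the sample size, which is fine for a small illustration; a rank-based computation would be preferable for large samples.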

Counterfactual image generation by disentangling data attributes with deep generative models

  • Jieon Lim;Weonyoung Joo
    • Communications for Statistical Applications and Methods / v.30 no.6 / pp.589-603 / 2023
  • Deep generative models aim to infer the underlying true data distribution, which has led to great success in generating fake-but-realistic data. From this perspective, data attributes can be a crucial factor in the data generation process, since non-existent counterfactual samples can be generated by altering certain factors. For example, we can generate new portrait images by flipping the gender attribute or altering the hair color attributes. This paper proposes counterfactual disentangled variational autoencoder generative adversarial networks (CDVAE-GAN), specialized for attribute-level counterfactual data generation. The proposed CDVAE-GAN consists of variational autoencoders and generative adversarial networks. Specifically, we adopt a Gaussian variational autoencoder to extract low-dimensional disentangled data features, and auxiliary Bernoulli latent variables to model the data attributes separately. We also use a generative adversarial network to generate data with high fidelity. By combining the benefits of the variational autoencoder with the additional Bernoulli latent variables and the generative adversarial network, the proposed CDVAE-GAN can control the data attributes, which enables it to produce counterfactual data. Our experimental results on the CelebA dataset show qualitatively that the samples generated by CDVAE-GAN are realistic. The quantitative results also support that the proposed model can produce data with altered attributes that deceive other machine learning classifiers.

Matching prediction on Korean professional volleyball league (한국 프로배구 연맹의 경기 예측 및 영향요인 분석)

  • Heesook Kim;Nakyung Lee;Jiyoon Lee;Jongwoo Song
    • The Korean Journal of Applied Statistics / v.37 no.3 / pp.323-338 / 2024
  • This study analyzes the Korean professional volleyball league and predicts match outcomes using popular machine learning classification methods. Match data from the 2012/2013 to 2022/2023 seasons were collected for both the male and female leagues, including match details. Two data structures were applied to the models: one separating match results by team, and one using performance differentials between the home and away teams. These two data structures were used to construct a total of four predictive models covering both the male and female leagues. Because the specific variable values used in the models are unavailable before a match ends, the results of the most recent 3 to 4 matches, up to just before the match being predicted, were preprocessed and used as variables. Logistic Regression, Decision Tree, Bagging, Random Forest, XGBoost, AdaBoost, and LightGBM were employed for classification, and the Random Forest model showed the highest predictive performance. The results indicated that while the significant variables varied by gender and data structure, set success rate, blocking points scored, and the number of faults were consistently crucial. Notably, the distinctiveness of our win-loss prediction model lies in its ability to provide pre-match forecasts rather than post-event predictions.
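
The pre-match preprocessing described above can be sketched as follows: for each match, a statistic is averaged over a team's previous few matches only, so that nothing from the match being predicted leaks into its features. The function name `pre_match_averages`, the window of 3, and the blocking-point values are hypothetical; the paper's exact preprocessing is not given in the abstract.

```python
def pre_match_averages(values, window=3):
    """For each match i, average the statistic over up to `window` earlier matches.

    The first match has no history, so its feature is None.
    """
    features = []
    for i in range(len(values)):
        history = values[max(0, i - window):i]   # strictly before match i
        features.append(sum(history) / len(history) if history else None)
    return features


# Hypothetical blocking points per match, in chronological order.
home_blocks = [8, 10, 6, 12, 9]
away_blocks = [7, 9, 11, 5, 10]

home_feats = pre_match_averages(home_blocks)
away_feats = pre_match_averages(away_blocks)

# Second data structure from the abstract: home-minus-away differential,
# shown here for the most recent match only.
latest_diff = home_feats[-1] - away_feats[-1]
print(home_feats, latest_diff)
```

Slicing up to index `i` but not including it is what keeps the feature a pre-match quantity rather than a post-event one.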

A Comparative Study of Predictive Factors for Passing the National Physical Therapy Examination using Logistic Regression Analysis and Decision Tree Analysis

  • Kim, So Hyun;Cho, Sung Hyoun
    • Physical Therapy Rehabilitation Science / v.11 no.3 / pp.285-295 / 2022
  • Objective: The purpose of this study is to use logistic regression and decision tree analysis to identify the factors that affect success or failure in the national physical therapy examination, and to build and compare predictive models. Design: Secondary data analysis study. Methods: We analyzed 76,727 subjects from the physical therapy national examination data provided by the Korea Health Personnel Licensing Examination Institute. The target variable was pass or fail, and the input variables were gender, age, graduation status, and examination area. Frequency analysis, chi-square test, binary logistic regression, and decision tree analysis were performed on the data. Results: In the logistic regression analysis, subjects in their 20s (odds ratio, OR=1, reference), those expected to graduate (OR=13.616, p<0.001), and those from the Jeju-do examination area (OR=3.135, p<0.001) had a high probability of passing. In the decision tree, the factors with the greatest influence on passing were, in order, graduation status (χ²=12366.843, p<0.001) and examination area (χ²=312.446, p<0.001). Logistic regression analysis showed a specificity of 39.6% and a sensitivity of 95.5%, while decision tree analysis showed a specificity of 45.8% and a sensitivity of 94.7%. Classification accuracy was 87.6% for logistic regression and 88.0% for decision tree analysis. Conclusions: Both logistic regression and decision tree analysis were adequate to explain the predictive model. Additionally, whether actual test takers would pass the national physical therapy examination could be determined by applying the constructed prediction model and prediction rate.
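
Sensitivity, specificity, and classification accuracy, as compared above for the two models, are all derived from the four cells of a confusion matrix. The sketch below shows the arithmetic with "pass" as the positive class; the counts are hypothetical and are not taken from the study.

```python
def classification_rates(tp, fn, tn, fp):
    """Return sensitivity, specificity and accuracy from confusion-matrix counts.

    tp/fn: actual passes predicted as pass/fail
    tn/fp: actual fails predicted as fail/pass
    """
    sensitivity = tp / (tp + fn)                # share of actual passes found
    specificity = tn / (tn + fp)                # share of actual fails found
    accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all cases classified correctly
    return sensitivity, specificity, accuracy


# Hypothetical counts for illustration only.
sens, spec, acc = classification_rates(tp=90, fn=10, tn=8, fp=12)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

When the positive class dominates, as in this toy example, accuracy can stay high even though specificity is low, which is why the three figures are reported together.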

Differences and Multi-dimensionality of the Perception of Career Success among Korean Employees: A Topic Modeling Approach (기업근로자 경력성공 인식의 다차원성과 차이: 토픽모델링의 적용)

  • Lee, Jaeeun;Chae, Chungil
    • The Journal of the Korea Contents Association / v.19 no.6 / pp.58-71 / 2019
  • The purpose of this study is to explore the multi-dimensionality of, and differences in, career success as revealed by employees' perceptions. To this end, LDA topic modeling was applied to extract latent topics of career success from open-ended survey responses of 126 Korean employees. The extracted latent topics are social recognition, continuing service within an organization, expertise, financial rewards, and pursuing personal meaning. The occurrence probability of each topic differed by individual characteristics such as gender, education, and position. The findings show that career success is multi-dimensional and that topic occurrence probabilities differ by demographic characteristics. Additionally, this study shows how a recently developed machine learning approach can reduce researcher bias, by applying LDA topic modeling to qualitative open-ended survey data.

Consumer behavior prediction using Airbnb web log data (에어비앤비(Airbnb) 웹 로그 데이터를 이용한 고객 행동 예측)

  • An, Hyoin;Choi, Yuri;Oh, Raeeun;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.391-404 / 2019
  • Customers' fixed characteristics have often been used to predict customer behavior. As customer activities move from offline to online, it has recently become possible to track customer web logs and to collect large amounts of web log data; however, researchers have focused only on organizing the log data or describing its technical characteristics. In this study, we predict the decision-making time until each customer makes a first reservation, using Airbnb customer data provided by the Kaggle website. This data set includes basic customer information, such as gender and age, as well as web logs. We use various methodologies to find the optimal model and compare prediction errors for cases with and without web log data. We consider six models, including Lasso, SVM, Random Forest, and XGBoost, to explore the effectiveness of the web log data. As a result, we choose Random Forest as our optimal model, with a misclassification rate of about 20%. In addition, we confirm that using web log data in our study doubles the prediction accuracy for customer behavior compared to not using it.