• Title/Summary/Keyword: Data Imbalance


A Method of Bank Telemarketing Customer Prediction based on Hybrid Sampling and Stacked Deep Networks (혼성 표본 추출과 적층 딥 네트워크에 기반한 은행 텔레마케팅 고객 예측 방법)

  • Lee, Hyunjin
    • Journal of Korea Society of Digital Industry and Information Management / v.15 no.3 / pp.197-206 / 2019
  • Telemarketing has been widely used in finance as offline channels have declined. To select telemarketing target customers, various machine learning techniques have emerged that aim to maximize effect at minimum cost. However, two problems remain: class imbalance, in which customers for whom marketing succeeded are far fewer than those for whom it failed, and a recall rate that is lower than the accuracy. In this paper, we propose a method that solves the class imbalance problem and increases the recall rate to improve efficiency. A hybrid sampling method is applied to balance the classes in the data, and a stacked deep network is applied to improve recall and precision as well as accuracy. The proposed method is applied to actual bank telemarketing data. In comparison experiments, accuracy, recall, and precision all improved over conventional methods.
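
The abstract does not say which sampling algorithms make up the hybrid scheme. As a minimal sketch only, assuming the simplest combination (random undersampling of the majority class plus random oversampling of the minority class), the idea can be illustrated as follows. The function name `hybrid_sample`, the `target_size` parameter, and the toy data are hypothetical and not taken from the paper.

```python
import random

def hybrid_sample(majority, minority, target_size, seed=0):
    """Bring both classes to target_size rows (illustrative assumption, not the paper's method)."""
    rng = random.Random(seed)
    # Undersample: draw without replacement from the majority class.
    maj = rng.sample(majority, min(target_size, len(majority)))
    # Oversample: keep every minority row, then add random duplicates.
    extra = max(0, target_size - len(minority))
    mino = list(minority) + [rng.choice(minority) for _ in range(extra)]
    return maj, mino[:target_size]

# Toy data: 900 failed contacts versus 100 successful ones.
failed = [("fail", i) for i in range(900)]
success = [("success", i) for i in range(100)]

maj, mino = hybrid_sample(failed, success, target_size=300)
print(len(maj), len(mino))  # 300 300
```

Meeting in the middle like this discards fewer majority rows than pure undersampling and duplicates fewer minority rows than pure oversampling, which is the usual motivation for hybrid schemes.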

Anomaly Detection Model Based on Semi-Supervised Learning Using LIME: Focusing on Semiconductor Process (LIME을 활용한 준지도 학습 기반 이상 탐지 모델: 반도체 공정을 중심으로)

  • Kang-Min An;Ju-Eun Shin;Dong Hyun Baek
    • Journal of Korean Society of Industrial and Systems Engineering / v.45 no.4 / pp.86-98 / 2022
  • Recently, many studies have been conducted to improve quality by applying machine learning models to semiconductor manufacturing process data. However, in the semiconductor manufacturing process, the ratio of good products is much higher than that of defective products, so data imbalance is a serious problem for machine learning. In addition, since the number of features in the data is very large, it is important to extract only the important features to increase accuracy and usability. This study proposes an anomaly detection methodology that can learn effectively despite the data imbalance and high dimensionality of semiconductor process data. The methodology applies the LIME algorithm after applying the SMOTE method and the RFECV method. It analyzes the classification results of the anomaly classification model, detects the cause of each anomaly, and identifies the semiconductor process that requires action. The applicability and feasibility of the proposed methodology were confirmed through a case application.
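
SMOTE, named in the abstract above, creates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours. The following is a simplified standard-library sketch of that interpolation step, not the study's pipeline: the function name `smote`, the choice of `k`, and the toy points are assumptions, and the original algorithm iterates over every minority sample rather than picking base points at random.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic

# Hypothetical defective-product samples with two features each.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=5)
print(len(new_points))  # 5
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies.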

Classification Abnormal temperatures based on Meteorological Environment using Random forests (랜덤포레스트를 이용한 기상 환경에 따른 이상기온 분류)

  • Youn Su Kim;Kwang Yoon Song;In Hong Chang
    • Journal of Integrative Natural Science / v.17 no.1 / pp.1-12 / 2024
  • Many abnormal climate events are occurring around the world, and their cause is related to temperature. Factors that affect temperature include excessive emissions of carbon and greenhouse gases from a global perspective, and air circulation from a local perspective. Because of air circulation, abnormal climate phenomena such as abnormally high and abnormally low temperatures are occurring in certain areas and can cause serious harm to people. Therefore, the problem of abnormal temperature should not be approached only as a case of climate change, but should be studied as a new category of climate crisis. In this study, we propose a model for classifying abnormal temperature using random forests, based on meteorological data such as longitudinal observations, yellow dust, and ultraviolet radiation collected from 2018 to 2022 for each region in Korea. The meteorological data were imbalanced, so oversampling was used to address the imbalance. As a result, we found that the variables affecting abnormal temperature differ by region. In particular, the central and southern regions are influenced by high pressure (Mainland China, Siberian high pressure, and North Pacific high pressure) due to their regional characteristics, so pressure-related variables had a significant impact on the classification of abnormal temperature. This suggests that a regional approach can be taken to predict abnormal temperatures from the surrounding meteorological environment. In addition, when an abnormal temperature is expected, preventive measures could be taken in advance according to regional characteristics.

The Development of Biodegradable Fiber Tensile Tenacity and Elongation Prediction Model Considering Data Imbalance and Measurement Error (데이터 불균형과 측정 오차를 고려한 생분해성 섬유 인장 강신도 예측 모델 개발)

  • Se-Chan, Park;Deok-Yeop, Kim;Kang-Bok, Seo;Woo-Jin, Lee
    • KIPS Transactions on Software and Data Engineering / v.11 no.12 / pp.489-498 / 2022
  • Recently, the labor-intensive textile industry has been attempting to reduce process costs and optimize quality through artificial intelligence. However, data collection in the fiber spinning process is costly and there is no systematic data collection and processing system, so the amount of accumulated data is small. In addition, data imbalance arises because data are preferentially collected only for changes in specific variables, depending on the purpose of the spinning run, and even samples collected under the same spinning conditions differ because of differences in the environment in which physical properties are measured. If these data characteristics are not taken into account when training AI models, problems such as overfitting and performance degradation may occur. Therefore, in this paper, we propose an outlier handling technique and a data augmentation technique that consider the characteristics of spinning process data. By comparing them with existing outlier handling and data augmentation techniques, we show that the proposed techniques are more suitable for spinning process data. In addition, by applying both the original data and the data processed with the proposed methods to various models, we show that the tensile tenacity and elongation prediction models perform better with the proposed methods than without them.

Detecting Malicious Social Robots with Generative Adversarial Networks

  • Wu, Bin;Liu, Le;Dai, Zhengge;Wang, Xiujuan;Zheng, Kangfeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.11 / pp.5594-5615 / 2019
  • Malicious social robots, which disseminate malicious information on social networks, seriously affect information security and network environments. The detection of malicious social robots is a hot topic and a significant concern for researchers. Classification-based methods have been widely used for social robot detection. However, classification is limited by unbalanced data sets in which legitimate accounts (negative samples) outnumber malicious robots (positive samples), which leads to unsatisfactory detection results. This paper proposes the use of generative adversarial networks (GANs) to extend unbalanced data sets before training classifiers, in order to improve the detection of social robots. Five popular oversampling algorithms were compared in the experiments, and the effects of the imbalance degree and the expansion ratio of the original data on oversampling were studied. The experimental results showed that the proposed method achieved better detection performance than the other algorithms in terms of the F1 measure. The GAN method also performed well when the imbalance degree was smaller than 15%.
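
The F1 measure used in the abstract above is the harmonic mean of precision and recall. A small sketch, using made-up labels rather than the paper's data, shows why it is preferred to accuracy on unbalanced sets: a classifier that never flags a robot still scores high accuracy but an F1 of zero. The function name and the 95/5 split are illustrative assumptions.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical data: 95 legitimate accounts (0) and 5 malicious robots (1).
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100  # a degenerate classifier that always predicts "legitimate"

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                             # 0.95
print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)
```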

Effects of Iyengar Yoga Practice for 12 weeks on Lower Body Imbalance in Middle-aged Women (중년여성의 12주간 아헹가 요가 수련이 하체 불균형에 미치는 영향)

  • Park, Yunha;Kim, Donghee
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.1 / pp.431-440 / 2017
  • The purpose of this study was to investigate the effects of Iyengar yoga practice on lower body imbalance in middle-aged women. The subjects (n=24), who had not performed yoga training prior to this study and were not attending any other training programs, underwent an X-ray examination with the Gonstead Technique before participating, and their lower body imbalance was then reevaluated. The subjects completed the yoga program for 12 weeks (3 times per week, 90 minutes per session). The data were analyzed with the paired t-test, and alpha was set at 0.05. It was found that 1) the height differences between the right and left iliac crests (p < 0.001), the width (p < 0.001) and length (p < 0.001) differences between the right and left iliac fossa, and the width differences between the right and left sacrum (p < 0.001) were significantly reduced after the training program. In addition, 2) the lower limb length discrepancy was significantly reduced (p < 0.001). Our data suggest that Iyengar yoga training for 12 weeks reduces pelvic imbalance and length differences between the right and left lower limbs in middle-aged females.
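
The paired t-test mentioned above compares each subject's measurement before and after the program by testing whether the mean of the per-subject differences is zero. The sketch below computes only the t statistic and degrees of freedom with the standard library. The function name `paired_t` and the six before/after values are invented for illustration and are not the study's measurements.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired samples.

    Assumes at least two pairs and that the differences are not all identical.
    """
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical left/right height differences (mm) for six subjects.
before = [8.0, 6.5, 7.2, 9.1, 5.8, 7.7]
after = [5.1, 4.9, 5.0, 6.2, 4.4, 5.3]

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f}, df = {df}")
```

A negative t here means the measurement decreased after the program. With 24 subjects the test has 23 degrees of freedom, and a two-sided test at alpha = 0.05 rejects when |t| exceeds roughly 2.07.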

The Detection of Online Manipulated Reviews Using Machine Learning and GPT-3 (기계학습과 GPT3를 시용한 조작된 리뷰의 탐지)

  • Chernyaeva, Olga;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.347-364 / 2022
  • Fraudulent companies or sellers strategically manipulate reviews to influence customers' purchase decisions; therefore, the reliability of reviews has become crucial for customer decision-making. Since customers increasingly rely on online reviews to search for more detailed information about products or services before purchasing, many researchers focus on detecting manipulated reviews. However, the main problem in detecting manipulated reviews is the difficulty of obtaining enough manipulated reviews to train machine learning models. Also, the number of manipulated reviews is small compared with the number of non-manipulated reviews, so a class imbalance problem occurs. The class with fewer examples is under-represented, which can hamper a model's accuracy, so solving the class imbalance problem is important for building an accurate model for detecting manipulated reviews. Thus, we propose an OpenAI-based review generation model to solve the imbalance in manipulated reviews, thereby enhancing the accuracy of manipulated review detection. In this research, we applied the autoregressive language model GPT-3 to generate reviews based on manipulated reviews. Moreover, we found that applying the GPT-3 model to oversample manipulated reviews can recover a satisfactory portion of the performance losses and shows better classification performance (logit, decision tree, neural networks) than traditional oversampling methods such as random oversampling and SMOTE.

Effective Gait Imbalance Judgment Method based on Thigh Location (대퇴부 위치 기반 효과적인 보행 불균형 측정 방법)

  • Kim, Seojun;Kim, Yoohyun;Shim, Hyeonmin;Lee, Sangmin
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.4 / pp.541-545 / 2014
  • In this paper, the angle of the thighs during walking is used to estimate the balance between the left and right legs during normal walking. To overcome the limitations of gait analysis based on image processing or foot pressure, which have been widely used in previous work, the thigh angle was used to estimate asymmetric gait. Five healthy adult males were tested, and gait cycle data were obtained from 10 trials. For this research, a thigh-angle measurement device was developed and attached in a position of $20^{\circ}$ for flexion and $15^{\circ}$ for extension to measure the angle of the thigh. To verify the reliability of asymmetric gait estimation using the thigh angle, the results were compared with asymmetric gait estimation using foot pressure. The results show that using the thigh angle is on average 16.84% more accurate than using foot pressure in determining gait imbalance.

The associations between dietary behavior and subjective measurements of serious dental diseases in nursing home staff (일부 병원종사자의 식행동과 주관적 중대 구강병과의 연관성)

  • Shim, Youn-Soo;An, So-Youn;Park, So-Young
    • Journal of Korean society of Dental Hygiene / v.13 no.3 / pp.377-385 / 2013
  • Objectives: The objective of this study is to determine the associations between dietary behavior and subjective measurements of dental caries and periodontal disease in a cohort of nursing home staff. Methods: A self-reported survey was carried out among 280 nursing home staff in Jeollabukdo Province, Korea. The collected data were analyzed using SPSS Version 19.0. Multiple regression analysis was conducted to examine the effects of dietary behavior and food intake on subjective measurements of the two serious dental diseases. Results: Irregular meals tended to increase dietary imbalance and periodontal disease in the nursing staff. For example, they influenced the imbalance of sugar, vegetable, and seafood intake. Conclusions: It is important to eat regular meals because irregular eating behavior tended to increase dietary imbalance and periodontal disease in the nursing staff.

The Effect of Knee Muscle Imbalance on Motion of Back Squat (무릎 근력의 불균형이 백 스쿼트 동작에 미치는 영향)

  • Sohn, Jee-Hoon
    • Journal of Digital Convergence / v.17 no.3 / pp.463-471 / 2019
  • The purpose of this study was to investigate the effect of muscle imbalance on the motion of the back squat. The isokinetic muscle strength of the 8 subjects was recorded for knee flexion and extension with a Cybex 770 dynamometer. Each subject performed 3 back squats with a long barbell at intensities of 25%, 50%, 100%, and 125% of body weight (BW). From the kinematic data recorded during the back squat, the subjects' maximum knee flexion and extension angles, center of mass (COM) displacement, and V-COP were calculated to evaluate the stability of the movement. An independent t-test was used for statistical analysis. Knee flexion angle and COM displacement were dominated by the reciprocal muscle ratio. The V-COP factor was dominated by the bilateral extension deficit. The results indicate that as squat intensity increased, the movement became difficult to control because muscle imbalance influenced it.