• Title/Summary/Keyword: selection algorithm (선별 알고리즘)


Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • Predicting the success of customer-targeted campaigns has long been studied in academia, and prediction models applying a variety of techniques are still being developed. As campaign channels have multiplied with the rapid growth of online services, companies now run campaigns on a scale that cannot be compared with the past. However, as fatigue from duplicate exposure increases, customers tend to perceive campaigns as spam. From a corporate standpoint, the effectiveness of campaigns is also falling: the cost of investing in campaigns rises while the actual success rate remains low. Accordingly, various studies aim to improve campaign effectiveness in practice. The ultimate purpose of a campaign system is to increase the success rate of campaigns by collecting and analyzing customer-related data and using them in campaigns. In particular, machine learning has recently been used to predict campaign responses. Because campaign data contain many and varied features, selecting appropriate features is very important. If all input data are used when classifying a large amount of data, learning time grows as the number of classes expands, so a minimal input data set must be extracted from the entire data. In addition, a model trained with too many features may lose prediction accuracy because of overfitting or correlation between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step in analyzing a high-dimensional data set. 
Among the greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), and SFFS (Sequential Floating Forward Selection) are widely used as traditional feature selection techniques. However, when there are many features, these methods are limited by poor classification prediction performance and long learning time. Therefore, this study proposes an improved feature selection algorithm to enhance campaign effectiveness. Its purpose is to improve the existing sequential SFFS method, in the search for the feature subsets that underpin machine learning model performance, by using statistical characteristics of the data processed in the campaign system. Features that strongly influence performance are derived first, features that have a negative effect are removed, and the sequential method is then applied to make the search more efficient and to enable generalized prediction. The proposed model was confirmed to show better search and prediction performance than the traditional greedy algorithm. Its campaign success prediction was higher than that obtained with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, the improved feature selection algorithm was found to help in analyzing and interpreting prediction results by providing the importance of the derived features. These included features such as age, customer rating, and sales, which were already known to be statistically important. 
In addition, features that campaign planners had rarely used to select campaign targets, such as the combined product name, the average 3-month data consumption rate, and wireless data usage over the last 3 months, were unexpectedly selected as important features for campaign response. It was confirmed that base attributes can also be very important features depending on the type of campaign. This makes it possible to analyze and understand the important characteristics of each campaign type.
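
For readers unfamiliar with SFFS, the following is a minimal sketch of the plain textbook procedure, not the improved algorithm proposed in the paper, whose details the abstract does not give. The function `sffs`, the feature names, and the additive `toy_score` are hypothetical; in practice `score` would be something like the cross-validated accuracy of a classifier trained on the candidate subset.

```python
from typing import Callable, Dict, FrozenSet, Iterable, Tuple

Subset = FrozenSet[str]


def sffs(features: Iterable[str],
         score: Callable[[Subset], float],
         k: int) -> Subset:
    """Plain Sequential Floating Forward Selection for a subset of size k."""
    pool = frozenset(features)
    if not 0 < k <= len(pool):
        raise ValueError("k must be between 1 and the number of features")

    best: Dict[int, Tuple[Subset, float]] = {}  # best subset seen per size
    selected: Subset = frozenset()

    while len(selected) < k:
        # Forward step: add the single feature that gives the highest score.
        add = max(sorted(pool - selected),
                  key=lambda f: score(selected | {f}))
        selected = selected | {add}
        current = score(selected)
        size = len(selected)
        if size not in best or current > best[size][1]:
            best[size] = (selected, current)
        else:
            # A better subset of this size was found earlier; resume from it.
            selected = best[size][0]

        # Floating step: drop features while that strictly beats the best
        # subset already recorded for the smaller size.
        while len(selected) > 2:
            drop = max(sorted(selected),
                       key=lambda f: score(selected - {f}))
            reduced = selected - {drop}
            reduced_score = score(reduced)
            if reduced_score > best[len(reduced)][1]:
                selected = reduced
                best[len(reduced)] = (reduced, reduced_score)
            else:
                break

    return best[k][0]


# Hypothetical additive usefulness per feature, for illustration only.
TOY_WEIGHTS = {"age": 3.0, "sales": 2.5, "customer_rating": 2.0, "noise": -1.0}


def toy_score(subset: Subset) -> float:
    return sum(TOY_WEIGHTS[f] for f in subset)


chosen = sffs(TOY_WEIGHTS, toy_score, k=2)
print(sorted(chosen))
```

With an additive score the floating step never fires; it matters when features interact, which is the case the sequential floating variants are designed for.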

Comparison Analysis of The results of IRMA Test among Different Equipment According to Algorithm change. (IRMA 검사법 중 알고리즘 변경에 따른 장비 간 결과값 비교분석)

  • Kim, Jung In;Kwon, Won Hyun;Lee, Kyung Jae
    • The Korean Journal of Nuclear Medicine Technology / v.23 no.2 / pp.43-50 / 2019
  • Purpose: Nuclear medicine tests are based on two main principles: the competitive reaction (radioimmunoassay, RIA) and the noncompetitive reaction (immunoradiometric assay, IRMA). For curve fitting, it is commonly known in the laboratory field that spline interpolation is used with the RIA method and linear interpolation with the IRMA method. However, the insulin test by IRMA showed significant differences between fully automated radioimmunoassay analyzers, especially at low concentrations, even though they used the same linear interpolation algorithm. This study aims to compare the results of applying two different algorithms on fully automated radioimmunoassay analyzers including Gamma Pro, Gamma 10, Cobra, and SR300. Materials and Methods: A total of 30 test samples were selected for measuring TSH, ferritin, C-peptide, and insulin serum levels. The tests were performed by the IRMA method. We compared the results of applying linear interpolation and spline interpolation on the Gamma Pro, Gamma 10, Cobra, and SR300 equipment. Results: Two-way ANOVA was used for statistical analysis, with a significance level of P<0.05. The results of the TSH, ferritin, C-peptide, and insulin tests were compared between the analyzers. Ferritin, C-peptide, and insulin serum levels differed significantly between devices (P<0.001), whereas TSH did not (P=0.29). Between linear and spline interpolation, there was no significant difference for the insulin test (P=0.08), the TSH test (P=0.81), or the ferritin test (P=0.06), but the C-peptide test showed a significant difference (P=0.03). The insulin test in particular showed a significant difference in the lower ranges: when the two interpolation methods were compared, the devices in the low-concentration group differed significantly (P<0.001). 
Conclusion: When new equipment is introduced in the laboratory and the test principle is the IRMA method, it is necessary to recognize that the curve fitting methods of automated radioimmunoassay analyzers differ in the low-concentration range.

An exploration of the relationship between crime/victim characteristics and the victim's criminal damages: Variable selection based on random forest algorithm (범죄 및 피해자 특성과 범죄피해 내용의 관계 탐색: 랜덤포레스트 알고리즘에 기초한 변인선택)

  • Han, Yuhwa;Lee, Wooyeol
    • Korean Journal of Forensic Psychology / v.13 no.2 / pp.121-145 / 2022
  • The current study applied the random forest algorithm to Korean crime victim survey data collected biennially between 2010 and 2018 to explore the relationship between crime/victim characteristics and the victim's criminal damages. A total of 3,080 cases were analyzed, covering gender, age (life cycle stage), type of crime, perpetrator acquisition, repeated victimization, psychological damage (depression, isolation, extreme fear, somatic symptoms, interpersonal problems, moving out to avoid people, suicidal impulses, suicide attempts), and emotional changes after victimization (changes in self-protection confidence, self-esteem, confidence in others, confidence in legal institutions, and respect for Korean legal system/law). Because the features of the data make traditional statistical techniques difficult to apply, this study implemented random forest algorithms to predict crime and victim characteristics from the victim's criminal damages (psychological damage and emotional change) and selected good predictors using the VSURF function in the VSURF package for R. The analysis confirmed relationships between the type of crime and depression, extreme fear, somatic symptoms, and interpersonal problems; between perpetrator acquisition and somatic symptoms and interpersonal problems; and between repeated victimization and changes in respect for the Korean legal system/law. Gender and life cycle stage (youth/adult/elderly) were found to be related to extreme fear and changes in self-protection confidence, respectively. However, more empirical evidence should be accumulated before the results can be interpreted as meaningful. The results suggest that it is necessary to enhance experts' knowledge of the relationship between crime/victim characteristics and criminal damage and to educate them with relevant cases. 
Strengthening their interview strategy and knowledge of laws/rules is also needed to increase the effectiveness of the Korean victim assessment system.

A study on the prediction of korean NPL market return (한국 NPL시장 수익률 예측에 관한 연구)

  • Lee, Hyeon Su;Jeong, Seung Hwan;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.123-139 / 2019
  • The Korean NPL market was formed by the government and foreign capital shortly after the 1997 IMF crisis. However, the market has a short history, as bad debt began to increase after the 2009 global financial crisis owing to the recession in the real economy. NPL has become a major investment in recent years, as domestic capital market investors began to enter the NPL market in earnest. Although the domestic NPL market has received considerable attention because of its recent overheating, research on it has been limited because the history of capital market investment in the domestic NPL market is short. In addition, declining profitability and price fluctuations driven by the real estate business cycle call for decision-making based on more scientific and systematic analysis. In this study, we propose a prediction model that determines whether the benchmark yield will be achieved, using NPL market data, in accordance with market demand. To build the model, we used Korean NPL data covering about 4 years, from December 2013 to December 2017, comprising 2,291 items in total. As independent variables, only those related to the dependent variable were selected from the 11 variables that describe the characteristics of the real estate. To select the variables, one-to-one t-tests, stepwise logistic regression, and a decision tree were used. Seven independent variables were selected: purchase year, SPC (Special Purpose Company), municipality, appraisal value, purchase cost, OPB (Outstanding Principal Balance), and HP (Holding Period). The dependent variable is a binary variable that indicates whether the benchmark rate of return is reached. 
A binary variable was chosen because models predicting binary variables are more accurate than models predicting continuous variables, and the accuracy of these models is directly related to their effectiveness. In addition, the main concern of a special purpose company is whether or not to purchase the property, so knowing whether a certain level of return will be achieved is enough to make a decision. To ascertain whether 12%, the standard rate of return used in the industry, is a meaningful reference value, we constructed and compared predictive models with the dependent variable calculated at different threshold values. As a result, the model built with the dependent variable calculated at the 12% standard rate of return had the best average hit ratio, 64.60%. To propose an optimal prediction model based on the determined dependent variable and the 7 independent variables, we built and compared prediction models using five methodologies: discriminant analysis, logistic regression analysis, decision tree, artificial neural network, and genetic algorithm linear model. For this purpose, 10 sets of training and testing data were extracted using the 10-fold validation method. After the models were built with these data, the hit ratio of each set was averaged and the performance was compared. The average hit ratios of the prediction models built with discriminant analysis, logistic regression, decision tree, artificial neural network, and genetic algorithm linear model were 64.40%, 65.12%, 63.54%, 67.40%, and 60.51%, respectively. The model using the artificial neural network was confirmed to be the best. This study shows that using the 7 independent variables and the artificial neural network prediction model is effective for the future NPL market. 
The proposed model predicts in advance whether new items will achieve the 12% return, which will help special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will become more liquid as transactions proceed at appropriate prices.
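
The evaluation protocol described above (a binary label for reaching the 12% benchmark, then a hit ratio averaged over 10 folds) can be sketched as follows. This is an illustration under assumptions, not the authors' code: the helper names, the shuffling seed, and the `fit_majority` placeholder classifier are hypothetical, and any of the five model types in the abstract would take the place of `fit`.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple

BENCHMARK = 0.12  # standard rate of return quoted in the abstract


def to_label(rate_of_return: float) -> int:
    """1 if the benchmark rate of return is reached, else 0."""
    return int(rate_of_return >= BENCHMARK)


def k_fold_splits(n: int, k: int = 10,
                  seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k disjoint folds (needs n >= k)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def hit_ratio(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of cases whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_hit_ratio(X: Sequence, y: Sequence[int],
                   fit: Callable[[list, list], Callable],
                   k: int = 10, seed: int = 0) -> float:
    """Average hit ratio over k folds; fit(X_train, y_train) returns a predictor."""
    ratios = []
    for train, test in k_fold_splits(len(y), k, seed):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        ratios.append(hit_ratio([y[i] for i in test],
                                [predict(X[i]) for i in test]))
    return sum(ratios) / len(ratios)


def fit_majority(X_train: list, y_train: list) -> Callable:
    """Placeholder model: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority
```

A call such as `mean_hit_ratio(X, [to_label(r) for r in returns], fit_majority)` gives the baseline that a real model should beat.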

Smart farm development strategy suitable for domestic situation -Focusing on ICT technical characteristics for the development of the industry6.0- (국내 실정에 적합한 스마트팜 개발 전략 -6차산업의 발전을 위한 ICT 기술적 특성을 중심으로-)

  • Han, Sang-Ho;Joo, Hyung-Kun
    • Journal of Digital Convergence / v.20 no.4 / pp.147-157 / 2022
  • This study proposes a smart farm technology strategy suited to the domestic situation, focusing on how ICT technology should be differentiated for domestic conditions. Advanced countries in the overseas agricultural industry were found to focus on developing a specific stage that reflects each country's geographical characteristics, the characteristics of its agricultural industry, and the characteristics of the people's demand, rather than pursuing development across all stages. Therefore, in response to problems such as the rapid decrease in the domestic rural population, population aging, loss of agricultural price competitiveness, an increase in fallow land, and a decrease in the utilization rate of arable land, this study aims to guide the future development of smart farm ICT technology toward producing quality agricultural products with price competitiveness. It suggests that smart farms should be promoted with attention to excellent performance, ease of use in view of the aging labor force, and economic feasibility suited to a small business scale. First, in terms of economic feasibility, it is suggested that configuring ICT technology with only the functions necessary for the small farm household (primary) business environment, and building in smooth communication with farm households so that the functions they actually require are updated gradually, may contribute to cost reduction. Second, in terms of performance, it is suggested that operation accuracy can be increased by improving the communication function of ICT, for example by adjusting the difficulty of big data to suit the aging population in Korea, using language suited to them, and setting algorithms that reflect their prediction tendencies. Third, in terms of ease of use, smart farms based on ICT technology for the development of Industry 6.0 (1.0 (Agriculture, Forestry) + 2.0 (Agricultural and Water & Water Processing) + 3.0 (Service, Rural Experience, SCM)) perform operations according to specific commands, so it is suggested that ease of use can be promoted by presetting and standardizing devices based on a big data configuration customized for each regional environment.

Preliminary Inspection Prediction Model to select the on-Site Inspected Foreign Food Facility using Multiple Correspondence Analysis (차원축소를 활용한 해외제조업체 대상 사전점검 예측 모형에 관한 연구)

  • Hae Jin Park;Jae Suk Choi;Sang Goo Cho
    • Journal of Intelligence and Information Systems / v.29 no.1 / pp.121-142 / 2023
  • As the number and weight of imported food are steadily increasing, safety management of imported food to prevent food safety accidents is becoming more important. The Ministry of Food and Drug Safety conducts on-site inspections of foreign food facilities before customs clearance as well as import inspections at the customs clearance stage. However, because time, cost, and resources are limited, a data-based safety management plan for imported food is needed. In this study, we sought to increase the efficiency of on-site inspection by building a machine learning prediction model that pre-selects the companies expected to fail before the on-site inspection. We collected basic information on 303,272 foreign food facilities and processing businesses from the Integrated Food Safety Information Network, together with 1,689 on-site inspection records from 2019 to April 2022. After preprocessing the foreign food facility data, only the records subject to on-site inspection were extracted using the foreign food facility_code, yielding a data set of 1,689 records and 103 variables. Of the 103 variables, those with a value of '0' on the Theil-U index were removed, and after dimensionality reduction with Multiple Correspondence Analysis, 49 characteristic variables were finally derived. We built eight different models, performed hyperparameter tuning through 5-fold cross validation, and evaluated the performance of the resulting models. In selecting companies for on-site inspection, the goal is to maximize recall, the probability of judging nonconforming companies as nonconforming. Among the machine learning algorithms applied, the Random Forest model, which had the highest Recall_macro, AUROC, Average PR, F1-score, and Balanced Accuracy, was evaluated as the best model. 
Finally, we apply Kernel SHAP (SHapley Additive exPlanations) to present the reasons why individual facilities are selected as nonconforming, and discuss its applicability to the on-site inspection facility selection system. The results of this study are expected to contribute to the efficient use of limited resources such as manpower and budget by establishing an imported food management system built on a data-based scientific risk management model.
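
Because the abstract ranks models by recall, a small worked example may help. The sketch below computes per-class recall and its unweighted mean, which is the usual meaning of a macro-averaged recall; the function names and the toy labels (1 = nonconforming, 0 = conforming) are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, Hashable, Sequence


def recall_per_class(y_true: Sequence[Hashable],
                     y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """Recall for each class present in y_true: true positives / actual cases."""
    recalls = {}
    for cls in sorted(set(y_true)):
        actual = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls[cls] = hits / actual
    return recalls


def recall_macro(y_true: Sequence[Hashable],
                 y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class recalls."""
    per_class = recall_per_class(y_true, y_pred)
    return sum(per_class.values()) / len(per_class)


# Toy labels: three nonconforming facilities (1) and one conforming (0).
y_true = [1, 1, 1, 0]
y_pred = [1, 0, 1, 0]
print(recall_per_class(y_true, y_pred), recall_macro(y_true, y_pred))
```

Here two of the three nonconforming facilities are caught, so recall for class 1 is 2/3, recall for class 0 is 1, and the macro average is 5/6.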

Research Trends of Health Recommender Systems (HRS): Applying Citation Network Analysis and GraphSAGE (건강추천시스템(HRS) 연구 동향: 인용네트워크 분석과 GraphSAGE를 활용하여)

  • Haryeom Jang;Jeesoo You;Sung-Byung Yang
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.57-84 / 2023
  • With the development of information and communications technology (ICT) and big data technology, anyone can easily obtain and utilize vast amounts of data through the Internet. Therefore, the capability of selecting high-quality data from a large amount of information is becoming more important than the capability of just collecting them. This trend continues in academia; literature reviews, such as systematic and non-systematic reviews, have been conducted in various research fields to construct a healthy knowledge structure by selecting high-quality research from accumulated research materials. Meanwhile, after the COVID-19 pandemic, remote healthcare services, which have not been agreed upon, are allowed to a limited extent, and new healthcare services such as health recommender systems (HRS) equipped with artificial intelligence (AI) and big data technologies are in the spotlight. Although, in practice, HRS are considered one of the most important technologies to lead the future healthcare industry, literature review on HRS is relatively rare compared to other fields. In addition, although HRS are fields of convergence with a strong interdisciplinary nature, prior literature review studies have mainly applied either systematic or non-systematic review methods; hence, there are limitations in analyzing interactions or dynamic relationships with other research fields. Therefore, in this study, the overall network structure of HRS and surrounding research fields were identified using citation network analysis (CNA). Additionally, in this process, in order to address the problem that the latest papers are underestimated in their citation relationships, the GraphSAGE algorithm was applied. As a result, this study identified 'recommender system', 'wireless & IoT', 'computer vision', and 'text mining' as increasingly important research fields related to HRS research, and confirmed that 'personalization' and 'privacy' are emerging issues in HRS research. 
The study findings would provide both academic and practical insights into identifying the structure of the HRS research community, examining related research trends, and designing future HRS research directions.

A Technique to Recommend Appropriate Developers for Reported Bugs Based on Term Similarity and Bug Resolution History (개발자 별 버그 해결 유형을 고려한 자동적 개발자 추천 접근법)

  • Park, Seong Hun;Kim, Jung Il;Lee, Eun Joo
    • KIPS Transactions on Software and Data Engineering / v.3 no.12 / pp.511-522 / 2014
  • During software development, a variety of bugs are reported. Several bug tracking systems, such as Bugzilla, MantisBT, Trac, and JIRA, are used to handle reported bug information in many open source development projects. Bug reports in a bug tracking system are triaged to manage bugs and to determine which developer is responsible for resolving each report. As software grows in size and bug reports tend to be duplicated, bug triage becomes increasingly complex and difficult. In this paper, we present an approach for assigning bug reports to appropriate developers, which is a main part of the bug triage task. First, words that appear in resolved bug reports are classified by developer. Second, words in newly reported bugs are selected. After these two steps, vectors whose items are the selected words are generated. Third, the TF-IDF (term frequency-inverse document frequency) of each selected word is computed and used as the weight of the corresponding vector item. Finally, developers are recommended based on the similarity between each developer's word vector and the vector of the new bug report. We conducted an experiment on the Eclipse JDT and CDT projects to show the applicability of the proposed approach, and compared it with an existing study based on machine learning. The experimental results show that the proposed approach is superior to the existing method.
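
The steps in this abstract (per-developer word lists, TF-IDF weights, similarity to the new report) can be sketched roughly as below. This is an assumption-laden illustration rather than the paper's implementation: cosine similarity and the smoothed IDF formula are choices made here, since the abstract names neither, and the developer names and words are invented.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def recommend_developers(dev_words: Dict[str, List[str]],
                         report_words: List[str],
                         top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank developers by cosine similarity of TF-IDF word vectors."""
    n_devs = len(dev_words)
    # Document frequency: in how many developers' histories a word appears.
    df = Counter(w for words in dev_words.values() for w in set(words))

    def idf(word: str) -> float:
        # Smoothed IDF (an assumption) so unseen or ubiquitous words stay finite.
        return math.log((1 + n_devs) / (1 + df[word])) + 1.0

    def vectorize(words: List[str]) -> Dict[str, float]:
        return {w: count * idf(w) for w, count in Counter(words).items()}

    def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
        dot = sum(weight * b.get(w, 0.0) for w, weight in a.items())
        norm_a = math.sqrt(sum(v * v for v in a.values()))
        norm_b = math.sqrt(sum(v * v for v in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    report_vec = vectorize(report_words)
    scores = {dev: cosine(vectorize(words), report_vec)
              for dev, words in dev_words.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


# Hypothetical resolution histories, already tokenized.
history = {
    "alice": ["parser", "ast", "compiler", "parser"],
    "bob": ["ui", "widget", "layout"],
}
print(recommend_developers(history, ["parser", "crash", "ast"]))
```

In this toy case the new report shares "parser" and "ast" with one history and nothing with the other, so the first developer ranks highest.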

An Object Detection and Tracking System using Fuzzy C-means and CONDENSATION (Fuzzy C-means와 CONDENSATION을 이용한 객체 검출 및 추적 시스템)

  • Kim, Jong-Ho;Kim, Sang-Kyoon;Hang, Goo-Seun;Ahn, Sang-Ho;Kang, Byoung-Doo
    • Journal of Korea Society of Industrial Information Systems / v.16 no.4 / pp.87-98 / 2011
  • Detecting a moving object in video and tracking it are basic and necessary preprocessing steps in many video systems, such as object recognition, context awareness, and intelligent visual surveillance. In this paper, we propose a method that detects a moving object quickly and accurately while the background and lighting change in real time. Furthermore, our system detects an object robustly even when the target object is covered by other objects. For effective detection, an Eigen-space and FCM are combined, and the CONDENSATION algorithm is used to track a detected object robustly. First, training data collected from a background image are linearly transformed using Principal Component Analysis (PCA). Second, an Eigen-background is constructed from selected principal components that discriminate well between object and background. Next, an object is detected with FCM, which uses the result of convolving the Eigen-vectors from the previous steps with the input image. Finally, the object is tracked by using the coordinates of the detected object as input to the CONDENSATION algorithm. Images that include various objects moving at the same time are collected and used as training data, so that the system can adapt to changes of lighting and background with a fixed camera. Test results show that the proposed method detects an object robustly under changes of lighting and background and under partial movement of the object.

On the Design of Multi-layered Polygonal Helix Antennas (다각 다단 구조 헬릭스 안테나 설계)

  • Choo Jae-Yul;Choo Ho-Sung;Park Ik-Mo;Oh Yi-Sok
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.17 no.3 s.106 / pp.249-258 / 2006
  • In this letter, we propose a novel printed helix antenna for RFID readers in the UHF band. The printed strip line of the antenna is first wound around the outside of a polygonal layer, and the winding then continues on an inner layer to control the overall gain and the radiation pattern. In addition, the winding pitch angles on each layer take either negative or positive values, resulting in a broad CP bandwidth. The detailed structure of the antenna was optimized using a Pareto genetic algorithm (GA) to obtain excellent performance as an RFID reader antenna. The optimized two-layered polygonal helix was fabricated on a flexible cardboard substrate, and its performance was measured and compared with simulations. The fabricated antenna, made of copper tape that adheres to the flexible cardboard, had a 21.4 % matching bandwidth, a 31.9 % CP bandwidth, and a readable range of $5.5m^2$ with kr=3.2. Also, based on the current distribution along the strip line and the sensitivity to the antenna's bent points, we confirmed that the antenna has a quarter-wave transformer near the feed, which gives the broad matching bandwidth, and radiates a traveling wave along the bent strip line, which gives the broad CP bandwidth.