
Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • For a long time, many academic studies have been conducted on predicting the success of customer campaigns, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded with the rapid growth of online media, companies carry out campaigns of a variety and volume that cannot be compared to the past. However, customers increasingly perceive campaigns as spam as fatigue from duplicate exposure grows, and from a corporate standpoint the effectiveness of the campaigns themselves is decreasing: investment costs rise while the actual success rate stays low. Accordingly, various studies are ongoing to improve campaign effectiveness in practice. The ultimate purpose of a campaign system is to increase the success rate of campaigns by collecting and analyzing customer-related data and using it for targeting, and recent work has attempted to predict campaign response with machine learning. Because campaign data carries many features, selecting appropriate ones is very important. If all input data are used to classify a large data set, learning time grows as the number of classification classes expands, so a minimal input data set must be extracted from the whole. Moreover, when a model is trained on too many features, prediction accuracy may degrade due to overfitting or correlation between features. To improve accuracy, a feature selection technique that removes features close to noise should therefore be applied; feature selection is a necessary step in analyzing a high-dimensional data set. Among greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), and SFFS (Sequential Floating Forward Selection) are widely used as traditional feature selection techniques, but when the number of features is large they suffer from poor classification performance and long learning times. In this study, we therefore propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The goal is to improve the existing SFFS sequential method by exploiting statistical characteristics of the data processed in the campaign system while searching for the feature subsets that underlie model performance: features with strong influence on performance are derived first, features with a negative effect are removed, and the sequential method is then applied, increasing search efficiency and enabling generalized prediction. The proposed model showed better search and prediction performance than the traditional greedy algorithm, and campaign success prediction was higher than with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE). In addition, the improved feature selection algorithm helped analyze and interpret the prediction results by providing the importance of the derived features. Among these were features such as age, customer rating, and sales, which were already known statistically to be important. Unexpectedly, features that campaign planners had rarely used to select targets, such as the combined product name, the average three-month data consumption rate, and wireless data usage over the last three months, were also selected as important for campaign response, confirming that basic attributes can be very important depending on the campaign type. This makes it possible to analyze and understand the important characteristics of each campaign type.
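
The SFFS search that the proposed algorithm improves on can be sketched as follows. This is a minimal illustration of the baseline floating-forward procedure only, not the authors' self-optimizing variant; the classifier, the cross-validated scoring, and the synthetic data are all placeholder assumptions.

```python
# Minimal SFFS sketch (the baseline the paper improves, not the proposed
# variant): greedily add the best feature, then float out any feature
# whose removal improves the cross-validated score.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def sffs(X, y, k, estimator):
    selected = []

    def score(feats):
        return cross_val_score(estimator, X[:, feats], y, cv=5).mean()

    while len(selected) < k:
        # Forward step: add the candidate that improves the score most.
        candidates = [f for f in range(X.shape[1]) if f not in selected]
        best = max(candidates, key=lambda f: score(selected + [f]))
        selected.append(best)
        # Floating step: drop any earlier feature (never the one just
        # added) whose removal improves the score.
        improved = True
        while improved and len(selected) > 2:
            improved = False
            for f in [g for g in selected if g != best]:
                rest = [g for g in selected if g != f]
                if score(rest) > score(selected):
                    selected, improved = rest, True
                    break
    return selected

X, y = make_classification(n_samples=300, n_features=15, n_informative=5,
                           random_state=0)
print(sffs(X, y, k=5, estimator=LogisticRegression(max_iter=1000)))
```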

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. The statistical techniques traditionally used in bond rating include multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis. One major drawback, however, is that they rest on strict assumptions: linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited their application to the real world. Machine learning techniques used in bond rating prediction include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising classification and regression method. SVM learns a separating hyperplane that maximizes the margin between two categories; it is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and searches for a minimum of an upper bound on the generalization error. In addition, the SVM solution may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many training samples, since it builds prediction models using only the representative samples near the boundaries, called support vectors. A number of experimental studies have shown that SVM has been successfully applied in a variety of pattern recognition fields. However, three major drawbacks can degrade SVM's performance. First, SVM was originally proposed for binary classification; methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not perform as well in multi-class problems as SVM does in binary classification. Second, approximation algorithms (e.g., decomposition methods or the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class settings, but they can deteriorate classification performance. Third, multi-class prediction suffers from the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers that in another; such data sets often produce a default classifier with a skewed boundary and hence reduced classification accuracy. SVM ensemble learning is one way to cope with these drawbacks. Ensemble learning improves the performance of classification and prediction algorithms, and AdaBoost is one of its most widely used techniques. AdaBoost constructs a composite classifier by sequentially training classifiers while increasing the weights of misclassified observations across iterations, so observations incorrectly predicted by previous classifiers are chosen more often than correctly predicted ones. Boosting thus attempts to produce new classifiers that better predict the examples on which the current ensemble performs poorly, and in this way it can reinforce the training of misclassified minority-class observations. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem. Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can account for geometric mean-based accuracy and errors across classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation was performed three times with different random seeds to ensure that the comparison among the three classifiers did not happen by chance: for each ten-fold cross-validation, the entire data set is partitioned into ten equal-sized sets, and each set is in turn used as the test set while the classifier trains on the other nine, so the cross-validated folds are tested independently for each algorithm. These steps yielded results for each classifier on each of the 30 experiments. In arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) outperformed both AdaBoost (51.69%) and SVM (49.47%); in geometric mean-based prediction accuracy, MGM-Boost (28.12%) likewise outperformed AdaBoost (24.65%) and SVM (15.42%). A t-test was used to examine whether the performance of the classifiers over the 30 folds differed significantly; the results indicate that MGM-Boost differs significantly from the AdaBoost and SVM classifiers at the 1% level. These results show that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
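
The core idea of introducing a geometric mean into AdaBoost can be sketched as below. The exact MGM-Boost update is defined in the paper; this sketch assumes a SAMME-style multiclass weight update whose error signal is one minus the geometric mean of the per-class recalls, so mistakes on minority classes dominate the reweighting.

```python
# Sketch of the MGM-Boost idea (assumed form; the exact update is defined
# in the paper): an AdaBoost/SAMME-style loop whose error signal is one
# minus the geometric mean of per-class recalls.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def gmean_boost(X, y, n_rounds=10):
    classes = np.unique(y)
    w = np.full(len(y), 1.0 / len(y))             # observation weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        tree = DecisionTreeClassifier(max_depth=2).fit(X, y, sample_weight=w)
        pred = tree.predict(X)
        # Weighted per-class recalls, floored so the product stays positive.
        recalls = [max(np.average(pred[y == c] == c, weights=w[y == c]), 1e-3)
                   for c in classes]
        err = 1.0 - float(np.prod(recalls)) ** (1.0 / len(classes))
        alpha = (np.log((1.0 - err) / max(err, 1e-12))
                 + np.log(len(classes) - 1.0))    # SAMME multiclass term
        w *= np.exp(alpha * (pred != y))          # up-weight misclassified rows
        w /= w.sum()
        learners.append(tree)
        alphas.append(alpha)
    return learners, alphas

X, y = make_classification(n_samples=400, n_classes=3, n_informative=6,
                           random_state=0)
learners, alphas = gmean_boost(X, y)
print(f"{len(learners)} rounds, alphas: {np.round(alphas, 2)}")
```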

A Biomechanical Analysis of Four Different Taekwondo Body Punch Types in Horseback-Riding Stance (태권도 주춤 서 몸통지르기 유형별 생체역학적 변인 비교 분석)

  • Kang, Sung-Chul;Kim, Eui-Hwan;Shin, Hyun-Moo;Kim, Sung-Sup;Kim, Tae-Whan
    • Korean Journal of Applied Biomechanics / v.17 no.4 / pp.201-208 / 2007
  • The purpose of this study is to compare four body punch types (type 1: a punch using the shoulder; type 2: a punch using the waist; type 3: a punch using the lower extremities; type 4: a punch with the elbows by the side at chest level) in horseback-riding stance, and to establish a suitable teaching theory and method as a useful reference for Taekwondo instructors in the field (in Taekwondo dojangs throughout Korea). Five exhibition players from the Korean national Taekwondo exhibition team participated in this study. Each participant performed the four types of punches while kinematic and kinetic data were recorded with seven Vicon cameras (125 Hz) and two force plates (AMTI, 1200 Hz). We analyzed displacement, time, resultant center-of-mass trajectory, velocity, trunk angular velocity, and ground reaction force (GRF) for each body segment during the punch. One-way repeated-measures ANOVA was performed on the standardized average values of each player, with statistical significance set at p < .05. The results were as follows. First, participants tended to execute the body punch with the largest motion at the shoulder, followed in descending order by the waist and the knee. Second, the mean time for each punch, in ascending order, was 0.46 s for type 2, 0.49 s for type 3, 0.50 s for type 4, and 0.56 s for type 1. Third, the mean resultant center-of-mass trajectory was longest at 4.07 cm for type 3 and shortest at 2.458 cm for type 1. Fourth, the mean maximal fist velocity was, in descending order, 5.99 m/s for type 3, 5.93 m/s for type 4, 5.67 m/s for type 2, and 5.01 m/s for type 1. Fifth, the mean maximal trunk angular velocity was highest at 495.6 deg/s for type 4 and lowest at 337.7 deg/s for type 1. Sixth, the strongest anterior-posterior GRF was found for types 3 and 2 (left -54.89 N, right 60.58 N), the strongest medial-lateral GRF for type 4 (left 83.59 N, right -80.12 N), and the strongest vertical GRF for type 3 (left 341.79 N, right 426.11 N).
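
The one-way repeated-measures ANOVA used to compare punch types within the same athletes can be reproduced along these lines; the data frame below is a made-up stand-in for the five athletes' per-type averages, and statsmodels' AnovaRM is an assumed substitute for whatever software the authors used.

```python
# One-way repeated-measures ANOVA sketch, as used to compare punch types
# within the same athletes. The numbers here are illustrative placeholders,
# not the study's measurements.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

data = pd.DataFrame({
    "athlete": [a for a in range(1, 6) for _ in range(4)],
    "punch_type": ["type1", "type2", "type3", "type4"] * 5,
    "fist_velocity": [5.0, 5.6, 6.0, 5.9, 5.1, 5.7, 6.1, 5.8,
                      4.9, 5.6, 5.9, 6.0, 5.0, 5.8, 6.0, 5.9,
                      5.1, 5.7, 6.0, 5.9],
})

# Within-subject factor: punch type; dependent variable: max fist velocity.
result = AnovaRM(data, depvar="fist_velocity", subject="athlete",
                 within=["punch_type"]).fit()
print(result)  # F-test across the four punch types at alpha = .05
```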

Analysis of Distributed Computational Loads in Large-scale AC/DC Power System using Real-Time EMT Simulation (대규모 AC/DC 전력 시스템 실시간 EMP 시뮬레이션의 부하 분산 연구)

  • In Kwon, Park;Yi, Zhong Hu;Yi, Zhang;Hyun Keun, Ku;Yong Han, Kwon
    • KEPCO Journal on Electric Power and Energy / v.8 no.2 / pp.159-179 / 2022
  • Often a network becomes complex, and multiple entities take charge of managing parts of the whole. An example is a utility grid: while the entire grid is the responsibility of a single utility company, the network is often split into multiple subsections, and each subsection is assigned as the responsibility area of a corresponding sub-organization within the company. The question of how to form subsystems of adequate size with a minimum number of interconnections between them becomes especially critical in real-time simulation, because a single computation unit, whether a high-speed conventional CPU core or an FPGA computational engine, has a maximum amount of computation it can complete within a given execution time. The issue is aggravated in real-time simulation, in which the computation must remain in precise synchronization with the real-world clock. When the subject of the computation allows a longer execution time, i.e., a larger time-step size, a larger portion of the network can be placed on one computation unit. This translates into a larger allowable margin between the worst and the best cases: even if the worst (largest) computational burden is orders of magnitude larger than the best (smallest), all the necessary computation can still be completed within the given time. The real-time requirement, however, makes this margin much smaller, so the difference between the worst and the best burdens should be as small as possible to ensure an even distribution of the computational load. In addition, data exchange and communication are essential in parallel computation and affect overall performance, and since exchanging data takes time, it must be considered together with the distribution of the computational load among the calculation units. A satisfactory distribution raises the possibility of completing the necessary computation within the given amount of time, which may come down to the order of microseconds. This paper presents an effective way to split a given electrical network, according to multiple criteria, so as to distribute the entire computational load into a set of even (or close to even) sized loads. Based on the proposed system-splitting method, the heavy computational burden of a large-scale electrical network can be distributed to multiple calculation units, such as an RTDS real-time simulator, achieving more efficient usage of the calculation units, a reduction in the necessary simulation time-step size, or both.
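
The underlying scheduling problem, spreading per-subsystem computational burdens across calculation units so the heaviest unit stays within the time step, can be sketched with a greedy longest-processing-time heuristic. The weights and unit count below are invented, and the paper's actual splitting method uses multiple criteria (including interconnection counts) that this sketch ignores.

```python
# Greedy LPT sketch of the load-distribution problem: assign subsystem
# computational weights to calculation units so the heaviest unit is as
# light as possible. Weights are illustrative, not from the paper.
import heapq

def distribute(weights, n_units):
    # Min-heap of (current load, unit index, assigned items); always give
    # the next-largest subsystem to the currently lightest unit.
    units = [(0.0, i, []) for i in range(n_units)]
    heapq.heapify(units)
    for w in sorted(weights, reverse=True):
        load, i, items = heapq.heappop(units)
        items.append(w)
        heapq.heappush(units, (load + w, i, items))
    return sorted(units, key=lambda u: u[1])

# Hypothetical per-subsystem burdens (e.g., microseconds per time step).
subsystem_cost = [42.0, 35.5, 18.2, 17.9, 9.3, 8.8, 7.1, 3.4]
for load, unit, items in distribute(subsystem_cost, n_units=3):
    print(f"unit {unit}: load {load:.1f} <- {items}")
```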

Application of Chemiluminescence Enzyme Immunoassay Method to Collect in vivo Matured Oocyte in Dog Cloning (개 복제 시 체내 성숙 난자 회수를 위한 화학발광효소면역분석기법의 적용)

  • Kim, Min-Jung;Oh, Hyun-Ju;Kim, Geon-A;Jo, Young-Kwang;Choi, Jin;Lee, Byeong-Chun
    • Journal of Veterinary Clinics / v.31 no.4 / pp.267-271 / 2014
  • Accurate determination of in vivo oocyte maturation is particularly critical for dog cloning compared with other assisted reproductive technologies, because oocytes at the metaphase II stage must be recovered so that somatic cell nuclear transfer can be performed immediately after recovery. The aim of the present study was to evaluate the reliability of a chemiluminescence enzyme immunoassay (CLEIA) compared with the radioimmunoassay (RIA) method, and to set a reference range for retrieving in vivo matured oocytes. Serum progesterone concentration during proestrus and estrus was analyzed by RIA and CLEIA to determine the ovulation day (Day 0). On Day 3, in vivo oocytes were recovered surgically and their maturation status was evaluated microscopically after staining nuclei with a bisbenzimidazole dye. The mean progesterone concentration by CLEIA (7.64 ± 0.06 ng/ml) was significantly higher than by RIA (6.46 ± 0.04 ng/ml, P < 0.0001). On Day 0, CLEIA (10.01 ± 0.34 ng/ml) and RIA values (7.91 ± 0.14 ng/ml) did not differ significantly, but CLEIA levels on Day -1 and Day 1 (6.41 ± 0.15 and 14.25 ± 0.44 ng/ml) were significantly higher (P < 0.05) than the corresponding RIA levels (4.95 ± 0.10 and 11.29 ± 0.34 ng/ml). With both methods, however, the progesterone level increased significantly from Day -1 to Day 2. To determine oocyte maturation with the CLEIA method, a wider and higher reference range has to be considered.
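
The method comparison behind the abstract amounts to a paired test between CLEIA and RIA readings plus a reference-range calculation. The sketch below uses simulated progesterone values and a simple mean ± 2 SD range, which are stand-ins rather than the study's data or its actual reference-range procedure.

```python
# Sketch of the method comparison: a paired t-test between CLEIA and RIA
# progesterone readings on the same samples, plus a mean +/- 2*SD reference
# range. Values are simulated, not the study's data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
ria = rng.normal(6.5, 0.8, size=30)          # hypothetical RIA readings, ng/ml
cleia = ria * 1.18 + rng.normal(0, 0.3, 30)  # CLEIA assumed to read higher

t, p = stats.ttest_rel(cleia, ria)           # paired comparison per sample
print(f"paired t = {t:.2f}, p = {p:.4f}")

lo, hi = cleia.mean() - 2 * cleia.std(), cleia.mean() + 2 * cleia.std()
print(f"CLEIA reference range on ovulation day: {lo:.2f}-{hi:.2f} ng/ml")
```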

Establishment of the Appropriate Risk Standard through the Risk Assessment of Accident Scenario (사고시나리오별 위험도 산정을 통한 적정 위험도 기준 설정)

  • Kim, Kun-Ho;Chun, Young-Woo;Hwang, Yong-Woo;Lee, Ik-Mo;Kwak, In-ho
    • Journal of Korean Society of Environmental Engineers / v.39 no.2 / pp.74-81 / 2017
  • An off-site consequence analysis is used to calculate the risks posed when hazardous chemicals in use on-site are released off-site; the biggest factor affecting the result is the risk of the individual accident scenarios. This study calculates scenario risks by applying the OGP and LOPA risk calculation methods to similar facilities, computes the risk reduction ratio by examining the independent protection layers (IPLs) applicable to each incident, and proposes an appropriate risk standard for the different calculation methods. When all applicable IPLs are credited in estimating the safety improvement of the accident scenarios, the risk by OGP is 8.05E-04 and the risk by LOPA is 1.00E-04; in the case based on the IPLs actually applied, the risk is 1.34E-02. The risk level calculated for the accident scenarios using LOPA was on the order of 10^-2, whereas the appropriate risk criteria for accident scenarios in similar foreign studies were 10^-3 to 10^-4, so the scenario risk can be judged to be at an unacceptable level. When OGP is applied, the risk is analyzed as acceptable, but when LOPA is applied, all applicable IPLs must be credited in order to satisfy the acceptable risk level; compared with OGP, the risk is high when LOPA is applied. Therefore, the acceptable risk level should be set differently for each risk calculation method.
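
The LOPA arithmetic referred to here is a product of an initiating-event frequency and the probabilities of failure on demand (PFD) of the credited IPLs, compared against a tolerable frequency. The numbers in the sketch below are illustrative, not the study's scenario values.

```python
# LOPA-style arithmetic sketch: the mitigated scenario frequency is the
# initiating-event frequency times the product of each credited IPL's
# probability of failure on demand (PFD). Numbers are illustrative.
initiating_frequency = 1.0e-1        # events per year (hypothetical)
ipl_pfds = [1.0e-1, 1.0e-1, 1.0e-2]  # e.g., BPCS loop, relief valve, dike

mitigated = initiating_frequency
for pfd in ipl_pfds:
    mitigated *= pfd

tolerable = 1.0e-4                   # per-scenario target frequency, per year
print(f"mitigated frequency: {mitigated:.2e}/yr "
      f"({'meets' if mitigated <= tolerable else 'exceeds'} "
      f"the {tolerable:.0e}/yr target)")
```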

A Comparison Study on the Method of Pollution Evaluation of Water Quality in the Stream (하천 수질의 오염도평가 방법의 비교 연구)

  • Lee, Ho-Beom;Lee, Jung-Ki;Shin, Dae-Yewn
    • Journal of Environmental Health Sciences / v.31 no.5 s.86 / pp.398-403 / 2005
  • This study was undertaken to find the optimal method for judging the degree of water pollution, by comparing K-WQI and KOE-WQI, indices built on the water quality index, with the water quality environment standard of the Framework Act on Environmental Policy, based on water quality surveys at the major points of the Yeongsan river from 2002 to 2004. The water quality of the major rivers differs somewhat by season; however, under the water quality standard based on BOD5 concentration, most of the rivers displayed water quality of grade II to III. On K-WQI, which indexes ten categories (pH, DO, BOD5, COD, SS, T-N, NH3-N, NO3-N, T-P, and E. coli) and classifies the result into five groups from 100 points down to 40 points, the rivers scored from first-grade water quality (85-100 points) to second grade (70-84 points). On KOE-WQI, which indexes five categories (pH, DO, BOD5, COD, and total coliform) and classifies the result into five groups, from 90 points or above for outstanding down to 29 points or below for very bad, the water quality ranged from first grade (90 points or more) to third grade (50-69 points). Regarding the contributions to water quality decline, the environmental standard depends heavily on the BOD5 concentration; K-WQI attributes the decline to a variety of factors depending on the environment around the river (BOD5, T-N, NH3-N, NO3-N, T-P, and E. coli); and KOE-WQI attributes it to BOD5, COD, and total coliform. In short, the current water quality environment standard is highly dependent on BOD5; KOE-WQI sets the water quality grade from only some categories, excluding the nitrogen and phosphorus that matter for the river environment; and K-WQI, with its diverse assessment factors and low dependence on any single factor, reflects the ecological environment of the rivers well and is the most objective.
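
A water quality index of this kind is typically built by converting each measured parameter to a 0-100 sub-index and combining the sub-indices with weights. The breakpoints and weights below are invented placeholders; the actual K-WQI and KOE-WQI tables are those defined in the paper's sources.

```python
# Generic water-quality-index sketch: each parameter is converted to a
# 0-100 sub-index and the sub-indices are combined by a weighted average.
# The breakpoints and weights below are invented placeholders.
import numpy as np

def sub_index(value, breakpoints):
    # Linear interpolation from measured value to a 0-100 sub-index score.
    xs, ys = zip(*breakpoints)
    return float(np.interp(value, xs, ys))

# (value -> score) breakpoints, hypothetical: lower BOD/T-N is better.
tables = {
    "BOD5": [(1, 100), (3, 80), (6, 60), (10, 40)],   # mg/L
    "T-N":  [(1, 100), (3, 70), (6, 40)],             # mg/L
    "DO":   [(2, 20), (5, 60), (8, 100)],             # mg/L, higher is better
}
weights = {"BOD5": 0.4, "T-N": 0.3, "DO": 0.3}

sample = {"BOD5": 2.4, "T-N": 2.1, "DO": 7.2}
wqi = sum(weights[p] * sub_index(v, tables[p]) for p, v in sample.items())
print(f"WQI = {wqi:.1f} / 100")
```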

Analysis of Allowable Stresses of Machine Graded Lumber in Korea (국내 기계등급구조재의 허용응력 분석)

  • Hong, Jung-Pyo;Oh, Jung-Kwon;Park, Joo-Saeng;Han, Yeon Jung;Pang, Sung-Jun;Kim, Chul-Ki;Lee, Jun-Jae
    • Journal of the Korean Wood Science and Technology / v.43 no.4 / pp.456-462 / 2015
  • A total of 365 pieces of domestic 38 × 140 × 3600 mm red pine structural lumber were machine graded in conformance with the softwood structural lumber standard (KS F 3020). The allowable bending stresses calculated for each grade were compared with the values currently tabulated in the standard. Four methods were used to calculate the lower 5th percentile bending stress: non-parametric estimation at the 75% confidence level, 2-parameter and 3-parameter Weibull distribution fits, and a regression-based method relating the modulus of rupture (MOR) to the modulus of elasticity (MOE). Only the data sets for grades E8, E9, and E10 were statistically eligible for the 5th percentile calculation, and only the MOR-MOE regression based method could estimate the lower 5th percentile values theoretically over the full range of grades. The results showed that all calculated allowable bending stresses were lower than the design values tabulated in the standard, which implies that the current machine grading system has a pitfall with respect to structural safety. The system could be improved by introducing a combined bending strength and stiffness grade system.
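
Two of the lower-5th-percentile estimators named in the abstract can be sketched as follows, using a simulated MOR sample. The nonparametric estimate here is a plain point estimate; the 75% confidence order-statistic version prescribed by the grading standards is omitted for brevity.

```python
# Sketch of two lower-5th-percentile estimators: the nonparametric order
# statistic and a 2-parameter Weibull fit. The MOR values are simulated
# stand-ins for the bending test data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
mor = rng.weibull(4.0, 200) * 50.0        # hypothetical MOR sample, MPa

# Nonparametric point estimate of the 5th percentile.
p5_nonparam = np.percentile(mor, 5)

# 2-parameter Weibull fit (location pinned at zero), then its 5% quantile.
shape, loc, scale = stats.weibull_min.fit(mor, floc=0)
p5_weibull = stats.weibull_min.ppf(0.05, shape, loc=loc, scale=scale)

print(f"nonparametric 5th percentile: {p5_nonparam:.1f} MPa")
print(f"Weibull-fit 5th percentile:  {p5_weibull:.1f} MPa")
```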

Health Risk Assessment of Disinfection By-products by Chlorination in Tap Water Ingestion (수도수중 염소 소독부산물로 인한 건강위해성 평가에 관한 연구 - 서울시 수도수중 Trihalomethanes 및 Haloaceticnitriles을 중심으로 -)

  • Chung, Yong;Shin, Dong-Chun;Yang, Ji-Yeon;Park, Yeon-Shin;Kim, Jun-Sung
    • Environmental Analysis Health and Toxicology / v.12 no.3_4 / pp.31-41 / 1997
  • Public concern about hazardous health effects from exposure to organic by-products of chlorination has increased. Numerous studies report that chlorination of drinking water produces many chlorinated organic by-products, including THMs, HAAs, and HANs, some of which are known animal carcinogens. The purpose of this study was to estimate the health risk of DBPs from chlorinated drinking water ingestion in Seoul, based on methodologies developed for risk assessment of complex chemical mixtures. Drinking water samples were collected separately at six water treatment plants in Seoul in March and April 1996. In tap water of households in Seoul, DBPs were measured with a mean value of 36.6 µg/L. The risk assessment processes, which include estimating human cancer potency from animal bioassay data and calculating human exposure, entail uncertainties; in the exposure assessment, exposure scenarios with different assumptions can affect the estimated exposure amount and excess cancer risk. The reference dose of haloacetonitriles was estimated to be 0.0023 mg/kg/day by applying the dibromoacetonitrile NOAEL and an uncertainty factor to the mean concentration. In the first case, human excess cancer risk was estimated by the US EPA method used to set the MCL (maximum contaminant level). In the second and third cases, the risk was estimated for multi-route exposure, with and without Monte Carlo simulation, respectively: in the second case, the exposure input parameters and cancer potencies used probability distributions, and in the third case they used point estimates (the mean, and the maximum or 95% upper-bound value). As a result, the excess cancer risk estimated by the US EPA method, considering only direct ingestion, tended to be underestimated, while the risk estimated for multi-route exposure without Monte Carlo simulation, using the maximum or 95% upper-bound values as input parameters, tended to be overestimated. For the risk assessment of trihalomethanes, considering multi-route exposure with Monte Carlo analysis appears to provide the most reasonable estimates.
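
The Monte Carlo estimate the study favors can be sketched as below for the direct-ingestion route: sample the exposure inputs from distributions, form the lifetime average daily dose, and multiply by a cancer slope factor. All distribution parameters and the slope factor are illustrative placeholders, and the study's additional inhalation and dermal routes are omitted.

```python
# Monte Carlo sketch of a probabilistic exposure estimate: sample exposure
# inputs from distributions, form the daily dose, and multiply by a cancer
# slope factor. All parameters are illustrative, not the study's values.
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

conc = rng.lognormal(np.log(36.6e-3), 0.4, n)      # THM conc., mg/L (assumed)
intake = rng.normal(2.0, 0.5, n).clip(0.5)         # drinking water, L/day
body_weight = rng.normal(60.0, 10.0, n).clip(30)   # kg
slope_factor = 6.1e-3                              # (mg/kg/day)^-1, hypothetical

# Lifetime average daily dose from direct ingestion only.
ladd = conc * intake / body_weight                 # mg/kg/day
risk = slope_factor * ladd

print(f"mean excess cancer risk: {risk.mean():.2e}")
print(f"95th percentile of risk: {np.percentile(risk, 95):.2e}")
```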

Assessment of Liquefaction Potential on Non-Plastic Silty Soil Layers Using Geographic Information System(GIS) and Standard Penetration Test Results (지리정보시스템 및 표준관입시험 결과를 이용한 비소성 실트질 지반의 액상화 평가)

  • Yoo, Si-Dong;Kim, Hong-Taek;Song, Byung-Woong;Lee, Hyung-Kyu
    • Journal of the Korean GEO-environmental Society / v.6 no.2 / pp.5-14 / 2005
  • In the present study, the liquefaction potential in the area of the Incheon international airport was assessed by applying data from both standard penetration tests and laboratory tests to the modified Seed & Idriss method. The analysis was performed on the non-plastic silty soil layers and silty sand layers lying within a depth of 20 m and below the ground water level, with standard penetration values (N) below 20. Each data set was mapped using GIS (Geographic Information System), and the safety factor against liquefaction (FS_liquefaction) was obtained by overlaying those layers. The analysis revealed a potential liquefaction hazard zone, parts of which show a safety factor of 1.0 to 1.5, below the standard safety factor criterion. It is considered necessary that the liquefaction potential of the corresponding hazard zone be additionally assessed in detail.
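
The simplified Seed & Idriss screening reduces to comparing the earthquake-induced cyclic stress ratio (CSR) with the soil's cyclic resistance ratio (CRR); the safety factor is their quotient. The site parameters below are hypothetical, not the airport data.

```python
# Simplified Seed & Idriss sketch of the liquefaction safety factor: the
# cyclic stress ratio (CSR) induced by the design earthquake versus the
# cyclic resistance ratio (CRR) inferred from the SPT blow count. All site
# parameters are hypothetical.
def cyclic_stress_ratio(a_max_g, sigma_v, sigma_v_eff, depth_m):
    rd = 1.0 - 0.00765 * depth_m          # stress reduction, valid z <= 9.15 m
    return 0.65 * a_max_g * (sigma_v / sigma_v_eff) * rd

def fs_liquefaction(crr, csr):
    return crr / csr

csr = cyclic_stress_ratio(a_max_g=0.15,      # peak ground acceleration / g
                          sigma_v=120.0,     # total vertical stress, kPa
                          sigma_v_eff=70.0,  # effective stress, kPa
                          depth_m=6.0)
crr = 0.12                                   # from an N-value correlation chart
print(f"CSR = {csr:.3f}, FS = {fs_liquefaction(crr, csr):.2f}")
```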
