• Title/Summary/Keyword: Algorithm Ability

Search Results: 1,193

Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.29-45 / 2015
  • Response modeling is a well-known research issue for those who have tried to achieve superior performance in predicting customers' response to a marketing promotion. A response model reduces marketing cost by identifying prospective customers in a very large customer database and predicting the purchasing intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy results in unnecessary cost. In addition, the big data environment has accelerated the development of response models with data mining techniques such as CBR, neural networks, and support vector machines. CBR is one of the major tools in business because it is simple and robust to apply to response modeling, and it remains an attractive technique for business data mining applications even though it has not shown performance as high as other machine learning techniques. Thus many studies have tried to improve CBR for business data mining with enhanced algorithms or the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) utilized logit, neural networks, and CBR to predict which customers would purchase the items promoted by a marketing department, and optimized the parameter k of the k-nearest neighbor method with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that an approach integrating CBR with logit, neural networks, and Support Vector Machines (SVM) predicted customers' response to marketing promotion better than each individual model. This paper presented an approach to predicting customers' response to a marketing promotion with Case Based Reasoning, in which the proposed model applies a different weight to each feature. We fitted a logit model to a database containing the promotion and purchasing data of bath soap, and the resulting coefficients were used as the feature weights in CBR. We empirically compared the performance of the proposed weighted CBR model with neural networks and a pure CBR model, and found that the weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building classification models with real data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared to the number in other classes. A classification model such as a response model has trouble learning patterns from such data, because it tends to ignore the minority class while classifying the majority class correctly. Sampling is one of the most representative approaches to the problems caused by an imbalanced data distribution, and sampling methods can be categorized into under-sampling and over-sampling. However, CBR is not sensitive to the data distribution because, unlike machine learning algorithms, it does not learn a model from the data.
In this study, we investigated the robustness of the proposed model while changing the ratio of response to nonresponse customers, because in the real world the customers who respond to a promotion are always a small fraction of those who do not. We simulated the proposed model 100 times with different ratios of response to nonresponse customers to validate its robustness under an imbalanced data distribution. Finally, we found that the proposed CBR-based model outperformed the comparison models on the imbalanced data sets. Our study is expected to improve the performance of CBR-based response models for promotion programs under the imbalanced data distributions found in the real world.
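A minimal sketch of the weighted-CBR idea described above: fit a logit model, use the magnitudes of its coefficients as feature weights, and retrieve similar cases with k-nearest neighbors in the weighted feature space. Function and variable names are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

def weighted_cbr_predict(X_train, y_train, X_test, k=5):
    """Logit coefficients -> feature weights -> k-NN (CBR-style) retrieval."""
    logit = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    weights = np.abs(logit.coef_).ravel()      # importance of each feature
    weights = weights / weights.sum()          # normalize so weights sum to 1
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train * weights, y_train)        # scale each feature by its weight
    return knn.predict(X_test * weights)
```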

Steel Plate Faults Diagnosis with S-MTS (S-MTS를 이용한 강판의 표면 결함 진단)

  • Kim, Joon-Young;Cha, Jae-Min;Shin, Junguk;Yeom, Choongsub
    • Journal of Intelligence and Information Systems / v.23 no.1 / pp.47-67 / 2017
  • Steel plate faults are among the important factors affecting the quality and price of steel plates. So far, many steelmakers have generally used a visual inspection method based on an inspector's intuition or experience: the inspector checks for faults by looking at the surface of the steel plates. However, the accuracy of this method is critically low; it can produce judgment errors above 30%. Therefore, an accurate steel plate faults diagnosis system has been continuously required in the industry. To meet this need, this study proposed a new steel plate faults diagnosis system using Simultaneous MTS (S-MTS), an advanced Mahalanobis Taguchi System (MTS) algorithm, to classify various surface defects of steel plates. MTS has generally been used to solve binary classification problems in various fields, but it has not been used for multi-class classification because of its low accuracy, since only one Mahalanobis space is established in MTS. In contrast, S-MTS is suitable for multi-class classification because it establishes an individual Mahalanobis space for each class; 'simultaneous' refers to comparing the Mahalanobis distances at the same time. The proposed steel plate faults diagnosis system was developed in four main stages. In the first stage, after the reference groups and related variables are defined, steel plate fault data are collected and used to establish an individual Mahalanobis space for each reference group and to construct the full measurement scale. In the second stage, the Mahalanobis distances of the test groups are calculated based on the established Mahalanobis spaces of the reference groups, and the appropriateness of the spaces is verified by examining the separability of these distances. In the third stage, orthogonal arrays and the dynamic-type Signal-to-Noise (SN) ratio are applied for variable optimization, and an overall SN ratio gain is derived from the SN ratios. If the overall SN ratio gain of a variable is negative, the variable should be removed, whereas a variable with a positive gain is worth keeping. Finally, in the fourth stage, the measurement scale composed of the selected useful variables is reconstructed, and an experimental test is carried out to verify the multi-class classification ability and obtain the classification accuracy. If the accuracy is acceptable, the diagnosis system can be used for future applications. This study also compared the accuracy of the proposed system with that of other popular classification algorithms, including Decision Tree, Multilayer Perceptron Neural Network (MLPNN), Logistic Regression (LR), Support Vector Machine (SVM), Tree Bagger Random Forest, Grid Search (GS), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO). The steel plate faults dataset used in the study was taken from the University of California at Irvine (UCI) Machine Learning Repository. As a result, the proposed S-MTS-based diagnosis system shows a classification accuracy of 90.79%, which is 6-27% higher than MLPNN, LR, GS, GA, and PSO. Given that the accuracy of commercial systems is only about 75-80%, the proposed system has enough classification performance to be applied in the industry.
In addition, the proposed system can reduce the number of measurement sensors installed in the field thanks to the variable optimization process. These results show that the proposed system not only performs steel plate faults diagnosis well but can also reduce operation and maintenance costs. In future work, the system will be applied in the field to validate its actual effectiveness, and we plan to improve its accuracy based on the results.
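A minimal sketch of the core classification step in the spirit of S-MTS: one Mahalanobis space per class, with a new sample assigned to the class whose space gives the smallest distance. Variable selection with orthogonal arrays and SN ratios is omitted, and names are illustrative assumptions.

```python
import numpy as np

def fit_mahalanobis_spaces(X, y):
    """Build one Mahalanobis space (mean, inverse covariance) per class. X: 2-D array."""
    spaces = {}
    for c in np.unique(y):
        Xc = X[y == c]
        mean = Xc.mean(axis=0)
        cov_inv = np.linalg.pinv(np.cov(Xc, rowvar=False))
        spaces[c] = (mean, cov_inv)
    return spaces

def classify(spaces, X):
    """Assign each row of X to the class with the smallest Mahalanobis distance."""
    labels = np.array(list(spaces))
    dists = np.array([[(x - m) @ ci @ (x - m) for m, ci in spaces.values()]
                      for x in X])
    return labels[dists.argmin(axis=1)]
```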

A Study of Factors Associated with Software Developers Job Turnover (데이터마이닝을 활용한 소프트웨어 개발인력의 업무 지속수행의도 결정요인 분석)

  • Jeon, In-Ho;Park, Sun W.;Park, Yoon-Joo
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.191-204 / 2015
  • According to the '2013 Performance Assessment Report on the Financial Program' from the National Assembly Budget Office, the unfilled recruitment ratio of Software (SW) developers in South Korea was 25% in the 2012 fiscal year, and the unfilled recruitment ratio of highly qualified SW developers reached almost 80%. This phenomenon is intensified in small and medium enterprises with fewer than 300 employees. Young job-seekers in South Korea increasingly avoid becoming SW developers, and even current SW developers want to change careers, which hinders the development of the national IT industry. The Korean government has recently recognized the problem and implemented policies to foster young SW developers. Thanks to this effort, it has become easier to find young SW developers at the beginning level, but it is still hard for many IT companies to recruit highly qualified SW developers, because long-term experience is important to become an expert SW developer. Thus, improving the job continuity intentions of current SW developers is more important than fostering new ones. Therefore, this study surveyed the job continuity intentions of SW developers and analyzed the factors associated with them. We carried out a survey from September to October 2014, targeting 130 SW developers working in the IT industry in South Korea. We gathered demographic information and characteristics of the respondents, the work environment of the SW industry, and the social position of SW developers. Afterward, a regression analysis and a decision tree method were performed to analyze the data; these two widely used data mining techniques offer explanatory ability and are mutually complementary. We first performed a linear regression to find the important factors associated with the job continuity intention of SW developers. The result showed that the 'expected age' to work as a SW developer was the most significant factor associated with the job continuity intention. We suppose that the major cause of this phenomenon is the structural problem of the IT industry in South Korea, which requires SW developers to shift from development to management as they are promoted. Also, the 'motivation' to become a SW developer and the 'personality (introverted tendency)' of a SW developer are highly important factors associated with the job continuity intention. Next, the decision tree method was performed to extract the characteristics of developers with high and low motivation, using the well-known C4.5 algorithm. The results showed that 'motivation', 'personality', and 'expected age' were also important factors influencing the job continuity intention, which was similar to the results of the regression analysis. In addition, the 'ability to learn' new technology was a crucial factor in the decision rules for job continuity; in other words, a person with a high ability to learn new technology tends to work as a SW developer for a longer period of time. The decision rules also showed that the 'social position' of SW developers and the 'prospects' of the SW industry were minor factors influencing job continuity intentions. On the other hand, 'type of employment (regular/non-regular position)' and 'type of company (ordering company/service providing company)' did not affect the job continuity intention in either method.
In this research, we examined the job continuity intentions of SW developers actually working at IT companies in South Korea and analyzed the factors associated with them. These results can be used for human resource management in IT companies when recruiting or fostering highly qualified SW experts. They can also help to build SW developer fostering policies and to solve the problem of unfilled recruitment of SW developers in South Korea.
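A minimal sketch of the two analyses described above applied to a hypothetical survey table; the file name and column names are illustrative, and scikit-learn's decision tree is CART rather than C4.5, so this only approximates the paper's procedure.

```python
import pandas as pd
import statsmodels.api as sm
from sklearn.tree import DecisionTreeClassifier, export_text

df = pd.read_csv("survey.csv")                              # hypothetical survey data
X = df[["expected_age", "motivation", "personality", "ability_to_learn"]]
y = df["job_continuity_intention"]                          # continuous score
y_bin = (y > y.median()).astype(int)                        # high vs. low intention

# 1) Linear regression: which factors are significantly associated with intention?
print(sm.OLS(y, sm.add_constant(X)).fit().summary())

# 2) Decision tree: extract interpretable rules separating high and low intention.
tree = DecisionTreeClassifier(max_depth=3).fit(X, y_bin)
print(export_text(tree, feature_names=list(X.columns)))
```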

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained on a randomly chosen feature subspace from the original feature set, and the predictions from the ensemble members are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. The k parameters of the KNN base classifiers and the feature subsets selected for the base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameters and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves upon the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of the base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve the prediction accuracy of the ensemble model. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1,800 externally non-audited firms that filed for bankruptcy (900 cases) or non-bankruptcy (900 cases). Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent-sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as the output variable. Of these, 24 financial ratios were selected using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other to avoid overfitting; the prediction accuracy on the latter was used as the fitness value of the genetic algorithm. The validation dataset was used to evaluate the effectiveness of the final model.
A 10-fold cross-validation was implemented to compare the performance of the proposed model with that of other models, and the Q-statistic values and average classification accuracies of the base classifiers were also investigated. The experimental results showed that the proposed model outperformed the other models, such as the single KNN model and the random subspace ensemble model.
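A minimal sketch of the optimization target described above: each candidate encodes a k value and a feature subset for every base KNN classifier, and fitness is the majority-vote accuracy on a hold-out split. A real genetic algorithm would evolve candidates with selection, crossover, and mutation; random sampling stands in here for brevity, and binary 0/1 class labels and numpy arrays are assumed.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def random_candidate(n_features, n_members=10):
    """One candidate = a k value and a feature mask for each ensemble member."""
    ks = rng.integers(1, 16, size=n_members)
    masks = rng.random((n_members, n_features)) < 0.5
    return ks, masks

def fitness(candidate, X_tr, y_tr, X_val, y_val):
    """Majority-vote accuracy of the KNN ensemble on the hold-out split."""
    ks, masks = candidate
    votes = np.zeros((len(X_val), 2))                 # assumes labels are 0/1
    for k, mask in zip(ks, masks):
        if not mask.any():
            continue
        knn = KNeighborsClassifier(n_neighbors=int(k)).fit(X_tr[:, mask], y_tr)
        pred = knn.predict(X_val[:, mask])
        votes[np.arange(len(pred)), pred] += 1
    return (votes.argmax(axis=1) == y_val).mean()

# Stand-in for the GA search loop:
# best = max((random_candidate(X_tr.shape[1]) for _ in range(50)),
#            key=lambda c: fitness(c, X_tr, y_tr, X_val, y_val))
```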

VKOSPI Forecasting and Option Trading Application Using SVM (SVM을 이용한 VKOSPI 일 중 변화 예측과 실제 옵션 매매에의 적용)

  • Ra, Yun Seon;Choi, Heung Sik;Kim, Sun Woong
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.177-192 / 2016
  • Machine learning is a field of artificial intelligence. It refers to an area of computer science concerned with giving machines the ability to perform their own data analysis, decision making, and forecasting. For example, one representative machine learning model is the artificial neural network, a statistical learning algorithm inspired by biological neural networks. Other machine learning models include the decision tree, naive Bayes, and SVM (support vector machine) models. Among these, we use the SVM model in this study because it is mainly used for classification and regression analysis, which fits our study well. The core principle of SVM is to find a reasonable hyperplane that separates different groups in the data space. Given data from two groups, the SVM model judges to which group new data belong based on the hyperplane obtained from the given data set; thus, the more meaningful data are available, the better the machine learning ability. In recent years, many financial experts have focused on machine learning, seeing the possibility of combining it with the financial field, where vast amounts of financial data exist. Machine learning techniques have proved powerful in describing non-stationary and chaotic stock price dynamics, and much research has successfully forecast stock prices using machine learning algorithms. Recently, financial companies have begun to provide Robo-Advisor services (a compound of 'robot' and 'advisor'), which perform various financial tasks through advanced algorithms using rapidly changing, huge amounts of data; a Robo-Advisor's main tasks are to advise investors according to their personal investment propensity and to manage their portfolios automatically. In this study, we propose a method of forecasting the Korean volatility index, VKOSPI, using the SVM model, one of the machine learning methods, and applying it to real option trading to increase trading performance. VKOSPI is a measure of the future volatility of the KOSPI 200 index based on KOSPI 200 index option prices, and it is similar to the VIX index, which is based on S&P 500 option prices in the United States. The Korea Exchange (KRX) calculates and announces the real-time VKOSPI index. VKOSPI behaves like ordinary volatility and affects option prices: VKOSPI and option prices move in the same direction regardless of the option type (call and put options with various strike prices). If volatility increases, both call and put option premiums increase, because the probability that the option will be exercised increases. Investors can see in real time how much the option price rises for a given rise in volatility through Vega, the Black-Scholes measure of an option's sensitivity to changes in volatility. Therefore, accurate forecasting of VKOSPI movements is one of the important factors that can generate profit in option trading. In this study, we verified with real option data that accurate forecasts of VKOSPI can produce large profits in real option trading. To the best of our knowledge, there have been no studies that predict the direction of VKOSPI with machine learning and apply the prediction to actual option trading.
In this study, we predicted daily VKOSPI changes with the SVM model and then took an intraday option strangle position, which profits as option prices fall, only when VKOSPI was expected to decline during the day. We analyzed the results and tested whether the SVM prediction is applicable to real option trading. The results showed that the prediction accuracy for VKOSPI was 57.83% on average and that the number of position entries was 43.2, less than half of the benchmark (100); a small number of trades indicates trading efficiency. In addition, the experiment showed that the trading performance was significantly higher than the benchmark.
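A minimal sketch of the classification setup the abstract describes: an SVM predicting whether VKOSPI declines during the day from lagged market features. The file name, feature names, and split are illustrative assumptions, not taken from the paper.

```python
import pandas as pd
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("vkospi_daily.csv", parse_dates=["date"])            # hypothetical data
df["decline"] = (df["vkospi_close"] < df["vkospi_open"]).astype(int)  # 1 = intraday decline
features = ["vkospi_lag1", "kospi200_ret_lag1", "option_volume_lag1"]

split = int(len(df) * 0.7)                                            # time-ordered split
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(df[features][:split], df["decline"][:split])

accuracy = model.score(df[features][split:], df["decline"][split:])
print(f"out-of-sample direction accuracy: {accuracy:.2%}")
# A strangle position would be entered only on days the model predicts a decline.
```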

An Analysis on Shortest Path Search Process of Gifted Student and Normal Student in Information (정보영재학생과 일반학생의 최단경로 탐색 과정 분석)

  • Kang, Sungwoong;Kim, Kapsu
    • Journal of The Korean Association of Information Education / v.20 no.3 / pp.243-254 / 2016
  • This study produced a web-based computer assessment of the shortest path search problem, with a total of 19 questions, based on the 'TRAFFIC' items of PISA 2012. The computer has become an indispensable instrument for solving everyday problems and an underlying medium of assessment; therefore, students gifted in informatics should be able to solve problems using the computer and give it commands clear enough to carry out the procedure. In addition, since computational thinking now affects every sector, the assessment should give students new educational stimuli. The rate of correct answers and the time taken to solve the problems in the shortest path search process showed a significant correlation, and as the difficulty of a question rose with the number of nodes and edges, the variable that affected problem solving turned out to be the nodes rather than the edges. It was revealed that the gifted students engaged in algorithmic thinking while solving the shortest path search problem, and cognitive characteristics of the gifted students such as 'ability streamlining' and 'information structure memory' could be confirmed.
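For reference, a minimal sketch of the kind of shortest-path computation such 'TRAFFIC'-style items ask for, using Dijkstra's algorithm over a weighted graph of nodes and edges; the example graph is illustrative.

```python
import heapq

def dijkstra(graph, start, goal):
    """graph: {node: [(neighbor, cost), ...]}. Returns (total cost, path)."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for nxt, weight in graph.get(node, []):
            if nxt not in visited:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

# Example: dijkstra({"A": [("B", 2), ("C", 5)], "B": [("C", 1)], "C": []}, "A", "C")
# -> (3, ["A", "B", "C"])
```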

Development of Route Planning System for Intermodal Transportation Based on an Agent Collecting Schedule Information (운송스케줄 정보수집 에이전트 기반 복합운송 경로계획 시스템)

  • Choi, Hyung-Rim;Kim, Hyun-Soo;Park, Byung-Joo;Kang, Moo-Hong
    • Information Systems Review / v.10 no.1 / pp.115-133 / 2008
  • The third-party logistics industry mainly delivers goods from a departure point to an arrival point on behalf of the freight owner. To handle this work, they need a transportation route that specifies the transportation equipment between the departure and arrival points, departure/arrival schedule information, and transportation cost. Automatically searching for an optimal transportation route that considers departure and arrival points for intermodal transportation is not a simple problem. To search for transportation routes efficiently, collecting schedule information for intermodal transportation and generating transportation routes have become critical issues for logistics companies. Usually, they plan transportation routes manually based on their experience, which limits their capacity when cargo volume and the number of transactions are large. Furthermore, dependence on this conventional way of doing business causes inefficient selection of transporters or transportation routes, fails to provide customers with diverse route alternatives, and, as a result, increases logistics costs. To solve these problems, this study develops an agent-based route planning system that can collect schedule information scattered on the Web; the system also includes an algorithm for generating intermodal transportation routes.
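A minimal sketch of one way such a route-generation step could work on agent-collected schedule records: enumerate routes whose legs connect in time and pick the cheapest one. The record fields and logic are illustrative assumptions, not the paper's algorithm.

```python
def feasible_routes(schedules, origin, dest, earliest_dep=0, route=None):
    """schedules: list of dicts with keys 'from', 'to', 'dep', 'arr', 'cost'."""
    route = route or []
    for leg in schedules:
        if leg["from"] == origin and leg["dep"] >= earliest_dep and leg not in route:
            if leg["to"] == dest:
                yield route + [leg]
            else:
                # Continue from the leg's arrival point, no earlier than its arrival time.
                yield from feasible_routes(schedules, leg["to"], dest,
                                           leg["arr"], route + [leg])

def cheapest_route(schedules, origin, dest):
    """Return the feasible route with the lowest total cost, or None."""
    routes = list(feasible_routes(schedules, origin, dest))
    return min(routes, key=lambda r: sum(leg["cost"] for leg in r), default=None)
```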

TMC (Tracker Motion Controller) Using Sensors and GPS Implementation and Performance Analysis (센서와 GPS를 이용한 TMC의 구현 및 성능 분석)

  • Ko, Jae-Hong
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.2 / pp.828-834 / 2013
  • In this paper, a TMC (Tracker Motion Controller) is studied as one of many approaches to improving concentrating efficiency: an efficient solar tracking configuration is built to improve power generation efficiency, and experiments compare a concentrating silicon photovoltaic (CPV) system with a conventional PV system. The microprocessor used in the tracking system calculates the solar altitude and azimuth in real time; it also reads sensor values, controls the motors, and communicates with the central control system while computing the current position of the sun, which places a growing burden on the controller. A hybrid algorithm combining program-based (astronomical) tracking and sensor tracking was implemented in the TMC with a built-in ARM core, and the power generation efficiency of the concentrating CPV system was compared and analyzed against an existing domestic PV system through the implementation of the TMC. The experimental results showed a large difference on clear days with high solar irradiation between the existing sensor-based tracking method and the hybrid method, in which GPS location data are used and the sun's azimuth and elevation angles in the horizon coordinate system are calculated astronomically by the program. The sensor method can stop for a period when the sun enters the sensor's blind spot, and sensor errors can occur as the weather changes; on cloudy days with little solar radiation the sensor does not keep track of the sun's position, so the hybrid method rather than the sensor method was found to be superior. Continued research on the system is needed, together with ongoing work on increasing solar cell efficiency to reduce the production cost of power generation, to develop a TMC with the ability to maintain optimal high-efficiency concentration as the weather changes.
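A minimal sketch of the astronomical part of such a hybrid tracking approach: computing solar elevation and azimuth from the day of year, local solar time, and latitude. This is a standard simplified formula (equation-of-time and refraction corrections omitted), not the controller code from the paper.

```python
import math

def sun_position(day_of_year, solar_hour, latitude_deg):
    """Return (elevation, azimuth) in degrees; azimuth measured clockwise from north."""
    lat = math.radians(latitude_deg)
    decl = math.radians(23.45 * math.sin(math.radians(360.0 / 365.0 * (284 + day_of_year))))
    hour_angle = math.radians(15.0 * (solar_hour - 12.0))     # negative before solar noon
    elev = math.asin(math.sin(lat) * math.sin(decl)
                     + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    cos_az = ((math.sin(decl) - math.sin(elev) * math.sin(lat))
              / (math.cos(elev) * math.cos(lat)))
    az = math.degrees(math.acos(max(-1.0, min(1.0, cos_az))))
    if hour_angle > 0:                                        # afternoon: sun is west of due south
        az = 360.0 - az
    return math.degrees(elev), az
```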

Quality Assurance of Multileaf Collimator Using Electronic Portal Imaging (전자포탈영상을 이용한 다엽시준기의 정도관리)

  • ;Jason W Sohn
    • Progress in Medical Physics / v.14 no.3 / pp.151-160 / 2003
  • The application of more complex radiotherapy techniques using multileaf collimation (MLC), such as 3D conformal radiation therapy and intensity-modulated radiation therapy (IMRT), has increased the significance of verifying leaf position and motion. Films have traditionally been used for quality assurance (QA) of MLC due to their reliability and empirical robustness. However, the ease of use of electronic portal imaging devices (EPIDs) and their ability to provide digital data have attracted attention to EPIDs as an alternative to films for routine quality assurance, despite concerns about their clinical feasibility, efficacy, and cost-to-benefit ratio. In this study, we developed a method for daily QA of MLC using electronic portal images (EPIs). The suitability of the EPID for routine QA was verified by comparison with portal films obtained simultaneously during radiation delivery and with the known prescription input to the MLC controller. Two specially designed dynamic MLC test patterns were applied for image acquisition. Quantitative off-line analysis using an edge detection algorithm enhanced the verification procedure, as did on-line qualitative visual assessment. In conclusion, EPI was sufficient for daily QA of MLC leaf position, with accuracy comparable to that of portal films.
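A minimal sketch of the kind of off-line edge-detection check described above: locate each leaf edge in an EPI as the strongest intensity gradient along the leaf's row and compare it with the prescribed position. The pixel scale, tolerance, and function names are illustrative assumptions.

```python
import numpy as np

def leaf_edge_positions(epid_image, leaf_rows, mm_per_pixel=0.5):
    """epid_image: 2-D intensity array; leaf_rows: one row index per leaf."""
    edges_mm = []
    for row in leaf_rows:
        profile = epid_image[row].astype(float)
        gradient = np.abs(np.gradient(profile))        # steepest change = leaf edge
        edges_mm.append(gradient.argmax() * mm_per_pixel)
    return np.array(edges_mm)

def within_tolerance(measured_mm, prescribed_mm, tol_mm=1.0):
    """Flag each leaf as passing or failing the positional tolerance."""
    return np.abs(np.asarray(measured_mm) - np.asarray(prescribed_mm)) <= tol_mm
```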


Objective and Quantitative Evaluation of Image Quality Using Fuzzy Integral: Phantom Study (퍼지적분을 이용한 영상품질의 객관적이고 정량적 평가: 팬톰 연구)

  • Kim, Sung-Hyun;Suh, Tae-Suk;Choe, Bo-Young;Lee, Hyoung-Koo
    • Progress in Medical Physics / v.19 no.4 / pp.201-208 / 2008
  • Physical evaluations provide the basis for an objective and quantitative analysis of image quality. Nonetheless, physical evaluations are limited in judging the utility of image quality when the observer's subjectivity plays a key role, despite its imprecise and variable nature. This study proposes a new method for the objective and quantitative evaluation of image quality that compensates for the demerits of both physical and subjective image quality assessment and combines their merits. Images of a chest phantom were acquired from four digital radiography systems at clinical sites. The physical image quality was derived from an image analysis algorithm in terms of the contrast-to-noise ratio (CNR) of the low-contrast objects in three regions (lung, heart, and diaphragm) of a digital chest phantom radiograph. For the image analysis, various image processing techniques such as segmentation and registration were used. The subjective image quality was assessed by the ability of a human observer to detect the low-contrast objects, and a fuzzy integral was used to combine the two measures. The findings of this study showed that the physical evaluation did not agree with the subjective evaluation: the system with better performance in the physical measurement showed a worse result in the subjective evaluation than the other system. The proposed protocol is an integrated image quality evaluation method that includes the properties of both physical and subjective measurement, and it may be a useful tool for image evaluation across various modalities.
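A minimal sketch of combining a normalized physical score (e.g. CNR) with a subjective observer score using a fuzzy integral; the Choquet integral is used here as one common choice, and the fuzzy measure values are illustrative, not the ones derived in the paper.

```python
def choquet_integral(scores, measure):
    """scores: {criterion: value in [0, 1]}; measure: {frozenset of criteria: weight}."""
    items = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    total, coalition = 0.0, frozenset()
    for i, (name, value) in enumerate(items):
        coalition = coalition | {name}
        next_value = items[i + 1][1] if i + 1 < len(items) else 0.0
        total += (value - next_value) * measure[coalition]   # weight of the top-i coalition
    return total

# Illustrative fuzzy measure: the two criteria together carry full weight.
measure = {frozenset({"physical"}): 0.4,
           frozenset({"subjective"}): 0.5,
           frozenset({"physical", "subjective"}): 1.0}
print(choquet_integral({"physical": 0.8, "subjective": 0.6}, measure))   # -> 0.68
```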
