
Response Modeling for the Marketing Promotion with Weighted Case Based Reasoning Under Imbalanced Data Distribution (불균형 데이터 환경에서 변수가중치를 적용한 사례기반추론 기반의 고객반응 예측)

  • Kim, Eunmi;Hong, Taeho
    • Journal of Intelligence and Information Systems / v.21 no.1 / pp.29-45 / 2015
  • Response modeling is a well-known research issue for those seeking better performance in predicting customers' responses to marketing promotions. A response model reduces marketing cost by identifying prospective customers in a very large customer database and predicting the purchase intention of the selected customers, whereas a promotion derived from an undifferentiated marketing strategy incurs unnecessary cost. In addition, the big data environment has accelerated the development of response models with data mining techniques such as case-based reasoning (CBR), neural networks, and support vector machines. CBR is one of the major tools in business because it is known to be simple and robust when applied to response modeling. CBR remains an attractive technique for data mining applications in business even though it has not shown high performance compared to other machine learning techniques. Thus many studies have tried to improve CBR and use it in business data mining with enhanced algorithms or with the support of other techniques such as genetic algorithms, decision trees, and AHP (Analytic Hierarchy Process). Ahn and Kim (2008) used logit, neural networks, and CBR to predict which customers would purchase the items promoted by a marketing department, and tried to optimize the number k of nearest neighbors with a genetic algorithm to improve the performance of the integrated model. Hong and Park (2009) noted that an integrated approach combining CBR with logit, neural networks, and Support Vector Machine (SVM) predicted customers' responses to marketing promotions better than the individual data mining models (logit, neural networks, and SVM). This paper presents an approach to predicting customers' responses to marketing promotions with case-based reasoning. The proposed model was developed by applying a different weight to each feature.
We fitted a logit model to a database containing the promotion and purchasing data for bath soap. The coefficients were then used to assign different feature weights in CBR. We empirically compared the performance of the proposed weighted CBR model with that of neural networks and a pure CBR model, and found that the proposed weighted CBR model outperformed the pure CBR model. Imbalanced data is a common problem when building data mining models that classify real data, as in bankruptcy prediction, intrusion detection, fraud detection, churn management, and response modeling. Imbalanced data means that the number of instances in one class is remarkably small or large compared to the number of instances in the other classes. A classification model such as a response model has great difficulty learning patterns from such data because it tends to ignore the small classes while classifying the large classes correctly. Sampling is one of the most representative approaches to resolving the problem caused by an imbalanced data distribution, and sampling methods can be categorized into under-sampling and over-sampling. However, CBR is not sensitive to data distribution because, unlike machine learning algorithms, it does not learn from data. In this study, we investigated the robustness of our proposed model while changing the ratio of response customers to nonresponse customers, because in the real world the customers who respond to a promotion are always a small fraction compared with those who do not. We simulated the proposed model 100 times to validate its robustness with different ratios of response customers to nonresponse customers under the imbalanced data distribution. Finally, we found that our proposed CBR-based model outperformed the compared models on the imbalanced data sets.
Our study is expected to improve the performance of response models for promotion programs with CBR under the imbalanced data distributions found in the real world.
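
The feature-weighting idea in the abstract above can be sketched as a weighted nearest-neighbor lookup. This is only an illustration under stated assumptions: the abstract says the logit coefficients supply the CBR weights but does not give the exact formula, so using normalized absolute coefficients, a weighted Euclidean distance, and a majority vote are all assumptions here, and the data and coefficient values are made up.

```python
import math
from collections import Counter

def weighted_distance(a, b, weights):
    """Euclidean distance with each squared difference scaled by a feature weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))

def weighted_cbr_predict(cases, labels, query, coefficients, k=1):
    """Majority vote among the k stored cases closest to the query."""
    # Assumption: weight = |logit coefficient|, normalized to sum to 1.
    total = sum(abs(c) for c in coefficients)
    weights = [abs(c) / total for c in coefficients]
    ranked = sorted(range(len(cases)),
                    key=lambda i: weighted_distance(cases[i], query, weights))
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy case base: two features, label 1 = responded to the promotion.
cases = [(0.0, 1.0), (1.0, 0.0)]
labels = [0, 1]
query = (0.4, 0.3)

# Made-up coefficients: the first feature dominates in the weighted run.
weighted = weighted_cbr_predict(cases, labels, query, coefficients=[2.0, 0.1])
unweighted = weighted_cbr_predict(cases, labels, query, coefficients=[1.0, 1.0])
```

With equal weights the query is closest to the second case, while the coefficient-based weights make the first case the nearest neighbor, which is the kind of shift that feature weighting is meant to produce.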

A Template-based Interactive University Timetabling Support System (템플릿 기반의 상호대화형 전공강의시간표 작성지원시스템)

  • Chang, Yong-Sik;Jeong, Ye-Won
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.121-145 / 2010
  • University timetabling, which depends on the educational environment of each university, is an NP-hard problem in which the amount of computation required to find solutions increases exponentially with the problem size. For many years, there have been many studies on university timetabling, motivated by the need for automatic timetable generation for students' convenience and effective lessons, and for the effective allocation of subjects, lecturers, and classrooms. Timetables are classified into course timetables and examination timetables. This study focuses on the former. In general, the course timetable for liberal arts is scheduled by the office of academic affairs, and the course timetable for major subjects is scheduled by each department of a university. We found several problems in our analysis of current course timetabling in departments. First, it is time-consuming and inefficient for each department to do the routine and repetitive timetabling work manually. Second, many classes are concentrated in a few time slots of the timetable. This tendency decreases the effectiveness of students' classes. Third, several major subjects might overlap with required liberal arts subjects in the same time slots. In this case, students must choose only one of the overlapping subjects. Fourth, many subjects are taught by the same lecturers every year, and most lecturers prefer the same time slots for their subjects as in the previous year. This means that it would be helpful for departments to reuse previous timetables. To solve these problems and support effective course timetabling in each department, this study proposes a two-phase university timetabling support system. In the first phase, each department generates a timetable template from the most similar timetable case, using case-based reasoning.
In the second phase, the department schedules a timetable with the help of an interactive user interface under the timetabling criteria, using a rule-based approach. This study provides illustrations from Hanshin University. We classified the timetabling criteria into intrinsic and extrinsic criteria. The intrinsic criteria comprise three criteria related to lecturer, class, and classroom, all of which are hard constraints. The extrinsic criteria comprise four criteria: 'the numbers of lesson hours' by the lecturer, 'prohibition of lecture allocation to specific day-hours' for committee members, 'the number of subjects in the same day-hour,' and 'the use of common classrooms.' 'The numbers of lesson hours' by the lecturer covers three kinds of criteria: 'minimum number of lesson hours per week,' 'maximum number of lesson hours per week,' and 'maximum number of lesson hours per day.' The extrinsic criteria are also all hard constraints, except for 'minimum number of lesson hours per week,' which is treated as a soft constraint. In addition, we proposed two indices: one for measuring the similarity between subjects of the current semester and subjects of previous timetables, and one for evaluating the distribution degree of a scheduled timetable. Similarity is measured by comparing two attributes, subject name and lecturer, between the current semester and a previous semester. The index of distribution degree, based on information entropy, indicates how subjects are distributed in the timetable. To show this study's viability, we implemented a prototype system and performed experiments with real data from Hanshin University. The average similarity from the most similar cases of all departments was estimated at 41.72%, which means that a timetable template generated from the most similar case will be helpful. Sensitivity analysis shows that the distribution degree will increase if we set 'the number of subjects in the same day-hour' to more than 90%.
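
The two indices described in the abstract above might look roughly like the following. The abstract does not give the formulas, so this is a guess at their general shape: similarity is taken as the share of current (subject, lecturer) pairs that reappear in a previous timetable, and the distribution degree as Shannon entropy of subjects per time slot, normalized by its maximum. Function names and sample data are hypothetical.

```python
import math

def similarity(current, previous):
    """Share of current (subject, lecturer) pairs also present in a previous timetable."""
    current_pairs = set(current)
    if not current_pairs:
        return 0.0
    return len(current_pairs & set(previous)) / len(current_pairs)

def distribution_degree(slot_counts):
    """Normalized Shannon entropy of subjects per slot (1.0 means perfectly even)."""
    total = sum(slot_counts)
    if total == 0 or len(slot_counts) < 2:
        return 0.0
    # Empty slots contribute nothing to the entropy sum.
    entropy = sum(-(c / total) * math.log(c / total) for c in slot_counts if c > 0)
    # Assumption: divide by log(number of slots) so the index lies in [0, 1].
    return entropy / math.log(len(slot_counts))

# Hypothetical department data.
current = [("Databases", "Kim"), ("Statistics", "Lee"), ("AI", "Park")]
previous = [("Databases", "Kim"), ("Statistics", "Choi")]
sim = similarity(current, previous)

even = distribution_degree([2, 2, 2, 2])
concentrated = distribution_degree([8, 0, 0, 0])
```

An evenly spread timetable scores near 1 and a timetable with every subject in one slot scores 0, which matches the abstract's use of the index to detect classes crowding into a few time slots.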

A Case Study on Forecasting Inbound Calls of Motor Insurance Company Using Interactive Data Mining Technique (대화식 데이터 마이닝 기법을 활용한 자동차 보험사의 인입 콜량 예측 사례)

  • Baek, Woong;Kim, Nam-Gyu
    • Journal of Intelligence and Information Systems / v.16 no.3 / pp.99-120 / 2010
  • As customers increasingly use non-face-to-face services, there have been many attempts to improve customer satisfaction using the huge amounts of data accumulated through non-face-to-face channels. A call center is usually regarded as one of the most representative non-face-to-face channels. Therefore, it is important that a call center has enough agents to offer a high level of customer satisfaction. However, managing too many agents would increase the operational costs of a call center by increasing labor costs. Therefore, predicting and calculating the appropriate size of a call center's human resources is one of the most critical success factors of call center management. For this reason, most call centers are currently establishing a WFM (Work Force Management) department to estimate the appropriate number of agents, and are directing much effort toward predicting the volume of inbound calls. In real-world applications, inbound call prediction is usually performed based on the intuition and experience of a domain expert. In other words, a domain expert usually predicts the volume of calls by calculating the average number of calls over some periods and adjusting the average according to his/her subjective estimation. However, this kind of approach has fundamental limitations in that the prediction result might be strongly affected by the expert's personal experience and competence. It is often the case that one domain expert predicts inbound calls quite differently from another if the two experts have different opinions on selecting influential variables and on the priorities among the variables. Moreover, it is almost impossible to logically clarify the process of an expert's subjective prediction. Currently, to overcome the limitations of subjective call prediction, most call centers are adopting a WFMS (Workforce Management System) package in which experts' best practices are systemized.
With WFMS, a user can predict the volume of calls by calculating the average number of calls for each day of the week, excluding some eventful days. However, WFMS requires a large capital outlay during the early stage of system establishment. Moreover, it is hard to reflect new information in the system when factors affecting the volume of calls change. In this paper, we attempt to devise a new model for predicting inbound calls that is not only based on a theoretical background but also easily applicable to real-world applications. Our model was developed mainly with the interactive decision tree technique, one of the most popular techniques in data mining. Therefore, we expect that our model can predict inbound calls automatically based on historical data, and that it can utilize an expert's domain knowledge during the process of tree construction. To analyze the accuracy of our model, we performed intensive experiments on a real case from one of the largest car insurance companies in Korea. In the case study, the prediction accuracy of the two devised models and of the traditional WFMS is analyzed with respect to various allowable error rates. The experiments reveal that our two data mining-based models outperform WFMS in predicting the volume of accident calls and fault calls in most of the experimental situations examined.
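
The WFMS-style baseline mentioned in the abstract above (average calls per day of the week, excluding eventful days) can be sketched in a few lines. This illustrates only that baseline, not the authors' decision tree models; the function name and the call counts are hypothetical.

```python
from collections import defaultdict
from datetime import date

def weekday_average_forecast(history, excluded_days=frozenset()):
    """Map weekday (0 = Monday) to the mean call volume over non-excluded days."""
    buckets = defaultdict(list)
    for day, calls in history.items():
        if day not in excluded_days:
            buckets[day.weekday()].append(calls)
    return {weekday: sum(v) / len(v) for weekday, v in buckets.items()}

# Hypothetical daily inbound call counts.
history = {
    date(2010, 3, 1): 100,   # Monday
    date(2010, 3, 8): 120,   # Monday
    date(2010, 3, 15): 400,  # Monday, treated here as an eventful day
    date(2010, 3, 2): 90,    # Tuesday
}
forecast = weekday_average_forecast(history, excluded_days={date(2010, 3, 15)})
```

Excluding the eventful Monday keeps the Monday forecast at the mean of the two ordinary Mondays instead of letting one outlier dominate it.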

Time-Lapse Crosswell Seismic Study to Evaluate the Underground Cavity Filling (지하공동 충전효과 평가를 위한 시차 공대공 탄성파 토모그래피 연구)

  • Lee, Doo-Sung
    • Geophysics and Geophysical Exploration / v.1 no.1 / pp.25-30 / 1998
  • Time-lapse crosswell seismic data, recorded before and after cavity filling, showed that the filling increased the velocity at a known cavity zone in an old mine site in the Inchon area. The seismic response depicted on the tomogram, in conjunction with the geologic data from drilling, implies that the cavity may be either small or filled with debris. In this study, I attempted to evaluate the filling effect by analyzing velocities measured from the time-lapse tomograms. The data, acquired by a downhole airgun and a 24-channel hydrophone system, revealed measurable amounts of source statics. I presented a methodology to estimate the source statics. The procedure is: 1) examine the source firing time for each source and remove the effect of irregular firing times, and 2) estimate the residual statics caused by inaccurate source positioning. The proposed multi-step inversion may reduce high-frequency numerical noise and enhance the resolution at the zone of interest. The multi-step inversion with different starting models successfully shows the subtle velocity changes at the small cavity zone. The inversion procedure is: 1) conduct an inversion using regular-sized cells and generate an image of the gross velocity structure by applying a 2-D median filter to the resulting tomogram, and 2) construct the starting velocity model by modifying the final velocity model from the first phase. The model was modified so that the zone of interest consists of small-sized grids. The final velocity model developed from the baseline survey was used as the starting velocity model for the monitor inversion. Since we expected a velocity change only in the cavity zone, in the monitor inversion we can significantly reduce the number of model parameters by fixing the model outside the cavity zone to the baseline model.

A Field Research on Multi-Language Sign System in Hospital at the Point of View in Convergent Study - Focused on General Hospital in Busan and South Gyeongsang Province - (융합적 관점에서 본 병원 사인시스템 다중언어 표기 현황 조사 - 부산 및 경남지역 의료기관을 중심으로 -)

  • Park, Han Na;Paik, Jin Kyung
    • Korea Science and Art Forum / v.37 no.1 / pp.87-97 / 2019
  • This study began with the aim of grasping the nation's medical status in light of the fast-growing trend of international medical tourism and the effort to attract foreign patients. Among other places, Busan, which ranks second in attracting foreign patients after the nation's capital, Seoul, has been highly active over the past eight years, with the number of foreign patients rising by about 426 percent and with Russian patients arriving by sea. In addition, Gimhae and Changwon, in the Gyeongsangnam-do region near Busan, rank first and second in number of foreign residents and are home to a variety of foreign workers. In medical institutions such as hospitals, visitors should be able to find their way within the building. A hospital is also a space where information in various languages, including Korean, English, Chinese, or Russian, must be delivered in a single medium. Against this background, the purpose of this research is to provide converged information that helps foreigners who are not familiar with the Korean language easily understand the sign system when visiting hospitals. Therefore, this paper surveys multi-language notation at six medical institutions (A, B, C, D, E, F) at the university hospital level in Busan, and at 10 medical institutions (R, J) in Gimhae, South Gyeongsang Province, which has a large foreign resident population. The research results are as follows. First, analysis of the sign system design shows that the institutions use achromatic Gothic fonts, arrows, and pictograms, following the design of a typical hospital sign system. Second, the survey of multilingual notation found that, depending on geographical location, some sign systems carry only two languages (Korean and English) while others carry four (Korean, English, Chinese, and Russian).
However, most medical institutions currently provide only two languages (Korean and English), which may cause some language-related discomfort for foreign patients from non-English-speaking countries. Based on these findings, multilingual hospital sign systems need designs that consider both Korean and foreign users.

Analysis of Behavioral Characteristics by Park Types Displayed in 3rd Generation SNS (제3세대 SNS에 표출된 공원 유형별 이용 특성 분석)

  • Kim, Ji-Eun;Park, Chan;Kim, Ah-Yeon;Kim, Ho Gul
    • Journal of the Korean Institute of Landscape Architecture / v.47 no.2 / pp.49-58 / 2019
  • There have been studies on the satisfaction, preference, and post-occupancy evaluation of urban parks that aim to reflect users' preferences and activities and suggest directions for future park planning and management. Although questionnaires have proven effective for obtaining users' opinions directly, they have been limited in capturing the latest changes in park use. This study explores the possibility of using third-generation SNS data, from Instagram and Google, to compare behavior patterns and trends in park activities. Instagram keywords and photos expressing users' feelings, tagged with a specific park name, were collected. We also examined reviews, peak times, and popular time zones for the selected parks through Google. This study analyzes users' behaviors, emerging activities, and satisfaction using SNS data. The findings are as follows. People using parks near residential areas tend to enjoy programs operated in indoor facilities and like to use picnic places. In parks adjacent to commercial areas, eating in the park and in extended areas beyond the park boundaries is found to be one of the popular park activities. Programs using open spaces and indoor facilities were active as well. Han River Park, as a detached park type, offers a popular venue for exercise and scenery appreciation. We also identified companionship characteristics of different park types from texts and photos, and extracted keywords of feelings and reviews about parks posted on third-generation SNS. SNS data can provide a basis for grasping behavioral patterns, satisfaction factors, and changes in park activities in real time. SNS data can also be used to set future directions in park planning and management in accordance with new technologies and policies.

Evaluation of Land Use Change Impact on Hydrology and Water Quality Health in Geum River Basin (금강유역의 토지이용 변화가 수문·수질 건전성에 미치는 영향 평가)

  • LEE, Ji-Wan;PARK, Jong-Yoon;JUNG, Chung-Gil;KIM, Seong-Joon
    • Journal of the Korean Association of Geographic Information Studies / v.22 no.2 / pp.82-96 / 2019
  • This study evaluated the status of watershed health in the Geum River Basin using SWAT (Soil and Water Assessment Tool) hydrology and water quality results. Watershed healthiness, derived from watershed hydrology and stream water quality, was calculated using a multivariate normal distribution on a scale from 0 (poor) to 1 (good). Before evaluating watershed healthiness, SWAT was calibrated for 11 years (2005~2015) of streamflow (Q) at 5 locations, with an average Nash-Sutcliffe model efficiency of 0.50~0.77, and of suspended solids (SS), total nitrogen (T-N), and total phosphorus (T-P) at 3 locations, with coefficients of determination ($R^2$) of 0.67~0.94, 0.59~0.79, and 0.61~0.79, respectively. The spatiotemporal change of watershed healthiness over 24 years (1985~2008) was analyzed with the calibrated SWAT and 5 land use data sets for 1985, 1990, 1995, 2000, and 2008. The 2008 SWAT results showed that surface runoff increased by 40.6%, while soil moisture and baseflow decreased by 6.8% and 3.0%, respectively, compared to the 1985 reference year. The stream water quality constituents SS, T-N, and T-P increased by 29.2%, 9.3%, and 16.7%, respectively, due to land development and agricultural activity. Relative to the 1985 land use condition, the 2008 watershed healthiness of hydrology and stream water quality decreased from 1 to 0.94 and 0.69, respectively. The results of this study can be used to detect changes in the watershed environment due to human activity compared to past natural conditions.
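
For readers unfamiliar with the calibration metric cited above, the Nash-Sutcliffe efficiency is conventionally defined as one minus the ratio of the squared simulation error to the variance of the observations around their mean. The sketch below uses that standard definition; the function name and sample values are illustrative and are not taken from the study.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    # Squared error of the simulation against the observations.
    error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Squared deviation of the observations from their own mean.
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - error / spread

# Hypothetical streamflow values.
observed = [1.0, 2.0, 3.0]
perfect = nash_sutcliffe(observed, [1.0, 2.0, 3.0])
mean_only = nash_sutcliffe(observed, [2.0, 2.0, 2.0])
```

A simulation that only reproduces the observed mean scores 0, so the reported range of 0.50~0.77 indicates that the calibrated model explains a substantial share of the observed streamflow variability.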

The Effect of Data Size on the k-NN Predictability: Application to Samsung Electronics Stock Market Prediction (데이터 크기에 따른 k-NN의 예측력 연구: 삼성전자주가를 사례로)

  • Chun, Se-Hak
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.239-251 / 2019
  • Statistical methods such as moving averages, Kalman filtering, exponential smoothing, regression analysis, and ARIMA (autoregressive integrated moving average) have been used for stock market prediction. However, these statistical methods have not produced superior performance. In recent years, machine learning techniques, including artificial neural networks, SVM, and genetic algorithms, have been widely used in stock market prediction. In particular, a case-based reasoning method known as k-nearest neighbor is also widely used for stock price prediction. Case-based reasoning retrieves several similar cases from previous cases when a new problem occurs, and combines the class labels of the similar cases to create a classification for the new problem. However, case-based reasoning has some problems. First, case-based reasoning tends to search for a fixed number of neighbors in the observation space and always selects the same number of neighbors rather than the most similar neighbors for the target case. So, case-based reasoning may have to take into account more cases even when fewer cases are applicable, depending on the subject. Second, case-based reasoning may select neighbors that are far away from the target case. Thus, case-based reasoning does not guarantee an optimal pseudo-neighborhood for various target cases, and predictability can be degraded by deviation from the desired similar neighbors. This paper examines how the size of the learning data affects stock price predictability with k-nearest neighbor, and compares the predictability of k-nearest neighbor with that of the random walk model according to the size of the learning data and the number of neighbors. In this study, Samsung Electronics stock prices were predicted using two types of learning dataset. For the prediction of the next day's closing price, we used four variables: opening value, daily high, daily low, and daily close.
In the first experiment, data from January 1, 2000 to December 31, 2017 were used for the learning process. In the second experiment, data from January 1, 2015 to December 31, 2017 were used for the learning process. The test data are from January 1, 2018 to August 31, 2018 for both experiments. We compared the performance of k-NN with the random walk model using the two learning datasets. The mean absolute percentage error (MAPE) was 1.3497 for the random walk model and 1.3570 for k-NN in the first experiment, when the learning data was small. However, the MAPE was 1.3497 for the random walk model and 1.2928 for k-NN in the second experiment, when the learning data was large. These results show that prediction power is higher when more learning data are used than when less learning data are used. This paper also shows that k-NN generally produces better predictive power than the random walk model for larger learning datasets, but not when the learning dataset is relatively small. Future studies need to consider macroeconomic variables related to stock price forecasting in addition to the opening, low, high, and closing prices. Also, to produce better results, it is recommended that k-nearest neighbor find its nearest neighbors using a second-step filtering method that considers fundamental economic variables, as well as a sufficient amount of learning data.
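
The comparison described in the abstract above rests on three simple pieces: a random walk forecast, a k-NN forecast, and MAPE as the error measure. The sketch below shows one plausible form of each. The abstract does not state the distance metric or how neighbors are combined, so Euclidean distance and a plain mean of the neighbors' next-day closes are assumptions, and all prices are made up.

```python
import math

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)

def random_walk_forecast(closes):
    """Random walk: the forecast for day t+1 is the close of day t."""
    return closes[:-1]

def knn_forecast(train_features, train_next_close, query, k=3):
    """Average the next-day closes of the k training days most similar to the query."""
    # Assumption: similarity is Euclidean distance over (open, high, low, close).
    order = sorted(range(len(train_features)),
                   key=lambda i: math.dist(train_features[i], query))
    nearest = order[:k]
    return sum(train_next_close[i] for i in nearest) / len(nearest)

# Hypothetical closing prices for the random walk baseline.
closes = [100.0, 102.0, 101.0]
rw_error = mape(closes[1:], random_walk_forecast(closes))

# Hypothetical (open, high, low, close) rows and their next-day closes.
features = [(1.0, 1.0, 1.0, 1.0), (2.0, 2.0, 2.0, 2.0), (10.0, 10.0, 10.0, 10.0)]
next_close = [1.5, 2.5, 10.5]
knn_value = knn_forecast(features, next_close, (1.4, 1.4, 1.4, 1.4), k=2)
```

Because the random walk forecast needs no training data, its MAPE is identical across both experiments in the abstract, while the k-NN error changes with the training window.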

A Recidivism Prediction Model Based on XGBoost Considering Asymmetric Error Costs (비대칭 오류 비용을 고려한 XGBoost 기반 재범 예측 모델)

  • Won, Ha-Ram;Shim, Jae-Seung;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.127-137 / 2019
  • Recidivism prediction has been a subject of constant research by experts since the early 1970s, and it has become more important as crimes committed by recidivists steadily increase. In particular, in the 1990s, after the US and Canada adopted the 'Recidivism Risk Assessment Report' as a decisive criterion in trials and parole screening, research on recidivism prediction became more active. In the same period, empirical studies on 'Recidivism Factors' also began in Korea. Although most recidivism prediction studies have so far focused on the factors of recidivism or the accuracy of recidivism prediction, it is important to minimize the misclassification cost, because recidivism prediction has an asymmetric error cost structure. In general, the cost of misclassifying people who will not reoffend as likely to reoffend is lower than the cost of misclassifying people who will reoffend as unlikely to do so, because the former only adds monitoring costs, while the latter increases social and economic costs. Therefore, in this paper, we propose an XGBoost (eXtreme Gradient Boosting; XGB) based recidivism prediction model that considers asymmetric error costs. In the first step of the model, XGB, recognized as a high-performance ensemble method in the field of data mining, was applied, and its results were compared with various prediction models such as LOGIT (logistic regression analysis), DT (decision trees), ANN (artificial neural networks), and SVM (support vector machines). In the next step, the classification threshold is optimized to minimize the total misclassification cost, which is the weighted average of FNE (False Negative Error) and FPE (False Positive Error). To verify the usefulness of the model, it was applied to a real recidivism prediction dataset.
As a result, it was confirmed that the XGB model not only showed better prediction accuracy than the other prediction models but also reduced the misclassification cost most effectively.
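
The threshold-optimization step in the abstract above can be illustrated independently of the classifier. The sketch below assumes predicted probabilities are already available (from XGBoost or any other model) and searches a grid of thresholds for the lowest weighted average of the false negative and false positive rates. The weights, grid resolution, and data are hypothetical, and the paper's exact cost definition may differ.

```python
def misclassification_cost(labels, scores, threshold, fn_weight, fp_weight):
    """Weighted average of the false negative rate and the false positive rate."""
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    fne = fn / positives if positives else 0.0
    fpe = fp / negatives if negatives else 0.0
    return (fn_weight * fne + fp_weight * fpe) / (fn_weight + fp_weight)

def best_threshold(labels, scores, fn_weight, fp_weight, steps=100):
    """Grid search over thresholds in [0, 1]; ties go to the lowest threshold."""
    candidates = [i / steps for i in range(steps + 1)]
    return min(candidates,
               key=lambda t: misclassification_cost(labels, scores, t, fn_weight, fp_weight))

# Hypothetical labels (1 = reoffended) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.35, 0.5, 0.3, 0.1]

# Assumption: a missed recidivist is weighted five times a false alarm.
threshold = best_threshold(labels, scores, fn_weight=5.0, fp_weight=1.0)
cost = misclassification_cost(labels, scores, threshold, 5.0, 1.0)
```

With the heavier false negative weight, the chosen threshold drops low enough to catch the recidivist scored at 0.35, accepting one false alarm in exchange.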

A Study on Conventional Expression of Hangul Ganchal and Email (조선시대 한글 간찰과 이메일의 상투적 표현 고찰)

  • Jeon, Byeong-yong
    • (The)Study of the Eastern Classic / no.49 / pp.431-459 / 2012
  • The purpose of this article is to compare and analyze the conventional expressions of Hangul Ganchal in the Joseon Dynasty and of email. Conventional expressions are used most noticeably in introductions and conclusions. In the introduction, they are used for the address and safety greetings, while in the conclusion, they are used for the closing address and closing words. In the Joseon Dynasty, the envelope of a Ganchal included only the details of the receiver, because the letter was delivered directly by someone who knew both the receiver and the sender very well. The envelope of a Ganchal corresponds to the internet screen used for emailing. In an email, we see the name of the sender and the title of the text, and once we click the title, we are able to view the text. The difference between the two is that the Ganchal shows the receiver's details while the email shows the sender's details. In addressing the recipient with a conventional expression, we can see that "To~" is used as a humble term and "~께" as an honorific term. We confirmed that conventional expressions for seasonal greetings have not yet settled in either Ganchal or email. The safety greetings comprise both the sender's and the receiver's latest news. In Ganchal, this composition is expressed conventionally, whereas in emails only the receiver's latest news is written and the sender's latest news is hard to find in the text. In the closing section of Ganchal, the closing address and closing words were expressed conventionally. In emails, however, these were again hard to find. To conclude, in Ganchal the conventional expressions developed and took hold in the 16th century (Sun-eon), when there was a focus on our native language.
In the 17th century (Hyeon-eon), they stood still for some time, and then moved on to the 19th century (Jing-eon), when there was a strong influence of Hangul Ganchal, which resulted in a regression to conservative expressions. In general, we are able to confirm that conventional expressions are slowly disappearing.