• Title/Summary/Keyword: statistical technique

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems / v.27 no.3 / pp.57-73 / 2021
  • Maintaining ICT infrastructure and preventing failures through anomaly detection is becoming increasingly important. System monitoring data are multidimensional time series, and it is difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, correlations between variables must be considered. Existing probability-based, linear, and distance-based methods degrade because of the curse of dimensionality. In addition, time series data are typically preprocessed with a sliding window and time series decomposition for autocorrelation analysis. These techniques increase the dimensionality of the data, so they need to be supplemented. Anomaly detection is a long-established research field; statistical methods and regression analysis were used in its early days, and machine learning and artificial neural networks are now being actively applied. Statistical methods are difficult to apply when the data are non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and then detect anomalies by comparing predicted values with actual values. Their performance drops when the model is not robust or when the data contain noise or outliers, which imposes the restriction that training data free of noise and outliers must be used. An autoencoder built on artificial neural networks is trained to produce output as similar as possible to its input. It has many advantages over existing probabilistic and linear models, cluster analysis, and supervised learning. It can be applied to data that do not satisfy distributional or linearity assumptions.
In addition, it can be trained in an unsupervised manner, without labeled data. However, autoencoder-based anomaly detection is limited in identifying local outliers in multidimensional data, and the dimensionality of the data increases greatly because of the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the autoencoder's bottleneck, which lets the model learn the correlations between them. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimensionality of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparing it with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoders for 41 variables was checked in the proposed model and the comparison models. Reconstruction performance differs by variable: reconstruction works well, with small loss values, for the Memory, Disk, and Network modalities in all three autoencoder models. The Process modality did not show a significant difference across the three models, and the CPU modality showed excellent performance in CMAE. ROC curves were prepared to evaluate the anomaly detection performance of the proposed model and the comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order CMAE, MAE, and AE.
In particular, recall was 0.9828 for CMAE, which means it detects almost all of the anomalies. Accuracy also improved, to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. From a practical standpoint, the proposed model offers an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause can slow computation during inference. The proposed model is easy to apply to practical tasks in terms of inference speed and model management.
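
The abstract describes detecting anomalies from how well an autoencoder reconstructs its input. The following is a minimal sketch of that general idea only, not the authors' CMAE: it substitutes a rank-k linear encoder/decoder (PCA via `numpy.linalg.svd`) for the neural network, uses synthetic data, and assumes a 99th-percentile threshold on the training reconstruction error. All function names, the data generator, and the threshold rule are illustrative assumptions.

```python
import numpy as np

def fit_linear_autoencoder(X, k):
    """Fit a rank-k linear encoder/decoder (PCA) on data assumed to be normal."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions; the top k act as the bottleneck.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def reconstruction_error(X, mean, components):
    """Mean squared error between each row and its reconstruction."""
    codes = (X - mean) @ components.T      # encode into k dimensions
    X_hat = codes @ components + mean      # decode back to the input space
    return np.mean((X - X_hat) ** 2, axis=1)

def sample_metrics(rng, mixing, n):
    """Synthetic monitoring data: 5 metrics driven by 2 shared latent factors."""
    latent = rng.normal(size=(n, 2))
    return latent @ mixing + 0.05 * rng.normal(size=(n, mixing.shape[1]))

rng = np.random.default_rng(0)
mixing = rng.normal(size=(2, 5))
X_train = sample_metrics(rng, mixing, 500)
mean, components = fit_linear_autoencoder(X_train, k=2)

# Assumed rule: flag anything above the 99th percentile of training error.
threshold = np.quantile(reconstruction_error(X_train, mean, components), 0.99)

X_normal = sample_metrics(rng, mixing, 200)
# Anomalies: independent noise that breaks the correlation between metrics.
X_anomalous = sample_metrics(rng, mixing, 200) + rng.normal(scale=2.0, size=(200, 5))

flag_normal = reconstruction_error(X_normal, mean, components) > threshold
flag_anomalous = reconstruction_error(X_anomalous, mean, components) > threshold

tp, fn = flag_anomalous.sum(), (~flag_anomalous).sum()
fp = flag_normal.sum()
recall = tp / (tp + fn)
precision = tp / (tp + fp)
f1 = 2 * precision * recall / (precision + recall)
print(f"recall={recall:.3f} precision={precision:.3f} f1={f1:.3f}")
```

Because the bottleneck only keeps the directions shared across metrics, a sample whose metrics stop moving together reconstructs poorly, which is the property the abstract relies on when it says the shared bottleneck learns correlation.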

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • Academic studies on predicting the success of customer campaigns have been conducted for a long time, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded with the rapid growth of online business, companies are running campaigns of many types at a scale that cannot be compared with the past. However, as fatigue from repeated exposure increases, customers tend to perceive campaigns as spam. From a corporate standpoint, the effectiveness of campaigns is also decreasing: investment costs rise while the actual campaign success rate stays low. Accordingly, various studies are ongoing to improve campaign effectiveness in practice. The ultimate purpose of a campaign system is to increase the success rate of campaigns by collecting and analyzing customer-related data and using them for campaigns. In particular, there have been recent attempts to predict campaign response using machine learning. Because campaign data contain many different features, selecting appropriate features is very important. If all of the input data are used when classifying a large amount of data, training time grows as the number of classes expands, so a minimal input data set must be extracted from the entire data and used. In addition, when a model is trained with too many features, prediction accuracy may degrade because of overfitting or correlation between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step in analyzing a high-dimensional data set.
Among greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), and SFFS (Sequential Floating Forward Selection) are widely used as traditional feature selection techniques. However, when there are many features, these methods are limited by poor classification performance and long training time. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The purpose of this study is to improve the existing SFFS sequential method in the search for feature subsets that underpin machine learning model performance, using the statistical characteristics of the data processed in the campaign system. Features that strongly influence performance are derived first, features that have a negative effect are removed, and the sequential method is then applied, which makes the search more efficient and allows the improved algorithm to produce generalized predictions. The proposed model was confirmed to show better search and prediction performance than the traditional greedy algorithms. Campaign success prediction was higher than with the original data set, the greedy algorithm, the genetic algorithm (GA), and recursive feature elimination (RFE). In addition, when predicting campaign success, the improved feature selection algorithm was found to help in analyzing and interpreting the prediction results by providing the importance of the derived features. These included features such as age, customer rating, and sales, which were already known to be statistically important.
In addition, features that campaign planners had rarely used to select campaign targets, such as the combined product name, the average 3-month data consumption rate, and the last 3 months of wireless data usage, were unexpectedly selected as important features for campaign response. It was confirmed that base attributes can also be very important features, depending on the type of campaign. This makes it possible to analyze and understand the important characteristics of each campaign type.
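
For readers unfamiliar with SFFS, the baseline that the abstract sets out to improve can be sketched as follows. This is a plain textbook-style SFFS, not the authors' improved algorithm. The scoring criterion (R² of a least-squares fit), the synthetic data, and all names are assumptions made for illustration; the paper's task is classification, and a classifier's validation score would take the place of `r2_of_subset` there.

```python
import numpy as np

def r2_of_subset(X, y, subset):
    """Illustrative criterion: R^2 of a least-squares fit on the chosen columns."""
    A = np.column_stack([X[:, subset], np.ones(len(y))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

def sffs(n_features, score, k):
    """Sequential Floating Forward Selection of a subset of size k."""
    selected = []
    best = {}  # best[size] = (score, subset): best subset seen for each size
    while len(selected) < k:
        # Forward step: add the single feature that raises the score most.
        candidates = [f for f in range(n_features) if f not in selected]
        f_add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [f_add]
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))
        # Floating step: drop the least useful feature while doing so beats
        # the best subset previously recorded for the smaller size.
        while len(selected) > 2:
            f_del = max(selected,
                        key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_del]
            s_reduced = score(reduced)
            if s_reduced > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_reduced, list(reduced))
            else:
                break
    return best[k][1], best[k][0]

# Synthetic example: only columns 0 and 3 actually drive the target.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 6))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

subset, subset_score = sffs(X.shape[1], lambda s: r2_of_subset(X, y, s), k=2)
print(sorted(subset), round(subset_score, 3))
```

The floating step is what distinguishes SFFS from plain SFS: a feature added early can be removed later if a better subset of the same size becomes reachable, at the cost of extra score evaluations. That cost is the training-time limitation the abstract mentions.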

Consumer Responses to Retailer's Location-based Mobile Shopping Service : Focusing on PAD Emotional State Model and Information Relevance (유통업체의 위치기반 모바일 쇼핑서비스 제공에 대한 소비자 반응 : PAD 감정모델과 정보의 상황관련성을 중심으로)

  • Lee, Hyun-Hwa;Moon, Hee-Kang
    • Journal of Distribution Research / v.17 no.2 / pp.63-92 / 2012
  • This study investigated consumer intention to use a location-based mobile shopping service (LBMSS) that integrates cognitive and affective responses. Information relevancy was integrated into the pleasure-arousal-dominance (PAD) emotional state model as the conceptual framework of the present study. The results of an online survey of 335 mobile phone users in the U.S. indicated positive effects of arousal and information relevancy on pleasure. In addition, there was a significant relationship between pleasure and intention to use an LBMSS. However, the relationship between dominance and pleasure was not statistically significant. The results of the present study provide insight to retailers and marketers as to what factors they need to consider when implementing location-based mobile shopping services to improve their business performance. Extended Abstract: Location-aware technology has expanded the marketer's reach by reducing the space and time between a consumer's receipt of advertising and purchase, offering real-time information and coupons to consumers in purchasing situations (Dickenger and Kleijnen, 2008; Malhotra and Malhotra, 2009). LBMSS increases the relevancy of SMS marketing by linking advertisements to a user's location (Bamba and Barnes, 2007; Malhotra and Malhotra, 2009). This study investigated consumer intention to use a location-based mobile shopping service (LBMSS) that integrates cognitive and affective responses. The purpose of the study was to examine the relationships among information relevancy and affective variables and their effects on intention to use LBMSS. Thus, information relevancy was integrated into the pleasure-arousal-dominance (PAD) model, which generated the following hypotheses. Hypothesis 1. There will be a positive influence of arousal concerning LBMSS on pleasure in regard to LBMSS. Hypothesis 2. There will be a positive influence of dominance in LBMSS on pleasure in regard to LBMSS. Hypothesis 3.
There will be a positive influence of information relevancy on pleasure in regard to LBMSS. Hypothesis 4. There will be a positive influence of pleasure about LBMSS on intention to use LBMSS. E-mail invitations were sent to a randomly selected sample of three thousand consumers who were older than 18 and owned a mobile phone, acquired from an independent marketing research company. An online survey was conducted using Dillman's (2000) online survey method and follow-ups. A total of 335 valid responses were used for the data analysis in the present study. Before answering any questions, the respondents were asked to read a document describing LBMSS. The document included definitions and examples of LBMSS provided by various service providers. After that, they were exposed to a scenario describing the participant as taking a Saturday shopping trip to a mall and then receiving a short message from the mall. The short message included new product information and coupons for same-day use at participating stores. They then completed a questionnaire. To assess arousal, dominance, and pleasure, we adapted scales used in previous studies to the context of location-based mobile shopping services, using five items each from Mehrabian and Russell (1974). A total of 15 items were measured on a seven-point bipolar scale. To measure information relevancy, four items were borrowed from Mason et al. (1995). Intention to use LBMSS was captured using two items developed by Blackwell, and Miniard (1995) and one item developed by the authors. Data analyses were conducted using SPSS 19.0 and LISREL 8.72. A total of 335 usable responses were obtained after deleting incomplete responses, which results in a response rate of 11.20%. A little over half of the respondents were male (53.9%) and approximately 60% of respondents were married (57.4%).
The mean age of the sample was 29.44 years, with a range from 19 to 60 years. In terms of ethnicity, the sample included European Americans (54.5%), Hispanic Americans (5.3%), African Americans (3.6%), and Asian Americans (2.9%). The respondents were highly educated: 62.5% of participants reported holding a college degree or its equivalent, and 14.5% had a graduate degree. The sample represents all income categories: less than $24,999 (10.8%), $25,000-$49,999 (28.34%), $50,000-$74,999 (13.8%), and $75,000 or more (10.23%). The respondents indicated that they were employed in many occupations. Responses came from 42 states in the U.S. To identify the dimensions of the research constructs, Exploratory Factor Analysis (EFA) with a varimax rotation was conducted. As indicated in Table 1, the dimensions suggested by the EFA (arousal, dominance, relevancy, pleasure, and intention to use) explained 82.29% of the total variance, with factor loadings ranging from .74 to .89. As a next step, CFA was conducted to validate the dimensions identified in the exploratory factor analysis and to further refine the scale. Table 1 presents the results of the measurement model analysis, which revealed a chi-square of 202.13 with 89 degrees of freedom (p = .002), GFI of .93, AGFI of .89, CFI of .99, and NFI of .98, indicating a good fit of the model to the data (Bagozzi and Yi, 1998; Hair et al., 1998). As Table 1 shows, reliability was estimated with Cronbach's alpha and composite reliability (CR) for all multi-item scales. All values showed satisfactory reliability of the multi-item measures for alpha (>.91) and CR (>.80). In addition, we tested the convergent validity of the measures using average variance extracted (AVE), following the recommendations of Fornell and Larcker (1981).
The AVE values for the model constructs ranged from .74 through .85, which are higher than the threshold suggested by Fornell and Larcker (1981). To examine the discriminant validity of the measures, we again followed the recommendations of Fornell and Larcker (1981). The shared variances between constructs were smaller than the AVE of the research constructs, which confirms the discriminant validity of the measures. The causal model was tested using LISREL 8.72 with maximum-likelihood estimation. Table 2 shows the results of the hypothesis testing. The results for the conceptual model revealed good overall fit for the proposed model. Chi-square was 342.00 (df = 92, p = .000), NFI was .97, NNFI was .97, GFI was .89, AGFI was .83, and RMSEA was .08. All paths in the proposed model received significant statistical support except H2. The paths from arousal to pleasure (H1: $\beta$ = .70; t = 11.44), from information relevancy to intention to use (H3: $\beta$ = .12; t = 2.36), from information relevancy to pleasure (H4: $\beta$ = .15; t = 2.86), and from pleasure to intention to use (H5: $\beta$ = .54; t = 9.05) were significant. However, the path from dominance to pleasure was not supported. This study investigated consumer intention to use a location-based mobile shopping service (LBMSS) that integrates cognitive and affective responses. Information relevancy was integrated into the pleasure-arousal-dominance (PAD) emotional state model as a conceptual framework. The results of the present study support previous studies indicating that emotional responses as well as cognitive responses have a strong impact on the acceptance of new technology. The findings of this study suggest potential marketing strategies to mobile service developers and retailers who are considering implementing LBMSS. It would be rewarding to develop location-based mobile services that integrate information relevancy and evoke positive emotional responses.
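
The reliability and validity checks attributed to Fornell and Larcker (1981) can be illustrated with a short calculation. The sketch below uses the commonly cited formulas for standardized loadings: AVE as the mean squared loading, composite reliability as (Σλ)² / ((Σλ)² + Σ(1 − λ²)), and discriminant validity as the squared inter-construct correlation being smaller than both AVEs. The loadings and the correlation are hypothetical values chosen for illustration; they are not the study's data, and the function names are assumptions.

```python
def average_variance_extracted(loadings):
    """AVE for standardized loadings: the mean of the squared loadings."""
    return sum(l * l for l in loadings) / len(loadings)

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1.0 - l * l for l in loadings)  # error variance per item
    return total * total / (total * total + error)

def discriminant_validity_holds(ave_a, ave_b, correlation):
    """Fornell-Larcker criterion: shared variance is below both AVEs."""
    return correlation ** 2 < min(ave_a, ave_b)

# Hypothetical standardized loadings for two constructs (not from the study).
loadings_a = [0.80, 0.85, 0.90]
loadings_b = [0.75, 0.80, 0.85]
correlation_ab = 0.60  # hypothetical inter-construct correlation

ave_a = average_variance_extracted(loadings_a)
ave_b = average_variance_extracted(loadings_b)
cr_a = composite_reliability(loadings_a)

print(f"AVE(a)={ave_a:.3f} AVE(b)={ave_b:.3f} CR(a)={cr_a:.3f}")
print("discriminant validity:",
      discriminant_validity_holds(ave_a, ave_b, correlation_ab))
```

With these made-up numbers both AVEs exceed the conventional 0.50 cutoff and the shared variance (0.36) is below both, which is the same pattern the abstract reports for its own constructs.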
