• Title/Summary/Keyword: Variable selection


A Study on Social Issues and Consumption Behavior Using Big Data (빅데이터를 활용한 사회적 이슈와 소비행동 연구)

  • Baek, Seung-Heon;Kim, Gi-Tak
    • Journal of Korea Entertainment Industry Association / v.13 no.8 / pp.377-389 / 2019
  • This study applied social network big data analysis to investigate consumers' perceptions of Japanese sporting goods in the context of the boycott of Japanese products, and to extract the related issues and variables. The analysis covered two topics, "Japanese boycott" and "Japanese sporting goods", using data collected over several months. The research procedure was as follows: identifying the issues of the times; setting keywords through social network analysis; clustering by CONCOR analysis with the TEXTOM and Ucinet 6 programs; selecting variables through expert meetings; preparing and administering the questionnaire; verifying the validity and reliability of the questionnaire; and testing hypotheses with a structural equation model. Based on the social network big data results, four variables were extracted: relevant characteristics, nationality, attitude, and consumption behavior. A total of 30 questions and 292 questionnaires were used for the final hypothesis tests. The analysis showed, first, that the boycott-related characteristics had a positive relationship with nationality. Specifically, all of the boycott-related characteristics (necessity of the boycott, sense of boycott, and perceived boycott benefits) were positively related to nationality. In addition, nationality was found to have a positive relationship with consumption behavior.

The Effect on the Switching Intention to the Blockchain-based Supply Chain Management Information System (블록체인 기반 공급망관리 정보시스템으로의 전환의도에 영향을 미치는 요인)

  • Kyoung Sang Oh;Dong Myung Lee
    • Journal of Industrial Convergence / v.20 no.12 / pp.11-25 / 2022
  • This study examines the factors that affect the intention to switch to a blockchain-based supply chain management information system. To this end, variables were selected and a research model was constructed through a review of previous studies, and an empirical analysis was conducted using the TOE framework and the PPM model. The effects of Push and Pull factors on the intention to switch to the blockchain system, and the moderating effect of switching cost, a Mooring factor, were tested. The hypotheses were tested with a structural equation model on 320 responses from a questionnaire survey of small and medium-sized enterprises located in Korea. The results show that social influence, a Push factor, and management's will to innovate, a Pull factor, had a significant effect on switching intention. A moderating effect between the groups with high and low perceived switching cost was also confirmed. The study is significant in that it presents the concept and research direction of SCBM (supply chain & blockchain management), which can enhance a company's competitiveness through the implementation of a blockchain-based supply chain management information system.

Prediction of Dormant Customer in the Card Industry (카드산업에서 휴면 고객 예측)

  • DongKyu Lee;Minsoo Shin
    • Journal of Service Research and Studies / v.13 no.2 / pp.99-113 / 2023
  • In a customer-based industry, customer retention determines a company's competitiveness, so accurately predicting and managing potential dormant customers is paramount. In particular, the domestic card industry has numerous competitors, and the government is introducing an automatic closing system for dormant card management. As a result of these changes, the card industry must focus on better predicting and managing potential dormant cards, and predicting dormant customers is emerging as an important challenge. In this study, a recurrent neural network (RNN) methodology was used to predict potential dormant customers in the card industry; in particular, Long Short-Term Memory (LSTM) was used to learn efficiently from data spanning long periods. In addition, the Unified Theory of Acceptance and Use of Technology (UTAUT), an integrated technology acceptance theory, was applied to redefine and group the variables used in the model. As a result, stable model accuracy and F1 score were obtained, and the hit ratio showed that models using LSTM can produce stable results compared with other algorithms. It was also found that the moderating effect of demographic information that can occur in UTAUT, as pointed out in previous studies, was absent. Therefore, among variable selection models using UTAUT, dormant customer prediction models using LSTM were shown to produce unbiased, stable results. The study makes an academic contribution by predicting dormant customers with LSTM algorithms, which can learn well from time series data that had not previously been tried. It is also a good example showing that, because dormancy is predicted ahead of the time it is actually detected, companies can respond preemptively to customers likely to become dormant, and it is expected to contribute greatly to the industry.

Preoperative Prediction for Early Recurrence Can Be as Accurate as Postoperative Assessment in Single Hepatocellular Carcinoma Patients

  • Dong Ik Cha;Kyung Mi Jang;Seong Hyun Kim;Young Kon Kim;Honsoul Kim;Soo Hyun Ahn
    • Korean Journal of Radiology / v.21 no.4 / pp.402-412 / 2020
  • Objective: To evaluate the performance of predicting early recurrence using preoperative factors only, in comparison with using both pre- and postoperative factors. Materials and Methods: We retrospectively reviewed 549 patients who had undergone curative resection for single hepatocellular carcinoma (HCC) within the Milan criteria. Multivariable analysis was performed to identify pre-/postoperative high-risk factors of early recurrence after hepatic resection for HCC. Two prediction models for early HCC recurrence, determined by stepwise variable selection based on the Akaike information criterion, were built: one based on preoperative factors alone and one based on both pre-/postoperative factors. The area under the curve (AUC) for each receiver operating characteristic curve of the two models was calculated, and the two curves were compared for non-inferiority testing. The predictive models of early HCC recurrence were internally validated by the bootstrap resampling method. Results: Multivariable analysis on preoperative factors alone identified aspartate aminotransferase/platelet ratio index (OR, 1.632; 95% CI, 1.056-2.522; p = 0.027), tumor size (OR, 1.025; 95% CI, 1.002-1.049; p = 0.031), arterial rim enhancement of the tumor (OR, 2.350; 95% CI, 1.297-4.260; p = 0.005), and presence of nonhypervascular hepatobiliary hypointense nodules (OR, 1.983; 95% CI, 1.049-3.750; p = 0.035) on gadoxetic acid-enhanced magnetic resonance imaging as significant factors. After adding postoperative histopathologic factors, presence of microvascular invasion (OR, 1.868; 95% CI, 1.155-3.022; p = 0.011) became an additional significant factor, while tumor size became insignificant (p = 0.119). Comparison of the AUCs of the two models showed that the prediction model built on preoperative factors alone was not inferior to that including both pre-/postoperative factors {AUC for preoperative factors only, 0.673 (95% confidence interval [CI], 0.623-0.723) vs. AUC after adding postoperative factors, 0.691 (95% CI, 0.639-0.744); p = 0.0013}. The bootstrap resampling method showed that both models were valid. Conclusion: Risk stratification based solely on preoperative imaging and laboratory factors was not inferior to that based on postoperative histopathologic risk factors in predicting early recurrence after curative resection in patients with a single HCC within the Milan criteria.
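As a rough illustration of the two quantities the abstract reports, the sketch below computes an AUC by pairwise comparison and a percentile bootstrap interval around it. It is not the authors' procedure: the function names, the toy scores and labels, and the choice of a percentile interval are all assumptions made for the example, and the non-inferiority test itself is not reproduced.

```python
import random

def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_interval(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC (resampling cases with replacement)."""
    rng = random.Random(seed)
    n = len(scores)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Hypothetical predicted risks and outcomes (1 = early recurrence), not study data
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
point_estimate = auc(scores, labels)
interval = bootstrap_auc_interval(scores, labels)
print(point_estimate, interval)
```

The pairwise form is quadratic in the number of cases, which is fine for a few hundred patients; a rank-based formula would be the usual choice for larger samples.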

Analysis of Elderly Pedestrian Traffic Accident Factors Based on Logistic Regression (로지스틱 회귀분석 기반 노인 보행자 교통사고 요인 분석)

  • Siwon Kim;Jeongwon Gil;Jaekyung Kwon;Jae seong Hwang;Choul ki Lee
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.23 no.2 / pp.15-31 / 2024
  • This study identified the characteristics of elderly traffic accidents in light of Korea's elderly population as the country enters an ultra-aged society. Elderly pedestrian traffic accidents were classified, as a binary variable, into accidents of serious or higher severity and accidents of minor or lower severity, and the relationship between the independent and dependent variables was analyzed. Data on elderly pedestrian traffic accidents for the past 10 years (2013 to 2022) were acquired from the Traffic Accident Analysis System (TAAS); data collection, processing, and variable selection were performed; and basic statistics and analyses by accident factor were produced. Applying the logistic regression model yielded a total of 15 influencing variables, and those with the greatest influence on the probability of a serious or more severe elderly pedestrian accident were identified. Statistical tests were then performed to assess the fit of the logistic model, and a method for predicting accident probability using the resulting prediction model was presented.
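To make the link between a fitted logistic model and a predicted accident probability concrete, here is a minimal sketch. The intercept, the two variable names, and their coefficients are hypothetical placeholders, not values from the study, which reports 15 influencing variables without listing them here.

```python
import math

def accident_probability(intercept, coefficients, case):
    """P(serious or worse accident) under a logistic model: 1 / (1 + exp(-z))."""
    z = intercept + sum(beta * case.get(name, 0.0)
                        for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-z))

def odds_ratios(coefficients):
    """Each coefficient beta corresponds to an odds ratio exp(beta)."""
    return {name: math.exp(beta) for name, beta in coefficients.items()}

# Hypothetical fitted values (assumptions for illustration only)
intercept = -1.0
coefficients = {"night_time": 0.8, "crossing_road": 0.5}

baseline = accident_probability(intercept, coefficients, {})
at_night = accident_probability(intercept, coefficients, {"night_time": 1})
print(baseline, at_night, odds_ratios(coefficients))
```

Ranking variables by odds ratio is one common way to read off which factors have the greatest influence, provided the predictors are on comparable scales.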

Associations between income and survival in cholangiocarcinoma: A comprehensive subtype-based analysis

  • Calvin X. Geng;Anuragh R. Gudur;Jagannath Kadiyala;Daniel S. Strand;Vanessa M. Shami;Andrew Y. Wang;Alexander Podboy;Tri M. Le;Matthew Reilley;Victor Zaydfudim;Ross C. D. Buerlein
    • Annals of Hepato-Biliary-Pancreatic Surgery / v.28 no.2 / pp.144-154 / 2024
  • Backgrounds/Aims: Socioeconomic determinants of health are incompletely characterized in cholangiocarcinoma (CCA). We assessed how socioeconomic status influences initial treatment decisions and survival outcomes in patients with CCA, additionally performing multiple sub-analyses based on anatomic location of the primary tumor. Methods: Observational study using the 2018 submission of the Surveillance, Epidemiology, and End Results (SEER)-18 Database. In total, 5,476 patients from 2004-2015 with a CCA were separated based on median household income (MHI) into low income (< 25th percentile of MHI) and high income (> 25th percentile of MHI) groups. Seventy-three percent of patients had complete follow up data, and were included in survival analyses. Survival and treatment outcomes were calculated using R-studio. Results: When all cases of CCA were included, the high-income group was more likely than the low-income to receive surgery, chemotherapy, and local tumor destruction modalities. Initial treatment modality based on income differed significantly between tumor locations. Patients of lower income had higher overall and cancer-specific mortality at 2 and 5 years. Non-cancer mortality was similar between the groups. Survival differences identified in the overall cohort were maintained in the intrahepatic CCA subgroup. No differences between income groups were noted in cancer-specific or overall mortality for perihilar tumors, with variable differences in the distal cohort. Conclusions: Lower income was associated with higher rates of cancer-specific mortality and lower rates of surgical resection in CCA. There were significant differences in treatment selection and outcomes between intrahepatic, perihilar, and distal tumors. Population-based strategies aimed at identifying possible etiologies for these disparities are paramount to improving patient outcomes.

Development of Bond Strength Model for FRP Plates Using Back-Propagation Algorithm (역전파 학습 알고리즘을 이용한 콘크리트와 부착된 FRP 판의 부착강도 모델 개발)

  • Park, Do-Kyong
    • Journal of the Korea institute for structural maintenance and inspection / v.10 no.2 / pp.133-144 / 2006
  • To determine bond strength, previous researchers have examined the bond strength of FRP plates experimentally under various parameter settings. However, because such experiments are costly in equipment, time-consuming, and difficult to carry out, they have been conducted only to a limited extent. This study aims to develop the most suitable artificial neural network model by applying various neural network models and algorithms to the bond test data of previous researchers. The network was trained with bond strength as the output layer and, as input-layer variables, the thickness, width, bonded length, modulus of elasticity, and tensile strength of the FRP plate together with the compressive strength, tensile strength, and width of the concrete. The developed artificial neural network model used back-propagation and was trained until its error converged to within 0.001. In addition, the over-fitting problem was resolved in the generalization process by introducing a Bayesian technique. The developed model was verified by comparison with bond strength values reported by other researchers that had not been used in training.

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM (다분류 SVM을 이용한 DEA기반 벤처기업 효율성등급 예측모형)

  • Park, Ji-Young;Hong, Tae-Ho
    • Asia pacific journal of information systems / v.19 no.2 / pp.139-155 / 2009
  • For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of such companies' competitive advantages over their rivals. Such venture companies have shown a tendency to give high returns to investors, generally by making the best use of information technology. For this reason, many venture companies are keen on attracting investors' attention. Investors generally make their investment decisions by carefully examining the evaluation criteria of the alternatives. To them, credit rating information provided by international rating agencies such as Standard and Poor's, Moody's, and Fitch is a crucial source on such pivotal concerns as companies' stability, growth, and risk status. But this type of information is generated only for companies issuing corporate bonds, not for venture companies. Therefore, this study proposes a method for evaluating venture businesses by presenting our recent empirical results using financial data of Korean venture companies listed on KOSDAQ in the Korea Exchange. In addition, this paper used multi-class SVM to predict the DEA-based efficiency rating for venture businesses derived from our proposed method. Our approach sheds light on ways to locate efficient companies generating high levels of profit. Above all, in determining effective ways to evaluate a venture firm's efficiency, it is important to understand the major contributing factors of such efficiency. Therefore, this paper is built on the following two ideas for classifying which venture companies are more efficient: i) making a DEA-based multi-class rating for sample companies and ii) developing a multi-class SVM-based efficiency prediction model for classifying all companies.
First, Data Envelopment Analysis (DEA) is a non-parametric multiple input-output efficiency technique that measures the relative efficiency of decision making units (DMUs) using a linear programming based model. It is non-parametric because it requires no assumption on the shape or parameters of the underlying production function. DEA has already been widely applied for evaluating the relative efficiency of DMUs. Recently, a number of DEA-based studies have evaluated the efficiency of various types of companies, such as internet companies and venture companies. It has also been applied to corporate credit ratings. In this study we utilized DEA to sort venture companies by efficiency-based ratings. The Support Vector Machine (SVM), on the other hand, is a popular technique for solving data classification problems. In this paper, we employed SVM to classify the efficiency ratings of IT venture companies according to the results of DEA. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is based on a statistical theory. Thus far, the method has shown good performance, especially in generalization capacity in classification tasks, resulting in numerous applications in many areas of business. SVM is basically an algorithm that finds the maximum margin hyperplane, which gives the maximum separation between classes. The support vectors are the points closest to the maximum margin hyperplane. If the classes cannot be separated linearly, we can use a kernel function: in the case of nonlinear class boundaries, the original input space is mapped into a high-dimensional dot-product feature space. Many studies have applied SVM to bankruptcy prediction, financial time series forecasting, and credit rating estimation. In this study we employed SVM to develop a data mining-based efficiency prediction model.
We used the Gaussian radial basis function as the kernel function of SVM. For multi-class SVM, we adopted the one-against-one approach among binary classification methods and two all-together methods, proposed by Weston and Watkins (1999) and Crammer and Singer (2000), respectively. In this research, we used corporate information of 154 companies listed on the KOSDAQ market in the Korea Exchange. We obtained the companies' financial information for 2005 from KIS (Korea Information Service, Inc.). Using these data, we made a multi-class rating with DEA efficiency and built a data mining-based multi-class prediction model. Among the three multi-class methods, the hit ratio of the Weston and Watkins method was the best on the test data set. In multi-class problems such as efficiency ratings of venture businesses, it is very useful for investors to know the class to within a one-class error when it is difficult to determine the exact class in the actual market. So we presented accuracy results within 1-class errors, and the Weston and Watkins method showed 85.7% accuracy on our test samples. We conclude that the DEA-based multi-class approach in venture business generates more information than the binary classification problem, notwithstanding its efficiency level. We believe this model can help investors in decision making, as it provides a reliable tool to evaluate venture companies in the financial domain. For future research, we perceive the need to improve such areas as the variable selection process, the parameter selection of the kernel function, the generalization, and the sample size of the multi-class setting.
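The "accuracy within 1-class errors" reported above can be illustrated with a short sketch. It assumes ratings are encoded as consecutive integers (1 = most efficient, and so on); the function name, the encoding, and the toy ratings are assumptions for the example and are not taken from the paper.

```python
def hit_ratio(actual, predicted, tolerance=0):
    """Share of cases whose predicted rating lies within `tolerance` classes of the actual rating."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)

# Hypothetical ordinal efficiency ratings (not data from the study)
actual = [1, 2, 3, 4, 2, 3, 1]
predicted = [1, 3, 3, 2, 2, 4, 1]

exact = hit_ratio(actual, predicted)                    # exact-class hit ratio
within_one = hit_ratio(actual, predicted, tolerance=1)  # allows a one-class error
print(exact, within_one)
```

The within-one figure is only meaningful when the classes are ordered, as efficiency ratings are; for unordered classes the distance between labels carries no information.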

Simultaneous Optimization of KNN Ensemble Model for Bankruptcy Prediction (부도예측을 위한 KNN 앙상블 모형의 동시 최적화)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.139-157 / 2016
  • Bankruptcy involves considerable costs, so it can have significant effects on a country's economy. Thus, bankruptcy prediction is an important issue. Over the past several decades, many researchers have addressed topics associated with bankruptcy prediction. Early research on bankruptcy prediction employed conventional statistical methods such as univariate analysis, discriminant analysis, multiple regression, and logistic regression. Later on, many studies began utilizing artificial intelligence techniques such as inductive learning, neural networks, and case-based reasoning. Currently, ensemble models are being utilized to enhance the accuracy of bankruptcy prediction. Ensemble classification involves combining multiple classifiers to obtain more accurate predictions than those obtained using individual models. Ensemble learning techniques are known to be very useful for improving the generalization ability of the classifier. Base classifiers in the ensemble must be as accurate and diverse as possible in order to enhance the generalization ability of an ensemble model. Commonly used methods for constructing ensemble classifiers include bagging, boosting, and random subspace. The random subspace method selects a random feature subset for each classifier from the original feature space to diversify the base classifiers of an ensemble. Each ensemble member is trained by a randomly chosen feature subspace from the original feature set, and predictions from each ensemble member are combined by an aggregation method. The k-nearest neighbors (KNN) classifier is robust with respect to variations in the dataset but is very sensitive to changes in the feature space. For this reason, KNN is a good classifier for the random subspace method. The KNN random subspace ensemble model has been shown to be very effective for improving an individual KNN model. 
The k parameter of KNN base classifiers and the feature subsets selected for base classifiers play an important role in determining the performance of the KNN ensemble model. However, few studies have focused on optimizing the k parameter and feature subsets of base classifiers in the ensemble. This study proposed a new ensemble method that improves the performance of the KNN ensemble model by optimizing both the k parameters and the feature subsets of base classifiers. A genetic algorithm was used to optimize the KNN ensemble model and improve its prediction accuracy. The proposed model was applied to a bankruptcy prediction problem using a real dataset from Korean companies. The research data included 1800 externally non-audited firms, of which 900 filed for bankruptcy and 900 did not. Initially, the dataset consisted of 134 financial ratios. Prior to the experiments, 75 financial ratios were selected based on an independent sample t-test of each financial ratio as an input variable and bankruptcy or non-bankruptcy as an output variable. Of these, 24 financial ratios were selected by using a logistic regression backward feature selection method. The complete dataset was separated into two parts: training and validation. The training dataset was further divided into two portions: one for training the model and the other to avoid overfitting. The prediction accuracy on the latter portion was used as the fitness value in order to avoid overfitting. The validation dataset was used to evaluate the effectiveness of the final model. A 10-fold cross-validation was implemented to compare the performances of the proposed model and other models. To evaluate the effectiveness of the proposed model, its classification accuracy was compared with that of other models. The Q-statistic values and average classification accuracies of base classifiers were investigated. 
The experimental results showed that the proposed model outperformed other models, such as the single model and random subspace ensemble model.
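The random subspace idea described above can be sketched in a few lines. This is only the baseline ensemble (random feature subsets, one fixed k, majority voting); the genetic-algorithm optimization of k and the subsets that the study proposes is not shown. All function names and the toy data are assumptions for illustration.

```python
import math
import random
from collections import Counter

def knn_predict(train_X, train_y, x, k, features):
    """Majority vote of the k nearest training points, with Euclidean
    distance measured on the given feature subset only."""
    dists = sorted(
        (math.sqrt(sum((row[f] - x[f]) ** 2 for f in features)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

def build_random_subspace_ensemble(n_features, n_members, subset_size, k, seed=0):
    """Each member is a (k, feature subset) pair with a randomly drawn subset."""
    rng = random.Random(seed)
    return [(k, sorted(rng.sample(range(n_features), subset_size)))
            for _ in range(n_members)]

def ensemble_predict(members, train_X, train_y, x):
    """Combine the members' predictions by majority vote."""
    votes = Counter(knn_predict(train_X, train_y, x, k, feats)
                    for k, feats in members)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: four ratios per firm, label 1 = bankrupt
train_X = [[0.1, 0.2, 0.1, 0.3], [0.2, 0.1, 0.3, 0.2], [0.3, 0.3, 0.2, 0.1],
           [0.9, 0.8, 0.7, 0.9], [0.8, 0.9, 0.9, 0.8], [0.7, 0.7, 0.8, 0.7]]
train_y = [0, 0, 0, 1, 1, 1]

members = build_random_subspace_ensemble(n_features=4, n_members=5, subset_size=2, k=3)
healthy = ensemble_predict(members, train_X, train_y, [0.15, 0.2, 0.2, 0.2])
distressed = ensemble_predict(members, train_X, train_y, [0.85, 0.8, 0.8, 0.8])
print(healthy, distressed)
```

In a simultaneous-optimization setting, each `(k, feature subset)` pair would become part of a candidate solution whose fitness is the ensemble's accuracy on held-out data, rather than being drawn once at random.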

An empirical study on a firm's fail prediction model by considering whether there are embezzlement, malpractice and the largest shareholder changes or not (횡령.배임 및 최대주주변경을 고려한 부실기업예측모형 연구)

  • Moon, Jong Geon;Hwang Bo, Yun
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.9 no.1 / pp.119-132 / 2014
  • This study analyzed a failure prediction model for firms listed on KOSDAQ that considers embezzlement, malpractice, and changes of the largest shareholder. The sample comprised 166 firms selected by paired sampling. The failed-firm sample consisted of 83 manufacturing firms delisted from the KOSDAQ market during the four years from 2009 to 2012. The normal-firm sample consisted of 83 firms (with the same item or the same business as the failed firms) that were listed on the KOSDAQ market and carried on normal business activities during the same period. The study selected 80 financial ratios for the five years immediately preceding delisting, conducted t-tests to derive the 19 ratios that were significant in all five consecutive years, and used forward selection to estimate a logistic regression model. Whereas previous studies analyzed only the three years immediately preceding delisting, this study analyzes the five years preceding it. The study differs from previous work in that it examines, with a time lag, which financial characteristics significantly influence insolvency from its initial phase, and in that it empirically tests the usefulness of non-financial characteristics by building a failure prediction model that includes embezzlement/malpractice and largest-shareholder changes as dummy variables. The classification accuracy of the prediction model with dummy variables was 95.2% in year T-1, 88.0% in year T-2, 81.3% in year T-3, 79.5% in year T-4, and 74.7% in year T-5. Accuracy increased as the year of delisting approached and was generally higher than in previous studies. By identifying firms with a high potential to fail in advance, the study is expected to reduce losses not only for the firms themselves but also for investors, financial institutions, and other stakeholders.
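The t-test screening step that precedes forward selection can be sketched as follows. The abstract does not say which t-test variant or significance level was used, so this example assumes Welch's t statistic and a fixed cutoff on |t| as a stand-in for a p-value threshold; the ratio names and values are hypothetical.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        var_a / len(a) + var_b / len(b))

def screen_ratios(failed, normal, threshold=2.0):
    """Keep the ratios whose |t| between failed and normal firms exceeds the cutoff."""
    return [name for name in failed
            if abs(welch_t(failed[name], normal[name])) > threshold]

# Hypothetical financial ratios for failed and normal firms (not study data)
failed = {"debt_ratio": [2.1, 2.4, 1.9, 2.6],
          "asset_turnover": [0.9, 1.1, 1.0, 1.2]}
normal = {"debt_ratio": [1.0, 1.2, 0.9, 1.1],
          "asset_turnover": [1.0, 1.2, 0.9, 1.1]}

selected = screen_ratios(failed, normal)
print(selected)
```

Repeating this screen for each of the five pre-delisting years and keeping only the ratios that pass every year would mirror the "significant for five consecutive years" criterion; the surviving ratios would then be the candidates for forward selection in the logistic model.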
