• Title/Summary/Keyword: Random regression

Business Intelligence Design for Strategic Decision Making for Small and Medium-size E-Commerce Sellers: Focusing on Promotion Strategy

  • Seung-Joo Lee;Young-Hyun Lee;Jin-Hyun Lee;Kang-Hyun Lee;Kwang-Sup Shin
    • The Journal of Bigdata / v.8 no.2 / pp.201-222 / 2023
  • As platform-based e-commerce grows, many small and medium-sized sellers have tried to develop more effective strategies to maximize profit. To increase profitability, it is important to make strategic decisions about the scope of promotions, discount rates, and product categories. This research aims to develop a business intelligence application that helps sellers on e-commerce platforms make better decisions. Deciding whether to run a promotion requires predicting how much sales will increase afterward. In this research, we applied various machine learning algorithms, including MLP (Multi-Layer Perceptron), Gradient Boosting Regression, Random Forest, and Linear Regression. Because of the complexity of the data structure and the distinctive characteristics of product categories, Random Forest and MLP showed the best performance. The proposed approach appears applicable to helping small and medium-sized sellers react to market changes and make reasonable decisions based on data rather than on their own experience.

Prediction of random-regression coefficient for daily milk yield after 305 days in milk by using the regression-coefficient estimates from the first 305 days

  • Yamazaki, Takeshi;Takeda, Hisato;Hagiya, Koichi;Yamaguchi, Satoshi;Sasaki, Osamu
    • Asian-Australasian Journal of Animal Sciences / v.31 no.10 / pp.1542-1549 / 2018
  • Objective: Because lactation periods in dairy cows lengthen with increasing total milk production, it is important to predict individual productivities after 305 days in milk (DIM) to determine the optimal lactation period. We therefore examined whether the random regression (RR) coefficients from 306 to 450 DIM (M2) can be predicted from those during the first 305 DIM (M1) by using a RR model. Methods: We analyzed test-day milk records from 85,690 Holstein cows in their first lactations and 131,727 cows in their later (second to fifth) lactations. Data in M1 and M2 were analyzed separately by using different single-trait RR animal models. We then performed a multiple regression analysis of the RR coefficients of M2 on those of M1 during the first and later lactations. Results: First-order Legendre polynomials were practical covariates of RR for the milk yields of M2. All RR coefficients for the additive genetic (AG) effect and the intercept for the permanent environmental (PE) effect of M2 had moderate to strong correlations with the intercept for the AG effect of M1. The coefficients of determination for multiple regression of the combined intercepts for the AG and PE effects of M2 on the coefficients for the AG effect of M1 were moderate to high. The daily milk yields of M2 predicted by using the RR coefficients for the AG effect of M1 were highly correlated with those obtained by using the coefficients of M2. Conclusion: Milk production after 305 DIM can be predicted by using the RR coefficient estimates of the AG effect during the first 305 DIM.
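
The Legendre-polynomial covariates used in RR test-day models can be sketched as follows. This is a minimal illustration, not the authors' model: it assumes the common convention of normalized Legendre polynomials phi_k(x) = sqrt((2k+1)/2) * P_k(x), with days in milk mapped linearly onto [-1, 1] over the M2 interval (306 to 450 DIM). The function names and the example coefficients are hypothetical.

```python
import math

def standardize_dim(dim, dim_min, dim_max):
    """Map days in milk linearly onto [-1, 1], the domain of Legendre polynomials."""
    return -1.0 + 2.0 * (dim - dim_min) / (dim_max - dim_min)

def normalized_legendre(x, order):
    """Return [phi_0(x), ..., phi_order(x)] with phi_k = sqrt((2k+1)/2) * P_k(x).
    (Assumed normalization; conventions differ between software packages.)"""
    p = [1.0, x]  # P_0 and P_1
    for k in range(1, order):
        # Bonnet recursion: (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}
        p.append(((2 * k + 1) * x * p[k] - k * p[k - 1]) / (k + 1))
    return [math.sqrt((2 * k + 1) / 2.0) * p[k] for k in range(order + 1)]

def rr_deviation(dim, coefs, dim_min=306, dim_max=450):
    """Cow-specific deviation in daily yield: sum_k coefs[k] * phi_k(x).
    coefs are hypothetical RR coefficients (e.g. AG intercept and slope)."""
    x = standardize_dim(dim, dim_min, dim_max)
    phi = normalized_legendre(x, len(coefs) - 1)
    return sum(c * f for c, f in zip(coefs, phi))

# Hypothetical first-order AG coefficients: intercept 2.0, slope -0.5.
for day in (306, 378, 450):
    print(day, round(rr_deviation(day, [2.0, -0.5]), 4))
```

With first-order coefficients (an intercept and a slope), the cow's deviation changes linearly across the 306 to 450 DIM window, so two numbers per animal describe its whole extended-lactation curve.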

Multicollinearity in Logistic Regression

  • Jong-Han Lee;Myung-Hoe Huh
    • Communications for Statistical Applications and Methods / v.2 no.2 / pp.303-309 / 1995
  • Many measures for detecting multicollinearity in linear regression have been proposed in the statistics and numerical analysis literature. Among them, the condition number and the variance inflation factor (VIF) are the most popular. In this study, we give new interpretations of the condition number and VIF in linear regression using the geometry of the explanatory space. Along the same lines, we derive natural counterparts of the condition number and VIF for logistic regression. These computer-intensive measures can easily be extended to evaluate multicollinearity in generalized linear models.
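
For linear regression, the VIF of predictor j is 1/(1 - R_j^2), where R_j^2 comes from regressing x_j on the other predictors; equivalently, it is the j-th diagonal element of the inverse of the predictors' correlation matrix. The pure-Python sketch below uses that identity. It covers only the classical linear-regression VIF, not the geometric reinterpretation or the logistic-regression measures derived in the paper; the function names are illustrative.

```python
import math

def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equal-length predictor columns."""
    def centered(col):
        m = sum(col) / len(col)
        return [v - m for v in col]
    cs = [centered(c) for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in cs]
    p = len(cs)
    return [[sum(a * b for a, b in zip(cs[i], cs[j])) / (norms[i] * norms[j])
             for j in range(p)] for i in range(p)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (adequate for small matrices)."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular correlation matrix: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), computed as the j-th diagonal element of R^{-1}."""
    r_inv = invert(correlation_matrix(columns))
    return [r_inv[j][j] for j in range(len(columns))]

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]   # correlation 0.8 with x1
print(vif([x1, x2]))   # both close to 1 / (1 - 0.8**2)
```

With two predictors the identity reduces to 1/(1 - r^2), which makes it easy to check the sketch by hand; uncorrelated predictors give VIF = 1.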

Machine learning in survival analysis

  • Baik, Jaiwook
    • Industry Promotion Research / v.7 no.1 / pp.1-8 / 2022
  • We investigated various machine learning methods that can be applied to censored data. Exploratory data analysis revealed the distribution of each feature and the relationships among features. Next, a classification problem was set up in which the dependent variable is death_event and the remaining features are the independent variables. After applying various machine learning methods to the data, we found that, as in many other reports from the artificial intelligence field, random forest performs better than logistic regression. However, artificial neural networks and gradient boosting, which have performed well in recent work, did not perform as expected because of the lack of data. Finally, the Kaplan-Meier estimator and the Cox proportional hazards model were employed to explore the relationship between the dependent variable (t_i, δ_i) and the independent variables. Random forest, a machine learning method, was also applied to survival analysis with censored data.
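
The Kaplan-Meier estimator named in this abstract multiplies, at each distinct event time, the conditional probability of surviving that time: S(t) = product over event times t_j <= t of (1 - d_j / n_j), where d_j is the number of events and n_j the number at risk. A minimal sketch, assuming each subject contributes a follow-up time t_i and an indicator δ_i (1 = event observed, 0 = censored); at tied times, censored subjects are counted as still at risk, which is the usual convention. Function and variable names are illustrative.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival curve.

    times  -- follow-up time t_i for each subject
    events -- delta_i: 1 if the event (e.g. death) was observed, 0 if censored
    Returns a list of (t, S(t)) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Count everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Subjects with an event or censored at t leave the risk set afterwards.
        n_at_risk -= deaths + censored
    return curve

print(kaplan_meier([2, 3, 3, 5, 6, 8], [1, 1, 0, 1, 0, 1]))
```

Censored subjects never cause a drop in S(t), but they shrink the risk set for later event times; that is how censoring enters the estimate.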

Promoter classification using random generator-controlled generalized regression neural network

  • Kim, Kunho;Kim, Byungwhan;Kim, Kyungnam;Hong, Jin-Han;Park, Sang-Ho
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.09a / pp.595-598 / 2003
  • A new classifier is constructed by using a generalized regression neural network (GRNN) in conjunction with a random generator (RG). The RG generated a number of sets of random spreads, within a given range, for the Gaussian functions in the pattern layer. The range was varied experimentally from 0.4 to 1.4. The DNA sequences consisted of four types of promoters. The performance of the classifier is examined in terms of total classification sensitivity (TCS) and individual classification sensitivity (ICS). For comparison, another GRNN classifier was constructed and optimized in the conventional way. Compared with the conventional GRNN, the RG-GRNN demonstrated much improved TCS along with better ICS on average.
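
A GRNN predicts by taking a Gaussian-kernel weighted average of the training targets, where each pattern-layer unit has a spread. The sketch below approximates the RG idea under stated assumptions: each random draw is one set of spreads (one per pattern unit) sampled uniformly from [0.4, 1.4], the range reported above, and the set with the best validation accuracy is kept. The encoding of DNA sequences as numeric vectors, the number of draws, the toy data, and all function names are hypothetical.

```python
import math
import random

def grnn_scores(train_x, train_y, spreads, x):
    """Class scores: Gaussian-kernel weighted average of one-hot targets.
    spreads[i] is the spread of the pattern unit centred on train_x[i]."""
    weights = []
    for xi, s in zip(train_x, spreads):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        weights.append(math.exp(-d2 / (2.0 * s * s)))
    total = sum(weights) or 1e-300  # avoid division by zero if every weight underflows
    n_classes = len(train_y[0])
    return [sum(w * y[k] for w, y in zip(weights, train_y)) / total
            for k in range(n_classes)]

def classify(train_x, train_y, spreads, x):
    """Predicted class index = argmax of the GRNN scores."""
    scores = grnn_scores(train_x, train_y, spreads, x)
    return max(range(len(scores)), key=scores.__getitem__)

def accuracy(train_x, train_y, spreads, val_x, val_labels):
    hits = sum(classify(train_x, train_y, spreads, x) == label
               for x, label in zip(val_x, val_labels))
    return hits / len(val_x)

def random_spread_search(train_x, train_y, val_x, val_labels,
                         low=0.4, high=1.4, n_sets=20, seed=0):
    """Hypothetical RG step: draw n_sets random spread sets in [low, high]
    and keep the set with the best validation accuracy."""
    rng = random.Random(seed)
    best_spreads, best_acc = None, -1.0
    for _ in range(n_sets):
        spreads = [rng.uniform(low, high) for _ in train_x]
        acc = accuracy(train_x, train_y, spreads, val_x, val_labels)
        if acc > best_acc:
            best_spreads, best_acc = spreads, acc
    return best_spreads, best_acc

# Toy two-class data standing in for encoded sequences.
train_x = [[0, 0], [0, 1], [5, 5], [5, 6]]
train_y = [[1, 0], [1, 0], [0, 1], [0, 1]]
spreads, acc = random_spread_search(train_x, train_y, [[0, 0.5], [5, 5.5]], [0, 1])
print(acc)
```

Because a GRNN needs no iterative weight training, each candidate spread set is evaluated with a single pass over the validation data, which keeps a random search over many sets inexpensive.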

Restricted maximum likelihood estimation of a censored random effects panel regression model

  • Lee, Minah;Lee, Seung-Chun
    • Communications for Statistical Applications and Methods / v.26 no.4 / pp.371-383 / 2019
  • Panel data sets have been developed in various areas, and many recent studies have analyzed panel, or longitudinal, data sets. Maximum likelihood (ML) may be the most common statistical method for analyzing panel data models; however, inference based on the ML estimate will have an inflated Type I error because the ML method tends to give a downwardly biased estimate of variance components when the sample size is small. The underestimation can be severe when data are incomplete. This paper proposes the restricted maximum likelihood (REML) method for a random effects panel data model with a censored dependent variable. Note that the likelihood function of the model is complex in that it includes a multidimensional integral. Many authors have proposed integral approximation methods for computing the likelihood function; however, it is well known that such methods are inadequate in practice for high-dimensional integrals. This paper instead uses the moments of a truncated multivariate normal random vector to calculate the multidimensional integral. In addition, a proper asymptotic standard error of the REML estimate is given.
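
The downward bias of ML variance estimates that motivates REML is easiest to see in the simplest case, a normal sample with an unknown mean: ML divides the residual sum of squares by n, while REML divides by n - p, here n - 1. This Monte Carlo sketch illustrates only that textbook case; it is not the censored random-effects panel estimator proposed in the paper, and the sample size, variance, and seed are arbitrary choices.

```python
import random

def ml_and_reml_variance(y):
    """Return (ML, REML) estimates of sigma^2 for y_i ~ N(mu, sigma^2), mu unknown.
    ML divides the residual sum of squares by n; REML divides by n - p with p = 1."""
    n = len(y)
    mean = sum(y) / n
    ss = sum((v - mean) ** 2 for v in y)
    return ss / n, ss / (n - 1)

def average_estimates(sigma2=4.0, n=5, reps=20000, seed=1):
    """Average both estimators over many simulated small samples."""
    rng = random.Random(seed)
    sd = sigma2 ** 0.5
    ml_total = reml_total = 0.0
    for _ in range(reps):
        sample = [rng.gauss(0.0, sd) for _ in range(n)]
        ml, reml = ml_and_reml_variance(sample)
        ml_total += ml
        reml_total += reml
    return ml_total / reps, reml_total / reps

ml_avg, reml_avg = average_estimates()
print(f"ML average:   {ml_avg:.3f}")
print(f"REML average: {reml_avg:.3f}")
```

With n = 5 and sigma^2 = 4, the ML average settles near sigma^2 (n - 1)/n = 3.2, while the REML average stays near 4; the gap shrinks as n grows, which is why the bias matters mainly for small samples.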

Investment, Export, and Exchange Rate on Prediction of Employment with Decision Tree, Random Forest, and Gradient Boosting Machine Learning Models

  • Chae-Deug Yi
    • Korea Trade Review / v.46 no.2 / pp.281-299 / 2021
  • This paper analyzes the feasibility of using machine learning methods to forecast employment. Machine learning methods such as decision trees and artificial neural networks, as well as ensemble models such as random forest and gradient boosting regression trees, were used to forecast employment in the Busan regional economy. The main findings from comparing their predictive abilities were as follows. First, machine learning methods can predict employment well. Second, the employment forecasts from decision tree models differed somewhat depending on tree depth. Third, however, the artificial neural network model did not show high predictive power. Fourth, the ensemble models, random forest and gradient boosting regression trees, showed higher predictive power. Thus, since machine learning methods can predict employment accurately, they should be used to improve the accuracy of employment forecasting.

URL Phishing Detection System Utilizing Catboost Machine Learning Approach

  • Fang, Lim Chian;Ayop, Zakiah;Anawar, Syarulnaziah;Othman, Nur Fadzilah;Harum, Norharyati;Abdullah, Raihana Syahirah
    • International Journal of Computer Science & Network Security / v.21 no.9 / pp.297-302 / 2021
  • The development of various phishing websites enables hackers to access confidential personal or financial data, thus decreasing trust in e-business. This paper compared detection techniques that use URL-based features. To analyze and compare the performance of supervised machine learning classifiers, the classifiers were trained on more than 11,005 phishing and legitimate URLs. Thirty features were extracted from the URLs to distinguish phishing from legitimate URLs. Logistic Regression, Random Forest, and CatBoost classifiers were then analyzed and their performance evaluated. The results showed that CatBoost was a much better classifier than Random Forest and Logistic Regression, with up to 96% detection accuracy.

Semiparametric Kernel Poisson Regression for Longitudinal Count Data

  • Hwang, Chang-Ha;Shim, Joo-Yong
    • Communications for Statistical Applications and Methods / v.15 no.6 / pp.1003-1011 / 2008
  • Mixed-effect Poisson regression models are widely used to analyze correlated count data such as those found in longitudinal studies. In this paper, we consider kernel extensions with semiparametric fixed effects and parametric random effects. Estimation uses the penalized likelihood method based on the kernel trick, and our focus is on efficient computation and effective hyperparameter selection. Cross-validation techniques are employed to select the hyperparameters. Examples illustrating the usage and features of the proposed method are provided.

Generalized Partially Linear Additive Models for Credit Scoring

  • Shim, Ju-Hyun;Lee, Young-K.
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.587-595 / 2011
  • Credit scoring is an objective, automated system for assessing the credit risk of each customer. The logistic regression model is one of the popular credit scoring methods for predicting the default probability; however, despite its interpretability and low computational cost, it may fail to detect possible nonlinear effects of predictors. In this paper, we propose a generalized partially linear model as an alternative to logistic regression. We also introduce modern ensemble techniques such as bagging, boosting, and random forests. We compare these methods in a simulation study and illustrate them using a German credit dataset.