• Title/Summary/Keyword: Regression Analysis Method

Input Variable Importance in Supervised Learning Models

  • Huh, Myung-Hoe;Lee, Yong Goo
    • Communications for Statistical Applications and Methods
    • /
    • v.10 no.1
    • /
    • pp.239-246
    • /
    • 2003
  • Statisticians and data miners are often asked to assess the importance of input variables in a given supervised learning model. For this purpose, one may rely on separate ad hoc measures that depend on the model type, such as linear regression, neural networks, or trees. Consequently, input variable importance measures lack conceptual consistency, so they cannot be used directly to compare different types of models, a comparison that is often made in data mining. In this short communication, we propose a unified approach to measuring the importance of input variables. Our method uses sensitivity analysis, which perturbs the values of the input variables and monitors the change in the output. The scope of the research is limited to models with continuous output, although it is not difficult to extend the method to supervised learning models for categorical outcomes.
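The perturb-and-monitor idea in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' procedure: the perturbation scheme (shuffling one input column at a time), the squared-change score, and the names `perturbation_importance` and `toy_model` are all assumptions made for the example.

```python
import random
from statistics import mean

def perturbation_importance(predict, rows, n_repeats=20, seed=0):
    """Score each input by how much perturbing it changes the model output.

    Assumption: "perturbing" means shuffling one input column while the
    other columns stay fixed; the abstract does not specify the scheme.
    """
    rng = random.Random(seed)
    baseline = [predict(r) for r in rows]
    scores = []
    for j in range(len(rows[0])):
        changes = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # perturb input j only
            perturbed = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            outputs = [predict(r) for r in perturbed]
            # Mean squared change of the output relative to the baseline.
            changes.append(mean((o - b) ** 2 for o, b in zip(outputs, baseline)))
        scores.append(mean(changes))
    return scores

def toy_model(x):
    # Hypothetical fitted model with continuous output; the third input is ignored.
    return 3.0 * x[0] + 0.5 * x[1] + 0.0 * x[2]

data_rng = random.Random(1)
data = [[data_rng.gauss(0, 1) for _ in range(3)] for _ in range(200)]
importance = perturbation_importance(toy_model, data)
```

Because the score only needs a `predict` function, the same call works for a regression, a neural network, or a tree, which is the kind of cross-model comparison the abstract is after.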

A Study on the Estimating Functions of Price and Domestic Consumption of Chestnut in South Korea (우리나라의 밤 가격(價格) 및 국내소비량(國內消費量) 추정(推定)에 관(關)한 연구(硏究))

  • Jeon, Jun-Heon;Lee, Sang-Sik
    • Journal of the Korean Wood Science and Technology
    • /
    • v.21 no.4
    • /
    • pp.29-34
    • /
    • 1993
  • This study estimates the price and domestic consumption functions of chestnut in Korea by regression analysis, using time series data for the period 1970~1989. The results show that the optimum price function for chestnut is $PR = -249.33965 + 163532.56817\,EX/POP - 4.10177\,PD + 4.02877\,DC + 6056.98339\,GDP/POP$ ($R^2 = 0.88207$), and that the optimum domestic consumption function is $\ln DC = 14.97145 + 1.48279 \ln PD/POP - 0.32853 \ln GDP - 0.02337 \ln PR - 0.12117 \ln EX$ ($R^2 = 0.98689$). Because price instability makes both producers' incomes and consumers' household finances unstable, the object of price policy should be to stabilize the price of chestnut in Korea.
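Because the consumption function is log-log, each coefficient can be read as an elasticity. The sketch below shows that reading for the reported coefficient on ln PR, assuming PR denotes price and DC domestic consumption (as the abstract's wording suggests) and holding the other terms fixed; `consumption_ratio` is a hypothetical helper name.

```python
# Coefficient on ln PR in the reported consumption function.
COEF_LN_PR = -0.02337

def consumption_ratio(price_ratio, elasticity=COEF_LN_PR):
    """Multiplicative change in DC when PR is scaled by price_ratio.

    In a log-log model, ln DC shifts by elasticity * ln(price_ratio),
    so DC itself is multiplied by price_ratio ** elasticity.
    """
    return price_ratio ** elasticity

# Effect of a 10% price increase on consumption, in percent.
pct_change = (consumption_ratio(1.10) - 1.0) * 100.0
print(f"{pct_change:.3f}% change in consumption")
```

Under these assumptions the implied response is a small decrease, which is consistent with the small magnitude of the coefficient.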

Optimal Variable Selection in a Thermal Error Model for Real Time Error Compensation (실시간 오차 보정을 위한 열변형 오차 모델의 최적 변수 선택)

  • Hwang, Seok-Hyun;Lee, Jin-Hyeon;Yang, Seung-Han
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.16 no.3 s.96
    • /
    • pp.215-221
    • /
    • 1999
  • The object of a thermal error compensation system in machine tools is to improve the accuracy of the machine tool through real-time error compensation. The accuracy of the machine tool depends entirely on the accuracy of the thermal error model, which can be obtained from an appropriate combination of temperature variables. The proposed method for optimal variable selection in the thermal error model is based on correlation grouping and successive regression analysis. Correlation grouping mitigates the collinearity problem, and a judgment function that minimizes the residual mean square is used for selection. The resulting linear model is more robust against measurement noise than an engineering judgment model that includes higher-order terms of the variables. The proposed method is more effective for real-time error compensation because of its reduced computational time, sufficient model accuracy, and robustness.
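One plausible reading of the correlation grouping step is sketched below: temperature variables whose readings are strongly correlated are placed in the same group, so that a later regression step can pick one representative per group. The greedy grouping rule, the 0.9 threshold, and the sensor names are assumptions for illustration; the abstract does not give the authors' exact procedure.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def correlation_groups(series, threshold=0.9):
    """Greedily group variables that correlate strongly with a group's first member."""
    groups = []
    for name, values in series.items():
        for group in groups:
            leader = group[0]
            if abs(pearson(values, series[leader])) >= threshold:
                group.append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append([name])
    return groups

# Hypothetical temperature readings (degrees C) from four sensors.
temps = {
    "spindle_front": [20.0, 22.0, 25.0, 29.0, 34.0],
    "spindle_rear": [20.5, 22.4, 25.6, 29.5, 34.8],
    "ambient": [21.0, 20.8, 21.3, 20.9, 21.1],
    "column": [20.0, 20.1, 20.6, 20.2, 20.9],
}
groups = correlation_groups(temps)
```

Choosing one variable per group keeps near-duplicate sensors out of the same regression, which is how grouping reduces collinearity.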

Change Analysis for Inheritance Relation in Method Level (계승관계에서 구성원 함수 수준의 변경 영향 분석)

  • 방정원
    • Journal of the Korea Society of Computer and Information
    • /
    • v.7 no.1
    • /
    • pp.27-32
    • /
    • 2002
  • Software reuse has received attention as a way of improving programmer productivity in response to the software crisis. Object-oriented technology affects all areas of software engineering, such as software analysis, programming languages, testing, and maintenance. The new concepts of class, inheritance, and encapsulation not only introduce new testing problems but also raise the challenging question of how to conduct regression testing for O-O programs. The first problem of regression testing is how to identify the components affected by changes to other components. We propose a method firewall that encloses all classes and methods affected by changes to one or more methods in an inheritance relation.
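The idea of a method-level firewall can be illustrated with a small sketch. It assumes one simple propagation rule, which may differ from the paper's: a change to a method affects the class that defines it and every descendant that inherits the method without overriding it. The function name `method_firewall` and the example hierarchy are hypothetical.

```python
def method_firewall(parents, defined, changed):
    """Return the set of (class, method) pairs affected by the changed methods.

    parents: maps each class name to its parent class name (or None).
    defined: maps each class name to the set of methods it defines itself.
    changed: list of (class, method) pairs that were modified.
    """
    children = {}
    for cls, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(cls)

    affected = set()
    for cls, method in changed:
        stack = [cls]
        while stack:
            current = stack.pop()
            affected.add((current, method))
            for child in children.get(current, []):
                # A child that overrides the method is shielded from the change.
                if method not in defined.get(child, set()):
                    stack.append(child)
    return affected

parents = {"Shape": None, "Polygon": "Shape", "Square": "Polygon", "Circle": "Shape"}
defined = {
    "Shape": {"area", "draw"},
    "Polygon": {"area"},
    "Square": set(),
    "Circle": {"area", "draw"},
}
firewall = method_firewall(parents, defined, [("Shape", "draw")])
```

Here `Circle` overrides `draw` and stays outside the firewall, while `Polygon` and `Square` inherit it and would need retesting.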

A Study on the Fault Process and Equipment Analysis of Plastic Ball Grid Array Manufacturing Using Data-Mining Techniques

  • Sim, Hyun Sik
    • Journal of Information Processing Systems
    • /
    • v.16 no.6
    • /
    • pp.1271-1280
    • /
    • 2020
  • The yield and quality of a micromanufacturing process are important management factors. In real-world situations, it is difficult to achieve a high yield from a manufacturing process because the products are produced through multiple nanoscale manufacturing processes. Therefore, it is necessary to identify the processes and equipment that lead to low yields. This paper proposes an analytical method to identify the processes and equipment that cause a defect in the plastic ball grid array (PBGA) during the manufacturing process using logistic regression and stepwise variable selection. The proposed method was tested with the lot trace records of a real work site. The records included the sequence of equipment that the lot had passed through and the number of faults of each type in the lot. We demonstrated that the test results reflect the real situation in a PBGA manufacturing process, and the major equipment parameters were then controlled to confirm the improvement in yield; the yield improved by approximately 20%.

A Generation and Accuracy Evaluation of Common Metadata Prediction Model Using Public Bicycle Data and Imputation Method

  • Kim, Jong-Chan;Jung, Se-Hoon
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.2
    • /
    • pp.287-296
    • /
    • 2022
  • Air pollution is becoming a severe issue worldwide, and various policies are being implemented to address environmental pollution. In major cities, public bicycles are installed and operated to reduce pollution and ease transportation problems, and operational information is collected in real time. However, little research has made use of public bicycle operation data. This study uses the daily weather data of the Korea Meteorological Agency and the real-time air pollution data of the Korea Environment Corporation to predict the number of daily bicycle rentals. Cross-validation, principal component analysis, and multiple regression analysis were used to determine the independent variables of the predictive model. The study then selected the variables that satisfy the significance level, constructed a model, predicted the number of daily bicycle rentals, and measured the accuracy.

Prediction of movie audience numbers using hybrid model combining GLS and Bass models (GLS와 Bass 모형을 결합한 하이브리드 모형을 이용한 영화 관객 수 예측)

  • Kim, Bokyung;Lim, Changwon
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.4
    • /
    • pp.447-461
    • /
    • 2018
  • Domestic film industry sales are increasing every year. Theaters are the primary sales channel for movies, and the size of the theater audience affects the sale of additional rights. The theater audience is therefore an important factor directly linked to movie industry sales. In this paper we consider a hybrid model that combines a multiple linear regression model and the Bass model to predict the audience numbers for a specific day. By combining the two models, the predicted value from the regression analysis is corrected using that of the Bass model. Three films with different release dates were used in the analysis. The all-subsets regression method is used to generate all possible combinations of predictors, and 5-fold cross-validation is used to estimate each model five times. The predicted value is obtained from the model with the smallest root mean square error and then combined with the predicted value of the Bass model to obtain the final predicted value. It was confirmed that, when past data exist, the weight of the Bass model increases and a compensation is added to the predicted value.
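A rough sketch of the two ingredients follows. `bass_cumulative` is the standard closed form of the Bass diffusion model with innovation coefficient `p` and imitation coefficient `q`. The way the two predictions are combined (`hybrid_forecast`, a weighted average with weight `bass_weight`) is an assumption, since the abstract does not state the combination rule, and the parameter values are hypothetical.

```python
from math import exp

def bass_cumulative(t, p, q):
    """Cumulative adoption fraction F(t) of the standard Bass model."""
    e = exp(-(p + q) * t)
    return (1.0 - e) / (1.0 + (q / p) * e)

def bass_daily(day, m, p, q):
    """Predicted audience on a given day for a total market potential m."""
    return m * (bass_cumulative(day, p, q) - bass_cumulative(day - 1, p, q))

def hybrid_forecast(regression_pred, bass_pred, bass_weight):
    """Assumed combination rule: weighted average of the two predictions."""
    return bass_weight * bass_pred + (1.0 - bass_weight) * regression_pred

# Hypothetical parameters: potential audience of 1,000,000, p = 0.03, q = 0.38.
day5_bass = bass_daily(5, 1_000_000, 0.03, 0.38)
day5_hybrid = hybrid_forecast(regression_pred=40_000.0, bass_pred=day5_bass, bass_weight=0.5)
```

Raising `bass_weight` moves the forecast toward the diffusion curve, which mirrors the abstract's remark that the Bass model gains weight when past data are available.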

Modeling of Flow-Accelerated Corrosion using Machine Learning: Comparison between Random Forest and Non-linear Regression (기계학습을 이용한 유동가속부식 모델링: 랜덤 포레스트와 비선형 회귀분석과의 비교)

  • Lee, Gyeong-Geun;Lee, Eun Hee;Kim, Sung-Woo;Kim, Kyung-Mo;Kim, Dong-Jin
    • Corrosion Science and Technology
    • /
    • v.18 no.2
    • /
    • pp.61-71
    • /
    • 2019
  • Flow-Accelerated Corrosion (FAC) is a phenomenon in which a protective coating on a metal surface is dissolved by the flow of fluid in a metal pipe, leading to continuous wall thinning. Many countries have recently developed computer codes to manage FAC in power plants, and the FAC prediction model in these codes plays an important role in their predictive performance. Herein, FAC prediction models were developed by applying a machine learning method and the conventional nonlinear regression method. The random forest, a machine learning technique widely used in predictive modeling, made it easy to calculate the FAC tendency for five input variables: flow rate, temperature, pH, Cr content, and dissolved oxygen (DO) concentration. However, the model showed significant errors under some input conditions, and it was difficult to obtain proper regression results without additional data points. In contrast, nonlinear regression analysis, which assumes an empirical equation, produced robust estimates even with relatively insufficient data, and the model showed better predictive power when the interaction between DO and pH was considered. The comparative analysis in this study is believed to provide important insights for developing a more sophisticated FAC prediction model.

A study on the spatial neighborhood in spatial regression analysis (공간이웃정보를 고려한 공간회귀분석)

  • Kim, Sujung
    • Journal of the Korean Data and Information Science Society
    • /
    • v.28 no.3
    • /
    • pp.505-513
    • /
    • 2017
  • Recently, numerous small area estimation studies have been conducted to obtain more detailed and accurate estimation results. Most of these studies have employed spatial regression models, which require a clear definition of spatial neighborhoods. In this study, we introduce Delaunay triangulation as a method of defining the spatial neighborhood and compare it with the k-nearest neighbor method. A simulation was conducted to determine which of the two methods defines the spatial neighborhood more efficiently, and we demonstrate the performance of the proposed method using land price data.
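For reference, the k-nearest neighbor baseline mentioned in this abstract can be sketched with the standard library alone. The coordinates and the helper name `knn_neighbors` are hypothetical; the Delaunay side is not shown here because it needs a computational geometry library.

```python
from math import dist

def knn_neighbors(points, k):
    """Map each point index to the indices of its k nearest other points."""
    neighbors = {}
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )
        neighbors[i] = others[:k]
    return neighbors

# Hypothetical area centroids; the last one is far from the rest.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (5.0, 5.0)]
nbrs = knn_neighbors(points, k=2)
```

In this toy layout the isolated point lists point 2 as a neighbor, but point 2 does not list it back. Such asymmetry, together with the need to choose `k`, is one reason to consider a triangulation-based neighborhood, in which adjacency is symmetric by construction.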

An Optimal Design Method of a Linear Generator for Conversion of Wave Energy (파력에너지 변환을 위한 선형발전기의 최적 설계 방법)

  • Kim, Jung-Yoon;Kim, Byung Soo
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.16 no.6
    • /
    • pp.1195-1204
    • /
    • 2021
  • In this paper, we present an optimal design method for wave power generators using response surface analysis. In particular, our method reduces mechanical loss by selecting a linear generator, whose linear motion converts the vertical movement of waves directly into electrical energy. We calculate the exciting force acting on the drive device in a slow-wave condition and determine the winding process from the ratio of slots to poles in order to improve energy conversion efficiency. In addition, we employ regression analysis to derive the shape factors of the stator and the translator, which have a significant effect on the performance of the generator. We choose the best design variables through response surface analysis and then use the analysis results to study an optimization method for designing an efficient experiment. Finally, we show the validity of the proposed method through simulation results.