• Title/Abstract/Keywords: stepwise regression model

QSPR Study of the Absorption Maxima of Azobenzene Dyes

  • Xu, Jie;Wang, Lei;Liu, Li;Bai, Zikui;Wang, Luoxin
    • Bulletin of the Korean Chemical Society / v.32 no.11 / pp.3865-3872 / 2011
  • A quantitative structure-property relationship (QSPR) study was performed to predict the absorption maxima of azobenzene dyes. The full set of 191 azobenzenes was divided into a training set of 150 and a test set of 41 using the Kennard-Stone algorithm. A seven-descriptor model with a squared correlation coefficient ($R^2$) of 0.8755 and a standard error of estimation (s) of 14.476 was developed by applying stepwise multiple linear regression (MLR) to the training set. The reliability of the model was further assessed by leave-many-out cross-validation, randomization tests, and external validation on the test set.
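The Kennard-Stone split picks training samples that cover the descriptor space evenly: it starts from the two most distant samples and then repeatedly adds the sample farthest from everything already chosen. A minimal sketch in Python, assuming Euclidean distance on the descriptor vectors (the abstract does not state the metric); the function name `kennard_stone` is illustrative:

```python
import math
from typing import List, Sequence, Tuple


def kennard_stone(X: Sequence[Sequence[float]], n_train: int) -> Tuple[List[int], List[int]]:
    """Split the row indices of X into (train, test) with the Kennard-Stone algorithm.

    Assumes Euclidean distance between descriptor vectors.
    """
    n = len(X)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and len(X)")
    dist = [[math.dist(a, b) for b in X] for a in X]
    # Seed the training set with the two samples that are farthest apart.
    i0, j0 = max(((i, j) for i in range(n) for j in range(i + 1, n)),
                 key=lambda pair: dist[pair[0]][pair[1]])
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]
    # Distance from each remaining sample to its nearest selected sample.
    min_d = {k: min(dist[k][i0], dist[k][j0]) for k in remaining}
    while len(selected) < n_train:
        # Take the sample that is worst covered by the current training set.
        nxt = max(remaining, key=lambda k: min_d[k])
        selected.append(nxt)
        remaining.remove(nxt)
        for k in remaining:
            min_d[k] = min(min_d[k], dist[k][nxt])
    return selected, remaining
```

For the azobenzene data the call would be `kennard_stone(descriptors, 150)` on the 191 descriptor vectors, leaving 41 test samples; `descriptors` is a hypothetical variable holding them.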

An Empirical Study on Faults Prediction for Large Scale Telecommunication Software (대규모 통신 소프트웨어의 결함 수 예측에 관한 사례 연구)

  • Park, Young-Sik;Yoon, Byeong-Nam;Lim, Jae-Hak
    • Journal of Korean Society for Quality Management / v.27 no.2 / pp.263-276 / 1999
  • In this paper, we analyze the change-request data collected during the system test of a large-scale telecommunication software system and classify the failures by type and cause. We then develop statistical models that relate the number of faults to a set of software metrics. To this end, we consider three candidate regression models: a stepwise regression model and two nonlinear models. The three models are compared in terms of predictive quality. We also discuss the advantages of the proposed models and how they can be applied to a new project.
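Stepwise regression, as used for the first of the three fault models, adds or drops one predictor at a time. The sketch below is a bidirectional variant that uses AIC as the entry and removal criterion; classical stepwise procedures usually rely on F-test p-value thresholds instead, and the abstract does not say which rule the authors applied. Function names are illustrative:

```python
import numpy as np


def ols_aic(X: np.ndarray, y: np.ndarray, cols: list) -> float:
    """AIC (up to a constant) of an OLS fit of y on an intercept plus X[:, cols]."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    rss = max(float(np.sum((y - A @ beta) ** 2)), 1e-12)  # guard against log(0)
    return n * np.log(rss / n) + 2 * A.shape[1]


def stepwise_aic(X: np.ndarray, y: np.ndarray, max_steps: int = 100):
    """Bidirectional stepwise selection: at each step apply the single add or
    drop that lowers AIC the most; stop when no move lowers it."""
    included: list = []
    current = ols_aic(X, y, included)
    for _ in range(max_steps):
        moves = []
        for c in range(X.shape[1]):   # candidate additions
            if c not in included:
                moves.append(included + [c])
        for c in included:            # candidate removals
            moves.append([v for v in included if v != c])
        if not moves:
            break
        scored = [(ols_aic(X, y, m), m) for m in moves]
        best_aic, best_set = min(scored, key=lambda t: t[0])
        if best_aic >= current:       # no move improves the model
            break
        current, included = best_aic, best_set
    return sorted(included), current
```

Given a hypothetical matrix `metrics` whose columns are software metrics and a vector `faults` of fault counts, `stepwise_aic(metrics, faults)` returns the indices of the retained metrics and the AIC of the final model. Because every accepted move strictly lowers AIC, the procedure cannot cycle.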

The Prediction of Ship's Powering Performance Using Statistical Analysis and Theoretical Formulation (통계해석과 이론식을 이용한 저항추진성능 추정)

  • Kim, Eun-Chan;Hong, Sung-Wan;Yang, Seung-Il
    • Bulletin of the Society of Naval Architects of Korea / v.26 no.4 / pp.14-26 / 1989
  • This paper describes a statistical analysis method, and the accompanying programs, for predicting a ship's powering performance. The equation for the wave-making resistance coefficient is derived from wave-making resistance theory in terms of the sectional area coefficients, and its regression coefficients are determined by regression analysis of model test results. The equations for the form factor, wake fraction and thrust deduction fraction are derived purely by regression analysis of the principal dimensions, sectional area coefficients and model test results. The statistical analyses use various descriptive statistics and stepwise regression techniques. The resulting powering-performance prediction program covers resistance coefficients, propulsive coefficients, propeller open-water efficiency and various scale-effect corrections.

Development of Korean Paddy Rice Yield Prediction Model (KRPM) using Meteorological Element and MODIS NDVI (기상요소와 MODIS NDVI를 이용한 한국형 논벼 생산량 예측모형 (KRPM)의 개발)

  • Na, Sang-Il;Park, Jong-Hwa;Park, Jin-Ki
    • Journal of The Korean Society of Agricultural Engineers / v.54 no.3 / pp.141-148 / 2012
  • Food policy is regarded as one of the most basic and central issues in every country, as each seeks to maintain its food sovereignty and improve food self-sufficiency. In Korea, where rice is the staple food, predicting rice yield is an important task for coping with unstable food supply at the national level. In this study, the Korean paddy Rice yield Prediction Model (KRPM) was developed to predict paddy rice yield from meteorological elements and MODIS NDVI. A multiple linear regression analysis was carried out using NDVI extracted from satellite imagery. The six meteorological elements were average temperature, maximum temperature, minimum temperature, rainfall, accumulated rainfall and duration of sunshine. To evaluate the applicability of the KRPM, its accuracy was assessed by correlation analysis between the predicted paddy rice yields and the 2011 yields published by the National Statistical Office. The KRPM-predicted 2011 paddy rice yield was 505 kg/10a at the national level and 487 kg/10a when computed by agroclimatic zone using stepwise regression, compared with 532 kg/10a reported by the Korean Statistical Information Service (KOSIS). The KRPM captured well how paddy rice yield changes with NDVI and the meteorological elements.
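To put the 2011 figures in proportion, the relative differences of the two KRPM estimates from the KOSIS value of 532 kg/10a work out as follows (computed here from the numbers in the abstract, not reported by the authors):

$$\frac{532-505}{532}\times 100\% \approx 5.1\%, \qquad \frac{532-487}{532}\times 100\% \approx 8.5\%$$

Both KRPM estimates fall below the KOSIS figure: the national-level estimate by about 5% and the agroclimatic-zone estimate by about 8.5%.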

Use of GIS to Develop a Multivariate Habitat Model for the Leopard Cat (Prionailurus bengalensis) in Mountainous Region of Korea

  • Rho, Paik-Ho
    • Journal of Ecology and Environment / v.32 no.4 / pp.229-236 / 2009
  • A habitat model was developed to delineate potential habitat of the leopard cat (Prionailurus bengalensis) in a mountainous region of Kangwon Province, Korea. Between 1997 and 2005, 224 leopard cat presence sites were recorded in the province in the Nationwide Survey on Natural Environments. Half of the sites were used to develop the habitat model, and the remaining sites were used to test it. Fourteen environmental variables related to topographic features, water resources, vegetation and human disturbance were quantified for 112 of the presence sites and an equal number of randomly selected sites. Statistical analyses (t-tests and Pearson correlation analysis) showed that elevation, ridges, plains, % water cover, distance to water source, vegetated area, deciduous forest, coniferous forest, and distance to paved road differed significantly (P < 0.01) between presence and random sites. Stepwise logistic regression was used to develop the habitat model. Landform type (e.g., ridges vs. plains) is the major topographic factor affecting leopard cat presence. The species also appears to prefer deciduous forests and areas far from paved roads. The habitat map derived from the model correctly classified 93.75% of an independent sample of leopard cat presence sites, and the map at a regional scale showed that the cat's habitats are highly fragmented. Critical habitats should be protected and their connectivity restored to preserve the leopard cat in the mountainous regions of Korea.
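Stepwise logistic regression selects variables for the standard logistic model, in which $p$ is the probability of leopard cat presence at a site and the $x_j$ are the retained environmental variables. Assuming the independent test sample is the 112 presence sites not used for model building, the reported classification rate corresponds to 105 correctly classified sites:

$$\operatorname{logit}(p)=\ln\frac{p}{1-p}=\beta_0+\sum_{j}\beta_j x_j, \qquad 0.9375 \times 112 = 105$$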

Design and Assessment of an Ozone Potential Forecasting Model using Multi-regression Equations in Ulsan Metropolitan Area (중회귀 모형을 이용한 울산지역 오존 포텐셜 모형의 설계 및 평가)

  • Kim, Yoo-Keun;Lee, So-Young;Lim, Yun-Kyu;Song, Sang-Keun
    • Journal of Korean Society for Atmospheric Environment / v.23 no.1 / pp.14-28 / 2007
  • This study selected ozone ($O_3$) potential factors and designed and assessed an $O_3$ potential prediction model based on multiple linear regression equations for the Ulsan area during spring (April to June) of 2000 to 2004. The $O_3$ potential factors were selected by analyzing the relationship between meteorological parameters and surface $O_3$ concentrations. In addition, cluster analysis (average linkage and K-means clustering) was performed to identify three major synoptic patterns (P1 to P3) for the $O_3$ potential prediction model. P1 is characterized by a low-pressure system over northeastern Korea, under which Ulsan is influenced by a northwesterly synoptic flow that retards sea-breeze development. P2 is characterized by a weakening high-pressure system over Korea, and P3 is clearly associated with a migratory anticyclone. Stepwise linear regression was used to develop models predicting the highest 1-h $O_3$ concentration in Ulsan. The models performed fairly well: the accuracy of simulating high $O_3$ for the P1, P2 and P3 synoptic patterns was 79, 85 and 95%, respectively, over 2000 to 2004. When driven by predicted meteorological data for 2005, the model achieved high-$O_3$ prediction accuracies of 78, 75 and 70% for P1, P2 and P3, respectively. The regression models can therefore be a useful tool for forecasting local $O_3$ concentrations.

A Study on the Internal Control System of Fisheries Cooperative (수산업협동조합의 내부통제제도에 관한 연구)

  • 박이봉;최정윤
    • The Journal of Fisheries Business Administration / v.22 no.2 / pp.101-148 / 1991
  • A fisheries cooperative (FC) carries out economic, non-profit activities with the fundamental objective of enhancing its members' economic and social position. An internal control system suited to a local FC is needed not only to solve the problems arising from today's increasingly complex FC environment but also to support the delegation of authority and operations from the FC Federation to local FCs under local autonomy. This study empirically examines and analyzes the condition of the FC internal control system (FCICS) by means of a questionnaire survey. The actual condition of the FCICS in Korea was analyzed as follows: (1) 208 questionnaires of 162 questions each were sent out, and 92 replies were received, from 39 manufacturers (business firms) and 15 banks in the Gyungnam and Pusan area and from 25 FCs and 13 agricultural cooperatives (ACs) nationwide; (2) the FC and AC results were analyzed together. Based on these results, an evaluation model of the FCICS was constructed from the relationship between the financial condition of the FCs and the internal control elements using stepwise regression. (1) The stepwise procedure retained three independent variables: the number of FC officials ($X_1$), the number of regular audits experienced ($X_7$), and years on auditing duty ($X_8$). (2) The final model is $Y=-1.53526+0.34455X_1+0.24513X_7+0.16585X_8$, which explains 47.826% of the variation in $Y$.
From this study, the following proposals are made: (1) the internal control function of the FCICS, and its problems, can be improved by strengthening the FCICS and raising management's awareness of it; (2) the cooperative president can build a sound FC by operating the FCICS rationally according to the FC's size and performance pattern, which in turn enhances members' economic and social position.
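The final FCICS model is a plain linear equation, so a predicted value for a given cooperative can be computed directly from its three retained variables. A minimal sketch in Python; the function name and the example input values are hypothetical:

```python
def fcics_score(x1: float, x7: float, x8: float) -> float:
    """Predicted Y (financial condition) from the reported stepwise model.

    x1: number of FC officials
    x7: number of regular audits experienced
    x8: years on auditing duty
    """
    return -1.53526 + 0.34455 * x1 + 0.24513 * x7 + 0.16585 * x8


# Hypothetical cooperative: 10 officials, 5 regular audits, 3 years of auditing duty.
print(round(fcics_score(10, 5, 3), 5))  # 3.63344
```

Since the model explains about 48% of the variation in $Y$, such point predictions carry considerable residual uncertainty.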

Climatic Influence on Seed Protein Content in Soybean(Glycine max) (기상요인이 콩 단백질 함량에 미치는 영향)

  • M. H. Yang;J. W. Burton
    • Korean Journal of Crop Science / v.42 no.5 / pp.539-547 / 1997
  • This study was carried out to identify how soybean seed protein concentration is influenced by climatic factors. Twelve lines selected for seed protein concentration were studied in 13 environments in North Carolina. The sensitivity of seed protein concentration, total seed protein and seed yield to climatic variables was investigated using a linear regression model. Best response models were determined using two stepwise selection methods, Maximum R-square and Stepwise Selection. Climatic variables had wide-ranging effects on seed protein concentration, total protein and seed yield. The environment with the highest protein concentration had the most high-temperature days (HTD) and the smallest variance of average daily temperature range (VADTRg), whereas the environment with the lowest protein concentration had the fewest HTD and the largest VADTRg. For protein concentration, all lines responded positively to average maximum daily temperature (MxDT), HTD and average daily temperature range (ADTRg) and negatively to ADRa, while their responses to average daily temperature (ADT), variance of average minimum daily temperature (VMnDT) and VADTRg were positive in some lines and negative in others, indicating that genotypes may differ greatly in their sensitivity to each climatic variable. Eleven lines had best response models with 2 or 3 variables. The exception, NC106, showed no significant sensitivity to any climatic variable and therefore had no best response model, which suggests that it may be phenotypically more stable. For total seed protein and seed yield, all lines responded negatively to both ADTRg and VADRa, suggesting that the synthesis of seed components may increase with a smaller daily temperature range and less variation in daily rainfall.
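The Maximum R-square criterion keeps, for each model size, the subset of climatic variables with the highest $R^2$. The sketch below finds those subsets by exhaustive search, which is feasible for a handful of predictors; the Maximum R-square procedure in statistical packages instead reaches its subsets by swapping one variable at a time and is not guaranteed to find the overall best subset, so results can differ. The column indices stand in for climatic variables such as MxDT or HTD and are hypothetical:

```python
from itertools import combinations

import numpy as np


def r_squared(X: np.ndarray, y: np.ndarray, cols: tuple) -> float:
    """R^2 of an OLS fit of y on an intercept plus X[:, cols]."""
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    tss = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - float(resid @ resid) / tss


def max_r2_subsets(X: np.ndarray, y: np.ndarray, max_k: int) -> dict:
    """For each size k = 1..max_k, return (R^2, columns) of the best k-variable subset."""
    best = {}
    for k in range(1, max_k + 1):
        best[k] = max(
            ((r_squared(X, y, cols), cols) for cols in combinations(range(X.shape[1]), k)),
            key=lambda t: t[0],
        )
    return best
```

Comparing the best subsets across sizes (for example by how much $R^2$ grows when a variable is added) is then a judgment call, which is consistent with the abstract's report of 2- or 3-variable models for most lines.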

Individual Tree Growth Models for Natural Mixed Forests in Changbai Mountains, Northeast China

  • Lu, Jun;Li, Fengri
    • Journal of Korean Society of Forest Science / v.96 no.2 / pp.160-169 / 2007
  • The data used to develop distance-independent individual-tree models for natural mixed forests were collected from 712 permanent sample plots (25,526 trees) remeasured over a 10-year period from 1990 to 2000 in the Baihe Forest Bureau of the Changbai Mountains, northeast China. Based on an analysis of the relationships between individual-tree diameter increment and tree size, competitive status and site condition, diameter growth models that are simple in form, precise and easy to apply were developed by stepwise regression for individual trees of 15 species growing in mixed-species, uneven-aged stands. The main variables influencing diameter increment were tree size and competition; site conditions were not significantly related to diameter increment. The tree size variables (lnDBH and $DBH^2$) were the most significant and important predictors of diameter growth and appeared in all 15 growth models. Diameter increment was directly proportional to tree diameter for each species. Among the competition factors, relative diameter (RD), canopy closure (P) and the ratio of the subject tree's diameter to the maximum diameter (DDM) contributed to diameter increment to some extent. Other measures of stand density, such as stand basal area (G) and stand density index (SDI), did not significantly influence diameter increment. Site factors such as site index, slope and aspect were unimportant and were excluded from the final models. The variance in squared diameter increment explained by the final models ($R^2$) ranged from 35% to 72% across the 15 species, which compares quite closely with the results of Wykoff (1990) for mixed conifer stands. The models were then validated against an independent data set.
The results indicated that estimation precision exceeded 94% for all models and that the models are suitable for describing diameter increment.
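A Wykoff-style specification regresses the logarithm of the squared diameter increment (DDS) on size and competition terms, for example $\ln(DDS)=b_0+b_1\ln DBH+b_2 DBH^2+b_3 RD+b_4 P+b_5 DDM$. The abstract does not give the exact form of the published models, so this equation, the linear entry of RD, P and DDM, and the function names below are assumptions for illustration; the stepwise selection step is omitted and only the final least-squares fit is shown:

```python
import numpy as np


def design_matrix(dbh, rd, p, ddm) -> np.ndarray:
    """Columns: intercept, lnDBH, DBH^2, and the competition measures RD, P, DDM.

    The linear entry of RD, P and DDM is an assumption, not the published form.
    """
    dbh = np.asarray(dbh, dtype=float)
    return np.column_stack([np.ones_like(dbh), np.log(dbh), dbh ** 2, rd, p, ddm])


def fit_log_dds(dbh, rd, p, ddm, dds) -> np.ndarray:
    """OLS fit of ln(DDS) on the design matrix; returns the coefficient vector b."""
    A = design_matrix(dbh, rd, p, ddm)
    beta, *_ = np.linalg.lstsq(A, np.log(np.asarray(dds, dtype=float)), rcond=None)
    return beta
```

Predictions on the original scale are `np.exp(design_matrix(...) @ beta)`; note that back-transforming a log-scale fit this way ignores the usual log-bias correction.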

Prediction of Quantitative Traits Using Common Genetic Variants: Application to Body Mass Index

  • Bae, Sunghwan;Choi, Sungkyoung;Kim, Sung Min;Park, Taesung
    • Genomics & Informatics / v.14 no.4 / pp.149-159 / 2016
  • With the success of genome-wide association studies (GWASs), many candidate loci for complex human diseases have been reported in the GWAS catalog. Recently, many disease prediction models based on penalized regression or statistical learning methods have been proposed, using candidate causal variants from the significant single-nucleotide polymorphisms of GWASs. However, there have been only a few systematic studies comparing existing methods. In this study, we first constructed risk prediction models, including stepwise linear regression (SLR), the least absolute shrinkage and selection operator (LASSO) and Elastic-Net (EN), using a GWAS chip and the GWAS catalog. We then compared their prediction accuracy by calculating the mean squared error (MSE) for body mass index on data from the Korea Association Resource (KARE). Our results show that SLR yields a smaller MSE than the other methods, while the number of selected variables was similar across models.
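LASSO and Elastic-Net differ from SLR in that they shrink coefficients through a penalty rather than adding or dropping variables one at a time. The sketch below fits both by cyclic coordinate descent with the standard soft-thresholding update and scores them by MSE; it is not the authors' implementation, it assumes centred X and y (no intercept), and `lam`, `alpha` and the function names are illustrative. In practice a library such as scikit-learn's `Lasso` and `ElasticNet` would be used; there the penalty strength is called `alpha` and the mixing weight `l1_ratio`.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def elastic_net_cd(X: np.ndarray, y: np.ndarray, lam: float, alpha: float = 1.0,
                   n_iter: int = 1000, tol: float = 1e-8) -> np.ndarray:
    """Minimise (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||_2^2).

    alpha = 1 gives the LASSO; 0 < alpha < 1 gives the Elastic-Net.
    Assumes centred X and y and no all-zero columns.
    """
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float)                 # residual y - X b (b starts at zero)
    z = (X ** 2).sum(axis=0) / n        # per-column scale (1/n) X_j^T X_j
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Partial-residual correlation for coordinate j.
            rho = X[:, j] @ r / n + z[j] * b[j]
            new = soft_threshold(rho, lam * alpha) / (z[j] + lam * (1 - alpha))
            if new != b[j]:
                r -= X[:, j] * (new - b[j])
                max_change = max(max_change, abs(new - b[j]))
                b[j] = new
        if max_change < tol:
            break
    return b


def mse(y_true, y_pred) -> float:
    """Mean squared error, the comparison metric used in the study."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))
```

With `lam=0` the update reduces to ordinary least squares, and a large enough `lam` drives every coefficient to zero; the penalty strength would normally be tuned by cross-validation and the MSE compared on held-out data, not on the training set.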