• Title/Abstract/Keywords: stepwise regression model

382 search results, processing time 0.031 s

방화 발생에 영향을 미치는 요인에 관한 연구 (A Study on the Factors Affecting the Arson)

  • 김영철;박우성;이수경
    • 한국화재소방학회논문지 / Vol. 28, No. 2 / pp. 69-75 / 2014
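  • To identify the factors that affect the occurrence of arson, this study performed multiple regression analysis with the number of arson cases as the dependent variable and economic, demographic, and social factors as independent variables. The regression was applied to four functional forms (linear, semi-log, inverse semi-log, and double-log), using stepwise selection, which considers both the inclusion and the removal of variables at each step. To address multicollinearity and autocorrelation, the variance inflation factor (VIF) and the Durbin-Watson statistic were used. Among the four functional forms, the linear model, which had the highest adjusted R-squared (0.935, i.e., 93.5% explanatory power) and was statistically significant, was selected as the best model and then interpreted. According to the linear model, the factors affecting the occurrence of arson were, in order, the number of crimes (0.829), the general divorce rate (0.151), the fiscal autonomy ratio (0.149), and the consumer price inflation rate (0.099).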
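
The diagnostics named in this abstract (VIF, the Durbin-Watson statistic, adjusted R-squared) have standard textbook definitions. The following is a minimal sketch of those definitions only, not the authors' SPSS procedure; the function names and the inputs are hypothetical.

```python
def vif_from_r2(r2_j):
    """VIF of predictor j, given the R^2 from regressing j on the other predictors."""
    return 1.0 / (1.0 - r2_j)


def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e ** 2 for e in residuals)
    return num / den


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Hypothetical inputs, for illustration only (not data from the paper).
print(vif_from_r2(0.9))
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))
print(adjusted_r2(0.5, n=11, p=2))
```

A common rule of thumb treats a VIF above 10 as a sign of problematic multicollinearity; the abstract does not state which threshold was used.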

Pure additive contribution of genetic variants to a risk prediction model using propensity score matching: application to type 2 diabetes

  • Park, Chanwoo;Jiang, Nan;Park, Taesung
    • Genomics & Informatics / Vol. 17, No. 4 / pp. 47.1-47.12 / 2019
  • The achievements of genome-wide association studies have suggested ways to predict diseases, such as type 2 diabetes (T2D), using single-nucleotide polymorphisms (SNPs). Most T2D risk prediction models have used SNPs in combination with demographic variables. However, it is difficult to evaluate the pure additive contribution of genetic variants to classically used demographic models. Since prediction models include some heritable traits, such as body mass index, the contribution of SNPs using unmatched case-control samples may be underestimated. In this article, we propose a method that uses propensity score matching to avoid underestimation by matching case and control samples, thereby determining the pure additive contribution of SNPs. To illustrate the proposed propensity score matching method, we used SNP data from the Korea Association Resources project and reported SNPs from the genome-wide association study catalog. We selected various SNP sets via stepwise logistic regression (SLR), least absolute shrinkage and selection operator (LASSO), and the elastic-net (EN) algorithm. Using these SNP sets, we made predictions using SLR, LASSO, and EN as logistic regression modeling techniques. The accuracy of the predictions was compared in terms of area under the receiver operating characteristic curve (AUC). The contribution of SNPs to T2D was evaluated by the difference in the AUC between models using only demographic variables and models that included the SNPs. The largest difference among our models showed that the AUC of the model using genetic variants with demographic variables could be 0.107 higher than that of the corresponding model using only demographic variables.
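
The comparison described in this abstract rests on the difference in AUC between a demographic-only model and a model that also includes SNPs. As a rough sketch, AUC can be computed as the fraction of (case, control) pairs that a score ranks correctly. The labels and scores below are made up, and `auc_gain` is a hypothetical helper name, not something from the paper.

```python
def auc(labels, scores):
    """AUC as the share of (case, control) pairs ranked correctly; ties count half."""
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def auc_gain(labels, base_scores, full_scores):
    """AUC of the fuller model minus AUC of the baseline model."""
    return auc(labels, full_scores) - auc(labels, base_scores)


# Made-up example: 1 = case, 0 = control.
labels = [1, 1, 0, 0]
demographic_only = [0.6, 0.4, 0.5, 0.3]
with_snps = [0.9, 0.8, 0.2, 0.1]
print(auc_gain(labels, demographic_only, with_snps))
```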

랜덤 포리스트를 이용한 비제어 급성 출혈성 쇼크의 흰쥐에서의 생존 예측 (A Survival Prediction Model of Rats in Uncontrolled Acute Hemorrhagic Shock Using the Random Forest Classifier)

  • 최준열;김성권;구정모;김덕원
    • 대한의용생체공학회:의공학회지 / Vol. 33, No. 3 / pp. 148-154 / 2012
  • Hemorrhagic shock is a primary cause of death resulting from injury worldwide. Although many studies have tried to accurately diagnose hemorrhagic shock in the early stage, such attempts were not successful due to compensatory mechanisms of humans. The objective of this study was to construct a survival prediction model of rats in acute hemorrhagic shock using a random forest (RF) model. Heart rate (HR), mean arterial pressure (MAP), respiration rate (RR), lactate concentration (LC), and peripheral perfusion (PP) measured in rats were used as input variables for the RF model, and its performance was compared with that of a logistic regression (LR) model. Before constructing the models, we performed 5-fold cross validation for RF variable selection and forward stepwise variable selection for the LR model to examine which variables were important for the models. For the LR model, sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (ROC-AUC) were 0.83, 0.95, 0.88, and 0.96, respectively. For the RF model, sensitivity, specificity, accuracy, and AUC were 0.97, 0.95, 0.96, and 0.99, respectively. In conclusion, the RF model was superior to the LR model for survival prediction in the rat model.
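
Sensitivity, specificity, and accuracy, as reported in this abstract, all follow from the counts in a 2x2 confusion matrix. The sketch below shows the standard definitions on invented labels; the encoding (1 = positive class) is an assumption and the data are not from the study.

```python
def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity, and accuracy for binary labels (1 = positive).

    Assumes both classes occur in y_true; otherwise a denominator is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "accuracy": (tp + tn) / len(pairs),
    }


# Invented labels and predictions, for illustration only.
print(classification_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1]))
```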

Status of PM10 as an air pollutant and prediction using meteorological indexes in Shiraz, Iran

  • Masoudi, Masoud;Poor, Neda Rajai;Ordibeheshti, Fatemeh
    • Advances in environmental research / Vol. 7, No. 2 / pp. 109-120 / 2018
  • In the present study, air quality analyses for $PM_{10}$ were conducted in Shiraz, a city in the south of Iran. Measurements were taken from 2011 through 2012 at two locations to obtain average data for the city. Average concentrations were calculated for every 24 hours, each month, and each season. The results showed that the highest concentration of $PM_{10}$ generally occurs at night, while the lowest concentration was found in the afternoon. Monthly concentrations of $PM_{10}$ were highest in August and lowest in January. Seasonal concentrations were lowest in autumn and highest in summer. Relations between the air pollutant and several meteorological parameters were calculated statistically using the daily average data. Wind data (velocity, direction), relative humidity, temperature, sunshine periods, evaporation, dew point, and rainfall were considered as independent variables. The relationships between pollutant concentration and meteorological parameters were expressed by multiple linear regression equations for both annual and seasonal conditions using SPSS software. An RMSE test showed that, among the different prediction models, the stepwise model is the best option.

한국과학영재학교 학생의 학교생활만족도: 생태학적 접근 (An Analysis of Ecological Factors Affecting Student-Life Satisfaction in Korea Science Academy)

  • 김애희;윤종희
    • 대한가정학회지 / Vol. 48, No. 2 / pp. 51-62 / 2010
  • The primary purpose of this study was to employ an ecological model to analyze the relative magnitudes of significant predictors affecting school life satisfaction in Korea Science Academy. The instruments used for this study were the School Life Satisfaction Scale, Self-Efficacy Scale, Relationship Skill Scale, Internal Control Scale, Emotional Intelligence Scale, and FACE IV Scale. Data were collected by purposive sampling of 180 students of the Korea Science Academy in Busan, Korea. The data were analyzed by frequency, percentile, mean, standard deviation, Cronbach's ${\alpha}$, Pearson's product-moment correlation, hierarchical regression, and stepwise regression, using the SPSS 15.0 for Windows package. The results were as follows: 1. The level of school life satisfaction in Korea Science Academy was found to be high (Mean = 4.24, SD = 0.57). 2. Model IV was the most powerful. It explained 49.7% of the school life satisfaction. 3. Relationship with friends (${\beta}$ = .443), relationship with teachers (${\beta}$ = .273), and self-efficacy (${\beta}$ = .201) were significant factors in explaining school life satisfaction. The three variables explained 49.9% of school life satisfaction.

노인의 치매예방 행위의도에 미치는 영향요인 (Factors Influencing Dementia Preventive Behavior Intention in the Elderly People)

  • 최원희;서영미;김보람
    • 동서간호학연구지 / Vol. 25, No. 2 / pp. 138-146 / 2019
  • Purpose: The purpose of this study was to identify the factors influencing dementia preventive behavior intention of elderly people based on the Health Belief Model. Methods: The participants included 113 elderly people who met the eligibility criteria. Demographic variables, variables of the Health Belief Model (perceived susceptibility, perceived severity, perceived benefit, perceived barrier, cues to action, general health motivation, and self efficacy), dementia fear, and behavioral intention of dementia prevention were examined using structured self-report questionnaires. Statistical analysis was performed by stepwise multiple regression using SPSS for Windows version 21. Results: Self efficacy, alcohol drinking, perceived barrier, and education level were significant factors, which explained 32% of the variance in dementia preventive behavior intention. Multiple regression analysis demonstrated that self efficacy was a powerful predictor of dementia preventive behavior intention in the elderly. Conclusion: Developing nursing interventions that enhance self efficacy in order to improve dementia preventive behavior among elderly people is recommended.

An Empirical Testing of a House Pricing Model in the Indian Market

  • HODA, Najmul;JAFRI, Syed Ashraf;AHMAD, Naim;HUSSAIN, Syed Mannawar
    • The Journal of Asian Finance, Economics and Business / Vol. 7, No. 8 / pp. 33-40 / 2020
  • The main aim of the study is to test a house pricing model by combining hedonic and asset-based pricing models. An understanding of the relationship between house pricing and its return (the rental income) helps to establish houses as a significant asset class. The model tested the relationship between house pricing (dependent variable) and the house attributes (independent variables) derived from Freeman's framework of housing attributes. This study uses a large data set of 1,899 new, high-end houses purchased between 2016 and 2019, collected from the national capital region of India (Delhi-NCR). The algorithm was built as an R script, and stepwise multiple linear regression was used to analyze the model. The analysis shows that three significant variables, namely carpet area, pay-off, and annual maintenance charges, explain the price function. Further, the model fits the data well statistically. The major contribution of the study is to clarify the key factors and their influence on house pricing. The model will be helpful in risk assessment for housing investment and may improve investment prospects. Policy-makers can use information about the underlying valuation drivers of house prices to stabilize the market and to frame tax policies.

딥러닝과 머신러닝을 이용한 아파트 실거래가 예측 (Apartment Price Prediction Using Deep Learning and Machine Learning)

  • 김학현;유환규;오하영
    • 정보처리학회논문지:소프트웨어 및 데이터공학 / Vol. 12, No. 2 / pp. 59-76 / 2023
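  • Since the COVID-19 era, the rise in apartment prices has been irrational. In such an uncertain real estate market, price prediction research is very important. In this paper, we build a large dataset of 870,000 records covering 2015 to 2020 by collecting and crawling data from various real estate sites, gather as many variables as possible, including apartment information and economic indicators, and build a model that predicts future apartment sale transaction prices. The study first resolved multicollinearity by removing and combining variables. It then used five variable selection algorithms to extract meaningful independent variables: Forward Selection, Backward Elimination, Stepwise Selection, L1 Regularization, and principal component analysis (PCA). Four machine learning and deep learning algorithms, namely a deep neural network (DNN), XGBoost, CatBoost, and Linear Regression, were trained after hyperparameter optimization, and the predictive power of the models was compared. In an additional experiment, the number of nodes and layers of the DNN was varied to find the most suitable configuration. Finally, the best-performing model was used to predict apartment sale transaction prices for 2021, and a comparison with the actual 2021 data showed excellent results. We are confident that machine learning and deep learning can help investors make sound decisions when buying homes under varied economic conditions.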
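
Of the variable selection methods listed in this abstract, forward selection is the simplest to illustrate. The sketch below is a generic greedy version that adds whichever column most reduces the residual sum of squares (RSS) and stops when the relative improvement falls below a threshold. The stopping rule, the `min_gain` parameter, and the toy data are assumptions made for illustration; the paper's own criterion is not stated here. It assumes NumPy is available.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit (with intercept) on the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_select(X, y, min_gain=0.01):
    """Greedy forward selection; stops when the relative RSS gain is below min_gain."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected, remaining = [], list(range(X.shape[1]))
    current = rss(X, y, selected)  # intercept-only model
    while remaining:
        best_rss, best_col = min((rss(X, y, selected + [c]), c) for c in remaining)
        if current <= 0 or (current - best_rss) / current < min_gain:
            break
        selected.append(best_col)
        remaining.remove(best_col)
        current = best_rss
    return selected


# Toy data: three mutually orthogonal columns; y depends on columns 0 and 2 only.
x0 = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
x1 = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
x2 = np.array([1, 1, 1, 1, -1, -1, -1, -1], dtype=float)
X = np.column_stack([x0, x1, x2])
y = 3 * x0 + x2 + 0.1 * x0 * x1 * x2  # last term acts as noise orthogonal to all columns
print(forward_select(X, y))
```

Stepwise selection in the stricter sense extends this loop by also testing, after each addition, whether a previously selected variable can be dropped.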

A Yield Estimation Model of Forage Rye Based on Climate Data by Locations in South Korea Using General Linear Model

  • Peng, Jing Lun;Kim, Moon Ju;Kim, Byong Wan;Sung, Kyung Il
    • 한국초지조사료학회지 / Vol. 36, No. 3 / pp. 205-214 / 2016
  • The objective of this study was to construct a forage rye (FR) dry matter yield (DMY) estimation model based on climate data by locations in South Korea. A data set (n = 549) covering 29 years was used. Six optimal climatic variables were selected through stepwise multiple regression analysis with DMY as the response variable. Subsequently, via a general linear model, the final model including the six climatic variables and cultivated locations as dummy variables was constructed as follows: DMY = 104.166SGD + 1.454AAT + 147.863MTJ + 59.183PAT150 - 4.693SRF + 45.106SRD - 5230.001 + Location, where SGD was spring growing days, AAT was autumnal accumulated temperature, MTJ was mean temperature in January, PAT150 was period to accumulated temperature 150, SRF was spring rainfall, and SRD was spring rainfall days. The model constructed in this research could explain 24.4% of the variation in DMY of FR. The assumptions of homoscedasticity and of residuals with a mean of zero were satisfied. The goodness of fit of the model was adequate, as most of the predicted DMY values fell within the 95% confidence interval.
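
The fitted equation in this abstract can be applied directly once the six climatic variables are known. The sketch below copies the coefficients from the abstract. The location dummy effects are not given there, so `location_effect` is a hypothetical parameter that defaults to 0, and the input values in the example are invented.

```python
def predict_dmy(sgd, aat, mtj, pat150, srf, srd, location_effect=0.0):
    """Forage rye dry matter yield from the equation quoted in the abstract.

    location_effect stands in for the location dummy term, whose values
    are not reported in the abstract (assumption: 0.0 by default).
    """
    return (
        104.166 * sgd      # spring growing days
        + 1.454 * aat      # autumnal accumulated temperature
        + 147.863 * mtj    # mean temperature in January
        + 59.183 * pat150  # period to accumulated temperature 150
        - 4.693 * srf      # spring rainfall
        + 45.106 * srd     # spring rainfall days
        - 5230.001         # intercept
        + location_effect
    )


# Invented inputs, for illustration only.
print(predict_dmy(sgd=60, aat=1000, mtj=-2.0, pat150=20, srf=200, srd=25))
```

Because the model explains 24.4% of the variation in DMY, any single prediction from it carries considerable uncertainty.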

통계적 예측모형을 활용한 경륜 경기 순위 분석 (Analysis of cycle racing ranking using statistical prediction models)

  • 박가희;박리라;송종우
    • 응용통계연구 / Vol. 30, No. 1 / pp. 25-39 / 2017
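  • As of 2015, cycle racing has become a popular leisure sport, with more than 5 million participants and sales exceeding 2 trillion won. The purpose of this study is to predict the rankings of cycle races using various statistical methods and to identify the variables that significantly affect ranking. Rank prediction models were built with various classification and regression methods and then compared. Among the variables commonly selected by most models, rank improves when a rider has been demoted in grade and when the overall score is higher; conversely, rank worsens when a rider has been promoted in grade, is assigned number 4, or has lower recent results. In addition, models were fitted both to the raw data and to data in which the continuous variables related to rider ability were adjusted by subtracting the per-race mean, and every model showed a lower misclassification rate with the adjusted data. Finally, when the results of the most recent month of races, which were not used in the analysis, were predicted and bets were placed accordingly, the prediction rate was high in all cases but large profits were not obtained, because the models predicted well only the outcomes of races with low payout odds.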
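
The adjustment described in this abstract, subtracting the per-race mean from each continuous ability variable, can be sketched as follows. The field names `race` and `score` and the sample rows are hypothetical, chosen only to show the operation.

```python
from collections import defaultdict


def center_by_race(rows, race_key, value_key):
    """Return copies of rows with value_key replaced by its deviation from the race mean."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[race_key]] += row[value_key]
        counts[row[race_key]] += 1
    centered = []
    for row in rows:
        mean = totals[row[race_key]] / counts[row[race_key]]
        new_row = dict(row)  # copy so the input rows are left unchanged
        new_row[value_key] = row[value_key] - mean
        centered.append(new_row)
    return centered


# Hypothetical rows: one rider per row, grouped by race.
rows = [
    {"race": 1, "score": 10.0},
    {"race": 1, "score": 20.0},
    {"race": 2, "score": 5.0},
]
print(center_by_race(rows, "race", "score"))
```

Centering within each race makes a rider's value relative to the other riders in the same race, which is what determines the finishing order.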