• Title/Summary/Keyword: multivariate regression analysis

Multivariate Analysis for Classification of Smog Type during the Summer Season in Seoul, Korea (다변량해석을 이용한 서울시 하계 스모그의 형태 분류)

  • 홍낙기;이종범;김용국
    • Journal of Korean Society for Atmospheric Environment / v.9 no.4 / pp.278-287 / 1993
  • To classify smog types during the summer season in Seoul, air quality and meteorological data were analyzed by multivariate analysis. Among 15 variables related to visibility, 10 were selected by multiple regression analysis for clustering of smog types: total suspended particles, sulfur dioxide, ozone, nitrogen dioxide, total hydrocarbons, south-north wind component, relative humidity, precipitable water, mixing height, and air temperature. Smog types were grouped into three clusters using the cubic clustering criterion, and the clusters contained 74, 28, and 16 days, respectively. The clusters were clearly separated by sulfur dioxide, precipitable water, and air temperature. The first cluster was characterized by high ozone concentration and prevailing meteorological conditions favorable to ozone formation; visibility in this cluster was therefore considered to be affected by photochemical smog. The third cluster showed characteristics of sulfurous smog, with higher concentrations of primary pollutants under drier conditions than in the other clusters. The second cluster was less clearly characterized, but was considered intermediate between the photochemical and sulfurous smog types.

Evaluating Variable Selection Techniques for Multivariate Linear Regression (다중선형회귀모형에서의 변수선택기법 평가)

  • Ryu, Nahyeon;Kim, Hyungseok;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.42 no.5 / pp.314-326 / 2016
  • The purpose of variable selection techniques is to select a subset of relevant variables for a particular learning algorithm in order to improve both the accuracy and the efficiency of the prediction model. We conduct an empirical analysis to evaluate and compare seven well-known variable selection techniques for the multiple linear regression model, which is one of the most commonly used regression models in practice. The variable selection techniques we apply are forward selection, backward elimination, stepwise selection, genetic algorithm (GA), ridge regression, lasso (Least Absolute Shrinkage and Selection Operator), and elastic net. Based on experiments with 49 regression data sets, we find that GA yields the lowest error rates, while lasso reduces the number of variables the most. In terms of computational efficiency, forward selection, backward elimination, and lasso require less time than the other techniques.
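
One way to see why lasso tends to drop variables while ridge regression only shrinks them is the special case of an orthonormal design (XᵀX = I). There, minimizing ½‖y − Xb‖² + λ‖b‖₁ soft-thresholds each ordinary least squares coefficient, and minimizing ½‖y − Xb‖² + (λ/2)‖b‖² rescales each one by 1/(1 + λ). The sketch below illustrates only this textbook special case; the coefficient values and the penalty `lam` are hypothetical and are not taken from the paper's 49 data sets.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: shrink z toward zero by lam, clipping at zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_orthonormal(ols_coefs, lam):
    """Lasso solution when the design matrix has orthonormal columns."""
    return [soft_threshold(b, lam) for b in ols_coefs]


def ridge_orthonormal(ols_coefs, lam):
    """Ridge solution when the design matrix has orthonormal columns."""
    return [b / (1.0 + lam) for b in ols_coefs]


# Hypothetical OLS coefficients: two strong predictors, two weak ones.
ols = [3.0, -0.4, 0.2, 1.5]
lam = 0.5  # hypothetical penalty strength

lasso = lasso_orthonormal(ols, lam)
ridge = ridge_orthonormal(ols, lam)

# Count the variables each method keeps (non-zero coefficients).
n_kept_lasso = sum(1 for b in lasso if b != 0.0)
n_kept_ridge = sum(1 for b in ridge if b != 0.0)

print("lasso:", lasso, "kept:", n_kept_lasso)
print("ridge:", ridge, "kept:", n_kept_ridge)
```

In this toy case the two weak coefficients fall inside the threshold and are set exactly to zero by lasso, whereas ridge keeps all four with reduced magnitude.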

Analysis of Factors for Korean Women's Cancer Screening through Hadoop-Based Public Medical Information Big Data Analysis (Hadoop기반의 공개의료정보 빅 데이터 분석을 통한 한국여성암 검진 요인분석 서비스)

  • Park, Min-hee;Cho, Young-bok;Kim, So Young;Park, Jong-bae;Park, Jong-hyock
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.10 / pp.1277-1286 / 2018
  • In this paper, we provide an Apache Hadoop-based cloud environment with flexibly scalable computing resources for the analysis of public medical information big data. The environment can quickly and flexibly extend storage, memory, and other resources as log data accumulates or grows over time. In addition, when real-time analysis of accumulated unstructured log data is required, the system adopts a Hadoop-based analysis module to overcome the processing limits of existing analysis tools, and thus provides fast and reliable parallel distributed processing of large amounts of log data. For the big data analysis, frequency analysis and chi-square tests were performed. In addition, multivariate logistic regression analysis at a significance level of 0.05 and multivariate logistic regression analysis of the significant variables (p<0.05) were performed. Multivariate logistic regression analysis was performed for each of the 3 models.

Predicting Landslide Damaged Area According to Climate Change Scenarios (기후변화 시나리오를 적용한 산사태 피해면적 변화 예측)

  • Song Eu
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.4 / pp.376-386 / 2023
  • Due to climate change, landslide hazards in the Republic of Korea (hereafter South Korea) are continuously increasing. To establish effective landslide mitigation strategies, such as erosion control works, long-term landslide hazard estimation should take the influence of climate change into account. In this study, we examined the change in landslide-damaged areas in South Korea in response to climate change scenarios using the multivariate regression method. Data on landslide-damaged areas and rainfall from 1981-2010 were used as a training dataset. Seven indices were derived from the rainfall data as the model's input data, corresponding to the rainfall indices provided by two SSP scenarios for South Korea: SSP1-2.6 and SSP5-8.5. Prior to the multivariate regression analysis, we conducted a VIF test and a dimension analysis of the regression model using PCA. Based on the PCA results, we developed a regression model for estimating landslide-damaged area with two principal components, which covered about 93% of the total variance. Using the climate change scenarios, we simulated landslide-damaged areas for 2030-2100 with the regression model. As a result, the landslide-damaged area is projected to grow to more than double the annual mean landslide-damaged area of 1981-2010; this implies that landslide mitigation strategies should be reinforced in light of future climate conditions.
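
The VIF test and PCA step mentioned above can be illustrated in the simplest possible setting of two standardized predictors. In that case the variance inflation factor of either predictor is 1 / (1 − r²), where r is their Pearson correlation, and the first principal component of the 2×2 correlation matrix explains (1 + |r|) / 2 of the total variance. This is only a minimal sketch: the abstract does not list the seven rainfall indices or their values, so the two series below (`index_a`, `index_b`) are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF of either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


def first_pc_variance_share(x1, x2):
    """Share of total variance explained by the first principal component
    of the 2x2 correlation matrix (eigenvalues are 1 + |r| and 1 - |r|)."""
    r = pearson_r(x1, x2)
    return (1.0 + abs(r)) / 2.0


# Hypothetical, nearly collinear rainfall indices (not from the paper).
index_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
index_b = [1.1, 1.9, 3.1, 3.9, 5.1, 5.9, 7.1, 7.9]

vif = vif_two_predictors(index_a, index_b)
share = first_pc_variance_share(index_a, index_b)

print(f"VIF = {vif:.1f}")
print(f"first PC explains {share:.1%} of the variance")
```

A VIF far above the common rule-of-thumb cutoff of 10 signals that the two indices carry almost the same information, which is the situation in which replacing them with a small number of principal components is a reasonable choice.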

A study on the properties of sensitivity analysis in principal component regression and latent root regression (주성분회귀와 고유값회귀에 대한 감도분석의 성질에 대한 연구)

  • Shin, Jae-Kyoung;Chang, Duk-Joon
    • Journal of the Korean Data and Information Science Society / v.20 no.2 / pp.321-328 / 2009
  • In regression analysis, the ordinary least squares estimates of the regression coefficients become poor when the correlations among predictor variables are high. This phenomenon, called multicollinearity, causes serious problems in actual data analysis. Many methods have been proposed to overcome multicollinearity, including ridge regression, shrinkage estimators, and methods based on principal component analysis (PCA) such as principal component regression (PCR) and latent root regression (LRR). In the last decade, many statisticians have discussed sensitivity analysis (SA) in ordinary multiple regression and the same topic in PCR, LRR, and logistic principal component regression (LPCR). PCA plays an important role in these methods, and many statisticians have discussed SA in PCA and related multivariate methods. We introduce the methods of PCR and LRR, introduce the methods of SA in PCR and LRR, and discuss the properties of SA in PCR and LRR.

Predictors of Chewing Discomfort among Community-dwelling Elderly (지역사회 노인에서의 저작불편감 예측요인)

  • Moon, Seol Hwa;Hong, Gwi-Ryung Son
    • Research in Community and Public Health Nursing / v.28 no.3 / pp.302-312 / 2017
  • Purpose: The purpose of this study was to identify factors associated with chewing discomfort among community-dwelling elderly. Methods: This study used a cross-sectional design with secondary analysis of data from the 6th Korea National Health and Nutrition Examination Survey. Of the 7,550 survey participants, data from 1,126 adults aged 65 years and over were analyzed. Chewing discomfort was assessed as self-perceived chewing discomfort. Multivariate logistic regression analysis was used to find the factors associated with chewing discomfort. Results: In total, 61.7% of the participants reported chewing discomfort, 85.2% reported perceiving poor oral health, and 35.0% had oral pain. In multivariate logistic regression, perceived oral health (OR 3.22, 95% CI 2.24~4.63), oral pain (OR 2.46, 95% CI 1.76~3.43), activity limitation (OR 1.71, 95% CI 1.05~2.80), teeth requiring treatment (OR 1.61, 95% CI 1.14~2.26), number of remaining teeth (OR 1.60, 95% CI 1.22~2.10), and educational level (OR 1.56, 95% CI 1.15~2.12) were significant predictors of chewing discomfort. Conclusion: The prevalence of chewing discomfort was high among elderly Koreans, and various factors were associated with it. To improve chewing ability, national-level policies should offer strategic oral health programs for this population.
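
The odds ratios and intervals reported above can be related back to logistic regression coefficients. Assuming the intervals are Wald-type (the abstract does not state how they were computed), an odds ratio is exp(β) and its 95% CI is exp(β ± z·SE) with z ≈ 1.96. The sketch below back-calculates β and SE from the reported perceived oral health result (OR 3.22, 95% CI 2.24~4.63) and checks that the round trip reproduces the interval; the function names are illustrative only.

```python
import math

# Standard normal quantile for a two-sided 95% interval.
Z_95 = 1.959963984540054


def odds_ratio_ci(beta, se, z=Z_95):
    """Odds ratio and Wald-type confidence interval from a logit coefficient."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def coef_from_or_ci(odds_ratio, ci_low, ci_high, z=Z_95):
    """Recover the coefficient and its standard error from a reported OR and CI,
    assuming the interval is symmetric on the log-odds scale."""
    beta = math.log(odds_ratio)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z)
    return beta, se


# Reported result for perceived oral health: OR 3.22, 95% CI 2.24~4.63.
beta, se = coef_from_or_ci(3.22, 2.24, 4.63)
or_hat, low, high = odds_ratio_ci(beta, se)

print(f"beta = {beta:.3f}, SE = {se:.3f}")
print(f"OR = {or_hat:.2f}, 95% CI {low:.2f}~{high:.2f}")
```

Because the published values are rounded to two decimals, the recovered coefficient and standard error are approximations rather than the authors' exact estimates.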

Factors Affecting Colorectal Cancer Screening Behaviors : Based on the 4th Korea National Health and Nutrition Examination Survey (대장암 조기 검진 행위에 영향을 미치는 요인 -제4차 2기(2008년) 국민건강영양조사 자료를 중심으로-)

  • Lim, Ji-Hye;Kim, Sun-Young
    • Korean Journal of Health Education and Promotion / v.28 no.1 / pp.69-80 / 2011
  • Objectives: This study aims to identify the factors associated with colorectal cancer screening behaviors. Methods: A nationally representative sample of 2,928 adults aged ≥50 years, the target group for colorectal cancer screening, was drawn from the fourth Korea National Health and Nutrition Examination Survey (KNHANES IV). This study investigated socio-demographic, health behavioral, and contextual factors associated with colorectal cancer screening using descriptive statistics and multivariate logistic regression analysis. Results: Among socio-demographic factors, gender, age, marital status, occupation, monthly income, and region of residence differed significantly between the screening and non-screening groups. Among health behavioral and contextual factors, regular physical checkup, weight control, physical activity, smoking, drinking, and having other cancers differed significantly. In the multivariate logistic regression analysis, marital status, education level, regular physical checkup, and weight control were associated with colorectal cancer screening behavior. Conclusions: It is necessary to understand the importance of early detection and cancer screening. Appropriate health education and active promotion of cancer screening should be developed based on the study findings in order to motivate people to undergo screening. These findings should also be reflected in health policy.

Customer Churning Forecasting and Strategic Implication in Online Auto Insurance using Decision Tree Algorithms (의사결정나무를 이용한 온라인 자동차 보험 고객 이탈 예측과 전략적 시사점)

  • Lim, Se-Hun;Hur, Yeon
    • Information Systems Review / v.8 no.3 / pp.125-134 / 2006
  • This article adopts a decision tree algorithm (C5.0) to predict customer churn in the online auto insurance environment. Using a sample of online auto insurance customer contracts sold between 2003 and 2004, we test how well a decision tree-based model (C5.0) predicts customer churn. We compare the results of C5.0 with those of a logistic regression model (LRM) and a multivariate discriminant analysis (MDA) model. The results show that C5.0 outperforms the other models in predictive performance. Based on these results, this study suggests ways of setting marketing strategy and developing the online auto insurance business.

Factors Influencing Human Papillomavirus Vaccination Intention among Unvaccinated Nursing Students in Korea (인유두종바이러스 백신 미접종 간호대생의 접종의도 영향 요인)

  • Yun, Younghee;Koh, Chin-Kang
    • The Journal of Korean Academic Society of Nursing Education / v.24 no.3 / pp.205-213 / 2018
  • Purpose: This study was performed to identify factors associated with human papillomavirus vaccination intention among unvaccinated nursing students. Methods: Two hundred and five female nursing students from three universities completed self-administered questionnaires covering participants' characteristics, human papillomavirus-related knowledge, attitude toward human papillomavirus vaccination, and human papillomavirus-related health beliefs. Multivariate logistic regression analysis was used to determine significant independent predictors of human papillomavirus vaccination intention. Results: Of 205 participants, 134 (65.4%) reported an intention to obtain a vaccination against human papillomavirus. In the analysis of bivariate relationships, family history of cervical cancer, perceived needs, importance of prevention, perceived susceptibility, perceived benefit, and perceived barrier were significantly related to vaccination intention. A multivariate logistic regression model identified the following factors of human papillomavirus vaccination intention: higher importance of prevention (Adjusted Odds Ratio [AOR]: 4.20, 95% Confidence Interval [CI]: 1.73~10.19), higher perceived benefit (AOR: 6.94, 95% CI: 2.01~23.98), and lower perceived barrier (AOR: 0.39, 95% CI: 0.20~0.73). Conclusion: The results of this study indicate significant factors influencing the intention to obtain human papillomavirus vaccination in unvaccinated nursing students. The importance of prevention, perceived susceptibility, perceived benefit, and perceived barrier in obtaining human papillomavirus vaccination should be taken into account when developing educational programs.

Study on the Annoyance Response in the Area Exposed by Road Traffic Noise and Railway Noise (도로교통소음과 철도소음 복합노출지역에서의 성가심 반응)

  • Ko, Joon-Hee;Chang, Seo-Il;Son, Jin-Hee;Lee, Kun
    • Transactions of the Korean Society for Noise and Vibration Engineering / v.20 no.2 / pp.172-178 / 2010
  • Multiple regression analysis and path analysis were conducted for each area dominated by a given noise source to analyze the relationship between dependent variables such as annoyance and independent variables such as noise and non-noise factors. The multiple regression analysis shows that noise factors have the greatest impact on annoyance in areas dominated by road traffic noise and by railway noise. Meanwhile, the impact of non-noise factors such as sensitivity and satisfaction with the environment on annoyance is also high in these areas. The path analysis, used as a multivariate analysis of the various independent and dependent variables, gives results similar to those of the multiple regression analysis. However, the noise factor has the greatest influence on annoyance in areas dominated by combined noise, and the relationship between annoyance and sensitivity is strongest in the area exposed to both road traffic noise and railway noise.