• Title/Abstract/Keywords: Multiple-Regression

Evaluation of the Relationship between the Exposure Level to Mixed Hazardous Heavy Metals and Health Effects Using Factor Analysis

  • 김은섭;문선인;임동혁;최병선;박정덕;엄상용;김용대;김헌
    • 한국환경보건학회지 / Vol. 48, No. 4 / pp.236-243 / 2022
  • Background: In the case of multiple exposures to different types of heavy metals, such as the conditions faced by residents living near a smelter, it would be preferable to group hazardous substances with similar characteristics rather than individually related substances and evaluate the effects of each group on the human body. Objectives: The purpose of this study is to evaluate the utility of factor analysis in the assessment of health effects caused by exposure to two or more hazardous substances with similar characteristics, such as in the case of residents living near a smelter. Methods: Heavy metal concentration data for 572 people living in the vicinity of the Janghang smelter area were grouped based on several subfactors according to their characteristics using factor analysis. Using these factor scores as an independent variable, multiple regression analysis was performed on health effect markers. Results: Through factor analysis, three subfactors were extracted. Factor 1 contained copper and zinc in serum and revealed a common characteristic of the enzyme co-factor in the human body. Factor 2 involved urinary cadmium and arsenic, which are harmful metals related to kidney damage. Factor 3 encompassed blood mercury and lead, which are classified as related to cardiovascular disease. As a result of multiple linear regression analysis, it was found that using the factor index derived through factor analysis as an independent variable is more advantageous in assessing the relevance to health effects than when analyzing the two heavy metals by including them in a single regression model. Conclusions: The results of this study suggest that regression analysis linked with factor analysis is a good alternative in that it can simultaneously identify the effects of heavy metals with similar properties while overcoming multicollinearity that may occur in environmental epidemiologic studies on exposure to various types of heavy metals.
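
The multicollinearity problem named in the conclusions can be illustrated with a minimal sketch. It is not the authors' procedure: the exposure values are hypothetical, and a two-variable principal-component score stands in for a factor score (proper factor analysis would estimate loadings and uniquenesses). The helper names `pearson_r` and `zscores` are also assumptions of this sketch.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def zscores(x):
    """Standardize to mean 0 and sample standard deviation 1."""
    n = len(x)
    m = sum(x) / n
    sd = math.sqrt(sum((a - m) ** 2 for a in x) / (n - 1))
    return [(a - m) / sd for a in x]

# Hypothetical urinary cadmium and arsenic values (illustration only).
cadmium = [0.8, 1.1, 1.5, 1.9, 2.4, 2.8, 3.1, 3.7]
arsenic = [10.2, 12.9, 15.1, 19.8, 23.5, 27.0, 31.2, 35.9]

r = pearson_r(cadmium, arsenic)

# With exactly two predictors, each has variance inflation factor 1 / (1 - r^2).
vif = 1.0 / (1.0 - r ** 2)

# For two positively correlated standardized variables, the first principal
# component is (z1 + z2) / sqrt(2); it replaces both predictors with one score.
composite = [
    (a + b) / math.sqrt(2)
    for a, b in zip(zscores(cadmium), zscores(arsenic))
]

print(f"r = {r:.4f}, VIF = {vif:.1f}")
```

Regressing a health marker on `composite` instead of on both metals avoids the inflated coefficient variance that the large VIF signals, at the cost of no longer separating the two metals' individual effects.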

Application of Multiple Linear Regression Analysis and Tree-Based Machine Learning Techniques for Cutter Life Index (CLI) Prediction

  • 홍주표;고태영
    • 터널과지하공간 / Vol. 33, No. 6 / pp.594-609 / 2023
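  • Because the TBM method secures the stability of the excavation face and minimizes the impact on the surrounding environment, its use is increasing in urban areas and in under-river and subsea tunnels. Among the representative models for predicting disc cutter life, the NTNU model uses the Cutter Life Index (CLI) as a key parameter, but the CLI is difficult to measure because of the complex test procedure and the scarcity of test equipment. In this study, CLI was predicted from rock properties using multiple linear regression analysis and tree-based machine learning techniques. Through a literature survey, a database was built that includes the uniaxial compressive strength, Brazilian tensile strength, equivalent quartz content, and Cerchar abrasivity index of rocks, and derived variables were calculated and added. For the multiple linear regression analysis, input variables were selected considering statistical significance and multicollinearity; for the machine learning prediction models, input variables were selected based on variable importance. The data were split into training and test sets at a ratio of 8:2 and the prediction performance of the models was compared; XGBoost was selected as the best model. The validity of the multiple linear regression model and the XGBoost model derived in this study was confirmed by comparing their prediction performance with that of previous studies.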

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association / Vol. 1, No. 1 / pp.11-16 / 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems utilizing mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of 5 types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

A Study of the Nonlinear Characteristics Improvement for a Electronic Scale using Multiple Regression Analysis

  • 채규수
    • 융합정보논문지 / Vol. 9, No. 6 / pp.1-6 / 2019
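  • In this study, a mass estimation model for an electronic scale with nonlinear characteristics was developed using polynomial regression analysis. The output terminal voltage of the load cell used in the electronic scale was measured directly with reference mass weights, and a polynomial regression model was obtained from these data using the matrix calculation and data trendline analysis functions of MS Office Excel. Mass was measured in 100 g increments on a load-cell electronic scale with a capacity of up to 5 kg, polynomial regression models were fitted, and the errors of the simple (first-order), second-order, and third-order polynomial regressions were calculated. To assess the goodness of fit of each model's regression equation, the coefficient of determination is presented, showing the correlation between the estimated mass and the measured data. The third-order polynomial model proposed in this study gave a highly accurate model, with a standard deviation of 10 g in the estimated values and a coefficient of determination of 1.0. Building on the linear regression theory used in this study, logistic regression analysis, which has recently been widely used in artificial intelligence, is expected to support a variety of studies in fields such as weather forecasting, new drug development, and economic indicator analysis.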
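
The comparison of first-, second-, and third-order fits described above can be sketched as follows. This is a minimal illustration, not the paper's Excel workflow: the load-cell response is synthetic (a hypothetical mild cubic nonlinearity), and `solve`, `polyfit`, `predict`, and `r_squared` are helper names introduced here.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def polyfit(v, y, degree):
    """Least-squares polynomial coefficients (lowest power first),
    obtained from the normal equations (V^T V) c = V^T y."""
    powers = [[vi ** k for k in range(degree + 1)] for vi in v]
    vtv = [[sum(p[i] * p[j] for p in powers) for j in range(degree + 1)]
           for i in range(degree + 1)]
    vty = [sum(p[i] * yi for p, yi in zip(powers, y))
           for i in range(degree + 1)]
    return solve(vtv, vty)

def predict(coef, vi):
    return sum(c * vi ** k for k, c in enumerate(coef))

def r_squared(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data: reference masses 0 to 5 kg in 100 g steps and a
# hypothetical nonlinear load-cell voltage (mV) for each mass.
masses_kg = [i * 0.1 for i in range(51)]
voltages_mv = [2.0 * m + 0.03 * m ** 2 - 0.004 * m ** 3 for m in masses_kg]

# Fit mass as a polynomial in voltage for degrees 1, 2, and 3.
r2 = {}
for degree in (1, 2, 3):
    coef = polyfit(voltages_mv, masses_kg, degree)
    estimates = [predict(coef, v) for v in voltages_mv]
    r2[degree] = r_squared(masses_kg, estimates)
    print(f"degree {degree}: R^2 = {r2[degree]:.8f}")
```

Solving the normal equations directly is adequate for low degrees and well-scaled inputs like these; for higher degrees a QR-based solver would be the more numerically stable choice.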

Ensemble approach for improving prediction in kernel regression and classification

  • Han, Sunwoo;Hwang, Seongyun;Lee, Seokho
    • Communications for Statistical Applications and Methods / Vol. 23, No. 4 / pp.355-362 / 2016
  • Ensemble methods often help increase prediction ability in various predictive models by combining multiple weak learners and reducing the variability of the final predictive model. In this work, we demonstrate that ensemble methods also enhance the accuracy of prediction under kernel ridge regression and kernel logistic regression classification. Here we apply bagging and random forests to two kernel-based predictive models and present the procedure by which bagging and random forests can be embedded in kernel-based predictive models. Our proposals are tested on numerous synthetic and real datasets; subsequently, they are compared with plain kernel-based predictive models and their subsampling approach. Numerical studies demonstrate that the ensemble approach outperforms plain kernel-based predictive models.

FUZZY REGRESSION TOWARDS A GENERAL INSURANCE APPLICATION

  • Kim, Joseph H.T.;Kim, Joocheol
    • Journal of applied mathematics & informatics / Vol. 32, No. 3_4 / pp.343-357 / 2014
  • In many non-life insurance applications, past data are given in a form known as the run-off triangle. Smoothing such data using parametric crisp regression models has long served as the basis for estimating future claim amounts and the reserves set aside to protect the insurer from future losses. In this article, a fuzzy counterpart of the Hoerl curve, a well-known claim reserving regression model, is proposed to analyze the past claim data and to determine the reserves. The fuzzy Hoerl curve is more flexible and general than the one considered in the previous fuzzy literature in that it includes a categorical variable with multiple explanatory variables, which requires the development of the fuzzy analysis of covariance, or fuzzy ANCOVA. Using actual insurance run-off claim data, we show that the suggested fuzzy Hoerl curve based on the fuzzy ANCOVA gives reasonable claim reserves without the stringent assumptions needed for the traditional regression approach in claim reserving.
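
For orientation, the crisp (non-fuzzy) Hoerl curve is commonly written in the form below. The notation is an assumption of this note rather than the paper's: $j$ is the development period, $\mu_j$ the expected incremental claim, and $\alpha, \beta, \gamma$ the regression parameters.

```latex
\mu_j = \exp\left(\alpha + \beta\, j + \gamma \ln j\right)
      = e^{\alpha}\, j^{\gamma}\, e^{\beta j},
\qquad j = 1, 2, \dots

\ln \mu_j = \alpha + \beta\, j + \gamma \ln j
```

Taking logarithms makes the curve linear in $j$ and $\ln j$, which is why it can be fitted as a regression model.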

A Quantitative Model for the Projection of Health Expenditure

  • 김한중;이영두;남정모
    • Journal of Preventive Medicine and Public Health / Vol. 24, No. 1 / pp.29-36 / 1991
  • Multiple regression analysis using ordinary least squares (OLS) is frequently used for the projection of health expenditure as well as for the identification of factors affecting health care costs. Data for the analysis often have mixed characteristics of time series and cross section. In this case, the parameters resulting from OLS estimation are no longer the best linear unbiased estimators (BLUE) because the data do not satisfy the basic assumptions of regression analysis. The study theoretically examined the statistical problems induced when OLS estimation is applied to time series cross section data. Then both OLS regression and time series cross section regression (TSCS regression) were applied to the same empirical data. Finally, the difference in parameters between the two estimations was explained through residual analysis.
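
The loss of the BLUE property can be stated compactly. The notation below is textbook notation assumed for this note, and generalized least squares (GLS) is shown only as the standard remedy; the abstract does not specify which estimator the TSCS regression uses.

```latex
y = X\beta + \varepsilon, \qquad
\hat{\beta}_{\mathrm{OLS}} = (X^{\top}X)^{-1}X^{\top}y

\operatorname{Var}(\varepsilon) = \sigma^{2} I
\;\Longrightarrow\; \hat{\beta}_{\mathrm{OLS}} \text{ is BLUE (Gauss--Markov)}

\operatorname{Var}(\varepsilon) = \sigma^{2}\Omega,\ \Omega \neq I
\;\Longrightarrow\;
\hat{\beta}_{\mathrm{GLS}} = (X^{\top}\Omega^{-1}X)^{-1}X^{\top}\Omega^{-1}y
\text{ is BLUE}
```

Pooled time series cross section data typically produce a non-identity $\Omega$ through serial correlation within units and unequal variances across units.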

Analysis of Success Factors for Mobile Commerce using Text Mining and PLS Regression

  • Kim, Yong-Hwan;Kim, Ja-Hee;Park, Ji hoon;Lee, Seung-Jun
    • 한국컴퓨터정보학회논문지 / Vol. 21, No. 11 / pp.127-134 / 2016
  • In this paper, we propose factors that influence mobile commerce satisfaction, identified through data mining and PLS regression analysis. We extracted the most frequent words from mobile application reviews, which contain a large number of user requests. We employed content analysis to condense the large volume of text. We then conducted a survey using the categories into which the data were condensed, treating them as factors that influence mobile commerce satisfaction. To avoid multicollinearity, we employed PLS regression analysis instead of multiple regression analysis. Because the factors were discovered from direct requests by customers and may affect customer satisfaction, the result may be an appropriate indicator for the mobile commerce market to improve its services.

A Local Influence Approach to Regression Diagnostics with Application to Robust Regression

  • Huh, Myung-Hoe;Park, Sung H.
    • Journal of the Korean Statistical Society / Vol. 19, No. 2 / pp.151-159 / 1990
  • Regression diagnostics often involves assessment of the changes that result from deleting multiple cases. Diagnostic methodology based on global influence measures, however, needs prohibitive computing time. As an alternative, Cook (1986) developed a local influence approach in which it is checked whether a minor modification of the specification influences key results of an analysis. In line with Cook's development, we propose and study an influence derivative method that yields both the magnitude and direction of case influences. The utility of our methodology is highlighted when case influence derivatives are plotted in a lower dimensional space. Such plots are especially effective in unmasking "masked" observations in least squares regression and also in robust regression. We give several illustrations.
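
As a point of reference for the case-deletion (global influence) diagnostics that the abstract contrasts with local influence, the sketch below computes Cook's distance for single-case deletion in simple linear regression. It is not the authors' influence derivative method; the data are hypothetical and the function name `cooks_distance` is introduced here.

```python
def cooks_distance(x, y):
    """Cook's distance for each case in a simple linear regression.

    Uses D_i = e_i^2 / (p * s^2) * h_i / (1 - h_i)^2 with p = 2
    parameters, so no model is actually refitted per deleted case.
    """
    n = len(x)
    p = 2
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    residuals = [b - (intercept + slope * a) for a, b in zip(x, y)]
    s2 = sum(e * e for e in residuals) / (n - p)           # residual variance
    leverage = [1.0 / n + (a - mx) ** 2 / sxx for a in x]  # hat-matrix diagonal
    return [
        e * e / (p * s2) * h / (1.0 - h) ** 2
        for e, h in zip(residuals, leverage)
    ]

# Hypothetical data: the last case lies far from the others in x and
# off the trend of the first eight points.
x = [1, 2, 3, 4, 5, 6, 7, 8, 20]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1, 25.0]

distances = cooks_distance(x, y)
most_influential = distances.index(max(distances))
print(f"most influential case: index {most_influential}")
```

Deleting one case at a time is cheap, but checking all subsets of several cases grows combinatorially, which is the computing cost the abstract refers to.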

On L1 regression coefficients

  • 홍종선;최현집
    • 응용통계연구 / Vol. 6, No. 2 / pp.247-252 / 1993
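  • We show that, when the regression line is assumed to pass through a given point while the sum of absolute residuals is minimized to estimate the regression coefficient, the $L_1$ regression coefficient can be defined as the weighted median of the slopes between the sample observations and the given point. The $L_1$ method can therefore be regarded as finding a single optimal point through which the regression line passes.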
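
The weighted-median characterization can be sketched directly. With the line forced through $(x_0, y_0)$, the objective $\sum_i |y_i - y_0 - b(x_i - x_0)|$ equals $\sum_i |x_i - x_0| \cdot |s_i - b|$ with slopes $s_i = (y_i - y_0)/(x_i - x_0)$, so a minimizer is the weighted median of the $s_i$ with weights $|x_i - x_0|$. The data and function names below are hypothetical, chosen for illustration.

```python
def weighted_median(values, weights):
    """Smallest value whose cumulative weight reaches half the total."""
    half = sum(weights) / 2.0
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= half:
            return value

def l1_slope_through_point(x, y, x0, y0):
    """L1 slope of the line constrained to pass through (x0, y0)."""
    slopes, weights = [], []
    for xi, yi in zip(x, y):
        if xi != x0:  # a case at x0 adds a constant to the objective
            slopes.append((yi - y0) / (xi - x0))
            weights.append(abs(xi - x0))
    return weighted_median(slopes, weights)

def sum_abs_residuals(x, y, x0, y0, b):
    return sum(abs(yi - y0 - b * (xi - x0)) for xi, yi in zip(x, y))

# Hypothetical data with one outlying response at x = 5.
x = [1, 2, 3, 4, 5]
y = [1.2, 1.9, 3.2, 3.8, 9.0]

b = l1_slope_through_point(x, y, 0.0, 0.0)
print(f"L1 slope through the origin: {b:.4f}")
```

The outlier at $x = 5$ has the largest weight but still does not determine the slope, which illustrates the robustness of the $L_1$ criterion relative to least squares.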