• Title/Abstract/Keywords: statistical regression modeling

Application of Variable Selection for Prediction of Target Concentration

  • 김선우;김연주;김종원;윤길원
    • Bulletin of the Korean Chemical Society
    • /
    • Vol. 20, No. 5
    • /
    • pp.525-527
    • /
    • 1999
  • Many types of chemical data are characterized by many measured variables on each of a few observations. In this situation, target concentration can be predicted using multivariate statistical modeling. However, instrument size and cost, for example in the development of a portable biomedical instrument, make it necessary to use only a few variables. Using a spectral data set of total hemoglobin in whole blood, this study presents the possibility that modeling with only a few variables can improve predictability compared with modeling with all of the variables. A model using three wavelengths selected by the all-possible-regressions method predicted better than a model using whole spectra (whole spectra: SEP = 0.4 g/dL; three wavelengths: SEP = 0.3 g/dL). Proper selection of variables therefore appears to be more effective than using whole spectra for determining the hemoglobin concentration in whole blood.
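The all-possible-regressions idea in the abstract above can be sketched roughly as follows: fit an ordinary least-squares model for every subset of `k` variables and keep the subset with the lowest prediction error on held-out samples. This is only an illustration under stated assumptions, not the authors' procedure: the synthetic data, the function names (`fit_ols`, `sep`, `best_subset`), and the use of root-mean-square error as SEP are all hypothetical choices (SEP definitions vary, and some include a bias correction).

```python
import itertools
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ols(X, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    Z = [[1.0] + row for row in X]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * yi for z, yi in zip(Z, y)) for i in range(p)]
    return solve(A, b)


def sep(X, y, beta):
    """Root-mean-square prediction error, used here as a stand-in for SEP."""
    pred = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y))


def best_subset(X_cal, y_cal, X_val, y_val, k):
    """Try every k-variable subset; return (columns, SEP) of the best one."""
    best = None
    for cols in itertools.combinations(range(len(X_cal[0])), k):
        def pick(X):
            return [[row[c] for c in cols] for row in X]
        beta = fit_ols(pick(X_cal), y_cal)
        score = sep(pick(X_val), y_val, beta)
        if best is None or score < best[1]:
            best = (cols, score)
    return best


# Synthetic, noise-free data (an assumption): only columns 1 and 3 matter.
rng = random.Random(0)
X = [[rng.random() for _ in range(5)] for _ in range(40)]
y = [0.5 + 2.0 * row[1] - row[3] for row in X]
cols, score = best_subset(X[:30], y[:30], X[30:], y[30:], k=2)
print(cols, score)
```

Exhaustive search grows combinatorially with the number of candidate variables, so it is practical only when the subset size is small, as with the three wavelengths mentioned in the abstract.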

일반화된 회귀신경망과 유전자 알고리즘을 이용한 식각 마이크로 트렌치 모델링 (Modeling of etch microtrenching using generalized regression neural network and genetic algorithm)

  • 이덕우;김병환
    • 대한전기학회:학술대회논문집
    • /
    • 대한전기학회 2005년도 심포지엄 논문집 정보 및 제어부문
    • /
    • pp.27-29
    • /
    • 2005
  • Etch microtrenching was modeled using a generalized regression neural network (GRNN). All neurons in the pattern layer were equipped with multi-factored spreads, and their complex effects on prediction performance were optimized by means of a genetic algorithm (GA). For comparison, a GRNN model was also constructed in the conventional way. The comparison revealed that the GA-GRNN model was about 30% more accurate than the GRNN model. The microtrenching data were collected during the etching of silicon oxynitride film, and the etch process was characterized by a statistical experimental design.
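As a rough illustration of the model family described above, a GRNN prediction is a kernel-weighted average of the training targets. The sketch below assumes one Gaussian spread per pattern-layer neuron, which is only one possible reading of "multi-factored spreads"; the function name `grnn_predict` and the toy data are hypothetical, and the genetic-algorithm search over `spreads` is not shown.

```python
import math


def grnn_predict(x, patterns, targets, spreads):
    """Kernel-weighted average of targets; spreads[i] belongs to pattern i."""
    num = 0.0
    den = 0.0
    for p, t, s in zip(patterns, targets, spreads):
        d2 = sum((a - b) ** 2 for a, b in zip(x, p))  # squared distance
        w = math.exp(-d2 / (2.0 * s * s))             # Gaussian pattern unit
        num += w * t
        den += w
    # Note: den can underflow to zero if x is far from every pattern.
    return num / den


# Toy data (an assumption, not the etch data from the paper).
patterns = [[0.0], [1.0], [2.0]]
targets = [0.0, 1.0, 4.0]

sharp = grnn_predict([1.0], patterns, targets, [0.1, 0.1, 0.1])
smooth = grnn_predict([1.0], patterns, targets, [1e6, 1e6, 1e6])
print(sharp, smooth)
```

Small spreads make the prediction follow the nearest training pattern, while very large spreads pull it toward the mean of all targets, which is why the choice of spreads matters for accuracy.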

A Clustering Approach to Wind Power Prediction based on Support Vector Regression

  • Kim, Seong-Jun;Seo, In-Yong
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 12, No. 2
    • /
    • pp.108-112
    • /
    • 2012
  • Sustainable production of electricity is essential for low-carbon green growth in South Korea. The generation of wind power as renewable energy has been growing rapidly around the world. Undoubtedly, wind energy is unlimited in potential. However, because of its intermittency and volatility, it is difficult to harvest wind energy effectively and to integrate wind power into the current electric power grid. To cope with this, much work has been done on wind speed and power forecasting. It is reported that, compared with physical persistent models, statistical techniques and computational methods are more useful for short-term forecasting of wind power. Among them, support vector regression (SVR) has received much attention in the literature. This paper proposes an SVR-based wind speed forecasting method. To improve forecasting accuracy, fuzzy clustering is adopted in the SVR modeling process. An illustrative example using a real-world wind farm dataset is also given. The experimental results show that the proposed method provides better forecasts of wind power.
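The abstract above does not say which fuzzy clustering algorithm was used or how the clusters enter the SVR model. As a hedged illustration of the clustering step only, here is a minimal fuzzy c-means sketch for scalar inputs such as wind speeds; the function names, the fuzzifier `m = 2`, the initial centers, and the sample values are all assumptions, and the SVR fitting is not shown.

```python
def memberships(xs, centers, m=2.0, eps=1e-12):
    """Fuzzy c-means membership of each sample in each cluster (rows sum to 1)."""
    power = 2.0 / (m - 1.0)
    u = []
    for x in xs:
        d = [abs(x - c) + eps for c in centers]  # eps avoids division by zero
        u.append([1.0 / sum((di / dj) ** power for dj in d) for di in d])
    return u


def fuzzy_c_means(xs, centers, m=2.0, iters=50):
    """Alternate membership and center updates for a fixed number of iterations."""
    centers = list(centers)
    for _ in range(iters):
        u = memberships(xs, centers, m)
        centers = [
            sum(u[k][i] ** m * xs[k] for k in range(len(xs)))
            / sum(u[k][i] ** m for k in range(len(xs)))
            for i in range(len(centers))
        ]
    return centers, memberships(xs, centers, m)


# Hypothetical wind speeds (m/s) forming a calm and a windy regime.
speeds = [1.0, 1.1, 0.9, 5.0, 5.2, 4.8]
centers, u = fuzzy_c_means(speeds, centers=[0.0, 6.0])
print(centers)
```

One plausible use of the result is to train a separate regressor per cluster, or to weight training samples by their memberships, but that design choice is not confirmed by the abstract.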

MODIS 영상 자료와 패널 자료를 이용한 지표면온도변화 요인분석 (The Factor Analysis of Land Surface Temperature(LST) Change using MODIS Imagery and Panel Data)

  • 배다혜;김홍명;하성룡
    • 한국지리정보학회지
    • /
    • Vol. 21, No. 1
    • /
    • pp.46-56
    • /
    • 2018
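  • This study identified the main regional characteristic factors that affect land surface temperature (LST) change and estimated the stochastic influence coefficient of each factor. The study area was the whole of Chungcheongbuk-do, divided into city (si) and county (gun) administrative units for panel analysis. Time series of LST and of regional characteristics were built from MODIS imagery and Statistics Korea data, respectively. LST and the regional characteristic factors, which are cross-sectional data, were placed in a multiple regression relationship, and the regression coefficient estimates were obtained through panel model analysis. LST and the regional characteristic factors served as the dependent and explanatory variables, respectively, in the panel model analysis. The panel data analysis used the commercial statistical program STATA14, and the one-way entity fixed-effects model was selected as the most appropriate model for interpreting LST change in this study. The contribution rate, which expresses the level of influence of each explanatory variable on LST change, was derived from the estimated coefficients of the regression equation. Among the explanatory variables, the contribution rate was largest for urban industrial areas at 3.746, followed by mean elevation $\times$ building-site area ratio at 2.856, electricity consumption at 2.742, mean wind speed at 0.553, non-urban management areas at 0.102, and agricultural/forest areas and natural environment conservation areas at 0.085 and 0.071. The contribution rate of a one-unit change in the mean rainfall of a city or county to LST change was estimated at 0.003.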
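For readers unfamiliar with the one-way entity fixed-effects model mentioned above, the within estimator can be sketched as follows: subtract each entity's own mean from its observations, then regress the demeaned outcome on the demeaned regressor. This is a minimal single-regressor illustration with made-up numbers; the study itself used several explanatory variables and STATA14, and the function name `within_slope` is hypothetical.

```python
from collections import defaultdict


def within_slope(entities, xs, ys):
    """Fixed-effects (within) slope for a single explanatory variable."""
    groups = defaultdict(list)
    for e, x, y in zip(entities, xs, ys):
        groups[e].append((x, y))
    sxy = 0.0
    sxx = 0.0
    for rows in groups.values():
        mx = sum(x for x, _ in rows) / len(rows)  # entity mean of x
        my = sum(y for _, y in rows) / len(rows)  # entity mean of y
        for x, y in rows:
            sxy += (x - mx) * (y - my)
            sxx += (x - mx) ** 2
    return sxy / sxx


# Two hypothetical districts with different intercepts but a common slope of 2.
entities = ["A", "A", "A", "B", "B", "B"]
xs = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
ys = [12.0, 14.0, 16.0, -1.0, 3.0, 7.0]

beta = within_slope(entities, xs, ys)
print(beta)
```

Demeaning removes each district's time-invariant intercept, so the slope is identified only from variation within districts over time.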

Neural-based Blind Modeling of Mini-mill ASC Crown

  • Lee, Gang-Hwa;Lee, Dong-Il;Lee, Seung-Joon;Lee, Suk-Gyu;Kim, Shin-Il;Park, Hae-Doo;Park, Seung-Gap
    • 한국지능시스템학회논문지
    • /
    • Vol. 12, No. 6
    • /
    • pp.577-582
    • /
    • 2002
  • A neural network can be trained to approximate an arbitrary nonlinear function of multivariate data such as the mini-mill crown values in Automatic Shape Control (ASC). The trained weights of the neural network can evaluate or generalize the process data outside the training vectors. Sometimes, blind modeling of the process data is necessary for comparison with the scattered analytical model of the mini-mill process in isolated electro-mechanical forms. To come up with a viable model, we propose a blind neural-based range-division domain-clustering piecewise-linear modeling scheme. The basic ideas are: 1) dividing the range of target data, 2) clustering the corresponding input space vectors, 3) training the neural network with clustered prototypes to smooth out the convergence, and 4) solving the resulting matrix equations with a pseudo-inverse to alleviate the ill-conditioning problem. The simulation results support the effectiveness of the proposed scheme, which opens a new approach to data analysis. Comparison with statistical regression shows that the proposed scheme obtains better modeling error uniformity and considerably reduces the magnitudes of errors, with approximately 10-fold better performance.

Nonlinear finite element based parametric and stochastic analysis of prestressed concrete haunched beams

  • Ozogul, Ismail;Gulsan, Mehmet E.
    • Structural Engineering and Mechanics
    • /
    • Vol. 84, No. 2
    • /
    • pp.207-224
    • /
    • 2022
  • This study investigated the mechanical behavior of prestressed concrete haunched beams (PSHBs) in depth using finite element modeling. In the first stage, the efficiency of finite element modeling was assessed against a previous study from the literature. The findings of the first stage suggested that finite element modeling might be preferable for modeling PSHBs. In the second stage, a comprehensive parametric study was carried out to determine the effect of each parameter on PSHB load capacity, including haunch angle, prestress level, compressive strength, tensile reinforcement ratio, and shear span to depth ratio. PSHBs and prestressed concrete rectangular beams (PSRBs) were also compared in terms of capacity. In the third stage, stochastic analysis was used to define the uncertainty in PSHB capacity by taking into account uncertainty in geometric and material parameters. As a result of the analysis, the standard deviation, the coefficient of variation, and the most appropriate probability density function (PDF) were proposed to define the randomness of PSHB capacity. In the final section of the study, a new equation obtained by symbolic regression was proposed to predict the load capacity of PSHBs and PSRBs. The statistical results for the equation show that it can be used to calculate the capacity of PSHBs and PSRBs.

혼합회귀모형에서 콤포넌트 및 설명변수에 대한 벌점함수의 적용 (Joint penalization of components and predictors in mixture of regressions)

  • 박종선;모은비
    • 응용통계연구
    • /
    • Vol. 32, No. 2
    • /
    • pp.199-211
    • /
    • 2019
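  • When a finite mixture of regressions model is fitted to given regression data, it is very important to select an appropriate number of components, to select a set of meaningful predictors in each selected regression model, and at the same time to obtain regression coefficient estimates with small bias and variance. This study presents a method that applies penalty functions to the number of components and to the regression coefficients in a mixture of linear regressions model, so that an appropriate number of components and the explanatory variables needed in each component's regression model are selected simultaneously. For the components, the SCAD penalty function was applied to the log values of the components; for the regression coefficients, the SCAD, MCP and Adplasso penalty functions were used, and the results on simulated and real data were compared. The SCAD-SCAD and SCAD-MCP penalty combinations resolved the overfitting problem of the existing method of Luo et al. (2008) while effectively selecting the number of components and the regression coefficients, and the bias of the regression coefficient estimates was not large. The significance of this study is that, for regression data with an unknown number of components, it presents a method that simultaneously selects an appropriate number of components and the predictors needed in the regression model of each component.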
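For reference, the two non-convex penalties named above have standard closed forms, sketched below for a single coefficient `t` with tuning parameter `lam`. The defaults `a = 3.7` (the value commonly suggested for SCAD) and `gamma = 3.0` for MCP are assumptions here, not settings taken from the paper, and the mixture-model fitting itself is not shown.

```python
def scad_penalty(t, lam, a=3.7):
    """SCAD penalty: lasso-like near zero, constant for large |t|."""
    t = abs(t)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


def mcp_penalty(t, lam, gamma=3.0):
    """Minimax concave penalty (MCP): tapers off and is flat beyond gamma * lam."""
    t = abs(t)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return gamma * lam * lam / 2.0


for value in (0.5, 2.0, 10.0):
    print(value, scad_penalty(value, lam=1.0), mcp_penalty(value, lam=1.0))
```

Both penalties stop growing for large coefficients, which is the property usually credited with reducing the estimation bias that a plain lasso penalty introduces.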

Assessment through Statistical Methods of Water Quality Parameters(WQPs) in the Han River in Korea

  • Kim, Jae Hyoun
    • 한국환경보건학회지
    • /
    • Vol. 41, No. 2
    • /
    • pp.90-101
    • /
    • 2015
  • Objective: This study was conducted to develop a chemical oxygen demand (COD) regression model using water quality monitoring data (January 2014) obtained from the Han River auto-monitoring stations. Methods: Surface water quality data at 198 sampling stations along the six major areas were assembled and analyzed to determine the spatial distribution and clustering of monitoring stations based on 18 WQPs and regression modeling using selected parameters. Statistical techniques, including combined genetic algorithm-multiple linear regression (GA-MLR), cluster analysis (CA) and principal component analysis (PCA), were used to build a COD model from the water quality data. Results: The best GA-MLR model yielded a 5-descriptor COD model with satisfactory statistical results ($r^2=92.64$, $Q^2_{LOO}=91.45$, $Q^2_{Ext}=88.17$). This approach includes variable selection of the WQPs in order to find the most important factors affecting water quality. Additionally, ordination techniques such as PCA and CA were used to classify monitoring stations. The biplot based on the first two principal components (PCs) of the PCA model identified three distinct groups of stations that also differ in their correlation with the WQPs, which enables better interpretation of the water quality characteristics at particular stations as of January 2014. Conclusion: This data analysis procedure appears to provide an efficient means of modelling water quality by interpreting and defining its most essential variables, such as TOC and BOD. The water parameters selected in a COD model as most important in contributing to environmental health and water pollution can be used in water quality management strategies. At present, the river is under threat of anthropogenic disturbances during festival periods, especially in upstream areas.
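The fit statistics quoted above can be illustrated with a small sketch that computes $r^2$ and a leave-one-out $Q^2$ for a one-descriptor linear model. The data and function names are hypothetical, and the definition used here (one minus PRESS over the total sum of squares about the overall mean, reported as a fraction rather than a percentage) is one common convention that may differ from the paper's.

```python
def fit_line(xs, ys):
    """Ordinary least-squares intercept and slope for one descriptor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my - slope * mx, slope


def r2_and_q2_loo(xs, ys):
    """Return (r^2, Q^2_LOO) as fractions of the total sum of squares."""
    n = len(ys)
    my = sum(ys) / n
    tss = sum((y - my) ** 2 for y in ys)
    a, b = fit_line(xs, ys)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict the held-out sample.
        a_i, b_i = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a_i + b_i * xs[i])) ** 2
    return 1.0 - rss / tss, 1.0 - press / tss


# Hypothetical descriptor and COD-like response values.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.1, 1.9, 3.2, 3.9, 5.1, 5.8]
r2, q2 = r2_and_q2_loo(xs, ys)
print(r2, q2)
```

Because each held-out prediction comes from a model that never saw that sample, $Q^2_{LOO}$ cannot exceed $r^2$ under this definition, and a large gap between the two is a common warning sign of overfitting.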

Machine Learning Approaches to Corn Yield Estimation Using Satellite Images and Climate Data: A Case of Iowa State

  • Kim, Nari;Lee, Yang-Won
    • 한국측량학회지
    • /
    • Vol. 34, No. 4
    • /
    • pp.383-390
    • /
    • 2016
  • Remote sensing data have been widely used to estimate crop yields with statistical methods such as regression models. Machine learning, an efficient empirical method for classification and prediction, is another approach to crop yield estimation. This paper describes corn yield estimation in the state of Iowa using four machine learning approaches: SVM (Support Vector Machine), RF (Random Forest), ERT (Extremely Randomized Trees) and DL (Deep Learning). Comparisons of their validation statistics are also presented. To examine the seasonal sensitivities of the corn yields, three period groups were set up: (1) MJJAS (May to September), (2) JA (July and August) and (3) OC (optimal combination of months). Overall, the DL method showed the highest accuracy in terms of the correlation coefficient for the three period groups. The accuracies were relatively favorable in the OC group, which indicates that the optimal combination of months can be significant in statistical modeling of crop yields. The differences between our predictions and USDA (United States Department of Agriculture) statistics were about 6-8%, which shows that machine learning approaches can be a viable option for crop yield modeling. In particular, DL showed more stable results by overcoming the overfitting problem of generic machine learning methods.

공분산구조분석을 이용한 자체충족률 모형 검증 (Formulating Regional Relevance Index through Covariance Structure Modeling)

  • 장혜정;김창엽
    • 보건행정학회지
    • /
    • Vol. 11, No. 2
    • /
    • pp.123-140
    • /
    • 2001
  • Hypotheses in health services research are becoming increasingly complex and specific. As a result, health services research studies often include multiple independent, intervening, and dependent variables in a single hypothesis. Nevertheless, the statistical models adopted by health services researchers have failed to keep pace with the increasing complexity and specificity of hypotheses and research designs. This article introduces a statistical model well suited to testing complex and specific hypotheses in health services research. The covariance structure modeling (CSM) methodology is applied to regional relevance indices (RIs) to assess the impact of health resources and healthcare utilization. Data from secondary statistics and health insurance claims were collected for each catchment area. The model for RI was justified by direct and indirect effects of three latent variables measured by seven observed variables, using ten structural equations. The resulting structural model revealed significant direct effects of the structure of health resources but indirect effects of the quantity on RIs, and explained 82% of the correlation matrix of the measurement variables. Two variables, the number of beds and the proportion of specialists among medical doctors, had significant effects on RIs when analyzed using the CSM methodology, whereas they were insignificant in the regression model. Recommendations for applying the CSM methodology to health services research data are provided.
