• Title/Summary/Keyword: Linear regression models

Prediction and analysis of acute fish toxicity of pesticides to the rainbow trout using 2D-QSAR (2D-QSAR방법을 이용한 농약류의 무지개 송어 급성 어독성 분석 및 예측)

  • Song, In-Sik;Cha, Ji-Young;Lee, Sung-Kwang
    • Analytical Science and Technology
    • /
    • v.24 no.6
    • /
    • pp.544-555
    • /
    • 2011
  • The acute toxicity of pesticides to the rainbow trout (Oncorhynchus mykiss) was analyzed and predicted using quantitative structure-activity relationships (QSAR). The aquatic toxicity data, the 96 h $LC_{50}$ (median lethal concentration) of 275 organic pesticides, were obtained from the EU-funded project DEMETRA. Prediction models were derived from 558 2D molecular descriptors calculated in PreADMET. The linear (multiple linear regression, MLR) and nonlinear (support vector machine and artificial neural network) learning methods were optimized by taking into account the statistical parameters between the experimental and predicted p$LC_{50}$. After preprocessing, population-based forward selection was used to select the best subsets of descriptors for the learning methods, together with a 5-fold cross-validation procedure. The support vector machine model was selected as the best model ($R^2_{CV}$=0.677, RMSECV=0.887, MSECV=0.674) and also correctly classified 87% of the training set according to EU regulation criteria. The MLR model could describe the structural characteristics of toxic chemicals and their interaction with the lipid membrane of fish. All the developed models were validated by 5-fold cross-validation and a Y-scrambling test.
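
The two validation steps named in the abstract, 5-fold cross-validation and Y-scrambling, can be illustrated with a minimal sketch. It assumes a single descriptor and an ordinary least-squares line in place of the paper's MLR, SVM, and ANN models; the function names `fit_line`, `cross_validate`, and `y_scramble` are hypothetical and not taken from the paper.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def cross_validate(xs, ys, k=5, seed=0):
    """Return (R2_CV, RMSECV) from k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    preds = [0.0] * len(xs)
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in fold:
            preds[i] = a + b * xs[i]  # predict only samples left out of the fit
    my = sum(ys) / len(ys)
    press = sum((y - p) ** 2 for y, p in zip(ys, preds))
    tss = sum((y - my) ** 2 for y in ys)
    return 1.0 - press / tss, math.sqrt(press / len(ys))


def y_scramble(xs, ys, seed=1):
    """Y-scrambling: shuffle the responses and repeat the cross-validation."""
    shuffled = list(ys)
    random.Random(seed).shuffle(shuffled)
    return cross_validate(xs, shuffled)


# Toy data (not from the paper): a perfectly linear response.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
r2_cv, rmse_cv = cross_validate(xs, ys)
r2_scrambled, _ = y_scramble(xs, ys)
```

A model that captures real structure should lose most of its cross-validated $R^2$ once the responses are scrambled; a scrambled score close to the original one would suggest chance correlation.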

Is the association of continuous metabolic syndrome risk score with body mass index independent of physical activity? The CASPIAN-III study

  • Heshmat, Ramin;shafiee, Gita;Kelishadi, Roya;Babaki, Amir Eslami Shahr;Motlagh, Mohammad Esmaeil;Arefirad, Tahereh;Ardalan, Gelayol;Ataie-Jafari, Asal;Asayesh, Hamid;Mohammadi, Rasool;Qorbani, Mostafa
    • Nutrition Research and Practice
    • /
    • v.9 no.4
    • /
    • pp.404-410
    • /
    • 2015
  • BACKGROUND/OBJECTIVES: Although the association of body mass index (BMI) with metabolic syndrome (MetS) is well documented, there is little knowledge on the independent and joint associations of BMI and physical activity with MetS risk based on a continuous scoring system. This study was designed to explore the effect of physical activity on interactions between excess body weight and continuous metabolic syndrome (cMetS) in a nationwide survey of Iranian children and adolescents. SUBJECTS/METHODS: Data on 5,625 school students between 10 and 18 years of age were analyzed. BMI percentiles, screen time activity (STA), leisure time physical activity (LTPA) levels, and components of cMetS risk score were extracted. Standardized residuals (z-scores) were calculated for MetS components. Linear regression models were used to study the interactions between different combinations of cMetS, LTPA, and BMI percentiles. RESULTS: Overall, 984 (17.5%) subjects were underweight, whereas 501 (8.9%) and 451 (8%) participants were overweight and obese, respectively. All standardized values for cMetS components, except fasting blood glucose level, were directly correlated with BMI percentiles in all models (P-trend < 0.001); these associations were independent of STA and LTPA levels. Linear associations were also observed among LTPA and standardized residuals for blood pressure, high-density lipoprotein, and waist circumference (P-trend < 0.01). CONCLUSIONS: Our findings suggest that BMI percentiles are associated with cMetS risk score independent of LTPA and STA levels.
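
As a rough illustration of the standardized residuals (z-scores) mentioned in the methods, the sketch below regresses one component on a single covariate and scales the residuals by their sample standard deviation. The choice of age as the covariate, the toy values, and the function name `standardized_residuals` are assumptions; the abstract does not state which covariates were used.

```python
from statistics import mean, stdev


def standardized_residuals(xs, ys):
    """Residuals of the least-squares line y = a + b*x, divided by their sample SD."""
    mx, my = mean(xs), mean(ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sd = stdev(residuals)
    return [r / sd for r in residuals]


# Hypothetical values: age in years and one MetS component (e.g. waist circumference in cm).
ages = [10, 12, 14, 16, 18]
waist = [60.0, 66.0, 65.0, 72.0, 74.0]
z = standardized_residuals(ages, waist)
```

A continuous score is then typically built by combining such z-scores across components; the exact combination rule used in the study is not given in the abstract.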

The wage determinants of the vocational high school graduates using mixed effects model (혼합모형을 이용한 특성화고 졸업생의 임금결정요인 분석)

  • Ryu, Jangsoo;Cho, Jangsik
    • Journal of the Korean Data and Information Science Society
    • /
    • v.27 no.4
    • /
    • pp.935-946
    • /
    • 2016
  • In this paper, we analyze the wage determinants of vocational high school graduates using both individual-level and work-region-level variables. We formulate the models so that wage determination has a multi-level structure, in the sense that an individual's wage is influenced by individual-level (level-1) variables and work-region-level (level-2) variables. To incorporate the dependency between individual wages into the model, we use a hierarchical linear model (HLM). The major results are as follows. First, the HLM is shown to be better than OLS regression models, which do not take level-1 and level-2 variables into account simultaneously. Second, the random effects of the sex, maester dummy, and engineering dummy variables are statistically significant. Third, among the level-2 variables, the fixed effects of business hours and the mean wage of regular jobs on individual-level wages are statistically significant. Finally, parental education level, parental income, number of licenses, and high school grade are statistically significant predictors of higher individual-level wages.
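
A generic two-level hierarchical linear model of the kind described can be written as follows. This is a textbook form given for orientation, not the authors' exact specification; all symbols are assumptions: $y_{ij}$ is the (possibly transformed) wage of graduate $i$ in work region $j$, $x_{pij}$ are level-1 variables, and $w_{qj}$ are level-2 variables.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = \beta_{0j} + \sum_{p} \beta_{pj}\, x_{pij} + e_{ij}, & e_{ij} &\sim N(0, \sigma^2) \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \sum_{q} \gamma_{0q}\, w_{qj} + u_{0j}, \\
& \beta_{pj} = \gamma_{p0} + u_{pj}, & (u_{0j}, u_{pj}) &\sim N(0, T)
\end{aligned}
```

Here the $\gamma$ terms are fixed effects and the $u$ terms are region-level random effects. An OLS regression corresponds to setting every $u$ to zero, which ignores the dependency between wages within the same region.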

Analysis of Urban Heat Island (UHI) Alleviating Effect of Urban Parks and Green Space in Seoul Using Deep Neural Network (DNN) Model (심층신경망 모형을 이용한 서울시 도시공원 및 녹지공간의 열섬저감효과 분석)

  • Kim, Byeong-chan;Kang, Jae-woo;Park, Chan;Kim, Hyun-jin
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.48 no.4
    • /
    • pp.19-28
    • /
    • 2020
  • The Urban Heat Island (UHI) effect has intensified due to urbanization, and heat management at the urban level is treated as an important issue. Green space improvement projects and environmental policies are being implemented to alleviate urban heat islands. Several studies have analyzed the correlation between urban green areas and heat with linear regression models. However, linear regression models are limited in explaining the correlation between heat and the multitude of variables, because heat results from a combination of non-linear factors. This study evaluated the heat island alleviating effects in Seoul during the summer using a deep neural network model, which has strengths where existing statistical methods struggle because of the number of variable factors and the large amount of data. Wide-area data were acquired using Landsat 8. Seoul was divided into a 30 m × 30 m grid, and the heat island reduction variables were entered into each grid cell, using ArcGIS 10.7 and Python 3.7 with Keras, to create the data structure needed to construct the deep neural network. This deep neural network was used to analyze the correlation between land surface temperature and the variables. We confirmed that the deep neural network model has high explanatory accuracy. The cooling effect of NDVI was the greatest, and cooling effects due to park size and green space proximity were also found. Previous studies reported that the cooling effect related to park size was 2℃-3℃ and that the proximity effect lowered the temperature by 0.3℃-2.3℃. The results of those previous studies may be overestimates. The results of this study can provide objective information for justifying and more effectively forming new urban green areas to alleviate the Urban Heat Island phenomenon in the future.
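
NDVI, which the abstract identifies as the variable with the greatest cooling effect, is computed per grid cell from near-infrared and red reflectance; for Landsat 8 OLI these are bands 5 and 4. The sketch below shows only this standard formula on a toy grid. The function names and reflectance values are hypothetical, and the paper's actual preprocessing in ArcGIS is not described in the abstract.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    total = nir + red
    if total == 0:
        return None  # undefined where both reflectances are zero
    return (nir - red) / total


def ndvi_grid(nir_grid, red_grid):
    """Apply ndvi cell by cell to two equally shaped 2D grids (lists of rows)."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_grid, red_grid)
    ]


# Toy 2 x 2 reflectance grids (hypothetical values).
nir = [[0.5, 0.3], [0.2, 0.0]]
red = [[0.1, 0.3], [0.4, 0.0]]
cells = ndvi_grid(nir, red)
```

Values near 1 indicate dense vegetation, values near 0 bare or built-up surfaces, and negative values typically water.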

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.57-73
    • /
    • 2021
  • Maintenance and failure prevention through anomaly detection in ICT infrastructure are becoming increasingly important. System monitoring data are multidimensional time series, and it is difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, the correlation between variables should be considered. Existing probability-based, linear, and distance-based methods degrade because of the curse of dimensionality. In addition, time series data are preprocessed with a sliding window and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is an old research field; statistical methods and regression analysis were used in its early days, and there are now active studies applying machine learning and artificial neural networks. Statistical methods are difficult to apply when data are non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and detect anomalies by comparing the predicted value with the actual value. Their performance drops when the model is not robust or when the data contain noise or outliers, which imposes the restriction that training data without noise or outliers should be used. An autoencoder based on artificial neural networks is trained to produce output as similar as possible to its input. It has many advantages over existing probability-based and linear models, cluster analysis, and supervised learning. It can be applied to data that do not satisfy probability distribution or linearity assumptions. In addition, it can learn nonlinear mappings without labeled training data. However, autoencoders are limited in identifying local outliers in multidimensional data, and the dimension of the data increases greatly because of the characteristics of time series data. In this study, we propose a Conditional Multimodal Autoencoder (CMAE) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitation in identifying local outliers in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the autoencoder's bottleneck, through which the correlations between them are learned. In addition, a Conditional Autoencoder (CAE) was used to learn the characteristics of time series data effectively without increasing the dimension of the data. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance of the autoencoders for 41 variables was checked in the proposed model and the comparison models. Reconstruction performance differs by variable; the loss is small for the Memory, Disk, and Network modalities in all three autoencoder models, so reconstruction works well for them. The Process modality showed no significant difference among the three models, and the CPU modality showed excellent performance in CMAE. ROC curves were prepared to evaluate anomaly detection performance in the proposed model and the comparison models, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order CMAE, MAE, AE. In particular, recall was 0.9828 for CMAE, confirming that it detects almost all anomalies. The accuracy of the model also improved to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. From a practical standpoint, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the resulting increase in dimension can reduce computational speed at inference. The proposed model is easy to apply to practical tasks in terms of inference speed and model management.
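
The evaluation indicators listed in the abstract (accuracy, precision, recall, F1-score) all follow from the confusion counts of a binary detector. The sketch below shows the standard definitions on toy labels; the function name `detection_metrics` and the data are hypothetical and do not reproduce the paper's figures.

```python
def detection_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = anomaly, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Toy labels: 4 true anomalies, of which 3 are detected, plus 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = detection_metrics(y_true, y_pred)
```

Recall is the share of true anomalies that the detector flags, so a recall close to 1 means few anomalies are missed, while precision measures how many of the flagged points are real anomalies.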

Comparison of Daily Rainfall Interpolation Techniques and Development of Two Step Technique for Rainfall-Runoff Modeling (강우-유출 모형 적용을 위한 강우 내삽법 비교 및 2단계 일강우 내삽법의 개발)

  • Hwang, Yeon-Sang;Jung, Young-Hun;Lim, Kwang-Suop;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association
    • /
    • v.43 no.12
    • /
    • pp.1083-1091
    • /
    • 2010
  • Distributed hydrologic models typically require spatial estimates of precipitation interpolated from sparsely located observation points to specific grid points. However, widely used estimation schemes fail to describe the realistic variability of the daily precipitation field. We compare and contrast the performance of statistical methods for the spatial estimation of precipitation in two hydrologically different basins, and propose a two-step process for effective daily precipitation estimation. The methods assessed are: (1) Inverse Distance Weighted Average (IDW); (2) Multiple Linear Regression (MLR); (3) Climatological MLR; and (4) Locally Weighted Polynomial Regression (LWP). In the suggested simple two-step estimation process, precipitation occurrence is first generated via a logistic regression model, and the IDW scheme (one of the local schemes) is then applied to estimate the amount of precipitation separately on wet days. The results show that the suggested method interpolates spatially varying daily rainfall better than conventional methods. This technique can be used effectively for streamflow forecasting and for downscaling of atmospheric circulation models.
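
The two-step idea can be sketched as follows: a logistic function first decides whether the target point is wet, and inverse distance weighting then estimates the amount. This is one possible reading of the abstract, not the authors' implementation. The linear score passed to the logistic function, the 0.5 threshold, the use of wet stations only, and all function names are assumptions.

```python
import math


def logistic(z):
    """Logistic function mapping a linear score to an occurrence probability."""
    return 1.0 / (1.0 + math.exp(-z))


def idw(target, stations, values, power=2.0):
    """Inverse distance weighted average of station values at the target point."""
    num = 0.0
    den = 0.0
    for (sx, sy), v in zip(stations, values):
        d = math.hypot(target[0] - sx, target[1] - sy)
        if d == 0.0:
            return v  # target coincides with a station
        w = d ** -power
        num += w * v
        den += w
    return num / den


def two_step_rainfall(target, stations, rain, occurrence_score, threshold=0.5):
    """Step 1: wet/dry decision from a logistic model; step 2: IDW over wet stations."""
    if logistic(occurrence_score) < threshold:
        return 0.0
    wet = [(s, r) for s, r in zip(stations, rain) if r > 0.0]
    if not wet:
        return 0.0
    return idw(target, [s for s, _ in wet], [r for _, r in wet])


# Toy example: two wet stations equidistant from the target, one dry station.
stations = [(0.0, 0.0), (2.0, 0.0), (1.0, 5.0)]
rain = [2.0, 4.0, 0.0]
wet_estimate = two_step_rainfall((1.0, 0.0), stations, rain, occurrence_score=1.0)
dry_estimate = two_step_rainfall((1.0, 0.0), stations, rain, occurrence_score=-1.0)
```

Separating occurrence from amount keeps plain IDW from spreading small positive rainfall over locations that were in fact dry.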

Methodology to Predict Service Lives of Pavement Marking Materials (도로 차선 재료의 공용수명 예측방법)

  • Oh, Heung-Un;Lee, Hyun-Seock;Jang, Jung-Hwa;Kang, Jai-Soo
    • International Journal of Highway Engineering
    • /
    • v.10 no.4
    • /
    • pp.151-159
    • /
    • 2008
  • The retroreflectivity performance of pavement markings varies from place to place with traffic volume and the time elapsed since striping, and depends on the marking material and color. This paper uses nationwide retroreflectivity data collected from freeways to develop regression curves with traffic volume and service life as independent variables and retroreflectivity as the dependent variable. The database contains two years of measurements (2005-2006) of Korean freeway pavement markings taken at three-month intervals. A mobile measurement system, a Laserlux, was employed for this purpose. The database provides extensive information on the materials and performance of specific pavement markings, such as geometric features, traffic volumes, material characteristics, and installation dates. This study compares pavement marking performance under diverse conditions and, based on the accumulated performance data, provides performance curves for those factors. The goal of the retroreflectivity modeling is to develop equations that estimate the average retroreflectivity of pavement markings as a function of time since application and traffic volume. After plotting the variation of retroreflectivity and estimating regression curves with linear, exponential, logarithmic, and power functions, the curve with the highest coefficient of determination and a value close to the last field measurement was taken as the retroreflectivity decay model. In verification, the decay model was significant at the 90% confidence level and showed a particularly clear relation with field data as cumulative vehicle exposure increased. Accordingly, these models can be used to determine service lives, retroreflectivity degradation rates, and the retroreflectivity of new markings.
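
The curve selection step, fitting linear, exponential, logarithmic, and power functions and comparing coefficients of determination, can be sketched with linearizing transforms. The sketch assumes a single predictor (for example months since striping), strictly positive data, and $R^2$ evaluated on the original scale; the names `ols`, `r_squared`, and `fit_decay_models` and the toy data are hypothetical.

```python
import math


def ols(xs, ys):
    """Least-squares intercept and slope for y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def r_squared(ys, preds):
    """Coefficient of determination on the original scale."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def fit_decay_models(xs, ys):
    """Fit four candidate curves and return {name: R^2}; requires x > 0 and y > 0."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    preds = {}
    a, b = ols(xs, ys)   # linear: y = a + b*x
    preds["linear"] = [a + b * x for x in xs]
    a, b = ols(xs, ly)   # exponential: ln y = a + b*x
    preds["exponential"] = [math.exp(a + b * x) for x in xs]
    a, b = ols(lx, ys)   # logarithmic: y = a + b*ln x
    preds["logarithmic"] = [a + b * v for v in lx]
    a, b = ols(lx, ly)   # power: ln y = a + b*ln x
    preds["power"] = [math.exp(a + b * v) for v in lx]
    return {name: r_squared(ys, p) for name, p in preds.items()}


# Toy data generated from an exponential decay (not measurements from the paper).
months = [float(m) for m in range(1, 11)]
retro = [300.0 * math.exp(-0.1 * m) for m in months]
scores = fit_decay_models(months, retro)
best = max(scores, key=scores.get)
```

The abstract adds a second criterion, closeness to the last field measurement, which would be applied on top of this ranking.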

Analysis of the Feasibility and Reliability of Models Measuring National Innovative Capability: with a Focus on the IUS of the EU (국가혁신역량 측정모형의 신뢰성과 타당성 분석: 유럽연합의 IUS를 중심으로)

  • Um, Ik-Cheon;Cho, Joo-Yeon;Kim, Dae-In
    • Journal of Korea Technology Innovation Society
    • /
    • v.17 no.1
    • /
    • pp.45-67
    • /
    • 2014
  • National Innovative Capability (NIC) is a decisive factor in economic growth, so it is very important to measure and manage it. The composite index approach is one of the most widely used approaches to measuring NIC, but its feasibility and reliability have not been sufficiently reviewed. This paper analyzes the feasibility and reliability of the last three years of reports (2011 through 2013) of the Innovation Union Scoreboard (IUS) of the EU, the most representative means of measuring NIC. A Cronbach's alpha-based test of the IUS composite index models showed that their reliability meets the recommended criteria. However, in a construct validity analysis, neither the absolute fit index nor the incremental fit index met the recommended criteria. A panel linear regression analysis of the sectors and items of the IUS composite index also showed that predictive validity is very low. This paper presents a number of considerations for measuring national innovative capability using the composite index approach, as well as major policy suggestions based on the results of the analysis.
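
Cronbach's alpha, the reliability statistic used here, compares the sum of the item variances with the variance of the total score. The sketch below implements the standard formula. Treating countries as observations and indicators as items is an assumption about how the test was set up, and the function name `cronbach_alpha` and the toy scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores over the same observations."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per observation
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1.0 - item_variance_sum / variance(totals))


# Toy data: two perfectly consistent indicators scored for four countries.
indicator_a = [1.0, 2.0, 3.0, 4.0]
indicator_b = [1.0, 2.0, 3.0, 4.0]
alpha = cronbach_alpha([indicator_a, indicator_b])
```

Alpha approaches 1 when the items move together. A threshold of about 0.7 is a commonly cited rule of thumb, though the abstract does not say which criterion the authors applied.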

Determination of Hot Air Drying Characteristics of Squash (Cucurbita spp.) Slices

  • Hong, Soon-jung;Lee, Dong Young;Park, Jeong Gil;Mo, Changyeun;Lee, Seung Hyun
    • Journal of Biosystems Engineering
    • /
    • v.42 no.4
    • /
    • pp.314-322
    • /
    • 2017
  • Purpose: This study was conducted to investigate the hot air drying characteristics of squash slices depending on the drying conditions (input air velocity, input air temperature, and sample thickness). Methods: The developed drying system was equipped with a controllable air blower and electric finned heater, drying chamber, and ventilation fan. Squash (summer squash called Korean zucchini) samples were cut into slices of two different thicknesses (5 and 10 mm). These were then dried at two different input air temperatures (60 and $70^{\circ}C$) and air velocities (5 and 7 m/s). Six well-known drying models were tested to describe the experimental drying data. A non-linear regression analysis was applied to determine model constants and statistical indices such as the coefficient of determination ($R^2$), reduced chi-square (${\chi}^2$), and root mean square error (RMSE). In addition, the effective moisture diffusivity ($D_{eff}$) was estimated based on the curve of ln(MR) versus drying time. Results: The results clearly showed that drying time decreased with an increase in input air temperature. Slice thickness also affected the drying time. Air velocity had a greater influence on drying time at $70^{\circ}C$ than at $60^{\circ}C$ for both thicknesses. All drying models accurately described the drying curve of squash slices regardless of slice thickness and drying conditions; the Modified Henderson and Pabis model had the best performance, with the highest $R^2$ and the lowest RMSE values. The effective moisture diffusivity ($D_{eff}$) values, obtained from Fick's diffusion method, were between $1.67{\times}10^{-10}$ and $7.01{\times}10^{-10}m^2/s$. The moisture diffusivity increased with an increase in input air temperature, velocity, and thickness. Conclusions: The drying time of squash slices varied depending on input temperature, velocity, and thickness of slices. Further study is necessary to determine the optimal drying conditions for squash slices that retain their original quality.
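
Estimating $D_{eff}$ from the curve of ln(MR) versus drying time usually relies on the first term of Fick's solution for an infinite slab, $MR = \frac{8}{\pi^2}\exp\left(-\frac{\pi^2 D_{eff}\, t}{4L^2}\right)$, so that the slope of the line equals $-\pi^2 D_{eff}/(4L^2)$. The sketch below assumes this form with $L$ taken as the slice half-thickness; the abstract does not confirm which convention the authors used, and the function names and synthetic data are hypothetical.

```python
import math


def slope(xs, ys):
    """Least-squares slope of y against x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)


def effective_diffusivity(times_s, moisture_ratios, half_thickness_m):
    """D_eff (m^2/s) from the slope of ln(MR) vs. time, first-term slab solution."""
    k = slope(times_s, [math.log(mr) for mr in moisture_ratios])
    return -k * 4.0 * half_thickness_m ** 2 / math.pi ** 2


# Synthetic drying curve for a 5 mm slice (half-thickness 2.5 mm) with a known D_eff.
true_d = 5.0e-10
half_thickness = 0.0025
times = [600.0 * i for i in range(1, 11)]  # seconds
ratios = [
    8.0 / math.pi ** 2 * math.exp(-math.pi ** 2 * true_d * t / (4.0 * half_thickness ** 2))
    for t in times
]
estimated_d = effective_diffusivity(times, ratios, half_thickness)
```

Because $L$ enters squared, the choice between full thickness and half-thickness changes the estimate by a factor of four, so the convention matters when comparing reported values.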

Prediction of Seedling Emergence and Early Growth of Monochoria vaginalis and Scirpus juncoides under Elevated Temperature (상승된 온도 조건에서 물달개비(Monochoria vaginalis)와 올챙이고랭이(Scirpus juncoides)의 출아 및 초기생장 예측)

  • Park, Min-Won;Kim, Jin-Won;Lim, Soo-Hyun;Lee, In-Yong;Kim, Do-Soon
    • Korean Journal of Weed Science
    • /
    • v.30 no.2
    • /
    • pp.103-110
    • /
    • 2010
  • This experiment was conducted to investigate the seedling emergence and early growth of Monochoria vaginalis and Scirpus juncoides in a controlled-environment chamber maintained at different temperatures. Non-linear regression analyses of the observed data against effective accumulated temperature (EAT) showed that the Gompertz and logistic models described seedling emergence and early growth, respectively, well for both weed species regardless of temperature. The EATs required for 50% of the maximum seedling emergence and of the maximum leaf number of M. vaginalis were estimated to be 69.3 and $131^{\circ}C$, respectively, while those of S. juncoides were 94.8 and $137^{\circ}C$, respectively. The models developed in this study were then used to predict seedling emergence and early growth under elevated temperature conditions. If rotary tillage with water is carried out on 27 May under a $+3^{\circ}C$ elevated temperature condition, the dates for 50% of the maximum seedling emergence and the 4-leaf stage were predicted to be 1 June and 15 June for M. vaginalis and 3 June and 14 June for S. juncoides, respectively. Compared with the current temperature, these dates are 1-2 days earlier for seedling emergence and 3 days earlier for early growth, suggesting that earlier application of herbicides will be required for effective control of M. vaginalis and S. juncoides under elevated temperature conditions in the future.
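
The two growth curves named in the abstract, and the EAT at which each reaches 50% of its maximum, can be sketched as below. The parameterizations are common textbook forms and may differ from those fitted in the paper; the parameter names (`c` for the maximum, `b` for the rate, `m` for the location) and the toy values are assumptions.

```python
import math


def gompertz(eat, c, b, m):
    """Gompertz curve: c * exp(-exp(-b * (eat - m)))."""
    return c * math.exp(-math.exp(-b * (eat - m)))


def logistic(eat, c, b, m):
    """Logistic curve: c / (1 + exp(-b * (eat - m)))."""
    return c / (1.0 + math.exp(-b * (eat - m)))


def gompertz_eat50(b, m):
    """EAT at which the Gompertz curve reaches half its maximum: m - ln(ln 2) / b."""
    return m - math.log(math.log(2.0)) / b


def logistic_eat50(m):
    """The logistic curve reaches half its maximum exactly at eat = m."""
    return m


# Hypothetical parameters (not the paper's estimates).
c_emergence, b_emergence, m_emergence = 80.0, 0.05, 60.0
eat50 = gompertz_eat50(b_emergence, m_emergence)
half_emergence = gompertz(eat50, c_emergence, b_emergence, m_emergence)
half_leaves = logistic(logistic_eat50(120.0), 5.0, 0.04, 120.0)
```

Once the required EAT is known, a predicted calendar date follows from accumulating daily effective temperature under a given temperature scenario until that value is reached.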