• Title/Summary/Keyword: Multivariate regression models

Study on water quality prediction in water treatment plants using AI techniques (AI 기법을 활용한 정수장 수질예측에 관한 연구)

  • Lee, Seungmin;Kang, Yujin;Song, Jinwoo;Kim, Juhwan;Kim, Hung Soo;Kim, Soojun
    • Journal of Korea Water Resources Association / v.57 no.3 / pp.151-164 / 2024
  • In water treatment plants that supply potable water, processes involving pre-chlorination or intermediate chlorination require process control to manage chlorine concentration. To address this, water quality prediction techniques based on AI have been studied. This study developed an AI-based predictive model for automating the process control of chlorine disinfection, targeting the residual chlorine concentration downstream of sedimentation basins. Because the AI-based model learns from past water quality observations to predict future water quality, it offers a simpler and more efficient approach than complex physicochemical and biological water quality models. The model was tested by predicting the residual chlorine concentration downstream of the sedimentation basins at Plant, using multiple regression models and AI-based models such as Random Forest and LSTM, and the results were compared. For optimal prediction, the independent variables of the AI model included the residual chlorine concentration upstream of the sedimentation basin, turbidity, pH, water temperature, electrical conductivity, raw water inflow, alkalinity, NH3, etc., and the dependent variable was the desired residual chlorine concentration of the effluent from the sedimentation basin. The independent variables were selected from data observable at the water treatment plant that influence the residual chlorine concentration downstream of the sedimentation basin. The analysis showed that, for Plant, the model based on Random Forest had the lowest error compared to multiple regression models, neural network models, model trees, and other Random Forest models. The optimal predicted residual chlorine concentration downstream of the sedimentation basin presented in this study is expected to enable real-time control of chlorine dosing in preceding treatment stages, thereby enhancing water treatment efficiency and reducing chemical costs.
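
The abstract above ranks models by prediction error without naming the metric. As a minimal sketch, assuming root-mean-square error (RMSE) is the metric and using made-up residual chlorine values (none of these numbers come from the study), such a comparison could look like this:

```python
import math

def rmse(observed, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical residual chlorine observations (mg/L) and model outputs.
observed = [0.50, 0.55, 0.60, 0.58]
predictions = {
    "multiple_regression": [0.45, 0.60, 0.66, 0.52],
    "random_forest": [0.49, 0.56, 0.61, 0.57],
}

# Score every model on the same hold-out observations and pick the lowest error.
scores = {name: rmse(observed, pred) for name, pred in predictions.items()}
best_model = min(scores, key=scores.get)
print(best_model)
```

The model names are labels only; any fitted regressor that returns predictions for the same hold-out period could be scored the same way.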

Larger Testicular Volume Is Independently Associated with Favorable Indices of Lung Function

  • Kim, Tae Beom;Park, I-Nae
    • Tuberculosis and Respiratory Diseases / v.80 no.4 / pp.385-391 / 2017
  • Background: Men with chronic obstructive pulmonary disease have reduced endogenous testosterone levels, but the reported relationship between pulmonary function and endogenous testosterone levels is inconsistent. Testicular volume is a known indicator of endogenous testosterone levels, male fertility, and male potency. In the present study, the authors investigated the relationship between testicular volume and lung function. Methods: One hundred and eighty-one South Korean men aged 40-70, hospitalized for urological surgery, were retrospectively enrolled, irrespective of the presence of respiratory disease. Study subjects underwent pulmonary function testing prior to procedures, and testicular volumes were measured by orchidometry. Testosterone levels in blood samples collected between 7 AM and 11 AM were measured by a direct chemiluminescent immunoassay. Results: When the 181 study subjects were divided into two groups by testicular volume (≥35 mL vs. <35 mL), the larger testes group had better lung function (forced vital capacity [FVC]: 3.87±0.65 L vs. 3.66±0.65 L, p=0.037; forced expiratory volume in 1 second [FEV1]: 2.92±0.57 L vs. 2.65±0.61 L, p=0.002; FVC % predicted: 98.2±15.2% vs. 93.8±13.1%, p=0.040; FEV1 % predicted: 105.4±19.5% vs. 95.9±21.2%, p=0.002). In addition, the proportion of patients with an FEV1/FVC of <70% was lower in the larger testes group. Univariate analysis using linear regression models revealed that testicular volume was correlated with FVC (r=0.162, p=0.029), FEV1 (r=0.218, p=0.003), FEV1/FVC (r=0.149, p=0.046), and FEV1 % predicted (r=0.178, p=0.017), and multivariate analysis using linear regression models revealed that testicular volume was a significant predictive factor for FEV1 % predicted (β=0.159, p=0.041). Conclusion: Larger testicular volume was independently associated with favorable indices of lung function. These results suggest that androgens may contribute to better lung function.

A Comparative Study on Failure Prediction Models for Small and Medium Manufacturing Company (중소제조기업의 부실예측모형 비교연구)

  • Hwangbo, Yun;Moon, Jong Geon
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.11 no.3 / pp.1-15 / 2016
  • This study analyzed the prediction capabilities of a multivariate model, a logistic regression model, and an artificial neural network model based on the financial information of small and medium-sized companies listed on KOSDAQ. A total of 166 firms were sampled for the analysis: 83 companies delisted from 2009 to 2012 and 83 normal companies. The models were trained on 100 companies, 50 delisted and 50 normal, drawn at random from the 166. The remaining 66 companies were used to verify the accuracy of the models. Each model was designed by carrying out t-tests on 79 financial ratios for the last 5 years and identifying 9 significant variables. The t-tests showed that financial profitability variables were the major variables for predicting financial risk at an early stage, while financial stability variables and cash flow variables were identified as additional significant variables at a later stage of insolvency. When the prediction capabilities of the models were compared, the logistic regression model exhibited the highest accuracy on the training data, while the artificial neural network model provided the most accurate results on the test data. This study differs from previous research in two ways. First, it considered the time-series aspect, in light of the fact that failure proceeds gradually. Second, while previous studies constructed multivariate discriminant models without regard to normality, this study checked the normality of the independent variables and compared the result with the other models. The policy implication of this study is that the reliability of disclosure documents is important, because the symptoms of a firm's failure appear in its financial statements. Institutional arrangements for restraining moral laxity on the part of accounting firms and their employees should therefore be strengthened.
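
The sampling design described above (50 delisted and 50 normal firms for training, the remaining 66 for verification) can be sketched as a class-balanced random split. The firm identifiers and the seed are hypothetical, and the authors' actual sampling procedure is not given in the abstract:

```python
import random

def balanced_split(delisted, normal, n_train_per_class, seed=0):
    """Draw n_train_per_class firms at random from each class for training.

    Returns (train, test) as lists of (firm_id, label) pairs, where
    label 1 marks a delisted firm and label 0 a normal firm.
    """
    rng = random.Random(seed)
    train_delisted = set(rng.sample(delisted, n_train_per_class))
    train_normal = set(rng.sample(normal, n_train_per_class))
    labeled = [(f, 1) for f in delisted] + [(f, 0) for f in normal]
    in_train = train_delisted | train_normal
    train = [pair for pair in labeled if pair[0] in in_train]
    test = [pair for pair in labeled if pair[0] not in in_train]
    return train, test

# Hypothetical identifiers standing in for the 83 + 83 KOSDAQ firms.
delisted = [f"D{i:02d}" for i in range(83)]
normal = [f"N{i:02d}" for i in range(83)]
train, test = balanced_split(delisted, normal, 50)
print(len(train), len(test))  # 100 66
```

Keeping the hold-out firms entirely out of variable screening and model fitting is what makes the test-set accuracy comparison meaningful.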

Dynamic forecasts of bankruptcy with Recurrent Neural Network model (RNN(Recurrent Neural Network)을 이용한 기업부도예측모형에서 회계정보의 동적 변화 연구)

  • Kwon, Hyukkun;Lee, Dongkyu;Shin, Minsoo
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.139-153 / 2017
  • Corporate bankruptcy can cause great losses not only to stakeholders but also to many related sectors of society. Bankruptcies have increased through successive economic crises, and bankruptcy prediction models have become increasingly important. Corporate bankruptcy has therefore been regarded as one of the major research topics in business management, and many studies are also in progress in industry. Previous studies used various statistics-based methodologies, such as Multivariate Discriminant Analysis (MDA) and the Generalized Linear Model (GLM), to improve bankruptcy prediction accuracy and to resolve the overfitting problem. More recently, researchers have used machine learning methodologies such as the Support Vector Machine (SVM) and the Artificial Neural Network (ANN), as well as fuzzy theory and genetic algorithms. As a result, many bankruptcy models have been developed and performance has improved. In general, a company's financial and accounting information changes over time, as does the market situation, so it is difficult to predict bankruptcy using only information from a single point in time. However, although traditional research does not take the time effect into account, dynamic models have not been studied much. Ignoring the time effect yields biased results, so a static model may not be suitable for predicting bankruptcy, and a dynamic model may improve bankruptcy prediction. In this paper, we propose the RNN (Recurrent Neural Network), one of the deep learning methodologies. The RNN learns from time series data and is known to perform well. Prior to the experiment, we selected non-financial firms listed on the KOSPI, KOSDAQ and KONEX markets from 2010 to 2016 for estimating the bankruptcy prediction model and comparing forecasting performance. To avoid predicting bankruptcy from financial information that already reflects the deterioration of the company's financial condition, the financial information was collected with a lag of two years, and the default period was defined as January to December of the year. We defined bankruptcy as delisting due to sluggish earnings and confirmed delistings at KIND, a corporate stock information website. We then selected variables from previous papers. The first set consists of Z-score variables, which have become traditional variables in predicting bankruptcy. The second set is a dynamic variable set. Finally, we selected 240 normal companies and 226 bankrupt companies for the first variable set, and 229 normal companies and 226 bankrupt companies for the second variable set. We created a model that reflects dynamic changes in time-series financial data, and by comparing it with existing bankruptcy prediction models, we found that the suggested model could help improve the accuracy of bankruptcy predictions. We used financial data from KIS Value (a financial database) and selected Multivariate Discriminant Analysis (MDA), the Generalized Linear Model known as logistic regression (GLM), the Support Vector Machine (SVM), and the Artificial Neural Network (ANN) as benchmarks. The experiment showed that the RNN performed better than the comparison models. The accuracy of the RNN was high for both sets of variables, and the Area Under the Curve (AUC) value was also high. In the hit-ratio table, the proportion of poorly performing companies that the RNN predicted to be bankrupt was also higher than that of the other comparison models. A limitation of this paper is that an overfitting problem occurs during RNN learning, but we expect to be able to solve it by selecting more training data and appropriate variables. From these results, it is expected that this research will contribute to the development of bankruptcy prediction by proposing a new dynamic model.
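
The two-year lag described above can be illustrated by building, for each firm and label year, a sequence of annual feature vectors that ends two years before the label year. This is a sketch only: the window length of three years, the data layout, and the function name are assumptions, not details from the paper.

```python
def lagged_sequence(yearly_features, label_year, lag=2, window=3):
    """Return the feature vectors for the `window` years ending at label_year - lag.

    yearly_features maps a fiscal year to that year's feature vector.
    Returns None when any required year is missing.
    """
    last_year = label_year - lag
    years = range(last_year - window + 1, last_year + 1)
    if any(year not in yearly_features for year in years):
        return None
    return [yearly_features[year] for year in years]

# Hypothetical single-feature history for one firm, 2010 to 2016.
history = {year: [year - 2000] for year in range(2010, 2017)}

# A firm labeled in 2016 is described by its 2012-2014 statements only.
sequence = lagged_sequence(history, label_year=2016)
print(sequence)  # [[12], [13], [14]]
```

Sequences built this way could be fed to a recurrent model one time step per year, so that no statement dated within two years of the label year enters the input.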

Variable Selection for Multi-Purpose Multivariate Data Analysis (다목적 다변량 자료분석을 위한 변수선택)

  • Huh, Myung-Hoe;Lim, Yong-Bin;Lee, Yong-Goo
    • The Korean Journal of Applied Statistics / v.21 no.1 / pp.141-149 / 2008
  • Multivariate data with a very large number of variables are now analyzed frequently. In such data sets, virtually duplicated variables may exist side by side even though they are conceptually distinguishable. Duplicated variables may cause problems such as distortion of the principal axes in principal component analysis and factor analysis, and distortion of the distances between observations, i.e. the input for cluster analysis. In supervised learning or regression analysis, duplicated explanatory variables also often cause instability in the fitted models. Since real data analyses often serve multiple purposes, it is necessary to reduce the number of variables to a parsimonious level. The aim of this paper is to propose a practical algorithm for selecting a subset of variables from a given set of p input variables, using the criterion of minimum trace of the partial variances of the unselected variables left unexplained by the selected variables. The usefulness of the proposed method is demonstrated in visualizing the relationship between selected and unselected variables, in building a predictive model with a very large number of independent variables, and in reducing the number of variables and purging or merging categories in categorical data.
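
The selection criterion above (minimum trace of the partial variances of unselected variables given the selected ones) can be sketched with a greedy forward search on the covariance matrix. The greedy strategy is an assumption made for illustration; the abstract does not say how the paper searches over subsets. At each step the variable whose removal reduces the residual trace the most is chosen, and the residual covariance is updated by partialling that variable out.

```python
def greedy_min_trace_selection(cov, k, tol=1e-12):
    """Greedily pick up to k variables minimizing the trace of the partial
    covariance of the remaining variables given the selected ones.

    cov is a symmetric p x p covariance matrix as a list of lists.
    Returns (selected_indices, remaining_trace).
    """
    p = len(cov)
    resid = [row[:] for row in cov]  # residual covariance given selected vars
    selected = []
    for _ in range(k):
        best_j, best_gain = None, 0.0
        for j in range(p):
            if j in selected or resid[j][j] <= tol:
                continue
            # Trace reduction obtained by partialling out variable j.
            gain = sum(resid[i][j] ** 2 for i in range(p)) / resid[j][j]
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        pivot = resid[best_j][best_j]
        column = [resid[i][best_j] for i in range(p)]
        for a in range(p):
            for b in range(p):
                resid[a][b] -= column[a] * column[b] / pivot
        selected.append(best_j)
    return selected, sum(resid[i][i] for i in range(p))

# Hypothetical covariance: variables 0 and 1 are near-duplicates, 2 is independent.
cov = [[1.0, 0.99, 0.0],
       [0.99, 1.0, 0.0],
       [0.0, 0.0, 1.0]]
selected, remaining = greedy_min_trace_selection(cov, k=2)
print(selected)  # [0, 2]
```

In this toy case one of the duplicated pair is kept, the independent variable is kept, and the near-duplicate is left unselected because almost none of its variance remains unexplained.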

Role of Multiparametric Prostate Magnetic Resonance Imaging before Confirmatory Biopsy in Assessing the Risk of Prostate Cancer Progression during Active Surveillance

  • Joseba Salguero;Enrique Gomez-Gomez;Jose Valero-Rosa;Julia Carrasco-Valiente;Juan Mesa;Cristina Martin;Juan Pablo Campos-Hernandez;Juan Manuel Rubio;Daniel Lopez;Maria Jose Requena
    • Korean Journal of Radiology / v.22 no.4 / pp.559-567 / 2021
  • Objective: To evaluate the impact of multiparametric magnetic resonance imaging (mpMRI) before confirmatory prostate biopsy in patients under active surveillance (AS). Materials and Methods: This retrospective study included 170 patients with Gleason grade 6 prostate cancer initially enrolled in an AS program between 2011 and 2019. Prostate mpMRI was performed using a 1.5 tesla (T) magnetic resonance imaging system with a 16-channel phased-array body coil. The protocol included T1-weighted, T2-weighted, diffusion-weighted, and dynamic contrast-enhanced imaging sequences. Uroradiology reports generated by a specialist were based on prostate imaging-reporting and data system (PI-RADS) version 2. Univariate and multivariate analyses were performed based on regression models. Results: The reclassification rate at confirmatory biopsy was higher in patients with suspicious lesions on mpMRI (PI-RADS score ≥ 3) (n = 47) than in patients with non-suspicious mpMRIs (n = 61) and who did not undergo mpMRIs (n = 62) (66%, 26.2%, and 24.2%, respectively; p < 0.001). On multivariate analysis, presence of a suspicious mpMRI finding (PI-RADS score ≥ 3) was associated (adjusted odds ratio: 4.72) with the risk of reclassification at confirmatory biopsy after adjusting for the main variables (age, prostate-specific antigen density, number of positive cores, number of previous biopsies, and clinical stage). Presence of a suspicious mpMRI finding (adjusted hazard ratio: 2.62) was also associated with the risk of progression to active treatment during the follow-up. Conclusion: Inclusion of mpMRI before the confirmatory biopsy is useful to stratify the risk of reclassification during the biopsy as well as to evaluate the risk of progression to active treatment during follow-up.

Differences in Cigarette Use Behaviors by Age at the Time of Diagnosis With Diabetes From Young Adulthood to Adulthood: Results From the National Longitudinal Study of Adolescent Health

  • Bae, Jisuk
    • Journal of Preventive Medicine and Public Health / v.46 no.5 / pp.249-260 / 2013
  • Objectives: Previous observations suggest that risk-taking behaviors such as cigarette smoking are prevalent among young people with chronic conditions including diabetes. The purpose of this study was to examine whether cigarette smoking is more prevalent among diabetics than non-diabetics and whether it differs by age at the time of diagnosis with diabetes from young adulthood (YAH) to adulthood (AH). Methods: We used US panel data from the National Longitudinal Study of Adolescent Health (Add Health Study) during the years 2001 to 2002 (Wave III, YAH) and 2007 to 2008 (Wave IV, AH). Multivariate logistic regression models were applied to estimate odds ratios (ORs) and 95% confidence intervals (CIs) of cigarette use behaviors according to age at the time of diagnosis with diabetes, after adjusting for demographic and selected behavioral factors. Results: Of 12 175 study participants, 2.6% reported having been diagnosed with diabetes up to AH. Early-onset diabetics (age at diagnosis <13 years) were more likely than non-diabetics to report frequent cigarette smoking (smoking on ≥20 days during the previous 30 days) in YAH (OR, 3.34; 95% CI, 1.27 to 8.79). On the other hand, late-onset diabetics (age at diagnosis ≥13 years) were more likely than non-diabetics to report heavy cigarette smoking (smoking ≥10 cigarettes per day during the previous 30 days) in AH (OR, 1.54; 95% CI, 1.03 to 2.30). Conclusions: The current study indicated that diabetics are more likely than non-diabetics to smoke cigarettes frequently and heavily in YAH and AH. Effective smoking prevention and cessation programs uniquely focused on diabetics need to be designed and implemented.
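
To make the OR and 95% CI notation above concrete, the following sketch computes an unadjusted odds ratio with a Wald confidence interval from a 2x2 table. It is a simplification: the study reports ORs adjusted through multivariate logistic regression, and the counts below are invented for illustration.

```python
import math

def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z=1.96 gives an approximate 95% interval.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical counts: 20 of 100 exposed and 10 of 100 unexposed report the outcome.
odds_ratio, lower, upper = odds_ratio_wald_ci(20, 80, 10, 90)
print(round(odds_ratio, 2))  # 2.25
```

An interval that excludes 1, as in both results quoted in the abstract, corresponds to an association that is statistically significant at the 5% level.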

Prognostic Significance of Preoperative Lymphocyte-Monocyte Ratio in Patients with Resectable Esophageal Squamous Cell Carcinoma

  • Han, Li-Hui;Jia, Yi-Bin;Song, Qing-Xu;Wang, Jian-Bo;Wang, Na-Na;Cheng, Yu-Feng
    • Asian Pacific Journal of Cancer Prevention / v.16 no.6 / pp.2245-2250 / 2015
  • Background: The interaction between tumor cells and inflammatory cells has not been systematically investigated in esophageal squamous cell carcinoma (ESCC). The aim of the present study was to evaluate whether the preoperative lymphocyte-monocyte ratio (LMR), neutrophil-lymphocyte ratio (NLR), and platelet-lymphocyte ratio (PLR) could predict the prognosis of ESCC patients undergoing esophagectomy. Materials and Methods: Records from 218 patients with histologically diagnosed ESCC who underwent attempted curative surgery from January 2007 to December 2008 were retrospectively reviewed. Besides clinicopathological prognostic factors, we evaluated the prognostic value of the LMR, the NLR, and the PLR using Kaplan-Meier curves and Cox regression models. Results: The median follow-up was 38.6 months (range 3-71 months). Cut-off values of 2.57 for the LMR, 2.60 for the NLR and 244 for the PLR were chosen as optimal for discriminating between survival and death by receiver operating characteristic (ROC) curve analysis. In Kaplan-Meier survival analysis, patients with low preoperative LMR had a significantly worse prognosis for DFS (p=0.004) and OS (p=0.002) than those with high preoperative LMR. The high NLR cohort had lower DFS (p=0.004) and OS (p=0.011). Marginally reduced DFS (p=0.068) and lower OS (p=0.039) were found in the high PLR cohort. On multivariate analysis, only preoperative LMR was an independent prognostic factor for both DFS (p=0.009, HR=1.639, 95% CI 1.129-2.381) and OS (p=0.004, HR=1.759, 95% CI 1.201-2.576) in ESCC patients. Conclusions: Preoperative LMR predicts cancer survival better than the other cellular components of systemic inflammation in patients with ESCC undergoing esophagectomy.
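
The abstract above chooses cut-off values by ROC analysis without stating the optimality criterion. A common choice, assumed here purely for illustration, is Youden's J (sensitivity + specificity - 1). The marker values and outcomes below are invented, and low values are treated as the high-risk side, as for the LMR.

```python
def youden_cutoff(values, events, low_is_risk=True):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1.

    events[i] is 1 if the patient died and 0 otherwise. With low_is_risk=True
    a patient is predicted to die when the marker value is <= cutoff.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(values)):
        if low_is_risk:
            predicted = [v <= cutoff for v in values]
        else:
            predicted = [v >= cutoff for v in values]
        true_pos = sum(1 for p, e in zip(predicted, events) if p and e == 1)
        true_neg = sum(1 for p, e in zip(predicted, events) if not p and e == 0)
        j = true_pos / n_pos + true_neg / n_neg - 1
        if j > best_j:  # keep the first (lowest) cutoff among ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical marker values and outcomes (1 = death).
marker = [1.5, 2.0, 2.4, 2.6, 3.0, 3.5, 4.0, 4.2]
died = [1, 1, 1, 0, 1, 0, 0, 0]
cutoff, j_stat = youden_cutoff(marker, died)
print(cutoff, j_stat)  # 2.4 0.75
```

For markers where high values indicate risk, such as the NLR and PLR in the abstract, `low_is_risk=False` reverses the direction of the rule.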

Preoperative Thrombocytosis and Poor Prognostic Factors in Endometrial Cancer

  • Heng, Suttichai;Benjapibal, Mongkol
    • Asian Pacific Journal of Cancer Prevention / v.15 no.23 / pp.10231-10236 / 2015
  • This study aimed to evaluate the prevalence of preoperative thrombocytosis and its prognostic significance in Thai patients with endometrial cancer. We retrospectively reviewed the medical records of 238 patients who had undergone surgical staging procedures between January 2005 and December 2008. Associations between clinicopathological variables and preoperative platelet counts were analyzed using Pearson's chi-square or two-tailed Fisher's exact tests. Survival analysis was performed with Kaplan-Meier estimates. Univariate and Cox regression models were used to evaluate the prognostic impact of various factors, including platelet count, on disease-free survival and overall survival. The mean preoperative platelet count was 315,437/μL (SD 100,167/μL). Patients who had advanced stage, adnexal involvement, lymph node metastasis, and positive peritoneal cytology had significantly higher mean preoperative platelet counts than those who did not. We found thrombocytosis (platelet count greater than 400,000/μL) in 18.1% of our patients with endometrial cancer. These patients had significantly higher rates of advanced stage, cervical involvement, adnexal involvement, positive peritoneal cytology, and lymph node involvement than patients with a normal pretreatment platelet count. The 5-year disease-free survival and overall survival were significantly lower in patients who had thrombocytosis than in those who did not (67.4% vs. 85.1%, p=0.001 and 86.0% vs. 94.9%, p=0.034, respectively). Thrombocytosis was shown to be a prognostic factor in the univariate but not the multivariate analysis. In conclusion, thrombocytosis is not uncommon in endometrial cancer and may reflect unfavorable prognostic factors, but its prognostic impact on survival needs to be clarified in further studies.
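
The Kaplan-Meier estimates mentioned above follow the product-limit formula S(t) = Π (1 - d_i / n_i) over event times t_i ≤ t, where d_i is the number of events at t_i and n_i the number still at risk. A minimal sketch with invented follow-up data (months; 1 = event, 0 = censored):

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    Returns a list of (time, survival) pairs, one per distinct event time.
    Censored observations (event flag 0) only shrink the risk set.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all observations that share the same time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve

# Hypothetical follow-up times and event indicators for six patients.
follow_up = [6, 12, 12, 20, 30, 36]
event = [1, 1, 0, 1, 0, 1]
km_curve = kaplan_meier(follow_up, event)
for t, s in km_curve:
    print(t, round(s, 4))
```

Comparing such curves between groups (for example with and without thrombocytosis) would additionally require a test such as the log-rank test, which this sketch does not implement.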

3D-QSAR of Angiotensin-Converting Enzyme Inhibitors: Functional Group Interaction Energy Descriptors for Quantitative Structure-Activity Relationships Study of ACE Inhibitors

  • Kim, Sang-Uk;Chi, Myung-Whan;Yoon, Chang-No;Sung, Ha-Chin
    • BMB Reports / v.31 no.5 / pp.459-467 / 1998
  • A new set of functional group interaction energy descriptors relevant to the QSAR (Quantitative Structure-Activity Relationships) of ACE (Angiotensin-Converting Enzyme) inhibitory peptides is presented. The functional group interaction energies approximate the charged interactions and distances between functional groups in molecules. The effective energies of the computationally derived geometries are useful parameters for deriving 3D-QSAR models, especially in the absence of an experimentally known active site conformation. ACE is a regulatory zinc protease in the renin-angiotensin system. Therapeutic inhibition of this enzyme has proven to be a very effective treatment for the management of hypertension. The non-bonded interaction energy values among the functional groups of six features of ACE inhibitory peptides were used as descriptor terms and analyzed for multivariate correlation with ACE inhibition activity. The functional group interaction energy descriptors used in the regression analysis were obtained from a series of inhibitor structures derived from molecular mechanics and semi-empirical calculations. The descriptors calculated using electrostatic and steric fields from the precisely defined functional groups were sufficient to explain the biological activity of the inhibitors. Application of the descriptors to the inhibition of ACE indicates that the derived QSAR has good predictive ability and provides insight into the mechanism of enzyme inhibition. The method, functional group interaction energy analysis, is expected to be applicable to predicting the enzyme inhibitory activity of rationally designed inhibitors.