• Title/Summary/Keyword: Standard error of prediction

Multivariate Outlier Removing for the Risk Prediction of Gas Leakage based Methane Gas (메탄 가스 기반 가스 누출 위험 예측을 위한 다변량 특이치 제거)

  • Dashdondov, Khongorzul;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.11 no.12 / pp.23-30 / 2020
  • In this study, the relationship between natural gas (NG) data and gas-related environmental factors was analyzed using machine learning algorithms to predict the level of gas leakage risk without directly measuring gas leakage. The study was based on open data provided by a server using an IoT-based remote-control Picarro gas sensor. Natural gas leaking into the air is a serious problem for air pollution, the environment, and health. The proposed method combines multivariate outlier removal with Random Forest (RF) classification to predict the risk of NG leaks. After unsupervised k-means clustering, the experimental dataset was imbalanced. We therefore focus on how well the proposed models predict the medium and high risk levels. We compared the receiver operating characteristic (ROC) curve, accuracy, area under the ROC curve (AUC), and mean standard error (MSE) for each classification model. In our experiments, the accuracy, AUC, and MSE for MOL_RF were 99.71%, 99.57%, and 0.0016, respectively.
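
The three evaluation measures named in this abstract can be illustrated with a minimal sketch. It assumes a binary simplification (1 = high risk), reads MSE as the mean squared error (the abstract expands it as "mean standard error"), and uses made-up labels and scores; none of the function names come from the paper.

```python
def accuracy(y_true, y_pred):
    """Fraction of labels predicted exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_score):
    """Mean of the squared differences between labels and scores."""
    return sum((t - s) ** 2 for t, s in zip(y_true, y_score)) / len(y_true)


def roc_auc(y_true, y_score):
    """AUC as the probability that a random positive outranks a random
    negative; ties count as half a win."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = high risk) and classifier scores.
labels = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
preds = [1 if s >= 0.5 else 0 for s in scores]

print(accuracy(labels, preds))
print(roc_auc(labels, scores))
print(mean_squared_error(labels, scores))
```

The pairwise AUC is quadratic in the sample size, which is fine for a small illustration; a rank-based formulation would be preferable on large datasets.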

Linear interpolation and Machine Learning Methods for Gas Leakage Prediction Base on Multi-source Data Integration (다중소스 데이터 융합 기반의 가스 누출 예측을 위한 선형 보간 및 머신러닝 기법)

  • Dashdondov, Khongorzul;Jo, Kyuri;Kim, Mi-Hye
    • Journal of the Korea Convergence Society / v.13 no.3 / pp.33-41 / 2022
  • In this article, we propose to predict natural gas (NG) leakage levels through feature selection based on a factor analysis (FA) of integrated Korean Meteorological Agency data and natural gas leakage data, in order to account for complex factors. The paper is divided into three modules. First, we fill missing data in the integrated dataset by linear interpolation and select essential features using FA with OrdinalEncoder (OE)-based normalization. Second, the dataset is labeled by K-means clustering. The final module uses four algorithms, K-nearest neighbors (KNN), decision tree (DT), random forest (RF), and Naive Bayes (NB), to predict gas leakage levels. The proposed method is evaluated by accuracy, area under the ROC curve (AUC), and mean standard error (MSE). The test results indicate that the OrdinalEncoder-Factor analysis (OE-F)-based classification method improves performance. Moreover, OE-F-based KNN (OE-F-KNN) showed the best performance, with 95.20% accuracy, an AUC of 96.13%, and an MSE of 0.031.
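
The gap-filling step in the first module can be sketched as follows. This is only an illustration of linear interpolation on an evenly spaced series. The function name, the use of `None` for missing readings, and the choice to copy the nearest known value at the edges are all assumptions, not details from the paper.

```python
def fill_linear(values):
    """Fill None gaps by linear interpolation between the nearest known
    neighbors. Leading and trailing gaps copy the nearest known value
    (an assumption; the source does not say how edges are handled)."""
    known = [i for i, v in enumerate(values) if v is not None]
    if not known:
        raise ValueError("no known values to interpolate from")
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = values[right]
        elif right is None:
            filled[i] = values[left]
        else:
            # Position of the gap as a fraction of the known interval.
            frac = (i - left) / (right - left)
            filled[i] = values[left] + frac * (values[right] - values[left])
    return filled


# Hypothetical sensor series with missing readings.
readings = [None, 2.0, None, None, 5.0, None]
print(fill_linear(readings))
```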

MOISTURE CONTENT MEASUREMENT OF POWDERED FOOD USING RF IMPEDANCE SPECTROSCOPIC METHOD

  • Kim, K. B.;Lee, J. W.;S. H. Noh;Lee, S. S.
    • Proceedings of the Korean Society for Agricultural Machinery Conference / 2000.11b / pp.188-195 / 2000
  • This study was conducted to measure the moisture content of powdered food using an RF impedance spectroscopic method. In the frequency range of 1.0 to 30 MHz, the impedance (reactance and resistance) of a parallel-plate sample holder filled with wheat flour or red-pepper powder, with moisture content ranges of 5.93 to 17.07% w.b. and 10.87 to 27.36% w.b., respectively, was characterized using a Q-meter (HP4342). The reactance was a better parameter than the resistance for estimating the moisture density, defined as the product of moisture content and bulk density, which was used to eliminate the effect of bulk density on the RF spectral data. Multivariate data analyses (principal component regression, partial least squares regression, and multiple linear regression) were performed to develop a single calibration model, with moisture density and reactance spectral data as parameters, for determining the moisture content of both wheat flour and red-pepper powder. The best model was obtained by multiple linear regression. On unknown powdered food data, its bias, standard error of prediction, and coefficient of determination were 0.179% moisture content, 1.679% moisture content, and 0.8849, respectively.
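
The three validation figures reported here can be computed as in the sketch below. It assumes the common chemometric convention in which SEP is the bias-corrected standard deviation of the prediction residuals, and computes the coefficient of determination as 1 - SS_res / SS_tot. The paper may use a different convention (for example, squared correlation), and the data are invented.

```python
import math


def prediction_stats(measured, predicted):
    """Return (bias, SEP, R^2) for a validation set.

    bias: mean residual (predicted - measured)
    SEP:  standard deviation of residuals around the bias, n - 1 divisor
    R^2:  1 - SS_res / SS_tot
    """
    n = len(measured)
    if n < 2 or n != len(predicted):
        raise ValueError("need two or more paired observations")
    residuals = [p - m for m, p in zip(measured, predicted)]
    bias = sum(residuals) / n
    sep = math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1))
    mean_m = sum(measured) / n
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return bias, sep, 1.0 - ss_res / ss_tot


# Hypothetical moisture contents (% w.b.): reference vs. model prediction.
measured = [10.0, 12.0, 14.0, 16.0]
predicted = [10.5, 12.5, 14.5, 16.5]
bias, sep, r2 = prediction_stats(measured, predicted)
print(bias, sep, r2)
```

In this toy case every prediction is offset by the same amount, so the error shows up entirely as bias and the SEP is zero, which is why the two figures are usually reported together.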

Image Processing Methods for Measurement of Lettuce Fresh Weight

  • Jung, Dae-Hyun;Park, Soo Hyun;Han, Xiong Zhe;Kim, Hak-Jin
    • Journal of Biosystems Engineering / v.40 no.1 / pp.89-93 / 2015
  • Purpose: Machine vision-based image processing methods can be useful for estimating the fresh weight of plants. This study analyzes the ability of two image processing methods, morphological analysis and pixel-value analysis, to measure the fresh weight of lettuce grown in a closed hydroponic system. Methods: Polynomial calibration models are developed to relate the number of pixels in images of leaf areas determined by the image processing methods to actual fresh weights of lettuce measured with a digital scale. The study analyzes the ability of the machine vision-based calibration models to predict the fresh weights of lettuce. Results: The coefficients of determination (> 0.93) and standard error of prediction (SEP) values (< 5 g) generated by the two developed models imply that the image processing methods could accurately estimate the fresh weight of each lettuce plant during its growing stage. Conclusions: The results demonstrate that the growing status of a lettuce plant can be estimated using leaf images and regression equations. This shows that a machine vision system installed on a plant growing bed can potentially be used to determine optimal harvest timings for efficient plant growth management.
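
A polynomial calibration of the kind described in the Methods can be sketched with ordinary least squares. The polynomial degree, the scaling of pixel counts to thousands, and all data values are assumptions for illustration; the abstract does not give the models' actual form or coefficients.

```python
def fit_polynomial(x, y, degree):
    """Least-squares coefficients [c0, c1, ...] via the normal equations."""
    m = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)).
    a = [[sum(xi ** (i + j) for xi in x) for j in range(m)] for i in range(m)]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, m):
            factor = a[row][col] / a[col][col]
            for k in range(col, m):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs


def predict(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** i for i, c in enumerate(coeffs))


# Hypothetical calibration data: leaf pixel count (thousands) vs. weight (g).
pixels_k = [1.0, 2.0, 3.0, 4.0, 5.0]
weight_g = [5.5, 10.0, 15.5, 22.0, 29.5]
coeffs = fit_polynomial(pixels_k, weight_g, degree=2)
print(coeffs)
print(predict(coeffs, 3.5))
```

Pixel counts are scaled before fitting because the normal equations become poorly conditioned when raw counts in the tens of thousands are raised to higher powers.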

Prediction of the Volume of Solid Radioactive Wastes to be Generated from Korean Next Generation Reactor

  • Cheong, Jae-Hak;Lee, Kun-Jai;Maeng, Sung-Jun;Song, Myung-Jae;Park, Kyu-Wan
    • Nuclear Engineering and Technology / v.29 no.3 / pp.218-228 / 1997
  • Correlations between the amount of DAW (Dry Active Waste) generated from present Korean PWRs and their operating parameters were analyzed. Multi-variable linear regression yielded a model that predicts the volume of DAW from the number of shutdowns ($f_{FS}$) and the total personnel exposure ($P_{\varepsilon}$). Considering one standard error bound, the model could successfully simulate about 8575 of the real data. To predict the amount of DAW to be generated from a KNGR (Korean Next Generation Reactor), another model was derived by taking into account the additional volume reduction by a supercompaction system. In addition, the volume of WAW (Wet Active Waste) to be generated from a KNGR was calculated by considering conceptual design data and the effect of replacing the radwaste evaporator with selective ion exchangers. Finally, the total volume of SRW (Solid Radioactive Waste) to be generated from a KNGR was predicted by inserting design goal values of $f_{FS}$ and $P_{\varepsilon}$ into the model. The expected amount of SRW to be generated from a KNGR is in the range of 33~44 $\mathrm{m^3\,y^{-1}}$, which meets the KNGR operational target of 50 $\mathrm{m^3\,y^{-1}}$ proposed by KEPCO.

Performance Prediction of a Solar Power System with Stirling Engine (Matching Collector/Receiver with Engine/Generator Systems) (스털링엔진 태양열 발전시스템의 성능예측(집열기.수열기 및 엔진.발전기 시스템의 조화))

  • Bae, Myung-Whan;Chang, Hyung-Sung
    • Proceedings of the KSME Conference / 2001.11b / pp.794-799 / 2001
  • Simulation analyses of a solar power system with a monolithic concentrator and a Stirling engine are carried out to predict the system performance at four test sites. The sites differ in the intensity and distribution of direct solar radiation. Seoul, Pusan, and Cheju in Korea, and Naha in Japan are selected as test sites. To meet the same demand of a 25 kW system output at each site, the collector/receiver must be matched with the engine/generator systems. In such a case, the size of the collector is also sometimes adjusted. In this study, the diameter of the collector is determined from the solar radiation at the design point, which is defined as the sum of the average and the standard deviation $\sigma$ of the distribution of daily maximum direct solar radiation over a year at the respective test site. It is found that the average power output during the system operating time in the case of slope error ${\sigma}_s=2.5$ is within the range of 9 to 13 kW.
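
The design-point definition above (average plus one standard deviation of the daily maximum direct solar radiation) reduces to a short calculation. The sketch assumes a population standard deviation and uses three invented readings in W/m²; the abstract does not state which estimator or units were used.

```python
import statistics


def design_point_radiation(daily_max_radiation):
    """Design-point radiation: mean plus one standard deviation of the
    daily maximum direct solar radiation over the period supplied."""
    mean = statistics.mean(daily_max_radiation)
    # Population standard deviation is an assumption; the source does not
    # specify whether the n or n - 1 divisor was used.
    sigma = statistics.pstdev(daily_max_radiation)
    return mean + sigma


# Hypothetical daily maxima in W/m^2 (a real series would cover a year).
daily_max = [800.0, 900.0, 1000.0]
print(design_point_radiation(daily_max))
```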

Minimum Message Length and Classical Methods for Model Selection in Univariate Polynomial Regression

  • Viswanathan, Murlikrishna;Yang, Young-Kyu;WhangBo, Taeg-Keun
    • ETRI Journal / v.27 no.6 / pp.747-758 / 2005
  • The problem of selection among competing models has been a fundamental issue in statistical data analysis. Good fits to data can be misleading since they can result from properties of the model that have nothing to do with it being a close approximation to the source distribution of interest (for example, overfitting). In this study we focus on the preference among models from a family of polynomial regressors. Three decades of research have spawned a number of plausible techniques for the selection of models, namely, Akaike's Finite Prediction Error (FPE) and Information Criterion (AIC), Schwarz's criterion (SCH), Generalized Cross Validation (GCV), Wallace's Minimum Message Length (MML), Minimum Description Length (MDL), and Vapnik's Structural Risk Minimization (SRM). The fundamental similarity between all these principles is their attempt to define an appropriate balance between the complexity of models and their ability to explain the data. This paper presents an empirical study of the above principles in the context of model selection, where the models under consideration are univariate polynomials. The paper includes a detailed empirical evaluation of the model selection methods on six target functions, with varying sample sizes and added Gaussian noise. The results from the study appear to provide strong evidence in support of the MML- and SRM-based methods over the other standard approaches (FPE, AIC, SCH and GCV).
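
Two of the classical criteria listed above can be illustrated on polynomial order selection. The sketch uses the usual Gaussian-error forms, AIC = n ln(RSS/n) + 2k (up to an additive constant) and FPE = (RSS/n)(n + k)/(n - k), with k the number of fitted coefficients. The residual sums of squares are invented, and the paper's exact formulations may differ.

```python
import math


def aic(rss, n, k):
    """Akaike's Information Criterion for Gaussian errors, constant dropped."""
    return n * math.log(rss / n) + 2 * k


def fpe(rss, n, k):
    """Akaike's prediction error criterion (FPE): penalized residual variance."""
    return (rss / n) * (n + k) / (n - k)


def select_order(rss_by_order, n, criterion):
    """Return (best_order, scores); a polynomial of order d has d + 1 coefficients."""
    scores = {
        order: criterion(rss, n, order + 1)
        for order, rss in rss_by_order.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical residual sums of squares for fits of order 1 to 4 on n = 50 points.
rss_by_order = {1: 40.0, 2: 12.0, 3: 11.8, 4: 11.7}
best_aic, _ = select_order(rss_by_order, 50, aic)
best_fpe, _ = select_order(rss_by_order, 50, fpe)
print(best_aic, best_fpe)
```

Both criteria pick order 2 here: the small drop in RSS from adding a third- or fourth-order term does not offset the complexity penalty.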

Selecting the Best Prediction Model for Readmission

  • Lee, Eun-Whan
    • Journal of Preventive Medicine and Public Health / v.45 no.4 / pp.259-266 / 2012
  • Objectives: This study aims to determine the risk factors predicting rehospitalization by comparing three models and selecting the most successful model. Methods: In order to predict the risk of rehospitalization within 28 days after discharge, 11 951 inpatients were recruited into this study between January and December 2009. Predictive models were constructed with three methods, logistic regression analysis, a decision tree, and a neural network, and the models were compared and evaluated in light of their misclassification rate, root asymptotic standard error, lift chart, and receiver operating characteristic curve. Results: The decision tree was selected as the final model. The risk of rehospitalization was higher when the length of stay (LOS) was less than 2 days, route of admission was through the out-patient department (OPD), medical department was in internal medicine, 10th revision of the International Classification of Diseases code was neoplasm, LOS was relatively shorter, and the frequency of OPD visit was greater. Conclusions: When a patient is to be discharged within 2 days, the appropriateness of discharge should be considered, with special concern of undiscovered complications and co-morbidities. In particular, if the patient is admitted through the OPD, any suspected disease should be appropriately examined and prompt outcomes of tests should be secured. Moreover, for patients of internal medicine practitioners, co-morbidity and complications caused by chronic illness should be given greater attention.

Determination of Human Skin Moisture in the Near-Infrared Region from 1100 to 2200 nm by Portable NIR System (1100∼2200 nm 파장 영역의 휴대용 근적외선 분광분석기를 이용한 사람피부의 수분측정)

  • 안지원;서은정;우영아;김효진
    • YAKHAK HOEJI / v.47 no.3 / pp.148-153 / 2003
  • Skin moisture is an important factor in skin health. Measurement of moisture content can provide diagnostic information on the condition of skin. In this study, a portable near-infrared (NIR) system was integrated with a photodiode array detector that has no moving parts, and the system was successfully applied to the evaluation of human skin moisture. Diffuse reflectance spectra were collected and transformed to absorbance using a 1 nm step size over the wavelength range of 1100 nm to 2200 nm. Partial least squares regression (PLSR) was applied to develop a calibration model. For practical evaluation of human skin moisture, the PLS model was developed in vivo using the portable NIR system, with the relative water content of the stratum corneum from the conventional capacitance method as the reference. The PLS model showed a good correlation and predicted human skin moisture with a standard error of prediction (SEP) of 3.5 in the 1120∼1730 nm range. This study showed the possibility of skin moisture measurement using a portable NIR system.

Limiting conditions prediction using machine learning for loss of condenser vacuum event

  • Dong-Hun Shin;Moon-Ghu Park;Hae-Yong Jeong;Jae-Yong Lee;Jung-Uk Sohn;Do-Yeon Kim
    • Nuclear Engineering and Technology / v.55 no.12 / pp.4607-4616 / 2023
  • We implement machine learning regression models to predict the peak pressures of the primary and secondary systems, a major safety concern in a Loss Of Condenser Vacuum (LOCV) accident. We selected the Multi-dimensional Analysis of Reactor Safety-KINS standard (MARS-KS) code to analyze the LOCV accident, and the reference plant is the Korean Optimized Power Reactor 1000MWe (OPR1000). eXtreme Gradient Boosting (XGBoost) is selected as the machine learning tool. The MARS-KS code is used to generate LOCV accident data, and the data are used to train the machine learning model. Hyperparameter optimization is performed using simulated annealing. Randomly generated combinations of initial conditions within the operating range are fed into the XGBoost model to predict the peak pressure. The initial conditions that yield the peak pressure are then run through MARS-KS to generate the corresponding code results. The error between the predicted value and the code output is then calculated. The uncertainty of the machine learning model is also calculated to verify the model accuracy. The machine learning model presented in this paper successfully identifies a combination of initial conditions that produces a more conservative peak pressure than the values calculated with existing methodologies.
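
Simulated annealing for hyperparameter search, as mentioned in this abstract, can be sketched generically. Everything in the example is an assumption for illustration: the toy objective stands in for a validation error, the two parameters are treated as continuous, and no XGBoost call is made. The paper's actual search space, cooling schedule, and objective are not given in the abstract.

```python
import math
import random


def simulated_annealing(objective, start, step, n_iter=500, t0=1.0,
                        cooling=0.99, seed=0):
    """Minimize `objective` over a list of continuous parameters.

    Worse candidates are accepted with probability exp(-delta / T), and
    the temperature T decays geometrically each iteration.
    """
    rng = random.Random(seed)
    current = list(start)
    current_score = objective(current)
    best, best_score = list(current), current_score
    temp = t0
    for _ in range(n_iter):
        # Propose a neighbor by perturbing each parameter within its step.
        candidate = [v + rng.uniform(-s, s) for v, s in zip(current, step)]
        score = objective(candidate)
        delta = score - current_score
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_score = candidate, score
            if current_score < best_score:
                best, best_score = list(current), current_score
        temp *= cooling
    return best, best_score


def toy_validation_error(params):
    """Hypothetical stand-in for a model's validation error."""
    learning_rate, depth = params
    return (learning_rate - 0.1) ** 2 + ((depth - 6.0) ** 2) / 100.0


start = [0.5, 3.0]
best, best_score = simulated_annealing(
    toy_validation_error, start, step=[0.05, 0.5]
)
print(best, best_score)
```

In a real setting the objective would train and validate the regression model for each candidate, so the number of iterations is bounded by training cost rather than by the annealing schedule.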