• Title/Summary/Keyword: Data-driven models

Search Result 257

Research on Forecasting Framework for System Marginal Price based on Deep Recurrent Neural Networks and Statistical Analysis Models

  • Kim, Taehyun;Lee, Yoonjae;Hwangbo, Soonho
    • Clean Technology / v.28 no.2 / pp.138-146 / 2022
  • Electricity has become a factor that dramatically affects the market economy. The day-ahead system marginal price determines electricity prices, and system marginal price forecasting is critical in maintaining energy management systems. There have been several studies using mathematics and machine learning models to forecast the system marginal price, but few studies have been conducted to develop, compare, and analyze various machine learning and deep learning models based on a data-driven framework. Therefore, in this study, different machine learning algorithms (i.e., autoregressive-based models such as the autoregressive integrated moving average model) and deep learning networks (i.e., recurrent neural network-based models such as the long short-term memory and gated recurrent unit model) are considered and integrated evaluation metrics including a forecasting test and information criteria are proposed to discern the optimal forecasting model. A case study of South Korea using long-term time-series system marginal price data from 2016 to 2021 was applied to the developed framework. The results of the study indicate that the autoregressive integrated moving average model (R-squared score: 0.97) and the gated recurrent unit model (R-squared score: 0.94) are appropriate for system marginal price forecasting. This study is expected to contribute significantly to energy management systems and the suggested framework can be explicitly applied for renewable energy networks.

Evaluation of Water Quality Prediction Models at Intake Station by Data Mining Techniques (데이터마이닝 기법을 적용한 취수원 수질예측모형 평가)

  • Kim, Ju-Hwan;Chae, Soo-Kwon;Kim, Byung-Sik
    • Journal of Environmental Impact Assessment / v.20 no.5 / pp.705-716 / 2011
  • For the efficient discovery of knowledge and information from observed systems, data mining techniques can be a useful tool for predicting water quality at river intake stations. Water quality at an intake station can deteriorate in the dry season because of insufficient flow. This calls for additional outflow from the dam, since some of the deterioration can be attenuated by operating the dam reservoir to control outflow in light of the predicted water quality. A seasonal occurrence of high ammonia nitrogen ($NH_3$-N) concentrations has hampered the chemical treatment processes of a water plant on the Geum River. Monthly flow allocation from the upstream dam is important for downstream $NH_3$-N control. In this study, water quality prediction models based on multiple regression (MR), artificial neural networks, and data mining methods were developed to understand water quality variation and to support dam operations by providing predicted $NH_3$-N concentrations at the intake station. The models were calibrated with eight years of monthly data and verified with another two years of independent data. In these models, the $NH_3$-N concentration at the next time step depends on dam outflow and on river water quality variables (alkalinity, temperature, and $NH_3$-N) of the previous time step. Model performance is compared and evaluated by error analysis and by statistical characteristics such as the correlation and determination coefficients between observed and predicted water quality. These data mining techniques are expected to provide more efficient data-driven tools at the modelling stage, and the models were found to predict water quality in river systems well.

Comparison of Different Multiple Linear Regression Models for Real-time Flood Stage Forecasting (실시간 수위 예측을 위한 다중선형회귀 모형의 비교)

  • Choi, Seung Yong;Han, Kun Yeun;Kim, Byung Hyun
    • KSCE Journal of Civil and Environmental Engineering Research / v.32 no.1B / pp.9-20 / 2012
  • Recently, to overcome the limitations of conceptual, hydrological, and physics-based models for flood stage forecasting, multiple linear regression models, one type of data-driven model, have been widely adopted for forecasting flood streamflow (stage). The objectives of this study are to compare the performance of different multiple linear regression models according to the regression coefficient estimation method and to determine the most effective multiple linear regression model for flood stage forecasting. To do this, the time scale was determined through autocorrelation analysis of the input data, and flood stage forecasting models developed using different regression coefficient estimation methods, namely LS (least squares), WLS (weighted least squares), and SPW (stepwise), were applied to flood events in the Jungrang stream. To evaluate the performance of the established models, four statistical indices were used: root mean square error (RMSE), Nash-Sutcliffe efficiency coefficient (NSEC), mean absolute error (MAE), and adjusted coefficient of determination ($R^{*2}$). The results show that the model using SPW parameter estimation predicts river flood stage better than the others, and that the model using LS parameter estimation is slightly better than the model using WLS parameter estimation.
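
The four statistical indices named in this abstract have standard textbook definitions, which can be sketched as follows. This is a minimal illustration and not the authors' code: the function names and the sample stage values are hypothetical, and the coefficient of determination is assumed to be computed as 1 − SSE/SST (the same expression as the Nash-Sutcliffe coefficient) before the usual adjustment for the number of predictors.

```python
import math

def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def mae(obs, pred):
    """Mean absolute error between observed and predicted values."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)

def nash_sutcliffe(obs, pred):
    """Nash-Sutcliffe efficiency: 1 - SSE / (variation of obs about its mean)."""
    mean_obs = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    sst = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - sse / sst

def adjusted_r2(obs, pred, n_predictors):
    """Adjusted R^2, assuming R^2 = 1 - SSE/SST (an assumption of this sketch)."""
    n = len(obs)
    r2 = nash_sutcliffe(obs, pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)

# Hypothetical observed and forecast flood stages (m), for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
forecast = [1.1, 1.9, 3.2, 3.8]

print(rmse(observed, forecast))
print(mae(observed, forecast))
print(nash_sutcliffe(observed, forecast))
print(adjusted_r2(observed, forecast, n_predictors=1))
```

Lower RMSE and MAE and higher efficiency and adjusted $R^{*2}$ indicate a better forecast, which is how indices of this kind are typically used to rank competing estimation methods.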

Analysis of Impact of Hydrologic Data on Neuro-Fuzzy Technique Result (수문자료가 Neuro-Fuzzy 기법 결과에 미치는 영향 분석)

  • Ji, Jungwon;Choi, Changwon;Yi, Jaeeung
    • KSCE Journal of Civil and Environmental Engineering Research / v.33 no.4 / pp.1413-1424 / 2013
  • Recently, the frequency of severe storms has increased in Korea. Severe storms that occur over a short time cause huge losses of both life and property. Considerable research has been performed on developing flood control systems based on accurate stream discharge prediction. Physical models are mainly used for flood forecasting and warning. The physical rainfall-runoff models used in the conventional flood forecasting process require extensive information and data, and include uncertainties that can accumulate errors during the modelling process. ANFIS, a data-driven model combining neural network and fuzzy techniques, can reduce the amount of physical data required to construct a conventional physical model and makes it easy to construct and evaluate a flood forecasting model using only rainfall and water level data. A data-driven model, however, has the disadvantage that it does not provide the mathematical and physical correlations between the input and output data of the model. In this study, the characteristics of a data-driven model were analyzed with respect to functional options and input data, such as changes in the clustering radius and the training data length used in the ANFIS model. In addition, the applicability of ANFIS was evaluated through comparison with the results of HEC-HMS, which is widely used as a rainfall-runoff model in Korea. The neuro-fuzzy technique was applied to the Cheongmicheon Basin in the South Han River using observed precipitation and stream level data from 2007 to 2011.

Development and Evaluation of Electronic Health Record Data-Driven Predictive Models for Pressure Ulcers (전자건강기록 데이터 기반 욕창 발생 예측모델의 개발 및 평가)

  • Park, Seul Ki;Park, Hyeoun-Ae;Hwang, Hee
    • Journal of Korean Academy of Nursing / v.49 no.5 / pp.575-585 / 2019
  • Purpose: The purpose of this study was to develop predictive models for pressure ulcer incidence using electronic health record (EHR) data and to compare their predictive validity performance indicators with that of the Braden Scale used in the study hospital. Methods: A retrospective case-control study was conducted in a tertiary teaching hospital in Korea. Data of 202 pressure ulcer patients and 14,705 non-pressure ulcer patients admitted between January 2015 and May 2016 were extracted from the EHRs. Three predictive models for pressure ulcer incidence were developed using logistic regression, Cox proportional hazards regression, and decision tree modeling. The predictive validity performance indicators of the three models were compared with those of the Braden Scale. Results: The logistic regression model was most efficient with a high area under the receiver operating characteristics curve (AUC) estimate of 0.97, followed by the decision tree model (AUC 0.95), Cox proportional hazards regression model (AUC 0.95), and the Braden Scale (AUC 0.82). Decreased mobility was the most significant factor in the logistic regression and Cox proportional hazards models, and the endotracheal tube was the most important factor in the decision tree model. Conclusion: Predictive validity performance indicators of the Braden Scale were lower than those of the logistic regression, Cox proportional hazards regression, and decision tree models. The models developed in this study can be used to develop a clinical decision support system that automatically assesses risk for pressure ulcers to aid nurses.
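
The AUC values compared in this abstract can be read as the probability that a randomly chosen patient who developed a pressure ulcer receives a higher risk score than a randomly chosen patient who did not. The sketch below computes AUC directly from that definition. It is an illustration only: the function name and the scores and labels are hypothetical, not data from the study.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    case has the higher score; tied scores count as half a win.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical risk scores (higher = more at risk) and outcomes (1 = ulcer).
risk_scores = [0.9, 0.8, 0.4, 0.3]
outcomes = [1, 0, 1, 0]

print(auc(risk_scores, outcomes))  # 3 of the 4 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of cases, which is acceptable for a sketch; for a cohort the size of the one described here, a rank-based implementation would be the more practical choice.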

Consistency check algorithm for validation and re-diagnosis to improve the accuracy of abnormality diagnosis in nuclear power plants

  • Kim, Geunhee;Kim, Jae Min;Shin, Ji Hyeon;Lee, Seung Jun
    • Nuclear Engineering and Technology / v.54 no.10 / pp.3620-3630 / 2022
  • The diagnosis of abnormalities in a nuclear power plant is essential to maintain power plant safety. When an abnormal event occurs, the operator diagnoses the event and selects the appropriate abnormal operating procedures (AOPs) and sub-procedures to implement the necessary measures. To support this, abnormality diagnosis systems using data-driven methods such as artificial neural networks and convolutional neural networks have been developed. However, data-driven models cannot always guarantee an accurate diagnosis because they cannot simulate all possible abnormal events. Therefore, abnormality diagnosis systems should be able to detect their own potential misdiagnoses. This paper proposes a rule-based diagnostic validation algorithm using a previously developed two-stage diagnosis model for abnormal situations. We analyzed the diagnostic results of the sub-procedure stage when the first diagnostic results were inaccurate and derived a rule to filter out inconsistent sub-procedure diagnostic results, which may be inaccurate diagnoses. In a case study, two abnormality diagnosis models were built using gated recurrent units and long short-term memory cells, and consistency checks on the diagnostic results from both models were performed to detect any inconsistencies. Based on this, a re-diagnosis was performed to select the label with the second-best value in the first diagnosis, after which the diagnosis accuracy increased. That is, the model proposed in this study made it possible to detect diagnostic failures through the developed consistency check of the sub-procedure diagnostic results. The consistency check process has the advantage that the operator can review the results and increase the diagnosis success rate by performing additional re-diagnoses. The developed model is expected to have increased applicability as an operator support system for selecting the appropriate AOPs and sub-procedures with re-diagnosis, thereby further increasing abnormal event diagnostic accuracy.
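
The check-then-re-diagnose flow described in this abstract can be sketched roughly as follows. The abstract does not state the actual filtering rule, so this sketch assumes a simple one: the sub-procedure results are treated as consistent only when all sub-procedure models agree. The function name, the procedure labels, and the stand-in models are all hypothetical.

```python
def diagnose_with_recheck(first_stage_probs, sub_models):
    """Return (procedure, sub_procedure, rediagnosed).

    first_stage_probs: dict mapping procedure label -> first-stage score.
    sub_models: callables mapping a procedure label to a sub-procedure label.
    Assumed rule (not from the paper): results are consistent only if every
    sub-procedure model returns the same label.
    """
    # Rank procedure labels from highest to lowest first-stage score.
    ranked = sorted(first_stage_probs, key=first_stage_probs.get, reverse=True)
    # Try the best label first, then fall back once to the second-best label.
    for attempt, procedure in enumerate(ranked[:2]):
        sub_results = {model(procedure) for model in sub_models}
        if len(sub_results) == 1:
            return procedure, sub_results.pop(), attempt == 1
    # Neither candidate passed the check: leave the decision to the operator.
    return None, None, True

# Hypothetical stand-ins for two trained sub-procedure diagnosis models.
model_1 = {"AOP-A": "A-1", "AOP-B": "B-2", "AOP-C": "C-1"}.get
model_2 = {"AOP-A": "A-3", "AOP-B": "B-2", "AOP-C": "C-1"}.get

first_stage = {"AOP-A": 0.55, "AOP-B": 0.40, "AOP-C": 0.05}
result = diagnose_with_recheck(first_stage, [model_1, model_2])
print(result)  # the models disagree under AOP-A, so AOP-B is selected on re-diagnosis
```

Returning the `rediagnosed` flag alongside the labels keeps the operator informed that the first diagnosis was rejected, in line with the abstract's point that the operator can review the results.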

Data-Driven Modelling of Damage Prediction of Granite Using Acoustic Emission Parameters in Nuclear Waste Repository

  • Lee, Hang-Lo;Kim, Jin-Seop;Hong, Chang-Ho;Jeong, Ho-Young;Cho, Dong-Keun
    • Journal of Nuclear Fuel Cycle and Waste Technology (JNFCWT) / v.19 no.1 / pp.75-85 / 2021
  • Evaluating the quantitative damage to rocks through acoustic emission (AE) has become a research focus. Most studies have used only one or two AE parameters to evaluate the degree of damage, and several AE parameters have rarely been used together. In this study, several data-driven models were employed to reflect the combined features of AE parameters. Through uniaxial compression tests, we obtained mechanical and AE-signal data for five granite specimens. The maximum amplitude, hits, counts, rise time, absolute energy, and initiation frequency, each expressed as a cumulative value, were selected as input parameters. The results showed that gradient boosting (GB) was the best model among the support vector regression methods. When GB was applied to the testing data, the R and root-mean-square error between the predicted and actual values were 0.96 and 0.077, respectively. A parameter analysis was performed to capture the parameter significance. The results showed that cumulative absolute energy was the main parameter for damage prediction. Thus, AE has practical applicability in predicting rock damage without conducting mechanical tests. Based on these results, this study will be useful for monitoring the near-field rock mass of a nuclear waste repository.

Determination of natural periods of vibration using genetic programming

  • Joshi, Shardul G.;Londhe, Shreenivas N.;Kwatra, Naveen
    • Earthquakes and Structures / v.6 no.2 / pp.201-216 / 2014
  • Many building codes use an empirical equation to determine the fundamental period of vibration, in which the effects of the length, width, and stiffness of the building are not explicitly accounted for. In addition, beyond a certain building height the equation estimates the fundamental period of vibration with a large safety margin. An attempt is made to arrive at simple empirical equations for the fundamental period of vibration with an adequate safety margin, using the soft computing technique of Genetic Programming (GP). In the present study, GP models are developed in four categories, varying the number of input parameters in each category. Input parameters are chosen to represent the mass, stiffness, and geometry of the buildings directly or indirectly. A total of 206 buildings are analyzed, of which a data set of 142 buildings is used to develop these models. It is observed that the GP models developed under categories B and C yield the same equation for the fundamental period of vibration along the X direction as along the Y direction, whereas for category D the equations along the X and Y directions are of the same form. The equations obtained as the output of the GP models clearly indicate the influence of the mass, geometry, and stiffness of the building on the fundamental period of vibration. These equations are then compared with the equations recommended by other researchers.

A basic study 3D model advancement method for nuclear power plant (원자력 발전설비의 3D 모델 상세화 방안에 대한 기초 연구)

  • Lim, Byung-Ki
    • Proceedings of the Korean Institute of Building Construction Conference / 2018.05a / pp.37-38 / 2018
  • BIM (Building Information Modeling) in architecture, VDC (Virtual Design and Construction) as defined by CIFE (Center for Integrated Facility Engineering) at Stanford University in the USA, and the data-driven design definition issued in IAEA TECDOC-1284 all involve data-level design generated with 3D CAD technology, the integration and management of related information based on the 3D model, and the effective use of 3D models throughout the nuclear power plant life cycle. In the domestic nuclear power industry, 3D models are used for interference review between design disciplines and in 4D systems that link the 3D construction model with schedule activities, but the 3D model generated in the design phase is not effectively utilized during construction, operation, and decommissioning. Therefore, this study aims to suggest a method for advancing the LOD (Level of Detail) of 3D models through an analysis of existing literature, 2D drawings, and 3D models throughout the nuclear power plant life cycle.


Effective Acoustic Model Clustering via Decision Tree with Supervised Decision Tree Learning

  • Park, Jun-Ho;Ko, Han-Seok
    • Speech Sciences / v.10 no.1 / pp.71-84 / 2003
  • In acoustic modeling for large vocabulary speech recognition, the sparse data problem caused by a huge number of context-dependent (CD) models usually makes the estimated models unreliable. In this paper, we develop a new clustering method based on the C4.5 decision-tree learning algorithm that effectively encapsulates the CD modeling. The proposed scheme essentially constructs a supervised decision rule and applies it over the pre-clustered triphones using the C4.5 algorithm, which is known to search effectively through the attributes of the training instances and extract the attribute that best separates the given examples. In particular, the data-driven method is used as the clustering algorithm, while its result is used as the learning target of the C4.5 algorithm. This scheme has been shown to be effective in terms of recognition performance, particularly on databases with a low unknown-context ratio. For a speaker-independent, task-independent continuous speech recognition task, the proposed method reduced the word error rate (WER) by 3.93% compared to the existing rule-based methods.
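
The step of extracting "the attribute that best separates the given examples" can be illustrated with the gain ratio criterion that C4.5 uses to choose a split. The sketch below is a generic illustration, not the authors' system: the triphone context attributes, their values, and the cluster labels (standing in for the output of the data-driven clustering that serves as the learning target) are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in Counter(values).values())

def gain_ratio(rows, labels, attr):
    """Information gain of splitting on `attr`, normalized by split information."""
    total = len(rows)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Expected label entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    split_info = entropy([row[attr] for row in rows])
    return gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attrs):
    """Pick the attribute with the highest gain ratio."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))

# Hypothetical triphone contexts; labels are cluster ids from a prior
# data-driven clustering step, used here as the supervised target.
triphones = [
    {"left": "vowel", "right": "nasal"},
    {"left": "vowel", "right": "stop"},
    {"left": "stop", "right": "nasal"},
    {"left": "stop", "right": "stop"},
]
cluster_ids = ["c1", "c1", "c2", "c2"]

print(best_attribute(triphones, cluster_ids, ["left", "right"]))  # left
```

In this toy data the left context alone determines the cluster, so it has a gain ratio of 1 while the right context has 0. A full tree learner would apply the same selection recursively to each resulting subset.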
