• Title/Summary/Keyword: data-based model


Robust transformer-based anomaly detection for nuclear power data using maximum correntropy criterion

  • Shuang Yi;Sheng Zheng;Senquan Yang;Guangrong Zhou;Junjie He
    • Nuclear Engineering and Technology / v.56 no.4 / pp.1284-1295 / 2024
  • Due to increasing operational security demands, digital and intelligent condition monitoring of nuclear power plants is becoming more significant. However, establishing an accurate and effective anomaly detection model remains challenging, mainly because nuclear power data lack clear class labels and are frequently contaminated by outliers and anomalies. In this paper, we introduce a Transformer-based unsupervised model for anomaly detection of nuclear power data. A modified loss function based on the maximum correntropy criterion (MCC) is applied in model training to improve robustness. Experimental results on simulation datasets demonstrate that the proposed Trans-MCC model achieves detection performance equivalent or superior to that of the baseline models, and that the MCC loss function clearly alleviates the negative effect of outliers and anomalies in the training procedure: the F1 score improves by up to 0.31 compared with Trans-MSE on a specific dataset. Further studies on genuine nuclear power data verify the model's capability to detect anomalies at an earlier stage, which is significant for condition monitoring.
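
The robustness argument in this abstract can be illustrated with a minimal sketch. It assumes the standard Gaussian-kernel form of a correntropy loss; the paper's modified loss is not given in the abstract, and the names `mse_loss`, `mcc_loss`, and the kernel width `sigma` are hypothetical choices for illustration only.

```python
import math

def mse_loss(errors):
    """Mean squared error: each residual contributes its square, unbounded."""
    return sum(e * e for e in errors) / len(errors)

def mcc_loss(errors, sigma=1.0):
    """Correntropy-style loss: 1 minus the mean Gaussian kernel of the residuals.

    Each residual contributes at most 1, so a single outlier cannot
    dominate the total. sigma is an assumed kernel width.
    """
    return sum(
        1.0 - math.exp(-(e * e) / (2.0 * sigma ** 2)) for e in errors
    ) / len(errors)

# Hypothetical reconstruction residuals, with and without one gross outlier.
clean = [0.1, -0.2, 0.05, 0.15]
with_outlier = clean + [50.0]

print("MSE:", mse_loss(clean), "->", mse_loss(with_outlier))
print("MCC:", mcc_loss(clean), "->", mcc_loss(with_outlier))
```

Under these assumptions the outlier multiplies the MSE by several orders of magnitude, while the correntropy-style loss stays below 1 because every term is bounded.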

Sample size calculations for clustered count data based on zero-inflated discrete Weibull regression models

  • Hanna Yoo
    • Communications for Statistical Applications and Methods / v.31 no.1 / pp.55-64 / 2024
  • In this study, we consider the sample size determination problem for clustered count data with many zeros. In general, zero-inflated Poisson and binomial models are commonly used for zero-inflated data; however, in real data the assumptions required by each model might be violated. We calculate the required sample size based on a discrete Weibull regression model that can handle both underdispersed and overdispersed data. We use Monte Carlo simulation to compute the required sample size. With our proposed method, a unified model with a low failure risk can be used to cope with either dispersion type and to handle data with many zeros that appear in groups or clusters sharing a common source of variation. A simulation study shows that our proposed method provides accurate results and reveals that the sample size is affected by the distribution skewness, the covariance structure of the covariates, and the amount of zeros. We apply our method to pancreas disorder length-of-stay data collected in Western Australia.
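
As a rough illustration of the distribution named here, the sketch below uses the type I discrete Weibull form, P(X = x) = q^(x^β) − q^((x+1)^β), with an extra point mass at zero. The regression structure, the clustering, and the sample size procedure from the paper are not reproduced; the function names and the parameter values (`q`, `beta`, `pi_zero`) are assumptions for illustration.

```python
import math
import random

def zidw_pmf(x, q, beta, pi_zero):
    """PMF of a zero-inflated type I discrete Weibull at x = 0, 1, 2, ..."""
    base = q ** (x ** beta) - q ** ((x + 1) ** beta)
    if x == 0:
        return pi_zero + (1.0 - pi_zero) * base
    return (1.0 - pi_zero) * base

def zidw_sample(rng, q, beta, pi_zero):
    """Draw one value: a structural zero with probability pi_zero,
    otherwise a discrete Weibull draw by inverting P(X >= x) = q**(x**beta)."""
    if rng.random() < pi_zero:
        return 0
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    t = (math.log(u) / math.log(q)) ** (1.0 / beta)
    return max(0, math.ceil(t) - 1)

# Monte Carlo check of the zero proportion under assumed parameters.
rng = random.Random(0)
draws = [zidw_sample(rng, q=0.7, beta=1.2, pi_zero=0.3) for _ in range(5000)]
zero_fraction = sum(1 for d in draws if d == 0) / len(draws)
print("simulated zero fraction:", zero_fraction)
print("theoretical zero probability:", zidw_pmf(0, 0.7, 1.2, 0.3))
```

A simulation-based sample size search would repeat draws like these for candidate sample sizes and estimate power at each one; that outer loop is omitted here.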

Optimal Road Maintenance Section Selection Using Mixed Integer Programming (혼합정수계획법을 활용한 도로포장 보수구간 선정 최적화 연구)

  • Cho, Geonyoung;Lim, Heejong
    • International Journal of Highway Engineering / v.19 no.3 / pp.65-70 / 2017
  • PURPOSES: A Pavement Management System contains data that describe the condition of the road. Under a limited budget, these data can be used to plan maintenance efficiently. The objective of this research is to develop a mixed integer programming model that maximizes remaining durable years (or Lane-Kilometer-Years) in road maintenance planning. METHODS: An optimization model based on mixed integer programming is developed. The model selects a cluster of adjacent sectors according to the road condition. The model also considers constraints required by the Seoul Metropolitan Facilities Management Corporation. These constraints allow at most two lanes to be selected so that traffic is not blocked, and limit the number of sectors in a single construction job so that the work finishes in the given time. We incorporate variable cost constraints: as the model selects more sectors, the unit cost of construction becomes smaller. The optimal choice of the number of sectors is implemented using piecewise linear constraints. RESULTS: Data (SPI) collected from the Pavement Management System managed by Seoul Metropolitan City are fed into the model. Based on the data and the model, optimal maintenance plans are established. Some of the optimal plans cannot be generated directly by the existing heuristic approach or by human intuition. CONCLUSIONS: The mathematical model using actual data generates optimal maintenance plans.
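
The selection problem described in METHODS can be sketched on a toy instance. The sketch below is a brute-force enumeration, not the paper's mixed integer program, and it ignores the lane constraint. The tiered `unit_cost` function, the gains, the lengths, and the budget are all hypothetical values chosen for illustration.

```python
def unit_cost(n_sectors):
    """Hypothetical volume discount: cost per km falls as more
    adjacent sectors are repaired in one construction job."""
    if n_sectors <= 2:
        return 10.0
    if n_sectors <= 4:
        return 8.0
    return 7.0

def best_contiguous_plan(gains, lengths, budget, max_sectors):
    """Enumerate every run of adjacent sectors and keep the run with the
    largest total gain whose cost fits the budget.

    Returns (total_gain, (start, end)) with end exclusive, or (0.0, None)
    if no run is affordable.
    """
    best = (0.0, None)
    n = len(gains)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_sectors) + 1):
            k = end - start
            cost = unit_cost(k) * sum(lengths[start:end])
            gain = sum(gains[start:end])
            if cost <= budget and gain > best[0]:
                best = (gain, (start, end))
    return best

gains = [3.0, 5.0, 4.0, 1.0, 6.0]    # assumed lane-km-years gained per sector
lengths = [1.0, 1.0, 1.0, 1.0, 1.0]  # assumed sector lengths in km
plan = best_contiguous_plan(gains, lengths, budget=25.0, max_sectors=4)
print(plan)  # (12.0, (0, 3))
```

With these numbers, three sectors cost 24.0 at the discounted rate while two sectors cost 20.0, so the discount is what makes the three-sector plan affordable. In a mixed integer program the same step function would be encoded with piecewise linear constraints rather than enumerated.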

Software Quality Classification using Bayesian Classifier (베이지안 분류기를 이용한 소프트웨어 품질 분류)

  • Hong, Euy-Seok
    • Journal of Information Technology Services / v.11 no.1 / pp.211-221 / 2012
  • Many metric-based classification models have been proposed to predict the fault-proneness of software modules. This paper presents two prediction models using the Bayesian classifier, which is one of the most popular modern classification algorithms. A Bayesian model based on Bayesian probability theory can be a promising technique for software quality prediction, owing to its ability to represent uncertainty using probabilities and to partly incorporate expert knowledge into training data. The two models, Naïve Bayes (NB) and Bayesian Belief Network (BBN), are constructed, and dimensionality reduction of the training data and test data is performed before model evaluation. The prediction accuracy of the models is evaluated using two prediction error measures, Type I error and Type II error, and compared with well-known prediction models: a backpropagation neural network model and a support vector machine model. The results show that the prediction performance of the BBN model is slightly better than that of NB. For the data set with ambiguity, the BBN model's prediction accuracy is not as good as that of the compared models, but for the data set without ambiguity it achieves better performance than the compared models.
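
The two error measures can be made concrete with a short sketch. It assumes the common convention in fault-proneness studies that a Type I error is a not-fault-prone module flagged as fault-prone and a Type II error is a fault-prone module that is missed. The denominators (per-class rates) are also an assumption, since the abstract does not state them, and the labels below are made up.

```python
def error_rates(actual, predicted):
    """Return (type_i_rate, type_ii_rate) for binary labels,
    where 1 means fault-prone and 0 means not fault-prone.

    Type I rate: share of not-fault-prone modules predicted fault-prone.
    Type II rate: share of fault-prone modules predicted not fault-prone.
    """
    false_pos = sum(1 for a, p in zip(actual, predicted) if not a and p)
    false_neg = sum(1 for a, p in zip(actual, predicted) if a and not p)
    n_neg = sum(1 for a in actual if not a)
    n_pos = len(actual) - n_neg
    return false_pos / n_neg, false_neg / n_pos

# Hypothetical module labels and classifier outputs.
actual = [1, 1, 0, 0, 0, 1, 0, 0]
predicted = [1, 0, 0, 1, 0, 1, 0, 0]
type_i, type_ii = error_rates(actual, predicted)
print(type_i, type_ii)
```

Reporting the two rates separately matters because a missed fault-prone module and a false alarm usually carry different costs.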

Auto Regulated Data Provisioning Scheme with Adaptive Buffer Resilience Control on Federated Clouds

  • Kim, Byungsang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.11 / pp.5271-5289 / 2016
  • On large-scale data analysis platforms deployed on cloud infrastructures over the Internet, the instability of the data transfer time and the dynamics of the processing rate require a more sophisticated data distribution scheme that maximizes parallel efficiency by balancing the load among the participating computing elements and eliminating the idle time of each computing element. In particular, under real-time constraints and a limited data buffer (in-memory storage), a more controllable mechanism is needed to prevent both overflow and underflow of the finite buffer. In this paper, we propose an auto-regulated data provisioning model based on a receiver-driven data pull model. On this model, we provide a synchronized data replenishment mechanism that implicitly avoids data buffer overflow and explicitly regulates data buffer underflow by adequately adjusting the buffer resilience. To estimate the optimal size of the buffer resilience, we exploit an adaptive buffer resilience control scheme that minimizes both the data buffer space and the idle time of the processing elements, based on directly measured sample path analysis. The simulation results show that the proposed scheme provides an acceptable approximation to the numerical results. It is also suitably efficient for dynamic environments in which the stochastic characteristics of the data transfer time and the data processing rate cannot be postulated, or in which both fluctuate.

Variable Density Yield Model for Irrigated Plantations of Dalbergia sissoo Grown Under Hot Arid Conditions in India

  • Tewari, Vindhya Prasad
    • Journal of Forest and Environmental Science / v.28 no.4 / pp.205-211 / 2012
  • Yield tables are a frequently used database for regional timber resource forecasting. A normal yield table is based on two independent variables, age and site (species constant), and applies to fully stocked (or normal) stands, while empirical yield tables are based on average rather than fully stocked stands. The limitations of normal and empirical yield tables led to the development of variable density yield tables. Mathematical models for estimating timber yields are usually developed by fitting a suitable equation to observed data. The model is then used to predict yields for conditions resembling those of the original data set. It may be accurate for those specific conditions, but of unproven accuracy or even entirely useless in other circumstances. Thus, these models tend to be specific rather than general and require validation before being applied to other areas. Dalbergia sissoo forms a major portion of irrigated plantations in the hot desert of India and is an important timber tree species whose stem wood is primarily used as timber. No variable density yield model is available for this species, although such a model is crucial in long-term planning for managing the plantations on a sustained basis. Thus, the objective of this study was to develop a variable density yield model based on data collected from 30 sample plots of D. sissoo laid out in the IGNP area of Rajasthan State (India) and measured annually for 5 years. The best approximating model was selected from the models tested in the study on the basis of fit statistics. The developed model was evaluated using quantitative and qualitative statistical criteria, which showed that the model is statistically sound in prediction. The model can be safely applied to D. sissoo plantations in the study area or in areas with similar conditions.

Target Market Determination for Information Distribution and Student Recruitment Using an Extended RFM Model with Spatial Analysis

  • ERNAWATI, ERNAWATI;BAHARIN, Safiza Suhana Kamal;KASMIN, Fauziah
    • Journal of Distribution Science / v.20 no.6 / pp.1-10 / 2022
  • Purpose: This research proposes a new modified Recency-Frequency-Monetary (RFM) model, extended with spatial analysis, to support decision-makers in discovering the promotional target market. Research design, data and methodology: This quantitative research utilizes data-mining techniques and the RFM model to cluster a university's provider schools. The RFM model was modified by adapting its variables to the university's marketing context and adding a district's potential (D) variable based on heatmap analysis using a Geographic Information System (GIS) and K-means clustering. The K-prototype algorithm and the Elbow method were applied to find provider school clusters using the proposed RFM-D model. After profiling the clusters, the target segment was assigned. The model was validated using empirical data from an Indonesian university, and its performance was compared to the Customer Lifetime Value (CLV)-based RFM using accuracy, precision, recall, and F1-score metrics. Results: This research identified five clusters. The target segment was chosen from the highest-value and high-value clusters, which comprised 17.80% of provider schools but can contribute 75.77% of students. Conclusions: The proposed model recommended more targeted schools in higher-potential districts and predicted the target segment with an accuracy of 0.99, outperforming the CLV-based model. The empirical findings help university management determine the promotion location and allocate resources for promotional information distribution and student recruitment.

Anomaly Detection in Sensor Data

  • Kim, Jong-Min;Baik, Jaiwook
    • Journal of Applied Reliability / v.18 no.1 / pp.20-32 / 2018
  • Purpose: The purpose of this study is to set up anomaly detection criteria for sensor data coming from a motorcycle. Methods: Five sensor values (accelerator pedal, engine rpm, transmission rpm, gear, and speed) are obtained every 0.02 seconds from a motorcycle. Exploratory data analysis is used to find patterns in the data. Traditional process control methods, such as the X control chart, and time series models are fitted to find anomalous behavior in the data. Finally, an unsupervised learning algorithm, k-means clustering, is used to find anomalous spots in the sensor data. Results: According to exploratory data analysis, the distribution of accelerator pedal sensor values is heavily skewed to the left. The motorcycle seems to have been driven in a city at speeds below 45 kilometers per hour. Traditional process control charts such as the X control chart fail because of severe autocorrelation in each sensor's data. However, the ARIMA model found three abnormal points that lie beyond the 2 sigma limits in the control chart. We applied a copula-based Markov chain to perform statistical process control for correlated observations. The copula-based Markov model found anomalous behavior in places similar to those found by the ARIMA model. In the unsupervised learning algorithm, large sensor values are subdivided into two, three, and four disjoint regions, so extreme sensor values are the ones that need to be tracked for any sign of anomalous behavior. Conclusion: Exploratory data analysis is useful for finding patterns in the sensor data. Process control charts using ARIMA and Joe's copula-based Markov model also give warnings at similar places in the data. The unsupervised learning algorithm shows that extreme sensor values are the ones that need to be tracked for any sign of anomalous behavior.
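
The 2 sigma rule mentioned in the results can be sketched as follows. The sketch assumes the residuals of an already fitted time series model are available as a plain list; fitting the ARIMA model and the copula-based Markov model is not shown, and the residual values and the function name `flag_beyond_sigma` are hypothetical.

```python
import statistics

def flag_beyond_sigma(residuals, k=2.0):
    """Return the indices of residuals lying more than k sample standard
    deviations away from the residual mean."""
    center = statistics.mean(residuals)
    spread = statistics.stdev(residuals)
    return [
        i for i, r in enumerate(residuals)
        if abs(r - center) > k * spread
    ]

# Hypothetical model residuals with one abnormal point at index 6.
residuals = [0.1, -0.2, 0.0, 0.15, -0.1, 0.05, 3.0, -0.05, 0.1, -0.15]
flagged = flag_beyond_sigma(residuals)
print(flagged)  # [6]
```

Applying the limits to model residuals rather than raw sensor values is what addresses the autocorrelation problem noted above, since the fitted model is meant to absorb the serial dependence.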

A Design of Feature-based Data Model Using Digital Map 2.0 (수치지도 2.0을 이용한 객체기반 데이터 모델 설계)

  • Lim, Kwang-Hyeon;Jin, Cheng Hao;Kim, Hyeong-Soo;Li, Xun;Ryu, Keun-Ho
    • Journal of the Korea Society of Computer and Information / v.17 no.7 / pp.33-43 / 2012
  • With increasing demand for spatial data, the need for a spatial data model that can effectively store and manage spatial objects is becoming more important in many GIS applications. There has been much research on spatial data models. Several data models have been proposed for particular functions; however, many problems remain in their management and application. Digital Map is a spatial data model in use in Korea. The existing Digital Map is based on tiles, an approach that increases construction and management costs. Therefore, in this paper, we propose a feature-based seamless data model built on Digital Map 2.0, which is based on tiles. This model can be easily constructed and managed in large databases, so it can be applied to any system. The proposed model uses the relationships between features to correct updated data, and the Unique Feature IDentifier (UFID) allows the system to search and manage the feature data more easily and efficiently.

Development of Wastewater Treatment Process Simulators Based on Artificial Neural Network and Mass Balance Models (인공신경망 및 물질수지 모델을 활용한 하수처리 프로세스 시뮬레이터 구축)

  • Kim, Jungruyl;Lee, Jaehyun;Oh, Jeill
    • Journal of Korean Society of Water and Wastewater / v.29 no.3 / pp.427-436 / 2015
  • Two process models are developed to simulate a wastewater treatment process and to compare measured BOD data with model estimates: a mathematical model based on the process mass balance and an artificial neural network (ANN) model. Both simulators fit the effluent BOD data well; they are formulated from five parameters: influent flow rate, effluent flow rate, influent BOD concentration, biomass concentration, and returned sludge percentage. The structured mass-balance model and the ANN model with seasonal periods estimate the data set more precisely, and changing the optimization algorithm for the penalty could be a useful option for tuning the process behavior estimates. A complex model, such as an ANN model coupled with the mass-balance equation, will be required to simulate process dynamics more accurately.