• Title/Summary/Keyword: Inference models

449 search results

Implementation on the evolutionary machine learning approaches for streamflow forecasting: case study in the Seybous River, Algeria (유출예측을 위한 진화적 기계학습 접근법의 구현: 알제리 세이보스 하천의 사례연구)

  • Zakhrouf, Mousaab;Bouchelkia, Hamid;Stamboul, Madani;Kim, Sungwon;Singh, Vijay P.
    • Journal of Korea Water Resources Association / v.53 no.6 / pp.395-408 / 2020
  • This paper aims to develop and apply three different machine learning approaches (i.e., artificial neural networks (ANN), adaptive neuro-fuzzy inference systems (ANFIS), and wavelet-based neural networks (WNN)), combined with an evolutionary optimization algorithm and k-fold cross validation, for multi-step (days ahead) streamflow forecasting at a catchment located in Algeria, North Africa. The ANN and ANFIS models yielded similar performance based on four statistical indices (i.e., root mean squared error (RMSE), Nash-Sutcliffe efficiency (NSE), correlation coefficient (R), and peak flow criteria (PFC)) for the training and testing phases. The values of RMSE and PFC for the WNN model (e.g., RMSE = 8.590 ㎥/sec, PFC = 0.252 for the (t+1) day, testing phase) were lower than those of the ANN (e.g., RMSE = 19.120 ㎥/sec, PFC = 0.446) and ANFIS (e.g., RMSE = 18.520 ㎥/sec, PFC = 0.444) models, while the values of NSE and R for the WNN model were higher than those of the ANN and ANFIS models. Therefore, the new approach can be a robust tool for multi-step (days ahead) streamflow forecasting in the Seybous River, Algeria.
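
As a side note on the evaluation indices named in this abstract, a minimal sketch of how RMSE, NSE, and R could be computed for an observed/simulated flow series is given below; the flow values are made up, and the paper's WNN/ANFIS models and its peak flow criterion (PFC) are not reproduced.

```python
import numpy as np

def rmse(obs, sim):
    """Root mean squared error between observed and simulated flows."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, below 0 is worse than the mean."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2))

def corr(obs, sim):
    """Pearson correlation coefficient R."""
    return float(np.corrcoef(obs, sim)[0, 1])

# Hypothetical daily flows (m3/s) for a (t+1)-day testing phase
observed  = np.array([12.0, 30.5, 45.2, 22.1, 9.8])
simulated = np.array([11.5, 28.9, 47.0, 20.3, 10.4])
print(rmse(observed, simulated), nse(observed, simulated), corr(observed, simulated))
```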

Location Inference of Twitter Users using Timeline Data (타임라인데이터를 이용한 트위터 사용자의 거주 지역 유추방법)

  • Kang, Ae Tti;Kang, Young Ok
    • Spatial Information Research / v.23 no.2 / pp.69-81 / 2015
  • If the residential areas of SNS users can be inferred by analyzing SNS big data, this can serve as an alternative for spatial big data research that suffers from location sparsity and ecological error. In this study, we developed a method that uses the daily activity patterns found in the timeline data of Twitter users to infer their residential areas. We identified each user's daily activity pattern from the user's movement pattern and from the regional cognition words appearing in tweet text. The models based on user movement and text are named the daily movement pattern model and the daily activity field model, respectively. We then selected the variables to be used in each model. The dependent variable was defined as 0 if the area a user mainly tweets from is the user's home location (HL), and 1 otherwise. According to the discriminant analysis, the hit ratios of the two models were 67.5% and 57.5%, respectively. We tested both models using the timeline data of stress-related tweets. As a result, we inferred the residential areas of 5,301 users out of 48,235 users and obtained 9,606 stress-related tweets with residential area, about a 44-fold increase over the count of geo-tagged tweets. We believe the methodology used in this study can be applied not only to secure more location data in SNS big data research, but also to link SNS big data with regional statistics in order to analyze regional phenomena.
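
For readers unfamiliar with the discriminant-analysis step mentioned above, the following is a minimal sketch using scikit-learn; the features and labels are hypothetical placeholders, not the study's movement/text variables.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Hypothetical per-user features, e.g. share of tweets posted from the modal
# region and count of regional cognition words; label 0 = tweets mainly from
# the home location (HL), 1 = otherwise.
X = np.array([[0.9, 5], [0.8, 3], [0.4, 0], [0.3, 1], [0.7, 4], [0.2, 0]])
y = np.array([0, 0, 1, 1, 0, 1])

model = LinearDiscriminantAnalysis().fit(X, y)
pred = model.predict(X)
hit_ratio = (pred == y).mean()   # analogous to the paper's hit ratio
print(f"hit ratio: {hit_ratio:.1%}")
```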

The Comparative Study for Software Reliability Models Based on NHPP (NHPP에 기초한 소프트웨어 신뢰도 모형에 대한 비교연구)

  • Gan, Gwang-Hyeon;Kim, Hui-Cheol;Lee, Byeong-Su
    • The KIPS Transactions:PartD / v.8D no.4 / pp.393-400 / 2001
  • This paper presents a stochastic model for the software failure phenomenon based on a nonhomogeneous Poisson process (NHPP). The failure process is analyzed to develop a suitable mean value function for the NHPP, and expressions are given for several performance measures. Actual software failure data are compared with the generalized model by Goel, which depends on a constant reflecting the quality of testing. The performance measures and parameter inferences of the new models, based on the Rayleigh and Gumbel distributions, are discussed. The new models are applied to real software failure data and compared with the Goel-Okumoto and the Yamada, Ohba and Osaki models. Parameters were inferred by maximum likelihood estimation, with the bisection algorithm used to compute the nonlinear roots. Model selection was performed using the sum of squared errors. A numerical example with the NTDS data is illustrated.
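
As a rough illustration of the estimation procedure the abstract refers to (maximum likelihood with a bisection search for the nonlinear root), the sketch below fits the Goel-Okumoto mean value function m(t) = a(1 - exp(-b t)) to time-truncated failure data; the failure times are made up and are not the NTDS data set.

```python
import math

def go_mle(times, T):
    """ML estimates (a, b) of the Goel-Okumoto NHPP with m(t) = a(1 - exp(-b t))
    from failure times observed on [0, T], solving the score equation in b
    by bisection. A finite MLE requires reliability growth in the data
    (mean failure time below T/2)."""
    n, s = len(times), sum(times)

    def g(b):  # score function for b after profiling out a
        return n / b - s - n * T * math.exp(-b * T) / (1.0 - math.exp(-b * T))

    lo, hi = 1e-6, 1.0
    while g(hi) > 0:          # expand the bracket until the root is enclosed
        hi *= 2
    for _ in range(200):      # plain bisection
        mid = 0.5 * (lo + hi)
        if g(mid) > 0:
            lo = mid
        else:
            hi = mid
    b = 0.5 * (lo + hi)
    a = n / (1.0 - math.exp(-b * T))
    return a, b

# Made-up failure times (not the NTDS data used in the paper)
failure_times = [3, 6, 10, 15, 19, 25, 31, 38, 44, 52, 59, 68, 79, 88]
print(go_mle(failure_times, T=90.0))
```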


E-commerce data based Sentiment Analysis Model Implementation using Natural Language Processing Model (자연어처리 모델을 이용한 이커머스 데이터 기반 감성 분석 모델 구축)

  • Choi, Jun-Young;Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.11 no.11 / pp.33-39 / 2020
  • In the field of natural language processing, research on tasks such as translation, POS tagging, question answering, and sentiment analysis is being carried out globally. Sentiment analysis with pretrained sentence embedding models shows high classification performance on English single-domain datasets. In this paper, classification performance is compared on a Korean e-commerce online dataset with various domain attributes, and six neural-net models are built: BOW (Bag of Words), LSTM[1], Attention, CNN[2], ELMo[3], and BERT (KoBERT)[4]. It is confirmed that the performance of pretrained sentence embedding models is higher than that of word embedding models. In addition, a practical neural-net model composition is proposed after comparing classification performance on a dataset with 17 categories. Furthermore, compressing the sentence embedding model is mentioned as future work, considering inference time against model capacity in a real-time service.
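
As a rough illustration of the weakest baseline in the comparison (a bag-of-words classifier), the sketch below uses scikit-learn on made-up English reviews; the paper's Korean dataset and its neural models (LSTM, CNN, ELMo, KoBERT) are not reproduced.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up e-commerce style reviews (placeholders, not the paper's dataset)
reviews = ["fast delivery and great quality", "arrived broken, very disappointed",
           "works exactly as described", "waste of money, do not buy",
           "excellent value for the price", "terrible customer service"]
labels = [1, 0, 1, 0, 1, 0]   # 1 = positive, 0 = negative

# Bag-of-words style baseline; the paper's stronger models would replace this
# pipeline with pretrained sentence embeddings.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(reviews, labels)
print(clf.predict(["quality is great", "very disappointed with this purchase"]))
```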

Bayesian Inference for Autoregressive Models with Skewed Exponential Power Errors (비대칭 지수멱 오차를 가지는 자기회귀모형에서의 베이지안 추론)

  • Ryu, Hyunnam;Kim, Dal Ho
    • The Korean Journal of Applied Statistics / v.27 no.6 / pp.1039-1047 / 2014
  • An autoregressive model with normal errors is a natural model for fitting time series data. More flexible models that include the normal distribution as a special case are necessary because they can cover both normality and non-normality. The skewed exponential power distribution is a possible candidate for autoregressive model errors that may be skewed and have tails lighter (platykurtic) or heavier (leptokurtic) than normal; in addition, the use of the skewed exponential power distribution can reduce the influence of outliers and consequently increase the robustness of the analysis. We use the SIR algorithm and the grid method for efficient Bayesian estimation.
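
A minimal sketch of the sampling importance resampling (SIR) idea for an AR(1) coefficient is shown below; for brevity it uses plain normal errors and a flat prior, not the paper's skewed exponential power error model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated AR(1) data; normal errors are used here only to keep the sketch short.
phi_true, n = 0.6, 200
y = np.zeros(n)
for t in range(1, n):
    y[t] = phi_true * y[t - 1] + rng.normal(scale=1.0)

def log_lik(phi, y):
    resid = y[1:] - phi * y[:-1]
    return -0.5 * np.sum(resid ** 2)   # normal errors with sigma fixed at 1

# SIR: draw from a flat prior on (-1, 1), weight by the likelihood,
# then resample to approximate the posterior of phi.
draws = rng.uniform(-1.0, 1.0, size=20000)
logw = np.array([log_lik(p, y) for p in draws])
w = np.exp(logw - logw.max())
w /= w.sum()
posterior = rng.choice(draws, size=5000, replace=True, p=w)
print(posterior.mean(), posterior.std())
```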

An Extended Generative Feature Learning Algorithm for Image Recognition

  • Wang, Bin;Li, Chuanjiang;Zhang, Qian;Huang, Jifeng
    • KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.8 / pp.3984-4005 / 2017
  • Image recognition has become an increasingly important topic because of its wide application. It is highly challenging when facing large-scale databases with large variance. Recognition systems rely on a key component, i.e., the low-level feature or the learned mid-level feature. The recognition performance can potentially be improved if the data distribution information is exploited in a more sophisticated way, usually through a function over the hidden variables, model parameters, and observed data. Such methods are called generative score space methods. In this paper, we propose a discriminative extension of existing generative score space methods that exploits the class label when deriving score functions for the image recognition task. Specifically, we first extend the regular generative models to class conditional models over both the observed variable and the class label. Then, we derive the mid-level feature mapping from the extended models. Finally, the derived feature mapping is embedded into a discriminative classifier for image recognition. The advantages of our proposed approach are twofold. First, the resulting methods take simple and intuitive forms, which are weighted versions of existing methods, benefiting from the Bayesian inference of the class label. Second, the probabilistic generative modeling allows us to exploit hidden information and adapts well to the data distribution. To validate the effectiveness of the proposed method, we combine our discriminative extension with three generative models for the image recognition task. The experimental results validate the effectiveness of our proposed approach.
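
To make the idea of a score-space feature mapping concrete, the sketch below derives Fisher-score style features from class-conditional diagonal Gaussians and feeds them to a discriminative classifier; the data and model choices are illustrative, not the paper's.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Toy two-class data standing in for image features (not the paper's datasets)
X0 = rng.normal(loc=-1.0, scale=1.0, size=(100, 5))
X1 = rng.normal(loc=+1.0, scale=1.5, size=(100, 5))
X = np.vstack([X0, X1]); y = np.array([0] * 100 + [1] * 100)

def score_features(X, mu, var):
    """Fisher-score style mapping from a diagonal Gaussian generative model:
    gradients of log p(x | mu, var) w.r.t. mu and log-variance."""
    d_mu = (X - mu) / var
    d_logvar = 0.5 * (((X - mu) ** 2) / var - 1.0)
    return np.hstack([d_mu, d_logvar])

# Class-conditional generative models (one Gaussian per class); concatenate
# their score features and train a discriminative classifier on top.
feats = []
for c in (0, 1):
    mu, var = X[y == c].mean(axis=0), X[y == c].var(axis=0) + 1e-6
    feats.append(score_features(X, mu, var))
Phi = np.hstack(feats)
clf = LogisticRegression(max_iter=1000).fit(Phi, y)
print("training accuracy:", clf.score(Phi, y))
```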

In-depth exploration of machine learning algorithms for predicting sidewall displacement in underground caverns

  • Hanan Samadi;Abed Alanazi;Sabih Hashim Muhodir;Shtwai Alsubai;Abdullah Alqahtani;Mehrez Marzougui
    • Geomechanics and Engineering / v.37 no.4 / pp.307-321 / 2024
  • This paper delves into the critical assessment of predicting sidewall displacement in underground caverns through the application of nine distinct machine learning techniques. The accurate prediction of sidewall displacement is essential for ensuring the structural safety and stability of underground caverns, which are prone to various geological challenges. The dataset utilized in this study comprises 310 data points, each containing 13 relevant parameters extracted from 10 underground cavern projects located in Iran and other regions. To facilitate a comprehensive evaluation, the dataset is divided into training and testing subsets. The study employs a diverse array of machine learning models, including recurrent neural network, back-propagation neural network, K-nearest neighbors, normalized and ordinary radial basis function, support vector machine, weight estimation, feed-forward stepwise regression, and fuzzy inference system. These models are leveraged to develop predictive models that can accurately forecast sidewall displacement in underground caverns. The training phase uses 80% of the dataset (248 data points), while the remaining 20% (62 data points) are used for testing and validation. The findings highlight the back-propagation neural network (BPNN) model as the most effective in providing accurate predictions. The BPNN model demonstrates a remarkably high correlation coefficient (R² = 0.99) and a low error rate (RMSE = 4.27E-05), indicating its superior performance in predicting sidewall displacement in underground caverns. This research contributes valuable insights into the application of machine learning techniques for enhancing the safety and stability of underground structures.
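
A minimal sketch of the overall workflow (80/20 split, a back-propagation-trained network, R² and RMSE reporting) is given below using scikit-learn on synthetic data; the cavern dataset and the paper's nine models are not reproduced.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import r2_score, mean_squared_error

rng = np.random.default_rng(42)

# Synthetic stand-in: 310 samples with 13 parameters and a displacement target
X = rng.normal(size=(310, 13))
y = X @ rng.normal(size=13) + 0.05 * rng.normal(size=310)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# A multilayer perceptron trained by back-propagation, standing in for the BPNN
model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(32, 16),
                                   max_iter=5000, random_state=0))
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("R2  :", r2_score(y_te, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
```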

Research on Mining Technology for Explainable Decision Making (설명가능한 의사결정을 위한 마이닝 기술)

  • Kyungyong Chung
    • Journal of the Institute of Convergence Signal Processing / v.24 no.4 / pp.186-191 / 2023
  • Data processing techniques play a critical role in decision-making, including handling missing and outlier data, prediction, and recommendation models. This requires a clear explanation of the validity, reliability, and accuracy of all processes and results. In addition, it is necessary to solve data problems through explainable models using decision trees, inference, etc., and to pursue model lightweighting by considering various types of learning. The multi-layer mining classification method that applies the sixth principle discovers multidimensional relationships between variables and attributes that occur frequently in transactions after data preprocessing. This explains how to discover significant relationships by mining transactions and how to model the data through regression analysis. It develops scalable models and logistic regression models, and proposes mining techniques that generate class labels through data cleansing, relevance analysis, data transformation, and data augmentation to support explainable decision-making.
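
As a generic illustration of explainable classification with a decision tree and a logistic regression (not the paper's multi-layer mining method), a short scikit-learn sketch on a public dataset follows.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Public dataset as a stand-in for the cleansed/transformed transaction data
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# A shallow decision tree gives human-readable rules for the class label...
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=list(X.columns)))

# ...and a logistic regression exposes per-feature coefficients (direction
# and strength of each attribute's contribution to the decision).
logit = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)
print(dict(zip(X.columns, logit.named_steps["logisticregression"].coef_[0].round(2))))
```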

Bayesian estimation of tension in bridge hangers using modal frequency measurements

  • Papadimitriou, Costas;Giakoumi, Konstantina;Argyris, Costas;Spyrou, Leonidas A.;Panetsos, Panagiotis
    • Structural Monitoring and Maintenance / v.3 no.4 / pp.349-375 / 2016
  • The tension of an arch bridge hanger is estimated using a number of experimentally identified modal frequencies. The hanger is connected through metallic plates to the bridge deck and arch. Two different categories of model classes are considered to simulate the vibrations of the hanger: an analytical model based on the Euler-Bernoulli beam theory, and a high-fidelity finite element (FE) model. A Bayesian parameter estimation and model selection method is used to discriminate between models, select the best model, and estimate the hanger tension and its uncertainty. It is demonstrated that the end plate connections and boundary conditions of the hanger due to the flexibility of the deck/arch significantly affect the estimate of the axial load and its uncertainty. A fixed-end high-fidelity FE model of the hanger underestimates the hanger tension by more than 20% compared to a baseline FE model with flexible supports. Simplified beam models can give fairly accurate results, close to the ones obtained from the high-fidelity FE model with flexible support conditions, provided that the concept of equivalent length is introduced and/or end rotational springs are included to simulate the flexibility of the hanger ends. The effect of the number of experimentally identified modal frequencies on the estimates of the hanger tension and its uncertainty is investigated.
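
As a toy illustration of the Bayesian step only, the sketch below computes a grid posterior over the tension of a taut string from a few modal frequencies; the paper's Euler-Bernoulli and FE models, flexible end conditions, and model selection are not reproduced, and all numbers are hypothetical.

```python
import numpy as np

# Simple taut-string approximation: f_n = (n / (2 L)) * sqrt(T / mu).
# The paper's models also account for bending stiffness and flexible end
# conditions; this sketch only illustrates the Bayesian inference step.
L, mu = 10.0, 50.0                       # hypothetical length (m), mass/length (kg/m)
modes = np.array([1, 2, 3])
f_meas = np.array([1.01, 1.99, 3.02])    # made-up identified frequencies (Hz)
sigma_f = 0.05                           # assumed measurement noise (Hz)

def f_model(T):
    return modes / (2.0 * L) * np.sqrt(T / mu)

# Grid posterior over the tension T with a flat prior
T_grid = np.linspace(1e4, 4e4, 2000)
loglik = np.array([-0.5 * np.sum(((f_meas - f_model(T)) / sigma_f) ** 2)
                   for T in T_grid])
post = np.exp(loglik - loglik.max())
post /= post.sum()                       # discrete normalization over the grid
T_mean = np.sum(T_grid * post)
T_std = np.sqrt(np.sum((T_grid - T_mean) ** 2 * post))
print(f"tension ~ {T_mean:.0f} N +/- {T_std:.0f} N")
```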

An Extension of SWCL to Represent Logical Implication Knowledge under Semantic Web Environment (의미웹 환경에서 조건부함축 제약 지식표현을 위한 SWCL의 확장)

  • Kim, Hak-Jin
    • Journal of the Korean Operations Research and Management Science Society / v.39 no.3 / pp.7-22 / 2014
  • With the publication of RDF and OWL, the Semantic Web was confirmed as a technology through which information on the Internet can be processed by machines. The focus of Semantic Web research has since moved to how to provide users with more useful information for their decision making, beyond the simple use of structured data in ontologies. SWRL, which makes rule-based logical inference possible, and SWCL, which formulates constraints under the Semantic Web environment, are among the many efforts toward that goal. A constraint represents a connection or relationship between individual data in an ontology. Building on SWCL, this paper extends the language by adding one more type of constraint, the implication constraint, to its repertoire. When users use binary variables to represent logical relationships in mathematical models, solving the models requires additional knowledge of the solver; the use of the implication constraint eases this difficulty. Its need, definition, and relevant technical description are presented through the optimal common attribute selection problem in product design.
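
As a generic illustration of the modeling burden the abstract alludes to, the snippet below shows the usual big-M encoding of an implication with a binary variable; the SWCL syntax itself is not reproduced, and b and M are arbitrary illustrative constants.

```python
# Generic big-M encoding of an implication with a binary variable x:
#     x = 1  =>  y <= b      is modeled as      y <= b + M * (1 - x),
# where M is a constant large enough to deactivate the bound when x = 0.
# (This only illustrates the modeling pattern the abstract refers to.)
b, M = 10.0, 1000.0

def satisfies_big_m(x, y):
    return y <= b + M * (1 - x)

for x in (0, 1):
    for y in (5.0, 10.0, 50.0):
        print(f"x={x}, y={y:5.1f} -> constraint holds: {satisfies_big_m(x, y)}")
# When x = 1 the bound y <= 10 is active; when x = 0 any y up to b + M passes.
```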