• Title/Summary/Keyword: Ensemble Algorithm


An Ensemble Classification of Mental Health in Malaysia related to the Covid-19 Pandemic using Social Media Sentiment Analysis

  • Nur 'Aisyah Binti Zakaria Adli;Muneer Ahmad;Norjihan Abdul Ghani;Sri Devi Ravana;Azah Anir Norman
    • KSII Transactions on Internet and Information Systems (TIIS) / v.18 no.2 / pp.370-396 / 2024
  • The World Health Organization (WHO) declared COVID-19 a Public Health Emergency of International Concern on 30 January 2020 and characterized it as a pandemic on 11 March 2020. Since then, the lifestyles of people all over the world have changed, and the pandemic appears to have caused severe mental disorders, anxiety, and depression. Most researchers have used surveys to identify the pandemic's impact on people's mental health. Although surveys can generate higher-quality, tailored, and more specific data, social media offers valuable insight into how the pandemic has affected mental health. Because people feel connected on social media, this study aims to capture people's sentiments about the pandemic in relation to mental health issues. Word clouds were used to visualize and identify the most frequent keywords related to COVID-19 and mental health disorders. This study employs Majority Voting Ensemble (MVE) classification and individual classifiers, such as Naïve Bayes (NB), Support Vector Machine (SVM), and Logistic Regression (LR), to classify the sentiment of tweets. The tweets were classified as positive, neutral, or negative using the Valence Aware Dictionary and sEntiment Reasoner (VADER). Confusion matrices and classification reports provide the precision, recall, and F1-score used to identify the best algorithm for classifying the sentiments.
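The VADER labeling and majority-voting steps described above can be sketched in Python. This is a minimal sketch, not the authors' code: it assumes the conventional VADER cut-offs of ±0.05 on the compound score (the abstract does not state the thresholds) and a hard majority vote over per-classifier labels. `label_from_compound`, `majority_vote`, and the tie-breaking rule are our own hypothetical choices.

```python
from collections import Counter

# Conventional VADER cut-offs on the normalized compound score (assumption:
# the abstract does not say which thresholds the authors used).
POS_THRESHOLD = 0.05
NEG_THRESHOLD = -0.05


def label_from_compound(compound: float) -> str:
    """Map a VADER compound score in [-1, 1] to a three-way sentiment label."""
    if compound >= POS_THRESHOLD:
        return "positive"
    if compound <= NEG_THRESHOLD:
        return "negative"
    return "neutral"


def majority_vote(predictions_per_model: list[list[str]]) -> list[str]:
    """Hard majority voting: for each tweet, return the label predicted by the
    most classifiers. Ties go to the tied label voted by the earliest model
    in the list (our own, hypothetical tie-breaking rule)."""
    combined = []
    for votes in zip(*predictions_per_model):  # one tuple of votes per tweet
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


# Usage with hypothetical per-classifier predictions for three tweets.
nb = ["positive", "negative", "neutral"]
svm = ["positive", "neutral", "neutral"]
lr = ["negative", "negative", "positive"]
print(majority_vote([nb, svm, lr]))  # ['positive', 'negative', 'neutral']
```

With three classifiers and three classes, a full three-way tie is possible, which is why some tie-breaking rule is needed; the paper may resolve ties differently.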

A Multimodal Profile Ensemble Approach to Development of Recommender Systems Using Big Data (빅데이터 기반 추천시스템 구현을 위한 다중 프로파일 앙상블 기법)

  • Kim, Minjeong;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.93-110 / 2015
  • A recommender system recommends products to customers who are likely to be interested in them. Various recommender systems have been developed based on automated information filtering technology. Collaborative filtering (CF), one of the most successful recommendation algorithms, has been applied in many domains, such as recommending Web pages, books, movies, music, and products. However, CF is known to have a critical shortcoming. CF finds neighbors whose preferences are similar to those of the target customer and recommends the products those neighbors liked most. Thus, CF works properly only when customers have provided a sufficient number of ratings on common products. When customer ratings are scarce, neighborhood formation becomes inaccurate, resulting in poor recommendations. To improve the performance of CF-based recommender systems, most related studies have focused on developing novel algorithms under the assumption of a single profile, created from users' item ratings, purchase transactions, or Web access logs. With the advent of big data, companies can collect more data and use a wider variety of large-scale information. Many companies therefore recognize the importance of utilizing big data, because it enables them to improve their competitiveness and create new value. In particular, the use of personal big data in recommender systems is attracting growing attention, because personal big data facilitate more accurate identification of users' preferences and behaviors. The proposed recommendation methodology is as follows. First, multimodal user profiles are created from personal big data to capture users' preferences and behaviors from various viewpoints. We derive five user profiles based on personal information: ratings, site preferences, demographics, Internet usage, and topics in text.
Next, the similarity between users is calculated from the profiles, and each user's neighbors are identified from the results. Three ensemble approaches are used to calculate the similarity: the similarity of the combined profile, the average of the per-profile similarities, and the weighted average of the per-profile similarities. Finally, the products most preferred by the neighbors are recommended to the target user. For the experiments, we used demographic data and a very large volume of Web log transactions for 5,000 panel users of a company that specializes in analyzing Web site rankings. R was used to implement the proposed recommender system, and SAS E-Miner was used to conduct the topic analysis using keyword search. To evaluate recommendation performance, we used 60% of the data for training and 40% for testing. Five-fold cross-validation was also conducted to enhance the reliability of the experiments. The F1 metric, a widely used combined metric that gives equal weight to recall and precision, was employed for evaluation. The results show that the proposed methodology achieved a significant improvement over the single-profile CF algorithm. In particular, the ensemble approach using weighted average similarity showed the highest performance: F1 improved by 16.9 percent with weighted average similarity and by 8.1 percent with the average similarity of each profile. From these results, we conclude that the multimodal profile ensemble approach is a viable solution to the problems encountered when customer ratings are scarce. This study is significant in suggesting what kinds of information can be used to create profiles in a big data environment and how they can be combined and utilized effectively.
However, the methodology requires further study before real-world application. We need to apply the proposed method to different recommendation algorithms, compare the resulting recommendation accuracy, and identify which combination performs best.
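The three similarity ensembles and the F1 metric can be illustrated with a small Python sketch. The abstract does not specify the underlying similarity measure, so cosine similarity over sparse feature dictionaries is an assumption here, as are the profile names and weights; all function names are hypothetical.

```python
import math

Vector = dict[str, float]      # sparse feature vector
Profiles = dict[str, Vector]   # profile name -> feature vector


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two sparse vectors (0.0 if either is empty)."""
    dot = sum(x * v[k] for k, x in u.items() if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def combined_profile_similarity(a: Profiles, b: Profiles) -> float:
    """Ensemble 1: concatenate all profiles into one vector, then compare."""
    def merge(p: Profiles) -> Vector:
        return {f"{name}:{k}": x for name, vec in p.items() for k, x in vec.items()}
    return cosine(merge(a), merge(b))


def average_similarity(a: Profiles, b: Profiles) -> float:
    """Ensemble 2: unweighted mean of the per-profile similarities."""
    return sum(cosine(a[n], b[n]) for n in a) / len(a)


def weighted_average_similarity(a: Profiles, b: Profiles, w: dict[str, float]) -> float:
    """Ensemble 3: weighted mean of the per-profile similarities."""
    return sum(w[n] * cosine(a[n], b[n]) for n in a) / sum(w[n] for n in a)


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (equal weight to both)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

Here `a` and `b` map profile names (for example `"rating"` or `"site"`) to sparse vectors for two users; a target user's neighbors would be the users with the highest ensemble similarity. How the weights in the third ensemble were chosen is not stated in the abstract.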

Comparison between Uncertainties of Cultivar Parameter Estimates Obtained Using Error Calculation Methods for Forage Rice Cultivars (오차 계산 방식에 따른 사료용 벼 품종의 품종모수 추정치 불확도 비교)

  • Young Sang Joh;Shinwoo Hyun;Kwang Soo Kim
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.3 / pp.129-141 / 2023
  • Crop models have been used to predict yield under diverse environmental and cultivation conditions, which can support decisions on forage crop management. Cultivar parameters are one of the inputs crop models require to represent the genetic properties of a given forage cultivar. The objective of this study was to compare calibration and ensemble approaches for minimizing the uncertainty of crop yield estimates obtained with the SIMPLE crop model. Cultivar parameters were calibrated using the Metropolis-Hastings (MH) algorithm with either the log-likelihood (LL) or the Generic Composite Similarity Measure (GCSM) as the objective function. In total, 20 sets of cultivar parameters were generated for each method. Two types of ensemble approach were used. The first (Eem) averaged the model outputs obtained with each individual parameter set. The second (Epm) used the model output obtained with cultivar parameters averaged over the 20 parameter sets. Comparisons were made for each cultivar and for each error calculation method. The parameters of 'Jowoo' and 'Yeongwoo', forage rice cultivars used in Korea, were calibrated. Yield data were obtained from experimental fields at Suwon, Jeonju, Naju, and Iksan. Data for 2013, 2014, and 2016 were used for parameter calibration. For validation, yield data reported from 2016 to 2018 at Suwon were used. Initial calibration indicated that the genetic coefficients obtained with LL were distributed over a narrower range than those obtained with GCSM. A two-sample t-test comparing the ensemble approaches found no significant difference between them. The uncertainty of GCSM can be neutralized by adjusting the acceptance probability. The Epm ensemble method indicates that uncertainty can be reduced with less computation.
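The difference between the two ensemble approaches, Eem (run every parameter set, then average the outputs) and Epm (average the parameter sets, then run once), can be illustrated with a toy calibration. This sketch assumes a single scalar parameter, a hypothetical saturating yield function in place of the SIMPLE model, a Gaussian log-likelihood with a uniform prior (the GCSM objective is not shown), and takes each of the 20 "parameter sets" as the post-burn-in mean of an independent random-walk Metropolis-Hastings chain; that last choice is our assumption, not the paper's procedure.

```python
import math
import random


def log_likelihood(theta, model, observed, sigma):
    """Gaussian log-likelihood (up to a constant) of the observed yields."""
    pred = model(theta)
    return -0.5 * sum(((y - pred) / sigma) ** 2 for y in observed)


def mh_calibrate(model, observed, sigma, theta0, step, n_iter, seed, bounds):
    """Random-walk Metropolis-Hastings with a uniform prior on `bounds`.
    Returns the mean of the post-burn-in chain as one calibrated parameter set."""
    rng = random.Random(seed)
    lo, hi = bounds
    theta = theta0
    ll = log_likelihood(theta, model, observed, sigma)
    kept = []
    burn_in = n_iter // 2
    for i in range(n_iter):
        cand = theta + rng.gauss(0.0, step)
        if lo <= cand <= hi:  # outside the prior support the move is rejected
            cand_ll = log_likelihood(cand, model, observed, sigma)
            # Symmetric proposal + flat prior: acceptance uses the likelihood ratio only.
            if rng.random() < math.exp(min(0.0, cand_ll - ll)):
                theta, ll = cand, cand_ll
        if i >= burn_in:
            kept.append(theta)
    return sum(kept) / len(kept)


def eem(model, param_sets):
    """Ensemble of model outputs: run each parameter set, then average."""
    return sum(model(p) for p in param_sets) / len(param_sets)


def epm(model, param_sets):
    """Ensemble of parameters: average the sets, then run the model once."""
    return model(sum(param_sets) / len(param_sets))


# Usage with a hypothetical saturating yield response and hypothetical yields.
def toy_model(theta):
    return 10.0 * (1.0 - math.exp(-theta))


obs = [8.4, 8.7, 8.5]
sets = [mh_calibrate(toy_model, obs, 0.5, 1.0, 0.3, 4000, s, (0.0, 5.0)) for s in range(20)]
print(eem(toy_model, sets), epm(toy_model, sets))
```

For a model that is linear in its parameters the two ensembles coincide; they can differ only when the model is nonlinear, and Epm needs a single model run instead of one per parameter set.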

Power Line Noise Reductions in ABR by Properly Chosen Iteration Numbers (ABR에서 반복회수 설정에 의한 전력선 잡음의 제거)

  • 안주현;김수찬;남기창;심윤주;김희남;송철규;김덕원
    • Journal of Biomedical Engineering Research / v.22 no.3 / pp.241-247 / 2001
  • Auditory brainstem response (ABR) is an audiometric technique that measures the objective hearing threshold by acquiring the electrical evoked potentials emanating from the auditory nervous system in response to auditory stimulation. However, the acquired potentials are heavily contaminated by power line noise and have an extremely low SNR, so ensemble averaging is generally used. The purpose of this study was to investigate how the number of iterations in ensemble averaging affects the reduction of power line noise. The power line noise was modeled as a 60 Hz sinusoidal signal, and the energy of the modeled signal after averaging was calculated. Simulation verified that, for each stimulation rate, this energy has periodically recurring zero points as the iteration number varies. A 60 Hz signal induced from the power line was then applied to the developed ABR system, confirming that the period of the zero-energy points matched the simulation. With a properly selected iteration number, power line noise can be reduced and a more reliable ABR acquired.
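The periodic zero points can be reproduced by treating the 60 Hz noise as a phasor that advances by a fixed phase, 2π·60/r, between sweeps at stimulation rate r. This sketch assumes perfectly regular stimulus onsets and a constant-amplitude sinusoid that is phase-continuous across sweeps. Under those assumptions the averaged noise cancels exactly whenever N·60/r is an integer (and 60/r itself is not), so the zero points recur with a period equal to the denominator of 60/r in lowest terms. The function names are hypothetical.

```python
import cmath
import math
from fractions import Fraction


def residual_energy(n_sweeps, line_hz=60.0, rate_hz=21.0, amplitude=1.0):
    """Mean-square value of a line-frequency sinusoid remaining after
    averaging n_sweeps epochs whose onsets are 1/rate_hz seconds apart."""
    # The noise phase advances by 2*pi*line_hz/rate_hz from one sweep to the next.
    phase_step = 2.0 * math.pi * line_hz / rate_hz
    mean_phasor = sum(cmath.exp(1j * k * phase_step) for k in range(n_sweeps)) / n_sweeps
    # A sinusoid of amplitude A*|p| has mean-square value (A*|p|)**2 / 2.
    return 0.5 * (amplitude * abs(mean_phasor)) ** 2


def zero_energy_period(line_hz, rate_hz):
    """Iteration-number period of the exact zeros, or None when line_hz/rate_hz
    is an integer (noise phase-locked to the stimulus, so it never cancels)."""
    ratio = Fraction(line_hz) / Fraction(rate_hz)
    return None if ratio.denominator == 1 else ratio.denominator


# Usage: at 21 stimuli/s the residual vanishes every 7 sweeps.
for n in range(5, 9):
    print(n, residual_energy(n, rate_hz=21.0))
print(zero_energy_period(60, 21), zero_energy_period(60, "11.1"))
```

Pass rates as integers or decimal strings (for example `"11.1"`) so that `Fraction` sees the exact ratio; a float such as `11.1` would be converted to its binary approximation, giving a meaningless denominator.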

Tire Lateral Force Estimation System Using Nonlinear Kalman Filter (비선형 Kalman Filter를 사용한 타이어 횡력 추정 시스템)

  • Lee, Dong-Hun;Kim, In-Keun;Huh, Kun-Soo
    • Transactions of the Korean Society of Automotive Engineers / v.20 no.6 / pp.126-131 / 2012
  • Tire force is one of the important parameters that determine vehicle dynamics. However, it is hard to measure tire force directly with sensors: not only are the sensors expensive, but installing them in harsh environments is also difficult. Therefore, estimation algorithms based on vehicle dynamic models have been introduced to estimate tire forces indirectly. In this paper, a system for estimating the lateral tire force and vehicle states is proposed. The state-space equation is constructed from a 3-DOF bicycle model. The Extended Kalman Filter, Unscented Kalman Filter, and Ensemble Kalman Filter are used to estimate the states of the nonlinear system. The performance of each algorithm is evaluated in terms of RMSE (root mean square error) and maximum error.
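The Ensemble Kalman Filter named above replaces the EKF's linearized covariance propagation with a sample covariance computed from an ensemble pushed through the nonlinear model. As a minimal sketch, the code below uses a scalar state observed directly (H = 1) and the perturbed-observation variant of the EnKF; the paper's 3-DOF bicycle model, tire model, and noise settings are not reproduced, and the toy dynamics in the usage lines are hypothetical.

```python
import math
import random


def enkf_step(ensemble, f, z, q_std, r_var, rng):
    """One forecast + analysis cycle of a stochastic (perturbed-observation)
    ensemble Kalman filter for a scalar state observed directly (H = 1)."""
    # Forecast: push each member through the nonlinear model plus process noise.
    forecast = [f(x) + rng.gauss(0.0, q_std) for x in ensemble]
    n = len(forecast)
    mean = sum(forecast) / n
    p = sum((x - mean) ** 2 for x in forecast) / (n - 1)  # sample forecast variance
    k = p / (p + r_var)                                   # Kalman gain with H = 1
    # Analysis: update each member against its own perturbed observation.
    return [x + k * (z + rng.gauss(0.0, math.sqrt(r_var)) - x) for x in forecast]


def rmse(est, truth):
    """Root mean square error between estimates and true values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(est, truth)) / len(est))


# Usage: track a hypothetical nonlinear scalar system from noisy measurements.
rng = random.Random(42)
f = lambda x: x + 0.1 * math.sin(x)
ens = [rng.gauss(0.0, 1.0) for _ in range(100)]
x_true, truth, est = 1.0, [], []
for _ in range(50):
    x_true = f(x_true) + rng.gauss(0.0, 0.05)
    z = x_true + rng.gauss(0.0, 0.3)
    ens = enkf_step(ens, f, z, 0.05, 0.3 ** 2, rng)
    truth.append(x_true)
    est.append(sum(ens) / len(ens))
print("RMSE:", rmse(est, truth))
```

Perturbing the observation for each member keeps the analysis ensemble spread consistent with the Kalman update; without it, the ensemble variance would be systematically underestimated.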

Forecasting Day-ahead Electricity Price Using a Hybrid Improved Approach

  • Hu, Jian-Ming;Wang, Jian-Zhou
    • Journal of Electrical Engineering and Technology / v.12 no.6 / pp.2166-2176 / 2017
  • Electricity price prediction plays a crucial part in scheduling and risk management for participants in competitive electricity markets. However, it is a difficult and challenging task owing to the nonlinearity, non-stationarity, and uncertainty of price series. This study proposes an improved hybrid strategy that combines data preprocessing components with a forecasting engine component to enhance the accuracy of electricity price forecasts. In the developed forecasting procedure, the Seasonal Adjustment (SA) method and the Ensemble Empirical Mode Decomposition (EEMD) technique are combined as the data preprocessing component, while the Coupled Simulated Annealing (CSA) optimization method and the Least Squares Support Vector Regression (LSSVR) algorithm form the prediction engine. The proposed hybrid approach is verified with electricity price data sampled from the power market of New South Wales in Australia. The simulation results show that the proposed hybrid approach achieves an observable improvement in forecasting accuracy over other approaches, which suggests that the combined approach offers good predictive ability and sufficient precision.
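The LSSVR part of the forecasting engine solves a linear system instead of the quadratic program of a standard SVR. The sketch below fits an LSSVR with an RBF kernel on scalar inputs by solving the dual system [[0, 1ᵀ], [1, K + I/γ]]·[b, α] = [0, y] directly. It is illustrative only: γ and σ are fixed by hand, whereas the paper's engine pairs LSSVR with CSA optimization (presumably to tune such parameters), and the SA and EEMD preprocessing steps are omitted. The function names are hypothetical.

```python
import math


def rbf(a, b, sigma):
    """Gaussian (RBF) kernel between two scalar inputs."""
    return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))


def solve(A, y):
    """Gaussian elimination with partial pivoting on an augmented copy of A."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, y)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= factor * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x


def lssvr_fit(xs, ys, gamma, sigma):
    """Fit LSSVR and return a predictor f(x) = sum_i alpha_i k(x, x_i) + b."""
    n = len(xs)
    # First row encodes the constraint sum(alpha) = 0.
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [rbf(xs[i], xs[j], sigma) + (1.0 / gamma if i == j else 0.0)
                          for j in range(n)])
    sol = solve(A, [0.0] + list(ys))
    b, alpha = sol[0], sol[1:]
    return lambda x: b + sum(a * rbf(x, xi, sigma) for a, xi in zip(alpha, xs))


# Usage on a hypothetical smooth series (inputs would be lagged prices in practice).
xs = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
model = lssvr_fit(xs, [math.sin(x) for x in xs], gamma=1e6, sigma=0.5)
print(model(1.25))
```

A large γ makes the fit nearly interpolate the training points (the training residual equals α/γ); smaller values trade fit for smoothness, which is the kind of trade-off an optimizer such as CSA would tune.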

An Ensemble Model for Machine Failure Prediction (앙상블 모델 기반의 기계 고장 예측 방법)

  • Cheon, Kang Min;Yang, Jaekyung
    • Journal of Korean Society of Industrial and Systems Engineering / v.43 no.1 / pp.123-131 / 2020
  • Many studies have addressed methods for predicting machine failure, and recently much research and many applications have emerged that diagnose the physical condition of machines and parts and calculate their remaining life through various methods. Survival models are also used to predict plant failures based on past anomaly cycles. In particular, the special machines that reflect the fluid flow and process characteristics of chemical plants are connected to hundreds or thousands of sensors, so many factors must be considered, such as process and material data as well as the application of derived variables. In this paper, to predict abnormalities of these special machines, the data were first preprocessed through time series anomaly detection based on unsupervised learning. Next, clustering was applied to capture data characteristics and produce additional variables, and a training data set was created from the history of past facility abnormalities. Finally, a prediction methodology based on a supervised learning algorithm was applied, and updating the model was confirmed to improve the accuracy of facility failure prediction. By predicting machine abnormalities and extracting key factors, this approach is expected to improve the efficiency of facility operation through flexible adjustment of maintenance timing and parts supply and demand.
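The first and last data-preparation steps of this pipeline, unsupervised anomaly flagging and building supervised labels from the failure history, can be sketched as follows. The abstract does not name the anomaly detector or the labeling rule, so a rolling z-score detector and a fixed look-ahead horizon are assumptions; the clustering-based variables and the supervised ensemble model are not shown, and all names are hypothetical.

```python
import math


def rolling_zscore_flags(series, window, threshold):
    """Flag points deviating from the mean of the preceding `window` points
    by more than `threshold` standard deviations (unsupervised, no labels)."""
    flags = [False] * len(series)
    for t in range(window, len(series)):
        past = series[t - window:t]
        mu = sum(past) / window
        sd = math.sqrt(sum((x - mu) ** 2 for x in past) / window)
        if sd > 0 and abs(series[t] - mu) / sd > threshold:
            flags[t] = True
    return flags


def failure_labels(n_steps, failure_times, horizon):
    """Label step t as 1 if a recorded failure occurs within (t, t + horizon]."""
    labels = [0] * n_steps
    for f in failure_times:
        for t in range(max(0, f - horizon), min(f, n_steps)):
            labels[t] = 1
    return labels


# Usage on a hypothetical sensor trace with one spike and one recorded failure.
trace = [10, 11, 10, 11, 10, 11, 10, 11, 30, 10]
print(rolling_zscore_flags(trace, window=4, threshold=3.0))
print(failure_labels(len(trace), failure_times=[6], horizon=3))
```

The anomaly flags (and any cluster-derived variables) would become input features, and the look-ahead labels the targets, for whichever supervised model is trained afterwards.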

Developing efficient model updating approaches for different structural complexity - an ensemble learning and uncertainty quantifications

  • Lin, Guangwei;Zhang, Yi;Liao, Qinzhuo
    • Smart Structures and Systems / v.29 no.2 / pp.321-336 / 2022
  • Model uncertainty is a key factor that can influence the accuracy and reliability of numerical model-based analysis. An appropriate updating approach is needed to search for and determine realistic model parameter values from measurements. In this paper, Bayesian model updating combined with the transitional Markov chain Monte Carlo (TMCMC) method and K-means cluster analysis is used to update structural model parameters. Kriging and polynomial chaos expansion (PCE) are employed to build surrogate models that reduce the computational burden of TMCMC. The selected updating approaches are applied to three structural examples of different complexity: a two-storey frame, a ten-storey frame, and the national stadium model. These represent a low-dimensional linear model, a high-dimensional linear model, and a nonlinear model, respectively. Updating performance in the three models is assessed in terms of prediction uncertainty, numerical effort, and prior information. This study also investigates updating scenarios using the analytical approach and the surrogate models. Uncertainty quantification in the Bayesian approach is further discussed to verify the validity and accuracy of the surrogate models. Finally, the advantages and limitations of surrogate model-based updating approaches are discussed for structures of different complexity. The possibility of using the boosting algorithm as an ensemble learning method to improve the surrogate models is also presented.
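The core loop of TMCMC can be sketched for a single parameter. At each stage the tempering exponent β is increased so that the coefficient of variation (COV) of the plausibility weights L^Δβ is about 1, the samples are resampled by weight, and each sample takes one Metropolis-Hastings step targeting prior × likelihood^β. This is a minimal sketch under stated assumptions: one parameter, one MH step per sample, and a proposal scale of 0.2 times the weighted standard deviation (a tunable choice). The K-means clustering and the Kriging/PCE surrogates used in the paper are omitted, and the function names are hypothetical.

```python
import math
import random


def _weights(loglik, dbeta):
    """Plausibility weights L**dbeta, rescaled by the max for numerical safety."""
    m = max(loglik)
    return [math.exp(dbeta * (l - m)) for l in loglik]


def _cov(w):
    mean = sum(w) / len(w)
    return math.sqrt(sum((x - mean) ** 2 for x in w) / len(w)) / mean


def next_beta(loglik, beta, target_cov=1.0):
    """Choose the next tempering exponent so the weights' COV is about target_cov."""
    lo, hi = 0.0, 1.0 - beta
    if _cov(_weights(loglik, hi)) <= target_cov:
        return 1.0
    for _ in range(60):  # bisection on the increment dbeta
        mid = 0.5 * (lo + hi)
        if _cov(_weights(loglik, mid)) > target_cov:
            hi = mid
        else:
            lo = mid
    return beta + lo


def tmcmc(log_like, log_prior, prior_sample, n=2000, scale=0.2, seed=0):
    """Return n approximate posterior samples of a scalar parameter."""
    rng = random.Random(seed)
    theta = [prior_sample(rng) for _ in range(n)]
    ll = [log_like(t) for t in theta]
    beta = 0.0
    while beta < 1.0:
        new_beta = next_beta(ll, beta)
        w = _weights(ll, new_beta - beta)
        total = sum(w)
        w = [x / total for x in w]
        mu = sum(wi * t for wi, t in zip(w, theta))
        step = scale * math.sqrt(sum(wi * (t - mu) ** 2 for wi, t in zip(w, theta)))
        # Resample by weight, then one MH move targeting prior * likelihood**new_beta.
        new_theta, new_ll = [], []
        for i in rng.choices(range(n), weights=w, k=n):
            t, l = theta[i], ll[i]
            cand = t + rng.gauss(0.0, step)
            lp = log_prior(cand)
            if lp > -math.inf:
                lc = log_like(cand)
                log_a = lp + new_beta * lc - log_prior(t) - new_beta * l
                if rng.random() < math.exp(min(0.0, log_a)):
                    t, l = cand, lc
            new_theta.append(t)
            new_ll.append(l)
        theta, ll, beta = new_theta, new_ll, new_beta
    return theta


# Usage: hypothetical Gaussian data misfit centred at 1.0, uniform prior on [-5, 5].
samples = tmcmc(
    log_like=lambda t: -0.5 * ((t - 1.0) / 0.5) ** 2,
    log_prior=lambda t: 0.0 if -5.0 <= t <= 5.0 else -math.inf,
    prior_sample=lambda r: r.uniform(-5.0, 5.0),
)
print(sum(samples) / len(samples))
```

Each likelihood call is where a surrogate model (Kriging or PCE in the paper) would replace the expensive structural analysis; the tempering schedule itself is unchanged.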

Strategies to improve the range verification of stochastic origin ensembles for low-count prompt gamma imaging

  • Hsuan-Ming Huang
    • Nuclear Engineering and Technology / v.55 no.10 / pp.3700-3708 / 2023
  • The stochastic origin ensembles method with resolution recovery (SOE-RR) has been proposed to reconstruct proton-induced prompt gammas (PGs), and the reconstructed PG images are used for range verification. However, because of low detection efficiency, the number of valid events is low, and such low-count conditions can degrade the accuracy of SOE-RR for proton range verification. In this study, we proposed two strategies to improve SOE-RR reconstruction for low-count PG imaging. We also studied the number of iterations and repetitions required to achieve reliable range verification. We simulated a proton beam (10⁸ protons) irradiating a water phantom and used a two-layer Compton camera to detect 4.44-MeV PGs. Our simulation results show that combining the SOE-RR algorithm with a restricted volume (SOE-RR-RV) can reduce the error in the estimated Bragg peak position from 5.0 mm to 2.5 mm. We also found that initializing the SOE-RR-RV algorithm with a back-projection image could improve the convergence rate while maintaining accurate range verification. Finally, we observed that the improved SOE-RR algorithm with 60,000 iterations and 25 repetitions could provide reliable PG images. Based on the proposed reconstruction strategies, the SOE-RR algorithm has the potential to achieve a positioning error of 2.5 mm for low-count PG imaging.
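The stochastic origin ensembles idea can be illustrated on a toy 1-D voxel grid: each detected event may have originated from any voxel in its candidate set, and origins are moved at random, with moves into denser voxels favoured. This sketch assumes equal voxel sensitivities, for which the acceptance ratio reduces to min(1, (n_new + 1)/n_old); resolution recovery, Compton-cone geometry, and the paper's exact restricted-volume definition are not modeled. The `restrict` helper is only our stand-in for the restricted-volume idea, and all names are hypothetical.

```python
import random


def restrict(candidates, allowed):
    """Stand-in for a restricted volume: drop candidate voxels outside `allowed`
    (an event whose candidates all fall outside keeps its original list)."""
    return [[v for v in c if v in allowed] or c for c in candidates]


def soe_reconstruct(candidates, n_voxels, n_iter, burn_in, seed):
    """Toy SOE on a 1-D grid. candidates[e] lists the voxels event e could
    have come from (standing in for its Compton cone). Requires burn_in < n_iter.
    Returns the time-averaged voxel counts and the final counts."""
    rng = random.Random(seed)
    origin = [rng.choice(c) for c in candidates]  # random initial origins
    counts = [0] * n_voxels
    for v in origin:
        counts[v] += 1
    image = [0.0] * n_voxels
    n_avg = 0
    for it in range(n_iter):
        e = rng.randrange(len(candidates))
        old, new = origin[e], rng.choice(candidates[e])
        # Equal-sensitivity SOE acceptance: min(1, (n_new + 1) / n_old).
        if new != old and rng.random() < (counts[new] + 1) / counts[old]:
            counts[old] -= 1
            counts[new] += 1
            origin[e] = new
        if it >= burn_in:  # accumulate the density after burn-in
            for v in range(n_voxels):
                image[v] += counts[v]
            n_avg += 1
    return [x / n_avg for x in image], counts


# Usage: 40 hypothetical events that all could have come from voxel 5.
gen = random.Random(1)
others = [v for v in range(20) if v != 5]
cands = [[5] + gen.sample(others, 2) for _ in range(40)]
image, counts = soe_reconstruct(cands, 20, 20000, 5000, seed=0)
print(max(range(20), key=lambda v: image[v]))
```

Because each event's proposal is uniform over its own candidates, the proposal is symmetric and the acceptance ratio alone sets the equilibrium, which concentrates origins where many events' candidate sets overlap.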

Prediction of Track Quality Index (TQI) Using Vehicle Acceleration Data based on Machine Learning (차량가속도데이터를 이용한 머신러닝 기반의 궤도품질지수(TQI) 예측)

  • Choi, Chanyong;Kim, Hunki;Kim, Young Cheul;Kim, Sang-su
    • Journal of the Korean Geosynthetics Society / v.19 no.1 / pp.45-53 / 2020
  • Machine learning techniques are increasingly used in the railway industry to perform predictive analysis on measurement data. In this paper, the track quality index (TQI) was predicted from vehicle acceleration data using machine learning methods. XGBoost (XGB) was the most accurate, at 85% across all data sets. Compared with the SVM model, which uses a single algorithm, the random forest (RF) and XGB models, which use ensemble systems, showed better prediction performance. For the surface TQI, z-axis acceleration, which corresponds to the vertical direction, was highly related to the index, in good agreement with previous studies. Because accuracy can vary with the machine learning model applied, it is appropriate to use an ensemble-based model to predict the TQI from vehicle vibration acceleration data.
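To illustrate the ensemble idea behind the RF model, the sketch below averages least-squares regression stumps trained on bootstrap samples. It shows only bagging (without random forest's per-split feature subsampling) and is not XGBoost, which builds trees sequentially by gradient boosting; the features, data, and function names are hypothetical.

```python
import random


def fit_stump(rows, ys):
    """Least-squares regression stump: the best single-feature threshold split."""
    best = None  # (sse, feature, threshold, left_mean, right_mean)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, ys) if r[f] <= thr]
            right = [y for r, y in zip(rows, ys) if r[f] > thr]
            lm, rm = sum(left) / len(left), sum(right) / len(right)
            sse = sum((y - lm) ** 2 for y in left) + sum((y - rm) ** 2 for y in right)
            if best is None or sse < best[0]:
                best = (sse, f, thr, lm, rm)
    if best is None:  # every feature is constant in this sample: predict the mean
        m = sum(ys) / len(ys)
        return lambda r: m
    _, f, thr, lm, rm = best
    return lambda r: lm if r[f] <= thr else rm


def fit_bagged_stumps(rows, ys, n_models, seed):
    """Bagging: train each stump on a bootstrap sample, average their outputs."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]  # sample with replacement
        models.append(fit_stump([rows[i] for i in idx], [ys[i] for i in idx]))
    return lambda r: sum(m(r) for m in models) / len(models)


# Usage: a hypothetical one-feature data set with a step in the target.
rows = [[x] for x in range(10)]
ys = [0.0] * 5 + [10.0] * 5
model = fit_bagged_stumps(rows, ys, n_models=25, seed=0)
print(model([1]), model([9]))
```

Averaging over bootstrap replicates reduces the variance of individual weak models; with real acceleration features, each row would hold several signal statistics (for example per-axis values) rather than a single number.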