• Title/Summary/Keyword: Ensemble prediction

Particular Matter Concentration Prediction Models Based on EEMD (EEMD 기반의 미세먼지 농도 예측 모델)

  • Jung, Yong-jin;Lee, Jong-sung;Oh, Chang-heon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.345-347 / 2021
  • Various studies aim to improve the accuracy of fine dust prediction, but deep learning models train poorly because the characteristics of the data vary with the fine dust concentration. This paper proposes a fine dust concentration prediction model based on EEMD (ensemble empirical mode decomposition) that decomposes the concentration series into its characteristic components and reflects them in the prediction. After the fine dust concentration is decomposed with EEMD, a prediction is made for each resulting component, and the final concentration value is obtained by combining these predictions as an ensemble. In the performance evaluation, the model achieved a fine dust concentration prediction accuracy of 91.7%.

A Real-Time Data Mining for Stream Data Sets (연속발생 데이터를 위한 실시간 데이터 마이닝 기법)

  • Kim Jinhwa;Min Jin Young
    • Journal of the Korean Operations Research and Management Science Society / v.29 no.4 / pp.41-60 / 2004
  • Stream data is a data set that accumulates continuously in data storage from a data source over time. In many cases, the size of this data set grows increasingly large over time. Mining information from this massive data takes considerable resources such as storage, memory, and time. These characteristics make it difficult and expensive to use the large volume of data accumulated over time. On the other hand, if only recent data or part of the whole data is used to mine information or patterns, potentially useful information can be lost. To avoid this problem, we suggest a method that efficiently accumulates information, in the form of rule sets, over time. It requires much less storage than traditional mining methods. These accumulated rule sets are used as prediction models in the future. According to theories of ensemble approaches, a combination of many prediction models, here in the form of systematically merged rule sets, performs better than a single prediction model. This study uses a customer data set to predict the buying power of customers based on their information. It tests the performance of the suggested method on this data set along with general prediction methods and compares their performance.

An Accurate Stock Price Forecasting with Ensemble Learning Based on Sentiment of News (뉴스 감성 앙상블 학습을 통한 주가 예측기의 성능 향상)

  • Kim, Ha-Eun;Park, Young-Wook;Yoo, Si-eun;Jeong, Seong-Woo;Yoo, Joonhyuk
    • IEMEK Journal of Embedded Systems and Applications / v.17 no.1 / pp.51-58 / 2022
  • Stock price forecasting has been studied extensively because accurate forecasts contribute to stability in the national economy and large profits for investors. Recently, many studies have proposed stock price prediction models using various input data such as macroeconomic indicators and sentiment analysis. However, since each study was conducted individually, it is difficult to compare the methods objectively, and studies on the impact of each input on stock price prediction are still insufficient. In this paper, the effect of the commonly used input data on the stock price is evaluated through the error rate between the value predicted by a deep learning model and the actual stock price. In addition, unlike most papers on sentiment analysis, sentiment analysis was performed on the news body, and a method is proposed in which three sentiment analysis models complement one another's results. In experiments predicting Microsoft's adjusted closing price, the sentiment analysis results were found to be the most important factor in stock price prediction. In particular, when all of the input data is used, the error rate of the ensembled sentiment analysis model is reduced by 58% compared to the baseline.

Predicting Administrative Issue Designation in KOSDAQ Market Using Machine Learning Techniques (머신러닝을 활용한 코스닥 관리종목지정 예측)

  • Chae, Seung-Il;Lee, Dong-Joo
    • Asia-Pacific Journal of Business / v.13 no.2 / pp.107-122 / 2022
  • Purpose - This study aims to develop machine learning models to predict administrative issue designation in the KOSDAQ market using financial data. Design/methodology/approach - Applying four classification techniques (logistic regression, support vector machine, random forest, and gradient boosting) to a matched sample of 536 firms over an eight-year period, the authors develop prediction models and explore their practicality. Findings - The resulting four binary selection models show satisfactory overall classification performance in terms of various measures including AUC (area under the receiver operating characteristic curve), accuracy, F1-score, and top quartile lift, while the ensemble models (random forest and gradient boosting) outperform the others on most measures. Research implications or Originality - Although the assessment of the administrative issue potential of firms is critical information for investors and financial institutions, detailed empirical investigation has lagged behind. The current research fills this gap in the literature by proposing parsimonious prediction models based on a few financial variables and validating the applicability of the models.
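
Two of the measures named in this abstract, AUC and top quartile lift, can be illustrated with a minimal Python sketch. The abstract does not define top quartile lift, so the definition used here (positive rate among the top 25% of scores divided by the overall positive rate) is an assumption, and the labels and scores are hypothetical rather than taken from the paper.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_quartile_lift(y_true, scores):
    """Positive rate in the top 25% of scores divided by the overall positive rate.

    This definition is an assumption; the abstract does not spell it out.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    k = max(1, len(ranked) // 4)
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Hypothetical labels (1 = designated) and model scores, for illustration only.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6, 0.2, 0.1]

auc = roc_auc(y_true, scores)             # 13 of 15 positive/negative pairs ranked correctly
lift = top_quartile_lift(y_true, scores)  # 0.5 / 0.375
```

A lift above 1 means the highest-scored quarter of firms contains proportionally more designated firms than the sample as a whole.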

An Exploratory Study on the Prediction of Business Survey Index Using Data Mining (기업경기실사지수 예측에 대한 탐색적 연구: 데이터 마이닝을 이용하여)

  • Kyungbo Park;Mi Ryang Kim
    • Journal of Information Technology Services / v.22 no.4 / pp.123-140 / 2023
  • In recent times, the global economy has been subject to increasing volatility, which has made it considerably more difficult to accurately predict economic indicators compared to previous periods. In response to this challenge, the present study conducts an exploratory investigation that aims to predict the Business Survey Index (BSI) by leveraging data mining techniques on both structured and unstructured data sources. For the structured data, we have collected information regarding foreign, domestic, and industrial conditions, while the unstructured data consists of content extracted from newspaper articles. By employing an extensive set of 44 distinct data mining techniques, our research strives to enhance the BSI prediction accuracy and provide valuable insights. The results of our analysis demonstrate that the highest predictive power was attained when using data exclusively from the t-1 period. Interestingly, this suggests that previous timeframes play a vital role in forecasting the BSI effectively. The findings of this study hold significant implications for economic decision-makers, as they will not only facilitate better-informed decisions but also serve as a robust foundation for predicting a wide range of other economic indicators. By improving the prediction of crucial economic metrics, this study ultimately aims to contribute to the overall efficacy of economic policy-making and decision processes.

Machine learning application to seismic site classification prediction model using Horizontal-to-Vertical Spectral Ratio (HVSR) of strong-ground motions

  • Francis G. Phi;Bumsu Cho;Jungeun Kim;Hyungik Cho;Yun Wook Choo;Dookie Kim;Inhi Kim
    • Geomechanics and Engineering / v.37 no.6 / pp.539-554 / 2024
  • This study explores the development of a prediction model for seismic site classification by integrating machine learning techniques with horizontal-to-vertical spectral ratio (HVSR) methodologies. To improve model accuracy, the research employs outlier detection methods and the synthetic minority over-sampling technique (SMOTE) for data balancing, and evaluates seven machine learning models on seismic data from KiK-net. Notably, the light gradient boosting method (LGBM), gradient boosting, and decision tree models exhibit improved performance when coupled with SMOTE, while the multiple linear regression (MLR) and support vector machine (SVM) models show reduced efficacy. Outlier detection techniques significantly enhance accuracy, particularly for LGBM, gradient boosting, and voting boosting. The ensemble of LGBM with the isolation forest and SMOTE achieves the highest accuracy of 0.91, with LGBM and local outlier factor yielding the highest F1-score of 0.79. Consistently outperforming other models, LGBM proves most efficient for seismic site classification when supported by appropriate preprocessing procedures. These findings show the significance of outlier detection and data balancing for precise seismic soil classification prediction, offering insights and highlighting the potential of machine learning in optimizing site classification accuracy.
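
The data-balancing step mentioned in this abstract can be sketched in plain Python. The following is a simplified illustration of the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours), not the authors' pipeline; the function name `smote`, the parameters `k` and `seed`, and the sample points are all hypothetical.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # The k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical two-feature minority-class samples, for illustration only.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
majority_count = 10

# Generate enough synthetic points to match the majority class size.
new_points = smote(minority, n_new=majority_count - len(minority), k=2, seed=42)
balanced_minority = minority + new_points
```

Because each synthetic point lies on a segment between two existing minority samples, the new points stay inside the region the minority class already occupies instead of duplicating samples exactly.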

Applicability Assessment of Hydrological Drought Outlook Using ESP Method (ESP 기법을 이용한 수문학적 가뭄전망의 활용성 평가)

  • Son, Kyung Hwan;Bae, Deg Hyo
    • Journal of Korea Water Resources Association / v.48 no.7 / pp.581-593 / 2015
  • This study constructs a drought outlook system using the ESP (Ensemble Streamflow Prediction) method and evaluates its usefulness for drought prediction. Historical Runoff (HR) was estimated by applying an LSM (Land Surface Model) to observed meteorological, hydrological, and topographical data in South Korea. Predicted Runoff (PR) was produced for different lead times (1, 2, and 3 months) using 30 years of past meteorological data and the initial soil moisture condition. The HR accuracy was higher during MAM and DJF than during JJA and SON, and the prediction accuracy decreased sharply beyond the 1-month outlook. The SRI (Standardized Runoff Index), whose feasibility for domestic drought analysis has been verified, was used for the drought outlook, and PR_SRI was evaluated. The accuracy of PR_SRI at 1- and 2-month lead times increased substantially because it takes into account the accumulated 1- and 2-month HR, respectively. The Correlation Coefficient (CC) was 0.71, 0.48, and 0.00, and the Root Mean Square Error (RMSE) was 0.46, 0.76, and 1.01 for 1-, 2-, and 3-month lead times, respectively, and the accuracy was higher in the dry season. It is concluded that the ESP method is applicable to domestic drought prediction for lead times of 1 and 2 months.
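
The two verification scores reported in this abstract, CC and RMSE, follow standard definitions and can be computed as in the sketch below. The index values are hypothetical and are not data from the study; CC is assumed here to be the Pearson correlation coefficient.

```python
import math


def correlation_coefficient(observed, predicted):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_o * var_p)


def rmse(observed, predicted):
    """Root mean square error between two equal-length series."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


# Hypothetical observed and predicted index values, for illustration only.
observed_sri = [-1.0, -0.5, 0.0, 0.5, 1.0]
predicted_sri = [-0.8, -0.6, 0.1, 0.3, 1.2]

cc = correlation_coefficient(observed_sri, predicted_sri)
error = rmse(observed_sri, predicted_sri)
```

A CC near 1 with a small RMSE indicates a skilful outlook, whereas the 3-month values reported in the abstract (CC 0.00, RMSE 1.01) indicate essentially no skill.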

Mobile health service user characteristics analysis and churn prediction model development (모바일 헬스 서비스 사용자 특성 분석 및 이탈 예측 모델 개발)

  • Han, Jeong Hyeon;Lee, Joo Yeoun
    • Journal of the Korean Society of Systems Engineering / v.17 no.2 / pp.98-105 / 2021
  • As average life expectancy rises, the population is aging and chronic diseases are becoming more common. This has increased the importance of healthy living and health management, and interest in mobile health services is rising thanks to the development of ICT (information and communication technologies) and the spread of smartphone use. To meet this interest, many mobile services related to daily health are being launched. In this study, the characteristics of users who actually use mobile health services were analyzed, and a predictive model was developed using machine learning. Specifically, we developed a prediction model that applies decision tree and ensemble methods. The results indicate that users' continued use of mobile health services can be induced by providing features that require frequent visits, suggesting achievable activity missions, and guiding users to connect sensors for activity measurement.

Early Detection of Rice Leaf Blast Disease using Deep-Learning Techniques

  • Syed Rehan Shah;Syed Muhammad Waqas Shah;Hadia Bibi;Mirza Murad Baig
    • International Journal of Computer Science & Network Security / v.24 no.4 / pp.211-221 / 2024
  • Pakistan is a top producer and exporter of high-quality rice, but traditional methods are still being used for detecting rice diseases. This research project developed an automated rice blast disease diagnosis technique based on deep learning, image processing, and transfer learning with pre-trained models such as Inception V3, VGG16, VGG19, and ResNet50. The modified skip-connection ResNet50 had the highest accuracy of 99.16%, while the other models achieved 98.16%, 98.47%, and 98.56%, respectively. In addition, CNN and an ensemble model K-nearest neighbor were explored for disease prediction, and the study demonstrated superior performance and disease prediction using recommended web-app approaches.

Identifying the Optimal Machine Learning Algorithm for Breast Cancer Prediction

  • ByungJoo Kim
    • International journal of advanced smart convergence / v.13 no.3 / pp.80-88 / 2024
  • Breast cancer remains a significant global health burden, necessitating accurate and timely detection for improved patient outcomes. Machine learning techniques have demonstrated remarkable potential in assisting breast cancer diagnosis by learning complex patterns from multi-modal patient data. This study comprehensively evaluates several popular machine learning models, including logistic regression, decision trees, random forests, support vector machines (SVMs), naive Bayes, k-nearest neighbors (KNN), XGBoost, and ensemble methods for breast cancer prediction using the Wisconsin Breast Cancer Dataset (WBCD). Through rigorous benchmarking across metrics like accuracy, precision, recall, F1-score, and area under the ROC curve (AUC), we identify the naive Bayes classifier as the top-performing model, achieving an accuracy of 0.974, F1-score of 0.979, and highest AUC of 0.988. Other strong performers include logistic regression, random forests, and XGBoost, with AUC values exceeding 0.95. Our findings showcase the significant potential of machine learning, particularly the robust naive Bayes algorithm, to provide highly accurate and reliable breast cancer screening from fine needle aspirate (FNA) samples, ultimately enabling earlier intervention and optimized treatment strategies.
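
The threshold-based metrics listed in this abstract (accuracy, precision, recall, and F1-score) all derive from the confusion-matrix counts, as the following minimal sketch shows. The labels and predictions are hypothetical and unrelated to the WBCD results; the function name `classification_metrics` is likewise illustrative.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels and predictions (1 = malignant), for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

metrics = classification_metrics(y_true, y_pred)
# Here TP = 3, TN = 5, FP = 1, FN = 1, so accuracy is 0.8 and the other three are 0.75.
```

In a screening setting, recall is often weighted more heavily than precision because a missed malignant case is usually costlier than a false alarm.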