• Title/Summary/Keyword: Ensemble Algorithm


Comparing the Performance of 17 Machine Learning Models in Predicting Human Population Growth of Countries

  • Otoom, Mohammad Mahmood
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.1
    • /
    • pp.220-225
    • /
    • 2021
  • Human population growth rate is an important parameter for real-world planning. Common approaches rely on fixed parameters such as human population, mortality rate, and fertility rate, which are collected historically to determine a region's population growth rate. The literature does not provide a solution for areas with no historical knowledge. In such areas, machine learning can solve the problem, but the multitude of machine learning algorithms makes it difficult to determine the best approach. Further, missing features are a common real-world problem. Thus, it is essential to compare machine learning techniques and select those that perform best and are most robust in the presence of missing features. This study compares the performance of 17 machine learning techniques (base learners and ensemble learners) in predicting the human population growth rate of a country. Among the 17 techniques, random forest outperformed all others in both predictive performance and robustness to missing features. Thus, the study demonstrates and compares machine learning techniques for predicting the human population growth rate in settings where historical data and feature information are not available. Further, the study identifies the best machine learning algorithm for population growth rate prediction.

A Computerized Doughty Predictor Framework for Corona Virus Disease: Combined Deep Learning based Approach

  • P, Ramya;Babu S, Venkatesh
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.2018-2043
    • /
    • 2022
  • COVID-19 infections, which have spread globally, are influencing our daily lives. The major symptoms of COVID-19 are dry cough, sore throat, and fever, which can in turn lead to critical complications such as multi-organ failure and acute respiratory distress syndrome. Therefore, to hinder the spread of COVID-19, a Computerized Doughty Predictor Framework (CDPF) is developed to monitor the progression of the disease from chest CT images, which will reduce mortality rates significantly. The proposed CDPF employs a Convolutional Neural Network (CNN) as a feature extractor to extract features from CT images. Subsequently, the extracted features are fed into the Adaptive Dragonfly Algorithm (ADA) to select the most significant features, which drive the diagnosis of COVID and non-COVID cases with the support of Doughty Learners (DL). This paper uses the publicly available SARS-CoV-2 and GitHub COVID CT datasets, which contain 2482 and 812 CT images, respectively, with two class labels, COVID+ and COVID-. The performance of CDPF is evaluated against existing state-of-the-art approaches and shows the superiority of CDPF, with a diagnosis accuracy of about 99.76%.

Effect of input variable characteristics on the performance of an ensemble machine learning model for algal bloom prediction (앙상블 머신러닝 모형을 이용한 하천 녹조발생 예측모형의 입력변수 특성에 따른 성능 영향)

  • Kang, Byeong-Koo;Park, Jungsu
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.35 no.6
    • /
    • pp.417-424
    • /
    • 2021
  • Algal bloom is an ongoing issue in the management of freshwater systems for drinking water supply, and the chlorophyll-a concentration is commonly used to represent the status of algal bloom. Thus, the prediction of chlorophyll-a concentration is essential for the proper management of water quality. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, so predicting it is not an easy task. In recent years, advanced machine learning algorithms have increasingly been used to develop surrogate models that predict the chlorophyll-a concentration in freshwater systems such as rivers or reservoirs. This study used a light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict chlorophyll-a concentration. The field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used for the development of the model. The data include temperature, pH, electric conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration by using the other seven items as independent input variables. Second, the time-lagged values of all the input variables were added as input variables to understand the effect of the time lag of input variables on model performance. The time lag (i) ranges from 1 to 50 days. The model performance was evaluated using three indices: root mean squared error-observation standard deviation ratio (RSR), Nash-Sutcliffe coefficient of efficiency (NSE), and mean absolute error (MAE). The model showed the best performance when a dataset with a one-day time lag (i=1) was added, where RSR, NSE, and MAE were 0.359, 0.871, and 1.510, respectively. An improvement in model performance was observed when datasets with a time lag of up to about 15 days (i=15) were added.
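
As a rough illustration of the three indices named in the abstract above, the sketch below computes RSR, NSE, and MAE from paired observed and simulated values. It assumes the commonly used definitions (RSR as the root of the squared-error sum divided by the root of the squared deviations of the observations from their mean, and NSE as one minus the ratio of those sums); the abstract does not state the exact formulas the authors used, and the function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(observed, simulated):
    """Return (RSR, NSE, MAE) for paired observed and simulated values.

    Assumed definitions (not taken from the paper):
      RSR = sqrt(sum((o - s)^2)) / sqrt(sum((o - mean(o))^2))
      NSE = 1 - sum((o - s)^2) / sum((o - mean(o))^2)
      MAE = mean(|o - s|)
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    # Sum of squared errors between observations and predictions.
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Sum of squared deviations of the observations from their mean.
    sst = sum((o - mean_obs) ** 2 for o in observed)
    rsr = math.sqrt(sse) / math.sqrt(sst)
    nse = 1.0 - sse / sst
    mae = sum(abs(o - s) for o, s in zip(observed, simulated)) / n
    return rsr, nse, mae


# Hypothetical sample values, only to exercise the function.
rsr, nse, mae = evaluate([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(round(rsr, 3), round(nse, 3), round(mae, 3))
```

Under these assumed definitions RSR squared equals 1 minus NSE, which is consistent with the reported values (0.359 squared is about 0.129, and 1 minus 0.871 is 0.129).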

A Non-annotated Recurrent Neural Network Ensemble-based Model for Near-real Time Detection of Erroneous Sea Level Anomaly in Coastal Tide Gauge Observation (비주석 재귀신경망 앙상블 모델을 기반으로 한 조위관측소 해수위의 준실시간 이상값 탐지)

  • LEE, EUN-JOO;KIM, YOUNG-TAEG;KIM, SONG-HAK;JU, HO-JEONG;PARK, JAE-HUN
    • The Sea:JOURNAL OF THE KOREAN SOCIETY OF OCEANOGRAPHY
    • /
    • v.26 no.4
    • /
    • pp.307-326
    • /
    • 2021
  • Real-time sea level observations from tide gauges include missing and erroneous values. The latter can be classified as abnormal values by a quality control procedure. Although the 3𝜎 (three standard deviations) rule has generally been applied to eliminate them, it is difficult to apply to sea-level data, where extreme values can exist due to weather events, etc., or where erroneous values can exist even within the 3𝜎 range. The artificial intelligence model set designed in this study consists of non-annotated recurrent neural networks and ensemble techniques that do not require pre-labeling of the abnormal values. The developed model can identify an erroneous value within less than 20 minutes of the tide gauge recording an abnormal sea level. The validated model separates normal and abnormal values well during both normal times and weather events. It was also confirmed that abnormal values can be detected even in years whose sea level data were not used for training. The artificial neural network algorithm utilized in this study is not limited to coastal sea level, and hence it can be extended to detection models for erroneous values in various oceanic and atmospheric data.
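
For context on the baseline the abstract above contrasts against, here is a minimal sketch of the 3𝜎 rule: a value is flagged when it lies more than three standard deviations from the mean of the series. This is not the authors' recurrent neural network ensemble; the function name `three_sigma_outliers` and the sample sea-level values are hypothetical.

```python
import statistics


def three_sigma_outliers(values):
    """Return indices of values more than 3 standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # constant series: nothing can be flagged
    return [i for i, v in enumerate(values) if abs(v - mean) > 3 * std]


# Hypothetical series: 20 ordinary readings followed by one large spike.
levels = [0.9, 1.1] * 10 + [10.0]
print(three_sigma_outliers(levels))

# A smaller error (1.3 instead of 10.0) stays inside the 3-sigma band
# and is therefore missed by this rule.
print(three_sigma_outliers([0.9, 1.1] * 10 + [1.3]))
```

The second call shows the weakness the abstract describes: an erroneous value that stays within the 3𝜎 range is not detected by this rule.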

Development of an Ensemble-Based Multi-Region Integrated Odor Concentration Prediction Model (앙상블 기반의 악취 농도 다지역 통합 예측 모델 개발)

  • Seong-Ju Cho;Woo-seok Choi;Sang-hyun Choi
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.383-400
    • /
    • 2023
  • Air pollution-related diseases are escalating worldwide, with the World Health Organization (WHO) estimating approximately 7 million annual deaths in 2022. The rapid expansion of industrial facilities, increased emissions from various sources, and uncontrolled release of odorous substances have brought air pollution to the forefront of societal concerns. In South Korea, odor is categorized as an independent environmental pollutant, alongside air and water pollution, directly impacting the health of local residents by causing discomfort and aversion. However, the current odor management system in Korea remains inadequate, necessitating improvements. This study aims to enhance the odor management system by analyzing 1,010,749 data points collected from odor sensors located in Osong, Chungcheongbuk-do, using an Ensemble-Based Multi-Region Integrated Odor Concentration Prediction Model. The research results demonstrate that the model based on the XGBoost algorithm exhibited superior performance, with an RMSE of 0.0096, significantly outperforming the single-region model (0.0146) with a 51.9% reduction in mean error size. This underscores the potential for increasing data volume, improving accuracy, and enabling odor prediction in diverse regions using a unified model through the standardization of odor concentration data collected from various regions.

Boosting Algorithms for Large-Scale Data and Data Batch Stream (대용량 자료와 순차적 자료를 위한 부스팅 알고리즘)

  • Yoon, Young-Joo
    • The Korean Journal of Applied Statistics
    • /
    • v.23 no.1
    • /
    • pp.197-206
    • /
    • 2010
  • In this paper, we propose boosting algorithms for data that are very large or that arrive in batches sequentially over time. In this situation, an ordinary boosting algorithm may be inappropriate because it requires the whole training set to be available at once. To handle large-scale data or data batch streams, we modify AdaBoost and Arc-x4. These algorithms give good results for both large-scale data and data batch streams, with or without concept drift, on simulated data and real data sets.

Uncertainty decomposition in climate-change impact assessments: a Bayesian perspective

  • Ohn, Ilsang;Seo, Seung Beom;Kim, Seonghyeon;Kim, Young-Oh;Kim, Yongdai
    • Communications for Statistical Applications and Methods
    • /
    • v.27 no.1
    • /
    • pp.109-128
    • /
    • 2020
  • A climate-impact projection usually consists of several stages, and the uncertainty of the projection is known to be quite large. It is necessary to assess how much each stage contributes to the uncertainty. We refer to an uncertainty quantification method in which the relative contribution of each stage can be evaluated as uncertainty decomposition. We propose a new Bayesian model for uncertainty decomposition in climate change impact assessments. The proposed Bayesian model can incorporate the uncertainty of natural variability and utilize data from the control period. We provide a simple and efficient Gibbs sampling algorithm using the auxiliary variable technique. We compare the proposed method with other existing uncertainty decomposition methods by analyzing streamflow data for the Yongdam Dam basin on the Geum River in South Korea.

Optimal Classifier Ensemble for Lymphoma Cancer Using Genetic Algorithm (유전자 알고리즘을 이용한 림프종 암의 최적 분류기 앙상블)

  • 박찬호;조성배
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.04c
    • /
    • pp.356-358
    • /
    • 2003
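  • Advances in DNA microarray technology have made it possible to obtain expression information for thousands of genes at once. A system that classifies such data effectively can predict whether a new sample is in a normal state or a diseased state. Various feature selection methods and classification techniques can be used for such a system, but it is hard to find a feature selection method or classifier that performs well in every situation. An ensemble of feature-classifier pairs can be used to obtain stable and improved performance. However, when many feature selection methods and classifiers are available, the number of possible ensemble combinations becomes so large that computing the ensemble result for every combination is nearly impossible. To address this, this paper proposes a method that uses a genetic algorithm to find the optimal ensemble without computing all ensemble results. Applied to real lymphoma cancer data, the method effectively found an optimal ensemble that achieved a combined result of 100%.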
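
The idea in the abstract above, searching the space of ensemble combinations with a genetic algorithm instead of enumerating it, can be sketched as follows. This is an illustrative sketch, not the authors' implementation: each candidate is assumed to be a bit mask over the available feature-classifier pairs, and the function `genetic_search`, its parameters, and the toy fitness are all hypothetical. In practice the fitness would be something like the validation accuracy of the ensemble selected by the mask.

```python
import random


def genetic_search(fitness, n_bits, pop_size=20, generations=40,
                   mutation_rate=0.05, seed=0):
    """Search bit masks of length n_bits (n_bits >= 2) for high fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_gen = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: best of 3 random individuals, twice.
            p1 = max(rng.sample(population, 3), key=fitness)
            p2 = max(rng.sample(population, 3), key=fitness)
            # One-point crossover.
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best, fitness(best)


# Toy stand-in for ensemble accuracy: count positions that agree with a
# hidden "ideal" selection of feature-classifier pairs (hypothetical).
TARGET = [1, 0, 1, 1, 0, 0, 1, 0]


def toy_fitness(mask):
    return sum(m == t for m, t in zip(mask, TARGET))


best_mask, best_score = genetic_search(toy_fitness, n_bits=len(TARGET))
print(best_mask, best_score)
```

Because the best mask is copied into each new generation, the best fitness never decreases from one generation to the next, while only `pop_size` candidates are evaluated per generation rather than all 2^n combinations.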


A Study on Designing the System of Vital and Environmental Sensor for Future Soldier System (미래병사 생체환경센서 시스템 설계에 관한 연구)

  • Kim, Hyun-Jun;Chae, Je-Wook;Choe, Eui-Jung
    • Journal of the Korea Institute of Military Science and Technology
    • /
    • v.16 no.3
    • /
    • pp.233-239
    • /
    • 2013
  • This paper presents a noise elimination algorithm, sensor processing techniques, and the design results for the vital and environmental sensor, one of the survivability subsystems of the Future Soldier System. We propose adaptive filtering and removal of motion noise in order to detect stable signals. These techniques support obtaining bio-signals with ECG calculation methods such as the search-back and ensemble methods. The vital and environmental sensor is built to include a flexible sensor. In that sense, this study can be applied when planning a modular Future Soldier System.

Analysis of Mechanical Behavior of Nanowire by Nosé-Poincaré Molecular Dynamics Simulation (Nosé-Poincaré 분자 동역학 알고리즘을 이용한 나노 와이어의 역학적 거동 해석)

  • Lee, Byeong-Yong;Cho, Maeng-Hyo
    • Proceedings of the KSME Conference
    • /
    • 2007.05a
    • /
    • pp.506-511
    • /
    • 2007
  • The mechanical behavior of a copper nanowire is investigated. An FCC nanowire model composed of 1,408 atoms is used for the MD simulation. Simulations are performed in the NVT ensemble setting without periodic boundary conditions. The Nosé-Poincaré MD algorithm is employed to guarantee preservation of the Hamiltonian and temperature. Numerical tensile tests of the nanowire are carried out at a constant strain rate. Additionally, temperature and strain rate effects are considered. The stress-strain curve is constructed from the calculated Cauchy stresses and specified strain values. In the (22,4,4) copper nanowire, non-linear behavior appears around $\epsilon \simeq 0.09$. At this point, the start of structural reorientation is observed. The modulus characteristics at the onset of reorientation are also investigated.
