• Title/Summary/Keyword: Ensemble system

Prediction and Analysis of PM2.5 Concentration in Seoul Using Ensemble-based Model (앙상블 기반 모델을 이용한 서울시 PM2.5 농도 예측 및 분석)

  • Ryu, Minji;Son, Sanghun;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.38 no.6_1 / pp.1191-1205 / 2022
  • Particulate matter (PM), an air pollutant with complex and widespread causes, is classified by particle size. PM2.5 is very small and can cause respiratory or cardiovascular disease when inhaled. To prepare for these risks, state-level management together with preventive monitoring and forecasting is important. This study predicted PM2.5 in Seoul, where high concentrations of fine dust occur frequently, using two ensemble models, random forest (RF) and extreme gradient boosting (XGB), with 15 weather-related factors from the local data assimilation and prediction system (LDAPS), aerosol optical depth (AOD), and 4 chemical factors as independent variables. The performance and factor importance of the two models were evaluated, and a seasonal model analysis was also performed. RF achieved a prediction accuracy of R2 = 0.85 and XGB achieved R2 = 0.91, confirming that XGB was more suitable than RF for PM2.5 prediction. The seasonal analysis showed good prediction performance relative to the observed values in spring, when concentrations were high. In summary, PM2.5 in Seoul was predicted using various factors, and a well-performing ensemble-based PM2.5 prediction model was constructed.
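
The RF and XGB comparison above rests on the R2 score. As a minimal sketch, assuming R2 here denotes the coefficient of determination, it can be computed as follows. The helper `r_squared` and all numbers are hypothetical illustrations, not data or code from the study.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Made-up PM2.5 values (illustrative only, not from the paper).
observed = [12.0, 30.0, 45.0, 22.0, 60.0]
model_a = [15.0, 27.0, 40.0, 25.0, 52.0]  # stand-in for one model's predictions
model_b = [13.0, 29.0, 44.0, 23.0, 57.0]  # stand-in for a closer-fitting model

score_a = r_squared(observed, model_a)
score_b = r_squared(observed, model_b)
print(round(score_a, 3), round(score_b, 3))
```

A higher score means the predictions explain more of the variance in the observations, which is the sense in which one model is judged more suitable than the other.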

Development of Impact-based Heat Health Warning System Based on Ensemble Forecasts of Perceived Temperature and its Evaluation using Heat-Related Patients in 2019 (인지온도 확률예보기반 폭염-건강영향예보 지원시스템 개발 및 2019년 온열질환자를 이용한 평가)

  • Kang, Misun;Belorid, Miloslav;Kim, Kyu Rang
    • Atmosphere / v.30 no.2 / pp.195-207 / 2020
  • This study introduces the structure of the impact-based heat health warning system for 165 counties in South Korea developed by the National Institute of Meteorological Sciences. The system uses the daily maximum perceived temperature (PTmax), a human physiology-based thermal comfort index, and the Local ENSemble prediction system for probability forecasts. A risk matrix proposed by the World Meteorological Organization was employed for the impact-based forecasts, with the threshold value of the risk matrix set separately for each region. The system issues the risk level as one of four levels (GREEN, YELLOW, ORANGE, RED) for the first, second, and third forecast lead-days (LD1, LD2, and LD3). The daily risk level was evaluated for LD1 to LD3 using emergency heat-related patient data from six cities: Seoul, Incheon, Daejeon, Gwangju, Daegu, and Busan. High risk levels occurred more consistently at shorter lead times (LD3 → LD1), and the performance (rs) increased from 0.42 (LD3) to 0.45 (LD1) across all cities. Performance was especially good (rs = 0.51) in July and August, when heat stress is highest in South Korea. From an impact-based forecasting perspective, PTmax is one of the most suitable temperature indicators for issuing heat-related health risk warnings in South Korea.
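
A rough sketch of how an ensemble probability forecast could feed a risk matrix is given here. The abstract does not state the matrix entries, probability cut points, or thresholds, so every number, the single-threshold simplification, and the names `exceedance_probability`, `likelihood_class`, `risk_level`, and `RISK_MATRIX` are assumptions made for illustration only.

```python
def exceedance_probability(members, threshold):
    """Fraction of ensemble members at or above the threshold."""
    return sum(m >= threshold for m in members) / len(members)

# Hypothetical 4x4 risk matrix: rows = likelihood class, columns = impact class.
RISK_MATRIX = [
    ["GREEN", "GREEN", "YELLOW", "YELLOW"],
    ["GREEN", "YELLOW", "YELLOW", "ORANGE"],
    ["GREEN", "YELLOW", "ORANGE", "RED"],
    ["YELLOW", "ORANGE", "RED", "RED"],
]

def likelihood_class(prob):
    """Map a probability to a class index 0..3 (cut points are assumptions)."""
    cuts = [0.25, 0.45, 0.65]
    return sum(prob >= c for c in cuts)

def risk_level(members, threshold, impact_class):
    """Look up the risk level for one region and one lead day."""
    prob = exceedance_probability(members, threshold)
    return RISK_MATRIX[likelihood_class(prob)][impact_class]

# Made-up PTmax ensemble members for one county and lead day.
members = [31.5, 33.2, 35.3, 35.1, 36.0, 32.9, 34.2, 35.7, 33.8, 36.4]
print(risk_level(members, threshold=35.0, impact_class=3))
```

With region-specific thresholds, the same ensemble can map to different risk levels in different counties, which is the role the separately set threshold plays in the description above.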

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.123-132 / 2013
  • Because smartphones are equipped with various sensors such as an accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, and proximity sensor, there has been much research on using these sensors to create valuable applications. Human activity recognition is one such application, motivated by welfare uses such as support for the elderly, measurement of calorie consumption, and analysis of lifestyles and exercise patterns. One challenge in using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors is restricted, it is difficult to build a highly accurate activity recognizer, or classifier, because subtly different activities are hard to distinguish from limited information. The difficulty becomes especially severe when the number of activity classes to be distinguished is very large. In this paper, we show that a fairly accurate classifier that distinguishes ten different activities can be built using data from only a single sensor, namely the smartphone accelerometer. Our approach to this ten-class problem is to use the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the binary tree, the set of all classes is split into two subsets of classes by a binary classifier. At a child node of the tree, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This binary tree can be viewed as a nested dichotomy that can make multi-class predictions. 
Depending on how a set of classes is split into two subsets at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than the others. However, the best tree can hardly be identified without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and then combining the predictions made by each tree during classification. The END method is generally known to perform well even when the base learner is unable to model complex decision boundaries. As the base classifier at each node of the dichotomy, we used another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each from a bootstrap sample with a different random subset of features. By combining bagging with random feature subset selection, a random forest has more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that handles a multi-class problem with high accuracy. The ten activity classes distinguished in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used for classifying these activities include the magnitude of the acceleration vector at each time point as well as the maximum, minimum, and standard deviation of the vector magnitude within a time window of the last 2 seconds, among others. For the experiments comparing END with other methods, accelerometer data were collected every 0.1 second for 2 minutes per activity from 5 volunteers. 
Of the 5,900 ($=5{\times}(60{\times}2-2)/0.1$) data points collected for each activity (the data for the first 2 seconds are discarded because they lack time-window data), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with other similar activities, END classified all ten activities with a fairly high accuracy of 98.4%. In contrast, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
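
The random construction of nested dichotomies described above can be sketched structurally as follows. This covers only the tree-building step: the shuffle-and-cut splitting rule is one simple choice and not necessarily the sampling scheme used in the paper, no classifiers are trained, and the function names are hypothetical.

```python
import random

def build_dichotomy(classes, rng):
    """Recursively split a class list at random until each leaf holds one class."""
    if len(classes) == 1:
        return classes[0]                    # leaf: a single class label
    shuffled = classes[:]
    rng.shuffle(shuffled)
    cut = rng.randint(1, len(shuffled) - 1)  # both sides stay non-empty
    return (build_dichotomy(shuffled[:cut], rng),
            build_dichotomy(shuffled[cut:], rng))

def leaves(tree):
    """Collect the class labels at the leaves of a dichotomy tree."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]

def count_internal_nodes(tree):
    """Each internal node would hold one binary classifier."""
    if isinstance(tree, tuple):
        return 1 + count_internal_nodes(tree[0]) + count_internal_nodes(tree[1])
    return 0

ACTIVITIES = ["Sitting", "Standing", "Walking", "Running", "Walking Uphill",
              "Walking Downhill", "Running Uphill", "Running Downhill",
              "Falling", "Hobbling"]

rng = random.Random(0)
ensemble = [build_dichotomy(ACTIVITIES, rng) for _ in range(5)]
for tree in ensemble:
    print(count_internal_nodes(tree), sorted(leaves(tree)) == sorted(ACTIVITIES))
```

Every tree over ten classes has nine internal nodes, so each ensemble member needs nine binary classifiers; in the paper's setup each of those would be a random forest.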

Development of Product Recommender System using Collaborative Filtering and Stacking Model (협업필터링과 스태킹 모형을 이용한 상품추천시스템 개발)

  • Park, Sung-Jong;Kim, Young-Min;Ahn, Jae-Joon
    • Journal of Convergence for Information Technology / v.9 no.6 / pp.83-90 / 2019
  • People constantly strive for better choices. For this reason, recommender systems have been developed since the early 1990s. In particular, collaborative filtering has shown excellent performance in the field of recommender systems, and research on recommender systems using machine learning has been actively conducted. This study constructs a recommender system using collaborative filtering and machine learning based on a stacking model, which is one of the ensemble methods. The results confirm that the recommender system with the stacking model is useful in terms of recommendation performance. In the future, the model proposed in this study is expected to help individuals and firms make better choices.

A Personalized Recommendation System Using Machine Learning for Performing Arts Genre (머신러닝을 이용한 공연문화예술 개인화 장르 추천 시스템)

  • Hyung Su Kim;Yerin Bak;Jeongmin Lee
    • Information Systems Review / v.21 no.4 / pp.31-45 / 2019
  • Despite the expansion of the performing arts and culture market, small and medium-sized theaters still experience difficulties because consumers have poor access to information. This study proposes a machine learning-based genre recommendation system as a way to enhance the marketing capability of small and medium-sized theaters. We developed five recommendation systems that recommend three genres per customer using the customer master DB and transaction history DB of domestic venues, and we propose an optimal recommendation system by comparing their performance. As a result, the recommendation system based on the ensemble model performed better than the single predictive model. This study applied personalized recommendation, which has rarely been used in the field of performing arts and culture, and suggests that it is worth using in this field.

A study on the nonlinearity in bio-logical systems using approximate entropy and correlation dimension (근사엔트로피와 상관차원을 이용한 비선형 신호의 분석)

  • Lee, Hae-Jin;Choi, Won-Young;Cha, Kyung-Joon;Park, Moon-Il;Oh, Jae-Eung
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.11a / pp.760-763 / 2007
  • We studied how linear and nonlinear heart rate dynamics differ between normal fetuses and uncomplicated small-for-gestational-age (SGA) fetuses at 32-40 weeks' gestation. We analyzed each fetal heart rate (FHR) time series for 20 min and quantified its complexity (nonlinear dynamics) by approximate entropy (ApEn) and correlation dimension (CD). The linear dynamics were analyzed by canonical correlation analysis (CCA). The ApEn and CD of the uncomplicated SGA fetuses were significantly lower than those of the normal fetuses in all three gestational periods (32-34, 35-37, 38-40 weeks). The canonical correlation ensemble in SGA fetuses is slightly higher than in normal ones in all three gestational periods, especially at 35-37 weeks. The irregularity and complexity of the heart rate dynamics of SGA fetuses are lower than those of normal ones. Also, the canonical ensemble in SGA fetuses is higher than in normal ones, suggesting that the FHR control system has multiple complex interactions. Along with the clear difference between the two groups' nonlinear chaotic dynamics in FHR patterns, we clarified the hidden subtle differences in linearity (e.g. canonical ensemble). The decrease in nonlinear dynamics may contribute to the increase in linear dynamics. The present statistical methodology can be readily and routinely utilized in the fields of obstetrics and gynecology.
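
For readers unfamiliar with ApEn, a minimal sketch of the standard approximate entropy definition is shown here. The embedding dimension m = 2 and tolerance r = 0.2 times the standard deviation are common conventions assumed for illustration; the abstract does not state the settings used in the study, and the test series are synthetic.

```python
import math
import random

def approximate_entropy(series, m=2, r=None):
    """ApEn(m, r) using Chebyshev distance, with self-matches counted."""
    n = len(series)
    if r is None:
        mean = sum(series) / n
        std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
        r = 0.2 * std  # common convention, assumed here

    def phi(dim):
        # Average log-frequency of templates of length `dim` that match within r.
        templates = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for a in templates:
            matches = sum(
                max(abs(x - y) for x, y in zip(a, b)) <= r for b in templates
            )
            total += math.log(matches / len(templates))
        return total / len(templates)

    return phi(m) - phi(m + 1)

regular = [1.0, 2.0] * 100                         # perfectly periodic series
rng = random.Random(1)
noisy = [rng.gauss(0.0, 1.0) for _ in range(200)]  # irregular series

apen_regular = approximate_entropy(regular)
apen_noisy = approximate_entropy(noisy)
print(apen_regular < apen_noisy)
```

A lower value indicates a more regular, less complex signal, which is the sense in which the lower ApEn of the SGA group is interpreted above.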

Response of Terrestrial Carbon Cycle: Climate Variability in CarbonTracker and CMIP5 Earth System Models (기후 인자와 관련된 육상 탄소 순환 변동: 탄소추적시스템과 CMIP5 모델 결과 비교)

  • Sun, Minah;Kim, Youngmi;Lee, Johan;Boo, Kyoung-On;Byun, Young-Hwa;Cho, Chun-Ho
    • Atmosphere / v.27 no.3 / pp.301-316 / 2017
  • This study analyzes the spatio-temporal variability of terrestrial carbon flux and the response of the land carbon sink to climate factors in order to improve understanding of the variability of land-atmosphere carbon exchanges. The coupled carbon-climate models of CMIP5 (the fifth phase of the Coupled Model Intercomparison Project) and CT (CarbonTracker) are used. The CMIP5 multi-model ensemble mean overestimated NEP (Net Ecosystem Production) compared to the CT and GCP (Global Carbon Project) estimates over the period 2001~2012. The variation of NEP in the CMIP5 ensemble mean is similar to that of CT, but a couple of models that have a fire module without a nitrogen cycle module simulate a strong carbon sink in Africa, Southeast Asia, South America, and some areas of the United States. In the comparison with climate factors, NEP is strongly affected by temperature and solar radiation in both CT and CMIP5. The partial correlation between temperature and NEP indicates that temperature affects NEP positively at latitudes higher than the mid-latitudes in the Northern Hemisphere, whereas the opposite correlation appears at other latitudes in CT and most CMIP5 models. Except for a few models, the CMIP5 models show a positive correlation with precipitation at $30^{\circ}N{\sim}90^{\circ}N$, but a higher percentage of negative correlation than CT at $60^{\circ}S{\sim}30^{\circ}N$. For each season, the correlation between temperature (solar radiation) and NEP in the CMIP5 ensemble mean is similar to that of CT, but overestimated.
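
The partial correlation mentioned above can be illustrated with the standard first-order formula. The abstract does not say which variables were controlled for, so the third variable `z`, the helper names, and all numbers are assumptions for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Made-up series: x and y both follow z, plus small independent wiggles.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [2.0, 3.0, 2.0, 3.0, 6.0, 7.0, 6.0, 7.0]

raw = pearson(x, y)
partial = partial_correlation(x, y, z)
print(round(raw, 2), round(partial, 2))
```

In this toy case the raw correlation is high only because both series follow `z`; once `z` is controlled for, the remaining association is weak. Partial correlation is used for the same reason when several climate factors vary together.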

Estimating Korean Pine(Pinus koraiensis) Habitat Distribution Considering Climate Change Uncertainty - Using Species Distribution Models and RCP Scenarios - (불확실성을 고려한 미래 잣나무의 서식 적지 분포 예측 - 종 분포 모형과 RCP시나리오를 중심으로 -)

  • Ahn, Yoonjung;Lee, Dong-Kun;Kim, Ho Gul;Park, Chan;Kim, Jiyeon;Kim, Jae-uk
    • Journal of the Korean Society of Environmental Restoration Technology / v.18 no.3 / pp.51-64 / 2015
  • Climate change will have a significant impact on species distribution in forests. Pinus koraiensis, commonly called Korean pine, is normally distributed in frigid zones, so severe heat caused by climate change could affect its distribution. Therefore, this study predicted the distribution of Korean pine and its suitable habitat area, taking uncertainty into account, by applying climate change scenarios to an ensemble model. First, a site index was considered when selecting presence and absence points, and a stratified method was used to select the points. Second, environmental and climate variables were chosen through a literature review and then confirmed with experts. Those variables were used as input data for BIOMOD2. Third, the present distribution model was built and the result was validated with ROC. Lastly, RCP scenarios were applied to the models to create the future distribution model. As a result, the individual models differ considerably from one another, but most models and the ensemble models estimated that the suitable habitat area would decrease in the mid-term future (2040s) as well as the long-term future (2090s).

Cancer Diagnosis System using Genetic Algorithm and Multi-boosting Classifier (Genetic Algorithm과 다중부스팅 Classifier를 이용한 암진단 시스템)

  • Ohn, Syng-Yup;Chi, Seung-Do
    • Journal of the Korea Society for Simulation / v.20 no.2 / pp.77-85 / 2011
  • It is believed that anomalies or diseases of human organs can be identified by analyzing patterns. This paper proposes a new classification technique for identifying cancer using the proteome patterns obtained from two-dimensional polyacrylamide gel electrophoresis (2-D PAGE). In the new method, three different classifiers, namely support vector machine (SVM), multi-layer perceptron (MLP), and k-nearest neighbor (k-NN), are extended by the multi-boosting method into an array of subclassifiers, and the results of the subclassifiers are merged by an ensemble method. A genetic algorithm was applied to obtain the optimal feature set for each subclassifier. We applied our method to an empirical data set from cancer research, and it showed better accuracy and more stable performance than a single classifier.

A Study on the Accuracy Improvement of Movie Recommender System Using Word2Vec and Ensemble Convolutional Neural Networks (Word2Vec과 앙상블 합성곱 신경망을 활용한 영화추천 시스템의 정확도 개선에 관한 연구)

  • Kang, Boo-Sik
    • Journal of Digital Convergence / v.17 no.1 / pp.123-130 / 2019
  • Collaborative filtering is one of the most commonly used web recommendation techniques, and many studies have suggested ways to improve its accuracy. This study proposes a movie recommendation method using Word2Vec and ensemble convolutional neural networks. First, user sentences and movie sentences are constructed from the user, movie, and rating information. The user sentences and movie sentences are input into Word2Vec to obtain user vectors and movie vectors. User vectors are input to the user convolution model, and movie vectors are input to the movie convolution model. The user and movie convolution models are linked to a fully connected neural network model. Finally, the output layer of the fully connected neural network outputs forecasts of user movie ratings. Experimental results showed that the accuracy of the proposed technique was improved compared with that of the conventional collaborative filtering technique and that of the technique using Word2Vec and deep neural networks proposed in a similar study.