• Title/Summary/Keyword: ensemble training

Investigating Data Preprocessing Algorithms of a Deep Learning Postprocessing Model for the Improvement of Sub-Seasonal to Seasonal Climate Predictions (계절내-계절 기후예측의 딥러닝 기반 후보정을 위한 입력자료 전처리 기법 평가)

  • Uran Chung;Jinyoung Rhee;Miae Kim;Soo-Jin Sohn
    • Korean Journal of Agricultural and Forest Meteorology / v.25 no.2 / pp.80-98 / 2023
  • This study explores the effectiveness of various data preprocessing algorithms for improving subseasonal to seasonal (S2S) climate predictions from six climate forecast models and their Multi-Model Ensemble (MME) using a deep learning-based postprocessing model. A pipeline of data transformation algorithms was constructed to convert raw S2S prediction data into training data processed with several statistical distributions. A dimensionality reduction algorithm was also applied, selecting features by ranking the correlation coefficients between the observed and input data. The training model was designed with a TimeDistributed wrapper applied to all convolutional layers of a U-Net: the wrapper allows each U-Net convolutional layer to be applied directly to 5-dimensional time series data while preserving the data's time axis, although every input must be at least 3D. We found that the Robust and Standard transformation algorithms are the most suitable for improving S2S predictions. Dimensionality reduction based on feature selection did not significantly improve predictions of daily precipitation for the six climate models and even worsened predictions of daily maximum and minimum temperatures. Although deep learning-based postprocessing also improved MME S2S precipitation predictions, it did not have a significant effect on temperature predictions, particularly at lead times of weeks 1 and 2. Further research is needed to develop an optimal deep learning model for improving S2S temperature predictions by testing various models and parameters.
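The abstract does not publish its preprocessing code. As a rough illustration only, the sketch below shows the two transformations it found most useful (Standard: subtract the mean and divide by the standard deviation; Robust: subtract the median and divide by the interquartile range, which is what scikit-learn's StandardScaler and RobustScaler do by default) plus a correlation-ranking feature selector. The function names, the use of absolute Pearson correlation, and the toy data are assumptions, not the authors' pipeline.

```python
import math
import statistics


def standard_scale(values):
    """Standard transformation: (x - mean) / population standard deviation."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return [(v - mu) / sd if sd > 0 else 0.0 for v in values]


def robust_scale(values):
    """Robust transformation: (x - median) / interquartile range (Q3 - Q1)."""
    med = statistics.median(values)
    # "inclusive" interpolates linearly between order statistics
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - med) / iqr if iqr > 0 else 0.0 for v in values]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb) if sa > 0 and sb > 0 else 0.0


def select_features(features, observed, k):
    """Hypothetical selector: keep the k inputs with the largest |r| against observations."""
    scores = {name: abs(pearson(col, observed)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    # The outlier 100 barely moves the median and IQR, so the other values stay comparable.
    print(robust_scale([1, 2, 3, 4, 100]))
    obs = [1.0, 2.0, 3.0, 4.0, 5.0]
    feats = {
        "model_a": [1.1, 2.0, 2.9, 4.2, 5.1],
        "model_b": [5.0, 1.0, 4.0, 2.0, 3.0],
        "model_c": [2.0, 2.5, 2.9, 3.6, 4.0],
    }
    print(select_features(feats, obs, k=2))  # keeps model_a and model_c, drops model_b
```

The Robust variant is less sensitive to extreme values (such as heavy-precipitation days) because the median and IQR ignore the tails, which is one plausible reason it suits skewed S2S variables.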

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems / v.19 no.2 / pp.39-54 / 2013
  • The recent explosive growth of electronic commerce provides customers with many advantageous purchasing opportunities. In this situation, customers who lack sufficient knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect users' preferences and provide recommendation lists to them. Thus, product recommender systems in online shopping stores have become one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect users' preferences cause disappointment and waste users' time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting users' preferences precisely. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5,759 transactions, of which 3,167 remained after records with null values were deleted. We transform the categorical variables into dummy variables and exclude outliers. The proposed model consists of two steps. The first step predicts which customers have a high likelihood of purchasing products in the online shopping store. In this step, we use logistic regression, decision trees, and artificial neural networks to predict such customers for each product group, performing these data mining tasks with the SAS E-Miner software. For logistic regression and decision trees, we partition the data into modeling and validation sets; for the artificial neural network model, we partition it into training, test, and validation sets. The same validation dataset is used in all experiments.
We then combine the results of the individual predictors using multi-model ensemble techniques such as bagging and bumping. Bagging, short for "Bootstrap Aggregation," combines the outputs of several machine learning techniques to raise the performance and stability of prediction or classification; it is a special form of averaging. Bumping, short for "Bootstrap Umbrella of Model Parameter," keeps only the model with the lowest error. The results show that bumping outperforms bagging and the other predictors in every product group except "Poster," for which the artificial neural network model performs best. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extract thirty-one association rules based on lift, support, and confidence, setting the minimum support to 5% of transactions, the maximum number of items in an association to 4, and the minimum confidence for rule generation to 10%. We then exclude rules with a lift below 1 and remove duplicate rules, which leaves fifteen association rules. Of these fifteen rules, eleven involve products within the "Office Supplies" product group, one involves the "Office Supplies" and "Fashion" groups, and the other three involve the "Office Supplies" and "Home Decoration" groups. Finally, the proposed product recommender system provides a list of recommendations to the appropriate customers. We test its usability with a prototype and real-world transaction and profile data; to this end, we build the prototype system using ASP, JavaScript, and Microsoft Access.
In addition, we surveyed user satisfaction with the product lists recommended by the proposed system and with randomly selected product lists. The survey participants were 173 users of MSN Messenger, Daum Café, and P2P services. We evaluate user satisfaction on a five-point Likert scale and perform a paired-sample t-test on the survey results. The results show that the proposed model outperforms the random selection model at the 1% significance level, meaning that users were significantly more satisfied with the recommended product lists. The results also show that the proposed system may be useful in real-world online shopping stores.
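The two ensemble rules described in this abstract can be illustrated with a toy one-feature example. This is a textbook-style sketch, not the authors' SAS E-Miner setup: the base learner here is a hypothetical threshold stump rather than logistic regression, decision trees, or neural networks. Bagging fits a model on each bootstrap resample and takes a majority vote; bumping fits the same kind of bootstrap models but keeps only the one with the lowest error on the original data. The model fit to the original data is included as a candidate, so bumping can never do worse than it on that data.

```python
import random


def fit_stump(xs, ys):
    """Hypothetical base learner: the one-feature threshold rule with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for sign in (1, -1):  # sign=1: predict 1 when x >= t; sign=-1: predict 1 when x <= t
            errors = sum((1 if sign * (x - t) >= 0 else 0) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, sign)
    _, t, sign = best
    return lambda x: 1 if sign * (x - t) >= 0 else 0


def error_rate(model, xs, ys):
    """Fraction of misclassified samples."""
    return sum(model(x) != y for x, y in zip(xs, ys)) / len(xs)


def bootstrap_models(xs, ys, n_models, rng):
    """Fit one base model per bootstrap resample (rows drawn with replacement)."""
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in xs]
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Bagging: majority vote (averaged 0/1 outputs) over all bootstrap models."""
    return 1 if 2 * sum(m(x) for m in models) >= len(models) else 0


def bumping_select(models, xs, ys):
    """Bumping: keep only the candidate with the lowest error on the original data."""
    return min(models, key=lambda m: error_rate(m, xs, ys))


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [0.1, 0.4, 0.35, 0.8, 0.7, 0.9, 0.2, 0.65, 0.5, 0.55]
    ys = [0, 0, 0, 1, 1, 1, 0, 1, 1, 0]
    boot = bootstrap_models(xs, ys, n_models=25, rng=rng)
    bumped = bumping_select([fit_stump(xs, ys)] + boot, xs, ys)
    print("bagged:", [bagging_predict(boot, x) for x in xs])
    print("bumped error:", error_rate(bumped, xs, ys))
```

The design difference is visible in the return types: bagging keeps the whole committee and averages it, while bumping returns one ordinary model, which is why the abstract can describe bumping as considering only the lowest-error model.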

Predicting the Potential Distribution of Korean Pine (Pinus koraiensis) Using an Ensemble of Climate Scenarios (앙상블 기후 시나리오 자료를 활용한 우리나라 잣나무림 분포 적지 전망)

  • Kim, Jaeuk;Jung, Huicheul;Jeon, Seong Woo;Lee, Dong-Kun
    • Journal of the Korean Society of Environmental Restoration Technology / v.18 no.2 / pp.79-88 / 2015
  • Preparations need to be made for Korean pine (Pinus koraiensis) in anticipation of climate change because Korean pine is an endemic species of South Korea and a source of timber and pine nuts. Therefore, climate change adaptation policy has been established to conduct an impact assessment on the distribution of Korean pine. Our objective was to predict the distribution of Korean pine while taking into account uncertainty and afforestation conditions. We used the 5th forest type map, a forest site map, and BIOCLIM variables. To account for uncertainty, we used two climate scenarios, RCP 4.5 and RCP 8.5, and five regional climate models (HadGEM3RA, RegCM4, SNURCM, GRIMs, WRF). The base period for this study is 1971 to 2000, and the target periods are the mid-21st century (2021-2050) and the end of the 21st century (2071-2100). This study used the MaxEnt model, with 50% of the presence records randomly set as training data and the remaining 50% used as test data; 10 cross-validated replicates were run. The selected variables were the annual mean temperature (Bio1), the precipitation of the wettest month (Bio13), and the precipitation of the driest month (Bio14). The area under the ROC curve (AUC) for the Korean pine test data was 0.689. In the mid-21st century, the distribution of Korean pine decreased by 11.9% to 37.8% under RCP 4.5 and RCP 8.5, and artificial plantations accounted for 32.1% to 45.4% of the Korean pine area under both RCPs. By the end of the 21st century, the area declined by 53.9% under RCP 4.5 and by 86.0% under RCP 8.5, with artificial plantations accounting for 23.8% and 7.2%, respectively. Private forests showed a larger decrease than national forests in all future periods. Our results may contribute to establishing climate change adaptation policies that consider various adaptation options.
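For context on the test AUC of 0.689: the AUC equals the probability that a randomly chosen test presence receives a higher suitability score than a randomly chosen background point (ties count one half), so 0.5 is no better than random and 1.0 is perfect separation. The sketch below computes it that way and repeats a random 50/50 presence split over 10 replicates. MaxEnt itself is not reimplemented: the `score` argument is a hypothetical stand-in for a model fitted on each training half, and drawing a fresh random split per replicate is an assumption about the replicate scheme.

```python
import random


def auc(presence_scores, background_scores):
    """AUC as P(score of a random presence > score of a random background point); ties count 0.5."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def split_presences(presences, test_fraction=0.5, seed=None):
    """Randomly hold out test_fraction of the presence records as test data."""
    rng = random.Random(seed)
    shuffled = list(presences)
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:n_train], shuffled[n_train:]


def replicate_test_auc(presences, background, score, n_replicates=10):
    """Mean test AUC over replicates; score(train) must return a function site -> suitability."""
    aucs = []
    for seed in range(n_replicates):
        train, test = split_presences(presences, seed=seed)
        model = score(train)  # hypothetical stand-in for fitting MaxEnt on the training half
        aucs.append(auc([model(s) for s in test], [model(s) for s in background]))
    return sum(aucs) / len(aucs)
```

The pairwise loop is quadratic in the number of points, which is fine for illustration; production tools compute the same quantity from ranks.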

Machine Learning Based Structural Health Monitoring System using Classification and NCA (분류 알고리즘과 NCA를 활용한 기계학습 기반 구조건전성 모니터링 시스템)

  • Shin, Changkyo;Kwon, Hyunseok;Park, Yurim;Kim, Chun-Gon
    • Journal of Advanced Navigation Technology / v.23 no.1 / pp.84-89 / 2019
  • This is a pilot study of a machine learning-based structural health monitoring system using flight data from composite aircraft. In this study, the most suitable machine learning algorithm for structural health monitoring was selected, and a dimensionality reduction method for application to actual flight data was applied. For these tasks, an impact test was conducted on a cantilever beam with added mass, which simulates damage in an aircraft wing structure, and a classification model for damage states (damage location and level) was trained. Through vibration tests of the cantilever beam with fiber Bragg grating (FBG) sensors, data for the normal state and 12 damaged states were acquired, and the most suitable algorithm was selected by comparing tree, discriminant, support vector machine (SVM), kNN, and ensemble classifiers. In addition, neighborhood component analysis (NCA) feature selection was used to perform the dimensionality reduction needed to handle high-dimensional flight data. As a result, quadratic SVM performed best, with 98.7% without NCA and 95.9% with NCA. Applying NCA also improved prediction speed, training time, and model memory.
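NCA feature selection, as mentioned above, learns one weight per input feature so that, under a weighted distance, each sample's probabilistically chosen neighbour tends to share its class; features whose weights shrink toward zero can then be dropped. The sketch below runs plain gradient ascent on a common regularized form of that objective (weighted L1 distances, softmax neighbour probabilities, an L2 penalty `lam`). It is an illustration only: the paper does not state which implementation it used, and `sigma`, `lam`, `lr`, `iters`, and the toy data are assumed values.

```python
import math


def nca_feature_weights(X, y, sigma=1.0, lam=0.01, lr=0.5, iters=200):
    """Gradient ascent on: mean over i of P(i picks a same-class neighbour) - lam * sum(w_r^2).

    Features enter distances as w_r^2, so a larger |w_r| means a more useful feature.
    """
    n, d = len(X), len(X[0])
    w = [1.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for i in range(n):
            others = [j for j in range(n) if j != i]
            # weighted L1 distance d_ij = sum_r w_r^2 * |x_ir - x_jr|
            dist = [sum(w[r] ** 2 * abs(X[i][r] - X[j][r]) for r in range(d)) for j in others]
            base = min(dist)  # shift before exp for stability; cancels after normalising
            kern = [math.exp(-(dj - base) / sigma) for dj in dist]
            z = sum(kern)
            p = [k / z for k in kern]  # p_ij: probability that i picks j as its reference point
            same = [y[j] == y[i] for j in others]
            p_i = sum(pj for pj, s in zip(p, same) if s)
            for r in range(d):
                diffs = [abs(X[i][r] - X[j][r]) for j in others]
                all_term = sum(pj * dj for pj, dj in zip(p, diffs))
                same_term = sum(pj * dj for pj, dj, s in zip(p, diffs, same) if s)
                grad[r] += (2.0 * w[r] / sigma) * (p_i * all_term - same_term)
        # average the data term over samples and subtract the penalty gradient 2 * lam * w_r
        w = [w[r] + lr * (grad[r] / n - 2.0 * lam * w[r]) for r in range(d)]
    return w


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 is noise.
    X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.5], [0.9, 0.4], [1.0, 0.8], [0.8, 0.1]]
    y = [0, 0, 0, 1, 1, 1]
    print(nca_feature_weights(X, y))
```

On this toy data the weight of the informative feature grows while the noise feature's weight decays under the penalty, which is the ranking that NCA-based feature selection relies on before the reduced feature set is passed to a classifier such as an SVM.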

Radar rainfall prediction based on deep learning considering temporal consistency (시간 연속성을 고려한 딥러닝 기반 레이더 강우예측)

  • Shin, Hongjoon;Yoon, Seongsim;Choi, Jaemin
    • Journal of Korea Water Resources Association / v.54 no.5 / pp.301-309 / 2021
  • In this study, we tried to improve the performance of an existing U-Net-based deep learning rainfall prediction model, whose structure can weaken the significance of the time-series order. To this end, we applied a ConvLSTM2D U-Net model that accounts for the temporal consistency of the data and evaluated its accuracy against a RainNet model and an extrapolation-based advection model. In addition, we tried to reduce uncertainty in the training process by training not only a single model but also an ensemble of 10 models. The trained neural network rainfall prediction model was optimized to generate a 10-minute-ahead prediction from four consecutive inputs covering the past 30 minutes up to the present. Although the outputs of the deep learning rainfall prediction models show no clearly distinct differences on visual inspection, the ConvLSTM2D U-Net produced the smallest prediction errors and relatively accurate rainfall locations. In particular, the ensemble ConvLSTM2D U-Net showed a high CSI, a low MAE, and a narrow error range, predicting rainfall more accurately and more stably than the other models. However, its prediction performance at specific points was very low compared with its performance over the entire area, showing that the deep learning rainfall prediction model also has limitations. This study confirmed that a ConvLSTM2D U-Net structure that accounts for temporal change can increase prediction accuracy, but convolutional deep neural network models remain limited by spatial smoothing in strong-rainfall regions and in detailed rainfall prediction.
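The input window and verification scores named above can be sketched for rainfall fields flattened into lists. Each sample pairs four consecutive radar frames taken 10 minutes apart (t-30, t-20, t-10, t) with the next frame (t+10) as the target. The ensemble forecast here is the cell-wise mean of the members, and CSI uses a rain/no-rain threshold of 1 mm/h; both the averaging rule and the threshold are assumptions, since the abstract specifies neither.

```python
def make_samples(frames, n_in=4):
    """Pair each run of n_in consecutive frames (10 min apart) with the following frame as target."""
    return [(frames[i:i + n_in], frames[i + n_in]) for i in range(len(frames) - n_in)]


def ensemble_mean(members):
    """Cell-wise mean of several members' rainfall fields (each a flat list)."""
    return [sum(cell) / len(members) for cell in zip(*members)]


def mae(pred, obs):
    """Mean absolute error over all grid cells."""
    return sum(abs(p - o) for p, o in zip(pred, obs)) / len(obs)


def csi(pred, obs, threshold=1.0):
    """Critical success index = hits / (hits + misses + false alarms) at a rain-rate threshold."""
    hits = misses = false_alarms = 0
    for p, o in zip(pred, obs):
        rain_pred, rain_obs = p >= threshold, o >= threshold
        if rain_pred and rain_obs:
            hits += 1
        elif rain_obs:
            misses += 1
        elif rain_pred:
            false_alarms += 1
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")
```

CSI ignores correct "no rain" cells, so it rewards getting rain locations right, while MAE measures intensity error everywhere; reporting both, as the abstract does, covers location and magnitude.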

Fully Automatic Segmentation of Acute Ischemic Lesions on Diffusion-Weighted Imaging Using Convolutional Neural Networks: Comparison with Conventional Algorithms

  • Ilsang Woo;Areum Lee;Seung Chai Jung;Hyunna Lee;Namkug Kim;Se Jin Cho;Donghyun Kim;Jungbin Lee;Leonard Sunwoo;Dong-Wha Kang
    • Korean Journal of Radiology / v.20 no.8 / pp.1275-1284 / 2019
  • Objective: To develop algorithms using convolutional neural networks (CNNs) for automatic segmentation of acute ischemic lesions on diffusion-weighted imaging (DWI) and compare them with conventional algorithms, including a thresholding-based segmentation. Materials and Methods: Between September 2005 and August 2015, 429 patients presenting with acute cerebral ischemia (training:validation:test set = 246:89:94) were retrospectively enrolled in this study, which was performed under Institutional Review Board approval. Ground truth segmentations of acute ischemic lesions on DWI were manually drawn by consensus of two expert radiologists. CNN algorithms were developed using a two-dimensional U-Net with squeeze-and-excitation blocks (U-Net) and a DenseNet with squeeze-and-excitation blocks (DenseNet) for automatic segmentation of acute ischemic lesions on DWI. The CNN algorithms were compared with conventional algorithms based on DWI and apparent diffusion coefficient (ADC) signal intensity. The performance of the algorithms was assessed using the Dice index with 5-fold cross-validation. The Dice indices were analyzed according to infarct volume (< 10 mL, ≥ 10 mL), number of infarcts (≤ 5, 6-10, ≥ 11), b-value 1000 (b1000) signal intensity (< 50, 50-100, > 100), time interval to DWI, and DWI protocol. Results: The CNN algorithms were significantly superior to the conventional algorithms (p < 0.001). Dice indices for the CNN algorithms were 0.85 for both U-Net and DenseNet and 0.86 for an ensemble of U-Net and DenseNet, whereas the indices were 0.58 for ADC-b1000 and b1000-ADC and 0.52 for the commercial ADC algorithm. The Dice indices for small and large lesions, respectively, were 0.81 and 0.88 with U-Net, 0.80 and 0.88 with DenseNet, and 0.82 and 0.89 with the ensemble of U-Net and DenseNet. The CNN algorithms showed significant differences in Dice indices according to infarct volume (p < 0.001).
Conclusion: The CNN algorithm for automatic segmentation of acute ischemic lesions on DWI achieved Dice indices greater than or equal to 0.85 and showed superior performance to conventional algorithms.
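The Dice index reported above measures the overlap between a predicted mask A and the ground-truth mask B as 2|A and B| / (|A| + |B|). The sketch computes it for flat 0/1 masks and builds a two-network ensemble by averaging per-pixel lesion probabilities and thresholding at 0.5. The averaging rule, the threshold, and the convention that two empty masks score 1.0 are assumptions; the abstract does not say how its U-Net/DenseNet ensemble was combined.

```python
def dice(pred, truth):
    """Dice index 2 * overlap / (|A| + |B|) for binary masks given as flat 0/1 lists."""
    overlap = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(pred) + sum(truth)
    return 2.0 * overlap / total if total else 1.0  # assumed convention: two empty masks agree


def ensemble_mask(prob_maps, threshold=0.5):
    """Average per-pixel lesion probabilities from several networks, then threshold."""
    n = len(prob_maps)
    return [1 if sum(vals) / n >= threshold else 0 for vals in zip(*prob_maps)]


if __name__ == "__main__":
    unet_probs = [0.9, 0.6, 0.2, 0.1]      # hypothetical U-Net outputs for four pixels
    densenet_probs = [0.8, 0.3, 0.6, 0.1]  # hypothetical DenseNet outputs
    truth = [1, 1, 0, 0]
    mask = ensemble_mask([unet_probs, densenet_probs])
    print(mask, dice(mask, truth))
```

Averaging lets the two networks cancel each other's isolated false positives (the third pixel here), which is one common reason an ensemble edges out its members, as the 0.86 versus 0.85 result suggests.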

Development and Assessment of Dynamical Seasonal Forecast System Using the Cryospheric Variables (빙권요소를 활용한 겨울철 역학 계절예측 시스템의 개발 및 검증)

  • Shim, Taehyoun;Jeong, Jee-Hoon;Ok, Jung;Jeong, Hyun-Sook;Kim, Baek-Min
    • Atmosphere / v.25 no.1 / pp.155-167 / 2015
  • A dynamical seasonal prediction system for boreal winter that utilizes cryospheric information was developed. Using the Community Atmosphere Model, version 3 (CAM3) as the modeling system, a newly developed snow depth initialization method and a sea ice concentration treatment were implemented in the seasonal prediction system. The daily snow depth analysis field was scaled to prevent climate drift before the model's snow fields were initialized, and it was then distributed to the model's snow-depth layers. To maximize predictability gain from the land surface, we applied a one-month training procedure to the prediction system, which adjusts soil moisture and soil temperature to the imposed snow depth. Sea ice concentration over the Arctic for the prediction period was prescribed with an anomaly-persistence method that considers the seasonality of sea ice. Ensemble hindcast experiments initialized on November 1 for the period 1999-2000 were performed, and the predictability gain from the imposed cryospheric information was tested. A large potential predictability gain from the snow information was obtained over large parts of high- and mid-latitude land as a result of strengthened land-atmosphere interaction in the modeling system. Large-scale atmospheric circulation responses associated with sea ice concentration anomalies were the main contributor to the predictability gain.
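The abstract describes an anomaly-persistence treatment for Arctic sea ice that keeps the seasonal cycle. One simple way to do that, sketched below for a single grid cell, is to hold the initial-time anomaly (observed concentration minus that month's climatology) fixed and add it to each later month's climatology, clipping to the physical range [0, 1]. The monthly time step, the clipping, and the absence of any damping are assumptions; the system's actual scheme may differ.

```python
def persisted_sea_ice(initial_sic, initial_month, climatology, n_months):
    """Anomaly persistence with seasonality for one grid cell.

    initial_sic: observed sea ice concentration (fraction 0-1) in initial_month (0 = January)
    climatology: 12 monthly climatological concentrations for this cell
    Returns the prescribed concentration for each of the next n_months months.
    """
    anomaly = initial_sic - climatology[initial_month]  # held fixed over the forecast
    forecast = []
    for lead in range(1, n_months + 1):
        month = (initial_month + lead) % 12
        # the climatology supplies the seasonal cycle; clip to the physical range [0, 1]
        forecast.append(min(1.0, max(0.0, climatology[month] + anomaly)))
    return forecast


if __name__ == "__main__":
    # Hypothetical monthly climatology for one Arctic cell, January..December.
    clim = [0.9, 0.95, 0.95, 0.9, 0.8, 0.6, 0.4, 0.3, 0.35, 0.5, 0.7, 0.85]
    # October (month 9) observed 0.4, i.e. 0.1 below normal; prescribe November-January.
    print(persisted_sea_ice(0.4, 9, clim, 3))
```

Persisting the anomaly rather than the raw concentration is what keeps the seasonality: the ice still grows from November into January, just from a lower-than-normal baseline.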

A Study on Acoustic Characteristics of Music Practice Room in the University (대학교내 음악연습실의 음향특성에 관한 연구)

  • Jung, Chul-Woon;Jung, Eun-Jung;Ju, Duck-Hoon;Kim, Jae-Soo
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2007.05a / pp.715-719 / 2007
  • Recently, as both educational and cultural standards on college campuses have improved, students increasingly require lecture rooms that foster closer relationships among students and smoother interaction between professors and students. In colleges of music in particular, practical training rooms such as orchestral music rooms, pipe music concert rooms, and part-practice rooms matter more for professor-student interaction and for students attending lessons than the lecture rooms of other colleges do. Such music practice rooms should therefore be designed with their acoustic characteristics in mind, so that listeners feel as if they are hearing a performance in a music hall. However, because most music practice rooms were designed with only ease of construction and economic efficiency in mind, many acoustic defects have been exposed after construction was completed. Therefore, this thesis examines the physical acoustic characteristics of two orchestral music rooms, a pipe music concert room, and an ensemble practice room among the newly constructed practice rooms of the College of Music at W University, and these results are expected to serve as fundamental data for designing college music practice rooms in the future.

Analysis of massive data in astronomy (천문학에서의 대용량 자료 분석)

  • Shin, Min-Su
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1107-1116 / 2016
  • Recent astronomical survey observations have produced substantial amounts of data and have completely changed conventional methods of analyzing astronomical data. Both classical statistical inference and modern machine learning methods are used in every step of data analysis, ranging from data calibration to inference of physical models. Machine learning methods are growing in popularity for classical problems of astronomical data analysis, thanks to low-cost data acquisition with cheap large-scale detectors and fast computer networks that allow large volumes of data to be shared. Analyses of big astronomical data commonly have to consider the effects of inhomogeneous spatial and temporal coverage. The growing size of the data requires parallel distributed computing environments as well as machine learning algorithms, yet distributed data analysis systems have not been widely adopted for the general analysis of massive astronomical data. Because adequate training data are observationally expensive to gather and are generally collected from multiple data sources in astronomy, semi-supervised and ensemble machine learning methods will become important for the analysis of big astronomical data.

Improving Weak Classifiers by Using Discriminant Function in Selecting Threshold Values (판별 함수를 이용한 문턱치 선정에 의한 약분류기 개선)

  • Shyam, Adhikari;Yoo, Hyeon-Joong;Kim, Hyong-Suk
    • The Journal of the Korea Contents Association / v.10 no.12 / pp.84-90 / 2010
  • In this paper, we propose a quadratic discriminant analysis-based approach for improving the discriminating strength of weak classifiers based on the simple Haar-like features used in the Viola-Jones object detection framework. Viola and Jones built a strong classifier using a boosted ensemble of weak classifiers. However, their weak classifier, based on a single threshold (or decision boundary), is sub-optimal and too weak to discriminate efficiently between the object class and the background. The proposed quadratic discriminant analysis-based approach produces a hyper-quadric boundary between the object and background classes, thus realizing weak classifiers with multiple thresholds. Experiments on car detection, using 1000 positive and 3000 negative images for training and 500 positive and 500 negative images for testing, show that our method achieves higher classification performance with fewer classifiers than weak classifiers based on a single threshold.
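To see why a quadratic boundary helps, consider a single Haar-like feature whose object responses cluster in a narrow band while background responses spread out on both sides. The sketch fits one 1-D Gaussian per class (the one-dimensional case of quadratic discriminant analysis) and compares it with the best single-threshold stump of the Viola-Jones form, which predicts "object" when p·f(x) < p·θ for a polarity p. The feature values are invented for illustration, the classes are assumed to have nonzero variance, and the paper's exact formulation may differ.

```python
import math
import statistics


def gaussian_log_pdf(x, mean, var):
    """Log density of a 1-D normal distribution."""
    return -0.5 * math.log(2.0 * math.pi * var) - (x - mean) ** 2 / (2.0 * var)


class QuadraticWeakClassifier:
    """1-D QDA weak classifier: one Gaussian per class with its own variance, so the
    decision boundary is quadratic in x and can realize two thresholds."""

    def __init__(self, pos, neg):
        self.mean_p, self.var_p = statistics.fmean(pos), statistics.pvariance(pos)
        self.mean_n, self.var_n = statistics.fmean(neg), statistics.pvariance(neg)
        total = len(pos) + len(neg)
        self.log_prior_p = math.log(len(pos) / total)
        self.log_prior_n = math.log(len(neg) / total)

    def predict(self, x):
        """Return 1 (object) if the object class has the higher log posterior, else 0."""
        score_p = gaussian_log_pdf(x, self.mean_p, self.var_p) + self.log_prior_p
        score_n = gaussian_log_pdf(x, self.mean_n, self.var_n) + self.log_prior_n
        return 1 if score_p > score_n else 0


def best_stump_accuracy(pos, neg):
    """Best accuracy of a single-threshold rule: object iff polarity * x < polarity * theta."""
    data = [(x, 1) for x in pos] + [(x, 0) for x in neg]
    thresholds = sorted({x for x, _ in data}) + [math.inf]
    best = 0.0
    for theta in thresholds:
        for polarity in (1, -1):
            correct = sum((1 if polarity * x < polarity * theta else 0) == label for x, label in data)
            best = max(best, correct / len(data))
    return best


if __name__ == "__main__":
    # Invented feature values: objects in a narrow band, background on both sides.
    pos = [-0.3, -0.1, 0.0, 0.1, 0.2, 0.3]
    neg = [-3.0, -2.5, -2.0, -1.5, 1.5, 2.0, 2.5, 3.0]
    clf = QuadraticWeakClassifier(pos, neg)
    correct = sum(clf.predict(x) == 1 for x in pos) + sum(clf.predict(x) == 0 for x in neg)
    print("QDA accuracy:", correct / (len(pos) + len(neg)))
    print("best single-threshold accuracy:", best_stump_accuracy(pos, neg))
```

On this toy data the QDA rule separates both classes, while no single threshold can exceed 10/14 because the background occupies both tails. This is the kind of gap that lets a boosted ensemble reach a given accuracy with fewer weak classifiers.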