• Title/Summary/Keyword: Ensemble system

Ensemble Machine Learning Model Based YouTube Spam Comment Detection (앙상블 머신러닝 모델 기반 유튜브 스팸 댓글 탐지)

  • Jeong, Min Chul;Lee, Jihyeon;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.24 no.5
    • /
    • pp.576-583
    • /
    • 2020
  • This paper proposes a technique for identifying spam comments on YouTube, a platform that has recently seen tremendous growth. Because YouTube content can be monetized through advertising, spammers promote their own channels or videos in the comment sections of popular videos, or leave comments unrelated to the video. YouTube operates its own spam-blocking system, but it still fails to block such comments properly and efficiently. We therefore reviewed related studies on YouTube spam comment screening and conducted classification experiments with six machine learning techniques (decision tree, logistic regression, Bernoulli naive Bayes, random forest, support vector machine with a linear kernel, and support vector machine with a Gaussian kernel) and an ensemble model combining these techniques, using comment data from popular music videos by Psy, Katy Perry, LMFAO, Eminem, and Shakira.
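
The abstract does not say how the six classifiers are combined. As a minimal sketch, assuming hard majority voting (an assumption, not the authors' stated method), the combination step could look like the following. The function name `majority_vote` and the sample predictions are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per comment by majority vote."""
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all models must label the same number of comments")
    combined = []
    for labels in zip(*predictions):
        # most_common(1) returns the label with the highest count; on a tie,
        # the label encountered first among the models wins
        combined.append(Counter(labels).most_common(1)[0][0])
    return combined

# Hypothetical outputs of three classifiers on four comments (1 = spam, 0 = not spam)
tree_pred = [1, 0, 1, 0]
logreg_pred = [1, 0, 0, 0]
nb_pred = [1, 1, 1, 0]
ensemble_pred = majority_vote([tree_pred, logreg_pred, nb_pred])
print(ensemble_pred)
```

With an odd number of binary classifiers no ties occur, which is one reason voting ensembles are often built from an odd number of members.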

Metaheuristic models for the prediction of bearing capacity of pile foundation

  • Kumar, Manish;Biswas, Rahul;Kumar, Divesh Ranjan;T., Pradeep;Samui, Pijush
    • Geomechanics and Engineering
    • /
    • v.31 no.2
    • /
    • pp.129-147
    • /
    • 2022
  • Soil properties are naturally highly variable, so ensuring safety and reliability requires testing a large number of samples across the length and depth. For pile foundations, field tests are highly expensive, and traditional empirical relations have proven to perform poorly. This study proposes state-of-the-art hybrids of Particle Swarm Optimization (PSO) with an Artificial Neural Network (ANN), an Extreme Learning Machine (ELM), and an Adaptive Neuro-Fuzzy Inference System (ANFIS), and presents a comparative analysis of these metaheuristic models (ANN-PSO, ELM-PSO, ANFIS-PSO) for predicting the bearing capacity of pile foundations, trained and tested on a dataset of nearly 300 dynamic pile tests from the literature. A novel ensemble of the three hybrid models is constructed to combine and enhance the predictions of the individual models. The authenticity of the dataset is confirmed using descriptive statistics, a correlation matrix, and sensitivity analysis. Ram weight and pile diameter are found to be the most influential input parameters. The comparative analysis reveals that ANFIS-PSO is the best-performing model in the testing phase ($R^2$ = 0.85, RMSE = 0.01), ELM-PSO performs best in the training phase ($R^2$ = 0.88, RMSE = 0.08), and the ensemble provides the best overall performance based on the rank score. ANN-PSO performs least satisfactorily of the three models. The findings were confirmed using a Taylor diagram, an error matrix, and uncertainty analysis. Based on the results, ELM-PSO and ANFIS-PSO are proposed for predicting the bearing capacity of piles, and the ensemble learning approach of combining the outputs of individual models is encouraged. The study has the potential to assist geotechnical engineers in the design phase of civil engineering projects.
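
The two scores quoted in the abstract can be computed as sketched below. This assumes the common definitions (RMSE as the root of the mean squared residual, R2 as the coefficient of determination); the abstract does not state which variant of R2 the authors use, and the sample values are hypothetical, not data from the study.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error between measured and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical normalized pile capacities (assumed values for illustration)
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(rmse(measured, predicted), r_squared(measured, predicted))
```

The same two functions can score any individual model or an ensemble of them, since they only need the measured and predicted values.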

Future Change Using the CMIP5 MME and Best Models: I. Near and Long Term Future Change of Temperature and Precipitation over East Asia (CMIP5 MME와 Best 모델의 비교를 통해 살펴본 미래전망: I. 동아시아 기온과 강수의 단기 및 장기 미래전망)

  • Moon, Hyejin;Kim, Byeong-Hee;Oh, Hyoeun;Lee, June-Yi;Ha, Kyung-Ja
    • Atmosphere
    • /
    • v.24 no.3
    • /
    • pp.403-417
    • /
    • 2014
  • Future changes in seasonal mean temperature and precipitation over East Asia under anthropogenic global warming are investigated by comparing the historical run for 1979~2005 with the Representative Concentration Pathway (RCP) 4.5 run for 2006~2100, using 20 coupled models that participated in phase five of the Coupled Model Intercomparison Project (CMIP5). Although an increase in future temperature over the East Asian monsoon region is commonly accepted, the prediction of future precipitation under global warming still carries considerable uncertainty, with a large inter-model spread. To reduce uncertainty in the future projection, we therefore select the five best models based on an evaluation of the models' performance in the present climate for the boreal summer and winter seasons. Overall, the CMIP5 models simulate climatological temperature and precipitation over East Asia better than the phase 3 CMIP models, and the multi-model ensemble of the five best models (B5MME) performs better than the multi-model ensemble of all 20 models (MME). Under anthropogenic global warming, significant increases are expected in both temperature and land-ocean thermal contrast over the entire East Asia region during both seasons, in both the near-term and long-term future. The land-ocean contrast in future winter precipitation will decrease over East Asia, whereas that in summer will increase, particularly over the Korean Peninsula in association with the Changma. By taking model validation and uncertainty estimation into account, this study attempts to provide a more reliable range of future change in temperature and precipitation, particularly over the Korean Peninsula, than previous studies.
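
The idea of a "best models" ensemble can be sketched as follows: rank the models by how well their historical runs match observations, then average the future runs of the top few. The abstract does not name the skill metric; mean absolute error is assumed here, and the model names and values are hypothetical.

```python
def mean_absolute_error(simulated, observed):
    """Average absolute difference between a simulated and an observed field."""
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / len(observed)

def best_model_ensemble(historical, future, observed, n_best):
    """Rank models by present-climate error and average the future runs of the best ones."""
    ranked = sorted(
        historical,
        key=lambda name: mean_absolute_error(historical[name], observed),
    )
    chosen = ranked[:n_best]
    n_points = len(future[chosen[0]])
    ensemble_mean = [
        sum(future[name][i] for name in chosen) / len(chosen)
        for i in range(n_points)
    ]
    return chosen, ensemble_mean

# Hypothetical two-point fields for three models (assumed values)
historical = {"model_a": [1.0, 2.0], "model_b": [1.5, 2.5], "model_c": [3.0, 4.0]}
future = {"model_a": [2.0, 3.0], "model_b": [3.0, 4.0], "model_c": [9.0, 9.0]}
observed = [1.0, 2.0]
chosen, mme = best_model_ensemble(historical, future, observed, n_best=2)
print(chosen, mme)
```

Setting `n_best` to the total number of models gives the plain multi-model ensemble mean, so the same function covers both the MME and the B5MME cases.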

Improvement of Soil Moisture Initialization for a Global Seasonal Forecast System (전지구 계절 예측 시스템의 토양수분 초기화 방법 개선)

  • Seo, Eunkyo;Lee, Myong-In;Jeong, Jee-Hoon;Kang, Hyun-Suk;Won, Duk-Jin
    • Atmosphere
    • /
    • v.26 no.1
    • /
    • pp.35-45
    • /
    • 2016
  • Initialization of a global seasonal forecast system is as important as the quality of the embedded climate model for climate prediction on sub-seasonal time scales. Recent studies have emphasized the important role of soil moisture initialization, suggesting a significant increase in prediction skill, particularly over mid-latitude land areas, where the influence of tropical sea surface temperature is less crucial and potential predictability is supplemented by land-atmosphere interaction. This study developed a new soil moisture initialization method applicable to the KMA operational seasonal forecasting system. The method first performs a long-term integration of an offline land surface model driven by observed atmospheric forcing and precipitation. The resulting soil moisture reanalysis provides the initial state for the ensemble seasonal forecasts through a simple anomaly initialization technique, which avoids the simulation drift caused by systematic model bias. To evaluate the impact of the soil moisture initialization, two sets of long-term, 10-member ensemble experiments were conducted for 1996~2009. The soil moisture initialization significantly improves the prediction skill for surface air temperature at forecast leads of zero to one month (up to ~60 days), although the skill increase for precipitation is less significant. This study suggests that improving prediction on the sub-seasonal time scale requires improving the quality of the initial data as well as adequately treating systematic model bias.
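
Anomaly initialization is commonly understood as adding the reanalysis anomaly to the forecast model's own climatology, so that the model starts near its preferred mean state and drifts less. The sketch below assumes that basic form; the abstract does not give the exact formulation (for example, whether anomalies are rescaled), and all names and values are hypothetical.

```python
def anomaly_initialize(reanalysis, reanalysis_clim, model_clim):
    """Return initial soil moisture: model climatology plus the reanalysis anomaly."""
    return [
        m_clim + (value - r_clim)
        for value, r_clim, m_clim in zip(reanalysis, reanalysis_clim, model_clim)
    ]

# Hypothetical volumetric soil moisture at two grid points (assumed values)
reanalysis = [0.375, 0.25]        # offline land surface model state on the start date
reanalysis_clim = [0.25, 0.25]    # climatology of the offline reanalysis
model_clim = [0.125, 0.5]         # climatology of the coupled forecast model
initial_state = anomaly_initialize(reanalysis, reanalysis_clim, model_clim)
print(initial_state)
```

At the first grid point the reanalysis is wetter than its own climatology by 0.125, so the forecast model starts 0.125 above its climatology; at the second point the anomaly is zero and the model starts from its own mean.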

Assimilation of Satellite-Based Soil Moisture (SMAP) in KMA GloSea6: The Results of the First Preliminary Experiment (기상청 GloSea의 위성관측 기반 토양수분(SMAP) 동화: 예비 실험 분석)

  • Ji, Hee-Sook;Hwang, Seung-On;Lee, Johan;Hyun, Yu-Kyung;Ryu, Young;Boo, Kyung-On
    • Atmosphere
    • /
    • v.32 no.4
    • /
    • pp.395-409
    • /
    • 2022
  • A new soil moisture initialization scheme is applied to the Korea Meteorological Administration (KMA) Global Seasonal forecasting system version 6 (GloSea6). It is designed to ingest microwave soil moisture retrievals from the Soil Moisture Active Passive (SMAP) radiometer using the Local Ensemble Transform Kalman Filter (LETKF). In this technical note, we describe the procedure of the newly adopted initialization scheme, the change in soil moisture states due to assimilation, and the differences in GloSea6 forecast skill for surface temperature and precipitation, based on two preliminary experiments. In a 4-year analysis experiment, the soil moisture from the land-surface model of the current operational GloSea6 is found to be generally drier than the SMAP observations. LETKF data assimilation tends to make the soil wetter globally, especially in arid areas such as deserts and the Tibetan Plateau. It also produces soil moisture analysis increments toward wetter land at most soil levels relative to the current operational system. The second experiment, a GloSea6 forecast using the new initialization system for the heat wave case of summer 2020, shows that the memory of the soil moisture anomalies obtained from the new initialization system persists throughout the entire three-month forecast period. However, averaged forecast improvements over Eurasia during the forecast period are not substantial and are mixed: forecast skill improved slightly for precipitation but degraded for surface air temperature. Our preliminary results suggest that further development of the soil moisture initialization is still required to improve overall forecast skill.

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.39-54
    • /
    • 2013
  • The recent explosive growth of electronic commerce provides customers with many advantageous purchase opportunities. In this situation, customers who do not have enough knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect a user's preferences and provide a recommendation list to the user. Product recommender systems in online shopping stores are therefore known as one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect the user's preferences cause disappointment and waste the user's time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting the user's preferences precisely. The research data were collected from a real-world online shopping store that sells products from famous art galleries and museums in Korea. The data initially contained 5759 transactions, of which 3167 remained after records with null values were deleted. We transform the categorical variables into dummy variables and exclude outliers. The proposed model consists of two steps. The first step predicts which customers are highly likely to purchase products in the online shopping store. In this step, we use logistic regression, decision trees, and artificial neural networks to predict the customers who are highly likely to purchase products in each product group. We apply these data mining techniques using SAS E-Miner software. We partition the dataset into modeling and validation sets for the logistic regression and decision trees, and into training, test, and validation sets for the artificial neural network model. The validation dataset is the same for all experiments.
Then we combine the results of each predictor using multi-model ensemble techniques such as bagging and bumping. Bagging is an abbreviation of "Bootstrap Aggregation"; it combines the outputs of several machine learning techniques to raise the performance and stability of prediction or classification, and is a special form of the averaging method. Bumping is an abbreviation of "Bootstrap Umbrella of Model Parameters"; it considers only the model with the lowest error value. The results show that bumping outperforms bagging and the other predictors except for the "Poster" product group, for which the artificial neural network model performs better than the other models. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extract thirty-one association rules according to the values of the Lift, Support, and Confidence measures. We set the minimum transaction frequency to support associations to 5%, the maximum number of items in an association to 4, and the minimum confidence for rule generation to 10%. We also exclude extracted association rules with a lift value below 1. We finally obtain fifteen association rules after excluding duplicates. Of these fifteen rules, eleven contain associations between products in the "Office Supplies" product group, one contains an association between the "Office Supplies" and "Fashion" product groups, and the other three contain associations between the "Office Supplies" and "Home Decoration" product groups. Finally, the proposed product recommender system provides a list of recommendations to the appropriate customers. We test the usability of the proposed system using a prototype and real-world transaction and profile data. To this end, we construct the prototype system using ASP, JavaScript, and Microsoft Access.
In addition, we survey user satisfaction with the product list recommended by the proposed system and with randomly selected product lists. The survey participants are 173 people who use MSN Messenger, Daum Café, and P2P services. We evaluate user satisfaction on a five-point Likert scale and perform a paired-sample t-test on the survey results. The results show that the proposed model outperforms the random selection model at the 1% significance level, meaning that users were significantly more satisfied with the recommended product list. The results also suggest that the proposed system may be useful in real-world online shopping stores.
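
The abstract describes bagging as an averaging of predictor outputs and bumping as keeping only the lowest-error model. The sketch below illustrates just that combination step on fixed predictions. It is an assumption-laden simplification: the bootstrap resampling that both methods normally involve is omitted, mean squared error is assumed as the error measure, and the predictor names and probabilities are hypothetical.

```python
def mean_squared_error(y_true, y_prob):
    """Mean squared difference between 0/1 outcomes and predicted probabilities."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)

def bagging_combine(model_probs):
    """Averaging-style combination: mean predicted probability per customer."""
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs.values())]

def bumping_select(model_probs, y_true):
    """Selection-style combination: name of the model with the lowest error."""
    return min(
        model_probs,
        key=lambda name: mean_squared_error(y_true, model_probs[name]),
    )

# Hypothetical purchase outcomes and predicted purchase probabilities
y_valid = [1, 0, 1]
model_probs = {
    "logistic": [0.9, 0.2, 0.6],
    "tree": [1.0, 0.0, 0.5],
    "ann": [0.6, 0.4, 0.7],
}
averaged = bagging_combine(model_probs)
best_name = bumping_select(model_probs, y_valid)
print(averaged, best_name)
```

Averaging smooths out the disagreements between predictors, while selection commits to a single predictor, which is why the two can rank differently from one product group to another.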

A study on the measurement and characterization of turbulent flow inside an engine cylinder (엔진 실린더내 난류유동 측정과 정량화방법에 관한 연구)

  • 강건용;엄종호;김용선
    • Journal of the Korean Society of Automotive Engineers
    • /
    • v.14 no.6
    • /
    • pp.39-47
    • /
    • 1992
  • Engine combustion is one of the most important processes affecting performance and emissions. One effective way to improve engine combustion is to control the motion of the charge inside the cylinder through optimal induction system design, because flame speed in a gasoline engine is determined mainly by turbulence. This paper describes the measurement and characterization of mean velocity and turbulence intensity inside the cylinder of a 4-valve gasoline engine using a laser Doppler velocimeter (LDV) under motoring (non-firing) conditions. Since the LDV data measured in each cycle show small cycle-to-cycle variation during the compression stroke in the tested engine, the mean velocity and turbulence intensity are calculated by the ensemble averaging method, neglecting cycle variation effects. For the ensemble averaging method, the effects of the calculation window, within which velocities are treated as belonging to the same crank angle, on mean velocity and turbulence intensity are fully investigated. The effects of the measuring point on the flow characteristics are also studied. With a large calculation window, the mean velocity is less sensitive to crank angle and the turbulence intensity decreases in absolute amplitude. As the piston approaches top dead center of the compression stroke, the turbulence intensity is found to be homogeneous in the cylinder.
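
Ensemble averaging with a calculation window can be sketched as follows: velocity samples from all cycles are pooled into crank-angle bins of a chosen width, the bin mean is taken as the mean velocity, and the RMS fluctuation about that mean as the turbulence intensity. This is a minimal illustration of the general method under those assumptions, not the authors' processing code; the function name and sample values are hypothetical.

```python
import math
from collections import defaultdict

def ensemble_average(samples, window_deg):
    """Group (crank_angle, velocity) samples into windows and return
    {window_start_angle: (mean_velocity, turbulence_intensity)}."""
    bins = defaultdict(list)
    for crank_angle, velocity in samples:
        # All samples inside one window are treated as the same crank angle
        bins[math.floor(crank_angle / window_deg)].append(velocity)
    result = {}
    for index in sorted(bins):
        values = bins[index]
        mean = sum(values) / len(values)
        # RMS of the fluctuation about the ensemble mean
        rms = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
        result[index * window_deg] = (mean, rms)
    return result

# Hypothetical LDV samples pooled over several cycles: (crank angle in degrees, velocity in m/s)
samples = [(300.2, 4.0), (300.7, 6.0), (301.1, 5.0), (301.9, 9.0)]
profile = ensemble_average(samples, window_deg=1.0)
print(profile)
```

Changing `window_deg` changes which samples are pooled together, which is the window effect the abstract refers to.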

Red Blood Cell Velocity Field in Rat Mesenteric Arterioles Using Micro PIV Technique

  • Sugii, Y;Nishio, S;Okamoto, K;Nakano, A;Minamiyama, M;Niimi, H
    • International Journal of Vascular Biomedical Engineering
    • /
    • v.1 no.1
    • /
    • pp.24-31
    • /
    • 2003
  • Because endothelial cells are subject to flow shear stress, determining the detailed velocity distribution in microvessels is important for studying the mechanical interactions between blood and endothelium. This paper describes the velocity field of an arteriole in the rat mesentery, obtained by a highly accurate PIV technique using an intravital microscope and a high-speed digital video system. Red blood cell (RBC) velocity distributions with a spatial resolution of $0.8 \times 0.8\ \mu\mathrm{m}$ were obtained even near the wall in the center plane of the arteriole. By ensemble-averaging the time series of velocity distributions, velocity profiles over different cross-sections were calculated for comparison. The shear rate at the vascular wall was also evaluated on the basis of the ensemble-averaged profiles. The velocity profiles were blunt in the center region of the vessel cross-section and steep in the near-wall region. The wall shear rates were significantly smaller than those estimated from Poiseuille profiles.
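
For reference, the Poiseuille estimate that the measured wall shear rates are compared against follows from the parabolic profile in a circular tube. The symbols here are assumptions introduced for illustration and do not appear in the abstract: $R$ is the vessel radius, $u_{\max}$ the centerline velocity, $\bar{u} = u_{\max}/2$ the cross-sectional mean velocity, and $\dot{\gamma}_w$ the wall shear rate.

$$
u(r) = u_{\max}\left(1 - \frac{r^2}{R^2}\right), \qquad
\dot{\gamma}_w = \left|\frac{du}{dr}\right|_{r=R} = \frac{2\,u_{\max}}{R} = \frac{4\,\bar{u}}{R}
$$

A measured profile yields its own wall shear rate from the velocity gradient at $r = R$, which can then be set against this Poiseuille value.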

Pre- and Post-Processors of Ensemble Streamflow Prediction System (앙상블 유량예측 시스템의 사전 및 사후처리에 관한 연구)

  • Kang, Tae-Ho;Kim, Young-Oh;Hong, Il-Pyo
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2008.05a
    • /
    • pp.264-268
    • /
    • 2008
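  • Predicting future hydrological and meteorological phenomena involves uncertainty because of limited knowledge and the variability of natural phenomena. Nevertheless, many forecasts are still issued deterministically and consequently fail to convey the degree of uncertainty in their results. Ensemble Streamflow Prediction (ESP) is a method that accounts for this uncertainty when forecasting streamflow, one of the key inputs to decision-making in water resources systems. However, ESP results carry relatively large uncertainty arising from the meteorological data, the initial basin conditions, the parameters of the hydrological model, and the simplification of the hydrological model itself, so pre- and post-processing to reduce this uncertainty is required for operational use. This study reviews pre-processing methods applicable to ESP using climate forecast data available in Korea and, in addition to the optimal linear correction technique previously applied for post-processing in Korea, applies various other techniques to correct the simulation results of the TANK rainfall-runoff model. Applying pre- and post-processing reduced the uncertainty present in the meteorological data and in the streamflow prediction process. In particular, when pre- and post-processing were applied together, the improvement could exceed the simple sum of the improvements from each method alone. With both applied together, the Ranked Probability Score (RPS) improved by 54% in the water-use season and by 8% in the flood season.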
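
The Ranked Probability Score mentioned above compares cumulative forecast probabilities with the cumulative observation across ordered categories; lower is better and 0 is a perfect forecast. The sketch below assumes the unnormalized form (some definitions divide by the number of categories minus one), and the three-category forecast is hypothetical.

```python
def ranked_probability_score(forecast_probs, observed_category):
    """Sum of squared differences between cumulative forecast and observed probabilities."""
    cumulative_forecast = 0.0
    cumulative_observed = 0.0
    score = 0.0
    for category, probability in enumerate(forecast_probs):
        cumulative_forecast += probability
        if category == observed_category:
            cumulative_observed = 1.0  # observation falls in this category
        score += (cumulative_forecast - cumulative_observed) ** 2
    return score

# Hypothetical probabilities for low / normal / high streamflow categories
forecast = [0.2, 0.5, 0.3]
rps = ranked_probability_score(forecast, observed_category=1)
print(rps)
```

Because the score works on cumulative probabilities, a forecast that puts weight on a category adjacent to the observed one is penalized less than one that puts the same weight on a distant category.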

Energy Efficient Design of a Jet Pump by Ensemble of Surrogates and Evolutionary Approach

  • Husain, Afzal;Sonawat, Arihant;Mohan, Sarath;Samad, Abdus
    • International Journal of Fluid Machinery and Systems
    • /
    • v.9 no.3
    • /
    • pp.265-276
    • /
    • 2016
  • Energy systems that must operate across different conditions may not have a single design that provides optimal performance, and a system that runs for long periods at low efficiency consumes more energy. This work demonstrates a methodology on the design and optimization of a jet pump, using numerical modeling of the fluid dynamics and an evolutionary algorithm for the optimization, and shows a reduction in computational cost. The jet pump inherently has low efficiency because of improper mixing of the primary and secondary fluids and the multiple momentum and energy transfer phenomena associated with it. High-fidelity solutions obtained from a validated numerical model were used to construct an approximate function through surrogate analysis. Pareto-optimal solutions for two objective functions, the secondary fluid pressure head and the primary fluid pressure drop, were generated by a multi-objective genetic algorithm. For the jet pump geometry, a design space of several design variables was discretized using the Latin hypercube sampling method for the optimization. The performance analysis of the surrogate models shows that the combined surrogates perform better than a single surrogate, and the optimized jet pump shows higher performance. The approach can be applied to other energy systems to find better designs.
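
Latin hypercube sampling, used above to discretize the design space, splits each variable's range into as many equal strata as there are samples and places exactly one sample in each stratum per variable. The sketch below is a basic pure-Python version of that idea; the function name, the seed, and the variable bounds are hypothetical and are not taken from the study.

```python
import random

def latin_hypercube(n_samples, bounds, seed=0):
    """Return n_samples design points; each variable gets one point per stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One random position inside each of n_samples equal-width strata of [0, 1)
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so strata are paired at random across variables
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    return [tuple(column[i] for column in columns) for i in range(n_samples)]

# Hypothetical bounds for two jet pump design variables (assumed values)
design_points = latin_hypercube(5, [(0.2, 0.6), (2.0, 8.0)], seed=42)
for point in design_points:
    print(point)
```

Each design point would then be evaluated with the high-fidelity numerical model, and the resulting pairs of inputs and outputs used to fit the surrogates.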