• Title/Summary/Keyword: Model Generalization

Predicting movie audience with stacked generalization by combining machine learning algorithms

  • Park, Junghoon;Lim, Changwon
    • Communications for Statistical Applications and Methods / v.28 no.3 / pp.217-232 / 2021
  • The Korean film industry has matured, and per-capita movie attendance has reached the highest level in the world. Since then, the industry's growth rate has been declining, and total annual movie sales even decreased slightly in 2018. The number of moviegoers is the primary determinant of sales in the movie industry and an important factor influencing additional sales, so predicting audience size is important. In this study, we predict the cumulative audience of films using stacking, an ensemble method that combines the predictions of all the algorithms used. We use box office data from the Korean Film Council and web comment data from Daum Movie (www.movie.daum.net). This paper describes how the explanatory variables were collected and preprocessed and explains the regression models used in stacking. The final stacking model outperforms the other models in predicting the test set in terms of RMSE.
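A minimal pure-Python sketch may make stacking concrete: base learners are trained on K-1 folds, their out-of-fold predictions become the input features of a meta-learner, and the base learners are finally refit on all data. The two base learners (a least-squares line and k-nearest neighbours), the five folds, the no-intercept linear meta-learner, and every function name are illustrative assumptions; the paper's own regression models and features are not reproduced here.

```python
# Minimal stacking regressor (illustrative; base learners, fold count and
# meta-learner are assumptions, not the models used in the paper).

def fit_linear(xs, ys):
    """Base learner 1: least-squares line y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def fit_knn(xs, ys, k=3):
    """Base learner 2: average target of the k nearest training points."""
    train = list(zip(xs, ys))

    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


BASE_LEARNERS = [fit_linear, fit_knn]


def out_of_fold(xs, ys, n_folds=5):
    """Level-1 features: each row's base predictions come from models
    trained without that row, so the meta-learner sees honest errors."""
    n = len(xs)
    meta = [[0.0] * len(BASE_LEARNERS) for _ in range(n)]
    for f in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != f]
        tx = [xs[i] for i in train_idx]
        ty = [ys[i] for i in train_idx]
        for j, fit in enumerate(BASE_LEARNERS):
            model = fit(tx, ty)
            for i in range(f, n, n_folds):  # rows held out in fold f
                meta[i][j] = model(xs[i])
    return meta


def fit_meta(meta, ys, ridge=1e-6):
    """Meta-learner: weights (w1, w2) minimising sum (y - w1*p1 - w2*p2)^2,
    solved from the 2x2 normal equations with a tiny ridge term."""
    s11 = sum(p[0] * p[0] for p in meta) + ridge
    s22 = sum(p[1] * p[1] for p in meta) + ridge
    s12 = sum(p[0] * p[1] for p in meta)
    t1 = sum(p[0] * y for p, y in zip(meta, ys))
    t2 = sum(p[1] * y for p, y in zip(meta, ys))
    det = s11 * s22 - s12 * s12
    return (s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det


def fit_stack(xs, ys):
    """Fit the meta-learner on out-of-fold predictions, then refit the
    base learners on all data for use at prediction time."""
    weights = fit_meta(out_of_fold(xs, ys), ys)
    bases = [fit(xs, ys) for fit in BASE_LEARNERS]
    return lambda x: sum(w * m(x) for w, m in zip(weights, bases))


def rmse(model, xs, ys):
    """Root-mean-square error, the criterion named in the abstract."""
    return (sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)) ** 0.5
```

On noiseless data y = 3x + 1 the meta-learner puts almost all of its weight on the linear base learner, so the stacked prediction at x = 10.5 is very close to 32.5. Using out-of-fold rather than in-sample base predictions is the key design choice: in-sample predictions would tend to reward whichever base learner overfits the training data most.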

A New Ensemble Machine Learning Technique with Multiple Stacking (다중 스태킹을 가진 새로운 앙상블 학습 기법)

  • Lee, Su-eun;Kim, Han-joon
    • The Journal of Society for e-Business Studies / v.25 no.3 / pp.1-13 / 2020
  • Machine learning refers to techniques for generating models that solve specific problems by generalizing from given data. Generating a high-performance model requires high-quality training data and learning algorithms that generalize well. One way to improve model performance is ensemble learning, which builds multiple models rather than a single one; ensemble techniques include bagging, boosting, and stacking. This paper proposes a new ensemble technique, multiple stacking, that outperforms conventional stacking. Its learning structure resembles that of deep learning: each layer is composed of a combination of stacking models, and layers are added so as to minimize the misclassification rate of each layer. Experiments on four types of datasets show that the proposed method outperforms existing methods.
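The layer-wise structure described above can be sketched as follows, under loudly stated assumptions: binary labels 0/1, two simple scorers (nearest centroid and k-NN) forming each layer, each new layer receiving the original features plus the previous layer's out-of-fold scores, and a stopping rule that adds a layer only while the out-of-fold misclassification rate keeps falling. All names are hypothetical; this is not the authors' implementation.

```python
# Hypothetical sketch of layer-wise ("multiple") stacking for labels 0/1.
# Scorers, feature passing and stopping rule are illustrative assumptions.

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid_scorer(X, y):
    """Score in [0, 1]: relative closeness to the class-1 centroid.
    Assumes both classes occur in X."""
    cent = {}
    for label in (0, 1):
        rows = [x for x, t in zip(X, y) if t == label]
        cent[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def score(x):
        d0, d1 = dist2(x, cent[0]), dist2(x, cent[1])
        return d0 / (d0 + d1) if d0 + d1 else 0.5

    return score


def knn_scorer(X, y, k=3):
    """Score in [0, 1]: fraction of the k nearest neighbours labelled 1."""
    train = list(zip(X, y))

    def score(x):
        nearest = sorted(train, key=lambda p: dist2(p[0], x))[:k]
        return sum(t for _, t in nearest) / len(nearest)

    return score


SCORERS = [centroid_scorer, knn_scorer]


def oof_scores(X, y, n_folds=4):
    """Out-of-fold scores of every scorer, one row per sample."""
    n = len(X)
    out = [[0.0] * len(SCORERS) for _ in range(n)]
    for f in range(n_folds):
        tr = [i for i in range(n) if i % n_folds != f]
        for j, fit in enumerate(SCORERS):
            model = fit([X[i] for i in tr], [y[i] for i in tr])
            for i in range(f, n, n_folds):
                out[i][j] = model(X[i])
    return out


def error_rate(S, y):
    """Misclassification rate when the mean score is thresholded at 0.5."""
    wrong = sum((sum(s) / len(s) >= 0.5) != t for s, t in zip(S, y))
    return wrong / len(y)


def fit_multi_stack(X, y, max_layers=5):
    """Add layers while the out-of-fold error keeps falling; layer l+1 sees
    the original features concatenated with layer l's scores."""
    layers, feats, best = [], X, float("inf")
    while len(layers) < max_layers:
        S = oof_scores(feats, y)
        err = error_rate(S, y)
        if err >= best:
            break
        best = err
        layers.append([fit(feats, y) for fit in SCORERS])
        feats = [x + s for x, s in zip(X, S)]
    return layers, best


def predict(layers, x):
    """Pass x through every layer; the last layer's mean score decides."""
    feats = x
    for models in layers:
        scores = [m(feats) for m in models]
        feats = x + scores
    return int(sum(scores) / len(scores) >= 0.5)
```

On two well-separated clusters, the first layer already reaches zero out-of-fold error, so the stopping rule keeps a single layer; harder data would let further layers correct earlier ones.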

A study of the River Meanders in the Han River System (한강수계의 사행에 관한 연구)

  • Kim, Jong-Seop (김종섭);Kim, Yang-Su (김양수)
    • Water for future / v.18 no.1 / pp.57-65 / 1985
  • In recent years, with increasing river engineering activity, more intensive use of flood plains, and extensive land reclamation, river geomorphology has attracted considerable attention. One important problem is the maintenance of river meanders, since almost all natural rivers tend to meander. In this study, statistical analysis is applied to characterize meander shapes, and meander characteristics are analyzed with a channel model based on a line generalization algorithm. The method is applied to the Han River system. The results show that the variance of curvature is a better index of meander intensity, and that kurtosis is a good index of the total length of the straight sections in a given reach. The line-generalization channel model gives good results in analyzing meander characteristics.
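As a rough illustration of the two indices, the sketch below estimates curvature along a digitized channel centreline and computes its variance and kurtosis. The curvature estimator (turning angle divided by the mean length of the two adjacent segments) and the function names are assumptions; the study's line generalization channel model is not reproduced.

```python
import math


def curvatures(points):
    """Discrete curvature at each interior vertex of a centreline given as
    (x, y) points: turning angle / mean length of the two adjacent segments.
    This estimator is an assumption, not the study's channel model."""
    ks = []
    for (x0, y0), (x1, y1), (x2, y2) in zip(points, points[1:], points[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)
        h2 = math.atan2(y2 - y1, x2 - x1)
        # wrap the heading change into [-pi, pi]
        turn = math.atan2(math.sin(h2 - h1), math.cos(h2 - h1))
        seg = (math.hypot(x1 - x0, y1 - y0) + math.hypot(x2 - x1, y2 - y1)) / 2
        ks.append(turn / seg)
    return ks


def variance(xs):
    """Population variance: used here as a meander-intensity index."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def kurtosis(xs):
    """Population kurtosis m4 / m2**2 (equals 3 for a normal distribution).
    Raises ZeroDivisionError for a perfectly straight reach (m2 == 0)."""
    m = sum(xs) / len(xs)
    m2 = sum((x - m) ** 2 for x in xs) / len(xs)
    m4 = sum((x - m) ** 4 for x in xs) / len(xs)
    return m4 / m2 ** 2
```

A straight reach has zero curvature everywhere, so its curvature variance is 0 and its kurtosis is undefined; a sine-shaped centreline with larger amplitude yields a larger curvature variance.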

The Study of Class Library Design for Reusable Object-Oriented Software (객체지향 소프트웨어 재사용을 위한 클래스 라이브러리 설계에 관한 연구)

  • Lee, Hae-Won;Kim, Jin-Seok;Kim, Hye-Gyu;Ha, Su-Cheol
    • The Transactions of the Korea Information Processing Society / v.6 no.9 / pp.2350-2364 / 1999
  • In this paper, we propose a method for designing a class library repository that provides reusers with object-oriented C++ class components. To design the class library, we began by studying the characteristics of reusable components. We formally defined the reusable component model using an entity-relationship model, and this formal definition is used directly as the database schema for storing reusable components in the repository. Since the reusable class library can be regarded as a knowledge base for software reuse, we classify components by enumerative classification, which breaks the knowledge base down into categories. A second classification clusters classes based on class similarity, which is composed of member-function similarity and member-data similarity. Finally, we designed the class library around a hierarchical inheritance mechanism based on the object-oriented concepts of generalization, specialization, and aggregation.
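One way to picture a class similarity built from member-function and member-data similarity is a weighted sum of two set overlaps, followed by threshold clustering. The Jaccard measure, the equal 0.5 weighting, the greedy clustering rule, and the `ClassInfo` record are hypothetical choices for illustration, not the paper's definitions.

```python
from dataclasses import dataclass, field


@dataclass
class ClassInfo:
    """Hypothetical record of a C++ class: member-function and member-data names."""
    name: str
    functions: set = field(default_factory=set)
    data: set = field(default_factory=set)


def jaccard(a, b):
    """|A & B| / |A | B|, defined as 1.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def class_similarity(c1, c2, w_func=0.5):
    """Weighted mix of member-function and member-data similarity (assumed weights)."""
    return (w_func * jaccard(c1.functions, c2.functions)
            + (1 - w_func) * jaccard(c1.data, c2.data))


def cluster(classes, threshold=0.5):
    """Greedy clustering: join the first cluster containing a class that is
    at least `threshold` similar, otherwise start a new cluster."""
    clusters = []
    for c in classes:
        for group in clusters:
            if any(class_similarity(c, g) >= threshold for g in group):
                group.append(c)
                break
        else:
            clusters.append([c])
    return clusters
```

For example, a `Stack` with functions {push, pop, size} and data {items, top} and a `Queue` with {enqueue, dequeue, size} and {items, head, tail} share one of five functions and one of four data members, giving 0.5 * 0.2 + 0.5 * 0.25 = 0.225, so they fall into different clusters at the 0.5 threshold.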

EIGENVALUE APPROACH FOR UNSTEADY FRICTION WATER HAMMER MODEL

  • Jung Bong Seog;Karney Bryan W.
    • Water Engineering Research / v.5 no.4 / pp.177-183 / 2004
  • This paper introduces an eigenvalue method for transforming the hyperbolic partial differential equations of a particular unsteady friction water hammer model into characteristic form. The method is based on the solution of the corresponding one-dimensional Riemann problem, which transforms hyperbolic quasi-linear equations into ordinary differential equations along the characteristic directions; in this case, these directions arise as the eigenvalues of the system. A mathematical justification and generalization of the eigenvalue method are provided, and the approach is compared with the traditional method of characteristics.
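For orientation, the eigenvalue step can be illustrated on the classical water hammer equations for piezometric head H and mean velocity V, with wave speed a and gravitational acceleration g. This assumes the convective terms are neglected and that the friction term J contains no derivatives of H or V; the paper's particular unsteady friction model is not reproduced here.

```latex
\frac{\partial H}{\partial t} + \frac{a^2}{g}\frac{\partial V}{\partial x} = 0,
\qquad
\frac{\partial V}{\partial t} + g\frac{\partial H}{\partial x} + J = 0
\;\Longrightarrow\;
\mathbf{U}_t + \mathbf{A}\,\mathbf{U}_x = \mathbf{S},\quad
\mathbf{U} = \begin{pmatrix} H \\ V \end{pmatrix},\;
\mathbf{A} = \begin{pmatrix} 0 & a^2/g \\ g & 0 \end{pmatrix},\;
\mathbf{S} = \begin{pmatrix} 0 \\ -J \end{pmatrix}

\det(\mathbf{A} - \lambda\mathbf{I}) = \lambda^2 - a^2 = 0
\;\Longrightarrow\; \lambda_{\pm} = \pm a,
\qquad \boldsymbol{\ell}_{\pm}\mathbf{A} = \lambda_{\pm}\boldsymbol{\ell}_{\pm}
\ \text{with}\ \boldsymbol{\ell}_{\pm} = \left(1,\ \pm\tfrac{a}{g}\right)

\boldsymbol{\ell}_{\pm}\cdot(\mathbf{U}_t + \mathbf{A}\mathbf{U}_x) = \boldsymbol{\ell}_{\pm}\cdot\mathbf{S}
\;\Longrightarrow\;
\frac{dH}{dt} \pm \frac{a}{g}\frac{dV}{dt} \pm \frac{a}{g}J = 0
\quad\text{along}\quad \frac{dx}{dt} = \pm a
```

The eigenvalues of A are the characteristic speeds, and the left eigenvectors supply the combinations of the two equations that reduce to ordinary differential equations along those directions. If J itself contains terms in the derivatives of V, as some unsteady friction formulations do, those terms enter A and change its eigenvalues.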

Development and Testing of a New Area Search Model with Partially Overlapping Target and Searcher Patrol Area

  • Kim, Gi-Young;Eagle, James N.;Kang, Sung-Jin
    • Journal of the military operations research society of Korea / v.35 no.1 / pp.21-32 / 2009
  • In this study, the authors use a MATLAB simulation to develop and test a generalization of the traditional Random Search model that allows both the searcher and the target to move and to occupy different but overlapping areas. The best evasion speed for a randomly moving target against a Systematic Search is also studied.
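For reference, the traditional Random Search model gives the detection probability of a stationary target uniformly located in an area A as P(t) = 1 - exp(-W v t / A), where W is the sweep width and v the searcher speed; an idealized systematic (exhaustive) search gives min(1, W v t / A). The sketch below encodes only these classical formulas, with hypothetical function names; it does not model the moving, partially overlapping case that the study develops.

```python
import math


def random_search_pd(sweep_width, speed, time, area):
    """Classical Random Search formula P = 1 - exp(-W*v*t / A)
    for a stationary target uniformly located in area A."""
    return 1.0 - math.exp(-sweep_width * speed * time / area)


def systematic_search_pd(sweep_width, speed, time, area):
    """Idealized exhaustive search: swept area W*v*t covers A with no overlap."""
    return min(1.0, sweep_width * speed * time / area)


def mean_time_to_detect(sweep_width, speed, area):
    """Under the random search model, detection time is exponential
    with rate W*v/A, hence mean A / (W*v)."""
    return area / (sweep_width * speed)
```

With W = 1, v = 2, and A = 10, after t = 5 the systematic search is certain to have detected the target, while random search reaches only 1 - e^(-1), about 0.632; this gap is what makes the comparison between search patterns interesting.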

Bio-data Classification using Modified Additive Factor Model (변형된 팩터 분석 모델을 이용한 생체데이타 분류 시스템)

  • Cho, Min-Kook;Park, Hye-Young
    • Journal of KIISE:Software and Applications / v.34 no.7 / pp.667-680 / 2007
  • Bio-data processing uses bio-signals obtained from individual humans for a given purpose, and demand for applying bio-data to a wide range of applications has recently been increasing. However, owing to the nature of the problem domain, the number of samples in each class is often limited while the number of classes is large. As a result, conventional pattern recognition systems and classification methods suffer from low generalization performance, because a system trained on scarce data is strongly affected by the noise in that data. To solve this problem, we propose a modified additive factor model for bio-data generation with two factors: a class factor, which determines the properties of each individual, and an environment factor, such as noise, which affects all classes. We then develop a classification system by defining a new similarity function based on the proposed model. The proposed method makes maximal use of class-discriminative information, so we can expect good generalization performance that is robust to noise even with a small amount of bio-data. Experimental results on real bio-data show that the proposed method significantly outperforms conventional methods.
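The intuition behind the two-factor model can be sketched with a simplified additive model x = class_mean[label] + environment_noise, in which the noise term is shared by all classes and its variance can therefore be pooled across every class, even when each class has only a few samples. The pooled per-dimension variance, the variance-weighted similarity, and all names are assumptions for illustration; the paper's actual model and similarity function are not reproduced.

```python
# Hypothetical sketch of a two-factor additive model:
#   x = class_mean[label] + environment_noise  (noise shared by all classes)
# Not the paper's exact model or similarity function.

def fit_additive(train, eps=1e-9):
    """train: dict label -> list of feature vectors (few per class).
    Returns per-class means (class factor) and a per-dimension noise
    variance pooled over ALL classes (environment factor)."""
    means = {label: [sum(col) / len(rows) for col in zip(*rows)]
             for label, rows in train.items()}
    resid = [[xi - mi for xi, mi in zip(x, means[label])]
             for label, rows in train.items() for x in rows]
    dim = len(resid[0])
    var = [sum(r[d] ** 2 for r in resid) / len(resid) + eps for d in range(dim)]
    return means, var


def similarity(x, mean, var):
    """Negative variance-weighted squared distance: noisy dimensions count less."""
    return -sum((xi - mi) ** 2 / v for xi, mi, v in zip(x, mean, var))


def classify(x, means, var):
    """Pick the class whose mean is most similar to x."""
    return max(means, key=lambda label: similarity(x, means[label], var))
```

With two samples per class, `{"A": [[-5, 0.0], [5, 0.2]], "B": [[-4, 1.0], [6, 1.2]]}`, the pooled variance is 25 in the first dimension and about 0.01 in the second. For the query [1.0, 0.5], plain Euclidean distance to the class means would pick B, whereas the pooled noise variance down-weights the noisy first dimension and picks A.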

FUZZY REGRESSION MODEL WITH MONOTONIC RESPONSE FUNCTION

  • Choi, Seung Hoe;Jung, Hye-Young;Lee, Woo-Joo;Yoon, Jin Hee
    • Communications of the Korean Mathematical Society / v.33 no.3 / pp.973-983 / 2018
  • The fuzzy linear regression model has been widely studied and successfully applied, but there have been only a few studies on the fuzzy regression model with a monotonic response function, a generalization of the linear response function. In this paper, we propose a fuzzy regression model with a monotonic response function, together with an algorithm for constructing it using the $\alpha$-level sets of fuzzy numbers and the resolution identity theorem. To estimate the parameters of the proposed model, we use the least squares (LS) and least absolute deviation (LAD) methods. In addition, we introduce two goodness-of-fit measures to evaluate the performance of the proposed model. Numerical examples indicate that the fuzzy regression model with a monotonic response function is preferable to the fuzzy linear regression model when the fuzzy data exhibit a nonlinear pattern.
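The role of $\alpha$-level sets can be sketched for triangular fuzzy numbers. By the resolution identity, a fuzzy number is recovered from its $\alpha$-cuts, and for a continuous increasing response function f the $\alpha$-cut of f(X) is simply [f(lower), f(upper)], so predicted and observed fuzzy outputs can be compared cut by cut. The $\alpha$ grid, the LS and LAD criteria summed over cut endpoints, and all names are assumptions for illustration, not the paper's estimators.

```python
import math

ALPHAS = [i / 10 for i in range(11)]  # alpha grid 0, 0.1, ..., 1 (assumed)


def alpha_cut(tfn, alpha):
    """alpha-level set [lower, upper] of a triangular fuzzy number (l, m, r)."""
    l, m, r = tfn
    return (l + alpha * (m - l), r - alpha * (r - m))


def image_cut(f, cut):
    """For a continuous, increasing f, the alpha-cut of f(A) is [f(lower), f(upper)]."""
    lower, upper = cut
    return (f(lower), f(upper))


def cut_losses(f, x_tfn, y_tfn):
    """LS and LAD discrepancies between predicted cuts f(X)_alpha and observed
    cuts Y_alpha, summed over both endpoints and the alpha grid (assumed criteria)."""
    ls = lad = 0.0
    for a in ALPHAS:
        pred = image_cut(f, alpha_cut(x_tfn, a))
        obs = alpha_cut(y_tfn, a)
        for pe, oe in zip(pred, obs):
            ls += (pe - oe) ** 2
            lad += abs(pe - oe)
    return ls, lad
```

For X = (0, 1, 2) and observed Y = (1, 3, 5), the increasing function f(x) = 2x + 1 reproduces every $\alpha$-cut (both losses are essentially zero), while `math.exp` does not; a fitting procedure would search over a parametric family of monotone functions to minimize one of these losses.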

Identification and Extraction of Reusable Linear Programming Model Components (재사용 가능한 성형계획모형 요소의 인식과 추출에 관한 연구)

  • Park, Sung-Joo (박성주);Kwon, Oh-Byung (권오병)
    • Journal of the Korean Operations Research and Management Science Society / v.18 no.3 / pp.79-100 / 1993
  • This paper proposes reverse modeling, an approach that analyzes LP models and converts them into an object-oriented model repository. The reverse modeling process consists of (1) identifying and analyzing source models with a meta processor, (2) decomposing and generalizing the models, scanning them and dividing them into model components, and (3) deriving model selection rules from the components with a rule generator. Through this process, we can extract reusable model components and build a model base with model selection rules. Examples with models written in the SML and MODLER modeling languages illustrate the methods. The model base management capabilities provided by reverse modeling can increase the reusability of current modeling tools.

The effect of perceived within-category variability through its examples on category-based inductive generalization (범주예시에 의해 지각된 범주내 변산성이 범주기반 귀납적 일반화에 미치는 효과)

  • Lee, Guk-Hee;Kim, ShinWoo;Li, Hyung-Chul O.
    • Korean Journal of Cognitive Science / v.25 no.3 / pp.233-257 / 2014
  • Category-based induction is one of the major inferential reasoning methods used by humans. This research tested the effect of perceived within-category variability on inductive generalization. Experiment 1 manipulated variability by directly presenting category exemplars: after being shown exemplars of low variability (low-variability condition) or high variability (high-variability condition), participants performed an inductive generalization task about the category in question. The results showed that participants had greater confidence in generalization when category variability was low than when it was high. In Experiment 2, rather than being shown category exemplars directly, participants formed an impression of category variability through a categorization task of identifying category exemplars and then performed the induction task. Experiment 2 also found a tendency for participants to have greater inductive confidence when category variability was low. The variability effect found in this research is distinct from the diversity effect reported in previous research, and the category-based induction model proposed by Osherson et al. (1990) cannot fully account for it. Tests of the variability effect in category-based induction are discussed in the general discussion.