• Title/Abstract/Keywords: Statistical Learning Theory

A Co-Evolutionary Computing for Statistical Learning Theory

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 5, No. 4
    • /
    • pp.281-285
    • /
    • 2005
  • Learning and evolving are two basic elements of data mining. Classical learning theory relies on an objective function that minimizes training error, whereas evolutionary computing offers an efficient way to construct an optimal model without minimizing training error. The global search that evolutionary computing performs over the solution space can resolve the local optima problems of learning models. In this research, we combine a co-evolving algorithm with statistical learning theory and propose a co-evolutionary computing approach that overcomes the local optima problems of statistical learning theory. We apply the proposed model to classification and prediction problems. In the experiments, we verify the improved performance of our model using data sets from the UCI machine learning repository and KDD Cup 2000.

Collaborative CRM using Statistical Learning Theory and Bayesian Fuzzy Clustering

  • Jun, Sung-Hae
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 11, No. 1
    • /
    • pp.197-211
    • /
    • 2004
  • As internet applications increase, marketing, research and surveys, education, and government administration depend heavily on the web. The volume of goods and sales traded in internet shopping malls has grown enormously. This creates the need for an automatic, intelligent information system that manages the users connected to a web site for effective marketing. To recommend suitable information to a user from among numerous web contents, we propose an automatic recommendation system that supplies the necessary information to connected web users using statistical learning theory and Bayesian fuzzy clustering. This system is called collaborative CRM in this paper. The performance of the proposed system is compared with other methods using real data from an existing shopping mall site. The results show that the proposed system achieves better predictive accuracy than the other methods.

Improvement of Support Vector Clustering using Evolutionary Programming and Bootstrap

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 8, No. 3
    • /
    • pp.196-201
    • /
    • 2008
  • Statistical learning theory has three analytical tools: support vector machine, support vector regression, and support vector clustering, for classification, regression, and clustering respectively. In general, they perform well because they are constructed by convex optimization. However, the methods have some problems. One is that the parameters of the kernel function and the regularization are determined subjectively, relying on the researcher's skill. Moreover, the results of the learning machines depend on the selected parameters. In this paper, we propose an efficient method for objectively determining the parameters of support vector clustering, the clustering method of statistical learning theory. Using an evolutionary algorithm and the bootstrap method, we select the kernel function parameters and the regularization constant objectively. To verify the improved performance of the proposed method, we compare it with established learning algorithms using data sets from the UCI machine learning repository and synthetic data.

Korean College Students' English Learning Motivation and Listening Proficiency

  • Yang, Eun-Mi
    • 영어어문교육
    • /
    • Vol. 17, No. 2
    • /
    • pp.93-114
    • /
    • 2011
  • The aim of this study is twofold. First, it explores how Korean university students' English learning motivation is related to their English listening proficiency and study time. Second, it attempts to interpret English learning motivation by linking two different motivation theories: self-determination theory and the L2 motivational self system. The constructs of the students' L2 learning motivation were investigated with data obtained through a questionnaire from 122 sophomore students. A factor analysis was conducted to extract the major factors of motivation. As a result, six factors were extracted: Intrinsic Pleasure, Identified Value Regulation, Intrinsic Accomplishment, Introjected Regulation, External Regulation, and Identified Regulation. The interrelatedness among the L2 listening proficiency assessment results (pre-test and post-test), listening study time, and the motivation factors was measured by correlation coefficients. The statistical results indicated that pre-test scores were significantly related to Identified Regulation and Identified Value Regulation toward English learning, and that post-test results were significantly correlated with Intrinsic Accomplishment and Identified Regulation. However, no motivation subtype showed a statistical association with the students' listening study time. The results are interpreted under both the L2 motivational self system and the self-determination framework, to better illuminate motivation theory with more explanatory power.

Statistical Inference in Non-Identifiable and Singular Statistical Models

  • Amari, Shun-ichi;Tomoko Ozeki
    • Journal of the Korean Statistical Society
    • /
    • Vol. 30, No. 2
    • /
    • pp.179-192
    • /
    • 2001
  • When a statistical model has a hierarchical structure, such as multilayer perceptrons in neural networks or Gaussian mixture density representations, the model includes distributions with unidentifiable parameters when the structure becomes redundant. Since the exact structure is unknown, we need to carry out statistical estimation or learning of parameters in such a model. From the geometrical point of view, distributions specified by unidentifiable parameters become singular points in the parameter space. The problem has been noted in many statistical models, and the strange behavior of the likelihood ratio statistic when the null hypothesis is at a singular point has been analyzed. The present paper studies the asymptotic behavior of the maximum likelihood estimator and the Bayesian predictive estimator using a simple cone model, and shows that they are completely different from those in regular statistical models where the Cramér-Rao paradigm holds. At singularities, the Fisher information metric degenerates, implying that the Cramér-Rao paradigm no longer holds and that classical model selection theory such as AIC and MDL cannot be applied. This paper is a first step toward establishing a new theory for analyzing the accuracy of estimation or learning around singularities.
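
The degeneracy of the Fisher information at a singular point can be illustrated numerically. The sketch below does not use the paper's cone model; it assumes a hypothetical one-unit regression model y = a*tanh(b*x) + noise with unit-variance Gaussian noise, where b becomes unidentifiable when a = 0. The function names and the grid of inputs are illustrative choices only.

```python
import math

def fisher_information(a, b, xs):
    """2x2 Fisher information for y = a*tanh(b*x) + N(0, 1) noise, averaged over xs.

    For unit-variance Gaussian noise the Fisher information is the average
    outer product of the gradient of the regression function.
    """
    g_aa = g_ab = g_bb = 0.0
    for x in xs:
        t = math.tanh(b * x)
        d_a = t                      # derivative of a*tanh(b*x) with respect to a
        d_b = a * x * (1.0 - t * t)  # derivative with respect to b
        g_aa += d_a * d_a
        g_ab += d_a * d_b
        g_bb += d_b * d_b
    n = len(xs)
    return [[g_aa / n, g_ab / n], [g_ab / n, g_bb / n]]

def determinant(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

# Hypothetical input grid on [-2, 2].
xs = [i / 10 for i in range(-20, 21)]

regular = determinant(fisher_information(1.0, 1.0, xs))   # identifiable point
singular = determinant(fisher_information(0.0, 1.0, xs))  # a = 0: b is unidentifiable

print("det at regular point:", regular)
print("det at singular point:", singular)
```

At a = 0 every gradient component with respect to b vanishes, so the matrix loses rank and its inverse, which the Cramér-Rao bound relies on, does not exist.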

WHEN CAN SUPPORT VECTOR MACHINE ACHIEVE FAST RATES OF CONVERGENCE?

  • Park, Chang-Yi
    • Journal of the Korean Statistical Society
    • /
    • Vol. 36, No. 3
    • /
    • pp.367-372
    • /
    • 2007
  • Classification as a tool to extract information from data plays an important role in science and engineering. Among various classification methodologies, support vector machine has recently seen significant developments. The central problem this paper addresses is the accuracy of support vector machine. In particular, we are interested in the situations where fast rates of convergence to the Bayes risk can be achieved by support vector machine. Through learning examples, we illustrate that support vector machine may yield fast rates if the space spanned by an adopted kernel is sufficiently large.

Support Vector Machine based on Stratified Sampling

  • Jun, Sung-Hae
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • Vol. 9, No. 2
    • /
    • pp.141-146
    • /
    • 2009
  • Support vector machine is a classification algorithm based on statistical learning theory. It has produced many results with good performance in data mining. However, the algorithm has some problems. One is its heavy computing cost, which has made it difficult to use the support vector machine in dynamic and online systems. To overcome this problem, we propose using stratified sampling from statistical sampling theory. Stratified sampling helps reduce the size of the training data. In our paper, accuracy is maintained even though the training data are small. We verify the improved performance through experiments on data sets from the UCI machine learning repository.
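
The idea of shrinking a training set by stratified sampling can be sketched as follows. This is a minimal illustration, not the paper's procedure: it assumes the strata are the class labels, and the function name, the sampling fraction, and the toy labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(labels, fraction, seed=0):
    """Return sorted indices of a subsample that roughly keeps each class's share."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    chosen = []
    for indices in by_class.values():
        # Keep at least one example per class so that no stratum disappears.
        k = max(1, round(fraction * len(indices)))
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)

# Toy data: 80 examples of class "a" and 20 of class "b".
labels = ["a"] * 80 + ["b"] * 20
subset = stratified_sample(labels, fraction=0.25)
subset_labels = [labels[i] for i in subset]

print(len(subset), subset_labels.count("a"), subset_labels.count("b"))
```

The reduced index set would then be used to select the rows passed to whatever classifier is being trained; the 4:1 class ratio of the full data is preserved in the subsample.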

A New Similarity Measure Based on Intraclass Statistics for Biometric Systems

  • Lee, Kwan-Yong;Park, Hye-Young
    • ETRI Journal
    • /
    • Vol. 25, No. 5
    • /
    • pp.401-406
    • /
    • 2003
  • A biometric system determines the identity of a person by measuring physical features that can distinguish that person from others. Since biometric features have many variations and can be easily corrupted by noises and deformations, it is necessary to apply machine learning techniques to treat the data. When applying the conventional machine learning methods in designing a specific biometric system, however, one first runs into the difficulty of collecting sufficient data for each person to be registered to the system. In addition, there can be an almost infinite number of variations of non-registered data. Therefore, it is difficult to analyze and predict the distributional properties of real data that are essential for the system to deal with in practical applications. These difficulties require a new framework of identification and verification that is appropriate and efficient for the specific situations of biometric systems. As a preliminary solution, this paper proposes a simple but theoretically well-defined method based on a statistical test theory. Our computational experiments on real-world data show that the proposed method has potential for coping with the actual difficulties in biometrics.

Teaching and learning about informal statistical inference using sampling simulation: A cultural-historical activity theory analysis

  • 서민주;서유민;정혜윤;이경화
    • 한국학교수학회논문집
    • /
    • Vol. 26, No. 1
    • /
    • pp.21-47
    • /
    • 2023
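  • Drawing on cultural-historical activity theory, this study treats the teaching and learning of informal statistical inference using sampling simulation as an activity system, and seeks to identify the contradictions that arise in this system and the changes those contradictions bring about. To this end, we conducted lessons on informal statistical inference using sampling simulation with 20 elementary school students in grades 5 and 6 and analyzed the activity system. The thematic analysis yielded the following results. First, contradictions arose between rules and the object and between artifacts and the object, and in resolving them a new artifact was introduced: the visualization of the empirical sampling distribution. In addition, contradictions arose between rules and artifacts and between rules and the subject, and in resolving them an algorithm for computing the mean of the sample means was introduced as a new rule.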
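
A sampling simulation of the kind described, together with the mean of the sample means, can be sketched in a few lines. The population, sample size, and number of repetitions below are hypothetical and are not taken from the study's lessons.

```python
import random
import statistics

def sample_means(population, sample_size, repetitions, seed=0):
    """Draw repeated random samples without replacement and return each sample's mean."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repetitions)
    ]

# Hypothetical population: the integers 1 to 100, whose mean is 50.5.
population = list(range(1, 101))
means = sample_means(population, sample_size=10, repetitions=2000)

# The mean of the sample means estimates the population mean.
mean_of_means = statistics.fmean(means)

print("population mean:", statistics.fmean(population))
print("mean of sample means:", mean_of_means)
print("spread of sample means:", statistics.pstdev(means))
```

Plotting a histogram of `means` would give the empirical sampling distribution; its spread is narrower than that of the population itself.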

A Sparse Data Preprocessing Using Support Vector Regression

  • 전성해;박정은;오경환
    • 한국지능시스템학회논문지
    • /
    • Vol. 14, No. 6
    • /
    • pp.789-792
    • /
    • 2004
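  • In many fields such as web mining, bioinformatics, and statistical data analysis, missing values of widely varying forms occur and make the training data sparse. Missing values are usually replaced during preprocessing by estimates obtained from imputation techniques, ranging from the most basic mean and mode to the conditional mean, tree models, and Markov chain Monte Carlo methods. However, when the proportion of missing values in the data becomes large, the predictive accuracy of existing imputation methods tends to decrease. In addition, as the proportion of missing values increases, the number of usable imputation methods becomes limited. To solve these problems, this paper adapts Vapnik's Support Vector Regression, from statistical learning theory, to the data preprocessing step. The proposed method makes it possible to preprocess sparse data with a large proportion of missing values. We confirmed the performance of the proposed method using data obtained from the UCI machine learning repository.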