• Title/Summary/Keyword: Methods selection

On loss functions for model selection in wavelet based Bayesian method

  • Park, Chun-Gun
    • Journal of the Korean Data and Information Science Society / v.20 no.6 / pp.1191-1197 / 2009
  • Most Bayesian approaches to model selection in wavelet analysis share the drawback that the computational cost of obtaining an accurate fit of the unknown function is high. To overcome this drawback, this article introduces loss functions that serve as criteria for level-dependent threshold selection in wavelet-based Bayesian methods with arbitrary sample sizes and regular design points. We demonstrate the utility of these criteria on four test functions and real data. (An illustrative sketch of level-dependent thresholding follows below.)

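The abstract describes choosing a level-dependent threshold by minimizing a loss function. The paper's specific loss functions are not given here, so the sketch below uses a standard SURE-type risk as a stand-in criterion, with PyWavelets for the decomposition; the function names, the db4 wavelet, and the unit-noise-variance assumption are all illustrative choices, not the authors' method.

```python
# A minimal sketch (not the paper's exact loss functions): level-dependent
# soft thresholding where each level's threshold is chosen by minimizing a
# SURE-type loss over a grid of candidates. Assumes unit noise variance.
import numpy as np
import pywt

def sure_loss(coeffs, t):
    """Stein's unbiased risk estimate for soft thresholding at threshold t."""
    n = coeffs.size
    clipped = np.minimum(np.abs(coeffs), t)
    return n - 2.0 * np.sum(np.abs(coeffs) <= t) + np.sum(clipped ** 2)

def level_dependent_denoise(y, wavelet="db4", levels=4):
    # Decompose the signal: [approximation, detail_L, ..., detail_1]
    coeffs = pywt.wavedec(y, wavelet, level=levels)
    denoised = [coeffs[0]]                      # keep the approximation untouched
    for d in coeffs[1:]:
        candidates = np.linspace(0.0, np.max(np.abs(d)), 50)
        losses = [sure_loss(d, t) for t in candidates]
        t_best = candidates[int(np.argmin(losses))]
        denoised.append(pywt.threshold(d, t_best, mode="soft"))
    return pywt.waverec(denoised, wavelet)

# Usage on a noisy piecewise-constant test signal
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 512)
signal = np.sign(np.sin(8 * np.pi * x))
fitted = level_dependent_denoise(signal + 0.3 * rng.standard_normal(512))
```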

Discretization Method Based on Quantiles for Variable Selection Using Mutual Information

  • Cha, Woon-Ock;Huh, Moon-Yul
    • Communications for Statistical Applications and Methods / v.12 no.3 / pp.659-672 / 2005
  • This paper evaluates the discretization of continuous variables for selecting relevant variables in supervised learning using mutual information. Three discretization methods are considered: MDL, Histogram, and 4-Intervals. The discretization and variable-subset-selection process is evaluated by classification accuracy on six real data sets from the UCI repository. The results show that the 4-Interval discretization method, which is based on quantiles, is robust and efficient for the variable selection process. We also visually evaluate the appropriateness of the selected subset of variables.
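As a rough illustration of the approach the abstract describes, the sketch below bins each continuous variable into four equal-frequency (quantile) intervals and ranks variables by mutual information with the class label. The helper names and the use of iris as a stand-in for the six UCI data sets are assumptions; the paper's exact 4-Interval procedure and evaluation protocol may differ.

```python
# A minimal sketch of quantile-based discretization (four intervals) followed
# by mutual-information ranking of variables; not the paper's exact method.
import numpy as np
import pandas as pd
from sklearn.metrics import mutual_info_score

def quantile_discretize(x, n_bins=4):
    """Bin a continuous variable into n_bins intervals of equal frequency."""
    return pd.qcut(x, q=n_bins, labels=False, duplicates="drop")

def rank_by_mutual_information(X, y, n_bins=4):
    """Return variables sorted by mutual information with the class label."""
    scores = {}
    for col in X.columns:
        binned = quantile_discretize(X[col], n_bins)
        scores[col] = mutual_info_score(y, binned)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Usage with the UCI iris data (a stand-in for the paper's six UCI data sets)
from sklearn.datasets import load_iris
iris = load_iris(as_frame=True)
print(rank_by_mutual_information(iris.data, iris.target))
```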

Automation of Model Selection through Neural Networks Learning (신경 회로망 학습을 통한 모델 선택의 자동화)

  • 류재흥
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.10a / pp.313-316 / 2004
  • Model selection is the process of setting the regularization parameter of a support vector machine or regularization network using external methods such as generalized cross-validation or the L-curve criterion. This paper suggests that the regularization parameter can instead be obtained within the learning process of the neural network itself, without resorting to a separate selection method. An extended kernel method is introduced, and the relationship between the regularization parameter and the bias term of the extended kernel is established. Experimental results show the effectiveness of the new model selection method. (A sketch of the conventional external approach follows below.)

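The extended kernel method proposed in the paper cannot be reproduced from the abstract alone, so the sketch below illustrates only the conventional external approach it contrasts with: selecting the regularization parameter of a kernel ridge (regularization network) model by generalized cross-validation over a grid. The RBF kernel, grid range, and function names are illustrative assumptions.

```python
# A minimal sketch of choosing the regularization parameter by generalized
# cross-validation (GCV) for a kernel ridge / regularization network model.
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    """Gaussian (RBF) kernel matrix between row sets X and Z."""
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def gcv_score(K, y, lam):
    """GCV(lam) = n * ||(I - S) y||^2 / tr(I - S)^2 with S = K (K + lam I)^-1."""
    n = len(y)
    S = K @ np.linalg.solve(K + lam * np.eye(n), np.eye(n))
    resid = y - S @ y
    return n * float(resid @ resid) / float(np.trace(np.eye(n) - S)) ** 2

def select_lambda(X, y, grid=np.logspace(-6, 2, 30)):
    """Pick the regularization parameter minimizing the GCV criterion."""
    K = rbf_kernel(X, X)
    return min(grid, key=lambda lam: gcv_score(K, y, lam))

# Usage on a small synthetic regression problem
rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(80, 1))
y = np.sin(X[:, 0]) + 0.2 * rng.standard_normal(80)
print("selected lambda:", select_lambda(X, y))
```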

A Study on the Selection and Evaluation of Information Resources(2) : Expert System (정보자료(情報資料)의 선택과 평가(評價)에 관한 이론(理論)과 사례 연구(2) : 전문가(專門家) 시스템을 중심으로)

  • Choi, Won-Tae
    • Journal of Information Management / v.25 no.3 / pp.1-27 / 1994
  • This study discusses the theories and case studies related to the selection and evaluation of information resources. Studies on the selection and evaluation of information resources can be divided into statistical methods, cost-effectiveness methods, and expert-system methods. The author discusses the problems and prospects of the theoretical background and application of expert systems for the selection and evaluation of information resources.

A Study on the Selection and Evaluation of Information Resources(l) : Monographs and Serials (정보자료(情報資料)의 선택과 평가(評價)에 관한 이론(理論)과 사례 연구(1) : 단행본(單行本)과 연속간행물(連續刊行物)을 중심으로)

  • Choi, Won-Tae
    • Journal of Information Management / v.25 no.2 / pp.1-30 / 1994
  • This study discusses the theories and case studies related to the selection and evaluation of information resources. Studies on the selection and evaluation of information resources can be divided into statistical methods, cost-effectiveness methods, and expert-system methods. The author also discusses the problems and prospects of the theoretical background and application of expert systems for the selection and evaluation of information resources.

Sequencing to Minimize the Total Utility Work in Car Assembly Lines (자동차 조립라인에서 총 가외작업을 최소로 하는 투입순서 결정)

  • 현철주
    • Journal of the Korea Safety Management & Science / v.5 no.1 / pp.69-82 / 2003
  • A sequence that minimizes the overall utility work in a car assembly line reduces the cycle time, the number of utility workers, and the risk of conveyor stoppages. This study gives a mathematical formulation of the sequencing problem that minimizes overall utility work and presents a genetic algorithm that can provide a near-optimal solution in real time. To apply a genetic algorithm to the sequencing problem in car assembly lines, the representation, selection methods, and genetic parameters are studied. Experiments compare selection methods such as roulette wheel selection, tournament selection, and ranking selection. The results show that ranking selection outperforms the others in solution quality, whereas tournament selection provides the best performance in computation time.
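Since the abstract compares three parent-selection operators, the sketch below shows minimal implementations of roulette wheel, tournament, and ranking selection. The sequencing representation, crossover operator, and utility-work objective from the paper are not reproduced; the fitness values here are arbitrary, and for the paper's minimization problem the fitness would first be inverted.

```python
# Minimal sketches of the three GA parent-selection operators compared in the
# paper: roulette wheel, tournament, and ranking selection (maximization form).
import numpy as np

rng = np.random.default_rng(42)

def roulette_wheel(fitness, k):
    """Sample k parent indices with probability proportional to fitness."""
    p = fitness / fitness.sum()
    return rng.choice(len(fitness), size=k, p=p)

def tournament(fitness, k, size=3):
    """For each parent, pick the best of `size` randomly drawn candidates."""
    winners = []
    for _ in range(k):
        cand = rng.choice(len(fitness), size=size, replace=False)
        winners.append(cand[np.argmax(fitness[cand])])
    return np.array(winners)

def ranking(fitness, k):
    """Sample with probability proportional to rank (best individual = rank n)."""
    ranks = np.argsort(np.argsort(fitness)) + 1
    p = ranks / ranks.sum()
    return rng.choice(len(fitness), size=k, p=p)

# Usage: select 4 parents from a toy population of 10 individuals
fitness = rng.random(10)
print(roulette_wheel(fitness, 4), tournament(fitness, 4), ranking(fitness, 4))
```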

Band Selection Using Forward Feature Selection Algorithm for Citrus Huanglongbing Disease Detection

  • Katti, Anurag R.;Lee, W.S.;Ehsani, R.;Yang, C.
    • Journal of Biosystems Engineering / v.40 no.4 / pp.417-427 / 2015
  • Purpose: This study investigated different band selection methods to classify spectrally similar data - obtained from aerial images of healthy citrus canopies and citrus greening disease (Huanglongbing or HLB) infected canopies - using small differences without unmixing endmember components and therefore without the need for an endmember library. However, large number of hyperspectral bands has high redundancy which had to be reduced through band selection. The objective, therefore, was to first select the best set of bands and then detect citrus Huanglongbing infected canopies using these bands in aerial hyperspectral images. Methods: The forward feature selection algorithm (FFSA) was chosen for band selection. The selected bands were used for identifying HLB infected pixels using various classifiers such as K nearest neighbor (KNN), support vector machine (SVM), naïve Bayesian classifier (NBC), and generalized local discriminant bases (LDB). All bands were also utilized to compare results. Results: It was determined that a few well-chosen bands yielded much better results than when all bands were chosen, and brought the classification results on par with standard hyperspectral classification techniques such as spectral angle mapper (SAM) and mixture tuned matched filtering (MTMF). Median detection accuracies ranged from 66-80%, which showed great potential toward rapid detection of the disease. Conclusions: Among the methods investigated, a support vector machine classifier combined with the forward feature selection algorithm yielded the best results.

A Novel Feature Selection Method in the Categorization of Imbalanced Textual Data

  • Pouramini, Jafar;Minaei-Bidgoli, Behrouze;Esmaeili, Mahdi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.8 / pp.3725-3748 / 2018
  • Text data distributions are often imbalanced. Imbalanced data is one of the challenges in text classification, as it degrades classifier performance, and many studies have addressed it. The proposed solutions fall into several general categories, including sampling-based and algorithm-based methods. In recent studies, feature selection has also been considered as a solution to the imbalance problem. In this paper, a novel one-sided feature selection method, probabilistic feature selection (PFS), is presented for imbalanced text classification. PFS is a probabilistic method computed from the feature distribution and has more parameters than comparable methods. To evaluate the proposed method, the feature selection methods Gini, MI, FAST, and DFS were implemented, and the C4.5 decision tree and Naive Bayes classifiers were used for assessment. F-measure results on the Reuters-21875 and WebKB collections suggest that the proposed feature selection significantly improves classifier performance.
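The abstract does not give the PFS formula, so the sketch below only illustrates the general idea of a one-sided, distribution-based feature score for imbalanced text: terms are ranked by how much more probable they are in the minority class than in the majority class. The smoothing, the toy corpus, and the score itself are assumptions, not the paper's method.

```python
# A minimal illustration of one-sided feature scoring for an imbalanced corpus;
# a stand-in for the idea only, not the paper's PFS formula.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

def one_sided_scores(docs, labels, minority_label, smoothing=1.0):
    """Score each term by P(term | minority) / P(term | majority)."""
    vec = CountVectorizer(binary=True)
    X = vec.fit_transform(docs).toarray()
    labels = np.asarray(labels)
    minority = X[labels == minority_label]
    majority = X[labels != minority_label]
    p_min = (minority.sum(0) + smoothing) / (len(minority) + 2 * smoothing)
    p_maj = (majority.sum(0) + smoothing) / (len(majority) + 2 * smoothing)
    scores = p_min / p_maj
    order = np.argsort(scores)[::-1]
    return [(vec.get_feature_names_out()[i], float(scores[i])) for i in order]

# Usage on a toy imbalanced corpus (label 1 is the minority class)
docs = ["spam offer now", "cheap offer", "meeting today", "project meeting",
        "lunch today", "see you at the meeting"]
labels = [1, 1, 0, 0, 0, 0]
print(one_sided_scores(docs, labels, minority_label=1)[:5])
```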

Assessment of genomic prediction accuracy using different selection and evaluation approaches in a simulated Korean beef cattle population

  • Nwogwugwu, Chiemela Peter;Kim, Yeongkuk;Choi, Hyunji;Lee, Jun Heon;Lee, Seung-Hwan
    • Asian-Australasian Journal of Animal Sciences / v.33 no.12 / pp.1912-1921 / 2020
  • Objective: This study assessed genomic prediction accuracies based on different selection methods, evaluation procedures, training population (TP) sizes, heritability (h2) levels, marker densities and pedigree error (PE) rates in a simulated Korean beef cattle population. Methods: A simulation was performed using two different selection methods, phenotypic and estimated breeding value (EBV), with an h2 of 0.1, 0.3, or 0.5 and marker densities of 10, 50, or 777K. A total of 275 males and 2,475 females were randomly selected from the last generation to simulate ten recent generations. The simulation of the PE dataset was modified using only the EBV method of selection with a marker density of 50K and a heritability of 0.3. The proportions of errors substituted were 10%, 20%, 30%, and 40%, respectively. Genetic evaluations were performed using genomic best linear unbiased prediction (GBLUP) and single-step GBLUP (ssGBLUP) with different weighted values. The accuracies of the predictions were determined. Results: Compared with phenotypic selection, the results revealed that the prediction accuracies obtained using GBLUP and ssGBLUP increased across heritability levels and TP sizes during EBV selection. However, an increase in the marker density did not yield higher accuracy in either method except when the h2 was 0.3 under the EBV selection method. Based on EBV selection with a heritability of 0.1 and a marker density of 10K, GBLUP and ssGBLUP_0.95 prediction accuracy was higher than that obtained by phenotypic selection. The prediction accuracies from ssGBLUP_0.95 outperformed those from the GBLUP method across all scenarios. When errors were introduced into the pedigree dataset, the prediction accuracies were only minimally influenced across all scenarios. Conclusion: Our study suggests that the use of ssGBLUP_0.95, EBV selection, and low marker density could help improve genetic gains in beef cattle.

A review of gene selection methods based on machine learning approaches (기계학습 접근법에 기반한 유전자 선택 방법들에 대한 리뷰)

  • Lee, Hajoung;Kim, Jaejik
    • The Korean Journal of Applied Statistics / v.35 no.5 / pp.667-684 / 2022
  • Gene expression data present the level of mRNA abundance for each gene, and analyses of gene expression have provided key ideas for understanding the mechanisms of diseases and for developing new drugs and therapies. High-throughput technologies such as DNA microarrays and RNA-sequencing now enable the simultaneous measurement of thousands of gene expressions, giving rise to the high dimensionality characteristic of gene expression data. Because of this high dimensionality, learning models for gene expression data are prone to overfitting, and dimension reduction or feature selection techniques are commonly used as a preprocessing step. In particular, gene selection methods in the preprocessing step can remove irrelevant and redundant genes and identify important genes. Various gene selection methods have been developed in the machine learning context, and this paper intensively reviews recent work on gene selection methods using machine learning approaches. The underlying difficulties of current gene selection methods and future research directions are also discussed.
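As one concrete example of the filter-type methods such a review covers, the sketch below ranks genes by mutual information with the class label and keeps the top k before any model fitting; the simulated expression matrix and the choice of k are placeholders rather than anything from the paper.

```python
# A minimal sketch of one filter-type gene selection step: rank genes in a
# high-dimensional expression matrix by mutual information with the class label
# and keep the top k before model fitting.
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def select_genes(expr, labels, k=50):
    """Return indices of the k genes most informative about the class label."""
    selector = SelectKBest(score_func=mutual_info_classif, k=k)
    selector.fit(expr, labels)
    return np.where(selector.get_support())[0]

# Usage on simulated expression data: 100 samples x 5,000 genes, 2 classes
rng = np.random.default_rng(0)
expr = rng.standard_normal((100, 5000))
labels = (expr[:, :10].sum(axis=1) + rng.standard_normal(100) > 0).astype(int)
print("selected genes:", select_genes(expr, labels, k=20))
```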