Validation Measures of Bicluster Solutions

  • Lee, Young-Rok; Lee, Jeong-Hwa; Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / v.8 no.2 / pp.101-108 / 2009
  • Biclustering is a method for extracting subsets of objects and features from a dataset that are characterized in some way. In contrast to traditional clustering algorithms, which group objects that are similar across the whole feature set, biclustering methods find groups of objects that have similar values or patterns in a subset of features. In both clustering and biclustering, it is very important to validate how informative or reliable a result is. Whereas validation methods for cluster solutions have been studied actively, there are only a few measures for validating bicluster solutions. Furthermore, the existing validation methods for bicluster solutions have critical problems that limit their use in general cases. In this paper, we review several well-known validation measures for cluster and bicluster solutions and discuss their limitations. We then propose several improved validation indices as modified versions of existing ones.
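
The abstract does not spell out the proposed indices. As a point of reference only, the sketch below computes the mean squared residue of Cheng and Church (2000), a widely used existing measure of bicluster coherence. It is not the authors' improved index, and the function name `mean_squared_residue` and the toy matrix are our own. Lower values indicate a submatrix closer to an additive pattern a_ij = mu + alpha_i + beta_j.

```python
from statistics import mean


def mean_squared_residue(data, rows, cols):
    """Cheng-Church mean squared residue H(I, J) of the bicluster data[rows][cols].

    Each residue is a_ij - a_iJ - a_Ij + a_IJ, i.e. the entry minus its row mean
    and column mean plus the overall mean of the submatrix.
    """
    sub = [[data[i][j] for j in cols] for i in rows]
    row_means = [mean(r) for r in sub]
    col_means = [mean(c) for c in zip(*sub)]
    overall = mean(v for r in sub for v in r)
    residues = [
        (sub[r][c] - row_means[r] - col_means[c] + overall) ** 2
        for r in range(len(rows))
        for c in range(len(cols))
    ]
    return mean(residues)


if __name__ == "__main__":
    # Toy matrix: columns 0-2 follow an additive pattern, column 3 does not.
    data = [
        [1.0, 2.0, 3.0, 9.0],
        [2.0, 3.0, 4.0, 0.0],
        [5.0, 6.0, 7.0, 4.0],
    ]
    print(mean_squared_residue(data, [0, 1, 2], [0, 1, 2]))
    print(mean_squared_residue(data, [0, 1, 2], [0, 1, 2, 3]))
```

The first call should print zero up to floating-point rounding; adding the incoherent column raises the residue sharply.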

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data (고차원 (유전자 발현) 자료에 대한 군집 타당성분석 기법의 성능 비교)

  • Jeong, Yun-Kyoung; Baek, Jang-Sun
    • The Korean Journal of Applied Statistics / v.20 no.1 / pp.167-181 / 2007
  • Many clustering algorithms and cluster validation techniques have been suggested for high-dimensional gene expression data. However, these cluster validation techniques have seldom been evaluated. In this paper, we compare various cluster validity indices on low-dimensional simulation data and real gene expression data. Among the internal measures, Dunn's index is the most effective and robust, the Silhouette index is second, and the Davies-Bouldin index ranks last. Among the external measures, the Jaccard index is much more effective than the Goodman-Kruskal index and the adjusted Rand index.
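
The two best-performing indices named above can be sketched in plain Python. This assumes Euclidean distance and the textbook definitions (Dunn: smallest between-cluster distance divided by the largest cluster diameter; Jaccard: pair-counting agreement between two partitions); the paper's exact variants may differ, and the function names are our own. `dunn_index` assumes at least two clusters and at least one cluster with two or more points.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Smallest between-cluster distance / largest within-cluster diameter (higher is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0.0,
    )
    # Smallest distance between two points of different clusters.
    min_separation = min(
        dist(a, b) for g1, g2 in combinations(groups, 2) for a in g1 for b in g2
    )
    return min_separation / max_diameter


def jaccard_index(labels_true, labels_pred):
    """Pair-counting Jaccard index a / (a + b + c) between two partitions."""
    a = b = c = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true and same_pred:
            a += 1  # pair together in both partitions
        elif same_true:
            b += 1  # together only in the reference partition
        elif same_pred:
            c += 1  # together only in the predicted partition
    return a / (a + b + c)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(dunn_index(pts, [0, 0, 1, 1]))                # 10.0
    print(jaccard_index([0, 0, 1, 1], [0, 0, 1, 1]))   # 1.0
    print(jaccard_index([0, 0, 1, 1], [0, 0, 0, 1]))   # 0.25
```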

A Study on Accuracy Estimation of Service Model by Cross-validation and Pattern Matching

  • Cho, Seongsoo; Shrestha, Bhanu
    • International journal of advanced smart convergence / v.6 no.3 / pp.17-21 / 2017
  • In this paper, we compare the service execution accuracy of an ontology-based rule inference method and a machine learning method, and we find the amount of data at which the machine learning method's service execution accuracy becomes equal to that of rule inference. The rule inference method measures service execution accuracy using accumulated data and pattern matching on service results, while the machine learning method measures service execution accuracy using cross-validation data. After a confusion matrix is created and the accuracy of each service execution is measured, the inference algorithm can be selected from the results.
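
The cross-validation and confusion-matrix steps described above can be sketched as follows. The service labels (`light_on`, `light_off`, `tv_on`) and the contiguous, unshuffled k-fold scheme are hypothetical choices for illustration; the abstract names neither the services nor the classifier.

```python
from collections import Counter


def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over range(n)."""
    start = 0
    for fold in range(k):
        # Spread the remainder n % k over the first folds.
        size = n // k + (1 if fold < n % k else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


def accuracy(matrix):
    """Share of predictions on the diagonal of the confusion matrix."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


if __name__ == "__main__":
    classes = ["light_on", "light_off", "tv_on"]  # hypothetical service names
    actual = ["light_on", "light_on", "light_off", "light_off", "tv_on"]
    predicted = ["light_on", "light_off", "light_off", "light_off", "tv_on"]
    m = confusion_matrix(actual, predicted, classes)
    print(m)            # [[1, 1, 0], [0, 2, 0], [0, 0, 1]]
    print(accuracy(m))  # 0.8
```

In a full experiment, each fold's held-out predictions would be accumulated into the confusion matrix before computing accuracy.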

Diagnostic In Spline Regression Model With Heteroscedasticity

  • Lee, In-Suk; Jung, Won-Tae; Jeong, Hye-Jeong
    • Journal of the Korean Data and Information Science Society / v.6 no.1 / pp.63-71 / 1995
  • We consider local influence for smoothing parameter estimates in a spline regression model with heteroscedasticity. In practice, generalized cross-validation does not work well in the presence of heteroscedasticity. We therefore propose a local influence measure for generalized cross-validation estimates when errors are heteroscedastic, and we examine the diagnostic effects of this measure using the Hyperinflation data.
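
For reference, the standard generalized cross-validation criterion (Craven and Wahba) for a linear smoother with smoother (hat) matrix A(λ) is shown below. It weights every squared residual equally, so it does not account for observation-specific error variances. The local influence measure proposed in the paper is not reproduced here.

```latex
\hat{\mathbf{y}} = A(\lambda)\,\mathbf{y}, \qquad
\mathrm{GCV}(\lambda) =
\frac{\frac{1}{n}\,\bigl\lVert (I - A(\lambda))\,\mathbf{y} \bigr\rVert^{2}}
     {\Bigl[\frac{1}{n}\operatorname{tr}\bigl(I - A(\lambda)\bigr)\Bigr]^{2}}, \qquad
\hat{\lambda} = \arg\min_{\lambda} \mathrm{GCV}(\lambda)
```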

Finding Unexpected Test Accuracy by Cross Validation in Machine Learning

  • Yoon, Hoijin
    • International Journal of Computer Science & Network Security / v.21 no.12spc / pp.549-555 / 2021
  • Machine learning (ML) splits data into three parts, usually 60% for training, 20% for validation, and 20% for testing. The split is purely quantitative rather than selecting each set by a criterion, even though such selection is a very important concept for the adequacy of test data. ML measures a model's accuracy on the validation set and revises the model until the validation accuracy reaches a certain level. After the validation process, the complete model is tested on the test set, which the model has not yet seen. If the test set covers the model's attributes well, the test accuracy will be close to the model's validation accuracy. To check whether ML test sets work adequately, we design an experiment to see whether a model's test accuracy is always close to its validation accuracy, as expected. The experiment builds 100 different SVM models for each of six data sets published in the UCI ML repository. Comparing the test and validation accuracies of these 600 cases, we find some unexpected cases in which the test accuracy is very different from the validation accuracy. Consequently, an ML test set is not always adequate to assure a model's quality.
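
The quantitative 60/20/20 split and the check for unexpected cases can be sketched as below. The shuffling seed and the 0.05 tolerance are hypothetical; the abstract does not define how large a gap counts as "very different".

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle, then split into 60% train, 20% validation, 20% test (integer arithmetic)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10
    n_val = n * 2 // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


def flag_unexpected(val_acc, test_acc, tolerance=0.05):
    """Hypothetical rule: flag a model whose test accuracy strays from validation accuracy."""
    return abs(val_acc - test_acc) > tolerance


if __name__ == "__main__":
    train, val, test = split_60_20_20(range(100), seed=1)
    print(len(train), len(val), len(test))  # 60 20 20
    print(flag_unexpected(0.95, 0.80))      # True
```

Note that the split itself says nothing about whether the test set covers the model's attributes, which is the paper's point.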

Diagnostic for Smoothing Parameter Estimate in Nonparametric Regression Model

  • Lee, In-Suk; Jung, Won-Tae
    • Communications for Statistical Applications and Methods / v.2 no.2 / pp.266-276 / 1995
  • We consider local influence for smoothing parameter estimates in a nonparametric regression model. In practice, generalized cross-validation (GCV) does not work well in the presence of data perturbation. We therefore propose local influence measures for GCV estimates and examine the diagnostic effects of these measures.
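
To make the object of these diagnostics concrete, the sketch below selects a smoothing parameter by GCV, using the identity GCV(h) = n * RSS / (n - tr A)^2. As an assumption for illustration, it uses a Gaussian-kernel Nadaraya-Watson smoother with bandwidth h rather than the papers' spline smoothers, and it does not implement the proposed local influence measures. The function names are our own.

```python
import math


def smoother_matrix(x, h):
    """Nadaraya-Watson smoother matrix A (rows sum to 1) with a Gaussian kernel."""
    rows = []
    for xi in x:
        w = [math.exp(-0.5 * ((xi - xj) / h) ** 2) for xj in x]
        total = sum(w)
        rows.append([wj / total for wj in w])
    return rows


def gcv_score(x, y, h):
    """GCV(h) = n * RSS / (n - tr(A))^2 for the linear smoother y_hat = A y."""
    a = smoother_matrix(x, h)
    n = len(y)
    fitted = [sum(aij * yj for aij, yj in zip(row, y)) for row in a]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    denom = n - sum(a[i][i] for i in range(n))
    if denom <= 1e-12:
        return math.inf  # A is (numerically) the identity: interpolation, GCV undefined
    return n * rss / denom ** 2


def select_bandwidth(x, y, grid):
    """Return the bandwidth in grid with the smallest GCV score."""
    return min(grid, key=lambda h: gcv_score(x, y, h))


if __name__ == "__main__":
    x = [i / 10 for i in range(30)]
    y = [math.sin(3 * xi) + (0.1 if i % 2 else -0.1) for i, xi in enumerate(x)]
    print(select_bandwidth(x, y, [0.05, 0.1, 0.2, 0.5, 1.0]))
```

A local influence analysis would then ask how much the selected h changes when the data are slightly perturbed.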

A Validation Study of the Modified Korean Version of Ethical Leadership at Work Questionnaire (K-ELW) (간호사가 인지하는 간호관리자의 윤리적 리더십 측정 도구 K-ELW의 타당화 연구)

  • Kim, Jeong-Eon; Park, Eun-Jun
    • Journal of Korean Academy of Nursing / v.45 no.2 / pp.240-250 / 2015
  • Purpose: The purpose of this study was to validate the Korean version of the Ethical Leadership at Work questionnaire (K-ELW) that measures RNs' perceived ethical leadership of their nurse managers. Methods: The strong validation process suggested by Benson (1998), including translation and cultural adaptation stage, structural stage, and external stage, was used. Participants were 241 RNs who reported their perceived ethical leadership using both the pre-version of K-ELW and a previously known Ethical Leadership Scale, and interactional justice of their managers, as well as their own demographics, organizational commitment and organizational citizenship behavior. Data analyses included descriptive statistics, Pearson correlation coefficients, reliability coefficients, exploratory factor analysis, and confirmatory factor analysis. SPSS 19.0 and Amos 18.0 versions were used. Results: A modified K-ELW was developed from construct validity evidence and included 31 items in 7 domains: People orientation, task responsibility fairness, relationship fairness, power sharing, concern for sustainability, ethical guidance, and integrity. Convergent validity, discriminant validity, and concurrent validity were supported according to the correlation coefficients of the 7 domains with other measures. Conclusion: The results of this study provide preliminary evidence that the modified K-ELW can be adopted in Korean nursing organizations, and reliable and valid ethical leadership scores can be expected.
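
The abstract reports reliability coefficients without naming them. Cronbach's alpha, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), is a common internal-consistency coefficient for multi-item scales; the sketch below assumes that coefficient, which the abstract does not confirm, and uses toy data only.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for scores given as one list of k item scores per respondent."""
    k = len(scores[0])
    items = list(zip(*scores))  # one tuple of responses per item
    item_var = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var / total_var)


if __name__ == "__main__":
    # Toy data: three items answered identically by four respondents.
    consistent = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
    print(cronbach_alpha(consistent))  # 1.0 up to rounding
```

Using sample variances for both numerator and denominator is a design choice; population variances give the same ratio.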

Consensus Clustering for Time Course Gene Expression Microarray Data

  • Kim, Seo-Young; Bae, Jong-Sung
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.335-348 / 2005
  • The rapid development of microarray technologies has enabled the expression levels of thousands of genes to be monitored simultaneously. Recently, time-course gene expression data have often been measured to study dynamic biological systems and gene regulatory networks. For such data, biologists attempt to group genes based on the temporal patterns of their expression levels. We apply the consensus clustering algorithm to time-course gene expression data in order to infer statistically meaningful information from the measurements, and we evaluate consensus clustering and existing clustering methods with various validation measures. In this paper, we consider two existing methods, hierarchical clustering and Diana, as well as consensus clustering with hierarchical clustering, with Diana, and with a mixture of hierarchical clustering and Diana, and we evaluate their performance on a real microarray data set and two simulated data sets.
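
The core data structure of consensus clustering is a consensus matrix: for each pair of genes, the fraction of clustering runs that put them in the same cluster. The sketch below is a simplified version that assumes every run labels all items (for example, partitions from hierarchical clustering and Diana under different settings); resampling-based variants instead divide by the number of runs in which both items were sampled. The abstract does not say which variant the paper uses.

```python
from itertools import combinations


def consensus_matrix(clusterings):
    """M[i][j] = fraction of clusterings in which items i and j share a cluster.

    clusterings: list of label lists, each covering the same n items.
    """
    n = len(clusterings[0])
    m = [[0.0] * n for _ in range(n)]
    for labels in clusterings:
        for i, j in combinations(range(n), 2):
            if labels[i] == labels[j]:
                m[i][j] += 1
                m[j][i] += 1
    runs = len(clusterings)
    for i in range(n):
        m[i][i] = runs  # an item always co-clusters with itself
    return [[v / runs for v in row] for row in m]


if __name__ == "__main__":
    # Label values are arbitrary per run; only co-membership matters.
    runs = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]
    for row in consensus_matrix(runs):
        print(row)
```

A final partition is typically obtained by clustering the dissimilarity 1 - M, for instance with hierarchical clustering.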

Development of an On-Line Diagnostic Expert System: Algorithmic Sensor Validation (진단 전문가시스템의 개발 : 연산적 센서검증)

  • 김영진
    • Transactions of the Korean Society of Mechanical Engineers / v.18 no.2 / pp.323-338 / 1994
  • This paper outlines a framework for performing intelligent sensor validation (SV) for a diagnostic expert system that reasons under uncertainty. The emphasis is on the algorithmic preprocessing technique; a companion paper focuses on heuristic post-processing. Sensor validation plays a vital role in the overall system's ability to correctly determine the state of a plant monitored by imperfect sensors. In particular, several theoretical developments were made in understanding uncertain sensory data from a statistical perspective. Uncertain information in sensory values is represented through probability assignments on three discrete states, "high", "normal", and "low", and through additional sensor confidence measures in Algorithmic SV. Upper and lower warning limits, which represent the borderlines for heat rate degradation, are generated from the historical learning sets, and Algorithmic SV initiates a historical database for better reference in future use. All the information generated in Algorithmic SV initiates a session to differentiate sensor faults from process faults and to make inferences about system performance. This framework for a diagnostic expert system with sensor validation and reasoning under uncertainty is applied in HEATXPRT™, a data-driven on-line expert system for diagnosing heat rate degradation problems in fossil power plants.
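
The abstract does not state how the warning limits or the state probabilities are computed. Purely as a hypothetical illustration of probability assignments on the three states, the sketch below sets warning limits at the historical mean plus or minus k standard deviations and treats the true value as Gaussian around the sensor reading with an assumed sensor noise level. None of this is claimed to be the HEATXPRT method; all names and parameters are our own.

```python
import math
from statistics import mean, stdev


def warning_limits(history, k=2.0):
    """Hypothetical limits: historical mean minus/plus k standard deviations."""
    mu = mean(history)
    sd = stdev(history)
    return mu - k * sd, mu + k * sd


def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def state_probabilities(reading, lower, upper, sensor_sd):
    """P(low), P(normal), P(high) for a true value ~ Normal(reading, sensor_sd^2)."""
    p_low = normal_cdf((lower - reading) / sensor_sd)
    p_high = 1.0 - normal_cdf((upper - reading) / sensor_sd)
    return {"low": p_low, "normal": 1.0 - p_low - p_high, "high": p_high}


if __name__ == "__main__":
    lower, upper = warning_limits([10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0])
    print(lower, upper)
    print(state_probabilities(13.0, lower, upper, sensor_sd=1.0))
```

A noisier sensor (larger `sensor_sd`) spreads probability across states, which is one simple way a confidence measure could temper a single reading.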

A Study on the Development and Validation of a Learner-Centered 'Course Evaluation' Scale (학습자 중심의 '강의평가' 도구 개발 및 타당화 연구)

  • Park, Sung-Mi
    • Journal of Fisheries and Marine Sciences Education / v.23 no.1 / pp.13-22 / 2011
  • The purpose of this study was to develop and validate a learner-centered 'course evaluation' scale for universities. The research collected preliminary responses from 1,567 university students for item and scale quality analyses, responses from 2,539 university students for item and scale quality analyses, and responses from 300 university professors for validation. The data were analyzed for item quality, reliability, and validity. The results were as follows. The learner-centered 'course evaluation' scale was defined by five factors: structure and sincerity of lectures, suitability of reports and tests, level of consultation for students, application of educational media, and communication. Confirmatory factor analysis confirmed the five sub-scales of the scale. Criterion-related validity evidence was obtained from correlation analysis with criterion measures, and cross-validity evidence was obtained from confirmatory factor analysis of the university professors' responses.