• Title/Abstract/Keywords: Validation measures

223 search results

Validation Measures of Bicluster Solutions

  • Lee, Young-Rok; Lee, Jeong-Hwa; Jun, Chi-Hyuck
    • Industrial Engineering and Management Systems / Vol. 8, No. 2 / pp. 101-108 / 2009
  • Biclustering is a method for extracting subsets of objects and features from a dataset that are characterized in some way. In contrast to traditional clustering algorithms, which group objects that are similar across the whole feature set, biclustering methods find groups of objects that have similar values or patterns in a subset of features. In both clustering and biclustering, it is very important to validate how informative or reliable the result is. Whereas validation methods for cluster solutions have been studied actively, there are only a few measures for validating bicluster solutions. Furthermore, the existing validation methods for bicluster solutions have critical problems that limit their use in general cases. In this paper, we review several well-known validation measures for cluster and bicluster solutions and discuss their limitations. We then propose several improved validation indices as modified versions of the existing ones.

Comparison of the Cluster Validation Methods for High-dimensional (Gene Expression) Data

  • 정윤경; 백장선
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 20, No. 1 / pp. 167-181 / 2007
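  • Gene expression data are typical high-dimensional data. Various clustering algorithms for analyzing such data, and cluster validation techniques for verifying the clustering results, have been proposed, but the performance of these cluster validation techniques has rarely been compared or evaluated. In this paper, we compared the performance of cluster validation techniques on low-dimensional simulated data and on real gene expression data. Among the internal measures, the Dunn index performed best, followed by the Silhouette index; among the external measures, the Jaccard index performed best.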
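
The internal and external measures named in this abstract can be sketched in Python. This is a minimal illustration using textbook definitions, not the authors' code: the Dunn index is taken as the smallest Euclidean distance between points in different clusters divided by the largest within-cluster diameter, and the Jaccard index is counted over object pairs as a / (a + b + c), where a counts pairs grouped together in both partitions and b, c count pairs grouped together in only one. The function names `dunn_index` and `jaccard_index` are hypothetical.

```python
from itertools import combinations
from math import dist


def dunn_index(points, labels):
    """Dunn index: minimum between-cluster distance / maximum cluster diameter.

    Higher values indicate compact, well-separated clusters.
    Assumes Euclidean distance, at least two clusters, and at least one
    cluster with two or more points.
    """
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    # Largest distance between two points of the same cluster.
    max_diameter = max(
        (dist(a, b) for g in groups for a, b in combinations(g, 2)),
        default=0.0,
    )
    # Smallest distance between points that lie in different clusters.
    min_separation = min(
        dist(a, b)
        for g1, g2 in combinations(groups, 2)
        for a in g1
        for b in g2
    )
    return min_separation / max_diameter


def jaccard_index(labels_a, labels_b):
    """Pair-counting Jaccard index between two partitions of the same objects."""
    both = only_a = only_b = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
    # Pairs split apart in both partitions are ignored by the Jaccard index.
    return both / (both + only_a + only_b)
```

For example, two tight pairs of points five units apart give a Dunn index of 5.0, and comparing a clustering with itself gives a Jaccard index of 1.0. Both denominators are zero in degenerate cases (all singletons), which the sketch does not guard against.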

A Study on Accuracy Estimation of Service Model by Cross-validation and Pattern Matching

  • Cho, Seongsoo; Shrestha, Bhanu
    • International Journal of Advanced Smart Convergence / Vol. 6, No. 3 / pp. 17-21 / 2017
  • In this paper, the service execution accuracy of an ontology-based rule inference method was compared with that of a machine learning method, and the amount of data at which the machine learning method's service execution accuracy becomes equal to that of rule inference was determined. The rule inference method measures service execution accuracy using accumulated data and pattern matching on service results, whereas the machine learning method measures service execution accuracy using cross-validation data. After creating a confusion matrix and measuring the accuracy of each service execution, the inference algorithm can be selected based on the results.
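
A confusion matrix and the overall accuracy derived from it can be sketched as follows. The service labels ("on", "off") and the function names are hypothetical; the abstract does not specify the service classes or how accuracy is aggregated, so plain accuracy (correct predictions over all predictions) is assumed.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Counts with rows = actual class and columns = predicted class."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def accuracy(matrix):
    """Fraction of correct predictions: diagonal sum / total count."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical service outcomes: what should have run vs. what was inferred.
m = confusion_matrix(["on", "on", "off", "off"],
                     ["on", "off", "off", "off"],
                     ["on", "off"])
# m is [[1, 1], [0, 2]] and accuracy(m) is 0.75
```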

Diagnostic In Spline Regression Model With Heteroscedasticity

  • Lee, In-Suk; Jung, Won-Tae; Jeong, Hye-Jeong
    • Journal of the Korean Data and Information Science Society / Vol. 6, No. 1 / pp. 63-71 / 1995
  • We study local influence for smoothing parameter estimates in a spline regression model with heteroscedasticity. In practice, generalized cross-validation does not work well in the presence of heteroscedasticity. We therefore propose a local influence measure for generalized cross-validation estimates when the errors are heteroscedastic, and examine the diagnostic effects of this measure using the Hyperinflation data.

Finding Unexpected Test Accuracy by Cross Validation in Machine Learning

  • Yoon, Hoijin
    • International Journal of Computer Science & Network Security / Vol. 21, No. 12spc / pp. 549-555 / 2021
  • Machine learning (ML) splits data into three parts, usually 60% for training, 20% for validation, and 20% for testing. The split is purely quantitative rather than selecting each set of data by a criterion, even though such a criterion is very important for the adequacy of test data. ML measures a model's accuracy on a set of validation data and revises the model until the validation accuracy reaches a certain level. After the validation process, the complete model is tested with the set of test data, which the model has not yet seen. If the set of test data covers the model's attributes well, the test accuracy will be close to the validation accuracy of the model. To check whether ML's set of test data works adequately, we design an experiment to see whether a model's test accuracy is always close to its validation accuracy, as expected. The experiment builds 100 different SVM models for each of six data sets published in the UCI ML repository. From the test and validation accuracies of the 600 cases, we find some unexpected cases in which the test accuracy is very different from the validation accuracy. Consequently, it is not always true that ML's set of test data is adequate to assure a model's quality.
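
The purely quantitative 60/20/20 split described above can be sketched as a seeded shuffle followed by slicing. This is an illustrative assumption, not the procedure used in the experiment, and `split_60_20_20` is a hypothetical name. Nothing in it checks whether the test portion covers the model's attributes, which is exactly the gap the abstract points to.

```python
import random


def split_60_20_20(records, seed=0):
    """Shuffle, then split into (train, validation, test) at about 60/20/20.

    The split is by count only; no criterion is applied to choose
    which records end up in the test set.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    # Integer arithmetic avoids floating-point rounding in the cut points.
    n_train = n * 60 // 100
    n_val = n * 20 // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

With 100 records this yields sets of 60, 20, and 20; the fixed seed makes the split reproducible.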

Diagnostic for Smoothing Parameter Estimate in Nonparametric Regression Model

  • Lee, In-Suk; Jung, Won-Tae
    • Communications for Statistical Applications and Methods / Vol. 2, No. 2 / pp. 266-276 / 1995
  • We study local influence for smoothing parameter estimates in a nonparametric regression model. In practice, generalized cross-validation (GCV) does not work well in the presence of data perturbation. We therefore propose local influence measures for GCV estimates and examine their diagnostic effects.
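
The diagnostic papers by Lee et al. listed above both build on generalized cross-validation (GCV) for choosing a smoothing parameter. For any linear smoother y_hat = A y, the GCV score is GCV = n * RSS / (n - tr(A))^2, and the parameter minimizing it is chosen. The sketch below substitutes a simple running-mean smoother for a smoothing spline so that tr(A) is easy to compute; the smoother, the use of the window half-width as the smoothing parameter, and all function names are hypothetical, and the local influence diagnostics proposed in the papers are not implemented.

```python
def running_mean_smoother(y, half_width):
    """Linear smoother: each fitted value is the mean of y within +/- half_width.

    Returns the fitted values and tr(A), where the i-th diagonal entry of
    the smoother matrix A is 1 / (number of points in window i).
    """
    n = len(y)
    fitted, trace = [], 0.0
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        window = y[lo:hi]
        fitted.append(sum(window) / len(window))
        trace += 1.0 / len(window)
    return fitted, trace


def gcv_score(y, fitted, trace):
    """GCV = n * RSS / (n - tr(A))**2 for a linear smoother."""
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return n * rss / (n - trace) ** 2


def choose_half_width(y, candidates):
    """Pick the smoothing parameter (window half-width) that minimizes GCV."""
    return min(candidates, key=lambda h: gcv_score(y, *running_mean_smoother(y, h)))
```

A half-width of 0 reproduces the data exactly (tr(A) = n), which makes the GCV denominator zero, so candidate values should start at 1.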

A Validation Study of the Modified Korean Version of Ethical Leadership at Work Questionnaire (K-ELW)

  • 김정언; 박은준
    • Journal of Korean Academy of Nursing (대한간호학회지) / Vol. 45, No. 2 / pp. 240-250 / 2015
  • Purpose: The purpose of this study was to validate the Korean version of the Ethical Leadership at Work questionnaire (K-ELW), which measures RNs' perceived ethical leadership of their nurse managers. Methods: The strong validation process suggested by Benson (1998), comprising the translation and cultural adaptation stage, the structural stage, and the external stage, was used. Participants were 241 RNs who reported their perceived ethical leadership using both the pre-version of the K-ELW and a previously known Ethical Leadership Scale, the interactional justice of their managers, and their own demographics, organizational commitment, and organizational citizenship behavior. Data analyses included descriptive statistics, Pearson correlation coefficients, reliability coefficients, exploratory factor analysis, and confirmatory factor analysis, using SPSS 19.0 and Amos 18.0. Results: A modified K-ELW was developed from construct validity evidence and included 31 items in 7 domains: people orientation, task responsibility fairness, relationship fairness, power sharing, concern for sustainability, ethical guidance, and integrity. Convergent validity, discriminant validity, and concurrent validity were supported by the correlation coefficients of the 7 domains with other measures. Conclusion: The results of this study provide preliminary evidence that the modified K-ELW can be adopted in Korean nursing organizations and that reliable and valid ethical leadership scores can be expected.
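
The abstract lists reliability coefficients without naming one; Cronbach's alpha is a common choice for questionnaire scales, so the sketch below assumes it. It applies the standard formula alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores). Population variances are used throughout, which leaves the ratio unchanged relative to sample variances. `cronbach_alpha` is a hypothetical name.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha (assumed reliability coefficient).

    responses: one list per respondent, each holding k item scores.
    """
    k = len(responses[0])
    items = list(zip(*responses))  # one tuple of scores per item
    item_var_sum = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(r) for r in responses])
    return k / (k - 1) * (1 - item_var_sum / total_var)
```

Perfectly consistent items (every respondent gives the same score on each item) yield an alpha of 1.0; the function requires at least two items and non-constant total scores.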

Consensus Clustering for Time Course Gene Expression Microarray Data

  • Kim, Seo-Young; Bae, Jong-Sung
    • Communications for Statistical Applications and Methods / Vol. 12, No. 2 / pp. 335-348 / 2005
  • The rapid development of microarray technologies has enabled the monitoring of expression levels of thousands of genes simultaneously. Recently, time course gene expression data have often been measured to study dynamic biological systems and gene regulatory networks. For such data, biologists attempt to group genes based on the temporal pattern of their expression levels. We apply the consensus clustering algorithm to time course gene expression data in order to infer statistically meaningful information from the measurements, and we evaluate consensus clustering and existing clustering methods with various validation measures. In this paper, we consider hierarchical clustering and Diana as existing methods, and consensus clustering based on hierarchical clustering, Diana, and a mixture of hierarchical clustering and Diana, and evaluate their performance on a real microarray data set and two simulated data sets.
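
The consensus idea can be sketched as a consensus (co-association) matrix recording how often each pair of genes lands in the same cluster across several clustering runs, followed by a final grouping of that matrix. This is an illustration under stated assumptions, not the authors' algorithm: the final step here links pairs whose consensus is at least a threshold and takes connected components, which is equivalent to cutting a single-linkage tree built on 1 - consensus at height 1 - threshold. The function names and the 0.5 threshold are hypothetical.

```python
from itertools import combinations


def consensus_matrix(labelings):
    """Fraction of clustering runs in which each pair of objects shares a cluster.

    labelings: list of label lists, one per run, all over the same n objects.
    """
    n = len(labelings[0])
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        m[i][i] = 1.0
    for i, j in combinations(range(n), 2):
        together = sum(lab[i] == lab[j] for lab in labelings)
        m[i][j] = m[j][i] = together / len(labelings)
    return m


def consensus_clusters(matrix, threshold=0.5):
    """Connected components of the graph with edges where consensus >= threshold."""
    n = len(matrix)
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if matrix[i][j] >= threshold:
            parent[find(i)] = find(j)
    # Objects with the same root share a final cluster label.
    return [find(i) for i in range(n)]
```

In practice the input labelings would come from repeated runs of a base method (for example, hierarchical clustering or Diana on resampled data).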

Development of On-Line Diagnostic Expert System: Algorithmic Sensor Validation

  • 김영진
    • Transactions of the Korean Society of Mechanical Engineers (대한기계학회논문집) / Vol. 18, No. 2 / pp. 323-338 / 1994
  • This paper outlines a framework for performing intelligent sensor validation for a diagnostic expert system while reasoning under uncertainty. The emphasis is on the algorithmic preprocessing technique; a companion paper focuses on heuristic post-processing. Sensor validation plays a vital role in the ability of the overall system to correctly determine the state of a plant monitored by imperfect sensors. In particular, several theoretical developments were made in understanding uncertain sensory data from a statistical perspective. Uncertain information in sensory values is represented through probability assignments on three discrete states, "high", "normal", and "low", and through additional sensor confidence measures in the Algorithmic SV. Upper and lower warning limits, which represent the borderlines for heat rate degradation, are generated from the historical learning sets, and the information generated in the Algorithmic SV initiates a historical database for better reference in future use. All the information generated in the Algorithmic SV initiates a session to differentiate sensor faults from process faults and to make inferences about system performance. This framework for a diagnostic expert system with sensor validation and reasoning under uncertainty is applied in HEATXPRT™, a data-driven on-line expert system for diagnosing heat rate degradation problems in fossil power plants.
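
The three-state representation can be illustrated with crisp warning limits derived from historical readings. The rule used here (mean plus or minus k sample standard deviations) and the function names are hypothetical assumptions; the abstract does not give the actual rule, and the paper assigns probabilities to the three states rather than the single hard label returned below.

```python
from statistics import mean, stdev


def warning_limits(history, k=3.0):
    """Hypothetical rule: lower/upper limits at mean -/+ k sample std devs."""
    mu, sigma = mean(history), stdev(history)
    return mu - k * sigma, mu + k * sigma


def sensor_state(reading, limits):
    """Map a reading to one of the three discrete states."""
    lower, upper = limits
    if reading > upper:
        return "high"
    if reading < lower:
        return "low"
    return "normal"
```

For historical readings 9.0, 10.0, and 11.0 with k = 2, the limits are 8.0 and 12.0, so a reading of 12.5 is labeled "high".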

A Study on the Development and Validation of a Learner-Centered 'Course Evaluation' Scale

  • 박성미
    • The Journal of Fisheries and Marine Sciences Education (수산해양교육연구) / Vol. 23, No. 1 / pp. 13-22 / 2011
  • The purpose of this study was to develop and validate a learner-centered 'course evaluation' scale for universities. Preliminary data were collected from 1,567 university students for item and scale quality analyses; further responses were collected from 2,539 university students for item and scale quality analyses and from 300 university professors for validation. The data were analyzed for item quality, reliability, and validity. The results were as follows. The learner-centered 'course evaluation' scale was defined by five factors: structure and sincerity of lectures, suitability of reports and tests, level of consulting for students, application of educational media, and communication. Confirmatory factor analysis confirmed the five subscales of the scale. Criterion-related validity evidence was obtained from correlation analysis with the criterion measures, and cross-validity evidence was obtained from confirmatory factor analysis of the university professors' responses.