• Title/Summary/Keyword: Data normalization


Effects of Normalization and Aggregation Methods on the Volatility of Rankings and Rank Reversals (정규화 및 통합 방법이 순위의 변동성과 순위 역전에 미치는 영향)

  • Park, Youngsun
    • Journal of Korean Society for Quality Management
    • /
    • v.41 no.4
    • /
    • pp.709-724
    • /
    • 2013
  • Purpose: The purpose of this study is to examine five evaluation models constructed by different normalization and aggregation methods in terms of the volatility of rankings and rank reversals. We also explore how the volatility of rankings of the five models changes and how often the rank reversals occur when the outliers are removed. Methods: We used data published in the Complete University Guide 2014. Two universities with missing values were excluded from the data. The university rankings were derived by using the five models, and then each model's volatility of rankings was measured. The box-plot was used to detect outliers. Results: Model 1 has the lowest volatility among the five models whether or not the outliers are included. Model 5 has the lowest number of rank reversals. Model 3, which has been used by many institutions, appears to be in the middle among the five in terms of the volatility and the rank reversals. Conclusion: The university rankings vary from one evaluation model to another depending on what normalization and aggregation methods are used. No single model exhibits clear superiority over others in both the volatility and the rank reversal. The findings of this study are expected to provide a stepping stone toward a superior model which is both reliable and robust.
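
An illustrative sketch of the kind of comparison described above (not the paper's five models): two common normalization schemes, min-max and z-score, chosen here as stand-ins, combined with weighted-sum aggregation, with rank reversals counted as pairs of universities whose order differs between two rankings. The indicator data and weights below are synthetic.

```python
# Illustrative only: min-max and z-score normalization with weighted-sum
# aggregation, and a pairwise count of rank reversals between two rankings.
import numpy as np

def min_max(x):
    return (x - x.min()) / (x.max() - x.min())

def z_score(x):
    return (x - x.mean()) / x.std()

def rank(scores):
    # Rank 1 = best (highest aggregated score).
    order = np.argsort(-scores)
    r = np.empty_like(order)
    r[order] = np.arange(1, len(scores) + 1)
    return r

def aggregate(data, normalizer, weights):
    # Normalize each indicator (column), then take a weighted sum per university.
    normed = np.column_stack([normalizer(data[:, j]) for j in range(data.shape[1])])
    return normed @ weights

def rank_reversals(ranks_a, ranks_b):
    # Count pairs of universities whose relative order differs between rankings.
    n, reversals = len(ranks_a), 0
    for i in range(n):
        for j in range(i + 1, n):
            if (ranks_a[i] - ranks_a[j]) * (ranks_b[i] - ranks_b[j]) < 0:
                reversals += 1
    return reversals

rng = np.random.default_rng(0)
data = rng.normal(50, 10, size=(20, 4))      # 20 universities, 4 indicators
data[0, 0] = 120                             # an outlier value on one indicator
weights = np.array([0.4, 0.3, 0.2, 0.1])

r_minmax = rank(aggregate(data, min_max, weights))
r_zscore = rank(aggregate(data, z_score, weights))
print("rank reversals between the two models:", rank_reversals(r_minmax, r_zscore))
```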

Rank-Based Nonlinear Normalization of Oligonucleotide Arrays

  • Park, Peter J.;Kohane, Isaac S.;Kim, Ju Han
    • Genomics & Informatics
    • /
    • v.1 no.2
    • /
    • pp.94-100
    • /
    • 2003
  • Motivation: Many have observed a nonlinear relationship between the signal intensity and the transcript abundance in microarray data. The first step in analyzing the data is to normalize it properly, and this should include a correction for the nonlinearity. The commonly used linear normalization schemes do not address this problem. Results: Nonlinearity is present in both cDNA and oligonucleotide arrays, but we concentrate on the latter in this paper. Across a set of chips, we identify those genes whose within-chip ranks are relatively constant compared to other genes of similar intensity. For each gene, we compute the sum of the squares of the differences in its within-chip ranks between every pair of chips as our statistic and we select a small fraction of the genes with the minimal changes in ranks at each intensity level. These genes are most likely to be non-differentially expressed and are subsequently used in the normalization procedure. This method is a generalization of the rank-invariant normalization (Li and Wong, 2001), using all available chips rather than two at a time to gather more information, while using the chip that is least likely to be affected by nonlinear effects as the reference chip. The assumption in our method is that there are at least a small number of non-differentially expressed genes across the intensity range. The normalized expression values can be substantially different from the unnormalized values and may result in altered downstream analysis.
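
A rough sketch of the rank-difference statistic the abstract describes: for each gene, sum the squared differences of its within-chip ranks over all chip pairs, then keep the most rank-stable genes in each intensity bin. The bin count, selection fraction, and synthetic data below are illustrative assumptions, not values from the paper.

```python
# Select candidate rank-invariant genes: small summed squared rank change
# across all chip pairs, taken per intensity bin (bins/fraction are assumed).
import numpy as np
from itertools import combinations

def within_chip_ranks(expr):
    # expr: genes x chips intensity matrix; rank genes within each chip.
    ranks = np.empty_like(expr, dtype=float)
    for c in range(expr.shape[1]):
        ranks[:, c] = expr[:, c].argsort().argsort()
    return ranks

def rank_change_statistic(expr):
    ranks = within_chip_ranks(expr)
    stat = np.zeros(expr.shape[0])
    for a, b in combinations(range(expr.shape[1]), 2):
        stat += (ranks[:, a] - ranks[:, b]) ** 2
    return stat

def select_rank_invariant(expr, n_bins=10, fraction=0.05):
    stat = rank_change_statistic(expr)
    mean_intensity = expr.mean(axis=1)
    edges = np.quantile(mean_intensity, np.linspace(0, 1, n_bins + 1))
    selected = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = np.where((mean_intensity >= lo) & (mean_intensity <= hi))[0]
        if len(idx):
            k = max(1, int(fraction * len(idx)))
            selected.extend(idx[np.argsort(stat[idx])[:k]])
    return np.array(sorted(set(selected)))

rng = np.random.default_rng(1)
expr = rng.lognormal(mean=6, sigma=1, size=(1000, 8))   # 1000 genes, 8 chips
invariant = select_rank_invariant(expr)
print(len(invariant), "candidate non-differentially expressed genes")
```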

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi;Cheng, Wenying;Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.11
    • /
    • pp.4028-4042
    • /
    • 2021
  • Aiming at the difficulty of software defect prediction caused by insufficient labelled defect samples and class imbalance, a semi-supervised software defect prediction model based on the Tri-training algorithm was proposed, combining feature normalization, over-sampling, and the Tri-training algorithm. First, feature normalization is used to smooth the feature data and eliminate the influence of very large or very small feature values on the model's classification performance. Second, over-sampling is used to expand the data, which addresses the class imbalance of the labelled samples. Finally, the Tri-training algorithm performs machine learning on the training samples and establishes a defect prediction model. The novelty of this model is that it effectively combines feature normalization, over-sampling, and the Tri-training algorithm to solve both the under-labelled sample and class imbalance problems. Simulation experiments using the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of Precision, Recall, and F-Measure.
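
A simplified sketch of such a pipeline, not the authors' implementation: min-max feature normalization, naive random over-sampling of the minority class, and a single Tri-training-style round in which an unlabelled sample is pseudo-labelled for one learner when the other two agree on it. All data and hyperparameters are illustrative.

```python
# Sketch: normalization + over-sampling + one simplified Tri-training round.
import numpy as np
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample

def oversample(X, y, random_state=0):
    # Duplicate minority-class samples until the two classes are balanced.
    classes, counts = np.unique(y, return_counts=True)
    if counts.min() == counts.max():
        return X, y
    minority = classes[np.argmin(counts)]
    X_min, y_min = X[y == minority], y[y == minority]
    X_extra, y_extra = resample(X_min, y_min,
                                n_samples=int(counts.max() - counts.min()),
                                random_state=random_state)
    return np.vstack([X, X_extra]), np.concatenate([y, y_extra])

def tri_training_round(X_lab, y_lab, X_unlab, seeds=(0, 1, 2)):
    # Train three learners on bootstrap samples, then pseudo-label each
    # unlabelled sample for one learner when the other two agree on it.
    clfs = [DecisionTreeClassifier(random_state=s).fit(
                *resample(X_lab, y_lab, random_state=s)) for s in seeds]
    preds = np.array([c.predict(X_unlab) for c in clfs])
    for i, clf in enumerate(clfs):
        a, b = [j for j in range(3) if j != i]
        agree = preds[a] == preds[b]
        if agree.any():
            clf.fit(np.vstack([X_lab, X_unlab[agree]]),
                    np.concatenate([y_lab, preds[a][agree]]))
    return clfs

rng = np.random.default_rng(2)
X = MinMaxScaler().fit_transform(rng.normal(size=(300, 20)))   # feature normalization
y = (rng.random(300) < 0.15).astype(int)                       # imbalanced defect labels
X_bal, y_bal = oversample(X[:100], y[:100])                    # over-sample the labelled part
models = tri_training_round(X_bal, y_bal, X[100:])             # one Tri-training round
```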

Annual Yearly Load Forecasting by Using Seasonal Load Characteristics With Considering Weekly Normalization (주단위 정규화를 통하여 계절별 부하특성을 고려한 연간 전력수요예측)

  • Cha, Jun-Min;Yoon, Kyoung-Ha;Ku, Bon-Hui
    • Proceedings of the KIEE Conference
    • /
    • 2011.07a
    • /
    • pp.199-200
    • /
    • 2011
  • Load forecasting is very important for power system analysis and planning. This paper suggests a yearly load forecasting method that uses weekly normalization and seasonal load characteristics. Each weekly peak load is normalized and the average value is calculated, and the new hourly peak load is collected by season. This method was used for yearly load forecasting, and the error rate was calculated by comparing the forecast data with the actual data.
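
A minimal sketch of the idea under an assumed data layout (the exact formulas are not given in the abstract): divide each week's hourly loads by that week's peak, average the normalized weekly profiles within a season, rescale the averaged shape by a forecast peak, and compare against actual data for an error rate.

```python
# Sketch of weekly normalization for a seasonal load profile; data are synthetic.
import numpy as np

def seasonal_profile(weekly_loads):
    # weekly_loads: (n_weeks, 168) hourly loads for one season.
    normalized = weekly_loads / weekly_loads.max(axis=1, keepdims=True)
    return normalized.mean(axis=0)                 # average normalized weekly shape

def forecast_week(profile, forecast_peak):
    return profile * forecast_peak                 # rescale shape to a forecast peak

def error_rate(actual, forecast):
    return np.abs(actual - forecast).mean() / actual.mean() * 100

rng = np.random.default_rng(3)
summer_weeks = 4000 + 1500 * rng.random((12, 168))   # 12 summer weeks of hourly load
profile = seasonal_profile(summer_weeks[:-1])
forecast = forecast_week(profile, forecast_peak=6000)
print("error rate vs. the held-out week: %.2f%%" % error_rate(summer_weeks[-1], forecast))
```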


New Normalization Methods using Support Vector Machine Regression Approach in cDNA Microarray Analysis

  • Sohn, In-Suk;Kim, Su-Jong;Hwang, Chang-Ha;Lee, Jae-Won
    • Proceedings of the Korean Society for Bioinformatics Conference
    • /
    • 2005.09a
    • /
    • pp.51-56
    • /
    • 2005
  • There are many sources of systematic variation in cDNA microarray experiments which affect the measured gene expression levels, such as differences in labeling efficiency between the two fluorescent dyes. Print-tip lowess normalization is used in situations where dye biases can depend on overall spot intensity and/or spatial location within the array. However, print-tip lowess normalization performs poorly in situations where the error variability for each gene is heterogeneous over intensity ranges. We propose new print-tip normalization methods based on support vector machine regression (SVMR) and support vector machine quantile regression (SVMQR). SVMQR was derived by employing the basic principle of the support vector machine (SVM) for the estimation of linear and nonlinear quantile regressions. We applied the proposed methods to previous cDNA microarray data of apolipoprotein-AI-knockout (apoAI-KO) mice, diet-induced obese mice, and genistein-fed obese mice. From our statistical analysis, we found that the proposed methods perform better than the existing print-tip lowess normalization method.
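
A rough sketch of intensity-dependent normalization with support vector regression in the spirit of SVMR: fit the log-ratio M as a function of the average log-intensity A and subtract the fitted curve. The per-print-tip grouping and the quantile (SVMQR) variant are omitted, and the kernel, parameters, and synthetic data are illustrative assumptions.

```python
# Sketch: fit M (log-ratio) against A (average log-intensity) with SVR
# and subtract the fitted intensity-dependent bias.
import numpy as np
from sklearn.svm import SVR

def svr_normalize(red, green, C=10.0, epsilon=0.05):
    A = 0.5 * (np.log2(red) + np.log2(green))     # average log-intensity
    M = np.log2(red) - np.log2(green)             # log-ratio (dye bias lives here)
    model = SVR(kernel="rbf", C=C, epsilon=epsilon).fit(A.reshape(-1, 1), M)
    bias = model.predict(A.reshape(-1, 1))
    return M - bias                                # normalized log-ratios

rng = np.random.default_rng(4)
green = rng.lognormal(8, 1, 2000)
red = green * 2 ** (0.3 + 0.1 * np.log2(green))    # synthetic intensity-dependent bias
M_norm = svr_normalize(red, green)
print("mean |M| after normalization: %.3f" % np.abs(M_norm).mean())
```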


The Effects of the Postural Movement Normalization and Eye Movement Program on the Oculomotor Ability of Children With Cerebral Palsy (자세·움직임 정상화 및 안구운동 프로그램이 뇌성마비아동의 안구운동 기능에 미치는 효과)

  • Han, Dong-Wook;Kong, Nam-Ho
    • Physical Therapy Korea
    • /
    • v.14 no.3
    • /
    • pp.32-40
    • /
    • 2007
  • Although many children with cerebral palsy have problems with their eye movements, available data on interventions is minimal. The purpose of this study was to determine the effectiveness of a postural movement normalization and eye movement program on the oculomotor ability of children with cerebral palsy. Twenty-four children with cerebral palsy (12 male and 12 female), aged between 10 and 12, were invited to take part in this study. The subjects were randomly allocated to two groups: an experimental group which received the postural movement normalization and eye movement program, and a control group which received conventional therapy without the eye movement program. Each subject received the intervention three times a week for twelve weeks. Oculomotor ability was measured with a computerized ocular motor test before and after the treatment sessions by an independent assessor. Differences between the experimental group and the control group were determined by assessing changes in oculomotor ability using analysis of covariance (ANCOVA). The changes in visual fixation (p<.01), saccadic eye movement (p<.01), and pursuit eye movement (p<.01) were significantly greater in the experimental group than in the control group. These results show that the postural movement normalization and eye movement program may be helpful in treating children with cerebral palsy who lack normal postural and eye movement.


A Concordance Study of the Preprocessing Orders in Microarray Data (마이크로어레이 자료의 사전 처리 순서에 따른 검색의 일치도 분석)

  • Kim, Sang-Cheol;Lee, Jae-Hwi;Kim, Byung-Soo
    • The Korean Journal of Applied Statistics
    • /
    • v.22 no.3
    • /
    • pp.585-594
    • /
    • 2009
  • Researchers in microarray experiments transform processed images of raw data into data suitable for statistical analysis; this step is preprocessing. Microarray preprocessing includes image filtering, imputation, and normalization. Several different normalization and imputation methods have been studied, but the order in which the two procedures should be applied has received little attention. This study examines how the order of the preprocessing steps affects the identification of differentially expressed genes (DEGs), using two-dye cDNA microarray data from colon cancer and gastric cancer; that is, we compare which combinations and orders of imputation and normalization steps detect the DEGs. We used two imputation methods (K-nearest neighbor, Bayesian principal component analysis) and three normalization methods (global, within-print-tip group, variance stabilization), giving 12 preprocessing combinations. We measured the concordance of the DEGs identified from the datasets to which the 12 different preprocessing orders were applied. When variance-stabilizing normalization was applied, there was little variation in the DEGs detected across the preprocessing orders.
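
A sketch of the order comparison on synthetic data: run "impute then normalize" and "normalize then impute" on the same matrix with missing values, take the top genes by absolute mean log-ratio in each case, and measure concordance as the overlap of the two gene lists. KNN imputation and a global median normalization stand in for the specific methods listed above.

```python
# Compare two preprocessing orders and measure concordance of top gene lists.
import numpy as np
from sklearn.impute import KNNImputer

def global_median_normalize(M):
    # Subtract each array's median log-ratio (columns = arrays).
    return M - np.nanmedian(M, axis=0, keepdims=True)

def top_genes(M, k=100):
    return set(np.argsort(-np.abs(M.mean(axis=1)))[:k])

rng = np.random.default_rng(5)
M = rng.normal(0, 1, size=(2000, 6))               # 2000 genes, 6 arrays
M[rng.random(M.shape) < 0.05] = np.nan             # 5% missing spots

impute = KNNImputer(n_neighbors=10)
order_a = global_median_normalize(impute.fit_transform(M))    # impute -> normalize
order_b = impute.fit_transform(global_median_normalize(M))    # normalize -> impute

genes_a, genes_b = top_genes(order_a), top_genes(order_b)
concordance = len(genes_a & genes_b) / len(genes_a)
print("concordance of top-100 gene lists: %.2f" % concordance)
```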

Normalization of XQuery Queries for Efficient XML Query Processing (효율적인 XML질의 처리를 위한 XQuery 질의의 정규화)

  • 김서영;이기훈;황규영
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.10 no.5
    • /
    • pp.419-433
    • /
    • 2004
  • As XML becomes a standard for data representation, integration, and exchange on the Web, several XML query languages have been proposed. The World Wide Web Consortium (W3C) has proposed XQuery as a standard XML query language. Like SQL, XQuery allows nested queries. Thus, normalization rules have been proposed to transform nested XQuery queries into semantically equivalent ones that can be executed more efficiently. However, previous normalization rules are applicable only to restricted forms of nested XQuery queries. Specifically, they cannot handle FLWR expressions having nested expressions in the where clause. In this paper, we propose normalization rules for XQuery queries by extending those for SQL queries. Our proposed rules can handle FLWR expressions having nested expressions in every clause. The major contributions of this paper are as follows. First, we classify the nesting types of XQuery queries according to the existence of correlation and aggregation. We then propose normalization rules for each nesting type. Second, we propose detailed algorithms that apply the normalization rules to nested XQuery queries.

A Brief Verification Study on the Normalization and Translation Invariant of Measurement Data for Seaport Efficiency;DEA Approach (항만효율성 측정 자료의 정규성과 변환 불변성 검증소고;DEA접근)

  • Park, Ro-Kyung
    • Proceedings of the Korea Port Economic Association Conference
    • /
    • 2007.07a
    • /
    • pp.391-405
    • /
    • 2007
  • The purpose of this paper is to verify two issues (normalization of the different input and output data, and translation invariance for negative data) that occur when measuring seaport DEA (data envelopment analysis) efficiency. The main result is as follows: normalization and translation invariance in the BCC model for measuring seaport efficiency were verified using data for 26 Korean seaports in 1995, with two inputs (berthing capacity, cargo handling capacity) and three outputs (import cargo throughput, export cargo throughput, number of ship calls). The main policy implication of this paper is that the port management authority should collect and publish more specific input and output data for the seaports, including both negative values (e.g., the number of accidents at each seaport) and positive values, so that scholars can analyze efficiency, because normalization and translation invariance of the data were verified.
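
A minimal sketch of the translation-invariance check with synthetic data rather than the 1995 seaport dataset: solve an input-oriented BCC (variable returns to scale) DEA model for each DMU with scipy's linear programming solver, then verify that adding a constant to the outputs leaves the efficiency scores unchanged.

```python
# Input-oriented BCC DEA per DMU; check translation invariance for outputs.
import numpy as np
from scipy.optimize import linprog

def bcc_input_efficiency(X, Y):
    # X: (n_dmu, n_inputs), Y: (n_dmu, n_outputs). Returns theta per DMU.
    n, m = X.shape
    s = Y.shape[1]
    thetas = []
    for o in range(n):
        # Variables: [theta, lambda_1..lambda_n]; minimize theta.
        c = np.concatenate([[1.0], np.zeros(n)])
        A_ub, b_ub = [], []
        for i in range(m):      # sum_j lambda_j * x_ij - theta * x_io <= 0
            A_ub.append(np.concatenate([[-X[o, i]], X[:, i]]))
            b_ub.append(0.0)
        for r in range(s):      # -sum_j lambda_j * y_rj <= -y_ro
            A_ub.append(np.concatenate([[0.0], -Y[:, r]]))
            b_ub.append(-Y[o, r])
        A_eq = [np.concatenate([[0.0], np.ones(n)])]   # convexity: sum lambda = 1
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      A_eq=np.array(A_eq), b_eq=[1.0],
                      bounds=[(None, None)] + [(0, None)] * n)
        thetas.append(res.fun)
    return np.array(thetas)

rng = np.random.default_rng(6)
X = rng.uniform(10, 100, size=(26, 2))     # e.g., berthing and handling capacity
Y = rng.uniform(10, 100, size=(26, 3))     # e.g., import, export, ship calls
theta = bcc_input_efficiency(X, Y)
theta_shifted = bcc_input_efficiency(X, Y + 50.0)   # translate all outputs by +50
print("max change in efficiency after translation:", np.abs(theta - theta_shifted).max())
```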


Maternal Uncertainty in Childhood Chronic Illness (만성질환아 어머니의 아동질병으로 인한 불확실성 경험)

  • Park, Eun Sook;Martinson, M.I.
    • Child Health Nursing Research
    • /
    • v.4 no.2
    • /
    • pp.207-220
    • /
    • 1998
  • The purpose of this study was to build a substantive theory about the experience of maternal uncertainty in childhood chronic illness. The qualitative research method used was grounded theory. The interviewees were 12 mothers who had cared for a child with a chronic illness. The data were collected through in-depth interviews with audiotape recording done by the investigator over a period of nine months. The data were analyzed simultaneously by a constant comparative method in which new data were continuously coded into categories and properties according to Strauss and Corbin's methodology. Thirty-four concepts were identified as a result of analyzing the grounded data. Ten categories emerged from the analysis: lack of clarity, unpredictability, unfamiliarity, negative change, anxiety, devotion, normalization, and burn-out. Causal conditions included lack of clarity, unpredictability, unfamiliarity, and change; central phenomena: anxiety, being perplexed; context: seriousness of illness, support; intervening condition: belief; action/interaction strategies: devotion, overprotection; consequences: normalization, burn-out. These categories were synthesized into the core concept, anxiety. The process of experiencing uncertainty was 1) entering the world of uncertainty, 2) struggling in the tunnel of uncertainty, 3) reconstruction of the situation of uncertainty. Four hypotheses were derived from the analysis: (1) The higher the lack of clarity, unpredictability, unfamiliarity, and change, the higher the level of uncertainty. (2) The more serious the illness and the less the support, the higher the level of uncertainty. (3) Positive beliefs will influence devoted care and the normalization of family life. Through this substantive theory, pediatric nurses can understand the process of experiencing maternal uncertainty in childhood chronic illness. Further research to build substantive theories explaining other uncertainties may contribute to a formal theory of how normalization is achieved in families with a chronically ill child.
