• Title/Abstract/Keywords: data normalization

487 search results (processing time: 0.029 s)

Effects of Normalization and Aggregation Methods on the Volatility of Rankings and Rank Reversals

  • 박영선
    • 품질경영학회지 (Journal of the Korean Society for Quality Management) / Vol. 41, No. 4 / pp. 709-724 / 2013
  • Purpose: The purpose of this study is to examine five evaluation models, constructed with different normalization and aggregation methods, in terms of ranking volatility and rank reversals. We also explore how the ranking volatility of the five models changes, and how often rank reversals occur, when outliers are removed. Methods: We used data published in the Complete University Guide 2014. Two universities with missing values were excluded from the data. University rankings were derived with each of the five models, and each model's ranking volatility was then measured. Box plots were used to detect outliers. Results: Model 1 has the lowest volatility among the five models whether or not outliers are included. Model 5 has the fewest rank reversals. Model 3, which has been used by many institutions, appears to fall in the middle of the five in terms of both volatility and rank reversals. Conclusion: University rankings vary from one evaluation model to another depending on the normalization and aggregation methods used. No single model is clearly superior to the others in terms of both volatility and rank reversals. The findings of this study are expected to serve as a stepping stone toward a superior model that is both reliable and robust.
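The abstract does not spell out the five models, so the following is only a minimal sketch under stated assumptions: it pairs min-max normalization (z-scores are included as an alternative) with equal-weight additive aggregation, and flags outliers with the conventional 1.5 × IQR box-plot rule. It shows how dropping one outlier can reverse the order of two remaining universities. All function names and the four-university data are hypothetical.

```python
import statistics


def min_max(column):
    """Rescale one indicator to [0, 1]; assumes the column is not constant."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


def z_score(column):
    """Standardize one indicator to mean 0 and population standard deviation 1."""
    mu, sd = statistics.mean(column), statistics.pstdev(column)
    return [(x - mu) / sd for x in column]


def aggregate(rows, normalize=min_max):
    """Equal-weight additive aggregation: normalize each indicator, then sum per row."""
    columns = [list(c) for c in zip(*rows)]
    normalized = [normalize(c) for c in columns]
    return [sum(values) for values in zip(*normalized)]


def iqr_outliers(values):
    """Indices outside the box-plot fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [i for i, v in enumerate(values) if v < q1 - 1.5 * iqr or v > q3 + 1.5 * iqr]


def count_rank_reversals(before, after):
    """Count pairs whose strict order flips between two score lists."""
    flips = 0
    for i in range(len(before)):
        for j in range(i + 1, len(before)):
            if (before[i] - before[j]) * (after[i] - after[j]) < 0:
                flips += 1
    return flips


# Hypothetical scores of four universities on two indicators.
rows = [(10, 50), (20, 40), (30, 10), (100, 30)]
outliers = iqr_outliers([r[0] for r in rows])
kept = [i for i in range(len(rows)) if i not in outliers]
full = aggregate(rows)
trimmed = aggregate([rows[i] for i in kept])
print(outliers, count_rank_reversals([full[i] for i in kept], trimmed))  # [3] 1
```

With the outlier present, university 0 outranks university 1. Once the outlier is removed, the min-max range of the first indicator shrinks and the two swap places. Passing `normalize=z_score` to `aggregate` lets you compare a second normalization method on the same data.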

Rank-Based Nonlinear Normalization of Oligonucleotide Arrays

  • Park, Peter J.; Kohane, Isaac S.; Kim, Ju Han
    • Genomics & Informatics / Vol. 1, No. 2 / pp. 94-100 / 2003
  • Motivation: Many have observed a nonlinear relationship between signal intensity and transcript abundance in microarray data. The first step in analyzing the data is to normalize it properly, and this should include a correction for the nonlinearity. The commonly used linear normalization schemes do not address this problem. Results: Nonlinearity is present in both cDNA and oligonucleotide arrays, but we concentrate on the latter in this paper. Across a set of chips, we identify the genes whose within-chip ranks are relatively constant compared with other genes of similar intensity. For each gene, our statistic is the sum of the squared differences in its within-chip ranks over every pair of chips, and we select a small fraction of the genes with the smallest rank changes at each intensity level. These genes are the most likely to be non-differentially expressed and are subsequently used in the normalization procedure. This method generalizes rank-invariant normalization (Li and Wong, 2001): it uses all available chips rather than two at a time to gather more information, and it uses the chip least likely to be affected by nonlinear effects as the reference chip. The method assumes that there are at least a small number of non-differentially expressed genes across the intensity range. The normalized expression values can differ substantially from the unnormalized values and may alter downstream analysis.
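The gene-selection step above can be sketched as follows. This is a simplified illustration and not the authors' implementation. The statistic follows the abstract: the sum of squared within-chip rank differences over all chip pairs. The "intensity levels" are approximated by equal-size bins of mean intensity, which is an assumption, and the reference-chip choice and the nonlinear fit itself are omitted. The function names, `n_bins`, and `fraction` are hypothetical.

```python
from itertools import combinations


def within_chip_ranks(chip):
    """Rank of each gene's intensity within one chip (0 = lowest)."""
    order = sorted(range(len(chip)), key=lambda g: chip[g])
    ranks = [0] * len(chip)
    for r, g in enumerate(order):
        ranks[g] = r
    return ranks


def rank_change_statistic(chips):
    """For every gene, the sum of squared rank differences over all chip pairs."""
    ranks = [within_chip_ranks(c) for c in chips]
    return [
        sum((ra[g] - rb[g]) ** 2 for ra, rb in combinations(ranks, 2))
        for g in range(len(chips[0]))
    ]


def select_invariant_genes(chips, n_bins=2, fraction=0.5):
    """Pick the most rank-stable genes separately within each intensity bin."""
    stat = rank_change_statistic(chips)
    n_genes = len(chips[0])
    mean_intensity = [sum(c[g] for c in chips) / len(chips) for g in range(n_genes)]
    by_intensity = sorted(range(n_genes), key=lambda g: mean_intensity[g])
    size = -(-n_genes // n_bins)  # ceiling division: genes per bin
    selected = []
    for start in range(0, n_genes, size):
        bin_genes = by_intensity[start:start + size]
        k = max(1, int(len(bin_genes) * fraction))
        selected += sorted(bin_genes, key=lambda g: stat[g])[:k]
    return sorted(selected)


# Two hypothetical chips with six genes; genes 1/2 and 4/5 swap ranks on chip 2.
chips = [[1, 2, 3, 10, 20, 30], [1, 3, 2, 10, 30, 20]]
print(rank_change_statistic(chips))                          # [0, 1, 1, 0, 1, 1]
print(select_invariant_genes(chips, n_bins=2, fraction=0.4))  # [0, 3]
```

Binning by intensity before selection keeps the chosen "invariant" genes spread across the whole intensity range. The abstract's assumption of non-differentially expressed genes across that range is what a later nonlinear normalization curve relies on.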

Semi-supervised Software Defect Prediction Model Based on Tri-training

  • Meng, Fanqi; Cheng, Wenying; Wang, Jingdong
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 15, No. 11 / pp. 4028-4042 / 2021
  • To address the difficulty of software defect prediction caused by insufficient labeled defect samples and class imbalance, a semi-supervised software defect prediction model based on the Tri-training algorithm is proposed that combines feature normalization, oversampling, and Tri-training. First, feature normalization is used to smooth the feature data and eliminate the influence of excessively large or small feature values on the model's classification performance. Second, oversampling is used to expand the data, which addresses the class imbalance among the labeled samples. Finally, the Tri-training algorithm is trained on the training samples to build the defect prediction model. The novelty of this model is that it effectively combines feature normalization, oversampling, and the Tri-training algorithm to address both the shortage of labeled samples and class imbalance. Simulation experiments on the NASA software defect prediction dataset show that the proposed method outperforms four existing supervised and semi-supervised learning methods in terms of precision, recall, and F-measure.
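The abstract does not name its exact normalization or oversampling technique. The sketch below therefore assumes min-max scaling and plain random oversampling; SMOTE or another method may be what the paper actually uses. It shows only the two preprocessing steps. The Tri-training stage is not implemented here. Function names and data are hypothetical.

```python
import random


def min_max_normalize(rows):
    """Rescale every feature column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [max(c) - min(c) for c in cols]
    return [
        [(x - lo) / s if s else 0.0 for x, lo, s in zip(row, lows, spans)]
        for row in rows
    ]


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes are equal in size."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(v) for v in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        extra = [list(rng.choice(members)) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(y)
    return out_rows, out_labels


rows = [[1, 100], [2, 200], [3, 300], [4, 400]]  # hypothetical module metrics
labels = [0, 0, 0, 1]                              # 1 = defective (minority class)
x = min_max_normalize(rows)
x_bal, y_bal = random_oversample(x, labels)
print(y_bal.count(0), y_bal.count(1))              # 3 3
```

Normalizing before oversampling means the duplicated minority rows are already on the common [0, 1] scale. This addresses the abstract's concern about very large or very small feature values dominating the classifiers.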

Annual Yearly Load Forecasting by Using Seasonal Load Characteristics With Considering Weekly Normalization

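  • 차준민; 윤경하; 구본희
    • 대한전기학회:학술대회논문집 (Korean Institute of Electrical Engineers: Conference Proceedings) / The 42nd Summer Conference of the Korean Institute of Electrical Engineers (2011) / pp. 199-200 / 2011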
  • Load forecasting is very important for power system analysis and planning. This paper proposes a yearly load forecasting method that considers weekly normalization and seasonal load characteristics. Each weekly peak load is normalized and the average value is calculated. The resulting hourly peak loads are then grouped by season, and this method was used for yearly load forecasting. The error rate was calculated by comparing the forecasts with the actual data.

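The abstract gives only an outline of the procedure, so the following is a minimal sketch under explicit assumptions. Each week's load values are divided by that week's peak. The normalized weekly profiles of one season are averaged. The averaged profile is scaled by a forecast peak, and accuracy is reported as mean absolute percentage error (MAPE). MAPE and every function name here are hypothetical choices; the paper's exact error measure is not stated.

```python
def normalize_week(week):
    """Divide each load value in one week by that week's peak (the peak becomes 1.0)."""
    peak = max(week)
    return [x / peak for x in week]


def seasonal_profile(weeks):
    """Average the peak-normalized profiles of all weeks in one season."""
    normalized = [normalize_week(w) for w in weeks]
    return [sum(vals) / len(vals) for vals in zip(*normalized)]


def forecast(profile, peak_forecast):
    """Scale a normalized seasonal profile by a forecast peak load."""
    return [p * peak_forecast for p in profile]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs(a - p) / a for a, p in zip(actual, predicted)) / len(actual)


# Two hypothetical weeks, each reduced to two load readings.
profile = seasonal_profile([[50, 100], [40, 80]])   # [0.5, 1.0]
predicted = forecast(profile, 90)                    # [45.0, 90.0]
print(mape([50, 100], predicted))
```

Normalizing by the weekly peak separates the shape of the load curve from its level. The shape is averaged within a season, and the level comes from a separate peak forecast.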

New Normalization Methods using Support Vector Machine Regression Approach in cDNA Microarray Analysis

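  • Sohn, In-Suk; Kim, Su-Jong; Hwang, Chang-Ha; Lee, Jae-Won
    • 한국생물정보학회:학술대회논문집 (Korean Society for Bioinformatics: Conference Proceedings) / BIOINFO 2005, 한국생물정보시스템생물학회 (Korean Society for Bioinformatics and Systems Biology) / pp. 51-56 / 2005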
  • Many sources of systematic variation in cDNA microarray experiments affect the measured gene expression levels, such as differences in labeling efficiency between the two fluorescent dyes. Print-tip lowess normalization is used in situations where dye biases can depend on a spot's overall intensity and/or its spatial location within the array. However, print-tip lowess normalization performs poorly when the error variability of each gene is heterogeneous across intensity ranges. We propose new print-tip normalization methods based on support vector machine regression (SVMR) and support vector machine quantile regression (SVMQR). SVMQR was derived by applying the basic principle of the support vector machine (SVM) to the estimation of linear and nonlinear quantile regressions. We applied the proposed methods to existing cDNA microarray data from apolipoprotein AI knockout (apoAI-KO) mice, diet-induced obese mice, and genistein-fed obese mice. Our statistical analysis showed that the proposed methods perform better than the existing print-tip lowess normalization method.

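For context, the sketch below shows the print-tip lowess baseline the paper compares against, not its proposed SVMR/SVMQR methods. For each spot it computes M = log2(R/G) and A = ½·log2(R·G). Within each print-tip group it then subtracts a local fit of M on A. As a simplification, the fit is a single tricube-weighted local linear regression with no robustness iterations, so it is lowess-like rather than a full lowess. Function names, `span`, and the data are hypothetical.

```python
import math
from collections import defaultdict


def ma_values(red, green):
    """Log-ratio M and average log-intensity A for each two-colour spot."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    return m, a


def local_linear_fit(a, m, x0, span):
    """One-pass tricube-weighted linear fit of M on A, evaluated at x0."""
    k = max(3, math.ceil(span * len(a)))
    dists = sorted(abs(ai - x0) for ai in a)
    h = dists[min(k, len(a)) - 1] or 1.0  # bandwidth: distance to k-th nearest spot
    w = [max(0.0, 1 - (abs(ai - x0) / h) ** 3) ** 3 for ai in a]
    sw = sum(w)
    mean_a = sum(wi * ai for wi, ai in zip(w, a)) / sw
    mean_m = sum(wi * mi for wi, mi in zip(w, m)) / sw
    sxx = sum(wi * (ai - mean_a) ** 2 for wi, ai in zip(w, a))
    sxy = sum(wi * (ai - mean_a) * (mi - mean_m) for wi, ai, mi in zip(w, a, m))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_m + slope * (x0 - mean_a)


def print_tip_normalize(m, a, tips, span=0.4):
    """Subtract a within-print-tip local fit from M, one group at a time."""
    groups = defaultdict(list)
    for i, t in enumerate(tips):
        groups[t].append(i)
    out = [0.0] * len(m)
    for idx in groups.values():
        ga, gm = [a[i] for i in idx], [m[i] for i in idx]
        for i in idx:
            out[i] = m[i] - local_linear_fit(ga, gm, a[i], span)
    return out


# Hypothetical: two print tips, each with a different linear dye bias.
tips = ["t1"] * 4 + ["t2"] * 4
a = [1.0, 2.0, 3.0, 4.0] * 2
m = [0.5 * x + 1 for x in a[:4]] + [-0.3 * x + 2 for x in a[4:]]
norm = print_tip_normalize(m, a, tips, span=0.8)
print(max(abs(v) for v in norm) < 1e-9)  # True: each group's bias is removed
```

Because each print tip gets its own curve, spatial dye bias is corrected along with intensity-dependent bias. The abstract's criticism is that a mean-type local fit like this one does not adapt when error variability changes with intensity, which motivates the quantile-regression variant.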

The Effects of the Postural Movement Normalization and Eye Movement Program on the Oculomotor Ability of Children With Cerebral Palsy

  • 한동욱; 공남호
    • 한국전문물리치료학회지 (Physical Therapy Korea) / Vol. 14, No. 3 / pp. 32-40 / 2007
  • Although many children with cerebral palsy have problems with their eye movements, data on interventions for these problems are scarce. The purpose of this study was to determine the effectiveness of a postural movement normalization and eye movement program on the oculomotor ability of children with cerebral palsy. Twenty-four children with cerebral palsy (12 male and 12 female), aged 10 to 12, were invited to participate in this study. The subjects were randomly allocated to two groups: an experimental group that received the postural movement normalization and eye movement program, and a control group that received conventional therapy without the eye movement program. Each subject received the intervention three times a week for twelve weeks. Oculomotor ability was measured with a computerized oculomotor test, administered by an independent assessor before and after the treatment sessions. Differences between the experimental and control groups were determined by assessing changes in oculomotor ability using analysis of covariance (ANCOVA). Improvements in visual fixation (p<.01), saccadic eye movement (p<.01), and pursuit eye movement (p<.01) were significantly greater in the experimental group than in the control group. These results show that the postural movement normalization and eye movement program may be helpful in treating children with cerebral palsy who have lost normal physical and eye movement.


A Concordance Study of the Preprocessing Orders in Microarray Data

  • 김상철; 이재휘; 김병수
    • 응용통계연구 (The Korean Journal of Applied Statistics) / Vol. 22, No. 3 / pp. 585-594 / 2009
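  • Researchers conducting microarray experiments examine the raw measurements, which are images, and convert them into data suitable for statistical analysis; this process is commonly called preprocessing. Microarray preprocessing can be subdivided into filtering of poor-quality images, imputation of missing values, and normalization. Although many studies have been reported on normalization methods and on missing-value imputation methods individually, research on the appropriate order of the preprocessing components is lacking. It is not yet known whether normalization or missing-value imputation should be performed first. As an exploratory study of preprocessing order, this study uses data from two sets of cDNA microarray experiments on colorectal cancer and gastric cancer to analyze how the set of detected differentially expressed genes changes under various orderings of the preprocessing components. That is, we examine how concordant the detected sets of differentially expressed genes are across combinations of missing-value imputation and normalization methods. For missing-value imputation we considered the K-nearest-neighbor method and Bayesian principal component analysis; for normalization we applied global normalization, within-print-tip-group local smoothing normalization, and variance-stabilizing normalization. The two preprocessing components therefore have 2 and 3 levels, respectively, and accounting for the two possible orders of the components, the total number of possible preprocessing procedures is 12. For each of the 12 preprocessing procedures, we detected the genes differentially expressed between normal and cancer tissue and assessed how concordant the gene sets remain when the preprocessing order is changed. When variance-stabilizing normalization is used, the set of differentially expressed genes is somewhat sensitive to the preprocessing order.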
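The design above can be checked with a short enumeration: 2 imputation methods × 3 normalization methods × 2 orders gives 12 procedures. The abstract does not say how concordance between gene sets was measured. The Jaccard index used here is an assumption, and the gene identifiers are hypothetical.

```python
from itertools import product

IMPUTATION = ["KNN", "BPCA"]
NORMALIZATION = ["global", "print-tip lowess", "VSN"]
ORDER = ["impute first", "normalize first"]


def preprocessing_pipelines():
    """Every (imputation, normalization, order) combination: 2 * 3 * 2 = 12."""
    return list(product(IMPUTATION, NORMALIZATION, ORDER))


def jaccard(genes_a, genes_b):
    """Concordance of two detected gene sets as |A & B| / |A | B|."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


print(len(preprocessing_pipelines()))                  # 12
print(jaccard({"g1", "g2", "g3"}, {"g2", "g3", "g4"}))  # 0.5
```

Comparing the two `ORDER` variants while holding imputation and normalization fixed isolates the effect of preprocessing order, which is the comparison the study focuses on.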

Normalization of XQuery Queries for Efficient XML Query Processing

  • 김서영; 이기훈; 황규영
    • 한국정보과학회논문지:컴퓨팅의 실제 및 레터 (Journal of KIISE: Computing Practices and Letters) / Vol. 10, No. 5 / pp. 419-433 / 2004
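  • As XML has become the standard for representing, integrating, and exchanging information on the Web, various XML query languages have been proposed, and the World Wide Web Consortium (W3C) has recommended XQuery as the standard XML query language. Because XQuery, like SQL, allows nested queries, normalization rules have been proposed that transform a nested XQuery query into a semantically equivalent query that can be executed more efficiently. However, these rules apply only to restricted forms of nested queries; in particular, they cannot handle nesting in the where clause of a FLWR expression. In this paper, we propose normalization rules for XQuery queries by extending the normalization rules for SQL queries. The proposed rules can handle nesting in any clause of a FLWR expression. The main contributions of this paper are as follows. First, we classify the nesting types of XQuery queries according to the presence or absence of correlation and aggregation, and we propose normalization rules for each type. Second, we propose detailed algorithms for applying the normalization rules to nested XQuery queries.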
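To illustrate the general idea, the following Python analogue uses ordinary lists; it does not reproduce the paper's XQuery rules. A correlated subquery with an aggregate in the where clause is evaluated twice: once naively, re-running the inner count for every outer item, and once in an unnested form that groups the inner collection once and then joins. The `authors` and `books` collections are hypothetical.

```python
from collections import Counter


def nested(authors, books):
    """Naive evaluation: re-run the correlated inner count once per outer item."""
    return [
        a["name"] for a in authors
        if sum(1 for b in books if b["author"] == a["name"]) >= a["min_books"]
    ]


def unnested(authors, books):
    """Normalized form: aggregate the inner collection once (group by author), then join.

    Counter returns 0 for authors with no books, which plays the role of an
    outer join; an inner join here would silently drop those authors.
    """
    counts = Counter(b["author"] for b in books)
    return [a["name"] for a in authors if counts[a["name"]] >= a["min_books"]]


authors = [
    {"name": "Kim", "min_books": 2},
    {"name": "Lee", "min_books": 0},   # has no books at all
    {"name": "Park", "min_books": 2},
]
books = [{"author": "Kim"}, {"author": "Kim"}, {"author": "Park"}]
print(nested(authors, books), unnested(authors, books))  # ['Kim', 'Lee'] ['Kim', 'Lee']
```

The nested form costs roughly |authors| × |books| comparisons, while the unnested form makes one pass over each collection. The empty-group case ("Lee") shows why unnesting rules for aggregates must preserve outer items that have no matching inner items.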

A Brief Verification Study on the Normalization and Translation Invariant of Measurement Data for Seaport Efficiency; DEA Approach

  • 박노경
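    • 한국항만경제학회:학술대회논문집 (Korea Port Economic Association: Conference Proceedings) / 2007 Policy Seminar and International Conference of the Korea Port Economic Association / pp. 391-405 / 2007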
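  • This paper addresses two problems that have arisen in measuring port efficiency: first, the normalization of input and output variables that have different base units; second, the translation invariance of data that violate the non-negativity condition, a basic assumption of DEA (that is, input-output data containing negative values). We conducted an empirical analysis using data from 26 Korean ports and then verified the results, thereby partially extending the methodology for measuring port efficiency. The key empirical findings are as follows. First, the empirical analysis clearly confirmed the normalization property and translation invariance of the data used to measure port efficiency. Second, translation invariance, which tests the data transformation of adding a positive constant larger in magnitude than the largest negative value when port efficiency data are negative, was confirmed in both the input-oriented and output-oriented BCC models. These results have the following policy implication. Because the normalization and translation invariance of efficiency-measurement data have been empirically verified, Korean port policymakers should take such properties into account and collect, organize, and publish more detailed port statistics. For example, policy support is needed so that statistics such as port accidents are published in detail by port rather than by sea area.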

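The two data transformations discussed above can be sketched as follows, under stated assumptions. Unit normalization is done here by dividing each variable by its mean, which is one common choice; the paper's exact scheme is not given in the abstract. Translation adds a constant larger than the magnitude of the most negative value, and the hypothetical `margin` parameter controls how much larger. Solving the BCC linear program itself is out of scope.

```python
def mean_normalize(column):
    """Divide a variable by its mean so variables measured in different units become unit-free.

    Assumes the column mean is positive.
    """
    mu = sum(column) / len(column)
    return [x / mu for x in column]


def translate_positive(column, margin=1.0):
    """Shift a column by a constant larger than |most negative value| so all entries become positive."""
    lowest = min(column)
    if lowest > 0:
        return list(column)
    shift = -lowest + margin
    return [x + shift for x in column]


print(mean_normalize([2, 4, 6]))       # [0.5, 1.0, 1.5]
print(translate_positive([-5, 3, 0]))  # [1.0, 9.0, 6.0]
```

The study's point is that, for the BCC models it tested, efficiency scores computed on the shifted data agree with those on the original data. A transformation like `translate_positive` therefore makes negative-valued port data usable without changing the efficiency ranking.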

Maternal Uncertainty in Childhood Chronic Illness

  • 박은숙
    • Child Health Nursing Research / Vol. 4, No. 2 / pp. 207-220 / 1998
  • The purpose of this study was to build a substantive theory of mothers' experience of uncertainty in childhood chronic illness. The qualitative research method used was grounded theory. The interviewees were 12 mothers who had cared for a child with a chronic illness. The data were collected by the investigator through audiotape-recorded in-depth interviews over a period of nine months. The data were analyzed simultaneously by the constant comparative method, in which new data were continuously coded into categories and properties according to Strauss and Corbin's methodology. Thirty-four concepts were identified from the analysis of the grounded data. Ten categories emerged from the analysis. The categories were lack of clarity, unpredictability, unfamiliarity, negative change, anxiety, devotion, normalization, and burn-out. Causal conditions: lack of clarity, unpredictability, unfamiliarity, and change; central phenomena: anxiety, being perplexed; context: seriousness of illness, support; intervening condition: belief; action/interaction strategies: devotion, overprotection; consequences: normalization, burn-out. These categories were synthesized into the core concept, anxiety. The process of experiencing uncertainty was 1) entering the world of uncertainty, 2) struggling in the tunnel of uncertainty, and 3) reconstructing the situation of uncertainty. Four hypotheses were derived from the analysis: (1) the greater the lack of clarity, unpredictability, unfamiliarity, and change, the higher the level of uncertainty; (2) the more serious the illness and the less the support, the higher the level of uncertainty; (3) positive beliefs will influence devoted care and the normalization of family life. Through this substantive theory, pediatric nurses can understand the process by which mothers experience uncertainty in childhood chronic illness. Further research to build substantive theories explaining other uncertainties may contribute to a formal theory of how normalization is achieved in families with a chronically ill child.
