• Title/Summary/Keyword: Statistics data


Comparison and analysis of multiple testing methods for microarray gene expression data (유전자 발현 데이터에 대한 다중검정법 비교 및 분석)

  • Seo, Sumin;Kim, Tae Houn;Kim, Jaehee
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.971-986 / 2014
  • When thousands of hypotheses are tested simultaneously, the probability of rejecting at least one true hypothesis increases, and a serious multiplicity problem arises. To address it, researchers have proposed multiple testing methods that take the family-wise error rate (FWER), the false discovery rate (FDR), or the false nondiscovery rate (FNR) as the error criterion, combined with various test statistics. In this article, we discuss the Bonferroni (1960), Holm (1979), Benjamini and Hochberg (1995), and Benjamini and Yekutieli (2001) procedures based on T statistics, modified T statistics, or local-pooled-error (LPE) statistics. We also consider the Sun and Cai (2007) procedure based on Z statistics. These procedures are compared in a simulation study and applied to Arabidopsis microarray gene expression data to identify differentially expressed genes.
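
The four p-value-based procedures named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes raw p-values are already available (the T, modified T, and LPE statistics that would produce them are not shown), the function names and example p-values are made up for illustration, and the Sun and Cai procedure is omitted.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject H_i when p_i <= alpha / m (controls the FWER)."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def holm(pvals, alpha=0.05):
    """Holm step-down: compare the r-th smallest p-value with alpha / (m - r + 1)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    reject = [False] * m
    for rank, i in enumerate(order):  # rank is 0-based here
        if pvals[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first non-rejection
    return reject


def step_up(pvals, alpha=0.05, dependent=False):
    """Benjamini-Hochberg (dependent=False) or Benjamini-Yekutieli (dependent=True).

    Both control the FDR; BY divides the thresholds by sum_{j=1..m} 1/j.
    """
    m = len(pvals)
    c = sum(1.0 / j for j in range(1, m + 1)) if dependent else 1.0
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value falls under its threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank * alpha / (m * c):
            k = rank
    reject = [False] * m
    for i in order[:k]:
        reject[i] = True
    return reject


# Illustrative p-values only (not from the paper).
example = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
counts = {
    "Bonferroni": sum(bonferroni(example)),
    "Holm": sum(holm(example)),
    "BH": sum(step_up(example)),
    "BY": sum(step_up(example, dependent=True)),
}
```

On these example values the FDR-controlling BH procedure rejects more hypotheses than the FWER-controlling ones, which is the usual trade-off between the two error criteria.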

Local Projective Display of Multivariate Numerical Data

  • Huh, Myung-Hoe;Lee, Yong-Goo
    • The Korean Journal of Applied Statistics / v.25 no.4 / pp.661-668 / 2012
  • For displaying multivariate numerical data on a 2D plane by projection, the principal components biplot and GGobi are the two main data visualization tools. The biplot is very useful for capturing the global shape of the dataset, representing $n$ observations and $p$ variables simultaneously on a single graph. GGobi shows a dynamic movie of the images of $n$ observations projected onto a sequence of unit vectors floating on the $p$-dimensional sphere. Although both methods are valuable, each has drawbacks: the biplot is too condensed to describe the detailed parts of the data, and GGobi is too burdensome for ordinary data analyses. In this paper, the "local projective display" (LPD) is proposed for visualizing multivariate numerical data. The main steps of the LPD are 1) $k$-means clustering of the data into $k$ subsets, 2) drawing $k$ principal components biplots of the individual subsets, and 3) sequencing the $k$ plots by Hurley's (2004) endlink algorithm for cognitive continuity.
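
Steps 1 and 2 of the LPD can be sketched roughly as below. This is an assumption-laden illustration rather than the authors' implementation: it uses a plain Lloyd's k-means and power iteration for the principal axes, all function names are hypothetical, only the observation scores of each biplot are computed (no variable arrows, no plotting), and step 3 (the endlink ordering) is not attempted.

```python
import math
import random


def kmeans_labels(rows, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    rng = random.Random(seed)
    centers = [list(r) for r in rng.sample(rows, k)]
    labels = [0] * len(rows)
    for _ in range(iters):
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(r, centers[c])))
            for r in rows
        ]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def principal_axes(rows, n_axes=2, sweeps=200, seed=0):
    """Leading principal axes by power iteration with deflation, plus projected scores."""
    rng = random.Random(seed)
    n, p = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[v - m for v, m in zip(r, means)] for r in rows]
    cov = [[sum(x[i] * x[j] for x in centered) / max(n - 1, 1) for j in range(p)]
           for i in range(p)]
    axes = []
    for _axis in range(n_axes):
        v = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for _sweep in range(sweeps):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            for a in axes:  # remove components along axes already found
                dot = sum(wi * ai for wi, ai in zip(w, a))
                w = [wi - dot * ai for wi, ai in zip(w, a)]
            norm = math.sqrt(sum(wi * wi for wi in w))
            if norm < 1e-12:  # no variance left in this direction
                w, norm = [0.0] * p, 1.0
            v = [wi / norm for wi in w]
        axes.append(v)
    scores = [[sum(xi * ai for xi, ai in zip(x, a)) for a in axes] for x in centered]
    return axes, scores


def local_projective_display(rows, k):
    """Cluster the rows, then compute a separate 2D projection for each cluster."""
    labels = kmeans_labels(rows, k)
    panels = {}
    for c in range(k):
        members = [r for r, lab in zip(rows, labels) if lab == c]
        if members:
            panels[c] = principal_axes(members)
    return labels, panels
```

Each entry of `panels` holds the two local axes and the 2D scores for one cluster, which is the information a per-cluster biplot would draw.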

Comparison of GHG Emission with Activity Data in Korean Railroad Sector (국내 철도부문의 활동도 자료에 따른 온실가스 배출량 비교 연구)

  • Lee, Jae-Young;Rhee, Young-Ho;Kim, Yong-Ki;Jung, Woo-Sung;Kim, Hee-Man
    • Proceedings of the KSR Conference / 2011.10a / pp.861-864 / 2011
  • Since Korea announced its national GHG reduction target for 2020, the role of railroads within the transport system has been reinforced because the reduction target has been allocated by sector. It is therefore necessary to manage activity data systematically for calculating GHG emissions in the railroad sector. Currently, the diesel consumption activity data for the NIR (National Inventory Report) are taken from oil supply and demand statistics. On the other hand, activity data collected directly from railroad operating companies are used for the GHG & Energy Target Management Act. This study assessed GHG emissions using these two kinds of activity data on railroad diesel consumption in 2009 and 2010. In 2009, GHG emissions based on oil supply and demand statistics were 636 thousand tons $CO_{2e}$, whereas the activity data collected from railroad operating companies gave 649 thousand tons $CO_{2e}$. The gap in $CO_{2e}$ emissions widened in 2010. These differences arose because the oil supply and demand statistics cover total diesel sales volume over one year, while the data collected from railroad operating companies cover only the diesel consumed in railcar operation and maintenance. In conclusion, it is important to develop a highly reliable management and verification system for activity data that can substitute for oil supply and demand statistics in the railroad sector.


Standardized polytomous discrimination index using concordance (부합성을 이용한 표준화된 다항판별지수)

  • Choi, Jin Soo;Hong, Chong Sun
    • Journal of the Korean Data and Information Science Society / v.27 no.1 / pp.33-44 / 2016
  • In many situations, such as clinical decisions and credit assessment, the outcome must be predicted into more than two categories. Five statistics based on concordance have been proposed and used for these polytomous problems. However, these statistics are defined without a precise distinction between categories, which makes it difficult to use both the pair and set approaches and to understand what the statistics mean. Hence, they cannot be compared and analyzed. In this paper, the polytomous confusion matrix is standardized, and the concordance statistic is expressed in terms of the confusion matrix. The five concordance-based statistics are then defined on this basis. With the methods proposed in this paper, we can not only explain their meanings but also compare and analyze these statistics. Based on various data sets, the properties of the five statistics are explored and explained.

A Study on Probability and Statistics Education in High School

  • Kang, Suk-Bok;Choi, Hui-Taeg
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.379-385 / 2006
  • In this paper, the probability and statistics education of the 7th high school curriculum is studied. We analyze each unit of probability and statistics in the high school textbooks "Mathematics 10-GA", "Mathematics I", and "Practical Mathematics", and then examine the share of the probability and statistics units across all textbooks. We also investigate the proportion of students who selected each mathematics subject in the national academic aptitude tests for university admission in 2005 and 2006.


Empirical Comparisons of Disparity Measures for Three Dimensional Log-Linear Models

  • Park, Y.S.;Hong, C.S.;Jeong, D.B.
    • Journal of the Korean Data and Information Science Society / v.17 no.2 / pp.543-557 / 2006
  • This paper is concerned with the applicability of the chi-square approximation to six disparity statistics: the Pearson chi-square, the generalized likelihood ratio, the power divergence, the blended weight chi-square, the blended weight Hellinger distance, and the negative exponential disparity statistic. Three-dimensional contingency tables of small and moderate sample sizes are generated and fitted to all possible hierarchical log-linear models: the completely independent model, the conditionally independent model, the partial association models, and the model with one variable independent of the other two. For models with direct solutions for the expected cell counts, point estimates and confidence intervals of the 90 and 95 percentage points of the six statistics are explored. For models without direct solutions, the empirical significance levels and the empirical powers of the six statistics for testing the significance of the three-factor interaction are computed and compared.
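
As a rough illustration of three of the six statistics, the sketch below computes expected counts for the completely independent model (one of the models with a direct solution) and evaluates the Cressie-Read power divergence family, which contains the Pearson chi-square (λ = 1) and the likelihood ratio statistic (λ → 0) as special cases. Function names and the layout of the table as nested lists are assumptions made here; the blended weight and negative exponential statistics are not covered.

```python
import math


def independence_expected(table):
    """Expected counts E_ijk = n_i++ * n_+j+ * n_++k / n^2 for a 3-way table.

    `table[i][j][k]` holds the observed count of cell (i, j, k).
    """
    I, J, K = len(table), len(table[0]), len(table[0][0])
    m1, m2, m3, n = [0] * I, [0] * J, [0] * K, 0
    for i in range(I):
        for j in range(J):
            for k in range(K):
                c = table[i][j][k]
                m1[i] += c
                m2[j] += c
                m3[k] += c
                n += c
    return [[[m1[i] * m2[j] * m3[k] / n ** 2 for k in range(K)]
             for j in range(J)] for i in range(I)]


def flatten(table):
    return [c for layer in table for row in layer for c in row]


def power_divergence(observed, expected, lam):
    """Cressie-Read statistic 2 / (lam (lam + 1)) * sum O [(O / E)^lam - 1].

    lam = 1 gives the Pearson chi-square; lam = 0 is handled as the limit
    2 * sum O log(O / E). lam = -1 and zero counts with negative lam are
    not handled in this sketch.
    """
    pairs = list(zip(flatten(observed), flatten(expected)))
    if lam == 0:
        return 2.0 * sum(o * math.log(o / e) for o, e in pairs if o > 0)
    return 2.0 / (lam * (lam + 1.0)) * sum(o * ((o / e) ** lam - 1.0) for o, e in pairs)


def pearson_chi_square(observed, expected):
    return sum((o - e) ** 2 / e for o, e in zip(flatten(observed), flatten(expected)))
```

For a 2×2×2 table the completely independent model leaves 8 − 2 − 2 − 2 + 2 = 4 degrees of freedom, so each statistic would be referred to a chi-square distribution with 4 degrees of freedom when the approximation holds.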


Statistical System of the CIS Countries

  • Kim, Joo-Hwan
    • Journal of the Korean Data and Information Science Society / v.18 no.4 / pp.1023-1032 / 2007
  • We introduce the statistical systems of the Commonwealth of Independent States (CIS) countries located in Central Asia. At present, the level of the national statistics production system of the Korean National Statistical Office (NSO) is very high, ranking just behind Japan among Asian countries, and the NSO is trying to raise the quality of its statistics to the level of the most advanced countries. To optimize the statistics production process, we must understand the methodological aspects as well as the macro statistics that can be applied to a country's economic planning. Because history repeats itself, it is valuable to look at how the statistical systems of other countries developed a century ago. We study the relationships among the CIS countries along with the history of Russian statistics development. Understanding the statistical systems of the CIS countries, including Russia, will help in using their statistics for international comparative studies.


Development of process-oriented education tool for Statistics with Excel Macro (엑셀 매크로를 이용한 절차 중심 통계교육도구 개발)

  • Choi, Hyun-Seok;Ha, Jeong-Cheol
    • Journal of the Korean Data and Information Science Society / v.22 no.4 / pp.643-650 / 2011
  • The need for statistics education has been growing recently, but mathematics-oriented teaching makes college students lose interest in statistics. On the hypothesis that motivating interest is the key factor in learning, we need an education tool for statistics that lets learners study independently but thoroughly. Using Excel Macro, we develop and introduce an add-in program called PETS, which supplies not only the results but also the process used to obtain them.

On the Aggregation of Multi-dimensional Data using Data Cube and MDX

  • Ahn, Jeong-Yong;Kim, Seok-Ki
    • Journal of the Korean Data and Information Science Society / v.14 no.1 / pp.37-44 / 2003
  • One of the characteristics of both on-line analytical processing (OLAP) applications and decision support systems is that they provide aggregated source data. The purpose of this study is to discuss the aggregation of multi-dimensional data. In this paper, we (1) examine the SQL aggregate functions and the GROUP BY operator, (2) introduce the Data Cube and MDX, and (3) present an example of the practical use of the Data Cube and MDX with sample data.
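
The relationship between GROUP BY and the Data Cube can be illustrated with a small sketch. It assumes an in-memory SQLite database and a made-up `sales` table (neither comes from the paper). To my knowledge SQLite's GROUP BY has no CUBE clause, so the cube is emulated by running one GROUP BY query per subset of the dimensions; MDX itself is not shown.

```python
import sqlite3
from itertools import combinations

# Hypothetical sample data: two dimensions (region, product) and one measure.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "A", 10), ("East", "B", 20), ("West", "A", 30), ("West", "B", 40)],
)


def group_by_sum(dims):
    """SUM(amount) grouped by the given columns; an empty tuple gives the grand total.

    Column names are interpolated directly, so `dims` must be trusted input.
    """
    if dims:
        cols = ", ".join(dims)
        sql = f"SELECT {cols}, SUM(amount) FROM sales GROUP BY {cols} ORDER BY {cols}"
    else:
        sql = "SELECT SUM(amount) FROM sales"
    return conn.execute(sql).fetchall()


def data_cube(dims):
    """Aggregate over every subset of the dimensions (2^d group-bys for d dimensions)."""
    cube = {}
    for size in range(len(dims), -1, -1):
        for subset in combinations(dims, size):
            cube[subset] = group_by_sum(subset)
    return cube


cube = data_cube(("region", "product"))
```

Here `cube` has four entries: the full (region, product) breakdown, the two one-dimensional summaries, and the grand total under the empty key `()`. A single GROUP BY returns only one of these levels, which is the gap the cube operator is meant to close.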
