• Title/Summary/Keyword: Statistics data


Data Technology: New Interdisciplinary Science & Technology

  • Park, Sung-Hyun
    • Journal of Korean Society for Quality Management / v.38 no.3 / pp.294-312 / 2010
  • Data Technology (DT) is a new technology that deals with data collection, data analysis, information generation from data, knowledge generation through modelling, and future prediction. DT is an interdisciplinary science and technology that has newly emerged in the 21st-century knowledge society. Although the main body of DT is applied statistics, it also encompasses management information systems (MIS), quality management, process system analysis, and related fields. It is therefore an interdisciplinary science and technology spanning statistics, management science, industrial engineering, computer science, and social science. This paper first gives a definition of DT and then presents the effects and basic properties of DT, the differences between IT and DT, a six-step process for applying DT, and a DT example. Finally, the relationship among DT, e-Statistics, and data mining is explained, and a direction for DT development is proposed.

Visualization and interpretation of cancer data using linked micromap plots

  • Park, Se Jin;Ahn, Jeong Yong
    • Journal of the Korean Data and Information Science Society / v.25 no.6 / pp.1531-1538 / 2014
  • The causes of cancer are diverse, complex, and only partially understood. Many factors, including health behaviors, socioeconomic environments, and geographical locations, can directly damage genes or combine with existing genetic faults within cells to cause cancerous mutations. Collecting cancer data and reporting the statistics are therefore important for identifying health trends and establishing normal health changes in geographical areas. In this article, we analyzed cancer data and demonstrated how spatial patterns of the age-standardized rate and health indicators can be examined visually and simultaneously using linked micromap plots. The analysis showed that the age-standardized rate is positively correlated with thyroid and breast cancer but negatively correlated with smoking and drinking. In addition, regions with a high age-standardized rate are located in the southwest and in areas of high population density, whereas the standardized mortality ratio is higher in the southwest and northeast, which contain many rural areas.
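
The two summary measures named in this abstract can be made concrete with a short sketch. The Python below assumes the textbook definitions, direct standardization for the age-standardized rate and indirect standardization for the standardized mortality ratio; the function names, the per-100,000 scale, and the toy numbers are illustrative assumptions, not the authors' data or code.

```python
def age_standardized_rate(cases, population, standard_population, per=100_000):
    """Direct standardization: weight each age group's crude rate cases_i / population_i
    by that group's share of a standard population, then scale to `per` persons."""
    total_standard = sum(standard_population)
    weighted = sum(w * c / p for c, p, w in zip(cases, population, standard_population))
    return per * weighted / total_standard


def standardized_mortality_ratio(observed_deaths, population, standard_rates):
    """Indirect standardization: observed deaths divided by the deaths expected if the
    region's age groups experienced the standard (reference) age-specific rates."""
    expected = sum(rate * p for rate, p in zip(standard_rates, population))
    return observed_deaths / expected


# Hypothetical region with two age groups and a hypothetical standard population.
asr = age_standardized_rate(cases=[10, 20], population=[1000, 2000], standard_population=[5, 7])
smr = standardized_mortality_ratio(observed_deaths=30, population=[1000, 2000], standard_rates=[0.01, 0.01])
```

In this toy case both age groups have the same crude rate of 1%, so the age-standardized rate is 1,000 per 100,000 whatever the standard weights are, and the observed deaths exactly match the expected deaths, giving an SMR of 1. Mapping such per-region values side by side with health indicators is what a linked micromap plot displays.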

Research trends in statistics for domestic and international journal using paper abstract data

  • Yang, Jong-Hoon;Kwak, Il-Youp
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.267-278 / 2021
  • The amount of data held by governments and businesses, both in Korea and abroad, keeps increasing over time, and research on big data in academia is growing accordingly. Statistics is one of the major disciplines of big data research, so it is of interest to treat the growing number of statistics papers themselves as big data for understanding research trends in the field. In this study, we analyzed which topics are being studied using abstract data from statistics papers published in Korea and abroad. Domestic and international research trends were analyzed through the frequency of the papers' keywords, and the relationships between keywords were visualized using word embeddings. In addition to the author-selected keywords, words identified by TextRank as important in statistics papers were also visualized. Finally, 10 topics were extracted by applying latent Dirichlet allocation (LDA) to the abstract data. Through an analysis of each topic, we investigated which research topics are frequently studied and which words are important.
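
TextRank, used above to pick out important words, ranks words with a PageRank-style iteration over a word co-occurrence graph. The following is a minimal Python sketch assuming an unweighted, undirected co-occurrence window and the damping factor 0.85 from the original TextRank formulation; it skips the part-of-speech filtering and stop-word removal a real pipeline would apply, and `textrank_keywords` is a hypothetical helper, not the study's implementation.

```python
from collections import defaultdict


def textrank_keywords(tokens, window=2, damping=0.85, iterations=50, top_k=5):
    """Rank words by PageRank on an undirected co-occurrence graph in which two
    distinct words are linked if they appear within `window` tokens of each other."""
    neighbors = defaultdict(set)
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + window, len(tokens))):
            other = tokens[j]
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    score = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        # Each word receives a damped share of every neighbor's current score,
        # split evenly among that neighbor's links.
        score = {
            w: (1.0 - damping) + damping * sum(score[v] / len(neighbors[v]) for v in neighbors[w])
            for w in neighbors
        }
    return sorted(score.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


ranked = textrank_keywords(["data", "statistics", "data", "model", "data", "analysis", "data"])
```

In the toy token list, "data" is adjacent to every other word, so it sits at the center of a star-shaped graph and receives the highest score. For abstracts, the tokens would be the lemmatized candidate words of each abstract, and the top-ranked words would be the ones visualized.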

Convergence rate of a test statistics observed by the longitudinal data with long memory

  • Kim, Yoon Tae;Park, Hyun Suk
    • Communications for Statistical Applications and Methods / v.24 no.5 / pp.481-492 / 2017
  • This paper investigates the convergence rate of a test statistic given by a two-scale sampling method based on Aït-Sahalia and Jacod (Annals of Statistics, 37, 184-222, 2009). The statistic tests longitudinal data for the presence of long-memory dependence driven by fractional Brownian motion with Hurst parameter $H \in (1/2, 1)$. We obtain an upper bound in the Kolmogorov distance for the normal approximation of this test statistic. Our main tools are the recent results of Nourdin and Peccati (Probability Theory and Related Fields, 145, 75-118, 2009; Annals of Probability, 37, 2231-2261, 2009), which are obtained by combining Malliavin calculus with Stein's method for normal approximation.
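
The Kolmogorov distance between a distribution $F$ and the standard normal $\Phi$ is $d_K(F, \Phi) = \sup_x |F(x) - \Phi(x)|$. The sketch below evaluates this distance for the empirical distribution of a simulated sample, one way to check a normal approximation numerically. It illustrates the metric only: the simulated statistic (standardized means of exponential variables) is a hypothetical stand-in, not the paper's test statistic, and the code does not reproduce the paper's analytical bound.

```python
import math
import random


def std_normal_cdf(x):
    """Standard normal CDF Phi(x) via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def kolmogorov_distance_to_normal(sample):
    """sup_x |F_n(x) - Phi(x)| for the empirical CDF F_n of `sample`.
    The supremum is attained at a jump of F_n, so checking both sides
    of each sorted observation is enough."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        phi = std_normal_cdf(x)
        d = max(d, i / n - phi, phi - (i - 1) / n)
    return d


def standardized_means(n, reps, seed=0):
    """Simulate sqrt(n) * (mean - 1) for Exp(1) samples of size n; by the CLT
    this statistic is approximately N(0, 1) for large n."""
    rng = random.Random(seed)
    return [
        math.sqrt(n) * (sum(rng.expovariate(1.0) for _ in range(n)) / n - 1.0)
        for _ in range(reps)
    ]
```

For example, `kolmogorov_distance_to_normal(standardized_means(50, 2000))` is small, while shifting the same sample by 1 pushes the distance toward $2\Phi(1/2) - 1 \approx 0.38$. A bound of the kind derived in the paper controls this distance analytically rather than by simulation.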

Families of Distributions Arising from Distributions of Ordered Data

  • Ahmadi, Mosayeb;Razmkhah, M.;Mohtashami Borzadaran, G.R.
    • Communications for Statistical Applications and Methods / v.22 no.2 / pp.105-120 / 2015
  • A large family of distributions arising from distributions of ordered data is proposed, which contains other models studied in the literature. This extension subsumes many cases of weighted random variables, such as order statistics, records, and k-records. Such a distribution can be used to model data that are not identically distributed. Some properties of the theoretical model, such as moments, mean deviation, entropy criteria, symmetry, and unimodality, are derived. The problem of parameter estimation is also studied, and maximum likelihood estimators are derived for a weighted gamma distribution. Finally, it is shown that, for modeling a real data set, the proposed model performs best among the previously introduced distributions.
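
To see why order statistics and records fit into a family of weighted distributions, recall that a weighted density has the form $f_w(x) = w(x) f(x) / E[w(X)]$, where $F$ and $f$ are the parent CDF and density. The standard densities of the $r$-th order statistic from a sample of size $n$ and of the $n$-th upper record value $U(n)$ can be written in exactly this form. These are textbook identities, shown for orientation; they are not the paper's proposed family.

```latex
\begin{aligned}
f_{r:n}(x) &= \frac{n!}{(r-1)!\,(n-r)!}\,F(x)^{r-1}\,[1-F(x)]^{n-r}\,f(x),
  & w(x) &= F(x)^{r-1}[1-F(x)]^{n-r},
  & E[w(X)] &= \frac{(r-1)!\,(n-r)!}{n!},\\
f_{U(n)}(x) &= \frac{[-\log(1-F(x))]^{n-1}}{(n-1)!}\,f(x),
  & w(x) &= [-\log(1-F(x))]^{n-1},
  & E[w(X)] &= (n-1)!.
\end{aligned}
```

Both normalizing constants follow from the substitutions $t = F(x)$ (a beta integral) and $t = -\log(1 - F(x))$ (a gamma integral).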

Improvement of the Integrated Management System of Agriculture, Livestock and Food Statistics Data for Maintaining Information Consistency

  • Wee, Seong Seung;Yoon, Seo il;Seo, Jung Hun;Shin, Yong Tae
    • Journal of The Korean Society of Agricultural Engineers / v.65 no.2 / pp.13-20 / 2023
  • In the past, the Ministry of Agriculture, Food and Rural Affairs (MAFRA) conducted research on and produced agricultural statistics. As a result of organizational changes, these responsibilities were transferred to Statistics Korea in 1998 and 2008. Consequently, MAFRA has managed its data without a formal system for managing data and statistical information. Demand for a foundation that improves the usefulness of agri-food statistics has recently increased because of the need for basic data that reflect the realities of rural and agricultural areas, especially for field-oriented agricultural administration. This study proposes a strategy for the systematic, integrated administration of dispersed data and statistical information, and this strategy served as the foundation for the integrated management of distributed administrative and statistical data.

Logistic Regression Method in Interval-Censored Data

  • Yun, Eun-Young;Kim, Jin-Mi;Ki, Choong-Rak
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.871-881 / 2011
  • In this paper, we propose a logistic regression method for estimating the survival function and the median survival time from interval-censored data. The proposed method is motivated by the data augmentation technique, with no sacrifice in augmenting data. In addition, we develop a cross-validation criterion to determine the size of the data augmentation. Through extensive numerical studies, we compare the proposed estimator with existing methods such as the parametric method, the single-point imputation method, and the nonparametric maximum likelihood estimator, and show that the proposed estimator outperforms them in terms of mean squared error. An illustrative example based on a real data set is given.

Profile likelihood estimation of generalized half logistic distribution under progressively type-II censoring

  • Kim, Yong-Ku;Kang, Suk-Bok;Han, Song-Hui;Seo, Jung-In
    • Journal of the Korean Data and Information Science Society / v.22 no.3 / pp.597-603 / 2011
  • The half logistic distribution has been used extensively in reliability and survival analysis, especially when the data are censored. In this paper, we provide profile likelihood estimation of the shape and scale parameters of the generalized half logistic distribution based on progressively Type-II censored data. We also introduce approximate maximum profile likelihood estimates for the scale parameter. As an illustration, we examine the validity of our estimation using real and simulated data.
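
Under progressive Type-II censoring with $m$ observed failures $x_1, \ldots, x_m$ and removal counts $R_1, \ldots, R_m$, the likelihood is proportional to $\prod_i f(x_i)\,[1 - F(x_i)]^{R_i}$. The Python sketch below assumes the generalized half logistic form $F(x) = 1 - g(x)^{\lambda}$ with $g(x) = 2e^{-x/\sigma}/(1 + e^{-x/\sigma})$; the paper's exact parametrization may differ. For a fixed scale $\sigma$, setting the derivative in $\lambda$ to zero gives $\hat\lambda(\sigma) = -m / \sum_i (1 + R_i)\log g(x_i)$, so the profile likelihood reduces to a one-dimensional search over $\sigma$. The grid search, the simulator, and all function names are illustrative assumptions, not the authors' code.

```python
import math
import random


def log_g(x, sigma):
    """log g(x), g(x) = 2e^{-u} / (1 + e^{-u}) with u = x / sigma, so S(x) = g(x)**lam."""
    u = x / sigma
    return math.log(2.0) - u - math.log1p(math.exp(-u))


def loglik(xs, rs, lam, sigma):
    """Progressively Type-II censored log-likelihood, up to an additive constant:
    sum_i [log f(x_i) + R_i * log S(x_i)]."""
    total = 0.0
    for x, r in zip(xs, rs):
        lg = log_g(x, sigma)
        # log f(x) = log(lam) - log(sigma) + lam * log g(x) - log(1 + e^{-x/sigma})
        log_f = math.log(lam) - math.log(sigma) + lam * lg - math.log1p(math.exp(-x / sigma))
        total += log_f + r * lam * lg
    return total


def profile_lambda(xs, rs, sigma):
    """Closed-form maximizer of loglik over lam for a fixed sigma."""
    s = sum((1 + r) * log_g(x, sigma) for x, r in zip(xs, rs))
    return -len(xs) / s


def fit_by_profile(xs, rs, sigma_grid):
    """Maximize the profile log-likelihood sigma -> loglik(lam_hat(sigma), sigma) over a grid."""
    def profile(sigma):
        return loglik(xs, rs, profile_lambda(xs, rs, sigma), sigma)

    best_sigma = max(sigma_grid, key=profile)
    return best_sigma, profile_lambda(xs, rs, best_sigma)


def sample_ghl(n, lam, sigma, rng):
    """Inverse-CDF draws: solve g(x)**lam = U for x, with U uniform on (0, 1]."""
    xs = []
    for _ in range(n):
        v = (1.0 - rng.random()) ** (1.0 / lam)
        xs.append(sigma * math.log((2.0 - v) / v))
    return xs
```

With a complete sample (all $R_i = 0$) drawn by `sample_ghl` and the true $\sigma$ plugged in, $\hat\lambda$ should land near the simulated shape. The grid is the simplest possible optimizer for the profile in $\sigma$; a bracketed one-dimensional maximizer would be the natural replacement.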

Estimation of Gini-Simpson index for SNP data

  • Kang, Joonsung
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1557-1564 / 2017
  • We consider high-dimensional, low-sample-size (HDLSS) genomic sequence data whose response categories have no ordering. When constructing an appropriate test statistic for this model, the classical multivariate analysis of variance (MANOVA) approach may not be useful owing to the very large number of parameters and the very small sample size. For these reasons, we present a pseudo-marginal model based on the Gini-Simpson index estimated via a Bayesian approach. In view of the small sample size, we consider the permutation distribution generated by all $n!$ (equally likely) permutations of the pooled sample observations across $G$ groups of sizes $n_1, \ldots, n_G$. We simulate data and apply the false discovery rate (FDR) and positive false discovery rate (pFDR), with the associated proposed test statistics, to the data. We also analyze real SARS data and compute the FDR and pFDR. Using exact conditional permutation theory, the FDR and pFDR procedures, together with the associated test statistic for each gene, control the FDR and pFDR, respectively, at any level $\alpha$ for the set of p-values.
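
The Gini-Simpson index of a categorical sample is $1 - \sum_k p_k^2$, the probability that two draws with replacement fall into different categories. The Python sketch below uses the plug-in estimate of each $p_k$ rather than the paper's Bayesian estimate, a hypothetical between-group statistic (`between_group_stat`), and a Monte Carlo sample of random permutations instead of enumerating all $n!$ permutations, so it only approximates the exact conditional permutation test described in the abstract.

```python
import random
from collections import Counter


def gini_simpson(labels):
    """Plug-in Gini-Simpson index 1 - sum_k p_k**2 of a categorical sample."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def between_group_stat(groups):
    """Hypothetical statistic: size-weighted squared spread of the groups'
    Gini-Simpson indices around their size-weighted mean."""
    sizes = [len(g) for g in groups]
    indices = [gini_simpson(g) for g in groups]
    mean = sum(n * h for n, h in zip(sizes, indices)) / sum(sizes)
    return sum(n * (h - mean) ** 2 for n, h in zip(sizes, indices))


def permutation_pvalue(groups, n_perm=999, seed=0):
    """Monte Carlo permutation p-value: reshuffle the pooled observations into
    groups of the original sizes and count statistics at least as extreme."""
    rng = random.Random(seed)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    observed = between_group_stat(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        perm_groups, start = [], 0
        for n in sizes:
            perm_groups.append(pooled[start:start + n])
            start += n
        # Small tolerance so exact ties with the observed value count as "as extreme".
        if between_group_stat(perm_groups) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For a single locus, each group's list would hold the category observed for each subject at that position; repeating the test over all genes produces the set of p-values to which an FDR or pFDR procedure is then applied.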

Discussion: Vision and Strategy for Undergraduate Statistics Major Program

  • 손건태;허명회
    • The Korean Journal of Applied Statistics / v.12 no.2 / pp.705-709 / 1999
  • We discuss the paper by Cho, Shin, Lee, and Han on an "information-related" undergraduate statistics major program from the following perspectives. Korean universities are currently in the turmoil of restructuring. To confront this situation effectively, we need both a vision and a strategy for statistics and statistics departments. For the undergraduate statistics major program, our visions are that 1) it should not be merely a preparatory program for graduate degrees, 2) it should be responsive to future social demand, and 3) it should incorporate the progressive identity of statistics as information and data science. As strategies, we propose 1) effective integration and due balance among data collection, management, and analysis, 2) harmony between computers and mathematics as statistical tools, and the development of their respective roles, 3) statistics education through task-oriented problem solving, and 4) an emphasis on teamwork and communication skills.
