• Title/Abstract/Keywords: statistical data analysis

A Case Study Applying an Initial Data Analysis Roadmap (The Study on Application of Data Gathering for the site and Statistical analysis process)

  • 최은향;이상복
    • Korean Society for Quality Management: Conference Proceedings / 2010 Spring Conference of the Korean Society for Quality Management / pp. 226-234 / 2010
  • In this paper, we present a process for removing errors from data before statistical analysis. If field data are used without even a simple check of their validity, the resulting statistical information cannot be trusted. Because statistical information is produced from the data fed into the analysis process, those input data must be free of errors. We study the application of a statistical analysis roadmap that improves on-site applicability by organizing the basic theory and the approach to the initial exploratory data phase, an essential step before conducting statistical analysis. In this way, statistical analysis becomes more accessible, and the reliability of its results can be secured by conducting the analysis correctly.
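As a rough illustration of the kind of pre-analysis screening the abstract describes, the Python sketch below flags missing values, out-of-range values, and exact duplicate records, then drops the flagged rows. The row-of-dicts input, the `bounds` ranges, and the names `screen_records` and `clean` are hypothetical; this is not the roadmap proposed in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class ScreeningReport:
    missing: list = field(default_factory=list)       # (row index, column) with no value
    out_of_range: list = field(default_factory=list)  # (row index, column, value) outside bounds
    duplicates: list = field(default_factory=list)    # row indices repeating an earlier row


def screen_records(rows, bounds):
    """Flag common data errors before any statistical analysis.

    rows   -- list of dicts, one per observation (hypothetical input shape)
    bounds -- dict mapping column name to a plausible (low, high) range
    """
    report = ScreeningReport()
    seen = set()
    for i, row in enumerate(rows):
        for col, (low, high) in bounds.items():
            value = row.get(col)
            if value is None:
                report.missing.append((i, col))
            elif not (low <= value <= high):
                report.out_of_range.append((i, col, value))
        # An exact copy of an earlier record is treated as a duplicate entry.
        key = tuple(sorted(row.items()))
        if key in seen:
            report.duplicates.append(i)
        seen.add(key)
    return report


def clean(rows, report):
    """Return only the rows that the screening report did not flag."""
    bad = {i for i, _ in report.missing}
    bad |= {i for i, _, _ in report.out_of_range}
    bad |= set(report.duplicates)
    return [row for i, row in enumerate(rows) if i not in bad]


# Hypothetical field measurements with one error of each kind.
rows = [
    {"temp": 21.5, "pressure": 101.2},
    {"temp": None, "pressure": 100.9},   # missing value
    {"temp": 250.0, "pressure": 101.0},  # implausible temperature
    {"temp": 21.5, "pressure": 101.2},   # duplicate of the first row
]
bounds = {"temp": (-40, 60), "pressure": (80, 120)}
report = screen_records(rows, bounds)
cleaned = clean(rows, report)
```

Keeping the report separate from the cleaning step lets an analyst inspect and document every flagged record before deciding to drop or correct it.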

A guideline for the statistical analysis of compositional data in immunology

  • Yoo, Jinkyung;Sun, Zequn;Greenacre, Michael;Ma, Qin;Chung, Dongjun;Kim, Young Min
    • Communications for Statistical Applications and Methods / Vol. 29, No. 4 / pp. 453-469 / 2022
  • The study of immune cellular composition has attracted great scientific interest in immunology because multiple large-scale datasets are now being generated. From a statistical point of view, such immune cellular data should be treated as compositional. In compositional data, each element is positive and all elements sum to a constant, which can generally be set to one. Standard statistical methods are not directly applicable to compositional data because they do not appropriately handle the correlations between the compositional elements. In this paper, we review statistical methods for compositional data analysis and illustrate them in the context of immunology. Specifically, we focus on regression analyses using log-ratio transformations and on the alternative approach of Dirichlet regression, discuss their theoretical foundations, and illustrate their applications with immune cellular fraction data generated from colorectal cancer patients.
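The log-ratio idea can be sketched in a few lines of Python with NumPy. The functions below (hypothetical names `closure`, `clr`, `alr`) implement the standard centered and additive log-ratio transforms, which map strictly positive compositions into unconstrained real space where ordinary regression can be applied. They are an illustration, not the authors' code, and they assume no component is zero.

```python
import numpy as np


def closure(x):
    """Rescale each composition (last axis) so that its parts sum to one."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)


def clr(x):
    """Centered log-ratio: log of each part over the geometric mean of its row."""
    logx = np.log(closure(x))
    return logx - logx.mean(axis=-1, keepdims=True)


def alr(x, ref=-1):
    """Additive log-ratio: log of each remaining part over a reference part."""
    logx = np.log(closure(x))
    ref = ref % logx.shape[-1]              # allow negative reference indices
    others = np.delete(logx, ref, axis=-1)  # drop the reference column
    return others - logx[..., ref:ref + 1]  # keep a trailing axis for broadcasting
```

Because both transforms work on ratios, rescaling a composition (for example cell counts versus cell fractions) leaves the transformed values unchanged, and CLR coordinates of each row sum to zero.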

Program Development of Genetic Analysis for Diallel Cross Experiment

  • Kim, Seo Young;Bae, Jong Sung
    • Communications for Statistical Applications and Methods / Vol. 9, No. 3 / pp. 675-682 / 2002
  • In this study, we develop a statistical analysis program, written with SAS/MACRO and SAS/IML, for the genetic analysis of diallel cross data. The genetic analysis estimates genetic parameters and heritability for designs with and without reciprocal crosses. The program addresses the difficulty of data analysis in the fields of genetics and breeding, so users who want to analyze genetics and breeding data can do so easily, saving time and effort.
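For readers unfamiliar with diallel analysis, the following Python sketch computes general combining ability (GCA), specific combining ability (SCA), and reciprocal effects from a full p × p table of cross means using Griffing's Method 1 estimators (parents, F1s, and reciprocals; fixed-effects model). The paper's program is written in SAS/MACRO and SAS/IML and also covers designs without reciprocals; the function name `griffing_method1` and the input layout are assumptions made for this example.

```python
import numpy as np


def griffing_method1(y):
    """Combining-ability estimates for a full p x p diallel table of means.

    y[i, j] is the mean of the cross with parent i as female and parent j as
    male; the diagonal holds the parents themselves (Griffing's Method 1).
    """
    y = np.asarray(y, dtype=float)
    p = y.shape[0]
    total = y.sum()                    # Y..
    row = y.sum(axis=1)                # Y_i.
    col = y.sum(axis=0)                # Y_.i
    margin = (row + col) / (2 * p)     # (Y_i. + Y_.i) / 2p
    gca = margin - total / p**2        # general combining ability g_i
    sym = (y + y.T) / 2                # average of a cross and its reciprocal
    sca = sym - margin[:, None] - margin[None, :] + total / p**2  # s_ij
    recip = (y - y.T) / 2              # reciprocal effects r_ij
    return gca, sca, recip
```

Because the estimates decompose each cell exactly, μ + ĝ_i + ĝ_j + ŝ_ij + r̂_ij with μ = Y../p² reproduces y[i, j], which is a convenient sanity check on any implementation.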

Exploratory Data Analysis for microarray experiments with replicates

  • Lee, Eun-Kyung;Yi, Sung-Gon;Park, Tae-Sung
    • Korean Statistical Society: Conference Proceedings / Proceedings of the 2005 Autumn Conference of the Korean Statistical Society / pp. 37-41 / 2005
  • Exploratory data analysis (EDA) is the initial stage of data analysis and provides a useful overview of the whole microarray experiment. If the experiments are replicated, the analyst should check the quality and reliability of the microarray data within the same experimental condition before any deeper statistical analysis. We present EDA methods focusing on the quality and reproducibility of replicates.
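A minimal sketch of two common replicate checks, assuming strictly positive intensities: MA values (M is the log2 ratio between two replicates, A is their mean log2 intensity) and the pairwise correlation matrix of log2 intensities. The function names are hypothetical, and the paper's specific EDA displays may differ.

```python
import numpy as np


def ma_values(rep1, rep2):
    """M (log2 ratio) and A (mean log2 intensity) for two replicate arrays."""
    l1 = np.log2(np.asarray(rep1, dtype=float))
    l2 = np.log2(np.asarray(rep2, dtype=float))
    return l1 - l2, (l1 + l2) / 2


def replicate_correlations(intensities):
    """Pairwise Pearson correlations of log2 intensities; columns are replicates."""
    logs = np.log2(np.asarray(intensities, dtype=float))
    return np.corrcoef(logs, rowvar=False)
```

Well-reproduced replicates give M values scattered tightly around zero across the whole range of A, and between-replicate correlations close to one.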

A Proposal of Some Analysis Methods for Discovery of User Information from Web Data

  • Ahn, JeongYong;Han, Kyung Soo
    • Communications for Statistical Applications and Methods / Vol. 8, No. 1 / pp. 281-289 / 2001
  • The continuous growth in the use of the World Wide Web is generating very large volumes of data of many different types. Analyzing such data can help to determine the lifetime value of users, evaluate the effectiveness of websites, and design marketing strategies and services. In this paper, we propose several analysis methods for web data and present an example of a prototypical web data analysis.

Big Data Smoothing and Outlier Removal for Patent Big Data Analysis

  • Choi, JunHyeog;Jun, Sunghae
    • Journal of the Korea Society of Computer and Information / Vol. 21, No. 8 / pp. 77-84 / 2016
  • In general statistical analysis, we need to assume normality. If this assumption is not satisfied, we cannot expect good results from statistical data analysis. Most statistical methods for handling outliers and noise also require this assumption. However, the assumption is not satisfied in big data because of its large volume and heterogeneity. We therefore propose a methodology based on box plots and data smoothing for controlling outliers and noise in big data analysis. The proposed methodology does not depend on the normality assumption. In addition, we select patent documents as the target domain of big data because patent big data analysis is an important issue in the management of technology. We analyze patent documents using big data learning methods for technology analysis. Patent data collected from patent databases around the world are preprocessed and analyzed by text mining and statistics. However, most research on patent big data analysis has not considered the problem of outliers and noise, which decreases the accuracy of prediction and increases the variance of parameter estimates. In this paper, we check for the existence of outliers and noise in patent big data. To determine whether outliers are present in the patent big data, we use box plots and smoothing visualization. We use patent documents related to three-dimensional printing technology to illustrate how the proposed methodology can be used to detect noise in the retrieved patent big data.
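The box-plot rule and the smoothing step can be sketched as follows. The outlier flag uses Tukey's fences built from the quartiles, so it makes no normality assumption; the 1.5 × IQR multiplier is the conventional default, and a centered moving average stands in for the smoothing method. Both are assumptions rather than details taken from the paper, and the function names are hypothetical.

```python
import numpy as np


def boxplot_outliers(x, k=1.5):
    """Boolean mask of values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    x = np.asarray(x, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return (x < low) | (x > high)


def moving_average(x, window=3):
    """Centered moving average; near the edges the window is truncated."""
    x = np.asarray(x, dtype=float)
    half = window // 2
    out = np.empty_like(x)
    for i in range(len(x)):
        lo, hi = max(0, i - half), min(len(x), i + half + 1)
        out[i] = x[lo:hi].mean()
    return out
```

Applied to, say, a hypothetical series of yearly patent counts, the mask isolates a single abnormal spike while the moving average shows the underlying trend; only the quartiles enter the fences, so a few extreme values cannot distort them the way they distort a mean and standard deviation.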

Patterns of Data Analysis?

  • Unwin, Antony
    • Journal of the Korean Statistical Society / Vol. 30, No. 2 / pp. 219-230 / 2001
  • How do you carry out data analysis? There are few texts and little theory. One approach could be to use a pattern language, an idea that has been successful in fields as diverse as town planning and software engineering. Patterns for data analysis are defined, discussed, and illustrated with examples.

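A Study of the Statistical Methods Applied in Articles of the Journal of Korean Acupuncture & Moxibustion Society: 19 Years from the First Issue in 1984 to Vol. 19, No. 6 in 2002 (Analysis of various statistical techniques used in the articles published during last 19 years in The Journal of Korean Acupuncture & Moxibustion Society)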

  • 이승덕
    • Journal of Acupuncture Research / Vol. 20, No. 1 / pp. 144-158 / 2003
  • This study was carried out to investigate what kinds of statistical techniques have been used to analyze data from oriental medicine research. For this study, 551 original articles that used statistical techniques in their data analysis were selected from the articles published in The Journal of Korean Acupuncture & Moxibustion Society (JKAMS) between 1984 and 2002. Among them, 122 articles used descriptive statistics, while 429 articles used inferential statistics for data analysis. In those 429 articles, the t-test (189 articles), analysis of variance (111 articles), chi-square test (14 articles), correlation (10 articles), regression analysis (4 articles), factor analysis (5 articles), or nonparametric tests (23 articles) were chosen to analyze the data. The nonparametric approach has substantial power when data do not meet the assumption of normality. This method is not only easy to use but also provides measures of statistical variation for nominal and ordinal scales. This study shows that recent papers use nonparametric tests more often than older articles. Nine different statistical software packages (SAS, SPSS, Statview, Minitab, Sigma plot, ISP, Graphpad prism, Excel, Access) have been used in the articles published in JKAMS. High-level statistical packages such as SAS, SPSS, and Statview are user-friendly and were used most for acupuncture and moxibustion research. Including tables and plots in an article facilitates understanding the data from a descriptive standpoint, minimizes erroneous statistical conclusions, and clarifies theoretically important relationships among variables. Tables and plots were used in 500 and 233 articles, respectively. A computer procedure is proposed and illustrated with the statistical packages SAS, SPSS, Statview, and ISP.
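To make the nonparametric alternative concrete, the sketch below computes the Mann-Whitney U statistic, a rank-based counterpart of the two-sample t-test, in plain Python. Tied values share the average of their ranks. It returns only the U statistics; obtaining a p-value (for example with `scipy.stats.mannwhitneyu`) is not shown, and the function names here are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U1, U2) for samples x and y using the rank-sum formula."""
    ranks = average_ranks(list(x) + list(y))
    n1, n2 = len(x), len(y)
    r1 = sum(ranks[:n1])            # rank sum of the first sample
    u1 = r1 - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1               # the two statistics always sum to n1 * n2
    return u1, u2
```

Only the ordering of the observations enters the statistic, which is why it stays valid for skewed measurements and ordinal scales where the t-test's normality assumption is doubtful.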

Basics of Data Analysis (An Introduction to Data Analysis)

  • 박선일;이영원
    • Journal of Veterinary Clinics / Vol. 26, No. 3 / pp. 189-199 / 2009
  • With the growing importance of evidence-based medicine, clinical and biomedical research relies critically on the validity and reliability of data, so that subsequent statistical inferences for medical decision-making lead to valid conclusions. Despite the widespread use of analytical techniques in papers published in the Journal of Veterinary Clinics, statistical errors, particularly in experimental design, research methodology, and data analysis methods, are commonly encountered. These flaws often lead to misinterpretation of the data and thereby to inappropriate conclusions. This article is the first in a series of nontechnical introductions intended not as a systematic review of medical statistics but to provide the journal's readers with an understanding of common statistical concepts, including data scales, selection of appropriate statistical methods, descriptive statistics, data transformation, confidence intervals, the principles of hypothesis testing, sampling distributions, and interpretation of results.
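As a worked example of one of the listed concepts, a two-sided confidence interval for a mean is the sample mean plus or minus a t critical value (with n - 1 degrees of freedom) times the standard error s/√n. The Python sketch assumes SciPy is available for the t quantile; the function name is hypothetical.

```python
import math
import statistics

from scipy import stats


def mean_confidence_interval(sample, level=0.95):
    """Two-sided t-based confidence interval for a population mean."""
    n = len(sample)
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # standard error from the n-1 sample SD
    t_crit = stats.t.ppf((1 + level) / 2, df=n - 1)
    return mean - t_crit * se, mean + t_crit * se
```

For the sample 1, 2, 3, 4, 5 the 95% interval is roughly (1.04, 4.96); with so few observations the t critical value (about 2.78) is noticeably larger than the normal value of 1.96, widening the interval accordingly.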

Application of data mining and statistical measurement of agricultural high-quality development

  • Yan Zhou
    • Advances in Nano Research / Vol. 14, No. 3 / pp. 225-234 / 2023
  • In this study, we aim to use big data resources and statistical analysis to obtain reliable guidance for achieving high-quality, high-yield agricultural production. To this end, soil type data, rainfall and temperature data, and annual wheat production are collected for a specific region. Using statistical methodology, the acquired data were cleaned to remove incomplete and defective records. Afterwards, we used several machine learning classification methods to distinguish between different factors and their influence on final crop yields. The efficacy of the machine learning methods is discussed by comparing the models' predicted crop yields with actual values using two statistical quantities, the correlation coefficient and the mean squared error. The results show that the machine learning methods predict crop yields with high accuracy. Moreover, the random forest (RF) classification approach gives the best results among the classification methods used in this study.
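The two evaluation quantities mentioned above, the correlation coefficient and the mean squared error between predicted and actual yields, can be computed as in this NumPy sketch. It covers only the evaluation step, not the random forest or the other models, and the function name is hypothetical.

```python
import numpy as np


def evaluate_predictions(actual, predicted):
    """Pearson correlation and mean squared error between actual and predicted yields."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    r = np.corrcoef(actual, predicted)[0, 1]      # off-diagonal entry of the 2x2 matrix
    mse = np.mean((actual - predicted) ** 2)
    return r, mse
```

The two measures are complementary: predictions that are all off by a constant amount still correlate perfectly with the actual yields, so a correlation of 1.0 can coexist with a nonzero MSE, which is why both are worth reporting.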