• Title/Summary/Keyword: graphical exploratory data analysis

Firework plot for evaluating the impact of outliers in statistical inference (통계적 추론에서 특이점의 영향을 평가하기 위한 탐색적 자료분석 그림도구로서의 불꽃그림)

  • Moon, Sungho
    • The Korean Journal of Applied Statistics / v.31 no.1 / pp.155-165 / 2018
  • Outliers and influential observations often distort many of the numerical measures used in data analysis. Jang and Anderson-Cook (Quality and Reliability Engineering International, 30, 1409-1425, 2014) proposed the firework plot, a graphical method for exploratory analysis that visualizes the trace of the impact of possible outlying and influential observations on univariate/bivariate data analysis and regression. They developed a 3-D plot as well as a pairwise plot for the appropriate measures of interest. We use firework plots as a graphical exploratory data analysis tool to detect outliers and evaluate their impact on statistical inference.

Categorical Data Analysis System in the Internet (인터넷상에서의 범주형 자료분석 시스템 개발)

  • Hong, Jong Seon;Kim, Dong Uk;O, Min Gwon
    • The Korean Journal of Applied Statistics / v.12 no.1 / pp.81-81 / 1999
  • A categorical data analysis system on the World Wide Web is proposed with an easy-to-use environment. This system is composed of four components. First, it presents several graphical displays for exploratory data analysis of categorical data. Second, it provides some measures of association, including dynamic graphics for the mosaic plots of Hartigan and Kleiner (1981) and Friendly (1994). Dynamic graphics for mosaic plots give useful information. Third, the system can analyze categorical data with loglinear models, so the best-fitting loglinear model can be selected interactively.

Firework Plot as a Graphical Exploratory Data Analysis Tool to Evaluate the Impact of Outliers in a Mixture Experiment (혼합물 실험에서 특이값의 영향을 평가하기 위한 그래픽 탐색적 자료분석 도구로서의 불꽃그림)

  • Jang, Dae-Heung;Ahn, SoJin;Kim, Youngil
    • The Korean Journal of Applied Statistics / v.27 no.4 / pp.629-643 / 2014
  • It is common to check the validity of an assumed model with heavy use of diagnostic tools when conducting data analysis with regression techniques; however, outliers and influential data points often distort the regression output in an undesired manner. Jang and Anderson-Cook (2013) proposed a graphical method for exploratory analysis, called a firework plot, that visualizes the trace of the impact of possible outlying and/or influential data points on individual regression coefficients and on the overall residual sum of squares (SSE). They developed a 3-D plot as well as a pairwise plot for the appropriate measures of interest. In this paper, the approach is extended further to demonstrate its strength; in addition, a more meaningful interpretation becomes possible by adding a measure not mentioned in their paper. The approach is applied to a mixture experiment because a detailed analysis of the sensitivity of statistical measures is needed in a small experiment.

Statistical Analysis on the Web Using PHP3 (PHP3를 이용한 웹상에서의 통계분석)

  • Hwang, Jin-Soo;Uhm, Dae-Ho
    • Journal of the Korean Data and Information Science Society / v.10 no.2 / pp.501-510 / 1999
  • We have seen rapid development of the multimedia industry as computers evolve, and the internet has dramatically changed our way of life. There have been several attempts to teach elementary statistics on the web, but most of them are based on commercial products. The need for statistical data analysis, and for decision making based on that analysis, is growing. In this article we show one way of reaching that goal by using the server-side scripting language PHP3 together with an extra graphical module and a statistical distribution module on the web. We demonstrate some elementary exploratory graphical data analysis and statistical inference. There is plenty of room for improvement before it becomes a full-blown statistical analysis tool on the web in the near future. All the programs and databases used in our article are public programs. The main engine, PHP3, is included as an Apache web server module, so it is very light and fast. Processing speed should improve further once PHP4 (ZEND) is officially released.

Nonstandard Machine Learning Algorithms for Microarray Data Mining

  • Zhang, Byoung-Tak
    • Proceedings of the Korean Society for Bioinformatics Conference / 2001.10a / pp.165-196 / 2001
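  • A DNA chip, or microarray, is a technology in which a large number of genes or gene fragments (typically thousands to tens of thousands) are immobilized on a chip and the DNA hybridization reaction is used to analyze gene expression patterns. Such high-throughput technology can not only answer many problems in molecular biology that were previously out of reach, but also has boundless potential applications, including molecular-level disease diagnosis, new drug development, and solutions to environmental pollution problems. For practical application of this technology, bioinformatics techniques for extracting as much useful and new knowledge as possible from the data are essential, in addition to the hardware/wetware technology for manufacturing DNA chips. The problem of data mining gene expression patterns can be broadly divided into clustering, classification, and dependency analysis, and these techniques are grounded in statistics and in machine learning from artificial intelligence. The techniques mainly used include principal component analysis, hierarchical clustering, k-means, self-organizing maps, decision trees, multilayer perceptron neural networks, and association rules. In this seminar, beyond these basic machine learning techniques, we introduce the probabilistic graphical model (PGM), a new learning technique under recent study, and review research applying it to DNA chip data analysis. A PGM is a machine learning model formed by combining artificial neural networks, graph theory, and probability theory; it is based on the memory and learning mechanisms of the human brain, and one of its major differences from other machine learning models is that it is a generative model. That is, once a model is built, it can generate new data, so the model can be validated and new facts can be inferred from it, which makes it suitable for exploratory analysis aimed at discovering new knowledge, as in biological data mining problems. In addition, unlike conventional neural network models, probabilistic graphical models perform probability-based soft inference rather than deterministic decision making, and are well suited to analyzing causal relationships or dependencies among the relevant factors from the learned model. As concrete examples of PGMs, we introduce the structure and the learning and inference algorithms of Bayesian networks, nonnegative matrix factorization (NMF), and generative topographic mapping (GTM), and review the results of applying them to the cancer diagnosis problem and the gene-drug dependency analysis problem used in CAMDA-2000 and CAMDA-2001, competitions for evaluating DNA chip data analysis.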

Development of a Data Integration Tool for Hydraulic Conductivity Map and Its Application (수리전도도맵 작성을 위한 자료병합 툴 개발과 적용)

  • Ryu, Dong-Woo;Park, Eui-Seup;Kenichi, Ando;Kim, Hyung-Mok
    • Tunnel and Underground Space / v.17 no.6 / pp.493-502 / 2007
  • Measurements of hydraulic conductivity are point or interval values and are highly limited in number. Meanwhile, the results of geophysical prospecting provide information on the spatial variation of geology and are abundant in number. This study aimed to develop a data integration tool for constructing a hydraulic conductivity map by integrating geophysical data and hydraulic conductivity measurements. The developed code employs a geostatistical optimization method, simulated annealing (SA), and consists of four distinct computation modules that cover the workflow from exploratory data analysis to postprocessing of the simulation. All of these modules are equipped with a graphical user interface (GUI). The developed code was validated in situ by characterizing the hydraulic properties of a highly permeable fractured zone.