• Title/Summary/Keyword: methods:data analysis

A Methodology for Deriving An Object Model by Using Structured Analysis Results (구조적 분석 산출물을 이용한 객체 모델 유도 방법론)

  • 이희석;배한욱;유천수
    • Journal of the Korean Operations Research and Management Science Society / v.21 no.3 / pp.175-195 / 1996
  • In conventional analysis methods, data and process are only loosely coupled when building information systems. Several object-oriented approaches have been proposed to integrate data and process. However, object-oriented analysis requires a radical paradigm shift, and system analysts therefore find it difficult to generate object models directly from end users. To alleviate these difficulties, this paper proposes a methodology for deriving an object model from structured analysis results. Objects are obtained primarily from entities in the Entity-Relationship Diagram. Methods are obtained by analyzing the relationships between processes and data stores in the Data Flow Diagram. Methods are assigned to the objects by using object/process matrices. A real-life case is presented to demonstrate the usefulness of the methodology.

Graphical Methods for the Sensitivity Analysis in Discriminant Analysis

  • Jang, Dae-Heung;Anderson-Cook, Christine M.;Kim, Youngil
    • Communications for Statistical Applications and Methods / v.22 no.5 / pp.475-485 / 2015
  • As in regression, many measures have been developed to detect influential data points in discriminant analysis. Many of them apply, in the context of discriminant analysis, principles similar to those of the diagnostic measures used in linear regression. Here we focus on the impact on the predicted classification posterior probability when a data point is omitted. The new method is intuitive and easily interpretable compared to existing methods. We also propose a graphical display to show the individual movement of the posterior probability of other data points when a specific data point is omitted. This enables the summaries to capture the overall pattern of the change.
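
The omission idea in this abstract can be sketched in a few lines. The following is a minimal illustration, not the authors' method: it assumes a two-class, one-dimensional linear discriminant model with a pooled variance, and the function names (`fit_lda_1d`, `posterior_class1`, `omission_shifts`) are hypothetical. It refits the model without one observation and reports how the class-1 posterior probability of every other point moves.

```python
import math

def fit_lda_1d(xs, ys):
    """Fit a two-class, 1-D LDA: class means, class priors, pooled variance."""
    groups = {k: [x for x, y in zip(xs, ys) if y == k] for k in (0, 1)}
    means = {k: sum(v) / len(v) for k, v in groups.items()}
    priors = {k: len(v) / len(xs) for k, v in groups.items()}
    ss = sum((x - means[k]) ** 2 for k, v in groups.items() for x in v)
    pooled_var = ss / (len(xs) - 2)
    return means, priors, pooled_var

def posterior_class1(x, model):
    """Posterior probability of class 1 at x under the fitted model."""
    means, priors, var = model
    dens = {k: priors[k] * math.exp(-(x - means[k]) ** 2 / (2 * var))
            for k in (0, 1)}
    return dens[1] / (dens[0] + dens[1])

def omission_shifts(xs, ys, omit):
    """Change in each remaining point's posterior when point `omit` is dropped."""
    full = fit_lda_1d(xs, ys)
    keep = [i for i in range(len(xs)) if i != omit]
    reduced = fit_lda_1d([xs[i] for i in keep], [ys[i] for i in keep])
    return {i: posterior_class1(xs[i], reduced) - posterior_class1(xs[i], full)
            for i in keep}

# Illustrative data (made up): the last point is far from its class centre.
xs = [1.0, 1.5, 2.0, 4.0, 4.5, 5.0, 9.0]
ys = [0, 0, 0, 1, 1, 1, 1]
for i, shift in omission_shifts(xs, ys, omit=6).items():
    print(i, round(shift, 4))
```

Plotting these shifts for each omitted point in turn is one way to obtain the kind of display the abstract describes.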

Explanatory Analysis for South Korea's Political Website Linking - Statistical Aspects

  • Choi, Kyoung-Ho;Park, Han-Woo
    • Journal of the Korean Data and Information Science Society / v.16 no.4 / pp.899-911 / 2005
  • This paper conducts an explanatory analysis of the web sphere produced by National Assemblymen in South Korea, using several statistical methods. First, some descriptive metrics were employed. Next, two traditional methods of multivariate analysis, multidimensional scaling and correspondence analysis, were applied to the data. Finally, cross-sectional data were compared to examine change over time.

Verification of Improving a Clustering Algorithm for Microarray Data with Missing Values

  • Kim, Su-Young
    • The Korean Journal of Applied Statistics / v.24 no.2 / pp.315-321 / 2011
  • Gene expression microarray data often include multiple missing values. Most gene expression analyses (including gene clustering analysis), however, require a complete data matrix as input. In ordinary clustering methods, a single missing value forces one to abandon all the data for a gene, even if the rest of the data for that gene are intact. The quality of the analysis may decrease seriously as the missing rate increases. Conversely, imputing missing values may introduce artifacts that reduce the reliability of the analysis. To clarify this contradiction in microarray clustering analysis, this paper compared the accuracy of clustering with and without imputation over several microarray data sets with different missing rates. This paper also tested the clustering efficiency of several imputation methods, including our proposed algorithm. The results showed that, for imperfect microarray data, it is worthwhile to check the clustering result in this alternative way, without any imputed data.
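
The trade-off described above (discarding incomplete genes versus clustering without imputation) can be made concrete with a small sketch. The abstract does not specify the proposed algorithm, so this is only an assumed illustration: `complete_cases` shows the conventional filter, and `partial_distance` is a hypothetical helper that computes a rescaled Euclidean distance over the positions observed in both profiles, which is one common way to compare profiles without imputing anything. Missing values are represented as `None`.

```python
import math

def complete_cases(profiles):
    """Conventional approach: keep only genes with no missing value."""
    return {gene: values for gene, values in profiles.items()
            if None not in values}

def partial_distance(a, b):
    """Euclidean distance over jointly observed positions, rescaled to full length.

    Returns math.inf when the two profiles share no observed position.
    """
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf
    mean_sq = sum((x - y) ** 2 for x, y in shared) / len(shared)
    return math.sqrt(mean_sq * len(a))

# Illustrative expression profiles (made up); None marks a missing value.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [1.1, None, 3.2, 4.1],
    "geneC": [4.0, 3.0, None, 1.0],
}
print(sorted(complete_cases(profiles)))   # only geneA survives the filter
print(partial_distance(profiles["geneA"], profiles["geneB"]))
print(partial_distance(profiles["geneA"], profiles["geneC"]))
```

A distance of this kind can be passed to any clustering routine that accepts a precomputed distance matrix, so genes with a few missing values stay in the analysis.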

The Characteristics of Coastal Zone Management Methods in U.S.A -Focus on Zoning & Integrated Methods of Different Kind Data- (미국 연안구역(Coastal Zone) 관리수단의 특성 -조닝방식과 이종 데이터 간 통합방법을 중심으로-)

  • Oh, Ji-Hoon;Lee, Seok-Hwan;Lee, Hee-Won
    • Journal of the Korea Academia-Industrial cooperation Society / v.11 no.9 / pp.3590-3598 / 2010
  • Efficient management of coastal zones at the local level requires collecting coastal zone data, preparing objective analysis methods, and building a scientific and technical support system. This study analyzes coastal zoning methods and methods for integrating different kinds of data through a case study of coastal zones in the U.S.A. The characteristics of coastal zone management methods in the U.S.A. are as follows: concrete indices and methods for establishing coastal zones that can respond to local values and land use; related data analysis methods that support spatial decision making; and the establishment and administration of a bureau for constructing and integrating spatial information on coastal zones. Based on the common values of the coastal zoning methods and the methods for integrating heterogeneous data in the U.S.A., this study suggests technical implications for building domestic coastal zone management at the local level.

Exploring COVID-19 in mainland China during the lockdown of Wuhan via functional data analysis

  • Li, Xing;Zhang, Panpan;Feng, Qunqiang
    • Communications for Statistical Applications and Methods / v.29 no.1 / pp.103-125 / 2022
  • In this paper, we analyze time series data on the case and death counts of COVID-19, which broke out in China in December 2019. The study period is the lockdown of Wuhan. We use functional data analysis methods to analyze the collected time series data. The analysis is divided into three parts. First, functional principal component analysis is conducted to investigate the modes of variation. Second, we carry out functional canonical correlation analysis to explore the relationship between confirmed and death cases. Finally, we use a clustering method based on the Expectation-Maximization (EM) algorithm to run a cluster analysis on the counts of confirmed cases, where the number of clusters is determined via a cross-validation approach. In addition, we compare the clustering results with publicly available migration data.

The development of statistical methods for retrieving MODIS missing data: Mean bias, regressions analysis and local variation method (MODIS 손실 자료 복원을 위한 통계적 방법 개발: 평균 편차 방법, 회귀 분석 방법과 지역 변동 방법)

  • Kim, Min Wook;Yi, Jonghyuk;Park, Yeon Gu;Song, Junghyun
    • Journal of Satellite, Information and Communications / v.11 no.4 / pp.94-101 / 2016
  • Satellite data used in remote sensing have limitations: especially with visible-range sensors, clouds and/or other environmental factors cause missing data. In this study, using land surface temperature data from the MODerate resolution Imaging Spectro-radiometer (MODIS), we developed three methods for retrieving missing satellite data: the mean bias method, the regression analysis method, and the local variation method. These methods use the previous day's data as reference data. To validate these methods, we selected a specific measurement ratio using artificial missing data from 2014 to 2015. The local variation method showed low accuracy, with a root mean square error (RMSE) of more than 2 K in some cases, whereas the regression analysis method showed reliable results in most cases, with small RMSE values of approximately 1.13 K. The RMSE of the mean bias method, approximately 1.32 K, was similar to that of the regression analysis method.
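
As a rough illustration of two of the three methods named above, the sketch below fills missing pixels from a previous-day reference and scores the result with RMSE on artificially masked pixels. The abstract does not give the formulas, so the definitions here are assumptions: the mean bias fill adds the average (today minus reference) difference over observed pixels to the reference value, and the regression fill predicts today's value from a simple least-squares line fitted on observed pixels. All function names and the sample temperatures are hypothetical.

```python
import math

def mean_bias_fill(today, reference):
    """Fill None pixels with reference + mean(today - reference) over observed pixels."""
    pairs = [(t, r) for t, r in zip(today, reference) if t is not None]
    bias = sum(t - r for t, r in pairs) / len(pairs)
    return [r + bias if t is None else t for t, r in zip(today, reference)]

def regression_fill(today, reference):
    """Fill None pixels with a least-squares line today ~ intercept + slope * reference."""
    pairs = [(t, r) for t, r in zip(today, reference) if t is not None]
    n = len(pairs)
    mean_t = sum(t for t, _ in pairs) / n
    mean_r = sum(r for _, r in pairs) / n
    sxx = sum((r - mean_r) ** 2 for _, r in pairs)
    sxy = sum((r - mean_r) * (t - mean_t) for t, r in pairs)
    slope = sxy / sxx
    intercept = mean_t - slope * mean_r
    return [intercept + slope * r if t is None else t
            for t, r in zip(today, reference)]

def rmse(estimates, truth, masked):
    """Root mean square error over the artificially masked pixel indices."""
    return math.sqrt(sum((estimates[i] - truth[i]) ** 2 for i in masked) / len(masked))

# Made-up land surface temperatures in kelvin; pixels 1 and 3 are masked on purpose.
truth = [290.0, 292.5, 294.0, 295.5, 298.0]
reference = [289.0, 291.0, 293.5, 295.0, 296.5]
masked = [1, 3]
today = [None if i in masked else v for i, v in enumerate(truth)]

print("mean bias RMSE :", rmse(mean_bias_fill(today, reference), truth, masked))
print("regression RMSE:", rmse(regression_fill(today, reference), truth, masked))
```

Masking pixels whose true values are known, as done here, mirrors the validation with artificial missing data that the abstract mentions.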

Complex sample design effects and inference for Korea National Health and Nutrition Examination Survey data (국민건강영양조사 자료의 복합표본설계효과와 통계적 추론)

  • Chung, Chin-Eun
    • Journal of Nutrition and Health / v.45 no.6 / pp.600-612 / 2012
  • Nutritional researchers worldwide are using large-scale sample survey methods to study nutritional health epidemiology and services utilization in general, non-clinical populations. This article provides a review of important statistical methods and software that apply to descriptive and multivariate analysis of data collected in sample surveys, such as national health and nutrition examination surveys. A comparative data analysis of the Korea National Health and Nutrition Examination Survey (KNHANES) was used to illustrate analytical procedures and design effects for survey estimates of population statistics, model parameters, and test statistics. This article focuses on the following points: the approach to analyzing sample survey data, the appropriate software tools available to perform these analyses, and the correct survey analysis methods needed to interpret survey data. It addresses the question of how to approach the analysis of complex sample survey data. The latest developments in software tools for analysis of complex sample survey data are covered, and empirical examples are presented that illustrate the impact of survey sample design effects on the parameter estimates, test statistics, and significance probabilities (p values) for univariate and multivariate analyses.

Applications of response dimension reduction in large p-small n problems

  • Minjee Kim;Jae Keun Yoo
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.191-202 / 2024
  • The goal of this paper is to show how multivariate regression analysis with high-dimensional responses is facilitated by response dimension reduction. Multivariate regression, characterized by multi-dimensional response variables, is increasingly prevalent across diverse fields such as repeated measures, longitudinal studies, and functional data analysis. One of the key challenges in analyzing such data is managing the response dimensions, which can complicate the analysis due to an exponential increase in the number of parameters. Although response dimension reduction methods have been developed, there is no practically useful illustration for various types of data, such as so-called large p-small n data. This paper aims to fill this gap by showcasing how response dimension reduction can enhance the analysis of high-dimensional response data, thereby providing significant assistance to statistical practitioners and contributing to advancements in multiple scientific domains.

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods / v.30 no.3 / pp.331-341 / 2023
  • Handling missing values in data analysis is essential in constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss within the data and invalid conclusions in data analysis. Imputation is a technique that replaces missing data with alternative values obtained from information in a dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares the imputation techniques for datasets with mixed datatypes in various situations, such as data size, missing ratios, and missing mechanisms. To evaluate the performance of each method in mixed datasets, we propose a new imputation performance measure (IPM) that is a unified measurement applicable to numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results with imputation performances and computational times.
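
To make the K-nearest-neighbor imputation mentioned above concrete, here is a minimal sketch for numeric data only. It is a simplified illustration under stated assumptions, not the setup compared in the paper: missing values are `None`, neighbors are ranked by root mean squared difference over jointly observed columns, and the function name `knn_impute` and the sample rows are hypothetical. The proposed IPM is not reproduced here because the abstract does not define it.

```python
import math

def knn_impute(rows, k=2):
    """Replace each None with the mean of that column over the k nearest donor rows."""
    def distance(a, b):
        # Root mean squared difference over columns observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows that have column j observed.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d < math.inf]
            if nearest:
                completed[i][j] = sum(nearest) / len(nearest)
    return completed

# Made-up data: the last row is missing its second column.
rows = [
    [1.0, 10.0],
    [1.1, 11.0],
    [5.0, 50.0],
    [1.05, None],
]
print(knn_impute(rows, k=2)[3])
```

Distances are computed on the original rows rather than on already imputed values, so the result does not depend on the order in which cells are filled. A value stays `None` when no donor row shares an observed column with it.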