• Title/Summary/Keyword: Statistical data

A General Mixed Linear Model with Left-Censored Data

  • Ha, Il-Do
    • Communications for Statistical Applications and Methods / v.15 no.6 / pp.969-976 / 2008
  • Mixed linear models have been widely used for various kinds of correlated data, including multivariate survival data. In this paper, we extend the hierarchical-likelihood (h-likelihood) approach for mixed linear models with right-censored data to left-censored data. We also allow a general random-effect structure and propose an estimation procedure. The proposed method is illustrated using a numerical data set and compared with the marginal likelihood method.

Receiver Operating Characteristic Analysis by Data Mining

  • Rhee Seong-Won;Lee Jea-Young
    • Proceedings of the Korean Statistical Society Conference / 2001.11a / pp.195-197 / 2001
  • Data mining is used to discover patterns and relationships in huge amounts of data, and researchers in many different fields have shown great interest in data mining analysis. Using the classification techniques of data mining, we present a model for applying the Receiver Operating Characteristic (ROC) method and suggest that it may help in analyzing the results of data mining techniques.
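To make the ROC idea concrete, here is a minimal Python sketch of the generic procedure for evaluating a classifier's scores (it is not the authors' model): the threshold is lowered through each distinct score, every step yields a (false positive rate, true positive rate) point, and the area under the curve (AUC) is integrated with the trapezoidal rule. The scores and labels are hypothetical illustration data, and the function names are our own.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, lowering the threshold through each distinct score."""
    pos = sum(labels)
    neg = len(labels) - pos
    # Highest scores first: each step classifies more cases as positive.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases tied at the same score cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical classifier scores and true classes (1 = positive).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
pts = roc_points(scores, labels)
print(auc(pts))  # → 0.8125
```

The result equals the fraction of (positive, negative) pairs in which the positive case scores higher (13 of 16 here), which is the usual probabilistic reading of AUC.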

GIS-based Spatial Integration and Statistical Analysis using Multiple Geoscience Data Sets : A Case Study for Mineral Potential Mapping (다중 지구과학자료를 이용한 GIS 기반 공간통합과 통계량 분석 : 광물 부존 예상도 작성을 위한 사례 연구)

  • 이기원;박노욱;권병두;지광훈
    • Korean Journal of Remote Sensing / v.15 no.2 / pp.91-105 / 1999
  • Spatial data integration using multiple geo-based data sets has been regarded as one of the primary GIS application issues. For this issue, several integration schemes have been developed from the perspectives of mathematical geology or geo-mathematics. However, research-based approaches to statistical/quantitative assessment between the integrated layer and the input layers have not yet been fully considered. To address this gap, this study first performed spatial data integration of multiple geoscientific data sets using known integration algorithms. For spatial integration with raster-based GIS functionality, geological, geochemical, and geophysical data sets, DEM-derived data sets, and remotely sensed imagery from the Ogdong area were used for geological thematic mapping related to mineral potential mapping. In addition, statistical/quantitative information was extracted with respect to the relationships among the data sets used and/or between each data set and the integrated layer, within the scope of multiple data fusion and a schematic statistical assessment methodology. As spatial integration schemes, certainty factor (CF) estimation and principal component analysis (PCA) were applied. However, this study was not aimed at directly comparing the two methodologies; rather, for the statistical/quantitative assessment between the integrated layer and the input layers, the focus was on statistical methodologies based on contingency tables. For bias reduction in particular, the jackknife technique was also applied in the PCA-based spatial integration. Through the statistical analyses of the integration information in this case study, new information on the relationships between the integrated layer and the input layers was extracted. In addition, the influence of the input data sets on the integrated layer was assessed.
This kind of approach provides decision-making information from a GIS viewpoint and also serves as exploratory data analysis that combines GIS with geoscientific applications, especially for handling spatial integration or data fusion with complex multivariable data sets.
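The jackknife bias reduction mentioned above follows a generic recipe, sketched here in Python on a toy estimator rather than on the PCA-based integration the authors used: compute the estimate on the full sample, recompute it with each observation left out, and form n·θ̂ − (n − 1)·mean(θ̂₍ᵢ₎). The plug-in variance (dividing by n) is chosen only because its jackknife correction is known to reproduce the unbiased sample variance exactly, which makes the result easy to check. The data values are hypothetical.

```python
from statistics import mean, variance


def plug_in_variance(xs):
    """Biased variance estimator that divides by n instead of n - 1."""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def jackknife_bias_corrected(xs, estimator):
    """Jackknife estimate: n * theta_hat - (n - 1) * mean of the leave-one-out estimates."""
    n = len(xs)
    theta_hat = estimator(xs)
    # Leave-one-out replicates theta_(i), each computed without observation i.
    loo = [estimator(xs[:i] + xs[i + 1:]) for i in range(n)]
    return n * theta_hat - (n - 1) * mean(loo)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]  # hypothetical observations
print(plug_in_variance(data))                          # biased (too small)
print(jackknife_bias_corrected(data, plug_in_variance))
print(variance(data))                                  # unbiased sample variance
```

The last two printed values agree up to floating-point rounding. For an estimator such as a PCA loading, the same function applies unchanged; only `estimator` differs.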

Application of Statistical Models for Default Probability of Loans in Mortgage Companies

  • Jung, Jin-Whan
    • Communications for Statistical Applications and Methods / v.7 no.2 / pp.605-616 / 2000
  • Three primary concerns frequently raised by mortgage companies are introduced, and the corresponding statistical approaches to estimating the default probability of loans are examined. The statistical models considered in this paper are time series, logistic regression, decision tree, neural network, and discrete-time models. Their usage is illustrated using an artificially modified data set, and the models are evaluated appropriately.
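Of the models listed, logistic regression is the most direct way to turn borrower characteristics into a default probability. The Python sketch below is only an illustration under stated assumptions, not the paper's analysis: it uses a single hypothetical predictor (a loan-to-value ratio), made-up loan outcomes, and plain batch gradient ascent on the log-likelihood, P(default | x) = 1 / (1 + exp(−(b₀ + b₁x))).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=1.0, epochs=20000):
    """Fit P(default | x) = sigmoid(b0 + b1 * x) by batch gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            residual = y - sigmoid(b0 + b1 * x)  # observed minus predicted probability
            g0 += residual
            g1 += residual * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


# Hypothetical loans: loan-to-value ratio and whether the loan defaulted (1) or not (0).
ltv = [0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95, 1.0]
defaulted = [0, 0, 0, 1, 0, 1, 1, 1]
b0, b1 = fit_logistic(ltv, defaulted)
for x in (0.6, 0.9):
    print(x, round(sigmoid(b0 + b1 * x), 3))
```

The toy outcomes overlap (a default at 0.8, a non-default at 0.85), so the maximum-likelihood coefficients are finite. In practice a statistics library's fitting routine would replace the hand-written loop.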

Study on the Levels of Informal Statistical Inference of the Middle and High School Students (중·고등학생들의 비형식적 통계적 추리의 수준 연구)

  • Lee, Jung Yeon;Lee, Kyeong Hwa
    • School Mathematics / v.19 no.3 / pp.533-551 / 2017
  • Statistics education researchers advise instructors to teach informal statistical inference, and they are paying close attention to its development in general. This study analyzed the levels of informal statistical inference among middle and high school students, and the traits of each level, in two tasks: comparing samples of data and estimating the graph of a population. Five levels of informal statistical inference were identified for comparing samples of data: responses that are distracted or misled by an irrelevant aspect; responses that focus on the frequencies of individual data points and hold a local view of the sample data sets; responses in which the student's view of the data is transitioning from local to global; responses that hold a global view but do not clearly integrate multiple aspects of the distribution; and responses that integrate multiple aspects of the distribution. Another five levels were identified for estimating the graph of a population: responses that are distracted or misled by an irrelevant aspect; responses that focus only on representativeness; responses that consider both representativeness and variability but focus on one particular aspect of the distribution; responses that focus on multiple aspects of the distribution but do not clearly integrate them; and responses that integrate multiple aspects of the distribution.

Analysis of Market Trajectory Data using k-NN

  • Park, So-Hyun;Ihm, Sun-Young;Park, Young-Ho
    • Journal of Multimedia Information System / v.5 no.3 / pp.195-200 / 2018
  • Recently, as sensor and big data analysis technologies have developed, there has been a great deal of research analyzing purchase-related data such as trajectory information and stay time. Such purchase-related data is useful for predicting purchase patterns and purchase times. Because it is difficult to find periodic patterns in large-scale human data, it is necessary to examine actual data sets, find various feature patterns, and then apply a machine learning algorithm appropriate to the pattern and purpose. Although existing studies have analyzed data using various machine learning methods, they lack statistical analysis, such as finding feature patterns, before the machine learning algorithm is applied. Therefore, we analyze purchasing data from Songjeong Maeil Market, the site where the data were collected, and find characteristic patterns through statistical data analysis. Based on these results, we derive meaningful conclusions by applying the machine learning algorithm and present future research directions. The data analysis confirmed that the number of visits differed according to the regional characteristics around Songjeong Maeil Market, and it revealed the distribution of time spent by consumers.
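The k-NN method named in the title classifies a new record by a majority vote among the k most similar records already labeled. The Python sketch below is a minimal, generic version (Euclidean distance, k = 3), not the authors' pipeline; the two features (stay time in minutes, hour of visit) and the "short"/"long" labels are hypothetical stand-ins for the market trajectory data.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points (Euclidean distance)."""
    # train is a list of (feature_vector, label) pairs.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical visits: (stay time in minutes, hour of visit) -> visit type.
train = [
    ((10.0, 9.0), "short"), ((12.0, 10.0), "short"), ((8.0, 11.0), "short"),
    ((45.0, 17.0), "long"), ((50.0, 18.0), "long"), ((40.0, 16.0), "long"),
]
print(knn_predict(train, (11.0, 10.0)))  # → short
print(knn_predict(train, (47.0, 17.0)))  # → long
```

Because the distance is unscaled, the feature with the larger range (stay time) dominates; with real data the features would usually be standardized first, which is one reason the abstract stresses examining feature patterns before applying the algorithm. `math.dist` requires Python 3.8 or later.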

Compression of the Variables Classifying Domestic Marine Accident Data

  • Park, Deuk-Jin;Yang, Hyeong-Sun;Yim, Jeong-Bin
    • Journal of Navigation and Port Research / v.46 no.2 / pp.92-98 / 2022
  • Maritime accidents result in enormous economic loss and loss of life; thus, such accidents must be prevented, and risks must be managed to prevent their occurrence. Risk management must be based on statistical evidence such as variables. Because statistical computation becomes difficult as the number of variables increases, the designated variables must be compressed in order to use the maritime accident data in Korea. Thus, in this study, variables of marine accident data are compressed using statistical methods. The date, ship type, and marine accident type included in all maritime accident data were extracted, the optimal number of variables was determined using hierarchical clustering analysis, and the data were compressed. For the compressed variables, the validity of data use was statistically confirmed using analysis of variance, and the data of the variables identified by the variable compression method were designated. Consequently, between the monthly and yearly data, statistical significance was confirmed for the yearly data, and compression was possible. The significance of the data was confirmed for six ship types and eight accident types, and these were compressed. These results can be used directly for prevention or prediction based on past maritime accident data. In future work, the data range extracted from past maritime accidents and the number of applicable data will be studied.
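The analysis-of-variance check described above compares the spread between group means with the spread within groups. As a minimal sketch of the standard one-way ANOVA F statistic (not the authors' exact procedure), the Python below computes F = [SSB / (k − 1)] / [SSW / (n − k)] for a list of groups; the three groups of counts are hypothetical values chosen so the arithmetic is easy to follow.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA over lists of observations."""
    all_obs = [x for g in groups for x in g]
    grand = mean(all_obs)
    k, n = len(groups), len(all_obs)
    # Between-group sum of squares: group size times squared deviation of the group mean.
    ssb = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    # Within-group sum of squares: deviations from each group's own mean.
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_b, df_w = k - 1, n - k
    return (ssb / df_b) / (ssw / df_w), df_b, df_w


# Hypothetical yearly accident counts for three ship types (illustrative only).
by_ship_type = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(one_way_anova_f(by_ship_type))  # → (27.0, 2, 6)
```

The F value is then compared with an F distribution on (2, 6) degrees of freedom to obtain a p-value; `scipy.stats.f_oneway` computes the same statistic together with that p-value.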

Resistant Singular Value Decomposition and Its Statistical Applications

  • Park, Yong-Seok;Huh, Myung-Hoe
    • Journal of the Korean Statistical Society / v.25 no.1 / pp.49-66 / 1996
  • The singular value decomposition is one of the most useful methods in matrix computation. It provides dimension reduction, which is the central idea in many multivariate analyses. However, the method is not resistant; that is, it is very sensitive to small changes in the input data. In this article, we derive a resistant version of the singular value decomposition for principal component analysis. We also apply it to the biplot, which is similar to principal component analysis in reducing the dimension of an n × p data matrix. In this way, we derive a resistant principal component analysis and biplot based on the resistant singular value decomposition. They provide graphical multivariate data analyses that are relatively little influenced by outlying observations.
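The lack of resistance described above is easy to demonstrate, although the sketch below does not implement the authors' resistant decomposition. It computes the leading right singular vector of a small, uncentered n × 2 data matrix by power iteration on XᵀX, then adds one hypothetical outlying row and measures how far that direction turns. The data values and function names are our own illustration.

```python
import math


def leading_right_singular_vector(rows, iters=500):
    """Power iteration on X^T X; returns the unit leading right singular vector of X."""
    p = len(rows[0])
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary start, not orthogonal to the target here
    for _ in range(iters):
        xv = [sum(r[j] * v[j] for j in range(p)) for r in rows]          # X v
        w = [sum(r[j] * s for r, s in zip(rows, xv)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v


def angle_deg(u, v):
    """Angle between two unit vectors, ignoring sign (singular vectors are defined up to sign)."""
    cos = abs(sum(a * b for a, b in zip(u, v)))
    return math.degrees(math.acos(min(1.0, cos)))


clean = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [-1.0, -0.9], [-2.0, -2.1], [-3.0, -2.9]]
contaminated = clean + [[-10.0, 10.0]]  # one hypothetical outlying row
v_clean = leading_right_singular_vector(clean)
v_bad = leading_right_singular_vector(contaminated)
print([round(c, 3) for c in v_clean], [round(c, 3) for c in v_bad])
print(round(angle_deg(v_clean, v_bad), 1))
```

For these values, the single added row swings the leading direction from roughly the (1, 1) diagonal to roughly the (1, −1) diagonal, an angle near 90 degrees. A resistant decomposition aims to keep that direction close to the one fitted to the bulk of the data.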

A System for Medical Statistical Analysis Using Guide Maps and Interactive Visualization (가이드 맵과 인터랙티브 시각화를 이용한 의료 통계분석 시스템)

  • Lee Don-Soo;Choi Soo-Mi
    • Journal of Korea Multimedia Society / v.8 no.7 / pp.1000-1011 / 2005
  • This paper presents a system for medical statistical analysis that helps medical professionals analyze clinical data more easily and accurately. It can recommend appropriate methods according to the distribution of the sample data, and it provides guide maps composed of icons to help users understand the analysis process. Besides general statistical analysis, it includes statistical methods commonly used in medical fields, such as survival analysis and methods for repeated measurements. The analysis results are displayed interactively through 3D glyph-based visualization with uncertainty.

Introduction to S-PLUS and graphical comparison with SAS (S-PLUS의 소개 및 SAS 와의 그래픽 비교)

  • 김성수;한경수
    • The Korean Journal of Applied Statistics / v.6 no.1 / pp.1-11 / 1993
  • Statistical graphics have become important new tools for data analysis, and many statistical software packages exploit graphical methods. Among the packages available for personal computers at low cost, we introduce S-PLUS (version 2.0). S-PLUS is an interactive graphical data analysis system and an object-oriented programming language. SAS/GRAPH is another popular graphical system for displaying data as color plots, charts, maps, and slides on screen and hardcopy devices. We compare S-PLUS with SAS/GRAPH (version 6.04) from the viewpoint of statistical graphics.
