• Title/Abstract/Keywords: distribution valued data

Search results: 26 items (processing time: 0.026 s)

Symbolic Cluster Analysis for Distribution Valued Dissimilarity

  • Matsui, Yusuke;Minami, Hiroyuki;Mizuta, Masahiro
    • Communications for Statistical Applications and Methods / Vol. 21, No. 3 / pp.225-234 / 2014
  • We propose a novel hierarchical clustering method for distribution valued dissimilarities. The analysis of large and complex data has attracted significant interest. Symbolic Data Analysis (SDA), proposed by Diday in the 1980s, provides a new framework for statistical analysis. In SDA, we analyze objects with internal variation, such as an interval, a histogram, or a distribution; these are called symbolic objects. In this study, we focus on cluster analysis for distribution valued dissimilarities, one type of symbolic object. Hierarchical clustering generally has two steps: a find-out step and an update step. In the find-out step, we find the nearest pair of clusters; we extend this step to distribution valued dissimilarities by introducing a measure on their order relations. In the update step, dissimilarities between clusters are redefined as a mixture of distributions with a mixing ratio. We present a real-data example of the proposed method and a simulation study.
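The two-step loop described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not define the order-relation measure or the mixing ratio, so the sketch assumes (hypothetically) that a dissimilarity is a weighted discrete distribution, that the "nearest" pair is the one whose dissimilarity is most likely to be smaller than the competing ones, and that the mixing ratio is proportional to cluster sizes. The names `prob_less`, `mixture`, and `cluster` are invented for this example.

```python
def prob_less(a, b):
    """Return P(A < B) + 0.5 * P(A == B) for two weighted discrete distributions.

    Each distribution is a list of (value, weight) pairs whose weights sum to 1.
    """
    total = 0.0
    for va, wa in a:
        for vb, wb in b:
            if va < vb:
                total += wa * wb
            elif va == vb:
                total += 0.5 * wa * wb
    return total


def mixture(a, b, ratio):
    """Mix two distributions: weight `ratio` on a and `1 - ratio` on b."""
    return [(v, w * ratio) for v, w in a] + [(v, w * (1.0 - ratio)) for v, w in b]


def cluster(dissim, n_items):
    """Agglomerate items 0..n_items-1 and return the merge history.

    dissim maps frozenset({i, j}) to the distribution valued dissimilarity
    between i and j.
    """
    d = dict(dissim)
    members = {i: (i,) for i in range(n_items)}
    merges = []
    next_id = n_items
    while len(members) > 1:
        pairs = list(d)

        def score(p):
            # Find-out step (assumed rule): average probability that this
            # dissimilarity is smaller than each competing dissimilarity.
            others = [q for q in pairs if q != p]
            if not others:
                return 1.0
            return sum(prob_less(d[p], d[q]) for q in others) / len(others)

        i, j = sorted(max(pairs, key=score))
        # Update step (assumed ratio): mix in proportion to cluster sizes.
        ratio = len(members[i]) / (len(members[i]) + len(members[j]))
        new_d = {p: v for p, v in d.items() if i not in p and j not in p}
        for k in members:
            if k not in (i, j):
                new_d[frozenset((next_id, k))] = mixture(
                    d[frozenset((i, k))], d[frozenset((j, k))], ratio
                )
        merges.append((members[i], members[j]))
        members[next_id] = members.pop(i) + members.pop(j)
        d = new_d
        next_id += 1
    return merges


# Toy input (made-up values): items 0 and 1 are close, item 2 is far from both.
toy = {
    frozenset((0, 1)): [(1.0, 0.5), (2.0, 0.5)],
    frozenset((0, 2)): [(9.0, 0.5), (10.0, 0.5)],
    frozenset((1, 2)): [(8.0, 0.5), (11.0, 0.5)],
}
history = cluster(toy, 3)
print(history)
```

Representing the updated dissimilarity as a mixture keeps it a distribution at every level of the hierarchy, so the same order comparison can be reused after each merge.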

Exploratory Methods for Joint Distribution Valued Data and Their Application

  • Igarashi, Kazuto;Minami, Hiroyuki;Mizuta, Masahiro
    • Communications for Statistical Applications and Methods / Vol. 22, No. 3 / pp.265-276 / 2015
  • In this paper, we propose hierarchical cluster analysis and multidimensional scaling for joint distribution valued data. Advances in information technology are increasing the need for statistical methods for large and complex data. Symbolic Data Analysis (SDA) is an attractive framework for such data. In SDA, target objects are typically represented by aggregated data. Most SDA methods deal with objects represented as intervals or histograms. However, those methods cannot take into account information among variables, such as correlation. By contrast, objects represented as a joint distribution can contain information among variables. Therefore, we focus on methods for joint distribution valued data. We extend the two well-known exploratory methods by using dissimilarities among joint distribution valued data based on a Hall-type relative projection index. We present a simulation study and a real-data example of the proposed methods.

On principal component analysis for interval-valued data

  • 최수진;강기훈
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 33, No. 1 / pp.61-74 / 2020
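  • Interval-valued data, one type of symbolic data, take an interval rather than a single value for every observation, so variation exists within each observation. Because principal component analysis reduces the dimension of data by explaining as much of its variance as possible, principal component analysis of interval-valued data should explain not only the variance between observations but also the variance within observations. This paper introduces three principal component analysis methods for interval-valued data. In addition, instead of the uniform distribution used in the existing quantile method, we propose using a truncated normal distribution, on the view that the region near the center of an interval carries more information. We compare the results of each method through simulation and real OECD-related statistical data. Finally, for the quantile method, we draw principal component scatter plots using the arrow representation and examine the positions and distribution of the quantiles.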
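The difference between the uniform and truncated-normal assumptions in the quantile method can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the truncated normal's parameters, so the sketch places the mean at the interval midpoint and (hypothetically) sets the standard deviation to one sixth of the interval width. The function names are invented for this example.

```python
from statistics import NormalDist


def uniform_quantiles(lo, hi, probs):
    """Quantiles of a uniform distribution on [lo, hi]."""
    return [lo + p * (hi - lo) for p in probs]


def truncnorm_quantiles(lo, hi, probs, sigma_frac=1.0 / 6.0):
    """Quantiles of a normal distribution truncated to [lo, hi].

    Assumptions (not from the paper): mean at the midpoint and
    sigma = sigma_frac * (hi - lo).
    """
    if hi == lo:
        # A degenerate interval has all of its mass at a single point.
        return [lo for _ in probs]
    nd = NormalDist((lo + hi) / 2.0, (hi - lo) * sigma_frac)
    f_lo, f_hi = nd.cdf(lo), nd.cdf(hi)
    # Inverse-CDF method: rescale p into the mass retained after truncation.
    return [nd.inv_cdf(f_lo + p * (f_hi - f_lo)) for p in probs]


probs = [0.25, 0.5, 0.75]
u = uniform_quantiles(0.0, 12.0, probs)
t = truncnorm_quantiles(0.0, 12.0, probs)
print(u)
print(t)
```

Under the truncated normal, the lower and upper quartiles move toward the midpoint, which reflects the idea that values near the center of the interval carry more weight.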

Discretization of Continuous-Valued Attributes considering Data Distribution

  • 이상훈;박정은;오경환
    • Journal of Korean Institute of Intelligent Systems (한국지능시스템학회논문지) / Vol. 13, No. 4 / pp.391-396 / 2003
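  • This paper proposes a new method that converts continuous attribute values into categorical form by considering the distribution of the target attribute (class) values for each attribute, without requiring any user-specified parameter. For each attribute, the distribution of the target attribute is mapped onto a one-dimensional space, and intervals are clustered according to criteria such as the density of each target class and its degree of overlap with other target classes. The resulting clusters are based on probabilistic values that can predict the target attribute, and they have discretization boundaries that minimize the loss of information provided by each attribute. The improved performance of the proposed discretization method is confirmed using the C4.5 algorithm and data from the UCI Machine Learning Data Repository.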

Testing for stochastic order in interval-valued data

  • 최혜정;임요한;곽민정;박성오
    • The Korean Journal of Applied Statistics (응용통계연구) / Vol. 32, No. 6 / pp.879-887 / 2019
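  • This study proposes a procedure for testing stochastic order in two-sample interval-valued data. The proposed test statistic is a U-statistic, and we derive its asymptotic distribution under the null hypothesis. Using real data and simulation, we compare the performance of the newly proposed method with the one-sided bivariate Kolmogorov-Smirnov test.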

Integer-Valued HAR(p) model with Poisson distribution for forecasting IPO volumes

  • SeongMin Yu;Eunju Hwang
    • Communications for Statistical Applications and Methods / Vol. 30, No. 3 / pp.273-289 / 2023
  • In this paper, we develop a new time series model for predicting non-negative integer-valued IPO (initial public offering) data. The proposed model is based on the integer-valued autoregressive (INAR) model with a Poisson thinning operator. Like the heterogeneous autoregressive (HAR) model, which combines daily, weekly and monthly averages in a cascade form, the integer-valued heterogeneous autoregressive (INHAR) model is designed to reflect long memory efficiently. The parameters of the INHAR model are estimated using the conditional least squares estimate and the Yule-Walker estimate. Through simulations, bias and standard error are calculated to compare the performance of the estimates. The fit of the model to Korea's IPO data is evaluated using performance measures such as mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The results show that the INHAR model performs better than the traditional INAR model. The empirical analysis of Korea's IPO data indicates that the proposed model is efficient in forecasting monthly IPO volumes.
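The three performance measures named in this abstract follow standard definitions and can be sketched as below. The sample counts are made up for illustration and are not data from the paper. Because MAPE is undefined when an observed value is zero, which can happen with count data, this sketch assumes such observations are simply skipped; the abstract does not say how the authors handle that case.

```python
import math


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumption: observations with actual == 0 are skipped. The function
    raises ZeroDivisionError if every observation is zero.
    """
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    return 100.0 * sum(terms) / len(terms)


# Made-up monthly counts, for illustration only.
observed = [2, 4, 5]
forecast = [3, 4, 3]
print(mae(observed, forecast), rmse(observed, forecast), mape(observed, forecast))
```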

Assessing the Coronavirus Impact on the Asean Countries' Top 10 Most Valuable Brands

  • ZAHARI, Abdul Rahman;ESA, Elinda;AZIZAN, Noor Azlinna
    • The Journal of Asian Finance, Economics and Business / Vol. 9, No. 5 / pp.251-260 / 2022
  • The goal of this study is to determine whether the Coronavirus affected the Top 10 most valuable brands differently across ASEAN countries (Malaysia, Singapore, Indonesia, and Vietnam) and industry types. The data were collected using a secondary data method (content analysis). Based on annual reports from 2019 to 2021, the researchers examined the brand equity of the Top 10 most valuable brands in each of the four ASEAN countries. IBM Statistical Package for Social Science (SPSS) Statistics for Windows was used to analyze the data, applying frequency analysis, an independent t-test, and one-way analysis of variance. The findings revealed considerable disparities among the Top 10 most valuable brands of the ASEAN countries in 2019-2020 and 2019-2021 due to the impact of the Coronavirus. However, the data revealed no significant differences between industry categories in the influence of the Coronavirus. Future studies could examine the disparities between the most valuable brands and the influence of the Coronavirus over a longer period and include a larger number of firms and countries. Brand managers at the Top 10 most valuable companies in ASEAN countries must manage their brands carefully to preserve brand life and reduce the impact of future global pandemics.

Discretization of continuous-valued attributes considering data distribution

  • 이상훈;박정은;오경환
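    • Korean Institute of Intelligent Systems: Conference Proceedings (한국지능시스템학회:학술대회논문집) / Proceedings of the Korea Fuzzy Logic and Intelligent Systems Society 2003 Spring Conference / pp.217-220 / 2003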
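  • This paper proposes a new method that converts continuous values into categorical form by considering the distribution of the target attribute (class) values for each attribute, without requiring any user-specified parameter. For each attribute, the distribution of the target attribute is mapped onto a one-dimensional space, and intervals are clustered according to criteria such as the density of each target class and its degree of overlap with other target classes. The resulting clusters are based on probabilistic values that can predict the target attribute, and they have discretization boundaries that minimize the loss of information provided by each attribute. The improved performance of the proposed discretization method is confirmed using the C4.5 algorithm and data from the UCI Machine Learning Data Repository.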

Probabilistic estimates of corrosion rate of fuel tank structures of aging bulk carriers

  • Ivosevic, Spiro;Mestrovic, Romeo;Kovac, Natasa
    • International Journal of Naval Architecture and Ocean Engineering / Vol. 11, No. 1 / pp.165-177 / 2019
  • This paper considers the corrosion wastage of two ship hull structure members that form part of the investigated fuel oil tanks of 25 aging bulk carriers. Because many factors that influence the corrosion wastage of ship hull structures are uncertain in nature, the related corrosion rate ($c_1$) is modeled here by a real-valued continuous distribution, assuming that corrosion wastage starts after 5, 6 or 7 years. In all considered cases, using the available data and applying three basic statistical tests, it is established that, among two-parameter continuous distributions, the normal, Weibull and logistic distributions are the best-fitting distributions for the corrosion rate ($c_1$). The presented statistical, numerical and graphical results for the two ship hull structure members allow us to compare and discuss the corresponding probabilistic estimates for the corrosion rate ($c_1$).
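Comparing candidate distributions for a sample of corrosion rates can be sketched as follows. This is a hedged illustration only: the abstract does not name its three statistical tests, so the sketch uses the Kolmogorov-Smirnov distance as one plausible goodness-of-fit measure. It fits the normal and logistic distributions by matching the sample mean and standard deviation, and it omits the Weibull because fitting it requires numerical optimization. The sample values are made up and are not data from the paper.

```python
import math
from statistics import NormalDist, mean, stdev


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of sample and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs)
    )


def fit_normal(sample):
    """Normal CDF with the sample mean and sample standard deviation."""
    return NormalDist(mean(sample), stdev(sample)).cdf


def fit_logistic(sample):
    """Logistic CDF matched by moments: scale = sd * sqrt(3) / pi."""
    mu = mean(sample)
    scale = stdev(sample) * math.sqrt(3.0) / math.pi
    return lambda x: 1.0 / (1.0 + math.exp(-(x - mu) / scale))


# Made-up corrosion rates, for illustration only.
rates = [0.05, 0.07, 0.08, 0.09, 0.10, 0.10, 0.11, 0.12, 0.14, 0.17]
ks_normal = ks_statistic(rates, fit_normal(rates))
ks_logistic = ks_statistic(rates, fit_logistic(rates))
print(ks_normal, ks_logistic)
```

A smaller distance indicates a closer fit, although a formal test would also need critical values that account for the parameters being estimated from the same sample.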

The Impact of Traditional Market Properties and Relationship Quality on Customer Value : Approach from the viewpoint of the Means-end Chain Theory

  • Cho, Hee-Young;Han, Sang-Ho;Yang, Hoe-Chang
    • Journal of Distribution Science (유통과학연구) / Vol. 12, No. 1 / pp.13-19 / 2014
  • Purpose - This study investigated relationship quality and/or loyalty, from the viewpoint that merchants and consumers could develop the traditional market. It reorganized variables to find the conditions of values that could stimulate consumers' motives to revive the traditional market. Research Design, data, and methodology - This study used 202 valid questionnaires, based on the data of Yang & Ju (2012), to conduct correlation analysis, regression analysis, and structural equation modeling (SEM). Results - The results identified product and store atmosphere as the store selection attributes to consider in the means-end chain (MEC) model; the service factor was not significant. Further, in the test of the mediating effects of the sub-factors of store selection attributes, consumers valued relationship quality, including consumers' social and emotional value. Relationship quality significantly influenced consumers' value in traditional markets, which need to improve and develop using several variables. Conclusions - This study revealed connections between attributes, consequences, and values using the causal relation model, generated an optimal model based on a practical and theoretical background, and proposed ways to obtain consumer-related information easily.