• Title/Summary/Keyword: symbolic data analysis (심볼릭 자료분석)

A Divisive Clustering for Mixed Feature-Type Symbolic Data (혼합형태 심볼릭 데이터의 군집분석방법)

  • Kim, Jaejik
    • The Korean Journal of Applied Statistics / v.28 no.6 / pp.1147-1161 / 2015
  • Data analysis today covers not only classical data, expressed as points in p-dimensional Euclidean space, but also new types of data such as signals, functions, images, and shapes. Symbolic data can be considered one of these new types. Symbolic data can take various formats, such as intervals, histograms, lists, tables, distributions, and models. To date, symbolic data studies have mainly focused on individual formats of symbolic data. This study extends the analysis to datasets containing both histogram-valued and multimodal-valued data, introduces a divisive clustering method for such mixed feature-type symbolic data, and applies it to the analysis of industrial accident data.

Symbolic Computation Algorithms for Numerical Calculation Using Java (자바를 활용한 수치계산에서의 심볼릭 연산 알고리즘)

  • Kim, Cheol-Su;Kim, Ik-Chan;Yang, Jun-Yeong
    • Communications of Mathematical Education / v.13 no.2 / pp.535-547 / 2002
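  • This paper addresses the development of numerical calculation applets in Java for educational settings. It proposes algorithms for implementing symbolic computation, which recognizes an input expression, evaluates it, and displays the result, together with an efficient data structure for handling a variety of input expressions. The classes in the implemented package are suited to handling variables, constants, and various operators. Using the parsed information, the paper proposes algorithms for the four arithmetic operations, operator precedence, symbolic computation, polynomials, equations, function graphing, and simple differentiation and integration.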
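
The abstract above describes recognizing an expression, respecting operator precedence, and evaluating it with variables and constants. The following is a minimal sketch of that general idea, written in Python rather than Java and not taken from the paper. The names `tokenize`, `parse`, and `evaluate`, and the tuple-based expression tree, are assumptions made for illustration only.

```python
import operator
import re

# Hypothetical token pattern: numbers, identifiers, and + - * / ( ).
TOKEN = re.compile(r"\s*(\d+(?:\.\d+)?|[A-Za-z_]\w*|[-+*/()])")
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}

def tokenize(text):
    """Split an expression string into a list of tokens."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(tokens):
    """Build an expression tree using precedence climbing."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        pos += 1
        return tokens[pos - 1]

    def atom():
        tok = advance()
        if tok == "(":
            node = expr(1)
            if advance() != ")":
                raise ValueError("expected ')'")
            return node
        if tok == "-":                      # unary minus
            return ("neg", atom())
        if tok[0].isdigit():
            return ("num", float(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return ("var", tok)
        raise ValueError(f"unexpected token {tok!r}")

    def expr(min_prec):
        left = atom()
        # Higher-precedence operators bind first; equal precedence is left-associative.
        while peek() in PRECEDENCE and PRECEDENCE[peek()] >= min_prec:
            op = advance()
            right = expr(PRECEDENCE[op] + 1)
            left = (op, left, right)
        return left

    tree = expr(1)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree

def evaluate(node, env):
    """Evaluate a tree given variable bindings in the dict env."""
    kind = node[0]
    if kind == "num":
        return node[1]
    if kind == "var":
        return env[node[1]]
    if kind == "neg":
        return -evaluate(node[1], env)
    return OPS[kind](evaluate(node[1], env), evaluate(node[2], env))
```

For example, `evaluate(parse(tokenize("2 + 3 * x")), {"x": 4})` multiplies before adding. Keeping the tree separate from evaluation is the design choice that would allow further symbolic steps, such as differentiation, to operate on the same structure.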

Symbolic tree based model for HCC using SNP data (악성간암환자의 유전체자료 심볼릭 나무구조 모형연구)

  • Lee, Tae Rim
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.1095-1106 / 2014
  • Symbolic data analysis (SDA) extends data mining and exploratory data analysis to knowledge mining, and we suggest an SDA tree model for clinical and genomic data based on this knowledge-mining approach. Applying SDA to large-scale genomic SNP data yields correlations and shows that the hidden structure of HCC data can be understood. We confirm the validity of applying SDA to a tree-structured progression model and of quantifying clinical laboratory data and SNP data for the early diagnosis of HCC. The proposed model provides a representative model of HCC survival time and its causal association with SNP gene data. The aim is to fit a simple, easily interpreted tree-structured survival model, reduced from large clinical and genomic data, under the statistical theory of knowledge mining with SDA.

Cluster analysis for Seoul apartment price using symbolic data (서울 아파트 매매가 자료의 심볼릭 데이터를 이용한 군집분석)

  • Kim, Jaejik
    • Journal of the Korean Data and Information Science Society / v.26 no.6 / pp.1239-1247 / 2015
  • In this study, 64 administrative regions of Seoul, Korea, with high frequencies of apartment transactions are classified by apartment sale price. To consider the distribution of apartment prices in each region as well as the mean price, the symbolic histogram-valued data approach is employed. Symbolic data include all types of data that have internal variation, such as intervals, lists, histograms, distributions, and models. The cluster analysis using symbolic histogram data finds that the Gangnam, Seocho, and Songpa districts and the regions near them have relatively higher prices and larger dispersions. This result makes sense because those regions have good accessibility to downtown and a good educational environment.
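
To illustrate what a histogram-valued observation looks like, the sketch below turns a list of raw values for one region into relative frequencies over a shared set of bins. It is only an assumed illustration: the function name `to_histogram`, the bin edges, and the sample values are hypothetical, and the abstract does not state how the histograms or the clustering distance were actually constructed.

```python
from bisect import bisect_right

def to_histogram(values, edges):
    """Relative frequencies of values over bins defined by sorted edges.

    Bins are [edges[i], edges[i+1]); the last bin also includes its right edge.
    Values outside [edges[0], edges[-1]] are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            i = min(bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Made-up prices (arbitrary units) for two hypothetical regions.
edges = [0, 5, 10, 15]
regions = {
    "region_A": [1, 2, 6, 15],
    "region_B": [11, 12, 13, 14],
}
histograms = {name: to_histogram(v, edges) for name, v in regions.items()}
```

Each region is then described by a whole distribution rather than a single mean, which is what lets a clustering method take dispersion into account.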

On principal component analysis for interval-valued data (구간형 자료의 주성분 분석에 관한 연구)

  • Choi, Soojin;Kang, Kee-Hoon
    • The Korean Journal of Applied Statistics / v.33 no.1 / pp.61-74 / 2020
  • Interval-valued data, one type of symbolic data, are observed in the form of intervals rather than single values. Each interval-valued observation has internal variation. Principal component analysis reduces the dimension of data by maximizing the variance of the data. Therefore, principal component analysis of interval-valued data should account for the variance between observations as well as the variation within the observed intervals. In this paper, three principal component analysis methods for interval-valued data are summarized. In addition, a new method is proposed that uses a truncated normal distribution instead of the uniform distribution of the conventional quantile method, because we believe there is more information near the center point of the interval. The methods are compared using simulations and a relevant data set from the OECD. For the quantile method, we draw a scatter plot of the principal components and then identify the position and distribution of the quantiles using the arrow line representation method.
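
The contrast between a uniform and a truncated normal assumption inside an interval can be sketched as follows. This is an assumed illustration, not the paper's procedure: the function names, the choice of centering the normal at the interval midpoint, and the standard deviation of one sixth of the interval width (`sd_fraction`) are all hypothetical, since the abstract does not give the parametrization.

```python
from statistics import NormalDist

def uniform_quantiles(a, b, probs):
    """Quantiles of a uniform distribution on the interval [a, b]."""
    return [a + p * (b - a) for p in probs]

def truncated_normal_quantiles(a, b, probs, sd_fraction=1 / 6):
    """Quantiles of a normal distribution truncated to [a, b] (requires a < b).

    Assumption: the normal is centered at the midpoint, with standard
    deviation equal to sd_fraction times the interval width.
    """
    dist = NormalDist((a + b) / 2, (b - a) * sd_fraction)
    lo, hi = dist.cdf(a), dist.cdf(b)
    # Map each probability into the truncated range, then invert the CDF.
    return [dist.inv_cdf(lo + p * (hi - lo)) for p in probs]

probs = [0.25, 0.5, 0.75]
u = uniform_quantiles(0, 12, probs)
t = truncated_normal_quantiles(0, 12, probs)
```

Under these assumptions the truncated normal quartiles lie closer to the midpoint than the uniform ones, which reflects the idea that more information sits near the center of the interval.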

Time series representation for clustering using unbalanced Haar wavelet transformation (불균형 Haar 웨이블릿 변환을 이용한 군집화를 위한 시계열 표현)

  • Lee, Sehun;Baek, Changryong
    • The Korean Journal of Applied Statistics / v.31 no.6 / pp.707-719 / 2018
  • Various time series representation methods have been proposed for efficient time series clustering and classification. Lin et al. (DMKD, 15, 107-144, 2007) proposed the symbolic aggregate approximation (SAX) method, which builds a symbolic representation after approximating the original time series by piecewise local means. The performance of SAX therefore depends heavily on how well the piecewise local averages approximate the features of the original time series. SAX divides the entire series equally into an arbitrary number of segments; however, this is not sufficient to capture key features of complex, large-scale time series data. Therefore, this paper considers a data-adaptive local constant approximation of the time series using the unbalanced Haar wavelet transformation. The proposed method is shown to outperform SAX in many real-world data applications.
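
The SAX baseline described above (equal-length segments, local means, then symbols) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the reference implementation: the function names `paa` and `sax` and the default alphabet are hypothetical, the series is standardized with the population standard deviation, and breakpoints are taken as equiprobable cut points of the standard normal distribution.

```python
from statistics import NormalDist, mean, pstdev

def paa(series, n_segments):
    """Piecewise aggregate approximation: mean of each equal-width segment.

    Assumes n_segments <= len(series) so that no segment is empty.
    """
    n = len(series)
    bounds = [round(i * n / n_segments) for i in range(n_segments + 1)]
    return [mean(series[bounds[i]:bounds[i + 1]]) for i in range(n_segments)]

def sax(series, n_segments, alphabet="abcd"):
    """Convert a non-constant numeric series into a SAX-style word."""
    mu, sigma = mean(series), pstdev(series)
    z = [(x - mu) / sigma for x in series]          # standardize the series
    k = len(alphabet)
    # Cut points that split the standard normal into k equiprobable regions.
    breakpoints = [NormalDist().inv_cdf(i / k) for i in range(1, k)]
    word = ""
    for seg_mean in paa(z, n_segments):
        index = sum(seg_mean > b for b in breakpoints)
        word += alphabet[index]
    return word

word = sax([1, 2, 3, 4, 5, 6, 7, 8], n_segments=4)
```

Because every segment has the same length regardless of where the series changes, a sharp feature that falls inside one segment is averaged away, which is the limitation the data-adaptive approach in the abstract targets.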

A Study on Gene Search Using Test for Interval Data (구간형 데이터 검정법을 이용한 유전자 탐색에 관한 연구)

  • Lee, Seong-Keon
    • Journal of the Korean Data Analysis Society / v.20 no.6 / pp.2805-2812 / 2018
  • The methylation score, expressed as a proportion of the methylation status data derived from the iterative sequencing process, takes a value between 0 and 1. Simply applying the t-test to examine differences in population-specific methylation scores in these data therefore conflicts with the normality assumption. In addition, since the result may vary with the number of sequencing repetitions used to generate the methylation score, a method that can account for such errors is also needed. In this paper, we introduce symbolic data analysis and the interval K-S test, which convert each observation into interval data that includes uncertainty rather than a single numerical value. In addition, the characteristics of the methylation score can be analyzed using the Beta distribution, rather than the normal distribution, when converting to interval data. The properties of the proposed method were examined using sequencing data from actual patients and normal persons. While the t-test can only test location, the interval-type K-S statistic is found to test not only the location parameter but also the heterogeneity of the distribution function.

Automated Test Data Generation for Dynamic Branch Coverage (동적 분기 커버리지를 위한 테스트 데이터 자동 생성)

  • Chung, In Sang
    • KIPS Transactions on Software and Data Engineering / v.2 no.7 / pp.451-460 / 2013
  • To achieve high test coverage, test data are usually generated using various techniques, including symbolic execution, data flow analysis, and constraint solving. Recently, a technique for automated test data generation that achieves high coverage effectively without those sophisticated means has been proposed. However, the technique is weak at generating test data that lead to high coverage for programs with branch conditions in which different memory locations are bound during execution. For certain programs with flag conditions, in particular, high coverage cannot be achieved because specific branches are not executed. To address the problem, this paper presents dynamic branch coverage criteria and a test data generation technique based on the notion of a dynamic branch. Experiments on programs with flag conditions show that the proposed technique is more effective than the previous approach.