• Title/Abstract/Keywords: methods of data analysis

Comparative Study of Dimension Reduction Methods for Highly Imbalanced Overlapping Churn Data

  • Lee, Sujee;Koo, Bonhyo;Jung, Kyu-Hwan
    • Industrial Engineering and Management Systems
    • /
    • v.13 no.4
    • /
    • pp.454-462
    • /
    • 2014
  • Retaining customers who are likely to churn is one of the most important issues in customer relationship management, so companies try to predict churning customers using their large-scale, high-dimensional data. This study focuses on handling large data sets by reducing their dimensionality. We use six dimension reduction methods: principal component analysis (PCA), factor analysis (FA), locally linear embedding (LLE), local tangent space alignment (LTSA), locality preserving projections (LPP), and a deep auto-encoder. Our experiments apply each method to the training data, build a classification model on the mapped data, and then measure performance by hit rate in order to compare the methods. In the results, PCA performs well despite its simplicity, and the deep auto-encoder gives the best overall performance. These results can be explained by the characteristics of the churn prediction data, which are highly correlated and overlap across the classes. We also propose a simple out-of-sample extension method for the nonlinear dimension reduction methods LLE and LTSA that exploits these characteristics of the data.

Methodology of Spatio-temporal Matching for Constructing an Analysis Database Based on Different Types of Public Data

  • Jung, In taek;Chong, Kyu soo
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.35 no.2
    • /
    • pp.81-90
    • /
    • 2017
  • This study aimed to construct an integrated database on a common spatio-temporal unit from various types of public data that differ in their real-time update cycles and spatial units. To this end, three temporal interpolation methods (piecewise constant interpolation, linear interpolation, and nonlinear interpolation) and a spatial matching method based on district boundaries were proposed. The case study revealed that linear interpolation is an excellent method, and the spatial matching method also showed good results. It is hoped that various prediction models and data analysis methods will be developed in the future using the different types of data in the analysis database.
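
Two of the temporal interpolation methods named in this abstract, piecewise constant and linear, can be sketched as below. This is only an illustration under stated assumptions: the observation times, values, and 30-minute target grid are hypothetical, and the paper's own procedure and its nonlinear variant are not reproduced here.

```python
from bisect import bisect_right

def piecewise_constant(times, values, t):
    """Hold the most recent observation at or before t (clamped at the ends)."""
    i = bisect_right(times, t) - 1
    i = max(0, min(i, len(times) - 1))
    return values[i]

def linear(times, values, t):
    """Interpolate linearly between the two observations that bracket t."""
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # times[i - 1] <= t < times[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# Hypothetical hourly readings: minutes since midnight and observed values.
obs_times = [0, 60, 120]
obs_values = [10.0, 16.0, 13.0]

# Resample both ways onto a common 30-minute grid.
grid = [0, 30, 60, 90, 120]
held = [piecewise_constant(obs_times, obs_values, t) for t in grid]
lin = [linear(obs_times, obs_values, t) for t in grid]
```

Resampling every source onto one shared grid like this is what allows data sets with different update cycles to be joined on the same time key.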

A Systematic Review of Big Data: Research Approaches and Future Prospects

  • Cobanoglu, Cihan;Terrah, Abraham;Hsu, Meng-Jun;Corte, Valentina Della;Gaudio, Giovanna Del
    • Journal of Smart Tourism
    • /
    • v.2 no.1
    • /
    • pp.21-31
    • /
    • 2022
  • This review paper aims to provide a systematic analysis of articles published in various journals on the uses and business applications of big data. The goal is to provide a holistic picture of the place of big data in the tourism industry. The reviewed articles were selected from the period 2013-2020 and classified into eight broad categories, namely business strategy and firm performance; banking and finance; healthcare; hospitality; networks and telecommunications; urbanism and infrastructures; law and legal regulations; and government. While the categories reflect components of tourism industries and infrastructures, the meta-analysis is organized around three broad themes: preferred research contexts, conceptual developments, and methods used to research big data business applications. The main findings revealed that firm performance and healthcare remain popular research contexts in the big data realm, and also demonstrated a prominence of qualitative methods over mixed and quantitative methods for the period 2013-2020. Scholars have also investigated topics involving competitive advantage, supply chain management, and smart cities, as well as ethics and privacy issues related to the use of big data.

Statistical Analysis of Bending-Strength Data of Ceramic Matrix Composites : Estimation of Weibull Shape Parameter (세라믹 복합체의 굽힘강도 데이터의 통계적분석 : 와이블 형상모수의 추정과 비교를 중심으로)

  • 전영록
    • Journal of Applied Reliability
    • /
    • v.1 no.1
    • /
    • pp.17-33
    • /
    • 2001
  • The characteristics of the Weibull distribution are investigated as a function of the shape parameter. Statistical methods for estimating the shape parameter and for comparing two or more shape parameters are studied. Assuming a Weibull distribution, a statistical analysis is performed on bending-strength data of alumina titanium carbide ceramic matrix composites machined by two different methods.
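
One common way to estimate a Weibull shape parameter from strength data is median rank regression on a Weibull probability plot. The sketch below is offered only as an illustration: the abstract does not say which estimators the paper studies, the function name `weibull_shape_mrr` is hypothetical, the sample strengths are made up, and Bernard's approximation for the median ranks is an assumption.

```python
import math

def weibull_shape_mrr(strengths):
    """Estimate Weibull (shape, scale) by median rank regression.

    Fits ln(-ln(1 - F)) = m * ln(x) - m * ln(scale) by least squares,
    with F_i approximated by Bernard's formula (i - 0.3) / (n + 0.4).
    """
    xs = sorted(strengths)
    n = len(xs)
    if n < 2 or xs[0] <= 0 or xs[0] == xs[-1]:
        raise ValueError("need at least two distinct positive strengths")
    lx, ly = [], []
    for i, s in enumerate(xs, start=1):
        f = (i - 0.3) / (n + 0.4)          # median rank of the i-th failure
        lx.append(math.log(s))
        ly.append(math.log(-math.log(1.0 - f)))
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((x - mx) ** 2 for x in lx)
    sxy = sum((x - mx) * (y - my) for x, y in zip(lx, ly))
    shape = sxy / sxx                      # slope of the Weibull plot
    scale = math.exp(mx - my / shape)      # from intercept = -shape * ln(scale)
    return shape, scale

# Hypothetical bending strengths in MPa (illustrative values only).
sample = [412.0, 438.0, 455.0, 461.0, 470.0, 483.0, 497.0, 515.0]
shape, scale = weibull_shape_mrr(sample)
```

A larger shape estimate corresponds to less scatter in strength, which is why comparing shape parameters between two machining methods is informative.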

Functional Data Classification of Variable Stars

  • Park, Minjeong;Kim, Donghoh;Cho, Sinsup;Oh, Hee-Seok
    • Communications for Statistical Applications and Methods
    • /
    • v.20 no.4
    • /
    • pp.271-281
    • /
    • 2013
  • This paper considers the problem of classifying variable stars based on functional data analysis. For a better understanding of galaxy structure and stellar evolution, various approaches to the classification of variable stars have been studied. Several features that explain the characteristics of variable stars (such as color index, amplitude, period, and Fourier coefficients) have typically been used to classify them. Excluding other factors and focusing only on the curve shapes of variable stars, Deb and Singh (2009) proposed a classification procedure using multivariate principal component analysis. However, this approach is limited in its ability to accommodate features of light curve data that are unequally spaced in the phase domain and have functional properties. In this paper, we propose a light curve estimation method that is suitable for functional data analysis, and provide a classification procedure for variable stars that combines the features of a light curve with existing functional data analysis methods. To evaluate its practical applicability, we apply the proposed classification procedure to data sets of variable stars from the project STellar Astrophysics and Research on Exoplanets (STARE).

Functional Data Analysis of Temperature and Precipitation Data (기온 강수량 자료의 함수적 데이터 분석)

  • Kang, Kee-Hoon;Ahn, Hong-Se
    • The Korean Journal of Applied Statistics
    • /
    • v.19 no.3
    • /
    • pp.431-445
    • /
    • 2006
  • In this paper we review some methods for analyzing functional data and illustrate a real application of functional data analysis. We review methods for representing functional data using basis functions, for analyzing functional variation by functional principal component analysis, and for fitting functional linear models. For a real application, we use temperature and precipitation data measured in Korea from January 1970 to May 2004. We apply functional principal component analysis to each data set and test the significance of a regional division based on sunshine hours. We also estimate a functional regression model for temperature and precipitation.

An Analysis of Research Trends in Korean Journals on the Role of Fathers with Young Children: Research Papers from 2000 to Present (유아기 자녀를 둔 아버지의 역할에 관한 국내학술지 연구동향 분석: 2000년이후 발표된 학술지를 중심으로)

  • Yoon, Hye-Jin;Hur, Young-Rim
    • The Korean Journal of Community Living Science
    • /
    • v.25 no.4
    • /
    • pp.449-460
    • /
    • 2014
  • This study examines research trends in Korean journal articles covering the role of fathers with young children. For this study, 45 research papers published from 2000 to the present were analyzed according to research period, research topic, research type, data collection method, and data analysis method. First, the largest number of papers has been written since 2010. Second, in terms of research topics, the largest number of papers focused on the father's child-rearing involvement and behavior. Third, the most frequently used research type was the quantitative study. Fourth, the most frequently used data collection method was the questionnaire method. Fifth, the most frequently used data analysis method was the frequency and mean method. Future research should consider broader age groups of fathers and children by using various types of data collection and analysis methods. In addition, it would be useful to scrutinize general research trends in Korean journal articles highlighting the importance of the roles of fathers with young children in a rapidly changing society.

A Study of Applying Bootstrap Method to Seasonal Data (계절성 데이터의 부트스트랩 적용에 관한 연구)

  • Park, Jin-Soo;Kim, Yun-Bae
    • Journal of the Korea Society for Simulation
    • /
    • v.19 no.3
    • /
    • pp.119-125
    • /
    • 2010
  • The moving block bootstrap, the stationary bootstrap, and the threshold bootstrap are methods of simulation output analysis that are applicable to autocorrelated data. These bootstrap methods assume that the data are stationary, so they cannot work when stationarity is not guaranteed because of seasonality or trends in the data. In simulation output analysis, the threshold bootstrap is the best method for describing the autocorrelation structure of the original data set. The threshold bootstrap forms cycles based on a threshold value. If we apply the bootstrap to seasonal data, we can obtain results of similar accuracy. In this paper, we verify the possibility of applying the bootstrap to seasonal data.
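
Of the three methods named in this abstract, the moving block bootstrap is the simplest to illustrate. The following is a minimal sketch, not the paper's procedure: the function name, block length, and input series are hypothetical, and the threshold bootstrap that the paper favors is not shown.

```python
import random

def moving_block_bootstrap(series, block_len, rng):
    """Resample a series by concatenating randomly chosen overlapping blocks.

    Keeping observations together in blocks preserves short-range
    autocorrelation that resampling single points would destroy.
    """
    n = len(series)
    if not 1 <= block_len <= n:
        raise ValueError("block_len must be between 1 and len(series)")
    n_starts = n - block_len + 1          # number of overlapping blocks
    out = []
    while len(out) < n:
        start = rng.randrange(n_starts)   # pick a block start uniformly
        out.extend(series[start:start + block_len])
    return out[:n]                        # trim to the original length

# Hypothetical autocorrelated output series (illustrative values only).
data = list(range(10))
rng = random.Random(0)                    # seeded for reproducibility
resample = moving_block_bootstrap(data, 3, rng)
```

As the abstract notes, this kind of resampling assumes stationarity, so a seasonal series would need additional handling before the blocks can be treated as exchangeable.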

Analysis of Interval-censored Survival Data from Crossover Trials with Proportional Hazards Model (교차계획 구간절단 생존자료의 비례위험모형을 이용한 분석)

  • Kim, Eun-Young;Song, Hae-Hiang
    • The Korean Journal of Applied Statistics
    • /
    • v.20 no.1
    • /
    • pp.39-52
    • /
    • 2007
  • Crossover trials of new drugs for the treatment of angina pectoris, which frequently use a treadmill exercise test to assess efficacy, produce censored survival times. In this paper we consider analysis approaches for censored survival times from crossover trials. Previously, a stratified Cox model for paired observations and nonparametric methods have been presented as possible analysis methods. On the other hand, the differences of two survival times produce interval-censored survival times, and we propose a Cox model for interval-censored data as an alternative analysis method. Example data are analyzed in order to compare these different methods.

Evaluation of Similarity Analysis of Newspaper Article Using Natural Language Processing

  • Ayako Ohshiro;Takeo Okazaki;Takashi Kano;Shinichiro Ueda
    • International Journal of Computer Science & Network Security
    • /
    • v.24 no.6
    • /
    • pp.1-7
    • /
    • 2024
  • Comparing text features involves evaluating the "similarity" between texts. It is crucial to use appropriate similarity measures when comparing texts. This study utilized various techniques to assess the similarities between newspaper articles, including deep learning and a previously proposed method: a combination of Pointwise Mutual Information (PMI) and Word Pair Matching (WPM), denoted as PMI+WPM. For performance comparison, law data from medical research in Japan were utilized as validation data in evaluating the PMI+WPM method. The comparative analysis revealed that the distribution of similarities in text data varies depending on the evaluation technique and genre. For newspaper data, non-deep learning methods demonstrated better similarity evaluation accuracy than deep learning methods. Additionally, evaluating similarities in law data is more challenging than in newspaper articles. Although deep learning is the prevalent method for evaluating textual similarities, this study demonstrates that non-deep learning methods can be effective for Japanese-language texts.
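
The PMI component mentioned in this abstract can be illustrated with the standard definition, PMI(x, y) = log2(p(x, y) / (p(x) p(y))). The sketch below assumes document-level co-occurrence and uses a hypothetical toy corpus. It does not implement Word Pair Matching or the combined PMI+WPM method, whose details are not given in the abstract.

```python
import math

def pmi(docs, x, y):
    """Pointwise mutual information of words x and y, in bits.

    Probabilities are estimated as the fraction of documents that
    contain the word (or both words). Returns None when the pair
    never co-occurs, since log(0) is undefined.
    """
    n = len(docs)
    sets = [set(d) for d in docs]
    p_x = sum(x in s for s in sets) / n
    p_y = sum(y in s for s in sets) / n
    p_xy = sum(x in s and y in s for s in sets) / n
    if p_xy == 0:
        return None
    return math.log2(p_xy / (p_x * p_y))

# Hypothetical tokenized documents (illustrative only).
corpus = [
    ["court", "ruling", "appeal"],
    ["court", "ruling", "judge"],
    ["market", "prices", "rise"],
    ["weather", "rain", "forecast"],
]
score = pmi(corpus, "court", "ruling")
```

A positive score means the two words appear together more often than independence would predict, which is the signal a PMI-based similarity measure builds on.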