• Title/Summary/Keyword: High Dimensionality Data

A Collaborative Filtering using SVD on Low-Dimensional Space (SVD을 이용한 저차원 공간에서 협력적 여과)

  • Jung, Jun;Lee, Pil-Kyu
    • The KIPS Transactions:PartB
    • /
    • v.10B no.3
    • /
    • pp.273-280
    • /
    • 2003
  • Recommender systems help users find products to purchase. A representative method for recommender systems is collaborative filtering (CF), which predicts products that a user may like based on a group of similar users. User information consists of users' ratings of products, and similarities between users are measured from those ratings. As the number of users grows tremendously, the performance of pure collaborative filtering degrades because of the high dimensionality and sparsity of the data. We experimentally examine the effect of dimension reduction in collaborative filtering as a way to cope with data sparsity. Our results suggest that SVD improves the performance of collaborative filtering compared with pure collaborative filtering.
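
The idea of applying SVD to a rating matrix can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `svd_predict`, the item-mean imputation of missing ratings, the user-mean centering, and the toy matrix are all assumptions made for illustration.

```python
import numpy as np

def svd_predict(ratings, k):
    """Predict ratings from a rank-k SVD of a users x items matrix.

    Missing ratings are marked with NaN. Filling them with the item mean
    and centering by the user mean are illustrative assumptions.
    """
    R = np.asarray(ratings, dtype=float)
    observed = ~np.isnan(R)
    item_mean = np.nanmean(R, axis=0)            # per-item average of observed ratings
    filled = np.where(observed, R, item_mean)    # impute missing entries
    user_mean = filled.mean(axis=1, keepdims=True)
    centered = filled - user_mean
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    low_rank = (U[:, :k] * s[:k]) @ Vt[:k]       # keep only the k largest singular values
    return low_rank + user_mean

# Toy 4 users x 4 items matrix (hypothetical data).
ratings = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [1.0, np.nan, 5.0, 4.0],
])
predicted = svd_predict(ratings, k=2)
```

Entries of `predicted` at the positions that were NaN serve as the predicted ratings; a smaller `k` gives a more strongly compressed representation of the users.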

A Kinematic Approach to Answering Similarity Queries on Complex Human Motion Data (운동학적 접근 방법을 사용한 복잡한 인간 동작 질의 시스템)

  • Han, Hyuck;Kim, Shin-Gyu;Jung, Hyung-Soo;Yeom, Heon-Y.
    • Journal of Internet Computing and Services
    • /
    • v.10 no.4
    • /
    • pp.1-11
    • /
    • 2009
  • Data retrieval from large motion databases has recently become a concern in both the database and graphics communities, because the high dimensionality of motion data implies high costs. In this setting, finding an effective distance measure and an efficient query processing method for such data is a challenging problem. This paper presents an elaborate motion query processing system, SMoFinder (Similar Motion Finder), which incorporates a novel kinematic distance measure and an efficient indexing strategy via adaptive frame segmentation. To this end, we regard human motions as multi-linkage kinematics and propose the weighted Minkowski distance metric. For efficient indexing, we devise a new adaptive segmentation method that chooses representative frames among similar frames and stores only the chosen frames instead of all frames. For efficient search, we propose a new search method that processes k-nearest neighbors queries over only the representative frames. Our experimental results show that the size of motion databases is reduced greatly ($\times 1/25$), while the search capability of SMoFinder is equal to or superior to that of other systems.
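
A weighted Minkowski distance between two frames can be sketched as follows. The abstract does not give the weights or the order used in SMoFinder, so the function name, the placement of the weights inside the sum, and the sample values are assumptions for illustration only.

```python
def weighted_minkowski(a, b, weights, p=2.0):
    """Weighted Minkowski distance of order p between two feature vectors.

    Computes (sum_i w_i * |a_i - b_i|**p) ** (1/p). Placing the weights
    inside the sum is one common convention and is assumed here.
    """
    if not (len(a) == len(b) == len(weights)):
        raise ValueError("a, b and weights must have the same length")
    total = sum(w * abs(x - y) ** p for x, y, w in zip(a, b, weights))
    return total ** (1.0 / p)

# Hypothetical joint features of two frames and per-joint weights.
frame_a = [0.0, 0.0]
frame_b = [3.0, 4.0]
distance = weighted_minkowski(frame_a, frame_b, weights=[1.0, 1.0], p=2.0)
```

With unit weights and `p=2` this reduces to the Euclidean distance; larger weights would let more influential joints contribute more to the distance.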

Dimensionality Reduction Methods Analysis of Hyperspectral Imagery for Unsupervised Change Detection of Multi-sensor Images (이종 영상 간의 무감독 변화탐지를 위한 초분광 영상의 차원 축소 방법 분석)

  • PARK, Hong-Lyun;PARK, Wan-Yong;PARK, Hyun-Chun;CHOI, Seok-Keun;CHOI, Jae-Wan;IM, Hon-Ryang
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.22 no.4
    • /
    • pp.1-11
    • /
    • 2019
  • With the development of remote sensing sensor technology, it has become possible to acquire satellite images with various kinds of spectral information. In particular, because a hyperspectral image is composed of continuous, narrow spectral wavelength bands, it can be used effectively in fields such as land cover classification, target detection, and environmental monitoring. Change detection techniques using remote sensing data are generally performed by differencing data with the same dimensions, so they are difficult to apply to heterogeneous sensors whose data have different dimensions. In this study, we developed a change detection method applicable to a hyperspectral image and a high spatial resolution satellite image with different dimensions, and confirmed its applicability between heterogeneous images. To apply the method, the dimension of the hyperspectral image was reduced using correlation analysis and principal component analysis, and change vector analysis (CVA) was used as the change detection algorithm. The ROC curve and the AUC were calculated from reference data to evaluate change detection performance. Experimental results show that change detection performance is higher when using the image generated by adequate dimensionality reduction than when using the original hyperspectral image.
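
The two building blocks named in the abstract, PCA-based band reduction and change vector analysis, can be sketched as below. This is a rough illustration under assumptions, not the authors' pipeline: the function names are hypothetical, the correlation-analysis step is omitted, and a real workflow would also need co-registration and radiometric normalization between the two sensors.

```python
import numpy as np

def pca_reduce(cube, k):
    """Project an (H, W, bands) image onto its first k principal components."""
    h, w, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    X = X - X.mean(axis=0)                        # center each band
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return (X @ Vt[:k].T).reshape(h, w, k)

def cva_magnitude(img1, img2):
    """Per-pixel length of the change vector between two images of equal shape."""
    diff = img2.astype(float) - img1.astype(float)
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical data: a 30-band cube reduced to 4 components so that its
# dimension matches a 4-band image from another sensor.
rng = np.random.default_rng(0)
hyperspectral = rng.random((4, 4, 30))
multispectral = rng.random((4, 4, 4))
reduced = pca_reduce(hyperspectral, k=4)
magnitude = cva_magnitude(reduced, multispectral)
```

Pixels whose `magnitude` exceeds a chosen threshold would be labeled as changed; sweeping that threshold against reference data is what produces an ROC curve.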

An Improvement of FSDD for Evaluating Multi-Dimensional Data (다차원 데이터 평가가 가능한 개선된 FSDD 연구)

  • Oh, Se-jong
    • Journal of Digital Convergence
    • /
    • v.15 no.1
    • /
    • pp.247-253
    • /
    • 2017
  • Feature selection, or variable selection, is a data mining scheme for selecting the features most relevant to a target concept from high-dimensional data. It decreases the dimensionality of the data and makes clustering or classification analysis easier. A feature selection scheme requires an evaluation function. Most current evaluation functions are based on statistics or information theory, and they can evaluate only a single feature (one-dimensional data). However, features interact with one another, so efficient feature selection requires an evaluation function for multi-dimensional data. In this study, we propose a modification of the FSDD evaluation function that evaluates multiple features using an extended distance function. The original FSDD can evaluate only a single feature. The proposed approach is expected to be applicable to other single-feature evaluation methods.

Study on Dimensionality Reduction for Sea-level Variations by Using Altimetry Data around the East Asia Coasts

  • Hwang, Do-Hyun;Bak, Suho;Jeong, Min-Ji;Kim, Na-Kyeong;Park, Mi-So;Kim, Bo-Ram;Yoon, Hong-Joo
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.1
    • /
    • pp.85-95
    • /
    • 2021
  • Recently, with the development of data mining and artificial neural network techniques, methods that reduce the dimension of data have been proposed for analyzing large amounts of data. In general, the empirical orthogonal function (EOF) is used to reduce the dimension of ocean data, and recently the self-organizing map (SOM) algorithm has been investigated for application to the ocean field. In this study, both algorithms were applied to monthly sea level anomaly (SLA) data from 1993 to 2018 around the East Asia coasts. The region was dominated by the influence of the Kuroshio Extension and eddy kinetic energy. The maximum amount of variance explained by the EOF modes could be found. The SOM algorithm summarized the characteristics of the spatial distributions and periods in EOF modes 1 and 2, and it was useful for tracking changes in SLA through the movement of nodes. Nodes 1 and 5 appeared in the early 2000s and the early 2010s, when the sea level was high. In contrast, nodes 2 and 6 appeared in the late 1990s and the late 2000s, when the sea level was relatively low. Therefore, the SOM algorithm is considered to distinguish sea-level patterns well around the East Asia coasts. In addition, the SOM procedure used for the SLA data can be applied to other climate data to explain the mechanisms of SLA variation more clearly.

Feature selection for text data via sparse principal component analysis (희소주성분분석을 이용한 텍스트데이터의 단어선택)

  • Won Son
    • The Korean Journal of Applied Statistics
    • /
    • v.36 no.6
    • /
    • pp.501-514
    • /
    • 2023
  • When analyzing high dimensional data such as text data, if we input all the variables as explanatory variables, statistical learning procedures may suffer from over-fitting problems. Furthermore, computational efficiency can deteriorate with a large number of variables. Dimensionality reduction techniques such as feature selection or feature extraction are useful for dealing with these problems. The sparse principal component analysis (SPCA) is one of the regularized least squares methods which employs an elastic net-type objective function. The SPCA can be used to remove insignificant principal components and identify important variables from noisy observations. In this study, we propose a dimension reduction procedure for text data based on the SPCA. Applying the proposed procedure to real data, we find that the reduced feature set maintains sufficient information in text data while the size of the feature set is reduced by removing redundant variables. As a result, the proposed procedure can improve classification accuracy and computational efficiency, especially for some classifiers such as the k-nearest neighbors algorithm.

A comparison study of canonical methods: Application to -Omics data (오믹스 자료를 이용한 정준방법 비교)

  • Seungsoo Lee;Eun Jeong Min
    • The Korean Journal of Applied Statistics
    • /
    • v.37 no.2
    • /
    • pp.157-176
    • /
    • 2024
  • Integrative analysis aimed at a better understanding of complex biological systems is gaining attention. Observing subjects from various perspectives and conducting an integrative analysis of the resulting datasets enables a deeper understanding of the subject. In this paper, we compare two methods that simultaneously consider two datasets gathered from the same objects: canonical correlation analysis (CCA) and co-inertia analysis (CIA). Since CCA cannot handle high-dimensional data, two strategies were considered instead: using a ridge constant (CCA-ridge), and replacing the covariance matrix of each dataset with the identity matrix and then applying a penalized singular value decomposition (CCA-PMD). For illustration, both extensions of CCA and CIA were applied to NCI60 cell line data. Both methods yield biologically meaningful and significant results, identifying important genes that enhance our comprehension of the data. Their results show some dissimilarities, which arise from the different criteria each method uses to measure the relationship between two sets of data. Additionally, the CIA results vary depending on the weight matrices employed.
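
The ridge strategy can be sketched as follows: adding a constant to the diagonal of each covariance matrix makes it invertible even when there are more variables than samples. This is a generic textbook-style formulation rather than the authors' code; the function name `ridge_cca`, the ridge values, and the simulated data are assumptions.

```python
import numpy as np

def ridge_cca(X, Y, lam_x=0.1, lam_y=0.1):
    """Canonical correlation analysis with ridge-regularized covariances.

    Returns the regularized canonical correlations and the weight matrices
    for X and Y. The ridge constants lam_x and lam_y are illustrative values.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx = X.T @ X / (n - 1) + lam_x * np.eye(X.shape[1])
    Syy = Y.T @ Y / (n - 1) + lam_y * np.eye(Y.shape[1])
    Sxy = X.T @ Y / (n - 1)
    Lx = np.linalg.cholesky(Sxx)                  # Sxx = Lx @ Lx.T
    Ly = np.linalg.cholesky(Syy)
    K = np.linalg.solve(Lx, Sxy)                  # Lx^{-1} Sxy
    K = np.linalg.solve(Ly, K.T).T                # Lx^{-1} Sxy Ly^{-T}
    U, s, Vt = np.linalg.svd(K, full_matrices=False)
    A = np.linalg.solve(Lx.T, U)                  # weights for X
    B = np.linalg.solve(Ly.T, Vt.T)               # weights for Y
    return s, A, B

# Simulated high-dimensional case: 20 samples, 50 and 40 variables.
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 1))
X = latent @ rng.normal(size=(1, 50)) + 0.5 * rng.normal(size=(20, 50))
Y = latent @ rng.normal(size=(1, 40)) + 0.5 * rng.normal(size=(20, 40))
corrs, A, B = ridge_cca(X, Y)
```

Without the ridge terms the Cholesky factorization would fail here, because the sample covariance of 50 variables estimated from 20 samples is singular.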

The Study of the Validity Test on the Self-monitoring Scale (자기 검색척도(Self-Monitoring Scale)의 타당성 검정에 관한 연구)

  • 이선아
    • Journal of Korean Academy of Nursing
    • /
    • v.28 no.3
    • /
    • pp.751-759
    • /
    • 1998
  • This study tests the validity of the self-monitoring scale for nurses. Both a literature survey and empirical research were carried out to test the validity of the scales that measure the construct of self-monitoring. The self-monitoring scale could not be classified into five factors as Snyder suggested. Other scholars (Briggs, Cheek and Buss, 1980) suggested a three-factor classification, which was accepted by Snyder and Gangestad (1986). John, Cheek and Klohnen (1996) claimed a two-factor classification. As has been discussed, factor analysis is used to establish convergent validity within a factor and discriminant validity between factors. However, the classification of the factors varied considerably among researchers, and previous research findings showed a lack of content and discriminant validity. It is also important to note that Snyder's self-monitoring scale did not factor-load at over .30 for all 25 items, regardless of how many factors were classified. According to the findings of this study, the self-monitoring scale was neither classified into five, three, or two factors nor loaded as hypothesized. It is also clear that Snyder's self-monitoring scale lacks convergent validity, as the sub-factors of the scale failed to demonstrate uni-dimensionality. The A self-monitoring scale not only failed to overcome the problems of Snyder's self-monitoring scale but also lost the attractiveness of the self-monitoring scale. In this study, it was also found that the A self-monitoring scale was not classified into either a two- or three-factor classification as hypothesized. It is, of course, not desirable to use any scale that lacks convergent and discriminant validity, even though it has been widely used and has had a great deal of influence on the field of social psychology. To overcome the shortcomings of Snyder's self-monitoring scale, Lennox and Wolfe (1984) suggested 13 items. This study tested the validity and reliability of that scale and found that the data supported its validity, as the two factors were classified and loaded as expected. Reliability was also confirmed by checking Cronbach's α for each factor and for the total items. In addition, a confirmatory factor analysis was carried out on the 13 items using the LISREL 8.12 program to confirm convergent validity in a two-factor classification. The model fit well and was sound; however, the self-monitoring scale did not fit and was not validated. Thus, it is recommended that future studies use the 13 items rather than the original or the abbreviated self-monitoring scale. It should also be noted that items 7 and 13 should be removed to obtain better uni-dimensionality for the 13 items. In the factor analysis results, these items loaded at over .30 on both factors, which is too high. In addition, it is necessary to double-check the cause of the two-fold loading at over .30 on the two factors. It could be a problem caused by the data or by the scale itself. Therefore, additional studies should follow to clarify this matter.
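
The reliability check mentioned above rests on Cronbach's α, which for $k$ items is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$, where $\sigma_i^2$ is the variance of item $i$ and $\sigma_T^2$ the variance of the total score. A minimal sketch of the computation follows; the function name and the toy responses are assumptions, not data from the study.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents x items score matrix."""
    X = np.asarray(scores, dtype=float)
    k = X.shape[1]                                # number of items
    item_variances = X.var(axis=0, ddof=1)        # sample variance of each item
    total_variance = X.sum(axis=1).var(ddof=1)    # sample variance of the total score
    return k / (k - 1) * (1.0 - item_variances.sum() / total_variance)

# Hypothetical responses of four people to three perfectly consistent items.
responses = [
    [1, 1, 1],
    [2, 2, 2],
    [3, 3, 3],
    [5, 5, 5],
]
alpha = cronbach_alpha(responses)
```

Perfectly consistent items give the maximum value of 1; less consistent items lower the ratio of shared to total variance and hence lower α.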

Study of the Validity Test on the Self-monitoring Scale for Primi-Gravida (초임부를 대상으로 한 자가검색도 척도의 타당도 비교)

  • Lee, Seon-Ah
    • Women's Health Nursing
    • /
    • v.4 no.2
    • /
    • pp.173-186
    • /
    • 1998
  • In this study, both a literature survey and empirical research were carried out to test the validity of the scales that measure the construct of self-monitoring. The self-monitoring scale could not be classified into five factors as Snyder suggested. Other scholars (Briggs, Cheek and Buss, 1980) suggested a three-factor classification, which was accepted by Snyder and Gangestad (1986). John, Cheek and Klohnen (1996) claimed a two-factor classification. As has been discussed, factor analysis is used to establish convergent validity within a factor and discriminant validity between factors. However, the classification of the factors varied considerably among researchers, and previous research findings showed a lack of content and discriminant validity. It is also important to note that Snyder's self-monitoring scale did not factor-load at over .30 for all 25 items, regardless of how many factors were classified. According to the findings of this study, the self-monitoring scale was neither classified into five, three, or two factors nor loaded as hypothesized. It is also clear that Snyder's self-monitoring scale lacks convergent validity, as the sub-factors of the scale failed to demonstrate uni-dimensionality. The A self-monitoring scale not only failed to overcome the problems of Snyder's self-monitoring scale but also lost the attractiveness of the self-monitoring scale. In this study, it was also found that the A self-monitoring scale was not classified into either a two- or three-factor classification as hypothesized. It is, of course, not desirable to use any scale that lacks convergent and discriminant validity, even though it has been widely used and has had a great deal of influence on the field of social psychology. To overcome the shortcomings of Snyder's self-monitoring scale, Lennox and Wolfe (1984) suggested 13 items. This study tested the validity and reliability of that scale and found that the data supported its validity, as the two factors were classified and loaded as expected. Reliability was also confirmed by checking Cronbach's alpha for each factor and for the total items. In addition, a confirmatory factor analysis was carried out on the 13 items using the LISREL 8.12 program to confirm convergent validity in a two-factor classification. The model fit well and was sound; however, the self-monitoring scale did not fit and was not validated. Thus, it is recommended that future studies use the 13 items rather than the original or the abbreviated self-monitoring scale. It should also be noted that items 7 and 13 should be removed to obtain better uni-dimensionality for the 13 items. In the factor analysis results, these items loaded at over .30 on both factors, which is too high. In addition, it is necessary to double-check the cause of the two-fold loading at over .30 on the two factors. It could be a problem caused by the data or by the scale itself. Therefore, additional studies should follow to clarify this matter.

A Study on Face Image Recognition Using Feature Vectors (특징벡터를 사용한 얼굴 영상 인식 연구)

  • Kim Jin-Sook;Kang Jin-Sook;Cha Eui-Young
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.9 no.4
    • /
    • pp.897-904
    • /
    • 2005
  • Face recognition has been an active research area because face image data are easy to acquire and the technique is applicable to a wide range of real-world areas. Due to the high dimensionality of the face image space, however, face images are not easy to process. In this paper, we propose a method that reduces the dimension of facial data and extracts features from it, working on holistic face images. The proposed algorithm consists of two parts. The first uses principal component analysis (PCA) to transform three-dimensional color facial images into one-dimensional gray facial images; PCA is performed here to enhance image contrast and thereby raise the recognition rate. The second is integrated linear discriminant analysis (PCA+LDA), which combines PCA for dimensionality reduction and LDA for discrimination of facial vectors into a single algorithm. This integration allows a concise expression of the algorithm and prevents the information loss that occurs when the two steps are performed separately. To validate the proposed method, the algorithm was implemented and tested on well-controlled face databases.
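
One plausible reading of the first step, converting a color image to gray with PCA, is to project every RGB pixel onto the first principal component of the pixel colors. The sketch below follows that reading; it is an assumption about the general technique rather than the authors' exact procedure, and the function name, sign convention, and rescaling to [0, 1] are illustrative choices.

```python
import numpy as np

def pca_gray(rgb):
    """Convert an (H, W, 3) color image to a gray image via the first principal component."""
    h, w, channels = rgb.shape
    pixels = rgb.reshape(-1, channels).astype(float)
    centered = pixels - pixels.mean(axis=0)
    cov = centered.T @ centered / (len(pixels) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)        # eigenvalues in ascending order
    pc1 = eigvecs[:, -1]                          # direction of largest color variance
    if pc1.sum() < 0:                             # fix the sign so brighter colors map higher
        pc1 = -pc1
    gray = centered @ pc1
    span = gray.max() - gray.min()
    if span > 0:
        gray = (gray - gray.min()) / span         # rescale to [0, 1]
    else:
        gray = np.zeros_like(gray)
    return gray.reshape(h, w)

# Hypothetical 8 x 8 color image.
rng = np.random.default_rng(0)
image = rng.random((8, 8, 3))
gray_image = pca_gray(image)
```

Because the first principal component captures the largest share of the color variance, the resulting gray image tends to have higher contrast than a fixed weighted average of the channels.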