• Title/Summary/Keyword: high-dimensional space

A study on high dimensional large-scale data visualization (고차원 대용량 자료의 시각화에 대한 고찰)

  • Lee, Eun-Kyung;Hwang, Nayoung;Lee, Yoondong
    • The Korean Journal of Applied Statistics, v.29 no.6, pp.1061-1075, 2016
  • In this paper, we discuss various methods for visualizing high-dimensional, large-scale data and review some issues associated with visualizing this type of data. High-dimensional data can be presented in a 2-dimensional space using a few selected important variables. We can show more variables by mapping them to aesthetic attributes of the graphic, or use the projection pursuit method to find an interesting low-dimensional view. For large-scale data, we discuss jittering and alpha blending, which mitigate the problem of overlapping points. We also review the R packages tabplot and scagnostics, as well as other R packages for interactive web applications with visualization.
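The jittering and alpha blending mentioned in this abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function names `jitter` and `stacked_opacity` are hypothetical, the noise is assumed to be uniform, and the opacity formula assumes standard "over" compositing of identically colored points.

```python
import random

def jitter(values, amount, seed=None):
    """Return a copy of values with uniform noise in [-amount, amount] added."""
    rng = random.Random(seed)
    return [v + rng.uniform(-amount, amount) for v in values]

def stacked_opacity(alpha, n_overlapping):
    """Opacity seen where n points of opacity alpha are drawn on top of each other.

    Assumes "over" compositing: each layer lets (1 - alpha) of what is below show through.
    """
    return 1.0 - (1.0 - alpha) ** n_overlapping

# Ten identical observations would be drawn as a single dot without jittering.
xs = [3.0] * 10
jittered = jitter(xs, amount=0.2, seed=0)
print(len(set(xs)), "distinct position before,", len(set(jittered)), "after")

# With alpha blending, denser regions appear darker instead of saturating at once.
for n in (1, 5, 20):
    print(n, "overlapping points ->", round(stacked_opacity(0.1, n), 3))
```

Jittering trades a small positional error for visibility of tied values, while alpha blending keeps positions exact and encodes density as darkness; the two are often combined.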

Variational Autoencoder Based Dimension Reduction and Clustering for Single-Cell RNA-seq Gene Expression (단일세포 RNA-SEQ의 유전자 발현 군집화를 위한 변이 자동인코더 기반의 차원감소와 군집화)

  • Chi, Sang-Mun
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.11, pp.1512-1518, 2021
  • Since single-cell RNA sequencing provides the expression profiles of individual cells, it resolves differences between cells more finely than traditional bulk RNA sequencing. Using single-cell RNA sequencing data, clustering analysis is generally conducted to find cell types and understand high-level biological processes. To process the high-dimensional single-cell RNA sequencing data effectively for clustering analysis, this paper uses a variational autoencoder to transform the high-dimensional data space into a lower-dimensional latent space, with the expectation that the latent space yields more accurate clustering results. By clustering the features in the transformed latent space, we compare the performance of various classical clustering methods on single-cell RNA sequencing data. Experimental results demonstrate that the proposed framework outperforms many state-of-the-art methods under various clustering performance metrics.

Declustering of High-dimensional Data by Cyclic Sliced Partitioning (주기적 편중 분할에 의한 다차원 데이터 디클러스터링)

  • Kim Hak-Cheol;Kim Tae-Wan;Li Ki-Joune
    • Journal of KIISE:Databases, v.31 no.6, pp.596-608, 2004
  • Much work has been done to reduce disk access time in I/O-intensive systems, which store and handle massive amounts of data, by distributing data across multiple disks and accessing them in parallel. Most previous work has focused on an efficient mapping from a grid cell to a disk number, on the assumption that the data space is partitioned as a regular grid. Although grid-like partitioning achieves good performance for low-dimensional data, its performance degrades as the dimension of the data grows, even with a good disk allocation scheme. This is because such schemes partition the entire data space equally, regardless of the distribution of data objects. Most of the data in a high-dimensional space lie near the surface of the space. For that reason, we propose a new declustering algorithm based on a partitioning scheme that partitions the data space from the surface. Several experimental results show that, with this unbalanced partitioning scheme, we can remarkably reduce the number of data blocks touched by a query as the dimension of the data and the query size grow. We also propose disk allocation schemes based on the layout of the data blocks that result from partitioning. To evaluate the proposed algorithm, we performed several experiments with data of different dimensions and for a wide range of numbers of disks. Our proposed disk allocation method performs within 10 additive disk accesses of a strictly optimal allocation scheme. We compared our algorithm with the Kronecker sequence based declustering algorithm, which is reported to be the best among the grid partition and mapping function based declustering algorithms. Declustering performance improves by up to 14 times as the dimension of the data grows.
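The observation that most data in a high-dimensional space lie near the surface can be checked with a short calculation. The sketch below assumes points uniformly distributed in a unit hypercube, which is an illustrative assumption and not necessarily the data model used in the paper; `surface_fraction` is a hypothetical helper name. The inner cube that stays at least `eps` away from every face has side `1 - 2*eps`, so the remaining shell holds a fraction `1 - (1 - 2*eps)**dim` of the volume.

```python
def surface_fraction(dim, eps):
    """Fraction of a unit hypercube's volume that lies within eps of its boundary."""
    if not 0.0 <= eps <= 0.5:
        raise ValueError("eps must be in [0, 0.5]")
    # Volume of the inner cube of side (1 - 2*eps) is (1 - 2*eps) ** dim.
    return 1.0 - (1.0 - 2.0 * eps) ** dim

for dim in (2, 10, 50):
    print(f"dim={dim:3d}  fraction near surface: {surface_fraction(dim, 0.05):.3f}")
```

With `eps = 0.05`, the shell holds roughly 19% of the volume in 2 dimensions, about 65% in 10 dimensions, and more than 99% in 50 dimensions, which is why equal-sized grid cells become a poor fit as the dimension grows.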

The Study of Two-dimensional Chemical Distribution about Soil using Laser Spectroscopy (레이저 분광법을 활용한 토양 2차원 화학적 분포도 검출 연구)

  • Yang, Jun-Ho;Yoh, Jai-Ick
    • Journal of the Korean Society for Aeronautical & Space Sciences, v.45 no.6, pp.523-530, 2017
  • Laser-Induced Breakdown Spectroscopy (LIBS), in which a plasma generated by a high-energy laser emits light at wavelengths specific to the material, and Raman spectroscopy, which measures rotation and vibration in molecules through light scattering, are attracting attention as space exploration technologies because they offer high accuracy, real-time analysis, and long-range detection. In this study, we used laser spectroscopy to analyze how the laser spectrum changes with the soil composition, and we derived a two-dimensional chemical distribution from the trend of the laser spectrum. We also reproduced the atmospheric pressure of Mars (4-7 torr) and of the Moon (<1 torr) in the experimental setup to show that differences in soil chemical composition can be measured using LIBS and Raman spectroscopy even in a simulated space environment.

Efficient Path Planning of a High DOF Multibody Robotic System using Adaptive RRT (Adaptive RRT를 사용한 고 자유도 다물체 로봇 시스템의 효율적인 경로계획)

  • Kim, Dong-Hyung;Choi, Youn-Sung;Yan, Rui-Jun;Luo, Lu-Ping;Lee, Ji Yeong;Han, Chang-Soo
    • Journal of Institute of Control, Robotics and Systems, v.21 no.3, pp.257-264, 2015
  • This paper proposes an adaptive RRT (Rapidly-exploring Random Tree) for path planning of high-DOF multibody robotic systems. For efficient path planning in a high-dimensional configuration space, the proposed algorithm adaptively selects the robot bodies depending on the complexity of the path planning problem. The RRT then grows using only the DOFs corresponding to the selected bodies. Since the RRT is extended in a configuration space with adaptive dimensionality, it can grow in a lower-dimensional configuration space. Thus the adaptive RRT method plans paths faster and uses fewer of the robot's DOFs. We implement our algorithm for path planning of a 19-DOF robot, AMIRO. The results of our simulations show that the adaptive RRT-based path planner is more efficient than the basic RRT-based path planner.
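For readers unfamiliar with the basic RRT that this abstract uses as its baseline, the following is a minimal sketch. It is not the paper's adaptive variant: the adaptive selection of bodies and DOFs is not reproduced. The unit-hypercube configuration space, the 10% goal bias, the step size, and the function name `rrt` are all illustrative assumptions, and only new nodes (not the edges between them) are checked for collisions.

```python
import math
import random

def rrt(start, goal, is_free, step=0.05, goal_tol=0.05, max_iters=2000, seed=0):
    """Grow a basic RRT in the unit hypercube; return a list of waypoints or None."""
    rng = random.Random(seed)
    dim = len(start)
    goal = tuple(goal)
    nodes = [tuple(start)]
    parent = {0: None}
    for _ in range(max_iters):
        # Sample the goal occasionally to bias growth towards it.
        if rng.random() < 0.1:
            sample = goal
        else:
            sample = tuple(rng.random() for _ in range(dim))
        # Find the tree node nearest to the sample (linear scan for simplicity).
        i_near = min(range(len(nodes)), key=lambda i: math.dist(nodes[i], sample))
        near = nodes[i_near]
        d = math.dist(near, sample)
        if d == 0.0:
            continue
        # Move from the nearest node towards the sample by at most `step`.
        t = min(1.0, step / d)
        new = tuple(a + t * (b - a) for a, b in zip(near, sample))
        if not is_free(new):
            continue  # only the new node is checked, not the connecting edge
        nodes.append(new)
        parent[len(nodes) - 1] = i_near
        if math.dist(new, goal) <= goal_tol:
            # Walk back through the parents to recover the path.
            path, i = [], len(nodes) - 1
            while i is not None:
                path.append(nodes[i])
                i = parent[i]
            return path[::-1]
    return None

def outside_disc(p):
    """Hypothetical obstacle: a disc of radius 0.2 centred in the unit square."""
    return math.dist(p, (0.5, 0.5)) > 0.2

path = rrt((0.1, 0.1), (0.9, 0.9), outside_disc)
print("no path found" if path is None else f"path with {len(path)} waypoints")
```

Because each iteration samples all `dim` coordinates and scans the whole tree, the cost grows with the dimension of the configuration space, which is the motivation for growing the tree in a lower-dimensional subspace when possible.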

An Effective Method for Dimensionality Reduction in High-Dimensional Space (고차원 공간에서 효과적인 차원 축소 기법)

  • Jeong Seung-Do;Kim Sang-Wook;Choi Byung-Uk
    • Journal of the Institute of Electronics Engineers of Korea CI, v.43 no.4 s.310, pp.88-102, 2006
  • In multimedia information retrieval, multimedia data are represented as vectors in a high-dimensional space. To search these vectors effectively, a variety of indexing methods have been proposed. However, the performance of these indexing methods degrades dramatically with increasing dimensionality, which is known as the dimensionality curse. To resolve the dimensionality curse, dimensionality reduction methods have been proposed. They map feature vectors in a high-dimensional space to vectors in a low-dimensional space before indexing the data. This paper proposes a method for dimensionality reduction based on a function approximating the Euclidean distance, which makes use of the norm and angle components of a vector. First, we identify the causes of the errors in angle estimation for approximating the Euclidean distance, and discuss basic directions to reduce those errors. Then, we propose a novel method for dimensionality reduction that composes a set of subvectors from a feature vector and maintains only the norm and the estimated angle for every subvector. The selection of a good reference vector is important for accurate estimation of the angle component. We present criteria for a good reference vector, and propose a method that chooses one by using the Levenberg-Marquardt algorithm. Also, we define a novel distance function, and formally prove that the distance function lower-bounds the Euclidean distance. This implies that our approach does not incur any false dismissals while reducing the dimensionality effectively. Finally, we verify the superiority of the proposed method via performance evaluation with extensive experiments.
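The idea of a norm-and-angle distance that lower-bounds the Euclidean distance can be sketched as follows. This is an illustration under stated assumptions, not the paper's distance function: it keeps one (norm, angle) pair for the whole vector rather than one per subvector, the reference vector is chosen arbitrarily rather than by Levenberg-Marquardt, vectors are assumed to be nonzero, and the names `to_norm_angle` and `lower_bound` are hypothetical. The bound follows from the law of cosines together with the fact that the angle between two vectors is at least the difference of their angles to a common reference vector.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def angle(v, ref):
    """Angle in radians between nonzero vectors v and ref."""
    c = sum(a * b for a, b in zip(v, ref)) / (norm(v) * norm(ref))
    return math.acos(max(-1.0, min(1.0, c)))  # clamp against rounding error

def to_norm_angle(v, ref):
    """Reduce a vector to two numbers: its norm and its angle to ref."""
    return norm(v), angle(v, ref)

def lower_bound(p, q):
    """Lower bound on the Euclidean distance, computed from two (norm, angle) pairs."""
    (norm_p, theta_p), (norm_q, theta_q) = p, q
    sq = norm_p ** 2 + norm_q ** 2 - 2.0 * norm_p * norm_q * math.cos(theta_p - theta_q)
    return math.sqrt(max(0.0, sq))

rng = random.Random(0)
ref = [1.0] * 16  # arbitrary reference vector, chosen only for illustration
a = [rng.uniform(0.1, 1.0) for _ in range(16)]
b = [rng.uniform(0.1, 1.0) for _ in range(16)]
bound = lower_bound(to_norm_angle(a, ref), to_norm_angle(b, ref))
print(f"lower bound {bound:.4f} <= true distance {math.dist(a, b):.4f}")
```

Since the bound never exceeds the true distance, filtering candidates by the bound cannot discard a true match, which is the "no false dismissals" property the abstract refers to; how tight the bound is depends on the choice of reference vector.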

Feature Extraction on High Dimensional Data Using Incremental PCA (점진적인 주성분분석기법을 이용한 고차원 자료의 특징 추출)

  • Kim Byung-Joo
    • Journal of the Korea Institute of Information and Communication Engineering, v.8 no.7, pp.1475-1479, 2004
  • High-dimensional data require efficient feature extraction techniques. Although PCA (Principal Component Analysis) is a well-known feature extraction method, it requires a large amount of memory and its computational cost is high. In this paper, we use incremental PCA for feature extraction on high-dimensional data. Through experiments, we show that the proposed method is superior to the APEX model.

GC-Tree: A Hierarchical Index Structure for Image Databases (GC-트리 : 이미지 데이타베이스를 위한 계층 색인 구조)

  • 차광호
    • Journal of KIISE:Databases, v.31 no.1, pp.13-22, 2004
  • With the proliferation of multimedia data, there is an increasing need to support the indexing and retrieval of high-dimensional image data. Despite many efforts, the performance of existing multidimensional indexing methods is not satisfactory in high dimensions. Dimensionality reduction and approximate solution methods were therefore tried to deal with the so-called dimensionality curse, but these methods inevitably lose precision in the query results. More recently, vector approximation-based methods such as the VA-file and the LPC-file were developed to preserve the precision of query results. However, the performance of the vector approximation-based methods depends largely on the size of the approximation file, and they lose the advantage of multidimensional indexing methods, which prune much of the search space. In this paper, we propose a new index structure called the GC-tree for efficient similarity search in image databases. The GC-tree is based on a special subspace partitioning strategy that is optimized for clustered high-dimensional images. It adaptively partitions the data space based on a density function and dynamically constructs an index structure. The resulting index structure adapts well to the strongly clustered distribution of high-dimensional images.

Hybrid Dimensional Approach to the Unsteady Compressible Flowfield Analysis around a High-speed Train Passing through a Tunnel (혼합차원기법을 이용한 고속열차의 터널 통과 시 발생하는 비정상 압축성 유동장의 수치해석)

  • Kim, Tae-Yoon;Kwon, Hyeok-Bin;Lee, Dong-Ho;Kim, Moon-Sang
    • Journal of the Korean Society for Aeronautical & Space Sciences, v.30 no.6, pp.78-83, 2002
  • A modified patched grid scheme has been developed and employed in an axisymmetric unsteady Euler solver based on Roe's FDS to analyze the unsteady flow fields induced by a train and a tunnel. In this paper, an innovative zonal method, named the hybrid dimensional approach, is proposed and applied to train-tunnel interaction problems. The basic idea of this method is to maximize the efficiency of numerical calculations by assuming the minimum number of spatial dimensions needed in each zone. The hybrid dimensional approach, embedded in the present modified patched grid method, yielded numerical accuracy as high as that of the fully axisymmetric method. The hybrid dimensional approach is expected to reduce the large computation time of train-tunnel interaction problems, especially for long tunnels.

Dynamic Characteristics of Space Framed Structures by Using Nonlinear Transient Analysis (비선형 과도해석을 이용한 스페이스 프레임 구조물의 동적특성)

  • Son, Jin Hee;Kim, Joo-Woo
    • Journal of Korean Society of Steel Construction, v.28 no.6, pp.395-402, 2016
  • Space frame structures, which are characterized by components such as forms, layers, and grids, can enclose a large space without interior columns. Steels with yield strengths of 210 MPa to 450 MPa are generally used for them. High-strength steel (i.e., with a yield strength of 690 MPa) with suitable weldability, seismic resistance, and economy has recently been developed. In this paper, high-strength steel is applied to space frame structures in order to determine analytically their transient responses, considering material and geometric nonlinearities. For various circular dome types of space frame structures, modal analysis and nonlinear transient analysis are carried out using nonlinear three-dimensional finite element analysis.