• Title/Summary/Keyword: sparse data structure

Search Result 50, Processing Time 0.02 seconds

MP-Lasso chart: a multi-level polar chart for visualizing group Lasso analysis of genomic data

  • Min Song;Minhyuk Lee;Taesung Park;Mira Park
    • Genomics & Informatics
    • /
    • v.20 no.4
    • /
    • pp.48.1-48.7
    • /
    • 2022
  • Penalized regression has been widely used in genome-wide association studies for joint analyses to find genetic associations. Among penalized regression models, the least absolute shrinkage and selection operator (Lasso) method effectively removes some coefficients from the model by shrinking them to zero. To handle group structures, such as genes and pathways, several modified Lasso penalties have been proposed, including group Lasso and sparse group Lasso. Group Lasso ensures sparsity at the level of pre-defined groups, eliminating unimportant groups. Sparse group Lasso performs group selection as in group Lasso, but also performs individual selection as in Lasso. While these sparse methods are useful in high-dimensional genetic studies, interpreting the results with many groups and coefficients is not straightforward. Lasso's results are often expressed as trace plots of regression coefficients. However, few studies have explored the systematic visualization of group information. In this study, we propose a multi-level polar Lasso (MP-Lasso) chart, which can effectively represent the results from group Lasso and sparse group Lasso analyses. An R package to draw MP-Lasso charts was developed. Through a real-world genetic data application, we demonstrated that our MP-Lasso chart package effectively visualizes the results of Lasso, group Lasso, and sparse group Lasso.

Sparse Matrix Compression Technique and Hardware Design for Lightweight Deep Learning Accelerators (경량 딥러닝 가속기를 위한 희소 행렬 압축 기법 및 하드웨어 설계)

  • Kim, Sunhee;Shin, Dongyeob;Lim, Yong-Seok
    • Journal of Korea Society of Digital Industry and Information Management
    • /
    • v.17 no.4
    • /
    • pp.53-62
    • /
    • 2021
  • Deep learning models such as convolutional neural networks and recurrent neual networks process a huge amounts of data, so they require a lot of storage and consume a lot of time and power due to memory access. Recently, research is being conducted to reduce memory usage and access by compressing data using the feature that many of deep learning data are highly sparse and localized. In this paper, we propose a compression-decompression method of storing only the non-zero data and the location information of the non-zero data excluding zero data. In order to make the location information of non-zero data, the matrix data is divided into sections uniformly. And whether there is non-zero data in the corresponding section is indicated. In this case, section division is not executed only once, but repeatedly executed, and location information is stored in each step. Therefore, it can be properly compressed according to the ratio and distribution of zero data. In addition, we propose a hardware structure that enables compression and decompression without complex operations. It was designed and verified with Verilog, and it was confirmed that it can be used in hardware deep learning accelerators.

Feature Selection via Embedded Learning Based on Tangent Space Alignment for Microarray Data

  • Ye, Xiucai;Sakurai, Tetsuya
    • Journal of Computing Science and Engineering
    • /
    • v.11 no.4
    • /
    • pp.121-129
    • /
    • 2017
  • Feature selection has been widely established as an efficient technique for microarray data analysis. Feature selection aims to search for the most important feature/gene subset of a given dataset according to its relevance to the current target. Unsupervised feature selection is considered to be challenging due to the lack of label information. In this paper, we propose a novel method for unsupervised feature selection, which incorporates embedded learning and $l_{2,1}-norm$ sparse regression into a framework to select genes in microarray data analysis. Local tangent space alignment is applied during embedded learning to preserve the local data structure. The $l_{2,1}-norm$ sparse regression acts as a constraint to aid in learning the gene weights correlatively, by which the proposed method optimizes for selecting the informative genes which better capture the interesting natural classes of samples. We provide an effective algorithm to solve the optimization problem in our method. Finally, to validate the efficacy of the proposed method, we evaluate the proposed method on real microarray gene expression datasets. The experimental results demonstrate that the proposed method obtains quite promising performance.

Compressing Method of NetCDF Files Based on Sparse Matrix (희소행렬 기반 NetCDF 파일의 압축 방법)

  • Choi, Gyuyeun;Heo, Daeyoung;Hwang, Suntae
    • KIISE Transactions on Computing Practices
    • /
    • v.20 no.11
    • /
    • pp.610-614
    • /
    • 2014
  • Like many types of scientific data, results from simulations of volcanic ash diffusion are of a clustered sparse matrix in the netCDF format. Since these data sets are large in size, they generate high storage and transmission costs. In this paper, we suggest a new method that reduces the size of the data of volcanic ash diffusion simulations by converting the multi-dimensional index to a single dimension and keeping only the starting point and length of the consecutive zeros. This method presents performance that is almost as good as that of ZIP format compression, but does not destroy the netCDF structure. The suggested method is expected to allow for storage space to be efficiently used by reducing both the data size and the network transmission time.

Spatial Data Structure for Efficient Representation of Very Large Sparse Volume Data for 3D Reconstruction (3차원 복원을 위한 대용량 희소 볼륨 데이터의 효율적인 저장을 위한 공간자료구조)

  • An, Jae Pung;Shin, Seungmi;Seo, Woong;Ihm, Insung
    • Journal of the Korea Computer Graphics Society
    • /
    • v.23 no.3
    • /
    • pp.19-29
    • /
    • 2017
  • When a fixed-sized memory allocation method is used for sparse volume data, a considerable memory space is in general wasted, which becomes more serious for a large volume of high resolution. In this paper, in order to reduce such unnecessary memory consumption, we propose a volume representation method to store mostly voxels that represent valid information rather than all voxels in a fixed volume space. Then our method is compared with the conventional static memory allocation method, an octree-based representation, and a voxel hashing method in terms of memory usage and computation speed. In particular, we compare the proposed method and the voxel hashing method with respect to implementation of the GPU-based Marching Cubes algorithm.

GEase-K: Linear and Nonlinear Autoencoder-based Recommender System with Side Information (GEase-K: 부가 정보를 활용한 선형 및 비선형 오토인코더 기반의 추천시스템)

  • Taebeom Lee;Seung-hak Lee;Min-jeong Ma;Yoonho Cho
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.3
    • /
    • pp.167-183
    • /
    • 2023
  • In the recent field of recommendation systems, various studies have been conducted to model sparse data effectively. Among these, GLocal-K(Global and Local Kernels for Recommender Systems) is a research endeavor combining global and local kernels to provide personalized recommendations by considering global data patterns and individual user characteristics. However, due to its utilization of kernel tricks, GLocal-K exhibits diminished performance on highly sparse data and struggles to offer recommendations for new users or items due to the absence of side information. In this paper, to address these limitations of GLocal-K, we propose the GEase-K (Global and EASE kernels for Recommender Systems) model, incorporating the EASE(Embarrassingly Shallow Autoencoders for Sparse Data) model and leveraging side information. Initially, we substitute EASE for the local kernel in GLocal-K to enhance recommendation performance on highly sparse data. EASE, functioning as a simple linear operational structure, is an autoencoder that performs highly on extremely sparse data through regularization and learning item similarity. Additionally, we utilize side information to alleviate the cold-start problem. We enhance the understanding of user-item similarities by employing a conditional autoencoder structure during the training process to incorporate side information. In conclusion, GEase-K demonstrates resilience in highly sparse data and cold-start situations by combining linear and nonlinear structures and utilizing side information. Experimental results show that GEase-K outperforms GLocal-K based on the RMSE and MAE metrics on the highly sparse GoodReads and ModCloth datasets. Furthermore, in cold-start experiments divided into four groups using the GoodReads and ModCloth datasets, GEase-K denotes superior performance compared to GLocal-K.

Vehicle Image Recognition Using Deep Convolution Neural Network and Compressed Dictionary Learning

  • Zhou, Yanyan
    • Journal of Information Processing Systems
    • /
    • v.17 no.2
    • /
    • pp.411-425
    • /
    • 2021
  • In this paper, a vehicle recognition algorithm based on deep convolutional neural network and compression dictionary is proposed. Firstly, the network structure of fine vehicle recognition based on convolutional neural network is introduced. Then, a vehicle recognition system based on multi-scale pyramid convolutional neural network is constructed. The contribution of different networks to the recognition results is adjusted by the adaptive fusion method that adjusts the network according to the recognition accuracy of a single network. The proportion of output in the network output of the entire multiscale network. Then, the compressed dictionary learning and the data dimension reduction are carried out using the effective block structure method combined with very sparse random projection matrix, which solves the computational complexity caused by high-dimensional features and shortens the dictionary learning time. Finally, the sparse representation classification method is used to realize vehicle type recognition. The experimental results show that the detection effect of the proposed algorithm is stable in sunny, cloudy and rainy weather, and it has strong adaptability to typical application scenarios such as occlusion and blurring, with an average recognition rate of more than 95%.

Bayesian Inference of the Stochastic Gompertz Growth Model for Tumor Growth

  • Paek, Jayeong;Choi, Ilsu
    • Communications for Statistical Applications and Methods
    • /
    • v.21 no.6
    • /
    • pp.521-528
    • /
    • 2014
  • A stochastic Gompertz diffusion model for tumor growth is a topic of active interest as cancer is a leading cause of death in Korea. The direct maximum likelihood estimation of stochastic differential equations would be possible based on the continuous path likelihood on condition that a continuous sample path of the process is recorded over the interval. This likelihood is useful in providing a basis for the so-called continuous record or infill likelihood function and infill asymptotic. In practice, we do not have fully continuous data except a few special cases. As a result, the exact ML method is not applicable. In this paper we proposed a method of parameter estimation of stochastic Gompertz differential equation via Markov chain Monte Carlo methods that is applicable for several data structures. We compared a Markov transition data structure with a data structure that have an initial point.

An Efficient Ordering Method and Data Structure of the Interior Point Method (Putting Emphasis on the Minimum Deficiency Ordering (내부점기법에 있어서 효율적인 순서화와 자료구조(최소부족순서화를 중심으로))

  • 박순달;김병규;성명기
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.21 no.3
    • /
    • pp.63-74
    • /
    • 1996
  • Ordering plays an important role in solving an LP problem with sparse matrix by the interior point method. Since ordering is NP-complete, we try to find an efficient method. The objective of this paper is to present an efficient heuristic ordering method for implementation of the minimum deficiency method. Both the ordering method and the data structure play important roles in implementation. First we define a new heuristic pseudo-deficiency ordering method and a data structure for the method-quotient graph and cliqued storage. Next we show an experimental result in terms of time and nonzero numbers by NETLIB problems.

  • PDF

Experimental Comparisons of Simplex Method Program's Speed with Various Memory Referencing Techniques and Data Structures (여러 가지 컴퓨터 메모리 참조 방법과 자료구조에 대한 단체법 프로그램 수행 속도의 비교)

  • Park, Chan-Kyoo;Lim, Sung-Mook;Kim, Woo-Jae;Park, Soon-Dal
    • IE interfaces
    • /
    • v.11 no.2
    • /
    • pp.149-157
    • /
    • 1998
  • In this paper, various techniques considering the characteristics of computer memory management are suggested, which can be used in the implementation of simplex method. First, reduction technique of indirect addressing, redundant references of memory, and scatter/gather technique are implemented, and the effectiveness of the techniques is shown. Loop-unrolling technique, which exploits the arithmetic operation mechanism of computer, is also implemented. Second, a subroutine frequently called is written in low-level language, and the effectiveness is proved by experimental results. Third, row-column linked list and Gustavson's data structure are compared as the data structure for the large sparse matrix in LU form. Last, buffering technique and memory-mapped file which can be used in reading large data file are implemented and the effectiveness is shown.

  • PDF