• Title/Summary/Keyword: High-dimensional datasets

Search results: 46

Classification of Tabular Data using High-Dimensional Mapping and Deep Learning Network (고차원 매핑기법과 딥러닝 네트워크를 통한 정형데이터의 분류)

  • Kyeong-Taek Kim; Won-Du Chang
    • Journal of Internet of Things and Convergence / v.9 no.6 / pp.119-124 / 2023
  • Deep learning, now the most popular approach for pattern recognition, has recently demonstrated conspicuous efficacy across diverse domains compared with traditional machine learning techniques. Classification of tabular data, however, remains largely the territory of traditional machine learning. This paper introduces a novel network module that maps tabular data into high-dimensional tensors. The module is integrated into conventional deep learning networks and subsequently applied to the classification of structured data. The proposed method was trained and validated on four datasets, achieving an average accuracy of 90.22%, which surpasses the contemporary deep learning model TabNet by 2.55%p. The approach is significant because it allows diverse network architectures, renowned for their superior performance in computer vision, to be harnessed for the analysis of tabular data.
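To make the idea of mapping a tabular feature vector into an image-like tensor for a convolutional network concrete, here is a minimal PyTorch sketch. The mapping module (a learned linear projection reshaped into a 2D grid) and the layer sizes are illustrative assumptions, not the module described in the paper.

```python
import torch
import torch.nn as nn

class TabularToTensor(nn.Module):
    """Illustrative module: project a tabular feature vector into a 2D 'image'."""
    def __init__(self, n_features: int, side: int = 16):
        super().__init__()
        self.side = side
        self.project = nn.Linear(n_features, side * side)  # learned high-dimensional mapping

    def forward(self, x):                           # x: (batch, n_features)
        z = torch.relu(self.project(x))             # (batch, side*side)
        return z.view(-1, 1, self.side, self.side)  # (batch, 1, side, side)

class TabularCNN(nn.Module):
    """Conventional CNN classifier applied to the mapped tensor."""
    def __init__(self, n_features: int, n_classes: int):
        super().__init__()
        self.mapper = TabularToTensor(n_features)
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x):
        return self.cnn(self.mapper(x))

model = TabularCNN(n_features=20, n_classes=4)
logits = model(torch.randn(8, 20))   # dummy batch of 8 rows with 20 columns
print(logits.shape)                  # torch.Size([8, 4])
```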

Application into Assessment of Liquefaction Hazard and Geotechnical Vulnerability During Earthquake with High-Precision Spatial-Ground Model for a City Development Area (도시개발 영역 고정밀 공간지반모델의 지진 시 액상화 재해 및 지반 취약성 평가 활용)

  • Kim, Han-Saem; Sun, Chang-Guk; Ha, Ik-Soo
    • Journal of the Earthquake Engineering Society of Korea / v.27 no.5 / pp.221-230 / 2023
  • This study proposes a methodology for assessing seismic liquefaction hazard by building high-resolution three-dimensional (3D) ground models from high-density, high-precision site investigation data in an area of interest and linking them to geotechnical numerical analysis tools. With several stages of high-density datasets, the vulnerability to earthquake-induced geotechnical phenomena (ground motion amplification, liquefaction, landslide, etc.) and the complex disasters they trigger can be estimated across an urban development area. In this study, the spatial-ground models for city development were built on a 3D high-precision grid of 5 m × 5 m × 1 m using geostatistical methods. After comparing the prediction errors, the geotechnical model from Gaussian sequential simulation was selected for assessing earthquake-induced geotechnical hazards. In particular, with seven independent input earthquake motions, liquefaction analysis with finite element analyses and hazard mapping with LPI and LSN were performed based on the spatial geotechnical models of the study area. Furthermore, various phenomena and parameters, including settlement in the city planning area, were assessed in terms of geotechnical vulnerability, also based on the high-resolution spatial-ground modeling. This case study of high-precision 3D ground model-based zonation in the area of interest verifies its usefulness for spatially assessing earthquake-induced hazards and geotechnical vulnerability and for supporting decision-making.
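As a small illustration of the hazard-mapping step, the sketch below computes the liquefaction potential index (LPI) of Iwasaki et al. from a factor-of-safety profile, integrating (1 - FS) with the depth weight w(z) = 10 - 0.5z over the top 20 m. The soil profile and the simple trapezoidal integration are assumptions for illustration, not the paper's finite-element workflow.

```python
import numpy as np

def liquefaction_potential_index(depth_m, factor_of_safety):
    """Iwasaki-style LPI: integral of F(z) * w(z) over 0-20 m,
    where F(z) = 1 - FS when FS < 1 (else 0) and w(z) = 10 - 0.5 * z."""
    z = np.asarray(depth_m, dtype=float)
    fs = np.asarray(factor_of_safety, dtype=float)
    mask = z <= 20.0
    z, fs = z[mask], fs[mask]
    severity = np.where(fs < 1.0, 1.0 - fs, 0.0)   # F(z)
    weight = 10.0 - 0.5 * z                        # w(z)
    integrand = severity * weight
    # trapezoidal integration over depth
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(z)))

# Hypothetical 1 m-interval profile for one grid column of a 3D ground model
depths = np.arange(0.5, 20.5, 1.0)
fs_profile = np.clip(np.random.default_rng(0).normal(1.1, 0.3, depths.size), 0.3, 2.0)
print(f"LPI = {liquefaction_potential_index(depths, fs_profile):.2f}")
```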

Removing Non-informative Features by Robust Feature Wrapping Method for Microarray Gene Expression Data (유전자 알고리즘과 Feature Wrapping을 통한 마이크로어레이 데이타 중복 특징 소거법)

  • Lee, Jae-Sung; Kim, Dae-Won
    • Journal of KIISE: Software and Applications / v.35 no.8 / pp.463-478 / 2008
  • Because of the high dimensionality of microarray gene expression datasets, machine learning algorithms have typically relied on feature selection techniques to perform effective classification. However, the large number of features relative to the number of samples makes feature selection computationally prohibitive and prone to errors. A traditional feature selection approach is feature filtering, which scores one gene per step; being univariate, it cannot account for multivariate correlations. In this paper, we propose a function that measures both class separability and inter-feature correlations, thereby addressing the limitations of the filtering approach.
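The paper's exact scoring function is not given in the abstract, so the sketch below shows one common way to combine the two ingredients it names: a univariate class-separability score (ANOVA F-statistic) penalized by correlation with already-selected genes, in the spirit of minimum-redundancy selection. The weighting and the greedy loop are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_selection import f_classif

def select_genes(X, y, k=20, redundancy_weight=0.5):
    """Greedy selection: high class separability, low correlation with chosen genes."""
    relevance, _ = f_classif(X, y)             # per-gene class separability
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        redundancy = corr[:, selected].mean(axis=1)
        score = relevance - redundancy_weight * redundancy * relevance.max()
        score[selected] = -np.inf              # do not re-select a gene
        selected.append(int(np.argmax(score)))
    return selected

# Toy example: 60 samples, 500 "genes", 2 classes
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 500))
y = rng.integers(0, 2, 60)
X[:, :5] += y[:, None]                         # make a few genes informative
print(select_genes(X, y, k=10))
```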

Pixel level prediction of dynamic pressure distribution on hull surface based on convolutional neural network (합성곱 신경망 기반 선체 표면 압력 분포의 픽셀 수준 예측)

  • Kim, Dayeon; Seo, Jeongbeom; Lee, Inwon
    • Journal of the Korean Society of Visualization / v.20 no.2 / pp.78-85 / 2022
  • Rapidly developing prediction technologies based on artificial intelligence are being applied in a variety of engineering fields. In particular, dimensionality reduction techniques such as the autoencoder and the convolutional neural network have enabled classification and regression of high-dimensional data, and pixel-level prediction enables semantic segmentation (fine-grained classification) or per-pixel estimation of physical values such as depth or surface normals. In this study, the pressure distribution on a ship's hull surface was estimated at the pixel level with an artificial neural network. First, potential flow analysis was performed on hull forms generated by transforming a baseline hull form, constructing 429 datasets for learning. A neural network with a U-shaped structure was then configured to learn the pressure values at the node positions of the preprocessed hull forms. For hull forms included in the training set, the network was confirmed to predict the pressure distribution well; however, for a container ship that was not included and has different characteristics, it could not produce a reasonable result.
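A minimal U-shaped encoder-decoder for per-pixel regression, sketched in PyTorch. The channel counts, depth, and single skip connection are assumptions for illustration; the paper's actual architecture and hull-form preprocessing are not reproduced here.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """U-shaped network that maps an input field to a per-pixel pressure map."""
    def __init__(self, in_ch=1, out_ch=1):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(16, out_ch, 1))   # regression head, no activation

    def forward(self, x):
        e1 = self.enc1(x)                       # (B, 16, H, W)
        e2 = self.enc2(e1)                      # (B, 32, H/2, W/2)
        d = self.up(e2)                         # (B, 16, H, W)
        d = torch.cat([d, e1], dim=1)           # skip connection
        return self.dec(d)                      # (B, 1, H, W) predicted pressure

net = TinyUNet()
geometry = torch.randn(4, 1, 64, 64)            # dummy hull-surface input fields
pressure = net(geometry)
loss = nn.functional.mse_loss(pressure, torch.randn_like(pressure))
print(pressure.shape, loss.item())
```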

Human Activity Recognition using View-Invariant Features and Probabilistic Graphical Models (시점 불변인 특징과 확률 그래프 모델을 이용한 인간 행위 인식)

  • Kim, Hyesuk; Kim, Incheol
    • Journal of KIISE / v.41 no.11 / pp.927-934 / 2014
  • In this paper, we propose an effective method for recognizing daily human activities from a stream of three-dimensional body poses obtained with Kinect-like RGB-D sensors. Body pose data provided by the Kinect SDK or OpenNI suffer from both view variance and scale variance, since they are represented in a 3D Cartesian coordinate system whose origin lies at the center of the Kinect. To obtain view-invariant and scale-invariant features, we transform the pose data into a spherical coordinate system whose origin is placed at the center of the subject's hip and then normalize the scale using the length of the subject's arm. To effectively represent the complex internal structure of high-level daily activities, we utilize the Hidden-state Conditional Random Field (HCRF), a probabilistic graphical model. Through various experiments on two datasets, KAD-70 and CAD-60, we demonstrate the high performance of our method and its implementation.
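The view- and scale-invariant feature step can be illustrated with a few lines of NumPy: re-center each joint on the hip, convert to spherical coordinates, and divide the radius by the arm length. The joint indices and the arm-length computation below are assumptions for illustration, not the exact convention used in the paper.

```python
import numpy as np

def to_invariant_features(joints_xyz, hip_idx=0, shoulder_idx=4, hand_idx=7):
    """joints_xyz: (n_joints, 3) Cartesian pose from an RGB-D sensor.
    Returns (n_joints, 3) spherical coordinates (r, theta, phi), scale-normalized."""
    centered = joints_xyz - joints_xyz[hip_idx]          # hip-centered coordinates
    arm_len = np.linalg.norm(joints_xyz[hand_idx] - joints_xyz[shoulder_idx])
    x, y, z = centered[:, 0], centered[:, 1], centered[:, 2]
    r = np.linalg.norm(centered, axis=1)
    theta = np.arctan2(np.hypot(x, y), z)                # polar angle
    phi = np.arctan2(y, x)                               # azimuth
    r_norm = r / max(arm_len, 1e-8)                      # scale normalization by arm length
    return np.stack([r_norm, theta, phi], axis=1)

pose = np.random.rand(15, 3)          # dummy 15-joint skeleton
features = to_invariant_features(pose)
print(features.shape)                 # (15, 3)
```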

Comparative Analysis of Self-supervised Deephashing Models for Efficient Image Retrieval System (효율적인 이미지 검색 시스템을 위한 자기 감독 딥해싱 모델의 비교 분석)

  • Kim Soo In; Jeon Young Jin; Lee Sang Bum; Kim Won Gyum
    • KIPS Transactions on Software and Data Engineering / v.12 no.12 / pp.519-524 / 2023
  • In hashing-based image retrieval, the hash code of a manipulated image differs from that of the original, making it difficult to retrieve the same image. This paper proposes and evaluates a self-supervised deep hashing model that generates perceptual hash codes from feature information such as the texture, shape, and color of images. The comparison models are autoencoder-based variational inference models whose encoders are built with fully connected layers, convolutional neural networks, and transformer modules, respectively. The proposed model is a variational inference model that includes a SimAM module for extracting geometric patterns and positional relationships within images; SimAM learns latent vectors that highlight objects or local regions through an energy function based on the activation values of each neuron and its surrounding neurons. The resulting representation learning model generates low-dimensional latent vectors from high-dimensional input images, and the latent vectors are binarized into distinguishable hash codes. Experimental results on public datasets such as CIFAR-10, ImageNet, and NUS-WIDE show that the proposed model outperforms the comparison models and performs on par with supervised deep hashing models. It can be used in applications that require low-dimensional image representations, such as image search or copyright image identification.
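For reference, a minimal sketch of the two pieces the abstract names: a parameter-free SimAM-style attention (the closed-form energy from the SimAM paper) and binarization of a latent vector into a hash code via the sign function. The lambda value, the pooling, and the surrounding encoder are assumptions; this is not the paper's model.

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: weights each activation by a closed-form energy
    computed from its deviation from the channel mean (Yang et al., 2021)."""
    def __init__(self, lam: float = 1e-4):
        super().__init__()
        self.lam = lam

    def forward(self, x):                            # x: (B, C, H, W)
        n = x.shape[2] * x.shape[3] - 1
        d = (x - x.mean(dim=(2, 3), keepdim=True)) ** 2
        v = d.sum(dim=(2, 3), keepdim=True) / n      # per-channel variance
        e_inv = d / (4 * (v + self.lam)) + 0.5       # inverse energy per position
        return x * torch.sigmoid(e_inv)

def to_hash_code(latent):
    """Binarize a real-valued latent vector into a +/-1 hash code."""
    return torch.sign(latent)

feat = SimAM()(torch.randn(2, 8, 16, 16))            # attention-weighted feature map
latent = feat.mean(dim=(2, 3))                       # crude pooling to a latent vector (assumption)
print(to_hash_code(latent)[0])                       # 8-bit hash code for the first image
```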

A New Similarity Measure for Categorical Attribute-Based Clustering (범주형 속성 기반 군집화를 위한 새로운 유사 측도)

  • Kim, Min; Jeon, Joo-Hyuk; Woo, Kyung-Gu; Kim, Myoung-Ho
    • Journal of KIISE: Databases / v.37 no.2 / pp.71-81 / 2010
  • Clustering is widely used in numerous applications such as pattern recognition, image analysis, and market analysis. The important factors that determine cluster quality are the similarity measure and the number of attributes. Similarity measures should be defined with respect to the data type: existing measures work well for numerical attribute values but not when the data are described by categorical attributes, for which there is no inherent similarity between values. In high-dimensional spaces, conventional clustering algorithms also tend to break down because of the sparsity of data points; subspace clustering has been proposed to overcome this, based on the observation that different clusters may exist in different subspaces. In this paper, we propose a new similarity measure for clustering high-dimensional categorical data. The measure is based on the idea that, in a good clustering, each cluster carries information that distinguishes it from the other clusters, and it also captures attribute dependencies. This study is meaningful because no previous method has exploited both properties. Experimental results on real datasets show that clusters obtained with the proposed similarity measure achieve good clustering accuracy.
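The abstract does not spell out the measure itself, so the sketch below only illustrates the underlying idea of "distinguishing information": for a candidate cluster, compare each attribute's value distribution inside the cluster with the global distribution (here via KL divergence). The scoring function and the toy data are assumptions, not the paper's measure.

```python
import numpy as np
from collections import Counter

def distinguishing_information(cluster_rows, all_rows):
    """Illustrative score: how strongly a cluster's categorical value distribution
    deviates from the global distribution, averaged over attributes (KL divergence)."""
    cluster_rows = np.asarray(cluster_rows, dtype=object)
    all_rows = np.asarray(all_rows, dtype=object)
    scores = []
    for j in range(all_rows.shape[1]):
        global_counts = Counter(all_rows[:, j])
        local_counts = Counter(cluster_rows[:, j])
        kl = 0.0
        for value, count in local_counts.items():
            p = count / len(cluster_rows)                  # in-cluster frequency
            q = global_counts[value] / len(all_rows)       # global frequency
            kl += p * np.log(p / q)
        scores.append(kl)
    return float(np.mean(scores))

data = [["red", "small"], ["red", "small"], ["blue", "large"],
        ["green", "large"], ["red", "medium"], ["blue", "small"]]
cluster = data[:2]                                          # hypothetical cluster
print(round(distinguishing_information(cluster, data), 3))
```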

A Big Data Analysis by Between-Cluster Information using k-Modes Clustering Algorithm (k-Modes 분할 알고리즘에 의한 군집의 상관정보 기반 빅데이터 분석)

  • Park, In-Kyoo
    • Journal of Digital Convergence / v.13 no.11 / pp.157-164 / 2015
  • This paper describes subspace clustering of categorical data for convergence and integration. Because conventional evaluation measures are designed for numerical data, they face limitations with categorical data owing to the absence of ordering, high dimensionality, and the scarcity of value frequencies. Hence, a conditional entropy measure is proposed to evaluate the cohesion among attributes within each cluster. We propose a new objective function that reflects the optimal clustering, minimizing within-cluster dispersion while enhancing between-cluster separation. We performed experiments on five real-world datasets, comparing the proposed algorithm with four other algorithms using three evaluation metrics: accuracy, F-measure, and adjusted Rand index. According to the experiments, the proposed algorithm outperforms the compared algorithms on the considered metrics.
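Since the paper builds on k-modes, here is a compact sketch of the standard k-modes loop (Huang's simple-matching dissimilarity with attribute-wise mode updates); the conditional-entropy objective and between-cluster information described in the abstract are not reproduced. The cluster count, initialization, and toy data are illustrative.

```python
import numpy as np
from collections import Counter

def matching_dissimilarity(a, b):
    """Number of attributes on which two categorical records disagree."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(X, k=2, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=object)
    modes = X[rng.choice(len(X), k, replace=False)].copy()
    for _ in range(n_iter):
        labels = np.array([np.argmin([matching_dissimilarity(row, m) for m in modes])
                           for row in X])
        for c in range(k):                       # update each mode attribute-wise
            members = X[labels == c]
            if len(members):
                modes[c] = [Counter(members[:, j]).most_common(1)[0][0]
                            for j in range(X.shape[1])]
    return labels, modes

data = [["a", "x", "low"], ["a", "x", "low"], ["b", "y", "high"],
        ["b", "y", "high"], ["a", "y", "low"], ["b", "x", "high"]]
labels, modes = k_modes(data, k=2)
print(labels, modes)
```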

Design of Face Recognition algorithm Using PCA&LDA combined for Data Pre-Processing and Polynomial-based RBF Neural Networks (PCA와 LDA를 결합한 데이터 전 처리와 다항식 기반 RBFNNs을 이용한 얼굴 인식 알고리즘 설계)

  • Oh, Sung-Kwun; Yoo, Sung-Hoon
    • The Transactions of The Korean Institute of Electrical Engineers / v.61 no.5 / pp.744-752 / 2012
  • In this study, polynomial-based radial basis function neural networks (pRBFNNs) are proposed as the recognition part of an overall face recognition system that consists of two parts, preprocessing and recognition. The design methodology and procedure of the proposed pRBFNNs are presented as a solution to high-dimensional pattern recognition problems. In the data preprocessing part, principal component analysis (PCA), which is commonly used in face recognition, expresses the classes in a reduced form and is effective in maintaining the recognition rate while reducing the amount of data. However, because it operates on the whole face image, it cannot guarantee the recognition rate under changes of viewpoint. To compensate for this shortcoming, linear discriminant analysis (LDA) is used to enhance the separation between different classes. In this paper, we combine PCA and LDA and design optimized pRBFNNs for the recognition module. The proposed pRBFNN architecture consists of three functional modules, the condition part, the conclusion part, and the inference part, expressed as fuzzy rules in 'If-then' format. In the condition part of the fuzzy rules, the input space is partitioned with fuzzy C-means clustering. In the conclusion part, the connection weights of the pRBFNNs are represented as two kinds of polynomials, constant and linear, whose coefficients are identified by back-propagation with gradient descent. The output of the pRBFNN model is obtained by the fuzzy inference method in the inference part. The essential design parameters of the networks (including the learning rate, momentum coefficient, and fuzzification coefficient) are optimized by means of differential evolution. The proposed pRBFNNs are applied to face image datasets (e.g., Yale, AT&T) and evaluated in terms of output performance and recognition rate.
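The PCA-then-LDA preprocessing is easy to illustrate with scikit-learn; the sketch below feeds the combined low-dimensional features into an RBF-kernel SVM as a stand-in classifier, since the polynomial-based RBFNN with fuzzy C-means partitioning and differential-evolution tuning described in the abstract is not reproduced here. The synthetic data, component counts, and classifier choice are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Synthetic stand-in for flattened face images: 200 samples, 1024 "pixels", 10 subjects
X, y = make_classification(n_samples=200, n_features=1024, n_informative=40,
                           n_classes=10, n_clusters_per_class=1, random_state=0)

# PCA reduces the raw high-dimensional input; LDA then maximizes class separation.
# An RBF-kernel SVM serves as a simple stand-in for the polynomial-based RBFNN.
model = make_pipeline(
    PCA(n_components=50, random_state=0),
    LinearDiscriminantAnalysis(n_components=9),   # at most n_classes - 1 components
    SVC(kernel="rbf", gamma="scale"),
)
scores = cross_val_score(model, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.3f}")
```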

Group Contribution Method and Support Vector Regression based Model for Predicting Physical Properties of Aromatic Compounds (Group Contribution Method 및 Support Vector Regression 기반 모델을 이용한 방향족 화합물 물성치 예측에 관한 연구)

  • Kang, Ha Yeong; Oh, Chang Bo; Won, Yong Sun; Liu, J. Jay; Lee, Chang Jun
    • Journal of the Korean Society of Safety / v.36 no.1 / pp.1-8 / 2021
  • To simulate a process model in the field of chemical engineering, it is very important to identify the physical properties of novel as well as existing materials. However, measuring physical properties through experiments is difficult because of the potential risk and cost involved. To address this, the present study develops a property prediction model based on the group contribution method for aromatic chemical compounds containing benzene rings, which have a significant impact on physical properties. To establish the prediction model, 42 important functional groups that determine the physical properties are considered, and the numbers of these functional groups in 147 aromatic chemical compounds are counted to prepare a dataset. Support vector regression is employed to build a prediction model that can handle sparse, high-dimensional data. To verify the efficacy of this study, its results are compared with those of previous studies; although the datasets differ, the comparison indicates enhanced performance. Moreover, there are few reports on predicting the physical properties of aromatic compounds. This study provides an effective method for estimating the physical properties of unknown chemical compounds and contributes toward reducing the experimental effort required to measure them.
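A group contribution representation is simply a count vector of functional groups per molecule, which can then be fed to support vector regression. The tiny group list, the counts, and the boiling-point targets below are hypothetical placeholders showing the data shape and the scikit-learn SVR call, not the paper's 42-group descriptor set or its data.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hypothetical functional-group count vectors (rows: compounds, columns: groups
# such as -CH3, -OH, aromatic C-H). A real model would use the full 42-group set.
group_counts = np.array([
    [1, 0, 5],   # e.g., toluene-like: one -CH3, five aromatic C-H
    [0, 1, 5],   # e.g., phenol-like: one -OH, five aromatic C-H
    [2, 0, 4],   # e.g., xylene-like
    [1, 1, 4],   # e.g., cresol-like
])
boiling_point_K = np.array([384.0, 455.0, 412.0, 475.0])   # placeholder targets

model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0, epsilon=0.1))
model.fit(group_counts, boiling_point_K)
print(model.predict([[1, 0, 5]]))    # predict for a new compound's group counts
```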