• Title/Abstract/Keywords: Data feature analysis

1,379 search results

Comparative Analysis of Building Models to Develop a Generic Indoor Feature Model

  • Kim, Misun;Choi, Hyun-Sang;Lee, Jiyeong
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography (한국측량학회지) / Vol. 39, No. 5 / pp. 297-311 / 2021
  • Around the world, there is increasing interest in digital twin cities. Although geospatial data is critical for building a digital twin city, currently established spatial data cannot be used directly for its implementation. Integrating geospatial data is vital for constructing and simulating the virtual space. Existing studies on data integration have focused on data transformation. Conversion is a fundamental and convenient method, but the information lost during the process remains a limitation. Standardizing the data model is an alternative approach that solves the integration problem while overcoming the limitations of conversion. However, standardization of indoor space data models is still insufficient compared with that of 3D building and city models. Therefore, in this study, we present a comparative analysis of data models commonly used in indoor space modeling as a basis for establishing a generic indoor space feature model. By comparing five models, IFC (Industry Foundation Classes), CityGML (City Geographic Markup Language), AIIM (ArcGIS Indoors Information Model), IMDF (Indoor Mapping Data Format), and OmniClass, we identify the essential elements for modeling indoor space and the feature classes commonly included in the models. The proposed generic model can serve as a basis for developing further indoor feature models by specifying the minimum required structure and feature classes.

The Audio Signal Classification System Using Contents Based Analysis

  • Lee, Kwang-Seok;Kim, Young-Sub;Han, Hag-Yong;Hur, Kang-In
    • Journal of information and communication convergence engineering / Vol. 5, No. 3 / pp. 245-248 / 2007
  • In this paper, we study content-based analysis and classification of audio data according to the composition of a feature parameter database, in order to implement an audio data indexing and retrieval system. Audio data is classified into various primitive auditory types. We describe the analysis and extraction methods for the feature parameters available for audio data classification. We then compose the feature parameter database in index-group units and, focusing on the audio content, compare and analyze the degree to which the features are included in each audio category together with the indexing criteria. Based on these results, we compose feature vectors of the audio data according to the classification categories and simulate classification using a discriminant function.

Speaker Verification with the Constraint of Limited Data

  • Kumari, Thyamagondlu Renukamurthy Jayanthi;Jayanna, Haradagere Siddaramaiah
    • Journal of Information Processing Systems / Vol. 14, No. 4 / pp. 807-823 / 2018
  • The performance of a speaker verification system depends on the utterances of each speaker. To verify a speaker, important information has to be captured from the utterance. Under the constraint of limited data, speaker verification has become a challenging task, because the training and testing data amount to only a few seconds of speech. The feature vectors extracted with single frame size and rate (SFSR) analysis are not sufficient for training and testing speakers in speaker verification. This leads to poor speaker modeling during training and may not provide good decisions during testing. The problem can be addressed by increasing the number of feature vectors obtained from training and testing data of the same duration. For that, we use multiple frame size (MFS), multiple frame rate (MFR), and multiple frame size and rate (MFSR) analysis techniques for speaker verification under limited data conditions. These analysis techniques extract relatively more feature vectors during training and testing, which improves modeling and testing with limited data. To demonstrate this, we use mel-frequency cepstral coefficients (MFCC) and linear prediction cepstral coefficients (LPCC) as features. A Gaussian mixture model (GMM) and a GMM-universal background model (GMM-UBM) are used to model the speaker. The database used is NIST-2003. The experimental results indicate that MFS, MFR, and MFSR analysis perform substantially better than SFSR analysis. The experimental results also show that LPCC-based MFSR analysis performs better than the other analysis and feature extraction techniques.
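
The frame-multiplication idea in this abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: the 8 kHz sampling rate, the 20 ms / 10 ms SFSR setting, the frame-size and frame-shift sets, and the helper names are assumptions, and MFCC/LPCC extraction and GMM modeling are left out. It only shows why pooling frames from several (size, shift) combinations yields more feature vectors from the same few seconds of speech.

```python
# Hedged sketch: frame counts under SFSR vs. multiple frame size/rate analysis.
# Assumed (not from the paper): 8 kHz audio and the size/shift values below.

SAMPLE_RATE = 8000  # Hz, assumed


def ms_to_samples(n_ms):
    """Convert milliseconds to a whole number of samples."""
    return SAMPLE_RATE * n_ms // 1000


def frame_signal(samples, frame_len, shift):
    """Cut samples into frames of frame_len, starting every shift samples."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, shift)]


def multi_frame(samples, sizes_ms, shifts_ms):
    """Pool frames from every (frame size, frame shift) combination.

    One size and one shift gives SFSR; several sizes gives MFS;
    several shifts gives MFR; several of both gives MFSR.
    """
    frames = []
    for size in sizes_ms:
        for shift in shifts_ms:
            frames.extend(frame_signal(samples, ms_to_samples(size),
                                       ms_to_samples(shift)))
    return frames


if __name__ == "__main__":
    utterance = [0.0] * ms_to_samples(3000)  # 3 s placeholder signal
    setups = {
        "SFSR": ([20], [10]),
        "MFS": ([10, 15, 20, 25], [10]),
        "MFR": ([20], [5, 10, 15, 20]),
        "MFSR": ([10, 15, 20, 25], [5, 10, 15, 20]),
    }
    for name, (sizes, shifts) in setups.items():
        # Each frame would later become one MFCC or LPCC feature vector.
        print(name, len(multi_frame(utterance, sizes, shifts)))
```

With one 20 ms frame every 10 ms, a 3 s utterance gives 299 frames; every additional size or shift adds roughly another pass over the same audio.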

A Comparison on Independent Component Analysis and Principal Component Analysis -for Classification Analysis-

  • Kim, Dae-Hak;Lee, Ki-Lak
    • Journal of the Korean Data and Information Science Society / Vol. 16, No. 4 / pp. 717-724 / 2005
  • We often extract new features from the original ones to reduce the dimension of the feature space and to improve classification. In this paper, we show that a feature extraction method based on independent component analysis (ICA) can be used for classification. Entropy and mutual information are used to order the extracted features for selection. The classification performance of ICA-based features is compared with that of principal component analysis (PCA) on three real data sets.
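
As a hedged sketch of ordering extracted components by mutual information with the class label, the code below uses the plug-in estimate $I(X;Y) = H(X) + H(Y) - H(X,Y)$ on discretized feature values. The equal-width binning, `n_bins=4`, and the helper names are assumptions; the ICA or PCA projection that would produce the input columns is not shown.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())


def mutual_information(xs, ys):
    """Plug-in estimate I(X;Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def discretize(column, n_bins=4):
    """Equal-width binning of one real-valued feature column (assumed scheme)."""
    lo, hi = min(column), max(column)
    width = (hi - lo) / n_bins or 1.0  # constant column -> single bin
    return [min(int((v - lo) / width), n_bins - 1) for v in column]


def rank_by_mutual_information(rows, labels, n_bins=4):
    """Return (feature indices by decreasing MI with labels, MI scores)."""
    columns = list(zip(*rows))
    scores = [mutual_information(discretize(col, n_bins), labels)
              for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -scores[j])
    return order, scores


if __name__ == "__main__":
    # Hypothetical extracted components: column 0 separates the classes.
    rows = [[0.1, 1.0], [0.2, 2.0], [0.9, 1.0], [1.0, 2.0]]
    labels = [0, 0, 1, 1]
    print(rank_by_mutual_information(rows, labels))
```

Keeping the first k indices of `order` would give the reduced feature set passed to the classifier.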

FEROM: Feature Extraction and Refinement for Opinion Mining

  • Jeong, Ha-Na;Shin, Dong-Wook;Choi, Joong-Min
    • ETRI Journal / Vol. 33, No. 5 / pp. 720-730 / 2011
  • Opinion mining analyzes customer opinions in product reviews and provides meaningful information, including the polarity of the opinions. Feature extraction is important in opinion mining because customers do not normally express their opinions about a product holistically but separately for each of its individual features. However, previous research on feature-based opinion mining has produced unsatisfactory results because of drawbacks such as selecting features using only syntactic grammar information or treating features with similar meanings as different. To solve these problems, this paper proposes an enhanced feature extraction and refinement method called FEROM that effectively extracts correct features from review data by exploiting both the grammatical properties and the semantic characteristics of feature words, and refines the features by recognizing and merging similar ones. A series of experiments on actual online review data demonstrated that FEROM is highly effective at extracting and refining features for analyzing customer review data and ultimately contributes to accurate and functional opinion mining.

Feature Selection Effect of Classification Tree Using Feature Importance: Case of Credit Card Customer Churn Prediction

  • 윤한성
    • Journal of the Korea Society of Digital Industry and Information Management (디지털산업정보학회논문지) / Vol. 20, No. 2 / pp. 1-10 / 2024
  • To predict credit card customer churn accurately through data analysis, a model can be constructed with various machine learning algorithms, including decision trees. Feature importance has been used in several application areas to select input features that improve the performance of data analysis models. In this paper, a method that uses feature importance calculated with the MDI (mean decrease in impurity) method, and its effects, are investigated for the credit card customer churn prediction problem with classification trees. Compared with several random feature selections from the case data, a set of input features selected by higher feature importance values shows higher predictive power. This can be an efficient way to classify and choose the input features needed to improve prediction performance. The method described in this paper can serve as an alternative approach to selecting input features using feature importance when building and applying classification trees, including for credit card customer churn prediction.
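
A minimal sketch of MDI importance, assuming a small CART-style tree with Gini impurity, exhaustive threshold search, and a depth limit of 3 (choices not taken from the paper): each split adds the node's sample fraction times its impurity decrease to the importance of the splitting feature, the totals are normalized to sum to 1, and the top-k features are kept. scikit-learn's `DecisionTreeClassifier.feature_importances_` reports the same kind of impurity-based importance; this pure-Python version only makes the computation visible, and the helper names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (impurity decrease, feature, threshold) of the best x[f] <= t split."""
    parent, n = gini(labels), len(labels)
    best = (0.0, None, None)
    for f in range(len(rows[0])):
        # Exclude the largest value so the right child is never empty.
        for t in sorted(set(r[f] for r in rows))[:-1]:
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            gain = parent - (len(left) * gini(left) + len(right) * gini(right)) / n
            if gain > best[0]:
                best = (gain, f, t)
    return best


def mdi_importance(rows, labels, max_depth=3):
    """Sum (n_node / N) * impurity decrease per feature, normalized to 1."""
    total = len(labels)
    importance = [0.0] * len(rows[0])

    def grow(node_rows, node_labels, depth):
        if depth == max_depth or len(set(node_labels)) == 1:
            return
        gain, f, t = best_split(node_rows, node_labels)
        if f is None:
            return  # no split reduces impurity
        importance[f] += len(node_labels) / total * gain
        for keep_left in (True, False):
            part = [(r, y) for r, y in zip(node_rows, node_labels)
                    if (r[f] <= t) == keep_left]
            grow([r for r, _ in part], [y for _, y in part], depth + 1)

    grow(rows, labels, 0)
    s = sum(importance)
    return [v / s for v in importance] if s else importance


def select_top_k(importance, k):
    """Indices of the k features with the highest importance."""
    return sorted(range(len(importance)), key=lambda j: -importance[j])[:k]


if __name__ == "__main__":
    # Hypothetical data: feature 0 decides churn, 1 is noise, 2 is constant.
    X = [[0, 5, 1], [1, 3, 1], [2, 4, 1], [3, 5, 1], [4, 3, 1], [5, 4, 1]]
    y = [0, 0, 0, 1, 1, 1]
    imp = mdi_importance(X, y)
    print(imp, select_top_k(imp, 1))
```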

The Content Based Analysis According to the Composition of the Feature Parameters for the Auditory Data

  • 한학용;허강인;김수훈
    • The Journal of the Acoustical Society of Korea (한국음향학회지) / Vol. 21, No. 2 / pp. 182-189 / 2002
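  • This paper studies the content analysis and classification of audio data based on a pool of feature parameters constructed for audio signals, with the goal of implementing an audio indexing and retrieval system. Audio data is classified into a variety of basic audio types. The paper analyzes the feature parameters usable for classifying audio data and discusses how to extract them. The feature parameter pool is then organized into index-group units, and the degree to which the selected features are present in each audio category, together with the indexing criteria, is compared and analyzed with respect to the content of the audio data. Based on these results, a classification procedure was designed and a simulation of audio signal classification was carried out.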

Combined Features with Global and Local Features for Gas Classification

  • Choi, Sang-Il
    • Journal of the Korea Society of Computer and Information (한국컴퓨터정보학회논문지) / Vol. 21, No. 9 / pp. 11-18 / 2016
  • In this paper, we propose a gas classification method using combined features for an electronic nose system that performs well even when some loss occurs in the measured data samples. We first divide the entire measurement of a data sample into three local sections, stabilization, exposure, and purge, and extract local features from each section. Using discriminant analysis, we measure the amount of discriminative information in the local features. The local features with a large amount of discriminative information are then chosen and combined with the global features extracted from the entire measurement section of the data sample. The experimental results show that the combined features from the proposed method give better classification performance than the other feature types on a variety of volatile organic compound data, especially when there is data loss.
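
The select-then-combine step can be sketched under stated assumptions: a per-feature Fisher ratio (between-class over within-class scatter) stands in for the paper's discriminant-analysis measure, `local_rows` is assumed to hold the concatenated stabilization, exposure, and purge features of each sample, and `k` is a free parameter. The function names are hypothetical; this illustrates the idea rather than the author's method.

```python
from statistics import mean


def fisher_ratio(values, labels):
    """Between-class scatter divided by within-class scatter for one feature."""
    overall = mean(values)
    between = within = 0.0
    for c in set(labels):
        group = [v for v, y in zip(values, labels) if y == c]
        m = mean(group)
        between += len(group) * (m - overall) ** 2
        within += sum((v - m) ** 2 for v in group)
    if within:
        return between / within
    # Perfectly separated feature ranks first; a constant feature ranks last.
    return float("inf") if between else 0.0


def combine_features(local_rows, global_rows, labels, k):
    """Keep the k most discriminative local features, then append the globals."""
    columns = list(zip(*local_rows))
    scores = [fisher_ratio(col, labels) for col in columns]
    keep = sorted(range(len(columns)), key=lambda j: -scores[j])[:k]
    combined = [[loc[j] for j in keep] + list(glob)
                for loc, glob in zip(local_rows, global_rows)]
    return combined, keep


if __name__ == "__main__":
    # Hypothetical samples: 3 local features and 1 global feature each.
    local = [[1.0, 0.5, 3.0], [1.2, 0.4, 1.0], [3.0, 0.6, 2.0], [3.2, 0.5, 2.5]]
    glob = [[10.0], [11.0], [12.0], [13.0]]
    print(combine_features(local, glob, [0, 0, 1, 1], k=1))
```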

Study on Correlation-based Feature Selection in an Automatic Quality Inspection System using Support Vector Machine (SVM)

  • 송동환;오영광;김남훈
    • Journal of the Korean Institute of Industrial Engineers (대한산업공학회지) / Vol. 42, No. 6 / pp. 370-376 / 2016
  • Manufacturing data analysis and its applications are gaining great popularity in various industries. Despite the fast advancement of big data analysis technology, however, the manufacturing quality data monitored by an automated inspection system is sometimes not reliable enough because of the complex patterns of product quality. In this study, we therefore aim to define the level of trust in an automated quality inspection system and to improve the reliability of the quality inspection data. Using correlation analysis and feature selection, this paper presents a method for improving inspection accuracy and efficiency in an SVM-based automatic product quality inspection system that uses thermal image data, in an auto part manufacturing case. The proposed method is implemented in the sealer dispensing process of automobile manufacturing and verified by analyzing the optimal feature selection from the quality analysis results.
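
One common form of correlation-based feature selection can be sketched as follows; the abstract does not give the paper's exact procedure, so this is an assumption. Features are ranked by absolute Pearson correlation with the quality label, weakly relevant ones are dropped, and a feature is skipped when it is highly correlated with one already kept. The thresholds 0.3 and 0.9 and the function names are hypothetical illustration values, and the SVM training on the kept columns is not shown.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def correlation_select(rows, target, min_relevance=0.3, max_redundancy=0.9):
    """Return indices of relevant, mutually non-redundant feature columns."""
    columns = list(zip(*rows))
    relevance = [abs(pearson(col, target)) for col in columns]
    order = sorted(range(len(columns)), key=lambda j: -relevance[j])
    kept = []
    for j in order:
        if relevance[j] < min_relevance:
            break  # all remaining features are even less relevant
        # Skip a feature that nearly duplicates one already kept.
        if all(abs(pearson(columns[j], columns[i])) <= max_redundancy for i in kept):
            kept.append(j)
    return kept


if __name__ == "__main__":
    # Hypothetical thermal-image features per part; target 1 = defective.
    rows = [[1, 1, 1], [2, 2, -1], [3, 3, -1], [4, 5, 1]]
    print(correlation_select(rows, [0, 0, 1, 1]))
```

Here feature 1 is relevant on its own but nearly duplicates feature 0, and feature 2 is uncorrelated with the label, so only feature 0 survives.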

Feature Selection via Embedded Learning Based on Tangent Space Alignment for Microarray Data

  • Ye, Xiucai;Sakurai, Tetsuya
    • Journal of Computing Science and Engineering / Vol. 11, No. 4 / pp. 121-129 / 2017
  • Feature selection has been widely established as an efficient technique for microarray data analysis. Feature selection aims to find the most important feature (gene) subset of a given dataset according to its relevance to the current target. Unsupervised feature selection is considered challenging due to the lack of label information. In this paper, we propose a novel unsupervised feature selection method that incorporates embedded learning and $\ell_{2,1}$-norm sparse regression into one framework to select genes in microarray data analysis. Local tangent space alignment is applied during embedded learning to preserve the local data structure. The $\ell_{2,1}$-norm sparse regression acts as a constraint that helps learn the gene weights jointly, so that the proposed method selects the informative genes that better capture the natural classes of interest among the samples. We provide an effective algorithm to solve the optimization problem in our method. Finally, to validate its efficacy, we evaluate the proposed method on real microarray gene expression datasets. The experimental results demonstrate that the proposed method achieves quite promising performance.
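
For reference, with a weight matrix $W$ holding one row $w^i$ per gene, the $\ell_{2,1}$-norm is $\|W\|_{2,1} = \sum_i \|w^i\|_2 = \sum_i \sqrt{\sum_j W_{ij}^2}$. Penalizing it pushes entire rows toward zero, which is why genes are commonly ranked by their row norms after learning. The sketch below computes only the norm and that ranking; the matrix and function names are hypothetical, and the paper's optimization over $W$ is not reproduced.

```python
from math import sqrt


def row_norms(W):
    """Euclidean norm of each row (one row per gene)."""
    return [sqrt(sum(w * w for w in row)) for row in W]


def l21_norm(W):
    """l_{2,1}-norm: the sum of the row-wise l_2 norms."""
    return sum(row_norms(W))


def top_genes(W, k):
    """Indices of the k genes with the largest weight-row norms."""
    norms = row_norms(W)
    return sorted(range(len(W)), key=lambda i: -norms[i])[:k]


if __name__ == "__main__":
    # Hypothetical learned weights: 3 genes x 2 embedding dimensions.
    W = [[3.0, 4.0], [0.0, 0.0], [1.0, 0.0]]
    print(l21_norm(W), top_genes(W, 2))
```

A zero row, like gene 1 here, contributes nothing to the norm and is never selected, which is the row-sparsity effect the regression term is meant to encourage.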