• Title/Summary/Keyword: Data feature analysis

Feature Selection for Classification of Mass Spectrometric Proteomic Data Using Random Forest (단백체 스펙트럼 데이터의 분류를 위한 랜덤 포리스트 기반 특성 선택 알고리즘)

  • Ohn, Syng-Yup;Chi, Seung-Do;Han, Mi-Young
    • Journal of the Korea Society for Simulation / v.22 no.4 / pp.139-147 / 2013
  • This paper proposes a novel Random Forest-based feature selection method for mass spectrometric proteomic data. The method includes an effective preprocessing step that filters out a large number of redundant, highly correlated features, and then applies a tournament strategy to obtain an optimal feature subset. Experiments on three public datasets (Ovarian 4-3-02, Ovarian 7-8-02, and Prostate) show that the new method achieves high performance compared with widely used methods, with a balanced rate of specificity and sensitivity.
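
A minimal sketch of the kind of correlation-based redundancy filter the abstract mentions. It assumes Pearson correlation, a greedy keep-first ordering, and a hypothetical threshold of 0.95; the paper's actual preprocessing and its tournament strategy are not specified in the abstract and are not reproduced here.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)

def filter_correlated(columns, threshold=0.95):
    """Return indices of feature columns kept after a greedy redundancy filter.

    A column is dropped when its absolute correlation with any column
    already kept exceeds `threshold` (an assumed value, not from the paper).
    """
    kept = []
    for j, col in enumerate(columns):
        if all(abs(pearson(col, columns[k])) <= threshold for k in kept):
            kept.append(j)
    return kept
```

For example, `filter_correlated([[1, 2, 3, 4], [2, 4, 6, 8], [1, 0, 1, 0]])` drops the second column, which is a scaled copy of the first.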

Stepwise Volume Decomposition Considering Design Feature Recognition (설계 특징형상 인식을 고려한 단계적 볼륨 분해)

  • Kim, Byung Chul;Kim, Ikjune;Han, Soonhung;Mun, Duhwan
    • Korean Journal of Computational Design and Engineering / v.18 no.1 / pp.71-82 / 2013
  • To make product designs easy to modify, modern CAD systems adopt the feature-based model as their primary representation and use the boundary representation (B-rep) model as their secondary representation. IGES and STEP AP203 edition 1 are the representative standard formats for the exchange of CAD files. Unfortunately, both of them support only the B-rep model. As a result, feature data are lost during CAD file exchange based on these standards. This loss makes CAD model modification difficult and prevents the transfer of design intent. To resolve this problem, a tool is needed that recognizes design features from a B-rep model and then reconstructs a feature-based model from the recognized features. As the first part of this research, this paper presents a method for decomposing a B-rep model into simple volumes suitable for design feature recognition. The results of experiments with a prototype system are analyzed, and future research issues are suggested on the basis of that analysis.

Indoor Path Recognition Based on Wi-Fi Fingerprints

  • Donggyu Lee;Jaehyun Yoo
    • Journal of Positioning, Navigation, and Timing / v.12 no.2 / pp.91-100 / 2023
  • Existing indoor localization methods based on Wi-Fi fingerprinting have a high collection cost and relatively low accuracy, and therefore require correction through integration with other technologies. This paper proposes a new method that significantly reduces collection costs compared to existing Wi-Fi fingerprinting methods. Furthermore, it does not require data to be labeled at collection time and can estimate pedestrian travel paths even in large indoor spaces. The proposed path estimation process is as follows. Feature areas are set up near intersections of the indoor space, and unlabeled data are collected while moving through these feature areas. The collected data are processed using Kernel Linear Discriminant Analysis (KLDA), and the valley points of the Euclidean distance between pairs of data points are found in the resulting feature space. Training data are built by labeling the data at each valley point, along with some nearby data, with the corresponding feature area number, and by labeling the data between two valley points as path data between the corresponding feature areas. Finally, for testing, data are collected while moving randomly through the indoor space, KLDA is applied in the same way as for the training data to build test data, the K-Nearest Neighbor (K-NN) algorithm is applied, and the movement path is estimated with a correction algorithm that considers only routes reachable from the most recently estimated location. Accuracy was verified by comparing the true paths in the indoor space with those estimated by the proposed method, which achieved approximately 90.8% and 81.4% accuracy in the two experimental spaces, respectively.
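
The final step, K-NN restricted to reachable routes, can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: the KLDA projection is omitted, the `adjacency` map of reachable labels is a hypothetical input, and majority voting over Euclidean distance is assumed.

```python
import math
from collections import Counter

def knn_with_reachability(train, sample, last_label, adjacency, k=3):
    """Classify `sample` by k-NN, considering only labels reachable from `last_label`.

    train:      list of (feature_vector, label) pairs
    adjacency:  dict mapping a label to the labels reachable from it (assumed input)
    last_label: most recently estimated label, or None for the first estimate
    """
    if last_label is None:
        allowed = None  # no constraint on the first estimate
    else:
        allowed = {last_label} | set(adjacency.get(last_label, ()))
    candidates = [
        (math.dist(vec, sample), label)
        for vec, label in train
        if allowed is None or label in allowed
    ]
    if not candidates:
        return last_label  # nothing reachable in the training data: stay put
    candidates.sort(key=lambda c: c[0])
    votes = Counter(label for _, label in candidates[:k])
    return votes.most_common(1)[0][0]
```

With the constraint active, a sample that looks closest to a distant area is assigned to the nearest area that can actually be reached from the previous estimate.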

Designing VOD Service Domain Feature Model and VOD Service Developing Process Based-on it (VOD 서비스 도메인 피처모델과 이를 기반한 VOD 서비스 개발 프로세스)

  • KO, Kwangil
    • Convergence Security Journal / v.17 no.3 / pp.51-57 / 2017
  • VOD services provide additional revenue for broadcasting companies on top of the existing subscription fees and advertisement-based revenue. Therefore, each broadcasting company develops its own VOD service and improves it frequently. This leads to the development of new VOD services, so developers are looking for ways to handle the frequent development needs effectively. Against this background, we conducted underlying research on applying a feature-oriented analysis model to the development of VOD services. The feature-oriented analysis model used in this study is Feature-Oriented Domain Analysis (FODA), developed by the SEI at Carnegie Mellon University. FODA provides a tool for specifying a feature model of a software domain, based on which developers determine the configuration of the software together with customers. This study developed a feature model of the VOD service domain and devised the functionalities and test cases in an integrated manner with the feature model. Additionally, we proposed a VOD service development process that uses the feature model, function specification, and test cases.
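
As a rough illustration of what a feature model captures, the sketch below represents a tree of mandatory and optional features and checks a configuration against it. The class, the function, and the VOD feature names in the usage note are hypothetical; the paper's actual feature model is not given in the abstract, and FODA's alternative features are left out for brevity.

```python
from dataclasses import dataclass, field

@dataclass
class Feature:
    name: str
    mandatory: bool = True              # False marks an optional feature
    children: list = field(default_factory=list)

def validate(feature, selected):
    """Return violations for a configuration given as a set of feature names.

    Only one rule is checked: every mandatory child of a selected
    feature must also be selected.
    """
    errors = []
    if feature.name not in selected:
        return errors
    for child in feature.children:
        if child.name in selected:
            errors.extend(validate(child, selected))
        elif child.mandatory:
            errors.append(f"missing mandatory feature: {child.name}")
    return errors
```

For instance, with a root `Feature("VOD", children=[Feature("Catalog"), Feature("Payment"), Feature("Recommendation", mandatory=False)])`, the configuration `{"VOD", "Catalog"}` is reported as missing `Payment`, while leaving out `Recommendation` is allowed.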

Laver Farm Feature Extraction From Landsat ETM+ Using Independent Component Analysis

  • Han J. G.;Yeon Y. K.;Chi K. H.;Hwang J. H.
    • Proceedings of the KSRS Conference / 2004.10a / pp.359-362 / 2004
  • The ICA-based feature extraction algorithm proposed in this paper detects a target feature in a multi-dimensional image, where the reflectance spectrum of each pixel is assumed to be a linear mixture of the spectra of different types of material (the target feature and background features). A Landsat ETM+ satellite image has a multi-dimensional data structure in which the target feature to be extracted is mixed with various background features. To eliminate the background features (tidal flat, seawater, etc.) around the target feature (laver farm) effectively, the pixel spectrum sphere of the target feature is projected onto the spectrum sphere orthogonal to the background features. What remains in the pixel after this projection can be regarded as the target feature's spectrum with the background features' spectrum removed. To verify the ICA-based feature extraction method, we applied it to laver farm feature extraction from a Landsat ETM+ satellite image. We also compared it with the most popular traditional method, maximum likelihood, in terms of feature extraction accuracy and the level of noise remaining after extraction. The results show that the proposed method effectively eliminates the background features mixed into the spectrum when extracting the target feature, and that it has excellent detection efficiency.
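
The projection step described above can be sketched as an orthogonal projection that removes the background component from a pixel spectrum. This covers only that one step, under the assumption that the background spectra are already known; the ICA stage that would estimate them is not shown, and the function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt orthonormalization; near-dependent vectors are skipped."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > eps:
            basis.append([wi / norm for wi in w])
    return basis

def remove_background(pixel, background_spectra):
    """Project `pixel` onto the subspace orthogonal to the background spectra."""
    residual = list(pixel)
    for b in orthonormal_basis(background_spectra):
        c = dot(residual, b)
        residual = [r - c * bi for r, bi in zip(residual, b)]
    return residual
```

The residual is orthogonal to every background spectrum, so any energy left in it cannot be explained by the background alone.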

Comparison of Feature Selection Methods Applied on Risk Prediction for Hypertension (고혈압 위험 예측에 적용된 특징 선택 방법의 비교)

  • Khongorzul, Dashdondov;Kim, Mi-Hye
    • KIPS Transactions on Software and Data Engineering / v.11 no.3 / pp.107-114 / 2022
  • In this paper, we enhance the risk prediction of hypertension by applying feature selection to the Korean National Health and Nutrition Examination Survey (KNHANES) database of the Korea Centers for Disease Control and Prevention. The study identifies various risk factors correlated with chronic hypertension. The paper is divided into three parts. First, the data preprocessing step removes missing values and performs a z-transformation. Next, the feature selection (FS) step applies a factor analysis (FA)-based feature selection method to the dataset, and feature importance (FI) and multicollinearity analysis (MC) are compared on the basis of FS. Finally, in the predictive analysis stage, the selected features are used to detect and predict the risk of hypertension. We compare the accuracy, f-score, area under the ROC curve (AUC), and mean standard error (MSE) of each classification model. In the tests, the proposed MC-FA-RF model achieved the highest accuracy of 80.12%, with an MSE of 0.106, an f-score of 83.49%, and an AUC of 85.96%. These results demonstrate that the proposed MC-FA-RF method outperforms the other methods for hypertension risk prediction.
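
The preprocessing step (removing missing values, then z-transformation) can be sketched as below. The sketch assumes that rows containing any missing value, marked here as `None`, are dropped, and that the population standard deviation is used; the abstract does not state either choice.

```python
import statistics

def drop_missing(rows):
    """Keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r)]

def z_transform(rows):
    """Standardize each column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) for c in cols]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(r, means, stds)]
        for r in rows
    ]
```

Constant columns are mapped to 0.0 rather than dividing by zero, which is one common convention among several.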

Health Risk Management using Feature Extraction and Cluster Analysis considering Time Flow (시간흐름을 고려한 특징 추출과 군집 분석을 이용한 헬스 리스크 관리)

  • Kang, Ji-Soo;Chung, Kyungyong;Jung, Hoill
    • Journal of the Korea Convergence Society / v.12 no.1 / pp.99-104 / 2021
  • In this paper, we propose health risk management using feature extraction and cluster analysis that take the passage of time into account. The proposed method proceeds in three steps. The first is the pre-processing and feature extraction step. It collects the user's lifelog using a wearable device, removes incomplete data, errors, noise, and contradictory data, and processes missing values. Then, for feature extraction, important variables are selected through principal component analysis, and related data are grouped using the correlation coefficient and covariance. Second, to analyze the features extracted from the lifelog, dynamic clustering is performed with the K-means algorithm in consideration of the passage of time. New data are clustered with a similarity distance measure based on the increment of the sum of squared errors. The third step extracts information about each cluster, again considering the passage of time. Using a health decision-making system built on the feature clusters, risks can then be managed through factors such as physical characteristics, lifestyle habits, disease status, health care event occurrence risk, and predictability. The performance evaluation compares the proposed method with fuzzy and kernel-based clustering using Precision, Recall, and F-measure, and the proposed method performs well in this evaluation. The proposed method therefore makes it possible to predict accurately and manage appropriately the user's potential health risk by using the similarity with patients.
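
A minimal sketch of assigning a new data point by the increment of the sum of squared errors (SSE). It relies on the standard identity that adding a point x to a cluster of n points with centroid c increases that cluster's SSE by n/(n+1) · ‖x − c‖². Whether the paper uses exactly this form is an assumption, and the function names are illustrative.

```python
def sse_increment(point, centroid, size):
    """Increase in a cluster's SSE if `point` joins it (size = current point count)."""
    d2 = sum((p - c) ** 2 for p, c in zip(point, centroid))
    return size / (size + 1) * d2

def assign_incremental(point, centroids, sizes):
    """Assign `point` to the cluster with the smallest SSE increment.

    Updates `centroids` and `sizes` in place and returns the chosen index.
    """
    best = min(
        range(len(centroids)),
        key=lambda i: sse_increment(point, centroids[i], sizes[i]),
    )
    n = sizes[best]
    centroids[best] = [(c * n + p) / (n + 1) for c, p in zip(centroids[best], point)]
    sizes[best] = n + 1
    return best
```

Because the centroid is updated as a running mean, new lifelog records can be absorbed one at a time without re-running K-means over all earlier data.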

A Visualization System for Multiple Heterogeneous Network Security Data and Fusion Analysis

  • Zhang, Sheng;Shi, Ronghua;Zhao, Jue
    • KSII Transactions on Internet and Information Systems (TIIS) / v.10 no.6 / pp.2801-2816 / 2016
  • Owing to their low scalability, weak support for big data, insufficient collaborative data analysis, and inadequate situational awareness, traditional methods fail to meet the needs of security data analysis. This paper proposes visualization methods that fuse multi-source security data to grasp the network situation. First, data sources are classified by their collection positions, with the security data objects taken from three different layers. Second, a Heatmap is adopted to show host status, a Treemap is used to visualize Netflow logs, and a radial Node-link diagram is employed to express IPS logs. Finally, a Labeled Treemap is devised to fuse data at the data level, and time-series features are extracted to fuse data at the feature level. Comparative analyses with prize-winning works show that this method offers substantial advantages, helping network analysts fuse data features and better understand the network security situation in a unified, convenient, and accurate way.

An Underlying Research for Developing VOD Service using Feature-Oriented Analysis Model (피처지향 분석모델을 적용한 VOD 서비스 개발을 위한 기반연구)

  • KO, Kwangil
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.7 / pp.26-32 / 2017
  • VOD (Video-on-Demand) services are considered to be one of the most successful data broadcasting services, along with Electronic Program Guides (EPGs). In particular, VOD services provide supplementary revenue for broadcasting companies in addition to the existing subscription fees and advertisement-based revenue. Therefore, each broadcasting company has developed its own VOD service and constantly seeks to improve it. This leads to the development of new VOD services, so developers are looking for ways to handle the frequent development needs effectively. Against this background, we conducted underlying research on applying a feature-oriented analysis model to the development of VOD services. The feature-oriented analysis model used in this study is Feature-Oriented Domain Analysis (FODA), developed by the SEI at Carnegie Mellon University. FODA provides a tool for specifying the feature model of a software domain, based on which the developers can determine the configuration of the software together with the customers. This study developed a feature model of the VOD service domain and devised the functionalities and test cases in an integrated manner with the feature model. Additionally, we proposed a VOD service development process that uses the feature model, function specification, and test cases.

Detection of multi-type data anomaly for structural health monitoring using pattern recognition neural network

  • Gao, Ke;Chen, Zhi-Dan;Weng, Shun;Zhu, Hong-Ping;Wu, Li-Ying
    • Smart Structures and Systems / v.29 no.1 / pp.129-140 / 2022
  • The effectiveness of system identification, damage detection, condition assessment, and other structural analyses relies heavily on the accuracy and reliability of the measured data in structural health monitoring (SHM) systems. However, data anomalies often occur in SHM systems, leading to inaccurate and untrustworthy analysis results. Therefore, anomalies in the raw data should be detected and cleansed before further analysis. Previous studies on data anomaly detection focused mainly on a single type of anomaly, for denoising or removing outliers, while existing methods for detecting multiple data anomalies are usually time-consuming. For these reasons, recognising multiple anomaly patterns for real-time alarm and analysis in field monitoring remains a challenge. To achieve efficient and accurate detection of multi-type data anomalies in field SHM, this study proposes a pattern-recognition-based data anomaly detection method that consists of three main steps: feature extraction from the long time-series data samples, training of a pattern recognition neural network (PRNN) on the features, and finally detection of data anomalies. The feature extraction step remarkably reduces the time cost of network training, making the detection process very fast. The performance of the proposed method is verified on SHM data from two practical long-span bridges. The results indicate that the proposed method recognises multiple data anomalies with very high accuracy and low calculation cost, demonstrating its applicability to field monitoring.
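
To illustrate why feature extraction reduces training cost, the sketch below condenses a long time-series sample into a handful of summary values that a classifier could take as input. The specific features (mean, standard deviation, minimum, maximum, missing ratio) are assumptions chosen for illustration; the abstract does not list the features the authors use, and the PRNN itself is not shown.

```python
import statistics

def extract_features(series):
    """Summarize one non-empty window of sensor readings; None marks a missing sample.

    Returns [mean, population std, min, max, missing ratio].
    """
    valid = [v for v in series if v is not None]
    missing_ratio = 1 - len(valid) / len(series)
    if not valid:
        return [0.0, 0.0, 0.0, 0.0, missing_ratio]
    return [
        statistics.fmean(valid),
        statistics.pstdev(valid),
        min(valid),
        max(valid),
        missing_ratio,
    ]
```

A window of many thousands of samples is reduced to five numbers, so the network's input layer stays small regardless of the sampling rate.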