• 제목/요약/키워드: High-dimensional datasets

검색결과 46건 처리시간 0.023초

UFKLDA: An unsupervised feature extraction algorithm for anomaly detection under cloud environment

  • Wang, GuiPing;Yang, JianXi;Li, Ren
    • ETRI Journal
    • /
    • 제41권5호
    • /
    • pp.684-695
    • /
    • 2019
  • In a cloud environment, performance degradation, or even downtime, of virtual machines (VMs) usually appears gradually along with anomalous states of VMs. To better characterize the state of a VM, all possible performance metrics are collected. For such high-dimensional datasets, this article proposes a feature extraction algorithm based on unsupervised fuzzy linear discriminant analysis with kernel (UFKLDA). By introducing the kernel method, UFKLDA can not only effectively deal with non-Gaussian datasets but also implement nonlinear feature extraction. Two sets of experiments were undertaken. In discriminability experiments, this article introduces quantitative criteria to measure discriminability among all classes of samples. The results show that UFKLDA improves discriminability compared with other popular feature extraction algorithms. In detection accuracy experiments, this article computes accuracy measures of an anomaly detection algorithm (i.e., C-SVM) on the original performance metrics and extracted features. The results show that anomaly detection with features extracted by UFKLDA improves the accuracy of detection in terms of sensitivity and specificity.

Motion classification using distributional features of 3D skeleton data

  • Woohyun Kim;Daeun Kim;Kyoung Shin Park;Sungim Lee
    • Communications for Statistical Applications and Methods
    • /
    • 제30권6호
    • /
    • pp.551-560
    • /
    • 2023
  • Recently, there has been significant research into the recognition of human activities using three-dimensional sequential skeleton data captured by the Kinect depth sensor. Many of these studies employ deep learning models. This study introduces a novel feature selection method for this data and analyzes it using machine learning models. Due to the high-dimensional nature of the original Kinect data, effective feature extraction methods are required to address the classification challenge. In this research, we propose using the first four moments as predictors to represent the distribution of joint sequences and evaluate their effectiveness using two datasets: The exergame dataset, consisting of three activities, and the MSR daily activity dataset, composed of ten activities. The results show that the accuracy of our approach outperforms existing methods on average across different classifiers.

Prediction of Venous Trans-Stenotic Pressure Gradient Using Shape Features Derived From Magnetic Resonance Venography in Idiopathic Intracranial Hypertension Patients

  • Chao Ma;Haoyu Zhu;Shikai Liang;Yuzhou Chang;Dapeng Mo;Chuhan Jiang;Yupeng Zhang
    • Korean Journal of Radiology
    • /
    • 제25권1호
    • /
    • pp.74-85
    • /
    • 2024
  • Objective: Idiopathic intracranial hypertension (IIH) is a condition of unknown etiology associated with venous sinus stenosis. This study aimed to develop a magnetic resonance venography (MRV)-based radiomics model for predicting a high trans-stenotic pressure gradient (TPG) in IIH patients diagnosed with venous sinus stenosis. Materials and Methods: This retrospective study included 105 IIH patients (median age [interquartile range], 35 years [27-42 years]; female:male, 82:23) who underwent MRV and catheter venography complemented by venous manometry. Contrast enhanced-MRV was conducted under 1.5 Tesla system, and the images were reconstructed using a standard algorithm. Shape features were derived from MRV images via the PyRadiomics package and selected by utilizing the least absolute shrinkage and selection operator (LASSO) method. A radiomics score for predicting high TPG (≥ 8 mmHg) in IIH patients was formulated using multivariable logistic regression; its discrimination performance was assessed using the area under the receiver operating characteristic curve (AUROC). A nomogram was constructed by incorporating the radiomics scores and clinical features. Results: Data from 105 patients were randomly divided into two distinct datasets for model training (n = 73; 50 and 23 with and without high TPG, respectively) and testing (n = 32; 22 and 10 with and without high TPG, respectively). Three informative shape features were identified in the training datasets: least axis length, sphericity, and maximum three-dimensional diameter. The radiomics score for predicting high TPG in IIH patients demonstrated an AUROC of 0.906 (95% confidence interval, 0.836-0.976) in the training dataset and 0.877 (95% confidence interval, 0.755-0.999) in the test dataset. The nomogram showed good calibration. Conclusion: Our study presents the feasibility of a novel model for predicting high TPG in IIH patients using radiomics analysis of noninvasive MRV-based shape features. This information may aid clinicians in identifying patients who may benefit from stenting.

고차원 대용량 자료분석의 현재 동향 (Current trends in high dimensional massive data analysis)

  • 장원철;김광수;김정연
    • 응용통계연구
    • /
    • 제29권6호
    • /
    • pp.999-1005
    • /
    • 2016
  • 빅 데이터의 출현은 여러가지 과학적 난제에 대답 할 수 있는 기회를 제공하지만 흥미로운 도전을 또한 제공한다. 이러한 빅데이터의 주요 특징으로 "고차원"과 "대용량"을 들 수가 있다. 본 논문은 이러한 두 가지 특징에 동반되는 다음과 같은 도전문제에 대한 개요를 제시한다 : (1) 고차원 자료에서의 소음 축적과 위 상관 관계; (ii) 대용량 자료분석을 위한 계산 확장성. 또한 본 논문에서는 재난예측, 디지털 인문학과 세이버메트릭스 등 다양한 분야에서 빅 데이터의 다양한 응용사례를 제공한다.

Comparison of accuracy between digital and conventional implant impressions: two and three dimensional evaluations

  • Bi, Chuang;Wang, Xingyu;Tian, Fangfang;Qu, Zhe;Zhao, Jiaming
    • The Journal of Advanced Prosthodontics
    • /
    • 제14권4호
    • /
    • pp.236-249
    • /
    • 2022
  • PURPOSE. The present study compared the accuracy between digital and conventional implant impressions. MATERIALS AND METHODS. The experimental models were divided into six groups depending on the implant location and the scanning span. Digital impressions were captured using the intraoral optical scanner TRIOS (3Shape, Copenhagen, Denmark). Conventional impressions were taken with the monophase impression material based on addition-cured silicones, Honigum-Mono (DMG, Hamburg, Germany). A high-precision laboratory scanner D900 (3Shape, Copenhagen, Denmark) was used to obtain digital data of resin models and stone casts. Surface tessellation language (STL) datasets from scanner were imported into the analysis software Geomagic Qualify 14 (3D Systems, Rock Hill, SC, USA), and scan body deviations were determined through two-dimensional and three-dimensional analyses. Each scan body was measured five times. The Sidak t test was used to analyze the experimental data. RESULTS. Implant position and scanning distance affected the impression accuracy. For a unilateral arch implant and the mandible models with two implants, no significant difference was observed in the accuracy between the digital and conventional implant impressions on scan bodies; however, the corresponding differences for trans-arch implants and mandible with six implants were extremely significant (P<.001). CONCLUSION. For short-span scanning, the accuracy of digital and conventional implant impressions did not differ significantly. For long-span scanning, the precision of digital impressions was significantly inferior to that of the traditional impressions.

베이즈 정보 기준을 활용한 분할-정복 벌점화 분위수 회귀 (Model selection via Bayesian information criterion for divide-and-conquer penalized quantile regression)

  • 강종경;한석원;방성완
    • 응용통계연구
    • /
    • 제35권2호
    • /
    • pp.217-227
    • /
    • 2022
  • 분위수 회귀 모형은 변수에 숨겨진 복잡한 정보를 살펴보기 위한 효율적인 도구를 제공하는 장점을 바탕으로 많은 분야에서 널리 사용되고 있다. 그러나 현대의 대용량-고차원 데이터는 계산 시간 및 저장공간의 제한으로 인해 분위수 회귀 모형의 추정을 매우 어렵게 만든다. 분할-정복은 전체 데이터를 계산이 용이한 여러개의 부분집합으로 나눈 다음 각 분할에서의 요약 통계량만을 이용하여 전체 데이터의 추정량을 재구성하는 기법이다. 본 연구에서는 분할-정복 기법을 벌점화 분위수 회귀에 적용하고 베이즈 정보기준을 활용하여 변수를 선택하는 방법에 관하여 연구하였다. 제안 방법은 분할 수를 적절하게 선택하였을 때, 전체 데이터로 계산한 일반적인 분위수 회귀 추정량만큼 변수 선택의 측면에서 일관된 결과를 제공하면서 계산 속도의 측면에서 효율적이다. 이러한 제안된 방법의 장점은 시뮬레이션 데이터 및 실제 데이터 분석을 통해 확인하였다.

CREATING MULTIPLE CLASSIFIERS FOR THE CLASSIFICATION OF HYPERSPECTRAL DATA;FEATURE SELECTION OR FEATURE EXTRACTION

  • Maghsoudi, Yasser;Rahimzadegan, Majid;Zoej, M.J.Valadan
    • 대한원격탐사학회:학술대회논문집
    • /
    • 대한원격탐사학회 2007년도 Proceedings of ISRS 2007
    • /
    • pp.6-10
    • /
    • 2007
  • Classification of hyperspectral images is challenging. A very high dimensional input space requires an exponentially large amount of data to adequately and reliably represent the classes in that space. In other words in order to obtain statistically reliable classification results, the number of necessary training samples increases exponentially as the number of spectral bands increases. However, in many situations, acquisition of the large number of training samples for these high-dimensional datasets may not be so easy. This problem can be overcome by using multiple classifiers. In this paper we compared the effectiveness of two approaches for creating multiple classifiers, feature selection and feature extraction. The methods are based on generating multiple feature subsets by running feature selection or feature extraction algorithm several times, each time for discrimination of one of the classes from the rest. A maximum likelihood classifier is applied on each of the obtained feature subsets and finally a combination scheme was used to combine the outputs of individual classifiers. Experimental results show the effectiveness of feature extraction algorithm for generating multiple classifiers.

  • PDF

재분석자료들을 이용한 최근 35년(1979~2013) 동북아시아 상층제트의 변동특성 (Characteristic Variations of Upper Jet Stream over North-East Asian Region during the Recent 35 Years (1979~2013) Based on Four Reanalysis Datasets)

  • 소은미;서명석
    • 대기
    • /
    • 제25권2호
    • /
    • pp.235-248
    • /
    • 2015
  • In this study, we analyzed the three dimensional variations (latitude, longitude, and height of Jet core) and wind speed of upper Jet stream in the East Asian region using recent 35 years (1979~2013) of four reanalysis data (NCEP-R2, MERRA, ERA-Interim. and JRA-55). Most of Jet core is located in $30.0{\sim}37.5^{\circ}N$ and $13.0{\sim}157.5^{\circ}E$ although there are slight differences among the four reanalysis data. The wind speed differences among reanalysis are about $3m\;s^{-1}$ regardless of seasons, the weakest in NCEP-R2 and the strongest in JRA-55. Although significance level is not high, most of reanalysis showed that the Jet core has a tendency of southward moving during spring and winter, but moving northward during summer and fall. This amplified seasonal variation of Jet core suggests that seasonal variations of weather/climate can be increased in the East Asian region. The longitude of Jet core has a tendency of systematically westward moving and decreasing of zonal variations regardless of averaging methods and reanalysis data. In general, the Jet core shows a tendency of moving south-west-ward and upward, getting intensified during spring and winter regardless of the reanalysis data. However, the Jet core shows a tendency of moving westward and downward, and getting weakened during summer. In fall, there were no distinctive trends not only in wind speed but also three dimensional locations compared to other seasons. Although the significance levels are not high and variation patterns are slightly different according to the reanalysis data, our findings are more or less different from the previous results. So, more works are needed to clarify the three dimensional variation patterns of Jet core over the East Asian region as a result of global warming.

[Retracted]Hot Spot Analysis of Tourist Attractions Based on Stay Point Spatial Clustering

  • Liao, Yifan
    • Journal of Information Processing Systems
    • /
    • 제16권4호
    • /
    • pp.750-759
    • /
    • 2020
  • The wide application of various integrated location-based services (LBS social) and tourism application (app) has generated a large amount of trajectory space data. The trajectory data are used to identify popular tourist attractions with high density of tourists, and they are of great significance to smart service and emergency management of scenic spots. A hot spot analysis method is proposed, based on spatial clustering of trajectory stop points. The DBSCAN algorithm is studied with fast clustering speed, noise processing and clustering of arbitrary shapes in space. The shortage of parameters is manually selected, and an improved method is proposed to adaptively determine parameters based on statistical distribution characteristics of data. DBSCAN clustering analysis and contrast experiments are carried out for three different datasets of artificial synthetic two-dimensional dataset, four-dimensional Iris real dataset and scenic track retention point. The experiment results show that the method can automatically generate reasonable clustering division, and it is superior to traditional algorithms such as DBSCAN and k-means. Finally, based on the spatial clustering results of the trajectory stay points, the Getis-Ord Gi* hotspot analysis and mapping are conducted in ArcGIS software. The hot spots of different tourist attractions are classified according to the analysis results, and the distribution of popular scenic spots is determined with the actual heat of the scenic spots.

Three-dimensional geostatistical modeling of subsurface stratification and SPT-N Value at dam site in South Korea

  • Mingi Kim;Choong-Ki Chung;Joung-Woo Han;Han-Saem Kim
    • Geomechanics and Engineering
    • /
    • 제34권1호
    • /
    • pp.29-41
    • /
    • 2023
  • The 3D geospatial modeling of geotechnical information can aid in understanding the geotechnical characteristic values of the continuous subsurface at construction sites. In this study, a geostatistical optimization model for the three-dimensional (3D) mapping of subsurface stratification and the SPT-N value based on a trial-and-error rule was developed and applied to a dam emergency spillway site in South Korea. Geospatial database development for a geotechnical investigation, reconstitution of the target grid volume, and detection of outliers in the borehole dataset were implemented prior to the 3D modeling. For the site-specific subsurface stratification of the engineering geo-layer, we developed an integration method for the borehole and geophysical survey datasets based on the geostatistical optimization procedure of ordinary kriging and sequential Gaussian simulation (SGS) by comparing their cross-validation-based prediction residuals. We also developed an optimization technique based on SGS for estimating the 3D geometry of the SPT-N value. This method involves quantitatively testing the reliability of SGS and selecting the realizations with a high estimation accuracy. Boring tests were performed for validation, and the proposed method yielded more accurate prediction results and reproduced the spatial distribution of geotechnical information more effectively than the conventional geostatistical approach.