• Title/Summary/Keyword: High Dimensionality Data


Missing Value Estimation and Sensor Fault Identification using Multivariate Statistical Analysis (다변량 통계 분석을 이용한 결측 데이터의 예측과 센서이상 확인)

  • Lee, Changkyu;Lee, In-Beum
    • Korean Chemical Engineering Research / v.45 no.1 / pp.87-92 / 2007
  • Recently, the development of process monitoring systems that detect and diagnose process abnormalities has drawn attention in process systems engineering. Normal operating data obtained from processes provide information on process characteristics that can be used for modeling, monitoring, and control. Because modern chemical and environmental processes exhibit high dimensionality, strong correlation, severe dynamics, and nonlinearity, they are not easy to analyze with a model-based approach. To overcome the limitations of the model-based approach, many system engineers and academic researchers have focused on statistical approaches combined with multivariate analysis, such as principal component analysis (PCA) and partial least squares (PLS). Several multivariate analysis methods have been modified for application to chemical processes with specific characteristics such as dynamics and nonlinearity. This paper discusses missing value estimation and sensor fault identification based on process variable reconstruction using dynamic PCA and canonical variate analysis.
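
The idea of estimating a missing sensor value by reconstruction from a PCA model can be sketched as follows. This is a minimal illustration that assumes a static PCA model and synthetic data; the paper itself uses dynamic PCA and canonical variate analysis, which are not reproduced here. The function names `fit_pca` and `estimate_missing` and all data values are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return column means, column stds, and loadings P (variables x components)."""
    mean = X.mean(axis=0)
    std = X.std(axis=0, ddof=1)
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the scaled data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return mean, std, Vt[:n_components].T

def estimate_missing(x, missing, mean, std, P):
    """Fill the entries of x flagged by the boolean mask `missing`."""
    known = ~missing
    z = np.zeros_like(x, dtype=float)
    z[known] = (x[known] - mean[known]) / std[known]
    # Least-squares estimate of the scores from the known variables only.
    t, *_ = np.linalg.lstsq(P[known], z[known], rcond=None)
    z_hat = P @ t  # reconstruction of every variable from the scores
    x_filled = x.astype(float).copy()
    x_filled[missing] = z_hat[missing] * std[missing] + mean[missing]
    return x_filled

# Synthetic "normal operation" data: 4 correlated sensors driven by 2 factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
mixing = np.array([[1.0, 0.5, -0.3, 0.8], [0.2, -1.0, 0.7, 0.4]])
X = latent @ mixing + 0.01 * rng.normal(size=(500, 4))
mean, std, P = fit_pca(X, n_components=2)

# Pretend sensor 2 is missing in one sample and reconstruct it.
sample = X[0].copy()
true_value = sample[2]
mask = np.array([False, False, True, False])
sample[2] = np.nan
filled = estimate_missing(sample, mask, mean, std, P)
error = abs(filled[2] - true_value)
```

The same reconstruction could in principle support fault identification: a sensor whose measured value differs strongly from its reconstructed value is a candidate for a faulty sensor.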

A Novel RGB Channel Assimilation for Hyperspectral Image Classification using 3D-Convolutional Neural Network with Bi-Long Short-Term Memory

  • M. Preethi;C. Velayutham;S. Arumugaperumal
    • International Journal of Computer Science & Network Security / v.23 no.3 / pp.177-186 / 2023
  • Hyperspectral imaging is one of the most efficient and fastest-growing technologies of recent years. A hyperspectral image (HSI) comprises contiguous spectral bands for every pixel, which allows objects to be detected with significant accuracy and detail. Because an HSI contains high-dimensional spectral information, classifying every pixel is not easy. To address this problem, we propose a novel RGB channel assimilation for classification methods. The color features are extracted using chromaticity computation. Additionally, this work discusses the classification of hyperspectral images based on a Domain Transform Interpolated Convolution Filter (DTICF) and a 3D-CNN with Bi-directional Long Short-Term Memory (Bi-LSTM). The proposed technique has three steps. In the first step, HSI data are converted to RGB images with spatial features. Before the DTICF is applied, the RGB images of the HSI and a patch of the input image from the raw HSI are integrated. The paired spectral and spatial features are then extracted from the integrated HSI using the DTICF and fed into the designed 3D-CNN with Bi-LSTM framework. In the second step, the extracted color features are classified by a 2D-CNN, and the probabilistic classification maps of the 3D-CNN-Bi-LSTM and the 2D-CNN are fused. In the last step, a Markov Random Field (MRF) is used to refine the fused probabilistic classification map. Experimental results on two different hyperspectral images show that the proposed RGB channel assimilation of the DTICF-3D-CNN-Bi-LSTM approach provides good classification results compared with other classification approaches.

Spatial Locality Preservation Metric for Constructing Histogram Sequences (히스토그램 시퀀스 구성을 위한 공간 지역성 보존 척도)

  • Lee, Jeonggon;Kim, Bum-Soo;Moon, Yang-Sae;Choi, Mi-Jung
    • Journal of Information Technology and Architecture / v.10 no.1 / pp.79-91 / 2013
  • This paper proposes a systematic methodology for deciding which space-filling curve (SFC) performs best when lower-dimensional transformations are applied to histogram sequences. A histogram sequence is a time series converted from an image by a given SFC. Because of their high dimensionality, histogram sequences are very difficult to store and search in their original form. To solve this problem, lower-dimensional transformations are generally used; they produce lower bounds on the distances between high-dimensional sequences, but the tightness of those lower bounds is strongly affected by the type of SFC. In this paper, we address the challenging problem of evaluating which SFC performs better when a lower-dimensional transformation is applied to histogram sequences. To this end, we first present the concept of spatial locality, which comes from the intuition that "if the entries are adjacent in a histogram sequence, their corresponding cells should also be adjacent in the original image." We also propose a spatial locality preservation metric (slpm for short) that quantitatively evaluates spatial locality, and we present its formal computation method. We then evaluate five SFCs from the perspective of slpm and verify that this evaluation agrees with the performance evaluation of lower-dimensional transformations in real image matching. Finally, we perform k-NN (k-nearest neighbors) search based on lower-dimensional transformations and validate the accuracy of the proposed slpm by showing that the Hilbert order, which has the highest slpm, also shows the best performance in k-NN search.
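
The spatial-locality intuition quoted above can be illustrated with a small sketch. It assumes a simple stand-in measure, the fraction of consecutive sequence entries whose cells are 4-adjacent in the image grid; this is not the paper's slpm, whose formal definition is not given in the abstract. The function names are hypothetical, and the Hilbert index-to-cell conversion is the standard iterative algorithm for a grid whose side is a power of two.

```python
def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to cell (x, y) on an n x n grid."""
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def adjacency_fraction(cells):
    """Fraction of consecutive entries whose cells touch (Manhattan distance 1)."""
    pairs = list(zip(cells, cells[1:]))
    adjacent = sum(
        1 for (x0, y0), (x1, y1) in pairs if abs(x0 - x1) + abs(y0 - y1) == 1
    )
    return adjacent / len(pairs)

n = 8
hilbert_cells = [hilbert_d2xy(n, d) for d in range(n * n)]
row_major_cells = [(d % n, d // n) for d in range(n * n)]

hilbert_score = adjacency_fraction(hilbert_cells)
row_major_score = adjacency_fraction(row_major_cells)
```

On the 8 x 8 grid this stand-in measure gives 1.0 for the Hilbert order and 56/63 for the row-major order, since the row-major order jumps across the image at the end of every row.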

Determination of Survival of Gastric Cancer Patients With Distant Lymph Node Metastasis Using Prealbumin Level and Prothrombin Time: Contour Plots Based on Random Survival Forest Algorithm on High-Dimensionality Clinical and Laboratory Datasets

  • Zhang, Cheng;Xie, Minmin;Zhang, Yi;Zhang, Xiaopeng;Feng, Chong;Wu, Zhijun;Feng, Ying;Yang, Yahui;Xu, Hui;Ma, Tai
    • Journal of Gastric Cancer / v.22 no.2 / pp.120-134 / 2022
  • Purpose: This study aimed to identify prognostic factors for patients with distant lymph node-involved gastric cancer (GC) using a machine learning algorithm, a method that offers considerable advantages and new prospects for high-dimensional biomedical data exploration. Materials and Methods: This study employed 79 features of clinical pathology, laboratory tests, and therapeutic details from 289 GC patients whose distant lymphadenopathy was presented as the first episode of recurrence or metastasis. Outcomes were measured as any-cause death events and survival months after distant lymph node metastasis. A prediction model was built based on possible outcome predictors using a random survival forest algorithm and confirmed by 5×5 nested cross-validation. The effects of single variables were interpreted using partial dependence plots. A contour plot was used to visually represent survival prediction based on 2 predictive features. Results: The median survival time of patients with GC with distant nodal metastasis was 9.2 months. The optimal model incorporated the prealbumin level and the prothrombin time (PT), and yielded a prediction error of 0.353. The inclusion of other variables resulted in poorer model performance. Patients with higher serum prealbumin levels or shorter PTs had a significantly better prognosis. The predicted one-year survival rate was stratified and illustrated as a contour plot based on the combined effect of the prealbumin level and the PT. Conclusions: Machine learning is useful for identifying the important determinants of cancer survival using high-dimensional datasets. The prealbumin level and the PT on distant lymph node metastasis are the 2 most crucial factors in predicting the subsequent survival time of advanced GC.

The Study on Spatial Classification of Riverine Environment using UAV Hyperspectral Image (UAV를 활용한 초분광 영상의 하천공간특성 분류 연구)

  • Kim, Young-Joo;Han, Hyeong-Jun;Kang, Joon-Gu
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.10 / pp.633-639 / 2018
  • Securing high-resolution remote sensing (RS) images is important for spatial classification, given the complex and varied factors that make up the river environment. The purpose of this study is to evaluate the accuracy of classification results and to assess the applicability of high-resolution hyperspectral images acquired by drone to spatial classification. The dimensionality of the hyperspectral images obtained from the study area was reduced with PCA and MNF transformations to remove the effects of noise. Spatial classification was performed with supervised classifiers: MLC (Maximum Likelihood Classification), SVM (Support Vector Machine), and SAM (Spectral Angle Mapping). Overall, the highest classification accuracy was obtained when MLC was applied to the MNF-transformed image. However, misclassification was found mainly at the boundaries of some classes, including water bodies and shadowed areas. The results of this study can be used as basic data for remote sensing with drones and hyperspectral sensors, and the approach is expected to be applicable to a wider range of river environments through the development of additional algorithms.
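
Of the three classifiers named above, Spectral Angle Mapping is the simplest to show in a few lines: each pixel is assigned to the reference spectrum with which it forms the smallest angle. The sketch below assumes made-up four-band reference spectra and hypothetical function names; it does not use the study's data or class definitions.

```python
import math

def spectral_angle(pixel, reference):
    """Angle in radians between a pixel spectrum and a reference spectrum."""
    dot = sum(p * r for p, r in zip(pixel, reference))
    norm_p = math.sqrt(sum(p * p for p in pixel))
    norm_r = math.sqrt(sum(r * r for r in reference))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_p * norm_r)))
    return math.acos(cosine)

def classify(pixel, references):
    """Return the name of the reference with the smallest spectral angle."""
    return min(references, key=lambda name: spectral_angle(pixel, references[name]))

# Illustrative reflectance values only (assumed, not measured).
references = {
    "water": [0.05, 0.04, 0.03, 0.01],
    "vegetation": [0.04, 0.08, 0.05, 0.45],
}

# A vegetation pixel at half brightness, e.g. in partial shadow.
dim_vegetation = [0.02, 0.04, 0.025, 0.225]
label = classify(dim_vegetation, references)
```

Because the angle ignores vector length, a uniformly darker pixel keeps the same angle to its reference, which is why SAM is often described as insensitive to overall brightness.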

The Fast Search Algorithm for Raman Spectrum (라만 스펙트럼 고속 검색 알고리즘)

  • Ko, Dae-Young;Baek, Sung-June;Park, Jun-Kyu;Seo, Yu-Gyeong;Seo, Sung-Il
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.5 / pp.3378-3384 / 2015
  • The problem of fast search for Raman spectra has attracted much attention recently. By far the simplest and most widely used method is to calculate and compare the Euclidean distance between the given spectrum and the spectra in a database. This is a non-trivial problem, however, because of the inherent high dimensionality of the data. One of the most serious problems is the high computational complexity of searching for the closest codeword. To address this problem, a fast codeword search algorithm based on the mean pyramids of codewords is currently used in image coding applications. In this paper, we present three new methods for fast search for the closest codeword. The proposed algorithm uses two significant features of a vector, its mean value and variance, to reject many unlikely codewords and save a great deal of computation time. The experimental results show a performance improvement of about 42.8-55.2% over 1DMPS+PDS. The results confirm the effectiveness of the proposed algorithm.
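
The rejection idea described above can be sketched with a generic mean/variance test followed by a partial distance search (PDS). The sketch relies on the known bound ||x - c||^2 >= k (m_x - m_c)^2 + (V_x - V_c)^2 for k-dimensional vectors, where m is the mean and V is the norm of the mean-removed vector. It is an illustration under these assumptions, not one of the paper's three methods or its 1DMPS variant; the function names and synthetic data are hypothetical.

```python
import math
import random

def features(v):
    """Mean of v and norm of the mean-removed vector."""
    mean = sum(v) / len(v)
    deviation = math.sqrt(sum((a - mean) ** 2 for a in v))
    return mean, deviation

def partial_distance(x, c, best):
    """Squared distance, or None as soon as the running sum reaches `best`."""
    total = 0.0
    for a, b in zip(x, c):
        total += (a - b) ** 2
        if total >= best:
            return None
    return total

def fast_search(query, database):
    """Nearest codeword by mean/variance rejection followed by PDS."""
    k = len(query)
    q_mean, q_dev = features(query)
    best_index, best_dist = -1, math.inf
    evaluations = 0
    for i, (codeword, (c_mean, c_dev)) in enumerate(database):
        # Lower bound on the squared distance from the two features.
        bound = k * (q_mean - c_mean) ** 2 + (q_dev - c_dev) ** 2
        if bound >= best_dist:
            continue  # cannot beat the current best, so skip it
        evaluations += 1
        d = partial_distance(query, codeword, best_dist)
        if d is not None:
            best_index, best_dist = i, d
    return best_index, best_dist, evaluations

# Synthetic spectra with different baseline offsets (assumed data).
random.seed(0)
spectra = [[random.random() + 0.05 * j for _ in range(64)] for j in range(200)]
database = [(s, features(s)) for s in spectra]  # features precomputed once
query = [v + random.gauss(0, 0.01) for v in spectra[120]]

best_index, best_dist, evaluations = fast_search(query, database)
```

The counter `evaluations` records how many codewords survived the feature test and needed a distance computation, so it can be compared with the database size to see how much work the rejection saved.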

Construct Validation of the Sensory Profile for Children with Congenital Cerebral Palsy (뇌성마비 아동을 대상으로 한 감각프로파일(Sensory Profile)의 구성타당도 연구)

  • Yoo, Doo Han;Hong, Deok Gi;Hwang, Sun Jung
    • 재활복지 / v.18 no.4 / pp.315-330 / 2014
  • The purpose of this study was to verify the construct validity of the Sensory Profile for children with congenital cerebral palsy. Parents of 87 children (aged 3 to 10) with congenital cerebral palsy participated in this study. The data were analyzed with Winsteps version 3.81 using the Rasch model to examine unidimensionality and the fit of each item, the distribution of item difficulty, and the reliability and appropriateness of the rating scale. Based on the Rasch analysis, four of the 87 children were considered inappropriate participants, and 15 items of the Sensory Profile were determined to be inappropriate. Items of high difficulty are needed as new items for the Korean Sensory Profile. A three-category rating scale was more appropriate than a five-category scale. The person and item separation reliability with three categories was above 0.90, which is a relatively excellent value. Finally, future work needs to verify the validity of the Korean version of the Sensory Profile, develop new items of high difficulty, and convert the rating scale to three points.

The Effects of Environmental Dynamism on Supply Chain Commitment in the High-tech Industry: The Roles of Flexibility and Dependence (첨단산업의 환경동태성이 공급체인의 결속에 미치는 영향: 유연성과 의존성의 역할)

  • Kim, Sang-Deok;Ji, Seong-Goo
    • Journal of Global Scholars of Marketing Science / v.17 no.2 / pp.31-54 / 2007
  • The exchange between buyers and sellers in the industrial market is changing from short-term to long-term relationships. Long-term relationships are governed mainly by formal contracts or informal agreements, but many scholars now assert that controlling relationships through formal contracts under environmental dynamism is inappropriate. In this case, partners will depend on each other's flexibility or interdependence. The former, flexibility, provides a general frame of reference, order, and standards against which to guide and assess appropriate behavior in dynamic and ambiguous situations, thus motivating the value-oriented performance goals shared between partners. It is based on social sacrifices, which can potentially minimize any opportunistic behaviors. The latter, interdependence, means that each firm possesses a high level of dependence in a dynamic channel relationship. When interdependence is high in magnitude and symmetric, each firm enjoys a high level of power and the bonds between the firms should be reasonably strong. Strong shared power is likely to promote commitment because of the common interests, attention, and support found in such channel relationships. This study deals with environmental dynamism in the high-tech industry. Firms in the high-tech industry regard coping successfully with environmental changes as a key success factor. However, because few studies deal with environmental dynamism and supply chain commitment in the high-tech industry, it is very difficult to find effective strategies for coping with them. This paper presents the results of an empirical study on the relationship between environmental dynamism and supply chain commitment in the high-tech industry. We examined the effects of consumer, competitor, and technological dynamism on supply chain commitment. Additionally, we examined the moderating effects of flexibility and dependence of supply chains.
This study was confined to the type of high-tech industry characterized by rapid technological change and short product life cycles. Flexibility among the firms of this industry, which is characterized by hard and fast growth, is more important than in any other industry. Thus, a variety of environmental dynamism can affect a supply chain relationship. The targeted industries were the electronic parts, metal product, computer, electric machine, automobile, and medical precision manufacturing industries. Data were collected as follows. During the survey, the researchers obtained the lists of parts suppliers of two internationally competitive companies in the mobile phone manufacturing industry, N and L, and of the suppliers in a business relationship with S, a semiconductor manufacturing company. The suppliers were asked to respond to the survey via telephone and e-mail. During the two-month period of February-April 2006, we collected data from 44 companies. The respondents were restricted to those with direct dealing authority and to subcontractor (supplier) staff with at least three months of dealing experience with a manufacturer (an industrial material buyer). The measurement validation procedures covered scale reliability as well as discriminant and convergent validity. The reliability measures traditionally employed, such as Cronbach's alpha, were used; all reliabilities were greater than .70. A series of exploratory factor analyses was conducted. We also conducted confirmatory factor analyses to assess the validity of our measurements. A series of chi-square difference tests was conducted to ensure discriminant validity. For each pair, we estimated two models, an unconstrained model and a constrained model, and compared the two model fits. All these tests supported discriminant validity.
Also, all items loaded significantly on their respective constructs, providing support for convergent validity. We then examined composite reliability and average variance extracted (AVE). The composite reliability of each construct was greater than .70, and the AVE of each construct was greater than .50. According to the multiple regression analysis, customer dynamism had a negative effect and competitor dynamism had a positive effect on a supplier's commitment. In addition, flexibility and dependence had significant moderating effects on customer and competitor dynamism. On the other hand, none of the hypotheses about technological dynamism was supported: technological dynamism had no direct effect on the supplier's commitment, and its effect was not moderated by the flexibility and dependence of the supply chain. This study contributes as one of the few studies on environmental dynamism and supply chain commitment in the high-tech industry. In particular, it verified the effects of three sectors of environmental dynamism on the supplier's commitment and empirically tested how those effects were moderated by flexibility and dependence. The results showed that flexibility and interdependence strengthen the supplier's commitment under environmental dynamism in the high-tech industry. Thus, relationship managers in the high-tech industry should make supply chain relationships flexible and interdependent. The limitations of the study are as follows. First, regarding the research setting, the study was conducted in the high-tech industry, in which the direction of change in the power balance of supply chain dyads is usually determined by manufacturers, so the results are difficult to generalize. Future studies need to control for the power structure between partners. Second, we treated flexibility as positive throughout the paper, but it can also be negative, for example, violating an agreement or moving in the wrong direction. Therefore, the multi-dimensionality of flexibility needs to be investigated in future research.


A Hierarchical Cluster Tree Based Fast Searching Algorithm for Raman Spectroscopic Identification (계층 클러스터 트리 기반 라만 스펙트럼 식별 고속 검색 알고리즘)

  • Kim, Sun-Keum;Ko, Dae-Young;Park, Jun-Kyu;Park, Aa-Ron;Baek, Sung-June
    • Journal of the Korea Academia-Industrial cooperation Society / v.20 no.3 / pp.562-569 / 2019
  • Raman spectroscopy has been receiving increased attention as a standoff explosive detection technique. In addition, there is a growing need for a fast search method that can identify the Raman spectrum of a measured chemical substance by comparing it with known Raman spectra in a large database. By far the simplest and most widely used method is to calculate and compare the Euclidean distance between the given spectrum and the spectra in a database. This is a non-trivial problem, however, because of the inherent high dimensionality of the data. One of the most serious problems is the high computational complexity of searching for the closest spectrum. To overcome this problem, our previous paper presented the MPS Sort with Sorted Variance+PDS method for fast search for the closest spectrum. That algorithm uses two significant features of a vector, its mean value and variance, to reject many unlikely spectra and save a great deal of computation time. In this paper, we present two new methods for fast search for the closest spectrum. The PCA+PDS algorithm reduces the amount of computation by reducing the dimension of the data through a PCA transformation while giving the same result as the distance calculation on the whole data. The Hierarchical Cluster Tree algorithm builds a binary hierarchical tree from the PCA-transformed spectral data. It then starts searching from the clusters closest to the input spectrum and skips the distance calculation for many spectra that cannot be candidates, which saves a great deal of computation time. In the experiments, PCA+PDS shows a performance improvement of about 60.06% over MPS Sort with Sorted Variance+PDS, and the Hierarchical Cluster Tree shows a performance improvement of about 17.74% over PCA+PDS. The results confirm the effectiveness of the proposed algorithm.
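
The claim that a PCA transformation can give the same result as the full distance calculation can be checked with a short sketch. It assumes a full-rank PCA rotation, which is orthogonal and therefore preserves Euclidean distances, followed by a plain partial distance search (PDS) over components ordered by decreasing variance. This is an illustration of the principle, not the paper's PCA+PDS implementation or its hierarchical cluster tree; the function names and synthetic data are hypothetical.

```python
import numpy as np

def pca_rotation(X):
    """Return the mean and an orthogonal matrix of principal axes."""
    mean = X.mean(axis=0)
    eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
    order = np.argsort(eigvals)[::-1]  # largest variance first
    return mean, eigvecs[:, order]

def pds_nearest(query, database):
    """Nearest row by partial distance search; also counts squared terms used."""
    best_index, best_dist, terms_used = -1, np.inf, 0
    for i, candidate in enumerate(database):
        total = 0.0
        for a, b in zip(query, candidate):
            total += (a - b) ** 2
            terms_used += 1
            if total >= best_dist:
                break  # this candidate cannot be the nearest
        else:
            best_index, best_dist = i, total
    return best_index, best_dist, terms_used

# Synthetic correlated "spectra": 40 channels driven by 3 factors (assumed data).
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3)) @ rng.normal(size=(3, 40))
X = X + 0.05 * rng.normal(size=(300, 40))
query = X[17] + 0.01 * rng.normal(size=40)

mean, R = pca_rotation(X)
X_rot = (X - mean) @ R
query_rot = (query - mean) @ R

idx_raw, dist_raw, terms_raw = pds_nearest(query, X)
idx_pca, dist_pca, terms_pca = pds_nearest(query_rot, X_rot)
```

Both searches return the same nearest spectrum and the same distance up to rounding. Because the leading components carry most of the variance, the partial sum in the rotated space often crosses the rejection threshold after fewer terms; `terms_raw` and `terms_pca` let the reader check this on their own data.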

Robust Face Recognition based on 2D PCA Face Distinctive Identity Feature Subspace Model (2차원 PCA 얼굴 고유 식별 특성 부분공간 모델 기반 강인한 얼굴 인식)

  • Seol, Tae-In;Chung, Sun-Tae;Kim, Sang-Hoon;Chung, Un-Dong;Cho, Seong-Won
    • Journal of the Institute of Electronics Engineers of Korea SP / v.47 no.1 / pp.35-43 / 2010
  • 1D PCA, as used in appearance-based face recognition methods such as the eigenface-based method, may lead to less face representation power and higher computational cost because the resulting 1D face appearance data vector has high dimensionality. To resolve these problems of 1D PCA, 2D PCA-based face recognition methods have been developed. However, the face representation model obtained by directly applying 2D PCA to a face image set includes both face common features and face distinctive identity features. Face common features not only hinder face recognition but also increase computational cost. In this paper, we first develop a model of a face distinctive identity feature subspace, separated from the effects of face common features, in the face feature space obtained by applying 2D PCA. Then, a novel robust face recognition method based on the face distinctive identity feature subspace model is proposed. Because it depends only on the face distinctive identity features, the proposed method shows better performance than the conventional PCA-based methods (both 1D PCA-based and 2D PCA-based) with respect to recognition rate and processing time. This is verified through various experiments using the Yale A and IMM face databases, which consist of face images with various poses under various illumination conditions.
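
For readers unfamiliar with the difference between 1D and 2D PCA, the following sketch shows plain 2D PCA, which works on image matrices directly instead of flattening them into long vectors. It assumes random arrays in place of face images and hypothetical function names; it does not implement the paper's face distinctive identity feature subspace model.

```python
import numpy as np

def fit_2dpca(images, n_axes):
    """images: array of shape (count, height, width). Returns mean image and axes."""
    mean_image = images.mean(axis=0)
    centered = images - mean_image
    # Image covariance matrix (width x width): average of C_k^T C_k over images.
    G = np.einsum("kij,kil->jl", centered, centered) / len(images)
    eigvals, eigvecs = np.linalg.eigh(G)
    order = np.argsort(eigvals)[::-1][:n_axes]  # keep the largest eigenvalues
    return mean_image, eigvecs[:, order]

def project(image, axes):
    """Feature matrix of shape (height, n_axes) for one image."""
    return image @ axes

# Random stand-ins for 20 images of 32 x 24 pixels (assumed data).
rng = np.random.default_rng(0)
images = rng.normal(size=(20, 32, 24))
mean_image, axes = fit_2dpca(images, n_axes=5)
feature_matrix = project(images[0], axes)
```

Here the eigen-decomposition is done on a 24 x 24 matrix, whereas 1D PCA on the flattened images would involve vectors of length 768, which is the dimensionality problem the abstract refers to.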