• Title/Summary/Keyword: Data reduction

Search Results: 6,243

Performance evaluation of principal component analysis for clustering problems

  • Kim, Jae-Hwan;Yang, Tae-Min;Kim, Jung-Tae
    • Journal of Advanced Marine Engineering and Technology
    • /
    • v.40 no.8
    • /
    • pp.726-732
    • /
    • 2016
  • Clustering analysis is widely used in data mining to classify data into categories on the basis of their similarity. Over the decades, many clustering techniques have been developed, including hierarchical and non-hierarchical algorithms. In gene profiling problems, because of the large number of genes and the complexity of biological networks, dimensionality reduction techniques are critical exploratory tools for clustering analysis of gene expression data. Recently, clustering analysis that applies dimensionality reduction techniques has also been proposed. PCA (principal component analysis) is a popular dimensionality reduction technique for clustering problems. However, previous studies analyzed the performance of PCA only on full data sets. In this paper, to evaluate the performance of PCA for clustering analysis specifically and robustly, we exploit an improved FCBF (fast correlation-based filter) feature selection method for supervised clustering data sets and employ two well-known clustering algorithms: k-means and k-medoids. Computational results on supervised data sets show that the performance of PCA is very poor for large-scale feature sets.
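A minimal sketch of the kind of comparison the abstract describes, using scikit-learn: cluster once on the full feature set and once on a PCA-reduced version, then compare both against the known labels. The improved FCBF filter and the k-medoids algorithm used in the paper are not reproduced here; the synthetic data set, component count, and cluster count below are placeholders.

```python
# Clustering with and without PCA-based dimension reduction (illustration only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

# Synthetic "supervised clustering" data: labels are used only for evaluation.
X, y = make_classification(n_samples=300, n_features=200, n_informative=20,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

# Cluster the full feature set.
labels_full = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Cluster after projecting onto the leading principal components.
X_pca = PCA(n_components=10, random_state=0).fit_transform(X)
labels_pca = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_pca)

print("ARI, full features:", adjusted_rand_score(y, labels_full))
print("ARI, PCA-reduced  :", adjusted_rand_score(y, labels_pca))
```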

DEVELOPMENT OF REAL-TIME DATA REDUCTION PIPELINE FOR KMTNet (KMTNet 실시간 자료처리 파이프라인 개발)

  • Kim, D.J.;Lee, C.U.;Kim, S.L.;Park, B.G.
    • Publications of The Korean Astronomical Society
    • /
    • v.28 no.1
    • /
    • pp.1-6
    • /
    • 2013
  • A real-time data reduction pipeline for the Korea Microlensing Telescope Network (KMTNet) was developed by the Korea Astronomy and Space Science Institute (KASI). The main goal of the pipeline is to find variable objects and to record their light variation from the large volume of observation data, about 200 GB per night per site. To achieve this goal we adopt three strategic implementations: precise telescope pointing using cross-correlation correction for target fields, real-time data transfer using kernel-level file handling and a high-speed network, and a segmented data-processing architecture using the Sun Grid Engine. We tested the performance of the pipeline using simulated data that reproduce conditions similar to those at CTIO (Cerro Tololo Inter-American Observatory), and found that processing one night of data takes about eight hours. We therefore conclude that the pipeline works in real time without problems if the network speed is sufficiently high, e.g., as high as at CTIO.
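The abstract mentions cross-correlation correction for telescope pointing; the sketch below only illustrates the underlying idea with NumPy, estimating the pixel offset between a reference field image and a new exposure from the peak of their FFT-based cross-correlation. It is not the KMTNet pipeline code, and the image size and shift values are made up.

```python
# Estimate the (dy, dx) offset of an image relative to a reference field
# from the peak of their circular cross-correlation (FFT-based).
import numpy as np

def estimate_shift(reference, image):
    """Return the (dy, dx) offset of `image` relative to `reference`."""
    f_ref = np.fft.fft2(reference)
    f_img = np.fft.fft2(image)
    xcorr = np.fft.ifft2(np.conj(f_ref) * f_img).real
    peak = np.unravel_index(np.argmax(xcorr), xcorr.shape)
    shifts = np.array(peak, dtype=float)
    for axis, size in enumerate(xcorr.shape):
        if shifts[axis] > size // 2:          # map wrap-around to negative shifts
            shifts[axis] -= size
    return tuple(float(s) for s in shifts)

# Example: the same field circularly shifted by (3, -5) pixels.
rng = np.random.default_rng(0)
ref = rng.normal(size=(256, 256))
img = np.roll(ref, shift=(3, -5), axis=(0, 1))
print(estimate_shift(ref, img))               # -> (3.0, -5.0)
```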

The Improvement in Signal Integrity of FT-ICR MS (FT-ICR 질량분석기의 신호 충실성 향상)

  • Kim, Seung-Yong;Kim, Seok-Yoon;Kim, Hyun Sik
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.60 no.1
    • /
    • pp.201-204
    • /
    • 2011
  • A new algorithm was proposed for efficient noise reduction in Fourier transform ion cyclotron resonance (FT-ICR) mass spectra. The suggested algorithm reduces white and electrical noise and improves the signal-to-noise ratio. It has been optimized to reduce noise more efficiently by using traces of the signal level, and it has been combined with a derivative window to improve resolution as well as S/N. Time-domain data were corrected for DC voltage interference, and a $t^n$ window was applied to the time-domain data to improve resolution. Although the $t^n$ window can improve the signal resolution, it also increases the noise level in the frequency domain. Therefore, the newly developed noise reduction algorithm is applied to balance resolving power against the S/N ratio in magnitude mode. The trace algorithm determines the current data point from several data points (mean, past data, calculated past data). In the current calculations, data points with an S/N ratio greater than 3 were considered signal data points. After windowing and noise reduction, both the resolution and the signal-to-noise ratio were improved. This algorithm is especially applicable to frequency-dependent noise and large data sets.
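A rough illustration of the two steps named in the abstract, under the assumption that a $t^n$ apodization window is applied to the time-domain transient and that frequency-domain points below an S/N of 3 are discarded. The paper's trace-based algorithm and its noise-floor estimate are more elaborate than this NumPy sketch.

```python
# t^n windowing of the transient, then a simple S/N >= 3 threshold in the
# frequency domain (illustration only; not the paper's trace algorithm).
import numpy as np

def tn_window(signal, n=1.0, dt=1.0):
    """Multiply the transient by t^n to sharpen peaks (at the cost of noise)."""
    t = np.arange(len(signal)) * dt
    return signal * t ** n

def snr_threshold(spectrum, snr=3.0):
    """Keep only points whose magnitude exceeds `snr` times a crude noise floor."""
    magnitude = np.abs(spectrum)
    noise = np.median(magnitude)                  # crude noise-floor estimate
    return np.where(magnitude >= snr * noise, spectrum, 0.0)

# Synthetic transient: two decaying cyclotron tones plus white noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 8192, endpoint=False)
transient = np.cos(2 * np.pi * 700 * t) + 0.5 * np.cos(2 * np.pi * 1300 * t)
transient *= np.exp(-2.0 * t)
transient += 0.2 * rng.normal(size=t.size)
transient -= transient.mean()                     # remove DC interference

spectrum = np.fft.rfft(tn_window(transient, n=1.0, dt=t[1] - t[0]))
clean = snr_threshold(spectrum, snr=3.0)
```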

Incremental Linear Discriminant Analysis for Streaming Data Using the Minimum Squared Error Solution (스트리밍 데이터에 대한 최소제곱오차해를 통한 점층적 선형 판별 분석 기법)

  • Lee, Gyeong-Hoon;Park, Cheong Hee
    • Journal of KIISE
    • /
    • v.45 no.1
    • /
    • pp.69-75
    • /
    • 2018
  • For streaming data, where samples arrive sequentially in time, it is difficult to apply dimension reduction methods based on batch learning. Incremental dimension reduction methods applicable to streaming data have therefore been studied. In this paper, we propose an incremental linear discriminant analysis method using the minimum squared error solution. Instead of computing scatter matrices directly, the proposed method incrementally updates the projective direction for dimension reduction by using the information from each newly arriving sample. The experimental results demonstrate that the proposed method is more efficient than previously proposed incremental dimension reduction methods.
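One well-known route from least squares to LDA, sketched below as an illustration rather than the paper's derivation: for two classes, the LDA direction is proportional to the least-squares solution obtained by regressing ±1-coded labels on the features, so the projective direction can be updated per incoming sample with recursive least squares instead of recomputing scatter matrices.

```python
# Incremental two-class LDA-like projection via recursive least squares
# (generic illustration; not the update rule derived in the paper).
import numpy as np

class IncrementalLSLDA:
    def __init__(self, n_features, reg=1e-3):
        self.w = np.zeros(n_features)        # current projection direction
        self.P = np.eye(n_features) / reg    # running inverse Gram matrix

    def partial_fit(self, x, y):
        """Update with one sample x (1-D array) and label y in {+1, -1}."""
        Px = self.P @ x
        gain = Px / (1.0 + x @ Px)
        self.w += gain * (y - x @ self.w)    # correct the prediction error
        self.P -= np.outer(gain, Px)         # Sherman-Morrison rank-1 update
        return self

    def transform(self, X):
        return X @ self.w                    # 1-D projection for two classes

# Streaming usage on synthetic two-class data.
rng = np.random.default_rng(0)
model = IncrementalLSLDA(n_features=5)
for _ in range(1000):
    y = rng.choice([-1, 1])
    x = rng.normal(size=5) + y * np.array([1.0, 0.5, 0.0, 0.0, 0.0])
    model.partial_fit(x, y)
```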

A Clustering Approach for Feature Selection in Microarray Data Classification Using Random Forest

  • Aydadenta, Husna;Adiwijaya, Adiwijaya
    • Journal of Information Processing Systems
    • /
    • v.14 no.5
    • /
    • pp.1167-1175
    • /
    • 2018
  • Microarray data play an essential role in diagnosing and detecting cancer. Microarray analysis allows the examination of gene expression levels in specific cell samples, where thousands of genes can be analyzed simultaneously. However, microarray data have very few samples and very high dimensionality. Therefore, to classify microarray data, a dimensionality reduction process is required. Dimensionality reduction can eliminate redundancy in the data, so that the features used in classification are only those with a high correlation with their class. There are two types of dimensionality reduction, namely feature selection and feature extraction. In this paper, we use the k-means algorithm as the clustering approach for feature selection. The proposed approach groups features with the same characteristics into one cluster, so that redundancy in the microarray data is removed. The clustering result is ranked using the Relief algorithm such that the best-scoring element of each cluster is obtained. The best element of each cluster is then selected and used as a feature in the classification process, which uses the Random Forest algorithm. Based on the simulation, the accuracy of the proposed approach on the Colon, Lung Cancer, and Prostate Tumor datasets is 85.87%, 98.9%, and 89%, respectively, which is higher than that of Random Forest without clustering.
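A compact sketch of the described pipeline with scikit-learn: cluster the features (columns) with k-means, keep the best-ranked feature from each cluster, and classify with Random Forest. Relief is not part of scikit-learn, so mutual information is used here purely as a stand-in ranking score; the dataset and parameter choices are placeholders.

```python
# Feature clustering + per-cluster ranking + Random Forest (illustration only).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=100, n_features=500, n_informative=25,
                           random_state=0)

# 1) Cluster the features: each column becomes a point to be clustered.
n_clusters = 30
feature_clusters = KMeans(n_clusters=n_clusters, n_init=10,
                          random_state=0).fit_predict(X.T)

# 2) Rank features (mutual information as a stand-in for Relief) and keep
#    the top-scoring feature of each cluster.
scores = mutual_info_classif(X, y, random_state=0)
selected = []
for c in range(n_clusters):
    members = np.flatnonzero(feature_clusters == c)
    selected.append(members[np.argmax(scores[members])])

# 3) Classify on the reduced feature set.
clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X[:, selected], y, cv=5).mean())
```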

A Semi-supervised Dimension Reduction Method Using Ensemble Approach (앙상블 접근법을 이용한 반감독 차원 감소 방법)

  • Park, Cheong-Hee
    • The KIPS Transactions:PartD
    • /
    • v.19D no.2
    • /
    • pp.147-150
    • /
    • 2012
  • While LDA is a supervised dimension reduction method that finds projective directions maximizing separability between classes, its performance is severely degraded when the number of labeled data points is small. Recently, semi-supervised dimension reduction methods have been proposed that utilize abundant unlabeled data to overcome the shortage of labeled data. However, the matrix computations usually required by statistical dimension reduction methods make it difficult to utilize a large number of unlabeled samples, and too much information from unlabeled data may not be helpful enough to justify the increase in processing time. In order to solve these problems, we propose an ensemble approach for semi-supervised dimension reduction. Extensive experimental results in text classification demonstrate the effectiveness of the proposed method.
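One possible reading of the ensemble idea, sketched here as an assumption-laden illustration and not taken from the paper: instead of solving a single large semi-supervised problem over all unlabeled data, fit several small projections, each on the labeled data plus a different random subset of the unlabeled data, and concatenate the projected features.

```python
# Hypothetical ensemble of small "semi-supervised" projections: unlabeled
# subsets shape each PCA subspace, labels then orient it via LDA.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def ensemble_semi_supervised_dr(X_lab, y_lab, X_unlab, n_members=5,
                                subset_size=200, n_components=10, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        idx = rng.choice(len(X_unlab), size=min(subset_size, len(X_unlab)),
                         replace=False)
        pca = PCA(n_components=n_components).fit(np.vstack([X_lab, X_unlab[idx]]))
        lda = LinearDiscriminantAnalysis().fit(pca.transform(X_lab), y_lab)
        members.append((pca, lda))

    def transform(X):
        # Concatenate each member's low-dimensional representation.
        return np.hstack([lda.transform(pca.transform(X)) for pca, lda in members])

    return transform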

DR-LSTM: Dimension reduction based deep learning approach to predict stock price

  • Ah-ram Lee;Jae Youn Ahn;Ji Eun Choi;Kyongwon Kim
    • Communications for Statistical Applications and Methods
    • /
    • v.31 no.2
    • /
    • pp.213-234
    • /
    • 2024
  • In recent decades, increasing research attention has been directed toward predicting stock prices in financial markets using deep learning methods. For instance, the recurrent neural network (RNN) is known to be competitive for time-series datasets. Long short-term memory (LSTM) further improves the RNN by providing an alternative approach to the vanishing gradient problem, and it gains predictive accuracy by retaining memory over longer horizons. In this paper, we combine both supervised and unsupervised dimension reduction methods with LSTM to enhance forecasting performance, and refer to this as the dimension reduction based LSTM (DR-LSTM) approach. As supervised dimension reduction methods we use sliced inverse regression (SIR), sparse SIR, and kernel SIR; as unsupervised methods we use principal component analysis (PCA), sparse PCA, and kernel PCA. Using real stock market index data (S&P 500, STOXX Europe 600, and KOSPI), we present a comparative study of predictive accuracy between the six DR-LSTM methods and time series modeling.
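A minimal sketch of the DR-LSTM recipe: reduce the predictor matrix with an unsupervised method (PCA here, standing in for the SIR and kernel variants used in the paper), then feed rolling windows of the reduced factors to an LSTM regressor. It assumes TensorFlow/Keras and scikit-learn; the synthetic data, window length, and network size are placeholders.

```python
# Dimension reduction (PCA) followed by an LSTM on rolling windows.
import numpy as np
import tensorflow as tf
from sklearn.decomposition import PCA

def make_windows(factors, target, window=20):
    X, y = [], []
    for t in range(window, len(target)):
        X.append(factors[t - window:t])
        y.append(target[t])
    return np.array(X), np.array(y).reshape(-1, 1)

# predictors: (n_days, n_features) market variables; prices: (n_days,) index level.
rng = np.random.default_rng(0)
predictors = rng.normal(size=(1000, 50))
prices = np.cumsum(rng.normal(size=1000))

factors = PCA(n_components=5).fit_transform(predictors)   # dimension reduction
X, y = make_windows(factors, prices, window=20)

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, input_shape=X.shape[1:]),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```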

Development of Acoustic Emission Monitoring System for Fault Detection of Thermal Reduction Reactor

  • Pakk, Gee-Young;Yoon, Ji-Sup;Park, Byung-Suk;Hong, Dong-Hee;Kim, Young-Hwan
    • Nuclear Engineering and Technology
    • /
    • v.35 no.1
    • /
    • pp.25-34
    • /
    • 2003
  • Research on the development of a fault monitoring system for the thermal reduction reactor has been performed as a preliminary step to support the successful operation of the reactor. The ultimate task of the fault monitoring system is to assure the integrity of the thermal reduction reactor by the acoustic emission (AE) method. The objectives of this paper are to identify and characterize the fault-induced signals for the discrimination of the various AE signals acquired during reactor operation. An AE data acquisition and analysis system was constructed and applied to fault monitoring of a small-scale reduction reactor. Through a series of experiments, the various signals, such as background noise, operating signals, and fault-induced signals, were measured and their characteristics were identified; these will be used for signal discrimination in further application to the full-scale thermal reduction reactor.
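The abstract does not list the discriminating features used, so the sketch below only shows typical acoustic-emission burst parameters (peak amplitude, RMS, duration above threshold, ring-down counts) that such a monitoring system might compute for each recorded waveform.

```python
# Basic AE burst parameters from a raw waveform (generic illustration).
import numpy as np

def ae_features(signal, fs, threshold):
    """Return simple AE hit parameters for a waveform sampled at `fs` Hz."""
    above = np.abs(signal) >= threshold
    counts = int(np.sum((~above[:-1]) & above[1:]))     # upward threshold crossings
    if above.any():
        idx = above.nonzero()[0]
        duration = (idx[-1] - idx[0]) / fs               # time above threshold span
    else:
        duration = 0.0
    return {
        "peak": float(np.max(np.abs(signal))),
        "rms": float(np.sqrt(np.mean(signal ** 2))),
        "duration_s": duration,
        "counts": counts,
    }
```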

Development of Noise Prediction Program in Construction Sites (건설 공사장 간이 소음 예측 프로그램 개발)

  • Kim, Ha-Geun;Joo, Si-Woong
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference
    • /
    • 2007.05a
    • /
    • pp.1157-1161
    • /
    • 2007
  • Construction noise is the main cause of civil complaints among pollution types. The purpose of this study is to develop a noise prediction program that estimates the noise level at construction sites more accurately. For this purpose, a database of sound power levels for various pieces of construction equipment was built. Noise reduction with distance and noise reduction by barrier diffraction were the main effects considered and calculated. The simple noise prediction program provides information on the proper height and length of a portable barrier that satisfies the noise criteria of a construction site from the construction planning stage. To investigate the reliability of the program, predicted values were compared with measured data: the average difference between measured and predicted values is 1.3 dB(A), and the correlation coefficient is about 0.95.
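The program's exact equations are not given in the abstract; the sketch below uses standard textbook relations for the two effects it mentions, distance attenuation of a point source over reflecting ground and barrier diffraction via the Maekawa chart approximation. The equipment level, distances, and frequency in the example are placeholders.

```python
# Distance attenuation and Maekawa-style barrier insertion loss (illustration).
import math

def level_at_distance(lw, r):
    """Sound pressure level at r metres from a point source of power level lw
    (dB re 1 pW), assuming hemispherical spreading over reflecting ground."""
    return lw - 20 * math.log10(r) - 8

def barrier_insertion_loss(path_difference, frequency, c=343.0):
    """Maekawa chart approximation of thin-barrier insertion loss (dB).
    path_difference = (source-to-top + top-to-receiver) - direct distance."""
    wavelength = c / frequency
    fresnel = 2 * path_difference / wavelength
    if fresnel <= 0:
        return 0.0
    return min(10 * math.log10(3 + 20 * fresnel), 24.0)   # practical cap ~24 dB

# Example: equipment with Lw = 105 dB(A), receiver at 30 m, barrier giving a
# 0.5 m path difference, evaluated at 500 Hz.
lp = level_at_distance(105, 30) - barrier_insertion_loss(0.5, 500)
print(round(lp, 1), "dB(A)")
```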

Low Cost SOC(System-On-a-Chip) Testing Method for Reduction of Test Data and Power Dissipation (테스트 데이터와 전력소비 단축을 위한 저비용 SOC 테스트 기법)

  • Hur Yongmin;Lin Chi-ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.41 no.12
    • /
    • pp.83-90
    • /
    • 2004
  • This paper proposes an efficient scan testing method for compressing test input data and reducing test power for SOCs. The proposed method determines whether parts of a test response can be reused as part of the next input test data, based on analysis of the deterministic test data and its response. Our experimental results show that benchmark circuits have a high similarity between uncompacted deterministic input test data and the corresponding responses. For the ISCAS'89 benchmark circuits, the proposed testing method achieves an average reduction of 29.4% in power dissipation, measured in terms of the number of test clock cycles, and of 69.7% in test data.
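A toy illustration of the decision the abstract describes: a captured scan response can be reused as the next scan-in vector whenever it is compatible with the next deterministic test cube, i.e., it matches every specified bit and is free to fill the don't-care ('X') positions. The compatibility analysis and test ordering used in the paper are not reproduced here.

```python
# Compatibility check between a scan response and the next test cube.
def reusable(response, next_cube):
    """True if the captured `response` can serve as the next scan-in vector."""
    assert len(response) == len(next_cube)
    return all(c in ("X", r) for r, c in zip(response, next_cube))

print(reusable("101100", "1X11XX"))   # True: every specified bit matches
print(reusable("101100", "0X11XX"))   # False: the first specified bit differs
```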