• Title/Abstract/Keywords: Data preprocessing method

498 search results (processing time: 0.026 s)

Prediction of the price for stock index futures using integrated artificial intelligence techniques with categorical preprocessing

  • Kim, Kyoung-jae;Han, Ingoo
    • 한국경영과학회:학술대회논문집
    • /
    • Proceedings of the 1997 Fall Conference of 한국경영과학회; Hongik University, Seoul; 1 Nov. 1997
    • /
    • pp.105-108
    • /
    • 1997
  • Previous studies of stock market prediction using artificial intelligence techniques, such as artificial neural networks and case-based reasoning, have focused mainly on spot market prediction. Korea launched trading in the index futures market (KOSPI 200) on May 3, 1996, and the market has since attracted growing interest. This research therefore aims to predict the daily up/down direction of the KOSPI 200 index futures price. The forecasting methodologies employed are the integration of a genetic algorithm with an artificial neural network (GAANN) and the integration of a genetic algorithm with case-based reasoning (GACBR). The genetic algorithm was used mainly to select relevant input variables. The study adopts categorical data preprocessing based on expert knowledge as well as traditional data preprocessing. The experimental results of each forecasting method with each data preprocessing method are compared and statistically tested. The best-performing artificial neural network and case-based reasoning methods are then integrated, with Out-of-the Model Integration and In-Model Integration presented as the integration methodologies. The research outcomes are as follows. First, genetic algorithms are a useful and effective method for selecting input variables for AI techniques. Second, categorical data preprocessing significantly outperforms traditional data preprocessing in forecasting the up/down direction of the index futures price. Third, the integration of genetic algorithm and case-based reasoning (GACBR) outperforms the integration of genetic algorithm and artificial neural network (GAANN). Fourth, the integrations of genetic algorithm, case-based reasoning, and artificial neural network (GAANN-GACBR, GACBRNN, and GANNCBR) provide worse results than GACBR.
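
The abstract above describes using a genetic algorithm to select input variables. A minimal sketch of that idea follows, assuming a bit-mask encoding, truncation selection, one-point crossover, and elitism; none of these operator choices are stated in the abstract. The names `evolve_mask`, `toy_fitness`, and `RELEVANT` are hypothetical, and the toy fitness stands in for what would in practice be the validation accuracy of a neural network or case-based reasoner trained on the selected inputs.

```python
import random


def evolve_mask(n_features, fitness, generations=30, pop_size=20,
                mutation_rate=0.1, seed=0):
    """Search for a 0/1 feature mask that maximizes `fitness` (illustrative GA)."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]        # truncation selection
        children = [ranked[0][:]]                # elitism: carry the best mask over unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each bit with probability `mutation_rate`.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = children
    return max(population, key=fitness)


# Hypothetical stand-in fitness: features 0, 2, and 5 are "relevant",
# and every other selected feature costs half a point.
RELEVANT = {0, 2, 5}


def toy_fitness(mask):
    hits = sum(1 for i, g in enumerate(mask) if g and i in RELEVANT)
    extras = sum(1 for i, g in enumerate(mask) if g and i not in RELEVANT)
    return hits - 0.5 * extras


best = evolve_mask(8, toy_fitness)
print(best, toy_fitness(best))
```

Because the best mask of each generation is copied forward unchanged, the best fitness never decreases from one generation to the next.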


A Nonparametric Approach for Noisy Point Data Preprocessing

  • Xi, Yongjian;Duan, Ye;Zhao, Hongkai
    • International Journal of CAD/CAM
    • /
    • Vol. 9, No. 1
    • /
    • pp.31-36
    • /
    • 2010
  • 3D point data acquired by laser scanning or stereo vision can be quite noisy, so a preprocessing step is often needed before a surface reconstruction algorithm can be applied. In this paper, we propose a nonparametric approach for noisy point data preprocessing. In particular, we propose an anisotropic kernel-based nonparametric density estimation method for outlier removal and a hill-climbing line search approach for projecting data points onto the real surface boundary. Our approach is simple, robust, and efficient. We demonstrate our method on both real and synthetic point datasets.
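
Density-based outlier removal of the kind mentioned in the abstract above can be sketched as follows. This sketch swaps in a plain isotropic Gaussian kernel for simplicity, whereas the paper proposes an anisotropic kernel. The function names, the bandwidth, and the "fraction of the median density" threshold are all assumptions made for illustration.

```python
import math


def gaussian_kde_density(points, bandwidth):
    """Unnormalized isotropic Gaussian kernel density estimate at each point."""
    densities = []
    for p in points:
        total = 0.0
        for q in points:
            d2 = sum((a - b) ** 2 for a, b in zip(p, q))
            total += math.exp(-d2 / (2.0 * bandwidth ** 2))
        densities.append(total / len(points))
    return densities


def remove_outliers(points, bandwidth=1.0, keep_ratio=0.5):
    """Keep points whose density is at least `keep_ratio` times the median density."""
    dens = gaussian_kde_density(points, bandwidth)
    median = sorted(dens)[len(dens) // 2]
    return [p for p, d in zip(points, dens) if d >= keep_ratio * median]


# Tiny synthetic cloud: nine points on the plane z = 0 plus one distant outlier.
cloud = [(x * 0.5, y * 0.5, 0.0) for x in range(3) for y in range(3)]
cloud.append((10.0, 10.0, 10.0))
filtered = remove_outliers(cloud, bandwidth=1.0, keep_ratio=0.5)
print(len(cloud), len(filtered))
```

The double loop is quadratic in the number of points, which is acceptable for a sketch but not for a real scan; a spatial index would be needed at scale.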

데이터 전처리를 이용한 다중 모델 퍼지 예측기의 설계 및 응용 (Design of Multiple Model Fuzzy Predictors using Data Preprocessing and its Application)

  • 방영근;이철희
    • 전기학회논문지
    • /
    • Vol. 58, No. 1
    • /
    • pp.173-180
    • /
    • 2009
  • It is difficult to predict non-stationary or chaotic time series, which include drift and/or nonlinearity as well as uncertainty. To address this, we propose an effective prediction method that combines data preprocessing with multiple-model TS fuzzy predictors and a model selection mechanism. In the data preprocessing procedure, candidates for the optimal difference interval are determined by correlation analysis, and the corresponding difference data sets are generated and used as predictor inputs instead of the original data, because difference data stabilize the statistical characteristics of such time series and better reveal their implicit properties. TS fuzzy predictors are then constructed for a multiple-model bank, where the k-means clustering algorithm is used for fuzzy partitioning of the input space and the least squares method is applied to identify the parameters of the fuzzy rules. Among the predictors in the model bank, the one that best minimizes the performance index is selected and used for prediction thereafter. Finally, an error compensation procedure based on correlation analysis is added to improve prediction accuracy. Computer simulations are performed to verify the effectiveness of the proposed method.
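
The differencing step described above can be illustrated with a short sketch. The abstract does not state the exact correlation criterion, so this example assumes that candidate intervals are simply the lags with the highest sample autocorrelation; `autocorrelation`, `candidate_intervals`, and `difference` are hypothetical helper names.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    cov = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return cov / var


def candidate_intervals(series, max_lag, top_k):
    """Return the `top_k` lags with the highest autocorrelation (assumed criterion)."""
    ranked = sorted(range(1, max_lag + 1),
                    key=lambda d: autocorrelation(series, d), reverse=True)
    return ranked[:top_k]


def difference(series, interval):
    """Difference data x[t] - x[t - interval], used in place of the raw series."""
    return [series[t] - series[t - interval] for t in range(interval, len(series))]


# Synthetic series: linear drift plus a pattern that repeats every 4 steps.
pattern = [0.0, 1.0, 0.0, -1.0]
series = [0.5 * t + pattern[t % 4] for t in range(40)]

intervals = candidate_intervals(series, max_lag=8, top_k=2)
diff4 = difference(series, 4)
print(intervals, diff4[:5])
```

Differencing at interval 4 removes both the drift and the repeating pattern from this synthetic series, leaving a constant, which shows how a well-chosen interval stabilizes the statistics of the predictor input.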

데이터 전처리와 퍼지 논리 시스템을 이용한 전력 부하 예측 (Electric Load Forecasting using Data Preprocessing and Fuzzy Logic System)

  • 방영근;이철희
    • 전기학회논문지
    • /
    • Vol. 66, No. 12
    • /
    • pp.1751-1758
    • /
    • 2017
  • This paper presents a fuzzy logic system with data preprocessing for accurate electric power load prediction. The fuzzy logic system handles the hidden characteristics of nonlinear data, and the data preprocessing transforms the original data to expose more information about its characteristics; combining the two therefore allows more accurate prediction. The former uses a TSK fuzzy logic system, which applies a linguistic rule base and a linear regression model, while the latter uses linear interpolation. Finally, electric power load data from four regions in Taiwan are used to evaluate the performance of the proposed prediction system.

STATISTICALLY PREPROCESSED DATA BASED PARAMETRIC COST MODEL FOR BUILDING PROJECTS

  • Sae-Hyun Ji;Moonseo Park;Hyun-Soo Lee
    • 국제학술발표논문집
    • /
    • The 3rd International Conference on Construction Engineering and Project Management
    • /
    • pp.417-424
    • /
    • 2009
  • For a construction project to progress smoothly, effective cost estimation is vital, particularly in the conceptual and schematic design stages. In these early phases, although initial estimates are highly sensitive to changes in project scope, owners require accurate forecasts that reflect the information they supply. Cost estimators therefore need effective estimation strategies. In practice, parametric cost estimating, which uses historical cost data, is the most commonly used method in these initial phases (Karshenas 1984, Kirkham 2007). Hence, compiling historical data on the parameters that govern cost variance is a prime requirement. However, data mining (data preprocessing) to remove internal errors and abnormal values is needed before compilation. To address this issue, this research proposes a statistical methodology for data preprocessing and verifies that data preprocessing improves estimate accuracy and stability. Moreover, Statistically Preprocessed data Based Parametric (SPBP) cost models are developed from multiple regression equations, and their effectiveness is verified against conventional cost models.


최적 TS 퍼지 모델 기반 다중 모델 예측 시스템의 구현과 시계열 예측 응용 (Multiple Model Prediction System Based on Optimal TS Fuzzy Model and Its Applications to Time Series Forecasting)

  • 방영근;이철희
    • 산업기술연구
    • /
    • Vol. 28, No. B
    • /
    • pp.101-109
    • /
    • 2008
  • In general, forecasting non-stationary or chaotic time series is very difficult because of the drift and/or nonlinearities in them. To overcome this, we suggest a new prediction method based on multiple-model TS fuzzy predictors combined with preprocessing of the time series data, in which differences of the time series, rather than the raw data, are applied to the predictors as input. In the preprocessing procedure, candidates for the optimal difference interval are determined by correlation analysis and the corresponding difference data are generated. Then, for each candidate, a TS fuzzy predictor is constructed using the k-means clustering algorithm and the least squares method. Finally, the predictor that minimizes the performance index is selected and used for prediction thereafter. Computer simulation is performed to show the effectiveness and usefulness of our method.


효율적인 데이터베이스 마케팅을 위한 데이터마이닝 전처리도구에 관한 연구 (A Study on the Data Mining Preprocessing Tool For Efficient Database Marketing)

  • 이준석
    • 디지털융복합연구
    • /
    • Vol. 12, No. 11
    • /
    • pp.257-264
    • /
    • 2014
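  • For efficient database marketing, there is a growing need for data mining that can segment customers and discover new knowledge. Building a data mining tool requires step-by-step implementation, and this study constructs a data preprocessing tool for data mining that can adapt to distributed environments. The preprocessing components of existing data mining tools (AnswerTree, Clementine, Enterprise Miner, Kensington, and Weka) were reviewed, and a data mining preprocessing tool that can be used efficiently in a distributed environment was constructed. The newly proposed system is based on Enterprise JavaBeans and XML.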

초분광영상의 조명효과 보정 전처리기법 분석 (Analyzing Preprocessing for Correcting Lighting Effects in Hyperspectral Images)

  • 송영선
    • 한국산업융합학회 논문집
    • /
    • Vol. 26, No. 5
    • /
    • pp.785-792
    • /
    • 2023
  • Because hyperspectral imaging provides detailed spectral information across a broad range of wavelengths, it can be used in numerous applications, including environmental monitoring, food quality inspection, medical diagnosis, material identification, art authentication, and crime scene analysis. However, hyperspectral images often contain various types of distortion caused by the environmental conditions during image acquisition, and these distortions must be properly removed through data preprocessing. In this study, preprocessing methods were investigated for effectively correcting the distortion caused by artificial light sources used in indoor hyperspectral imaging. For this purpose, a halogen-tungsten artificial light source was installed indoors, and hyperspectral images were acquired. The acquired images were then corrected using preprocessing that does not require complex auxiliary equipment, and the corrected results were analyzed. According to the analysis, a statistical transformation technique that uses the mean and standard deviation relative to a reference signal was the most effective in correcting distortions caused by artificial light sources.
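
One plausible reading of a "statistical transformation using mean and standard deviation with reference to a reference signal" is to standardize the measured spectrum and rescale it to the reference's mean and standard deviation. The sketch below implements that reading only; the abstract does not give the formula, so the transformation, the function name `match_to_reference`, and the sample values are assumptions.

```python
import statistics


def match_to_reference(spectrum, reference):
    """Rescale `spectrum` so its mean and standard deviation match `reference`."""
    mu_s = statistics.fmean(spectrum)
    sd_s = statistics.pstdev(spectrum)
    mu_r = statistics.fmean(reference)
    sd_r = statistics.pstdev(reference)
    if sd_s == 0:
        raise ValueError("spectrum is constant; cannot standardize")
    return [(x - mu_s) / sd_s * sd_r + mu_r for x in spectrum]


# Hypothetical values: the measured spectrum is the reference distorted by
# an illumination gain of 3 and an offset of 0.1.
reference = [0.20, 0.35, 0.50, 0.40, 0.30]
measured = [3.0 * r + 0.1 for r in reference]

corrected = match_to_reference(measured, reference)
print([round(v, 3) for v in corrected])
```

Under this assumed model, any distortion that is a positive gain plus an offset is removed exactly; wavelength-dependent distortions would not be.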

영상 클러스터링과 HSV 컬러 모델을 이용한 차선 검출 전처리 기법 (Preprocessing Technique for Lane Detection Using Image Clustering and HSV Color Model)

  • 최나래;최상일
    • 한국멀티미디어학회논문지
    • /
    • Vol. 20, No. 2
    • /
    • pp.144-152
    • /
    • 2017
  • Among the technologies for implementing autonomous vehicles, the advanced driver assistance system is a key technology for supporting safe driving. In vision-sensor-based approaches, which are widely used, various preprocessing methods are applied prior to feature extraction for lane detection. However, existing methods produce false positives from unnecessary lane candidates such as cars, lawns, and road separators in the road area. In addition, there are cases where the lane candidate itself cannot be extracted, such as lanes under an overpass, lanes within dark shadows, yellow center lanes, and faint lanes. In this paper, we propose an efficient preprocessing method using k-means clustering for image division and the HSV color model. When the proposed preprocessing method is applied, true positive regions are preserved as much as possible during lane detection and many false positive regions are removed.
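
The combination of an HSV color model with k-means clustering can be sketched on a handful of pixels. The abstract does not say which channels are clustered or how clusters map to lane candidates, so this example assumes a two-cluster k-means on the V (brightness) channel and treats the brighter cluster as lane candidates. `kmeans_1d` and `lane_candidates` are hypothetical names, and the pixel values are made up.

```python
import colorsys


def kmeans_1d(values, k=2, iterations=20):
    """Plain k-means on scalar values, with centers initialized evenly over the range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: nearest center for every value.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


def lane_candidates(rgb_pixels):
    """Return indices of pixels that fall in the brighter of two V-channel clusters."""
    hsv = [colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
           for r, g, b in rgb_pixels]
    labels, centers = kmeans_1d([v for _, _, v in hsv], k=2)
    bright = max(range(2), key=lambda c: centers[c])
    return [i for i, lab in enumerate(labels) if lab == bright]


# Made-up pixels: asphalt, white paint, yellow paint, grass, shadow.
pixels = [(40, 40, 45), (250, 250, 250), (230, 200, 30), (60, 80, 50), (35, 35, 35)]
candidates = lane_candidates(pixels)
print(candidates)
```

A fuller pipeline would likely also use hue and saturation, for example to separate yellow center lines from white markings, which is where the HSV representation is more convenient than RGB.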

과실의 비파괴 당도 예측 모델의 성능향상을 위한 투과스펙트럼의 전처리 (Preprocessing of Transmitted Spectrum Data for Development of a Robust Non-destructive Sugar Prediction Model of Intact Fruits)

  • 노상하;류동수
    • 비파괴검사학회지
    • /
    • Vol. 22, No. 4
    • /
    • pp.361-368
    • /
    • 2002
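  • This study used transmitted energy spectrum data measured on apples conveyed at a rate of two per second to investigate how various preprocessing methods affect the precision of a sugar content prediction model, and to develop a reliable sugar content prediction regression model. Preprocessing algorithms consisting of the first derivative, MSC, SNV, OSC, and combinations of these were programmed for scatter correction and noise reduction of the spectra. When they were applied to the spectral data, MSC and SNV in particular markedly increased the correlation between the transmitted energy at each wavelength and the sugar content, compared with no preprocessing. Sugar content prediction regression models were developed and validated after each preprocessing; depending on the preprocessing method, the SEP of the prediction model varied widely, from a maximum of 1.265 %Brix to a minimum of 0.507 %Brix. This implies that a preprocessing method suited to the characteristics of the given spectral data must be developed or selected to minimize SEP. MSC and SNV were found to be closely related to prediction precision, while OSC appeared to be related to the number of PLS factors. The first derivative actually degraded the prediction performance of the model, presumably because the transmitted spectra measured in real time contained relatively large noise components, which differentiation amplified. For the spectral data used in this study, the sugar content prediction model with MSC and OSC preprocessing performed best, with $R^2=0.8823$, SEP = 0.5071 %Brix, and bias = 0.0327.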
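
Two of the preprocessing methods named above, SNV (standard normal variate) and MSC (multiplicative scatter correction), are sketched here in their common textbook forms; the paper's own implementation may differ in detail. SNV centers and scales each spectrum by its own mean and standard deviation, while MSC regresses each spectrum on the mean spectrum and removes the fitted offset and slope. The sample spectra are synthetic.

```python
import statistics


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own statistics."""
    mu = statistics.fmean(spectrum)
    sd = statistics.stdev(spectrum)
    return [(x - mu) / sd for x in spectrum]


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    n = len(spectra)
    ref = [sum(col) / n for col in zip(*spectra)]   # mean spectrum as reference
    ref_mean = statistics.fmean(ref)
    sxx = sum((r - ref_mean) ** 2 for r in ref)
    corrected = []
    for s in spectra:
        s_mean = statistics.fmean(s)
        # Least-squares fit s ≈ intercept + slope * ref.
        slope = sum((r - ref_mean) * (x - s_mean) for r, x in zip(ref, s)) / sxx
        intercept = s_mean - slope * ref_mean
        corrected.append([(x - intercept) / slope for x in s])
    return corrected


# Synthetic spectra: one base shape with different additive and multiplicative effects.
base = [1.0, 2.0, 4.0, 3.0]
spectra = [[a + b * x for x in base] for a, b in [(0.0, 1.0), (0.5, 2.0), (-0.2, 0.5)]]

msc_out = msc(spectra)
snv_out = [snv(s) for s in spectra]
print([round(v, 3) for v in msc_out[0]])
print([round(v, 3) for v in snv_out[0]])
```

On these synthetic spectra both methods collapse the three curves onto a single shape, because the only differences between them are an offset and a positive gain, which is the kind of scatter effect both corrections target.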