• Title/Summary/Keyword: Knowledge-based preprocessing

Prediction of the price for stock index futures using integrated artificial intelligence techniques with categorical preprocessing

  • Kim, Kyoung-jae;Han, Ingoo
    • Proceedings of the Korean Operations and Management Science Society Conference / 1997.10a / pp.105-108 / 1997
  • Previous studies of stock market prediction using artificial intelligence techniques, such as artificial neural networks and case-based reasoning, have focused mainly on spot market prediction. Korea launched trading in the index futures market (KOSPI 200) on May 3, 1996, and the market has since attracted growing interest. This research therefore aims to predict the daily up/down direction of the KOSPI 200 index futures price. The forecasting methodologies employed are the integration of a genetic algorithm and an artificial neural network (GAANN) and the integration of a genetic algorithm and case-based reasoning (GACBR). The genetic algorithm was used mainly to select relevant input variables. This study adopts categorical data preprocessing based on expert knowledge as well as traditional data preprocessing. The experimental results of each forecasting method with each data preprocessing method are compared and statistically tested. The best-performing artificial neural network and case-based reasoning methods are then integrated. Out-of-the Model Integration and In-Model Integration are presented as the integration methodologies. The research outcomes are as follows. First, genetic algorithms are a useful and effective method for selecting input variables for AI techniques. Second, the experiments with categorical data preprocessing significantly outperform those with traditional data preprocessing in forecasting the up/down direction of the index futures price. Third, the integration of genetic algorithm and case-based reasoning (GACBR) outperforms the integration of genetic algorithm and artificial neural network (GAANN). Fourth, the integrations of genetic algorithm, case-based reasoning, and artificial neural network (GAANN-GACBR, GACBRNN, and GANNCBR) provide worse results than GACBR.

A Regularity-Based Preprocessing Method for Collaborative Recommender Systems

  • Toledo, Raciel Yera;Mota, Yaile Caballero;Borroto, Milton Garcia
    • Journal of Information Processing Systems / v.9 no.3 / pp.435-460 / 2013
  • Recommender systems are popular applications that help users identify items they could be interested in. A recent research area in recommender systems focuses on detecting several kinds of inconsistencies associated with user preferences. However, most previous work in this direction processes only anomalies that are intentionally introduced by users. In contrast, this paper focuses on finding a way to remove non-malicious anomalies, specifically in collaborative filtering systems. A review of the state of the art in this field shows that no previous work, either for recommender systems or for general data mining scenarios, performs exactly this preprocessing task. More specifically, we propose a method based on extracting knowledge from the dataset in the form of rating regularities (similar to frequent patterns) and using them to remove anomalous preferences provided by users. Experiments show that applying the procedure as a preprocessing step improves the performance of a data mining task associated with the recommendation and also effectively detects the anomalous preferences.

Fingerprint Image Quality Analysis for Knowledge-based Image Enhancement (지식기반 영상개선을 위한 지문영상의 품질분석)

  • 윤은경;조성배
    • Journal of KIISE:Software and Applications / v.31 no.7 / pp.911-921 / 2004
  • Accurate minutiae extraction from input fingerprint images is one of the critical modules in a robust automatic fingerprint identification system. However, the performance of minutiae extraction depends heavily on the quality of the input fingerprint images. If preprocessing in the image enhancement step is performed according to the characteristics of the fingerprint image, system performance becomes more robust. In this paper, we propose a knowledge-based preprocessing method that extracts five features (the mean and variance of gray values, block directional difference, orientation change level, and ridge-valley thickness ratio) from the fingerprint images, analyzes image quality with Ward's clustering algorithm, and enhances the images according to their oily/neutral/dry characteristics. Experimental results using NIST DB 4 and the Inha University DB show that the clustering algorithm distinguishes the image quality characteristics well. In addition, the performance of the proposed method is assessed using the quality index and block directional difference. The results indicate that the proposed method improves both the quality index and the block directional difference.
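
As a rough illustration of the quality-analysis step in this abstract, the sketch below computes the mean and variance of gray values and groups feature vectors with a naive form of Ward's criterion. It is not the authors' implementation: the function names (`gray_stats`, `ward_cost`, `ward_cluster`), the pure-Python data layout, and the absence of feature scaling are all assumptions, and the other three features are omitted.

```python
from statistics import fmean, pvariance


def gray_stats(image):
    """Return (mean, variance) of the gray values in a 2-D list of pixels."""
    pixels = [p for row in image for p in row]
    return fmean(pixels), pvariance(pixels)


def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge."""
    ca = [fmean(dim) for dim in zip(*a)]  # centroid of cluster a
    cb = [fmean(dim) for dim in zip(*b)]  # centroid of cluster b
    dist2 = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * dist2


def ward_cluster(points, k):
    """Naive agglomerative clustering with Ward's criterion, down to k clusters."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters whose merge is cheapest under Ward's criterion.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: ward_cost(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters
```

In practice the feature vectors would likely need standardizing before clustering, since a mean and a variance of gray values live on very different scales; that step is left out here.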

Creating Knowledge from Construction Documents Using Text Mining

  • Shin, Yoonjung;Chi, Seokho
    • International conference on construction engineering and project management / 2015.10a / pp.37-38 / 2015
  • A number of documents containing important and useful knowledge have been generated over time in the construction industry. Such text-based knowledge plays an important role in the construction industry for decision-making and business strategy development by being used as best practice for upcoming projects, delivering lessons learned for better risk management and project control. Thus, practical and usable knowledge creation from construction documents is necessary to improve business efficiency. This study proposes a knowledge creating system from construction documents using text mining and the design comprises three main steps - text mining preprocessing, weight calculation of each term, and visualization. A system prototype was developed as a pilot study of the system design. This study is significant because it validates a knowledge creating system design based on text mining and visualization functionality through the developed system prototype. Automated visualization was found to significantly reduce unnecessary time consumption and energy for processing existing data and reading a range of documents to get to their core, and helped the system to provide an insight into the construction industry.
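
The first two steps named in this abstract (text mining preprocessing and term weight calculation) might look roughly like the sketch below. The abstract does not say which weighting scheme is used, so TF-IDF is an assumption here, as are the tiny stopword list and the function names `preprocess` and `tfidf`.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"the", "a", "of", "and", "to", "in", "was", "for", "by"}


def preprocess(text):
    """Lowercase, keep alphabetic tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(documents):
    """Return one {term: weight} dict per document (TF-IDF is an assumed scheme)."""
    tokenized = [preprocess(d) for d in documents]
    n_docs = len(tokenized)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights
```

With this weighting, a term that appears in every document gets weight zero, so the highest-weighted terms are the ones that set a document apart; these are the natural candidates to feed into a visualization step.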

Ontology based Preprocessing Scheme for Mining Data Streams from Sensor Networks (센서 네트워크의 데이터 스트림 마이닝을 위한 온톨로지 기반의 전처리 기법)

  • Jung, Jason J.
    • Journal of Intelligence and Information Systems / v.15 no.3 / pp.67-80 / 2009
  • Using a number of sensors and sensor networks, we can collect environmental information from a given sensor space. To discover more useful information and knowledge, we want to apply data mining methodologies to the sensor data streams from such sensor spaces. In this paper, we present a novel data preprocessing scheme to improve the performance of data mining algorithms. In particular, ontologies are applied to represent the meanings of the sensor data. To evaluate the proposed method, we collected sensor streams for about 30 days and ran simulations on them to compare it with other approaches.

A Knowledge-Based System for Address Block Location on Korean Envelope Images (우리나라 우편 봉투 영상에서의 주소 영역 추출을 위한 지식 기반 시스템)

  • 김기철;이성환
    • Journal of the Korean Institute of Telematics and Electronics B / v.31B no.8 / pp.137-147 / 1994
  • In this paper, we propose a knowledge-based system for locating the Destination Address Block (DAB) by analyzing the structure of Korean envelope images. In the proposed system, preprocessing steps such as adaptive binarization, connected component extraction, and deskewing are carried out first to enable effective structure analysis of the envelope image. Then the DAB, which contains the address, name, and zip code parts of the input envelope image, is extracted by an iterative procedure based on knowledge acquired from statistical feature analysis of various envelope images. Most systems for locating address blocks on envelopes extract the DAB by segmenting an envelope image into several candidate blocks and then selecting one of them. Because Korean envelope images are very difficult to segment into several blocks, owing to the writing habit of placing the addresses on the envelope in close proximity to each other, the proposed iterative procedure determines the DAB by splitting or merging connected components and verifies the determined DAB without segmentation and selection. Experiments were carried out with a large number of live envelopes provided by the Seoul Mail Center in Korea. The results reveal that the proposed system is very effective for address block location on Korean envelopes.
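
The connected component extraction step mentioned in this abstract can be sketched as a breadth-first labeling pass over an already binarized image. This is a generic textbook version, not the paper's procedure: the function name `connected_components`, the use of 4-connectivity, and the bounding-box output are all assumptions.

```python
from collections import deque


def connected_components(binary):
    """Return bounding boxes (top, left, bottom, right) of 4-connected
    foreground regions in a 2-D list of 0/1 pixels, in scan order."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Start a new component and flood-fill it breadth-first.
            seen[r][c] = True
            queue = deque([(r, c)])
            top, left, bottom, right = r, c, r, c
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((top, left, bottom, right))
    return boxes
```

Bounding boxes like these are the kind of unit an iterative split-or-merge procedure could operate on when assembling a candidate address block.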

Comparison of Anomaly Detection Performance Based on GRU Model Applying Various Data Preprocessing Techniques and Data Oversampling (다양한 데이터 전처리 기법과 데이터 오버샘플링을 적용한 GRU 모델 기반 이상 탐지 성능 비교)

  • Yoo, Seung-Tae;Kim, Kangseok
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.201-211 / 2022
  • With the recent change in the cybersecurity paradigm, research on anomaly detection methods using machine learning and deep learning techniques, which are AI implementation technologies, is increasing. This study compares data preprocessing techniques that can improve the anomaly detection performance of a GRU (Gated Recurrent Unit) neural network-based intrusion detection model, using the open dataset NGIDS-DS (Next Generation IDS Dataset). In addition, to address the class imbalance between normal data and attack data, detection performance was compared and analyzed across oversampling ratios, using an oversampling technique based on DCGAN (Deep Convolutional Generative Adversarial Networks). In the experiments, preprocessing the system call feature and the process execution path feature with the Doc2Vec algorithm showed good performance, and oversampling with DCGAN improved detection performance.

A Design and Implementation on Ontology for Public Participation GIS (시민참여형 GIS를 위한 온톨로지 설계 및 구현)

  • Park, Ji-Man
    • Journal of the Korean Geographical Society / v.44 no.3 / pp.372-394 / 2009
  • This study investigates ontology-based public participation GIS (PPGIS). The major reason that ontology-based GIS has attracted attention in semantic communication in recent years is the wide availability of geographical variables and the imminent need to turn such recommendations into useful geographical knowledge. Therefore, this study focuses on designing and implementing a pilot system for public participation GIS. The applicability of the pilot system was validated through a simulation experiment on history tourism in Guri City, Gyeonggi-do. In terms of methodology, the life cycle model, which involves regional statues and user recognition, can be viewed as an important preprocessing step (specification, conceptualization, formalization, integration, and implementation) for axiom-based discovery of recommended geographical knowledge. In terms of practicality, the ontology in this study can recommend geographical knowledge through reasoning. In addition, ontology-based public participation GIS would integrate epistemological and ontological approaches and could be used as an index connected with semantic communication. The pilot system was applied to the study area as part of a scenario. The model was carried out using axioms of logical constraint on the meaning of human activity.

Utilization of Syllabic Nuclei Location in Korean Speech Segmentation into Phonemic Units (음절핵의 위치정보를 이용한 우리말의 음소경계 추출)

  • 신옥근
    • The Journal of the Acoustical Society of Korea / v.19 no.5 / pp.13-19 / 2000
  • The blind segmentation method, which segments input speech data into recognition units without any prior knowledge, plays an important role in continuous speech recognition systems and corpus generation. Because no prior knowledge is required, this method is rather simple to implement, but in general it performs poorly compared with knowledge-based segmentation. In this paper, we introduce a method to improve the performance of blind segmentation of Korean continuous speech by postprocessing the segment boundaries obtained from the blind segmentation. In the preprocessing stage, the candidate boundaries are extracted by a clustering technique based on the GLR (generalized likelihood ratio) distance measure. In the postprocessing stage, the final phoneme boundaries are selected from the candidates by using simple a priori knowledge of the syllabic structure of Korean, namely that the maximum number of phonemes between any two consecutive nuclei is limited. The experimental result was rather promising: the proposed method yields a 25% reduction in insertion error rate compared with that of blind segmentation alone.
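
The postprocessing idea in this abstract, limiting how many boundaries may fall between consecutive syllabic nuclei, can be sketched as follows. The abstract gives neither the limit nor the selection rule, so everything specific here is an assumption: the function name `select_boundaries`, the `max_boundaries` parameter and its default, and the use of a per-candidate score (for example a GLR distance) to rank candidates.

```python
def select_boundaries(candidates, nuclei, max_boundaries=3):
    """Keep at most `max_boundaries` candidates between consecutive nuclei.

    candidates: list of (time, score) pairs; a higher score is assumed to
                mean a more likely boundary.
    nuclei:     sorted list of syllabic nucleus times.
    Candidates before the first nucleus or after the last one are dropped
    in this simplified sketch.
    """
    selected = []
    for left, right in zip(nuclei, nuclei[1:]):
        # Candidates that fall strictly between this pair of nuclei.
        inside = [c for c in candidates if left < c[0] < right]
        # Keep only the highest-scoring ones, up to the assumed limit.
        best = sorted(inside, key=lambda c: c[1], reverse=True)[:max_boundaries]
        selected.extend(time for time, _ in best)
    return sorted(selected)
```

Discarding the surplus low-scoring candidates in each interval is one plausible way such a constraint could reduce insertion errors, which is the error type the abstract reports improving.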

Automatic Detection of Optic Disc Boundary on Fundus Image (안저 영상에서 시신경유두의 윤곽선 자동 검출)

  • 김필운;홍승표;원철호;조진호;김명남
    • Journal of Biomedical Engineering Research / v.24 no.2 / pp.91-97 / 2003
  • The purpose of this paper is to present a hierarchical detection method for the optic disc in fundus images. We detect the optic disc boundary by using prior information based on anatomical knowledge of the fundus, such as vessel information and image complexity. The whole method can be divided into three stages. First, we select a region of interest (ROI) that includes the optic disc region. The ROI is used to calculate the location and size of the optic disc, which serve as prior knowledge to simplify image preprocessing. Second, we divide the fundus image into numerous regions with the watershed algorithm and detect the initial boundary of the optic disc by reducing the number of separated regions in the ROI. Finally, to detect the accurate boundary of the optic disc, we search for the defective parts of the boundary caused by serious vessel interference, then remove and interpolate them.