• Title/Abstract/Keywords: WEKA

57 search results (processing time: 0.026 s)

Error Forecasting Using Linear Regression Model

  • Ler, Lian Guey;Kim, Byung-Sik;Choi, Gye-Woon;Kang, Byung-Hwa;Kwang, Jung-Jae
    • 한국습지학회지
    • /
    • Vol. 13, No. 1
    • /
    • pp.13-23
    • /
    • 2011
  • In this study, MIKE11 is used as the numerical model to which a data assimilation method is applied. This paper aims to gain insight into data assimilation in flood forecasting models. It starts with a general discussion of data assimilation, followed by a description of the methodology and a discussion of the statistical error forecast model used, which in this case is linear regression. This error forecast model is applied to the water level forecast simulated by MIKE11 to produce an improved forecast, which is validated against real measurements. A phase error is found in the improved forecasts. Hence, two general formulas are used to account for this phase error, and both improve the accuracy of the forecasts: one improves the immediate forecast up to 5 hours ahead, while the other improves the estimation of the peak discharge.
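
The error-correction idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes, purely for illustration, that the forecast error is regressed on its own previous value (e[t+1] ≈ a·e[t] + b), and the function names and water levels below are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


def corrected_forecast(observed, simulated, next_simulated):
    """Add a regression-predicted error to the next simulated value.

    Assumption (not taken from the paper): the error
    e[t] = observed[t] - simulated[t] follows e[t+1] ~ a*e[t] + b.
    """
    errors = [o - s for o, s in zip(observed, simulated)]
    a, b = fit_line(errors[:-1], errors[1:])
    predicted_error = a * errors[-1] + b
    return next_simulated + predicted_error


# Hypothetical water levels in metres; the simulation lags the observations.
observed = [1.10, 1.30, 1.60, 2.00]
simulated = [1.00, 1.10, 1.30, 1.60]
print(corrected_forecast(observed, simulated, next_simulated=2.00))
```

In this toy series the error grows by about 0.1 m per step, so the regression extrapolates an error of roughly 0.5 m and raises the next forecast accordingly. A lag-1 model like this cannot by itself remove a phase error of the kind the abstract reports.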

이미지 보간을 위한 의사결정나무 분류 기법의 적용 및 구현 (Adopting and Implementation of Decision Tree Classification Method for Image Interpolation)

  • 김동형
    • 디지털산업정보학회논문지
    • /
    • Vol. 16, No. 1
    • /
    • pp.55-65
    • /
    • 2020
  • With the development of display hardware, image interpolation techniques have been used in various fields such as image zooming and medical imaging. Traditional image interpolation methods, such as bi-linear interpolation, bi-cubic interpolation and edge direction-based interpolation, perform interpolation in the spatial domain. Recently, interpolation techniques in the discrete cosine transform or wavelet domain have also been proposed. Using these various existing interpolation methods and machine learning, we propose decision tree classification-based image interpolation methods. In other words, this paper is about a method of adaptively applying various existing interpolation methods, not about an interpolation method itself. To obtain the decision model, we used Weka's J48 library, which implements the C4.5 decision tree algorithm. The proposed method first constructs an attribute set and selects the classes, each of which corresponds to an interpolation method, for the classification model. After training, interpolation is performed using different interpolation methods according to the characteristics of the attributes. Simulation results show that the proposed method yields reasonable performance.

Predicting stock price direction by using data mining methods : Emphasis on comparing single classifiers and ensemble classifiers

  • Eo, Kyun Sun;Lee, Kun Chang
    • 한국컴퓨터정보학회논문지
    • /
    • Vol. 22, No. 11
    • /
    • pp.111-116
    • /
    • 2017
  • This paper proposes a data mining approach to predicting stock price direction. The stock market fluctuates due to many factors. Therefore, predicting stock price direction has become an important issue in the field of stock market analysis. However, few studies in the literature apply data mining approaches to predicting stock price direction. To contribute to the literature, this paper compares single classifiers and ensemble classifiers. The single classifiers include logistic regression, decision tree, neural network, and support vector machine. The ensemble classifiers we consider are AdaBoost, random forest, bagging, stacking, and vote. For the experiments, we gathered a dataset from the Korea Stock Exchange (KRX) covering 2008 to 2015. Data mining experiments using WEKA revealed that random forest, one of the ensemble classifiers, shows the best results in terms of metrics such as AUC (area under the ROC curve) and accuracy.

Machine Learning Based Keyphrase Extraction: Comparing Decision Trees, Naïve Bayes, and Artificial Neural Networks

  • Sarkar, Kamal;Nasipuri, Mita;Ghose, Suranjan
    • Journal of Information Processing Systems
    • /
    • Vol. 8, No. 4
    • /
    • pp.693-712
    • /
    • 2012
  • The paper presents three machine learning based keyphrase extraction methods that respectively use Decision Trees, Naïve Bayes, and Artificial Neural Networks for keyphrase extraction. We consider keyphrases as being phrases that consist of one or more words and as representing the important concepts in a text document. The three machine learning based keyphrase extraction methods that we use for experimentation have been compared with a publicly available keyphrase extraction system called KEA. The experimental results show that the Neural Network based keyphrase extraction method outperforms two other keyphrase extraction methods that use the Decision Tree and Naïve Bayes. The results also show that the Neural Network based method performs better than KEA.

음원을 이용한 기기판별 (Device identification Based on Audio Source)

  • 이명환;문창배;김병만
    • 한국정보과학회:학술대회논문집
    • /
    • Proceedings of the Korea Computer Congress 2012 (한국정보과학회), Vol. 39, No. 1(C)
    • /
    • pp.224-226
    • /
    • 2012
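  • With advances in IT and the rise of the information society, evidence and clues are increasingly stored on digital devices, not only in computer-related crimes but also in ordinary crimes. In this context, this paper proposes an effective method, as a digital forensics technique, for identifying the recording device from recorded audio data. Noise is extracted from the recorded data, and the differences in this noise make efficient device identification possible. In this paper, device noise is extracted with a Wiener filter, and features are extracted using MirToolBox. The extracted features were used for training and identification with WEKA's multilayer neural network. The identification results showed an average performance of 99.8%.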

데이터 정보를 이용한 흑색 플라스틱 분류기 설계 (Design of Black Plastics Classifier Using Data Information)

  • 박상범;오성권
    • 전기학회논문지
    • /
    • Vol. 67, No. 4
    • /
    • pp.569-577
    • /
    • 2018
  • In this paper, a black plastic classifier based on a preprocessing algorithm is designed with the aid of information contained in the data. The slope and area of the spectrum obtained by laser induced breakdown spectroscopy (LIBS) are analyzed for each material, and the resulting information is used as the input data of the proposed classifier. The slope is represented by the rate of change of wavelength and intensity. The area is calculated at the wavelength of the spectrum peak where the material properties of chemical elements such as carbon and hydrogen appear. The input data of the proposed classifier is constructed from information such as slope and area. In the preprocessing part of the classifier, Principal Component Analysis (PCA) and fuzzy transform are used to reduce high-dimensional input variables to low-dimensional input variables. This improves both the characteristic analysis of the materials and the processing speed of the classifier. In the condition part, FCM clustering is applied, and in the conclusion part a linear function is used as the connection weight. Parameters such as the number of clusters, the fuzzification coefficient, and the number of input variables are optimized by means of Particle Swarm Optimization (PSO). To demonstrate the superiority of its classification performance, the classification rate is compared with that of classifiers in the WEKA 3.8 data mining software, such as Naive Bayes, SVM, and multilayer perceptron.

Set Covering 기반의 대용량 오믹스데이터 특징변수 추출기법 (Set Covering-based Feature Selection of Large-scale Omics Data)

  • 마정우;안기동;김광수;류홍서
    • 한국경영과학회지
    • /
    • Vol. 39, No. 4
    • /
    • pp.75-84
    • /
    • 2014
  • In this paper, we deal with the feature selection problem for large-scale, high-dimensional biological data such as omics data. For this problem, most previous approaches used a simple score function to reduce the number of original variables and then selected features from the small number of remaining variables. Methods that do not rely on filtering techniques either do not consider the interactions between the variables or generate approximate solutions to a simplified problem. In contrast, by combining set covering and clustering techniques, we developed a new method that can handle the total number of variables and consider the combinatorial effects of variables when selecting good features. To demonstrate the efficacy and effectiveness of the method, we downloaded gene expression datasets from TCGA (The Cancer Genome Atlas) and compared our method with other algorithms, including the feature selection algorithms embedded in WEKA. The experimental results show that our method selects high-quality features that yield more accurate classifiers than other feature selection algorithms.

효율적인 데이터베이스 마케팅을 위한 데이터마이닝 전처리도구에 관한 연구 (A Study on the Data Mining Preprocessing Tool For Efficient Database Marketing)

  • 이준석
    • 디지털융복합연구
    • /
    • Vol. 12, No. 11
    • /
    • pp.257-264
    • /
    • 2014
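  • For efficient database marketing, there is a growing need for data mining that can segment customers and discover new knowledge. Building a data mining tool requires step-by-step implementation; in this study, we constructed a data preprocessing tool for data mining that can adapt to distributed environments. We reviewed the preprocessing components of the existing data mining tools Answer Tree, Clementine, Enterprise Miner, Kensington, and Weka, and constructed a data mining preprocessing tool that can be used efficiently in a distributed environment. The newly proposed system is based on Enterprise JavaBeans and XML.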

의사결정트리와 인공 신경망 기법을 이용한 침입탐지 효율성 비교 연구 (A Comparative Study on the Performance of Intrusion Detection using Decision Tree and Artificial Neural Network Models)

  • 조성래;성행남;안병혁
    • 디지털산업정보학회논문지
    • /
    • Vol. 11, No. 4
    • /
    • pp.33-45
    • /
    • 2015
  • Currently, the Internet is used as an essential tool in business. Despite this importance, there is a risk of network attacks involving fraud, the collection of private information, and cyber terrorism. Firewalls and IDSs (Intrusion Detection Systems) are tools against those attacks. An IDS is used to determine whether network data represents a network attack. It analyzes the network data using various techniques, including expert systems, data mining, and state transition analysis. This paper compares the performance of two data mining models in detecting network attacks: a decision tree (C4.5) and a neural network (FANN model). I trained and tested these models and measured their effectiveness in terms of detection accuracy, detection rate, and false alarm rate, in order to find out which model is more effective for intrusion detection. In the analysis, I used the KDD Cup 99 data, a benchmark dataset in intrusion detection research. I used the open-source Weka software for the C4.5 model and available C++ code for the FANN model.
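
The three measures named in this abstract can be computed from confusion-matrix counts. The sketch below uses commonly used definitions, which are an assumption here because the abstract does not state the paper's exact formulas; the function name and counts are hypothetical.

```python
def ids_metrics(tp, fp, tn, fn):
    """Return (accuracy, detection_rate, false_alarm_rate).

    tp: attack records flagged as attacks
    fn: attack records that were missed
    fp: normal records flagged as attacks
    tn: normal records correctly passed
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    detection_rate = tp / (tp + fn)    # share of attacks that were detected
    false_alarm_rate = fp / (fp + tn)  # share of normal records flagged
    return accuracy, detection_rate, false_alarm_rate


# Hypothetical counts, not results from the paper.
accuracy, detection_rate, false_alarm_rate = ids_metrics(tp=90, fp=5, tn=95, fn=10)
print(accuracy, detection_rate, false_alarm_rate)
```

Reporting all three matters because a model can reach high accuracy on imbalanced traffic while still missing many attacks or raising many false alarms.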

스마트폰의 3축 가속도 센서를 이용한 농구 자세 인식 (Basket ball motion recognition using a 3-axis accelerometer sensor of smart phone)

  • 호종갑;이상준;왕창원;정화영;나예지;민세동
    • 한국정보처리학회:학술대회논문집
    • /
    • 한국정보처리학회 2015 Fall Conference
    • /
    • pp.1372-1374
    • /
    • 2015
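  • This paper shows the correlation between each posture and the 3-axis acceleration values in order to recognize five representative basketball postures: Standing shoot, Jump shoot, Pass, Dribble, and Lay-up shoot. From 3-axis acceleration values obtained with Sensor log, an application that generates data from the accelerometer built into a smartphone, we computed the vertical-axis and horizontal-axis values and the magnitude of the 3-axis acceleration and used them as instances. The correlation between each motion and the data values was examined with Weka, a representative data mining tool. The experiments showed an average of 59.8% with 10-fold cross-validation, but 80.8% with a separate training set and test set.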
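
The magnitude of the 3-axis acceleration mentioned in this abstract is typically the Euclidean norm of the three axis readings. The sketch below assumes that definition; the function name and sample values are hypothetical, and the abstract does not specify how the vertical and horizontal components were derived.

```python
import math


def acceleration_magnitude(ax, ay, az):
    """Euclidean norm of one 3-axis accelerometer sample."""
    return math.sqrt(ax * ax + ay * ay + az * az)


# Hypothetical sample, chosen so the result is easy to check by hand.
sample = (3.0, 4.0, 12.0)
print(acceleration_magnitude(*sample))  # 13.0
```

The magnitude does not depend on how the phone is oriented, which is one reason it is often used as a feature alongside the per-axis values.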