• Title/Summary/Keyword: classification/prediction

Application of data mining and statistical measurement of agricultural high-quality development

  • Yan Zhou
    • Advances in nano research / v.14 no.3 / pp.225-234 / 2023
  • In this study, we use big data resources and statistical analysis to derive reliable guidance for achieving high-quality, high-yield agricultural production. To this end, soil type data, rainfall and temperature data, and annual wheat production were collected for a specific region. The acquired data were cleaned using statistical methods to remove incomplete and defective records. We then applied several machine learning classification methods to distinguish the different factors and their influence on the final crop yield. The efficacy of the machine learning methods is discussed by comparing the models' predictions using two statistical quantities: the correlation factor and the mean squared error between predicted and actual crop yields. The results show that machine learning methods predict crop yields with high accuracy. Moreover, the random forest (RF) classification approach provides the best results among the classification methods used in this study.
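
The two evaluation quantities named in the abstract above, the correlation factor and the mean squared error between predicted and actual yields, can be sketched as follows. This is a minimal illustration rather than the authors' code: the yield values are made-up placeholders, the function names are hypothetical, and the correlation factor is assumed to be the Pearson coefficient.

```python
import math

def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def pearson_correlation(actual, predicted):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Placeholder yields (illustrative values only, not data from the study).
actual_yield = [3.1, 2.8, 3.6, 4.0, 3.3]
predicted_yield = [3.0, 2.9, 3.4, 4.1, 3.5]

mse = mean_squared_error(actual_yield, predicted_yield)
r = pearson_correlation(actual_yield, predicted_yield)
print(f"MSE = {mse:.4f}, r = {r:.4f}")
```

A lower MSE and a correlation closer to 1 both indicate predictions that track the observed yields more closely, which is how competing models would be ranked under this kind of comparison.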

A Study on the Performance Evaluation of Machine Learning for Predicting the Number of Movie Audiences (영화 관객 수 예측을 위한 기계학습 기법의 성능 평가 연구)

  • Jeong, Chan-Mi;Min, Daiki
    • The Journal of Society for e-Business Studies / v.25 no.2 / pp.49-63 / 2020
  • Accurate early-stage prediction of box office performance is crucial for the film industry to make better managerial decisions. With the aim of improving prediction performance, this paper evaluates the use of machine learning methods. We tested both classification-based and regression-based methods, including k-NN, SVM, and Random Forest. We first evaluate the input variables, which shows that reputation-related information generated during the first two weeks after release is significant. Prediction test results show that regression-based methods provide lower prediction error, and that Random Forest in particular outperforms the other machine learning methods. Regression-based methods have better predictive power when films have small box office earnings. Classification-based methods, on the other hand, work better for predicting large box office earnings.

Land Use Feature Extraction and Sprawl Development Prediction from Quickbird Satellite Imagery Using Dempster-Shafer and Land Transformation Model

  • Saharkhiz, Maryam Adel;Pradhan, Biswajeet;Rizeei, Hossein Mojaddadi;Jung, Hyung-Sup
    • Korean Journal of Remote Sensing / v.36 no.1 / pp.15-27 / 2020
  • Accurate knowledge of land use/land cover (LULC) features and their relative changes over time is essential for sustainable urban management. Urban sprawl has always been a worldwide concern that needs careful monitoring, particularly in developing countries where unplanned building construction has been expanding at a high rate. Recently, remotely sensed imagery with very high spatial/spectral resolution and state-of-the-art machine learning approaches have taken urban classification and growth monitoring to a higher level. In this research, we classified Quickbird satellite imagery of Karbala, Iraq, for the years 2002 and 2015 using object-based image analysis with Dempster-Shafer (OBIA-DS). The real LULC changes between these years, including residential sprawl expansion, were identified via a change detection procedure. Based on the extracted LULC features and the detected trend of the urban pattern, the future LULC dynamic was simulated using the land transformation model (LTM) on a geospatial information system (GIS) platform. Both the classification and prediction stages were successfully validated using ground control points (GCPs) through the Kappa coefficient accuracy assessment metric, which indicated 0.87 and 0.91 for the 2002 and 2015 classifications and 0.79 for the prediction part. Detailed results revealed substantial growth in buildings over fifteen years, which mostly replaced agricultural and orchard fields. The prediction scenario of LULC sprawl development for 2030 revealed a substantial decline in green and agricultural land as well as an extensive increase in built-up area, especially in the countryside of the city, without following the residential pattern standard. The proposed method helps urban decision-makers identify the detailed temporal-spatial growth pattern of highly populated cities like Karbala. Additionally, the results of this study can be considered a probable future map for designing sufficient future social services and amenities for the local inhabitants.
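
The Kappa coefficient used for validation in the abstract above can be computed from a confusion matrix of reference labels against classified labels. The sketch below assumes Cohen's kappa, the usual form in remote-sensing accuracy assessment; the 3-class matrix is hypothetical and is not taken from the study.

```python
def cohens_kappa(confusion):
    """Cohen's kappa for a square confusion matrix (rows: reference, columns: classified)."""
    total = sum(sum(row) for row in confusion)
    k = len(confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: sum over classes of (row total * column total) / total^2.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion)
        for i in range(k)
    ) / (total * total)
    return (observed - expected) / (1 - expected)

# Hypothetical ground-control-point counts for three land-cover classes.
matrix = [
    [45, 3, 2],
    [4, 42, 4],
    [1, 5, 44],
]
kappa = cohens_kappa(matrix)
print(f"kappa = {kappa:.2f}")
```

Unlike raw accuracy, kappa discounts the agreement that would be expected by chance given the class totals, which is why it is commonly reported alongside or instead of overall accuracy.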

A Development of Suicidal Ideation Prediction Model and Decision Rules for the Elderly: Decision Tree Approach (의사결정나무 기법을 이용한 노인들의 자살생각 예측모형 및 의사결정 규칙 개발)

  • Kim, Deok Hyun;Yoo, Dong Hee;Jeong, Dae Yul
    • The Journal of Information Systems / v.28 no.3 / pp.249-276 / 2019
  • Purpose: The purpose of this study is to develop a prediction model and decision rules for suicidal ideation among the elderly based on Korean Welfare Panel survey data. Using this data, we obtained many decision rules for predicting suicidal ideation in the elderly. Design/methodology/approach: This study used classification analysis based on the decision tree technique to derive decision rules for prediction. Weka 3.8 was used as the data mining tool. The decision tree algorithm used is J48, also known as C4.5. In addition, the total data was divided into learning data and verification data using a 66.6% split. We considered all possible variables based on previous studies on predicting suicidal ideation in the elderly. In the end, 99 variables, including the target variable, were used. Classification analysis was performed by introducing a sampling technique through backward elimination and data balancing. Findings: There were significant differences between the data sets. The selected data sets yielded different decision trees and several rules. Based on the decision tree method, we derived rules for suicide prevention. The decision tree derives rules for suicidal ideation not only in the depressed group but also in the non-depressed group. In addition, in developing the predictive model, the problem of over-fitting due to data imbalance was directly identified through the application of data balancing. We conclude that it is necessary to balance the data on the target variable in order to perform a correct classification analysis without over-fitting. Furthermore, even when data balancing is applied, the prediction rate is not inferior to that of a biased prediction model.

Artificial Intelligence-Based Colorectal Polyp Histology Prediction by Using Narrow-Band Image-Magnifying Colonoscopy

  • Istvan Racz;Andras Horvath;Noemi Kranitz;Gyongyi Kiss;Henriett Regoczi;Zoltan Horvath
    • Clinical Endoscopy / v.55 no.1 / pp.113-121 / 2022
  • Background/Aims: We have been developing an artificial intelligence-based polyp histology prediction (AIPHP) method that classifies Narrow Band Imaging (NBI) magnifying colonoscopy images to predict the hyperplastic or neoplastic histology of polyps. Our aim was to analyze the accuracy of histology predictions based on AIPHP and on the narrow-band imaging international colorectal endoscopic (NICE) classification, and to compare the results of the two methods. Methods: We studied 373 colorectal polyp samples taken by polypectomy from 279 patients. The documented NBI still images were analyzed in parallel by the AIPHP method and by the NICE classification. The AIPHP software was created using a machine learning method. The software measures five geometrical and color features on the endoscopic image. Results: The accuracy of AIPHP was 86.6% (323/373) across all polyps. We compared the AIPHP accuracy results for diminutive and non-diminutive polyps (82.1% vs. 92.2%; p=0.0032). The accuracy of the hyperplastic histology prediction was significantly better with NICE than with the AIPHP method, both in the diminutive polyps (n=207) (95.2% vs. 82.1%) (p<0.001) and in all evaluated polyps (n=373) (97.1% vs. 86.6%) (p<0.001). Conclusions: Our artificial intelligence-based polyp histology prediction software could predict histology with high accuracy only in the large-size polyp subgroup.
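
The reported AIPHP accuracies can be checked arithmetically from the counts in the abstract above. Only 323/373 and n=207 are stated in the source; the per-subgroup correct counts below are back-calculated assumptions (170 is the only integer that rounds to 82.1% of 207), not figures reported by the authors.

```python
def accuracy_percent(correct, total):
    """Accuracy as a percentage, rounded to one decimal place."""
    return round(100 * correct / total, 1)

# Counts stated in the abstract.
overall = accuracy_percent(323, 373)

# Assumption: back-calculated subgroup counts, not reported in the source.
diminutive_total = 207
diminutive_correct = 170  # inferred from 82.1% of 207
non_diminutive_total = 373 - diminutive_total
non_diminutive_correct = 323 - diminutive_correct

diminutive = accuracy_percent(diminutive_correct, diminutive_total)
non_diminutive = accuracy_percent(non_diminutive_correct, non_diminutive_total)
print(overall, diminutive, non_diminutive)
```

Under these inferred counts, the three percentages agree with the 86.6%, 82.1%, and 92.2% quoted in the abstract, so the reported figures are internally consistent.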

MOTIF BASED PROTEIN FUNCTION ANALYSIS USING DATA MINING

  • Lee, Bum-Ju;Lee, Heon-Gyu;Ryu, Keun-Ho
    • Proceedings of the KSRS Conference / v.2 / pp.812-815 / 2006
  • Proteins are essential agents for controlling, effecting, and modulating cellular functions. Proteins with similar sequences have diverged from a common ancestral gene and have similar structures and functions. Function prediction of unknown proteins remains one of the most challenging problems in bioinformatics. Recently, various computational approaches have been developed to identify short sequences that are conserved within a family of closely related protein sequences. Protein function is often correlated with highly conserved motifs. A motif is the smallest unit of protein structure and function and forms the core of a protein's structural and functional components. Therefore, prediction methods using data mining or machine learning have been developed. In this paper, we describe an approach to protein function prediction with motif-based models using data mining. Our work consists of three phases. We build training and test data sets and construct a classifier using the training set. Through experiments, we then compare our classifier with other classifiers in terms of classification accuracy.

Ensemble approach for improving prediction in kernel regression and classification

  • Han, Sunwoo;Hwang, Seongyun;Lee, Seokho
    • Communications for Statistical Applications and Methods / v.23 no.4 / pp.355-362 / 2016
  • Ensemble methods often help increase prediction ability in various predictive models by combining multiple weak learners and reducing the variability of the final predictive model. In this work, we demonstrate that ensemble methods also enhance prediction accuracy under kernel ridge regression and kernel logistic regression classification. We apply bagging and random forests to these two kernel-based predictive models and present the procedure by which bagging and random forests can be embedded in them. Our proposals are tested on numerous synthetic and real datasets and are then compared with plain kernel-based predictive models and their subsampling approach. Numerical studies demonstrate that the ensemble approach outperforms plain kernel-based predictive models.

A Take-off Clearance Prediction Model for Mixed Mode Runway Operations (출·도착 혼합 사용 활주로에서의 관제사 이륙 허가 예측 모형 개발)

  • Hong, Sungkwon;Jeon, Daekeun;Kim, Hyounkyoung
    • Journal of the Korean Society for Aviation and Aeronautics / v.24 no.3 / pp.48-54 / 2016
  • This paper proposes a model for predicting air traffic controllers' take-off clearances under mixed mode runway operations. The proposed model aims to better predict the controller's clearance for departing aircraft by considering various factors. To this end, the model uses a support vector machine classification algorithm. The proposed model is applied to real air traffic operations to demonstrate its performance.

Neuroimaging-Based Deep Learning in Autism Spectrum Disorder and Attention-Deficit/Hyperactivity Disorder

  • Song, Jae-Won;Yoon, Na-Rae;Jang, Soo-Min;Lee, Ga-Young;Kim, Bung-Nyun
    • Journal of the Korean Academy of Child and Adolescent Psychiatry / v.31 no.3 / pp.97-104 / 2020
  • Deep learning (DL) is a kind of machine learning technique that uses artificial intelligence to identify the characteristics of given data and efficiently analyze large amounts of information to perform tasks such as classification and prediction. In the field of neuroimaging of neurodevelopmental disorders, various biomarkers for diagnosis, classification, prognosis prediction, and treatment response prediction have been examined; however, they have not been efficiently combined to produce meaningful results. DL can be applied to overcome these limitations and produce clinically helpful results. Here, we review studies that combine neurodevelopmental disorder neuroimaging and DL techniques to explore the strengths, limitations, and future directions of this research area.

DNN based Binary Classification Model by Particular Matter Concentration (DNN 기반의 미세먼지 농도별 이진 분류 모델)

  • Lee, Jong-sung;Jung, Yong-jin;Oh, Chang-heon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.277-279 / 2021
  • Depending on the characteristics of each particulate matter concentration range, a prediction model may not be trained well. To solve this problem, prediction models for low and high concentrations need to be designed separately. A classification model is therefore needed to classify the concentration of particulate matter as low or high. This paper proposes a classification model that classifies low and high concentrations based on the concentration of particulate matter. A DNN was used as the classification algorithm, and the model was designed by applying the optimal parameters found through a hyperparameter search. In the performance evaluation of the model, 97.54% was measured for low-concentration classification and 85.51% for high-concentration classification.
