• Title/Summary/Keyword: Large-set Classification

The Predictive QSAR Model for hERG Inhibitors Using Bayesian and Random Forest Classification Method

  • Kim, Jun-Hyoung;Chae, Chong-Hak;Kang, Shin-Myung;Lee, Joo-Yon;Lee, Gil-Nam;Hwang, Soon-Hee;Kang, Nam-Sook
    • Bulletin of the Korean Chemical Society
    • /
    • v.32 no.4
    • /
    • pp.1237-1240
    • /
    • 2011
  • In this study, we developed a ligand-based in silico prediction model that classifies chemical structures as hERG blockers using Bayesian and random forest modeling methods. The models were built on patch clamp experimental results. The findings indicate that Laplacian-modified naive Bayesian classification with diverse selection is useful for predicting hERG inhibitors when a large data set is not available.

Estimating Prediction Errors in Binary Classification Problem: Cross-Validation versus Bootstrap

  • Kim Ji-Hyun;Cha Eun-Song
    • Communications for Statistical Applications and Methods
    • /
    • v.13 no.1
    • /
    • pp.151-165
    • /
    • 2006
  • It is important to estimate the true misclassification rate of a given classifier when an independent set of test data is not available. Cross-validation and the bootstrap are two possible approaches in this case. In the related literature, bootstrap estimators of the true misclassification rate have been asserted to perform better than cross-validation estimators for small samples. We compare the two estimators empirically when the classification rule adapts so closely to the training data that its apparent misclassification rate is close to zero. We confirm that bootstrap estimators perform better for small samples because of their small variance, and we find that their bias tends to be significant even for moderate to large samples, in which case cross-validation estimators perform better with less computation.
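
The comparison described in this abstract can be illustrated with a minimal sketch. It assumes a 1-nearest-neighbour rule as the "adaptive" classifier (its apparent error is zero on distinct points), 10-fold cross-validation, and a simplified .632 bootstrap that averages out-of-bag error per replicate. None of these choices are stated in the abstract; the data are synthetic and every function name is hypothetical.

```python
import random

def nn_predict(train, x):
    """1-nearest-neighbour rule on 1-D points: label of the closest training point."""
    return min(train, key=lambda p: abs(p[0] - x))[1]

def error_rate(train, test):
    """Fraction of test points misclassified by the 1-NN rule built on `train`."""
    return sum(nn_predict(train, x) != y for x, y in test) / len(test)

def cv_error(data, k):
    """k-fold cross-validation estimate of the misclassification rate."""
    folds = [data[i::k] for i in range(k)]
    wrong = 0
    for i, fold in enumerate(folds):
        train = [p for j, f in enumerate(folds) if j != i for p in f]
        wrong += sum(nn_predict(train, x) != y for x, y in fold)
    return wrong / len(data)

def bootstrap_632_error(data, n_boot, rng):
    """Simplified .632 bootstrap (an assumption, not the paper's estimator):
    0.368 * apparent error + 0.632 * mean out-of-bag error."""
    n = len(data)
    oob_errors = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        oob = [data[i] for i in range(n) if i not in chosen]
        if oob:  # skip the rare replicate that leaves nothing out
            oob_errors.append(error_rate([data[i] for i in idx], oob))
    e_oob = sum(oob_errors) / len(oob_errors)
    return 0.368 * error_rate(data, data) + 0.632 * e_oob

# Synthetic two-class sample: overlapping Gaussians (illustrative only).
rng = random.Random(0)
data = [(rng.gauss(0.0, 1.0), 0) for _ in range(20)]
data += [(rng.gauss(1.5, 1.0), 1) for _ in range(20)]
rng.shuffle(data)

apparent = error_rate(data, data)  # 1-NN memorises the training set
cv = cv_error(data, 10)
boot = bootstrap_632_error(data, 200, rng)
print(f"apparent={apparent:.3f}  10-fold CV={cv:.3f}  .632 bootstrap={boot:.3f}")
```

Repeating the experiment over many synthetic samples of different sizes would be needed to examine the bias and variance behaviour the abstract reports; a single run shows only how the two estimates are computed.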

Municipal waste classification system design based on Faster-RCNN and YoloV4 mixed model

  • Liu, Gan;Lee, Sang-Hyun
    • International Journal of Advanced Culture Technology
    • /
    • v.9 no.3
    • /
    • pp.305-314
    • /
    • 2021
  • Due to COVID-19, household waste from food delivery packaging currently has a large impact on the environment. In this paper, we design and implement Faster-RCNN, SSD, and YOLOv4 models for municipal waste detection and classification. The data set covers two types of plastics, which account for a large proportion of household waste, and types of aluminum cans. To classify the plastic type and the aluminum can type, 1,083 aluminum can types and 1,003 plastic types were studied. In addition, to increase accuracy, we compare and evaluate the loss and accuracy values of the three models (Faster-RCNN, SSD, and YOLOv4) for municipal waste detection and classification. As the final result of this paper, the average precision of the SSD model is 99.99%, the average precision for plastics is 97.65%, and the mAP is 99.78%, which is the best result.

An Application of Artificial Intelligence System for Accuracy Improvement in Classification of Remotely Sensed Images (원격탐사 영상의 분류정확도 향상을 위한 인공지능형 시스템의 적용)

  • 양인태;한성만;박재국
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography
    • /
    • v.20 no.1
    • /
    • pp.21-31
    • /
    • 2002
  • This study applied neural network theory and fuzzy set theory to improve classification accuracy for remotely sensed images. Remotely sensed data have been used to map land cover, and the accuracy depends on a range of factors related to the data set and the methods used. Thus, the accuracy of maps derived from conventional supervised image classification techniques is a function of factors related to the training, allocation, and testing stages of the classification. Conventional image classification techniques assume that all pixels within the image are pure, that is, that each represents an area of homogeneous cover of a single land-cover class. This assumption is often untenable, because pixels of mixed land-cover composition are abundant in an image, and mixed pixels are a major problem in land-cover mapping applications. For each pixel, the strengths of class membership derived in the classification may be related to its land-cover composition. The concept of a pixel having a degree of membership in all classes is fundamental to fuzzy-sets-based techniques. A major problem with the fuzzy-sets and probabilistic methods is that they are slow and computationally demanding. For analyzing large data sets and for rapid processing, alternative techniques are required. One particularly attractive approach is the use of artificial neural networks. These are non-parametric techniques that have been shown to be generally capable of classifying data as accurately as, or more accurately than, conventional classifiers. An artificial neural network, once trained, may classify data extremely rapidly, as the classification process may be reduced to a large number of extremely simple calculations that can be performed in parallel.

Segmentation-free Recognition of Touching Numeral Pairs (두자 접촉 숫자열의 분할 자유 인식)

  • Choi, Soon-Man;Oh, Il-Seok
    • Journal of KIISE:Software and Applications
    • /
    • v.27 no.5
    • /
    • pp.563-574
    • /
    • 2000
  • Recognition of numeral fields is a very important task for many document automation applications. Conventional methods are based on a two-step process: segmentation of the touching numerals, followed by recognition of the individual numerals. However, because touching types vary widely, this approach has not produced robust results. In this paper, we present a new segmentation-free method for recognizing two touching numerals. In this approach, two touching numerals are regarded as a single pattern belonging to one of 100 classes ('00', '01', '02', ..., '98', '99'). For the test set, we manually extracted pairs of touching numerals from the NIST numeral field data set. Because conventional neural networks are limited in large-set classification, we use a modular neural network and prove its superiority through recognition experiments.

Predictive Analysis of Financial Fraud Detection using Azure and Spark ML

  • Priyanka Purushu;Niklas Melcher;Bhagyashree Bhagwat;Jongwook Woo
    • Asia pacific journal of information systems
    • /
    • v.28 no.4
    • /
    • pp.308-319
    • /
    • 2018
  • This paper aims to provide valuable insights into financial fraud detection for mobile money transaction activity. We predicted and classified transactions as normal or fraudulent, using a small sample with Azure and a massive data set with Spark ML, which represent traditional systems and Big Data respectively. Experimenting with the sample data set in Azure, we found that the Decision Forest model is the most accurate in terms of recall. For the massive data set using Spark ML, the Random Forest classifier proves to be the best algorithm. We show that the Spark cluster builds and evaluates models much faster as more servers are added to the cluster, with the same accuracy, which indicates that large-scale data sets can be used for prediction on a Big Data platform. Finally, we reached a recall score of 0.73, which implies satisfactory quality in predicting fraudulent transactions.

Prototype-Based Classification Using Class Hyperspheres (클래스 초월구를 이용한 프로토타입 기반 분류)

  • Lee, Hyun-Jong;Hwang, Doosung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.5 no.10
    • /
    • pp.483-488
    • /
    • 2016
  • In this paper, we propose a prototype-based classification learning method that uses the nearest-neighbor rule. The nearest-neighbor rule is applied to segment the class area of all the training data with hyperspheres, and each hypersphere must cover only data from the same class. The radius of a hypersphere is computed as the midpoint of two distances: the distance to the farthest same-class point and the distance to the nearest other-class point. We then transform the prototype selection problem into a set covering problem in order to determine the smallest set of prototypes that covers all the training data. The proposed prototype selection method is designed as a greedy algorithm and can process a large-scale training set in parallel. The prediction rule is the nearest-neighbor rule, with the set of prototypes serving as the new training data. In experiments, the generalization performance of the proposed method is superior to that of existing methods.
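
The procedure in this abstract can be sketched as follows. This is one reading of the description, not the authors' implementation: the "farthest same-class point" is assumed to be the farthest same-class point that still lies closer than the nearest other-class point, points of different classes are assumed never to coincide, and at least two classes are assumed. All names and the toy data are hypothetical.

```python
import math

def build_spheres(X, y):
    """One candidate hypersphere per training point: (radius, covered same-class indices)."""
    spheres = []
    for xi, yi in zip(X, y):
        # Distance to the nearest point of any other class.
        d_other = min(math.dist(xi, xj) for xj, yj in zip(X, y) if yj != yi)
        # Same-class points strictly inside that distance (includes the centre itself).
        same = [(math.dist(xi, xj), j)
                for j, (xj, yj) in enumerate(zip(X, y))
                if yj == yi and math.dist(xi, xj) < d_other]
        d_same = max(d for d, _ in same)
        radius = (d_same + d_other) / 2  # midpoint of the two distances
        spheres.append((radius, {j for _, j in same}))
    return spheres

def greedy_cover(spheres, n):
    """Greedy set cover: repeatedly pick the sphere covering the most uncovered points."""
    uncovered = set(range(n))
    prototypes = []
    while uncovered:
        best = max(range(len(spheres)),
                   key=lambda i: len(spheres[i][1] & uncovered))
        prototypes.append(best)
        uncovered -= spheres[best][1]
    return prototypes

def predict(X, y, prototypes, x):
    """Nearest-neighbour rule restricted to the selected prototypes."""
    nearest = min(prototypes, key=lambda i: math.dist(X[i], x))
    return y[nearest]

# Toy 2-D data: two well-separated classes of four points each.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
y = [0, 0, 0, 0, 1, 1, 1, 1]

spheres = build_spheres(X, y)
prototypes = greedy_cover(spheres, len(X))
print("prototype indices:", prototypes)
print("prediction for (0.5, 0.2):", predict(X, y, prototypes, (0.5, 0.2)))
```

Because each sphere is built independently of the others, the `build_spheres` loop is the part that could be distributed across workers, which is consistent with the abstract's remark about parallel processing.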

Stability evaluation model for loess deposits based on PCA-PNN

  • Li, Guangkun;Su, Maoxin;Xue, Yiguo;Song, Qian;Qiu, Daohong;Fu, Kang;Wang, Peng
    • Geomechanics and Engineering
    • /
    • v.27 no.6
    • /
    • pp.551-560
    • /
    • 2021
  • Because of their low strength and high compressibility, tunnels in loess deposits are prone to large deformations and collapse. An accurate stability evaluation for loess deposits is of considerable significance for deformation control and safety work during tunnel construction. Thirty-seven groups of representative data from real loess deposit cases were adopted to establish the stability evaluation model for the tunnel project in Yan'an, China. Physical and mechanical indices, including water content, cohesion, internal friction angle, elastic modulus, and Poisson's ratio, were selected as the index system for the stability level of loess. The data set was randomly divided into a training set (80%) and a test set (20%). First, principal component analysis (PCA) was used to convert the five indices into three linearly independent principal components X1, X2, and X3. Then, the principal components were used as input vectors for a probabilistic neural network (PNN) to map the nonlinear relationship between the index system and the stability level of loess. Furthermore, leave-one-out cross-validation was applied to the training set to find a suitable smoothing factor. Finally, the established model with the target smoothing factor of 0.04 was applied to the test set, and a prediction accuracy of 100% was obtained. This intelligent classification method for loess deposits is easy to carry out and has wide potential application in evaluating loess deposits.
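
The PNN and smoothing-factor search described above can be sketched in a few lines. The sketch assumes a standard Gaussian-kernel PNN (one kernel per training sample, averaged per class) and skips the PCA step, so the inputs stand in for already-computed principal components. The sample values, class labels, and candidate smoothing factors are made up for illustration and are not the paper's data.

```python
import math

def pnn_predict(X, y, x, sigma):
    """Gaussian-kernel PNN: average kernel response per class, return the largest."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        sums[yi] = sums.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    return max(sums, key=lambda c: sums[c] / counts[c])

def loo_accuracy(X, y, sigma):
    """Leave-one-out accuracy on the training set for a given smoothing factor."""
    hits = 0
    for i in range(len(X)):
        X_rest, y_rest = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        hits += pnn_predict(X_rest, y_rest, X[i], sigma) == y[i]
    return hits / len(X)

def pick_sigma(X, y, candidates):
    """Smoothing factor with the best leave-one-out accuracy (first one on ties)."""
    return max(candidates, key=lambda s: loo_accuracy(X, y, s))

# Hypothetical principal-component scores (X1, X2, X3) and stability levels.
X_train = [(0.10, 0.20, 0.10), (0.20, 0.10, 0.15), (0.15, 0.30, 0.20),
           (0.90, 0.80, 0.85), (0.80, 0.90, 0.95), (0.85, 0.70, 0.80)]
y_train = [0, 0, 0, 1, 1, 1]
candidates = [0.01, 0.04, 0.1, 0.5]

best_sigma = pick_sigma(X_train, y_train, candidates)
label = pnn_predict(X_train, y_train, (0.12, 0.18, 0.12), best_sigma)
print("chosen smoothing factor:", best_sigma, "predicted level:", label)
```

On data this cleanly separated every candidate reaches the same leave-one-out accuracy, so the search simply returns the first one; the selection only becomes informative on overlapping classes.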

Band Selection Using Forward Feature Selection Algorithm for Citrus Huanglongbing Disease Detection

  • Katti, Anurag R.;Lee, W.S.;Ehsani, R.;Yang, C.
    • Journal of Biosystems Engineering
    • /
    • v.40 no.4
    • /
    • pp.417-427
    • /
    • 2015
  • Purpose: This study investigated different band selection methods to classify spectrally similar data, obtained from aerial images of healthy citrus canopies and canopies infected with citrus greening disease (Huanglongbing, or HLB), using small differences without unmixing endmember components and therefore without the need for an endmember library. However, the large number of hyperspectral bands has high redundancy, which had to be reduced through band selection. The objective, therefore, was first to select the best set of bands and then to detect HLB-infected canopies using these bands in aerial hyperspectral images. Methods: The forward feature selection algorithm (FFSA) was chosen for band selection. The selected bands were used to identify HLB-infected pixels with various classifiers, such as K nearest neighbor (KNN), support vector machine (SVM), naïve Bayesian classifier (NBC), and generalized local discriminant bases (LDB). All bands were also used, for comparison. Results: A few well-chosen bands yielded much better results than all bands together, and brought the classification results on par with standard hyperspectral classification techniques such as spectral angle mapper (SAM) and mixture tuned matched filtering (MTMF). Median detection accuracies ranged from 66% to 80%, which showed great potential for rapid detection of the disease. Conclusions: Among the methods investigated, a support vector machine classifier combined with the forward feature selection algorithm yielded the best results.
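
Forward feature selection, as named in this abstract, is a greedy wrapper: start with no bands, add the band that most improves a score, and stop when nothing helps. The sketch below assumes leave-one-out 1-nearest-neighbour accuracy as the score (KNN is one of the classifiers the abstract lists; the best-performing SVM is not reproduced here). The reflectance values are invented and the function names are hypothetical.

```python
import math

def loo_1nn_accuracy(samples, labels, bands):
    """Leave-one-out accuracy of a 1-NN classifier using only the given bands."""
    hits = 0
    for i, s in enumerate(samples):
        others = [j for j in range(len(samples)) if j != i]
        nearest = min(others, key=lambda j: math.dist(
            [s[b] for b in bands], [samples[j][b] for b in bands]))
        hits += labels[nearest] == labels[i]
    return hits / len(samples)

def forward_select(n_bands, score, max_bands):
    """Greedy forward selection: add the best band until the score stops improving."""
    selected, best_score = [], float("-inf")
    while len(selected) < max_bands:
        candidates = [b for b in range(n_bands) if b not in selected]
        if not candidates:
            break
        top_score, top_band = max((score(selected + [b]), b) for b in candidates)
        if top_score <= best_score:
            break  # no candidate improves on the current subset
        selected.append(top_band)
        best_score = top_score
    return selected, best_score

# Invented 4-band spectra: only band 2 separates the two classes.
samples = [(0.5, 0.1, 0.20, 0.9), (0.1, 0.8, 0.25, 0.2), (0.9, 0.4, 0.22, 0.5),
           (0.4, 0.2, 0.60, 0.8), (0.2, 0.9, 0.65, 0.1), (0.8, 0.5, 0.62, 0.6)]
labels = ["healthy"] * 3 + ["infected"] * 3

selected, best = forward_select(
    4, lambda bands: loo_1nn_accuracy(samples, labels, bands), max_bands=3)
print("selected bands:", selected, "score:", best)
```

The toy run stops after one band because adding any noisy band cannot raise the score further, which mirrors the abstract's finding that a few well-chosen bands can outperform the full set.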

WAVELET-BASED FOREST AREAS CLASSIFICATION BY USING HIGH RESOLUTION IMAGERY

  • Yoon Bo-Yeol;Kim Choen
    • Proceedings of the KSRS Conference
    • /
    • 2005.10a
    • /
    • pp.698-701
    • /
    • 2005
  • This paper examines the extraction of information on forest areas from high resolution imagery based on wavelet transformation. First, study areas are selected as spots where one or more species are distributed, with reference to a forest type map. Next, each study area is cut to 256 x 256 pixels because of image processing problems with large data volumes. Prior to wavelet transformation, five texture parameters (contrast, dissimilarity, entropy, homogeneity, and Angular Second Moment (ASM)) are calculated using the Gray Level Co-occurrence Matrix (GLCM). The five texture images are produced with a 3x3 shifting window, a distance of 1 pixel, and an angle of 45 degrees. The Daubechies 4 wavelet basis function is selected as the wavelet function. The results are summarized in three points. First, wavelet-transformed images derived from the contrast and dissimilarity texture parameters have an effect on the detection of edge elements and could be used for forest road detection. Second, wavelet fusion images derived from the texture parameters and the original image can be applied to forest area classification because of clustering in homogeneous forest type structures. Third, for grading evaluation of forest-fire-damaged areas, a high-accuracy result is expected if data fusion of the established classification method, the GLCM texture extraction concept, and the wavelet transformation technique is applied effectively to forest areas (and to other areas).
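
The five GLCM texture parameters named in this abstract have standard definitions, sketched below for a distance of 1 pixel at 45 degrees. The sketch assumes a non-symmetric co-occurrence matrix, takes "45 degrees" to mean the neighbour one row up and one column right, and computes the matrix over a whole patch rather than inside the 3x3 shifting window the paper uses. The wavelet step is not shown, and the small test image is invented.

```python
import math

def glcm(image, levels, dr=-1, dc=1):
    """Normalised grey-level co-occurrence matrix for pixel offset (dr, dc).
    The default (-1, +1) is distance 1 at 45 degrees (up and to the right)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
    total = sum(map(sum, counts))
    return [[v / total for v in row] for row in counts]

def texture_features(p):
    """Contrast, dissimilarity, entropy, homogeneity and ASM of a normalised GLCM."""
    feats = {"contrast": 0.0, "dissimilarity": 0.0, "entropy": 0.0,
             "homogeneity": 0.0, "asm": 0.0}
    for i, row in enumerate(p):
        for j, v in enumerate(row):
            if v == 0:
                continue  # zero entries contribute nothing (and log(0) is undefined)
            feats["contrast"] += v * (i - j) ** 2
            feats["dissimilarity"] += v * abs(i - j)
            feats["entropy"] -= v * math.log(v)
            feats["homogeneity"] += v / (1 + (i - j) ** 2)
            feats["asm"] += v * v
    return feats

# Invented 4x4 patch quantised to 4 grey levels.
patch = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 2, 2, 2],
         [2, 2, 3, 3]]
p = glcm(patch, levels=4)
features = texture_features(p)
flat = texture_features(glcm([[1, 1], [1, 1]], levels=2))  # uniform patch
print(features)
```

A uniform patch gives zero contrast, dissimilarity, and entropy with homogeneity and ASM equal to one, which is a quick sanity check for any implementation of these measures.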
