• Title/Summary/Keyword: top-k classification

Naive Bayes classifiers boosted by sufficient dimension reduction: applications to top-k classification

  • Yang, Su Hyeong;Shin, Seung Jun;Sung, Wooseok;Lee, Choon Won
    • Communications for Statistical Applications and Methods / v.29 no.5 / pp.603-614 / 2022
  • The naive Bayes classifier is one of the most straightforward classification tools and directly estimates the class probability. However, because it relies on the assumption that the predictors are independent, which is rarely satisfied in real-world problems, its application is limited in practice. In this article, we propose employing sufficient dimension reduction (SDR) to substantially improve the performance of the naive Bayes classifier, which often deteriorates when the number of predictors is not restrictively small. This is not surprising, as SDR reduces the predictor dimension without sacrificing classification information, and predictors in the reduced space are constructed to be uncorrelated. Therefore, SDR leads the naive Bayes to no longer be naive. We applied the proposed naive Bayes classifier after SDR to build a recommendation system for eyewear frames based on customers' face shapes, demonstrating its utility in the top-k classification problem.
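To make the term concrete: in top-k classification, the k most probable classes are returned instead of a single prediction. The following is a minimal sketch of that selection step only, not the authors' SDR-based method; the function name `top_k_classes`, the frame labels, and the probability values are all hypothetical.

```python
def top_k_classes(class_probs, k):
    """Return the k class labels with the highest estimated probability."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Sort labels by descending probability; ties keep their original order.
    ranked = sorted(class_probs, key=class_probs.get, reverse=True)
    return ranked[:k]


# Hypothetical class probabilities for one customer (illustrative values only).
frame_probs = {"round": 0.10, "square": 0.45, "aviator": 0.30, "oval": 0.15}
recommended = top_k_classes(frame_probs, k=2)
print(recommended)
```

Any classifier that outputs class probabilities, including a naive Bayes classifier, could supply the input dictionary.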

Visualized Malware Classification Based-on Convolutional Neural Network (Convolutional Neural Network 기반의 악성코드 이미지화를 통한 패밀리 분류)

  • Seok, Seonhee;Kim, Howon
    • Journal of the Korea Institute of Information Security & Cryptology / v.26 no.1 / pp.197-208 / 2016
  • In this paper, we propose a malware classification method based on a convolutional neural network, a type of deep neural network. We convert malware code into an image and train the convolutional neural network on these images. In an experiment classifying 9 families, the proposed method records top-1 and top-2 accuracies of 96.2% and 98.7%. The model can also classify 27 families with top-1 and top-2 accuracies of 82.9% and 89%.

Analysis of Pyrolysis MS Spectra in Top-down Approach and Differentiation of Gram-type Cells (Top-down 방식의 열분해질량분석 스펙트라 분석 및 Gram-type 세균 분류)

  • Kim, Ju-Hyun
    • Journal of the Korea Institute of Military Science and Technology / v.14 no.4 / pp.719-725 / 2011
  • To apply TMAH-based Py-MS to a field biological detection system for real-time classification of cell type, reproducible patterns in the TMAH-based Py-MS spectra are known to be a critical factor for classification, but they are seriously disturbed by the quantity of cells injected into the pyro-tube. This factor is an external variable that cannot be compensated for by improving the performance of the TMAH-based Py-MS instrument. One idea for solving this difficult problem came from "Top-down proteomics for identification of intact microorganisms". That is, in a top-down approach, biomarker peaks are selected from the complicated Py-MS spectra of intact microorganisms by tracing their origins, based on Py-MS spectra of the characteristic components of different cell types. This idea was tested on the classification of different Gram-type microorganisms. The spectra of the characteristic components (peptidoglycan and lipoteichoic acid for Gram-positive cells, and lipopolysaccharide and lipid A for Gram-negative cells) were analyzed and compared with the spectra of the corresponding Gram-type cells, and biomarker peaks were selected for PCA (Principal Component Analysis) to examine how well the different Gram types separate, resulting in a significant improvement in their classification. Furthermore, weighting the biomarker peaks in the intact cells' spectra, based on the data for the characteristic components of the Gram types, further improved classification performance.

EFTG: Efficient and Flexible Top-K Geo-textual Publish/Subscribe

  • Zhu, Hong;Li, Hongbo;Cui, Zongmin;Cao, Zhongsheng;Xie, Meiyi
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.12 / pp.5877-5897 / 2018
  • With the popularity of mobile networks and smartphones, geo-textual publish/subscribe messaging has attracted wide attention. Unlike the traditional publish/subscribe format, geo-textual data is published and subscribed to as a dynamic data flow in the mobile network. This difference creates additional requirements for efficiency and flexibility. However, most existing Top-k geo-textual publish/subscribe schemes have the following deficiencies: (1) All publications have to be scored for each subscription, which is not efficient enough. (2) A user must take time to set a threshold for each subscription, which is not flexible enough. Therefore, we propose an efficient and flexible Top-k geo-textual publish/subscribe scheme. First, our scheme groups publications and subscriptions based on text classification. Thus, only a small set of related publications needs to be scored for each subscription, which significantly enhances efficiency. Second, our scheme proposes an adaptive publish/subscribe matching algorithm. The algorithm does not require the user to set a threshold. It can adaptively return Top-k results to the user for each subscription, which significantly enhances flexibility. Finally, theoretical analysis and experimental evaluation verify the efficiency and effectiveness of our scheme.

A Decision Tree Algorithm using Genetic Programming

  • Park, Chongsun;Ko, Young Kyong
    • Communications for Statistical Applications and Methods / v.10 no.3 / pp.845-857 / 2003
  • We explore the use of genetic programming to evolve decision trees directly for classification problems with both discrete and continuous predictors. We demonstrate that the hypotheses derived by standard algorithms can deviate substantially from the optimum. This deviation is partly due to their top-down style procedures. The performance of the system is measured on a set of real and simulated data sets and compared with the performance of well-known algorithms such as CHAID, CART, C5.0, and QUEST. The proposed algorithm seems to be effective in handling problems caused by the top-down style procedures of existing algorithms.

A Feature Selection-based Ensemble Method for Arrhythmia Classification

  • Namsrai, Erdenetuya;Munkhdalai, Tsendsuren;Li, Meijing;Shin, Jung-Hoon;Namsrai, Oyun-Erdene;Ryu, Keun Ho
    • Journal of Information Processing Systems / v.9 no.1 / pp.31-40 / 2013
  • In this paper, a novel method is proposed to build an ensemble of classifiers by using a feature selection schema. The feature selection schema identifies the best feature sets that affect the arrhythmia classification. First, a number of feature subsets are extracted by applying the feature selection schema to the original dataset. Then classification models are built using each feature subset. Finally, we combine the classification models by adopting a voting approach to form a classification ensemble. The voting approach in our method uses both the classification error rate and the feature selection rate to calculate the score of each classifier in the ensemble. In our method, the feature selection rate depends on the order in which the feature subsets are extracted. In the experiment, we applied our method to an arrhythmia dataset and generated the top three disjoint feature sets. We then built three classifiers based on these top three feature subsets and formed the classifier ensemble by using the voting approach. Our method can improve classification accuracy on high-dimensional datasets. The performance of each classifier and the performance of their ensemble were higher than the performance of the classifier based on the whole feature space of the dataset. The classification performance was improved, and a more stable classification model could be constructed with the proposed approach.

Instagram image classification with Deep Learning (딥러닝을 이용한 인스타그램 이미지 분류)

  • Jeong, Nokwon;Cho, Soosun
    • Journal of Internet Computing and Services / v.18 no.5 / pp.61-67 / 2017
  • In this paper, we introduce two experimental results from the classification of Instagram images and some valuable lessons from them. We ran experiments to evaluate the competitiveness of convolutional neural networks (CNNs) in classifying real social network images such as Instagram images. We used AlexNet and ResNet, which showed the most outstanding capabilities in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 and 2015, respectively. We used 240 Instagram images and 12 pre-defined categories for classifying social network images. We also performed fine-tuning using the Inception V3 model and compared the results. For the four cases of AlexNet, ResNet, Inception V3, and fine-tuned Inception V3, the Top-1 error rates were 49.58%, 40.42%, 30.42%, and 5.00%, and the Top-5 error rates were 35.42%, 25.00%, 20.83%, and 0.00%, respectively.
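For readers unfamiliar with the metric, the Top-k error rate is the fraction of samples whose true class is not among the k highest-scoring predictions. The sketch below shows one common way to compute it; the function name `top_k_error` and the score and label values are hypothetical and are not taken from the paper.

```python
def top_k_error(scores, labels, k):
    """Fraction of samples whose true label is not among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    misses = 0
    for row, true_label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Hypothetical scores for four samples over three classes (illustrative only).
scores = [
    [0.70, 0.20, 0.10],  # true class 0 is ranked first
    [0.35, 0.45, 0.20],  # true class 0 is ranked second
    [0.10, 0.30, 0.60],  # true class 0 is ranked last
    [0.20, 0.50, 0.30],  # true class 1 is ranked first
]
labels = [0, 0, 0, 1]

top1 = top_k_error(scores, labels, k=1)
top2 = top_k_error(scores, labels, k=2)
print(top1, top2)
```

Because the top-k set grows with k, the Top-k error can never increase as k increases, which is why the Top-5 rates above are lower than the Top-1 rates.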

Development of surface defect inspection algorithms for cold mill strip (냉연 표면흠 검사 알고리듬 개발에 관한 연구)

  • Kim, Kyoung-Min;Park, Gwi-Tae;Park, Joong-Jo;Lee, Jong-Hak;Jung, Jin-Yang;Lee, Joo-Kang
    • Journal of Institute of Control, Robotics and Systems / v.3 no.2 / pp.179-186 / 1997
  • In this paper, we propose surface defect inspection algorithms for cold mill strip. The defects on the surface of cold mill strip have a scattered or singular distribution. The proposed method consists of preprocessing, feature extraction, and defect classification. Preprocessing produces a binarized defect image, using the top-hat transform, adaptive thresholding, thinning, and noise rejection. In particular, the top-hat transform, which uses local min/max operations, diminishes the effect of bad lighting. In feature extraction, geometric, moment, and co-occurrence matrix features are calculated. For defect classification, a multilayer neural network is used. The proposed algorithm showed a 15% error rate.
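The role of local min/max operations in suppressing uneven lighting can be illustrated with a white top-hat transform: the image minus its morphological opening (a local minimum followed by a local maximum). This is a minimal pure-Python sketch under assumptions, not the authors' implementation: the square window, the border handling, and the test image are all hypothetical, and for dark defects on a bright strip the black top-hat (closing minus image) would be the analogous choice.

```python
def _local_filter(image, radius, reducer):
    """Apply reducer (min or max) over a square window clipped at the image border."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                image[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = reducer(window)
    return out


def white_top_hat(image, radius=1):
    """Image minus its opening (local min, then local max)."""
    eroded = _local_filter(image, radius, min)   # removes small bright details
    opened = _local_filter(eroded, radius, max)  # restores the slowly varying background
    return [
        [pixel - background for pixel, background in zip(img_row, open_row)]
        for img_row, open_row in zip(image, opened)
    ]


# Hypothetical 5x5 image: a left-to-right lighting ramp plus one bright defect.
image = [[10 + c for c in range(5)] for _ in range(5)]
image[2][2] += 40

hat = white_top_hat(image, radius=1)
for row in hat:
    print(row)
```

In this example the lighting ramp is almost entirely removed while the defect keeps its full contrast of 40, so a single global threshold can then separate it from the background.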

Analysis of Nursing Interventions Performed by Gynecological Nursing Unit Nurses Using the Nursing Interventions Classification (간호중재분류 (NIC)에 근거한 부인과 간호단위의 간호중재 분석)

  • Hong, Sung-Jung;Lee, Sung-Hee;Kim, Hwa-Sun
    • Women's Health Nursing / v.17 no.3 / pp.275-284 / 2011
  • Purpose: The purpose of this study was to identify the nursing interventions performed by nurses on gynecological nursing units. Methods: The instrument in this study is based on the fifth edition of the Nursing Interventions Classification (NIC) (2008). Data were collected from electronic medical records from August 2010 to October 2010 at one hospital and analyzed using frequencies in Microsoft Excel 2010. Results: Of a total of 82 NIC interventions, the domains with the highest percentages were physiological: basic (36.3%) and physiological: complex (34.5%). The classes with the highest percentages were health system medication (12.1%), perioperative care (10.0%), and drug management (8.6%). The most frequently used intervention was Discharge Planning, and the least used of the top thirty interventions was Environmental Management. The top thirty most frequently used interventions belonged to the domains of physiological: basic (37.9%), physiological: complex (31.1%), and behavioral (5.4%). Conclusion: These findings will help in the establishment of a standardized language for gynecological nursing units and enhance the quality of nursing care.

Variations of AlexNet and GoogLeNet to Improve Korean Character Recognition Performance

  • Lee, Sang-Geol;Sung, Yunsick;Kim, Yeon-Gyu;Cha, Eui-Young
    • Journal of Information Processing Systems / v.14 no.1 / pp.205-217 / 2018
  • Deep learning using convolutional neural networks (CNNs) is being studied in various fields of image recognition, and these studies show excellent performance. In this paper, we compare the performance of two CNN architectures, KCR-AlexNet and KCR-GoogLeNet. The experimental data used in this paper were obtained from PHD08, a large-scale Korean character database. It has 2,187 samples for each of 2,350 Korean character classes, for a total of 5,139,450 data samples. In the training results, KCR-AlexNet showed an accuracy of over 98% for the top-1 test and KCR-GoogLeNet showed an accuracy of over 99% for the top-1 test after the final training iteration. We made an additional Korean character dataset with fonts that were not in PHD08 to compare the classification success rate with commercial optical character recognition (OCR) programs and ensure the objectivity of the experiment. While the commercial OCR programs showed classification success rates of 66.95% to 83.16%, KCR-AlexNet and KCR-GoogLeNet showed average classification success rates of 90.12% and 89.14%, respectively, which are higher than the commercial OCR programs' rates. In terms of time, KCR-AlexNet was faster than KCR-GoogLeNet when they were trained using PHD08; however, KCR-GoogLeNet had a faster classification speed.