• Title/Summary/Keyword: Classification Algorithms


Resume Classification System using Natural Language Processing & Machine Learning Techniques

  • Irfan Ali; Nimra; Ghulam Mujtaba; Zahid Hussain Khand; Zafar Ali; Sajid Khan
    • International Journal of Computer Science & Network Security / v.24 no.7 / pp.108-117 / 2024
  • Selecting and recommending a suitable job applicant from a pool of thousands of applications is often a daunting task for an employer, and the recommendation and selection process significantly increases the workload of the department concerned. A Resume Classification System based on Natural Language Processing (NLP) and Machine Learning (ML) techniques could automate this tedious process and ease the employer's job. Automation can also significantly speed up the applicant selection process and make it more transparent, with minimal human involvement. Various ML approaches have been proposed for building Resume Classification Systems; this study presents an automated NLP- and ML-based system that classifies resumes according to job categories with performance guarantees. The study applies various ML algorithms and NLP techniques, measures the accuracy of the resulting Resume Classification Systems, and proposes a solution with better accuracy and reliability across different settings. To demonstrate the value of NLP and ML techniques for processing and classifying resumes, the extracted features were tested on nine machine learning models: Support Vector Machine (SVM) variants (Linear, SGD, SVC, and NuSVC), Naïve Bayes variants (Bernoulli, Multinomial, and Gaussian), K-Nearest Neighbor (KNN), and Logistic Regression (LR). The Term Frequency-Inverse Document Frequency (TF-IDF) feature representation proved suitable for the resume classification task. The models were evaluated using F-ScoreM, RecallM, PrecisionM, and overall accuracy. The experimental results indicate that, using the One-vs-Rest classification strategy for this multi-class task, the SVM family of algorithms performed better than the other models on the study dataset, with over 96% overall accuracy. These promising results suggest that the NLP and ML techniques employed in this study can be used for resume classification.
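The TF-IDF plus One-vs-Rest setup described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the smoothed IDF formula, the hinge-loss subgradient training (one simple way to fit a linear SVM), the hyperparameters `lam`, `lr`, and `epochs`, and the toy resume snippets and category names are all hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """Vocabulary and smoothed IDF weights from tokenized training documents."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vocab = sorted(df)
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in vocab}
    return vocab, idf

def tfidf(doc, vocab, idf):
    """L2-normalized TF-IDF vector; terms outside the vocabulary are ignored."""
    tf = Counter(doc)
    vec = [tf[t] * idf[t] for t in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def decision(model, x):
    w, b = model
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_binary_svm(X, y, lam=0.01, lr=0.1, epochs=50):
    """Linear SVM for y in {-1, +1}: subgradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            inside_margin = t * decision((w, b), x) < 1
            w = [wi * (1 - lr * lam) for wi in w]  # gradient step on the L2 penalty
            if inside_margin:  # the hinge-loss subgradient is non-zero only here
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def train_one_vs_rest(X, labels, **kw):
    """One binary SVM per class: that class (+1) against all others (-1)."""
    return {c: train_binary_svm(X, [1 if lab == c else -1 for lab in labels], **kw)
            for c in sorted(set(labels))}

def predict(models, x):
    """Assign the class whose binary model returns the highest decision score."""
    return max(models, key=lambda c: decision(models[c], x))

# Hypothetical toy "resumes" and job categories.
docs = [s.split() for s in [
    "python machine learning pandas", "deep learning python tensorflow",
    "accounting audit tax ledger", "tax audit finance accounting",
    "nursing patient care hospital", "hospital patient nursing clinic"]]
labels = ["data", "data", "finance", "finance", "health", "health"]
vocab, idf = fit_idf(docs)
X = [tfidf(d, vocab, idf) for d in docs]
models = train_one_vs_rest(X, labels)
print(predict(models, tfidf("python pandas learning".split(), vocab, idf)))
```

Each binary model treats one category as positive and all others as negative; prediction picks the category with the highest decision score, which is what the One-vs-Rest strategy amounts to.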

Comparative Study of Tokenizer Based on Learning for Sentiment Analysis (고객 감성 분석을 위한 학습 기반 토크나이저 비교 연구)

  • Kim, Wonjoon
    • Journal of Korean Society for Quality Management / v.48 no.3 / pp.421-431 / 2020
  • Purpose: This study compares and analyzes tokenizers used in natural language processing for sentiment analysis of customer satisfaction. Methods: A supervised learning-based tokenizer, Mecab-Ko, and an unsupervised learning-based tokenizer, SentencePiece, were compared. Three algorithms (Naïve Bayes, k-Nearest Neighbor, and Decision Tree) were selected to compare the performance of each tokenizer, using three metrics: accuracy, precision, and recall. Results: Performance evaluation and verification confirmed that SentencePiece achieves better classification performance than Mecab-Ko. To confirm the robustness of this result, independent t-tests were conducted on the evaluation results of the two tokenizers. The SentencePiece tokenizer showed particularly high classification performance with the k-Nearest Neighbor and Decision Tree algorithms, and the Decision Tree achieved slightly higher accuracy than the other two algorithms. Conclusion: The SentencePiece tokenizer can classify and interpret customer sentiment in Korean online reviews more accurately. It also appears able to assign specific meaning to short words and jargon that users often employ when evaluating products but that are not defined in advance.
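The robustness check above relies on an independent two-sample t-test. A minimal sketch of Student's pooled-variance t statistic follows; the per-run accuracies are invented for illustration, and the study may have used a different variant (for example, Welch's unequal-variance test).

```python
import math
from statistics import mean, variance

def independent_t(a, b):
    """Student's two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    na, nb = len(a), len(b)
    # Pooled sample variance weights each group's variance by its degrees of freedom.
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2

# Hypothetical accuracies from five runs per tokenizer.
sentencepiece_acc = [0.82, 0.84, 0.83, 0.85, 0.81]
mecab_acc = [0.78, 0.80, 0.79, 0.77, 0.81]
t, df = independent_t(sentencepiece_acc, mecab_acc)
print(round(t, 3), df)  # t is about 4.0 with 8 degrees of freedom
```

With 8 degrees of freedom, the two-tailed critical value at α = 0.05 is about 2.306, so |t| ≈ 4.0 would indicate a significant difference for these hypothetical numbers.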

A Novel Multiple Kernel Sparse Representation based Classification for Face Recognition

  • Zheng, Hao; Ye, Qiaolin; Jin, Zhong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.4 / pp.1463-1480 / 2014
  • Sparse coding is well known to be effective for feature extraction in face recognition, and learning the sparse model in kernel space can yield better performance. Some recent algorithms use a single kernel in the sparse model, which does not make full use of the available kernel information. The key issues are how to select suitable kernel weights and how to combine the selected kernels. In this paper, we propose a novel multiple kernel sparse representation-based classification method for face recognition (MKSRC), which performs sparse coding and dictionary learning in a multiple kernel space. First, several candidate kernels are combined and the sparse coefficients are computed; the kernel weights are then derived from these sparse coefficients. Finally, the kernel weights become optimal upon convergence. Experimental results show that our algorithm outperforms other state-of-the-art algorithms and demonstrate the promising performance of the proposed method.
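Multiple kernel methods typically work with a non-negative weighted sum of base kernels, K = Σ_m β_m K_m. The sketch below builds such a combined Gram matrix with hand-picked weights. It does not reproduce MKSRC's step of deriving the weights from sparse coefficients; the choice of a linear and an RBF kernel, `gamma=0.5`, and the equal weights are assumptions.

```python
import math

def linear_kernel(x, z):
    return sum(a * b for a, b in zip(x, z))

def rbf_kernel(x, z, gamma=0.5):
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))

def combined_gram(X, kernels, weights):
    """Gram matrix of K(x, z) = sum_m beta_m * K_m(x, z) over all sample pairs."""
    if any(w < 0 for w in weights) or not math.isclose(sum(weights), 1.0):
        raise ValueError("kernel weights must be non-negative and sum to 1")
    return [[sum(w * k(xi, xj) for k, w in zip(kernels, weights)) for xj in X]
            for xi in X]

# Three hypothetical 2-D feature vectors (real face features would be much longer).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
G = combined_gram(X, [linear_kernel, rbf_kernel], [0.5, 0.5])
for row in G:
    print([round(v, 4) for v in row])
```

Because a non-negative combination of positive semidefinite kernels is itself positive semidefinite, the combined matrix remains a valid kernel for sparse coding in the induced feature space.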

URL Filtering by Using Machine Learning

  • Saqib, Malik Najmus
    • International Journal of Computer Science & Network Security / v.22 no.8 / pp.275-279 / 2022
  • Technological growth has made many things easier for humans, from small everyday tasks to more complex ones. This growth, however, also brings illegal activities carried out with technology, ranging from displaying annoying messages to large-scale fraud. The easiest way for attackers to carry out such activities is to convince users to click on a malicious link. Classifying URLs as malicious or benign has therefore been a major concern for a decade. Blacklists were initially used for this purpose and are still used today; they are efficient, but their drawback is that they cannot be updated automatically. As a result, this method has been replaced by URL classification based on machine learning algorithms. In this paper, we use four machine learning classification algorithms to classify URLs as malicious or benign: support vector machine, random forest, k-nearest neighbor, and decision tree. The dataset used in this research has 36,694 instances. Precision, accuracy, and recall are compared for the dataset with and without preprocessing.
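A URL classifier of this kind starts by turning each URL into numeric features. The sketch below extracts a few lexical features and classifies with k-nearest neighbors, one of the four algorithms named above. The feature set, the example URLs, and `k=3` are hypothetical; the paper's dataset defines its own features.

```python
import ipaddress
import math
from urllib.parse import urlparse

def url_features(url):
    """Hypothetical lexical features: length, dots, hyphens, '@', digits, https, IP host."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    try:
        ipaddress.ip_address(host)
        is_ip = 1
    except ValueError:
        is_ip = 0
    return [len(url), url.count("."), url.count("-"), int("@" in url),
            sum(ch.isdigit() for ch in url), int(parsed.scheme == "https"), is_ip]

def knn_predict(train, query, k=3):
    """Majority vote among the k training URLs closest in Euclidean distance."""
    nearest = sorted((math.dist(feat, query), label) for feat, label in train)[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

train = [(url_features(u), lbl) for u, lbl in [
    ("https://www.example.com/about", "benign"),
    ("https://docs.python.org/3/library/", "benign"),
    ("https://en.wikipedia.org/wiki/URL", "benign"),
    ("http://192.168.10.5/secure-login@verify-account-update.php?id=99812", "malicious"),
    ("http://paypal-secure-update.account-verify.example-login.com/0042/1337", "malicious"),
    ("http://free-prize-winner-2023.claim-now-bonus.xyz/win?id=55512345", "malicious"),
]]
query = "http://10.0.0.7/bank-login@confirm-identity-now.php?token=8812734"
print(knn_predict(train, url_features(query)))
```

Features are left unscaled here, so URL length dominates the distance; standardizing the features (one form of preprocessing) usually matters for distance-based methods such as k-NN.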

Performance Improvement of Signature-based Traffic Classification System by Optimizing the Search Space (탐색공간 최적화를 통한 시그니쳐기반 트래픽 분석 시스템 성능향상)

  • Park, Jun-Sang; Yoon, Sung-Ho; Kim, Myung-Sup
    • Journal of Internet Computing and Services / v.12 no.3 / pp.89-99 / 2011
  • As the number of Internet-based applications and the volume of network traffic continue to grow, a payload signature-based traffic classification system must handle large amounts of traffic data. Although many pattern-matching algorithms have been proposed in the literature to improve processing speed, their performance is limited and depends on the characteristics of the input data. In this paper, we study how to optimize the search space in order to improve the processing speed of a payload signature-based traffic classification system. The feasibility of our design choices was verified through an experimental evaluation on a traffic trace collected on our campus.
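The abstract does not say how the search space is optimized. One common idea, shown here purely as an assumption, is to narrow both which signatures are tried (index them by transport protocol and port and try those first) and where they are searched (only the first `max_offset` payload bytes). The signature list and the 64-byte limit are hypothetical.

```python
from collections import defaultdict

# Hypothetical signatures: (application, transport protocol, server port, payload pattern).
SIGNATURES = [
    ("HTTP", "tcp", 80, b"GET "),
    ("HTTP", "tcp", 80, b"HTTP/1."),
    ("SSH", "tcp", 22, b"SSH-2.0"),
    ("SMTP", "tcp", 25, b"EHLO "),
]

def build_index(signatures):
    """Group signatures by (protocol, port) so a flow first meets only likely candidates."""
    index = defaultdict(list)
    for app, proto, port, pattern in signatures:
        index[(proto, port)].append((app, pattern))
    return index

def classify(payload, proto, port, index, signatures, max_offset=64):
    """Search only the first max_offset bytes, trying the flow's port-matched signatures first."""
    window = payload[:max_offset]
    for app, pattern in index.get((proto, port), []):
        if pattern in window:
            return app
    for app, sig_proto, sig_port, pattern in signatures:
        if (sig_proto, sig_port) == (proto, port):
            continue  # already tried above
        if pattern in window:
            return app
    return "unknown"

index = build_index(SIGNATURES)
print(classify(b"GET /index.html HTTP/1.1\r\n", "tcp", 80, index, SIGNATURES))
print(classify(b"SSH-2.0-OpenSSH_8.9\r\n", "tcp", 2222, index, SIGNATURES))
```

The second call shows the fallback: SSH on a non-standard port is still found, just after the cheaper port-matched candidates fail. Limiting the window trades some recall (patterns deep in the payload are missed) for speed.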

Supervised Classification Systems for High Resolution Satellite Images (고해상도 위성영상을 위한 감독분류 시스템)

  • 전영준; 김진일
    • Journal of KIISE:Computing Practices and Letters / v.9 no.3 / pp.301-310 / 2003
  • In this paper, we design and implement supervised classification systems for high-resolution satellite images. The systems provide various interfaces and statistics on the training samples so that the most effective training data can be selected. In addition, their modular design makes it easy to add new classification algorithms and satellite image formats. The classifiers take into account the characteristics of the spectral bands in the selected training data, and the systems provide various supervised classification algorithms, including Parallelepiped, Minimum Distance, Mahalanobis Distance, Maximum Likelihood, and Fuzzy-theory-based classifiers. We used IKONOS images as input and verified the systems on the classification of high-resolution satellite images.
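Two of the listed classifiers are simple enough to sketch directly from training-sample statistics: Minimum Distance (nearest class mean) and Parallelepiped (per-band min/max box). The band values below are invented four-band pixel vectors (IKONOS multispectral imagery has blue, green, red, and near-infrared bands); real systems often set the box from the mean plus or minus k standard deviations instead of min/max.

```python
import math

def class_stats(samples):
    """Per-class band means and min/max boxes from {class: [[b1, b2, ...], ...]}."""
    stats = {}
    for cls, pixels in samples.items():
        bands = list(zip(*pixels))  # transpose: one tuple of values per band
        stats[cls] = {
            "mean": [sum(b) / len(b) for b in bands],
            "low": [min(b) for b in bands],
            "high": [max(b) for b in bands],
        }
    return stats

def minimum_distance(pixel, stats):
    """Assign the class whose mean spectral vector is nearest (Euclidean)."""
    return min(stats, key=lambda c: math.dist(pixel, stats[c]["mean"]))

def parallelepiped(pixel, stats):
    """Assign the first class whose per-band [min, max] box contains the pixel."""
    for cls, s in stats.items():
        if all(lo <= v <= hi for v, lo, hi in zip(pixel, s["low"], s["high"])):
            return cls
    return "unclassified"

# Hypothetical training pixels (blue, green, red, NIR).
samples = {
    "water": [[60, 45, 30, 10], [62, 47, 32, 12], [58, 44, 29, 9]],
    "vegetation": [[40, 55, 35, 120], [42, 58, 37, 130], [38, 53, 33, 115]],
    "urban": [[90, 95, 100, 80], [95, 98, 104, 85], [88, 92, 97, 78]],
}
stats = class_stats(samples)
print(minimum_distance([41, 56, 36, 125], stats), parallelepiped([41, 56, 36, 125], stats))
```

The two rules differ on outliers: Minimum Distance always returns some class, whereas Parallelepiped leaves pixels outside every box unclassified.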

A Classification Analysis using Bayesian Neural Network (베이지안 신경망을 이용한 분류분석)

  • Hwang, Jin-Soo; Choi, Seong-Yong; Jun, Hong-Suk
    • Journal of the Korean Data and Information Science Society / v.12 no.2 / pp.11-25 / 2001
  • There are several algorithms for classification that model the relations, patterns, and rules present in data. We learn to classify objects from instances presented to us rather than from a given set of classification rules. Bayesian learning uses probability distributions to express our knowledge about unknown parameters and updates that knowledge according to the laws of probability as evidence is gathered from data. Neural network models, in turn, are trained to predict an unknown category or quantity from known attributes. In this paper, we compare the misclassification error rates of the Bayesian Neural Network method with those of other classification algorithms (CHAID, CART, and QUEST) on several data sets.
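The Bayesian updating described above can be seen in its simplest conjugate form: a Beta prior over an unknown probability, updated with observed successes and failures. Here the unknown is a classifier's probability of a correct prediction (one minus its misclassification rate), and the counts are hypothetical. This is not a Bayesian neural network, which places distributions over network weights and generally requires approximate inference.

```python
from fractions import Fraction

def beta_update(alpha, beta, successes, failures):
    """Conjugate update: Beta(alpha, beta) prior plus Bernoulli data gives a Beta posterior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of Beta(alpha, beta), kept exact with Fraction."""
    return Fraction(alpha, alpha + beta)

prior = (1, 1)  # uniform prior over the probability of a correct classification
after_batch1 = beta_update(*prior, successes=7, failures=3)
after_batch2 = beta_update(*after_batch1, successes=2, failures=3)
print(after_batch1, beta_mean(*after_batch1))  # (8, 4) 2/3
print(after_batch2, beta_mean(*after_batch2))  # (10, 7) 10/17
```

Each batch of evidence simply adds to the counts, so the posterior after both batches equals the posterior from seeing all 15 outcomes at once.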


Black Consumer Detection in E-Commerce Using Filter Method and Classification Algorithms (Filter Method와 Classification 알고리즘을 이용한 전자상거래 블랙컨슈머 탐지에 대한 연구)

  • Lee, Taekyu; Lee, Kyung Ho
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.6 / pp.1499-1508 / 2018
  • Although fast-growing e-commerce markets have given many companies opportunities to expand their customer bases, there is also a growing number of cases in which so-called 'black consumers' cause significant damage to companies. In this study, we implement and optimize a machine learning model that detects black consumers using customer data from an e-commerce store. Using a filter method for feature selection and four different classification algorithms, we obtained a best-performing model that detects black consumers with an F-measure of 0.667, and we achieved improvements of 11.44% in F-measure, 10.51% in AURC, and 22.87% in TPR.
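A filter method scores each feature independently of any classifier and keeps the top-ranked ones. The sketch below uses absolute Pearson correlation with the label as the score (one of several common filter criteria; the abstract does not name the paper's criterion) and computes the F-measure used to report results. The feature names and values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

def filter_select(features, labels, k):
    """Rank features by |correlation with the label| and keep the top k names."""
    scores = {name: abs(pearson(col, labels)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def f_measure(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

labels = [1, 1, 1, 0, 0, 0]  # 1 = black consumer (hypothetical customers)
features = {
    "refund_requests": [5, 4, 6, 0, 1, 0],
    "order_count": [10, 3, 8, 9, 4, 7],
    "complaint_count": [3, 2, 4, 1, 0, 1],
}
print(filter_select(features, labels, k=2))
print(f_measure(labels, [1, 0, 1, 1, 0, 0]))
```

Because the ranking never trains a model, it is cheap enough to run before trying several classifiers on the reduced feature set.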

Robust Algorithms for Combining Multiple Term Weighting Vectors for Document Classification

  • Kim, Minyoung
    • International Journal of Fuzzy Logic and Intelligent Systems / v.16 no.2 / pp.81-86 / 2016
  • Term weighting is a popular technique that effectively weighs the term features to improve accuracy in document classification. While several successful term weighting algorithms have been suggested, none of them appears to perform well consistently across different data domains. In this paper we propose several reasonable methods to combine different term weight vectors to yield a robust document classifier that performs consistently well on diverse datasets. Specifically we suggest two approaches: i) learning a single weight vector that lies in a convex hull of the base vectors while minimizing the class prediction loss, and ii) a mini-max classifier that aims for robustness of the individual weight vectors by minimizing the loss of the worst-performing strategy among the base vectors. We provide efficient solution methods for these optimization problems. The effectiveness and robustness of the proposed approaches are demonstrated on several benchmark document datasets, significantly outperforming the existing term weighting methods.
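Approach (i) searches for a weight vector in the convex hull of the base term-weight vectors. For two base vectors, the hull is the segment αw1 + (1 - α)w2, and a crude grid search over α already illustrates the idea. The paper solves the optimization efficiently; this sketch does not reproduce its solver or the minimax variant. The toy loss is a hypothetical stand-in for a real class-prediction loss, such as a classifier's validation error on reweighted documents.

```python
import math

def combine(base_vectors, alphas):
    """Convex combination sum_k alpha_k * w_k of per-term weight vectors."""
    if any(a < 0 for a in alphas) or not math.isclose(sum(alphas), 1.0):
        raise ValueError("alphas must be non-negative and sum to 1")
    return [sum(a * w[j] for a, w in zip(alphas, base_vectors))
            for j in range(len(base_vectors[0]))]

def best_two_way_mix(w1, w2, loss, steps=10):
    """Grid search alpha in [0, 1] for alpha*w1 + (1 - alpha)*w2 under a user-supplied loss."""
    alphas = [i / steps for i in range(steps + 1)]
    return min(alphas, key=lambda a: loss(combine([w1, w2], [a, 1 - a])))

# Hypothetical per-term weights from two base schemes over a 3-term vocabulary.
w_scheme_a = [1.0, 0.0, 2.0]
w_scheme_b = [0.0, 1.0, 0.0]
target = [0.3, 0.7, 0.6]  # stand-in for whatever a real validation loss would reward

def toy_loss(w):
    return sum((a - b) ** 2 for a, b in zip(w, target))

alpha = best_two_way_mix(w_scheme_a, w_scheme_b, toy_loss)
print(alpha)
```

With more than two base vectors the grid grows quickly, which is why an optimization-based solver over the simplex is the practical route.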

Object Detection from High Resolution Satellite Image by Using Genetic Algorithms

  • Hosomura Tsukasa
    • Proceedings of the KSRS Conference / 2005.10a / pp.123-125 / 2005
  • Many researchers have worked to improve the classification accuracy of satellite images. Most studies have used the spectral information of each pixel for image classification. When this approach is applied to high-resolution satellite images, the number of classes increases. This is especially noticeable for houses, because roofs come in many different colors. Yet even when many classes are classified, the roof color of each house is usually unnecessary; in most cases, we only need to know whether or not an object is a house. In this study, we propose a method for detecting objects using Genetic Algorithms (GA). Aircraft were selected as the target object because they are easy to detect at an airport. An image of an aircraft was used as the template, and the target image, acquired by QuickBird, shows Haneda Airport including an aircraft. Each chromosome has four or five parameters: template number, position (x, y), rotation angle, and enlargement rate. The experiment produced good results.
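The GA template search described above can be sketched as follows. Each chromosome holds position (x, y), rotation angle, and enlargement rate; the template-number gene is omitted because this toy has only one template. Fitness is the negative sum of squared differences between the rotated, scaled template and the image. The population size, tournament size, mutation scales, scale range [0.5, 2.0], and the T-shaped "aircraft" image are all assumptions.

```python
import math
import random

OFF_IMAGE_PENALTY = 1.0  # images are assumed scaled to [0, 1]

def pixel_at(image, x, y):
    """Nearest-neighbour lookup; returns None outside the image."""
    xi, yi = round(x), round(y)
    if 0 <= yi < len(image) and 0 <= xi < len(image[0]):
        return image[yi][xi]
    return None

def fitness(chrom, image, template):
    """Negative SSD between the template and the image under the chromosome's transform."""
    x0, y0, angle, scale = chrom
    c, s = math.cos(angle), math.sin(angle)
    total = 0.0
    for v, row in enumerate(template):
        for u, t in enumerate(row):
            # Rotate and scale template coordinates, then translate to (x0, y0).
            p = pixel_at(image, x0 + scale * (u * c - v * s), y0 + scale * (u * s + v * c))
            total += OFF_IMAGE_PENALTY if p is None else (t - p) ** 2
    return -total

def clip(chrom, width, height):
    x, y, angle, scale = chrom
    return (min(max(x, 0.0), width - 1), min(max(y, 0.0), height - 1),
            angle, min(max(scale, 0.5), 2.0))

def genetic_search(image, template, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    height, width = len(image), len(image[0])

    def score(chrom):
        return fitness(chrom, image, template)

    pop = [(rng.uniform(0, width - 1), rng.uniform(0, height - 1),
            rng.uniform(-math.pi, math.pi), rng.uniform(0.5, 2.0))
           for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        pop.sort(key=score, reverse=True)
        history.append(score(pop[0]))
        nxt = pop[:2]  # elitism: the two best survive unchanged
        while len(nxt) < pop_size:
            p1 = max(rng.sample(pop, 3), key=score)  # tournament selection
            p2 = max(rng.sample(pop, 3), key=score)
            child = [rng.choice(genes) for genes in zip(p1, p2)]  # uniform crossover
            child = [g + rng.gauss(0.0, sd)  # Gaussian mutation per gene
                     for g, sd in zip(child, (1.0, 1.0, 0.1, 0.05))]
            nxt.append(clip(child, width, height))
        pop = nxt
    return max(pop, key=score), history

H, W = 16, 20
image = [[0.0] * W for _ in range(H)]
template = [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]  # hypothetical "aircraft"
for v, row in enumerate(template):
    for u, t in enumerate(row):
        image[5 + v][8 + u] = t
best, history = genetic_search(image, template)
print(best, history[-1])
```

Elitism guarantees that the best fitness never decreases from one generation to the next; whether the search reaches the exact placement depends on the seed and the GA settings.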
