Search | Korea Science

Text Classification based on a Feature Projection Technique with Robustness from Noisy Data (오류 데이타에 강한 자질 투영법 기반의 문서 범주화 기법)

고영중;서정연
- Journal of KIISE:Software and Applications / v.31 no.4 / pp.498-504 / 2004
This paper presents a new text classifier based on a feature projection technique. In feature projections, training documents are represented as their projections on each feature. Classification is based on the individual feature projections, and the final class is determined by summing the individual classifications of each feature. In our experiments, the proposed classifier showed high performance. In particular, it is faster and more robust to noisy data than k-NN and SVM, which are among the state-of-the-art text classifiers. Since the algorithm of the proposed classifier is very simple, it is easy to implement and train. Therefore, it can be a useful classifier for text classification tasks that need fast execution speed, robustness, and high performance.
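The abstract describes the classifier only at a high level, so the following is a minimal sketch of the general idea rather than the authors' algorithm. It assumes documents are given as feature-to-weight dictionaries, and the voting rule (the product of the query weight and the training weight) is a hypothetical choice made for illustration; the names `train_projections` and `classify` are likewise not from the paper.

```python
from collections import defaultdict


def train_projections(docs):
    """Project each training document onto its features.

    docs: list of (features, label), where features maps a term to a weight.
    Returns a mapping: feature -> list of (weight, label) projections.
    """
    projections = defaultdict(list)
    for features, label in docs:
        for feat, weight in features.items():
            projections[feat].append((weight, label))
    return projections


def classify(projections, features):
    """Sum per-feature votes and return the label with the highest total.

    Assumption: each training projection votes for its label with strength
    query_weight * training_weight. The paper's voting rule may differ.
    """
    scores = defaultdict(float)
    for feat, weight in features.items():
        for train_weight, label in projections.get(feat, []):
            scores[label] += weight * train_weight
    if not scores:
        return None  # no feature of the query was seen in training
    return max(scores, key=scores.get)
```

Training in this sketch is a single pass that only builds the per-feature lists, which is consistent with the abstract's remark that implementation and training are simple.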

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

Kim, Myoung-Jong
- Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions: linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound on the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. Furthermore, SVM does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can potentially degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance in multi-class classification problems as much as SVM does for binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, a difficulty in multi-class prediction problems is the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, which reduces classification accuracy. SVM ensemble learning is one of the machine learning methods for coping with the above drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Thus boosting attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes a multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
https://doi.org/10.13088/jiis.2012.18.2.029
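The gap between the arithmetic and geometric mean-based accuracies reported in this abstract can be illustrated with a small sketch. It assumes that both measures are taken over per-class accuracies (the fraction of each class predicted correctly); the abstract does not spell out the exact definitions, so this is an assumption, and the function names are illustrative.

```python
import math
from collections import Counter


def per_class_accuracy(y_true, y_pred):
    """Fraction of correctly predicted instances for each true class."""
    total = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cls: correct[cls] / total[cls] for cls in total}


def arithmetic_mean_accuracy(y_true, y_pred):
    """Arithmetic mean of the per-class accuracies (assumed definition)."""
    accs = per_class_accuracy(y_true, y_pred).values()
    return sum(accs) / len(accs)


def geometric_mean_accuracy(y_true, y_pred):
    """Geometric mean of the per-class accuracies (assumed definition).

    A single class with zero accuracy drives the whole score to zero.
    """
    accs = per_class_accuracy(y_true, y_pred).values()
    return math.prod(accs) ** (1 / len(accs))
```

For an imbalanced set with eight instances of one class and two of another, a classifier that always predicts the majority class scores 0.5 on the arithmetic measure but 0.0 on the geometric one, which is why a geometric mean-based criterion penalizes ignoring minority classes.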

Support Vector Machines-based classification of video file fragments (서포트 벡터 머신 기반 비디오 조각파일 분류)

Kang, Hyun-Suk;Lee, Young-Seok
- Journal of the Korea Academia-Industrial cooperation Society / v.16 no.1 / pp.652-657 / 2015
BitTorrent is an innovative file-sharing and file-transfer protocol that allows users to receive pieces of files from multiple sharers on the Internet and assemble the pieces into complete files. In reality, however, free distribution of illegal or copyrighted video data is considered a crime. Regulating copyright on BitTorrent is difficult because data is transferred as pieces of files rather than in complete file formats. Therefore, classifying the file formats of digital content must come first in order to restore digital content from the pieces received over BitTorrent and to check for copyright violations. This study proposes an SVM classifier for digital files that uses a feature vector of histogram differentials computed on the file pieces. The classifier's performance is evaluated with the division factor by applying it to three different video file formats.
https://doi.org/10.5762/KAIS.2015.16.1.652
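The abstract does not define the "histogram differential" feature, so the sketch below shows one plausible reading only: a normalized byte-value histogram of a file fragment, followed by the first difference between adjacent bins. Both the interpretation and the function names are assumptions made for illustration.

```python
def byte_histogram(fragment: bytes):
    """Normalized histogram of byte values (256 bins) for one file fragment."""
    counts = [0] * 256
    for value in fragment:
        counts[value] += 1
    n = len(fragment) or 1  # avoid division by zero on an empty fragment
    return [c / n for c in counts]


def histogram_differential(fragment: bytes):
    """First difference between adjacent histogram bins (255 values).

    Assumed interpretation of the paper's feature; the resulting vector
    could be passed to any SVM implementation as the input features.
    """
    hist = byte_histogram(fragment)
    return [hist[i + 1] - hist[i] for i in range(255)]
```

Because the feature depends only on the bytes inside a fragment, it can be computed without file headers, which matches the setting in which only pieces of a file are available.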

Hand Gesture Interface Using Mobile Camera Devices (모바일 카메라 기기를 이용한 손 제스처 인터페이스)

Lee, Chan-Su;Chun, Sung-Yong;Sohn, Myoung-Gyu;Lee, Sang-Heon
- Journal of KIISE:Computing Practices and Letters / v.16 no.5 / pp.621-625 / 2010
This paper presents a hand motion tracking method for a hand gesture interface that uses the camera in mobile devices such as smartphones and PDAs. When the camera moves according to the user's hand gesture, global optical flows are generated. Therefore, robust hand movement estimation is possible by considering the dominant optical flow, found through histogram analysis of the motion direction. A continuous hand gesture is segmented into unit gestures by motion state estimation using the motion phase, which is determined by the velocity and acceleration of the estimated hand motion. Feature vectors are extracted during movement states, and hand gestures are recognized at the end state of each gesture. A support vector machine (SVM), a k-nearest neighbor classifier, and a normal Bayes classifier are used for classification. SVM achieves an 82% recognition rate for 14 hand gestures.
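The dominant-flow step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: flow vectors are taken as already computed `(dx, dy)` pairs, the number of direction bins is a hypothetical parameter, and the dominant motion is taken as the mean vector of the most populated bin.

```python
import math


def dominant_flow(flows, bins=8):
    """Estimate the dominant motion from a set of optical flow vectors.

    flows: iterable of (dx, dy) displacement vectors.
    bins:  number of direction bins in the histogram (assumed value).
    Returns the mean (dx, dy) of the most populated direction bin,
    or (0.0, 0.0) when there is no motion.
    """
    width = 2 * math.pi / bins
    buckets = [[] for _ in range(bins)]
    for dx, dy in flows:
        if dx == 0 and dy == 0:
            continue  # zero vectors carry no direction
        angle = math.atan2(dy, dx) % (2 * math.pi)
        buckets[int(angle / width) % bins].append((dx, dy))
    best = max(buckets, key=len)
    if not best:
        return (0.0, 0.0)
    n = len(best)
    return (sum(v[0] for v in best) / n, sum(v[1] for v in best) / n)
```

Averaging only within the winning bin keeps a few outlier vectors, for example from independently moving objects in the scene, from pulling the estimate away from the main camera motion.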

On the Performance of Cuckoo Search and Bat Algorithms Based Instance Selection Techniques for SVM Speed Optimization with Application to e-Fraud Detection

AKINYELU, Andronicus Ayobami;ADEWUMI, Aderemi Oluyinka
- KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.3 / pp.1348-1375 / 2018
Support Vector Machine (SVM) is a well-known machine learning classification algorithm that has been widely applied to many data mining problems with good accuracy. However, SVM classification speed decreases as dataset size increases. Some applications, such as video surveillance and intrusion detection, require a classifier to be trained very quickly and on large datasets. Hence, this paper introduces two filter-based instance selection techniques for optimizing SVM training speed. Fast classification is often achieved at the expense of classification accuracy, and some applications, such as phishing and spam email classifiers, are very sensitive to a slight drop in classification accuracy. Hence, this paper also introduces two wrapper-based instance selection techniques for improving SVM predictive accuracy and training speed. The wrapper-based and filter-based techniques are inspired by the Cuckoo Search Algorithm and the Bat Algorithm. The proposed techniques are validated on three popular e-fraud types: credit card fraud, spam email, and phishing email. In addition, the proposed techniques are validated on 20 other datasets provided by the UCI data repository. Moreover, statistical analysis is performed, and experimental results reveal that the filter-based and wrapper-based techniques significantly improved SVM classification speed. The results also reveal that the wrapper-based techniques improved SVM predictive accuracy in most cases.
https://doi.org/10.3837/tiis.2018.03.021

A Novel Kernel SVM Algorithm with Game Theory for Network Intrusion Detection

Liu, Yufei;Pi, Dechang
- KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.8 / pp.4043-4060 / 2017
Network Intrusion Detection (NID), an important topic in the field of information security, can be viewed as a pattern recognition problem. Existing pattern recognition methods can achieve good performance when the number of training samples is large enough. However, modern network attacks are diverse and constantly updated, and the available training samples are much fewer. Furthermore, to improve the learning ability of SVM, research on kernel functions mainly focuses on their selection, construction, and improvement. Nonetheless, in practice, no theory perfectly solves the problem of constructing kernel functions. In this paper, we integrate the advantages of the radial basis function kernel and the polynomial kernel using the notion of game theory and propose a novel kernel SVM algorithm with game theory for NID, called GTNID-SVM. The basic idea is to exploit game theory in NID to obtain an SVM classifier with better learning ability and generalization performance. To the best of our knowledge, GTNID-SVM is the first algorithm that studies an ensemble kernel function with game theory in NID. We conduct empirical studies on the DARPA dataset, and the results demonstrate that the proposed approach is feasible and more effective.
https://doi.org/10.3837/tiis.2017.08.016
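One common way to combine a radial basis function kernel with a polynomial kernel is a convex combination, which is itself a valid kernel. The sketch below shows only that combination step. In GTNID-SVM the mixing is derived through game theory, which the abstract does not detail, so the fixed `weight` parameter here is a hypothetical stand-in, as are the default values of `gamma`, `degree`, and `coef0`.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def poly_kernel(x, y, degree=2, coef0=1.0):
    """Polynomial kernel: (<x, y> + coef0) ** degree."""
    return (sum(a * b for a, b in zip(x, y)) + coef0) ** degree


def combined_kernel(x, y, weight=0.5):
    """Convex combination of the RBF and polynomial kernels.

    weight in [0, 1] is a fixed, hypothetical mixing coefficient; the
    paper obtains its combination via game theory instead.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    return weight * rbf_kernel(x, y) + (1.0 - weight) * poly_kernel(x, y)
```

The RBF term responds to local similarity between samples, while the polynomial term captures global structure, which is the usual motivation for mixing the two.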

Design of Black Plastics Classifier Using Data Information (데이터 정보를 이용한 흑색 플라스틱 분류기 설계)

Park, Sang-Beom;Oh, Sung-Kwun
- The Transactions of The Korean Institute of Electrical Engineers / v.67 no.4 / pp.569-577 / 2018
In this paper, a black plastic classifier based on a preprocessing algorithm is designed with the aid of information contained in the data. The slope and area of the spectrum obtained by laser-induced breakdown spectroscopy (LIBS) are analyzed for each material, and the resulting information is used as the input data of the proposed classifier. The slope is represented by the rate of change of intensity with wavelength. The area is calculated around the wavelength of the spectrum peak where the material property of chemical elements such as carbon and hydrogen appears. The input data of the proposed classifier is constructed from this slope and area information. In the preprocessing part of the classifier, Principal Component Analysis (PCA) and fuzzy transform are used to reduce high-dimensional input variables to low-dimensional ones. This improves both the characteristic analysis of the materials and the processing speed of the classifier. In the condition part, FCM clustering is applied, and a linear function is used as the connection weight in the conclusion part. Parameters such as the number of clusters, the fuzzification coefficient, and the number of input variables are optimized by Particle Swarm Optimization (PSO). To demonstrate the superiority of the classification performance, the classification rate is compared with those of various classifiers in the WEKA 3.8 data mining software, such as Naive Bayes, SVM, and Multilayer Perceptron.
https://doi.org/10.5370/KIEE.2018.67.4.569

Real-time Face Detection Method using SVM Classifier (SW 분류기를 이용한 실시간 얼굴 검출 방법)

지형근;이경희;반성범
- Proceedings of the IEEK Conference / 2003.11a / pp.529-532 / 2003
In this paper, we describe a new method for detecting faces in real time. We use color information, edge information, and binary information to detect candidate eye regions in the input image, and then extract the face region using the detected eye pair. We verify both the eye candidate regions and the face region using Support Vector Machines (SVM). These verification steps prevent false detections, which makes fast and reliable face detection possible. The experimental results confirm that the proposed algorithm achieves excellent face detection performance.

Estimating Basin of Attraction for Multi-Basin Processes Using Support Vector Machine

Lee, Dae-Won;Lee, Jae-Wook
- Management Science and Financial Engineering / v.18 no.1 / pp.49-53 / 2012
A novel method of transient stability analysis is presented in this paper. The proposed method extracts data points near the basin-of-attraction boundary and then builds a support vector machine (SVM) model learned from the generated data. The constructed SVM classifier has been shown to reduce dramatically the conservativeness of the estimated basin of attraction.
https://doi.org/10.7737/MSFE.2012.18.1.049

Development and Application of Risk Recovery Index using Machine Learning Algorithms (기계학습알고리즘을 이용한 위험회복지수의 개발과 활용)

Kim, Sun Woong
- Journal of Information Technology Applications and Management / v.23 no.4 / pp.25-39 / 2016
Asset prices decline sharply and stock markets collapse when a financial crisis occurs. Recently, financial crises have become more frequent than ever. The 1998 currency crisis and the 2008 global financial crisis triggered academic research on early warning systems that aim to detect the symptoms of a financial crisis in advance. This study proposes a risk recovery index for detecting good opportunities arising from financial market instability. We use SVM classifier algorithms to separate recovery periods from unstable financial market data. The input variables are the KOSPI index and the V-KOSPI200 index. Our SVM algorithms show highly accurate forecasting results on testing data as well as training data. The risk recovery index is derived from the outputs of the trained SVM. We develop a trading system that uses the proposed risk recovery index. The trading result records a very high profit: its annual return reaches 121%.
https://doi.org/10.21219/jitam.2016.23.4.025

