• Title, Summary, Keyword: 지지 벡터 기계 (support vector machine)

Comparison of Feature Selection Methods in Support Vector Machines (지지벡터기계의 변수 선택방법 비교)

  • Kim, Kwangsu;Park, Changyi
    • The Korean Journal of Applied Statistics / v.26 no.1 / pp.131-139 / 2013
  • Support vector machines (SVMs) may perform poorly in the presence of noise variables, and it is difficult to identify the importance of each variable in the resulting classifier. Feature selection can improve both the interpretability and the accuracy of SVMs. Most existing studies address feature selection in the linear SVM through penalty functions that yield sparse solutions. In practice, however, nonlinear kernels are usually adopted for classification accuracy, so feature selection remains desirable for nonlinear SVMs. In this paper, we compare the performance of nonlinear feature selection methods, such as the component selection and smoothing operator (COSSO) and kernel iterative feature extraction (KNIFE), on simulated and real data sets.

Learning and Performance Comparison of Multi-class Classification Problems based on Support Vector Machine (지지벡터기계를 이용한 다중 분류 문제의 학습과 성능 비교)

  • Hwang, Doo-Sung
    • Journal of Korea Multimedia Society / v.11 no.7 / pp.1035-1042 / 2008
  • The support vector machine is a binary classifier that is known from various experiments to outperform other classifiers, but only on binary classification problems. Even though its theory is based on the maximal margin classifier, the approach cannot easily be extended to multi-class classification problems. In this paper, we review techniques for extending the support vector machine to multi-class classification and compare their performance. By decomposing the training data, the support vector machine can be adapted to a multi-class problem without modifying the intrinsic characteristics of the binary classifier. Performance is evaluated on a collection of benchmark data sets and compared in terms of the selected learning strategies, the training time, and the results of a neural network trained with backpropagation learning. The experiments suggest that the support vector machine is applicable and effective for general multi-class classification problems when compared with the neural network.
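The data decomposition mentioned in this abstract can be illustrated with two commonly used schemes, one-vs-rest and one-vs-one. The abstract does not say which strategies the paper compares, so the function names and the choice of these two schemes below are assumptions made for illustration only. The sketch shows how multi-class labels are turned into binary training sets, each of which could be handed to an unmodified binary classifier.

```python
from itertools import combinations


def one_vs_rest(labels):
    """Build one binary task per class: +1 for that class, -1 for all others."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else -1 for y in labels] for c in classes}


def one_vs_one(labels):
    """Build one binary task per class pair, using only samples of those two classes.

    Each task is (sample indices, binary labels), with +1 for the first class
    of the pair and -1 for the second.
    """
    classes = sorted(set(labels))
    tasks = {}
    for a, b in combinations(classes, 2):
        idx = [i for i, y in enumerate(labels) if y in (a, b)]
        tasks[(a, b)] = (idx, [1 if labels[i] == a else -1 for i in idx])
    return tasks


# Hypothetical toy labels, not taken from the paper.
labels = ["cat", "dog", "bird", "dog", "cat"]
ovr = one_vs_rest(labels)
ovo = one_vs_one(labels)
print(ovr["dog"])            # binary labels for the "dog vs. rest" task
print(ovo[("bird", "cat")])  # sample indices and labels for "bird vs. cat"
```

With k classes, one-vs-rest yields k binary problems over the full training set, while one-vs-one yields k(k-1)/2 smaller problems, which is one reason training time can differ between strategies.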

An analysis of Speech Acts for Korean Using Support Vector Machines (지지벡터기계(Support Vector Machines)를 이용한 한국어 화행분석)

  • En Jongmin;Lee Songwook;Seo Jungyun
    • The KIPS Transactions:PartB / v.12B no.3 / pp.365-368 / 2005
  • We propose a speech act analysis method for Korean dialogue using support vector machines (SVMs). As sentence features we use the lexical form of each word, its part-of-speech (POS) tag, and bigrams of POS tags; as context features we use the context of the previous utterance. We select informative features with the chi-square statistic. After training on the selected features, the SVM classifiers determine the speech act of each utterance. In our experiment, we achieved an overall accuracy of 90.54% on a dialogue corpus from the hotel reservation domain.
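Chi-square feature selection, as named in this abstract, is usually computed from a 2x2 contingency table of feature presence against class membership. The sketch below uses that standard formula; the exact variant used in the paper is not given in the abstract, and the function names, toy utterances, and class labels are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for a 2x2 table.

    a: documents in the class that contain the feature
    b: documents outside the class that contain the feature
    c: documents in the class without the feature
    d: documents outside the class without the feature
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0
    return n * (a * d - c * b) ** 2 / denom


def score_features(docs, labels, target):
    """Score every feature against one target class; docs is a list of feature sets."""
    vocab = set().union(*docs)
    scores = {}
    for f in vocab:
        a = sum(1 for doc, y in zip(docs, labels) if f in doc and y == target)
        b = sum(1 for doc, y in zip(docs, labels) if f in doc and y != target)
        c = sum(1 for doc, y in zip(docs, labels) if f not in doc and y == target)
        d = sum(1 for doc, y in zip(docs, labels) if f not in doc and y != target)
        scores[f] = chi_square(a, b, c, d)
    return scores


# Hypothetical toy data: each utterance is reduced to a set of lexical features.
docs = [{"reserve", "room"}, {"reserve", "please"}, {"thanks"}, {"thanks", "room"}]
labels = ["request", "request", "thank", "thank"]
scores = score_features(docs, labels, target="request")

# Keep the highest-scoring features; the cutoff of 2 is arbitrary here.
selected = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores["reserve"], scores["room"])  # 4.0 0.0
```

A feature that appears equally often inside and outside the class ("room" here) scores zero, so it would be dropped before training the classifier.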

A Sentiment Classification System Using Feature Extraction from Seed Words and Support Vector Machine (종자 어휘를 이용한 자질 추출과 지지 벡터 기계(SVM)을 이용한 문서 감정 분류 시스템의 개발)

  • Hwang, Jae-Won;Jeon, Tae-Gyun;Ko, Young-Joong
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / pp.938-942 / 2007
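  • Newspaper articles and product reviews are representative documents in which the author's sentiment and opinion about a particular topic or product are clearly expressed. Recently, semantic classification and analysis of large volumes of documents have been in demand in various areas, such as opinion polls and product opinion surveys. In this paper, we implement a system that classifies the sentiment expressed in a document into two categories, positive and negative, based on the document's content. Classification starts from representative seed words that carry sentiment, and features are selected from nouns, adjectives, adverbs, and verbs, which express sentiment and sensation in Korean. For weighting, the meanings of the seed words are expanded through a Korean thesaurus and a weight is assigned to each. We implement a system that judges the sentiment of an input document, represented as a word vector, using a support vector machine as a binary classifier, and we evaluate its performance.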

Word Sense Classification Using Support Vector Machines (지지벡터기계를 이용한 단어 의미 분류)

  • Park, Jun Hyeok;Lee, Songwook
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.563-568 / 2016
  • Word sense disambiguation is the problem of finding the correct sense of an ambiguous word in a sentence when the word has multiple senses in a dictionary. We regard this as a multi-class classification problem and classify the ambiguous word using support vector machines. Context words of the ambiguous word, extracted from the Sejong sense-tagged corpus, are represented in two kinds of vector space. One vector space consists of context word vectors with binary weights. In the other, the context words are mapped to vectors by a word embedding model. In our experiments, we achieved an accuracy of 87.0% with the context word vectors and 86.0% with the word embedding model.

Distributed Support Vector Machines for Localization on a Sensor Network (센서 네트워크에서 위치 측정을 위한 분산 지지 벡터 머신)

  • Moon, Sangook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / pp.944-946 / 2014
  • Localization of sensor network nodes using machine learning has recently been studied. The support vector machine algorithm is easy to implement in a high-level language that enables parallelism. In this paper, we implemented a support vector machine in Python and built a sensor network cluster with five Raspberry Pi boards. We also set up a Hadoop software framework to use the MapReduce mechanism. We modified the existing support vector machine algorithm to fit the distributed Hadoop architecture for localizing a sensor node. In our experiment, we implemented the test sensor network with a variety of parameters and evaluated it in terms of proficiency, resource evaluation, and processing time.

Spam Filter by Using χ2 Statistics and Support Vector Machines (카이제곱 통계량과 지지벡터기계를 이용한 스팸메일 필터)

  • Lee, Song-Wook
    • The KIPS Transactions:PartB / v.17B no.3 / pp.249-254 / 2010
  • We propose an automatic spam filter for e-mail data using support vector machines (SVMs). We use the lexical form of each word and its part-of-speech (POS) tag as features and select features with the chi-square statistic. In our experiments, each feature is represented by term frequency (TF), TF-IDF, or a binary weight. After training on the selected features, the SVM classifies each e-mail as spam or not. The selected features improved the performance of our system, and we achieved an overall accuracy of 98.9% on the TREC05-p1 spam corpus.
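The three feature representations named in this abstract (TF, TF-IDF, and binary weight) can be compared side by side in a short sketch. The abstract does not give the exact formulas, so the raw-count TF and the log(N/df) form of IDF used below are assumptions, and the function name and toy messages are hypothetical.

```python
import math
from collections import Counter


def weight_documents(docs):
    """Return TF, TF-IDF, and binary weights for a list of token lists.

    Assumed definitions: TF is the raw count of a token in a document, and
    IDF is log(N / df), where df is the number of documents with the token.
    """
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    tf = [dict(Counter(doc)) for doc in docs]
    tfidf = [{t: c * math.log(n / df[t]) for t, c in d.items()} for d in tf]
    binary = [{t: 1 for t in d} for d in tf]
    return tf, tfidf, binary


# Hypothetical toy messages, already tokenized.
docs = [["free", "free", "offer"], ["meeting", "offer"], ["free", "prize"]]
tf, tfidf, binary = weight_documents(docs)
print(tf[0])      # raw counts for the first message
print(tfidf[0])   # counts scaled by how rare each token is across messages
print(binary[0])  # presence only
```

Each dictionary is a sparse feature vector for one message; any of the three could be passed to the classifier, which is what makes the weighting scheme an experimental variable.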

An Automatic Spam e-mail Filter System Using χ2 Statistics and Support Vector Machines (카이 제곱 통계량과 지지벡터기계를 이용한 자동 스팸 메일 분류기)

  • Lee, Songwook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / pp.592-595 / 2009
  • We propose an automatic spam mail classifier for e-mail data using support vector machines (SVMs). We use the lexical form of each word and its part-of-speech (POS) tag as features. We select useful features with the $\chi^2$ statistic and represent each feature using term frequency (TF) and inverse document frequency (IDF) values. After training on these features, the SVM classifies each e-mail as spam or not. In our experiment, we achieved an accuracy of 82.7% on e-mail data collected from a web mail system.

A Spam Message Filter System for Mobile Environment (휴대폰의 스팸문자메시지 판별 시스템)

  • Lee, Songwook
    • Annual Conference on Human and Language Technology / pp.194-196 / 2010
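  • With the widespread adoption of mobile phones, the use of text messages is increasing rapidly. At the same time, unwanted advertising spam messages are also proliferating. The goal of this study is to develop a system that automatically identifies such spam text messages. We trained the system with a support vector machine (SVM), a machine learning method, and selected features using the chi-square statistic. In our experiments, we obtained a score of about 95.5% on the F1 measure.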
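This abstract reports its result on the F1 measure, which is the harmonic mean of precision and recall for the positive class. The sketch below computes it from predicted and true labels; the function name, the "spam"/"ham" label names, and the toy predictions are hypothetical and are not data from the paper.

```python
def f1_measure(y_true, y_pred, positive="spam"):
    """F1 = 2 * precision * recall / (precision + recall) for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy predictions.
y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
f1 = f1_measure(y_true, y_pred)
print(round(f1, 3))
```

Unlike plain accuracy, F1 ignores true negatives, so it is less flattering when legitimate messages heavily outnumber spam.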

Constructing a Support Vector Machine for Localization on a Low-End Cluster Sensor Network (로우엔드 클러스터 센서 네트워크에서 위치 측정을 위한 지지 벡터 머신)

  • Moon, Sangook
    • Journal of the Korea Institute of Information and Communication Engineering / v.18 no.12 / pp.2885-2890 / 2014
  • Localization of sensor network nodes using machine learning has recently been studied. The support vector machine algorithm is easy to implement in a high-level language that enables parallelism. Raspberry Pi is a Linux system that can be used as a sensor node, and Pi boards can be used to construct IP-based Hadoop clusters. In this paper, we implemented a support vector machine in Python and built a sensor network cluster with five Pi boards. We also set up a Hadoop software framework to use the MapReduce mechanism. In our experiment, we implemented the test sensor network with a variety of parameters and evaluated it in terms of proficiency, resource evaluation, and processing time. The experiments showed that, given more execution power and memory, a Pi could be an appropriate member node of the cluster, achieving precise classification for sensor localization using machine learning.