• Title/Summary/Keyword: SVM model

A Comparative Study on Feature Selection and Classification Methods Using Closed Frequent Patterns Mining (닫힌 빈발 패턴을 기반으로 한 특징 선택과 분류방법 비교)

  • Zhang, Lei;Jin, Cheng Hao;Ryu, Keun Ho
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.148-151 / 2010
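  • Classification is among the best-known data mining techniques and includes methods such as decision trees, SVM (Support Vector Machine), and ANN (Artificial Neural Network). It is an analysis method that builds a classification model describing the characteristics of each group from multivariate observations belonging to several known, mutually exclusive groups, and then decides which group a new observation of unknown membership should be assigned to. Classification basically assumes that the feature space is well represented. In real applications, however, a feature space composed of single features is often not clear-cut, so classification performs poorly. To address this problem, this paper studies classification based on closed frequent patterns, which carry rich information without losing any information about the frequent patterns. In the experiments, meaningful features were selected using the $\chi^2$ (chi-square) and information gain attribute selection measures. As a result, performing feature selection with the measures presented in this study yielded better classification performance than classification methods such as C4.5 and SVM.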
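
The two attribute selection measures named in this abstract can be sketched in a few lines. This is only an illustration, not the authors' implementation: the toy labels, the two binary "pattern occurs" features, and all function names are hypothetical, and the closed frequent pattern mining step itself is not shown.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy reduction obtained by splitting the labels on a discrete feature."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature/label contingency table."""
    total = len(labels)
    observed = Counter(zip(feature, labels))
    feature_counts = Counter(feature)
    label_counts = Counter(labels)
    stat = 0.0
    for x, nx in feature_counts.items():
        for y, ny in label_counts.items():
            expected = nx * ny / total
            stat += (observed[(x, y)] - expected) ** 2 / expected
    return stat

# Hypothetical toy data: 1 means the pattern occurs in the record, 0 means it does not.
labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
pattern_informative = [1, 1, 1, 1, 0, 0, 0, 0]  # perfectly separates the classes
pattern_noise = [1, 0, 1, 0, 1, 0, 1, 0]        # independent of the class

for name, feature in [("informative", pattern_informative), ("noise", pattern_noise)]:
    print(name, information_gain(feature, labels), chi_square(feature, labels))
```

Under both measures the pattern that separates the classes scores high and the class-independent pattern scores zero, which is the property a selection step would rely on when ranking candidate patterns.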

Model based Facial Expression Recognition using New Feature Space (새로운 얼굴 특징공간을 이용한 모델 기반 얼굴 표정 인식)

  • Kim, Jin-Ok
    • The KIPS Transactions:PartB / v.17B no.4 / pp.309-316 / 2010
  • This paper introduces a new model-based method for facial expression recognition that uses facial grid angles as its feature space. To recognize the six main facial expressions, the proposed method takes a grid approach and establishes a new feature space based on the angles formed by each grid's edges and vertices. The approach is robust against several affine transformations, such as translation, rotation, and scaling, which other approaches consider very harmful to the overall accuracy of a facial expression recognition algorithm. The paper also demonstrates how the feature space is created from these angles and how a feature subset is selected within this space using a wrapper approach. The selected features are classified with SVM and 3-NN classifiers, and the classification results are validated with two-tier cross-validation. The proposed method achieves a classification result of 94%, and the feature selection algorithm improves results by up to 10% over the full feature set.
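
The robustness claim in this abstract rests on a geometric fact that is easy to check: the angle formed at a vertex does not change under translation, rotation, or uniform scaling. The sketch below demonstrates this on a hypothetical triangle rather than a real facial grid; the function names and the transform parameters are assumptions made for illustration.

```python
import math

def vertex_angle(a, b, c):
    """Unsigned angle in radians at vertex b, formed by the edges b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    cross = v1[0] * v2[1] - v1[1] * v2[0]
    return abs(math.atan2(cross, dot))

def similarity_transform(point, scale, theta, shift):
    """Rotate by theta, scale uniformly, then translate a 2D point."""
    x, y = point
    xr = scale * (x * math.cos(theta) - y * math.sin(theta)) + shift[0]
    yr = scale * (x * math.sin(theta) + y * math.cos(theta)) + shift[1]
    return (xr, yr)

# Hypothetical grid cell: a right triangle with the right angle at the first vertex.
triangle = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
moved = [similarity_transform(p, 2.5, 0.7, (10.0, -3.0)) for p in triangle]

angle_before = vertex_angle(triangle[1], triangle[0], triangle[2])
angle_after = vertex_angle(moved[1], moved[0], moved[2])
print(math.degrees(angle_before), math.degrees(angle_after))
```

Note that the invariance shown here holds for uniform scaling only; a non-uniform stretch would change the angles.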

Emotion Classification of User's Utterance for a Dialogue System (대화 시스템을 위한 사용자 발화 문장의 감정 분류)

  • Kang, Sang-Woo;Park, Hong-Min;Seo, Jung-Yun
    • Korean Journal of Cognitive Science / v.21 no.4 / pp.459-480 / 2010
  • A dialogue system includes various morphological analyses for recognizing a user's intention from the user's utterances. However, a user can convey various intentions through emotional states in addition to morphological expressions. Thus, recognizing a user's emotion makes it possible to analyze the user's intention in various ways. This paper presents a new method to automatically recognize a user's emotion for a dialogue system. For general emotions, we define nine categories using a psychological approach. For an optimal feature set, we organize a combination of sentential, a priori, and context features. Then, we employ a support vector machine (SVM), which has been widely used in various learning tasks, to automatically classify a user's emotions. The experimental results show that our method achieves an F-measure of 62.8%, 15% higher than the reference system.

Web Document Classification Based on Hangeul Morpheme and Keyword Analyses (한글 형태소 및 키워드 분석에 기반한 웹 문서 분류)

  • Park, Dan-Ho;Choi, Won-Sik;Kim, Hong-Jo;Lee, Seok-Lyong
    • The KIPS Transactions:PartD / v.19D no.4 / pp.263-270 / 2012
  • With the current development of high-speed Internet and massive database technology, the number of web documents is increasing rapidly, and classifying those documents automatically is therefore becoming important. In this study, we propose an effective method to extract document features based on Hangeul morpheme and keyword analyses, and to classify unstructured documents automatically by predicting their subjects. To extract document features, we first select terms using a morpheme analyzer, form the keyword set based on term frequency and subject-discriminating power, and score each keyword using its discriminating power. Then, we generate the classification model using commercial software that implements the decision tree, neural network, and SVM (support vector machine). Experimental results show that the proposed feature extraction method achieves considerable performance in classifying web documents by subject, i.e., an average precision of 0.90 and a recall of 0.84 in the case of the decision tree.

Text Categorization Using TextRank Algorithm (TextRank 알고리즘을 이용한 문서 범주화)

  • Bae, Won-Sik;Cha, Jeong-Won
    • Journal of KIISE:Computing Practices and Letters / v.16 no.1 / pp.110-114 / 2010
  • We describe a new method for text categorization using the TextRank algorithm. Text categorization is the problem of assigning one or more predefined categories to a text document. TextRank is a graph-based ranking algorithm. If we treat each word as a vertex and the co-occurrence of two adjacent words as an edge, we can build a graph from a document. We then find the important words in the graph using TextRank and construct features that are word pairs, each consisting of an important word and a word adjacent to it. We use four classifiers: SVM, a Naïve Bayesian classifier, a Maximum Entropy Model, and a k-NN classifier. We use the non-cross-posted version of the 20 Newsgroups data set. As a result, performance improved for all classifiers, which suggests the potential of the TextRank algorithm for text categorization.
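
The graph construction and ranking steps described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the graph is taken to be undirected and unweighted, the damping factor 0.85 and the fixed iteration count are conventional choices, and the function names, `top_k` parameter, and toy word list are all hypothetical.

```python
from collections import defaultdict

def build_cooccurrence_graph(words):
    """Undirected graph: each word is a vertex, adjacent words share an edge."""
    graph = defaultdict(set)
    for left, right in zip(words, words[1:]):
        if left != right:
            graph[left].add(right)
            graph[right].add(left)
    return graph

def textrank(graph, damping=0.85, iterations=50):
    """Iterate the PageRank-style update on an undirected, unweighted graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping)
            + damping * sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            for node in graph
        }
    return scores

def important_word_pairs(words, top_k=1):
    """Features: adjacent word pairs in which at least one word is top-ranked."""
    scores = textrank(build_cooccurrence_graph(words))
    top = set(sorted(scores, key=scores.get, reverse=True)[:top_k])
    pairs = {
        (left, right)
        for left, right in zip(words, words[1:])
        if left in top or right in top
    }
    return scores, sorted(pairs)

# Hypothetical toy document in which "svm" is adjacent to every other word.
words = ["svm", "text", "svm", "graph", "svm", "rank"]
scores, pairs = important_word_pairs(words)
print(max(scores, key=scores.get), pairs)
```

The resulting pairs would then be fed to a classifier as features; that step is omitted here.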

Recognition of Indoor and Outdoor Exercising Activities using Smartphone Sensors and Machine Learning (스마트폰 센서와 기계학습을 이용한 실내외 운동 활동의 인식)

  • Kim, Jaekyung;Ju, YeonHo
    • Journal of Creative Information Culture / v.7 no.4 / pp.235-242 / 2021
  • Recently, many studies have addressed human activity recognition (HAR) using smartphone sensor data. HAR can be applied in various fields, such as life pattern analysis, exercise measurement, and dangerous situation detection. However, research has focused on recognizing basic human behaviors or on efficient battery use. In this paper, exercising activities performed indoors and outdoors are defined and recognized. Data are collected and pre-processed so that the defined activities can be recognized by SVM, random forest, and gradient boosting models. In addition, the recognition result is determined by a class voting approach for accuracy and stable performance. As a result, the proposed activities were recognized with high accuracy, and in particular, similar types of indoor and outdoor exercising activities were correctly classified.
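
The abstract does not say what exactly is voted on. One plausible reading is a majority vote over the labels predicted by the three models; the sketch below assumes that reading, and the function name, the tie-breaking rule, and the activity labels are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest prediction in the list."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical predictions from SVM, random forest, and gradient boosting, in that order.
print(majority_vote(["outdoor_running", "treadmill_running", "outdoor_running"]))
```

Ordering the models by trust lets the tie-breaking rule fall back to the preferred model when all three disagree.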

Detection of Abnormal Dam Water Level Data Based on Machine Learning (기계학습에 기반한 댐 수위 이상 데이터 탐지)

  • Bang, Suil;Lee, Do-Gil
    • Proceedings of the Korea Information Processing Society Conference / 2021.05a / pp.293-296 / 2021
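  • K-water measures dam water levels, river water levels, rainfall, and other quantities in real time to manage multipurpose dams, and the measured values serve as data needed to operate the dams effectively. In particular, if abnormal dam water level data go undetected and are used as they are, important decisions such as the timing and volume of dam discharge can go wrong, so detecting such data promptly is very important. Of the automated detection methods currently in use, one flags data when the current value exceeds the maximum or falls below the minimum, and the other flags data when the difference between the current value and the average over a fixed period exceeds a specific value set by the administrator. The former only checks whether the upper and lower limits are exceeded, so detection is easy, but it cannot detect abnormal data that occur within the normal range. The latter lacks objectivity because the decision criterion is set from the administrator's experience. Especially in the flood season, when discharge and rainfall jointly affect the dam water level, abnormal data identification based on the administrator's experience may be unreliable. This study therefore applies machine learning, for the first time, to detect abnormal data. Various combinations of features were built from dam water level, cumulative rainfall, and cumulative discharge data, together with features derived from the water level data such as the water level difference, the mean water level difference, and the mean water level. These combinations were then used in classification experiments that identify abnormal data with several machine learning models, including Random Forest, SVM, AdaptiveBoost, and the multilayer perceptron (MLP). In the experiments, the MLP performed best when using the dam water level, water level difference, water level minus mean water level, cumulative rainfall, cumulative discharge, and mean water level difference. The study shows that abnormal dam water level data can be detected effectively through machine learning classification, and that model performance is strongly affected not only by the number of features used in the experiments but also by their type.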
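
The derived water level features named in this abstract can be sketched as follows. The abstract gives no window length or exact definitions, so the trailing window of three samples, the zero difference assigned to the first sample, the dictionary keys, and the toy series are all assumptions made for illustration.

```python
def derive_features(levels, window=3):
    """Build per-time-step features from a water level series (trailing window)."""
    # Difference from the previous sample; the first sample has no predecessor.
    diffs = [0.0] + [cur - prev for prev, cur in zip(levels, levels[1:])]
    rows = []
    for i, level in enumerate(levels):
        start = max(0, i - window + 1)
        count = i + 1 - start
        level_mean = sum(levels[start:i + 1]) / count
        diff_mean = sum(diffs[start:i + 1]) / count
        rows.append({
            "level": level,
            "level_diff": diffs[i],
            "level_mean": level_mean,
            "level_minus_mean": level - level_mean,
            "level_diff_mean": diff_mean,
        })
    return rows

# Hypothetical series in metres; the fourth reading is a spike.
levels = [100.0, 100.1, 100.2, 103.0, 100.4]
rows = derive_features(levels)
print(rows[3])
```

The spike stands out in `level_diff` and `level_minus_mean` even though its raw value could lie inside a dam's normal operating range, which is the case the fixed upper and lower limits cannot catch.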

Prediction of OPS(On-base Plus Slugging) in KBO League (한국프로야구에서 장타율과 출루율(OPS) 예측 연구)

  • Dong Yun Shin;Jinho Kim
    • The Journal of Bigdata / v.7 no.1 / pp.49-61 / 2022
  • In sports, data analysis plays a growing role in team management, such as team strategy planning and marketing. In the KBO (Korea Baseball Organization) league in particular, plans for recruiting and developing players are drawn up at the end of each season to shape team strategy for the next year, for example through FA signings and trades. For these reasons, predicting players' performance in the next year is very important. This study limits the target to batters and investigates how to predict whether their performance will improve in the following year. OPS (On-base Plus Slugging), which is easy to calculate and strongly related to team scoring, was used as the reference record for rises and falls. Forty years of regular season data, from 1982 to 2021, were used, with 11 machine learning classification models as the experimental methods. In predicting the rise and fall of OPS, RBF SVM, Neural Net, Gaussian Process, and AdaBoost were more accurate than the other classification models, and age did not significantly affect accuracy.
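
OPS is the sum of on-base percentage and slugging percentage, both computed from standard batting counts. The sketch below uses the conventional formulas; the season dictionaries and their values are hypothetical, and the binary rise/fall label is an assumed reading of how the classification target could be built, not something the abstract specifies.

```python
def on_base_percentage(s):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    hits = s["singles"] + s["doubles"] + s["triples"] + s["home_runs"]
    reached = hits + s["walks"] + s["hit_by_pitch"]
    chances = s["at_bats"] + s["walks"] + s["hit_by_pitch"] + s["sacrifice_flies"]
    return reached / chances

def slugging_percentage(s):
    """SLG = total bases / at-bats."""
    total_bases = (s["singles"] + 2 * s["doubles"]
                   + 3 * s["triples"] + 4 * s["home_runs"])
    return total_bases / s["at_bats"]

def ops(s):
    """On-base Plus Slugging."""
    return on_base_percentage(s) + slugging_percentage(s)

def ops_rises(this_season, next_season):
    """Assumed binary target: True if OPS is higher in the following season."""
    return ops(next_season) > ops(this_season)

# Hypothetical batting lines for one batter in two consecutive seasons.
season_1 = {"at_bats": 400, "singles": 80, "doubles": 20, "triples": 0,
            "home_runs": 20, "walks": 40, "hit_by_pitch": 5, "sacrifice_flies": 5}
season_2 = dict(season_1, home_runs=25, singles=75)

print(round(ops(season_1), 3), ops_rises(season_1, season_2))
```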

An Untrained Person's Posture Estimation Scheme by Exploiting a Single 24GHz FMCW Radar and 2D CNN (단일 24GHz FMCW 레이더 및 2D CNN을 이용하여 학습되지 않은 요구조자의 자세 추정 기법)

  • Kyongseok Jang;Junhao Zhou;Chao Sun;Youngok Kim
    • Journal of the Society of Disaster Information / v.19 no.4 / pp.897-907 / 2023
  • Purpose: In this study, we aim to estimate three postures of an untrained person using a 2D CNN model trained with minimal FFT data collected by a 24GHz FMCW radar. Method: In an indoor space, we collected FFT data for three distinct postures (standing, sitting, and lying) from three different individuals. To apply this data to a 2D CNN model, we first converted the collected data into 2D images. The 2D CNN model was then trained on these images to recognize the distinct features of each posture. Following the training, we evaluated the model's accuracy in differentiating the posture features across various individuals. Result: According to the experimental results, the average accuracy of the proposed scheme for the three postures was 89.99%, and it outperforms the conventional 1D CNN and SVM schemes. Conclusion: In this study, we aim to estimate any person's three postures using a 2D CNN model and a 24GHz FMCW radar for indoor disaster situations. It is shown that the postures of any person can be accurately estimated even though his or her data is not used for training the AI model.

Machine Learning Based MMS Point Cloud Semantic Segmentation (머신러닝 기반 MMS Point Cloud 의미론적 분할)

  • Bae, Jaegu;Seo, Dongju;Kim, Jinsoo
    • Korean Journal of Remote Sensing / v.38 no.5_3 / pp.939-951 / 2022
  • The most important factor in designing autonomous driving systems is to recognize the exact location of the vehicle within the surrounding environment. To date, various sensors and navigation systems have been used for autonomous driving systems; however, all have limitations. Therefore, the need for high-definition (HD) maps that provide high-precision infrastructure information for safe and convenient autonomous driving is increasing. HD maps are drawn using three-dimensional point cloud data acquired through a mobile mapping system (MMS). However, this process requires manual work due to the large numbers of points and drawing layers, increasing the cost and effort associated with HD mapping. The objective of this study was to improve the efficiency of HD mapping by segmenting semantic information in an MMS point cloud into six classes: roads, curbs, sidewalks, medians, lanes, and other elements. Segmentation was performed using various machine learning techniques including random forest (RF), support vector machine (SVM), k-nearest neighbor (KNN), and gradient-boosting machine (GBM), and 11 variables including geometry, color, intensity, and other road design features. MMS point cloud data for a 130-m section of a five-lane road near Minam Station in Busan were used to evaluate the segmentation models; the average F1 scores of the models were 95.43% for RF, 92.1% for SVM, 91.05% for GBM, and 82.63% for KNN. The RF model showed the best segmentation performance, with F1 scores of 99.3%, 95.5%, 94.5%, 93.5%, and 90.1% for roads, sidewalks, curbs, medians, and lanes, respectively. The variable importance results of the RF model showed high mean decrease accuracy and mean decrease Gini for the XY dist. and Z dist. variables related to road design, respectively. Thus, variables related to road design contributed significantly to the segmentation of semantic information. The results of this study demonstrate the applicability of segmentation of MMS point cloud data based on machine learning, and will help to reduce the cost and effort associated with HD mapping.