• Title/Summary/Keyword: 서포트벡터기계 (support vector machine)

Search Result 106, Processing Time 0.034 seconds

Comparison of term weighting schemes for document classification (문서 분류를 위한 용어 가중치 기법 비교)

  • Jeong, Ho Young;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.265-276 / 2019
  • The document-term frequency matrix is a common data representation in text mining. In this study, we introduce the traditional term weighting scheme TF-IDF (term frequency-inverse document frequency), which is applied to the document-term frequency matrix and used for text classification. In addition, we introduce and compare the TF-IDF-ICSDF and TF-IGM schemes, which have recently become well known. This study also provides a keyword extraction method that enhances the quality of text classification. Based on the extracted keywords, we applied a support vector machine for text classification. To compare the performance of the term weighting schemes, we used performance metrics such as precision, recall, and F1-score. The results show that the TF-IGM scheme provided high performance metrics and was optimal for text classification.
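
The TF-IDF weighting named in this abstract can be illustrated with a minimal sketch. It assumes the textbook form (raw term frequency multiplied by log(N / document frequency)); the abstract does not say which TF-IDF variant the authors used, and the TF-IDF-ICSDF and TF-IGM schemes are not shown. The corpus and the function name `tf_idf` are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term frequency * log(N / document frequency).
    This is the textbook form, assumed here for illustration.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus, already tokenized.
docs = [
    ["svm", "text", "classification"],
    ["text", "mining", "text"],
    ["svm", "kernel", "text"],
]
weights = tf_idf(docs)
for row in weights:
    print(row)
```

A term that occurs in every document ("text" here) gets weight zero, which is why such weights can help a classifier focus on discriminative terms.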

Acoustic Emission based early fault detection and diagnosis method for pipeline (음향방출 기반 배관 조기 결함 검출 및 진단 방법)

  • Kim, Jaeyoung;Jeong, Inkyu;Kim, Jongmyon
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.8 no.3 / pp.571-578 / 2018
  • Deteriorated pipelines often cause unexpected leaks and cracks. Negligence and late maintenance lead to enormous losses of gas and water resources. This paper proposes an early fault detection and diagnosis algorithm for pipelines using acoustic emission (AE) signals. The early fault detection method compares the frequency amplitudes of the measured spectrum with those of the spectrum under normal conditions; a larger amplitude indicates an abnormal condition. The early fault diagnosis algorithm uses support vector machines (SVMs) trained on normal and abnormal conditions to diagnose the AE signal measured from the target pipeline. In the experiment, a pipeline testbed was constructed to resemble a real industrial pipeline, and normal, 5 mm cracked, and 10 mm holed pipelines were installed and tested. The proposed fault detection and diagnosis technique was validated as an efficient approach to detecting early fault conditions in pipelines.
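
The detection step described in this abstract (comparing spectrum amplitudes against a normal-condition spectrum) might look roughly like the sketch below. It is not the authors' implementation: the threshold `ratio`, the noise `floor`, the synthetic sine signals, and both function names are assumptions made for illustration. A direct DFT is used only to keep the example dependency-free.

```python
import cmath
import math

def amplitude_spectrum(signal):
    """Single-sided amplitude spectrum via a direct DFT (fine for short frames)."""
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        spectrum.append(abs(coeff) / n)
    return spectrum

def abnormal_bins(measured, baseline, ratio=2.0, floor=1e-3):
    """Indices of frequency bins whose amplitude exceeds ratio * baseline.

    `ratio` and `floor` are assumed tuning values, not taken from the paper.
    """
    m = amplitude_spectrum(measured)
    b = amplitude_spectrum(baseline)
    return [k for k, (mk, bk) in enumerate(zip(m, b))
            if mk > ratio * max(bk, floor)]

n = 64
# Synthetic signals: the baseline has one tone at bin 4;
# the "faulty" signal adds an extra tone at bin 12.
normal = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
faulty = [s + 0.8 * math.sin(2 * math.pi * 12 * i / n)
          for i, s in enumerate(normal)]

print(abnormal_bins(faulty, normal))
```

Bins flagged this way could then be passed, as features, to a classifier such as the SVM the abstract mentions for the diagnosis stage.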

BLE Signals-based Machine Learning for Determining Indoor Presence (BLE 신호 기반 기계학습을 이용한 재실 여부 결정 방법)

  • Kim, Seong-Chang;Kim, Jin-Ho
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.12 / pp.1855-1862 / 2022
  • Various indoor location-based services can be provided through indoor presence determination and indoor positioning technology using beacons. However, because the BLE signal advertised by a beacon has an unstable RSSI due to problems such as multipath fading, it is difficult to guarantee the accuracy of indoor presence determination. In this paper, data were collected while the classroom door was open to ensure accuracy in various situations. Based on the collected data, we propose an indoor presence determination method that considers the characteristics of the signal. The proposed method, which uses a support vector machine, showed an accuracy improvement of about 10% compared with the results using raw RSSI only. This method has the advantage of accurately determining indoor presence with only one receiver. The proposed method is expected to enable a low-cost system for determining indoor presence with high accuracy.

Evaluation of Classification Models of Mild Left Ventricular Diastolic Dysfunction by Tei Index (Tei Index를 이용한 경도의 좌심실 이완 기능 장애 분류 모델 평가)

  • Su-Min Kim;Soo-Young Ye
    • Journal of the Korean Society of Radiology / v.17 no.5 / pp.761-766 / 2023
  • In this paper, the Tei index (TI) was measured to classify the presence or absence of mild left ventricular diastolic dysfunction. Of the 306 data points in total, 206 were used as training data and 100 as test data, and SVM and KNN were used as the machine learning classification models. The results confirmed that SVM showed relatively higher accuracy than KNN and was more useful in diagnosing the presence of left ventricular diastolic dysfunction. In future research, classification performance is expected to improve further by adding various indicators that evaluate cardiac function in addition to TI and by securing more data. Furthermore, the results are expected to serve as basic data for predicting and classifying other diseases and to help address the shortage of medical personnel relative to the increasing number of tests.
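
The 206/100 hold-out evaluation described in this abstract can be sketched with a small k-nearest-neighbours classifier. Everything beyond the split sizes is an assumption: the data are synthetic (the class means and spread are made-up numbers, not clinical values), a single feature is used, and `k=3` is an arbitrary choice.

```python
import random
from collections import Counter

def knn_predict(train, query, k=3):
    """Majority label among the k training points nearest to query (1-D feature)."""
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

rng = random.Random(0)
# Synthetic stand-in for 306 measurements: (feature value, label) pairs.
# The means 0.40 and 0.60 are invented for illustration only.
data = [(rng.gauss(0.40, 0.03), 0) for _ in range(153)]
data += [(rng.gauss(0.60, 0.03), 1) for _ in range(153)]
rng.shuffle(data)

# 206 training samples and 100 test samples, as in the abstract.
train, test = data[:206], data[206:]
correct = sum(knn_predict(train, x) == y for x, y in test)
accuracy = correct / len(test)
print(f"hold-out accuracy: {accuracy:.2f}")
```

Comparing classifiers as the abstract does would mean computing the same hold-out accuracy for each model on the identical test set.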

Research on a Non-invasive Blood Glucose level Estimation Algorithm based on Near-infrared Spectroscopy (근적외선 분광법 기반 비침습식 혈당 수치 추정 알고리즘 연구)

  • Young-Man Kang;Soon-Hee Han
    • The Journal of the Korea institute of electronic communication sciences / v.18 no.6 / pp.1353-1362 / 2023
  • Various methods are being attempted to resolve the inconvenience of the blood glucose meters used to check blood sugar levels. In this paper, we attempted to estimate blood sugar levels non-invasively by applying machine learning to spectral data acquired with a near-infrared sensor. The non-invasive blood glucose meter used in the study has a total of six near-infrared ray emitters, including visible rays, and a light receiver that receives them. It is a device created to collect spectral data from specific parts of the human body, such as the fingers. To verify whether there was a significant difference depending on blood sugar level, we attempted to estimate the blood sugar level with machine learning algorithms. After applying five machine learning algorithms to the collected data and adjusting various hyperparameters, we confirmed that the support vector regression algorithm showed the best performance.

Secure Training Support Vector Machine with Partial Sensitive Part

  • Park, Saerom
    • Journal of the Korea Society of Computer and Information / v.26 no.4 / pp.1-9 / 2021
  • In this paper, we propose a training algorithm for a support vector machine (SVM) with a sensitive variable. Although machine learning models enable automatic decision making in real-world applications, regulations prohibit the use of sensitive information in order to protect privacy. In particular, privacy protection of legally protected attributes such as race, gender, and disability is compulsory. We present an efficient least squares SVM (LSSVM) training algorithm that uses fully homomorphic encryption (FHE) to protect a partial sensitive attribute. Our framework assumes that the data owner has both non-sensitive attributes and a sensitive attribute, while the machine learning service provider (MLSP) receives the non-sensitive attributes and an encrypted sensitive attribute. As a result, the data owner can obtain the encrypted model parameters without exposing sensitive information to the MLSP. In the inference phase, both the non-sensitive attributes and the sensitive attribute are encrypted, and all computations are conducted in the encrypted domain. Through experiments on real data, we show that the proposed method makes it possible to implement a privacy-preserving sensitive LSSVM with FHE whose performance is comparable to that of the original LSSVM algorithm. In addition, we demonstrate that the efficient sensitive LSSVM with FHE significantly reduces the computational cost with a small degradation in performance.
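
Unlike a standard SVM, a least squares SVM is trained by solving a single linear system, and that structure is what makes it a plausible fit for encrypted arithmetic. The sketch below shows only the standard plaintext LSSVM classifier with a linear kernel; it does not reproduce the paper's algorithm and contains no homomorphic encryption. The toy data, the value of `gamma`, and the function names are assumptions.

```python
import numpy as np

def lssvm_train(X, y, gamma=10.0):
    """Solve the standard LSSVM system for (alpha, b), linear kernel.

        [ 0   y^T             ] [ b     ]   [ 0 ]
        [ y   Omega + I/gamma ] [ alpha ] = [ 1 ]

    with Omega[i, j] = y_i * y_j * K(x_i, x_j).
    """
    n = len(y)
    K = X @ X.T                              # linear kernel matrix
    omega = np.outer(y, y) * K
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = y
    A[1:, 0] = y
    A[1:, 1:] = omega + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], np.ones(n)))
    sol = np.linalg.solve(A, rhs)
    return sol[1:], sol[0]                   # alpha, b

def lssvm_predict(X_train, y, alpha, b, X_new):
    """Sign of sum_i alpha_i * y_i * K(x, x_i) + b."""
    scores = (X_new @ X_train.T) @ (alpha * y) + b
    return np.sign(scores)

# Toy, linearly separable data (plaintext only; no encryption involved).
X = np.array([[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
alpha, b = lssvm_train(X, y)
pred = lssvm_predict(X, y, alpha, b, X)
print(pred, b)
```

Training uses only additions, multiplications, and a linear solve. How the authors carry these operations out on FHE ciphertexts is specific to the paper and is not shown here.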

A Method to Find Feature Set for Detecting Various Denial Service Attacks in Power Grid (전력망에서의 다양한 서비스 거부 공격 탐지 위한 특징 선택 방법)

  • Lee, DongHwi;Kim, Young-Dae;Park, Woo-Bin;Kim, Joon-Seok;Kang, Seung-Ho
    • KEPCO Journal on Electric Power and Energy / v.2 no.2 / pp.311-316 / 2016
  • A network intrusion detection system based on a machine learning method such as an artificial neural network depends heavily on the selected features in terms of accuracy and efficiency. Nevertheless, choosing the optimal combination of features, one that guarantees accuracy and efficiency, from the many features generally used to detect network intrusion requires extensive computing resources. In this paper, we address an optimal feature selection problem for distinguishing six denial-of-service attacks and normal usage in the NSL-KDD data. We propose an optimal feature selection algorithm based on multi-start local search, a representative meta-heuristic for solving optimization problems. To evaluate the performance of the proposed algorithm, we compare it with the case in which all 41 features of the NSL-KDD data are used. In addition, we compare three well-known machine learning methods (multi-layer perceptron, Bayes classifier, and support vector machine) to find the method that performs best in combination with the proposed feature selection method.
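
Multi-start local search for feature selection, the technique this abstract names, can be sketched as repeated bit-flip hill climbing from random starting subsets. The abstract gives no details of the neighbourhood or objective, so those are assumptions here: `toy_score` is a hypothetical stand-in for classifier accuracy on a validation set, and `USEFUL` is an invented ground truth.

```python
import random

def local_search(score, start):
    """Flip one feature at a time, keeping any flip that improves the score."""
    current, best = list(start), score(start)
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            current[i] ^= 1                  # tentatively flip feature i
            candidate = score(current)
            if candidate > best:
                best, improved = candidate, True
            else:
                current[i] ^= 1              # revert the flip
    return current, best

def multi_start(score, n_features, n_starts=10, seed=0):
    """Run local search from several random subsets and keep the best result."""
    rng = random.Random(seed)
    best_mask, best_score = None, float("-inf")
    for _ in range(n_starts):
        start = [rng.randint(0, 1) for _ in range(n_features)]
        mask, value = local_search(score, start)
        if value > best_score:
            best_mask, best_score = mask, value
    return best_mask, best_score

# Hypothetical objective: reward three "useful" features and charge a
# small cost per selected feature (a stand-in for validation accuracy).
USEFUL = {0, 3, 5}

def toy_score(mask):
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in USEFUL)
    return hits - 0.1 * sum(mask)

mask, value = multi_start(toy_score, n_features=8)
print(mask, round(value, 2))
```

In a real setting each call to the score function would train and evaluate a classifier, so the number of restarts trades search quality against computing cost.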

Status of Groundwater Potential Mapping Research Using GIS and Machine Learning (GIS와 기계학습을 이용한 지하수 가능성도 작성 연구 현황)

  • Lee, Saro;Fetemeh, Rezaie
    • Korean Journal of Remote Sensing / v.36 no.6_1 / pp.1277-1290 / 2020
  • Water resources, which consist of surface water and groundwater, are considered one of the pivotal natural resources worldwide. Since the last century, rapid population growth together with accelerated industrialization and explosive urbanization has boosted the demand for groundwater for domestic, industrial, and agricultural use. Better management of groundwater can play a crucial role in sustainable development; therefore, accurately locating groundwater through groundwater potential mapping is indispensable. In recent years, integrating machine learning techniques, Geographical Information System (GIS), and Remote Sensing (RS) has become a popular and effective approach to groundwater potential mapping. To determine the status of this integrated approach, a systematic review of 94 directly relevant papers published over the previous six years (2015-2020) was carried out. According to the literature review, the number of studies published annually increased rapidly over time. The total study area spanned 15 countries, and 85.1% of the studies focused on Iran, India, China, South Korea, and Iraq. Twenty variables were found to be frequently involved in groundwater potential investigations, of which nine are almost always present: slope, lithology (geology), land use/land cover (LU/LC), drainage/river density, altitude (elevation), topographic wetness index (TWI), distance from river, rainfall, and aspect. Among the machine learning techniques, data integration was carried out using random forest, support vector machine, and boosted regression tree. Our study shows that for optimal results, groundwater mapping must be used as a tool to complement field work rather than as a low-cost substitute. Consequently, more studies should be conducted to enhance the generalization and precision of groundwater potential maps.

Learning Data Model Definition and Machine Learning Analysis for Data-Based Li-Ion Battery Performance Prediction (데이터 기반 리튬 이온 배터리 성능 예측을 위한 학습 데이터 모델 정의 및 기계학습 분석)

  • Byoungwook Kim;Ji Su Park;Hong-Jun Jang
    • KIPS Transactions on Software and Data Engineering / v.12 no.3 / pp.133-140 / 2023
  • The performance of lithium-ion batteries depends on the usage environment and the combination ratio of cathode materials. To develop a high-performance lithium-ion battery, it is necessary to manufacture the battery and measure its performance while varying the cathode material ratio. However, it takes a lot of time and money to directly develop batteries and measure their performance for all combinations of variables. Therefore, research on predicting battery performance with artificial intelligence models has been actively conducted. However, because the measurement experiments in existing published battery data were conducted with the same battery, the cathode material combination ratio was fixed and was not included as a data attribute. In this paper, we define a training data model required to develop an artificial intelligence model that can predict battery performance according to the combination ratio of cathode materials. We analyzed the factors that can affect the performance of lithium-ion batteries and defined the mass of each cathode material and the battery usage environment (cycle, current, temperature, time) as input data and the battery power and capacity as target data. Across battery data from different experimental environments, each battery's data maintained a unique pattern, and the battery classification model classified each battery with an error of about 2%.

Identifying sources of heavy metal contamination in stream sediments using machine learning classifiers (기계학습 분류모델을 이용한 하천퇴적물의 중금속 오염원 식별)

  • Min Jeong Ban;Sangwook Shin;Dong Hoon Lee;Jeong-Gyu Kim;Hosik Lee;Young Kim;Jeong-Hun Park;ShunHwa Lee;Seon-Young Kim;Joo-Hyon Kang
    • Journal of Wetlands Research / v.25 no.4 / pp.306-314 / 2023
  • Stream sediments are an important component of water quality management because they are receptors of various pollutants, such as heavy metals and organic matter emitted from upland sources, and can act as secondary pollution sources that adversely affect the water environment. To manage stream sediments effectively, the primary sources of sediment contamination must be identified and source-specific control strategies developed. We evaluated the performance of machine learning models in identifying primary sources of sediment contamination based on the physico-chemical properties of stream sediments. A total of 356 stream sediment data sets with 18 quality parameters, including 10 heavy metal species (Cd, Cu, Pb, Ni, As, Zn, Cr, Hg, Li, and Al), 3 soil parameters (clay, silt, and sand fractions), and 5 water quality parameters (water content, loss on ignition, total organic carbon, total nitrogen, and total phosphorus), were collected near abandoned metal mines and industrial complexes across the four major river basins in Korea. Two machine learning algorithms, linear discriminant analysis (LDA) and support vector machine (SVM) classifiers, were used to classify the sediments into four cases representing different combinations of sampling period and location (i.e., mine in dry season, mine in wet season, industrial complex in dry season, and industrial complex in wet season). Both models performed well in the classification, with SVM outperforming LDA; the accuracy values of LDA and SVM were 79.5% and 88.1%, respectively. An SVM ensemble model was used for multi-label classification of the multiple contamination sources, including land uses in the upland areas within a 1 km radius of the sampling sites. The results showed that the multi-label classifier performed comparably to the single-label SVM in classifying mines and industrial complexes but was less accurate in classifying dominant land uses (50-60%). The poor performance of the multi-label SVM is likely due to overfitting caused by the small size of the data sets relative to the complexity of the model. A larger data set might increase the performance of the machine learning models in identifying contamination sources.