• Title/Summary/Keyword: 서포트 벡터 머신 회귀 (support vector machine regression)

Search results: 62

Study on Predicting the Designation of Administrative Issue in the KOSDAQ Market Based on Machine Learning Based on Financial Data (머신러닝 기반 KOSDAQ 시장의 관리종목 지정 예측 연구: 재무적 데이터를 중심으로)

  • Yoon, Yanghyun;Kim, Taekyung;Kim, Suyeong
    • Asia-Pacific Journal of Business Venturing and Entrepreneurship / v.17 no.1 / pp.229-249 / 2022
  • This paper investigates machine learning models for predicting the designation of administrative issues in the KOSDAQ market using various techniques. When a company in the Korean stock market is designated as an administrative issue, the market perceives the event itself as negative information, causing losses to the company and its investors. The purpose of this study is to evaluate alternative methods for developing an artificial intelligence service that uses a company's financial ratios to detect early the possibility that it will be designated as an administrative issue, helping investors manage portfolio risk. The independent variables were 21 financial ratios representing profitability, stability, activity, and growth. Financial data were sampled from companies designated as administrative issues and from non-designated companies over 2011 to 2020, the period in which K-IFRS was applied. Logistic regression, decision tree, support vector machine, random forest, and LightGBM were used to predict the designation of administrative issues. LightGBM was the best prediction model, with 82.73% classification accuracy, and the decision tree had the lowest classification accuracy, at 71.94%. Among the top three variables by importance in the decision-tree-based models, ROE (net profit) and capital stock turnover ratio were common to every model, indicating that they are relatively important in the designation of administrative issues. Overall, the ensemble learning models showed higher predictive performance than the single learning models.

QSPR analysis for predicting heat of sublimation of organic compounds (유기화합물의 승화열 예측을 위한 QSPR분석)

  • Park, Yu Sun;Lee, Jong Hyuk;Park, Han Woong;Lee, Sung Kwang
    • Analytical Science and Technology / v.28 no.3 / pp.187-195 / 2015
  • The heat of sublimation (HOS) is an essential parameter for addressing environmental problems involving the transfer of organic contaminants to the atmosphere and for assessing the risk of toxic chemicals. Measuring the heat of sublimation experimentally is time-consuming, expensive, and complicated. In this study, quantitative structure-property relationships (QSPR) were used to develop a simple model for predicting the heat of sublimation of organic compounds. A population-based forward selection method was applied to select an informative subset of descriptors for learning algorithms such as multiple linear regression (MLR) and support vector machine (SVM). Each individual model and the consensus model were evaluated by internal validation using the bootstrap method and y-randomization. Prediction performance on the external test set improved when the applicability domain was taken into account. The MLR model showed that the heat of sublimation is related to dispersion, hydrogen bonding, electrostatic forces, and intermolecular dipole-dipole interactions.
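The y-randomization check mentioned above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: a single-descriptor least-squares fit stands in for the paper's MLR model, the data are synthetic, and all function names are hypothetical. The idea is that a model fit to the real response should score far higher than the same model fit to randomly shuffled responses; if the shuffled fits score comparably, the original correlation is likely due to chance.

```python
import random


# Hypothetical helper: a one-descriptor least-squares fit stands in for MLR.
def fit_simple_ols(x, y):
    """Return intercept a and slope b minimizing sum((y - a - b*x)^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """Coefficient of determination of the least-squares fit of y on x."""
    a, b = fit_simple_ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def y_randomization(x, y, n_rounds=100, seed=0):
    """Return the R^2 on the true y and the R^2 values after shuffling y."""
    rng = random.Random(seed)
    true_r2 = r_squared(x, y)
    shuffled_r2 = []
    for _ in range(n_rounds):
        y_perm = list(y)
        rng.shuffle(y_perm)  # break the descriptor-response link
        shuffled_r2.append(r_squared(x, y_perm))
    return true_r2, shuffled_r2


# Synthetic descriptor/property pairs (not data from the study).
x = [float(i) for i in range(20)]
y = [2.0 * xi + 1.0 + (0.5 if i % 2 else -0.5) for i, xi in enumerate(x)]
true_r2, shuffled = y_randomization(x, y)
print(f"true R^2 = {true_r2:.3f}, mean shuffled R^2 = {sum(shuffled) / len(shuffled):.3f}")
```

A true R^2 well above every shuffled R^2 is the pattern a non-chance model should show.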

The big data method for flash flood warning (돌발홍수 예보를 위한 빅데이터 분석방법)

  • Park, Dain;Yoon, Sanghoo
    • Journal of Digital Convergence / v.15 no.11 / pp.245-250 / 2017
  • A flash flood is flooding caused by intense rainfall over a relatively small area that flows rapidly through rivers and valleys in a short time with no advance warning, so it can cause property damage and casualties. This study aims to establish a flash-flood warning system using data on 38 flash-flood events between 2009 and 2012 from the National Disaster Information Center and the Land Surface Model (TOPLATS). Three Land Surface Model variables were used: precipitation, soil moisture, and surface runoff. These three variables over the 6 hours preceding each flash flood were reduced to three factors through factor analysis. Decision tree, random forest, Naive Bayes, support vector machine, and logistic regression models were considered as big data methods. Prediction performance was evaluated by comparing accuracy, kappa, TP rate, FP rate, and F-measure. The best method was suggested based on an evaluation of reproducibility at each flash-flood location and of predicted versus actual event counts over the four years of data.
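For reference, the evaluation measures listed above can all be computed from a binary confusion matrix. The sketch below assumes that "positive" means a flash flood occurred and uses hypothetical counts, not results from the study; kappa is taken to be Cohen's kappa, and F-measure the F1 score (the harmonic mean of precision and TP rate).

```python
def classification_metrics(tp, fp, fn, tn):
    """Binary-classification measures from confusion-matrix counts."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    tp_rate = tp / (tp + fn)  # recall: share of actual floods that were predicted
    fp_rate = fp / (fp + tn)  # false alarms among non-flood cases
    precision = tp / (tp + fp)
    f_measure = 2 * precision * tp_rate / (precision + tp_rate)
    # Cohen's kappa: observed agreement corrected for agreement expected by chance.
    p_chance = ((tp + fp) / n) * ((tp + fn) / n) + ((fn + tn) / n) * ((fp + tn) / n)
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"accuracy": accuracy, "kappa": kappa, "tp_rate": tp_rate,
            "fp_rate": fp_rate, "f_measure": f_measure}


# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=30, fp=10, fn=8, tn=52)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Kappa is useful alongside accuracy here because flash-flood events are rare, so a model that always predicts "no flood" can still look accurate.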

Stock Price Direction Prediction Using Convolutional Neural Network: Emphasis on Correlation Feature Selection (합성곱 신경망을 이용한 주가방향 예측: 상관관계 속성선택 방법을 중심으로)

  • Kyun Sun Eo;Kun Chang Lee
    • Information Systems Review / v.22 no.4 / pp.21-39 / 2020
  • Recently, deep learning has shown high performance in applications such as pattern analysis and image classification. Stock market forecasting, known as a particularly difficult task in machine learning research, is an area where many researchers are verifying the effectiveness of deep learning techniques. This study proposes a deep learning convolutional neural network (CNN) model to predict the direction of stock prices, and applies a feature selection method to improve the model's performance. We compared the performance of the CNN against the following machine learning classifiers: logistic regression, decision tree, neural network, support vector machine, AdaBoost, bagging, and random forest. The CNN showed higher performance than the other classifiers when feature selection was applied. The results show that the CNN model effectively predicted stock price direction by analyzing the embedded values of the financial data.
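The "correlation feature selection" in the title is commonly understood as correlation-based feature selection (CFS), which favors features that correlate with the target but not with each other; whether the authors used exactly this variant is an assumption. The sketch scores a subset of k features by the CFS merit k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean absolute Pearson correlation between features and target and r_ff the mean absolute correlation between feature pairs, and grows the subset greedily. The feature names and data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)


def cfs_merit(subset, features, target):
    """CFS merit: high feature-target correlation, low feature-feature correlation."""
    k = len(subset)
    r_cf = sum(abs(pearson(features[f], target)) for f in subset) / k
    pairs = [(a, b) for i, a in enumerate(subset) for b in subset[i + 1:]]
    r_ff = (sum(abs(pearson(features[a], features[b])) for a, b in pairs) / len(pairs)
            if pairs else 0.0)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_forward(features, target):
    """Greedy forward search: add the feature that most raises the merit, stop when none does."""
    selected, best = [], 0.0
    remaining = list(features)
    while remaining:
        cand = max(remaining, key=lambda f: cfs_merit(selected + [f], features, target))
        merit = cfs_merit(selected + [cand], features, target)
        if merit <= best:
            break
        selected.append(cand)
        remaining.remove(cand)
        best = merit
    return selected


# Hypothetical features and a price-direction score (not data from the study).
target = [1, 2, 3, 4, 5, 6, 7, 8]
features = {
    "momentum": [1.1, 1.9, 3.2, 3.8, 5.1, 6.2, 6.9, 8.1],
    "noise": [3, 7, 1, 8, 2, 6, 4, 5],
}
print(cfs_forward(features, target))
```

The weakly correlated "noise" feature lowers the merit when added, so the search stops after the first feature.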

Evaluation of a Thermal Conductivity Prediction Model for Compacted Clay Based on a Machine Learning Method (기계학습법을 통한 압축 벤토나이트의 열전도도 추정 모델 평가)

  • Yoon, Seok;Bang, Hyun-Tae;Kim, Geon-Young;Jeon, Haemin
    • KSCE Journal of Civil and Environmental Engineering Research / v.41 no.2 / pp.123-131 / 2021
  • The buffer is a key component of the engineered barrier system that safeguards the disposal of high-level radioactive waste. Buffers are located between disposal canisters and the host rock; they restrain the release of radionuclides and protect the canisters from groundwater inflow. Because a disposal canister releases considerable heat into the surrounding buffer, the buffer's thermal conductivity is a very important parameter for the safety of the entire disposal system. For this reason, many studies have developed thermal conductivity prediction models that consider various factors. In this study, the thermal conductivity of a buffer is estimated using the following machine learning methods: linear regression, decision tree, support vector machine (SVM), ensemble, Gaussian process regression (GPR), neural network, deep belief network, and genetic programming. The ensemble, genetic programming, SVM with a cubic kernel, and GPR methods performed better than the regression model, with the XGBoost ensemble and GPR models performing best.

A study on variable selection and classification in dynamic analysis data for ransomware detection (랜섬웨어 탐지를 위한 동적 분석 자료에서의 변수 선택 및 분류에 관한 연구)

  • Lee, Seunghwan;Hwang, Jinsoo
    • The Korean Journal of Applied Statistics / v.31 no.4 / pp.497-505 / 2018
  • Ransomware attacks on computer systems are very common worldwide. As antivirus and detection methods are continually improved to detect and mitigate ransomware, ransomware itself evolves just as quickly to avoid detection. Many new methods have been implemented and tested to optimize protection against ransomware. In our work, 582 ransomware samples and 942 normalware samples, described by 30,967 dynamic action sequence variables, are used to detect ransomware efficiently. Several variable selection techniques combined with various machine-learning-based classification techniques were tried to protect systems from ransomware. Among these combinations, chi-square variable selection with random forest gave the best detection rate and accuracy.
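Chi-square variable selection ranks each variable by how strongly it depends on the class label. The sketch below uses the classical Pearson chi-square statistic on a 2x2 table for binary "action occurred" variables; the paper's exact variant (for example, a library's chi-square scorer) is not stated, and the feature names and samples are hypothetical.

```python
def chi_square_2x2(feature, labels):
    """Pearson chi-square statistic for a binary feature versus a binary label."""
    counts = [[0, 0], [0, 0]]  # counts[feature_value][label]
    for f, y in zip(feature, labels):
        counts[f][y] += 1
    total = len(labels)
    chi2 = 0.0
    for f in (0, 1):
        row = counts[f][0] + counts[f][1]
        for y in (0, 1):
            col = counts[0][y] + counts[1][y]
            expected = row * col / total
            if expected > 0:  # skip empty rows/columns
                chi2 += (counts[f][y] - expected) ** 2 / expected
    return chi2


def select_top_k(features, labels, k):
    """Rank features by chi-square score and keep the k highest."""
    scores = {name: chi_square_2x2(col, labels) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# 1 = ransomware, 0 = normalware; each feature records whether an action occurred.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = {
    "writes_encrypted_file": [1, 1, 1, 0, 0, 0, 0, 0],
    "opens_window": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(select_top_k(features, labels, k=1))
```

The selected variables would then be passed to a classifier such as random forest; with 30,967 candidate variables, a cheap per-variable filter like this keeps the classifier's input manageable.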

A Predictive Model of the Generator Output Based on the Learning of Performance Data in Power Plant (발전플랜트 성능데이터 학습에 의한 발전기 출력 추정 모델)

  • Yang, HacJin;Kim, Seong Kun
    • Journal of the Korea Academia-Industrial cooperation Society / v.16 no.12 / pp.8753-8759 / 2015
  • Stable management of generator output in a turbine power generation cycle requires established analysis procedures and validated performance measurements for generator output. We developed a turbine expansion model and a measurement validation model for calculating generator performance from turbine output, based on the ASME (American Society of Mechanical Engineers) PTC (Performance Test Code). We also developed a verification model for uncertain measurement data related to turbine and generator output. Whereas models in previous studies were developed using artificial neural networks and kernel regression, the verification model in this paper is based on a support vector machine (SVM) to overcome the problem of unmeasured data. Procedures for selecting the related variables and the data window for verification learning were also developed. The model proved suitable for estimation, with a learning error of about 1%. The learning model can provide validated estimates for corrective performance analysis of turbine cycle output by predicting lost measurement data.
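Because this listing was retrieved with the keyword "support vector machine regression", a short note on what SVM regression (SVR) optimizes may help. A linear SVR fits f(x) = w·x + b by minimizing 0.5*||w||^2 + C * sum_i max(0, |y_i - f(x_i)| - eps): residuals inside the eps-tube cost nothing, which makes the fit tolerant of small measurement noise. The sketch only evaluates this objective for given parameters; it is not a solver and not the model from the paper, and C, eps, and the data are illustrative.

```python
def eps_insensitive(residual, eps):
    """Zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(residual) - eps)


def linear_svr_objective(w, b, xs, ys, C, eps):
    """Primal objective 0.5*||w||^2 + C * sum of eps-insensitive losses."""
    reg = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for x, y in zip(xs, ys):
        pred = sum(wi * xi for wi, xi in zip(w, x)) + b
        loss += eps_insensitive(y - pred, eps)
    return reg + C * loss


# Illustrative data: only the second point falls outside the 0.1-wide tube,
# so the objective is 0.5 * 2**2 + 1.0 * 0.1 = 2.1 up to floating-point rounding.
xs = [[1.0], [2.0], [3.0]]
ys = [2.05, 3.8, 6.0]
print(linear_svr_objective(w=[2.0], b=0.0, xs=xs, ys=ys, C=1.0, eps=0.1))
```

An SVR solver searches for the w and b that minimize this quantity; kernel SVR replaces the dot product w·x with a kernel expansion.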

A Survey on Oil Spill and Weather Forecast Using Machine Learning Based on Neural Networks and Statistical Methods (신경망 및 통계 기법 기반의 기계학습을 이용한 유류유출 및 기상 예측 연구 동향)

  • Kim, Gyoung-Do;Kim, Yong-Hyuk
    • Journal of the Korea Convergence Society / v.8 no.10 / pp.1-8 / 2017
  • Accurate forecasting makes it possible to prepare effectively for future phenomena. Meteorological phenomena in particular are closely tied to human life, and forecasting the weather and potential disasters can prevent loss of life and property. To respond quickly and effectively to oil spill accidents, it is important to accurately predict the movement of the spilled oil and the weather in the surrounding waters. In this paper, we selected four representative machine learning techniques (support vector machine, Gaussian process, multilayer perceptron, and radial basis function network) that have shown good performance and predictability in previous studies on oil spill detection and on meteorological prediction of quantities such as wind, rainfall, and ozone. We then discuss the applicability of a machine-learning-based oil spill prediction model.

City Gas Pipeline Pressure Prediction Model (도시가스 배관압력 예측모델)

  • Chung, Won Hee;Park, Giljoo;Gu, Yeong Hyeon;Kim, Sunghyun;Yoo, Seong Joon;Jo, Young-do
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.33-47 / 2018
  • City gas pipelines are buried underground, which makes them hard to manage and easily damaged. This research proposes a real-time prediction system that helps experts make decisions about pressure anomalies. Pipeline pressure data from Jungbu City Gas Company, one of the domestic city gas suppliers, were analyzed together with time and environmental variables. Regression models that predict pipeline pressure minute by minute are proposed, built with random forest, support vector regression (SVR), and long short-term memory (LSTM) algorithms. A comparison of the pressure prediction models' performance shows that the LSTM model was the best. The LSTM model for Asan-si achieved a root mean square error (RMSE) of 0.011 and a mean absolute percentage error (MAPE) of 0.494; the LSTM model for Cheonan-si achieved an RMSE of 0.015 and a MAPE of 0.668.
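The two error measures reported above can be computed as follows. The abstract does not state units; the sketch assumes MAPE is expressed in percent, a common convention, and the pressure readings are hypothetical, not data from the study.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, in the same units as the data."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical minute-level pressure readings and model predictions.
actual = [2.0, 2.0, 2.0, 2.0]
predicted = [2.01, 1.99, 2.02, 1.98]
print(f"RMSE = {rmse(actual, predicted):.4f}, MAPE = {mape(actual, predicted):.2f}%")
```

RMSE keeps the data's units and penalizes large errors more heavily, while MAPE is scale-free, which makes it easier to compare models across the two cities.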

Exploring the Sentiment Analysis of Electric Vehicles Social Media Data by Using Feature Selection Methods (속성선택방법을 이용한 전기자동차 소셜미디어 데이터의 감성분석 연구)

  • Costello, Francis Joseph;Lee, Kun Chang
    • Journal of Digital Convergence / v.18 no.2 / pp.249-259 / 2020
  • This study presents a recently collected social media data set from a case study of electric vehicles (EVs) and applies sentiment analysis (SA) to it to gain insights. The study uses two methods to fully analyze the public's sentiment toward EVs. First, we used an SA tool to extract the sentiment of comments. Next, we labeled the data with the obtained sentiments and classified them. During classification we encountered a high-dimensionality problem, so we explored feature selection (FS) models to reduce the data set's dimensionality. Three FS models (Chi-Squared, Information Gain, and ReliefF) showed the most promising results when combined with logistic regression and support vector machine classifiers. The contribution of this paper is a real-world example of social media text analytics that can be adopted in many other areas of research and business. Going forward, researchers can use the methodological approach in this paper to refine and improve their own text analytics use cases.
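Of the three FS models named above, information gain is the simplest to show: IG(Y; X) = H(Y) - H(Y | X), the reduction in label entropy gained by knowing a feature's value. The sketch computes it for hypothetical binary "term present" features over toy sentiment labels; it is not the authors' implementation or data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """H(labels) minus the entropy of labels conditioned on the feature's value."""
    n = len(labels)
    conditional = 0.0
    for value, count in Counter(feature).items():
        subset = [y for f, y in zip(feature, labels) if f == value]
        conditional += (count / n) * entropy(subset)
    return entropy(labels) - conditional


# Toy comments: 1 means the (hypothetical) term appears in the comment.
labels = ["pos", "pos", "neg", "neg"]
term_great = [1, 1, 0, 0]  # perfectly separates the sentiments
term_car = [1, 0, 1, 0]    # carries no sentiment information
print(information_gain(term_great, labels), information_gain(term_car, labels))
```

Ranking every term by this score and keeping the top ones is one way to cut the dimensionality of a bag-of-words matrix before training a logistic regression or SVM classifier.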