• Title/Abstract/Keywords: Machine Learning #2

1,718 search results (processing time: 0.027 s)

Comparing Machine Learning Classifiers for Movie WOM Opinion Mining

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 9, No. 8 / pp. 3169-3181 / 2015
  • Online word-of-mouth (WOM) has become a powerful influence on marketing and sales. Opinion mining and sentiment analysis are frequently used in market research and business analytics to analyze word-of-mouth content. Several challenging areas remain, however:
    1) sentiment analysis of Korean word-of-mouth content in the film market;
    2) the applicability of machine learning models that use only linguistic features;
    3) the effect of feature set size.
  This study sampled 10,000 movie reviews posted with extremely negative or extremely positive ratings on a movie portal site. It then conducted sentiment analysis with four machine learning algorithms: naïve Bayes, decision tree, neural network, and support vector machine. The neural network and support vector machine achieved better accuracy than naïve Bayes and the decision tree at every feature set size. Their performance also improved as the feature set size increased.
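A minimal sketch of the naïve Bayes baseline named above. It assumes a multinomial model with Laplace smoothing over whitespace-separated tokens. The study's actual Korean preprocessing and feature selection are not described in the abstract. The function names `train_nb` and `predict_nb` and the toy reviews are hypothetical.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model on whitespace-tokenized documents."""
    vocab = set()
    word_counts = {}               # label -> Counter of token frequencies
    doc_counts = Counter(labels)   # label -> number of training documents
    for doc, label in zip(docs, labels):
        tokens = doc.split()
        vocab.update(tokens)
        word_counts.setdefault(label, Counter()).update(tokens)
    return {"vocab": vocab, "word_counts": word_counts,
            "doc_counts": doc_counts, "n_docs": len(docs), "alpha": alpha}


def predict_nb(model, doc):
    """Return the label with the highest log posterior for one document."""
    vocab, alpha = model["vocab"], model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_label in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(n_label / model["n_docs"])  # log prior P(label)
        for tok in doc.split():
            if tok not in vocab:
                continue  # tokens unseen in training carry no evidence
            # Laplace-smoothed log likelihood log P(token | label)
            score += math.log((counts[tok] + alpha) / (total + alpha * len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy reviews standing in for the extreme-rating samples.
docs = ["great acting great story", "boring plot bad acting",
        "great fun", "bad boring waste"]
labels = ["pos", "neg", "pos", "neg"]
model = train_nb(docs, labels)
print(predict_nb(model, "great story"))  # pos
```

With Korean reviews, the whitespace split would normally be replaced by morpheme or syllable tokens. Growing the feature set amounts to enlarging `vocab`.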

A Formal Presentation of the Extensional Object Model

  • 정철용
    • Asia Pacific Journal of Information Systems / Vol. 5, No. 2 / pp. 143-176 / 1995
  • We present an overview of the Extensional Object Model (ExOM) and describe in detail the learning and classification components which integrate concepts from machine learning and object-oriented databases. The ExOM emphasizes flexibility in information acquisition, learning, and classification which are useful to support tasks such as diagnosis, planning, design, and database mining. As a vehicle to integrate machine learning and databases, the ExOM supports a broad range of learning and classification methods and integrates the learning and classification components with traditional database functions. To ensure the integrity of ExOM databases, a subsumption testing rule is developed that encompasses categories defined by type expressions as well as concept definitions generated by machine learning algorithms. A prototype of the learning and classification components of the ExOM is implemented in Smalltalk/V Windows.

A Development of Knowledge Error Analysis Methodology for practical use of Expert Systems

  • 김현수
    • Asia Pacific Journal of Information Systems / Vol. 6, No. 2 / pp. 77-105 / 1996
  • The accuracy of knowledge is a major concern for expert system developers and users. Machine learning approaches have recently been found useful for knowledge acquisition in expert systems. However, in most cases the accuracy of concepts acquired through machine learning could not be analyzed. In this paper, we develop a comprehensive knowledge error analysis methodology for the practical use of expert systems. Decision tree induction is an important machine learning method for business expert systems. We begin by analyzing knowledge acquired through decision tree induction. We then extend the results into an error analysis methodology for general machine learning methods. We give several examples and illustrations of these results and discuss their applicability to multistrategy learning approaches.

A Study on Development Environments for Machine Learning

  • 김동길;박용순;박래정;정태윤
    • IEMEK Journal of Embedded Systems and Applications (대한임베디드공학회논문지) / Vol. 15, No. 6 / pp. 307-316 / 2020
  • The performance of machine learning models depends heavily on the data. Preprocessing is therefore needed before data containing letters, numbers, and special characters can be analyzed. This paper proposes a development environment that works in three stages:
    1) it processes categorical and continuous data according to the type of missing values;
    2) it selects the best-performing algorithm;
    3) it automates the checking of model performance.
  With this environment, machine learning models can be created without prior knowledge of data preprocessing.
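Stage 1 can be sketched as follows. The abstract does not give the actual rule, so this sketch assumes mean imputation for continuous columns and mode imputation for categorical ones. The paper's handling of different missing-value types may differ. `impute_by_type` and the sample rows are hypothetical.

```python
from collections import Counter
from statistics import mean


def impute_by_type(rows):
    """Fill None cells column by column: mean if numeric, most frequent value otherwise."""
    columns = list(rows[0].keys())
    fill = {}
    for col in columns:
        observed = [r[col] for r in rows if r[col] is not None]
        if not observed:
            fill[col] = None  # nothing to learn from; leave the gaps
        elif all(isinstance(v, (int, float)) for v in observed):
            fill[col] = mean(observed)  # continuous column -> mean
        else:
            fill[col] = Counter(observed).most_common(1)[0][0]  # categorical -> mode
    return [{c: fill[c] if r[c] is None else r[c] for c in columns} for r in rows]


# Hypothetical table with one gap in each column.
rows = [{"age": 30, "city": "Seoul"}, {"age": None, "city": "Busan"},
        {"age": 50, "city": None}, {"age": 40, "city": "Seoul"}]
print(impute_by_type(rows))
```

Deciding the column type from the observed values keeps the sketch free of a hand-written schema. A real pipeline would more likely take column types from metadata.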

An Empirical Comparison of Machine Learning Models for Classifying Emotions in Korean Twitter

  • 임좌상;김진만
    • Journal of Korea Multimedia Society (한국멀티미디어학회논문지) / Vol. 17, No. 2 / pp. 232-239 / 2014
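  • As online writing increases, so does research on classifying it with machine learning. Nevertheless, few studies have targeted microblogs written in Korean, and studies that evaluate machine learning statistically are hard to find. In this paper, we sampled tweets from Twitter and classified their emotions with machine learning, using morphemes and syllables as features. The emotions in the tweets were classified with about 76% accuracy. Support Vector Machine was more accurate than Naïve Bayes. Linear models achieved accuracy comparable to nonlinear models on unstructured text. In addition, morpheme features did not yield higher accuracy than syllable features.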
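Syllable features need no morphological analyzer, because each precomposed Hangul syllable is a single Unicode character (U+AC00 to U+D7A3). The following is a minimal sketch of syllable n-gram counting under that assumption. `syllable_ngrams` is a hypothetical helper, and the paper's exact feature definition is not given in the abstract. Morpheme features would additionally require a Korean morphological analyzer, which is not shown.

```python
from collections import Counter


def syllable_ngrams(text, n=1):
    """Count Hangul-syllable n-grams inside each whitespace-separated word."""
    feats = Counter()
    for word in text.split():
        # Keep only precomposed Hangul syllables (U+AC00..U+D7A3),
        # dropping punctuation, Latin letters, digits, and emoji.
        sylls = [ch for ch in word if "\uAC00" <= ch <= "\uD7A3"]
        for i in range(len(sylls) - n + 1):
            feats["".join(sylls[i:i + n])] += 1
    return feats


print(syllable_ngrams("너무 좋아요!", n=2))
```

The resulting counts can be fed to any bag-of-features classifier, linear or nonlinear.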

Investigation of pile group response to adjacent twin tunnel excavation utilizing machine learning

  • Su-Bin Kim;Dong-Wook Oh;Hyeon-Jun Cho;Yong-Joo Lee
    • Geomechanics and Engineering / Vol. 38, No. 5 / pp. 517-528 / 2024
  • Many tunnelling projects are built in urban areas where space is limited. In these projects, it is crucial to account for the interaction among the foundation, the ground, and the tunnel. Predicting the deformation of piled foundations and the ground during twin tunnel excavation requires considering many factors. This study therefore derived a machine learning model to predict pile group settlement and used it to analyze the importance of the factors that govern the settlement of piled foundations during twin tunnelling. Results from laboratory model tests and numerical analyses served as input data, and the influence of each independent variable on the prediction model was analyzed. Data preprocessing, feature engineering, and hyperparameter tuning were used to improve model performance. Models based on Random Forest (RF), eXtreme Gradient Boosting (XGB), and Light Gradient Boosting Machine (LightGBM, LGB) performed better after hyperparameter tuning. LGB performed best, with an R² of 0.9782 and an RMSE of 0.0314. In the feature importance analysis, PN (the number of piles) ranked highest for RF (65.04%) and XGB (64.81%). For LGB, PCTC (the center-to-center distance between piles) ranked highest (31.32%). SHAP was used to analyze the impact of each variable, and PN consistently exerted the most influence on the predicted pile group settlement across all models. Both the laboratory model tests and the numerical analyses showed that ground displacement decreased as the pillar spacing of the twin tunnels varied. However, the machine learning analysis with additional variables found that the number of piles has the most significant impact on ground displacement.
Nevertheless, as this study is based on laboratory model testing, further research considering real field conditions is necessary. This study contributes to a better understanding of the complex interactions inherent in twin tunnelling projects and provides a reliable tool for predicting pile group settlement in such scenarios.

The Use of Unsupervised Machine Learning for the Attenuation of Seismic Noise

  • 김수정;전형구
    • Geophysics and Geophysical Exploration (지구물리와물리탐사) / Vol. 25, No. 2 / pp. 71-84 / 2022
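  • The various types of noise recorded along with the signal during seismic data acquisition hinder accurate interpretation of seismic data. Noise attenuation is therefore an essential step in seismic data processing. Denoising research is being conducted with a range of approaches, including machine learning. In this study, we attenuated noise in prestack seismic data with unsupervised-learning-based seismic denoising models and compared three such models: N2NUNET, PATCHUNET, and DDUL. Each uses a different neural network architecture and removes seismic noise without ground-truth (clean) data. We applied the three models to synthetic and field prestack seismic data and analyzed the denoised results qualitatively and quantitatively. All three unsupervised models adequately removed seismic noise from both the synthetic and field data. Among them, N2NUNET showed the lowest denoising performance. PATCHUNET and DDUL produced nearly identical results, with DDUL holding a slight quantitative advantage.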

Identify the Failure Mode of Weapon System (or equipment) using Machine Learning

  • 박연경;이혜원;김상문
    • Journal of the Korea Academia-Industrial cooperation Society (한국산학기술학회논문지) / Vol. 19, No. 8 / pp. 64-70 / 2018
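  • Development of weapon systems (or components) is constrained by limited time and budget, so few tests are conducted and little failure-related data accumulates. However, failures and maintenance work that occur during operation are largely recorded as computerized data. These records make it possible to analyze the causes of failure in weapon systems (or components). Such analysis has been difficult for two reasons. First, the formats for recording failures and maintenance differ across military branches and contractors. Second, the specific causes of failure are described in unstructured text. Today, advances in big data processing, machine learning algorithms, and hardware computing power have enabled many methods for processing such unstructured data, and this is an active research area. In this paper, we propose a method for analyzing failure cases by applying doc2vec, a machine learning technique, to unstructured failure and maintenance data from defense weapon systems (or components).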

Estimation of compressive strength of BFS and WTRP blended cement mortars with machine learning models

  • Ozcan, Giyasettin;Kocak, Yilmaz;Gulbandilar, Eyyup
    • Computers and Concrete / Vol. 19, No. 3 / pp. 275-282 / 2017
  • This study builds machine learning models to evaluate the effect of blast furnace slag (BFS) and waste tire rubber powder (WTRP) on the compressive strength of cement mortars. The data were 2-, 7-, 28-, and 90-day compressive strength results from standard cement tests on 12 mixes (288 specimens) of cement mortars containing BFS, WTRP, and BFS+WTRP. These data were used to train and test Random Forest, AdaBoost, SVM, and Bayes classifier models. The models have four input parameters (the amounts of Portland cement, BFS, and WTRP, and the sample age) and one output parameter (the compressive strength of the cement mortar). The models' predictions were compared with the experimental observations, and all experiments were carried out in the R programming language with its corresponding packages. Random Forest, AdaBoost, and SVM produced notably good results on this dataset, with high coefficients of determination (R²) and low RMS and MAPE values. AdaBoost gave the best values: an R² of 0.9831, an RMS of 5.2425, and a MAPE of 0.1105. The testing results indicate that the models can estimate the experimental data closely.
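The three evaluation metrics can be computed as in the following minimal sketch. It assumes that "RMS" denotes the root mean squared error and returns MAPE as a fraction. The abstract does not say whether its 0.1105 is a fraction or a percentage. R² is better when higher, while RMSE and MAPE are better when lower.

```python
import math


def r2_score(obs, pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    m = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - m) ** 2 for o in obs)
    return 1 - ss_res / ss_tot


def rmse(obs, pred):
    """Root mean squared error, in the units of the target (e.g. MPa)."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error as a fraction; multiply by 100 for %."""
    return sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


obs, pred = [1, 2, 3], [2, 2, 2]
print(r2_score(obs, pred))  # 0.0: no better than predicting the mean
```

MAPE is undefined when an observed value is zero. This is not an issue for compressive strengths, which are positive.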

Development of ensemble machine learning model considering the characteristics of input variables and the interpretation of model performance using explainable artificial intelligence

  • 박정수
    • Journal of Korean Society of Water and Wastewater (상하수도학회지) / Vol. 36, No. 4 / pp. 239-248 / 2022
  • Predicting algal blooms is an important part of algal bloom management, and chlorophyll-a concentration (Chl-a) is commonly used to represent bloom status. In recent years, advanced machine learning algorithms have increasingly been used for this prediction. In this study, XGBoost (XGB), an ensemble machine learning algorithm, was used to develop a model that predicts Chl-a in a reservoir. Daily observations of water quality and climate data were used to train and test the model. First, the input variables were clustered into a low-value group and a high-value group. The clustering was based on observed water temperature (TEMP), total organic carbon (TOC), total nitrogen (TN), and total phosphorus (TP) concentrations. For each of these four items, two XGB models were developed, one trained only on each cluster's data (Model 1). The results were compared with the predictions of a single XGB model trained on the entire dataset before clustering (Model 2). Performance was evaluated with three indices, including the root mean squared error-observation standard deviation ratio (RSR). For TEMP, TN, and TP, Model 1 performed better, with RSRs of 0.503, 0.477, and 0.493, respectively, against 0.521 for Model 2. For TOC, however, Model 2 outperformed Model 1, whose RSR was 0.532. Explainable artificial intelligence (XAI) is an active research field within machine learning. Shapley value analysis, a novel XAI algorithm, was also used to interpret the performance of the XGB models quantitatively.
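RSR is commonly defined as the RMSE divided by the standard deviation of the observations. The 1/n factors cancel, leaving the ratio of two root sums of squares. The following minimal sketch shows that ratio; `rsr` is a hypothetical helper name. A model that always predicts the observed mean scores exactly 1, which puts the reported values of about 0.48 to 0.53 in context.

```python
import math


def rsr(obs, pred):
    """RMSE-observations standard deviation ratio (0 is perfect; lower is better)."""
    mean_obs = sum(obs) / len(obs)
    # sqrt(sum((O - P)^2)): residual spread of the predictions
    num = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)))
    # sqrt(sum((O - mean(O))^2)): natural spread of the observations
    den = math.sqrt(sum((o - mean_obs) ** 2 for o in obs))
    return num / den


print(rsr([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 1.0 (mean-only predictor)
```

Because the errors are normalized by the spread of the observations, RSR values for the clustered and unclustered models can be compared directly even though each cluster covers a different value range.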