• Title/Summary/Keyword: 의사결정나무 분석

Search Result 407, Processing Time 0.033 seconds

The Detection of Online Manipulated Reviews Using Machine Learning and GPT-3 (기계학습과 GPT3를 시용한 조작된 리뷰의 탐지)

  • Chernyaeva, Olga;Hong, Taeho
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.347-364
    • /
    • 2022
  • Fraudulent companies or sellers strategically manipulate reviews to influence customers' purchase decisions; therefore, the reliability of reviews has become crucial for customer decision-making. Since customers increasingly rely on online reviews to search for more detailed information about products or services before purchasing, many researchers focus on detecting manipulated reviews. However, the main problem in detecting manipulated reviews is the difficulties with obtaining data with manipulated reviews to utilize machine learning techniques with sufficient data. Also, the number of manipulated reviews is insufficient compared with the number of non-manipulated reviews, so the class imbalance problem occurs. The class with fewer examples is under-represented and can hamper a model's accuracy, so machine learning methods suffer from the class imbalance problem and solving the class imbalance problem is important to build an accurate model for detecting manipulated reviews. Thus, we propose an OpenAI-based reviews generation model to solve the manipulated reviews imbalance problem, thereby enhancing the accuracy of manipulated reviews detection. In this research, we applied the novel autoregressive language model - GPT-3 to generate reviews based on manipulated reviews. Moreover, we found that applying GPT-3 model for oversampling manipulated reviews can recover a satisfactory portion of performance losses and shows better performance in classification (logit, decision tree, neural networks) than traditional oversampling models such as random oversampling and SMOTE.

Analysis of Survivability for Combatants during Offensive Operations at the Tactical Level (전술제대 공격작전간 전투원 생존성에 관한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Kim, GakGyu
    • The Korean Journal of Applied Statistics
    • /
    • v.28 no.5
    • /
    • pp.921-932
    • /
    • 2015
  • This study analyzed military personnel survivability in regards to offensive operations according to the scientific military training data of a reinforced infantry battalion. Scientific battle training was conducted at the Korea Combat Training Center (KCTC) training facility and utilized scientific military training equipment that included MILES and the main exercise control system. The training audience freely engaged an OPFOR who is an expert at tactics and weapon systems. It provides a statistical analysis of data in regards to state-of-the-art military training because the scientific battle training system saves and utilizes all training zone data for analysis and after action review as well as offers training control during the training period. The methodologies used the Cox PH modeling (which does not require parametric distribution assumptions) and decision tree modeling for survival data such as CART, GUIDE, and CTREE for richer and easier interpretation. The variables that violate the PH assumption were stratified and analyzed. Since the Cox PH model result was not easy to interpret the period of service, additional interpretation was attempted through univariate local regression. CART, GUIDE, and CTREE formed different tree models which allow for various interpretations.

Trend Forecasting and Analysis of Quantum Computer Technology (양자 컴퓨터 기술 트렌드 예측과 분석)

  • Cha, Eunju;Chang, Byeong-Yun
    • Journal of the Korea Society for Simulation
    • /
    • v.31 no.3
    • /
    • pp.35-44
    • /
    • 2022
  • In this study, we analyze and forecast quantum computer technology trends. Previous research has been mainly focused on application fields centered on technology for quantum computer technology trends analysis. Therefore, this paper analyzes important quantum computer technologies and performs future signal detection and prediction, for a more market driven technical analysis and prediction. As analyzing words used in news articles to identify rapidly changing market changes and public interest. This paper extends conference presentation of Cha & Chang (2022). The research is conducted by collecting domestic news articles from 2019 to 2021. First, we organize the main keywords through text mining. Next, we explore future quantum computer technologies through analysis of Term Frequency - Inverse Document Frequency(TF-IDF), Key Issue Map(KIM), and Key Emergence Map (KEM). Finally, the relationship between future technologies and supply and demand is identified through random forests, decision trees, and correlation analysis. As results of the study, the interest in artificial intelligence was the highest in frequency analysis, keyword diffusion and visibility analysis. In terms of cyber-security, the rate of mention in news articles is getting overwhelmingly higher than that of other technologies. Quantum communication, resistant cryptography, and augmented reality also showed a high rate of increase in interest. These results show that the expectation is high for applying trend technology in the market. The results of this study can be applied to identifying areas of interest in the quantum computer market and establishing a response system related to technology investment.

Exploration of the Factors Determining Satisfaction in First Job of Graduates from Engineering College by Decision Tree Analysis (의사결정나무분석에 의한 공과대학 졸업생의 첫 일자리 만족도 결정요인 탐색)

  • Lee, Jiyeon;Lee, Yeongju
    • Journal of Engineering Education Research
    • /
    • v.24 no.1
    • /
    • pp.15-23
    • /
    • 2021
  • The first job of university graduates is the beginning of career development, and it has a great influence on a personal life in the transition process of a labor market later. The study has compared and analyzed the effects of major variables (whether or not one participates in the career and employment programs, the satisfaction in the education infrastructure and curriculum) related to university education that determine the satisfaction in the first job of graduates from the entire university of 4-year general courses and the engineering college with experiences of having the first job. Through this, it is meaningful to make it possible for the design of university education related to career and employment tailored to the engineering college. The results of 2017 Graduate Occupational Mobility Survey were used as the data for analysis, which was analyzed by the decision tree analysis. As it was found that the most important factor determining the satisfaction in the first job was student welfare facilities for the entire graduates among education infrastructure and was major curriculum and its content among education curriculum for the graduates from the engineering college, it was analyzed that factors related to majors were more important compared to other majors in the engineering college. The customized major curriculum and content should be considered as a priority, taking into account of the demand of the industry for the successful settlement of graduates from the engineering college in a labor market.

A study on the prediction of korean NPL market return (한국 NPL시장 수익률 예측에 관한 연구)

  • Lee, Hyeon Su;Jeong, Seung Hwan;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.123-139
    • /
    • 2019
  • The Korean NPL market was formed by the government and foreign capital shortly after the 1997 IMF crisis. However, this market is short-lived, as the bad debt has started to increase after the global financial crisis in 2009 due to the real economic recession. NPL has become a major investment in the market in recent years when the domestic capital market's investment capital began to enter the NPL market in earnest. Although the domestic NPL market has received considerable attention due to the overheating of the NPL market in recent years, research on the NPL market has been abrupt since the history of capital market investment in the domestic NPL market is short. In addition, decision-making through more scientific and systematic analysis is required due to the decline in profitability and the price fluctuation due to the fluctuation of the real estate business. In this study, we propose a prediction model that can determine the achievement of the benchmark yield by using the NPL market related data in accordance with the market demand. In order to build the model, we used Korean NPL data from December 2013 to December 2017 for about 4 years. The total number of things data was 2291. As independent variables, only the variables related to the dependent variable were selected for the 11 variables that indicate the characteristics of the real estate. In order to select the variables, one to one t-test and logistic regression stepwise and decision tree were performed. Seven independent variables (purchase year, SPC (Special Purpose Company), municipality, appraisal value, purchase cost, OPB (Outstanding Principle Balance), HP (Holding Period)). The dependent variable is a bivariate variable that indicates whether the benchmark rate is reached. This is because the accuracy of the model predicting the binomial variables is higher than the model predicting the continuous variables, and the accuracy of these models is directly related to the effectiveness of the model. In addition, in the case of a special purpose company, whether or not to purchase the property is the main concern. Therefore, whether or not to achieve a certain level of return is enough to make a decision. For the dependent variable, we constructed and compared the predictive model by calculating the dependent variable by adjusting the numerical value to ascertain whether 12%, which is the standard rate of return used in the industry, is a meaningful reference value. As a result, it was found that the hit ratio average of the predictive model constructed using the dependent variable calculated by the 12% standard rate of return was the best at 64.60%. In order to propose an optimal prediction model based on the determined dependent variables and 7 independent variables, we construct a prediction model by applying the five methodologies of discriminant analysis, logistic regression analysis, decision tree, artificial neural network, and genetic algorithm linear model we tried to compare them. To do this, 10 sets of training data and testing data were extracted using 10 fold validation method. After building the model using this data, the hit ratio of each set was averaged and the performance was compared. As a result, the hit ratio average of prediction models constructed by using discriminant analysis, logistic regression model, decision tree, artificial neural network, and genetic algorithm linear model were 64.40%, 65.12%, 63.54%, 67.40%, and 60.51%, respectively. It was confirmed that the model using the artificial neural network is the best. Through this study, it is proved that it is effective to utilize 7 independent variables and artificial neural network prediction model in the future NPL market. The proposed model predicts that the 12% return of new things will be achieved beforehand, which will help the special purpose companies make investment decisions. Furthermore, we anticipate that the NPL market will be liquidated as the transaction proceeds at an appropriate price.

The big data method for flash flood warning (돌발홍수 예보를 위한 빅데이터 분석방법)

  • Park, Dain;Yoon, Sanghoo
    • Journal of Digital Convergence
    • /
    • v.15 no.11
    • /
    • pp.245-250
    • /
    • 2017
  • Flash floods is defined as the flooding of intense rainfall over a relatively small area that flows through river and valley rapidly in short time with no advance warning. So that it can cause damage property and casuality. This study is to establish the flash-flood warning system using 38 accident data, reported from the National Disaster Information Center and Land Surface Model(TOPLATS) between 2009 and 2012. Three variables were used in the Land Surface Model: precipitation, soil moisture, and surface runoff. The three variables of 6 hours preceding flash flood were reduced to 3 factors through factor analysis. Decision tree, random forest, Naive Bayes, Support Vector Machine, and logistic regression model are considered as big data methods. The prediction performance was evaluated by comparison of Accuracy, Kappa, TP Rate, FP Rate and F-Measure. The best method was suggested based on reproducibility evaluation at the each points of flash flood occurrence and predicted count versus actual count using 4 years data.

Medical Services Specialization strategies of the Regional Public Hospital through Customer Segmentation (고객세분화를 통한 지방의료원의 의료서비스 전문화 전략)

  • Lee, Jin-Woo
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.7
    • /
    • pp.4641-4650
    • /
    • 2015
  • This study aims to further strengthen the medical expertise to offer specialized medical care specialization strategies to gain a competitive edge through the customer segmentation of the Regional Public Hospital. Investigation period was selected to study the inpatients 26,658 people January to December 2013. The method of analysis are Cluster analysis and Decision Tree Analysis. In conclusion, female, age over 60, and diseases in musculoskeletal system and connective tissue were commonly selected as identifiers of the target market of Regional Public Hospital. Customers in this target market are loyal to specialized medical service and keeping continuous relationship with these customers through communication and monitoring of results of provided medical service would be important because the effect of word of mouth propagated to other group of customers having equivalent scale of consumption is expected. And the concentration of the scope of medical service of Regional Public Hospital and the collaboration and mutual reliance of medical service under the strategic alliance with other institutions and private hospitals are also needed.

Study on Soil Moisture Predictability using Machine Learning Technique (머신러닝 기법을 활용한 토양수분 예측 가능성 연구)

  • Jo, Bongjun;Choi, Wanmin;Kim, Youngdae;kim, Kisung;Kim, Jonggun
    • Proceedings of the Korea Water Resources Association Conference
    • /
    • 2020.06a
    • /
    • pp.248-248
    • /
    • 2020
  • 토양수분은 증발산, 유출, 침투 등 물수지 요소들과 밀접한 연관이 있는 주요한 변수 중에 하나이다. 토양수분의 정도는 토양의 특성, 토지이용 형태, 기상 상태 등에 따라 공간적으로 상이하며, 특히 기상 상태에 따라 시간적 변동성을 보이고 있다. 기존 토양수분 측정은 토양시료 채취를 통한 실내 실험 측정과 측정 장비를 통한 현장 조사 방법이 있으나 시간적, 경제적 한계점이 있으며, 원격탐사 기법은 공간적으로 넓은 범위를 포함하지만 시간 해상도가 낮은 단점이 있다. 또한, 모델링을 통한 토양수분 예측 기술은 전문적인 지식이 요구되며, 복잡한 입력자료의 구축이 요구된다. 최근 머신러닝 기법은 수많은 자료 학습을 통해 사용자가 원하는 출력값을 도출하는데 널리 활용되고 있다. 이에 본 연구에서는 토양수분과 연관된 다양한 기상 인자들(강수량, 풍속, 습도 등)을 활용하여 머신러닝기법의 반복학습을 통한 토양수분의 예측 가능성을 분석하고자 한다. 이를 위해 시공간적으로 토양수분 실측 자료가 잘 구축되어 있는 청미천과 설마천 유역을 대상으로 머신러닝 기법을 적용하였다. 두 대상지에서 2008년~2012년 수문자료를 확보하였으며, 기상자료는 기상자료개방포털과 WAMIS를 통해 자료를 확보하였다. 토양수분 자료와 기상자료를 머신러닝 알고리즘을 통해 학습하고 2012년 기상 자료를 바탕으로 토양수분을 예측하였다. 사용되는 머신러닝 기법은 의사결정 나무(Decision Tree), 신경망(Multi Layer Perceptron, MLP), K-최근접 이웃(K-Nearest Neighbors, KNN), 서포트 벡터 머신(Support Vector Machine, SVM), 랜덤 포레스트(Random Forest), 그래디언트 부스팅 (Gradient Boosting)이다. 토양수분과 기상인자 간의 상관관계를 분석하기 위해 히트맵(Heat Map)을 이용하였다. 히트맵 분석 결과 토양수분의 시간적 변동은 다양한 기상 자료 중 강수량과 상대습도가 가장 큰 영향력을 보여주었다. 또한 다양한 기상 인자 기반 머신러닝 기법 적용 결과에서는 두 지역 모두 신경망(MLP) 기법을 제외한 모든 기법이 전반적으로 실측값과 유사한 형태를 보였으며 비교 그래프에서도 실측값과 예측 값이 유사한 추세를 나타냈다. 따라서 상관관계있는 과거 기상자료를 통해 머신러닝 기법 기반 토양수분의 시간적 변동 예측이 가능할 것으로 판단된다.

  • PDF

Comparison of factors affecting residential and residential environment satisfaction by region using the CART algorithm (CART 알고리즘을 이용한 지역별 주택 및 주거환경 만족도 영향 요인의 비교)

  • Jung su eun
    • The Journal of the Convergence on Culture Technology
    • /
    • v.9 no.4
    • /
    • pp.707-715
    • /
    • 2023
  • This study utilized CART algorithm, a decision tree analysis method, to comparatively analyze factors affecting housing and residential environment satisfaction by region using data from Ministry of Land, Infrastructure and Transport's housing survey in 2020. First, in terms of residential environment satisfaction, accessibility to medical facilities and school district showed higher importance in metropolitan cities and areas compared to other regions, whereas safety from accident showed the opposite trait, showing difference between region. Second, housing characteristics were important in housing satisfaction, indoor environment level satisfaction and indoor safety and hygiene being important in almost all regions, while residential environment characteristics were more important in residential environment satisfaction and influencing factors were relatively evenly distributed. In order to generalize these regional characteristics, research using time series data needs to be conducted later.

A Study on the Fraud Detection for Electronic Prepayment using Machine Learning (머신러닝을 이용한 선불전자지급수단의 이상금융거래 탐지 연구)

  • Choi, Byung-Ho;Cho, Nam-Wook
    • The Journal of Society for e-Business Studies
    • /
    • v.27 no.2
    • /
    • pp.65-77
    • /
    • 2022
  • Due to the recent development in electronic financial services, transactions of electronic prepayment are rapidly growing, leading to growing fraud attempts. This paper proposes a methodology that can effectively detect fraud transactions in electronic prepayment by machine learning algorithms, including support vector machines, decision trees, and artificial neural networks. Actual transaction data of electronic prepayment services were collected and preprocessed to extract the most relevant variables from raw data. Two different approaches were explored in the paper. One is a transaction-based approach, and the other is a user ID-based approach. For the transaction-based approach, the first model is primarily based on raw data features, while the second model uses extra features in addition to the first model. The user ID-based approach also used feature engineering to extract and transform the most relevant features. Overall, the user ID-based approach showed a better performance than the transaction-based approach, where the artificial neural networks showed the best performance. The proposed method could be used to reduce the damage caused by financial accidents by detecting and blocking fraud attempts.