• Title/Summary/Keyword: 평가 데이터셋

Search Result 463, Processing Time 0.019 seconds

A Study on the Establishment of Comparison System between the Statement of Military Reports and Related Laws (군(軍) 보고서 등장 문장과 관련 법령 간 비교 시스템 구축 방안 연구)

  • Jung, Jiin;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.109-125
    • /
    • 2020
  • The Ministry of National Defense is pushing for the Defense Acquisition Program to build strong defense capabilities, and it spends more than 10 trillion won annually on defense improvement. As the Defense Acquisition Program is directly related to the security of the nation as well as the lives and property of the people, it must be carried out very transparently and efficiently by experts. However, the excessive diversification of laws and regulations related to the Defense Acquisition Program has made it challenging for many working-level officials to carry out the Defense Acquisition Program smoothly. It is even known that many people realize that there are related regulations that they were unaware of until they push ahead with their work. In addition, the statutory statements related to the Defense Acquisition Program have the tendency to cause serious issues even if only a single expression is wrong within the sentence. Despite this, efforts to establish a sentence comparison system to correct this issue in real time have been minimal. Therefore, this paper tries to propose a "Comparison System between the Statement of Military Reports and Related Laws" implementation plan that uses the Siamese Network-based artificial neural network, a model in the field of natural language processing (NLP), to observe the similarity between sentences that are likely to appear in the Defense Acquisition Program related documents and those from related statutory provisions to determine and classify the risk of illegality and to make users aware of the consequences. Various artificial neural network models (Bi-LSTM, Self-Attention, D_Bi-LSTM) were studied using 3,442 pairs of "Original Sentence"(described in actual statutes) and "Edited Sentence"(edited sentences derived from "Original Sentence"). Among many Defense Acquisition Program related statutes, DEFENSE ACQUISITION PROGRAM ACT, ENFORCEMENT RULE OF THE DEFENSE ACQUISITION PROGRAM ACT, and ENFORCEMENT DECREE OF THE DEFENSE ACQUISITION PROGRAM ACT were selected. Furthermore, "Original Sentence" has the 83 provisions that actually appear in the Act. "Original Sentence" has the main 83 clauses most accessible to working-level officials in their work. "Edited Sentence" is comprised of 30 to 50 similar sentences that are likely to appear modified in the county report for each clause("Original Sentence"). During the creation of the edited sentences, the original sentences were modified using 12 certain rules, and these sentences were produced in proportion to the number of such rules, as it was the case for the original sentences. After conducting 1 : 1 sentence similarity performance evaluation experiments, it was possible to classify each "Edited Sentence" as legal or illegal with considerable accuracy. In addition, the "Edited Sentence" dataset used to train the neural network models contains a variety of actual statutory statements("Original Sentence"), which are characterized by the 12 rules. On the other hand, the models are not able to effectively classify other sentences, which appear in actual military reports, when only the "Original Sentence" and "Edited Sentence" dataset have been fed to them. The dataset is not ample enough for the model to recognize other incoming new sentences. Hence, the performance of the model was reassessed by writing an additional 120 new sentences that have better resemblance to those in the actual military report and still have association with the original sentences. Thereafter, we were able to check that the models' performances surpassed a certain level even when they were trained merely with "Original Sentence" and "Edited Sentence" data. If sufficient model learning is achieved through the improvement and expansion of the full set of learning data with the addition of the actual report appearance sentences, the models will be able to better classify other sentences coming from military reports as legal or illegal. Based on the experimental results, this study confirms the possibility and value of building "Real-Time Automated Comparison System Between Military Documents and Related Laws". The research conducted in this experiment can verify which specific clause, of several that appear in related law clause is most similar to the sentence that appears in the Defense Acquisition Program-related military reports. This helps determine whether the contents in the military report sentences are at the risk of illegality when they are compared with those in the law clauses.

Application of Support Vector Regression for Improving the Performance of the Emotion Prediction Model (감정예측모형의 성과개선을 위한 Support Vector Regression 응용)

  • Kim, Seongjin;Ryoo, Eunchung;Jung, Min Kyu;Kim, Jae Kyeong;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.185-202
    • /
    • 2012
  • .Since the value of information has been realized in the information society, the usage and collection of information has become important. A facial expression that contains thousands of information as an artistic painting can be described in thousands of words. Followed by the idea, there has recently been a number of attempts to provide customers and companies with an intelligent service, which enables the perception of human emotions through one's facial expressions. For example, MIT Media Lab, the leading organization in this research area, has developed the human emotion prediction model, and has applied their studies to the commercial business. In the academic area, a number of the conventional methods such as Multiple Regression Analysis (MRA) or Artificial Neural Networks (ANN) have been applied to predict human emotion in prior studies. However, MRA is generally criticized because of its low prediction accuracy. This is inevitable since MRA can only explain the linear relationship between the dependent variables and the independent variable. To mitigate the limitations of MRA, some studies like Jung and Kim (2012) have used ANN as the alternative, and they reported that ANN generated more accurate prediction than the statistical methods like MRA. However, it has also been criticized due to over fitting and the difficulty of the network design (e.g. setting the number of the layers and the number of the nodes in the hidden layers). Under this background, we propose a novel model using Support Vector Regression (SVR) in order to increase the prediction accuracy. SVR is an extensive version of Support Vector Machine (SVM) designated to solve the regression problems. The model produced by SVR only depends on a subset of the training data, because the cost function for building the model ignores any training data that is close (within a threshold ${\varepsilon}$) to the model prediction. Using SVR, we tried to build a model that can measure the level of arousal and valence from the facial features. To validate the usefulness of the proposed model, we collected the data of facial reactions when providing appropriate visual stimulating contents, and extracted the features from the data. Next, the steps of the preprocessing were taken to choose statistically significant variables. In total, 297 cases were used for the experiment. As the comparative models, we also applied MRA and ANN to the same data set. For SVR, we adopted '${\varepsilon}$-insensitive loss function', and 'grid search' technique to find the optimal values of the parameters like C, d, ${\sigma}^2$, and ${\varepsilon}$. In the case of ANN, we adopted a standard three-layer backpropagation network, which has a single hidden layer. The learning rate and momentum rate of ANN were set to 10%, and we used sigmoid function as the transfer function of hidden and output nodes. We performed the experiments repeatedly by varying the number of nodes in the hidden layer to n/2, n, 3n/2, and 2n, where n is the number of the input variables. The stopping condition for ANN was set to 50,000 learning events. And, we used MAE (Mean Absolute Error) as the measure for performance comparison. From the experiment, we found that SVR achieved the highest prediction accuracy for the hold-out data set compared to MRA and ANN. Regardless of the target variables (the level of arousal, or the level of positive / negative valence), SVR showed the best performance for the hold-out data set. ANN also outperformed MRA, however, it showed the considerably lower prediction accuracy than SVR for both target variables. The findings of our research are expected to be useful to the researchers or practitioners who are willing to build the models for recognizing human emotions.

A Topic Modeling-based Recommender System Considering Changes in User Preferences (고객 선호 변화를 고려한 토픽 모델링 기반 추천 시스템)

  • Kang, So Young;Kim, Jae Kyeong;Choi, Il Young;Kang, Chang Dong
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.43-56
    • /
    • 2020
  • Recommender systems help users make the best choice among various options. Especially, recommender systems play important roles in internet sites as digital information is generated innumerable every second. Many studies on recommender systems have focused on an accurate recommendation. However, there are some problems to overcome in order for the recommendation system to be commercially successful. First, there is a lack of transparency in the recommender system. That is, users cannot know why products are recommended. Second, the recommender system cannot immediately reflect changes in user preferences. That is, although the preference of the user's product changes over time, the recommender system must rebuild the model to reflect the user's preference. Therefore, in this study, we proposed a recommendation methodology using topic modeling and sequential association rule mining to solve these problems from review data. Product reviews provide useful information for recommendations because product reviews include not only rating of the product but also various contents such as user experiences and emotional state. So, reviews imply user preference for the product. So, topic modeling is useful for explaining why items are recommended to users. In addition, sequential association rule mining is useful for identifying changes in user preferences. The proposed methodology is largely divided into two phases. The first phase is to create user profile based on topic modeling. After extracting topics from user reviews on products, user profile on topics is created. The second phase is to recommend products using sequential rules that appear in buying behaviors of users as time passes. The buying behaviors are derived from a change in the topic of each user. A collaborative filtering-based recommendation system was developed as a benchmark system, and we compared the performance of the proposed methodology with that of the collaborative filtering-based recommendation system using Amazon's review dataset. As evaluation metrics, accuracy, recall, precision, and F1 were used. For topic modeling, collapsed Gibbs sampling was conducted. And we extracted 15 topics. Looking at the main topics, topic 1, top 3, topic 4, topic 7, topic 9, topic 13, topic 14 are related to "comedy shows", "high-teen drama series", "crime investigation drama", "horror theme", "British drama", "medical drama", "science fiction drama", respectively. As a result of comparative analysis, the proposed methodology outperformed the collaborative filtering-based recommendation system. From the results, we found that the time just prior to the recommendation was very important for inferring changes in user preference. Therefore, the proposed methodology not only can secure the transparency of the recommender system but also can reflect the user's preferences that change over time. However, the proposed methodology has some limitations. The proposed methodology cannot recommend product elaborately if the number of products included in the topic is large. In addition, the number of sequential patterns is small because the number of topics is too small. Therefore, future research needs to consider these limitations.