• Title/Summary/Keyword: Rule based Systems

Search Result 1,053, Processing Time 0.027 seconds

Sentiment Analysis of Movie Review Using Integrated CNN-LSTM Mode (CNN-LSTM 조합모델을 이용한 영화리뷰 감성분석)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.141-154
    • /
    • 2019
  • Rapid growth of internet technology and social media is progressing. Data mining technology has evolved to enable unstructured document representations in a variety of applications. Sentiment analysis is an important technology that can distinguish poor or high-quality content through text data of products, and it has proliferated during text mining. Sentiment analysis mainly analyzes people's opinions in text data by assigning predefined data categories as positive and negative. This has been studied in various directions in terms of accuracy from simple rule-based to dictionary-based approaches using predefined labels. In fact, sentiment analysis is one of the most active researches in natural language processing and is widely studied in text mining. When real online reviews aren't available for others, it's not only easy to openly collect information, but it also affects your business. In marketing, real-world information from customers is gathered on websites, not surveys. Depending on whether the website's posts are positive or negative, the customer response is reflected in the sales and tries to identify the information. However, many reviews on a website are not always good, and difficult to identify. The earlier studies in this research area used the reviews data of the Amazon.com shopping mal, but the research data used in the recent studies uses the data for stock market trends, blogs, news articles, weather forecasts, IMDB, and facebook etc. However, the lack of accuracy is recognized because sentiment calculations are changed according to the subject, paragraph, sentiment lexicon direction, and sentence strength. This study aims to classify the polarity analysis of sentiment analysis into positive and negative categories and increase the prediction accuracy of the polarity analysis using the pretrained IMDB review data set. First, the text classification algorithm related to sentiment analysis adopts the popular machine learning algorithms such as NB (naive bayes), SVM (support vector machines), XGboost, RF (random forests), and Gradient Boost as comparative models. Second, deep learning has demonstrated discriminative features that can extract complex features of data. Representative algorithms are CNN (convolution neural networks), RNN (recurrent neural networks), LSTM (long-short term memory). CNN can be used similarly to BoW when processing a sentence in vector format, but does not consider sequential data attributes. RNN can handle well in order because it takes into account the time information of the data, but there is a long-term dependency on memory. To solve the problem of long-term dependence, LSTM is used. For the comparison, CNN and LSTM were chosen as simple deep learning models. In addition to classical machine learning algorithms, CNN, LSTM, and the integrated models were analyzed. Although there are many parameters for the algorithms, we examined the relationship between numerical value and precision to find the optimal combination. And, we tried to figure out how the models work well for sentiment analysis and how these models work. This study proposes integrated CNN and LSTM algorithms to extract the positive and negative features of text analysis. The reasons for mixing these two algorithms are as follows. CNN can extract features for the classification automatically by applying convolution layer and massively parallel processing. LSTM is not capable of highly parallel processing. Like faucets, the LSTM has input, output, and forget gates that can be moved and controlled at a desired time. These gates have the advantage of placing memory blocks on hidden nodes. The memory block of the LSTM may not store all the data, but it can solve the CNN's long-term dependency problem. Furthermore, when LSTM is used in CNN's pooling layer, it has an end-to-end structure, so that spatial and temporal features can be designed simultaneously. In combination with CNN-LSTM, 90.33% accuracy was measured. This is slower than CNN, but faster than LSTM. The presented model was more accurate than other models. In addition, each word embedding layer can be improved when training the kernel step by step. CNN-LSTM can improve the weakness of each model, and there is an advantage of improving the learning by layer using the end-to-end structure of LSTM. Based on these reasons, this study tries to enhance the classification accuracy of movie reviews using the integrated CNN-LSTM model.

Comparison of Deep Learning Frameworks: About Theano, Tensorflow, and Cognitive Toolkit (딥러닝 프레임워크의 비교: 티아노, 텐서플로, CNTK를 중심으로)

  • Chung, Yeojin;Ahn, SungMahn;Yang, Jiheon;Lee, Jaejoon
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.1-17
    • /
    • 2017
  • The deep learning framework is software designed to help develop deep learning models. Some of its important functions include "automatic differentiation" and "utilization of GPU". The list of popular deep learning framework includes Caffe (BVLC) and Theano (University of Montreal). And recently, Microsoft's deep learning framework, Microsoft Cognitive Toolkit, was released as open-source license, following Google's Tensorflow a year earlier. The early deep learning frameworks have been developed mainly for research at universities. Beginning with the inception of Tensorflow, however, it seems that companies such as Microsoft and Facebook have started to join the competition of framework development. Given the trend, Google and other companies are expected to continue investing in the deep learning framework to bring forward the initiative in the artificial intelligence business. From this point of view, we think it is a good time to compare some of deep learning frameworks. So we compare three deep learning frameworks which can be used as a Python library. Those are Google's Tensorflow, Microsoft's CNTK, and Theano which is sort of a predecessor of the preceding two. The most common and important function of deep learning frameworks is the ability to perform automatic differentiation. Basically all the mathematical expressions of deep learning models can be represented as computational graphs, which consist of nodes and edges. Partial derivatives on each edge of a computational graph can then be obtained. With the partial derivatives, we can let software compute differentiation of any node with respect to any variable by utilizing chain rule of Calculus. First of all, the convenience of coding is in the order of CNTK, Tensorflow, and Theano. The criterion is simply based on the lengths of the codes and the learning curve and the ease of coding are not the main concern. According to the criteria, Theano was the most difficult to implement with, and CNTK and Tensorflow were somewhat easier. With Tensorflow, we need to define weight variables and biases explicitly. The reason that CNTK and Tensorflow are easier to implement with is that those frameworks provide us with more abstraction than Theano. We, however, need to mention that low-level coding is not always bad. It gives us flexibility of coding. With the low-level coding such as in Theano, we can implement and test any new deep learning models or any new search methods that we can think of. The assessment of the execution speed of each framework is that there is not meaningful difference. According to the experiment, execution speeds of Theano and Tensorflow are very similar, although the experiment was limited to a CNN model. In the case of CNTK, the experimental environment was not maintained as the same. The code written in CNTK has to be run in PC environment without GPU where codes execute as much as 50 times slower than with GPU. But we concluded that the difference of execution speed was within the range of variation caused by the different hardware setup. In this study, we compared three types of deep learning framework: Theano, Tensorflow, and CNTK. According to Wikipedia, there are 12 available deep learning frameworks. And 15 different attributes differentiate each framework. Some of the important attributes would include interface language (Python, C ++, Java, etc.) and the availability of libraries on various deep learning models such as CNN, RNN, DBN, and etc. And if a user implements a large scale deep learning model, it will also be important to support multiple GPU or multiple servers. Also, if you are learning the deep learning model, it would also be important if there are enough examples and references.

Building a Korean Sentiment Lexicon Using Collective Intelligence (집단지성을 이용한 한글 감성어 사전 구축)

  • An, Jungkook;Kim, Hee-Woong
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.2
    • /
    • pp.49-67
    • /
    • 2015
  • Recently, emerging the notion of big data and social media has led us to enter data's big bang. Social networking services are widely used by people around the world, and they have become a part of major communication tools for all ages. Over the last decade, as online social networking sites become increasingly popular, companies tend to focus on advanced social media analysis for their marketing strategies. In addition to social media analysis, companies are mainly concerned about propagating of negative opinions on social networking sites such as Facebook and Twitter, as well as e-commerce sites. The effect of online word of mouth (WOM) such as product rating, product review, and product recommendations is very influential, and negative opinions have significant impact on product sales. This trend has increased researchers' attention to a natural language processing, such as a sentiment analysis. A sentiment analysis, also refers to as an opinion mining, is a process of identifying the polarity of subjective information and has been applied to various research and practical fields. However, there are obstacles lies when Korean language (Hangul) is used in a natural language processing because it is an agglutinative language with rich morphology pose problems. Therefore, there is a lack of Korean natural language processing resources such as a sentiment lexicon, and this has resulted in significant limitations for researchers and practitioners who are considering sentiment analysis. Our study builds a Korean sentiment lexicon with collective intelligence, and provides API (Application Programming Interface) service to open and share a sentiment lexicon data with the public (www.openhangul.com). For the pre-processing, we have created a Korean lexicon database with over 517,178 words and classified them into sentiment and non-sentiment words. In order to classify them, we first identified stop words which often quite likely to play a negative role in sentiment analysis and excluded them from our sentiment scoring. In general, sentiment words are nouns, adjectives, verbs, adverbs as they have sentimental expressions such as positive, neutral, and negative. On the other hands, non-sentiment words are interjection, determiner, numeral, postposition, etc. as they generally have no sentimental expressions. To build a reliable sentiment lexicon, we have adopted a concept of collective intelligence as a model for crowdsourcing. In addition, a concept of folksonomy has been implemented in the process of taxonomy to help collective intelligence. In order to make up for an inherent weakness of folksonomy, we have adopted a majority rule by building a voting system. Participants, as voters were offered three voting options to choose from positivity, negativity, and neutrality, and the voting have been conducted on one of the largest social networking sites for college students in Korea. More than 35,000 votes have been made by college students in Korea, and we keep this voting system open by maintaining the project as a perpetual study. Besides, any change in the sentiment score of words can be an important observation because it enables us to keep track of temporal changes in Korean language as a natural language. Lastly, our study offers a RESTful, JSON based API service through a web platform to make easier support for users such as researchers, companies, and developers. Finally, our study makes important contributions to both research and practice. In terms of research, our Korean sentiment lexicon plays an important role as a resource for Korean natural language processing. In terms of practice, practitioners such as managers and marketers can implement sentiment analysis effectively by using Korean sentiment lexicon we built. Moreover, our study sheds new light on the value of folksonomy by combining collective intelligence, and we also expect to give a new direction and a new start to the development of Korean natural language processing.

The Behavior Analysis of Exhibition Visitors using Data Mining Technique at the KIDS & EDU EXPO for Children (유아교육 박람회에서 데이터마이닝 기법을 이용한 전시 관람 행동 패턴 분석)

  • Jung, Min-Kyu;Kim, Hyea-Kyeong;Choi, Il-Young;Lee, Kyoung-Jun;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.2
    • /
    • pp.77-96
    • /
    • 2011
  • An exhibition is defined as market events for specific duration to present exhibitors' main products to business or private visitors, and it plays a key role as effective marketing channels. As the importance of exhibition is getting more and more, domestic exhibition industry has achieved such a great quantitative growth. But, In contrast to the quantitative growth of domestic exhibition industry, the qualitative growth of Exhibition has not achieved competent growth. In order to improve the quality of exhibition, we need to understand the preference or behavior characteristics of visitors and to increase the level of visitors' attention and satisfaction through the understanding of visitors. So, in this paper, we used the observation survey method which is a kind of field research to understand visitors and collect the real data for the analysis of behavior pattern. And this research proposed the following methodology framework consisting of three steps. First step is to select a suitable exhibition to apply for our method. Second step is to implement the observation survey method. And we collect the real data for further analysis. In this paper, we conducted the observation survey method to obtain the real data of the KIDS & EDU EXPO for Children in SETEC. Our methodology was conducted on 160 visitors and 78 booths from November 4th to 6th in 2010. And, the last step is to analyze the record data through observation. In this step, we analyze the feature of exhibition using Demographic Characteristics collected by observation survey method at first. And then we analyze the individual booth features by the records of visited booth. Through the analysis of individual booth features, we can figure out what kind of events attract the attention of visitors and what kind of marketing activities affect the behavior pattern of visitors. But, since previous research considered only individual features influenced by exhibition, the research about the correlation among features is not performed much. So, in this research, additional analysis is carried out to supplement the existing research with data mining techniques. And we analyze the relation among booths using data mining techniques to know behavior patterns of visitors. Among data mining techniques, we make use of two data mining techniques, such as clustering analysis and ARM(Association Rule Mining) analysis. In clustering analysis, we use K-means algorithm to figure out the correlation among booths. Through data mining techniques, we figure out that there are two important features to affect visitors' behavior patterns in exhibition. One is the geographical features of booths. The other is the exhibit contents of booths. Those features are considered when the organizer of exhibition plans next exhibition. Therefore, the results of our analysis are expected to provide guideline to understanding visitors and some valuable insights for the exhibition from the earlier phases of exhibition planning. Also, this research would be a good way to increase the quality of visitor satisfaction. Visitors' movement paths, booth location, and distances between each booth are considered to plan next exhibition in advance. This research was conducted at the KIDS & EDU EXPO for Children in SETEC(Seoul Trade Exhibition & Convention), but it has some constraints to be applied directly to other exhibitions. Also, the results were derived from a limited number of data samples. In order to obtain more accurate and reliable results, it is necessary to conduct more experiments based on larger data samples and exhibitions on a variety of genres.

The Influence of the Restrictions in Chinese economic growth on Korean commercial environment (중국 경제성장의 제약요인이 한국 통상환경에 미치는 영향)

  • Shong, Il-Ho;Lee, Gye-Young
    • International Commerce and Information Review
    • /
    • v.15 no.4
    • /
    • pp.457-479
    • /
    • 2013
  • Through a Chinese rise, Chinese dream is actualizing as the world's great power. According to outlook of World Bank and IMF, Around 2030 China will be a great power bigger than America's economic power. The rise of China will give a huge impact to the whole world. China expands her influence through a global manufacturing base and a global market. To actualize 'Peaceful Rise' Strategy, China has many constraints. Chinese society is facing many difficult social problem due to side effects of a rapid development. Such as the spread of corruption, the severity of wealth gap, environmental degradation and energy shortage. Internationally there are containment from hegemon so-called 'China threat' dispute, Taiwan issue and territorial disputes. Western countries are hostile to China for two reasons. Based on expectations, one is China's socialist system and the other is the rising China which will compete for supremacy with Europe and America. Recent emergence of Chinese nationalism and the containment of the neighboring countries are also serious limiting factors. Domestically they have the rampant corruption in the bureaucracy, weakened capacity of Communist rule, wealth disparity due to the discriminatory economic development strategy, seriousness of rural problem, social instability, lack of social security systems and the development gap between the eastern coastal areas and western inland areas, ethnic minorities problems, the constraint of sustainable development issues due to lack of resources, environmental pollution and energy constraints. Like the former Soviet Union, China may face a dismantlement. After the rise, China may encounter possibilities of a war between great powers or a collapse of Chinese society caused by deepening internal conflict. Serious economic polarization would make peasants and urban workers, who are social vulnerable people, to turn their back to communist party and threaten the justification and the appropriateness of the ruling communist party. Chinese government will think internal system security threat is more formidable risk factor than a system security threat from the hegemon. The decline of great country comes from internal reasons rather than external reasons. To achieve peaceful rise, unification with Taiwan is an essential prerequisite. Taiwan issues are complex problems which equipped with international and domestic factors. Lack of energy resources, environmental pollution in China will bring economic crisis to Korean enterprises. Important influence to Korean economy will be a changeover of the method in economic development. It will turn the balance of investment and consumption, GDP-centered growth to consumption and environment-centered growth. Services industries including finance, environment, culture, education, health care and social welfare will grow. Change in China's growth model will give a great challenge upon the intermediate goods industry in Korea. Korea should reduce the portion of machinery, automotive, semiconductor, steel and chemical-centered export industry to China, and should increase the proportion of the service industry.

  • PDF

Optimal Monetary Policy System for Both Macroeconomics and Financial Stability (거시경제와 금융안정을 종합 고려한 최적 통화정책체계 연구)

  • Joonyoung Hur;Hyoung Seok Oh
    • KDI Journal of Economic Policy
    • /
    • v.46 no.1
    • /
    • pp.91-129
    • /
    • 2024
  • The Bank of Korea, through a legal amendment in 2011 following the financial crisis, was entrusted with the additional responsibility of financial stability beyond its existing mandate of price stability. Since then, concerns have been raised about the prolonged increase in household debt compared to income conditions, which could constrain consumption and growth and increase the possibility of a crisis in the event of negative economic shocks. The current accumulation of financial imbalances suggests a critical period for the government and central bank to be more vigilant, ensuring it does not impede the stable flow of our financial and economic systems. This study examines the applicability of the Integrated Inflation Targeting (IIT) framework proposed by the Bank for International Settlements (BIS) for macro-financial stability in promoting long-term economic stability. Using VAR models, the study reveals a clear increase in risk appetite following interest rate cuts after the financial crisis, leading to a rise in household debt. Additionally, analyzing the central bank's conduct of monetary policy from 2000 to 2021 through DSGE models indicates that the Bank of Korea has operated with a form of IIT, considering both inflation and growth in its policy decisions, with some responsiveness to the increase in household debt. However, the estimation of a high interest rate smoothing coefficient suggests a cautious approach to interest rate adjustments. Furthermore, estimating the optimal interest rate rule to minimize the central bank's loss function reveals that a policy considering inflation, growth, and being mindful of household credit conditions is superior. It suggests that the policy of actively adjusting the benchmark interest rate in response to changes in economic conditions and being attentive to household credit situations when household debt is increasing rapidly compared to income conditions has been analyzed as a desirable policy approach. Based on these findings, we conclude that the integrated inflation targeting framework proposed by the BIS could be considered as an alternative policy system that supports the stable growth of the economy in the medium to long term.

An Overview of the Rationale of Monetary and Banking Intervention: The Role of the Central Bank in Money and Banking Revisited (화폐(貨幣)·금융개입(金融介入)의 이론적(理論的) 근거(根據)에 대한 고찰(考察) : 중앙은행(中央銀行)의 존립근거(存立根據)에 대한 개관(槪觀))

  • Jwa, Sung-hee
    • KDI Journal of Economic Policy
    • /
    • v.12 no.3
    • /
    • pp.71-94
    • /
    • 1990
  • This paper reviews the rationale of monetary and banking intervention by an outside authority, either the government or the central bank, and seeks to delineate clearly the optimal limits to the monetary and banking deregulation currently underway in Korea as well as on a global scale. Furthermore, this paper seeks to establish an objective and balanced view on the role of the central bank, especially in light of the current discussion on the restructuring of Korea's central bank, which has been severely contaminated by interest-group politics. The discussion begins with the recognition that the modern free banking school and the new monetary economics are becoming formidable challenges to the traditional role of the government or the central bank in the monetary and banking sector. The paper reviews six arguments that have traditionally been presented to support intervention: (1) the possibility of an over-issue of bank notes under free banking instead of central banking; (2) externalities in and the public good nature of the use of money; (3) economies of scale and natural monopoly in producing money; (4) the need for macro stabilization policy due to the instability of the real sector; (5) the external effects of bank failure due to the inherent instability of the existing banking system; and (6) protection for small banknote users and depositors. Based on an analysis of the above arguments, the paper speculates on the optimal role of the government or central bank in the monetary and banking system and the optimal degree of monetary and banking deregulation. By contrast to the arguments for free banking or laissez-faire monetary systems, which become fashionable in recent years, monopoly and intervention by the government or central bank in the outside money system can be both necessary and optimal. In this case, of course, an over-issue of fiat money may be possible due to political considerations, but this issue is beyond the scope of this paper. On the other hand, the issue of inside monies based on outside money could indeed be provided for optimally under market competition by private institutions. A competitive system in issuing inside monies would help realize, to the maxim urn extent possible, external economies generated by using a single outside money. According to this reasoning, free banking activities will prevail in the inside money system, while a government monopoly will prevail in the outside money system. This speculation, then, also implies that the monetary and banking deregulation currently underway should and most likely will be limited to the inside money system, which could be liberalized to the fullest degree. It is also implied that it will be impractical to deregulate the outside money system and to allow market competition to provide outside money, in accordance with the arguments of the free banking school and the new monetary economics. Furthermore, the role of the government or central bank in this new environment will not be significantly different from their current roles. As far as the supply of fiat money continues to be monopolized by the government, the control of the supply of base money and such related responsibilities as monetary policy (argument(4)) and the lender of the last resort (argument (5)) will naturally be assigned to the outside money supplier. However, a mechanism for controlling an over-issue of fiat money by a monopolistic supplier will definitely be called for (argument(1)). A monetary policy based on a certain policy rule could be one possibility. More importantly, the deregulation of the inside money system would further increase the systemic risk inherent in the current fractional banking system, while enhancing the efficiency of the system (argument (5)). In this context, the role of the lender of the last resort would again become an instrument of paramount importance in alleviating liquidity crises in the early stages, thereby disallowing the possibility of a widespread bank run. Similarly, prudential banking supervision would also help maintain the safety and soundness of the fully deregulated banking system. These functions would also help protect depositors from losses due to bank failures (argument (6)). Finally, these speculations suggest that government or central bank authorities have probably been too conservative on the issue of the deregulation of the financial system, beyond the caution necessary to preserve system safety. Rather, only the fullest deregulation of the inside money system seems to guarantee the maximum enjoyment of external economies in the single outside money system.

  • PDF

Term Mapping Methodology between Everyday Words and Legal Terms for Law Information Search System (법령정보 검색을 위한 생활용어와 법률용어 간의 대응관계 탐색 방법론)

  • Kim, Ji Hyun;Lee, Jong-Seo;Lee, Myungjin;Kim, Wooju;Hong, June Seok
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.137-152
    • /
    • 2012
  • In the generation of Web 2.0, as many users start to make lots of web contents called user created contents by themselves, the World Wide Web is overflowing by countless information. Therefore, it becomes the key to find out meaningful information among lots of resources. Nowadays, the information retrieval is the most important thing throughout the whole field and several types of search services are developed and widely used in various fields to retrieve information that user really wants. Especially, the legal information search is one of the indispensable services in order to provide people with their convenience through searching the law necessary to their present situation as a channel getting knowledge about it. The Office of Legislation in Korea provides the Korean Law Information portal service to search the law information such as legislation, administrative rule, and judicial precedent from 2009, so people can conveniently find information related to the law. However, this service has limitation because the recent technology for search engine basically returns documents depending on whether the query is included in it or not as a search result. Therefore, it is really difficult to retrieve information related the law for general users who are not familiar with legal terms in the search engine using simple matching of keywords in spite of those kinds of efforts of the Office of Legislation in Korea, because there is a huge divergence between everyday words and legal terms which are especially from Chinese words. Generally, people try to access the law information using everyday words, so they have a difficulty to get the result that they exactly want. In this paper, we propose a term mapping methodology between everyday words and legal terms for general users who don't have sufficient background about legal terms, and we develop a search service that can provide the search results of law information from everyday words. This will be able to search the law information accurately without the knowledge of legal terminology. In other words, our research goal is to make a law information search system that general users are able to retrieval the law information with everyday words. First, this paper takes advantage of tags of internet blogs using the concept for collective intelligence to find out the term mapping relationship between everyday words and legal terms. In order to achieve our goal, we collect tags related to an everyday word from web blog posts. Generally, people add a non-hierarchical keyword or term like a synonym, especially called tag, in order to describe, classify, and manage their posts when they make any post in the internet blog. Second, the collected tags are clustered through the cluster analysis method, K-means. Then, we find a mapping relationship between an everyday word and a legal term using our estimation measure to select the fittest one that can match with an everyday word. Selected legal terms are given the definite relationship, and the relations between everyday words and legal terms are described using SKOS that is an ontology to describe the knowledge related to thesauri, classification schemes, taxonomies, and subject-heading. Thus, based on proposed mapping and searching methodologies, our legal information search system finds out a legal term mapped with user query and retrieves law information using a matched legal term, if users try to retrieve law information using an everyday word. Therefore, from our research, users can get exact results even if they do not have the knowledge related to legal terms. As a result of our research, we expect that general users who don't have professional legal background can conveniently and efficiently retrieve the legal information using everyday words.

A Study on the Establishment of Comparison System between the Statement of Military Reports and Related Laws (군(軍) 보고서 등장 문장과 관련 법령 간 비교 시스템 구축 방안 연구)

  • Jung, Jiin;Kim, Mintae;Kim, Wooju
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.109-125
    • /
    • 2020
  • The Ministry of National Defense is pushing for the Defense Acquisition Program to build strong defense capabilities, and it spends more than 10 trillion won annually on defense improvement. As the Defense Acquisition Program is directly related to the security of the nation as well as the lives and property of the people, it must be carried out very transparently and efficiently by experts. However, the excessive diversification of laws and regulations related to the Defense Acquisition Program has made it challenging for many working-level officials to carry out the Defense Acquisition Program smoothly. It is even known that many people realize that there are related regulations that they were unaware of until they push ahead with their work. In addition, the statutory statements related to the Defense Acquisition Program have the tendency to cause serious issues even if only a single expression is wrong within the sentence. Despite this, efforts to establish a sentence comparison system to correct this issue in real time have been minimal. Therefore, this paper tries to propose a "Comparison System between the Statement of Military Reports and Related Laws" implementation plan that uses the Siamese Network-based artificial neural network, a model in the field of natural language processing (NLP), to observe the similarity between sentences that are likely to appear in the Defense Acquisition Program related documents and those from related statutory provisions to determine and classify the risk of illegality and to make users aware of the consequences. Various artificial neural network models (Bi-LSTM, Self-Attention, D_Bi-LSTM) were studied using 3,442 pairs of "Original Sentence"(described in actual statutes) and "Edited Sentence"(edited sentences derived from "Original Sentence"). Among many Defense Acquisition Program related statutes, DEFENSE ACQUISITION PROGRAM ACT, ENFORCEMENT RULE OF THE DEFENSE ACQUISITION PROGRAM ACT, and ENFORCEMENT DECREE OF THE DEFENSE ACQUISITION PROGRAM ACT were selected. Furthermore, "Original Sentence" has the 83 provisions that actually appear in the Act. "Original Sentence" has the main 83 clauses most accessible to working-level officials in their work. "Edited Sentence" is comprised of 30 to 50 similar sentences that are likely to appear modified in the county report for each clause("Original Sentence"). During the creation of the edited sentences, the original sentences were modified using 12 certain rules, and these sentences were produced in proportion to the number of such rules, as it was the case for the original sentences. After conducting 1 : 1 sentence similarity performance evaluation experiments, it was possible to classify each "Edited Sentence" as legal or illegal with considerable accuracy. In addition, the "Edited Sentence" dataset used to train the neural network models contains a variety of actual statutory statements("Original Sentence"), which are characterized by the 12 rules. On the other hand, the models are not able to effectively classify other sentences, which appear in actual military reports, when only the "Original Sentence" and "Edited Sentence" dataset have been fed to them. The dataset is not ample enough for the model to recognize other incoming new sentences. Hence, the performance of the model was reassessed by writing an additional 120 new sentences that have better resemblance to those in the actual military report and still have association with the original sentences. Thereafter, we were able to check that the models' performances surpassed a certain level even when they were trained merely with "Original Sentence" and "Edited Sentence" data. If sufficient model learning is achieved through the improvement and expansion of the full set of learning data with the addition of the actual report appearance sentences, the models will be able to better classify other sentences coming from military reports as legal or illegal. Based on the experimental results, this study confirms the possibility and value of building "Real-Time Automated Comparison System Between Military Documents and Related Laws". The research conducted in this experiment can verify which specific clause, of several that appear in related law clause is most similar to the sentence that appears in the Defense Acquisition Program-related military reports. This helps determine whether the contents in the military report sentences are at the risk of illegality when they are compared with those in the law clauses.

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.143-156
    • /
    • 2012
  • People easily believe that news and stock index are closely related. They think that securing news before anyone else can help them forecast the stock prices and enjoy great profit, or perhaps capture the investment opportunity. However, it is no easy feat to determine to what extent the two are related, come up with the investment decision based on news, or find out such investment information is valid. If the significance of news and its impact on the stock market are analyzed, it will be possible to extract the information that can assist the investment decisions. The reality however is that the world is inundated with a massive wave of news in real time. And news is not patterned text. This study suggests the stock-index invest model based on "News Big Data" opinion mining that systematically collects, categorizes and analyzes the news and creates investment information. To verify the validity of the model, the relationship between the result of news opinion mining and stock-index was empirically analyzed by using statistics. Steps in the mining that converts news into information for investment decision making, are as follows. First, it is indexing information of news after getting a supply of news from news provider that collects news on real-time basis. Not only contents of news but also various information such as media, time, and news type and so on are collected and classified, and then are reworked as variable from which investment decision making can be inferred. Next step is to derive word that can judge polarity by separating text of news contents into morpheme, and to tag positive/negative polarity of each word by comparing this with sentimental dictionary. Third, positive/negative polarity of news is judged by using indexed classification information and scoring rule, and then final investment decision making information is derived according to daily scoring criteria. For this study, KOSPI index and its fluctuation range has been collected for 63 days that stock market was open during 3 months from July 2011 to September in Korea Exchange, and news data was collected by parsing 766 articles of economic news media M company on web page among article carried on stock information>news>main news of portal site Naver.com. In change of the price index of stocks during 3 months, it rose on 33 days and fell on 30 days, and news contents included 197 news articles before opening of stock market, 385 news articles during the session, 184 news articles after closing of market. Results of mining of collected news contents and of comparison with stock price showed that positive/negative opinion of news contents had significant relation with stock price, and change of the price index of stocks could be better explained in case of applying news opinion by deriving in positive/negative ratio instead of judging between simplified positive and negative opinion. And in order to check whether news had an effect on fluctuation of stock price, or at least went ahead of fluctuation of stock price, in the results that change of stock price was compared only with news happening before opening of stock market, it was verified to be statistically significant as well. In addition, because news contained various type and information such as social, economic, and overseas news, and corporate earnings, the present condition of type of industry, market outlook, the present condition of market and so on, it was expected that influence on stock market or significance of the relation would be different according to the type of news, and therefore each type of news was compared with fluctuation of stock price, and the results showed that market condition, outlook, and overseas news was the most useful to explain fluctuation of news. On the contrary, news about individual company was not statistically significant, but opinion mining value showed tendency opposite to stock price, and the reason can be thought to be the appearance of promotional and planned news for preventing stock price from falling. Finally, multiple regression analysis and logistic regression analysis was carried out in order to derive function of investment decision making on the basis of relation between positive/negative opinion of news and stock price, and the results showed that regression equation using variable of market conditions, outlook, and overseas news before opening of stock market was statistically significant, and classification accuracy of logistic regression accuracy results was shown to be 70.0% in rise of stock price, 78.8% in fall of stock price, and 74.6% on average. This study first analyzed relation between news and stock price through analyzing and quantifying sensitivity of atypical news contents by using opinion mining among big data analysis techniques, and furthermore, proposed and verified smart investment decision making model that could systematically carry out opinion mining and derive and support investment information. This shows that news can be used as variable to predict the price index of stocks for investment, and it is expected the model can be used as real investment support system if it is implemented as system and verified in the future.