• Title/Summary/Keyword: text data analysis (텍스트 데이터 분석)

Stock-Index Invest Model Using News Big Data Opinion Mining (뉴스와 주가 : 빅데이터 감성분석을 통한 지능형 투자의사결정모형)

  • Kim, Yoo-Sin;Kim, Nam-Gyu;Jeong, Seung-Ryul
    • Journal of Intelligence and Information Systems, v.18 no.2, pp.143-156, 2012
  • People readily believe that news and stock indices are closely related, and that obtaining news before anyone else helps them forecast stock prices, profit, or capture an investment opportunity. However, it is difficult to determine how strongly the two are related, to make investment decisions based on news, or to verify that such investment information is valid. If the significance of news and its impact on the stock market can be analyzed, information that supports investment decisions can be extracted. In reality, however, the world is inundated with a massive volume of real-time news, and news is unstructured text. This study proposes a stock-index investment model based on "News Big Data" opinion mining that systematically collects, categorizes, and analyzes news and produces investment information. To validate the model, the relationship between the news opinion mining results and the stock index was analyzed empirically using statistical methods. The mining process that converts news into investment decision information consists of the following steps. First, news is obtained from a provider that collects it in real time and is then indexed: not only the news content but also information such as the media outlet, time, and news type is collected, classified, and reworked into variables from which investment decisions can be inferred. Second, the news text is split into morphemes to derive the words that can indicate polarity, and each word is tagged as positive or negative by comparing it against a sentiment dictionary. Third, the positive/negative polarity of each news item is judged using the indexed classification information and a scoring rule, and the final investment decision information is derived according to daily scoring criteria.
For this study, the KOSPI index and its fluctuation range were collected from the Korea Exchange for the 63 trading days in the three months from July to September 2011. News data were collected by parsing 766 web articles from economic news outlet "M" among the articles carried under Stock Information > News > Main News on the portal site Naver.com. Over the three months the stock price index rose on 33 days and fell on 30 days, and the news comprised 197 articles published before the market opened, 385 during the session, and 184 after the close. Mining the collected news and comparing the results with stock prices showed that the positive/negative opinion of news content was significantly related to stock prices, and that changes in the stock price index were better explained when news opinion was expressed as a positive/negative ratio rather than as a simple positive-or-negative judgment. To check whether news affected stock price fluctuations, or at least preceded them, stock price changes were also compared only with news published before the market opened; this relationship was statistically significant as well. In addition, news covers many types and kinds of information, such as social, economic, and overseas news, corporate earnings, industry conditions, market outlook, and market conditions, so the influence on the stock market, or the significance of the relationship, was expected to differ by news type. Each news type was therefore compared with stock price fluctuations, and the results showed that market conditions, outlook, and overseas news were the most useful for explaining stock price fluctuations.
In contrast, news about individual companies was not statistically significant, and its opinion mining values tended to move opposite to stock prices; a possible reason is the appearance of promotional and planned news intended to keep stock prices from falling. Finally, multiple regression analysis and logistic regression analysis were carried out to derive an investment decision function from the relationship between the positive/negative opinion of news and stock prices. The regression equation using the market conditions, outlook, and overseas news variables from before the market opened was statistically significant, and the classification accuracy of the logistic regression was 70.0% for rises in the stock price, 78.8% for falls, and 74.6% on average. This study first analyzed the relationship between news and stock prices by analyzing and quantifying the sentiment of unstructured news content with opinion mining, one of the big data analysis techniques, and then proposed and verified an intelligent investment decision-making model that can systematically carry out opinion mining and derive and support investment information. The results show that news can be used as a variable for predicting the stock price index for investment, and the model is expected to be usable as a real investment support system once it is implemented as a system and verified in the future.
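
The word-level polarity tagging and the daily positive/negative ratio described in this abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the tiny `LEXICON`, the whitespace tokenizer (standing in for Korean morpheme analysis), and the ratio definition are not taken from the paper, whose sentiment dictionary and scoring rule are not given in the abstract.

```python
from collections import defaultdict

# Hypothetical sentiment lexicon (assumption); the study's dictionary is not published here.
LEXICON = {"surge": 1, "rally": 1, "gain": 1, "plunge": -1, "fear": -1, "loss": -1}


def tag_polarity(text):
    """Count positive and negative lexicon hits in one article."""
    pos = neg = 0
    # Whitespace splitting is a stand-in for morpheme analysis.
    for token in text.lower().split():
        score = LEXICON.get(token.strip(".,!?\"'"), 0)
        if score > 0:
            pos += 1
        elif score < 0:
            neg += 1
    return pos, neg


def daily_positive_ratio(articles):
    """articles: iterable of (date, text) pairs.

    Returns {date: share of positive words among all polar words that day}.
    """
    totals = defaultdict(lambda: [0, 0])
    for date, text in articles:
        pos, neg = tag_polarity(text)
        totals[date][0] += pos
        totals[date][1] += neg
    ratios = {}
    for date, (pos, neg) in totals.items():
        if pos + neg:  # skip days with no polar words at all
            ratios[date] = pos / (pos + neg)
    return ratios
```

A ratio like this keeps the strength of the daily opinion, which is the kind of information a simple positive-or-negative label discards; the resulting value could then serve as an explanatory variable in a regression.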

Methodology for Identifying Issues of User Reviews from the Perspective of Evaluation Criteria: Focus on a Hotel Information Site (사용자 리뷰의 평가기준 별 이슈 식별 방법론: 호텔 리뷰 사이트를 중심으로)

  • Byun, Sungho;Lee, Donghoon;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.22 no.3, pp.23-43, 2016
  • With the growth of Internet data and the rapid development of Internet technology, "big data" analysis has gained prominence as a major approach for evaluating and mining enormous volumes of data for various purposes. In recent years in particular, people have tended to share their own leisure experiences while also reading what others have written about theirs. By referring to others' leisure experiences, they can gather information that may lead to better leisure activities in the future. This phenomenon appears across many kinds of leisure activities, such as movies, travel, accommodation, and dining. Apart from blogs and social networking sites, many other websites provide a wealth of information related to leisure activities. Most of these websites present information on each product in various formats depending on purpose and perspective. They generally provide average ratings and detailed reviews from users who actually used the products or services, and these ratings and reviews can support potential customers' purchase decisions. However, existing leisure information websites provide ratings and reviews at only a single level of evaluation criteria. Therefore, to identify the main issue for each evaluation criterion, as well as the characteristics of the specific elements that make up each criterion, users have to read a large number of reviews. In particular, because most users look for the characteristics of detailed elements under one or more evaluation criteria according to their own priorities, they must spend a great deal of time and effort reading and interpreting reviews to obtain the information they want.
Although some websites break down the evaluation criteria and direct users to enter their reviews at different levels of criteria, the excessive number of input sections makes the whole process inconvenient. Further, problems arise if a user ignores the instructions for the input sections or fills in the wrong section. Finally, breaking down the evaluation criteria is difficult to treat as a realistic alternative, because identifying all the detailed elements of each evaluation criterion is a challenging task. For example, when reviewing a hotel, people tend to write only single-level reviews of components such as accessibility, rooms, services, or food. These reviews may touch on frequently asked questions, such as the distance to the nearest subway station or the condition of the bathroom, but they still lack detailed information on those questions. In addition, when a breakdown of the evaluation criteria is provided along with various input sections, a user might fill in only the section for accessibility, or enter the wrong information, such as comments about rooms under the accessibility criterion. The reliability of the segmented reviews is thus greatly reduced. In this study, we propose an approach to overcome two limitations of existing leisure activity information websites: (1) the low reliability of reviews for each evaluation criterion and (2) the difficulty of identifying the detailed contents that make up each criterion. In our proposed methodology, we first identify the review content and construct a lexicon for each evaluation criterion from the terms frequently used for that criterion. Next, the sentences in the review documents that contain terms in the constructed lexicon are decomposed into review units, which are then regrouped by evaluation criterion.
Finally, the issues in the review units for each evaluation criterion are derived and summary results are provided, along with the review units themselves. This approach aims to save users time and effort, because they read only the information relevant to each evaluation criterion rather than the entire text of every review. Our proposed methodology is based on topic modeling, which is actively used in text analysis. Unlike existing topic modeling studies, which treat the whole review as a document unit, it decomposes each review into sentence units, reorganizes the resulting review units by evaluation criterion, and uses them in the subsequent analysis. In this paper, we collected 423 reviews from hotel information websites and decomposed them into 4,860 review units, which we then reorganized according to six evaluation criteria. Applying our methodology to these review units, we present the analysis results and demonstrate the utility of the proposed methodology.
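
The decomposition of reviews into sentence-level review units and their regrouping by evaluation criterion might look roughly like the Python sketch below. The `CRITERIA_LEXICON` contents, the naive sentence splitter, and the rule that a sentence may belong to several criteria are all assumptions made for illustration; the abstract does not specify them, and the topic modeling step that follows is not shown.

```python
import re
from collections import defaultdict

# Hypothetical per-criterion lexicons (assumption); the paper builds its own
# from terms that are frequently used for each criterion.
CRITERIA_LEXICON = {
    "accessibility": {"subway", "station", "airport", "walk"},
    "rooms": {"room", "bed", "bathroom"},
    "service": {"staff", "reception", "service"},
}


def split_sentences(review):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", review) if s.strip()]


def review_units_by_criterion(reviews):
    """Group sentence-level review units under every criterion whose lexicon they match."""
    units = defaultdict(list)
    for review in reviews:
        for sentence in split_sentences(review):
            words = set(re.findall(r"[a-z]+", sentence.lower()))
            for criterion, terms in CRITERIA_LEXICON.items():
                if words & terms:  # at least one lexicon term appears in the sentence
                    units[criterion].append(sentence)
    return dict(units)
```

Each list of review units could then be treated as its own small corpus for issue extraction, so that a reader interested only in, say, accessibility sees only the sentences that mention it.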

Design and Implementation of Produce Farming Field-Oriented Smart Pest Information Retrieval System based on Mobile for u-Farm (u-Farm을 위한 모바일 기반의 농작물 재배 현장 중심형 스마트 병해충 정보검색 시스템 설계 및 구현)

  • Kang, Ju-Hee;Jung, Se-Hoon;Nor, Sun-Sik;So, Won-Ho;Sim, Chun-Bo
    • The Journal of the Korea Institute of Electronic Communication Sciences, v.10 no.10, pp.1145-1156, 2015
  • There is a shortage of mobile application systems that can be readily applied in the crop cultivation field to deal with diseases and insect pests, which directly affect crop quality. Most existing systems offer general forecasts and basic information about diseases and insect pests, but they seriously lack instant diagnostic functions and support for use in the cultivation field. We therefore design and implement a mobile-based, crop cultivation field-oriented smart disease and insect pest information retrieval system for u-Farm. The proposed system provides information about diseases and insect pests in the cultivation field and lets users check that information in real time on their smartphones. It is built on the Lucene search library, used here for specialized image analysis, and on the JSON data structure, and it was designed with object-oriented modeling to increase its extensibility and reusability. It supports searches based on image feature information such as color, as well as on crop meta-information and meta-information-based text. The system's merits include the realization of u-Farm and real-time checking and management of crop yields, diseases, and insect pests by both farmers and cultivation field managers.

Research on the Utilization of Recurrent Neural Networks for Automatic Generation of Korean Definitional Sentences of Technical Terms (기술 용어에 대한 한국어 정의 문장 자동 생성을 위한 순환 신경망 모델 활용 연구)

  • Choi, Garam;Kim, Han-Gook;Kim, Kwang-Hoon;Kim, You-eil;Choi, Sung-Pil
    • Journal of the Korean Society for Library and Information Science, v.51 no.4, pp.99-120, 2017
  • To support the development of a semiautomatic system that lets researchers efficiently analyze technical trends in ever-growing industries and markets, this paper introduces a couple of Korean sentence generation models that automatically generate definitional statements and descriptions of technical terms and concepts. The proposed models are based on LSTM (Long Short-Term Memory), a deep learning model capable of effectively labeling textual sequences by taking into account the contextual relations of each item in the sequence. Our models take technical terms as input and can generate a broad range of heterogeneous textual descriptions that explain the concepts behind the terms. In experiments using large-scale training collections, we confirmed that more accurate and reasonable sentences are generated by the CHAR-CNN-LSTM model, a word-based LSTM that exploits character embeddings produced by convolutional neural networks (CNN). The results of this study can serve as a basis for developing an extended model that generates a set of sentences covering the same subject and, further, for implementing an artificial intelligence model that automatically creates technical literature.

e-Learning Contents Development as Social Negotiation Perspective: A Case Study of Program Development for the Public Sector Officials' Case Management (사회적 협상 관점의 e-Learning 콘텐츠 개발: 사례관리 담당 공무원을 위한 프로그램 개발 사례연구)

  • Kim, In-Sook;Jin, Sun-Mee
    • The Journal of the Korea Contents Association, v.11 no.7, pp.519-527, 2011
  • An e-Learning program is a multimedia program consisting of text, images, animation, audio, and video. Developing an e-Learning program is a complex and time-consuming process that requires cooperation and open communication among all parties involved, particularly when problems occur. This study analyzes the e-Learning contents development process from the Social Negotiation Perspective and recommends an appropriate development process and effective decision-making guidelines for the parties involved. Participants' viewpoints regarding program development and guidelines were studied qualitatively, while the evaluation of the developed content employed both qualitative and quantitative research. The study found the following. First, the development of an e-Learning program requires a clear goal and purpose. Second, the target group must be clearly identified. Third, all parties involved must share in the development process and its outcomes. Fourth, the party requesting the program must allocate appropriate time and budget to the development group. Finally, the project requires strong, capable leadership for effective decision-making.

A Case Study on the Effectiveness of Major-friendly Contents in Software Education for the Non-majors (비전공자 소프트웨어 교육에서 전공맞춤형 학습 콘텐츠의 효과에 관한 사례 연구)

  • Seo, Joo-Young;Shin, Seung-Hun
    • Journal of Digital Convergence, v.18 no.5, pp.55-63, 2020
  • There has recently been strong interest in basic software (SW) education for non-major students in universities, but non-majors are having a hard time learning. This paper proposes a class operation method that uses customized content reflecting the interests of non-majors instead of existing learning content designed for SW majors. The proposed method aims to improve educational outcomes by increasing the learning motivation of SW non-majors. The paper presents a case study of University A, which has run basic SW education for non-majors for more than five years. The case study analyzed the change in class satisfaction between student groups taught before and after the content of the same curriculum was revised to be major-friendly. The results showed that social science students were interested in learning content that uses public data to examine the country's social and cultural phenomena, and humanities students were interested in text content such as novels, history books, and SNS articles. Students' understanding of the lectures and their class satisfaction both improved greatly, showing that major-friendly content is useful for basic SW education of non-majors.

The Image of Ruralism in Korea through a Text Mining for Online News Media analysis (인터넷 뉴스 데이터 텍스트 분석을 통해 본 우리나라 농촌다움에 대한 이미지 연구)

  • Son, Yong-hoon;Kim, Young-jin
    • Journal of Korean Society of Rural Planning, v.25 no.4, pp.13-26, 2019
  • Rural areas in South Korea have changed rapidly in the course of national land development. Rural landscapes have faded, and their attractiveness has declined as cities have expanded. Yet the attractiveness and multifunctional value of rural areas have become more important in contemporary society around the world. In response to this social demand, conserving the rural landscape is a high priority, and the recovery of ruralism in these areas is required. This study sought to understand how the news media have shaped the public image of ruralism in South Korea. News articles were retrieved through a web search portal using six keywords commonly used to refer to ruralism: 'rural landscape', 'rural community', 'rural tourism', 'rural life', 'rural amenity', and 'rural environment'. For each keyword, news data were collected for four periods: 2004-05, 2007-08, 2012-13, and 2016-17. In the text mining analysis, nouns with high degree centrality were identified, and their changes across periods were examined. LDA topic analysis was then performed on the text datasets for the six keywords. The study found that the news articles focused on only a handful of issues, such as 'poor rural living condition', 'regional or village improvement projects', 'rural tourism promotion projects', and 'other government support projects'. By contrast, nouns related to the virtues and values of the rural landscape appeared less often in news articles, and this tendency has become more apparent in recent years. The topic analysis identified 35 topics. 'Village development projects', 'rural tourism', and 'urban-rural exchange projects' appeared repeatedly across several keywords.
The topics also included several closely related to ruralism, such as 'rural landscape conservation', 'eco-friendly rural areas', 'local amenity resources', 'public interest values of agriculture', and 'rural life and communities'. The study presented an image map of ruralism in South Korea, built from a network map linking all topics and keywords. Finally, implications for Korean rural policy and research directions were discussed.
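
The degree centrality step mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: two nouns are linked when they co-occur in the same article, and centrality is the standard normalized form, degree divided by (n - 1). The abstract does not say how the study built its noun network, so the co-occurrence rule and the function name `degree_centrality` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def degree_centrality(documents):
    """documents: list of noun lists, one list per news article.

    Two nouns are linked if they co-occur in at least one article (assumption).
    Returns {noun: degree / (n - 1)}, the normalized degree centrality.
    """
    neighbors = defaultdict(set)
    for nouns in documents:
        unique = set(nouns)
        for noun in unique:
            neighbors[noun]  # touch the key so isolated nouns are counted as nodes
        for a, b in combinations(sorted(unique), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {noun: 0.0 for noun in neighbors}
    return {noun: len(adj) / (n - 1) for noun, adj in neighbors.items()}
```

Running this separately on the articles from each period and comparing the top-ranked nouns would give the kind of period-by-period change the abstract describes.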

Semi-automatic Construction of Learning Set and Integration of Automatic Classification for Academic Literature in Technical Sciences (기술과학 분야 학술문헌에 대한 학습집합 반자동 구축 및 자동 분류 통합 연구)

  • Kim, Seon-Wu;Ko, Gun-Woo;Choi, Won-Jun;Jeong, Hee-Seok;Yoon, Hwa-Mook;Choi, Sung-Pil
    • Journal of the Korean Society for Information Management, v.35 no.4, pp.141-164, 2018
  • As the volume of academic literature has grown rapidly and complex research has been actively conducted, researchers have difficulty analyzing trends in previous research. Solving this problem requires classification information at the level of individual academic papers, but no academic database in Korea provides such information. In this paper, we propose an automatic classification system that can classify domestic academic literature into multiple classes. To this end, academic documents in the technical sciences written in Korean were first collected and mapped to class 600 of the DDC using the K-Means clustering technique, in order to construct a training set that supports multi-class classification. The resulting training set comprises 63,915 Korean technical science documents, excluding records with missing metadata. Using this training set, we implemented and trained a deep learning-based automatic classification engine for academic documents. Experiments on a manually built test set showed 78.32% accuracy and a 72.45% F1 score for multi-class classification.
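
To make the two reported metrics concrete, the following Python sketch computes accuracy and an F1 score for a multi-class task. The abstract does not state which averaging scheme the authors used, so macro averaging here is an assumption, and the DDC-style labels in the usage note are invented examples rather than data from the study.

```python
def accuracy(gold, pred):
    """Share of documents whose predicted class equals the gold class."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (macro averaging is an assumption)."""
    labels = set(gold) | set(pred)
    scores = []
    for label in labels:
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            scores.append(2 * precision * recall / (precision + recall))
        else:
            scores.append(0.0)
    return sum(scores) / len(scores)
```

For example, with gold labels `["610", "610", "620", "630"]` and predictions `["610", "620", "620", "630"]`, accuracy is 3/4 while macro F1 is 7/9, which shows how the two figures can diverge when errors are spread unevenly across classes.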

A Comparative Research on End-to-End Clinical Entity and Relation Extraction using Deep Neural Networks: Pipeline vs. Joint Models (심층 신경망을 활용한 진료 기록 문헌에서의 종단형 개체명 및 관계 추출 비교 연구 - 파이프라인 모델과 결합 모델을 중심으로 -)

  • Sung-Pil Choi
    • Journal of the Korean Society for Library and Information Science, v.57 no.1, pp.93-114, 2023
  • Information extraction can facilitate the intensive analysis of documents by providing semantic triples, which consist of named entities and the relations recognized between them in the text. However, most research so far has treated named entity recognition and relation extraction as separate studies, and as a result the performance of entire information extraction systems has not been evaluated properly. This paper introduces two end-to-end information extraction models, a pipeline model and a joint model, that extract various entity names and their relationships from clinical records in the form of semantic triples, and compares their performance in depth. The pipeline model consists of an entity recognition sub-system based on bidirectional GRU-CRFs and a relation extraction module using a multiple encoding scheme, whereas the joint model is implemented as a single bidirectional GRU-CRFs network equipped with a multi-head labeling method. In experiments on the i2b2/VA 2010 dataset, the pipeline model performed 5.5% higher in F-measure. In addition, a comparative experiment against existing state-of-the-art systems that use large-scale neural language models and manually constructed features made it possible to establish the objective performance level of the end-to-end models implemented in this paper.
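
End-to-end evaluation over semantic triples can be illustrated with the Python sketch below. It assumes exact matching, meaning a predicted triple counts as correct only when both entities and the relation label agree with a gold triple; the paper's actual matching criteria are not given in the abstract. The `Triple` type, the function name, and the example values in the usage note are hypothetical.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One semantic triple: two entity mentions joined by a relation label."""
    head: str
    relation: str
    tail: str


def triple_f_measure(gold, predicted):
    """Return (precision, recall, F-measure) under exact triple matching (assumption)."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return 0.0, 0.0, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

Under this scheme, a system that finds both entities of a triple but assigns the wrong relation label gets no credit for that triple. That is why scoring entity recognition and relation extraction separately can overstate how well the whole system performs, and the same function can score a pipeline model and a joint model on equal terms.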

Analysis of major issues in the field of Maritime Autonomous Surface Ships using text mining: focusing on S.Korea news data (텍스트 마이닝을 활용한 자율운항선박 분야 주요 이슈 분석 : 국내 뉴스 데이터를 중심으로)

  • Hyeyeong Lee;Jin Sick Kim;Byung Soo Gu;Moon Ju Nam;Kook Jin Jang;Sung Won Han;Joo Yeoun Lee;Myoung Sug Chung
    • Journal of the Korean Society of Systems Engineering, v.20 no.spc1, pp.12-29, 2024
  • The purpose of this study is to identify the social issues discussed in Korea regarding Maritime Autonomous Surface Ships (MASS), the most advanced ICT field in the shipbuilding industry, and to suggest policy implications. In recent years, it has become important to reflect social issues of public interest in the policymaking process. For this reason, an increasing number of studies use media data and social media to identify public opinion. In this study, we collected 2,843 domestic media articles related to MASS from 2017 to 2022, when MASS was officially discussed at the International Maritime Organization, and analyzed them using text mining techniques. Through term frequency-inverse document frequency (TF-IDF) analysis, major keywords such as 'shipbuilding,' 'shipping,' 'US,' and 'HD Hyundai' were derived. For LDA topic modeling, we selected eight topics with the highest coherence score (-2.2) and analyzed the main news for each topic. According to the combined analysis of five years, the topics '1. Technology integration of the shipbuilding industry' and '3. Shipping industry in the post-COVID-19 era' received the most media attention, each accounting for 16%. Conversely, the topic '5. MASS pilotage areas' received the least media attention, accounting for 8%. Based on the results of the study, the implications for policy, society, and international security are as follows. First, from a policy perspective, the government should consider the current situation of each industry sector and introduce MASS carefully and in stages, because MASS will affect the shipbuilding, port, and shipping industries and a radical introduction may cause various adverse effects. Second, from a social perspective, while the positive aspects of MASS are often reported, there are also negative issues such as cybersecurity risks and the loss of seafarer jobs, which require institutional development and strategic timing of commercialization.
Third, from a security perspective, MASS are expected to change the paradigm of future maritime warfare. South Korea is promoting the construction of forces based on maritime unmanned systems, but a clear plan and military leadership are needed to secure and develop the technology. This study has academic and policy implications in that it sheds light on the multidimensional political and social issues of MASS through news data analysis and suggests implications from national, regional, strategic, and security perspectives beyond legal and institutional discussions.
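
The TF-IDF keyword analysis mentioned in this abstract can be sketched as follows. This is a minimal illustration that assumes one common variant, raw term frequency multiplied by log(N / df); the abstract does not say which weighting or smoothing the authors applied, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists, one per article.

    Returns one {term: weight} dict per document, using raw term frequency
    and idf = log(N / df). This variant is an assumption, not the paper's.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in documents:
        counts = Counter(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term]) for term, count in counts.items()}
        )
    return weights
```

Terms that appear in every article receive a weight of zero under this variant, so the highest-weighted terms are those that are frequent in some articles but not spread evenly across the whole collection.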