• Title/Summary/Keyword: Text data

Stock Price Prediction by Utilizing Category Neutral Terms: Text Mining Approach (카테고리 중립 단어 활용을 통한 주가 예측 방안: 텍스트 마이닝 활용)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems, v.23 no.2, pp.123-138, 2017
  • Since the stock market is driven by traders' expectations, many studies have tried to predict stock price movements by analyzing various sources of text data. This research has examined not only the relationship between text data and stock price fluctuations but also stock trading based on news articles and social media responses. As in other text mining approaches, studies that predict stock price movements build a term-document matrix and apply classification algorithms to it. Because documents contain many words, it is better to build the term-document matrix from the words that contribute most. Typically, words with too low a frequency or importance are removed, and words are also selected by measuring how much each one contributes to classifying a document correctly. The basic idea of constructing a term-document matrix has been to collect all the documents to be analyzed and to select the words that influence the classification. In this study, we analyze the documents for each individual stock and select the words that are irrelevant to all categories as neutral words. We then extract the words surrounding each selected neutral word and use them to generate the term-document matrix. The idea is that a neutral word itself has little relation to stock movement, whereas the words surrounding it are more likely to affect stock price movements. The generated term-document matrix is then passed to an algorithm that classifies stock price fluctuations. We first removed stop words and selected neutral words for each stock, and then excluded from the selected words those that also appear in news articles about other stocks. 
Through an online news portal, we collected four months of news articles on the top 10 stocks by market capitalization. We used the first three months of articles as training data and applied the model to the remaining month of articles to predict the next day's stock price movement. We built the models with SVM, boosting, and random forest. The stock market was open for a total of 80 days over the four months (2016/02/01 ~ 2016/05/31); the first 60 days served as the training set and the remaining 20 days as the test set. The word-based algorithm proposed in this study showed better classification performance than word selection based on sparsity. This study predicted stock price volatility by collecting and analyzing news articles on the top 10 stocks by market capitalization. We used a classification model based on the term-document matrix to estimate stock price fluctuations, and compared the performance of the existing sparsity-based word extraction method with that of the suggested method of removing words from the term-document matrix. The suggested method differs from the existing word extraction method in that it uses not only the news articles for the corresponding stock but also other news items to determine which words to extract. In other words, it removed not only the words that appeared in both rising and falling cases but also the words that appeared commonly in the news for other stocks. When prediction accuracy was compared, the suggested method showed higher accuracy. A limitation of this study is that prediction was framed as classifying rises and falls, and the experiment covered only the top ten stocks, which do not represent the entire stock market. In addition, it is difficult to show investment performance because stock price fluctuation and rate of return may differ. 
Therefore, further research is needed that uses more stocks and predicts returns through trading simulation.
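The neutral-word idea in the abstract above can be sketched roughly as follows. This is only an illustration under stated assumptions: the window size, the example sentences, the choice of "company" as a neutral word, and the function names `context_terms` and `term_document_matrix` are all hypothetical, and the abstract does not say how the authors defined "surrounding words".

```python
from collections import Counter

def context_terms(tokens, neutral_words, window=2):
    """Keep tokens that lie within `window` positions of a neutral word.

    The neutral words themselves are dropped, following the idea that
    only their neighbours carry signal. `window` is an assumed parameter.
    """
    kept = []
    for i, tok in enumerate(tokens):
        if tok in neutral_words:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        if any(tokens[j] in neutral_words for j in range(lo, hi)):
            kept.append(tok)
    return kept

def term_document_matrix(docs, neutral_words, window=2):
    """Build a document-by-term count matrix from the context terms only."""
    counts = [Counter(context_terms(d, neutral_words, window)) for d in docs]
    vocab = sorted(set().union(*counts)) if counts else []
    matrix = [[c[t] for t in vocab] for c in counts]
    return vocab, matrix

# Hypothetical tokenized news sentences; "company" plays the neutral word.
docs = [
    "the company reported strong earnings growth today".split(),
    "shares fell after the company cut its outlook".split(),
]
vocab, matrix = term_document_matrix(docs, neutral_words={"company"}, window=1)
print(vocab)   # columns of the matrix
print(matrix)  # one row of counts per document
```

Each row of `matrix` could then be paired with a rise/fall label and passed to a classifier such as the SVM, boosting, or random forest models the abstract mentions.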

A study on the Domestic Consumer's Perception of "Hansik" with Big Data Analysis : Using Text Mining and Semantic Network Analysis (빅데이터를 통한 내국인의 '한식' 인식 연구 : 텍스트마이닝과 의미연결망 중심으로)

  • Park, Kyeong-Won;Yun, Hee-Kyoung
    • Journal of the Korea Convergence Society, v.11 no.6, pp.145-151, 2020
  • 'Hansik', or Korean cuisine, is one of Korea's national brands. To understand domestic consumers' awareness of Korean cuisine, data were gathered using the search keyword 'Hansik.' Textom 3.5 was used to gather data from blogs and news media on Naver from November 1, 2018, to October 31, 2019. The results of frequency and TF-IDF analysis indicate that 'buffet' accounted for the largest proportion of consumer awareness of Hansik. Broadcast content featuring star chefs also had a great influence. Awareness of Hansik was not confined to its traditional aspects but extended into areas such as fusion and gourmet cuisine. UCINET6 and NetDraw were used to conduct CONCOR analysis. Four clusters were found: a diverse food culture cluster, a high-end restaurant cluster referring to restaurants featured in the media, a Hansik brand cluster, and a Hansik buffet cluster. This study proposes offering a varied Hansik menu that uses a wide range of ingredients. In addition, promotion that introduces fine Hansik, together with marketing and media content about convenient HMRs, would strengthen the imagery associated with Hansik.
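For readers unfamiliar with the TF-IDF weighting mentioned in the abstract above, here is a minimal sketch of one common variant (relative term frequency times the natural log of inverse document frequency). The sample posts are invented for illustration, and the exact formula used by Textom is not stated in the abstract, so this should not be read as that tool's implementation.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = count of term in document / document length
    idf = ln(number of documents / number of documents containing term)
    Assumes every document is a non-empty list of tokens.
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in tf.items()
        })
    return scores

# Hypothetical tokenized blog posts returned for the keyword "hansik".
posts = [
    ["hansik", "buffet", "lunch"],
    ["hansik", "chef", "broadcast"],
    ["hansik", "buffet", "price"],
]
scores = tf_idf(posts)
for row in scores:
    print(sorted(row.items(), key=lambda kv: -kv[1]))
```

Because "hansik" appears in every post, its score is zero in this variant, which is why TF-IDF surfaces more distinctive terms than a raw frequency count does.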

A Study on the Development of Topic Map for Analysis of Customer Satisfaction in Tourism Industry (관광산업의 고객만족도 분석을 위한 토픽맵 개발에 관한 연구)

  • Kang, Min Shik
    • Journal of the Korea Convergence Society, v.8 no.10, pp.249-255, 2017
  • The domestic tourism industry mostly relies on quantitative surveys to measure customer satisfaction. However, customer participation in the questionnaires is extremely low, and the factors causing dissatisfaction are not improved promptly. In this paper, we propose a new topic map system and demonstrate its effectiveness empirically, with the aim of improving the accuracy of customer feedback information and the efficiency of the analysis process. The topic map system analyzes large amounts of customer feedback data in real time. It uses text mining and ontology techniques, integrating data collected over a certain period from real-time SNS with quantitative data obtained from existing survey systems. It also serves as a new evaluation system that monitors, in real time, the effect on customer satisfaction of improving the identified dissatisfaction factors. The classification built on this integrated data is specific to the product or the customer. With this classification, the topic map system can measure in real time how complaint factors are recognized and what effect improving them has. This provides a sophisticated prioritization of the improvement factors and enables quality control of customer satisfaction as a PDCA feedback system. In addition, the survey period and costs are greatly reduced, and responses can be more precise than with the existing survey method. As a practical application, the system is applied to H travel agency, the largest in Korea, to demonstrate the accuracy and efficiency of the proposed system.

Analyzing Tasks in the Geometry Area of 7th Grade of Korean and US Textbooks from the Perspective of Mathematical Modeling (수학적 모델링 관점에 따른 한국과 미국의 중학교 1학년 교과서 기하 영역에 제시된 과제 분석)

  • Jung, Hye-Yun;Jung, Jin-Ho;Lee, Kyeong-Hwa
    • Journal of the Korean School Mathematics Society, v.23 no.2, pp.179-201, 2020
  • The purpose of this study is to analyze the tasks in Korean and US textbooks from the perspective of mathematical modeling, and then to compare the diversity of learning opportunities given to students in the two countries. To this end, we analyzed the mathematical modeling tasks in the textbooks on three aspects: the mathematical modeling process, data, and expression. The results are as follows. First, with respect to the modeling process, Korean textbooks provide a higher percentage of tasks at every stage of modeling than US textbooks. Second, with respect to data, matching tasks account for the highest percentage in both countries' textbooks, and data characteristics vary widely among Korean textbooks. Third, with respect to expression, text and pictures account for the highest percentage in both countries' textbooks. The types of expression vary more widely among Korean textbooks than among US textbooks, and some textbooks use no expression other than text and pictures. Fourth, when tasks were analyzed by integrating the three features, the features were not combined in diverse ways. The integration of the three features needs to be diversified.

A Method of Mining Visualization Rules from Open Online Text for Situation Aware Business Chart Recommendation (상황인식형 비즈니스 차트 추천기 개발을 위한 개방형 온라인 텍스트로부터의 시각화 규칙 추출 방법 연구)

  • Zhang, Qingxuan;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies, v.25 no.1, pp.83-107, 2020
  • Selecting business charts based on the nature of the data and the purpose of the visualization is useful in business analysis. However, current visualization tools lack the ability to help choose the right business chart for the context. Also, soliciting expert help about visualization methods for every analysis is inefficient. Therefore, the purpose of this study is to propose an accessible method to improve business chart productivity by creating rules for selecting business charts from online published documents. To this end, Korean, English, and Chinese unstructured data describing business charts were collected from the Internet, and the relationships between the contexts and the business charts were calculated using TF-IDF. We also used a Galois lattice to create rules for business chart selection. To evaluate the adequacy of the rules generated by the proposed method, experiments were conducted with experimental and control groups. The results confirmed that meaningful rules were extracted by the proposed method. To the best of our knowledge, this is the first study to recommend customized business charts through open unstructured data analysis and to propose a method that enables office workers to select business charts efficiently without expert assistance. This method should also be useful for staff training, since it recommends business charts based on the document a staff member is working on.
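A Galois lattice (the concept lattice of formal concept analysis) groups objects with the attributes they all share. The sketch below enumerates the formal concepts of a tiny chart/context table by brute force. The table, the attribute names, and the function `formal_concepts` are hypothetical, and the abstract does not describe how the authors turned lattice nodes into selection rules, so this only illustrates the underlying structure.

```python
from itertools import combinations

def formal_concepts(context):
    """Enumerate all (extent, intent) pairs of a small formal context.

    `context` maps each object to the set of attributes it has.
    Brute force over object subsets, so suitable for toy examples only.
    """
    objects = list(context)
    all_attrs = set().union(*context.values())

    def intent(objs):
        # Attributes shared by every object in objs (all attributes if empty).
        sets = [context[o] for o in objs]
        return set.intersection(*sets) if sets else set(all_attrs)

    def extent(attrs):
        # Objects that have every attribute in attrs.
        return {o for o in objects if attrs <= context[o]}

    concepts = set()
    for r in range(len(objects) + 1):
        for objs in combinations(objects, r):
            shared = intent(objs)
            concepts.add((frozenset(extent(shared)), frozenset(shared)))
    return concepts

# Hypothetical context: which usage descriptors co-occur with which chart.
charts = {
    "line chart": {"trend", "time series"},
    "bar chart": {"comparison", "categories"},
    "stacked bar chart": {"comparison", "categories", "composition"},
    "pie chart": {"composition"},
}
concepts = formal_concepts(charts)
for ext, itt in sorted(concepts, key=lambda c: len(c[0])):
    print(sorted(itt), "->", sorted(ext))
```

Each printed line can be read as a candidate rule: a set of context descriptors on the left and the charts that satisfy all of them on the right.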

A Recognition and Application Plan of Placenta Chamber of King Sejong's Princes by Big Data Analytical Technique (빅데이터 분석기법을 통한 성주(星州) 세종대왕자태실(世宗大王子胎室)의 인식 및 활용방안)

  • Lim, Jin-Kang;Park, Ji-Hwan
    • Journal of the Korean Institute of Traditional Landscape Architecture, v.36 no.1, pp.78-88, 2018
  • The purpose of this study is to establish a utilization plan that reflects the cultural value of the Placenta Chamber of King Sejong's Princes. We collected and analyzed SNS data to examine public perceptions and opinions. The collection period was about 10 years, from June 1, 2007 to June 30, 2017. We gathered data from blogs, cafes, and Knowledge IN containing keywords related to 'Placenta Chamber', 'Placenta Chamber of Seongju', and 'Placenta Chamber of King Sejong's Princes', and analyzed them using the text mining method of a big data program. Based on the main results of the big data analysis, ways of utilizing the Placenta Chamber were derived. Major keywords such as King Sejong the Great, prince, Seongju, feng shui, culture, preservation, and blessing were derived. The association among 'world', 'heritage', and 'cultural heritage' was high, as was the connection among 'Placenta Chamber', 'Gyeongsangbuk-do', and 'cultural property', which confirmed the value of the Placenta Chamber as a world cultural heritage. It is also necessary to refurbish facilities and improve the environment around the Placenta Chamber so that visitors experience stimulation or a change of surroundings.

Research on Development of Support Tools for Local Government Business Transaction Operation Using Big Data Analysis Methodology (빅데이터 분석 방법론을 활용한 지방자치단체 단위과제 운영 지원도구 개발 연구)

  • Kim, Dabeen;Lee, Eunjung;Ryu, Hanjo
    • The Korean Journal of Archival Studies, no.70, pp.85-117, 2021
  • The purpose of this study is to investigate and analyze the current status of the unit tasks used by local governments, their operation, and the associated record management problems, and to present improvement measures using text-based big data technology based on the implications derived from that analysis. Record management in local governments is in a serious state because of errors in preservation periods caused by misclassified unit tasks, the inability to identify the types of common and institutional affairs, errors in unit tasks and their names, and problems with referenceable standards and tools. Moreover, there are about 720,000 unit tasks, too many to control effectively, so strict tools and standards that make control possible are needed. To solve these problems, this study developed a system that applies text-based big data analysis tools such as corpus and tokenization technology, and applied it to the names and constituent terms that make up the record management standard. These unit task operation support tools are expected to contribute significantly to record management because they support the operation of the standard, for example through uniform preservation periods, identification of delegated office records, control over the creation of duplicate and similar unit tasks, and common tasks. Therefore, if the big data analysis methodology can be linked to BRM and RMS in the future, the quality of record management standard work is expected to improve.

Analysis of Major COVID-19 Issues Using Unstructured Big Data (비정형 빅데이터를 이용한 COVID-19 주요 이슈 분석)

  • Kim, Jinsol;Shin, Donghoon;Kim, Heewoong
    • Knowledge Management Research, v.22 no.2, pp.145-165, 2021
  • In late December 2019, COVID-19 began to spread, throwing the entire world into panic. To overcome the crisis and minimize subsequent damage, the government and its affiliated institutions must maximize the effects of existing policy support and introduce a holistic response plan that reflects the changing situation, which is why it is crucial to analyze social topics and people's interests. This study investigates people's major thoughts, attitudes, and topics surrounding the COVID-19 pandemic using social media and big data. To collect public opinion, the study segmented the time period according to government countermeasures. All data were collected from NAVER blogs from 31 December 2019 to 12 December 2020. The study applied TF-IDF keyword extraction and LDA topic modeling as text mining techniques. As a result, eight major issues related to COVID-19 were derived, and policy strategies were presented based on these keywords. The significance of this study is that it provides baseline data for Korean government authorities to devise appropriate countermeasures that meet people's needs in the midst of the COVID-19 pandemic.

A study on content strategy for long-term exposure of YouTube's 'Trending' (유튜브 '인기급상승' 장기 노출을 위한 콘텐츠 전략에 관한 연구)

  • Lee, Min-Young;Byun, Guk-Do;Choi, Sang-Hyun
    • Journal of the Korea Convergence Society, v.13 no.4, pp.359-372, 2022
  • This study aimed to derive a YouTube content strategy for long-term exposure on Trending by comparing the features of 20 channels exposed in the short and long term, using 'YouTube Trending' data from 2021. First, through Pearson's correlation analysis, we found that various factors such as 'the number of letters in the title or tags' were related to long-term exposure, and we used these as indices for comparing features. The results suggest that the following are effective: 1) a 'video title' of about 40-45 letters without excessive special characters, 2) a 'video length' within 10 minutes, and 3) a 'video description' of 2-3 sentences that adds SNS information or includes 3 key tags. It would also be more effective to set key tag pairs such as (먹방, mukbang) and (역대급, 레전드) derived through text mining. Through this, the channel will spread globally, bringing various advantages, and the result can be used as an indicator for evaluating the globality of a channel.
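Pearson's correlation, which the abstract above uses to relate features such as title length to exposure time, can be computed as in the sketch below. The numbers are made up purely to show the calculation and are not taken from the study; the variable names are assumptions as well.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists.

    Assumes both lists have at least two values and non-zero variance.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical per-video data: title length (letters) and days on Trending.
title_lengths = [20, 35, 42, 44, 60]
days_on_trending = [1, 3, 6, 5, 2]
r = pearson(title_lengths, days_on_trending)
print(f"r = {r:.3f}")
```

A coefficient near zero here would not rule out a non-linear relationship such as the "about 40-45 letters" sweet spot the abstract reports, which is one reason correlation is usually only a first screening step.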

Big Data Analysis for Strategic Use of Urban Brands: Case Study Seoul city brand "I SEOUL U" (도시 브랜드의 전략적 활용을 위한 빅데이터 분석 : 서울시 도시 브랜드 "I SEOUL U" 사례)

  • Lim, Haewen
    • The Journal of the Korea Contents Association, v.22 no.1, pp.197-213, 2022
  • In this study, text mining analysis was performed on online big data to examine the recognition and assessment of the urban brand I Seoul U. To this end, TEXTOM, a program for data acquisition and analysis, was used, and 'I SEOUL U' was selected as the analysis keyword. Keyword analysis shows the keywords associated with I Seoul U to be as follows. First, business and marketing terms include pop-up store, gallery, co-branding, (festival, etc.), commodities, private companies, and online. Second, event-related terms include Han River, tree-planting day, tree planting, Hongdae, Christmas, Mapo, Jung-gu, Sejong University, and festival. Third, promotional terms include robotics engineer Dr. Dennis Hong, government, art, and Korea. The N-gram analysis found that I Seoul U, the city brand of Seoul created in the public interest, contributes to the commercial activities of private companies. In the connection-oriented analysis, business and marketing, events, and promotions were derived as categories. The matrix analysis found that pop-up store products were the main products developed and that co-branded products were also being developed. In the topic modeling, a total of 10 topics were extracted, most of which reflected needs for commercial utilization and for information about events and festivals.
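The N-gram analysis mentioned in the abstract above counts how often adjacent word sequences occur. A minimal sketch follows, using invented token lists; the function names and the sample data are assumptions, and TEXTOM's own N-gram procedure is not described in the abstract.

```python
from collections import Counter

def ngrams(tokens, n=2):
    """Return all runs of n adjacent tokens as tuples (empty if too short)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def top_ngrams(docs, n=2, k=3):
    """Count n-grams over all documents and return the k most frequent."""
    counts = Counter(gram for doc in docs for gram in ngrams(doc, n))
    return counts.most_common(k)

# Hypothetical tokenized posts mentioning the brand.
docs = [
    ["i", "seoul", "u", "pop-up", "store"],
    ["pop-up", "store", "co-branding"],
    ["i", "seoul", "u", "festival"],
]
top = top_ngrams(docs, n=2, k=3)
for gram, count in top:
    print(" ".join(gram), count)
```

Frequent pairs such as "pop-up store" are the kind of co-occurrence that would support a reading like the abstract's, where the brand is tied to commercial activity.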