• Title/Summary/Keyword: Web Text Analysis

Causal model analysis between quantity and quality for deriving ranking model of Online reviews (온라인리뷰의 랭킹모델링을 위한 양과 질의 인과모형 분석)

  • Lee, Changyong;Kim, Keunhyung
    • The Journal of Information Systems / v.28 no.1 / pp.1-16 / 2019
  • Purpose: This study analyzes the causal relationship between the quantity and quality of online reviews in order to derive a ranking model, and proposes implications for a ranking model that retrieves online reviews more effectively. Design/methodology/approach: We collected online reviews from Tripadvisor, a world-famous tourism website. Using opinion mining techniques in an R package, we transformed the natural-language reviews into quantified data consisting of quantified positive opinions, quantified negative opinions, quantified modification opinions, review lengths, and grade scores. We then ran correlation and regression analyses on the data. Findings: The empirical analysis confirmed that review length influenced positive, negative, and modification opinions. It also confirmed that negative opinions and modification opinions influenced the grade score.
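
The correlation and regression step described in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' R workflow: the function names `pearson_r` and `ols_fit` and the toy numbers are hypothetical, and the study's actual data and model specification are not given in this listing.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def ols_fit(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one predictor."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


# Hypothetical toy data (assumption): review length in words and the
# number of negative opinions found in each review.
review_length = [40, 85, 120, 200, 310]
negative_opinions = [1, 2, 2, 4, 6]

r = pearson_r(review_length, negative_opinions)
slope, intercept = ols_fit(review_length, negative_opinions)
print(f"r = {r:.3f}, slope = {slope:.4f}, intercept = {intercept:.3f}")
```

A positive slope here would correspond to the reported finding that longer reviews carry more opinion expressions; the same fit could be repeated with the grade score as the dependent variable.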

Employee's Discontent Text Analysis on Anonymous Company Review Web and Suggestions for Discontent Resolve (기업 리뷰 웹 사이트 텍스트 분석을 통한 직원 불만 표현 추출과 불만 원인 도출 및 해소 방안)

  • Baek, HyeYeon;Park, Yongsuk
    • Journal of the Korea Institute of Information and Communication Engineering / v.23 no.4 / pp.357-364 / 2019
  • About 80% of industrial information leaks are attributed to insiders, yet most related research only briefly explains the causes as discontent with salary or the human resources system. This paper scrapes text from Jobplanet, an anonymous company review website, and analyzes discontent keywords and their contexts in 7 related areas to uncover more detail on those causes. After drawing an LGG (Local Grammar Graph) for each area with a related dictionary list, the paper presents an example concordance as evidence and suggests several ways to prevent human resources leakage. Finally, the text analysis results are compared with previous research based on surveys with limited questions and answers. This study is meaningful in that it expands the scope of employee discontent analysis using company review text and provides more specific, granular, and honest discontent vocabulary.

Investigating Dynamic Mutation Process of Issues Using Unstructured Text Analysis (비정형 텍스트 분석을 활용한 이슈의 동적 변이과정 고찰)

  • Lim, Myungsu;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.1-18 / 2016
  • Owing to the extensive use of Web media and the development of the IT industry, a large amount of data has been generated, shared, and stored. Nowadays, various types of unstructured data such as image, sound, video, and text are distributed through Web media. Therefore, many attempts have been made in recent years to discover new value through an analysis of these unstructured data. Among these types of unstructured data, text is recognized as the most representative method for users to express and share their opinions on the Web. In this sense, demand for obtaining new insights through text analysis is steadily increasing. Accordingly, text mining is increasingly being used for different purposes in various fields. In particular, issue tracking is being widely studied not only in the academic world but also in industries because it can be used to extract various issues from text such as news and social network services (SNS) and to analyze the trends of these issues. Conventionally, issue tracking is used to identify major issues sustained over a long period of time through topic modeling and to analyze the detailed distribution of documents involved in each issue. However, because conventional issue tracking assumes that the content composing each issue does not change throughout the entire tracking period, it cannot represent the dynamic mutation process of detailed issues that can be created, merged, divided, and deleted between these periods. Moreover, because only keywords that appear consistently throughout the entire period can be derived as issue keywords, concrete issue keywords such as "nuclear test" and "separated families" may be concealed by more general issue keywords such as "North Korea" in an analysis over a long period of time. This implies that many meaningful but short-lived issues cannot be discovered by conventional issue tracking.
Note that detailed keywords are preferable to general keywords because the former can be clues for providing actionable strategies. To overcome these limitations, we performed an independent analysis on the documents of each detailed period. We generated an issue flow diagram based on the similarity of each issue between two consecutive periods. The issue transition pattern among categories was analyzed by using the category information of each document. In this study, we then applied the proposed methodology to a real case of 53,739 news articles. We derived an issue flow diagram from the articles. We then proposed the following useful application scenarios for the issue flow diagram presented in the experiment section. First, we can identify an issue that actively appears during a certain period and promptly disappears in the next period. Second, the preceding and following issues of a particular issue can be easily discovered from the issue flow diagram. This implies that our methodology can be used to discover the association between inter-period issues. Finally, an interesting pattern of one-way and two-way transitions was discovered by analyzing the transition patterns of issues through category analysis. Thus, we discovered that a pair of mutually similar categories induces two-way transitions. In contrast, one-way transitions can be recognized as an indicator that issues in a certain category tend to be influenced by other issues in another category. For practical application of the proposed methodology, high-quality word and stop word dictionaries need to be constructed. In addition, not only the number of documents but also additional meta-information such as the read counts, written time, and comments of documents should be analyzed. A rigorous performance evaluation or validation of the proposed methodology should be performed in future works.
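
The idea of linking issues across two consecutive periods by their similarity can be sketched as follows. The abstract does not say which similarity measure or cutoff the authors used, so cosine similarity over keyword weights, the `threshold` value, and all issue names and weights below are assumptions made for illustration only.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two sparse keyword-weight dictionaries."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def link_issues(prev_issues, next_issues, threshold=0.3):
    """Return (previous issue, next issue, similarity) edges at or above threshold."""
    edges = []
    for p_name, p_vec in prev_issues.items():
        for n_name, n_vec in next_issues.items():
            sim = cosine(p_vec, n_vec)
            if sim >= threshold:
                edges.append((p_name, n_name, round(sim, 3)))
    return edges


# Hypothetical issues (assumption): each issue is a bag of weighted keywords
# obtained by analyzing the documents of one period independently.
period_1 = {
    "P1-A": {"north": 0.5, "korea": 0.5, "nuclear": 0.4, "test": 0.3},
    "P1-B": {"election": 0.6, "candidate": 0.5},
}
period_2 = {
    "P2-A": {"nuclear": 0.6, "test": 0.5, "sanction": 0.3},
    "P2-B": {"separated": 0.6, "families": 0.6, "korea": 0.2},
}

edges = link_issues(period_1, period_2)
for prev_name, next_name, sim in edges:
    print(f"{prev_name} -> {next_name} (similarity {sim})")
```

In such a graph, an issue in the earlier period with no outgoing edge corresponds to one that disappears, and an issue in the later period with no incoming edge corresponds to one that is newly created.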

Unstructured Data Processing Using Keyword-Based Topic-Oriented Analysis (키워드 기반 주제중심 분석을 이용한 비정형데이터 처리)

  • Ko, Myung-Sook
    • KIPS Transactions on Software and Data Engineering / v.6 no.11 / pp.521-526 / 2017
  • Big data comes in diverse formats and vast volumes and is generated very quickly, so it requires new management and analysis methods rather than traditional data processing methods. Text mining techniques can be used to extract useful information from unstructured text written in human language in online documents on social networks. Identifying trends in the political, economic, and cultural messages left on social media helps in understanding which topics people are interested in. In this study, text mining was performed on online news related to a given keyword using a topic-oriented analysis technique. We use Latent Dirichlet Allocation (LDA) to extract information from web documents and analyze which subjects attract interest in relation to a given keyword, and which topics are related to which core values.

Design and Implementation of Electronic Text Books in order to Utilize Regional Text Books for Social Studies (사회과 지역교과서 활용을 위한 전자교과서의 설계 및 구현)

  • Kang, Oh-Han;Park, Hui-Seong
    • The Journal of Korean Association of Computer Education / v.9 no.1 / pp.19-28 / 2006
  • In this paper, we developed electronic textbooks for social studies, centered on the content of the public curriculum, so that primary schools can use them as textbooks. We also conducted a survey to find out how teachers perceived the electronic textbooks with respect to site accessibility and utility, instructional design, lesson progress, validity and accuracy of learning content, interface design, and web-based multimedia. We present a new model for electronic textbook development which, unlike existing models, is expected to be useful in developing electronic textbooks as a main textbook. Based on an analysis of existing electronic textbooks, we applied navigation that uses book metaphors to the user interface. In addition, we provided rich multimedia materials as well as hyperlinks, a strength of online media. Experimental results show that learners' academic achievement was high in the knowledge-understanding and functional areas.

A Study on Automatic Text Categorization of Web-Based Query Using Synonymy List (유사어 사전을 이용한 웹기반 질의문의 자동 범주화에 관한 연구)

  • Nam, Young-Joon;Kim, Gyu-Hwan
    • Journal of Information Management / v.35 no.4 / pp.81-105 / 2004
  • In this study, a method for automatic text categorization of web-based queries was implemented. χ² (chi-square) methods based on the Support Vector Machine were used to test the efficiency of text categorization on queries. The test was carried out with a model using the Synonymy List; 713 synonyms were extracted manually from the tested documents. As a result, the precision ratio and the recall ratio changed by -0.01% and 8.53%, respectively, depending on whether the synonyms were assigned. The F1 measure increased by 4.58%, and the standard deviation between the recall and precision ratios improved by 18.39%.
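
Two quantities mentioned in the abstract above can be made concrete with a short sketch. It assumes that "X2" refers to the chi-square statistic commonly used to score how strongly a term is associated with a category, and it uses the standard definition of the F1 measure as the harmonic mean of precision and recall. The function names and the example counts are hypothetical, and the Support Vector Machine classifier itself is not reproduced.

```python
def chi_square(a, b, c, d):
    """Chi-square score of a term for a category from a 2x2 contingency table.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def f1_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical counts (assumption): a term that occurs mostly inside
# one category scores higher than a term spread evenly across categories.
print(chi_square(30, 5, 10, 55))
print(chi_square(20, 20, 20, 20))
print(f1_measure(0.80, 0.60))
```

Because F1 is a harmonic mean, a gain in recall with nearly unchanged precision raises F1 by less than the recall gain, which is consistent in direction with the figures reported in the abstract.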

Research Trends Investigation Using Text Mining Techniques: Focusing on Social Network Services (텍스트마이닝을 활용한 연구동향 분석: 소셜네트워크서비스를 중심으로)

  • Yoon, Hyejin;Kim, Chang-Sik;Kwahk, Kee-Young
    • Journal of Digital Contents Society / v.19 no.3 / pp.513-519 / 2018
  • The objective of this study was to examine research trends on social network services. The abstracts of 308 articles published between 1994 and 2016 were extracted from the Web of Science database. Time series analysis and topic modeling, a text mining technique, were applied. The topic modeling results showed 20 main research topics: trust, support, satisfaction model, organization governance, mobile system, internet marketing, college student effect, opinion diffusion, customer, information privacy, health care, web collaboration, method, learning effectiveness, knowledge, individual theory, child support, algorithm, media participation, and context system. The time series regression results indicated that trust, support, satisfaction model, and the remaining topics were hot topics. This study also provided suggestions for future research.

The Analysis of Research Trends in Technology to the Fourth Industrial Revolution using SNA (소셜 네트워크 분석을 이용한 4차 산업혁명 기술 분야의 연구 동향 분석)

  • Kim, Hong-Gwang;Ahn, Jong-Wook
    • Journal of Cadastre & Land InformatiX / v.49 no.1 / pp.113-121 / 2019
  • Fourth industrial revolution technology focuses on the fusion of infrastructure with various advanced city-related technologies. Technical cooperation across research fields is therefore essential. To promote fourth industrial revolution technologies, it is necessary to survey the state of technology in various fields. Accordingly, this paper analyzes domestic and foreign research trends in fourth industrial revolution technology using SNA and text mining of websites. We collected the text and date data of research papers and reports published on websites over five years, from January 1, 2014 to December 31, 2018. Next, we derived the major keywords in the public data through morpheme analysis. We then analyzed the core and related keyword lists through SNA. In Korea, the focus is on R&D and legal/institutional solutions related to fourth industrial revolution technology. Abroad, by contrast, the focus was on practical technologies for specific aspects of urban services.
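
One common way to apply SNA to extracted keywords is to build a co-occurrence network and rank keywords by degree centrality. The sketch below assumes that approach; the abstract does not state which network measures the authors used. Morpheme analysis is skipped, and the keyword lists, `cooccurrence`, and `degree_centrality` are hypothetical names and data for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how often each unordered keyword pair appears in the same document."""
    edges = Counter()
    for keywords in docs:
        for u, v in combinations(sorted(set(keywords)), 2):
            edges[(u, v)] += 1
    return edges


def degree_centrality(edges):
    """Fraction of the other keywords that each keyword is directly linked to."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n < 2:
        return {}
    return {k: len(vs) / (n - 1) for k, vs in neighbors.items()}


# Hypothetical keyword lists (assumption), one list per collected document.
docs = [
    ["smart city", "IoT", "R&D"],
    ["IoT", "big data"],
    ["smart city", "IoT", "regulation"],
]

edges = cooccurrence(docs)
centrality = degree_centrality(edges)
for keyword, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{keyword}: {score:.2f}")
```

Keywords with the highest centrality would play the role of core keywords, and their neighbors in the network the role of related keywords.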

Issue tracking and voting rate prediction for 19th Korean president election candidates (댓글 분석을 통한 19대 한국 대선 후보 이슈 파악 및 득표율 예측)

  • Seo, Dae-Ho;Kim, Ji-Ho;Kim, Chang-Ki
    • Journal of Intelligence and Information Systems / v.24 no.3 / pp.199-219 / 2018
  • With the everyday use of the Internet and the spread of various smart devices, users have become able to communicate in real time, and the existing communication style has changed. As the Internet changed who produces information, data grew massive and gave rise to the very large body of information called big data. Big data is seen as a new opportunity to understand social issues. In particular, text mining explores patterns in unstructured text data to find meaningful information. Because text data exists in many places, such as newspapers, books, and the web, it is diverse and large in volume, which makes it suitable for understanding social reality. In recent years, there has been an increasing number of attempts to analyze text from the web, such as SNS and blogs, where the public can communicate freely. This is recognized as a useful way to grasp public opinion immediately, so it can be used for research on political, social, and cultural issues. Text mining has received much attention as a way to investigate the public's view of candidates and to predict the voting rate in place of polling. This is because many people question the credibility of surveys, and people tend to refuse to reveal their real intentions when they are polled. This study collected comments from the largest Internet portal site in Korea and conducted research on the 19th Korean presidential election in 2017. We collected 226,447 comments from April 29, 2017 to May 7, 2017, a span that includes the period just before election day during which public opinion polls are prohibited. We analyzed frequencies, associative emotional words, topic emotions, and candidate voting rates. Through frequency analysis, we identified the words that were the most important issues each day. In particular, following the presidential debates, the candidate who had become an issue appeared at the top of the frequency analysis.
Through the analysis of associative emotional words, we were able to identify the issues most relevant to each candidate. The topic emotion analysis was used to identify each candidate's topics and to express the public's emotions on those topics. Finally, we estimated the voting rate by combining the volume of comments and the sentiment score. In this way, we explored the issues for each candidate and predicted the voting rate. The analysis showed that news comments are an effective tool for tracking the issues of presidential candidates and for predicting the voting rate. In particular, this study presented daily issues and a quantitative index of sentiment. It also predicted the voting rate for each candidate and precisely matched the ranking of the top five candidates. Each candidate will be able to grasp public opinion objectively and reflect it in the election strategy. Candidates can use positive issues more actively in their election strategies and try to correct negative issues. In particular, candidates should be aware that their reputation can be severely damaged if they face a moral problem. Voters can look objectively at issues and public opinion about each candidate and make more informed decisions when voting. If they refer to the results of this study before voting, they will be able to see the opinions of the public from the big data and vote for a candidate from a more objective perspective. If candidates campaign with reference to big data analysis, the public will be more active on the web, recognizing that their wants are being reflected. Political views can be expressed in various places on the web, which can contribute to political participation by the people.

Ontology Construction of Technological Knowledge for R&D Trend Analysis (연구 개발 트렌드 분석을 위한 기술 지식 온톨로지 구축)

  • Hwang, Mi-Nyeong;Lee, Seungwoo;Cho, Minhee;Kim, Soon Young;Choi, Sung-Pil;Jung, Hanmin
    • The Journal of the Korea Contents Association / v.12 no.12 / pp.35-45 / 2012
  • Researchers and scientists spend a huge amount of time analyzing previous studies and their results. To gain an advantageous position in a timely manner, they usually analyze various resources such as papers, patents, and Web documents on recent research issues in order to get ahead on newly emerging technologies. However, it is difficult to select investment-worthy research fields from a huge corpus using traditional information search based on keywords and bibliographic information. In this paper, we propose a method for the efficient creation, storage, and utilization of semantically relevant information among technologies, products, and research agents extracted from 'big data' by text mining. To implement the proposed method, we designed an ontology that creates technological knowledge for the semantic web environment based on the relationships extracted by text mining techniques. The ontology was used in InSciTe Adaptive, an R&D trend analysis and forecast service that supports the search for relevant technological knowledge.