• Title/Summary/Keyword: Online mining

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to gauge the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and accurately determine customer preferences. In existing data-based survey methods, domain experts first construct a sentiment lexicon for a particular domain by judging the positive, neutral, or negative meaning of the words used frequently in the collected text documents. To gauge the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that express positive or negative opinions about the product. As a motivating example, given a product such as the Sonata made by Hyundai Motors, customers often want a summary of the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want useful information on other aspects such as 'car quality', 'car performance', and 'car service'. Such information will help customers make good choices when they purchase brand-new vehicles.
In addition, automobile makers will be able to determine the preference for new models on the market and their positive and negative points, and sentiment analysis can help them improve the models' weak points in the near future. To this end, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Without aspects, sentiment analysis reports to customers and car makers only a summary containing the product's overall positive and negative ratios and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) Because the same word generally has different meanings in different domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct a per-domain sentiment lexicon is required because lexicon construction is labor-intensive and time-consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines the main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest that lists a few important positive and negative sentences for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, we can improve the accuracy of the product reputation mining algorithm beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed the top-k positive and negative summaries per aspect; and conducted a statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
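
The lexicon-based baseline that this abstract calls the existing approach (classify each sentence, compute positive and negative ratios, select top-k sentences) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tiny `POSITIVE` and `NEGATIVE` word sets, the scoring rule (positive hits minus negative hits), and the sample reviews are all assumptions made for the example, and stemming is omitted.

```python
import re

# Hypothetical toy lexicon; a real one would be built by domain experts.
POSITIVE = {"great", "smooth", "comfortable"}
NEGATIVE = {"noisy", "poor", "expensive"}

def sentence_score(sentence):
    """Number of positive lexicon words minus number of negative ones."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def polarity_ratios(sentences):
    """Share of positive and of negative sentences among all sentences."""
    scores = [sentence_score(s) for s in sentences]
    total = len(scores)
    if total == 0:
        return 0.0, 0.0
    positive = sum(1 for s in scores if s > 0)
    negative = sum(1 for s in scores if s < 0)
    return positive / total, negative / total

def top_k(sentences, k, positive=True):
    """The k sentences with the highest (or, if positive=False, lowest) score."""
    return sorted(sentences, key=sentence_score, reverse=positive)[:k]

# Made-up review sentences, for illustration only.
reviews = [
    "The ride is smooth and the seats are comfortable.",
    "The engine is noisy.",
    "Great design.",
    "It has four doors.",
]
pos_ratio, neg_ratio = polarity_ratios(reviews)
best = top_k(reviews, 1)
worst = top_k(reviews, 1, positive=False)
```

Because the ratios are computed over the whole collection, this sketch shows the limitation the abstract points out: nothing in it separates 'car design' from 'car service'. An aspect-aware variant would first group the sentences by aspect and then apply the same functions to each group.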

Analysis of Behavior of Seoullo 7017 Visitors - With a Focus on Text Mining and Social Network Analysis - (서울로 7017 방문자들의 이용행태 분석 -텍스트 마이닝과 소셜 네트워크 분석을 중심으로-)

  • Woo, Kyung-Sook;Suh, Joo-Hwan
    • Journal of the Korean Institute of Landscape Architecture / v.48 no.6 / pp.16-24 / 2020
  • The purpose of this study is to analyze the usage behavior of visitors to Seoullo 7017, the first public garden in Korea, to understand its usage status by analyzing blogs, and to present usage behavior and improvement plans for Seoullo 7017. Text data containing 'Seoullo 7017' in the title or body of NAVER and DAUM blog posts from June 2017 to May 2020, after Seoullo 7017 opened to citizens, were analyzed using the big data techniques of text mining and social network analysis. The research results are summarized as follows. First, the ratio of men to women searching for Seoullo 7017 online is similar, the regions that searched most are Seoul and Gyeonggi, in that order, and people in their 40s and 50s were the most interested. In other words, there is a lack of interest in regions other than Seoul and Gyeonggi and among people in their 10s, 20s, and 30s. The main behaviors at Seoullo 7017 are 'night view' and 'walking', and the factors that affect these behaviors are elements related to culture and art. If various programs and festivals are held and actively promoted, the main behaviors will become more varied. On the other hand, the main behavior that users of Seoullo 7017 want is 'sit', a static behavior, but the physical conditions are not sufficient for it to occur. Therefore, facilities that encourage sitting, such as shade structures and benches, must be improved to meet the needs of visitors. A notable change in behavior at Seoullo 7017 is that it has come to be recognized as a good place to travel alone and to walk alone, as group activities at public multi-use facilities are restricted due to COVID-19. Accordingly, in a situation like the COVID-19 pandemic, more diverse behaviors can be encouraged in facilities where people can take a walk, and adding various attractions can increase user satisfaction.
Seoullo 7017, Korea's first public pedestrian area, was created for urban regeneration and the efficient use of urban resources; it goes beyond the meaning of a public space and is a place with various values such as history, nature, welfare, culture, and tourism. However, the usage behavior analysis showed that the variety of behaviors expected at Seoullo 7017 did not occur, and it identified elements that hinder the major behaviors. Based on these results, it is necessary to understand the usage behavior at Seoullo 7017 and to establish a plan for improving its spatial system and facilities, so that Seoullo 7017 can become an important place for urban residents and a driving force for revitalizing the city.

A Comparative Analysis of Cognitive Change about Big Data Using Social Media Data Analysis (소셜 미디어 데이터 분석을 활용한 빅데이터에 대한 인식 변화 비교 분석)

  • Yun, Youdong;Jo, Jaechoon;Hur, Yuna;Lim, Heuiseok
    • KIPS Transactions on Software and Data Engineering / v.6 no.7 / pp.371-378 / 2017
  • Recently, with the spread of smart devices and the introduction of web services, the amount of data online has been increasing rapidly, and it is utilized in various fields. In particular, the emergence of social media in the big data field has led to a rapid increase in the amount of unstructured data. To extract meaningful information from such unstructured data, interest in big data technology has increased in various fields. Big data is becoming a key resource in many areas. Big data's prospects for the future are positive, but concerns about data breaches and privacy are constantly being raised. On this subject of big data, where positive and negative views coexist, research analyzing people's opinions is currently lacking. In this study, we compared changes in people's perception of big data based on unstructured data collected from social media using text mining. As a result, we observed yearly keywords for domestic big data, declining positive opinions, and increasing negative opinions. Based on these results, we could predict the flow of domestic big data.

The Analysis of Public Awareness about Literary Therapy by Utilizing Big Data Analysis - The aspects of convergence literature and statistics (빅데이터 분석을 통한 문학치료의 대중적 인지도 분석 - 국문학과 통계학의 융합적 측면)

  • Choi, Kyoung-Ho;Park, Jeong-Hye
    • Journal of Digital Convergence / v.13 no.4 / pp.395-404 / 2015
  • This study explores the objective awareness of literary therapy by examining popular perceptions of it through big data analysis. Its purpose is to derive meaningful information about 'literary therapy' by analyzing online social network services (SNS) from a big data viewpoint. Accordingly, the main research method was content analysis of keywords linked to literary therapy, using an opinion mining method related to text mining. The study focused mainly on 'literary therapy' and analyzed 'bibliotherapy' for comparison. The study period was from Oct. 10 to Nov. 10, 2014 (30 days), and SNS such as blogs and Twitter were searched. From the results, we conclude that the outlook for literary therapy needs to be spread more widely, the field of literary therapy needs structural harmony, and the axis of perception about literary therapy needs to be solidified. This study is worthwhile because it investigates popular awareness of literary therapy and suggests alternatives for invigorating it.

Development and Application of a Big Data Platform for Education Longitudinal Study Analysis (교육종단연구 분석을 위한 빅데이터 플랫폼 개발 및 적용)

  • Park, Jung;Cho, Wan-Sup
    • The Journal of Bigdata / v.5 no.1 / pp.11-27 / 2020
  • In this paper, we developed a big data platform to effectively store, process, and analyze education longitudinal study data. It was applied to the Seoul Education Longitudinal Study (SELS) to confirm its usefulness. The developed platform consists of a data preprocessing unit and a data analysis unit. The data preprocessing unit performs 1) masking, 2) conversion of each item into a factor, 3) normalization and dummy-variable creation, 4) data derivation, and 5) data warehousing. The data analysis unit consists of OLAP and data mining (DM). In the multidimensional analysis, OLAP is performed after selecting a measure and designing a schema. The DM process involves variable selection, research model selection, data modification, parameter tuning, model training, model evaluation, and interpretation of the results. The data warehouse created through preprocessing on this platform can be shared by various researchers, and the continuous accumulation of data sets makes further analysis easier for subsequent researchers. In addition, policy-makers can access the SELS data warehouse directly and analyze it online through multidimensional analysis, enabling scientific decision making. To prove the usefulness of the developed platform, SELS data was built on the platform, OLAP and DM were performed with mathematics academic achievement selected as the measure, and various factors affecting the measure were analyzed using DM techniques. This enabled us to quickly and effectively derive implications for data-based education policies.
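
Three of the preprocessing steps named in the abstract above (masking, normalization, and dummy-variable creation) can be illustrated with a small sketch. The abstract does not say how the platform implements them, so everything here is an assumption for illustration: the salted SHA-256 masking, the min-max rescaling, the helper names, and the sample records with their field names are hypothetical and are not taken from SELS.

```python
import hashlib

def mask_id(student_id, salt="example-salt"):
    """Replace an identifier with a short one-way hash (illustrative masking)."""
    digest = hashlib.sha256((salt + student_id).encode("utf-8")).hexdigest()
    return digest[:12]

def min_max(values):
    """Rescale numbers linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def dummy_variables(categories):
    """One-hot encode a categorical column; returns (levels, rows)."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows

# Made-up records; the field names are hypothetical.
records = [
    {"id": "S001", "school_type": "public", "math_score": 60},
    {"id": "S002", "school_type": "private", "math_score": 80},
    {"id": "S003", "school_type": "public", "math_score": 100},
]

masked_ids = [mask_id(r["id"]) for r in records]
scaled_scores = min_max([r["math_score"] for r in records])
levels, dummies = dummy_variables([r["school_type"] for r in records])
```

Keeping each step as a separate function mirrors the staged pipeline the abstract describes, where the output of preprocessing is loaded into a shared data warehouse before any OLAP or data mining is run.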

Pandemics Era, A Study on the Viewers' Responses of Medical Drama through Text Mining. -Focused on 'Hospital Playlist'- (팬데믹 시대, 텍스트 마이닝을 통한 의학드라마의 시청자 반응 연구-<슬기로운 의사생활>을 중심으로-)

  • Ahn, Sunghun;Oh, SeJong;Jeong, Dalyoung
    • The Journal of the Convergence on Culture Technology / v.6 no.4 / pp.385-389 / 2020
  • The medical drama has developed into a story centered on 'people', raising viewers' sympathy. The story of the drama is the true life story of doctors, patients, and families. It is also a story that reminds viewers of 'a little special day of our ordinary people'. The song played and sung by the five characters in the drama became a factor that stimulates nostalgia and increases immersion. The highest viewer rating was 14.1%, and 51,584 blogs alone were registered. According to the big data analysis, the related words were 'Wise OST', 'Album Name', 'Artist Name', 'Two Hours in a row', 'Record', 'Remake', 'OST Revealed', 'Advertisement Revenue', 'Playlist', 'Aroha' and 'Cho Jung-seok'. The commercialization of medical dramas includes 'Sales of Drama OST Albums', 'Organizing Online Live Concerts (PPL in Advertising)', 'Publishing Piano Music', 'Picture of People-Oriented Photography', 'Making Music Video Editing Drama Highlight', 'YouTube Upload Profits', 'Mask' and 'Disinfectant'. It is predicted that the touching story of COVID-19 and the charming humanity will unfold. To address the limitations of this research, future work should analyze various works by genre and consumer values by industry.

An exploratory study on consumers' responses to mobile payment service focused on Samsung Pay (텍스트 마이닝 기법을 이용한 모바일 간편결제 서비스에 대한 소비자 반응 분석: 삼성페이를 중심으로)

  • Jung, Minji;Lee, Yu Lim;Yoo, Chae Min;Kim, Ji Won;Chung, Jae-Eun
    • Journal of Digital Convergence / v.17 no.1 / pp.9-27 / 2019
  • The purpose of this study is to examine consumers' responses to mobile payment services by using a text-mining technique, focusing on Samsung Pay as it is used in both online and offline transactions. We conducted text frequency analysis, text clustering analysis, and text network analysis using R programming. The major findings are as follows. First, the most frequently used keywords referenced the brand names of the mobile devices, the replacement of traditional wallets, and unique functions of Samsung Pay. Second, there was a clear split between positive and negative responses at the macro level. Third, replacement of traditional wallets played a great role in the positive responses and continuous use of mobile payment services. This study provides an in-depth understanding of consumer responses toward mobile payment services. It also offers practical implications that may help mobile payment marketers respond to consumer values and expectations, thus increasing consumer satisfaction.
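
Two of the analyses named in the abstract above, text frequency analysis and text network analysis, can be sketched in a few lines. The study itself used R; the Python below is only an illustrative stand-in, and the stop-word list, the rule that two terms are linked when they appear in the same post, and the sample posts are all assumptions made for the example.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy stop-word list.
STOP_WORDS = {"the", "a", "is", "my", "and", "i", "to", "it"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def term_frequencies(docs):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts

def cooccurrence_edges(docs):
    """Count, for each pair of terms, the documents in which both appear."""
    edges = Counter()
    for doc in docs:
        terms = sorted(set(tokenize(doc)))
        edges.update(combinations(terms, 2))
    return edges

# Made-up posts, for illustration only.
posts = [
    "Samsung Pay replaced my wallet",
    "I use Samsung Pay and it replaced my wallet",
    "Pay terminal rejected my card",
]
freq = term_frequencies(posts)
edges = cooccurrence_edges(posts)
```

The edge counts form a weighted term network: frequently co-occurring pairs such as `("pay", "wallet")` would appear as strong links, which is the kind of structure a text network analysis inspects.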

Research Trends in Record Management Using Unstructured Text Data Analysis (비정형 텍스트 데이터 분석을 활용한 기록관리 분야 연구동향)

  • Deokyong Hong;Junseok Heo
    • Journal of Korean Society of Archives and Records Management / v.23 no.4 / pp.73-89 / 2023
  • This study aims to analyze the frequency of keywords used in Korean abstracts, which are unstructured text data in the domestic record management research field, using text mining techniques to identify domestic record management research trends through distance analysis between keywords. To this end, 1,157 keywords of 77,578 journals were visualized by extracting 1,157 articles from 7 journal types (28 types) searched by major category (complex study) and middle category (literature informatics) from the institutional statistics (registered site, candidate site) of the Korean Citation Index (KCI). Analysis of t-Distributed Stochastic Neighbor Embedding (t-SNE) and Scattertext using Word2vec was performed. As a result of the analysis, first, it was confirmed that keywords such as "record management" (889 times), "analysis" (888 times), "archive" (742 times), "record" (562 times), and "utilization" (449 times) were treated as significant topics by researchers. Second, Word2vec analysis generated vector representations between keywords, and similarity distances were investigated and visualized using t-SNE and Scattertext. In the visualization results, the research area for record management was divided into two groups, with keywords such as "archiving," "national record management," "standardization," "official documents," and "record management systems" occurring frequently in the first group (past). On the other hand, keywords such as "community," "data," "record information service," "online," and "digital archives" in the second group (current) were garnering substantial focus.
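
The similarity distances between keyword vectors mentioned in the abstract above are commonly measured with cosine similarity. The sketch below shows that computation only; the abstract does not say which distance measure was used, and the three-dimensional vectors are hand-written placeholders rather than trained Word2vec embeddings (which typically have a hundred or more dimensions).

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar(word, vectors, topn=2):
    """Rank the other keywords by cosine similarity to `word`."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:topn]

# Hypothetical toy vectors, not trained embeddings.
vectors = {
    "archiving": [0.9, 0.1, 0.0],
    "official documents": [0.8, 0.2, 0.1],
    "digital archives": [0.1, 0.9, 0.2],
    "community": [0.0, 0.8, 0.3],
}
neighbors = most_similar("archiving", vectors, topn=2)
```

With real embeddings, keywords that land close together under this measure are the ones a t-SNE projection tends to place in the same visual cluster, which is how groupings like the "past" and "current" keyword groups in the abstract could emerge.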

Exploring the Nature of Cybercrime and Countermeasures: Focusing on Copyright Infringement, Gambling, and Pornography Crimes (사이버 범죄의 특성과 대응방안 연구: 저작권 침해, 도박, 음란물 범죄를 중심으로)

  • Ilwoong Kang;Jaehui Kim;So-Hyun Lee;Hee-Woong Kim
    • Knowledge Management Research / v.25 no.2 / pp.69-94 / 2024
  • With the development of cyberspace and its increasing interaction with our daily lives, cybercrime has been steadily increasing in recent years and has become more prominent as a serious social problem. Notably, the "four major malicious cybercrimes" - cyber fraud, cyber financial crime, cyber sexual violence, and cyber gambling - have drawn significant attention. In order to minimize the damage of cybercrime, it's crucial to delve into the specifics of each crime and develop targeted prevention and intervention strategies. Yet, most existing research relies on indirect data sources like statistics, victim testimonials, and public opinion. This study seeks to uncover the characteristics and factors of cybercrime by directly interviewing suspects involved in 'copyright infringement', 'gambling' related to illicit online content, and 'pornography crime'. Through coding analysis and text mining, the study aims to offer a more in-depth understanding of cybercrime dynamics. Furthermore, by suggesting preventative and remedial measures, the research aims to equip policymakers with vital information to reduce the repercussions of this escalating digital threat.

Information Security Job Skills Requirements: Text-mining to Compare Job Posting and NCS (정보보호 직무 수행을 위해 필요한 지식 및 기술: 텍스트 마이닝을 이용한 구인광고와 NCS의 비교)

  • Hyo-Jung Jun;Byeong-Jo Park;Tae-Sung Kim
    • Information Systems Review / v.25 no.3 / pp.179-197 / 2023
  • As a sufficient workforce supports an industry's growth, workforce training has been carried out as part of industry promotion policy. However, the market still has a shortage of skilled mid-level workers. The information security disclosure requires organizations to secure personnel responsible for information security work. Still, the division between information technology work and job areas is unclear, and the pay is not high relative to the responsibility. This paper compares job keywords in advertisements for the information security workforce for 2014, 2019, and 2022. There is no difference across the three years in the keywords describing the job duties of information security personnel, such as implementation, operation, technical support, network, and security solution. To identify the actual needs of companies, we also analyzed and compared the contents of job advertisements posted on online recruitment sites with the information security sector knowledge and skills defined by the National Competency Standards (NCS) used for comprehensive vocational training. It was found that technical skills such as technology development, network, and operating system are preferred in the actual workplace. In contrast, managerial skills such as the legal system and certification systems are prioritized in vocational training.