• Title/Summary/Keyword: Online mining

A study on detective story authors' style differentiation and style structure based on Text Mining (텍스트 마이닝 기법을 활용한 고전 추리 소설 작가 간 문체적 차이와 문체 구조에 대한 연구)

  • Moon, Seok Hyung;Kang, Juyoung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.89-115
    • /
    • 2019
  • This study presents the stylistic differences between Arthur Conan Doyle and Agatha Christie, two famous writers of classical mystery novels, through data analysis, and proposes a text-mining methodology for the study of style. We chose mystery novels because the devices unique to the classical mystery genre carry strong stylistic characteristics, and we chose Doyle and Christie because they are well known to general readers, so that people unfamiliar with this kind of research can still relate to the subject. The primary objective is to identify how the two writers differ within the text and to interpret the effects of these differences on the reader. Accordingly, in addition to events and characters, which are key elements of mystery novels, the writer's grammatical style of writing was included in the definition of style and analyzed. Two series and four books were selected for each writer, and the text was divided into sentences to obtain the data. After an emotional score was assigned to each sentence, the emotional flow across pages was visualized as a graph, and the progression of events in each novel was traced under eight themes by applying topic modeling page by page. By building co-occurrence matrices and performing network analysis, we were able to visualize changes in relationships between characters as events progressed. In addition, every sentence was classified under a grammatical system of six types of writing style to identify differences between writers and between works. This enabled us to identify not only each author's general grammatical writing style but also the stylistic characteristics that arise unconsciously, and to interpret the effects of these characteristics on the reader.
This series of research processes can help readers understand the context of an entire text on the basis of a defined notion of style, and it integrates stylistic studies that were previously conducted separately. Such an understanding can also contribute to identifying and clarifying text within unstructured data, including online text. This could help enable more accurate recognition of emotions and delivery of commands on interactive artificial intelligence platforms that currently convert voice into natural language. As attempts increase to analyze online texts, including new media, in many ways and to discover social phenomena and managerial value, this work is expected to contribute to more meaningful online text analysis and semantic interpretation. However, the study has limitations. The analysis data consisted of only two or four books per author, which is not a sufficient quantity. Writing characteristics defined for Korean text were applied even though the novels are in English. The many possible stylistic characteristics were limited to six, which also narrowed the range of interpretation. In addition, the research analyzed classical mystery novels rather than text in common use today, and it did not compare a wider range of classical mystery writers. Subsequent research will attempt to broaden the interpretations by taking into account a wider variety of grammatical systems and stylistic structures, and will apply the method to the online text in frequent use today to assess its interpretive potential. This is expected to enable the interpretation and definition of the specific structure of style and to open up various applications.
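
The co-occurrence step described in this abstract can be illustrated with a minimal sketch. It assumes that two characters co-occur when their names appear in the same sentence; the abstract does not state the unit actually used, and the function name `cooccurrence_counts` and the sample sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, characters):
    """Count how often each pair of character names appears in the same sentence."""
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Sort the names found so that each pair is always stored in the same order.
        present = sorted(name for name in characters if name.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical sentences, not taken from the novels analyzed in the study.
sentences = [
    "Holmes looked at Watson.",
    "Watson said nothing.",
    "Lestrade thanked Holmes and Watson.",
]
counts = cooccurrence_counts(sentences, ["Holmes", "Watson", "Lestrade"])
for pair, n in sorted(counts.items()):
    print(pair, n)
```

The resulting pair counts are the edge weights of a character network; computing them separately for successive parts of a book would show how relationships change as events progress.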

A Study on the Improvement of Recommendation Accuracy by Using Category Association Rule Mining (카테고리 연관 규칙 마이닝을 활용한 추천 정확도 향상 기법)

  • Lee, Dongwon
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.2
    • /
    • pp.27-42
    • /
    • 2020
  • Traditional companies with offline stores could not secure large display spaces because of cost. This constraint meant that only a limited range of products could be displayed on the shelves, which deprived consumers of the opportunity to experience a variety of items. Taking advantage of the virtual space of the Internet, online shopping overcomes the physical space limitations of offline shopping and can display numerous products on web pages to satisfy consumers with a variety of needs. Paradoxically, however, this can also make it difficult for consumers to compare and evaluate too many alternatives in their purchase decision-making process. To address this side effect, various kinds of purchase decision support systems have been studied, such as keyword-based item search services and recommender systems. These systems can reduce search time for items, prevent consumers from leaving while browsing, and contribute to increased sales for the seller. Among them, recommender systems based on association rule mining techniques can effectively detect interrelated products from transaction data such as orders. The association between products obtained by statistical analysis provides clues for predicting how interested consumers will be in another product. However, since the algorithm is based on the number of transactions, products that have not yet sold enough in the early days after launch may not be included in the list of recommendations even though they are highly likely to sell. Such missing items may not get enough exposure to consumers to record sufficient sales, and then fall into a vicious cycle of declining sales and omission from the recommendation list.
This situation is an inevitable outcome when recommendations are made based on past transaction histories rather than on potential future sales. This study started with the idea that indirectly reflecting a means of identifying this potential would help to select products worth recommending. Given that the attributes of a product affect consumers' purchasing decisions, this study reflects them in the recommender system. In other words, consumers who visit a product page have shown interest in the attributes of that product and would also be interested in other products with the same attributes. On this assumption, the recommender system can use these attributes to select recommended products with a higher acceptance rate. Given that a category is one of the main attributes of a product, it can be a good indicator of not only direct associations between two items but also potential associations that have yet to be revealed. Based on this idea, the study devised a recommender system that reflects not only associations between products but also associations between categories. Through regression analysis, the two kinds of associations were combined to form a model that predicts the hit rate of a recommendation. To evaluate the performance of the proposed model, another regression model was developed based only on associations between products. Comparative experiments were designed to resemble the environment in which products are actually recommended in online shopping malls. First, association rules for all possible combinations of antecedent and consequent items were generated from the order data. Then, hit rates for each of the association rules were predicted by each of the models from the support and confidence.
The comparative experiments using order data collected from an online shopping mall show that the recommendation accuracy can be improved by further reflecting not only the association between products but also categories in the recommendation of related products. The proposed model showed a 2 to 3 percent improvement in hit rates compared to the existing model. From a practical point of view, it is expected to have a positive effect on improving consumers' purchasing satisfaction and increasing sellers' sales.
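
The support and confidence values that feed the models can be illustrated with a minimal sketch. It computes both measures for every ordered pair of items, first at the product level and then at the category level. The order data, the category mapping, and the function name `pair_rules` are hypothetical, and the regression step that the study uses to combine the two kinds of association is not reproduced here.

```python
from collections import Counter
from itertools import permutations


def pair_rules(transactions):
    """Return {(antecedent, consequent): (support, confidence)} for all ordered pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = set(basket)
        item_counts.update(items)
        # Every ordered pair of distinct items in a basket is a candidate rule.
        pair_counts.update(permutations(sorted(items), 2))
    return {
        (a, b): (count / n, count / item_counts[a])
        for (a, b), count in pair_counts.items()
    }


# Hypothetical orders and categories; "lantern" is a new product with no sales yet.
orders = [{"tent", "lamp"}, {"tent", "stove"}, {"tent", "lamp", "stove"}, {"novel"}]
category = {"tent": "shelter", "lamp": "lighting", "lantern": "lighting",
            "stove": "cooking", "novel": "books"}

product_rules = pair_rules(orders)
category_rules = pair_rules([{category[item] for item in order} for order in orders])

print(product_rules[("tent", "lamp")])
print(("tent", "lantern") in product_rules)
print(category_rules[(category["tent"], category["lantern"])])
```

In this toy data the unsold lantern has no product-level rule, but the rule between its category and the tent's category still carries a signal, which is the kind of potential association the abstract describes.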

The Development of Design Knowledge Management System Using Data Mining (Data Mining 기법을 활용한 디자인 지식경영 시스템 구축)

  • 양종열;오민권;최경은
    • Archives of design research
    • /
    • v.16 no.2
    • /
    • pp.281-290
    • /
    • 2003
  • In today's knowledge- and information-based age, it is fair to say that the competitiveness of each person, enterprise, and nation can be evaluated by how each of them manages and maintains the knowledge created from data and information. Since the importance and necessity of knowledge management were acknowledged, there have been studies on creating, applying, and evaluating design knowledge. Previous studies on this subject can be divided into three main categories (CRM, online statistical research, and eCRM) according to the materials used to create knowledge. These studies are meaningful in that they create knowledge in their respective fields, but they are somewhat inadequate because designers cannot create as much knowledge as can be applied in business; design-related consumers demand composite knowledge that integrates the characteristics of all three fields. In other words, they want to know ordinary customers' preferences in the existing offline market from the CRM field, the results of statistical questionnaires on the various elements of design from the statistical research field, and the preference and consumption patterns of large numbers of unspecified people, regardless of time and place, from the eCRM field. This study proposes to solve the problem of web-based design knowledge maintenance through the combined application of CRM, statistical research, and eCRM. The information provided by the solution can be expected to help designers working at design-related enterprises, as well as research institutes, to develop the knowledge necessary to design more consumer-oriented products.

Customer Relationship Management Techniques Based on Dynamic Customer Analysis Utilizing Data Mining (데이터마이닝을 활용한 동적인 고객분석에 따른 고객관계관리 기법)

  • 하성호;이재신
    • Journal of Intelligence and Information Systems
    • /
    • v.9 no.3
    • /
    • pp.23-47
    • /
    • 2003
  • Traditional studies of customer relationship management (CRM) generally focus on static CRM in a specific time frame. Static CRM and the customer behavior knowledge derived from it can help marketers redirect marketing resources for profit gain at that given point in time. However, as time goes by, the static knowledge becomes obsolete. Therefore, CRM for an online retailer should be applied dynamically over time. Customer-based analysis should observe the past purchase behavior of customers to understand their current and likely future purchase patterns in consumer markets, and to divide a market into distinct subsets of customers, any of which may conceivably be selected as a market target to be reached with a distinct marketing mix. Though the concept of buying-behavior-based CRM was advanced several decades ago, very little application of dynamic CRM has been reported to date. In this paper, we propose a dynamic CRM model utilizing data mining and a Monitoring Agent System (MAS) to extract longitudinal knowledge from customer data and to analyze customer behavior patterns over time for an Internet retailer. The proposed model includes an extensive analysis of the customer career path, which tracks each customer's shifts between segments: prediction of customer careers, identification of the dominant career paths that most customers follow and their managerial implications, and the evolution of customer segments over time. Furthermore, we show that dynamic CRM could be useful for solving several managerial problems that any retailer may face.
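
The notion of a customer career path can be illustrated with a minimal sketch. It assumes each customer has already been assigned a segment label in each period; the labels, the customer IDs, and the function names `transition_counts` and `dominant_path` are hypothetical, and this is not the Monitoring Agent System described in the paper.

```python
from collections import Counter


def transition_counts(careers):
    """Count segment-to-segment shifts between consecutive periods."""
    counts = Counter()
    for path in careers.values():
        counts.update(zip(path, path[1:]))
    return counts


def dominant_path(careers):
    """Return the most common full career path and the number of customers following it."""
    return Counter(tuple(path) for path in careers.values()).most_common(1)[0]


# Hypothetical segment labels over three periods for three customers.
careers = {
    "c1": ["new", "loyal", "loyal"],
    "c2": ["new", "loyal", "loyal"],
    "c3": ["new", "dormant", "dormant"],
}
shifts = transition_counts(careers)
path, followers = dominant_path(careers)
print(shifts[("new", "loyal")], path, followers)
```

Dividing each transition count by the number of customers leaving the source segment would give an empirical shift probability, one simple basis for predicting a customer's next segment.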

Analysis of Twitter Post with 'Self-Injury' and 'Suicide' Using Text Mining (텍스트 마이닝기법을 활용한 '자해' 및 '자살' 관련 트위터 게시물 분석)

  • Yuri Lee;Hoin Kwon
    • Korean Journal of Culture and Social Issue
    • /
    • v.29 no.1
    • /
    • pp.147-170
    • /
    • 2023
  • This study explored keywords and key topics by collecting posts related to 'self-injury' and 'suicide' from Twitter. The study subjects were posts containing hashtags related to self-injury and suicide, posted from October 29, 2019 to November 30, 2020. Text mining of the collected posts yielded a total of 11 key topics: 6 related to 'self-injury' and 5 related to 'suicide'. The main messages in the topics are as follows. First, users honestly expressed online the self-harm and suicide experiences that are difficult to express offline, and used SNS as a channel for requesting help. Second, posts related to 'self-injury' and 'suicide' had both common and distinguishing characteristics. While topics related to 'self-injury' mainly revealed the emotional control and interpersonal functions of self-harm, messages related to 'suicide' more clearly addressed suicide prevention and social problems. These results are meaningful in that the opinions of people who have experienced self-harm and suicide incidents and the public voice on self-harm and suicide-related issues could be better understood, and in that the study seeks effective prevention and intervention measures for self-harm and suicide issues.

Analysis of trends in domestic research on addiction using text mining and CONCOR (텍스트마이닝과 CONCOR을 활용한 중독 관련 국내 연구 동향 분석)

  • Sol-Ji Lee;Ki-Hyok Youn
    • Journal of Internet of Things and Convergence
    • /
    • v.9 no.6
    • /
    • pp.99-110
    • /
    • 2023
  • This study analyzed 817 articles published in Korean professional journals over the three years from 2020 to 2022, using text mining techniques to identify trends in addiction research in Korea and explore directions for its development. The results are as follows. First, among the top keywords, online-related addiction topics such as smartphones, games, the Internet, gambling, and relationship addiction were prominent. Second, the TF-IDF analysis showed that many studies on behavioral addictions such as smartphone, game, Internet, and work addiction have been conducted over the past three years, and in particular that there are many studies on problems such as smartphone, game, and Internet addiction that are not yet clinically diagnosed as addictions. This matches the result of the word frequency analysis, and it can be interpreted to mean that recent studies have covered a more diverse range of addiction problems. Third, the 2-gram analysis shows that words corresponding mainly to behavioral addiction, such as smartphones, games, and the Internet, appear side by side with the keyword addiction, and that, among them, words paired with smartphones are mentioned especially often in research papers. Fourth, the CONCOR analysis yielded five clusters: studies on universal addiction issues such as alcohol use disorders and the Internet, studies of recovery from drug and gambling addiction, studies on mobile device and media addiction, studies on the latest trends related to behavioral addiction, and other addiction issues. Finally, based on these results, a direction for future addiction-related research was suggested.
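
The TF-IDF and 2-gram steps named in this abstract can be illustrated with a minimal sketch. It uses one common TF-IDF variant (relative term frequency times the natural log of the inverse document frequency); the abstract does not say which weighting the study used. The token lists and the function names `tf_idf` and `bigrams` are hypothetical, and documents are assumed to be non-empty and already tokenized.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score each term in each tokenized document with tf * ln(N / df)."""
    n = len(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()})
    return scores


def bigrams(doc):
    """Count adjacent word pairs (2-grams) in one tokenized document."""
    return Counter(zip(doc, doc[1:]))


# Hypothetical keyword lists standing in for article titles or abstracts.
docs = [
    ["smartphone", "addiction", "adolescent"],
    ["game", "addiction", "adolescent"],
    ["gambling", "addiction", "recovery"],
]
scores = tf_idf(docs)
pairs = bigrams(docs[0])
print(scores[0])
print(pairs)
```

A term that appears in every document, such as "addiction" here, gets a score of zero under this variant, which is why TF-IDF surfaces the distinguishing terms rather than the most frequent ones.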

Impact of Semantic Characteristics on Perceived Helpfulness of Online Reviews (온라인 상품평의 내용적 특성이 소비자의 인지된 유용성에 미치는 영향)

  • Park, Yoon-Joo;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.3
    • /
    • pp.29-44
    • /
    • 2017
  • In Internet commerce, consumers are heavily influenced by product reviews written by other users who have already purchased the product. However, as product reviews accumulate, it takes a lot of time and effort for consumers to check the massive number of reviews individually. Moreover, carelessly written product reviews actually inconvenience consumers. Thus many online vendors provide mechanisms to identify the reviews that customers perceive as most helpful (Cao et al. 2011; Mudambi and Schuff 2010). For example, some online retailers, such as Amazon.com and TripAdvisor, allow users to rate the helpfulness of each review and use this feedback to rank and re-order reviews. However, many reviews have little or no feedback, which makes it hard to identify their helpfulness. Also, feedback takes time to accumulate, so newly written reviews do not have enough of it. For example, only 20% of the reviews in the Amazon Review Dataset (Mcauley and Leskovec, 2013) have more than 5 feedback votes (Yan et al, 2014). The purpose of this study is to analyze the factors affecting the usefulness of online product reviews and to derive a forecasting model that selectively provides product reviews that can be helpful to consumers. To do this, we extracted various linguistic, psychological, and perceptual elements from product reviews using text-mining techniques and identified the determinants among these elements that affect the usefulness of product reviews. In particular, considering that the characteristics of product reviews and the determinants of usefulness can differ between apparel products (which are experience goods) and electronic products (which are search goods), the characteristics of the product reviews were compared for each product group and the determinants were established for each. This study used 7,498 apparel product reviews and 106,962 electronic product reviews from Amazon.com.
To understand a review text, we first extract linguistic and psychological characteristics such as word count and the levels of emotional tone and analytical thinking embedded in the text, using the widely adopted text analysis software LIWC (Linguistic Inquiry and Word Count). We then explore the descriptive statistics of the review text for each category and statistically compare their differences using a t-test. Lastly, we perform regression analysis using the data mining software RapidMiner to find the determinant factors. Comparing the review characteristics of electronic products and apparel products, we found that reviewers used more words as well as longer sentences when writing reviews for electronic products. As for content characteristics, the electronic product reviews included many analytic words, carried more clout, and related more to cognitive processes (CogProc) than the apparel product reviews, in addition to including many words expressing negative emotions (NegEmo). On the other hand, the apparel product reviews included more personal, authentic, positive emotions (PosEmo) and perceptual processes (Percept) than the electronic product reviews. Next, we analyzed the determinants of the usefulness of product reviews in the two product groups. In both product groups, reviews perceived as useful had high product ratings from reviewers and contained a larger number of total words, many expressions involving perceptual processes, and fewer negative emotions. In addition, apparel product reviews with a large number of comparative expressions, a low expertise index, and concise content with fewer words in each sentence were perceived to be useful.
In the case of electronic product reviews, those that were analytical with a high expertise index, along with containing many authentic expressions, cognitive processes, and positive emotions (PosEmo) were perceived to be useful. These findings are expected to help consumers effectively identify useful product reviews in the future.
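
The group comparison described in this abstract can be illustrated with a minimal sketch that computes a t statistic for the difference in mean word count between two review groups. The abstract says only that a t-test was used; Welch's unequal-variance form is an assumption here, and the word counts and the function name `welch_t` are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference between the means of two samples."""
    # variance() is the sample variance (n - 1 in the denominator).
    se_squared = variance(a) / len(a) + variance(b) / len(b)
    return (mean(a) - mean(b)) / math.sqrt(se_squared)


# Hypothetical word counts per review, not figures from the study.
electronics = [120, 95, 140, 110]
apparel = [40, 55, 35, 50]
t_stat = welch_t(electronics, apparel)
print(t_stat)
```

A positive statistic means the first group has the larger mean; turning it into a p-value additionally requires the Welch-Satterthwaite degrees of freedom and a t distribution, which a statistics package would normally supply.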

Online news-based stock price forecasting considering homogeneity in the industrial sector (산업군 내 동질성을 고려한 온라인 뉴스 기반 주가예측)

  • Seong, Nohyoon;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.1-19
    • /
    • 2018
  • Since forecasting stock movements is an important issue both academically and practically, studies of stock price prediction have been actively conducted. Stock price forecasting research is classified by whether it uses structured or unstructured data, and in more detail into technical analysis, fundamental analysis, and media effect analysis. In the big data era, research on stock price prediction that incorporates big data is actively underway. Working with large amounts of data, this research mainly relies on machine learning techniques. In particular, methods that incorporate the effects of media have recently attracted attention, and studies that analyze online news and use it to forecast stock prices are becoming mainstream. Previous studies that predict stock prices from online news mostly perform sentiment analysis of news, building a separate corpus for each company and a dictionary that predicts stock prices by recording responses according to past stock prices. Existing studies have therefore examined the impact of online news on individual companies. For example, stock movements of Samsung Electronics are predicted using only online news about Samsung Electronics. In addition, methods that consider influences among highly related companies have also been studied recently. For example, stock movements of Samsung Electronics are predicted using news about Samsung Electronics and a highly related company such as LG Electronics. These previous studies examine the effects of news from a homogeneous industrial sector on the individual company. In them, homogeneous industries are classified according to the Global Industry Classification Standard. In other words, the existing studies assumed that industries grouped by the Global Industry Classification Standard are homogeneous.
However, existing studies are limited in that they do not take into account influential companies with high relevance, nor do they reflect the heterogeneity that exists within the same Global Industry Classification Standard sector. Our examination of various sectors shows that some industrial sectors are not homogeneous groups. To overcome these limitations, our study proposes a methodology that uses k-means clustering to reflect the heterogeneous effects of the industrial sector on stock prices. Multiple Kernel Learning is mainly used to integrate data with various characteristics; it has several kernels, each of which receives different data and makes predictions from it. We used Multiple Kernel Learning to incorporate the effects of the target firm and its relevant firms simultaneously. Each kernel was assigned to predict stock prices from financial news variables of either the target firm or an industrial group produced by the k-means cluster analysis. To show that the proposed methodology is appropriate, experiments were conducted on three years of online news and stock prices. The results are as follows. (1) We confirmed that information from the industrial sectors related to the target company also contains meaningful information for predicting the target company's stock movements, and that the machine learning algorithm has better predictive power when the news of relevant companies and the target company's news are considered together. (2) It is important to predict stock movements with a number of clusters that varies according to the level of homogeneity in the industrial sector. In other words, when stock prices are homogeneous within an industrial sector, it is important to use the relational effect at the level of the industry group without clustering, or with a small number of clusters.
When stock prices are heterogeneous within an industry group, it is important to cluster the firms into groups. This study contributes by demonstrating that firms classified under the Global Industry Classification Standard are heterogeneous and by suggesting that relevance should be defined through machine learning and statistical analysis rather than simply by the Global Industry Classification Standard. It also contributes by demonstrating the efficiency of a prediction model that reflects this heterogeneity.
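
The clustering step in this abstract can be illustrated with a minimal k-means sketch that splits the firms of one sector into more homogeneous groups. It assumes each firm is described by a short vector of past returns; the abstract does not specify the features used. The firm names, the numbers, and the choice of the first k points as initial centroids are hypothetical simplifications, and the Multiple Kernel Learning stage is not reproduced.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(p, q))


def kmeans(points, k, iterations=20):
    """Plain k-means; the first k points serve as the initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


# Hypothetical two-period return vectors for four firms in one sector.
returns = {
    "FIRM_A": (0.020, 0.010),
    "FIRM_C": (-0.030, -0.020),
    "FIRM_B": (0.021, 0.012),
    "FIRM_D": (-0.028, -0.025),
}
clusters = dict(zip(returns, kmeans(list(returns.values()), k=2)))
print(clusters)
```

Under this setup, news from firms that share a cluster with the target firm would be the input for one kernel, while a sector judged homogeneous could be left as a single group, in line with result (2) of the abstract.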

Characterizing Patterns of Experience of Harmful Shops among Adolescents Using Decision Tree Models (데이터마이닝을 이용한 청소년 유해업소 출입경험에 영향을 주는 요인)

  • Sohn, Aeree
    • Korean Journal of Health Education and Promotion
    • /
    • v.31 no.3
    • /
    • pp.15-26
    • /
    • 2014
  • Objective: This study was conducted to explore a predictive model of middle and high school students' experience of visiting harmful shops. Methods: The survey was conducted online using a self-administered questionnaire via the homepage of the education ministry's student health information center. Participants were 1,888 middle school students and 1,563 high school students from 107 schools in Korea. The collected data were processed with the SPSS Classification Trees 18.0 program and examined using a data mining decision tree model. Results: In this study, 6.9% of all subjects were found to have visited harmful sex-industry establishments and 81.8% harmful game establishments. Smoking, living with parents, and school grade were significant predictors of visiting harmful sex-industry establishments. Perceived disruption of study, drinking, living with parents, stress, and satisfaction with school life were significant predictors of visiting harmful game establishments. Conclusions: These results suggest that an educational approach tailored to these conditions should be developed to prevent access to harmful shops.

Visualization of movie recommendation system using the sentimental vocabulary distribution map

  • Ha, Hyoji;Han, Hyunwoo;Mun, Seongmin;Bae, Sungyun;Lee, Jihye;Lee, Kyungwon
    • Journal of the Korea Society of Computer and Information
    • /
    • v.21 no.5
    • /
    • pp.19-29
    • /
    • 2016
  • This paper suggests a method to refine massive collective intelligence data and visualize it with a multilevel sentiment network, so that information can be understood in an intuitive and semantic way. For this study, we first calculated the frequency of sentiment words in each movie review. Second, we designed a heatmap visualization to effectively reveal the main emotions in each online movie review. Third, we formed a Sentiment-Movie Network combining an MDS map and a social network in order to fix the movie network topology, while creating a network graph that enables the clustering of similar nodes. Finally, we evaluated our approach to verify whether it actually improves user cognition in a multilevel analysis experience compared to the existing network system, and concluded that our method provides an improved user experience in terms of cognition and is appropriate as an alternative method for semantic understanding.
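
The first step in this abstract, counting sentiment words per review, can be illustrated with a minimal sketch that produces one row of counts per movie, which is the kind of matrix a heatmap would display. The lexicon, the review texts, and the function name `sentiment_frequencies` are hypothetical; the paper's actual sentiment vocabulary is not given in the abstract.

```python
import re
from collections import Counter


def sentiment_frequencies(reviews, lexicon):
    """Return {title: [count of each lexicon word]} with counts in lexicon order."""
    rows = {}
    for title, text in reviews.items():
        # Lowercase and keep alphabetic tokens so that punctuation is ignored.
        words = Counter(re.findall(r"[a-z']+", text.lower()))
        rows[title] = [words[w] for w in lexicon]
    return rows


# Hypothetical lexicon and reviews.
lexicon = ["happy", "sad", "scary"]
reviews = {
    "Movie A": "Happy ending, happy cast, slightly sad middle.",
    "Movie B": "Scary and sad.",
}
rows = sentiment_frequencies(reviews, lexicon)
for title, row in rows.items():
    print(title, row)
```

Each row can also serve as a feature vector for the movie, so distances between rows could feed an MDS layout of the kind the abstract describes.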