• Title/Summary/Keyword: frequency-based method


Multi-Dimensional Analysis Method of Product Reviews for Market Insight (마켓 인사이트를 위한 상품 리뷰의 다차원 분석 방안)

  • Park, Jeong Hyun;Lee, Seo Ho;Lim, Gyu Jin;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.57-78 / 2020
  • With the development of the Internet, consumers can easily check product information through E-Commerce. Product reviews used in the purchasing process are based on user experience, allowing consumers to act as producers of information as well as readers of it. This can increase the efficiency of purchasing decisions from the consumer's perspective, and from the seller's point of view it can help develop products and strengthen their competitiveness. However, it takes a lot of time and effort for consumers to grasp the overall assessment of the products they want to compare, and the assessment dimensions they consider important, by reading the vast number of product reviews offered by E-Commerce. This is because product reviews are unstructured information, and it is difficult to read the sentiment and assessment dimension of a review immediately. For example, consumers who want to purchase a laptop would like to check the assessment of the compared products on each dimension, such as performance, weight, delivery, speed, and design. Therefore, in this paper, we propose a method to automatically generate multi-dimensional product assessment scores from the reviews of the products to be compared. The method presented in this study consists largely of two phases: a pre-preparation phase and an individual product scoring phase. In the pre-preparation phase, a dimensioned classification model and a sentiment analysis model are created from the reviews of a large product category. By combining word embedding and association analysis, the dimensioned classification model addresses a limitation of existing studies, in which word embedding methods for finding the relevance between dimensions and words consider only the distance between words in sentences. 
Sentiment analysis models generate CNN models by organizing learning data tagged with positives and negatives on a phrase unit for accurate polarity detection. Through this, the individual product scoring phase applies the models pre-prepared for the phrase unit review. Multi-dimensional assessment scores can be obtained by aggregating them by assessment dimension according to the proportion of reviews organized like this, which are grouped among those that are judged to describe a specific dimension for each phrase. In the experiment of this paper, approximately 260,000 reviews of the large category product group are collected to form a dimensioned classification model and a sentiment analysis model. In addition, reviews of the laptops of S and L companies selling at E-Commerce are collected and used as experimental data, respectively. The dimensioned classification model classified individual product reviews broken down into phrases into six assessment dimensions and combined the existing word embedding method with an association analysis indicating frequency between words and dimensions. As a result of combining word embedding and association analysis, the accuracy of the model increased by 13.7%. The sentiment analysis models could be seen to closely analyze the assessment when they were taught in a phrase unit rather than in sentences. As a result, it was confirmed that the accuracy was 29.4% higher than the sentence-based model. Through this study, both sellers and consumers can expect efficient decision making in purchasing and product development, given that they can make multi-dimensional comparisons of products. In addition, text reviews, which are unstructured data, were transformed into objective values such as frequency and morpheme, and they were analysed together using word embedding and association analysis to improve the objectivity aspects of more precise multi-dimensional analysis and research. 
This will be an attractive analysis model, not only enabling more effective service deployment amid the evolving E-Commerce market and fierce competition, but also satisfying both sellers and consumers.
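
The aggregation step described in this abstract can be illustrated with a minimal sketch. It assumes that each review phrase has already been assigned one assessment dimension and one polarity by the two models, and that the score of a dimension is simply the share of positive phrases; the abstract does not give the exact scoring formula, so the function name `dimension_scores` and the example data are hypothetical.

```python
from collections import defaultdict


def dimension_scores(phrases):
    """Aggregate phrase-level polarity into one score per assessment dimension.

    phrases: iterable of (dimension, polarity) pairs, polarity in {"pos", "neg"}.
    Returns the share of positive phrases per dimension (an assumed scoring rule).
    """
    counts = defaultdict(lambda: [0, 0])  # dimension -> [positive count, total count]
    for dimension, polarity in phrases:
        counts[dimension][1] += 1
        if polarity == "pos":
            counts[dimension][0] += 1
    return {dim: pos / total for dim, (pos, total) in counts.items()}


# Hypothetical phrase-level outputs for one laptop.
example = [
    ("performance", "pos"), ("performance", "pos"), ("performance", "neg"),
    ("weight", "neg"), ("design", "pos"),
]
scores = dimension_scores(example)
print(scores)
```

Scores computed this way for two products can be compared dimension by dimension, which is the kind of multi-dimensional comparison the abstract describes.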

The Prediction of Purchase Amount of Customers Using Support Vector Regression with Separated Learning Method (Support Vector Regression에서 분리학습을 이용한 고객의 구매액 예측모형)

  • Hong, Tae-Ho;Kim, Eun-Mi
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.213-225 / 2010
  • With the rapid growth of information technology, data mining has empowered the managers in charge of marketing tasks to present personalized and differentiated marketing programs to their customers. Most studies on customer response have focused on predicting whether customers would respond to a marketing promotion, as marketing managers have been eager to identify who would respond. Many studies utilizing data mining have tried to resolve binary decision problems such as bankruptcy prediction, network intrusion detection, and fraud detection in credit card usage. The prediction of customer response has been studied with similar methods because it is a kind of dichotomous decision problem. In addition, a number of competitive data mining techniques such as neural networks, SVM (support vector machine), decision trees, logit, and genetic algorithms have been applied to the prediction of customer response to marketing promotions. Marketing managers have also tried to classify their customers with quantitative measures such as recency, frequency, and monetary value acquired from their transaction database. These measures indicate how recently customers purchased, how frequently they purchased in a period, and how much they spent per purchase. Using segmented customers, we proposed an approach that can differentiate customers within the same rating. Our approach employed support vector regression to forecast the purchase amount of customers for each customer rating. Our study used a sample of 41,924 customers extracted from the DMEF04 Data Set who purchased at least once in the last two years. We classified customers from the first rating to the fifth rating based on the purchase amount after a marketing promotion. 
Here, the first rating consists of customers with a large purchase amount and the fifth rating consists of non-respondents to the promotion. Our proposed model forecasted the purchase amount of customers in the same rating, so marketing managers could make a differentiated and personalized marketing program for each customer even though they belonged to the same rating. In addition, we proposed a more efficient learning method that separates the learning samples. We employed two learning methods to compare the performance of the proposed learning method with a general learning method for SVRs. LMW (Learning Method using Whole data for purchasing customers) is a general learning method for forecasting the purchase amount of customers. We also proposed LMS (Learning Method using Separated data for classification purchasing customers), a method that makes four different SVR models for each class of customers. To evaluate the performance of the models, we calculated MAE (Mean Absolute Error) and MAPE (Mean Absolute Percent Error) for each model's prediction of the purchase amount of customers. In LMW, the overall performance was 0.670 MAPE and the best performance was 0.327 MAPE. In general, the performance of the proposed LMS model was superior to that of the LMW model. In LMS, the best performance was 0.275 MAPE. The performance of LMS was higher than that of LMW in each class of customers. Comparing our proposed method LMS with LMW, our proposed model performed better at forecasting the purchase amount of customers in each class. In addition, our approach will be useful for marketing managers when they need to target customers for their promotion. Even if customers belong to the same class, marketing managers could offer them a differentiated and personalized marketing promotion.
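
The two error measures named in this abstract can be written out as a short sketch. MAPE is expressed here as a fraction rather than a percentage, on the assumption that this matches the reported values such as 0.275; the sample purchase amounts are made up for illustration.

```python
def mae(actual, predicted):
    """Mean Absolute Error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean Absolute Percent Error as a fraction; every actual value must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical purchase amounts and forecasts for three customers.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 180.0, 400.0]

print(mae(actual, predicted))   # mean of 10, 20, 0
print(mape(actual, predicted))  # mean of 0.1, 0.1, 0.0
```

Because MAPE divides by the actual value, it is undefined for non-respondents with a purchase amount of zero, which is one practical reason to evaluate purchasing customers separately.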

Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon (국내 주요 10대 기업에 대한 국민 감성 분석: 다범주 감성사전을 활용한 빅 데이터 접근법)

  • Kim, Seo In;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.45-69 / 2016
  • Recently, sentiment analysis using open Internet data has been actively performed for various purposes. As online communication channels become popular, companies try to capture public sentiment toward them from open online information sources. This research was conducted to analyze public sentiment toward the Korean Top-10 companies using a multi-categorical sentiment lexicon. Whereas existing studies on measuring public sentiment with a big data approach classify sentiment into dimensions, this research classifies public sentiment into multiple categories. The dimensional sentiment structure has been commonly applied in sentiment analysis for various applications because it is academically proven and has the clear advantage of capturing the degree of sentiment and the interrelation of each dimension. However, the dimensional structure is not effective for measuring public sentiment because human sentiment is too complex to be divided into a few dimensions. In addition, ordinary people need special training to express their feelings in a dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training when they feel. People do not express their feelings in terms of a dimensional structure such as positive/negative or active/passive; rather, they express them in terms of categorical sentiments such as sadness, rage, and happiness. That is, the categorical approach to sentiment analysis is more natural than the dimensional approach. Accordingly, this research suggests a multi-categorical sentiment structure as an alternative way to measure social sentiment from the public's point of view. The multi-categorical sentiment structure classifies sentiments in the way that ordinary people do, although it may contain some subjectivity. 
In this research, nine categories: 'Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom' and 'Pain' are used as multi-categorical sentiment structure. To capture public sentiment of Korean Top-10 companies, Internet news data of the companies are collected over the past 25 months from a representative Korean portal site. Based on the sentiment words extracted from previous researches, we have created a sentiment lexicon, and analyzed the frequency of the words coming up within the news data. The frequency of each sentiment category was calculated as a ratio out of the total sentiment words to make ranks of distributions. Sentiment comparison among top-4 companies, which are 'Samsung', 'Hyundai', 'SK', and 'LG', were separately visualized. As a next step, the research tested hypothesis to prove the usefulness of the multi-categorical sentiment lexicon. It tested how effective categorial sentiment can be used as relative comparison index in cross sectional and time series analysis. To test the effectiveness of the sentiment lexicon as cross sectional comparison index, pair-wise t-test and Duncan test were conducted. Two pairs of companies, 'Samsung' and 'Hanjin', 'SK' and 'Hanjin' were chosen to compare whether each categorical sentiment is significantly different in pair-wise t-test. Since category 'Sadness' has the largest vocabularies, it is chosen to figure out whether the subgroups of the companies are significantly different in Duncan test. It is proved that five sentiment categories of Samsung and Hanjin and four sentiment categories of SK and Hanjin are different significantly. In category 'Sadness', it has been figured out that there were six subgroups that are significantly different. To test the effectiveness of the sentiment lexicon as time series comparison index, 'nut rage' incident of Hanjin is selected as an example case. 
The term frequency of sentiment words in the month when the incident happened is compared with the term frequency in the month before the event. The sentiment categories were regrouped into positive and negative sentiment to determine whether the event actually had a negative impact on public sentiment toward the company. The difference in each category was visualized, and the variation in the word list for the sentiment 'Rage' was shown to make the result more concrete. As a result, there was a large before-and-after difference in the sentiment that ordinary people feel toward the company. Both hypotheses were supported with statistical significance, and therefore sentiment analysis in the business area using multi-categorical sentiment lexicons is persuasive. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when assessing public sentiment in a business environment.
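
The lexicon-based frequency ratio described in this abstract can be sketched as follows. The tiny English lexicon and token list are hypothetical stand-ins for the study's Korean sentiment lexicon and news data, and the function name `category_ratios` is introduced only for illustration.

```python
from collections import Counter


def category_ratios(tokens, lexicon):
    """Return each sentiment category's share of all matched sentiment words,
    ordered from the most to the least frequent category."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.most_common()}


# Hypothetical lexicon mapping sentiment words to categories.
lexicon = {"furious": "Anger", "angry": "Anger", "grief": "Sadness", "glad": "Happiness"}
tokens = ["angry", "market", "grief", "furious", "glad", "angry"]

ratios = category_ratios(tokens, lexicon)
print(ratios)
```

Running the same function on the tokens of two different months gives the before-and-after comparison used for the time series case.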

Analysis of the Korea Traditional Colors within the Spatial Arrangement and Form of the Traditional Garden of Seyeonjeong (보길도 세연정(洗然庭)의 공간구조 형식에 내재한 전통색채 분석)

  • Han, Hee-Jeong;Cho, Se-Hwan
    • Journal of the Korean Institute of Traditional Landscape Architecture / v.32 no.4 / pp.14-23 / 2014
  • The purpose of this study is to help establish the credibility of a methodology for analyzing the appearance of traditional colors and interpreting the meaning of those appearances, by analyzing the spatial construction and configuration of Seyeonjeong and the traditional colors that appear in the spatial elements of its scenery components. We conducted a literature review on the traditional colors, the background of the creation of Seyeonjeong, etc. For the empirical analysis, we took the scenery and spatial elements in poems such as Eobusasisa and O-u-ga, and the contents of poems related to ojeongsaek (the five Korean traditional colors) based on the Yin-Yang and Five Elements ideology. In particular, after dividing the spatial elements appearing in Seyeonjeong into visual, synesthetic, and symbolic/cognitive spatial elements, we further divided the visual space into the positions and directions of the spaces and the scenery of the season; the synesthetic space into seasons, time, and the five senses; and the symbolic/cognitive space into chiljeong (the seven passions) and sadan (the four clues). We then analyzed the correlation between the intention behind the garden's creation and the meaning of the spaces through an analysis of the ojeongsaek system for each spatial element. Firstly, the spatial structure and format of Seyeonjeong can be divided along two directional axes, southeast and northwest, according to the flat form of Seyeonjeong's rectangular palace, with Seyeonjeong as the center. Secondly, among the spatial component elements, the frequencies of appearance of the traditional colors in Seyeonjeong are 33.2% for white, 20.8% for blue, 20.8% for black, 18.7% for red, and 6.3% for yellow. 
Thirdly, based on the analysis of the traditional colors, the most frequent appearance of 'white' leaves room for the interpretation that Seyeonjeong was created to enjoy secular living without lingering political feelings, so that the high mountains remain clear and clean. Also, the predominant frequency of appearance of blue, the similar frequencies of black and red, and the least frequent appearance of yellow agree with, or can at least be interpreted in relation to, Yun Seon-do's intention to create Seyeonjeong not for political rank or power but as a place to enjoy nature, through which he could build on his knowledge and lead the rest of his life as a noble being through pastimes such as dancing and writing poems. Fourthly, these interpretations of the frequency of appearance of the traditional colors of Seyeonjeong show the reliability, validity, and consistency of the methodology of analyzing the frequency of appearance of traditional colors and interpreting their meanings, given that the color white also appears most frequently in Soswewon and that the life of Soswewon's creator, Yangsanbo, can be interpreted in a similar way. Above all, this study is significant in that we proposed a theory for the analysis and interpretation of traditional colors in a traditional landscape space. Moreover, it is of great significance to discover that traditional colors appear in traditional spaces and that this can be used as a methodological framework to interpret matters such as the intention behind the creation of buildings and architecture.

A Study of the Factors Influencing Behavioral Intention for Organic Food: Using the Theory of Planned Behavior (유기농식품에 대한 소비자의 구매의도 영향요인 분석 계획적 행동이론을 중심으로)

  • Choi, Hwa-Sun;Lee, Kwang-Keun
    • Journal of Distribution Science / v.10 no.2 / pp.53-62 / 2012
  • Well-being is a reflection of current sociocultural trends that focus on the quality of life based on economic growth. Furthermore, organic food is believed to help people maintain good health, which leads to increased consumption of organic foods. Therefore, consumer interest in organic food is increasing, causing its market to grow, and this trend will be maintained in the future. The abuse of agricultural pesticides, gene manipulation, and bovine spongiform encephalopathy have caused consumers to worry about food safety. The well-being trend has also contributed to consumers' growing interest in organic food and organic agricultural products. A consumer's choice of food is a complex process affected by various factors. In particular, organic food is considered an individualistic merit good, considering the consumers' preferences related to certification policies. Therefore, various factors such as personal characteristics and sense of value could affect consumers' decisions. This research focused on an analysis of the factors influencing consumers' purchasing intention for organic food against the background of an increase in organic food consumption. The research method was based on the theory of planned behavior (TPB). Factors such as consumer characteristics regarding food consumption, purchasing frequency, and other factors affecting purchasing intention were presented. The hypothesis was set using prior research and stated that it is easier to forecast purchasing intentions by combining the theory of planned behavior with the personal characteristics of consumers. The results show that two dimensions, attitude and perceived behavioral control, have a statistically significant influence on purchasing intention. It can be said that a positive attitude toward organic foods in particular increases the likelihood of purchasing intention. 
In addition, consumers who consume more organic food products are more likely to have positive attitudes, and, in the past, purchasing frequency has positively influenced purchasing intention of organic foods. Consumers' negative feelings about the non-purchase of organic foods also showed a negative influence on purchasing intentions. In other words, even though consumers feel uncomfortable when not consuming organic food products, they do not try to purchase such products because of this feeling of discomfort. Furthermore, the subjective norm and the behavioral control of food-related involvement do not have a statistically significant influence on the purchasing intention or attitudes. This research verified the influence of factors related to purchasing intention. This study has several limitations: (1) even though consumers' responses can change based on the type of food, the types of food were not classified in this study; (2) future studies are necessary to analyze the attitudes of consumers on the basis of their purchasing experiences with organic foods.


Estimation of grid-type precipitation quantile using satellite based re-analysis precipitation data in Korean peninsula (위성 기반 재분석 강수 자료를 이용한 한반도 격자형 확률강수량 산정)

  • Lee, Jinwook;Jun, Changhyun;Kim, Hyeon-joon;Byun, Jongyun;Baik, Jongjin
    • Journal of Korea Water Resources Association / v.55 no.6 / pp.447-459 / 2022
  • This study estimated the grid-type precipitation quantile for the Korean Peninsula using PERSIANN-CCS-CDR (Precipitation Estimation from Remotely Sensed Information using Artificial Neural Networks-Cloud Classification System-Climate Data Record), a satellite-based re-analysis precipitation dataset. The period considered is a total of 38 years from 1983 to 2020. The spatial resolution of the data is 0.04° and the temporal resolution is 3 hours. For the probability distribution, the Gumbel distribution, which is generally used for frequency analysis, was used, and the probability weighted moment method was applied to estimate the parameters. Durations ranged from 3 hours to 144 hours, and return periods from 2 years to 500 years were considered. The results were compared with the precipitation quantiles estimated from the precipitation data of the Automated Synoptic Observing System (ASOS) weather stations. As a result, the parameter estimates of the Gumbel distribution from PERSIANN-CCS-CDR showed a pattern similar to the ASOS results as the duration increased, while the estimates of precipitation quantiles showed a rather large difference when the duration was short. However, when the duration was 18 h or longer, the difference decreased to less than about 20%. In addition, when the difference between the results for South and North Korea was examined, it was confirmed that the location parameter of the Gumbel distribution was markedly different. As the duration increased, the precipitation quantile in North Korea became relatively smaller than that in South Korea: it was 84% of that of South Korea for a duration of 3 h, and 70-75% of that of South Korea for a duration of 144 h.
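
The estimation chain named in this abstract (Gumbel distribution, probability weighted moments, return period) can be sketched for a single grid cell. This is a minimal illustration using the standard unbiased sample PWM estimators and the textbook Gumbel relations; the annual maximum values are invented, and the study's own processing of the satellite data is not reproduced here.

```python
import math

EULER_GAMMA = 0.5772156649015329


def gumbel_pwm_fit(annual_maxima):
    """Estimate Gumbel location and scale from sample probability weighted moments.

    Requires at least two values. b0 and b1 are the unbiased sample PWMs.
    """
    x = sorted(annual_maxima)
    n = len(x)
    b0 = sum(x) / n
    # b1 = (1/n) * sum_{j=1..n} ((j-1)/(n-1)) * x_(j), with x sorted ascending
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    l2 = 2 * b1 - b0                      # second L-moment
    scale = l2 / math.log(2)
    location = b0 - EULER_GAMMA * scale
    return location, scale


def gumbel_quantile(location, scale, return_period):
    """Precipitation quantile for a return period T (years), with T > 1."""
    non_exceedance = 1 - 1 / return_period
    return location - scale * math.log(-math.log(non_exceedance))


# Hypothetical annual maximum precipitation (mm) for one duration at one grid cell.
maxima = [50.0, 62.0, 71.0, 80.0, 95.0, 110.0, 130.0]
loc, scale = gumbel_pwm_fit(maxima)
for T in (2, 50, 500):
    print(T, round(gumbel_quantile(loc, scale, T), 1))
```

Repeating the fit for every grid cell and every duration would give a gridded quantile map of the kind the abstract refers to.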

Automatic Quality Evaluation with Completeness and Succinctness for Text Summarization (완전성과 간결성을 고려한 텍스트 요약 품질의 자동 평가 기법)

  • Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.125-148 / 2018
  • Recently, as the demand for big data analysis increases, cases of analyzing unstructured data and using the results are also increasing. Among the various types of unstructured data, text is used as a means of communicating information in almost all fields. In addition, many analysts are interested in text because the amount of data is very large and it is relatively easy to collect compared to other unstructured and structured data. Among the various text analysis applications, document classification, which classifies documents into predetermined categories; topic modeling, which extracts major topics from a large number of documents; sentiment analysis or opinion mining, which identifies emotions or opinions contained in texts; and text summarization, which summarizes the main contents of one or several documents, have been actively studied. In particular, text summarization is actively applied in business through news summary services, privacy policy summary services, etc. In addition, much research has been done in academia on the extraction approach, which selectively provides the main elements of the document, and the abstraction approach, which extracts elements of the document and composes new sentences by combining them. However, techniques for evaluating the quality of automatically summarized documents have not made as much progress as techniques for automatic text summarization. Most existing studies on the quality evaluation of summarization carried out manual summarization of documents, used the results as reference documents, and measured the similarity between the automatic summary and the reference document. Specifically, automatic summarization is performed on the full text through various techniques, and the result is compared with the reference document, which is an ideal summary, to measure the quality of the automatic summarization. 
Reference documents are provided in two major ways. The most common way is manual summarization, in which a person creates an ideal summary by hand. Since this method requires human intervention in preparing the summary, writing the summary takes a lot of time and cost, and there is a limitation that the evaluation result may differ depending on the subjectivity of the summarizer. Therefore, in order to overcome these limitations, attempts have been made to measure the quality of summary documents without human intervention. As a representative attempt, a method has recently been devised that reduces the size of the full text and measures the similarity between the reduced full text and the automatic summary. In this method, the more the frequent terms of the full text appear in the summary, the better the quality of the summary. However, since summarization essentially means reducing a lot of content while minimizing content omissions, it is unreasonable to say that a "good summary" based only on frequency is always a "good summary" in its essential meaning. In order to overcome the limitations of this previous approach to summarization evaluation, this study proposes an automatic quality evaluation method for text summarization based on the essential meaning of summarization. Specifically, succinctness is defined as an element indicating how little duplicated content there is among the sentences of the summary, and completeness is defined as an element indicating how little of the content is missing from the summary. In this paper, we propose a method for automatic quality evaluation of text summarization based on the concepts of succinctness and completeness. 
In order to evaluate the practical applicability of the proposed methodology, 29,671 sentences were extracted from TripAdvisor's hotel reviews, the reviews were summarized for each hotel, and the results of experiments evaluating the quality of the summaries according to the proposed methodology are presented. The study also provides a way to integrate completeness and succinctness, which are in a trade-off relationship, into an F-Score, and proposes a method to perform optimal summarization by changing the threshold of the sentence similarity.
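
The idea of folding two competing measures into one F-Score and then choosing a similarity threshold can be sketched as below. The harmonic-mean form is the usual F1 definition and is assumed here, since the abstract does not state the exact formula; the completeness and succinctness values per threshold are invented, and the paper's way of computing them is not reproduced.

```python
def f_score(completeness, succinctness):
    """Harmonic mean of completeness and succinctness (assumed F1-style combination)."""
    if completeness + succinctness == 0:
        return 0.0
    return 2 * completeness * succinctness / (completeness + succinctness)


# Hypothetical (completeness, succinctness) pairs measured at each similarity threshold.
measured = {
    0.3: (0.9, 0.5),
    0.5: (0.8, 0.7),
    0.7: (0.6, 0.9),
}

# Pick the threshold whose summary gives the highest combined score.
best_threshold = max(measured, key=lambda th: f_score(*measured[th]))
print(best_threshold, round(f_score(*measured[best_threshold]), 3))
```

The harmonic mean penalizes a summary that is strong on one measure but weak on the other, which suits two measures in a trade-off relationship.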

Developments of Greenhouse Gas Generation Models and Estimation Method of Their Parameters for Solid Waste Landfills (폐기물매립지에서의 온실가스 발생량 예측 모델 및 변수 산정방법 개발)

  • Park, Jin-Kyu;Kang, Jeong-Hee;Ban, Jong-Ki;Lee, Nam-Hoon
    • KSCE Journal of Civil and Environmental Engineering Research / v.32 no.6B / pp.399-406 / 2012
  • The objective of this research is to develop greenhouse gas generation models and an estimation method for their parameters for solid waste landfills. Two models obtained by differentiating the Modified Gompertz and Logistic models were employed to evaluate two parameters of a first-order decay model, methane generation potential ($L_0$) and methane generation rate constant (k). The parameters were determined by statistical comparison of the gas generation rates predicted by the two models with actual landfill gas collection data. The r-square values obtained from regression analysis between the two data sets were 0.92 for the model obtained by differentiating the Modified Gompertz model and 0.94 for the model obtained by differentiating the Logistic model. This result shows that $L_0$ and k values can be determined by regression analysis if landfill gas collection data are available. Also, new models based on the two models obtained by differentiating the Modified Gompertz and Logistic models were developed to predict greenhouse gas generation from solid waste landfills for which actual generation data are not available. They showed better prediction than the LandGEM model. The frequency distribution of the ratio of Qcs (LFG collection system) to Q (prediction value) was used to evaluate the accuracy of the models. The new models showed higher accuracy than the LandGEM model. Thus, it is concluded that the models developed in this research are suitable for the prediction of greenhouse gas generation from solid waste landfills.
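
To make the roles of $L_0$ and k concrete, the sketch below uses the common first-order decay form for a single batch of waste, Q(t) = k · L0 · M · exp(-k t), and recovers the two parameters from gas rate data by a log-linear least-squares fit. This is a simplified stand-in and not the authors' procedure, which works with derivatives of the Modified Gompertz and Logistic models; the waste mass, the synthetic data, and the function names are all assumptions.

```python
import math


def methane_rate(t, l0, k, mass):
    """First-order decay generation rate for one waste batch: k * L0 * M * exp(-k * t)."""
    return k * l0 * mass * math.exp(-k * t)


def fit_first_order(times, rates, mass):
    """Estimate (L0, k) from rate data using ln Q = ln(k * L0 * M) - k * t."""
    ys = [math.log(q) for q in rates]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(ys) / n
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in zip(times, ys))
             / sum((t - t_mean) ** 2 for t in times))
    intercept = y_mean - slope * t_mean
    k = -slope
    l0 = math.exp(intercept) / (k * mass)
    return l0, k


# Synthetic "collection data" generated with assumed values L0 = 100, k = 0.05, M = 1000.
times = list(range(1, 11))
rates = [methane_rate(t, 100.0, 0.05, 1000.0) for t in times]
l0_hat, k_hat = fit_first_order(times, rates, 1000.0)
print(round(l0_hat, 3), round(k_hat, 4))
```

With real collection data the fit would not be exact, and a goodness-of-fit measure such as r-square, as reported in the abstract, would indicate how well the model describes the measurements.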

Comparison of the Recent Trend of Chemistry Education Research Based on the Analysis of the Domestic and Foreign Journals (국내외 학술지를 토대로 분석한 화학교육 연구의 최근 동향 비교)

  • Han, Jae-Young;Lee, Sang-Chul
    • Journal of the Korean Chemical Society / v.56 no.2 / pp.290-296 / 2012
  • This study analyzed the research papers published in three journals (2 domestic and 1 foreign) in order to understand the recent trend of chemistry education research. We selected Journal of the Korean Chemical Society (JKCS) and Journal of the Korean Association for Science Education (JKASE) as the domestic journals, and Journal of Chemical Education (JCE) as the foreign journal. The papers published from 2000 to 2009 were analyzed. As a result, the chemistry education research themes focused on 'teaching method and education technology', 'learner's characteristics', and 'chemical concept and experiment', in order of frequency. Research on 'curriculum and textbooks' was performed often in JKCS, reflecting the Korean social environment. The most researched chemistry education goal was 'conceptual understanding/change', followed by 'achievement/grade' in JCE, 'experiment/inquiry skill' in JKCS, and 'attitude/interest/motivation' in JKASE. The research subjects were focused on 'middle or high school students' in JKCS, in contrast to 'university students' in JCE. More attention to higher education is required in domestic research. The most frequently used research method was 'survey/examination', followed by 'experimental research' in JCE and JKASE and 'data/material analysis' in JKCS. We discussed the implications for future chemistry education research.

Forecasting the Precipitation of the Next Day Using Deep Learning (딥러닝 기법을 이용한 내일강수 예측)

  • Ha, Ji-Hun;Lee, Yong Hee;Kim, Yong-Hyuk
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.2 / pp.93-98 / 2016
  • For accurate precipitation forecasts the choice of weather factors and prediction method is very important. Recently, machine learning has been widely used for forecasting precipitation, and artificial neural network, one of machine learning techniques, showed good performance. In this paper, we suggest a new method for forecasting precipitation using DBN, one of deep learning techniques. DBN has an advantage that initial weights are set by unsupervised learning, so this compensates for the defects of artificial neural networks. We used past precipitation, temperature, and the parameters of the sun and moon's motion as features for forecasting precipitation. The dataset consists of observation data which had been measured for 40 years from AWS in Seoul. Experiments were based on 8-fold cross validation. As a result of estimation, we got probabilities of test dataset, so threshold was used for the decision of precipitation. CSI and Bias were used for indicating the precision of precipitation. Our experimental results showed that DBN performed better than MLP.
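
The thresholding step and the two verification scores mentioned in this abstract can be sketched as follows. CSI (Critical Success Index) and Bias are computed with their standard contingency-table definitions; the probabilities, observations, and the 0.5 threshold are made-up illustration values, not the study's.

```python
def apply_threshold(probabilities, threshold):
    """Turn predicted precipitation probabilities into yes/no forecasts."""
    return [p >= threshold for p in probabilities]


def csi_and_bias(observed, forecast):
    """CSI = hits / (hits + misses + false alarms);
    Bias = (hits + false alarms) / (hits + misses).
    Assumes at least one observed precipitation event, otherwise Bias is undefined."""
    hits = sum(1 for o, f in zip(observed, forecast) if o and f)
    misses = sum(1 for o, f in zip(observed, forecast) if o and not f)
    false_alarms = sum(1 for o, f in zip(observed, forecast) if not o and f)
    csi = hits / (hits + misses + false_alarms)
    bias = (hits + false_alarms) / (hits + misses)
    return csi, bias


# Hypothetical next-day observations and model probabilities for five days.
observed = [True, True, False, False, True]
probabilities = [0.9, 0.4, 0.6, 0.1, 0.7]

forecast = apply_threshold(probabilities, 0.5)
csi, bias = csi_and_bias(observed, forecast)
print(csi, bias)
```

A Bias above 1 means precipitation is forecast more often than it is observed, and below 1 less often, so the threshold can be tuned with both scores in view.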