Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)
Journal of Intelligence and Information Systems, v.24 no.2, pp.59-83, 2018
With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep-learning-based sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the model. Here, word vectors generally refer to vector representations of the words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, which was used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with well-developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Given the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as the input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (Part-Of-Speech) tags when deriving morpheme vectors to improve the classification accuracy of a deep learning model?
Is it appropriate to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high ratio of homonyms? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarize these issues as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? To address these research questions, we generate various types of morpheme vectors reflecting the questions and then compare the classification accuracy of a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. For the training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 items of Naver News data, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria.
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of data preprocessing, namely, sentence splitting only, or sentence splitting followed by additional spelling and spacing corrections. Third, they vary in the form of input fed into the word vector model: the morphemes are entered either by themselves or with their POS tags attached. The morpheme vectors further vary depending on the range of POS tags considered, the minimum frequency of the morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. The results suggest that using same-domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections in addition to sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. POS tag attachment, which was devised to handle the high proportion of homonyms in Korean, and the minimum frequency threshold for a morpheme to be included do not seem to have any definite influence on the classification accuracy.
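The input variants described in this abstract (morphemes alone or with POS tags attached, a restricted range of POS tags, and a minimum frequency threshold) can be illustrated with a minimal sketch. It is not the authors' pipeline: the analyzed sentences are hand-written, hypothetical examples, the tag names (VA, EC, EF, NNG, JKS) are assumed to follow the Sejong tag set, and the helper names `to_tokens` and `apply_min_frequency` are invented for illustration. In practice the (form, tag) pairs would come from a Korean morphological analyzer.

```python
from collections import Counter

# Hypothetical, hand-written morpheme analyses: (surface form, POS tag) pairs.
# Assumption: tags follow the Sejong tag set (VA = adjective, EC = connective
# ending, EF = final ending, NNG = common noun, JKS = subject particle).
analyzed_reviews = [
    [("예쁘", "VA"), ("고", "EC"), ("좋", "VA"), ("아요", "EF")],
    [("색", "NNG"), ("이", "JKS"), ("예쁘", "VA"), ("어요", "EF")],
]


def to_tokens(sentence, attach_pos=False, allowed_tags=None):
    """Turn one analyzed sentence into the token list fed to a word vector model.

    attach_pos=True emits 'form/TAG' so that homonyms with different tags
    become distinct tokens; allowed_tags restricts the POS range considered.
    """
    tokens = []
    for form, tag in sentence:
        if allowed_tags is not None and tag not in allowed_tags:
            continue  # morpheme falls outside the chosen POS range
        tokens.append(f"{form}/{tag}" if attach_pos else form)
    return tokens


def apply_min_frequency(sentences, min_count):
    """Drop tokens that occur fewer than min_count times in the whole corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return [[tok for tok in sent if counts[tok] >= min_count] for sent in sentences]


plain = [to_tokens(s) for s in analyzed_reviews]
tagged = [to_tokens(s, attach_pos=True) for s in analyzed_reviews]
content_only = [to_tokens(s, allowed_tags={"VA", "NNG"}) for s in analyzed_reviews]
frequent = apply_min_frequency(tagged, min_count=2)
```

Each resulting list of token lists is the kind of corpus a CBOW trainer would consume; the window size of 5 and the dimension of 300 mentioned above would be settings of that trainer, which this sketch deliberately leaves out.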
As relationships between buyers and sellers have become closer and long-term relationships have become more important in B2B markets, the importance of service and service convenience, as well as of the product, has increased. In homogeneous markets, where service offerings are similar and therefore not a key competitive differentiator, providing greater convenience may yield a competitive advantage. Service convenience, as conceptualized by Berry et al. (2002), is defined as consumers' time and effort perceptions related to buying or using a service. For this reason, B2B customers are interested, along with service quality, in how fast the service is provided and how much non-monetary cost, such as time or effort, service convenience saves. Therefore, this study investigates the impact of service convenience on relationship factors such as relationship satisfaction, relationship commitment, and relationship performance. The purpose of this study is to find out whether service convenience can be a new antecedent of relationship quality and relationship performance. In addition, this study examines how the five dimensions of service convenience (decision convenience, access convenience, transaction convenience, benefit convenience, post-benefit convenience) affect customers' relationship satisfaction, relationship commitment, and relationship performance.
Service convenience comprises five fundamental components: decision convenience (the perceived time and effort costs associated with service purchase or use decisions), access convenience (the perceived time and effort costs associated with initiating service delivery), transaction convenience (the perceived time and effort costs associated with finalizing the transaction), benefit convenience (the perceived time and effort costs associated with experiencing the core benefits of the offering), and post-benefit convenience (the perceived time and effort costs associated with reestablishing subsequent contact with the firm). No earlier studies have examined perceived service convenience in the industrial market. Previous studies of service convenience have usually been conducted in the consumer market or have dealt with convenience aspects of the service process. This service convenience measure for the consumer market can be a useful tool for estimating service quality in the B2B market. The conceptualization developed by Berry et al. (2002) reflects a multistage, experiential consumption process in which evaluations of convenience vary at each stage. For this reason, the service convenience measure is well suited to the B2B service environment, which has complex processes and various types of service. In particular, when B2B service is categorized into sequential stages of service delivery, as in Kumar and Kumar (2004), the Berry et al. service convenience measure, which reflects the sequential flow of service delivery, is suitable for establishing B2B service convenience. For this study, data were gathered from respondents who often buy business services and were analyzed by structural equation modeling. The sample size in the present study is 119. Composite reliability values and average variance extracted values were examined for each variable to establish reliability.
We determined whether the measurement model supports convergent validity by CFA, and discriminant validity was assessed by examining the correlation matrix of the constructs. For each pair of constructs, the square root of the average variance extracted exceeded their correlation, thus supporting the discriminant validity of the constructs. Hypotheses were tested using SmartPLS 2.0: we calculated the PLS path values and then applied a bootstrap resampling method. Among the five service convenience constructs, four (decision convenience, transaction convenience, benefit convenience, post-benefit convenience) positively affected customers' relationship satisfaction, relationship commitment, and relationship performance. This result means that service convenience is an important cue for improving the relationship between buyer and seller. One of the five service convenience dimensions, access convenience, does not affect relationship quality and performance, which implies that this dimension of service convenience is not an important factor in cumulative satisfaction. Cumulative satisfaction can be distinguished from transaction-specific customer satisfaction, which is an immediate post-purchase evaluative judgment or an affective reaction to the most recent transactional experience with the firm. Because access convenience minimizes the physical effort associated with initiating an exchange, its effect on relationship satisfaction, which is similar to cumulative satisfaction, may be relatively low compared with its effect on transaction-specific customer satisfaction. Also, B2B firms focus on service quality, price, benefit, follow-up service, and so on more than on convenience of time or place, because it is relatively difficult to change existing transaction partners in the B2B market compared with the consumer market.
In addition, this study, using partial least squares methods, reveals that customers' satisfaction with and commitment to the relationship play a mediating role between service convenience and relationship performance. The result shows that management of and investment in service convenience foster customers' positive relationship satisfaction, which in turn can enhance relationship commitment and relationship performance. In conclusion, service convenience management is an important part of successful relationship performance management, and service convenience is an important antecedent of buyer-seller relationship outcomes such as relationship commitment and relationship performance. Therefore, although competitive service development and service quality improvement are important, it is more important for service providers to enhance service convenience in order to improve relationship performance. Given the pressure to provide increased convenience, it is not surprising that organizations have made significant investments in enhancing the convenience aspect of their product and service offerings.
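The reliability and discriminant validity checks mentioned in this abstract (composite reliability, average variance extracted, and the comparison of the square root of AVE with inter-construct correlations) can be sketched with the standard formulas for standardized loadings. The loadings and the correlation below are made-up illustrative numbers, not the study's data, and the function names are hypothetical.

```python
import math


def average_variance_extracted(loadings):
    """AVE: mean of the squared standardized loadings of one construct."""
    return sum(l * l for l in loadings) / len(loadings)


def composite_reliability(loadings):
    """CR: (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    total = sum(loadings)
    error = sum(1 - l * l for l in loadings)  # error variance of each indicator
    return total * total / (total * total + error)


def discriminant_validity_holds(ave_by_construct, correlations):
    """Check that sqrt(AVE) of both constructs exceeds each pairwise correlation."""
    for (a, b), r in correlations.items():
        smaller_root = min(math.sqrt(ave_by_construct[a]), math.sqrt(ave_by_construct[b]))
        if smaller_root <= abs(r):
            return False
    return True


# Hypothetical standardized loadings and correlation, for illustration only.
loadings = {
    "decision_convenience": [0.8, 0.7, 0.9],
    "relationship_satisfaction": [0.9, 0.8, 0.85],
}
correlations = {("decision_convenience", "relationship_satisfaction"): 0.55}

ave = {name: average_variance_extracted(vals) for name, vals in loadings.items()}
cr = {name: composite_reliability(vals) for name, vals in loadings.items()}
valid = discriminant_validity_holds(ave, correlations)
```

With these illustrative numbers both square roots of AVE are above the correlation of 0.55, so the check passes; raising the correlation above the smaller square root would make it fail.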
I have been working for many years on a series of Korean linguistic studies of Ganchal (old-style letters in Korea), and this study, as part of that series, presents a typology of the [Safety Expression]. For this purpose, [Safety Expressions] were divided into formal types and semantic types, based on the Chinese Ganchal and Hangul Ganchal of the Modern Korean period (16th-19th centuries). Formal types can be divided according to whether the expression is in normal position, whether anything is omitted, whether the letter is a sending letter, and whether there is a hierarchical relationship between the correspondents. The normal-position, complete form was made the first type, as it best reveals the typicality of the [Safety Expression]. The original-position form with [Own Safety] omitted was made the second type, the original-position form with [Opposite Safety] omitted the third type, and the original-position form with the [Safety Expression] omitted the fourth type. The inversion type, which is the most severe solecism in the [Safety Expression], was made the fifth type. The first type is the original-position type, in which [Opposite Safety] precedes [Own Safety], and the complete type, which contains every semantic element. This type can be regarded as the most typical and normative in that it has all the components of the [Safety Expression]. The second type is one in which the [Safety Expression] is composed of [Opposite Safety] only. Although this type is inferior to the first type in terms of set pattern, it is not outdone in frequency of appearance. It is used because, as long as [Opposite Safety] is asked after faithfully, omitting [Own Safety] does not greatly breach politeness and makes the Ganchal easier to write. The third type is the original-position type with the configuration [Opposite Safety+Own Safety], but with [Opposite Safety] omitted. The fourth type is the original-position type with the configuration [Opposite Safety+Own Safety], but with the [Safety Expression] omitted.
This type is divided into A, in which the [Safety Expression] is entirely omitted, and B, in which a conventional expression such as 'saving trouble' replaces the [Safety Expression]. The fifth type is the inversion type, which has the structure [Own Safety+Opposite Safety], unlike the original-position type. This type is the most severe solecism, and real examples are very rare. This is because putting [Own Safety] first and asking after [Opposite Safety] only later, as a face-saving gesture, offends against common decency. In addition, it can be divided into the direct type, in which [Opposite Safety] and [Own Safety] are directly connected, and the indirect type, in which they are separated by the [story]. The semantic types of the [Safety Expression] can be classified according to whether the letter is a sending letter, whether it is fast or slow, whether the relationship is intimate, and whether there is isolation. For a sending letter, the [Safety Expression] consists of [Opposite Safety(Climate+Inquiry after health+Mental state)+Own safety(status+Inquiry after health+Mental state)]. In [Opposite Safety], [Climate] can be subdivided into [Season] information and [Climate(weather)] information. Also, [Mental state] is divided into the receiver's [Family Safety Mental state] and [Individual Safety Mental state]. In [Own Safety], [Status] is divided into the receiver's traditional situation, [Recent condition], and the receiver's ongoing situation, [Present condition]. [Inquiry after health] is also subdivided into the receiver's [Family Safety] and [Individual Safety], and [Safety] into [Family Safety] and [Individual Safety]. Thus, [Inquiry after health] and [Safety] are usually used in pairs, along the dimensions of [Family] and [Individual]. This phenomenon seems to have arisen from the extended family system, which is characterized by taking care of one's parents or grandparents.
As for the written reply, the [Safety Expression] consists of [Opposite Safety(Reception+Inquiry after health+Mental state)+Own safety(status+Inquiry after health+Mental state)], and it differs in semantic structure from the sending letter only in [Opposite Safety]. In [Opposite Safety], [Reception] is divided into [Letter], a Ganchal that is received directly, and [Message], news that is received indirectly through other people. [Safety] is divided into [Family Safety] and [Individual Safety], and [Mental state] likewise into [Family Safety Mental state] and [Individual Safety Mental state].
The wall shear stress in the vicinity of end-to-end anastomoses under steady flow conditions was measured using a flush-mounted hot-film anemometer (FMHFA) probe. The experimental measurements were in good agreement with numerical results except in flows with low Reynolds numbers. The wall shear stress increased proximal to the anastomosis in flow from the Penrose tubing (simulating an artery) to the PTFE graft. In flow from the PTFE graft to the Penrose tubing, low wall shear stress was observed distal to the anastomosis. Abnormal distributions of wall shear stress in the vicinity of the anastomosis, resulting from the compliance mismatch between the graft and the host artery, might be an important factor in the formation of anastomotic neointimal fibrous hyperplasia (ANFH) and in graft failure. The present study suggests a correlation between regions of low wall shear stress and the development of ANFH in end-to-end anastomoses.
Air pressure decay (APD) rate and ultrafiltration rate (UFR) tests were performed on new and saline-rinsed dialyzers as well as on those reused in patients several times. C-DAK 4000 (Cordis Dow) and CF IS-11 (Baxter Travenol) reused dialyzers obtained from the dialysis clinic were used in the present study. The new dialyzers exhibited a relatively flat APD, whereas saline-rinsed and reused dialyzers showed a considerable amount of decay. C-DAK dialyzers had a larger APD(11.70