• Title/Summary/Keyword: Lexicon Analysis


Study on Characteristics of the Development Process of Fashion Design Thinking through the Lexicon (어휘를 통한 패션 디자인 발상 전개 과정의 특성 연구)

  • Kim, Yoon Kyoung
    • Journal of the Korean Society of Costume / v.64 no.2 / pp.113-125 / 2014
  • Creative thinking requires the ability to generate ideas on a given topic within a given time and with concentration. To study this, the process by which design concepts develop from a topic was collected through experiments and interviews with 10 fashion education experts and 10 clothing majors. The analysis produced the following results. First, at the stage of expanding the topic by analogy, participants used divergent thinking to find as many ideas and possibilities as possible. Their thinking spread as the lexicon extended into learned professional knowledge, the individual's cultural background, and other art fields. Second, the abstract and design-related words that were expanded and listed through topic analogy gradually made the topic more specific through free combination of lexical items. The sentences formed by combining lexical items were interpreted in two ways: a serial listing method, in which the sentences were connected in an ordered cause-and-effect form, and a parallel listing method, which processed the information all at once. Third, the procedure of visualizing the result as a specific design showed several characteristics: transforming the image carried by a word into one appropriate to the topic, reflecting the external characteristics of the selected design word, and giving outward expression to personal inner and tacit desires. The significance of this study is that, by analyzing the characteristics that appear during language-oriented design thinking, it proposes methods and directions that help generate more creative and systematic ideas.

Sentiment Analysis on 'HelloTalk' App Reviews Using NRC Emotion Lexicon and GoEmotions Dataset

  • Simay Akar;Yang Sok Kim;Mi Jin Noh
    • Smart Media Journal / v.13 no.6 / pp.35-43 / 2024
  • During the post-pandemic period, the interest in foreign language learning surged, leading to increased usage of language-learning apps. With the rising demand for these apps, analyzing app reviews becomes essential, as they provide valuable insights into user experiences and suggestions for improvement. This research focuses on extracting insights into users' opinions, sentiments, and overall satisfaction from reviews of HelloTalk, one of the most renowned language-learning apps. We employed topic modeling and emotion analysis approaches to analyze reviews collected from the Google Play Store. Several experiments were conducted to evaluate the performance of sentiment classification models with different settings. In addition, we identified dominant emotions and topics within the app reviews using feature importance analysis. The experimental results show that the Random Forest model with topics and emotions outperforms other approaches in accuracy, recall, and F1 score. The findings reveal that topics emphasizing language learning and community interactions, as well as the use of language learning tools and the learning experience, are prominent. Moreover, the emotions of 'admiration' and 'annoyance' emerge as significant factors across all models. This research highlights that incorporating emotion scores into the model and utilizing a broader range of emotion labels enhances model performance.

A domain-specific sentiment lexicon construction method for stock index directionality (주가지수 방향성 예측을 위한 도메인 맞춤형 감성사전 구축방안)

  • Kim, Jae-Bong;Kim, Hyoung-Joong
    • Journal of Digital Contents Society / v.18 no.3 / pp.585-592 / 2017
  • As the development of personal devices has made everyday internet use much easier than before, finding and sharing information through social media has become commonplace. In particular, communities specialized in particular fields have become powerful enough to significantly influence society. As a result, businesses and governments pay attention to reflecting these communities' opinions in their strategies. The stock market fluctuates with various social factors. To take social trends into account, many studies have applied big data analysis to stock market research alongside traditional approaches based on buzz volume. As a representative example, studies using text data such as newspaper articles are being published. In this paper, we analyzed posts on 'Paxnet', a site for securities specialists, to compensate for the limitations of news data. Based on this, we help researchers analyze investor sentiment by generating a domain-specific sentiment lexicon for the stock market.

Public Sentiment Analysis of Korean Top-10 Companies: Big Data Approach Using Multi-categorical Sentiment Lexicon (국내 주요 10대 기업에 대한 국민 감성 분석: 다범주 감성사전을 활용한 빅 데이터 접근법)

  • Kim, Seo In;Kim, Dong Sung;Kim, Jong Woo
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.45-69 / 2016
  • Recently, sentiment analysis using open Internet data has been actively performed for various purposes. As online communication channels become popular, companies try to capture public sentiment toward them from open online information sources. This research analyzes public sentiment toward the Korean top-10 companies using a multi-categorical sentiment lexicon. Whereas existing research on measuring public sentiment with a big data approach classifies sentiment into dimensions, this research classifies public sentiment into multiple categories. A dimensional sentiment structure has commonly been applied in sentiment analysis across applications because it is academically established and has the clear advantage of capturing the degree of sentiment and the interrelation of the dimensions. However, the dimensional structure is not effective for measuring public sentiment, because human sentiment is too complex to be divided into a few dimensions. In addition, ordinary people would need special training to express their feelings in a dimensional structure. People do not divide their sentiment into dimensions, nor do they need psychological training in order to feel. They do not express their feelings in dimensional terms such as positive/negative or active/passive; rather, they express them in categorical terms such as sadness, rage, and happiness. That is, a categorical approach to sentiment analysis is more natural than a dimensional one. Accordingly, this research proposes a multi-categorical sentiment structure as an alternative way to measure social sentiment from the public's point of view. A multi-categorical sentiment structure classifies sentiments the way ordinary people do, although it may involve some subjectivity.
In this research, nine categories are used as the multi-categorical sentiment structure: 'Sadness', 'Anger', 'Happiness', 'Disgust', 'Surprise', 'Fear', 'Interest', 'Boredom', and 'Pain'. To capture public sentiment toward the Korean top-10 companies, Internet news data about the companies were collected over the preceding 25 months from a representative Korean portal site. Based on sentiment words extracted from previous research, we created a sentiment lexicon and analyzed the frequency of those words in the news data. The frequency of each sentiment category was calculated as a ratio of the total number of sentiment words, and the categories were ranked by this distribution. Sentiment comparisons among the top-4 companies, 'Samsung', 'Hyundai', 'SK', and 'LG', were visualized separately. As a next step, the research tested hypotheses to demonstrate the usefulness of the multi-categorical sentiment lexicon. It tested how effectively categorical sentiment can be used as a relative comparison index in cross-sectional and time series analysis. To test the effectiveness of the sentiment lexicon as a cross-sectional comparison index, pair-wise t-tests and a Duncan test were conducted. Two pairs of companies, 'Samsung' and 'Hanjin' and 'SK' and 'Hanjin', were chosen to examine with pair-wise t-tests whether each categorical sentiment differs significantly. Because the 'Sadness' category has the largest vocabulary, it was chosen for the Duncan test to determine whether subgroups of the companies differ significantly. The results show that five sentiment categories differ significantly between Samsung and Hanjin, and four between SK and Hanjin. In the 'Sadness' category, six significantly different subgroups were found. To test the effectiveness of the sentiment lexicon as a time series comparison index, the 'nut rage' incident at Hanjin was selected as an example case.
The term frequency of sentiment words in the month of the incident was compared with the term frequency in the month before it. The sentiment categories were regrouped into positive and negative sentiment to determine whether the event actually had a negative impact on public sentiment toward the company. The difference in each category was visualized, and the change in the word list for the sentiment 'Rage' was shown for concreteness. The results showed a large before-and-after difference in the sentiment that ordinary people felt toward the company. Both hypotheses turned out to be statistically significant, so sentiment analysis in the business domain using multi-categorical sentiment lexicons is persuasive. This research implies that categorical sentiment analysis can be used as an alternative method to supplement dimensional sentiment analysis when assessing public sentiment in a business environment.
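
The category-ratio step described in this abstract (each category's frequency as a share of all sentiment-word occurrences, then ranked) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy `LEXICON`, the helper names `category_ratios` and `rank_categories`, and the sample sentence are all hypothetical, and tokenization is reduced to whitespace splitting.

```python
from collections import Counter

# Hypothetical toy lexicon mapping a word to one sentiment category.
# The paper's actual lexicon and its nine-category vocabulary are not given here.
LEXICON = {
    "grief": "Sadness",
    "tears": "Sadness",
    "furious": "Anger",
    "outrage": "Anger",
    "delighted": "Happiness",
}


def category_ratios(tokens, lexicon):
    """Return each category's share of all sentiment-word occurrences."""
    counts = Counter(lexicon[t] for t in tokens if t in lexicon)
    total = sum(counts.values())
    if total == 0:
        return {}  # no sentiment words found in the text
    return {category: n / total for category, n in counts.items()}


def rank_categories(ratios):
    """Sort categories by descending ratio, breaking ties by name."""
    return sorted(ratios.items(), key=lambda kv: (-kv[1], kv[0]))


# Assumed sample text; whitespace tokenization stands in for real preprocessing.
tokens = "the public was furious and outrage grew while some shed tears".split()
ratios = category_ratios(tokens, LEXICON)
ranking = rank_categories(ratios)
```

Computing the ratio against sentiment-word occurrences rather than all tokens follows the abstract's wording ("a ratio of the total number of sentiment words"); comparing two months, as in the 'nut rage' case, would amount to calling `category_ratios` once per month and comparing the results.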

A study of flaps in American English based on the Buckeye Corpus (Buckeye corpus에 나타난 탄설음화 현상 분석)

  • Hwang, Byeonghoo;Kang, Seokhan
    • Phonetics and Speech Sciences / v.10 no.3 / pp.9-18 / 2018
  • This paper presents an acoustic and phonological study of alveolar flaps in American English. Based on the Buckeye Corpus, flapping tokens produced by twenty men are analyzed at both the lexical and post-lexical levels. The data, analyzed with the Praat speech analysis software, include duration, F2, and F3 in voicing during the flap, as well as duration, F1, F2, F3, and f0 in the adjacent vowels. The results provide evidence on two issues: (1) the different ways in which voiced and voiceless alveolar stops give rise to neutralized flaps at the lexical and post-lexical levels, and (2) the extent to which vowel features (height, frontness, and tenseness) affect flapping. The results show that flaps are affected by pre-consonantal vowel features at the lexical as well as post-lexical levels. Unlike previous studies, this study uses Praat to distinguish flapped from unflapped tokens in the Buckeye Corpus and examines connections between the lexical and post-lexical levels.

Lexicon Analysis Method for Basic Lexicon Construction included 7th Mother Language Text Books of Element School (기초 어휘 선정을 위한 초등학교 국어 교과서에 등장하는 어휘 분석 방안)

  • Chae, Young-Soog;Chae, Young-Hee
    • Annual Conference on Human and Language Technology / 2002.10e / pp.98-102 / 2002
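  • To examine the level of vocabulary used in elementary school textbooks, this study aims to identify the factors that influence vocabulary selection, including the frequency of use of words in the textbooks, to establish the relationships among those factors, and to suggest a direction for selecting educational vocabulary. We review the items related to Korean vocabulary education in the elementary school textbooks of the 7th National Curriculum and examine whether they take the learning level at each stage into account. We also examine whether the hierarchy of vocabulary and meaning education set out in the level-differentiated curriculum is built on a detailed and careful assessment of level appropriateness. Through an analysis of the problems in selecting educational vocabulary for elementary schools, we explain the need for appropriate criteria for classifying core vocabulary (기본 어휘) and basic vocabulary (기초 어휘), and the need to distinguish, in learning activities, vocabulary ability as language-use competence from vocabulary ability within the language system.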


Analysis of Emotions in Lyrics by Combining Deep Learning BERT and Emotional Lexicon (딥러닝 모델(BERT)과 감정 어휘 사전을 결합한 음원 가사 감정 분석)

  • Yoon, Kyung Seob;Oh, Jong Min
    • Proceedings of the Korean Society of Computer Information Conference / 2022.07a / pp.471-474 / 2022
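  • The music streaming service market has grown steadily. Among these services, Spotify and YouTube Music have recently shown the most notable growth. The recommendation systems of the two services are popular because they keep recommending music that users are likely to enjoy. The performance of a recommendation system can be regarded as proportional to the number of features available for recommendation, because recommending what users want requires knowing as much information as possible. This paper proposes a hybrid emotion analysis model that combines two existing emotion classification approaches, the lexicon-based approach and a machine-based approach using the deep learning model BERT, retaining their strengths while compensating for their weaknesses, and uses it to analyze the proportions of emotions conveyed by lyrics. Using these emotion proportions as weighting features for tracks is expected to enable more advanced recommendations that incorporate emotional information.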


Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.23 no.2 / pp.39-70 / 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research preferences for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately determine customer preferences. In existing data-based survey methods, a sentiment lexicon for a particular domain is first constructed by domain experts, who usually judge the positive, neutral, or negative meaning of frequently used words in the collected text documents. To research preferences for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning about the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary of the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information on other aspects such as 'car quality', 'car performance', and 'car service'. Such information will enable customers to make good choices when they purchase brand-new vehicles.
In addition, automobile makers will be able to determine preferences and the positive and negative points of new models on the market, and in the near future the weak points of the models can be improved through sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because sentiment analysis is performed without considering aspects, customers and car makers receive only a summary containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct a sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly.
Furthermore, by reinforcing topic semantics, we can improve the accuracy of product reputation mining beyond that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
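
The lexicon-based baseline that this abstract calls the existing approach (score each sentence, estimate positive and negative ratios over all sentences, then pick the top-k sentences by score) might look roughly like the sketch below. It is an illustration under stated assumptions only: the `POLARITY` scores, the function names, and the sample reviews are hypothetical, stemming and stop-word removal are omitted, and the topic-based method the paper proposes is not shown.

```python
# Hypothetical word-level polarity scores; a real domain lexicon would be
# built by experts, as the abstract describes.
POLARITY = {"great": 1.0, "smooth": 0.5, "noisy": -0.5, "terrible": -1.0}


def sentence_score(sentence, polarity):
    """Sum the polarity of every lexicon word in the sentence."""
    words = (w.strip(".,!?'\"") for w in sentence.lower().split())
    return sum(polarity.get(w, 0.0) for w in words)


def polarity_ratios(sentences, polarity):
    """Return (positive ratio, negative ratio) over all sentences."""
    if not sentences:
        return 0.0, 0.0
    scores = [sentence_score(s, polarity) for s in sentences]
    positive = sum(1 for x in scores if x > 0)
    negative = sum(1 for x in scores if x < 0)
    return positive / len(scores), negative / len(scores)


def top_k(sentences, polarity, k):
    """Return the k most positive and the k most negative sentences."""
    scored = [(sentence_score(s, polarity), s) for s in sentences]
    positives = sorted((p for p in scored if p[0] > 0), key=lambda p: -p[0])
    negatives = sorted((p for p in scored if p[0] < 0), key=lambda p: p[0])
    return [s for _, s in positives[:k]], [s for _, s in negatives[:k]]


# Assumed sample reviews, one sentence each.
reviews = [
    "The design is great.",
    "The engine is noisy and the service is terrible.",
    "The ride is smooth.",
    "It has four doors.",
]
pos_ratio, neg_ratio = polarity_ratios(reviews, POLARITY)
best, worst = top_k(reviews, POLARITY, k=1)
```

Because the scores are pooled over the whole collection, the output says nothing about which aspect (design, quality, performance, service) a sentence concerns, which is the first shortcoming the abstract attributes to this baseline.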

A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the amount of data has continued to grow. The main cause of this phenomenon can be found in the significant increase of unstructured data, as the use of smart devices enables users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a variety of information are clearly expressed in text data such as news, reports, papers, and various articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict a positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish between the two methods clearly with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion to distinguish text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon.
We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy on 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for documents with strong certainty was higher than that for documents with weak certainty. Most of all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Additionally, various parameters in the parsing and filtering steps of the text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.