• Title/Summary/Keyword: Text data

Search Result 2,953, Processing Time 0.029 seconds

Target-Aspect-Sentiment Joint Detection with CNN Auxiliary Loss for Aspect-Based Sentiment Analysis (CNN 보조 손실을 이용한 차원 기반 감성 분석)

  • Jeon, Min Jin;Hwang, Ji Won;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.4
    • /
    • pp.1-22
    • /
    • 2021
  • Aspect Based Sentiment Analysis (ABSA), which analyzes sentiment based on aspects that appear in the text, is drawing attention because it can be used in various business industries. ABSA is a study that analyzes sentiment by aspects for multiple aspects that a text has. It is being studied in various forms depending on the purpose, such as analyzing all targets or just aspects and sentiments. Here, the aspect refers to the property of a target, and the target refers to the text that causes the sentiment. For example, for restaurant reviews, you could set the aspect into food taste, food price, quality of service, mood of the restaurant, etc. Also, if there is a review that says, "The pasta was delicious, but the salad was not," the words "steak" and "salad," which are directly mentioned in the sentence, become the "target." So far, in ABSA, most studies have analyzed sentiment only based on aspects or targets. However, even with the same aspects or targets, sentiment analysis may be inaccurate. Instances would be when aspects or sentiment are divided or when sentiment exists without a target. For example, sentences like, "Pizza and the salad were good, but the steak was disappointing." Although the aspect of this sentence is limited to "food," conflicting sentiments coexist. In addition, in the case of sentences such as "Shrimp was delicious, but the price was extravagant," although the target here is "shrimp," there are opposite sentiments coexisting that are dependent on the aspect. Finally, in sentences like "The food arrived too late and is cold now." there is no target (NULL), but it transmits a negative sentiment toward the aspect "service." Like this, failure to consider both aspects and targets - when sentiment or aspect is divided or when sentiment exists without a target - creates a dual dependency problem. To address this problem, this research analyzes sentiment by considering both aspects and targets (Target-Aspect-Sentiment Detection, hereby TASD). This study detected the limitations of existing research in the field of TASD: local contexts are not fully captured, and the number of epochs and batch size dramatically lowers the F1-score. The current model excels in spotting overall context and relations between each word. However, it struggles with phrases in the local context and is relatively slow when learning. Therefore, this study tries to improve the model's performance. To achieve the objective of this research, we additionally used auxiliary loss in aspect-sentiment classification by constructing CNN(Convolutional Neural Network) layers parallel to existing models. If existing models have analyzed aspect-sentiment through BERT encoding, Pooler, and Linear layers, this research added CNN layer-adaptive average pooling to existing models, and learning was progressed by adding additional loss values for aspect-sentiment to existing loss. In other words, when learning, the auxiliary loss, computed through CNN layers, allowed the local context to be captured more fitted. After learning, the model is designed to do aspect-sentiment analysis through the existing method. To evaluate the performance of this model, two datasets, SemEval-2015 task 12 and SemEval-2016 task 5, were used and the f1-score increased compared to the existing models. When the batch was 8 and epoch was 5, the difference was largest between the F1-score of existing models and this study with 29 and 45, respectively. Even when batch and epoch were adjusted, the F1-scores were higher than the existing models. It can be said that even when the batch and epoch numbers were small, they can be learned effectively compared to the existing models. Therefore, it can be useful in situations where resources are limited. Through this study, aspect-based sentiments can be more accurately analyzed. Through various uses in business, such as development or establishing marketing strategies, both consumers and sellers will be able to make efficient decisions. In addition, it is believed that the model can be fully learned and utilized by small businesses, those that do not have much data, given that they use a pre-training model and recorded a relatively high F1-score even with limited resources.

A Literature Review and Classification of Recommender Systems on Academic Journals (추천시스템관련 학술논문 분석 및 분류)

  • Park, Deuk-Hee;Kim, Hyea-Kyeong;Choi, Il-Young;Kim, Jae-Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.1
    • /
    • pp.139-152
    • /
    • 2011
  • Recommender systems have become an important research field since the emergence of the first paper on collaborative filtering in the mid-1990s. In general, recommender systems are defined as the supporting systems which help users to find information, products, or services (such as books, movies, music, digital products, web sites, and TV programs) by aggregating and analyzing suggestions from other users, which mean reviews from various authorities, and user attributes. However, as academic researches on recommender systems have increased significantly over the last ten years, more researches are required to be applicable in the real world situation. Because research field on recommender systems is still wide and less mature than other research fields. Accordingly, the existing articles on recommender systems need to be reviewed toward the next generation of recommender systems. However, it would be not easy to confine the recommender system researches to specific disciplines, considering the nature of the recommender system researches. So, we reviewed all articles on recommender systems from 37 journals which were published from 2001 to 2010. The 37 journals are selected from top 125 journals of the MIS Journal Rankings. Also, the literature search was based on the descriptors "Recommender system", "Recommendation system", "Personalization system", "Collaborative filtering" and "Contents filtering". The full text of each article was reviewed to eliminate the article that was not actually related to recommender systems. Many of articles were excluded because the articles such as Conference papers, master's and doctoral dissertations, textbook, unpublished working papers, non-English publication papers and news were unfit for our research. We classified articles by year of publication, journals, recommendation fields, and data mining techniques. The recommendation fields and data mining techniques of 187 articles are reviewed and classified into eight recommendation fields (book, document, image, movie, music, shopping, TV program, and others) and eight data mining techniques (association rule, clustering, decision tree, k-nearest neighbor, link analysis, neural network, regression, and other heuristic methods). The results represented in this paper have several significant implications. First, based on previous publication rates, the interest in the recommender system related research will grow significantly in the future. Second, 49 articles are related to movie recommendation whereas image and TV program recommendation are identified in only 6 articles. This result has been caused by the easy use of MovieLens data set. So, it is necessary to prepare data set of other fields. Third, recently social network analysis has been used in the various applications. However studies on recommender systems using social network analysis are deficient. Henceforth, we expect that new recommendation approaches using social network analysis will be developed in the recommender systems. So, it will be an interesting and further research area to evaluate the recommendation system researches using social method analysis. This result provides trend of recommender system researches by examining the published literature, and provides practitioners and researchers with insight and future direction on recommender systems. We hope that this research helps anyone who is interested in recommender systems research to gain insight for future research.

A Study on Automatic Classification Model of Documents Based on Korean Standard Industrial Classification (한국표준산업분류를 기준으로 한 문서의 자동 분류 모델에 관한 연구)

  • Lee, Jae-Seong;Jun, Seung-Pyo;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.221-241
    • /
    • 2018
  • As we enter the knowledge society, the importance of information as a new form of capital is being emphasized. The importance of information classification is also increasing for efficient management of digital information produced exponentially. In this study, we tried to automatically classify and provide tailored information that can help companies decide to make technology commercialization. Therefore, we propose a method to classify information based on Korea Standard Industry Classification (KSIC), which indicates the business characteristics of enterprises. The classification of information or documents has been largely based on machine learning, but there is not enough training data categorized on the basis of KSIC. Therefore, this study applied the method of calculating similarity between documents. Specifically, a method and a model for presenting the most appropriate KSIC code are proposed by collecting explanatory texts of each code of KSIC and calculating the similarity with the classification object document using the vector space model. The IPC data were collected and classified by KSIC. And then verified the methodology by comparing it with the KSIC-IPC concordance table provided by the Korean Intellectual Property Office. As a result of the verification, the highest agreement was obtained when the LT method, which is a kind of TF-IDF calculation formula, was applied. At this time, the degree of match of the first rank matching KSIC was 53% and the cumulative match of the fifth ranking was 76%. Through this, it can be confirmed that KSIC classification of technology, industry, and market information that SMEs need more quantitatively and objectively is possible. In addition, it is considered that the methods and results provided in this study can be used as a basic data to help the qualitative judgment of experts in creating a linkage table between heterogeneous classification systems.

Digital Archives of Cultural Archetype Contents: Its Problems and Direction (디지털 아카이브즈의 문제점과 방향 - 문화원형 콘텐츠를 중심으로 -)

  • Hahm, Han-Hee;Park, Soon-Cheol
    • Journal of the Korean BIBLIA Society for library and Information Science
    • /
    • v.17 no.2
    • /
    • pp.23-42
    • /
    • 2006
  • This is a study of the digital archives of Culturecontent.com where 'Cultural Archetype Contents' are currently in service. One of the major purposes of our study is to point out problems in the current system and eventually propose improvements to the digital archives. The government launched a four-year project for developing the cultural archetype content sources and establishing its related business with the hope of enhancing the nation's competitiveness. More specifically, the project focuses on the production of source materials of cultural archetype contents in the subjects of Korea's history. tradition, everyday life. arts and general geographical books. In addition, through this project, the government also intends to establish a proper distribution system of digitalized culture contents and to control copyright issues. This paper analyzes the digital archives system that stores the culture content data that have been produced from 2002 to 2005 and evaluates the current system's weaknesses and strengths. The summary of our findings is as follows. First. the digital archives system does not contain a semantic search engine and therefore its full function is 1agged. Second, similar data is not classified into the same categories but into the different ones, thereby confusing and inconveniencing users. Users who want to find source materials could be disappointed by the current distributive system. Our paper suggests a better system of digital archives with text mining technology which consists of five significant intelligent process-keyword searches, summarization, clustering, classification and topic tracking. Our paper endeavors to develop the best technical environment for preserving and using culture contents data. With the new digitalized upgraded settings, users of culture contents data will discover a world of new knowledge. The technology we introduce in this paper will lead to the highest achievable digital intelligence through a new framework.

Definition and Division in Intelligent Service Facility for Integrating Management (지능화시설의 통합운영관리를 위한 정의 및 구분에 관한 연구)

  • PARK, Jeong-Woo;YIM, Du-Hyun;NAM, Kwang-Woo;KIM, Jin-Young
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.19 no.4
    • /
    • pp.52-62
    • /
    • 2016
  • Smart City is urban development for complex problem solving that provides convenience and safety for citizens, and it is a blueprint for future cities. In 2008, the Korean government defined the construction, management, and government support of U-Cities in the legislation, Act on the Construction, Etc. of Ubiquitous Cities (Ubiquitous City Act), which included definitions of terms used in the act. In addition, the Minister of Land, Infrastructure and Transport has established a "ubiquitous city master plan" considering this legislation. The concept of U-Cities is complex, due to the mix of informatization and urban planning. Because of this complexity, the foundation of relevant regulations is inadequate, which is impeding the establishment and implementation of practical plans. Smart City intelligent service facilities are not easy to define and classify, because technology is rapidly changing and includes various devices for gathering and expressing information. The purpose of this study is to complement the legal definition of the intelligent service facility, which is necessary for integrated management and operation. The related laws and regulations on U-City were analyzed using text-mining techniques to identify insufficient legal definitions of intelligent service facilities. Using data gathered from interviews with officials responsible for constructing U-Cities, this study identified problems generated by implementing intelligent service facilities at the field level. This strategy should contribute to improved efficiency management, the foundation for building integrated utilization between departments. Efficiencies include providing a clear concept for establishing five-year renewable plans for U-Cities.

A Study on Analyzing Sentiments on Movie Reviews by Multi-Level Sentiment Classifier (영화 리뷰 감성분석을 위한 텍스트 마이닝 기반 감성 분류기 구축)

  • Kim, Yuyoung;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.71-89
    • /
    • 2016
  • Sentiment analysis is used for identifying emotions or sentiments embedded in the user generated data such as customer reviews from blogs, social network services, and so on. Various research fields such as computer science and business management can take advantage of this feature to analyze customer-generated opinions. In previous studies, the star rating of a review is regarded as the same as sentiment embedded in the text. However, it does not always correspond to the sentiment polarity. Due to this supposition, previous studies have some limitations in their accuracy. To solve this issue, the present study uses a supervised sentiment classification model to measure a more accurate sentiment polarity. This study aims to propose an advanced sentiment classifier and to discover the correlation between movie reviews and box-office success. The advanced sentiment classifier is based on two supervised machine learning techniques, the Support Vector Machines (SVM) and Feedforward Neural Network (FNN). The sentiment scores of the movie reviews are measured by the sentiment classifier and are analyzed by statistical correlations between movie reviews and box-office success. Movie reviews are collected along with a star-rate. The dataset used in this study consists of 1,258,538 reviews from 175 films gathered from Naver Movie website (movie.naver.com). The results show that the proposed sentiment classifier outperforms Naive Bayes (NB) classifier as its accuracy is about 6% higher than NB. Furthermore, the results indicate that there are positive correlations between the star-rate and the number of audiences, which can be regarded as the box-office success of a movie. The study also shows that there is the mild, positive correlation between the sentiment scores estimated by the classifier and the number of audiences. To verify the applicability of the sentiment scores, an independent sample t-test was conducted. For this, the movies were divided into two groups using the average of sentiment scores. The two groups are significantly different in terms of the star-rated scores.

A study on the readability of web interface for the elderly user -Focused on readability of Typeface- (고령사용자를 위한 웹 인터페이스에서의 가독성에 관한 연구 -Typeface의 가독성을 중심으로-)

  • Lee, Hyun-Ju;Woo, Seo-Hye;Park, Eun-Young;Suh, Hye-Young;Back, Seung-Chul
    • Archives of design research
    • /
    • v.20 no.3 s.71
    • /
    • pp.315-324
    • /
    • 2007
  • The fast development of the information technology makes Korea one of the most advanced countries in information communication in the world in a short period of time. However, the gap between the aged and the young has been seriously increased. Those who are less than 10% of the older adults are using the internet at present. It means the elderly has many difficulties in using the internet because of their physical and cognitive differences. The purpose of this study is that the aged can easily achieve and use information by developing a guidelines for the Korean typography in the web interface. A literature search was conducted on the web interface design guidelines for older adults. These guidelines were classified by interface component and the study subjects needed for the Korean internet environment were selected. The subjects are a more comfortably readable typeface according to the sizes, a proper text size of Gulim and Batang, a more comfortably readable leading size, the appropriate letter spacing, the proper line length of body, the suitable size proportion between a title and a body, and a more comfortably readable text alignment. Survey questions were made and these Questions were improved after the pretest. Both online and offline survey programs were written and the aged and the young were tested with these programs. The result of this survey shows that there are satisfaction differences between the aged and the young in the readability and legibility of the web contents. Therefore these universal guidelines to be used in the Korean typographical environment for the future aged population were specified. It is expected that this study will be used as basic data for the universal web interface where the older adults can easily use and acquire information.

  • PDF

The Effects of Personal, Familial, School Environmental Variables on Mobile Phone Addiction by Adolescent (청소년의 휴대폰 중독성에 영향을 미치는 개인, 가족, 학교환경 변인)

  • Lee, Yeon-Mi;Lee, Seon-Jeong;Shin, Hyo-Shick
    • Journal of Korean Home Economics Education Association
    • /
    • v.21 no.3
    • /
    • pp.29-43
    • /
    • 2009
  • The purposes of this study were to examine effective variables influencing on adolescents' mobile phone addiction. The subjects were the 666 middle or high school students who had their own mobile phone in Gwangju. Data were analyzed with frequency, percentage, cronbach's ${\alpha}$, mean, SD, pearson's correlation and multiple regression using SPSS/PC WIN 14.0 program. The major findings were as follows: 1. The most frequently used function of student mobile phones was text message and they used text message more than 41 times a day. Mostly they were talking to the same sex friends on the phone and the monthly charges ranged from 20,000 to 30,000 won. In general, they called to their friends after school. Their parents' attitudes toward their mobile phone using showed that the most of their parents did not care about their children's mobile phone use. The restriction of using their mobile phone at school was normal. 2. The average scores of mobile phone addiction were lower than median(3.0) and self esteem, self control, family strengths, peer conformity and school life adaptation were higher than median. 3. The adolescence's mobile phone additions were influenced by peer conformity, school life adaptation, school levels, sex and self control. And these variables explained to the adolescence's mobile phone additions about 28%.

  • PDF

The relationship between students' perceptions and practicability of the "Me and My Family Relations" unit and Family strength among middle school students (나와 가족관계' 단원에 대한 중학생의 긍정적 인식, 실천성 인식과 가족건강성)

  • Cho, Byung-Eun;Jung, Sun-Hee
    • Journal of Korean Home Economics Education Association
    • /
    • v.19 no.1 s.43
    • /
    • pp.99-114
    • /
    • 2007
  • This study aims at investigating how middle school students perceive the content of the 'Me and My Family Relations' unit in the technology and home economics textbook, act on such perceptions and how this connects with their healthy family relations. In addition, the study also points at inquiring into what kind of differences and mutual influences can be found in the above-mentioned three factors according to family environment. With this objective. this research has analyzed survey data conducted on 401 7th grade middle school students residing in Incheon Greater city, collected by the random sampling method. The findings are as follow: First, the students were found to have positive perceptions on the 'family relations and communication' unit in the technology and home economics text book. However. they were also found to perceive that the content was not as realizable in their everyday family lives. Second, the number of students who perceived their family lives to be healthy was found to be quite high. The students perceived their family lives to be healthy projecting from such aspects as the degree of gratification and affection, extent of family bonding, communication patterns, and problem solving abilities, in the same order. In addition, the higher the families' socio-economic level, and in the cases that the students had working mothers and the fathers held higher degrees, the degree to which the families were perceived to be healthy was higher. Third, in investigating the influence that such factors as the students' family environment, the degree that students perceived the text book content positively, and the degree that the students perceive the content to be realizable have on healthy family relations, among these factors, the students' perceived degree of how healthy their family relations are had the most bearing over the above-mentioned factors. The second influential factor on how healthy family relations are was the family's affectional environment, found to be more influential than such factors as family type, the mother's employment status, living standards, and the parents' educational level. On the other hand, the perceived level of realizability was found to have a lower influence on the students' family relations than the perceived positivity.

  • PDF

Keyword Network Analysis for Technology Forecasting (기술예측을 위한 특허 키워드 네트워크 분석)

  • Choi, Jin-Ho;Kim, Hee-Su;Im, Nam-Gyu
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.227-240
    • /
    • 2011
  • New concepts and ideas often result from extensive recombination of existing concepts or ideas. Both researchers and developers build on existing concepts and ideas in published papers or registered patents to develop new theories and technologies that in turn serve as a basis for further development. As the importance of patent increases, so does that of patent analysis. Patent analysis is largely divided into network-based and keyword-based analyses. The former lacks its ability to analyze information technology in details while the letter is unable to identify the relationship between such technologies. In order to overcome the limitations of network-based and keyword-based analyses, this study, which blends those two methods, suggests the keyword network based analysis methodology. In this study, we collected significant technology information in each patent that is related to Light Emitting Diode (LED) through text mining, built a keyword network, and then executed a community network analysis on the collected data. The results of analysis are as the following. First, the patent keyword network indicated very low density and exceptionally high clustering coefficient. Technically, density is obtained by dividing the number of ties in a network by the number of all possible ties. The value ranges between 0 and 1, with higher values indicating denser networks and lower values indicating sparser networks. In real-world networks, the density varies depending on the size of a network; increasing the size of a network generally leads to a decrease in the density. The clustering coefficient is a network-level measure that illustrates the tendency of nodes to cluster in densely interconnected modules. This measure is to show the small-world property in which a network can be highly clustered even though it has a small average distance between nodes in spite of the large number of nodes. Therefore, high density in patent keyword network means that nodes in the patent keyword network are connected sporadically, and high clustering coefficient shows that nodes in the network are closely connected one another. Second, the cumulative degree distribution of the patent keyword network, as any other knowledge network like citation network or collaboration network, followed a clear power-law distribution. A well-known mechanism of this pattern is the preferential attachment mechanism, whereby a node with more links is likely to attain further new links in the evolution of the corresponding network. Unlike general normal distributions, the power-law distribution does not have a representative scale. This means that one cannot pick a representative or an average because there is always a considerable probability of finding much larger values. Networks with power-law distributions are therefore often referred to as scale-free networks. The presence of heavy-tailed scale-free distribution represents the fundamental signature of an emergent collective behavior of the actors who contribute to forming the network. In our context, the more frequently a patent keyword is used, the more often it is selected by researchers and is associated with other keywords or concepts to constitute and convey new patents or technologies. The evidence of power-law distribution implies that the preferential attachment mechanism suggests the origin of heavy-tailed distributions in a wide range of growing patent keyword network. Third, we found that among keywords that flew into a particular field, the vast majority of keywords with new links join existing keywords in the associated community in forming the concept of a new patent. This finding resulted in the same outcomes for both the short-term period (4-year) and long-term period (10-year) analyses. Furthermore, using the keyword combination information that was derived from the methodology suggested by our study enables one to forecast which concepts combine to form a new patent dimension and refer to those concepts when developing a new patent.