• Title/Summary/Keyword: 텍스트 연구 (text research)


The way to make training data for deep learning model to recognize keywords in product catalog image at E-commerce (온라인 쇼핑몰에서 상품 설명 이미지 내의 키워드 인식을 위한 딥러닝 훈련 데이터 자동 생성 방안)

  • Kim, Kitae;Oh, Wonseok;Lim, Geunwon;Cha, Eunwoo;Shin, Minyoung;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.1
    • /
    • pp.1-23
    • /
    • 2018
  • Since the start of the 21st century, various high-quality services have emerged with the growth of the internet and information and communication technologies. In particular, the E-commerce industry, in which Amazon and eBay stand out, is growing explosively. As E-commerce grows, customers can easily find what they want to buy while comparing various products, because more products are registered at online shopping malls. However, this growth has created a problem: with so many products registered, it has become difficult for customers to find what they really need. When customers search with a general keyword, too many products are returned. Conversely, few products are returned when customers type in product details, because concrete product attributes are rarely registered. In this situation, automatically recognizing text in images can be a solution. Because the bulk of product details are written in catalogs in image format, most product information cannot be found by the current text-based search system. If the information in images can be converted to text, customers can search for products by their details, which makes shopping more convenient. Various existing OCR (Optical Character Recognition) programs can recognize text in images, but they are hard to apply to catalogs because they have trouble recognizing text under certain conditions, such as when the text is small or the fonts are inconsistent. Therefore, this research suggests a way to recognize keywords in catalogs with deep learning, which has been the state of the art in image recognition since the 2010s. 
Single Shot Multibox Detector (SSD), a model credited for its object-detection performance, can be used with its structure redesigned to take into account the differences between text and objects. However, the SSD model needs a lot of labeled training data, because deep learning models of this kind are trained by supervised learning. To collect data, one can manually label the location and class of text in catalogs, but manual collection causes many problems. Some keywords will be missed because humans make mistakes while labeling. Collecting data is also too time-consuming given the amount needed, or costly if many workers are hired to shorten the time. Furthermore, if specific keywords need to be trained, finding images that contain those words is difficult as well. To solve the data issue, this research developed a program that creates training data automatically. The program generates catalog-like images containing various keywords and pictures and saves the location information of the keywords at the same time. With this program, not only can data be collected efficiently, but the performance of the SSD model also improves. The SSD model recorded a recognition rate of 81.99% with 20,000 samples created by the program. Moreover, this research tested the SSD model on different data sets to analyze which features of the data influence the performance of recognizing text in images. The results show that the number of labeled keywords, the addition of overlapping keyword labels, the existence of unlabeled keywords, the spacing among keywords, and the differences among background images are related to the performance of the SSD model. These findings can lead to performance improvements of the SSD model or other deep-learning-based text recognizers through high-quality data. 
The SSD model redesigned to recognize text in images and the program developed for creating training data are expected to contribute to improving search systems in E-commerce. Suppliers can spend less time registering keywords for products, and customers can search for products using the product details written in the catalog.
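
The automatic labeling idea in this abstract (place keywords on a canvas and record where each one lands) can be illustrated with a minimal sketch. It is not the authors' program: the fixed-width glyph model (`char_w`, `char_h`), the non-overlap rule, and the names `Box` and `place_keywords` are all assumptions made for illustration, and real image rendering is left out.

```python
import random
from dataclasses import dataclass


@dataclass
class Box:
    """One training label: a keyword and its bounding box in pixels."""
    keyword: str
    x: int
    y: int
    w: int
    h: int


def overlaps(a, b):
    """True when two axis-aligned boxes share any area."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h


def place_keywords(keywords, width, height, char_w=12, char_h=20, seed=0, max_tries=200):
    """Pick a random non-overlapping position for each keyword.

    Assumption: each character occupies a fixed char_w x char_h cell,
    standing in for real font metrics.
    """
    rng = random.Random(seed)
    boxes = []
    for kw in keywords:
        w, h = char_w * len(kw), char_h
        if w > width or h > height:
            continue  # keyword cannot fit on this canvas
        for _ in range(max_tries):
            cand = Box(kw, rng.randint(0, width - w), rng.randint(0, height - h), w, h)
            if not any(overlaps(cand, b) for b in boxes):
                boxes.append(cand)
                break
    return boxes


labels = place_keywords(["cotton", "waterproof", "100ml"], width=600, height=400)
for box in labels:
    print(box)
```

In a fuller pipeline, each `Box` would be drawn onto a background image and written out alongside it as the label file; varying spacing and backgrounds would then correspond to the data factors the abstract reports as affecting performance.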

A Study on Chinese Utopia Literature - TaohuaYuanji & Renmiantaohua ('대동(大同)'과 '도화원(桃花源)'이후 유토피아는 어떻게 재현되는가 - 격비(格非)의 「인면도화(人面桃花)」에 대한 일고(一考))

  • Kim, Kyung-Seog
    • Cross-Cultural Studies
    • /
    • v.42
    • /
    • pp.7-22
    • /
    • 2016
  • Since the Arcadia (Taohuayuan) of Confucius and Taoyuanming, the utopia imagined in China has continued to this day. This utopian imagination has been shaped by a variety of literary texts, which depict the human desires and frustrations that form it. The Arcadia (Taohuayuan) has typically been recognized as a symbol of East Asian utopia. Its utopian character is that of a small, anarchic country based on the thought of Laozi and Zhuangzi, in which no ruling class exists. Confucius, however, thought a utopian world could be imagined on the premise of 'Virtuous Policy' (德治). By the late Qing Dynasty (淸末), Kangyouwei (康有爲) had written and presented a utopia realized through the system of the "East-West stand." After the 1911 revolution and into the People's Republic of China, literary texts on utopia continued to experiment with implementing what literature had 'imagined' as the "reality" of everyday life. Gefei's "Renmiantaohua", the work of an avant-garde writer, reflects on that experiment. It has been rated as an outstanding work depicting the process of implementing the utopian imagination and its frustration. Through "Renmiantaohua", Gefei focuses on the desire and the frustration of the utopian imagination in myths and stories comparable to the story of Sisyphus, which in itself perpetuates utopia in literature. In this paper, we trace in a current literary text the attempts to implement a utopian imagination that has persisted since ancient China. Drawing on the Arcadia (Taohuayuan) of Confucius and Taoyuanming, Gefei in the 21st century describes the process of desire and frustration in realizing utopia through a variety of traditional figures, among others. 
The author interprets these frustrations and desires over the course of human life as the utopian imagination itself.

On a "duality" of the Corps-actant structure in Misaeng: with Jang Geurae as the central figure (『미생』에 나타난 신체 행위소 구조의 이중성에 관한 고찰 - 장그래를 중심으로)

  • Song, Taemi
    • 기호학연구
    • /
    • no.57
    • /
    • pp.211-255
    • /
    • 2018
  • This paper attempts to "re-read" the webtoon Misaeng, which was once an important topic in public discourse on "labor". Our hypothesis is that the dual actantial structure of Misaeng's hero, Jang Geurae, gives a dual structure to the entire text, which leads to the discovery of textual meaning not mentioned in the existing discourse. This is based on the concept of the 'meta-story character' that Hiroki Azuma discusses in his postmodern literary theory. To verify this hypothesis, we analyzed the text by applying Parisian semiotics, more specifically J. Fontanille's theory. Jang Geurae is observed to be an actant with a dual structure, divided into the 'character' of the enunciated level and the 'player' of the enunciation level. Considering this characteristic of the hero, Misaeng can be interpreted as a metafiction that shows the 'shifting' between the subject of the enunciated level and the subject of the enunciation level. On the level of 'character', Jang's existence mode turns out to be "Deficiency (actualized)", but on the level of 'player' it turns out to be "Inanity (potentialized)". His somatic responses also show a duality, which is represented by Fontanille's corps-actant model: on the level of 'character' the somatic actant of Jang consists of 'Moi-chair (ego-flesh)' and 'Soi-idem (self-idem)', and on the level of 'player' it consists of 'Moi-chair (ego-flesh)' and 'Soi-ipse (self-ipse)'. The former mainly acts as a 'role' and takes charge of exteroceptive perception, while the latter mainly acts as an 'attitude' and takes charge of interoceptive perception. Because of this dissociative nature of the actant, Jang's two 'selves' bring about the readjustment of values without serious conflict between collective norms and individual identity. This is in sharp contrast with other characters, who struggle with the conflict between the environment and the "self". 
It becomes customary to adopt norms that are suspected to be ineffective, but if questions are raised, the normative system can be updated. On one axis of Misaeng there are characters who have lost themselves in customs. On the other axis, there is Jang, who cannot help dismantling the existing ineffective norms and updating the normative system. Jang's existence mode seems to be one of many possible modes generated by this era, in which people no longer share solid community values. His actantial structure also communicates with today's readers, who place themselves more in the subject of the enunciation level than in that of the enunciated level.

A Dream of Communal Society for Parts Without Parts: On Thomas More's Utopia (몫 없는 자들을 위한 공유사회의 꿈: 토머스 모어의 『유토피아』)

  • Lee, Myung-Ho
    • Cross-Cultural Studies
    • /
    • v.45
    • /
    • pp.295-324
    • /
    • 2016
  • This essay attempts a contrapuntal reading of Thomas More's Utopia. Contrapuntal reading, proposed by Edward Said, attempts to make a text speak across temporal, cultural, and ideological boundaries to a topic of the present. I examine two opposing readings of Utopia around 2011, by those for and against Occupy Wall Street. On the one hand, the opponents of Occupy find its limits as a utopian social movement echoed in the fictional character of Hythlodaeus and the alternative society he verbally sketches in Book Two of Utopia. On the other, Occupy's advocates read More's text as embodying its radical possibility. However, both share the tendency to denounce Book Two while praising Book One, in which Hythlodaeus vehemently criticizes England; they read Hythlodaeus not as a utopian idealist but as a social critic. Occupy, as a result, is seen here as having an ambivalent relationship to utopianism. I reinterpret the radical possibilities of Book Two, which both pro- and anti-Occupy invocations of Utopia criticize. Book Two provides a utopian space in which the existing social contradictions are cancelled, revealing the limits of the three partial utopias proposed at the end of Book One. Following Louis Marin, I argue that the "utopic" space lies not in the so-called ideal society described in the text but in the inconsistencies between the text's description (discourse) and topography (map). In Book Two the existence of a king is described, yet his space is not found in the topography of Utopia; likewise, a market is described as existing at the center of a city, yet its space is not found either. These inconsistencies create a neutral space in which the ideological contradictions of the text are cancelled, and this space opens up the possibility of a communal society beyond modern sovereign power and capitalism. I argue that this utopian dream needs to be summoned once again in our time as a compelling alternative to the corporate, capitalist order.

Sentiment analysis on movie review through building modified sentiment dictionary by movie genre (영역별 맞춤형 감성사전 구축을 통한 영화리뷰 감성분석)

  • Lee, Sang Hoon;Cui, Jing;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.97-113
    • /
    • 2016
  • With the growth of internet data and the rapid development of internet technology, "big data" analysis is actively conducted to analyze enormous amounts of data for various purposes. In recent years especially, a number of studies have applied text mining techniques to overcome the limitations of existing structured data analysis. Sentiment analysis, a part of text mining, is actively studied as a way to score opinions based on the distribution of the polarity of words in documents. Sentiment analysis usually uses a sentiment dictionary that contains the positivity and negativity of vocabulary. As part of this line of research, this study constructs a sentiment dictionary customized to a specific data domain. Using a common sentiment dictionary without considering the characteristics of the data domain cannot reflect contextual expressions used only in that domain. We can therefore expect that a modified sentiment dictionary customized to the data domain will improve the performance of sentiment analysis. This study thus aims to suggest a way to construct a customized dictionary that reflects the characteristics of the data domain. Specifically, movie review data are divided by genre, and genre-customized dictionaries are constructed. The performance of the customized dictionaries in sentiment analysis is compared with that of a common sentiment dictionary. IMDb data are chosen as the subject of analysis, and movie reviews are categorized by genre. Six IMDb genres, 'action', 'animation', 'comedy', 'drama', 'horror', and 'sci-fi', are selected. The five highest-ranking and five lowest-ranking movies per genre are selected as the training data set, and two years of movie data from September 2012 to June 2014 are collected as the test data set. 
Using the SO-PMI (Semantic Orientation from Point-wise Mutual Information) technique, we build a customized sentiment dictionary per genre and compare prediction accuracy on review ratings. The analysis shows that prediction using the customized dictionaries improves prediction accuracy. The overall performance improvement is 2.82% and is statistically significant. In particular, the customized dictionary for 'sci-fi' yields the highest accuracy improvement among the six genres. Even though this study shows the usefulness of customized dictionaries in sentiment analysis, further studies are required to generalize the results. In this study, we consider only adjectives as additional terms in the customized sentiment dictionary. Other parts of speech, such as verbs and adverbs, can be considered to improve sentiment analysis performance. We also need to apply customized sentiment dictionaries to other domains, such as product reviews.
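
For readers unfamiliar with SO-PMI, the score of a word is its pointwise mutual information with positive seed words minus its pointwise mutual information with negative seed words. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: co-occurrence is counted at the level of whole reviews, the seed sets and toy reviews are invented, and the `smooth` constant that avoids division by zero is an arbitrary choice.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, smooth=0.01):
    """Semantic orientation of `word` from document-level co-occurrence.

    docs: list of token sets, one per review.
    Returns log2 of (hits(word, pos) * hits(neg)) / (hits(word, neg) * hits(pos)),
    with `smooth` added to every count (an assumption for this sketch).
    """
    def hits(pred):
        return sum(1 for d in docs if pred(d))

    def has_pos(d):
        return bool(d & pos_seeds)

    def has_neg(d):
        return bool(d & neg_seeds)

    w_pos = hits(lambda d: word in d and has_pos(d)) + smooth
    w_neg = hits(lambda d: word in d and has_neg(d)) + smooth
    n_pos = hits(has_pos) + smooth
    n_neg = hits(has_neg) + smooth
    return math.log2((w_pos * n_neg) / (w_neg * n_pos))


# Toy reviews, already tokenized (hypothetical data).
reviews = [
    {"gripping", "plot", "great"},
    {"gripping", "great", "cast"},
    {"dull", "plot", "bad"},
    {"dull", "bad", "pacing"},
]
pos, neg = {"great"}, {"bad"}
scores = {w: so_pmi(w, reviews, pos, neg) for w in ("gripping", "dull", "plot")}
print(scores)
```

Words with a clearly positive or negative score would be candidates for a genre dictionary; running the same computation on reviews from one genre at a time is what makes the resulting dictionary genre-specific.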

Identifying Landscape Perceptions of Visitors' to the Taean Coast National Park Using Social Media Data - Focused on Kkotji Beach, Sinduri Coastal Sand Dune, and Manlipo Beach - (소셜미디어 데이터를 활용한 태안해안국립공원 방문객의 경관인식 파악 - 꽃지해수욕장·신두리해안사구·만리포해수욕장을 대상으로 -)

  • Lee, Sung-Hee;Son, Yong-Hoon
    • Journal of the Korean Institute of Landscape Architecture
    • /
    • v.46 no.5
    • /
    • pp.10-21
    • /
    • 2018
  • This study used text mining to examine the perceptions of landscape embedded in text that users spontaneously uploaded in "Taean Travel" blog posts. The study area is the Taean Coast National Park. Most of the places found by searching blogs for 'Taean Travel' were located in the Taean Coast National Park. We conducted a network analysis on the top three places and extracted keywords related to the landscape. Finally, using centrality and cohesion analysis, we derived landscape perceptions and the major characteristics of those landscapes. As a result, it was possible to identify the main tourist places in Taean, individual landscape experiences, and landscape perceptions in specific places. There were three different types of landscape characteristics: atmosphere-related keywords appeared for Kkotji Beach, symbolic image-related keywords appeared for Sinduri Coastal Sand Dune, and landscape object-related keywords appeared for Manlipo Beach. It can be inferred that these three places are perceived differently. Kkotji Beach is recognized as a place to appreciate a view of the sunset and as a base for the Taean Coast National Park's trekking course. Sinduri Coastal Sand Dune is recognized as a place with unusual scenery and as an ecologically valuable space. Finally, Manlipo Beach is adjacent to the Chunlipo Arboretum, which is often visited by tourists, and the beach itself is recognized as a place with an impressive appearance. Social media data are very useful because they enable analysis of various types of content that do not come from an expert's point of view. In this study, we used social media data to analyze various aspects of how people perceive and enjoy landscapes by integrating various content, such as landscape objects, images, and activities. 
However, because social media data may be amplified or distorted by users' memories and perceptions, field surveys are needed to verify the results of this study.
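
The network and centrality analysis mentioned in this abstract can be sketched in a few lines. The example below assumes that keywords appearing in the same blog post are linked, and it uses degree centrality only; the study's actual network construction and cohesion measures are not specified in the abstract, and the posts shown are invented.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_graph(posts):
    """Link every pair of distinct keywords that appear in the same post."""
    graph = defaultdict(set)
    for keywords in posts:
        for a, b in combinations(sorted(set(keywords)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly linked to."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


# Hypothetical keyword lists extracted from three blog posts.
posts = [
    ["sunset", "beach", "trekking"],
    ["sunset", "beach"],
    ["dune", "beach"],
]
centrality = degree_centrality(cooccurrence_graph(posts))
for word, score in sorted(centrality.items(), key=lambda kv: -kv[1]):
    print(f"{word}: {score:.2f}")
```

Keywords with high centrality are the ones most posts connect to other terms, which is one way to read off the dominant landscape perception for a place.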

Predicting the Direction of the Stock Index by Using a Domain-Specific Sentiment Dictionary (주가지수 방향성 예측을 위한 주제지향 감성사전 구축 방안)

  • Yu, Eunji;Kim, Yoosin;Kim, Namgyu;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.1
    • /
    • pp.95-110
    • /
    • 2013
  • Recently, the amount of unstructured data being generated through a variety of social media has been increasing rapidly, resulting in the increasing need to collect, store, search for, analyze, and visualize this data. This kind of data cannot be handled appropriately by using the traditional methodologies usually used for analyzing structured data because of its vast volume and unstructured nature. In this situation, many attempts are being made to analyze unstructured data such as text files and log files through various commercial or noncommercial analytical tools. Among the various contemporary issues dealt with in the literature of unstructured text data analysis, the concepts and techniques of opinion mining have been attracting much attention from pioneer researchers and business practitioners. Opinion mining or sentiment analysis refers to a series of processes that analyze participants' opinions, sentiments, evaluations, attitudes, and emotions about selected products, services, organizations, social issues, and so on. In other words, many attempts based on various opinion mining techniques are being made to resolve complicated issues that could not have otherwise been solved by existing traditional approaches. One of the most representative attempts using the opinion mining technique may be the recent research that proposed an intelligent model for predicting the direction of the stock index. This model works mainly on the basis of opinions extracted from an overwhelming number of economic news reports. News content published on various media is obviously a traditional example of unstructured text data. Every day, a large volume of new content is created, digitized, and subsequently distributed to us via online or offline channels. Many studies have revealed that we make better decisions on political, economic, and social issues by analyzing news and other related information. 
In this sense, we expect to predict the fluctuation of stock markets partly by analyzing the relationship between economic news reports and the pattern of stock prices. So far, in the literature on opinion mining, most studies including ours have utilized a sentiment dictionary to elicit sentiment polarity or sentiment value from a large number of documents. A sentiment dictionary consists of pairs of selected words and their sentiment values. Sentiment classifiers refer to the dictionary to formulate the sentiment polarity of words, sentences in a document, and the whole document. However, most traditional approaches have common limitations in that they do not consider the flexibility of sentiment polarity, that is, the sentiment polarity or sentiment value of a word is fixed and cannot be changed in a traditional sentiment dictionary. In the real world, however, the sentiment polarity of a word can vary depending on the time, situation, and purpose of the analysis. It can also be contradictory in nature. The flexibility of sentiment polarity motivated us to conduct this study. In this paper, we have stated that sentiment polarity should be assigned, not merely on the basis of the inherent meaning of a word but on the basis of its ad hoc meaning within a particular context. To implement our idea, we presented an intelligent investment decision-support model based on opinion mining that performs the scraping and parsing of massive volumes of economic news on the web, tags sentiment words, classifies sentiment polarity of the news, and finally predicts the direction of the next day's stock index. In addition, we applied a domain-specific sentiment dictionary instead of a general-purpose one to classify each piece of news as either positive or negative. For the purpose of performance evaluation, we performed intensive experiments and investigated the prediction accuracy of our model. 
For the experiments to predict the direction of the stock index, we gathered and analyzed 1,072 articles about stock markets published by "M" and "E" media between July 2011 and September 2011.
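
The pipeline described here (tag sentiment words, classify each article, then predict the next day's direction) can be illustrated with a small sketch. It assumes a dictionary of word and sentiment value pairs as the abstract describes; the example lexicon values, the tie-breaking rule, and the `threshold` on the share of positive articles are hypothetical choices, not parameters reported by the authors.

```python
def article_polarity(tokens, lexicon):
    """Label one tokenized article by summing its dictionary values."""
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    # Assumption: a score of exactly 0 falls to "negative" to keep labels binary.
    return "positive" if score > 0 else "negative"


def predict_direction(articles, lexicon, threshold=0.5):
    """Predict 'up' when the share of positive articles exceeds `threshold`."""
    if not articles:
        raise ValueError("need at least one article")
    labels = [article_polarity(a, lexicon) for a in articles]
    positive_ratio = labels.count("positive") / len(labels)
    return "up" if positive_ratio > threshold else "down"


# Hypothetical domain-specific dictionary: in stock news, "rate" leans negative.
lexicon = {"rally": 1.0, "surge": 0.8, "default": -1.0, "slump": -0.7, "rate": -0.2}
news = [
    ["exports", "surge", "rally"],
    ["rate", "cut", "rally"],
    ["bank", "default", "slump"],
]
prediction = predict_direction(news, lexicon)
print(prediction)
```

Swapping `lexicon` for a general-purpose dictionary while keeping everything else fixed is the kind of comparison that would show whether domain-specific values change the predicted direction.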

Tangible Game Console: Physical Entertainment Media Using I/O Coupling Units (텐저블 게임 콘솔: 입출력 일체화 개체를 이용한 물리적 엔터테인먼트 미디어에 대한 연구)

  • Han, Kyung-Hoon;Lee, Woo-Hun;Yun, Wan-Chul
    • 한국HCI학회:학술대회논문집
    • /
    • 2006.02a
    • /
    • pp.818-823
    • /
    • 2006
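  • To explore how digital content and physical media can be organically combined, this study developed the 'I/O Cell', an input/output coupling unit, and the 'Tangible Game Console', an assembly of such cells. An I/O Cell integrates several kinds of sensors and actuators in a single physical node, and by assembling many I/O Cells it is possible to build entertainment media of various sizes and uses. The Tangible Game Console, developed as one such medium, is a sensor-actuator assembly that can recognize and express various kinds of physical information through links among its cells. It composes game content from physical information instead of the images and text that have been the main means of expression in existing game content, its object of manipulation and means of manipulation are unified, and it can host game content in various genres such as music, puzzle, and action. The Tangible Game Console reproduces physical phenomena such as fire, wind, water, and living creatures through shape, movement, sound, and light, and users experience them through direct manipulation such as hitting, blowing, shaking, and tilting. In addition, the physical interaction is organized around fixed game rules, which provides a lasting play experience. In this study, we developed music, action, and versus games as content for the Tangible Game Console and, by evaluating the play experience of this content, confirmed the console's potential for further development.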


The Design and Implementation of OWL Ontology Construction System through Information Extraction of Unstructured Documents (비정형 문서의 정보추출을 통한 OWL 온톨로지 구축 시스템의 설계 및 구현)

  • Jo, Dae Woong;Choi, Ji Woong;Kim, Myung Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.19 no.10
    • /
    • pp.23-33
    • /
    • 2014
  • Research in information retrieval is shifting from finding large amounts of information quickly to finding information accurately. Personalization and semantic web technology are key technologies in this shift. Automatic indexing and processing of web documents have moved beyond the research stage and appear in practical services. However, research on information retrieval for attached document types other than web documents is lacking. In this paper, we describe a method that analyzes the text content of unstructured documents prepared in text, Word, and HWP formats and constructs an OWL ontology from it. We build the TBox of the document ontology, select the resources that can be obtained from a document, and implement a system that uses them as instances of the constructed document ontology. Automatic ontology construction for this kind of unstructured document can be used effectively in information retrieval and document management systems based on semantic technology.
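
To make the TBox and instance split concrete, the sketch below serializes fields extracted from one document as Turtle: class and property declarations first, then one individual. It is only an illustration under stated assumptions: the `doc:` namespace at example.org, the class name `Document`, and the field names are hypothetical, property names are assumed to be simple identifiers, and the extraction step itself is not shown.

```python
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix doc: <http://example.org/doc#> .\n"  # hypothetical namespace
)


def escape_literal(text):
    """Escape backslashes, double quotes, and newlines for a Turtle string."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def to_turtle(doc_id, fields):
    """Emit a tiny TBox (class and properties) plus one instance."""
    lines = [PREFIXES, "doc:Document a owl:Class ."]
    for prop in sorted(fields):
        lines.append(f"doc:{prop} a owl:DatatypeProperty .")
    if not fields:
        lines.append(f"doc:{doc_id} a doc:Document .")
        return "\n".join(lines)
    body = " ;\n".join(
        f'    doc:{prop} "{escape_literal(value)}"'
        for prop, value in sorted(fields.items())
    )
    lines.append(f"doc:{doc_id} a doc:Document ;\n{body} .")
    return "\n".join(lines)


# Fields as they might come out of an extraction step (hypothetical values).
turtle = to_turtle("report1", {"title": 'Budget "draft"', "format": "hwp"})
print(turtle)
```

Keeping the class and property declarations separate from the individual mirrors the abstract's distinction between building the TBox once and adding each processed document as an instance.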

A Study on The Educational Utilization of Webtoons Theoretical and Practical Evidence Exploration (웹툰(Webtoon)의 교육적 활용가능성에 대한 이론적, 실천적 근거탐색)

  • Seol, Yeon-Kyung
    • The Journal of the Korea Contents Association
    • /
    • v.20 no.6
    • /
    • pp.510-520
    • /
    • 2020
  • This study explores the educational applicability of webtoons. Webtoons have elements that are very useful in educational environments. First, they are extremely popular among learners of the internet age. Webtoon images provide immersion and enjoyment to learners in text-based class environments, and the narrativity and contextuality of webtoons can provide abundant material connected with the subject. In addition, the interactivity of Web 2.0, the space where webtoons circulate, can be utilized in educational environments. This study explores grounds for detailed approaches to the educational applicability of webtoons using theoretical and practical methods. It is hoped that this study will support a positive assessment of the prospects for the educational use of webtoons.