• Title/Summary/Keyword: Text Input

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • As sentiment analysis becomes more important for grasping the needs of customers and the public, various deep learning models have been actively applied to English texts. In deep learning sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the model. Here, word vectors generally refer to vector representations of the words obtained by splitting a sentence on space characters. There are several ways to derive word vectors; one is Word2Vec, which was used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in sentiment analysis studies of reviews from fields such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with well-developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Given the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study we use 'morpheme vectors' as the input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What is the desirable range of POS (part-of-speech) tags when deriving morpheme vectors to improve the classification accuracy of a deep learning model?
Is it appropriate to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high ratio of homonyms? Will text preprocessing such as correcting spelling or spacing errors affect classification accuracy, especially when deriving morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarize these issues in three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can a satisfactory level of classification accuracy be achieved when applying deep learning to Korean sentiment analysis? To address these questions, we generate various types of morpheme vectors reflecting them and then compare classification accuracy using a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. For the training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive morpheme vectors, we use data from the same domain as the target and data from another domain: about 2 million cosmetics product reviews from Naver Shopping and 520,000 items of Naver News data, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria.
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of data preprocessing, namely sentence splitting only, or sentence splitting followed by spelling and spacing corrections. Third, they vary in the form of input fed into the word vector model: the morphemes alone, or the morphemes with their POS tags attached. The morpheme vectors further vary in the range of POS tags considered, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. The results suggest that using text from the same domain even with lower grammatical correctness, performing spelling and spacing corrections in addition to sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. POS tag attachment, which was devised to handle the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not appear to have any definite influence on classification accuracy.
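
The input variants described in the abstract above (morphemes alone versus morphemes with POS tags attached, a minimum frequency threshold, and a CBOW context window of 5) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration: the two tiny sentences, the hand-assigned POS tags, the `morpheme/TAG` token format, and the helper names are hypothetical, and the sketch only prepares CBOW training pairs rather than training 300-dimensional vectors.

```python
from collections import Counter

# Hypothetical, hand-tagged morphemes as (surface form, POS tag) pairs.
# A real pipeline would obtain these from a Korean morphological analyzer.
tagged_sentences = [
    [("예쁘", "VA"), ("고", "EC"), ("좋", "VA"), ("아요", "EF")],
    [("색", "NNG"), ("이", "JKS"), ("예쁘", "VA"), ("어요", "EF")],
]

def to_tokens(sentence, attach_pos):
    """Return morphemes alone, or morphemes with their POS tag attached."""
    return [f"{m}/{t}" if attach_pos else m for m, t in sentence]

def filter_min_count(sentences, min_count):
    """Drop tokens that occur fewer than min_count times in the corpus."""
    counts = Counter(tok for s in sentences for tok in s)
    return [[tok for tok in s if counts[tok] >= min_count] for s in sentences]

def cbow_pairs(tokens, window=5):
    """Yield (context, target) pairs; CBOW predicts the target from its context."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        yield left + right, target

plain = [to_tokens(s, attach_pos=False) for s in tagged_sentences]
tagged = [to_tokens(s, attach_pos=True) for s in tagged_sentences]
pairs = list(cbow_pairs(tagged[0], window=5))
frequent = filter_min_count(plain, min_count=2)
```

Attaching the tag makes homonyms with different parts of speech distinct tokens, which is the motivation the abstract gives for that variant; the abstract reports that it had no definite effect on accuracy.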

Design and Implementation of the Java Applet-based Courseware (Java Applet 기반 코스웨어의 설계 및 구현)

  • Kim, Kyu-Soo;Kim, Hyun-Bae
    • Journal of The Korean Association of Information Education / v.4 no.2 / pp.179-186 / 2001
  • The purpose of this study is to design and implement courseware that enables interaction between people and computers on the Internet. To this end, we select the learning content and design courseware using text, graphic data, HTML, JavaScript, and Java applets. The courseware has the following advantages. Interaction between the user and the computer is possible because diverse feedback is given to input responses on the web. The courseware can be accessed regardless of time and place, provided the user's computer has a suitable network environment. Finally, for the operator, revising the courseware becomes easier, and for the client, fewer system resources are required.

BEPAT: A platform for building energy assessment in energy smart homes and design optimization

  • Kamel, Ehsan;Memari, Ali M.
    • Advances in Energy Research / v.5 no.4 / pp.321-339 / 2017
  • Energy simulation tools can provide information on the amount of heat transfer through building envelope components, which are considered the main sources of heat loss in buildings. Therefore, it is important to improve both the quality of the outputs from energy simulation tools and the process of obtaining them. In this paper, a new Building Energy Performance Assessment Tool (BEPAT) is introduced, which provides users with granular data on heat transfer through every single wall, window, door, roof, and floor in a building and automatically saves all the related data in text files. This information can be used to identify envelope components for thermal improvement, either through energy retrofit or during the design phase. The generated data can also be adopted as a supplementary dataset in the design of energy smart homes, building design tools, and energy retrofit tools. BEPAT is developed in C++ by modifying the EnergyPlus source code, which serves as the energy simulation engine; it requires only an Input Data File (IDF) and a weather file to perform the energy simulation and automatically provide detailed output. To validate the BEPAT results, a computer model is developed in Revit for use in BEPAT. Comparing BEPAT's output with the EnergyPlus "advanced output" shows a difference of less than 2%, establishing the capability of this tool to provide detailed output on the quantity of heat transfer through walls, fenestrations, roofs, and floors.

A Study on the Automated Design System for Gear (기어설계 자동화 시스템에 관한 연구)

  • Jo, Hae-Yong;Nam, Gi-Jeong;O, Byeong-Gi
    • Transactions of the Korean Society of Mechanical Engineers A / v.26 no.8 / pp.1506-1511 / 2002
  • In this study, a computer-aided expert system for spur, helical, bevel, and worm gears was developed using the AutoCAD system and its AutoLISP language. Two methods are available for a designer to draw a gear. The first method requires gear design parameters such as pressure, module, number of teeth, shaft angle, velocity, and materials. When these parameters are entered, a gear is drawn in AutoCAD, and the maximum allowable power and shaft diameter are also calculated. The second method calculates all dimensions and gear design parameters needed to draw a gear when information such as transmission, reduction ratio, nm, materials, and pressure is entered. The system includes four programs. Each program is composed of a data input module, a database module, a strength calculation module, a drawing module, a text module, and a drawing edit module. In conclusion, the CAD system is expected to be widely used in companies to obtain geometric data and the manufacturing course.
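
As a rough illustration of how dimensions follow from design parameters such as module and number of teeth, the sketch below uses the standard full-depth spur gear proportions (addendum equal to the module, dedendum equal to 1.25 times the module). This is an assumption made for illustration: the abstract does not state which formulas the AutoLISP programs use, and the function names are hypothetical.

```python
def spur_gear_dimensions(module_mm, teeth):
    """Basic diameters of a standard full-depth spur gear, in millimetres."""
    return {
        "pitch_diameter": module_mm * teeth,
        "outside_diameter": module_mm * (teeth + 2),   # pitch + 2 * addendum
        "root_diameter": module_mm * (teeth - 2.5),    # pitch - 2 * dedendum
    }

def gear_pair(module_mm, pinion_teeth, gear_teeth):
    """Reduction ratio and center distance of two meshing spur gears."""
    return {
        "reduction_ratio": gear_teeth / pinion_teeth,
        "center_distance": module_mm * (pinion_teeth + gear_teeth) / 2,
    }

pinion = spur_gear_dimensions(module_mm=2, teeth=20)
pair = gear_pair(module_mm=2, pinion_teeth=20, gear_teeth=60)
```

Strength calculations such as the maximum allowable power mentioned in the abstract depend on material data and are not attempted here.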

The Development of Forest Fire Statistical Management System using Web GIS Technology

  • Jo, Myung-Hee;Kim, Joon-Bum;Kim, Hyun-Sik;Jo, Yun-Won
    • Proceedings of the KSRS Conference / 2002.10a / pp.183-190 / 2002
  • In this paper, a forest fire statistical information management system is constructed in a web environment using web-based GIS (Geographic Information System) technology. Through this system, general users with a web browser can easily access forest fire statistical information and obtain it in visual forms such as maps, graphs, and text. Moreover, officials responsible for forest fires can easily control and manage all domestic information through the input, retrieval, and output interfaces. To implement the system, Microsoft IIS 5.0 is used as the web server, and Oracle 8i and ASP (Active Server Pages) are used for database construction and dynamic web page operation, respectively. ArcIMS from ESRI is used to serve map data, with Java and HTML as the system development languages. Through this system, general users can obtain comprehensive forest fire information visually in real time and become more aware of forest fire prevention. In addition, forest officials can manage domestic forest resources and control areas at risk of forest fire efficiently and scientifically by analyzing and retrieving large volumes of forest data. As a result, they can save the manpower, time, and cost needed to collect and manage data.

Study on Multimedia Expert Diagnostic System of Chicken Diseases

  • Lu Changhua;Wang Lifang;Nong, Hu-Yi;Wang Qiming;Lu Qingwen
    • Proceedings of the Korea Intelligent Information System Society Conference / 2001.01a / pp.508-510 / 2001
  • Using the method of user-weighted fuzzy mathematics, the authors completed the project “Study on Expert System of Chicken's Common Diseases Diagnostics”, which can properly diagnose 30 kinds of common chicken diseases; its accordance rate reached 80% when verified on 244 disease cases. Building on this work, multimedia technology was further adopted to establish a system that integrates the input, display, query, and processing of sound, pictures, and text. Combined with the previous chicken disease diagnostic expert system, it makes the computer's output richer and more comprehensive and could improve the accordance rate of disease diagnosis. The system consists of a database, a knowledge base, and a graphics and picture base. It is easy to operate, and its interface is vivid and intuitive. It can output diagnostic results and prescriptions rapidly, so it is suited not only to large and medium-sized chicken farms but also to grass-roots veterinary stations for health care and disease diagnosis. The system is expected to have wide prospects for application.

Implementation of Neural Networks using GPU (GPU를 이용한 신경망 구현)

  • Oh Kyoung-su;Jung Keechul
    • The KIPS Transactions:PartB / v.11B no.6 / pp.735-742 / 2004
  • We present a new use of common graphics hardware to speed up artificial neural networks, and we examine how using the GPU improves the time performance of an image processing system based on a neural network. When multiple input sets are computed in parallel, the vector-matrix products become matrix-matrix multiplications, so the parallelism of the GPU can be fully utilized. The sigmoid operation and bias term addition are also implemented with a pixel shader on the GPU. Our preliminary results show a speedup of about thirty times using an ATI RADEON 9800 XT board.
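
The batching idea in the abstract above can be checked on the CPU with a small sketch: stacking several input vectors as the columns of a matrix turns the per-input vector-matrix products into a single matrix-matrix multiplication with the same results. This is plain Python rather than a GPU pixel shader, and the weights, biases, inputs, and function names are made-up values for illustration only.

```python
import math

def matvec(W, x):
    """Multiply weight matrix W (one row per output unit) by input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B, both given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def layer_single(W, b, x):
    """One layer applied to one input: sigmoid(W x + b)."""
    return [sigmoid(z + bi) for z, bi in zip(matvec(W, x), b)]

def layer_batch(W, b, X):
    """One layer applied to all inputs at once; the columns of X are inputs."""
    Z = matmul(W, X)
    return [[sigmoid(z + bi) for z in row] for row, bi in zip(Z, b)]

W = [[0.5, -1.0, 0.25], [1.0, 0.0, -0.5]]   # 2 output units, 3 inputs
b = [0.1, -0.2]
inputs = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0], [-1.0, 0.5, 2.0]]
X = [list(col) for col in zip(*inputs)]      # 3 x 3, one column per input
batch = layer_batch(W, b, X)                 # 2 x 3, one column per input
```

Column `j` of `batch` matches `layer_single(W, b, inputs[j])`; on a GPU the single large multiplication is what exposes the parallelism.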

Movement Search in Video Stream Using Shape Sequence (동영상에서 모양 시퀀스를 이용한 동작 검색 방법)

  • Choi, Min-Seok
    • Journal of Korea Multimedia Society / v.12 no.4 / pp.492-501 / 2009
  • Information on the movement of objects in videos can play an important part in categorizing and separating the contents of a scene. This paper proposes a shape-based movement-matching algorithm to effectively find the movement of an object in video streams. Object movement information is extracted from the object boundaries in the input video frames and expressed as continuous 2D shape information, and each piece of 2D shape information is converted into a 1D shape feature using a shape descriptor. Using the sequence of shape descriptors listed in order, object movement in video can be found as simply as searching for a word in a text, without a separate movement segmentation process. A performance comparison with the MPEG-7 shape variation descriptor showed that the proposed method can effectively express the movement information of an object and can be applied to movement search and analysis applications.
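
The analogy with searching for a word in a text can be made concrete with a toy sketch. It assumes each frame has already been reduced to a single 1D shape feature in the range [0, 1), quantizes each feature to a letter, and then uses ordinary substring search. The quantization step, the feature values, and the function names are hypothetical; the paper's actual shape descriptor and matching rule are not given in the abstract.

```python
def quantize(features, step=0.25):
    """Map each per-frame 1D shape feature (assumed in [0, 1)) to a letter."""
    return "".join(chr(ord("a") + int(f / step)) for f in features)

def find_movement(video_features, query_features, step=0.25):
    """Return the start frames where the query movement occurs in the video."""
    video = quantize(video_features, step)
    query = quantize(query_features, step)
    hits = []
    start = video.find(query)
    while start != -1:
        hits.append(start)
        start = video.find(query, start + 1)
    return hits

video_features = [0.1, 0.3, 0.6, 0.9, 0.1, 0.3, 0.6, 0.9, 0.5]
query_features = [0.3, 0.6, 0.9]
hits = find_movement(video_features, query_features)
```

Because the whole video is one symbol sequence, no separate step is needed to cut it into movements before searching, which is the property the abstract highlights.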

A comparison of grammatical error detection techniques for an automated English scoring system

  • Lee, Songwook;Lee, Kong Joo
    • Journal of Advanced Marine Engineering and Technology / v.37 no.7 / pp.760-770 / 2013
  • Detecting grammatical errors in text is an application with a long history. In this paper, we compare the performance of two grammatical error detection techniques, each implemented as a sub-module of an automated English scoring system. One uses a full syntactic parser, which has not only grammatical rules but also extra-grammatical rules for detecting syntactic errors while parsing. The other uses a finite state machine, which can identify an error covering a small range of the input. To compare the two approaches, grammatical errors are divided into three groups: errors that can be handled by both approaches, errors that can be handled only by a full parser, and errors that can be handled only by a finite state machine. This division reveals the strengths and weaknesses of each approach. The evaluation results show that the full parsing approach can detect more errors than the finite state machine, while its accuracy is lower. We conclude that a full parser is suitable for detecting grammatical errors with long-distance dependencies, whereas a finite state machine works well on sentences with multiple grammatical errors.
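
A finite state machine that flags errors within a small window of the input can be sketched as follows. The rule set is a deliberately tiny, hypothetical example (pronoun subject immediately followed by a mismatched form of "be"); it is not the rule set of the system described in the abstract, and the state and function names are assumptions.

```python
SINGULAR_SUBJECTS = {"he", "she", "it"}
PLURAL_SUBJECTS = {"we", "they", "you"}

# (state, token) combinations that this toy machine treats as errors.
ERROR_TRANSITIONS = {("SINGULAR", "are"), ("PLURAL", "is")}

def next_state(token):
    """The state depends only on the token just read."""
    if token in SINGULAR_SUBJECTS:
        return "SINGULAR"
    if token in PLURAL_SUBJECTS:
        return "PLURAL"
    return "START"

def detect_agreement_errors(sentence):
    """Return (position, token) for each locally detected agreement error."""
    state = "START"
    errors = []
    for position, token in enumerate(sentence.lower().split()):
        if (state, token) in ERROR_TRANSITIONS:
            errors.append((position, token))
        state = next_state(token)
    return errors

errors = detect_agreement_errors("He are happy but they is late")
```

Each error is found independently of the rest of the sentence, so several errors in one sentence do not interfere with each other; a dependency between distant words, however, is invisible to a machine that only remembers the previous token.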

A Study on Implementation of Printed Character Recognition System And Performance Evaluation (인쇄체 문자 인식기의 성능 평가에 관한 연구)

  • Kim, Min-Soo;Kang, Eun-Young;Kim, Eun-Young;Han, Sun-Hwa;Kim, Jin-Hyung
    • The Transactions of the Korea Information Processing Society / v.7 no.11 / pp.3584-3591 / 2000
  • In this paper, we propose a measure for the performance evaluation of character recognition. We tested three commercial character recognizers and one laboratory character recognizer. The characteristics of each recognizer are compared and analyzed using the proposed evaluation standard. The KT test collection, which is composed of about 1000 document images and the complete source text, is used as the input test data. We also propose a method for measuring the recognition rate at the character level for the evaluation of character recognition. The recognition rates are compared and analyzed by single feature characteristic or mixed feature characteristic.
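
The abstract does not define its character-level measure, so the sketch below shows one common way to compute such a rate, stated here as an assumption: count the character edits (insertions, deletions, substitutions) needed to turn the recognizer output into the source text, and report one minus the error count divided by the source length. The function names and sample strings are hypothetical.

```python
def edit_distance(reference, recognized):
    """Levenshtein distance between two strings, counted in characters."""
    previous = list(range(len(recognized) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, rec_char in enumerate(recognized, start=1):
            cost = 0 if ref_char == rec_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def recognition_rate(reference, recognized):
    """Character-level recognition rate in [0, 1] relative to the source text."""
    if not reference:
        raise ValueError("reference text must not be empty")
    errors = edit_distance(reference, recognized)
    return max(0.0, 1.0 - errors / len(reference))

rate = recognition_rate("recognition", "recogrition")
```

Comparing against the complete source text, as the KT test collection allows, is what makes a character-level measure of this kind possible.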
