• Title/Summary/Keyword: classification of Korean characters

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • As sentiment analysis becomes more important for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep-learning-based sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the models. Here, word vectors generally refer to vector representations of the words obtained by splitting a sentence on space characters. There are several ways to derive word vectors; one of them is Word2Vec, which was used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These vectors have been widely used in studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, and cameras. Unlike English, Korean is a typical agglutinative language with well-developed postpositions and endings, so morphemes play an essential role in its sentiment analysis and sentence structure analysis. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' ('pretty and') consists of the morphemes '예쁘' (adjective) and '고' (connective ending). Given the significance of Korean morphemes, it seems reasonable to adopt morphemes as the basic unit of Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as the input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. This raises several questions. What range of POS (Part-Of-Speech) tags is desirable when deriving morpheme vectors to improve the classification accuracy of a deep learning model?
Is it proper to apply a typical word vector model, which relies primarily on the surface form of words, to Korean, which has a high proportion of homonyms? Will text preprocessing such as correcting spelling or spacing errors affect classification accuracy, especially when morpheme vectors are drawn from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions. First, as the initial input of a deep learning model, is it more effective to use morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? To address these questions, we generate various types of morpheme vectors reflecting them and then compare classification accuracy using a non-static CNN (Convolutional Neural Network) model that takes the morpheme vectors as input. For the training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive the morpheme vectors, we use data from both the target domain and another domain: about 2 million Naver Shopping cosmetics product reviews and 520,000 items of Naver News data, the latter arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in the following three criteria.
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of preprocessing: sentence splitting only, or sentence splitting followed by additional spelling and spacing correction. Third, they differ in the form of input fed into the word vector model: the morphemes alone, or the morphemes with their POS tags attached. The morpheme vectors further vary in the range of POS tags considered, the minimum frequency of the morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (Continuous Bag-Of-Words) model, using a context window of 5 and a vector dimension of 300. The results suggest that better classification accuracy is obtained by using same-domain text even if its grammatical correctness is lower, by performing spelling and spacing correction as well as sentence splitting, and by incorporating morphemes of all POS tags, including the incomprehensible category. POS tag attachment, which was devised to handle the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme do not seem to have any definite influence on classification accuracy.
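The input variants described above (POS tag range, POS tag attachment, minimum frequency, CBOW context window, random initialization of unknown morphemes) can be illustrated with a minimal sketch of the corpus-preparation side. It assumes a review that has already been split into (morpheme, tag) pairs using Sejong-style tags such as VA (adjective) and EC (connective ending); the sample review, the function names, and the initialization scale of 0.25 are hypothetical illustrations, not the authors' code. The CBOW training itself and the CNN are not implemented here.

```python
import random
from collections import Counter

# Hypothetical pre-analyzed review "색이 예쁘고 좋아요" as (morpheme, POS tag)
# pairs with assumed Sejong-style tags (NNG noun, JKS subject particle,
# VA adjective, EC connective ending, EF final ending). A real pipeline
# would obtain these from a Korean morphological analyzer.
REVIEW = [("색", "NNG"), ("이", "JKS"), ("예쁘", "VA"), ("고", "EC"),
          ("좋", "VA"), ("아요", "EF")]


def to_tokens(morphemes, attach_pos=False, keep_tags=None):
    """Turn (form, tag) pairs into tokens for a word vector model.

    attach_pos=True emits 'form/TAG', so homonyms with different tags get
    separate vectors; keep_tags (a set) restricts the range of POS tags.
    """
    tokens = []
    for form, tag in morphemes:
        if keep_tags is not None and tag not in keep_tags:
            continue
        tokens.append(f"{form}/{tag}" if attach_pos else form)
    return tokens


def build_vocab(sentences, min_count):
    """Keep tokens that occur at least min_count times in the corpus."""
    counts = Counter(tok for sent in sentences for tok in sent)
    return {tok for tok, n in counts.items() if n >= min_count}


def cbow_pairs(tokens, vocab, window=5):
    """Yield (context, target) pairs in the form CBOW trains on.

    Out-of-vocabulary tokens are dropped first; the context is up to
    `window` tokens on each side of the target.
    """
    kept = [t for t in tokens if t in vocab]
    for i, target in enumerate(kept):
        context = kept[max(0, i - window):i] + kept[i + 1:i + 1 + window]
        if context:
            yield context, target


def init_unknown(dim=300, scale=0.25, seed=0):
    """Random vector in [-scale, scale] for a morpheme with no trained vector."""
    rng = random.Random(seed)
    return [rng.uniform(-scale, scale) for _ in range(dim)]
```

In practice the CBOW training would usually be delegated to a library; with gensim 4, for example, `Word2Vec(sentences, vector_size=300, window=5, sg=0, min_count=k)` matches the reported window and dimension, where `sg=0` selects CBOW and `k` is whatever frequency threshold is being tested.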

Habitat and Phytosociological Characters of Ceratopteris thalictroides, Endangered Plant Species on Paddy Field, in Nakdong River (논 잡초 멸종위기식물인 물고사리의 낙동강유역 자생지 최초보고 및 군락분류)

  • Choi, Byoung-Ki;Lee, Chang-Woo;Huh, Man-Kyu
    • Weed & Turfgrass Science / v.3 no.1 / pp.50-55 / 2014
  • This study aims to classify the syntaxa of the Ceratopteris thalictroides-dominant community along the Nakdong River and to collect basic data for habitat research. The communities were analyzed using the Zürich-Montpellier (Z.-M.) school method and a numerical classification technique. Three communities were classified: the Persicaria japonica-Ceratopteris thalictroides community, the Lindernia procumbens-Ceratopteris thalictroides community, and the Limnophila indica-Ceratopteris thalictroides community. The ordination analysis displayed the vegetation types with respect to complex environmental gradients. After ordination and clustering analysis, effective humidity, soil stability, trampling effects, anthropogenic effects, and flooding frequency were identified as the important factors determining the vegetation pattern. The study points out the need to establish a long-term ecological site to protect such vulnerable vegetation against overexploitation and global climate change.

Data Acquisition System Using the Second Binary Code (2차원 부호를 이용한 정보 획득 시스템)

  • Kim, In-Kyeom
    • The Journal of Information Technology / v.6 no.1 / pp.71-84 / 2003
  • This paper presents an efficient data recognition system using the proposed binary code images. The proposed algorithm locates the position of the binary code image. In the block region classification step, each block is classified as an edge region using only its gray-level values, and the edge region blocks are divided into horizontal and vertical edge regions. If more than six horizontal edge region blocks are found in a region, the algorithm searches for a vertical edge region at the start point of the horizontal edge region. If more than ten vertical edge region blocks are found in the vertical region, the code image is considered found. The actual code region is confirmed from the ratio of the total edge region, computed from the image binarized with the average gray value. If this ratio is wrong, the code search restarts from the point after the start point and the whole process is repeated. The repeated search takes less time than the first pass because the block classification information is already available; block processing is faster than processing the whole image. The proposed system acquires an image from a digital camera and produces a binary image from it. Finally, the proposed system extracts various characters from the binary image.
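The search rule above (more than six horizontal edge blocks, then more than ten vertical edge blocks from the run's start point) can be sketched as follows. This is a simplified reading, assuming the per-block edge classification has already produced a grid of labels; the label characters, function names, and the choice to resume scanning after a failed run (rather than exactly one block after the start point) are hypothetical simplifications. Binarization at the average gray value is shown as a separate helper.

```python
def binarize(gray):
    """Threshold a grayscale image (list of rows) at its average gray level.

    1 marks pixels brighter than the average, 0 the rest.
    """
    pixels = [p for row in gray for p in row]
    avg = sum(pixels) / len(pixels)
    return [[1 if p > avg else 0 for p in row] for row in gray]


def find_code_start(labels, min_h=6, min_v=10):
    """Return (row, col) where a candidate code region starts, or None.

    labels is a grid of per-block labels: 'H' horizontal edge block,
    'V' vertical edge block, '.' anything else. A run of more than min_h
    'H' blocks in a row triggers a downward search from the run's start
    block; more than min_v consecutive 'V' blocks there counts as a hit.
    """
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        c = 0
        while c < cols:
            if labels[r][c] != "H":
                c += 1
                continue
            start = c
            while c < cols and labels[r][c] == "H":
                c += 1
            if c - start > min_h:
                # Count vertical edge blocks directly below the run's start.
                v, rr = 0, r + 1
                while rr < rows and labels[rr][start] == "V":
                    v += 1
                    rr += 1
                if v > min_v:
                    return (r, start)
    return None
```

The edge-ratio check on the binarized region, which the abstract uses to confirm a hit, is left out of this sketch.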

A numerical taxonomic study on heterophyid trematodes (Heterophyidae에 관한 수리분류학적 연구)

  • 김기홍;윤영한
    • Parasites, Hosts and Diseases / v.29 no.1 / pp.55-66 / 1991
  • A numerical taxonomic study was carried out on a group of heterophyid trematodes, analyzing the following species: Metagonimus yokogawai (3 OTUs, Operational Taxonomic Units), Metagonimus Miyata type (3 OTUs), Metagonimus takahashii (2 OTUs), Heterophyes dispar (2 OTUs), Heterophyes heterophyes (1 OTU), Heterophyes nocens (2 OTUs), Heterophyopsis continua (1 OTU), Pygidiopsis summa (3 OTUs), Stellantchasmus falcatus (2 OTUs) and Stictodora sari (2 OTUs). Twenty-six morphological characters were measured and their values were expressed as relative ratios. Similarity and correlation matrices among individuals were calculated. Cluster analysis by Ward's method and factor analysis were performed using the SAS (Statistical Analysis System) package. As a result, the groups belonging to the genus Metagonimus were divided into three phenons (Metagonimus yokogawai, Metagonimus Miyata type, M. takahashii), and Metagonimus Miyata type was classified at the level of a subspecies of M. takahashii. The groups belonging to the genus Heterophyes were clearly divided into three phenons (Heterophyes dispar, H. heterophyes, H. nocens), and H. nocens was classified not as a subspecies of H. heterophyes but as a distinct species. The other species were classified as distinct phenons. These results suggest that the application of numerical taxonomy to trematode classification is a great aid in determining the limits of taxa.
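Ward's method, used above for the cluster analysis, merges at each step the pair of clusters whose union gives the smallest increase in within-cluster sum of squares. The following is a minimal pure-Python sketch of that criterion, not the SAS procedure the study used; the function name and the toy feature vectors in the usage check are hypothetical stand-ins for per-OTU relative ratios.

```python
def ward_cluster(points, n_clusters):
    """Agglomerative clustering with Ward's minimum-variance criterion.

    points: list of equal-length feature vectors (e.g. relative ratios of
    morphological characters per OTU). Returns a sorted list of clusters,
    each a sorted list of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    centroids = [list(p) for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                na, nb = len(clusters[a]), len(clusters[b])
                sq = sum((x - y) ** 2 for x, y in zip(centroids[a], centroids[b]))
                # Increase in within-cluster sum of squares if a and b merge.
                cost = na * nb / (na + nb) * sq
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        na, nb = len(clusters[a]), len(clusters[b])
        centroids[a] = [(na * x + nb * y) / (na + nb)
                        for x, y in zip(centroids[a], centroids[b])]
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b], centroids[b]  # b > a, so index a stays valid
    return sorted(clusters)
```

This naive version rescans all pairs at every step, which is fine for a few dozen OTUs; library implementations use the Lance-Williams update to avoid recomputing distances.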

Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs (한국어 제목 개체명 인식 및 사전 구축: 도서, 영화, 음악, TV프로그램)

  • Park, Yongmin;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.285-292 / 2014
  • Named entity recognition is used to improve the performance of information retrieval systems, question answering systems, machine translation systems, and so on. The targets of named entity recognition are usually PLOs (persons, locations, and organizations). These are usually proper nouns or unregistered words, and traditional named entity recognizers use these characteristics to find named entity candidates. The titles of books, movies, and TV programs have different characteristics from PLO entities: a title may consist of multiple phrases or a whole sentence, or contain special characters. This makes it difficult to find named entity candidates. In this paper, we propose a method to quickly extract title named entities from news articles and automatically build a named entity dictionary for the titles. To identify candidates, word phrases enclosed in special symbols within a sentence are first extracted and then verified by an SVM using feature words and their distances. To classify the extracted title candidates, an SVM is used with the mutual information of word contexts.
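The first step of the candidate identification described above, extracting phrases enclosed in special symbols, might look like the sketch below. The set of symbol pairs is an assumption (brackets and quotation marks commonly used around titles in Korean news text), not the list used in the paper, and the SVM verification and classification steps are not shown; the function name is hypothetical.

```python
import re

# Assumed set of enclosing symbol pairs; the paper's actual list may differ.
SYMBOL_PAIRS = [("《", "》"), ("〈", "〉"), ("『", "』"), ("「", "」"),
                ("<", ">"), ("‘", "’"), ("“", "”")]


def extract_title_candidates(sentence):
    """Return phrases enclosed in any symbol pair, in order of appearance."""
    found = []
    for open_sym, close_sym in SYMBOL_PAIRS:
        # Match the shortest non-empty span that contains neither symbol.
        inner = "[^" + re.escape(open_sym + close_sym) + "]+"
        pattern = re.escape(open_sym) + "(" + inner + ")" + re.escape(close_sym)
        for m in re.finditer(pattern, sentence):
            found.append((m.start(), m.group(1).strip()))
    return [text for _, text in sorted(found)]
```

Each returned phrase would then be passed to the verification classifier, which decides whether the enclosed span is really a title.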

Morphological Characteristics and Classification Criteria for Azalea Cultivars for Landscaping in Korea (조경용 철쭉재배품종의 형태적 특성 및 분류기준)

  • Choi, Jae-Jin;Park, Seok-Gon
    • Journal of the Korean Institute of Landscape Architecture / v.42 no.2 / pp.77-85 / 2014
  • This study examined the morphological characteristics of the azalea cultivars mainly used for landscaping in Korea (hereinafter, Azalea Cultivars) in order to prepare classification criteria. As test materials, the major Azalea Cultivars cultivated in large quantities by producing companies were collected. The qualitative and quantitative traits of the Azalea Cultivars were then investigated following the characteristic investigation method for new azalea cultivars used by the Korea Seed and Variety Service, in order to classify them and prepare classification criteria. Since cultivar names have not yet been established for Azalea Cultivars for landscaping, the data were compiled using the names used by the cultivating companies. According to the results, 10 Azalea Cultivars were cultivated in Suncheon, Jeonnam, mainly for landscaping: Beni, Daewang, Three, Zasanhong, Hancheol, Sancheoljuk, Gyeobsancheoljuk, Baekcheoljuk, Akado, and Seok-am. Among them, Beni, Daewang, and Three could not be easily distinguished from each other, because cultivating companies commonly call all of them Yeongsanhong and the shapes of their leaves and flowers are similar. In particular, the flower color of Beni was 'bright red', that of Daewang 'vivid purple', and that of Three 'bright purple'. Zasanhong and Hancheol were also similar in shape, but the expression of spots on the flowers and the gloss on the upper side of the leaves were higher and stronger in Hancheol than in Zasanhong. Sancheoljuk flowered in early April, earlier than the other Azalea Cultivars. Gyeobsancheoljuk is the basic species of Sancheoljuk; it had double flowers, although all its other traits were similar to those of Sancheoljuk.
Although Baekcheoljuk was easily distinguished by its white flowers, its leaves were similar to those of Akado, so the two cultivars could not be easily distinguished by their leaves. Akado flowered in early May, later than the other Azalea Cultivars, and its flowers were relatively large in diameter, like those of Baekcheoljuk and Sancheoljuk. Finally, Seok-am was easily distinguished because its leaves were smaller than those of the other cultivars, and it flowered late, like Akado.

Weed-Ecological Classification of the Collected Barnyardgrass [Echinochloa crus-galli(L.) Beauv.] in Korea - II. Classification of collected barnyardgrass in growth pattern by multivariate clustering (한국산(韓國産) 피[Echinochloa crus-galli (L.) Beauv.] 수집종(蒐集種)의 잡초생태학적(雜草生態學的) 분류(分類)에 관(關)한 연구(硏究) - 제(第)II보(報) 다변량(多變量) 해석법(解析法)에 의한 수집종(蒐集種) 피의 분류(分類))

  • Im, I.B.;Guh, J.O.;Lee, Y.M.
    • Korean Journal of Weed Science / v.9 no.1 / pp.1-15 / 1989
  • Seventeen barnyardgrass [Echinochloa crus-galli (L.) Beauv.] accessions, collected nationwide in 1986 and selected twice through 1987, were tested in 1988. To identify the ecological properties of the collected native barnyardgrass accessions as weeds, the experiment was conducted with 1/500a Wagner pots in a PE film house. 1. The accessions were classified into 5 groups by plant type, using data on plant height, maximum tiller number, erectness, culm length, and panicle type, among others. 2. For species identification, they were clustered into 3 similar groups and 2 individual species using data on color, first-glume type, and erectness. 3. Four groups were identified for elongation properties based on the height of 22-day-old seedlings, the lengths of the culm and panicle, leaf length and width, and internode and spikelet lengths, among others. 4. For quantitative growth properties, the accessions were classified into 4 groups and 1 individual accession according to differences in the height of 22-day-old seedlings; the lengths of the culm, panicle, internode, leaf sheath, spikelet, first glume, and grain; the numbers of tillers and spikes; and grain weight. 5. Based on differences in the daily increase rates of seedling height, dry weight, and tiller number and in the ratio of dry weight to plant height, the growth rate properties were clustered into 4 groups and 1 individual accession. 6. Seedling growth properties were classified into 4 groups using differences in the length and width of the first leaf, plant height, tiller number, and dry weight of young and medium-aged seedlings. 7. By heading date, the accessions were classified into 3 groups: temperature-sensitive, medium, and short-day-length-sensitive types. 8. By integrating all quantitative and attribute characters, the seventeen accessions were clustered into 4 groups and 2 individual accessions.

A study in the Effects on the Quality Attributes of Korean Restaurants menu on Revisit Intention - Centering on Korean Students who are Studying in Paris, France - (한식 레스토랑 메뉴품질속성이 재방문의도에 미치는 영향에 관한 연구 - 프랑스 파리 지역 한국인 유학생을 대상으로 -)

  • Lee, Sun-Ho;Kim, Sun-Hee;Kim, Young-Kyun
    • Culinary science and hospitality research / v.18 no.2 / pp.34-50 / 2012
  • By surveying Korean students in Paris, this study empirically analyzed the influence of the quality attributes of Korean restaurant menus on revisit intention, and conducted an IPA (Importance-Performance Analysis) based on the differences between importance and satisfaction to identify strengths, weaknesses, low-priority attributes, and excess attributes. The IPA showed that the visual elements of the food were weaknesses, whereas the price, temperature, taste, quantity, cleanliness, freshness, and flavor of the dishes were strengths. The low-priority attributes were seasonal items, authenticity, originality, sizes, colors, texture, menu explanations, ingredients, recipes, classification of healthy food, suitability of bowls, and creativity of existing food. The study also found that the higher the recognition of the menu's attributes, organic functions, properties, and explanations, the higher the revisit intention. The analysis of menu quality attributes showed that explanations, sensibility, character, and quality of the menu influenced revisit intention, in that order of priority. This study is significant in that it provides useful data for marketing strategies and operational suggestions for the globalization of Korean food by analyzing the local menus of the Korean food service industry.
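The four IPA categories named above correspond to the quadrants of an importance-by-satisfaction grid. The sketch below assigns attributes to quadrants, assuming the crosshairs are placed at the grand means of importance and satisfaction; some IPA studies use the scale midpoint instead, and the abstract does not say which was used. The function name and the ratings in the usage check are made up for illustration.

```python
from statistics import mean


def ipa_quadrants(scores):
    """Assign each attribute to an IPA quadrant.

    scores: dict mapping attribute -> (importance, satisfaction) mean ratings.
    Crosshairs are placed at the grand means (an assumption).
    """
    imp_cut = mean(i for i, _ in scores.values())
    sat_cut = mean(s for _, s in scores.values())
    result = {}
    for attr, (imp, sat) in scores.items():
        if imp >= imp_cut and sat >= sat_cut:
            result[attr] = "strength"      # important and well satisfied
        elif imp >= imp_cut:
            result[attr] = "weakness"      # important but under-satisfied
        elif sat >= sat_cut:
            result[attr] = "excess"        # satisfied beyond its importance
        else:
            result[attr] = "low priority"  # low on both axes
    return result
```

With hypothetical ratings such as `{"taste": (4.6, 4.3), "visual": (4.4, 3.1), "recipes": (3.2, 3.0), "bowls": (3.0, 4.0)}`, the four attributes fall into the strength, weakness, low-priority, and excess quadrants respectively.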

A Study on The Association between Extraordinary Organs(奇恒之腑) and Eight Extra Meridians(奇經八脈) (기항지부(奇恒之腑)와 기경팔맥(奇經八脈)의 관련성 고찰)

  • Lyu, Jeong-Ah;Jeong, Chang-Hyun
    • Journal of Korean Medical classics / v.27 no.2 / pp.49-67 / 2014
  • Subject: The association between Extraordinary Organs (奇恒之腑) and Eight Extra Meridians (奇經八脈). Objectives: This study examines the special aspects that differentiate Extraordinary Organs and Eight Extra Meridians from the ordinary Organs and Meridians, and the association between Extraordinary Organs and Eight Extra Meridians. Methods: First, the classification standard and physiological characteristics of Extraordinary Organs were examined by studying various chapters of the HuangdiNeijing. Second, the association between Extraordinary Organs and Eight Extra Meridians was examined by studying the origin of the Eight Extra Meridians in the HuangdiNeijing. Third, by additionally examining Cheon-gye (天癸) and the shape of the human body, a synthetic hypothesis was drawn on the relationships among the ordinary Meridians and Organs, the muscles and skin that form the body shape, the Extraordinary Organs, and the Eight Extra Meridians. Results & Conclusions: The following conclusions could be drawn. 1. Extraordinary Organs provide the background for shaping the human body, just as the earth provides the background for shaping all creation. The physiological characteristic of Extraordinary Organs is intermediation and regulation between the Ki (氣) of the Five Viscera and Six Bowels and the shape of the muscles and skin of the human body. 2. The origin of the Eight Extra Meridians can be found in the HuangdiNeijing. The collateral Meridians of the Uterus and the Epiglottis Meridian are specifically formulated to supply the Uterus or the Epiglottis. From this, the association between Extraordinary Organs and Eight Extra Meridians can be drawn: the Eight Extra Meridians are specifically formulated to supply the Extraordinary Organs. 3. Cheon-gye (天癸) performs a significant function in the Eight Extra Meridians' supply of the Extraordinary Organs. Cheon-gye is concerned with growth, secondary sexual characteristics, generative function, and the aging process, all of which involve changes in the shape of the human body.
Cheon-gye drives changes in body shape over the human life cycle. 4. The human body has vertical symmetry because it preserves its shape against the gravitational force. The Eight Extra Meridians are located on the middle or flank axes of the human body and thus perform the physiological function of helping the body keep a vertically symmetric shape. The purpose of this vertical symmetry is to secure the space in which the inner Twelve Regular Meridians and the Five Viscera and Six Bowels undergo their own physical changes. On the other hand, these inner changes require deviation between left and right because of the mobility and circulation of force. Yet the human body also changes its shape in the processes of growth, reproduction, and aging. The Eight Extra Meridians play a role in these processes and are thus deeply concerned with the human life cycle and reproduction. 5. The Eight Extra Meridians and Extraordinary Organs were named 'Extra' because of the special aspects that differentiate them from the ordinary Meridians and Organs. Both serve to give the human body a vertically symmetric shape and to maintain it, and are thus deeply concerned with changes over the human life cycle. This shaping and maintenance, together with the changes of the life cycle, are very special aspects of the human body. They therefore need to be understood separately from the ordinary changes of the Five Viscera and Six Bowels and the Twelve Meridians in the inner space.

Lack of Mitochondrial DNA Sequence Divergence between Two Subspecies of the Siberian Weasel from Korea: Mustela sibirica coreanus from the Korean Peninsula and M. s. quelpartis from Jeju Island

  • Koh, Hung-Sun;Jang, Kyung-Hee;Oh, Jang-Geun;Han, Eui-Dong;Jo, Jae-Eun;Ham, Eui-Jeong;Jeong, Seon-Ki;Lee, Jong-Hyek;Kim, Kwang-Seon;Kweon, Gu-Hee;In, Seong-Teak
    • Animal Systematics, Evolution and Diversity / v.28 no.2 / pp.133-136 / 2012
  • The objective of this study was to determine the degree of mitochondrial DNA (mtDNA) divergence between two subspecies of Mustela sibirica from Korea (M. s. coreanus on the Korean Peninsula and M. s. quelpartis on Jeju Island) and to examine the taxonomic status of M. s. quelpartis. We obtained complete sequences of the mtDNA cytochrome b gene (1,140 bp) from the two subspecies and compared them with a corresponding haplotype of M. s. coreanus downloaded from GenBank. This analysis showed that the sequences from monogenic M. s. quelpartis on Jeju Island were identical to the sequences of four M. s. coreanus from four locations across the Korean Peninsula, and that the two subspecies formed a single clade; the average nucleotide distance between the two subspecies was 0.26% (range, 0.00 to 0.53%). We found that the subspecies quelpartis is not genetically distinct from the subspecies coreanus, and that this cytochrome b sequencing result does not support the current classification, which distinguishes these two subspecies by pelage color. Further systematic analyses using morphometric characters and other DNA markers are necessary to confirm the taxonomic status of M. s. quelpartis.
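To put the reported distances in concrete terms: over the 1,140-bp cytochrome b alignment, 3 differing sites give 3/1140 ≈ 0.26% and 6 differing sites give 6/1140 ≈ 0.53%, so the observed range corresponds to roughly zero to six nucleotide differences. The sketch below computes the uncorrected p-distance behind that arithmetic. The abstract does not state which distance model was used (a corrected model such as Kimura 2-parameter would give slightly different values), and this helper ignores gaps and ambiguity codes; the function name is hypothetical.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance: proportion of aligned sites that differ.

    Assumes both sequences are already aligned to the same length and
    contain no gaps or ambiguity codes.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    diffs = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return diffs / len(seq_a)
```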