• Title/Summary/Keyword: Korean text classification


A Study of a Vocational High School Curriculum Based on an Analysis of e-Business Demand Education (e-Business Demand Education 분석에 따른 전문계고 Curriculum 연구)

  • An, Jae-Min;Park, Dea-Woo
    • Journal of the Korea Society of Computer and Information / v.14 no.8 / pp.73-80 / 2009
  • The supply of the skilled workers that industry requires is hindered by a skill mismatch between the people industry needs and the graduates that vocational education institutions produce. Industry can raise productivity when graduates are re-educated on the job to master the technologies specific to each sector. This paper studies a curriculum for specialized vocational high schools in the e-Business field that accommodates the demand education industry requires. It analyzes the e-Business industrial and occupational classifications, and it surveys and analyzes the knowledge and skill levels that industry demands of e-Business education. An e-Business curriculum organized around base, support, and application industries is applied to instruction, satisfaction with the industry-driven demand education curriculum is evaluated, and its success factors and ripple effects are confirmed through surveys. The paper proposes a base model for an e-Business curriculum together with subject-level curricula, and aims to contribute to national development by raising productivity through e-Business education that reflects industry demand.

Analysis of Research Trends in Papers Published in the Journal of Korean Medicine for Obesity Research: Focused on 2010-2019 (최근 10년간 한방비만학회지의 연구동향 분석: 2010-2019년 한방비만학회지 게재논문을 중심으로)

  • Park, Seohyun;Song, Yun-kyung
    • Journal of Korean Medicine for Obesity Research / v.20 no.2 / pp.149-177 / 2020
  • Objectives: This study was performed to identify trends in research published in the Journal of Korean Medicine for Obesity Research during the last decade. Methods: All of the articles published in the Journal of Korean Medicine for Obesity Research from 2010 to 2019 were collected. Searches were conducted through "http://jkomor.org." The collected articles were classified by year and type of publication. Additional data, including study design, study topics, characteristics of participants and treatments, and outcomes, were extracted from the full text of each study. Results: A total of 135 articles were analyzed. The number of studies increased after 2015. By type of study, clinical studies accounted for 27%, preclinical studies for 37%, literature studies for 21%, and case reports for 15%. The number of studies has grown and study topics have diversified. However, to improve quality, more attention to subjects, study design, quality assessment according to research guidelines, and ethical considerations is needed. Conclusions: The number of studies and the issues they address have been increasing. To improve the quality of studies, further research should follow.

Analysis of the Korean Tokenizing Library Module (한글 토크나이징 라이브러리 모듈 분석)

  • Lee, Jae-kyung;Seo, Jin-beom;Cho, Young-bok
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.05a / pp.78-80 / 2021
  • Research on natural language processing (NLP) is currently evolving rapidly. Natural language processing is a technology that allows computers to analyze the meaning of the language used in everyday life, and it is applied in fields such as speech recognition, spell checking, and text classification. The most commonly used natural language processing library, NLTK, is based on English and is therefore at a disadvantage for Korean language processing. Accordingly, after introducing the Korean tokenizing libraries KoNLPy and soynlp, we analyze their morphological analysis and processing techniques, compare the soynlp modules that complement KoNLPy's shortcomings, and discuss how to use them in natural language processing models.

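As a concrete illustration of the two libraries compared in the abstract above, the following sketch tokenizes a Korean sentence with KoNLPy's Okt tagger and with soynlp's unsupervised LTokenizer. It is a minimal example, assuming both packages are installed (`pip install konlpy soynlp`) and that a Java runtime is available for KoNLPy; the sample sentence and tiny training corpus are illustrative only.

```python
# Minimal comparison of KoNLPy (dictionary-based) and soynlp (unsupervised) tokenization.
from konlpy.tag import Okt                     # KoNLPy morphological analyzer (needs a JVM)
from soynlp.word import WordExtractor          # learns word scores from a raw corpus
from soynlp.tokenizer import LTokenizer        # L-part tokenizer driven by those scores

sentence = "자연어 처리는 일상 언어의 의미를 컴퓨터가 분석하는 기술이다."

# 1) KoNLPy: dictionary/rule-based morphological analysis.
okt = Okt()
print(okt.morphs(sentence))   # all morphemes
print(okt.nouns(sentence))    # nouns only

# 2) soynlp: learn cohesion scores from a raw corpus (a real corpus would be
#    large and diverse), then tokenize without a predefined dictionary.
corpus = [sentence] * 100
extractor = WordExtractor()
extractor.train(corpus)
scores = {word: s.cohesion_forward for word, s in extractor.extract().items()}
tokenizer = LTokenizer(scores=scores)
print(tokenizer.tokenize(sentence))
```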

Analysis of the Research Trends by Environmental Spatial-Information Using Text-Mining Technology (텍스트 마이닝 기법을 활용한 환경공간정보 연구 동향 분석)

  • OH, Kwan-Young;LEE, Moung-Jin;PARK, Bo-Young;LEE, Jung-Ho;YOON, Jung-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.20 no.1 / pp.113-126 / 2017
  • This study aimed to quantitatively analyze trends in environmental research that utilizes environmental geospatial information through text mining, one of the big data analysis technologies. The analysis covered a total of 869 papers published in the Republic of Korea, collected from the National Digital Science Library (NDSL). On the basis of the classification scheme, the keywords extracted from the papers were recategorized into 10 environmental fields, including "general environment", "climate", and "air quality", and 20 environmental geospatial information fields, including "satellite image", "digital map", and "disaster". With the recategorized keywords, their frequency levels and time series changes in the collected papers were analyzed, as well as the association rules between keywords. First, the frequency analysis showed that "general environment" (40.85%) and "satellite image" (24.87%) had the highest frequency levels among the environmental fields and the environmental geospatial information fields, respectively. Second, the time series analysis of the environmental fields showed that the share of "climate" between 1996 and 2000 was high, but since 2001 the share of "general environment" has increased. Among the environmental geospatial information fields, demand for "satellite image" was highest throughout the period analyzed, and its share of utilization also gradually increased. Third, a total of 80 association rules were generated between environmental fields and environmental geospatial information fields. Among the environmental fields, "general environment" generated the highest number of association rules (17) with environmental geospatial information fields such as "satellite image" and "digital map".
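
The keyword frequency and association-rule analysis described in the abstract above can be reproduced in outline with standard tools. The sketch below, under the assumption that each paper has already been reduced to a set of category keywords, computes keyword frequencies and mines association rules with `mlxtend`; the toy records and thresholds are placeholders, not the study's data or settings.

```python
# Keyword frequency and association-rule mining over paper keyword sets (toy data).
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Each row: the recategorized keywords of one paper (illustrative only).
papers = [
    ["general environment", "satellite image"],
    ["climate", "satellite image", "digital map"],
    ["general environment", "digital map"],
    ["air quality", "satellite image"],
    ["general environment", "satellite image", "disaster"],
]

# One-hot encode the keyword "transactions".
te = TransactionEncoder()
onehot = pd.DataFrame(te.fit_transform(papers), columns=te.columns_)

# Frequency analysis: share of papers containing each keyword.
print((onehot.mean() * 100).round(2).sort_values(ascending=False))

# Association rules between keywords (support/confidence thresholds are placeholders).
itemsets = apriori(onehot, min_support=0.3, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.6)
print(rules[["antecedents", "consequents", "support", "confidence", "lift"]])
```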

Korean speech recognition using deep learning (딥러닝 모형을 사용한 한국어 음성인식)

  • Lee, Suji;Han, Seokjin;Park, Sewon;Lee, Kyeongwon;Lee, Jaeyong
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.213-227 / 2019
  • In this paper, we propose an end-to-end deep learning model that combines a Bayesian neural network with Korean speech recognition. In the past, Korean speech recognition was a complicated task because of the excessive number of parameters in its many intermediate steps and the need for expert knowledge of Korean. Fortunately, Korean speech recognition has become manageable thanks to recent breakthroughs in "end-to-end" models. An end-to-end model decodes mel-frequency cepstral coefficients directly into text without any intermediate processes; in particular, the Connectionist Temporal Classification (CTC) loss and attention-based models are kinds of end-to-end models. In addition, we combine a Bayesian neural network with the end-to-end model and obtain Monte Carlo estimates. Finally, we carry out our experiments on the "WorimalSam" online dictionary dataset. We obtain a 4.58% word error rate, an improvement over the Google and Naver APIs.
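
To make the "end-to-end" idea in the abstract above concrete, the sketch below wires a small recurrent acoustic model to PyTorch's CTC loss and approximates the Bayesian / Monte Carlo component with MC dropout. The layer sizes, the MC-dropout approximation, and the random tensors are assumptions for illustration, not the paper's architecture or data.

```python
# End-to-end sketch: MFCC frames -> BiLSTM -> CTC loss, with MC-dropout predictions.
import torch
import torch.nn as nn

N_MFCC, HIDDEN, N_CLASSES = 40, 128, 50   # placeholder sizes (class 0 is the CTC blank)

class CTCModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.LSTM(N_MFCC, HIDDEN, num_layers=2, dropout=0.3,
                           bidirectional=True, batch_first=True)
        self.drop = nn.Dropout(0.3)       # kept active at test time for MC estimates
        self.out = nn.Linear(2 * HIDDEN, N_CLASSES)

    def forward(self, x):                 # x: (batch, time, n_mfcc)
        h, _ = self.rnn(x)
        return self.out(self.drop(h)).log_softmax(dim=-1)

model, ctc = CTCModel(), nn.CTCLoss(blank=0)

# Dummy batch: 2 utterances of 100 MFCC frames, label sequences of length 20.
x = torch.randn(2, 100, N_MFCC)
targets = torch.randint(1, N_CLASSES, (2, 20))
loss = ctc(model(x).transpose(0, 1),      # CTC expects (time, batch, classes)
           targets,
           input_lengths=torch.full((2,), 100, dtype=torch.long),
           target_lengths=torch.full((2,), 20, dtype=torch.long))
loss.backward()

# Monte Carlo estimate: keep dropout on and average several stochastic forward passes.
model.train()
with torch.no_grad():
    mc_probs = torch.stack([model(x).exp() for _ in range(10)]).mean(dim=0)
```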

Knowledge Extraction Methodology and Framework from Wikipedia Articles for Construction of Knowledge-Base (지식베이스 구축을 위한 한국어 위키피디아의 학습 기반 지식추출 방법론 및 플랫폼 연구)

  • Kim, JaeHun;Lee, Myungjin
    • Journal of Intelligence and Information Systems / v.25 no.1 / pp.43-61 / 2019
  • The development of artificial intelligence technologies has accelerated with the Fourth Industrial Revolution, and AI research has been actively conducted in a variety of fields such as autonomous vehicles, natural language processing, and robotics. Since the 1950s this research has focused on solving cognitive problems related to human intelligence, such as learning and problem solving. The field of artificial intelligence has achieved more technological progress than ever, owing to recent interest in the technology and research on various algorithms. The knowledge-based system is a sub-domain of artificial intelligence; it aims to enable AI agents to make decisions by using machine-readable and processable knowledge constructed from complex, informal human knowledge and rules in various fields. A knowledge base is used to optimize information collection, organization, and retrieval, and recently it has been used together with statistical artificial intelligence such as machine learning. The current purpose of a knowledge base is to express, publish, and share knowledge on the web by describing and connecting web resources such as pages and data. Such knowledge bases are used for intelligent processing in various fields of artificial intelligence, for example the question answering systems of smart speakers. However, building a useful knowledge base is time-consuming and still requires a great deal of expert effort. In recent years, much research on knowledge-based artificial intelligence has used DBpedia, one of the largest knowledge bases, which aims to extract structured content from the various information in Wikipedia. DBpedia contains information extracted from Wikipedia such as titles, categories, and links, but its most useful knowledge comes from Wikipedia infoboxes, which present user-created summaries of some unifying aspect of an article. This knowledge is created through mapping rules between infobox structures and the DBpedia ontology schema defined in the DBpedia Extraction Framework. In this way, DBpedia can achieve high reliability in terms of the accuracy of its knowledge, because the knowledge is generated from semi-structured infobox data created by users. However, since only about 50% of all wiki pages in Korean Wikipedia contain an infobox, DBpedia is limited in terms of knowledge scalability. This paper proposes a method to extract knowledge from text documents according to the ontology schema using machine learning. To demonstrate the appropriateness of this method, we describe a knowledge extraction model that follows the DBpedia ontology schema by learning from Wikipedia infoboxes. Our knowledge extraction model consists of three steps: classifying documents into ontology classes, classifying the appropriate sentences from which to extract triples, and selecting values and transforming them into an RDF triple structure. The structures of Wikipedia infoboxes are defined by infobox templates, which provide standardized information across related articles, and the DBpedia ontology schema can be mapped to these templates. Based on these mapping relations, we classify the input document according to infobox categories, which correspond to ontology classes. After determining the classification of the input document, we classify the appropriate sentences according to the attributes belonging to that classification. Finally, we extract knowledge from the sentences classified as appropriate and convert it into triples.
To train the models, we generated a training data set from a Wikipedia dump by adding BIO tags to sentences, and we trained about 200 classes and about 2,500 relations for knowledge extraction. Furthermore, we evaluated the knowledge extraction process through comparative experiments with CRF and Bi-LSTM-CRF models. Through the proposed process, structured knowledge can be obtained by extracting knowledge from text documents according to the ontology schema. In addition, this methodology can significantly reduce the effort required of experts to construct instances according to the ontology schema.
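
The training-data step described above (adding BIO tags to Wikipedia sentences based on infobox values) can be sketched as follows. This is a simplified, assumed reconstruction for illustration, not the authors' code: it tags the tokens of a sentence that match a known infobox value with B-/I- labels for a given relation.

```python
# Simplified BIO tagging: label tokens in a sentence that match a known infobox value.
from typing import List, Tuple

def bio_tag(tokens: List[str], value_tokens: List[str], relation: str) -> List[Tuple[str, str]]:
    """Tag each token with O, or B-<relation>/I-<relation> where the value occurs."""
    tags = ["O"] * len(tokens)
    n = len(value_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == value_tokens:          # exact span match
            tags[i] = f"B-{relation}"
            for j in range(i + 1, i + n):
                tags[j] = f"I-{relation}"
    return list(zip(tokens, tags))

# Illustrative example: a person article whose infobox lists a birthPlace value.
sentence = "김구 는 황해도 해주 에서 태어났다".split()
print(bio_tag(sentence, ["황해도", "해주"], "birthPlace"))
# [('김구', 'O'), ('는', 'O'), ('황해도', 'B-birthPlace'),
#  ('해주', 'I-birthPlace'), ('에서', 'O'), ('태어났다', 'O')]
```

Sentences tagged this way can then be fed to a sequence labeler such as the CRF or Bi-LSTM-CRF models compared in the paper.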

A Study on Classification into Hangeul and Hanja in Text Area of Printed Document (인쇄체 문서의 문자영역에서 한글과 한자의 구별에 관한 연구)

  • 심상원;이성범;남궁재찬
    • The Journal of Korean Institute of Communications and Information Sciences / v.18 no.6 / pp.802-814 / 1993
  • This paper proposes an algorithm for the preprocessing stage of character recognition that classifies characters into Hangeul and Hanja. We use nine structural characteristics of Hanja, which are not affected by variations in character size and style, together with ratios based on character size, to classify the characters. First, we perform blocking to segment the individual characters. Second, we apply the proposed algorithm to the segmented characters to distinguish Hangeul from Hanja, and finally each character is classified as Hangeul or Hanja. Experiments were carried out on the 2,350 Hangeul and 4,888 Hanja characters of KS C 5601 printed in Gothic and Mincho styles. Testing on a typeface sample book, newspapers, academic society papers, magazines, textbooks, and word-processed documents yielded classification rates of 98.8%, 92%, 96%, 98%, and 98%, respectively.

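For contrast with the structural-feature approach above, which operates on printed character images, a much simpler character-code-based distinction is possible once text has already been recognized. The sketch below is not the paper's algorithm; it merely classifies already-decoded characters by their Unicode ranges (Hangul syllables versus CJK ideographs).

```python
# Classify already-decoded characters as Hangeul or Hanja by Unicode range.
def classify_char(ch: str) -> str:
    code = ord(ch)
    if 0xAC00 <= code <= 0xD7A3:          # Hangul Syllables block
        return "Hangeul"
    if 0x4E00 <= code <= 0x9FFF:          # CJK Unified Ideographs (Hanja)
        return "Hanja"
    return "Other"

print([(ch, classify_char(ch)) for ch in "한글漢字 123"])
# [('한', 'Hangeul'), ('글', 'Hangeul'), ('漢', 'Hanja'), ('字', 'Hanja'),
#  (' ', 'Other'), ('1', 'Other'), ('2', 'Other'), ('3', 'Other')]
```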

A Study on the Distinctive Features of "Hwangjenaegyeongtaeso(黃帝內經太素)" by Yang Sangseon and his Medical Theory ("황제내경태소(黃帝內經太素)"의 특징(特徵) 및 양상선(楊上善)의 의학이론(醫學理論)에 대한 연구(硏究))

  • Lee, Sang-Hyup;Kim, Joong-Han
    • Journal of Korean Medical Classics / v.22 no.2 / pp.35-69 / 2009
  • Yang Sangseon(楊上善)'s "Hwangjenaegyeongtaeso(黃帝內經太素)" was the first commentary on the "Hwangjenaegyeong(黃帝內經)", and its importance is often mentioned on a par with Wang Bing(王冰)'s "Somun(素問)" and "Yeongchu(靈樞)". The distinctive features of Yang Sangseon's commentary are that it is easy to comprehend thanks to its organized classification, and that its explanations are simple and clear. Although he strictly applies the Eumyang(陰陽, Yinyang) theory and the Five Phases[五行] theory throughout the text, where sentences fall out of consistency with these basic theories he adds substantial commentary of his own. His medical theory gives attention to the Meridian system[經絡], lays emphasis on cultivating the soul[神], and holds a unique opinion about the Opening, closing and pivot[開闔樞] theory along with the Myeongmun(命門). To explain the methods for preserving health[養生], he adopted Nojang philosophy(老莊思想); to enrich vitality, he adopted Buddhist philosophy(佛敎思想); and to analyze physiologic and pathogenic factors, he adopted Confucian philosophy(儒家思想).


A Study on the Reclassification of Author Keywords for Automatic Assignment of Descriptors (디스크립터 자동 할당을 위한 저자키워드의 재분류에 관한 실험적 연구)

  • Kim, Pan-Jun;Lee, Jae-Yun
    • Journal of the Korean Society for Information Management / v.29 no.2 / pp.225-246 / 2012
  • This study investigated the possibility of automatic descriptor assignment through the reclassification of author keywords in domestic scholarly databases. In the first stage, we selected the optimal classifiers and parameters for the reclassification by comparing the characteristics of machine learning classifiers. In the next stage, by learning from the author keywords assigned to a selected set of articles, author keywords were automatically assigned to another set of relevant articles. We examined whether the reclassified author keywords had the effect of vocabulary control, just as descriptors collocate documents on the same topic. The results showed that author keyword reclassification is capable of supporting automatic descriptor assignment.
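
The first stage of the study above compares machine learning classifiers for reassigning keywords; a minimal text-classification baseline of that kind can be sketched with scikit-learn as below. The pipeline, toy documents, and labels are illustrative assumptions, not the study's classifiers, features, or data.

```python
# Toy baseline: learn keyword assignment from labeled articles, then predict for new ones.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Training articles (title/abstract text) with an already-assigned keyword (illustrative).
train_texts = [
    "deep learning model for Korean speech recognition",
    "end-to-end acoustic model with CTC loss",
    "association rules mined from environmental keywords",
    "text mining of research trends in geospatial papers",
]
train_keywords = ["speech recognition", "speech recognition", "text mining", "text mining"]

clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_texts, train_keywords)

# Automatically suggest a keyword (descriptor) for an unseen article.
print(clf.predict(["attention based speech recognition for Korean"]))
```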

An analysis of the writing tasks in high school English textbooks: Focusing on genre, rhetorical structure, task types, and authenticity (고등학교 1학년 영어교과서 쓰기활동 과업 분석: 장르, 텍스트 전개구조, 활동 유형, 진정성을 중심으로)

  • Choi, Sunhee;Yu, Ho-Jung
    • English Language & Literature Teaching / v.16 no.4 / pp.267-290 / 2010
  • The purpose of this study is to analyze the writing tasks included in the newly developed high school English textbooks in terms of genre, rhetorical structure, task type, and authenticity, in order to find out whether these tasks can contribute to improving Korean EFL students' writing skills. A total of nine textbooks were selected for the study, and every writing task in each textbook was analyzed. The results show that various genres were incorporated in the tasks, but very few opportunities were provided for students to acquire the characteristics of specific genres. In terms of the rhetorical structure of the texts, narration, illustration, and transaction were required most, whereas not a single writing task asked students to use classification or cause and effect. Many of the writing tasks analyzed offered linguistic and/or content support through the use of models, which shows traces of the product-based approach to teaching writing. Lastly, most of the tasks lacked authenticity, as represented by explicit discussion of purpose and audience. Implications for L2 writing task development and writing instruction in the Korean EFL context are discussed.
