• Title/Summary/Keyword: Korean text classification


Development of an Instructional Material for High School Environmental Education to Achieve Balanced Objectives (균형있는 환경 교육의 목표 달성을 위한 고등학교 환경 교재의 개발)

  • Park, Jin-Hee;Chang, Nam-Kee
    • Journal of The Korean Association For Science Education / v.15 no.1 / pp.39-53 / 1995
  • The purpose of this study was to develop an 'Environmental Science' text for high schools appropriate to the Sixth National Education Curriculum. Given that the ultimate aim of environmental education is to form responsible environmental behaviors, and that the goals of values and behaviors are as important as those of knowledge and skills, a new high school environmental text was developed based on an analysis of seven texts and of environmental education in the Fifth Korean Curriculum. The text has seven units: 1. Habitats: What're the Meanings?, 2. Nuclear Energy: Can't be Avoid?, 3. Acid Rain: What're the Messages?, 4. Ethanol: Is this Future Fuel?, 5. Wastes: A New War!, 6. What're the National and Global Environmental Issues? and 7. Our Water: Can Drink, Really? The text gives equal weight to the four goals of environmental education and avoids a mere listing of knowledge. It therefore includes various teaching strategies and independent student activities. 'Open-ended value learning' and 'free behavior learning' are special learning sections for the acquisition of values and the formation of behaviors. To verify the effects of the newly developed text, it was taught directly to 286 students in total. Post-test scores of the experimental groups for each unit were significantly higher than those of the control groups on each of the four goals. The results of a questionnaire completed by 50 teachers from five different schools were as follows. On the validity of the content selected for the units, 74% of respondents replied positively. On the classification and presentation of the four goal groups, 90% replied positively regarding validity and 82% regarding utility. On the validity of the various teaching strategies, 88% replied positively, and on the degree to which student-centered independent actions were included, 86% replied positively. On the importance and expected effects of 'open-ended value learning' and 'free behavior learning', 88% and 92% responded positively, respectively. Therefore, this text is effective for achieving the four goals of environmental education equally.


Application of Deep Learning and Optical Character Recognition Technology to Automate Classification and Database of Borehole Log for Ground Stability Investigation of Abandoned Mines (폐광산 지반안정성 조사용 시추주상도의 분류 및 데이터베이스화를 위한 딥러닝 및 광학문자인식 기술의 적용)

  • Hosang Han;Jangwon Suh
    • Economic and Environmental Geology / v.57 no.5 / pp.473-486 / 2024
  • Boring logs are essential for the evaluation of ground stability in abandoned mine areas, representing geomaterial and subsurface structure information. However, because boring logs are maintained in various analog formats, extracting useful information from them is prone to human error and time-consuming. Therefore, this study develops an algorithm to efficiently manage and analyze boring log data for abandoned mine ground investigation provided in PDF format. For this purpose, the EfficientNet deep learning model was employed to classify the boring logs into five types with a high classification accuracy of 1.00. Then, optical character recognition (OCR) and PDF text extraction techniques were utilized to extract text data from each type of boring log. The OCR technique resulted in many cases of misrecognition of the text data of the boring logs, but the PDF text extraction technique extracted the text with very high accuracy. Subsequently, the structure of the database was established, and the text data of the boring logs were reorganized according to the established schema and written as structured data in the form of a spreadsheet. The results of this study suggest an effective approach for managing boring logs as part of the transition to digital mining, and it is expected that the structured boring log data from legacy data can be readily utilized for machine learning analysis.

Image Classification Approach for Improving CBIR System Performance (콘텐트 기반의 이미지검색을 위한 분류기 접근방법)

  • Han, Woo-Jin;Sohn, Kyung-Ah
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.7 / pp.816-822 / 2016
  • Content-based image retrieval (CBIR) searches by image features such as local color, texture, and other image content information, unlike conventional search based on tags or text labels. In real-life data, relatively few images have tags or labels, so it is hard to find relevant images with a text-based approach. Existing image search methods based only on image feature similarity have limited performance and do not ensure that the results are what the user expected. In this study, we propose and validate a machine learning based approach to improve the performance of an image search engine. We note that when users search for relevant images with a query image, they expect the retrieved images to belong to the same category as the query. An image classification method is therefore combined with the traditional image feature similarity method. The proposed method is extensively validated on the public PASCAL VOC dataset, which consists of 11,530 images from 20 categories.
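
The combination of classification and feature similarity described in this abstract might look roughly like the sketch below. It is only an illustration under stated assumptions: the abstract does not say how the two signals are combined, so ranking same-category images first and then sorting by cosine similarity is an assumed strategy, and the function names, feature vectors, and labels are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, query_label, gallery, top_k=3):
    """Rank gallery images: same predicted category first, then by similarity.

    gallery: list of (image_id, feature_vector, predicted_label) tuples.
    The predicted labels are assumed to come from some image classifier.
    """
    scored = [
        (label == query_label, cosine(query_vec, vec), image_id)
        for image_id, vec, label in gallery
    ]
    # Sort by (category match, similarity), best first.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [image_id for _, _, image_id in scored[:top_k]]

# Toy gallery (hypothetical data).
gallery = [
    ("cat_1", [0.9, 0.1, 0.0], "cat"),
    ("dog_1", [0.8, 0.2, 0.1], "dog"),
    ("cat_2", [0.5, 0.5, 0.1], "cat"),
    ("car_1", [0.0, 0.1, 0.9], "car"),
]
result = retrieve([1.0, 0.0, 0.0], "cat", gallery, top_k=3)
print(result)
```

With similarity alone, `dog_1` would outrank `cat_2`; the category check moves both cat images ahead of it.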

Improving Classification Accuracy in Hierarchical Trees via Greedy Node Expansion

  • Byungjin Lim;Jong Wook Kim
    • Journal of the Korea Society of Computer and Information / v.29 no.6 / pp.113-120 / 2024
  • With the advancement of information and communication technology, we can easily generate various forms of data in our daily lives. To efficiently manage such a large amount of data, systematic classification into categories is essential. For effective search and navigation, data is organized into a tree-like hierarchical structure known as a category tree, which is commonly seen in news websites and Wikipedia. As a result, various techniques have been proposed to classify large volumes of documents into the terminal nodes of category trees. However, document classification methods using category trees face a problem: as the height of the tree increases, the number of terminal nodes multiplies exponentially, which increases the probability of misclassification and ultimately leads to a reduction in classification accuracy. Therefore, in this paper, we propose a new node expansion-based classification algorithm that satisfies the classification accuracy required by the application, while enabling detailed categorization. The proposed method uses a greedy approach to prioritize the expansion of nodes with high classification accuracy, thereby maximizing the overall classification accuracy of the category tree. Experimental results on real data show that the proposed technique provides improved performance over naive methods.
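
The greedy expansion idea in this abstract can be sketched as follows. This is a minimal illustration, not the authors' algorithm: the accuracy model (multiplying an assumed per-split accuracy along the path from the root and weighting each node by its share of documents), the field names `weight`, `split_acc`, and `children`, and the toy tree are all assumptions made for the example.

```python
def overall_accuracy(tree, frontier):
    """Document-weighted accuracy of classifying into the current frontier nodes."""
    return sum(tree[node]["weight"] * acc for node, acc in frontier.items())

def greedy_expand(tree, root, required_acc):
    """Expand nodes one at a time, always choosing the expansion that keeps
    the overall accuracy highest, and stop before it drops below required_acc."""
    frontier = {root: 1.0}  # node -> accuracy accumulated along its path
    while True:
        best_score, best_frontier = None, None
        for node, acc in frontier.items():
            children = tree[node]["children"]
            if not children:
                continue  # terminal node, nothing to expand
            candidate = {n: a for n, a in frontier.items() if n != node}
            for child in children:
                candidate[child] = acc * tree[node]["split_acc"]
            score = overall_accuracy(tree, candidate)
            if best_score is None or score > best_score:
                best_score, best_frontier = score, candidate
        if best_frontier is None or best_score < required_acc:
            return frontier, overall_accuracy(tree, frontier)
        frontier = best_frontier

# Toy category tree (hypothetical numbers).
tree = {
    "root":     {"weight": 1.0, "split_acc": 0.95, "children": ["news", "sports"]},
    "news":     {"weight": 0.6, "split_acc": 0.90, "children": ["politics", "economy"]},
    "sports":   {"weight": 0.4, "split_acc": 0.70, "children": ["soccer", "baseball"]},
    "politics": {"weight": 0.3, "split_acc": None, "children": []},
    "economy":  {"weight": 0.3, "split_acc": None, "children": []},
    "soccer":   {"weight": 0.2, "split_acc": None, "children": []},
    "baseball": {"weight": 0.2, "split_acc": None, "children": []},
}
categories, accuracy = greedy_expand(tree, "root", required_acc=0.85)
print(sorted(categories), round(accuracy, 3))
```

In this toy tree, `news` is split into its subcategories because that split is accurate, while `sports` stays unexpanded because splitting it would push the overall accuracy below the required 0.85.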

Developing the Automated Sentiment Learning Algorithm to Build the Korean Sentiment Lexicon for Finance (재무분야 감성사전 구축을 위한 자동화된 감성학습 알고리즘 개발)

  • Su-Ji Cho;Ki-Kwang Lee;Cheol-Won Yang
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.1 / pp.32-41 / 2023
  • Along with the recent development of big data analysis technology, many studies in finance have sought to extract sentiment from text and verify its informational value. A number of prior studies use pre-defined sentiment dictionaries or machine learning methods to extract sentiment from financial documents. However, both methods have the disadvantage of being labor-intensive and subjective because they require a manual sentiment learning process. In this study, we developed a financial sentiment dictionary that automatically extracts sentiment from the body text of analyst reports by using a modified Bayes rule, and verified the performance of the model through a binary classification model that predicts actual stock price movements. The proposed financial dictionary was found to have about 4% better predictive power for actual stock price movements than the representative Loughran and McDonald (2011) financial dictionary. The sentiment extraction method proposed in this study enables efficient and objective judgment because it automatically learns the sentiment of words using both the change in target price and the cumulative abnormal returns. In addition, the dictionary can be easily updated by re-calculating conditional probabilities. The results of this study are expected to be readily expandable and applicable not only to analyst reports, but also to other financial texts such as performance reports, IR reports, press articles, and social media.
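
To make the idea of learning word sentiment from conditional probabilities concrete, here is a minimal sketch. The abstract does not specify the "modified" Bayes rule, so this uses the plain Bayes rule with add-one smoothing as a stand-in; the `up`/`down` labels (assumed to be derived from price movements), the function name `learn_lexicon`, and the toy reports are hypothetical.

```python
from collections import Counter

def learn_lexicon(reports, smoothing=1.0):
    """Score each word in [-1, 1] from labeled reports using Bayes' rule.

    reports: list of (text, label) pairs with label "up" or "down".
    """
    labels = ("up", "down")
    doc_count = Counter(label for _, label in reports)
    word_count = {label: Counter() for label in labels}
    for text, label in reports:
        word_count[label].update(set(text.split()))  # document frequency
    total = sum(doc_count.values())
    prior = {label: doc_count[label] / total for label in labels}
    vocab = set(word_count["up"]) | set(word_count["down"])

    lexicon = {}
    for word in vocab:
        # P(word | label), smoothed so unseen combinations are not zero.
        likelihood = {
            label: (word_count[label][word] + smoothing)
            / (doc_count[label] + 2 * smoothing)
            for label in labels
        }
        evidence = sum(likelihood[label] * prior[label] for label in labels)
        p_up = likelihood["up"] * prior["up"] / evidence  # P(up | word)
        lexicon[word] = 2 * p_up - 1  # rescale to [-1, 1]
    return lexicon

# Toy analyst-report snippets (hypothetical data).
reports = [
    ("earnings growth strong", "up"),
    ("strong demand growth", "up"),
    ("weak demand loss", "down"),
    ("loss widening weak", "down"),
]
lexicon = learn_lexicon(reports)
print(lexicon["growth"], lexicon["demand"], lexicon["loss"])
```

Updating such a dictionary when new labeled reports arrive only requires recomputing the counts and the conditional probabilities.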

Machine Learning Approach to Classifying Fatal and Non-Fatal Accidents in Industries (사망사고와 부상사고의 산업재해분류를 위한 기계학습 접근법)

  • Kang, Sungsik;Chang, Seong Rok;Suh, Yongyoon
    • Journal of the Korean Society of Safety / v.36 no.5 / pp.52-60 / 2021
  • As the prevention of fatal accidents is considered an essential part of social responsibility, both governments and individuals have devoted efforts to mitigating the unsafe conditions and behaviors that facilitate accidents. Several studies have analyzed the factors that cause fatal accidents and compared them to those of non-fatal accidents. However, studies on mathematical and systematic analysis techniques for identifying the features of fatal accidents are rare. Recently, various industrial fields have employed machine learning algorithms. This study aimed to apply machine learning algorithms to the classification of fatal and non-fatal accidents based on the features of each accident. These features were obtained by text mining literature on accidents. The classification was performed using four machine learning algorithms that are widely used in industrial fields: logistic regression, decision tree, neural network, and support vector machine. The results revealed that the machine learning algorithms exhibited high accuracy in classifying accidents into the two categories. In addition, the importance of comparing similar cases between fatal and non-fatal accidents was discussed. This study presented a method for classifying accidents using machine learning algorithms based on reports from previous studies of accidents.

Perceptual Evaluation of Duration Models in Spoken Korean

  • Chung, Hyun-Song
    • Speech Sciences / v.9 no.1 / pp.207-215 / 2002
  • Perceptual evaluation of duration models of spoken Korean was carried out based on the Classification and Regression Tree (CART) model for text-to-speech conversion. A reference set of durations was produced by a commercial text-to-speech synthesis system for comparison. The duration model which was built in the previous research (Chung & Huckvale, 2001) was applied to a Korean language speech synthesis diphone database, 'Hanmal (HN 1.0)'. The synthetic speech produced by the CART duration model was preferred in the subjective preference test by a small margin and the synthetic speech from the commercial system was superior in the clarity test. In the course of preparing the experiment, a labeled database of spoken Korean with 670 sentences was constructed. As a result of the experiment, a trained duration model for speech synthesis was obtained. The 'Hanmal' diphone database for Korean speech synthesis was also developed as a by-product of the perceptual evaluation.


The Construction of a Domain-Specific Sentiment Dictionary Using Graph-based Semi-supervised Learning Method (그래프 기반 준지도 학습 방법을 이용한 특정분야 감성사전 구축)

  • Kim, Jung-Ho;Oh, Yean-Ju;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.18 no.1 / pp.103-110 / 2015
  • A sentiment lexicon is an essential element for expressing sentiment in a text or recognizing sentiment from a text. We propose a graph-based semi-supervised learning method to construct a sentiment dictionary as a set of sentiment lexicon entries. In particular, we focus on the construction of a domain-specific sentiment dictionary. The proposed method builds a graph from lexicon entries and the proximity among them, and the sentiments of entries whose sentiment values are already known are propagated to all entries on the graph. There are two typical types of sentiment lexicon entry, sentiment words and sentiment phrases; we construct a sentiment dictionary by creating a graph for each type and inferring the sentiment of all entries. To verify the proposed method, we constructed a sentiment dictionary specific to the movie domain and conducted sentiment classification experiments with it. The results show that classification performance using this sentiment dictionary is better than that using a typical general-purpose sentiment dictionary.
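
The propagation step described in this abstract can be illustrated with a simple label propagation sketch. The abstract does not give the update rule, so this assumes the common scheme in which each unlabeled entry repeatedly takes the proximity-weighted average of its neighbors' scores while seed entries stay fixed; the function name `propagate`, the edge weights, and the toy movie-domain words are hypothetical.

```python
def propagate(edges, seeds, iterations=50):
    """Spread seed sentiment scores over an undirected weighted graph.

    edges: {(u, v): proximity weight}; seeds: {entry: score in [-1, 1]}.
    """
    neighbors = {}
    for (u, v), weight in edges.items():
        neighbors.setdefault(u, {})[v] = weight
        neighbors.setdefault(v, {})[u] = weight

    scores = {node: seeds.get(node, 0.0) for node in neighbors}
    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in seeds:
                updated[node] = seeds[node]  # known sentiment stays clamped
            else:
                total = sum(nbrs.values())
                updated[node] = sum(w * scores[n] for n, w in nbrs.items()) / total
        scores = updated
    return scores

# Toy movie-domain graph (hypothetical words and weights).
edges = {
    ("great", "moving"): 0.8,
    ("moving", "touching"): 0.6,
    ("boring", "tedious"): 0.9,
    ("tedious", "touching"): 0.1,
}
seeds = {"great": 1.0, "boring": -1.0}
scores = propagate(edges, seeds)
for word in sorted(scores):
    print(word, round(scores[word], 3))
```

Entries close to a positive seed end up with positive scores and entries close to a negative seed with negative ones, which is how a small set of known entries can label the rest of a domain-specific dictionary.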

Automatic Document Classification Based on k-NN Classifier and Object-Based Thesaurus (k-NN 분류 알고리즘과 객체 기반 시소러스를 이용한 자동 문서 분류)

  • Bang Sun-Iee;Yang Jae-Dong;Yang Hyung-Jeong
    • Journal of KIISE:Software and Applications / v.31 no.9 / pp.1204-1217 / 2004
  • Numerous statistical and machine learning techniques have been studied for automatic text classification. However, because they train classifiers using only the feature vectors of documents, ambiguity between two possible categories significantly degrades classification precision. To remedy this drawback, we propose a new method that incorporates relationship information among categories into existing classifiers. In this paper, we first perform document classification using the k-NN classifier, which is generally known for relatively good performance in spite of its simplicity. We then employ relationship information from an object-based thesaurus to reduce the ambiguity. By referencing various relationships in the thesaurus corresponding to the structured categories, the precision of k-NN classification is drastically improved by removing the ambiguity. Experimental results show that this method improves precision by up to 13.86% over k-NN classification while preserving its recall.
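
A rough sketch of k-NN classification with a thesaurus-based disambiguation step is shown below. The abstract does not describe how the thesaurus relationships are applied, so the rule used here (when the top two categories are within a margin, prefer the one whose thesaurus terms overlap more with the document) is an assumption, as are the function names, the `margin` parameter, and the toy data.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_votes(doc, training, k):
    """Sum the similarities of the k nearest training documents per category."""
    vec = Counter(doc.split())
    nearest = sorted(
        ((cosine(vec, Counter(text.split())), label) for text, label in training),
        reverse=True,
    )[:k]
    votes = Counter()
    for similarity, label in nearest:
        votes[label] += similarity
    return votes

def classify(doc, training, thesaurus, k=3, margin=0.1):
    """k-NN vote; if the top two categories are within `margin`, break the
    ambiguity by overlap between the document and each category's thesaurus terms."""
    ranked = knn_votes(doc, training, k).most_common(2)
    if len(ranked) == 2 and ranked[0][1] - ranked[1][1] < margin:
        terms = set(doc.split())
        ranked.sort(key=lambda item: len(terms & thesaurus.get(item[0], set())),
                    reverse=True)
    return ranked[0][0]

# Toy data (hypothetical).
training = [
    ("stock market price rise", "finance"),
    ("bank interest rate", "finance"),
    ("match goal player score", "sports"),
    ("team player transfer fee", "sports"),
]
thesaurus = {
    "finance": {"stock", "price", "bank", "rate"},
    "sports": {"player", "transfer", "star", "team", "match"},
}
doc = "star player transfer price rise market"
plain = classify(doc, training, thesaurus, k=2, margin=0.0)      # k-NN only
refined = classify(doc, training, thesaurus, k=2, margin=0.25)   # with thesaurus
print(plain, refined)
```

Setting `margin=0.0` disables the thesaurus step, which makes it easy to compare plain k-NN with the disambiguated result on the same document.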

Hybrid Word-Character Neural Network Model for the Improvement of Document Classification (문서 분류의 개선을 위한 단어-문자 혼합 신경망 모델)

  • Hong, Daeyoung;Shim, Kyuseok
    • Journal of KIISE / v.44 no.12 / pp.1290-1295 / 2017
  • Document classification, a task of classifying the category of each document based on text, is one of the fundamental areas for natural language processing. Document classification may be used in various fields such as topic classification and sentiment classification. Neural network models for document classification can be divided into two categories: word-level models and character-level models that treat words and characters as basic units respectively. In this study, we propose a neural network model that combines character-level and word-level models to improve performance of document classification. The proposed model extracts the feature vector of each word by combining information obtained from a word embedding matrix and information encoded by a character-level neural network. Based on feature vectors of words, the model classifies documents with a hierarchical structure wherein recurrent neural networks with attention mechanisms are used for both the word and the sentence levels. Experiments on real life datasets demonstrate effectiveness of our proposed model.