• Title/Summary/Keyword: feature extraction (자질추출)


Opinion Mining of Product Reviews using Sentiment Phrase Patterns considered the Endings of Declinable Words (어미변화를 고려한 감성 구문 패턴을 이용한 상품평 의견 분류)

  • Kim, Jung-Ho;Cha, Myung-Hoon;Kim, Myung-Kyu;Chae, Soo-Hoan
    • Proceedings of the Korean Information Science Society Conference / 2010.06c / pp.285-290 / 2010
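  • As the Internet has become widespread, anyone can easily express an opinion online. As a result, the volume of opinion data expressing thoughts and feelings has grown rapidly, and the emergence of applications that use such data has created demand for efficient search and automatic classification technologies. In line with this trend, many studies on classifying opinion data have been carried out. As classification features, these studies use not single words but N-grams of two or more words, lexical phrase patterns, syntactic phrase patterns, and the like. Patterns in particular have been a major research topic because they are more flexible than single words or N-grams and can express linguistically rich information. Nevertheless, these studies mainly addressed English, and studies that apply patterns to Korean to classify subjective sentences or polarity are still scarce. A characteristic of Korean is that the conjugation of declinable words (predicates) is highly developed, so endings vary widely and meaning shifts subtly with those variations. Existing studies on Korean opinion classification, however, strip the endings and keep only the stems in order to capture the core meaning of each word. They therefore cannot account for the meaning changes carried by endings, and their classification accuracy falls below the results reported for English. This study surveys existing pattern-based methods applied to English and applies one of them, polarity-bearing sentence-constituent patterns, to Korean. It also extracts patterns of ending variation and analyzes how this variation affects opinion classification performance.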


Sentiment Classification considering Korean Features (한국어 특성을 고려한 감성 분류)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.3 / pp.449-458 / 2010
  • As demand grows for efficiently extracting information from the many documents and reviews on the Internet across many fields, automatic classification of opinions and thoughts is required. This automatic classification is called sentiment classification and can be divided into three steps: subjective expression classification, which extracts subjective sentences from documents; sentiment classification, which determines whether the polarity of a document is positive or negative; and strength classification, which determines whether that polarity is weak or strong. Recent studies in opinion mining have used N-gram words, lexical phrase patterns, and syntactic phrase patterns as classification features rather than single words. Patterns in particular have been used frequently because they are more flexible than N-gram words and more deterministic than single words. These studies mainly concern English; studies that use patterns for Korean are still at an early stage. Although in Korean the meaning of a predicate shifts slightly with changes in the ending ('Eomi' in Korean) of a declinable word, earlier studies on Korean opinion classification removed the endings from predicates and kept only the stems. This study reviews earlier pattern-based studies and methods for English, extracts sentiment patterns from Korean documents, and uses them to classify the polarity of those documents. It also analyzes the influence of ending changes on opinion classification performance.
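
To illustrate why stripping endings can discard sentiment cues, the following is a minimal sketch and not the authors' method. It assumes predicates have already been split into (stem, ending) pairs by a morphological analyzer; the function name `ngram_features`, the `keep_endings` flag, and the two toy reviews are hypothetical.

```python
from collections import Counter


def ngram_features(tokens, n=2, keep_endings=True):
    """Count n-grams over (stem, ending) pairs.

    tokens: list of (stem, ending) tuples; ending is "" for uninflected words.
    With keep_endings=False the ending is dropped, mimicking stem-only features.
    """
    units = []
    for stem, ending in tokens:
        units.append(stem + "+" + ending if keep_endings and ending else stem)
    return Counter(tuple(units[i:i + n]) for i in range(len(units) - n + 1))


# Hypothetical, hand-segmented reviews (not taken from the paper's data):
# "품질 좋아요"  -> "the quality is good"
# "품질 좋았는데" -> "the quality was good, but ..."
review_a = [("품질", ""), ("좋", "아요")]
review_b = [("품질", ""), ("좋", "았는데")]

stem_only_a = ngram_features(review_a, keep_endings=False)
stem_only_b = ngram_features(review_b, keep_endings=False)
with_endings_a = ngram_features(review_a)
with_endings_b = ngram_features(review_b)
```

With stems only, both reviews collapse to the same bigram, while keeping the endings yields distinct features that a classifier could weight differently.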


Identification of Differences between Importance and Performance of Forest Interpreter Training Programs using the IPA Method (IPA 기법을 활용한 숲해설가 직무교육프로그램에 대한 중요도와 성취도 차이분석)

  • Choi, Il-Sun;Ha, Si-Yeon;Son, Ji-Won
    • Journal of Korean Society of Forest Science / v.103 no.4 / pp.679-686 / 2014
  • This study used importance-performance analysis (IPA) to analyze the differences between the importance and performance of the 2014 forest interpreter training program, with the aim of suggesting improvements. First, a comparison of the overall averages showed that importance was higher than performance. The IPA then placed six items in quadrant I: confidence in being an interpreter, active involvement, understanding the value of forests, broadened understanding of forests, understanding the mission of an interpreter, and understanding the qualifications of an interpreter. Quadrant II contained interest in the educational content and learning a lot through the education. Quadrant III contained how to deal with service, planning interpreter programs, the clarity, accuracy, and validity of the educational content, an appropriate number of participants, and an appropriate duration of education. Finally, concentration during the education and understanding of the educational topic fell in quadrant IV.
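
The quadrant assignment described above can be sketched as follows. This is an illustration under stated assumptions, not the study's procedure: it assumes the overall means are used as the axis cut-offs, and it assumes one common numbering (I = high importance and high performance, II = high importance and low performance, III = low on both, IV = low importance and high performance), which the abstract does not spell out. The function name and the scores are hypothetical.

```python
from statistics import mean


def ipa_quadrants(items):
    """Map {item: (importance, performance)} to {item: quadrant label}.

    Assumed convention: the mean of each axis is the cut-off, and
    I = high/high, II = high importance/low performance,
    III = low/low, IV = low importance/high performance.
    """
    imp_mean = mean(imp for imp, _ in items.values())
    perf_mean = mean(perf for _, perf in items.values())
    result = {}
    for name, (imp, perf) in items.items():
        high_imp = imp >= imp_mean
        high_perf = perf >= perf_mean
        if high_imp and high_perf:
            result[name] = "I"
        elif high_imp:
            result[name] = "II"
        elif not high_perf:
            result[name] = "III"
        else:
            result[name] = "IV"
    return result


# Hypothetical 5-point survey scores, for illustration only.
scores = {
    "item A": (4.6, 4.4),
    "item B": (4.5, 3.6),
    "item C": (3.8, 3.5),
    "item D": (3.9, 4.3),
}
quadrants = ipa_quadrants(scores)
```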

Recognition Method of Korean Abnormal Language for Spam Mail Filtering (스팸메일 필터링을 위한 한글 변칙어 인식 방법)

  • Ahn, Hee-Kook;Han, Uk-Pyo;Shin, Seung-Ho;Yang, Dong-Il;Roh, Hee-Young
    • Journal of Advanced Navigation Technology / v.15 no.2 / pp.287-297 / 2011
  • As e-mail has come into wide use for its convenience and speed of communication, the volume of malicious and advertising spam has also grown, causing many social and economic problems. A number of approaches have been proposed to alleviate the impact of spam. These approaches can be categorized into pre-acceptance and post-acceptance methods. Post-acceptance methods include Bayesian filters, collaborative filtering, and e-mail prioritization, all of which are based on words or sentences. Spammers, however, alter these characteristics to evade filtering systems. In Korean, abnormal usages can be far more numerous than in other languages because each syllable is composed of a chosung (initial consonant), a jungsung (medial vowel), and a jongsung (final consonant). Existing formal expressions and learning algorithms are limited in how promptly and efficiently they can keep up with such changes. We therefore present a method for recognizing Korean abnormal language (Koral) to improve the accuracy and efficiency of filtering systems. The method operates on syllables rather than words and is based on the Smith-Waterman algorithm. Experiments on filter keywords and on e-mail extracted from a mail server confirmed that Koral is recognized accurately according to the similarity level. The required time and space costs are within acceptable limits.
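
As a rough illustration of syllable-aware matching, the sketch below decomposes each Hangul syllable into its chosung, jungsung, and jongsung using standard Unicode arithmetic and scores a filter keyword against a text with Smith-Waterman local alignment. It is not the paper's implementation: the scoring parameters, the normalization, and the names `to_jamo`, `smith_waterman`, and `similarity` are assumptions.

```python
def to_jamo(text):
    """Split precomposed Hangul syllables (U+AC00..U+D7A3) into jamo tokens.

    Each token is a (position, index) pair; other characters pass through.
    """
    tokens = []
    for ch in text:
        code = ord(ch) - 0xAC00
        if 0 <= code <= 0xD7A3 - 0xAC00:
            tokens.append(("cho", code // 588))
            tokens.append(("jung", (code % 588) // 28))
            if code % 28:  # index 0 means the syllable has no jongsung
                tokens.append(("jong", code % 28))
        else:
            tokens.append(ch)
    return tokens


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def similarity(keyword, text, match=2):
    """Alignment score normalized by the keyword's best possible score."""
    key = to_jamo(keyword)
    if not key:
        return 0.0
    return smith_waterman(key, to_jamo(text), match=match) / (match * len(key))


# "대출" (loan) disguised with an inserted period still scores highly.
score = similarity("대출", "대.출")
```

A filter could then flag a message when `similarity` exceeds a chosen threshold; the threshold itself would need tuning on real data.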

Research of a plan setting Secondary School Teacher Recruitment Test of Electricity·Electronics·Communication Subject (중등교사 임용시험 전기·전자·통신 과목의 출제방안 연구)

  • Kim, Jinsu;Rho, Taechun;Ryu, BungRho;Eun, Taeuk
    • 대한공업교육학회지 / v.31 no.2 / pp.128-154 / 2006
  • In a knowledge-based society, the quality of education is a core factor in national development. Above all, improving educational quality requires improving the quality of teachers, so selecting and appointing competent teachers is essential to maintaining a high level of education. This makes the teacher recruitment test all the more important in a changing era. In particular, the Secondary School Teacher Recruitment Test for the Electricity·Electronics·Communication subject has declined in quality because the separate Electricity, Electronics, and Communication subjects were integrated and the examination criteria became unclear. This research analyzed the seventh curriculum, the curricula of teacher education institutions for the Electricity·Electronics·Communication subject, and past examinations. Based on that analysis, the fields, proportions, and points of the examination were decided through an expert conference, as follows. First, the test consists of subject pedagogics and subject contents, with subject pedagogics making up 20% and subject contents 80%. Second, a subfield of subject contents consists of industrial education, industrial curriculum, industrial instruction method, practical guidance method, management of practical field organization, assessment of industrial education, industrial-educational cooperation, and vocation and career education. Third, subject contents consists of the common special, foundation special, and application special fields: the common special field accounts for 7.4%, the foundation special field for 20%, and the application special field for 52.6%, the last comprising the electric field (21.3%), the electronic field (21.3%), and the communication field (10%). Fourth, the test administers a practical skills test after the written test.

A Study on the Development of Personality Education Program Using Media in Middle School (미디어 활용 중학교 인성교육 프로그램 개발 연구)

  • Lee, Yeonhee
    • Trans- / v.12 / pp.141-171 / 2022
  • This study was conducted to understand media and cultivate personality by using media as material for personality education. To this end, the personality virtues of the Personality Education Promotion Act and the Korea Educational Development Institute were selected as educational elements, and a personality education program using media was developed in combination with the middle school curriculum. First, to extract personality virtues, 13 virtues were finally selected as educational elements by comparing and synthesizing those of the Personality Education Promotion Act and the Korea Educational Development Institute. The final personality virtues selected are self-esteem, courage, sincerity, self-regulation, wisdom, consideration, communication, courtesy, social responsibility, cooperation, citizenship, justice, and respect for human rights. Second, to select media and set the direction of development of the personality education program, the process of collecting media data was confirmed, and the direction and goal of the program were set by analyzing the middle school curriculum. Third, to propose a method of applying a personality education program using media, the units into which personality education would be integrated were selected by referring to the commentary on all subjects of the 2015 revised curriculum.

Nurses' Image perceived by College Nursing Students : Q-Methodological (간호대학생의 간호사 이미지에 대한 인식 : Q 방법론 적용)

  • Oh, Seung Eun;Lee, Hye-Jin;Lee, Joo Young
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.7 / pp.192-199 / 2018
  • This study applied Q methodology to examine the types and characteristics of the image of nurses as perceived by nursing college students, in order to support effective nursing education and clinical education. The survey was conducted from May 15, 2017 to May 24, 2017, and the data for composing the Q population were collected through in-depth interviews and a literature review. First, nursing college students were recruited by convenience sampling; 158 statements were obtained from open questionnaires, and 64 statements were extracted from in-depth interviews and a literature review. To select the Q sample, the Q population was categorized through several repeated readings. Five categories emerged from this process: quality and role, social awareness, professionalism, uniqueness, and working conditions. The selected statements were reviewed and revised by experts, and 35 Q samples were finally selected. On this basis, 46 students at one nursing college sorted the 35 Q statements, and the data were analyzed using the PC QUANL program. The study identified the following types: Type I-1, job-related anxiety; Type I-2, cold and professional; Type II-1, complaint about treatment; and Type II-2, profession-unacceptable. These results are expected to provide useful data for understanding the characteristics of nurses' images and for developing image improvement strategies in nursing education and clinical education.

Component Analysis for Constructing an Emotion Ontology (감정 온톨로지의 구축을 위한 구성요소 분석)

  • Yoon, Ae-Sun;Kwon, Hyuk-Chul
    • Korean Journal of Cognitive Science / v.21 no.1 / pp.157-175 / 2010
  • In human communication, understanding a dialogue participant's emotion is as important as decoding the explicit message. It is well known that non-verbal elements are more suitable than verbal elements for conveying a speaker's emotions. Written texts, however, contain a variety of linguistic units that express emotions. This study analyzes the components needed to construct an emotion ontology, which has numerous applications in Human Language Technology. Most previous work in text-based emotion processing focused on classifying emotions, constructing dictionaries that describe emotion, and retrieving those lexical items in texts through keyword spotting and/or syntactic parsing. The emotions retrieved or computed in this way did not show good accuracy. This study therefore proposes a more sophisticated component analysis and introduces linguistic factors. (1) Five linguistic types of emotion expression are differentiated in terms of target (verbal/non-verbal) and method (expressive/descriptive/iconic). The correlations among them, as well as their correlation with the non-verbal expressive type, are also determined. This characteristic is expected to make the ontology more adaptable to multi-modal environments. (2) As emotion-related components, this study proposes 24 emotion types, a 5-scale intensity (-2~+2), and a 3-scale polarity (positive/negative/neutral), which can describe a variety of emotions in more detail and in a standardized way. (3) We introduce components related to verbal expression, such as 'experiencer', 'description target', 'description method', and 'linguistic features', which can appropriately classify and tag verbal expressions of emotion. (4) By adopting the linguistic tag sets proposed by ISO and TEI and providing a mapping table between our classification of emotions and Plutchik's, our ontology can be easily employed for multilingual processing.
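
The components listed in (2) and (3) could be held in a record such as the one below. This is only a sketch of one possible data structure, not the ontology's actual schema: the class and field names are hypothetical, the 24 emotion types are not enumerated in the abstract and so are left as a free-form string, and the example values are invented.

```python
from dataclasses import dataclass, field

POLARITIES = {"positive", "negative", "neutral"}  # 3-scale polarity


@dataclass
class EmotionAnnotation:
    """One tagged emotion expression (hypothetical schema)."""
    emotion_type: str          # one of the 24 types; labels are not given in the abstract
    intensity: int             # 5-scale intensity, -2 to +2
    polarity: str              # positive / negative / neutral
    experiencer: str
    description_target: str
    description_method: str
    linguistic_features: list = field(default_factory=list)

    def __post_init__(self):
        # Reject values outside the scales named in the abstract.
        if not -2 <= self.intensity <= 2:
            raise ValueError("intensity must be between -2 and +2")
        if self.polarity not in POLARITIES:
            raise ValueError("polarity must be positive, negative, or neutral")


# Invented example values, for illustration only.
example = EmotionAnnotation(
    emotion_type="joy",
    intensity=2,
    polarity="positive",
    experiencer="speaker",
    description_target="gift",
    description_method="descriptive",
)
```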


A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE / v.43 no.10 / pp.1085-1093 / 2016
  • A community-based question answering system provides answers to questions from documents uploaded to web communities. To improve question analysis, earlier methods either developed rules specific to a target domain or applied machine learning to only part of the process. However, these methods incur excessive cost when expanding to new fields or lead to systems that are overfitted to a specific field. This paper proposes a multiple machine learning method that automates the overall process by applying an appropriate machine learning technique at each step of a community-based question answering system. The system is divided into a question analysis part and an answer selection part. The question analysis part consists of a question focus extractor, which identifies the focused phrases in questions using conditional random fields, and a question type classifier, which classifies the topics of questions using a support vector machine. In the answer selection part, we train the weights used by the similarity estimation models through an artificial neural network. In addition, there are many cases in which the results of morphological analysis are unreliable for data uploaded to web communities. We therefore suggest a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the former system, achieving a Mean Average Precision of 0.765 and an R-Precision of 0.872.
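
For readers unfamiliar with the two reported metrics, the sketch below shows the standard definitions of Mean Average Precision and R-Precision over ranked answer lists. It is not the paper's evaluation code; the function names and the toy ranking are hypothetical.

```python
def average_precision(ranked, relevant):
    """Mean of precision@k taken at each rank k where a relevant item appears."""
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked, 1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision within the top R results, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(item in relevant for item in ranked[:r]) / r


def mean_average_precision(runs):
    """runs: list of (ranked_list, relevant_set) pairs, one per question."""
    return sum(average_precision(rk, rel) for rk, rel in runs) / len(runs)


# Toy example: answers "a" and "c" are relevant to one question.
ranked_answers = ["a", "b", "c", "d"]
relevant_answers = {"a", "c"}
ap = average_precision(ranked_answers, relevant_answers)
rp = r_precision(ranked_answers, relevant_answers)
```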

Effective Feature Vector for Isolated-Word Recognizer using Vocal Cord Signal (성대신호 기반의 명령어인식기를 위한 특징벡터 연구)

  • Jung, Young-Giu;Han, Mun-Sung;Lee, Sang-Jo
    • Journal of KIISE:Software and Applications / v.34 no.3 / pp.226-234 / 2007
  • In this paper, we develop a speech recognition system using a throat microphone. This kind of microphone minimizes the impact of environmental noise. However, because high frequencies are absent and formant frequencies are partially lost, previous systems built on such devices have shown lower recognition rates than systems that use standard microphone signals. As a result, researchers have used throat microphone signals only as supplementary data supporting standard microphone signals. We present a high-performance ASR system developed using only a throat microphone, taking advantage of Korean Phonological Feature Theory and a detailed analysis of the throat signal. By analyzing the spectrum and the FFT of the throat microphone signal, we find that the conventional MFCC feature vector, which uses a critical pass filter, does not characterize throat microphone signals well. We also describe the conditions a feature extraction algorithm must meet to suit throat microphone signal analysis: (1) a sensitive band-pass filter and (2) a feature vector suitable for voice/non-voice classification. We show experimentally that the ZCPA algorithm, designed to meet these conditions, improves the recognizer's performance by approximately 16%. We also find that an additional noise-canceling algorithm such as RASTA yields a further 2% improvement.
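
The core idea behind ZCPA (zero-crossings with peak amplitudes) can be sketched as below. This is a heavily simplified, single-band illustration and not the paper's feature extractor: a full ZCPA front end first passes the signal through a bank of band-pass filters, which is omitted here, and the function name, bin edges, and test tone are assumptions.

```python
import math


def zcpa_histogram(signal, sample_rate, bin_edges):
    """Accumulate a frequency histogram from upward zero-crossing intervals.

    Each interval between successive upward zero crossings gives a frequency
    estimate (sample_rate / interval length); the bin containing it is
    increased by log(1 + peak amplitude within the interval).
    """
    hist = [0.0] * (len(bin_edges) - 1)
    crossings = [i for i in range(1, len(signal))
                 if signal[i - 1] < 0 <= signal[i]]
    for start, end in zip(crossings, crossings[1:]):
        freq = sample_rate / (end - start)
        peak = max(signal[start:end])
        for b in range(len(hist)):
            if bin_edges[b] <= freq < bin_edges[b + 1]:
                hist[b] += math.log(1.0 + peak)
                break
    return hist


# Synthetic 100 Hz tone sampled at 8 kHz (illustrative input only).
rate = 8000
tone = [math.sin(2 * math.pi * 100 * n / rate + 0.1) for n in range(800)]
features = zcpa_histogram(tone, rate, [0, 50, 150, 400])
```

For the synthetic tone, all of the histogram mass should land in the bin that contains 100 Hz.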