• Title/Summary/Keyword: linguistic analysis features (언어 분석 자질)


Sentiment Classification considering Korean Features (한국어 특성을 고려한 감성 분류)

  • Kim, Jung-Ho;Kim, Myung-Kyu;Cha, Myung-Hoon;In, Joo-Ho;Chae, Soo-Hoan
    • Science of Emotion and Sensibility / v.13 no.3 / pp.449-458 / 2010
  • As the need grows to obtain useful information efficiently from the many documents and reviews on the Internet across many fields, automatic classification of opinions and thoughts is required. Such automatic classification is called sentiment classification, and it can be divided into three steps: subjectivity classification, which extracts subjective sentences from documents; sentiment classification, which determines whether a document's polarity is positive or negative; and strength classification, which determines whether a document's polarity is weak or strong. Recent studies in opinion mining have used features such as word N-grams, lexical phrase patterns, and syntactic phrase patterns rather than single words. Patterns in particular have been used frequently as features because they are more flexible than word N-grams and more deterministic than single words. These studies mainly concern English; studies using patterns for Korean are still at an early stage. In Korean, the meaning of a predicate can differ slightly depending on the ending ('Eomi') attached to a declinable word, yet earlier studies on Korean opinion classification removed endings from predicates and kept only the stems. This study reviews earlier studies and pattern-based methods for English, extracts sentiment patterns from Korean documents, and uses them to classify the polarity of those documents. It also analyzes how changes in endings affect opinion classification performance.

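A minimal sketch of the contrast this abstract studies: stem-only features (endings stripped, as in earlier Korean work) versus ending-aware features. It assumes predicates have already been split into a stem and an ending by a Korean morphological analyzer; the `(stem, ending)` tuples, the helper names, and the toy review are hypothetical, not the paper's data or its pattern extractor, and treating everything after the stem as one "ending" is a simplification.

```python
from collections import Counter

# Hypothetical pre-analyzed input: each predicate is a (stem, ending) pair.
# A real system would obtain these from a Korean morphological analyzer.
Predicate = tuple[str, str]


def stem_features(predicates: list[Predicate]) -> Counter:
    """Earlier approach: strip endings and keep only stems as features."""
    return Counter(stem for stem, _ending in predicates)


def stem_ending_features(predicates: list[Predicate]) -> Counter:
    """Ending-aware approach: keep the ending as part of each feature."""
    return Counter(f"{stem}+{ending}" for stem, ending in predicates)


# "좋다" (is good), "좋았는데" (was good, but ...), "좋겠다" (would be good)
review = [("좋", "다"), ("좋", "았는데"), ("좋", "겠다")]

print(stem_features(review))                 # Counter({'좋': 3})
print(sorted(stem_ending_features(review)))  # ['좋+겠다', '좋+다', '좋+았는데']
```

Stem-only features collapse all three forms into one feature, while the ending-aware version keeps a plain positive statement, a concessive "was good, but", and a wish apart, which is the kind of distinction whose effect on classification the paper measures.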

A Study on the Reliable Video Transmission Through Source/Channel Combined Optimal Quantizer for EREC Based Bitstream (EREC 기반 비트열을 위한 Source-Channel 결합 최적 양자화기 설계 및 이를 통한 안정적 영상 전송에 관한 연구)

  • 김용구;송진규;최윤식
    • The Journal of Korean Institute of Communications and Information Sciences / v.25 no.12B / pp.2094-2108 / 2000
  • Demand for multimedia applications over error-prone communication networks has recently been growing rapidly. Implementing them, however, raises many problems, one of which is handling errors that occur in transmitted video data, because an error in a compressed bitstream propagates severely in both the spatial and temporal directions of the video. To mitigate this severe error propagation, this paper applies the error-confinement technique known as EREC (Error-Resilient Entropy Coding) and analyzes its error-propagation characteristics. From this analysis, we estimate the probability that a single compression-coded basic block (macroblock) is decoded in error and, by approximating this estimated probability, predict the end-to-end (transmitter to receiver) degradation of video quality. The approximation models the probability that a basic block is decoded in error as a simple first-order function of the number of bits generated for that block, fitted by linear regression, so that end-to-end quality degradation can be predicted effectively with a simple method. To make the coded bitstream more robust to transmission errors, we apply this quality-degradation model to quantizer selection and thereby propose a new optimal quantization technique. Unlike existing quantizer optimization techniques, the proposed technique performs quantization so that the quality of the reconstructed video at the decoder is optimal for a given bit rate. Experimental results with the proposed technique applied to the H.263 video compression standard confirm that it substantially improves both objective and subjective video quality at very little additional computational cost.

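The abstract models the probability that a macroblock is decoded in error as a first-order function of the number of bits generated for it, fitted by linear regression. A minimal sketch of that fitting step, assuming (bits, observed error rate) pairs are already available, for example from simulated transmissions; the function names are hypothetical and the sample values are synthetic, chosen to lie exactly on a line, not results from the paper.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def predict_block_error(bits: float, a: float, b: float) -> float:
    """Predicted probability that a block with `bits` bits decodes in error,
    clipped to [0, 1] because a straight line can leave that range."""
    return min(1.0, max(0.0, a + b * bits))


# Synthetic illustration only: error rates lying on p = 0.001 + 0.0002 * bits.
bits = [100, 200, 300, 400]
rates = [0.021, 0.041, 0.061, 0.081]
a, b = fit_line(bits, rates)
print(round(a, 6), round(b, 6))                  # 0.001 0.0002
print(round(predict_block_error(250, a, b), 6))  # 0.051
```

A linear model keeps the per-block prediction to one multiply and one add, which fits the abstract's emphasis on predicting quality degradation with a simple method and low computational cost.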

A Comparative Research on End-to-End Clinical Entity and Relation Extraction using Deep Neural Networks: Pipeline vs. Joint Models (심층 신경망을 활용한 진료 기록 문헌에서의 종단형 개체명 및 관계 추출 비교 연구 - 파이프라인 모델과 결합 모델을 중심으로 -)

  • Sung-Pil Choi
    • Journal of the Korean Society for Library and Information Science / v.57 no.1 / pp.93-114 / 2023
  • Information extraction can facilitate the intensive analysis of documents by providing semantic triples, which consist of named entities recognized in the text and the relations between them. However, most research so far has treated named entity recognition and relation extraction as separate studies, and as a result, the performance of complete information extraction systems has not been properly evaluated. This paper introduces two end-to-end information extraction models, a pipeline model and a joint model, that extract various named entities in clinical records and their relations in the form of semantic triples, and compares their performance in depth. The pipeline model consists of an entity recognition subsystem based on bidirectional GRU-CRFs and a relation extraction module using a multiple-encoding scheme, whereas the joint model is implemented as a single bidirectional GRU-CRF equipped with a multi-head labeling method. In experiments on i2b2/VA 2010, the pipeline model's F-measure was 5.5% higher. In addition, a comparison with existing state-of-the-art systems that use large-scale neural language models and manually constructed features establishes the objective performance level of the end-to-end models implemented in this paper.
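End-to-end systems like these are usually scored on whole semantic triples, so a triple counts as correct only if both entities and the relation all match. A minimal sketch of exact-match triple scoring, assuming triples are `(head, relation, tail)` string tuples; the representation and toy example are hypothetical, the relation labels merely imitate i2b2/VA 2010 relation types, and the official evaluation may apply different matching rules (for example, character spans and entity types).

```python
Triple = tuple[str, str, str]  # (head entity, relation, tail entity)


def triple_prf(gold: set[Triple], predicted: set[Triple]) -> tuple[float, float, float]:
    """Exact-match precision, recall and F1 over sets of semantic triples."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = {
    ("aspirin", "TrIP", "chest pain"),
    ("CT scan", "TeRP", "mass"),
}
predicted = {
    ("aspirin", "TrIP", "chest pain"),  # correct triple
    ("CT scan", "TeCP", "mass"),        # entities right, relation wrong: still an error
}
print(triple_prf(gold, predicted))  # (0.5, 0.5, 0.5)
```

Because a triple is only correct when every part is correct, an entity error in a pipeline propagates into the relation score, which is one reason evaluating the whole system end to end can differ from evaluating NER and relation extraction separately.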

A Study on Automatic Classification of Subject Headings Using BERT Model (BERT 모형을 이용한 주제명 자동 분류 연구)

  • Yong-Gu Lee
    • Journal of the Korean Society for Library and Information Science / v.57 no.2 / pp.435-452 / 2023
  • This study experimented with automatic classification of subject headings using a BERT-based transfer learning model and analyzed its performance by KDC (Korean Decimal Classification) main class and by subject heading category type. Six datasets were constructed from Korean national bibliographies according to how frequently subject headings were assigned, and titles were used as the classification features. On the dataset containing 3,506 subject headings (1,539,076 records), the model achieved a micro F1 score of 0.6059 and a macro F1 score of 0.5626. By KDC main class, performance was good for General Works, Natural Science, Technology, and Language, and low for Religion and Arts. By subject heading category type, the plant, legal name, and product name categories performed well, whereas the national treasure/treasure category performed poorly. In larger datasets, the proportion of subject headings that cannot be assigned increases, lowering overall performance, so classification of low-frequency subject headings needs improvement.
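The abstract reports both micro F1 (0.6059) and macro F1 (0.5626). Micro F1 pools true positives, false positives, and false negatives over all subject headings, so frequent headings dominate; macro F1 averages the per-heading F1 scores, so rare headings weigh as much as common ones, and weak results on low-frequency headings pull it down. A minimal sketch, assuming each record has a set of gold headings and a set of predicted headings; the function name and toy labels are hypothetical, and macro F1 has several variants (this one averages over every label seen in gold or predictions).

```python
from collections import Counter


def micro_macro_f1(gold: list[set[str]], pred: list[set[str]]) -> tuple[float, float]:
    """Micro- and macro-averaged F1 for (multi-)label classification."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same number of records")
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        for label in g & p:
            tp[label] += 1
        for label in p - g:
            fp[label] += 1
        for label in g - p:
            fn[label] += 1

    def f1(t: int, f_pos: int, f_neg: int) -> float:
        # F1 = 2TP / (2TP + FP + FN), defined as 0.0 when nothing is counted
        denom = 2 * t + f_pos + f_neg
        return 2 * t / denom if denom else 0.0

    labels = set(tp) | set(fp) | set(fn)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    macro = (sum(f1(tp[label], fp[label], fn[label]) for label in labels) / len(labels)
             if labels else 0.0)
    return micro, macro


# A frequent heading predicted everywhere; a rare heading never predicted.
gold = [{"history"}, {"history"}, {"history"}, {"botany"}]
pred = [{"history"}, {"history"}, {"history"}, {"history"}]
micro, macro = micro_macro_f1(gold, pred)
print(round(micro, 4), round(macro, 4))  # 0.75 0.4286
```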

A Study for the Certified Security Certification in Private Security Industry in Korea (민간경비 자격제도에 관한 연구)

  • Ahn, Hwang-Kwon
    • Korean Security Journal / no.11 / pp.159-181 / 2006
  • This study examines why certification for security personnel is needed and how security quality can be controlled so that clients receive better service. Certified qualifications are now required in every industry. From this point of view, a certificate is a form of confirmation by an authority of how much specialized knowledge and practical skill a person has in a given field. Moreover, in a functionalist society, a certification system, as an official measure issued by an authority, would have a very positive effect on the related industry and on society. Security means freedom from fear and anxiety. It therefore cannot be operated in isolation from citizens' expectations of safe living, and it deals with precious human lives. To serve this purpose better, security industry employees should receive more systematic specialized training and education. In my understanding, because a certification exam system is a confirmation by an authority, the certificate is simply neutral evidence that earns the confidence and trust of clients. From this point of view, the key issue is how an accredited authority should administer the certification.


A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led researchers to apply text-mining techniques to analyze large volumes of social media data more rigorously. Even though social media text analysis algorithms have improved, previous approaches still have limitations. In sentiment analysis of Korean-language social media, there are two typical approaches. The first, and most common, is the linguistic approach using machine learning; some studies have added grammatical factors to the feature sets used to train classification models. The second applies semantic analysis to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to handle the broader semantic features that existing sentiment analysis has underestimated. The results from Word2Vec are compared with those from co-occurrence analysis to identify the differences between the two approaches. Word2Vec extracted three times as many related words expressing emotion about the keyword as co-occurrence analysis did. This difference stems from Word2Vec's vectorization of semantic features, so Word2Vec can capture hidden related words that traditional analysis has not found. In addition, Korean part-of-speech (POS) tagging is used to detect adjectives as "emotional words" in Korean. The emotional words extracted from the text are then converted into word vectors by the Word2Vec algorithm to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the emotional word in the sentence. 
This study names the process of extracting these trigger factors of emotional words "Emotion Trigger." As a case study, datasets were collected by searching with three keywords, professor, prosecutor, and doctor, because these keywords attract rich public emotion and opinion. A preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used for each keyword to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin Hae-chul Sky Hospital, drinking and plastic surgery, rebate), and Prosecutor (lewd behavior, sponsor). The text data total about 100,000 (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225) and were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the analysis. Gephi (http://gephi.github.io) was used for visualization, and all text processing and analysis programs were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can reveal hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. A limitation of this study is that it is hard to claim that the words extracted by Emotion Trigger processing have a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used for Emotion Trigger include Twitter data, which have a number of distinctive features that this study did not address. 
These features will be considered in further study.
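The Emotion Trigger steps described in this abstract (tag adjectives as emotional words, find related words through Word2Vec vector similarity, keep only nouns) can be sketched together with the sentence-level co-occurrence count the study compares against. This is a minimal illustration, not the authors' Java implementation: it assumes the text is already POS-tagged (the `ADJ`/`NOUN` tags are hypothetical placeholders for a Korean tagger's tag set), and the tiny hand-written vectors stand in for a trained Word2Vec model; all function names are hypothetical.

```python
import math
from collections import Counter

Token = tuple[str, str]  # (word, POS tag); "ADJ"/"NOUN" are placeholder tags


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def words_with_tag(sentences: list[list[Token]], tag: str) -> set[str]:
    """All distinct words carrying the given POS tag."""
    return {word for sent in sentences for word, t in sent if t == tag}


def emotion_triggers(emotion: str, nouns: set[str],
                     vectors: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank candidate nouns by vector similarity to an emotional word."""
    if emotion not in vectors:
        return []
    ranked = sorted((n for n in nouns if n in vectors),
                    key=lambda n: cosine(vectors[emotion], vectors[n]),
                    reverse=True)
    return ranked[:k]


def cooccurring_nouns(emotion: str, sentences: list[list[Token]]) -> Counter:
    """Baseline: count nouns appearing in the same sentence as the emotional word."""
    counts: Counter = Counter()
    for sent in sentences:
        if any(word == emotion for word, _ in sent):
            counts.update(word for word, t in sent if t == "NOUN")
    return counts


sentences = [
    [("professor", "NOUN"), ("angry", "ADJ")],
    [("research", "NOUN"), ("money", "NOUN"), ("misappropriation", "NOUN")],
    [("weather", "NOUN"), ("nice", "ADJ")],
]
# Toy 2-d vectors standing in for a trained Word2Vec model.
vectors = {
    "angry": [0.9, 0.1], "misappropriation": [0.85, 0.2], "money": [0.6, 0.4],
    "professor": [0.5, 0.5], "research": [0.3, 0.7], "weather": [0.1, 0.9],
}

emotions = words_with_tag(sentences, "ADJ")  # the emotional words: angry, nice
nouns = words_with_tag(sentences, "NOUN")
print(cooccurring_nouns("angry", sentences))      # Counter({'professor': 1})
print(emotion_triggers("angry", nouns, vectors))  # ['misappropriation', 'money', 'professor']
```

In this toy data the similarity ranking surfaces "misappropriation" and "money" even though neither shares a sentence with "angry", mirroring the abstract's point that vector similarity can find related words that co-occurrence counting misses.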