• Title/Summary/Keyword: kappa coefficient (카파계수)


Sentiment Categorization of Korean Customer Reviews using CRFs (CRFs를 이용한 한국어 상품평의 감정 분류)

  • Shin, Junsoo;Lee, Juhoo;Kim, Harksoo
    • Annual Conference on Human and Language Technology / 2008.10a / pp.58-62 / 2008
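  • Customer reviews are one of the factors shoppers consider when buying products online, but reading them one by one takes considerable time. To reduce this burden, this paper proposes a system that classifies opinions in online product reviews as positive, negative, or neutral. The proposed system is based on the CRFs machine learning model and uses connective endings, morpheme unigrams, and sliding-window morpheme bigrams as features. For the experiments, 561 product reviews were collected from the monitor category of a price-comparison site; 465 were used as training documents and 96 as test documents. The proposed system achieved an accuracy of about 79%. As an additional experiment, a kappa test was conducted to see how closely the system's performance matches that of humans. The kappa coefficient between human annotators was 0.6415, and the average kappa coefficient between the proposed system and the humans was 0.5976. In conclusion, the proposed system falls short of human performance but performs at a similar level.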
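
The kappa test described in this abstract compares two sets of labels for the same reviews. A minimal sketch of Cohen's kappa for two raters follows; the function name `cohen_kappa` and the label lists are hypothetical illustrations, not the paper's code or data.

```python
from collections import Counter

def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: share of items given identical labels.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each rater's own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)

# Hypothetical labels (assumption): one human annotator vs. a classifier.
human = ["pos", "pos", "neg", "neu", "neg", "pos", "neu", "neg", "pos", "neu"]
system = ["pos", "neg", "neg", "neu", "neg", "pos", "pos", "neg", "pos", "neu"]
kappa = cohen_kappa(human, system)
print(round(kappa, 4))
```

An average system-to-human kappa, as reported in the abstract, could be obtained by calling the function once per human annotator and averaging the results.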


Named Entity Recognition for Patent Documents Based on Conditional Random Fields (조건부 랜덤 필드를 이용한 특허 문서의 개체명 인식)

  • Lee, Tae Seok;Shin, Su Mi;Kang, Seung Shik
    • KIPS Transactions on Software and Data Engineering / v.5 no.9 / pp.419-424 / 2016
  • Named entity recognition is needed to improve retrieval accuracy for patent documents and for finding similar patents from claims and patent descriptions. In this paper, we propose an automatic named entity recognition method for patents based on conditional random fields, one of the best methods in machine learning research. The named entity recognition system was built from a tagged training corpus of 660,000 words, and 70,000 words were used as a test set for evaluation. The experiment shows an accuracy of 93.6% and a kappa coefficient of 0.67 between manual tagging and the automatic tagging system. This is better than the kappa coefficient of 0.6 among manually tagged results, which suggests that an automatic named entity tagging system can serve as a practical replacement for manual tagging of patent documents.

Detection of Burned Forest Areas Using Landsat TM Images (Landsat TM 위성영상을 이용한 산불 발생지역의 탐지)

  • 김철민;이승호;노대균
    • Proceedings of the KSRS Conference / 2001.03a / pp.77-81 / 2001
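  • Burn damage from the large forest fire that broke out around Samcheok, Gangwon Province, in April 2000 was surveyed and analyzed using Landsat TM satellite imagery. Image differencing, a change-detection technique, was applied to satellite images from two dates, before and after the fire. The analysis showed that deriving NDVI and using its difference was the most effective way to detect the burned area. With the threshold separating burned areas set to the standard deviation $\times$ 0.9, the overall accuracy against the field survey was 93.8% and the kappa coefficient was 0.82, both very high.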
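
The NDVI differencing step can be sketched as below. The abstract only states that the threshold was the standard deviation times 0.9; applying it as an offset above the mean NDVI drop is an assumption made here, and the function names and pixel values are hypothetical. For Landsat TM, the near-infrared and red inputs correspond to bands 4 and 3.

```python
from statistics import mean, pstdev

def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0

def detect_burned(pre, post, k=0.9):
    """Flag pixels whose NDVI drop exceeds mean drop + k * std (assumed rule).

    pre, post: lists of (nir, red) reflectance pairs in the same pixel order.
    """
    drops = [ndvi(*before) - ndvi(*after) for before, after in zip(pre, post)]
    threshold = mean(drops) + k * pstdev(drops)
    return [d > threshold for d in drops]

# Hypothetical reflectances (assumption): only the last pixel loses vegetation.
pre_fire = [(0.5, 0.1)] * 5
post_fire = [(0.5, 0.1)] * 4 + [(0.2, 0.2)]
mask = detect_burned(pre_fire, post_fire)
print(mask)
```

The resulting mask would then be compared with field-survey data to obtain accuracy figures such as the overall accuracy and kappa coefficient reported above.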


A simulation study of rater agreement measures (모의 실험을 이용한 여러 합치도들의 비교)

  • Han, Kyung-Do;Park, Yong-Gyu
    • Journal of the Korean Data and Information Science Society / v.23 no.1 / pp.25-37 / 2012
  • Many statistics, such as Cohen's (1960) ${\kappa}$, Scott's (1955) ${\pi}$, and Park and Park's (2007) H, have been proposed as measures of agreement to represent inter-rater reliability. This study compared the bias, SE, MSE, and CV of these measures of agreement for nominal and ordinal categories under balanced marginal distributions, and for nominal categories in the two paradoxical situations. In all cases, AC1 and H had smaller SE and CV.
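
To illustrate why such comparisons matter, the sketch below computes Cohen's kappa, Scott's pi, and AC1 (using the usual two-rater definition attributed to Gwet, which is an assumption about what the paper means by AC1) from one contingency table. The table is a hypothetical example of one commonly described paradox, high raw agreement with heavily skewed marginals; Park and Park's H is omitted because its definition is not given here.

```python
def agreement_measures(table):
    """Cohen's kappa, Scott's pi, and AC1 for two raters.

    table[i][j] = number of items rater A put in category i
    and rater B put in category j.
    """
    q = len(table)
    n = sum(sum(row) for row in table)
    p_o = sum(table[k][k] for k in range(q)) / n
    row = [sum(table[k]) / n for k in range(q)]                        # rater A marginals
    col = [sum(table[i][k] for i in range(q)) / n for k in range(q)]   # rater B marginals
    avg = [(row[k] + col[k]) / 2 for k in range(q)]

    # Each measure differs only in how chance agreement is estimated.
    pe_kappa = sum(row[k] * col[k] for k in range(q))
    pe_pi = sum(a * a for a in avg)
    pe_ac1 = sum(a * (1 - a) for a in avg) / (q - 1)

    def corrected(p_e):
        # Undefined when p_e == 1; not handled in this sketch.
        return (p_o - p_e) / (1 - p_e)

    return {"kappa": corrected(pe_kappa), "pi": corrected(pe_pi), "ac1": corrected(pe_ac1)}

# Hypothetical table (assumption): 90% raw agreement, skewed marginals.
result = agreement_measures([[90, 5], [5, 0]])
print({name: round(value, 4) for name, value in result.items()})
```

With this table, kappa and pi come out slightly negative while AC1 stays close to the raw agreement, which is the kind of divergence a simulation study of these measures examines.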

Comparison and Evaluation of Classification Accuracy for Pinus koraiensis and Larix kaempferi based on LiDAR Platforms and Deep Learning Models (라이다 플랫폼과 딥러닝 모델에 따른 잣나무와 낙엽송의 분류정확도 비교 및 평가)

  • Yong-Kyu Lee;Sang-Jin Lee;Jung-Soo Lee
    • Journal of Korean Society of Forest Science / v.112 no.2 / pp.195-208 / 2023
  • This study aimed to use three-dimensional point cloud data (PCD) obtained from Terrestrial Laser Scanning (TLS) and Mobile Laser Scanning (MLS) to evaluate a deep learning-based species classification model for two tree species: Pinus koraiensis and Larix kaempferi. Sixteen models were constructed based on three conditions: LiDAR platform (TLS and MLS), down-sampling intensity (1024, 2048, 4096, 8192), and deep learning model (PointNet, PointNet++). According to the classification accuracy evaluation, the highest kappa coefficients were 93.7% for TLS and 96.9% for MLS, both obtained when the PointNet++ model was applied to the PCD, with down-sampling intensities of 8192 and 2048, respectively. Furthermore, PointNet++ was consistently more accurate than PointNet in all scenarios sharing the same platform and down-sampling intensity. Misclassification occurred among individuals of different species with structurally similar characteristics, among individual trees that exhibited eccentric growth due to their location on slopes or around trails, and among some individual trees in which the crown was vertically divided during tree segmentation.

Concept-based Automatic Scoring System for Korean Free-text or Constructed Answers (개념 기반 한국어 서답형 답안의 자동채점 시스템)

  • Park, Il-Nam;Noh, Eun-Hee;Sim, Jae-Ho;Kim, Myung-Hwa;Kang, Seung-Shik
    • Annual Conference on Human and Language Technology / 2012.10a / pp.69-72 / 2012
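  • This paper analyzes the types of Korean short-answer (word- and phrase-level) items and presents an automatic scoring method specialized for Korean short answers, in which answer templates are designed and concepts are defined so that a computer can recognize how a human rater scores against a scoring rubric. The system was applied to six science items from the 2011 National Assessment of Educational Achievement, each with no more than 500 distinct answer types across 1,000 student answer sheets. The scoring rubric was written as answer templates, 250 student answers were used as training data to update the templates, and 750 student answers were then scored automatically. The average kappa coefficient was 0.84, indicating almost perfect agreement with the human scoring results.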


Inter-Rater Reliability of Carotid Intima-Media Thickness Measurements in a Multicenter Cohort Study (다기관 코호트 연구에서 경동맥 내막-중막 두께 측정의 측정자간 신뢰도 평가)

  • Lee, Jung Hyun;Choi, Dong Phil;Shim, Jee-Seon;Kim, Dae Jung;Park, Sung-Ha;Kim, Hyeon Chang
    • Journal of health informatics and statistics / v.41 no.1 / pp.49-56 / 2016
  • Objectives: Carotid intima-media thickness (CIMT) and the presence of carotid artery plaque are widely used as preclinical markers of atherosclerosis. Because CIMT measurement is operator dependent, it is important to evaluate the reliability of measuring CIMT and plaque between centers in a multicenter study. The purpose of this study is to evaluate the inter-rater reliability of CIMT and plaque presence among three clinical centers of the Cardiovascular and Metabolic Disease Etiology Research Center (CMERC). Methods: Twenty people without known cardiovascular disease (age 37-64) were enrolled during 2014-2015, and their left and right carotid arteries were examined repeatedly with ultrasonography for CIMT measurements at three clinical centers according to a predetermined protocol. Maximum and mean values of CIMT at the distal common carotid artery were recorded. Plaque presence at a carotid artery was checked by an operator. The reliability of CIMT and carotid plaque presence was assessed using an intraclass correlation coefficient (ICC) and kappa statistics, respectively. Results: The calculated ICC was 0.647 (95% CI: 0.487-0.779) for maximum CIMT and 0.758 (95% CI: 0.632-0.854) for mean CIMT. In the Bland-Altman plot, most observed values were distributed within the mean difference ${\pm}1.96$ SD range. Kappa statistics of plaque presence between two centers were 0.304 (center 1 and 2), 0.507 (center 1 and 3), and 0.606 (center 2 and 3), respectively, while Fleiss' kappa for overall agreement was 0.445. Conclusions: The inter-rater reliability of CIMT measurements among three clinical centers turned out to be high, and the agreement of measuring carotid plaque presence was fair.
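
The overall agreement statistic named in this abstract, Fleiss' kappa, generalizes chance-corrected agreement to more than two raters. A minimal sketch follows, assuming every subject is rated by the same number of raters; the function name `fleiss_kappa` and the counts are hypothetical and are not the study's data.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters assigning subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject needs the same number of ratings")
    n_cats = len(counts[0])
    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_i) / n_subjects
    # Chance agreement from the overall category proportions.
    p_j = [sum(row[j] for row in counts) / (n_subjects * n_raters) for j in range(n_cats)]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)

# Hypothetical ratings (assumption): 3 centers judge plaque as [present, absent]
# for 6 arteries.
ratings = [[3, 0], [0, 3], [2, 1], [1, 2], [0, 3], [0, 3]]
overall = fleiss_kappa(ratings)
print(round(overall, 4))
```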

Automated Scoring System for Korean Short-Answer Questions Using Predictability and Unanimity (기계학습 분류기의 예측확률과 만장일치를 이용한 한국어 서답형 문항 자동채점 시스템)

  • Cheon, Min-Ah;Kim, Chang-Hyun;Kim, Jae-Hoon;Noh, Eun-Hee;Sung, Kyung-Hee;Song, Mi-Young
    • KIPS Transactions on Software and Data Engineering / v.5 no.11 / pp.527-534 / 2016
  • The emerging information society calls for creative thinking grounded in problem-solving skills and comprehensive thinking rather than simple memorization. Accordingly, the Korean curriculum has shifted toward creative thinking by increasing the number of short-answer questions, which can reveal students' overall thinking. However, scoring results are somewhat inconsistent because scoring short-answer questions depends on the subjective judgment of human raters. To alleviate this, automated scoring systems based on machine learning have been used as scoring tools overseas. Linguistically, Korean and English differ greatly in sentence structure, so automated scoring systems built for English cannot be applied to Korean. In this paper, we introduce an automated scoring system for Korean short-answer questions using predictability and unanimity. We also verify the practicality of the system through the correlation between its results and those of human raters. In the experiments, the proposed system is evaluated on constructed-response items in Korean language, social studies, and science from the National Assessment of Educational Achievement. The analysis used Pearson correlation coefficients and the kappa coefficient. The results showed a strong positive correlation, with all correlation coefficients at 0.7 or higher. Thus, the scoring results of the proposed system are similar to those of human raters, and the automated scoring system appears useful as a scoring tool.

Design of a Hopeful Career Forecasting Program for the Career Education (진로교육을 위한 희망진로 예측프로그램 설계)

  • Kim, Geun-Ho;Kim, Eui-Jeong
    • Journal of the Korea Institute of Information and Communication Engineering / v.22 no.8 / pp.1055-1060 / 2018
  • In the wake of the 4th Industrial Revolution, career education in schools has become a major issue. While various studies are being conducted on services and technologies for handling artificial intelligence and big data effectively, in education, student data is still processed only in simple ways. Therefore, in this paper, we design and present a career prediction program for students using artificial intelligence and big data. Using observational data on students at the institute, a decision tree is built with the C4.5 algorithm, which is known to be among the most effective decision tree algorithms, and is used to predict students' desired career paths. As a result, the kappa coefficient exceeded 0.7, and the average error was fairly low at about 0.1. Building on this study, further studies and data are expected to support student counseling and guidance on classroom attitudes and direction.
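
C4.5 chooses splits by gain ratio, that is, information gain divided by the split information of the attribute. The sketch below shows only that criterion for categorical attributes; the student records, attribute names, and function names are hypothetical, and the rest of C4.5 (continuous attributes, pruning) is left out.

```python
from collections import Counter
from math import log2

def entropy(values):
    """Shannon entropy (bits) of a list of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def gain_ratio(rows, attr, target):
    """Gain ratio of splitting `rows` (a list of dicts) on `attr`."""
    n = len(rows)
    groups = {}
    for r in rows:
        groups.setdefault(r[attr], []).append(r[target])
    base = entropy([r[target] for r in rows])
    # Expected entropy of the target after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy([r[attr] for r in rows])
    if split_info == 0:
        return 0.0  # attribute has a single value, so it cannot split
    return (base - remainder) / split_info

# Hypothetical student records (assumption), not data from the paper.
students = [
    {"favorite_subject": "math", "grade_band": "high", "career": "engineer"},
    {"favorite_subject": "math", "grade_band": "low", "career": "engineer"},
    {"favorite_subject": "art", "grade_band": "high", "career": "designer"},
    {"favorite_subject": "art", "grade_band": "low", "career": "designer"},
]
by_subject = gain_ratio(students, "favorite_subject", "career")
by_grade = gain_ratio(students, "grade_band", "career")
print(by_subject, by_grade)
```

In this toy data the favorite subject separates the careers perfectly while the grade band carries no information, so a C4.5-style tree would split on the former.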

Classification of 3D Road Objects Using Machine Learning (머신러닝을 이용한 3차원 도로객체의 분류)

  • Hong, Song Pyo;Kim, Eui Myoung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.36 no.6 / pp.535-544 / 2018
  • Autonomous driving that relies on sensors alone can be limited when the sensors are blocked by sudden changes in the surrounding environment or by large objects such as heavy vehicles. To overcome this limitation, precise road maps are used as an additional source. This study was conducted to segment and classify road objects using 3D point cloud data acquired by a terrestrial mobile mapping system provided by the National Geographic Information Institute. The original 3D point cloud data were pre-processed, and a filtering technique was selected to separate ground and non-ground points. Road objects corresponding to lanes, street lights, and safety fences were then initially segmented and classified using a support vector machine, a machine learning method. As training data for the supervised classification, only the geometric elements and height information obtained using the eigenvalues extracted from the road objects were used. The overall accuracy of the classification was 87% and the kappa coefficient was 0.795. Classification accuracy is expected to increase if, in the future, various classification features are added beyond the geometric elements.