• Title/Summary/Keyword: 평가 일치도 (evaluation agreement)


Data Augmentation and Preprocessing to Improve Automated Essay Scoring Model (에세이 자동 평가 모델 성능 향상을 위한 데이터 증강과 전처리)

  • Kanghee Go;Doguk Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.327-332 / 2023
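  • The quality and diversity of data have a major impact on model performance. In this study, we sought to improve data quality and diversity through topic-based data preprocessing and through augmentation using BERT-based MLM, T5, and Random Masking, and applied these to a KoBERT-based automated essay scoring model. With preprocessing alone, the model showed higher agreement than the baseline on every essay evaluation item as measured by the Quadratic Weighted Kappa (QWK) score, and the mean agreement across items rose from 0.5368029 to 0.5483064 (+0.0115035). Adding the proposed augmentation methods improved performance further for MLM, T5, and Random Masking alike. In particular, additionally applying MLM-based augmentation raised the mean agreement from 0.5483064 to 0.55151645 (+0.00321005), the highest of all settings, and when performance is evaluated by QWK on the essay total score, it improved from the baseline's 0.4110809 to 0.4380132 (+0.0269323).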
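Since the results above are reported in QWK, a minimal sketch of that metric may help in reading them. It follows the standard definition of Cohen's kappa with quadratic disagreement weights; the function name and arguments are illustrative and are not taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, min_rating, max_rating):
    """Quadratic weighted kappa between two equal-length lists of integer scores."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("score lists must be non-empty and of equal length")
    if max_rating <= min_rating:
        raise ValueError("need at least two rating categories")

    n = len(rater_a)
    span = max_rating - min_rating             # largest possible score gap
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (a, b) pairs
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(min_rating, max_rating + 1):
        for j in range(min_rating, max_rating + 1):
            weight = ((i - j) / span) ** 2     # 0 on the diagonal, 1 at the extremes
            weighted_observed += weight * observed[(i, j)] / n
            # Expected proportion if the two raters scored independently.
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        raise ValueError("kappa is undefined when both raters use one identical score")
    return 1.0 - weighted_observed / weighted_expected
```

In an essay scoring setting, `rater_a` would typically hold the human scores and `rater_b` the model's predicted scores; a value of 1 means perfect agreement and 0 means agreement no better than chance.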


The Model of Appraisal Method on Authentic Records (전자기록의 진본 평가 시스템 모형 연구)

  • Kim, Ik-Han
    • The Korean Journal of Archival Studies / no.14 / pp.91-117 / 2006
  • Electronic records need to be appraised for their authenticity as well as for their value. There has been considerable discussion of how to appraise the value of records, but little about how to appraise the authenticity of electronic records. This article therefore models specific authenticity appraisal methods and shows the stages at which each method should or may be applied. At the Ingest stage, three checks are essential: integrity verification immediately after records are created, by the organization that produced them; quality and integrity verification of the transferred records, by the organization that received them; and an integrity check between the SIP and the AIP, by the organization that received and preserves them. At the Preservation stage, what is needed is an integrity check between identical AIPs stored separately on different media, validation of whether records have been damaged, and recovery of damaged records. At the various Processing stages, the requirements are suitability evaluation after changes to a record's management control metadata or classification, an integrity check after records migration, and periodic validation and integrity verification of DIPs. For these activities, appraisal methods including integrity verification, content consistency checks, suitability evaluation of record metadata, feasibility checks of unauthorized updates, and physical status validation should be applied to the electronic records management process.
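One way an integrity check between two copies of a package (for example a SIP and its AIP, or two AIPs on different media) might be sketched is by comparing cryptographic digests. The abstract does not say which mechanism the article proposes, so the use of SHA-256 and the helper names below are assumptions made only for illustration.

```python
import hashlib


def sha256_digest(data):
    """Hex SHA-256 digest of a record's bytes."""
    return hashlib.sha256(data).hexdigest()


def find_mismatches(reference, copy):
    """Names of records in `reference` that are missing from or differ in `copy`.

    Both arguments map a record name to its content as bytes
    (a hypothetical in-memory stand-in for stored packages).
    """
    mismatched = []
    for name, content in reference.items():
        if name not in copy or sha256_digest(content) != sha256_digest(copy[name]):
            mismatched.append(name)
    return mismatched
```

An empty result means every record in the reference package has a byte-identical counterpart in the copy; it says nothing about extra records that exist only in the copy.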

Web Usability Testing by Using Scanpath Similarity Analysis (탐색경로 일치도 분석을 이용한 웹사이트 사용성 평가)

  • Kim, Youngjun;Kim, Youngjin
    • Journal of the Korea Academia-Industrial Cooperation Society / v.14 no.2 / pp.793-803 / 2013
  • This study was performed to determine the usefulness of scanpath similarity analysis as a new method of web usability testing. Five websites of public institutions were used and 15 students participated. First, eye movements were tracked and visual appeal ratings were measured as participants freely viewed each website for 3 seconds. Next, while their eye movements continued to be tracked, the participants were asked to perform 17 missions. Finally, in an interview, the participants rated satisfaction, awareness, and mission difficulty. The results showed that scanpath similarity had a significant relationship with both the visual appeal ratings (first impression) and satisfaction: higher visual appeal ratings were associated with higher scanpath similarity. This suggests that a measurement such as the scanpath similarity of eye movements could serve as an objective index for usability testing in place of subjective evaluations such as satisfaction. We discuss the possibility that usability testing using scanpath similarity, taking into account both fixation and duration of eye movements, will support more appropriate inferences about observers' experiences on websites.
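The abstract does not state how scanpath similarity was computed. One common approach, assumed here purely for illustration, encodes each scanpath as a string of area-of-interest (AOI) labels and scores similarity from the string edit distance. The function names and the single-letter AOI encoding are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of AOI labels."""
    previous = list(range(len(b) + 1))
    for i, label_a in enumerate(a, start=1):
        current = [i]
        for j, label_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                        # deletion
                current[j - 1] + 1,                     # insertion
                previous[j - 1] + (label_a != label_b)  # substitution or match
            ))
        previous = current
    return previous[-1]


def scanpath_similarity(a, b):
    """Similarity in [0, 1]: 1 minus the length-normalized edit distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

For instance, two viewers whose fixation sequences are "ABCD" and "ABXD" differ in one of four positions, giving a similarity of 0.75 under this definition.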

Evaluation of Hygienic Status of University Foodservice Operation using ATP bioluminescence Assay (ATP bioluminescence Assay를 이용한 대학 급식시설의 위생상태 평가에 관한 연구)

  • 박영숙
    • Korean Journal of Food and Cookery Science / v.16 no.2 / pp.195-201 / 2000
  • An investigation was conducted to evaluate the hygienic status of university foodservice operations using the conventional swabbing technique with standard plate count and the ATP bioluminescence assay. The results were as follows: 1) For all kitchen boards, knives, feeding trays, and dish towels tested, the overall agreement between the ATP bioluminescence and plate count results was 84.7% when using pass/fail cut-offs of 3$\times$ the control value for the ATP assay and 40 CFU (colony forming units)/㎠ for the plate count. 2) The agreement rate between the ATP assay and the standard plate count was 87.5% for samples before use, 29.2% for those during use, and 42.7% for those after cleaning and sanitizing. 3) The plate counts of three university foodservice operations for kitchen boards, kitchen knives, feeding trays, and dish towels were within acceptable limits when tested before use. However, none were within acceptable limits when tested during use or after cleaning and sanitizing. 4) These results suggest that immediate action is needed to reduce the potential danger of cross-contamination and that effective sanitary control methods need to be developed to improve sanitary conditions.
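The agreement rate between the two assays can be sketched as the share of samples on which both methods give the same pass/fail verdict. The cut-offs (3× the control value for ATP, 40 CFU/㎠ for plate count) come from the abstract; the function name, the sample values in the usage note, and the choice to treat a reading exactly at the cut-off as a pass are assumptions.

```python
def percent_agreement(atp_readings, plate_counts, atp_control,
                      atp_factor=3.0, cfu_cutoff=40.0):
    """Percentage of samples where ATP and plate count agree on pass/fail."""
    if len(atp_readings) != len(plate_counts) or not atp_readings:
        raise ValueError("inputs must be non-empty and of equal length")

    atp_cutoff = atp_factor * atp_control
    agreeing = 0
    for atp, cfu in zip(atp_readings, plate_counts):
        atp_pass = atp <= atp_cutoff      # assumption: at the cut-off counts as pass
        plate_pass = cfu <= cfu_cutoff    # assumption: at the cut-off counts as pass
        if atp_pass == plate_pass:
            agreeing += 1
    return 100.0 * agreeing / len(atp_readings)
```

With made-up readings `[100, 500, 250, 900]` against a control of 100 and plate counts `[10, 80, 60, 20]`, the two methods agree on the first two samples only, so the function returns 50.0.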


The Development of Infants and Toddlers: A Rating Scale for Teachers (교사 평정용 영아발달 평가도구)

  • Lee, Young Ja;Lee, Jong Sook;Shin, Eun Soo;Kwak, Hyang Lim;Lee, Jeong Wuk
    • Korean Journal of Child Studies / v.22 no.2 / pp.255-275 / 2001
  • The purpose of this study was to construct a rating scale of infant and toddler development for teachers to use when observing children during daily activities. The scale consists of 201 items measuring motor, self-help, socio-communicative, socio-emotional, and cognitive development. Quality, reliability, and validity were examined using a nationwide sample of 1,245 children from classes of 1- and 2-year-olds. The scale showed high validity and reliability in terms of content validity as evaluated by early childhood professionals, concurrent validity with the Bayley Scale, internal consistency among raters, and test-retest reliability. Factor analysis showed that the developmental areas of infants and toddlers are not clearly differentiated but are interrelated. The scale was standardized by providing nationwide norms with raw scores, percentiles, and standardized scores.


Mutual Perceptions among Clients, Agencies, and Consumers on the Evaluation of Ad Creativity: Extending Application of the Co-Orientation Model (광고 창의성 평가에 대한 광고주, 광고 제작자, 소비자 간의 상호인식 연구: 상호지향성 모델의 확장 적용)

  • Kim, Bong-Chul;Choi, Myung-Il;Lee, Jin-U
    • (The) Korean Journal of Advertising / v.25 no.1 / pp.179-201 / 2014
  • This study explored mutual perceptions among clients, agencies, and consumers regarding the evaluation of ad creativity by applying the co-orientation model. To investigate agreement, congruence, and accuracy among the three groups, participants were exposed to real commercials as stimuli and evaluated their creativity on four dimensions: originality, appropriateness, clarity, and relevance. The results indicated that agreement between agencies and consumers is relatively high, whereas agreement between clients and agencies is relatively low. Clients also show a relatively higher level of congruence, while agencies show a relatively lower level. Accuracy was relatively high between agencies' evaluation of ad creativity and clients' perception of agencies' view, and between consumers' evaluation of ad creativity and clients' perception of consumers' view. In contrast, accuracy was relatively low between clients' evaluation of ad creativity and agencies' perception of clients' view. The results show that clients and agencies need to consider consumers' viewpoints on ad creativity.

Early Identification of 2- and 3-Year-Old Children for Social and Emotional Problems: A Preliminary Study of the Ages and Stages Questionnaires: Social-Emotional (ASQ:SE) (2, 3세 유아의 사회 정서 문제 조기발견: ASQ:SE 선별 평가서의 표준화 연구)

  • Heo, Kay Heoung
    • Korean Journal of Child Studies / v.21 no.4 / pp.123-141 / 2000
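  • This paper is a standardization (reliability and validity) study of the Ages and Stages Questionnaire: Social-Emotional (ASQ:SE), a screening instrument for social and emotional problems in infants and young children. Specifically, using the 24-month and 36-month ASQ:SE questionnaires, we examined internal consistency reliability, test-retest reliability, cutoff points, and concurrent validity. A total of 447 parents participated: 237 completed the 24-month questionnaire and 210 completed the 36-month questionnaire. Internal consistency was .71 for the 24-month questionnaire and .73 for the 36-month questionnaire. Test-retest reliability was 100% for the 24-month questionnaire and 97% for the 36-month questionnaire. Concurrent validity was 95% for both the 24-month and 36-month questionnaires. Further research with more diverse populations is recommended, and the internal consistency, reliability, and validity of the ASQ:SE questionnaires for the ages excluded from this study should also be examined.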
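Internal consistency coefficients of this kind are often Cronbach's alpha. The abstract does not name the statistic, so the sketch below assumes alpha and uses the textbook formula with sample variances; the function name and the data layout (one list of item scores per respondent) are illustrative.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    if len(responses) < 2:
        raise ValueError("need at least two respondents")
    k = len(responses[0])
    if k < 2 or any(len(row) != k for row in responses):
        raise ValueError("every respondent needs the same k >= 2 item scores")

    # Sum of the per-item sample variances (columns of the response matrix).
    item_variance_sum = sum(variance(column) for column in zip(*responses))
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")

    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

Alpha approaches 1 when items rise and fall together across respondents and drops toward 0 when item scores are unrelated.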


Permutation p-values for specific-category kappa measure of agreement (특정 범주에 대한 평가자간 카파 일치도의 퍼뮤테이션 p값)

  • Um, Yonghwan
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.899-910 / 2016
  • Asymptotic tests are often not suitable for the analysis of sparse ordered contingency tables, as asymptotic p-values may either overestimate or underestimate the true p-values. In this paper, we describe permutation procedures in which we compute exact or resampling p-values for a weighted specific-category agreement measure in ordered $k \times k$ contingency tables. We use the weighted specific-category kappa proposed by Kvålseth to measure the extent to which two independent raters agree on the specific categories. We carried out comparison studies between exact p-values, resampling p-values, and asymptotic p-values using $3 \times 3$ contingency data (real and artificial data sets) and $4 \times 4$ artificial contingency data.
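The resampling idea can be sketched as follows. For brevity this uses the ordinary unweighted Cohen's kappa as the test statistic, not Kvålseth's weighted specific-category kappa studied in the paper, and a Monte Carlo sample of permutations, not full enumeration. Function names, the default number of resamples, and the seed are illustrative assumptions.

```python
import random
from collections import Counter


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    p_expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one category")
    return (p_observed - p_expected) / (1.0 - p_expected)


def permutation_p_value(a, b, n_resamples=2000, seed=0):
    """One-sided resampling p-value for H0: the two raters' labels are unrelated."""
    rng = random.Random(seed)
    observed = cohen_kappa(a, b)
    shuffled = list(b)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)  # breaks the pairing, keeps both marginals fixed
        if cohen_kappa(a, shuffled) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Counting the observed arrangement itself keeps the p-value above zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)
```

Because shuffling preserves each rater's marginal totals, the reference distribution is conditional on the table margins, which is what makes the approach usable for sparse tables where asymptotic approximations are unreliable.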

Posterior density estimation of Kappa via Gibbs sampler in the beta-binomial model (베타-이항 분포에서 Gibbs sampler를 이용한 평가 일치도의 사후 분포 추정)

  • 엄종석;최일수;안윤기
    • The Korean Journal of Applied Statistics / v.7 no.2 / pp.9-19 / 1994
  • The beta-binomial model, reparametrized in terms of the mean probability $\mu$ of a positive diagnosis and the agreement coefficient $\kappa$, is widely used in psychology. When $\mu$ is close to 0, inference about $\kappa$ becomes difficult because the likelihood function becomes constant. We consider a Bayesian approach for this case. To apply Bayesian analysis, the Gibbs sampler is used to overcome difficulties in integration. Marginal posterior density functions are estimated and Bayesian estimates are derived using the Gibbs sampler, and the results are compared with those obtained by numerical integration.
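The abstract does not spell out the reparametrization. A common form, assumed here, takes the success probability to follow a $\mathrm{Beta}(\alpha, \beta)$ distribution and identifies $\kappa$ with the intraclass correlation of that mixing distribution, which gives:

```latex
\mu = \frac{\alpha}{\alpha + \beta}, \qquad
\kappa = \frac{1}{\alpha + \beta + 1}
\quad\Longleftrightarrow\quad
\alpha = \frac{\mu (1 - \kappa)}{\kappa}, \qquad
\beta = \frac{(1 - \mu)(1 - \kappa)}{\kappa}
```

As a check, $\alpha + \beta = (1 - \kappa)/\kappa$, so $1/(\alpha + \beta + 1) = \kappa$ and $\alpha/(\alpha + \beta) = \mu$, as required.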


Measuring Agreement of Modified MP3 and CVMS according to BMI Percentile (중지 중절골과 경추를 이용한 골령 평가의 체질량 지수에 따른 일치도)

  • Yi, Seoksoon;Lee, Daewoo;Yang, Yeonmi;Kim, Jaegon
    • Journal of the Korean Academy of Pediatric Dentistry / v.46 no.1 / pp.48-56 / 2019
  • The objective of this study was to examine the agreement between middle phalanx of the third finger analysis and cervical vertebrae analysis for assessing skeletal maturity according to body mass index percentile. A retrospective chart review was used to select patients with body mass index data, hand-wrist radiographs, and lateral cephalograms taken on the same day. The patients were divided into four groups by body mass index percentile. The hand-wrist radiographs were analyzed using the modified middle phalanx of the third finger method, and the lateral cephalograms were categorized according to cervical vertebral maturation stage. The degree of agreement between the two methods of analyzing skeletal maturation was measured by calculating the weighted kappa statistic for each body mass index percentile group. There was good agreement between the two methods for the sample as a whole. Agreement differed across the body mass index percentile groups, and the pattern differed between boys and girls. Pediatric dentists should consider sex and weight status when evaluating growing children and adolescents because these can affect the agreement between the two methods of analyzing skeletal maturation.