• Title/Summary/Keyword: evaluation metric


SRLev-BIH: An Evaluation Metric for Korean Generative Commonsense Reasoning (SRLev-BIH: 한국어 일반 상식 추론 및 생성 능력 평가 지표)

  • Jaehyung Seo;Yoonna Jang;Jaewook Lee;Hyeonseok Moon;Sugyeong Eo;Chanjun Park;Aram So;Heuiseok Lim
    • Annual Conference on Human and Language Technology / 2022.10a / pp.176-181 / 2022
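  • Commonsense reasoning is one of the most human-like abilities and is an area that AI models cannot easily imitate. Deep-learning-based language models still perform poorly on tasks that require commonsense-based reasoning. In Korean in particular, research on commonsense reasoning is considerably lacking. To mitigate this problem, Korean CommonGen [1], a Korean dataset for generative commonsense reasoning, was recently released. However, the evaluation metrics for this dataset are limited by their reliance on lexical-level similarity and overlap, which makes it difficult to measure whether a generated sentence conforms to common sense. Therefore, to improve the evaluation of Korean commonsense reasoning and generation ability, this paper proposes SRLev, which evaluates generated output based on the semantic roles of sentence constituents and the morphological changes of jamo (Korean alphabet letters); BIH, which is trained on human evaluation results; and SRLev-BIH, which combines the strengths of the two metrics.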


The Net Promoter Score with Friends and Family Test applied to arthroscopic shoulder surgery

  • Jabbal Monu;Sharma Sunil
    • Clinics in Shoulder and Elbow / v.26 no.1 / pp.20-24 / 2023
  • Background: The Friends and Family Test (FFT), developed by the UK National Health Service, evaluates whether patients are satisfied with a service provided, where improvements are needed, and how likely patients are to recommend the intervention. Calculated from the FFT, the Net Promoter Score (NPS) creates a recommendation metric for treatment. The primary aim of this prospective study is to evaluate NPS for arthroscopic subacromial decompression (ASD) and rotator cuff repair (RCR). Secondary aims are to evaluate 1-year postoperative changes in patients' Oxford Shoulder Scores (OSSs), the proportion of patients satisfied with their surgery, and their correlation with FFT. Methods: During a 2-year period, all patients undergoing ASD or RCR completed questionnaires prospectively; data were collected preoperatively and at 1 year postoperatively. Results: NPSs were 31 for ASD (n=32) and 52 for RCR (n=39). OSSs increased by 4.3 and 6.9 for ASD and RCR, respectively (P<0.001). Overall, 75% of ASD and 77% of RCR patients were either "satisfied" or "very satisfied" with procedure outcomes. Scores from FFT had a positive correlation with improvement in OSS and satisfaction scores among patients undergoing arthroscopic shoulder surgeries (P<0.001). Conclusions: The current study shows positive NPS outcomes in patients with ASD and RCR. Scores from FFT correlate well with both satisfaction and OSS among patients. NPS can be an adjunct to traditional patient-reported outcome measures to provide global evaluation of patient experiences to aid in determining the clinical value of common procedures in shoulder orthopaedics. Level of evidence: III.
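
As a rough illustration of how a score such as the NPS above is usually derived, the sketch below uses the common definition (percentage of promoters minus percentage of detractors). The abstract does not state how FFT responses were mapped to promoters, passives, and detractors, so the function takes the three counts directly; the function name and the example counts are hypothetical and are not data from the study.

```python
def net_promoter_score(promoters: int, passives: int, detractors: int) -> float:
    """Return NPS = %promoters - %detractors, on a scale from -100 to +100.

    Assumption: respondents have already been grouped into the three
    categories; the grouping rule itself is not given in the abstract.
    """
    total = promoters + passives + detractors
    if total == 0:
        raise ValueError("at least one response is required")
    return 100.0 * (promoters - detractors) / total


# Hypothetical counts for illustration only (not data from the study).
print(net_promoter_score(promoters=6, passives=3, detractors=1))  # 50.0
```

Passive respondents count toward the denominator only, which is why a cohort with many lukewarm answers scores lower than its share of promoters alone would suggest.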

Moving Object Segmentation-based Approach for Improving Car Heading Angle Estimation (Moving Object Segmentation을 활용한 자동차 이동 방향 추정 성능 개선)

  • Chiyun Noh;Sangwoo Jung;Yujin Kim;Kyongsu Yi;Ayoung Kim
    • The Journal of Korea Robotics Society / v.19 no.1 / pp.130-138 / 2024
  • High-precision 3D object detection is a crucial component of autonomous driving systems, with far-reaching implications for subsequent tasks such as multi-object tracking and path planning. In this paper, we propose a novel approach designed to enhance the performance of 3D object detection, especially heading angle estimation, by employing a moving object segmentation technique. Our method starts by extracting point-wise moving labels via moving object segmentation. These labels are then integrated into the LiDAR point cloud data, and the integrated data are used as input for 3D object detection. We conducted an extensive evaluation of our approach on the KITTI-road dataset and achieved notably superior performance, particularly in terms of AOS, a pivotal metric for assessing the precision of 3D object detection. Our findings not only underscore the positive impact of the proposed method on detection performance in LiDAR-based 3D object detection, but also suggest substantial potential for augmenting the overall perception capabilities of autonomous driving systems.

Application of a comparative analysis of random forest programming to predict the strength of environmentally-friendly geopolymer concrete

  • Ying Bi;Yeng Yi
    • Steel and Composite Structures / v.50 no.4 / pp.443-458 / 2024
  • The construction industry, one of the biggest producers of greenhouse emissions, is under considerable pressure as a result of growing concern about how climate change may affect local communities. Geopolymer concrete (GPC) has emerged as a feasible construction material because of the environmental issues connected to cement manufacture. The findings of this study contribute to the development of machine learning methods for estimating the properties of eco-friendly concrete, which might be used in lieu of traditional concrete to reduce CO2 emissions in the building industry. In the present work, the compressive strength (fc) of GPC is predicted using a random forests regression (RFR) methodology, where natural zeolite (NZ) and silica fume (SF) replace ground granulated blast-furnace slag (GGBFS). A thorough set of experiments on GPC samples was compiled from the literature, totaling 254 data rows. The RFR was integrated with artificial hummingbird optimization (AHA), the black widow optimization algorithm (BWOA), and the chimp optimization algorithm (ChOA); the resulting models are abbreviated as ARFR, BRFR, and CRFR. The RFR models demonstrated satisfactory performance across all evaluation metrics in the prediction procedure. For the R² metric, the CRFR model achieved 0.9988 and 0.9981 on the training and test data sets, higher than BRFR (0.9982 and 0.9969), followed by ARFR (0.9971 and 0.9956). Some other error and distribution metrics showed a roughly 50% improvement for CRFR with respect to ARFR.
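
For readers unfamiliar with the R² values quoted above, the following is a minimal sketch of the standard coefficient of determination, R² = 1 − SS_res / SS_tot. It is not the authors' code; the function name and the sample values are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left after the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical strengths (MPa) for illustration only.
measured = [30.0, 42.0, 55.0, 61.0]
predicted = [31.0, 41.0, 54.0, 62.0]
print(r_squared(measured, predicted))
```

A value of 1 means the predictions match the measurements exactly, and 0 means the model does no better than predicting the mean, so differences in the third or fourth decimal place, as reported above, correspond to small changes in residual error.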

Newly-designed adaptive non-blind deconvolution with structural similarity index in single-photon emission computed tomography

  • Kyuseok Kim;Youngjin Lee
    • Nuclear Engineering and Technology / v.55 no.12 / pp.4591-4596 / 2023
  • Single-photon emission computed tomography (SPECT) image reconstruction methods have a significant influence on image quality, with filtered back projection (FBP) and ordered subset expectation maximization (OSEM) being the most commonly used. In this study, we propose a newly designed adaptive non-blind deconvolution with a structural similarity (SSIM) index that can take advantage of both the FBP and OSEM image reconstruction methods. After acquiring brain SPECT images, the proposed image was obtained using an algorithm that applied the SSIM metric, defined by predicting the distribution and amount of blurring. In the contrast-to-noise ratio (CNR) and coefficient of variation (COV) evaluation, the image produced by the proposed algorithm showed a trend in spatial resolution similar to that of FBP, while obtaining values similar to those of OSEM. In addition, we confirmed that the CNR and COV values of the proposed algorithm improved by approximately 1.69 and 1.59 times, respectively, compared with those of the algorithm involving an inappropriate deblurring process. To summarize, we propose a new type of algorithm that combines the advantages of SPECT image reconstruction techniques and is expected to be applicable in various fields.
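
The abstract does not give the exact formulas used for CNR and COV. The sketch below assumes one commonly used pair of definitions: CNR as the absolute difference between the mean of a target region and the mean of a background region, divided by the background standard deviation, and COV as the standard deviation of a region divided by its mean. The function names, the use of the population standard deviation, and the pixel values are all assumptions for illustration.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(region):
    """COV = standard deviation / mean of the pixel values in a region."""
    mean = fmean(region)
    if mean == 0:
        raise ValueError("COV is undefined for a zero-mean region")
    return pstdev(region) / mean


def contrast_to_noise_ratio(target, background):
    """CNR = |mean(target) - mean(background)| / std(background)."""
    noise = pstdev(background)
    if noise == 0:
        raise ValueError("CNR is undefined for a noise-free background")
    return abs(fmean(target) - fmean(background)) / noise


# Hypothetical pixel values for illustration only.
background = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population std 2
target = [10, 10, 10]
print(coefficient_of_variation(background))
print(contrast_to_noise_ratio(target, background))
```

Under these definitions a higher CNR indicates a more distinguishable target, while a lower COV indicates a more uniform, less noisy region.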

Applying NIST AI Risk Management Framework: Case Study on NTIS Database Analysis Using MAP, MEASURE, MANAGE Approaches (NIST AI 위험 관리 프레임워크 적용: NTIS 데이터베이스 분석의 MAP, MEASURE, MANAGE 접근 사례 연구)

  • Jung Sun Lim;Seoung Hun Bae;Taehoon Kwon
    • Journal of Korean Society of Industrial and Systems Engineering / v.47 no.2 / pp.21-29 / 2024
  • Fueled by international efforts towards AI standardization, including those by the European Commission, the United States, and international organizations, this study introduces an AI-driven framework for analyzing advancements in drone technology. Utilizing project data retrieved from the NTIS DB via the "drone" keyword, the framework employs a diverse toolkit of supervised learning methods (Keras MLP, XGBoost, LightGBM, and CatBoost) enhanced by BERTopic (a natural language analysis tool). This multifaceted approach ensures both comprehensive data quality evaluation and in-depth structural analysis of documents. Furthermore, a 6T-based classification method refines non-applicable data for year-on-year AI analysis, demonstrably improving performance as measured by the accuracy metric. Utilizing AI tools, including GPT-4, this research unveils year-on-year trends in emerging keywords and employs them to generate detailed summaries, enabling efficient processing of large text datasets and offering an AI analysis system applicable to policy domains. Notably, this study not only advances methodologies aligned with AI Act standards but also lays the groundwork for responsible AI implementation through analysis of government research and development investments.

Development of an Optimal Convolutional Neural Network Backbone Model for Personalized Rice Consumption Monitoring in Institutional Food Service using Feature Extraction

  • Young Hoon Park;Eun Young Choi
    • The Korean Journal of Food And Nutrition / v.37 no.4 / pp.197-210 / 2024
  • This study aims to develop a deep learning model to monitor rice serving amounts in institutional foodservice, enhancing personalized nutrition management. The goal is to identify the best convolutional neural network (CNN) for detecting rice quantities on serving trays, addressing balanced dietary intake challenges. Both a vanilla CNN and 12 pre-trained CNNs were tested, using features extracted from images of varying rice quantities on white trays. Configurations included optimizers, image generation, dropout, feature extraction, and fine-tuning, with top-1 validation accuracy as the evaluation metric. The vanilla CNN achieved 60% top-1 validation accuracy, while pre-trained CNNs significantly improved performance, reaching up to 90% accuracy. MobileNetV2, suitable for mobile devices, achieved a minimum 76% accuracy. These results suggest the model can effectively monitor rice servings, with potential for improvement through ongoing data collection and training. This development represents a significant advancement in personalized nutrition management, with high validation accuracy indicating its potential utility in dietary management. Continuous improvement based on expanding datasets promises enhanced precision and reliability, contributing to better health outcomes.

Portability Testing Method for Digital Right Management Software (디지털 저작권 관리 S/W의 이식성 시험 방법)

  • Yang, Hae-Sool;Kang, Bae-Keun;Lee, Ha-Yong
    • The Journal of the Korea Contents Association / v.9 no.4 / pp.103-113 / 2009
  • Digital Right Management (DRM) is a system that protects digital content from illegal copying, allows lawful users to use the content, handles payment for its use, and protects copyrights and the associated rights and profits. Its goal is to permit legitimate reproduction of content while thoroughly blocking illegal use. Portability evaluation can serve as a technique for improving the quality of DRM software, and technology development that accommodates international standards can raise its objectivity and applicability. In this study, we propose tentative metrics for evaluating the portability of DRM software. We also present a method for measuring quality and judging the results against appropriate standards, together with an evaluation example and an evaluation method.

Detectability Evaluation for Alert Sound in an Electric Vehicle (전기자동차의 경고음에 대한 인지성 평가)

  • Han, Man Uk;Lee, Sang Kwon
    • Transactions of the Korean Society of Mechanical Engineers A / v.41 no.10 / pp.923-929 / 2017
  • Generally, the sound emitted by a vehicle powered by an electric motor is quieter than that of an internal combustion engine vehicle. As a result, pedestrians often cannot detect approaching electric vehicles, and an additional warning sound is required for these types of automobiles. In this study, to develop an audible warning sound, nine warning sounds are designed based on signal processing and chord theory. Background noise measured on the road is also added to these synthetic sounds. The detectability of these warning sounds is evaluated by subjective tests. Sound metrics correlated with detectability are investigated through psychoacoustic theory and subjective evaluation. It is determined that known psychoacoustic parameters such as loudness, sharpness, and roughness have a low correlation with detectability. However, it is found that the interval of the harmonic sound correlates well with detectability.

A Comparison of the Search Based Testing Algorithm with Metrics (메트릭에 따른 탐색 기반 테스팅 알고리즘 비교)

  • Choi, HyunJae;Chae, HeungSeok
    • Journal of KIISE / v.43 no.4 / pp.480-488 / 2016
  • Search-Based Software Testing (SBST) is an effective technique for test data generation over large input domains. Although the performance of SBST seems to be affected by the structural characteristics of the Software Under Test (SUT), studies comparing SBST techniques with respect to structural characteristics are rare. In addition to comparing SBST algorithms, we analyzed which algorithm performs best for different structural characteristics of the SUT. To generalize the experimental results, we automatically generated 19,800 SUTs by combining four metrics that are expected to affect the performance of SBST. According to the experimental results, the genetic algorithm showed the best performance for SUTs with high complexity when the test data evaluation count was $\leq 20{,}000$. On the other hand, genetic simulated annealing and simulated annealing showed relatively better performance for SUTs with high complexity when the test data evaluation count was $\geq 50{,}000$. Genetic simulated annealing, simulated annealing, and hill climbing showed better performance for SUTs with low complexity.