• Title/Summary/Keyword: AUC consistency

Search results: 6

L1-penalized AUC-optimization with a surrogate loss

  • Hyungwoo Kim;Seung Jun Shin
    • Communications for Statistical Applications and Methods / v.31 no.2 / pp.203-212 / 2024
  • The area under the ROC curve (AUC) is one of the most common criteria for measuring the overall performance of binary classifiers across a wide range of machine learning problems. In this article, we propose an L1-penalized AUC-optimization classifier that directly maximizes the AUC for high-dimensional data. To this end, we employ an AUC-consistent surrogate loss function and combine it with the L1-norm penalty, which enables us to estimate coefficients and select informative variables simultaneously. In addition, we develop an efficient optimization algorithm based on k-means clustering and proximal gradient descent, which offers computational advantages in obtaining solutions for the proposed method. Numerical simulation studies demonstrate that the proposed method shows promising performance in terms of prediction accuracy, variable selection, and computational cost.
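To make the combination of a pairwise surrogate loss, an L1 penalty, and proximal gradient descent concrete, here is a minimal sketch. It is not the authors' algorithm: the pairwise logistic loss is an assumed choice of surrogate (the abstract does not name one), the k-means step is omitted, and the function names, step size, penalty weight, and toy data are all hypothetical.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of threshold * |value|, i.e. of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def logistic_slope(margin):
    """Derivative of log(1 + exp(-margin)) w.r.t. margin, computed stably."""
    if margin >= 0:
        e = math.exp(-margin)
        return -e / (1.0 + e)
    return -1.0 / (1.0 + math.exp(margin))


def fit_l1_auc(pos, neg, lam=0.05, step=0.2, n_iter=200):
    """Proximal gradient descent on
    mean over (pos, neg) pairs of log(1 + exp(-w.(x_pos - x_neg))) + lam * ||w||_1.
    """
    dim = len(pos[0])
    # One difference vector per positive/negative pair.
    diffs = [[p[j] - q[j] for j in range(dim)] for p in pos for q in neg]
    w = [0.0] * dim
    for _ in range(n_iter):
        grad = [0.0] * dim
        for d in diffs:
            margin = sum(wj * dj for wj, dj in zip(w, d))
            slope = logistic_slope(margin)
            for j in range(dim):
                grad[j] += slope * d[j] / len(diffs)
        # Gradient step on the smooth loss, then the L1 proximal step.
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad)]
    return w


def empirical_auc(w, pos, neg):
    """Fraction of (pos, neg) pairs ranked correctly; ties count as one half."""
    def score(x):
        return sum(wj * xj for wj, xj in zip(w, x))
    wins = 0.0
    for p in pos:
        for q in neg:
            sp, sq = score(p), score(q)
            wins += 1.0 if sp > sq else 0.5 if sp == sq else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: feature 0 is informative, features 1 to 4 are noise.
random.seed(0)
pos = [[random.gauss(1.5, 1.0)] + [random.gauss(0, 1) for _ in range(4)] for _ in range(20)]
neg = [[random.gauss(-1.5, 1.0)] + [random.gauss(0, 1) for _ in range(4)] for _ in range(20)]

w = fit_l1_auc(pos, neg)
auc = empirical_auc(w, pos, neg)
print("coefficients:", [round(v, 3) for v in w])
print("training AUC:", round(auc, 3))
```

The soft-thresholding step is what sets small coefficients exactly to zero, which is how the L1 penalty performs variable selection. The number of pairs grows as the product of the two class sizes, which is presumably the cost the clustering step in the abstract is meant to reduce.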

Validation of Instruments to Classify the Frailty of the Elderly in Community (지역사회 거주 노인의 허약선별도구 타당도 평가)

  • Lee, In-Sook;Park, Young-Im;Park, Eun-Ok;Lee, Soon-Hee;Jeong, Ihn-Sook
    • Research in Community and Public Health Nursing / v.22 no.3 / pp.302-314 / 2011
  • Purpose: This study aimed to validate instruments for classifying the frailty of Korean elderly people living in the community. Methods: For this study, 632 elders were selected from community-based elderly houses and home visiting registries, and data on frailty were collected using three instruments during November 2008. The Korean Frail Scale (KFS) was composed of 10 domains with a maximum score of 20. The Edmonton Frail Scale (EFS) had 10 domains with a maximum score of 17. The 25_Japan Frail Scale (25_JFS) was composed of 6 domains with a maximum score of 25. Internal consistency was measured with Cronbach's $\alpha$. Sensitivity, specificity, and area under the ROC curve (AUC) were measured to assess validity, with long-term care insurance grade as the gold standard. Results: Cronbach's $\alpha$ was .72 for KFS, .55 for EFS, and .80 for 25_JFS. Sensitivity, specificity, and AUC were 70.0%, 83.2%, and .83, respectively, at a cut-off point of 10.5 for KFS; 50.0%, 80.9%, and .66, respectively, at 8.5 for EFS; and 80.0%, 85.9%, and .86, respectively, at 12.5 for 25_JFS. Conclusion: KFS and 25_JFS showed favorable internal consistency and predictive validity. Further longitudinal studies are recommended to confirm predictive validity.
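For readers unfamiliar with the internal-consistency statistic reported above, the following is a minimal sketch of Cronbach's alpha using the standard formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The function name and the response matrix are hypothetical and are not data from the study.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item across respondents.
    item_vars = [variance([r[j] for r in rows]) for j in range(k)]
    # Sample variance of each respondent's total score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical responses: 5 respondents, 3 items scored 0 to 2.
responses = [
    [2, 1, 2],
    [0, 0, 1],
    [1, 1, 1],
    [2, 2, 2],
    [0, 1, 0],
]
alpha = cronbach_alpha(responses)
print("Cronbach's alpha:", round(alpha, 2))
```

Alpha approaches 1 when the items move together across respondents and falls toward 0 when they vary independently, which is why the .55 reported for EFS reads as weak and the .80 for 25_JFS as acceptable.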

Hallucination Detection for Generative Large Language Models Exploiting Consistency and Fact Checking Technique (생성형 거대 언어 모델에서 일관성 확인 및 사실 검증을 활용한 Hallucination 검출 기법)

  • Myeong Jin;Gun-Woo Kim
    • Proceedings of the Korea Information Processing Society Conference / 2023.11a / pp.461-464 / 2023
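  • Services built on generative large language models (LLMs) such as GPT-3 and LLaMa have recently been released and are now widely used. These models have attracted attention because they give fluent answers to a wide variety of user questions. However, LLM answers often contain inconsistent content and non-factual statements, which can cause problems such as the spread of misinformation among users. This paper proposes a hallucination detection method that uses sampled LLM answers to the same question together with external knowledge. The proposed method computes a consistency score from the LLM's answers to the same question, and computes a factuality score through fact checking against external knowledge. Combining the consistency score and the factuality score enables sentence-level hallucination detection. The experiments use a dataset of passages generated with GPT-3 about people in the WikiBio dataset, and show that the method improves sentence-level hallucination detection performance over the baseline in terms of AUC-PR scores.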

Hindi version of short form of douleur neuropathique 4 (S-DN4) questionnaire for assessment of neuropathic pain component: a cross-cultural validation study

  • Gudala, Kapil;Ghai, Babita;Bansal, Dipika
    • The Korean Journal of Pain / v.30 no.3 / pp.197-206 / 2017
  • Background: Pain with neuropathic characteristics is generally more severe and associated with a lower quality of life compared to nociceptive pain (NcP). The short form of the Douleur Neuropathique en 4 Questions (S-DN4) is one of the most widely used and reliable screening questionnaires and is reported to have good diagnostic properties. This study aimed to cross-culturally validate the Hindi version of the S-DN4 in patients with various chronic pain conditions. Methods: The S-DN4 has already been translated into Hindi by Mapi Research Trust. This study assessed the psychometric properties of the Hindi version of the S-DN4, including internal consistency and test-retest reliability 3 days after the baseline assessment. Diagnostic performance was also assessed. Results: One hundred sixty patients with chronic pain, 80 each in the neuropathic pain (NeP) present and NeP absent groups, were recruited. Patients in the NeP present group reported significantly higher S-DN4 scores than patients in the NeP absent group (mean (SD), 4.7 (1.7) vs. 1.8 (1.6), P < 0.01). The S-DN4 had an AUC of 0.88, with adequate internal consistency (Cronbach's $\alpha = 0.80$) and test-retest reliability (ICC = 0.92), and an optimal cut-off value of 3 (Youden's index = 0.66, sensitivity and specificity of 88.7% and 77.5%). The diagnostic concordance rate between clinician diagnosis and the S-DN4 questionnaire was 83.1% (kappa = 0.66). Conclusions: Overall, the Hindi version of the S-DN4 has good internal consistency and test-retest reliability along with good diagnostic accuracy.
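Youden's index is sensitivity + specificity - 1, and the optimal cut-off is the threshold that maximizes it. The sketch below checks the reported value against the reported sensitivity and specificity, then shows the cut-off search on hypothetical toy scores. Treating a score at or above the cut-off as positive is an assumption, and the function names and toy data are illustrative only.

```python
def sens_spec(scores, labels, cutoff):
    """Sensitivity and specificity when score >= cutoff is called positive.

    labels: 1 = condition present, 0 = condition absent.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= cutoff)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < cutoff)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < cutoff)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, labels):
    """Return (cutoff, Youden's J) for the observed score that maximizes J."""
    best = None
    for c in sorted(set(scores)):
        sens, spec = sens_spec(scores, labels, c)
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best


# Check of the reported figures: sensitivity 88.7%, specificity 77.5%.
reported_j = 0.887 + 0.775 - 1
print("Youden's index from reported values:", round(reported_j, 2))

# Hypothetical toy scores, not study data.
toy_scores = [0, 1, 2, 3, 4, 5]
toy_labels = [0, 0, 0, 1, 1, 1]
toy_best = best_cutoff(toy_scores, toy_labels)
print("toy cut-off and J:", toy_best)
```

The reported sensitivity and specificity give 0.887 + 0.775 - 1 = 0.662, which agrees with the Youden's index of 0.66 stated in the abstract.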

Development of Korean Intensive Care Delirium Screening Tool (KICDST) (중환자 섬망 선별도구 개발)

  • Nam, Ae-Ri-Na;Park, Jee-Won
    • Journal of Korean Academy of Nursing / v.46 no.1 / pp.149-158 / 2016
  • Purpose: This study was conducted to develop the Korean Intensive Care Delirium Screening Tool (KICDST). Methods: The KICDST was developed in five steps: configuration of a conceptual framework, development of a preliminary tool, a pilot study, reliability and validity testing, and development of the final KICDST. Reliability was tested using the degree of agreement between evaluators and internal consistency. Validity was tested using the content validity index (CVI), receiver operating characteristic (ROC) analysis, the known-group technique, and factor analysis. Results: In the reliability test, the degree of agreement between evaluators ranged from .80 to 1.00 and the internal consistency was KR-20 = .84. The CVI ranged from .83 to 1.00. In ROC analysis, the area under the ROC curve (AUC) was .98. The assessment score was 4 points. The values for sensitivity, specificity, correct classification rate, positive predictive value, and negative predictive value were 95.0%, 93.7%, 94.4%, 95.0%, and 93.7%, respectively. In the known-group technique, the average delirium screening tool score of the non-delirium group was $1.25 \pm 0.99$, while that of the delirium group was $5.07 \pm 1.89$ ($t = -16.33$, $p < .001$). Factor analysis yielded three factors (cognitive change, symptom fluctuation, psychomotor retardation), which explained 67.4% of the total variance. Conclusion: The findings show that the KICDST has high sensitivity and specificity. Therefore, this screening tool is recommended for early identification of delirium in intensive care patients.
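The five diagnostic values reported above all derive from a single 2x2 confusion table. The sketch below shows the standard formulas. The counts are hypothetical: they were chosen only because they reproduce the reported percentages after rounding, and they are not the study's actual data, which the abstract does not give.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard screening metrics from a 2x2 confusion table."""
    return {
        "sensitivity": tp / (tp + fn),                 # delirium cases detected
        "specificity": tn / (tn + fp),                 # non-cases correctly cleared
        "accuracy": (tp + tn) / (tp + fn + fp + tn),   # correct classification rate
        "ppv": tp / (tp + fp),                         # positive predictive value
        "npv": tn / (tn + fn),                         # negative predictive value
    }


# Hypothetical counts consistent with the reported percentages (not study data).
metrics = diagnostic_metrics(tp=95, fn=5, fp=5, tn=74)
for name, value in metrics.items():
    print(f"{name}: {value * 100:.1f}%")
```

Sensitivity equals the positive predictive value exactly when the false negative and false positive counts are equal, and in that case specificity also equals the negative predictive value, which would account for the matching pairs of figures in the abstract.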

A probabilistic knowledge model for analyzing heart rate variability (심박수변이도 분석을 위한 확률적 지식기반 모형)

  • Son, Chang-Sik;Kang, Won-Seok;Choi, Rock-Hyun;Park, Hyoung-Seob;Han, Seongwook;Kim, Yoon-Nyun
    • Journal of Korea Society of Industrial Information Systems / v.20 no.3 / pp.61-69 / 2015
  • This study presents a probabilistic knowledge discovery method for interpreting heart rate variability (HRV) based on time and frequency domain indexes extracted using the discrete wavelet transform. The knowledge induction algorithm consists of two phases: rule generation and rule estimation. First, rule generation converts numerical attributes to intervals using ROC curve analysis and constructs a reduced rule set by comparing the degree of consistency between attribute-value pairs with different decision values. Then, three measures (rule support, confidence, and coverage) are estimated to give a probabilistic interpretation of each rule. To show the effectiveness of the proposed model, we evaluated the statistical discriminant power of five rules (3 for atrial fibrillation, 1 for normal sinus rhythm, and 1 for both atrial fibrillation and normal sinus rhythm) generated from data (n=58) collected with a 1-channel wireless Holter electrocardiogram (ECG) device, HeartCall®, U-Heart Inc. The experimental results showed a performance of approximately 0.93 (93%) on each of the accuracy, sensitivity, specificity, and AUC measures.
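The three rule measures can be illustrated with a small sketch. The abstract does not define them, so the definitions here are the ones common in rule induction and are an assumption: support is the share of all records matching both condition and decision, confidence is that count divided by the records matching the condition, and coverage is that count divided by the records with the decision. The attribute, the threshold, and the records are hypothetical.

```python
def rule_measures(records, condition, decision):
    """Support, confidence, and coverage of the rule 'if condition then decision'."""
    n_cond = sum(1 for r in records if condition(r))
    n_dec = sum(1 for r in records if decision(r))
    n_both = sum(1 for r in records if condition(r) and decision(r))
    return {
        "support": n_both / len(records),
        "confidence": n_both / n_cond if n_cond else 0.0,
        "coverage": n_both / n_dec if n_dec else 0.0,
    }


# Hypothetical records: (value of some HRV index, rhythm label).
records = [
    (0.9, "AF"), (0.8, "AF"), (0.7, "NSR"),
    (0.2, "NSR"), (0.1, "NSR"), (0.3, "AF"),
]

# Hypothetical rule: "if the index falls in the interval [0.6, inf) then AF".
measures = rule_measures(
    records,
    condition=lambda r: r[0] >= 0.6,
    decision=lambda r: r[1] == "AF",
)
print(measures)
```

In this toy example the rule matches 3 records, 2 of which are AF, out of 3 AF records in total, so confidence and coverage are both 2/3 and support is 2/6.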