• Title/Abstract/Keywords: misclassification

Search results: 229 items

Selecting the Best Prediction Model for Readmission

  • Lee, Eun-Whan
    • Journal of Preventive Medicine and Public Health
    • Vol. 45, No. 4
    • pp.259-266
    • 2012
  • Objectives: This study aims to determine the risk factors predicting rehospitalization by comparing three models and selecting the most successful one. Methods: In order to predict the risk of rehospitalization within 28 days after discharge, 11,951 inpatients were recruited into this study between January and December 2009. Predictive models were constructed with three methods, logistic regression analysis, a decision tree, and a neural network, and the models were compared and evaluated in terms of their misclassification rate, root asymptotic standard error, lift chart, and receiver operating characteristic curve. Results: The decision tree was selected as the final model. The risk of rehospitalization was higher when the length of stay (LOS) was less than 2 days, the route of admission was through the out-patient department (OPD), the medical department was internal medicine, the 10th revision of the International Classification of Diseases code was neoplasm, the LOS was relatively short, and the frequency of OPD visits was greater. Conclusions: When a patient is to be discharged within 2 days, the appropriateness of discharge should be considered, with particular concern for undiscovered complications and co-morbidities. In particular, if the patient was admitted through the OPD, any suspected disease should be appropriately examined and prompt test results should be secured. Moreover, for patients of internal medicine departments, co-morbidities and complications caused by chronic illness should be given greater attention.
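A minimal sketch of the kind of model comparison this abstract describes, using scikit-learn: logistic regression, a decision tree, and a neural network scored by misclassification rate and ROC AUC. The CSV file, column names, and hyperparameters are invented for illustration and are not the authors' actual data or settings.

```python
# Hypothetical sketch of comparing three classifiers for 28-day readmission.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, roc_auc_score

df = pd.read_csv("inpatients_2009.csv")  # hypothetical file
X = pd.get_dummies(
    df[["los", "admission_route", "department", "icd10_chapter", "opd_visits"]],
    columns=["admission_route", "department", "icd10_chapter"],
)
y = df["readmitted_28d"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(max_depth=5),
    "neural network": MLPClassifier(hidden_layer_sizes=(20,), max_iter=500),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    prob = model.predict_proba(X_te)[:, 1]
    print(f"{name}: misclassification rate = {1 - accuracy_score(y_te, pred):.3f}, "
          f"ROC AUC = {roc_auc_score(y_te, prob):.3f}")
```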

Issues in the Design of Molecular and Genetic Epidemiologic Studies

  • Fowke, Jay H.
    • Journal of Preventive Medicine and Public Health
    • Vol. 42, No. 6
    • pp.343-348
    • 2009
  • The final decision of study design in molecular and genetic epidemiology is usually a compromise between the research study aims and a number of logistical and ethical barriers that may limit the feasibility of the study or the interpretation of results. Although biomarker measurements may improve exposure or disease assessments, it is necessary to address the possibility that biomarker measurement introduces additional sources of misclassification and confounding that may lead to inconsistencies across the research literature. Studies targeting multi-causal diseases and investigating gene-environment interactions must meet not only the needs of a traditional epidemiologic study but also the needs of the biomarker investigation. This paper is intended to highlight the major issues that need to be considered when developing an epidemiologic study utilizing biomarkers. These issues range from molecular and genetic epidemiology (MGE) study designs, including cross-sectional, cohort, case-control, clinical trial, nested case-control, and case-only studies, to matching the study design to the MGE research goals. This review summarizes the logistical barriers and the epidemiological study designs most relevant to MGE and describes the strengths and limitations of each approach in the context of common MGE research aims and specific MGE objectives.

신용카드 사기 검출을 위한 비용 기반 학습에 관한 연구 (Cost-sensitive Learning for Credit Card Fraud Detection)

  • 박래정
    • 한국지능시스템학회논문지
    • Vol. 15, No. 5
    • pp.545-551
    • 2005
  • The primary objective of fraud detection is to minimize the losses incurred by fraudulent transactions. However, the peculiar properties of the fraud detection problem, namely its highly imbalanced and overlapping class distribution and non-uniform misclassification costs, make it difficult to build a classifier that is optimal, in terms of classification cost, within the rejection-rate operating region actually desired in practice. This paper defines the classification cost of a classifier over a specific operating region and directly optimizes it using evolutionary search, thereby presenting a cost-sensitive learning method that can train classifiers well suited to real-world credit card fraud detection. Experiments on credit card transaction data show that the proposed method is more effective than other learning methods at training cost-sensitive classifiers.
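As an illustration of the general idea (not the paper's exact algorithm), the sketch below directly minimizes a non-uniform misclassification cost at a fixed rejection-rate operating point by evolving the weights of a linear scorer; the synthetic data, cost values, and rejection rate are all assumptions.

```python
# Illustrative sketch: evolutionary search that directly optimizes the
# classification cost of a linear scorer at a fixed rejection rate.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 10
X = rng.normal(size=(n, d))
y = (rng.random(n) < 0.02).astype(int)   # rare fraud class (assumed 2%)
X[y == 1] += 0.5                         # frauds shifted slightly

REJECT_RATE = 0.05                       # desired operating region
COST_FN, COST_FP = 100.0, 1.0            # non-uniform misclassification costs

def cost(w):
    scores = X @ w
    cutoff = np.quantile(scores, 1.0 - REJECT_RATE)  # flag the top 5% of scores
    flagged = scores >= cutoff
    fn = np.sum((y == 1) & ~flagged)     # missed frauds
    fp = np.sum((y == 0) & flagged)      # false alarms
    return COST_FN * fn + COST_FP * fp

# simple (mu + lambda) evolution strategy on the weight vector
pop = rng.normal(size=(30, d))
for gen in range(200):
    children = pop[rng.integers(0, len(pop), 60)] + 0.1 * rng.normal(size=(60, d))
    pool = np.vstack([pop, children])
    pop = pool[np.argsort([cost(w) for w in pool])[:30]]

print("best cost at 5% rejection rate:", cost(pop[0]))
```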

정보부존재 해결방안으로서 기록 생산단계 강화방안 연구 (A Study on Strengthening Records Management as a Solution to the Nonexistence of Information)

  • 권현진;이영학
    • 한국기록관리학회지
    • Vol. 18, No. 4
    • pp.25-43
    • 2018
  • This study focuses on how the nonexistence of records leads to the nonexistence of information. It argues that, to resolve the nonexistence of records caused by loss, damage, neglect, unauthorized destruction, misclassification, and the assignment of overly short retention periods, thorough records management must begin at the records creation stage. This can prevent the nonexistence of information from arising in the first place and, even when it does arise, serve as a tool for explaining why the information does not exist.

Grad-CAM을 이용한 적대적 예제 생성 기법 연구 (Research of a Method of Generating an Adversarial Sample Using Grad-CAM)

  • 강세혁
    • 한국멀티미디어학회논문지
    • Vol. 25, No. 6
    • pp.878-885
    • 2022
  • Research in the field of computer vision based on deep learning is being actively conducted. However, deep learning-based models are vulnerable to adversarial attacks that increase the model's misclassification rate by applying adversarial perturbations. In particular, FGSM is recognized as one of the effective attack methods because it is simple, fast, and has a considerable attack success rate. Meanwhile, as one of the efforts to visualize deep learning models, Grad-CAM enables visual explanation of convolutional neural networks. In this paper, I propose a method to generate adversarial examples with a high attack success rate by applying Grad-CAM to FGSM. The method uses Grad-CAM to choose pixels that are closely related to the label and adds perturbations to those pixels intensively. The proposed method has a higher success rate than FGSM under the same perturbation budget for both targeted and untargeted examples. In addition, unlike FGSM, it has the advantage that the distribution of noise is not uniform, and when the success rate is increased by repeatedly applying noise, the attack succeeds with fewer iterations.
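A hedged PyTorch sketch of the idea described in this abstract: Grad-CAM picks the pixels most relevant to the label, and the FGSM perturbation is applied mainly to those pixels. The backbone model, target layer, epsilon, and the top-30% mask are assumptions for illustration, not the paper's exact settings.

```python
import torch
import torch.nn.functional as F
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1").eval()
feats = {}
model.layer4.register_forward_hook(lambda m, i, o: feats.update(a=o))  # assumed CAM layer

def gradcam_fgsm(x, label, eps=8 / 255, top_frac=0.3):
    x = x.clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), label)
    act = feats["a"]
    grad_act, grad_x = torch.autograd.grad(loss, [act, x])

    # Grad-CAM: weight each channel by its spatially averaged gradient
    cam = F.relu((grad_act.mean(dim=(2, 3), keepdim=True) * act).sum(1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)

    # keep only the top fraction of the saliency map as the perturbation mask
    thresh = torch.quantile(cam.flatten(1), 1 - top_frac, dim=1).view(-1, 1, 1, 1)
    mask = (cam >= thresh).float()

    # FGSM step restricted to the label-relevant pixels
    return (x + eps * grad_x.sign() * mask).clamp(0, 1).detach()

# usage (hypothetical): x is an (N, 3, 224, 224) batch in [0, 1], label the true classes
# x_adv = gradcam_fgsm(x, label)
```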

Differentiation among stability regimes of alumina-water nanofluids using smart classifiers

  • Daryayehsalameh, Bahador;Ayari, Mohamed Arselene;Tounsi, Abdelouahed;Khandakar, Amith;Vaferi, Behzad
    • Advances in nano research
    • Vol. 12, No. 5
    • pp.489-499
    • 2022
  • Nanofluids have recently triggered substantial scientific interest as cooling media. However, their stability is challenging for successful engagement in industrial applications. Different factors, including temperature, nanoparticle and base fluid characteristics, pH, ultrasonic power and frequency, agitation time, and surfactant type and concentration, determine the nanofluid stability regime. Indeed, it is often too complicated, and even impossible, to accurately find the conditions resulting in a stabilized nanofluid. Furthermore, there are no empirical, semi-empirical, or even intelligent scenarios for anticipating the stability of nanofluids. Therefore, this study introduces a straightforward and reliable intelligent classifier for discriminating among the stability regimes of alumina-water nanofluids based on zeta potential margins. In this regard, various intelligent classifiers (i.e., deep learning and multilayer perceptron neural networks, decision tree, GoogleNet, and multi-output least squares support vector regression) have been designed, and their classification accuracy was compared. This comparison confirmed that the multilayer perceptron neural network (MLPNN) with the softmax activation function trained by the Bayesian regularization algorithm is the best classifier for the considered task. This intelligent classifier accurately detects the stability regimes of more than 90% of 345 different nanofluid samples. The model achieved an overall classification accuracy of 90.1% and a misclassification rate of 9.9%. This research is the first attempt to anticipate the stability of alumina-water nanofluids from easily measured independent variables.
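A rough sketch of a multilayer perceptron with a softmax output for the stability-regime classification task. Note that scikit-learn's MLPClassifier uses L2 regularization rather than the Bayesian regularization training reported in the abstract, and the file name and feature columns are hypothetical.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

df = pd.read_csv("alumina_water_nanofluids.csv")  # hypothetical 345-sample dataset
X = df[["temperature", "concentration", "ph", "sonication_time", "surfactant_conc"]]
y = df["stability_regime"]                         # labels from zeta potential margins

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, stratify=y, random_state=0)
clf = make_pipeline(StandardScaler(),
                    MLPClassifier(hidden_layer_sizes=(15,), alpha=1e-3, max_iter=2000))
clf.fit(X_tr, y_tr)
acc = accuracy_score(y_te, clf.predict(X_te))
print(f"classification accuracy: {acc:.1%}, misclassification: {1 - acc:.1%}")
```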

적대적 공격을 방어하기 위한 StarGAN 기반의 탐지 및 정화 연구 (StarGAN-Based Detection and Purification Studies to Defend against Adversarial Attacks)

  • 박성준;류권상;최대선
    • 정보보호학회논문지
    • Vol. 33, No. 3
    • pp.449-458
    • 2023
  • Artificial intelligence, built on big data and deep learning, brings convenience to many areas of life. Deep learning models, however, are highly vulnerable to adversarial examples, which induce misclassification in classification models. This study proposes a method that uses StarGAN to detect and purify a variety of adversarial attacks. The proposed method trains a StarGAN model augmented with a categorical entropy loss on adversarial examples generated by various attack methods, so that the discriminator detects adversarial examples while the generator purifies them. Experiments on the CIFAR-10 dataset showed an average detection rate of about 68.77%, an average purification rate of about 72.20%, and an average defense rate, derived from the detection and purification performance, of about 93.11%.
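A structural sketch, with untrained toy networks, of the defense pipeline this abstract describes: a StarGAN-style discriminator flags suspected adversarial inputs and the generator purifies them. The layer sizes, threshold, and dummy batch are invented and carry no trained performance.

```python
import torch
import torch.nn as nn

class Purifier(nn.Module):            # stand-in for the StarGAN generator
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, 3, padding=1), nn.Tanh())
    def forward(self, x):
        return self.net(x)

class Detector(nn.Module):            # stand-in for the StarGAN discriminator
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2),
            nn.Flatten(), nn.LazyLinear(1))   # one logit: adversarial vs. clean
    def forward(self, x):
        return self.net(x)

def defend(x, detector, purifier, threshold=0.5):
    """Flag suspected adversarial inputs and return purified images for them."""
    with torch.no_grad():
        is_adv = torch.sigmoid(detector(x)).squeeze(1) > threshold
        purified = purifier(x)
    return is_adv, torch.where(is_adv.view(-1, 1, 1, 1), purified, x)

x = torch.rand(8, 3, 32, 32)          # CIFAR-10-sized dummy batch
is_adv, x_clean = defend(x, Detector(), Purifier())
print(is_adv, x_clean.shape)
```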

데이터 마이닝과 텍스트 마이닝의 통합적 접근을 통한 병사 사고예측 모델 개발 (Development of the Accident Prediction Model for Enlisted Men through an Integrated Approach to Datamining and Textmining)

  • 윤승진;김수환;신경식
    • 지능정보연구
    • Vol. 21, No. 3
    • pp.1-17
    • 2015
  • Recently, one of the most pressing issues in the military has been accidents among enlisted men caused by lapses in discipline and maladjustment to service. The most important step in preventing such accidents is to identify and manage in advance the problems that can become their causes. To this end, commanders make their own efforts, such as interviewing soldiers, patrolling barracks, and talking with parents, but in reality the ability to spot warning signs of accidents varies greatly with each commander's individual competence. To overcome this problem, this study attempts to predict accidents using objective data that any commander can easily obtain. Soldiers' counseling and guidance records are now well maintained in databases, and commanders also communicate with soldiers on social networking services (SNS) and gather information there, so if these sources are turned into data and put to good use, soldiers' accidents can be predicted and prevented. This study addresses a data mining problem that uses soldiers' internal data (counseling and guidance records) and external data (SNS) to identify their areas of interest, predict accidents, and apply the results to command, and it proposes topic analysis and decision trees as the method. The research proceeded in two main stages. First, topics were extracted from soldiers' SNS posts and converted into independent variables; second, a decision tree was built on the soldiers' internal data with the topic analysis results added as independent variables, the dependent variable being whether or not a soldier had an accident. The analysis showed excellent predictive power, with an accident prediction accuracy of about 92%. If, building on this study, soldiers' accident risk is scientifically analyzed and managed in a tailored manner, it is expected to contribute to preventing various accidents within the military.
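A hedged sketch of the integrated approach: topics extracted from SNS text with LDA are appended as independent variables to structured record features, and a decision tree predicts accident versus no accident. The file name, columns, and parameters are hypothetical, not the study's actual data.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

df = pd.read_csv("soldier_records.csv")          # internal records plus collected SNS text
vec = CountVectorizer(max_features=2000)
lda = LatentDirichletAllocation(n_components=10, random_state=0)
topic_feats = lda.fit_transform(vec.fit_transform(df["sns_text"].fillna("")))

# structured features plus topic proportions as independent variables
X = pd.concat([df[["age", "service_months", "counseling_count"]],
               pd.DataFrame(topic_feats, columns=[f"topic_{i}" for i in range(10)])],
              axis=1)
y = df["accident"]                                # dependent variable: accident or not

tree = DecisionTreeClassifier(max_depth=5, random_state=0)
print("CV accuracy:", cross_val_score(tree, X, y, cv=5).mean())
```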

신성장 동력, 소프트웨어산업의 경제적 파급효과 분석 (New Growth Power, Economic Effect Analysis of Software Industry)

  • 최진호;류재홍
    • Journal of Information Technology Applications and Management
    • Vol. 21, No. 4 (special)
    • pp.381-401
    • 2014
  • This study proposes the accurate economic effect (employment inducement coefficient, hiring inducement coefficient, index of the sensitivity of dispersion, index of the power of dispersion, and ratio of value added) of the Korean software industry by analyzing inter-industry relations using a modified inter-industry table. Some previous studies related to inter-industry analysis were reviewed and their key problems identified. First, in the current inter-industry table published by the Bank of Korea, the output of the software industry includes not only the output of the pure software industry (package software and IT services) but also the output of non-software industries, due to misclassification of the industry. This causes the output to become larger than the actual output of the software industry. Second, the output changes while the inter-industry table is being rewritten. The inter-industry table is a table, in the form of rows and columns, that records the transactions of goods and services among industries required to continue the activities of each industry. Accordingly, if only the output of a specific industry is changed, the reliability of the table is degraded because the table is prepared based on the relations with other industries, and the economic effect coefficients can become unreliable and over- or under-estimated. This study tries to correct these problems to obtain a more accurate economic effect of the software industry. First, to obtain the output of the pure software section only, data from the Korea Electronics Association (KEA) were used in the inter-industry table. Second, to prevent a difference in outputs while rewriting the inter-industry table, the difference between the output in the current inter-industry table and the output from the KEA data was identified and then defined as the non-software section output for the analysis. The following results were obtained: the pure software section's economic effect coefficients were lower than those of the non-software section, which stems from the difference between the Bank of Korea and KEA data. The significance of this study lies in providing a more accurate economic effect of the Korean software industry.
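For reference, the dispersion indices mentioned in this abstract are standard quantities derived from the Leontief inverse of the input coefficient matrix; the sketch below computes them on a made-up three-sector example (the matrix is illustrative, not Bank of Korea or KEA data).

```python
import numpy as np

A = np.array([[0.10, 0.05, 0.02],      # hypothetical input coefficient matrix
              [0.20, 0.15, 0.10],
              [0.05, 0.10, 0.25]])
n = A.shape[0]
L = np.linalg.inv(np.eye(n) - A)       # Leontief inverse (I - A)^-1

mean_all = L.sum() / n**2
power_of_dispersion = L.sum(axis=0) / n / mean_all        # backward linkage (column means)
sensitivity_of_dispersion = L.sum(axis=1) / n / mean_all  # forward linkage (row means)

print("power of dispersion:", power_of_dispersion)
print("sensitivity of dispersion:", sensitivity_of_dispersion)
```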

아시아 지역 지면피복자료 비교 연구: USGS, IGBP, 그리고 UMd (A Comparison of the Land Cover Data Sets over Asian Region: USGS, IGBP, and UMd)

  • 강전호;서명석;곽종흠
    • 대기
    • Vol. 17, No. 2
    • pp.159-169
    • 2007
  • A comparison of the three land cover data sets (United States Geological Survey: USGS, International Geosphere Biosphere Programme: IGBP, and University of Maryland: UMd), derived from 1992-1993 Advanced Very High Resolution Radiometer(AVHRR) data sets, was performed over the Asian continent. Preprocesses such as the unification of map projection and land cover definition, were applied for the comparison of the three different land cover data sets. Overall, the agreement among the three land cover data sets was relatively high for the land covers which have a distinct phenology, such as urban, open shrubland, mixed forest, and bare ground (>45%). The ratios of triple agreement (TA), couple agreement (CA) and total disagreement (TD) among the three land cover data sets are 30.99%, 57.89% and 8.91%, respectively. The agreement ratio between USGS and IGBP is much greater (about 80%) than that (about 32%) between USGS and UMd (or IGBP and UMd). The main reasons for the relatively low agreement among the three land cover data sets are differences in 1) the number of land cover categories, 2) the basic input data sets used for the classification, 3) classification (or clustering) methodologies, and 4) level of preprocessing. The number of categories for the USGS, IGBP and UMd are 24, 17 and 14, respectively. USGS and IGBP used only the 12 monthly normalized difference vegetation index (NDVI), whereas UMd used the 12 monthly NDVI and other 29 auxiliary data derived from AVHRR 5 channels. USGS and IGBP used unsupervised clustering method, whereas UMd used the supervised technique, decision tree using the ground truth data derived from the high resolution Landsat data. The insufficient preprocessing in USGS and IGBP compared to the UMd resulted in the spatial discontinuity and misclassification.