• Title/Abstract/Keywords: Dataset Quality

Search results: 426 items (processing time: 0.024 s)

Synthetic data augmentation for pixel-wise steel fatigue crack identification using fully convolutional networks

  • Zhai, Guanghao;Narazaki, Yasutaka;Wang, Shuo;Shajihan, Shaik Althaf V.;Spencer, Billie F. Jr.
    • Smart Structures and Systems / Vol. 29, No. 1 / pp. 237-250 / 2022
  • Structural health monitoring (SHM) plays an important role in ensuring the safety and functionality of critical civil infrastructure. In recent years, numerous researchers have developed computer vision and machine learning techniques for SHM, offering the potential to make field inspections less laborious and more effective. However, high-quality vision data from various types of damaged structures is relatively difficult to obtain because damaged structures are rare. The lack of data is particularly acute for fatigue cracks in steel bridge girders. As a result, the lack of training data is one of the main issues hindering wider application of these powerful techniques for SHM. To address this problem, this article proposes using synthetic data to augment the real-world datasets used to train neural networks that identify fatigue cracks in steel structures. First, random textures representing the surface of steel structures with fatigue cracks are created and mapped onto a 3D graphics model. Subsequently, this model is used to generate synthetic images for various lighting conditions and camera angles. A fully convolutional network is then trained for two cases: (1) using only real-world data, and (2) using both synthetic and real-world data. With synthetic data augmentation in the training process, the crack identification performance of the neural network on the test dataset improves from 35% to 40% for intersection over union (IoU) and from 49% to 62% for precision, demonstrating the efficacy of the proposed approach.
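
The two metrics reported in this abstract, IoU and precision, can be illustrated for pixel-wise crack masks with a minimal sketch. The function name `iou_and_precision` and the toy masks are assumptions made for illustration; they are not taken from the paper.

```python
def iou_and_precision(pred_mask, true_mask):
    """Return (IoU, precision) for two equally sized binary masks (rows of 0/1)."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(pred_mask, true_mask):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1  # crack pixel predicted correctly
            elif p:
                fp += 1  # crack predicted where there is none
            elif t:
                fn += 1  # crack pixel that was missed
    union = tp + fp + fn
    iou = tp / union if union else 1.0  # two empty masks agree perfectly
    precision = tp / (tp + fp) if tp + fp else 0.0
    return iou, precision


# Toy 2x3 masks (hypothetical data): 1 marks a crack pixel.
pred = [[1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [0, 1, 1]]
iou, precision = iou_and_precision(pred, truth)
print(iou, precision)
```

Here two pixels overlap, one is a false positive, and one is missed, so IoU is 2/4 and precision is 2/3.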

Assessment and Analysis of Fidelity and Diversity for GAN-based Medical Image Generative Model

  • 장유진;유재준;홍헬렌
    • Journal of the Korea Computer Graphics Society / Vol. 28, No. 2 / pp. 11-19 / 2022
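  • With recent advances in medical imaging, various studies on medical image generation have been proposed, and accurately evaluating the fidelity and diversity of generated medical images has become increasingly important. Generated medical images are currently evaluated through visual Turing tests by experts, visualization of feature distributions, and quantitative evaluation using IS and FID, but methods that quantitatively evaluate medical images in terms of fidelity and diversity are rare. In this paper, the DCGAN and PGGAN generative models are trained on a chest CT dataset of patients with non-small cell lung cancer to generate images, and the performance of the two models is evaluated in terms of fidelity and diversity. Performance is evaluated quantitatively with the one-dimensional score-based metrics IS and FID and with the two-dimensional score-based metrics Precision and Recall and Improved Precision and Recall, and the characteristics and limitations of each evaluation method for medical images are also analyzed.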
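
As a rough illustration of the two-dimensional metrics mentioned above, the sketch below follows the k-nearest-neighbor idea behind Improved Precision and Recall: a sample counts as covered if it falls inside the k-NN ball of at least one sample from the other set. The 2D toy points, the helper names, and `k=1` are assumptions; a real evaluation would work on feature embeddings of the CT images, which the abstract does not detail.

```python
import math


def knn_radii(points, k):
    """Distance from each point to its k-th nearest neighbor within the same set."""
    radii = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        radii.append(dists[k - 1])
    return radii


def coverage(queries, support, k):
    """Fraction of queries lying inside the k-NN ball of at least one support point."""
    radii = knn_radii(support, k)
    hits = sum(
        any(math.dist(q, s) <= r for s, r in zip(support, radii)) for q in queries
    )
    return hits / len(queries)


def improved_precision_recall(real, fake, k=1):
    precision = coverage(fake, real, k)  # fidelity: fakes that look real
    recall = coverage(real, fake, k)     # diversity: reals the generator covers
    return precision, recall


# Hypothetical features: the generator has collapsed onto one corner of the real data.
real = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
fake = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
precision, recall = improved_precision_recall(real, fake, k=1)
print(precision, recall)
```

In this toy case every generated point lies near a real one (high fidelity), but only one of the four real points is covered (low diversity), which is a distinction a single score such as FID cannot express.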

A Study on the Land Cover Classification and Cross Validation of AI-based Aerial Photograph

  • Lee, Seong-Hyeok;Myeong, Soojeong;Yoon, Donghyeon;Lee, Moung-Jin
    • Korean Journal of Remote Sensing / Vol. 38, No. 4 / pp. 395-409 / 2022
  • The purpose of this study is to evaluate classification performance and applicability when land cover datasets constructed for AI training are cross-validated against other areas. Gyeongsang-do and Jeolla-do in South Korea were selected as the cross-validation areas, and training datasets were obtained from AI-Hub. The datasets were applied to U-Net, a semantic segmentation algorithm, for each region, and accuracy was evaluated on test areas in the same region and in the other region. Overall classification accuracy differed by about 13-15% between the same and other areas. For rice fields, fields, and buildings, accuracy was higher in the Jeolla-do test areas; for roads, it was higher in the Gyeongsang-do test areas. Comparing accuracy by trained weights, the Gyeongsang-do weights gave high accuracy for forests, while the Jeolla-do weights gave high accuracy for dry fields. The land cover classification results show that the classification performance of existing datasets differs by area. When constructing land cover maps for AI training, higher-quality datasets are expected if the characteristics of various areas are reflected. This study is highly scalable from two perspectives. First, it applies satellite images to AI research and to the field of land cover. Second, because it builds on satellite images, it can be extended to large-scale areas and to areas that are difficult to access.

Metal Surface Defect Detection and Classification using EfficientNetV2 and YOLOv5

  • ;김강철
    • The Journal of the Korea Institute of Electronic Communication Sciences / Vol. 17, No. 4 / pp. 577-586 / 2022
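  • Detection and classification of steel surface defects are important for product quality control in the steel industry. However, because of their low accuracy and slow speed, conventional methods cannot be used effectively on production lines. The currently popular deep learning-based algorithms have accuracy problems, and there is still room for development. This paper proposes a steel surface defect detection method that combines EfficientNetV2 for image classification with YOLOv5 as the object detector. The advantages of this model are short training time and high accuracy. First, the EfficientNetV2 model classifies the defect class of the input image and predicts the probability that a defect is present. If that probability is less than 0.3, the algorithm treats the sample as defect-free. Otherwise, the sample is passed on to YOLOv5, which performs defect detection on the metal surface. Experiments show that the proposed model performed well on the NEU dataset, with an accuracy of 98.3%, while its average training time was also shorter than that of other models.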
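
The two-stage gating described in this abstract can be sketched as follows. Only the 0.3 threshold comes from the abstract; `inspect_surface`, the stub classifier and detector, and the result dictionary are hypothetical stand-ins for the EfficientNetV2 and YOLOv5 models.

```python
def inspect_surface(image, classify, detect, threshold=0.3):
    """Run the classifier first and call the detector only for likely defects.

    classify(image) -> (defect_class, defect_probability)
    detect(image)   -> list of bounding boxes
    Both callables are placeholders for the trained models.
    """
    defect_class, defect_prob = classify(image)
    if defect_prob < threshold:
        # Below the threshold the sample is treated as defect-free.
        return {"defective": False, "defect_class": None, "boxes": []}
    return {"defective": True, "defect_class": defect_class, "boxes": detect(image)}


# Stub models for illustration only; they read precomputed values from a dict.
def fake_classifier(image):
    return image["label"], image["prob"]


def fake_detector(image):
    return [(10, 20, 50, 60)]


clean = inspect_surface({"label": "scratches", "prob": 0.1}, fake_classifier, fake_detector)
flawed = inspect_surface({"label": "scratches", "prob": 0.9}, fake_classifier, fake_detector)
print(clean)
print(flawed)
```

Gating on the classifier output means the comparatively expensive detector runs only on samples that are likely to contain a defect.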

Center point prediction using Gaussian elliptic and size component regression using small solution space for object detection

  • Yuantian Xia;Shuhan Lu;Longhe Wang;Lin Li
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 17, No. 8 / pp. 1976-1995 / 2023
  • The anchor-free object detector CenterNet regards an object as a center point and predicts it based on a Gaussian circle region. For each object's center point, CenterNet directly regresses the width and height of the object and finally obtains its boundary. However, constraining the prediction region with a Gaussian circle cannot accurately limit the critical range of the object's center point, which results in many low-quality predicted center values. In addition, because width and height differ greatly across objects, regressing them directly makes the model difficult to converge and loses the intrinsic relationship between them, reducing the stability and consistency of accuracy. To address these problems, we propose a center point prediction method based on a Gaussian elliptic region and a size component regression method based on a small solution space. First, we construct a Gaussian ellipse region that can accurately predict the object's center point. Second, we recode the width and height of the objects, which significantly reduces the regression solution space and improves the convergence speed of the model. Finally, we jointly decode the predicted components, enhancing the internal relationship between the size components and improving the consistency of accuracy. Experiments show that, with CenterNet as the baseline and Hourglass-104 as the backbone, our improved model achieved 44.7% on the MS COCO dataset, which is 2.6% higher than the baseline.
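
The difference between a circular and an elliptic Gaussian target can be sketched with an axis-aligned 2D Gaussian that has separate spreads along x and y. This is an illustrative assumption; the abstract does not give the paper's exact formulation, and `gaussian_heatmap`, `sigma_x`, and `sigma_y` are hypothetical names.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma_x, sigma_y):
    """Heatmap that peaks at (cx, cy); equal sigmas give the circular case."""
    return [
        [
            math.exp(
                -((x - cx) ** 2 / (2 * sigma_x ** 2) + (y - cy) ** 2 / (2 * sigma_y ** 2))
            )
            for x in range(width)
        ]
        for y in range(height)
    ]


# A wide, short object: the ellipse spreads further along x than along y.
circle = gaussian_heatmap(9, 9, cx=4, cy=4, sigma_x=1.0, sigma_y=1.0)
ellipse = gaussian_heatmap(9, 9, cx=4, cy=4, sigma_x=3.0, sigma_y=1.0)
print(circle[4][6], circle[6][4])    # same value in both directions
print(ellipse[4][6], ellipse[6][4])  # larger along the long axis
```

With equal spreads the target falls off identically in every direction, while the elliptic version keeps high values along the object's longer side.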

Analysis of unfairness of artificial intelligence-based speaker identification technology

  • 신나연;이진민;노현;이일구
    • Convergence Security Journal / Vol. 23, No. 1 / pp. 27-33 / 2023
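  • Digitalization driven by Covid-19 has rapidly advanced artificial intelligence-based speech recognition technology. However, when the dataset is biased toward certain groups, this technology causes unfair social problems such as racial and gender discrimination and degrades the reliability and security of AI services. This study compares and analyzes accuracy-based unfairness in a biased data environment using VGGNet (Visual Geometry Group Network), ResNet (Residual Neural Network), and MobileNet, which are representative CNN (Convolutional Neural Network) models. According to the experimental results, ResNet34 showed the highest Top-1 accuracy, at 91% for women and 89.9% for men, while ResNet18 showed the smallest accuracy gap between genders, at 1.8%. The accuracy gap between genders in each model leads to differences in service quality between men and women and to unfair outcomes when the service is used.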
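
The accuracy-based comparison described above amounts to computing Top-1 accuracy per group and taking the difference. The sketch below assumes a simple record format of (group, true speaker, predicted speaker); the records and the helper name `accuracy_by_group` are made up for illustration and do not reproduce the paper's data.

```python
from collections import defaultdict


def accuracy_by_group(records):
    """Top-1 accuracy per group from (group, true_id, predicted_id) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, true_id, pred_id in records:
        total[group] += 1
        correct[group] += int(true_id == pred_id)
    return {group: correct[group] / total[group] for group in total}


# Hypothetical predictions from a speaker identification model.
records = [
    ("female", "spk1", "spk1"), ("female", "spk2", "spk2"),
    ("female", "spk3", "spk3"), ("female", "spk4", "spk1"),
    ("male", "spk5", "spk5"), ("male", "spk6", "spk6"),
    ("male", "spk7", "spk5"), ("male", "spk8", "spk6"),
]
acc = accuracy_by_group(records)
gap = abs(acc["female"] - acc["male"])
print(acc, gap)
```

A model can have the best overall accuracy and still show a larger gap than a weaker model, which is why the abstract reports both figures.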

The Effects of Job Demand-control-support Profiles on Presenteeism: Evidence from the Sixth Korean Working Condition Survey

  • Ari Min;Hye Chong Hong
    • Safety and Health at Work / Vol. 14, No. 1 / pp. 85-92 / 2023
  • Background: Presenteeism is closely related to work performance, work quality and quantity, and productivity at work. According to the job demand-control-support model, job demand, job control, and support play important roles in presenteeism. The present study investigated job characteristics profiles based on the job demand-control-support model and identified the association between job characteristics profiles and presenteeism. Methods: This secondary data analysis used the Sixth Korean Working Condition Survey, a nationwide cross-sectional dataset. The study included 25,361 Korean wage workers employed in workplaces with two or more workers. Participants were classified into four job characteristics profiles based on the job demand-control-support model using latent profile analysis, and logistic regression was performed to examine the association between study variables. Results: Overall, 11.0% of study participants reported experiencing presenteeism in the past 12 months. Age, sex, location, monthly income, shift work, work hours, health problems, and sleep disturbances were significantly associated with presenteeism. The rate of presenteeism was highest in the passive isolate group. The passive collective, active collective, and low-strain collective groups had a 23.0%, 21.0%, and 29.0% lower likelihood of experiencing presenteeism, respectively, than the passive isolate group. Conclusions: The job demand-control-support profiles and the risk of presenteeism were significantly associated. The group with the largest reduction in presenteeism was the low-strain collective group, which had a low level of demand and high levels of control and support. Therefore, policies are needed to reduce job demand and increase job control and support at the organizational and national levels.

Negative association between high temperature-humidity index and milk performance and quality in Korean dairy system: big data analysis

  • Dongseok Lee;Daekyum Yoo;Hyeran Kim;Jakyeom Seo
    • Journal of Animal Science and Technology / Vol. 65, No. 3 / pp. 588-595 / 2023
  • The aim of this study was to investigate the effects of heat stress on milk traits in South Korea using comprehensive data (dairy production and climate). The dataset for this study comprised 1,498,232 test-day records for milk yield, fat- and protein-corrected milk, fat yield, protein yield, milk urea nitrogen (MUN), and somatic cell score (SCS) from 215,276 Holstein cows (primiparous: n = 122,087; multiparous: n = 93,189) in 2,419 South Korean dairy herds. Data were collected from July 2017 to April 2020 through the Dairy Cattle Improvement Program, and merged with meteorological data from 600 automatic weather stations through the Korea Meteorological Administration. The segmented regression model was used to estimate the effects of the temperature-humidity index (THI) on milk traits and elucidate the break point (BP) of the THI. To acquire the least-squares mean of milk traits, the generalized linear model was applied using fixed effects (region, calving year, calving month, parity, days in milk, and THI). For all parameters, the BP of THI was observed; in particular, milk production parameters dramatically decreased after a specific BP of THI (p < 0.05). In contrast, MUN and SCS drastically increased when THI exceeded BP in all cows (p < 0.05) and primiparous cows (p < 0.05), respectively. Dairy cows in South Korea exhibited negative effects on milk traits (decrease in milk performance, increase in MUN, and SCS) when the THI exceeded 70; therefore, detailed feeding management is required to prevent heat stress in dairy cows.
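
To make the idea of a break point concrete, the sketch below runs a simplified two-segment search: for each candidate break point it fits a separate least-squares line to each side and keeps the candidate with the smallest total squared error. This is a stand-in for illustration, not the segmented regression model used in the study, and the THI and milk-yield values are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept, sum of squared errors)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, sse


def find_break_point(xs, ys, candidates, min_points=3):
    """Return (break_point, left_slope, right_slope) minimizing the total SSE."""
    best = None
    for bp in candidates:
        left = [(x, y) for x, y in zip(xs, ys) if x <= bp]
        right = [(x, y) for x, y in zip(xs, ys) if x > bp]
        if len(left) < min_points or len(right) < min_points:
            continue  # too few points to fit a line on one side
        left_slope, _, left_sse = fit_line(*zip(*left))
        right_slope, _, right_sse = fit_line(*zip(*right))
        total = left_sse + right_sse
        if best is None or total < best[0]:
            best = (total, bp, left_slope, right_slope)
    if best is None:
        raise ValueError("no candidate leaves enough points on both sides")
    return best[1], best[2], best[3]


# Synthetic data: yield is flat up to THI 70.5 and then drops 0.5 kg per THI unit.
thi = [float(x) for x in range(60, 81)]
milk = [30.0 if x <= 70.5 else 30.0 - 0.5 * (x - 70.5) for x in thi]
bp, left_slope, right_slope = find_break_point(thi, milk, candidates=range(62, 79))
print(bp, left_slope, right_slope)
```

On real test-day records the fit would also need the fixed effects listed in the abstract; the sketch only shows how a break point separates a flat regime from a declining one.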

Automated Prioritization of Construction Project Requirements using Machine Learning and Fuzzy Logic System

  • Hassan, Fahad ul;Le, Tuyen;Le, Chau;Shrestha, K. Joseph
    • International conference proceedings / The 9th International Conference on Construction Engineering and Project Management / pp. 304-311 / 2022
  • Construction inspection is a crucial stage that ensures that all contractual requirements of a construction project are verified. The construction inspection capabilities of state highway agencies have been greatly affected by budget reductions. As a result, efficient inspection practices such as risk-based inspection are required to optimize the use of limited resources without compromising inspection quality. Automated prioritization of textual requirements according to their criticality would be extremely helpful, since contractual requirements are typically presented in unstructured natural language in voluminous text documents. The current study introduces a novel model for predicting the risk level of requirements using machine learning (ML) algorithms. The ML algorithms tested in this study included naïve Bayes, support vector machines, logistic regression, and random forest. The training data includes sequences of requirement texts that were labeled with risk levels (such as very low, low, medium, high, very high) using fuzzy logic systems. The fuzzy model treats the three risk factors (severity, probability, detectability) as fuzzy input variables and applies fuzzy inference rules to determine the labels of requirements. The performance of the model was examined on labeled datasets created with fuzzy inference rules and three different membership functions. The developed requirement risk prediction model yielded a precision, recall, and f-score of 78.18%, 77.75%, and 75.82%, respectively. The proposed model is expected to provide construction inspectors with a means of automatically prioritizing voluminous requirements by their importance, thus helping to maximize the effectiveness of inspection activities under resource constraints.

Empirical Study for Automatic Evaluation of Abstractive Summarization by Error-Types

  • 이승수;강상우
    • Korean Journal of Cognitive Science / Vol. 34, No. 3 / pp. 197-226 / 2023
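  • Abstractive text summarization is a natural language processing task that generates a short, condensed summary while preserving the content of a long text. Given the nature of the task, it is crucial that the summary preserve the key content of the source document. Existing approaches to abstractive summarization have measured content and fluency based on lexical overlap with a reference summary. ROUGE is a lexical overlap-based metric widely used to evaluate abstractive summarization models. Even though ROUGE scores on abstractive summarization benchmarks are very high, in the 49-point range, about 30% of generated summaries are inconsistent with the content of the source document. This study proposes a method that evaluates abstractive summarization models using only the source document, without the help of a reference summary. A correlation analysis between the proposed evaluation score and the AggreFACT labels showed the highest correlation in two cases. The first is when models such as BART and PEGASUS, which apply large-scale pretraining to a Transformer encoder-decoder architecture, are used as the baseline summarization models; the second is when errors occur throughout the entire summary.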
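
The weakness of lexical-overlap metrics noted in this abstract can be shown with a minimal unigram-overlap score in the spirit of ROUGE-1. This sketch uses whitespace tokenization and no stemming, so it is not the official ROUGE implementation; the sentences and the function name `rouge1_f1` are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate, reference):
    """Unigram-overlap F1 between a candidate summary and a reference summary."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    overlap = sum((cand_counts & ref_counts).values())  # clipped word matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# The candidate reverses the key fact yet shares six of seven words with the reference.
reference = "the company reported a profit in 2022"
candidate = "the company reported a loss in 2022"
score = rouge1_f1(candidate, reference)
print(round(score, 3))
```

One changed word flips the meaning while the overlap score stays high, which is the kind of inconsistency a reference-based lexical metric cannot detect.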