• Title/Summary/Keyword: Dataset Quality

Cross-Project Pooling of Defects for Handling Class Imbalance

  • Catherine, J.M.;Djodilatchoumy, S
    • International Journal of Computer Science & Network Security, v.22 no.10, pp.11-16, 2022
  • Applying predictive analytics to software defect prediction has improved overall software quality and reduced maintenance costs. Many supervised and unsupervised learning algorithms have been applied to defect prediction on publicly available datasets, most of which suffer from imbalanced output classes. We study the impact of class imbalance in defect datasets on the efficiency of defect prediction models and propose a cross-project pooling (CPP) method for handling the imbalance. The methods are evaluated using Matthews Correlation Coefficient (MCC), recall, and accuracy. The proposed sampling technique significantly improves the classifier's efficiency in predicting defects.
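
The abstract reports MCC alongside recall and accuracy. MCC matters on imbalanced defect data because accuracy alone can look high for a model that never flags a defect. Below is a minimal Python sketch of the three measures computed from a binary confusion matrix. The counts in the usage note are hypothetical, and the paper's own evaluation code is not described in the abstract.

```python
import math


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, recall and Matthews correlation coefficient from a binary confusion matrix."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Recall (true positive rate): share of truly defective modules that were flagged.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any row or column total of the matrix is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall, "mcc": mcc}
```

Consider a hypothetical project with 10 defective and 90 clean modules. A model that predicts "clean" for everything, `classification_metrics(0, 0, 90, 10)`, scores 0.9 accuracy but 0 recall and 0 MCC. A model with `classification_metrics(6, 4, 86, 4)` scores 0.92 accuracy, 0.6 recall, and an MCC of about 0.556.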

Automatic crack detection of dam concrete structures based on deep learning

  • Zongjie Lv;Jinzhang Tian;Yantao Zhu;Yangtao Li
    • Computers and Concrete, v.32 no.6, pp.615-623, 2023
  • Crack detection is an essential means of ensuring the safety of dam concrete structures. Low-quality crack images of dam concrete structures limit the application of neural network methods to crack detection. This research proposes a modified attention mechanism model to reduce the disturbance caused by uneven lighting, shadows, and water spots in crack images. It also uses a focal loss function to address the small proportion of crack information in the images. The dataset was collected from online sources, the laboratory, and actual inspections of dam concrete structures. The proposed crack detection method for dam concrete structures is based on the U-Net neural network and is named AF-UNet. A comparison of OTSU, Canny, region growing, DeepLab V3+, SegFormer, U-Net, and the proposed AF-UNet verified its detection accuracy. A binocular camera was used to detect cracks in the experimental scene, and the smallest crack width the system can measure is 0.27 mm. The longer-term goal is real-time detection and localization of cracks in dam concrete structures.
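
The abstract does not give the focal loss formulation or its hyperparameters. The sketch below assumes the standard binary focal loss of Lin et al. (2017), FL(p_t) = -α_t (1 - p_t)^γ log(p_t), applied to per-pixel crack probabilities. It uses that paper's common defaults, α = 0.25 and γ = 2, as assumptions; AF-UNet's actual settings may differ.

```python
import math


def binary_focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over per-pixel crack probabilities (labels: 1 = crack)."""
    if not probs:
        raise ValueError("need at least one prediction")
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
        alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
        # (1 - p_t)^gamma shrinks the loss of easy, well-classified pixels
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)
```

Setting `gamma=0.0` recovers α-weighted binary cross-entropy. Raising `gamma` down-weights the easy background pixels that dominate crack images, so the sparse crack pixels contribute relatively more to the gradient.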

Prediction of Depression from Machine Learning Data (머신러닝 데이터의 우울증에 대한 예측)

  • Jeong Hee KIM;Kyung-A KIM
    • Journal of Korea Artificial Intelligence Association, v.1 no.1, pp.17-21, 2023
  • The primary objective of this research is to utilize machine learning models to analyze factors tailored to each dataset for predicting mental health conditions. The study aims to develop appropriate models based on specific datasets, with the goal of accurately predicting mental health states through the analysis of distinct factors present in each dataset. This approach seeks to design more effective strategies for the prevention and intervention of depression, enhancing the quality of mental health services by providing personalized services tailored to individual circumstances. Overall, the research endeavors to advance the development of personalized mental health prediction models through data-driven factor analysis, contributing to the improvement of mental health services on an individualized basis.

Photorealistic Real-Time Dense 3D Mesh Mapping for AUV (자율 수중 로봇을 위한 사실적인 실시간 고밀도 3차원 Mesh 지도 작성)

  • Jungwoo Lee;Younggun Cho
    • The Journal of Korea Robotics Society, v.19 no.2, pp.188-195, 2024
  • This paper proposes a photorealistic, real-time, dense 3D mapping system that uses a neural network-based image enhancement method and a mesh-based map representation. Conventional simultaneous localization and mapping (SLAM) methods are hard to apply underwater because of problems such as haze and low contrast. In addition, an autonomous underwater vehicle (AUV) has limited onboard computational resources. We use a neural network-based image enhancement method to improve pose estimation and mapping quality. We also apply a sliding window-based mesh expansion method to enable lightweight, fast, and photorealistic mapping. To validate our results, we use real-world and indoor synthetic datasets: we performed qualitative validation with the real-world dataset and quantitative validation by modeling images from the indoor synthetic dataset as underwater scenes.

Fault Prediction Using Statistical and Machine Learning Methods for Improving Software Quality

  • Malhotra, Ruchika;Jain, Ankita
    • Journal of Information Processing Systems, v.8 no.2, pp.241-262, 2012
  • An understanding of quality attributes is important for software organizations that aim to deliver highly reliable software. An empirical assessment of metrics that predict quality attributes is essential for gaining insight into software quality in the early phases of development and for taking corrective action. In this paper, we build models to estimate fault proneness using object-oriented CK metrics and QMOOD metrics. We apply one statistical method and six machine learning methods to build the models and validate them using a dataset collected from open source software. The results are analyzed using the Area Under the Curve (AUC) obtained from Receiver Operating Characteristic (ROC) analysis. They show that the models built with the random forest and bagging methods outperformed all the other models. Based on these results, it is reasonable to claim that quality models are significantly related to object-oriented metrics and that machine learning methods perform comparably to statistical methods.
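
The AUC used in the evaluation equals the probability that a randomly chosen faulty class receives a higher predicted score than a randomly chosen fault-free one (the Mann-Whitney formulation). Below is a minimal, dependency-free Python sketch, assuming binary labels with 1 = faulty. It compares every positive/negative pair, so it is meant for illustration rather than large datasets.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random faulty class outranks a random fault-free one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one faulty and one fault-free class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

For example, `roc_auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])` returns 0.75: three of the four faulty/fault-free pairs are ranked correctly.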

Proposal of Standardization Plan for Defense Unstructured Datasets based on Unstructured Dataset Standard Format (비정형 데이터셋 표준포맷 기반 국방 비정형 데이터셋 표준화 방안 제안)

  • Yun-Young Hwang;Jiseong Son
    • Journal of Internet Computing and Services, v.25 no.1, pp.189-198, 2024
  • AI is accepted not only in the private sector but also in the defense sector as a cutting-edge technology that must be adopted for the development of national defense. In particular, artificial intelligence has been selected as a key task in defense science and technology innovation, and the importance of data is increasing. As the defense department shifts from a closed data policy to data sharing and utilization, efforts are being made to secure the high-quality data needed for the development of national defense. In particular, a review of the project budgeting system is under way. Its aim is for data-acquisition procedures to reflect the unique characteristics of AI and big data, so that research and development can begin with sufficiently large quantities of high-quality data. Standardization and quality standards are needed for both structured and unstructured data at the national defense level. However, the defense department has so far proposed them only for structured data, and this gap needs to be filled. In this paper, we propose a standard format for defense unstructured datasets, which are the most needed in defense artificial intelligence. Based on this format, we propose a standardization method for defense unstructured datasets.

Quantitative Assessment of the Quality of Regional Adaptation Trial Data for Crop Model Improvement (작물 모형 개선을 위한 지역적응시험 자료의 정량적 품질 평가)

  • Hyun, Shinwoo;Seo, Bo Hun;Lee, Sukin;Kim, Kwang Soo
    • Korean Journal of Agricultural and Forest Meteorology, v.22 no.3, pp.194-204, 2020
  • Cultivar parameters, which are key inputs to a crop growth model, have been estimated using good-quality observation data. High-quality observation data often require considerable labor and cost, which makes it challenging to gather a large quantity of data for calibrating cultivar parameters. Alternatively, sufficient quantities of data can be collected from regional cultivar evaluation reports, although these data are of questionable quality. The objective of our study was to assess the quality of the crop and management data available from the reports on regional adaptation trials for rice cultivars. We also aimed to propose measures for improving data quality that would aid reliable estimation of cultivar parameters. DatasetRanker, a tool designed for quantitative assessment of data for parameter calibration, was used to evaluate the quality of the regional adaptation trial data. These data for rice cultivars were classified into the Silver class, meaning they could be used for validation or for calibrating key cultivar parameters. However, they would fall short of the quality needed for model improvement. Additional management information, e.g., on harvest and irrigation, could increase the quantitative quality by 10% with minimal effort and cost. Data quality could also be improved by measuring initial conditions for crop growth simulations, such as soil moisture and nutrients. In addition, time-series crop growth data would facilitate crop model improvement, which merits further study of non-destructive methods for monitoring crop growth.

Wine Quality Assessment Using a Decision Tree with the Features Recommended by the Sequential Forward Selection

  • Lee, Seunghan;Kang, Kyungtae;Noh, Dong Kun
    • Journal of the Korea Society of Computer and Information, v.22 no.2, pp.81-87, 2017
  • Wine is increasingly enjoyed by a wider range of consumers, and certification and quality assessment are key elements in helping the wine industry develop new technologies for both winemaking and selling. There have been many attempts to build a more methodical approach to wine assessment, but most of them rely on objective decisions rather than subjective judgement. In this paper, we propose a data mining approach that predicts human wine taste preferences from analytical tests that are readily available at the certification step, using sequential forward selection and a decision tree. Experiments with the wine quality dataset from the UC Irvine Machine Learning Repository achieved accuracies of 76.7% and 78.7% for red and white wines, respectively.
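
Sequential forward selection is a greedy wrapper. It starts from an empty feature set and repeatedly adds the feature whose inclusion most improves a model score, stopping when no addition helps. The Python sketch below takes the scoring function as a parameter. In the paper's setting that score would presumably be the accuracy of a decision tree trained on the candidate subset; here a hypothetical toy score stands in so the example stays self-contained. The feature names are columns of the UCI wine quality data, but the gain values are invented for illustration.

```python
from typing import Callable, List, Sequence


def sequential_forward_selection(features: Sequence[str],
                                 score: Callable[[List[str]], float],
                                 max_features: int) -> List[str]:
    """Greedy SFS: repeatedly add the feature that most improves score(subset)."""
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    while remaining and len(selected) < max_features:
        # Score every remaining feature when added to the current subset.
        cand_score, cand = max((score(selected + [f]), f) for f in remaining)
        if cand_score <= best_score:
            break  # no candidate improves the score: stop early
        selected.append(cand)
        remaining.remove(cand)
        best_score = cand_score
    return selected


# Hypothetical stand-in for a decision tree's validation accuracy:
# invented per-feature gains minus a small penalty per selected feature.
TOY_GAIN = {"alcohol": 0.3, "volatile acidity": 0.2, "sulphates": 0.1, "pH": 0.0}


def toy_score(subset: List[str]) -> float:
    return sum(TOY_GAIN[f] for f in subset) - 0.05 * len(subset)
```

With the toy score, `sequential_forward_selection(list(TOY_GAIN), toy_score, 4)` returns `["alcohol", "volatile acidity", "sulphates"]`. Adding `pH` would lower the score, so the search stops early. Swapping `toy_score` for a cross-validated classifier score turns this into the usual wrapper method, at the cost of one model fit per candidate per step.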

Analysis of Water-Quality Constituents Variations before and after Weir Construction in South Han River using Probability Distribution (확률분포를 이용한 남한강 보 건설 전·후 수질변화 분석)

  • Kim, Kyung Sub
    • Journal of Korean Society on Water Environment, v.35 no.1, pp.55-63, 2019
  • The Four Major Rivers Restoration Project, started in 2009 and completed in early 2013, was a large-scale inter-ministry SOC project with an investment of ₩22.2 trillion. One of its objectives was to improve the water-quality grade by restoring the river ecosystem and environment. The average concentration and probability distribution of water-quality constituents at selected sampling sites are important for analyzing and managing the water quality of rivers and reservoirs effectively. The average concentration can be estimated with a point estimator or from the distribution function of the water-quality constituents. Alternatively, when the dataset is insufficient, the bootstrap method can be applied; it estimates the distribution function from a larger set of resampled data. The Ipo and Gangcheon water-quality monitoring stations on the South Han River were selected to compare and analyze concentration changes before and after construction of the Ipo and Gangcheon Weirs. The comparison used four full years of data from each period (2005 to 2008 and 2014 to 2017). The selected water-quality constituents were BOD and COD, which relate to oxygen-demanding wastes, and TP and chlorophyll-a, which relate to nutrient enrichment (eutrophication). This paper can provide guidelines for water-quality control and management after weir construction, including the evaluation of changes in water-quality constituents.
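
The abstract mentions the bootstrap as an option when data are scarce. Below is a minimal Python sketch of a percentile bootstrap for a mean concentration. It assumes the observations are independent, which ignores the seasonal autocorrelation typical of monitoring data. The resample count, confidence level, and sample values are assumptions for illustration, not the paper's settings.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_mean(sample: List[float], n_resamples: int = 2000,
                   ci: float = 0.95, seed: int = 0) -> Tuple[float, Tuple[float, float]]:
    """Bootstrap estimate of a mean concentration with a percentile confidence interval."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    n = len(sample)
    # Resample the observations with replacement and record each resample's mean.
    means = sorted(statistics.fmean(rng.choices(sample, k=n))
                   for _ in range(n_resamples))
    # Percentile interval: cut off (1 - ci) / 2 of the sorted means on each side.
    lo_idx = int((1 - ci) / 2 * n_resamples)
    hi_idx = int((1 + ci) / 2 * n_resamples) - 1
    return statistics.fmean(means), (means[lo_idx], means[hi_idx])
```

For a hypothetical set of BOD readings, `bootstrap_mean([1.2, 1.5, 0.9, 2.1, 1.7, 1.3])` returns an estimate close to the sample mean of 1.45 mg/L together with a 95% interval. Comparing such intervals for the pre-weir and post-weir periods is one way to judge whether a change in average concentration is larger than sampling noise.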

A Study on the Data Cleaning and Standardization of National Ecosystem Survey in Korea (전국자연환경조사 데이터 정제와 표준화 방안 연구)

  • Kwon, Yong-Su;Song, Kyohong;Kim, Mokyoung;Kim, Kidong
    • Korean Journal of Ecology and Environment, v.53 no.4, pp.380-389, 2020
  • Diagnosing and predicting ecosystem responses to environmental changes such as artificial disturbance and climate change is emerging as the most important issue in biodiversity and ecosystem research. This study aims to clean and standardize the results of the National Ecosystem Survey, which are fundamental to diagnosing and predicting ecosystem changes, and to provide them as a dataset. To refine and clean the dataset, we developed a simple verification program based on the fifth National Ecosystem Survey Guideline. We applied it to the data from the second (1997~2005), third (2006~2013), and fourth (2014~2018) National Ecosystem Surveys. The data quality control processes included (1) standardization of terminology, (2) integration of similar data tables, (3) elimination of unnecessary attributes and errors, (4) unification of different input items, (5) arrangement of data in codes, and (6) code mapping for input items. These approaches and methods are the first attempt to propose an option for ecological data standardization in Korea. The standardized National Ecosystem Survey dataset will be easily accessible and reusable by both researchers and the public. In addition, we expect it to contribute to diverse environmental policies concerning environmental assessment, habitat conservation, and prediction of endangered species distributions and of ecological risks due to climate change. The dataset produced in this study is freely available online via EcoBank (nie-ecobank.kr), the first ecological information portal system in Korea, developed by the National Institute of Ecology.
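
Several of the listed quality-control steps can be pictured as a small record-cleaning pass: terminology standardization, removal of unnecessary attributes and errors, unification of input items, and code mapping. The Python sketch below is hypothetical throughout. The field names, synonym table, and codes are invented for illustration, and the sketch is not the verification program described in the study.

```python
from typing import Dict, List, Tuple

# Hypothetical vocabularies for illustration only; the survey's actual
# terminology and code tables are not reproduced here.
SYNONYMS = {"mt.": "mountain", "mt": "mountain", "stream": "river"}
CODE_TABLE = {"mountain": "T01", "river": "W01", "wetland": "W02"}
KEEP_FIELDS = ("site_id", "habitat")

Record = Dict[str, str]


def clean_records(records: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Standardize terms, drop unused attributes, and map input items to codes."""
    cleaned: List[Record] = []
    rejected: List[Record] = []
    for rec in records:
        # Keep only the attributes needed downstream (unnecessary-attribute removal).
        row = {k: rec[k].strip() for k in KEEP_FIELDS if k in rec}
        # Unify spelling variants of the same input item (terminology standardization).
        term = row.get("habitat", "").lower()
        term = SYNONYMS.get(term, term)
        code = CODE_TABLE.get(term)
        if code is None:
            rejected.append(rec)  # unmappable value: set aside as a likely input error
            continue
        row["habitat"] = term
        row["habitat_code"] = code  # code mapping for the input item
        cleaned.append(row)
    return cleaned, rejected
```

Keeping rejected records separate, rather than silently dropping them, leaves an audit trail for manual review of values the lookup tables do not cover.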