• Title/Summary/Keyword: Standard Dataset

Automatic Detection of Dead Trees Based on Lightweight YOLOv4 and UAV Imagery

  • Yuanhang Jin;Maolin Xu;Jiayuan Zheng
    • Journal of Information Processing Systems / Vol. 19, No. 5 / pp. 614-630 / 2023
  • Dead trees significantly impact forest production and the ecological environment and constrain the sustainable development of forests. A lightweight YOLOv4 dead tree detection algorithm based on unmanned aerial vehicle images is proposed to address the limitations of current dead tree detection, which relies mainly on manual inspections that are inefficient, unsafe, and prone to missed detections. An improved logarithmic transformation method was developed in data pre-processing to reveal tree features hidden in shadows. For the model structure, the original CSPDarkNet-53 backbone feature extraction network was replaced by MobileNetV3, and some of the standard convolutional blocks in the original extraction network were replaced by depthwise separable convolution blocks. The ReLU6 activation function replaced the original LeakyReLU activation function to make the network more robust for low-precision computation. The K-means++ clustering method was also integrated to generate anchor boxes better suited to the dataset. The experimental results show that the improved algorithm achieved an accuracy of 97.33%, higher than other methods, and a detection speed higher than that of YOLOv4, improving both the efficiency and accuracy of the detection process.
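
The two structural changes described above, depthwise separable convolution blocks and the ReLU6 activation, can be illustrated with a short PyTorch sketch. This is a generic MobileNet-style block with assumed channel sizes, not the authors' implementation:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise separable convolution block with ReLU6, in the style of
    MobileNet backbones (illustrative sketch, not the paper's code)."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # depthwise: one 3x3 filter per input channel (groups=in_ch)
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        # pointwise: 1x1 convolution mixes channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        x = self.act(self.bn2(self.pointwise(x)))
        return x

# Example: stand-in for a standard 3x3 conv with 256 -> 512 channels
block = DepthwiseSeparableConv(256, 512, stride=2)
print(block(torch.randn(1, 256, 52, 52)).shape)  # torch.Size([1, 512, 26, 26])
```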

A Survey on Privacy Vulnerabilities through Logit Inversion in Distillation-based Federated Learning

  • 윤수빈;조윤기;백윤흥
    • 한국정보처리학회:학술대회논문집 / 한국정보처리학회 2024년도 춘계학술발표대회 / pp. 711-714 / 2024
  • In the dynamic landscape of modern machine learning, Federated Learning (FL) has emerged as a compelling paradigm designed to enhance privacy by enabling participants to collaboratively train models without sharing their private data. Specifically, Distillation-based Federated Learning, such as Federated Learning with Model Distillation (FedMD), Federated Gradient Encryption and Model Sharing (FedGEMS), and Differentially Secure Federated Learning (DS-FL), has arisen as a novel approach aimed at addressing non-IID data challenges by leveraging knowledge distillation within the FL framework. These methods refine the standard FL framework by distilling insights from public dataset predictions, securing data transmissions through gradient encryption, and applying differential privacy to mask individual contributions. Despite these innovations, our survey identifies persistent vulnerabilities, particularly the susceptibility to logit inversion attacks, in which malicious actors can reconstruct private data from shared public predictions. This exploration reveals that even advanced Distillation-based Federated Learning systems harbor significant privacy risks, challenging prevailing assumptions about their security and underscoring the need for continued advances in secure Federated Learning methodologies.
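
A minimal sketch of the logit-inversion idea the survey discusses: an attacker who observes the logits a participant shared for a public sample, and who has white-box access to a model with similar behavior, optimizes a synthetic input until its logits match. The model interface, input shape, and hyperparameters below are assumptions for illustration, not the attack from any specific paper:

```python
import torch
import torch.nn.functional as F

def logit_inversion(model, target_logits, input_shape=(1, 1, 28, 28),
                    steps=500, lr=0.1, tv_weight=1e-4):
    """Recover an input whose logits under `model` approximate the shared
    `target_logits` (illustrative white-box sketch)."""
    x = torch.randn(input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        match = F.mse_loss(model(x), target_logits)
        # total-variation prior keeps the reconstruction spatially smooth
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
             (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
        (match + tv_weight * tv).backward()
        opt.step()
    return x.detach()
```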

Online analysis of iron ore slurry using PGNAA technology with artificial neural network

  • Haolong Huang;Pingkun Cai;Xuwen Liang;Wenbao Jia
    • Nuclear Engineering and Technology / Vol. 56, No. 7 / pp. 2835-2841 / 2024
  • Real-time analysis of metallic mineral grade and slurry concentration is significant for improving flotation efficiency and product quality. This study proposes an online detection method for ore slurry that combines Prompt Gamma Neutron Activation Analysis (PGNAA) technology with an artificial neural network (ANN) and can provide mineral information rapidly and accurately. First, a PGNAA analyzer based on a D-T neutron generator and a BGO detector was used to obtain a gamma-ray spectrum dataset of ore slurry samples, which was used to construct and optimize the ANN model for adaptive analysis. The evaluation metrics calculated by leave-one-out cross-validation indicated that, compared with the weighted library least squares (WLLS) approach, the ANN obtained more precise and stable results, with mean absolute percentage errors of 4.66% and 2.80% for Fe grade and slurry concentration, respectively, and a highest average standard deviation of only 0.0119. Meanwhile, the analytical errors for the samples most affected by matrix effects were reduced to 0.61 and 0.56 times those of the WLLS method, respectively.
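
The workflow of fitting an ANN to spectra and scoring it with leave-one-out cross-validation can be sketched as follows. The spectrum shapes, network size, and random data are placeholders, since the paper's dataset and architecture are not reproduced here:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import LeaveOneOut
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# X: gamma-ray spectra (n_samples x n_channels); y: [Fe grade, slurry concentration]
# Hypothetical shapes and values standing in for the PGNAA measurements.
rng = np.random.default_rng(0)
X = rng.random((30, 1024))
y = rng.random((30, 2)) + 0.5

loo = LeaveOneOut()
preds = np.zeros_like(y)
for train_idx, test_idx in loo.split(X):
    model = make_pipeline(StandardScaler(),
                          MLPRegressor(hidden_layer_sizes=(64, 32),
                                       max_iter=2000, random_state=0))
    model.fit(X[train_idx], y[train_idx])
    preds[test_idx] = model.predict(X[test_idx])

# mean absolute percentage error per target, as reported in the abstract
mape = np.mean(np.abs((preds - y) / y), axis=0) * 100
print(f"MAPE (Fe grade, concentration): {mape}")
```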

A study on the characteristics of applying oversampling algorithms to Fosberg Fire-Weather Index (FFWI) data

  • Sang Yeob Kim;Dongsoo Lee;Jung-Doung Yu;Hyung-Koo Yoon
    • Smart Structures and Systems / Vol. 34, No. 1 / pp. 9-15 / 2024
  • Oversampling algorithms are methods employed in the field of machine learning to address constraints on data quantity. This study aimed to explore how reliability varies as data volume is progressively increased through the use of oversampling algorithms. For this purpose, the synthetic minority oversampling technique (SMOTE) and the borderline synthetic minority oversampling technique (BSMOTE) were chosen. The data inputs, which included air temperature, humidity, and wind speed, are the parameters used in the Fosberg Fire-Weather Index (FFWI). Starting with a base of 52 entries, new datasets were generated by incrementally increasing the data volume by 10% up to a total increase of 100%. The augmented data were then used to predict FFWI with a deep neural network, and the coefficient of determination (R2) was calculated for predictions made with both the original and the augmented datasets. The results suggest that increasing the data volume by more than 50% of the original dataset yields more reliable outcomes. This study introduces a methodology to alleviate the challenge of establishing a standard for data augmentation when employing oversampling algorithms, as well as a means to assess reliability.
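
As a rough illustration of the augmentation step, the sketch below uses SMOTE from imbalanced-learn to grow a small weather table by about 50%. Because SMOTE interpolates within classes, the continuous samples are first binned into groups; the binning scheme and all numbers are assumptions for the example, not the paper's procedure:

```python
import numpy as np
from imblearn.over_sampling import SMOTE

# Hypothetical stand-in for the 52-entry dataset:
# columns = [air temperature, relative humidity, wind speed]
rng = np.random.default_rng(1)
X = rng.random((52, 3)) * [40.0, 100.0, 20.0]

# SMOTE interpolates within classes, so bin the samples (here by humidity
# tercile) purely to give the algorithm groups to interpolate in.
bins = np.digitize(X[:, 1], np.quantile(X[:, 1], [1 / 3, 2 / 3]))

# A dict sampling_strategy asks for a fixed count per class; scaling every
# class count by 1.5 approximates the paper's "+50% of the original data" step.
counts = {c: int(np.sum(bins == c) * 1.5) for c in np.unique(bins)}
X_aug, bins_aug = SMOTE(sampling_strategy=counts, k_neighbors=5,
                        random_state=0).fit_resample(X, bins)
print(X.shape, "->", X_aug.shape)  # about 50% more rows after augmentation
```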

Association of Total Sugar Intakes and Metabolic Syndrome from Korean National Health and Nutrition Examination Survey 2001-2002

  • 정진은
    • Journal of Nutrition and Health / Vol. 40, Supplement / pp. 29-38 / 2007
  • The purpose of this study was to examine the association between the percentage of energy from total sugar and the prevalence of obesity, hypertension, dyslipidemia, insulin resistance, and metabolic syndrome in the context of current population dietary practice in Korea. The Korean National Health and Nutrition Examination Survey (KNHANES) 2001 and 2002 datasets were used as the data source. Usual nutrient intakes for people over 20 years old were calculated from the two non-consecutive dietary intake records in the 2001 and 2002 datasets. SAS and SUDAAN were used for the statistical analyses: sample-weighted means, standard errors, and population percentages were calculated, and multiple logistic regression models adjusted for covariates were used to estimate odds ratios (ORs) and 95% confidence intervals. Subjects were categorized in three ways, and LS means and ORs for health outcomes were compared. First, subjects, excluding pregnant women, were categorized by percent of energy from usual total sugar intake as ≤10%, 11-15%, 16-20%, 21-25%, and >25%; the risk of elevated LDL cholesterol tended to increase in the '>25%' group compared with the '≤10%' group, while the other health outcomes showed no significant differences. Second, subjects were categorized considering both the Acceptable Macronutrient Distribution Range (AMDR) for carbohydrate and percent of energy from total sugar as 'CHO <55% & total sugar ≤10%', 'CHO 55-70% & total sugar 11-25%', and 'CHO ≥70% & total sugar ≥25%'; the risk of obesity tended to increase in the 'CHO ≥70% & total sugar ≥25%' group compared with the 'CHO <55% & total sugar ≤10%' group. Third, subjects were categorized as 'CHO <55% & total sugar ≤10%', 'CHO 55-70% & total sugar 11-20%', and 'CHO ≥70% & total sugar ≥20%'; the risk of obesity again tended to increase in the 'CHO ≥70% & total sugar ≥20%' group compared with the 'CHO <55% & total sugar ≤10%' group. In conclusion, the risk of elevated LDL cholesterol tended to increase when total sugar intake exceeded 25% of energy, and the risk of obesity tended to increase in the high-carbohydrate groups with total sugar intakes above 20-25% of energy. The risks of hypertension, hyperlipidemia, insulin resistance, and metabolic syndrome were not associated with total sugar intake. Further research on the associations between intakes of total sugar, added sugar, glucose, fructose, and sweeteners and disease prevalence among Koreans should be conducted.
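
The odds-ratio analysis described above can be mimicked with a small logistic-regression example. The data frame, variable names, and covariates below are hypothetical, and the sketch ignores the survey weighting (SUDAAN) that the study actually applied:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Toy data standing in for the survey: an obesity indicator, a binary flag for
# the 'CHO >=70% & total sugar >=25%' exposure group, and example covariates.
rng = np.random.default_rng(42)
n = 500
df = pd.DataFrame({
    "obese": rng.integers(0, 2, n),
    "high_cho_sugar": rng.integers(0, 2, n),
    "age": rng.integers(20, 80, n),
    "male": rng.integers(0, 2, n),
})

X = sm.add_constant(df[["high_cho_sugar", "age", "male"]])
fit = sm.Logit(df["obese"], X).fit(disp=0)

ors = np.exp(fit.params)      # odds ratios from the fitted coefficients
ci = np.exp(fit.conf_int())   # 95% confidence intervals
print(pd.concat([ors.rename("OR"),
                 ci.rename(columns={0: "2.5%", 1: "97.5%"})], axis=1))
```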

Study II of Diagnosis Criteria for Qi deficiency in Stroke

  • 강병갑;허태영;윤경진;박태용;이주아;유수성;박건희;이명수
    • 동의생리병리학회지 / Vol. 28, No. 1 / pp. 76-81 / 2014
  • The aim of this study was to build diagnostic criteria for Qi deficiency using the distribution of the sum of 11 Qi-deficiency items in stroke patients. Between September 2006 and December 2010, 2,994 patients from 11 Korean medicine hospitals were asked to complete the Korean Standard Pattern Identification for Stroke (K-SPI-Stroke) questionnaire as part of the project 'Fundamental study for the standardization and objectification of pattern identification in traditional Korean medicine for stroke (SOPI-Stroke)'. Each patient was independently diagnosed with one of five patterns by two experts (traditional Korean medicine physicians) from the same site. The 2,994 patients were divided into modeling and testing datasets at a 70:30 ratio, stratified by pattern identification. We calculated the sensitivity, specificity, accuracy and odds ratio (OR) using the distribution of the sum of the 11 items (signs and symptoms) for Qi deficiency. With a cutoff of more than four of the 11 Qi-deficiency items, the sensitivity, specificity, accuracy and OR in the modeling dataset were 70.07%, 74.94%, 73.92% and 7.00, respectively; in the testing dataset they were 78.31%, 73.45%, 74.47% and 9.98, respectively. Although these values are not yet high, once the sensitivity, specificity, accuracy and OR are improved beyond their current levels, the criterion could be proposed as an objective diagnostic standard.
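
The cutoff-based evaluation above can be reproduced from a 2x2 table, as in the sketch below; the scores and labels are random stand-ins, not the study's data:

```python
import numpy as np

def threshold_metrics(scores, labels, cutoff):
    """Sensitivity, specificity, accuracy, and odds ratio for a rule that calls
    Qi deficiency when the item-sum score exceeds `cutoff` (illustrative only)."""
    pred = scores > cutoff
    tp = np.sum(pred & (labels == 1))
    tn = np.sum(~pred & (labels == 0))
    fp = np.sum(pred & (labels == 0))
    fn = np.sum(~pred & (labels == 1))
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / len(labels)
    odds_ratio = (tp * tn) / (fp * fn) if fp and fn else float("inf")
    return sens, spec, acc, odds_ratio

# Hypothetical data: item-sum score (0-11) and expert diagnosis (1 = Qi deficiency)
rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 300)
scores = rng.poisson(3 + 3 * labels).clip(0, 11)
print(threshold_metrics(scores, labels, cutoff=4))
```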

Effect of input variable characteristics on the performance of an ensemble machine learning model for algal bloom prediction

  • 강병구;박정수
    • 상하수도학회지 / Vol. 35, No. 6 / pp. 417-424 / 2021
  • Algal bloom is an ongoing issue in the management of freshwater systems for drinking water supply, and the chlorophyll-a concentration is commonly used to represent the status of algal bloom. Thus, the prediction of chlorophyll-a concentration is essential for the proper management of water quality. However, the chlorophyll-a concentration is affected by various water quality and environmental factors, so predicting it is not an easy task. In recent years, many advanced machine learning algorithms have increasingly been used to develop surrogate models to predict the chlorophyll-a concentration in freshwater systems such as rivers or reservoirs. This study used a light gradient boosting machine (LightGBM), a gradient boosting decision tree algorithm, to develop an ensemble machine learning model to predict chlorophyll-a concentration. The field water quality data observed at Daecheong Lake, obtained from the real-time water information system in Korea, were used for the development of the model. The data include temperature, pH, electric conductivity, dissolved oxygen, total organic carbon, total nitrogen, total phosphorus, and chlorophyll-a. First, a LightGBM model was developed to predict the chlorophyll-a concentration by using the other seven items as independent input variables. Second, time-lagged values of all the input variables were added as input variables to understand the effect of the time lag of input variables on model performance. The time lag (i) ranges from 1 to 50 days. The model performance was evaluated using three indices: the root mean squared error-observation standard deviation ratio (RSR), the Nash-Sutcliffe coefficient of efficiency (NSE), and the mean absolute error (MAE). The model showed the best performance when a dataset with a one-day time lag (i=1) was added, with RSR, NSE, and MAE of 0.359, 0.871 and 1.510, respectively. Improvement of model performance was observed when datasets with time lags of up to about 15 days (i=15) were added.
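
The lag-feature construction and the three evaluation metrics can be sketched as below. The column names, synthetic data, and LightGBM settings are illustrative assumptions, not the study's configuration:

```python
import numpy as np
import pandas as pd
import lightgbm as lgb

# Synthetic daily records standing in for the Daecheong Lake water-quality data.
rng = np.random.default_rng(0)
cols = ["temp", "pH", "EC", "DO", "TOC", "TN", "TP", "chl_a"]
df = pd.DataFrame(rng.random((1000, 8)), columns=cols)

# Add i-day lagged copies of the seven input variables (here i = 1).
i = 1
lagged = df[cols[:-1]].shift(i).add_suffix(f"_lag{i}")
data = pd.concat([df, lagged], axis=1).dropna()
X, y = data.drop(columns="chl_a"), data["chl_a"]

split = int(len(data) * 0.8)
model = lgb.LGBMRegressor(n_estimators=500, learning_rate=0.05)
model.fit(X.iloc[:split], y.iloc[:split])
pred = model.predict(X.iloc[split:])

obs = y.iloc[split:].to_numpy()
rmse = np.sqrt(np.mean((obs - pred) ** 2))
rsr = rmse / obs.std()                                                 # RSR
nse = 1 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)  # NSE
mae = np.mean(np.abs(obs - pred))                                      # MAE
print(f"RSR={rsr:.3f}, NSE={nse:.3f}, MAE={mae:.3f}")
```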

3D Mesh Reconstruction Technique from Single Image using Deep Learning and Sphere Shape Transformation Method

  • 김정윤;이승호
    • 전기전자학회논문지 / Vol. 26, No. 2 / pp. 160-168 / 2022
  • This paper proposes a technique for reconstructing a 3D mesh from a single image using deep learning and a sphere shape deformation method. The proposed technique differs from existing methods in the following ways. First, unlike existing methods that build edges or faces by connecting nearby points, a deep learning network adjusts the positions of the vertices of a sphere so that they closely match the 3D point cloud of the object. Because a 3D point cloud is used, less memory is required, and computation is faster because only addition operations between the sphere's vertices and the predicted offset values are performed. Second, the 3D mesh is reconstructed by applying the sphere's face information to the adjusted vertices. Even when the spacing between points of the 3D point cloud generated by adjusting the sphere's vertices is uneven, the method already holds the sphere's face information, i.e., the mesh connectivity between points, so oversimplification or loss of representation can be prevented. To objectively evaluate the reliability of the proposed technique, experiments were conducted in the same way as in the comparison papers using ShapeNet, a publicly available standard dataset. The proposed technique achieved an IoU value of 0.581 and a chamfer distance value of 0.212. Since a higher IoU and a lower chamfer distance indicate better results, the proposed technique demonstrated more efficient 3D mesh reconstruction performance than the techniques presented in other papers.
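
Two quantities central to the abstract, the vertex update by offset addition and the chamfer distance used for evaluation, can be sketched as follows. The point counts and random tensors are placeholders, and this is not the paper's network:

```python
import torch

def chamfer_distance(p1, p2):
    """Symmetric chamfer distance between point clouds of shape (B, N, 3) and
    (B, M, 3). Illustrative metric sketch, not the paper's implementation."""
    d = torch.cdist(p1, p2, p=2) ** 2          # pairwise squared distances (B, N, M)
    return d.min(dim=2).values.mean(dim=1) + d.min(dim=1).values.mean(dim=1)

# Example: deform sphere vertices by predicted offsets, then score against a
# ground-truth point cloud.
sphere = torch.nn.functional.normalize(torch.randn(1, 2562, 3), dim=-1)  # unit-sphere samples
offsets = 0.1 * torch.randn(1, 2562, 3)   # stand-in for the network's predicted offsets
predicted = sphere + offsets              # vertex update is a single addition
gt_points = torch.randn(1, 4096, 3)
print(chamfer_distance(predicted, gt_points))
```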

A Study of the Transition Process in Presidential Electronic Records Transfer and Improvement Measures: Focused on the Electronic Records of the 19th President Moon Jae-in's Administration

  • 윤정훈
    • 기록학연구 / No. 75 / pp. 41-89 / 2023
  • Since the enactment of the Presidential Records Act in 2007, the transfer of the presidential electronic records of the 16th Roh Moo-hyun administration served as a vanguard for public records management and as a test bed for new electronic records management. When the presidential electronic records of the 19th Moon Jae-in administration were transferred, the transfer method of the 16th administration was carried forward, but several innovative attempts were made. For the first time, the Presidential Archives converted the electronic documents of presidential advisory bodies into long-term preservation packages and received them online, and, considering the characteristics of the data, the administrative information datasets of presidential record-producing institutions were transferred in the SIARD format. The Presidential Archives also received websites in OVF form on a pilot basis and collected social media directly through APIs. This study accordingly examined the transition of presidential electronic records transfer methods from the 16th Roh Moo-hyun administration to the 19th Moon Jae-in administration, analyzed the main achievements and problems with a focus on the transfer methods for each type of electronic record of the 19th Moon Jae-in administration, and proposed measures for future improvement.

Prediction of Postoperative Lung Function in Lung Cancer Patients Using Machine Learning Models

  • Oh Beom Kwon;Solji Han;Hwa Young Lee;Hye Seon Kang;Sung Kyoung Kim;Ju Sang Kim;Chan Kwon Park;Sang Haak Lee;Seung Joon Kim;Jin Woo Kim;Chang Dong Yeo
    • Tuberculosis and Respiratory Diseases / Vol. 86, No. 3 / pp. 203-215 / 2023
  • Background: Surgical resection is the standard treatment for early-stage lung cancer. Since postoperative lung function is related to mortality, predicted postoperative lung function is used to determine the treatment modality. The aim of this study was to evaluate the predictive performance of linear regression and machine learning models. Methods: We extracted data from the Clinical Data Warehouse and developed three sets: set I, the linear regression model; set II, machine learning models omitting the missing data; and set III, machine learning models imputing the missing data. Six machine learning models, the least absolute shrinkage and selection operator (LASSO), Ridge regression, ElasticNet, Random Forest, eXtreme gradient boosting (XGBoost), and the light gradient boosting machine (LightGBM), were implemented. The forced expiratory volume in 1 second measured 6 months after surgery was defined as the outcome. Five-fold cross-validation was performed for hyperparameter tuning of the machine learning models. The dataset was split into training and test datasets at a 70:30 ratio, and in set III imputation was performed after dataset splitting. Predictive performance was evaluated by R2 and mean squared error (MSE) in the three sets. Results: A total of 1,487 patients were included in sets I and III, and 896 patients were included in set II. In set I, the R2 value was 0.27; in set II, LightGBM was the best model, with the highest R2 value of 0.5 and the lowest MSE of 154.95; in set III, LightGBM was again the best model, with the highest R2 value of 0.56 and the lowest MSE of 174.07. Conclusion: The LightGBM model showed the best performance in predicting postoperative lung function.
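
A generic sketch of the evaluation protocol, a 70:30 split with five-fold cross-validated hyperparameter tuning and R2/MSE scoring, is shown below with LightGBM as the example learner. The feature matrix, parameter grid, and outcome are synthetic placeholders, not the Clinical Data Warehouse data:

```python
import numpy as np
from lightgbm import LGBMRegressor
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.metrics import r2_score, mean_squared_error

# Hypothetical stand-in for the clinical extract: 10 predictors and 6-month FEV1.
rng = np.random.default_rng(0)
X = rng.random((1487, 10))
y = 2.0 + X @ rng.random(10) + 0.1 * rng.standard_normal(1487)

# 70:30 train/test split, as in the study design.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Five-fold cross-validation on the training set for hyperparameter tuning.
search = GridSearchCV(
    LGBMRegressor(random_state=0),
    param_grid={"n_estimators": [200, 500], "learning_rate": [0.01, 0.05],
                "num_leaves": [15, 31]},
    cv=5, scoring="r2")
search.fit(X_tr, y_tr)

pred = search.best_estimator_.predict(X_te)
print("R2:", r2_score(y_te, pred), "MSE:", mean_squared_error(y_te, pred))
```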