• Title/Summary/Keyword: Cross-Validation


Wildfire Severity Mapping Using Sentinel Satellite Data Based on Machine Learning Approaches (Sentinel 위성영상과 기계학습을 이용한 국내산불 피해강도 탐지)

  • Sim, Seongmun; Kim, Woohyeok; Lee, Jaese; Kang, Yoojin; Im, Jungho; Kwon, Chunguen; Kim, Sungyong
    • Korean Journal of Remote Sensing / v.36 no.5_3 / pp.1109-1123 / 2020
  • In South Korea, where forest is the major land cover class (over 60% of the country), many wildfires occur every year. Wildfires weaken the shear strength of the soil, leaving a soil layer that is vulnerable to landslides. For sustainable forest management, it is important to identify wildfire severity as well as the burned area. Although satellite remote sensing has been widely used to map wildfire severity, it is often difficult to determine severity from the temporal change of satellite-derived indices such as the Normalized Difference Vegetation Index (NDVI) and the Normalized Burn Ratio (NBR) alone. In this study, we proposed a machine learning approach to determining wildfire severity that synergistically uses Sentinel-1A C-band Synthetic Aperture Radar (SAR) data and Sentinel-2A MultiSpectral Instrument (MSI) data. Three wildfire cases (Samcheok in May 2017, Gangreung·Donghae in April 2019, and Gosung·Sokcho in April 2019) were used to develop wildfire severity mapping models with three machine learning algorithms (Random Forest, Logistic Regression, and Support Vector Machine). The random forest model performed best, with an overall accuracy of 82.3%. Cross-site validation, used to examine the spatiotemporal transferability of the models, showed that they were highly sensitive to temporal differences between the training and validation sites, especially in the early growing season. This suggests that a more robust model with high spatiotemporal transferability could be developed as more wildfire cases from different seasons and areas are added in the future.
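As a reference point for the index-based baseline the abstract mentions, the sketch below computes NDVI, NBR, and the pre-/post-fire difference dNBR from reflectance values using their standard normalized-difference definitions, and generates leave-one-site-out splits in the spirit of the cross-site validation. It is not the authors' model: the sample layout `(site, features, label)` and all helper names are hypothetical, and which Sentinel-2 bands feed `nir`, `red`, and `swir2` (e.g. B8, B4, B12) is our assumption.

```python
def normalized_difference(a, b):
    """Generic (a - b) / (a + b) index; returns 0.0 when both inputs are zero."""
    total = a + b
    return 0.0 if total == 0 else (a - b) / total

def ndvi(nir, red):
    return normalized_difference(nir, red)

def nbr(nir, swir2):
    return normalized_difference(nir, swir2)

def dnbr(pre, post):
    """Pre-fire NBR minus post-fire NBR; each argument is a (nir, swir2) pair."""
    return nbr(*pre) - nbr(*post)

def leave_one_site_out(samples):
    """Yield (held_out_site, train, test) for each site.

    `samples` is a list of (site, features, label) tuples (hypothetical layout)."""
    sites = sorted({site for site, _, _ in samples})
    for held_out in sites:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test
```

Holding out a whole fire event (site and date together) rather than random pixels is what exposes the temporal sensitivity reported above; random pixel splits tend to let spatially autocorrelated neighbours leak between training and test sets.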

Simultaneous estimation of fatty acids contents from soybean seeds using Fourier transform infrared spectroscopy and gas chromatography by multivariate analysis (적외선 분광스펙트럼 및 기체크로마토그라피 분석 데이터의 다변량 통계분석을 이용한 대두 종자 지방산 함량예측)

  • Ahn, Myung Suk; Ji, Eun Yee; Song, Seung Yeob; Ahn, Joon Woo; Jeong, Won Joong; Min, Sung Ran; Kim, Suk Weon
    • Journal of Plant Biotechnology / v.42 no.1 / pp.60-70 / 2015
  • The aim of this study was to investigate whether Fourier transform infrared (FT-IR) spectroscopy can be applied to the simultaneous determination of fatty acid contents in different soybean cultivars. A total of 153 soybean (Glycine max Merrill) lines were examined by FT-IR spectroscopy, and their fatty acid contents were confirmed by quantitative gas chromatography (GC) analysis. Quantitative spectral variation among the soybean lines was observed in the amide bond region ($1{,}700 \sim 1{,}500\ \mathrm{cm^{-1}}$), the phosphodiester group region ($1{,}500 \sim 1{,}300\ \mathrm{cm^{-1}}$), and the sugar region ($1{,}200 \sim 1{,}000\ \mathrm{cm^{-1}}$) of the FT-IR spectra. Quantitative prediction models for the contents of five individual fatty acids (palmitic, stearic, oleic, linoleic, and linolenic acid) were established from the FT-IR spectra using partial least squares (PLS) regression. In cross-validation, the contents of the five fatty acids predicted by the PLS models were highly correlated with those measured by GC ($R^2 \geq 0.97$). In external validation, palmitic acid ($R^2 = 0.8002$), oleic acid ($R^2 = 0.8909$), and linoleic acid ($R^2 = 0.815$) were predicted with good accuracy, whereas stearic acid ($R^2 = 0.4598$) and linolenic acid ($R^2 = 0.6868$) were predicted with relatively lower accuracy. These results clearly show that FT-IR spectra combined with multivariate analysis can be used to accurately predict fatty acid contents in soybean lines. We therefore suggest that the PLS prediction system for fatty acid contents based on FT-IR analysis could serve as a rapid, high-throughput screening tool in breeding for modified fatty acid composition in soybean and could help accelerate conventional breeding.
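The cross-validation step can be illustrated with a single-response PLS regression (PLS1) fitted by the NIPALS algorithm and evaluated by leave-one-out cross-validation. This is a minimal pure-Python sketch, not the software or settings used in the study: the number of components, the leave-one-out scheme, and the helper names are assumptions. Here $R^2$ is the coefficient of determination $1 - SS_{res}/SS_{tot}$; some studies instead report the squared Pearson correlation between predicted and measured values.

```python
def _mean(v):
    return sum(v) / len(v)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

class PLS1:
    """Single-response PLS regression fitted with NIPALS (X and y are mean-centred)."""

    def __init__(self, n_components):
        self.n_components = n_components

    def fit(self, X, y):
        n, p = len(X), len(X[0])
        self.x_mean = [_mean([row[j] for row in X]) for j in range(p)]
        self.y_mean = _mean(y)
        Xr = [[row[j] - self.x_mean[j] for j in range(p)] for row in X]
        yr = [v - self.y_mean for v in y]
        self.W, self.P, self.Q = [], [], []
        for _ in range(self.n_components):
            w = [sum(Xr[i][j] * yr[i] for i in range(n)) for j in range(p)]  # X^T y
            norm = _dot(w, w) ** 0.5
            if norm < 1e-12:  # y is fully explained; stop early
                break
            w = [v / norm for v in w]
            t = [_dot(row, w) for row in Xr]  # scores
            tt = _dot(t, t)
            loading = [sum(Xr[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
            q = _dot(yr, t) / tt
            Xr = [[Xr[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]  # deflate X
            yr = [yr[i] - q * t[i] for i in range(n)]  # deflate y
            self.W.append(w)
            self.P.append(loading)
            self.Q.append(q)
        return self

    def predict_one(self, x):
        """Apply the same deflation sequence to a new sample and sum q * t."""
        xr = [x[j] - self.x_mean[j] for j in range(len(x))]
        y_hat = self.y_mean
        for w, loading, q in zip(self.W, self.P, self.Q):
            t = _dot(xr, w)
            y_hat += q * t
            xr = [xr[j] - t * loading[j] for j in range(len(xr))]
        return y_hat

def r_squared(y_true, y_pred):
    mu = _mean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mu) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot

def loo_predictions(X, y, n_components):
    """Leave-one-out CV: refit without sample i, then predict sample i."""
    preds = []
    for i in range(len(X)):
        model = PLS1(n_components).fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(model.predict_one(X[i]))
    return preds
```

In practice `X` would hold the (usually preprocessed) absorbance values of each spectrum and `y` the GC-measured content of one fatty acid, e.g. one PLS1 model per fatty acid; the number of components is normally chosen by minimizing the cross-validated error.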

Estimation of river discharge using satellite-derived flow signals and artificial neural network model: application to Imjin River (Satellite-derived flow 시그널 및 인공신경망 모형을 활용한 임진강 유역 유출량 산정)

  • Li, Li; Kim, Hyunglok; Jun, Kyungsoo; Choi, Minha
    • Journal of Korea Water Resources Association / v.49 no.7 / pp.589-597 / 2016
  • In this study, we investigated the use of satellite-derived flow (SDF) signals and a data-based model to estimate outflow for river reaches where in situ measurements for hydraulic and hydrological analysis are unavailable or difficult to obtain, such as the upper basin of the Imjin River. Many studies have demonstrated that SDF signals can serve as estimates of river width and that the correlation between SDF signals and river width depends on the shape of the cross section. To capture the nonlinear relationship between SDF signals and river outflow, an Artificial Neural Network (ANN) model with SDF signals as inputs was applied to compute flow discharge at Imjin Bridge on the Imjin River. SDF signals were extracted from 15 pixels, and the Partial Mutual Information (PMI) algorithm was applied to identify the most relevant input variables among 150 candidate SDF signals (including observations lagged by 0 to 10 days). The discharges estimated by the ANN model were compared with those measured at the Imjin Bridge gauging station; the correlation coefficients for training and validation were 0.86 and 0.72, respectively. When the previous day's discharge at Imjin Bridge was added as an input variable, the correlation coefficients improved to 0.90 and 0.83, respectively. These results suggest that SDF signals, together with some locally measured data, can play a useful role in river flow estimation, especially flood forecasting in data-scarce regions, since the model simulated the peak discharge and peak time of flood events with satisfactory accuracy.
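The input construction described above can be sketched as follows: each pixel's daily SDF series is expanded into lagged copies, and candidate inputs can be screened against discharge with Pearson's correlation coefficient, the statistic the abstract reports. This is a hypothetical illustration: the study selected inputs with Partial Mutual Information, which is not reproduced here, and the data layout, function names, and `max_lag` value are assumptions (the abstract does not state exactly how the 150 candidates map to pixels and lags).

```python
import math

def lagged_inputs(signals, max_lag):
    """Expand per-pixel daily series into lagged candidate inputs.

    signals: dict pixel_id -> list of daily SDF values (equal lengths, hypothetical layout).
    Returns (names, rows); each row holds lags 0..max_lag of every pixel for one target day.
    Days without a full lag history are skipped."""
    pixels = sorted(signals)
    names = [f"{pid}_lag{lag}" for pid in pixels for lag in range(max_lag + 1)]
    n_days = len(signals[pixels[0]])
    rows = []
    for day in range(max_lag, n_days):
        rows.append([signals[pid][day - lag] for pid in pixels for lag in range(max_lag + 1)])
    return names, rows

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

The previous day's discharge, which improved the correlation coefficients in the study, could be appended to each row as one more lagged column.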

Development and Validation of Real-time PCR to Determine Branchiostegus japonicus and B. albus Species Based on Mitochondrial DNA (Real-time PCR 분석법을 이용한 옥돔과 옥두어의 종 판별법 개발)

  • Chung, In Young; Seo, Yong Bae; Yang, Ji-Young; Kim, Gun-Do
    • Journal of Life Science / v.27 no.11 / pp.1331-1339 / 2017
  • DNA barcoding identifies a species from the DNA sequence of a fragment of the cytochrome c oxidase subunit I (COI) gene in the mitochondrial genome. It is widely applied to support the sustainable development of fishery-product resources and the protection of fish biodiversity. This study aimed to discriminate between horse-head fish (Branchiostegus japonicus) and fake horse-head fish (Branchiostegus albus), both of which are commonly consumed in Korea. To discriminate between the two species, a real-time PCR method was developed based on their mitochondrial DNA genomes. Bioinformatic analysis of the mitochondrial genomic DNA sequences of the two species revealed inter-species variation. Some highly conserved regions and a few variable regions were identified in the mitochondrial COI of the two species. To test whether these sequence variations were diagnostic, primers targeting the variable regions of COI were designed and used to amplify DNA in a real-time PCR system. The threshold-cycle (Ct) ranges obtained by real-time PCR matched the expected species of origin. Efficiency, specificity, and cross-reactivity assays showed statistically significant differences between the average Ct of B. japonicus DNA ($21.85 \pm 3.599$) and that of B. albus DNA ($33.49 \pm 1.183$) for confirming B. japonicus, and between the average Ct of B. albus DNA ($22.49 \pm 0.908$) and that of B. japonicus DNA ($33.93 \pm 0.479$) for confirming B. albus. The method was validated using ten commercial samples. This genomic DNA-based real-time PCR technique proved to be a reliable method for the taxonomic classification of animal tissues.
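Amplification efficiency, one of the assays listed above, is conventionally derived from a standard curve of Ct against the log10 of template amount: $E = 10^{-1/\text{slope}} - 1$, so a slope near $-3.32$ corresponds to roughly 100% efficiency (the product doubles each cycle). The sketch below fits that curve by least squares; the abstract does not give the dilution series or fitting procedure, so the inputs and function names are hypothetical.

```python
def standard_curve(log10_amounts, cts):
    """Least-squares fit of Ct = slope * log10(amount) + intercept; returns (slope, intercept)."""
    n = len(cts)
    mx = sum(log10_amounts) / n
    my = sum(cts) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(log10_amounts, cts))
    sxx = sum((x - mx) ** 2 for x in log10_amounts)
    slope = sxy / sxx
    return slope, my - slope * mx

def amplification_efficiency(slope):
    """Efficiency as a fraction: 1.0 means 100% (perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0
```

For comparison, the mean Ct values reported for the B. japonicus assay (21.85 for target DNA versus 33.49 for non-target DNA) differ by about 11.6 cycles.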

Predicting the Performance of Recommender Systems through Social Network Analysis and Artificial Neural Network (사회연결망분석과 인공신경망을 이용한 추천시스템 성능 예측)

  • Cho, Yoon-Ho; Kim, In-Hwan
    • Journal of Intelligence and Information Systems / v.16 no.4 / pp.159-172 / 2010
  • The recommender system is one possible solution for helping customers find the items they would like to purchase. To date, a variety of recommendation techniques have been developed. One of the most successful is Collaborative Filtering (CF), which has been used in many applications, such as recommending Web pages, movies, music, articles, and products. CF identifies customers whose tastes are similar to those of a given customer and recommends items those customers have liked in the past. Numerous CF algorithms have been developed to improve the performance of recommender systems. Broadly, they fall into memory-based CF algorithms, model-based CF algorithms, and hybrid CF algorithms, which combine CF with content-based techniques or other recommender systems. While many researchers have focused on improving CF performance, theoretical justification for CF algorithms is lacking; that is, much remains unknown about how CF works. Furthermore, the relative performance of CF algorithms is known to depend on the domain and the data. Implementing and launching a CF recommender system is time-consuming and expensive, and a system unsuited to the given domain provides poor-quality recommendations that easily annoy customers. Therefore, predicting the performance of CF algorithms in advance is of practical importance. In this study, we propose an efficient approach to predicting CF performance. Social Network Analysis (SNA) and an Artificial Neural Network (ANN) are applied to develop our prediction model. CF can be modeled as a social network in which customers are nodes and purchase relationships between customers are links. SNA facilitates exploration of the topological properties of the network structure that are implicit in the data used for CF recommendations.
An ANN model is developed from an analysis of network topology measures, namely network density, inclusiveness, clustering coefficient, network centralization, and Krackhardt's efficiency. Network density, expressed as a proportion of the maximum possible number of links, captures the density of the whole network, whereas the clustering coefficient captures the degree to which the network contains localized pockets of dense connectivity. Inclusiveness refers to the number of nodes included within the various connected parts of the social network. Centralization reflects the extent to which connections are concentrated in a small number of nodes rather than distributed equally among all nodes. Krackhardt's efficiency characterizes how many links the network has beyond the minimum needed to keep the group at least indirectly connected. We use these social network measures as the input variables of the ANN model and the recommendation accuracy, measured by the F1-measure, as its output variable. To evaluate the effectiveness of the ANN model, we used sales transaction data from H department store, a well-known department store in Korea. A total of 396 experimental samples were collected, of which 40%, 40%, and 20% were used for training, testing, and validation, respectively. Five-fold cross-validation was also conducted to enhance the reliability of the experiments. The input variables are measured in three steps: analysis of customer similarities, construction of a social network, and analysis of social network patterns. We used NetMiner 3 and UCINET 6.0 for SNA and Clementine 11.1 for ANN modeling. In the experiments, the ANN model achieved an estimated accuracy of 92.61% and an RMSE of 0.0049. These results indicate that the proposed prediction model can help decide whether CF is useful for a given application with certain data characteristics.
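The network-construction and topology-measure steps described above can be sketched in pure Python. The abstract does not specify how two customers are linked, so linking customers whose cosine similarity over purchase vectors reaches a threshold is a hypothetical choice, as are all function names. Inclusiveness is computed here as the share of non-isolated nodes (the paper may report a count), the clustering coefficient as the mean local coefficient, centralization as Freeman degree centralization, and Krackhardt's efficiency with one undirected formulation that assumes a connected graph; these conventions are assumptions, not the settings used in NetMiner or UCINET.

```python
import math
from itertools import combinations

def cosine(u, v):
    """Cosine similarity between sparse purchase vectors {item: quantity}."""
    dot = sum(q * v.get(item, 0) for item, q in u.items())
    nu = math.sqrt(sum(q * q for q in u.values()))
    nv = math.sqrt(sum(q * q for q in v.values()))
    return 0.0 if nu == 0 or nv == 0 else dot / (nu * nv)

def build_network(purchases, threshold):
    """Undirected customer graph {customer: set(neighbours)}; the linking rule is hypothetical."""
    g = {c: set() for c in purchases}
    for a, b in combinations(purchases, 2):
        if cosine(purchases[a], purchases[b]) >= threshold:
            g[a].add(b)
            g[b].add(a)
    return g

def n_edges(g):
    return sum(len(nb) for nb in g.values()) // 2

def density(g):
    n = len(g)
    return 2 * n_edges(g) / (n * (n - 1))

def inclusiveness(g):
    """Share of nodes that have at least one link."""
    return sum(1 for nb in g.values() if nb) / len(g)

def avg_clustering(g):
    """Mean local clustering coefficient; nodes with degree < 2 contribute 0."""
    total = 0.0
    for nb in g.values():
        k = len(nb)
        if k >= 2:
            links = sum(1 for a, b in combinations(nb, 2) if b in g[a])
            total += links / (k * (k - 1) / 2)
    return total / len(g)

def degree_centralization(g):
    """Freeman degree centralization, normalised by the star-graph maximum (n-1)(n-2)."""
    n = len(g)
    degs = [len(nb) for nb in g.values()]
    return sum(max(degs) - d for d in degs) / ((n - 1) * (n - 2))

def krackhardt_efficiency(g):
    """1 - V / max(V) for a connected undirected graph; V = links beyond a spanning tree."""
    n = len(g)
    max_v = n * (n - 1) / 2 - (n - 1)
    return 1.0 if max_v == 0 else 1 - (n_edges(g) - (n - 1)) / max_v

def f1(precision, recall):
    return 0.0 if precision + recall == 0 else 2 * precision * recall / (precision + recall)
```

A star-shaped network, for example, has maximal centralization and efficiency (no redundant links), while a complete network has density 1 and efficiency 0.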

Validation of the Korean version of Center for Epidemiologic Studies Depression Scale-Revised (K-CESD-R) (한국판 역학연구 우울척도 개정판(K-CESD-R)의 표준화 연구)

  • Lee, San; Oh, Seung-Taek; Ryu, So Yeon; Jun, Jin Yong; Lee, Kounseok; Lee, Eun; Park, Jin Young; Yi, Sang-Wook; Choi, Won-Jung
    • Korean Journal of Psychosomatic Medicine / v.24 no.1 / pp.83-93 / 2016
  • Objectives : The Center for Epidemiologic Studies Depression Scale-Revised (CESD-R) is a recently revised scale that has been reported to be a valid tool for assessing depressive symptoms. It covers the cardinal symptoms of depression described in the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV). In this study, we assessed the reliability, validity, and psychometric properties of the Korean version of the CESD-R (K-CESD-R). Methods : Forty-eight patients diagnosed with major depressive disorder, dysthymia, or depressive disorder NOS according to DSM-IV criteria using the Mini International Neuropsychiatric Interview, and 48 healthy controls, were enrolled in this study. Participants were assessed with the K-CESD-R, K-MADRS, PHQ-9, KQIDS-SR, and STAI for cross-validation. Statistical analyses included Cronbach's alpha, Pearson correlation coefficients, principal component analysis, ROC curve analysis, and determination of the optimal cut-off value. Results : The Cronbach's alpha of the K-CESD-R was 0.98. The K-CESD-R total score was significantly and highly correlated with the total scores of the K-MADRS, PHQ-9, and KQIDS-SR (r = 0.910, 0.966, and 0.920, respectively; p < 0.001). Factor analysis showed that two factors accounted for 76.29% of the total variance. Based on ROC curve analysis weighting sensitivity and specificity equally, we suggest 13 as the optimal cut-off value for the K-CESD-R. Conclusions : These results show that the K-CESD-R could be a reliable and valid scale for assessing depressive symptoms. The K-CESD-R is expected to be a useful and effective tool for screening and measuring depressive symptoms not only in outpatient clinics but also in epidemiologic studies.
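Two of the statistics above can be sketched directly: Cronbach's alpha, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$ for $k$ items with item variances $\sigma_i^2$ and total-score variance $\sigma_T^2$, and a cut-off search over the ROC curve. The abstract says the cut-off weights sensitivity and specificity equally; maximizing Youden's index (sensitivity + specificity − 1) is one such criterion and is our assumption, as are the function names and the rule that a score at or above the cut-off screens positive.

```python
from statistics import variance

def cronbach_alpha(items):
    """items: one list of scores per item, respondents in the same order in every list."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))

def youden_cutoff(scores, is_case):
    """Return (cutoff, sensitivity, specificity) maximising sensitivity + specificity.

    A respondent screens positive when score >= cutoff; ties keep the lowest cutoff."""
    n_pos = sum(is_case)
    n_neg = len(is_case) - n_pos
    best = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, d in zip(scores, is_case) if d and s >= c)
        tn = sum(1 for s, d in zip(scores, is_case) if not d and s < c)
        sens, spec = tp / n_pos, tn / n_neg
        if best is None or sens + spec > best[1] + best[2]:
            best = (c, sens, spec)
    return best
```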

Waterbody Detection for the Reservoirs in South Korea Using Swin Transformer and Sentinel-1 Images (Swin Transformer와 Sentinel-1 영상을 이용한 우리나라 저수지의 수체 탐지)

  • Soyeon Choi; Youjeong Youn; Jonggu Kang; Seoyeon Kim; Yemin Jeong; Yungyo Im; Youngmin Seo; Wanyub Kim; Minha Choi; Yangwon Lee
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.949-965 / 2023
  • In this study, we propose a method for monitoring the surface area of agricultural reservoirs in South Korea using Sentinel-1 synthetic aperture radar images and a deep learning model, Swin Transformer. Using the Google Earth Engine platform, datasets from 2017 to 2021 were constructed for seven agricultural reservoirs in three capacity classes: 700 K-ton, 900 K-ton, and 1.5 M-ton. A total of 1,283 images of four of the reservoirs were used for model training with shuffling and 5-fold cross-validation. In the evaluation, the Swin Transformer Large model with a window size of 12 showed the best semantic segmentation performance, with an average accuracy of 99.54% and a mean intersection over union (mIoU) of 95.15% across all folds. When the best-performing model was applied to the datasets of the remaining three reservoirs for validation, it achieved an accuracy of over 99% and an mIoU of over 94% for every reservoir. These results indicate that the Swin Transformer model can effectively monitor the surface area of agricultural reservoirs in South Korea.
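The evaluation metrics and fold construction can be sketched for a binary water/non-water mask. This assumes that "accuracy" is overall pixel accuracy and that mIoU averages the per-class intersection over union, $\mathrm{IoU}_c = TP_c / (TP_c + FP_c + FN_c)$, over the water and non-water classes; the abstract defines neither metric nor says whether scores were pooled over pixels or averaged over images, and the helper names are hypothetical.

```python
import random

def confusion_counts(pred, truth, n_classes=2):
    """cm[t][p] = number of pixels with true class t predicted as class p (flat label lists)."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(truth, pred):
        cm[t][p] += 1
    return cm

def pixel_accuracy(cm):
    total = sum(map(sum, cm))
    return sum(cm[i][i] for i in range(len(cm))) / total

def mean_iou(cm):
    """Average of TP / (TP + FP + FN) over classes; classes absent everywhere are skipped."""
    ious = []
    for c in range(len(cm)):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(row[c] for row in cm) - tp
        if tp + fp + fn:
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)

def shuffled_kfold(n_items, k=5, seed=0):
    """Shuffle item indices once, deal them into k folds, and yield (train, test) index lists."""
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]
```

Because shuffling at the image level presumably mixes reservoirs and acquisition dates across folds, the separate test on the three held-out reservoirs is the stricter check of transferability.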

Retrieval of Hourly Aerosol Optical Depth Using Top-of-Atmosphere Reflectance from GOCI-II and Machine Learning over South Korea (GOCI-II 대기상한 반사도와 기계학습을 이용한 남한 지역 시간별 에어로졸 광학 두께 산출)

  • Seyoung Yang; Hyunyoung Choi; Jungho Im
    • Korean Journal of Remote Sensing / v.39 no.5_3 / pp.933-948 / 2023
  • Atmospheric aerosols not only harm human health but also have direct and indirect impacts on the climate system. It is therefore important to understand the characteristics and spatiotemporal distribution of aerosols. Many studies have monitored aerosols, mostly by retrieving aerosol optical depth (AOD) from satellite observations. However, this approach relies primarily on look-up-table-based inversion algorithms, which are computationally intensive and subject to associated uncertainties. In this study, a novel high-resolution, machine-learning-based algorithm for direct AOD retrieval was developed using top-of-atmosphere reflectance from the Geostationary Ocean Color Imager-II (GOCI-II), the differences between that reflectance and the minimum reflectance of the past 30 days, and meteorological variables from numerical models. The Light Gradient Boosting Machine (LGBM) technique was used, and the resulting estimates were validated with random, temporal, and spatial N-fold cross-validation (CV) against ground-based AOD observations from the Aerosol Robotic Network (AERONET). All three CV schemes showed consistently robust performance, with $R^2$ = 0.70–0.80, RMSE = 0.08–0.09, and 75.2–85.1% of estimates within the expected error (EE). Shapley Additive exPlanations (SHAP) analysis confirmed the substantial influence of reflectance-related variables on AOD estimation. An examination of the spatiotemporal distribution of AOD in Seoul and Ulsan showed that the LGBM estimates agreed closely with AERONET AOD over time, confirming the model's suitability for AOD retrieval at high spatiotemporal resolution (i.e., hourly, 250 m).
Furthermore, a comparison of data coverage showed that the LGBM model increased data retrieval frequency by approximately 8.8% relative to the GOCI-II L2 AOD products, mitigating the excessive masking over very bright surfaces that is often encountered in physics-based AOD retrieval.
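The random, temporal, and spatial CV schemes differ only in which samples are held out together. The sketch below assigns whole groups to folds: grouping by observation date gives a temporal split, grouping by AERONET station a spatial split, and grouping by sample ID an ordinary random split. The expected-error envelope $\pm(0.05 + 0.15\,\tau)$ is the one commonly used for MODIS Dark Target AOD over land and is only an assumption here, since the abstract does not state the EE definition; the function names are hypothetical.

```python
import math
import random

def group_kfold(groups, n_folds, seed=0):
    """Yield (train, test) index lists such that each group lies entirely in one fold."""
    unique = sorted(set(groups))
    random.Random(seed).shuffle(unique)
    fold_of = {g: i % n_folds for i, g in enumerate(unique)}
    for f in range(n_folds):
        test = [i for i, g in enumerate(groups) if fold_of[g] == f]
        train = [i for i, g in enumerate(groups) if fold_of[g] != f]
        yield train, test

def rmse(obs, est):
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def within_ee(obs, est, a=0.05, b=0.15):
    """Fraction of estimates within +/-(a + b * observed AOD); the default envelope is assumed."""
    hits = sum(1 for o, e in zip(obs, est) if abs(e - o) <= a + b * o)
    return hits / len(obs)
```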

Bioequivalence of Gabarep Tablet to Neurontin Tablet (Gabapentin 800 mg) (가바렙정 (가바펜틴 800 mg)의 생물학적 동등성 평가)

  • Seo, Young-Hwan; Jeong, Ju-Cheol; Lee, Jae-Young; Li, Zheng-Yi; Yoon, Hyoung-Jong; Sohn, Uy-Dong; Bang, Joon-Seok; Kim, Ho-Hyun; Jeong, Ji-Hoon
    • Journal of Pharmaceutical Investigation / v.38 no.4 / pp.261-267 / 2008
  • The aim of the present study was to evaluate the bioequivalence of two gabapentin preparations. Neurontin tablet 800 mg (Pfizer Korea Inc.) was used as the reference drug to assess the bioequivalence of Gabalep tablet 800 mg (Chong Kun Dang Pharmaceutical Co., Korea), and the entire study was conducted according to the guidelines of the Korea Food and Drug Administration (KFDA). Twenty-five healthy male volunteers received each drug in a randomized $2 \times 2$ crossover study with a one-week washout period. After drug administration, blood samples were taken at predetermined times ($0 \sim 24$ hours), and serum gabapentin concentrations were determined by high-performance liquid chromatography-tandem mass spectrometry (LC-MS/MS) with electrospray ionization operating in multiple reaction monitoring (MRM) mode. The analytical method was validated for specificity, accuracy, precision, and linearity. Pharmacokinetic parameters such as AUCt and Cmax were calculated, and ANOVA was applied to the logarithmically transformed AUCt and Cmax. The mean ± SD AUCt and Cmax were $29.94 \pm 9.23\ \mu\mathrm{g/mL{\cdot}hr}$ and $3.12 \pm 1.11\ \mu\mathrm{g/mL}$ for the reference drug, and $31.48 \pm 9.77\ \mu\mathrm{g/mL{\cdot}hr}$ and $3.15 \pm 1.03\ \mu\mathrm{g/mL}$ for the test drug. For both AUCt and Cmax, the 90% confidence intervals of the log-transformed data were within the acceptance range of log(0.8) to log(1.25). These results indicate that Gabalep tablet 800 mg is bioequivalent to Neurontin tablet 800 mg.
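The acceptance rule can be illustrated with a simplified calculation: take each subject's log(test) − log(reference) difference, form a 90% confidence interval for the mean difference, and back-transform it to a ratio that must lie within 0.80–1.25. This sketch ignores the period and sequence terms of the 2×2 crossover ANOVA used in the study, so it only approximates that analysis. The caller supplies the one-sided 95% t quantile (about 1.711 for 24 degrees of freedom in this paired form), and the function names are hypothetical.

```python
import math
from statistics import mean, stdev

def ratio_ci_90(test, ref, t_crit):
    """Approximate 90% CI for the test/reference geometric-mean ratio.

    Uses within-subject log differences only (no period or sequence effects),
    so it is a simplification of the 2x2 crossover ANOVA."""
    d = [math.log(a) - math.log(b) for a, b in zip(test, ref)]
    half_width = t_crit * stdev(d) / math.sqrt(len(d))
    return math.exp(mean(d) - half_width), math.exp(mean(d) + half_width)

def is_bioequivalent(ci, lower=0.8, upper=1.25):
    """True when the whole interval lies inside the acceptance range."""
    return lower <= ci[0] and ci[1] <= upper
```

Each subject's AUCt (or Cmax) for the test and reference products would be passed in the same subject order.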

Application of groundwater-level prediction models using data-based learning algorithms to National Groundwater Monitoring Network data (자료기반 학습 알고리즘을 이용한 지하수위 변동 예측 모델의 국가지하수관측망 자료 적용에 대한 비교 평가 연구)

  • Yoon, Heesung; Kim, Yongcheol; Ha, Kyoochul; Kim, Gyoo-Bum
    • The Journal of Engineering Geology / v.23 no.2 / pp.137-147 / 2013
  • For effective management of groundwater resources, groundwater level fluctuations in response to rainfall events must be predicted. In the present study, time series models using artificial neural networks (ANNs) and support vector machines (SVMs) were developed and applied to groundwater level data from the Gasan, Shingwang, and Cheongseong stations of the National Groundwater Monitoring Network. We designed four types of models with different input structures and compared their performance. The results show that the rainfall input model is not effective, especially for predicting groundwater recession behavior, whereas the rainfall-groundwater input model is effective over the entire prediction stage, yielding high model accuracy. Recursive prediction models were also effective, yielding correlation coefficients of 0.75–0.95 with observed values. Prediction errors were highest for Shingwang station, which has the lowest cross-correlation coefficient among the stations. Overall, the SVM models performed slightly better than the ANN models in all cases. An assessment of the parameter uncertainty of the recursive prediction models, using the ratio of validation-stage error to calibration-stage error, showed that this ratio varies over a much narrower range for the SVM models than for the ANN models, suggesting that the SVM models are more stable and effective for the present case studies.
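Recursive (multi-step) prediction means the model's own output at one step becomes its groundwater-level input at the next, so errors can accumulate over the horizon. The sketch below shows that loop with an arbitrary one-step model passed in as a function (any linear model used with it is a hypothetical stand-in, not the ANN or SVM models of the study), together with the validation-to-calibration error ratio. Measuring error as RMSE is our assumption, since the abstract does not say which error statistic was used.

```python
import math

def recursive_forecast(step_model, last_level, rainfall, steps):
    """Roll a one-step model forward, feeding each prediction back in as the next input.

    step_model(level, rain) -> next level; `rainfall` holds one value per step."""
    level = last_level
    forecast = []
    for t in range(steps):
        level = step_model(level, rainfall[t])
        forecast.append(level)
    return forecast

def rmse(obs, est):
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))

def error_ratio(obs_val, est_val, obs_cal, est_cal):
    """Validation-stage RMSE divided by calibration-stage RMSE.

    A ratio well above 1 indicates the model degrades outside the calibration period."""
    return rmse(obs_val, est_val) / rmse(obs_cal, est_cal)
```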