• Title/Summary/Keyword: cross-validation method

Enhancement of Geomorphology Generation for the Front Land of Levee Using Aerial Photograph (항공영상을 연계한 하천 제외지의 지형분석 개선 기법)

  • Lee, Geun Sang;Lee, Hyun Seok;Hwang, Eui Ho;Koh, Deuk Koo
    • KSCE Journal of Civil and Environmental Engineering Research / v.28 no.3D / pp.407-415 / 2008
  • This study presents a methodology that links aerial photographs with topographic survey data to improve the accuracy of the terrain data used to calculate water volume in an urban stream. First, GIS spatial interpolation techniques, namely Inverse Distance Weighting (IDW) and Kriging, were applied to construct the terrain morphology of the sand-bar and grass areas from cross-sectional survey data, and validation point data were used to estimate the accuracy of the resulting topographic data. In the comparison, IDW with inverse-square weights ($d^{-2}_{ij}$) in the sand-bar area and the Kriging spherical model in the grass area performed best in constructing topographic data for the river boundary, although the differences among interpolation methods were very slight. An image classification method, the Minimum Distance Method (MDM), was then applied to extract the sand-bar and grass areas located at the river boundary, and the elevation value of the extracted layers was allocated to the water-level point value. Water volume computed with topographic data from aerial photos was 13% (sand-bar) and 12% (grass) more accurate than the water volume computed from the original terrain data. Therefore, terrain analysis linked with aerial photos is effective for monitoring sand-bar and grass areas located downstream of a dam in the flood season, and it can also be applied to calculate water volume efficiently.
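
As a rough illustration of the inverse-square IDW weighting and the validation-point check described in this abstract, the sketch below interpolates an elevation from nearby survey points and scores it against held-out points. The function names, data layout, and elevations are hypothetical and not taken from the paper.

```python
import math

def idw(known, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    known: list of (x_i, y_i, z_i) survey points (assumed layout).
    Weights are d ** -power; power=2 corresponds to d^-2 weighting.
    """
    num = 0.0
    den = 0.0
    for xi, yi, zi in known:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return zi  # query coincides with a survey point
        w = d ** -power
        num += w * zi
        den += w
    return num / den

def rmse(known, validation, power=2.0):
    """Root-mean-square error of IDW estimates at held-out validation points."""
    errs = [(idw(known, x, y, power) - z) ** 2 for x, y, z in validation]
    return math.sqrt(sum(errs) / len(errs))

# Made-up elevations (m), purely for illustration
survey = [(0.0, 0.0, 10.0), (2.0, 0.0, 12.0), (0.0, 2.0, 14.0), (2.0, 2.0, 16.0)]
check = [(1.0, 1.0, 13.0)]
print(idw(survey, 1.0, 1.0))
print(rmse(survey, check))
```

Comparing `rmse` across different `power` values (or against another interpolator) mirrors the kind of method comparison the abstract reports.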

Comparative Assessment of Linear Regression and Machine Learning for Analyzing the Spatial Distribution of Ground-level NO2 Concentrations: A Case Study for Seoul, Korea (서울 지역 지상 NO2 농도 공간 분포 분석을 위한 회귀 모델 및 기계학습 기법 비교)

  • Kang, Eunjin;Yoo, Cheolhee;Shin, Yeji;Cho, Dongjin;Im, Jungho
    • Korean Journal of Remote Sensing / v.37 no.6_1 / pp.1739-1756 / 2021
  • Atmospheric nitrogen dioxide (NO2) is mainly caused by anthropogenic emissions. It contributes to the formation of secondary pollutants and ozone through chemical reactions, and adversely affects human health. Although ground stations that monitor NO2 concentrations in real time are operated in Korea, it is difficult to analyze the spatial distribution of NO2 concentrations from them, especially over areas with no stations. Therefore, this study conducted a comparative experiment on spatial interpolation of NO2 concentrations based on two linear-regression methods (i.e., multiple linear regression (MLR) and regression kriging (RK)) and two machine learning approaches (i.e., random forest (RF) and support vector regression (SVR)) for the year 2020. The four approaches were compared using leave-one-out cross-validation (LOOCV). The daily LOOCV results showed that MLR, RK, and SVR produced an average daily index of agreement (IOA) of 0.57, which was higher than that of RF (0.50). The average daily normalized root mean square error of RK was 0.9483%, which was slightly lower than those of the other models. MLR, RK, and SVR showed similar seasonal distribution patterns, and the dynamic range of the resultant NO2 concentrations from these three models was similar, while that from RF was relatively small. The multivariate linear regression approaches are expected to be promising methods for spatial interpolation of ground-level NO2 concentrations and other parameters in urban areas.
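
The LOOCV scoring mentioned in this abstract can be sketched as follows: each station is held out in turn, a model is fitted to the rest, and the held-out predictions are scored with the index of agreement (Willmott's form is assumed here). The one-covariate least-squares model and the data are illustrative stand-ins, not the MLR, RK, RF, or SVR models of the study.

```python
def ols_fit(xs, ys):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def loocv_predictions(xs, ys):
    """Predict each observation from a model fitted to all the others."""
    preds = []
    for i in range(len(xs)):
        a, b = ols_fit(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        preds.append(a + b * xs[i])
    return preds

def index_of_agreement(obs, pred):
    """Willmott's index of agreement: 1 is perfect, lower is worse."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den

# Hypothetical covariate values and station readings
covariate = [1.0, 2.0, 3.0, 4.0, 5.0]
no2 = [3.0, 5.0, 7.0, 9.0, 11.0]  # exactly 2 * x + 1, so LOOCV should be near-perfect
print(index_of_agreement(no2, loocv_predictions(covariate, no2)))
```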

Prediction of patent lifespan and analysis of influencing factors using machine learning (기계학습을 활용한 특허수명 예측 및 영향요인 분석)

  • Kim, Yongwoo;Kim, Min Gu;Kim, Young-Min
    • Journal of Intelligence and Information Systems / v.28 no.2 / pp.147-170 / 2022
  • Although the number of patents, one of the core outputs of technological innovation, continues to increase, the number of low-value patents has also increased sharply. Therefore, efficient evaluation of patents has become important. Estimation of patent lifespan, which represents the private value of a patent, has been studied for a long time, but in most cases it has relied on a linear model. Even where machine learning methods were used, interpretation or explanation of the relationship between explanatory variables and patent lifespan was insufficient. In this study, patent lifespan (number of renewals) is predicted based on the idea that patent lifespan represents the value of the patent. For the research, 4,033,414 patents that were applied for between 1996 and 2017 and finally granted were collected from the USPTO (US Patent and Trademark Office). To predict patent lifespan, we use variables that reflect the characteristics of the patent, the patent owner, and the inventor. We build four different models (Ridge Regression, Random Forest, Feed Forward Neural Network, Gradient Boosting Models) and perform hyperparameter tuning through 5-fold cross-validation. Then, the performance of the generated models is evaluated, and the relative importance of predictors is presented. In addition, based on the Gradient Boosting Model, which showed excellent performance, an Accumulated Local Effects Plot is presented to visualize the relationship between predictors and patent lifespan. Finally, we apply Kernel SHAP (SHapley Additive exPlanations) to present the reasons behind the evaluation of individual patents, and discuss applicability to patent evaluation systems. It is academically meaningful that this study contributes cumulatively to the existing studies which estimate patent lifespan, and that it supplements the limitations of linear models. It is also practically meaningful in suggesting a method for deriving the evaluation basis for individual patent value and in examining its applicability to patent evaluation systems.

Forecasting the Precipitation of the Next Day Using Deep Learning (딥러닝 기법을 이용한 내일강수 예측)

  • Ha, Ji-Hun;Lee, Yong Hee;Kim, Yong-Hyuk
    • Journal of the Korean Institute of Intelligent Systems / v.26 no.2 / pp.93-98 / 2016
  • For accurate precipitation forecasts, the choice of weather factors and prediction method is very important. Recently, machine learning has been widely used for forecasting precipitation, and the artificial neural network, one of the machine learning techniques, has shown good performance. In this paper, we suggest a new method for forecasting precipitation using a DBN, one of the deep learning techniques. A DBN has the advantage that its initial weights are set by unsupervised learning, which compensates for this weakness of artificial neural networks. We used past precipitation, temperature, and the parameters of the sun's and moon's motion as features for forecasting precipitation. The dataset consists of observation data measured over 40 years at an AWS in Seoul. Experiments were based on 8-fold cross-validation. The model outputs probabilities for the test dataset, so a threshold was used to decide whether precipitation occurs. CSI and Bias were used to measure forecast accuracy. Our experimental results showed that the DBN performed better than the MLP.
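
The thresholding and scoring step in this abstract can be sketched as below, assuming the standard contingency-table definitions CSI = hits / (hits + misses + false alarms) and Bias = (hits + false alarms) / (hits + misses). The probabilities, observations, and the 0.5 threshold are invented for illustration and are not values from the paper.

```python
def csi_and_bias(probs, observed, threshold=0.5):
    """Threshold forecast probabilities and score them against observations.

    probs: predicted rain probabilities; observed: 1 if rain occurred, else 0.
    Returns (CSI, Bias) under the standard contingency-table definitions.
    """
    hits = misses = false_alarms = 0
    for p, o in zip(probs, observed):
        forecast = p >= threshold
        if forecast and o:
            hits += 1
        elif forecast and not o:
            false_alarms += 1
        elif not forecast and o:
            misses += 1
    csi = hits / (hits + misses + false_alarms)
    bias = (hits + false_alarms) / (hits + misses)
    return csi, bias

# Hypothetical test-fold outputs
probs = [0.9, 0.2, 0.6, 0.4, 0.1]
observed = [1, 0, 0, 1, 0]
csi, bias = csi_and_bias(probs, observed, threshold=0.5)
print(csi, bias)
```

A Bias above 1 indicates over-forecasting of rain and below 1 under-forecasting, which is why the threshold is typically tuned with both scores in view.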

Development of Nondestructive Detection Method for Adulterated Powder Products Using Raman Spectroscopy and Partial Least Squares Regression (라만 분광법과 부분최소자승법을 이용한 불량 분말식품 비파괴검사 기술 개발)

  • Lee, Sangdae;Lohumi, Santosh;Cho, Byoung-Kwan;Kim, Moon S.;Lee, Soo-Hee
    • Journal of the Korean Society for Nondestructive Testing / v.34 no.4 / pp.283-289 / 2014
  • This study was conducted to develop a non-destructive detection method for adulterated powder products using Raman spectroscopy and partial least squares regression (PLSR). Garlic and ginger powder, which are used as natural seasoning and in health supplement foods, were selected for this experiment. Samples were adulterated with corn starch in concentrations of 5-35%. PLSR models for adulterated garlic and ginger powders were developed and their performance was evaluated using cross-validation. The $R^2_c$ and SEC of an optimal PLSR model were 0.99 and 2.16 for the garlic powder samples, and 0.99 and 0.84 for the ginger samples, respectively. The variable importance in projection (VIP) score is a useful and simple tool for evaluating the importance of each variable in a PLSR model. After pre-selection using the VIP scores, the Raman spectral data were reduced by one third. New PLSR models, based on the reduced number of wavelengths selected by the VIP scores, gave good predictions for the adulterated garlic and ginger powder samples.

A Study on the Optimal Discriminant Model Predicting the likelihood of Insolvency for Technology Financing (기술금융을 위한 부실 가능성 예측 최적 판별모형에 대한 연구)

  • Sung, Oong-Hyun
    • Journal of Korea Technology Innovation Society / v.10 no.2 / pp.183-205 / 2007
  • An investigation was undertaken of the optimal discriminant model for predicting in advance the likelihood of insolvency of medium-sized firms based on technology evaluation. The explanatory variables included in the discriminant model were selected by both factor analysis and discriminant analysis using the stepwise selection method. Five explanatory variables were selected in factor analysis in terms of explanatory ratio and communality. Six explanatory variables were selected in stepwise discriminant analysis. The effectiveness of the linear discriminant model and the logistic discriminant model was assessed by the criteria of critical probability and correct classification rate. Results showed that both models had similar correct classification rates and that the linear discriminant model was preferred to the logistic discriminant model in terms of the critical probability criterion. For the linear discriminant model with a critical probability of 0.5, the total-group correct classification rate was 70.4%, and the correct classification rates of the insolvent and solvent groups were 73.4% and 69.5%, respectively. The correct classification rate is an estimate of the probability that the estimated discriminant function will correctly classify the present sample. The actual correct classification rate, however, is an estimate of the probability that the estimated discriminant function will correctly classify a future observation. Unfortunately, the correct classification rate overestimates the actual correct classification rate because the data set used to estimate the discriminant function is also used to evaluate it. The cross-validation method was used to estimate the bias of the correct classification rate. According to the results, the estimated bias was 2.9% and the predicted actual correct classification rate was 67.5%. In addition, a threshold value is set to establish an in-doubt category.
Results of the linear discriminant model can be applied by technology financing banks to evaluate the possibility of insolvency and to rank the applicant firms.
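
The bias correction reported in this abstract is a subtraction: the apparent (resubstitution) rate of 70.4% minus the estimated bias of 2.9% gives the predicted actual rate of 67.5%. The sketch below shows how such a bias can arise, using leave-one-out cross-validation with a toy one-dimensional nearest-mean classifier. The classifier and data are hypothetical stand-ins, not the discriminant model or the firm data of the study.

```python
def fit_means(xs, ys):
    """Per-class mean of a single feature (toy stand-in for a discriminant rule)."""
    means = {}
    for label in set(ys):
        vals = [x for x, y in zip(xs, ys) if y == label]
        means[label] = sum(vals) / len(vals)
    return means

def predict(means, x):
    """Assign x to the class whose mean is closest."""
    return min(means, key=lambda label: abs(x - means[label]))

def apparent_rate(xs, ys):
    """Correct classification rate on the same data used for fitting."""
    means = fit_means(xs, ys)
    return sum(predict(means, x) == y for x, y in zip(xs, ys)) / len(xs)

def loo_rate(xs, ys):
    """Leave-one-out estimate of the rate on unseen observations."""
    hits = 0
    for i in range(len(xs)):
        means = fit_means(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += predict(means, xs[i]) == ys[i]
    return hits / len(xs)

# Made-up scores: label 0 = solvent, label 1 = insolvent
scores = [1.0, 2.0, 4.5, 6.0, 7.0, 8.0]
labels = [0, 0, 0, 1, 1, 1]
bias = apparent_rate(scores, labels) - loo_rate(scores, labels)
print(apparent_rate(scores, labels), loo_rate(scores, labels), bias)

# Arithmetic from the abstract: apparent rate minus estimated bias
print(round(70.4 - 2.9, 1))
```

In this toy data the borderline score 4.5 is classified correctly only when it helps define its own class mean, which is the optimism that cross-validation removes.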

Mixed dentition analysis using a multivariate approach (다변량 기법을 이용한 혼합치열기 분석법)

  • Seo, Seung-Hyun;An, Hong-Seok;Lee, Shin-Jae;Lim, Won Hee;Kim, Bong-Rae
    • The Korean Journal of Orthodontics / v.39 no.2 / pp.112-119 / 2009
  • Objective: To develop a mixed dentition analysis method in consideration of the normal variation of tooth sizes. Methods: According to the tooth sizes of the maxillary central incisor, maxillary 1st molar, mandibular central incisor, mandibular lateral incisor, and mandibular 1st molar, 307 normal occlusion subjects were clustered into smaller and larger tooth-size groups. Multiple regression analyses were then performed to predict the sizes of the canine and premolars for the 2 groups and both genders separately. As a cross-validation dataset, 504 malocclusion patients were assigned to the 2 groups, and the multiple regression equations were then applied. Results: The maximum errors of the predicted space for the canine and 1st and 2nd premolars were 0.71 and 0.82 mm residual standard deviation for the normal occlusion and malocclusion groups, respectively. For malocclusion patients, the prediction errors did not differ significantly by type of malocclusion or by tooth-size group. The frequencies of prediction errors exceeding 1 mm and 2 mm were 17.3% and 1.8%, respectively. The overall prediction accuracy was dramatically improved in this study compared to that of previous studies. Conclusions: The computer-aided calculation method used in this study appeared to be more efficient.

Analysis of the CREOLE experiment on the reactivity temperature coefficient of the UO2 light water moderated lattices using Monte Carlo transport calculations and ENDF/B-VII.1 nuclear data library

  • El Ouahdani, S.;Erradi, L.;Boukhal, H.;Chakir, E.;El Bardouni, T.;Boulaich, Y.;Ahmed, A.
    • Nuclear Engineering and Technology / v.52 no.6 / pp.1120-1130 / 2020
  • The CREOLE experiment, performed in the EOLE critical facility located in the Nuclear Center of CADARACHE - CEA, has allowed us to obtain interesting and complete experimental information on temperature effects in light water reactor lattices. To analyze these experiments accurately, an elaborate calculation scheme using the Monte Carlo method implemented in the MCNP6.1 code and the ENDF/B-VII.1 cross section library has been developed. We have used the ENDF/B-VII.1 data provided with the MCNP6.1.1 version in ACE format and the Makxsf utility to handle the data at the specific temperatures not available in the original MCNP6.1.1 library. The main purpose of this analysis is the qualification of the ENDF/B-VII.1 nuclear data for the prediction of the Reactivity Temperature Coefficient while ensuring the ability of the MCNP6.1 system to model such a complex experiment as CREOLE. We have analyzed the case of a UO2 lattice with 1166 ppm of boron in an ordinary water moderator at specified temperatures. A detailed comparison of the calculated effective multiplication factors with the reference ones [1] at room temperature, presented in this work, shows good agreement, validating our 3D calculation model. The discrepancies between calculations and the differential measurements of the Reactivity Temperature Coefficient for the analyzed configuration are relatively small: the maximum discrepancy does not exceed 1.1 pcm/℃. In addition to the analysis of direct differential measurements of the reactivity temperature coefficient performed in the poisoned UO2 lattice configuration, we have also analyzed integral measurements in the clean UO2 lattice configuration using the equivalency of the integral temperature reactivity worth with the driver core fuel reactivity worth and the soluble boron reactivity worth. In this case, both the ENDF/B-VII.1 and JENDL.4 libraries were used in our analysis, and the obtained results are very similar.

Development of a Classification Model for Driver's Drowsiness and Waking Status Using Heart Rate Variability and Respiratory Features

  • Kim, Sungho;Choi, Booyong;Cho, Taehwan;Lee, Yongkyun;Koo, Hyojin;Kim, Dongsoo
    • Journal of the Ergonomics Society of Korea / v.35 no.5 / pp.371-381 / 2016
  • Objective: This study aims to evaluate the features of heart rate variability (HRV) and respiratory signals as indices of a driver's drowsiness and waking status in order to develop a classification model for a driver's drowsiness and waking status using those features. Background: Driver drowsiness is one of the major causal factors in traffic accidents. This study hypothesized that applying combined bio-signals to monitor the alertness level of drivers would improve the effectiveness of techniques for classifying driver drowsiness. Method: The features of three HRV measurements, including low frequency (LF), high frequency (HF), and LF/HF ratio, and two respiratory measurements, including peak and rate, were acquired in monotonous car driving simulation experiments using photoplethysmogram (PPG) and respiration sensors. The experiments were repeated a total of 50 times on five healthy male participants in their 20s to 50s. The classification model was developed by selecting the optimal measurements, applying a binary logistic regression method, and performing 3-fold cross-validation. Results: The power of LF, HF, and LF/HF ratio, and the respiration peak in the drowsy state were reduced by 38%, 22%, 31%, and 7%, respectively, compared to those in the waking state, while respiration rate increased by 3%. The classification sensitivity of the model using both HRV and respiratory features (91.4%) was higher than that of the model using only HRV features (89.8%) and that of the model using only respiratory features (83.6%). Conclusion: This study suggests that the classification of a driver's drowsiness and waking status may be improved by utilizing a combination of HRV and respiratory features. Application: The results of this study can be applied to the development of driver drowsiness prevention systems.

The Sub Authentication Method For Driver Using Driving Patterns (운전 패턴을 이용한 운전자 보조 인증방법)

  • Jeong, Jong-Myoung;Kang, Hyung Chul;Jo, Hyo Jin;Yoon, Ji Won;Lee, Dong Hoon
    • Journal of the Korea Institute of Information Security & Cryptology / v.23 no.5 / pp.919-929 / 2013
  • Recently, a variety of IT technologies have been applied to vehicles. However, some vehicle-IT technologies designed without security considerations may cause security problems. In particular, research on smart key systems used for automobile authentication shows that such systems are vulnerable to replay attacks and modification attacks that exploit the wireless signal of the smart key. Thus, in this paper, we propose a driver authentication method that uses driving patterns. Driving patterns can now be obtained from in-vehicle network data. In our authentication model, we build the driving patterns of the car owner using the standard normal distribution and apply these patterns to driver authentication. To validate our model, we perform a k-fold cross-validation test using in-vehicle network data and obtain a true positive rate of 0.7 and a false positive rate of 0.35. Based on these results, our model is more secure than existing 'what you have' authentication models such as the smart key, provided that the authentication result is sent to the car owner through mobile networks.