• Title/Summary/Keyword: Validation data set


A Profit Prediction Model in the International Construction Market - focusing on Small and Medium Sized Construction Companies (CBR을 활용한 해외건설 수익성 예측 모델 개발 - 중소·중견기업을 중심으로 -)

  • Hwang, Geon Wook;Jang, woosik;Park, Chan-Young;Han, Seung-Heon;Kim, Jong Sung
    • Korean Journal of Construction Engineering and Management
    • /
    • v.16 no.4
    • /
    • pp.50-59
    • /
    • 2015
  • While the international construction market for Korean companies has grown exponentially in recent years, the profit rate of small and medium sized construction companies (SMCCs) is far lower than that of large construction companies. Furthermore, SMCCs, especially subcontractors, lack a means of judging whether participating in a project is appropriate, which leads to unpredictable profit rates. Therefore, this research aims to create a profit rate prediction model for international construction projects, focusing on SMCCs. First, the factors that influence the profit rate and the boundaries of the profit zone are defined using a total of 8,637 projects carried out since 1965. Second, an extensive literature review is conducted to derive 10 influencing factors, and multiple regression analysis with a corresponding judgement technique is used to derive the weight of each factor. Third, a case-based reasoning (CBR) methodology is applied to develop a model for profit rate analysis at the project participation review stage. On a validation set of 120 projects, the developed model showed an 11% error rate (14 data sets) across Type I and Type II errors. With these results, project decision makers can base their decisions on objective evidence instead of intuition. The model additionally gives guidance to Korean subcontractors advancing into international construction: its output shows the profit distribution and allows the quality of a project to be checked in advance to secure a sound profit on each project.
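As a rough sketch of the retrieval step in such a CBR model: past projects are scored by weighted similarity to the new project, and the profit rates of the most similar cases are averaged. The factor names, weights, and case values below are hypothetical, not the paper's 10 factors or its regression-derived weights.

```python
# Hypothetical influencing factors and weights (illustrative only).
WEIGHTS = {"contract_size": 0.3, "country_risk": 0.4, "experience": 0.3}

# A tiny case base: past projects with normalized factor values and profit rates (%).
case_base = [
    ({"contract_size": 0.8, "country_risk": 0.2, "experience": 0.9}, 7.5),
    ({"contract_size": 0.4, "country_risk": 0.7, "experience": 0.3}, -2.1),
    ({"contract_size": 0.6, "country_risk": 0.3, "experience": 0.7}, 5.0),
]

def similarity(a, b):
    """Weighted similarity: 1 minus the weighted absolute distance."""
    dist = sum(WEIGHTS[f] * abs(a[f] - b[f]) for f in WEIGHTS)
    return 1.0 - dist

def predict_profit(new_case, k=2):
    """Retrieve the k most similar past cases and average their profit rates."""
    ranked = sorted(case_base, key=lambda c: similarity(new_case, c[0]), reverse=True)
    return sum(rate for _, rate in ranked[:k]) / k

new_project = {"contract_size": 0.7, "country_risk": 0.25, "experience": 0.8}
print(round(predict_profit(new_project), 2))
```

A real model would retrieve from thousands of cases and tune k; the structure of retrieve-then-aggregate is the same.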

A Classification Method of Delirium Patients Using Local Covering-Based Rule Acquisition Approach with Rough Lower Approximation (러프 하한 근사를 갖는 로컬 커버링 기반 규칙 획득 기법을 이용한 섬망 환자의 분류 방법)

  • Son, Chang Sik;Kang, Won Seok;Lee, Jong Ha;Moon, Kyoung Ja
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.4
    • /
    • pp.137-144
    • /
    • 2020
  • Delirium is among the most common mental disorders encountered in patients with temporary cognitive impairment such as consciousness disorder, attention disorder, and poor speech, particularly among older patients. Delirium is distressing for patients and families, can interfere with the management of symptoms such as pain, and is associated with increased elderly mortality. The purpose of this paper is to generate useful clinical knowledge for distinguishing the outcomes of patients with delirium in long-term care facilities. To this end, we extracted clinical classification knowledge associated with delirium using a local covering rule acquisition approach with the rough lower approximation region. The clinical applicability of the proposed method was verified using data collected from a prospective cohort study. From the results, we found six useful pieces of clinical evidence that the duration of delirium could last more than 12 days. We also confirmed that eight factors (BMI, Charlson Comorbidity Index, hospitalization path, nutrition deficiency, infection, sleep disturbance, bed scores, and diaper use) are important in distinguishing the outcomes of delirium patients. The classification performance of the proposed method was verified by comparison with three benchmark models (ANN, SVM with RBF kernel, and Random Forest) using statistical five-fold cross-validation. The proposed method showed improved average performance of 0.6% in accuracy and 2.7% in AUC compared with the SVM model, which had the highest classification performance of the three benchmarks.
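The rough lower approximation the method builds on can be illustrated in a few lines: records are grouped into equivalence classes by their condition attributes, and a class belongs to the lower approximation of an outcome only if every member shares that outcome, so rules induced from it are certain. The patient attributes and outcome labels below are invented for illustration.

```python
from collections import defaultdict

# Toy records: (condition attributes, outcome). Values are illustrative,
# not actual clinical data from the study.
records = [
    ({"infection": 1, "sleep_disturbance": 1}, "prolonged"),
    ({"infection": 1, "sleep_disturbance": 1}, "prolonged"),
    ({"infection": 0, "sleep_disturbance": 1}, "prolonged"),
    ({"infection": 0, "sleep_disturbance": 1}, "resolved"),
    ({"infection": 0, "sleep_disturbance": 0}, "resolved"),
]

def lower_approximation(records, target):
    """Indices of records in equivalence classes wholly inside the target class."""
    classes = defaultdict(list)
    for i, (attrs, _) in enumerate(records):
        classes[tuple(sorted(attrs.items()))].append(i)
    lower = []
    for members in classes.values():
        if all(records[i][1] == target for i in members):
            lower.extend(members)
    return sorted(lower)

# Records 0 and 1 share attributes and agree on the outcome -> certain rule;
# records 2 and 3 share attributes but disagree -> excluded from the lower region.
print(lower_approximation(records, "prolonged"))
```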

Groundwater Recharge Estimation for the Gyeongan-cheon Watershed with MIKE SHE Modeling System (MIKE SHE 모형을 이용한 경안천 유역의 지하수 함양량 산정)

  • Kim, Chul-Gyum;Kim, Hyeon-Jun;Jang, Cheol-Hee;Im, Sang-Jun
    • Journal of Korea Water Resources Association
    • /
    • v.40 no.6 s.179
    • /
    • pp.459-468
    • /
    • 2007
  • To estimate groundwater recharge, the fully distributed-parameter model MIKE SHE was applied to the Gyeongan-cheon watershed, a tributary of the Han River Basin covering approximately 260 km² with a main stream length of about 49 km. To set up the model, spatial data such as topography, land use, and soil, together with meteorological data, were compiled, and a grid size of 200 m was chosen considering computational capacity and the reliability of the results. The model was calibrated and validated with a split-sample procedure against four years of daily stream flows at the watershed outlet. Statistical criteria for the calibration and validation results indicated good agreement between simulated and observed stream flows. The annual recharges calculated by the model were compared with values from the conventional groundwater recession curve method, and the simulated groundwater levels were compared with observations. It was concluded that the model can reasonably simulate groundwater level and recharge, and can serve as a useful tool for estimating groundwater recharge spatially and temporally and for enhancing the analysis of the watershed water cycle.
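The abstract does not name its statistical criteria; a criterion commonly used for this kind of stream flow calibration is the Nash-Sutcliffe efficiency, sketched here with invented flow values (not the Gyeongan-cheon record).

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 means no better than the mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den

# Illustrative daily stream flows (m^3/s).
obs = [1.0, 2.0, 4.0, 3.0, 2.0]
sim = [1.1, 1.8, 3.9, 3.2, 2.1]
print(round(nash_sutcliffe(obs, sim), 3))
```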

A Study on the Development and Validation of an Assessment Tool of KBU Core Competency (기독교대학의 핵심역량 측정 도구 개발 사례 연구 : K대학교를 중심으로)

  • Lee, Seong Ah;Nam, Sunwoo;Lee, Eun Chul
    • Journal of Christian Education in Korea
    • /
    • v.62
    • /
    • pp.187-225
    • /
    • 2020
  • The purpose of this study is to develop an assessment tool for measuring core competencies, by giving operational definitions of the six core competencies of KBU and constructing sub-factors accordingly. The results are as follows. First, this study analyzed previous research on the six KBU competencies and set operational definitions through the Bible verses describing the KBU core competencies. Second, sub-elements for each core competency definition were selected through previous research, and a draft assessment tool for each sub-element was developed. The validity of the tool for KBU was then confirmed through a review by professors well versed in KBU's founding philosophy and policies, and the assessment tools were revised based on their feedback. Third, to validate the assessment tool, the researchers surveyed students in the second semester of 2018; with the collected data, reliability was checked and validity was verified through EFA and CFA, and the final assessment tool was confirmed. In conclusion, this study is meaningful in developing the KBU core competency assessment tool, and it is expected that students' KBU competencies can be systematically managed with it.
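The reliability check mentioned above is typically done with Cronbach's alpha, which compares the sum of item variances with the variance of the total score. A minimal sketch with hypothetical Likert responses follows; the study's actual items and data are not reproduced here.

```python
def variance(xs):
    """Sample variance (n - 1 denominator)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def cronbach_alpha(items):
    """items: one inner list of scores per item, aligned across the same respondents."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(it) for it in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical 5-point Likert responses from four students on three items.
items = [
    [4, 5, 3, 4],
    [4, 4, 3, 5],
    [5, 5, 2, 4],
]
print(round(cronbach_alpha(items), 2))
```

Values above roughly 0.7 are conventionally read as acceptable internal consistency.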

Development of a surrogate model based on temperature for estimation of evapotranspiration and its use for drought index applicability assessment (증발산 산정을 위한 온도기반의 대체모형 개발 및 가뭄지수 적용성 평가)

  • Kim, Ho-Jun;Kim, Kyoungwook;Kwon, Hyun-Han
    • Journal of Korea Water Resources Association
    • /
    • v.54 no.11
    • /
    • pp.969-983
    • /
    • 2021
  • Evapotranspiration, one of the hydrometeorological components, is considered an important variable for water resource planning and management and is primarily used as input data for hydrological models such as water balance models. The FAO56 PM method has been recommended as the standard approach for estimating reference evapotranspiration with relatively high accuracy. However, the FAO56 PM method is often difficult to apply because it requires many hydrometeorological variables. For this reason, the temperature-based Hargreaves equation has been widely adopted to estimate reference evapotranspiration. In this study, the parameters of the Hargreaves equation were calibrated with relatively long-term data within a Bayesian framework, and statistical indices (CC, RMSE, IoA) were used to validate the model. RMSE for monthly results over the validation period ranged from 7.94 to 24.91 mm/month, confirming that accuracy was significantly improved compared to the existing Hargreaves equation. Further, an evaporative demand drought index (EDDI) based on evaporative demand (E0) was proposed. To confirm its effectiveness, this study evaluated the estimated EDDI for the recent drought events of 2014-2015 and 2018, along with precipitation and SPI. In the evaluation of the Han River watershed in 2018, the weekly EDDI increased to more than 2, and it was confirmed that EDDI detects the onset of drought caused by heatwaves more effectively. EDDI can thus be used as a drought index alongside SPI, particularly for monitoring heatwave-driven flash droughts.
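For reference, the standard Hargreaves equation that the study recalibrates can be sketched as below. The defaults a = 0.0023, b = 17.8, c = 0.5 are the conventional Hargreaves coefficients; the Bayesian calibration in the paper replaces them with values fitted to Korean data. The input values here are illustrative, not from the study's stations.

```python
def hargreaves_et0(t_mean, t_max, t_min, ra, a=0.0023, b=17.8, c=0.5):
    """Reference evapotranspiration (mm/day) by the Hargreaves equation.

    ra is extraterrestrial radiation expressed as an equivalent depth (mm/day).
    a, b, c are the coefficients recalibrated in the study; the defaults are
    the standard Hargreaves values.
    """
    return a * ra * (t_mean + b) * (t_max - t_min) ** c

# Illustrative summer-day inputs.
print(round(hargreaves_et0(t_mean=25.0, t_max=31.0, t_min=19.0, ra=16.0), 2))
```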

A modified U-net for crack segmentation by Self-Attention-Self-Adaption neuron and random elastic deformation

  • Zhao, Jin;Hu, Fangqiao;Qiao, Weidong;Zhai, Weida;Xu, Yang;Bao, Yuequan;Li, Hui
    • Smart Structures and Systems
    • /
    • v.29 no.1
    • /
    • pp.1-16
    • /
    • 2022
  • Despite recent breakthroughs in deep learning and computer vision, the pixel-wise identification of tiny objects in high-resolution images with complex disturbances remains challenging. This study proposes a modified U-net for tiny crack segmentation in real-world steel-box-girder bridges. The modified U-net adopts the common U-net framework and a novel Self-Attention-Self-Adaption (SASA) neuron as the fundamental computing element. The Self-Attention module applies softmax and gate operations to obtain the attention vector, enabling the neuron to focus on the most significant receptive fields when processing large-scale feature maps. The Self-Adaption module consists of a multilayer perceptron subnet and achieves deeper feature extraction inside a single neuron. For data augmentation, a grid-based crack random elastic deformation (CRED) algorithm is designed to enrich the diversity and irregular shapes of distributed cracks: grid-based uniform control nodes are first set on both input images and binary labels, random offsets are then applied to these control nodes, and bilinear interpolation is performed for the remaining pixels. The proposed SASA neuron and CRED algorithm are deployed together to train the modified U-net. 200 raw images with a high resolution of 4928 × 3264 are collected, 160 for training and the remaining 40 for testing; 512 × 512 patches generated from the original images by a sliding window with an overlap of 256 serve as inputs. Results show that the average IoU between the recognized and ground-truth cracks reaches 0.409, which is 29.8% higher than the regular U-net. A five-fold cross-validation study verifies that the proposed method is robust to different training and test images. Ablation experiments further demonstrate the effectiveness of the proposed SASA neuron and CRED algorithm: the improvements in average IoU obtained by the SASA and CRED modules individually add up to the final improvement of the full model, indicating that the two modules contribute at different stages of the model and data in the training process.
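The IoU metric reported above is the ratio of intersecting to unioned foreground pixels between a predicted mask and its ground truth. The tiny masks below stand in for the study's 512 × 512 crack patches.

```python
def iou(pred, truth):
    """Intersection over union for binary masks given as 2-D 0/1 lists."""
    inter = sum(p & t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    union = sum(p | t for pr, tr in zip(pred, truth) for p, t in zip(pr, tr))
    return inter / union if union else 1.0

# Toy 3x3 masks: two pixels agree, two disagree -> IoU = 2 / 4.
pred  = [[1, 1, 0],
         [0, 1, 0],
         [0, 0, 0]]
truth = [[1, 0, 0],
         [0, 1, 1],
         [0, 0, 0]]
print(iou(pred, truth))
```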

Toxicity of Organophosphorus Flame Retardants (OPFRs) and Their Mixtures in Aliivibrio fischeri and Human Hepatocyte HepG2 (인체 간세포주 HepG2 및 발광박테리아를 활용한 유기인계 난연제와 그 혼합물의 독성 스크리닝)

  • Sunmi Kim;Kyounghee Kang;Jiyun Kim;Minju Na;Jiwon Choi
    • Journal of Environmental Health Sciences
    • /
    • v.49 no.2
    • /
    • pp.89-98
    • /
    • 2023
  • Background: Organophosphorus flame retardants (OPFRs) are a group of chemical substances used in building materials and plastic products to suppress or mitigate combustion. Although OPFRs are generally used in mixed form, information on their mixture toxicity is quite scarce. Objectives: This study aims to elucidate the toxicity of OPFR mixtures and determine their types of interaction (e.g., synergistic, additive, or antagonistic). Methods: Nine organophosphorus flame retardants, including TEHP (tris(2-ethylhexyl) phosphate) and TDCPP (tris(1,3-dichloro-2-propyl) phosphate), were selected based on indoor dust measurement data in South Korea. The luminescent bacterium Aliivibrio fischeri was exposed to the nine OPFRs for 30 minutes, and the human hepatocyte cell line HepG2 for 48 hours. Only chemicals with significant toxicity were carried forward to the mixture toxicity tests in HepG2. The observed ECx values were then compared with the toxicity predicted by the CA (concentration addition) model, and the MDR (model deviation ratio) was calculated to determine the type of interaction. Results: Only four chemicals showed significant toxicity in the luminescent bacteria assays, whereas EC50 values were derived for seven of the nine OPFRs in the HepG2 assays. In the HepG2 assays, EC50 values from highest to lowest followed the order of the molecular weight of the target chemicals. In the mixture tests, most binary mixtures showed additive interactions, except for the two combinations containing TPhP (triphenyl phosphate), i.e., TPhP with TDCPP and TPhP with TBOEP (tris(2-butoxyethyl) phosphate). Conclusions: Our data show that OPFR mixtures usually behave additively; however, more research is needed to explain the synergistic effect of TPhP. The mixture experimental dataset can also be used as a training and validation set for developing a mixture toxicity prediction model as a further step.
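The CA prediction and MDR calculation described in the methods can be sketched as follows. Under concentration addition, the mixture EC50 is the reciprocal of the sum of each component's mass fraction divided by its single-substance EC50, and the MDR compares that prediction with the observed value (values near 1 indicate additivity). The mixture fractions and EC50 values below are hypothetical, not the study's measurements.

```python
def ca_ec50(fractions, ec50s):
    """Concentration-addition EC50 of a mixture: 1 / sum(p_i / EC50_i)."""
    return 1.0 / sum(p / e for p, e in zip(fractions, ec50s))

def mdr(observed_ec50, predicted_ec50):
    """Model deviation ratio; values near 1 indicate additivity,
    larger values suggest stronger-than-additive (synergistic) toxicity."""
    return predicted_ec50 / observed_ec50

# Hypothetical binary mixture: equal mass fractions of two OPFRs
# with single-substance EC50s of 10 and 40 mg/L.
predicted = ca_ec50([0.5, 0.5], [10.0, 40.0])
print(predicted)            # CA-predicted mixture EC50
print(mdr(8.0, predicted))  # observed EC50 of 8 mg/L
```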

Analysis of Precipitation Characteristics of Regional Climate Model for Climate Change Impacts on Water Resources (기후변화에 따른 수자원 영향 평가를 위한 Regional Climate Model 강수 계열의 특성 분석)

  • Kwon, Hyun-Han;Kim, Byung-Sik;Kim, Bo-Kyung
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.28 no.5B
    • /
    • pp.525-533
    • /
    • 2008
  • Global circulation models (GCMs) have been used to study the impact of climate change on water resources, providing inputs to hydrologic models. Recently, regional circulation models (RCMs) have been used widely in climate change studies, but they have rarely been used for assessing climate change impacts on water resources in Korea. Therefore, this study uses a set of climate scenarios derived from the RegCM3 RCM (27 km × 27 km) operated by the Korea Meteorological Administration. To begin with, the RCM precipitation data surrounding major rainfall stations are extracted to assess the validity of the scenarios in terms of reproducing low-frequency behavior. A comprehensive comparison between observations and the precipitation scenarios is performed through statistical analysis, wavelet transform analysis, and EOF analysis. The overall analysis confirmed that the precipitation data driven by RegCM3 can simulate hydrological low-frequency behavior and reproduce spatio-temporal patterns. However, the spatio-temporal patterns are slightly biased, and the amplitudes (variances) of the RCM precipitation tend to be lower than the observations. Therefore, a bias correction scheme to correct the systematic bias needs to be considered when RCMs are applied to water resources assessment under climate change.
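One simple form of the bias correction the authors call for is linear scaling, which rescales the RCM precipitation so that its mean matches the observed mean. This is only an illustrative sketch of the idea (the abstract does not specify a scheme), and the precipitation values are invented.

```python
def linear_scaling(rcm, obs_mean):
    """Scale RCM precipitation so its mean matches the observed mean."""
    factor = obs_mean * len(rcm) / sum(rcm)
    return [p * factor for p in rcm]

# Illustrative monthly precipitation (mm); the RCM mean (100 mm) is low
# relative to the observed mean (125 mm), as the study reports for amplitudes.
rcm_precip = [80.0, 120.0, 60.0, 140.0]
corrected = linear_scaling(rcm_precip, obs_mean=125.0)
print([round(p, 1) for p in corrected])
```

Linear scaling preserves the relative temporal pattern while removing the mean bias; variance-aware methods (e.g., quantile mapping) would be needed to correct the understated amplitudes as well.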

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.2
    • /
    • pp.29-45
    • /
    • 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings with statistical and machine learning techniques has been a popular research topic. The statistical techniques traditionally used in bond rating include multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis. However, one major drawback is that they rest on strict assumptions: linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and the Support Vector Machine (SVM). SVM in particular is recognized as a new and promising classification and regression method. SVM learns a separating hyperplane that maximizes the margin between two categories; it is simple enough to be analyzed mathematically and achieves high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, so overfitting is unlikely to occur. SVM also does not require many training samples, since it builds prediction models using only the representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for binary classification. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not match the performance that SVM achieves in binary classification. Second, approximation algorithms (e.g., decomposition methods, the sequential minimal optimization algorithm) can be used to reduce the computation time of multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction suffers from the data imbalance problem that occurs when the number of instances in one class greatly outnumbers that in another; such data sets often produce a default classifier with a skewed boundary and thus reduced classification accuracy. SVM ensemble learning is one way to cope with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques: it constructs a composite classifier by sequentially training classifiers while increasing the weight of misclassified observations through iterations. Observations incorrectly predicted by previous classifiers are chosen more often than those correctly predicted, so boosting produces new classifiers that are better able to predict the examples on which the current ensemble performs poorly. In this way, it can reinforce the training of misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can consider the geometric mean-based accuracy and errors across classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. Ten-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each ten-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and each set is used in turn as the test set while the classifier trains on the other nine; that is, the cross-validated folds are tested independently for each algorithm. Through these steps, results were obtained for the classifiers on each of the 30 experiments. In terms of arithmetic mean-based prediction accuracy, MGM-Boost (52.95%) shows higher accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different; the results indicate that the performance of MGM-Boost differs significantly from the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
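The geometric mean-based accuracy that distinguishes MGM-Boost from plain accuracy can be sketched as the geometric mean of per-class recalls: a single badly predicted minority class drags the score down, which plain accuracy hides. The rating labels below are invented for illustration.

```python
def geometric_mean_accuracy(y_true, y_pred, classes):
    """Geometric mean of per-class recalls (illustrative sketch of the
    criterion MGM-Boost optimizes in place of plain accuracy)."""
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        hit = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hit / len(idx))
    gm = 1.0
    for r in recalls:
        gm *= r
    return gm ** (1.0 / len(recalls))

# Imbalanced 3-class ratings: majority-class accuracy hides minority errors.
y_true = ["A", "A", "A", "A", "B", "B", "C", "C"]
y_pred = ["A", "A", "A", "A", "B", "A", "C", "A"]
print(round(geometric_mean_accuracy(y_true, y_pred, ["A", "B", "C"]), 3))
```

Here plain accuracy is 6/8 = 0.75, but the geometric mean penalizes the 0.5 recalls of the two minority classes.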

A New Exploratory Research on Franchisor's Provision of Exclusive Territories (가맹본부의 배타적 영업지역보호에 대한 탐색적 연구)

  • Lim, Young-Kyun;Lee, Su-Dong;Kim, Ju-Young
    • Journal of Distribution Research
    • /
    • v.17 no.1
    • /
    • pp.37-63
    • /
    • 2012
  • In franchise business, exclusive sales territory (sometimes EST in tables) protection is a very important issue from economic, social, and political points of view. It affects the growth and survival of both franchisor and franchisee and often raises social and political conflicts. When the franchisee is not familiar with related laws and regulations, the franchisor has a strong incentive to exploit this. Exclusive sales territory protection by the manufacturer and distributors (wholesalers or retailers) means a sales area restriction by which only certain distributors have the right to sell products or services. A distributor who has been granted an exclusive sales territory can protect its own territory but may be prohibited from entering other regions. Even though exclusive sales territory is a critical problem in franchise business, there is little rigorous research about its reasons, results, evaluation, and future direction based on empirical data. This paper tries to address the problem not only in terms of logical and nomological validity but also through empirical validation. In pursuing an empirical analysis, we take into account the difficulties of real data collection and of statistical analysis techniques: instead of the conventional survey method, which is often criticized for measurement error, we use a set of disclosure document data collected by the Korea Fair Trade Commission. Existing theories about exclusive sales territory can be summarized into two groups, as shown in the table below. The first concerns the effectiveness of exclusive sales territory from both the franchisor's and the franchisee's point of view: its outcome can be positive for franchisors but negative for franchisees, and positive in terms of sales but negative in terms of profit, so variables and viewpoints must be set carefully. The second concerns the motives, or reasons why exclusive sales territories are protected.
The reasons can be classified into four groups: industry characteristics, franchise system characteristics, the capability to maintain an exclusive sales territory, and strategic decisions. Within these four groups there are more specific variables and theories, as below. Based on these theories, we develop nine hypotheses, which are briefly shown with the results in the last table below. To test the hypotheses, data were collected from the open government (FTC) homepage. The sample consists of 1,896 franchisors and contains about three years of operating data, from 2006 to 2008. Within the sample, 627 franchisors have an exclusive sales territory protection policy, and those with such a policy are not evenly distributed over the 19 representative industries. Additional data were collected from other government agency homepages, such as Statistics Korea, and combined with various secondary sources to create meaningful variables, as shown in the table below. All variables are dichotomized by mean or median split unless they are inherently dichotomous, since each hypothesis involves multiple variables and there is no solid statistical technique that incorporates all of these conditions at once. This paper uses a simple chi-square test because the hypotheses and theories are built upon quite specific conditions, such as industry type, economic condition, company history, and various strategic purposes; it is almost impossible to find samples satisfying all of them, and they cannot be manipulated in experimental settings. More advanced statistical techniques work well on clean data without exogenous variables, but not on real, complex data. The chi-square test is applied by grouping the samples into four cells with two criteria: whether they use exclusive sales territory protection or not, and whether they satisfy the conditions of each hypothesis.
A hypothesis is supported when the proportion of sample franchisors that satisfy its conditions and protect exclusive sales territory significantly exceeds the proportion that satisfy the conditions but do not. In fact, the chi-square test is equivalent to a Poisson regression, which allows more flexible application. As a result, only three hypotheses are accepted. When attitude toward risk is high, so that the loyalty fee is determined by sales performance, EST protection produces poor results, as expected. When the franchisor protects EST in order to recruit franchisees easily, EST protection produces better results. And when EST protection is intended to improve the efficiency of the franchise system as a whole, it also shows better performance: high efficiency is achieved because EST prohibits free riding by franchisees who exploit others' marketing efforts, encourages proper investment, and distributes franchisees evenly across regions. The other hypotheses are not supported by the significance tests. Exclusive sales territory should be protected for proper motives and administered for mutual benefit. Legal restrictions driven by a government agency like the FTC can be misused and cause misunderstandings, so real practices need more careful monitoring, along with more rigorous studies by both academics and practitioners.
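The 2 × 2 chi-square test described above can be computed directly from the four cell counts. The counts below are hypothetical, not drawn from the study's 1,896-franchisor sample.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table
    [[a, b], [c, d]], without continuity correction."""
    n = a + b + c + d
    num = n * (a * d - b * c) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    return num / den

# Hypothetical counts: rows = satisfies the hypothesis condition or not,
# columns = protects EST or not.
stat = chi_square_2x2(120, 80, 70, 130)
print(round(stat, 2))  # compare with the 3.84 critical value at alpha = 0.05, df = 1
```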
