• Title/Abstract/Keywords: data sets


Calculation of a Threshold for Decision of Similar Features in Different Spatial Data Sets (이종의 공간 데이터 셋에서 매칭 객체 판별을 위한 임계값 산출)

  • Kim, Jiyoung;Huh, Yong;Yu, Kiyun;Kim, Jung Ok
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.31 no.1 / pp.23-28 / 2013
  • Feature matching between two different spatial data sets resembles binary classification into matching and non-matching classes. In this paper, we calculated a threshold by applying the equal error rate (EER), which is widely used in biometrics, where classification is a central topic, to spatial data sets. As the threshold is changed progressively, the number of matching pairs changes, so precision and recall change and a trade-off appears between these two indexes. This trade-off point is the EER, which is taken as the threshold. Applying this method to training data gave an estimated threshold of 0.802 in shape similarity. Applying the estimated threshold to test data yielded a high F-measure, the evaluation index of the matching method, of 0.940. We therefore confirmed that the EER yields an accurate threshold without human intervention and that it is appropriate for matching different spatial data sets.
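
The threshold search described in this abstract can be illustrated with a minimal sketch. It assumes each candidate pair carries a similarity score and a known match/non-match label, and it picks the candidate threshold at which precision and recall are closest. The function names `precision_recall` and `balance_threshold` and the toy scores are hypothetical and are not taken from the paper.

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall when pairs scoring >= threshold are called matches."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def balance_threshold(scores, labels):
    """Candidate threshold at which precision and recall are closest."""
    def gap(t):
        p, r = precision_recall(scores, labels, t)
        return abs(p - r)
    # Every observed score is tried as a threshold (an assumption of this sketch).
    return min(sorted(set(scores)), key=gap)


# Toy training data: similarity scores and whether the pair truly matches.
scores = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40]
labels = [True, True, False, True, False, False]
threshold = balance_threshold(scores, labels)
```

On real data the two curves rarely cross exactly at an observed score, so interpolating between neighbouring thresholds may be preferable to taking the nearest candidate.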

Eddy Momentum, Heat, and Moisture Transports During the Boreal Winter: Three Reanalysis Data Comparison (북반구 겨울철 에디들에 의한 운동량, 열 그리고 수분 수송: 세 가지 재분석 자료 비교)

  • Moon, Hyejin;Ha, Kyung-Ja
    • Atmosphere / v.26 no.4 / pp.649-663 / 2016
  • This study investigates eddy transports of momentum, heat, and moisture in space and time during boreal winter, with emphasis on comparing three reanalysis data sets: ERA-Interim from the European Centre for Medium-Range Weather Forecasts (ECMWF), NCEP2 from the National Centers for Environmental Prediction and the Department of Energy (NCEP-DOE), and JRA-55 from the Japan Meteorological Agency (JMA). Eddy momentum transports are strongest in ERA-Interim among the three data sets, which may arise mainly because both the zonally averaged meridional and zonal winds tend to follow the ordering ERA-Interim, NCEP2, JRA-55. For heat and moisture eddy transports, by contrast, NCEP2 is the strongest, implying that zonally averaged air temperature (specific humidity) tends to follow the ranking NCEP2, ERA-Interim, JRA-55 (NCEP2, JRA-55, ERA-Interim); the exception is the transient eddy heat transport, which involves both meridional wind and air temperature and is strongest in ERA-Interim. Examining the stationary and transient eddy transports in terms of space and time correlation and the intensity of standard deviation shows that the correlation (intensity of standard deviation) influences the structure (magnitude) of the eddy transports. Among the three data sets, the space correlation is most similar between ERA-Interim and NCEP2, and the time correlation between ERA-Interim and JRA-55. The reanalysis data sets resemble one another more closely in space correlation than in time correlation.
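
For reference, a standard way to separate the two eddy types named in this abstract is the decomposition below of the zonal-mean, time-mean meridional flux. The abstract does not state the formulation the authors used, so the notation is an assumption: an overbar denotes a time mean and a prime the deviation from it, square brackets denote a zonal mean and an asterisk the deviation from it, $v$ is the meridional wind, and $T$ is air temperature.

$$[\overline{vT}] = [\bar{v}][\bar{T}] + [\bar{v}^{*}\bar{T}^{*}] + [\overline{v'T'}]$$

The three terms on the right are the mean meridional circulation, stationary eddy, and transient eddy contributions. Replacing $T$ with the zonal wind or specific humidity gives the momentum and moisture analogues.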

Evaluation of Ground-Truth Results of Radar Rainfall Depending on Rain-Gauge Data (우량계 강우 자료에 따른 레이더 강우의 지상보정 결과 검토)

  • Kim, Byoung-Soo;Kim, Kyoung-Jun;Yoo, Chul-Sang
    • Journal of the Korean Society of Hazard Mitigation / v.7 no.4 / pp.19-29 / 2007
  • This study compares various ground-truth designs of radar rainfall using rain-gauge data sets from the Korea Meteorological Administration (KMA), AWS, and the Ministry of Construction and Transportation (MOCT). These rain-gauge data sets were compared with the Mt. Gwanak radar rainfall data for the same period, and the differences between the two observed rainfalls were evaluated in terms of bias. The study also investigated possible differences in bias due to different storm characteristics. The results showed no distinct differences among the biases from the three rain-gauge data sets, but some differences in their statistical characteristics. Overall, the design bias from MOCT was estimated to be the smallest among the three rain-gauge data sets. Among the three storm events considered, the jangma, which had the highest spatial intermittency, showed the smallest bias.

Heterogeneous Ensemble of Classifiers from Under-Sampled and Over-Sampled Data for Imbalanced Data

  • Kang, Dae-Ki;Han, Min-gyu
    • International journal of advanced smart convergence / v.8 no.1 / pp.75-81 / 2019
  • The data imbalance problem is common and causes serious problems in machine learning. Sampling is one of the effective methods for addressing it. Over-sampling increases the number of instances, so on imbalanced data it is applied to minority instances. Under-sampling reduces the number of instances and is usually performed on majority data. We apply under-sampling and over-sampling to imbalanced data to generate sampled data sets. From the sampled data sets and the original data set, we construct a heterogeneous ensemble of classifiers. We apply five different algorithms to the heterogeneous ensemble. Experimental results on an intrusion detection data set, used as an imbalanced data set, show that our approach is effective.
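
The data-generation step in this abstract can be sketched as follows. The abstract does not say which sampling method was used, so plain random duplication and random subset selection are assumed here; the helper names and the 90/10 toy data are hypothetical.

```python
import random


def random_oversample(minority, target_size, rng):
    """Duplicate randomly chosen minority instances until target_size is reached."""
    extra = [rng.choice(minority) for _ in range(target_size - len(minority))]
    return list(minority) + extra


def random_undersample(majority, target_size, rng):
    """Keep a random subset of the majority instances of size target_size."""
    return rng.sample(list(majority), target_size)


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Toy imbalanced data: (feature, label) pairs, 90 majority and 10 minority.
majority = [(x, 0) for x in range(90)]
minority = [(x, 1) for x in range(10)]

original = majority + minority
over = majority + random_oversample(minority, len(majority), rng)
under = random_undersample(majority, len(minority), rng) + minority

# One training set per sampling strategy; each could train a different
# classifier whose predictions are then combined in an ensemble.
training_sets = {"original": original, "over": over, "under": under}
```

Each entry of `training_sets` has a different class balance, which is what allows the classifiers trained on them to disagree usefully when their votes are combined.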

Dose-Response Relationship of Avian Influenza Virus Based on Feeding Trials in Humans and Chickens (조류인플루엔자 바이러스의 양-반응 모형)

  • Pak, Son-Il;Lee, Jae-Yong;Jeon, Jong-Min
    • Journal of Veterinary Clinics / v.28 no.1 / pp.101-107 / 2011
  • This study aimed to determine the dose-response (DR) curve of avian influenza (AI) virus in order to predict the probability of illness or adverse health effects that may result from exposure to a pathogenic microorganism in a quantitative microbial risk assessment. To determine the parametric DR relationship of several strains of AI virus, 7 feeding trial data sets challenging humans (5 sets) and chickens (2 sets) with strains of H3N2 (4 sets), H5N1 (2 sets), and H1N1 (1 set) were taken from the published literature. Except for one data set (data set no. 6, a study with intra-tracheal inoculation), all were obtained from studies with intranasal inoculation. The data were analyzed using three types of DR model on the basis of heterogeneity in the infectivity of AI strains in humans and chickens: exponential, beta-binomial, and beta-Poisson. The models were fitted to the data by maximum likelihood estimation to obtain the parameter estimates of each model. The alpha and beta values of the beta-Poisson DR model ranged from 0.06 to 0.19 and from 1.7 to 48.8, respectively, for the H3N2 strain. The corresponding values for H5N1 ranged from 0.464 to 0.563 and from 97.3 to 99.4, respectively. For H1N1 the parameter values were 0.103 and 12.7, respectively. Using the exponential model, r (the infectivity parameter) ranged from $1.6 \times 10^{-8}$ to $1.2 \times 10^{-5}$ for H3N2 and from $7.5 \times 10^{-3}$ to $4.0 \times 10^{-2}$ for H5N1, while the value was $1.6 \times 10^{-8}$ for H1N1. The beta-Poisson DR model provided the best fit to five of the 7 data sets tested, and the estimated parameter values of the beta-binomial model were very close to those of the beta-Poisson model. Our study indicated that the beta-binomial or beta-Poisson model could be the choice for DR modeling of AI, even though the DR relationship varied with the virus strain studied, as indicated in prior studies. Further DR modeling should be conducted to quantify the differences among AI virus strains.
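
Two of the model families named in this abstract can be written down in a short sketch. It uses the forms common in quantitative microbial risk assessment: the exponential model $1 - e^{-r d}$ and the approximate beta-Poisson model $1 - (1 + d/\beta)^{-\alpha}$. Whether the paper used exactly these forms is an assumption, and the dose in the example is hypothetical; only the H1N1 parameter values come from the abstract.

```python
import math


def exponential_dr(dose, r):
    """Exponential dose-response model: P(response) = 1 - exp(-r * dose)."""
    return 1.0 - math.exp(-r * dose)


def beta_poisson_dr(dose, alpha, beta):
    """Approximate beta-Poisson model: P(response) = 1 - (1 + dose/beta)^(-alpha)."""
    return 1.0 - (1.0 + dose / beta) ** (-alpha)


# H1N1 beta-Poisson parameters as reported in the abstract.
H1N1_ALPHA, H1N1_BETA = 0.103, 12.7

# Hypothetical dose, chosen only to exercise the functions.
example_dose = 1.0e3
p_h1n1 = beta_poisson_dr(example_dose, H1N1_ALPHA, H1N1_BETA)
```

Both functions return 0 at zero dose and increase monotonically toward 1, which is the basic shape a DR curve needs before any parameters are fitted.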

Analysis of Longitudinal Dispersion Coefficient : Part II. Development of New Dispersion Coefficient Equation (종확산계수에 관한 연구 : II. 새로운 종확산계수 추정식 개발)

  • 서일원;정태성
    • Water for future / v.28 no.4 / pp.195-204 / 1995
  • A new dispersion coefficient equation has been developed that estimates the dispersion coefficient using only hydraulic data easily obtained in natural streams. Dimensional analysis was performed to select physically meaningful parameters, and the One-Step Huber method, a nonlinear multi-regression method, was applied to derive a regression equation for the dispersion coefficient. Fifty-nine measured hydraulic data sets, collected in 26 streams in the United States and analyzed in Part I of this study, were used to develop the new equation. Of these 59 data sets, 35 were used to derive the regression equation and 24 were used for verification. The new dispersion coefficient equation proved superior to existing dispersion coefficient equations in explaining the dispersion characteristics of natural streams.


AN APPROPRIATE INFLOW MODEL FOR SIMULTANEOUS DISSOLUTION AND DEGRADATION

  • Lee, Ju-Hyun;Kang, Sung-Kwon;Choi, Hoo-Kyun
    • Honam Mathematical Journal / v.31 no.1 / pp.109-124 / 2009
  • Based on observed release data for Clarithromycin, three commonly used inflow models are considered: the power, the exponential, and the logarithmic models. Among them, the power model is the most used in practice because of its simplicity. The parameters appearing in the model equations are estimated using numerical parameter estimation techniques. Numerical estimation on several experimental data sets shows the exponential model to be the best of the three. More specifically, compared with the power and logarithmic models, the exponential model reduces the sum of squares of absolute errors and the sum of squares of relative errors by 80-95% for the experimental data sets and by 60-90% for the noise-added data sets. A typical experimental data set is used in this paper to illustrate the estimation method and its numerical results. The proposed numerical method and its algorithm are designed for estimating the parameters appearing in model differential equations whose exact solution form is generally unknown. The methodology can be applied to more general cases, such as nonlinear ordinary differential equations or partial differential equations.

ConvXGB: A new deep learning model for classification problems based on CNN and XGBoost

  • Thongsuwan, Setthanun;Jaiyen, Saichon;Padcharoen, Anantachai;Agarwal, Praveen
    • Nuclear Engineering and Technology / v.53 no.2 / pp.522-531 / 2021
  • We describe a new deep learning model, Convolutional eXtreme Gradient Boosting (ConvXGB), for classification problems, based on convolutional neural nets and Chen et al.'s XGBoost. In addition to image data, ConvXGB supports general classification problems through a data preprocessing module. ConvXGB consists of several stacked convolutional layers that learn the features of the input automatically, followed by XGBoost in the last layer to predict the class labels. The ConvXGB model is simplified by reducing the number of parameters under appropriate conditions, since it is not necessary to re-adjust the weight values in a back propagation cycle. Experiments on several data sets from UCL Repository, including images and general data sets, showed that for all the tested data sets our model handled the classification problems slightly better than CNN and XGBoost alone, and sometimes significantly better.

Study on Rainfall Characteristics for the Millimeter-wave Communication Systems-Comparisons of Rainfall rate data from Several observation methods.

  • Chung, H.S.;Song, B.H.;Lee, J.H.;Park, K.M.;Lee, K.A.
    • Proceedings of the KSRS Conference / 1999.11a / pp.132-134 / 1999
  • Rainfall characteristics for designing optimum millimeter-wave communication systems were analyzed from two rainfall data sets. The two data sets compared were one-minute rainfall rate data and one-hour synoptic observation data. The data sets differ in observation method and sampling frequency. We looked for tendencies and agreement in quality between the two data sets. We present several results obtained by applying a millimeter-wave attenuation model to the one-minute rainfall data. A climatological one-minute rainfall rate data set over the Korean Peninsula will be made after a data quality control procedure.


An Efficient Algorithm for Updating Discovered Association Rules in Data Mining (데이터 마이닝에서 기존의 연관규칙을 갱신하는 효율적인 앨고리듬)

  • 김동필;지영근;황종원;강맹규
    • Journal of Korean Society of Industrial and Systems Engineering / v.21 no.45 / pp.121-133 / 1998
  • This study proposes an efficient algorithm for updating discovered association rules in a large database. A database may be updated frequently or occasionally, and such updates may not only invalidate some existing strong association rules but also turn some weak rules into strong ones. FUP and DMI efficiently update strong association rules in the whole updated database by reusing the information of the old large item-sets. Moreover, these algorithms use a pruning technique to reduce the database size during the update process. This study likewise updates strong association rules efficiently in the whole updated database by reusing the information of the old large item-sets. Because it is difficult to find the new set of large item-sets in the whole updated database after an incremental database is added to the original database, the proposed updating algorithm generates all candidate item-sets at once from the incremental database. This method of generating candidate item-sets differs from that of FUP and DMI. After all candidate item-sets are generated, if a candidate item-set is large in the incremental database, the original database is scanned and the support of that item-set is updated. In this way, all large item-sets in the whole updated database are found. The proposed updating algorithm does not use a pruning technique to reduce the database size during the update process. As a result, the proposed algorithm updates the discovered large item-sets quickly and efficiently.
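
The update idea in this abstract can be sketched in a simplified form. The sketch relies on the fact that an item-set that is large in the whole updated database must be large in the original database or in the incremental database. All function names, the `max_size` cap, and the toy transactions are hypothetical; this is not the paper's algorithm, nor FUP or DMI.

```python
from itertools import combinations


def count_support(transactions, itemsets):
    """Number of transactions that contain each item-set."""
    return {s: sum(1 for t in transactions if s <= t) for s in itemsets}


def candidates_from(transactions, max_size):
    """All item-sets of up to max_size items that occur in the transactions."""
    found = set()
    for t in transactions:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(t), k):
                found.add(frozenset(combo))
    return found


def update_large_itemsets(old_large, original, increment, min_support, max_size=3):
    """old_large maps each item-set that was large in `original` to its count there."""
    total = len(original) + len(increment)
    inc_counts = count_support(increment, candidates_from(increment, max_size))
    # Old large item-sets: reuse the stored counts and add the increment counts.
    merged = {s: c + inc_counts.get(s, 0) for s, c in old_large.items()}
    # Item-sets newly large in the increment: scan the original database for them.
    newly = [s for s, c in inc_counts.items()
             if s not in old_large and c >= min_support * len(increment)]
    orig_counts = count_support(original, newly)
    for s in newly:
        merged[s] = orig_counts[s] + inc_counts[s]
    return {s: c for s, c in merged.items() if c >= min_support * total}


original = [{"a", "b"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
increment = [{"a", "c"}, {"a", "c"}]
old_large = {frozenset("a"): 3, frozenset("b"): 3,
             frozenset("c"): 2, frozenset("ab"): 2}
result = update_large_itemsets(old_large, original, increment, min_support=0.5)
```

With a minimum support of 50%, the item-set {a, b} falls out of the large set after the update while {a, c} enters it, which mirrors the abstract's point that updates can weaken some rules and strengthen others.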
