• Title/Summary/Keyword: Simulated Data


The impact of the change in the splitting method of decision trees on the prediction power (의사결정나무의 분기법 변화가 예측력에 미치는 영향)

  • Chang, Youngjae
    • The Korean Journal of Applied Statistics / v.35 no.4 / pp.517-525 / 2022
  • In the era of big data, various data mining techniques have been proposed as major analysis methodologies. As complex and diverse data are mass-produced, data mining techniques have attracted attention as a foundation of data science. In this paper, we focus on the decision tree, a representative data mining method that is frequently used in practice and easy to understand. Specifically, we analyze the effect of the splitting method of decision trees on model performance. We compare the prediction power and structure of decision tree models with different split methods on various simulated data sets. The results show that the linear combination split method can improve the prediction accuracy of decision trees when the data are simulated from nonlinear models with complex structure.
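
To make the contrast between split methods concrete, the following minimal sketch compares the best single-variable (axis-parallel) split with one linear combination split on a toy grid. The data, the class boundary `x1 + x2 > 0`, and the fixed weights `(1, 1)` are assumptions chosen for illustration; they are not the simulation design or the split-search procedure of the paper.

```python
from itertools import product


def stump_accuracy(values, labels, threshold):
    """Accuracy of the rule 'predict 1 if value > threshold' (or its mirror image)."""
    preds = [1 if v > threshold else 0 for v in values]
    correct = sum(p == y for p, y in zip(preds, labels))
    # A split may assign either class to either side, so take the better orientation.
    return max(correct, len(labels) - correct) / len(labels)


def best_split_accuracy(values, labels):
    """Best accuracy over all thresholds placed at observed values."""
    return max(stump_accuracy(values, labels, t) for t in sorted(set(values)))


# Toy grid (assumption): the class depends on x1 + x2, a diagonal boundary.
grid = [i * 0.5 for i in range(-4, 5)]
points = [(a, b) for a, b in product(grid, grid) if a + b != 0]
labels = [1 if a + b > 0 else 0 for a, b in points]

axis_x1 = best_split_accuracy([a for a, _ in points], labels)      # split on x1 only
axis_x2 = best_split_accuracy([b for _, b in points], labels)      # split on x2 only
oblique = best_split_accuracy([a + b for a, b in points], labels)  # split on x1 + x2

print(f"x1 only: {axis_x1:.3f}, x2 only: {axis_x2:.3f}, x1 + x2: {oblique:.3f}")
```

On this toy grid a single split on `x1 + x2` separates the classes, while no single split on `x1` or `x2` alone can; an axis-parallel tree would need several levels to approximate the same boundary.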

Estimating the Total Precipitation Amount with Simulated Precipitation for Ungauged Stations in Jeju Island (미계측 관측 강수 자료 생성을 통한 제주도 지역의 수문총량 추정)

  • Kim, Nam-Won;Um, Myoung-Jin;Chung, Il-Moon;Heo, Jun-Haeng
    • Journal of Korea Water Resources Association / v.45 no.9 / pp.875-885 / 2012
  • In this study, the total precipitation amount in Jeju Island was estimated by spatial precipitation analysis, using simulated precipitation for ungauged stations with missing precipitation data. The missing data were generated through a modified multiple linear regression, and the spatial precipitation analysis was conducted with PRISM (Parameter-elevation Regression on Independent Slope Model). The data generated with the modified multiple linear regression model show a pattern similar to that of the original data. Thus, the model shows good applicability for estimating missing data. The difference in annual average precipitation between Case 1 (original data) and Case 2 (modified data) is very small, at about 1.5%. However, the difference in annual average precipitation by elevation is large, at up to 37.4%. As a result, the method of estimating missing data in this study would be useful for calculating the total precipitation amount in areas with low station density and in places with high spatial variation of precipitation.

A new method for calculating quantiles of grouped data based on the frequency polygon (집단화된 통계자료의 도수다각형에 근거한 새로운 분위수 계산법)

  • Kim, Hyuk Joo
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.383-393 / 2017
  • When we deal with grouped statistical data, it is desirable to use a calculation method that gives a value as close as possible to the true value of a statistic. In this paper, we suggest a new method for calculating the quantiles of grouped data. The main idea of the suggested method is to calculate the data values by partitioning the pentagons that correspond to the class intervals in the frequency polygon, drawn according to the histogram, into parts of equal area. We compared this method with existing methods through simulations using some datasets from introductory statistics textbooks. In the simulation study, we simulated as many data values as given in each class interval using the inverse transform method, on the basis of the distribution whose shape is given by the frequency polygon. Using the sum of squared differences from the quantiles of the simulated data as a criterion, the suggested method was found to perform better than existing methods for almost all quartiles and deciles.
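
For context, the sketch below shows one conventional way to compute a quantile from grouped data: linear interpolation inside the class that contains the target cumulative count, which treats values as uniformly spread within each class of the histogram. It is a baseline of the kind the abstract calls "existing methods", not the frequency-polygon method the paper proposes. The function name, class edges, and frequencies are assumptions for illustration.

```python
def grouped_quantile(edges, freqs, p):
    """Quantile of grouped data by linear interpolation within a class.

    edges: class boundaries, length len(freqs) + 1, in increasing order.
    freqs: number of observations in each class.
    p:     quantile level between 0 and 1.
    """
    n = sum(freqs)
    target = p * n  # cumulative count at which the quantile is reached
    cum = 0         # observations in the classes already passed
    for lo, hi, f in zip(edges[:-1], edges[1:], freqs):
        if f > 0 and cum + f >= target:
            # Assume values are spread uniformly inside the class.
            return lo + (target - cum) / f * (hi - lo)
        cum += f
    return edges[-1]


# Hypothetical grouped data: classes [0,10), [10,20), [20,30) with 2, 6, 2 values.
edges = [0, 10, 20, 30]
freqs = [2, 6, 2]
median = grouped_quantile(edges, freqs, 0.5)
first_quartile = grouped_quantile(edges, freqs, 0.25)
print(median, first_quartile)
```

Here the median falls in the middle class: 3 of its 6 observations are needed to reach a cumulative count of 5, so the interpolated value is halfway through the class.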

Simulated Annealing for Overcoming Data Imbalance in Mold Injection Process (사출성형공정에서 데이터의 불균형 해소를 위한 담금질모사)

  • Dongju Lee
    • Journal of Korean Society of Industrial and Systems Engineering / v.45 no.4 / pp.233-239 / 2022
  • Injection molding is a process in which thermoplastic resin is heated into a fluid state, injected under pressure into the cavity of a mold, and then cooled in the mold to produce a product identical in shape to the cavity. It enables mass production and complex shapes, and various factors such as resin temperature, mold temperature, injection speed, and pressure affect product quality. In the data collected at the manufacturing site, there is a lot of data on good products but little data on defective products, resulting in serious data imbalance. To resolve this data imbalance efficiently, undersampling, oversampling, and composite sampling are usually applied. In this study, we apply oversampling techniques such as random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), and adaptive synthetic sampling (ADASYN), which augment the minority class relative to the majority class, as well as composite sampling, which uses both undersampling and oversampling. For composite sampling, SMOTE+ENN and SMOTE+Tomek were used. Artificial neural network techniques are used to predict product quality. In particular, MLP and RNN are applied, and optimization of various parameters for MLP and RNN is required. In this study, we propose a simulated annealing (SA) technique that optimizes the choice of the sampling method, the minority class ratio for the sampling method, and the batch size and number of hidden layer units of the MLP and RNN. The existing sampling methods and the proposed SA method were compared using accuracy, precision, recall, and F1 score to demonstrate the superiority of the proposed method.
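
The search described in the abstract can be pictured with the generic simulated annealing sketch below, which walks over a discrete grid of sampler and network settings. Everything specific here is an assumption: the candidate values in `SEARCH_SPACE`, the neighbor move, the cooling schedule, and above all `toy_score`, a made-up stand-in for the validation score that a trained MLP or RNN would supply. It is not the authors' procedure.

```python
import math
import random

# Hypothetical search space; the abstract names the tuned quantities but not their ranges.
SEARCH_SPACE = {
    "sampler": ["ROS", "SMOTE", "ADASYN", "SMOTE+ENN", "SMOTE+Tomek"],
    "minority_ratio": [0.2, 0.4, 0.6, 0.8, 1.0],
    "batch_size": [16, 32, 64, 128],
    "hidden_units": [8, 16, 32, 64],
}


def toy_score(cfg):
    """Made-up objective standing in for a validation F1 score (higher is better)."""
    score = 0.5
    score += 0.1 if cfg["sampler"] == "SMOTE+ENN" else 0.0
    score -= abs(cfg["minority_ratio"] - 0.6) * 0.2
    score -= abs(math.log2(cfg["batch_size"]) - 5) * 0.02
    score -= abs(math.log2(cfg["hidden_units"]) - 5) * 0.02
    return score


def neighbor(cfg, rng):
    """Resample one randomly chosen setting to get a nearby configuration."""
    new = dict(cfg)
    key = rng.choice(list(SEARCH_SPACE))
    new[key] = rng.choice(SEARCH_SPACE[key])
    return new


def simulated_annealing(score_fn, rng, steps=500, t0=1.0, cooling=0.99):
    """Maximize score_fn; worse moves are accepted with probability exp(delta / temp)."""
    current = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
    current_score = score_fn(current)
    best, best_score = dict(current), current_score
    temp = t0
    for _ in range(steps):
        cand = neighbor(current, rng)
        cand_score = score_fn(cand)
        delta = cand_score - current_score
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = cand, cand_score
            if current_score > best_score:
                best, best_score = dict(current), current_score
        temp *= cooling  # geometric cooling
    return best, best_score


best_cfg, best_val = simulated_annealing(toy_score, random.Random(0))
print(best_cfg, round(best_val, 3))
```

In a real study each call to the objective would resample the training data, train the network, and return a metric such as F1, so the number of annealing steps is bounded mainly by training cost.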

Application and Comparison of Data Mining Technique to Prevent Metal-Bush Omission (메탈부쉬 누락예방을 위한 데이터마이닝 기법의 적용 및 비교)

  • Sang-Hyun Ko;Dongju Lee
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.3 / pp.139-147 / 2023
  • The metal bush assembly process inserts and compresses a metal bush that serves to reduce noise and provide stable compression in the rotating section. In this process, head diameter defects and placement defects of the metal bush occur due to metal bush omission, non-pressing, and poor press-fitting. Among these causes, this study aims to prevent defects due to omission of the metal bush by using signals from sensors attached to the facility. In particular, metal bush omission is predicted through various data mining techniques using the left load cell value, right load cell value, current, and voltage as independent variables. For the metal bush omission defect, it is difficult to obtain defect data, resulting in data imbalance. Data imbalance refers to a large difference in the number of observations belonging to each class, which can be a problem in classification. To address the problem caused by data imbalance, oversampling and composite sampling techniques were applied in this study. In addition, simulated annealing (SA) was applied to optimize the parameters related to sampling and the hyper-parameters of the data mining techniques used for bush omission prediction. Metal bush omission was predicted using actual data from manufacturing company M, and the classification performance was examined. All applied techniques showed excellent results, and the proposed methods in particular, which combine Random Forest with SA and MLP with SA, showed better results.

Intercomparison of the East-Asian Summer Monsoon on 11-18 July 2004, simulated by WRF, MM5, and RSM models (WRF, MM5, RSM 모형에서 모의한 2004년 7월 11-18일의 동아시아 몬순의 비교)

  • Ham, Su-Ryun;Park, Seon-Joo;Bang, Cheol-Han;Jung, Byoung-Joo;Hong, Song-You
    • Atmosphere / v.15 no.2 / pp.91-99 / 2005
  • This study compares the summer monsoon circulations during a heavy rainfall period over the Korean peninsula from 11 to 18 July 2004, as simulated by three widely used regional models: WRF, MM5, and RSM. An identical model setup is used for all the experiments, except for differences in the physics options of the RSM. The three models, with a nominal resolution of about 50 km over Korea, are nested in NCEP-DOE reanalysis data. Another RSM experiment, with the same cumulus parameterization scheme as in the WRF and MM5, is designed to investigate the importance of the representation of subgrid-scale parameterized convection in reproducing monsoonal circulations in East Asia. All three models are found to be capable of reproducing the general distribution of monsoonal precipitation, extending northeastward from south China across the Korean peninsula to northern Japan. The results from the WRF and MM5 are similar in terms of accumulated precipitation, with slightly better performance in the WRF than in the MM5. The RSM improves the precipitation bias compared with the WRF and MM5, but the pattern correlation is degraded due to overestimation of precipitation in northern China. In the comparison of simulated synoptic-scale features, the RSM is found to reproduce the large-scale features well compared with the MM5 and WRF. On the other hand, the simulated precipitation from the RSM with the convection scheme used in the MM5 and WRF is closer to that from the WRF and MM5 simulations, indicating a significant dependency of simulated precipitation in East Asia on the cumulus parameterization scheme.

Assessment of Estimated Daily Intakes of N-Nitrosamine by Diet (식사를 통한 N-Nitrosamine의 추정 섭취량 평가)

  • 성낙주;신정혜;김연희;이수정;손미예
    • The Korean Journal of Food And Nutrition / v.15 no.1 / pp.29-35 / 2002
  • N-nitrosamine (NA) contents depending on simulated gastric digestion were analyzed in 12 kinds of diets collected from institutional food service. For these diets, the total NA amounts were estimated, including both direct intake from food and endogenous formation in the human body as represented by simulated gastric digestion. NA was determined in dishes of meats, fishes, and vegetables before and after simulated gastric digestion. Before digestion, N-nitrosodimethylamine (NDMA) contents ranged from not detected (ND) to 4.8 $\mu\textrm{g}$/kg in dishes of meats and fishes. After digestion, its contents increased, and the highest level was 3.0 $\mu\textrm{g}$/kg in pan-broiled dried anchovy. In vegetable dishes, NDMA was detected at ND∼trace levels before and after digestion. The contents of NDMA in diets collected from institutional food service were 0.20∼0.78 $\mu\textrm{g}$/kg and 0.43∼0.80 $\mu\textrm{g}$/kg before and after digestion, respectively. The average daily intake of NA for Koreans, based on the above data, was 0.60∼2.34 $\mu\textrm{g}$/day/person. The maximum daily intake of NA was estimated at 5.15 $\mu\textrm{g}$/day/person when considering the NA formed endogenously by simulated gastric digestion.

Studies on the Development of Storage Tank Model for both Long and Short Terms Runoff (II) (장단기유출 양용저유 탱크 모델의 개발에 관한 연구 (II))

  • 이순혁;박명근
    • Magazine of the Korean Society of Agricultural Engineers / v.33 no.2 / pp.51-60 / 1991
  • The main objective of this study is to examine the applicability to a large watershed of the storage tank model, which can be used to analyze both long- and short-term runoff and was developed on the basis of hydrologic data for a small mountainous watershed. The results obtained in this study are summarized as follows: 1. Areal rainfalls of the Dae Chong watershed were calculated by the Thiessen method using 9 Thiessen networks. 2. Optimal parameters for the two types of tank model, Model A and Model B, were derived through calibration by the standardized Powell method. 3. Monthly simulated flows of Model B were closer to the monthly observed flows than those of Model A during the calibration period for long-term runoff. 4. Relative errors of the simulated flood flows with respect to the observed flows were lower for Model B than for Model A during the calibration period for short-term runoff. 5. Daily simulated hydrographs of Model B were closer to the daily observed hydrographs than those of Model A during the verification period for long-term runoff. In the correlation analysis between annual observed and annual simulated runoff, Model B was highly significant in comparison with Model A. 6. Reproducibility of simulated flows was generally better for Model B than for Model A during the calibration period for short-term runoff. 7. It can be concluded that the reproducibility of Model B is superior to that of Model A for both long- and short-term runoff, even in a large watershed, as was the case for the small one. 8. Of the two models, which were developed from the characteristics of a small watershed for both long- and short-term runoff, Model B was verified to be more applicable to the large watershed than Model A. 9. Further study is desirable to establish a suitable tank model, through the determination and calibration of the initial parameters of the tank model and through additional application to other watersheds with different watershed and meteorological characteristics.
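
Two of the quantities named in the results above can be written out as a short sketch: areal rainfall by the Thiessen method, which is an area-weighted mean of the gauge readings, and the relative error of a simulated flow against the observed one. The gauge values, polygon areas, and flows are hypothetical; the study itself used 9 Thiessen networks for the Dae Chong watershed, and the exact error definition used there is not given in the abstract.

```python
def thiessen_areal_rainfall(gauge_rain_mm, polygon_area_km2):
    """Area-weighted mean rainfall: each gauge is weighted by its Thiessen polygon area."""
    total_area = sum(polygon_area_km2)
    weighted = sum(r * a for r, a in zip(gauge_rain_mm, polygon_area_km2))
    return weighted / total_area


def relative_error_percent(simulated, observed):
    """Absolute relative error of a simulated value, as a percentage of the observed value."""
    return abs(simulated - observed) / observed * 100.0


# Hypothetical example with three gauges instead of the nine used in the study.
areal = thiessen_areal_rainfall([10.0, 20.0, 30.0], [1.0, 1.0, 2.0])
err = relative_error_percent(simulated=110.0, observed=100.0)
print(areal, err)
```

The third gauge covers half of the total area, so it pulls the areal value above the simple mean of the three readings.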


Consideration of root position in virtual tooth setup for extraction treatment: A comparative study of simulated and actual treatment results

  • Mirinae Park;Veerasathpurush Allareddy;Phimon Atsawasuwan;Min Kyeong Lee;Kyungmin Clara Lee
    • The Korean Journal of Orthodontics / v.53 no.1 / pp.26-34 / 2023
  • Objective: The purpose of the present study was to compare the root positions in virtual tooth setups using only crowns in a simulated treatment with those achieved in the actual treatment. Methods: Pre- and post-treatment intraoral and corresponding cone beam computed tomography (CBCT) scans were obtained from 15 patients who underwent orthodontic treatment with premolar extraction. A conventional virtual tooth setup was used for the treatment simulation. Pre- and post-treatment three-dimensional digital tooth models were fabricated by integrating the patients' intraoral and CBCT scans. The simulated root positions in the virtual setup were obtained by merging the crown in the virtual setup and root in the pre-treatment tooth model. The root positions of the simulated and actual post-treatment tooth models were compared. Results: Differences in root positions between the simulated and actual models were > 1 mm in all teeth, and statistically significant differences were observed (p < 0.05), except for the maxillary lateral incisors. The differences in the inter-root angulation were > 1° in all teeth, and statistically significant differences were observed in the maxillary and mandibular canines. Conclusions: The virtual tooth setup using only crown data showed errors over the clinical limits. The clinical application of a virtual setup using crowns and roots is necessary for accurate and precise treatment simulation, particularly in extraction treatment.

Concordance of Three International Guidelines for Thyroid Nodules Classified by Ultrasonography and Diagnostic Performance of Biopsy Criteria

  • Younghee Yim;Dong Gyu Na;Eun Ju Ha;Jung Hwan Baek;Jin Yong Sung;Ji-hoon Kim;Won-Jin Moon
    • Korean Journal of Radiology / v.21 no.1 / pp.108-116 / 2020
  • Objective: To investigate the concordance of three international guidelines: the Korean Thyroid Association/Korean Society of Thyroid Radiology, American Thyroid Association, and American College of Radiology for thyroid nodules classified by ultrasonography (US) and the diagnostic performance of simulated size criteria for malignant biopsies. Materials and Methods: A total of 2586 thyroid nodules (≥ 1 cm) were collected from two multicenter study datasets. The classifications of the thyroid nodules were based on three different guidelines according to US categories for malignancy risk, and the concordance rate between the different guidelines was calculated for the classified nodules. In addition, the diagnostic performance of criteria related to four different simulated biopsy sizes was evaluated. Results: The concordance rate of nodules classified as high- or intermediate-suspicion was high (84.1-100%), but low-suspicion or mildly-suspicious nodules exhibited relatively low concordance (63.8-83.8%) between the three guidelines. The differences in sensitivity, specificity, and accuracy between the guidelines were 0.7-19.8%, 0-40.9%, and 0.1-30.5%, respectively, when the original biopsy criteria were applied. The differences decreased to 0-5.9%, 0-10.9%, and 0.1-8.2%, respectively, when simulated, similar biopsy size criteria were applied. The unnecessary biopsy rate calculated with the original criteria (0-33.8%), decreased with the simulated biopsy size criteria (0-8.7%). Conclusion: We found a high concordance between the three guidelines for high- or intermediate-suspicion nodules, and the diagnostic performance of the biopsy criteria was approximately equivalent for each simulated size criterion. The difference in diagnostic performance between the three guidelines is mostly influenced by the various size thresholds for biopsies.
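
The measures compared in this abstract follow standard definitions, sketched below from confusion-matrix counts, together with a simple concordance rate between two guidelines' categories for the same nodules. The counts and category labels are hypothetical and are not taken from the study; the unnecessary biopsy rate is left out because the abstract does not state how it was defined.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Sensitivity, specificity, and accuracy from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),  # malignant nodules correctly selected for biopsy
        "specificity": tn / (tn + fp),  # benign nodules correctly not selected
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


def concordance_rate(categories_a, categories_b):
    """Share of nodules that two guidelines place in the same category."""
    if len(categories_a) != len(categories_b):
        raise ValueError("both guidelines must classify the same nodules")
    same = sum(a == b for a, b in zip(categories_a, categories_b))
    return same / len(categories_a)


# Hypothetical counts and labels, for illustration only.
metrics = diagnostic_metrics(tp=80, fp=30, tn=70, fn=20)
agreement = concordance_rate(
    ["high", "intermediate", "low", "low"],
    ["high", "intermediate", "low", "intermediate"],
)
print(metrics, agreement)
```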