• Title/Summary/Keyword: Missing Values


A Study on the Selection of Candidates for Substances Subject to Permission Using Chemicals Ranking and Scoring (CRS) (Korean title: A Study on Selecting Candidate Substances Subject to Permission Using the Chemical Prioritization Technique (CRS))

  • Kim, Hyo-dong; Park, Kyo-shik
    • Journal of Korean Society of Occupational and Environmental Hygiene / v.32 no.3 / pp.253-267 / 2022
  • Objectives: This study was performed to assess whether the Chemical Ranking and Scoring (CRS) system is an appropriate method for identifying candidates for substances subject to permission, and to apply it to the selection of such candidates. Methods: A risk score was obtained by multiplying the hazard score by the exposure score, and substances were then ranked by risk score. The hazard sub-indicators are carcinogenicity, germ cell mutagenicity, reproductive toxicity, specific target organ toxicity (repeated exposure), respiratory sensitization, and endocrine disruption. The exposure sub-indicators are persistence, bioaccumulation, and emission volume. A sensitivity analysis was performed for missing values. To confirm that CRS is an appropriate method, correlation analysis and multivariable linear regression analysis were performed among hazard, exposure, and risk. Results: The sensitivity analysis showed that the risk ranking was not sensitive to missing values. Correlation and regression analyses showed that exposure had a greater effect on risk than hazard. Conclusions: The CRS system, which derives a risk score from hazard and exposure scores, is judged to be appropriate for the preliminary selection of candidates for substances subject to permission. Benzene, cadmium, nickel, and cobalt were selected as priority candidates for substances subject to permission.
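The scoring step in Methods (risk = hazard × exposure, then ranking) and the missing-value sensitivity check can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the sub-indicator names, the 0-3 score scale, the additive aggregation of sub-indicators, and the choice to bound each missing value by filling it with the lowest and then the highest possible score are all hypothetical. Rank stability is summarized with a Spearman correlation.

```python
# Minimal CRS-style ranking sketch; indicator names, the 0-3 scale and additive
# aggregation are hypothetical assumptions, not the study's scoring rules.
HAZARD_KEYS = ["carcinogenicity", "mutagenicity", "reproductive", "stot_re",
               "resp_sensitization", "endocrine"]
EXPOSURE_KEYS = ["persistence", "bioaccumulation", "emission"]
MIN_SCORE, MAX_SCORE = 0, 3  # assumed score range per sub-indicator


def score(values, fill):
    """Sum sub-indicator scores, replacing missing entries (None) with `fill`."""
    return sum(fill if v is None else v for v in values)


def rank_by_risk(substances, fill):
    """Order substance names by risk = hazard score * exposure score, highest first."""
    risk = {}
    for name, ind in substances.items():
        hazard = score([ind[k] for k in HAZARD_KEYS], fill)
        exposure = score([ind[k] for k in EXPOSURE_KEYS], fill)
        risk[name] = hazard * exposure
    return sorted(risk, key=lambda n: risk[n], reverse=True)


def spearman(order_a, order_b):
    """Spearman rank correlation between two orderings of the same items (no ties)."""
    n = len(order_a)
    pos_b = {name: i for i, name in enumerate(order_b)}
    d2 = sum((i - pos_b[name]) ** 2 for i, name in enumerate(order_a))
    return 1 - 6 * d2 / (n * (n * n - 1))


# Illustrative data only; None marks a missing sub-indicator value.
substances = {
    "A": dict(carcinogenicity=3, mutagenicity=2, reproductive=2, stot_re=1,
              resp_sensitization=0, endocrine=None,
              persistence=2, bioaccumulation=1, emission=3),
    "B": dict(carcinogenicity=1, mutagenicity=None, reproductive=1, stot_re=1,
              resp_sensitization=0, endocrine=0,
              persistence=1, bioaccumulation=1, emission=1),
    "C": dict(carcinogenicity=2, mutagenicity=1, reproductive=None, stot_re=2,
              resp_sensitization=1, endocrine=1,
              persistence=2, bioaccumulation=2, emission=2),
}

# Sensitivity analysis: bound each missing value by the lowest and highest score.
low = rank_by_risk(substances, fill=MIN_SCORE)
high = rank_by_risk(substances, fill=MAX_SCORE)
rho = spearman(low, high)
print(low, high, rho)
```

A correlation near 1 between the two bounding rankings would indicate, in the spirit of the reported result, that the ranking does not depend much on how missing values are filled.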

A novel nomogram of naïve Bayesian model for prevalence of cardiovascular disease

  • Kang, Eun Jin; Kim, Hyun Ji; Lee, Jea Young
    • Communications for Statistical Applications and Methods / v.25 no.3 / pp.297-306 / 2018
  • Cardiovascular disease (CVD) is the leading cause of death worldwide and has a high mortality rate after onset; CVD management therefore requires both the development of treatment plans and the prediction of prevalence rates. In our study, age, income, education level, marital status, diabetes, and obesity were identified as risk factors for CVD. Using these six factors, we proposed a nomogram for CVD based on a naïve Bayesian classifier model. Each attribute of each factor was assigned a point value between -100 and 100 via Bayes' theorem, with the sign indicating whether the attribute is negatively or positively associated with CVD. In addition, the prevalence rate can be calculated even when some attribute values are missing. The nomogram was verified using a receiver operating characteristic (ROC) curve and a calibration plot. Consequently, when the attribute values for these risk factors are known, the prevalence rate of CVD can be predicted using the proposed nomogram based on a naïve Bayesian classifier model.
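A naïve Bayes nomogram can tolerate missing attributes because, under conditional independence, each known attribute adds its own log-likelihood ratio to the prior log-odds, and an unknown attribute simply contributes nothing. The sketch below assumes that points are log-likelihood ratios rescaled so that the largest absolute value equals 100; the study's exact scaling, the factor levels, the counts, and the baseline prevalence used here are hypothetical.

```python
import math


def attribute_log_lr(counts_case, counts_noncase):
    """ln[P(level | CVD) / P(level | no CVD)] for each level of one factor.
    Zero counts are not handled; a real model would need smoothing."""
    n_case = sum(counts_case.values())
    n_non = sum(counts_noncase.values())
    return {lvl: math.log((counts_case[lvl] / n_case) / (counts_noncase[lvl] / n_non))
            for lvl in counts_case}


def build_nomogram(table):
    """table: factor -> (case counts, non-case counts).
    Returns log-likelihood ratios and points rescaled to [-100, 100]."""
    log_lr = {f: attribute_log_lr(cc, cn) for f, (cc, cn) in table.items()}
    max_abs = max(abs(v) for levels in log_lr.values() for v in levels.values())
    points = {f: {lvl: 100 * (v / max_abs) for lvl, v in levels.items()}
              for f, levels in log_lr.items()}
    return log_lr, points


def predict_prevalence(log_lr, prior, patient):
    """Posterior P(CVD) = logistic(prior log-odds + sum of known log-LRs).
    Factors whose value is None (missing) are skipped."""
    logit = math.log(prior / (1 - prior))
    for factor, level in patient.items():
        if level is not None:
            logit += log_lr[factor][level]
    return 1 / (1 + math.exp(-logit))


# Illustrative counts and prior only, not from the study.
table = {
    "diabetes": ({"yes": 30, "no": 70}, {"yes": 10, "no": 90}),
    "obesity": ({"yes": 40, "no": 60}, {"yes": 20, "no": 80}),
}
log_lr, points = build_nomogram(table)
prior = 0.1  # hypothetical baseline prevalence
p_all_missing = predict_prevalence(log_lr, prior, {"diabetes": None, "obesity": None})
p_diabetic = predict_prevalence(log_lr, prior, {"diabetes": "yes", "obesity": None})
print(points, p_all_missing, p_diabetic)
```

With every attribute missing, the prediction falls back to the prior; each known attribute shifts the odds by its own likelihood ratio (here, diabetes = yes triples the odds, moving 0.1 to 0.25).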

Enhancement of durability of tall buildings by using deep-learning-based predictions of wind-induced pressure

  • K.R. Sri Preethaa; N. Yuvaraj; Gitanjali Wadhwa; Sujeen Song; Se-Woon Choi; Bubryur Kim
    • Wind and Structures / v.36 no.4 / pp.237-247 / 2023
  • The emergence of high-rise buildings has made frequent structural health monitoring and maintenance necessary for safety. Wind causes damage and structural changes in tall structures, so these structures must be designed for safety. Previous studies have used the pressure developed on tall buildings to assess the impact of wind on structures. The wind tunnel test is the primary research method commonly used to quantify the aerodynamic characteristics of high-rise buildings: wind pressure is measured by pressure sensor taps placed at different locations on tall buildings, and the collected data are used for analysis. However, sensors may malfunction and produce erroneous data, and such data losses make it difficult to analyze aerodynamic properties. It is therefore essential to reconstruct missing data from the data recorded by neighboring pressure sensor taps at various intervals. This study proposes a deep-learning-based deep convolutional generative adversarial network (DCGAN) to restore missing data associated with faulty pressure sensors installed on high-rise buildings. The performance of the proposed DCGAN is validated against a standard imputation model, the generative adversarial imputation network (GAIN). The average mean-square error (AMSE) and average R-squared (ARSE) are used as performance metrics. The ARSE values calculated for DCGAN on the front, back, left, and right sides of the building model are 0.970, 0.972, 0.984, and 0.978, respectively, and the corresponding AMSE values are 0.008, 0.010, 0.015, and 0.014. The average standard deviations of the actual pressure sensor measurements on the four sides were 0.1738, 0.1758, 0.2234, and 0.2278. The average standard deviations of the pressure values generated by the proposed DCGAN imputation model, 0.1736, 0.1746, 0.2191, and 0.2239 on the four sides, respectively, were closer to those of the actual measurements. In comparison, the standard deviations of the values predicted by GAIN were 0.1726, 0.1735, 0.2161, and 0.2209, which deviate further from the actual values. These results demonstrate that DCGAN is better suited to data imputation than GAIN, with higher accuracy and lower error. Additionally, DCGAN was used to estimate the wind pressure in regions of the building where no pressure sensor taps are available, and it yielded greater prediction accuracy than GAIN.
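The evaluation metrics named in the abstract can be computed per building face roughly as follows. This sketch assumes that AMSE and ARSE are the means, over the pressure taps on one face, of each tap's mean-square error and coefficient of determination, and that the reported standard deviations are per-tap population standard deviations averaged over taps; the abstract does not define these aggregations, and the function names and data are hypothetical.

```python
from statistics import mean, pstdev


def mse(actual, predicted):
    """Mean-square error of one tap's imputed series."""
    return mean((a - p) ** 2 for a, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Coefficient of determination 1 - SS_res / SS_tot for one tap."""
    m = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - m) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def face_metrics(actual_taps, imputed_taps):
    """Average MSE, average R^2, and average standard deviation of measured and
    imputed series over all taps on one face (assumed aggregation)."""
    pairs = list(zip(actual_taps, imputed_taps))
    amse = mean(mse(a, p) for a, p in pairs)
    arse = mean(r_squared(a, p) for a, p in pairs)
    sd_actual = mean(pstdev(a) for a, _ in pairs)
    sd_imputed = mean(pstdev(p) for _, p in pairs)
    return amse, arse, sd_actual, sd_imputed


# Hypothetical pressure values for two taps on one face.
measured = [[0.10, 0.32, 0.21, 0.40], [0.05, 0.15, 0.30, 0.22]]
imputed = [[0.12, 0.30, 0.20, 0.41], [0.06, 0.14, 0.28, 0.25]]
print(face_metrics(measured, imputed))
```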

A New Support Vector Machines for Classifying Uncertain Data (Korean title: $_{MI}$SVMs for Pattern Analysis of Incomplete Data)

  • Kiyoung, Lee; Dae-Won, Kim; Doheon, Lee; Kwang H., Lee
    • Proceedings of the Korean Information Science Society Conference / 2004.10b / pp.703-705 / 2004
  • Conventional support vector machines (SVMs) find the optimal hyperplane with the maximum margin by treating all data points equally. In the real world, however, data points within a data set may differ in their degree of uncertainty or importance because of noise, inaccuracies, or missing values. If all data points are treated as equivalent without considering such differences, the resulting hyperplanes are therefore likely to be suboptimal. In this paper, to identify the optimal hyperplane more accurately for a given uncertain data set, we propose a membership-induced distance from a hyperplane based on membership values and formulate three kinds of membership-induced SVMs.
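The idea of letting uncertain points count less can be illustrated with a fuzzy-SVM-style linear classifier in which each sample's hinge loss is scaled by its membership value s_i. This is a substitute technique for illustration only: the paper's membership-induced distance and its three SVM formulations are not reproduced here, and the data, memberships, and training settings (full-batch subgradient descent, C, learning rate, number of epochs) are hypothetical.

```python
def train_weighted_svm(X, y, s, C=1.0, epochs=200, lr=0.01):
    """Linear SVM with per-sample memberships s_i in (0, 1].
    Minimizes 0.5*||w||^2 + C * sum_i s_i * max(0, 1 - y_i (w.x_i + b))
    by full-batch subgradient descent."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = list(w), 0.0  # gradient of 0.5*||w||^2 is w itself
        for xi, yi, si in zip(X, y, s):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge active: add the membership-weighted subgradient
                for j in range(d):
                    gw[j] -= C * si * yi * xi[j]
                gb -= C * si * yi
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w, b, x):
    """Sign of the decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical data: the last point is suspected label noise, so it gets a low membership.
X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3], [-1, -1]]
y = [1, 1, 1, -1, -1, -1, 1]
s = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.1]
w, b = train_weighted_svm(X, y, s)
print(w, b, [predict(w, b, x) for x in X])
```

Setting every s_i to 1 recovers an ordinary soft-margin linear SVM, so the memberships act purely as per-sample penalty weights.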

Variance Estimation for Imputed Survey Data using Balanced Repeated Replication Method

  • Lee, Jun-Suk; Hong, Tae-Kyong; Namkung, Pyong
    • Communications for Statistical Applications and Methods / v.12 no.2 / pp.365-379 / 2005
  • Balanced Repeated Replication (BRR) is widely used to estimate the variance of linear or nonlinear estimators from complex sample surveys. Most survey data sets contain imputed values for missing items, and these imputed values are often treated as observed data. However, applying the standard BRR variance formula to imputed data does not produce valid variance estimators. Shao, Chen and Chen (1998) proposed an adjusted BRR method that adjusts the imputed data to produce more accurate variance estimators. In this paper, another adjusted BRR method is proposed and illustrated with real data examples.
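The standard BRR estimator forms half-samples from the rows of a Hadamard matrix (one PSU per stratum, chosen by the sign in that stratum's column), recomputes the estimate on each half-sample, and averages the squared deviations from the full-sample estimate. The sketch below contrasts keeping the full-sample imputed values fixed in every replicate with re-imputing within each replicate, which is one simple way to let the replicates reflect imputation variability. It assumes two PSUs per stratum, equal base weights, and respondent-mean imputation; it is not necessarily identical to the adjustment of Shao, Chen and Chen (1998) or to the method proposed in this paper, and all names and data are hypothetical.

```python
from statistics import mean


def sylvester_hadamard(n):
    """Hadamard matrix of order n (a power of two) with +1/-1 entries."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def impute(values, fill):
    """Replace missing entries (None) with `fill`."""
    return [fill if v is None else v for v in values]


def respondent_mean(values):
    """Mean of the non-missing values."""
    return mean(v for v in values if v is not None)


def brr_variance(strata, reimpute):
    """strata: list of (psu_a, psu_b); each PSU is a list of values, None = missing.
    Estimator: overall mean after respondent-mean imputation.
    reimpute=False keeps full-sample imputed values fixed (naive BRR);
    reimpute=True recomputes the imputation within each half-sample."""
    everything = [v for a, b in strata for v in a + b]
    full_fill = respondent_mean(everything)
    theta = mean(impute(everything, full_fill))

    order = 1
    while order < len(strata) + 1:  # R >= H + 1 so the all-ones column can be skipped
        order *= 2
    had = sylvester_hadamard(order)

    reps = []
    for row in had:
        # Column h + 1 picks PSU a (+1) or PSU b (-1) in stratum h.
        half = [v for h, (a, b) in enumerate(strata)
                for v in (a if row[h + 1] == 1 else b)]
        fill = respondent_mean(half) if reimpute else full_fill
        reps.append(mean(impute(half, fill)))
    return theta, mean((t - theta) ** 2 for t in reps)


# Hypothetical data: three strata, two PSUs each, some item nonresponse.
strata = [([4.0, None, 5.0], [6.0, 7.0]),
          ([3.0, 4.0], [None, 2.0, 3.0]),
          ([8.0, 6.0], [7.0, None])]
print(brr_variance(strata, reimpute=False), brr_variance(strata, reimpute=True))
```

Without any missing values the two variants coincide; they differ only in how the imputed values move across replicates.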

Development of a Zinc Database to Estimate the Zinc Intake Levels in the Korean Toddlers and Preschool Children (Korean title: Building a Database for Assessing the Zinc Intake Levels of Korean Young Children)

  • Yoon, Su-In; Shim, Jae Eun
    • Korean Journal of Community Nutrition / v.26 no.2 / pp.103-110 / 2021
  • Objectives: The objective of this study was to develop a zinc database (DB) for estimating the zinc intake levels of Korean toddlers and preschool children using data from the Korea National Health and Nutrition Examination Survey (KNHANES). Methods: A total of 3,361 food items representing the usual diet of Korean toddlers and preschool children were selected for the DB based on KNHANES (2009-2013) and the food composition table of the Rural Development Administration (RDA). Existing zinc values for foods were collected from the latest food composition tables of the RDA (9th revision) and the US Department of Agriculture (legacy release), and zinc contents were filled preferentially with these collected values. The remaining missing values were replaced with calculated values or with values imputed from the existing values of similar food items in the data sources. The zinc intake levels of Korean toddlers and preschool children were then estimated using KNHANES and the zinc DB. Results: The zinc DB includes 1,188 existing values, 412 calculated values, and 1,727 imputed values. The mean zinc intakes of 1-2-year-olds and 3-5-year-olds were 5.17 ± 2.94 mg/day and 6.30 ± 2.84 mg/day, respectively. There was no significant difference in zinc intake between boys and girls in either age group. Conclusions: This newly developed zinc DB should help in assessing zinc nutritional status and in investigating associations between zinc intake and related health concerns in Korean toddlers and preschool children.
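The fill order described in Methods (existing table values first, then calculated values, then values imputed from similar foods) and the resulting intake calculation can be sketched as follows. The grouping of similar foods, the use of the group mean for imputation, the food codes, and all zinc values below are hypothetical and illustrative; they are not taken from the RDA or USDA tables.

```python
from statistics import mean


def fill_zinc(foods, existing, calculated, group_of):
    """Assign zinc (mg/100 g) to each food code in priority order:
    existing table value > calculated value > mean of existing values for
    foods in the same (hypothetical) similarity group."""
    db, source = {}, {}
    for code in foods:
        if code in existing:
            db[code], source[code] = existing[code], "existing"
        elif code in calculated:
            db[code], source[code] = calculated[code], "calculated"
        else:
            peers = [existing[c] for c in foods
                     if c in existing and group_of[c] == group_of[code]]
            if peers:  # foods with no similar existing value stay unfilled
                db[code], source[code] = mean(peers), "imputed"
    return db, source


def daily_zinc_intake(records, db):
    """records: (food code, grams eaten) pairs for one day -> mg zinc/day."""
    return sum(grams * db[code] / 100 for code, grams in records)


# Illustrative codes and values only, not from any food composition table.
foods = ["rice", "brown_rice", "beef", "pork", "milk"]
existing = {"rice": 0.5, "beef": 4.0, "milk": 0.4}
calculated = {"brown_rice": 0.9}  # e.g. derived from a related item (assumed)
group_of = {"rice": "grain", "brown_rice": "grain", "beef": "meat",
            "pork": "meat", "milk": "dairy"}
db, source = fill_zinc(foods, existing, calculated, group_of)
intake = daily_zinc_intake([("rice", 200), ("pork", 50), ("milk", 200)], db)
print(source, intake)
```

Keeping the `source` label for every entry makes it easy to report, as the study does, how many values in the DB are existing, calculated, or imputed.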

A point-scale gap filling of the flux-tower data using the artificial neural network (Korean title: Correction of Missing Flux Tower Values in the Cheongmicheon Watershed Using an Artificial Neural Network)

  • Jeon, Hyunho; Baik, Jongjin; Lee, Seulchan; Choi, Minha
    • Journal of Korea Water Resources Association / v.53 no.11 / pp.929-938 / 2020
  • In this study, we estimated missing evapotranspiration (ET) data at an eddy-covariance flux tower at the Cheongmicheon farmland site using an artificial neural network (ANN). ANNs have shown excellent performance in numerical analysis and are being applied in an expanding range of fields. To evaluate the performance of ANN-based gap filling, ET was also calculated using two existing gap-filling methods, Mean Diurnal Variation (MDV) and Food and Agriculture Organization Penman-Monteith (FAO-PM). The estimated ET was then evaluated by time-series comparison and statistical analysis (coefficient of determination, index of agreement (IOA), root mean squared error (RMSE), and mean absolute error (MAE)). Each gap-filling model was validated using 30-minute data from 2015. Of the 121 missing values, MDV, FAO-PM, and ANN filled 70, 53, and 84, respectively, so the ANN method performed best. The coefficients of determination (0.673, 0.784, and 0.841 for MDV, FAO-PM, and ANN, respectively) and the IOA values (0.899, 0.890, and 0.951, respectively) indicated that all three methods were highly correlated with the observations and suitable for practical use, with the ANN model showing the highest performance and suitability. These results suggest that machine learning methods can be applied appropriately to gap filling of flux tower data.
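Two of the building blocks mentioned above can be sketched compactly: MDV gap filling, which replaces a missing value with the average of the same time of day on neighboring days, and the evaluation statistics. The ±7-day default window, the use of the squared Pearson correlation as the coefficient of determination, and the function names are assumptions, not details from the study; the index of agreement follows Willmott's common form d = 1 - Σ(P - O)² / Σ(|P - Ō| + |O - Ō|)².

```python
import math
from statistics import mean


def mdv_fill(days, window=7):
    """Mean diurnal variation gap filling.
    days: list of daily series (same number of time slots per day), None = missing.
    A gap is replaced by the mean of the same slot on days within +/- `window`
    days; gaps with no such neighbor stay None."""
    filled = [list(day) for day in days]
    for d, day in enumerate(days):
        for k, v in enumerate(day):
            if v is None:
                lo, hi = max(0, d - window), min(len(days), d + window + 1)
                neighbors = [days[j][k] for j in range(lo, hi)
                             if j != d and days[j][k] is not None]
                if neighbors:
                    filled[d][k] = mean(neighbors)
    return filled


def metrics(obs, pred):
    """R^2 (squared Pearson correlation), Willmott's index of agreement, RMSE, MAE."""
    n = len(obs)
    mo, mp = mean(obs), mean(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    var_o = sum((o - mo) ** 2 for o in obs)
    var_p = sum((p - mp) ** 2 for p in pred)
    r2 = cov ** 2 / (var_o * var_p)
    sse = sum((p - o) ** 2 for o, p in zip(obs, pred))
    ioa = 1 - sse / sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    rmse = math.sqrt(sse / n)
    mae = sum(abs(p - o) for o, p in zip(obs, pred)) / n
    return r2, ioa, rmse, mae


# Hypothetical three days with two time slots each.
print(mdv_fill([[1.0, 2.0], [None, 4.0], [3.0, None]], window=1))
print(metrics([1, 2, 3, 4], [2, 3, 4, 5]))
```

The second example shows why R² alone can mislead: a constant offset leaves R² at 1 while RMSE, MAE, and IOA register the bias.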

Edge Adaptive Color Interpolation for Ultra-Small HD-Grade CMOS Video Sensor in Camera Phones

  • Jang, Won-Woo; Kim, Joo-Hyun; Yang, Hoon-Gee; Lee, Gi-Dong; Kang, Bong-Soon
    • Journal of information and communication convergence engineering / v.8 no.1 / pp.51-58 / 2010
  • This paper proposes an edge adaptive color interpolation method for an ultra-small HD-grade complementary metal-oxide semiconductor (CMOS) video sensor in camera phones that can process 720p/30-fps video. Recently proposed methods with high perceptual image quality first reconstruct the green component and then estimate the red and blue components from the reconstructed green and the neighboring red and blue pixels. However, these methods require bulky line-buffer memories to temporarily store the reconstructed green components. The edge adaptive color interpolation method uses seven or nine patterns to calculate six edge directions, while the threshold values are adaptively adjusted according to the sum of the color values of the selected pixels. The method selects the most suitable pattern using two flowcharts proposed in this paper and then interpolates the missing color values. For verification, we calculated the peak signal-to-noise ratio (PSNR) of test images processed by the proposed algorithm and compared it with the PSNR of existing methods. The proposed color interpolation was also fabricated in a 0.18-μm CMOS flash memory process.
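To illustrate edge-adaptive interpolation in general, the sketch below estimates the missing green value at each red or blue pixel by interpolating along the direction with the smaller gradient, and computes PSNR against a reference. It is a generic two-direction gradient rule, not the paper's seven/nine-pattern, six-direction method; the RGGB Bayer layout, the fixed threshold (the paper adapts it to local color sums), and the test scene are assumptions.

```python
import math


def is_green(i, j):
    """Assumed RGGB Bayer layout: green sites are where row + column is odd."""
    return (i + j) % 2 == 1


def interpolate_green(raw, threshold=0):
    """Edge-directed green estimate at every non-green interior pixel.
    raw: 2-D list of sensor values. Border pixels are copied unchanged."""
    h, w = len(raw), len(raw[0])
    green = [row[:] for row in raw]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            if is_green(i, j):
                continue
            left, right = raw[i][j - 1], raw[i][j + 1]
            up, down = raw[i - 1][j], raw[i + 1][j]
            dh, dv = abs(left - right), abs(up - down)
            if dv - dh > threshold:    # strong vertical change: edge runs horizontally
                green[i][j] = (left + right) / 2
            elif dh - dv > threshold:  # strong horizontal change: edge runs vertically
                green[i][j] = (up + down) / 2
            else:                      # no dominant direction: use all four neighbors
                green[i][j] = (left + right + up + down) / 4
    return green


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diffs = [(a - b) ** 2 for ra, rb in zip(reference, test) for a, b in zip(ra, rb)]
    mse = sum(diffs) / len(diffs)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


def green_field(i, j):
    """Hypothetical scene: a vertical edge between columns 2 and 3."""
    return 10 if j <= 2 else 200


raw = [[green_field(i, j) if is_green(i, j) else 50 for j in range(5)] for i in range(5)]
green = interpolate_green(raw)
print(green[2][2])  # interpolated along the edge rather than across it
```

A plain four-neighbor average at that pixel would mix both sides of the edge; choosing the lower-gradient direction keeps the estimate on the correct side.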

Prediction of Genomic Relationship Matrices using Single Nucleotide Polymorphisms in Hanwoo (Korean title: Estimating Relationships Between Individuals Using Genomic Markers in Hanwoo)

  • Lee, Deuk-Hwan; Cho, Chung-Il; Kim, Nae-Soo
    • Journal of Animal Science and Technology / v.52 no.5 / pp.357-366 / 2010
  • The emergence of next-generation sequencing technologies has led to new computational and statistical methodologies that incorporate genetic information from the entire genomes of many individuals in a population. For example, using single-nucleotide polymorphisms (SNPs) obtained from whole-genome genotyping platforms such as the Illumina BovineSNP50 chip, many researchers are actively engaged in the genetic evaluation of cattle using whole-genome relationship analyses. In this study, we estimated the genomic relationship matrix (GRM) for a Hanwoo population and compared it with the pedigree relationship matrix (PRM). This project is a preliminary study for future work on genomic selection and prediction. The data were obtained from 187 blood samples from the progeny of 20 young bulls, collected after parentage testing at the Hanwoo improvement center of the National Agriculture Cooperative Federation, and from 103 blood samples from the progeny of 12 proven bulls, collected from farms around the Kyong-buk area in South Korea. The data set was analyzed in two cases, with missing genotypes included and with missing genotypes excluded, and the effect of missing genotypes on the accuracy of genomic relationship estimation was investigated. Relationships were also estimated chromosome by chromosome for whole-genome SNP markers, using a regression method based on allele frequencies across loci. For the parentage-verified data with missing genotypes eliminated, the correlation between pedigree-based and chromosome-wise genomic relationships averaged 0.81 ± 0.04 (mean ± SD), whereas the correlation using whole-genome information was higher, at 0.98. Estimated relationships between non-inbred half sibs were 0.22 ± 0.17 with chromosome-wise markers and 0.22 ± 0.04 with whole-genome SNP markers. The variation was larger, and unusual values were observed, when data without parentage testing were included. Relationship matrices based on genomic information can therefore be useful for genetic evaluation in animal breeding.
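A common way to build a genomic relationship matrix from 0/1/2 SNP codes is VanRaden's first method, G = ZZ'/(2Σp_k(1 - p_k)), where Z holds genotypes centered by twice the allele frequency p_k. The sketch below uses it to show one way missing genotypes can be handled (frequencies estimated from non-missing calls, missing entries centered to zero, which is equivalent to mean imputation); it may differ from the regression-based estimator used in the study, and the function name and toy genotypes are hypothetical.

```python
def genomic_relationship(genotypes):
    """VanRaden-style GRM: G = Z Z' / (2 * sum_k p_k (1 - p_k)).
    genotypes: individuals x SNPs, coded 0/1/2 copies of the counted allele,
    None = missing. A SNP with no non-missing calls would raise ZeroDivisionError."""
    n, m = len(genotypes), len(genotypes[0])
    p = []
    for k in range(m):
        calls = [g[k] for g in genotypes if g[k] is not None]
        p.append(sum(calls) / (2 * len(calls)))  # allele frequency from observed calls
    scale = 2 * sum(pk * (1 - pk) for pk in p)
    # Center each genotype by 2p; a missing call becomes 0 (mean imputation).
    z = [[0.0 if g[k] is None else g[k] - 2 * p[k] for k in range(m)]
         for g in genotypes]
    return [[sum(a * b for a, b in zip(z[i], z[j])) / scale for j in range(n)]
            for i in range(n)]


# Hypothetical genotypes for three animals at four SNPs (one missing call).
G = genomic_relationship([[0, 1, 2, 1], [0, None, 2, 1], [2, 1, 0, 1]])
for row in G:
    print([round(v, 3) for v in row])
```

Dropping SNPs with missing calls instead of centering them to zero is the other simple option; the study's two-case comparison (missing genotypes included versus excluded) is in the same spirit.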

Micro-shear bond strengths of resin-matrix ceramics subjected to different surface conditioning strategies with or without coupling agent application

  • Gunal-Abduljalil, Burcu; Onoral, Ozay; Ongun, Salim
    • The Journal of Advanced Prosthodontics / v.13 no.3 / pp.180-190 / 2021
  • Purpose. This study aimed to assess the influence of various micromechanical surface conditioning (MSC) strategies, with or without coupling agent (silane) application, on the micro-shear bond strength (µSBS) of resin-matrix ceramics (RMCs). Materials and Methods. GC Cerasmart (GC), Lava Ultimate (LU), Vita Enamic (VE), Voco Grandio (VG), and Brilliant Crios (BC) were cut into 1.0-mm-thick slices (n = 32 per RMC) and divided into four groups according to the MSC strategy applied: control with no conditioning (C); air-borne particle abrasion with aluminum oxide particles (APA); and 2W- and 3W-Er,Cr:YSGG laser irradiation (group coding is missing). The specimens in each group were further divided into silane-applied and silane-free subgroups. Two resin cement microtubules were bonded to each specimen (n = 8 per subgroup). A shear force was applied to the adhesive interface with a universal testing machine, and µSBS values were measured. Data were statistically analyzed using three-way ANOVA and the Tukey HSD test. Failure patterns were examined under a stereomicroscope. Results. RMC material type, MSC strategy, and silanization influenced µSBS values (P<.05). Compared with the control group, µSBS values increased after all other MSC strategies (P<.05), while the differences among these strategies were not significant (P>.05). For the control and APA groups, differences among RMCs were not significant (P>.05). Silanization decreased the µSBS values of all RMCs except VE, with considerable declines in GC and BC (P<.05). Conclusion. MSC strategies can enhance bond strength at the RMC-cement interface. However, the choice of MSC strategy depends on the RMC material type, and each RMC may require a dedicated conditioning protocol.
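µSBS is conventionally reported as the failure load divided by the bonded area. The sketch below assumes a circular bonded area whose diameter equals the inner diameter of the resin cement microtubule; the abstract does not state that diameter, so the value in the example, like the loads, is hypothetical, and the function names are illustrative.

```python
import math
from statistics import mean, stdev


def micro_shear_bond_strength(load_n, diameter_mm):
    """Bond strength in MPa (N/mm^2) = failure load / circular bonded area."""
    area_mm2 = math.pi * (diameter_mm / 2) ** 2
    return load_n / area_mm2


def summarize(loads_n, diameter_mm):
    """Mean and sample standard deviation of µSBS for one subgroup."""
    values = [micro_shear_bond_strength(f, diameter_mm) for f in loads_n]
    return mean(values), stdev(values)


# Hypothetical failure loads (N) for one subgroup and a hypothetical 1.0 mm diameter.
print(summarize([8.2, 9.5, 7.9, 10.1], diameter_mm=1.0))
```

Per-subgroup summaries of this kind are the input to the three-way ANOVA (material × conditioning × silane) described above.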