• Title/Summary/Keyword: Data normality

The Effect of Macroeconomic Factors on Income Inequality: Evidence from Indonesia

  • SESSU, Andi;SAMIHA, Yulia Tri;LAISILA, Maya;CHAMIDAH, Nurul;MURDIFIN, Imaduddin;PUTRA, Aditya Halim Perdana Kusuma
    • The Journal of Asian Finance, Economics and Business / v.8 no.7 / pp.55-66 / 2021
  • The purpose of this study is to analyze the direct and indirect relationships among investment (INV), government expenditure (GE), unemployment rate (UR), economic growth (EG), and income inequality. The analysis proceeds in five phases: first, the data are transformed using the natural logarithm (ln); second, normality and multicollinearity of the data are checked; third, direct effects are tested (government expenditure and investment on the unemployment rate and economic growth; investment on government expenditure; economic growth on the unemployment rate; economic growth and the unemployment rate on income inequality); fourth, indirect effects are tested using the Sobel test, with UR and EG as intervening variables; fifth, hypotheses are tested at p-value < 0.05. The results reveal that 11 of the 12 relationships have significant positive or negative effects. Theoretically, the different characteristics and goals of GE and INV in each country will have a different impact on EG and UR. The study provides input especially for the government: to achieve optimal EG through GE and INV, it is necessary to allocate budgets to industrial sectors that can absorb a large labor force and to new economic growth sectors.
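The Sobel test named in the fourth phase can be sketched as follows. This is a minimal illustration, not the authors' code: `a` is assumed to be the coefficient of the predictor on the intervening variable, `b` the coefficient of the intervening variable on the outcome, and the numeric inputs are hypothetical values, not results from the paper.

```python
import math
from statistics import NormalDist


def sobel_test(a, se_a, b, se_b):
    """Return (z, two-sided p-value) for the indirect effect a*b."""
    # First-order standard error of the product a*b
    se_ab = math.sqrt(b ** 2 * se_a ** 2 + a ** 2 * se_b ** 2)
    z = (a * b) / se_ab
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical path coefficients and standard errors (not from the paper)
z, p = sobel_test(a=0.50, se_a=0.10, b=0.40, se_b=0.10)
print(f"z = {z:.3f}, p = {p:.4f}, significant at 0.05: {p < 0.05}")
```

The p-value is compared with the 0.05 threshold stated in the abstract; the normal approximation used here is the usual one for this test and may be rough in small samples.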

Bayesian logit models with auxiliary mixture sampling for analyzing diabetes diagnosis data (보조 혼합 샘플링을 이용한 베이지안 로지스틱 회귀모형 : 당뇨병 자료에 적용 및 분류에서의 성능 비교)

  • Rhee, Eun Hee;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.131-146 / 2022
  • Logit models are commonly used to predict and classify categorical response variables. Most Bayesian approaches to logit models are implemented with the Metropolis-Hastings algorithm. However, that algorithm suffers from slow convergence and from the difficulty of choosing an adequate proposal distribution. Therefore, we use the auxiliary mixture sampler proposed by Frühwirth-Schnatter and Frühwirth (2007) to estimate logit models. This method introduces two sequences of auxiliary latent variables so that the logit model satisfies normality and linearity. As a result, the logit model can be easily implemented by Gibbs sampling. We applied the proposed method to diabetes data from the Community Health Survey (2020) of the Korea Disease Control and Prevention Agency and compared its performance with that of the Metropolis-Hastings algorithm. In addition, we showed that the logit model using auxiliary mixture sampling has classification performance comparable to that of machine learning models.

Comparative analysis of bond strength to root dentin and compression of bioceramic cements used in regenerative endodontic procedures

  • Maykely Naara Morais Rodrigues;Kely Firmino Bruno;Ana Helena Goncalves de Alencar;Julyana Dumas Santos Silva;Patricia Correia de Siqueira;Daniel de Almeida Decurcio;Carlos Estrela
    • Restorative Dentistry and Endodontics / v.46 no.4 / pp.59.1-59.14 / 2021
  • Objectives: This study compared the Biodentine, MTA Repair HP, and Bio-C Repair bioceramics in terms of bond strength to dentin, failure mode, and compression. Materials and Methods: Fifty-four slices obtained from the cervical third of 18 single-rooted human mandibular premolars were randomly distributed (n = 18). After insertion of the bioceramic materials, the push-out test was performed. The failure mode was analyzed using stereomicroscopy. Another set of cylindrically-shaped bioceramic samples (n = 10) was prepared for compressive strength testing. The normality of data distribution was analyzed using the Shapiro-Wilk test. The Kruskal-Wallis and Friedman tests were used for the push-out test data, while compressive strength was analyzed with analysis of variance and the Tukey test, considering a significance level of 0.05. Results: Biodentine presented a higher median bond strength value (14.79 MPa) than MTA Repair HP (8.84 MPa) and Bio-C Repair (3.48 MPa), with a significant difference only between Biodentine and Bio-C Repair. In the Biodentine group, the most frequent failure mode was mixed (61%), while in the MTA Repair HP and Bio-C Repair groups, it was adhesive (94% and 72%, respectively). Biodentine showed greater resistance to compression (29.59 ± 8.47 MPa) than MTA Repair HP (18.68 ± 7.40 MPa) and Bio-C Repair (19.96 ± 3.96 MPa) (p < 0.05). Conclusions: Biodentine showed greater compressive strength than MTA Repair HP and Bio-C Repair, and greater bond strength than Bio-C Repair. The most frequent failure mode of Biodentine was mixed, while that of MTA Repair HP and Bio-C Repair was adhesive.
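The Kruskal-Wallis comparison used for the push-out data can be sketched in a few lines. This is an illustrative sketch only: the three groups below are hypothetical numbers, the statistic is computed without a tie correction, and the chi-square critical value for two degrees of freedom (about 5.991 at the 0.05 level) is only an approximation for small samples.

```python
def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n_total = len(pooled)
    acc, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        acc += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n_total * (n_total + 1)) * acc - 3 * (n_total + 1)


# Hypothetical bond-strength values for three materials (not the study data)
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
h = kruskal_wallis_h(groups)
print(f"H = {h:.2f}, exceeds 5.991: {h > 5.991}")
```

A rank-based test of this kind is the usual fallback when a normality check such as Shapiro-Wilk rejects, which matches the choice described in the abstract.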

Correlation between cone-beam computed tomographic findings and the apnea-hypopnea index in obstructive sleep apnea patients: A cross-sectional study

  • Marco Isaac;Dina Mohamed ElBeshlawy;Ahmed Elsobki;Dina Fahim Ahmed;Sarah Mohammed Kenawy
    • Imaging Science in Dentistry / v.54 no.2 / pp.147-157 / 2024
  • Purpose: The aim of this study was to explore the correlations of cone-beam computed tomographic findings with the apnea-hypopnea index in patients with obstructive sleep apnea. Materials and Methods: Forty patients with obstructive sleep apnea were selected from the ear-nose-throat (ENT) outpatient clinic, Faculty of Medicine, Mansoura University. Cone-beam computed tomography was performed for each patient at the end of both inspiration and expiration. Polysomnography was carried out, and the apnea-hypopnea index was obtained. Linear measurements, including cross-sectional area and the SNA and SNB angles, were obtained. Four oral and maxillofacial radiologists categorized pharyngeal and retropalatal airway morphology and calculated the airway length and volume. Continuous data were tested for normality using the Kolmogorov-Smirnov test and reported as the mean and standard deviation or as the median and range. Categorical data were presented as numbers and percentages, and the significance level was set at P<0.05. Results: The minimal value of the cross-sectional area, SNB angle, and airway morphology at the end of inspiration demonstrated a statistically significant association (P<0.05) with the apnea-hypopnea index, with excellent agreement. No statistically significant difference was found in the airway volume, other linear measurements, or retropalatal airway morphology. Conclusion: Cone-beam computed tomographic measurements in obstructive sleep apnea patients may be used as a supplement to a novel radiographic classification corresponding to the established clinical apnea-hypopnea index classification.
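The reporting rule described above (test normality, then report mean and standard deviation or median and range) can be sketched as follows. This is a minimal sketch under stated assumptions: the data are hypothetical, the normal distribution is fitted from the sample itself, and the large-sample critical value of roughly 1.36/sqrt(n) is strictly valid only for a fully specified distribution, so a Lilliefors-type correction would be needed for a rigorous test.

```python
import math
from statistics import NormalDist, mean, median, stdev


def ks_statistic_normal(data):
    """KS distance D between the empirical CDF and a fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    dist = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # Compare the fitted CDF with the ECDF just after and just before x
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def summarize(data):
    """Report mean (SD) if normality is not rejected, else median (range)."""
    d_critical = 1.36 / math.sqrt(len(data))  # approximate 0.05 threshold
    if ks_statistic_normal(data) <= d_critical:
        return f"mean {mean(data):.2f} (SD {stdev(data):.2f})"
    return f"median {median(data):.2f} (range {min(data):.2f}-{max(data):.2f})"


# Hypothetical measurements (not from the study)
symmetric = [1.0, 2.0, 3.0, 4.0, 5.0]
skewed = [1.0] * 9 + [100.0]
print(summarize(symmetric))
print(summarize(skewed))
```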

A study on the optimal variable transformation method to identify the correlation between ATP and APC (ATP와 APC 간의 관련성 규명을 위한 최적의 변수변환법에 관한 연구)

  • Moon, Hye-Kyung;Shin, Jae-Kyoung;Kim, Yang Sook
    • Journal of the Korean Data and Information Science Society / v.27 no.6 / pp.1465-1475 / 2016
  • To secure safe meals, the microbial hazards associated with food poisoning accidents should be monitored and controlled in real situations. It is necessary to determine the correlation between the existing common bacteria number (aerobic plate count; APC) and RLU (relative light unit) in cookware. In this paper, we investigate the correlation between ATP (RLU) and APC (CFU) by applying three types of transform (inverse, square root, and log) to the raw data in two steps. Among these transforms, the log transform at the first step was found to be optimal for the cutting board, knife, soup bowl (stainless), and tray (carbon) data. The square root-inverse and the square root-square root transforms at the second step were shown to be optimal for the cup data and the soup bowl (carbon) data, respectively.
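Comparing candidate transforms can be sketched as below. The abstract does not state the criterion used to call a transform optimal, so this sketch assumes the largest absolute Pearson correlation after applying the same transform to both variables; the data are hypothetical and strictly positive, as the inverse and log transforms require.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


TRANSFORMS = {
    "inverse": lambda v: 1.0 / v,
    "sqrt": math.sqrt,
    "log": math.log,
}


def best_transform(x, y):
    """Return (name of transform with largest |r|, all scores)."""
    scores = {
        name: pearson_r([f(v) for v in x], [f(v) for v in y])
        for name, f in TRANSFORMS.items()
    }
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores


# Hypothetical RLU and CFU readings following a power law (not the study data)
rlu = [1.0, 10.0, 100.0, 1000.0]
cfu = [2.0 * v ** 2 for v in rlu]
best, scores = best_transform(rlu, cfu)
print(best, {k: round(v, 4) for k, v in scores.items()})
```

A second step, such as the square root-square root combination mentioned in the abstract, would amount to calling the same functions again on already transformed values.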

Bayesian quantile regression analysis of private education expenses for high school students in Korea (일반계 고등학생 사교육비 지출에 대한 베이지안 분위회귀모형 분석)

  • Oh, Hyun Sook
    • Journal of the Korean Data and Information Science Society / v.28 no.6 / pp.1457-1469 / 2017
  • Private education expenses are one of the key issues in Korea, and there have been many discussions about them. Academically, most previous studies of private education expenses have used multiple linear regression models based on the ordinary least squares (OLS) method. However, if the data do not satisfy the basic assumptions of the OLS method, such as normality and homoscedasticity, the parameter estimates are unreliable. In this case, the quantile regression model is preferred to the OLS model since it does not depend on the assumptions of normality and homoscedasticity. In the present study, data from a survey on private education expenses conducted by Statistics Korea in 2015 are analyzed to investigate the factors affecting private education expenses. Since the data do not satisfy the OLS assumptions, a quantile regression model is fitted with a Bayesian approach using Gibbs sampling. The results show that the gender of the student, parent's age, and the time and cost of participating in after-school programs are not significant. Household income is positively significant, with an effect of similar size at all levels (quantiles) of private education expenses. Spending on private education in Seoul is higher than in other regions, and the regional difference grows as private education expenditure increases. Total time for private education and student's achievement have a larger positive effect on the lower quantiles than on the higher quantiles. The father's education level is positively significant for the medium-high quantiles only, whereas the mother's education level is positively significant for all but the low quantiles. Participation in after-school programs is positively significant for the lower quantiles, while EBS textbook cost is positively significant for the higher quantiles.
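The reason quantile regression does not lean on normality can be seen from its loss function. The sketch below shows the check (pinball) loss for a constant prediction; it is a frequentist illustration only and does not reproduce the Bayesian Gibbs sampler used in the study. The spending values are hypothetical.

```python
def pinball_loss(y, pred, tau):
    """Average check (pinball) loss of predictions at quantile level tau."""
    total = 0.0
    for yi, pi in zip(y, pred):
        u = yi - pi
        # Under-prediction is weighted by tau, over-prediction by (1 - tau)
        total += tau * u if u >= 0 else (tau - 1) * u
    return total / len(y)


def best_constant(y, tau):
    """Among the observed values, pick the constant minimizing the loss."""
    return min(sorted(set(y)), key=lambda c: pinball_loss(y, [c] * len(y), tau))


# Hypothetical expenses with one extreme household (not survey data)
spending = [1.0, 2.0, 3.0, 4.0, 100.0]
for tau in (0.1, 0.5, 0.9):
    print(tau, best_constant(spending, tau))
```

At tau = 0.5 the minimizer is the median, which is unaffected by the extreme value, whereas an OLS fit would target the mean. A full quantile regression replaces the constant with a linear predictor.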

Application of Beta Diversity to Analysis the Fish Community Structure in Stream (베타다양성 개념의 적용을 통한 청계천 어류 군집 특성 분석)

  • Kim, Dong-Hwan;Lee, Wan-Ok;Hong, Yang-Ki;Jeon, Hyoung-Joo;Kim, Kyung-Hwan;Kang, Hyejin;Song, Mi-Young
    • Korean Journal of Ecology and Environment / v.52 no.3 / pp.274-283 / 2019
  • Beta diversity is an efficient means of assessing the spatial variation in community composition among sites. To present fish community variation and LCBD (Local Contribution to Beta Diversity) among sites in a stream, 6 sampling sites were selected in Cheonggye stream. Fish community data and environmental and habitat variables were collected at the sites from April 2014 to October 2015. We used the total variance of the fish community data table (site-by-species community table) in three forms (presence-absence, abundance, and Hellinger-transformed) to estimate and compare beta diversity and LCBD. The Hellinger-transformed community data table showed higher values of beta diversity than the presence-absence and abundance data tables. Similar patterns of LCBD were observed with the presence-absence and Hellinger-transformed data tables. The low value of beta diversity calculated from the abundance data table was due to the non-normality of the fish assemblage data. Additionally, correlation coefficients were calculated to evaluate the relationships among LCBD, community indices, and physicochemical variables. LCBD showed negative correlation coefficients with Shannon diversity. Overall, beta diversity analysis is an efficient method of addressing the spatial variation of fish communities and the ecological uniqueness of sites in a stream.
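The total-variance approach to beta diversity can be sketched as follows. The sketch assumes the common formulation in which total beta diversity is the total sum of squares of the transformed table divided by (n - 1) and each site's LCBD is its share of that sum of squares; the abundance table is hypothetical, and every site is assumed to contain at least one individual.

```python
import math


def hellinger(table):
    """Square root of each site's relative abundances (rows are sites)."""
    out = []
    for row in table:
        total = sum(row)
        out.append([math.sqrt(v / total) for v in row])
    return out


def beta_diversity(table):
    """Return (total beta diversity, list of LCBD values) for a table."""
    n, p = len(table), len(table[0])
    col_means = [sum(row[j] for row in table) / n for j in range(p)]
    # Squared deviation from the column means, summed per site
    ss_site = [
        sum((row[j] - col_means[j]) ** 2 for j in range(p)) for row in table
    ]
    ss_total = sum(ss_site)
    return ss_total / (n - 1), [s / ss_total for s in ss_site]


# Hypothetical site-by-species abundances (not the Cheonggye stream data)
abundance = [[10, 10, 0], [10, 10, 0], [0, 0, 20]]
bd_total, lcbd = beta_diversity(hellinger(abundance))
print(round(bd_total, 4), [round(v, 4) for v in lcbd])
```

The third site, whose composition differs from the other two, receives the largest LCBD, which is how the index flags ecologically unique sites.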

INVESTIGATION OF THE EFFECT OF AN ANTIBIOTIC "P" ON POTATOES ("감자에 대한 항생제(抗生劑) 피마리신의 통계적(統計的) 효과(效果) 분석(分析)")

  • Kim, Jong-Hoon
    • Journal of Korean Society for Quality Management / v.5 no.2 / pp.59-120 / 1977
  • An antibiotic 'P', one of the products of Gist Brocades N.V., is being tested by its research department as a fungicide on seed potatoes. For this testing they designed experiments with two control groups, one competitor's product, eight formulations of the antibiotic to be tested in different concentrations, and one mercury treatment that cannot be used in practice. The treated potatoes were planted in three different regions where different conditions prevail. After several months the harvested potatoes were divided into groups according to their diameter, and potato illness was analysed and counted. These data were summarised as percentages and given to us for analysis. We analysed the data by the following methods: a. Computation of the mean and standard deviation of the percentage of good results in each size group and treatment. b. Computation of the experimental errors by subtraction of each treatment mean from the observed data. c. Description of the frequency table, and plotting of a histogram and a normal curve on the same graph to check normality. d. Test with normal probability paper and chi-square test to check the goodness of fit to a normal curve. e. Test for homogeneity of variance in each treatment with Cochran's test and Hartley's test. f. Analysis of variance for testing the means by one-way classification. g. Drawing of graphs with upper and lower confidence limits to show the effect of the different treatments. h. t-test and F-test on the means and variances of the two controls, in order to form one control for Dunnett's test. i. Dunnett's test and calculations for numerical comparison of the different treatments with one control. In region R, where the potatoes were planted, it was very dry this year and rather bad conditions for growing potatoes prevailed during the experimental period. The results of this investigation show that treatments No. 2, 3 and 4 are significantly different from the other treatments and the control groups (untreated, as in the natural state).
Treatment No. 2 is the mercury formulation, which cannot be used in practice. So only No. 3 and 4, which have high concentrations of antibiotic 'P', had a good effect on the potatoes. The competitor's product and the medium- and low-concentration formulations are not significantly different from the control groups in any size class. In region W, where the potatoes received the same treatments as in region R, better weather conditions prevailed and enough water was obtainable from the lake. The results in this region showed that treatments No. 2, 3, 4, and 5 are significantly different from the other treatments and the control groups. Again, No. 2 is the mercury treatment in this investigation. Not only the highly concentrated formulations of antibiotic 'P' but also the competitor's product gave good results, although the effect of 'P' was better than that of the competitor's product. In region G, where the potatoes received the same treatments as in regions R and W and the climate conditions were equal to those of region R, the results showed that most of the treatments are not significantly different from the control groups. Only treatment No. 3 was slightly different from the others, but not significantly. It seems to us that the difference between the results in the three regions was caused by conditions such as the nature of the soil, the degree of moisture, and the hours of sunshine, but we are not sure of that. In conclusion, antibiotic 'P' has a good effect on potatoes, but in most investigations a rather high concentration of 'P' was required in the formulations.
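Step e of the procedure, the homogeneity-of-variance check, can be sketched with the two statistics it names. Both functions below assume equal group sizes, and the resulting values would be compared with tabulated critical values that are not reproduced here. The percentages are hypothetical, not the trial data.

```python
from statistics import variance


def hartley_fmax(groups):
    """Hartley's F-max: largest sample variance over the smallest."""
    v = [variance(g) for g in groups]
    return max(v) / min(v)


def cochran_c(groups):
    """Cochran's C: largest sample variance over the sum of variances."""
    v = [variance(g) for g in groups]
    return max(v) / sum(v)


# Hypothetical percentages of good results per treatment (not the trial data)
treatments = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [1.0, 3.0, 5.0]]
print(f"F-max = {hartley_fmax(treatments):.3f}")
print(f"Cochran C = {cochran_c(treatments):.3f}")
```

Values of F-max near 1, or of C near 1/k for k groups, are consistent with the equal variances that the subsequent analysis of variance assumes.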

Machine learning-based corporate default risk prediction model verification and policy recommendation: Focusing on improvement through stacking ensemble model (머신러닝 기반 기업부도위험 예측모델 검증 및 정책적 제언: 스태킹 앙상블 모델을 통한 개선을 중심으로)

  • Eom, Haneul;Kim, Jaeseong;Choi, Sangok
    • Journal of Intelligence and Information Systems / v.26 no.2 / pp.105-129 / 2020
  • This study uses corporate data from 2012 to 2018, when K-IFRS was applied in earnest, to predict default risks. The data used in the analysis totaled 10,545 rows and 160 columns, including 38 from the statement of financial position, 26 from the statement of comprehensive income, 11 from the statement of cash flows, and 76 financial ratio indices. Unlike most prior studies, which used the default event as the basis for learning about default risk, this study calculated default risk using the market capitalization and stock price volatility of each company based on the Merton model. In this way it was able to address the data imbalance caused by the scarcity of default events, which had been pointed out as a limitation of the existing methodology, and to reflect the differences in default risk that exist among ordinary companies. Because learning was conducted using only corporate information that is also available for unlisted companies, the default risks of unlisted companies without stock price information can be appropriately derived. The approach can therefore provide stable default risk assessment services to companies whose default risk is difficult to determine with traditional credit rating models, such as small and medium-sized companies and startups. Although predicting corporate default risk with machine learning has been studied actively in recent years, model bias issues exist because most studies make predictions based on a single model. A stable and reliable valuation methodology is required for the calculation of default risk, given that an entity's default risk information is very widely used in the market and sensitivity to differences in default risk is high. Strict standards are also required for the methods of calculation.
The credit rating method stipulated by the Financial Services Commission in the Financial Investment Regulations calls for the preparation of evaluation methods, including verification of their adequacy, in consideration of past statistical data and experience with credit ratings and of changes in future market conditions. This study reduced the bias of individual models by using stacking ensemble techniques that synthesize various machine learning models. This makes it possible to capture complex nonlinear relationships between default risk and various corporate information while retaining the short computation time that is an advantage of machine learning-based default risk prediction models. To calculate the sub-model forecasts used as input data for the Stacking Ensemble model, the training data were divided into seven pieces, and the sub-models were trained on the divided sets to produce forecasts. To compare the predictive power of the Stacking Ensemble model, Random Forest, MLP, and CNN models were trained with the full training data, and the predictive power of each model was then verified on the test set. The analysis showed that the Stacking Ensemble model exceeded the predictive power of the Random Forest model, which had the best performance among the single models. Next, to check for statistically significant differences between the forecasts of the Stacking Ensemble model and those of each individual model, pairs were constructed between the Stacking Ensemble model and each individual model. Because the Shapiro-Wilk normality test showed that none of the pairs followed a normal distribution, we used the nonparametric Wilcoxon rank sum test to check whether the two model forecasts in each pair showed statistically significant differences. The analysis showed that the forecasts of the Stacking Ensemble model differed to a statistically significant degree from those of the MLP model and the CNN model.
In addition, this study provides a methodology that allows existing credit rating agencies to apply machine learning-based bankruptcy risk prediction, given that traditional credit rating models can also be included as sub-models when calculating the final default probability. The Stacking Ensemble techniques proposed in this study can also help design models that meet the requirements of the Financial Investment Business Regulations through the combination of various sub-models. We hope that this research will be used as a resource to increase practical use by overcoming and improving the limitations of existing machine learning-based models.
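The step in which the training data are divided into seven pieces to produce sub-model forecasts can be sketched as out-of-fold prediction. This reading of the abstract is an assumption, and the two base learners below (a mean predictor and a one-variable least-squares line) are simple stand-ins for the Random Forest, MLP, and CNN models of the study. The data are synthetic.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_mean(x, y):
    """Stand-in base learner: always predicts the training mean."""
    m = sum(y) / len(y)
    return lambda v: m


def fit_line(x, y):
    """Stand-in base learner: simple least-squares line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return lambda v: my + slope * (v - mx)


def out_of_fold(x, y, fit, k=7):
    """Predict each row with a model trained on the other k-1 folds."""
    preds = [None] * len(x)
    for fold in kfold_indices(len(x), k):
        held = set(fold)
        train_x = [x[i] for i in range(len(x)) if i not in held]
        train_y = [y[i] for i in range(len(x)) if i not in held]
        model = fit(train_x, train_y)
        for i in fold:
            preds[i] = model(x[i])
    return preds


# Synthetic data (not corporate data); each row of meta_features holds one
# forecast per base learner and would be the input to the meta-model.
x = [float(i) for i in range(14)]
y = [2.0 * v + 1.0 for v in x]
meta_features = list(zip(out_of_fold(x, y, fit_mean), out_of_fold(x, y, fit_line)))
print(meta_features[:3])
```

Because every forecast comes from a model that never saw that row, the meta-model can be trained on these columns without leaking the target. Real data would normally be shuffled before the folds are formed.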

Factor Analysis for Exploratory Research in the Distribution Science Field (유통과학분야에서 탐색적 연구를 위한 요인분석)

  • Yim, Myung-Seong
    • Journal of Distribution Science / v.13 no.9 / pp.103-112 / 2015
  • Purpose - This paper aims to provide a step-by-step approach to factor analytic procedures, such as principal component analysis (PCA) and exploratory factor analysis (EFA), and to offer a guideline for factor analysis. Some authors have argued that the results of PCA and EFA are substantially similar. Additionally, they assert that PCA is a more appropriate technique for factor analysis because PCA produces easily interpreted results that are likely to be the basis of better decisions. For these reasons, many researchers have used PCA instead of EFA. However, these techniques are clearly different. PCA should be used for data reduction. On the other hand, EFA has been tailored to identify any underlying factor structure, a set of measured variables that cause the manifest variables to covary. To date, however, these two techniques have been used indiscriminately and incorrectly, so a guideline and procedures for factor analysis are needed. Research design, data, and methodology - This research conducted a literature review. We summarized the meaningful and consistent arguments, drew up guidelines, and suggested procedures for rigorous EFA. Results - PCA can be used instead of common factor analysis when all measured variables have high communality. However, common factor analysis is recommended for EFA. First, researchers should evaluate the sample size and check for sampling adequacy before conducting factor analysis. If these conditions are not satisfied, the next steps cannot be followed. Sample size must be at least 100 with communality above 0.5 and a minimum subject-to-item ratio of at least 5:1, with a minimum of five items in EFA. Next, Bartlett's sphericity test and the Kaiser-Meyer-Olkin (KMO) measure should be assessed for sampling adequacy. The chi-square value for Bartlett's test should be significant. In addition, a KMO of more than 0.8 is recommended. The next step is to conduct a factor analysis.
The analysis is composed of three stages. The first stage determines an extraction technique. Generally, ML or PAF will give researchers the best results. The choice between the two techniques hinges heavily on data normality: ML requires normally distributed data, whereas PAF does not. The second stage determines the number of factors to retain in the EFA. The best way to determine the number of factors to retain is to apply three methods: eigenvalues greater than 1.0, the scree plot test, and the variance extracted. The last stage is to select one of two rotation methods: orthogonal or oblique. If the research suggests that some variables are correlated with each other, the oblique method should be selected for factor rotation because it assumes that all factors are correlated. If not, the orthogonal method can be used for factor rotation. Conclusions - Recommendations are offered for the best factor analytic practice for empirical research.
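The Bartlett sphericity check recommended before factor extraction can be sketched as follows. The statistic used is the standard chi-square approximation, -(n - 1 - (2p + 5)/6) ln|R| with p(p - 1)/2 degrees of freedom, where R is the correlation matrix of p items and n is the sample size. The correlation matrix below is hypothetical, and the critical value quoted in the code (about 7.815 for 3 degrees of freedom at the 0.05 level) applies only to this three-item example.

```python
import math


def determinant(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in matrix]
    n, det = len(a), 1.0
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(a[r][c]))
        if abs(a[pivot][c]) < 1e-12:
            return 0.0
        if pivot != c:
            a[c], a[pivot] = a[pivot], a[c]
            det = -det  # a row swap flips the sign
        det *= a[c][c]
        for r in range(c + 1, n):
            factor = a[r][c] / a[c][c]
            for k in range(c, n):
                a[r][k] -= factor * a[c][k]
    return det


def bartlett_sphericity(corr, n_obs):
    """Return (chi-square, degrees of freedom) for H0: corr is the identity."""
    p = len(corr)
    chi2 = -(n_obs - 1 - (2 * p + 5) / 6) * math.log(determinant(corr))
    return chi2, p * (p - 1) // 2


# Hypothetical correlation matrix of three items, n = 100 respondents
corr = [[1.0, 0.6, 0.5], [0.6, 1.0, 0.4], [0.5, 0.4, 1.0]]
chi2, df = bartlett_sphericity(corr, n_obs=100)
print(f"chi-square = {chi2:.2f}, df = {df}, significant: {chi2 > 7.815}")
```

A significant result indicates that the items are correlated enough for factor analysis to be meaningful. The KMO measure and the sample-size rules in the abstract would be checked separately.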