• Title/Abstract/Keywords: Use Statistics

Search results: 2,827 items; processing time 0.026 seconds

Environmental Survey Data Analysis by Data Fusion Technique

  • 조광현;박희창
    • Korean Data and Information Science Society: Conference Proceedings / Proceedings of the 2006 Autumn Conference of the Korean Data and Information Science Society / pp.21-27 / 2006
  • Data fusion is generally defined as the use of techniques that combine data from multiple sources and gather that information in order to draw inferences. Data fusion is also called data combination or data matching. It is divided into five types: exact matching, judgemental matching, probability matching, statistical matching, and data linking. Currently, Gyeongnam province conducts a social survey of its residents every year. However, the analysis is limited because different surveys are carried out on a three-year cycle. In this paper, we study data fusion of environmental survey data using SAS macros. The data fusion outputs can be used for environmental preservation and environmental improvement.

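Of the five types listed in the abstract above, statistical matching is the easiest to illustrate. The following is a minimal sketch in Python rather than the SAS macros the paper uses; the function name, the variable names, and the toy records are all hypothetical and do not come from the Gyeongnam survey. It assumes two surveys that share some numeric variables and copies the variable observed only in the donor survey from the nearest donor record.

```python
import math

def statistical_match(recipients, donors, common_keys, donated_key):
    """Nearest-neighbour statistical matching: for each recipient record, copy
    `donated_key` from the donor that is closest on the shared variables."""
    fused = []
    for rec in recipients:
        # Euclidean distance on the variables that both surveys observe.
        nearest = min(
            donors,
            key=lambda d: math.dist(
                [rec[k] for k in common_keys], [d[k] for k in common_keys]
            ),
        )
        fused.append({**rec, donated_key: nearest[donated_key]})
    return fused

# Hypothetical toy records (illustration only).
survey_a = [{"age": 30, "income": 2.5}, {"age": 62, "income": 1.8}]
survey_b = [
    {"age": 28, "income": 2.6, "env_score": 4},
    {"age": 65, "income": 1.5, "env_score": 2},
]
fused = statistical_match(survey_a, survey_b, ["age", "income"], "env_score")
```

In practice the shared variables would usually be standardized before computing distances, since here `age` dominates `income` purely because of its scale.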

A Bayesian model for two-way contingency tables with nonignorable nonresponse from small areas

  • Woo, Namkyo;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / Vol. 27, No. 1 / pp.245-254 / 2016
  • Many surveys provide categorical data, and one or more categories may be missing. We describe a nonignorable nonresponse model for the analysis of two-way contingency tables from small areas. There are both item and unit nonresponse. One approach to analyzing these data is to construct several tables corresponding to the missing categories. We describe a hierarchical Bayesian model to analyze two-way categorical data from different areas. This allows a "borrowing of strength" from the data of larger areas to improve the reliability of the estimates of the model parameters corresponding to the small areas. We also use a nonignorable nonresponse model with Bayesian uncertainty analysis, placing priors on the nonidentifiable parameters instead of performing a sensitivity analysis for them. We use the griddy Gibbs sampler to fit our models and compute DIC and BPP for model diagnostics. We illustrate our method using NHANES III data from thirteen states to obtain the finite population proportions.
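For readers unfamiliar with the diagnostic named in the abstract above: DIC usually stands for the deviance information criterion. Assuming the standard definition (the paper may use a variant), it combines the posterior mean deviance with an effective number of parameters, where \overline{D(\theta)} is the deviance averaged over posterior draws and \bar{\theta} is the posterior mean of the parameters:

```latex
D(\theta) = -2 \log p(y \mid \theta), \qquad
p_D = \overline{D(\theta)} - D(\bar{\theta}), \qquad
\mathrm{DIC} = \overline{D(\theta)} + p_D
```

Under this definition, a smaller DIC indicates a better trade-off between fit and complexity.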

A Bayesian uncertainty analysis for nonignorable nonresponse in two-way contingency table

  • Woo, Namkyo;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / Vol. 26, No. 6 / pp.1547-1555 / 2015
  • We study the problem of nonignorable nonresponse in a two-way contingency table in which one or two categories may be missing. We describe a nonignorable nonresponse model for the analysis of a two-way categorical table. One approach to analyzing these data is to construct several tables (one complete and the others incomplete). The incomplete tables contain nonidentifiable parameters. We describe a hierarchical Bayesian model to analyze two-way categorical data. We use a nonignorable nonresponse model with Bayesian uncertainty analysis, placing priors on the nonidentifiable parameters instead of performing a sensitivity analysis for them. To reduce the effects of the nonidentifiable parameters, we project the parameters onto a lower dimensional space and allow the reduced set of parameters to share a common distribution. We use the griddy Gibbs sampler to fit our models and compute DIC and BPP for model diagnostics. We illustrate our method using NHANES III data to obtain the finite population proportions.
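The griddy Gibbs sampler mentioned in the abstract above replaces an intractable univariate conditional draw with a draw from an approximation built on a grid of points. The sketch below is a simplified, discretized version of one such step (it samples grid points directly instead of inverting a piecewise-linear CDF). The function name and the binomial example are assumptions made for illustration and are not the model in the paper.

```python
import math
import random

def griddy_draw(log_density, grid, rng):
    """Draw one value from an unnormalised univariate conditional density,
    approximated by its values on a fixed grid of points."""
    log_w = [log_density(x) for x in grid]
    peak = max(log_w)
    # Subtract the maximum before exponentiating for numerical stability.
    weights = [math.exp(v - peak) for v in log_w]
    return rng.choices(grid, weights=weights, k=1)[0]

# Hypothetical example: a proportion p with 7 successes in 10 trials and a
# flat prior, so the exact conditional is Beta(8, 4) with mean 8/12.
rng = random.Random(0)
grid = [i / 100 for i in range(1, 100)]
draws = [
    griddy_draw(lambda p: 7 * math.log(p) + 3 * math.log(1 - p), grid, rng)
    for _ in range(5000)
]
posterior_mean = sum(draws) / len(draws)
```

Inside a full Gibbs sampler, a step like this would be repeated for each parameter in turn, with `log_density` re-evaluated at the current values of the other parameters.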

Empirical Comparisons of Clustering Algorithms using Silhouette Information

  • Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / Vol. 10, No. 1 / pp.31-36 / 2010
  • Many clustering algorithms have been used in diverse fields. When a given data set needs to be grouped into clusters, many clustering algorithms based on similarity or distance measures can be considered. Most clustering work has been based on hierarchical and non-hierarchical clustering algorithms. Generally, researchers have chosen among these algorithms case by case, and they have had to determine a proper clustering method subjectively, based on their prior knowledge. In this paper, to address this subjectivity, we make empirical comparisons of popular hierarchical and non-hierarchical clustering algorithms using the silhouette measure. We use silhouette information to evaluate clustering results such as the number of clusters and the cluster variance. We verify our comparison study with experimental results on data sets from the UCI machine learning repository. As a result, efficient clustering algorithms can be selected objectively.
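The silhouette measure used in the abstract above can be computed directly from pairwise distances. The following is a minimal sketch that assumes Euclidean distance and at least two clusters; the function name and the toy points are hypothetical and are not taken from the paper's experiments.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette width s(i) = (b - a) / max(a, b) over all points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the distance of the point to itself is 0, so it drops out).
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two well-separated pairs of points.
points = [(0, 0), (0, 1), (5, 5), (5, 6)]
good = silhouette_score(points, [0, 0, 1, 1])  # labels follow the geometry
bad = silhouette_score(points, [0, 1, 0, 1])   # labels cut across it
```

A value close to 1 indicates compact, well-separated clusters, which is how a score like this can be used to compare algorithms or candidate numbers of clusters on the same data.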

Nomogram plot for predicting chronic otitis media in Korean adults

  • Kang, Eun Jin;Lee, Jea Young
    • Journal of the Korean Data and Information Science Society / Vol. 28, No. 4 / pp.899-910 / 2017
  • A nomogram is useful for predicting prevalence for each patient through a scoring system, without a complex formula. Because there are few studies on chronic otitis media (COM) in adults, this study aims to identify the relevant risk factors for COM in Korean adults and to build a nomogram for these risk factors. The Health Interview Survey data subset, derived from the Sixth Korean National Health and Nutrition Examination Survey (KNHANES VI), was used to evaluate the participants. Among the participants, the weighted prevalence of COM was 5.3%. Residence, earphone use, atopic dermatitis, allergic rhinitis, chronic rhinosinusitis, and subjective hearing status were identified as risk factors for COM. Using these 6 risk factors, we propose a nomogram for COM and use the AUC to verify the discrimination of the nomogram.
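The AUC used in the abstract above to check discrimination can be read as the probability that a randomly chosen case receives a higher score than a randomly chosen non-case. Below is a minimal unweighted sketch of that pairwise definition; the function name and the toy scores are hypothetical, and the survey weights that a KNHANES analysis would require are ignored here.

```python
def auc(scores, labels):
    """AUC as the share of (positive, negative) pairs in which the positive
    case has the higher score; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical nomogram total points and disease status (1 = case).
total_points = [12, 30, 25, 8, 20]
status = [0, 1, 0, 0, 1]
result = auc(total_points, status)
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation of cases from non-cases.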

Block Toeplitz Matrix Inversion using Levinson Polynomials

  • Lee, Won-Cheol;Nam, Jong-Gil
    • The Journal of Korean Institute of Communications and Information Sciences / Vol. 24, No. 8B / pp.1438-1443 / 1999
  • In this paper, we propose detection methods for gradual scene changes such as dissolve, pan, and zoom. The proposed method for detecting a dissolve region uses scene features based on spatial statistics of the image. The spatial statistics used to define shot boundaries are derived from squared means within each local area. We also propose a method for camera motion detection using four representative motion vectors in the background. The representative motion vectors are derived from macroblock motion vectors that are extracted directly from MPEG streams. To reduce the processing time, we use DC sequences rather than fully decoded MPEG video. In addition, to detect the gradual scene change region precisely, we use all types of MPEG frames (I, P, and B frames). Simulation results show that the proposed detection methods perform better than existing methods.


MODELLING AFRICAN TRYPANOSOMIASIS IN HUMAN WITH OPTIMAL CONTROL AND COST-EFFECTIVENESS ANALYSIS

  • GERVAS, HAMENYIMANA EMANUEL;HUGO, ALFRED K.
    • Journal of applied mathematics & informatics / Vol. 39, No. 5_6 / pp.895-918 / 2021
  • Human African Trypanosomiasis (HAT), also known as sleeping sickness, is a neglected tropical vector-borne disease caused by trypanosome protozoa transmitted by the bites of infected tsetse flies. The basic reproduction number, R0, is derived using the next generation matrix method, which shows that the disease persists in the population if R0 > 1. Numerical simulations of the optimal control model are carried out to determine the control strategy that can combat HAT at minimum cost. The results indicate that the combined use of education campaigns, treatment, and insecticides is the most efficient and effective way to eliminate HAT in African communities, but it is too costly. Furthermore, the cost-effectiveness of the control measures (education campaign, treatment, and insecticides) is determined using the incremental cost-effectiveness ratio (ICER) approach, and the results show that the use of education together with treatment of infected people is the most cost-effective strategy compared to the other strategies.
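The ICER named in the abstract above is the extra cost of a strategy divided by its extra health effect, both measured against a reference strategy. A minimal sketch follows; the function name, the cost units, and all the numbers are hypothetical and are not results from the paper.

```python
def icer(cost_new, effect_new, cost_ref, effect_ref):
    """Incremental cost-effectiveness ratio of a strategy versus a reference:
    (difference in cost) / (difference in effect)."""
    delta_effect = effect_new - effect_ref
    if delta_effect == 0:
        raise ValueError("equal effects: the ICER is undefined")
    return (cost_new - cost_ref) / delta_effect

# Hypothetical figures: cost in arbitrary units, effect in infections averted.
education_only = {"cost": 200.0, "effect": 60.0}
education_and_treatment = {"cost": 500.0, "effect": 120.0}
ratio = icer(
    education_and_treatment["cost"], education_and_treatment["effect"],
    education_only["cost"], education_only["effect"],
)
```

Here `ratio` is the additional cost per additional infection averted. When several strategies are compared, they are typically ranked by effect and each one is evaluated against the next less effective strategy that has not been ruled out.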

Penalized maximum likelihood estimation with symmetric log-concave errors and LASSO penalty

  • Seo-Young, Park;Sunyul, Kim;Byungtae, Seo
    • Communications for Statistical Applications and Methods / Vol. 29, No. 6 / pp.641-653 / 2022
  • Penalized least squares methods are important tools for simultaneously selecting variables and estimating parameters in linear regression. Penalized maximum likelihood can also be used for the same purpose, assuming that the error distribution falls in a certain parametric family of distributions. However, the use of a specific parametric family can suffer from a misspecification problem, which undermines the estimation accuracy. To give sufficient flexibility to the error distribution, we propose using a symmetric log-concave error distribution with the LASSO penalty. A feasible algorithm for estimating both the nonparametric and parametric components of the proposed model is provided. Numerical studies are also presented, showing that the proposed method produces more efficient estimators than some existing methods, with similar variable selection performance.
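As background for the abstract above, the sketch below shows the penalized least squares baseline (ordinary LASSO fitted by coordinate descent), not the log-concave estimator the paper proposes. The function names, the toy data, and the choice of objective scaling, (1/2n)||y - Xb||^2 + lam * ||b||_1 with no intercept, are assumptions made for illustration. Every column of X is assumed to be nonzero.

```python
def soft_threshold(z, lam):
    """Soft-thresholding operator: sign(z) * max(|z| - lam, 0)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def lasso_cd(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            # Average product of column j with the partial residual that
            # excludes column j's own contribution.
            rho = sum(
                X[i][j] * (y[i] - sum(X[i][k] * beta[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            z = sum(X[i][j] ** 2 for i in range(n)) / n
            beta[j] = soft_threshold(rho, lam) / z
    return beta

# Hypothetical toy data: y depends on the first column only (y = 2 * x1).
X = [[1, 0.5], [2, -0.3], [3, 0.1], [4, -0.2]]
y = [2, 4, 6, 8]
beta = lasso_cd(X, y, lam=0.1)
```

The penalty sets the irrelevant second coefficient exactly to zero and shrinks the first slightly below 2, which is the simultaneous selection and estimation behaviour the abstract refers to.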

A comparison of imputation methods using machine learning models

  • Heajung Suh;Jongwoo Song
    • Communications for Statistical Applications and Methods / Vol. 30, No. 3 / pp.331-341 / 2023
  • Handling missing values in data analysis is essential for constructing a good prediction model. The easiest way to handle missing values is to use complete case data, but this can lead to information loss and invalid conclusions. Imputation is a technique that replaces missing data with alternative values obtained from information in the dataset. Conventional imputation methods include K-nearest-neighbor imputation and multiple imputation. Recent methods include missForest, missRanger, and mixgb, all of which use machine learning algorithms. This paper compares imputation techniques for datasets with mixed data types under various conditions, such as data size, missing ratio, and missing mechanism. To evaluate the performance of each method on mixed datasets, we propose a new imputation performance measure (IPM), a unified measurement applicable to both numerical and categorical variables. We believe this metric can help find the best imputation method. Finally, we summarize the comparison results in terms of imputation performance and computational time.
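Of the conventional methods named in the abstract above, K-nearest-neighbor imputation is simple enough to sketch in a few lines. The version below handles numeric columns only (the paper deals with mixed data types) and assumes that at least one complete row exists. The function name and the toy rows are hypothetical.

```python
import math

def knn_impute(rows, k=2):
    """Fill None entries with the column mean over the k nearest complete
    rows, measuring distance on the columns that the row does observe."""
    complete = [r for r in rows if None not in r]
    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue
        observed = [j for j, v in enumerate(row) if v is not None]
        neighbours = sorted(
            complete,
            key=lambda c: math.dist(
                [row[j] for j in observed], [c[j] for j in observed]
            ),
        )[:k]
        imputed.append([
            v if v is not None else sum(n[j] for n in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ])
    return imputed

# Hypothetical toy data: the last row is missing its second value.
rows = [[1.0, 10.0], [2.0, 20.0], [9.0, 90.0], [1.5, None]]
filled = knn_impute(rows, k=2)
```

The two nearest complete rows to `1.5` are the first two, so the missing value is filled with their average. A complete-case analysis would instead have discarded the last row entirely.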

A Basic Study on the Fire Risk by Building Use based Growth Fire Statistics

  • 서동구;이종호
    • Korea Institute of Building Construction: Conference Proceedings / Korea Institute of Building Construction 2020 Spring Conference / pp.218-219 / 2020
  • The risk of a fire in a building is closely related to the use of the building. In particular, not all fires that occur in a building pose a risk to human life; the risk is associated with the combustion area and with the increase in the total floor area of the building. Therefore, this study focused on the safety of human life, considered fire statistics from the perspective of growing fires, and analyzed 10 years of statistical fire data. The analysis covered the time of occurrence by building use, the frequency of occurrence, and the ratio of casualties, among other factors. The results of this study are expected to be useful for evaluations in the design, construction, and maintenance of buildings.
