• Title/Summary/Keyword: Statistic approach


A Model Approach to Calculate Cancer Prevalence From 5 Year Survival Data for Selected Cancer Sites in India

  • Takiar, Ramnath;Jayant, Kasturi
    • Asian Pacific Journal of Cancer Prevention / v.14 no.11 / pp.6899-6903 / 2013
  • Background: Prevalence is a statistic of primary interest in public health. In the absence of good follow-up facilities, it is difficult to assess the complete prevalence of cancer for a given registry area. Objective: An attempt was made here to arrive at complete prevalence, including limited-duration prevalence, for selected cancer sites in India by fitting appropriate models to the 1-, 3- and 5-year cancer survival data available for selected population-based registries. Materials and Methods: Survival data available for the registries of Bhopal, Chennai, Karunagappally, and Mumbai were pooled to generate survival for breast, cervix, ovary, lung, stomach and mouth cancers. With the available data on survival at 1, 3 and 5 years, a model was fitted and the survival curve was extended beyond 5 years (up to 35 years) for each of the selected sites. This helped generate survival proportions by single year and thereby the survival of cancer cases. With the help of the year-wise survival proportions and the incidence, prevalence figures were arrived at for selected cancer sites and for selected periods. Results: The prevalence to incidence ratio (P/I ratio) stabilized after a certain duration for all the cancer sites, showing that prevalence can be calculated from knowledge of incidence. The stabilized P/I ratios for the cancer sites of breast, cervix, ovary, stomach, lung, mouth and for life time were observed to be 4.90, 5.33, 2.75, 1.40, 1.37, 4.04 and 3.42, respectively. Conclusions: The validity of the model approach to calculate prevalence could be demonstrated with the help of survival data of the Barshi registry for cervix cancer, available for the period 1988-2006.
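
The link between year-wise survival proportions, incidence and prevalence described in this abstract can be sketched as follows. This is only an illustration under a simplifying assumption of constant annual incidence; the function names and the survival curve are hypothetical and are not the model fitted by the authors.

```python
from itertools import accumulate


def prevalence(annual_incidence, survival_by_year):
    """Prevalent cases, assuming the same number of new cases every year.

    survival_by_year[k] is the proportion of a diagnosis cohort still alive
    k + 1 years after diagnosis (illustrative input, not registry data).
    """
    return sum(annual_incidence * s for s in survival_by_year)


def cumulative_pi_ratios(survival_by_year):
    """P/I ratio for durations of 1, 2, ..., len(survival_by_year) years."""
    return list(accumulate(survival_by_year))


# Hypothetical survival curve: 80% alive at year 1, then a 15% loss per year.
survival = [0.8 * 0.85 ** k for k in range(35)]
ratios = cumulative_pi_ratios(survival)

# The ratio grows quickly at first and then levels off.
print(round(ratios[4], 2), round(ratios[-1], 2))
```

Under this assumption the P/I ratio is just the running sum of the survival proportions, which is why it levels off once long-term survival becomes small.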

Implementation of Markov Chain: Review and New Application (관리도에서 Markov연쇄의 적용: 복습 및 새로운 응용)

  • Park, Chang-Soon
    • The Korean Journal of Applied Statistics / v.24 no.4 / pp.657-676 / 2011
  • Properties of statistical process control procedures may not be derived analytically in many cases; however, the application of a Markov chain can solve such problems. This article shows how to derive the properties of the process control procedures using the generated Markov chains when the control statistic satisfies the Markov property. Markov chain approaches that appear in the literature (such as the statistical design and economic design of the control chart as well as the variable sampling rate design) are reviewed along with the introduction of research results for application to a new control procedure and reset chart. The joint application of a Markov chain approach and analytical solutions (when available) can guarantee the correct derivation of the properties. A Markov chain approach is recommended over simulation studies due to its precise derivation of properties and short calculation times.
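
As a minimal sketch of the Markov chain idea, the average run length (ARL) of a chart whose statistic moves among a finite set of in-control (transient) states can be obtained from the standard absorbing-chain identity ARL = (I - Q)^(-1) 1, where Q holds the transition probabilities among transient states and signalling is the absorbing state. The function name and the example matrices are assumptions made for illustration; the abstract does not specify its own formulation.

```python
def average_run_lengths(Q):
    """Solve (I - Q) x = 1 by Gauss-Jordan elimination.

    Q[i][j] is the probability of moving from transient state i to
    transient state j; x[i] is the expected number of samples until a
    signal when the chart starts in state i.
    """
    n = len(Q)
    # Augmented matrix [I - Q | 1].
    A = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)] + [1.0]
         for i in range(n)]
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][n] for i in range(n)]


# One transient state: a chart that signals with probability 0.01 per sample.
shewhart_arl = average_run_lengths([[0.99]])

# Two transient states (hypothetical "safe" and "warning" zones).
zone_arl = average_run_lengths([[0.9, 0.08], [0.5, 0.3]])
print(shewhart_arl, zone_arl)
```

The single-state case reduces to the familiar 1/p run length, which gives a quick check of the solver against an analytical result, in the spirit of the joint use of both approaches that the abstract recommends.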

A Practical Approach Determining an IDF formula with Limited Rainfall-Duration Data Availability (제한적 강우-지속기간 자료를 이용한 실용적 IDF 관계식의 유도)

  • Seong, Kee-Won
    • Journal of Korea Water Resources Association / v.41 no.6 / pp.587-595 / 2008
  • To aid the derivation of the IDF relationship for a station with insufficient duration-rainfall data, an approach to derive a simple and practical IDF formula is presented. The IDF formula is described simply in terms of two parameters and a design frequency. The model parameters were estimated with a statistical technique based on the normal distribution of transformed rainfall intensities. To obtain the transformed data, both the Kruskal-Wallis statistic and the Manly transformation of duration-rainfall data were adopted. With these methods, the proposed IDF formula becomes a simpler model that compares well with the conventional form. In addition, it avoids the exceptional condition of a higher rainfall intensity for a longer duration. The performance of the proposed formula was evaluated using limited short-duration rainfall data from two gauge stations. The results showed that the IDF formula developed in this work was an effective tool, providing a reliable relationship between intensity and duration even when only insufficient data are available.

Nomogram Estimating the Probability of Intraabdominal Abscesses after Gastrectomy in Patients with Gastric Cancer

  • Eom, Bang Wool;Joo, Jungnam;Kim, Young-Woo;Park, Boram;Yoon, Hong Man;Ryu, Keun Won;Kim, Soo Jin
    • Journal of Gastric Cancer / v.15 no.4 / pp.262-269 / 2015
  • Purpose: Intraabdominal abscess is one of the most common reasons for re-hospitalization after gastrectomy. This study aimed to develop a model for estimating the probability of intraabdominal abscesses that can be used during the postoperative period. Materials and Methods: We retrospectively reviewed the clinicopathological data of 1,564 patients who underwent gastrectomy for gastric cancer between 2010 and 2012. Twenty-six related markers were analyzed, and multivariate logistic regression analysis was used to develop the probability estimation model for intraabdominal abscess. Internal validation using a bootstrap approach was employed to correct for bias, and the model was then validated using an independent dataset comprising patients who underwent gastrectomy between January 2008 and March 2010. Discrimination and calibration abilities were checked in both datasets. Results: The incidence of intraabdominal abscess in the development set was 7.80% (122/1,564). The surgical approach, operating time, pathologic N classification, body temperature, white blood cell count, C-reactive protein level, glucose level, and change in the hemoglobin level were significant predictors of intraabdominal abscess in the multivariate analysis. The probability estimation model that was developed on the basis of these results showed good discrimination and calibration abilities (concordance index=0.828, Hosmer-Lemeshow chi-square statistic P=0.274). Finally, we combined both datasets to produce a nomogram that estimates the probability of intraabdominal abscess. Conclusions: This nomogram can be useful for identifying patients at a high risk of intraabdominal abscess. Patients at a high risk may benefit from further evaluation or treatment before discharge.
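
The concordance index reported for discrimination can be illustrated for a binary outcome as the share of (event, non-event) patient pairs in which the event patient received the higher predicted probability. The sketch below is a generic pairwise implementation with made-up numbers, not the authors' code or data; the function name is hypothetical.

```python
def concordance_index(probabilities, outcomes):
    """Concordance index (equal to the ROC AUC) for a binary outcome.

    probabilities: predicted risk for each patient.
    outcomes: 1 if the event occurred, 0 otherwise.
    Ties in predicted risk count as half-concordant.
    """
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    non_events = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for p_event in events:
        for p_non in non_events:
            if p_event > p_non:
                score += 1.0
            elif p_event == p_non:
                score += 0.5
    return score / (len(events) * len(non_events))


# Illustrative predictions for four hypothetical patients.
c_index = concordance_index([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0])
print(c_index)
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of events from non-events.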

A Spatial Statistical Approach on the Correlation between Walkability Index and Urban Spatial Characteristics -Case Study on Two Administrative Districts, Busan- (도시 공간특성과 Walkability Index의 상관성에 관한 공간통계학적 접근 -부산광역시 2개 구를 대상으로-)

  • Choi, Don Jeong;Suh, Yong Cheol
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.32 no.4_1 / pp.343-351 / 2014
  • The correlation between the regional Walkability Index and physical and socio-economic characteristics was evaluated by spatial statistical analysis in order to understand urban pedestrian environments, whose significance has recently been emerging. In this study, Walkability Indexes were calculated quantitatively for two administrative districts of Busan, and global and local spatial autocorrelation indices were measured. Additionally, a Geographically Weighted Regression model was applied to define the correlation between the Walkability Indexes and urban environmental variables. The spatial autocorrelation values and clusters of the Walkability Indexes were statistically significant. Furthermore, the Geographically Weighted Regression model yielded better inference than the OLS regression model, so that the influence of the local-level pedestrian environment could be identified. The results of this study suggest that the spatial statistical approach can be effective for quantitatively assessing the pedestrian environment and exploring its associated factors.
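
A global spatial autocorrelation index of the kind mentioned here is commonly computed as Moran's I. The abstract does not name the index it used, so the following is a sketch under that assumption, with a hypothetical chain of four neighbouring zones and binary adjacency weights.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed at n locations.

    weights[i][j] is the spatial weight between locations i and j
    (here 1 for neighbours, 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_weight = sum(sum(row) for row in weights)
    cross = sum(weights[i][j] * dev[i] * dev[j]
                for i in range(n) for j in range(n))
    squared = sum(d * d for d in dev)
    return (n / total_weight) * (cross / squared)


def chain_weights(n):
    """Binary adjacency for n zones arranged in a line (illustrative)."""
    return [[1 if abs(i - j) == 1 else 0 for j in range(n)] for i in range(n)]


w = chain_weights(4)
clustered = morans_i([1, 2, 3, 4], w)      # similar values sit side by side
alternating = morans_i([1, -1, 1, -1], w)  # neighbours always differ
print(clustered, alternating)
```

Positive values indicate that similar index values cluster in space, while negative values indicate that neighbouring zones tend to differ.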

Implementation of Markov chain: Review and new application (관리도에서 Markov연쇄의 적용: 복습 및 새로운 응용)

  • Park, Changsoon
    • The Korean Journal of Applied Statistics / v.34 no.4 / pp.537-556 / 2021
  • Properties of statistical process control procedures may not be derived analytically in many cases; however, the application of a Markov chain can solve such problems. This article shows how to derive the properties of the process control procedures using the generated Markov chains when the control statistic satisfies the Markov property. Markov chain approaches that appear in the literature (such as the statistical design and economic design of the control chart as well as the variable sampling rate design) are reviewed along with the introduction of research results for application to a new control procedure and reset chart. The joint application of a Markov chain approach and analytical solutions (when available) can guarantee the correct derivation of the properties. A Markov chain approach is recommended over simulation studies due to its precise derivation of properties and short calculation times.

Application of Geo-Statistic and Data-Mining for Determining Sampling Number and Interval for Monitoring Microbial Diversity in Tidal Mudflat (갯벌 미생물 다양성 모니터링 시료 채취 개수 및 간격 선정을 위한 지구통계학적 기법과 데이터 마이닝 적용 연구)

  • Yang, Ji-Hoon;Lee, Jae-Jin;Yoo, Keun-Je;Park, Joon-Hong
    • Journal of Korean Society of Environmental Engineers / v.32 no.12 / pp.1102-1110 / 2010
  • Tidal mudflats are a reservoir for diverse microbial resources. Microbial diversity in tidal mudflat sediment can be easily influenced by various human activities. It is necessary to take representative samples to monitor microbial diversity in tidal mudflat sediments. In this study, we analyzed the microbial diversity and chemical characteristics of vegetation and non-vegetation regions in the Kangwha tidal mudflat using geo-statistics and data-mining. According to the geo-statistical analysis, most correlation range values for the vegetation region were smaller than those for the non-vegetation region, which suggested that the sampling number and interval must be set on a shorter spatial scale for the vegetation tidal mudflat environment due to its higher degree of chemical and biological complexity and heterogeneity. The data-mining analysis suggested that organic content and nitrate were the major environmental factors influencing microbial diversity in the vegetation region, while pH and sulfate were the major influencing factors in the non-vegetation region. Using the integrated geo-statistical and data-mining approach, we proposed a guideline for determining the sampling interval and number to monitor microbial diversity in tidal mudflats.

Estimation of BDI Volatility: Leverage GARCH Models (BDI의 변동성 추정: 레버리지 GARCH 모형을 중심으로)

  • Mo, Soo-Won;Lee, Kwang-Bae
    • Journal of Korea Port Economic Association / v.30 no.3 / pp.1-14 / 2014
  • This paper aims to measure how new information is incorporated into volatility estimates. Various GARCH models are estimated and compared using daily BDI (Baltic Dry Index) data. While most researchers agree that volatility is predictable, they differ on how this volatility predictability should be modelled. This study hence introduces asymmetric, or leverage, volatility models, in which good news and bad news have different predictability for future volatility. We provide a systematic comparison of volatility models, focusing on the asymmetric effect of news on volatility. Specifically, three diagnostic tests are provided: the sign bias test, the negative size bias test, and the positive size bias test. From the Ljung-Box test statistic for twelfth-order serial correlation in the levels, we do not find any significant serial correlation in the unpredictable BDI. The coefficients of skewness and kurtosis both indicate that the unpredictable BDI has a distribution which is skewed to the left and significantly flat tailed. Furthermore, the Ljung-Box test statistic for twelfth-order serial correlation in the squares strongly suggests the presence of time-varying volatility. The sign bias test, the negative size bias test, and the positive size bias test strongly indicate that large positive (negative) BDI shocks cause more volatility than small ones. This paper also shows that three leverage models have problems in capturing the correct impact of news on volatility and that negative shocks do not cause higher volatility than positive shocks. Specifically, the GARCH model successfully reveals the shape of the news impact curve and is a useful approach to modeling conditional heteroscedasticity of daily BDI.
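
The Ljung-Box statistic used above for the levels and the squares has the standard form Q = n(n+2) Σ ρ_k² / (n−k), summed over lags k = 1..h, where ρ_k is the lag-k sample autocorrelation. The sketch below computes it in plain Python; the function names and the toy series are illustrative and are not BDI data.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of the series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denominator = sum(d * d for d in dev)
    numerator = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    return numerator / denominator


def ljung_box_q(series, max_lag):
    """Ljung-Box Q statistic over lags 1..max_lag."""
    n = len(series)
    total = sum(autocorrelation(series, k) ** 2 / (n - k)
                for k in range(1, max_lag + 1))
    return n * (n + 2) * total


# Toy shocks; applying the same statistic to the squared series is the usual
# check for time-varying volatility (ARCH effects).
shocks = [0.1, -0.2, 0.15, -1.2, 1.5, -1.1, 0.9, 0.05, -0.1, 0.08, -0.03, 0.02]
q_levels = ljung_box_q(shocks, 3)
q_squares = ljung_box_q([x * x for x in shocks], 3)
print(q_levels, q_squares)
```

Under the null hypothesis of no serial correlation in a raw series, Q is compared against a chi-square distribution with h degrees of freedom; the paper uses h = 12, while the toy call uses 3 lags only because the series is short.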

Terms Based Sentiment Classification for Online Review Using Support Vector Machine (Support Vector Machine을 이용한 온라인 리뷰의 용어기반 감성분류모형)

  • Lee, Taewon;Hong, Taeho
    • Information Systems Review / v.17 no.1 / pp.49-64 / 2015
  • Customer reviews, which include subjective opinions about products or services in online stores, have been generated rapidly, and their influence on customers has become immense due to the widespread use of SNS. In addition, a number of studies have focused on opinion mining to analyze positive and negative opinions and to find better solutions for customer support and sales. For opinion mining, it is very important to select the key terms that reflect the customers' sentiment in the reviews. We propose a document-level, terms-based sentiment classification model that selects the optimal terms using part-of-speech (POS) tags. SVMs (support vector machines) are utilized to build a predictor for opinion mining, and we used combinations of POS tags and four term extraction methods for the feature selection of the SVM. To validate the proposed opinion mining model, we applied it to customer reviews on Amazon. After crawling 80,000 reviews, we eliminated the meaningless terms known as stopwords and extracted the useful terms using a part-of-speech tagging approach. The terms extracted by document frequency, TF-IDF, information gain, and the chi-squared statistic were ranked, and 20 ranked terms were used as the features of the SVM model. Our experimental results show that the performance of the SVM model with four POS tags is superior to that of the benchmark model, which is built by extracting only adjective terms. In addition, the SVM model based on the chi-squared statistic shows the best performance among the SVM models with the four different term extraction methods. Our proposed opinion mining model is expected to improve customer service and help gain competitive advantage in online stores.
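
Chi-squared term selection of the kind named in this abstract is usually based on a 2x2 table of document counts per term and class. The sketch below uses that textbook form with invented counts; the function names, the example terms and the choice of `top_k` are assumptions for illustration, not the authors' pipeline.

```python
def chi_squared(a, b, c, d):
    """Chi-squared statistic for one term and one class.

    a: positive reviews containing the term
    b: negative reviews containing the term
    c: positive reviews without the term
    d: negative reviews without the term
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # the term or the class never varies, so no signal
    return n * (a * d - c * b) ** 2 / denominator


def rank_terms(counts, top_k):
    """Return the top_k terms ordered by decreasing chi-squared score."""
    scored = sorted(counts, key=lambda t: chi_squared(*counts[t]), reverse=True)
    return scored[:top_k]


# Hypothetical counts from 100 reviews (50 positive, 50 negative).
term_counts = {
    "excellent": (40, 10, 10, 40),
    "battery": (25, 25, 25, 25),
    "refund": (5, 30, 45, 20),
}
selected = rank_terms(term_counts, 2)
print(selected)
```

Terms that appear equally often in both classes score zero and drop out of the ranking, which is the behaviour that makes the statistic useful as a feature filter before training a classifier.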

Optimal design of a nonparametric Shewhart-Lepage control chart (비모수적 Shewhart-Lepage 관리도의 최적 설계)

  • Lee, Sungmin;Lee, Jaeheon
    • Journal of the Korean Data and Information Science Society / v.28 no.2 / pp.339-348 / 2017
  • One of the major issues in statistical process control for variables data is monitoring both the mean and the standard deviation. The traditional approach to monitoring these parameters is to use two separate control charts simultaneously. However, there has been some work on developing a single chart that uses a single plotting statistic for joint monitoring, and it is claimed that such charts are simpler and may be more appealing than the traditional approach from a practical point of view. When using these control charts for variables data, estimating the in-control parameters and checking the normality assumption are very important steps. The nonparametric Shewhart-Lepage chart, proposed by Mukherjee and Chakraborti (2012), is an attractive option because it uses only a single control statistic and requires neither the in-control parameters nor knowledge of the underlying continuous distribution. In this paper, we introduce the Shewhart-Lepage chart and propose a design procedure to find the optimal diagnosis limits when the location and the scale parameters change simultaneously. We also compare the efficiency of the proposed method with that of Mukherjee and Chakraborti (2012).