• Title/Summary/Keyword: Performance-based Statistics

Comparison of survival prediction models for pancreatic cancer: Cox model versus machine learning models

  • Kim, Hyunsuk;Park, Taesung;Jang, Jinyoung;Lee, Seungyeoun
    • Genomics & Informatics
    • /
    • v.20 no.2
    • /
    • pp.23.1-23.9
    • /
    • 2022
  • A survival prediction model has recently been developed to evaluate the prognosis of resected nonmetastatic pancreatic ductal adenocarcinoma based on a Cox model using two nationwide databases: Surveillance, Epidemiology and End Results (SEER) and Korea Tumor Registry System-Biliary Pancreas (KOTUS-BP). In this study, we applied two machine learning methods, random survival forests (RSF) and support vector machines (SVM), to survival analysis and compared their prediction performance using the SEER and KOTUS-BP datasets. Three schemes were used for model development and evaluation. First, we used data from SEER for model development and data from KOTUS-BP for external evaluation. Second, the two datasets were swapped, with data from KOTUS-BP used for model development and data from SEER for external evaluation. Finally, we mixed the two datasets half and half and used the mixed datasets for model development and validation. We used 9,624 patients from SEER and 3,281 patients from KOTUS-BP to construct a prediction model with seven covariates: age, sex, histologic differentiation, adjuvant treatment, resection margin status, and the American Joint Committee on Cancer 8th edition T-stage and N-stage. Comparing the three schemes, the performance of the Cox model, RSF, and SVM was better when using the mixed datasets than when using the unmixed datasets. When using the mixed datasets, the C-index and the 1-year, 2-year, and 3-year time-dependent areas under the curve for the Cox model were 0.644, 0.698, 0.680, and 0.687, respectively. The Cox model performed slightly better than RSF and SVM.
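
The C-index reported in the abstract above measures how well a model ranks patients by risk. The following is a minimal sketch of Harrell's concordance index for right-censored data, not the authors' code; the function name, the argument names, and the simplification of skipping pairs with tied observed times are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times:       observed follow-up time for each patient
    events:      1 if the event was observed, 0 if the patient was censored
    risk_scores: model output, where a higher score means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Assumption: pairs with tied times are skipped. A pair is comparable
        # only when the shorter time ends in an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # the earlier event had the higher predicted risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    # Toy data, not taken from SEER or KOTUS-BP.
    print(concordance_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.7, 0.8, 0.1]))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which puts the reported 0.644 in context.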

Bayesian logit models with auxiliary mixture sampling for analyzing diabetes diagnosis data (보조 혼합 샘플링을 이용한 베이지안 로지스틱 회귀모형 : 당뇨병 자료에 적용 및 분류에서의 성능 비교)

  • Rhee, Eun Hee;Hwang, Beom Seuk
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.1
    • /
    • pp.131-146
    • /
    • 2022
  • Logit models are commonly used to predict and classify categorical response variables. Most Bayesian approaches to logit models are implemented with the Metropolis-Hastings algorithm. However, the algorithm suffers from slow convergence and from the difficulty of choosing an adequate proposal distribution. Therefore, we use the auxiliary mixture sampler proposed by Frühwirth-Schnatter and Frühwirth (2007) to estimate logit models. This method introduces two sequences of auxiliary latent variables so that the logit model satisfies normality and linearity. As a result, the logit model can be implemented easily by Gibbs sampling. We applied the proposed method to diabetes data from the Community Health Survey (2020) of the Korea Disease Control and Prevention Agency and compared its performance with that of the Metropolis-Hastings algorithm. In addition, we showed that the logit model using auxiliary mixture sampling achieves classification performance comparable to that of machine learning models.

Implementation and Performance Analysis of Web Robot for URL Analysis (URL 분석을 위한 웹 로봇 구현 및 성능분석)

  • Kim, Weon;Kim, Hie-Cheol;Chin, Yong-Ohk
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.27 no.3C
    • /
    • pp.226-233
    • /
    • 2002
  • This paper proposes a multi-agent web robot in which the functions for collecting webpages are divided among agents so that their mutual dependency is minimized. Through performance analysis of the implemented system, the paper aims to lay a foundation for producing useful statistics on domestic webpages, such as the composition ratio of text and multimedia files. A web robot that collects webpages by sequential processing readily reaches a limit in collection performance under the same resource environment. In addition, webpages contain "dead-link" URLs, which are caused by temporary host downtime and unstable network resources. If a webpage contains many dead-link URLs, the web robot takes a long time to collect the HTML. The approach proposed in this paper improves webpage collection by handling dead-link URLs in an Inactive URL Scanner agent.

Small Cell Communication Analysis based on Machine Learning in 5G Mobile Communication

  • Kim, Yoon-Hwan
    • Journal of Integrative Natural Science
    • /
    • v.14 no.2
    • /
    • pp.50-56
    • /
    • 2021
  • With the recent growth of the mobile streaming market, mobile traffic is increasing exponentially. IMT-2020, designated by the ITU as the next-generation mobile communication standard, is known as 5th generation mobile communication (5G); it is a technology that delivers greater data traffic capacity, low latency, high energy efficiency, and economic efficiency compared to the existing LTE (Long Term Evolution) system. 5G achieves this by using a high frequency band, but the high frequency band brings a path loss problem that greatly affects system performance. In this paper, small cell technology is presented as a solution for the high frequency utilization of the 5G mobile communication system, and system performance is further improved by applying machine learning to the decision between macro communication and small cell communication. System performance was found to improve as a result of applying this technology together with machine learning techniques.

Statistical Algorithm in Genetic Linkage Based on Haplotypes (일배체형에 기초한 연쇄분석의 통계학적 알고리즘 연구)

  • Kim, Jin-Heum;Kang, Dae-Ryong;Lee, Yun-Kyung;Shin, Sun-Mi;Suh, Il;Nam, Chung-Mo
    • Journal of Preventive Medicine and Public Health
    • /
    • v.37 no.4
    • /
    • pp.366-372
    • /
    • 2004
  • Objectives: This study was conducted to propose a new transmission/disequilibrium test (TDT) for testing the linkage between genetic markers and disease-susceptibility genes based on haplotypes. Simulation studies were performed to compare the proposed method with that of Zhao et al. in terms of type I error probability and power. Methods: We estimated the haplotype frequencies using the expectation-maximization (EM) algorithm with parents' genotypes taken from a trio dataset, and then constructed a two-way contingency table containing the estimated frequencies of all possible pairs of parents' haplotypes. We proposed a score test based on the differences between column marginals and their corresponding row marginals. The test also involves the covariance structure of the marginal differences and their variances. In the simulation, we considered a coalescent model with three biallelic genetic markers to investigate the performance of the proposed test under six different configurations. Results: The haplotype-based TDT statistics, our test and Zhao et al.'s test, satisfied the type I error probability, but the TDT based on a single locus showed a conservative trend. As expected, the tests based on haplotypes also had better power than those based on a single locus. Our test and that of Zhao et al. were comparable in power. Conclusion: We proposed a TDT statistic based on haplotypes and showed through simulations that our test was more powerful than the single-locus test. We will extend our method to multiplex data with affected and/or unaffected sibling(s) and to simplex data with only one parent's genotype.
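
For orientation, the single-locus TDT that the abstract above uses as a comparison can be sketched in a few lines. This is the classical McNemar-type statistic, not the haplotype-based score test the authors propose; the function and argument names are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classical single-locus TDT (illustrative sketch).

    transmitted:   number of heterozygous parents who transmitted the
                   candidate allele to the affected child
    untransmitted: number of heterozygous parents who transmitted the
                   other allele
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Toy counts, not taken from the paper.
    chi2, p = tdt_statistic(60, 40)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

The haplotype-based tests generalize this idea from one 2x2 comparison to the row and column marginals of a larger contingency table.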

A study on relationship between the performance of professional baseball players and annual salary (한국 프로야구 선수들의 경기력과 연봉의 관계 분석)

  • Seung, Hee-Bae;Kang, Kee-Hoon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.23 no.2
    • /
    • pp.285-298
    • /
    • 2012
  • This research examines the relationship between the performance of Korean professional baseball players and their annual salaries. It is based on sabermetrics, which measures the performance of batters in a refined way. We collect the records of batters from eight professional baseball clubs during the 2009 and 2010 seasons and calculate every sabermetric index. Principal component analysis is used to examine the relationship between those sabermetric indexes and the annual salary for the following year. In general, batters who perform better receive higher salaries. The results of this research can be useful for reaching a salary agreement between a player and his club.

A novel route restoring method upon geo-tagged photos

  • Wang, Guannan;Wang, Zhizhong;Zhu, Zhenmin;Wen, Saiping
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.7 no.5
    • /
    • pp.1236-1251
    • /
    • 2013
  • Sharing geo-tagged photos has become a popular social activity in daily life, because these photos not only contain geographic information but also indicate people's hobbies, intentions, and mobility patterns. However, raw geo-tagged photo routes cannot provide as much information as complete GPS trajectories because of the defects hidden in them. This paper analyzes large amounts of geo-tagged photos and proposes a novel travel route restoring method. In our approach, we first propose an Interest Measure Ratio to rank hot spots based on a density-based spatial clustering algorithm. We then apply the Hidden Semi-Markov model and the Mean Value method to characterize migration patterns among the hot spots and restore the significant region sequence into a complete GPS trajectory. Finally, a novel experimental method is designed to demonstrate that the approach is feasible for restoring routes and performs well.

A study on alternatives to the permutation test in gene-set analysis (유전자집합분석에서 순열검정의 대안)

  • Lee, Sunho
    • The Korean Journal of Applied Statistics
    • /
    • v.31 no.2
    • /
    • pp.241-251
    • /
    • 2018
  • The analysis of gene sets in microarray data has advantages in interpreting biological functions and increasing statistical power. Many statistical methods have been proposed for detecting significant gene sets that show relations between genes and phenotypes, but there is no consensus on which is best for gene set analysis, and permutation-based tests are considered the standard tools. When many gene sets are tested simultaneously, multiple testing requires a large number of random permutations, at a high computational cost. In this paper, several parametric approximations are considered as alternatives to the permutation distribution; the moment-based gene set test showed the best performance, approximating the p-values of the permutation test closely and quickly within a general framework.
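
To make the computational cost mentioned in the abstract above concrete, here is a minimal sketch of a permutation test for one gene set. The statistic (sum of squared differences in group means), the function names, and the default number of permutations are assumptions chosen for illustration; the paper's statistics and its moment-based approximation are not reproduced here.

```python
import random


def gene_set_statistic(expr, labels):
    """Sum over genes of the squared difference in mean expression
    between phenotype groups 1 and 0 (a hypothetical statistic).

    expr:   list of genes, each a list of expression values per sample
    labels: phenotype label (0 or 1) for each sample
    """
    stat = 0.0
    for gene in expr:
        group1 = [x for x, lab in zip(gene, labels) if lab == 1]
        group0 = [x for x, lab in zip(gene, labels) if lab == 0]
        diff = sum(group1) / len(group1) - sum(group0) / len(group0)
        stat += diff * diff
    return stat


def permutation_p_value(expr, labels, n_perm=1000, seed=0):
    """Estimate the p-value by shuffling phenotype labels n_perm times."""
    rng = random.Random(seed)
    observed = gene_set_statistic(expr, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if gene_set_statistic(expr, shuffled) >= observed:
            exceed += 1
    # Adding 1 to both counts keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data: two genes, ten samples, clearly separated groups.
    expr = [[0, 0, 0, 0, 0, 5, 5, 5, 5, 5],
            [1, 1, 1, 1, 1, 6, 6, 6, 6, 6]]
    labels = [0] * 5 + [1] * 5
    print(permutation_p_value(expr, labels))
```

Every gene set needs its own loop of `n_perm` recomputations, and small p-values require large `n_perm`, which is the cost that parametric approximations aim to avoid.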

Cost Reduction and Improving Profitability of Par Level Transfer System for Reagent Materials (정량보충제 도입에 따른 비용절감 및 수익성 증대 효과)

  • Vae, Suk Jin;Hwang, Sung Wan
    • Korea Journal of Hospital Management
    • /
    • v.17 no.4
    • /
    • pp.21-31
    • /
    • 2012
  • This is a case study of Gangnam S University Hospital, which applied a par level transfer system for reagent materials. The purpose of this study is to evaluate the reduction in inventory expenses and the effect on medical service revenue from a resource-based view. The data were obtained from the financial statements of Gangnam S Hospital for fiscal years 2008, 2009, 2010, and 2011 and compared with the Korea health industry statistics index for hospital accounts, based on materials from the Korea Health Industry Development Institute. The results of the study are as follows. Expenditure on medical reagent materials was cut by 305 million won through fiscal year 2009. The hospital's 2011 income statement shows medical profits of well over 3.37 billion won, gained through an increased number of diagnostic tests. In conclusion, the health statistics index of Gangnam S University Hospital shows very high profits. The results of this study are limited in terms of generalization because only one hospital in Seoul was studied. Further studies on the relationship between inventory performance and expanded reagent materials are expected in this area.

Regression models based on cumulative data for forecasting of new product (신제품 수요예측을 위하여 누적자료를 활용한 회귀모형에 관한 연구)

  • Park, Sang-Gue;Oh, Jung-Hyun
    • Journal of the Korean Data and Information Science Society
    • /
    • v.20 no.1
    • /
    • pp.117-124
    • /
    • 2009
  • When time series data with a seasonal effect are available, various statistical models, such as the Winters method, can be used for successful forecasting. But if the data are insufficient to estimate the seasonal effect, few methods are available. This paper proposes a statistical forecasting method based on cumulative data for the case where the data are insufficient to estimate the seasonal effect. We apply this method to real cosmetics sales data and show that it performs better than the moving average method.
