Analysis of SEER Adenosquamous Carcinoma Data to Identify Cause Specific Survival Predictors and Socioeconomic Disparities

  • Cheung, Rex
    • Asian Pacific Journal of Cancer Prevention / v.17 no.1 / pp.347-352 / 2016
  • Background: This study used receiver operating characteristic (ROC) curve analysis of Surveillance, Epidemiology and End Results (SEER) adenosquamous carcinoma data to identify predictive models and potential disparities in outcome. Materials and Methods: This study analyzed socio-economic, staging and treatment factors available in the SEER database for adenosquamous carcinoma. For the risk modeling, each factor was fitted by a generalized linear model to predict cause specific survival, and the area under the ROC curve was computed. Similar strata were combined to construct the most parsimonious models. Results: A total of 20,712 patients diagnosed from 1973 to 2009 were included in this study. The mean follow up time (S.D.) was 54.2 (78.4) months. About two-thirds of the patients were female. The mean (S.D.) age was 63 (13.8) years. SEER stage was the most predictive factor of outcome (ROC area of 0.71). Un-staged patients made up 13.9% of the cohort; their 61.3% risk of cause specific death was higher than the 45.3% risk for regional disease and lower than the 70.3% risk for metastatic disease. Sex, site, radiotherapy, and surgery had ROC areas of about 0.55-0.65. Rural residence and race contributed to socioeconomic disparity in treatment outcome. Radiotherapy was underused even in localized and regional stages, when the intent was curative. This underuse was most pronounced in older patients. Conclusions: Anatomic stage was predictive and useful in treatment selection. Under-staging may have contributed to poor outcome.
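
The single-factor ROC analysis described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: the outcome is treated as a binary death indicator, and the observed death rate in each stratum stands in for the fitted value of a generalized linear model with one categorical factor. The function names and the toy cohort are hypothetical and are not SEER data.

```python
from collections import defaultdict


def stratum_risks(strata, died):
    """Observed risk of cause-specific death within each stratum of one factor."""
    deaths = defaultdict(int)
    totals = defaultdict(int)
    for s, d in zip(strata, died):
        totals[s] += 1
        deaths[s] += d
    return {s: deaths[s] / totals[s] for s in totals}


def roc_area(scores, died):
    """Area under the ROC curve, computed as the probability that a patient who
    died has a higher score than one who did not (ties count as one half)."""
    pos = [s for s, d in zip(scores, died) if d == 1]
    neg = [s for s, d in zip(scores, died) if d == 0]
    if not pos or not neg:
        raise ValueError("need at least one death and one survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort (assumption, not SEER data): stage and death indicator.
stage = ["localized"] * 4 + ["regional"] * 4 + ["distant"] * 4
died = [0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1]

risks = stratum_risks(stage, died)      # risk of death per stage
scores = [risks[s] for s in stage]      # each patient scored by stratum risk
auc = roc_area(scores, died)
print(risks, round(auc, 3))
```

Scoring each patient by the risk of their stratum means a factor whose strata all carry similar risks yields an area near 0.5, while a factor that separates outcomes well approaches 1.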

Analysis of SEER Glassy Cell Carcinoma Data: Underuse of Radiotherapy and Predicators of Cause Specific Survival

  • Cheung, Rex
    • Asian Pacific Journal of Cancer Prevention / v.17 no.1 / pp.353-356 / 2016
  • Background: This study used receiver operating characteristic (ROC) curve analysis of Surveillance, Epidemiology and End Results (SEER) glassy cell carcinoma data to identify predictive models and potential disparities in outcome. Materials and Methods: This study analyzed socio-economic, staging and treatment factors. For risk modeling, each factor was fitted by a generalized linear model to predict cause specific survival. Areas under the ROC curves were computed. Similar strata were combined to construct the most parsimonious models. A random sampling algorithm was used to estimate modeling errors. The risk of glassy cell carcinoma death was computed for each predictor for comparison. Results: There were 79 patients included in this study. The mean follow up time (S.D.) was 37 (32.8) months. Female patients outnumbered males 4:1. The mean (S.D.) age was 54.4 (19.8) years. SEER stage was the most predictive factor of outcome (ROC area of 0.69). The risks of cause specific death were, respectively, 9.4% for localized, 16.7% for regional, 35% for the un-staged/others category, and 60% for distant disease. After optimization, the separation between the regional and un-staged/others categories was removed, yielding a higher ROC area of 0.72. Several socio-economic factors had small but measurable effects on outcome. Radiotherapy had not been used in 90% of patients with regional disease. Conclusions: Optimized SEER stage was predictive and useful in treatment selection. Underuse of radiotherapy may have contributed to poor outcome.

Data Mining System in the Service Industry : Delphi Study

  • Hyun, Sung-Hyup;Huh, Jin;Hahm, Sung-Pil
    • Journal of Korea Society of Industrial Information Systems / v.10 no.4 / pp.128-136 / 2005
  • The use of technology is increasing within the service industry, but there is some doubt as to whether the benefits of technologies such as data mining have been efficiently harnessed. Data mining is the process of extracting predictive information from databases, which can evolve from currently used restaurant management systems. Harnessing this predictive information can have an enormous impact on a restaurant's operation as a whole, particularly in the areas of customer retention and competition. Since there is insufficient literature on the use of data mining in the restaurant industry, this study is both seminal and investigative, using a Delphi survey to explore and describe the current and future applications of this process.

Racial and Social Economic Factors Impact on the Cause Specific Survival of Pancreatic Cancer: A SEER Survey

  • Cheung, Rex
    • Asian Pacific Journal of Cancer Prevention / v.14 no.1 / pp.159-163 / 2013
  • Background: This study used Surveillance, Epidemiology and End Results (SEER) pancreatic cancer data to identify predictive models and potential socio-economic disparities in pancreatic cancer outcome. Materials and Methods: For risk modeling, the Kaplan-Meier method was used for cause specific survival analysis. The Kolmogorov-Smirnov test was used to compare survival curves. The Cox proportional hazards method was applied for multivariate analysis. The area under the ROC curve was computed for predictors of absolute risk of death, optimized to improve efficiency. Results: This study included 58,747 patients. The mean follow up time (S.D.) was 7.6 (10.6) months. SEER stage and grade were strongly predictive in univariate analysis. Sex, race, and three socio-economic factors (county level family income, rural-urban residence status, and county level education attainment) were independent multivariate predictors. Racial and socio-economic factors were associated with about a 2% difference in absolute cause specific survival. Conclusions: This study found significant effects of socio-economic factors on pancreatic cancer outcome. These data may generate hypotheses for trials to eliminate these outcome disparities.
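
The Kaplan-Meier step named in this abstract can be sketched as a product-limit estimate: at each time with an observed death, the running survival is multiplied by one minus deaths divided by the number still at risk. The sketch below is illustrative only; the function name and the follow-up values are hypothetical and are not SEER data.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with an observed death.

    times  : follow-up time per patient
    events : 1 if cause-specific death was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after time t.
        at_risk -= leaving
    return curve


# Hypothetical follow-up in months (assumption, not SEER data).
months = [2, 3, 3, 5, 8, 8, 12, 20]
event = [1, 1, 0, 1, 0, 1, 1, 0]
km_curve = kaplan_meier(months, event)
for t, s in km_curve:
    print(t, round(s, 3))
```

Censored patients contribute to the number at risk up to their last follow-up time but never trigger a drop in the curve, which is what lets the estimate use incomplete follow-up.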

Bayesian curve-fitting with radial basis functions under functional measurement error model

  • Hwang, Jinseub;Kim, Dal Ho
    • Journal of the Korean Data and Information Science Society / v.26 no.3 / pp.749-754 / 2015
  • This article presents a Bayesian approach to regression splines with knots on a grid of equally spaced sample quantiles of the independent variables under a functional measurement error model. We consider a small area model that uses penalized splines to capture non-linear patterns. Specifically, we use radial basis functions as the basis functions of the regression spline. To fit the model and estimate parameters, we suggest a hierarchical Bayesian framework using Markov Chain Monte Carlo methodology. Furthermore, we illustrate the method with a data application. We check convergence with the potential scale reduction factor, and we use the posterior predictive p-value and the mean logarithmic conditional predictive ordinate to compare models.
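
The convergence check mentioned in this abstract, the potential scale reduction factor, compares the variance between several MCMC chains with the variance within them. A minimal sketch of the basic Gelman-Rubin form for one scalar parameter is given below; the function name and the example chains are hypothetical, and the abstract does not say which variant the authors used.

```python
from math import sqrt
from statistics import mean, variance


def potential_scale_reduction(chains):
    """Basic Gelman-Rubin potential scale reduction factor for one parameter.

    chains: list of m equal-length lists of posterior draws (after burn-in).
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean(variance(c) for c in chains)           # W: mean within-chain variance
    between = n * variance([mean(c) for c in chains])    # B: between-chain variance
    if within == 0:
        raise ValueError("within-chain variance is zero")
    pooled = (n - 1) / n * within + between / n          # pooled variance estimate
    return sqrt(pooled / within)


# Hypothetical draws (assumption): two chains exploring the same region,
# and two chains stuck in different regions.
rhat_mixed = potential_scale_reduction([[1, 2, 3, 4], [4, 3, 2, 1]])
rhat_stuck = potential_scale_reduction([[1, 2, 3, 4], [11, 12, 13, 14]])
print(round(rhat_mixed, 3), round(rhat_stuck, 3))
```

Values close to 1 suggest the chains agree; values well above 1, as in the second example, suggest the chains have not yet converged to a common distribution.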

Modeling Aided Lead Design of FAK Inhibitors

  • Madhavan, Thirumurthy
    • Journal of Integrative Natural Science / v.4 no.4 / pp.266-272 / 2011
  • Focal adhesion kinase (FAK) is a potential target for the treatment of primary cancers as well as the prevention of tumor metastasis. To understand the structural and chemical features of FAK inhibitors, we report comparative molecular field analysis (CoMFA) for a series of 7H-pyrrolo(2,3-d)pyrimidines. The CoMFA models showed good correlation between the actual and predicted values for training set molecules. Our results indicated that the ligand-based alignment produced better statistical results for CoMFA ($q^2$ = 0.505, $r^2$ = 0.950). Both models were validated using test set compounds and gave good predictive values of 0.537. The statistical parameters from the generated 3D-QSAR models indicated that the data are well fitted and that the models have high predictive ability. The contour maps from the 3D-QSAR models explain the structure-activity relationships of FAK inhibitors, and our results should give useful guidelines for further enhancing the activity of novel inhibitors.
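
The two statistics quoted in this abstract can be illustrated with a small sketch: $r^2$ measures fit on the training set, while $q^2$ is conventionally the leave-one-out cross-validated analogue, $q^2 = 1 - \mathrm{PRESS}/SS_{tot}$. As a loudly stated assumption, ordinary least squares on a single hypothetical descriptor stands in here for the partial least squares model used in CoMFA, and the data values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def total_sum_of_squares(ys):
    my = sum(ys) / len(ys)
    return sum((y - my) ** 2 for y in ys)


def r_squared(xs, ys):
    """Conventional r^2: fit and evaluate on the same training set."""
    slope, intercept = fit_line(xs, ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_res / total_sum_of_squares(ys)


def q_squared_loo(xs, ys):
    """Leave-one-out q^2: each compound is predicted by a model fitted without it."""
    press = 0.0
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / total_sum_of_squares(ys)


# Hypothetical descriptor values and activities (assumption, not from the paper).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [1.1, 1.9, 3.2, 3.9, 5.1]
r2 = r_squared(descriptor, activity)
q2 = q_squared_loo(descriptor, activity)
print(round(r2, 3), round(q2, 3))
```

Because each left-out compound is predicted by a model that never saw it, $q^2$ cannot exceed $r^2$ for a least squares fit, which is consistent with the gap between the two values reported above.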

Performance Evaluation of a Feature-Importance-based Feature Selection Method for Time Series Prediction

  • Hyun, Ahn
    • Journal of information and communication convergence engineering / v.21 no.1 / pp.82-89 / 2023
  • Various machine-learning models may yield high predictive power for prediction on massive time series. However, these models are prone to instability in terms of computational cost because of the high dimensionality of the feature space and nonoptimized hyperparameter settings. Considering the potential risk that model training with a high-dimensional feature set can be time-consuming, we evaluate a feature-importance-based feature selection method to derive a tradeoff between predictive power and computational cost for time series prediction. We used two machine learning techniques for performance evaluation to generate prediction models from a retail sales dataset. First, we ranked the features using impurity-based and Local Interpretable Model-agnostic Explanations (LIME)-based feature importance measures in the prediction models. Then, the recursive feature elimination method was applied to eliminate unimportant features sequentially. Consequently, we obtained a subset of features that could lead to reduced model training time while preserving acceptable model performance.

Evaluating seismic liquefaction potential using multivariate adaptive regression splines and logistic regression

  • Zhang, Wengang;Goh, Anthony T.C.
    • Geomechanics and Engineering / v.10 no.3 / pp.269-284 / 2016
  • Simplified techniques based on in situ testing methods are commonly used to assess seismic liquefaction potential. Many of these simplified methods were developed by analyzing liquefaction case histories from which the liquefaction boundary (limit state) separating two categories (the occurrence or non-occurrence of liquefaction) is determined. As the liquefaction classification problem is highly nonlinear in nature, it is difficult to develop a comprehensive model using conventional modeling techniques that take into consideration all the independent variables, such as the seismic and soil properties. In this study, a modification of the Multivariate Adaptive Regression Splines (MARS) approach based on Logistic Regression (LR), denoted LR_MARS, is used to evaluate seismic liquefaction potential based on actual field records. Three different LR_MARS models were used to analyze three different field liquefaction databases, and the results are compared with those of neural network approaches. The developed spline functions and the limit state functions obtained reveal that the LR_MARS models can capture and describe the intrinsic, complex relationship between seismic parameters, soil parameters, and the liquefaction potential without having to make any assumptions about the underlying relationship between the various variables. Considering its computational efficiency, simplicity of interpretation, predictive accuracy, data-driven and adaptive nature, and ability to map the interaction between variables, the use of the LR_MARS model in assessing seismic liquefaction potential is promising.

Privacy Disclosure and Preservation in Learning with Multi-Relational Databases

  • Guo, Hongyu;Viktor, Herna L.;Paquet, Eric
    • Journal of Computing Science and Engineering / v.5 no.3 / pp.183-196 / 2011
  • There has recently been a surge of interest in relational database mining that aims to discover useful patterns across multiple interlinked database relations. It is crucial for a learning algorithm to explore the multiple inter-connected relations so that important attributes are not excluded when mining such relational repositories. However, from a data privacy perspective, it becomes difficult to identify all possible relationships between attributes from the different relations, considering a complex database schema. That is, seemingly harmless attributes may be linked to confidential information, leading to data leaks when building a model. Thus, we are at risk of disclosing unwanted knowledge when publishing the results of a data mining exercise. For instance, consider a financial database classification task to determine whether a loan is considered high risk. Suppose that we are aware that the database contains another confidential attribute, such as income level, that should not be divulged. One may thus choose to eliminate, or distort, the income level from the database to prevent potential privacy leakage. However, even after distortion, a model learned from the modified database may accurately determine the income level values. It follows that the database is still unsafe and may be compromised. This paper demonstrates this potential for privacy leakage in multi-relational classification and illustrates how such potential leaks may be detected. We propose a method to generate a ranked list of subschemas that maintain the predictive performance on the class attribute, while limiting the disclosure risk, and predictive accuracy, of confidential attributes. We illustrate and demonstrate the effectiveness of our method against a financial database and an insurance database.

Determination of Commercialization Potential Through Patent Attribute Assessment in Lithium Ion Battery Technology (특허가치 평가지표 선정을 통한 기술 사업화 가능성 판단 : 리튬이온전지분야)

  • Kim, Wanki
    • Journal of Korean Institute of Industrial Engineers / v.40 no.2 / pp.240-249 / 2014
  • This study aims to identify an assessment system based on multiple patent indices that can predict the likelihood of success in the commercialization of a patented technology in advance. In addition, we examine the effectiveness of our predictive model in identifying valuable technologies early on. We analyzed 3,063 secondary battery technologies patented in the US over the past 10 years. Our analysis identified 22 of the 25 most promising patented technologies, corresponding with the top 50% of industry-patented technologies that directly and indirectly succeeded in commercialization. These results support our claim that it is possible to identify attributes for the assessment of patent commercial potential to a significant degree. Our system presents a useful assessment index in the forecasting and determination of potential commercial success of patented technologies.