• Title/Summary/Keyword: Cancer prediction


Classification of Genes Based on Age-Related Differential Expression in Breast Cancer

  • Lee, Gunhee;Lee, Minho
    • Genomics & Informatics / v.15 no.4 / pp.156-161 / 2017
  • Transcriptome analysis has been widely used to build biomarker panels for diagnosing cancers. In breast cancer, the age of the patient is known to be associated with clinical features. As clinical transcriptome data have accumulated significantly, we classified all human genes based on age-specific differential expression between normal and breast cancer cells using public data. We retrieved gene expression levels in breast cancer and matched normal cells from The Cancer Genome Atlas. In the first classification, we divided genes into two classes by paired t-test without considering age. In the secondary classification, we divided the genes of each class into eight groups, based on the patterns of the p-values calculated for each of the three age groups we defined. Through this two-step classification, genes were grouped into 16 classes. By comparing the performance of prediction models built with different combinations of genes, we showed that this classification method could be applied to establish a more accurate prediction model for diagnosing breast cancer. We expect that our classification scheme could be used for other types of cancer data.
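
The two-step scheme in the abstract above (2 classes, then 8 groups each, for 16 classes) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the two primary classes are "significant" versus "not significant" in the age-agnostic paired t-test, that the eight secondary groups are the 2^3 significance patterns across the three age groups, and that the threshold is 0.05. None of these details are stated in the abstract, and `classify_gene` is a hypothetical name.

```python
from typing import Sequence, Tuple

ALPHA = 0.05  # assumed significance threshold; the abstract does not state one


def classify_gene(overall_p: float, age_group_ps: Sequence[float],
                  alpha: float = ALPHA) -> Tuple[int, int, int]:
    """Return (primary_class, secondary_group, combined_class) for one gene.

    overall_p    -- p-value of the paired t-test ignoring age
    age_group_ps -- p-values of the same test within each of three age groups
    """
    if len(age_group_ps) != 3:
        raise ValueError("expected one p-value per age group (three in total)")

    # Step 1: two classes, here taken as significant vs. not significant.
    primary = 1 if overall_p < alpha else 0

    # Step 2: the significance pattern across the three age groups is read
    # as a 3-bit number, which gives 2**3 = 8 possible groups.
    secondary = 0
    for p in age_group_ps:
        secondary = (secondary << 1) | (1 if p < alpha else 0)

    # Combined label in 0..15 (2 primary classes x 8 secondary groups).
    return primary, secondary, primary * 8 + secondary
```

For example, a gene that is significant overall and in the two older age groups only would receive the pattern `0b011`, that is, secondary group 3 within primary class 1.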

Development and Validation of a Breast Cancer Risk Prediction Model for Thai Women: A Cross-Sectional Study

  • Anothaisintawee, Thunyarat;Teerawattananon, Yot;Wiratkapun, Cholatip;Srinakarin, Jiraporn;Woodtichartpreecha, Piyanoot;Hirunpat, Siriporn;Wongwaisayawan, Sansanee;Lertsithichai, Panuwat;Kasamesup, Vijj;Thakkinstian, Ammarin
    • Asian Pacific Journal of Cancer Prevention / v.15 no.16 / pp.6811-6817 / 2014
  • Background: Breast cancer risk prediction models are widely used in clinical practice. They should be useful in identifying high-risk women for screening in limited-resource countries. However, previous models showed poor performance in derivation and validation settings. Therefore, we aimed to develop and validate a breast cancer risk prediction model for Thai women. Materials and Methods: This cross-sectional study consisted of derivation and validation phases. Data collected at Ramathibodi and two other hospitals were used for deriving and externally validating models, respectively. Multiple logistic regression was applied to construct the model. Calibration and discrimination performances were assessed using the observed/expected ratio and concordance statistic (C-statistic), respectively. A bootstrap with 200 repetitions was applied for internal validation. Results: Age, menopausal status, body mass index, and use of oral contraceptives were significantly associated with breast cancer and were included in the model. Observed/expected ratio and C-statistic were 1.00 (95% CI: 0.82, 1.21) and 0.651 (95% CI: 0.595, 0.707), respectively. Internal validation showed good performance with a bias of 0.010 (95% CI: 0.002, 0.018) and C-statistic of 0.646 (95% CI: 0.642, 0.650). The observed/expected ratio and C-statistic from external validation were 0.97 (95% CI: 0.68, 1.35) and 0.609 (95% CI: 0.511, 0.706), respectively. Risk scores were created and stratified into low (0-0.86), low-intermediate (0.87-1.14), intermediate-high (1.15-1.52), and high-risk (1.53-3.40) groups. Conclusions: A Thai breast cancer risk prediction model was created with good calibration and fair discrimination performance. Risk stratification should help prioritize high-risk women for an organized breast cancer screening program in Thailand and other limited-resource countries.
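
To make the two performance measures in the abstract above concrete, the sketch below computes an observed/expected ratio and a C-statistic from outcomes and predicted probabilities, and maps a risk score to the four reported strata. The function names are hypothetical, the handling of scores that fall between the published bands (for example 0.865) is an assumption, and the study's fitted model coefficients are not reproduced because the abstract does not give them.

```python
from typing import Sequence


def observed_expected_ratio(outcomes: Sequence[int], probs: Sequence[float]) -> float:
    """Calibration: observed number of cases / sum of predicted probabilities."""
    return sum(outcomes) / sum(probs)


def c_statistic(outcomes: Sequence[int], probs: Sequence[float]) -> float:
    """Discrimination: share of case/non-case pairs ranked correctly.

    Tied predictions count as half a concordant pair.
    """
    cases = [p for y, p in zip(outcomes, probs) if y == 1]
    controls = [p for y, p in zip(outcomes, probs) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    concordant = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                concordant += 1.0
            elif pc == pn:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def risk_group(score: float) -> str:
    """Map a risk score to the strata reported in the abstract.

    Assumption: each band extends up to the start of the next one, and
    anything above 1.52 is treated as high risk.
    """
    if score <= 0.86:
        return "low"
    if score <= 1.14:
        return "low-intermediate"
    if score <= 1.52:
        return "intermediate-high"
    return "high"
```

A C-statistic of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is why the reported values of about 0.61 to 0.65 are described as fair discrimination.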

Prediction of Lung Cancer Susceptibility using an Importance Evaluation of SNP Data and SVM Learning (SNP 데이터의 중요도 평가와 SVM 학습법을 이용한 폐암 감수성 예측)

  • Ryoo, Myung-Chun;Kim, Sang-Jin;Park, Chang-Hyeon
    • The Journal of the Korea Contents Association / v.8 no.10 / pp.11-19 / 2008
  • In this paper, we propose a method for predicting lung cancer susceptibility using SVM learning and an importance evaluation of SNP data, which are genetic data related to developing lung cancer. Since the number of negative samples is much larger than the number of positive samples used in SVM learning, for each positive sample we first search for a negative sample that has the same sex and the minimum age difference. The selected negative sample is then paired with the positive sample. To evaluate the importance of each SNP, we adopt an equation that calculates the influence of each SNP on the prediction of disease onset. The SNPs are sorted according to the evaluated importance. In experiments, we observed how the prediction accuracy varies with the number of sorted SNPs used in learning. LOOCV test results showed that the proposed method yields a maximum prediction accuracy of 65.0% on test data.
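
The pairing step described above (same sex, minimum age difference) could look roughly like the sketch below. The record layout with `id`, `sex`, and `age` keys and the name `pair_controls` are illustrative assumptions, as is the choice to use each negative sample at most once; the abstract does not say how reuse or ties are handled.

```python
from typing import Any, Dict, List, Tuple

Sample = Dict[str, Any]  # hypothetical record: {"id": ..., "sex": ..., "age": ...}


def pair_controls(positives: List[Sample],
                  negatives: List[Sample]) -> List[Tuple[Sample, Sample]]:
    """Pair each positive with the unused same-sex negative closest in age."""
    available = list(negatives)
    pairs = []
    for pos in positives:
        candidates = [n for n in available if n["sex"] == pos["sex"]]
        if not candidates:
            continue  # no same-sex negative left; this positive stays unpaired
        best = min(candidates, key=lambda n: abs(n["age"] - pos["age"]))
        available.remove(best)  # assumption: each negative is used at most once
        pairs.append((pos, best))
    return pairs
```

The resulting balanced pairs would then form the training set for the SVM, with the SNP importance ranking deciding which features are kept.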

An Analysis of Nursing Needs for Hospitalized Cancer Patients;Using Data Mining Techniques (데이터 마이닝을 이용한 입원 암 환자 간호 중증도 예측모델 구축)

  • Park, Sun-A
    • Asian Oncology Nursing / v.5 no.1 / pp.3-10 / 2005
  • Background: Nurses now account for one third of all hospital human resources, so efficient management of nursing manpower is becoming more important. Although nursing workload requirement analysis and patient severity classification should clearly come first for efficient allocation of the nursing workforce, these processes have been conducted manually with ad hoc rules. Purposes: This study aimed to build a prediction model for patient classification according to nursing need. We sought an easier and faster method of classifying patients that can support efficient management of nursing manpower. Methods: Nursing patient classification data for cancer patients hospitalized at one of the biggest cancer centers in Korea from January 1 to December 31, 2003 were assessed by trained nurses. This study developed a prediction model and analyzed nursing needs using data mining techniques. Patients were classified by three different data mining techniques (logistic regression, decision tree, and neural network), and the results were assessed. Results: The data set was created from 165,073 records for 2,228 patients in the patient classification database. The main explanatory variables for the three techniques were as follows: 1) logistic regression: age, month, and section; 2) decision tree: section, month, age, and tumor; 3) neural network: section, diagnosis, age, sex, metastasis, hospital days, and month. Among the three techniques, the neural network showed the best predictive power in ROC curve verification. The patient classification prediction model developed with the neural network based on nursing needs achieved a prediction accuracy of 84.06%. Conclusion: The patient classification prediction model was developed and tested in this study using real patient data. The results can be used for more accurate calculation of required nursing staff and more effective use of the labor force.


Extracting Wisconsin Breast Cancer Prediction Fuzzy Rules Using Neural Network with Weighted Fuzzy Membership Functions (가중 퍼지 소속함수 기반 신경망을 이용한 Wisconsin Breast Cancer 예측 퍼지규칙의 추출)

  • Lim Joon Shik
    • The KIPS Transactions:PartB / v.11B no.6 / pp.717-722 / 2004
  • This paper presents fuzzy rules for predicting the diagnosis of Wisconsin breast cancer using a neural network with weighted fuzzy membership functions (NNWFM). NNWFM is capable of self-adapting weighted membership functions to enhance prediction accuracy from the given clinical training data. n sets of small, medium, and large weighted triangular membership functions in a hyperbox are used to represent n sets of featured inputs. The membership functions are randomly distributed and weighted initially, and their positions and weights are then adjusted during learning. After learning, prediction rules are extracted directly from the enhanced bounded sums of the n sets of weighted fuzzy membership functions. The two prediction rules extracted from NNWFM outperform currently published results in both number of rules and accuracy, reaching 99.41%.
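
As a rough illustration of the building blocks named above, the sketch below evaluates a weighted triangular membership function and combines several of them with an ordinary bounded sum, min(1, sum). This is an assumption-laden simplification: the paper's "enhanced" bounded sum and its learning rule are not specified in the abstract, and the function names and parameter layout here are hypothetical.

```python
from typing import Iterable, Tuple


def triangular(x: float, left: float, center: float, right: float) -> float:
    """Triangular membership: 0 outside (left, right), rising to 1 at center."""
    if x <= left or x >= right:
        return 0.0
    if x <= center:
        return (x - left) / (center - left)
    return (right - x) / (right - center)


def weighted_bounded_sum(
    x: float, functions: Iterable[Tuple[float, float, float, float]]
) -> float:
    """Combine weighted memberships with a plain bounded sum, min(1, total).

    functions -- (weight, left, center, right) for each membership function,
                 e.g. the small, medium, and large functions of one feature
    """
    total = sum(w * triangular(x, l, c, r) for w, l, c, r in functions)
    return min(1.0, total)
```

In a trained network of this kind, the weights and the three vertices of each triangle would be the quantities adjusted during learning.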

Feasibility study of deep learning based radiosensitivity prediction model of National Cancer Institute-60 cell lines using gene expression

  • Kim, Euidam;Chung, Yoonsun
    • Nuclear Engineering and Technology / v.54 no.4 / pp.1439-1448 / 2022
  • Background: We investigated the feasibility of in vitro radiosensitivity prediction with gene expression using deep learning. Methods: Microarray gene expression data of the National Cancer Institute-60 (NCI-60) panel were acquired from the Gene Expression Omnibus. The clonogenic surviving fractions at an absorbed dose of 2 Gy (SF2) from previous publications were used to measure in vitro radiosensitivity. The radiosensitivity prediction model was based on a convolutional neural network. A 6-fold cross-validation (CV) was applied to train and validate the model. Then, leave-one-out cross-validation (LOOCV) was applied using the samples with large errors as a validation set, to determine whether the error came from the high bias of the folded CV. A prediction was defined as correct if its absolute error was <0.01 or its relative error was <10%. Results: Of the 174 triplicated samples of NCI-60, 171 samples were correctly predicted with the folded CV. Through an additional LOOCV, one more sample was correctly predicted, representing a prediction accuracy of 98.85% (172 out of 174 samples). The average relative and absolute errors of the 172 correctly predicted samples were 1.351±1.875% and 0.00596±0.00638, respectively. Conclusion: We demonstrated the feasibility of deep learning-based in vitro radiosensitivity prediction using gene expression.
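
The correctness criterion and the reported accuracy can be checked with a few lines. The sketch below assumes the relative error is taken with respect to the measured SF2 value, which the abstract does not spell out; `is_correct` and `accuracy_percent` are hypothetical helper names.

```python
def is_correct(predicted: float, measured: float) -> bool:
    """Correct if absolute error < 0.01 or relative error < 10%."""
    abs_err = abs(predicted - measured)
    # Assumption: relative error is measured against the measured SF2 value.
    rel_err = abs_err / abs(measured) if measured != 0 else float("inf")
    return abs_err < 0.01 or rel_err < 0.10


def accuracy_percent(n_correct: int, n_total: int) -> float:
    """Prediction accuracy as a percentage, rounded to two decimals."""
    return round(100.0 * n_correct / n_total, 2)
```

With the counts from the abstract, `accuracy_percent(172, 174)` gives 98.85, and the folded CV alone corresponds to `accuracy_percent(171, 174)`, which is 98.28.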

Prognostic Value of Preoperative Serum CA 242 in Esophageal Squamous Cell Carcinoma Cases

  • Feng, Ji-Feng;Huang, Ying;Chen, Qi-Xun
    • Asian Pacific Journal of Cancer Prevention / v.14 no.3 / pp.1803-1806 / 2013
  • Purpose: Carbohydrate antigen (CA) 242 is inversely related to prognosis in many cancers. However, few data regarding CA 242 in esophageal cancer (EC) are available. The aim of this study was to determine the prognostic value of CA 242 and propose an optimum cut-off point for predicting survival differences in patients with esophageal squamous cell carcinoma (ESCC). Methods: A retrospective analysis was conducted of 192 cases. A receiver operating characteristic (ROC) curve for survival prediction was plotted to verify the optimum cut-off point. Univariate and multivariate analyses were performed to evaluate prognostic parameters for survival. Results: The positive rate for CA 242 was 7.3% (14/192). The ROC curve for survival prediction gave an optimum cut-off of 2.15 U/ml. Patients with CA 242 ≤ 2.15 U/ml had significantly better 5-year survival than patients with CA 242 > 2.15 U/ml (45.4% versus 22.6%; P=0.003). Multivariate analysis showed that differentiation (P=0.033), CA 242 (P=0.017), T grade (P=0.004) and N staging (P<0.001) were independent prognostic factors. Conclusions: Preoperative CA 242 is a predictive factor for long-term survival in ESCC, especially in nodal-negative patients. We conclude that 2.15 U/ml may be the optimum cut-off point for CA 242 in predicting survival in ESCC.
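
The abstract above does not say how the optimum cut-off was read from the ROC curve. One common choice is the point that maximizes Youden's J (sensitivity + specificity - 1), and the sketch below uses that criterion as an assumption. The function name and the toy data in the usage note are illustrative only and are not taken from the study.

```python
from typing import List, Optional, Sequence, Tuple


def youden_cutoff(values: Sequence[float],
                  events: Sequence[int]) -> Tuple[Optional[float], float]:
    """Return (cutoff, J) maximizing Youden's J over the observed values.

    values -- marker level per patient
    events -- 1 if the patient had the event (e.g. death), else 0
    A patient is called test-positive when value > cutoff.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both event and non-event patients")
    best_cut: Optional[float] = None
    best_j = -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if v > cut and e == 1)
        fp = sum(1 for v, e in zip(values, events) if v > cut and e == 0)
        sensitivity = tp / n_pos
        specificity = 1.0 - fp / n_neg
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j
```

On toy data such as `youden_cutoff([1.0, 1.5, 2.0, 2.5, 3.0, 3.5], [0, 0, 0, 1, 1, 1])`, the marker separates the groups perfectly, so the function returns a cut-off of 2.0 with J equal to 1.0.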

Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil;Sellappan, Palaniappan;Sanjoy Kumar, Debnath;Muhammad, Naseem;Susama, Bagchi;Asiah, Lokman
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.53-63 / 2023
  • Many researchers are working to minimize the incidence of cancers, particularly Gastric Cancer (GC). For GC, the five-year survival rate is generally 5-25%, but for Early Gastric Cancer (EGC), it is almost 90%. Predicting the onset of stomach cancer based on risk factors will allow for an early diagnosis and more effective treatment. Although there are several models for predicting stomach cancer, most of these models are based on unbalanced datasets, which favour the majority class. However, it is imperative to correctly identify cancer patients, who are in the minority class. This research aims to apply three class-balancing approaches to the NHS dataset before developing supervised learning strategies: Oversampling (Synthetic Minority Oversampling Technique or SMOTE), Undersampling (SpreadSubsample), and a Hybrid System (SMOTE + SpreadSubsample). This study uses Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5) methods. We measured these classifiers' efficacy using their Receiver Operating Characteristic (ROC) curves, sensitivity, and specificity. The validation data were used to test the different ways of balancing the classifiers. The final prediction model was built on the combination that performed best overall.
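
The idea behind the oversampling and undersampling steps named above can be sketched in plain Python. This is a simplified illustration under stated assumptions: `smote` follows the basic SMOTE idea of interpolating between a minority sample and one of its nearest minority neighbours, and `random_undersample` is a plain random stand-in for SpreadSubsample, not a reimplementation of that filter. Function names, the default `k`, and the seeds are illustrative choices.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def smote(minority: List[Point], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority points by interpolating toward neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def random_undersample(majority: List[Point], n_keep: int, seed: int = 0) -> List[Point]:
    """Keep a random subset of the majority class (stand-in for SpreadSubsample)."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)
```

A hybrid scheme in this spirit would apply `smote` to the minority class and `random_undersample` to the majority class before training each classifier, with balancing applied to the training data only so that validation data keep the original class ratio.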

Expression Profiles of Loneliness-associated Genes for Survival Prediction in Cancer Patients

  • You, Liang-Fu;Yeh, Jia-Rong;Su, Mu-Chun
    • Asian Pacific Journal of Cancer Prevention / v.15 no.1 / pp.185-190 / 2014
  • The influence of loneliness on human survival has been established epidemiologically, but genomic research remains undeveloped. We identified 34 loneliness-associated genes that were statistically significant between high-lonely and low-lonely individuals. With the univariate Cox proportional hazards regression model, we obtained the corresponding regression coefficients of the loneliness-associated genes for individual cancer patients. Risk scores were then generated by summing the gene expression levels multiplied by the corresponding regression coefficients of the loneliness-associated genes. We verified that cancer patients with high risk scores had a shorter mean survival time than their low-risk counterparts. We then validated the loneliness-associated gene signature in three independent brain cancer cohorts with Kaplan-Meier survival curves (n=77, 85 and 191), which were significantly separable by log-rank test with hazard ratios (HR) >1 and p-values <0.0001 (HR=2.94, 3.82, and 1.78). Moreover, we validated the loneliness-associated gene signature in bone cancer (HR=5.10, p-value=4.69e-3), lung cancer (HR=2.86, p-value=4.71e-5), ovarian cancer (HR=1.97, p-value=3.11e-5), and leukemia (HR=2.06, p-value=1.79e-4) cohorts. The last cohort, lymphoma, had an HR=3.50, p-value=1.15e-7. Loneliness-associated genes showed good survival prediction for cancer patients, especially bone cancer patients. Our study provides the first indication that the expression of loneliness-associated genes is related to the survival time of cancer patients.
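
The risk score described above is a weighted sum of expression levels, with the Cox regression coefficients as weights. A minimal sketch follows; the gene names in the usage note are placeholders, and splitting patients at the median score is an assumption, since the abstract does not state the threshold used to define high-risk and low-risk groups.

```python
from statistics import median
from typing import Dict, List, Tuple


def risk_score(expression: Dict[str, float], coefficients: Dict[str, float]) -> float:
    """Sum of (Cox coefficient x expression level) over the signature genes."""
    return sum(coefficients[g] * expression[g] for g in coefficients)


def split_by_median(scores: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split patient IDs into (high_risk, low_risk) at the median score.

    Assumption: scores strictly above the median count as high risk.
    """
    cut = median(scores.values())
    high = sorted(p for p, s in scores.items() if s > cut)
    low = sorted(p for p, s in scores.items() if s <= cut)
    return high, low
```

For instance, with hypothetical coefficients `{"G1": 0.5, "G2": -0.25}` and expression `{"G1": 4.0, "G2": 2.0}`, the score is 0.5 x 4.0 - 0.25 x 2.0 = 1.5. The two groups would then be compared with Kaplan-Meier curves and a log-rank test.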

Prediction of Length of ICU Stay Using Data-mining Techniques: an Example of Old Critically Ill Postoperative Gastric Cancer Patients

  • Zhang, Xiao-Chun;Zhang, Zhi-Dan;Huang, De-Sheng
    • Asian Pacific Journal of Cancer Prevention / v.13 no.1 / pp.97-101 / 2012
  • Objective: With the aging population in China and advances in clinical medicine, the number of operations on elderly patients is increasing, which poses growing challenges to critical care medicine and geriatrics. The study was designed to describe the length of ICU stay in a single institution's experience with elderly critically ill gastric cancer patients after surgery, and to present a framework for incorporating data-mining techniques into its prediction. Methods: A retrospective design was adopted to collect consecutive data on patients aged 60 or over with a gastric cancer diagnosis after surgery in an adult intensive care unit at a medical university hospital in Shenyang, China, from January 2010 to March 2011. Patient characteristics and the length of their ICU stay were gathered for analysis by univariate and multivariate Cox regression to examine the relationship with potential candidate factors. A regression tree was constructed to predict the length of ICU stay and explore the important indicators. Results: Multivariate Cox analysis found that shock and the need for nutritional support were statistically significant risk factors for prolonged ICU stay. Altogether, eight variables entered the regression model: age, APACHE II score, SOFA score, shock, respiratory system dysfunction, circulation system dysfunction, diabetes, and the need for nutritional support. The regression tree indicated comorbidity of two or more kinds of shock as the most important factor for prolonged ICU stay in the studied sample. Conclusions: Comorbidity of two or more kinds of shock is the most important factor for length of ICU stay in the studied sample. Since ICU patient characteristics differ between wards and hospitals, intensivists should consider the data-mining technique as a tool for predicting length of ICU stay.