• Title/Summary/Keyword: Cancer prediction

Search results: 448 (processing time: 0.034 seconds)

Development of a Medical Care Cost Prediction Model for Cancer Patients Using Case-Based Reasoning

  • 정석훈;서용무
    • Asia Pacific Journal of Information Systems / Vol. 16, No. 2 / pp. 69-84 / 2006
  • Integrated hospital information systems are now widespread, and as a result large and varied volumes of data are accumulating in their databases. Many researchers have studied how to use such hospital data. While most of this research has focused on medical diagnosis, few studies have developed medical care cost prediction models, especially with machine learning techniques. In this research, therefore, we built a medical care cost prediction model for cancer patients using CBR (Case-Based Reasoning), a machine learning technique. Its performance was compared with that of Neural Network and Decision Tree models. In the experiment, the CBR prediction model performed best overall with respect to error rate and linearity between actual and predicted values. We believe the medical care cost prediction model can support effective management of limited hospital resources.
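As a rough illustration of the retrieve-and-reuse idea behind case-based reasoning, the sketch below predicts a new patient's cost as the mean cost of the k most similar stored cases. The function name `predict_cost`, the two scaled features, the Euclidean distance, and the toy numbers are all assumptions made for illustration; they are not the authors' model or data.

```python
import math

def predict_cost(case_base, query, k=3):
    """Predict cost as the mean cost of the k nearest stored cases.

    case_base: list of (features, cost) pairs; features are equal-length
    tuples of numbers already scaled to comparable ranges (assumption).
    """
    if not case_base:
        raise ValueError("case base is empty")
    # Retrieve: rank stored cases by Euclidean distance to the query.
    ranked = sorted(case_base, key=lambda case: math.dist(case[0], query))
    nearest = ranked[:k]
    # Reuse: average the costs of the retrieved cases.
    return sum(cost for _, cost in nearest) / len(nearest)

# Hypothetical cases: (scaled age, scaled length of stay) -> cost
cases = [
    ((0.2, 0.1), 100.0),
    ((0.3, 0.2), 120.0),
    ((0.8, 0.9), 400.0),
    ((0.9, 0.8), 420.0),
]
estimate = predict_cost(cases, (0.25, 0.15), k=2)
```

A full CBR system would typically also revise the retrieved solution and retain the new case; this sketch covers only retrieval and reuse.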

Disease Prediction Using Ranks of Gene Expressions

  • Kim, Ki-Yeol;Ki, Dong-Hyuk;Chung, Hyun-Cheol;Rha, Sun-Young
    • Genomics & Informatics / Vol. 6, No. 3 / pp. 136-141 / 2008
  • A large number of studies have been performed to identify biomarkers that will allow efficient detection and determination of the precise status of a patient’s disease. The use of microarrays to assess biomarker status is expected to improve prediction accuracies, because a whole-genome approach is used. Despite their potential, however, patient samples can differ with respect to biomarker status when analyzed on different platforms, making it more difficult to make accurate predictions, because bias may exist between any two different experimental conditions. Because of this difficulty in experimental standardization of microarray data, it is currently difficult to utilize microarray-based gene sets in the clinic. To address this problem, we propose a method that predicts disease status using gene expression data that are transformed by their ranks, a concept that is easily applied to two datasets that are obtained using different experimental platforms. NCI and colon cancer datasets, which were assessed using both Affymetrix and cDNA microarray platforms, were used for method validation. Our results demonstrate that the proposed method is able to achieve good predictive performance for datasets that are obtained under different experimental conditions.
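The rank idea described in this abstract can be sketched as follows: within each sample, every expression value is replaced by its rank, so two platforms with very different measurement scales yield comparable inputs. The function `rank_transform`, the average-rank handling of ties, and the toy values are assumptions for illustration and may differ from the authors' procedure.

```python
def rank_transform(values):
    """Replace each value by its 1-based rank; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = average_rank
        pos = end + 1
    return ranks

# Hypothetical expression values for the same four genes in one sample,
# measured on two platforms with very different scales.
platform_a = [5.1, 9.8, 7.3, 2.0]
platform_b = [120.0, 900.0, 450.0, 15.0]
ranks_a = rank_transform(platform_a)
ranks_b = rank_transform(platform_b)
```

In this toy case both platforms produce the same rank vector even though the raw values are not comparable, which is the property the approach relies on.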

Development and Validation of a Breast Cancer Risk Prediction Model for Thai Women: A Cross-Sectional Study

  • Anothaisintawee, Thunyarat;Teerawattananon, Yot;Wiratkapun, Cholatip;Srinakarin, Jiraporn;Woodtichartpreecha, Piyanoot;Hirunpat, Siriporn;Wongwaisayawan, Sansanee;Lertsithichai, Panuwat;Kasamesup, Vijj;Thakkinstian, Ammarin
    • Asian Pacific Journal of Cancer Prevention / Vol. 15, No. 16 / pp. 6811-6817 / 2014
  • Background: Breast cancer risk prediction models are widely used in clinical practice. They should be useful for identifying high-risk women for screening in limited-resource countries. However, previous models performed poorly in both derivation and validation settings. Therefore, we aimed to develop and validate a breast cancer risk prediction model for Thai women. Materials and Methods: This cross-sectional study consisted of derivation and validation phases. Data collected at Ramathibodi and two other hospitals were used for deriving and externally validating the models, respectively. Multiple logistic regression was applied to construct the model. Calibration and discrimination performance were assessed using the observed/expected ratio and the concordance statistic (C-statistic), respectively. A bootstrap with 200 repetitions was applied for internal validation. Results: Age, menopausal status, body mass index, and use of oral contraceptives were significantly associated with breast cancer and were included in the model. The observed/expected ratio and C-statistic were 1.00 (95% CI: 0.82, 1.21) and 0.651 (95% CI: 0.595, 0.707), respectively. Internal validation showed good performance, with a bias of 0.010 (95% CI: 0.002, 0.018) and a C-statistic of 0.646 (95% CI: 0.642, 0.650). The observed/expected ratio and C-statistic from external validation were 0.97 (95% CI: 0.68, 1.35) and 0.609 (95% CI: 0.511, 0.706), respectively. Risk scores were created and stratified into low (0-0.86), low-intermediate (0.87-1.14), intermediate-high (1.15-1.52), and high-risk (1.53-3.40) groups. Conclusions: A Thai breast cancer risk prediction model was created with good calibration and fair discrimination performance. Risk stratification should help prioritize high-risk women for an organized breast cancer screening program in Thailand and other limited-resource countries.
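The two performance measures named in this abstract can be computed from outcomes and predicted risks as in the sketch below. The function names and the toy data are hypothetical; the pairwise definition of the C-statistic used here is the standard one for binary outcomes, with ties counted as half.

```python
def observed_expected_ratio(outcomes, predicted):
    """Calibration: observed event count divided by the sum of predicted risks."""
    return sum(outcomes) / sum(predicted)

def c_statistic(outcomes, predicted):
    """Discrimination: share of case/non-case pairs in which the case has
    the higher predicted risk; ties count as half."""
    cases = [p for y, p in zip(outcomes, predicted) if y == 1]
    controls = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for pn in controls:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(controls))

# Hypothetical outcomes (1 = breast cancer) and model-predicted risks.
y = [1, 0, 1, 0, 0]
risk = [0.6, 0.2, 0.3, 0.4, 0.5]
oe = observed_expected_ratio(y, risk)
c = c_statistic(y, risk)
```

An observed/expected ratio near 1 indicates good calibration, while a C-statistic of 0.5 means no discrimination and 1.0 means perfect discrimination.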

Classification of Genes Based on Age-Related Differential Expression in Breast Cancer

  • Lee, Gunhee;Lee, Minho
    • Genomics & Informatics / Vol. 15, No. 4 / pp. 156-161 / 2017
  • Transcriptome analysis has been widely used to make biomarker panels to diagnose cancers. In breast cancer, the age of the patient has been known to be associated with clinical features. As clinical transcriptome data have accumulated significantly, we classified all human genes based on age-specific differential expression between normal and breast cancer cells using public data. We retrieved the values for gene expression levels in breast cancer and matched normal cells from The Cancer Genome Atlas. We divided genes into two classes by paired t test without considering age in the first classification. We carried out a secondary classification of genes for each class into eight groups, based on the patterns of the p-values, which were calculated for each of the three age groups we defined. Through this two-step classification, gene expression was eventually grouped into 16 classes. We showed that this classification method could be applied to establish a more accurate prediction model to diagnose breast cancer by comparing the performance of prediction models with different combinations of genes. We expect that our scheme of classification could be used for other types of cancer data.
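One reading of the two-step scheme that yields 16 classes is sketched below: the first step splits genes by whether the age-agnostic paired t test is significant, and the second step records, for each of three age groups, whether that group's p-value is significant (2 x 2^3 = 16). The threshold `ALPHA = 0.05`, the significant/not-significant encoding, and the function name are assumptions; the abstract does not state how the eight p-value patterns were defined.

```python
from itertools import product

ALPHA = 0.05  # assumed significance threshold, not stated in the abstract

def classify_gene(overall_p, age_group_ps, alpha=ALPHA):
    """Return (first_class, pattern) for one gene.

    first_class: True if the age-agnostic paired t test is significant.
    pattern: tuple of three booleans, one per age group.
    """
    if len(age_group_ps) != 3:
        raise ValueError("expected one p-value per age group (three in total)")
    first_class = overall_p < alpha
    pattern = tuple(p < alpha for p in age_group_ps)
    return first_class, pattern

# Enumerate every possible label: 2 first-step classes x 2**3 patterns.
all_classes = [(first, pattern)
               for first in (True, False)
               for pattern in product((True, False), repeat=3)]

example = classify_gene(0.001, [0.20, 0.01, 0.03])  # hypothetical p-values
```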

Prediction of Lung Cancer Susceptibility using an Importance Evaluation of SNP Data and SVM Learning

  • 류명춘;김상진;박창현
    • 한국콘텐츠학회논문지 / Vol. 8, No. 10 / pp. 11-19 / 2008
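  • This paper proposes a method for predicting lung cancer susceptibility that combines an importance evaluation of SNP data, genetic data involved in the development of lung cancer, with SVM learning. Because negative samples far outnumber the lung-cancer-related positive samples available for training, each positive sample is paired with a negative sample of the same sex and a small age difference. In addition, a formula is introduced to compute the influence each SNP has on predicting disease onset; it is used to evaluate the importance of each SNP and to rank the SNPs accordingly. In the experiments, we observed how the prediction rate changed with the number of top-ranked SNPs used for training. In a LOOCV test, the proposed method achieved a maximum prediction accuracy of 65.0% on the experimental data.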
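The pairing step in this abstract (each positive sample matched to a negative sample of the same sex and a small age difference) might look like the greedy sketch below. The record layout, the `max_age_gap` limit of 5 years, and the greedy order are assumptions for illustration; the abstract does not specify how the matching was carried out.

```python
def match_controls(positives, negatives, max_age_gap=5):
    """Greedily pair each positive with an unused negative of the same sex
    and the smallest age difference (at most max_age_gap years)."""
    unused = list(negatives)
    pairs = []
    for pos in positives:
        candidates = [n for n in unused
                      if n["sex"] == pos["sex"]
                      and abs(n["age"] - pos["age"]) <= max_age_gap]
        if not candidates:
            continue  # no acceptable match; this positive stays unpaired
        best = min(candidates, key=lambda n: abs(n["age"] - pos["age"]))
        unused.remove(best)
        pairs.append((pos["id"], best["id"]))
    return pairs

# Hypothetical records.
positives = [{"id": "P1", "sex": "M", "age": 60},
             {"id": "P2", "sex": "F", "age": 45}]
negatives = [{"id": "N1", "sex": "M", "age": 40},
             {"id": "N2", "sex": "M", "age": 62},
             {"id": "N3", "sex": "F", "age": 47},
             {"id": "N4", "sex": "F", "age": 70}]
pairs = match_controls(positives, negatives)
```

Pairing in this way gives a balanced training set of equal numbers of positive and negative samples.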

An Analysis of Nursing Needs for Hospitalized Cancer Patients: Using Data Mining Techniques

  • 박선아
    • 종양간호연구 / Vol. 5, No. 1 / pp. 3-10 / 2005
  • Background: Nurses now account for one third of all hospital human resources, so efficient management of nursing manpower is becoming more important. Although nursing workload analysis and patient severity classification clearly need to come first for efficient allocation of the nursing workforce, these processes have been conducted manually with ad hoc rules. Purposes: This study aimed to build a prediction model for patient classification according to nursing needs. We sought an easier and faster method of classifying patients that can support efficient management of nursing manpower. Methods: Nursing patient classification data for hospitalized cancer patients at one of the largest cancer centers in Korea, covering January 1 to December 31, 2003, were assessed by trained nurses. This study developed a prediction model and analyzed nursing needs using data mining techniques. Patients were classified with three different data mining techniques (logistic regression, decision tree, and neural network), and the results were assessed. Results: The data set was created from 165,073 records in the classification database of 2,228 patients. The main explanatory variables for the three techniques were as follows. 1) Logistic regression: age, month, and section. 2) Decision tree: section, month, age, and tumor. 3) Neural network: section, diagnosis, age, sex, metastasis, hospital days, and month. Among the three techniques, the neural network showed the best predictive power in ROC curve verification. The patient classification prediction model developed with the neural network on the basis of nursing needs had a prediction accuracy of 84.06%. Conclusion: A patient classification prediction model was developed and tested in this study using real patient data. The result can be used for more accurate calculation of required nursing staff and more effective use of the labor force.


Extracting Wisconsin Breast Cancer Prediction Fuzzy Rules Using Neural Network with Weighted Fuzzy Membership Functions

  • 임준식
    • 정보처리학회논문지B / Vol. 11B, No. 6 / pp. 717-722 / 2004
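  • This paper extracts fuzzy rules for predicting Wisconsin breast cancer using a Neural Network with Weighted Fuzzy Membership Functions (NNWFM). NNWFM has self-adaptive weighted fuzzy membership functions; it learns from the given input data to generate fuzzy rules and performs prediction based on them. The hyperboxes, which form the middle part of the network structure, consist of n sets of large, medium, and small weighted fuzzy membership functions. After training, each set is merged back into a single weighted fuzzy membership function using the bounded sum of fuzzy sets. The n feature inputs are connected to all trained hyperboxes to perform prediction. The two fuzzy rules extracted by NNWFM achieve a prediction recognition rate of 99.41%, which is better than the results of currently published papers in both the number of fuzzy rules and the recognition rate.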
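The bounded-sum step mentioned in this abstract can be illustrated on its own: the bounded sum of membership degrees is min(1, sum of degrees). The sketch below merges three weighted triangular sets (small, medium, large) into a single membership value. The triangular shapes, weights, and breakpoints are hypothetical, and this is not the NNWFM architecture or its learning rule.

```python
def triangular(x, left, peak, right):
    """Membership degree of x in a triangular fuzzy set."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

def bounded_sum(degrees):
    """Bounded sum of membership degrees: min(1, sum)."""
    return min(1.0, sum(degrees))

# Hypothetical weighted small/medium/large sets over a feature scaled to [0, 1].
# Each entry is (weight, left, peak, right).
weighted_sets = [
    (0.5, -0.5, 0.0, 0.5),   # small
    (0.75, 0.0, 0.5, 1.0),   # medium
    (0.25, 0.5, 1.0, 1.5),   # large
]

def combined_membership(x):
    """Merge the three weighted sets into one membership value at x."""
    return bounded_sum(w * triangular(x, l, p, r) for w, l, p, r in weighted_sets)

value = combined_membership(0.25)
```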

Feasibility study of deep learning based radiosensitivity prediction model of National Cancer Institute-60 cell lines using gene expression

  • Kim, Euidam;Chung, Yoonsun
    • Nuclear Engineering and Technology / Vol. 54, No. 4 / pp. 1439-1448 / 2022
  • Background: We investigated the feasibility of in vitro radiosensitivity prediction with gene expression using deep learning. Methods: Microarray gene expression data for the National Cancer Institute-60 (NCI-60) panel were acquired from the Gene Expression Omnibus. The clonogenic surviving fractions at an absorbed dose of 2 Gy (SF2) from previous publications were used to measure in vitro radiosensitivity. The radiosensitivity prediction model was based on a convolutional neural network. Six-fold cross-validation (CV) was applied to train and validate the model. Then, leave-one-out cross-validation (LOOCV) was applied, using the samples with large errors as a validation set, to determine whether the error was from the high bias of the folded CV. The criteria for correct prediction were defined as an absolute error < 0.01 or a relative error < 10%. Results: Of the 174 triplicated samples of NCI-60, 171 samples were correctly predicted with the folded CV. Through an additional LOOCV, one more sample was correctly predicted, representing a prediction accuracy of 98.85% (172 out of 174 samples). The average relative and absolute errors of the 172 correctly predicted samples were 1.351±1.875% and 0.00596±0.00638, respectively. Conclusion: We demonstrated the feasibility of a deep learning-based in vitro radiosensitivity prediction using gene expression.
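The correctness criterion and the reported accuracy in this abstract can be checked with a few lines. The function names are hypothetical, and the handling of a true value of zero is an added assumption; the thresholds and the counts 172 and 174 come from the abstract.

```python
def is_correct(true_sf2, predicted_sf2, abs_tol=0.01, rel_tol=0.10):
    """Apply the stated criterion: absolute error < 0.01 or relative error < 10%."""
    abs_err = abs(predicted_sf2 - true_sf2)
    if abs_err < abs_tol:
        return True
    # Relative error is undefined for a true value of zero (assumption:
    # such a sample can only pass through the absolute-error test).
    return true_sf2 != 0 and abs_err / abs(true_sf2) < rel_tol

def accuracy_percent(n_correct, n_total):
    """Share of correctly predicted samples, in percent."""
    return 100.0 * n_correct / n_total

reported = accuracy_percent(172, 174)  # counts taken from the abstract
```

Rounded to two decimals, `reported` matches the 98.85% stated in the abstract.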

Prognostic Value of Preoperative Serum CA 242 in Esophageal Squamous Cell Carcinoma Cases

  • Feng, Ji-Feng;Huang, Ying;Chen, Qi-Xun
    • Asian Pacific Journal of Cancer Prevention / Vol. 14, No. 3 / pp. 1803-1806 / 2013
  • Purpose: Carbohydrate antigen (CA) 242 is inversely related to prognosis in many cancers. However, few data regarding CA 242 in esophageal cancer (EC) are available. The aim of this study was to determine the prognostic value of CA 242 and propose an optimum cut-off point for predicting survival differences in patients with esophageal squamous cell carcinoma (ESCC). Methods: A retrospective analysis was conducted of 192 cases. A receiver operating characteristic (ROC) curve for survival prediction was plotted to verify the optimum cut-off point. Univariate and multivariate analyses were performed to evaluate prognostic parameters for survival. Results: The positive rate for CA 242 was 7.3% (14/192). The ROC curve for survival prediction gave an optimum cut-off of 2.15 U/ml. Patients with CA 242 ≤ 2.15 U/ml had significantly better 5-year survival than patients with CA 242 > 2.15 U/ml (45.4% versus 22.6%; P=0.003). Multivariate analysis showed that differentiation (P=0.033), CA 242 (P=0.017), T grade (P=0.004) and N staging (P<0.001) were independent prognostic factors. Conclusions: Preoperative CA 242 is a predictive factor for long-term survival in ESCC, especially in nodal-negative patients. We conclude that 2.15 U/ml may be the optimum cut-off point for CA 242 in predicting survival in ESCC.
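The abstract does not say which rule was used to read the optimum cut-off from the ROC curve. One common choice, shown here purely as an assumption, is Youden's J statistic (sensitivity + specificity - 1). The function name and the toy marker levels below are hypothetical and are not the study's data.

```python
def youden_cutoff(values, events):
    """Return the cut-off maximising sensitivity + specificity - 1.

    values: marker levels; events: 1 if the outcome occurred, else 0.
    A patient is called positive when the marker is above the cut-off.
    """
    n_pos = sum(events)
    n_neg = len(events) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both outcomes present")
    best_cut, best_j = None, -1.0
    for cut in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if e == 1 and v > cut)
        tn = sum(1 for v, e in zip(values, events) if e == 0 and v <= cut)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # keeps the lowest cut-off when several tie
            best_cut, best_j = cut, j
    return best_cut, best_j

# Hypothetical marker levels (U/ml) and outcome flags.
levels = [0.5, 1.0, 1.8, 2.1, 2.6, 3.0, 4.2, 5.5]
died = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j_stat = youden_cutoff(levels, died)
```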

Prediction Model for Gastric Cancer via Class Balancing Techniques

  • Danish, Jamil;Sellappan, Palaniappan;Sanjoy Kumar, Debnath;Muhammad, Naseem;Susama, Bagchi;Asiah, Lokman
    • International Journal of Computer Science & Network Security / Vol. 23, No. 1 / pp. 53-63 / 2023
  • Many researchers are trying hard to minimize the incidence of cancers, mainly Gastric Cancer (GC). For GC, the five-year survival rate is generally 5-25%, but for Early Gastric Cancer (EGC), it is almost 90%. Predicting the onset of stomach cancer based on risk factors will allow for an early diagnosis and more effective treatment. Although there are several models for predicting stomach cancer, most of these models are based on unbalanced datasets, which favours the majority class. However, it is imperative to correctly identify cancer patients who are in the minority class. This research aims to apply three class-balancing approaches to the NHS dataset before developing supervised learning strategies: Oversampling (Synthetic Minority Oversampling Technique or SMOTE), Undersampling (SpreadSubsample), and Hybrid System (SMOTE + SpreadSubsample). This study uses Naive Bayes, Bayesian Network, Random Forest, and Decision Tree (C4.5) methods. We measured these classifiers' efficacy using their Receiver Operating Characteristics (ROC) curves, sensitivity, and specificity. The validation data was used to test several ways of balancing the classifiers. The final prediction model was built on the one that did the best overall.
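The oversampling idea behind SMOTE, as named in this abstract, is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The minimal sketch below illustrates that idea only; the function name, parameters, and toy points are assumptions, and it is not the implementation used in the study.

```python
import math
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority samples by interpolating between a
    minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        others = [m for m in minority if m is not base]
        neighbours = sorted(others, key=lambda m: math.dist(m, base))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Hypothetical minority-class points with two numeric features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]
new_points = smote(minority, n_new=4)
```

Because every synthetic point lies on a segment between two real minority points, the new samples stay inside the region the minority class already occupies. Undersampling, by contrast, discards majority samples instead of adding minority ones.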