• Title/Summary/Keyword: Gene prediction

A Study on Predicting Lung Cancer Using RNA-Sequencing Data with Ensemble Learning (앙상블 기법을 활용한 RNA-Sequencing 데이터의 폐암 예측 연구)

  • Geon AN;JooYong PARK
    • Journal of Korea Artificial Intelligence Association / v.2 no.1 / pp.7-14 / 2024
  • In this paper, we explore the application of RNA-sequencing data and ensemble machine learning to predict lung cancer, a leading cause of cancer mortality worldwide, and to inform treatment strategies for it. The research utilizes Random Forest, XGBoost, and LightGBM models to analyze gene expression profiles from extensive datasets, aiming to enhance predictive accuracy for lung cancer prognosis. The methodology focuses on preprocessing RNA-seq data to standardize expression levels across samples and applying ensemble algorithms to maximize prediction stability and reduce model overfitting. Key findings indicate that ensemble models, especially XGBoost, substantially outperform traditional predictive models. Significant genetic markers such as ADGRF5 are identified as crucial for predicting lung cancer outcomes. In conclusion, ensemble learning using RNA-seq data proves highly effective in predicting lung cancer, suggesting a potential shift towards more precise and personalized treatment approaches. The results advocate for further integration of molecular and clinical data to refine diagnostic models and improve clinical outcomes, underscoring the critical role of advanced molecular diagnostics in enhancing patient survival rates and quality of life. This study lays the groundwork for future research in the application of RNA-sequencing data and ensemble machine learning techniques in clinical settings.
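
The abstract mentions preprocessing RNA-seq data to standardize expression levels across samples but does not say how. The sketch below shows one common choice, a log2(count + 1) transform followed by a per-gene z-score. This is an assumption for illustration, not necessarily the authors' pipeline; the function name `log_zscore` and the count values are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_zscore(counts_by_gene):
    """Standardize raw counts per gene: log2(count + 1), then z-score.

    counts_by_gene maps a gene name to a list of counts, one per sample.
    This is an assumed preprocessing scheme, not the one from the paper.
    """
    standardized = {}
    for gene, counts in counts_by_gene.items():
        logged = [math.log2(c + 1) for c in counts]
        mu = mean(logged)
        sd = pstdev(logged)  # population standard deviation across samples
        # A gene with no variation carries no signal; map it to zeros.
        standardized[gene] = [0.0 if sd == 0 else (v - mu) / sd for v in logged]
    return standardized


# Hypothetical counts for four samples (made-up numbers, not study data).
example = {"ADGRF5": [0, 1, 3, 7], "FLAT_GENE": [5, 5, 5, 5]}
for gene, values in log_zscore(example).items():
    print(gene, [round(v, 3) for v in values])
```

The standardized matrix would then be the input to whichever classifier is used; the tree-based models named in the abstract do not strictly require scaling, so this step mainly helps when comparing against scale-sensitive baselines.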

Molecular characterization and expression pattern of a novel Keratin-associated protein 11.1 gene in the Liaoning cashmere goat (Capra hircus)

  • Jin, Mei;Cao, Qian;Wang, Ruilong;Piao, Jun;Zhao, Fengqin;Piao, Jing'ai
    • Asian-Australasian Journal of Animal Sciences / v.30 no.3 / pp.328-337 / 2017
  • Objective: An experiment was conducted to determine the relationship between KAP11.1 and the regulation of wool fineness. Methods: In previous work, we constructed a skin cDNA library and isolated a full-length cDNA clone termed KAP11.1. On this basis, we conducted a series of bioinformatics analyses. The tissue distribution of KAP11.1 mRNA was assessed using semi-quantitative reverse transcription polymerase chain reaction (RT-PCR) analysis. The expression of KAP11.1 mRNA in primary and secondary hair follicles was measured using real-time polymerase chain reaction (real-time PCR) analysis. The expression location of KAP11.1 mRNA in primary and secondary hair follicles was determined using in situ hybridization. Results: Bioinformatics analysis showed that the KAP11.1 gene encodes a putative 158 amino acid protein that exhibits a high content of cysteine, serine, threonine, and valine and has a pubertal mammary gland structural domain. Secondary structure prediction revealed a high proportion of random coils (76.73%). Semi-quantitative RT-PCR showed that the KAP11.1 gene was expressed in heart, skin, and liver, but not in spleen, lung, and kidney. Real-time PCR results showed that KAP11.1 expression was higher in catagen than in anagen in the primary hair follicles. However, in the secondary hair follicles, KAP11.1 expression was significantly higher in anagen than in catagen. Moreover, the KAP11.1 gene was strongly expressed in the inner root sheath and hair matrix, and showed lower expression in the hair bulb. Conclusion: We conclude that the KAP11.1 gene may play an important role in regulating fiber diameter.

Gene expression changes in silkworm embryogenesis for prediction of hatching time

  • Jong Woo Park;Chang Hoon Lee;Chan Young Jeong;Hyeok Gyu Kwon;Seul Ki Park;Ji Hae Lee;Sang Kuk Kang;Seong-Wan Kim;Seong-Ryul Kim;Hyun-Bok Kim;Kee Young Kim
    • International Journal of Industrial Entomology and Biomaterials / v.46 no.1 / pp.16-23 / 2023
  • The silkworm's dormancy and embryonic development are accomplished through the interaction of various genes. Analysis of the expression of several interacting genes can predict the embryonic stage of silkworms. In this study, we analyzed the changes in the expression level of genes at each stage during the embryonic development of dormant silkworm eggs and selected genes that can predict the hatching time. Jam123 and Jam124 silkworm eggs were collected after egg laying and preserved using a double refrigeration method, and expression analysis was performed for 23 genes during embryogenesis. Five genes showed significant changes during embryogenesis: UDP-glucuronosyltransferases (BmUGTs), heat shock protein hsp20.8 (BmHsp20.8), cytochrome b5-like proteins (BmCytb5), Krüppel homolog 1 (BmKr-h1), and cuticular protein RR-1 motif 41 (BmCpr41). In a quantitative comparison of the expression levels of these 5 genes by real-time PCR, the BmUGTs gene showed a difference between Jam123 and Jam124, making it difficult to use as an indicator for predicting hatching time. However, the BmHsp20.8 gene showed a common decrease in expression at the imminent hatching stage. In addition, the expression level of the BmCytb5 gene decreased to its lowest level at the time of imminent hatching, and the BmKr-h1 gene was expressed only at the time of imminent hatching. Expression of the last gene, BmCpr41, was likewise detected only at the time of imminent hatching and showed a rapid increase right before hatching. Taken together, these results suggest that expression analysis of the BmHsp20.8, BmCytb5, BmKr-h1, and BmCpr41 genes can determine the stage of embryogenesis and predict hatching time, facilitating better management of silkworm eggs.

Prediction of Lung Cancer Based on Serum Biomarkers by Gene Expression Programming Methods

  • Yu, Zhuang;Chen, Xiao-Zheng;Cui, Lian-Hua;Si, Hong-Zong;Lu, Hai-Jiao;Liu, Shi-Hai
    • Asian Pacific Journal of Cancer Prevention / v.15 no.21 / pp.9367-9373 / 2014
  • In the diagnosis of lung cancer, rapid distinction between small cell lung cancer (SCLC) and non-small cell lung cancer (NSCLC) tumors is very important. Serum markers, including lactate dehydrogenase (LDH), C-reactive protein (CRP), carcino-embryonic antigen (CEA), neurone specific enolase (NSE) and Cyfra21-1, are reported to reflect lung cancer characteristics. In this study, lung tumors were classified based on biomarkers (measured in 120 NSCLC and 60 SCLC patients) by setting up optimal biomarker joint models with a powerful computerized tool, gene expression programming (GEP). GEP is a learning algorithm that combines the advantages of genetic programming (GP) and genetic algorithms (GA). It specifically focuses on relationships between variables in sets of data and then builds models to explain these relationships, and has been successfully used in formula finding and function mining. As a basis for defining a GEP environment for SCLC and NSCLC prediction, three explicit predictive models were constructed. CEA and NSE are frequently used lung cancer markers in clinical trials, and CRP, LDH and Cyfra21-1 have significant meaning in lung cancer; on the basis of CEA and NSE we set up three GEP models: GEP1 (CEA, NSE, Cyfra21-1), GEP2 (CEA, NSE, LDH), and GEP3 (CEA, NSE, CRP). The best GEP classification result was obtained when CEA, NSE and Cyfra21-1 were combined: 128 of 135 subjects in the training set and 40 of 45 subjects in the test set were classified correctly, giving accuracy rates of 94.8% in the training set and 88.9% in the test set. With GEP2, the accuracy was significantly decreased by 1.5% and 6.6% in the training and test sets, and with GEP3 by 0.82% and 4.45%, respectively. Serum Cyfra21-1 is a useful and sensitive serum biomarker for discriminating between NSCLC and SCLC. GEP modeling is a promising and excellent tool in the diagnosis of lung cancer.
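
The reported accuracy rates follow directly from the counts given in the abstract. The short check below reproduces that arithmetic and confirms that the training and test sets together cover all 180 patients (120 NSCLC plus 60 SCLC). The helper name `accuracy_percent` is introduced here for illustration only.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Return classification accuracy as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# Counts quoted in the abstract for the best model (CEA, NSE, Cyfra21-1).
train_correct, train_total = 128, 135
test_correct, test_total = 40, 45

train_acc = accuracy_percent(train_correct, train_total)
test_acc = accuracy_percent(test_correct, test_total)

print(f"training accuracy: {train_acc:.1f}%")  # 94.8%
print(f"test accuracy: {test_acc:.1f}%")       # 88.9%

# The split accounts for every patient in the study: 120 NSCLC + 60 SCLC.
assert train_total + test_total == 120 + 60
```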

Association of Salivary Microbiota with Dental Caries Incidence with Dentine Involvement after 4 Years

  • Kim, Bong-Soo;Han, Dong-Hun;Lee, Ho;Oh, Bumjo
    • Journal of Microbiology and Biotechnology / v.28 no.3 / pp.454-464 / 2018
  • Salivary microbiota alterations can correlate with dental caries development in children, and mechanisms mediating this association need to be studied in further detail. Our study explored salivary microbiota shifts in children and their association with the incidence of dental caries with dentine involvement. Salivary samples were collected from children with caries and their subsequently matched caries-free controls before and after caries development. The microbiota was analyzed by 16S rRNA gene-based high-throughput sequencing. The salivary microbiota was more diverse in caries-free subjects than in those with dental caries with dentine involvement (DC). Although both groups exhibited similar shifts in microbiota composition, an association with caries was found by function prediction. Analysis of potential microbiome functions revealed that Granulicatella, Streptococcus, Bulleidia, and Staphylococcus in the DC group could be associated with the bacterial invasion of epithelial cells, phosphotransferase system, and D-alanine metabolism, whereas Neisseria, Lautropia, and Leptotrichia in caries-free subjects could be associated with bacterial motility protein genes, linoleic acid metabolism, and flavonoid biosynthesis, suggesting that functional differences in the salivary microbiota may be associated with caries formation. These results expand the current understanding of the functional significance of the salivary microbiome in caries development, and may facilitate the identification of novel biomarkers and treatment targets.

Two-Stage Logistic Regression for Cancer Classification and Prediction from Copy-Number Changes in cDNA Microarray-Based Comparative Genomic Hybridization

  • Kim, Mi-Jung
    • The Korean Journal of Applied Statistics / v.24 no.5 / pp.847-859 / 2011
  • cDNA microarray-based comparative genomic hybridization (CGH) data include low-intensity spots, and thus a statistical strategy is needed to detect subtle differences between different cancer classes. In this study, genes displaying a high frequency of alteration in one of the different classes were selected among the pre-selected genes that show relatively large variations between genes compared to total variations. Utilizing copy-number changes of the selected genes, this study suggests a statistical approach to predict patients' classes with increased performance by pre-classifying patients with similar genetic alteration scores. A two-stage logistic regression model (TLRM) was suggested to pre-classify homogeneous patients and predict patients' classes for cancer prediction; a decision tree (DT) was combined with logistic regression on the set of informative genes. TLRM was constructed on cDNA microarray-based CGH data from the Cancer Metastasis Research Center (CMRC) at Yonsei University; it predicted the patients' clinical diagnoses with perfect matches (except for one patient) among the patients classified as high-risk and low-risk, where the performance of predictions is critical due to the high sensitivity and specificity requirements for clinical treatments. Accuracy validated by leave-one-out cross-validation (LOOCV) was 83.3%, while the other classification methods used for comparison, CART and DT, performed worse than TLRM.
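
The leave-one-out cross-validation (LOOCV) used to validate accuracy can be sketched as follows. Each sample is held out once, a model is fitted on the remainder, and the held-out sample is predicted. This is a minimal illustration: the classifier is a simple nearest-centroid rule standing in for the two-stage model, which the abstract does not specify in detail, and the data points are made up.

```python
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[float, ...]
Predictor = Callable[[Sample], int]


def fit_nearest_centroid(X: Sequence[Sample], y: Sequence[int]) -> Predictor:
    """Stand-in classifier (NOT the TLRM): assign the class of the nearest centroid."""
    groups = {}
    for xi, yi in zip(X, y):
        groups.setdefault(yi, []).append(xi)
    centroids = {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }

    def predict(x: Sample) -> int:
        return min(
            centroids,
            key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])),
        )

    return predict


def loocv_accuracy(X: List[Sample], y: List[int], fit) -> float:
    """Hold out each sample once, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]
        y_train = y[:i] + y[i + 1:]
        predict = fit(X_train, y_train)
        if predict(X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical two-feature samples for two classes (illustrative values only).
X = [(0.1, 0.2), (0.0, 0.3), (0.2, 0.1), (1.0, 0.9), (0.9, 1.1), (1.1, 1.0)]
y = [0, 0, 0, 1, 1, 1]
print(f"LOOCV accuracy: {loocv_accuracy(X, y, fit_nearest_centroid):.3f}")
```

Because every fit excludes the sample being scored, the estimate is not inflated by testing on training data, which matters for the small sample sizes typical of microarray studies.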

Status of Helicopter Rotor Noise Technology Development in KARI (KARI의 헬리콥터 로터 소음관련 기술개발 현황)

  • Hwang, Chang-Jeon;Chung, Ki-Hoon;Song, Keun-Woong;Joo, Gene;Lee, Wook
    • Proceedings of the Korean Society for Noise and Vibration Engineering Conference / 2006.05a / pp.187-192 / 2006
  • Helicopter noise has been considered one of the major design factors, like performance and safety, since public acceptance, comfort, and stealth have become important to customers. According to the airworthiness regulations, the noise levels in three different flight conditions shall comply with specific limits. Main and tail rotor noise is dominant in the far field due to its low- and mid-range frequency characteristics. It is airborne noise, so accurate aerodynamic data are necessary for accurate noise prediction. At KARI, low-noise main and tail rotors as well as analysis codes have been developed since 2000. The approach for the low-noise main rotor is a kind of tip modification, the so-called twin vortices tip, to reduce BVI noise. Analysis results show a 9.3 dB reduction in terms of pseudo EPNL. The uneven spacing concept is applied to the low-noise tail rotor. A noise reduction of three or four decibels is achieved by the new optimized uneven spacing. Rotor noise and aerodynamic prediction codes have also been improved.

Characterization of gene expression and genetic variation of horse ERBB receptor feedback inhibitor 1 in Thoroughbreds

  • Choi, Jae-Young;Jang, Hyun-Jun;Park, Jeong-Woong;Oh, Jae-Don;Shin, Donghyun;Kim, Nam Young;Oh, Jin Hyeog;Song, Ki-Duk;Cho, Byung-Wook
    • Asian-Australasian Journal of Animal Sciences / v.31 no.3 / pp.309-315 / 2018
  • Objective: This study aimed to test the expression patterns of ERBB receptor feedback inhibitor 1 (ERRFI1) before and after exercise and the association of non-synonymous single-nucleotide polymorphisms (nsSNPs) of horse ERRFI1 with racing traits in Thoroughbreds. Methods: We performed bioinformatics and gene expression analyses for horse ERRFI1. Transcription factor (TF) binding sites in the 5'-regulatory region of this gene were identified using PROMO, a tool for predicting TF-binding sites. A general linear model was used to detect the association between the nsSNP (LOC42830758 A to G) and race performance. Results: Quantitative polymerase chain reaction analysis showed that the expression level of ERRFI1 after exercise was 1.6 times higher than that before exercise. Ten transcription factors were predicted from the ERRFI1 regulatory region. A novel nsSNP (LOC42830758 A to G) was found in ERRFI1, which was associated with three racing traits: average prize money, average racing index, and 3-year-old starts percentile ranking. Conclusion: Our analysis will be helpful as a basis for studying genes and SNPs that affect race performance in racehorses.

Functional Prediction of Imprinted Genes in Chicken Based on a Mammalian Comparative Expression Network

  • Kim, Hyo-Young;Moon, Sun-Jin;Kim, Hee-Bal
    • Genomics & Informatics / v.6 no.1 / pp.32-35 / 2008
  • Little evidence supports the existence of imprinted genes in chicken. Imprinted genes are thought to be intimately connected with the acquisition of parental resources in mammals; thus, the predicted lack of this type of gene in chicken is not surprising, given that they leave their offspring to their own heritance after conception. In this study, we identified several imprinted genes and their orthologs in human, mouse, and zebrafish, including 30 previously identified human and mouse imprinted genes. Next, using the HomoloGene database, we identified six orthologous genes in human, mouse, and chicken; however, no orthologs were identified for SLC22A18, and mouse Ppp1r9a was not included in the HomoloGene database. Thus, from our analysis, four candidate chicken imprinted genes (IGF2, UBE3A, PHLDA2, and GRB10) were identified. To expand our analysis, zebrafish was included, but no probe ID for UBE3A exists in this species. Thus, ultimately, three candidate imprinted genes (IGF2, PHLDA2, and GRB10) in chicken were identified. GRB10 was not significant in chicken and zebrafish based on the Wilcoxon-Mann-Whitney test, whereas a weak correlation between PHLDA2 in chicken and human was identified using Spearman's rank correlation coefficient. Significant associations between human, mouse, chicken, and zebrafish were found for IGF2 and GRB10 using Friedman's test. Based on our results, IGF2, PHLDA2, and GRB10 are candidate imprinted genes in chicken. Importantly, the strongest candidate was PHLDA2.
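
Spearman's rank correlation coefficient, used above to compare expression between species, is the Pearson correlation of the ranked values. The sketch below is a plain-Python illustration with tie handling by average ranks; the function names are introduced here, and the expression values are invented for the example rather than taken from the study.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical expression levels of one gene across matched tissues in two species.
species_a = [2.1, 3.4, 1.0, 5.6, 4.2]
species_b = [1.8, 2.9, 1.5, 4.0, 4.4]
print(f"Spearman rho: {spearman(species_a, species_b):.3f}")
```

Working on ranks makes the statistic insensitive to monotone differences in scale between platforms or species, which is one reason it suits cross-species expression comparisons.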

Prediction Value of XRCC 1 Gene Polymorphism on the Survival of Ovarian Cancer Treated by Adjuvant Chemotherapy

  • Miao, Jin;Zhang, Xian;Tang, Qiong-Lan;Wang, Xiao-Yu;Kai, Li
    • Asian Pacific Journal of Cancer Prevention / v.13 no.10 / pp.5007-5010 / 2012
  • Objective: We conducted a prospective study to test the association between three amino acid substitution polymorphic variants of DNA repair genes, XRCC1 (Arg194Trp), XRCC1 (Arg280His) and XRCC1 (Arg399Gln), and the clinical outcome of ovarian cancer patients undergoing adjuvant chemotherapy. Methods: 195 patients with primary advanced ovarian cancer treated by adjuvant chemotherapy were included in our study. All were followed up from Jan. 2007 to Jan. 2012. Genotyping of XRCC1 polymorphisms was conducted by TaqMan Gene Expression assays. Results: The XRCC1 194 Trp/Trp genotype conferred a significant risk of death from ovarian cancer when compared with Arg/Arg (HR=1.56, 95% CI=1.04-3.15). Similarly, those carrying the XRCC1 399 Gln/Gln genotype had an increased risk of death as compared to the XRCC1 399 Arg/Arg genotype, with an HR (95% CI) of 1.98 (1.09-3.93). Conclusion: This study is the first to provide evidence that XRCC1 gene polymorphisms may well be useful as surrogate markers of clinical outcome in ovarian cancer cases undergoing adjuvant chemotherapy.