• Title/Summary/Keyword: Gene prediction


Design of Distributed Cloud System for Managing large-scale Genomic Data

  • Seine Jang;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / v.16 no.2 / pp.119-126 / 2024
  • The volume of genomic data is constantly increasing in various modern industries and research fields. This growth presents new challenges and opportunities in terms of the quantity and diversity of genetic data. In this paper, we propose a distributed cloud system for integrating and managing large-scale gene databases. By introducing a distributed data storage and processing system based on the Hadoop Distributed File System (HDFS), various formats and sizes of genomic data can be efficiently integrated. Furthermore, by leveraging Spark on YARN, efficient management of distributed cloud computing tasks and optimal resource allocation are achieved. This establishes a foundation for the rapid processing and analysis of large-scale genomic data. Additionally, by utilizing BigQuery ML, machine learning models are developed to support genetic search and prediction, enabling researchers to more effectively utilize data. It is expected that this will contribute to driving innovative advancements in genetic research and applications.

Sequencing analysis of the OFC1 gene on the nonsyndromic cleft lip and palate patient in Korean (한국인 비증후군성 구순구개열 환자의 OFC1 유전자의 서열 분석)

  • Kim, Sung-Sik;Son, Woo-Sung
    • The Korean Journal of Orthodontics / v.33 no.3 s.98 / pp.185-197 / 2003
  • This study was performed to characterize the OFC1 gene (locus: chromosome 6p24.3), which is assumed to be the major gene behind nonsyndromic cleft lip and palate, in Korean patients. The sample consisted of 80 subjects: 40 nonsyndromic cleft lip and palate patients (probands; 20 males and 20 females, mean age 14.2 years) and 40 normal adults (20 males and 20 females, mean age 25.6 years). Using a PCR-based assay, the OFC1 gene was amplified and sequenced, and the sequence was then searched for similar protein structures. The results were as follows: 1. The OFC1 gene contains a microsatellite marker of 'CA' repeats. The reference sequence has 21 'CA' repeats in the form TA(CA)11TA(CA)10. In Koreans, however, the number of tandem 'CA' repeats varied from 17 to 26 (18 was not observed), and the repeats took the form TA(CA)n. 2. Nine allelic variants were found. The distribution of OFC1 alleles was similar between the patient and control groups. 3. Compared with Weissenbach's report, Koreans showed a substitution of the base 'T' with 'C' after the 11 tandem 'CA' repeats. However, this difference did not appear to change the ORF prediction results between Koreans and Weissenbach's report. 4. The BLAST search returned telomerase reverse transcriptase (TERT) and nucleotide binding protein 2 (NBP2) as similar proteins. TERT is the protein product of the hTERT gene at locus 5p15.33 (NCBI Genome Annotation; NT023089). NBP2 is the protein product of the ABCC3 (ATP-binding cassette, sub-family C) gene at locus 17q22 (NCBI Genome Annotation; NT010783). 5. In the Pedant-Pro database analysis, the predicted protein structure of the OFC1 gene had at least one transmembrane region and one non-globular region.
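
The repeat structures described in result 1 can be illustrated with a short sketch. The sequences below are synthetic strings assembled from the motifs quoted in the abstract, not actual OFC1 data, and the function name `ca_repeat_runs` is hypothetical.

```python
import re


def ca_repeat_runs(seq):
    """Return the length, in repeat units, of every tandem 'CA' run in seq."""
    return [len(m.group(0)) // 2 for m in re.finditer(r"(?:CA)+", seq.upper())]


# Synthetic reference motif from the abstract: TA(CA)11TA(CA)10.
reference = "TA" + "CA" * 11 + "TA" + "CA" * 10

# Illustration only: replacing the interrupting 'T' (index 2 + 2*11 = 24)
# with 'C' turns the inner 'TA' into 'CA' and merges the two runs.
substituted = reference[:24] + "C" + reference[25:]

print(ca_repeat_runs(reference))    # [11, 10]
print(ca_repeat_runs(substituted))  # [22]
```

The regular expression only counts perfect, uninterrupted 'CA' units, so any other interrupting base would split a run in the same way the 'TA' does in the reference motif.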

Cloning of the posterior silk glands specific-expressed gene of silkworm (누에 후부실샘 특이 발현 유전자 클로닝)

  • Piao, Yulan;Kim, Seong-Ryul;Kim, Sung-Wan;Kang, Seok-Woo;Goo, Tae-Won;Choi, Kwang-Ho
    • Journal of Sericultural and Entomological Science / v.53 no.1 / pp.44-49 / 2015
  • We characterized tissue-specific genes expressed in the posterior silk gland of Bombyx mori using annealing control primer (ACP)-based differential display PCR. We isolated 34 differentially expressed PCR amplicons, one of which was identified as a novel transcript named ACP-16 (366 bp); Northern blot analysis showed that it was expressed only in the posterior silk glands. To determine the promoter region of ACP-16, we isolated from a genomic DNA library, and analyzed, a phage DNA clone carrying a 1.7 kb genomic DNA fragment that includes the open reading frame and the 5' upstream untranslated region of the ACP-16 gene. Using a web-based promoter prediction engine, we estimated that the promoter region of the ACP-16 gene lies between -750 and -165 relative to the translation initiation site (ATG, +1). Further studies of the biological role of the ACP-16 gene are needed before it can be applied to the silkworm transgenic system.

The Prediction of the Expected Current Selection Coefficient of Single Nucleotide Polymorphism Associated with Holstein Milk Yield, Fat and Protein Contents

  • Lee, Young-Sup;Shin, Donghyun;Lee, Wonseok;Taye, Mengistie;Cho, Kwanghyun;Park, Kyoung-Do;Kim, Heebal
    • Asian-Australasian Journal of Animal Sciences / v.29 no.1 / pp.36-42 / 2016
  • Milk-related traits (milk yield, fat and protein) have been crucial to the selection of Holstein cattle, so identifying the current selection trends in Holstein is essential. Despite this, previous studies have not examined current selection trends. We suggest a new formula to detect the current selection trends based on single nucleotide polymorphisms (SNPs). The formula is based on best linear unbiased prediction (BLUP) and Fisher's fundamental theorem of natural selection, both of which are trait-dependent. Fisher's theorem links the additive genetic variance to the selection coefficient. For Holstein milk production traits, we estimated the additive genetic variance using SNP effects from BLUP, and then estimated selection coefficients from the genetic variance to search for highly selective SNPs. Through these processes, we identified significantly selective SNPs. Fourteen genes contained highly selective SNPs with p-value <0.01 (nearly the top 1% of SNPs) in all traits and p-value <0.001 (nearly the top 0.1%) in any trait. They are phosphodiesterase 4B (PDE4B), serine/threonine kinase 40 (STK40), collagen, type XI, alpha 1 (COL11A1), ephrin-A1 (EFNA1), netrin 4 (NTN4), neuron specific gene family member 1 (NSG1), estrogen receptor 1 (ESR1), neurexin 3 (NRXN3), spectrin, beta, non-erythrocytic 1 (SPTBN1), ADP-ribosylation factor interacting protein 1 (ARFIP1), mutL homolog 1 (MLH1), transmembrane channel-like 7 (TMC7), carboxypeptidase X, member 2 (CPXM2) and ADAM metallopeptidase domain 12 (ADAM12). These genes may be important for future artificial selection trends. We also found that the SNP effect predicted from BLUP was the key factor determining the expected current selection coefficient of a SNP. Under Hardy-Weinberg equilibrium of SNP markers in the current generation, the selection coefficient is equivalent to $2 \times$ the SNP effect.
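
As a rough illustration of the closing relation, the sketch below applies "selection coefficient = 2 × SNP effect" to made-up numbers. It does not reproduce the paper's derivation. The variance expression is the textbook additive variance of a biallelic SNP under Hardy-Weinberg equilibrium and may differ from the paper's exact formulation. The function names, SNP labels, allele frequencies and effect sizes are all hypothetical.

```python
def additive_variance(p, effect):
    """Textbook additive variance of one biallelic SNP under Hardy-Weinberg
    equilibrium: 2 * p * (1 - p) * effect**2 (no dominance assumed)."""
    return 2.0 * p * (1.0 - p) * effect ** 2


def expected_selection_coefficient(effect):
    """Relation quoted in the abstract: selection coefficient = 2 * SNP effect."""
    return 2.0 * effect


# Hypothetical (allele frequency, BLUP SNP effect) pairs, for illustration only.
snps = {"snp_a": (0.30, 0.012), "snp_b": (0.55, -0.004)}

for name, (p, effect) in snps.items():
    s = expected_selection_coefficient(effect)
    v = additive_variance(p, effect)
    print(f"{name}: s = {s:+.4f}, additive variance = {v:.3e}")
```

Under this relation the ranking of SNPs by absolute selection coefficient is the same as their ranking by absolute BLUP effect, which is consistent with the abstract's remark that the SNP effect is the key factor.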

Semantic Similarity Search using the Signature Tree (시그니처 트리를 사용한 의미적 유사성 검색 기법)

  • Kim, Ki-Sung;Im, Dong-Hyuk;Kim, Cheol-Han;Kim, Hyoung-Joo
    • Journal of KIISE:Databases / v.34 no.6 / pp.546-553 / 2007
  • As ontologies become widely used, interest in semantic similarity search is also increasing. In this paper, we propose a query evaluation scheme for the k-nearest neighbor query, which retrieves the k objects most similar to the query object. We use the best match method to calculate the semantic similarity between objects and use the signature tree to index the annotation information of objects in the database. The signature tree is usually used for set similarity search. To use the signature tree in similarity search, we must predict the upper bound of similarity for a node, that is, the highest similarity value that can be found by traversing into the node. We therefore propose a prediction function for the best match similarity function and prove its correctness. We also modify the original signature tree structure so that identical signatures are not stored redundantly. This improved structure not only reduces the size of the signature tree but also increases the efficiency of query evaluation. We use the Gene Ontology (GO), which provides large ontologies and a large amount of annotation data, for our experiments. Using GO, we show that the proposed method improves query efficiency, and we present several experimental results obtained by varying the page size and using several node-splitting methods.
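
To make the query concrete, here is a minimal brute-force sketch of a k-nearest neighbor search under a best-match style similarity. It assumes the common "best-match average" form, which may differ from the exact function used in the paper. The term names, the `PARENT` table and the toy term similarity are hypothetical, and no signature tree is built here.

```python
# Hypothetical toy ontology: each term maps to a single parent.
PARENT = {"t1": "p", "t2": "p", "t3": "q"}


def term_sim(a, b):
    """Toy term similarity: 1.0 if identical, 0.5 if same parent, else 0.0."""
    if a == b:
        return 1.0
    return 0.5 if PARENT[a] == PARENT[b] else 0.0


def best_match_similarity(a_terms, b_terms):
    """Pair each term with its most similar term in the other set, average
    per direction, then average the two directions."""
    def directed(src, dst):
        return sum(max(term_sim(s, d) for d in dst) for s in src) / len(src)
    return 0.5 * (directed(a_terms, b_terms) + directed(b_terms, a_terms))


def knn(query, objects, k):
    """Linear-scan k-NN: score every object, keep the k most similar."""
    scored = [(best_match_similarity(query, terms), name)
              for name, terms in objects.items()]
    return sorted(scored, reverse=True)[:k]


objects = {"obj1": {"t1", "t3"}, "obj2": {"t2"}, "obj3": {"t3"}}
print(knn({"t1"}, objects, k=2))  # [(0.75, 'obj1'), (0.5, 'obj2')]
```

The linear scan scores every object. The scheme described in the abstract would instead descend an index and skip any node whose predicted upper bound is below the current k-th best similarity.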

Molecular analysis of alternative transcripts of equine AXL receptor tyrosine kinase gene

  • Park, Jeong-Woong;Song, Ki-Duk;Kim, Nam Young;Choi, Jae-Young;Hong, Seul A;Oh, Jin Hyeog;Kim, Si Won;Lee, Jeong Hyo;Park, Tae Sub;Kim, Jin-Kyoo;Kim, Jong Geun;Cho, Byung-Wook
    • Asian-Australasian Journal of Animal Sciences / v.30 no.10 / pp.1471-1477 / 2017
  • Objective: Since athletic performance is one of the most important traits in horses, most research has focused on physiological and physical studies of horse athletic abilities. In contrast, molecular analyses and regulatory pathway studies remain insufficient for the evaluation and prediction of horse athletic abilities. In our previous study, we identified the AXL receptor tyrosine kinase (AXL) gene, which was expressed as alternatively spliced isoforms in skeletal muscle during exercise. In the present study, we validated two AXL alternative splicing transcripts (named AXLa for the long form and AXLb for the short form) in equine skeletal muscle to gain insight into the role of each alternative transcript during exercise. Methods: We validated two isoforms of AXL transcripts in horse tissues by reverse transcriptase polymerase chain reaction (RT-PCR), and then cloned the transcripts to confirm the alternative locus and its sequences. Additionally, we examined the expression patterns of AXLa and AXLb transcripts in horse tissues by quantitative RT-PCR (qRT-PCR). Results: Both AXLa and AXLb transcripts were expressed in horse skeletal muscle, and their expression levels were significantly increased after exercise. Sequencing analysis showed an alternative splicing event at exon 11 between the AXLa and AXLb transcripts. Three-dimensional (3D) prediction of the alternative protein structures revealed that the structural distance of the connective region between the fibronectin type 3 (FN3) and immunoglobulin (Ig) domains differed between the two isoforms. Conclusion: The expression patterns of AXLa and AXLb transcripts are assumed to be involved in the regulation of exercise-induced stress in horse muscle, possibly through an NF-$\kappa$B signaling pathway. Further study is necessary to uncover the biological functions and significance of the alternative splicing isoforms in racehorse skeletal muscle.

Gene signature for prediction of radiosensitivity in human papillomavirus-negative head and neck squamous cell carcinoma

  • Kim, Su Il;Kang, Jeong Wook;Noh, Joo Kyung;Jung, Hae Rim;Lee, Young Chan;Lee, Jung Woo;Kong, Moonkyoo;Eun, Young-Gyu
    • Radiation Oncology Journal / v.38 no.2 / pp.99-108 / 2020
  • Purpose: The probability of cancer recurrence after adjuvant or definitive radiotherapy in patients with human papillomavirus-negative (HPV(-)) head and neck squamous cell carcinoma (HNSCC) varies from patient to patient. This study aimed to identify and validate a radiation sensitivity signature (RSS) for patients with HPV(-) HNSCC to predict cancer recurrence after radiotherapy. Materials and Methods: Clonogenic survival assays were performed to assess radiosensitivity in 14 HNSCC cell lines. We identified genes closely correlated with radiosensitivity and validated them in The Cancer Genome Atlas (TCGA) cohort. The validated RSS was analyzed by ingenuity pathway analysis (IPA) to identify canonical pathways, upstream regulators, diseases and functions, and gene networks related to radiosensitive genes in HPV(-) HNSCC. Results: The survival fraction of the 14 HNSCC cell lines after exposure to 2 Gy of radiation ranged from 48% to 72%. Six genes were positively correlated and 35 genes were negatively correlated with radioresistance. The RSS was validated in the HPV(-) TCGA HNSCC cohort (n = 203), and the recurrence-free survival (RFS) rate was significantly lower in the radioresistant group than in the radiosensitive group (p = 0.035). Cell death and survival, cell-to-cell signaling, and cellular movement were significantly enriched in the RSS, and the RSS genes were highly correlated with each other. Conclusion: We derived an HPV(-) HNSCC-specific RSS and validated it in an independent cohort. The outcome of adjuvant or definitive radiotherapy in HPV(-) patients with HNSCC can be predicted by analyzing their RSS, which might help in establishing a personalized therapeutic plan.

Genome Information of Maribacter dokdonensis DSW-8 and Comparative Analysis with Other Maribacter Genomes

  • Kwak, Min-Jung;Lee, Jidam;Kwon, Soon-Kyeong;Kim, Jihyun F.
    • Journal of Microbiology and Biotechnology / v.27 no.3 / pp.591-597 / 2017
  • Maribacter dokdonensis DSW-8 was isolated from the seawater off Dokdo in Korea. To investigate the genomic features of this marine bacterium, we sequenced and analyzed its genome. After de novo assembly and gene prediction, 16 contigs totaling 4,434,543 bp (35.95% G+C content) were generated, and 3,835 protein-coding sequences, 36 transfer RNAs, and 6 ribosomal RNAs were detected. In the genome of DSW-8, genes encoding proteins associated with gliding motility, molybdenum cofactor biosynthesis, and utilization of several kinds of carbohydrates were identified. To analyze the genomic relationships among Maribacter species, we compared publicly available Maribacter genomes, including that of M. dokdonensis DSW-8. A phylogenomic tree based on 1,772 genes conserved among the eight Maribacter strains showed that Maribacter species isolated from seawater are distinguishable from species originating from algal blooms. Comparison of the gene contents using COG and subsystem databases demonstrated that the relative abundance of genes involved in carbohydrate metabolism is higher in seawater-originating strains than in strains from algal blooms. These results indicate that the genomic information of Maribacter species reflects the characteristics of their habitats and provides useful information on carbon utilization by marine flavobacteria.
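
Assembly summary figures such as total size and G+C content can be computed directly from the contig sequences. The sketch below shows one way to do so on toy strings. The contigs are invented for illustration and are not DSW-8 data. Excluding ambiguous bases from the denominator is an assumption of this sketch, not something stated in the abstract.

```python
def gc_content(contigs):
    """Percent G+C over all contigs combined, ignoring ambiguous bases (e.g. N)."""
    gc = at = 0
    for seq in contigs:
        s = seq.upper()
        gc += s.count("G") + s.count("C")
        at += s.count("A") + s.count("T")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


# Toy contigs for illustration only.
contigs = ["ATGCGC", "AATTNNGC"]
total_bp = sum(len(c) for c in contigs)
print(f"{len(contigs)} contigs, {total_bp} bp, {gc_content(contigs):.2f}% G+C")
# prints "2 contigs, 14 bp, 50.00% G+C"
```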

A Method for Microarray Data Analysis based on Bayesian Networks using an Efficient Structural learning Algorithm and Data Dimensionality Reduction (효율적 구조 학습 알고리즘과 데이타 차원축소를 통한 베이지안망 기반의 마이크로어레이 데이타 분석법)

  • 황규백;장정호;장병탁
    • Journal of KIISE:Software and Applications / v.29 no.11 / pp.775-784 / 2002
  • Microarray data, obtained from DNA chip technologies, measure the expression levels of thousands of genes in cells or tissues. They are used for gene function prediction or cancer diagnosis based on gene expression patterns. Among diverse methods for data analysis, the Bayesian network represents the relationships among data attributes in the form of a graph structure. This property enables us to discover various relations among genes and the characteristics of the tissue (e.g., the cancer type) through microarray data analysis. However, most present microarray data sets are so sparse that it is difficult to apply general analysis methods, including Bayesian networks, directly. In this paper, we use an efficient structural learning algorithm and data dimensionality reduction to analyze microarray data using Bayesian networks. The proposed method was applied to the analysis of real microarray data, namely the NCI60 data set, and its usefulness was evaluated by how accurately the learned Bayesian networks represent known biological facts.

Prediction and Annotation of ABC Transporter Genes from Magnaporthe oryzae Genome Sequence (벼도열병균 게놈서열로부터 ABC transporter 유전자군의 예측 및 특성 분석)

  • Kim, Yong-Nam;Kim, Jin-Soo;Kim, Su-Young;Kim, Jeong-Hwan;Lee, Jong-Hwan;Choi, Woo-Bong
    • Journal of Life Science / v.20 no.2 / pp.176-182 / 2010
  • Magnaporthe oryzae is a destructive plant-pathogenic fungus that causes rice blast. The pathogen uses several mechanisms to circumvent the inhibitory actions of fungicides. ATP-binding cassette (ABC) transporters are known to provide protection against toxic compounds in the environment. PC facilitated bioinformatic analysis, particularly with respect to accessing and extracting database information and domain identification. We predicted ABC transporter genes from the M. oryzae genome sequence using computational and bioinformatics tools. A total of thirty-three genes were predicted to encode ABC transporters. Three of the thirty-three putative genes corresponded to three known ABC transporter genes (ABC1, ABC2 and ABC3). Copy numbers of the ABC transporter genes were examined by Southern blot analysis, which revealed that the twenty genes tested each exist as a single copy. We amplified the complementary DNA (cDNA) corresponding to eleven of these by reverse transcriptase polymerase chain reaction.