• Title/Abstract/Keywords: Genomic prediction

112 search results (processing time: 0.026 s)

Hybrid Fungal Genome Annotation Pipeline Combining ab initio, Evidence-, and Homology-based gene model evaluation

  • Min, Byoungnam;Choi, In-Geol
    • Korean Society of Mycology News: Conference Proceedings
    • Korean Society of Mycology 2018 Spring Conference and Extraordinary General Meeting
    • p. 22
    • 2018
  • Fungal genome sequencing and assembly have become routine. Genome analysis relies on high-quality gene prediction and annotation, so an automatic fungal genome annotation pipeline is essential for handling the exponentially growing volume of genomic sequence data. However, building an automatic annotation procedure for fungal genomes is not an easy task. FunGAP (Fungal Genome Annotation Pipeline) was developed for precise and accurate prediction of gene models from any fungal genome assembly. To produce high-quality gene models, the pipeline runs multiple gene prediction programs spanning ab initio, evidence-based, and homology-based approaches. FunGAP then evaluates all predicted genes and filters the gene models. To guide the removal of false-positive genes, we used a scoring function that seeks a consensus by scoring each gene model on its homology to known proteins or domains. FunGAP is freely available for non-commercial users at the GitHub site (https://github.com/CompSynBioLab-KoreaUniv/FunGAP).

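The consensus-filtering idea above can be sketched in Python. Several predictors propose gene models. Each model gets a homology-based score, and at each locus the best-scoring model is kept while overlapping rivals and low-scoring models are dropped. This is a minimal sketch under explicit assumptions, not FunGAP's actual code. The `GeneModel` fields, the weighted-sum `score`, the `min_score` cutoff, and the precomputed BLAST/domain scores are all hypothetical. All models are also assumed to lie on one contig and one strand.

```python
from dataclasses import dataclass


@dataclass
class GeneModel:
    name: str
    predictor: str               # hypothetical labels: "ab_initio", "evidence", "homology"
    start: int                   # 1-based inclusive coordinates on one contig/strand
    end: int
    blast_bitscore: float = 0.0  # best hit to a known protein (assumed precomputed)
    domain_score: float = 0.0    # summed protein-domain hit score (assumed precomputed)


def score(m: GeneModel, w_blast: float = 1.0, w_domain: float = 1.0) -> float:
    # Hypothetical weighted sum of homology evidence; not FunGAP's published formula.
    return w_blast * m.blast_bitscore + w_domain * m.domain_score


def select_models(models, min_score=1.0):
    """Greedy consensus: keep the best-scoring model at each locus, drop overlapping rivals."""
    kept = []
    for m in sorted(models, key=score, reverse=True):
        if score(m) < min_score:
            break  # every remaining model scores lower: treat as likely false positives
        if all(m.end < k.start or m.start > k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)


models = [
    GeneModel("g1_abinitio", "ab_initio", 100, 900, 250.0, 40.0),
    GeneModel("g1_evidence", "evidence", 120, 950, 310.0, 40.0),
    GeneModel("g2_abinitio", "ab_initio", 2000, 2600),
    GeneModel("g3_homology", "homology", 3000, 3800, 180.0),
]
kept = select_models(models)
print([m.name for m in kept])  # ['g1_evidence', 'g3_homology']
```

A greedy pass is used here because it is simple and deterministic. A real pipeline would also need to handle strands, multiple contigs, and partial overlaps more carefully.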

Challenges and New Approaches in Genomics and Bioinformatics

  • Park, Jong Hwa;Han, Kyung Sook
    • Genomics & Informatics
    • Vol. 1, No. 1
    • pp. 1-6
    • 2003
  • In conclusion, the seemingly fuzzy and disorganized data of biology, spanning thousands of layers from molecules to the Internet, have so far resisted precise mapping and successful prediction by mathematicians, physicists, and computer scientists. Genomics and bioinformatics are the fields that process such complex data. Viewing biological entities as complex interaction networks is opening the door to a generalized representation of those entities. The main challenges of genomics and bioinformatics now lie in 1) how to mine, in terms of interactions, the networks of the domains of bioinformatics, namely the literature, metabolic pathways, and proteomes and structures; and 2) how to generalize these networks so that the information can be integrated into computable genomic data regardless of the layer. Once bioinformaticians find a general principle for how components interact with each other to form any organic interaction network at genomic scale, true simulation and prediction of life in silico will be possible.

Comparative Genome Analysis of Sphingomonas chungbukensis DJ77

  • Hai Dang Sy;Kim Young-Pil;Choi Bum-Sun;Um Hyun-Ju;Kim Young-Chang
    • The Microbiological Society of Korea: Conference Proceedings
    • The Microbiological Society of Korea 2002 Fall Conference
    • pp. 175-179
    • 2002
  • Our partial genomic sequence data of Sphingomonas chungbukensis DJ77, totaling 877,928 bp, were assembled with TIGR Assembler; the contigs obtained so far total about 0.73 Mb. We performed a comparative genome analysis between our incomplete genome and other completed genomes to produce a genomic prediction for S. chungbukensis DJ77. This analysis took advantage of the multiple complete genomes available in the COGs database (Clusters of Orthologous Groups of proteins). Because it is based on homolog searches among completed genomes, it provides a good first step toward assigning putative functions to the predicted coding sequences.

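Homology-based functional assignment of the kind described above can be illustrated with a simple best-hit rule. For each predicted coding sequence, take its best-scoring significant hit to a protein that belongs to a COG, and inherit that COG. This is a hedged sketch only. It is simpler than the COG database's own assignment procedure. The hit tuples, protein IDs, `cog_of` mapping and the e-value cutoff are all hypothetical.

```python
def assign_cog(hits, cog_of, max_evalue=1e-5):
    """Best-hit COG assignment.

    hits:   iterable of (query_id, subject_id, evalue, bitscore) from a homology search
    cog_of: dict mapping subject protein ID -> COG ID (hypothetical mapping)
    """
    best = {}  # query_id -> (subject_id, bitscore) of the best significant hit
    for query, subject, evalue, bits in hits:
        if evalue > max_evalue or subject not in cog_of:
            continue  # skip weak hits and hits to proteins outside any COG
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: cog_of[subject] for query, (subject, _) in best.items()}


# Hypothetical homology-search results and COG membership table.
hits = [
    ("orf1", "ecoli_b0001", 1e-30, 120.0),
    ("orf1", "bsub_x", 1e-10, 60.0),
    ("orf2", "ecoli_b0002", 0.5, 20.0),     # not significant
    ("orf3", "unknown_p", 1e-20, 90.0),     # subject not in any COG
]
cog_of = {"ecoli_b0001": "COG0001", "bsub_x": "COG0002", "ecoli_b0002": "COG0003"}
print(assign_cog(hits, cog_of))
```

ORFs left unassigned by this rule (orf2 and orf3 here) would remain candidates for further analysis rather than being given a function.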

Estimation of the Accuracy of Genomic Breeding Value in Hanwoo (Korean Cattle)

  • 이승수;이승환;최태정;최연호;조광현;최유림;조용민;김내수;이중재
    • Journal of Animal Science and Technology
    • Vol. 55, No. 1
    • pp. 13-18
    • 2013
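  • This study was conducted to provide baseline data for applying genomic selection to the Hanwoo genetic evaluation system. Carcass weight, eye muscle area (longissimus dorsi area), backfat thickness, and marbling score were measured on 552 progeny-tested animals from the Nonghyup (NH) Hanwoo Improvement Center. A genomic relationship matrix (GRM) was then estimated from a high-density SNP panel (777K). GEBVs (Genomic Estimated Breeding Values) were obtained with the GBLUP (Genomic Best Linear Unbiased Prediction) method, and their accuracy was estimated by cross-validation. The cross-validated accuracy of the genomic breeding values was quite high for every trait, ranging from 0.915 to 0.957. With GBLUP using the GRM calculated from allele frequencies, the maximum gain in breeding value accuracy for the 534 progeny-tested animals was about 9.56%, 5.78%, 5.78%, and 4.18% for carcass weight, eye muscle area, backfat thickness, and marbling score, respectively. For all 3,674 animals in the pedigree records, the gains per trait were up to 13.54%, 6.50%, 6.50%, and 4.31%. This suggests a benefit for the selection system for Hanwoo proven bulls: genomic pre-selection of performance-test candidates, which cannot yet produce phenotypic records, could increase genetic gain. It would do so both by raising the previously relatively low accuracy of breeding values and by shortening the generation interval. The population used here to estimate genomic breeding values consisted of progeny-tested animals, which are closely related to one another and already had fairly high accuracy under conventional BLUP. However, the progeny-test population is currently the only source of accurate data for building a Hanwoo reference population. A reference population with a diverse genetic composition will therefore need to be built for continued genomic evaluation. In addition, the reference population must keep growing in size to realize the accuracy gains expected from genomic evaluation.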
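The GBLUP workflow described above can be sketched with NumPy in three steps: a GRM computed from allele frequencies, GEBV prediction, and cross-validated accuracy. This is a minimal sketch under stated assumptions. The GRM follows VanRaden's (2008) first method. The model has an overall mean as its only fixed effect, and heritability is treated as known. "Accuracy" is taken as the correlation between GEBV and phenotype across validation folds. The study's actual model, fixed effects, and accuracy definition may differ, and the genotypes below are simulated, not Hanwoo data.

```python
import numpy as np


def vanraden_grm(M):
    """Genomic relationship matrix from 0/1/2 genotypes (VanRaden 2008, method 1)."""
    p = M.mean(axis=0) / 2.0               # allele frequency of the counted allele per SNP
    Z = M - 2.0 * p                        # centre each SNP column by 2p
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(G, y, train, test, h2):
    """GEBVs of `test` animals from phenotypes of `train` animals.

    Assumed model: y = 1*mu + u + e, u ~ N(0, G*s2u), e ~ N(0, I*s2e), h2 known.
    """
    lam = (1.0 - h2) / h2                  # variance ratio s2e / s2u
    V = G[np.ix_(train, train)] + lam * np.eye(len(train))
    Vinv_y = np.linalg.solve(V, y[train])
    Vinv_1 = np.linalg.solve(V, np.ones(len(train)))
    mu = (Vinv_1 @ y[train]) / Vinv_1.sum()  # GLS estimate of the overall mean
    alpha = Vinv_y - mu * Vinv_1             # V^{-1} (y - 1*mu)
    return G[np.ix_(test, train)] @ alpha    # BLUP of u for the test animals


def cv_accuracy(G, y, h2, k=5, seed=0):
    """k-fold cross-validation; returns corr(GEBV, phenotype) over all folds."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    gebv = np.empty(n)
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        gebv[test] = gblup_predict(G, y, train, test, h2)
    return np.corrcoef(gebv, y)[0, 1]


# Simulated stand-in data (NOT the Hanwoo records): 200 animals, 1,000 SNPs.
rng = np.random.default_rng(1)
n, m = 200, 1000
M = rng.binomial(2, rng.uniform(0.1, 0.9, m), size=(n, m)).astype(float)
effects = rng.normal(0.0, 1.0, m) * (rng.random(m) < 0.05)   # ~5% causal SNPs
g = (M - M.mean(axis=0)) @ effects
y = g + rng.normal(0.0, g.std(), n)                          # heritability ~= 0.5

G = vanraden_grm(M)
print(round(cv_accuracy(G, y, h2=0.5), 3))
```

With unrelated simulated animals the correlation will be much lower than the 0.915 to 0.957 reported for the closely related progeny-test population. This is consistent with the abstract's point that relatedness and reference-population size drive accuracy.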

Effect of single nucleotide polymorphism on the total number of piglets born per parity of three different pig breeds

  • Do, Kyoung-Tag;Jung, Soon-Woo;Park, Kyung-Do;Na, Chong-Sam
    • Asian-Australasian Journal of Animal Sciences
    • Vol. 31, No. 5
    • pp. 628-635
    • 2018
  • Objective: To determine the effects of genomic breeding values (GBV) and single nucleotide polymorphisms (SNP) on the total number of piglets born (TNB) in 3 pig breeds (Berkshire, Landrace, and Yorkshire). Methods: After collecting genomic information (Porcine SNP BeadChip) and phenotypic TNB records for each breed, the effects of GBV and SNP were estimated using the single-step best linear unbiased prediction (ssBLUP) method. Results: The heritability estimates for TNB in Berkshire, Landrace, and Yorkshire were 0.078, 0.107, and 0.121, respectively. The breeding value estimates for TNB ranged from -1.34 to 1.47 heads in Berkshire, -1.79 to 1.87 heads in Landrace, and -2.60 to 2.94 heads in Yorkshire. Among sows with TNB records, breeding values were more reliable for individuals with SNP information than for those without it. The distributions of SNP effects on TNB did not follow a gamma distribution: most SNP effects were near zero, and only a few SNPs had large effects. The numbers of SNPs whose absolute effects exceeded 4 standard deviations were 11, 8, and 19 in Berkshire, Landrace, and Yorkshire, respectively. No SNP exceeded 5 standard deviations in absolute value in Berkshire or Landrace. In Yorkshire, however, four SNPs (ASGA0089457, ASGA0103374, ALGA0111816, and ALGA0098882) did. Conclusion: No large-effect SNP was shared among the breeds. This may be due to large differences in genetic composition and the small size of the reference population. Precise evaluation of individual genetic merit with genomic selection may require establishing a reference population of appropriate size.
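The abstract does not say how SNP effects were obtained from ssBLUP. A common approach in single-step analyses is to back-solve them from the GEBVs of genotyped animals, â = Z′G⁻¹ĝ / k with G = ZZ′/k, and then count effects beyond a chosen number of standard deviations. The sketch below assumes exactly that. It also adds a small `ridge` so that G is invertible, and it standardizes effects by their mean and SD. It further substitutes simulated genotypes and a stand-in GEBV vector for real ssBLUP output. All three choices are assumptions, not the authors' documented procedure.

```python
import numpy as np


def snp_effects_from_gebv(M, gebv, ridge=1e-3):
    """Back-solve SNP effects from GEBVs of genotyped animals (assumed approach).

    M: (animals, SNPs) genotypes coded 0/1/2; gebv: GEBVs of the same animals.
    Uses a_hat = Z' G^{-1} g_hat / k with G = Z Z' / k and k = 2 * sum p(1-p).
    """
    p = M.mean(axis=0) / 2.0
    Z = M - 2.0 * p
    k = 2.0 * np.sum(p * (1.0 - p))
    G = Z @ Z.T / k + ridge * np.eye(M.shape[0])   # small ridge keeps G invertible
    return Z.T @ np.linalg.solve(G, gebv) / k


def count_large_effects(a, n_sd):
    """Number of SNP effects more than `n_sd` standard deviations from the mean."""
    z = (a - a.mean()) / a.std()
    return int(np.sum(np.abs(z) > n_sd))


# Simulated stand-in data: 60 genotyped animals, 400 SNPs, 3 large-effect SNPs.
rng = np.random.default_rng(7)
M = rng.binomial(2, 0.4, size=(60, 400)).astype(float)
true_a = rng.normal(0.0, 0.01, 400)
true_a[:3] = 0.5
Z = M - M.mean(axis=0)
gebv = Z @ true_a          # hypothetical stand-in for ssBLUP GEBVs

a_hat = snp_effects_from_gebv(M, gebv)
for threshold in (4, 5):
    print(threshold, count_large_effects(a_hat, threshold))
```

With far fewer animals than SNPs, the back-solved effects are a minimum-norm solution. This spreads each true effect over correlated SNPs, one reason a small reference population limits how sharply large-effect SNPs can be identified.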

Genome Information of Maribacter dokdonensis DSW-8 and Comparative Analysis with Other Maribacter Genomes

  • Kwak, Min-Jung;Lee, Jidam;Kwon, Soon-Kyeong;Kim, Jihyun F.
    • Journal of Microbiology and Biotechnology
    • Vol. 27, No. 3
    • pp. 591-597
    • 2017
  • Maribacter dokdonensis DSW-8 was isolated from seawater off Dokdo, Korea. To investigate the genomic features of this marine bacterium, we sequenced and analyzed its genome. De novo assembly and gene prediction yielded 16 contigs totaling 4,434,543 bp (35.95% G+C content), with 3,835 protein-coding sequences, 36 transfer RNAs, and 6 ribosomal RNAs. The DSW-8 genome contains genes encoding proteins associated with gliding motility, molybdenum cofactor biosynthesis, and the utilization of several kinds of carbohydrates. To analyze the genomic relationships among Maribacter species, we compared the publicly available Maribacter genomes, including that of M. dokdonensis DSW-8. A phylogenomic tree based on 1,772 genes conserved among the eight Maribacter strains showed that Maribacter species isolated from seawater are distinguishable from those originating from algal blooms. Comparison of gene contents using the COG and subsystem databases showed that the relative abundance of genes involved in carbohydrate metabolism is higher in seawater-derived strains than in algal-bloom strains. These results indicate that the genomic information of Maribacter species reflects the characteristics of their habitats and provides useful information on carbon utilization by marine flavobacteria.
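Two of the quantities in this abstract are simple to compute once sequences and gene annotations are in hand. The first is the G+C content of an assembly. The second is the relative abundance of genes in a functional category; in COG, category G is carbohydrate transport and metabolism. The sketch below is illustrative only. The strain names and per-gene category lists are hypothetical. It also assumes one category per gene and ignores ambiguous bases (such as N) when computing G+C.

```python
from collections import Counter


def gc_content(seq):
    """Percent G+C among unambiguous bases (A, C, G, T); other symbols are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def relative_abundance(categories):
    """Fraction of genes assigned to each functional category."""
    counts = Counter(categories)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.items()}


# Hypothetical per-strain COG category assignments (one letter per gene).
strains = {
    "seawater_strain": ["G", "G", "G", "E", "K", "M"],
    "bloom_strain": ["G", "E", "E", "K", "M", "M"],
}
for name, cats in strains.items():
    print(name, round(relative_abundance(cats).get("G", 0.0), 3))
print(gc_content("ATGCGCAT"))
```

Comparing relative rather than absolute counts keeps strains with different genome sizes on the same scale.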

The Application of Machine Learning Algorithm In The Analysis of Tissue Microarray; for the Prediction of Clinical Status

  • Cho, Sung-Bum;Kim, Woo-Ho;Kim, Ju-Han
    • Korean Society for Bioinformatics: Conference Proceedings
    • Korean Society for Bioinformatics and Systems Biology: BIOINFO 2005
    • pp. 366-370
    • 2005
  • Tissue microarray is one of the high-throughput technologies of the post-genomic era. Using tissue microarrays, researchers can investigate the expression of a large number of genes at the DNA, RNA, and protein levels. An important feature of tissue microarrays is their ability to assess many biomarkers that are already used in clinical practice. To analyze the categorical data of tissue microarrays, we applied a Bayesian network classifier algorithm. We found that the Bayesian network classifier could analyze tissue microarray data and that integrating prior knowledge about gastric cancer improved performance. Specifically, relevant integration of prior knowledge improved the accuracy of predicting survival status from immunohistochemical tissue microarray data on 18 tumor suppressor genes. In conclusion, the Bayesian network classifier appears appropriate for analyzing tissue microarray data with clinical information.

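The simplest Bayesian network classifier is naive Bayes, in which every feature depends only on the class node. It handles categorical immunohistochemistry scores directly. The sketch below is a hedged illustration, not the authors' network, whose structure the abstract does not describe. In this sketch the smoothing constant `alpha` acts as a symmetric Dirichlet prior, the most basic place prior information can enter. The abstract does not say how the gastric-cancer knowledge was integrated. The marker scores and survival labels below are hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_nb(X, y, alpha=1.0):
    """Categorical naive Bayes with additive (Dirichlet) smoothing `alpha`."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)   # (feature index, class) -> Counter of values
    levels = defaultdict(set)             # feature index -> observed categories
    for row, c in zip(X, y):
        for j, v in enumerate(row):
            value_counts[(j, c)][v] += 1
            levels[j].add(v)
    return class_counts, value_counts, levels, alpha


def predict_nb(model, row):
    """Return the class with the highest posterior log-probability for one sample."""
    class_counts, value_counts, levels, alpha = model
    n = sum(class_counts.values())
    best, best_lp = None, -math.inf
    for c, nc in class_counts.items():
        lp = math.log(nc / n)                         # log prior P(class)
        for j, v in enumerate(row):
            k = len(levels[j])                        # number of categories seen for feature j
            lp += math.log((value_counts[(j, c)][v] + alpha) / (nc + alpha * k))
        if lp > best_lp:
            best, best_lp = c, lp
    return best


# Hypothetical IHC scores for two markers and survival status.
X = [("high", "neg"), ("high", "low"), ("neg", "high"), ("low", "high")]
y = ["dead", "dead", "alive", "alive"]
model = fit_nb(X, y)
print(predict_nb(model, ("high", "neg")), predict_nb(model, ("neg", "high")))
```

Richer Bayesian network classifiers (for example, tree-augmented structures) relax the independence assumption. Naive Bayes is shown only because it is the smallest correct instance of the family.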

Discovering cis-regulatory motifs by combining multiple predictors

  • Chang, Hye-Shik;Hwang, Kyu-Woong;Kim, Dong-Sup
    • Bioinformatics and Biosystems
    • Vol. 2, No. 2
    • pp. 52-57
    • 2007
  • Computational discovery of transcription factor binding sites is an important tool in genetic and genomic analysis. Typical applications include rough prediction of gene regulatory networks and finding possibly co-regulated genes. Many motif-discovery algorithms have been proposed over the past years, yet no algorithm dominates: none achieves sufficient accuracy on its own without extensive additional information. In this paper, we explore combining multiple algorithms into one integrated result to improve performance and convenience for researchers. In addition, we apply new higher-order information, reorganized from the set of base predictions, to the final prediction.

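One basic way to combine several motif finders is position-level voting. Each tool's predicted binding sites mark sequence positions, and a position is kept when at least `min_votes` tools cover it. The sketch below shows that baseline only. It is not the paper's method, which additionally uses higher-order information derived from the base predictions. The tool names, sequence IDs and site coordinates are hypothetical.

```python
from collections import defaultdict


def combine_predictions(predictions, min_votes=2):
    """predictions: {tool_name: [(seq_id, start, length), ...]} (0-based starts).

    Returns {seq_id: sorted positions covered by at least `min_votes` tools}.
    """
    votes = defaultdict(lambda: defaultdict(int))
    for sites in predictions.values():
        covered = set()                      # each tool votes at most once per position
        for seq_id, start, length in sites:
            for pos in range(start, start + length):
                covered.add((seq_id, pos))
        for seq_id, pos in covered:
            votes[seq_id][pos] += 1
    return {
        seq_id: sorted(p for p, v in pos_votes.items() if v >= min_votes)
        for seq_id, pos_votes in votes.items()
        if any(v >= min_votes for v in pos_votes.values())
    }


def to_intervals(positions):
    """Merge sorted positions into inclusive (start, end) intervals."""
    intervals = []
    for p in positions:
        if intervals and p == intervals[-1][1] + 1:
            intervals[-1][1] = p
        else:
            intervals.append([p, p])
    return [tuple(iv) for iv in intervals]


preds = {
    "toolA": [("seq1", 10, 6)],
    "toolB": [("seq1", 12, 6)],
    "toolC": [("seq1", 40, 6)],
}
combined = combine_predictions(preds)
print(combined, to_intervals(combined["seq1"]))
```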

Functional annotation of uncharacterized proteins from Fusobacterium nucleatum: identification of virulence factors

  • Kanchan Rauthan;Saranya Joshi;Lokesh Kumar;Divya Goel;Sudhir Kumar
    • Genomics & Informatics
    • Vol. 21, No. 2
    • pp. 21.1-21.14
    • 2023
  • Fusobacterium nucleatum is a Gram-negative bacterium associated with diverse infections such as appendicitis and colorectal cancer. It mainly attacks epithelial cells in the oral cavity and throat of the infected individual. It has a single circular genome of 2.7 Mb. Many proteins in the F. nucleatum genome are listed as "Uncharacterized." Annotating these proteins is crucial for learning new facts about the pathogen, deciphering its gene regulation, functions, and pathways, and discovering novel target proteins. In light of new genomic information, an arsenal of bioinformatic tools was used to predict the physicochemical parameters, domains and motifs, patterns, and subcellular localization of the uncharacterized proteins. Receiver operating characteristic (ROC) analysis estimated the efficacy of the databases used to predict the different parameters at 83.6%. Functions were successfully assigned to 46 uncharacterized proteins, including enzymes, transporters, membrane proteins, and binding proteins. Beyond function prediction, the proteins were subjected to STRING analysis to reveal their interacting partners. The annotated proteins also underwent homology-based structure prediction and modeling using the Swiss PDB and Phyre2 servers. Two probable virulence factors were identified that could be investigated further in drug-related studies. Assigning functions to these uncharacterized proteins showed that some of them are important for cell survival inside the host and could serve as effective drug targets.
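An ROC analysis is often summarized by the area under the curve (AUC). AUC equals the probability that a randomly chosen true positive receives a higher score than a randomly chosen true negative, with ties counting one half. The abstract does not state whether its 83.6% figure is an AUC, an accuracy, or another ROC-derived measure, so the sketch below only illustrates how such a figure can be computed. It is not the authors' procedure. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC via pairwise comparison (Mann-Whitney form); labels are 1 (positive) or 0."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            # a correctly ordered pair counts 1, a tie counts 0.5
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


# Hypothetical prediction scores for proteins with known (1) / absent (0) features.
scores = [0.9, 0.4, 0.6, 0.1]
labels = [1, 1, 0, 0]
print(roc_auc(scores, labels))
```

The pairwise loop is quadratic but transparent. For large benchmarks, a rank-based implementation gives the same value faster.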

Classification Prediction Error Estimation System of Microarray for a Comparison of Resampling Methods Based on Multi-Layer Perceptron

  • 박수영;정채영
    • Journal of the Korea Institute of Information and Communication Engineering
    • Vol. 14, No. 2
    • pp. 534-539
    • 2010
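  • In genomic studies, thousands of features are collected from relatively few samples. The goal of such studies is to build a classifier that predicts the outcomes of future observations. Building a classifier involves three steps: feature selection, model selection, and prediction evaluation. This paper focuses on prediction evaluation. We first standardized microarray data with quantile normalization, which makes the quantiles of all slides identical. We then designed a system that compares several methods for estimating the "true" prediction error of a predictive model prior to feature selection, and we compared and analyzed the prediction errors of these methods. LOOCV showed small MSE and bias overall, whereas the split-sample method and 2-fold CV performed very poorly on small samples. For computationally demanding analyses, 10-fold CV tended to perform even better than LOOCV.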
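The two techniques named in this abstract can be sketched with NumPy. The first is quantile normalization: each array's sorted values are replaced by the mean sorted profile across arrays. The second is resampling-based error estimation: k-fold CV, with LOOCV as the case k = n. This is a hedged sketch, not the paper's system. It swaps the multi-layer perceptron for a nearest-centroid classifier to stay short, ignores ties in quantile normalization, and uses simulated expression data.

```python
import numpy as np


def quantile_normalize(X):
    """X: genes x arrays. Give every array the same distribution (ties ignored)."""
    ranks = np.argsort(np.argsort(X, axis=0), axis=0)    # rank of each value within its array
    mean_sorted = np.sort(X, axis=0).mean(axis=1)        # average value at each rank
    return mean_sorted[ranks]


def nearest_centroid_fit_predict(X_train, y_train, X_test):
    """Stand-in classifier (NOT the paper's MLP): assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]


def kfold_error(X, y, k, seed=0):
    """Misclassification rate estimated by k-fold CV; k = len(y) gives LOOCV."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    errors = 0
    for test in folds:
        train = np.setdiff1d(np.arange(n), test)
        pred = nearest_centroid_fit_predict(X[train], y[train], X[test])
        errors += np.sum(pred != y[test])
    return errors / n


# Simulated data: 500 genes x 30 arrays, 20 genes up-regulated in class 1.
rng = np.random.default_rng(0)
expr = rng.lognormal(size=(500, 30))
expr[:20, 15:] *= 3.0
Xq = quantile_normalize(np.log2(expr))
X, y = Xq.T, np.repeat([0, 1], 15)            # samples x genes, class labels
for k in (2, 10, len(y)):
    print(k, kfold_error(X, y, k))
```

Because all genes are used here, there is no feature-selection step inside the folds. When feature selection is added, it must be repeated within each training fold for the error estimate to stay honest.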