• Title/Abstract/Keywords: Genomic prediction

Review of statistical methods for survival analysis using genomic data

  • Lee, Seungyeoun;Lim, Heeju
    • Genomics & Informatics / Vol. 17, No. 4 / pp.41.1-41.12 / 2019
  • Survival analysis mainly deals with the time to an event, such as death, onset of disease, or bankruptcy. A defining characteristic of survival data is that they contain "censored" observations, for which the time to event is not completely observed; the recorded time is instead a lower bound on the time to event. For each subject, only the earlier of the event time and the censoring time is observed. Many traditional statistical methods have been used effectively to analyze survival data with censored observations. However, with the development of high-throughput technologies for producing "omics" data, more advanced statistical methods, such as regularization, are required to construct predictive survival models from high-dimensional genomic data. Furthermore, machine learning approaches have been adapted for survival analysis to fit nonlinear and complex interaction effects between predictors and to achieve more accurate prediction of individual survival probability. Because most clinicians and medical researchers can now easily access statistical programs for analyzing survival data, a review article is helpful for understanding the statistical methods used in survival analysis. We review traditional survival methods and regularization methods with various penalty functions for the analysis of high-dimensional genomic data, and describe machine learning techniques that have been adapted to survival analysis.
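As an illustration of how a traditional method handles censoring, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. The abstract does not say which traditional methods the review covers in detail, so the choice of estimator, the function name `kaplan_meier`, and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time where an event occurred.

    times  -- observed time per subject (event time or censoring time)
    events -- 1 if the event was observed, 0 if the subject was censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)  # every subject leaves the risk set at its observed time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d > 0:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        # Censored subjects only shrink the risk set; they do not change S(t).
        at_risk -= exits[t]
    return curve


# Toy data, made up for illustration: six subjects, two of them censored.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Censored subjects contribute to the risk set up to their censoring time and then drop out without triggering a step in the curve, which is how the estimator uses the lower-bound information described above.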

Genome-association analysis of Korean Holstein milk traits using genomic estimated breeding value

  • Shin, Donghyun;Lee, Chul;Park, Kyoung-Do;Kim, Heebal;Cho, Kwang-hyeon
    • Asian-Australasian Journal of Animal Sciences / Vol. 30, No. 3 / pp.309-319 / 2017
  • Objective: Holsteins are known as the world's highest milk-producing dairy cattle. The purpose of this study was to identify genetic regions strongly associated with milk traits (milk production, fat, and protein) using Korean Holstein data. Methods: This study was performed using single nucleotide polymorphism (SNP) chip data (Illumina BovineSNP50 BeadChip) from 911 Korean Holstein individuals. We inferred the genomic estimated breeding value of each individual based on best linear unbiased prediction (BLUP) and ridge regression using BLUPF90 and R. We then performed a genome-wide association study and identified genetic regions related to milk traits. Results: We identified 9, 6, and 17 significant genetic regions related to milk production, fat, and protein, respectively. These genes are newly reported as associated with milk traits in Holsteins. Conclusion: This study complements recent Holstein genome-wide association studies that identified other SNPs and genes as the most significant variants. These results will help to expand knowledge of the polygenic nature of milk production in Holsteins.

In Silico Signature Prediction Modeling in Cytolethal Distending Toxin-Producing Escherichia coli Strains

  • Javadi, Maryam;Oloomi, Mana;Bouzari, Saeid
    • Genomics & Informatics / Vol. 15, No. 2 / pp.69-80 / 2017
  • In this study, the genomes of cytolethal distending toxin (CDT)-producing isolates were compared with the genomes of pathogenic and commensal Escherichia coli strains. Conserved genomic signatures among different types of CDT-producing E. coli strains were assessed. It was shown that they could be used as biomarkers for research purposes and for clinical diagnosis by polymerase chain reaction, or in vaccine development. cdt genes and several other genetic biomarkers were identified as signature sequences in CDT-producing strains. The identified signatures include several individual phage proteins (holins, nucleases, terminases, and transferases) and multiple members of different protein families (the lambda family, phage-integrase family, phage-tail tape protein family, putative membrane proteins, regulatory proteins, restriction-modification system proteins, tail fiber-assembly proteins, base plate-assembly proteins, and other prophage tail-related proteins). A sporadic phylogenetic pattern was demonstrated in the CDT-producing strains. In conclusion, conserved signature proteins in a wide range of pathogenic bacterial strains can potentially be used in modern vaccine-design strategies.

Trimming conditions for DADA2 analysis in QIIME2 platform

  • Lee, Seo-Young;Yu, Yeuni;Chung, Jin;Na, Hee Sam
    • International Journal of Oral Biology / Vol. 46, No. 3 / pp.146-153 / 2021
  • Accurate identification of microbes facilitates the prediction, prevention, and treatment of human diseases. To increase the accuracy of microbiome data analysis, a long region of the 16S rRNA gene is commonly sequenced via paired-end sequencing. In paired-end sequencing, an overlapping region of sufficient length is required for effective joining of the reads, and high-quality sequence is needed across that overlap. Trimming each read distal to the point where sequencing quality drops below a specific threshold enhances the joining process. In this study, we examined the effect of trimming conditions on the number of reads that remained after quality control and chimera removal in Illumina paired-end reads of the V3-V4 hypervariable region. We also examined the alpha diversity and the taxa assigned under each trimming condition. Optimum quality trimming increased the number of good reads and yielded a greater number of operational taxonomic units. The pre-analysis trimming step has a great influence on downstream microbiome analysis, and optimized trimming conditions should be applied for Divisive Amplicon Denoising Algorithm 2 (DADA2) analysis on the QIIME2 platform.
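The trade-off described above, between trimming low-quality tails and keeping enough overlap to join the reads, can be sketched in a few lines. This is only an illustrative sketch and not the procedure used in the study: the helper names, the quality threshold of 25, the amplicon length, and the per-position quality values are all hypothetical.

```python
def truncation_position(mean_quality, threshold):
    """Return how many leading bases to keep.

    This is the index of the first position whose mean quality falls below
    the threshold, or the full read length if quality never drops that low.
    """
    for pos, q in enumerate(mean_quality):
        if q < threshold:
            return pos
    return len(mean_quality)


def overlap_after_trimming(trunc_fwd, trunc_rev, amplicon_len):
    """Bases shared by the forward and reverse reads once both are truncated."""
    return trunc_fwd + trunc_rev - amplicon_len


# Hypothetical per-position mean Phred scores for very short toy reads.
fwd_quality = [36, 36, 35, 34, 33, 30, 27, 22, 18]
rev_quality = [35, 34, 32, 29, 24, 19, 15, 12, 10]

trunc_f = truncation_position(fwd_quality, threshold=25)
trunc_r = truncation_position(rev_quality, threshold=25)
overlap = overlap_after_trimming(trunc_f, trunc_r, amplicon_len=10)
print(trunc_f, trunc_r, overlap)
```

In this toy case the reverse read degrades earlier than the forward read, so a strict threshold leaves very little overlap; a real analysis would have to balance the threshold against the minimum overlap needed for joining.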

The Prediction of the Expected Current Selection Coefficient of Single Nucleotide Polymorphism Associated with Holstein Milk Yield, Fat and Protein Contents

  • Lee, Young-Sup;Shin, Donghyun;Lee, Wonseok;Taye, Mengistie;Cho, Kwanghyun;Park, Kyoung-Do;Kim, Heebal
    • Asian-Australasian Journal of Animal Sciences / Vol. 29, No. 1 / pp.36-42 / 2016
  • Milk-related traits (milk yield, fat, and protein) have been crucial to the selection of Holsteins, so it is essential to identify current selection trends in the breed. Despite this, previous studies have not attempted to uncover these trends. We suggest a new formula to detect current selection trends based on single nucleotide polymorphisms (SNPs). The formula is based on best linear unbiased prediction (BLUP) and Fisher's fundamental theorem of natural selection, both of which are trait-dependent. Fisher's theorem links the additive genetic variance to the selection coefficient. For Holstein milk production traits, we estimated the additive genetic variance using SNP effects from BLUP, and estimated selection coefficients from the genetic variance to search for highly selective SNPs. Through these processes, we identified significantly selective SNPs. The number of genes containing highly selective SNPs with p-value <0.01 (nearly the top 1% of SNPs) in all traits and p-value <0.001 (nearly the top 0.1%) in any trait was 14. They are phosphodiesterase 4B (PDE4B), serine/threonine kinase 40 (STK40), collagen, type XI, alpha 1 (COL11A1), ephrin-A1 (EFNA1), netrin 4 (NTN4), neuron specific gene family member 1 (NSG1), estrogen receptor 1 (ESR1), neurexin 3 (NRXN3), spectrin, beta, non-erythrocytic 1 (SPTBN1), ADP-ribosylation factor interacting protein 1 (ARFIP1), mutL homolog 1 (MLH1), transmembrane channel-like 7 (TMC7), carboxypeptidase X, member 2 (CPXM2), and ADAM metallopeptidase domain 12 (ADAM12). These genes may be important for future artificial selection trends. We also found that the SNP effect predicted from BLUP was the key factor determining the expected current selection coefficient of a SNP. Under Hardy-Weinberg equilibrium of SNP markers in the current generation, the selection coefficient is equivalent to $2 \times$ the SNP effect.
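The two quantities named in the abstract can be written out as a small sketch. The additive variance uses the standard single-locus expression under Hardy-Weinberg equilibrium, $2p(1-p)a^2$, which is an assumption here because the abstract does not give the paper's exact formula; the selection coefficient follows the abstract's closing statement. Function names and the example values are hypothetical.

```python
def additive_variance(p, a):
    """Additive genetic variance contributed by one SNP under Hardy-Weinberg
    equilibrium, for allele frequency p and allele substitution effect a.
    Standard single-locus expression; assumed, not taken from the paper."""
    return 2.0 * p * (1.0 - p) * a ** 2


def expected_selection_coefficient(a):
    """Expected current selection coefficient as stated in the abstract:
    twice the SNP effect (on the scale used for the BLUP effect)."""
    return 2.0 * a


# Hypothetical SNP: allele frequency 0.5, BLUP effect 0.2 (arbitrary units).
p, a = 0.5, 0.2
var_a = additive_variance(p, a)
s = expected_selection_coefficient(a)
print(f"additive variance = {var_a:.4f}, selection coefficient = {s:.2f}")
```

Note that the allele frequency enters only the variance, while the selection coefficient in this form depends on the SNP effect alone, consistent with the abstract's remark that the BLUP effect is the key factor.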

Functional analysis of SH3 domain containing ring finger 2 during the myogenic differentiation of quail myoblast cells

  • Kim, Si Won;Lee, Jeong Hyo;Park, Tae Sub
    • Asian-Australasian Journal of Animal Sciences / Vol. 30, No. 8 / pp.1183-1189 / 2017
  • Objective: Owing to the public availability of complete genome sequences, including avian species, massive bioinformatics analyses may be conducted for computational gene prediction and the identification of gene regulatory networks through various informatics tools. However, to evaluate the biofunctional activity of a predicted target gene, in vivo and in vitro functional genomic analyses should be a prerequisite. Methods: Due to a lack of quail genomic sequence information, we first identified the partial genomic structure and sequences of the quail SH3 domain containing ring finger 2 (SH3RF2) gene. Subsequently, SH3RF2 was knocked out using clustered regularly interspaced short palindromic repeat/Cas9 technology and single cell-derived SH3RF2 mutant sublines were established to study the biofunctional activity of SH3RF2 in quail myoblast (QM7) cells during muscle differentiation. Results: Through a T7 endonuclease I assay and genotyping analysis, we established an SH3RF2 knockout (KO) QM7#4 subline with 61 and 155 nucleotide deletion mutations in SH3RF2. After the induction of myotube differentiation, the expression profiles were analyzed and compared between regular QM7 and SH3RF2 KO QM7#4 cells by global RNA sequencing and bioinformatics analysis. Conclusion: We did not detect any statistically significant role of SH3RF2 during myotube differentiation in QM7 myoblast cells. However, additional experiments are necessary to examine the biofunctional activity of SH3RF2 in cell proliferation and muscle growth.

The effectiveness of genomic selection for milk production traits of Holstein dairy cattle

  • Lee, Yun-Mi;Dang, Chang-Gwon;Alam, Mohammad Z.;Kim, You-Sam;Cho, Kwang-Hyeon;Park, Kyung-Do;Kim, Jong-Joo
    • Asian-Australasian Journal of Animal Sciences / Vol. 33, No. 3 / pp.382-389 / 2020
  • Objective: This study was conducted to test the efficiency of genomic selection for milk production traits in a Korean Holstein cattle population. Methods: A total of 506,481 milk production records from 293,855 animals (2,090 head with single nucleotide polymorphism information) were used to estimate breeding values by single-step best linear unbiased prediction. Results: The heritability estimates for milk, fat, and protein yields in the first parity were 0.28, 0.26, and 0.23, respectively. As parity increased, the heritability decreased for all milk production traits. The estimated generation intervals of sires for the production of bulls (LSB) and for the production of cows (LSC) were 7.9 and 8.1 years, respectively, and the estimated generation intervals of dams for the production of bulls (LDB) and cows (LDC) were 4.9 and 4.2 years, respectively. In the overall data set, the reliability of the genomic estimated breeding value (GEBV) increased by 9% on average over that of the estimated breeding value (EBV), by 7% in cows with test records, by about 4% in bulls with progeny records, and by 13% in heifers without test records. The difference in reliability between GEBV and EBV was especially large for young bulls, i.e. 17% on average for milk (39% vs 22%), fat (39% vs 22%), and protein (37% vs 22%) yields. When animals were selected for milk yield using GEBV, the genetic gain increased by about 7.1% over the gain with EBV in cows with test records and by 2.9% in bulls with progeny records, while it increased by about 24.2% in heifers without test records and by 35% in young bulls without progeny records. Conclusion: Greater genetic gains can be expected through the use of GEBV than EBV, and genomic selection was more effective in the selection of young bulls and heifers without test records.

Evaluation of Genome Based Estimated Breeding Values for Meat Quality in a Berkshire Population Using High Density Single Nucleotide Polymorphism Chips

  • Baby, S.;Hyeong, K.E.;Lee, Y.M.;Jung, J.H.;Oh, D.Y.;Nam, K.C.;Kim, T.H.;Lee, H.K.;Kim, Jong-Joo
    • Asian-Australasian Journal of Animal Sciences / Vol. 27, No. 11 / pp.1540-1547 / 2014
  • The accuracy of genomic estimated breeding values (GEBV) was evaluated for sixteen meat quality traits in a Berkshire population (n = 1,191) collected from the Dasan breeding farm, Namwon, Korea. The animals were genotyped with the Illumina porcine 62 K single nucleotide polymorphism (SNP) bead chip, from which a set of 36,605 SNPs was available after quality control tests. Two methods were applied to evaluate GEBV accuracies, i.e. genomic best linear unbiased prediction (GBLUP) and Bayes B, using ASREML 3.0 and Gensel 4.0 software, respectively. The traits comprised different sets of training (both genotypes and phenotypes) and testing (genotypes only) data. Under the GBLUP model, the GEBV accuracies for the training data ranged from $0.42{\pm}0.08$ for collagen to $0.75{\pm}0.02$ for water holding capacity, with an average of $0.65{\pm}0.04$ across all the traits. Under the Bayes B model, the GEBV accuracy ranged from $0.10{\pm}0.14$ for National Pork Producers Council (NPPC) marbling score to $0.76{\pm}0.04$ for drip loss, with an average of $0.49{\pm}0.10$. For the testing samples, the GEBV accuracy had an average of $0.46{\pm}0.10$ under the GBLUP model, ranging from $0.20{\pm}0.18$ for protein to $0.65{\pm}0.06$ for drip loss. Under the Bayes B model, the GEBV accuracy ranged from $0.04{\pm}0.09$ for NPPC marbling score to $0.72{\pm}0.05$ for drip loss, with an average of $0.38{\pm}0.13$. The GEBV accuracy increased with the size of the training data and with heritability. In general, the GEBV accuracies under the Bayes B model were lower than under the GBLUP model, especially when the training sample size was small. Our results suggest that a much greater training sample size is needed to obtain better GEBV accuracies for the testing samples.
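GBLUP replaces the pedigree relationship matrix with a genomic relationship matrix built from SNP genotypes. The abstract does not say how the matrix was constructed in ASREML, so the sketch below assumes the widely used VanRaden-style construction, with genotypes coded as 0/1/2 allele counts; the function name and the toy genotypes are hypothetical.

```python
def genomic_relationship_matrix(genotypes):
    """Build a VanRaden-style G matrix from an animals x SNPs list of lists
    holding 0/1/2 counts of one allele. Assumes at least one polymorphic SNP,
    otherwise the scaling factor would be zero."""
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    # Frequency of the counted allele at each SNP.
    freqs = [sum(row[j] for row in genotypes) / (2.0 * n_animals)
             for j in range(n_snps)]
    # Centre each genotype by its expected value 2p under random mating.
    centred = [[row[j] - 2.0 * freqs[j] for j in range(n_snps)]
               for row in genotypes]
    # Scale so that G is comparable to a pedigree relationship matrix.
    scale = 2.0 * sum(p * (1.0 - p) for p in freqs)
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in centred]
            for zi in centred]


# Toy genotypes, made up for illustration: 4 animals x 4 SNPs.
genotypes = [
    [0, 1, 2, 1],
    [1, 1, 2, 0],
    [2, 0, 1, 1],
    [1, 2, 0, 2],
]
G = genomic_relationship_matrix(genotypes)
for row in G:
    print(" ".join(f"{value:6.3f}" for value in row))
```

Because allele frequencies are estimated from the same animals, each row of G sums to zero in this construction. With few animals the entries are noisy, which is one intuition for why accuracy in the abstract grows with training sample size.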

Estimation of genetic correlations and genomic prediction accuracy for reproductive and carcass traits in Hanwoo cows

  • Md Azizul Haque;Asif Iqbal;Mohammad Zahangir Alam;Yun-Mi Lee;Jae-Jung Ha;Jong-Joo Kim
    • Journal of Animal Science and Technology / Vol. 66, No. 4 / pp.682-701 / 2024
  • This study estimated the heritabilities ($h^2$) and the genetic and phenotypic correlations between reproductive traits, including calving interval (CI), age at first calving (AFC), gestation length (GL), and number of artificial inseminations per conception (NAIPC), and carcass traits, including carcass weight (CWT), eye muscle area (EMA), backfat thickness (BF), and marbling score (MS), in Korean Hanwoo cows. In addition, the accuracy of genomic predictions of breeding values was evaluated by applying the genomic best linear unbiased prediction (GBLUP) and weighted GBLUP (WGBLUP) methods. The phenotypic data for reproductive and carcass traits were collected from 1,544 Hanwoo cows, and all animals were genotyped using the Illumina Bovine 50K single nucleotide polymorphism (SNP) chip. The genetic parameters were estimated with a multi-trait animal model using the MTG2 program. The estimated $h^2$ for CI, AFC, GL, NAIPC, CWT, EMA, BF, and MS were 0.10, 0.13, 0.17, 0.11, 0.37, 0.35, 0.27, and 0.45, respectively, according to the GBLUP model. The GBLUP accuracy estimates ranged from 0.51 to 0.74, while the WGBLUP accuracy estimates for the traits under study ranged from 0.51 to 0.79. Strong and favorable genetic correlations were observed between GL and NAIPC (0.61), CWT and EMA (0.60), NAIPC and CWT (0.49), AFC and CWT (0.48), CI and GL (0.36), BF and MS (0.35), NAIPC and EMA (0.35), CI and BF (0.30), EMA and MS (0.28), CI and AFC (0.26), AFC and EMA (0.24), and AFC and BF (0.21). The present study identified low to moderate positive genetic correlations between reproductive and CWT traits, suggesting that a heavier body weight may lead to a longer CI, AFC, GL, and NAIPC. The moderately positive genetic correlations of CWT with AFC and NAIPC, together with phenotypic correlations of nearly zero, suggest that genotype-environment interactions are more likely to be responsible for the phenotypic manifestation of these traits. As a result, the inclusion of these traits by breeders as selection criteria may present a good opportunity for developing a selection index to increase the response to selection and the identification of candidate animals, which can result in significantly increased profitability of production systems.

Comparative Evaluation of Intron Prediction Methods and Detection of Plant Genome Annotation Using Intron Length Distributions

  • Yang, Long;Cho, Hwan-Gue
    • Genomics & Informatics / Vol. 10, No. 1 / pp.58-64 / 2012
  • Intron prediction is an important problem in constantly updated genome annotation. Using two model plant genomes (rice and $Arabidopsis$), we compared two well-known intron prediction tools: the BLAST-Like Alignment Tool (BLAT) and Sim4cc. The results showed that each tool had its own advantages and disadvantages. BLAT predicted more than 99% of all genomic introns with a small number of false-positive introns. Sim4cc was successful at finding the correct introns, with a false-negative rate of 1.02% to 4.85%, but it needed a longer run time than BLAT. Further, we evaluated the intron information of 10 complete plant genomes. As non-coding sequences, introns are not constrained in length by the triplet codon frame, so intron lengths fall into three phases: a multiple of three bases (3n), a multiple of three bases plus one (3n + 1), and a multiple of three bases plus two (3n + 2). It is widely accepted that the percentages of 3n, 3n + 1, and 3n + 2 introns are quite similar within a genome. Our study showed that in 80% (8/10) of the species the three phases were similar in number. The percentage of 3n introns in $Ostreococcus$ $lucimarinus$ was excessive (47.7%), while in $Ostreococcus$ $tauri$ it was deficient (29.1%). This discrepancy could have been the result of errors in intron prediction. We suggest that a three-phase evaluation is a fast and effective method of detecting intron annotation problems.
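The three-phase evaluation amounts to tallying intron lengths modulo 3 and checking whether any class departs noticeably from one third. The following is a minimal sketch of that idea; the function names, the tolerance of 0.04, and the toy length lists are hypothetical and are not the criteria used in the paper.

```python
from collections import Counter


def intron_phase_fractions(lengths):
    """Fraction of introns whose length is 3n, 3n + 1, or 3n + 2,
    keyed by length modulo 3 (0, 1, 2)."""
    counts = Counter(length % 3 for length in lengths)
    total = len(lengths)
    return {phase: counts.get(phase, 0) / total for phase in (0, 1, 2)}


def flag_annotation(lengths, tolerance=0.04):
    """Return True when any length class deviates from 1/3 by more than the
    tolerance, hinting at a possible annotation problem. The default
    tolerance is an arbitrary illustrative value."""
    fractions = intron_phase_fractions(lengths)
    return any(abs(f - 1.0 / 3.0) > tolerance for f in fractions.values())


# Toy intron lengths in bases, made up for illustration.
balanced = [90, 91, 92, 120, 121, 122]
skewed = [90, 93, 96, 99, 100, 101]

print(intron_phase_fractions(balanced))
print(intron_phase_fractions(skewed))
print(flag_annotation(balanced), flag_annotation(skewed))
```

The check needs only a list of intron lengths, so it can be run on any annotation without realigning sequences, which is what makes this kind of evaluation fast.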