• Title/Abstract/Keywords: genomic data

A novel homozygous mutation in SZT2 gene in Saudi family with developmental delay, macrocephaly and epilepsy

  • Naseer, Muhammad Imran;Alwasiyah, Mohammad Khalid;Abdulkareem, Angham Abdulrahman;Bajammal, Rayan Abdullah;Trujillo, Carlos;Abu-Elmagd, Muhammad;Jafri, Mohammad Alam;Chaudhary, Adeel G.;Al-Qahtani, Mohammad H.
    • Genes and Genomics
    • /
    • Vol. 40, No. 11
    • /
    • pp.1149-1155
    • /
    • 2018
  • Epileptic encephalopathies are genetically heterogeneous disorders that lead to epilepsy and cause neurological disorders. The Seizure threshold 2 (SZT2) gene, located on chromosome 1p34.2, encodes a protein expressed predominantly in the parietal and frontal cortex of the brain and in the dorsal root ganglia. Previous studies in mice showed that mutations in this gene can confer a low seizure threshold and enhance epileptogenesis; in humans, they may lead to facial dysmorphism, intellectual disability, seizures, and macrocephaly. The objective of this study was to identify a novel gene or a novel mutation associated with this phenotype. We identified a large consanguineous Saudi family segregating developmental delay, intellectual disability, epilepsy, high forehead, and macrocephaly. Whole exome sequencing was performed on the affected siblings, and the candidate variant found in the data analysis was validated by Sanger sequencing. Our results revealed a novel homozygous mutation (c.9368G>A) in exon 67 of the SZT2 gene that substitutes glutamic acid for a conserved glycine residue. The mutation was absent in 100 unrelated healthy controls. This missense variant has not yet been reported as pathogenic in the literature or in variant databases. In conclusion, the homozygous SZT2 variant detected here may be the causative mutation that explains the epilepsy and developmental delay in this Saudi family.
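
As a worked check on the variant notation, the arithmetic below maps the coding position c.9368 to a codon. It is a minimal sketch that assumes standard HGVS coding numbering (c.1 is the A of the ATG start codon) and a contiguous coding sequence; the abstract gives no protein-level (p.) position, so the codon number produced here is derived arithmetic, not a figure reported by the authors. The codon table is a deliberately tiny subset of the standard genetic code, and the function names are hypothetical.

```python
# Minimal slice of the standard genetic code: the four Gly codons plus the
# codons they can become after a G>A change at codon position 2.
CODON_TO_AA = {
    "GGT": "Gly", "GGC": "Gly", "GGA": "Gly", "GGG": "Gly",
    "GAT": "Asp", "GAC": "Asp", "GAA": "Glu", "GAG": "Glu",
}

def codon_of(c_pos):
    """Return (codon number, position within codon 1-3) for an HGVS coding
    position, assuming c.1 is the A of the ATG start codon."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

def apply_snv(codon, pos_in_codon, ref, alt):
    """Substitute alt for ref at a 1-based position inside a codon."""
    i = pos_in_codon - 1
    if codon[i] != ref:
        raise ValueError(f"expected {ref} at codon position {pos_in_codon}, found {codon[i]}")
    return codon[:i] + alt + codon[i + 1:]

codon_number, position = codon_of(9368)  # c.9368G>A
print("codon", codon_number, "position", position)
for ref_codon in ("GGT", "GGC", "GGA", "GGG"):
    alt_codon = apply_snv(ref_codon, position, "G", "A")
    print(ref_codon, "->", alt_codon, CODON_TO_AA[alt_codon])
```

Under these assumptions, c.9368 falls at position 2 of codon 3123, and a G>A change there turns a glycine codon into glutamic acid only if the reference codon is GGA or GGG (GGT and GGC would give aspartic acid).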

A study of the genomic estimated breeding value and accuracy using genotypes in Hanwoo steer (Korean cattle)

  • Eun Ho, Kim;Du Won, Sun;Ho Chan, Kang;Ji Yeong, Kim;Cheol Hyun, Myung;Doo Ho, Lee;Seung Hwan, Lee;Hyun Tae, Lim
    • Korean Journal of Agricultural Science
    • /
    • Vol. 48, No. 4
    • /
    • pp.681-691
    • /
    • 2021
  • The estimated breeding value (EBV) of Hanwoo steers (Korean cattle) and its accuracy are indicators that can predict future slaughter timing and carcass performance. Recently, studies using both pedigrees and genotypes have been actively conducted to improve the accuracy of EBVs. In this study, the pedigrees and genotypes of 46 steers from livestock farm A in Gyeongnam were used in pedigree best linear unbiased prediction (PBLUP) and genomic best linear unbiased prediction (GBLUP) to estimate and analyze the breeding values and accuracies for carcass weight (CWT), eye muscle area (EMA), back-fat thickness (BFT), and marbling score (MS). PBLUP estimated the EBVs and their accuracies using the pedigrees and phenotypes of the 46 steers and reference population I (545,483 head), with a numerator relationship matrix (NRM) constructed from the pedigree. GBLUP estimated genomic EBVs (GEBVs) and their accuracies using the genotypes and phenotypes of the 46 steers and reference population II (16,972 head), with a genomic relationship matrix (GRM) constructed from the genotypes. For CWT, EMA, BFT, and MS, respectively, the accuracies were 0.531, 0.519, 0.524, and 0.530 with PBLUP and 0.799, 0.779, 0.768, and 0.810 with GBLUP. The accuracy estimated by GBLUP was 50.1 - 53.1% higher than that estimated by PBLUP. GEBVs estimated from genotypes are therefore expected to be more accurate than EBVs calculated from pedigree alone, and to serve as basic data for genomic selection in the future.
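
The abstract contrasts a pedigree-based numerator relationship matrix (used by PBLUP) with a genotype-based genomic relationship matrix (used by GBLUP). The sketch below builds each on toy data, using the classic tabular method for the NRM and VanRaden's (2008) method 1 for the GRM. The study does not say which GRM formulation or software it used, so these are illustrative assumptions, and the function names are hypothetical.

```python
import numpy as np

def numerator_relationship_matrix(sires, dams):
    """Tabular method. Animals are indexed 0..n-1 with parents listed before
    their offspring; -1 marks an unknown parent."""
    n = len(sires)
    A = np.zeros((n, n))
    for i in range(n):
        s, d = sires[i], dams[i]
        # Diagonal: 1 + inbreeding coefficient, i.e. 1 + 0.5 * a(sire, dam)
        A[i, i] = 1.0 + (0.5 * A[s, d] if s >= 0 and d >= 0 else 0.0)
        for j in range(i):
            # Off-diagonal: average relationship of animal j with i's parents
            a = 0.0
            if s >= 0:
                a += 0.5 * A[j, s]
            if d >= 0:
                a += 0.5 * A[j, d]
            A[i, j] = A[j, i] = a
    return A

def vanraden_grm(M):
    """VanRaden (2008) method 1. M is animals x SNPs, genotypes coded 0/1/2."""
    M = np.asarray(M, dtype=float)
    p = M.mean(axis=0) / 2.0   # allele frequency per SNP, estimated from this sample
    Z = M - 2.0 * p            # center each SNP column
    return Z @ Z.T / (2.0 * np.sum(p * (1.0 - p)))

# Toy pedigree: 0 and 1 are founders; 2 = 0 x 1; 3 = 0 x 2 (an inbred mating).
A = numerator_relationship_matrix(sires=[-1, -1, 0, 0], dams=[-1, -1, 1, 2])
# Toy genotypes for three animals at three SNPs.
G = vanraden_grm([[0, 1, 2], [2, 1, 0], [1, 1, 1]])
print(A)
print(G)
```

In the toy pedigree, animal 3 is the offspring of animal 0 and animal 0's daughter, so its diagonal element is 1.25 (an inbreeding coefficient of 0.25). Estimating allele frequencies from the genotyped sample itself, as here, is one common choice; other centering choices change the scale of G.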

Literature and Genomic Narrative: Richard Powers' The Book of Life

  • 송태정
    • English Language and Literature
    • /
    • Vol. 53, No. 2
    • /
    • pp.243-260
    • /
    • 2007
  • Through the new concepts of biocriticism and bioliterature, this article explores how Richard Powers' interdisciplinary novel The Gold Bug Variations connects literature/art with science/technology. Powers draws on Edgar Allan Poe's "The Gold Bug" and Johann Sebastian Bach's "Goldberg Variations" to decode DNA as a genomic metaphor. Given the complexity and multiplicity of the genome, he imagines literature as the genome's "book of life," written in DNA code. As a "genomic narrative," his novel articulates genomic reading and its expression in the language of life through the discourses of information technology and the rhetorical tropes of biology. New biological ideas are continually required to articulate these processes. In the current direction of the Human Genome Project, advanced tools such as biocybernetics could open new possibilities for researching the complexity of the genome. This can happen only if two conditions are met: first, adopting advanced technologies for processing the rapidly growing volume of genome sequence data; second, accepting the necessary paradigm shift in biology. The genomic reality, in its complexity and multiplicity, is far from simple. We must go beyond determinism, even if advanced biotechnology makes it possible to represent a biological reality in terms of its constituent elements. Amid the unstoppable advances in the art of decoding the genome, The Gold Bug Variations engages interdisciplinary approaches through rhetorical tropes that unfold the complex discursive world of the genome. Powers shows that the complex mechanisms of the genome in the microworld of every cell can be designed and written in the language of DNA, as the plot of "the book of life."
At the same time, his genomic reading and writing trace the historical shift in the center of new genomic development and its polysemous interpretation.

Genomic Selection for Adjacent Genetic Markers of Yorkshire Pigs Using Regularized Regression Approaches

  • Park, Minsu;Kim, Tae-Hun;Cho, Eun-Seok;Kim, Heebal;Oh, Hee-Seok
    • Asian-Australasian Journal of Animal Sciences
    • /
    • Vol. 27, No. 12
    • /
    • pp.1678-1683
    • /
    • 2014
  • This study considers the problem of genomic selection (GS) for adjacent genetic markers in Yorkshire pigs, which are typically correlated. GS has been widely used to efficiently estimate target variables, such as molecular breeding values, from markers across the entire genome. Recently, GS has been applied to animals as well as plants, notably to pigs. For efficient selection of variables associated with specific traits in pig breeding, any such variable selection method should have the following properties: i) it produces a simple model by identifying insignificant variables; ii) it improves the accuracy of predictions on future data; and iii) it can handle high-dimensional data in which the number of variables exceeds the number of observations. In this paper, we applied several variable selection methods, including the least absolute shrinkage and selection operator (LASSO), the fused LASSO, and the elastic net, to data comprising 47K single nucleotide polymorphisms (SNPs) and litter sizes for 519 sows. In our experiments, the fused LASSO outperformed the other approaches.
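
The three penalties compared in the abstract can be sketched in NumPy. The solver below is a plain proximal-gradient (ISTA) loop for the LASSO and the elastic net, not the software used in the study; the fused LASSO is shown only through its penalty, because solving it requires a more involved algorithm. The data are synthetic, with more markers than animals to mirror the p > n setting, and all names are hypothetical.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * ||.||_1, applied elementwise
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def elastic_net_ista(X, y, lam1, lam2=0.0, n_iter=5000):
    """Minimize (1/(2n))||y - Xb||^2 + lam1*||b||_1 + (lam2/2)*||b||^2
    by ISTA (proximal gradient). lam2 = 0 gives the plain LASSO."""
    n, p = X.shape
    b = np.zeros(p)
    # Step size 1/L, where L is the Lipschitz constant of the smooth part's gradient
    L = np.linalg.norm(X, 2) ** 2 / n + lam2
    for _ in range(n_iter):
        grad = X.T @ (X @ b - y) / n + lam2 * b
        b = soft_threshold(b - grad / L, lam1 / L)
    return b

def fused_lasso_penalty(b, lam1, lam2):
    # lam1 * sum_j |b_j| + lam2 * sum_j |b_j - b_{j-1}|; the second term pulls
    # adjacent (ordered, correlated) markers toward a common effect.
    return lam1 * np.sum(np.abs(b)) + lam2 * np.sum(np.abs(np.diff(b)))

# Synthetic data only: 100 "animals", 200 "markers" (p > n), and one block
# of five adjacent causal markers.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 200))
beta = np.zeros(200)
beta[10:15] = 2.0
y = X @ beta + 0.1 * rng.standard_normal(100)

b_lasso = elastic_net_ista(X, y, lam1=0.1)
b_enet = elastic_net_ista(X, y, lam1=0.1, lam2=0.1)
print("LASSO nonzeros:", np.count_nonzero(b_lasso))
print("elastic net nonzeros:", np.count_nonzero(b_enet))
```

The fused LASSO's second term charges a block of equal adjacent effects only at its two edges, which is the property aimed at correlated neighboring markers.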

Computational analysis of SARS-CoV-2, SARS-CoV, and MERS-CoV genome using MEGA

  • Sohpal, Vipan Kumar
    • Genomics & Informatics
    • /
    • Vol. 18, No. 3
    • /
    • pp.30.1-30.7
    • /
    • 2020
  • The novel coronavirus pandemic originated in China and spread throughout the world within three months. The genomes of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), its predecessor severe acute respiratory syndrome coronavirus (SARS-CoV), and Middle East respiratory syndrome coronavirus (MERS-CoV) play an important role in understanding genetic variation. In this paper, genomic data from the National Center for Biotechnology Information (NCBI) were accessed through Molecular Evolutionary Genetics Analysis (MEGA) for statistical analysis. First, the Bayesian information criterion (BIC) and the corrected Akaike information criterion (AICc) were used to identify the best-fitting substitution model. Second, the maximum likelihood method was used to estimate the transition/transversion bias (R) under the Kimura 2-parameter, Tamura 3-parameter, Hasegawa-Kishino-Yano, and Tamura-Nei nucleotide substitution models. Finally, nucleotide frequencies were computed from the NCBI genomic data. The results indicate that the general time reversible (GTR) model had the lowest BIC and AICc scores, 347,394 and 347,287, respectively. The transition/transversion bias under the nucleotide substitution models ranged from 0.56 to 0.59 in the MEGA output. The average frequencies of the bases U, C, A, and G were 31.74%, 19.48%, 28.04%, and 20.74%, respectively. Overall, the genomic data analysis highlights the close genetic relationship among SARS-CoV-2, SARS-CoV, and MERS-CoV.
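
The base-frequency and transition/transversion quantities in the abstract can be illustrated with a short counting sketch. MEGA estimates R by maximum likelihood; the function below instead computes the simpler pairwise Kimura 2-parameter distance and a transition-to-transversion rate ratio from two already aligned sequences, so its numbers are not expected to match MEGA's output. The function names and example sequences are hypothetical.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "U"}
BASES = "ACGU"

def normalise(seq):
    # These are RNA viruses, so report T (as stored in DNA-style records) as U.
    return seq.upper().replace("T", "U")

def base_frequencies(seq):
    """Percentage of U, C, A and G among unambiguous bases."""
    counts = Counter(b for b in normalise(seq) if b in BASES)
    total = sum(counts.values())
    return {b: 100.0 * counts[b] / total for b in "UCAG"}

def kimura_2p(seq1, seq2):
    """Kimura 2-parameter distance between two aligned sequences.
    Gaps and ambiguous bases are skipped."""
    ts = tv = sites = 0
    for a, b in zip(normalise(seq1), normalise(seq2)):
        if a not in BASES or b not in BASES:
            continue
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # transition: A<->G or C<->U
        else:
            tv += 1  # transversion: purine <-> pyrimidine
    P, Q = ts / sites, tv / sites
    # math.log raises ValueError if the sequences are too divergent (saturation).
    s = -0.5 * math.log(1 - 2 * P - Q) + 0.25 * math.log(1 - 2 * Q)  # transitions/site
    v = -0.5 * math.log(1 - 2 * Q)                                   # transversions/site
    return {"ts": ts, "tv": tv, "d": s + v, "s_over_v": s / v if v > 0 else math.inf}

print(base_frequencies("AACG"))
print(kimura_2p("ACGTACGT", "GCGTACTT"))
```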

Optimization of parameters in segmentation of large-scale spatial data sets

  • 오미라;이현주
    • Proceedings of the IEEK Conference
    • /
    • IEEK 2008 Summer Conference
    • /
    • pp.897-898
    • /
    • 2008
  • Array comparative genomic hybridization (aCGH) has been used to detect chromosomal regions of amplification or deletion, which allows the identification of new cancer-related genes. Because aCGH data, a type of large-scale spatial data, contain a significant amount of noise in raw form, segmenting genomic DNA regions to detect the true underlying copy number aberrations (CNAs) has been an important research issue. In this study, we focus on applying a segmentation method to multiple data sets. We compare two different threshold values for analyzing aCGH data with the circular binary segmentation (CBS) method [1]. The proposed thresholds are a p-value and $Q \pm 1.5\,\mathrm{IQR}$.
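
The two ingredients in the abstract, a CBS-style segmentation and an IQR-based threshold, can be sketched as follows. `best_arc` computes only the maximal statistic of a single circular binary segmentation step (unit noise variance assumed); full CBS also assesses significance by permutation and recurses on the resulting segments. Reading "Q ± 1.5IQR" as Tukey's fences (Q1 - 1.5 IQR and Q3 + 1.5 IQR) applied to segment means is an assumption, as are all function names.

```python
import numpy as np

def best_arc(x):
    """One CBS-style step: find the arc x[i:j] whose mean differs most from
    the mean of the remaining points. Returns (statistic, i, j)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    S = np.concatenate(([0.0], np.cumsum(x)))  # S[k] = sum(x[:k])
    best = (-1.0, 0, 0)
    for i in range(n):
        for j in range(i + 1, n + 1):
            k = j - i
            if k == n:
                continue  # the whole sequence has no complement
            inside = (S[j] - S[i]) / k
            outside = (S[n] - S[j] + S[i]) / (n - k)
            z = abs(inside - outside) / np.sqrt(1.0 / k + 1.0 / (n - k))
            if z > best[0]:
                best = (z, i, j)
    return best

def tukey_fences(values, k=1.5):
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def call_cna(segment_means, k=1.5):
    # Label each segment mean as a loss, gain, or normal copy number.
    lo, hi = tukey_fences(segment_means, k)
    return ["loss" if m < lo else "gain" if m > hi else "normal" for m in segment_means]

# Noise-free toy profile with a single amplified block at positions 20-29.
profile = [0.0] * 20 + [3.0] * 10 + [0.0] * 20
print(best_arc(profile))
print(call_cna([0.0, 0.1, -0.1, 0.05, 2.0]))
```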

Application of genotyping-by-sequencing (GBS) in plant genome using bioinformatics pipeline

  • Lee, Yun Gyeong;Kang, Chon-Sik;Kim, Changsoo
    • Proceedings of the Korean Society of Crop Science Conference
    • /
    • Korean Society of Crop Science 2017: 9th Asian Crop Science Association Conference
    • /
    • pp.58-58
    • /
    • 2017
  • The advent of next-generation sequencing (NGS) technology has made abundant sequencing data available for agriculturally relevant plant species. For most crop species, however, obtaining whole-genome sequence data with sufficient coverage is too expensive, so many approaches have been developed to reduce the cost of NGS. Genotyping-by-sequencing (GBS) is a cost-effective genotyping method for complex genetic populations. GBS can be used for genomic selection (GS), genome-wide association studies (GWAS), and the construction of haplotype and genetic linkage maps in a variety of plant species. For processing plant GBS data efficiently, the TASSEL-GBS pipeline is one of the most popular choices among researchers. TASSEL-GBS is a Java-based software package for obtaining genotype data from raw GBS sequences. Here, we describe the application of GBS and the TASSEL-GBS bioinformatics pipeline to the analysis of plant genetic data.
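
An early step in GBS processing is splitting raw reads by sample barcode and keeping only reads that begin with the restriction-enzyme cut-site remnant. The Python sketch below is not TASSEL-GBS code: it assumes the ApeKI enzyme (recognition site G^CWGC, which leaves a CWGC remnant), a hypothetical two-sample barcode key, and a hypothetical fixed tag length of 64 bp.

```python
import re
from collections import Counter, defaultdict

# Hypothetical key file: barcode -> sample name.
BARCODES = {"ACGT": "sample_1", "TGCAGA": "sample_2"}
# ApeKI cuts G^CWGC, so a valid insert starts with the CWGC remnant (W = A or T).
CUT_REMNANT = re.compile(r"C[AT]GC")

def parse_read(read, barcodes=BARCODES, tag_len=64):
    """Return (sample, tag) for a read that starts with a known barcode
    followed by the cut-site remnant, else None."""
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    for bc in sorted(barcodes, key=len, reverse=True):
        if read.startswith(bc):
            insert = read[len(bc):]
            if CUT_REMNANT.match(insert):
                return barcodes[bc], insert[:tag_len]
            return None  # barcode found but no cut-site remnant
    return None

def count_tags(reads, **kwargs):
    # Per-sample counts of identical tags, the raw material for genotype calls.
    counts = defaultdict(Counter)
    for read in reads:
        hit = parse_read(read, **kwargs)
        if hit:
            sample, tag = hit
            counts[sample][tag] += 1
    return counts

reads = ["ACGT" + "CAGC" + "A" * 70, "ACGT" + "CAGC" + "A" * 70, "ACGTGGGG" + "A" * 10]
print({s: dict(c) for s, c in count_tags(reads).items()})
```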

Use of Graph Database for the Integration of Heterogeneous Biological Data

  • Yoon, Byoung-Ha;Kim, Seon-Kyu;Kim, Seon-Young
    • Genomics & Informatics
    • /
    • Vol. 15, No. 1
    • /
    • pp.19-27
    • /
    • 2017
  • Understanding complex relationships among heterogeneous biological data is one of the fundamental goals of biology. In most cases, diverse biological data are stored in relational databases, such as MySQL and Oracle, which store data in multiple tables and infer relationships through multiple join statements. Recently, a new type of database, the graph database, was developed to represent various kinds of complex relationships natively, and it is widely used in the computer science community and the IT industry. Here, we demonstrate the feasibility of using a graph database for complex biological relationships by comparing the performance of MySQL and Neo4j, one of the most widely used graph databases. We collected various biological data (protein-protein interaction, drug-target, gene-disease, etc.) from several existing sources, removed duplicate and redundant data, and constructed a graph database containing 114,550 nodes and 82,674,321 relationships. When we tested query execution performance, Neo4j outperformed MySQL in all cases: Neo4j responded very quickly to various queries, whereas MySQL responded slowly or failed to finish complex queries with multiple join statements. These results show that a graph database such as Neo4j is an efficient way to store complex biological relationships. Moreover, querying a graph database in diverse ways has the potential to reveal novel relationships among heterogeneous biological data.
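
The contrast in the abstract, relational joins versus native graph traversal, can be illustrated with a toy in-memory example. It uses Python's built-in sqlite3 as a stand-in for a relational database (not MySQL) and a plain adjacency map as a stand-in for a graph store (not Neo4j); all node names are hypothetical, and the example says nothing about performance at the scale reported in the study.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy edges; real sources would hold millions of relationships.
DRUG_TARGET = [("drug_A", "gene_1"), ("drug_A", "gene_2"), ("drug_B", "gene_3")]
GENE_DISEASE = [("gene_1", "disease_X"), ("gene_2", "disease_Y"),
                ("gene_2", "disease_X"), ("gene_3", "disease_Z")]

def diseases_via_join(drug):
    # Relational style: each hop of the question becomes one JOIN.
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE drug_target (drug TEXT, gene TEXT)")
    con.execute("CREATE TABLE gene_disease (gene TEXT, disease TEXT)")
    con.executemany("INSERT INTO drug_target VALUES (?, ?)", DRUG_TARGET)
    con.executemany("INSERT INTO gene_disease VALUES (?, ?)", GENE_DISEASE)
    rows = con.execute(
        "SELECT DISTINCT gd.disease FROM drug_target AS dt "
        "JOIN gene_disease AS gd ON dt.gene = gd.gene WHERE dt.drug = ?",
        (drug,),
    ).fetchall()
    con.close()
    return {row[0] for row in rows}

def build_graph(*edge_lists):
    # Graph style: store each relationship as an adjacency entry.
    graph = defaultdict(set)
    for edges in edge_lists:
        for src, dst in edges:
            graph[src].add(dst)
    return graph

def diseases_via_traversal(graph, drug):
    # Two hops (drug -> targets -> diseases) by following adjacency, no join.
    return {d for gene in graph.get(drug, ()) for d in graph.get(gene, ())}

graph = build_graph(DRUG_TARGET, GENE_DISEASE)
print(diseases_via_join("drug_A"), diseases_via_traversal(graph, "drug_A"))
```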

An Analysis System for Whole Genomic Sequence Using String B-Tree

  • 최정현;조환규
    • The KIPS Transactions: Part A
    • /
    • 제8A권4호
    • /
    • pp.509-516
    • /
    • 2001
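  • With advances in the life sciences and the results of many genome projects, the genome sequences of many species are being determined. There are several methods for analyzing biological sequences, such as global alignment and local alignment; one of them is k-mer analysis. A k-mer is a contiguous subsequence of length k within a nucleotide sequence, and k-mer analysis examines properties such as the frequency distribution and symmetry of the k-mers in a sequence. However, genome sequences are very large texts, and when k is large they cannot be processed by conventional in-memory algorithms, so efficient data structures and algorithms are required. The string B-tree is a data structure that is well suited to pattern matching and supports external memory. In this paper, we adapt the string B-tree into a structure that is efficient for k-mer analysis and use it to analyze the genome sequences of C. elegans and 30 others. To show the frequency distribution and symmetry of k-mers, we present a visualization system based on chaos game representation (CGR). We call a region of a sequence that is highly similar to the whole genome sequence a signature, and we present an algorithm that finds the minimum-length signature with high similarity.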
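
The k-mer counting and CGR ideas in the abstract can be sketched in a few lines. This version keeps all counts in memory with a `Counter`, which is exactly the approach the paper says does not scale to whole genomes with large k; it illustrates the quantities only, not the string B-tree. Treating "symmetry" as agreement between a k-mer's count and that of its reverse complement is an assumption, the CGR corner assignment is one common convention among several, and the function names are hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
# One common CGR corner convention; other papers permute the corners.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def kmer_counts(seq, k):
    """Count every length-k window made only of A, C, G, T (in memory)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1)
                   if set(seq[i:i + k]) <= set("ACGT"))

def reverse_complement(kmer):
    return kmer.translate(COMPLEMENT)[::-1]

def cgr_points(seq):
    """Chaos game representation: start at the center of the unit square
    and move halfway toward the corner of each successive base."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous bases
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

counts = kmer_counts("ACGTACGT", 2)
print(counts)
print(counts["AC"], counts[reverse_complement("AC")])
print(cgr_points("AC"))
```

For the toy sequence ACGTACGT, AC and its reverse complement GT both occur twice.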

Challenges and New Approaches in Genomics and Bioinformatics

  • Park, Jong Hwa;Han, Kyung Sook
    • Genomics & Informatics
    • /
    • Vol. 1, No. 1
    • /
    • pp.1-6
    • /
    • 2003
  • In conclusion, the seemingly fuzzy and disorganized data of biology, spanning thousands of layers from molecules to the Internet, have so far resisted precise mapping and successful prediction by mathematicians, physicists, and computer scientists. Genomics and bioinformatics are the fields that process such complex data. Insights into the nature of biological entities as complex interaction networks are opening the door to a generalized representation of biological entities. The main challenges for genomics and bioinformatics now lie in 1) how to mine, in terms of interactions, the networks in the domains of bioinformatics, namely the literature, metabolic pathways, and proteomes and structures; and 2) how to generalize these networks so that the information can be integrated into computable genomic data regardless of the layer. Once bioinformaticians succeed in finding a general principle for how components interact with one another to form organic interaction networks at the genomic scale, true simulation and prediction of life in silico will become possible.