• Title/Abstract/Keywords: Genomic analysis

Iterative integrated imputation for missing data and pathway models with applications to breast cancer subtypes

  • Linder, Henry;Zhang, Yuping
    • Communications for Statistical Applications and Methods / Vol. 26, No. 4 / pp.411-430 / 2019
  • Tumor development is driven by complex combinations of biological elements. Recent advances suggest that molecularly distinct subtypes of breast cancers may respond differently to pathway-targeted therapies. Thus, it is important to dissect pathway disturbances by integrating multiple molecular profiles, such as genetic, genomic and epigenomic data. However, missing data are often present in the -omic profiles of interest. Motivated by genomic data integration and imputation, we present a new statistical framework for pathway significance analysis. Specifically, we develop a new strategy for imputation of missing data in large-scale genomic studies, which adapts low-rank, structured matrix completion. Our iterative strategy enables us to impute missing data in complex configurations across multiple data platforms. In turn, we perform large-scale pathway analysis integrating gene expression, copy number, and methylation data. The advantages of the proposed statistical framework are demonstrated through simulations and real applications to breast cancer subtypes. We demonstrate superior power to identify pathway disturbances, compared with other imputation strategies. We also identify differential pathway activity across different breast tumor subtypes.
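For readers unfamiliar with low-rank matrix completion, here is a minimal sketch of the general idea. It is not the authors' iterative integrated imputation method. It fits a single rank-1 factorization u v^T to the observed entries by alternating least squares and uses that fit to fill the missing cells. The function name `rank1_complete`, the rank-1 restriction, the all-ones initialization and the fixed iteration count are illustrative assumptions.

```python
from typing import List, Optional


def rank1_complete(
    x: List[List[Optional[float]]], n_iter: int = 500, eps: float = 1e-12
) -> List[List[float]]:
    """Impute None entries of x from a rank-1 fit u v^T to the observed entries.

    Hypothetical helper (not the paper's method): alternating least squares that
    updates the row factors u with v fixed, then the column factors v with u fixed,
    using only observed cells in each least-squares fit.
    """
    n_rows, n_cols = len(x), len(x[0])
    u = [1.0] * n_rows  # one factor per row (e.g. per gene)
    v = [1.0] * n_cols  # one factor per column (e.g. per sample)
    for _ in range(n_iter):
        # Least-squares update of each row factor over that row's observed cells.
        for i in range(n_rows):
            obs = [j for j in range(n_cols) if x[i][j] is not None]
            den = sum(v[j] ** 2 for j in obs)
            u[i] = sum(x[i][j] * v[j] for j in obs) / den if den > eps else 0.0
        # Least-squares update of each column factor over that column's observed cells.
        for j in range(n_cols):
            obs = [i for i in range(n_rows) if x[i][j] is not None]
            den = sum(u[i] ** 2 for i in obs)
            v[j] = sum(x[i][j] * u[i] for i in obs) / den if den > eps else 0.0
    # Observed values are kept as-is; only missing cells take the low-rank estimate.
    return [
        [x[i][j] if x[i][j] is not None else u[i] * v[j] for j in range(n_cols)]
        for i in range(n_rows)
    ]
```

For example, `rank1_complete([[1.0, 2.0, 3.0], [2.0, 4.0, None], [3.0, None, 9.0]])` fills both gaps with values close to 6.0, because the observed entries are consistent with the outer product of (1, 2, 3) with itself. Real multi-platform data would call for a higher rank and for structure shared across platforms, which this sketch does not model.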

Generation and analysis of whole-genome sequencing data in human mammary epithelial cells

  • Jong-Lyul Park;Jae-Yoon Kim;Seon-Young Kim;Yong Sun Lee
    • Genomics & Informatics / Vol. 21, No. 1 / pp.11.1-11.5 / 2023
  • Breast cancer is the most common cancer worldwide, and advanced breast cancer with metastases remains largely incurable with currently available therapies. It is therefore essential to understand the molecular characteristics of breast carcinogenesis as it progresses. Here, we report a dataset of whole genomes from the human mammary epithelial cell system derived from a reduction mammoplasty specimen. This system comprises pre-stasis 184D cells, considered normal, and seven cell lines along a cancer progression series that are immortalized or have additionally acquired anchorage-independent growth. Our analysis of the whole-genome sequencing (WGS) data indicates that these seven progression-series cell lines carry between 8,393 and 39,564 somatic mutations (30,591 on average) relative to 184D cells. These WGS data and our mutation analysis will provide helpful information for identifying driver mutations and elucidating the molecular mechanisms of breast carcinogenesis.

Comparative Genome-Scale Expression Analysis of Growth Phase-dependent Genes in Wild Type and rpoS Mutant of Escherichia coli

  • Oh, Tae-Jeong;Jung, Il-Lae;Woo, Sook-Kyung;Kim, Myung-Soon;Lee, Sun-Woo;Kim, Keun-Ha;Kim, In-Gyu;An, Sung-Whan
    • The Korean Society for Microbiology and Biotechnology: Conference Proceedings / The Korean Society for Microbiology and Biotechnology 2004 Annual Meeting BioExibition International Symposium / pp.258-265 / 2004
  • Numerous genes of Escherichia coli have been shown to exhibit growth phase-dependent expression. We examined the global patterns of growth phase-dependent gene expression in E. coli throughout growth using oligonucleotide microarrays containing a nearly complete set of 4,289 annotated open reading frames. To determine how gene expression changes throughout growth, we compared RNA sampled over a time course with a common reference RNA prepared by pooling equal amounts of RNA from each time point. Hierarchical clustering of the conditions by their time-course expression grouped the growth phases into four classes, consistent with known physiological growth states. We analyzed genome-wide differences in expression levels in both exponential and stationary growth phase cultures. Statistical analysis showed that 213 genes exhibit growth phase-dependent expression. We also analyzed the expression of 256 known operons and 208 regulatory genes. To assess the global impact of RpoS, we identified 193 genes coregulated with rpoS and examined their expression levels in the isogenic rpoS mutant. The results revealed that 99 of the 193 were novel RpoS-dependent, stationary phase-induced genes, the majority of which are functionally unknown. Our data indicate that global changes and adjustments in gene expression are coordinately regulated by the growth transition in E. coli.

An Analysis System for Whole Genomic Sequence Using String B-Tree

  • 최정현;조환규
    • The KIPS Transactions: Part A / Vol. 8A, No. 4 / pp.509-516 / 2001
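  • With advances in the life sciences and the output of many genome projects, the genome sequences of many species are being determined. Sequences can be analyzed in several ways, including global alignment and local alignment; one such method is k-mer analysis. A k-mer is a contiguous run of k bases within a nucleotide sequence, and k-mer analysis examines properties such as the frequency distribution and symmetry of the k-mers a sequence contains. Because genome sequences are very large texts, conventional in-memory algorithms cannot process them when k is large, so efficient data structures and algorithms are needed. The string B-tree is a data structure that is well suited to pattern matching and supports external memory. In this paper, we adapt the string B-tree into a structure that is efficient for k-mer analysis and use it to analyze 30 genome sequences, including that of C. elegans. To show the frequency distribution and symmetry of k-mers, we present a visualization system based on CGR (Chaos Game Representation). We call a part of a sequence that is highly similar to the genome sequence a signature, and we present an algorithm that finds the minimum-length signature with high similarity.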
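The k-mer counting and CGR visualization described above can be illustrated with a small Python sketch. It works entirely in memory, which is exactly the approach the abstract says does not scale to whole genomes with large k, so it only demonstrates the definitions and not the string B-tree method. Reading "symmetry" as reverse-complement symmetry, the CGR corner assignment, and all function names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed CGR corner convention; other conventions permute the corners.
CORNERS: Dict[str, Tuple[float, float]] = {
    "A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0),
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping length-k substring (in memory; illustration only)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def reverse_complement(kmer: str) -> str:
    """Complement each base, then reverse the string."""
    return kmer.upper().translate(COMPLEMENT)[::-1]


def symmetry_pairs(counts: Counter) -> List[Tuple[str, int, int]]:
    """(canonical k-mer, its count, count of its reverse complement), one row per pair."""
    canonical = {min(kmer, reverse_complement(kmer)) for kmer in counts}
    return [(c, counts[c], counts[reverse_complement(c)]) for c in sorted(canonical)]


def cgr_points(seq: str) -> List[Tuple[float, float]]:
    """Chaos Game Representation: from (0.5, 0.5), move halfway toward each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip N and other ambiguity codes (an assumption)
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points
```

For example, `kmer_counts("ACGTAC", 2)` counts `AC` twice and `CG`, `GT` and `TA` once each. `cgr_points("AC")` yields `[(0.25, 0.25), (0.125, 0.625)]`. In a CGR plot, the points falling inside a sub-square of side 2^-k share the same last k bases, which is why a CGR image displays k-mer frequencies.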

Molecular cloning, sequence polymorphism and genomic organization of far eastern catfish (Silurus asotus) GH gene

  • Park, Byul-Nim;Bang, In-Chul;Kim, Dong-Soo;Nam, Yoon-Kwon
    • Korean Aquaculture Society: Conference Proceedings / Korean Aquaculture Society 2003 Fall Conference Abstracts / p.42 / 2003
  • The far eastern catfish (Silurus asotus) growth hormone (GH) gene was cloned and characterized. The complete nucleotide sequences of the genomic GH gene and of a catfish GH cDNA were obtained by RT-PCR and gene filter screening. From the start codon to the polyadenylation signal, the GH cDNA spans 1.0 kb and the genomic gene 1.8 kb. Sequence polymorphisms, including various silent mutations, were detected in both the cDNA and the genomic GH genes. The genomic GH gene comprises only four exons and three introns, a novel type of fish GH gene structure. The evolutionary relationships of the catfish GH gene were inferred from a comparative phylogenetic analysis of gene structures and sequences.

  • PDF

Molecular Cloning and Expression of Genes Related to Antifungal Activities from Enterobacter sp. B54 Antagonistic to Phytophthora capsici

  • YOON, SANG-HONG
    • Journal of Microbiology and Biotechnology / Vol. 9, No. 3 / pp.352-357 / 1999
  • Enterobacter sp. B54 inhibited growth of the fungus Phytophthora capsici on potato dextrose agar (PDA). Three mutants (M54-47, M54-113, and M54-329) in which antifungal activity was lost or increased through P1::Tn5 lac mutagenesis were used to isolate the genes responsible for fungal inhibition on PDA. Two clones were selected from a partially EcoRI-digested genomic library of the wild-type strain by probing with the genomic sequences flanking the insertion in each mutant. We isolated a 20-kb EcoRI genomic DNA fragment from this strain that contains genes involved in inhibiting hyphal growth of P. capsici on PDA. Subcloning and expression analysis of this fragment identified an 8-kb region that was necessary for antifungal activity. An 8-kb HindIII DNA fragment covers the three genomic loci where Tn5 lac had inserted in the mutants. This suggests that all the genes related to antifungal activity may be clustered in a simple arrangement spanning at least 5-8 kb.

Statistical Issues in Genomic Cohort Studies

  • 박소희
    • Journal of Preventive Medicine and Public Health / Vol. 40, No. 2 / pp.108-113 / 2007
  • When conducting large-scale cohort studies, numerous statistical issues arise across study design, data collection, data analysis and interpretation. In genomic cohort studies these problems become more complicated and need to be dealt with carefully. Rapid technical advances in genomic studies produce enormous amounts of data, and traditional statistical methods are no longer sufficient to handle them. In this paper, we review several important statistical issues that occur frequently in large-scale genomic cohort studies, including measurement error and methods for correcting it, cost-efficient design strategies for the main cohort and validation studies, inflated Type I error, gene-gene and gene-environment interactions, and time-varying hazard ratios. Employing appropriate statistical methods is essential to make the best use of valuable cohort data and to produce valid and reliable results.
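A standard response to the inflated Type I error mentioned above is to adjust p-values for the number of tests performed. The sketch below shows two common adjustments: Bonferroni, which controls the family-wise error rate, and Benjamini-Hochberg, which controls the false discovery rate. The abstract does not say which corrections the review recommends, so the choice of these two and the function names are assumptions.

```python
from typing import List


def bonferroni(pvals: List[float]) -> List[float]:
    """Family-wise error control: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """False discovery rate control: BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices by ascending p-value
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With p-values `[0.01, 0.04, 0.03, 0.20]`, Bonferroni gives about `[0.04, 0.16, 0.12, 0.80]`, while Benjamini-Hochberg gives about `[0.04, 0.053, 0.053, 0.20]`. This illustrates why FDR control is usually less conservative when thousands of genetic markers are tested.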

Design of Distributed Cloud System for Managing large-scale Genomic Data

  • Seine Jang;Seok-Jae Moon
    • International Journal of Internet, Broadcasting and Communication / Vol. 16, No. 2 / pp.119-126 / 2024
  • The volume of genomic data is constantly increasing in various modern industries and research fields. This growth presents new challenges and opportunities in terms of the quantity and diversity of genetic data. In this paper, we propose a distributed cloud system for integrating and managing large-scale gene databases. By introducing a distributed data storage and processing system based on the Hadoop Distributed File System (HDFS), various formats and sizes of genomic data can be efficiently integrated. Furthermore, by leveraging Spark on YARN, efficient management of distributed cloud computing tasks and optimal resource allocation are achieved. This establishes a foundation for the rapid processing and analysis of large-scale genomic data. Additionally, by utilizing BigQuery ML, machine learning models are developed to support genetic search and prediction, enabling researchers to more effectively utilize data. It is expected that this will contribute to driving innovative advancements in genetic research and applications.

Genomic aspects in reproductive medicine

  • Minyeon Go;Sung Han Shim
    • Clinical and Experimental Reproductive Medicine / Vol. 51, No. 2 / pp.91-101 / 2024
  • Infertility is a complex disease characterized by extreme genetic heterogeneity, compounded by various environmental factors. While there are exceptions, individual genetic and genomic variations related to infertility are typically rare, often family-specific, and may serve as susceptibility factors rather than direct causes of the disease. Consequently, identifying the cause of infertility and developing prevention and treatment strategies based on these factors remain challenging tasks, even in the modern genomic era. In this review, we first examine the genetic and genomic variations associated with infertility, and subsequently summarize the concepts and methods of preimplantation genetic testing in light of advances in genome analysis technology.

Genotypic and Phenotypic Diversity of PGPR Fluorescent Pseudomonads Isolated from the Rhizosphere of Sugarcane (Saccharum officinarum L.)

  • Rameshkumar, Neelamegam;Ayyadurai, Niraikulam;Kayalvizhi, Nagarajan;Gunasekaran, Paramsamy
    • Journal of Microbiology and Biotechnology / Vol. 22, No. 1 / pp.13-24 / 2012
  • The genetic diversity of plant growth-promoting rhizobacterial (PGPR) fluorescent pseudomonads associated with the sugarcane (Saccharum officinarum L.) rhizosphere was analyzed. Selected isolates were screened for plant growth-promoting properties, including production of indole acetic acid, phosphate solubilization, denitrification ability, and production of antifungal metabolites. Furthermore, 16S rDNA sequence analysis was performed to identify and differentiate these isolates. Based on 16S rDNA sequence similarity, the isolates were designated as Pseudomonas plecoglossicida, P. fluorescens, P. libaniensis, and P. aeruginosa. Differentiation of isolates belonging to the same group was achieved through different genomic DNA fingerprinting techniques, including randomly amplified polymorphic DNA (RAPD), amplified ribosomal DNA restriction analysis (ARDRA), repetitive extragenic palindromic (REP), enterobacterial repetitive intergenic consensus (ERIC), and bacterial repetitive BOX elements (BOX) analyses. The genetic diversity observed among the isolates and the rep-PCR-generated fingerprinting patterns revealed that PGPR fluorescent pseudomonads are associated with the rhizosphere of sugarcane and that P. plecoglossicida is a dominant species. The knowledge obtained herein regarding the genetic and functional diversity of fluorescent pseudomonads associated with the sugarcane rhizosphere is useful for understanding their ecological role and potential utilization in sustainable agriculture.