• Title/Summary/Keyword: Biological Data

Data Mining Techniques for Analyzing Promoter Sequences (프로모터 염기서열 분석을 위한 데이터 마이닝 기법)

  • 김정자;이도헌
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2000.10a
    • /
    • pp.328-332
    • /
    • 2000
  • As the Genome Project has made DNA sequences known, techniques for handling molecular-level gene information are being actively researched. Given the vast volume of known sequence data, there is also an urgent need for new computer algorithms to build databases and analyze them efficiently. This paper therefore studies association rule search algorithms for finding characteristics revealed by the association between promoter sequences and genes, one of the important research areas in molecular biology. Whereas previous search algorithms used transaction data, this paper treats biological data, so we design a transformed association rule algorithm that accommodates its data types and biological properties. These results will help reduce the time and cost of biological experiments by minimizing the number of candidates.
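A minimal sketch of the support and confidence measures that underlie association rule search, applied to sequence-derived items instead of transaction data. The abstract does not describe the transformed algorithm itself, so the motif names, gene labels, records, and thresholds below are all hypothetical and only illustrate the general technique.

```python
from itertools import combinations

# Hypothetical records: each promoter is reduced to a set of items
# (motifs found in the sequence plus a label for the associated gene).
records = [
    {"TATA", "CAAT", "gene:heat_shock"},
    {"TATA", "GC_box", "gene:housekeeping"},
    {"TATA", "CAAT", "gene:heat_shock"},
    {"GC_box", "gene:housekeeping"},
]

def support(itemset, records):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)

def confidence(antecedent, consequent, records):
    """Estimated P(consequent | antecedent) over the records."""
    both = set(antecedent) | set(consequent)
    return support(both, records) / support(antecedent, records)

def rules(records, min_support=0.5, min_confidence=0.8):
    """Enumerate single-item -> single-item rules passing both thresholds."""
    items = sorted(set().union(*records))
    found = []
    for a, b in combinations(items, 2):
        for lhs, rhs in ((a, b), (b, a)):
            s = support({lhs, rhs}, records)
            if s >= min_support and confidence({lhs}, {rhs}, records) >= min_confidence:
                found.append((lhs, rhs, s))
    return found
```

Calling `rules(records)` returns `(antecedent, consequent, support)` triples. A practical miner would prune candidate itemsets level by level, as Apriori does, rather than enumerate every pair.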

Phylogenomics and its Growing Impact on Algal Phylogeny and Evolution

  • Reyes-Prieto, Adrian;Yoon, Hwan-Su;Bhattacharya, Debashish
    • ALGAE
    • /
    • v.21 no.1
    • /
    • pp.1-10
    • /
    • 2006
  • Genomic data are accumulating in public databases at an unprecedented rate. Although presently dominated by the sequences of metazoan, plant, parasitic, and picoeukaryotic taxa, both expressed sequence tag (EST) data and complete genomes of free-living algae are also slowly appearing. This wealth of information offers the opportunity to clarify many long-standing issues in algal and plant evolution, such as the contribution of the plastid endosymbiont to nuclear genome evolution, using the tools of comparative genomics and multi-gene phylogenetics. A particularly powerful approach for the automated analysis of genome data from multiple taxa is termed phylogenomics. Phylogenomics is the convergence of genomics science (the study of the function and structure of genes and genomes) and molecular phylogenetics (the study of the hierarchical evolutionary relationships among organisms, their genes and genomes). The use of phylogenetics to drive comparative genome analyses has facilitated the reconstruction of the evolutionary history of genes, gene families, and organisms. Here we survey the available genome data, introduce phylogenomic pipelines, and review some initial results of phylogenomic analyses of algal genome data.

Estimation for Seaweed Biomass Using Regression: A Methodological Approach (회귀분석을 이용한 해조류 생물량 측정을 위한 방법론)

  • Ko, Young-Wook;Sung, Gun-Hee;Kim, Jeong-Ha
    • ALGAE
    • /
    • v.23 no.4
    • /
    • pp.289-294
    • /
    • 2008
  • To estimate seaweed biomass or standing crop, nondestructive sampling is beneficial because it spares most living plants and saves time in fieldwork. We suggest a methodological procedure to estimate seaweed biomass per unit area in marine benthic habitats using species-specific regression equations. For most species, percent cover data from field sampling are required so that they can be converted to weight data. For tall macroalgae such as kelps, however, density data and the size of individuals (e.g., size class for subtidal kelps) are needed. We propose that field sampling be done with 5 replicates of a 50 cm x 50 cm quadrat at three intertidal zones (upper, middle, lower) and three subtidal depths (1, 5, 10 m). To obtain a reliable regression equation for a species, a substantial number of replicates from destructive sampling is necessary. The regression equation for a species can be further specified by locality and season, especially for species whose morphology varies temporally and spatially. An example estimation carried out in Onpyung, Jeju Island, Korea is provided to compare estimated values with real weight data.
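The cover-to-weight conversion described above can be sketched as follows. This assumes a simple linear equation fitted by ordinary least squares; the abstract does not state the form of the regression, and every number here is invented for illustration and is not data from the study.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

# Hypothetical destructive samples for one species:
# percent cover in a quadrat and the weight harvested from it.
cover_pct = [10, 20, 40, 60, 80]
dry_weight_g = [5, 10, 20, 30, 40]

a, b = fit_line(cover_pct, dry_weight_g)

QUADRAT_AREA_M2 = 0.5 * 0.5  # 50 cm x 50 cm quadrat, as proposed in the abstract

def biomass_per_m2(field_cover_pct):
    """Mean predicted weight over replicate quadrats, scaled to 1 m^2."""
    predicted = [a + b * c for c in field_cover_pct]
    return sum(predicted) / len(predicted) / QUADRAT_AREA_M2

# Five nondestructive replicates from one zone (hypothetical cover readings).
estimate = biomass_per_m2([20, 30, 40, 50, 60])
```

For kelps, the same pattern would use density and size class as predictors in place of percent cover.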

Data Mining Techniques for Analyzing Promoter Sequences (프로모터 염기서열 분석을 위한 데이터 마이닝 기법)

  • 김정자;이도헌
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.4 no.4
    • /
    • pp.739-744
    • /
    • 2000
  • As the Genome Project has made DNA sequences known, techniques for handling molecular-level gene information are being actively researched. Given the vast volume of known sequence data, there is also an urgent need for new computer algorithms to build databases and analyze them efficiently. This paper therefore studies association rule search algorithms for finding characteristics revealed by the association between promoter sequences and genes, one of the important research areas in molecular biology. Whereas previous search algorithms used transaction data, this paper treats biological data, so we design a transformed association rule algorithm that accommodates its data types and biological properties. These results will help reduce the time and cost of biological experiments by minimizing the number of candidates.

Precision nutrition: approach for understanding intra-individual biological variation (정밀영양: 개인 간 대사 다양성을 이해하기 위한 접근)

  • Kim, Yangha
    • Journal of Nutrition and Health
    • /
    • v.55 no.1
    • /
    • pp.1-9
    • /
    • 2022
  • In the past few decades, great progress has been made in understanding the interaction between nutrition and health status. Despite this wealth of knowledge, health problems related to nutrition continue to increase. This leads us to postulate that the continuing trend may result from a lack of consideration for intra-individual biological variation in dietary responses. Precision nutrition utilizes personal information such as age, gender, lifestyle, diet intake, environmental exposure, genetic variants, microbiome, and epigenetics to provide better dietary advice and interventions. Recent technological advances in artificial intelligence, big data analytics, cloud computing, and machine learning have made it possible to process data on a scale and in ways that were previously impossible. A big data platform is built by collecting numerous parameters such as meal features, medical metadata, lifestyle variation, genome diversity, and microbiome composition. Sophisticated techniques based on machine learning algorithms can be used to integrate and interpret multiple factors and provide dietary guidance at a personalized or stratified level. The development of a suitable machine learning algorithm would make it possible to suggest a personalized diet or functional food based on analysis of intra-individual metabolic variation. This novel precision nutrition might become one of the most exciting and promising approaches to improving health conditions, especially in the context of non-communicable disease prevention.

Discovery to Human Disease Research: Proteo-Metabolomics Analysis

  • Minjoong Joo;Jeong-Hun Mok;Van-An Duong;Jong-Moon Park;Hookeun Lee
    • Mass Spectrometry Letters
    • /
    • v.15 no.2
    • /
    • pp.69-78
    • /
    • 2024
  • The advancement of high-throughput omics technologies and systems biology is essential for understanding complex biological mechanisms and diseases. The integration of proteomics and metabolomics provides comprehensive insights into cellular functions and disease pathology, driven by developments in mass spectrometry (MS) technologies, including electrospray ionization (ESI). These advancements are crucial for interpreting biological systems effectively. However, integrating these technologies poses challenges. Compared to genomics, proteomics and metabolomics have limitations in throughput and data integration. This review examines developments in MS equipped with electrospray ionization (ESI) and their importance in the effective interpretation of biological mechanisms. The review also discusses developments in sample preparation, such as Simultaneous Metabolite, Protein, Lipid Extraction (SIMPLEX), analytical techniques, and data analysis, highlighting the application of these technologies in the study of cancer and Huntington's disease and underscoring the potential for personalized medicine and diagnostic accuracy. Efforts by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and integrative data analysis methods such as O2PLS and OnPLS extract statistical similarities between metabolomic and proteomic data. System modeling techniques that mathematically explain and predict system responses are also covered. This practical application also shows significant improvements in cancer research, diagnostic accuracy, and therapeutic targeting for diseases like pancreatic ductal adenocarcinoma, non-small cell lung cancer, and Huntington's disease. These approaches enable researchers to develop standardized protocols and interoperable software and databases, expanding the application of multi-omics research in clinical practice.

Development of Integrated Retrieval System of the Biology Sequence Database Using Web Service (웹 서비스를 이용한 바이오 서열 정보 데이터베이스 및 통합 검색 시스템 개발)

  • Lee, Su-Jung;Yong, Hwan-Seung
    • The KIPS Transactions:PartD
    • /
    • v.11D no.4
    • /
    • pp.755-764
    • /
    • 2004
  • Recently, the rapid development of biotechnology has brought an explosion of biological data and biological data hosts. Moreover, these data are highly distributed and heterogeneous, reflecting the distribution and heterogeneity of the molecular biology research community. As a consequence, the integration and interoperability of molecular biology databases are issues of considerable importance. Until now, however, most integrated systems, such as link-based and data-warehouse-based systems, have had difficulty keeping data up to date when the schema and data of a data source change. For this reason, integrated systems using web service technology, which allow biological data to be fully exploited, have been proposed. In this paper, we built an integrated system for bio sequence information based on web service technology. The developed system lets users retrieve data in many formats, such as BSML, GenBank, and FASTA, while traversing disparate data resources. It also has better retrieval performance because the retrieval modules for the external databases run in parallel.
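A minimal sketch of running retrieval against several sources in parallel, the design choice the abstract credits for better performance. The paper's actual web service interfaces are not given, so the source names, accession IDs, and the `query_source` function are hypothetical in-memory stand-ins for remote calls.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for remote sequence databases.
SOURCES = {
    "source_a": {"X001": "ATGGCC", "X002": "TTGACA"},
    "source_b": {"Y100": "ATGGCC", "Y101": "GGGCCC"},
}

def query_source(name, motif):
    """Stand-in for one web service call: IDs whose sequence contains motif."""
    hits = [acc for acc, seq in SOURCES[name].items() if motif in seq]
    return name, sorted(hits)

def federated_search(motif):
    """Send the same query to every source concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(query_source, name, motif) for name in SOURCES]
        return dict(f.result() for f in futures)

results = federated_search("ATGG")
```

With real network calls, total latency approaches that of the slowest source instead of the sum over all sources.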

A Checklist of the Basidiomycetous Macrofungi and a Record of Five New Species from Mt. Oseo in Korea

  • Lee, Won Dong;Lee, Hyun;Fong, Jonathan J.;Oh, Seung-Yoon;Park, Myung Soo;Quan, Ying;Jung, Paul E.;Lim, Young Woon
    • Mycobiology
    • /
    • v.42 no.2
    • /
    • pp.132-139
    • /
    • 2014
  • Basidiomycetous macrofungi play important roles in maintaining forest ecosystems via carbon cycling and the mobilization of nitrogen and phosphorus. To understand the impact of human activity on macrofungi, an ongoing project at the Korea National Arboretum is focused on surveying the macrofungi in unexploited areas. Mt. Oseo was targeted in this survey because the number of visitors to this destination has been steadily increasing, and management and conservation plans for this destination are urgently required. Through 5 field surveys of Mt. Oseo from April to October 2012, 116 specimens of basidiomycetous macrofungi were collected and classified. The specimens were identified to the species level by analyzing their morphological characteristics and their DNA sequence data. A total of 80 species belonging to 57 genera and 25 families were identified. To the best of our knowledge, this is the first study to identify five of these species-Artomyces microsporus, Hymenopellis raphanipes, Pholiota abietis, Phylloporus brunneiceps, and Sirobasidium magnum-in Korea.

Molecular Characterization of the HERV-W Env Gene in Humans and Primates: Expression, FISH, Phylogeny, and Evolution

  • Kim, Heui-Soo;Kim, Dae-Soo;Huh, Jae-Won;Ahn, Kung;Yi, Joo-Mi;Lee, Ja-Rang;Hirai, Hirohisa
    • Molecules and Cells
    • /
    • v.26 no.1
    • /
    • pp.53-60
    • /
    • 2008
  • We characterized the human endogenous retrovirus (HERV-W) family in humans and primates. In silico expression data indicated that 22 complete HERV-W families from human chromosomes 1-3, 5-8, 10-12, 15, 19, and X are randomly expressed in various tissues. Quantitative real-time RT-PCR analysis of the HERV-W env gene derived from human chromosome 7q21.2 indicated predominant expression in the human placenta. Several copies of repeat sequences (SINE, LINE, LTR, simple repeat) were detected within the complete or processed pseudo HERV-W of the human, chimpanzee, and rhesus monkey. Compared to other regions (5'LTR, Gag, Gag-Pol, Env, 3'LTR), the repeat family has been mainly integrated into the region spanning the 5'LTRs of Gag (1398 bp) and Pol (3242 bp). FISH detected the HERV-W probe (fosWE1) derived from a gorilla fosmid library in the metaphase chromosomes of all primates (five hominoids, three Old World monkeys, two New World monkeys, and one prosimian), but not in Tupaia. This finding was supported by molecular clock and phylogeny data using the divergence values of the complete HERV-W LTR elements. The data suggested that the HERV-W family was integrated into the primate genome approximately 63 million years (Myr) ago, and evolved independently during the course of primate radiation.

Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules (BSML 기반 능동 트리거 규칙을 이용한 염기서열정보관리시스템의 구현)

  • Park Sung Hee;Jung Kwang Su;Ryu Keun Ho
    • Journal of KIISE:Databases
    • /
    • v.32 no.1
    • /
    • pp.24-42
    • /
    • 2005
  • Biological data, including genome sequences, are heterogeneous and varied. Although the need for genome sequence management systems that reflect biological characteristics has been raised, most current biological databases provide only restricted functions as repositories for biological data. This paper therefore describes a management system for nucleotide sequences at the level of biological laboratories. It includes format transformation, editing, storage, and retrieval for nucleotide sequences collected from public databases, and handles sequences produced by experiments. It uses BSML, which is based on XML, as a common format in order to extract data fields and convert between heterogeneous sequence formats. To manage sequences and their changes, a version management system for the original DNA is required so as to detect the appearance of newly changed sequences and trigger database updates. Our experimental results show that applying active trigger rules to manage sequence changes can automatically store those changes in databases.
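The idea of an active trigger rule that records sequence changes automatically can be sketched with an ordinary database trigger. This example assumes SQLite through Python's standard `sqlite3` module, not the BSML-based system the paper describes; the table names, columns, and accession `AB000001` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sequences (
    accession TEXT PRIMARY KEY,
    seq       TEXT NOT NULL,
    version   INTEGER NOT NULL DEFAULT 1
);
CREATE TABLE sequence_history (accession TEXT, version INTEGER, seq TEXT);

-- Active rule: when a stored sequence changes, archive the old
-- version and bump the version number without application code.
CREATE TRIGGER archive_old_sequence
AFTER UPDATE OF seq ON sequences
WHEN OLD.seq <> NEW.seq
BEGIN
    INSERT INTO sequence_history VALUES (OLD.accession, OLD.version, OLD.seq);
    UPDATE sequences SET version = OLD.version + 1
    WHERE accession = OLD.accession;
END;
""")

# Store a sequence, then simulate a revised sequence arriving.
conn.execute("INSERT INTO sequences (accession, seq) VALUES ('AB000001', 'ATGC')")
conn.execute("UPDATE sequences SET seq = 'ATGCAA' WHERE accession = 'AB000001'")

history = conn.execute(
    "SELECT accession, version, seq FROM sequence_history").fetchall()
current = conn.execute("SELECT version, seq FROM sequences").fetchone()
```

The `WHEN` clause keeps an update that leaves the sequence unchanged from creating a spurious history row.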