• Title/Summary/Keyword: 염기서열변환

Search Result 28, Processing Time 0.022 seconds

Design and Implementation of an Intelligent Multiple DNA Sequence Translation Tool (지능적 다중염기서열 변환 도구의 설계 및 구현)

  • Lee Hye-Ri;Lee Geon-Myeong;Lee Chan-Hui;Lee Seong-Deok;Kim Seong-Su
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2006.05a
    • /
    • pp.37-40
    • /
    • 2006
  • 계통분석을 하는 생물학자들은 관련된 분석대상에 대한 정보를 확보하여 비교분석하기 위해 NCBI 등으로부터 염기서열을 확보하여 아미노산 서열로 변환하는 작업을 수행하게 된다. 많은 서열 데이터에 대해서 데이터베이스로부터 데이터를 검색하고 이를 변환하는 작업을 순차적으로 분석자가 관여하여 작업하는 것이 현재 분석환경이다. 따라서 본 논문에서는 분석의 효율성을 향상시키기 위해, 관심서열의 등록번호(Accession Number) 리스트를 입력하면 해당 서열에 대항 정보를 NCBI로부터 웹로봇을 통해 자동으로 확보한 다음, 확보된 염기서열 전체를 아미노산 서열로 자동 변환하여 가장 긴 ORF(Open Reading Frame)을 추천해주기 위해 설계된 지능형 다중 염기서열 변환 도구에 대해서 소개한다.

  • PDF

A DNA Sequence Search Algorithm Using Integer Type Transformation (정수형 변환을 이용한 DNA 서열 검색 알고리즘)

  • Yoon, Kyong-Oh;Cho, Sung-Bae
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2012.06b
    • /
    • pp.357-359
    • /
    • 2012
  • 초 고성능 바이오 서열 분석 장비 기술의 발달로 대량의 바이오 정보가 쏟아져 나오고 있으며, 바이오산업의 발달로 개인별 유전체 정보에 의한 맞춤의학의 시대가 도래되고 있다. 수많은 서열에 대한 분석에는 많은 저장장치 및 주기억장치가 필요하므로 슈퍼컴퓨터 급의 서버와 대량의 데이터를 빠르게 처리할 수 있는 프로그램이 필요하다. 이러한 분석에는 염기서열 일치 검색과 이를 기반으로 하는 Alignment와 Assembly 분석이 있으며, 이를 수행하는 기존의 알고리즘 및 대부분의 프로그램들은 염기서열을 문자열로 취급하고, 해쉬 인덱스 테이블, Brujin 그래프의 사용, 버러우즈 휠러 변환(BWT) 등의 기법을 활용하여 효율적인 분석을 도모하였다. 본 논문에서는 염기서열을 문자열이 아닌 k-mer 묶음의 정수형 하나로 변환하여 검색함으로써 저장 공간의 크기를 약 28% 이상으로 줄이고 형 변환 상태에서의 검색을 수행할 수 있는 알고리즘을 제안한다. Assembly 분석 프로그램인 CalcGen 프로그램을 개발하여 본 알고리즘의 효용성 및 효율성을 실험을 통해 검증하였다. 이 연구의 결과는 향후 대량의 유전체 염기서열의 효율적 분석과 저장 및 처리에 또 하나의 새로운 접근 방법을 제안하는데에 그 의미를 둘 수 있다.

Characteristic Signature Extraction using the Base Distribution Substitution Comparison (염기분포와 대치 비교를 이용한 염기서열 집단의 고유 시그너쳐 추출)

  • Hwang, Gyeong-Sun;Lee, Hye-Ri;Lee, Geon-Myeong;Kim, Seong-Su;Lee, Chan-Hui;Lee, Seong-Deok;Yun, Hyeong-U
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2007.11a
    • /
    • pp.419-422
    • /
    • 2007
  • 유전자 변이가 쉽게 일어나는 바이러스 등은 변이 계통에 따라 집단을 형성하게 된다. 이러한 집단들에 대한 분석은 해당 바이러스 집단에 대한 추적, 백신 및 치료약 개발에서 필수적이다. 어떤 집단의 염기 서열의 특성을 효과적으로 표현하는 패턴을 시그너쳐라 하며, 이러한 시그너쳐는 특정 염기서열 집단의 고유한 특성을 나타내면서 다른 집단과 구별되는 정보를 포함하는 것이 바람직하다. 이 논문에서는 가능한 후보 시그너쳐들을 염기분포를 이용하여 생성해가면서, 시그너쳐 해당부위의 염기를 상대 서열집단의 공통 서열의 염기로 변환하여 집단간의 상대거리를 측정함으로써, 후보 시그너쳐에 의한 집단의 고유성질 표현능력과 집단간 차별화 능력을 고려하여 시그너쳐를 추출하는 방법을 제안한다.

  • PDF

Implementation of an Information Management System for Nucleotide Sequences based on BSML using Active Trigger Rules (BSML 기반 능동 트리거 규칙을 이용한 염기서열정보관리시스템의 구현)

  • Park Sung Hee;Jung Kwang Su;Ryu Keun Ho
    • Journal of KIISE:Databases
    • /
    • v.32 no.1
    • /
    • pp.24-42
    • /
    • 2005
  • Characteristics of biological data including genome sequences are heterogeneous and various. Although the need of management systems for genome sequencing which should reflect biological characteristics has been raised, most current biological databases provide restricted function as repositories for biological data. Therefore, this paper describes a management system of nucleotide sequences at the level of biological laboratories. It includes format transformation, editing, storing and retrieval for collected nucleotide sequences from public databases, and handles sequence produced by experiments. It uses BSML based on XML as a common format in order to extract data fields and transfer heterogeneous sequence formats. To manage sequences and their changes, version management system for originated DNA is required so as to detect transformed new sequencing appearance and trigger database update. Our experimental results show that applying active trigger rules to manage changes of sequences can automatically store changes of sequences into databases.

DNA Information Hiding Method for DNA Data Storage (DNA 데이터 저장을 위한 DNA 정보 은닉 기법)

  • Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.10
    • /
    • pp.118-127
    • /
    • 2014
  • DNA data storage refers to any technique for storing massive digital data in base sequence of DNA and has been recognized as the future storage medium recently. This paper presents an information hiding method for DNA data storage that the massive data is hidden in non-coding strand based on DNA steganography. Our method maps the encrypted data to the data base sequence using the numerical mapping table and then hides it in the non-coding strand using the key that consists of the seed and sector length. Therefore, our method can preserve the protein, extract the hidden data without the knowledge of host DNA sequence, and detect the position of mutation error. Experimental results verify that our method has more high data capacity than conventional methods and also detects the positions of mutation errors by the parity bases.

Cloning and Sequencing of Gene Fragment of Acid Proteinase from Penicillium oxalicum HCLF-34 (Penicillium oxalicum HCLF-34로부터 Acid Proteinase의 부분유전자 Cloning 및 Sequencing)

  • 현성희;천재순;강상순;김진규
    • Korean Journal of Microbiology
    • /
    • v.40 no.1
    • /
    • pp.12-16
    • /
    • 2004
  • Acid proteinase has been discovered in Aspergillus niger (acid protease A) and Cryphonectria parasitica (acid proteinase EapC) and it plays major roles in cheese formation from milk. In this study, a partial gene encoding acid proteinase in Penicillium oxalicum HCLF-34 was cloned by using PCR with degenerate primers corresponding to highly conserved regions of the acid proteinase. The partial acid proteinase gene in P. oxalicum HCLF-34 contains an open reading frame of 438 base pairs and encodes an acid proteinase protein of 146 amino aicds. The predicted amino acid sequences showed 71 % homology with acid protease A and 67% homology with EapC.

Least Square Prediction Error Expansion Based Reversible Watermarking for DNA Sequence (최소자승 예측오차 확장 기반 가역성 DNA 워터마킹)

  • Lee, Suk-Hwan;Kwon, Seong-Geun;Kwon, Ki-Ryong
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.52 no.11
    • /
    • pp.66-78
    • /
    • 2015
  • With the development of bio computing technology, DNA watermarking to do as a medium of DNA information has been researched in the latest time. However, DNA information is very important in biologic function unlikely multimedia data. Therefore, the reversible DNA watermarking is required for the host DNA information to be perfectively recovered. This paper presents a reversible DNA watermarking using least square based prediction error expansion for noncodng DNA sequence. Our method has three features. The first thing is to encode the character string (A,T,C,G) of nucleotide bases in noncoding region to integer code values by grouping n nucleotide bases. The second thing is to expand the prediction error based on least square (LS) as much as the expandable bits. The last thing is to prevent the false start codon using the comparison searching of adjacent watermarked code values. Experimental results verified that our method has more high embedding capacity than conventional methods and mean prediction method and also makes the prevention of false start codon and the preservation of amino acids.

Analyzing Financial Data from Banks and Savings Banks: Application of Bioinformatical Methods (은행과 저축은행 관련 재정 지표 분석: 생물 정보학 분석 기법의 응용)

  • Pak, Ro Jin
    • The Korean Journal of Applied Statistics
    • /
    • v.27 no.4
    • /
    • pp.577-588
    • /
    • 2014
  • The collection and storage of a large volumes of data are becoming easier; however, the number of variables is sometimes more than the number of samples(objects). We now face the problem of dependency among variables(such as multicollinearity) due to the increased number of variables. We cannot apply various statistical methods without satisfying independency assumption. In order to overcome such a drawback we consider a categorizing (or discretizing) observations. We have a data set of nancial indices from banks in Korea that contain 78 variables from 16 banks. Genetic sequence data is also a good example of a large data and there have been numerous statistical methods to handle it. We discover lots of useful bank information after we transform bank data into categorical data that resembles genetic sequence data and apply bioinformatic techniques.

Construction of Web-Based Database for Anisakis Research (고래회충 연구를 위한 웹기반 데이터베이스 구축)

  • Lee, Yong-Seok;Baek, Moon-Ki;Jo, Yong-Hun;Kang, Se-Won;Lee, Jae-Bong;Han, Yeon-Soo;Cha, Hee-Jae;Yu, Hak-Sun;Ock, Mee-Sun
    • Journal of Life Science
    • /
    • v.20 no.3
    • /
    • pp.411-415
    • /
    • 2010
  • Anisakis simplex is one of the parasitic nematodes, and has a complex life cycle in crustaceans, fish, squid or whale. When people eat under-processed or raw fish, it causes anisakidosis and also plays a critical role in inducing serious allergic reactions in humans. However, no web-based database on A. simplex at the level of DNA or protein has been so far reported. In this context, we constructed a web-based database for Anisakis research. To build up the web-based database for Anisakis research, we proceeded with the following measures: First, sequences of order Ascaridida were downloaded and translated into the multifasta format which was stored as database for stand-alone BLAST. Second, all of the nucleotide and EST sequences were clustered and assembled. And EST sequences were translated into amino acid sequences for Nuclear Localization Signal prediction. In addition, we added the vector, E. coli, and repeat sequences into the database to confirm a potential contamination. The web-based database gave us several advantages. Only data that agrees with the nucleotide sequences directly related with the order Ascaridida can be found and retrieved when searching BLAST. It is also very convenient to confirm contamination when making the cDNA or genomic library from Anisakis. Furthermore, BLAST results on the Anisakis sequence information can be quickly accessed. Taken together, the Web-based database on A. simplex will be valuable in developing species specific PCR markers and in studying SNP in A. simplex-related researches in the future.

Robust DNA Watermarking based on Coding DNA Sequence (부호 영역 DNA 시퀀스 기반 강인한 DNA 워터마킹)

  • Lee, Suk-Hwan;Kwon, Seong-Geun;Kwon, Ki-Ryong
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.49 no.2
    • /
    • pp.123-133
    • /
    • 2012
  • This paper discuss about DNA watermarking using coding DNA sequence (CDS) for the authentication, the privacy protection, or the prevention of illegal copy and mutation of DNA sequence and propose a DNA watermarking scheme with the mutation robustness and the animo acid preservation. The proposed scheme selects a number of codons at the regular singularity in coding regions for the embedding target and embeds the watermark for watermarked codons and original codons to be transcribed to the same amino acids. DNA base sequence is the string of 4 characters, {A,G,C,T} ({A,G,C,U} in RNA). We design the codon coding table suitable to watermarking signal processing and transform the codon sequence to integer numerical sequence by this table and re-transform this sequence to floating numerical sequence of circular angle. A codon consists of a consecutive of three bases and 64 codons are transcribed to one from 20 amino acids. We substitute the angle of selected codon to one among the angle range with the same animo acid, which is determined by the watermark bit and the angle difference of adjacent codons. From in silico experiment by using HEXA and ANG sequences, we verified that the proposed scheme is more robust to silent and missense mutations than the conventional scheme and preserve the amino acids of the watermarked codons.