• Title/Summary/Keyword: sequence transformation (서열 변환)

Search Result 56, Processing Time 0.03 seconds

A Constraint-based Three-Dimensional Visualization Method of Operational Taxonomic Units for Phylogenetic Analysis (계통발생학적 분석을 위한 분류 단위의 제약조건 기반의 3차원 시각화 기법)

  • Lee Sun-a;Lee Keon Myung
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2005.11a / pp.523-526 / 2005
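  • Phylogenetic analysis compares the similarity of sequences to infer how they are related, so it is very important to represent each relationship visually. Two-dimensional phylogenetic trees are generally used. In a two-dimensional visualization, however, OTUs (Operational Taxonomic Units) that are highly similar to one another can end up placed far apart. To address this, this paper proposes a two-stage coordinate placement method for arranging OTUs in three-dimensional space. The first stage converts Euclidean distances into three-dimensional coordinates. Because the first stage is affected by the order in which the sequences are compared, the second stage applies a genetic algorithm to find more suitable coordinates.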
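The second-stage idea of refining a 3D layout so that layout distances track the target distances can be illustrated with a minimal sketch. It is not the authors' method: a mutation-only evolutionary search stands in for their genetic algorithm, and the names `stress` and `refine`, the distance matrix, and the starting coordinates are all hypothetical.

```python
import math
import random

def stress(coords, dist):
    """Sum of squared differences between layout distances and target distances."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        for j in range(i + 1, n):
            total += (math.dist(coords[i], coords[j]) - dist[i][j]) ** 2
    return total

def refine(coords, dist, generations=200, sigma=0.1, seed=0):
    """Mutation-only evolutionary search (a stand-in for a genetic algorithm):
    perturb every coordinate and keep the candidate only if stress decreases."""
    rng = random.Random(seed)
    best = [tuple(p) for p in coords]
    best_stress = stress(best, dist)
    for _ in range(generations):
        cand = [tuple(x + rng.gauss(0, sigma) for x in p) for p in best]
        s = stress(cand, dist)
        if s < best_stress:
            best, best_stress = cand, s
    return best, best_stress

# Hypothetical pairwise distances among three OTUs and a rough first-stage layout.
dist = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
start = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.5, 0.5, 0.0)]
layout, final_stress = refine(start, dist)
```

Because a candidate is accepted only when it lowers the stress, the result is never worse than the starting layout; a full genetic algorithm would add a population and crossover on top of this.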


A Classification Method for Deformed Words Using Multiple Sequence Alignment (다중서열정렬을 이용한 변형단어집합의 분류 기법)

  • Kim, Sung-Hwan;Cho, Hwan-Gue
    • Proceedings of the Korean Information Science Society Conference / 2012.06b / pp.264-266 / 2012
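  • Handling deformed words on the Internet is useful in many fields, such as information retrieval, machine translation, web mining, and profanity and spam filtering. In particular, for data collection and analysis tasks such as tracking how a word is deformed over time, it is necessary to determine whether a given word belongs to a class made up of a set of deformed words. This paper proposes a transformation method that performs multiple sequence alignment on a set of deformed words belonging to the same class so that the set can be treated as a single representative string, and introduces a method that uses this representation to classify efficiently whether a given word belongs to the class. In experiments, the proposed method achieved a specificity of 89.1% at a sensitivity of 93.4%, showing no loss in classification performance compared with exhaustive comparison while improving classification speed by a factor of 16.5.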
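The idea of treating an aligned word set as one representative string might be sketched as follows. This is an illustration under stated assumptions, not the paper's algorithm: the variants are assumed to be already aligned (with `-` marking a gap), the representative is a per-column set of symbols, and `build_profile` and `profile_distance` are hypothetical names.

```python
def build_profile(aligned):
    """Per-column set of symbols seen in an already aligned word set ('-' is a gap)."""
    return [set(col) for col in zip(*aligned)]

def profile_distance(word, profile):
    """Edit distance between a word and a profile.
    Matching any symbol in a column is free; skipping a column is free
    only if that column contains a gap."""
    m, n = len(word), len(profile)
    # d[i][j]: cheapest way to align word[:i] with profile[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + (0 if "-" in profile[j - 1] else 1)
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            col = profile[j - 1]
            substitute = d[i - 1][j - 1] + (0 if word[i - 1] in col else 1)
            skip_column = d[i][j - 1] + (0 if "-" in col else 1)
            skip_char = d[i - 1][j] + 1
            d[i][j] = min(substitute, skip_column, skip_char)
    return d[m][n]

# Hypothetical aligned variants of one word class.
profile = build_profile(["spam-", "sp4m-", "spamm"])
print(profile_distance("sp4mm", profile))  # 0: every character fits its column
print(profile_distance("ham", profile))    # 2: two columns cannot be matched
```

A word would then be assigned to the class when its distance falls under a chosen threshold, so each query is compared against one profile rather than against every variant.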

DNA Information Hiding Method for DNA Data Storage (DNA 데이터 저장을 위한 DNA 정보 은닉 기법)

  • Lee, Suk-Hwan;Kwon, Ki-Ryong
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.10 / pp.118-127 / 2014
  • DNA data storage refers to any technique for storing massive digital data in the base sequence of DNA, and it has recently been recognized as a future storage medium. This paper presents an information hiding method for DNA data storage in which massive data is hidden in the non-coding strand using DNA steganography. Our method maps the encrypted data to a data base sequence using a numerical mapping table and then hides it in the non-coding strand using a key that consists of a seed and a sector length. Therefore, our method can preserve the protein, extract the hidden data without knowledge of the host DNA sequence, and detect the positions of mutation errors. Experimental results verify that our method has a higher data capacity than conventional methods and detects the positions of mutation errors by means of parity bases.
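As a rough illustration of mapping data to bases with a numerical table, the sketch below packs two bits into each base. The table itself (00 to A, 01 to C, 10 to G, 11 to T) and the function names are assumptions; the abstract does not give the paper's table, and encryption, key handling, and parity bases are left out.

```python
BASES = "ACGT"  # hypothetical mapping table: 00->A, 01->C, 10->G, 11->T

def bytes_to_bases(data: bytes) -> str:
    """Encode each byte as four bases, most significant bit pair first."""
    out = []
    for byte in data:
        for shift in (6, 4, 2, 0):
            out.append(BASES[(byte >> shift) & 0b11])
    return "".join(out)

def bases_to_bytes(seq: str) -> bytes:
    """Invert bytes_to_bases; the sequence length must be a multiple of four."""
    if len(seq) % 4:
        raise ValueError("sequence length must be a multiple of 4")
    out = bytearray()
    for i in range(0, len(seq), 4):
        byte = 0
        for base in seq[i:i + 4]:
            byte = (byte << 2) | BASES.index(base)
        out.append(byte)
    return bytes(out)

print(bytes_to_bases(b"Hi"))  # CAGACGGC
```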

Construction of Web-Based Database for Anisakis Research (고래회충 연구를 위한 웹기반 데이터베이스 구축)

  • Lee, Yong-Seok;Baek, Moon-Ki;Jo, Yong-Hun;Kang, Se-Won;Lee, Jae-Bong;Han, Yeon-Soo;Cha, Hee-Jae;Yu, Hak-Sun;Ock, Mee-Sun
    • Journal of Life Science / v.20 no.3 / pp.411-415 / 2010
  • Anisakis simplex is a parasitic nematode with a complex life cycle involving crustaceans, fish, squid, and whales. When people eat under-processed or raw fish, it causes anisakidosis, and it also plays a critical role in inducing serious allergic reactions in humans. However, no web-based database on A. simplex at the DNA or protein level has been reported so far. In this context, we constructed a web-based database for Anisakis research through the following steps. First, sequences of the order Ascaridida were downloaded and converted into multi-FASTA format, which was stored as a database for stand-alone BLAST. Second, all of the nucleotide and EST sequences were clustered and assembled, and the EST sequences were translated into amino acid sequences for Nuclear Localization Signal prediction. In addition, we added vector, E. coli, and repeat sequences to the database to check for potential contamination. The web-based database offers several advantages. BLAST searches find and retrieve only data that match nucleotide sequences directly related to the order Ascaridida. It is also very convenient for confirming contamination when making a cDNA or genomic library from Anisakis. Furthermore, BLAST results on the Anisakis sequence information can be accessed quickly. Taken together, the web-based database on A. simplex will be valuable for developing species-specific PCR markers and for studying SNPs in future A. simplex-related research.
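The first step, collecting sequences into a single multi-FASTA file before building a BLAST database, can be sketched as below. The record names, the sequences, and the helper `to_multifasta` are made up for illustration; the abstract does not describe the authors' actual conversion script.

```python
def to_multifasta(records, width=60):
    """Format (name, sequence) pairs as multi-FASTA text with wrapped sequence lines."""
    lines = []
    for name, seq in records:
        lines.append(f">{name}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"

# Hypothetical records; real input would be the downloaded Ascaridida sequences.
records = [("seq1", "ACGT"), ("seq2", "GGCC")]
print(to_multifasta(records, width=2), end="")
```

The resulting file is the kind of input a stand-alone BLAST installation indexes into a searchable database.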

Analyzing Financial Data from Banks and Savings Banks: Application of Bioinformatical Methods (은행과 저축은행 관련 재정 지표 분석: 생물 정보학 분석 기법의 응용)

  • Pak, Ro Jin
    • The Korean Journal of Applied Statistics / v.27 no.4 / pp.577-588 / 2014
  • The collection and storage of large volumes of data are becoming easier; however, the number of variables is sometimes greater than the number of samples (objects). We now face the problem of dependency among variables (such as multicollinearity) due to the increased number of variables. Many statistical methods cannot be applied unless the independence assumption is satisfied. To overcome this drawback, we consider categorizing (or discretizing) the observations. We have a data set of financial indices from banks in Korea that contains 78 variables from 16 banks. Genetic sequence data is also a good example of large data, and numerous statistical methods have been developed to handle it. We discover a great deal of useful bank information after transforming the bank data into categorical data that resembles genetic sequence data and applying bioinformatic techniques.
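One way to turn numeric indicators into sequence-like categorical data is equal-frequency binning, sketched here. The abstract does not say which discretization the author used, so the three-letter alphabet, the function `discretize`, and the sample values are assumptions.

```python
def discretize(values, labels="LMH"):
    """Equal-frequency binning: rank the values and map each rank to a letter."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    codes = [""] * len(values)
    for rank, i in enumerate(order):
        codes[i] = labels[rank * len(labels) // len(values)]
    return "".join(codes)

# Hypothetical values of one financial index across six banks.
ratios = [0.12, 0.30, 0.05, 0.22, 0.18, 0.41]
print(discretize(ratios))  # LHLMMH
```

Applying this to every variable and reading the letters bank by bank yields one categorical string per bank, which is the form that sequence comparison tools expect.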

Least Square Prediction Error Expansion Based Reversible Watermarking for DNA Sequence (최소자승 예측오차 확장 기반 가역성 DNA 워터마킹)

  • Lee, Suk-Hwan;Kwon, Seong-Geun;Kwon, Ki-Ryong
    • Journal of the Institute of Electronics and Information Engineers / v.52 no.11 / pp.66-78 / 2015
  • With the development of bio-computing technology, DNA watermarking, which uses DNA information as a medium, has recently been studied. However, unlike multimedia data, DNA information is very important to biological function. Therefore, reversible DNA watermarking is required so that the host DNA information can be perfectly recovered. This paper presents a reversible DNA watermarking method for noncoding DNA sequences that uses least-squares-based prediction error expansion. Our method has three features. The first is to encode the character string (A,T,C,G) of nucleotide bases in the noncoding region into integer code values by grouping n nucleotide bases. The second is to expand the prediction error based on least squares (LS) by as many bits as are expandable. The last is to prevent false start codons by comparing and searching adjacent watermarked code values. Experimental results verified that our method has a higher embedding capacity than conventional methods and the mean prediction method, and that it prevents false start codons and preserves amino acids.
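The first two features can be illustrated with a minimal sketch: grouping n bases into a base-4 integer, then embedding one bit by doubling the prediction error. The base-to-digit table and function names are assumptions, the predicted value is simply passed in rather than computed by least squares, and the range checks and false start codon test are omitted.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # hypothetical base-to-digit table

def encode_groups(seq, n):
    """Read each complete run of n bases as a base-4 integer."""
    values = []
    for i in range(0, len(seq) - len(seq) % n, n):
        v = 0
        for base in seq[i:i + n]:
            v = v * 4 + CODE[base]
        values.append(v)
    return values

def embed_bit(value, predicted, bit):
    """Prediction error expansion: double the error and add the watermark bit."""
    error = value - predicted
    return predicted + 2 * error + bit

def extract_bit(marked, predicted):
    """Recover both the original value and the embedded bit."""
    expanded = marked - predicted
    bit = expanded % 2
    original = predicted + (expanded - bit) // 2
    return original, bit

print(encode_groups("ACGTTG", 2))  # [1, 11, 14]
marked = embed_bit(11, 9, 1)       # 14
print(extract_bit(marked, 9))      # (11, 1)
```

The better the predictor, the smaller the errors and the less each value moves when a bit is embedded, which is the motivation for a least-squares predictor over a plain mean.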

Robust DNA Watermarking based on Coding DNA Sequence (부호 영역 DNA 시퀀스 기반 강인한 DNA 워터마킹)

  • Lee, Suk-Hwan;Kwon, Seong-Geun;Kwon, Ki-Ryong
    • Journal of the Institute of Electronics Engineers of Korea CI / v.49 no.2 / pp.123-133 / 2012
  • This paper discusses DNA watermarking using the coding DNA sequence (CDS) for authentication, privacy protection, or the prevention of illegal copying and mutation of DNA sequences, and proposes a DNA watermarking scheme with mutation robustness and amino acid preservation. The proposed scheme selects a number of codons at the regular singularity in coding regions as the embedding target and embeds the watermark so that the watermarked codons and the original codons are translated into the same amino acids. A DNA base sequence is a string over 4 characters, {A,G,C,T} ({A,G,C,U} in RNA). We design a codon coding table suitable for watermarking signal processing, transform the codon sequence into an integer sequence using this table, and then transform this sequence into a floating-point sequence of circular angles. A codon consists of three consecutive bases, and the 64 codons are translated into 20 amino acids. We replace the angle of a selected codon with one from the angle range of the same amino acid, which is determined by the watermark bit and the angle difference between adjacent codons. In an in silico experiment using HEXA and ANG sequences, we verified that the proposed scheme is more robust to silent and missense mutations than the conventional scheme and preserves the amino acids of the watermarked codons.

Reversible DNA Information Hiding based on Circular Histogram Shifting (순환형 히스토그램 쉬프팅 기반 가역성 DNA 정보은닉 기법)

  • Lee, Suk-Hwan;Kwon, Seong-Geun;Kwon, Ki-Ryong
    • Journal of the Institute of Electronics and Information Engineers / v.53 no.12 / pp.67-75 / 2016
  • DNA computing technology has drawn interest to DNA storage and DNA watermarking/steganography, which use DNA information as a new medium. DNA watermarking, which embeds an external watermark into DNA information without biological mutation, needs reversibility for perfect recovery of the host DNA, continuous embedding and detection processing, and mutation analysis by means of the watermark. In this paper, we propose a reversible DNA watermarking method based on circular histogram shifting of DNA code values that provides prevention of false start codons, preservation of the DNA sequence length, high watermark capacity, and blind detection. Our method has the following features. The first is to encode nucleotide bases of the 4-character variable into integer code values by code order, which makes signal processing of the DNA sequence easy. The second is to embed multiple bits of the watermark into the -order coded value by using circular histogram shifting. The third is to check the possibility of a false start codon within or between code values. Experimental results verified that our method has a higher watermark capacity, 0.11~0.50 bpn, than conventional methods and that no false start codon occurred with our method.
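A generic form of histogram shifting with wrap-around arithmetic is sketched below to show why the embedding is reversible. It is an assumption-laden illustration rather than the paper's scheme: it embeds one bit per peak value instead of multiple bits, skips the false start codon check, and uses hypothetical names (`embed`, `extract`) and a hypothetical modulus of 16 code values.

```python
from collections import Counter

def embed(values, bits, modulus):
    """Shift the bins between the peak and the next empty bin (mod modulus),
    then add one watermark bit to each value that sits on the peak."""
    counts = Counter(values)
    peak = max(counts, key=counts.get)
    zero = next(((peak + k) % modulus for k in range(1, modulus)
                 if counts[(peak + k) % modulus] == 0), None)
    if zero is None:
        raise ValueError("no empty bin available for shifting")
    span = (zero - peak) % modulus
    bit_iter = iter(bits)
    marked = []
    for v in values:
        offset = (v - peak) % modulus
        if offset == 0:
            v = (v + next(bit_iter, 0)) % modulus  # pad with 0 if bits run out
        elif offset < span:
            v = (v + 1) % modulus                  # make room next to the peak
        marked.append(v)
    return marked, peak, zero

def extract(marked, peak, zero, modulus):
    """Read the bits back and undo the shift, restoring the original values."""
    span = (zero - peak) % modulus
    bits, restored = [], []
    for v in marked:
        offset = (v - peak) % modulus
        if offset == 0:
            bits.append(0)
        elif offset == 1:
            bits.append(1)
            v = peak
        elif offset <= span:
            v = (v - 1) % modulus
        restored.append(v)
    return restored, bits

values = [15, 15, 0, 2, 15]  # hypothetical code values; the peak 15 wraps to 0
marked, peak, zero = embed(values, [1, 1, 0], 16)
print(marked)                           # [0, 0, 1, 2, 15]
print(extract(marked, peak, zero, 16))  # ([15, 15, 0, 2, 15], [1, 1, 0])
```

Working modulo the number of code values keeps every marked value inside the valid range, so the sequence length is unchanged and no value overflows.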

Cloning and Sequencing of a Gene Involved in the Biosynthesis of Exopolysaccharide in Zoogloea Ramigera 115SLR (Zoogloea Ramigera 115SLR로부터 다당류 생합성에 관여하는 유전자의 분리 및 염기서열 결정)

  • Sam-Pin Lee;Min Yoo
    • Biomedical Science Letters / v.6 no.1 / pp.1-9 / 2000
  • To identify the genes responsible for the biosynthesis of exopolysaccharide, recombinant plasmids pUEX10 and pLEX10 were constructed from plasmid pLEX3, which was isolated from the recombinant cosmid library of Zoogloea ramigera 115. The complete nucleotide sequence of the 1.7 kb genomic DNA insert in plasmid pUEX10 was determined. Its analysis identified two open reading frames (ORF3 and ORF4) that could encode two proteins. The amino acid sequence derived from ORF3 showed homology with the gumC protein of Xanthomonas campestris as well as the exoP protein of Rhizobium meliloti. The partial amino acid sequence of ORF4 showed homology with a polysaccharide export protein of Thermotoga maritima. Z. ramigera 115SLR and Z. ramigera 115SLR/pLEX10 showed similar patterns of EPS production. The yields of exopolysaccharide produced by Z. ramigera 115SLR and Z. ramigera 115SLR/pLEX10 were 0.26% (w/v) and 0.16% (w/v), respectively.


Molecular cloning, sequences analysis and in vitro expression of the dihydroflavonol 4-reductase gene from Gypsophila paniculata L. (안개초(Gypsophila paniculata L.)로부터 dihydroflavonol 4-reductase 유전자의 분리 및 분석)

  • Min, Byung-Whan;Cheong, Dong-Chun
    • Journal of Plant Biotechnology / v.37 no.1 / pp.89-95 / 2010
  • Dihydroflavonol 4-reductase (DFR) is a key enzyme of the flavonoid biosynthesis pathway that catalyses the NADPH-dependent reduction of 2R,3R-trans-dihydroflavonols to leucoanthocyanidins. In this study we describe the cloning and expression of the gene encoding the flavonoid-biosynthetic enzyme DFR in Gypsophila paniculata L. Inspection of the 1279 bp sequence revealed an open reading frame of 1063 bp, a 36 bp 5' leader region, and a 181 bp 3' untranslated region. Comparison of the coding region of this DFR cDNA sequence with the sequences of Arabidopsis thaliana, Citrus sinensis, Dianthus caryophyllus, Ipomoea batatas, Matthiola incana, Nierembergia sp., Petunia hybrida, Solanum tuberosum, and Vitis vinifera reveals an identity higher than 69% at the nucleotide level. The function of this nucleotide sequence was verified by comparison with amino acid sequences of the amino-terminus and tryptic peptides from the purified plant enzyme, by northern blotting with mRNA from wild-type and mutant plants, and by in vitro expression yielding an enzymatically active reductase, as indicated by the small leucopelargonidin peak. Genomic Southern blot analysis showed the presence of only one gene for DFR in Gypsophila paniculata.
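A nucleotide-level identity figure like the one quoted above is typically computed over an alignment. The sketch below shows one common convention, counting matches over columns where neither sequence has a gap; the abstract does not state which convention the authors used, and the function name and sample sequences are hypothetical.

```python
def percent_identity(a, b):
    """Percent of matching positions between two aligned sequences,
    ignoring columns where either sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

# Hypothetical aligned fragments: four matches out of five compared columns.
print(percent_identity("ATG-CA", "ATGACT"))  # 80.0
```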