• Title/Abstract/Keywords: non-redundant protein database

9 search results (processing time: 0.022 s)

Theoretical Peptide Mass Distribution in the Non-Redundant Protein Database of the NCBI

  • Lim Da-Jeong;Oh Hee-Seok;Kim Hee-Bal
    • Genomics & Informatics
    • /
    • Vol. 4, No. 2
    • /
    • pp.65-70
    • /
    • 2006
  • Peptide mass mapping matches experimentally generated peptide masses against the predicted masses of digested proteins in a database. Peptide mass fingerprinting uses this matching to identify proteins from the masses of their constituent fragments, so it is important to know the theoretical mass distribution of the database. However, few studies have reported the peptide mass distribution of a database. We analyzed the peptide mass distribution of the NCBI non-redundant protein sequence database after digestion with 15 different enzymes. To characterize the peptide mass distribution for each digestion enzyme, a power-law distribution (Zipf's law) was applied. After simulated digestion of the protein database, rank-frequency plots of the peptide fragments were used to generalize a Zipf's law curve across all enzymes. Our data appear to fit Zipf's law with statistically significant parameter values.
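The workflow described in this abstract (simulated digestion, peptide masses, rank-frequency fit) can be sketched roughly as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not name the 15 enzymes, so a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages) is assumed, and the function names `digest_trypsin`, `peptide_mass`, `mass_histogram`, and `zipf_fit` are hypothetical. The fit is a plain least-squares line on the log-log rank-frequency plot.

```python
import math
import re
from collections import Counter

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to every free peptide


def digest_trypsin(seq):
    """Assumed rule: cleave after K or R unless the next residue is P."""
    # Zero-width split points need Python 3.7 or later.
    return [p for p in re.split(r"(?<=[KR])(?!P)", seq) if p]


def peptide_mass(peptide):
    """Monoisotopic mass of a peptide: residue masses plus one water."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mass_histogram(proteins):
    """Count peptides per 1 Da mass bin over all digested proteins."""
    bins = Counter()
    for seq in proteins:
        for pep in digest_trypsin(seq):
            bins[round(peptide_mass(pep))] += 1
    return bins


def zipf_fit(counts):
    """Fit freq = C * rank**(-a) by least squares on log-log axes.

    Returns (a, C). Needs at least two positive counts.
    """
    freqs = sorted(counts, reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two frequency values")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)
```

For a real database, `mass_histogram` would be fed the protein sequences and `zipf_fit(bins.values())` would give the exponent; whether the fit is statistically significant needs a separate test that this sketch does not include.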

Proteomics Data Analysis using Representative Database

  • Kwon, Kyung-Hoon;Park, Gun-Wook;Kim, Jin-Young;Park, Young-Mok;Yoo, Jong-Shin
    • Bioinformatics and Biosystems
    • /
    • Vol. 2, No. 2
    • /
    • pp.46-51
    • /
    • 2007
  • In mass spectrometry-based proteomics, a protein database search identifies proteins from the peptide sequences that best match the tandem mass spectra. The protein sequence database has been a powerful knowledge base for this identification. However, as protein sequence information accumulates, the database grows very large, and searching every sequence becomes impractical because of the computing time required. For high-throughput proteome analysis, non-redundant refined databases such as the IPI human database of the European Bioinformatics Institute have usually been used. While a non-redundant database returns search results quickly, it misses protein sequence variation. In this study, we examined proteomics data in terms of protein similarity and used a network analysis tool to build a new analysis method. This method is expected to save computing time in the database search while retaining the sequence variation needed to detect modified peptides.

Extended latex proteome analysis deciphers additional roles of the lettuce laticifer

  • Cho, Won-Kyong;Chen, Xiong-Yan;Rim, Yeong-Gil;Chu, Hyo-Sub;Jo, Yeon-Hwa;Kim, Su-Wha;Park, Zee-Yong;Kim, Jae-Yean
    • Plant Biotechnology Reports
    • /
    • Vol. 4, No. 4
    • /
    • pp.311-319
    • /
    • 2010
  • Lettuce is an economically important leafy vegetable that accumulates a milk-like sap called latex in the laticifer. Previously, we conducted a large-scale lettuce latex proteomic analysis. However, the identified proteins were obtained only from lettuce ESTs and proteins deposited in NCBI databases. To extend the number of known latex proteins, we carried out an analysis identifying 302 additional proteins that were matched to the NCBI non-redundant protein database. Interestingly, the newly identified proteins were not recovered from lettuce EST and protein databases, indicating the usefulness of this hetero system in MudPIT analysis. Gene ontology studies revealed that the newly identified latex proteins are involved in many processes, including many metabolic pathways, binding functions, stress responses, developmental processes, protein metabolism, transport and signal transduction. Application of the non-redundant plant protein database led to the identification of an increased number of latex proteins. These newly identified latex proteins provide a rich source of information for laticifer research.

Transcriptome analysis of internal and external stress mechanisms in Aster spathulifolius Maxim.

  • Sivagami, Jean Claude;Park, SeonJoo
    • Plant Resources Society of Korea: Conference Proceedings (한국자원식물학회 학술대회논문집)
    • /
    • Plant Resources Society of Korea 2019 Spring Conference
    • /
    • pp.35-35
    • /
    • 2019
  • Aster spathulifolius Maxim. belongs to the Asteraceae family and is distributed only in Korea and Japan. It is recognized as a traditional medicinal plant and is economically valuable as an ornamental. However, within the Asteraceae, the genus Aster lacks genomic resources and information on molecular function. We therefore used high-throughput RNA-sequencing transcriptome data from A. spathulifolius to investigate function at the molecular level. De novo assembly produced 98,660 unigenes with an N50 of 1,126 bp. The unigenes were functionally annotated against the NCBI plant nucleotide (Nt) and non-redundant protein (Nr) databases, Pfam, UniProt, KEGG, and a transcription factor (TF) database. In addition, the distribution of SSR markers was analyzed for future perspectives. Further, a comparison of A. spathulifolius with two other Asteraceae species, Karelinia caspica and Chrysanthemum morifolium, shows the number of genes regulated under internal and external stress (salt tolerance, and heat and drought stress, respectively), to help understand the molecular basis of responses to different environmental stresses.
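For readers unfamiliar with the N50 statistic quoted in this abstract, a minimal sketch of the usual definition follows: sort sequence lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly length. The function name `n50` is hypothetical, and the abstract does not say which tool computed the reported value.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    cover at least half of the summed length of all sequences.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one sequence length")
```

A larger N50 generally means the assembly is less fragmented, but it says nothing about correctness, so it is usually reported together with the number of unigenes, as it is here.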

SFannotation: A Simple and Fast Protein Function Annotation System

  • Yu, Dong Su;Kim, Byung Kwon
    • Genomics & Informatics
    • /
    • Vol. 12, No. 2
    • /
    • pp.76-78
    • /
    • 2014
  • Owing to the generation of vast amounts of sequencing data by using cost-effective, high-throughput sequencing technologies with improved computational approaches, many putative proteins have been discovered after assembly and structural annotation. Putative proteins are typically annotated using a functional annotation system that uses extant databases, but the expansive size of these databases often causes a bottleneck for rapid functional annotation. We developed SFannotation, a simple and fast functional annotation system that rapidly annotates putative proteins against four extant databases, Swiss-Prot, TIGRFAMs, Pfam, and the non-redundant sequence database, by using a best-hit approach with BLASTP and HMMSEARCH.
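A best-hit approach of the kind mentioned in this abstract can be illustrated with BLASTP tabular output (`-outfmt 6`), whose 12 default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, and bitscore. The sketch below keeps the highest-bitscore subject per query. It is not SFannotation's code: the function name `best_hits` and the `max_evalue` cutoff of 1e-5 are assumptions, since the abstract does not state the selection criteria.

```python
def best_hits(lines, max_evalue=1e-5):
    """Pick the top-scoring subject per query from BLAST tabular lines.

    `lines` is any iterable of tab-separated rows in the default
    `-outfmt 6` column order. Returns a dict that maps each query ID
    to a (subject_id, bitscore, evalue) tuple.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # assumed significance cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore, evalue)
    return best
```

In practice the function could be given an open file handle for a BLASTP result file; queries with no hit under the cutoff are simply absent from the returned dictionary.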

Identification of Novel Cupredoxin Homologs Using Overlapped Conserved Residues Based Approach

  • Goyal, Amit;Madan, Bharat;Hwang, Kyu-Suk;Lee, Sun-Gu
    • Journal of Microbiology and Biotechnology
    • /
    • Vol. 25, No. 1
    • /
    • pp.127-136
    • /
    • 2015
  • Cupredoxin-like proteins are mainly copper-binding proteins that conserve a typical rigid Greek-key arrangement consisting of an eight-stranded β-sandwich, even though they share as little as 10-15% sequence similarity. The electron transport function of the Cupredoxins is critical for respiration and photosynthesis, and the proteins have therapeutic potential. Despite their crucial biological functions, the identification of the distant Cupredoxin homologs has been a difficult task due to their low sequence identity. In this study, the overlapped conserved residue (OCR) fingerprint for the Cupredoxin superfamily, which consists of conserved residues in three aspects (i.e., the sequence, structure, and intramolecular interaction), was used to detect the novel Cupredoxin homologs in the NCBI non-redundant protein sequence database. The OCR fingerprint could identify 54 potential Cupredoxin sequences, which were validated by scanning them against the conserved Cupredoxin motif near the Cu-binding site. This study also attempted to model the 3D structures and to predict the functions of the identified potential Cupredoxins. This study suggests that the OCR-based approach can be used efficiently to detect novel homologous proteins with low sequence identity, such as Cupredoxins.

Systems Pharmacological Analysis of Dichroae Radix in Anti-Tumor Metastasis Activity

  • 이지예;신아연;김학군;안원근
    • Herbal Formula Science (대한한의학방제학회지)
    • /
    • Vol. 31, No. 4
    • /
    • pp.295-313
    • /
    • 2023
  • Objectives: While treatments for cancer are advancing, the development of effective treatments for cancer metastasis, the main cause of death in cancer patients, remains insufficient. Recent studies on Dichroae Radix have revealed that its active ingredients have the potential to inhibit cancer metastasis. This study aimed to investigate the cancer metastasis inhibitory effect of Dichroae Radix using network pharmacological analysis. Methods: The active compounds of Dichroae Radix were identified using the Traditional Chinese Medicine Systems Pharmacology Database and Analysis Platform (TCMSP). The UniProt database was used to collect information on all target proteins associated with the active compounds. To find the biological and metabolic processes associated with each target, the DAVID 6.8 Gene Functional Classification tool was used. Compound-Target and Target-Pathway networks were analyzed with Cytoscape 3.40. Results: In total, 25 active compounds and their 62 non-redundant targets were selected through TCMSP. The target genes underwent gene ontology and pathway enrichment analysis. The gene ontology analysis revealed associations with various biological processes, including signal transduction, chemical synaptic transmission, G-protein-coupled receptor signaling pathways, response to xenobiotic stimulus, and response to drugs, among others. A total of eleven genes, HSP90AB1, CALM1, F2, AR, PRKACA, PTGS2, NOS2, RXRA, ESR1, ESR2, and NCOA1, were found to be associated with biological pathways related to cancer metastasis. Furthermore, nineteen of the active compounds from Dichroae Radix were confirmed to interact with these genes. Conclusions: The results provide valuable insights into the mechanism of action and molecular targets of Dichroae Radix. Notably, Berberine, the main active ingredient of Dichroae Radix, plays a significant role in degrading AR proteins in advanced prostate cancer. Further studies and validations can provide crucial data to advance cancer metastasis prevention and treatment strategies.

Draft Genome Assembly and Annotation for Cutaneotrichosporon dermatis NICC30027, an Oleaginous Yeast Capable of Simultaneous Glucose and Xylose Assimilation

  • Wang, Laiyou;Guo, Shuxian;Zeng, Bo;Wang, Shanshan;Chen, Yan;Cheng, Shuang;Liu, Bingbing;Wang, Chunyan;Wang, Yu;Meng, Qingshan
    • Mycobiology
    • /
    • Vol. 50, No. 1
    • /
    • pp.66-78
    • /
    • 2022
  • The identification of oleaginous yeast species capable of simultaneously utilizing xylose and glucose as substrates to generate value-added biological products is an area of key economic interest. We have previously demonstrated that the Cutaneotrichosporon dermatis NICC30027 yeast strain is capable of simultaneously assimilating both xylose and glucose, resulting in considerable lipid accumulation. However, as no high-quality genome sequencing data or associated annotations for this strain are available at present, it remains challenging to study the metabolic mechanisms underlying this phenotype. Herein, we report a 39,305,439 bp draft genome assembly for C. dermatis NICC30027 comprised of 37 scaffolds, with 60.15% GC content. Within this genome, we identified 524 tRNAs, 142 sRNAs, 53 miRNAs, 28 snRNAs, and eight rRNA clusters. Moreover, repeat sequences totaling 1,032,129 bp in length were identified (2.63% of the genome), as were 14,238 unigenes that were 1,789.35 bp in length on average (64.82% of the genome). The NCBI non-redundant protein sequences (NR) database was employed to successfully annotate 11,795 of these unigenes, while 3,621 and 11,902 were annotated with the Swiss-Prot and TrEMBL databases, respectively. Unigenes were additionally subjected to pathway enrichment analyses using the Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Cluster of Orthologous Groups of proteins (COG), Clusters of orthologous groups for eukaryotic complete genomes (KOG), and Non-supervised Orthologous Groups (eggNOG) databases. Together, these results provide a foundation for future studies aimed at clarifying the mechanistic basis for the ability of C. dermatis NICC30027 to simultaneously utilize glucose and xylose to synthesize lipids.
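The percentages quoted in this abstract can be re-derived from the stated counts. The short check below uses only figures given in the abstract; the variable names and the helper `pct` are illustrative. The NR annotation rate is not stated in the abstract and is computed here from the two counts it does give.

```python
GENOME_BP = 39_305_439       # draft assembly size
REPEAT_BP = 1_032_129        # total repeat length
UNIGENES = 14_238            # number of unigenes
MEAN_UNIGENE_BP = 1_789.35   # mean unigene length
NR_ANNOTATED = 11_795        # unigenes annotated with the NR database


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


repeat_pct = pct(REPEAT_BP, GENOME_BP)                    # repeats vs genome
unigene_pct = pct(UNIGENES * MEAN_UNIGENE_BP, GENOME_BP)  # unigenes vs genome
nr_pct = pct(NR_ANNOTATED, UNIGENES)                      # NR-annotated share
```

The first two values reproduce the 2.63% and 64.82% reported in the abstract, and the third shows that roughly 83% of the unigenes received an NR annotation.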

Cataloguing of Anther Expressed Genes through Differential Slot Blot in Oriental Lily (Lilium Oriental Hybrid 'Acapulco')

  • 서은정;유희주;한봉희;임용표;정미정;이성곤;김동헌;장안철;예병우
    • Korean Journal of Horticultural Science and Technology (원예과학기술지)
    • /
    • Vol. 31, No. 5
    • /
    • pp.598-606
    • /
    • 2013
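  • The anther is one of the major floral organs that determine reproduction and flower form. From an anther-specific cDNA library constructed from the Oriental lily 'Acapulco', 2,000 ESTs were randomly selected. Differential slot blot with leaf and anther cDNA as probes was used to obtain clones expressed in the anther, yielding 570 non-redundant ESTs, which were then sequenced. In a comparison against GenBank using the BLASTX algorithm, 191 clones showed significant similarity, while the remainder (66.5%) did not match previously reported sequences. Functional classification by Gene Ontology (GO) annotation showed that the proteins fell mainly into the cell and cell part categories. Transcript analysis across seven different organs and developmental stages was performed by northern hybridization using 30 clones presumed to be anther-specific. These results suggest that differential slot blot is a very effective method for selecting genes expressed in the anther, and the present study is expected to provide basic information on the lily anther, including pollen.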