• Title/Summary/Keyword: n-gram similarity

Search Result 33, Processing Time 0.023 seconds

Isolation and characterization of a novel membrane-bound cytochrome $C_{553}$ from the strictly anaerobic phototroph, Heliobacillus mobilis

  • Lee, Woo-Yiel;Bla;Kim, Seung-Ho
    • Journal of Microbiology
    • /
    • v.35 no.3
    • /
    • pp.206-212
    • /
    • 1997
  • Heliobacillus mobilis is a strictly anaerobic Gram-positive bacterium that contains a primitive Photosystem I-type reaction center. The membrane-bound cytochrome $C_{553}$ from this heliobacterium, which has been suggested to be the immediate electron donor to the photooxidized pigment (P798+), has been isolated and characterized. The heme protein was visualized as a major component with an apparent molecular size of 17 kDa in TMBZ-staining analysis of the membrane preparation, and the partially purified sample showed the characteristic $\alpha$ (552.5 nm), $\beta$ (522 nm), and Soret (416 nm) absorption peaks of a typical reduced c-type cytochrome. An internal 43-amino-acid sequence of the electron donor was obtained by chemical and protease cleavage followed by N-terminal sequencing of the resulting fragments. The internal sequence is rich in lysine residues and contains a Cys-X-X-Cys-His sequence motif, both of which are characteristic of typical c-type cytochromes. Analysis of the sequence with the FAST or FASTA program, however, did not show any significant similarity to other known heme proteins.
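
The Cys-X-X-Cys-His motif mentioned in this abstract can be located in a one-letter protein sequence with a short pattern search. The following is a minimal sketch: the function name `find_heme_motifs` and the example fragment are hypothetical and are not taken from the paper, whose 43-residue sequence is not given in the abstract.

```python
import re
from typing import List


def find_heme_motifs(sequence: str) -> List[int]:
    """Return 0-based start positions of every Cys-X-X-Cys-His (CXXCH) motif.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    return [m.start() for m in re.finditer(r"(?=C..CH)", sequence.upper())]


# Hypothetical lysine-rich fragment for illustration only,
# NOT the Heliobacillus mobilis cytochrome sequence.
example_fragment = "MKKAGEKLFKGKCAACHKVDGKK"
print(find_heme_motifs(example_fragment))
```

A pattern search like this only flags candidate heme-binding sites; it says nothing about overall similarity to other heme proteins, which is what the database search in the abstract addresses.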

Score Image Retrieval to Inaccurate OMR performance

  • Kim, Haekwang
    • Journal of Broadcast Engineering
    • /
    • v.26 no.7
    • /
    • pp.838-843
    • /
    • 2021
  • This paper presents an algorithm for effectively retrieving score information from an input score image. The originality of the proposed algorithm is that it is designed to be robust to recognition errors made by OMR (Optical Music Recognition), whereas existing methods such as the pitch histogram require that the erroneous OMR result be corrected before the retrieval process. This approach lets people without the musical training needed for error correction retrieve scores. OMR takes a score image as input, recognizes musical symbols, and produces a structured symbolic notation of the score as output, for example in MusicXML format. Among the musical symbols on a score, filled noteheads are observed to be rarely misdetected, because their simple black filled round shape is easy for OMR to process. Barlines, which separate measures, are also robust to OMR errors because they are long vertical lines of uniform length. The proposed algorithm consists of a descriptor for a score and a similarity measure between a query score and a reference score. The descriptor is based on the note-count, the number of filled noteheads in a measure. Each part of a score is represented by a sequence of note-count numbers, and the descriptor is an n-gram sequence derived from that note-count sequence. Simulation results show that the proposed algorithm works successfully to a certain degree in score image-based retrieval from erroneous OMR output.
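
The note-count n-gram descriptor described in this abstract can be sketched as follows. The abstract does not specify the similarity measure, so the multiset Jaccard overlap used here is an assumption, as are the function names and the sample note counts.

```python
from collections import Counter


def note_count_ngrams(note_counts, n=3):
    """Slide a window of n consecutive measures over the note-count sequence."""
    return [tuple(note_counts[i:i + n]) for i in range(len(note_counts) - n + 1)]


def ngram_similarity(query, reference, n=3):
    """Multiset Jaccard overlap of note-count n-grams.

    This measure is an assumed stand-in, not the one used in the paper.
    """
    q = Counter(note_count_ngrams(query, n))
    r = Counter(note_count_ngrams(reference, n))
    shared = sum((q & r).values())  # n-grams present in both, with multiplicity
    total = sum((q | r).values())   # n-grams present in either
    return shared / total if total else 0.0


# Hypothetical note counts per measure; the query has one measure miscounted.
reference = [4, 6, 4, 8, 2, 4, 6, 4]
query = [4, 6, 4, 7, 2, 4, 6, 4]
print(ngram_similarity(query, reference))
```

Because a single miscounted measure only corrupts the n-grams that overlap it, the remaining n-grams still match, which is the kind of error tolerance the abstract aims for.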

Modern Methods of Text Analysis as an Effective Way to Combat Plagiarism

  • Myronenko, Serhii;Myronenko, Yelyzaveta
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.8
    • /
    • pp.242-248
    • /
    • 2022
  • The article analyzes modern methods for automatically comparing original and unoriginal text to detect textual plagiarism. The study covers two types of plagiarism: literal, in which plagiarists copy the text exactly without changing anything, and intelligent, which uses more sophisticated techniques that are harder to detect because the text is manipulated, for example by replacing words and symbols. Standard techniques for extrinsic detection are string-based, vector space, and semantic-based. The first, most common, and most successful models for detecting literal plagiarism, N-gram and Vector Space, are analyzed, and their advantages and disadvantages are evaluated. The most effective models for detecting intelligent plagiarism, particularly for identifying paraphrases by measuring the semantic similarity of short components of the text, are investigated. Models that use neural network architectures and are based on natural language sentence matching approaches, such as the Densely Interactive Inference Network (DIIN), Bilateral Multi-Perspective Matching (BiMPM), and Bidirectional Encoder Representations from Transformers (BERT) and its family of models, are considered. The progress in improving plagiarism detection systems, techniques, and related models is summarized. Relevant and urgent problems that remain unresolved in detecting intelligent plagiarism, namely effective recognition of unoriginal ideas and well-paraphrased text, are outlined.
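
The two literal-plagiarism models named in this abstract, N-gram and Vector Space, can be illustrated with a small sketch. The tokenizer, the containment score, and the raw term-frequency cosine below are common textbook choices and are assumptions here, not the specific formulations evaluated in the article.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def word_ngrams(text, n=3):
    """Set of word n-grams (shingles) of the text."""
    t = tokens(text)
    return {tuple(t[i:i + n]) for i in range(len(t) - n + 1)}


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also occur in the source."""
    s = word_ngrams(suspicious, n)
    return len(s & word_ngrams(source, n)) / len(s) if s else 0.0


def cosine(a, b):
    """Cosine similarity of raw term-frequency vectors (vector space model)."""
    va, vb = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


source = "The quick brown fox jumps over the lazy dog"
copied = "the quick brown fox jumps"
print(containment(copied, source), cosine(copied, source))
```

The n-gram score is sensitive to word order, while the bag-of-words cosine is not, which hints at why neither model copes well with paraphrase on its own.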

ISOLATION, IDENTIFICATION AND CHARACTERIZATION OF AN IMMOBILIZED BACTERIUM PRODUCING N2 FROM NH4+ UNDER AN AEROBIC CONDITION

  • Park, Kyoung-Joo;Cho, Kyoung-Sook;Kim, Jeong-Bo;Lee, Min-Gyu;Lee, Byung-Hun;Hong, Young-Ki;Kim, Joong-Kyun
    • Environmental Engineering Research
    • /
    • v.10 no.5
    • /
    • pp.213-226
    • /
    • 2005
  • To treat wastewater efficiently by a one-step process of nitrogen removal, a new bacterial strain producing $N_2$ gas from ${NH_4}^+$ under an aerobic condition was isolated and identified. The cells were motile, Gram-negative rods that usually occurred in pairs. By 16S rDNA analysis, the isolated strain was identified as Enterobacter asburiae with 96% similarity. The isolate's capacity for $N_2$ production under an oxic condition was approximately three times higher than that under an anoxic condition. Thus, the consumption of ${NH_4}^+$ by the isolate differed significantly in the metabolism of $N_2$ production under the two environmental conditions. The optimal conditions for $N_2$ production by the immobilized isolate were found to be pH 7.0, $30^{\circ}C$, and a C/N ratio of 5. Under the optimum reaction conditions, $N_2$ production by the immobilized isolate resulted in a reduction of ORP, together with consumption of DO and a drop in pH. The removal efficiencies of $COD_{Cr}$ and TN were 56.1 and 60.9%, respectively. The removal rates of $COD_{Cr}$ and TN were highest for the first 2.5 hrs, with a removal $COD_{Cr}/TN$ ratio of 32.1, and afterwards the rates decreased as the reaction proceeded. To apply the immobilized isolate to a practical ammonium removal process, a continuous operation was carried out with a synthetic medium of a low C/N ratio. The continuous bioreactor system exhibited a satisfactory performance at an HRT of 12.1 hrs, at which the effluent concentration of ${NH_4}^+$-N was measured to be 15.4 mg/L with a removal efficiency of 56.0%. The maximum removal rate of ${NH_4}^+$-N reached 1.6 mg ${NH_4}^+$-N/L/hr at an HRT of 12.1 hrs (with an N loading rate of $0.08\;kg-N/m^3$-carrier/d). As a result, the application of the immobilized isolate appears to be a viable alternative to nitrification-denitrification processes.
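
The continuous-operation figures in this abstract can be cross-checked with a simple mass balance. This is a sketch under stated assumptions: removal efficiency is taken as (influent - effluent) / influent, the reactor is treated as being at steady state, and the influent concentration, which the abstract does not report, is back-calculated rather than quoted.

```python
def influent_from_removal(effluent_mg_l, removal_fraction):
    """Back-calculate influent concentration from effluent and removal efficiency."""
    return effluent_mg_l / (1.0 - removal_fraction)


def volumetric_removal_rate(influent_mg_l, effluent_mg_l, hrt_h):
    """Mass removed per litre per hour at a given hydraulic retention time (HRT)."""
    return (influent_mg_l - effluent_mg_l) / hrt_h


# Values reported in the abstract.
effluent = 15.4    # mg NH4+-N per litre
efficiency = 0.560 # removal efficiency as a fraction
hrt = 12.1         # hours

influent = influent_from_removal(effluent, efficiency)    # assumed, not reported
rate = volumetric_removal_rate(influent, effluent, hrt)
print(round(influent, 1), round(rate, 2))
```

Under these assumptions the implied influent is about 35 mg/L and the removal rate is about 1.6 mg/L/hr, which agrees with the maximum removal rate stated in the abstract.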

An Efficient Frequent Melody Indexing Method to Improve Performance of Query-By-Humming System (허밍 질의 처리 시스템의 성능 향상을 위한 효율적인 빈번 멜로디 인덱싱 방법)

  • You, Jin-Hee;Park, Sang-Hyun
    • Journal of KIISE:Databases
    • /
    • v.34 no.4
    • /
    • pp.283-303
    • /
    • 2007
  • Recently, the study of efficient ways to store and retrieve enormous volumes of music data has become one of the important issues in multimedia databases. The most common method of MIR (Music Information Retrieval) is a text-based approach that uses text information to search for a desired piece of music. However, if users do not remember keywords about the music, such a system cannot give them correct answers. Moreover, since these systems are implemented only for exact matching between the query and the music data, they cannot mine any information about similar music data. Thus, these systems are unsuitable for similarity matching of music data. To solve the problem, we propose an Efficient Query-By-Humming System (EQBHS) with a content-based indexing method that efficiently stores and retrieves music when a user queries with inaccurate humming. To accelerate query processing in EQBHS, we design indices for significant melodies, which are 1) frequent melodies occurring many times in a single piece of music, on the assumption that users hum what they can easily remember, and 2) melodies partitioned by rests. In addition, we propose an error-tolerant method for mapping a note to a character to make searching efficient, and a frequent melody extraction algorithm. We verified the assumption about frequent melodies by means of a questionnaire and compared the performance of the proposed EQBHS with N-gram in various experiments on a large amount of music data.
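
The idea of mapping notes to characters in an error-tolerant way and then extracting frequent melodies can be sketched as below. The abstract does not describe the actual mapping or extraction algorithm, so the up/down/same contour coding (similar in spirit to a Parsons-code contour), the `tolerance` parameter, and the fixed-length substring counting are all assumptions made for illustration.

```python
from collections import Counter


def contour_string(pitches, tolerance=1):
    """Map each step between consecutive MIDI pitches to a character.

    Steps within `tolerance` semitones become 'S' (same), larger rising
    steps 'U', larger falling steps 'D'. The coarse alphabet absorbs
    small pitch errors in a hummed query.
    """
    chars = []
    for prev, cur in zip(pitches, pitches[1:]):
        step = cur - prev
        if abs(step) <= tolerance:
            chars.append("S")
        elif step > 0:
            chars.append("U")
        else:
            chars.append("D")
    return "".join(chars)


def frequent_melodies(contour, n=3, min_count=2):
    """Return the length-n contour substrings that occur at least min_count times."""
    counts = Counter(contour[i:i + n] for i in range(len(contour) - n + 1))
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Hypothetical melody as MIDI pitch numbers.
melody = [60, 62, 64, 60, 60, 62, 64, 60, 67, 65]
contour = contour_string(melody, tolerance=0)
print(contour, frequent_melodies(contour))
```

Indexing only the repeated substrings keeps the index small compared with indexing every n-gram, which is the trade-off the abstract's comparison with N-gram is concerned with.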

Analysis of ICT Education Trends using Keyword Occurrence Frequency Analysis and CONCOR Technique (키워드 출현 빈도 분석과 CONCOR 기법을 이용한 ICT 교육 동향 분석)

  • Youngseok Lee
    • Journal of Industrial Convergence
    • /
    • v.21 no.1
    • /
    • pp.187-192
    • /
    • 2023
  • In this study, trends in ICT education were investigated by analyzing the frequency of appearance of keywords related to machine learning and using the convergence of iterated correlations (CONCOR) technique. A total of 304 papers published from 2018 to the present on registered sites were searched on Google Scholar using "ICT education" as the keyword, and 60 papers pertaining to ICT education were selected based on a systematic literature review. Subsequently, keywords were extracted from the title and summary of each paper. For word frequency and indicator data, 49 keywords with high appearance frequency were extracted by analyzing frequency, via the term frequency-inverse document frequency technique in natural language processing, and the co-occurrence frequency of words. The degree of relationship was verified by analyzing the connection structure and degree centrality between words, and clusters composed of similar words were derived via CONCOR analysis. First, "education," "research," "result," "utilization," and "analysis" were identified as the main keywords. Second, by analyzing an N-gram network graph with "education" as the keyword, "curriculum" and "utilization" were shown to exhibit the highest correlation level. Third, by conducting a cluster analysis with "education" as the keyword, five groups were formed: "curriculum," "programming," "student," "improvement," and "information." These results indicate that practical research necessary for ICT education can be conducted by analyzing and identifying ICT education trends.
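
The term frequency-inverse document frequency weighting mentioned in this abstract can be sketched in a few lines. The unsmoothed formula used here, tf = count / document length and idf = log(N / document frequency), is one common variant and is an assumption, since the abstract does not say which variant the study used. The sample keyword lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    docs: list of token lists. Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


# Hypothetical keyword lists extracted from three paper summaries.
papers = [
    ["ict", "education", "curriculum"],
    ["ict", "education", "programming"],
    ["ict", "student"],
]
for w in tf_idf(papers):
    print(sorted(w.items(), key=lambda kv: -kv[1]))
```

A term that appears in every document, such as "ict" in this toy corpus, receives a weight of zero, which is why TF-IDF surfaces distinguishing keywords rather than the search term itself.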

Isolation of Novel Alkalophilic Bacillus alcalophilus subsp. YB380 and the Characteristics of Its Yeast Cell Wall Hydrolase

  • Yeo, Ik-Hyun;Han, Suk-Kyun;Yu, Ju-Hyun;Bai, Dong-Hoon
    • Journal of Microbiology and Biotechnology
    • /
    • v.8 no.5
    • /
    • pp.501-508
    • /
    • 1998
  • An alkalophilic microorganism (strain YB380), which produces yeast cell wall hydrolase extracellularly, was isolated from Korean soil. The rod-shaped cells were 0.3-0.4 by 2-4 ${\mu}m$, motile, aerobic, Gram-positive, and spore-forming. The color of the colony was light yellow. The temperature range for growth at pH 9.0 was 25 to $45^{\circ}C$, with optimum growth at $35^{\circ}C$. The pH range for growth at $35^{\circ}C$ was 8 to 11, with an optimum pH of 9.0. Therefore, strain YB380 is an obligate alkalophile. The 16S rRNA of strain YB380 has 99% sequence similarity with that of Bacillus alcalophilus. On the basis of physiological properties, cell wall fatty acid composition, and phylogenetic analysis, we propose that the isolated strain is Bacillus alcalophilus. The yeast cell wall hydrolase from Bacillus alcalophilus subsp. YB380 has been purified and partially characterized. The molecular weight was estimated to be 27,000 daltons, with an optimum temperature and pH of $60^{\circ}C$ and 9.0, respectively. The N-terminal amino acid sequence of the enzyme was determined to be Gln-Thr-Val-Pro-Trp-Gly-Ile-Asn-Arg-Val.

Sphingobacterium composti sp. nov., a Novel DNase-Producing Bacterium Isolated from Compost

  • Ten, Leonid N.;Liu, Qing-Mei;Im, Wan-Taek;Aslam, Zubair;Lee, Sung-Taik
    • Journal of Microbiology and Biotechnology
    • /
    • v.16 no.11
    • /
    • pp.1728-1733
    • /
    • 2006
  • A Gram-negative, strictly aerobic, nonmotile, and nonspore-forming bacterial strain, designated $T5-12^T$, was isolated from compost and characterized using a polyphasic taxonomic approach. The isolate was positive for catalase and oxidase tests. It could degrade DNA, but was negative for degradation of macromolecules such as casein, collagen, starch, chitin, cellulose, and xylan. The DNA G+C content was 36.0 mol%. The predominant isoprenoid quinone was menaquinone 7 (MK-7). The major fatty acids were $iso-C_{15:0}$ (45.6%), $iso-C_{17:0}$ 3OH (17.2%), and summed feature 4 ($C_{16:0}\;{\omega}7c$ and/or $iso-C_{15:0}$ 2OH, 14.9%). Comparative 16S rRNA gene sequence analysis showed that strain $T5-12^T$ fell within the radiation of the cluster comprising members of the genus Sphingobacterium. Strain $T5-12^T$ exhibited less than 94% 16S rRNA gene sequence similarity to the type strains of recognized Sphingobacterium species. On the basis of its phenotypic properties and phylogenetic distinctiveness, strain $T5-12^T$ ($=KCTC\;12578^T=LMG\;23401^T=CCUG\;52467^T$) should be classified in the genus Sphingobacterium as the type strain of a novel species, for which the name Sphingobacterium composti sp. nov. is proposed.

A report of 42 unrecorded actinobacterial species in Korea

  • Lee, Na-Young;Cha, Chang-Jun;Im, Wan-Taek;Kim, Seung-Bum;Seong, Chi-Nam;Bae, Jin-Woo;Jahng, Kwang Yeop;Cho, Jang-Cheon;Joh, Kiseong;Jeon, Che Ok;Yi, Hana;Lee, Soon Dong
    • Journal of Species Research
    • /
    • v.7 no.1
    • /
    • pp.36-49
    • /
    • 2018
  • During a study to discover indigenous prokaryotic species in Korea in 2016, a total of 42 actinobacterial isolates were recovered from various environmental samples collected from a natural cave, squid, sewage, sea water, trees, bird droppings, freshwater, eelgrass, mud flat, sediment, and soil. On the basis of a tight phylogenetic clade with the closest species and a high level of 16S rRNA gene sequence similarity, each isolate was assigned to an independent, previously described bacterial species belonging to the phylum Actinobacteria. The following 42 species have not been reported in Korea: eight species of two genera in the order Corynebacteriales, 26 species of 16 genera in the Micrococcales, one species of one genus in the Micromonosporales, one species of one genus in the Propionibacteriales, four species of two genera in the Streptomycetales, and two species of two genera in the Streptosporangiales. Cell morphology, Gram staining reaction, colony colors and features, the media and conditions of incubation, physiological and biochemical characteristics, origins of isolation, and strain IDs of the 42 unrecorded actinobacterial species are presented in the species description.

Novosphingobium ginsenosidimutans sp. nov., with the Ability to Convert Ginsenoside

  • Kim, Jin-Kwang;He, Dan;Liu, Qing-Mei;Park, Hye-Yoon;Jung, Mi-Sun;Yoon, Min-Ho;Kim, Sun-Chang;Im, Wan-Taek
    • Journal of Microbiology and Biotechnology
    • /
    • v.23 no.4
    • /
    • pp.444-450
    • /
    • 2013
  • A Gram-negative, strictly aerobic, non-motile, non-spore-forming, and rod-shaped bacterial strain designated FW-$6^T$ was isolated from a freshwater sample, and its taxonomic position was investigated using a polyphasic approach. Strain FW-$6^T$ grew optimally at $10-42^{\circ}C$ and at pH 7.0 on nutrient and R2A agar. Strain FW-$6^T$ displayed ${\beta}$-glucosidase activity that was responsible for its ability to transform ginsenoside $Rb_1$ (one of the dominant active components of ginseng) to Rd. On the basis of 16S rRNA gene sequence similarity, strain FW-$6^T$ was shown to belong to the family Sphingomonadaceae and was related to Novosphingobium aromaticivorans DSM $12444^T$ (98.1% sequence similarity) and N. subterraneum IFO $16086^T$ (98.0%). The G+C content of the genomic DNA was 64.4%. The major ubiquinone was Q-10 and the major fatty acids were summed feature 7 (comprising $C_{18:1}{\omega}9c/{\omega}12t/{\omega}7c$), summed feature 4 (comprising $C_{16:1}{\omega}7c/iso-C_{15:0}2OH$), $C_{16:0}$, and $C_{14:0}$ 2OH. DNA and chemotaxonomic data supported the affiliation of strain FW-$6^T$ to the genus Novosphingobium. Strain FW-$6^T$ could be differentiated genotypically and phenotypically from the recognized species of the genus Novosphingobium. The isolate, which has ginsenoside-converting ability, therefore represents a novel species, for which the name Novosphingobium ginsenosidimutans sp. nov. is proposed, with the type strain FW-$6^T$ (= KACC $16615^T$ = JCM $18202^T$).