• Title/Summary/Keyword: gram


n-Gram/2L: A Space and Time Efficient Two-Level n-Gram Inverted Index Structure (n-gram/2L: 공간 및 시간 효율적인 2단계 n-gram 역색인 구조)

  • Kim Min-Soo;Whang Kyu-Young;Lee Jae-Gil;Lee Min-Jae
    • Journal of KIISE:Databases / v.33 no.1 / pp.12-31 / 2006
  • The n-gram inverted index has two major advantages: it is language-neutral and error-tolerant. Because of these advantages, it has been widely used in information retrieval and in similar-sequence matching for DNA and protein databases. Nevertheless, the n-gram inverted index also has drawbacks: its size tends to be very large, and query performance tends to be poor. In this paper, we propose the two-level n-gram inverted index (simply, the n-gram/2L index), which significantly reduces the size and improves query performance while preserving the advantages of the n-gram inverted index. The proposed index eliminates the redundancy of the position information that exists in the n-gram inverted index. It is constructed in two steps: 1) extracting subsequences of length m from documents and 2) extracting n-grams from those subsequences. We formally prove that this two-step construction is identical to the relational normalization process that removes the redundancy caused by a non-trivial multivalued dependency. The n-gram/2L index has excellent properties: 1) it significantly reduces the size and improves performance compared with the n-gram inverted index, with these improvements becoming more marked as the database grows; 2) the query processing time increases only very slightly as the query length increases. Experimental results using 1 GB databases show that, compared with the n-gram inverted index, the size of the n-gram/2L index is reduced by a factor of up to 1.9 to 2.7 and, at the same time, query performance is improved by up to 13.1 times.
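The two-step construction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy parameters `m=4` and `n=2`, and the choice to overlap consecutive subsequences by `n-1` characters (so that no n-gram spanning a boundary is lost) are all assumptions made for illustration.

```python
from collections import defaultdict

def build_two_level_index(docs, m=4, n=2):
    """Toy two-level index: n-gram -> subsequence offsets, subsequence -> document offsets."""
    step = m - (n - 1)  # assumed: consecutive subsequences overlap by n-1 characters
    back = defaultdict(list)   # subsequence -> [(doc_id, offset in document)]
    for doc_id, text in enumerate(docs):
        for off in range(0, max(len(text) - n + 1, 1), step):
            sub = text[off:off + m]
            if len(sub) >= n:  # skip tails too short to hold an n-gram
                back[sub].append((doc_id, off))
    front = defaultdict(list)  # n-gram -> [(subsequence, offset in subsequence)]
    for sub in back:
        for i in range(len(sub) - n + 1):
            front[sub[i:i + n]].append((sub, i))
    return front, back

def positions(front, back, gram):
    """Recover every (doc_id, offset) of an n-gram by joining the two levels."""
    return sorted({(d, off + i) for sub, i in front.get(gram, []) for d, off in back[sub]})

docs = ["ABCDABCD", "BCDA"]
front, back = build_two_level_index(docs)
print(positions(front, back, "BC"))
```

In this sketch, the offsets of an n-gram inside a subsequence are stored once per distinct subsequence, however many documents contain that subsequence; this mirrors the kind of position redundancy the abstract says the index eliminates.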

Out of Vocabulary Word Extractor based on a Syllable n-gram (음절 n-gram 기반의 미등록 어휘 추정기 구현)

  • Shin, Junsoo;Hong, Chohee
    • Annual Conference on Human and Language Technology / 2013.10a / pp.139-141 / 2013
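  • As increasingly diverse content is generated, neologisms and out-of-vocabulary (OOV) words also appear in diverse forms. Such neologisms and OOV words are misanalyzed at the text-processing stage and cause performance degradation. To address this problem, this paper proposes a method for estimating neologisms and OOV words from a large collection of documents. The proposed method extracts syllable n-grams from a large document collection, then shrinks and extends n by one syllable for each n-gram to additionally extract (n+1)-grams and (n-1)-grams. Taking each extracted syllable n-gram as the reference, it computes the frequency differences against the (n+1)-grams and (n-1)-grams and estimates the spans where the frequency changes sharply to be neologisms or OOV words. Experiments confirmed that the method extracts not only neologisms but also domain-dependent OOV words, such as those from Twitter and me2day.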
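The frequency-difference idea in this abstract can be sketched in Python as follows. The abstract does not give the scoring rule, so the thresholds `min_freq` and `drop_ratio`, the function names, and the exact comparison are assumptions. Each character is treated as one syllable, which holds for precomposed Hangul; the toy data uses Latin letters as stand-ins.

```python
from collections import Counter

def ngram_counts(texts, n):
    """Count character (syllable) n-grams inside each whitespace-delimited token."""
    counts = Counter()
    for text in texts:
        for token in text.split():
            counts.update(token[i:i + n] for i in range(len(token) - n + 1))
    return counts

def candidate_words(texts, n, min_freq=3, drop_ratio=0.5):
    """Flag n-grams whose frequency falls sharply when extended by one syllable (assumed rule)."""
    cur, longer = ngram_counts(texts, n), ngram_counts(texts, n + 1)
    shorter = ngram_counts(texts, n - 1) if n > 1 else Counter()
    found = []
    for gram, freq in cur.items():
        if freq < min_freq:
            continue
        # Most frequent one-syllable extension on either side of the n-gram.
        best_ext = max((c for g, c in longer.items() if gram in g), default=0)
        # The (n-1)-gram prefix should not be far more frequent than the n-gram itself.
        stable_inside = n == 1 or freq >= drop_ratio * shorter[gram[:-1]]
        if best_ext <= drop_ratio * freq and stable_inside:
            found.append(gram)
    return sorted(found)

texts = ["abcdx yy", "abcdy zz", "abcdz ww", "abcdw qq"]
print(candidate_words(texts, 4))
```

Here `abcd` occurs four times, but each one-character extension occurs only once, so the sharp drop marks `abcd` as a candidate; the 3-gram `abc` is not flagged because its extension `abcd` is just as frequent.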


An investigation of chroma n-gram selection for cover song search (커버곡 검색을 위한 크로마 n-gram 선택에 관한 연구)

  • Seo, Jin Soo;Kim, Junghyun;Park, Jihyun
    • The Journal of the Acoustical Society of Korea / v.36 no.6 / pp.436-441 / 2017
  • Computing music similarity is indispensable in constructing a music retrieval system. Among the various music-retrieval tasks, this paper focuses on cover song search. We investigate a cover song search method based on the chroma n-gram to reduce the storage needed for the feature DB and to enhance search accuracy. Specifically, we propose the t-tab n-gram, an n-gram selection method, and an n-gram set comparison method. Experiments on a widely used music dataset confirmed that the proposed method improves cover song search accuracy and reduces feature storage.

Determination of Microbial Community as an Indicator of Kimchi Fermentation (김치발효의 지표로서 미생물군집의 측정)

  • Han, Hong-Ui;Lim, Chong-Rak;Park, Hyun-Kun
    • Korean Journal of Food Science and Technology / v.22 no.1 / pp.26-32 / 1990
  • Attempts were made to define the characteristics of the microbial community as an indicator of Kimchi fermentation. Communities were determined by simple Gram stain followed by direct microscopic counts. In room-temperature fermentation $(15^{\circ}C)$, microbial succession occurred in the order of Gram-positive bacteria, yeasts, and Gram-negative bacteria. Characteristically, the Gram-positive bacterial community developed during the production of lactic acid, the yeast community developed and caused rancidity, and the Gram-negative bacterial community was associated with maceration (or softening) as well as rancidity. The fluctuation of the apparent Gram-negative reaction group might be used as a criterion for the death or aging of Gram-positive bacterial populations. In low-temperature fermentation $(5^{\circ}C)$, however, the yeast and Gram-negative bacterial communities did not develop; only the Gram-positive bacterial community did. It follows from these results that the maturity of Kimchi depends on the development of the Gram-positive bacterial community. Thus, the size and occurrence of microbial communities can serve as an indicator of Kimchi fermentation, and determining the community could be a useful method for predicting maturity.


A Study on the Air Counts and the Infection of Maternity in a General Hospital (병실 낙하균 및 산모감염에 관한 연구)

  • 이남희
    • Journal of Korean Academy of Nursing / v.9 no.2 / pp.17-26 / 1979
  • This research aims to prevent maternity infection in the hospital by examining microbial contamination of maternity patients through airborne microbes and through those who work in the O.B. & G.Y. ward, and to furnish basic data for hospital management. Bacterial growth from airborne contamination of the hospital air, and whether the nasal cavities of people passing through the O.B. & G.Y. ward (doctors, nurses, parturient women) were contaminated, were examined at "E" University Hospital from July to August 1979 using thioglycollate broths and agar plates. The following results were obtained: 1. The average colony counts of airborne microbes were as follows: the pediatric ward (36 colonies), the internal ward (33 colonies), the O.B. & G.Y. ward (30 colonies), the surgical ward (24 colonies), the delivery-waiting room (11 colonies), and the delivery room (3 colonies). 2. Bacterial growth differed between morning and afternoon: the afternoon count (24 colonies) was higher than the morning count (21 colonies). 3. The strains isolated from the air of the wards were staphylococci (82%), Gram-negative bacilli (18%), fungi (17%), Gram-positive diplococci (13%), and Bacillus subtilis (2.8%). 4. The strains isolated in the delivery-waiting room were staphylococci (66.7%) and Gram-negative bacilli (33.6%); those isolated in the delivery room were staphylococci (75%), Gram-positive diplococci (8.3%), and fungi (8.3%). 5. The strains isolated in the O.B. & G.Y. ward were mostly staphylococci (100.0%), with Gram-positive diplococci (8.3%) and Gram-negative bacilli (6.7%). 6. The strains isolated in the surgical ward were staphylococci (91.7%), fungi (33.3%), Gram-positive diplococci (25%), Gram-negative bacilli (25%), and Bacillus subtilis (8.3%). 7. The strains isolated in the pediatric ward were staphylococci (75%), fungi (25%), Gram-positive diplococci (8.3%), Bacillus subtilis (8.3%), and Gram-negative bacilli (8.3%). 8. The strains isolated in the internal ward were staphylococci (91.7%), fungi (33.3%), Gram-positive diplococci (25%), and Gram-negative bacilli (16.7%). 9. The strains isolated from the nasal cavities of doctors working in the O.B. & G.Y. ward were staphylococci (80%), Bacillus subtilis (10%), and Gram-negative bacilli (10%); those from nurses were the same except that Gram-positive diplococci (10%) appeared instead of Gram-negative bacilli (10%). 10. The strains isolated from the nasal cavities of parturient women were staphylococci (90%) and Gram-negative bacilli (10%) on admission, and staphylococci (70%), Gram-positive diplococci (10%), and Gram-negative bacilli (10%) after admission. 11. Of the 91 staphylococci isolated from the air of the wards, 36 (39.6%) were coagulase-positive and 55 (60.4%) were coagulase-negative. In the coagulase test, all staphylococci isolated from the nasal cavities of those working in the O.B. & G.Y. ward were negative, that is, non-pathogenic. 12. Biochemical examination of the Gram-negative bacilli showed Aerobacter aerogenes (16.7%) in the air of the wards, and E. coli (5%) and Aerobacter aerogenes (7.5%) in the nasal cavities of those who came and went from the O.B. & G.Y. ward.


Antibacterial Activity of CNT-Ag and GO-Ag Nanocomposites Against Gram-negative and Gram-positive Bacteria

  • Yun, Hyosuk;Kim, Ji Dang;Choi, Hyun Chul;Lee, Chul Won
    • Bulletin of the Korean Chemical Society / v.34 no.11 / pp.3261-3264 / 2013
  • Carbon nanocomposites composed of carbon nanostructures and metal nanoparticles have become useful materials for various applications. Here we present the preparation and antibacterial activity of CNT-Ag and GO-Ag nanocomposites. Their physical properties were characterized by TEM, XPS, and Raman measurements, revealing that similarly sized, quasi-spherical Ag nanoparticles were anchored to the surfaces of the CNT and GO. The antibacterial activities of CNT-Ag and GO-Ag were investigated using the growth curve method and minimal inhibitory concentrations against Gram-negative and Gram-positive bacteria. The antibacterial activities of the carbon nanocomposites differed slightly between Gram-positive and Gram-negative bacteria. A proposed mechanism is discussed.

Comparative Analysis of 4-gram Word Clusters in South vs. North Korean High School English Textbooks (남북한 고등학교 영어교과서 4-gram 연어 비교 분석)

  • Kim, Jeong-ryeol
    • The Journal of the Korea Contents Association / v.20 no.7 / pp.274-281 / 2020
  • N-gram analysis offers a new look at n-word clusters in use, which differ from previously known idioms. It mechanically analyzes a corpus of English textbooks for frequently occurring sequences of n consecutive words using concordance software. The current paper aims to extract and compare 4-gram word clusters in South Korean high school English textbooks and their North Korean counterparts. The classification criteria include the numbers of tokens and types in the two sets of textbooks across oral and written language. The criteria also use grammatical and functional categories to classify and compare the 4-gram word clusters. The grammatical categories include noun phrases, verb phrases, prepositional phrases, partial clauses, and others. The functional categories include deictic function, text organizers, stance, and others. The findings are as follows: South Korean high school English textbooks contain more tokens and types in both oral and written language. Verb phrase and partial clause 4-grams are the most frequently encountered grammatical categories in both South and North Korean high school English textbooks. Stance is the most dominant functional category in both South and North Korean English textbooks.
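The mechanical extraction step described in this abstract, counting frequently occurring runs of n consecutive words and reporting tokens and types, can be sketched in Python as below. This is an illustrative stand-in for the concordance software, which the abstract does not name. The function names and the simple tokenizer are assumptions, and for brevity the sketch lets 4-grams run across sentence boundaries.

```python
import re
from collections import Counter

def word_ngrams(text, n=4):
    """Return all runs of n consecutive words in lower-cased text."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def cluster_stats(text, n=4):
    """Count tokens (all occurrences) and types (distinct clusters) of word n-grams."""
    counts = Counter(word_ngrams(text, n))
    return {"tokens": sum(counts.values()), "types": len(counts), "top": counts.most_common(3)}

stats = cluster_stats("I would like to go. I would like to stay.")
print(stats["tokens"], stats["types"], stats["top"][0])
```

Running `cluster_stats` separately on each textbook corpus would give the token and type counts that the comparison relies on; the grammatical and functional classification is a manual or rule-based step that this sketch does not cover.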

Accurate Intrusion Detection using n-Gram Augmented Naive Bayes (N-Gram 증강 나이브 베이스를 이용한 정확한 침입 탐지)

  • Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2008.10a / pp.285-288 / 2008
  • The n-gram approach has been widely applied in many intrusion detection applications. However, it has shown a few problems, including double counting of features. To address those problems, we applied n-gram augmented Naive Bayes directly to classify intrusive sequences and compared its performance with that of Naive Bayes and Support Vector Machines (SVM) using n-gram features, in experiments on host-based intrusion detection benchmark data sets. Experimental results on the University of New Mexico (UNM) benchmark data sets show that the n-gram augmented method, which solves the independence violation that occurs when n-gram features are applied directly to Naive Bayes (i.e., Naive Bayes with n-gram features), yields intrusion detectors with higher accuracy than Naive Bayes with n-gram features and accuracy comparable to SVM with n-gram features.
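One way to avoid the double counting mentioned in this abstract is to score each element of a sequence once, conditioned on its n-1 predecessors, instead of treating overlapping n-grams as independent features. The Python sketch below takes that approach with a per-class Markov chain and add-one smoothing. It is only one assumed reading of "n-gram augmented Naive Bayes", not the author's model; the class name, the smoothing, and the toy system-call data are hypothetical.

```python
import math
from collections import Counter

def ngrams(seq, n):
    """Return all length-n windows of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

class NGramChainClassifier:
    """Per-class order-(n-1) Markov model with add-one smoothing (an assumed reading)."""

    def __init__(self, n=3):
        self.n = n
        self.grams, self.contexts, self.priors, self.vocab = {}, {}, Counter(), set()

    def fit(self, sequences, labels):
        for seq, label in zip(sequences, labels):
            self.priors[label] += 1
            self.vocab.update(seq)
            windows = ngrams(seq, self.n)
            self.grams.setdefault(label, Counter()).update(windows)
            self.contexts.setdefault(label, Counter()).update(g[:-1] for g in windows)
        return self

    def log_score(self, seq, label):
        v = len(self.vocab) + 1  # one extra slot for unseen symbols
        score = math.log(self.priors[label] / sum(self.priors.values()))
        for g in ngrams(seq, self.n):
            # P(last symbol | previous n-1 symbols, class), each position counted once.
            score += math.log((self.grams[label][g] + 1) / (self.contexts[label][g[:-1]] + v))
        return score

    def predict(self, seq):
        return max(self.priors, key=lambda label: self.log_score(seq, label))

normal = [["open", "read", "write", "close"] * 2, ["open", "read", "read", "write", "close"]]
attack = [["open", "exec", "chmod", "exec", "chmod", "exec"], ["exec", "chmod", "exec", "open", "exec"]]
clf = NGramChainClassifier(n=3).fit(normal + attack, ["normal"] * 2 + ["intrusion"] * 2)
print(clf.predict(["exec", "chmod", "exec", "chmod"]))
```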


Differentiation of mixed bacterial populations by modified gram stain (수정된 Gram 염색법에 의한 혼합세균 개체군의 분별 측정)

  • 장진경;임종락;정계효;한홍의
    • Korean Journal of Microbiology / v.25 no.3 / pp.244-248 / 1987
  • Attempts were made to enumerate Gram-positive and Gram-negative bacteria rapidly and simultaneously during the development of natural fermentation. A general Gram stain was applied in this study. The number of cells counted by Gram stain was proportional to the cell turbidity measured by spectrophotometer up to an absorbance of 0.7 at 610 nm. The cells washed out during the procedure did not exceed about 8 percent. The standard error of separate counts in a mixture of Escherichia coli and Micrococcus luteus was $5.1\pm2.3$%. The possible counting range was $5.5\times 10^{7}-1.0\times 10^{9}$ cells/ml. Therefore, it is believed that a general Gram stain could also be applied to the separate counting of mixed Gram-positive and Gram-negative bacterial populations. As practical examples, the growth kinetics of hemp retting and Kimchi fermentation are presented.


A Study on Pseudo N-gram Language Models for Speech Recognition (음성인식을 위한 의사(疑似) N-gram 언어모델에 관한 연구)

  • 오세진;황철준;김범국;정호열;정현열
    • Journal of the Institute of Convergence Signal Processing / v.2 no.3 / pp.16-23 / 2001
  • In this paper, we propose pseudo n-gram language models for medium-vocabulary speech recognition, in contrast to large-vocabulary speech recognition, which uses statistical n-gram language models. The proposed method is very simple: it keeps the standard ARPA structure and sets the word probabilities arbitrarily. First, the 1-gram sets the word occurrence probability to 1 (log likelihood 0.0). Second, the 2-gram also sets the word occurrence probability to 1 and can only connect the word start symbol <s> with WORD, and WORD with the word end symbol </s>. Finally, the 3-gram also sets the word occurrence probability to 1 and can only connect the word start symbol <s>, WORD, and the word end symbol </s>. To verify the effectiveness of the proposed method, word recognition experiments were carried out. The preliminary (off-line) experimental results show an average word accuracy of 97.7% for 452 words uttered by 3 male speakers. The on-line word recognition results show an average word accuracy of 92.5% for 20 words uttered by 20 male speakers, drawn from a vocabulary of 1,500 stock names. Through these experiments, we have verified the effectiveness of the pseudo n-gram language models for speech recognition.
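The structure described in this abstract can be illustrated by generating a small ARPA-style file in Python. The sketch assumes the conventional `<s>` and `</s>` markers for the word start and end symbols, omits backoff weights for brevity (a real toolkit may require them), and uses a hypothetical function name and placeholder vocabulary.

```python
def pseudo_arpa(words, start="<s>", end="</s>"):
    """Render a pseudo 3-gram model in ARPA layout with every log10 probability set to 0.0."""
    unigrams = [start, end] + list(words)
    # 2-grams may only link the start symbol to a word, or a word to the end symbol.
    bigrams = [f"{start} {w}" for w in words] + [f"{w} {end}" for w in words]
    # 3-grams may only link start symbol, one word, and end symbol.
    trigrams = [f"{start} {w} {end}" for w in words]
    sections = [unigrams, bigrams, trigrams]
    lines = ["\\data\\"]
    lines += [f"ngram {k}={len(s)}" for k, s in enumerate(sections, start=1)]
    for k, entries in enumerate(sections, start=1):
        lines += ["", f"\\{k}-grams:"] + [f"0.0\t{e}" for e in entries]
    lines += ["", "\\end\\"]
    return "\n".join(lines)

print(pseudo_arpa(["WORD_A", "WORD_B"]))
```

Because every entry has probability 1, the model constrains the recognizer to exactly one vocabulary word per utterance without ranking the words, which is consistent with the isolated-word recognition experiments the abstract reports.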
