• Title/Summary/Keyword: n-gram similarity

Route matching delivery recommendation system using text similarity

  • Song, Jeongeun;Song, Yoon-Ah
    • Journal of the Korea Society of Computer and Information
    • /
    • v.27 no.8
    • /
    • pp.151-160
    • /
    • 2022
  • In this paper, we propose an algorithm that enables faster, lower-cost short-distance delivery to meet the growing demand for delivery services. The proposed algorithm involves subway passengers (shippers) in logistics as delivery agents. A passenger can select a delivery whose route matches his or her subway route, and a service user can in turn select a delivery agent whose route matches. Delivery agent recommendation is carried out with a text similarity measure that combines TF-IDF and N-gram with BERT. Unlike existing delivery systems, this supports two-way, person-to-person selection between consumers and delivery agents. Because passengers who are already on board carry the goods, both cost minimization and a shorter delivery period can be guaranteed. In addition, since transport requires no special skills, the approach is also meaningful in that it can provide opportunities for economic participation to workers whose job positions have been reduced.
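
The TF-IDF and N-gram half of such a similarity measure can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: it assumes routes are written as strings of station names, uses character bigrams, a smoothed IDF, and cosine similarity, and it omits the BERT component entirely. The function names, station names, and courier identifiers are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text, n=2):
    """Return the list of character n-grams of a string (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Build one sparse TF-IDF vector (a dict) per document over character n-grams."""
    counts = [Counter(char_ngrams(d, n)) for d in docs]
    # Document frequency: in how many documents each n-gram occurs.
    df = Counter(g for c in counts for g in c)
    total = len(docs)
    vectors = []
    for c in counts:
        # Smoothed IDF (an assumption of this sketch) so that n-grams
        # occurring in every document still get a non-zero weight.
        vectors.append({
            g: tf * (math.log((1 + total) / (1 + df[g])) + 1.0)
            for g, tf in c.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_couriers(request_route, courier_routes, n=2):
    """Rank couriers (hypothetical name -> route string) by similarity to a request."""
    docs = [request_route] + list(courier_routes.values())
    vecs = tfidf_vectors(docs, n)
    scores = {name: cosine(vecs[0], vec)
              for name, vec in zip(courier_routes, vecs[1:])}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    request = "Gangnam Yeoksam Seolleung Samseong"
    couriers = {
        "courier_a": "Gangnam Yeoksam Seolleung Samseong Jamsil",
        "courier_b": "Hongdae Sinchon Ewha Ahyeon",
    }
    for name, score in rank_couriers(request, couriers):
        print(name, round(score, 3))
```

Character n-grams tolerate small spelling differences between station names, which is one plausible reason to pair them with TF-IDF; a contextual model such as BERT would be combined on top of this score in a full system.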

Studies on Bacterial Characteristics of Bacillus cereus Group LS-1 Isolated from Suyeong Bay (수영만에서 분리된 Bacillus cereus Group LS-1 의 세균학적 특성에 관한 연구)

  • 성희경;이원재;김용호;함건주
    • Korean Journal of Microbiology
    • /
    • v.30 no.5
    • /
    • pp.339-346
    • /
    • 1992
  • These studies were carried out to identify the Bacillus cereus group LS-1 strain isolated from Suyeong Bay. This strain was differentiated from the B. cereus group using conventional tests, the API system, and fatty acid composition analysis. Colonies were opaque, mucoid, convex, and circular with an entire margin, and non-hemolytic on sheep blood agar plates; Gram-positive bacilli forming central spores were observed in a Gram-stained preparation, and the strain was non-motile. Of the carbohydrates tested, glucose, maltose, and sucrose were assimilated, but neither trehalose nor salicin was assimilated. This strain utilized gelatin and was inhibited by 6.5% NaCl. The results of the biochemical examination differentiated B. cereus group LS-1 from other members of the B. cereus group. The fatty acid composition contained major amounts of branched-chain acids, iso-$C_{15}$ and iso-$C_{13}$; chain lengths ranged from $C_{12}$ to $C_{17}$, and n-$C_{15}$ acid was not detected. The automated fatty acid computer profile indicated "B. mycoides GC subgroup B of 0.312 similarity index." The results agreed with other research cases. On the other hand, the ATB computer profile index of the API system (API 50 CHB & API 20E) identified a "doubtful profile of 99.7% B. firmus". These results show considerable discrepancies between the API system and fatty acid analysis. With 67 biochemical characters, the similarity matrix with B. mycoides (KCTC 1033), B. thuringiensis (KCTC 1033), B. cereus (S-3), and B. mycoides (S-12) showed 42%, 42%, 59%, and 52%, respectively. Through the key tests and fatty acid analyses, we noticed the appearance of B. mycoides of the B. cereus group, which leads us to suspect the existence of a new biotype of B. mycoides.

Physico-chemical Characteristics and Diversity of Marine Actinomycetes Isolated from the Coast of Jeju Island (제주 연안에서 분리된 해양방선균의 이화학적 특성 및 다양성)

  • Kim, Man-Chul;Heo, Moon-Soo
    • Korean Journal of Environmental Biology
    • /
    • v.28 no.4
    • /
    • pp.223-230
    • /
    • 2010
  • To investigate variations in physico-chemical factors at four stations (Hanlim, Aewol, Sinchon, Hamdeok) in the Jeju coastal area, water temperature, salinity, dissolved oxygen (DO), pH, chemical oxygen demand (COD), suspended solids (SS), ammonia-nitrogen ($NH_4-N$), nitrite-nitrogen ($NO_2-N$), and nitrate-nitrogen ($NO_3-N$) were analysed. Water temperature ranged from 26.23 to $28.6^{\circ}C$, salinity from 31.4 to 32.88‰, pH from 8.15 to 8.35, and chemical oxygen demand from 0.48 to 0.91 mg $L^{-1}$. A total of 52 strains of marine actinomycetes were isolated from the Jeju Island coastal area. They were characterized by their morphological and physio-biochemical properties and the API kit, and confirmed by molecular methods including partial sequencing of 16S rRNA. A neighbor-joining tree of partial 16S rRNA sequences divided the 52 isolates into 2 major groups: 22 strains of Gram-positive bacteria/Actinobacteria (division)/Actinomycetales (order)/Streptomycineae (suborder)/Streptomycetaceae (family)/Streptomyces (93.1%) and 2 strains of Streptosporangineae (suborder)/Nocardiopsaceae (family)/Nocardiopsis (6.9%).

Cohnella panacarvi sp. nov., a Xylanolytic Bacterium Isolated from Ginseng Cultivating Soil

  • Yoon, Min-Ho;Ten, Leonid N.;Im, Wan-Taek
    • Journal of Microbiology and Biotechnology
    • /
    • v.17 no.6
    • /
    • pp.913-918
    • /
    • 2007
  • A Gram-positive, aerobic, rod-shaped, nonmotile, endospore-forming bacterium, designated Gsoil $349^T$, was isolated from soil of a ginseng field and characterized using a polyphasic approach. Comparative analysis of 16S rRNA gene sequences revealed that the strain Gsoil $349^T$ belongs to the family Paenibacillaceae, and the sequence showed closest similarity with Cohnella thermotolerans DSM $17683^T$ (94.1%) and Cohnella hongkongensis DSM $17642^T$ (93.6%). The strain showed less than 91.3% 16S rRNA gene sequence similarity with Paenibacillus species. In addition, the presence of MK-7 as the major menaquinone and anteiso-$C_{15:0}$, iso-$C_{16:0}$, and $C_{16:0}$ as major fatty acids suggested its affiliation to the genus Cohnella. The G+C content of the genomic DNA was 53.4 mol%. On the basis of its phenotypic characteristics and phylogenetic distinctiveness, strain Gsoil $349^T$ should be treated as a novel species within the genus Cohnella for which the name Cohnella panacarvi sp. nov. is proposed. The type strain is Gsoil $349^T$ (= KCTC $13060^T$ = DSM $18696^T$).

Optimized Chinese Pronunciation Prediction by Component-Based Statistical Machine Translation

  • Zhu, Shunle
    • Journal of Information Processing Systems
    • /
    • v.17 no.1
    • /
    • pp.203-212
    • /
    • 2021
  • To eliminate ambiguities in the existing methods to simplify Chinese pronunciation learning, we propose a model that can predict the pronunciation of Chinese characters automatically. The proposed model relies on a statistical machine translation (SMT) framework. In particular, we consider the components of Chinese characters as the basic unit and consider the pronunciation prediction as a machine translation procedure (the component sequence as a source sentence, the pronunciation, pinyin, as a target sentence). In addition to traditional features such as the bidirectional word translation and the n-gram language model, we also implement a component similarity feature to overcome some typos during practical use. We incorporate these features into a log-linear model. The experimental results show that our approach significantly outperforms other baseline models.
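
For orientation, the general form of a log-linear model in statistical machine translation can be written as below. The abstract does not give the authors' formula, so the symbols are assumptions of this sketch: $c$ is the component sequence (source), $p$ a candidate pinyin sequence (target), $h_i$ are feature functions such as the translation scores, the n-gram language model, and the component similarity feature, and $\lambda_i$ are their weights.

```latex
\hat{p} \;=\; \arg\max_{p} \; \sum_{i=1}^{M} \lambda_i \, h_i(c, p)
```

The decoder selects the pinyin sequence with the highest weighted sum of feature scores, and the weights are typically tuned on held-out data.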

Measuring Similarity of Korean Sentences based on BERT (BERT 기반 한국어 문장의 유사도 측정 방법)

  • Hyeon, Jonghwan;Choi, Ho-Jin
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.383-387
    • /
    • 2019
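  • Automatic evaluation of natural language sentences is a technique that automatically compares a generated sentence with a reference sentence and measures the semantic similarity between the two. Such automatic evaluation can be used to assess the performance of natural language generation models in areas such as machine translation, summarization, and paraphrasing. Existing sentence similarity measures compute similarity by n-gram-based string comparison. This approach is very simple to compute, but it cannot reflect the diverse characteristics of natural language. In this paper, we propose a method for measuring the similarity of Korean sentences using BERT; for this purpose we use the eojeol-level KorBERT that ETRI pre-trained on a Korean corpus and released. As a result, we observed a performance improvement of about 13% over existing sentence similarity evaluation methods.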
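
The n-gram-based string comparison that serves as the baseline here can be sketched with a clipped n-gram precision, in the spirit of BLEU-style metrics. This is a minimal illustration under stated assumptions (whitespace tokenization, a single reference, hypothetical function names); the paper's exact baseline metric is not specified in the abstract.

```python
from collections import Counter


def word_ngrams(tokens, n):
    """Count the word n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=1):
    """Fraction of candidate n-grams that also occur in the reference.

    Each candidate n-gram is credited at most as many times as it
    appears in the reference (clipped counts).
    """
    cand = word_ngrams(candidate.split(), n)
    ref = word_ngrams(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


if __name__ == "__main__":
    generated = "the cat sat on the mat"
    reference = "the cat is on the mat"
    print(ngram_precision(generated, reference, n=1))
    print(ngram_precision(generated, reference, n=2))
    # A paraphrase with no shared words scores 0 despite similar meaning,
    # which is the limitation that motivates embedding-based measures.
    print(ngram_precision("a feline rested", reference, n=1))
```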

A Comparative Analysis of Content-based Music Retrieval Systems (내용기반 음악검색 시스템의 비교 분석)

  • Ro, Jung-Soon
    • Journal of the Korean Society for information Management
    • /
    • v.30 no.3
    • /
    • pp.23-48
    • /
    • 2013
  • This study compared and analyzed 15 CBMR (Content-based Music Retrieval) systems accessible on the web in terms of DB size and type, query type, access point, input and output type, and search functions. It also reviewed the features of music information and the techniques used in CBMR systems for transforming or transcribing music sources, extracting and segmenting melodies, extracting and indexing musical features, and matching. The analysis found the following: text information retrieval techniques (inverted indexing, N-gram indexing, Boolean search, truncation, keyword and phrase search, normalization, filtering, browsing, exact matching, similarity measures using edit distance, sorting, etc.) being applied to enhance CBMR; efforts to increase DB size and usability; and problems in extracting melodies, deleting stop notes in queries, and using solfege as pitch information.
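
One of the text retrieval techniques named above, a similarity measure based on edit distance, can be sketched as follows. This is an illustration only and is not taken from any of the surveyed systems: it assumes melodies are represented as sequences of solfege syllables, and the function names and the normalization by the longer sequence are choices made for this sketch.

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def melody_similarity(query, target):
    """Map edit distance to a similarity in [0, 1] (1 means identical)."""
    longest = max(len(query), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(query, target) / longest


if __name__ == "__main__":
    hummed = ["do", "re", "fa"]          # query with one note missing
    stored = ["do", "re", "mi", "fa"]    # indexed melody
    print(edit_distance(hummed, stored))
    print(melody_similarity(hummed, stored))
```

Because the distance counts insertions, deletions, and substitutions, a query with a dropped or wrong note still matches the stored melody with a high score, unlike exact matching.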

Isolation and Characterization of Nicotine-Degrading Bacterium Arthrobacter sp. NU11 and NU15 (니코틴 분해세균 Arthrobacter sp. NU11과 NU15의 분리 및 특성)

  • Jeong, Yeonju;Oh, Ji-Sung;Roh, Dong-Hyun
    • Korean Journal of Microbiology
    • /
    • v.50 no.1
    • /
    • pp.67-72
    • /
    • 2014
  • Minimal broth containing nicotine as the sole carbon source (MB/N) was used to isolate novel nicotine-degrading bacterial strains from tobacco plants and field soils. Comparative analysis of 16S rRNA gene sequences, together with phenotypic and morphological tests, placed these isolates in the genus Arthrobacter of the family Micrococcaceae. Among type strains of the genus Arthrobacter, isolate NU11 showed the highest 16S rRNA gene sequence similarity to Arthrobacter equi (98.2%) and was presumably a novel strain, while NU15 was most similar to Arthrobacter nicotinovorans (99.8%). Both NU11 and NU15 were rod-shaped, Gram-positive, and catalase-positive, but showed no oxidase activity. Analysis of UV absorption spectra showed that the novel strain NU11 efficiently degraded nicotine in MB/N medium, so it could be used as an organism in bioremediation.

A Text Reuse Measuring Model Using Circumference Sentence Similarity (주변 문장 유사도를 이용한 문서 재사용 측정 모델)

  • Choi, Sung-Won;Kim, Sang-Bum;Rim, Hae-Chang
    • Annual Conference on Human and Language Technology
    • /
    • 2005.10a
    • /
    • pp.179-183
    • /
    • 2005
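  • Existing text reuse detection models determine reuse by comparing the words or n-grams contained in each document or sentence. However, when only part of another document is reused, document-level checking is more error-prone, because the portions that were not reused lower the measured reuse score. Sentence-level checking, by contrast, compares the sentences within the documents being compared, so even when only part of a document is reused it compares the sentences inside the reused portion and is therefore more robust than document-level checking in such cases. Sentence-level comparison, however, operates on units much shorter than a document, which makes its results less reliable. To compensate for this weakness, this paper first performs sentence-level reuse checking and then uses the reuse results of each sentence's neighboring sentences to reduce the errors that arise in sentence-level checking.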
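
The idea of correcting a sentence's reuse score with those of its neighbors can be sketched as a simple interpolation. The abstract does not state how the neighboring results are combined, so everything here is an assumption of the sketch: per-sentence scores in [0, 1], one neighbor on each side, and a hypothetical `weight` parameter controlling how much the neighbors count.

```python
def smooth_with_neighbors(scores, weight=0.5):
    """Blend each sentence-level reuse score with the mean of its neighbors.

    scores: per-sentence reuse scores in document order.
    weight: share given to the neighbors (hypothetical parameter).
    """
    smoothed = []
    for i, score in enumerate(scores):
        # Collect the scores of the previous and next sentence, if they exist.
        neighbors = [scores[j] for j in (i - 1, i + 1) if 0 <= j < len(scores)]
        if neighbors:
            context = sum(neighbors) / len(neighbors)
            smoothed.append((1 - weight) * score + weight * context)
        else:
            smoothed.append(score)  # a lone sentence has no context
    return smoothed


if __name__ == "__main__":
    # A short sentence scored low although both neighbors look reused.
    raw = [0.9, 0.2, 0.9]
    print(smooth_with_neighbors(raw))
```

A short sentence inside a reused passage tends to get an unreliable score on its own; pulling it toward its neighbors raises it, while an isolated high score surrounded by unreused sentences is pulled down.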

Isolation and characterization of a novel membrane-bound cytochrome $C_{553}$ from the strictly anaerobic phototroph, Heliobacillus mobilis

  • Lee, Woo-Yiel;Bla;Kim, Seung-Ho
    • Journal of Microbiology
    • /
    • v.35 no.3
    • /
    • pp.206-212
    • /
    • 1997
  • Heliobacillus mobilis is a strictly anaerobic Gram-positive bacterium that contains a primitive Photosystem I-type reaction center. The membrane-bound cytochrome $C_{553}$ of this heliobacterium, which has been suggested to be the immediate electron donor to the photooxidized pigment ($P798^+$), has been isolated and characterized. The heme protein was visualized as a major component with an apparent molecular size of 17 kDa in TMBZ-staining analysis of the membrane preparation, and the partially purified sample showed the characteristic $\alpha$ (552.5 nm), $\beta$ (522 nm), and Soret (416 nm) absorption peaks of a typical reduced c-type cytochrome. An internal 43-amino-acid sequence of the electron donor was obtained by chemical agent and protease treatments followed by N-terminal sequencing of the resulting fragments. The internal sequence is rich in lysine residues and carries a Cys-X-X-Cys-His sequence motif, both characteristic of typical c-type cytochromes. Analysis of the sequence with the FAST or FASTA program, however, did not show any significant similarity to other known heme proteins.
