• Title/Summary/Keyword: Modified n-gram

A Modified Binary n-gram Algorithm for the postprocessing of the Automatic Document Reading (자동문서판독 후처리를 위한 수정된 n-gram 알고리즘)

  • Kim, Il-Hwoe;Ryoo, Keun-Ho;Lee, Cheol-Hee
    • Proceedings of the KIEE Conference / 1987.07b / pp.1352-1355 / 1987
  • This paper proposes a modified binary n-gram algorithm for a contextual postprocessing system for English sentences. A backward gram is used to correct errors in the first position of a word. The algorithm requires no additional storage but more comparisons, and it allows an interactive correction routine. Experiments were implemented in Pascal on an IBM PC/XT microcomputer. The algorithm improves the correction rate by around 4 to 5% in a limited experimental environment.

Detecting Spectre Malware Binary through Function Level N-gram Comparison (함수 단위 N-gram 비교를 통한 Spectre 공격 바이너리 식별 방법)

  • Kim, Moon-Sun;Yang, Hee-Dong;Kim, Kwang-Jun;Lee, Man-Hee
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.6 / pp.1043-1052 / 2020
  • Signature-based malicious code detection methods share a common limitation: it is very hard to detect modified malicious code or new malware that exploits zero-day vulnerabilities. To overcome this limitation, many studies classify malicious code using N-grams. Although these methods detect malicious code with high accuracy, they have difficulty identifying malicious code that is very short, such as Spectre. We propose a function-level N-gram comparison algorithm to effectively identify Spectre binaries. To test the validity of this algorithm, we built N-gram data sets from 165 normal binaries and 25 malicious binaries. In the model performance experiments, Random Forest models identified Spectre malicious functions with 99.99% accuracy and an F1-score of 92%.

Sentence Similarity Measurement Method Using a Set-based POI Data Search (집합 기반 POI 검색을 이용한 문장 유사도 측정 기법)

  • Ko, EunByul;Lee, JongWoo
    • KIISE Transactions on Computing Practices / v.20 no.12 / pp.711-716 / 2014
  • With growing interest in plagiarism detection and intelligent file content search, the demand for measuring the similarity between two sentences is increasing. There has been much research on sentence similarity measurement in various directions, such as n-gram, edit distance, and LSA. However, each of these methods has its own advantages and disadvantages. In this paper, we propose a new sentence similarity measurement method that approaches the problem from another direction. The proposed method uses the set-based POI data search, which improves search performance over the existing hard matching method when the data include inversion, omission, insertion, and revision of characters. Using this method, we are able to measure the similarity between two sentences more accurately and more quickly. We modified the data loading and text search algorithms of the set-based POI data search. We also added a word operation algorithm and a similarity measure between two sentences, expressed as a percentage. The experimental results show that our sentence similarity measurement method performs better than n-gram and the set-based POI data search.

Emotion Prediction of Paragraph using Big Data Analysis (빅데이터 분석을 이용한 문단 내의 감정 예측)

  • Kim, Jin-su
    • Journal of Digital Convergence / v.14 no.11 / pp.267-273 / 2016
  • The creation and sharing of information, including structured data as well as various unstructured data, is progressing actively with the spread of mobile devices. Recently, big data techniques such as data mining have been used to extract semantic information from SNS. In particular, general emotion analysis, which expresses the collective intelligence of the masses, draws on large and varied materials. In this paper, we propose an emotion prediction system architecture that extracts significant keywords from social network paragraphs using n-grams and a Korean morphological analyzer, and predicts the emotion using an SVM and the extracted emotion features. The proposed system showed an average recall improvement of 82.25% over previous systems, and it will help extract semantic keywords using morphological analysis.

Measurement for License Identification of Open Source Software (오픈소스 소프트웨어 라이선스 파일 식별 기술)

  • Yun, Ho-Yeong;Joe, Yong-Joon;Jung, Byung-Ok;Shin, Dong-Myung
    • Journal of Software Assessment and Valuation / v.12 no.2 / pp.1-8 / 2016
  • In this paper, we study how to extract and identify license files from a package, in order to prevent unintentional intellectual property infringement caused by lost, modified, or conflicting license information when open source software is redistributed. To investigate the characteristics of license files, we analyzed 322 licenses with n-gram and TF-IDF methods and extracted license files from the packages. We identified license information by its similarity to the registered licenses, measured by cosine similarity.
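
The n-gram, TF-IDF, and cosine steps named in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: word bigrams, the smoothed IDF formula, and the names `word_ngrams`, `tfidf_vectors`, `cosine`, and `identify_license` are all assumptions made for the example, and the license texts are toy strings.

```python
import math
from collections import Counter


def word_ngrams(text, n=2):
    """Split text into lowercase word n-grams (word bigrams by default)."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf_vectors(docs, n=2):
    """Build one sparse TF-IDF vector (dict) per document.

    Assumption: raw term frequency times a smoothed IDF,
    log((1 + N) / (1 + df)) + 1, so every weight stays positive.
    """
    counts = [Counter(word_ngrams(doc, n)) for doc in docs]
    doc_freq = Counter()
    for count in counts:
        doc_freq.update(count.keys())
    total = len(docs)
    return [
        {gram: tf * (math.log((1 + total) / (1 + doc_freq[gram])) + 1.0)
         for gram, tf in count.items()}
        for count in counts
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(gram, 0.0) for gram, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def identify_license(candidate, registered, n=2):
    """Return (name, score) of the registered license closest to candidate."""
    names = list(registered)
    vectors = tfidf_vectors([registered[name] for name in names] + [candidate], n)
    candidate_vec = vectors[-1]
    scores = {name: cosine(candidate_vec, vec) for name, vec in zip(names, vectors)}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data, not real license texts.
registered = {
    "MIT-like": "permission is hereby granted free of charge to any person obtaining a copy",
    "GPL-like": "this program is free software you can redistribute it and modify it under the terms",
}
print(identify_license("Permission is hereby granted free of charge to any person", registered))
```

In practice a threshold on the best score would presumably be needed to reject files that match no registered license; the abstract does not say how that case is handled.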

Effect n-3 Polyunsaturated Fatty Acids on Serum Lipoprotein and Lipid Compositions in Human Subjects (사람에서 n-3계 불포화지방산이 Serum Lipoprotein과 지질조성에 미치는 영향)

  • 박현서
    • Journal of Nutrition and Health / v.21 no.1 / pp.61-74 / 1988
  • Ten college women were divided into 5 groups and treated in a randomized block design for 5 weeks, with 1 interval between treatments and subjects serving as their own controls. The experimental diets were a corn oil diet as a source of n-6 linoleic acid, a perilla oil diet as a source of n-3 $\alpha$-linolenic acid, and a fish oil diet as a source of n-3 EPA and DHA. Dietary fat was supplied at 30% of calories and modified to keep the total amount of saturated fatty acids and monoenoic acids constant. The different PUFA had no significant effect on serum cholesterol level. However, on a gram-for-gram basis, the decrease in serum cholesterol tended to be proportionate to the degree of fat unsaturation. On the other hand, only the fish oil diet significantly decreased the TG level, but it had no significant effect on the relative proportion of TG in VLDL. The degree of hypotriglyceridemia did not correlate with the degree of unsaturation. The relative proportion of CE in LDL was reduced by all PUFA diets, but significantly only by the perilla oil diet. The relative amount of apoprotein in LDL was significantly reduced by n-3 PUFA. HDL-Chol content was significantly increased only by the fish oil diet, with no change in the relative proportions of the chemical components of HDL.

Emotion Prediction of Document using Paragraph Analysis (문단 분석을 통한 문서 내의 감정 예측)

  • Kim, Jinsu
    • Journal of Digital Convergence / v.12 no.12 / pp.249-255 / 2014
  • Recently, the creation and sharing of information have been progressing actively through SNS (Social Network Services) such as Twitter and Facebook. It is necessary to extract knowledge from the aggregated information, and data mining is one of the knowledge-based approaches. In particular, emotion analysis is a recent subdiscipline of text classification that is concerned with the massive collective intelligence found in opinion, policy, propensity, and sentiment. In this paper, we propose an emotion prediction method that extracts significant keywords and related keywords from SNS paragraphs and then predicts the emotion using these extracted emotion features.

Design of a Korean Speech Recognition Platform (한국어 음성인식 플랫폼의 설계)

  • Kwon Oh-Wook;Kim Hoi-Rin;Yoo Changdong;Kim Bong-Wan;Lee Yong-Ju
    • MALSORI / no.51 / pp.151-165 / 2004
  • For educational and research purposes, a Korean speech recognition platform is designed. It is based on an object-oriented architecture and can be easily modified so that researchers can readily evaluate the performance of a recognition algorithm of interest. This platform will save development time for many who are interested in speech recognition. The platform includes the following modules: noise reduction, end-point detection, mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP)-based feature extraction, hidden Markov model (HMM)-based acoustic modeling, n-gram language modeling, n-best search, and Korean language processing. The decoder of the platform can handle both lexical search trees for large-vocabulary speech recognition and finite-state networks for small-to-medium-vocabulary speech recognition. It performs a word-dependent n-best search with a bigram language model in the first forward search stage, then extracts a word lattice and rescores each lattice path with a trigram language model in the second stage.

Characterization of Immobilized Denitrifying Bacteria Isolated from Municipal Sewage

  • Kim, Joong-Kyun;Kim, Sung-Koo;Kim, Sang-Hee
    • Journal of Microbiology and Biotechnology / v.11 no.5 / pp.756-762 / 2001
  • As a component for a recirculating aquaculture system, a new strain of denitrifying bacterium was isolated from municipal sewage. The isolate was motile by means of one polar flagellum, catalase-positive, and a Gram-negative rod-shaped cell measuring $0.5$-$0.6\,\mu\mathrm{m}$ in width and $1.3$-$1.9\,\mu\mathrm{m}$ in length. The isolate was identified as Pseudomonas fluorescens and produced dinitrogen gas via the reduction of nitrate. The optimal growth conditions (pH, temperature, carbon source, and C/N ratio) of the isolate were found to be 6.8, $30^{\circ}\mathrm{C}$, malate, and 3, respectively. Under optimal growth conditions of P. fluorescens, dinitrogen gas was first detected in the exponential growth phase, then a small amount of nitrite was developed and converted to dinitrogen gas in the stationary phase. Pseudomonas fluorescens cells were immobilized in modified polyvinyl alcohol (PVA) gel beads, and the maximum denitrification rate was measured as $36.6\,\mu\mathrm{l}\ \mathrm{N_2}\ \mathrm{h}^{-1}$ per bead with an optimum cell loading of $20\,\mathrm{mg}\ \mu\mathrm{l}^{-1}$ and 2% sodium alginate added to the PVA gel. The operating stability of the modified PVA gel beads remained unchanged for up to 43 repeated batches.

Language Model based on VCCV and Test of Smoothing Techniques for Sentence Speech Recognition (문장음성인식을 위한 VCCV 기반의 언어모델과 Smoothing 기법 평가)

  • Park, Seon-Hee;Roh, Yong-Wan;Hong, Kwang-Seok
    • The KIPS Transactions:PartB / v.11B no.2 / pp.241-246 / 2004
  • In this paper, we propose VCCV units as the processing unit of a language model and compare them with clauses and morphemes, the existing processing units. Clauses and morphemes have large vocabularies and high perplexity, whereas VCCV units have low perplexity because of their small lexicon and limited vocabulary. Constructing a language model requires smoothing, which is used to better estimate probabilities when there is insufficient data to estimate them accurately. We built language models of morphemes, clauses, and VCCV units and calculated their perplexity. The perplexity of VCCV units is lower than that of morpheme and clause units. We constructed N-grams of the low-perplexity VCCV units and tested the language model using Katz, absolute, and modified Kneser-Ney smoothing, among others. In the experimental results, modified Kneser-Ney smoothing proved to be a suitable smoothing technique for VCCV units.
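
To make the perplexity comparison in this abstract concrete, the sketch below trains a bigram model with interpolated absolute discounting (one of the smoothing families the abstract names, not the modified Kneser-Ney variant the authors favor) and computes perplexity on a token sequence. The discount of 0.75, the add-one unigram fallback, the toy tokens, and the names `train_bigram` and `perplexity` are assumptions for illustration only; the tokens stand in for whatever unit (morpheme, clause, or VCCV) the model is built on.

```python
import math
from collections import Counter, defaultdict


def train_bigram(tokens, discount=0.75):
    """Return a function prob(w, v) = P(w | v) with absolute discounting.

    Assumption: the mass removed from seen bigrams is redistributed via an
    add-one-smoothed unigram model that reserves one slot for unseen tokens.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    contexts = Counter(tokens[:-1])          # how often v occurs as a history
    followers = defaultdict(set)             # distinct words seen after v
    for v, w in bigrams:
        followers[v].add(w)
    vocab_size = len(unigrams) + 1           # +1 slot for any unseen token
    total = len(tokens)

    def unigram_prob(w):
        return (unigrams[w] + 1) / (total + vocab_size)

    def prob(w, v):
        count_v = contexts[v]
        if count_v == 0:                     # unseen history: back off fully
            return unigram_prob(w)
        discounted = max(bigrams[(v, w)] - discount, 0.0) / count_v
        backoff_weight = discount * len(followers[v]) / count_v
        return discounted + backoff_weight * unigram_prob(w)

    return prob


def perplexity(prob, tokens):
    """exp of the average negative log-probability over the bigrams in tokens."""
    pairs = list(zip(tokens, tokens[1:]))
    log_sum = sum(math.log(prob(w, v)) for v, w in pairs)
    return math.exp(-log_sum / len(pairs))


# Hypothetical toy corpus.
model = train_bigram("a b a b a b".split())
print(perplexity(model, "a b a b".split()))
print(perplexity(model, "b a a b".split()))
```

A sequence made of bigrams seen in training scores a lower perplexity than one containing an unseen bigram, which is the quantity the paper compares across processing units.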