• Title/Summary/Keyword: word analysis


Forgery Detection Mechanism with Abnormal Structure Analysis on Office Open XML based MS-Word File

  • Lee, HanSeong;Lee, Hyung-Woo
    • International journal of advanced smart convergence / v.8 no.4 / pp.47-57 / 2019
  • We examine weaknesses in the OOXML-based MS-Word file structure and analyze how data concealment and forgery are carried out in MS-Word documents. When hidden information is inserted into an MS-Word document, the file opens in MS-Word with no visible difference. However, malware or shellcode hidden in the document may cause the computer system to malfunction. If a malicious image file or ZIP file is hidden in the document by exploiting a structural vulnerability of the MS-Word format, the system may be infected by ransomware that encrypts all files on the disk, even though the MS-Word file itself opens normally. It is therefore necessary to detect forgery and alteration of digital documents by analyzing the internal structure of the MS-Word file. In this paper, we design and implement a mechanism and automatic detection software that detect such forgery efficiently, and we present a method for proactively responding to attacks, such as ransomware, that exploit MS-Word security vulnerabilities.

Analysis of Lexical Effect on Spoken Word Recognition Test (한국어 단음절 낱말 인식에 미치는 어휘적 특성의 영향)

  • Yoon, Mi-Sun;Yi, Bong-Won
    • MALSORI / no.54 / pp.15-26 / 2005
  • The aim of this paper was to analyze lexical effects on the spoken word recognition of Korean monosyllabic words. The lexical factors examined were the frequency, density, and lexical familiarity of words. The analysis showed that frequency was a significant predictor of the spoken word recognition score for monosyllabic words; the other factors were not significant. This result suggests that word frequency should be considered in speech perception tests.


Word Sense Disambiguation Using Embedded Word Space

  • Kang, Myung Yun;Kim, Bogyum;Lee, Jae Sung
    • Journal of Computing Science and Engineering / v.11 no.1 / pp.32-38 / 2017
  • Determining the correct word sense among ambiguous senses is essential for semantic analysis. One model for word sense disambiguation is the word space model, which is structurally simple and effective. However, when the context word vectors in the word space model are merged into sense vectors in a sense inventory, the sense vectors typically become very large yet still suffer from lexical scarcity. In this paper, we propose a word sense disambiguation method using word embedding, which makes the sense inventory vectors compact and efficient owing to its additive compositionality. Results of experiments with a Korean sense-tagged corpus show that our method is very effective.
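
The idea of additive compositionality can be sketched as follows: context word embeddings are summed into one vector, and the sense whose inventory vector is closest by cosine similarity is chosen. This is a minimal illustration with made-up three-dimensional vectors, not the method or data of the paper; the names `disambiguate`, `embeddings`, and `sense_inventory` are assumptions for the example.

```python
import math


def add_vectors(vectors):
    """Component-wise sum of equal-length vectors."""
    return [sum(components) for components in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def disambiguate(context_words, embeddings, sense_inventory):
    """Pick the sense whose vector is most similar to the summed context."""
    known = [embeddings[w] for w in context_words if w in embeddings]
    if not known:
        return None
    context_vec = add_vectors(known)
    return max(sense_inventory, key=lambda s: cosine(context_vec, sense_inventory[s]))


# Toy embeddings (hypothetical values chosen for illustration only).
embeddings = {
    "money": [1.0, 0.0, 0.0],
    "loan": [0.9, 0.1, 0.0],
    "river": [0.0, 1.0, 0.0],
    "water": [0.1, 0.9, 0.0],
}

# Each sense vector is the sum of embeddings of words seen with that sense,
# so it stays at the embedding dimension instead of growing with the vocabulary.
sense_inventory = {
    "bank/finance": add_vectors([embeddings["money"], embeddings["loan"]]),
    "bank/river": add_vectors([embeddings["river"], embeddings["water"]]),
}

print(disambiguate(["loan", "money"], embeddings, sense_inventory))
print(disambiguate(["water"], embeddings, sense_inventory))
```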

Impact of Word Embedding Methods on Performance of Sentiment Analysis with Machine Learning Techniques

  • Park, Hoyeon;Kim, Kyoung-jae
    • Journal of the Korea Society of Computer and Information / v.25 no.8 / pp.181-188 / 2020
  • In this study, we conduct a comparative study to examine the impact of various word embedding techniques on the performance of sentiment analysis. Sentiment analysis is an opinion mining technique that uses natural language processing to identify and extract subjective information from text, and it can be used to classify the sentiment of product reviews or comments. Since sentiment can be classified as either positive or negative, it can be treated as a general classification problem. For sentiment analysis, the text must be converted into a form that a computer can process. Text such as a word or document is therefore transformed into a vector, a process known in natural language processing as word embedding. Various techniques, such as Bag of Words, TF-IDF, and Word2Vec, are used as word embedding techniques. Until now, there have not been many studies on which word embedding techniques are suitable for sentiment analysis. In this study, Bag of Words, TF-IDF, and Word2Vec are used to compare and analyze the performance of movie review sentiment analysis. The research data set for this study is the IMDB data set, which is widely used in text mining. The results show that TF-IDF and Bag of Words outperformed Word2Vec and that TF-IDF performed better than Bag of Words, although the difference was not very significant.
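
To make the two count-based representations concrete, the sketch below builds Bag of Words count vectors and TF-IDF vectors for two toy reviews in plain Python. It assumes whitespace tokenization and the plain `log(N / df)` form of inverse document frequency, which is only one of several common variants; the study does not state which variant or library it used, and the function names here are hypothetical.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the sorted vocabulary and one raw count vector per document."""
    vocab = sorted({word for doc in docs for word in doc.split()})
    rows = []
    for doc in docs:
        counts = Counter(doc.split())
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def tf_idf(docs):
    """Weight raw counts by idf = log(N / df); a term in every document gets 0."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weighted = [[row[j] * idf[j] for j in range(len(vocab))] for row in counts]
    return vocab, weighted


reviews = ["great movie great cast", "boring movie"]
vocab, bow_rows = bag_of_words(reviews)
_, tfidf_rows = tf_idf(reviews)

print(vocab)       # ['boring', 'cast', 'great', 'movie']
print(bow_rows)    # [[0, 1, 2, 1], [1, 0, 0, 1]]
print(tfidf_rows)
```

In this toy case "movie" appears in both reviews, so its TF-IDF weight is zero, while Bag of Words still counts it. Either set of vectors can then be passed to a classifier.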

Performance Analysis of Opinion Mining using Word2vec (Word2vec을 이용한 오피니언 마이닝 성과분석 연구)

  • Eo, Kyun Sun;Lee, Kun Chang
    • Proceedings of the Korea Contents Association Conference / 2018.05a / pp.7-8 / 2018
  • This study analyzes Word2vec-based machine learning classifiers for opinion mining tasks. BOW (Bag-of-Words) was adopted as the benchmark method. Using Word2vec and BOW as feature extraction methods, we applied the Laptop and Restaurant datasets to LR, DT, SVM, and RF classifiers. The results showed that Word2vec feature extraction yields better performance.


Analysis of Lexical Effect on Spoken Word Recognition Test (낱말 인식 검사에 대한 어휘적 특성의 영향 분석)

  • Yoon, Mi-Sun;Yi, Bong-Won
    • Proceedings of the KSPS conference / 2005.04a / pp.77-80 / 2005
  • The aim of this paper was to analyze lexical effects on the spoken word recognition of Korean monosyllabic words. The lexical factors examined were the frequency, density, and lexical familiarity of words. The analysis showed that frequency was a significant predictor of the spoken word recognition score for monosyllabic words; the other factors were not significant. This result suggests that word frequency should be considered in speech perception tests.


Strong (stressed) syllables in English and lexical segmentation by Koreans (영어의 강음절(강세 음절)과 한국어 화자의 단어 분절)

  • Kim, Sun-Mi;Nam, Ki-Chun
    • Phonetics and Speech Sciences / v.3 no.1 / pp.3-14 / 2011
  • It has been posited that in English, native listeners use the Metrical Segmentation Strategy (MSS) for the segmentation of continuous speech. Native English speakers tend to perceive strong syllables as potential word onsets, owing to the high proportion of word-initial strong syllables in the English vocabulary. This study investigates whether Koreans employ the same strategy when segmenting speech input in English. Word-spotting experiments were conducted using vowel-initial and consonant-initial bisyllabic targets embedded in nonsense trisyllables in Experiments 1 and 2, respectively. The effect of a strong syllable was significant in the RT (reaction time) analysis but not in the error analysis. In both experiments, Korean listeners detected words more slowly when the word-initial syllable was strong (stressed) than when it was weak (unstressed). However, the error analysis showed no effect of initial stress in Experiment 1 or in the item (F2) analysis in Experiment 2. Only the subject (F1) analysis in Experiment 2 showed that the participants made more errors when the word started with a strong syllable. These findings suggest that Korean listeners do not use the Metrical Segmentation Strategy for segmenting English speech. They do not treat strong syllables as word beginnings, but rather have difficulty recognizing words that start with a strong syllable. These results are discussed in terms of the intonational properties of Korean prosodic phrases, which are found to serve as lexical segmentation cues in the Korean language.


A Study on the Development of a Practical Morphological Analysis System Based on Word Analysis (어절 분석 기반 형태소 분석 시스템 개발에 관한 연구)

  • 조현양;최성필;최재황
    • Journal of the Korean Society for information Management / v.18 no.2 / pp.105-124 / 2001
  • The purpose of this study is to develop a Korean word analysis system, based on various methods of word analysis, that can improve the performance of information retrieval systems (IRS). We focused on maximizing the speed of Korean word analysis, modularizing each functional system, and analyzing Korean morphemes precisely. The system implements an optimized algorithm to increase the speed of word analysis and allows the speed and performance of each subsystem to be verified. In addition, numeral analysis based on numeral pattern recognition was implemented to reduce the system load by avoiding recursive analysis of compound nouns.


The Effect of Online Word-of-mouth on Fashion Involvement and Internet Purchase Behavior (온라인 패션 구전에 따른 패션제품 관여와 인터넷 구매행동)

  • Song, So-Jin;Hwang, Jin-Sook
    • Journal of the Korean Society of Clothing and Textiles / v.31 no.3 s.162 / pp.410-419 / 2007
  • The purposes of this study were to segment consumers by online word of mouth and to find the differences among the segmented groups in regard to fashion involvement, internet perceived risk, and internet purchase behavior. The subjects of this study were female online cafe members in Korea. The data were collected during October 2004. The respondents returned the questionnaires through the internet, and 480 questionnaires were used in the final data analysis. The statistical analyses used for the study were factor analysis, cluster analysis, t-test, and $\chi^2$ test. The results showed that word-of-mouth communication on the internet (e-WOM) is composed of two factors, word-of-mouth transmission and word-of-mouth acceptance. Cluster analysis on these two factors classified respondents into two word-of-mouth communication groups: a WOM group and a non-WOM group. The t-test showed that the two groups were significantly different in regard to fashion involvement, internet perceived risk, and internet purchase behavior. For example, the WOM group was more uncertain of its clothing choices, put more weight on the internal factors of clothing selection, and purchased internet fashion products more frequently. Internet fashion businesses need to implement proper marketing strategies based on the results of the study.

Comparison between Word Embedding Techniques in Traditional Korean Medicine for Data Analysis: Implementation of a Natural Language Processing Method (한의학 고문헌 데이터 분석을 위한 단어 임베딩 기법 비교: 자연어처리 방법을 적용하여)

  • Oh, Junho
    • Journal of Korean Medical classics / v.32 no.1 / pp.61-74 / 2019
  • Objectives: The purpose of this study is to help select an appropriate word embedding method when analyzing East Asian traditional medicine texts as data. Methods: Based on prescription data that imply traditional methods in traditional East Asian medicine, we examined 4 count-based word embedding methods and 2 prediction-based word embedding methods. In order to compare these word embedding methods intuitively, we proposed a "prescription generating game" and compared its results with those from the application of the 6 methods. Results: When the adjacent vectors are extracted, the count-based word embedding methods derive the main herbs that are frequently used in conjunction with each other. In contrast, the prediction-based word embedding methods derive the synonyms of the herbs. Conclusions: Count-based word embedding methods seem to be more effective than prediction-based word embedding methods in analyzing the use of domesticated herbs. Among count-based word embedding methods, the TF-vector method tends to exaggerate the frequency effect, and hence the TF-IDF vector or co-word vector may be a more reasonable choice. The t-score vector may also be recommended when searching for unusual information that cannot be found from frequency alone. Prediction-based embedding, on the other hand, seems to be effective when deriving the bases of similar meanings in context.
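
A count-based co-word vector of the kind mentioned above can be sketched as follows: each herb is represented by how often it appears in the same prescription as every other herb, so its largest components point to herbs frequently used together with it. The prescriptions and herb labels below are placeholders, and the functions are assumptions made for illustration rather than the procedure used in the study.

```python
from itertools import permutations


def co_word_vectors(prescriptions):
    """Map each herb to a vector of co-occurrence counts over the vocabulary."""
    vocab = sorted({herb for p in prescriptions for herb in p})
    index = {herb: i for i, herb in enumerate(vocab)}
    vectors = {herb: [0] * len(vocab) for herb in vocab}
    for p in prescriptions:
        # Count each ordered pair of distinct herbs once per prescription.
        for a, b in permutations(set(p), 2):
            vectors[a][index[b]] += 1
    return vocab, vectors


def most_co_used(herb, vocab, vectors):
    """Return the herb that most often shares a prescription with `herb`."""
    row = vectors[herb]
    best = max(range(len(vocab)), key=lambda j: row[j])
    return vocab[best]


# Placeholder prescriptions; "A".."D" stand in for herb names.
prescriptions = [["A", "B", "C"], ["A", "B"], ["C", "D"]]
vocab, vectors = co_word_vectors(prescriptions)

print(vectors["A"])                        # [0, 2, 1, 0]
print(most_co_used("A", vocab, vectors))   # B
```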