• Title/Summary/Keyword: Statistical Word Comparison

A Study on Statistical Feature Selection with Supervised Learning for Word Sense Disambiguation (단어 중의성 해소를 위한 지도학습 방법의 통계적 자질선정에 관한 연구)

  • Lee, Yong-Gu
    • Journal of the Korean BIBLIA Society for library and Information Science / v.22 no.2 / pp.5-25 / 2011
  • This study aims to identify the most effective statistical feature selection method and context window size for supervised word sense disambiguation. Features were selected by four methods: information gain, document frequency, chi-square, and relevancy. A comparison of the resulting weights showed that identifying the most appropriate features can improve word sense disambiguation performance, with information gain scoring highest. The SVM classifier was not affected by feature selection and performed better with a larger feature set and context size. The Naive Bayes classifier performed best at 10 percent of the feature set size, and the kNN classifier at under 10 percent. When feature selection methods are applied to word sense disambiguation, combining a small feature set with a larger context window, or a large feature set with a small context window, yields the best performance improvements.
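
As a rough illustration of one of the four selection methods named in the abstract, the sketch below ranks context words by information gain and keeps a fixed share of the vocabulary. The toy contexts, the sense labels, and the function names are assumptions made for this example and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of sense labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(contexts, senses, feature):
    """Entropy reduction from knowing whether `feature` occurs in a context."""
    present = [s for c, s in zip(contexts, senses) if feature in c]
    absent = [s for c, s in zip(contexts, senses) if feature not in c]
    total = len(senses)
    conditional = sum(len(part) / total * entropy(part)
                      for part in (present, absent) if part)
    return entropy(senses) - conditional

def select_features(contexts, senses, ratio):
    """Keep the top `ratio` share of context words, ranked by information gain."""
    vocab = sorted({w for c in contexts for w in c})
    ranked = sorted(vocab, key=lambda w: information_gain(contexts, senses, w),
                    reverse=True)
    keep = max(1, round(len(vocab) * ratio))
    return ranked[:keep]

# Hypothetical toy data: context words observed around the ambiguous word "bank".
contexts = [{"money", "loan"}, {"river", "water"}, {"money", "account"}, {"river", "boat"}]
senses = ["finance", "shore", "finance", "shore"]

print(select_features(contexts, senses, ratio=0.1))
```

Here `ratio=0.1` mirrors the 10 percent feature set size mentioned for Naive Bayes; the context window size would determine which words end up in each context set.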

A Study on the Durational Characteristics of Korean Distant-Talking Speech (한국어 원거리 음성의 지속시간 연구)

  • Kim, Sun-Hee
    • MALSORI / no.54 / pp.1-14 / 2005
  • This paper presents the durational characteristics of Korean distant-talking speech using speech data consisting of 500 distant-talking utterances and 500 normal utterances from 10 speakers (5 males and 5 females). Each file was segmented and labeled manually, and the duration of each segment and each word was extracted. The durational change of distant-talking speech relative to normal speech was analyzed using a statistical method. The results show that word duration increases in distant-talking speech compared with normal style, and that the average unvoiced consonantal duration is reduced while the average vocalic duration is increased. Female speakers show a stronger tendency towards lengthening in distant-talking speech. Finally, this study also shows that speakers of distant-talking speech can be classified according to their different duration rates.

Implementation of a Chatbot Application for Restaurant recommendation using Statistical Word Comparison Method (통계적 단어 대조를 이용한 음식점 추천 챗봇 애플리케이션 구현)

  • Min, Dong-Hee;Lee, Woo-Beom
    • Journal of the Institute of Convergence Signal Processing / v.20 no.1 / pp.31-36 / 2019
  • A chatbot is an important area of mobile service that interprets a user's informal input in conversational form and provides customized service information. However, services still lack a way to fully understand a user's typed natural-language queries. In this paper, we therefore extract meaningful words, such as a region, a food category, and a restaurant name, from the user's dialogue sentences in order to recommend a restaurant. The extracted words are compared against the contents of a knowledge database built from restaurant-recommendation hashtags on SNS, and the user is given the target information with the statistically highest word similarity. To evaluate the performance of the restaurant recommendation chatbot system implemented in this paper, we measured the accessibility of various user query information in a web-based mobile environment. Compared with a previous similar system, our chatbot reduced the touch count and the cutaway count by 37.2% and 73.3%, respectively.
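
The matching step described in the abstract might be sketched as follows. The abstract does not state which similarity measure is used, so Jaccard overlap is swapped in here as an assumption; the knowledge base entries and all names are hypothetical.

```python
def word_similarity(query_words, tags):
    """Jaccard overlap between extracted query words and an entry's hashtags.

    Jaccard is an assumed stand-in; the paper's own measure is not given here.
    """
    q, t = set(query_words), set(tags)
    union = q | t
    return len(q & t) / len(union) if union else 0.0

def recommend(query_words, knowledge_base):
    """Return the entry whose hashtags are most similar to the query words."""
    return max(knowledge_base,
               key=lambda name: word_similarity(query_words, knowledge_base[name]))

# Hypothetical knowledge base: restaurant name -> hashtags collected from SNS.
knowledge_base = {
    "Trattoria A": ["gangnam", "pasta", "italian"],
    "Noodle House B": ["hongdae", "ramen", "japanese"],
}

# Words assumed to have been extracted from "Any good pasta places in Gangnam?"
print(recommend(["gangnam", "pasta"], knowledge_base))
```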

A Study on the Durational Characteristics of Korean Lombard Speech (한국어 롬바드 음성의 지속시간 연구)

  • Kim, Sun-Hee
    • Proceedings of the KSPS conference / 2005.04a / pp.21-24 / 2005
  • This paper presents the durational characteristics of Korean Lombard speech using data consisting of 500 Lombard utterances and 500 normal utterances from 10 speakers (5 males and 5 females). Each file was segmented and labeled manually, and the duration of each segment and each word was extracted. The durational change of Lombard speech relative to normal speech was analyzed using a statistical method. The results show that word duration increases under the Lombard effect compared with normal style, and that the average unvoiced consonantal duration is reduced while the average vocalic duration is increased. Female speakers show a stronger tendency towards lengthening in Lombard speech, but without statistical significance. Finally, this study also shows that speakers of Lombard speech can be classified according to their different duration rates.

Comparison of System Call Sequence Embedding Approaches for Anomaly Detection (이상 탐지를 위한 시스템콜 시퀀스 임베딩 접근 방식 비교)

  • Lee, Keun-Seop;Park, Kyungseon;Kim, Kangseok
    • Journal of Convergence for Information Technology / v.12 no.2 / pp.47-53 / 2022
  • Recently, with the shift toward an intelligent security paradigm, research on applying the varied information generated by information security systems to AI-based anomaly detection has been increasing. In this study, to convert log-like time series data into numerical feature vectors, the published ADFA system call data were transformed using the CBOW and Skip-gram inference methods of the deep learning-based Word2Vec model and a statistical method based on co-occurrence frequency. Experiments produced a variety of embedding vectors by varying the vector dimension, the sequence length, and the window size. The embedding methods and their detection performance were then compared and evaluated with a GRU-based anomaly detection model that takes the vectors generated by the embedding model as input. Compared with the statistical model, Skip-gram was confirmed to maintain more stable performance without being biased toward a specific window size or sequence length, and to be more effective at turning each event of the sequence data into an embedding vector.
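
To make the frequency-based statistical embedding more concrete, here is a minimal sketch that counts how often each system call appears within a fixed window of every other call and uses the resulting count row as that call's vector. The call IDs, the window size, and the function name are illustrative assumptions; the paper's exact construction may differ.

```python
def cooccurrence_vectors(sequences, window):
    """Map each system call to a vector of windowed co-occurrence counts."""
    vocab = sorted({call for seq in sequences for call in seq})
    index = {call: i for i, call in enumerate(vocab)}
    vectors = {call: [0] * len(vocab) for call in vocab}
    for seq in sequences:
        for i, call in enumerate(seq):
            # Neighbours within `window` positions on either side of position i.
            start, stop = max(0, i - window), min(len(seq), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    vectors[call][index[seq[j]]] += 1
    return vocab, vectors

# Hypothetical system call traces (numeric call IDs, as in ADFA-style data).
traces = [[3, 4, 3, 5], [4, 3, 5]]
vocab, vectors = cooccurrence_vectors(traces, window=1)

for call in vocab:
    print(call, vectors[call])
```

Each trace could then be encoded as the sequence of its calls' vectors before being passed to a downstream detector. Unlike Word2Vec, the vector dimension in this sketch is tied to the vocabulary size.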

Comparative Analysis of Box-office Related Statistics and Diffusion in Korea and US Film Markets (한국과 미국에 있어 영화 수익관련 통계량과 확산 현상의 비교분석)

  • Kim, Taegu;Hong, Jungsik
    • Korean Management Science Review / v.32 no.1 / pp.133-145 / 2015
  • The motion picture industry in Korea has grown steadily and has attracted various kinds of research attention. In particular, the introduction of an official box-office database service has enabled quantitative studies. However, approaches based on diffusion models have rarely been applied to the domestic film market. In addition to a fundamental statistical review of the Korean and US film markets, we applied a diffusion model to daily box-office revenue. Although the Gamma distribution is conventionally preferred for film markets, the estimation results showed that BMIC can also explain the trend of daily revenue successfully. The comparison using BMIC showed a distinctive difference between the diffusion patterns of the Korean and US film markets. In general, the word-of-mouth effect appeared more significant in Korea.

A study on Clothing Behavior and Preference of clothing Design on the Comparison of Body types of Chinese Women (중국 여성의 체형별 의복행동 및 의상디자인 선호도 연구)

  • Kim, Hyo-Sook;Lim, Soon;Son, Hee-Heong
    • Journal of the Korean Home Economics Association / v.39 no.11 / pp.15-26 / 2001
  • China has adopted a free market economy and is a member of the WTO. It has emerged as one of the most promising markets in the world for the near future. The purpose of this study was to investigate Chinese women's clothing behavior and clothing design preferences by body type, and to provide basic information for merchandising high-quality clothes for export to China. The subjects were 280 Chinese women aged 20 to 50 living in Beijing. The survey was conducted from June to July 1999. SAS (Statistical Analysis System) was used for frequency, percentage, mean, standard deviation, $\chi^2$-test, and factor analysis. The results of this study are as follows. Examination of Chinese women's clothing behavior showed that they attach importance to economy when purchasing clothes and have positive self-confidence. Women of the thin body type prefer fashionable clothes, while those of the fat body type show more economically rational clothing behavior. Merchandising plans differentiated by body type are needed in China.

An English Essay Scoring System Based on Grammaticality and Lexical Cohesion (문법성과 어휘 응집성 기반의 영어 작문 평가 시스템)

  • Kim, Dong-Sung;Kim, Sang-Chul;Chae, Hee-Rahk
    • Korean Journal of Cognitive Science / v.19 no.3 / pp.223-255 / 2008
  • In this paper, we introduce an automatic system for scoring English essays. The system comprises three main components: a spelling checker, a grammar checker, and a lexical cohesion checker. We used resources such as WordNet, Link Grammar/parser, and Roget's thesaurus for these components. The usefulness of an automatic scoring system depends on its reliability. To measure reliability, we compared the results of automatic scoring with those of manual scoring on the basis of the Kappa statistic and the Multi-facet Rasch Model. The statistical data obtained from the comparison showed that the scoring system is as reliable as professional human graders. The system deals with textual units rather than sentential units and checks not only the formal properties of a text but also its contents.
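
For readers unfamiliar with the agreement measure mentioned in the abstract, the following sketch computes Cohen's kappa between automatic and manual scores. It assumes the two-rater, unweighted form of the statistic, which the abstract does not specify, and the scores shown are invented for illustration.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa: agreement between two raters beyond chance.

    Undefined (division by zero) when both raters use one identical label throughout.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability both raters pick the same score independently.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical essay scores on a 1-5 scale.
automatic_scores = [3, 4, 4, 2, 5, 3]
manual_scores = [3, 4, 3, 2, 5, 3]

print(round(cohen_kappa(automatic_scores, manual_scores), 3))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.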

A Method for Compound Noun Extraction to Improve Accuracy of Keyword Analysis of Social Big Data

  • Kim, Hyeon Gyu
    • Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.55-63 / 2021
  • Since social big data often includes new words and proper nouns, statistical morphological analysis methods, which are based on the frequency of occurrence of each word, have been widely used to process them properly. However, these methods do not properly recognize compound nouns, which lowers the accuracy of keyword extraction. This paper presents a method to extract compound nouns in keyword analysis of social big data. The proposed method creates a candidate group of compound nouns by combining the words obtained through the morphological analysis step, and extracts compound nouns by examining their frequency of appearance in a given review. Two algorithms are proposed that differ in how the candidate group is constructed, and the performance of each is expressed and compared using formulas. The comparison result is verified through experiments on real data collected online, and the results also show that the proposed method is suitable for real-time processing.
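
The candidate-and-frequency idea in the abstract can be sketched roughly as below. This is not either of the paper's two algorithms: it assumes candidates are formed only from adjacent noun pairs and accepted above a fixed count threshold, and the tokenized reviews and the name `extract_compound_nouns` are hypothetical.

```python
from collections import Counter

def extract_compound_nouns(reviews, min_count=2):
    """Join adjacent nouns into candidates and keep the frequent ones.

    `reviews` holds lists of nouns assumed to come from a morphological analyzer.
    """
    candidates = Counter()
    for nouns in reviews:
        for first, second in zip(nouns, nouns[1:]):
            candidates[first + " " + second] += 1
    return {phrase for phrase, count in candidates.items() if count >= min_count}

# Hypothetical analyzer output for three short reviews.
reviews = [
    ["ice", "cream", "shop"],
    ["ice", "cream"],
    ["cream", "shop", "owner"],
]

print(sorted(extract_compound_nouns(reviews)))
```

A single pass over the adjacent pairs keeps the cost linear in the number of tokens, which is in the spirit of the real-time suitability the abstract reports.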