• Title/Summary/Keyword: Word Tree

Dynamic recomposition of document category using user intention tree (사용자 의도 트리를 사용한 동적 카테고리 재구성)

  • Kim, Hyo-Lae;Jang, Young-Cheol;Lee, Chang-Hoon
    • The KIPS Transactions:PartB / v.8B no.6 / pp.657-668 / 2001
  • Existing document classification systems rely on word frequency counts for a single keyword, which makes it difficult to classify web documents according to the user's exact intention. To address this shortcoming, we use the keyword, the query, and domain knowledge together. As in explanation-based learning, the query is first analyzed with knowledge-based information, and structured user intention information is then extracted. We use this intention tree as user information and constraints within existing word-frequency-based document classification, so that web documents can be classified more closely to the user's intention. During classification, the structured user intention information helps retain documents and information that a system using only single-keyword information could lose. Our hybrid approach, which integrates user intention information with existing statistical and probabilistic methods, decides the direction and range of a document category more efficiently than the existing word frequency approach.

A Study on Phoneme Likely Units to Improve the Performance of Context-dependent Acoustic Models in Speech Recognition (음성인식에서 문맥의존 음향모델의 성능향상을 위한 유사음소단위에 관한 연구)

  • 임영춘;오세진;김광동;노덕규;송민규;정현열
    • The Journal of the Acoustical Society of Korea / v.22 no.5 / pp.388-402 / 2003
  • In this paper, we carried out word, 4-continuous-digit, continuous speech, and task-independent word recognition experiments to verify the effectiveness of re-defined phoneme-likely units (PLUs) for phonetic-decision-tree-based HM-Net (Hidden Markov Network) context-dependent (CD) acoustic modeling in Korean. In the 48 PLUs, the phonemes /ㅂ/, /ㄷ/, /ㄱ/ are separated into initial sound, medial vowel, and final consonant, and the consonants /ㄹ/, /ㅈ/, /ㅎ/ are separated into initial sound and final consonant, according to their position in the syllable, word, and sentence, respectively. We therefore re-define 39 PLUs by unifying each phoneme that the 48 PLUs split into initial sound, medial vowel, and final consonant, in order to construct the CD acoustic models effectively. In word recognition experiments with context-independent (CI) acoustic models, the 48 PLUs achieved on average 7.06% higher recognition accuracy than the 39 PLUs. In speaker-independent word recognition experiments with the CD acoustic models, however, the 39 PLUs achieved on average 0.61% higher recognition accuracy than the 48 PLUs. In the 4-continuous-digit recognition experiments with liaison phenomena, the 39 PLUs also achieved on average 6.55% higher recognition accuracy, and in continuous speech recognition experiments they achieved on average 15.08% higher recognition accuracy than the 48 PLUs. Finally, although both PLU sets showed lower recognition accuracy in the task-independent word recognition experiments because of unknown contextual factors, the 39 PLUs still achieved on average 1.17% higher recognition accuracy than the 48 PLUs. These experiments verify the effectiveness of the re-defined 39 PLUs, compared with the 48 PLUs, for constructing CD acoustic models.

Bi-directional Maximal Matching Algorithm to Segment Khmer Words in Sentence

  • Mao, Makara;Peng, Sony;Yang, Yixuan;Park, Doo-Soon
    • Journal of Information Processing Systems / v.18 no.4 / pp.549-561 / 2022
  • The Khmer script, the official script of Cambodia, is written from left to right without a space separator between words; it is complex and requires further study. Without clear standard guidelines, spaces in Khmer are used inconsistently and informally to separate words in sentences. A segmentation method is therefore needed, together with future Khmer natural language processing (NLP), to define appropriate rules for Khmer sentences. NLP techniques capable of large-scale language data analysis need to be applied in this scenario. One of the essential components of Khmer language processing is splitting a sentence into a series of words and counting the words used in it. Currently, Microsoft Word cannot count Khmer words correctly. This study therefore presents a library that segments Khmer phrases using the bi-directional maximal matching (BiMM) method. The BiMM algorithm combines forward maximal matching (FMM) and backward maximal matching (BMM) to improve word segmentation accuracy. A digital tree or prefix tree, also known as a trie, supports the segmentation procedure by finding the children of each word's parent node. The accuracy of BiMM is higher than that of FMM or BMM used independently; moreover, the proposed approach improves dictionary structures and reduces the number of errors. On 94,807 Khmer words, the method reduces errors by 8.57% compared with the FMM and BMM algorithms.
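
The segmentation scheme described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' library: the `Trie` class, the function names, the toy Latin-letter dictionary, and the tie-breaking rule in `bimm` (fewer tokens first, then fewer single-character tokens, otherwise the backward result) are all assumptions, since the abstract does not state how the forward and backward results are reconciled.

```python
class Trie:
    """Prefix tree over dictionary words; the key None marks the end of a word."""

    def __init__(self, words=()):
        self.root = {}
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = True

    def longest_prefix(self, text, start):
        """Length of the longest dictionary word starting at text[start], or 0."""
        node, best = self.root, 0
        for i in range(start, len(text)):
            node = node.get(text[i])
            if node is None:
                break
            if None in node:
                best = i - start + 1
        return best


def fmm(text, trie):
    """Forward maximal matching: greedily take the longest word from the left."""
    tokens, i = [], 0
    while i < len(text):
        # A character with no dictionary match becomes a one-character token.
        n = trie.longest_prefix(text, i) or 1
        tokens.append(text[i:i + n])
        i += n
    return tokens


def bmm(text, reversed_trie):
    """Backward maximal matching: run FMM on the reversed text with reversed words."""
    tokens = fmm(text[::-1], reversed_trie)
    return [t[::-1] for t in reversed(tokens)]


def bimm(text, words):
    """Pick between the FMM and BMM results (tie-breaking rule is an assumption)."""
    forward = fmm(text, Trie(words))
    backward = bmm(text, Trie(w[::-1] for w in words))
    if len(forward) != len(backward):
        return min(forward, backward, key=len)

    def singles(tokens):
        return sum(len(t) == 1 for t in tokens)

    return forward if singles(forward) < singles(backward) else backward


words = ["ab", "abc", "cd", "d"]          # hypothetical toy dictionary
print(fmm("abcd", Trie(words)))           # ['abc', 'd']
print(bimm("abcd", words))                # ['ab', 'cd']
```

In the toy example, forward matching commits to the longer word `abc` and leaves a stray single character, while backward matching yields two dictionary words; preferring the result with fewer single-character tokens is one common way to choose between them.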

Research on Subjective-type Grading System Using Syntactic-Semantic Tree Comparator (구문의미트리 비교기를 이용한 주관식 문항 채점 시스템에 대한 연구)

  • Kang, WonSeog
    • The Journal of Korean Association of Computer Education / v.21 no.6 / pp.83-92 / 2018
  • Subjective (free-response) questions are suitable for evaluating deep thinking, but they are not easy to score. Because graders can produce different scores even under the same scoring criteria, an objective automatic evaluation system is needed. Such a system, however, faces the problem of analyzing and comparing Korean text. This paper proposes Korean syntactic analysis and a subjective-question grading system that uses a syntactic-semantic tree comparator. The system is a hybrid of word-based grading and syntactic-semantic-tree-based grading, and it grades answers to subjective questions with the syntactic-semantic tree comparator. The proposed system produced good results. It can be applied to Korean syntactic-semantic analysis, subjective question grading, and document classification.

The error character Revision System of the Korean using Semantic relationship of sentence component (문장 성분의 의미 관계를 이용한 한국어 오류 문자 교정 시스템)

  • Park, Hyun-Jae;Park, Hae-Sun;Kang, One-Il;Sohn, Young-Sun
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.1 / pp.28-32 / 2004
  • Until now, Korean spelling proofreading systems have corrected the words of a sentence using collocation relationships or the grammatical information of the sentence. In this paper, we propose a system that corrects a word using the relationships among the sememes in a single sentence, substituting an appropriate word for a word whose meaning is wrong because of a typing error. The proposed system generates several sentences that can be related to each sememe. The substantives form a meaning tree according to the meanings of the words, and the predicate of a sentence defines the semantic relationship between the substantives of the subject and the object. After comparing and analyzing these semantic relationships, the system corrects the mistyped word in a single sentence that contains an error. If the system finds a semantic error caused by mistyping, it applies the spelling proofreading method proposed in this paper.

Exploring the role of referral efficacy in the relationship between consumer innovativeness and intention to generate word of mouth

  • Yoo, Chul Woo;Jin, Sung;Sanders, G. Lawrence
    • Agribusiness and Information Management / v.5 no.2 / pp.27-37 / 2013
  • Referral marketing plays an important role in promoting new products. For innovative agricultural products, an early adopter's review or recommendation has an even more critical impact on followers' purchase decisions. Hence, understanding consumers' characteristics and needs plays a more important role in the success of an innovation. In particular, other researchers have paid attention to the role of consumer innovativeness. This study attempts to fill the gap in knowledge between a consumer's innovative propensity and his or her intention to generate positive word of mouth about new agricultural products. We adopt Vandecasteele and Geuens' motivated consumer innovativeness model to investigate consumer innovativeness at the level of extrinsic and intrinsic motives, and we examine the moderating role of referral efficacy. For empirical verification, data were collected by survey and analyzed with partial least squares (PLS). Finally, several theoretical contributions and practical implications are discussed.

Sentiment Analysis using Latent Structural SVM (잠재 구조적 SVM을 활용한 감성 분석기)

  • Yang, Seung-Won;Lee, Changki
    • KIISE Transactions on Computing Practices / v.22 no.5 / pp.240-245 / 2016
  • In this study, comments on restaurants, movies, and mobile devices, as well as tweets not restricted to a specific domain, were analyzed for sentiment information. We propose a system that extracts objects (or aspects) and opinion words from each sentence and then evaluates them. For the sentiment analysis, we conducted a comparative evaluation of the Structural SVM algorithm and the Latent Structural SVM. The latter showed better performance and was able to extract objects/aspects and opinion words using the VP/NP analyzed from the dependency parse tree. Lastly, we also developed and evaluated a sentiment detector model for use in practical services.

Generalization of error decision rules in a grammar checker using Korean WordNet, KorLex (명사 어휘의미망을 활용한 문법 검사기의 문맥 오류 결정 규칙 일반화)

  • So, Gil-Ja;Lee, Seung-Hee;Kwon, Hyuk-Chul
    • The KIPS Transactions:PartB / v.18B no.6 / pp.405-414 / 2011
  • Korean grammar checkers typically detect context-dependent errors by employing heuristic rules that are manually formulated by a language expert. These rules are appended each time a new error pattern is detected. However, such grammar checkers are not consistent. To resolve this shortcoming, we propose a new method for generalizing the error decision rules used to detect these errors. For this purpose, we use an existing thesaurus, KorLex, which is the Korean version of Princeton WordNet. KorLex has hierarchical word senses for nouns but does not contain any information about the relationships between cases in a sentence. Using the Tree Cut Model and the MDL (minimum description length) model, which are based on information theory, we extract noun classes from KorLex and generalize error decision rules from these noun classes. To verify the accuracy of the new method, we extracted from a large corpus the nouns used as objects of four commonly confused predicates, and then extracted noun classes from these nouns. We found that the number of error decision rules generalized from these noun classes decreased to about 64.8%. In conclusion, the precision of our grammar checker exceeds that of conventional ones by 6.2%.

Exact Matching Algorithm on Expanded Word Suffix Tree (확장된 단어 서픽스 트리에서의 완전매칭 알고리즘)

  • 박준영;정원형;김삼묘
    • Proceedings of the Korean Information Science Society Conference / 2000.10a / pp.575-577 / 2000
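  • Suffix trees have been proposed as a data structure that can be used efficiently for analyzing DNA base sequences. However, a suffix tree for a very large gene sequence requires a large amount of memory. To save memory, a method using word suffix trees was therefore proposed. Despite this advantage, a word suffix tree is built around the notion of words, so it lacks the information needed to solve the exact matching problem, and only a restricted exact matching algorithm has been presented. With the restricted exact matching algorithm, a pattern cannot be found if it lies within a substring of a word or spans two or more words. In this paper, to solve the exact matching problem on word suffix trees, we present an expanded word suffix tree that uses a generalized suffix tree built from information about the suffixes of each word, and we propose an exact matching algorithm.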
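
The limitation described in this abstract can be illustrated with a small sketch. It does not reproduce the paper's expanded word suffix tree or its matching algorithm; it uses a plain uncompressed suffix trie and only shows why indexing suffixes that begin at word boundaries misses a pattern that starts inside a word. The helper names, the `#` delimiter, and the sample sequence are hypothetical.

```python
def build_suffix_trie(text, starts):
    """Uncompressed trie holding the suffix text[s:] for every s in starts."""
    root = {}
    for s in starts:
        node = root
        for ch in text[s:]:
            node = node.setdefault(ch, {})
    return root


def contains(trie, pattern):
    """True if pattern is a prefix of some indexed suffix."""
    node = trie
    for ch in pattern:
        node = node.get(ch)
        if node is None:
            return False
    return True


def word_starts(text, delimiter="#"):
    """Positions where a word begins: position 0 and just after each delimiter."""
    return [0] + [i + 1 for i, ch in enumerate(text)
                  if ch == delimiter and i + 1 < len(text)]


text = "ACG#TTA#GCA"  # hypothetical sequence, words separated by '#'

# Word-level index: only suffixes starting at word boundaries (less memory).
word_trie = build_suffix_trie(text, word_starts(text))
# Full index: every suffix (the ordinary suffix trie, more memory).
full_trie = build_suffix_trie(text, range(len(text)))

print(contains(word_trie, "TTA"))  # True: the pattern starts at a word boundary
print(contains(word_trie, "TA"))   # False: the pattern starts inside a word
print(contains(full_trie, "TA"))   # True: every start position is indexed
```

Indexing every position restores exact matching but gives up the memory saving; the abstract's proposal is to add per-word suffix information instead, which this sketch does not attempt.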

Improvement of algorithm for calculating word count using character hash and binary search tree (문자 해시와 이원 탐색 트리를 이용한 어절 빈도 계산 알고리즘의 성능 개선)

  • Park, Il-Nam;Kang, Seung-Shik
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.599-602 / 2010
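  • Internet search sites provide a real-time search ranking service that ranks the words users search for; ranking these words requires computing word (eojeol) frequencies, which show the distribution of each word. Word frequencies can be computed with a binary search tree (BST), but because the words users search for vary widely in length and form, the BST becomes deep during frequency counting and the computation takes a long time. This paper proposes an algorithm that uses a character hash to improve the search speed of a deep BST. When frequency-counting speed was compared, using an additional 1 KB of memory, as determined by the range of the character hash, improved performance by 9.3%; using an additional 10 KB of hash space improved it by 24.3%, and 236 KB improved it by 40.6%.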
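
One way to read the idea in this abstract is sketched below: a hash on a character of the word selects one of many small BSTs, so each lookup walks a shallower tree. The abstract does not specify the hash function or table layout, so the `HashedBSTCounter` class, the choice of hashing the first character's code point, and the bucket count are assumptions made purely for illustration.

```python
class Node:
    """BST node holding a word and its occurrence count."""
    __slots__ = ("key", "count", "left", "right")

    def __init__(self, key):
        self.key, self.count, self.left, self.right = key, 1, None, None


class HashedBSTCounter:
    """Word-frequency counter: a character hash picks a bucket, each bucket is a BST."""

    def __init__(self, buckets=256):
        self.roots = [None] * buckets

    def _bucket(self, word):
        # Assumed hash: code point of the first character (word must be non-empty).
        return ord(word[0]) % len(self.roots)

    def add(self, word):
        b = self._bucket(word)
        node = self.roots[b]
        if node is None:
            self.roots[b] = Node(word)
            return
        while True:
            if word == node.key:
                node.count += 1
                return
            side = "left" if word < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, Node(word))
                return
            node = child

    def count(self, word):
        node = self.roots[self._bucket(word)]
        while node is not None and node.key != word:
            node = node.left if word < node.key else node.right
        return node.count if node is not None else 0


counter = HashedBSTCounter()
for w in ["tree", "word", "tree", "hash", "tree"]:
    counter.add(w)
print(counter.count("tree"), counter.count("word"), counter.count("none"))  # 3 1 0
```

A larger bucket table costs more memory but spreads words over more trees, which matches the trade-off between extra hash space and speed that the abstract reports; the specific percentages there come from the paper's own experiments, not from this sketch.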