• Title/Summary/Keyword: Common-Word

Gathering Common-word and Document Reclassification to improve Accuracy of Document Clustering (문서 군집화의 정확률 향상을 위한 범용어 수집과 문서 재분류 알고리즘)

  • Shin, Joon-Choul;Ock, Cheol-Young;Lee, Eung-Bong
    • The KIPS Transactions: Part B / v.19B no.1 / pp.53-62 / 2012
  • Clustering is used in information retrieval systems to handle large numbers of retrieved documents efficiently, but its accuracy meets the requirements of only some domains. This paper proposes two methods to increase clustering accuracy. First, we define a common-word as a word that is used frequently but has low weight during clustering, and we propose a method that automatically gathers common-words from the retrieved documents and calculates their weights. In experiments, the clustering error rate using common-words was reduced to 34% compared with clustering using stop-words. Second, after generating initial clusters from the retrieved documents with average-link clustering, we propose an algorithm that reevaluates the similarity between each document and the clusters and reclassifies the document into a more similar cluster. In experiments using Naver JiSikIn categories, the accuracy of the reclassified clusters increased by 1.81% compared with the initial clusters without reclassification.
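
The two ideas in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the authors' algorithm: the abstract does not give the weighting formula or the similarity measure, so the IDF-style weight, the cosine similarity, the single reclassification pass, and every function name below are assumptions.

```python
import math
from collections import Counter, defaultdict


def common_word_weights(docs):
    """Weight each term so that terms found in many documents count less.

    Assumption: an IDF-style weight log(N / df), scaled to [0, 1].
    A term that occurs in every document gets weight 0.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    max_w = math.log(n) if n > 1 else 1.0
    return {term: math.log(n / count) / max_w for term, count in df.items()}


def vectorize(doc, weights):
    """Turn a tokenized document into a sparse weighted term vector."""
    tf = Counter(doc)
    return {term: tf[term] * weights.get(term, 0.0) for term in tf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(term, 0.0) for term, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def centroid(vectors):
    """Average a list of sparse vectors."""
    total = defaultdict(float)
    for vec in vectors:
        for term, value in vec.items():
            total[term] += value
    return {term: value / len(vectors) for term, value in total.items()}


def reclassify(vectors, labels):
    """One pass: move every document to the cluster with the most similar centroid."""
    clusters = defaultdict(list)
    for vec, label in zip(vectors, labels):
        clusters[label].append(vec)
    centroids = {label: centroid(vecs) for label, vecs in clusters.items()}
    return [
        max(centroids, key=lambda label: cosine(vec, centroids[label]))
        for vec in vectors
    ]
```

In this sketch the initial labels would come from any first-stage clustering (the abstract uses average-link clustering); `reclassify` only revisits that assignment and does not build the clusters itself.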

Korean Part-of-Speech Tagging System Using Resolution Rules for Individual Ambiguous Word (어절별 중의성 해소 규칙을 이용한 혼합형 한국어 품사 태깅 시스템)

  • Park, Hee-Geun;Ahn, Young-Min;Seo, Young-Hoon
    • Journal of KIISE: Computing Practices and Letters / v.13 no.6 / pp.427-431 / 2007
  • In this paper, we describe a Korean part-of-speech tagging approach that uses resolution rules for individual ambiguous words together with statistical information. The approach resolves lexical ambiguities with common rules, rules for individual ambiguous words, and a statistical method. The common rules cover idioms and commonly used phrases, including phrases composed of main and auxiliary verbs. To enhance tagging accuracy, we built resolution rules for each word that has several distinct morphological analysis results. Each rule may refer to morphemes, morphological tags, and/or word senses of not only the ambiguous word itself but also the words around it. A statistical approach based on an HMM is then applied to the ambiguous words that the rules do not resolve. Experiments show that the tagging approach has high accuracy and broad coverage.
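
The rule-first, HMM-fallback flow described above might look roughly like this. It is a simplified sketch under stated assumptions, not the paper's system: the rule format (one hypothetical callback per ambiguous word), the probability tables, the `FLOOR` smoothing value, and all names are illustrative, and the common rules for idioms and phrases are omitted.

```python
import math

FLOOR = 1e-6  # assumed probability for events missing from the tables


def apply_rules(words, candidates, rules):
    """Narrow the candidate tags of each ambiguous word with its own rule.

    `rules` maps a word to a function (words, index) -> tag or None.
    A word keeps all of its candidates when no rule fires.
    """
    narrowed = []
    for i, (word, cands) in enumerate(zip(words, candidates)):
        tag = None
        if len(cands) > 1 and word in rules:
            tag = rules[word](words, i)
        narrowed.append([tag] if tag in cands else list(cands))
    return narrowed


def viterbi(words, candidates, start, trans, emit):
    """Pick the most probable tag sequence among the remaining candidates."""
    if not words:
        return []

    def logp(table, key):
        return math.log(table.get(key, FLOOR))

    # best[i][tag] = (log probability of the best path ending in tag, previous tag)
    best = [{t: (logp(start, t) + logp(emit, (t, words[0])), None)
             for t in candidates[0]}]
    for i in range(1, len(words)):
        column = {}
        for t in candidates[i]:
            prev, score = max(
                ((p, s + logp(trans, (p, t))) for p, (s, _) in best[-1].items()),
                key=lambda item: item[1],
            )
            column[t] = (score + logp(emit, (t, words[i])), prev)
        best.append(column)

    # Follow the back pointers from the best final tag.
    tag = max(best[-1], key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


def tag_sentence(words, candidates, rules, start, trans, emit):
    """Rules first; the HMM resolves whatever the rules leave ambiguous."""
    narrowed = apply_rules(words, candidates, rules)
    return viterbi(words, narrowed, start, trans, emit)
```

Restricting Viterbi to the candidates that survive the rules means a rule decision is never overridden by the statistical step, which matches the ordering the abstract describes.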

A Study on the Production of the English Word Boundaries: A Comparative Analysis of Korean Speakers and English Speakers (영어 단어경계에 따른 발화 양상 연구: 한국인 화자와 영어 원어민 화자 비교 분석)

  • Kim, Ji Hyang;Kim, Kee Ho
    • Phonetics and Speech Sciences / v.6 no.1 / pp.47-58 / 2014
  • The purpose of this paper is to find out how Korean speakers' speech production at English word boundaries differs from that of English speakers and to account for what brings about such differences. Treating two consecutive words as a single cluster, the English speakers generally pronounce them naturally by linking the word-final consonant of the first word with the word-initial vowel of the second word. This is not the case for most of the Korean speakers, who read the two consecutive words individually. Consequently, phonological processes such as resyllabification and aspiration are found in the English speakers' word-boundary production, while glottalization and unreleased stops are the more common phonological processes in the Korean speakers' word-boundary production. This may be accounted for by the Korean speakers' L1 interference, depending on English proficiency.

SOC Verification Based on WGL

  • Du, Zhen-Jun;Li, Min
    • Journal of Korea Multimedia Society / v.9 no.12 / pp.1607-1616 / 2006
  • The growing market for multimedia and digital signal processing requires significant data-path portions in SoCs. However, the common models for verification are not suitable for SoCs. A novel model, WGL (Weighted Generalized List), is proposed. It is based on the general-list decomposition of polynomials, with three different weights and manipulation rules introduced to achieve node sharing and canonicity. Timing parameters and operations on them are also considered. Examples show that the word-level WGL is the only model to linearly represent the common word-level functions and that the bit-level WGL is especially suitable for arithmetic-intensive circuits. The model is proved to be a uniform and efficient model for both bit-level and word-level functions. Then, based on the WGL model, a backward-construction logic-verification approach is presented, which reduces the time and space complexity for multipliers to polynomial complexity (time complexity is less than $O(n^{3.6})$ and space complexity is less than $O(n^{1.5})$) without hierarchical partitioning. Finally, a construction methodology for word-level polynomials is presented in order to implement complex high-level verification; it combines order computation and coefficient solving and adopts an efficient backward approach. The construction complexity is much lower than that of existing methods: for example, the construction time for multipliers grows with a power of less than 1.6 in the size of the input word, without increasing the maximal space required. The WGL model and the verification methods based on it show their theoretical and practical significance in SoC design.

SYNCHRONIZED COMPONENTS OF A SUBSHIFT

  • Shahamat, Manouchehr
    • Journal of the Korean Mathematical Society / v.59 no.1 / pp.1-12 / 2022
  • We introduce the notion of a minimal synchronizing word, that is, a synchronizing word whose proper subwords are not synchronizing. This notion is used to give a new, shorter proof of a theorem in [6]. The common synchronized components of a subshift and its derived set are also characterized.
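
For readers unfamiliar with the term, the definition can be written out as below. The abstract does not restate what "synchronizing" means, so the first condition is the standard definition from symbolic dynamics, assumed here; the notation $\mathcal{L}(X)$ for the language (set of admissible words) of the subshift $X$ is likewise an assumption.

```latex
\[
  w \in \mathcal{L}(X) \text{ is synchronizing}
  \iff
  \bigl( uw \in \mathcal{L}(X) \text{ and } wv \in \mathcal{L}(X)
  \implies uwv \in \mathcal{L}(X) \bigr)
  \text{ for all words } u, v,
\]
\[
  w \text{ is minimal synchronizing}
  \iff
  w \text{ is synchronizing and no proper subword of } w \text{ is synchronizing.}
\]
```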

The activation study of a regional community using school land (학교용지를 활용한 지역 공동체 활성화에 관한 기초 연구)

  • Park, Min-Young;Kim, Jin-Mo;Lim, Sooyoung
    • KIEAE Journal / v.13 no.6 / pp.17-22 / 2013
  • Community has been defined as a group of interacting people living in a common location. The word is often used to refer to a group that is organized around common values and is attributed with social cohesion within a shared geographical location, generally in social units larger than a household. The word can also refer to the national community or the global community. The word "community" is derived from the Old French communité, which is derived from the Latin communitas (cum, "with/together" + munus, "gift"), a broad term for fellowship or organized society. A sense of community refers to people's perception of interconnection and interdependence, shared responsibility, and common goals. Understanding a community entails having knowledge of community needs and resources, having respect for community members, and involving key community members in programs. However, as a result of industrial development, behavior has at some point become individualistic. Therefore, this study aims to activate local communities using school land.

A Study on the Characteristic of Logomark in Apparel Brand - Focused on Unisex Casual Brand - (의류 브랜드 로고마크의 특성에 관한 연구 - 유니섹스 캐주얼 브랜드를 중심으로 -)

  • Lee, Min-Gyung;Rha, Soo-Im
    • The Research Journal of the Costume Culture / v.13 no.5 s.58 / pp.833-843 / 2005
  • The purpose of this study was to investigate the characteristics of logomarks in unisex casual apparel brands. First, 36 unisex casual apparel brands were selected from the Dictionary of Fashion Brand. Second, the common words appearing in them were analyzed. Third, the logomarks of the unisex casual apparel brands were classified into two types according to their typeface: serif and sans-serif. Fourth, the relationship between the typeface image of the logomark and the brand concept was analyzed. The results of the study were as follows. First, the common words used most frequently in brand concepts were, in order, reasonable, comfort or natural, practical, modern, traditional, and basic. Second, the unisex casual apparel brands most frequently used sans-serif typefaces, which convey a simple, modern, and active image, in their logomarks. Third, among the various languages, the unisex apparel brands most frequently used English for brand names. Fourth, most of the unisex casual apparel brands, with several exceptions, were launched after 1990.

Fillers in the Hong Kong Corpus of Spoken English (HKCSE)

  • Seto, Andy
    • Asia Pacific Journal of Corpus Research / v.2 no.1 / pp.13-22 / 2021
  • The present study employed an analytical framework characterised by a synthesis of quantitative and qualitative analyses, together with specially designed software, SpeechActConc, to examine speech acts in business communication. The naturally occurring data from the audio recordings and the prosodic transcriptions of the business sub-corpora of the HKCSE (prosodic) were manually annotated with a speech act taxonomy to find out the frequency of fillers, the co-occurring patterns of fillers with other speech acts, and the linguistic realisations of fillers. The discoursal function of fillers, to sustain the discourse or to hold the floor, has diverse linguistic realisations, ranging from a sound (e.g. 'uhuh') and a word (e.g. 'well') to sounds (e.g. 'um er') and words, namely a phrase (e.g. 'sort of') and a clause (e.g. 'you know'). Some are even combinations of sound(s) and word(s) (e.g. 'and um', 'yes er um', 'sort of erm'). Among the five most frequent linguistic realisations of fillers, 'er' and 'um' are the most common ones, found in all six genres with relatively higher percentages of occurrence. The remaining more frequent realisations consist of a clause ('you know'), a word ('yeah') and a sound ('erm'). These common forms are syntactically simpler than the less frequent realisations found in the genres. The co-occurring patterns of fillers and other speech acts are diverse. The more common speech acts co-occurring with fillers include informing and answering. The findings show that fillers are not only frequently used by speakers in spontaneous conversation but also mostly represented in sounds or non-linguistic realisations.

Intonational Pattern Frequency of Seoul Korean and Its Implication to Word Segmentation

  • Kim, Sa-Hyang
    • Speech Sciences / v.15 no.2 / pp.21-30 / 2008
  • The current study investigated distributional properties of the Korean Accentual Phrase (AP) and their implications for word segmentation. The properties examined were the frequency of various AP tonal patterns, the types of tonal patterns that are imposed upon content words, and the average number and temporal location of content words within the AP. A total of 414 sentences from the Read speech corpus and the Radio corpus were used for the data analysis. The results showed that 84% of the APs contained one content word and that almost 90% of the content words were located in AP-initial position. When the AP-initial onset was not an aspirated or tense consonant, the most common AP patterns were LH, LHH, and LHLH (78%), and 88% of the multisyllabic content words started with a rising tone in AP-initial position. When the AP-initial onset was an aspirated or tense consonant, the most common AP patterns were HH, HHLH, and HHL (72%), and 74% of the multisyllabic content words started with a level H tone in AP-initial position. The data further showed that 84.1% of APs ended with a final H tone. The findings provide valuable information about the prosodic pattern and structure of Korean APs and account for the results of a previous study, which showed that Korean listeners are sensitive to AP-initial rising and AP-final high tones (Kim, 2007). This is in line with other cross-linguistic research that has revealed a correlation between prosodic probability and speech processing strategy.

An analysis of 6th graders' cognitive structure about division of fraction - Application of Word Association Test(WAT) - (분수의 나눗셈과 관련된 초등학교 6학년 학생들의 인지구조 분석 - 단어연상검사(Word Association Test) 적용 -)

  • Lee, Hyojin;Lee, Kwangho
    • The Mathematical Education / v.53 no.3 / pp.329-355 / 2014
  • The purpose of this study is to understand how the cognitive structure of 6th graders differs with the level of their problem-solving ability in the division of fractions, and to propose a method for improving 6th graders' understanding of the division of fractions through the word association test. The findings of this study are as follows. 1) The lower the students' level, the lower the step at which chunks appeared in the cognitive structure. 2) The basic-level students' association frequency between any two concepts was lower than that of the excellent-level and ordinary-level students. 3) The basic-level students' number of connections between concepts was far lower than that of the excellent-level and ordinary-level students. 4) The connections expected in this study, namely between natural numbers and unit fractions, between subtraction of fractions and division of fractions, between division of fractions and reduction to a common denominator, and between division of fractions and common multiples, did not appear in any of the three groups.