• Title/Summary/Keyword: Lexical Statistics


Effects of Lexical Aspect on the Interlanguage of Ibibio ESL Learners: Later than Sooner

  • Willie, Willie U.
    • Cross-Cultural Studies
    • /
    • v.43
    • /
    • pp.459-483
    • /
    • 2016
  • The main objective of this paper is to test the major prediction of the Aspect Hypothesis on interlanguage narratives collected from 171 Ibibio ESL learners in a classroom setting using sets of picture stories. The Aspect Hypothesis predicts that the lexical aspectual classes of verbs determine the pattern of acquisition and distribution of tense-aspect morphology at the very early stages of L2 acquisition of tense-aspect verbal morphology. That is, telic verbs would be marked with past tense-aspect verbal morphology before atelic verbs in the interlanguage of ESL learners, irrespective of their L1 background. The results of our data analyses show a significant effect of lexical aspect on the acquisition and distribution of tense-aspect morphology, with a chi-square statistic of ($\chi^2 = 196.92$, df = 6, n = 1664, p < .0001). However, the effect of lexical aspect is shown to be more prominent among Ibibio ESL learners at higher levels of proficiency. This is contrary to the prediction of the Aspect Hypothesis. The paper concludes that the influence of lexical aspect on the pattern of acquisition and distribution of tense-aspect morphology may be universal, but the actual point along the developmental pathway at which this influence appears is yet to be determined. This calls for more research into the pattern of the L2 acquisition of tense-aspect verbal morphology.
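
The chi-square figure reported in the abstract above is the kind of statistic produced by a test of independence on a contingency table. A minimal sketch follows, assuming a 4 x 3 table (four lexical aspect classes by three morphology categories), which is one layout that gives df = 6. The abstract does not state the actual table layout, and the counts below are hypothetical, not the paper's data.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    `table` is a list of rows of observed counts; every expected count must be > 0.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, df


# Hypothetical counts (NOT from the paper): rows are assumed aspect classes,
# columns are assumed morphology categories.
example = [
    [30, 50, 20],
    [40, 40, 20],
    [20, 30, 50],
    [10, 20, 70],
]
stat, df = chi_square_independence(example)
print(f"chi2 = {stat:.2f}, df = {df}")
```

The p-value would then be read from the chi-square distribution with `df` degrees of freedom, for example with a statistics library.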

Issues on Articles Covering Outstanding Management of Apartment Complexes - Content Analysis of Newspaper Reports with Lexical Statistics - (우수 아파트단지 취재기사에서의 관리상의 논점 - 탐방기사를 이용한 언어통계학적 내용분석 -)

  • Choi Jung-Min;Kang Soon-Joo
    • Journal of the Korean housing association
    • /
    • v.17 no.4
    • /
    • pp.131-143
    • /
    • 2006
  • Nowadays, diverse mass media outlets discover and introduce cases of outstanding apartment complex management in order to encourage vigorous competition among constructors and active participation of residents in apartment management. This study used lexical statistics to analyze the management issues of outstanding apartment complexes introduced by the mass media, in order to examine the characteristics of their exemplary management. The key issues of outstanding apartment management are summarized as: efficient management of convenience facilities for residents, community activities based on residents' participation, and maintenance of pleasant living environments through transparent management. In addition, a Social Network Analysis of the relations among co-occurring words revealed three key concepts of multi-family housing management (Maintenance Management, Operating Management, and Community Life Management), with emphasis on 'residents' and 'apartment complexes.' However, Operating Management was relatively deemphasized.

Corpus-Based Ambiguity-Driven Learning of Context-Dependent Lexical Rules for Part-of-Speech Tagging (품사태킹을 위한 어휘문맥 의존규칙의 말뭉치기반 중의성주도 학습)

  • 이상주;류원호;김진동;임해창
    • Journal of KIISE:Software and Applications
    • /
    • v.26 no.1
    • /
    • pp.178-178
    • /
    • 1999
  • Most stochastic taggers cannot resolve some morphological ambiguities that can be resolved only by referring to lexical contexts, because they use only contextual probabilities based on tag n-grams and lexical probabilities. Existing lexical rules are effective for resolving such ambiguities because they can refer to lexical contexts. However, they have two limitations. One is that human experts tend to make erroneous rules because they are deterministic rules. Another is that it is hard and time-consuming to acquire rules because they should be manually acquired. In this paper, we propose context-dependent lexical rules, which are lexical rules based on the statistics of a tagged corpus, and an ambiguity-driven learning method, which is the method of automatically acquiring the proposed rules from a tagged corpus. By using the proposed rules, the proposed tagger can partially annotate an unseen corpus with high accuracy because it is a kind of memorizing tagger that can annotate a training corpus with 100% accuracy. So, the proposed tagger is useful to improve the accuracy of a stochastic tagger. And also, it is effectively used for detecting and correcting tagging errors in a manually tagged corpus. Moreover, the experimental results show that the proposed method is also effective for English part-of-speech tagging.

Morpheme-based Korean broadcast news transcription (형태소 기반의 한국어 방송뉴스 인식)

  • Park Young-Hee;Ahn Dong-Hoon;Chung Minhwa
    • Proceedings of the KSPS conference
    • /
    • 2002.11a
    • /
    • pp.123-126
    • /
    • 2002
  • In this paper, we describe our LVCSR system for Korean broadcast news transcription. The main focus is to find the most suitable morpheme-based lexical model for Korean broadcast news recognition, one that can deal with the inflectional flexibility of Korean. There are trade-offs between lexicon size and lexical coverage, and between the length of the lexical unit and WER. In our system, we analyzed the training corpus to obtain a small 24k-morpheme-based lexicon with 98.8% coverage. Then, the lexicon is optimized by combining morphemes using statistics of the training corpus under a monosyllable constraint or a maximum length constraint. In experiments, our system reduced the number of monosyllable morphemes from 52% to 29% of the lexicon and obtained 13.24% WER for anchors and 24.97% for reporters.
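
The trade-off between lexicon size and lexical coverage mentioned in the abstract above can be illustrated by measuring how much of a token stream a frequency-ranked lexicon covers. This is a minimal sketch under the assumption that the lexicon consists of the most frequent units. The abstract does not say how the paper selects its lexicon, and the function name `lexicon_coverage` and the toy tokens are hypothetical.

```python
from collections import Counter


def lexicon_coverage(tokens, lexicon_size):
    """Fraction of running tokens covered by the `lexicon_size` most frequent units."""
    if not tokens:
        raise ValueError("tokens must not be empty")
    counts = Counter(tokens)
    # Assumption: the lexicon is simply the top-N units by corpus frequency.
    lexicon = {unit for unit, _ in counts.most_common(lexicon_size)}
    covered = sum(count for unit, count in counts.items() if unit in lexicon)
    return covered / len(tokens)


# Toy morpheme stream (hypothetical, not from the paper's corpus).
toy_tokens = "a a a b b c".split()
for size in (1, 2, 3):
    print(size, round(lexicon_coverage(toy_tokens, size), 3))
```

Coverage rises with lexicon size but with diminishing returns, which is why a compact lexicon can still cover most of a corpus.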


An English Essay Scoring System Based on Grammaticality and Lexical Cohesion (문법성과 어휘 응집성 기반의 영어 작문 평가 시스템)

  • Kim, Dong-Sung;Kim, Sang-Chul;Chae, Hee-Rahk
    • Korean Journal of Cognitive Science
    • /
    • v.19 no.3
    • /
    • pp.223-255
    • /
    • 2008
  • In this paper, we introduce an automatic system of scoring English essays. The system is comprised of three main components: a spelling checker, a grammar checker and a lexical cohesion checker. We have used such resources as WordNet, Link Grammar/parser and Roget's thesaurus for these components. The usefulness of an automatic scoring system depends on its reliability. To measure reliability, we compared the results of automatic scoring with those of manual scoring, on the basis of the Kappa statistics and the Multi-facet Rasch Model. The statistical data obtained from the comparison showed that the scoring system is as reliable as professional human graders. This system deals with textual units rather than sentential units and checks not only formal properties of a text but also its contents.
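
Agreement between automatic and manual scores, as described in the abstract above, is often summarized with a kappa statistic. The sketch below assumes Cohen's kappa for two raters over categorical scores. The abstract only says "Kappa statistics", so the specific variant is an assumption, and the function name `cohen_kappa` and the example scores are hypothetical.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Observed agreement: share of items on which both raters give the same score.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical scores: automatic system vs. human grader on four essays.
machine = [1, 1, 0, 0]
human = [1, 0, 0, 0]
print(cohen_kappa(machine, human))
```

A value of 1 means perfect agreement and 0 means agreement no better than chance.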


An Automatic Spam e-mail Filter System Using χ2 Statistics and Support Vector Machines (카이 제곱 통계량과 지지벡터기계를 이용한 자동 스팸 메일 분류기)

  • Lee, Songwook
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2009.05a
    • /
    • pp.592-595
    • /
    • 2009
  • We propose an automatic spam mail classifier for e-mail data using Support Vector Machines (SVM). We use the lexical form of a word and its part-of-speech (POS) tag as features. We select useful features with ${\chi}^2$ statistics and represent each feature by its text frequency (TF) and inverse document frequency (IDF) values. After training the SVM with these features, the SVM classifies each e-mail as spam or not. In experiments, we achieved 82.7% accuracy with e-mail data collected from a web mail system.
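
Chi-square feature selection of the kind mentioned in the abstract above is commonly computed from a 2 x 2 table of feature presence against class. The following is a minimal sketch using that standard formula. The abstract does not give the paper's exact formulation, and the function names, the `word/POS` feature strings, and the document counts are hypothetical.

```python
def chi_square_term_score(a, b, c, d):
    """Chi-square score of one feature for a two-class problem.

    a: spam documents containing the feature
    b: non-spam documents containing the feature
    c: spam documents without the feature
    d: non-spam documents without the feature
    """
    n = a + b + c + d
    denominator = (a + c) * (b + d) * (a + b) * (c + d)
    if denominator == 0:
        return 0.0  # feature occurs in every document or in none: uninformative
    return n * (a * d - c * b) ** 2 / denominator


def select_features(doc_counts, n_spam, n_ham, k):
    """Return the k highest-scoring features.

    doc_counts maps feature -> (spam docs containing it, non-spam docs containing it).
    """
    scores = {
        feature: chi_square_term_score(s, h, n_spam - s, n_ham - h)
        for feature, (s, h) in doc_counts.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical counts over 10 spam and 10 non-spam messages.
counts = {"free/NN": (9, 1), "the/DT": (10, 10), "meeting/NN": (1, 8)}
print(select_features(counts, n_spam=10, n_ham=10, k=2))
```

Features that appear equally often in both classes score near zero and are dropped before the classifier is trained.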


The Study Of Lexical Statistics Analysis For Elementary School Textbook : Focusing On Comparing The SEJONG Corpus In Korean (초등학교 교과서의 어휘 통계 분석 연구 : 한국어 세종 코퍼스와의 비교를 중심으로)

  • Yu, Wonhee;Lim, Heuiseok
    • The Journal of Korean Association of Computer Education
    • /
    • v.18 no.1
    • /
    • pp.99-108
    • /
    • 2015
  • In this paper, we build a primary school textbook corpus and perform a statistical analysis of the vocabulary found in elementary textbooks. We also compute Spearman's correlation coefficient to explore whether elementary textbooks use vocabulary similar to that used in everyday life. The results of this study present the construction of the elementary school textbook corpus with actual examples, and then show numerically the correlation between the elementary textbooks and a general corpus.
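
Spearman's correlation coefficient, used in the abstract above to compare textbook vocabulary with a general corpus, is the Pearson correlation of the two rank sequences. This is a minimal sketch with average ranks for ties. The word frequencies are hypothetical and the function names are illustrative, not taken from the paper.

```python
def average_ranks(values):
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho; undefined (division by zero) if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical frequencies of the same five words in two corpora.
textbook_freq = [120, 80, 45, 30, 5]
general_freq = [900, 950, 300, 100, 40]
print(round(spearman(textbook_freq, general_freq), 3))
```

Because only ranks are compared, the two corpora can differ greatly in size without affecting the coefficient.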

Spam Filter by Using X2 Statistics and Support Vector Machines (카이제곱 통계량과 지지벡터기계를 이용한 스팸메일 필터)

  • Lee, Song-Wook
    • The KIPS Transactions:PartB
    • /
    • v.17B no.3
    • /
    • pp.249-254
    • /
    • 2010
  • We propose an automatic spam filter for e-mail data using Support Vector Machines (SVM). We use the lexical form of a word and its part-of-speech (POS) tag as features and select features by chi-square statistics. In the experiments, we represent each feature by TF (text frequency), TF-IDF, or binary weight. After training the SVM with the selected features, the SVM classifies each e-mail as spam or not. In experiments, the selected features improved the performance of our system, and we achieved an overall accuracy of 98.9% on the TREC05-p1 spam corpus.
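
The three feature weightings named in the abstract above (TF, TF-IDF, and binary) can be compared side by side on a toy collection. This is a minimal sketch that assumes raw counts for TF and `log(N / df)` for IDF. The abstract does not specify the paper's exact formulas, and the function name `weight_documents` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def weight_documents(docs, scheme):
    """Turn tokenized documents into sparse {feature: weight} vectors.

    scheme: "tf" (raw count), "binary" (1 if present), or "tfidf".
    """
    n_docs = len(docs)
    # Document frequency: number of documents in which each feature occurs.
    doc_freq = Counter(feature for doc in docs for feature in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        if scheme == "tf":
            vector = dict(tf)
        elif scheme == "binary":
            vector = {feature: 1 for feature in tf}
        elif scheme == "tfidf":
            # Assumed IDF variant: log(N / df), unsmoothed.
            vector = {f: c * math.log(n_docs / doc_freq[f]) for f, c in tf.items()}
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        vectors.append(vector)
    return vectors


# Toy tokenized messages (hypothetical).
docs = [["free", "free", "win"], ["meeting", "win"]]
for scheme in ("tf", "binary", "tfidf"):
    print(scheme, weight_documents(docs, scheme))
```

With this IDF variant, a feature that occurs in every document receives weight zero, so only features that discriminate between documents contribute.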

A Swearword Filter System for Online Game Chatting (온라인게임 채팅에서의 비속어 차단시스템)

  • Lee, Song-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.15 no.7
    • /
    • pp.1531-1536
    • /
    • 2011
  • We propose an automatic swearword filter system for online game chatting using Support Vector Machines (SVM). We collected chat sentences from online games and tagged each as either a normal sentence or a sentence containing swearwords. We use syllable n-grams and the lexical form and part-of-speech (POS) tag of each word as features, and select useful features by chi-square statistics. Each selected feature is represented as a binary weight and used in training the SVM. The SVM classifies each chat sentence as containing a swearword or not. In experiments, we achieved an overall F1 score of 90.4%.
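
Syllable n-gram features with binary weights, as described in the abstract above, can be sketched as follows. The sketch assumes that each Hangul syllable is one code point after NFC normalization and that spaces are dropped before n-grams are extracted. Neither detail is stated in the abstract, and the function names and the placeholder vocabulary are hypothetical.

```python
import unicodedata


def syllable_ngrams(sentence, n):
    """Overlapping n-grams of syllables (code points) in a chat sentence."""
    # NFC composes Hangul jamo sequences into single precomposed syllables.
    text = unicodedata.normalize("NFC", sentence)
    # Assumption: whitespace is ignored so n-grams can span word boundaries.
    text = "".join(text.split())
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def binary_features(sentence, vocabulary, n=2):
    """Binary vector: 1 if the vocabulary n-gram occurs in the sentence, else 0."""
    present = set(syllable_ngrams(sentence, n))
    return [1 if gram in present else 0 for gram in vocabulary]


# Placeholder vocabulary of selected bigrams (hypothetical, harmless syllables).
vocabulary = ["나다", "라마"]
print(syllable_ngrams("가나 다", 2))
print(binary_features("가나 다", vocabulary))
```

The resulting vectors would then serve as input to a classifier such as an SVM.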