• Title/Summary/Keyword: Corpus-based Learning


The effects of corpus-based vocabulary tasks on high school students' English vocabulary learning and attitude (코퍼스를 기반으로 한 어휘 과제가 고등학생의 영어 어휘 학습과 태도에 미치는 영향)

  • Lee, Hyun Jin;Lee, Eun-Joo
    • English Language & Literature Teaching / v.16 no.4 / pp.239-265 / 2010
  • This study investigates the effects of corpus-based vocabulary tasks on the acquisition of English vocabulary in order to explore the influence of corpus use on EFL pedagogy. To this end, 40 Korean high school students participated in the study over a 4-week period. An experimental group used a set of corpus-based tasks for vocabulary learning, whereas a control group carried out a traditional task (i.e., L1-L2 translation). To assess learning gains, the students completed pre- and post-treatment tests measuring the form, meaning, and use aspects of the target lexical items. The results indicate that, in the experimental group, the corpus-based vocabulary tasks were beneficial for learning word forms and use. In particular, the benefits of corpus use were greatest for low-proficiency EFL learners' collocational aspects of vocabulary use. In the control group, by contrast, the traditional vocabulary tasks benefited the meaning aspects of the target vocabulary items the most. In addition, survey results revealed that most students were positive about the corpus-based learning experience, although some expressed reservations about the heavy cognitive load and the time-consuming nature of analyzing corpus data, primarily due to learners' limited language proficiency.


A Transformation-Based Learning Method on Generating Korean Standard Pronunciation

  • Kim, Dong-Sung;Roh, Chang-Hwa
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.241-248 / 2007
  • In this paper, we propose a Transformation-Based Learning (TBL) method for generating Korean standard pronunciation. Previous studies on phonological processing have focused on phonological rule application and finite-state automata (Johnson 1984; Kaplan and Kay 1994; Koskenniemi 1983; Bird 1995). In Korean computational phonology, earlier research has taken a phonological rule-based approach to pronunciation generation (Lee et al. 2005; Lee 1998). This study suggests a corpus-based, data-oriented rule learning method for generating Korean standard pronunciation. To replace rule-based generation with corpus-based generation, we built an aligned corpus that pairs each input with its pronunciation counterpart. Using this aligned corpus, we conducted an experiment on generating standard pronunciation with the TBL algorithm.
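
The general idea of learning rewrite rules from an aligned corpus can be illustrated with a minimal sketch. This is not the authors' system: the rule template (rewrite a symbol when the next input symbol matches), the romanized toy data, and the names `apply_rule`, `learn_tbl`, and `transcribe` are all assumptions made for illustration. The learner starts from the "pronounce as written" guess and greedily adds the rule that removes the most remaining errors.

```python
def apply_rule(current, source, rule):
    """Rewrite `old` as `new` wherever the next input symbol equals `nxt`."""
    old, new, nxt = rule
    out = list(current)
    for i in range(len(current) - 1):
        if current[i] == old and source[i + 1] == nxt:
            out[i] = new
    return out


def count_errors(current, target):
    return sum(c != t for c, t in zip(current, target))


def learn_tbl(pairs, max_rules=10):
    """Greedy TBL over (input, pronunciation) pairs aligned symbol by symbol."""
    # Initial state: assume every symbol is pronounced as written.
    current = [list(src) for src, _ in pairs]
    rules = []
    for _ in range(max_rules):
        # Propose one candidate rule for every remaining error.
        candidates = set()
        for (src, tgt), cur in zip(pairs, current):
            for i in range(len(src) - 1):
                if cur[i] != tgt[i]:
                    candidates.add((cur[i], tgt[i], src[i + 1]))
        # Score each candidate by its net error reduction on the whole corpus.
        best, best_gain = None, 0
        for rule in sorted(candidates):
            gain = 0
            for (src, tgt), cur in zip(pairs, current):
                gain += count_errors(cur, tgt) - count_errors(apply_rule(cur, src, rule), tgt)
            if gain > best_gain:
                best, best_gain = rule, gain
        if best is None:
            break  # no rule improves the corpus any further
        rules.append(best)
        current = [apply_rule(cur, src, best) for (src, _), cur in zip(pairs, current)]
    return rules


def transcribe(source, rules):
    """Apply the learned rules, in order, to a new input sequence."""
    current = list(source)
    for rule in rules:
        current = apply_rule(current, source, rule)
    return current


# Hypothetical romanized toy corpus: syllable-final k is pronounced ng before a nasal.
pairs = [
    (["g", "u", "k", "m", "u", "l"], ["g", "u", "ng", "m", "u", "l"]),
    (["h", "a", "k", "n", "yeo", "n"], ["h", "a", "ng", "n", "yeo", "n"]),
    (["m", "eo", "k", "n", "eu", "n"], ["m", "eo", "ng", "n", "eu", "n"]),
    (["a", "k", "i"], ["a", "k", "i"]),
]
rules = learn_tbl(pairs)
print(rules)
print(transcribe(["k", "u", "k", "n", "ae"], rules))
```

Because the learned rules are applied in the order in which they were learned, a later rule can repair errors introduced by an earlier one; on this toy data no such interaction arises.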


Novice Corpus Users' Gains and Views on Corpus-based Lexical Development: A Case Study of COVID-19-related Expressions

  • Chen, Mei-Hua
    • Asia Pacific Journal of Corpus Research / v.2 no.1 / pp.1-11 / 2021
  • Recently, corpus-assisted vocabulary instruction has attracted considerable interest. Most studies have focused on language learners' receptive vocabulary knowledge; limited attention has been paid to learners' productive competence. To fill this gap, this study examined learners' productive lexical development in terms of form, meaning, and use. EFL learners were introduced to corpus-based language pedagogy to learn COVID-19 theme-based vocabulary. To investigate the gains and views of 33 first-year EFL college students, a sentence completion task and a questionnaire were developed, targeting learners' productive performance in the three aspects of lexical knowledge (i.e., form, meaning, and use). The results revealed that the students achieved significant gains in all aspects regardless of their proficiency level. In particular, the less proficient students achieved greater knowledge retention than their more proficient counterparts. Meanwhile, students showed positive attitudes towards the corpus-based approach to vocabulary learning.

A Corpus-based Lexical Analysis of the Speech Texts: A Collocational Approach

  • Kim, Nahk-Bohk
    • English Language & Literature Teaching / v.15 no.3 / pp.151-170 / 2009
  • Recently, speech texts have been increasingly used in English education because of their various advantages as language teaching and learning materials. The purpose of this paper is to analyze speech texts using a corpus-based lexical approach and to suggest some productive methods that use English speaking or writing as the main resource for the course, along with actual classroom adaptations. First, this study shows that a speech corpus has some unique features, such as different selections of pronouns, nouns, and lexical chunks, in comparison to a general corpus. Next, from a collocational perspective, the study demonstrates that the speech corpus contains a wide variety of the collocations and lexical chunks that a number of linguists describe (Lewis, 1997; McCarthy, 1990; Willis, 1990). In other words, the speech corpus suggests not only that speech texts have considerable lexical potential that could be exploited to facilitate chunk-learning, but also that learners are not very likely to unlock this potential autonomously. Based on this result, teachers can develop a learners' corpus and use it by chunking the speech text. This new approach of adapting speech samples as important materials for developing college students' speaking or writing ability should be implemented as shown in samplers. Finally, to foster learners' productive skills more communicatively, a few practical suggestions are made, such as chunking and windowing chunks of speech and presentation, and the pedagogical implications are discussed.


The Role of Distributional Cues in the Acquisition of Verb Argument Structures

  • Kim, Mee-Sook
    • Language and Information / v.7 no.1 / pp.87-99 / 2003
  • This paper investigates the role of input frequency in the acquisition of verb argument structures, based on distributional information from a corpus of utterances derived from the English CHILDES database (MacWhinney 1993). It has been widely accepted that children successfully learn verb argument structures through innate language mechanisms, such as linking rules that connect verb meanings with their syntactic structures. In contrast, an approach to language acquisition called “statistical language learning” has recently claimed that children could succeed in acquiring syntactic structures in the absence of innate language mechanisms by making use of distributional properties of the input. In this paper, I evaluate the feasibility of statistical learning in acquiring verb argument structures, based on distributional information about locative verbs in parental input. The naturalistic data allow us to investigate to what extent the statistical learning approach can and cannot help children succeed in learning the syntax of locative verbs. Based on the results of the English database analysis, I show that parental input contains rich statistical information for learning the syntactic possibilities of locative verbs, despite some limitations in the statistical learning approach.
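
As a rough illustration of what "distributional information about locative verbs" might look like in practice, the sketch below tallies how often each verb appears in a figure-object frame ("pour milk into the cup") versus a ground-object frame ("fill the cup with milk"). It is not the paper's procedure: the utterances are invented rather than taken from CHILDES, frames are detected only by the first matching preposition after the verb, inflected verb forms are ignored, and `frame_counts` and `frame_distribution` are hypothetical names.

```python
from collections import Counter, defaultdict

# Simplifying assumption: the first of these prepositions after the verb marks the frame.
FIGURE_PREPS = {"into", "onto", "in", "on"}
GROUND_PREPS = {"with"}
LOCATIVE_VERBS = {"pour", "fill", "load"}


def frame_counts(utterances):
    """Count, per verb, how often it occurs in each locative frame."""
    counts = defaultdict(Counter)
    for utterance in utterances:
        tokens = utterance.lower().split()
        for i, token in enumerate(tokens):
            if token not in LOCATIVE_VERBS:
                continue
            for later in tokens[i + 1:]:
                if later in FIGURE_PREPS:
                    counts[token]["figure-object"] += 1
                    break
                if later in GROUND_PREPS:
                    counts[token]["ground-object"] += 1
                    break
    return counts


def frame_distribution(counts):
    """Turn raw counts into relative frequencies P(frame | verb)."""
    return {
        verb: {frame: n / sum(frames.values()) for frame, n in frames.items()}
        for verb, frames in counts.items()
    }


# Invented parental utterances, for illustration only.
utterances = [
    "pour the milk into the cup",
    "pour juice in the glass",
    "fill the cup with milk",
    "fill the bucket with sand",
    "load the hay onto the truck",
    "load the truck with hay",
]
dist = frame_distribution(frame_counts(utterances))
for verb, frames in sorted(dist.items()):
    print(verb, frames)
```

On this toy input, "pour" appears only in the figure-object frame, "fill" only in the ground-object frame, and "load" alternates, which is the kind of pattern a statistical learner could in principle exploit.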


Effects of Corpus Use on Error Identification in L2 Writing

  • Yoshiho Satake
    • Asia Pacific Journal of Corpus Research / v.4 no.1 / pp.61-71 / 2023
  • This study examines the effects of data-driven learning (DDL), an approach that employs corpora for inductive learning of language patterns, on error identification in second language (L2) writing. The data consist of error identification instances from fifty-five participants, compared across different reference materials: the Corpus of Contemporary American English (COCA), dictionaries, and no reference materials. There are three significant findings. First, using COCA was effective for identifying collocational and form-related errors, owing to inductive inference drawn from multiple example sentences. Second, dictionaries were beneficial for identifying lexical errors, where meaning information was helpful. Finally, the participants often employed a strategic approach, identifying many simple errors without reference materials. However, while this strategy maximized error identification, it also led to mislabeling correct expressions as errors. The author concludes that the strategic selection of reference materials can significantly enhance the effectiveness of error identification in L2 writing. The use of a corpus offers advantages such as easy access to target phrases and frequency information, features that are especially useful given that most errors were collocational and form-related. The findings suggest that teachers should guide learners to use reference materials appropriate to each error type.

Generative probabilistic model with Dirichlet prior distribution for similarity analysis of research topic

  • Milyahilu, John;Kim, Jong Nam
    • Journal of Korea Multimedia Society / v.23 no.4 / pp.595-602 / 2020
  • We propose a generative probabilistic model with a Dirichlet prior distribution for topic modeling and text similarity analysis. It assigns topics and calculates text correlation between documents within a corpus. It also provides the posterior probabilities assigned to each topic of a document, based on the prior distribution in the corpus. We then present a Gibbs sampling algorithm for inference about the posterior distribution and compute text correlation among 50 abstracts from papers published by IEEE. We also conduct supervised learning to set a benchmark that justifies the performance of LDA (Latent Dirichlet Allocation). The experiments show that the accuracy of topic assignment to a given document is 76% for LDA. The results for supervised learning show an accuracy of 61%, a precision of 93%, and an f1-score of 96%. A discussion of the experimental results provides a thorough justification based on probabilities, distributions, evaluation metrics, and correlation coefficients with respect to topic assignment.
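
A minimal sketch of collapsed Gibbs sampling for LDA may help make the inference step concrete. It is a textbook-style illustration, not the authors' implementation: the hyperparameters `alpha` and `beta`, the iteration count, the toy documents, and the use of cosine similarity between document-topic vectors as a stand-in for "text correlation" are all assumptions.

```python
import math
import random


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]           # n_dk
    topic_word = [[0] * n_words for _ in range(n_topics)]  # n_kw
    topic_total = [0] * n_topics                          # n_k
    assignments = []

    # Random initial topic for every token.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(n_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][word_id[w]] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                z = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][z] -= 1
                topic_word[z][v] -= 1
                topic_total[z] -= 1
                # Full conditional: (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta).
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][v] + beta)
                    / (topic_total[k] + n_words * beta)
                    for k in range(n_topics)
                ]
                z = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][v] += 1
                topic_total[z] += 1

    # Posterior mean estimate of each document's topic proportions.
    return [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]


def cosine(a, b):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Invented toy documents, already tokenized.
docs = [
    "image pixel filter image edge".split(),
    "pixel edge filter image".split(),
    "network packet router network latency".split(),
    "router latency packet network".split(),
]
theta = lda_gibbs(docs, n_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
print(round(cosine(theta[0], theta[1]), 3), round(cosine(theta[0], theta[2]), 3))
```

Each row of `theta` is a probability distribution over topics for one document, so documents can be compared through these vectors rather than through raw word overlap.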

A Corpus-based Analysis of EFL Learners' Use of Discourse Markers in Cross-cultural Communication

  • Min, Sujung
    • English Language & Literature Teaching / v.17 no.3 / pp.177-194 / 2011
  • This study examines the use of discourse markers in cross-cultural communication between EFL learners in an e-learning environment. The study analyzes discourse markers in a corpus drawn from an interactive website with a bulletin board system, through which college students of English at Japanese and Korean universities interacted with each other to discuss local and global issues. It compares the use of discourse markers in the learners' corpus with that in a corpus of native English speakers. The results indicate that discourse markers are useful interactional devices for structuring and organizing discourse. EFL learners are found to use referentially and cognitively functional discourse markers more frequently and to use other markers relatively rarely. Native speakers are found to use a wider variety of discourse markers for different functions. Suggestions are made for using computer corpora to understand EFL learners' language difficulties and to help them become more interactionally competent speakers.


Comparison Thai Word Sense Disambiguation Method

  • Modhiran, Teerapong;Kruatrachue, Boontee;Supnithi, Thepchai
    • 제어로봇시스템학회:학술대회논문집 (Institute of Control, Robotics and Systems: Conference Proceedings) / 2004.08a / pp.1307-1312 / 2004
  • Word sense disambiguation (WSD) is one of the most important problems in natural language processing research areas such as information retrieval and machine translation. Many approaches can be employed to resolve word ambiguity with a reasonable degree of accuracy. They fall into three strategies: knowledge-based, corpus-based, and hybrid. This paper focuses on the corpus-based strategy. Its purpose is to compare three well-known machine learning techniques, SNoW, SVM, and Naive Bayes, for word sense disambiguation in Thai. Ten ambiguous words are selected and tested with word and part-of-speech (POS) features. The results show that the SVM algorithm gives the best results for Thai WSD, with an accuracy of approximately 83-96%.
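
To make the corpus-based strategy concrete, here is a minimal sketch of one of the three classifiers named above, Naive Bayes, applied to sense labeling. It is not the paper's setup: the English toy examples for the ambiguous word "bank", the use of bag-of-context-words features only (no POS features), add-one smoothing, and the names `train_nb` and `predict_nb` are assumptions for illustration.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Collect sense priors and per-sense context-word counts."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for context, sense in examples:
        sense_counts[sense] += 1
        for word in context:
            word_counts[sense][word] += 1
            vocab.add(word)
    return sense_counts, word_counts, vocab


def predict_nb(model, context):
    """Return the sense with the highest log posterior, using add-one smoothing."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for word in context:
            if word in vocab:  # skip words never seen in training
                score += math.log((word_counts[sense][word] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Invented training contexts for the ambiguous word "bank".
examples = [
    (["deposit", "money", "account"], "finance"),
    (["loan", "money", "interest"], "finance"),
    (["river", "water", "fishing"], "river"),
    (["muddy", "river", "shore"], "river"),
]
model = train_nb(examples)
print(predict_nb(model, ["money", "loan"]))
print(predict_nb(model, ["water", "shore"]))
```

POS tags of neighboring words could be added to each context list as extra feature strings without changing the classifier.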


A Study of Research on Methods of Automated Biomedical Document Classification using Topic Modeling and Deep Learning (토픽모델링과 딥 러닝을 활용한 생의학 문헌 자동 분류 기법 연구)

  • Yuk, JeeHee;Song, Min
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.63-88 / 2018
  • This research evaluates differences in classification performance across feature selection methods (the LDA topic model and Doc2Vec, a word embedding method based on deep learning), feature corpus sizes, and classification algorithms. To find the feature corpus that yields high classification performance, experiments were conducted with feature corpora composed differently according to location within the document and with the size of the feature corpus adjusted. In addition, the deep learning experiments evaluated training frequency and specifically considered information for context inference. The study constructed a biomedical document dataset, Disease-35083, consisting of biomedical scholarly documents provided by PMC and categorized by disease. The study verifies which type and size of feature corpus produces the highest performance and also suggests feature corpora that can be extended to specific features, as indicated by their efficiency during training. Additionally, it compares deep learning with existing methods and suggests an appropriate method for each classification environment.