• Title/Summary/Keyword: learners' corpus

An Attempt to Measure the Familiarity of Specialized Japanese in the Nursing Care Field

  • Haihong Huang; Hiroyuki Muto; Toshiyuki Kanamaru
    • Asia Pacific Journal of Corpus Research / v.4 no.2 / pp.57-74 / 2023
  • Having a firm grasp of technical terms is essential for learners of Japanese for Specific Purposes (JSP). This research aims to analyze Japanese nursing care vocabulary based on objective corpus-based frequency and subjectively rated word familiarity. For this purpose, we constructed a text corpus centered on the National Examination for Certified Care Workers to extract nursing care keywords. The Log-Likelihood Ratio (LLR) was used as the statistical criterion for keyword identification, giving a list of 300 keywords as target words for a further word recognition survey. The survey involved 115 participants of whom 51 were certified care workers (CW group) and 64 were individuals from the general public (GP group). These participants rated the familiarity of the target keywords through crowdsourcing. Given the limited sample size, Bayesian linear mixed models were utilized to determine word familiarity rates. Our study conducted a comparative analysis of word familiarity between the CW group and the GP group, revealing key terms that are crucial for professionals but potentially unfamiliar to the general public. By focusing on these terms, instructors can bridge the knowledge gap more efficiently.
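The keyword-extraction step can be illustrated with a minimal sketch of a log-likelihood keyness score. The abstract does not state which LLR formulation was used, so the two-term form common in corpus linguistics is assumed here; the function name, parameter names, and the 3.84 cutoff (the chi-square critical value at p < 0.05 with one degree of freedom) are illustrative choices, not taken from the paper.

```python
import math


def log_likelihood(freq_target, size_target, freq_ref, size_ref):
    """Keyness of a word: how far its frequency in a target corpus departs
    from what a reference corpus would predict (assumed two-term LLR)."""
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were equally likely in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size
    score = 0.0
    for observed, expected in ((freq_target, expected_target),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2.0 * score


# Hypothetical counts: a word seen 50 times in a 1,000-token target corpus
# and 5 times in a 1,000-token reference corpus.
llr = log_likelihood(50, 1000, 5, 1000)
is_keyword = llr > 3.84  # assumed significance cutoff
```

Ranking all candidate words by this score and keeping the top entries is one way a fixed-length keyword list, such as the 300 target words described above, could be produced.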

Predicting CEFR Levels in L2 Oral Speech, Based on Lexical and Syntactic Complexity

  • Hu, Xiaolin
    • Asia Pacific Journal of Corpus Research / v.2 no.1 / pp.35-45 / 2021
  • With the widespread use of the Common European Framework of Reference (CEFR) scales, many studies attempt to apply them in routine teaching and rater training, while more evidence regarding criterial features at different CEFR levels is still urgently needed. The current study aims to explore complexity features that distinguish and predict CEFR proficiency levels in oral performance. Using a quantitative/corpus-based approach, this research analyzed lexical and syntactic complexity features over 80 transcriptions (including A1, A2, and B1 CEFR levels and native speakers), based on an interview test, the Standard Speaking Test (SST). ANOVA and correlation analysis were conducted to exclude insignificant complexity indices before the discriminant analysis. As a result, distinctive differences in complexity between CEFR speaking levels were observed, and with a combination of six major complexity features as predictors, 78.8% of the oral transcriptions were classified into the appropriate CEFR proficiency levels. This further confirms the possibility of predicting the CEFR level of L2 learners based on their objective linguistic features. This study can be helpful as an empirical reference in language pedagogy, especially for L2 learners' self-assessment and teachers' prediction of students' proficiency levels. It also offers implications for the validation of the rating criteria and the improvement of the rating system.

A Hybrid Sentence Alignment Method for Building a Korean-English Parallel Corpus (한영 병렬 코퍼스 구축을 위한 하이브리드 기반 문장 자동 정렬 방법)

  • Park, Jung-Yeul; Cha, Jeong-Won
    • MALSORI / v.68 / pp.95-114 / 2008
  • The recent growing popularity of statistical methods in machine translation requires much larger parallel corpora. A Korean-English parallel corpus, however, is not yet sufficiently available, and little research on this subject is being conducted. In this paper we present a hybrid method of aligning sentences for Korean-English parallel corpora. To build a Korean-English parallel corpus, we use bilingual news wire web pages, reading comprehension materials for English learners, computer-related technical documents, and help files of localized software. Our hybrid method combines sentence-length-based and word-correspondence-based methods. We present and evaluate the experimental results. Alignment results from using a full translation model are very encouraging, especially when we apply alignment results to an SMT system: an improvement of 0.66% in BLEU score and 9.94% in NIST score compared to the previous method.

Effects of Corpus Use on Error Identification in L2 Writing

  • Yoshiho Satake
    • Asia Pacific Journal of Corpus Research / v.4 no.1 / pp.61-71 / 2023
  • This study examines the effects of data-driven learning (DDL), an approach employing corpora for inductive language pattern learning, on error identification in second language (L2) writing. The data consist of error identification instances from fifty-five participants, compared across different reference materials: the Corpus of Contemporary American English (COCA), dictionaries, and no use of reference materials. There are three significant findings. First, the use of COCA was effective for identifying collocational and form-related errors, owing to inductive inference drawn from multiple example sentences. Second, dictionaries were beneficial for identifying lexical errors, where providing meaning information was helpful. Finally, the participants often employed a strategic approach, identifying many simple errors without reference materials. However, while maximizing error identification, this strategy also led to mislabeling correct expressions as errors. The author concludes that the strategic selection of reference materials can significantly enhance the effectiveness of error identification in L2 writing. The use of a corpus offers advantages such as easy access to target phrases and frequency information, features especially useful given that most errors were collocational and form-related. The findings suggest that teachers should guide learners to use appropriate reference materials effectively to identify errors based on error types.

A BERT-Based Automatic Scoring Model of Korean Language Learners' Essay

  • Lee, Jung Hee; Park, Ji Su; Shon, Jin Gon
    • Journal of Information Processing Systems / v.18 no.2 / pp.282-291 / 2022
  • This research applies a pre-trained bidirectional encoder representations from transformers (BERT) handwriting recognition model to predict foreign Korean-language learners' writing scores. A corpus of 586 answers to midterm and final exams written by foreign learners at the Intermediate 1 level was acquired and used for pre-training, resulting in consistent performance, even with small datasets. The test data were pre-processed and fine-tuned, and the results were calculated in the form of a score prediction. The difference between the prediction and actual score was then calculated. An accuracy of 95.8% was demonstrated, indicating that the prediction results were strong overall; hence, the tool is suitable for the automatic scoring of Korean written test answers, including grammatical errors, written by foreigners. These results are particularly meaningful in that the data included written language text produced by foreign learners, not native speakers.

Teaching Grammar for Spoken Korean to English-speaking Learners: Reported Speech Marker '-dae'. (영어권 학습자를 위한 한국어 구어 문법 교육 - 보고 표지 '-대'를 중심으로 -)

  • Kim, Young A; Cho, In Jung
    • Journal of Korean language education / v.23 no.1 / pp.1-23 / 2012
  • The development of corpora in recent years has attracted increased research on spoken Korean. Nevertheless, these research outcomes are yet to be meaningfully and adequately reflected in Korean language textbooks. The reported speech marker '-dae' is one of the areas that need more attention. This study investigates whether textbooks explain '-dae' clearly enough to English-speaking learners to prevent confusion and misuse. Based on a contrastive analysis of Korean and English, this study argues three points. First, '-dae' should be introduced to Korean learners as an independent sentence ender rather than a contracted form of '-dago hae'. Second, it is necessary to teach English-speaking learners that '-dae' is not equivalent to the English reported speech form. It functions more or less as a third person marker in Korean. Learners should be informed that '-dae' is used for statements in English, if those statements were hearsay but the source of information does not need to be specified. This is a very distinctive difference between Korean and English and should be emphasized in class when '-dae' is taught. Third, '-dae' should be introduced before indirect speech constructions, because it is mainly used in simple statements and the frequency of '-dae' is very high in spoken Korean.

Acoustic analysis of Korean trisyllabic words produced by English and Korean speakers

  • Lee, Jeong-Hwa; Rhee, Seok-Chae
    • Phonetics and Speech Sciences / v.10 no.2 / pp.1-6 / 2018
  • The current study aimed to investigate the transfer of English word stress rules to the production of Korean trisyllabic words by L1 English learners of Korean. It compared English and Korean speakers' productions of seven Korean words from the corpus L2KSC (Rhee et al., 2005). To this end, it analyzed the syllable duration, intensity, and pitch. The results showed that English and Korean speakers' pronunciations differed markedly in duration and intensity. English learners produced word-initial syllables of greater intensity than Korean speakers, while Korean speakers produced word-final syllables of longer duration than English learners. However, these differences between the two speaker groups were not related to the expected L1 transfer. The tonal patterns produced by English and Korean speakers were similar, reflecting L1 English speakers' learning of the L2 Korean prosodic system.

How Korean Learner's English Proficiency Level Affects English Speech Production Variations

  • Hong, Hye-Jin; Kim, Sun-Hee; Chung, Min-Hwa
    • Phonetics and Speech Sciences / v.3 no.3 / pp.115-121 / 2011
  • This paper examines how L2 speech production varies according to learners' L2 proficiency level. L2 speech production variations are analyzed by quantitative measures at word and phone levels using a corpus of Korean learners' English. Word-level variations are analyzed using correctness to explain how speech realizations differ from the canonical forms, while accuracy is used for analysis at the phone level to reflect phone insertions and deletions together with substitutions. The results show that the speech production of learners with different L2 proficiency levels is considerably different in terms of performance and individual realizations at word and phone levels. These results confirm that speech production of non-native speakers varies according to their L2 proficiency levels, even though they share the same L1 background. Furthermore, they will contribute to improving the non-native speech recognition performance of ASR-based English language educational systems for Korean learners of English.
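The distinction between correctness and accuracy can be made concrete with a short sketch. The abstract gives no formulas, so the definitions conventionally used in speech recognition scoring are assumed here: correctness penalizes deletions and substitutions, while accuracy additionally penalizes insertions. The function names and the counts in the example are hypothetical.

```python
def correctness(n_ref, deletions, substitutions):
    """Percentage of reference units realized in their canonical form
    (assumed definition: insertions are ignored)."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - deletions - substitutions) / n_ref


def accuracy(n_ref, deletions, substitutions, insertions):
    """Like correctness, but inserted units also count as errors
    (assumed definition)."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return 100.0 * (n_ref - deletions - substitutions - insertions) / n_ref


# Hypothetical counts for 100 canonical phones:
# 5 deleted, 10 substituted, 3 extra phones inserted.
corr = correctness(100, 5, 10)
acc = accuracy(100, 5, 10, 3)
```

Under these definitions accuracy can never exceed correctness, and it can fall below zero when a speaker inserts many extra phones, which is why it suits phone-level analysis where insertions matter.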

Modality in Korean Learners' Spoken Interlanguage

  • Park, Hyeson
    • English Language & Literature Teaching / v.18 no.1 / pp.197-216 / 2012
  • This study examines the spoken interlanguage of Korean learners of English, focusing on the distribution of modal verbs and devices of epistemic modality. (Semi-)spontaneous speech data were collected from four students participating in a self-organized study group for seven months, which produced a corpus of about 55,000 words. The data analysis reveals the following: 1) The frequency of the modal verbs produced by the learners was lower than that of native speakers: 1.99 vs. 2.32 tokens per 100 words. The range of the modal verbs used by the learners was also very limited, with over-reliance on can (43%). 2) The grammatical categories of the devices marking epistemic modality were in the order of adverbs, lexical verbs, and modal verbs, with a high frequency of a few items in each category. 3) Lexical items conveying certainty and modals of obligation were preferred over markers of weaker commitment, resulting in speech characterized by firmer assertions and a more authoritative tone, a potential cause for pragmatic failure. 4) A weak developmental change was observed in the frequency of modal verbs, but not in their functions, over the seven-month period of data collection. L1 influence, L2 proficiency, mode of communication, and instruction effects are discussed as possible variables involved in the distribution patterns observed.
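The "tokens per 100 words" figures are normalized frequencies, which a minimal sketch can show. The modal list, the tokenizer, and the function name below are assumptions for illustration; the abstract does not say which forms were counted or how the transcripts were tokenized.

```python
import re

# Assumed set of core English modal verbs; the study's own list may differ.
MODALS = {"can", "could", "may", "might", "must",
          "shall", "should", "will", "would"}


def modals_per_100_words(text):
    """Normalized frequency: modal tokens per 100 running words."""
    # Simple tokenizer: lowercase runs of letters and apostrophes.
    # Contracted forms such as "can't" stay whole and are not counted here.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in MODALS)
    return 100.0 * hits / len(tokens)


# Two modals in seven words.
rate = modals_per_100_words("I can go and you should stay")
```

Normalizing by corpus size in this way is what allows a learner corpus of about 55,000 words to be compared with a native-speaker corpus of a different size.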

The Formant Frequency Differences of English Vowels as a Function of Stress and its Applications on Vowel Pronunciation Training (강세에 따른 영어 모음의 포먼트 변이와 모음 발음 교육에의 응용)

  • Kim, Ji-Eun; Yoon, Kyuchul
    • Phonetics and Speech Sciences / v.5 no.2 / pp.53-58 / 2013
  • The purpose of this study is to compare the first two vowel formants of the stressed and unstressed English vowels produced by ten young males (in their twenties and thirties) and ten old males (in their forties or fifties) from the Buckeye Corpus of Conversational Speech. The results indicate that the stressed and unstressed vowels, /i/ and /æ/ in particular, from the two groups are different in their formant frequencies. In addition, the vowel space of the unstressed vowels is somewhat smaller than that of the stressed vowels. Specifically, the range of the second formant of the unstressed vowels and that of the first formant of the unstressed front vowels were compressed. The findings from this study can be applied to pronunciation training in English vowels for Korean learners. We propose that teachers of English pay attention to the stress patterns of English vowels as well as their formant frequencies.