• Title/Summary/Keyword: Corpus Compilation


Compilation of the Yonsei English Learner Corpus (YELC) 2011 and Its Use for Understanding Current Usage of English by Korean Pre-university Students (한국 예비 대학생의 영어 사용 특성 파악을 위한 대규모 공개 영어 학습자 코퍼스 구축 및 분석)

  • Rhee, Seok-Chae;Jung, Chae Kwan
    • The Journal of the Korea Contents Association / v.14 no.11 / pp.1019-1029 / 2014
  • In recent years, researchers have become increasingly interested in the creation and pedagogical use of English learner corpora. Many studies have shown that learner corpora can contribute significantly not only to second language acquisition research but also to the construction and evaluation of language tests, by advancing our understanding of English learners. So far, however, little attention has been paid to corpora of Korean EFL (English as a foreign language) learners. The Yonsei English Learner Corpus (YELC 2011) is a specialized, monolingual, and synchronic Korean EFL learner corpus that was developed by Yonsei University from 2011 to 2012. Over 3,000 Korean high school graduates (or equivalents) who had been accepted by Yonsei University for further study participated in the project. The corpus consists of 6,572 written texts (1,085,828 words) at nine different English proficiency levels. In this paper, we describe its compilation and, more specifically, how we turned a text archive into a corpus ("corpusization"). After introducing the corpusization process, we report notable insights into the linguistic features specific to Korean learners of English at different proficiency levels. This study also discusses the potential uses of YELC 2011, which is now freely available for research purposes.

A Compilation of Maritime English Corpus for English for Specific Purposes Education (특수목적영어 교육을 위한 해사영어코퍼스 구축)

  • Lee, Sung-Min;Kim, Jae-Hoon;Jhang, Se-Eun
    • Annual Conference on Human and Language Technology / 2015.10a / pp.163-164 / 2015
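  • This study aims to compile a maritime English corpus, maritime English being one field of English for specific purposes (ESP). To ensure the representativeness and balance that corpus compilation requires, the corpus was divided into four genres (academic, news, law, and books), and each subcorpus was built to one million words. To collect and clean only the text from websites and PDF documents, the authors wrote Python code and also used freely available programs. The maritime English corpus can be used for data-driven learning, for example by providing the word lists needed for maritime English vocabulary teaching or by supporting example-sentence search. The compilation process described here can also be applied to building ESP corpora in other fields.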
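
The text-only collection and cleaning step mentioned above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the authors' code: the class `TextExtractor`, the helpers `html_to_text` and `word_count`, and the sample page are all hypothetical, and PDF extraction (which needs a third-party tool) is left out.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html):
    """Return the text of an HTML page with whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return re.sub(r"\s+", " ", " ".join(parser.parts)).strip()


def word_count(text):
    """Count whitespace-separated tokens, e.g. to track subcorpus size."""
    return len(text.split())


# Hypothetical sample page, not taken from the corpus.
page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>The vessel  left port.</p><script>var x=1;</script></body></html>"
)
clean = html_to_text(page)
print(clean, word_count(clean))
```

A running word count of this kind is one simple way to check when a genre subcorpus has reached its target size (one million words in the study).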


Study on Extraction of Headwords for Compilation of 「Donguibogam Dictionary」 - Based on Corpus-based Analysis - (『동의보감사전』 편찬을 위한 표제어 추출에 관한 연구 - 코퍼스 분석방법을 바탕으로 -)

  • Jung, Ji-Hun;Kim, Do-Hoon;Kim, Dong-Ryul
    • The Journal of Korean Medical History / v.29 no.1 / pp.47-54 / 2016
  • This article attempts to extract headwords for the compilation of the "Donguibogam Dictionary" using corpus-based analysis. The computerized original text of Donguibogam is converted into a text file with the program 'EM Editor'. The Chinese characters that occur most frequently in Donguibogam are extracted with the corpus analysis program 'AntConc'. Two-, three-, four-, and five-syllable words containing each high-frequency character are then extracted with n-cluster, one of AntConc's functions. Lastly, the output that is meaningful as words is sorted. As a result, the words that appear often in Donguibogam are identified; the names of books, medicinal herbs, disease symptoms, and prescriptions are especially frequent. This corpus-based method of extracting headwords can suggest a better headword list for the "Donguibogam Dictionary" in the future.
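
The two counting steps described in this abstract (high-frequency characters, then 2- to 5-character clusters containing them) can be approximated in a few lines. This is a rough sketch and not AntConc's implementation: the function names and the toy three-line sample are assumptions for illustration, punctuation is not filtered, and the final manual step of keeping only meaningful words is not modeled.

```python
from collections import Counter


def top_characters(text, k):
    """Return the k most frequent non-whitespace characters."""
    counts = Counter(ch for ch in text if not ch.isspace())
    return [ch for ch, _ in counts.most_common(k)]


def clusters_containing(text, chars, n_min=2, n_max=5):
    """Count character n-grams (length n_min..n_max) that contain a target character.

    N-grams are taken within each line, so clusters never span a line break.
    """
    targets = set(chars)
    counts = Counter()
    for line in text.splitlines():
        line = "".join(line.split())  # drop whitespace inside the line
        for n in range(n_min, n_max + 1):
            for i in range(len(line) - n + 1):
                gram = line[i:i + n]
                if targets & set(gram):
                    counts[gram] += 1
    return counts


# Toy sample for illustration only, not a passage from Donguibogam.
sample = "人參湯\n人參散\n人參湯"
top = top_characters(sample, 1)
clusters = clusters_containing(sample, top)
for gram, freq in clusters.most_common():
    print(gram, freq)
```

Sorting the resulting clusters by frequency gives a candidate list that a specialist would still need to review by hand, which matches the abstract's last step.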

A Corpus-based English Syntax Academic Word List Building and its Lexical Profile Analysis (코퍼스 기반 영어 통사론 학술 어휘목록 구축 및 어휘 분포 분석)

  • Lee, Hye-Jin;Lee, Je-Young
    • The Journal of the Korea Contents Association / v.21 no.12 / pp.132-139 / 2021
  • This corpus-driven study describes the compilation of the most frequently occurring academic words in the domain of syntax and compares the extracted word list with the Academic Word List (AWL) of Coxhead (2000) and the General Service List (GSL) of West (1953) to examine their distribution and coverage within the syntax corpus. A specialized corpus of 546,074 tokens, composed of widely used, must-read syntax textbooks for English education majors, was loaded into and analyzed with AntWordProfiler 1.4.1. Based on lexical frequency, the analysis identified 288 (50.5%) AWL word forms that appeared 16 times or more and 218 (38.2%) AWL items that appeared 15 times or fewer. The analysis also showed that the AWL and the GSL covered 9.19% and 78.92% of all tokens, respectively, for a combined coverage of 88.11%. Given that the AWL can be instrumental in serving broad disciplinary needs, the study highlights the need to compile a domain-specific AWL as a lexical repertoire that promotes academic literacy and competence.
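
Token coverage of the kind reported here is the share of corpus tokens that belong to a given word list. The sketch below shows the calculation under simplifying assumptions: the tokenizer, the function names, and the two toy word lists are hypothetical, and it matches exact word forms only, so it will not reproduce the figures from the profiling tool used in the study.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def coverage(tokens, wordlist):
    """Return the percentage of tokens that appear in wordlist."""
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in wordlist)
    return 100.0 * hits / len(tokens)


# Toy word lists for illustration only, not the real GSL or AWL.
gsl_toy = {"the", "a", "of", "is", "in"}
awl_toy = {"structure", "analyze"}

text = "The structure of a clause is analyzed in the syntax tree"
tokens = tokenize(text)

gsl_cov = coverage(tokens, gsl_toy)
awl_cov = coverage(tokens, awl_toy)
print(f"GSL {gsl_cov:.2f}%  AWL {awl_cov:.2f}%  combined {gsl_cov + awl_cov:.2f}%")
```

Because the two lists do not overlap, their coverages add up directly, which is how the reported 78.92% (GSL) and 9.19% (AWL) sum to 88.11%. In this sketch "analyzed" is not counted under "analyze", since no word-family or lemma matching is applied.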