• Title/Summary/Keyword: Corpus Analysis Tools

Citation Practices in Academic Corpora: Implications for EAP Writing

  • Min, Su-Jung
    • English Language & Literature Teaching / v.10 no.3 / pp.113-126 / 2004
  • Explicit reference to the work of other authors is an essential feature of most academic research writing. Corpus analysis of academic text can reveal much about what writers actually do and why they do so. The application of corpus tools in language education has been well documented by many scholars (Pedersen, 1995; Swales, 1990; Thompson, 2000), who demonstrate how computer technology can assist in the effective analysis of corpus-based data. For teaching purposes, this recent research provides insights in the area of English for Academic Purposes (EAP). The need for such support is evident when students have to use appropriate citations in their writing. Using Swales' (1990) division of citation forms into integral and non-integral and Thompson and Tribble's (2001) classification scheme, this paper codifies academic texts in a corpus. The texts are academic research articles from different disciplines. The results lead into a comparison of the citation practices in different disciplines. Finally, it is argued that the information obtained in this study is useful for EAP writing courses in EFL countries.
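
The integral/non-integral split mentioned in this abstract can be illustrated with a rough heuristic. The sketch below is an assumption for illustration only, not the coding procedure used in the paper: it treats an author name followed by a parenthesised year as integral, and an author-year pair enclosed entirely in parentheses as non-integral. The function name `count_citations` and both regular expressions are hypothetical, and Thompson and Tribble's finer-grained categories are not attempted.

```python
import re

# Integral: the author name sits in the sentence, the year in parentheses,
# e.g. "Swales (1990)" or "Thompson and Tribble's (2001)".
INTEGRAL = re.compile(
    r"\b[A-Z][a-z]+(?:\s+(?:and|&)\s+[A-Z][a-z]+)?(?:'s|')?\s+\((\d{4})\)"
)

# Non-integral: author and year are both inside the parentheses,
# e.g. "(Pedersen, 1995; Thompson, 2000)".
NON_INTEGRAL = re.compile(r"\(([^()]*?[A-Z][a-z]+[^()]*?,\s*\d{4}[^()]*)\)")


def count_citations(text):
    """Count integral and non-integral citations with a simple regex heuristic."""
    integral = len(INTEGRAL.findall(text))
    non_integral = 0
    for group in NON_INTEGRAL.findall(text):
        # One parenthetical may hold several sources; count each year once.
        non_integral += len(re.findall(r"\d{4}", group))
    return {"integral": integral, "non_integral": non_integral}


if __name__ == "__main__":
    sample = (
        "Swales (1990) divides citations into two forms. "
        "Corpus tools are well documented (Pedersen, 1995; Thompson, 2000)."
    )
    print(count_citations(sample))
```

Real academic prose contains many citation styles (numeric, footnoted, "et al.") that this pattern would miss, so manual checking of the output would still be needed.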

A review of corpus research trends in Korean education (한국어 교육 관련 국내 코퍼스 연구 동향)

  • Shim, Eunji
    • Asia Pacific Journal of Corpus Research / v.2 no.2 / pp.43-48 / 2021
  • The aim of this study is to analyze trends in corpus-driven research in Korean education. For this purpose, a total of 14 papers were retrieved online using keywords including "Korean corpus" and "Korean education". The data were divided into three categories: vocabulary education, grammar education, and corpus data construction methods. The results suggest that the number of corpus studies in the field of Korean education is still small but continues to increase, especially in research on data construction tools. This suggests that there is significant demand for corpus-driven studies in the field of Korean education.

A Corpus-Based Study on Korean EFL Learners' Use of English Logical Connectors

  • Ha, Myung-Jeong
    • International Journal of Contents / v.10 no.4 / pp.48-52 / 2014
  • The purpose of this study was to examine 30 logical connectors in the essay writing of Korean university students and to compare their use with that in similar types of native English writing. The main questions addressed were as follows: Do Korean EFL students tend to over- or underuse logical connectors? What types of connectors differentiate Korean learner use from native use? To answer these questions, EFL learner data were compared with data from native speakers using computerized corpora and linguistic software tools to speed up the initial stage of the linguistic analysis. The analysis revealed that Korean EFL learners tend to overuse logical connectors in sentence-initial position, and that they tend to overuse additive connectors such as 'moreover', 'besides', and 'furthermore', whereas they underuse contrastive connectors such as 'yet' and 'instead'. On the basis of these results, some pedagogical implications are drawn concerning the need to teach the semantic, stylistic, and syntactic behavior of logical connectors.
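
A learner-versus-native comparison of this kind usually rests on normalized frequencies, since the two corpora differ in size. The following is a minimal sketch under stated assumptions, not the study's actual procedure: the connector list is an illustrative subset taken from the abstract, the per-1,000-token normalization is a common convention rather than something the abstract specifies, and the names `per_thousand` and `compare` are hypothetical. It does not check sentence position or distinguish connector uses from other uses of a word such as 'yet'.

```python
import re
from collections import Counter

# Illustrative subset of connectors named in the abstract; the study examined 30.
CONNECTORS = ["moreover", "besides", "furthermore", "yet", "instead"]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def per_thousand(text, words=CONNECTORS):
    """Frequency of each word per 1,000 tokens of running text."""
    tokens = tokenize(text)
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return {w: 1000 * counts[w] / total for w in words}


def compare(learner_text, native_text, words=CONNECTORS):
    """Label each word by whether learners use it more or less than natives."""
    learner = per_thousand(learner_text, words)
    native = per_thousand(native_text, words)
    labels = {}
    for w in words:
        if learner[w] > native[w]:
            labels[w] = "overuse"
        elif learner[w] < native[w]:
            labels[w] = "underuse"
        else:
            labels[w] = "same"
    return labels


if __name__ == "__main__":
    learner = "Moreover, it is good. Moreover, it is cheap."
    native = "It is good, yet it is not cheap."
    print(compare(learner, native))
```

A raw comparison like this only shows the direction of a difference; a real study would add a significance test before calling a connector over- or underused.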

PPEditor: Semi-Automatic Annotation Tool for Korean Dependency Structure (PPEditor: 한국어 의존구조 부착을 위한 반자동 말뭉치 구축 도구)

  • Kim Jae-Hoon;Park Eun-Jin
    • The KIPS Transactions:PartB / v.13B no.1 s.104 / pp.63-70 / 2006
  • In general, a corpus contains a great deal of linguistic information and is widely used in natural language processing and computational linguistics. Creating such a corpus, however, is expensive, labor-intensive, and time-consuming. To alleviate this problem, annotation tools for building corpora rich in linguistic information are indispensable. In this paper, we design and implement an annotation tool for building a Korean dependency tree-tagged corpus. Ideally, the corpus would be created fully automatically without annotators' intervention, but in practice this is impossible. The proposed tool is semi-automatic, like most other annotation tools, and is designed for editing errors generated by basic analyzers such as a part-of-speech tagger and a (partial) parser. We also designed it to avoid repetitive work while editing errors and to be easy to use and user-friendly. Using the proposed annotation tool, 10,000 Korean sentences containing over 20 words were annotated with dependency structures. Eight annotators worked 4 hours a day for 2 months. We are confident that the tool yields accurate and consistent annotations while reducing labor and time.

Analysis of the English Textbooks in North Korean First Middle School (북한 제1중학교 영어교과서 분석)

  • Hwang, Seo-yeon;Kim, Jeong-ryeol
    • The Journal of the Korea Contents Association / v.17 no.11 / pp.242-251 / 2017
  • For the purposes of this research, a corpus of words was created from the English textbooks of the "First Middle School" for the gifted in North Korea, and their linguistic characteristics were analyzed using the corpus. Although many studies have identified the traits of English textbooks used in North Korea's general middle schools, little attention has been paid to the English textbooks used at North Korea's First Middle School. First, the structure of the English textbooks for the first, second, fourth, and sixth grades, procured from the Information Center on North Korea, was reviewed, after which a corpus was created from them. Then, using WordSmith Tools 7.0, the linguistic properties and high-frequency content words of the first-grade English textbook were analyzed in detail. Basic statistical data indicated that while vocabulary size did not increase as students progressed through the grades, the words used tended to diversify incrementally. Meanwhile, the distribution of high-frequency content words by grade showed large differences between the content words used in the English texts of each grade, and it was the subject matter of the texts that determined these differences.

Wallerian Degeneration of Insufficiently Affected White Matters in Old Infarction: Tract of Interest Analysis of Diffusion Tensor Imaging

  • Choi, Chi-Hoon;Lee, Jong-Min;Koo, Bang-Bon;Park, Jun-Sung;Kwon, Jun-Soo;Kim, Sun-I.
    • Journal of Biomedical Engineering Research / v.28 no.3 / pp.317-324 / 2007
  • The application of diffusion tensor imaging (DTI) and fiber tractography to Wallerian degeneration (WD) is important because this technique is a very potent tool for quantitatively evaluating fiber tracts in the brain in vivo. We analyzed a case and a control using tracts of interest (TOI) analysis to quantify WD. We scanned a case of old infarction and an age-matched healthy volunteer. T1 magnetization prepared rapid acquisition gradient echo (MPRAGE), fluid attenuated inversion recovery (FLAIR), and 12-direction DTI images were obtained and analyzed using TOI analysis. The values of mean diffusivity ($D_{av}$) and fractional anisotropy (FA) were analyzed statistically with the MWU test. A p-value of less than 0.05 was considered to indicate statistical significance. A comparison of the global fiber diffusion characteristics shows WD of both the corpus callosum and the ipsilateral superior longitudinal fasciculus. The corpus callosum in particular showed trans-hemispherical degeneration. Local fiber characteristics along the geodesic paths show WD in the corpus callosum, ipsilateral superior longitudinal fasciculus, ipsilateral corticospinal tract, and ipsilateral corticothalamic tract. We have demonstrated changes in $D_{av}$ and FA values and a clear correspondence with the WD in various tracts. TOI analysis successfully revealed radial WD in white matter tracts from a region of encephalomalacia and primary gliosis, although they were only slightly affected.

Research on Development of Support Tools for Local Government Business Transaction Operation Using Big Data Analysis Methodology (빅데이터 분석 방법론을 활용한 지방자치단체 단위과제 운영 지원도구 개발 연구)

  • Kim, Dabeen;Lee, Eunjung;Ryu, Hanjo
    • The Korean Journal of Archival Studies / no.70 / pp.85-117 / 2021
  • The purpose of this study is to investigate and analyze the current status of unit tasks, unit task operation, and record management problems in local governments, and to present improvement measures using text-based big data technology based on the implications derived from that process. Record management operation in local governments is in a serious state because of errors in preservation periods caused by misclassification of unit tasks, inability to identify types of overcommon and institutional affairs, errors in unit tasks, errors in name, referenceable standards, and tools. Moreover, there are about 720,000 unit tasks, too many to be controlled effectively, and thus strict and controllable tools and standards are needed. To solve these problems, this study developed a system that applies text-based analysis tools from big data analysis, such as corpus and tokenization technology, and applied them to the names and construction terms that make up the record management standard. These unit task operation support tools are expected to contribute significantly to record management tasks because they can support standard operability such as uniform preservation periods, identification of delegated office records, control of duplicate and similar unit task creation, and common tasks. Therefore, if the big data analysis methodology can be linked to BRM and RMS in the future, the quality of record management standard work is expected to improve.

A Diachronic Lexical Analysis of the North Korean English Textbooks (북한 영어 교과서 어휘의 통시적 분석)

  • Kim, Jiyoung;Lee, Je-Young;Kim, Jeong-ryeol
    • The Journal of the Korea Contents Association / v.17 no.4 / pp.331-341 / 2017
  • This paper aims to analyze the English vocabulary of North Korean textbooks diachronically using a purpose-built English textbook corpus. The North Korean English textbooks, obtained from the Information Center on North Korea of the Ministry of Unification, are divided into those from before and after the start of the Kim Jong-Il era, taking 1996, the year in which the curriculum revision was conducted, as the dividing point. They are stored as text files and their vocabulary is analyzed using WordSmith Tools 7.0. The vocabulary size of the revised textbooks increased after the curriculum reorganization, but the number of vocabulary types and vocabulary diversity decreased. After the curriculum revision, many words related to the establishment of the Kim Jong-Il system appeared as keywords. It was also found that some vocabulary items reflected the economic and social life of North Korea. In addition, a comparison of the list of the 100 most frequent words with the keywords suggests that the vocabulary of North Korean English textbooks is gradually shifting from content related to written language toward communicative content.
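
The contrast this abstract draws between vocabulary size, number of types, and diversity corresponds to three standard corpus measures: token count, type count, and type-token ratio (TTR). The sketch below shows how they relate, assuming a simple regex tokenizer and a hypothetical function name `lexical_stats`; it is not a reproduction of the output of WordSmith Tools, and the abstract does not say which diversity measure the authors used.

```python
import re


def lexical_stats(text):
    """Return token count, type count and type-token ratio for a text."""
    # Tokens are runs of letters, optionally with one internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    types = set(tokens)  # distinct word forms
    ttr = len(types) / len(tokens) if tokens else 0.0
    return {"tokens": len(tokens), "types": len(types), "ttr": ttr}


if __name__ == "__main__":
    before = "the cat and the dog"
    after = "the cat and the cat and the cat"
    print(lexical_stats(before))
    print(lexical_stats(after))
```

In the toy example the second text has more tokens but fewer types and a lower ratio, which is the same pattern the abstract reports. Raw TTR falls as texts get longer, so comparisons between corpora of different sizes normally use a length-normalized variant.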

CosmoScriBe 2.0 : The development of Korean transcription tools (CosmoScriBe 2.0: 한국어 전사 도구의 개발)

  • Kwak, Sun-Dong;Chang, Moon-Soo
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.3 / pp.323-329 / 2014
  • In spoken language research, a transcription process is needed to convert voice data into text. A transcription tool, a program that supports transcription, offers various kinds of information such as the content and time of each utterance and speaker information. For this reason, inexperienced computer users have trouble becoming familiar with such programs. Moreover, since few transcription tools have been developed domestically in Korea, existing tools are usually not suitable for the Korean environment. In this paper, we propose a transcription tool that supports not only Korean transcription but also an easy-to-use interface for novices. A transcription support function is also provided to minimize mistakes that might occur during transcription, and a system structure is provided to ensure data reliability. The usability of the proposed tool is evaluated according to users' transcription experience. The evaluation results show that the transcription process has become faster and the transcription support function more convenient.

A Corpus-based Analysis on Primary English Education Research for the Past 20 Years (초등영어교육 연구 논문의 변천: 코퍼스 기반 분석)

  • Choi, Wonkyung
    • The Journal of the Korea Contents Association / v.19 no.2 / pp.11-21 / 2019
  • It has been about 20 years since English began to be formally taught as a subject in public elementary schools in Korea. The present research aims to analyze the studies on 'primary English' conducted in Korea during that period. I investigated a total of 6,467 theses and research papers published in Korea with the help of the corpus programs Utagger and WordSmith Tools. The results show that the overall number of studies has increased since 1997, although the recent trend appears to be one of decline. The research scope ranges from 'teaching-learning interaction' to 'curriculum' and 'assessment', which have been steadily investigated for 20 years. Furthermore, researchers sometimes appear to have followed English education policy by conducting particular investigations, such as those on 'immersion programs' or 'native English speaking teachers', in certain periods. Recently, researchers have begun to take an interest in cutting-edge ICT. In conclusion, the academic field of 'primary English' in Korea has grown in quantity, and the spectrum of research areas has expanded over the past 20 years. It is hoped that the results of this research will help set a new direction for future research.