• Title/Summary/Keyword: analysis of corpus


Corpus-Based Literary Analysis (코퍼스에 기반한 문학텍스트 분석)

  • Ha, Myung-Jeong
    • The Journal of the Korea Contents Association
    • /
    • v.13 no.9
    • /
    • pp.440-447
    • /
    • 2013
  • Corpus linguistic analyses now enable researchers to examine meanings and structural features of data that cannot be detected intuitively. While the potential of corpus linguistic techniques has been established and demonstrated for non-literary data, corpus stylistic analyses have rarely been applied to literature. Specifically, this paper explores keywords and their role in text analysis, a primary part of corpus linguistic analysis. It focuses on the application of techniques from corpus linguistics and the interpretation of results, and it addresses the question of what is to be gained from keyword analysis by scrutinizing keywords in Shakespeare's Romeo and Juliet.

A review of corpus research trends in Korean education (한국어 교육 관련 국내 코퍼스 연구 동향)

  • Shim, Eunji
    • Asia Pacific Journal of Corpus Research
    • /
    • v.2 no.2
    • /
    • pp.43-48
    • /
    • 2021
  • The aim of this study is to analyze trends in corpus-driven research in Korean education. For this purpose, a total of 14 papers were retrieved online using keywords including "Korean corpus" and "Korean education". The papers were categorized into three areas: vocabulary education, grammar education, and corpus data construction methods. The results suggest that the number of corpus studies in the field of Korean education is still small but continues to increase, especially in research on data construction tools. This suggests a significant demand for corpus-driven studies in the field of Korean education.

Monophthong Analysis on a Large-scale Speech Corpus of Read-Style Korean (한국어 대용량발화말뭉치의 단모음분석)

  • Yoon, Tae-Jin;Kang, Yoonjung
    • Phonetics and Speech Sciences
    • /
    • v.6 no.3
    • /
    • pp.139-145
    • /
    • 2014
  • The paper describes methods of conducting vowel analysis on a large-scale corpus with the aid of forced alignment and optimal formant ceiling methods. The 'Read Style Corpus of Standard Korean' is used to build the forced alignment system, and a subset of the corpus is used for processing and extracting features for vowel analysis based on optimal formant ceilings. The results of the vowel analysis are reliable and comparable to results obtained using traditional analytical methods. The findings indicate that the methods adopted can be extended to more fine-grained analysis without time-consuming manual labeling and without loss of accuracy or reliability.

A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus

  • Yuan, Yiguo;Li, Bin
    • Asia Pacific Journal of Corpus Research
    • /
    • v.2 no.2
    • /
    • pp.31-41
    • /
    • 2021
  • This paper presents a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary by constructing a large-scale rough annotated corpus and computing statistics over it. The texts of Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is then used to compute statistics on word frequency, standardized type/token ratio, and the proportions of monosyllabic and dissyllabic words. The data analysis yields four findings. First, the high-frequency words in ancient Chinese are stable to a certain extent. Second, there is no obvious dissyllabic trend in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the most abundant vocabulary in ancient Chinese. Finally, the high-frequency words unique to each dynasty are mainly official titles with real power. This study breaks away from the qualitative methods used in traditional research on the history of the Chinese language and instead uses quantitative methods to draw macroscopic conclusions from a large-scale corpus.
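The standardized type/token ratio mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the chunk size or segmentation settings the authors used, so the function name `standardized_ttr`, the default `chunk_size=1000`, and the choice to drop a trailing partial chunk are all assumptions made for illustration, following the common convention of averaging the type/token ratio over consecutive equal-sized chunks.

```python
def standardized_ttr(tokens, chunk_size=1000):
    """Mean type/token ratio over consecutive full chunks of `chunk_size` tokens.

    Hypothetical helper: chunk size and handling of the trailing partial
    chunk are illustrative assumptions, not the paper's settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ratios = []
    # Walk over full chunks only; a trailing partial chunk is ignored.
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        chunk = tokens[start:start + chunk_size]
        ratios.append(len(set(chunk)) / chunk_size)
    if not ratios:
        # Text shorter than one chunk: fall back to the plain type/token ratio.
        return len(set(tokens)) / len(tokens) if tokens else 0.0
    return sum(ratios) / len(ratios)


# Toy example with a tiny chunk size so the arithmetic is easy to follow:
# chunk 1 = [a, b, a, b] -> 2 types / 4 tokens, chunk 2 = [c, c, d, e] -> 3 / 4.
tokens = ["a", "b", "a", "b", "c", "c", "d", "e"]
print(standardized_ttr(tokens, chunk_size=4))
```

Averaging over fixed-size chunks makes the measure comparable across sub-corpora of different lengths, which matters when comparing vocabulary richness between dynasties with very different amounts of surviving text.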

A Corpus Analysis of Temporal Adverbs and Verb Tenses Cooccurrence in Spanish, English, and Chinese

  • Cheng, An Chung;Lu, Hui-Chuan
    • Asia Pacific Journal of Corpus Research
    • /
    • v.3 no.2
    • /
    • pp.1-16
    • /
    • 2022
  • This study investigates the cooccurrence of temporal adverbs and grammatical tenses in Spanish and contrasts temporal specifications across Spanish, English, and Chinese. Based on a monolingual Spanish corpus and a trilingual parallel corpus, the study identified the ten most frequent single-word temporal adverbs collocating with grammatical tenses in Spanish. It also contrasted the cooccurrence of temporal adverbs and verb tenses in the three languages. The results show that aún 'still', hoy 'today', and ahora 'now' collocate with the present tense more than 80% of the time. Ayer 'yesterday' and finalmente 'finally' cooccur with the simple past tense at 84% and 69%, respectively. Mientras 'meanwhile' collocates with the past imperfect at 55%, the highest of all. Mañana 'tomorrow' cooccurs with the future and present tenses at 34%. Other adverbs, ya 'already', siempre 'always', and nuevamente 'again', do not show a strong overall tendency to cooccur with any one tense. The contrastive analysis of the trilingual parallel corpus gives a comprehensive view of temporal specifications in the three languages. However, no clear one-to-one mapping of cooccurrence patterns across the three languages can be concluded, a finding that provides helpful insights for second language instruction based on natural language data rather than intuition. Future research with larger corpora is needed.

Modal Auxiliary Verbs in Japanese EFL Learners' Conversation: A Corpus-based Study

  • Nakayama, Shusaku
    • Asia Pacific Journal of Corpus Research
    • /
    • v.2 no.1
    • /
    • pp.23-34
    • /
    • 2021
  • This research examines Japanese non-native speakers' (JNNS) modal auxiliary verb use from two perspectives: frequency of use and preferences for modalities. Additionally, error analysis is carried out to identify errors in modal use common among JNNSs. Their modal use is compared to that of native English speakers within a spoken dialogue corpus that is part of the International Corpus Network of Asian Learners' English. The findings show, at a statistically significant level, that compared to native speakers, JNNSs underuse past forms of modals and infrequently convey epistemic modality, indicating the possibility that JNNSs fail to express their opinions or thoughts indirectly when needed or to convey politeness appropriately. Error analysis identifies three types of common errors: (1) the use of incorrect tenses of modal verb phrases, (2) the use of inflected verb forms after modals, and (3) the non-use of main verbs after modals. The first type of error arises largely because JNNSs have not mastered how to express past meanings of modals. The second and third types seem to be due to first language transfer into second language acquisition and JNNSs' overgeneralization of subject-verb agreement rules to modals, respectively.

Use of ultrasonography for improving reproductive efficiency in cows II. Comparative evaluation of ovarian structures using ultrasonography and plasma progesterone analysis in subestrous dairy cows (초음파 진단장치를 이용한 축우의 번식효율증진에 관한 연구 II. 무발정 젖소에서 초음파검사 및 progesterone 농도측정에 의한 난소 구조물의 비교평가)

  • Son, Chang-ho;Kang, Byong-kyu;Choi, Han-sun;Kang, Hyun-gu;Paik, In-seok;Suh, Guk-hyun
    • Korean Journal of Veterinary Research
    • /
    • v.38 no.3
    • /
    • pp.642-651
    • /
    • 1998
  • The accuracy of ultrasonography for determining the presence of a functional corpus luteum in subestrous dairy cows was investigated using a radioimmunoassay for plasma progesterone. Luteal status (high or low progesterone concentration) was diagnosed in 534 cows using B-mode transrectal ultrasonography. The accuracy of ultrasonography was 96.3% and 88.8% in cows with and without a functional corpus luteum, respectively. Of the 362 cows diagnosed with a functional corpus luteum by ultrasonographic examination, 20 were found to have a non-functional corpus luteum by analysis of plasma progesterone concentrations (false positives). Of the 172 cows diagnosed with a non-functional corpus luteum by ultrasonographic examination, 13 were found to have a functional corpus luteum by plasma progesterone assay (false negatives). Most corpora lutea with a well-defined border and homogeneous echotexture were diagnosed as functional. All cows in which no corpus luteum was detected by ultrasonographic examination were diagnosed as having a non-functional corpus luteum. In the false-positive cows, the corpus luteum showed homogeneous echotexture and was above 15 mm in diameter, but it was non-functional, being within Day 5 (Day 0 is the day of ovulation) or after Day 19. In the false-negative cows, the corpus luteum showed heterogeneous echogenicity and was below 15 mm in diameter, but it was functional, being after Day 5 or around Day 17. It was concluded that ultrasonography is highly accurate for determining the presence of a functional corpus luteum in subestrous dairy cows. The corpora lutea diagnosed as false positive or false negative were in a developing or regressing state. Thus, serial ultrasonographic examinations, repeated two or three times, are required to diagnose these corpora lutea accurately.
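The two accuracy figures in this abstract can be reproduced from the reported counts. The sketch below is only a worked check of the arithmetic: the function name `diagnostic_accuracy` and its parameter names are hypothetical, and it assumes that the plasma progesterone assay is treated as the reference result, so that the reported accuracies correspond to what is usually called sensitivity and specificity.

```python
def diagnostic_accuracy(test_pos, false_pos, test_neg, false_neg):
    """Return (sensitivity, specificity) from test-side counts.

    test_pos / test_neg: cows called functional / non-functional by ultrasonography.
    false_pos / false_neg: how many of those calls the progesterone assay contradicted.
    Hypothetical helper for illustration; the reference standard is an assumption.
    """
    true_pos = test_pos - false_pos   # functional by both methods
    true_neg = test_neg - false_neg   # non-functional by both methods
    # Denominators are the totals according to the reference assay.
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


# Counts as reported in the abstract: 362 positive calls with 20 false positives,
# 172 negative calls with 13 false negatives (534 cows in total).
sens, spec = diagnostic_accuracy(test_pos=362, false_pos=20, test_neg=172, false_neg=13)
print(f"{sens:.1%} {spec:.1%}")  # 96.3% 88.8%
```

With these counts, 342 of the 355 cows with a functional corpus luteum and 159 of the 179 cows without one were classified correctly, which matches the 96.3% and 88.8% stated in the abstract.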


A Corpus-based Lexical Analysis of the Speech Texts: A Collocational Approach

  • Kim, Nahk-Bohk
    • English Language & Literature Teaching
    • /
    • v.15 no.3
    • /
    • pp.151-170
    • /
    • 2009
  • Speech texts have recently been used more and more in English education because of their various advantages as language teaching and learning materials. The purpose of this paper is to analyze speech texts using a corpus-based lexical approach and to suggest productive methods that use English speaking or writing as the main resource for the course, while introducing actual classroom adaptations. First, this study shows that a speech corpus has unique features, such as different selections of pronouns, nouns, and lexical chunks, in comparison to a general corpus. Next, from a collocational perspective, the study demonstrates that the speech corpus contains a wide variety of the collocations and lexical chunks that a number of linguists describe (Lewis, 1997; McCarthy, 1990; Willis, 1990). In other words, the speech corpus suggests not only that speech texts have considerable lexical potential that could be exploited to facilitate chunk-learning, but also that learners are not very likely to unlock this potential autonomously. Based on this result, teachers can develop a learners' corpus and use it by chunking the speech text. This new approach, which adapts speech samples as key materials for developing college students' speaking and writing ability, should be implemented as shown in the samplers. Finally, to foster learners' productive skills more communicatively, a few practical suggestions are made, such as chunking and windowing chunks of speech and presentation, and the pedagogical implications are discussed.


Citation Practices in Academic Corpora: Implications for EAP Writing

  • Min, Su-Jung
    • English Language & Literature Teaching
    • /
    • v.10 no.3
    • /
    • pp.113-126
    • /
    • 2004
  • Explicit reference to the work of other authors is an essential feature of most academic research writing. Corpus analysis of academic text can reveal much about what writers actually do and why they do so. The application of corpus tools in language education has been well documented by many scholars (Pedersen, 1995; Swales, 1990; Thompson, 2000). They demonstrate how computer technology can assist in the effective analysis of corpus-based data. For teaching purposes, this recent research provides insights into English for Academic Purposes (EAP). The need for such support is evident when students have to use appropriate citations in their writing. Using Swales' (1990) division of citation forms into integral and non-integral and Thompson and Tribble's (2001) classification scheme, this paper codifies academic texts in a corpus. The texts are academic research articles from different disciplines. The results lead to a comparison of citation practices in different disciplines. Finally, it is argued that the information obtained in this study is useful for EAP writing courses in EFL countries.


'Because of Doing' and 'Because of Happening': A Corpus-based Analysis of Korean Causal Conjunctives, -nula(ko) and -nun palamey

  • Oh, Sang-Suk
    • Language and Information
    • /
    • v.8 no.2
    • /
    • pp.131-147
    • /
    • 2004
  • the two Korean causal conjunctive suffixes, -nula(ko) and -nun palamey, based on corpus linguistic analysis. Many of the linguistic accounts available, both in pedagogical references and in the linguistics literature, provide incomplete analyses of these suffixes based on fabricated linguistic data. Using naturally occurring linguistic data, this paper examines the syntactic and semantic structures of the two causal suffixes through three areas of corpus linguistic analysis: token frequencies, collocations, and semantic prosody. An analysis based on concordance data reveals that the two causal connectives, -nula(ko) and -nun palamey, have more differences than similarities in terms of syntactic and semantic constraints. The idiosyncratic structures of the two suffixes are discussed in terms of the same-subject condition, verb selection, the same-agent condition, the synchronicity condition, and negative semantic prosody.
