• Title/Summary/Keyword: online corpora


A Study on the Method of Teaching Korean Synonyms Using Online Corpora (온라인 코퍼스를 활용한 한국어 유의어 교수 방안 연구)

  • 전지은
    • Language Facts and Perspectives / v.47 / pp.177-203 / 2019
  • The purpose of this study is to examine the potential of online corpora for teaching Korean synonyms. The research examined how to develop effective concordance-based learning materials for teaching Korean synonyms through data-driven learning (DDL). Because synonyms are similar in meaning and usage, even native speakers cannot clearly explain the differences between them. Furthermore, it is not easy to provide appropriate example sentences for each word, and in practice Korean textbooks do not sufficiently differentiate synonyms. In recent years, it has been claimed that DDL helps students not only comprehend vocabulary but also produce it. Nevertheless, there is little guidance on how concordance materials for such learners should be constructed. In this study, we extract concordance examples from various kinds of online corpora: written and spoken corpora, Korean textbooks, and newspapers. We present how to design corpus-based activities using concordance materials for teaching Korean synonyms. To examine the effects of DDL, five experimental lessons were given to a group of 15 advanced Korean learners at a university, and follow-up surveys (attitude questionnaires) were conducted. This study is meaningful in that it proposes a new method for teaching Korean synonyms.
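Concordance examples of the kind described above are usually displayed in keyword-in-context (KWIC) format. The Python sketch below builds KWIC rows from a list of sentences; the function name `kwic` and the example sentences are hypothetical. Because Korean particles attach to the preceding word (생각은, 생각이), the sketch assumes that a token matches when it starts with the search stem; a production tool would use a morphological analyzer instead of whitespace splitting.

```python
def kwic(sentences, stem, width=3):
    """Return (left context, hit, right context) rows for every token starting with `stem`.

    Assumption: whitespace tokenization plus prefix matching, a rough stand-in
    for morphological analysis of Korean word-plus-particle units.
    """
    rows = []
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token.startswith(stem):
                left = " ".join(tokens[max(0, i - width):i])
                right = " ".join(tokens[i + 1:i + 1 + width])
                rows.append((left, token, right))
    return rows


if __name__ == "__main__":
    # Hypothetical example sentences; a real lesson would draw them from a corpus.
    sentences = ["제 생각은 조금 다릅니다", "좋은 생각이 났어요", "의견을 말해 주세요"]
    for stem in ("생각", "의견"):
        for left, hit, right in kwic(sentences, stem):
            print(f"{left} [{hit}] {right}")
```

Listing the rows for two near-synonyms side by side is one way to set up the kind of collocate comparison that DDL activities rely on.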

Using Small Corpora of Critiques to Set Pedagogical Goals in First Year ESP Business English

  • Wang, Yu-Chi; Davis, Richard Hill
    • Asia Pacific Journal of Corpus Research / v.2 no.2 / pp.17-29 / 2021
  • The current study explores small corpora of critiques written by Chinese and non-Chinese university students and compares the strategies these writers use with those of high-rated L1 students. Data collection includes three small corpora of student writing: 20 student critiques from 2017, 23 student critiques from 2018, and 23 critiques from the online Michigan Corpus of Upper-level Student Papers (MICUSP) at the University of Michigan. The researchers employ Text Inspector and Lexical Complexity to identify university students' vocabulary knowledge and awareness of syntactic complexity. In addition, WMatrix4® is used to identify and compare lexical and semantic differences among the three corpora. The findings indicate gaps between Chinese and non-Chinese writers in the same university classes in their knowledge of grammatical features and interactional metadiscourse. Critiques by Chinese writers also tend to contain shorter clauses and sentences, and the mean values of complex nominals and coordinate phrases are smaller for Chinese students than for non-Chinese and MICUSP writers. Finally, in terms of lexical bundles, Chinese student writers prefer clausal bundles over phrasal bundles, which, according to previous studies, are more often found in texts by skilled writers. The findings suggest incorporating implicit and explicit instruction through the use of corpora in language classrooms to advance the skills and strategies of all writers of English, but particularly of Chinese writers.
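Lexical bundles are commonly operationalized as recurrent n-word sequences (often four words) that exceed a frequency threshold and occur in several different texts. The Python sketch below extracts such bundles from a set of texts; `lexical_bundles` and its default thresholds are illustrative assumptions (published studies usually normalize frequencies per million words and choose their own cut-offs), and it does not attempt the clausal/phrasal classification, which would require part-of-speech tagging.

```python
import re
from collections import Counter, defaultdict


def lexical_bundles(texts, n=4, min_freq=2, min_range=2):
    """Return {bundle: frequency} for n-grams meeting both thresholds.

    min_freq: minimum raw frequency across all texts (assumed threshold)
    min_range: minimum number of distinct texts containing the n-gram
    """
    freq = Counter()
    text_ids = defaultdict(set)
    for doc_id, text in enumerate(texts):
        # Lowercase and keep alphabetic tokens (with apostrophes); punctuation is dropped.
        tokens = re.findall(r"[a-z']+", text.lower())
        for i in range(len(tokens) - n + 1):
            gram = tuple(tokens[i:i + n])
            freq[gram] += 1
            text_ids[gram].add(doc_id)
    return {
        " ".join(gram): count
        for gram, count in freq.items()
        if count >= min_freq and len(text_ids[gram]) >= min_range
    }


if __name__ == "__main__":
    # Toy critiques (hypothetical), standing in for one of the study's corpora.
    critiques = [
        "On the other hand, the results show a clear trend.",
        "On the other hand, we argue that the sample is small.",
        "The results show that the method works.",
    ]
    print(lexical_bundles(critiques))
```

The range requirement keeps a phrase that one writer repeats many times from counting as a bundle of the whole group.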

English No Matter Construction: A Construction-based Perspective

  • Kim, Jong-Bok; Lee, Seung Han
    • Journal of English Language & Literature / v.57 no.6 / pp.959-976 / 2011
  • The expression no matter, combined with an interrogative clause X, expresses 'it doesn't matter what the value of X is' and displays many syntactic and semantic peculiarities. To better understand the grammatical properties of this construction, we investigate English corpora available online and suggest that some of its irreducible properties are best captured by the inheritance mechanism that plays a central role in HPSG and Construction Grammar. We show that the construction has its own constructional properties but also inherits properties from related major head constructions.
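The inheritance mechanism appealed to above can be pictured as a type hierarchy in which a construction collects the constraints of all its supertypes and adds its own. The Python sketch below is a deliberately simplified illustration, not the authors' HPSG analysis: the type names and feature-value pairs are hypothetical, constraints are flat feature-value pairs rather than typed feature structures, and inheritance is assumed to be strict (monotonic), so a conflicting value raises an error instead of overriding a default.

```python
class Construction:
    """A node in a toy construction hierarchy with inherited constraints."""

    def __init__(self, name, supertypes=(), constraints=None):
        self.name = name
        self.supertypes = list(supertypes)
        self.own = dict(constraints or {})

    def constraints(self):
        """Merge supertype constraints with local ones; conflicts are errors."""
        merged = {}
        sources = [sup.constraints() for sup in self.supertypes] + [self.own]
        for source in sources:
            for feature, value in source.items():
                if feature in merged and merged[feature] != value:
                    raise ValueError(
                        f"{self.name}: conflicting values for {feature!r}: "
                        f"{merged[feature]!r} vs {value!r}"
                    )
                merged[feature] = value
        return merged


# Hypothetical hierarchy: the feature names are illustrative labels only.
headed_phrase = Construction("headed-phrase", constraints={"head_feature_principle": "applies"})
head_comp_phrase = Construction("head-complement-phrase", [headed_phrase], {"comps_of_mother": "empty"})
no_matter_cx = Construction(
    "no-matter-cx",
    [head_comp_phrase],
    {"head": "no matter", "complement": "interrogative clause",
     "meaning": "the value of X does not matter"},
)

if __name__ == "__main__":
    print(no_matter_cx.constraints())
```

The design point is that properties shared by all head-complement phrases are stated once on the supertype, while only the idiosyncratic properties are stated on the no matter construction itself.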

Analyzing Contextual Polarity of Unstructured Data for Measuring Subjective Well-Being (주관적 웰빙 상태 측정을 위한 비정형 데이터의 상황기반 긍부정성 분석 방법)

  • Choi, Sukjae; Song, Yeongeun; Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.83-105 / 2016
  • Measuring an individual's subjective wellbeing accurately, unobtrusively, and cost-effectively is a core success factor for wellbeing support systems, a type of medical IT service. Self-report questionnaires and wearable sensors are very accurate, but they are costly and obtrusive when the wellbeing support system must run in real time. Recently, inferring the state of subjective wellbeing from unstructured data with conventional sentiment analysis has been proposed as an alternative that avoids these drawbacks. However, this approach does not consider contextual polarity, which lowers measurement accuracy. Moreover, no sentiment word net or ontology exists for the subjective wellbeing domain. Hence, this paper proposes a method to extract keywords, and their contextual polarity, that represent the subjective wellbeing state from unstructured text on websites, in order to improve the reasoning accuracy of sentiment analysis. The proposed method is as follows. First, a set of general sentiment words is prepared. SentiWordNet was adopted; it is the most widely used sentiment dictionary and contains about 100,000 words, including nouns, verbs, adjectives, and adverbs, with polarities ranging from -1.0 (extremely negative) to 1.0 (extremely positive). Second, corpora on subjective wellbeing (SWB corpora) were obtained by crawling online text. A survey was conducted to prepare a training dataset that includes each individual's opinions and self-reported level of wellness, such as stress and depression; participants were asked to describe their feelings about online news on two topics. Next, three types of data were extracted from the SWB corpora: demographic information, psychographic information, and the structural characteristics of the text (e.g., the number of words used in the text and simple statistics on the special characters used). These were used to adjust the level of a specific SWB factor. Finally, a set of reasoning rules was generated for each wellbeing factor to estimate an individual's SWB from the text he or she wrote. The experimental results suggest that using contextual polarity for each SWB factor (e.g., stress, depression) significantly improves estimation accuracy compared with conventional sentiment analysis methods based on SentiWordNet. Although there is literature on Korean sentiment analysis, such studies used only a limited set of sentiment words. Because the vocabulary is small, many sentences are overlooked when estimating the level of sentiment. The proposed method, in contrast, can identify multiple sentiment-neutral words as sentiment words in the context of a specific SWB factor. The results also suggest that a dedicated sentiment dictionary containing contextual polarity needs to be constructed alongside common-sense dictionaries such as SenticNet. These efforts will enrich and broaden the application area of sentic computing. The study is helpful to practitioners and managers of wellness services in that several characteristics of unstructured text that bear on SWB have been identified. Consistent with the literature, the results show that gender and age affect the SWB state when individuals are exposed to an identical cue from online text. In addition, the length of the textual response and the usage pattern of special characters were found to indicate an individual's SWB. This implies that better SWB measurement should involve collecting information on textual structure and the individual's demographic conditions. In the future, the proposed method should be improved by automatically identifying contextual polarity in order to enlarge the vocabulary cost-effectively.
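The idea of contextual polarity can be illustrated with a lexicon lookup in which factor-specific entries override general ones. In the Python sketch below, all words and scores are invented for illustration (they are not SentiWordNet values), and averaging the non-zero scores is an assumed aggregation rule, not the paper's reasoning rules. It shows how a word that is neutral in a general lexicon, such as "deadline", can become a negative sentiment word in the context of the stress factor.

```python
# Hypothetical general lexicon: polarity in [-1.0, 1.0]; 0.0 means neutral.
BASE_LEXICON = {"happy": 0.8, "tired": -0.4, "deadline": 0.0, "busy": 0.0}

# Hypothetical factor-specific entries that override the general lexicon.
CONTEXT_LEXICON = {
    "stress": {"deadline": -0.6, "busy": -0.5},
    "depression": {"alone": -0.7},
}


def polarity_score(tokens, factor=None):
    """Average the non-neutral polarities of `tokens`, optionally in an SWB-factor context."""
    lexicon = dict(BASE_LEXICON)
    if factor is not None:
        lexicon.update(CONTEXT_LEXICON.get(factor, {}))
    scores = [lexicon[t] for t in tokens if lexicon.get(t, 0.0) != 0.0]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    tokens = "busy week another deadline so tired".split()
    print(polarity_score(tokens))            # general lexicon only
    print(polarity_score(tokens, "stress"))  # stress-specific polarity
```

With the general lexicon only "tired" contributes, whereas in the stress context "busy" and "deadline" are also counted, which is the effect the abstract describes as turning sentiment-neutral words into sentiment words.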

Copy Raising Construction in English: A Usage-based Perspective

  • Kim, Jong-Bok
    • Language and Information / v.16 no.2 / pp.1-15 / 2012
  • In accounting for so-called copy raising (CR) in English, the movement perspective has assumed that the embedded subject of the CR verb's sentential complement is raised to the matrix subject position, leaving behind a pronominal copy. This kind of movement-based analysis raises both empirical and analytical issues when variations in the pronominal copy constraint are considered. This paper investigates actual uses of the construction in corpora available online. Based on this corpus search, we classify two different types of copy raising predicates (genuine and perception) and discuss their grammatical properties in detail. We suggest that a simple copying rule couched in movement operations is not enough to capture the wide variation in uses of the construction, and we show that interpretive constraints, e.g., the perceptual characterization condition, play an important role in licensing the construction.
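A corpus search of this kind typically starts from a surface pattern: a subject, a CR verb, and a connective such as like, as if, or as though. The Python regex sketch below is an illustration, not the paper's query. Assigning seem and appear to the genuine class and look, sound, feel, smell, and taste to the perception class is an assumption based on common treatments in the literature, and the copy flag is a crude heuristic that only checks whether the complement subject is a pronoun and the matrix subject is not expletive it or there. It does not check agreement or coreference, so every hit still needs manual inspection.

```python
import re

GENUINE = {"seem", "seems", "seemed", "appear", "appears", "appeared"}
PERCEPTION = {
    "look", "looks", "looked", "sound", "sounds", "sounded", "feel", "feels", "felt",
    "smell", "smells", "smelled", "taste", "tastes", "tasted",
}
PRONOUNS = {"i", "you", "he", "she", "it", "we", "they"}

# Groups: matrix subject word, CR verb, connective, first word of the complement clause.
CR_PATTERN = re.compile(
    r"\b(\w+)\s+(" + "|".join(sorted(GENUINE | PERCEPTION, key=len, reverse=True)) + r")"
    r"\s+(like|as\s+if|as\s+though)\s+(\w+)",
    re.IGNORECASE,
)


def find_cr_candidates(text):
    """Return one dict per surface match of a potential copy-raising frame."""
    hits = []
    for m in CR_PATTERN.finditer(text):
        subject, verb, connective, embedded = m.groups()
        verb = verb.lower()
        hits.append({
            "verb": verb,
            "class": "genuine" if verb in GENUINE else "perception",
            "connective": " ".join(connective.lower().split()),
            # Crude heuristic: pronoun in the complement, non-expletive matrix subject.
            "copy_candidate": embedded.lower() in PRONOUNS
            and subject.lower() not in {"it", "there"},
        })
    return hits


if __name__ == "__main__":
    sample = "John seems like he is tired. She looked as though she had won."
    for hit in find_cr_candidates(sample):
        print(hit)
```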


Multilayer Knowledge Representation of Customer's Opinion in Reviews (리뷰에서의 고객의견의 다층적 지식표현)

  • Vo, Anh-Dung; Nguyen, Quang-Phuoc; Ock, Cheol-Young
    • Annual Conference on Human and Language Technology / 2018.10a / pp.652-657 / 2018
  • With the rapid development of e-commerce, customers can now express their opinions on many kinds of products in discussion groups, on merchant sites, on social networks, and elsewhere. As more and more reviews become available on the internet, discerning a consensus opinion about a product sold online becomes increasingly difficult. Opinion mining, also known as sentiment analysis, is the task of automatically detecting and understanding sentiment expressions about a product in customers' textual reviews. Researchers have recently proposed various approaches to sentiment mining that apply techniques at the document, sentence, and aspect levels. Aspect-based sentiment analysis has attracted wide interest from researchers; however, more complex algorithms are needed to address it precisely on larger corpora. This paper introduces a knowledge representation approach for analyzing product aspect ratings. We focus on how to build sentiment representations from textual opinions using representation learning methods, including word embeddings and compositional vector models. Our experiment is performed on a dataset of reviews from the electronics domain, and the results show that the proposed system outperformed the methods of previous studies.
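As a rough illustration of composing word embeddings into larger units, the Python sketch below averages word vectors (the simplest compositional model) and assigns a review phrase to the closest aspect by cosine similarity. The three-dimensional vectors, the aspect words, and the function names are hypothetical, and averaging is a swapped-in stand-in rather than the compositional vector model used in the paper.

```python
import math

# Hypothetical toy embeddings; real systems learn vectors with hundreds of dimensions.
EMB = {
    "battery": [0.9, 0.1, 0.0],
    "charge": [0.8, 0.2, 0.1],
    "screen": [0.1, 0.9, 0.0],
    "bright": [0.2, 0.8, 0.3],
    "great": [0.0, 0.1, 0.9],
}


def compose(tokens):
    """Compose a phrase vector by averaging the vectors of known tokens."""
    vecs = [EMB[t] for t in tokens if t in EMB]
    if not vecs:
        return None
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(len(vecs[0]))]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nearest_aspect(tokens, aspects):
    """Return the aspect word whose vector is closest to the composed phrase, or None."""
    phrase = compose(tokens)
    if phrase is None:
        return None
    return max(aspects, key=lambda a: cosine(phrase, EMB[a]))


if __name__ == "__main__":
    print(nearest_aspect(["battery", "charge", "great"], ["battery", "screen"]))
```

Once a phrase is mapped to an aspect, its sentiment can be aggregated per aspect to produce the kind of aspect rating the abstract targets.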


Phrase-Chunk Level Hierarchical Attention Networks for Arabic Sentiment Analysis

  • Abdelmawgoud M. Meabed; Sherif Mahdy Abdou; Mervat Hassan Gheith
    • International Journal of Computer Science & Network Security / v.23 no.9 / pp.120-128 / 2023
  • In this work, we present ATSA, a hierarchical attention deep learning model for Arabic sentiment analysis. ATSA was designed to address several challenges and limitations that arise when classical models are applied to opinion mining in Arabic. Arabic-specific challenges, including morphological complexity and language sparsity, were addressed by modeling semantic composition at the level of Arabic morphological analysis after tokenization. ATSA performs phrase-chunk sentiment embedding to provide a broader set of features covering syntactic, semantic, and sentiment information. We used a phrase structure parser to generate syntactic parse trees that serve as a reference for ATSA. This allowed semantic and sentiment composition to be modeled following the natural order in which words and phrase chunks combine in a sentence. The proposed model was evaluated on three Arabic corpora that correspond to different genres (newswire, online comments, and tweets) and different writing styles (Modern Standard Arabic (MSA) and dialectal Arabic). Experiments showed that each of the proposed contributions in ATSA achieved significant improvement. The combination of all contributions, which makes up the complete ATSA model, improved classification accuracy by 3% and 2% on the tweets and hotel reviews datasets, respectively, compared with existing models.
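A hierarchical attention model of this kind can be sketched as two levels of attention pooling: word vectors inside each phrase chunk are pooled into a chunk vector, and chunk vectors are pooled into a sentence vector. The NumPy sketch below assumes the additive attention form used in hierarchical attention networks; ATSA's exact layers, its morphological segmentation, and the parser that supplies the chunks are not reproduced here, and all parameters are random placeholders rather than trained weights.

```python
import numpy as np


def attention_pool(H, W, b, u):
    """Additive attention over the rows of H (n x d).

    Assumed form: u_i = tanh(W h_i + b), alpha = softmax(u_i . u).
    Returns (pooled vector of size d, attention weights of size n).
    """
    U = np.tanh(H @ W.T + b)            # (n, k) hidden representations
    scores = U @ u                      # (n,) unnormalized attention scores
    scores = scores - scores.max()      # numerical stability for softmax
    alpha = np.exp(scores) / np.exp(scores).sum()
    return alpha @ H, alpha             # weighted sum of rows, weights


def encode_sentence(chunks, word_params, chunk_params):
    """Pool words into chunk vectors, then chunks into one sentence vector."""
    chunk_vectors = np.stack([attention_pool(c, *word_params)[0] for c in chunks])
    return attention_pool(chunk_vectors, *chunk_params)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d, k = 4, 3  # word-vector size and attention size (arbitrary)
    word_params = (rng.normal(size=(k, d)), rng.normal(size=k), rng.normal(size=k))
    chunk_params = (rng.normal(size=(k, d)), rng.normal(size=k), rng.normal(size=k))
    # Two hypothetical phrase chunks with 2 and 3 word vectors each.
    chunks = [rng.normal(size=(2, d)), rng.normal(size=(3, d))]
    sentence_vec, chunk_weights = encode_sentence(chunks, word_params, chunk_params)
    print(sentence_vec.shape, chunk_weights)
```

In a trained model, the sentence vector would feed a sentiment classifier, and the chunk weights indicate which phrase chunks the model attends to.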