• Title/Summary/Keyword: syntactic features

Search Results: 94

An Analysis of Linguistic Features in Science Textbooks across Grade Levels: Focus on Text Cohesion (과학교과서의 학년 간 언어적 특성 분석 -텍스트 정합성을 중심으로-)

  • Ryu, Jisu; Jeon, Moongee
    • Journal of The Korean Association For Science Education, v.41 no.2, pp.71-82, 2021
  • Learning efficiency can be maximized by carefully matching text features to expected reader features (i.e., linguistic and cognitive abilities and background knowledge). The present study explores whether this principle is reflected in the development of science textbooks. Science textbook texts were examined on 20 measures provided by Auto-Kohesion, a Korean language analysis tool. In addition to the surface-level features commonly used in previous text analysis studies (basic counts, word-related measures, syntactic complexity measures), the study also included cohesion-related features (noun overlap ratios, connectives, pronouns). The main findings show that the surface measures (e.g., word and sentence length, word frequency) generally increased in complexity with grade level, whereas most of the other measures, particularly the cohesion-related ones, did not vary systematically across grade levels. These results suggest that students in lower grades are likely to experience learning difficulties and reduced motivation because the texts are too challenging, while the textbooks are also unlikely to help students in higher grades develop the ability to process the difficult texts required in higher education. The study suggests that a range of text features, including cohesion-related measures, should be carefully considered during textbook development.
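
The measures described above can be approximated with simple code. The sketch below is illustrative only and is not Auto-Kohesion: it computes mean word and sentence length from naive tokenization, plus an adjacent-sentence noun-overlap ratio, assuming noun sets have already been produced by a Korean morphological analyzer. All function names are hypothetical.

```python
# Illustrative sketch (not Auto-Kohesion): a few surface and cohesion measures
# of the kind analyzed in the study. Tokenization is deliberately naive; a real
# analysis would use a Korean morphological analyzer.
import re
from statistics import mean

def sentences(text):
    # Naive sentence split on . / ? / ! followed by whitespace.
    return [s.strip() for s in re.split(r'(?<=[.?!])\s+', text) if s.strip()]

def mean_sentence_length(text):
    return mean(len(s.split()) for s in sentences(text))

def mean_word_length(text):
    all_words = [w for s in sentences(text) for w in s.split()]
    return mean(len(w) for w in all_words)

def adjacent_noun_overlap(noun_sets):
    """Proportion of adjacent sentence pairs sharing at least one noun.
    `noun_sets` is a list of noun sets, one per sentence, assumed to come
    from a POS tagger."""
    pairs = list(zip(noun_sets, noun_sets[1:]))
    if not pairs:
        return 0.0
    return sum(bool(a & b) for a, b in pairs) / len(pairs)
```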

The syntax comparative research of Korean and Chinese Adjectives (한·중 형용사 통사론적 비교 연구 - 형용사의 특징과 기능을 중심으로)

  • Dan, Mingjie
    • Cross-Cultural Studies, v.25, pp.483-527, 2011
  • The main focus of this dissertation is a comparative study of Korean and Chinese adjectives. By comparing and contrasting the concepts, features, and usages of Korean and Chinese adjectives, we identify a number of similarities and differences. The aim is to help Chinese learners of Korean better understand the features of Korean adjectives and use them more easily. Korean belongs to the Altaic language family and expresses meaning through sound, whereas Chinese belongs to the Sino-Tibetan language family and expresses meaning through characters. There are many similarities between these two languages that look completely different, such as, to some extent, pronunciation and grammar; even the Sino-Korean words in Korean are quite similar to Chinese. However, from a detailed grammatical point of view, the two languages are very different. For instance, auxiliary words in Korean and Chinese are completely different, and Korean has word endings (eomi), a concept that does not exist in Chinese at all. With respect to word categories in particular, it is important but difficult for Chinese learners of Korean to distinguish adjectives from verbs. One reason for this difficulty is that some Korean adjectives are categorized as verbs in Chinese: for example, "like", "dislike", and "fear" are psychological adjectives in Korean but psychological verbs in Chinese. These differences in categorization often mislead learners when they interpret whole texts, and they also create further difficulties for Chinese learners in acquiring other grammatical items. For these reasons, this dissertation should be helpful for Chinese learners studying Korean. Starting from the most basic concepts, the second chapter analyzes the similarities and differences between Korean and Chinese adjectives; a correct understanding of the adjective is the basis for learning it accurately. Building on this comparison of concepts and this initial understanding, the third chapter analyzes in detail the grammatical and semantic features of Korean and Chinese adjectives. Based on those features, we analyze the detailed usages of Korean and Chinese adjectives in texts; in particular, we explain how adjectives change across tenses and how their endings change when they are used with nouns and verbs. The fourth chapter focuses on the similarities and differences in adjective meaning between Korean and Chinese, providing comparative analyses from six different perspectives that could be helpful for Chinese learners of Korean. To date there have been few comparative studies of Korean and Chinese adjectives, and this dissertation also has its limitations in this area. Nevertheless, we hope it can provide some help for Chinese learners of Korean and that more profound research will be developed in the future.

Dynamic Expansion of Semantic Dictionary for Topic Extraction in Automatic Summarization (자동요약의 주제어 추출을 위한 의미사전의 동적 확장)

  • Choo, Kyo-Nam; Woo, Yo-Seob
    • Journal of IKEEE, v.13 no.2, pp.241-247, 2009
  • This paper proposes methods for expanding a semantic dictionary that take the semantic features of Korean into account. These methods are used to extract practical topic words in automatic summarization. The first method constructs a synonym dictionary to improve the performance of semantic-marker analysis. The second extracts probabilistic information from a subcategorization dictionary to resolve syntactic and semantic ambiguity. The third predicts the subcategorization patterns of unregistered predicates in order to handle affix-derived predicates.
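
As a rough illustration of the three ideas (a synonym dictionary, probabilistic subcategorization information, and frame prediction for unregistered predicates), the sketch below uses hypothetical toy dictionaries and simplified Korean examples; none of the entries or heuristics come from the paper.

```python
# Hypothetical sketch of the three expansion ideas described in the abstract.
# All dictionaries, entries, and heuristics are illustrative.
from collections import defaultdict

synonyms = {"자동차": {"차", "차량"}}           # toy synonym dictionary
subcat_prob = defaultdict(float, {              # toy P(frame | predicate)
    ("읽다", "NP-를"): 0.8,
    ("읽다", "NP-에서"): 0.2,
})

def canonical(word):
    """Map a word to a canonical topic-word form via the synonym dictionary."""
    for head, syns in synonyms.items():
        if word == head or word in syns:
            return head
    return word

def best_frame(predicate, candidate_frames):
    """Pick the most probable subcategorization frame for a known predicate."""
    return max(candidate_frames, key=lambda f: subcat_prob[(predicate, f)])

def predicted_frames(unregistered_predicate):
    """Fallback for unregistered (affix-derived) predicates: guess frames from
    the derivational affix, e.g. treat '-하다' verbs as transitive (illustrative)."""
    if unregistered_predicate.endswith("하다"):
        return ["NP-를"]
    return ["NP-가"]
```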


A Study on Utilization of Wikipedia Contents for Automatic Construction of Linguistic Resources (언어자원 자동 구축을 위한 위키피디아 콘텐츠 활용 방안 연구)

  • Yoo, Cheol-Jung; Kim, Yong; Yun, Bo-Hyun
    • Journal of Digital Convergence, v.13 no.5, pp.187-194, 2015
  • Various linguistic knowledge resources are required for machines to understand the diverse variation in natural languages. This paper aims to devise a method for automatically constructing linguistic resources that reflects the characteristics of online content and supports continuous expansion. In particular, we focus on building an NE (named entity) dictionary, because NEs are widely applicable in linguistic analysis. Based on an investigation of Korean Wikipedia, we suggest an efficient method for constructing an NE dictionary using syntactic patterns and structural features such as metadata.
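
A minimal sketch of the general idea, not the authors' pipeline: harvest dictionary entries from Wikipedia-style page metadata by matching category strings against simple patterns to guess an entity type. The patterns, entity types, and input format are assumptions.

```python
# Illustrative NE-dictionary construction from (title, categories) pairs that
# are assumed to have been extracted from a Wikipedia dump.
import re

TYPE_PATTERNS = [
    (re.compile(r"(인물|births|people)", re.I), "PERSON"),
    (re.compile(r"(도시|국가|city|cities)", re.I), "LOCATION"),
    (re.compile(r"(기업|회사|company|companies)", re.I), "ORGANIZATION"),
]

def classify(categories):
    """Return the first entity type whose pattern matches any category string."""
    for pattern, ne_type in TYPE_PATTERNS:
        if any(pattern.search(c) for c in categories):
            return ne_type
    return None

def build_ne_dictionary(pages):
    """`pages` is an iterable of (title, categories) pairs; returns
    {title: NE type} for the pages that could be classified."""
    ne_dict = {}
    for title, categories in pages:
        ne_type = classify(categories)
        if ne_type:
            ne_dict[title] = ne_type
    return ne_dict
```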

A Study of Efficiency Information Filtering System using One-Hot Long Short-Term Memory

  • Kim, Hee sook; Lee, Min Hi
    • International Journal of Advanced Culture Technology, v.5 no.1, pp.83-89, 2017
  • In this paper, we propose an extended one-hot Long Short-Term Memory (LSTM) method and evaluate its performance on a spam filtering task. Most traditional methods proposed for spam filtering use word occurrences to represent spam or non-spam messages and ignore all syntactic and semantic information. A major issue arises when spam and non-spam messages share many common words and noise words, which makes it challenging for the system to assign the correct labels. Unlike previous studies on information filtering, which use only word occurrence and word context as in probabilistic models, we apply a neural network-based approach to train the filter for better performance. In addition to the one-hot representation, using term weights with an attention mechanism allows the classifier to focus on the words most likely to appear in the spam and non-spam collections. As a result, we obtained some improvement over the performance of previous methods. We find that using region embedding and pooling features on top of the LSTM, together with the attention mechanism, allows the system to learn a better document representation for the filtering task in general.
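
For orientation, a hedged Keras sketch of this kind of model is given below: an LSTM over the message with a simple additive attention over time steps and a sigmoid spam/non-spam output. For brevity it uses a standard trainable embedding rather than the paper's one-hot representation and region embedding, and all hyperparameters are assumptions.

```python
# Hedged sketch of an LSTM + attention spam classifier (not the paper's exact model).
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE, MAX_LEN, EMB_DIM, HIDDEN = 20000, 100, 128, 64

inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inputs)
h = layers.LSTM(HIDDEN, return_sequences=True)(x)           # (batch, time, hidden)

# Additive attention: score each time step, softmax over time, weighted sum.
scores = layers.Dense(1, activation="tanh")(h)              # (batch, time, 1)
weights = layers.Softmax(axis=1)(scores)                    # attention weights
context = layers.Lambda(lambda t: tf.reduce_sum(t[0] * t[1], axis=1))([h, weights])

outputs = layers.Dense(1, activation="sigmoid")(context)    # spam probability
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```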

Prosodic aspects of structural ambiguous sentences in Korean produced by Japanese intermediate Korean learners (한국어 구조적 중의성 문장에 대한 일본인 중급 한국어 학습자들의 발화양상)

  • Yune, YoungSook
    • Phonetics and Speech Sciences, v.7 no.3, pp.89-97, 2015
  • The aim of this study is to investigate the prosodic characteristics of structurally ambiguous Korean sentences produced by Japanese learners of Korean and the influence of their first-language prosody. Previous studies have reported that structurally ambiguous sentences in Korean differ especially in prosodic phrasing, so we examined whether Japanese learners of Korean can also distinguish, in production, between two types of structurally ambiguous sentences on the basis of prosodic features. For this purpose, 4 Korean native speakers and 8 Japanese learners of Korean participated in a production test. The analysis materials were 6 sentences in which a relative clause modifies either NP1 or NP1+NP2. The results show that Korean native speakers produced the ambiguous sentences with different prosodic structures depending on their semantic and syntactic structure (left-branching or right-branching). The Japanese speakers also showed distinct prosodic structures for the two types of ambiguous sentences in most cases, but they made more errors when producing left-branching sentences than right-branching ones. In addition, interference from Japanese pitch accent was observed in their production of the Korean ambiguous sentences.

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan; Guo, Xuchao; Bai, Zhao; Diao, Lei; Lu, Shuhan; Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS), v.16 no.3, pp.771-791, 2022
  • Protein-protein interaction (PPI) extraction from raw text is important for revealing the molecular mechanisms of biological processes. With the rapid growth of the biomedical literature, manually extracting PPIs has become increasingly time-consuming and laborious, so automatic PPI extraction from the literature using natural language processing has attracted the attention of many researchers. We propose a PPI extraction model based on a large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and adversarial perturbations are applied to the embedding layer to improve the robustness of the model. Experimental results show that the proposed model achieved the highest F1 scores (83.93% and 90.31%) on the two corpora with large sample sizes, AIMed and BioInfer, respectively, compared with previous methods. It also achieved comparable performance on three corpora with small sample sizes: HPRD50, IEPA, and LLL.
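
The adversarial-training idea can be illustrated with an FGM-style perturbation of the embedding layer, a common way of implementing this technique; the paper's exact perturbation scheme and hyperparameters are not specified here, and the parameter-name filter is an assumption about a BERT-style PyTorch model.

```python
# FGM-style adversarial perturbation of the embedding weights (illustrative).
import torch

class FGM:
    def __init__(self, model, epsilon=1.0, target="word_embeddings"):
        self.model, self.epsilon, self.target = model, epsilon, target
        self.backup = {}

    def attack(self):
        # Add a gradient-direction perturbation to the embedding weights.
        for name, param in self.model.named_parameters():
            if param.requires_grad and self.target in name and param.grad is not None:
                self.backup[name] = param.data.clone()
                norm = torch.norm(param.grad)
                if norm != 0:
                    param.data.add_(self.epsilon * param.grad / norm)

    def restore(self):
        # Put the original embedding weights back after the adversarial step.
        for name, param in self.model.named_parameters():
            if name in self.backup:
                param.data = self.backup[name]
        self.backup = {}

# Typical loop: loss.backward(); fgm.attack(); adv_loss.backward(); fgm.restore(); optimizer.step()
```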

Using Small Corpora of Critiques to Set Pedagogical Goals in First Year ESP Business English

  • Wang, Yu-Chi; Davis, Richard Hill
    • Asia Pacific Journal of Corpus Research, v.2 no.2, pp.17-29, 2021
  • The current study explores small corpora of critiques written by Chinese and non-Chinese university students and how the strategies used by these writers compare with those of high-rated L1 student writers. The data comprise three small corpora of student writing: 20 student critiques from 2017, 23 student critiques from 2018, and 23 critiques from the online MICUSP collection at the University of Michigan. The researchers employ Text Inspector and Lexical Complexity to assess the students' vocabulary knowledge and awareness of syntactic complexity, and WMatrix4® is used to identify and compare lexical and semantic differences among the three corpora. The findings indicate that gaps exist between Chinese and non-Chinese writers in the same university classes in their knowledge of grammatical features and interactional metadiscourse. Chinese writers are also more likely to produce shorter clauses and sentences, and their mean number of complex nominals and coordinate phrases is smaller than that of the non-Chinese and MICUSP writers. Finally, in terms of lexical bundles, Chinese student writers prefer clausal bundles over phrasal bundles, which, according to previous studies, are more often found in the texts of skilled writers. The findings suggest incorporating implicit and explicit instruction through the use of corpora in language classrooms to advance the skills and strategies of all writers, but particularly of Chinese writers of English.
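
Purely as an illustration (not Text Inspector, Lexical Complexity, or WMatrix4), the sketch below computes two crude proxies for the measures discussed, mean sentence length and mean clause length, over a small corpus; clause boundaries are approximated by commas and coordinating conjunctions.

```python
# Naive corpus-comparison sketch; real complexity measures need a parser.
import re
from statistics import mean

def sentences(text):
    return [s for s in re.split(r'(?<=[.?!])\s+', text) if s.strip()]

def clauses(sentence):
    # Very rough clause split on commas and coordinating conjunctions.
    return [c for c in re.split(r',|\band\b|\bbut\b|\bso\b', sentence) if c.strip()]

def corpus_stats(texts):
    sents = [s for t in texts for s in sentences(t)]
    return {
        "mean_sentence_length": mean(len(s.split()) for s in sents),
        "mean_clause_length": mean(len(c.split()) for s in sents for c in clauses(s)),
    }

# e.g. compare corpus_stats(critiques_2017), corpus_stats(critiques_2018), corpus_stats(micusp)
```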

Improved Character-Based Neural Network for POS Tagging on Morphologically Rich Languages

  • Samat Ali; Alim Murat
    • Journal of Information Processing Systems, v.19 no.3, pp.355-369, 2023
  • Since the widespread adoption of deep learning and related distributed representations, there have been substantial advancements in part-of-speech (POS) tagging for many languages. When training word representations, morphology and shape are typically ignored, as these representations rely primarily on capturing syntactic and semantic aspects of words. However, for tasks like POS tagging, notably in morphologically rich and resource-limited language environments, intra-word information is essential. In this study, we introduce a deep neural network (DNN) for POS tagging that learns character-level word representations and combines them with general word representations. Using the proposed approach and omitting hand-crafted features, we achieve 90.47%, 80.16%, and 79.32% accuracy on our own datasets for three morphologically rich languages: Uyghur, Uzbek, and Kyrgyz. The experimental results reveal that the presented character-based strategy greatly improves POS tagging performance for several morphologically rich languages (MRLs) where character information is significant. Furthermore, the proposed approach slightly improved on the previously reported state-of-the-art POS tagging results for Turkish on the METU Turkish Treebank dataset. Overall, the results indicate that character-based representations outperform word-level representations for MRLs. Our technique is also robust to out-of-vocabulary issues and performs better on manually edited text.
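
A condensed PyTorch sketch of the general architecture described (character-level word representations concatenated with word embeddings before a word-level tagger) is shown below; the layer types and sizes are illustrative assumptions rather than the paper's exact configuration.

```python
# Character + word representation tagger (illustrative configuration).
import torch
import torch.nn as nn

class CharWordTagger(nn.Module):
    def __init__(self, n_chars, n_words, n_tags, char_dim=32, word_dim=100, hidden=128):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.char_lstm = nn.LSTM(char_dim, char_dim, bidirectional=True, batch_first=True)
        self.word_emb = nn.Embedding(n_words, word_dim, padding_idx=0)
        self.word_lstm = nn.LSTM(word_dim + 2 * char_dim, hidden,
                                 bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_tags)

    def forward(self, word_ids, char_ids):
        # char_ids: (batch, seq_len, max_word_len)
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids).view(b * s, c, -1)
        _, (h, _) = self.char_lstm(chars)                    # h: (2, b*s, char_dim)
        char_repr = torch.cat([h[0], h[1]], dim=-1).view(b, s, -1)
        words = self.word_emb(word_ids)                      # (b, s, word_dim)
        x = torch.cat([words, char_repr], dim=-1)            # fused representation
        h_seq, _ = self.word_lstm(x)
        return self.out(h_seq)                               # tag scores per token
```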

Fake News Detector using Machine Learning Algorithms

  • Diaa Salama; Yomna Ibrahim; Radwa Mostafa; Abdelrahman Tolba; Mariam Khaled; John Gerges
    • International Journal of Computer Science & Network Security, v.24 no.7, pp.195-201, 2024
  • With Covid-19 (coronavirus) spreading around the world, some people have exploited citizens' desperate need for news about this mysterious virus by spreading fake news; some countries have arrested people who spread such fake news, and others have fined them. Since social media has become a significant source of news, there is a profound need to detect fake news. The main aim of this research is to develop a web-based model that uses a combination of machine learning algorithms to detect fake news. The proposed model includes a framework for identifying tweets containing fake news using context analysis. We assumed that natural language processing (NLP) alone would not be enough for context analysis, because tweets are usually short and do not follow even the most straightforward syntactic rules, so we used tweet features such as the number of retweets, the number of likes, and tweet length, and we also added a statistical credibility analysis of Twitter users. The proposed algorithms are tested on four different benchmark datasets. Finally, to obtain the best accuracy, we combined the two best-performing algorithms: SVM (widely accepted as a baseline classifier, especially for binary classification problems) and Naive Bayes.
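
A hedged scikit-learn sketch of the kind of pipeline outlined above combines TF-IDF text features with numeric tweet features and a soft-voting ensemble of SVM and Naive Bayes; the column names, feature set, and hyperparameters are illustrative, not the paper's.

```python
# Illustrative fake-news classifier: text + tweet-metadata features, SVM + NB ensemble.
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import MinMaxScaler
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import VotingClassifier

features = ColumnTransformer([
    ("text", TfidfVectorizer(max_features=20000), "text"),           # tweet text
    ("meta", MinMaxScaler(), ["retweets", "likes", "length"]),        # tweet features
])

ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(kernel="linear", probability=True)),
        ("nb", MultinomialNB()),
    ],
    voting="soft",
)

model = Pipeline([("features", features), ("clf", ensemble)])
# model.fit(train_df[["text", "retweets", "likes", "length"]], train_df["label"])
```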