• Title/Summary/Keyword: Text features

A Study on Research Trend for Nurses' Workplace Bullying in Korea: Focusing on Semantic Network Analysis and Topic Modeling (간호사의 직장 내 괴롭힘에 대한 국내 연구 동향 분석: 의미연결망분석과 토픽모델링 중심)

  • Choi, Jeong Sil;Kim, Youngji
    • Korean Journal of Occupational Health Nursing / v.28 no.4 / pp.221-229 / 2019
  • Purpose: This study aimed to identify the core keywords and topic groups in research on workplace bullying over the past 10 years in order to better understand research trends. Methods: The study was conducted in four steps: 1) collecting abstracts, 2) extracting and cleaning semantic morphemes, 3) building a co-occurrence matrix, and 4) analyzing network features and clustering topic groups. Results: A total of 437 articles published between 2010 and 2019 were retrieved from 5 databases (RISS, NDSL, Google Scholar, DBPIA, and Kyobo Scholar). Forty-one abstracts from these articles were extracted, and network analysis was conducted using a semantic network module. The most important core keywords were 'turnover', 'intention', 'factor', 'program', and 'nursing'. Four topic groups were identified from the Korean databases. The major topics were 'turnover' and 'organization culture'. Conclusion: A review of previous research shows that turnover intention has been emphasized. Further research focused on various interventions is needed to relieve workplace bullying in the nursing field.
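
Steps 3 and 4 of the method above can be illustrated with a minimal sketch. It assumes the abstracts have already been reduced to cleaned keyword lists; the sample keywords and the function names `cooccurrence` and `degree` are hypothetical, and weighted degree is only one of several network features the authors may have analyzed.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(docs):
    """Count how often each unordered keyword pair appears in the same document."""
    counts = Counter()
    for tokens in docs:
        # Use the set of tokens so a pair is counted at most once per document.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def degree(counts):
    """Weighted degree: sum of the co-occurrence weights touching each keyword."""
    deg = Counter()
    for (a, b), weight in counts.items():
        deg[a] += weight
        deg[b] += weight
    return deg


# Hypothetical, already-cleaned keyword lists (one list per abstract).
docs = [
    ["turnover", "intention", "nursing"],
    ["turnover", "intention", "program"],
    ["organization", "culture", "nursing"],
]
pairs = cooccurrence(docs)
deg = degree(pairs)
print(deg.most_common(3))
```

Keywords with a high weighted degree are candidates for "core keywords"; clustering the same pair counts would give topic groups.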

Discovery Layer in Library Retrieval: VuFind as an Open Source Service for Academic Libraries in Developing Countries

  • Roy, Bijan Kumar;Mukhopadhyay, Parthasarathi;Biswas, Anirban
    • Journal of Information Science Theory and Practice / v.10 no.4 / pp.3-22 / 2022
  • This paper provides an overview of the emergence of resource discovery systems and services, along with their advantages, best practices, and current landscapes. It outlines some of the key services and functionalities of a comprehensive discovery model suitable for academic libraries in developing countries. The proposed model (VuFind as a discovery tool) performs like other existing web-scale resource discovery systems, both commercial and open-source, and is capable of providing information resources from different sources in a single-window search interface. The objective of the paper is to provide seamless access to globally distributed subscribed as well as open access resources through its discovery interface, based on a unified index. This model uses Koha, DSpace, and Greenstone as back-ends and VuFind as a discovery layer in the front-end and has also integrated many enhanced search features like Bento-box search, Geodetic search, and full-text search (using Apache Tika). The goal of this paper is to provide the academic community with a one-stop shop for better utilising and integrating heterogeneous bibliographic data sources with VuFind (https://vufind.org/vufind).

Research on Business Job Specification through Employment Information Analysis (채용정보 분석을 통한 비즈니스 직무 스펙 연구)

  • Lee, Jong Hwa;Lee, Hyun Kyu
    • The Journal of Information Systems / v.31 no.1 / pp.271-287 / 2022
  • Purpose: This research studies the changes in recruitment needed for the growth and survival of companies in a rapidly changing industry. In particular, we built a real company's worklist that accounts for the rapidly advancing data-driven digital transformation, and presented the capabilities and conditions required for the work. Design/methodology/approach: We selected 37 jobs based on NCS to develop the employment search requirements by analyzing the business characteristics and work capabilities of the industry and company. The business specification indicators were converted into a matrix through the TF-IDF process, and the NMF algorithm was used to extract the features of each document. In addition, cosine distance was used to determine the similarity of the job specification conditions. Findings: Companies tended to prefer "IT competency," a specification related to computer use and certification, and "experience competency," a specification covering work experience and internships. In addition, "foreign language competency" was preferred for some jobs. This analysis and development of job requirements would not only help companies find talent but also help jobseekers decide how to prioritize their specification-building activities.
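
The TF-IDF and cosine steps described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the NMF feature-extraction step is omitted, the weighting is the plain tf * log(N / df) form, and the job postings and function names are hypothetical. Cosine distance is simply 1 minus the similarity computed here.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical job postings, each reduced to a list of specification terms.
jobs = [
    ["python", "sql", "internship"],
    ["python", "sql", "certificate"],
    ["english", "toeic", "internship"],
]
vecs = tfidf(jobs)
sim_01 = cosine_similarity(vecs[0], vecs[1])
sim_02 = cosine_similarity(vecs[0], vecs[2])
print(sim_01 > sim_02)  # the first two postings share more specification terms
```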

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

  • Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar
    • ETRI Journal / v.44 no.3 / pp.413-425 / 2022
  • The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. Poetry usually has a broad and profound sense that makes it difficult to interpret, even for humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For the task, we primarily used lexical features in a bag-of-words model trained using the support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embedding. To extract the rich feature vectors, we prepared a repository containing 37,717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset containing 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.

A Protein-Protein Interaction Extraction Approach Based on Large Pre-trained Language Model and Adversarial Training

  • Tang, Zhan;Guo, Xuchao;Bai, Zhao;Diao, Lei;Lu, Shuhan;Li, Lin
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.3 / pp.771-791 / 2022
  • Protein-protein interaction (PPI) extraction from original text is important for revealing the molecular mechanism of biological processes. With the rapid growth of biomedical literature, manually extracting PPI has become more time-consuming and laborious. Therefore, the automatic PPI extraction from the raw literature through natural language processing technology has attracted the attention of the majority of researchers. We propose a PPI extraction model based on the large pre-trained language model and adversarial training. It enhances the learning of semantic and syntactic features using BioBERT pre-trained weights, which are built on large-scale domain corpora, and adversarial perturbations are applied to the embedding layer to improve the robustness of the model. Experimental results showed that the proposed model achieved the highest F1 scores (83.93% and 90.31%) on two corpora with large sample sizes, namely, AIMed and BioInfer, respectively, compared with the previous method. It also achieved comparable performance on three corpora with small sample sizes, namely, HPRD50, IEPA, and LLL.
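
The abstract does not say which adversarial training method was used. As an illustration only, the sketch below assumes one common choice, a normalized-gradient perturbation in the style of the Fast Gradient Method: the embedding is shifted by `epsilon` along the direction of the loss gradient. The function name `fgm_perturb`, the gradient values, and `epsilon` are all hypothetical.

```python
import math


def fgm_perturb(embedding, grad, epsilon=1.0):
    """Shift an embedding by epsilon along the L2-normalized loss gradient."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # No gradient signal: return the embedding unchanged.
        return list(embedding)
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]


# Hypothetical 2-dimensional embedding and loss gradient.
embedding = [0.5, -0.2]
grad = [3.0, 4.0]
perturbed = fgm_perturb(embedding, grad, epsilon=0.1)
print(perturbed)
```

In training, the model would be evaluated a second time on the perturbed embedding and both losses would contribute to the update, which is what encourages robustness to small input changes.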

Using Small Corpora of Critiques to Set Pedagogical Goals in First Year ESP Business English

  • Wang, Yu-Chi;Davis, Richard Hill
    • Asia Pacific Journal of Corpus Research / v.2 no.2 / pp.17-29 / 2021
  • The current study explores small corpora of critiques written by Chinese and non-Chinese university students and how the strategies used by these writers compare with those of high-rated L1 students. The data comprise three small corpora of student writing: 20 student critiques from 2017, 23 student critiques from 2018, and 23 critiques from the online MICUSP collection at the University of Michigan. The researchers employ Text Inspector and Lexical Complexity to identify university students' vocabulary knowledge and awareness of syntactic complexity. In addition, WMatrix4® is used to identify and support the comparison of lexical and semantic differences among the three corpora. The findings indicate that gaps between Chinese and non-Chinese writers in the same university classes exist in students' knowledge of grammatical features and interactional metadiscourse. Chinese writers are also more likely to produce shorter clauses and sentences in their critiques, and the mean value of complex nominal and coordinate phrases is smaller for Chinese students than for non-Chinese and MICUSP writers. Finally, in terms of lexical bundles, Chinese student writers prefer clausal bundles to phrasal bundles, which, according to previous studies, are more often found in texts of skilled writers. The current study's findings suggest incorporating implicit and explicit instruction through the implementation of corpora in language classrooms to advance the skills and strategies of all writers of English, but particularly of Chinese writers.

RDNN: Rumor Detection Neural Network for Veracity Analysis in Social Media Text

  • SuthanthiraDevi, P;Karthika, S
    • KSII Transactions on Internet and Information Systems (TIIS) / v.16 no.12 / pp.3868-3888 / 2022
  • A widely used social networking service like Twitter has the ability to disseminate information to large groups of people even during a pandemic. At the same time, it is a convenient medium for sharing irrelevant and unverified information online and poses a potential threat to society. In this research, conventional machine learning algorithms are analyzed to classify the data as either non-rumor data or rumor data. Machine learning techniques have limited tuning capability and make decisions based on their learning. To tackle this problem, the authors propose a deep learning-based Rumor Detection Neural Network (RDNN) model to predict rumor tweets in real-world events. This model comprises three layers: an AttCNN layer that extracts local and position-invariant features from the data, an AttBi-LSTM layer that extracts important semantic or contextual information, and an HPOOL layer that combines the downsampled patches of the input feature maps from the average and maximum pooling layers. A dataset from Kaggle and a ground dataset, #gaja, are used to train the proposed Rumor Detection Neural Network to determine the veracity of the rumor. The experimental results of the RDNN classifier demonstrate accuracies of 93.24% and 95.41% in identifying rumor tweets in real-time events.
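
The abstract does not state how HPOOL combines the two pooling outputs. The sketch below assumes simple concatenation of max-pooled and average-pooled values over non-overlapping windows of a one-dimensional feature map; the function names, window size, and sample values are hypothetical.

```python
def pool1d(values, size, reducer):
    """Apply `reducer` to consecutive non-overlapping windows of length `size`."""
    return [
        reducer(values[i:i + size])
        for i in range(0, len(values) - size + 1, size)
    ]


def mean(window):
    return sum(window) / len(window)


def hybrid_pool(values, size=2):
    """Concatenate max-pooled and average-pooled outputs (assumed combination rule)."""
    return pool1d(values, size, max) + pool1d(values, size, mean)


# Hypothetical feature map; the trailing element does not fill a window and is dropped.
feature_map = [1.0, 3.0, 2.0, 6.0, 5.0]
pooled = hybrid_pool(feature_map)
print(pooled)  # [3.0, 6.0, 2.0, 4.0]
```

Max pooling keeps the strongest activation in each window while average pooling keeps its overall level, so combining them preserves both kinds of information.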

Classifications of Hadiths based on Supervised Learning Techniques

  • AbdElaal, Hammam M.;Bouallegue, Belgacem;Elshourbagy, Motasem;Matter, Safaa S.;AbdElghfar, Hany A.;Khattab, Mahmoud M.;Ahmed, Abdelmoty M.
    • International Journal of Computer Science & Network Security / v.22 no.11 / pp.1-10 / 2022
  • This study aims to build a model capable of classifying hadiths both by the reliability of their narrators (sahih, hassan, da'if, maudu) and by what was attributed to the Prophet Muhammad (saying, doing, describing, reporting) using supervised learning algorithms. Based on the outputs of this model, it seeks to discover a relationship between these two classifications, using statistical methods such as chi-square, information gain, and association rules, which might help avoid controversy and unproductive debate over the automatic classification of hadith. The experimental results showed that the classifications are related: most sahih hadiths belong to the saying class, and most maudu hadiths belong to the reporting class. The best classifier was MultinomialNB, which reached an accuracy of up to 0.9708, owing to its ability to handle high-dimensional problems and to identify the features most relevant to the target data during training. It was followed by LinearSVC, which reached up to 0.9655, and finally KNeighborsClassifier, which reached up to 0.9644.
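
Of the statistical methods named above, chi-square is the easiest to show in isolation. The sketch below computes the standard chi-square statistic for a 2x2 term/class contingency table; the toy documents, labels, and function names are hypothetical and do not come from the study's data.

```python
def chi_square(a, b, c, d):
    """Chi-square for a 2x2 table.

    a: term present, in class      b: term present, not in class
    c: term absent, in class       d: term absent, not in class
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def term_class_chi2(docs, labels, term, cls):
    """Build the 2x2 table for (term, cls) from labeled token lists and score it."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == cls
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    return chi_square(a, b, c, d)


# Hypothetical tokenized texts with class labels.
docs = [["said", "people"], ["said", "advice"], ["narrated", "event"], ["narrated", "journey"]]
labels = ["saying", "saying", "reporting", "reporting"]
score_said = term_class_chi2(docs, labels, "said", "saying")
score_people = term_class_chi2(docs, labels, "people", "saying")
print(score_said, score_people)
```

A higher score means the term's presence is more strongly associated with the class, which is why chi-square is often used to rank features before training a classifier.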

Research on Community Knowledge Modeling of Readers Based on Interest Labels

  • Kai, Wang;Wei, Pan;Xingzhi, Chen
    • Journal of Information Processing Systems / v.19 no.1 / pp.55-66 / 2023
  • Community portraits can deeply explore the characteristics of community structures and describe the personalized knowledge needs of community users, which is of great practical significance for improving community recommendation services, as well as the accuracy of resource push. The current community portraits generally have the problems of weak perception of interest characteristics and low degree of integration of topic information. To resolve this problem, the reader community portrait method based on the thematic and timeliness characteristics of interest labels (UIT) is proposed. First, community opinion leaders are identified based on multi-feature calculations, and then the topic features of their texts are identified based on the LDA topic model. On this basis, a semantic mapping including "reader community-opinion leader-text content" was established. Second, the readers' interest similarity of the labels was dynamically updated, and two kinds of tag parameters were integrated, namely, the intensity of interest labels and the stability of interest labels. Finally, the similarity distance between the opinion leader and the topic of interest was calculated to obtain the dynamic interest set of the opinion leaders. Experimental analysis was conducted on real data from the Douban reading community. The experimental results show that the UIT has the highest average F value (0.551) compared to the state-of-the-art approaches, which indicates that the UIT has better performance in the smooth time dimension.

Improved Character-Based Neural Network for POS Tagging on Morphologically Rich Languages

  • Samat Ali;Alim Murat
    • Journal of Information Processing Systems / v.19 no.3 / pp.355-369 / 2023
  • Since the widespread adoption of deep learning and related distributed representations, there have been substantial advancements in part-of-speech (POS) tagging for many languages. When training word representations, morphology and shape are typically ignored, as these representations rely primarily on collecting syntactic and semantic aspects of words. However, for tasks like POS tagging, notably in morphologically rich and resource-limited language environments, intra-word information is essential. In this study, we introduce a deep neural network (DNN) for POS tagging that learns character-level word representations and combines them with general word representations. Using the proposed approach and omitting hand-crafted features, we achieve 90.47%, 80.16%, and 79.32% accuracy on our own dataset for three morphologically rich languages: Uyghur, Uzbek, and Kyrgyz. The experimental results reveal that the presented character-based strategy greatly improves POS tagging performance for several morphologically rich languages (MRL) where character information is significant. Furthermore, when compared to the previously reported state-of-the-art POS tagging results for Turkish on the METU Turkish Treebank dataset, the proposed approach improved on the prior work slightly. Overall, the experimental results indicate that character-based representations outperform word-level representations for MRL. Our technique is also robust to out-of-vocabulary issues and performs better on manually edited text.