• Title/Summary/Keyword: Natural language processing (NLP)

Design of Query Processing System to Retrieve Information from Social Network using NLP

  • Virmani, Charu; Juneja, Dimple; Pillai, Anuradha
    • KSII Transactions on Internet and Information Systems (TIIS), v.12 no.3, pp.1168-1188, 2018
  • Social network aggregators maintain and manage multiple accounts across online social networks. Displaying an activity feed for each social network on a common dashboard has long been the status quo of such aggregators; however, retrieving the desired data from the various networks remains a major concern. A user inputs a query expecting a specific outcome, but since the intent of the query is known only to the user, the output may not match the user's expectation unless the system considers user-centric factors. Moreover, the quality of the solution depends on these factors, on the user's inclination, and on the nature of the network itself. Thus, a system is needed that understands the user's intent and serves structured results, and choosing the best execution and optimal ranking functions is an equally high-priority concern. Motivated by these requirements, the current work proposes the design of a query processing system that extracts the user's intent and retrieves information from various social networks. Machine learning techniques such as Latent Dirichlet Allocation (LDA) and a ranking algorithm are incorporated to improve the query results, and data mining techniques are used to fetch the information. The proposed framework uniquely contributes a user-centric, natural-language query retrieval model, and it performs efficiently when compared on temporal metrics. The proposed Query Processing System to Retrieve Information from Social Network (QPSSN) increases the discoverability of the user and helps businesses collaboratively run promotions and discover new networks and people. It is an innovative approach to investigating new aspects of social networks, and the proposed model achieves strong precision and recall.
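
A minimal sketch of the LDA-plus-ranking step this abstract names, assuming scikit-learn in place of whatever toolkit the authors used: posts are mapped into a topic space and a query is ranked against them by topic similarity. The posts, query, and parameter values are illustrative, not the paper's data or configuration.

```python
# Sketch: rank social-network posts against a query by LDA topic similarity.
# Documents, query, and hyperparameters below are illustrative only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.metrics.pairwise import cosine_similarity

posts = [
    "new phone launch event photos shared by friends",
    "promotion discount on phones this weekend",
    "hiking trip photos from the weekend",
]
query = "phone promotion deals"

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(posts)

# Learn a small topic model over the posts.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
post_topics = lda.fit_transform(X)

# Project the query into the same topic space and rank posts by similarity.
query_topics = lda.transform(vectorizer.transform([query]))
scores = cosine_similarity(query_topics, post_topics)[0]
for score, post in sorted(zip(scores, posts), reverse=True):
    print(f"{score:.3f}  {post}")
```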

Automatic Correction of Errors in Annotated Corpus Using Kernel Ripple-Down Rules (커널 Ripple-Down Rule을 이용한 태깅 말뭉치 오류 자동 수정)

  • Park, Tae-Ho; Cha, Jeong-Won
    • Journal of KIISE, v.43 no.6, pp.636-644, 2016
  • Annotated corpora are important for understanding natural language with machine learning methods. In this paper, we propose a new method to automate error reduction in annotated corpora. We use Ripple-Down Rules (RDR) to reduce errors and a kernel to extend RDR for NLP. We applied our system to annotation errors in Korean Wikipedia and blog corpora to identify the error types. Experimental results on various views of the Korean Wikipedia and blog data are reported to evaluate the effectiveness and efficiency of the proposed approach, which can be used to reduce errors in large corpora.
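
A minimal sketch of the Ripple-Down Rules structure this paper builds on, without the kernel extension; the conditions and tags are hypothetical, not drawn from the paper's rule base.

```python
# Sketch of a single-classification Ripple-Down Rule tree: each node fires on
# a condition, and wrong conclusions are patched by adding "except"
# refinements rather than editing existing rules. Conditions and tags are
# hypothetical examples.
class RDRNode:
    def __init__(self, condition, conclusion):
        self.condition = condition      # callable: context dict -> bool
        self.conclusion = conclusion    # tag assigned when the condition fires
        self.except_node = None         # refinement tried when this rule fires
        self.else_node = None           # alternative tried when it does not

    def classify(self, context):
        if self.condition(context):
            if self.except_node:
                refined = self.except_node.classify(context)
                if refined is not None:
                    return refined
            return self.conclusion
        if self.else_node:
            return self.else_node.classify(context)
        return None

# Default rule: tag every token as a common noun.
root = RDRNode(lambda ctx: True, "NNG")
# Exception added after an observed error: tokens ending in "다" are verbs.
root.except_node = RDRNode(lambda ctx: ctx["word"].endswith("다"), "VV")

print(root.classify({"word": "먹다"}))   # VV
print(root.classify({"word": "사과"}))   # NNG
```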

A study on Korean multi-turn response generation using generative and retrieval model (생성 모델과 검색 모델을 이용한 한국어 멀티턴 응답 생성 연구)

  • Lee, Hodong; Lee, Jongmin; Seo, Jaehyung; Jang, Yoonna; Lim, Heuiseok
    • Journal of the Korea Convergence Society, v.13 no.1, pp.13-21, 2022
  • Recent deep-learning research shows excellent performance across most natural language processing (NLP) fields using pre-trained language models. In particular, auto-encoder-based language models have proven their performance and usefulness in various areas of Korean language understanding. However, decoder-based Korean generative models still struggle to generate even simple sentences, and there is little detailed research or data for dialogue, the field where generative models are most commonly used. Therefore, this paper constructs multi-turn dialogue data for a Korean generative model. In addition, we compare and analyze performance after improving the model's dialogue ability through transfer learning, and we propose a method that supplements the model's insufficient dialogue generation ability by extracting recommended response candidates from external knowledge with a retrieval model.
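
A minimal sketch of the retrieval side described above: indexing context-response pairs and fetching recommended response candidates by context similarity. The dialogue pairs and TF-IDF scoring are illustrative assumptions, not the paper's retrieval model.

```python
# Sketch: retrieve recommended response candidates for a dialogue history by
# TF-IDF similarity over indexed context-response pairs. Data is illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

pairs = [
    ("how was your weekend", "it was great, I went hiking"),
    ("what are you doing now", "just reading a book"),
    ("did you see the game last night", "yes, what a finish"),
]
contexts = [c for c, _ in pairs]

vectorizer = TfidfVectorizer()
context_matrix = vectorizer.fit_transform(contexts)

def retrieve_candidates(history, k=2):
    """Return the top-k stored responses whose contexts best match the history."""
    query_vec = vectorizer.transform([" ".join(history)])
    scores = cosine_similarity(query_vec, context_matrix)[0]
    ranked = sorted(zip(scores, pairs), key=lambda x: x[0], reverse=True)
    return [response for _, (_, response) in ranked[:k]]

print(retrieve_candidates(["how was your weekend"]))
```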

Homonym Disambiguation based on Mutual Information and Sense-Tagged Compound Noun Dictionary (상호정보량과 복합명사 의미사전에 기반한 동음이의어 중의성 해소)

  • Heo, Jeong; Seo, Hee-Cheol; Jang, Myung-Gil
    • Journal of KIISE: Software and Applications, v.33 no.12, pp.1073-1089, 2006
  • The goal of Natural Language Processing (NLP) is to make a computer understand natural language and deliver its meanings to humans. Word Sense Disambiguation (WSD) is a very important technology for achieving this goal. In this paper, we describe a technology for automatic homonym disambiguation using both Mutual Information (MI) and a sense-tagged compound noun dictionary. Previous work using dictionary word definitions suffered from data sparseness because it relied on exact word matching; our work overcomes this problem by using MI, an association measure between words. To reflect language features, the rate of word pairs with MI values, the sense frequency, and the size of word definitions are used as weights in our system. We constructed a sense-tagged compound noun dictionary for high-frequency compound nouns and used it to resolve homonym ambiguity. Experimental data for testing and evaluating the system was constructed from QA (Question Answering) test data consisting of about 200 query sentences and answer paragraphs. We performed four types of experiments: using MI alone yielded a precision of 65.06%, adding the weighted values raised precision to 85.35%, and adding the sense-tagged compound noun dictionary raised it to 88.82%.
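
A worked sketch of the mutual-information scoring idea: each sense's dictionary gloss is scored by the summed pointwise MI, PMI(x, y) = log(p(x, y) / (p(x) p(y))), between gloss words and context words. All counts and glosses are toy values, and the paper's weighting scheme is omitted.

```python
# Sketch: pick the homonym sense whose gloss words share the highest total
# pointwise mutual information with the context words. Toy counts only.
import math

TOTAL = 10_000                     # assumed corpus size
word_count = {"bank": 120, "river": 80, "money": 150, "water": 90}
pair_count = {("bank", "river"): 30, ("bank", "money"): 45,
              ("river", "water"): 40, ("money", "water"): 2}

def pmi(x, y):
    c = pair_count.get((x, y)) or pair_count.get((y, x))
    if not c:
        return 0.0                 # unseen pair: no association evidence
    p_xy = c / TOTAL
    p_x, p_y = word_count[x] / TOTAL, word_count[y] / TOTAL
    return math.log(p_xy / (p_x * p_y))

senses = {"bank/financial": ["money"], "bank/riverside": ["river", "water"]}
context = ["river"]

def disambiguate(context, senses):
    score = lambda gloss: sum(pmi(g, c) for g in gloss for c in context)
    return max(senses, key=lambda s: score(senses[s]))

print(disambiguate(context, senses))   # bank/riverside
```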

Machine Learning Language Model Implementation Using Literary Texts (문학 텍스트를 활용한 머신러닝 언어모델 구현)

  • Jeon, Hyeongu; Jung, Kichul; Kwon, Kyoungah; Lee, Insung
    • The Journal of the Convergence on Culture Technology, v.7 no.2, pp.427-436, 2021
  • The purpose of this study is to implement a machine learning language model that learns from literary texts. An important characteristic of literary texts is that question-answer pairs are often not clearly distinguished; they are also full of pronouns, figurative expressions, and soliloquies, which make them difficult for learning algorithms to process. Yet algorithms trained on literary texts can produce more human-friendly interactions than algorithms trained on ordinary sentences. To this end, this paper proposes three text correction tasks that must precede any use of literary texts for a machine learning language model: pronoun processing, dialogue pair expansion, and data amplification. Training data for artificial intelligence should have clear meanings to facilitate machine learning and ensure high effectiveness. Introducing special genres of text such as literature into natural language processing research is expected not only to expand the scope of machine learning, but also to demonstrate a new method of language learning.
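
A minimal sketch of the first correction task, pronoun processing: rewriting pronouns with resolved antecedents so that training sentences carry explicit referents. The antecedent table is a hypothetical stand-in for a real coreference-resolution step; the paper's actual procedure is not specified here.

```python
# Sketch: replace pronouns in literary text using a precomputed antecedent
# table, so training sentences carry explicit referents. The table stands in
# for a real coreference-resolution step.
import re

antecedents = {"he": "Heathcliff", "she": "Catherine", "it": "the letter"}

def resolve_pronouns(sentence, table):
    def swap(match):
        word = match.group(0)
        return table.get(word.lower(), word)
    pattern = r"\b(" + "|".join(table) + r")\b"
    return re.sub(pattern, swap, sentence, flags=re.IGNORECASE)

print(resolve_pronouns("She read it before he arrived.", antecedents))
# Catherine read the letter before Heathcliff arrived.
```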

A Study of Automatic Ontology Building by Web Information Extraction and Natural Language Processing (웹 문서 정보추출과 자연어처리를 통한 온톨로지 자동구축에 관한 연구)

  • Kim, Myung-Gwan; Lee, Young-Woo
    • The Journal of the Institute of Internet, Broadcasting and Communication, v.9 no.3, pp.61-67, 2009
  • As the Internet proliferates and electronic documents multiply, information retrieval technology grows in importance. This research builds a more efficient and accurate knowledge base from unstructured Web text documents by using a Local Grammar Graph (LGG) to extract core meanings. We built an ontology based on OWL (Web Ontology Language) from the up/down patterns of particular stocks, created through this extraction together with grammar patterns. This enables users to search for the meaningful, high-quality information they want.
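
A minimal sketch of emitting an OWL ontology of the kind described, using rdflib; the namespace URI, class names, and instance are hypothetical examples, not the paper's actual schema.

```python
# Sketch: write a tiny OWL ontology for stock up/down patterns with rdflib.
# The namespace URI, classes, and instance are hypothetical examples.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("http://example.org/stock#")
g = Graph()
g.bind("ex", EX)

# Class hierarchy: UpPattern and DownPattern specialize PricePattern.
g.add((EX.PricePattern, RDF.type, OWL.Class))
for cls in (EX.UpPattern, EX.DownPattern):
    g.add((cls, RDF.type, OWL.Class))
    g.add((cls, RDFS.subClassOf, EX.PricePattern))

# One extracted instance with a human-readable label.
g.add((EX.SamsungRally2009, RDF.type, EX.UpPattern))
g.add((EX.SamsungRally2009, RDFS.label, Literal("Samsung shares rally")))

print(g.serialize(format="xml"))   # OWL/RDF-XML text
```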

Analysis of Topics Related to Population Aging Using Natural Language Processing Techniques (자연어 처리 기술을 활용한 인구 고령화 관련 토픽 분석)

  • Hyunjung Park; Taemin Lee; Heuiseok Lim
    • Journal of Information Technology Services, v.23 no.1, pp.55-79, 2024
  • Korea, which is expected to become a super-aged society in 2025, faces one of the most worrisome aging crises worldwide. Efforts are urgently required to examine the problems and countermeasures from various angles and to improve on current shortcomings. From a new viewpoint, we derive useful implications by applying recent natural language processing techniques to online articles. More specifically, we pose three research questions. First, what topics are being reported in the online media, and what is the public's response to them? Second, what is the relationship between these aging-related topics and individual happiness factors? Third, what strategic directions and implications for benchmarking are discussed to solve the problem of population aging? To answer these, we collect Naver portal articles related to population aging along with their classification categories, comments, comment counts, and other numerical data. From this data, we first derive 33 topics with semi-supervised BERTopic, incorporating article classification information that previous studies did not use, and conduct sentiment analysis of the comments with a current open-source large language model. We also examine the relationship between the derived topics and personal happiness factors extended to Alderfer's ERG dimensions, and additionally carry out 3~4-gram keyword frequency analysis, trend analysis, and text network analysis based on the 3~4-gram keywords. Through this multifaceted approach, we present fresh insights from both practical and theoretical perspectives.
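
A minimal sketch of the 3~4-gram keyword frequency analysis mentioned above, using scikit-learn; the article snippets stand in for the collected Naver portal articles, and the BERTopic and sentiment stages are not shown.

```python
# Sketch: 3- and 4-gram keyword frequency analysis over article text.
# The snippets are placeholders for a real collected article corpus.
from sklearn.feature_extraction.text import CountVectorizer

articles = [
    "government expands pension coverage for elderly citizens",
    "hospitals report rising demand for elderly care services",
    "pension coverage for elderly citizens debated in assembly",
]

vectorizer = CountVectorizer(ngram_range=(3, 4))
X = vectorizer.fit_transform(articles)

# Total frequency of each 3- and 4-gram across the corpus, highest first.
freqs = X.sum(axis=0).A1
ngrams = vectorizer.get_feature_names_out()
for count, gram in sorted(zip(freqs, ngrams), reverse=True)[:5]:
    print(count, gram)
```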

The Identification Framework for source code author using Authorship Analysis and CNN (작성자 분석과 CNN을 적용한 소스 코드 작성자 식별 프레임워크)

  • Shin, Gun-Yoon; Kim, Dong-Wook; Hong, Sung-sam; Han, Myung-Mook
    • Journal of Internet Computing and Services, v.19 no.5, pp.33-41, 2018
  • As Internet technology has developed, various programs have been created by many different authors. In this setting, some authors pass off programs or code written by others as their own, use other writers' code indiscriminately, or fail to indicate exactly which code has been reused, which makes it increasingly difficult to protect code. In this paper, we propose an author identification framework that combines authorship analysis theory with natural language processing (NLP) based on a Convolutional Neural Network (CNN). We apply authorship analysis theory to extract features for author identification from source code, combine them with features used in text mining, and perform author identification with machine learning; in addition, we apply the CNN-based NLP method to source code for author classification. Identifying an author requires features that are distinctive to that author alone, and the CNN-based NLP method can handle a language with a special structure, such as source code, and identify its author. Identification accuracy based on authorship analysis theory is 95.1%, and identification accuracy with the CNN is 98%.
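
A minimal sketch of a 1D-CNN author classifier over tokenized source code, in Keras; the vocabulary size, sequence length, author count, and layer sizes are assumed values, since the paper's exact architecture is not given here.

```python
# Sketch: 1D-CNN classifier mapping integer-encoded code token sequences to
# author identities. All sizes below are assumptions for illustration.
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, NUM_AUTHORS = 5000, 400, 10

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN,)),
    layers.Embedding(VOCAB_SIZE, 64),          # token id -> dense vector
    layers.Conv1D(128, 5, activation="relu"),  # local token-pattern features
    layers.GlobalMaxPooling1D(),               # strongest feature per filter
    layers.Dense(64, activation="relu"),
    layers.Dense(NUM_AUTHORS, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# Training would use integer-encoded code token sequences:
# model.fit(x_train, y_train, epochs=5, validation_split=0.1)
```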

An effective automated ontology construction based on the agriculture domain

  • Deepa, Rajendran; Vigneshwari, Srinivasan
    • ETRI Journal, v.44 no.4, pp.573-587, 2022
  • The agricultural sector differs completely from other sectors in that it relies entirely on natural and climatic factors. Climate change has many effects on land and agriculture, including reduced annual rainfall, pests, heat waves, sea-level change, and global ozone/atmospheric CO2 fluctuation, and it also affects the wider environment; based on these factors, farmers choose crops to increase the productivity of their fields. Many existing agricultural ontologies are either domain-specific or were created with minimal vocabulary, and no proper evaluation framework has been implemented for them. A new agricultural ontology focused on subdomains is designed to assist farmers using a Jaccard relative extractor (JRE) and the Naïve Bayes algorithm: the JRE finds the similarity between sentences and words in the agricultural documents, and the relationship between two terms is identified via Naïve Bayes. In the proposed method, data is preprocessed with natural language processing techniques, and the dimension-reduced tags are subjected to rule-based formal concept analysis and mapping. Subdomain ontologies for weather, pests, and soil are built separately, and the overall agricultural ontology is built around them. The proposed technique is evaluated against a gold standard for the lexical layer, and its performance is analyzed against different state-of-the-art systems using precision, recall, F-measure, Matthews correlation coefficient, ROC curve area, and precision-recall curve area. The proposed methodology gives a precision of 94.40% for agricultural ontology construction, compared with 83.94% for a decision tree and 86.89% for the K-nearest neighbor algorithm.
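
A minimal sketch of the Jaccard similarity underlying the JRE step: two documents (or sentences) are compared by the overlap of their token sets. The token sets are illustrative, and the paper's full extractor and Naïve Bayes stage are omitted.

```python
# Sketch: Jaccard similarity |A ∩ B| / |A ∪ B| between token sets, as used
# to relate terms and sentences when building subdomain ontologies.
def jaccard(a, b):
    """Jaccard similarity between two token collections."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

pest_doc = "aphids damage rice crops during humid weather".split()
weather_doc = "humid weather increases pest activity in rice fields".split()

print(round(jaccard(pest_doc, weather_doc), 3))   # 0.25
```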

Statistical Approach to Sentiment Classification using MapReduce (맵리듀스를 이용한 통계적 접근의 감성 분류)

  • Kang, Mun-Su; Baek, Seung-Hee; Choi, Young-Sik
    • Science of Emotion and Sensibility, v.15 no.4, pp.425-440, 2012
  • As the Internet grows, the amount of subjective data increases, and so does the need to classify subjective data automatically. Sentiment classification sorts subjective data by sentiment type. Previous sentiment classification research has focused on NLP (Natural Language Processing) and sentiment word dictionaries, and it has two critical problems: first, the performance of morphological analysis in NLP has fallen short of expectations; second, it is not easy to choose sentiment words or to determine how much sentiment a word carries. To solve these problems, this paper combines web-scale data with a statistical approach to sentiment classification. The proposed method uses word statistics from web-scale data rather than trying to determine the meaning of each word; unlike earlier research that depended on NLP algorithms, it focuses on the data itself. Hadoop and MapReduce are used to handle the web-scale data.
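
A minimal sketch of the statistical map/shuffle/reduce flow in pure Python; a real run would execute the same mapper and reducer on Hadoop over web-scale data. The labeled documents and the log-ratio classification rule are toy illustrations, not the paper's exact scheme.

```python
# Sketch: MapReduce-style word-sentiment statistics, simulated in-process.
# Labeled documents and the classification rule are toy examples.
import math
from collections import defaultdict

labeled_docs = [
    ("pos", "great phone love the camera"),
    ("pos", "love this great battery"),
    ("neg", "terrible battery hate the screen"),
]

# Map: emit ((word, label), 1) for every word occurrence.
def mapper(label, text):
    for word in text.split():
        yield (word, label), 1

# Shuffle + Reduce: sum counts per (word, label) key.
counts = defaultdict(int)
for label, text in labeled_docs:
    for key, one in mapper(label, text):
        counts[key] += one

# Classify by summing log-ratios of per-class word counts (add-one smoothed).
def classify(text):
    score = sum(math.log((counts[(w, "pos")] + 1) / (counts[(w, "neg")] + 1))
                for w in text.split())
    return "pos" if score > 0 else "neg"

print(classify("love the battery"))    # pos
print(classify("terrible screen"))     # neg
```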
