• Title/Summary/Keyword: natural language question

Search Result 85, Processing Time 0.024 seconds

Development of Artificial Intelligence-based Legal Counseling Chatbot System

  • Park, Koo-Rack
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.3
    • /
    • pp.29-34
    • /
    • 2021
  • With the advent of the 4th industrial revolution era, IT technology is creating new services that have not existed by converging with various existing industries and fields. In particular, in the field of artificial intelligence, chatbots and the latest technologies have developed dramatically with the development of natural language processing technology, and various business processes are processed through chatbots. This study is a study on a system that provides a close answer to the question the user wants to find by creating a structural form for legal inquiries through Slot Filling-based chatbot technology, and inputting a predetermined type of question. Using the proposal system, it is possible to construct question-and-answer data in a more structured form of legal information, which is unstructured data in text form. In addition, by managing the accumulated Q&A data through a big data storage system such as Apache Hive and recycling the data for learning, the reliability of the response can be expected to continuously improve.

Homonym Disambiguation based on Mutual Information and Sense-Tagged Compound Noun Dictionary (상호정보량과 복합명사 의미사전에 기반한 동음이의어 중의성 해소)

  • Heo, Jeong;Seo, Hee-Cheol;Jang, Myung-Gil
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.12
    • /
    • pp.1073-1089
    • /
    • 2006
  • The goal of Natural Language Processing(NLP) is to make a computer understand a natural language and to deliver the meanings of natural language to humans. Word sense Disambiguation(WSD is a very important technology to achieve the goal of NLP. In this paper, we describe a technology for automatic homonyms disambiguation using both Mutual Information(MI) and a Sense-Tagged Compound Noun Dictionary. Previous research work using word definitions in dictionary suffered from the problem of data sparseness because of the use of exact word matching. Our work overcomes this problem by using MI which is an association measure between words. To reflect language features, the rate of word-pairs with MI values, sense frequency and site of word definitions are used as weights in our system. We constructed a Sense-Tagged Compound Noun Dictionary for high frequency compound nouns and used it to resolve homonym sense disambiguation. Experimental data for testing and evaluating our system is constructed from QA(Question Answering) test data which consisted of about 200 query sentences and answer paragraphs. We performed 4 types of experiments. In case of being used only MI, the result of experiment showed a precision of 65.06%. When we used the weighted values, we achieved a precision of 85.35% and when we used the Sense-Tagged Compound Noun Dictionary, we achieved a precision of 88.82%, respectively.

KOMUChat: Korean Online Community Dialogue Dataset for AI Learning (KOMUChat : 인공지능 학습을 위한 온라인 커뮤니티 대화 데이터셋 연구)

  • YongSang Yoo;MinHwa Jung;SeungMin Lee;Min Song
    • Journal of Intelligence and Information Systems
    • /
    • v.29 no.2
    • /
    • pp.219-240
    • /
    • 2023
  • Conversational AI which allows users to interact with satisfaction is a long-standing research topic. To develop conversational AI, it is necessary to build training data that reflects real conversations between people, but current Korean datasets are not in question-answer format or use honorifics, making it difficult for users to feel closeness. In this paper, we propose a conversation dataset (KOMUChat) consisting of 30,767 question-answer sentence pairs collected from online communities. The question-answer pairs were collected from post titles and first comments of love and relationship counsel boards used by men and women. In addition, we removed abuse records through automatic and manual cleansing to build high quality dataset. To verify the validity of KOMUChat, we compared and analyzed the result of generative language model learning KOMUChat and benchmark dataset. The results showed that our dataset outperformed the benchmark dataset in terms of answer appropriateness, user satisfaction, and fulfillment of conversational AI goals. The dataset is the largest open-source single turn text data presented so far and it has the significance of building a more friendly Korean dataset by reflecting the text styles of the online community.

A Study of Speech Control Tags Based on Semantic Information of a Text (텍스트의 의미 정보에 기반을 둔 음성컨트롤 태그에 관한 연구)

  • Chang, Moon-Soo;Chung, Kyeong-Chae;Kang, Sun-Mee
    • Speech Sciences
    • /
    • v.13 no.4
    • /
    • pp.187-200
    • /
    • 2006
  • The speech synthesis technology is widely used and its application area is also being broadened to an automatic response service, a learning system for handicapped person, etc. However, the sound quality of the speech synthesizer has not yet reached to the satisfactory level of users. To make a synthesized speech, the existing synthesizer generates rhythms only by the interval information such as space and comma or by several punctuation marks such as a question mark and an exclamation mark so that it is not easy to generate natural rhythms of people even though it is based on mass speech database. To make up for the problem, there is a way to select rhythms after processing language from a higher level information. This paper proposes a method for generating tags for controling rhythms by analyzing the meaning of sentence with speech situation information. We use the Systemic Functional Grammar (SFG) [4] which analyzes the meaning of sentence with speech situation information considering the sentence prior to the given one, the situation of a conversation, the relationship among people in the conversation, etc. In this study, we generate Semantic Speech Control Tag (SSCT) by the result of SFG's meaning analysis and the voice wave analysis.

  • PDF

The Long Term Effectiveness of Kindergarten Education (유치원(幼稚園) 교육(敎育)의 효과(效果)와 지속성(持續性)에 관한 연구(硏究))

  • Kwon, Young Re
    • Korean Journal of Child Studies
    • /
    • v.3
    • /
    • pp.97-113
    • /
    • 1982
  • Problems This study was aimed to find out the effects of Kindergarten education on primary school child's learning achievement and it's continuance. The specific question to be investigated in this study can he summarized as follows. 1. How did the kindergarten education effect upon primary school child's learning achievement and behavioral development? 2. How did the differences of learning achievement and behavioral development between groups of children attending kindergarten and not attending kindergarten? Methods 1. Subjects One hundred and twenty children of sixth grade and one hundred primary school's teachers were employed 1) Sixty of children were experimented groups who had received kindergarten education. Sixty of Children are Control group who had not received Kindergarten education. The subjects were sampled in N primary school in seoul and Y primary school in Taejeon. Sixty of children were boys and sixty of children were girls. 2) Fifty of teachers were sampled in N primary school in Seoul and Fifty of teachers were sampled in Y primary school in Taejeon. 2. Instrument 1) School cumulative record. School cumulative record was referred to find out the differences of school records on learning achievement in morality, National language, social studies, arithmatics, natural science, gymnastics, music, fine arts and behavioral development in industry, responsibility, coorperation, indeperdence, law abiding spirit between experimental group and control group. 2) Questionnaire paper which constructed 10 articles. Questionnaire paper asked to primary school teachers for effects of kindergarten education. Results The results of date analysis can summarized as follows. 1. Kindergarten education, generally, is regarded effective at least for school work achievement but not so effective for morality and physical education. 2. Kindergarten education is not so effective in all school works of primary school in its continuance since its effectiveness begins to fade away from the 4th grade of primary school. But such continuance tends to be outstanding especially in national language, social subjects and arts. 3. The experiences of kindergarten education is not much helpful to behavioral development during primary school. 4. Primary school teachers recognize that kindergarten education contributes to curriculum work and behavioral development for primary school children.

  • PDF

Chatbot Design Method Using Hybrid Word Vector Expression Model Based on Real Telemarketing Data

  • Zhang, Jie;Zhang, Jianing;Ma, Shuhao;Yang, Jie;Gui, Guan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.4
    • /
    • pp.1400-1418
    • /
    • 2020
  • In the development of commercial promotion, chatbot is known as one of significant skill by application of natural language processing (NLP). Conventional design methods are using bag-of-words model (BOW) alone based on Google database and other online corpus. For one thing, in the bag-of-words model, the vectors are Irrelevant to one another. Even though this method is friendly to discrete features, it is not conducive to the machine to understand continuous statements due to the loss of the connection between words in the encoded word vector. For other thing, existing methods are used to test in state-of-the-art online corpus but it is hard to apply in real applications such as telemarketing data. In this paper, we propose an improved chatbot design way using hybrid bag-of-words model and skip-gram model based on the real telemarketing data. Specifically, we first collect the real data in the telemarketing field and perform data cleaning and data classification on the constructed corpus. Second, the word representation is adopted hybrid bag-of-words model and skip-gram model. The skip-gram model maps synonyms in the vicinity of vector space. The correlation between words is expressed, so the amount of information contained in the word vector is increased, making up for the shortcomings caused by using bag-of-words model alone. Third, we use the term frequency-inverse document frequency (TF-IDF) weighting method to improve the weight of key words, then output the final word expression. At last, the answer is produced using hybrid retrieval model and generate model. The retrieval model can accurately answer questions in the field. The generate model can supplement the question of answering the open domain, in which the answer to the final reply is completed by long-short term memory (LSTM) training and prediction. Experimental results show which the hybrid word vector expression model can improve the accuracy of the response and the whole system can communicate with humans.

Design and Implementation of Automatic Marking System for a Subjectivity Problem of the Program (프로그램의 주관식 문제 자동 채점 시스템 설계 및 구현)

  • Jung, Eun-Mi;Choi, Mi-Sun;Shim, Jae-Chang
    • Journal of Korea Multimedia Society
    • /
    • v.12 no.5
    • /
    • pp.767-776
    • /
    • 2009
  • The purpose of this paper is to design, implement and test the automatic marking system for programming languages using key-words and boolean operations to solve the processing problems of natural languages. There are accurate grammar systems and key-words in programming languages. Using these characteristics, We have designed, programmed, and tested automatic marking system for programming languages through key-words and boolean operations in this paper. We have categorized programming languages into 7 types as the type of answer and when a professor input any key-words, the system make him put conjunction with the special character. It can be logical expressions instantly so that the system easily operates. We asked 10 students who are majoring in computer engineering to take a test on the paper and web to show how well automatic marking system that we have programmed works. Then We requested 3 professors if the subject problems marked objectively. As a result, automatic marking system proved to be appropriate. We have proposed the way of using key-words and boolean operation for prohibiting huge natural language processing in marking of subjective question. It promotes efficiency rate of programming, objectivity and speed through the transferal to the web for marking since the system prohibits marker to include personal opinion on marking and gives feedback quickly

  • PDF

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.4
    • /
    • pp.1-25
    • /
    • 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes the sentiments consumers or the public feel about an arbitrary object from written texts. Furthermore, Aspect-Based Sentiment Analysis (ABSA) is a fine-grained analysis of the sentiments towards each aspect of an object. Since having a more practical value in terms of business, ABSA is drawing attention from both academic and industrial organizations. When there is a review that says "The restaurant is expensive but the food is really fantastic", for example, the general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's aspect 'price' as 'negative' and 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. In order to perform ABSA, it is necessary to identify what are the aspect terms or aspect categories included in the text, and judge the sentiments towards them. Accordingly, there exist four main areas in ABSA; aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). It is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect category. Here, an aspect category is expressed in one or more aspect terms, or indirectly inferred by other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. As such, an aspect category referred to by one or more specific aspect terms is called an explicit aspect. On the other hand, the aspect category like 'price', which does not have any specific aspect terms but can be indirectly guessed with an emotional word 'expensive,' is called an implicit aspect. So far, the 'aspect category' has been used to avoid confusion about 'aspect term'. From now on, we will consider 'aspect category' and 'aspect' as the same concept and use the word 'aspect' more for convenience. And one thing to note is that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, and ACSC treats not only explicit aspects but also implicit aspects. This study seeks to find answers to the following issues ignored in the previous studies when applying the BERT pre-trained language model to ACSC and derives superior ACSC models. First, is it more effective to reflect the output vector of tokens for aspect categories than to use only the final output vector of [CLS] token as a classification vector? Second, is there any performance difference between QA (Question Answering) and NLI (Natural Language Inference) types in the sentence-pair configuration of input data? Third, is there any performance difference according to the order of sentence including aspect category in the QA or NLI type sentence-pair configuration of input data? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, ACSC models that provide performance beyond the existing studies without expanding the training dataset were derived. In addition, it was found that it is more effective to reflect the output vector of the aspect category token than to use only the output vector for the [CLS] token as a classification vector. It was also found that QA type input generally provides better performance than NLI, and the order of the sentence with the aspect category in QA type is irrelevant with performance. There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing the ACSC model used in this study could be similarly applied to other studies such as ATSC.

A Study of Segmental and Syllabic Intervals of Canonical Babbling and Early Speech

  • Chen, Xiaoxiang;Xiao, Yunnan
    • Cross-Cultural Studies
    • /
    • v.28
    • /
    • pp.115-139
    • /
    • 2012
  • Interval or duration of segments, syllables, words and phrases is an important acoustic feature which influences the naturalness of speech. A number of cross-sectional studies regarding acoustic characteristics of children's speech development found that intervals of segments, syllables, words and phrases tend to change with the growing age. One hypothesis assumed that decreases in intervals would be greater when children were younger and smaller decreases in intervals when older (Thelen,1991), it has been supported by quite a number of researches on the basis of cross-sectional studies (Tingley & Allen,1975; Kent & Forner,1980; Chermak & Schneiderman, 1986), but the other hypothesis predicted that decreases in intervals would be smaller when children were younger and greater decreases in intervals when older (Smith, Kenney & Hussain, 1996). Researchers seem to come up with conflicting postulations and inconsistent results about the change trends concerning intervals of segments, syllables, words and phrases, leaving it as an issue unresolved. Most acoustic investigations of children's speech production have been conducted via cross-sectional designs, which involves studying several groups of children. So far, there are only a few longitudinal studies. This issue needs more longitudinal investigations; moreover, the acoustic measures of the intervals of child speech are hardly available. All former studies focus on word stages excluding the babbling stages especially the canonical babbling stage, but we need to find out when concrete changes of intervals begin to occur and what causes the changes. Therefore, we conducted an acoustic study of interval characteristics of segments and words concerning Canonical Babble ( CB) and early speech in an infant aged from 0;9 to 2;4 acquiring Mandarin Chinese. The current research addresses the following two questions: 1. Whether decreases in interval would be greater when children were younger and smaller when they were older or vice versa? 2. Whether the child speech concerning the acoustic features of interval drifts in the direction of the language they are exposed to? The female infant whose L1 was Southern Mandarin living in Changsha was audio- and video-taped at her home for about one hour almost on a weekly basis during her age range from 0;9 to 2;4 under natural observation by us investigators. The recordings were digitized. Parts of the digitized material were labeled. All the repetitions were excluded. The utterances were extracted from 44 sessions ranging from 30 minutes to one hour. The utterances were divided into segments as well as syllable-sized units. Age stages are 0;9-1;0,1;1-1;5, 1;6-2;0, 2;1-2;4. The subject was a monolingual normal child from parents with a good education. The infant was audio-and video-taped in her home almost every week. The data were digitized, segments and syllables from 44 sessions spanning the transition from babble to speech were transcribed in narrow IPA and coded for analysis. Babble was coded from age 0;9-1;0, and words were coded from 1;0 to 2;4, the data has been checked by two professionally trained persons who majored in phonetics. The present investigation is a longitudinal analysis of some temporal characteristics of the child speech during the age periods of 0;9-1;0, 1;1-1;5, 1;6-2;0, 2;1-2;4. The answer to Research Question 1 is that our results are in agreement with neither of the hypotheses. One hypothesis assumed that decreases in intervals would be greater when children were younger and smaller decreases in intervals when older (Thelen,1991); but the other hypothesis predicted that decreases in intervals would be smaller when children were younger and greater decreases in intervals when older (Smith, Kenney & Hussain, 1996). On the whole, there is a tendency of decrease in segmental and syllabic duration with the growing age, but the changes are not drastic and abrupt. For example, /a/ after /k/ in Table 1 has greater decrease during 1;1-1;5, while /a/ after /p/, /t/ and /w/ has greater decrease during 2;1-2;4. /ka/ has greater decrease during 1;1-1;5, while /ta/ and /na/ has greater decrease during 2;1-2;4.Across the age periods, interval change experiences lots of fluctuation all the time. The answer to Research Question 2 is yes. Babbling stage is a period in which the children's acoustic features of intervals of segments, syllables, words and phrases is shifted in the direction of the language to be learned, babbling and children's speech emergence is greatly influenced by ambient language. The phonetic changes in terms of duration would go on until as late as 10-12 years of age before reaching adult-like levels. Definitely, with the increase of exposure to ambient language, the variation would be less and less until they attain the adult-like competence. Via the analysis of the SPSS 15.0, the decrease of segmental and syllabic intervals across the four age periods proves to be of no significant difference (p>0.05). It means that the change of segmental and syllabic intervals is continuous. It reveals that the process of child speech development is gradual and cumulative.

KG_VCR: A Visual Commonsense Reasoning Model Using Knowledge Graph (KG_VCR: 지식 그래프를 이용하는 영상 기반 상식 추론 모델)

  • Lee, JaeYun;Kim, Incheol
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.9 no.3
    • /
    • pp.91-100
    • /
    • 2020
  • Unlike the existing Visual Question Answering(VQA) problems, the new Visual Commonsense Reasoning(VCR) problems require deep common sense reasoning for answering questions: recognizing specific relationship between two objects in the image, presenting the rationale of the answer. In this paper, we propose a novel deep neural network model, KG_VCR, for VCR problems. In addition to make use of visual relations and contextual information between objects extracted from input data (images, natural language questions, and response lists), the KG_VCR also utilizes commonsense knowledge embedding extracted from an external knowledge base called ConceptNet. Specifically the proposed model employs a Graph Convolutional Neural Network(GCN) module to obtain commonsense knowledge embedding from the retrieved ConceptNet knowledge graph. By conducting a series of experiments with the VCR benchmark dataset, we show that the proposed KG_VCR model outperforms both the state of the art(SOTA) VQA model and the R2C VCR model.