• Title/Summary/Keyword: NLP(Natural Language Processing)

Development of Artificial Intelligence-based Legal Counseling Chatbot System

  • Park, Koo-Rack
    • Journal of the Korea Society of Computer and Information / v.26 no.3 / pp.29-34 / 2021
  • With the advent of the Fourth Industrial Revolution, information technology is converging with existing industries and fields to create services that did not previously exist. In artificial intelligence in particular, chatbots have advanced dramatically alongside natural language processing technology, and a variety of business processes are now handled through chatbots. This study presents a system that uses slot-filling-based chatbot technology to give legal inquiries a structured form and, when the user enters a question of a predetermined type, returns an answer close to what the user is looking for. With the proposed system, legal information, which is unstructured text data, can be turned into more structured question-and-answer data. In addition, by managing the accumulated Q&A data in a big data storage system such as Apache Hive and reusing it for training, the reliability of the responses can be expected to improve continuously.
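
As a rough illustration of the slot-filling idea described in this abstract, the sketch below collects values for a fixed set of slots across user turns and reports which slots are still missing. The slot names, the regular expressions, and the functions `fill_slots` and `missing_slots` are hypothetical; the abstract does not describe the paper's actual slot schema or extraction method.

```python
import re

# Hypothetical slot schema for a legal inquiry (not taken from the paper).
REQUIRED_SLOTS = ("case_type", "role", "amount")

# Hypothetical keyword patterns; a real system would likely use a trained tagger.
PATTERNS = {
    "case_type": re.compile(r"\b(lease|loan|divorce|inheritance)\b", re.I),
    "role": re.compile(r"\b(tenant|landlord|lender|borrower)\b", re.I),
    "amount": re.compile(r"(\d[\d,]*)\s*(?:won|KRW)", re.I),
}


def fill_slots(utterance, slots=None):
    """Return a copy of `slots` updated with any values found in `utterance`."""
    filled = dict(slots or {})
    for name, pattern in PATTERNS.items():
        if name in filled:
            continue  # keep the value captured in an earlier turn
        match = pattern.search(utterance)
        if match:
            filled[name] = match.group(1).lower()
    return filled


def missing_slots(slots):
    """List the required slots the chatbot still has to ask about."""
    return [name for name in REQUIRED_SLOTS if name not in slots]


state = fill_slots("I am a tenant with a lease dispute")
state = fill_slots("The deposit is 5,000,000 won", state)
print(state, missing_slots(state))
```

Once every required slot is filled, the structured record could serve as the key for retrieving a stored answer, which is one plausible reading of how structured Q&A data would accumulate.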

Improved Transformer Model for Multimodal Fashion Recommendation Conversation System (멀티모달 패션 추천 대화 시스템을 위한 개선된 트랜스포머 모델)

  • Park, Yeong Joon;Jo, Byeong Cheol;Lee, Kyoung Uk;Kim, Kyung Sun
    • The Journal of the Korea Contents Association / v.22 no.1 / pp.138-147 / 2022
  • Chatbots have recently been applied in various fields with good results, and e-commerce platforms are making many attempts to use them in shopping mall product recommendation services. In this paper, for a conversation system that recommends the fashion a user wants based on the conversation between the user and the system and on fashion image information, we build on the transformer model, which currently performs well in AI fields such as natural language processing, voice recognition, and image recognition. We propose an improved multimodal transformer model that raises recommendation accuracy by using dialogue (text) and fashion (image) information together in data preprocessing and data representation. We also propose a method that improves accuracy by analyzing and improving the data. The proposed system achieves a recommendation accuracy of 0.6563 WKT (weighted Kendall's tau), a significant improvement of 0.3191 WKT over the existing system's 0.3372 WKT.

Analyzing employment trends in response to AI exposure: K-shaped labor polarization in Korea (인공지능 노출 정도에 따른 고용 추세 분석: K자형 고용 양극화)

  • Lee, Yeseul;Hwang, Hyeonjun
    • Informatization Policy / v.30 no.3 / pp.69-91 / 2023
  • The impact of technological advancements on employment is a matter of ongoing debate, and studies of how AI technology development affects employment are particularly scarce. This study uses a natural language processing technique (SBERT) and patents to calculate an occupation-based AI exposure score and to analyze employment trends by group. It proposes a method for calculating the AI exposure score based on the similarity between Korean patent information and US job descriptions and for linking the SOC (U.S.) and KSCO (Korea) occupational classifications. The analysis of domestic AI patent applications and regional employment data in the KOSIS Database since 2013 reveals a K-shaped polarization pattern in Korean employment trends between groups with above-average and below-average levels of AI exposure.
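
The similarity-based exposure score mentioned in this abstract can be sketched roughly as follows. The sketch assumes that each job description and each patent has already been turned into an embedding vector (the paper uses SBERT for this step; the toy vectors here are made up), and it assumes the score is the mean cosine similarity over patents, which is an illustrative choice rather than the paper's stated formula. The function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def exposure_score(job_vec, patent_vecs):
    """Assumed aggregation: average similarity of one job to all AI patents."""
    return sum(cosine(job_vec, p) for p in patent_vecs) / len(patent_vecs)


# Toy 2-dimensional embeddings, purely illustrative.
job = [1.0, 0.0]
patents = [[1.0, 0.0], [0.0, 1.0]]
print(exposure_score(job, patents))
```

Occupations could then be split into above-average and below-average groups by this score before comparing their employment trends.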

AIMS: AI based Mental Healthcare System

  • Ibrahim Alrashide;Hussain Alkhalifah;Abdul-Aziz Al-Momen;Ibrahim Alali;Ghazy Alshaikh;Atta-ur Rahman;Ashraf Saadeldeen;Khalid Aloup
    • International Journal of Computer Science & Network Security / v.23 no.12 / pp.225-234 / 2023
  • In this era of information and communication technology (ICT), tremendous improvements have been witnessed in our daily lives. The impact of these technologies is subjective and can be negative or positive. For instance, ICT has brought considerable ease and versatility to our lifestyles; on the other hand, its excessive use gives rise to issues related to physical and mental health. In this study, we bridge both aspects by proposing an AI-based mental healthcare system (AIMS). We aim to provide a platform where a patient can register and seek consultation by completing an assessment through a chatbot. The chatbot sends the gathered information to the machine learning block, where a pre-trained model predicts whether the patient needs treatment by classifying him/her based on the assessment. This information is provided to the mental health practitioner (doctor, psychologist, psychiatrist, or therapist) as clinical decision support, and the practitioner then provides his/her suggestions to the patient via the proposed system. Additionally, the proposed system prioritizes care, support, privacy, and patient autonomy, all while using a friendly chatbot interface. By using technologies such as natural language processing and machine learning, the system can predict a patient's condition and recommend the right professional for further help, including in-person appointments if necessary. This not only raises awareness about mental health but also makes it easier for patients to start therapy.

Performance Comparison of Transformer-based Intrusion Detection Model According to the Change of Character Encoding (문자 인코딩 방식의 변화에 따른 트랜스포머 기반 침입탐지 모델의 탐지성능 비교)

  • Kwan-Jae Kim;Soo-Jin Lee
    • Convergence Security Journal / v.24 no.3 / pp.41-49 / 2024
  • The tokenizer, a key component of the Transformer model, cannot effectively comprehend numerical data. Therefore, to develop a Transformer-based intrusion detection model that can operate in a real-world network environment by treating packet payloads as sentences for training, the hexadecimal packet payloads must be converted into a character-based format. In this study, we applied three character encoding methods to convert packet payloads into numeric or character format and analyzed how detection performance changes when a transformer architecture is trained on them. The experimental dataset was generated by extracting packet payloads from PCAP files included in the UNSW-NB15 dataset, and RoBERTa was used as the training model. The experimental results demonstrate that the ISO-8859-1 encoding scheme achieves the highest performance in both binary and multi-class classification. In addition, when the number of tokens is set to 512 and the maximum number of epochs is set to 15, the multi-class classification accuracy improves to 88.77%.
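
To make the payload conversion step concrete, the following minimal sketch decodes a hexadecimal payload string into ISO-8859-1 characters using only the Python standard library. The function name `payload_to_text` and the sample payload are illustrative assumptions; the abstract does not show the authors' preprocessing code or name the other two encodings they compared.

```python
def payload_to_text(hex_payload: str) -> str:
    """Decode a hex-encoded packet payload into ISO-8859-1 characters.

    ISO-8859-1 assigns a character to each of the 256 byte values, so any
    payload decodes without errors and can be encoded back to the same bytes.
    """
    raw = bytes.fromhex(hex_payload)
    return raw.decode("iso-8859-1")


# Illustrative payload: the ASCII bytes of "GET /" followed by two non-ASCII bytes.
sample = "474554202fe9ff"
text = payload_to_text(sample)
print(repr(text))

# The mapping is lossless: encoding the text restores the original hex string.
assert text.encode("iso-8859-1").hex() == sample
```

The resulting string could then be passed to a tokenizer as if it were a sentence, which is the general setup the abstract describes.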

MF-DCCA ANALYSIS OF INVESTOR SENTIMENT AND FINANCIAL MARKET BASED ON NLP ALGORITHM

  • RUI ZHANG;CAIRANG JIA;JIAN WANG
    • Journal of the Korean Society for Industrial and Applied Mathematics / v.28 no.3 / pp.71-87 / 2024
  • In this paper, we adopt the MF-DCCA (Multifractal Detrended Cross-Correlation Analysis) method to study the nonlinear correlation between the returns of financial stock markets and the investors' sentiment index (SI). We use the return series of the Shanghai Securities Composite Index (SSEC) of China, the Shenzhen Securities Component Index (SZI) of China, the Nikkei 225 Index (N225) of Japan, and the Standard & Poor's 500 Index (S&P500) of the United States. First, we preliminarily analyze the correlation between SSEC and SI using the Pearson correlation coefficient. Then, using MF-DCCA, we observe a power-law correlation between the investors' sentiment index and SSEC stock market returns, with a significant multifractal correlation. Moreover, the SI series and the SSEC return series show positive persistence. We compare the differences in multifractal cross-correlation between SI and stock return sequences in different markets. We find that the values of SZI-SI in terms of cross-correlation persistence and cross-correlation strength are relatively close to those of SSEC-SI, while the Hxy(2), ∆Hxy, and ∆αxy of N225-SI and S&P500-SI are much smaller than those of SSEC-SI and SZI-SI. This is related to the fact that the investors' sentiment index originated from the Shanghai Composite Index Tieba. The SI is obtained through a natural language processing method. Finally, we study the rolling Hxy(2) and ∆αxy sequences. The results indicate that the macroeconomic environment may cause fluctuations in both sequences.
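
The preliminary Pearson step mentioned in this abstract can be sketched in a few lines. The function below is the standard sample Pearson correlation coefficient; the two short series are made-up placeholders, not data from the paper, and the variable names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Placeholder values standing in for a sentiment index and a return series.
sentiment = [0.2, 0.5, 0.1, 0.7, 0.4]
returns = [0.01, 0.03, -0.02, 0.04, 0.00]
print(pearson(sentiment, returns))
```

Pearson correlation only captures linear dependence at a single scale, which is presumably why the study goes on to MF-DCCA for the nonlinear, multi-scale relationship.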

Research Suggestion for Disaster Prediction using Safety Report of Korea Government (안전신문고를 이용한 재난 예측 방법론 제안)

  • Lee, Jun;Shin, Jindong;Cho, Sangmyeong;Lee, Sanghwa
    • Journal of Korean Society of Disaster and Security / v.12 no.4 / pp.15-26 / 2019
  • Anjunshinmungo (the safety e-report) has been in operation since 2014 and had accumulated about 1 million reports by June 2019. This study analyzes the contents of more than 1 million safety reports filed in the current information age to determine how powerful and meaningful the people's voice and interest are. In particular, we are interested in forecasting ability: we wanted to check whether safety reports are related to possible disasters. To this end, we received the safety report data as text and analyzed it with natural language analysis methods. Newspaper articles from the analysis period were also analyzed, and the correlation between the contents of the safety reports and the newspaper articles was examined. The results show that accidents occurred within a few months after the number of reports related to response and confirmation increased, and that analyzing previously filed safety reports on social instability can be used to predict future disasters.

TAGS: Text Augmentation with Generation and Selection (생성-선정을 통한 텍스트 증강 프레임워크)

  • Kim Kyung Min;Dong Hwan Kim;Seongung Jo;Heung-Seon Oh;Myeong-Ha Hwang
    • KIPS Transactions on Software and Data Engineering / v.12 no.10 / pp.455-460 / 2023
  • Text augmentation is a methodology that creates new augmented texts by transforming or generating original texts in order to improve the performance of NLP models. However, existing text augmentation techniques have limitations such as a lack of expressive diversity, semantic distortion, and a limited number of augmented texts. Recent text augmentation using large language models and few-shot learning can overcome these limitations, but it carries a risk of noise caused by incorrect generation. In this paper, we propose a text augmentation method called TAGS that generates multiple candidate texts and selects the appropriate ones as the augmented texts. TAGS generates varied expressions using few-shot learning and effectively selects suitable data, even from a small amount of original text, by using contrastive learning and similarity comparison. We applied this method to task-oriented chatbot data and increased the amount of data more than sixtyfold. We also analyzed the generated texts and confirmed that they are semantically and expressively diverse compared to the original texts. Moreover, we trained and evaluated a classification model using the augmented texts and showed that performance improved by more than 0.1915, confirming that the method helps improve actual model performance.
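
The generate-then-select pattern described in this abstract can be illustrated with a small selection step. In the sketch below, the candidate texts stand in for output from a few-shot language model, and token-level Jaccard overlap stands in for the paper's contrastive-learning-based similarity; both substitutions, the thresholds, and the function names are assumptions made only so the example runs without any model.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity, used here as a stand-in for embedding similarity."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    union = tokens_a | tokens_b
    return len(tokens_a & tokens_b) / len(union) if union else 0.0


def select_candidates(original, candidates, low=0.3, high=0.9):
    """Keep candidates that stay on topic (>= low) without copying (<= high)."""
    return [c for c in candidates if low <= jaccard(original, c) <= high]


original = "book a table for two tonight"
# Hypothetical generated candidates, including a duplicate and an off-topic text.
candidates = [
    "book a table for two tonight",
    "reserve a table for two people tonight",
    "what is the weather",
]
print(select_candidates(original, candidates))
```

The lower bound filters out noisy generations that drift away from the original intent, while the upper bound drops near-duplicates that add no expressive diversity.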

Aspect-Based Sentiment Analysis Using BERT: Developing Aspect Category Sentiment Classification Models (BERT를 활용한 속성기반 감성분석: 속성카테고리 감성분류 모델 개발)

  • Park, Hyun-jung;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.1-25 / 2020
  • Sentiment Analysis (SA) is a Natural Language Processing (NLP) task that analyzes, from written texts, the sentiments that consumers or the public feel about an arbitrary object. Aspect-Based Sentiment Analysis (ABSA) goes further and performs a fine-grained analysis of the sentiments towards each aspect of an object. Because it has more practical value for business, ABSA is drawing attention from both academic and industrial organizations. For example, given a review that says "The restaurant is expensive but the food is really fantastic", general SA evaluates the overall sentiment towards the 'restaurant' as 'positive', while ABSA identifies the restaurant's 'price' aspect as 'negative' and its 'food' aspect as 'positive'. Thus, ABSA enables a more specific and effective marketing strategy. To perform ABSA, it is necessary to identify the aspect terms or aspect categories included in the text and to judge the sentiments towards them. Accordingly, there are four main areas in ABSA: aspect term extraction, aspect category detection, Aspect Term Sentiment Classification (ATSC), and Aspect Category Sentiment Classification (ACSC). ABSA is usually conducted by extracting aspect terms and then performing ATSC to analyze sentiments for the given aspect terms, or by extracting aspect categories and then performing ACSC to analyze sentiments for the given aspect category. Here, an aspect category is expressed in one or more aspect terms, or indirectly inferred from other words. In the preceding example sentence, 'price' and 'food' are both aspect categories, and the aspect category 'food' is expressed by the aspect term 'food' included in the review. If the review sentence includes 'pasta', 'steak', or 'grilled chicken special', these can all be aspect terms for the aspect category 'food'. An aspect category referred to by one or more specific aspect terms in this way is called an explicit aspect.
On the other hand, an aspect category like 'price', which has no specific aspect terms but can be indirectly inferred from an emotional word such as 'expensive', is called an implicit aspect. So far, the term 'aspect category' has been used to avoid confusion with 'aspect term'. From now on, we treat 'aspect category' and 'aspect' as the same concept and use the word 'aspect' for convenience. Note that ATSC analyzes the sentiment towards given aspect terms, so it deals only with explicit aspects, whereas ACSC handles both explicit and implicit aspects. This study seeks answers to the following issues, which previous studies ignored when applying the BERT pre-trained language model to ACSC, and derives superior ACSC models. First, is it more effective to reflect the output vectors of the aspect category tokens than to use only the final output vector of the [CLS] token as the classification vector? Second, is there any performance difference between QA (Question Answering) and NLI (Natural Language Inference) types in the sentence-pair configuration of the input data? Third, is there any performance difference depending on the position of the sentence containing the aspect category in the QA or NLI type sentence-pair configuration? To achieve these research objectives, we implemented 12 ACSC models and conducted experiments on 4 English benchmark datasets. As a result, we derived ACSC models that outperform existing studies without expanding the training dataset. In addition, we found that it is more effective to reflect the output vector of the aspect category token than to use only the output vector of the [CLS] token as the classification vector. We also found that QA type input generally performs better than NLI type input, and that the position of the sentence containing the aspect category in the QA type does not affect performance.
There may be some differences depending on the characteristics of the dataset, but when using NLI type sentence-pair input, placing the sentence containing the aspect category second seems to provide better performance. The new methodology for designing ACSC models used in this study could be similarly applied to other tasks such as ATSC.
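
The sentence-pair configurations compared in this abstract can be sketched as plain string construction. The QA and NLI templates below and the function `build_pair` are hypothetical illustrations of the two input types and of the ordering option; the abstract does not give the exact wording the authors used.

```python
def build_pair(review: str, aspect: str, style: str = "QA", aspect_first: bool = False) -> str:
    """Build a BERT-style sentence-pair input for one aspect category.

    style="QA" phrases the aspect as a question; style="NLI" uses the bare
    aspect as a pseudo-sentence. `aspect_first` controls whether the
    aspect-bearing sentence comes before or after the review.
    """
    if style == "QA":
        auxiliary = f"what do you think of the {aspect} ?"  # assumed template
    elif style == "NLI":
        auxiliary = aspect  # assumed template
    else:
        raise ValueError(f"unknown style: {style}")
    first, second = (auxiliary, review) if aspect_first else (review, auxiliary)
    return f"[CLS] {first} [SEP] {second} [SEP]"


review = "The restaurant is expensive but the food is really fantastic"
print(build_pair(review, "price", style="QA"))
print(build_pair(review, "price", style="NLI", aspect_first=False))
```

With the default ordering, the aspect-bearing sentence is placed second, which is the arrangement the abstract reports as seemingly better for NLI type input.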

Deep learning-based Multilingual Sentimental Analysis using English Review Data (영어 리뷰데이터를 이용한 딥러닝 기반 다국어 감성분석)

  • Sung, Jae-Kyung;Kim, Yung Bok;Kim, Yong-Guk
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.3 / pp.9-15 / 2019
  • Large global online shopping malls, such as Amazon, offer services in English or in the language of the country where their products are sold. Since many customers purchase products based on product reviews, these shopping malls actively use sentiment analysis to judge the preference for each product from the large amount of review data that customers have written. The results of such analysis can be used in marketing to target potential shoppers. However, it is difficult to apply an English-based sentiment analysis system to the different languages used around the world. In this study, more than 500,000 records from the Amazon fine food reviews were used to train a deep learning-based system. First, sentiment analysis evaluation experiments were carried out with three models on English test data. Second, the same data was translated into seven languages (Korean, Japanese, Chinese, Vietnamese, French, German and English) and similar experiments were conducted. The results suggest that, although the average accuracy across the seven languages (91.59%) was 2.77% lower than the accuracy for English (94.35%), the approach can be used for practical applications.