• Title/Summary/Keyword: KeyBERT

Search Result 12, Processing Time 0.133 seconds

Analysis of trends in deep learning and reinforcement learning

  • Dong-In Choi;Chungsoo Lim
    • Journal of the Korea Society of Computer and Information
    • /
    • v.28 no.10
    • /
    • pp.55-65
    • /
    • 2023
  • In this paper, we apply KeyBERT(Keyword extraction with Bidirectional Encoder Representations of Transformers) algorithm-driven topic extraction and topic frequency analysis to deep learning and reinforcement learning research to discover the rapidly changing trends in them. First, we crawled abstracts of research papers on deep learning and reinforcement learning, and temporally divided them into two groups. After pre-processing the crawled data, we extracted topics using KeyBERT algorithm, and then analyzed the extracted topics in terms of topic occurrence frequency. This analysis reveals that there are distinct trends in research work of all analyzed algorithms and applications, and we can clearly tell which topics are gaining more interest. The analysis also proves the effectiveness of the utilized topic extraction and topic frequency analysis in research trend analysis, and this trend analysis scheme is expected to be used for research trend analysis in other research fields. In addition, the analysis can provide insight into how deep learning will evolve in the near future, and provide guidance for select research topics and methodologies by informing researchers of research topics and methodologies which are recently attracting attention.

Methodology for Deriving Required Quality of Product Using Analysis of Customer Reviews (사용자 리뷰 분석을 통한 제품 요구품질 도출 방법론)

  • Yerin Yu;Jeongeun Byun;Kuk Jin Bae;Sumin Seo;Younha Kim;Namgyu Kim
    • Journal of Information Technology Applications and Management
    • /
    • v.30 no.2
    • /
    • pp.1-18
    • /
    • 2023
  • Recently, as technology development has accelerated and product life cycles have been shortened, it is necessary to derive key product features from customers in the R&D planning and evaluation stage. More companies want differentiated competitiveness by providing consumer-tailored products based on big data and artificial intelligence technology. To achieve this, the need to correctly grasp the required quality, which is a requirement of consumers, is increasing. However, the existing methods are centered on suppliers or domain experts, so there is a gap from the actual perspective of consumers. In other words, product attributes were defined by suppliers or field experts, but this may not consider consumers' actual perspective. Accordingly, the demand for deriving the product's main attributes through reviews containing consumers' perspectives has recently increased. Therefore, we propose a review data analysis-based required quality methodology containing customer requirements. Specifically, a pre-training language model with a good understanding of Korean reviews was established, consumer intent was correctly identified, and key contents were extracted from the review through a combination of KeyBERT and topic modeling to derive the required quality for each product. RevBERT, a Korean review domain-specific pre-training language model, was established through further pre-training. By comparing the existing pre-training language model KcBERT, we confirmed that RevBERT had a deeper understanding of customer reviews. In addition, all processes other than that of selecting the required quality were linked to the automation process, resulting in the automation of deriving the required quality based on data.

Korean Co-reference Resolution using BERT with Surfaceform (표층형을 이용한 BERT 기반 한국어 상호참조해결)

  • Heo, Cheolhun;Kim, Kuntae;Choi, Key-sun
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.67-70
    • /
    • 2019
  • 상호참조해결은 자연언어 문서 내에서 같은 개체를 나타내는 언급들을 연결하는 문제다. 대명사, 지시 관형사, 축약어, 동음이의어와 같은 언급들의 상호참조를 해결함으로써, 다양한 자연언어 처리 문제의 성능 향상에 기여할 수 있다. 본 논문에서는 현재 영어권 상호참조해결에서 좋은 성능을 내고 있는 BERT 기반 상호참조해결 모델에 한국어 데이터 셋를 적용시키고 표층형을 이용한 규칙을 추가했다. 본 논문의 모델과 기존의 모델들을 실험하여 성능을 비교하였다. 기존의 연구들과는 다르게 적은 특질로 정밀도 73.59%, 재현율 71.1%, CoNLL F1-score 72.31%의 성능을 보였다. 모델들의 결과를 분석하여 BERT 기반의 모델이 다양한 특질을 사용한 기존 딥러닝 모델에 비해 문맥적 요소를 잘 파악하는 것을 확인했다.

  • PDF

Explaining the Translation Error Factors of Machine Translation Services Using Self-Attention Visualization (Self-Attention 시각화를 사용한 기계번역 서비스의 번역 오류 요인 설명)

  • Zhang, Chenglong;Ahn, Hyunchul
    • Journal of Information Technology Services
    • /
    • v.21 no.2
    • /
    • pp.85-95
    • /
    • 2022
  • This study analyzed the translation error factors of machine translation services such as Naver Papago and Google Translate through Self-Attention path visualization. Self-Attention is a key method of the Transformer and BERT NLP models and recently widely used in machine translation. We propose a method to explain translation error factors of machine translation algorithms by comparison the Self-Attention paths between ST(source text) and ST'(transformed ST) of which meaning is not changed, but the translation output is more accurate. Through this method, it is possible to gain explainability to analyze a machine translation algorithm's inside process, which is invisible like a black box. In our experiment, it was possible to explore the factors that caused translation errors by analyzing the difference in key word's attention path. The study used the XLM-RoBERTa multilingual NLP model provided by exBERT for Self-Attention visualization, and it was applied to two examples of Korean-Chinese and Korean-English translations.

Towards Improving Causality Mining using BERT with Multi-level Feature Networks

  • Ali, Wajid;Zuo, Wanli;Ali, Rahman;Rahman, Gohar;Zuo, Xianglin;Ullah, Inam
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.10
    • /
    • pp.3230-3255
    • /
    • 2022
  • Causality mining in NLP is a significant area of interest, which benefits in many daily life applications, including decision making, business risk management, question answering, future event prediction, scenario generation, and information retrieval. Mining those causalities was a challenging and open problem for the prior non-statistical and statistical techniques using web sources that required hand-crafted linguistics patterns for feature engineering, which were subject to domain knowledge and required much human effort. Those studies overlooked implicit, ambiguous, and heterogeneous causality and focused on explicit causality mining. In contrast to statistical and non-statistical approaches, we present Bidirectional Encoder Representations from Transformers (BERT) integrated with Multi-level Feature Networks (MFN) for causality recognition, called BERT+MFN for causality recognition in noisy and informal web datasets without human-designed features. In our model, MFN consists of a three-column knowledge-oriented network (TC-KN), bi-LSTM, and Relation Network (RN) that mine causality information at the segment level. BERT captures semantic features at the word level. We perform experiments on Alternative Lexicalization (AltLexes) datasets. The experimental outcomes show that our model outperforms baseline causality and text mining techniques.

Transformer-based reranking for improving Korean morphological analysis systems

  • Jihee Ryu;Soojong Lim;Oh-Woog Kwon;Seung-Hoon Na
    • ETRI Journal
    • /
    • v.46 no.1
    • /
    • pp.137-153
    • /
    • 2024
  • This study introduces a new approach in Korean morphological analysis combining dictionary-based techniques with Transformer-based deep learning models. The key innovation is the use of a BERT-based reranking system, significantly enhancing the accuracy of traditional morphological analysis. The method generates multiple suboptimal paths, then employs BERT models for reranking, leveraging their advanced language comprehension. Results show remarkable performance improvements, with the first-stage reranking achieving over 20% improvement in error reduction rate compared with existing models. The second stage, using another BERT variant, further increases this improvement to over 30%. This indicates a significant leap in accuracy, validating the effectiveness of merging dictionary-based analysis with contemporary deep learning. The study suggests future exploration in refined integrations of dictionary and deep learning methods as well as using probabilistic models for enhanced morphological analysis. This hybrid approach sets a new benchmark in the field and offers insights for similar challenges in language processing applications.

Hot Keyword Extraction of Sci-tech Periodicals Based on the Improved BERT Model

  • Liu, Bing;Lv, Zhijun;Zhu, Nan;Chang, Dongyu;Lu, Mengxin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1800-1817
    • /
    • 2022
  • With the development of the economy and the improvement of living standards, the hot issues in the subject area have become the main research direction, and the mining of the hot issues in the subject currently has problems such as a large amount of data and a complex algorithm structure. Therefore, in response to this problem, this study proposes a method for extracting hot keywords in scientific journals based on the improved BERT model.It can also provide reference for researchers,and the research method improves the overall similarity measure of the ensemble,introducing compound keyword word density, combining word segmentation, word sense set distance, and density clustering to construct an improved BERT framework, establish a composite keyword heat analysis model based on I-BERT framework.Taking the 14420 articles published in 21 kinds of social science management periodicals collected by CNKI(China National Knowledge Infrastructure) in 2017-2019 as the experimental data, the superiority of the proposed method is verified by the data of word spacing, class spacing, extraction accuracy and recall of hot keywords. In the experimental process of this research, it can be found that the method proposed in this paper has a higher accuracy than other methods in extracting hot keywords, which can ensure the timeliness and accuracy of scientific journals in capturing hot topics in the discipline, and finally pass Use information technology to master popular key words.

CORRECT? CORECT!: Classification of ESG Ratings with Earnings Call Transcript

  • Haein Lee;Hae Sun Jung;Heungju Park;Jang Hyun Kim
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.4
    • /
    • pp.1090-1100
    • /
    • 2024
  • While the incorporating ESG indicator is recognized as crucial for sustainability and increased firm value, inconsistent disclosure of ESG data and vague assessment standards have been key challenges. To address these issues, this study proposes an ambiguous text-based automated ESG rating strategy. Earnings Call Transcript data were classified as E, S, or G using the Refinitiv-Sustainable Leadership Monitor's over 450 metrics. The study employed advanced natural language processing techniques such as BERT, RoBERTa, ALBERT, FinBERT, and ELECTRA models to precisely classify ESG documents. In addition, the authors computed the average predicted probabilities for each label, providing a means to identify the relative significance of different ESG factors. The results of experiments demonstrated the capability of the proposed methodology in enhancing ESG assessment criteria established by various rating agencies and highlighted that companies primarily focus on governance factors. In other words, companies were making efforts to strengthen their governance framework. In conclusion, this framework enables sustainable and responsible business by providing insight into the ESG information contained in Earnings Call Transcript data.

Simple and effective neural coreference resolution for Korean language

  • Park, Cheoneum;Lim, Joonho;Ryu, Jihee;Kim, Hyunki;Lee, Changki
    • ETRI Journal
    • /
    • v.43 no.6
    • /
    • pp.1038-1048
    • /
    • 2021
  • We propose an end-to-end neural coreference resolution for the Korean language that uses an attention mechanism to point to the same entity. Because Korean is a head-final language, we focused on a method that uses a pointer network based on the head. The key idea is to consider all nouns in the document as candidates based on the head-final characteristics of the Korean language and learn distributions over the referenced entity positions for each noun. Given the recent success of applications using bidirectional encoder representation from transformer (BERT) in natural language-processing tasks, we employed BERT in the proposed model to create word representations based on contextual information. The experimental results indicated that the proposed model achieved state-of-the-art performance in Korean language coreference resolution.

Research on the Financial Data Fraud Detection of Chinese Listed Enterprises by Integrating Audit Opinions

  • Leiruo Zhou;Yunlong Duan;Wei Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.12
    • /
    • pp.3218-3241
    • /
    • 2023
  • Financial fraud undermines the sustainable development of financial markets. Financial statements can be regarded as the key source of information to obtain the operating conditions of listed companies. Current research focuses more on mining financial digital data instead of looking into text data. However, text data can reveal emotional information, which is an important basis for detecting financial fraud. The audit opinion of the financial statement is especially the fair opinion of a certified public accountant on the quality of enterprise financial reports. Therefore, this research was carried out by using the data features of 4,153 listed companies' financial annual reports and audits of text opinions in the past six years, and the paper puts forward a financial fraud detection model integrating audit opinions. First, the financial data index database and audit opinion text database were built. Second, digitized audit opinions with deep learning Bert model was employed. Finally, both the extracted audit numerical characteristics and the financial numerical indicators were used as the training data of the LightGBM model. What is worth paying attention to is that the imbalanced distribution of sample labels is also one of the focuses of financial fraud research. To solve this problem, data enhancement and Focal Loss feature learning functions were used in data processing and model training respectively. The experimental results show that compared with the conventional financial fraud detection model, the performance of the proposed model is improved greatly, with Area Under the Curve (AUC) and Accuracy reaching 81.42% and 78.15%, respectively.