• Title/Summary/Keyword: LDA (Latent Dirichlet allocation)

Semi-Supervised Answer Type Classification For Question-Answering System (질의 응답 시스템을 위한 반교사 기반의 정답 유형 분류)

  • Park, Seonyeong;Lee, Donghyeon;Kim, Yonghee;Ryu, Seonghan;Lee, Gary Geunbae
    • Annual Conference on Human and Language Technology / 2013.10a / pp.45-49 / 2013
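  • Previous work on answer type classification for question-answering systems has relied on pattern matching or supervised learning. Pattern matching requires patterns to be built manually through query analysis. Supervised learning requires answer types to be tagged across the entire training set, which demands a great deal of manual tagging of user queries. Large corpora of user queries can be obtained from the web, but these data carry no answer-type tags, so the manual tagging effort grows in proportion to the volume of user queries. To address the heavy manual effort that both approaches require, this paper proposes answer type classification based on semi-supervised learning, which needs only partially tagged training data. This minimizes the labor required for answer type classification and makes it possible to build an efficient question-answering system from large-scale data.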

Applying Topic Modeling and Similarity for Predicting Bug Severity in Cross Projects

  • Yang, Geunseok;Min, Kyeongsic;Lee, Jung-Won;Lee, Byungjeong
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.3 / pp.1583-1598 / 2019
  • Software has recently grown in complexity and is applied across many industrial fields, so software bugs are unavoidable. Various bug severity prediction methodologies have been proposed, but their performance needs further improvement. In this study, we propose a novel technique for cross-project bug severity prediction, covering projects such as Eclipse, Mozilla, WireShark, and Xamarin, that uses topic modeling and similarity (i.e., KL-divergence). First, we construct topic models from the bug repositories of the projects using Latent Dirichlet Allocation (LDA). Then, given a new bug report, we find the topics in each project that contain the largest number of similar bug reports. Next, we extract the bug reports belonging to the selected topics and input them to a Naïve Bayes Multinomial (NBM) algorithm. Finally, we predict the severity of the new bug report. To evaluate our approach and to verify the difference between the cross-project and single-project settings, we compare it with the Naïve Bayes Multinomial approach; the Lamkanfi methodology, a well-known bug severity prediction approach; and an emotional similarity-based bug severity prediction approach. Our approach performs better than the compared methods.
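
The similarity step in this abstract can be illustrated with a minimal sketch. It assumes each bug report has already been mapped to a topic distribution by an LDA model, and it uses KL divergence to pick the stored reports closest to a new report. The function names `kl_divergence` and `most_similar_reports` and all the numbers are hypothetical; the abstract does not say how the authors implemented this step.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # eps guards against division by zero when q assigns a topic zero probability.
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )

def most_similar_reports(new_dist, report_dists, k=2):
    """Return the ids of the k stored reports with the smallest KL divergence."""
    ranked = sorted(
        report_dists.items(),
        key=lambda item: kl_divergence(new_dist, item[1]),
    )
    return [report_id for report_id, _ in ranked[:k]]

# Made-up topic distributions over 3 topics (not data from the paper).
reports = {
    "bug-1": [0.80, 0.10, 0.10],
    "bug-2": [0.10, 0.80, 0.10],
    "bug-3": [0.70, 0.20, 0.10],
}
new_report = [0.75, 0.15, 0.10]
nearest = most_similar_reports(new_report, reports, k=2)
print(nearest)
```

In a pipeline like the one described, the reports selected this way would presumably become the training set for the severity classifier; that part is not shown here.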

Research trend analysis of Korean new graduate nurses using topic modeling (토픽모델링을 활용한 신규간호사 관련 국내 연구동향 분석)

  • Park, Seungmi;Lee, Jung Lim
    • The Journal of Korean Academic Society of Nursing Education / v.27 no.3 / pp.240-250 / 2021
  • Purpose: The aim of this study is to analyze research trends in articles on newly graduated Korean nurses over the past 10 years in order to explore strategies for clinical adaptation. Methods: Topics on new graduate nurses were extracted from 110 articles published in Korean journals between January 2010 and July 2020. Abstracts were retrieved from 4 databases (DBpia, RISS, KISS and Google Scholar). Keywords were extracted from the abstracts and cleaned using semantic morphemes. Network analysis and topic modeling were performed using the NetMiner program. Results: The core keywords included 'education', 'training', 'program', 'skill', 'care', 'performance', and 'satisfaction'. In recent articles on new graduate nurses, three major topics were extracted by Latent Dirichlet Allocation (LDA): 'turnover', 'adaptation', and 'education'. Conclusion: Previous articles focused on exploring the factors related to the adaptation and turnover intentions of new graduate nurses. Further research is needed on interventions at the individual, task, and organizational levels to improve the retention of new graduate nurses.

Modeling Domestic News Topics for Mongolia: Focusing on Changes in Press on Diplomatic Relations between the two countries after the establishment of Diplomatic ties between Korea and Mongolia (몽골에 대한 국내 뉴스 토픽 모델링: 한몽 수교 이후 양국 관계 보도 양상 변화를 중심으로)

  • Yoon, Ji-Soo;Jin, XianMei
    • The Journal of the Korea Contents Association / v.22 no.4 / pp.37-46 / 2022
  • This study conducted a big data analysis of domestic media reports related to Mongolia. Latent Dirichlet Allocation (LDA) topic modeling was applied to 130,000 articles containing the keyword 'Mongolia'. Deriving and examining the major topics for each period showed that some subjects disappeared as the level of diplomatic relations was raised, but most of the topics that appeared early on remained, and additional issues in various fields emerged.

Technology Mining and Sentiment Analysis on Hydrogen Fuel Cell Using National R&D and Social Data (국가R&D와 소셜 데이터를 활용한 수소연료전지 기술마이닝과 감성분석)

  • Lee, Byeong-Hee;Choi, Jung-Woo;Kim, Tae-Hyun
    • Annual Conference of KIPS / 2022.11a / pp.341-343 / 2022
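  • As greenhouse gas emissions have emerged as a global issue, the hydrogen economy, which uses hydrogen as an energy source, is drawing attention. The hydrogen fuel cell is one component of the hydrogen economy; it uses hydrogen to produce heat and electricity and has the advantage of high energy conversion efficiency. This study analyzed social data on hydrogen fuel cells collected from Reddit, a global online community, using text mining and sentiment analysis techniques. In the analysis, 9,211 comments were classified into 4 topic groups using LDA (Latent Dirichlet Allocation). The group most closely related to hydrogen fuel cells was then selected and 10 topics were extracted from it with STM (Structural Topic Model) analysis, among which 3 topics related to climate and environment, the hydrogen industry, and hydrogen vehicles were found. Through these results, the study aims to grasp quickly and effectively what is actually being discussed about hydrogen fuel cells worldwide, to make predictions about hydrogen fuel cells, and to suggest policy directions for Korea's national R&D on hydrogen fuel cells.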

Extracting User-Specific Advertising Keywords Based on Textual Data Mining from KakaoTalk (카카오톡에서의 텍스트 데이터 마이닝 기반의 사용자별 적합 광고 키워드 도출)

  • Yerim Jeon;Dayeong So;Jimin Lee;Eunjin (Jinny) Jo;Jihoon Moon
    • Annual Conference of KIPS / 2023.05a / pp.368-369 / 2023
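  • Advertisement recommendation based on conversation data is drawing attention as an important technology in advertising marketing, for purposes such as delivering customized advertisements and maximizing marketing effectiveness. This paper analyzes conversation content based on text data generated in chat windows of KakaoTalk, a mobile instant messenger, and proposes suitable advertising keywords for each conversation topic. To this end, conversations are divided by topic into beauty, food and beverage, and commerce; text preprocessing is performed with Okt from KoNLPy; and keyword frequencies are extracted and presented as word clouds. In addition, conversation topics are subdivided using Latent Dirichlet Allocation (LDA), and the conversation keywords for each topic are analyzed through labeling. In the experiments, conversation topics could be divided into online shopping, hair, beauty care, and food, and suitable advertising keywords could be proposed by using Word2Vec to derive keywords similar to the top keywords of each topic.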
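
The last step, finding keywords similar to a given word from word vectors, can be sketched with cosine similarity, which is how nearest neighbors are commonly ranked in an embedding space. The sketch below does not train or load a Word2Vec model: the three-dimensional vectors are made-up stand-ins for learned embeddings, and `similar_keywords` is a hypothetical helper name.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def similar_keywords(word, vectors, top_n=2):
    """Rank every other word by cosine similarity to `word`."""
    target = vectors[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scores[:top_n]]

# Toy embeddings (assumed values, not output of a trained model).
toy_vectors = {
    "shampoo": [0.9, 0.1, 0.0],
    "conditioner": [0.8, 0.2, 0.1],
    "pizza": [0.0, 0.1, 0.9],
    "haircut": [0.7, 0.3, 0.0],
}
ad_keywords = similar_keywords("shampoo", toy_vectors)
print(ad_keywords)
```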

Analysis of Reviews from Metaverse Platform Users Based on Topic Modeling

  • Jung Seung Lee
    • Journal of Information Technology Applications and Management / v.31 no.3 / pp.93-104 / 2024
  • This study conducts an in-depth analysis of user reviews from three leading metaverse platforms - Minecraft, Roblox, and Zepeto - using advanced topic modeling techniques to uncover key factors for business success. By examining a substantial dataset of user feedback, we identified and categorized the main themes and concerns expressed by users. Our analysis revealed that common issues across all platforms include technical functionality problems, user engagement and interest, payment concerns, and connection difficulties. Specifically, Minecraft users highlighted the importance of adventure and creativity, Roblox users expressed significant concerns about security and fraud, and Zepeto users focused heavily on the fairness of the in-game economy. The findings suggest that for metaverse platforms to achieve sustained success, they must prioritize the resolution of technical issues, enhance features that foster user engagement, ensure reliable connectivity, and address platform-specific concerns such as security for Roblox and payment fairness for Zepeto. These insights provide valuable guidance for developers and business strategists, emphasizing the need for robust technical infrastructure, engaging and diverse content, seamless user access, and transparent and fair economic systems. By addressing these key areas, metaverse platforms can improve user satisfaction, build a loyal user base, and secure long-term success in an increasingly competitive market.

An analysis of public perception on Artificial Intelligence(AI) education using Big Data: Based on News articles and Twitter (빅데이터 분석을 통해 본 AI교육에 대한 사회적 인식: 뉴스기사와 트위터를 중심으로)

  • Lee, Sang-Soog;Yoo, Inhyeok;Kim, Jinhee
    • Journal of Digital Convergence / v.18 no.6 / pp.9-16 / 2020
  • The purpose of this study is to understand public needs regarding AI education, which the current government is actively promoting and supporting. To this end, 11 metropolitan news articles and Twitter posts regarding AI education posted from January 1, 2018 to December 31, 2019 were collected. Word frequency analysis using the TF (Term Frequency) method and topic modeling using the LDA (Latent Dirichlet Allocation) method were then conducted. The topics of the news articles turned out to concern macro-level policy support, such as 'training female manpower in the AI field' and 'curriculum reform of university and K-12', whereas the topics of the Twitter posts reflected a more detailed social perception of the future society, such as future competencies and pedagogical methods, including 'coexistence with intelligent robots', 'coding education', and 'humane education competence development'. The findings are expected to inform the composition and management of AI curricula as well as the basic framework of human resources development in future industries.
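
Term frequency analysis of the kind mentioned here amounts to counting how often each token occurs across the collected texts. A minimal sketch follows, assuming English text split on letters and digits; the sample posts and the stopword list are invented, and Korean text would need a morphological analyzer instead of this regular expression.

```python
import re
from collections import Counter

def term_frequencies(documents, stopwords=frozenset()):
    """Count token occurrences across all documents, skipping stopwords."""
    counts = Counter()
    for doc in documents:
        tokens = re.findall(r"[a-z0-9]+", doc.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts

# Invented sample posts, used only to exercise the function.
posts = [
    "AI education needs coding education",
    "Coding education for the future",
]
tf = term_frequencies(posts, stopwords={"for", "the"})
top_terms = tf.most_common(2)
print(top_terms)
```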

An Exploratory research on patent trends and technological value of Organic Light-Emitting Diodes display technology (Organic Light-Emitting Diodes 디스플레이 기술의 특허 동향과 기술적 가치에 관한 탐색적 연구)

  • Kim, Mingu;Kim, Yongwoo;Jung, Taehyun;Kim, Youngmin
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.135-155 / 2022
  • This study analyzes patent trends by deriving the sub-technical fields of the Organic Light-Emitting Diode (OLED) industry and analyzing technological value, originality, and diversity for each sub-technical field. To collect patent data, a set of International Patent Classification (IPC) codes related to OLED technology was defined, and OLED-related patents filed from 2005 to 2017 were collected using these codes. The collected patent documents were then classified into 12 major technologies using the Latent Dirichlet Allocation (LDA) topic model, and the trend for each technology was investigated. Patents related to touch sensor, module, image processing, and circuit driving showed an increasing trend, while virtual reality and user interface recently decreased, and thin film transistor, fingerprint recognition, and optical film showed a steady trend. To compare technological value, the number of forward citations, originality, and diversity of the patents in each technology group were investigated. The results show that image processing, user interface (UI) and user experience (UX), module, and adhesive technology, which had high numbers of forward citations, originality, and diversity, had relatively high technological value. The results provide useful information for establishing a company's technology strategy.

A Study on Analysis of national R&D research trends for Artificial Intelligence using LDA topic modeling (LDA 토픽모델링을 활용한 인공지능 관련 국가R&D 연구동향 분석)

  • Yang, MyungSeok;Lee, SungHee;Park, KeunHee;Choi, KwangNam;Kim, TaeHyun
    • Journal of Internet Computing and Services / v.22 no.5 / pp.47-55 / 2021
  • Research trends in a specific subject area are usually analyzed by applying topic modeling techniques to keywords extracted from literature information (papers, patents, etc.) and examining the related topics and how subjects change. Unlike these existing approaches, this paper applies the LDA topic modeling technique to the project information of national R&D projects in the field of artificial intelligence, as provided by the National Science and Technology Knowledge Information Service (NTIS), and extracts the related topics. By analyzing these topics, the study aims to identify research topics and investment directions for national R&D projects. NTIS provides a vast amount of national R&D information, from information on projects carried out under national R&D programs to the research results (papers, patents, etc.) they generate. In this paper, search results were obtained by running artificial intelligence keyword searches and related classification searches in the NTIS integrated search, and the base data set was built by downloading the project information for the latest three years. Using an LDA topic modeling library available in Python, related topics and keywords were extracted from the base data (research goals, research content, expected effects, keywords, etc.) and analyzed to derive insights on the direction of research investment.
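
The abstract does not name the Python library that was used, so the following is only a rough illustration of what LDA topic extraction does, written without any third-party package. It is a toy collapsed Gibbs sampler, one standard way to fit LDA and not necessarily the inference method behind the authors' library. The function name `lda_gibbs`, the hyperparameter defaults, and the four tiny documents are all assumptions made for this sketch.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA over tokenized documents."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]            # tokens in doc d assigned to topic k
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # word counts per topic
    topic_total = [0] * num_topics                          # total tokens per topic
    assignments = []
    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Conditional probability of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + beta * vocab_size)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    # Smoothed per-document topic proportions.
    theta = [
        [(n + alpha) / (len(doc) + alpha * num_topics) for n in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word

# Made-up, pre-tokenized "project descriptions".
docs = [
    "robot vision learning robot".split(),
    "vision learning robot vision".split(),
    "policy budget funding policy".split(),
    "budget funding policy budget".split(),
]
theta, topic_word = lda_gibbs(docs, num_topics=2)
for row in theta:
    print([round(p, 2) for p in row])
```

Each row of `theta` is one document's estimated mix of topics, and `topic_word` holds the word counts from which the top keywords of each topic can be read. For real project data, an established library would be the more practical choice.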