• Title/Summary/Keyword: Textrank

Search results: 4

Document Understanding and Similar Document Recommendation Through Word Embedding Model (워드 임베딩 모델을 이용한 문서 이해 및 유사문서 추천)

  • Jeongmin Cho;Seungshik Kang
    • Annual Conference of KIPS / 2024.10a / pp.480-481 / 2024
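  • To understand a document easily, it is important to quickly identify its key words or key sentences. Reading similar documents alongside it can also shorten the time needed to grasp the content or deepen the reader's understanding. To this end, we use techniques such as wordcloud, TextRank, Doc2Vec, softmax regression, and cosine similarity. The resulting system takes a document as input, visualizes its nouns as a word cloud, extracts its key sentences, and recommends similar documents in the same category.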
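
As a rough illustration of the key-sentence extraction step mentioned in this abstract, the sketch below runs a TextRank-style ranking over a sentence graph. It is a minimal sketch under assumptions, not the authors' implementation: the bag-of-words cosine similarity used as the edge weight, the function names, the damping factor, and the sample sentences are all chosen here for illustration, and the abstract does not say which similarity measure the authors used for sentence ranking.

```python
import math
import re
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def textrank_sentences(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank-style iteration on a similarity graph."""
    bags = [Counter(re.findall(r"\w+", s.lower())) for s in sentences]
    n = len(bags)
    # Edge weight = cosine similarity (an assumption); no self-loops.
    weights = [[cosine_similarity(bags[i], bags[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives a share of its neighbours' scores,
        # proportional to the weight of the connecting edge.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0)
            for i in range(n)
        ]
    return scores


# Hypothetical sample document, split into sentences by hand.
sentences = [
    "TextRank ranks sentences in a document graph.",
    "The graph links sentences that share words.",
    "Sentences with many similar neighbours rank higher in the document.",
    "Lunch was served at noon.",
]
scores = textrank_sentences(sentences)
best = max(range(len(sentences)), key=scores.__getitem__)
print(sentences[best])
```

A sentence that shares no words with the rest of the document (the last one here) receives only the baseline score of `1 - damping`, so it is never selected as a key sentence. For Korean text, the regular-expression tokenizer would need to be replaced by a morphological analyzer that extracts nouns.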

Research trends in statistics for domestic and international journal using paper abstract data (초록데이터를 활용한 국내외 통계학 분야 연구동향)

  • Yang, Jong-Hoon;Kwak, Il-Youp
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.267-278 / 2021
  • The volume of data keeps growing across government and business, in Korea and abroad, and academic research on big data is growing with it. Statistics is one of the major disciplines in big data research, and as the number of statistics papers increases, it is worthwhile to use that data to understand research trends in the field. In this study, we analyzed abstract data from statistics papers published in Korea and abroad to identify what is being studied. Domestic and international research trends were analyzed through the frequency of the papers' keywords, and the relationships between keywords were visualized with a word embedding method. In addition to the keywords selected by the authors, words that TextRank identified as important in statistics papers were also visualized. Lastly, 10 topics were investigated by applying LDA to the abstract data. By analyzing each topic, we examined which research topics are studied frequently and which words are used prominently.

Keywords Refinement using TextRank Algorithm (TextRank를 이용한 키워드 정련 -TextRank를 이용한 집단 지성에서 생성된 콘텐츠의 키워드 정련-)

  • Lee, Hyun-Woo;Han, Yo-Sub;Kim, Lae-Hyun;Cha, Jeong-Won
    • Proceedings of the HCI Society of Korea Conference (한국HCI학회:학술대회논문집) / 2009.02a / pp.285-289 / 2009
  • Tags are important for retrieving and classifying content. However, some users attach many unrelated tags to their content to achieve a high ranking. In this work, we propose a tag refinement algorithm using TextRank. We calculate the importance of keywords that occur in the title, description, tags, and comments. We refine the tags by removing unrelated keywords from the user-generated tags. The experimental results show that the proposed method is useful for refining tags.
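
The tag refinement idea in this abstract can be sketched as follows. This is only an illustration under assumptions: the abstract does not describe the graph construction or the cut-off rule, so the word co-occurrence window, the `min_score` threshold, the function names, and the sample text are all hypothetical.

```python
import re
from collections import defaultdict


def textrank_keywords(text, window=3, damping=0.85, iterations=50):
    """Score words by PageRank over an undirected co-occurrence graph."""
    words = re.findall(r"[a-z]+", text.lower())
    neighbours = defaultdict(set)
    for i, w in enumerate(words):
        # Link each word to the words that follow it inside the window.
        for v in words[i + 1:i + window]:
            if v != w:
                neighbours[w].add(v)
                neighbours[v].add(w)
    scores = {w: 1.0 for w in neighbours}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[v] / len(neighbours[v]) for v in neighbours[w])
            for w in neighbours
        }
    return scores


def refine_tags(tags, text, min_score=0.5):
    """Keep only tags that score at least min_score in the text (assumed rule)."""
    scores = textrank_keywords(text)
    return [t for t in tags if scores.get(t.lower(), 0.0) >= min_score]


# Hypothetical title, description, and comments joined into one text.
text = ("Cat plays piano. My cat plays the piano every day. "
        "Funny cat video, the cat loves piano.")
kept = refine_tags(["cat", "piano", "bitcoin"], text)
print(kept)
```

A tag that never occurs in the title, description, or comments gets a score of zero and is dropped, which is the behaviour the abstract describes for unrelated tags. A real system would also filter stop words and tune the threshold on labeled data.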

Improvement of topic modeling and case analysis through convergence of Bertopic and TextRank (버토픽과 텍스트랭크의 융합을 통한 토픽모델링의 개선 및 사례 분석)

  • Kim, Keun Hyung;Kang, Jae Jung
    • The Journal of Information Systems / v.33 no.3 / pp.105-121 / 2024
  • Purpose: The purpose of this paper is to develop a method that improves topic representation by incorporating the TextRank technique into BERTopic-based topic modeling, together with an additional indicator for determining the optimal number of topics. Design/methodology/approach: We propose a method that uses TextRank to extract the important documents from those assigned to each topic of a topic model, and then calculates secondary diversity and generates topic representations from the results. First, we integrate the TextRank algorithm into the BERTopic-based topic modeling process to set local secondary labels for each topic. The secondary labels of each topic are derived through extractive summarization based on the TextRank algorithm. Second, we improve the accuracy of selecting the optimal number of topics by calculating the secondary diversity index from the extractive summary of each topic. Third, we improve efficiency by using ChatGPT when deriving the labels of each topic. Findings: A case analysis and evaluation using the proposed method confirmed that topic representation based on TextRank results generated more accurate topic labels and that the secondary diversity index was a more effective index for determining the optimal number of topics.