• Title/Summary/Keyword: document topic

LSTM based Language Model for Topic-focused Sentence Generation (문서 주제에 따른 문장 생성을 위한 LSTM 기반 언어 학습 모델)

  • Kim, Dahae;Lee, Jee-Hyong
    • Proceedings of the Korean Society of Computer Information Conference / 2016.07a / pp.17-20 / 2016
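  • With the advance of deep learning techniques, language models that represent the meaning and syntax latent in text in a vector space have been actively studied. These models have advanced fields built on natural language processing, such as sentiment analysis, document classification, and machine translation. However, because most language models are based on learning the general patterns of words that appear in text, they are limited in considering the topic and meaning of a given text in research that requires a more advanced understanding of natural language, such as document summarization, storytelling, and paraphrase identification. To address this limitation, this study proposes a new language model that modifies the existing LSTM model so that word vector representations reflect the document topic and the contextual meaning a word has within that topic, and aims to show that the proposed model can automatically generate sentences that take the document topic into account.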

Modeling an Information Architecture for Content Reuse in Engineering Accreditation Manuals (공학인증 매뉴얼에서 콘텐츠 재사용을 위한 정보 아키텍처 모델링)

  • Koo, Heung-Seo
    • The Transactions of The Korean Institute of Electrical Engineers / v.63 no.1 / pp.150-155 / 2014
  • Content is often developed inconsistently in technical authoring environments, as new documents are created and existing documents are revised. The Darwin Information Typing Architecture (DITA) is an XML-based, end-to-end architecture for authoring, producing, and delivering technical documentation. The core of most advanced authoring and publishing systems is the concept of content reuse. In this paper, we describe the design and implementation of a system, built on DITA XML, for authoring and producing the various technical documents required by engineering education accreditation programs. It provides a content reuse method that improves content consistency and speeds up the production of technical documentation.

Analysis of Success Factors of Electric Scooter Sharing Service Using User Review Text Mining

  • Kyoung-ae Seo;Jung Seung Lee
    • Journal of Information Technology Applications and Management / v.30 no.2 / pp.19-30 / 2023
  • This study aims to analyze service improvement and success factors of electric scooter sharing service companies by using text mining after collecting reviews of shared electric scooter service applications among various models of sharing economy. In this study, the factors of satisfaction and dissatisfaction of service users were identified using the term frequency inverse document frequency (TF-IDF) technique, and topics for each keyword were extracted using the Latent Dirichlet Allocation (LDA) Topic Modeling technique. According to the analysis results, the main topics were entertainment, safety, service area, application complaints, use complaints, convenience, and mobility. Using the analysis results of this study, employees and researchers of electric scooter sharing service companies will be able to contribute to the improvement and success of related services.

Enhancing Document Clustering Method using Synonym of Cluster Topic and Similarity (군집 주제의 유의어와 유사도를 이용한 문서군집 향상 방법)

  • Park, Sun;Kim, Chul-Won
    • Annual Conference of KIPS / 2011.04a / pp.1538-1541 / 2011
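  • This paper proposes a method that improves document clustering performance using synonyms of cluster topics and similarity. The proposed method selects cluster topic terms using the semantic features of non-negative matrix factorization (NMF), which allows it to represent the internal structure of the document cluster set well, and expands the cluster topic terms with WordNet synonyms, which addresses the problem of representing documents as a bag of words. In addition, applying cosine similarity between the expanded cluster topic terms and the document set clusters documents that fit each cluster topic well, improving performance. Experimental results show that the document clustering method using the proposed approach performs better than other document clustering methods.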

KOREAN TOPIC MODELING USING MATRIX DECOMPOSITION

  • June-Ho Lee;Hyun-Min Kim
    • East Asian mathematical journal / v.40 no.3 / pp.307-318 / 2024
  • This paper explores the application of matrix factorization, specifically CUR decomposition, in the clustering of Korean language documents by topic. It addresses the unique challenges of Natural Language Processing (NLP) in dealing with the Korean language's distinctive features, such as agglutinative words and morphological ambiguity. The study compares the effectiveness of Latent Semantic Analysis (LSA) using CUR decomposition with the classical Singular Value Decomposition (SVD) method in the context of Korean text. Experiments are conducted using Korean Wikipedia documents and newspaper data, providing insight into the accuracy and efficiency of these techniques. The findings demonstrate the potential of CUR decomposition to improve the accuracy of document clustering in Korean, offering a valuable approach to text mining and information retrieval in agglutinative languages.

Information Technology Application for Oral Document Analysis (구술문서 자료분석을 위한 정보검색기술의 응용)

  • Park, Soon-Cheol;Hahm, Han-Hee
    • Journal of Korea Society of Industrial Information Systems / v.13 no.2 / pp.47-55 / 2008
  • The purpose of this paper is to develop an analytical methodology for oral documents by applying information technologies. The system consists of keyword search, content summarization, clustering, classification, and topic tracing of the contents. The integrated model of these five levels of retrieval technologies can be used exhaustively in the analysis of oral documents, which were collected as the oral history of five men and women in the North Jeolla area. Of the five methods, topic tracing is the most pioneering accomplishment both at home and abroad. Finally, this research will shed light on the methodological and theoretical studies of oral history and culture.

Analysis of Media Articles on COVID-19 and Nurses Using Text Mining and Topic Modeling (텍스트 마이닝과 토픽모델링 분석을 활용한 코로나19와 간호사에 대한 언론기사 분석)

  • An, Jiyeon;Yi, Yunjeong;Lee, Bokim
    • Research in Community and Public Health Nursing / v.32 no.4 / pp.467-476 / 2021
  • Purpose: The purpose of this study is to understand the social perceptions of nurses in the context of the COVID-19 outbreak through analysis of media articles. Methods: Among the media articles reported from January 1st to September 30th, 2020, those containing the keywords '[corona or Wuhan pneumonia or covid] and [nurse or nursing]' are extracted. After the selection process, the text mining and topic modeling are performed on 454 media articles using textom version 4.5. Results: Frequency Top 30 keywords include 'Nurse', 'Corona', 'Isolation', 'Support', 'Shortage', 'Protective Clothing', and so on. Keywords that ranked high in Term Frequency-Inverse Document Frequency (TF-IDF) values are 'Daegu', 'President', 'Gwangju', 'manpower', and so on. As a result of the topic analysis, 10 topics are derived, such as 'Local infection', 'Dispatch of personnel', 'Message for thanks', and 'Delivery of one's heart'. Conclusion: Nurses are both the contributors and victims of COVID-19 prevention. The government and the nurses' community should make efforts to improve poor working conditions and manpower shortages.

Study on CEO New Year's Address: Using Text Mining Method (텍스트마이닝을 활용한 주요 대기업 신년사 분석)

  • YuKyoung Kim;Daegon Cho
    • Journal of Information Technology Services / v.22 no.2 / pp.93-127 / 2023
  • This study analyzed the CEO New Year's addresses of major Korean companies, extracting key topics for employees via text mining techniques. An intended contribution of this study is to assist reporters, analysts, and researchers in gaining a better understanding of the New Year's addresses by elucidating the implicit and implicative features of messages within. To this end, this study collected and analyzed 545 New Year's addresses published between 2012 and 2021 by the top 66 Korean companies in terms of market capitalization. Research methodologies applied include text clustering, word embedding of keywords, frequency analysis, and topic modeling. Our main findings suggest that the messages in the New Year's addresses were categorized into nine topics: organizational culture, global advancement, substantial management, business reorganization, capacity building, market leadership, management innovation, sustainable management, and technology development. Next, this study further analyzed the managerial significance of each topic and discussed their characteristics from the perspectives of time, industry, and corporate groups. Companies were typically found to emphasize sound management, market leadership, and business reorganization during economic downturns while stressing capacity building and organizational culture during market transition periods. Also, companies belonging to corporate groups tended to emphasize founding philosophy and corporate culture.

Topic based Web Document Clustering using Named Entities (개체명을 이용한 주제기반 웹 문서 클러스터링)

  • Sung, Ki-Youn;Yun, Bo-Hyun
    • The Journal of the Korea Contents Association / v.10 no.5 / pp.29-36 / 2010
  • Past clustering research has focused on extracting keywords for grouping by word similarity. However, having too many candidates to compare and compute leads to high complexity, low speed, and low accuracy. To overcome these weaknesses, this paper proposes a topic-based web document clustering model that uses not only keywords but also named entities such as person names, organizations, and locations. Through several experiments, we demonstrate the effectiveness of our model compared with a traditional keyword-only model and analyze how the effects vary with the characteristics of the document collection.

A Korean Sentence and Document Sentiment Classification System Using Sentiment Features (감정 자질을 이용한 한국어 문장 및 문서 감정 분류 시스템)

  • Hwang, Jaw-Won;Ko, Young-Joong
    • Journal of KIISE:Computing Practices and Letters / v.14 no.3 / pp.336-340 / 2008
  • Sentiment classification is a recent subdiscipline of text classification that is concerned not with topic but with opinion. In this paper, we present a Korean sentence and document classification system using effective sentiment features. Korean sentiment classification starts from constructing effective sentiment feature sets for positive and negative sentiment. The synonym information of an English word thesaurus is used to extract effective sentiment features, and the extracted English sentiment features are then translated into Korean features using an English-Korean dictionary. A sentence or a document is represented using the extracted sentiment features and is classified and evaluated by an SVM (Support Vector Machine).