• Title/Summary/Keyword: Keywords Extraction


KR-WordRank : An Unsupervised Korean Word Extraction Method Based on WordRank (KR-WordRank : WordRank를 개선한 비지도학습 기반 한국어 단어 추출 방법)

  • Kim, Hyun-Joong;Cho, Sungzoon;Kang, Pilsung
    • Journal of Korean Institute of Industrial Engineers / v.40 no.1 / pp.18-33 / 2014
  • A word is the smallest unit for text analysis, and the premise behind most text-mining algorithms is that the words in given documents can be perfectly recognized. However, newly coined words, spelling and spacing errors, and domain adaptation problems make it difficult to recognize words correctly. To make matters worse, obtaining a sufficient amount of training data that can be used in any situation is not only unrealistic but also inefficient. Therefore, an automatic word extraction method that does not require a training process is badly needed. WordRank, the most widely used unsupervised word extraction algorithm for Chinese and Japanese, performs poorly on Korean because of differences in language structure. In this paper, we first discuss why WordRank performs poorly on Korean, and then propose a customized WordRank algorithm for Korean, named KR-WordRank, which considers Korean linguistic characteristics and improves robustness to noise in text documents. Experimental results show that KR-WordRank performs significantly better than the original WordRank on Korean. In addition, the proposed algorithm not only extracts proper words but also identifies candidate keywords for effective document summarization.

A Keyphrase Extraction Model for Each Conference or Journal (학술대회 및 저널별 기술 핵심구 추출 모델)

  • Jeong, Hyun Ji;Jang, Gwangseon;Kim, Tae Hyun;Sin, Donggu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.81-83 / 2022
  • Understanding research trends is necessary for selecting research topics and exploring related work. Most researchers search for representative keywords of the domains or technologies they are interested in to understand research trends. However, some conferences in the artificial intelligence and data mining fields now publish hundreds to thousands of papers each year, which makes it difficult for researchers to follow the research trends of those domains. In this paper, we propose an automatic technology keyphrase extraction method that helps researchers understand the research trends of each conference or journal. Keyphrase extraction, which extracts important terms or phrases from a text, is a fundamental technology for natural language processing tasks such as summarization and search. Previous keyphrase extraction methods based on pretrained language models extract keyphrases from long texts, so their performance degrades on short texts such as paper titles. We therefore propose a technology keyphrase extraction model that is robust on short texts and considers the importance of each word.


Extraction of Informative Features for Automatic Indexation of Human Sensibility Ergonomic Documents (감성공학 문서 데이터의 지표 자동화를 위한 코퍼스 분석 기반 특성정보 추출)

  • 배희숙;곽현민;채균식;이상태
    • Science of Emotion and Sensibility / v.7 no.2 / pp.133-140 / 2004
  • A large number of indices are produced from human sensibility ergonomic data, which are accumulated by the project "Study on the Development of Web-Based Database System of Human Sensibility and its Support". Since research in this field is expected to increase rapidly, it is necessary to automate the index processing of human sensibility ergonomic data. Based on the similarity between indexation and summarization, we propose automating this process. In this paper, we study the extraction of keywords, information types, and expression features, which are considered basic elements of the following techniques for automatic summarization: classification of documents and extraction of information types and linguistic features. This study can be applied to automatic summarization systems and knowledge management systems in the domain of human sensibility ergonomics.


A Study of High Speed Retrieval Algorithm of Long Component Keyword (복합키워드의 고속검색 알고리즘에 관한 연구)

  • Lee Jin-Kwan;Jung Kyu-cheol;Lee Tae-hun;Park Ki-hong
    • Journal of the Korea Institute of Information and Communication Engineering / v.8 no.8 / pp.1769-1776 / 2004
  • Effective keyword extraction is important in information search systems, and there are several ways to select the proper keyword from among many keywords. Among them, the DER structure for the AC algorithm, designed to search for a single keyword, can also search for multiple keywords, but it has a time complexity problem. In this paper, we developed an algorithm, the "EDER structure", by expanding the standalone search table of the DER structure search method to improve time complexity. We tested the algorithm on 500 text files and found that the EDER structure is more efficient than the DER structure for AC in terms of keyword posting results and time complexity: 0.2 seconds for EDER versus 0.6 seconds for the DER structure.

Keyword Weight based Paragraph Extraction Algorithm (문단 가중치 분석 기반 본문 영역 선정 알고리즘)

  • Lee, Jongwon;Yu, Seongjong;Kim, Doan;Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.462-463 / 2018
  • Traditional document analysis systems relied on word-based analysis with a morphological analyzer or the TF-IDF technique. These systems have the advantage of being able to derive the main keywords by calculating keyword weights. However, they are not well suited to analyzing the contents of documents because of their structural limitations. To solve this problem, the proposed algorithm calculates the weights of the documents in the document and divides the paragraphs into areas. It then calculates the importance of the divided regions and shows the user the area containing the most important paragraphs in the document. The user is thus expected to receive a service better suited to analyzing documents than existing document analysis systems.


Angiogenesis and the prevention of alveolar osteitis: a review study

  • Saghiri, Mohammad Ali;Asatourian, Armen;Sheibani, Nader
    • Journal of the Korean Association of Oral and Maxillofacial Surgeons / v.44 no.3 / pp.93-102 / 2018
  • Angiogenesis is one of the essential processes that occur during wound healing. It is responsible for providing immunity as well as the regenerative cells, nutrition, and oxygen needed for the healing of the alveolar socket following tooth extraction. The inappropriate removal of formed blood clots causes the undesirable phenomenon of alveolar osteitis (AO) or dry socket. In this review, we aimed to investigate whether enhanced angiogenesis contributes to a more effective prevention of AO. The potential pro- or anti-angiogenic activity of different materials used for the treatment of AO was evaluated. An electronic search was performed in the PubMed, MEDLINE, and EMBASE databases via OVID from January 2000 to September 2016 using the keywords mentioned in the PubMed and MeSH (Medical Subject Headings) terms regarding the role of angiogenesis in the prevention of AO. Our initial search identified 408 articles using the keywords indicated above, with 38 of them meeting the inclusion criteria set for this review. Due to the undeniable role of angiogenesis in the socket healing process, it is beneficial if strategies for preventing AO are directed toward more proangiogenic materials and modalities.

Automatic Single Document Text Summarization Using Key Concepts in Documents

  • Sarkar, Kamal
    • Journal of Information Processing Systems / v.9 no.4 / pp.602-620 / 2013
  • Many previous studies on extractive text summarization treat a subset of the words in a document as keywords and use a ranking function that scores sentences by their similarity to the list of extracted keywords. The use of key concepts in automatic text summarization, however, has received less attention in the summarization literature. The proposed work uses key concepts identified from a document to create a summary of the document. We view the single-word or multi-word keyphrases of a document as the important concepts that the document elaborates on. Our work is based on the hypothesis that an extract is an elaboration of the important concepts to some permissible extent, controlled by the given summary length restriction. In other words, our method of text summarization chooses a subset of sentences from a document that maximizes the important concepts in the final summary. To allow diverse information in the summary, for each important concept we select the one sentence that is the best possible elaboration of that concept. Accordingly, the most important concept contributes to the summary first, then the second most important concept, and so on. To demonstrate the effectiveness of the proposed summarization method, we compared it to some state-of-the-art summarization systems, and the results show that the proposed method outperforms the existing systems to which it is compared.
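
The selection step described in this abstract (one sentence per concept, most important concept first, bounded by a summary length) might be sketched as below. This is only an illustration under assumptions, not the author's implementation: the function name `summarize`, the `max_sentences` limit, and the scoring rule (counting case-insensitive occurrences of the concept in each sentence) are all hypothetical stand-ins for whatever concept ranking and elaboration measure the paper actually uses.

```python
def summarize(sentences, ranked_concepts, max_sentences):
    """Pick one sentence per concept, visiting concepts from most to least important.

    Assumption: a sentence "elaborates" a concept in proportion to how often the
    concept string occurs in it. The paper's real scoring is not given in the abstract.
    """
    chosen = []  # indices of selected sentences, in selection order
    for concept in ranked_concepts:
        if len(chosen) >= max_sentences:
            break  # summary length restriction reached
        best_index, best_score = None, 0
        for index, sentence in enumerate(sentences):
            if index in chosen:
                continue  # each sentence is used at most once
            score = sentence.lower().count(concept.lower())
            if score > best_score:
                best_index, best_score = index, score
        if best_index is not None:
            chosen.append(best_index)
    # Present the summary in original document order.
    return [sentences[i] for i in sorted(chosen)]
```

Calling `summarize(sentences, ["summarization", "keyphrases"], 2)` would return at most two sentences, each chosen for a different concept, which is how this sketch keeps the summary diverse.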

Research trends over 10 years (2010-2021) in infant and toddler rearing behavior by family caregivers in South Korea: text network and topic modeling

  • In-Hye Song;Kyung-Ah Kang
    • Child Health Nursing Research / v.29 no.3 / pp.182-194 / 2023
  • Purpose: This study analyzed research trends in infant and toddler rearing behavior among family caregivers over a 10-year period (2010-2021). Methods: Text network analysis and topic modeling were employed on data collected from relevant papers, following the extraction and refinement of semantic morphemes. A semantic-centered network was constructed by extracting words from 2,613 English-language abstracts. Data analysis was performed using NetMiner 4.5.0. Results: Frequency analysis, degree centrality, and eigenvector centrality all revealed the terms "scale," "program," and "education" among the top 10 keywords associated with infant and toddler rearing behaviors among family caregivers. The keywords extracted from the analysis were divided into two clusters through cohesion analysis. Additionally, they were classified into two topic groups using topic modeling: "program and evaluation" (64.37%) and "caregivers' role and competency in child development" (35.63%). Conclusion: The roles and competencies of family caregivers are essential for the development of infants and toddlers. Intervention programs and evaluations are necessary to improve rearing behaviors. Future research should determine the role of nurses in supporting family caregivers. Additionally, it should facilitate the development of nursing strategies and intervention programs to promote positive rearing practices.

Perception and Trend Differences between Korea, China, and the US on Vegan Fashion -Using Big Data Analytics- (빅데이터를 이용한 비건 패션 쟁점의 분석 -한국, 중국, 미국을 중심으로-)

  • Jiwoon Jeong;Sojung Yun
    • Journal of the Korean Society of Clothing and Textiles / v.47 no.5 / pp.804-821 / 2023
  • This study examines current trends and perceptions of veganism and vegan fashion in Korea, China, and the United States. Using big data tools Textom and Ucinet, we conducted cluster analysis between keywords. Further, frequency analysis using keyword extraction and CONCOR analysis obtained the following results. First, the nations' perceptions of veganism and vegan fashion differ significantly. Korea and the United States generally share a similar understanding of vegan fashion. Second, the industrial structures, such as products and businesses, impacted how Korea perceived veganism. Third, owing to its ongoing sociopolitical tensions, the United States views veganism as an ethical consumption method that ties into activism. In contrast, China views veganism as a healthy diet rather than a lifestyle and associates it with Buddhist vegetarianism. This perception is because of their religious history and culinary culture. Fundamentally, this study is meaningful for using big data to extract keywords related to vegan fashion in Korea, China, and the United States. This study deepens our understanding of vegan fashion by comparing perceptions across nations.

Automatic Keyword Extraction using Hierarchical Graph Model Based on Word Co-occurrences (단어 동시출현관계로 구축한 계층적 그래프 모델을 활용한 자동 키워드 추출 방법)

  • Song, KwangHo;Kim, Yoo-Sung
    • Journal of KIISE / v.44 no.5 / pp.522-536 / 2017
  • Keyword extraction can be used in text mining of massive document collections to efficiently extract subject words or related words from a document. In this study, we propose a hierarchical graph model based on the co-occurrence relationship, the intrinsic dependency relationship between words, and common sub-words in a single document. In addition, we propose an enhanced TextRank algorithm that reflects the influence of outgoing edges as well as that of incoming edges. We then propose a novel keyword extraction scheme that uses the hierarchical graph model and the enhanced TextRank algorithm to extract representative keywords from a single document. In the experiments, various evaluation methods were applied to documents on various subjects to verify the accuracy and adaptability of the proposed scheme. As a result, the proposed scheme showed better performance than previous schemes.
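
For orientation, the baseline that this abstract builds on, TextRank over a word co-occurrence graph, can be sketched roughly as follows. This is the standard undirected, weighted TextRank, not the hierarchical graph model or the enhanced variant proposed in the paper. The function names, the `window` size, the `damping` factor of 0.85, and the fixed iteration count are illustrative assumptions.

```python
from collections import defaultdict


def cooccurrence_graph(tokens, window=2):
    """Build an undirected weighted graph: words are linked when they
    appear within `window` positions of each other."""
    graph = defaultdict(lambda: defaultdict(float))
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word][other] += 1.0
                graph[other][word] += 1.0
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Plain weighted TextRank: each word passes its score to its
    neighbors in proportion to the edge weights."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            rank = 0.0
            for neighbor, weight in graph[node].items():
                total = sum(graph[neighbor].values())
                rank += weight / total * scores[neighbor]
            new_scores[node] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


def top_keywords(tokens, k=3, window=2):
    """Return the k highest-scoring words (ties broken alphabetically)."""
    scores = textrank(cooccurrence_graph(tokens, window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]
```

In practice the token list would first be filtered (for example, to nouns and adjectives) before the graph is built. That preprocessing is left out here to keep the sketch short.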