Search | Korea Science

Adjusting Weights of Single-word and Multi-word Terms for Keyphrase Extraction from Article Text

Kang, In-Su
- Journal of the Korea Society of Computer and Information / v.26 no.8 / pp.47-54 / 2021
Given a document, keyphrase extraction automatically extracts words or phrases that topically represent the content of the document. In unsupervised keyphrase extraction approaches, candidate words or phrases are first extracted from the input document, scores are calculated for the candidates, and final keyphrases are selected based on the scores. For the scoring step, this study proposes a method of adjusting the scores of keyphrase candidates according to their type: word-type or phrase-type. To do this, type-token ratios of word-type and phrase-type candidates, as well as the information content of high-frequency word-type and phrase-type candidates, are collected from the input document, and those values are used to adjust the candidate scores. In experiments using four keyphrase extraction evaluation datasets constructed for full-text articles in English, the proposed method performed better than a baseline method and comparison methods on three of the datasets.
https://doi.org/10.9708/jksci.2021.26.08.047
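
The type-token ratios mentioned in the abstract above can be illustrated with a minimal sketch. It only shows how such ratios could be computed separately for word-type and phrase-type candidates; how the paper turns them into score adjustments is not stated in the abstract, so no adjustment step is shown. The function names and the sample candidate list are hypothetical.

```python
def split_by_type(candidates):
    """Split candidates into word-type (one token) and phrase-type (several tokens)."""
    words = [c for c in candidates if len(c.split()) == 1]
    phrases = [c for c in candidates if len(c.split()) > 1]
    return words, phrases


def type_token_ratio(candidates):
    """Distinct candidates divided by total candidate occurrences (0.0 if empty)."""
    if not candidates:
        return 0.0
    return len(set(candidates)) / len(candidates)


# Hypothetical candidate occurrences taken from one document.
candidates = [
    "keyphrase", "score", "keyphrase", "keyphrase",
    "keyphrase extraction", "keyphrase extraction", "type-token ratio",
]
words, phrases = split_by_type(candidates)
word_ttr = type_token_ratio(words)      # 2 distinct / 4 occurrences
phrase_ttr = type_token_ratio(phrases)  # 2 distinct / 3 occurrences
print(word_ttr, phrase_ttr)
```

A lower ratio means the candidates of that type repeat more often in the document.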

Education Content of Department of Dental Hygiene and Actual Condition of the Overlapping Analytic Syllabus (치위생과 교육내용 및 교수요목 중복실태 분석)

Park, Myung-Suk;Kim, Chang-Hee
- Journal of dental hygiene science / v.7 no.1 / pp.49-54 / 2007
This research was conducted to provide a standardization method for a new dental hygiene curriculum by identifying overlaps between the education content of the Department of Dental Hygiene and the analytic syllabus. To address these overlapping education programs, I would like to make some proposals. First, a unified course should be provided by reconciling the specific terms of overlapping subjects, the overlapping curricula for the skills identified by job analysis of dental hygienists, and overlapping class time. This should increase the efficiency of class time and of the required curricula. Next, proactive and continuous research on a standardized approach to the education content of the Department of Dental Hygiene is necessary, and the dental hygiene academic community should build trust.

Analyzing Architectural History Terminologies by Text Mining and Association Analysis (텍스트 마이닝과 연관 관계 분석을 이용한 건축역사 용어 분석)

Kim, Min-Jeong;Kim, Chul-Joo
- Journal of Digital Convergence / v.15 no.1 / pp.443-452 / 2017
Architectural history traces the changes in architecture through various traditions, regions, overarching stylistic trends, and dates. This study identified terminologies in architectural history by their proximity and frequency, using text mining and association analysis. It explored terminologies by investigating articles published in the "Journal of Architectural History", the sole journal for architectural history studies. First, key terminologies that appeared frequently were extracted from papers that had titles, keywords, and abstracts. Then, we analyzed typical key terminologies that appear frequently overall and specific ones that appear only in particular research areas. Finally, association analysis was used to find frequent patterns among the key terminologies. This research can be used as fundamental data for understanding issues and trends in architectural history.
https://doi.org/10.14400/JDC.2017.15.1.443

Analyzing Disaster Response Terminologies by Text Mining and Social Network Analysis (텍스트 마이닝과 소셜 네트워크 분석을 이용한 재난대응 용어분석)

Kang, Seong Kyung;Yu, Hwan;Lee, Young Jai
- Information Systems Review / v.18 no.1 / pp.141-155 / 2016
This study identified disaster-related terminologies by their proximity and frequency, using social network analysis (SNA) and text mining, and then expressed the outcome as a mind map. The term-document matrix from text mining was used for the terminology proximity analysis, and SNA closeness centrality was calculated to express the relationships among the terminologies in the mind map. By analyzing terminology proximity and selecting disaster response-related terminologies, this study identified which of the disaster response fields is closest to disaster response and the core terms in each field. This disaster response terminology analysis could be used in future core term-based terminology standardization, disaster-related knowledge accumulation and research, and the composition of various response scenarios, among others.
https://doi.org/10.14329/isr.2016.18.1.141

Analysis of Scientific Item Networks from Science and Biology Textbooks (고등학교 과학 및 생물교과서 과학용어 네트워크 분석)

Park, Byeol-Na;Lee, Yoon-Kyeong;Ku, Ja-Eul;Hong, Young-Soo;Kim, Hak-Yong
- The Journal of the Korea Contents Association / v.10 no.5 / pp.427-435 / 2010
We extracted core terms by constructing scientific item networks from textbooks, analyzing their structures, and investigating the connected information and their relationships. For this research, we chose three high-school textbooks from different publishers for each of three subjects, i.e., Science, Biology I and Biology II, and constructed networks by linking the scientific items in each sentence, with the items used as nodes. The scientific item networks from all textbooks showed scale-free character. When core networks were established by applying the k-core algorithm, one of the commonly used methods for removing lesser-weighted nodes and links from a complex network, they showed a modular structure. Science textbooks formed four main modules of physics, chemistry, biology and earth science, while Biology I and Biology II textbooks revealed core networks composed of more detailed, specific items in each field. These findings demonstrate the structural characteristics of networks in textbooks and suggest core scientific items helpful for students' understanding of concepts in Science and Biology.
https://doi.org/10.5392/JKCA.2010.10.5.427
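
As a rough illustration of the k-core step named in the abstract above, the sketch below repeatedly removes nodes with fewer than k neighbors from an undirected graph until none remain, which is the standard degree-based definition of a k-core. It is not the authors' implementation; the helper names and the tiny edge list of scientific items are hypothetical.

```python
def build_graph(edges):
    """Build a symmetric adjacency map from undirected edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def k_core(adjacency, k):
    """Return the nodes of the k-core; assumes the adjacency map is symmetric."""
    # Copy the neighbor sets so the caller's graph is not mutated.
    graph = {node: set(neighbors) for node, neighbors in adjacency.items()}
    changed = True
    while changed:
        changed = False
        for node in list(graph):
            if len(graph[node]) < k:
                # Detach the node from its neighbors, then drop it.
                for neighbor in graph[node]:
                    graph[neighbor].discard(node)
                del graph[node]
                changed = True
    return set(graph)


# Hypothetical items that co-occur in textbook sentences.
edges = [
    ("cell", "DNA"), ("cell", "protein"), ("DNA", "protein"),
    ("protein", "enzyme"), ("atom", "cell"),
]
core = k_core(build_graph(edges), 2)
print(sorted(core))
```

Here "enzyme" and "atom" each have a single link, so they drop out and the triangle of the other three items remains as the 2-core.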

Comparison of term weighting schemes for document classification (문서 분류를 위한 용어 가중치 기법 비교)

Jeong, Ho Young;Shin, Sang Min;Choi, Yong-Seok
- The Korean Journal of Applied Statistics / v.32 no.2 / pp.265-276 / 2019
The document-term frequency matrix is a general form of data in text mining. In this study, we introduce the traditional term weighting scheme TF-IDF (term frequency-inverse document frequency), which is applied to the document-term frequency matrix and used for text classification. In addition, we introduce and compare the TF-IDF-ICSDF and TF-IGM schemes, which have recently become well known. This study also provides a method of extracting keywords that enhances the quality of text classification. Based on the extracted keywords, we applied a support vector machine for text classification. To compare the performance of the term weighting schemes, we used metrics such as precision, recall, and F1-score. As a result, we found that the TF-IGM scheme provided high performance metrics and was optimal for text classification.
https://doi.org/10.5351/KJAS.2019.32.2.265
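
For readers unfamiliar with the TF-IDF weighting named in the abstract above, here is a minimal sketch of one common variant: raw term count multiplied by log(N / document frequency). The abstract does not say which variant the paper uses, so this is an assumption, and the function name and sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Weight = raw term count * log(N / df), where df is the number of
    documents that contain the term.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenized documents.
docs = [
    ["text", "mining", "term"],
    ["term", "weighting", "term"],
    ["text", "classification"],
]
weights = tf_idf(docs)
print(weights[0])
```

In the first document, "mining" appears in only one of the three documents, so it gets a larger weight than "text" or "term", which each appear in two.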

A Study on the Structure of Definition in Terminological Dictionaries (전문용어사전의 정의 구조에 관한 연구)

김성진
- Proceedings of the Korean Society for Information Management Conference / 2000.08a / pp.11-14 / 2000
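In a dictionary, the definition is the core part that conveys meaning and supports understanding, and it varies with the linguistic nature of the definiendum and with the nature of the dictionary. A systematic and consistent definition structure in a terminological dictionary not only helps users' understanding but also makes it easier to build thesauri and electronic dictionaries. This study analyzes the definition structure of terminological dictionaries and proposes a way to systematize it.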

Document classification using a deep neural network in text mining (텍스트 마이닝에서 심층 신경망을 이용한 문서 분류)

Lee, Bo-Hui;Lee, Su-Jin;Choi, Yong-Seok
- The Korean Journal of Applied Statistics / v.33 no.5 / pp.615-625 / 2020
In text mining, the document-term frequency matrix is built from terms extracted from documents for which group information exists. In this study, we generated the document-term frequency matrix for document classification according to research field. We applied the traditional term weighting function, term frequency-inverse document frequency (TF-IDF), to the generated matrix. In addition, we applied term frequency-inverse gravity moment (TF-IGM). We also generated a document-keyword weighted matrix by extracting keywords to improve the document classification accuracy. Based on the extracted keyword matrix, we classified documents using a deep neural network. To find the optimal deep neural network model, the accuracy of document classification was verified while changing the number of hidden layers and hidden nodes. The model with eight hidden layers showed the highest accuracy, and the TF-IGM document classification accuracies were higher than those of TF-IDF under all parameter settings. In addition, the deep neural network was confirmed to have better accuracy than the support vector machine. Therefore, we propose applying TF-IGM and a deep neural network to document classification.
https://doi.org/10.5351/KJAS.2020.33.5.615

Comparison of Term-Weighting Schemes for Environmental Big Data Analysis (환경 빅데이터 이슈 분석을 위한 용어 가중치 기법 비교)

Kim, JungJin;Jeong, Hanseok
- Proceedings of the Korea Water Resources Association Conference / 2021.06a / pp.236-236 / 2021
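As the rate at which unstructured data such as text is generated has recently increased sharply, the need for techniques to analyze it has grown. Text mining is one technique that uses natural language processing to structure unstructured text and obtain valuable information from documents. Text mining generally expresses the importance of terms with a document-term frequency matrix, which records how often particular terms are used in each document, and this is used in various research fields. However, because the frequencies in the document-term frequency matrix have difficulty representing the distinctiveness of documents and the resulting importance of terms, a method that applies term weights to classify the characteristics of documents is essential. Various term weighting methods have been developed and applied, but in the environmental field, studies evaluating the effectiveness of applying term weighting schemes are lacking. In addition, for environmental issue analysis, rather than simply identifying the characteristics of documents and classifying the given documents, it is also relatively important to reflect the characteristics of each document according to its temporal distribution. Therefore, this study used text mining on environmental news data for the Seoul area from 2015 to 2020 to compare and analyze term weighting schemes suited to environmental issue analysis. The term weighting schemes applied were TF-IDF (term frequency-inverse document frequency), BM25, TF-IGM (TF-inverse gravity moment), and TF-IDF-ICSDF (TF-IDF-inverse class space density frequency). Through this study, we aim to present an optimized term weighting scheme for classifying environmental documents and entities, and to provide keyword extraction information related to environmental issues in the Seoul area.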

Electric Power - Smart Tok Tok - Commentary on the Latest Electric Power Current-Affairs Terms

Korea Electric Association
- JOURNAL OF ELECTRICAL WORLD / s.412 / pp.66-67 / 2011
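Korea is pursuing the localization of three core nuclear power technologies: the nuclear plant instrumentation and control system (MMIS), the reactor coolant pump (RCP), and the core nuclear plant design codes. Last year, MMIS became the first of these to be successfully localized, and it has secured technical superiority to the extent that its excellence has been recognized by the IAEA. Among the core design codes, development of the reactor core design code was also completed in 2010. Some RCP components have already been localized, and development of a fully domestic product is scheduled for completion in 2012.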
