• Title/Summary/Keyword: text collection (텍스트 수집)


Design and Implementation for Extraction of Field-Associationed Terms (분야연상어 추출 방법의 설계 및 구현)

  • Lee, Won-Hee;Choi, Hyun;Lee, Samuel Sangkon
    • Proceedings of the Korea Information Processing Society Conference / 2004.05a / pp.651-654 / 2004
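  • When we read a document, we can often identify its field (politics, economics, sports, and so on) accurately from just a few representative words, without reading the whole document. Building field-association terms is therefore an important research problem for determining the field of a document accurately from the few words that appear in a partial text rather than from the entire document. A field hierarchy is first defined by hand, and documents belonging to each field are collected from the Internet and from books. The goal of this paper is to design and implement a system that automatically collects field-association terms that accurately indicate the field of the collected documents. Taking into account the point at which the field of a document is determined, we propose and implement a method for collecting single field-association terms using the level of the field-association term, its stability rank, its concentration ratio, and frequency information.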
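
The abstract above names concentration ratio and frequency as two of the signals used to select field-association terms, but it does not define them. The following is a minimal Python sketch under stated assumptions: the concentration ratio of a term in a field is taken to be its frequency in that field divided by its frequency across all fields, and the function name, thresholds, and whitespace tokenization are hypothetical. The level and stability-rank criteria from the paper are not modeled.

```python
from collections import Counter, defaultdict


def field_association_terms(docs_by_field, min_freq=2, min_concentration=0.8):
    """Select candidate field-association terms per field.

    docs_by_field maps a field name to a list of document strings.
    Assumption: concentration = (term frequency in the field) /
    (term frequency over all fields). Thresholds are illustrative.
    """
    # Count whitespace-separated tokens for each field.
    field_counts = {
        field: Counter(tok for doc in docs for tok in doc.split())
        for field, docs in docs_by_field.items()
    }

    # Total frequency of each term over every field.
    total = Counter()
    for counts in field_counts.values():
        total.update(counts)

    result = defaultdict(list)
    for field, counts in field_counts.items():
        for term, freq in counts.items():
            concentration = freq / total[term]
            if freq >= min_freq and concentration >= min_concentration:
                result[field].append((term, concentration, freq))
        # Highest concentration first, then highest frequency, then by name.
        result[field].sort(key=lambda t: (-t[1], -t[2], t[0]))
    return dict(result)
```

With a toy corpus, a word such as "match" that occurs in both a sports field and an economy field has a lower concentration ratio and is filtered out, while a word that occurs only in one field is kept.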


Investigation of Topic Trends in Computer and Information Science by Text Mining Techniques: From the Perspective of Conferences in DBLP (텍스트 마이닝 기법을 이용한 컴퓨터공학 및 정보학 분야 연구동향 조사: DBLP의 학술회의 데이터를 중심으로)

  • Kim, Su Yeon;Song, Sung Jeon;Song, Min
    • Journal of the Korean Society for Information Management / v.32 no.1 / pp.135-152 / 2015
  • The goal of this paper is to explore the field of Computer and Information Science with the aid of text mining techniques by mining Computer and Information Science related conference data available in DBLP (Digital Bibliography & Library Project). Although studies based on bibliometric analysis are the most prevalent way of investigating the dynamics of a research field, we attempt to understand the dynamics of the field by using Latent Dirichlet Allocation (LDA)-based multinomial topic modeling. For this study, we collect 236,170 documents from 353 conferences related to Computer and Information Science in DBLP. We aim to include conferences in the field of Computer and Information Science as broadly as possible. We analyze the topic modeling results along with datasets collected over the period from 2000 to 2011, including top authors per topic and top conferences per topic. We identify the following four patterns in topic trends in the field of Computer and Information Science during this period: growing (network related topics), shrinking (AI and data mining related topics), continuing (web, text mining, information retrieval, and database related topics), and fluctuating (HCI, information system, and multimedia system related topics).

Text-mining Techniques for Metabolic Pathway Reconstruction (대사경로 재구축을 위한 텍스트 마이닝 기법)

  • Kwon, Hyuk-Ryul;Na, Jong-Hwa;Yoo, Jae-Soo;Cho, Wan-Sup
    • Journal of Korea Society of Industrial Information Systems / v.12 no.4 / pp.138-147 / 2007
  • A metabolic pathway is a series of chemical reactions occurring within a cell and can be used for drug development and for understanding life phenomena. Many biologists are trying to extract metabolic pathway information from the vast literature for their studies of metabolic-circuit regulation. We propose a text-mining technique based on keywords and patterns. The proposed technique uses a web robot to collect a large number of papers and stores them in a local database. We use Gene Ontology to increase the compound recognition rate and the NCBI Tokenizer library to recognize useful information without breaking up compound names. Furthermore, we obtain useful sentence patterns representing metabolic pathways from papers and the KEGG database. We extracted 66 patterns from 20,000 documents for Glycosphingolipid species from KEGG, a representative metabolic database. We verified our system on nineteen compounds in Glycosphingolipid species. The results show a recall of 95.1%, a precision of 96.3%, and a processing time of 15 seconds. The proposed text-mining system is expected to be used for metabolic pathway reconstruction.
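
To make the pattern-based extraction and the reported precision and recall concrete, here is a small Python sketch. It is an illustration only: the single sentence pattern, the function names, and the toy sentences are hypothetical and are not among the 66 patterns the authors extracted. It shows how a sentence pattern can yield (substrate, product) pairs and how precision and recall are computed against a gold set.

```python
import re

# Hypothetical sentence pattern: "<substrate> is converted to <product>".
# Hyphens are allowed inside names so compound names are not split.
PATTERN = re.compile(r"(\w[\w-]*) is converted to (\w[\w-]*)")


def extract_reactions(sentences):
    """Return the set of (substrate, product) pairs matched by PATTERN."""
    found = set()
    for sentence in sentences:
        for match in PATTERN.finditer(sentence):
            found.add((match.group(1), match.group(2)))
    return found


def precision_recall(predicted, gold):
    """Precision = TP / |predicted|, recall = TP / |gold|."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall
```

A real system would combine many such patterns with a compound dictionary; the evaluation step stays the same regardless of how the pairs are extracted.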


Occupational Therapy in Long-Term Care Insurance For the Elderly Using Text Mining (텍스트 마이닝을 활용한 노인장기요양보험에서의 작업치료: 2007-2018년)

  • Cho, Min Seok;Baek, Soon Hyung;Park, Eom-Ji;Park, Soo Hee
    • Journal of Society of Occupational Therapy for the Aged and Dementia / v.12 no.2 / pp.67-74 / 2018
  • Objective: The purpose of this study is to quantitatively analyze the role of occupational therapy in long-term care insurance for the elderly using text mining, one of the big data analysis techniques. Method: Newspaper articles matching the search "Long-Term Care Insurance for the Elderly + Occupational Therapy for the Elderly" and published from 2007 to 2018 were collected. The articles were gathered from the Naver News database, Naver being a search engine with a high share of the domestic market, using Textom, a web crawling tool. After collecting the titles and full text of the 510 news articles returned by this search, we analyzed article frequency and keywords by year. Result: By year, the number of articles was highest in 2015 and 2017, with 70 articles (13.7%) each, and among the top 10 terms in the keyword analysis, 'dementia' had the highest frequency (344). The related keywords were dementia, treatment, hospital, health, service, rehabilitation, facilities, institution, grade, elderly, professional, salary, industrial complex, and people. Conclusion: This study is meaningful in that text mining was used to confirm more objectively, from the related keywords, the social needs and the role of occupational therapists in dementia and rehabilitation, based on 11 years of media reporting trends on long-term care insurance for the elderly. Based on these results, future research should expand the research field and period and supplement the methodology with various analysis methods by year.

Trend Analysis of Fraudulent Claims by Long Term Care Institutions for the Elderly using Text Mining and BIGKinds (텍스트 마이닝과 빅카인즈를 활용한 노인장기요양기관 부당청구 동향 분석)

  • Youn, Ki-Hyok
    • Journal of Internet of Things and Convergence / v.8 no.2 / pp.13-24 / 2022
  • To explore the context of fraudulent claims by long-term care institutions for the elderly, which are increasing every year in Korea, and measures for preventing them, this study conducted a text mining analysis of media reports. The articles were collected from the news big data analysis system 'BIG KINDS' for a period of about 15 years, from July 2008, when the Long-Term Care Insurance for the Elderly took effect, to February 28, 2022. In total, 2,627 articles were collected under keywords such as 'elderly care+fraudulent claims' and 'long-term care+fraudulent claims', of which 946 were selected after excluding duplicates. The results of the text mining analysis were as follows. First, the top 10 most frequent keywords over the whole period (July 1, 2008 to February 28, 2022) were, in order, long-term care institution for the elderly, fraudulent claims, National Health Insurance Service, Long-Term Care Insurance for the Elderly, long-term care benefits (expenses), elderly care facilities, the Ministry of Health & Welfare, the elderly, report, and reward (payment). Second, the N-gram analysis yielded, in order, long-term care benefits (expenses) and fraudulent claims, fraudulent claims and long-term care institution for the elderly, falsehood and fraudulent claims, report and reward (payment), and long-term care institution for the elderly and report. Third, the TF-IDF results were similar to those of the frequency analysis, while the rankings of report, reward (payment), and increase moved up. Based on these results, this study presents future directions for preventing fraudulent claims by long-term care institutions for the elderly.
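
The N-gram and TF-IDF analyses mentioned in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in the study: the function names are hypothetical, documents are assumed to be pre-tokenized lists of keywords, and the TF-IDF variant shown (relative term frequency times the natural log of N over document frequency) is one common textbook formulation among several.

```python
import math
from collections import Counter


def bigrams(tokens):
    """Return adjacent token pairs (N-grams with N = 2)."""
    return list(zip(tokens, tokens[1:]))


def tf_idf(docs):
    """Compute TF-IDF per document.

    docs is a list of token lists. Assumed weighting:
    tf = count / document length, idf = ln(N / document frequency).
    Returns one {term: score} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {t: (c / len(doc)) * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return scores
```

Under this weighting a keyword that appears in every article scores zero, which is one reason a TF-IDF ranking can promote less ubiquitous terms relative to a raw frequency ranking.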

A Study on Focused Crawling of Web Document for Building of Ontology Instances (온톨로지 인스턴스 구축을 위한 주제 중심 웹문서 수집에 관한 연구)

  • Chang, Moon-Soo
    • Journal of the Korean Institute of Intelligent Systems / v.18 no.1 / pp.86-93 / 2008
  • Constructing an ontology, which is defined by complicated semantic relations, requires precise expert skills. For a well-defined ontology to be used in real applications, abundant instance information for the ontology classes is critical. In this study, we developed a focused crawling algorithm that extracts the documents best fitting a topic from the Web, which overflows with a great number of documents. The proposed crawling algorithm gathers documents at high speed by extracting topic-specific links using URL patterns. The topic fitness of link block text is represented by fuzzy sets, which improves the precision of the focused crawler.

A study of the creation mechanism of exclusion against the immigrant (이주민 배제 생성 기제에 대한 연구 -상층부 연구접근-)

  • Kim, Young Sook
    • Korean Journal of Social Welfare Studies / v.44 no.2 / pp.5-33 / 2013
  • This study analyzes the mechanism that creates exclusion and discrimination against immigrants. The author took a studying-up approach and used the life history method. Ten anti-multiculturalists participated in this study. Data were collected through in-depth interviews. The texts of the individual life histories were analyzed following Mandelbaum (1973). The author analyzed the dimensions of life, turning points, and adaptation. The results are as follows: the author presents ① plan of oneness ground, ② searching for a new scapegoat, and ③ transference of an inferiority complex as the mechanisms of exclusion and discrimination against immigrants. Finally, the author proposes 「cross-cultural education」, 「native participated integration program」, and 「establishment of the strongpoint center for adjustment between natives and immigrants and the training of professionals in the community」.

A Text Mining Analysis for Research Trend about Information and Communication Technology in Construction Automation (텍스트마이닝 기법을 활용한 정보통신기술 기반 건설자동화 연구동향 분석)

  • Lim, Si Yeong;Kim, Seok
    • Korean Journal of Construction Engineering and Management / v.17 no.6 / pp.13-23 / 2016
  • Construction automation based on information and communication technology (ICT) has been studied as a way to improve productivity in the construction industry. This study investigates domestic research trends in ICT-based construction automation using text mining techniques. The results show that 'Technology to collect and analyze project progress' (26%) and 'Technology to analyze and apply the automation element of construction machinery' (28%) are the major research areas. 'Construction information' appears as an important keyword in the area of 'Technology to collect and analyze project progress', where research has mainly focused on resource management, site management, information management, and real-time information monitoring. 'Ubiquitous' appears as an important keyword in the area of 'Technology to analyze and apply the automation element of construction machinery', where research has mainly focused on ubiquitous information management, ubiquitous site management, and measurement systems.

A study on unstructured text mining algorithm through R programming based on data dictionary (Data Dictionary 기반의 R Programming을 통한 비정형 Text Mining Algorithm 연구)

  • Lee, Jong Hwa;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems / v.20 no.2 / pp.113-124 / 2015
  • Unlike structured data, which are gathered and saved in a predefined structure, unstructured text data, mostly written in natural language, have found wider application recently with the emergence of Web 2.0. Text mining, which extracts meaningful information from text, is one of the most important big data analysis techniques, not only because the amount of text data has increased but also because text expresses human emotion directly. In this study, we used R, an open-source software environment for statistical analysis, and studied the implementation of algorithms for analyses such as frequency analysis, cluster analysis, word clouds, and social network analysis. In particular, to keep within our research scope, we used a keyword extraction method based on a data dictionary. Applying the method to real cases, we found that R is very useful as statistical analysis software that runs on a variety of operating systems and interfaces with other languages.
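
Dictionary-based keyword extraction followed by frequency analysis, as described in this abstract, can be illustrated briefly. The paper implemented its analyses in R; the sketch below uses Python instead, and the function name, the tokenization rule, and the example dictionary are all hypothetical. It only counts tokens that appear in a predefined data dictionary and ignores everything else.

```python
import re
from collections import Counter


def keyword_frequencies(texts, dictionary):
    """Count only the tokens that appear in the data dictionary.

    texts is an iterable of raw strings; dictionary is an iterable of
    keywords. Matching is case-insensitive on simple word tokens.
    """
    vocab = {word.lower() for word in dictionary}
    counts = Counter()
    for text in texts:
        # Lowercase, then split into runs of letters, digits, apostrophes.
        for token in re.findall(r"[a-z0-9']+", text.lower()):
            if token in vocab:
                counts[token] += 1
    return counts
```

The resulting counts can feed the later steps the abstract lists, for example as term weights for a word cloud or as node weights in a co-occurrence network.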

Trends in the Study of Nursing Professionals in Korea: A Convergence Study of Text Network Analysis and Topic Modeling (국내 간호전문직관 연구 주제 동향: 텍스트네트워크분석과 토픽모델링의 융합)

  • Park, Chan-Sook
    • Journal of the Korea Convergence Society / v.12 no.9 / pp.295-305 / 2021
  • The purpose of this study is to explore trends in research topics on nursing professionalism published in Korea through quantitative content analysis. The research procedure consisted of collecting academic papers, refining and extracting words, and analyzing the data. A text network was built by collecting 351 papers and extracting words from their abstracts, and network analysis and topic modeling were performed. The core topics were nurses, nursing professionalism, nursing students, nursing care, professional self-concept, health care professionals, satisfaction, clinical competence, and self-efficacy. Through topic modeling, the topic groups of nurses' professionalism, nursing students' professionalism, nursing professional identity, and nursing competency were identified. Over time, the core topics remained unchanged, but topics such as role conflict and ethical values emerged in the 1990s, self-leadership and socialization in the 2000s, and clinical practice stress and support systems in the 2010s. In conclusion, multidimensional interventional research should be facilitated to improve the nursing professionalism of clinical nurses and nursing students.