• Title/Summary/Keyword: Text data

Analysis of Potential Construction Risk Types in Formal Documents Using Text Mining (텍스트 마이닝을 통한 건설공사 공문 잠재적 리스크 유형 분석)

  • Eom, Sae Ho;Cha, Gichun;Park, Sun Kyu;Park, Seunghee;Park, Jongho
    • KSCE Journal of Civil and Environmental Engineering Research / v.43 no.1 / pp.91-98 / 2023
  • Since risks occurring in construction projects can have a significant impact on schedules and costs, there have been many studies on this topic. However, risk analysis is often limited to certain construction situations, so decision-making still depends mainly on experience. Data-based analyses have only been partially applied to safety and contract documents. Therefore, in this study, cluster analysis and a Word2Vec algorithm were applied to formal documents that contain important elements for contractors or clients. An initial classification of document content into six types was performed through cluster analysis, and 157 occurrence types were subdivided through application of the Word2Vec algorithm. The derived terms were re-classified into five categories and reviewed as to whether the terms could develop into potential construction risk factors. Identifying potential construction risk factors will be helpful as basic data for process management in the construction industry.

Approximate Top-k Labeled Subgraph Matching Scheme Based on Word Embedding (워드 임베딩 기반 근사 Top-k 레이블 서브그래프 매칭 기법)

  • Choi, Do-Jin;Oh, Young-Ho;Bok, Kyoung-Soo;Yoo, Jae-Soo
    • The Journal of the Korea Contents Association / v.22 no.8 / pp.33-43 / 2022
  • Labeled graphs are used to represent entities, their relationships, and their structures in real data such as knowledge graphs and protein interactions. With the rapid development of IT and the explosive increase in data, there has been a need for a subgraph matching technology to provide information that the user is interested in. In this paper, we propose an approximate Top-k labeled subgraph matching scheme that considers the semantic similarity of labels and the difference in graph structure. The proposed scheme uses a learning model based on FastText to capture the semantic similarity of labels. In addition, a label similarity graph (LSG), built by calculating similarity values between labels in advance, is used for approximate subgraph matching. The LSG overcomes a limitation of existing schemes, in which subgraph expansion is possible only when labels match exactly. The scheme supports structural similarity for a query graph by performing searches up to 2 hops. Based on the similarity value, we provide k subgraph matching results. We conduct various performance evaluations in order to show the superiority of the proposed scheme.
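The idea of precomputing label similarities can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-dimensional vectors stand in for FastText embeddings, and the `threshold` value and all function names are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def build_label_similarity_graph(vectors, threshold=0.8):
    """Precompute similarities between every pair of labels.

    Returns {label: {other_label: similarity}}, keeping only pairs whose
    similarity is at least `threshold` (a hypothetical cut-off).
    """
    lsg = {label: {} for label in vectors}
    labels = sorted(vectors)
    for i, a in enumerate(labels):
        for b in labels[i + 1:]:
            sim = cosine(vectors[a], vectors[b])
            if sim >= threshold:
                lsg[a][b] = sim
                lsg[b][a] = sim
    return lsg

def top_k_similar_labels(lsg, label, k):
    """Return up to k labels most similar to `label`, best first."""
    ranked = sorted(lsg.get(label, {}).items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]

# Toy vectors standing in for learned label embeddings (assumed values).
toy_vectors = {
    "professor": [1.0, 0.1],
    "lecturer": [0.9, 0.2],
    "protein": [0.0, 1.0],
}
lsg = build_label_similarity_graph(toy_vectors, threshold=0.8)
```

With a lookup structure like this, a matcher could expand a partial match through a vertex whose label is merely similar to the query label, instead of requiring an exact match.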

A Keyphrase Extraction Model for Each Conference or Journal (학술대회 및 저널별 기술 핵심구 추출 모델)

  • Jeong, Hyun Ji;Jang, Gwangseon;Kim, Tae Hyun;Sin, Donggu
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.81-83 / 2022
  • Understanding research trends is necessary to select research topics and explore related works. Most researchers search for representative keywords of the domains or technologies they are interested in to understand research trends. However, some conferences in the artificial intelligence and data mining fields now publish hundreds to thousands of papers each year, which makes it difficult for researchers to follow the research trends of those domains. In this paper, we propose an automatic technology keyphrase extraction method that helps researchers understand the research trends of each conference or journal. Keyphrase extraction, which extracts important terms or phrases from a text, is a fundamental technology for natural language processing tasks such as summarization and search. Previous keyphrase extraction technologies based on pretrained language models extract keyphrases from long texts, so their performance degrades on short texts such as paper titles. We propose a technology keyphrase extraction model that is robust on short texts and considers the importance of each word.

Development of Basic Practice Cases for Recurrent Neural Networks (순환신경망 기초 실습 사례 개발)

  • Kyeong Hur
    • Journal of Practical Engineering Education / v.14 no.3 / pp.491-498 / 2022
  • In this paper, a recurrent neural network SW practice case was developed for a liberal arts course for non-major students; such a case is essential for designing a basic recurrent neural network curriculum. The developed SW practice case focuses on understanding the operating principle of the recurrent neural network and uses a spreadsheet to check the entire visualized operation process. The practice case consists of creating supervised text completion training data, implementing the input layer, hidden layer, state layer (context node), and output layer in sequence, and testing the performance of the recurrent neural network on text data. The recurrent neural network practice case developed in this paper automatically completes words with various numbers of characters. Using the proposed practice case, it is possible to create artificial intelligence SW practice cases that automatically complete Korean or English words, with the maximum number of characters per word extended in various ways. This basic recurrent neural network practice case therefore has high practical utility.
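The layer sequence described above (input, hidden, state/context, output) corresponds to a simple Elman-style recurrent network. The following is a minimal sketch of one forward step, not the spreadsheet implementation from the paper; the weights are untrained, hypothetical values and the `tanh` activation is an assumption.

```python
import math

def elman_step(x, context, w_in, w_ctx, w_out):
    """One forward step of a simple (Elman-style) recurrent network.

    x       : input vector for the current character
    context : hidden activations copied from the previous step
    w_in    : hidden-by-input weight matrix
    w_ctx   : hidden-by-hidden weight matrix applied to the context
    w_out   : output-by-hidden weight matrix
    Returns (output, new_context).
    """
    hidden = []
    for j in range(len(w_in)):
        total = sum(w * xi for w, xi in zip(w_in[j], x))
        total += sum(w * c for w, c in zip(w_ctx[j], context))
        hidden.append(math.tanh(total))
    output = [sum(w * h for w, h in zip(row, hidden)) for row in w_out]
    return output, hidden

def one_hot(index, size):
    """Encode a symbol index as a one-hot input vector."""
    return [1.0 if i == index else 0.0 for i in range(size)]

# Hypothetical untrained weights: 3 input symbols, 2 hidden units, 3 outputs.
W_IN = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
W_CTX = [[0.1, 0.0], [0.0, 0.1]]
W_OUT = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]

context = [0.0, 0.0]          # the state layer starts at zero
outputs = []
for symbol in [0, 1, 2]:      # feed a three-character sequence
    out, context = elman_step(one_hot(symbol, 3), context, W_IN, W_CTX, W_OUT)
    outputs.append(out)
```

Because the context carries the previous hidden activations forward, the same input character can produce different outputs depending on what preceded it, which is what makes word completion possible.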

National Petition Analysis Related to Nursing: Text Network Analysis and Topic Modeling (간호관련 국민청원 분석: 텍스트네트워크 분석 및 토픽모델링)

  • Ko, HyunJung;Jeong, Seok Hee;Lee, Eun Jee;Kim, Hee Sun
    • Journal of Korean Academy of Nursing / v.53 no.6 / pp.635-651 / 2023
  • Purpose: This study aimed to identify the main keywords, network structure, and main topics of national petitions related to "nursing" in South Korea. Methods: Data were gathered from national petitions submitted to the Korean Blue House on the topics "nursing" or "nurse" from August 17, 2017, to May 9, 2022. A total of 5,154 petitions were retrieved, and 995 were selected for the final analysis. Text network analysis and topic modeling were performed using the Netminer 4.5.0 program. Results: Regarding network characteristics, a density of 0.03, an average degree of 144.483, and an average distance of 1.943 were found. Compared to results of degree centrality and betweenness centrality, keywords such as "work environment," "nursing university," "license," and "education" appeared typically in the eigenvector centrality analysis. Topic modeling derived four topics: (1) "Improving the working environment and dealing with nursing professionals," (2) "requesting investigation and punishment related to medical accidents," (3) "requiring clear role regulation and legislation of medical and nonmedical professions," and (4) "demanding improvement of healthcare-related systems and services." Conclusion: This is the first study to analyze Korea's national petitions in the field of nursing. This study's results confirmed both the internal needs and external demands for nurses in South Korea. Policies and laws that reflect these results should be developed.
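The network characteristics reported above (density, average degree, average distance) can be computed for any keyword co-occurrence network. The sketch below uses the standard definitions for a simple, unweighted, undirected graph; the software used in the study may define or weight these measures differently, and the toy edges are hypothetical rather than taken from the petition data.

```python
from collections import deque

def network_summary(edges):
    """Density, average degree and average distance of an undirected graph."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    n = len(adj)
    m = sum(len(nb) for nb in adj.values()) // 2
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    avg_degree = 2 * m / n if n else 0.0

    # Average shortest-path length over reachable ordered pairs (BFS per node).
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in dist:
                    dist[nb] = dist[node] + 1
                    queue.append(nb)
        total += sum(dist.values())
        pairs += len(dist) - 1
    avg_distance = total / pairs if pairs else 0.0
    return density, avg_degree, avg_distance

# Toy keyword co-occurrence edges (hypothetical, not from the study).
toy_edges = [
    ("nurse", "license"),
    ("nurse", "education"),
    ("education", "license"),
    ("nurse", "work environment"),
]
density, avg_degree, avg_distance = network_summary(toy_edges)
```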

Development of a Fake News Detection Model Using Text Mining and Deep Learning Algorithms (텍스트 마이닝과 딥러닝 알고리즘을 이용한 가짜 뉴스 탐지 모델 개발)

  • Dong-Hoon Lim;Gunwoo Kim;Keunho Choi
    • Information Systems Review / v.23 no.4 / pp.127-146 / 2021
  • Fake news spreads and is reproduced rapidly, regardless of its authenticity, owing to the characteristics of modern society, the so-called information age. Assuming that 1% of all news is fake, the economic cost is reported to be about 30 trillion Korean won. This shows that fake news is a very important social and economic issue. Therefore, this study aims to develop an automated detection model to quickly and accurately verify the authenticity of news. To this end, this study crawled news data whose authenticity had been verified and developed fake news prediction models using word embedding (Word2Vec, Fasttext) and deep learning algorithms (LSTM, BiLSTM). Experimental results show that the prediction model using BiLSTM with Word2Vec achieved the best accuracy of 84%.

Analysis of Topics Related to Population Aging Using Natural Language Processing Techniques (자연어 처리 기술을 활용한 인구 고령화 관련 토픽 분석)

  • Hyunjung Park;Taemin Lee;Heuiseok Lim
    • Journal of Information Technology Services / v.23 no.1 / pp.55-79 / 2024
  • Korea, which is expected to enter a super-aged society in 2025, is facing the most worrisome crisis worldwide. Efforts are urgently required to examine problems and countermeasures from various angles and to improve the shortcomings. In this regard, from a new viewpoint, we intend to derive useful implications by applying the recent natural language processing techniques to online articles. More specifically, we derive three research questions: First, what topics are being reported in the online media and what is the public's response to them? Second, what is the relationship between these aging-related topics and individual happiness factors? Third, what are the strategic directions and implications for benchmarking discussed to solve the problem of population aging? To find answers to these, we collect Naver portal articles related to population aging and their classification categories, comments, and number of comments, including other numerical data. From the data, we firstly derive 33 topics with a semi-supervised BERTopic by reflecting article classification information that was not used in previous studies, conducting sentiment analysis of comments on them with a current open-source large language model. We also examine the relationship between the derived topics and personal happiness factors extended to Alderfer's ERG dimension, carrying out additional 3~4-gram keyword frequency analysis, trend analysis, text network analysis based on 3~4-gram keywords, etc. Through this multifaceted approach, we present diverse fresh insights from practical and theoretical perspectives.
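The 3~4-gram keyword frequency analysis mentioned above can be sketched as follows. This is an illustration only: whitespace tokenization of an English sentence is an assumption (Korean text would normally need morphological analysis first), and the sample sentence and the function name `ngram_counts` are hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, n_min=3, n_max=4):
    """Count every contiguous n-gram with n_min <= n <= n_max."""
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Hypothetical sample text, tokenized on whitespace.
tokens = "low birth rate and population aging and low birth rate policy".split()
counts = ngram_counts(tokens)
most_common = counts.most_common(1)[0]
```

Here `most_common` holds the most frequent n-gram together with its count; ranking all entries of `counts` gives the keyword frequency table.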

Metaverse Platform Customer Review Analysis Using Text Mining Techniques (텍스트 마이닝 기법을 활용한 메타버스 플랫폼 고객 리뷰 분석)

  • Hye Jin Kim;Jung Seung Lee;Soo Kyung Kim
    • Journal of Information Technology Applications and Management / v.31 no.1 / pp.113-122 / 2024
  • This comprehensive study delves into the analysis of user review data across various metaverse platforms, employing advanced text mining techniques such as TF-IDF and Word2Vec to gain insights into user perceptions. The primary objective is to uncover the factors that contribute to user satisfaction and dissatisfaction, thereby providing a nuanced understanding of user experiences in the metaverse. Through TF-IDF analysis, the research identifies key words and phrases frequently mentioned in user reviews, highlighting aspects that resonate positively with users, such as the ability to engage in creative activities and social interactions within these virtual environments. Word2Vec analysis further enriches this understanding by revealing the contextual relationships between words, offering a deeper insight into user sentiments and the specific features that enhance their engagement with the platforms. A significant finding of this study is the identification of common grievances among users, particularly related to the processes of refunds and login, which point to broader issues within payment systems and user interface designs across platforms. These insights are critical for developers and operators of metaverse platforms, suggesting a focused approach towards enhancing user experiences by amplifying positive aspects. The research underscores the importance of continuous improvement in user interface design and the transparency of payment systems to foster a loyal user base. By providing a comprehensive analysis of user reviews, this study offers valuable guidance for the strategic development and optimization of metaverse platforms, ensuring they remain responsive to user needs and continue to evolve as vibrant, engaging virtual environments.
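To make the TF-IDF step concrete, here is a minimal sketch of how term weights might be computed over a set of reviews. It uses one common variant (term frequency normalized by document length, idf = ln(N / df)); the abstract does not state which variant the study used, and the sample reviews are invented for illustration.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = ln(N / df), which is
    one common variant among several.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))   # count each term once per document
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical review snippets, tokenized on whitespace.
reviews = [
    "refund request refund denied".split(),
    "login error again".split(),
    "fun creative building with friends".split(),
]
scores = tf_idf(reviews)
top_term = max(scores[0], key=scores[0].get)
```

Terms that are frequent in one review but rare across the collection receive the highest weights, which is how recurring complaint keywords can surface.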

Analysis of Teachers' Awareness and Practice of Infants and Young Children's Health (영유아의 건강에 대한 교사의 인식 및 실천 분석)

  • Yu-Mi Park;Seon-Mi Park
    • Journal of the Health Care and Life Science / v.11 no.2 / pp.261-269 / 2023
  • The purpose of this study was to analyze the perception and practice of young children's health, as emphasized in the stories of early childhood teachers. To collect data, telephone interviews were conducted with 15 teachers of kindergartens and daycare centers in Daejeon and Chungnam. The collected data were analyzed by text network analysis. The research results are as follows. First, the participants observed the health of young children when they arrived at school and contacted parents in case of abnormal signs. Second, the participants considered it important to understand the physical condition of children, ensure proper nutrition intake, and manage health problems according to the characteristics of institutions where many people live together. Third, in relation to the management of infectious diseases, the participants separated children with symptoms from others, conducted disinfection and quarantine, and contacted the parents. Finally, the participants recognized that they should receive safety education in preparation for emergencies, familiarize themselves with manuals for emergency situations, and know first aid methods appropriate to the situation.

A Comparative Study on the Perception of a Leading Smart Device Brand in Korea and China: Focusing on Text Analysis

  • Eun-Ji Lee;Jae-Young Moon
    • Journal of the Korea Society of Computer and Information / v.29 no.9 / pp.225-236 / 2024
  • This study focuses on Xiaomi, which has gained attention amid the Fourth Industrial Revolution. Using Textom, customer perceptions of Xiaomi were collected over approximately 10 years and analyzed. Data from 2015 to 2023 were used to compare customer perceptions in South Korea and China. The analysis revealed that before 2016, both countries focused on Xiaomi as a company and its basic products. However, after 2016, perceptions shifted to include keywords related to expansion products. Additionally, perceptions of Xiaomi were positive in both countries, with South Korea showing an increasing positivity, while China maintained positive views. This suggests that entry barriers for Xiaomi in the domestic market have decreased significantly. Future research should involve big data analysis and comparative studies with other countries for more objective insights.