• Title/Summary/Keyword: Text data

Identifying Hazard of Fire Accidents in Domestic Manufacturing Industry Using Data Analytics (국내 제조업 화재사고 데이터 분석을 통한 복합 유해·위험요인 확인)

  • Kyung Min Kim;Yongyoon Suh;Jong Bin Lee;Seong Rok Chang
    • Journal of the Korean Society of Safety / v.38 no.4 / pp.23-31 / 2023
  • Revising the Occupational Safety and Health Act led to enacting and revising related laws and systems, such as placing fire observers in hot workplaces. However, the operating standards in such cases are still ambiguous. Although fire accidents occur through multiple, multi-step factors, their hazards have so far been identified as individual rather than interrelated factors. This study aims to identify the multiple factors behind accidents, focusing on fire and explosion accidents that recently occurred in the domestic manufacturing industry. First, major keywords were extracted through text mining. Then representative accident types were derived by combining the main keywords through co-word network analysis to identify the hazards and their relationships. Six representative types of fire accident were identified, and their major hazards were then addressed to improve safety measures using the hazard identification step of the "Risk Assessment" tool. The results indicate that various safety measures, such as professional training for fire observers and clear placement standards, are needed. This study provides basic data for revising practical laws and guidelines for fire accident prevention, system supplementation, safety policy establishment, and future related research.
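
The co-word step mentioned in this abstract can be illustrated with a minimal sketch. It assumes each accident report has already been reduced to a list of keywords; the sample reports and the `co_word_edges` helper are hypothetical and are not taken from the study.

```python
from collections import Counter
from itertools import combinations


def co_word_edges(documents):
    """Count how often each unordered keyword pair appears in the same document."""
    edges = Counter()
    for keywords in documents:
        # Sort the de-duplicated keywords so (a, b) and (b, a) share one edge key.
        for pair in combinations(sorted(set(keywords)), 2):
            edges[pair] += 1
    return edges


# Hypothetical keyword lists, one per accident report.
reports = [
    ["welding", "spark", "insulation"],
    ["welding", "spark", "solvent"],
    ["dust", "explosion", "spark"],
]

edges = co_word_edges(reports)
for pair, weight in edges.most_common(3):
    print(pair, weight)
```

Pairs with the highest weights are candidates for combined hazards; in practice the keyword lists would come from a tokenizer and a stop-word filter.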

Analysis of University Unification Education Research Trends Using Text Network Analysis and Topic Modeling

  • Do-Young LEE
    • Journal of Wellbeing Management and Applied Psychology / v.6 no.4 / pp.27-31 / 2023
  • Purpose: This study analyzed papers retrieved with the two keywords 'unification education' and 'university' and published from 2013 to 2022, in order to identify trends and key concepts in unification education research at domestic universities. Research design, data, and methodology: The study analyzed 224 papers, excluding those on primary, middle, and high school unification education, as well as unrelated and duplicate papers. The analysis included developing a co-occurrence network of keywords, using topic modeling to categorize research types, and checking visualizations such as word clouds and sociograms. Results: The final analysis identified 1,500 keywords, including notable ones such as 'Korea,' 'education,' and 'unification.' Centrality analysis, which measures influence through connected keywords, revealed that 'Korea,' 'education,' 'north,' and 'unification' held significant positions. Keywords with high centrality relative to their frequency were 'learning,' 'development,' 'training,' 'peace,' and 'language,' in that order. Conclusions: This study used keyword network analysis to elucidate trends and structures in university-level unification education research. Its significance lies in offering foundational data for future research directions in the field of unification education at universities.
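
To make the comparison between keyword frequency and centrality concrete, here is a small sketch. The abstract does not say which centrality measure was used, so degree centrality is an assumption here, and the sample keyword lists and the function name are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def frequency_and_degree_centrality(documents):
    """Return (document frequency, degree centrality) for each keyword.

    Degree centrality is the number of distinct co-occurring keywords
    divided by (n - 1), where n is the number of distinct keywords.
    """
    frequency = Counter()
    neighbours = defaultdict(set)
    for keywords in documents:
        unique = set(keywords)
        frequency.update(unique)
        for a, b in combinations(unique, 2):
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(frequency)
    centrality = {
        word: (len(neighbours[word]) / (n - 1) if n > 1 else 0.0)
        for word in frequency
    }
    return frequency, centrality


# Hypothetical keyword lists, one per paper.
papers = [
    ["unification", "education", "peace"],
    ["unification", "education", "university"],
    ["peace", "language", "training"],
]

frequency, centrality = frequency_and_degree_centrality(papers)
for word in sorted(centrality, key=centrality.get, reverse=True):
    print(word, frequency[word], round(centrality[word], 2))
```

In this toy data 'peace' and 'unification' appear equally often, but 'peace' links to more distinct keywords, which is the kind of gap between frequency and centrality the abstract reports.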

A Multi-Class Classifier of Modified Convolution Neural Network by Dynamic Hyperplane of Support Vector Machine

  • Nur Suhailayani Suhaimi;Zalinda Othman;Mohd Ridzwan Yaakub
    • International Journal of Computer Science & Network Security / v.23 no.11 / pp.21-31 / 2023
  • In this paper, we focus on the problem of evaluating multi-class classification accuracy and simulating multiple classifier performance metrics. Multi-class classifiers for sentiment analysis involve many challenges, and previous research has narrowed its focus to binary classification models because they provide higher accuracy on text data. We therefore take inspiration from the non-linear Support Vector Machine and modify the algorithm by embedding dynamic hyperplanes that represent multiple class labels. We then analyze the performance of multi-class classifiers using macro-accuracy, micro-accuracy, and several other metrics to justify the significance of our algorithm enhancement. Furthermore, we hybridize an Enhanced Convolution Neural Network (ECNN) with a Dynamic Support Vector Machine (DSVM) to demonstrate the effectiveness and efficiency of the classifier on multi-class text data. We performed experiments on three hybrid classifiers: ECNN with Binary SVM (ECNN-BSVM), ECNN with linear Multi-Class SVM (ECNN-MCSVM), and our proposed algorithm (ECNN-DSVM). Comparative experiments of hybrid algorithms yielded 85.12% for single metric accuracy and 86.95% for multiple metrics on average. Our modified ECNN-DSVM classifier reached 98.29% micro-accuracy with an f-score of at most 98%. As a future direction for this research, we aim to carry out hyperplane optimization analysis.
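
The abstract does not define its macro- and micro-averaged metrics, so the sketch below uses the standard micro and macro averaging of per-class F1 as an assumption. The labels and the `micro_macro_f1` helper are hypothetical; the example only shows why the two averages can diverge on imbalanced multi-class data.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Return (micro-F1, macro-F1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class scores, so every class counts equally.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: pool the counts first, so every sample counts equally.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro


# Hypothetical sentiment labels with one rare class ("neu").
y_true = ["pos", "pos", "pos", "pos", "neu", "neg"]
y_pred = ["pos", "pos", "pos", "pos", "neg", "neg"]

micro, macro = micro_macro_f1(y_true, y_pred)
print(f"micro-F1={micro:.3f} macro-F1={macro:.3f}")
```

The micro average is dominated by the frequent class, while the macro average is pulled down by the rare class that was never predicted correctly.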

Perspectives of Frontline Nurses Working in South Korea during the COVID-19 Pandemic: A Combined Method of Text Network Analysis and Summative Content Analysis

  • Lee, SangA;Lee, Tae Wha;Lee, Seung Eun
    • Journal of Korean Academy of Nursing / v.53 no.6 / pp.584-596 / 2023
  • Purpose: This study aimed to explore the perspectives of frontline nurses working during the novel coronavirus disease 2019 (COVID-19) pandemic. Methods: An online qualitative study was conducted using a pragmatic approach. The data were collected in August 2021. Registered Korean nurses who provided direct nursing care to patients with confirmed COVID-19 were eligible for this study. An online survey was used to gather free-text data, which were then analyzed using machine-based network analysis and summative content analysis. Results: The analysis examined the responses of 126 participants and led to the identification of six prominent themes. These themes were further classified into three distinct levels: personal, task, and organizational. The identified themes are as follows: "collapse of personal life," "being overwhelmed by the numerous roles required," "personal protective equipment was sufficiently provided, but that is not enough," "changes in interprofessional collaboration," "inappropriate workforce management," and "diverted allocation of healthcare services and resources." Conclusion: Our findings highlight areas for improvement in resources, systems, and policies to enhance preparedness for future pandemics.

Topic Modeling of News Article about International Construction Market Using Latent Dirichlet Allocation (Latent Dirichlet Allocation 기법을 활용한 해외건설시장 뉴스기사의 토픽 모델링(Topic Modeling))

  • Moon, Seonghyeon;Chung, Sehwan;Chi, Seokho
    • KSCE Journal of Civil and Environmental Engineering Research / v.38 no.4 / pp.595-599 / 2018
  • A sufficient understanding of overseas construction market status is crucial to achieving profitability in international construction projects. Many researchers have treated news articles as a good data source for assessing market conditions, since they include market information such as political, economic, and social issues. Because the text data is unstructured and very large, various text-mining techniques have been studied to reduce the manpower, time, and cost needed to summarize it. However, extracting the needed information from news articles remains limited because the data covers many different topics. This research aims to overcome these problems and contribute to summarizing market status by performing topic modeling with Latent Dirichlet Allocation. Under the assumption that the corpus contains 10 topics, the topics included projects for user convenience (topic-2), private support for solving poverty problems in Africa (topic-4), and so on. Grouping the news articles by topic could improve the extraction of useful information and the summarization of market status.

A Text Mining-based Intrusion Log Recommendation in Digital Forensics (디지털 포렌식에서 텍스트 마이닝 기반 침입 흔적 로그 추천)

  • Ko, Sujeong
    • KIPS Transactions on Computer and Communication Systems / v.2 no.6 / pp.279-290 / 2013
  • In digital forensics, log files are stored as large data sets for the purpose of tracing users' past behaviors. It is difficult for investigators to manually analyze large log data without clues. In this paper, we propose a text mining technique that extracts intrusion logs from a large log set in order to recommend reliable evidence to investigators. In the training stage, the proposed method preprocesses a training log set and extracts intrusion association words from it using the Apriori algorithm, and the probability of intrusion for each association word is computed by combining support and confidence. Robinson's method of computing confidences for filtering spam mail is applied to extracting intrusion logs. As a result, an association word knowledge base is constructed that includes the intrusion-probability weights of the association words, which improves accuracy. In the test stage, the probability of intrusion logs and the probability of normal logs in a test log set are each computed by Fisher's inverse chi-square classification algorithm based on the association word knowledge base, and intrusion logs are extracted by combining the results. The intrusion logs are then recommended to investigators. The proposed method uses a training method that clearly analyzes the meaning of data in large unstructured logs. As a result, it mitigates the loss of accuracy caused by data ambiguity. In addition, because the proposed method recommends intrusion logs using Fisher's inverse chi-square classification algorithm, it reduces the false positive (FP) rate and the laborious effort of extracting evidence manually.
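
The test-stage combination can be sketched as follows. This is the formulation of Fisher's inverse chi-square method commonly used in spam filtering, applied here to per-word intrusion probabilities; it is an illustrative assumption, not the paper's implementation, and the function names and sample probabilities are hypothetical.

```python
import math


def chi2_survival_even(x2, dof):
    """P(X >= x2) for a chi-square variable with an even number of degrees of freedom."""
    assert dof > 0 and dof % 2 == 0
    m = x2 / 2.0
    term = total = math.exp(-m)
    # Closed form for even dof: exp(-m) * sum_{i < dof/2} m**i / i!
    for i in range(1, dof // 2):
        term *= m / i
        total += term
    return min(total, 1.0)


def fisher_indicator(probabilities):
    """Combine per-word intrusion probabilities (each strictly between 0 and 1).

    Returns a score near 1 for intrusion-like logs, near 0 for normal-looking
    logs, and near 0.5 when the evidence is balanced or weak.
    """
    n = len(probabilities)
    intrusion = 1.0 - chi2_survival_even(
        -2.0 * sum(math.log(1.0 - p) for p in probabilities), 2 * n
    )
    normal = 1.0 - chi2_survival_even(
        -2.0 * sum(math.log(p) for p in probabilities), 2 * n
    )
    return (1.0 + intrusion - normal) / 2.0


# Hypothetical probabilities looked up from an association word knowledge base.
print(fisher_indicator([0.9, 0.95, 0.99]))
print(fisher_indicator([0.1, 0.05, 0.01]))
print(fisher_indicator([0.5, 0.5]))
```

Computing the intrusion and normal scores separately and then combining them is what lets the indicator stay near 0.5 for ambiguous logs instead of forcing a decision.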

Formulating Strategies from Consumer Opinion Analysis on AI Kids Phone using Text Mining (AI 키즈폰의 소비자리뷰 분석을 통한 제품개선 전략에 대한 연구)

  • Kim, Dohun;Cha, Kyungjin
    • The Journal of Society for e-Business Studies / v.24 no.2 / pp.71-89 / 2019
  • To develop satisfying products and improvements, firms use traditional marketing research methods to obtain consumers' opinions and try to reflect them. Recently, gathering data from consumer communication platforms such as the internet and SNS has become a popular method. Meanwhile, with the development of information technology, mobile companies are launching new digital products for children to protect them from harmful content and provide them with necessary functions and information. Among these digital products is the Kids Phone, a wearable device with safety functions that let parents learn their children's location. The Kids Phone is cheaper and simpler than a smartphone, but it has several reported problems, such as useless functions and frequent breakdowns. This study analyzes reviews of Kids Phones from domestic mobile companies, identifies the characteristics, strengths, and weaknesses of the products, and proposes improvement strategies for devices and services through SNS consumer analysis. To that end, customer review data was gathered by crawling online shopping malls and Naver Blog/Café, and was then analyzed through text mining methods such as TF/IDF, sentiment analysis, and network analysis. Data analysis and visualization were done using 'R', 'Textom', and 'Python'. This analysis allowed us to identify the main issues and recent trends regarding Kids Phones and to suggest possible service improvement strategies based on sentiment analysis.
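
As a rough illustration of the TF/IDF weighting named in this abstract, the sketch below scores terms in tokenized reviews. It assumes one common variant (relative term frequency times the natural log of the inverse document frequency); the study's exact weighting is not stated, and the sample reviews and the `tf_idf` helper are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dict per document.

    tf  = term count / document length
    idf = ln(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))
    scores = []
    for tokens in documents:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical tokenized customer reviews.
reviews = [
    ["phone", "battery", "short"],
    ["phone", "location", "good"],
    ["phone", "battery", "good"],
]

scores = tf_idf(reviews)
for term, score in sorted(scores[0].items(), key=lambda kv: -kv[1]):
    print(term, round(score, 3))
```

A term that appears in every review, such as 'phone' here, scores zero, so the ranking surfaces the words that distinguish one review from the rest.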

The Prediction of Cryptocurrency on Using Text Mining and Deep Learning Techniques : Comparison of Korean and USA Market (텍스트 마이닝과 딥러닝을 활용한 암호화폐 가격 예측 : 한국과 미국시장 비교)

  • Won, Jonggwan;Hong, Taeho
    • Knowledge Management Research / v.22 no.2 / pp.1-17 / 2021
  • In this study, we predicted the bitcoin prices of Bithumb and Coinbase, leading exchanges in Korea and the USA respectively, using ARIMA and Recurrent Neural Networks (RNNs). We also used news articles from each country to propose a separated RNN model. The suggested model identifies the datasets based on the changing trend of prices in the training data, and then applies a time series prediction technique (RNNs) to create multiple models. Then we used daily news data to create a term-based dictionary for each trend change point. We explored trend change points in the test data using the daily news keyword data of the test set and the term-based dictionary, and applied a matching model to produce prediction results. With this approach, we obtained higher accuracy than a model that predicted prices using only a time series prediction technique. This study shows that the limitations of time series prediction techniques can be overcome by exploring trend change points using news data, and that in further research various time series prediction techniques could be combined with text mining techniques to improve model performance.

A Study of Pre-trained Language Models for Korean Language Generation (한국어 자연어생성에 적합한 사전훈련 언어모델 특성 연구)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.28 no.4 / pp.309-328 / 2022
  • This study empirically analyzed Korean pre-trained language models (PLMs) designed for natural language generation. The performance of two PLMs - BART and GPT - at the task of abstractive text summarization was compared. To investigate how performance depends on the characteristics of the inference data, ten different document types, containing six types of informational content and creation content, were considered. It was found that BART (which can both generate and understand natural language) performed better than GPT (which can only generate). Upon more detailed examination of the effect of inference data characteristics, the performance of GPT was found to be proportional to the length of the input text. However, even for the longest documents (with optimal GPT performance), BART still outperformed GPT, suggesting that the greatest influence on downstream performance is not the size of the training data or the PLM's parameters but the structural suitability of the PLM for the applied downstream task. The performance of different PLMs was also compared by analyzing parts of speech (POS) shares. BART's performance was inversely related to the proportion of prefixes, adjectives, adverbs, and verbs but positively related to that of nouns. This result emphasizes the importance of taking the inference data's characteristics into account when fine-tuning a PLM for its intended downstream task.

Study of the Application of VQA Deep Learning Technology to the Operation and Management of Urban Parks - Analysis of SNS Images - (도시공원 운영 및 관리를 위한 VQA 딥러닝 기술 활용 연구 - SNS 이미지 분석을 중심으로 -)

  • Lee, Da-Yeon;Park, Seo-Eun;Lee, Jae Ho
    • Journal of the Korean Institute of Landscape Architecture / v.51 no.5 / pp.44-56 / 2023
  • This research explores the enhancement of park operation and management by analyzing the changing demands of park users. While traditional methods depended on surveys, there has been a recent shift towards utilizing social media data to understand park usage trends. Notably, most research has focused on text data from social media, overlooking the valuable insights from image data. Addressing this gap, our study introduces a novel method of assessing park usage using social media image data and then applies it to actual city park evaluations. A unique image analysis tool, built on Visual Question Answering (VQA) deep learning technology, was developed. This tool revealed specific city park details such as user demographics, behaviors, and locations. Our findings highlight three main points: (1) The VQA-based image analysis tool's validity was proven by matching its results with traditional text analysis outcomes. (2) VQA deep learning technology offers insights like gender, age, and usage time, which aren't accessible from text analysis alone. (3) Using VQA, we derived operational and management strategies for city parks. In conclusion, our VQA-based method offers significant methodological advancements for future park usage studies.