• Title/Summary/Keyword: Text data

Finding Meaningful Pattern of Key Words in IIE Transactions Using Text Mining (텍스트마이닝을 활용한 산업공학 학술지의 논문 주제어간 연관관계 연구)

  • Cho, Su-Gon;Kim, Seoung-Bum
    • Journal of Korean Institute of Industrial Engineers / v.38 no.1 / pp.67-73 / 2012
  • Identifying meaningful patterns and trends in large volumes of text data is an important task in various research areas. In the present study, we crawled the keywords from the abstracts of papers published from 1969 to 2011 in IIE Transactions, one of the representative journals in the field of industrial engineering. We applied a low-dimensional embedding method, clustering analysis, association rules, and social network analysis to find meaningful associative patterns among the keywords that appeared frequently in the papers.
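
As a rough illustration of the association rule step named in this abstract, the sketch below computes support and confidence for keyword pairs. It is not the authors' procedure: the function name `pair_rules`, the `min_support` threshold, and the keyword lists are hypothetical, and only two-keyword rules are considered.

```python
from collections import Counter
from itertools import combinations


def pair_rules(keyword_sets, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for keyword pairs."""
    n = len(keyword_sets)
    item_counts = Counter()
    pair_counts = Counter()
    for keywords in keyword_sets:
        unique = sorted(set(keywords))  # count each keyword once per paper
        item_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of papers containing both keywords
        if support < min_support:
            continue
        # confidence(a -> b) = count(a and b) / count(a)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Hypothetical keyword lists, one per paper (not data from the study).
papers = [
    ["scheduling", "heuristics"],
    ["scheduling", "heuristics", "inventory"],
    ["inventory", "supply chain"],
    ["scheduling", "simulation"],
]
rules = pair_rules(papers, min_support=0.5)
for antecedent, consequent, support, confidence in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, confidence={confidence:.2f}")
```

With these toy lists, only the pair "heuristics" and "scheduling" reaches the support threshold, and the two rule directions differ in confidence because "scheduling" also appears without "heuristics".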

Understanding Facility Management on Tunnel through Text Mining of Precision Safety Diagnosis Data (터널시설물 점검진단 데이터의 텍스트마이닝 분석을 통한 유형별·지역별 중점 유지관리요소의 이해)

  • Seo, Jeong-eun;Oh, Jintak
    • Journal of Korean Association for Spatial Structures / v.21 no.3 / pp.85-92 / 2021
  • The purpose of this paper is to understand the key factors for efficient maintenance of rapidly aging facilities. To this end, safety inspection and diagnosis reports, which had accumulated as unstructured data, were collected and preprocessed, and then analyzed using text mining. In the short term, the derived vulnerabilities of tunnel facilities can be used as inspection items that take into account the characteristics of individual facilities during regular and daily inspections. In the long term, if detailed specification information and other inspection results (safety, durability, and ease of use) are also used in the analysis, the approach provides a stepping stone for supporting preemptive maintenance decision-making.

Research on Methods for Processing Nonstandard Korean Words on Social Network Services (소셜네트워크서비스에 활용할 비표준어 한글 처리 방법 연구)

  • Lee, Jong-Hwa;Le, Hoanh Su;Lee, Hyun-Kyu
    • Journal of Korea Society of Industrial Information Systems / v.21 no.3 / pp.35-46 / 2016
  • Social network services (SNS), which help users build relationship networks and freely share particular interests or activities by posting comments, photos, videos, and so on in online communities such as blogs, have been widely adopted and developed as a social phenomenon. Several studies have explored patterns and valuable information in social network data via text mining techniques such as opinion mining and semantic analysis. Keyword-based approaches have been applied to improve the efficiency of text mining, but most researchers have pointed out limitations related to the rules of Korean orthography. This research aims to construct a database of non-standard Korean words that are difficult to handle in data mining, such as abbreviations, slang, strange expressions, and emoticons, in order to overcome the limitations of keyword-based text mining techniques. Based on a study of subjective opinions about specific topics on blogs, this research extracted non-standard words that were found to be useful in the text mining process.

Text Classification based on a Feature Projection Technique with Robustness from Noisy Data (오류 데이타에 강한 자질 투영법 기반의 문서 범주화 기법)

  • 고영중;서정연
    • Journal of KIISE:Software and Applications / v.31 no.4 / pp.498-504 / 2004
  • This paper presents a new text classifier based on a feature projection technique. In feature projections, training documents are represented as projections onto each feature. Classification is based on the individual feature projections, and the final class is determined by summing the individual classification results of each feature. In our experiments, the proposed classifier showed high performance. In particular, it has fast execution speed and is robust to noisy data in comparison with k-NN and SVM, which are among the state-of-the-art text classifiers. Since the algorithm of the proposed classifier is very simple, its implementation and training process are also simple. Therefore, it can be a useful classifier in text classification tasks that need fast execution speed, robustness, and high performance.
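
The idea of classifying by per-feature projections and summing the per-feature results can be sketched as follows. This is a simplified illustration under stated assumptions, not the classifier from the paper: the weighting (raw term frequency, normalized per feature), the function names `train` and `classify`, and the training sentences are all hypothetical.

```python
from collections import Counter, defaultdict


def train(docs):
    """Build per-feature projections: term -> class label -> summed term frequency."""
    projections = defaultdict(Counter)
    for tokens, label in docs:
        for term, freq in Counter(tokens).items():
            projections[term][label] += freq
    return projections


def classify(projections, tokens):
    """Sum the votes of each known feature; return the best class or None."""
    scores = Counter()
    for term, freq in Counter(tokens).items():
        by_class = projections.get(term)
        if not by_class:
            continue  # feature never seen in training: it casts no vote
        total = sum(by_class.values())
        for label, weight in by_class.items():
            # Each feature distributes its vote in proportion to its projection.
            scores[label] += freq * weight / total
    return scores.most_common(1)[0][0] if scores else None


# Hypothetical training documents (not data from the paper).
training = [
    ("goal match team score".split(), "sports"),
    ("team coach match".split(), "sports"),
    ("stock market price".split(), "finance"),
    ("market bank price rate".split(), "finance"),
]
model = train(training)
print(classify(model, "team price match goal".split()))
```

Training is a single counting pass and classification touches only the features present in the test document, which is consistent with the abstract's emphasis on simplicity and speed, although this sketch makes no claim about the reported accuracy or noise robustness.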

Segment unit shuffling layer in deep neural networks for text-independent speaker verification (문장 독립 화자 인증을 위한 세그멘트 단위 혼합 계층 심층신경망)

  • Heo, Jungwoo;Shim, Hye-jin;Kim, Ju-ho;Yu, Ha-Jin
    • The Journal of the Acoustical Society of Korea / v.40 no.2 / pp.148-154 / 2021
  • Text-independent speaker verification requires speaker embeddings that do not depend on the spoken text in order to improve generalization performance. However, deep neural networks, which depend on their training data, can overfit to text information instead of learning speaker information when they repeatedly learn from identical time series. In this paper, to prevent this overfitting, we propose a segment unit shuffling layer that divides the input layer or a hidden layer along the time axis and rearranges the segments, thereby mixing the time series information. Since the segment unit shuffling layer can be applied not only to the input layer but also to the hidden layers, it can be used as a generalization technique in a hidden layer, which is known to be more effective than generalization in the input layer, and it can be applied together with data augmentation. In addition, the degree of distortion can be controlled by adjusting the unit size of the segment. We observe that text-independent speaker verification performance improves over the baseline when the proposed segment unit shuffling layer is applied.
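
The divide-and-rearrange operation described in this abstract can be sketched on a plain sequence of frames. This is a minimal illustration, assuming fixed-size segments and a uniformly random permutation of segment order; the function name `segment_shuffle` and its parameters are hypothetical, and the paper's layer operates on network activations rather than Python lists.

```python
import random


def segment_shuffle(frames, segment_size, rng=None):
    """Split frames into fixed-size segments along time and shuffle the segment order."""
    if segment_size <= 0:
        raise ValueError("segment_size must be positive")
    rng = rng or random.Random()
    # The order of frames inside each segment is preserved; only segments move.
    segments = [frames[i:i + segment_size] for i in range(0, len(frames), segment_size)]
    rng.shuffle(segments)
    return [frame for segment in segments for frame in segment]


# Twelve hypothetical frame indices, shuffled in units of three frames.
frames = list(range(12))
shuffled = segment_shuffle(frames, segment_size=3, rng=random.Random(0))
print(shuffled)
```

A smaller `segment_size` distorts the time series more strongly, while a segment as long as the whole sequence leaves it unchanged, which mirrors the abstract's remark that the unit size controls the degree of distortion.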

Rating and Comments Mining Using TF-IDF and SO-PMI for Improved Priority Ratings

  • Kim, Jinah;Moon, Nammee
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.11 / pp.5321-5334 / 2019
  • Data mining technology is frequently used to identify the intentions of users across a variety of information contexts. Since relevant terms are mainly hidden in text data, they need to be extracted, and quantification is required in order to interpret user preference in association with other structured data. This paper proposes rating and comments mining to identify user priority and obtain improved ratings. Structured data (location and rating) and unstructured data (comments) are collected, and priority is derived by analyzing statistics and employing TF-IDF. In addition, improved ratings are generated by applying priority categories based on materialized ratings through Sentiment-Oriented Point-wise Mutual Information (SO-PMI)-based emotion analysis. An experiment was carried out by collecting ratings and comments on "place" and applying the proposed method to them. We confirmed that the proposed mining method is 1.2 times better than conventional methods that do not reflect priorities, and that performance improves almost twofold when the number to be predicted is small.
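
To make the SO-PMI step more concrete, the sketch below scores a word by its pointwise mutual information with positive seed words minus its PMI with negative seed words. It follows the common seed-word formulation and is an assumption about the general technique, not the paper's implementation: document-level co-occurrence, the `smoothing` constant, the seed lists, and the comments are all hypothetical.

```python
import math


def so_pmi(word, docs, pos_seeds, neg_seeds, smoothing=0.01):
    """Orientation of `word`: sum of PMI with positive seeds minus PMI with negative seeds."""
    doc_sets = [set(tokens) for tokens in docs]
    n = len(doc_sets)

    def count(*terms):
        # Number of documents containing every given term.
        return sum(1 for d in doc_sets if all(t in d for t in terms))

    word_count = count(word)
    if word_count == 0:
        return 0.0  # unseen word: no evidence in either direction

    def pmi_sum(seeds):
        total = 0.0
        for seed in seeds:
            seed_count = count(seed)
            if seed_count == 0:
                continue  # a seed that never occurs contributes nothing
            joint = (count(word, seed) + smoothing) / n  # smoothed to avoid log(0)
            total += math.log2(joint / ((word_count / n) * (seed_count / n)))
        return total

    return pmi_sum(pos_seeds) - pmi_sum(neg_seeds)


# Hypothetical tokenized comments and seed words (not data from the paper).
comments = [
    ["clean", "good", "friendly"],
    ["clean", "good"],
    ["dirty", "bad"],
    ["dirty", "bad", "noisy"],
    ["good", "friendly"],
]
POS, NEG = ["good"], ["bad"]
print(so_pmi("clean", comments, POS, NEG) > 0)
print(so_pmi("dirty", comments, POS, NEG) < 0)
```

A positive score suggests the word tends to appear with the positive seeds, and a negative score the opposite; how such scores are turned into the "materialized ratings" mentioned in the abstract is not specified there and is not modeled here.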

A Study on Contributor to Sports Development Big Data Research Using Oral Records

  • Byun, Jisun
    • Journal of Multimedia Information System / v.8 no.4 / pp.301-308 / 2021
  • The purpose of this study is to analyze the oral records of sports development contributors in order to explore directions for future big data research on such contributors. To this end, the audio file produced in an interview with Lee00, a sports development contributor, was converted into text. Major themes were extracted by analyzing these oral records, sub-themes were extracted in chronological order, and keywords were extracted by analyzing the sub-themes. The extracted keywords were then searched in the Google search engine to find and use related topics. A Google search for the topic 'Mt. Inwang', extracted from the oral archives of Lee00, finds newspaper articles about President Moon Jae-in climbing Mt. Inwang and opening up Mt. Bukhan. In addition, articles about Mt. Inwang and the mountain climbers mentioned by the narrator, In-jeong Lee, are retrieved. These articles can be used to derive themes for museum exhibitions, to collect museum exhibits, and as climbing education material.

A Study on Gamification Consumer Perception Analysis Using Big Data

  • Se-won Jeon;Youn Ju Ahn;Gi-Hwan Ryu
    • International Journal of Advanced Culture Technology / v.11 no.3 / pp.332-337 / 2023
  • The purpose of this study was to analyze consumers' perceptions of gamification. Based on the analyzed data, we aim to provide reference data by systematically organizing the concept, game elements, and mechanisms of gamification. Gamification can now be easily found in medical care, corporate marketing, and education. This study collected keywords from the social media portal sites Naver, Daum, and Google from 2018 to 2023 using TEXTOM, a social media analysis tool. The data were analyzed using text mining, semantic network analysis, and CONCOR analysis. Based on the collected data, we examined the relevance and clusters related to gamification. The keywords were divided into four clusters: 'Awareness of Gamification', 'Gamification Program', 'Future Technology of Gamification', and 'Use of Gamification'. Through social media analysis, we seek to identify consumers' perceptions of gamification use and to check market and consumer perceptions in order to address shortcomings. On this basis, we intend to develop a plan for utilizing gamification.

Analysis of Smart Tourism Issues Using Social Big Data Analysis

  • Se-won Jeon;Gi-Hwan Ryu
    • International journal of advanced smart convergence / v.13 no.3 / pp.300-305 / 2024
  • Smart tourism enhances communication between tourists and residents, improves quality of life, increases the utilization of local tourism resources, and helps manage cities efficiently. This paper analyzes recent issues and trends in smart tourism, derives key factors for activating smart tourism based on the analyzed data, and conducts research on promoting smart tourism. Using smart tourism as a keyword, data was collected through Textom. The collection scope included a total of 33,588 pieces of data related to smart tourism over the past year, from May 1, 2023, to May 1, 2024. The data was analyzed using text mining and social network analysis techniques. Through this analysis, the paper suggests directions for the development of smart tourism, enabling the activation of local tourism and effective urban management.

Big Data Analysis on the Perception of Home Training According to the Implementation of COVID-19 Social Distancing

  • Hyun-Chang Keum;Kyung-Won Byun
    • International Journal of Internet, Broadcasting and Communication / v.15 no.3 / pp.211-218 / 2023
  • Due to the implementation of COVID-19 social distancing, interest in 'home training' and the number of its users are rapidly increasing. Therefore, the purpose of this study is to identify the perception of 'home training' through big data analysis of social media channels and to provide basic data to the related business sector. Big data were collected from various news and social content provided on Naver and Google. Data covering three years from March 22, 2020, when COVID-19 distancing was implemented in Korea, were collected. The collected data included 4,000 Naver blogs, 2,673 news, 4,000 cafes, 3,989 knowledge IN, and 953 Google channel news. TF and TF-IDF were computed from these data through text mining, and semantic network analysis was then conducted on 70 keywords. Big data analysis programs such as Textom and Ucinet were used for the social big data analysis, and NetDraw was used for visualization. In the text mining analysis, 'home training' had the highest TF, appearing 4,045 times, followed by 'exercise', 'Homt', 'house', 'apparatus', 'recommendation', and 'diet'. In terms of TF-IDF, the main keywords were 'exercise', 'apparatus', 'home', 'house', 'diet', 'recommendation', and 'mat'. Based on these results, 70 high-frequency keywords were extracted, and semantic indicators and centrality analysis were then conducted. Finally, CONCOR analysis clustered the keywords into a 'purchase cluster', 'equipment cluster', 'diet cluster', and 'execute method cluster'. Based on these four clusters, basic data for the 'home training' business sector were presented, drawing on consumers' main perceptions of 'home training' and the semantic network analysis.
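
The difference between the TF and TF-IDF rankings reported in this abstract can be illustrated with a small sketch. It assumes the textbook weighting, term frequency times log(N / document frequency), summed over documents; the exact formula used by Textom is not stated in the abstract, and the function name `tf_and_tfidf` and the posts are hypothetical.

```python
import math
from collections import Counter


def tf_and_tfidf(docs):
    """Return corpus-wide term frequency and TF-IDF (summed over documents) per term."""
    n = len(docs)
    tf = Counter()  # total occurrences of each term
    df = Counter()  # number of documents containing each term
    for tokens in docs:
        tf.update(tokens)
        df.update(set(tokens))
    # IDF is constant per term, so the per-document sum equals tf * idf.
    tfidf = {term: tf[term] * math.log(n / df[term]) for term in tf}
    return tf, tfidf


# Hypothetical tokenized posts (not data from the study).
posts = [
    ["home training", "exercise", "mat"],
    ["home training", "exercise", "diet"],
    ["home training", "apparatus", "exercise", "exercise"],
    ["home training", "diet"],
]
tf, tfidf = tf_and_tfidf(posts)
print(tf.most_common(2))
print(sorted(tfidf.items(), key=lambda item: item[1], reverse=True)[:3])
```

In this toy corpus the search term itself appears in every post, so its TF is high while its TF-IDF is zero. Under this weighting, that would be one possible reason a collection keyword leads a TF list yet is absent from the top TF-IDF keywords.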