• 제목/요약/키워드: Text Data Analysis

검색결과 1,523건 처리시간 0.029초

텍스트마이닝을 위한 패션 속성 분류체계 및 말뭉치 웹사전 구축 (Development of Online Fashion Thesaurus and Taxonomy for Text Mining)

  • 장세윤;김하연;김송미;최우진;정진;이유리
    • 한국의류학회지
    • /
    • 제46권6호
    • /
    • pp.1142-1160
    • /
    • 2022
  • Text data plays a significant role in understanding and analyzing trends in consumer, business, and social sectors. For text analysis, there must be a corpus that reflects specific domain knowledge. However, in the field of fashion, the professional corpus is insufficient. This study aims to develop a taxonomy and thesaurus that considers the specialty of fashion products. To this end, about 100,000 fashion vocabulary terms were collected by crawling text data from WSGN, Pantone, and online platforms; text subsequently was extracted through preprocessing with Python. The taxonomy was composed of items, silhouettes, details, styles, colors, textiles, and patterns/prints, which are seven attributes of clothes. The corpus was completed through processing synonyms of terms from fashion books such as dictionaries. Finally, 10,294 vocabulary words, including 1,956 standard Korean words, were classified in the taxonomy. All data was then developed into a web dictionary system. Quantitative and qualitative performance tests of the results were conducted through expert reviews. The performance of the thesaurus also was verified by comparing the results of text mining analysis through the previously developed corpus. This study contributes to achieving a text data standard and enables meaningful results of text mining analysis in the fashion field.

Study of Mental Disorder Schizophrenia, based on Big Data

  • Hye-Sun Lee
    • International Journal of Advanced Culture Technology
    • /
    • 제11권4호
    • /
    • pp.279-285
    • /
    • 2023
  • This study provides academic implications by considering trends of domestic research regarding therapy for Mental disorder schizophrenia and psychosocial. For the analysis of this study, text mining with the use of R program and social network analysis method have been used and 65 papers have been collected The result of this study is as follows. First, collected data were visualized through analysis of keywords by using word cloud method. Second, keywords such as intervention, schizophrenia, research, patients, program, effect, society, mind, ability, function were recorded with highest frequency resulted from keyword frequency analysis. Third, LDA (latent Dirichlet allocation) topic modeling result showed that classified into 3 keywords: patient, subjects, intervention of psychosocial, efficacy of interventions. Fourth, the social network analysis results derived connectivity, closeness centrality, betweennes centrality. In conclusion, this study presents significant results as it provided basic rehabilitation data for schizophrenia and psychosocial therapy through new research methods by analyzing with big data method by proposing the results through visualization from seeking research trends of schizophrenia and psychosocial therapy through text mining and social network analysis.

Text-Mining of Online Discourse to Characterize the Nature of Pain in Low Back Pain

  • Ryu, Young Uk
    • 대한물리의학회지
    • /
    • 제14권3호
    • /
    • pp.55-62
    • /
    • 2019
  • PURPOSE: Text-mining has been shown to be useful for understanding the clinical characteristics and patients' concerns regarding a specific disease. Low back pain (LBP) is the most common disease in modern society and has a wide variety of causes and symptoms. On the other hand, it is difficult to understand the clinical characteristics and the needs as well as demands of patients with LBP because of the various clinical characteristics. This study examined online texts on LBP to determine of text-mining can help better understand general characteristics of LBP and its specific elements. METHODS: Online data from www.spine-health.com were used for text-mining. Keyword frequency analysis was performed first on the complete text of postings (full-text analysis). Only the sentences containing the highest frequency word, pain, were selected. Next, texts including the sentences were used to re-analyze the keyword frequency (pain-text analysis). RESULTS: Keyword frequency analysis showed that pain is of utmost concern. Full-text analysis was dominated by structural, pathological, and therapeutic words, whereas pain-text analysis was related mainly to the location and quality of the pain. CONCLUSION: The present study indicated that text-mining for a specific element (keyword) of a particular disease could enhance the understanding of the specific aspect of the disease. This suggests that a consideration of the text source is required when interpreting the results. Clinically, the present results suggest that clinicians pay more attention to the pain a patient is experiencing, and provide information based on medical knowledge.

빅데이터 텍스트 마이닝 분석을 활용한 아메카지 패션 트렌드 특징 고찰 (A Study on the Characteristics of Amekaji Fashion Trends Using Big Data Text Mining Analysis)

  • 김지형
    • 패션비즈니스
    • /
    • 제26권3호
    • /
    • pp.138-154
    • /
    • 2022
  • The purpose of this study is to identify the characteristics of domestic American casual fashion trends using big data text mining analysis. 108,524 posts and 2,038,999 extracted keywords from Naver and Daum related to American casual fashion in the past 5 years were collected and refined by the Textom program, and frequency analysis, word cloud, N-gram, centrality analysis, and CONCOR analysis were performed. The frequency analysis, 'vintage', 'style', 'daily look', 'coordination', 'workwear', 'men's wear' appeared as the main keywords. The main nationality of the representative brands was Japanese, followed by American, Korean, and others. As a result of the CONCOR analysis, four clusters were derived: "general American casual trend", "vintage taste", "direct sales mania", and "American styling". This study results showed that Japanese American casual clothes are influenced by American casual clothes, and American casual fashion in Korea, which has been reinterpreted, is completed with various coordination and creative styles such as workwear, street, military, classic, etc., focusing on items and brands. Looks were worn and shared on social networks, and the existence of an active consumer group and market potential to obtain genuine products, ranging from second-hand transactions for limited edition vintages to individual transactions were also confirmed. The significance of this study is that it presented the characteristics of American casual fashion trends academically based on online text data that the public actually uses because it has been spread by the public.

텍스트 분석의 신뢰성 확보를 위한 스팸 데이터 식별 방안 (Detecting Spam Data for Securing the Reliability of Text Analysis)

  • 현윤진;김남규
    • 한국통신학회논문지
    • /
    • 제42권2호
    • /
    • pp.493-504
    • /
    • 2017
  • 최근 뉴스, 블로그, 소셜미디어 등을 통해 방대한 양의 비정형 텍스트 데이터가 쏟아져 나오고 있다. 이러한 비정형 텍스트 데이터는 풍부한 정보 및 의견을 거의 실시간으로 반영하고 있다는 측면에서 그 활용도가 매우 높아, 학계는 물론 산업계에서도 분석 수요가 증가하고 있다. 하지만 텍스트 데이터의 유용성이 증가함과 동시에 이러한 텍스트 데이터를 왜곡하여 특정 목적을 달성하려는 시도도 늘어나고 있다. 이러한 스팸성 텍스트 데이터의 증가는 방대한 정보 가운데 필요한 정보를 획득하는 일을 더욱 어렵게 만드는 것은 물론, 정보 자체 및 정보 제공 매체에 대한 신뢰도를 떨어뜨리는 현상을 초래하게 된다. 따라서 원본 데이터로부터 스팸성 데이터를 식별하여 제거함으로써, 정보의 신뢰성 및 분석 결과의 품질을 제고하기 위한 노력이 반드시 필요하다. 이러한 목적으로 스팸을 식별하기 위한 연구가 오피니언 스팸 탐지, 스팸 이메일 검출, 웹 스팸 탐지 등의 분야에서 매우 활발하게 수행되었다. 본 연구에서는 스팸 식별을 위한 기존의 연구 동향을 자세히 소개하고, 블로그 정보의 신뢰성 향상을 위한 방안 중 하나로 블로그의 스팸 태그를 식별하기 위한 방안을 제안한다.

Analyzing XR(eXtended Reality) Trends in South Korea: Opportunities and Challenges

  • Sukchang Lee
    • International Journal of Advanced Culture Technology
    • /
    • 제12권2호
    • /
    • pp.221-226
    • /
    • 2024
  • This study used text mining, a big data analysis technique, to explore XR trends in South Korea. For this research, I utilized a big data platform called BigKinds. I collected data focusing on the keyword 'XR', spanning approximately 14 years from 2010 to 2024. The gathered data underwent a cleansing process and was analyzed in three ways: keyword trend analysis, relational analysis, and word cloud. The analysis identified the emergence and most active discussion periods of XR, with XR devices and manufacturers emerging as key keywords.

웹 애플리케이션 기반의 텍스트 데이터 분석 모델 (Text Data Analysis Model Based on Web Application)

  • 진고환
    • 한국콘텐츠학회논문지
    • /
    • 제21권11호
    • /
    • pp.785-792
    • /
    • 2021
  • 4차 산업혁명 이후 인공지능, 빅 데이터와 같은 기술들의 발전으로 사회 전반에 다양한 변화가 일어나고 있으며, 핵심적인 기술 적용 과정에서 수집할 수 있는 데이터의 양도 급속하게 증가하고 있는 추세이다. 특히 학계에서는 연구 동향을 파악하기 위하여 기존에 생성된 문헌 데이터에 대한 분석이 이루어지고 있으며, 이러한 문헌 분석은 연구의 흐름을 정리하고, 어떤 연구 방법론이나 주제, 또는 현재 학계에서 화두가 되고 있는 대상에 대한 파악을 통하여 향후 연구 방향 설정에 많은 기여를 하고 있는 상황이다. 그러나 문서 데이터의 분석을 위하여 데이터 수집이 필요하나, 일반적으로 프로그램에 대한 전문 지식이 없는 경우 접근하기 어렵다. 본 논문에서는 텍스트 마이닝 기반의 토픽 모델링 웹 애플리케이션 모델을 제안한다. 제안 모델을 통하여 데이터 분석 기법에 대한 전문적인 지식이 부족하더라도, 연구 논문의 수집, 저장, 텍스트 분석과 같은 다양한 작업을 진행할 수 있으며, 연구자들이 선행 연구 분석과 연구 동향을 파악하기 위하여 데이터 분석에 투입되는 시간 및 노력을 단축시킬 수 있을 것으로 기대된다.

Analysis of Social Media Utilization based on Big Data-Focusing on the Chinese Government Weibo

  • Li, Xiang;Guo, Xiaoqin;Kim, Soo Kyun;Lee, Hyukku
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제16권8호
    • /
    • pp.2571-2586
    • /
    • 2022
  • The rapid popularity of government social media has generated huge amounts of text data, and the analysis of these data has gradually become the focus of digital government research. This study uses Python language to analyze the big data of the Chinese provincial government Weibo. First, this study uses a web crawler approach to collect and statistically describe over 360,000 data from 31 provincial government microblogs in China, covering the period from January 2018 to April 2022. Second, a word separation engine is constructed and these text data are analyzed using word cloud word frequencies as well as semantic relationships. Finally, the text data were analyzed for sentiment using natural language processing methods, and the text topics were studied using LDA algorithm. The results of this study show that, first, the number and scale of posts on the Chinese government Weibo have grown rapidly. Second, government Weibo has certain social attributes, and the epidemics, people's livelihood, and services have become the focus of government Weibo. Third, the contents of government Weibo account for more than 30% of negative sentiments. The classified topics show that the epidemics and epidemic prevention and control overshadowed the other topics, which inhibits the diversification of government Weibo.

Practical Text Mining for Trend Analysis: Ontology to visualization in Aerospace Technology

  • Kim, Yoosin;Ju, Yeonjin;Hong, SeongGwan;Jeong, Seung Ryul
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • 제11권8호
    • /
    • pp.4133-4145
    • /
    • 2017
  • Advances in science and technology are driving us to the better life but also forcing us to make more investment at the same time. Therefore, the government has provided the investment to carry on the promising futuristic technology successfully. Indeed, a lot of resources from the government have supported into the science and technology R&D projects for several decades. However, the performance of the public investments remains unclear in many ways, so thus it is required that planning and evaluation about the new investment should be on data driven decision with fact based evidence. In this regard, the government wanted to know the trend and issue of the science and technology with evidences, and has accumulated an amount of database about the science and technology such as research papers, patents, project reports, and R&D information. Nowadays, the database is supporting to various activities such as planning policy, budget allocation, and investment evaluation for the science and technology but the information quality is not reached to the expectation because of limitations of text mining to drill out the information from the unstructured data like the reports and papers. To solve the problem, this study proposes a practical text mining methodology for the science and technology trend analysis, in case of aerospace technology, and conduct text mining methods such as ontology development, topic analysis, network analysis and their visualization.

온라인 리뷰의 텍스트 마이닝에 기반한 한국방문 외국인 관광객의 문화적 특성 연구 (A study on cultural characteristics of foreign tourists visiting Korea based on text mining of online review)

  • 야오즈옌;김은미;홍태호
    • 한국정보시스템학회지:정보시스템연구
    • /
    • 제29권4호
    • /
    • pp.171-191
    • /
    • 2020
  • Purpose The study aims to compare the online review writing behavior of users in China and the United States through text mining on online reviews' text content. In particular, existing studies have verified that there are differences in online reviews between different cultures. Therefore, the purpose of this study is to compare the differences between reviews written by Chinese and American tourists by analyzing text contents of online reviews based on cultural theory. Design/methodology/approach This study collected and analyzed online review data for hotels, targeting Chinese and US tourists who visited Korea. Then, we analyzed review data through text mining like sentiment analysis and topic modeling analysis method based on previous research analysis. Findings The results showed that Chinese tourists gave higher ratings and relatively less negative ratings than American tourists. And American tourists have more negative sentiments and emotions in writing online reviews than Chinese tourists. Also, through the analysis results using topic modeling, it was confirmed that Chinese tourists mentioned more topics about the hotel location, room, and price, while American tourists mentioned more topics about hotel service. American tourists also mention more topics about hotels than Chinese tourists, indicating that American tourists tend to provide more information through online reviews.