• Title/Summary/Keyword: Text frequency analysis

A Structural Analysis of Acupuncture & Moxibustion Points in the NaeGyeong Chapter of DongUiBoGam Using Text Mining (텍스트마이닝을 이용한 동의보감의 질병인식방식과 내경편 침구법 경혈 특성 분석)

  • Lee, Taehyung;Jung, Won-Mo;Lee, In-Seon;Lee, Hyejung;Kim, Namil;Chae, Younbyoung
    • Korean Journal of Acupuncture / v.30 no.4 / pp.230-242 / 2013
  • Objectives: DongUiBoGam is a representative work of Korean medical literature. This research aims to grasp structurally how DongUiBoGam understands the human body and to review the acupuncture and moxibustion methods in its NaeGyeong chapter using text mining. Methods: The structure of DongUiBoGam was analyzed using the specific parts of the book that describe its contents, the major premises for understanding the human body, and the processes of treatment. We analyzed the characteristics of each acupoint in relation to causes of diseases and symptoms in the NaeGyeong chapter using Term Frequency-Inverse Document Frequency (TFIDF). Results: Three categories of pattern identification (PI) were formed after structural analysis of DongUiBoGam. All causes of diseases and symptoms were transformed according to the three PI categories. After analyzing the relationship between acupoints and causes of diseases and symptoms, 114 acupoints were visualized with the TFIDF values of the three PI categories. Conclusions: The selection of acupoints in the NaeGyeong chapter of DongUiBoGam was linked to causes of diseases and symptoms based on the three PI categories. Visualizing the bipartite relationships between acupoints and causes of diseases and symptoms made the characteristics of each acupoint easy to understand.
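The TFIDF weighting mentioned in this abstract can be illustrated with a minimal sketch. The acupoint names and symptom terms below are hypothetical toy data, and the formula shown (tf = count / document length, idf = ln(N / df)) is only one common variant; the abstract does not state which variant the authors used.

```python
import math
from collections import Counter

# Hypothetical toy data: each "document" is the list of symptom terms
# recorded for one acupoint. Names and terms are illustrative only.
docs = {
    "acupoint_A": ["cough", "phlegm", "cough", "fever"],
    "acupoint_B": ["fever", "headache"],
    "acupoint_C": ["cough", "headache", "headache"],
}

def tf_idf(docs):
    """Return {doc: {term: score}} with tf = count / doc length and
    idf = ln(N / df). This is an assumed variant, not the paper's."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for terms in docs.values() for term in set(terms))
    scores = {}
    for name, terms in docs.items():
        counts = Counter(terms)
        scores[name] = {
            term: (count / len(terms)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return scores

scores = tf_idf(docs)
```

A term that occurs for only one acupoint (here "phlegm") gets a higher idf than a term shared by several, which is what lets the score highlight what is characteristic of each acupoint.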

Text mining analysis of terms and information on product names used in online sales of women's clothing (텍스트마이닝을 활용한 온라인 판매 여성 의류 상품명에 나타난 용어 및 정보분석)

  • Yeo Sun Kang
    • The Research Journal of the Costume Culture / v.31 no.1 / pp.34-52 / 2023
  • In this study, text mining was conducted on the product names of skirts, pants, shirts/blouses, and dresses to analyze the characteristics of keywords appearing in online shopping product names. Frequency analysis showed that, for each item, around 30 keywords had a frequency of 0.5% or more and around 150 keywords had a frequency of 0.1% or more. The cumulative distribution rate of the 150 terms was around 80%. Accordingly, information on the 150 key terms was analyzed, from which item, clothing composition, and material information were found to be the most important types of information (ranking in the top five for all items). In addition, fit and style information for skirts and pants and length information for skirts and dresses were also considered important. Keywords representing clothing composition information were banding, high waist, and split for skirts and pants, and V-neck, tie, long sleeves, and puff for shirts/blouses and dresses. This information made it possible to identify the design characteristics currently preferred by consumers. However, there were also problems with terminology that hindered the connection between sellers and consumers. The most common problems were the use of various terms with the same meaning and the irregular use of Korean and English terms. Co-appearance frequency analysis suggests that this usage was rarely intended to increase product exposure, so it is recommended to avoid it.
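The frequency thresholds and cumulative distribution rate described in this abstract can be sketched as follows. The product names are hypothetical, and whitespace tokenization is an assumption made for brevity; the abstract does not describe the tokenizer used in the study.

```python
from collections import Counter

# Hypothetical product names (illustrative only).
product_names = [
    "high waist banding skirt",
    "long banding skirt",
    "high waist split skirt",
    "pleated long skirt",
]

def keyword_coverage(names):
    """Return (term, share, cumulative share) rows, most frequent first."""
    counts = Counter(word for name in names for word in name.split())
    total = sum(counts.values())
    rows, cumulative = [], 0.0
    for term, count in counts.most_common():
        share = count / total
        cumulative += share
        rows.append((term, share, cumulative))
    return rows

rows = keyword_coverage(product_names)
```

Reading down the cumulative column shows how many of the top terms are needed to cover a given share of all keyword occurrences.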

A Method for Short Text Classification using SNS Feature Information based on Markov Logic Networks (SNS 특징정보를 활용한 마르코프 논리 네트워크 기반의 단문 텍스트 분류 방법)

  • Lee, Eunji;Kim, Pankoo
    • Journal of Korea Multimedia Society / v.20 no.7 / pp.1065-1072 / 2017
  • As smart devices and social network services (SNSs) become increasingly pervasive, individuals produce large amounts of data in real time. Accordingly, studies on unstructured data analysis are actively being conducted to solve the resultant problem of information overload and to facilitate effective data processing. Many such studies aim to filter inappropriate information. In this paper, a feature-weighting method that considers SNS-message features is proposed for the classification of short text messages generated on SNSs, using Markov logic networks for category inference. The performance of the proposed method is verified through a comparison with existing frequency-based classification methods.

An Analysis on Learning Effects of Character Animation Based-Mobile Foreign Language Vocabulary Learning App (캐릭터 애니메이션 기반 모바일 외국어 어휘 학습 앱 효과 분석)

  • Kim, Insook;Choi, Minsuh;Ko, Hyeyoung
    • Journal of Korea Multimedia Society / v.21 no.12 / pp.1526-1533 / 2018
  • This study aims to provide implications for mobile foreign language vocabulary learning apps by analyzing the effects of a mobile vocabulary learning app based on character animation. For this purpose, we gave two groups of learners either a learning application designed with character animation and text or an application designed with text only, and analyzed the effects. As a result, we found that the application designed with character animation and text was useful in terms of recognition frequency and duration of learning. Regarding learning outcomes, we found that it was useful not only for memory but also for learning interest and motivation. This study provides implications for the learning methods and design development of mobile-based foreign language vocabulary learning applications, which have recently come into active use.

Social media big data analysis of Z-generation fashion (Z세대 패션에 대한 소셜미디어의 빅데이터 분석)

  • Sung, Kwang-Sook
    • Journal of the Korea Fashion and Costume Design Association / v.22 no.3 / pp.49-61 / 2020
  • This study analyzed social media accounts and performed a big data analysis of Z-generation fashion using the Textom text mining program and the Ucinet analysis program. The research results are as follows. First, keyword analysis of 67,646 Z-generation fashion social media posts from the last 5 years extracted 220,211 keywords. Among them, 67 major keywords were selected on the basis of a co-occurrence frequency of more than 250 times. Among the top keywords appearing more than 1,000 times, 'Z generation' (29,595 times) was overwhelmingly the most influential, followed by 'millennials' (18,536 times), 'fashion' (17,836 times), 'generation' (13,055 times), 'brand' (8,325 times), and 'trend' (7,310 times). Second, analysis of network degree centrality among the key keywords for the Z-generation showed that the number of nodes connected to 'Z-generation' (29,595 times) was overwhelmingly large, followed by 'millennial' (18,536 times), 'fashion' (17,836 times), 'generation' (13,055 times), 'brand' (8,325 times), and 'trend' (7,310 times). These keywords are considered important factors in exploring the reaction of social media to the Z-generation. Third, through CONCOR analysis, texts with structural equivalence among the major keywords for Gen Z fashion were rearranged and clustered. In addition, four clusters were derived by grouping through semantic network visualization. Group 1, with 54 texts, is 'Diverse Characteristics of Z-Generation Fashion Consumers'; Group 2, with 7 texts, is 'Z-Generation's teenagers Fashion Powers'; Group 3, with 8 texts, is 'Z-Generation's Celebrity Fashions' Interest and Fashion'; and Group 4, with one text, is named 'Gucci', the most popular luxury fashion of the Z-generation.

Trend Analysis of Fraudulent Claims by Long Term Care Institutions for the Elderly using Text Mining and BIGKinds (텍스트 마이닝과 빅카인즈를 활용한 노인장기요양기관 부당청구 동향 분석)

  • Youn, Ki-Hyok
    • Journal of Internet of Things and Convergence / v.8 no.2 / pp.13-24 / 2022
  • To explore the context of fraudulent claims by long-term care institutions for the elderly, which are increasing every year in Korea, and the measures for preventing them, this study conducted a text mining analysis of media reports. The articles were collected from the news big data analysis system 'BIG KINDS' for a period of about 15 years, from July 2008, when the Long-Term Care Insurance for the Elderly took effect, to February 28th 2022. For this period, a total of 2,627 articles were collected under keywords such as 'elderly care+fraudulent claims' and 'long-term care+fraudulent claims', and 946 of them were selected after excluding duplicate articles. The results of the text mining analysis are as follows. First, the top 10 keywords by frequency over the whole period (July 1st 2008-February 28th 2022) were, in order, long-term care institution for the elderly, fraudulent claims, National Health Insurance Service, Long-Term Care Insurance for the Elderly, long-term care benefits (expenses), elderly care facilities, the Ministry of Health & Welfare, the elderly, report, and reward (payment). Second, the N-gram analysis ranked the pairs in the order of long-term care benefits (expenses) and fraudulent claims, fraudulent claims and long-term care institution for the elderly, falsehood and fraudulent claims, report and reward (payment), and long-term care institution for the elderly and report. Third, the TF-IDF results were similar to those of the frequency analysis, although the rankings of report, reward (payment), and increase moved up. Based on these results, this study presented a future direction for the prevention of fraudulent claims by long-term care institutions for the elderly.
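The N-gram analysis reported in this abstract ranks pairs of adjacent keywords by how often they occur together. A minimal sketch of bigram counting follows; the token lists are hypothetical and assume the articles have already been reduced to keyword sequences, a step the abstract does not detail.

```python
from collections import Counter

# Hypothetical pre-tokenized article keywords (illustrative only).
articles = [
    ["long-term care", "fraudulent claims", "report", "reward"],
    ["fraudulent claims", "report", "reward"],
    ["long-term care", "fraudulent claims"],
]

def bigram_counts(token_lists):
    """Count adjacent token pairs (bigrams) within each document."""
    counts = Counter()
    for tokens in token_lists:
        # zip pairs each token with its successor in the same document.
        counts.update(zip(tokens, tokens[1:]))
    return counts

counts = bigram_counts(articles)
```

Calling `counts.most_common()` then yields the pairs in descending order of frequency, analogous to the ranking the abstract describes.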

A Study on the Changes in Consumer Perceptions of the Relationship between Ethical Consumption and Consumption Value: Focusing on Analyzing Ethical Consumption and Consumption Value Keyword Changes Using Big Data (윤리적 소비와 소비가치의 관계에 대한 소비자 인식 변화: 소셜 빅데이터를 활용한 윤리적 소비와 소비가치의 키워드 변화 분석을 중심으로)

  • Shin, Eunjung;Koh, Ae-Ran
    • Human Ecology Research / v.59 no.2 / pp.245-259 / 2021
  • The purpose of this study was to analyze big data to identify the sub-dimensions of ethical consumption, as well as the consumption value associated with ethical consumption that changes over time. For this study, data were collected from Naver and Daum using the keyword 'ethical consumption', and frequency and matrix data were extracted through Textom, for the period January 1, 2016, to December 31, 2018. In addition, a two-way mode network analysis was conducted using the UCINET 6.0 program and visualized using the NetDraw function. The results of text mining show keyword frequency increasing year-on-year, indicating that interest in ethical consumption has grown. The sub-dimensions derived for 2014 and 2015 are fair trade, ethical consumption, eco-friendly products, and cooperatives, and those for 2016 are fair trade, ethical consumption, eco-friendly products, and animal welfare. The consumption value keywords derived were classified as emotional value, social value, functional value, and conditional value. The influence of functional value was found to be growing over time. Network analysis of the relationship between the sub-dimensions of ethical consumption and the consumption values derived for each year from 2014 to 2018 showed a significantly strong correlation between eco-friendly product consumption and emotional value, social value, functional value, and conditional value.

A Case Study on Text Analysis Using Meal Kit Product Review Data (밀키트 제품 리뷰 데이터를 이용한 텍스트 분석 사례 연구)

  • Choi, Hyeseon;Yeon, Kyupil
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.1-15 / 2022
  • In this study, text analysis was performed on meal kit product review data to identify factors affecting the evaluation of meal kit products. The data used for the analysis were collected by scraping 334,498 reviews of meal kit products from the Naver shopping site. After preprocessing the text data, word clouds and sentiment analyses based on word frequency and normalized TF-IDF were performed. A logistic regression model was applied to predict the polarity of reviews of meal kit products. From the logistic regression models derived for each product category, the main factors that caused positive and negative emotions were identified. As a result, it was verified that text analysis can be a useful tool that provides a basis for maximizing positive factors for a specific category, menu, and material and for removing negative risk factors when developing a meal kit product.

Analysis of Keywords in national river occupancy permits by region using text mining and network theory (텍스트 마이닝과 네트워크 이론을 활용한 권역별 국가하천 점용허가 키워드 분석)

  • Seong Yun Jeong
    • Smart Media Journal / v.12 no.11 / pp.185-197 / 2023
  • This study used text mining and network theory to extract information useful for occupancy applications and permit processing from the permit register, which is otherwise used only for the simple purpose of recording occupancy permit information. Based on text mining, and after normalization steps such as stopword removal and morpheme analysis, we analyzed and compared the frequency of vocabulary occurrence and topic modeling in five regions, including Seoul, Gyeonggi, Gyeongsang, Jeolla, Chungcheong, and Gangwon. By applying four types of centrality measures widely used in network theory (degree, closeness, betweenness, and eigenvector), we examined keywords that occupy a central position or act as an intermediary in the network. A comprehensive analysis of vocabulary frequency, topic modeling, and network centrality showed that the 'installation' keyword was the most influential in all regions. This is believed to result from the Ministry of Environment's permit management office issuing many permits for constructing facilities or installing structures. In addition, keywords related to road facilities, flood control facilities, underground facilities, power/communication facilities, sports/park facilities, etc. were found to occupy a central position or act as an intermediary in topic modeling and networks. Most of the keywords appeared to follow a Zipf's law statistical distribution, with a low frequency of occurrence and a low distribution ratio.
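As an illustration of the network analysis in this abstract, the sketch below computes degree centrality, the simplest of the centrality measures used in network theory, on a keyword co-occurrence graph. The permit keyword sets are hypothetical, and building edges from co-occurrence within one record is an assumption; the abstract does not specify how the network was constructed.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical keyword sets, one per permit record (illustrative only).
records = [
    {"installation", "bridge", "road"},
    {"installation", "pipeline"},
    {"installation", "road"},
]

def degree_centrality(keyword_sets):
    """Build an undirected co-occurrence graph and return each keyword's
    degree divided by (n - 1), the usual normalization."""
    neighbors = defaultdict(set)
    for keywords in keyword_sets:
        # Link every pair of keywords that appear in the same record.
        for a, b in combinations(sorted(keywords), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    return {k: len(v) / (n - 1) for k, v in neighbors.items()}

centrality = degree_centrality(records)
```

In this toy graph "installation" co-occurs with every other keyword and so receives the maximum score, mirroring the kind of result the abstract reports for that keyword.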

Analysis of Hip-hop Fashion Codes in Contemporary Chinese Fashion

  • Sen, Bin;Haejung, Yum
    • Journal of Fashion Business / v.26 no.6 / pp.1-13 / 2022
  • The purpose of this study was to find out what types of fashion codes hip-hop fashion has in contemporary Chinese fashion, and the frequency and characteristics of each fashion code. Text mining, which is the most basic analysis method in big data analytics, was used rather than traditional design element analysis. Specific results were as follows. First, hip-hop initially entered China in the late 1970s. The most significant historical turning point was the American film "Breakin". Second, frequency and word cloud analysis results showed that the "national tide" fashion code was the most notable code. Third, through word embedding analysis, fashion codes were divided into three types: "original hip-hop codes", "trendy hip-hop codes", and "hip-hop codes grafted with traditional Chinese culture".