• Title/Abstract/Keywords: TF-IDF analysis


Social network analysis of keyword community network in IoT patent data (키워드 커뮤니티 네트워크의 소셜 네트워크 분석을 이용한 사물 인터넷 특허 분석)

  • Kim, Do Hyun;Kim, Hyon Hee;Kim, Donggeon;Jo, Jinnam
    • The Korean Journal of Applied Statistics / v.29 no.4 / pp.719-728 / 2016
  • In this paper, we analyzed IoT patent data by applying social network analysis to keyword community networks built from patents related to Internet of Things technology. To identify differences in IoT patent trends between Korea and the USA, 100 Korean patents and 100 US patents were collected. First, we extracted important keywords from IoT patent abstracts using TF-IDF weights and their correlations, and then constructed a keyword network from the selected keywords. Second, we constructed a keyword community network based on the keyword communities and performed social network analysis. Our experimental results showed that while Korean patents focus on the core technologies of IoT (such as security, semiconductors, and image processing), US patents focus on applications of IoT (such as the smart home, interactive media, and telecommunications).
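
The keyword-extraction step described above can be sketched as follows. This is a minimal illustration, assuming whitespace tokenization, tf as relative term frequency, and idf as log(N/df); the abstract does not state which TF-IDF variant the authors used, and the toy `abstracts` and the helper names `tf_idf` and `top_keywords` are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: score} dict per document.

    tf  = term count / document length
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(score, k=2):
    """Return the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy "abstracts"; not data from the paper.
abstracts = [
    "iot sensor security chip security",
    "iot smart home media service",
    "iot sensor network telecommunications service",
]
scores = tf_idf(abstracts)
keywords = [top_keywords(s) for s in scores]
```

A term such as "iot" that occurs in every document gets an idf of zero under this variant, which is why it drops out of the keyword list even though it is frequent.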

Analysis of Symptoms-Herbs Relationships in Shanghanlun Using Text Mining Approach (텍스트마이닝 기법을 이용한 『상한론』 내의 증상-본초 조합의 탐색적 분석)

  • Jang, Dongyeop;Ha, Yoonsu;Lee, Choong-Yeol;Kim, Chang-Eop
    • Journal of Physiology & Pathology in Korean Medicine / v.34 no.4 / pp.159-169 / 2020
  • Shanghanlun (Treatise on Cold Damage Diseases) is the oldest document in the literature on clinical records of Traditional Asian medicine (TAM), on which TAM theories about symptoms-herbs relationships are based. In this study, we aim to quantitatively explore the relationships between symptoms and herbs in Shanghanlun. The text of Shanghanlun was converted into structured data. Using the structured data, Term Frequency - Inverse Document Frequency (TF-IDF) scores of symptoms and herbs were calculated for each chapter to derive the major symptoms and herbs in each chapter. To understand the structure of the entire document, principal component analysis (PCA) was performed on the 6-dimensional chapter space. Bipartite network analysis was conducted focusing on Jaccard scores between symptoms and herbs and on the eigenvector centralities of nodes. TF-IDF scores showed the characteristics of each chapter through its major symptoms and herbs. Principal components drawn by PCA suggested the overall structure of Shanghanlun. The network analysis revealed a 'multi herbs - multi symptoms' relationship. Common symptoms and herbs were identified by the high eigenvector centralities of their nodes, while specific symptoms and herbs were identified by low centralities. The symptoms expected to be treated by each herb were also derived. Using measurable metrics, we conducted a computational study on patterns in Shanghanlun. Quantitative research on TAM theories will contribute to improving the clarity of those theories.
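
As a rough illustration of the Jaccard scores mentioned above, the sketch below weights each symptom-herb edge of a bipartite network by the overlap of the clauses in which the two appear. How the authors actually defined the sets is not stated in the abstract, so the set definition, the clause IDs, and the names `symptom_clauses` and `herb_clauses` are all assumptions.

```python
def jaccard(a, b):
    """Jaccard score |A & B| / |A | B| of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical clause IDs in which each symptom or herb appears.
symptom_clauses = {
    "fever": {1, 2, 3, 5},
    "chills": {1, 2},
}
herb_clauses = {
    "cinnamon twig": {1, 2, 3},
    "licorice": {1, 2, 3, 4, 5, 6},
}

# Edge weights of the bipartite symptom-herb network.
edges = {
    (symptom, herb): jaccard(s_set, h_set)
    for symptom, s_set in symptom_clauses.items()
    for herb, h_set in herb_clauses.items()
}
```

In this toy data, a herb that appears in nearly every clause ("licorice") gets only moderate scores with every symptom, while a more selective herb scores high with the symptoms it co-occurs with.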

Music Lyrics Summarization Method using TextRank Algorithm (TextRank 알고리즘을 이용한 음악 가사 요약 기법)

  • Son, Jiyoung;Shin, Yongtae
    • Journal of Korea Multimedia Society / v.21 no.1 / pp.45-50 / 2018
  • This paper describes how to summarize music lyrics using the TextRank algorithm. The method condenses a song's lyrics into its most important lines. As a result, music can be recommended more effectively than with approaches that analyze word counts.
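
A minimal sketch of TextRank-style extractive summarization is given below, assuming the word-overlap sentence similarity and damping factor of 0.85 from the original TextRank formulation. The abstract does not describe the authors' preprocessing or parameters, so the whitespace tokenization, the iteration count, and the toy `lines` are hypothetical.

```python
import math


def similarity(a, b):
    """Shared words divided by the sum of the log sentence lengths."""
    denom = math.log(len(a)) + math.log(len(b))
    if denom == 0:  # both sentences have a single token
        return 0.0
    return len(set(a) & set(b)) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences with weighted PageRank over a similarity graph."""
    tokens = [s.lower().split() for s in sentences]
    n = len(tokens)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(sentences, k=2):
    """Return the k best-scoring sentences in their original order."""
    scores = textrank(sentences)
    best = sorted(range(len(sentences)), key=lambda i: -scores[i])[:k]
    return [sentences[i] for i in sorted(best)]


# Hypothetical lyric lines; not data from the paper.
lines = [
    "love me tonight and hold me tight",
    "hold me tight under the city light",
    "rain falls on an empty road",
    "love me tonight under the city light",
]
scores = textrank(lines)
summary = summarize(lines, k=2)
```

A line that shares no words with any other line receives only the baseline score of 1 - damping, so it is never selected ahead of lines that echo the rest of the song.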

Analysis of major components of YouTube fishing content (유튜브 낚시성 콘텐츠의 주요 구성요소 분석)

  • Lee, Seo-Woo;Jo, Mi-jeong;Chae, Eun-bi;Kim, Hae-in
    • Proceedings of the Korea Information Processing Society Conference / 2022.11a / pp.779-781 / 2022
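  • In this study, we analyzed thumbnails and titles, the main components of clickbait ("fishing") content, using MLKit and TF-IDF, and applied the results to a deep learning Sentence BERT model. We plan to use this work to develop an algorithm that filters out clickbait content.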

Keyword Analysis of Research on Consumption of Children and Adolescents Using Text Mining (텍스트마이닝을 활용한 아동, 청소년 대상 소비관련 연구 키워드 분석)

  • Jin, Hyun-Jeong
    • Journal of Korean Home Economics Education Association / v.33 no.4 / pp.1-13 / 2021
  • The purpose of this study is to identify trends and potential themes in research on the consumption of children and adolescents over 20 years by analyzing keywords. The keywords of 869 studies on the consumption of children and adolescents published in journals listed in the Korean Citation Index were analyzed using text mining techniques. The most frequent keywords were, in order, youth, youth consumers, consumer education, conspicuous consumption, consumption behavior, and character. Analyzing keyword frequency in five-year periods confirmed that the frequency of consumer education was significantly higher between 2006 and 2010. Research on ethical consumption has been active since 2011, and during the most recent five-year period research has covered various topics without a single prominent keyword. Based on TF-IDF, keywords related to the environment and the Internet were the main keywords between 2001 and 2005. From 2006 to 2010, the TF-IDF values of media use, advertisement education, and Internet items were high. From 2011 to 2015, fair trade, green growth, green consumption, North Korean defector youths, and social media appeared as important themes, as did text mining, sustainable development education, maker education, and the 2015 revised curriculum from 2016 to 2020. Topic modeling derived eight topics: consumer education, mass media/peer culture, rational consumption, Hallyu/cultural industry, consumer competency, economic education, teaching and learning method, and eco-friendly/ethical consumption. Network analysis showed that conspicuous consumption and consumer education are important topics in consumption research on children and adolescents.

Impact of Diverse Document-evaluation Measure-based Searching Methods in Big Data Search Accuracy (빅데이터 검색 정확도에 미치는 다양한 측정 방법 기반 검색 기법의 효과)

  • Kim, Ji young;Han, DaHyeon;Kim, Jongkwon
    • Journal of KIISE / v.44 no.5 / pp.553-558 / 2017
  • With the rapid growth of Big Data, research on extracting meaningful information is being pursued by both academia and industry. In particular, the data characteristics derived from analysis and the researcher's intention are key factors for search algorithms to obtain accurate output. Therefore, properly reflecting both data characteristics and researcher intention is the final goal of data analysis research. Properly analyzed data can help increase users' loyalty to the service a company provides and help them use information more effectively and efficiently. In this paper, we explore various document-evaluation methods in order to improve the accuracy of article search, one of the most frequently used kinds of search in real life. We also analyze the experimental results and suggest appropriate ways to use the various methods.

Design of a Mirror for Fragrance Recommendation based on Personal Emotion Analysis (개인의 감성 분석 기반 향 추천 미러 설계)

  • Hyeonji Kim;Yoosoo Oh
    • Journal of Korea Society of Industrial Information Systems / v.28 no.4 / pp.11-19 / 2023
  • This paper proposes a smart mirror system that recommends fragrances based on user emotion analysis. It combines natural language processing techniques, namely embedding techniques (CounterVectorizer and TF-IDF), with machine learning classification models (DecisionTree, SVM, RandomForest, SGD Classifier) to build models and compares the results. After the comparison, the paper constructs a personal emotion-based fragrance recommendation mirror model from the best-performing emotion classifier, which is based on an SVM and word embedding pipeline. The proposed system implements a personalized fragrance recommendation mirror based on emotion analysis and provides web services using the Flask web framework. It uses the Google Speech Cloud API to recognize users' voices and speech-to-text (STT) to convert the speech into text data. The proposed system also provides users with information about weather, humidity, location, quotes, time, and schedule management.

Text Mining Analysis of Customer Reviews on Public Service Robots: With a focus on the Guide Robot Cases (텍스트 마이닝을 활용한 공공기관 서비스 로봇에 대한 사용자 리뷰 분석 : 안내로봇 사례를 중심으로)

  • Hyorim Shin;Junho Choi;Changhoon Oh
    • The Journal of the Convergence on Culture Technology / v.9 no.1 / pp.787-797 / 2023
  • The use of service robots, particularly guide robots, is becoming increasingly prevalent in public institutions. However, there has been limited research into the interactions between users and guide robots. To explore the customer experience with guide robots, we selected 'QI', which has been meeting customers for the longest time, and collected all reviews since the service was launched in public institutions. Using text mining techniques, we identified the main keywords and user experience factors and examined factors that hinder user experience. The results show that the guide robot's functionality, appearance, interaction methods, and role as a cultural commentator and helper were key factors that influenced the user experience. After identifying hindrance factors, we suggested solutions such as improved interaction design, multimodal interface service design, and content development. This study contributes to the understanding of user experience with guide robots and provides practical suggestions for improvement.

A Study on the Perception of Artificial Intelligence Literacy and Artificial Intelligence Convergence Education Using Text Mining Analysis Techniques (텍스트 마이닝 분석기법을 활용한 인공지능 리터러시 및 인공지능 융합 교육에 관한 인식 연구)

  • Hyeok Yun;Jeongrang Kim
    • Journal of The Korean Association of Information Education / v.26 no.6 / pp.553-566 / 2022
  • This study collects social data and academic research data from portal sites and RISS, and applies TF-IDF, N-Gram, semantic network analysis, and CONCOR analysis to examine the social awareness and current state of 'AI Literacy' and 'AI Convergence Education'. Through this, we sought to understand social awareness and the current situation, and to suggest implications and directions. In the social data, more than twice as much data was collected for 'AI Convergence Education' as for 'AI Literacy', indicating that awareness of 'AI Literacy' was relatively low. In 'AI Literacy', the keyword 'human' in the social data did not belong to any cluster, indicating a lack of philosophical interest in and awareness of humanities and AI. In addition, the keyword 'Ministry of Education' showed high frequency, importance, and connection centrality only in the social data of 'AI Convergence Education', confirming that 'AI Convergence Education' is closely related to government policy.
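
The N-Gram step named above can be illustrated with a short bigram count. This is only a sketch under the assumption that the text has already been tokenized; the toy `posts` and the helper name `ngrams` are hypothetical and do not come from the study.

```python
from collections import Counter


def ngrams(tokens, n=2):
    """Return the list of consecutive n-token tuples in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical tokenized posts; not data from the study.
posts = [
    ["ai", "convergence", "education", "policy"],
    ["ai", "literacy", "education"],
    ["ai", "convergence", "education", "teacher", "training"],
]

# Count every bigram across all posts.
bigram_counts = Counter(bg for post in posts for bg in ngrams(post, 2))
most_common = bigram_counts.most_common(2)
```

Counting adjacent pairs rather than single words keeps phrases such as "convergence education" together, which is what makes N-Gram output useful as input to a semantic network.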

Detection of Depression Trends in Literary Cyber Writers Using Sentiment Analysis and Machine Learning

  • Faiza Nasir;Haseeb Ahmad;CM Nadeem Faisal;Qaisar Abbas;Mubarak Albathan;Ayyaz Hussain
    • International Journal of Computer Science & Network Security / v.23 no.3 / pp.67-80 / 2023
  • Nowadays, psychologists consider social media an important tool to examine mental disorders. Among these disorders, depression is one of the most common yet least cured diseases. Since many writers with extensive followings express their feelings on social media, and depression is increasing significantly, exploring the literary text shared on social media may provide multidimensional features of depressive behaviors: (1) Background: Several studies observed that depressive data contains certain language styles and self-expressing pronouns, but the current study provides evidence that posts containing self-expressing pronouns and depressive language styles have high emotional temperatures. Therefore, the main objective of this study is to examine literary cyber writers' posts to discover symptomatic signs of depression. For this purpose, our research focuses on extracting data from writers' public social media pages, blogs, and communities; (3) Results: To examine the emotional temperatures and sentence usage of the depressive and non-depressive groups, we employed the SentiStrength algorithm as a psycholinguistic method, TF-IDF and N-Gram for ranked phrase extraction, and Latent Dirichlet Allocation for topic modelling of the extracted phrases. The results reveal a strong connection between depression and negative emotional temperatures in writers' posts. Moreover, we used Naïve Bayes, Support Vector Machines, Random Forest, and Decision Tree algorithms to validate the classification into depressive and non-depressive in terms of sentences, phrases, and topics. The results reveal that, compared with the others, the Support Vector Machines algorithm validates the classification while attaining the highest f-score of 79%; (4) Conclusions: Experimental results show that the proposed system performs well in detecting depression trends in literary cyber writers using sentiment analysis.
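
For readers unfamiliar with the f-score reported above, the sketch below computes it for a binary classifier, assuming the balanced F1 variant (the harmonic mean of precision and recall); the abstract does not say which variant or averaging scheme was used. The labels in `y_true` and `y_pred` are hypothetical and are not results from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:  # no true positives: precision or recall is zero or undefined
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: 1 = depressive post, 0 = non-depressive post.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
score = f1_score(y_true, y_pred)
```

Here three of the four depressive posts are found and one of the four flagged posts is a false alarm, so precision, recall, and F1 all equal 0.75.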