• Title/Summary/Keyword: tf-idf

A Study on the Meaning of The First Slam Dunk Based on Text Mining and Semantic Network Analysis

  • Kyung-Won Byun
    • International journal of advanced smart convergence / v.12 no.1 / pp.164-172 / 2023
  • In this study, we examine public perception of 'The First Slam Dunk', a sports-based cartoon that is gaining popularity, through big data analysis of social media channels, and provide basic data for developing various contents in the sports industry. Social big data were collected from news provided on Naver and Google. Data were collected from January 1, 2023 to February 15, 2023, with reference to the release date of 'The First Slam Dunk' in Korea. The collected data comprised 2,106 Naver news items and 1,019 Google news items. TF and TF-IDF were analyzed through text mining of these data, and semantic network analysis was then conducted for 60 keywords. Big data analysis programs such as Textom and UCINET were used for the social big data analysis, and NetDraw was used for visualization. Considering TF and TF-IDF, the most frequent keyword related to the subject was 'The First Slam Dunk', which appeared 4,079 times, followed by 'Slam Dunk', 'Movie', 'Premiere', 'Animation', 'Audience', and 'Box-Office'. Based on these results, 60 high-frequency keywords were extracted, and semantic metrics and centrality analysis were conducted. Finally, a total of 6 clusters (competing movie, cartoon, passion, premiere, attention, Box-Office) were formed through CONCOR analysis. This analysis of the semantic network of 'The First Slam Dunk' provides basic data for planning the development of sports content.

A Suggestion for Spatiotemporal Analysis Model of Complaints on Officially Assessed Land Price by Big Data Mining (빅데이터 마이닝에 의한 공시지가 민원의 시공간적 분석모델 제시)

  • Cho, Tae In;Choi, Byoung Gil;Na, Young Woo;Moon, Young Seob;Kim, Se Hun
    • Journal of Cadastre & Land InformatiX / v.48 no.2 / pp.79-98 / 2018
  • The purpose of this study is to suggest a model for analysing the spatio-temporal characteristics of civil complaints about the officially assessed land price based on big data mining. Specifically, the underlying reasons for the civil complaints were sought from spatio-temporal perspectives rather than institutional factors, and a model was suggested for monitoring trends in the occurrence of such complaints. The official documents of 6,481 civil complaints about the officially assessed land price in the district of Jung-gu, Incheon Metropolitan City, over the period from 2006 to 2015, along with their temporal and spatial properties, were collected and used for the analysis. Frequencies of major key words were examined using a text mining method. Correlations among major key words were studied through social network analysis. By calculating term frequency (TF) and term frequency-inverse document frequency (TF-IDF), which correspond to the weighted values of key words, we identified the major key words associated with the occurrence of civil complaints about the officially assessed land price. The spatio-temporal characteristics of the civil complaints were then examined through hot spot analysis based on the Getis-Ord $G_i^*$ statistic. It was found that the characteristics of the civil complaints were changing, forming clusters that are linked spatio-temporally. Using text mining and social network analysis, we found that the reasons for the occurrence of civil complaints about the officially assessed land price can be identified quantitatively from natural language. TF and TF-IDF, the weighted values of key words, can be used as main explanatory variables for analysing the spatio-temporal characteristics of these civil complaints, since these statistics differ over time and across regions.
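The TF and TF-IDF weights named in this abstract can be illustrated with a minimal sketch. It assumes raw-count TF and idf = log(N / df), which is only one common variant; the abstract does not state which weighting was used, and the function name `tf_idf` and the sample documents are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: TF is the raw count and idf = log(N / df).
    Real tools often use smoothed or normalized variants.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights

# Hypothetical toy corpus, already tokenized by whitespace.
docs = [
    "land price complaint land".split(),
    "land tax notice".split(),
    "price appeal notice".split(),
]
scores = tf_idf(docs)
```

Under this weighting, a term that occurs in every document gets weight 0, while a term concentrated in a few documents is weighted up.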

Keyword Weight based Paragraph Extraction Algorithm (문단 가중치 분석 기반 본문 영역 선정 알고리즘)

  • Lee, Jongwon;Yu, Seongjong;Kim, Doan;Jung, Hoekyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2018.05a / pp.462-463 / 2018
  • Traditional document analysis systems perform word-based analysis using a morphological analyzer or the TF-IDF technique. These systems have the advantage of being able to derive key keywords by calculating keyword weights. On the other hand, because of their structural limitations, they are not well suited to analyzing the contents of documents. To solve this problem, the proposed algorithm calculates the weights of the keywords in the document and divides the paragraphs into areas. It then calculates the importance of the divided areas and tells the user which area contains the most important paragraphs in the document. The user is therefore expected to receive a service better suited to analyzing documents than existing document analysis systems.

Music information and musical propensity analysis, and music recommendation system using collaborative filtering (음악정보와 음악적 성향 분석 및 협업 필터링을 이용한 음악추천시스템)

  • Gong, Minseo;Hong, Jinju;Choi, Jaehyun
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.05a / pp.533-536 / 2015
  • The mobile music market is growing. However, current services do not accurately recommend music that a user is likely to prefer. This paper therefore proposes a music recommendation system. The system recommends music that users are likely to prefer by analyzing music information and the user's musical propensity and by applying collaborative filtering. It classifies genre and extracts features (ZCR, spectral roll-off, and spectral flux) that can be obtained using the STFT, and similar music is clustered by these features. Then, after classifying the mood of each song's lyrics, it recommends music automatically using collaborative filtering.

Implementation of a Web Document Clustering System Using Word2Vec (Word2Vec을 이용한 웹 문서 클러스터링 시스템 구현)

  • Yi, Hyun Seok;Ahn, Sung Hun;Lee, Yong Hwan;Cheon, Myung Jae;Park, Hyeok Ju;Park, Mee Hwa;Lee, Yong Kyu
    • Proceedings of the Korea Information Processing Society Conference / 2016.10a / pp.26-29 / 2016
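  • Web document recommendation systems suffer from a data sparsity problem: documents with similar content are recognized as different documents because their URLs differ, and are recommended to users as such. Previous studies used TF-IDF to address this problem, but that approach has cost and time limitations and has difficulty classifying synonyms. In this paper, we solve the problem with a web document learning system based on Word2Vec. The proposed system collects news from media outlets, preprocesses it by analyzing and processing it into a standardized format, generates document vectors through Word2Vec training, and classifies them into groups of similar documents with K-Means clustering. Using this system not only solves the data sparsity problem but also reduces the amount of computation compared with TF-IDF and yields higher similarity when classifying synonyms.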

An Effective Metric for Measuring the Degree of Web Page Changes (효과적인 웹 문서 변경도 측정 방법)

  • Kwon, Shin-Young;Kim, Sung-Jin;Lee, Sang-Ho
    • Journal of KIISE:Databases / v.34 no.5 / pp.437-447 / 2007
  • A variety of similarity metrics have been used to measure the degree of web page changes. In this paper, we first define criteria for web page changes to evaluate the effectiveness of the similarity metrics in terms of six important types of web page changes. Second, we propose a new similarity metric appropriate for measuring the degree of web page changes. Using real web pages and synthesized pages, we analyze five existing metrics (i.e., the byte-wise comparison, the TF-IDF cosine distance, the word distance, the edit distance, and the shingling) and ours under the proposed criteria. The analysis results show that our metric represents the changes more effectively than the other metrics. We expect that our study can help users select an appropriate metric for particular web applications.
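As an illustration of one of the existing metrics listed in this abstract, shingling can be sketched as follows. This is not the paper's proposed metric: it is a minimal sketch that assumes word-level shingles of width 3 and takes the degree of change as 1 minus the Jaccard resemblance of the two shingle sets. The function names and sample pages are hypothetical.

```python
def shingles(text, w=3):
    """Return the set of contiguous w-word shingles in text."""
    words = text.split()
    if len(words) < w:
        # Short texts yield a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def shingle_change(old, new, w=3):
    """Degree of change: 1 - Jaccard resemblance of the shingle sets."""
    a, b = shingles(old, w), shingles(new, w)
    if not a and not b:
        return 0.0  # Two empty pages are treated as unchanged.
    return 1.0 - len(a & b) / len(a | b)

# Hypothetical page versions that differ in one word.
old_page = "the quick brown fox jumps over the lazy dog"
new_page = "the quick brown fox leaps over the lazy dog"
change = shingle_change(old_page, new_page)
```

Changing a single word alters every shingle that overlaps it, so this measure reacts more strongly to small edits than a bag-of-words comparison would.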

An Evaluation of Twitter Ranking Using the Retweet Information (재전송 정보를 활용한 트위터 랭킹의 정확도 평가)

  • Chang, Jae-Young
    • The Journal of Society for e-Business Studies / v.17 no.2 / pp.73-85 / 2012
  • Recently, as Social Network Services (SNS) such as Twitter and Facebook have become more popular, much research on them has been actively conducted. However, since SNS were launched only recently, the related research is still in its infancy. In particular, the search engines provided by web portals simply show postings in order of upload time. Searching postings on Twitter should differ from web search, which is based on traditional TF-IDF. In this paper, we present a new method for searching and ranking interesting postings on Twitter. The proposed method uses the frequency of retweets as a major factor for estimating the quality of postings. This can be an important criterion, since users tend to retweet valuable postings. Experimental results show that the proposed method can be applied successfully in a Twitter search system.

Implementation of Search Method based on Sequence and Adjacency Relationship of User Query (사용자 검색 질의 단어의 순서 및 단어간의 인접 관계에 기반한 검색 기법의 구현)

  • So, Byung-Chul;Jung, Jin-Woo
    • Journal of the Korean Institute of Intelligent Systems / v.21 no.6 / pp.724-729 / 2011
  • Information retrieval is a method for finding the data that users need. Generally, when a user searches a large-scale data set such as the internet, ranking-based search is widely used because it is not easy to find exactly the needed data at once. In this paper, we propose a novel ranking-based search method based on the sequence and adjacency relationships of the words in a user query, using TF-IDF and n-grams. As a result, the needed data could be found more accurately, with 73% accuracy in a data set of more than 19,000 items.

Movie Retrieval System by Analyzing Sentimental Keyword from User's Movie Reviews (사용자 영화평의 감정어휘 분석을 통한 영화검색시스템)

  • Oh, Sung-Ho;Kang, Shin-Jae
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.3 / pp.1422-1427 / 2013
  • This paper proposes a movie retrieval system based on sentimental keywords extracted from users' movie reviews. First, a sentimental keyword dictionary is manually constructed by applying morphological analysis to users' movie reviews, and then the keyword weights in the dictionary are calculated for each movie with TF-IDF. Using these results, the proposed system classifies movies into sentimental categories and ranks the classified movies. Without reading any movie reviews, users can retrieve movies through queries composed of sentimental keywords.

A Clustering Technique Using Association Rules for The Library and Information Science Terminology (연관규칙을 이용한 문헌정보학 전문용어 클러스터링 기법에 관한 연구)

  • Seung, Hyon-Woo;Park, Mi-Young
    • Journal of the Korean Society for Library and Information Science / v.37 no.2 / pp.89-105 / 2003
  • In this paper, an effective method for clustering terminologies extracted from text is proposed, in order to develop a search engine that extracts relevant information from large web documents. To avoid generating many meaningless association rules among general terminologies, only useful association rules among terminologies are produced, using database tables that consist of domain-specific terminologies. Such association rules are produced by applying the Apriori algorithm after forming transaction units from groups of association rules in a document. The group of association rules produced from a terminology forms a cluster.
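The Apriori algorithm mentioned in this abstract finds frequent itemsets level by level, and association rules are then derived from those itemsets. The following is a minimal sketch of the frequent-itemset stage only. It assumes that support is an absolute transaction count and that each transaction is a set of terms; the abstract does not specify how the authors build their transaction units, and the function name and sample terms are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # Keep candidates contained in at least min_support transactions.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        return {c: n for c, n in counts.items() if n >= min_support}

    # Level 1: frequent single items.
    frequent = count({frozenset([i]) for t in transactions for i in t})
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        frequent = count(candidates)
        result.update(frequent)
        k += 1
    return result

# Hypothetical transactions: terms that co-occur in one document unit.
transactions = [
    {"index", "catalog", "retrieval"},
    {"index", "retrieval"},
    {"catalog", "retrieval"},
    {"index", "catalog", "retrieval"},
]
frequent = apriori(transactions, min_support=3)
```

The prune step is what keeps the search tractable: a candidate is counted only if all of its smaller subsets were already frequent.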