• Title/Summary/Keyword: TF-IDF analysis

Comparative Analysis in Perception on Men's Fashion Using Big Data : Focused on Influence of COVID-19 (빅 데이터를 활용한 코로나19 이전과 이후의 남성 패션에 대한 인식 비교)

  • Kim, Do-Hyeon;Kim, Jeong-Mee
    • Journal of the Korea Fashion and Costume Design Association / v.24 no.3 / pp.1-15 / 2022
  • The purpose of this study is to compare and analyze the perception of men's fashion before and after the COVID-19 pandemic. Big data were collected with TEXTOM using the search term 'men's fashion'. For data collection, Jan. 1, 2018 to Dec. 31, 2019 was set as the pre-COVID-19 era, and Jan. 1, 2020 to Dec. 31, 2021 as the post-COVID-19 era. The top 50 words by appearance frequency were extracted from the data and processed with network centrality analysis and CONCOR analysis in Ucinet 6. The research findings were as follows. 1) In the pre-COVID-19 era, the appearance frequency of 'men' was the highest, followed by 'fashion', 'men's fashion', 'brand', 'daily look', 'suit', and 'department store'. These words had high TF-IDF values. Network centrality analysis showed that 'men', 'fashion', 'men's fashion', 'brand', and 'suit' had a high level of connectivity with other words. CONCOR analysis showed four significant groups: 'fashion item and styles', 'fashion show', 'purchase', and 'collection'. 2) In the post-COVID-19 era, the appearance frequency of 'men' was the highest, followed by 'fashion', 'brand', 'men's fashion', 'discount', 'women', and 'luxury'. These words also displayed high TF-IDF values. Network centrality analysis found that 'fashion', 'men', 'brand', 'men's fashion', and 'discount' had a high level of connectivity with other words. CONCOR analysis showed four significant groups: 'fashion item and style', 'fashion show', 'purchase', and 'situation'. 3) Before the outbreak of the pandemic, men were interested in suits to wear to the office, daily looks, and fashion shows in Milan and Paris, and they often purchased menswear in multi-brand and open stores. After the outbreak, however, they became more interested in sneakers, casual styles, and online fashion shows as social distancing and working from home became common, and most purchased menswear through online platforms.

A Study on Machine Learning Based Anti-Analysis Technique Detection Using N-gram Opcode (N-gram Opcode를 활용한 머신러닝 기반의 분석 방지 보호 기법 탐지 방안 연구)

  • Kim, Hee Yeon;Lee, Dong Hoon
    • Journal of the Korea Institute of Information Security & Cryptology / v.32 no.2 / pp.181-192 / 2022
  • The emergence of new malware is incapacitating existing signature-based malware detection techniques, and the application of various anti-analysis techniques makes such malware difficult to analyze. Recent studies on signature-based malware detection are limited in that malware creators can easily bypass them. Therefore, in this study, we aim to build a machine learning model that detects and classifies the anti-analysis techniques of packers applied to malware, rather than relying on the characteristics of the malware itself. N-gram opcodes are extracted from malicious binaries to which various anti-analysis techniques of commercial packers have been applied, features are extracted from them using TF-IDF, and each anti-analysis technique is then detected and classified. Real-world malware samples packed with Themida and VMProtect using multiple anti-analysis techniques were used to train and test six machine learning models, and the best model achieved 81.25% accuracy for Themida and 95.65% accuracy for VMProtect.
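
The feature-extraction step described in this abstract (n-gram opcodes weighted by TF-IDF) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the choice of n = 2, the unsmoothed IDF, and the toy opcode traces are all hypothetical.

```python
import math
from collections import Counter


def opcode_ngrams(opcodes, n=2):
    """Return the list of n-grams (as tuples) in an opcode sequence."""
    return [tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1)]


def tfidf_features(samples, n=2):
    """Compute one {n-gram: TF-IDF weight} dict per sample."""
    grams_per_sample = [Counter(opcode_ngrams(s, n)) for s in samples]
    num_samples = len(samples)

    # Document frequency: number of samples that contain each n-gram.
    doc_freq = Counter()
    for counts in grams_per_sample:
        doc_freq.update(counts.keys())

    features = []
    for counts in grams_per_sample:
        total = sum(counts.values())
        features.append({
            gram: (count / total) * math.log(num_samples / doc_freq[gram])
            for gram, count in counts.items()
        })
    return features


# Hypothetical opcode traces (illustrative only, not real malware data).
samples = [
    ["push", "mov", "call", "pop", "ret"],
    ["push", "mov", "xor", "jmp", "ret"],
    ["rdtsc", "mov", "rdtsc", "sub", "cmp"],
]
features = tfidf_features(samples, n=2)
```

Each resulting dict could then be mapped to a fixed-length vector and passed to a classifier. The abstract does not name the six models used, so that step is left out here.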

Comparative Analysis in Perception of Retro Fashion and New-tro Fashion Using Big Data (빅 데이터를 활용한 레트로 패션과 뉴트로 패션에 대한 인식 비교)

  • Kyung Ja Paek;Jeong-Mee Kim
    • Journal of the Korea Fashion and Costume Design Association / v.25 no.1 / pp.83-96 / 2023
  • The purpose of this study is to compare and analyze the perception of retro fashion and new-tro fashion using big data. Big data on the terms 'retro fashion' and 'new-tro fashion' were collected with TEXTOM and then refined. The data collection period was set as Jan. 1, 2019 to Nov. 30, 2022. The top 50 words by appearance frequency were extracted from the data and processed with network centrality analysis and CONCOR analysis in Ucinet 6. The results are as follows. 1) In retro fashion, the appearance frequency of 'style' was the highest, followed by 'sensibility', 'color', 'trend', 'fashion', and 'brand'. These words had high TF-IDF values. Network centrality analysis showed that 'color', 'style', 'trend', 'sensibility', and 'design' had a high level of connectivity with other words. CONCOR analysis showed four significant groups: trends, styles, looks, and photos. 2) In new-tro fashion, the appearance frequency of 'retro' was the highest, followed by 'trend', 'generation', 'style', 'brand', and 'fashion'. These words also had high TF-IDF values. Network centrality analysis found that 'retro', 'trend', 'generation', and 'brand' had a high level of connectivity with other words. CONCOR analysis showed four significant groups: style, brand, clothing, and trend. 3) New-tro fashion is part of retro fashion in that it reproduces the styles of the past. However, it is perceived quite differently from generation to generation. Unlike older generations, millennials actively accept newly created clothes and brands based on past styles. They perceive it as a fashion that reveals their own unique tastes.

Analysis on the Trend of The Journal of Information Systems Using TLS Mining (TLS 마이닝을 이용한 '정보시스템연구' 동향 분석)

  • Yun, Ji Hye;Oh, Chang Gyu;Lee, Jong Hwa
    • The Journal of Information Systems / v.31 no.1 / pp.289-304 / 2022
  • Purpose: The development of the network and mobile industries has induced companies to invest in information systems, leading a new industrial revolution. The Journal of Information Systems (JIS), which developed the information systems field into a theoretical and practical discipline in the 1990s, has a 30-year history. This study aims to identify the academic value and research trends of JIS. Design/methodology/approach: This study analyzes the trends of JIS by combining several methods, referred to as TLS mining analysis. TLS mining analysis consists of a series of analyses: the Term Frequency-Inverse Document Frequency (TF-IDF) weight model, Latent Dirichlet Allocation (LDA) topic modeling, and text mining with semantic network analysis. First, keywords are extracted from the research data using the TF-IDF weight model; topic modeling is then performed using the LDA algorithm to identify issue keywords. Findings: The study used the summary service for published research papers provided by the Korea Citation Index to analyze JIS. The 714 papers published from 2002 to 2021 were divided into two periods: 2002-2011 and 2012-2021. In the first period (2002-2011), research in the information systems field focused on e-business strategies, as most companies adopted online business models. In the second period (2012-2021), data-based information technology and new industrial revolution technologies such as artificial intelligence, SNS, and mobile were the main research issues. In addition, keywords for improving the JIS citation index were presented.
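
For reference, one common form of the TF-IDF weight named in this abstract is given below. The abstract does not state which variant the authors used, so this textbook definition is an assumption: tf(t, d) is the relative frequency of term t in document d, N is the number of documents, and df(t) is the number of documents containing t.

```latex
\[
\operatorname{tfidf}(t, d) = \operatorname{tf}(t, d) \cdot \log\frac{N}{\operatorname{df}(t)}
\]
```

Under this form, a term that appears in every document gets weight zero, which is why TF-IDF favors terms that are frequent in some documents but not across the whole collection.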

A Method of Mining Visualization Rules from Open Online Text for Situation Aware Business Chart Recommendation (상황인식형 비즈니스 차트 추천기 개발을 위한 개방형 온라인 텍스트로부터의 시각화 규칙 추출 방법 연구)

  • Zhang, Qingxuan;Kwon, Ohbyung
    • The Journal of Society for e-Business Studies / v.25 no.1 / pp.83-107 / 2020
  • Selecting business charts based on the nature of the data and the purpose of the visualization is useful in business analysis. However, current visualization tools lack the ability to help choose the right business chart for the context. Also, soliciting expert help about visualization methods for every analysis is inefficient. Therefore, the purpose of this study is to propose an accessible method to improve business chart productivity by creating rules for selecting business charts from online published documents. To this end, Korean, English, and Chinese unstructured data describing business charts were collected from the Internet, and the relationships between the contexts and the business charts were calculated using TF-IDF. We also used a Galois lattice to create rules for business chart selection. To evaluate the adequacy of the rules generated by the proposed method, experiments were conducted on experimental and control groups. The results confirmed that meaningful rules were extracted by the proposed method. To the best of our knowledge, this is the first study to recommend customizing business charts through open unstructured data analysis and to propose a method that enables office workers to select business charts efficiently without expert assistance. This method should also be useful for staff training, since it recommends business charts based on the document a user is working on.

NFT (Non-Fungible Token) Patent Trend Analysis using Topic Modeling

  • Sin-Nyum Choi;Woong Kim
    • Journal of the Korea Society of Computer and Information / v.28 no.12 / pp.41-48 / 2023
  • In this paper, we propose an analysis of recent trends in the NFT (Non-Fungible Token) industry using topic modeling techniques, focusing on their universal application across various industrial fields. For this study, patent data was utilized to understand industry trends. We collected data on 371 domestic and 454 international NFT-related patents registered in the patent information search service KIPRIS from 2017, when the first NFT standard was introduced, to October 2023. In the preprocessing stage, stopwords and lemmas were removed, and only noun words were extracted. For the analysis, the top 50 words by frequency were listed, and their corresponding TF-IDF values were examined to derive the main keywords of the industry trends. Next, using the LDA algorithm, we identified four major latent topics within the patent data, both domestically and internationally. We analyzed these topics and presented our findings on NFT industry trends, supported by real-world industry cases. While previous reviews presented trends from an academic perspective using paper data, this study is significant in that it provides practical trend information based on data rooted in field practice. It is expected to be a useful reference for professionals in the NFT industry for understanding market conditions and generating new items.

Comparative analysis of information attributes in chemical accident response systems through Unstructured Data: Spotlighting on the OECD Guidelines for Chemical Accident Prevention, Preparedness, and Response (비정형 데이터를 이용한 화학물질 사고 대응 체계 정보속성 비교 분석 : 화학사고 예방, 대비 및 대응을 위한 OECD 지침서를 중심으로)

  • YongJin Kim;Chunghyun Do
    • Journal of Intelligence and Information Systems / v.29 no.4 / pp.91-110 / 2023
  • The importance of manuals is emphasized because chemical accidents require swift response and recovery, and often result in environmental pollution and casualties. In this regard, the OECD revised its Guidelines for the Prevention, Preparedness, and Response to Chemical Accidents (referred to as the OECD Guidelines) in June 2023. Existing research primarily raises awareness about chemical accidents and highlights the need for a system-wide response including laws, regulations, and manuals, but comparative research on the attributes of manuals has been difficult to find. Therefore, this paper compares and analyzes the second and third editions of the OECD Guidelines in order to uncover the information attributes and implications of the revised version. Specifically, TF-IDF (Term Frequency-Inverse Document Frequency) was applied to understand which keywords have become more important, and Word2Vec was applied to identify keywords that were used similarly and those that were differentiated. Lastly, a 2×2 matrix was proposed, identifying the topics within each quadrant to provide a deeper comparison of the information attributes of the OECD Guidelines. This study offers a framework to help researchers understand information attributes. From a practical perspective, it appears valuable for the revision of standard manuals by domestic government agencies and chemistry-related corporations.

An Exploratory Study of VR Technology using Patents and News Articles (특허와 뉴스 기사를 이용한 가상현실 기술에 관한 탐색적 연구)

  • Kim, Sungbum
    • Journal of Digital Convergence / v.16 no.11 / pp.185-199 / 2018
  • The purpose of this study is to derive the core technologies of VR using patent analysis and to explore the direction of social and public interest in VR using news analysis. In Study 1, we derived keywords using the frequency of words in patent texts and compared them by company, year, and technical classification. Netminer, a network analysis program, was used to analyze the IPC codes of patents. In Study 2, we analyzed news articles using the T-LAB program. TF-IDF was used as the keyword selection method, and chi-square and association index algorithms were used to extract the words most relevant to VR. Through this study, we confirmed that VR is a fusion technology including optics, head mounted display (HMD), data analysis, and electric and electronic technology, and found that optical technology is the central technology among those currently being developed. In addition, through news articles, we found that society and the public are interested in the formation and growth of VR suppliers and markets, and that VR should be developed on the basis of user experience.

A Study on the Analysis of Agricultural R&D Keywords Using Textmining Method (텍스트마이닝을 활용한 농업 R&D 키워드 분석)

  • Kim, Ji-Hoon;Kim, Seong-Sup
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.2 / pp.721-732 / 2021
  • This study analyzed keywords for agricultural R&D using text mining to examine trends in agricultural R&D. The analysis used R&D project information provided by NTIS, classified by research and development stage and by year from 2003 to 2018. The TF-IDF approach was used as the analysis method, and keyword rankings were derived from the scores. Furthermore, similar keywords were grouped for analysis. The main results are as follows. First, agricultural R&D trends are changing with the introduction of new technologies and changes in the external environment. Second, keyword changes appeared with a time lag across R&D stages: the main keywords change in the order of basic research, applied research, and development research. Third, the main keyword of agricultural R&D was 'rice.' However, the direction and purpose of the research changed according to changes in the domestic and foreign agricultural environments.
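
The ranking step described in this abstract (scoring keywords with TF-IDF and ranking them within each group, such as a year or an R&D stage) might look roughly like the sketch below. It is only an illustration under assumptions: the function name, the summing of per-document scores within a group, the unsmoothed IDF, and the toy project keywords are hypothetical and do not come from the NTIS data.

```python
import math
from collections import Counter, defaultdict


def rank_keywords_by_group(docs, top_k=3):
    """docs: list of (group_label, tokens) pairs.

    Returns {group_label: [(term, score), ...]} with the top_k terms per
    group, where a term's score is the sum of its TF-IDF weights over the
    documents in that group.
    """
    num_docs = len(docs)

    # Document frequency: number of documents that contain each term.
    doc_freq = Counter()
    for _, tokens in docs:
        doc_freq.update(set(tokens))

    scores = defaultdict(Counter)
    for group, tokens in docs:
        for term, count in Counter(tokens).items():
            tf = count / len(tokens)
            idf = math.log(num_docs / doc_freq[term])
            scores[group][term] += tf * idf

    return {group: counter.most_common(top_k) for group, counter in scores.items()}


# Hypothetical project keyword lists, labeled by year (illustrative only).
docs = [
    ("2003", ["crop", "breeding", "crop", "yield"]),
    ("2003", ["crop", "pest", "control"]),
    ("2018", ["crop", "smart", "farm", "sensor"]),
    ("2018", ["smart", "farm", "drone"]),
]
ranking = rank_keywords_by_group(docs, top_k=2)
```

In this toy data, "crop" appears in most documents and so receives a low IDF, while terms specific to one period rank higher. Whether the study aggregated scores this way is not stated in the abstract.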

Sentiment Analysis and Star Rating Prediction Based on Big Data Analysis of Online Reviews of Foreign Tourists Visiting Korea (방한 관광객의 온라인 리뷰에 대한 빅데이터 분석 기반의 감성분석 및 평점 예측모형)

  • Hong, Taeho
    • Knowledge Management Research / v.23 no.1 / pp.187-201 / 2022
  • Online reviews written by tourists provide important information for the management and operation of the tourism industry. The star rating of an online review is a simple quantitative evaluation of a product or service, but it does not fully reflect the sincere attitude of tourists. Another issue is that the star rating and the review content do not always match. In this study, a star rating prediction model based on online review content is proposed to address this discrepancy. We compared the differences in star ratings and sentiment by continent through sentiment analysis of reviews of tourist attractions and hotels written by foreign tourists who visited Korea. Variables were selected through TF-IDF vectorization and the sentiment analysis results. Logit, artificial neural network, and SVM (Support Vector Machine) models were used for classification, and artificial neural network and SVR (Support Vector Regression) models were applied for rating prediction. The proposed online review rating prediction model could resolve such inconsistencies and could also be applied when there is no star rating.