• 제목/요약/키워드: Text frequency analysis

Search Result 459, Processing Time 0.026 seconds

A Comparative Study of a New Approach to Keyword Analysis: Focusing on NBC (키워드 분석에 대한 최신 접근법 비교 연구: 성경 코퍼스를 중심으로)

  • Ha, Myoungho
    • Journal of Digital Convergence
    • /
    • v.19 no.7
    • /
    • pp.33-39
    • /
    • 2021
  • This paper aims to analyze lexical properties of keyword lists extracted from NLT Old Testament Corpus(NOTC), NLT New Testament Corpus(NNTC), and The NLT Bible Corpus(NBC) and identify that text dispersion keyness is more effective than corpus frequency keyness. For this purpose, NOTC including around 570,000 running words and NNTC about 200,000 were compiled after downloading the files from NLT website of Bible Hub. Scott's (2020) WordSmith 8.0 was utilized to extract keyword lists through comparing a target corpus and a reference corpus. The result demonstrated that text dispersion keyness showed lexical properties of keyword lists better than corpus frequency keyness and that the former was a superior measure for generating optimal keyword lists to fully meet content-generalizability and content distinctiveness.

A Performance Analysis Based on Hadoop Application's Characteristics in Cloud Computing (클라우드 컴퓨팅에서 Hadoop 애플리케이션 특성에 따른 성능 분석)

  • Keum, Tae-Hoon;Lee, Won-Joo;Jeon, Chang-Ho
    • Journal of the Korea Society of Computer and Information
    • /
    • v.15 no.5
    • /
    • pp.49-56
    • /
    • 2010
  • In this paper, we implement a Hadoop based cluster for cloud computing and evaluate the performance of this cluster based on application characteristics by executing RandomTextWriter, WordCount, and PI applications. A RandomTextWriter creates given amount of random words and stores them in the HDFS(Hadoop Distributed File System). A WordCount reads an input file and determines the frequency of a given word per block unit. PI application induces PI value using the Monte Carlo law. During simulation, we investigate the effect of data block size and the number of replications on the execution time of applications. Through simulation, we have confirmed that the execution time of RandomTextWriter was proportional to the number of replications. However, the execution time of WordCount and PI were not affected by the number of replications. Moreover, the execution time of WordCount was optimum when the block size was 64~256MB. Therefore, these results show that the performance of cloud computing system can be enhanced by using a scheduling scheme that considers application's characteristics.

Analysis of Dental Hygienist Job Recognition Using Text Mining

  • Kim, Bo-Ra;Ahn, Eunsuk;Hwang, Soo-Jeong;Jeong, Soon-Jeong;Kim, Sun-Mi;Han, Ji-Hyoung
    • Journal of dental hygiene science
    • /
    • v.21 no.1
    • /
    • pp.70-78
    • /
    • 2021
  • Background: The aim of this study was to analyze the public demand for information about the job of dental hygienists by mining text data collected from the online Q & A section on an Internet portal site. Methods: Text data were collected from inquiries that were posted on the Naver Q & A section from January 2003 to July 2020 using "dental hygienist job recognition," "role recognition," "medical assistance," and "scaling" as search keywords. Text mining techniques were used to identify significant Korean words and their frequency of occurrence. In addition, the association between words was analyzed. Results: A total of 10,753 Korean words related to the job of dental hygienists were extracted from the text data. "Chi-lyo (treatment)," "chigwa (dental clinic)," "ske-illing (scaling)," "itmom (gum)," and "chia (tooth)" were the five most frequently used words. The words were classified into the following areas of job of the dental hygienist: periodontal disease treatment and prevention, medical assistance, patient care and consultation, and others. Among these areas, the number of words related to medical assistance was the largest, with sixty-six association rules found between the words, and "chi-lyo," "chigwa," and "ske-illing" as core words. Conclusion: The public demand for information about the job of dental hygienists was mainly related to "chi-lyo," "chigwa," and "ske-illing" as core words, demonstrating that scaling is recognized by the public as the job of a dental hygienist. However, the high demand for information related to treatment and medical assistance in the context of dental hygienists indicates that the job of dental hygienists is recognized by the public as being more focused on medical assistance than preventive dental care that are provided with job autonomy.

A Study of Perception of Golfwear Using Big Data Analysis (빅데이터를 활용한 골프웨어에 관한 인식 연구)

  • Lee, Areum;Lee, Jin Hwa
    • Fashion & Textile Research Journal
    • /
    • v.20 no.5
    • /
    • pp.533-547
    • /
    • 2018
  • The objective of this study is to examine the perception of golfwear and related trends based on major keywords and associated words related to golfwear utilizing big data. For this study, the data was collected from blogs, Jisikin and Tips, news articles, and web $caf{\acute{e}}$ from two of the most commonly used search engines (Naver & Daum) containing the keywords, 'Golfwear' and 'Golf clothes'. For data collection, frequency and matrix data were extracted through Textom, from January 1, 2016 to December 31, 2017. From the matrix created by Textom, Degree centrality, Closeness centrality, Betweenness centrality, and Eigenvector centrality were calculated and analyzed by utilizing Netminer 4.0. As a result of analysis, it was found that the keyword 'brand' showed the highest rank in web visibility followed by 'woman', 'size', 'man', 'fashion', 'sports', 'price', 'store', 'discount', 'equipment' in the top 10 frequency rankings. For centrality calculations, only the top 30 keywords were included because the density was extremely high due to high frequency of the co-occurring keywords. The results of centrality calculations showed that the keywords on top of the rankings were similar to the frequency of the raw data. When the frequency was adjusted by subtracting 100 and 500 words, it showed different results as the low-ranking keywords such as J. Lindberg in the frequency analysis ranked high along with changes in the rankings of all centrality calculations. Such findings of this study will provide basis for marketing strategies and ways to increase awareness and web visibility for Golfwear brands.

Analysis of domestic and foreign research trends of Tricholoma matsutake using text mining techniques

  • Choi, Ah Hyeon;Kang, Jun Won
    • Korean Journal of Agricultural Science
    • /
    • v.48 no.3
    • /
    • pp.505-514
    • /
    • 2021
  • Among non-timber forest products, Tricholoma matsutake is a high value added item. Many countries, including Korea, China, and Japan, are doing research and technology development to increase artificial cultivation and productivity. However, the production of T. matsutake is on the decline due to global warming, abnormal temperatures and pine tree pest problems. Therefore, it is necessary to identify trends in domestic and foreign research on T. matsutake, respond to preemptive research and development to preserve the genetic resources of T. matsutake and increase its productivity. Based on the correlation between keywords in the high frequency keywords, it was observed that microbial clusters of T. matsutake are mainly found in Korea. The main focus in China has been the pharmacology studies on the ingredients of T. matsutake. The main focus in Japan has been on preserving the genetic diversity and species of T. matsutake. Thus, future domestic studies of T. matsutake will require pharmacological studies on the ingredients of T. matsutake and on its genetic diversity and species conservation. In addition, unlike China and Japan, genetic keywords did not appear in Korea at high frequency. Therefore, Korea will have to proceed with research using modern molecular biology techniques.

The Strategy of Wireless Power Transfer for Light Rail Transit By Core Technologies Analysis Based on Text Mining

  • Meng, Xiang-Yu;Han, Young-Jae;Eum, Soo-Min;Cho, Sung-Won
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.11
    • /
    • pp.193-201
    • /
    • 2018
  • In this paper, we extracted relevant patent data and conducted statistical analysis to understand the technical development trend related to Wireless Power Transfer (WPT) for Light Rail Transit (LRT). Recently, with the development of WPT technologies, the Light Rail Transit (LRT) industry is concentrating on applying WPT to the power supply system of trains because of their advantages compared wired counterpart, such as low maintenance cost and high stability. This technology is divided into three areas: wireless feeding and collecting technology, high-frequency power converter technology and orbital and infrastructure technology. From each specific area, key words in patent document were extracted by TF-IDF method and analyzed by social network. In the keyword network, core word of each specific technology were extracted according to their degree centrality. Then, the multi-word phrases were also built to represent the concept of core technologies. Finally, based on the analysis results, the development strategies for each specifics technical area of WPT in LRT filed will be provided.

Research on Satisfaction Evaluation Based on Tourist Big Data

  • Guo, Hanwen;Liu, Ziyang;Jiao, Zeyu
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.1
    • /
    • pp.231-244
    • /
    • 2022
  • With the improvement of people's living standards and the development of tourism, tourists have greater freedom in choosing destinations. Therefore, as an indicator of satisfaction with scenic spots, tourist comments are becoming increasingly prominent. This paper aims to compare and analyze the landscape image of the Five Great Mountains in China and provide specific strategies for its development. The online reviews of tourists on the Online Travel Agency (OTA) website about the Five Great Mountains from 2015 to 2018 are collected as research samples. The text analysis method and R language are used to analyze the content of the tourist reviews, while the high-frequency words in the word cloud are used for visual display. In addition, the entropy weight method is used to determine the index weight and tourist satisfaction is evaluated to understand the weaknesses of those scenic spots. The results of the study show that firstly, the tourist satisfaction with the Five Great Mountains is basically consistent with its popularity. Secondly, through weight analysis, tourists pay special attention to the landscape features and environmental health of the scenic area, so that relevant departments should focus on building the landscape characteristics and improving the environmental health of the scenic area. At the same time, the accommodation and service management of the scenic spot cannot be ignored. Finally, according to the analysis results, suggestions are made on how to improve the tourist satisfaction with the Five Great Mountains.

A Study on Perceptions of Virtual Influencers through YouTube Comments -Focusing on Positive and Negative Emotional Responses Toward Character Design- (유튜브 댓글을 통해 살펴본 버추얼 인플루언서에 대한 인식 연구 -캐릭터 디자인에 대한 긍부정 감성 반응을 중심으로-)

  • Hyosun An;Jiyoung Kim
    • Journal of the Korean Society of Clothing and Textiles
    • /
    • v.47 no.5
    • /
    • pp.873-890
    • /
    • 2023
  • This study analyzed users' emotional responses to VI character design through YouTube comments. The researchers applied text-mining to analyze 116,375 comments, focusing on terms related to character design and characteristics of VI. Using the BERT model in sentiment analysis, we classified comments into extremely negative, negative, neutral, positive, or extremely positive sentiments. Next, we conducted a co-occurrence frequency analysis on comments with extremely negative and extremely positive responses to examine the semantic relationships between character design and emotional characteristic terms. We also performed a content analysis of comments about Miquela and Shudu to analyze the perception differences regarding the two character designs. The results indicate that form elements (e.g., voice, face, and skin) and behavioral elements (e.g., speaking, interviewing, and reacting) are vital in eliciting users' emotional responses. Notably, in the negative responses, users focused on the humanization aspect of voice and the authenticity aspect of behavior in speaking, interviewing, and reacting. Furthermore, we found differences in the character design elements and characteristics that users expect based on the VI's field of activity. As a result, this study suggests applications to character design to accommodate these variations.

Analyzing Architectural History Terminologies by Text Mining and Association Analysis (텍스트 마이닝과 연관 관계 분석을 이용한 건축역사 용어 분석)

  • Kim, Min-Jeong;Kim, Chul-Joo
    • Journal of Digital Convergence
    • /
    • v.15 no.1
    • /
    • pp.443-452
    • /
    • 2017
  • Architectural history traces the changes in architecture through various traditions, regions, overarching stylistic trends, and dates. This study identified terminologies related to the proximity and frequency in the architectural history areas by text mining and association analysis. This study explored terminologies by investigating articles published in the "Journal of Architectural History", a sole journal for the architectural history studies. First, key terminologies that appeared frequently were extracted from paper that had titles, keywords, and abstracts. Then, we analyzed some typical and specific key terminologies that appear frequently and partially depending on the research areas. Finally, association analysis was used to find the frequent patterns in the key terminologies. This research can be used as fundamental data for understanding issues and trends in areas on the architectural history.

A Study on the Consumer's Perception of HiSeoul Fashion Show Using Big Data Analysis (빅데이터 분석을 활용한 하이서울패션쇼에 대한 소비자 인식 조사)

  • Han, Ki Hyang
    • Journal of Fashion Business
    • /
    • v.23 no.5
    • /
    • pp.81-95
    • /
    • 2019
  • The purpose of this study is to research consumers' perception of the HiSeoul fashion show, which is being used by new designers as a means of promotion, and to propose a strategy for revitalizing new designer brands. This was done in order to secure basic data from fashion consumers, to help guide marketing strategies and promote rising designers. In this research, the consumers' perception of HiSeoul fashion show was verified using text-mining, data refinement and word clouding that was undertaken by TEXTOM3.0. Also, semantic network analysis, CONCOR analysis and visualization of the analysis results were performed using Ucinet 6.0 and NetDraw. "HiSeoul fashion show" was used as the keyword for text-mining and data was collected from March 1, 2018 to April 30, 2019. Using frequency analysis, TF-IDF, and N-gram, it was also shown that consumers are aware of places where shows are held, such as DDP and Igansumun. It was also revealed that consumers recognize rising designer brands, designer's names, the names of guests attending the show and the photo times. This study is meaningful in that it not only confirmed consumers' interest in new designer brands participating in the HiSeoul Fashion Show through big data but also confirmed that it is available as a marketing strategy to boost brand sales. This study suggests using HiSeoul show room to induce consumer sales, or inviting guests that match the brand image to promote them on SNS on the day the show is held for a marketing strategy.