• Title/Summary/Keyword: Web Text Analysis

Analysis Study on Trends of Library Development Plan by Using Big Data Analysis (빅데이터 분석 기법을 활용한 도서관발전종합계획 동향 분석 연구)

  • Kim, Dongseok;Noh, Younghee
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.29 no.2 / pp.85-108 / 2018
  • This study aimed to analyze media reports on the Comprehensive Library Advancement Plan using big data analysis in order to determine trends and implications by period. To do so, related data from 2009 to 2017 were collected from major domestic web portal sites. Words in the collected data were refined through text mining, and frequency, centrality, and structural equivalence analyses were performed. The results confirmed that, during the first and second phases of the Comprehensive Library Advancement Plan, the focus of library policy shifted from external growth to strengthening internal stability and advancing library operation, and that media coverage was limited to specific policies such as the expansion of library facilities. The findings of this study will serve as useful material for understanding how the national library policy represented by the Comprehensive Library Advancement Plan is perceived.

Semantic analysis via application of deep learning using Naver movie review data (네이버 영화 리뷰 데이터를 이용한 의미 분석(semantic analysis))

  • Kim, Sojin;Song, Jongwoo
    • The Korean Journal of Applied Statistics / v.35 no.1 / pp.19-33 / 2022
  • With the explosive growth of social media, the abundant text-based data generated by web users have become an important source for data analysis. For example, online movie reviews on 'Naver Movie' often influence whether the general public decides to watch a movie. This study analyzed Naver Movie's text-based review data to predict the actual ratings. After examining the distribution of movie ratings, we performed semantic analysis using Korean natural language processing. This research sought the best review rating prediction model by comparing machine learning and deep learning models. We also compared various regression and classification models in 2-class and multi-class cases. Lastly, we explained the causes of review misclassification in relation to the characteristics of movie review data.

A Study on the Convenient of Fashion Product Information Among Advertisement Types and Information Design Methods (정보제공매체(情報提供媒體)의 구성방식(構成方式)에 따른 패션상품(商品) 정보검색(情報檢索)의 편의성(便宜性)에 관(關)한 연구(硏究))

  • Ko, Eun-Ju
    • Journal of Fashion Business / v.4 no.4 / pp.123-133 / 2000
  • The purpose of this study was to examine the usage of internet advertisements, to define the convenience dimensions of information navigation media, and to examine differences in the convenience of information navigation among advertisement types and information design methods. Thirty-four subjects were selected for the survey, and descriptive analysis, factor analysis, and ANOVA were used for the data analysis. The results of this study were: 1. the most commonly perceived channel for web site information was the search engine, the most preferred type of information design was the mixed type with text and image, and the most preferred medium for fashion information was the magazine. 2. the convenience dimensions of information navigation media were information understanding, design composition, easy navigation, timely navigation, and convenient navigation. 3. the information design method was a significant factor influencing the convenience of information navigation.

Age and Gender in Reddit Commenting and Success

  • Finlay, S. Craig
    • Journal of Information Science Theory and Practice / v.2 no.3 / pp.18-28 / 2014
  • Reddit is a large user-generated content (UGC) website in which users form common interest groups and submit links to external content or text posts of user-created content. The web site operates on a voting system whereby registered users can assign positive or negative ratings to both submitted content and comments made on submitted content. While Reddit is a pseudonymous site, with users creating usernames but providing no biographical data, an informal survey posted to a large shared interest community yielded 734 responses including the age and gender of users. This provided a large amount of contextual biographical data with which to analyse user profiles at the first level of Computer Mediated Discourse Analysis (CMDA), articulated by Susan Herring. The results indicate that older Reddit users both formulate more complex writing and enjoy more success when rated by other users. Gender data was incomplete, and as such only tentative results could be proposed in that regard.

Analysis of domestic and foreign research trends of Tricholoma matsutake using text mining techniques

  • Choi, Ah Hyeon;Kang, Jun Won
    • Korean Journal of Agricultural Science / v.48 no.3 / pp.505-514 / 2021
  • Among non-timber forest products, Tricholoma matsutake is a high value-added item. Many countries, including Korea, China, and Japan, are conducting research and technology development to enable artificial cultivation and increase productivity. However, the production of T. matsutake is declining due to global warming, abnormal temperatures, and pine tree pest problems. Therefore, it is necessary to identify trends in domestic and foreign research on T. matsutake and to pursue preemptive research and development that preserves the genetic resources of T. matsutake and increases its productivity. Based on correlations among the high-frequency keywords, the main focus in Korea has been on the microbial clusters of T. matsutake. The main focus in China has been on pharmacological studies of the ingredients of T. matsutake. The main focus in Japan has been on preserving the genetic diversity and species of T. matsutake. Thus, future domestic studies of T. matsutake will require pharmacological studies of its ingredients as well as studies of its genetic diversity and species conservation. In addition, unlike in China and Japan, genetic keywords did not appear at high frequency in Korea. Therefore, Korea will have to proceed with research using modern molecular biology techniques.

Survey of Automatic Query Expansion for Arabic Text Retrieval

  • Farhan, Yasir Hadi;Noah, Shahrul Azman Mohd;Mohd, Masnizah
    • Journal of Information Science Theory and Practice / v.8 no.4 / pp.67-86 / 2020
  • An information need is one of the main motivations for a person to use a search engine. Queries can represent very different information needs. Ironically, a query can be a poor representation of the information need because the user can find it difficult to express that need. Query Expansion (QE) is widely used to address this limitation. While QE can be considered a language-independent technique, recent findings have shown that in certain cases language plays an important role. Arabic is a language with a particularly large vocabulary, rich in words with synonymous shades of meaning, and has high morphological complexity. This paper, therefore, provides a review of QE for Arabic information retrieval, with the intention of identifying the current state of the art in this burgeoning area. In this review, we primarily discuss statistical QE approaches, which include document analysis, search and browse log analyses, and web knowledge analyses, in addition to semantic QE approaches, which use semantic knowledge structures to extract meaningful word relationships. Finally, we conclude that QE for the Arabic language requires additional investigation and research due to the intricate nature of the language.
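
The basic idea of query expansion described in this abstract can be illustrated with a minimal sketch. It is not the method of any surveyed paper: the `SYNONYMS` table and the `expand_query` function are hypothetical names introduced here, the terms are English for readability, and whitespace tokenization is a simplification (Arabic text would typically need morphological normalization first).

```python
from typing import Dict, List

# Hypothetical related-term table (an assumption for illustration). A real
# system might derive it from a thesaurus, co-occurrence statistics, or logs.
SYNONYMS: Dict[str, List[str]] = {
    "car": ["automobile", "vehicle"],
    "buy": ["purchase"],
}


def expand_query(query: str, synonyms: Dict[str, List[str]]) -> List[str]:
    """Return the original query terms followed by any related terms."""
    terms = query.lower().split()  # naive whitespace tokenization
    expanded = list(terms)
    for term in terms:
        for candidate in synonyms.get(term, []):
            if candidate not in expanded:  # avoid duplicate terms
                expanded.append(candidate)
    return expanded


if __name__ == "__main__":
    print(expand_query("buy car", SYNONYMS))
```

The expanded term list would then be passed to the retrieval engine in place of the original query; how the added terms are weighted is a separate design choice that this sketch leaves out.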

A Study on the Factors of Well-aging through Big Data Analysis : Focusing on Newspaper Articles (빅데이터 분석을 활용한 웰에이징 요인에 관한 연구 : 신문기사를 중심으로)

  • Lee, Chong Hyung;Kang, Kyung Hee;Kim, Yong Ha;Lim, Hyo Nam;Ku, Jin Hee;Kim, Kwang Hwan
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.5 / pp.354-360 / 2021
  • People hope to live a healthy and happy life, achieving satisfaction by striking a good work-life balance. Therefore, there is growing interest in well-aging, which means living happily to a healthy old age without worry. This study identified important factors related to well-aging by analyzing news articles published in Korea. Using Python-based web crawling, 1,199 articles published up to November 2020 were collected from the news service of the portal site Daum, and 374 articles that matched the subject of the study were selected. The frequency analysis from text mining identified 'elderly', 'health', 'skin', 'well-aging', 'product', 'person', 'aging', 'female', 'domestic' and 'retirement' as important keywords. In addition, a social network analysis of 45 important keywords revealed strong connections, in order, for 'skin-wrinkle', 'skin-aging' and 'old-health'. The CONCOR analysis showed that the 45 main keywords formed eight clusters, including 'life and happiness', 'disease and death', 'nutrition and exercise', 'healing', 'health', and 'elderly services'.
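
The keyword frequency step mentioned in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the sample sentences, the `STOPWORDS` set, and the `keyword_frequencies` function are all hypothetical, and English text with a simple regular expression stands in for Korean morphological analysis.

```python
import re
from collections import Counter
from typing import List, Tuple

# Hypothetical stopword list (an assumption for illustration only).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is", "for"}

# Hypothetical sample texts standing in for crawled news articles.
ARTICLES = [
    "Skin care and health for the elderly",
    "Health and retirement planning for the elderly",
    "Well-aging is health in retirement",
]


def keyword_frequencies(articles: List[str], top_n: int = 3) -> List[Tuple[str, int]]:
    """Count non-stopword tokens across all articles and return the top_n."""
    counts: Counter = Counter()
    for text in articles:
        # Lowercase words, keeping hyphenated compounds such as "well-aging".
        tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_n)


if __name__ == "__main__":
    for word, count in keyword_frequencies(ARTICLES):
        print(word, count)
```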

Semantic Network Analysis of Online News and Social Media Text Related to Comprehensive Nursing Care Service (간호간병통합서비스 관련 온라인 기사 및 소셜미디어 빅데이터의 의미연결망 분석)

  • Kim, Minji;Choi, Mona;Youm, Yoosik
    • Journal of Korean Academy of Nursing / v.47 no.6 / pp.806-816 / 2017
  • Purpose: As comprehensive nursing care service has gradually expanded, it has become necessary to explore the various opinions about it. The purpose of this study is to explore the large amount of text data regarding comprehensive nursing care service extracted from online news and social media by applying a semantic network analysis. Methods: The web pages of the Korean Nurses Association (KNA) News, major daily newspapers, and Twitter were crawled by searching the keyword 'comprehensive nursing care service' using Python. A morphological analysis was performed using KoNLPy. Nodes on a 'comprehensive nursing care service' cluster were selected, and frequency, edge weight, and degree centrality were calculated and visualized with Gephi for the semantic network. Results: A total of 536 news pages and 464 tweets were analyzed. In the KNA News and major daily newspapers, 'nursing workforce' and 'nursing service' ranked high in frequency, edge weight, and degree centrality. On Twitter, the most frequent nodes were 'National Health Insurance Service' and 'comprehensive nursing care service hospital.' The nodes with the highest edge weight were 'national health insurance,' 'wards without caregiver presence,' and 'caregiving costs.' 'National Health Insurance Service' was highest in degree centrality. Conclusion: This study provides an example of how to use unstructured big data for a nursing issue, applying semantic network analysis to explore diverse perspectives surrounding the nursing community through various media sources. Applying semantic network analysis to online big data to gather information on various nursing issues would help to explore opinions for formulating and implementing nursing policies.
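
The edge weight and degree centrality measures named in this abstract can be sketched with the standard library alone. This is a simplified illustration, not the study's KoNLPy and Gephi workflow: the `DOCS` token lists and both function names are hypothetical, edge weight is taken here to be the number of documents in which two terms co-occur, and degree centrality is the number of neighbors divided by the number of other nodes.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical pre-tokenized documents (an assumption for illustration).
DOCS: List[List[str]] = [
    ["nursing", "workforce", "service"],
    ["nursing", "service", "cost"],
    ["nursing", "insurance"],
]


def cooccurrence_edges(docs: List[List[str]]) -> Counter:
    """Edge weight = number of documents in which both terms appear."""
    edges: Counter = Counter()
    for tokens in docs:
        # Sort so each undirected pair has one canonical key (a, b) with a < b.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return edges


def degree_centrality(edges: Counter) -> Dict[str, float]:
    """Degree centrality = neighbors / (nodes - 1) for each node."""
    neighbors: Dict[str, Set[str]] = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}


if __name__ == "__main__":
    edges = cooccurrence_edges(DOCS)
    print(edges.most_common(1))
    print(degree_centrality(edges))
```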

Exploring the phenomenon of veganphobia in vegan food and vegan fashion (비건 음식과 비건 패션에서 나타난 비건포비아 현상에 대한 탐구)

  • Yeong-Hyeon Choi;Sangyung Lee
    • The Research Journal of the Costume Culture / v.32 no.3 / pp.381-397 / 2024
  • This study investigates the negative perceptions (veganphobia) held by consumers toward vegan diets and fashion and aims to foster a genuine acceptance of ethical veganism in consumption. The textual data consisted of web-crawled Korean online posts, including news articles, blogs, forums, and tweets, containing keywords such as "contradiction," "dilemma," "conflict," "issues," "vegan food" and "vegan fashion" from 2013 to 2021. Data analysis was conducted through text mining, network analysis, and clustering analysis using Python and NodeXL. The analysis revealed distinct negative perceptions regarding vegan food. Key issues included the perception of hypocrisy among vegetarians, associations with specific political leanings, conflicts between environmental and animal rights, and contradictions between views on companion animals and livestock. Regarding the vegan fashion industry, the eco-friendliness of material selection and design processes was seen as the pivotal factor shaping negative attitudes. Furthermore, the study identified a shared negative perception regarding vegan food and vegan fashion. This negativity was characterized by confusion and conflicts between animal and environmental rights, biased perceptions linked to specific political affiliations, perceived self-righteousness among vegetarians, and general discomfort toward them. These factors collectively contributed to a broader negative perception of vegan consumption. In conclusion, this study is significant in understanding the complex perceptions and attitudes that consumers hold toward vegan food and fashion. The insights gained from this research can aid in the design of more effective campaign strategies aimed at promoting vegan consumerism, ultimately contributing to a more widespread acceptance of ethical veganism in society.

A Study of the Improvement Method of I-pin Mass Illegal Issue Accident (아이핀 대량 부정발급 사고에 대한 개선방법 연구)

  • Lee, Younggyo;Ahn, Jeonghee
    • Journal of Korea Society of Digital Industry and Information Management / v.11 no.2 / pp.11-22 / 2015
  • Most web pages gather personal information (Korean resident registration number, name, cell-phone number, home telephone number, e-mail address, home address, etc.) through membership registration and log-in. Most web page users are concerned about this gathering of personal information. I-pin is an alternative to the resident registration number and has been used on the Internet for the last ten years. A mass illegal issuance of I-pins was carried out by a hacker in February 2015. In this paper, we analyze the problems of the I-pin system exposed by the mass illegal issuance accident and propose an improvement method. First, I-pins must be issued offline with face-to-face verification, despite the inconvenience to users. Second, I-pin use must require a second authentication by password or OTP. Third, notification of I-pin use must be sent to the user by cell-phone text message or e-mail. Fourth, I-pin must be used as an alternative to the Korean resident registration number on the Internet. These methods can reduce the problems of the I-pin system.