• Title/Summary/Keyword: Web-crawling

COVID-19 and Korean Family Life on Social Media: A Topic Model Approach (소셜 빅데이터로 알아본 코로나19와 가족생활: 토픽모델 접근)

  • Park, Sunyoung;Lee, Jaerim
    • The Journal of the Korea Contents Association / v.21 no.3 / pp.282-300 / 2021
  • The purpose of this study was to explore what social media posts tell us about family life during the COVID-19 pandemic by examining the keywords and topics underlying posts on blogs and online forums. Our criteria for web crawling were (a) blog and forum posts on Naver and Daum, the top portal sites in Korea, (b) posts between February 23 and April 19, 2020, the period of the first heightened social distancing orders, and (c) inclusion of "COVID" and "family" or "COVID" and "home." We analyzed 351,734 posts using TF-IDF values and topic modeling based on latent Dirichlet allocation. We identified and named 22 topics including COVID-19 prevention, family infection, family health, dietary life and changes, religious life, stuck at home, postponed school year, family events, travel and vacations, concerns about family and friends, anxiety and stress, disaster and damage, COVID-19 warning text messages, family support policies, Shin-cheon-ji and Daegu. The results show that COVID-19 impacted various domains of family life including health, food, housing, religion, child care, education, rituals, and leisure as well as relationships and emotions.
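
The TF-IDF weighting mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not state the exact TF-IDF variant or tokenizer, so the formula below (term frequency as count divided by document length, inverse document frequency as ln(N/df)) and the toy posts are assumptions for illustration only. The LDA topic modeling step is not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical, already tokenized toy posts (not data from the study).
posts = [
    ["covid", "family", "home", "school"],
    ["covid", "family", "travel"],
    ["covid", "home", "cooking", "cooking"],
]
weights = tf_idf(posts)
```

Under this variant, a term that appears in every post (here "covid") gets a weight of zero, while a term concentrated in one post (here "cooking") is weighted up.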

Analysis for Daily Food Delivery & Consumption Trends in the Post-Covid-19 Era through Big Data

  • Jeong, Chan-u;Moon, Yoo-Jin;Hwang, Young-Ho
    • Journal of the Korea Society of Computer and Information / v.26 no.1 / pp.231-238 / 2021
  • In this paper, we suggest a method for analyzing daily food delivery and consumption trends in the post-Covid-19 era using big data. Through analysis of big data and the database system, the four analyzed factors, excluding weather, were shown to correlate significantly with delivery sales for 'Baedarui Minjok', a food delivery application. The research found that food delivery and consumption sales soared by up to about 60 percent on the day after KBS, MBC, and SBS issued Covid-19-related news articles. In addition, mobile media and web surfing were shown to be the main factors in increasing sales of food delivery and consumption applications, suggesting that viral marketing and sentiment analysis based on data crawled from the SNS used by Millennials might be an important factor in sales growth. The proposed method can help companies survive an economic recession by providing a way to analyze big data and increase their sales.

Analysis of Research Trends in Elementary Information Education According to Changes in Curriculum (교육과정 변화에 따른 초등 정보교육 연구 동향 분석)

  • Lee, Youngho
    • Journal of The Korean Association of Information Education / v.25 no.3 / pp.537-545 / 2021
  • Computer-related content has been part of the curriculum since the 5th curriculum, released in 1987. In the practical education curriculum of the 2015 revised curriculum, the existing ICT-related content was reorganized into software-related content. Revising the curriculum to reflect the times and social needs must be preceded by related research. Research on elementary school information education is mainly conducted by the Korean Society for Information Education. Therefore, this study analyzed the society's research trends for each period of curriculum change, based on the papers published by the society. The results show that the society's research trends change in a way similar to the changes in the curriculum, and that the society's research precedes the changes in the curriculum.

Classification Model of Food Groups in Food Exchange Table Using Decision Tree-based Machine Learning

  • Kim, Ji Yun;Kim, Jongwan
    • Journal of the Korea Society of Computer and Information / v.27 no.12 / pp.51-58 / 2022
  • In this paper, we propose a decision tree-based machine learning model that supports renewal of the food exchange table by classifying existing foods and foods found by web crawling into food groups. The food exchange table is the standard for exchanging foods when composing a diet, both for people who are dieting and for patients who need nutritional management. Revising the table through the National Health and Nutrition Survey takes a lot of manpower and time, which makes it difficult to quickly reflect new foods or food trends. Since the proposed technique classifies newly added foods based on the existing food groups, it makes it possible to update the food exchange table rapidly in line with food trends. In the study, the proposed model classified foods into the food groups of the food exchange table with an accuracy of 97.45%, so the model is expected to be highly useful for composing diets tailored to individual preferences in hospitals and nursing homes.

Analysis of online parenting community posts on expanded newborn screening for metabolic disorders using topic modeling: a quantitative content analysis (토픽 모델링을 활용한 광범위 선천성 대사이상 신생아 선별검사 관련 온라인 육아 커뮤니티 게시 글 분석: 계량적 내용분석 연구)

  • Myeong Seon Lee;Hyun-Sook Chung;Jin Sun Kim
    • Women's Health Nursing / v.29 no.1 / pp.20-31 / 2023
  • Purpose: As more newborns have received expanded newborn screening (NBS) for metabolic disorders, the overall number of false-positive results has increased. The purpose of this study was to explore the psychological impacts experienced by mothers related to the NBS process. Methods: An online parenting community in Korea was selected, and questions regarding NBS were collected using web crawling for the period from October 2018 to August 2021. In total, 634 posts were analyzed. The collected unstructured text data were preprocessed, and keyword analysis, topic modeling, and visualization were performed. Results: Of 1,057 words extracted from posts, the top keyword based on 'term frequency-inverse document frequency' values was "hypothyroidism," followed by "discharge," "close examination," "thyroid-stimulating hormone levels," and "jaundice." The top keyword based on the simple frequency of appearance was "XXX hospital," followed by "close examination," "discharge," "breastfeeding," "hypothyroidism," and "professor." As a result of LDA topic modeling, posts related to inborn errors of metabolism (IEMs) were classified into four main themes: "confirmatory tests of IEMs," "mother and newborn with thyroid function problems," "retests of IEMs," and "feeding related to IEMs." Mothers experienced substantial frustration, stress, and anxiety when they received positive NBS results. Conclusion: The online parenting community played an important role in helping mothers of newborns acquire and share information and receive psychological support related to NBS. Nurses can use this study's findings to develop timely and evidence-based information for parents whose children receive positive NBS results to reduce the negative psychological impact.

Development of Online Fashion Thesaurus and Taxonomy for Text Mining (텍스트마이닝을 위한 패션 속성 분류체계 및 말뭉치 웹사전 구축)

  • Seyoon Jang;Ha Youn Kim;Songmee Kim;Woojin Choi;Jin Jeong;Yuri Lee
    • Journal of the Korean Society of Clothing and Textiles / v.46 no.6 / pp.1142-1160 / 2022
  • Text data plays a significant role in understanding and analyzing trends in consumer, business, and social sectors. Text analysis requires a corpus that reflects specific domain knowledge. However, the field of fashion lacks a sufficient professional corpus. This study aims to develop a taxonomy and thesaurus that reflect the specialized nature of fashion products. To this end, about 100,000 fashion vocabulary terms were collected by crawling text data from WGSN, Pantone, and online platforms; the text was subsequently extracted through preprocessing with Python. The taxonomy was composed of seven attributes of clothes: items, silhouettes, details, styles, colors, textiles, and patterns/prints. The corpus was completed by processing synonyms of terms from fashion books such as dictionaries. Finally, 10,294 vocabulary words, including 1,956 standard Korean words, were classified in the taxonomy. All data was then developed into a web dictionary system. Quantitative and qualitative performance tests of the results were conducted through expert reviews. The performance of the thesaurus was also verified by comparing its text mining results with those obtained using a previously developed corpus. This study contributes to achieving a text data standard and enables meaningful results of text mining analysis in the fashion field.

Web crawling process of each social network service for recognizing water quality accidents in the water supply networks (물공급네트워크 수질사고인지를 위한 소셜네트워크 서비스 별 웹크롤링 방법론 개발)

  • Yoo, Do Guen;Hong, Seunghyeok;Moon, Gihoon
    • Proceedings of the Korea Water Resources Association Conference / 2022.05a / pp.398-398 / 2022
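  • Recently, regional water quality problems in the tap water supply process, such as red water and larvae, have caused direct and indirect harm to the public. When a water quality problem occurs, damage-related opinions posted on social network services (SNS) spread rapidly in time and space, ultimately increasing negative perceptions of the entire water supply process and lowering trust in it. Efforts to minimize damage by applying various methodologies that quickly recognize water quality accidents in water supply systems are therefore essential. In general, water quality accidents can be identified from changes in the time-series data obtained from real-time meters for various parameters, but applying such methods efficiently first requires the introduction of advanced metering infrastructure. In this study, taking advantage of Korea's well-developed information and communication technology environment, we propose a web crawling methodology for each SNS to recognize water quality accidents in water supply networks, and we analyze the results of its application. Before implementing the methodology, we checked, for each SNS (Twitter, Instagram, blogs, Naver Cafe, etc.), whether programmatic web crawling is possible and over what period information can be obtained, and we present web crawling procedures centered on Naver Cafe and Twitter, which showed the greatest influence and the most related posts during similar past water quality accidents. For Naver Cafe, we listed the cafes in which many residents of the target water supply area participate, performed web crawling using combinations of the local government name and core keywords (tap water, larvae, red water), and established a procedure for analyzing the number and meaning of related posts in real time. Using the developed SNS-specific web crawling methodology, we analyzed two or more local governments where water quality accidents had occurred in the past, and we identified and present the differences in the results across SNSs. The proposed method is expected to enable further analysis of how water quality accident information propagates and spreads in space and time.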
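
The keyword-combination step described in this abstract (local government name combined with each core keyword) can be sketched as follows. This is only an illustration of the pairing and counting logic: it performs no actual crawling, and the function name `count_matching_posts`, the place names "CityA" and "CityB", and the sample posts are all hypothetical.

```python
from itertools import product


def count_matching_posts(posts, municipalities, keywords):
    """Count posts that mention both a municipality and a core keyword.

    Returns a dict keyed by (municipality, keyword) pairs.
    Matching is a plain case-sensitive substring check.
    """
    counts = {}
    for place, keyword in product(municipalities, keywords):
        counts[(place, keyword)] = sum(
            1 for text in posts if place in text and keyword in text
        )
    return counts


# Hypothetical inputs; a real pipeline would fill `posts` from crawled pages.
municipalities = ["CityA", "CityB"]
keywords = ["tap water", "larvae", "red water"]
posts = [
    "Found larvae in CityA tap water this morning",
    "CityA red water again, the filter turned brown",
    "CityB tap water looks fine to me",
    "Nice weather today",
]
counts = count_matching_posts(posts, municipalities, keywords)
```

Tracking these per-pair counts over time is one simple way to notice a sudden rise in related posts for a given area.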


Media-based Analysis of Gasoline Inventory with Korean Text Summarization (한국어 문서 요약 기법을 활용한 휘발유 재고량에 대한 미디어 분석)

  • Sungyeon Yoon;Minseo Park
    • The Journal of the Convergence on Culture Technology / v.9 no.5 / pp.509-515 / 2023
  • Despite the continued development of alternative energies, fuel consumption is increasing. In particular, the price of gasoline fluctuates greatly with international oil prices. Gas stations adjust their gasoline inventory in response to gasoline price fluctuations. In this study, news datasets are used to analyze gasoline consumption patterns through fluctuations in gasoline inventory. First, news datasets are collected by web crawling. Second, the news datasets are summarized using KoBART, a model for summarizing Korean text. Finally, the data are preprocessed and the fluctuation factors are derived using an N-gram language model and TF-IDF. This approach makes it possible to analyze and predict gasoline consumption patterns.
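
As a rough illustration of the N-gram step, the sketch below counts bigrams over summarized text. It shows only the basic n-gram extraction that an N-gram language model builds on, not the study's actual model or its KoBART summarization; the whitespace tokenization and the sample summaries are assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of consecutive n-token tuples in `tokens`."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical summaries standing in for summarized news articles.
summaries = [
    "gasoline prices rise as oil prices rise",
    "gas stations cut gasoline inventory as oil prices rise",
]

# Count bigrams across all summaries, using naive whitespace tokenization.
bigram_counts = Counter(
    bigram
    for summary in summaries
    for bigram in ngrams(summary.split(), 2)
)
top_bigram = bigram_counts.most_common(1)[0]
```

Frequent n-grams such as ("prices", "rise") in this toy data hint at candidate fluctuation factors, which could then be weighted with TF-IDF.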

Evaluation of communication effectiveness of cruelty-free fashion brands - A comparative study of brand-led and consumer-perceived images - (크루얼티 프리 패션 브랜드의 커뮤니케이션 성과 분석 - 브랜드 주도적 이미지와 소비자 지각 이미지에 대한 비교 -)

  • Yeong-Hyeon Choi;Sangyung Lee
    • The Research Journal of the Costume Culture / v.32 no.2 / pp.247-259 / 2024
  • This study assessed the effectiveness of brand image communication on consumer perceptions of cruelty-free fashion brands. Brand messaging data were gathered from postings on the official Instagram accounts of three cruelty-free fashion brands and consumer perception data were gathered from Tweets containing keywords related to each brand. Web crawling and natural language processing were performed using Python and sentiment analysis was conducted using the BERT model. By analyzing Instagram content from Stella McCartney, Patagonia, and Freitag from their inception until 2021, this study found these brands all emphasize environmental aspects but with differing focuses: Stella McCartney on ecological conservation, Patagonia on an active outdoor image, and Freitag on upcycled products. Keyword analysis further indicated consumers perceive these brands in line with their brand messaging: Stella McCartney as high-end and eco-friendly, Patagonia as active and environmentally conscious, and Freitag as centered on recycling. Results based on the assessment of the alignment between brand-driven images and consumer-perceived images and the sentiment evaluation of the brand confirmed the outcomes of brand communication performance. The study revealed a correlation between brand image and positive consumer evaluations, indicating that higher alignment of ethical values leads to more positive consumer assessments. Given that consumers tend to prioritize search keywords over brand concepts, it's important for brands to focus on using visual imagery and promotions to effectively convey brand communication information. These findings highlight the importance of brand communication by emphasizing the connection between ethical brand images and consumer perceptions.

A Study on Duplication Verification of Public Library Catalog Data: Focusing on the Case of G Library in Busan (공공도서관 목록데이터의 중복검증에 관한 연구 - 부산 지역 G도서관 사례를 중심으로 -)

  • Min-geon Song;Soo-Sang Lee
    • Journal of Korean Library and Information Science Society / v.55 no.1 / pp.1-26 / 2024
  • The purpose of this study is to derive an integration plan for bibliographic records by applying a duplicate verification algorithm to the item-based catalog in public libraries. To this end, G Library, which opened recently in Busan, was selected. After collecting OPAC data from G Library through web crawling, multipart monographs of Korean Literature (KDC 800) were selected and the KERIS duplicate verification algorithm was applied. After two rounds of data correction based on the verification results, the duplicate verification rate increased by a total of 2.74 percentage points, from 95.53% to 98.27%. Even after data correction, 24 books were still judged to be similar or inconsistent; these were identified as other published editions, such as revised versions or hard copies, that had received separate ISBNs. The results confirm that the duplicate verification rate can be improved through catalog data correction, and that the KERIS duplicate verification algorithm can potentially be used as a tool to convert duplicate item-based records from public libraries into manifestation-based records.