• Title/Summary/Keyword: news data

Training Techniques for Data Bias Problem on Deep Learning Text Summarization (딥러닝 텍스트 요약 모델의 데이터 편향 문제 해결을 위한 학습 기법)

  • Cho, Jun Hee;Oh, Hayoung
    • Journal of the Korea Institute of Information and Communication Engineering / v.26 no.7 / pp.949-955 / 2022
  • Deep learning-based text summarization models depend heavily on the datasets they are trained on. For example, a summarization model trained on a news summarization dataset is not good at summarizing other types of text, such as internet posts and papers. In this study, we define this phenomenon as the Data Bias Problem (DBP) and propose two training methods for solving it. The first is 'proper nouns masking', which masks proper nouns. The second is 'length variation', which randomly inflates or deflates the length of the text. Experiments show that our methods are effective at mitigating DBP. In addition, we analyze the results of the experiments and present future development directions. Our contributions are as follows: (1) We discovered DBP and defined it for the first time. (2) We proposed two efficient training methods and conducted actual experiments. (3) Our methods can be applied to all summarization models and are easy to implement, so they are highly practical.
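
The two augmentations named in this abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration, not the authors' implementation: the abstract does not say how proper nouns are detected (here they are passed in as a ready-made set), which mask token is used (`[MASK]` is a placeholder), or how length is varied (here sentences are randomly dropped or repeated). The names `mask_proper_nouns`, `vary_length`, `p_drop`, and `p_repeat` are hypothetical.

```python
import random
import re

MASK_TOKEN = "[MASK]"  # assumed placeholder; the abstract does not name one


def mask_proper_nouns(text, proper_nouns):
    """Replace each listed proper noun with MASK_TOKEN (whole-word match)."""
    if not proper_nouns:
        return text
    # Try longer names first so a multi-word name wins over its prefix.
    names = sorted(proper_nouns, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(n) for n in names) + r")\b"
    return re.sub(pattern, MASK_TOKEN, text)


def vary_length(sentences, rng, p_drop=0.2, p_repeat=0.2):
    """Randomly drop (deflate) or repeat (inflate) sentences.

    This drop/repeat scheme is an assumed stand-in for 'length variation'.
    """
    out = []
    for sentence in sentences:
        r = rng.random()
        if r < p_drop:
            continue  # deflate: skip this sentence
        out.append(sentence)
        if r >= 1.0 - p_repeat:
            out.append(sentence)  # inflate: repeat this sentence
    # Never return an empty document when the input had content.
    return out or list(sentences[:1])
```

In practice the proper-noun set would probably come from a part-of-speech tagger or named-entity recognizer, and passing a seeded `random.Random` instance as `rng` keeps the augmentation reproducible.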

Change in Market Issues on HMR (Home Meal Replacements) Using Local Foods after the COVID-19 Outbreak: Text Mining of Online Big Data (코로나19 발생 후 지역농산물 이용 간편식에 대한 시장 이슈 변화: 온라인 빅데이터의 텍스트마이닝)

  • Yoojeong, Joo;Woojin, Byeon;Jihyun, Yoon
    • Journal of the Korean Society of Food Culture / v.38 no.1 / pp.1-14 / 2023
  • This study was conducted to explore changes in market issues concerning HMR (Home Meal Replacements) using local foods after the COVID-19 outbreak. Online text data were collected from internet news, social media posts, and web documents before (January 2016 to December 2019) and after (January 2020 to November 2022) the COVID-19 outbreak. TF-IDF analysis showed that 'Trend', 'Market', 'Consumption', and 'Food service industry' were the major keywords before the COVID-19 outbreak, whereas 'Wanju-gun', 'Distribution', 'Development', and 'Meal-kit' were the main keywords after it. The results of topic modeling analysis and categorization showed that after the COVID-19 outbreak, the 'Market' category included 'Non-face-to-face market' instead of 'Event' and 'Delivery' instead of 'Distribution'. In the 'Product' category, 'Marketing' was included instead of 'Trend'. Additionally, in the 'Support' category, 'Start-up' and 'School food service' appeared as new topics after the COVID-19 outbreak. In conclusion, this study showed that meaningful changes had occurred in market issues concerning HMR using local foods after the COVID-19 outbreak. Therefore, governments should take advantage of this market opportunity by implementing policies and programs to promote the development and marketing of HMR using local foods.

Social Media Analytics to Understand the Construction Industry Sentiments

  • Shrestha, K. Joseph;Mani, Nirajan;Kisi, Krishna P.;Abdelaty, Ahmed
    • International conference on construction engineering and project management / 2022.06a / pp.712-720 / 2022
  • The use of social media to disseminate news and interact with project stakeholders is increasing over time in the construction industry. Such social media data can be analyzed to gain useful insights into the industry, such as the demand for new housing construction and the satisfaction of construction workers. However, there have been only limited attempts to analyze social media data related to the construction industry. The objective of this study is to collect and analyze construction-related tweets to understand the overall sentiments of individuals and organizations about the construction industry. The study collected 87,244 tweets from April 6, 2020, to April 13, 2020, which had hashtags relevant to the construction industry. The tweets were then analyzed to evaluate their sentiment polarity (positive or negative) and sentiment intensity, or score (-1 to +1). Descriptive statistics were produced for the tweets, and the sentiment scores were visualized in a scatterplot to show the trend of the scores over time. The results show that the overall sentiment score of all the tweets was slightly positive (0.0365). Negative tweets were retweeted and marked as favorite by more users on average than positive ones. More specifically, tweets with negative sentiments were retweeted by 2,802 users on average, compared with an average retweet count of 247 for tweets with positive sentiments. This study can potentially be expanded in the future to produce a real-time indicator of the construction market, covering conditions such as the increased availability of construction jobs, improved wage rates, and recession.

Structural Topic Modeling Analysis of Patient Safety Interest among Health Consumers in Social Media (소셜미디어 내 의료소비자의 환자안전 관심에 대한 구조적 토픽 모델링 분석)

  • Kim, Nari;Lee, Nam-Ju
    • Journal of Korean Academy of Nursing / v.54 no.2 / pp.266-278 / 2024
  • Purpose: This study aimed to investigate healthcare consumers' interest in patient safety on social media using structural topic modeling (STM) and to identify changes in interest over time. Methods: Analyzing 105,727 posts from Naver news comments, blogs, internet cafés, and Twitter between 2010 and 2022, this study deployed a Python script for data collection and preprocessing. STM analysis was conducted using R, with the documents' publication years serving as metadata to trace the evolution of discussions on patient safety. Results: The analysis identified a total of 13 distinct topics, organized into three primary communities: (1) "Demand for systemic improvement of medical accidents," underscoring the need for legal and regulatory reform to enhance accountability; (2) "Efforts of the government and organizations for safety management," highlighting proactive risk mitigation strategies; and (3) "Medical accidents exposed in the media," reflecting widespread concerns over medical negligence and its repercussions. These findings indicate pervasive concerns regarding medical accountability and transparency among healthcare consumers. Conclusion: The findings emphasize the importance of transparent healthcare policies and practices that openly address patient safety incidents. There is clear advocacy for policy reforms aimed at increasing the accountability and transparency of healthcare providers. Moreover, this study highlights the significance of educational and engagement initiatives involving healthcare consumers in fostering a culture of patient safety. Integrating consumer perspectives into patient safety strategies is crucial for developing a robust safety culture in healthcare.

The Study on the TV Female Anchor's Image according to the Make Up and Hair Style (메이크업과 헤어스타일 유형에 따른 TV 뉴스 여자 앵커의 인상형성에 관한 연구)

  • Oh, In-Young;Kim, In-Sook
    • Journal of the Korean Society of Clothing and Textiles / v.30 no.11 s.158 / pp.1636-1647 / 2006
  • The purpose of this study was to provide data that can be used to suggest an ideal anchor image that helps news programs capture ratings, and to offer broadcasting stations a guide for casting and training anchors, by asking general TV viewers about the ideal looks and images of TV news anchors. The research method was a questionnaire survey. The subjects were 839 male and female viewers in their 20s and 40s residing in the Seoul and Gyeonggi area. The study results are as follows: 1) Factors that determine the impression of a female anchor: the factors were 'specialty factor, friendliness factor, elegance factor, dynamic factor, and attractiveness factor'. 2) Differences in impression formation according to the makeup and hairstyle of a female anchor: the specialty and friendliness factors scored high with natural makeup, the dynamic factor scored high with elegant makeup, and the attractiveness factor scored high with natural and romantic makeup. All factors scored high when a female anchor had a short-cut style and straight hair. 3) Impression formation from makeup and hairstyle according to the perceiver's variables (gender and age): both male and female viewers gave high scores for a female anchor's specialty, such as 'confident and reliable', and friendliness, such as 'warm and comfortable', when she wore natural makeup. They gave high scores for the attractiveness factor, such as 'good impression and refined', with romantic makeup and for the dynamic factor, such as 'positive and confident', with elegant makeup. Both male and female viewers gave high scores for all factors except friendliness when a female anchor had a short-cut style rather than bobbed hair, and high scores for the specialty factor when she had straight hair. Viewers in both their 20s and 40s gave high scores for specialty and friendliness when a female anchor wore natural makeup, while those in their 20s gave high scores for the elegance and dynamic factors when she wore elegant makeup. Viewers in both their 20s and 40s gave high scores for all factors when a female anchor had short-cut hair.

A Study on Citizen Reporter Systems and Civic Journalism Practices in Korean Internet Newspapers (시민기자 제도 도입에 따른 인터넷 신문의 시민 저널리즘 실천 가능성에 관한 연구)

  • Kim, Byoung-Cheol;Choi, Young
    • Korean journal of communication and information / v.26 / pp.45-82 / 2004
  • The purpose of this study is to examine the concept of civic journalism and the contents of Korean Internet newspapers that might reflect the possibilities of this new medium for civic journalism practices. This study examined how far and how deeply civic journalism practices have extended into Korean Internet newspapers as a new tradition of journalism. More specifically, this study analyzed news articles of Korean Internet newspapers to uncover any differences among civic journalism Internet newspapers with different citizen reporter systems. A composite measure based on ten elements of civic journalism practices was used as an indicator of civic journalism practices. To obtain systematic data on news offered by Korean Internet newspapers on the World Wide Web, four major Internet newspapers, Ohmynews, Ngotimes, Netpinion, and Pressian, were examined by content analysis in April and May of 2003. The findings reveal that many Korean newspapers do not fully exploit the opportunities and advantages offered by the new medium for civic journalism practices in online environments. Both aggregate-level and individual-level analyses of the civic journalism index reveal some differences between non-civic journalism Internet newspapers and civic journalism Internet newspapers using citizen reporter systems. However, the overall performance of civic journalism Internet newspapers is not good enough to support the argument that civic journalism is well practiced in Korean Internet newspapers. Nonetheless, it would not be fair to conclude that Korean Internet newspapers have totally ignored the Internet's potential to improve civic journalism performance in online environments.

The Relationship among Media use, Political cynicism, Voting Behavior in 2012 General Elections (2012 국회의원 총선에서 나타난 미디어 이용, 정치 냉소주의, 투표 참여 간의 관계에 관한 연구)

  • Kwon, Hyok-Nam
    • Korean journal of communication and information / v.60 / pp.28-51 / 2012
  • This study explored the influence of media use on audiences' intention to vote and on their political cynicism in the 2012 general elections. I posed three research questions. Research Question 1: What is the impact of media use on political cynicism? Research Question 2: What is the impact of political interest, political knowledge, media malaise, and political efficacy on political cynicism? Research Question 3: What is the impact of political cynicism on voting behavior? This study analyzed survey data. Based on the results of hierarchical regression analysis and path analysis (AMOS), Internet news use was found to have a significant impact on political cynicism, but the use of newspapers and TV news was not related to political cynicism. Political efficacy effectively decreased political cynicism. The findings indicate that the relationship between media use and political cynicism is contingent on many factors and that cynicism has a negligible impact on citizen participation. This study also found that persons higher in efficacy were less cynical than those low in efficacy. This suggests that cynicism is not always a bad thing and that it may in fact be an indication of "an interested and critical citizenry". In conclusion, this study showed that more in-depth analyses are needed of the relationships among attention to media use, political cynicism, and voting behavior in order to activate political participation.

An Analysis of Newspaper Coverage of Korean Movie Stars : Focusing on the Image of Movie Stars and Reporting Trend (신문의 한국 영화스타 보도 내용분석 : 영화스타의 이미지와 보도 경향 중심으로)

  • Tae, Bo-Ra
    • The Journal of the Korea Contents Association / v.19 no.9 / pp.535-549 / 2019
  • The purpose of this study was to examine what kinds of images of movie stars were presented in newspapers. For this purpose, we divided the time span into periods according to movie industry and media trends, selected representative stars for each period, and collected 798 related newspaper articles. Analysis of the reporting trend, domestic and foreign topics, news format, and gender differences in the collected articles showed that the images of movie stars reproduced in newspaper articles were mostly neutral images that do not represent a specific gender. Since the 2000s, news coverage has shifted toward reproducing various images rather than being fixed on particular ones, and the comparison of domestic and foreign topics showed that the subjects of reports became more diversified. In addition, articles in the form of book reviews decreased and interview-type articles increased, and the proportion of articles based on works was higher for male movie stars than for female movie stars. This study is significant in that it diachronically explored changes in the process of reproducing star images, from the initial emergence of stars to modern times. It is hoped that this study will serve as basic data for follow-up studies on the process of reproducing various images in the multi-media era.

Trend Analysis of Fraudulent Claims by Long Term Care Institutions for the Elderly using Text Mining and BIGKinds (텍스트 마이닝과 빅카인즈를 활용한 노인장기요양기관 부당청구 동향 분석)

  • Youn, Ki-Hyok
    • Journal of Internet of Things and Convergence / v.8 no.2 / pp.13-24 / 2022
  • To explore the context of fraudulent claims by long-term care institutions for the elderly, which are increasing every year in Korea, and measures for preventing them, this study conducted a text mining analysis of media reports. The articles were collected from the news big data analysis system 'BIG KINDS' for about 15 years, from July 2008, when the Long-Term Care Insurance for the Elderly took effect, to February 28, 2022. During this period, a total of 2,627 articles were collected under keywords such as 'elderly care+fraudulent claims' and 'long-term care+fraudulent claims', and 946 of them were selected after excluding duplicate articles. The results of the text mining analysis were as follows. First, the top 10 most frequently mentioned keywords over the entire period (July 1, 2008 to February 28, 2022) were, in order, long-term care institution for the elderly, fraudulent claims, National Health Insurance Service, Long-Term Care Insurance for the Elderly, long-term care benefits (expenses), elderly care facilities, the Ministry of Health & Welfare, the elderly, report, and reward (payment). Second, in the N-gram analysis, the most frequent pairs were, in order, long-term care benefits (expenses) and fraudulent claims, fraudulent claims and long-term care institution for the elderly, falsehood and fraudulent claims, report and reward (payment), and long-term care institution for the elderly and report. Third, the TF-IDF results were similar to those of the frequency analysis, while the rankings of report, reward (payment), and increase moved up. Based on these results, this study presented future directions for preventing fraudulent claims by long-term care institutions for the elderly.
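
For readers unfamiliar with the TF-IDF weighting mentioned in this abstract, a minimal sketch follows. It assumes one common variant (term frequency as count divided by document length, inverse document frequency as log(N / df)); the abstract does not state which weighting or tool the study used, and the function name `tf_idf` and the sample tokens in the usage note are hypothetical.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses tf = count / document length and idf = log(N / df), which is
    one common variant and not necessarily the one used in the study.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores
```

Calling `tf_idf([["claims", "report", "claims"], ["claims", "reward"]])` gives "claims" a weight of zero in both documents because it occurs in every document, while "report" and "reward" receive positive weights. Under this variant, terms concentrated in fewer documents rank higher than they would by raw frequency alone.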

Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation (영한 기계 번역에서 미가공 텍스트 데이터를 이용한 대역어 선택 중의성 해소)

  • Kim Yu-Seop;Chang Jeong-Ho
    • The KIPS Transactions:PartB / v.11B no.6 / pp.749-758 / 2004
  • In this paper, we propose a new method for disambiguating target word selection in English-Korean machine translation that uses only a raw corpus, without additional human effort. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). These two techniques can represent complex semantic structures in given contexts such as text passages. We construct linguistic semantic knowledge with the two techniques and use this knowledge for target word selection in English-Korean machine translation. For target word selection, we utilize grammatical relationships stored in a dictionary. We use the k-nearest neighbor learning algorithm to resolve the data sparseness problem in target word selection, and we estimate the distance between instances based on these models. In experiments, we use TREC data of AP news to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by over 10%, and PLSA showed better accuracy than LSA. Finally, using correlation calculation, we show the relatedness between the accuracy and two important factors: the dimensionality of the latent space and the k value of k-NN learning.