• Title/Summary/Keyword: news big data

Unstructured Data Analysis using Equipment Check Ledger: A Case Study in Telecom Domain (장비점검 일지의 비정형 데이터분석을 통한 고장 대응 효율화 사례 연구)

  • Ju, Yeonjin;Kim, Yoosin;Jeong, Seung Ryul
    • Journal of Internet Computing and Services, v.21 no.1, pp.127-135, 2020
  • As the use and analysis of big data grow in importance, interest is rising in natural language processing techniques for unstructured data such as news articles and comments. In particular, as large-scale data collection becomes feasible, data mining techniques for pre-processing and analyzing such data are emerging. In this case study with a telecom company, we propose a methodology for formalizing unstructured data using text mining. The domain is equipment failure, and the data consist of about 2.2 million equipment check ledger records; roughly 800,000 equipment failure records accumulate in the ledger each year. The equipment check ledger contains both formal (structured) and unstructured data. Formal data can readily be used for analysis, whereas unstructured data cannot be analyzed immediately. However, unstructured data is highly likely to contain important information that is not recorded in the formal fields. Therefore, this study develops a digital transformation method for the unstructured data in the equipment check ledger.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems, v.23 no.3, pp.69-94, 2017
  • Recently, growing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, advances in IT and the rising penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to gain insights through data analysis keep increasing. This means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request it. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading. As a result, big data analysis is expected to be performed by the requesters themselves. Along with this, interest in various kinds of unstructured data is continually increasing, and text data in particular is attracting much attention. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it. Furthermore, the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique in that it reflects the semantic elements of the documents.
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire set must be analyzed at once to identify the topic of each document. This requirement makes the analysis slow when topic modeling is applied to a large number of documents. It also causes a scalability problem: processing time increases exponentially with the number of analysis objects. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling. This means dividing a large number of documents into sub-units and deriving topics by repeating topic modeling on each unit. This method enables topic modeling on a large number of documents with limited system resources and can improve processing speed. It can also significantly reduce analysis time and cost, because documents can be analyzed in each location without first being combined. However, despite these advantages, the method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear: local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such an approach must be established. That is, taking the global topics as the ideal answer, the deviation of each local topic from the global topics needs to be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other topic modeling studies. In this paper, we propose a topic modeling approach that addresses these two problems.
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced entire document cluster (RGS, reduced global set) consisting of representative documents extracted from each local set. We address the first problem by mapping between RGS topics and local topics. We then verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global-set and local-set results. Using 24,000 news articles, we conduct experiments to evaluate the practical applicability of the proposed methodology. An additional experiment confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also propose a reasonable method for comparing the results of the two approaches.
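
The mapping between local topics and RGS topics described in this abstract could be sketched roughly as follows. This is a minimal illustration only: the abstract does not say how topics are compared, so cosine similarity over term weights is an assumption, and the names `cosine`, `map_local_to_global`, and the toy topic dictionaries are hypothetical rather than taken from the paper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_local_to_global(local_topics, rgs_topics):
    """Assign each local topic to its most similar RGS topic.

    Assumption: similarity is cosine over term weights; the paper may
    use a different measure.
    """
    mapping = {}
    for local_id, local_terms in local_topics.items():
        best = max(rgs_topics, key=lambda g: cosine(local_terms, rgs_topics[g]))
        mapping[local_id] = best
    return mapping


# Toy topic-term weights, made up for illustration.
rgs_topics = {
    "G0": {"election": 0.5, "candidate": 0.3, "vote": 0.2},
    "G1": {"market": 0.6, "stock": 0.3, "price": 0.1},
}
local_topics = {
    "L0": {"stock": 0.5, "price": 0.4, "bank": 0.1},
    "L1": {"vote": 0.6, "election": 0.3, "poll": 0.1},
}

mapping = map_local_to_global(local_topics, rgs_topics)
print(mapping)
```

With such a mapping in hand, accuracy in the sense of the abstract could be approximated by counting the documents whose mapped local topic matches the topic they receive from modeling the global set.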

News Big Data Analysis of 'Media Literacy' Using Topic Modeling Analysis (미디어 리터러시 뉴스 빅데이터 분석: 토픽 모델링 분석을 중심으로)

  • Han, Songlee;Kim, Taejong
    • The Journal of the Korea Contents Association, v.21 no.4, pp.26-37, 2021
  • This study conducted a big data analysis of news to identify the socially discussed agenda of media literacy and to propose relevant policy directions based on it. To this end, 1,336 articles published from January 1, 2019 to September 30, 2020 were collected, and a topic modeling analysis was conducted over four periods. Five topics were derived for each period, and the implications of the results are as follows. First, the government should implement a nation-level systematic approach to media literacy education according to life cycle stages to generate economic and cultural value. Second, local communities and schools should provide systematic support and education guidance activities to ensure a sustainable ecosystem for media literacy and prevent an educational gap and loss in learning. Third, efforts should be made in various aspects to minimize the side effects resulting from constantly providing media literacy education; furthermore, a culture of desirable media application should be established. Finally, a research environment for scientific research on media literacy, active exchange of experience and value obtained in the field, and long-term accumulation of research results should be encouraged to develop a robust knowledge exchange culture.

Analysis of news bigdata on 'Gather Town' using the Bigkinds system

  • Choi, Sui
    • Journal of the Korea Society of Computer and Information, v.27 no.3, pp.53-61, 2022
  • In recent years, Generation MZ and the Metaverse have drawn great attention, owing to the 4th industrial revolution and the development of a digital environment that blurs the boundary between reality and virtual reality. Generation MZ approaches information very differently from previous generations and uses distinct communication methods. In terms of learning, its members have different motivations, types, and skills, and they build relationships differently. Meanwhile, the Metaverse is drawing great attention as a teaching method that fits the traits of Generation MZ. Thus, this research aimed to investigate how to increase the use of the Metaverse in educational technology. Specifically, it examined the antecedents of the popularity of Gather Town, a Metaverse platform. News articles were collected and analyzed as big data using the Bigkinds system provided by the Korea Press Foundation. The analysis revealed, first, a rapidly increasing trend in media exposure of Gather Town since July 2021. This suggests a greater utilization of Gather Town in the field of education after the COVID-19 pandemic. Second, Word Association Analysis and Word Cloud Analysis showed high weights on education-related words such as 'remote', 'university', and 'freshman', while words like 'Metaverse', 'Metaverse platform', 'Covid19', and 'Avatar' were also emphasized. Third, Network Analysis extracted 'COVID19', 'Avatar', 'University student', 'career', and 'YouTube' as keywords. The findings also suggest the potential value of Gather Town as an educational tool during the COVID-19 pandemic. Therefore, this research will contribute to the application and utilization of Gather Town in the field of education.

A Study on the Factors of Well-aging through Big Data Analysis : Focusing on Newspaper Articles (빅데이터 분석을 활용한 웰에이징 요인에 관한 연구 : 신문기사를 중심으로)

  • Lee, Chong Hyung;Kang, Kyung Hee;Kim, Yong Ha;Lim, Hyo Nam;Ku, Jin Hee;Kim, Kwang Hwan
    • Journal of the Korea Academia-Industrial cooperation Society, v.22 no.5, pp.354-360, 2021
  • People hope to live a healthy and happy life, achieving satisfaction by striking a good work-life balance. Therefore, there is growing interest in well-aging, which means living happily to a healthy old age without worry. This study identified important factors related to well-aging by analyzing news articles published in Korea. Using Python-based web crawling, 1,199 articles were collected from the news service of the portal site Daum up to November 2020, and the 374 articles that matched the subject of the study were selected. The frequency analysis showed 'elderly', 'health', 'skin', 'well-aging', 'product', 'person', 'aging', 'female', 'domestic', and 'retirement' as important keywords. In addition, a social network analysis of 45 important keywords revealed strong connections, in descending order, for 'skin-wrinkle', 'skin-aging', and 'old-health'. The CONCOR analysis showed that the 45 main keywords formed eight clusters, including 'life and happiness', 'disease and death', 'nutrition and exercise', 'healing', 'health', and 'elderly services'.

An Exploratory Study on the Learning Community: Focusing on the Covid19 Untact Era (배움공동체에 대한 탐색적 연구 : covid19 언택트시대를 중심으로)

  • Jeong, Su-Jeong;Im, Hong-Nam;Park, Hong-Jae
    • Journal of Convergence for Information Technology, v.12 no.5, pp.237-245, 2022
  • This study examines the social discourse on the characteristics of the learning community in the untact era, and discusses the directions that learning communities for children could explore and consider in the pandemic situation and beyond. For this purpose, big data for one year, from January 20, 2020 to January 20, 2021, were collected through internet portal sites (including Google News, Daum, Naver and other News surfaces), using two keywords "untact" and "learning community", and analyzed by employing a word frequency and network analysis method. The analysis results show that several important terms, such as 'village education community', 'operation', 'activity', 'corona 19', 'support', and 'online' are closely related to the learning community in the untact era. The findings from this study also have implications for developing the learning community as an alternative model to fill the existing gaps in public care and education for children during the prolonged pandemic and afterwards. In conclusion, the study findings highlight that it is meaningful to identify key terms and concepts through word frequency analysis in order to examine social trends and issues related to the learning community.

Issue tracking and voting rate prediction for 19th Korean president election candidates (댓글 분석을 통한 19대 한국 대선 후보 이슈 파악 및 득표율 예측)

  • Seo, Dae-Ho;Kim, Ji-Ho;Kim, Chang-Ki
    • Journal of Intelligence and Information Systems, v.24 no.3, pp.199-219, 2018
  • With the everyday use of the Internet and the spread of various smart devices, users can communicate in real time and the existing communication style has changed. As the Internet changed who produces information, data grew massive, giving rise to the very large body of information called big data. Big data is seen as a new opportunity to understand social issues. In particular, text mining explores patterns in unstructured text data to find meaningful information. Because text data exists in many places, such as newspapers, books, and the web, it is diverse and voluminous, which makes it suitable for understanding social reality. In recent years, there have been increasing attempts to analyze text from web sources such as SNS and blogs, where the public can communicate freely. This is recognized as a useful way to grasp public opinion immediately, so it can be used for research on political, social, and cultural issues. Text mining has received much attention as a way to investigate candidates' public reputation and to predict the voting rate in place of polling. This is because many people question the credibility of surveys; in addition, people tend to refuse to answer or to conceal their real intention when asked to respond to a poll. This study collected comments from the largest Internet portal site in Korea and conducted research on the 19th Korean presidential election in 2017. We collected 226,447 comments from April 29, 2017 to May 7, 2017, which includes the period just before election day during which public opinion polls are prohibited. We analyzed frequencies, associative emotional words, topic emotions, and candidate voting rates. Through frequency analysis, we identified the words representing the most important issues each day. In particular, following the presidential debates, the candidate who became an issue ranked at the top of the frequency analysis.
Through the analysis of associative emotional words, we were able to identify the issues most relevant to each candidate. The topic emotion analysis was used to identify each candidate's topics and to express the public's emotions about those topics. Finally, we estimated the voting rate by combining the volume of comments with the sentiment score. In this way, we explored the issues for each candidate and predicted the voting rate. The analysis showed that news comments are an effective tool for tracking the issues surrounding presidential candidates and for predicting the voting rate. In particular, this study presented daily issues and a quantitative index of sentiment. It also predicted the voting rate for each candidate and precisely matched the ranking of the top five candidates. Each candidate will be able to grasp public opinion objectively and reflect it in their election strategy. Candidates can use positive issues more actively in their election strategies and try to correct negative issues. In particular, candidates should be aware that their reputation can be severely damaged if they face a moral problem. Voters can look objectively at the issues and public opinion about each candidate and make more informed decisions when voting. If they refer to results such as those of this study before voting, they will be able to see public opinion from big data and vote for a candidate from a more objective perspective. If candidates campaign with reference to big data analysis, the public will be more active on the web, recognizing that their wants are being reflected. Political views can be expressed in various places on the web, which can contribute to political participation by the people.
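
The abstract says the voting rate was estimated by combining comment volume with a sentiment score, but it does not give the formula. The sketch below shows one plausible combination under stated assumptions: each candidate's comment count is weighted by the share of positive comments and the results are normalized to sum to one. The function `estimate_vote_share` and the numbers are hypothetical and are not the authors' data or method.

```python
def estimate_vote_share(stats):
    """Estimate relative vote shares from comment statistics.

    stats maps candidate -> (comment_count, positive_ratio).
    Assumption: share is proportional to count * positive_ratio.
    """
    raw = {name: count * positive for name, (count, positive) in stats.items()}
    total = sum(raw.values())
    if total == 0:
        return {name: 0.0 for name in stats}
    return {name: value / total for name, value in raw.items()}


# Made-up inputs for illustration only.
stats = {
    "A": (1000, 0.6),
    "B": (800, 0.5),
    "C": (200, 0.5),
}

shares = estimate_vote_share(stats)
ranking = sorted(shares, key=shares.get, reverse=True)
print(ranking, shares)
```

Comparing `ranking` with the official result order is one simple way to check a prediction of this kind, which is the sort of ranking match the abstract reports.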

Data Processing and Visualization Method for Retrospective Data Analysis and Research Using Patient Vital Signs (환자의 활력 징후를 이용한 후향적 데이터의 분석과 연구를 위한 데이터 가공 및 시각화 방법)

  • Kim, Su Min;Yoon, Ji Young
    • Journal of Biomedical Engineering Research, v.42 no.4, pp.175-185, 2021
  • Purpose: Vital signs are used to help assess the general physical health of a person, give clues to possible diseases, and show progress toward recovery. Researchers are using vital sign data and AI (artificial intelligence) to manage a variety of diseases and predict mortality. To analyze vital sign data using AI, it is important to select and extract vital sign data suitable for the research purpose. Methods: We developed a method to visualize vital signs and early warning scores by processing retrospective vital sign data collected from EMR (electronic medical records) and patient monitoring devices. The vital sign data used for development were obtained from the open EMR big data set MIMIC-III and a wearable patient monitoring device (CareTaker). Data processing and visualization were developed using Python. We applied the processed results with machine learning to predict mortality in ICU patients. Results: We calculated NEWS (National Early Warning Score) to understand the patient's condition. Vital sign data with different measurement times and frequencies were sampled at equal time intervals, and missing data were interpolated to reconstruct the data. The normal and abnormal states of vital signs were visualized as color-coded graphs. Mortality prediction using the processed data and machine learning achieved an AUC of 0.892. Conclusion: This visualization method will help researchers easily understand a patient's vital sign status over time and extract the necessary data.
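
The resampling step mentioned in the Results (sampling irregular measurements at equal time intervals and interpolating missing values) could look roughly like the sketch below. The abstract does not name the interpolation method, so linear interpolation is an assumption here, and `resample_linear` and the sample heart-rate values are hypothetical.

```python
from bisect import bisect_left


def resample_linear(samples, start, stop, step):
    """Resample (time, value) pairs onto an evenly spaced time grid.

    samples must be sorted by time. Grid points between two measurements
    are linearly interpolated (an assumption); grid points outside the
    measured range are returned as None.
    """
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    grid = []
    t = start
    while t <= stop:
        if not times or t < times[0] or t > times[-1]:
            grid.append((t, None))
        else:
            i = bisect_left(times, t)
            if times[i] == t:
                grid.append((t, values[i]))
            else:
                t0, t1 = times[i - 1], times[i]
                v0, v1 = values[i - 1], values[i]
                grid.append((t, v0 + (v1 - v0) * (t - t0) / (t1 - t0)))
        t += step
    return grid


# Made-up heart-rate readings at irregular times (seconds, beats per minute).
heart_rate = [(0, 80), (90, 86), (240, 92)]

grid = resample_linear(heart_rate, start=0, stop=240, step=60)
for t, value in grid:
    print(t, value)
```

Once every vital sign sits on the same time grid, a per-timestamp score such as NEWS can be computed row by row and plotted as a color-coded series.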

Analysis of remote learning trends in the COVID-19 period using news big data (뉴스 빅데이터를 활용한 코로나 19시기의 원격 교육 동향 분석)

  • Lee, Youngho;Koo, Dukhoi
    • 한국정보교육학회:학술대회논문집, 2021.08a, pp.193-197, 2021
  • The COVID-19 pandemic has had impacts large and small on our society in social, economic, psychological, and other respects. To prevent the spread of COVID-19, various countries, including Korea, adopted long-term home care and distance learning systems. However, the distance learning experiments conducted in many countries have raised the question of whether face-to-face education can be replaced by distance learning. Therefore, this study analyzed public opinion, social perception, and field trends based on media reports on distance learning. For this purpose, 2,600 articles related to distance learning were collected from 11 newspapers and four broadcasters. Based on these data, keyword trend analysis, topic modeling analysis, and sentiment analysis were performed.

Comparative Analysis of Low Fertility Policy and the Public Perceptions using Text-Mining Methodology (텍스트 마이닝을 활용한 저출산 정책과 대중인식 비교)

  • Bae, Giryeon;Moon, HyunJeong;Lee, Jaeil;Park, Mina;Park, Arum
    • Journal of Digital Convergence, v.19 no.12, pp.29-42, 2021
  • As low fertility intensifies in Korea, this study investigated the fundamental differences between the government's low fertility policy and public perception of it. To this end, we selected as analysis targets the four editions of the 'Aging Society and Population Policy' documents and the news comments posted during the two weeks immediately after the announcement of the third and fourth policies. We then conducted word frequency analysis, co-occurrence analysis, and CONCOR analysis. The analyses showed, first, that direct childcare support was prominent in the first and second periods, while a social structural approach was prominent in the third and fourth periods. Second, both the policies and the comments aimed at work-family compatibility in 'parenting'. Lastly, the comments showed public interest in the child-rearing environment and a critical view of the policy's effectiveness. This study is meaningful in that it confirmed public perception using big data analysis, and it will help improve the direction of future low fertility policy.
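
Word frequency and co-occurrence analysis, as named in this abstract, can be illustrated with a small sketch. It assumes the documents are already tokenized (Korean text would first need morphological analysis, which is not shown) and counts a pair once per document in which both words appear. The function name and the toy tokens are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequency_and_cooccurrence(docs):
    """Count word frequencies and document-level word co-occurrences.

    docs is a list of token lists. Each unordered word pair is counted
    once per document in which both words occur.
    """
    word_freq = Counter()
    pair_freq = Counter()
    for tokens in docs:
        word_freq.update(tokens)
        for pair in combinations(sorted(set(tokens)), 2):
            pair_freq[pair] += 1
    return word_freq, pair_freq


# Toy pre-tokenized comments, made up for illustration.
docs = [
    ["childcare", "support", "policy"],
    ["parenting", "work", "policy"],
    ["childcare", "policy", "cost"],
]

word_freq, pair_freq = frequency_and_cooccurrence(docs)
print(word_freq.most_common(3))
print(pair_freq.most_common(3))
```

The pair counts form the co-occurrence matrix that network and CONCOR-style clustering analyses typically take as input.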