• Title/Summary/Keyword: Big data collection

National Awareness of the 2019 World Swimming Championships using Big Data from Social Network Analysis (소셜네트워크 분석의 빅데이터를 활용한 2019세계수영선수권 대회의 국내 인식조사)

  • Kim, Gi-Tak
    • Journal of Korea Entertainment Industry Association / v.13 no.4 / pp.173-184 / 2019
  • Data for this study were collected by searching social media for keywords with Textom and then refined in a web environment, covering three topics: the 2019 Gwangju World Swimming Championships, the 2019 Gwangju World Swimming Masters Competition, and problems with the 2019 World Swimming Championships. The collected words were loaded into UCINET 6 and visualized, and a CONCOR analysis was conducted to identify similarity relationships among words and clusters of common factors. The clusters for the 2019 Gwangju World Swimming Championships consisted of four areas of recognition and perception, with searches focused mainly on operational aspects of the championships. The clusters for the 2019 Gwangju World Swimming Masters Competition were divided into two areas, major recognition and peripheral recognition, with searches focused mainly on promotion of the Masters Competition and aspects of the competition itself. The clusters for problems with the 2019 Gwangju World Swimming Championships were divided into five areas, with searches focused mainly on the places, operations, institutions, and events involved in those problems.
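
The CONCOR step mentioned in this abstract can be illustrated with a small sketch. This is not the UCINET 6 implementation used in the study; it only shows the general idea of CONCOR (repeatedly correlating the columns of a matrix until the entries approach +1 or -1, then splitting on the sign). The function names, tolerance, and iteration limit are assumptions, and the matrix passed in is expected to be a word-by-word co-occurrence table.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def correlation_matrix(matrix):
    """Correlate every column of the matrix with every other column."""
    columns = list(zip(*matrix))
    return [[pearson(a, b) for b in columns] for a in columns]


def concor_split(matrix, max_iter=100, tol=1e-6):
    """Split column indices into two blocks by iterated correlation.

    max_iter and tol are illustrative defaults, not values from the study.
    """
    corr = correlation_matrix(matrix)
    for _ in range(max_iter):
        # Stop once every entry is (numerically) +1 or -1.
        if all(abs(abs(v) - 1.0) < tol for row in corr for v in row):
            break
        corr = correlation_matrix(corr)
    # Words correlated positively with the first word form one block.
    first = [i for i, v in enumerate(corr[0]) if v > 0]
    second = [i for i, v in enumerate(corr[0]) if v <= 0]
    return first, second
```

Applying `concor_split` again to each resulting block would give a finer partition, which is how more than two clusters are usually obtained with this kind of procedure.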

HTML Text Extraction Using Tag Path and Text Appearance Frequency (태그 경로 및 텍스트 출현 빈도를 이용한 HTML 본문 추출)

  • Kim, Jin-Hwan;Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.25 no.12 / pp.1709-1715 / 2021
  • To extract the necessary text accurately from a web page, a web crawler can be told which tags and style attributes contain the main content, but the extraction logic then has to be modified whenever the page layout changes. To address this problem, a previous study proposed extracting the text by analyzing how frequently text appears, but its performance varied widely depending on the collection channel of the web page. In this paper, we therefore propose a method that extracts text with high accuracy across various collection channels by analyzing not only the appearance frequency of text but also the parent tag paths of text nodes extracted from the DOM tree of the web page.
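
A minimal sketch of the tag-path idea, using only the Python standard library, might look like the following. It is not the authors' algorithm: it simply records the parent tag path of each text node and keeps the path that carries the most text, which is a simplifying assumption. The class and function names, and the lists of void and skipped tags, are hypothetical.

```python
from collections import defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "meta", "link", "input"}  # tags with no end tag
SKIP_TAGS = {"script", "style"}                            # never body text


class TagPathCollector(HTMLParser):
    """Collect each text node together with the tag path of its parent."""

    def __init__(self):
        super().__init__()
        self.stack = []   # currently open tags, outermost first
        self.nodes = []   # (tag_path, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not SKIP_TAGS.intersection(self.stack):
            self.nodes.append(("/".join(self.stack), text))


def extract_main_text(html):
    """Return the text under the tag path that holds the most characters."""
    collector = TagPathCollector()
    collector.feed(html)
    chars_per_path = defaultdict(int)
    for path, text in collector.nodes:
        chars_per_path[path] += len(text)
    if not chars_per_path:
        return ""
    best_path = max(chars_per_path, key=chars_per_path.get)
    return "\n".join(text for path, text in collector.nodes if path == best_path)
```

Because the selection depends on tag paths observed in the page rather than on hard-coded tag or style names, a sketch like this does not need to be edited when a site changes its class names, although a real extractor would need a more careful scoring rule.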

Security Log collection and analysis System Design Using Big Data System (빅 데이터 시스템을 이용한 보안 로그 수집 및 분석 시스템 설계)

  • Kim, Du-Hoe;Shin, Dong-Kyoo;Shin, Dong-Il
    • Annual Conference of KIPS / 2016.04a / pp.321-323 / 2016
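  • With the recent development of new technologies such as SNS, cloud services, and IoT, interest in personal information protection and security has grown. As a result, building security solutions to protect customer information has become indispensable for companies. To meet this need, a security management system called ESM emerged, and the trend has recently been shifting toward SIEM. Because SIEM relies on administrators monitoring logs, it is limited when a large volume of logs is generated or when accumulated logs must be analyzed. This paper therefore proposes an automated system that accumulates logs using a big data system and analyzes the accumulated logs using Mahout.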

Predictive Maintenance Plan based on Vibration Monitoring of Nuclear Power Plants using Industry 4.0 (4차 산업기술을 활용한 원전설비 진동감시기반 예측정비 방안)

  • Do-young Ko
    • Transactions of the Korean Society of Pressure Vessels and Piping / v.19 no.1 / pp.6-10 / 2023
  • Only about 10% of selected equipment in nuclear power plants is monitored, through wired connections, to address failures or problems caused by vibration. This monitoring is primarily for preventive maintenance, not predictive maintenance. This paper shows that vibration monitoring and diagnosis using Industry 4.0 technologies enables complete predictive maintenance for all vibrating equipment in nuclear power plants through the convergence of the Internet of Things, wireless technology, big data from periodic collection, and artificial intelligence. Predictive maintenance using wireless technology is possible in all areas and all systems of nuclear power plants, but it must satisfy regulatory guides on electromagnetic interference and cyber security.

Determining Food Nutrition Information Preference Through Big Data Log Analysis (빅데이터 로그분석을 통한 식품영양정보 선호도 분석)

  • Hana Song;Hae-Jeung Lee;Hunjoo Lee
    • Journal of Food Hygiene and Safety / v.38 no.5 / pp.402-408 / 2023
  • Consumer interest in food nutrition continues to grow; however, research on consumer preferences related to nutrition remains limited. In this study, big data analysis was conducted using keyword logs collected from the national information service, the Korean Food Composition Database (K-FCDB), to determine consumer preferences for foods of nutritional interest. The data collection period was set from January 2020 to December 2022, covering a total of 2,243,168 food name keywords searched by K-FCDB users. Food names were processed by merging them into representative food names. The search frequency of food names was analyzed for the entire period and by season using R. In the frequency analysis for the entire period, steamed rice, chicken, and egg were found to be the most frequently consumed foods by Koreans. Seasonal preference analysis revealed that in the spring and summer, foods without broth and cold dishes were consumed frequently, whereas in fall and winter, foods with broth and warm dishes were more popular. Additionally, foods sold by restaurants as seasonal items, such as Naengmyeon and Kongguksu, also exhibited seasonal variations in frequency. These results provide insights into consumer interest patterns in the nutritional information of commonly consumed foods and are expected to serve as fundamental data for formulating seasonal marketing strategies in the restaurant industry, given their indirect relevance to consumer trends.
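
The processing described in this abstract (merging keywords into representative food names, then counting search frequency overall and by season) can be sketched as follows. The study itself used R; Python is used here only for illustration. The merge table, the month-to-season mapping, and the function names are all assumptions, not details taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical merge table: raw search keyword -> representative food name.
REPRESENTATIVE_NAMES = {
    "boiled egg": "egg",
    "fried egg": "egg",
    "eggs": "egg",
}

# Assumed season boundaries (month number -> season).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "fall", 10: "fall", 11: "fall",
}


def normalize(keyword):
    """Lower-case and trim a keyword, then map it to its representative name."""
    cleaned = keyword.strip().lower()
    return REPRESENTATIVE_NAMES.get(cleaned, cleaned)


def search_frequencies(log):
    """Count searches overall and per season.

    log is an iterable of (month, keyword) pairs, e.g. (7, "Naengmyeon").
    """
    overall = Counter()
    by_season = defaultdict(Counter)
    for month, keyword in log:
        name = normalize(keyword)
        overall[name] += 1
        by_season[SEASON_BY_MONTH[month]][name] += 1
    return overall, by_season
```

Calling `overall.most_common(3)` on the result would list the three most searched representative names, and `by_season["summer"].most_common(3)` would do the same for a single season.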

Regional Health Status and Medicine Expenses by Income Quartile Using the Korea Health Panel (한국의료패널로 본 소득분위에 따른 권역별 건강수준과 의약품 지출 비용)

  • Kim, Yun-Jeong;Hwang, Byung-Deog
    • The Korean Journal of Health Service Management / v.11 no.1 / pp.117-130 / 2017
  • Objectives: This study analyzed health status and medicine expenses by income quintile for 3,107 patients, based on raw data from 2014. Methods: The analysis used comparison of means, ANOVA, and multiple logistic regression; statistical testing used the t-test and the Scheffé post hoc test. Results: Gender (p<.000), age (p<.000), marital status (p<.000), educational status (p<.000), easement (p<.000), medication (p<.000), and subjective health status (p<.005) were analyzed. In the 1st quintile, spending was highest in the Chungcheong region; in the 2nd quintile, it was highest in the Gyeongsang region. In the 3rd and 4th quintiles, expenditure was highest in the Seoul metropolitan region. In the 5th quintile, the Chungcheong region was again the highest and the Jeolla region was the lowest in terms of expenditure. Conclusions: Future medical research on income will require government Big Data collection to create the primary basis for policy making, in order to improve the efficiency, effectiveness, and equity of medicine spending.

R&D Perspective Social Issue Packaging using Text Analysis

  • Wong, William Xiu Shun;Kim, Namgyu
    • Journal of Information Technology Services / v.15 no.3 / pp.71-95 / 2016
  • In recent years, text mining has been used to extract meaningful insights from the large volume of unstructured text data sets of various domains. As one of the most representative text mining applications, topic modeling has been widely used to extract main topics in the form of a set of keywords extracted from a large collection of documents. In general, topic modeling is performed according to the weighted frequency of words in a document corpus. However, general topic modeling cannot discover the relation between documents if the documents share only a few terms, although the documents are in fact strongly related from a particular perspective. For instance, a document about "sexual offense" and another document about "silver industry for aged persons" might not be classified into the same topic because they may not share many key terms. However, these two documents can be strongly related from the R&D perspective because some technologies, such as "RF Tag," "CCTV," and "Heart Rate Sensor," are core components of both "sexual offense" and "silver industry." Thus, in this study, we attempted to discover the differences between the results of general topic modeling and R&D perspective topic modeling. Furthermore, we package social issues from the R&D perspective and present a prototype system, which provides a package of news articles for each R&D issue. Finally, we analyze the quality of R&D perspective topic modeling and provide the results of inter- and intra-topic analysis.

Home Automation System through Learning User Life Pattern (사용자 생활패턴 학습을 통한 홈오토메이션 시스템)

  • Bae, Hong-Min;Seo, Shin-Il;Kim, Byung-Seo
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.8 no.2 / pp.79-85 / 2015
  • Home automation technology refers to techniques that organically combine the various devices in a home to make home life more convenient and safer, improving people's living conditions. This requires technology that lets the home recognize the user's intentions and act before being told. In this paper, we implement a system model that collects data using a sensor network, and we introduce a method by which the home itself makes use of that data.

Phishing Attack Detection Using Deep Learning

  • Alzahrani, Sabah M.
    • International Journal of Computer Science & Network Security / v.21 no.12 / pp.213-218 / 2021
  • This paper proposes a technique for detecting phishing, a significant threat that attempts to obtain sensitive and confidential information such as usernames, passwords, and credit card information from a targeted individual or organization. By definition, a phishing attack happens when malicious people pose as trusted entities to fraudulently obtain user data. Phishing is classified as a type of social engineering attack. For a phishing attack to happen, a victim must be convinced to open an email or a direct message [1]. The email or direct message contains a link that the victim is asked to click. The aim of the attack is usually to install malicious software or to freeze a system. In other instances, the attackers threaten to reveal sensitive information obtained from the victim. Phishing attacks can have devastating effects on the victim. Sensitive and confidential information can find its way into the hands of malicious people. Another devastating effect of phishing attacks is identity theft [1]. Attackers may impersonate the victim to make unauthorized purchases. Victims also complain of loss of funds when attackers access their credit card information. The proposed method has two major subsystems: (1) data collection, in which websites were collected as big data forming normal and phishing datasets, and (2) a distributed detection system, which uses a neural network algorithm and machine learning. The Amazon cloud was used to run the cluster on machines with different core counts. In experiments, the proposed system achieved very good accuracy and detection rate.

Optimizing Study-life Balance within Higher Education: A Comprehensive Literature Review

  • HATCHER, Ryan;HWANG, Yosung
    • The Journal of Economics, Marketing and Management / v.8 no.2 / pp.1-12 / 2020
  • Purpose: The phrase "work-life balance" was brought up in 1986, when detrimental workplace practices were prevalent among many Americans, such as neglecting families, leisure activities, and friends in order to achieve workplace goals. The significance of work-life balance has been gaining ground in recent years and now extends to a wider range of groups, including students. Searching for and finding a balance can be complex and challenging for many individuals and students. Research design, data and methodology: In this paper, we explore how students balance the competing demands of work, study, and social activities. Several factors have increased imbalances within educational organizations, and technology in particular has been influential. However, technology also provides a novel solution to this organizational performance management issue. A Study-Life Optimization (SLO) model is suggested, which incorporates information systems, analytics, and decision support into a Smart Service System. A general framework for this model, covering data collection, measurement, and ethical issues, is explained briefly. Results: Outcomes include improved WLB, greater perceived quality of life, and increased educational organizational performance. Conclusions: This paper contributes to the relevant literature by paying attention to the school, work, and personal lives of students with varying lifestyles. The findings of this study will provide a meaningful understanding of the work/school-life balance issues faced by students. The research could be helpful to the various stakeholders of a university, such as curriculum designers and program coordinators.