• Title/Summary/Keyword: Big data Processing


Design and Implementation of a System for Recommending Related Content Using NoSQL (NoSQL 기반 연관 콘텐츠 추천 시스템의 설계 및 구현)

  • Ko, Eun-Jeong;Kim, Ho-Jun;Park, Hyo-Ju;Jeon, Young-Ho;Lee, Ki-Hoon;Shin, Saim
    • Journal of Korea Multimedia Society / v.20 no.9 / pp.1541-1550 / 2017
  • The growing volume of multimedia content offered to users creates a demand for content recommendation. In this paper, we propose a system for recommending content related to the content that the user is watching. In the proposed system, relationship information between content items is generated from relationship information between their representative keywords. Relationship information between keywords is generated by analyzing keyword collocation frequencies in an Internet news corpus. To handle the large corpus, we design an architecture that consists of a distributed search engine and a distributed data processing engine. Furthermore, we store the keyword-to-keyword and keyword-to-content relationship information in NoSQL to handle big relationship data. Because the query optimizer of NoSQL is not as well developed as that of an RDBMS, we propose query optimization techniques to efficiently process complex queries for recommendation. Experimental results show that the proposed techniques improve performance by up to 69 times, especially when the number of requested related keywords is small.
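
The collocation-frequency step described in this abstract can be illustrated with a minimal sketch. It assumes that "collocation" means two keywords appearing in the same article; the abstract does not give the paper's exact definition, and the function names and toy corpus below are hypothetical.

```python
from collections import Counter
from itertools import combinations


def collocation_counts(documents):
    """Count how often each unordered keyword pair appears in the same document."""
    counts = Counter()
    for keywords in documents:
        # Deduplicate and sort so each pair is counted once per document
        # and is always stored in the same (alphabetical) order.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def related_keywords(counts, keyword, top_n):
    """Return up to top_n keywords that co-occur most often with `keyword`."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a == keyword:
            scores[b] += n
        elif b == keyword:
            scores[a] += n
    return [k for k, _ in scores.most_common(top_n)]


# Hypothetical toy corpus: each list holds the keywords of one news article.
articles = [
    ["election", "economy", "tax"],
    ["economy", "tax", "budget"],
    ["election", "debate"],
    ["economy", "tax"],
]
counts = collocation_counts(articles)
top_for_economy = related_keywords(counts, "economy", 1)
```

In this toy corpus, "economy" and "tax" share three articles, so "tax" is the top related keyword for "economy". A production system would compute these counts in a distributed engine rather than in memory, as the abstract indicates.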

A Classification of Medical and Advertising Blogs Using Machine Learning (머신러닝을 이용한 의료 및 광고 블로그 분류)

  • Lee, Gi-Sung;Lee, Jong-Chan
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.11 / pp.730-737 / 2018
  • As a growing number of health consumers pursue a better quality of life, the O2O medical marketing market is being driven by consumers who choose reliable health care facilities and receive high-quality medical services based on medical information distributed through web blogs. Because unstructured text data on the Internet, mobile, and social networks directly or indirectly reflects authors' interests, preferences, and expectations in addition to their expertise, it is difficult to guarantee the credibility of medical information. In this study, we propose a blog reading system that provides users with a higher-quality medical information service by classifying medical information blogs (medical blog, ad blog) using big data and MLP processing. Based on the proposed big data and machine learning technology, we collect and analyze many domestic medical information blogs on the Internet and develop a personalized health information recommendation system for each disease. Users are expected to be able to maintain their health by continuously checking their health problems and taking the most appropriate measures.

A Study on Internet Technology Perspective Applicable in Industrial Environments (산업환경에서 적용 가능한 사물인터넷 기술 전망에 한 연구)

  • Hong, Sunghyuck
    • Journal of Industrial Convergence / v.17 no.2 / pp.21-27 / 2019
  • The Internet of Things (IoT) is an infrastructure in which things communicate with one another by exchanging information through attached communication antennas. The Internet of Things is regarded as the core of the Fourth Industrial Revolution because data is collected through the Internet. IoT is a concept that enables Internet connection and communication between devices equipped with various sensors, and it is a core IT trend alongside technologies such as big data, mobile, and cloud. This study aims to provide information for the development of the industrial environment through research on the importance of the Internet of Things, the core of the Fourth Industrial Revolution, and on the processing and analysis techniques of big data. By presenting various security measures and future technologies, this study was conducted to contribute to management.

Humming: Image Based Automatic Music Composition Using DeepJ Architecture (허밍: DeepJ 구조를 이용한 이미지 기반 자동 작곡 기법 연구)

  • Kim, Taehun;Jung, Keechul;Lee, Insung
    • Journal of Korea Multimedia Society / v.25 no.5 / pp.748-756 / 2022
  • Thanks to the match between AlphaGo and Sedol Lee, machine learning has received worldwide attention and huge investments. Improvements in computing performance have greatly contributed to big data processing and the development of neural networks. Artificial intelligence not only imitates human beings in many fields but also seems to surpass human capabilities. Although human creation is still considered superior, several artificial intelligences continue to challenge human creativity. The quality of some creative outcomes by AI is as good as that of works produced by human beings. Sometimes they are indistinguishable, because a neural network can learn the common features contained in big data and copy them. To confirm whether artificial intelligence can express the inherent characteristics of different arts, this paper proposes a new neural network model called Humming. It is an experimental model that combines VGG16, which extracts image features, with DeepJ's architecture, which excels in creating various genres of music. A dataset produced by our experiment shows meaningful and valid results. Different results, however, are produced when the amount of data is increased: the neural network produced similar patterns of music even for images of different classes, which was not what we were aiming for. Nevertheless, these new attempts may be significant as a starting point for feature transfer, which will be studied further.

Recent Research Trend Analysis for the Journal of Society of Korea Industrial and Systems Engineering Using Topic Modeling (토픽모델링을 활용한 한국산업경영시스템학회지의 최근 연구주제 분석)

  • Dong Joon Park;Pyung Hoi Koo;Hyung Sool Oh;Min Yoon
    • Journal of Korean Society of Industrial and Systems Engineering / v.46 no.3 / pp.170-185 / 2023
  • The advent of big data has brought about the need for analytics. Natural language processing (NLP), a field of big data, has received a lot of attention. Among NLP techniques, topic modeling is widely applied to identify key topics in various academic journals. The Korean Society of Industrial and Systems Engineering (KSIE) has published academic journals since 1978. To enhance its status, it is imperative to recognize the diversity of research domains. We have already discovered eight major research topics for papers published by KSIE from 1978 to 1999. As a follow-up study, we aim to identify the major topics of research papers published in KSIE from 2000 to 2022. We performed topic modeling on 1,742 research papers from this period using LDA and BERTopic, the latter of which has recently attracted attention. BERTopic outperformed LDA by providing a set of coherent topic keywords that effectively distinguish the 36 topics found in this study. In terms of visualization techniques, pyLDAvis presented better two-dimensional scatter plots for the intertopic distance map than BERTopic. However, BERTopic provided much more diverse visualization methods for exploring the relevance of the 36 topics. BERTopic was also able to classify hot and cold topics by presenting 'topic over time' graphs that identify topic trends over time.

Utilizing NLP-based Data Techniques from Customer Reviews: Deriving Insights and Strategies for Cushion Product Improvement (고객 리뷰를 통한 NLP 기반 데이터 기술 활용: 고객 인사이트 도출과 쿠션 제품 개선 방안 연구)

  • Sel-A Lim;Mi-yeon Cho;Eun-Bi Jo;Su-Han Yu
    • The Journal of Bigdata / v.9 no.1 / pp.49-60 / 2024
  • This study aims to provide insights for developing innovative products, based on reviews from females aged 30 to 70 who bought cosmetic cushions via TV home shopping. Analyzing 200,000 reviews with Selenium and NLP techniques, we found the main audience is in their 50s and 60s, prioritizing radiance, blemish and wrinkle coverage, and adherence. Notably, products with appealing designs were preferred, especially for gifting among relatives and friends. The proposed innovation is Korea's first AI-recommended cushion, utilizing NLP to match customer needs. Key ingredient recommendations include S.Acamella extract and AHA components, chosen for their perceived benefits and consumer preference. The research also highlights the importance of product aesthetics and gift potential, suggesting marketing strategies should emphasize these aspects to appeal to the target demographic. This approach aims to guide product development and marketing towards meeting consumer expectations in the cosmetic cushion industry, making products more personalized and gift-worthy.

A Study on Procurement Audit Integration Real Time Monitoring System Using Process Mining Under Big Data Environment (빅 데이터 환경하에서 프로세스 마이닝을 이용한 구매 감사 통합 실시간 모니터링 시스템에 대한 연구)

  • Yoo, Young-Seok;Park, Han-Gyu;Back, Seung-Hoon;Hong, Sung-Chan
    • Journal of Internet Computing and Services / v.18 no.3 / pp.71-83 / 2017
  • In recent years, research has actively explored applying the strengths of process mining to the auditing work of business organizations. However, there is insufficient research on the systematic and efficient analysis of the massive data generated in a big data environment using process mining, and on proactive monitoring of risk management from the audit side, which is one of the important management activities of a corporate organization. In this study, we aim to realize a Hadoop-based internal audit integrated real-time monitoring system that detects abnormal symptoms in order to prevent accidents in advance. Through the integrated real-time monitoring system for purchasing audit, we aim to strengthen delivery management of ordered purchasing materials, reduce purchasing costs, manage competitive companies, prevent fraud, comply with regulations, and adhere to the internal control accounting system. As a result, by analyzing data efficiently using process mining on Hadoop-based systems, the enhanced purchase audit integrated real-time monitoring can provide information that can be acted on immediately. From an integrated viewpoint, business status can be managed by processing a large amount of work faster than continuous monitoring, which improves the quality of the purchase audit and drives innovation in the purchase process.

Efficient Locality-Aware Traffic Distribution in Apache Storm (Apache Storm에서 지역성을 고려한 효율적인 트래픽 분배)

  • Son, Siwoon;Lee, Sanghun;Moon, Yang-Sae
    • KIISE Transactions on Computing Practices / v.23 no.12 / pp.677-683 / 2017
  • Apache Storm is a representative real-time distributed processing system that processes data streams quickly over distributed servers. Storm currently provides several stream grouping methods to distribute data traffic to multiple servers. Among them, shuffle grouping may cause processing delays, and local-or-shuffle grouping, which is used to solve that problem, may concentrate traffic on a specific node. In this paper, we propose locality-aware grouping to solve the problems that may arise in the existing Storm grouping methods. Experimental results show that the proposed locality-aware grouping is considerably superior to the existing shuffle grouping and local-or-shuffle grouping. These results show that the new grouping is an excellent approach that considers both locality and load balancing, which are limitations of the existing Storm.
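
The trade-off between the two existing groupings can be illustrated with a small simulation. This is a plain-Python sketch, not Storm code and not the paper's locality-aware grouping. It assumes uniform random selection for shuffle grouping and treats "local" as "on the same node", which simplifies Storm's actual behavior. The task layout and function names are hypothetical.

```python
import random
from collections import Counter


def shuffle_grouping(source_node, tasks, rng):
    """Pick any target task uniformly at random, ignoring where it runs."""
    return rng.choice(tasks)


def local_or_shuffle_grouping(source_node, tasks, rng):
    """Prefer target tasks on the sender's node; fall back to all tasks."""
    local = [t for t in tasks if t[1] == source_node]
    return rng.choice(local or tasks)


def simulate(grouping, sources, tasks, tuples_per_source, seed=0):
    """Return (tuples received per node, number of tuples sent across nodes)."""
    rng = random.Random(seed)
    per_node, remote = Counter(), 0
    for source_node in sources:
        for _ in range(tuples_per_source):
            _, node = grouping(source_node, tasks, rng)
            per_node[node] += 1
            remote += node != source_node
    return per_node, remote


# Hypothetical layout: each task is (task_id, node); both senders sit on node "A".
tasks = [(0, "A"), (1, "B"), (2, "C")]
sources = ["A", "A"]

shuffle_load, shuffle_remote = simulate(shuffle_grouping, sources, tasks, 1000)
local_load, local_remote = simulate(local_or_shuffle_grouping, sources, tasks, 1000)
```

With this layout, local-or-shuffle sends all 2,000 tuples to node A with no cross-node transfers, which illustrates the traffic concentration the abstract mentions. Shuffle grouping spreads the load across nodes but pays for it with remote transfers.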

Performance Comparison of DW System Tajo Based on Hadoop and Relational DBMS (하둡 기반 DW시스템 타조와 관계형 DBMS의 성능 비교)

  • Liu, Chen;Ko, Junghyun;Yeo, Jeongmo
    • KIPS Transactions on Software and Data Engineering / v.3 no.9 / pp.349-354 / 2014
  • Since the announcement of Hadoop, the big data processing platform, SQL-on-Hadoop has been in the spotlight as a technique for analyzing data on Hadoop using SQL. Tajo, created by Korean programmers, was recently promoted to Top-Level Project status by Apache in April and has attracted attention around the world. Despite the considerable change that Hadoop's appearance has caused in the DW market, research on the performance of these systems is insufficient. This study was therefore conducted to help in choosing a DW solution based on SQL-on-Hadoop by carrying out a comparative test of an RDBMS and Tajo. The results show that Hadoop-based Tajo is superior to the RDBMS when it is used with an appropriate strategy. In addition, the open-source project Tajo is expected not only to improve technically through the active participation of many developers but also to play an important DW role in the field of data analysis.

A MapReduce-based kNN Join Query Processing Algorithm for Analyzing Large-scale Data (대용량 데이터 분석을 위한 맵리듀스 기반 kNN join 질의처리 알고리즘)

  • Lee, HyunJo;Kim, TaeHoon;Chang, JaeWoo
    • Journal of KIISE / v.42 no.4 / pp.504-511 / 2015
  • Recently, the amount of data has been increasing rapidly with the popularity of SNS and the development of mobile technology. As a result, effective schemes for analyzing large amounts of data have been actively studied. One typical scheme is the Voronoi diagram-based kNN join algorithm (VkNN-join) using MapReduce. For two datasets R and S, VkNN-join can reduce the time of join query processing involving big data because it selects the corresponding subset Sj for each Ri and processes the query with them. However, VkNN-join requires a high computational cost for constructing the Voronoi diagram. Moreover, the computational overhead of VkNN-join is high because the number of candidate cells increases as the value of k increases. To solve these problems, we propose a MapReduce-based kNN-join query processing algorithm for analyzing large amounts of data. Using seed-based dynamic partitioning, our algorithm can reduce the overhead of constructing the index structure. It can also reduce the computational overhead of finding candidate partitions by selecting the corresponding partitions using the average distance between two seeds. We show that our algorithm performs better than the existing scheme in terms of query processing time.
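
For readers unfamiliar with the kNN join operation, the following single-machine sketch shows what the query computes and one simple way points could be grouped around seeds. It is a brute-force baseline, not the paper's MapReduce algorithm. The nearest-seed assignment is an assumption, since the abstract does not specify how seed-based dynamic partitioning works. The function names and sample points are hypothetical.

```python
import heapq
import math


def knn_join(R, S, k):
    """For each point r in R, return the k points of S closest to r (brute force)."""
    return {r: heapq.nsmallest(k, S, key=lambda s: math.dist(r, s)) for r in R}


def partition_by_seed(points, seeds):
    """Assign each point to the index of its nearest seed (assumed partitioning rule)."""
    partitions = {i: [] for i in range(len(seeds))}
    for p in points:
        nearest = min(range(len(seeds)), key=lambda i: math.dist(p, seeds[i]))
        partitions[nearest].append(p)
    return partitions


# Hypothetical 2-D sample data.
R = [(0.0, 0.0), (10.0, 10.0)]
S = [(1.0, 0.0), (0.0, 2.0), (9.0, 9.0), (10.0, 12.0), (4.0, 4.0)]

joined = knn_join(R, S, k=2)
parts = partition_by_seed(S, seeds=[(0.0, 0.0), (10.0, 10.0)])
```

The brute-force join compares every pair of points, which is what makes partitioning attractive at scale. Once S is split around seeds, each point of R only needs to be compared against the partitions that can contain its neighbors.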