• Title/Summary/Keyword: Big data collection

Design and Implementation of a Search Engine based on Apache Spark (아파치 스파크 기반 검색엔진의 설계 및 구현)

  • Park, Ki-Sung;Choi, Jae-Hyun;Kim, Jong-Bae;Park, Jae-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.1 / pp.17-28 / 2017
  • Recently, research on data has been actively conducted as data has become more valuable. The web crawler, a program for collecting data, has recently drawn attention because it can be applied in various fields. A web crawler can be defined as a tool that analyzes web pages and collects URLs by traversing web servers in an automated manner. For big data processing, distributed web crawlers based on Hadoop MapReduce are widely used, but this approach is difficult to use and has performance constraints. Apache Spark, an in-memory computing platform, is an alternative to MapReduce. A search engine, one of the main applications of a web crawler, displays the information gathered by the crawler that matches the keywords a user searches for. If a search engine uses a Spark-based web crawler instead of a traditional MapReduce-based one, data collection should be faster.
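
The abstract above defines a web crawler as a tool that analyzes pages and collects URLs by traversing servers automatically. The following is a minimal single-machine sketch of that traversal, not the paper's Spark-based implementation. The `crawl` function, the `fetch` callable, and the `PAGES` dictionary are hypothetical names introduced for illustration; `PAGES` stands in for real HTTP requests so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in one page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, fetch, max_pages=100):
    """Breadth-first traversal from `seed`; returns URLs in visit order.

    `fetch` is a hypothetical callable that maps a URL to its HTML text,
    or returns None when the page is unavailable.
    """
    seen = {seed}
    queue = deque([seed])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            absolute = urljoin(url, href)  # resolve relative links
            if absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Tiny in-memory "web" (made-up pages) used instead of real requests.
PAGES = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a>',
}

order = crawl("http://example.test/", PAGES.get)
```

In a distributed setting, the URL frontier would be partitioned across workers rather than held in one queue; that part is deliberately left out here.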

Storm-Based Dynamic Tag Cloud for Real-Time SNS Data (실시간 SNS 데이터를 위한 Storm 기반 동적 태그 클라우드)

  • Son, Siwoon;Kim, Dasol;Lee, Sujeong;Gil, Myeong-Seon;Moon, Yang-Sae
    • KIPS Transactions on Software and Data Engineering / v.6 no.6 / pp.309-314 / 2017
  • In general, collecting, storing, and analyzing SNS (social network service) data is difficult because such data have big data characteristics: they are generated very quickly and mix structured and unstructured forms. In this paper, we propose a new data visualization framework that runs on Apache Storm and is useful for real-time, dynamic analysis of SNS data. Apache Storm is a representative big data software platform that processes and analyzes real-time streaming data in a distributed environment. Using Storm, we collect and aggregate real-time Twitter data and dynamically visualize the aggregated results as a tag cloud. In addition to the Storm-based collection and aggregation functions, we design and implement a Web interface through which a user enters keywords of interest and views the tag cloud visualization related to those keywords. Finally, we show empirically that this work lets users intuitively grasp changes in subjects of interest in SNS data, and that the visualized results can be applied to many other services such as thematic trend analysis, product recommendation, and customer needs identification.
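
The aggregation step behind a keyword-driven tag cloud, as described in the abstract above, can be sketched as a running word count over messages that mention a user's keyword. This is a plain-Python illustration under assumed details, not the paper's Storm topology: the function name `update_tag_counts`, the tokenization rule, the minimum word length, and the sample messages are all made up for the example.

```python
import re
from collections import Counter


def update_tag_counts(counts, text, keywords):
    """Add the words of `text` to `counts` if the text mentions a keyword.

    Keywords themselves and very short words are left out of the counts,
    since they would dominate the cloud without adding information.
    """
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not any(keyword in words for keyword in keywords):
        return counts
    counts.update(w for w in words if w not in keywords and len(w) > 2)
    return counts


# Made-up messages standing in for a real-time stream.
stream = [
    "Storm processes streaming data",
    "Lunch was great today",
    "Streaming data needs Storm topology",
]

counts = Counter()
for message in stream:
    update_tag_counts(counts, message, keywords={"storm"})

top = counts.most_common(2)  # the largest tags in the cloud
```

A renderer would then scale each tag's font size by its count; in a streaming system the counts would typically also be windowed so that old messages age out.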

Design and Implemention of Real-time web Crawling distributed monitoring system (실시간 웹 크롤링 분산 모니터링 시스템 설계 및 구현)

  • Kim, Yeong-A;Kim, Gea-Hee;Kim, Hyun-Ju;Kim, Chang-Geun
    • Journal of Convergence for Information Technology / v.9 no.1 / pp.45-53 / 2019
  • In this rapidly changing information era, we face problems caused by the excessive information that websites serve. Little of that information is useful and much is useless, so we spend a lot of time selecting what we need. Many websites, including search engines, use web crawling to keep their data up to date. Web crawling is usually used to generate copies of all the pages of visited sites, and search engines index those pages for faster searching. For collecting wholesale and order information that changes in real time, keyword-oriented web data collection is not adequate, and no alternative for selectively collecting web information in real time has been suggested. In this paper, we propose a method that collects information from restricted web sites using a real-time Web crawling distributed monitoring system (R-WCMS), estimates collection time through detailed analysis of the data, and stores the data in a parallel system. Experimental results show that applying the proposed model to web site information retrieval reduces the time by 15-17%.

A Study on the Quality Monitoring and Prediction of OTT Traffic in ISP (ISP의 OTT 트래픽 품질모니터링과 예측에 관한 연구)

  • Nam, Chang-Sup
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.14 no.2 / pp.115-121 / 2021
  • This paper uses big data and artificial intelligence technology to predict rapidly increasing Internet traffic. There have been various studies on traffic prediction, but they have not reflected the factors that have driven huge growth in Internet traffic in recent years, such as smartphones and streaming. In addition, event-like factors such as the release of large, popular games or the provision of new content by OTT (Over the Top) operators are even more difficult to predict in advance. Because of these characteristics, it was impossible for an ISP (Internet Service Provider) to reflect real-time service quality management or traffic forecasts in the network business environment with existing methods. To solve this problem, this study built an Internet traffic collection system, separate from the existing NMS, that searches for, discriminates, and collects traffic data in real time. This system provides the flexibility and elasticity to register collection targets automatically and enables real-time network quality monitoring. In addition, the large amount of traffic data collected by the system was analyzed with machine learning (AI) to predict the future traffic of OTT operators. This made more scientific and systematic prediction possible, and it also made it possible to optimize interworking between ISP operators and to secure the quality of large-scale OTT services.

A Study of Data Collection Method for Efficient Sharing in IoT Environment (사물인터넷(IoT) 환경에서 효율적 공유를 위한 데이터 수집 기법에 대한 연구)

  • Hwang, Chi-Gon;Yoon, Chang-Pyo
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2015.10a / pp.268-269 / 2015
  • The current Internet environment is no longer accessed only through computers; it is shifting to the IoT (Internet of Things). As a result, the volume of data becomes large. If the data are provided to applications without any adjustment, it is difficult to achieve the intended performance. In this paper, we propose a method that filters and refines the collected data using MapReduce, a big data processing technique. We address the heterogeneity of the data generated by sensors by adding a knowledge identification step to MapReduce, and we use XMDR for this purpose.
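
The idea of filtering heterogeneous sensor data with an identification step inside a map/reduce flow, as outlined in the abstract above, can be illustrated roughly as follows. This is only a sketch under assumptions: the `FIELD_ALIASES` table is a hypothetical stand-in for the knowledge identification step and does not reflect the actual structure of XMDR, and the field names and readings are invented.

```python
from collections import defaultdict

# Hypothetical lookup from vendor-specific field names to one shared name.
# It stands in for the knowledge identification step; it is NOT XMDR.
FIELD_ALIASES = {
    "temp": "temperature",
    "tmp_c": "temperature",
    "hum": "humidity",
}


def map_reading(reading):
    """Map step: emit (standard_name, value) pairs, dropping unusable fields."""
    for field, value in reading.items():
        name = FIELD_ALIASES.get(field)
        if name is not None and isinstance(value, (int, float)):
            yield name, float(value)


def reduce_readings(pairs):
    """Reduce step: average the values collected for each standard name."""
    grouped = defaultdict(list)
    for name, value in pairs:
        grouped[name].append(value)
    return {name: sum(values) / len(values) for name, values in grouped.items()}


# Invented readings from three differently configured sensors.
readings = [
    {"temp": 20.0, "hum": 40},
    {"tmp_c": 22.0, "hum": None},  # missing humidity is filtered out
    {"unknown": 1},                # unrecognized field is filtered out
]

pairs = [pair for reading in readings for pair in map_reading(reading)]
summary = reduce_readings(pairs)
```

Placing the name resolution in the map step means the reduce step only ever sees one name per physical quantity, which is what lets readings from different devices be combined.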

IoT based Energy data collection system for data center (IoT 기반 데이터센터 에너지 정보 수집 시스템 기술)

  • Kang, Jeonghoon;Lim, Hojung;Jung, Hyedong
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.893-895 / 2016
  • A data center requires considerable management effort for facility, energy, and usage-efficiency monitoring. Data center power management is important for providing reliable service and running a cost-effective business. This paper presents IoT-based energy measurement monitoring that supports energy consumption analysis, including indoor and outdoor temperature conditions. This converged information reveals various aspects of the factors that affect energy consumption. With IoT big data, an energy machine learning system can identify the relationships between energy components and measurements, which is key information for quick energy analysis, prediction, and estimation from a data trend of just one month.

Data Collection Management Program for Smart Factory (스마트팩토리를 위한 데이터 수집 관리 프로그램 개발)

  • Kim, Hyeon-Jin;Kim, Jin-Sa
    • Journal of the Korean Institute of Electrical and Electronic Material Engineers / v.35 no.5 / pp.509-515 / 2022
  • As the ICT-based fourth industrial revolution progresses in manufacturing, interest is growing in building smart factories that are flexible and can be customized to customer demand. To this end, factory efficiency must be maximized by running automated processes in real time over network communication between engineers and equipment, linked with the established IT systems. It is also necessary to collect and store real-time data from heterogeneous facilities and to analyze and visualize vast amounts of data so that the necessary information can be used. Therefore, this study configured four types of controllers (PLC, Arduino, Raspberry Pi, and an embedded system) that are commonly used to build smart factories connecting technologies such as artificial intelligence (AI), the Internet of Things (IoT), and big data. The study developed a program that collects and stores data in real time so that information can be visualized and managed. To verify communication for each controller, data communication was implemented and verified against the data log in the program, and 3D monitoring was implemented and verified for checking process status such as planned quantity, actual quantity, production progress, operation rate, and defect rate for each controller.

IoT-Based Device Utilization Technology for Big Data Collection in Foundry (주물공장의 빅데이터 수집을 위한 IoT 기반 디바이스 활용 기술)

  • Kim, Moon-Jo;Kim, DongEung
    • Journal of Korea Foundry Society / v.41 no.6 / pp.550-557 / 2021
  • With the advent of the fourth industrial revolution, interest in the internet of things (IoT) in manufacturing is growing, even at foundries. There are several types of process data that can be automatically collected at a foundry, but considerable amounts of process data are still recorded by hand for reasons such as the limited functions of outdated production facilities and process design based on operator know-how. In particular, despite recognizing the importance of converting process data into big data, many companies have difficulty adopting these steps willingly due to the burden of system construction costs. In this study, the field applicability of IoT-based devices was examined by manufacturing devices and applying them directly to the site of a centrifugal foundry. For the centrifugal casting process, the temperature and humidity of the working site, the molten metal temperature, and the mold rotation speed were selected as process parameters to be collected. The sensors were selected in consideration of the detailed product specifications and cost required for each process parameter, and the circuit was configured using a NodeMCU board capable of wireless communication for IoT-based devices. After designing the circuit, PCBs were prepared for each parameter, and each device was installed on site considering the working environment. After the on-site installation process, it was confirmed that the level of satisfaction with the safety of the workers and the efficiency of process management increased. Also, it is expected that it will be possible to link process data and quality data in the future if process parameters are continuously collected. The IoT-based device designed in this study has adequate reliability at a low cost, meaning that the application of this technique can be considered a cornerstone of data collection at foundries.

Smartphone NFC based Access Control System (스마트폰 NFC 기반의 출입관리 시스템)

  • Bae, Sang-Jung;Jeon, Soon-Yeong;Lee, Sang-Hwa;Lee, Chan-Ho;Jung, Hoe-Kyung
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2016.10a / pp.596-597 / 2016
  • Recently, as the importance of big data grows, a variety of companies are researching and developing large-scale data collection and management. In addition, companies are collecting all the necessary data from inside and outside the enterprise to improve their manufacturing and service environments. In this paper, we collect the entry and exit times of employees within a company, linking the NFC tag of each employee's smartphone with the employee information stored in a database. Using this information, the system automatically determines an employee's attendance and authority to access databases or resources within the enterprise. The proposed system is expected to increase the accessibility and efficiency of databases and resources in the enterprise.

The Effect of Highland Weather and Soil Information on the Prediction of Chinese Cabbage Weight (기상 및 토양정보가 고랭지배추 단수예측에 미치는 영향)

  • Kwon, Taeyong;Kim, Rae Yong;Yoon, Sanghoo
    • Journal of Environmental Science International / v.28 no.8 / pp.701-707 / 2019
  • Highland farming is agriculture that takes place 400 m above sea level and typically involves both low temperatures and long sunshine hours. Most highland Chinese cabbages are harvested in Gangwon province. The Ubiquitous Sensor Network (USN) has been deployed to observe Chinese cabbage growth because of the lack of installed weather stations in the highlands. Five representative Chinese cabbage cultivation spots were selected for USN and meteorological data collection between 2015 and 2017. The purpose of this study is to develop a weight prediction model for Chinese cabbages using the meteorological and growth data that were collected one week prior. Both a regression model and a random forest model were considered for this study, with the regression assumptions being satisfied. The Root Mean Square Error (RMSE) was used to evaluate the predictive performance of the models. In the regression model, the variables influencing the weight of cabbage were the number of cabbage leaves, wind speed, precipitation, and soil electrical conductivity. In the random forest model, cabbage width, the number of cabbage leaves, soil temperature, precipitation, temperature, soil moisture at a depth of 30 cm, cabbage leaf width, soil electrical conductivity, humidity, and cabbage leaf length were screened. The RMSE of the random forest model was 265.478, lower than that of the regression model (404.493); this is because the random forest model could explain nonlinearity.
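
For readers unfamiliar with the metric used in the abstract above, RMSE is the square root of the mean squared difference between observed and predicted values, so a lower value means predictions are closer to the observations. The sketch below shows how two models could be compared this way. The weights and predictions are made-up numbers for illustration only and are unrelated to the study's data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up observed weights and predictions from two hypothetical models.
observed = [1000.0, 1200.0, 1500.0]
model_a = [1100.0, 1200.0, 1400.0]
model_b = [1000.0, 1300.0, 1500.0]

rmse_a = rmse(observed, model_a)
rmse_b = rmse(observed, model_b)
better = "model_b" if rmse_b < rmse_a else "model_a"
```

Because RMSE is expressed in the same unit as the target variable, the two values can be compared directly when both models predict the same quantity on the same data.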