• Title/Summary/Keyword: Big data collection

A Study on the Protection and Utilization of Personal Information for the Operation of Artificial Intelligence and Big Data in the Fourth Industrial Revolution (4차 산업혁명기 인공지능과 빅데이터 운용을 위한 개인정보 보호와 이용에 관한 연구)

  • Choi, Won Sang; Lee, Jong Yong; Shin, Jin
    • Convergence Security Journal, v.19 no.5, pp.63-73, 2019
  • In the Fourth Industrial Revolution, the rapid development of ICT makes it possible to collect and analyze information from people and objects and to create value from it. However, there are many legal and institutional restrictions on collecting information about people. Therefore, in-depth research on the protection and use of personal information in the rapidly changing cyber-security environment is needed. The purpose of this study is to seek a paradigm shift in the protection and utilization of personal information for the operation of AI (Artificial Intelligence) and big data during the Fourth Industrial Revolution. The research is organized as follows: Chapter 1 examines the meaning of personal information in the Fourth Industrial Revolution, and Chapter 2 presents the framework for reviewing and analyzing prior research. Chapter 3 analyzes policies for the protection and utilization of personal information in major countries, Chapter 4 looks at the paradigm shift in personal information protection during the Fourth Industrial Revolution and how to respond to it, and Chapter 5 offers policy suggestions for the protection and utilization of personal information.

HTML Text Extraction Using Frequency Analysis (빈도 분석을 이용한 HTML 텍스트 추출)

  • Kim, Jin-Hwan; Kim, Eun-Gyung
    • Journal of the Korea Institute of Information and Communication Engineering, v.25 no.9, pp.1135-1143, 2021
  • Recently, text collection using web crawlers for big data analysis has become common. However, to collect only the necessary text from a web page composed of numerous tags and texts, the crawler must be told which HTML tags and style attributes contain the text needed for the analysis, which is cumbersome. In this paper, we propose a method of extracting text based on the frequency with which text appears in the collected web pages, without specifying HTML tags or style attributes. In the proposed method, text is extracted from the DOM tree of every collected web page, its frequency of appearance is analyzed, and the main text is obtained by excluding text with a high frequency of appearance. Experiments in this study verified the superiority of the proposed method.
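
The frequency-based filtering described in this abstract can be illustrated with a short sketch. This is not the authors' implementation: it assumes BeautifulSoup for DOM parsing, and the 0.5 page-ratio threshold is an arbitrary illustrative choice.

```python
from collections import Counter
from bs4 import BeautifulSoup  # assumed HTML/DOM parser

def text_nodes(html):
    """Collect visible text fragments from a page's DOM tree."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup(["script", "style"]):
        tag.decompose()
    return list(soup.stripped_strings)

def extract_main_text(pages, max_ratio=0.5):
    """Drop fragments that recur on more than max_ratio of all pages."""
    node_lists = [text_nodes(html) for html in pages]
    freq = Counter(frag for nodes in node_lists for frag in set(nodes))
    threshold = max_ratio * len(pages)
    return [[frag for frag in nodes if freq[frag] < threshold]
            for nodes in node_lists]
```

Fragments such as menus and footers appear on most pages, so their frequency exceeds the threshold and they are removed, leaving the page-specific main text.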

Service Platform of Regional Smart Tour Ecosystem Support (지역중심의 스마트관광 생태계 지원 서비스 플랫폼)

  • Weon, Dalsoo
    • The Journal of the Convergence on Culture Technology, v.4 no.4, pp.31-36, 2018
  • The tourism industry has a great influence on revitalizing the national economy. The development of IT technology has enabled the collection and analysis of tourists' personal profile, location, and activity information based on their characteristics, behavior, purchase propensity, and interests. To realize this, a converged smart tourism information service platform is implemented in three stages: developing a business model, an IoT and big data integrated management system, and a big data algorithm development and analysis platform. The underlying platform and algorithm technology requires adopting open-source software, expanding the service elements on that basis, and then addressing remaining problems through a region-connected test-bed demonstration. Using this platform, it is possible to build a smart tourism environment that provides customized services for each tourist by analyzing various kinds of information in an integrated manner. It will also improve the lives of residents of tourist destinations and contribute to regional revitalization and job creation through the creation of a region-centered smart tourism ecosystem.

Implementation of a Travel Route Recommendation System Utilizing Daily Scheduling Templates

  • Kim, Hyeon Gyu
    • Journal of the Korea Society of Computer and Information, v.27 no.10, pp.137-146, 2022
  • In relation to travel itinerary recommendation services, which have recently been in high demand, our previous work introduced a method to quantify the popularity of places, including tour spots, restaurants, and accommodations, through social big data analysis and to create a travel schedule based on the analysis results. However, the generated schedule mainly consisted of travel routes connecting tour spots by the shortest distance, and detailed schedule information, including restaurant and accommodation information for each travel date, was not provided. This paper presents an algorithm for constructing a detailed travel route by applying a scenario template to a travel schedule created from social big data, and introduces a prototype system that implements it. The proposed system consists of modules for place information collection, place-specific popularity score estimation, shortest travel route generation, daily schedule organization, and UI visualization. Experiments based on social reviews collected from 63,000 places in Gyeongnam province demonstrated the effectiveness of the proposed system.
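
A minimal sketch of the daily-template idea follows. The field names, the four-slot template, and the popularity-versus-distance trade-off weight are assumptions for illustration, not the paper's actual algorithm or data model.

```python
import math

DAY_TEMPLATE = ["tour_spot", "restaurant", "tour_spot", "accommodation"]

def distance(a, b):
    # Planar distance on (x, y) coordinates; adequate for a small-area sketch.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def build_daily_schedule(places, start, alpha=1.0):
    """places: dicts with 'name', 'category', 'score' (popularity), 'loc'."""
    remaining = list(places)
    current, route = start, []
    for category in DAY_TEMPLATE:
        candidates = [p for p in remaining if p["category"] == category]
        if not candidates:
            continue
        # Prefer popular places that are also close to the current location.
        best = max(candidates,
                   key=lambda p: p["score"] - alpha * distance(current, p["loc"]))
        route.append(best["name"])
        remaining.remove(best)
        current = best["loc"]
    return route
```

Repeating this per travel date, with visited places removed from the candidate pool, yields a day-by-day itinerary that fills restaurant and accommodation slots as well as tour spots.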

Probe Vehicle Data Collecting Intervals for Completeness of Link-based Space Mean Speed Estimation (링크 공간평균속도 신뢰성 확보를 위한 프로브 차량 데이터 적정 수집주기 산정 연구)

  • Oh, Chang-hwan; Won, Minsu; Song, Tai-jin
    • The Journal of The Korea Institute of Intelligent Transport Systems, v.19 no.5, pp.70-81, 2020
  • Point-by-point data, abundantly collected from vehicles with embedded GPS (Global Positioning System) devices, generate useful information. These data facilitate decisions by transportation jurisdictions, and private vendors can monitor and investigate micro-scale driver behavior, traffic flow, and roadway movements. The information is applied to develop app-based route guidance and business models. Among these data, speed plays a vital role in developing key parameters and in applying agent-based information and services. Nevertheless, link speed values require different levels of physical storage and fidelity depending on both the collecting and reporting intervals. Given these circumstances, this study aimed to establish an appropriate collection interval for efficiently utilizing space mean speed information from vehicles with embedded GPS. We compared probe-vehicle data with image-based vehicle data to evaluate the percentage error (PE). According to the results, the PE of the probe-vehicle data showed a 95% confidence level within an 8-second interval, which was therefore chosen as the appropriate collection interval for probe-vehicle data. We hope the developed guidelines help C-ITS and autonomous-driving service providers use more reliable space mean speed data to develop better services.
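
For readers unfamiliar with the metric, a small illustrative computation of link space mean speed and percentage error is sketched below; the link length, traversal times, and the "coarse interval" figures are made-up numbers, not the study's data.

```python
def space_mean_speed(link_length_m, travel_times_s):
    """Space mean speed = link length / mean traversal time (m/s)."""
    mean_tt = sum(travel_times_s) / len(travel_times_s)
    return link_length_m / mean_tt

def percentage_error(estimate, reference):
    return abs(estimate - reference) / reference * 100.0

# Three probe vehicles crossing a 500 m link.
reference = space_mean_speed(500.0, [36.0, 40.0, 44.0])  # fine-grained GPS logs
coarse = space_mean_speed(500.0, [35.0, 42.0, 46.0])     # traversal times read off a coarser reporting interval
print(round(percentage_error(coarse, reference), 2))      # PE in percent
```

Lengthening the collection interval coarsens the traversal-time estimates and therefore inflates the PE; quantifying that trade-off is what leads the study to the 8-second recommendation.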

Investigations on Techniques and Applications of Text Analytics (텍스트 분석 기술 및 활용 동향)

  • Kim, Namgyu; Lee, Donghoon; Choi, Hochang; Wong, William Xiu Shun
    • The Journal of Korean Institute of Communications and Information Sciences, v.42 no.2, pp.471-492, 2017
  • The demand for and interest in big data analytics are increasing rapidly. The concept of big data covers not only existing structured data but also various kinds of unstructured data such as text, images, videos, and logs. Among the various types of unstructured data, text data have gained particular attention because text is the most representative way to describe and deliver information. Text analysis is generally performed in the following order: document collection, parsing and filtering, structuring, frequency analysis, and similarity analysis. The results of the analysis can be displayed through word clouds, word networks, topic modeling, document classification, and semantic analysis. Notably, there is an increasing demand to identify trending topics from the rapidly growing text data generated through various social media. Thus, research on and applications of topic modeling have been actively carried out in various fields, since topic modeling can extract the core topics from a huge amount of unstructured text documents and group the documents by topic. In this paper, we review the major techniques and research trends of text analysis. We also introduce cases of applications that solve problems in various fields by using topic modeling.
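
Since topic modeling is highlighted as a central technique in this survey, a compact sketch using scikit-learn's LDA is given below. The toy documents and parameter values are illustrative choices, not taken from the paper.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "big data platforms collect and store sensor data",
    "travel reviews describe tour spots and restaurants",
    "crawlers collect text from web pages for analysis",
    "topic models extract core topics from document collections",
]

# Structuring: build a document-term matrix from the filtered text.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Topic modeling: fit LDA and inspect the top terms of each topic.
lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(dtm)

terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top_terms = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"topic {i}: {top_terms}")
```

The same pipeline scales to large corpora by replacing the toy list with the collected documents and increasing the number of topics.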

Lambda Architecture Used Apache Kudu and Impala (Apache Kudu와 Impala를 활용한 Lambda Architecture 설계)

  • Hwang, Yun-Young; Lee, Pil-Won; Shin, Yong-Tae
    • KIPS Transactions on Computer and Communication Systems, v.9 no.9, pp.207-212, 2020
  • The amount of data has increased significantly due to advances in technology, and various big data processing platforms have emerged to handle it. Among them, the most widely used platform is Hadoop, developed by the Apache Software Foundation, and Hadoop is also used in the IoT field. However, in the existing Hadoop-based environment for collecting and analyzing IoT sensor data, small files overload the NameNode of HDFS, Hadoop's core storage project, and imported data cannot be updated or deleted. This paper designs a Lambda Architecture using Apache Kudu and Impala. The proposed architecture classifies IoT sensor data into cold data and hot data, stores each in storage suited to its characteristics, and combines the batch view created by batch processing with the real-time view generated through Apache Kudu and Impala. This solves the problems of the existing Hadoop-based IoT sensor data collection and analysis environment and shortens the time it takes users to access the analyzed data.
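
The hot/cold split and the merged serving view can be sketched in a storage-agnostic way. The snippet below uses plain in-memory lists as stand-ins for the batch store (HDFS) and the mutable real-time store (Kudu); the one-hour threshold and record layout are assumptions for illustration only.

```python
import time

HOT_WINDOW_S = 3600                 # records newer than one hour count as "hot"
cold_store, hot_store = [], []      # stand-ins for HDFS (batch) and Kudu (real-time)

def ingest(record):
    """Route an IoT sensor record to hot or cold storage by its timestamp."""
    is_hot = time.time() - record["ts"] < HOT_WINDOW_S
    (hot_store if is_hot else cold_store).append(record)

def query(sensor_id):
    """Serving layer: merge the batch view with the real-time view."""
    batch_view = [r for r in cold_store if r["sensor"] == sensor_id]
    realtime_view = [r for r in hot_store if r["sensor"] == sensor_id]
    return batch_view + realtime_view
```

In the actual architecture the batch view would be rebuilt periodically from the cold store, while Kudu's updatable tables let the real-time view be corrected or deleted, which plain HDFS files do not allow.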

Real-time IoT Big Data Analysis Platform Requirements (실시간 IoT Big Data 분석 플랫폼 요건)

  • Kang, Sun-Kyoung; Lee, Hyun-Chang; Shin, Seong-Yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference, 2017.05a, pp.165-166, 2017
  • There is demand to receive data in real time anywhere and to analyze it into meaningful information, and research on platforms for such analysis is actively underway. In this paper, we examine the factors that are important in solving the problems of collecting and analyzing IoT data in real time. How much better a method is than existing data collection and analysis methods can be the basis for judging the value of the data. It is important to collect and store data from many sensors quickly and accurately in real time, and to have analytical methods that can derive value from the stored data. Therefore, an important requirement of an analysis platform in the IoT environment is to process large amounts of data in real time and to manage it in a centralized way.

Web crawler Improvement and Dynamic process Design and Implementation for Effective Data Collection (효과적인 데이터 수집을 위한 웹 크롤러 개선 및 동적 프로세스 설계 및 구현)

  • Wang, Tae-su; Song, JaeBaek; Son, Dayeon; Kim, Minyoung; Choi, Donggyu; Jang, Jongwook
    • Journal of the Korea Institute of Information and Communication Engineering, v.26 no.11, pp.1729-1740, 2022
  • Recently, a large amount of data has been generated as information has become more diverse and more widely used, the importance of big data analysis for collecting, storing, processing, and predicting data has grown, and the ability to collect only the necessary information is required. More than half of the web consists of text, and a great deal of data is generated through the organic interactions of users. Crawling is a representative technique for collecting text data, but many crawlers are developed without regard for web servers or their administrators because they focus only on obtaining data. In this paper, we design and implement an improved dynamic web crawler that can fetch data efficiently, after examining the problems that may occur during the crawling process and the precautions to be considered. The crawler, which addresses the problems of the existing crawler, was designed as a multi-process crawler, and its working time was reduced by a factor of four on average.
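
A condensed sketch in the spirit of a server-friendly multi-process crawler is shown below. It is not the paper's implementation: the user-agent string, politeness delay, pool size, and seed URLs are illustrative, and it assumes the requests library for fetching.

```python
import time
from multiprocessing import Pool
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

import requests  # assumed HTTP client

USER_AGENT = "example-crawler"   # hypothetical identifier
DELAY_S = 1.0                    # per-request politeness delay

def allowed(url):
    """Respect the target server's robots.txt before fetching."""
    robots_url = "{0.scheme}://{0.netloc}/robots.txt".format(urlparse(url))
    rp = RobotFileParser()
    rp.set_url(robots_url)
    try:
        rp.read()
    except OSError:
        return False
    return rp.can_fetch(USER_AGENT, url)

def fetch(url):
    """Fetch one page politely; return (url, html or None)."""
    if not allowed(url):
        return url, None
    time.sleep(DELAY_S)          # avoid hammering the web server
    resp = requests.get(url, headers={"User-Agent": USER_AGENT}, timeout=10)
    return url, resp.text if resp.ok else None

if __name__ == "__main__":
    seeds = ["https://example.com/", "https://example.org/"]
    with Pool(processes=4) as pool:          # multi-process fetching
        for url, html in pool.map(fetch, seeds):
            print(url, "fetched" if html else "skipped")
```

Splitting the fetch work across processes is what yields the speed-up reported in the abstract, while the robots.txt check and delay address the concern about consideration for web servers and administrators.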

I/E Selective Activation based Knowledge Reconfiguration mechanism and Reasoning

  • Shim, JeongYon
    • IEIE Transactions on Smart Processing and Computing, v.3 no.5, pp.338-344, 2014
  • As the role of information collection becomes increasingly important in an environment of enormous data, there is growing demand for more intelligent information technologies for managing complex data. At the same time, it is difficult to find a solution because of the complexity and sheer scale of the data. Accordingly, there is a need for a special intelligent knowledge-base framework that can operate flexibly on its own. In this paper, by adopting the switching function of signal transmission in the synapses of the human brain, an I/E selective activation based knowledge reconfiguration mechanism is proposed for building a more intelligent information management system. In particular, a knowledge network design, a special knowledge node structure, a type definition, an I/E gauge definition, and an I/E matching scheme are provided. Using these concepts, the proposed system performs activation by I/E gauge, selection, and reconfiguration, and the routing and reasoning process is carried out more efficiently on the reconfigured knowledge network. The experiments describe the process of selection by I/E matching, knowledge reconfiguration, and the routing and reasoning results.