• Title/Summary/Keyword: distributed data storage


Anomaly Detection Technique of Log Data Using Hadoop Ecosystem (하둡 에코시스템을 활용한 로그 데이터의 이상 탐지 기법)

  • Son, Siwoon;Gil, Myeong-Seon;Moon, Yang-Sae
    • KIISE Transactions on Computing Practices / v.23 no.2 / pp.128-133 / 2017
  • In recent years, the number of systems for analyzing large volumes of data has been increasing. Hadoop, a representative big data system, stores and processes large data in a distributed environment of multiple servers, where system-resource management is very important. The authors attempted to detect anomalies from rapid changes in the log data collected from the multiple servers using simple but efficient anomaly-detection techniques. Accordingly, an Apache Hive storage architecture was designed to store the log data collected from the multiple servers in the Hadoop ecosystem. Also, three anomaly-detection techniques were designed based on the moving-average and 3-sigma concepts. It was finally confirmed that all three techniques detected the abnormal intervals correctly, while the weighted anomaly-detection technique is more precise than the basic ones. These results show an excellent approach for detecting log-data anomalies with simple techniques in the Hadoop ecosystem.
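  • A minimal sketch of the moving-average/3-sigma idea described above, assuming a univariate resource metric sampled at fixed intervals; the window size and the example series are illustrative, not taken from the paper's Hive-based pipeline:

      # Moving-average / 3-sigma check (illustrative sketch).
      def detect_anomalies(values, window=12):
          """Return indices whose value deviates from the trailing moving
          average by more than 3 standard deviations of that window."""
          anomalies = []
          for i in range(window, len(values)):
              hist = values[i - window:i]
              mean = sum(hist) / window
              sigma = (sum((v - mean) ** 2 for v in hist) / window) ** 0.5
              if sigma > 0 and abs(values[i] - mean) > 3 * sigma:
                  anomalies.append(i)
          return anomalies

      # A sudden spike in an otherwise stable metric is flagged.
      series = [10, 11, 9, 10, 12, 11, 10, 9, 11, 10, 12, 11, 10, 48, 11]
      print(detect_anomalies(series))  # -> [13]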

Secure Sharing and Recovering Scheme of e-Business Data Based on Weight Table (가중치 테이블 기반 안전한 e-비즈니스 데이터 분할 복원 방식)

  • Song, You-Jin;Kim, Jin-Seog
    • The KIPS Transactions:PartC / v.16C no.1 / pp.27-36 / 2009
  • Leaks of personal information are mostly caused by internal users. Confidential information such as credit card numbers can easily be disclosed or modified by a system manager. A scheme for securely storing and managing the sensitive data of individuals and enterprises is required for distributed data management. A manager who owns private data needs a weight, which represents the right to disclose that data. To decide a weight, the system must be able to designate the level of each user's rights. In this paper, we propose a new algorithm, called the digit-independent algorithm, and a new data management scheme for gathering and processing data based on it. Our sharing and recovery scheme requires only efficient computational operations for managing large quantities of data using a weight table. The proposed scheme can be used for secure e-business data management and storage in a ubiquitous computing environment.
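  • The abstract does not spell out the digit-independent algorithm itself; the sketch below only illustrates the general idea of splitting a sensitive value into shares held by different managers and recovering it later. The manager names, the weight table, and the additive splitting rule are assumptions for illustration, not the paper's scheme:

      import secrets

      # Illustrative additive splitting only; NOT the paper's digit-independent algorithm.
      MODULUS = 2 ** 64

      def share(secret, weights):
          """Give every manager in the (hypothetical) weight table one share;
          the last share is chosen so that all shares sum back to the secret."""
          managers = list(weights)
          shares = {m: secrets.randbelow(MODULUS) for m in managers[:-1]}
          shares[managers[-1]] = (secret - sum(shares.values())) % MODULUS
          return shares

      def recover(shares):
          return sum(shares.values()) % MODULUS

      weights = {"manager_a": 3, "manager_b": 2, "manager_c": 1}  # hypothetical weight table
      pieces = share(1234567890, weights)
      assert recover(pieces) == 1234567890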

Real-time Processing of Manufacturing Facility Data based on Big Data for Smart-Factory (스마트팩토리를 위한 빅데이터 기반 실시간 제조설비 데이터 처리)

  • Hwang, Seung-Yeon;Shin, Dong-Jin;Kwak, Kwang-Jin;Kim, Jeong-Joon;Park, Jeong-Min
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.5 / pp.219-227 / 2019
  • Manufacturing methods have changed from labor-intensive methods to technology-intensive methods centered on manufacturing facilities. As manufacturing facilities replace human labor, the importance of monitoring and managing them is emphasized. In addition, big data technology has recently emerged as an important technology for discovering new value from limited data. Therefore, changes in manufacturing industries have increased the need for smart factories that combine IoT, information and communication technologies, sensor data, and big data. In this paper, we present strategies for existing domestic manufacturing factories to become big-data-based smart factories through technologies for distributed storage and processing of manufacturing-facility data in MongoDB in real time and visualization using R programming.
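  • A minimal sketch of storing facility sensor readings in MongoDB for later analysis, assuming the pymongo driver and a reachable MongoDB deployment; the database, collection, and field names are illustrative, and the paper's real-time pipeline and R visualization are not reproduced here:

      from datetime import datetime, timezone
      from pymongo import MongoClient, ASCENDING

      # Connect to a (possibly replicated or sharded) MongoDB deployment.
      client = MongoClient("mongodb://localhost:27017")
      readings = client["smart_factory"]["facility_readings"]
      readings.create_index([("facility_id", ASCENDING), ("ts", ASCENDING)])

      def store_reading(facility_id, metric, value):
          """Insert one sensor reading as a time-stamped document."""
          readings.insert_one({
              "facility_id": facility_id,
              "metric": metric,
              "value": value,
              "ts": datetime.now(timezone.utc),
          })

      store_reading("press-01", "spindle_temp_c", 73.2)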

Multi-Behavior Analysis Based on Google Archiving Data (구글 아카이빙 데이터 기반 멀티 행위 분석)

  • Yeeun Kim;Sara Hong;Seongmin Kim
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.5 / pp.737-751 / 2023
  • The importance of digital forensics in the cloud environment is increasing as businesses and individuals move their data from on-premise systems to the cloud. Cloud data can be stored on various devices, including mobile devices and desktops, and encompasses a variety of user-behavior artifacts, such as information generated from linked accounts and cloud services. However, there are limitations in securing and analyzing digital evidence due to environmental constraints of the cloud, such as distributed storage of data and a lack of artifact linkage. One solution is archiving services, of which Google's Takeout is a prime example. In this paper, user-behavior data is analyzed for cloud forensics based on archiving data, and the items needed from an investigation perspective are selected. Additionally, we propose a process of analyzing selectively collected data based on time information and using web-based visualization to meaningfully assess artifact associations and multi-behaviors. Through this, we aim to demonstrate the value of utilizing archiving data in response to the increasing significance of evidence collection for cloud data.
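  • A minimal sketch of the kind of time-based selection described above, assuming a Google Takeout "My Activity" export in JSON form; the file path and the "time"/"header"/"title" field names are assumptions about the export format rather than guarantees:

      import json
      from datetime import datetime

      def load_activity(path):
          """Build a single time-ordered list of user actions from a Takeout
          activity JSON file (field names assumed, see note above)."""
          with open(path, encoding="utf-8") as fp:
              records = json.load(fp)
          timeline = []
          for rec in records:
              when = rec.get("time")
              if when:
                  timeline.append({
                      "time": datetime.fromisoformat(when.replace("Z", "+00:00")),
                      "service": rec.get("header", ""),
                      "action": rec.get("title", ""),
                  })
          timeline.sort(key=lambda r: r["time"])
          return timeline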

Design and Implementation of the Extended SLDS Supporting SDP Master Replication (SDP Master 이중화를 지원하는 확장 SLDS 설계 및 구현)

  • Shin, In-Su;Kang, Hong-Koo;Lee, Ki-Young;Han, Ki-Joon
    • Journal of Korea Spatial Information System Society / v.10 no.3 / pp.79-91 / 2008
  • Recently, with the high interest in Location-Based Services (LBS) utilizing the location data of moving objects, GALIS (Gracefully Aging Location Information System), a cluster-based distributed computing architecture, was proposed as a more efficient location management system for moving objects. In the SLDS (Short-term Location Data Subsystem), a subsystem of GALIS, the SDP (Short-term Data Processor) Master transmits current location data and queries to every SDP Worker, then reassembles the query results produced by the SDP Workers and sends them to the client. However, services are suspended while the SDP Master is under failure, and the response time to the client increases if the load is concentrated on the SDP Master. Therefore, in this paper, an extended SLDS was designed and implemented to solve these problems. Even when one SDP Master is under failure, the other can continue to provide services, so the extended SLDS can guarantee the high reliability of the SLDS. The extended SLDS can also reduce the response time to the client by enabling two SDP Masters to perform distributed query processing. Finally, we proved the high reliability and high availability of the extended SLDS by implementing current-location-data storage, query processing, and failure-takeover scenarios. We also verified that the extended SLDS is more efficient than the original SLDS through a query-processing performance evaluation.
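  • A minimal client-side sketch of the failover behavior described above, assuming two reachable SDP Master endpoints; the host names, port, and wire protocol are placeholders, not the GALIS implementation:

      import socket

      MASTERS = [("sdp-master-1", 9000), ("sdp-master-2", 9000)]  # hypothetical endpoints

      def send_query(query, timeout=2.0):
          """Try each SDP Master in turn so service continues if one has failed."""
          last_error = None
          for host, port in MASTERS:
              try:
                  with socket.create_connection((host, port), timeout=timeout) as conn:
                      conn.sendall(query)
                      return conn.recv(65536)
              except OSError as err:      # connection refused, timeout, ...
                  last_error = err        # fall through to the next master
          raise ConnectionError("no SDP Master reachable") from last_error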


A Web-Based Multimedia Dictionary System Supporting Media Synchronization (미디어 동기화를 지원하는 웹기반 멀티미디어 전자사전 시스템)

  • Choi, Yong-Jun;Hwang, Do-Sam
    • Journal of Korea Multimedia Society / v.7 no.8 / pp.1145-1161 / 2004
  • The purpose of this research is to establish a method for constructing a multimedia electronic dictionary system by integrating the media data available from linguistic resources on the Internet. As a result of this study, existing text-oriented electronic dictionary systems can be developed into multimedia lexical systems with greater efficiency and effectiveness. A method is proposed to integrate the media data of linguistic resources on the Internet in a web browser. In the proposed method, the web browser carries out all the work related to integrating the media data and does not need a dedicated server system. The system constructed in our web-browser environment integrates text, image, and voice sources, and can also produce moving pictures. Each medium is associated with the meaning of the data, so that data integration and movement can be specified through these associations. SMIL documents are generated by analyzing the meaning of each data unit and are executed in a web browser. The proposed system can be operated without a dedicated server system. The system also saves storage space by sharing the media data distributed on the Internet, and makes it easier to update data.
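  • A minimal sketch of generating an SMIL document that plays an image and an audio clip in parallel for one dictionary entry, as in the synchronization step described above; the media URLs and duration are placeholders, not resources from the paper:

      import xml.etree.ElementTree as ET

      def build_smil(image_url, audio_url):
          """Return a minimal SMIL document synchronizing an image and an audio clip."""
          smil = ET.Element("smil")
          body = ET.SubElement(smil, "body")
          par = ET.SubElement(body, "par")          # children play in parallel
          ET.SubElement(par, "img", {"src": image_url, "dur": "5s"})
          ET.SubElement(par, "audio", {"src": audio_url})
          return ET.tostring(smil, encoding="unicode")

      print(build_smil("http://example.org/word.jpg", "http://example.org/word.mp3"))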


Improving Efficiency of Encrypted Data Deduplication with SGX (SGX를 활용한 암호화된 데이터 중복제거의 효율성 개선)

  • Koo, Dongyoung
    • KIPS Transactions on Computer and Communication Systems / v.11 no.8 / pp.259-268 / 2022
  • With the widespread use of cloud services to improve management efficiency in the face of explosive growth in data volume, various cryptographic techniques are being applied to preserve data privacy. In spite of the vast computing resources of cloud systems, the decrease in storage efficiency caused by redundancy among data outsourced from multiple users significantly reduces service efficiency. Among several approaches to privacy-preserving deduplication over encrypted data, this paper analyses recent USENIX ATC results on improving the efficiency of encrypted data deduplication using a trusted execution environment (TEE), in terms of the security and efficiency of the participating entities. We present a way to improve the stability of the key-managing server by integrating it with individual clients, resulting in secure deduplication without independent key servers. The experimental results show that the communication efficiency of the proposed approach can be improved by about 30%, with the effect of a distributed key server, while providing security guarantees as robust as those of the previous research.
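  • For context, most encrypted-deduplication schemes build on message-locked (convergent) encryption, in which the key is derived from the content itself; the sketch below shows only that basic idea and is not the SGX/TEE-based construction analysed in the paper. It assumes the cryptography package is installed:

      import hashlib
      from cryptography.hazmat.primitives.ciphers.aead import AESGCM

      def convergent_encrypt(data):
          """Derive the key from the content so identical plaintexts from
          different users produce identical ciphertexts and dedup tags."""
          key = hashlib.sha256(data).digest()    # content-derived key
          tag = hashlib.sha256(key).digest()     # dedup tag sent to the server
          nonce = key[:12]                       # deterministic nonce is acceptable: key is per-message
          ciphertext = AESGCM(key).encrypt(nonce, data, None)
          return tag, key, ciphertext

      t1, _, c1 = convergent_encrypt(b"quarterly-report.pdf contents")
      t2, _, c2 = convergent_encrypt(b"quarterly-report.pdf contents")
      assert t1 == t2 and c1 == c2               # the server can keep a single copy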

Assessment of CO2 Geological Storage Capacity for Basalt Flow Structure around PZ-1 Exploration Well in the Southern Continental Shelf of Korea (남해 대륙붕 PZ-1 시추공 주변 현무암 대지 구조의 CO2 지중저장용량 평가)

  • Shin, Seung Yong;Kang, Moohee;Shinn, Young Jae;Cheong, Snons
    • Economic and Environmental Geology / v.53 no.1 / pp.33-43 / 2020
  • CO2 geological storage is currently considered the most stable and effective technology for greenhouse gas reduction. The saline formations used for CO2 geological storage are generally located at depths of more than 800 m, where CO2 can be stored in a supercritical state, and an extensive impermeable cap rock that prevents CO2 leakage to the surface should be distributed above them. Through analysis of seismic and well data, we identified a basalt flow structure suitable for potential CO2 storage, where a saline formation is overlain by basalt cap rock, around the PZ-1 exploration well in the southern continental shelf of Korea. To evaluate the CO2 storage capacity of the saline formation, total porosity and CO2 density were calculated from the well-logging data of the PZ-1 well. We constructed a 3D geological grid model with fixed cell sizes in the x, y, and z directions to estimate the volume of the saline formation, and performed property modeling to assign total porosity to the geological grid. The average CO2 geological storage capacity of the saline formation covered by the basalt cap rock, estimated by the U.S. DOE method, is 84.17 Mt of CO2 (ranging from 42.07 to 143.79 Mt of CO2).
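  • For reference, the volumetric estimate in the U.S. DOE method cited above is commonly written as follows (symbols follow the DOE atlas convention; the paper's parameter values, other than the reported capacities, are not restated here):

      G_{CO_2} = A \times h_g \times \phi_{tot} \times \rho_{CO_2} \times E_{saline}

    where A is the area of the saline formation, h_g its gross thickness, \phi_{tot} the total porosity, \rho_{CO_2} the CO2 density at reservoir conditions, and E_{saline} the storage efficiency factor.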

Neighbor Caching for P2P Applications in Multi-hop Wireless Ad Hoc Networks (멀티 홉 무선 애드혹 네트워크에서 P2P 응용을 위한 이웃 캐싱)

  • 조준호;오승택;김재명;이형호;이준원
    • Journal of KIISE:Information Networking / v.30 no.5 / pp.631-640 / 2003
  • Because of multi-hop wireless communication, P2P applications in ad hoc networks suffer poor performance. We propose a neighbor caching strategy to overcome this shortcoming and show that it is more efficient than self-caching, in which nodes store data only in their own individual caches. With neighbor caching, a node can instantly extend its caching storage by borrowing storage from idle neighbors, thereby avoiding multi-hop wireless communication with a data source far away from itself. We also present a ranking-based prediction that selects the most appropriate neighbor in which data can be stored. A node using the ranking-based prediction can select the neighbor with a high possibility of keeping data for a long time and avoid caching low-ranked data. Therefore, the ranking-based prediction improves the throughput of neighbor caching. In the simulation results, we observe that neighbor caching performs better as the network size grows, as idle times lengthen, and as cache sizes shrink. We also show that the ranking-based prediction is an adaptive algorithm that adjusts the number of data movements to neighbors, making neighbor caching flexible according to the idleness of nodes.
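  • A minimal sketch of a ranking-based neighbor selection like the one described above: prefer neighbors with long idle times and enough spare cache so stored data is likely to survive. The scoring rule and field names are assumptions, not the paper's exact prediction model:

      from dataclasses import dataclass

      @dataclass
      class Neighbor:
          node_id: str
          idle_time: float   # seconds the node has been idle
          free_cache: int    # bytes of spare cache space

      def pick_neighbor(neighbors, item_size):
          """Pick the highest-ranked neighbor that can hold the item, if any."""
          candidates = [n for n in neighbors if n.free_cache >= item_size]
          if not candidates:
              return None
          # Rank: longer idle time first, then more free cache.
          return max(candidates, key=lambda n: (n.idle_time, n.free_cache))

      best = pick_neighbor(
          [Neighbor("a", 120.0, 4096), Neighbor("b", 300.0, 8192), Neighbor("c", 10.0, 16384)],
          item_size=2048,
      )
      print(best.node_id)  # -> "b"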

The Model of Network Packet Analysis based on Big Data (빅 데이터 기반의 네트워크 패킷 분석 모델)

  • Choi, Bomin;Kong, Jong-Hwan;Han, Myung-Mook
    • Journal of the Korean Institute of Intelligent Systems / v.23 no.5 / pp.392-399 / 2013
  • Due to the development of IT technology and the information age, our dependence on networks in most areas of life has grown to a great extent. Although networks let us obtain various useful information and services, they also have a negative side: they can give network intruders vulnerable routes of attack. In other words, we urgently need to cope with serious security problems that disable services or obstruct network-connected systems, by exploiting various kinds of packet information. Many security experts are making efforts to develop security solutions against these threats, but existing solutions have many problems, such as a lack of storage capacity and performance degradation, along with the massive increase in packet data volume. Therefore, we propose a packet analysis model that applies emerging Big Data technology to the field of security. That is, we used NoSQL, a massive-data storage technology, to collect the rapidly growing packet data, implemented a packet analysis model based on K-means clustering using MapReduce, a distributed programming framework, and demonstrated its high performance through experiments.
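  • A minimal sketch of one K-means iteration expressed as map and reduce steps, the structure such a model maps onto MapReduce; packet feature extraction is omitted and the vectors below are placeholders, not real packet data:

      from collections import defaultdict

      def nearest(point, centroids):
          """Index of the centroid closest to the point (squared Euclidean distance)."""
          return min(range(len(centroids)),
                     key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])))

      def map_step(points, centroids):
          """Mapper: emit (centroid_index, point) pairs."""
          for point in points:
              yield nearest(point, centroids), point

      def reduce_step(pairs, centroids):
          """Reducer: replace each centroid with the mean of its assigned points."""
          dims = len(centroids[0])
          sums = defaultdict(lambda: [0.0] * dims)
          counts = defaultdict(int)
          for idx, point in pairs:
              counts[idx] += 1
              sums[idx] = [s + p for s, p in zip(sums[idx], point)]
          return [[s / counts[i] for s in sums[i]] if counts[i] else c
                  for i, c in enumerate(centroids)]

      packets = [[0.1, 0.2], [0.2, 0.1], [5.0, 5.1], [5.2, 4.9]]  # placeholder feature vectors
      centroids = [[0.0, 0.0], [5.0, 5.0]]
      centroids = reduce_step(map_step(packets, centroids), centroids)
      print(centroids)  # -> roughly [[0.15, 0.15], [5.1, 5.0]]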