• Title/Summary/Keyword: Large data


A Study on Recommendation System Using Data Mining Techniques for Large-sized Music Contents (대용량 음악콘텐츠 환경에서의 데이터마이닝 기법을 활용한 추천시스템에 관한 연구)

  • Kim, Yong;Moon, Sung-Been
    • Journal of the Korean Society for Information Management
    • /
    • v.24 no.2
    • /
    • pp.89-104
    • /
    • 2007
  • This research presents a personalized recommendation framework for a large-sized music contents environment. Despite existing studies and commercial products for recommendation systems, large online shopping malls are still looking for a recommendation system that can serve personalized recommendations and handle large data in real time. This research utilizes data mining technologies and a new pattern matching algorithm. A clustering technique is used to obtain dynamic user segmentations from user preferences for content categories. A sequential pattern mining technique is then used to extract content access patterns within the user segmentations. The recommendation is given by our recommendation algorithm using the user's content preference history and the content access patterns of the segment. In the framework, preprocessing, data transformation, and transition are implemented on a DBMS. The proposed system is implemented to show that the framework is feasible. In an experiment using real-world large data, personalized recommendations are given in almost real time with acceptable correctness.
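
The three-step flow described in the abstract (segment users, mine sequential access patterns per segment, recommend from the matched pattern) can be sketched in miniature. Everything below is invented for illustration: the toy preference counts and access sequences stand in for the paper's large music-content logs, and a simple dominant-category rule stands in for the clustering step.

```python
from collections import Counter, defaultdict

# Invented toy data: per-user preference counts over content categories,
# and per-user content-access sequences.
prefs = {"u1": {"rock": 5, "jazz": 1},
         "u2": {"rock": 4, "jazz": 2},
         "u3": {"jazz": 6, "rock": 1}}
history = {"u1": ["r1", "r2"],
           "u2": ["r1", "r2", "r4"],
           "u3": ["j1", "j2"]}

# Step 1: segment users by dominant preference (a stand-in for clustering).
def segment(user):
    return max(prefs[user], key=prefs[user].get)

segments = defaultdict(list)
for u in prefs:
    segments[segment(u)].append(u)

# Step 2: mine frequent consecutive access patterns within each segment.
patterns = defaultdict(Counter)
for seg, users in segments.items():
    for u in users:
        seq = history[u]
        for a, b in zip(seq, seq[1:]):
            patterns[seg][(a, b)] += 1

# Step 3: recommend the most frequent unseen successor of the user's last item.
def recommend(user):
    seg, last = segment(user), history[user][-1]
    cands = [(cnt, b) for (a, b), cnt in patterns[seg].items()
             if a == last and b not in history[user]]
    return max(cands)[1] if cands else None
```

Here `recommend("u1")` returns "r4", because the rock segment's mined patterns show "r4" frequently following u1's last-accessed item.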

Visualizing Large Two-way Crosstabs by PLS Method (PLS 방법에 의한 "큰" 2원 교차표의 시각화)

  • Lee, Yong-Goo;Choi, Youn-Im
    • Communications for Statistical Applications and Methods
    • /
    • v.16 no.3
    • /
    • pp.421-428
    • /
    • 2009
  • For the visualization of categorical data, if the number of categories is small, Hayashi Quantification Method 3 can be considered for visualizing the categories of the variables. However, the method is known to be unstable because it weights small-frequency categories more heavily than large-frequency ones. The purpose of this research is to propose a visualization of large two-way cross-tabulation data by PLS methods for examining the relationship between the categories of the row and column variables. We utilize the PLS visualization method (Huh et al., 2007), proposed for visualizing qualitative data, to visualize the categories of large categorical data. We also compare both methods by applying them to real data, and study the results of the PLS visualization method on real categorized data with many categories.
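
A numeric stand-in for the kind of output such a visualization produces: place row and column categories of a two-way crosstab on a common axis using the leading singular vectors of the grand-mean-centered table. This SVD-style sketch only illustrates the idea; the table and labels are invented, and the paper's actual PLS procedure differs.

```python
# Invented 3x3 crosstab with invented category labels.
table = [[20, 5, 2],
         [4, 18, 6],
         [1, 7, 22]]
rows = ["low", "mid", "high"]
cols = ["A", "B", "C"]

n, m = len(table), len(table[0])
grand = sum(sum(r) for r in table) / (n * m)
X = [[v - grand for v in r] for r in table]      # center by the grand mean

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(c) for c in zip(*A)]

def norm(v):
    return sum(x * x for x in v) ** 0.5

# Power iteration on X^T X yields the leading right singular vector.
v = [1.0, 1.0, 1.0]
for _ in range(200):
    w = matvec(transpose(X), matvec(X, v))
    v = [x / norm(w) for x in w]

u = matvec(X, v)                                  # row coordinates
col_scores = dict(zip(cols, v))                   # column coordinates
row_scores = dict(zip(rows, u))
```

Plotting `row_scores` against `col_scores` places associated row and column categories near each other; for this diagonal-dominant table, "low"/"A" and "high"/"C" land on opposite ends of the axis.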

The first of its kind metallicity map of the Large Magellanic Cloud

  • Choudhury, Samyaday;Subramaniam, Annapurni;Cole, Andrew A.
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.41 no.2
    • /
    • pp.30.1-30.1
    • /
    • 2016
  • We have estimated a metallicity map of the Large Magellanic Cloud (LMC) using the Magellanic Cloud Photometric Survey (MCPS) and Optical Gravitational Lensing Experiment (OGLE III) photometric data. This is a first-of-its-kind, high-spatial-resolution map of metallicity out to a radius of $4^{\circ}-5^{\circ}$, derived using large-area photometric data and calibrated using spectroscopic data of Red Giant Branch (RGB) stars. The RGB is identified in the V, (V - I) colour-magnitude diagrams of small subregions of varying sizes in both data sets. The slope of the RGB is used as an indicator of the mean metallicity of a subregion, and it is calibrated to metallicity using spectroscopic data for field and cluster red giants in selected subregions. The mean metallicity of the LMC is found to be [Fe/H] = -0.37 dex (${\sigma}_{[Fe/H]}=0.12$) from MCPS data, and [Fe/H] = -0.39 dex (${\sigma}_{[Fe/H]}=0.10$) from OGLE III data. The bar is found to have a uniform and higher metallicity compared to the disk, indicative of an active bar in the past. Both data sets suggest a shallow radial metallicity gradient out to a radius of 4 kpc ($-0.049{\pm}0.002$ to $-0.066{\pm}0.006$ dex kpc$^{-1}$). This metallicity gradient of the LMC disk, though shallow, resembles the gradients seen in spiral galaxies and in our own Galaxy.
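
The calibration step described above (RGB slope as a metallicity indicator, anchored to spectroscopic subregions) amounts to a simple regression. In this sketch, every number is invented: a few (slope, spectroscopic [Fe/H]) pairs fix a linear relation, which is then applied to subregions measured only photometrically.

```python
# Invented calibration pairs: (measured RGB slope, spectroscopic [Fe/H]).
calib = [(-4.2, -0.60), (-4.6, -0.45), (-5.0, -0.32), (-5.4, -0.20)]

# Ordinary least-squares line [Fe/H] = a * slope + b through the pairs.
n = len(calib)
sx = sum(s for s, _ in calib); sy = sum(f for _, f in calib)
sxx = sum(s * s for s, _ in calib); sxy = sum(s * f for s, f in calib)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

def feh_from_rgb_slope(slope):
    return a * slope + b

# Apply the calibration across a grid of photometry-only subregions
# (positions and slopes invented) to build the map.
photometric_slopes = {(0, 0): -4.4, (0, 1): -5.2}
metallicity_map = {pos: feh_from_rgb_slope(s)
                   for pos, s in photometric_slopes.items()}
```

The same two-pass structure, calibrate on the spectroscopic subset, then evaluate everywhere, is what lets a purely photometric survey cover the full $4^{\circ}-5^{\circ}$ radius.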


Energy-Aware Video Coding Selection for Solar-Powered Wireless Video Sensor Networks

  • Yi, Jun Min;Noh, Dong Kun;Yoon, Ikjune
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.7
    • /
    • pp.101-108
    • /
    • 2017
  • A wireless image sensor node collecting image data for environmental monitoring or surveillance requires a large amount of energy to transmit its huge volume of video data. Although solar energy can be used to mitigate this energy constraint, the harvested energy is also limited, so an efficient energy management scheme for transmitting large amounts of video data is needed. In this paper, we propose a method that reduces the number of blackout nodes and increases the amount of gathered data by selecting an appropriate video coding method according to the energy condition of each node in a solar-powered wireless video sensor network. The scheme allocates the amount of energy that can be used in each time slot so that data can be collected seamlessly day and night, and selects a high-compression coding method when the allocated energy is large and a low-compression coding method when the quota is low. This reduces blackouts at relay nodes and increases the amount of data obtained at the sink node by allowing data to be transmitted continuously. In addition, if the energy level falls below that required for normal operation, the frame rate is adjusted to prevent the nodes from exhausting their energy. Simulation results show that the proposed scheme suppresses the energy exhaustion of relay nodes and collects more data than other schemes.
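
The selection policy in the abstract can be sketched as two small decision functions plus a cost model. The thresholds, per-frame costs, and the two coding modes below are invented for illustration; the paper's actual energy model and coding set are richer.

```python
CODINGS = {
    # mode: (encoding energy per frame, transmitted bytes per frame)
    "high_compression": (4.0, 200),   # costly to encode, cheap to transmit
    "low_compression":  (1.0, 800),   # cheap to encode, costly to transmit
}
TX_COST_PER_BYTE = 0.01

def select_coding(energy_quota):
    # With a large quota the node can afford the costlier encoding,
    # which shrinks the data the solar-powered relays must forward.
    return "high_compression" if energy_quota >= 50.0 else "low_compression"

def frames_this_slot(battery, base_fps=10, low_watermark=20.0):
    # Degrade the frame rate instead of letting the node black out.
    return base_fps if battery >= low_watermark else max(1, base_fps // 2)

def slot_cost(coding, fps):
    encode, size = CODINGS[coding]
    return fps * (encode + size * TX_COST_PER_BYTE)
```

With these invented numbers, high compression halves the per-slot cost despite its higher encoding price, which is the trade the scheme exploits when the quota allows it.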

Two-Tier Storage DBMS for High-Performance Query Processing

  • Eo, Sang-Hun;Li, Yan;Kim, Ho-Seok;Bae, Hae-Young
    • Journal of Information Processing Systems
    • /
    • v.4 no.1
    • /
    • pp.9-16
    • /
    • 2008
  • This paper describes the design and implementation of a two-tier DBMS for handling massive data and providing fast response times. Present-day DBMS requirements can be seen from two aspects: handling large amounts of data and providing fast response times. Traditional DBMSs cannot fulfill both requirements. Disk-oriented DBMSs can handle massive data, but their response times are relatively slower than those of memory-resident DBMSs. Memory-resident DBMSs, on the other hand, provide fast response times but have inherent restrictions on database size. In this paper, to meet both requirements, a two-tier DBMS is proposed. Cold data, which does not require fast response times, is managed by the disk storage manager, while hot data, which requires fast response times, is handled by the memory storage manager as snapshots. As a result, the proposed system performs significantly better than a disk-oriented DBMS while managing massive data at the same time.
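
The hot/cold split can be sketched as a tiny two-tier store: every record lives in the authoritative disk tier, and recently read records are additionally kept as snapshots in a bounded memory tier that is consulted first. All names here are invented, and LRU promotion stands in for whatever hot-data policy the paper actually uses.

```python
from collections import OrderedDict

class TwoTierStore:
    def __init__(self, hot_capacity=2):
        self.disk = {}                       # authoritative cold store
        self.hot = OrderedDict()             # snapshot cache, LRU-evicted
        self.hot_capacity = hot_capacity

    def put(self, key, value):
        self.disk[key] = value
        if key in self.hot:                  # keep any snapshot consistent
            self.hot[key] = value
            self.hot.move_to_end(key)

    def get(self, key):
        if key in self.hot:                  # fast path: memory tier
            self.hot.move_to_end(key)
            return self.hot[key]
        value = self.disk[key]               # slow path: disk tier
        self.hot[key] = value                # promote to a hot snapshot
        if len(self.hot) > self.hot_capacity:
            self.hot.popitem(last=False)     # evict the least recent snapshot
        return value
```

Reads of hot keys never touch the disk tier, which is where the response-time gain over a purely disk-oriented design comes from.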

Implementation of AIoT Edge Cluster System via Distributed Deep Learning Pipeline

  • Jeon, Sung-Ho;Lee, Cheol-Gyu;Lee, Jae-Deok;Kim, Bo-Seok;Kim, Joo-Man
    • International journal of advanced smart convergence
    • /
    • v.10 no.4
    • /
    • pp.278-288
    • /
    • 2021
  • Recently, IoT systems have become cloud-based, so continuous, large volumes of data collected from sensor nodes are processed in the data server through the cloud. However, in a centralized large-scale cloud computing configuration, computational processing must be performed away from the physical location where data collection takes place, and the need for edge computers that reduce the network load of the cloud system is gradually expanding. In this paper, a cluster system consisting of six inexpensive Raspberry Pi boards was constructed to perform fast data processing, and we propose a "Kubernetes cluster system (KCS)" for large-scale data collection and analysis through model distribution and a data pipeline method. To compare performance, a deep learning ensemble model was built, and the accuracy, processing performance, and processing time of the proposed KCS with model distribution were compared and analyzed. As a result, the ensemble model was superior in accuracy, but the KCS implemented as a data pipeline proved superior in processing speed.
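
The pipeline idea, each stage of a distributed model running on its own node and passing results downstream, can be mimicked in a few lines. Here threads and queues stand in for the cluster's pods, and the two toy stage functions are invented placeholders for real model stages.

```python
import queue
import threading

def stage(fn, inbox, outbox):
    # Each stage consumes items, applies its piece of the model, and
    # forwards results; a None "poison pill" shuts the stage down.
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)
            break
        outbox.put(fn(item))

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
threading.Thread(target=stage, args=(lambda x: x * 2, q0, q1), daemon=True).start()
threading.Thread(target=stage, args=(lambda x: x + 1, q1, q2), daemon=True).start()

for x in [1, 2, 3]:
    q0.put(x)
q0.put(None)

results = []
while (item := q2.get()) is not None:
    results.append(item)
print(results)  # [3, 5, 7]
```

Because each stage works on a different item at the same time, throughput approaches that of the slowest stage rather than the sum of all stages, which is the speedup the KCS pipeline exploits.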

Camera Calibration Using Neural Network with a Small Amount of Data (소수 데이터의 신경망 학습에 의한 카메라 보정)

  • Do, Yongtae
    • Journal of Sensor Science and Technology
    • /
    • v.28 no.3
    • /
    • pp.182-186
    • /
    • 2019
  • When a camera is employed for 3D sensing, accurate camera calibration is vital, as it is a prerequisite for the subsequent steps of the sensing process. Camera calibration is usually performed by complex mathematical modeling and geometric analysis. In contrast, data learning using an artificial neural network can establish a transformation relation between 3D space and the 2D camera image without explicit camera modeling. However, a neural network requires a large amount of accurate data for its learning, and collecting extensive data accurately in practice demands significant time and work with a precise system setup. In this study, we propose a two-step neural calibration method that is effective when only a small amount of learning data is available. In the first step, the camera projection transformation matrix is determined using the limited available data. In the second step, the transformation matrix is used to generate a large amount of synthetic data, and the neural network is trained using the generated data. Results of a simulation study show that the proposed method is valid and effective.
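
The two-step structure can be shown in a deliberately tiny 1-D toy (invented numbers throughout): fit a simple parametric model to the few real pairs, use it to synthesize many training pairs, then fit the final model on the synthetic set by gradient descent, which here stands in for neural-network training on the projection-matrix-generated data.

```python
# Step 0: a few accurate (input, output) pairs, standing in for the small
# set of measured 3D-2D correspondences.
real = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

# Step 1: closed-form least-squares fit (stand-in for estimating the
# projection transformation matrix from the limited data).
n = len(real)
sx = sum(x for x, _ in real); sy = sum(y for _, y in real)
sxx = sum(x * x for x, _ in real); sxy = sum(x * y for x, y in real)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n

# Step 2: generate a large synthetic data set from the fitted model.
synthetic = [(x / 10.0, a * (x / 10.0) + b) for x in range(100)]

# Step 3: train weights on the synthetic data by gradient descent
# (stand-in for training the neural network).
w, c = 0.0, 0.0
for _ in range(2000):
    gw = sum((w * x + c - y) * x for x, y in synthetic) / len(synthetic)
    gc = sum((w * x + c - y) for x, y in synthetic) / len(synthetic)
    w -= 0.05 * gw
    c -= 0.05 * gc
```

Since the synthetic set is noiseless by construction, the trained weights recover the fitted relation, mirroring how the paper's network learns the mapping implied by the estimated projection matrix.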

A Note on Weak Law of Large Numbers for $L^{1}(R)$

  • Lee, Sung-Ho
    • Journal of the Korean Data and Information Science Society
    • /
    • v.9 no.2
    • /
    • pp.299-303
    • /
    • 1998
  • In this paper weak laws of large numbers are obtained for random variables in $L^{1}(R)$ which satisfy a compact uniform integrability condition.
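
For reference, results of this type typically take the following shape (a sketch only; the paper's exact hypotheses, in particular the compact uniform integrability condition on the random variables, govern the precise statement):

```latex
\frac{1}{n}\sum_{i=1}^{n}\bigl(X_i - \mathbb{E}X_i\bigr)
\;\xrightarrow{\;P\;}\; 0
\qquad (n \to \infty).
```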


Secure Multiparty Computation of Principal Component Analysis (주성분 분석의 안전한 다자간 계산)

  • Kim, Sang-Pil;Lee, Sanghun;Gil, Myeong-Seon;Moon, Yang-Sae;Won, Hee-Sun
    • Journal of KIISE
    • /
    • v.42 no.7
    • /
    • pp.919-928
    • /
    • 2015
  • In recent years, many research efforts have been made on privacy-preserving data mining (PPDM) over large volumes of data. In this paper, we propose a PPDM solution based on principal component analysis (PCA), which can be widely used in computing correlations among sensitive data sets. The usual way of computing PCA is to collect all the data spread across multiple nodes into a single node before starting the PCA computation; however, this approach discloses the sensitive data of individual nodes, involves a large amount of computation, and incurs large communication overheads. To solve this problem, we present an efficient method that securely computes PCA without collecting all the data. The proposed method shares only limited information among individual nodes, yet obtains the same result as the original PCA. In addition, we present a dimensionality reduction technique for the proposed method and use it to improve the performance of secure similar-document detection. Finally, through various experiments, we show that the proposed method works effectively and efficiently on large amounts of multi-dimensional data.
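
The "share only limited information" idea can be illustrated with the covariance step that precedes PCA: each node sends only its local count, sum vector, and raw second-moment matrix, and the combined global covariance equals the one computed from pooled data, without the data ever leaving the nodes. The two toy node data sets are invented, and the paper's actual protocol is more elaborate than this plain aggregation.

```python
# Invented per-node data (two nodes, 2-dimensional records).
node_a = [[1.0, 2.0], [2.0, 3.0]]
node_b = [[3.0, 5.0], [4.0, 7.0], [5.0, 8.0]]

def local_summary(rows):
    # Only these aggregates leave the node, never the raw rows.
    n = len(rows); d = len(rows[0])
    s = [sum(r[j] for r in rows) for j in range(d)]
    ss = [[sum(r[i] * r[j] for r in rows) for j in range(d)] for i in range(d)]
    return n, s, ss

def combine(summaries):
    # Merge the aggregates into the exact global covariance matrix.
    n = sum(t[0] for t in summaries)
    d = len(summaries[0][1])
    s = [sum(t[1][j] for t in summaries) for j in range(d)]
    ss = [[sum(t[2][i][j] for t in summaries) for j in range(d)] for i in range(d)]
    mean = [x / n for x in s]
    return [[ss[i][j] / n - mean[i] * mean[j] for j in range(d)] for i in range(d)]

cov = combine([local_summary(node_a), local_summary(node_b)])
```

An eigendecomposition of `cov` then yields the same principal components a central node would obtain from the pooled records, which is the equivalence the paper's secure protocol preserves.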

A Study on Information Strategy Development Using Configuration Management in Large-scale Construction Project (형상관리기법을 활용한 대형 프로젝트 정보화 전략개발)

  • Won, Seo Kyung
    • Proceedings of the Korean Institute of Building Construction Conference
    • /
    • 2018.05a
    • /
    • pp.66-67
    • /
    • 2018
  • Large-scale construction projects require various licenses and technologies for their manufacturing and handling processes, and whole-life-cycle business process management determines the success of a project. The efficiency of the work conducted by stakeholders, and of the technologies they possess, should therefore be enhanced in order to strengthen their competitiveness. For this reason, many experts point to improving the life-cycle process and managing it efficiently. Because it is essential to keep data up to date and utilize it throughout a long-term project so that changes are reflected, the most important part of project management in large-scale projects is information change management. Therefore, the objective of this study is to apply a configuration management (CM) technique to manage the change data generated during early-phase planning. The results of this research will assist large-scale project managers in developing an information change management system.
