• Title/Summary/Keyword: Large data

Pattern mining for large distributed dataset: A parallel approach (PMLDD)

  • Pal, Amrit;Kumar, Manish
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.12 no.11
    • /
    • pp.5287-5303
    • /
    • 2018
  • Handling the vast amounts of data found in large transactional datasets is a clear challenge for conventional data mining algorithms. To address this challenge, our paper proposes a parallel approach that decomposes the mining problem into sub-problems in order to find frequent patterns in these datasets. The proposed Pattern Mining for Large Distributed Dataset (PMLDD) approach ensures minimum dependencies as well as minimum communication among sub-problems. It establishes a linear aggregation of the intermediate results so that it can be adapted to large-scale programming models such as MapReduce. In this context, an algorithmic structure for the MapReduce programming model is presented. PMLDD guarantees efficient load balancing among the sub-problems by means of a specific selection criterion. Further, it optimizes the number of iterations over the dataset required for mining frequent patterns compared with existing approaches. Finally, we believe that our approach is scalable enough to handle larger datasets, and the performance evaluation and result analysis support these claims.
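
The "linear aggregation of the intermediate results" mentioned in this abstract can be illustrated with a generic support-counting sketch. This is not the PMLDD algorithm itself, whose decomposition and selection criterion are not given here. The function names `map_partition` and `reduce_counts`, the toy partitions, and the support threshold are all hypothetical. The point is only that each partition is counted independently and the partial counts are combined by plain summation, which is the shape a MapReduce job expects.

```python
from collections import Counter
from itertools import combinations

def map_partition(transactions, k):
    """Map step (hypothetical): count every k-item subset inside one partition only."""
    counts = Counter()
    for transaction in transactions:
        items = sorted(set(transaction))  # canonical order so equal itemsets share a key
        for itemset in combinations(items, k):
            counts[itemset] += 1
    return counts

def reduce_counts(partial_counts, min_support):
    """Reduce step (hypothetical): sum the partial counts, then keep frequent itemsets."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts, so aggregation is a plain sum
    return {itemset: n for itemset, n in total.items() if n >= min_support}

# Toy data: two partitions that could live on different workers.
partitions = [
    [["a", "b", "c"], ["a", "b"]],
    [["a", "c"], ["a", "b", "c"]],
]
partials = [map_partition(p, 2) for p in partitions]
frequent_pairs = reduce_counts(partials, min_support=3)
print(frequent_pairs)
```

Because the reduce step is a sum, no partition needs to see another partition's transactions, which is one way to read the abstract's claim of minimal communication among sub-problems.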

Seasonal Variations of $SO_2$ Dry Deposition Velocity Obtained by Sonic Anemometer-Thermometer (초음파 풍속온도계를 이용한 $SO_2$ 건성침착속도의 계절변화 특징)

  • 이종범;박세영
    • Journal of Korean Society for Atmospheric Environment
    • /
    • v.14 no.5
    • /
    • pp.465-478
    • /
    • 1998
  • In this study, seasonal variations of the dry deposition velocity and deposition flux of sulfur dioxide were analysed. The field observation was performed over one year (from November 1, 1995 to October 31, 1996) in the Chunchon basin. The turbulence data were measured with a 3-dimensional sonic anemometer/thermometer and were also estimated from mean meteorological data obtained at two heights (2.5 m and 10 m) on a meteorological tower. The estimation methods were then evaluated by comparing the two sets of turbulence data. The results showed that the dry deposition velocity and turbulence parameters such as $u_*$ and sensible heat flux estimated from mean meteorological data were broadly similar to the sonic measurements, although each showed somewhat large differences. The dry deposition velocity was large in summer and small in winter, mainly due to canopy resistance ($r_c$). The major factor affecting the diurnal variation of the velocity was aerodynamic resistance ($r_a$). The $SO_2$ dry deposition flux was large in winter and small in summer in Chunchon.
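
For readers unfamiliar with how resistances control the deposition velocity, the standard resistance analogy is sketched below. It is an assumption that this study used this form. The aerodynamic resistance is written $r_a$ here, and the quasi-laminar sublayer resistance $r_b$ and the concentration $C$ are symbols not named in the abstract.

```latex
v_d = \frac{1}{r_a + r_b + r_c}, \qquad F = v_d \, C
```

Under this form, a larger canopy resistance $r_c$ lowers $v_d$. The flux magnitude $F$ also depends on the $SO_2$ concentration $C$, so the flux and the velocity need not peak in the same season.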

A Study on Developing and Refining a Large Citation Service System

  • Kim, Kwang-Young;Kim, Hwan-Min
    • International Journal of Knowledge Content Development & Technology
    • /
    • v.3 no.1
    • /
    • pp.65-80
    • /
    • 2013
  • Today, citation index information is used as a measure of outcomes in disseminating technology and promoting research. Article citation information is an important factor in determining the authority of the relevant author. Google Scholar uses article citation information to rank academic article search results. Accurate analysis of such citation index information requires large amounts of bibliographic data. Therefore, this study aims to build a fast and efficient system for handling large amounts of bibliographic data, and to design and develop a system for quickly analyzing the citation information in that data. This study also aims to use and analyze citation data as a basic element for providing various advanced services in an academic article search system.

Separation of foreground stars using proper motion data in the Large Magellanic Cloud

  • Kim, Jae-Yeong;Pak, Soo-Jong;Choi, Min-Ho;Kandori, Ryo;Tamura, Motohide;Nagata, Tetsuya;Kwon, Jung-Mi;Kato, Daisuke;Jaffe, Daniel T.
    • The Bulletin of The Korean Astronomical Society
    • /
    • v.36 no.1
    • /
    • pp.31.1-31.1
    • /
    • 2011
  • We present wide-field near-IR imaging polarimetry of 30 Doradus in the Large Magellanic Cloud, obtained with the InfraRed Survey Facility (IRSF). We obtained polarimetry data in the J, H, and Ks bands using the JHKs-simultaneous imaging polarimeter SIRPOL. Since our data are contaminated by many Galactic field stars along the line of sight to the Large Magellanic Cloud, we developed methods to identify the foreground sources using proper motion data. We compared the polarimetric properties of the Galactic foreground stars and the stars in the LMC.

Data augmentation technique based on image binarization for constructing large-scale datasets (대형 이미지 데이터셋 구축을 위한 이미지 이진화 기반 데이터 증강 기법)

  • Lee JuHyeok;Kim Mi Hui
    • Journal of IKEEE
    • /
    • v.27 no.1
    • /
    • pp.59-64
    • /
    • 2023
  • Deep learning can solve various computer vision problems, but it requires a large dataset. This paper proposes a data augmentation technique based on image binarization for constructing large-scale datasets. New images are generated by extracting features using image binarization and randomly placing the remaining pixels. The generated images showed quality similar to that of the original images and yielded excellent performance in deep learning models.

Efficient Top-K Queries Computation for Encrypted Data in the Cloud (클라우드 환경에서의 암호화 데이터에 대한 효율적인 Top-K 질의 수행 기법)

  • Kim, Jong Wook
    • Journal of Korea Multimedia Society
    • /
    • v.18 no.8
    • /
    • pp.915-924
    • /
    • 2015
  • With the growing popularity of cloud computing services, users can more easily manage massive amounts of data by outsourcing them to the cloud, or more efficiently analyse large amounts of data by leveraging the IT infrastructure provided by the cloud. This, however, raises security concerns for sensitive data. To provide data security, it is essential to encrypt sensitive data before uploading it to cloud computing services. Although data encryption helps provide data security, it degrades the performance of massive data analytics because it prevents the use of indexes and mathematical operations on encrypted data. Thus, in this paper, we propose a novel algorithm that enables efficient processing of large amounts of encrypted data. In particular, we propose a novel top-k processing algorithm for massive amounts of encrypted data in cloud computing environments, and verify the performance of the proposed approach through experiments on real data.

Volume Rendering using Grid Computing for Large-Scale Volume Data

  • Nishihashi, Kunihiko;Higaki, Toru;Okabe, Kenji;Raytchev, Bisser;Tamaki, Toru;Kaneda, Kazufumi
    • International Journal of CAD/CAM
    • /
    • v.9 no.1
    • /
    • pp.111-120
    • /
    • 2010
  • In this paper, we propose a volume rendering method using grid computing for large-scale volume data. Grid computing is attractive because medical institutions and research facilities often have a large number of idle computers. A large-scale volume dataset is divided into sub-volumes, and the sub-volumes are rendered using grid computing. In a grid, different computers rarely have the same processor speed, so the order in which results return rarely matches the order in which jobs were sent. However, order is vital when combining results to create a final image. Job scheduling is therefore important in grid computing for volume rendering, so we use an obstacle flag, which changes priorities dynamically, to manage sub-volume results. Obstacle flags manage the visibility of each sub-volume when the line of sight from the viewpoint is obscured by other sub-volumes. The proposed dynamic job scheduling based on visibility substantially increases efficiency. We implemented the method on our university's campus grid and conducted comparative experiments, which showed that it provides significant improvements in efficiency for large-scale volume rendering.
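
Why the return order matters when combining sub-volume results can be shown with a small reorder buffer. This is a simplified stand-in, not the obstacle-flag scheduling described in the abstract. It assumes each sub-volume has already been reduced to a single grayscale color and opacity along one ray, and that depth index 0 is nearest the viewer. The name `composite_in_order` is hypothetical.

```python
import heapq

def composite_in_order(arrivals):
    """Composite sub-volume results front to back, whatever order they arrive in.

    arrivals: iterable of (depth_index, color, alpha); index 0 is nearest the viewer.
    """
    pending = []      # min-heap of results that arrived before their turn
    next_index = 0    # next depth index that may be composited
    color, alpha = 0.0, 0.0
    for result in arrivals:
        heapq.heappush(pending, result)
        # Drain every buffered result that is now next in depth order.
        while pending and pending[0][0] == next_index:
            _, c, a = heapq.heappop(pending)
            color += (1.0 - alpha) * a * c   # front-to-back "over" operator
            alpha += (1.0 - alpha) * a
            next_index += 1
    return color, alpha

in_order = [(0, 1.0, 0.5), (1, 0.5, 0.5), (2, 0.2, 1.0)]
out_of_order = [(2, 0.2, 1.0), (0, 1.0, 0.5), (1, 0.5, 0.5)]
print(composite_in_order(in_order))
print(composite_in_order(out_of_order))
```

Both calls produce the same pixel because early results wait in the heap until every nearer sub-volume has been composited. A visibility-aware scheduler like the one proposed in the abstract aims to reduce that waiting rather than just tolerate it.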

Development of Extra-large Hydraulic Breaker (초대형 유압브레이커 개발)

  • Ahn, Kyubok
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.5
    • /
    • pp.3081-3086
    • /
    • 2015
  • An extra-large hydraulic breaker, which could be used with a 100-ton-class excavator, was developed. Before designing the hydraulic breaker, an analysis method for predicting performance measures such as impact energy and impact rate was studied. Based on the analysis results, the extra-large hydraulic breaker was designed and manufactured, and it was confirmed to operate successfully. Impact energy and impact rate were measured during operation of the breaker and compared with the analysis results. The analysis predicted the measured impact rate well, but the predicted impact energy differed considerably from the test data. The extra-large hydraulic breaker was successfully developed, and the analysis method for impact energy will be updated to take into account friction, the hydraulic circuit, etc.

Sensor placement selection of SHM using tolerance domain and second order eigenvalue sensitivity

  • He, L.;Zhang, C.W.;Ou, J.P.
    • Smart Structures and Systems
    • /
    • v.2 no.2
    • /
    • pp.189-208
    • /
    • 2006
  • Monitoring large-scale civil engineering structures such as offshore platforms and high-rise buildings requires a large number of sensors of different types. Innovative sensor data information technologies are extremely important for the transmission, storage and retrieval of the large volumes of sensor data generated by large sensor networks. How to obtain the optimal sensor set and placement is of growing concern to researchers in vibration-based SHM. In this paper, a method of determining sensor locations that aims to extract the dynamic parameters effectively is presented. The method selects the number and placement of sensors installed on or in a structure through a tolerance domain statistical inference algorithm combined with a second order sensitivity technique. The proposed method first determines a subset of sensors from the theoretical measurement points derived from an analytical model, using the statistical tolerance domain procedure under the principle of modal effective independence. The second step judges whether the selected set of measurement points is sensitive to dynamic changes of the structure by means of second order eigenvalue sensitivity analysis. Optimal sensor selection for a 76-story building benchmark model and an offshore platform structure is demonstrated, and the results show that the method is feasible.

Development of the Design Methodology for Large-scale Data Warehouse based on MongoDB

  • Lee, Junho;Joo, Kyungsoo
    • Journal of the Korea Society of Computer and Information
    • /
    • v.23 no.3
    • /
    • pp.49-54
    • /
    • 2018
  • A data warehouse is a system that collectively manages and integrates a company's data and provides the basis for management strategy decisions. Nowadays, the volume of data to be analyzed is reaching a critical size that challenges traditional data warehousing approaches. Currently implemented solutions are mainly based on relational databases, which are no longer suited to these data volumes. NoSQL solutions allow us to consider new approaches to data warehousing, especially from the multidimensional data management point of view. In this paper, we extend the data warehouse design methodology based on relational databases using a star schema, and develop a consistent design methodology, from information requirement analysis to data warehouse construction, for large-scale data warehouses based on MongoDB, a NoSQL database.
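
One common way to carry a star schema over to a document store is to embed the referenced dimension rows inside each fact document. The sketch below illustrates that pattern with plain Python dictionaries. It is an assumption that the methodology in this paper uses embedding. The function `to_documents`, the `<dimension>_id` key convention, and the sample tables are all hypothetical, and no MongoDB driver is involved.

```python
def to_documents(fact_rows, dimensions):
    """Embed each referenced dimension row into its fact row.

    fact_rows: list of dicts with "<dim>_id" foreign keys plus measures.
    dimensions: {"<dim>": {id: row_dict}} lookup tables.
    """
    documents = []
    for row in fact_rows:
        doc = {}
        for key, value in row.items():
            dim = key[:-3] if key.endswith("_id") else None
            if dim in dimensions:
                doc[dim] = dict(dimensions[dim][value])  # embedded copy of the dimension row
            else:
                doc[key] = value                          # measure, kept as-is
        documents.append(doc)
    return documents

# Hypothetical star schema: one fact table and two dimension tables.
dimensions = {
    "product": {1: {"name": "widget", "category": "tools"}},
    "store": {7: {"city": "Seoul"}},
}
fact_rows = [{"product_id": 1, "store_id": 7, "units": 3, "revenue": 29.7}]
docs = to_documents(fact_rows, dimensions)
print(docs)
```

Embedding trades storage for read speed: each document can be aggregated without joins, at the cost of duplicating dimension attributes across facts.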