• Title/Summary/Keyword: Massive Data Processing

A Tentative Study on Inter-Library Cooperation (도서관(圖書館) 상호협력(相互協力)에 관한 시론적고찰(試論的考察))

  • Kim, Se-Ick
    • Journal of the Korean BIBLIA Society for Library and Information Science / v.6 no.1 / pp.5-46 / 1984
  • We often hear today that information is a national resource, perhaps our most important one, and that government should recognize this and develop a national policy for the effective management and use of information. Traditionally, however, the library has played a rather passive role in selecting, acquiring, preserving, and transferring information. Libraries now face greater, more diverse, and more urgent demands from users, and must cope effectively with a massive increase in the volume and cost of selecting, acquiring, storing, and retrieving information. Individual libraries can no longer keep pace with the world's published output. The solution to these problems lies in "national planning and in cooperation through inter-library cooperation and in the application of data processing to library operations." The national level also bears responsibility for total bibliographic control of the national information output and for interfacing it with other national and international systems; for the development, adoption, and maintenance of standards in all areas affecting library work; for the provision of services based on centrally created and maintained bibliographic data files; and for planning and policy development for the national information system.

HA-PVFS : A PVFS File System supporting High Data Availability Adaptive to Temporal Locality (HA-PVFS : 시간적 지역성에 적응적인 데이터 고가용성을 지원하는 PVFS 파일 시스템)

  • Sim Sang-Man;Han Sae-Young;Park Sung-Yong
    • The KIPS Transactions:PartA / v.13A no.3 s.100 / pp.241-252 / 2006
  • In cluster file systems, file availability has been supported by replicating entire files or by generating parities on parity servers. However, these methods incur very large time and space costs and cannot handle massive failures in the file system. We therefore propose HA-PVFS, a cluster file system that supports high data availability and adapts to temporal locality. HA-PVFS restricts replication or parity generation to a subset of important files; to identify them, it employs an efficient algorithm that estimates file access patterns from limited information. Moreover, to minimize performance degradation of the file system, it uses a delayed update method and relay replication.

Enhancing the Text Mining Process by Implementation of Average-Stochastic Gradient Descent Weight Dropped Long-Short Memory

  • Annaluri, Sreenivasa Rao;Attili, Venkata Ramana
    • International Journal of Computer Science & Network Security / v.22 no.7 / pp.352-358 / 2022
  • Text mining is an important process for analyzing data collected from different sources such as videos, audio, and social media. Tools such as Natural Language Processing (NLP) are mostly used in real-time applications. In earlier research, text mining approaches were implemented using long short-term memory (LSTM) networks. In this paper, text mining is performed using average-stochastic gradient descent weight-dropped (AWD)-LSTM techniques to obtain better accuracy and performance. The proposed model is demonstrated on Internet Movie Database (IMDB) reviews. The model was implemented in Python because of its adaptability and flexibility when dealing with massive data sets and databases. The results show that the proposed model (LSTM plus weight dropping plus embedding) achieved an accuracy of 88.36%, compared with 85.64% for the previous AWD-LSTM model and 85.16% for a plain LSTM model. Finally, the loss decreased from 0.341 to 0.299 with the proposed model.

Iceberg Query Evaluation Technical Using a Cuboid Prefix Tree (큐보이드 전위트리를 이용한 빙산질의 처리)

  • Han, Sang-Gil;Yang, Woo-Sock;Lee, Won-Suk
    • Journal of KIISE:Databases / v.36 no.3 / pp.226-234 / 2009
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Because of these characteristics, it is impossible to store all the data elements of a data stream, so a new synopsis structure is needed to hold summary information about the stream. For this purpose, this paper proposes a cuboid prefix tree that can be employed effectively in evaluating an iceberg query over data streams. A cuboid prefix tree stores only those itemsets that consist of the grouping attributes used in the GROUP BY clause of a query. In addition, a cuboid prefix tree can compute multiple iceberg queries simultaneously by sharing their common sub-expressions. A series of experiments verifies that a cuboid prefix tree evaluates an iceberg query over an unbounded data stream while efficiently reducing memory usage and processing time.
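
As a rough illustration of what an iceberg query computes (a GROUP BY aggregate filtered by a threshold), the sketch below counts groups exactly with a plain dictionary. It is not the cuboid prefix tree from the abstract and does nothing to bound memory; the function name `iceberg_query`, the record fields, and the sample data are all hypothetical.

```python
from collections import Counter


def iceberg_query(stream, group_by, min_count):
    """Return the groups whose COUNT(*) is at least min_count.

    Roughly equivalent to:
        SELECT <group_by>, COUNT(*) FROM stream
        GROUP BY <group_by> HAVING COUNT(*) >= min_count
    """
    counts = Counter()
    for record in stream:
        # Only the grouping attributes are kept; other fields are ignored.
        key = tuple(record[attr] for attr in group_by)
        counts[key] += 1
    return {key: n for key, n in counts.items() if n >= min_count}


# Hypothetical click records standing in for a data stream.
clicks = [
    {"page": "home", "region": "KR"},
    {"page": "home", "region": "KR"},
    {"page": "cart", "region": "US"},
    {"page": "home", "region": "KR"},
    {"page": "cart", "region": "KR"},
]
result = iceberg_query(clicks, ("page", "region"), min_count=2)
```

An exact counter like this grows with the number of distinct groups, which is the cost a synopsis structure is meant to avoid on an unbounded stream.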

A Study on the Data Analysis of Fire Simulation in Underground Utility Tunnel for Digital Twin Application (디지털트윈 적용을 위한 지하공동구 화재 시뮬레이션의 데이터 분석 연구)

  • Jae-Ho Lee;Se-Hong Min
    • Journal of the Society of Disaster Information / v.20 no.1 / pp.82-92 / 2024
  • Purpose: The purpose of this study is to find a solution to the massive volume of data generated when fire simulation data is linked to augmented reality, and to the resulting data overload problem. Method: An experiment was conducted to determine an appropriate interval between input data points that improves the reliability and reduces the computational complexity of linear interpolation, a data estimation technique. In addition, a validity check was conducted to confirm whether linear interpolation adequately reflects the dynamic changes of a fire. Result: When applied to the underground utility tunnel that was the subject of the study, inputting data at intervals of 10 m gave satisfactory improvements in both the reliability of the interpolation and the processing speed of the simulation. In addition, evaluation using MAE and R-squared verified that estimating fire simulation data with the interpolation technique had high explanatory power and reliability. Conclusion: This study addressed the data overload problem caused by applying digital twin technology to fire simulation by means of interpolation, and confirmed that fire information prediction and visualization are of great help in real-time fire prevention.
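
The estimation and evaluation steps named in the abstract (linear interpolation between sampled points, scored with MAE and R-squared) can be sketched as follows. This is a minimal illustration under assumed data, not the study's implementation: the function names, the 10 m sample spacing used in the example, and every temperature value are hypothetical.

```python
def lerp_series(xs, ys, x):
    """Linearly interpolate y at position x from points sorted by xs."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the sampled range")
    for i in range(len(xs) - 1):
        x0, x1 = xs[i], xs[i + 1]
        if x0 <= x <= x1:
            t = (x - x0) / (x1 - x0)
            return ys[i] + t * (ys[i + 1] - ys[i])


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical temperatures (deg C) sampled every 10 m along a tunnel.
positions = [0.0, 10.0, 20.0]
temperatures = [20.0, 60.0, 40.0]

# Estimate the unsampled midpoints, then score against made-up reference values.
estimates = [lerp_series(positions, temperatures, x) for x in (5.0, 15.0)]
reference = [42.0, 48.0]
error = mae(reference, estimates)
fit = r_squared(reference, estimates)
```

Widening the sample spacing reduces the amount of stored data but tends to raise MAE and lower R-squared, which is the trade-off the interval experiment in the abstract explores.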

A study on searching image by cluster indexing and sequential I/O (연속적 I/O와 클러스터 인덱싱 구조를 이용한 이미지 데이타 검색 연구)

  • Kim, Jin-Ok;Hwang, Dae-Joon
    • The KIPS Transactions:PartD / v.9D no.5 / pp.779-788 / 2002
  • Searching multimedia data such as images, video, and audio raises many technically difficult issues because the data are massive and more complex than simple text-based data. As a method of searching multimedia data, similarity retrieval has been studied: basic features are extracted automatically from the multimedia data and the search is performed over those features, because exact matching is not suitable for a matrix of multimedia features. In this paper, data clustering and cluster indexing are proposed as a fast similarity-retrieval method for multimedia data. This approach clusters similar images on adjacent disk cylinders and then builds indexes to access the clusters. To minimize the search cost, hashing is used to index the clusters. In addition, to reduce I/O time, the proposed search takes just one I/O to look up the location of the cluster containing similar objects and one sequential file I/O to read in that cluster. The proposed scheme addresses the problem of high dimensionality by using clustering together with indexing, and has higher search efficiency than content-based image retrieval that uses only a clustering structure or only an indexing structure.

Relationships Between the Characteristics of the Business Data Set and Forecasting Accuracy of Prediction models (시계열 데이터의 성격과 예측 모델의 예측력에 관한 연구)

  • 이원하;최종욱
    • Journal of Intelligence and Information Systems / v.4 no.1 / pp.133-147 / 1998
  • Recently, many researchers have sought deterministic equations, based on chaos theory or fractal theory, that can accurately predict future events. The theory holds that some events which seem very random but are internally deterministic can be accurately predicted by fractal equations. In contrast to conventional methods such as AR, MA, or ARIMA models, the fractal equation attempts to discover a deterministic order inherent in a time series data set. In discovering deterministic order, researchers have found that neural networks are much more effective than conventional statistical models. Although the prediction accuracy of a network can vary with its topological structure and with modifications of the algorithms, many researchers have asserted that neural network systems outperform other systems because of the non-linear behaviour of the network models, their mechanisms of massive parallel processing, and their generalization capability based on adaptive learning. However, a recent survey shows that the prediction accuracy of forecasting models is determined by both the model structure and the data structure. In experiments based on actual economic data sets, the prediction accuracy of the neural network model was found to be similar to that of the conventional forecasting model. In particular, for a data set that is deterministically chaotic, the AR model, a conventional statistical model, was not significantly different from the MLP model, a neural network model. This result shows that the forecasting model appropriate to a prediction task should be selected based on the characteristics of the time series data set. The characteristics of the data set were analyzed by fractal analysis, measurement of the Hurst index, and measurement of Lyapunov exponents. In conclusion, for time series data that is deterministically chaotic, no significant difference in forecasting future events was found between a conventional forecasting model and a typical neural network model.

An Adaptive Grid-based Clustering Algorithm over Multi-dimensional Data Streams (적응적 격자기반 다차원 데이터 스트림 클러스터링 방법)

  • Park, Nam-Hun;Lee, Won-Suk
    • The KIPS Transactions:PartD / v.14D no.7 / pp.733-742 / 2007
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. For this reason, the memory used for data stream analysis must be bounded even though new data elements are continuously generated. To satisfy this requirement, data stream processing sacrifices some correctness of its analysis result by allowing a degree of error. Old distribution statistics are diminished by a predefined decay rate as time goes by, so that the effect of obsolete information on the current clustering result can be eliminated without physically maintaining any data element. This paper proposes a grid-based clustering algorithm for a data stream. Given a set of initial grid cells, the dense range of a grid cell is recursively partitioned into smaller cells in a top-down manner, based on the distribution statistics of the data elements, until the smallest cell, called a unit cell, is identified. Since only the distribution statistics of data elements are maintained by dynamically partitioned grid cells, the clusters of a data stream can be found effectively without physically maintaining the data elements. Furthermore, the memory usage of the proposed algorithm adapts to the size of the confined memory space by flexibly resizing the unit cell. As a result, the confined memory space can be fully utilized to generate a clustering result that is as accurate as possible. The proposed algorithm is analyzed through a series of experiments to identify its various characteristics.
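
The decay idea described above (per-cell statistics that fade over time, so old elements need not be stored) can be sketched as below. This is a simplified illustration with a fixed cell width and an assumed exponential decay per time step; it omits the paper's recursive partitioning and adaptive unit-cell resizing. The class name `DecayedGrid` and all parameters are hypothetical.

```python
import math


class DecayedGrid:
    """Fixed-width grid that keeps only a decayed count per cell."""

    def __init__(self, cell_width, decay):
        self.cell_width = cell_width
        self.decay = decay  # assumed per-time-step factor in (0, 1]
        self.cells = {}     # cell index -> (decayed count, last update time)

    def _cell_of(self, point):
        return tuple(math.floor(v / self.cell_width) for v in point)

    def add(self, point, t):
        """Fade the cell's old count to time t, then count the new element."""
        key = self._cell_of(point)
        count, last = self.cells.get(key, (0.0, t))
        self.cells[key] = (count * self.decay ** (t - last) + 1.0, t)

    def count(self, key, t):
        """Decayed count of a cell as seen at time t."""
        count, last = self.cells.get(key, (0.0, t))
        return count * self.decay ** (t - last)

    def dense_cells(self, t, min_count):
        """Cells whose decayed count at time t reaches min_count."""
        return sorted(k for k in self.cells if self.count(k, t) >= min_count)


grid = DecayedGrid(cell_width=1.0, decay=0.5)
grid.add((0.2, 0.3), t=0)
grid.add((0.7, 0.1), t=1)
grid.add((5.5, 5.5), t=1)
grid.add((0.4, 0.9), t=2)
dense = grid.dense_cells(t=2, min_count=1.5)
```

Each cell stores two numbers regardless of how many elements fell into it, so memory depends on the number of occupied cells rather than on the length of the stream.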

Optimizing Multi-way Join Query Over Data Streams (데이타 스트림에서의 다중 조인 질의 최적화 방법)

  • Park, Hong-Kyu;Lee, Won-Suk
    • Journal of KIISE:Databases / v.35 no.6 / pp.459-468 / 2008
  • A data stream is a massive unbounded sequence of data elements continuously generated at a rapid rate. Many recent research activities for emerging applications need to deal with data streams. Such applications include web click monitoring, sensor data processing, network traffic analysis, telephone records, and multimedia data. In these applications, data processing over a data stream is performed not on stored data but on newly arriving data using pre-registered queries, and results are returned immediately or periodically. Recently, many studies have focused on data streams rather than stored data sets. In particular, much research aims to optimize continuous queries so that they can be executed efficiently. This paper proposes a query optimization algorithm for continuous queries that have multiple join operators (multi-way joins) over data streams. It is called Extended Greedy query optimization and is based on a greedy algorithm. It defines the join cost as the operations required to compute a join plus the operations required to process its result, and stores the join cost, together with all the information needed to compute it, in the statistics catalog. To overcome the weak point of the greedy algorithm, namely its poor performance, the algorithm selects a set of operators with small cost instead of the single operator with the smallest cost. The set influences the accuracy and execution time of the algorithm and can be controlled adaptively by two user-defined values. Experimental results illustrate the performance of the EGA algorithm in various stream environments.
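
For orientation, the plain greedy baseline that the abstract extends can be sketched as follows: at each step, join the input whose estimated intermediate result is smallest. This is only the baseline, not the paper's Extended Greedy algorithm (which keeps a set of low-cost candidates). The cost model (window size times pairwise selectivity), the function names, and all numbers are assumptions made for illustration.

```python
def estimated_size(card, joined, candidate, sizes, selectivity):
    """Estimated result size of joining `candidate` onto the current result."""
    sel = 1.0
    for name in joined:
        # Pairs without a registered selectivity are treated as cross products.
        sel *= selectivity.get(frozenset((name, candidate)), 1.0)
    return card * sizes[candidate] * sel


def greedy_join_order(sizes, selectivity):
    """Pick a join order greedily; return (order, total intermediate size)."""
    remaining = set(sizes)
    first = min(remaining, key=lambda s: (sizes[s], s))  # smallest input first
    remaining.remove(first)
    order, card, total_cost = [first], float(sizes[first]), 0.0
    while remaining:
        best = min(
            remaining,
            key=lambda s: (estimated_size(card, order, s, sizes, selectivity), s),
        )
        card = estimated_size(card, order, best, sizes, selectivity)
        total_cost += card
        order.append(best)
        remaining.remove(best)
    return order, total_cost


# Hypothetical window sizes and pairwise join selectivities for three streams.
sizes = {"A": 100, "B": 10, "C": 1000}
selectivity = {
    frozenset(("A", "B")): 0.1,
    frozenset(("B", "C")): 0.05,
    frozenset(("A", "C")): 0.5,
}
order, total_cost = greedy_join_order(sizes, selectivity)
```

Because each step commits to the single cheapest choice, a plain greedy search can miss cheaper overall plans, which is the weakness that considering a set of low-cost operators is meant to address.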

A Study on Optimum Coding Method for Correlation Processing of Radio Astronomy (전파천문 상관처리를 위한 최적 코딩 방법에 관한 연구)

  • Shin, Jae-Sik;Oh, Se-Jin;Yeom, Jae-Hwan;Roh, Duk-Gyoo;Chung, Dong-Kyu;Oh, Chung-Sik;Hwang, Ju-Yeon;So, Yo-Hwan
    • Journal of the Institute of Convergence Signal Processing / v.16 no.4 / pp.139-148 / 2015
  • This paper proposes an optimum coding method that uses an open library to improve the performance of a software correlator developed for the Korea-Japan Joint VLBI Correlator (KJJVC). The correlation system for a VLBI observing system is generally implemented in hardware using ASICs or FPGAs, because the amount of computation grows geometrically with the number of participating observatories. With advances in computing power, however, correlation systems have recently been built in software on massive servers such as clusters. Because a hardware VLBI correlator can process data in real time or near real time relative to the observation time, a software correlator must be coded for optimal data processing to achieve the same performance as the hardware. This paper therefore compares, experimentally, several coding methods for the FFT processing stage, the most important part of the correlator system, built on the open-source FFTW library: plain use of the FFTW library; methods that add SSE (Streaming SIMD Extensions), shared memory, or OpenMP; and a method that combines these techniques. The experimental results confirm that the proposed optimum coding method, which uses the FFTW library, shared memory, and OpenMP, reduces correlation time compared with the conventional method.