• Title/Summary/Keyword: large-scale systems

A Distributed High Dimensional Indexing Structure for Content-based Retrieval of Large Scale Data (대용량 데이터의 내용 기반 검색을 위한 분산 고차원 색인 구조)

  • Cho, Hyun-Hwa;Lee, Mi-Young;Kim, Young-Chang;Chang, Jae-Woo;Lee, Kyu-Chul
    • Journal of KIISE:Databases / v.37 no.5 / pp.228-237 / 2010
  • Although conventional index structures provide various nearest-neighbor search algorithms for high-dimensional data, large-scale data additionally requires higher search performance and index scalability. To meet these requirements, we propose a distributed high-dimensional indexing structure for cluster systems, called the Distributed Vector Approximation-tree (DVA-tree), a two-level structure consisting of a hybrid spill-tree and VA-files. We also describe the algorithms for constructing the DVA-tree over multiple machines and for performing distributed k-nearest neighbor (k-NN) searches. To evaluate the performance of the DVA-tree, we conduct an experimental study using both real and synthetic datasets. The results show that the proposed method offers significant performance advantages over existing index structures on different kinds of datasets.
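
The general idea of a distributed k-NN search, where each machine answers the query over its own partition and a coordinator merges the partial answers, can be sketched as follows. This is only an illustration under stated assumptions, not the DVA-tree itself: the per-node search is a plain linear scan standing in for a VA-file search, every partition is queried (the routing role of the hybrid spill-tree is omitted), and the function names `local_knn` and `distributed_knn` are hypothetical.

```python
import heapq
import math


def local_knn(points, query, k):
    """Return the k nearest (distance, point) pairs within one partition.

    A linear scan is used here as a stand-in for a per-node index search.
    """
    return heapq.nsmallest(k, ((math.dist(p, query), p) for p in points))


def distributed_knn(partitions, query, k):
    """Merge per-partition candidates into the global k nearest neighbors.

    Each partition contributes at most k candidates, so the coordinator
    only has to rank (number of partitions * k) items.
    """
    candidates = []
    for part in partitions:
        candidates.extend(local_knn(part, query, k))
    return heapq.nsmallest(k, candidates)


# Hypothetical data: three partitions of 2-D points.
partitions = [
    [(0.0, 0.0), (5.0, 5.0)],
    [(1.0, 1.0), (9.0, 9.0)],
    [(0.5, 0.5)],
]
nearest = [p for _, p in distributed_knn(partitions, (0.0, 0.0), k=2)]
```

Because each node returns its own k best candidates, the merged result is exact for this brute-force setting; an approximate per-node index would make the merged result approximate as well.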

An Energy Efficient RF Protocol Structure for a Large-Scale In-Home Display Deployment (대규모 In-Home Display 보급을 위한 에너지 효율적 RF 통신 프로토콜 체계)

  • Lee, Seung-Min;Son, Sung-Yong
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.4 no.1 / pp.53-60 / 2011
  • In-Home Display (IHD) is one of the most popular ways to induce voluntary customer participation in energy savings. Various communication technologies are used in recent IHD implementations, but most IHD systems are designed for individual houses because of limitations such as communication coverage area and operational complexity. In this study, 400 MHz RF communication is used for economical large-scale deployment of IHDs, especially in apartment complexes, which represent the typical residential environment in Korea. Since internal batteries are essential to the usability of an IHD, frequent battery replacement should be avoided. By dividing communication data into three types (common data, long-term data, and short-term data) according to their update periods, an energy-efficient communication protocol is designed and proposed. As a result, the quantity of data and the battery consumption of the IHD are reduced to 23.4% and 31.5%, respectively, without harming service quality.
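
The effect of grouping data by update period can be illustrated with a small sketch. It assumes that each group is transmitted only when its own period elapses, instead of sending everything at every update. The periods, payload sizes, and function names below are hypothetical values chosen for illustration; they are not taken from the paper and do not reproduce its reported figures.

```python
def due_groups(tick, periods):
    """Return the data groups whose update period divides this tick."""
    return [name for name, period in periods.items() if tick % period == 0]


def bytes_sent(ticks, periods, sizes):
    """Total payload sent over `ticks` time steps with period-based grouping."""
    return sum(sizes[g] for t in range(ticks) for g in due_groups(t, periods))


# Hypothetical update periods (in ticks) and payload sizes (in bytes).
periods = {"common": 60, "long_term": 10, "short_term": 1}
sizes = {"common": 40, "long_term": 20, "short_term": 4}

grouped = bytes_sent(60, periods, sizes)   # send each group only when due
naive = 60 * sum(sizes.values())           # send everything every tick
```

With these made-up numbers the grouped schedule sends far less data than the naive one, which is the mechanism by which such a protocol can reduce radio activity and battery drain.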

MarSel : LD based tagSNP Selection System for Large-scale SNP Haplotype Dataset (MarSel : 대용량 SNP 일배체형 데이터에 대한 연관불균형기반의 tagSNP 선택 시스템)

  • Kim Sang-Jun;Yeo Sang-Soo;Kim Sung-Kwon
    • The KIPS Transactions:PartA / v.13A no.1 s.98 / pp.79-86 / 2006
  • Recently, the tagSNP selection problem has been studied as a way to reduce the cost of association studies between human diversity and SNPs. The general approach to this problem is to separate all SNPs into appropriate blocks and then choose tagSNPs in each block. MarSel, presented in this paper, is a system that incorporates the concept of linkage disequilibrium (LD) to overcome the lack of biological meaning in existing block partitioning approaches. In most approaches, the contiguous regions, which recombinations have LD coefficient |D'| and then tagSNP selection step is performed. MarSel guarantees minimum tagSNP selection by using an entropy-based optimal selection algorithm when tagSNPs are chosen in each block, and it enables chromosome-level association studies by using an efficient memory management technique when the input is a very large-scale dataset that cannot be processed by existing systems.

SVM-Based Incremental Learning Algorithm for Large-Scale Data Stream in Cloud Computing

  • Wang, Ning;Yang, Yang;Feng, Liyuan;Mi, Zhenqiang;Meng, Kun;Ji, Qing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.10 / pp.3378-3393 / 2014
  • We have witnessed the rapid development of information technology in recent years. One of the key phenomena is the fast, near-exponential increase of data. Consequently, most traditional data classification methods fail to meet the dynamic and real-time demands of today's data processing and analysis, especially for continuous data streams. This paper proposes an improved incremental learning algorithm for large-scale data streams, which is based on SVM (Support Vector Machine) and is named DS-IILS. DS-IILS takes the load condition of the entire system and the node performance into consideration to improve efficiency. The algorithm defines a threshold on the distance to the optimal separating hyperplane. The samples of the history sample set and the incremental sample set that lie within this threshold are all retained, and these retained samples form the training sample set. To design a more accurate classifier, the effects of the data volumes of the history sample set and the incremental sample set are handled by weighted processing. Finally, the algorithm is implemented in a cloud computing system and applied to the study of user behaviors. The experimental results are compared with those of other incremental learning algorithms. They show that DS-IILS improves training efficiency while maintaining relatively high classification accuracy, which is consistent with the theoretical analysis.
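
The sample-retention step described in the abstract, keeping only samples close to the separating hyperplane, can be sketched as below. This is a minimal illustration, not DS-IILS itself: it assumes a linear hyperplane given by a weight vector `w` and bias `b`, and it omits the weighting of the two sample sets and the load-aware scheduling. The function names and the example data are hypothetical.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    dot = sum(wi * xi for wi, xi in zip(w, x))
    return abs(dot + b) / math.sqrt(sum(wi * wi for wi in w))


def retain_near_margin(history, incremental, w, b, threshold):
    """Keep samples from both sets that lie within `threshold` of the hyperplane.

    Samples are (features, label) pairs; the retained list would serve as
    the training set for the next incremental update.
    """
    return [
        sample
        for sample in history + incremental
        if distance_to_hyperplane(sample[0], w, b) <= threshold
    ]


# Hypothetical linear classifier and data.
w, b = (1.0, 0.0), 0.0
history = [((0.5, 3.0), 1), ((4.0, 0.0), 1)]
incremental = [((-0.2, 1.0), -1), ((-3.0, 2.0), -1)]
training_set = retain_near_margin(history, incremental, w, b, threshold=1.0)
```

Samples far from the hyperplane are unlikely to become support vectors, so dropping them shrinks the training set while keeping the points most likely to shape the decision boundary.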

An Analysis of the Overhead of Multiple Buffer Pool Scheme on InnoDB-based Database Management Systems (InnoDB 기반 DBMS에서 다중 버퍼 풀 오버헤드 분석)

  • Song, Yongju;Lee, Minho;Eom, Young Ik
    • Journal of KIISE / v.43 no.11 / pp.1216-1222 / 2016
  • The advent of large-scale web services has resulted in a gradual increase in the amount of data used in those services. These big data are managed efficiently by DBMSs such as MySQL and MariaDB, which use the InnoDB engine as their storage engine, since InnoDB guarantees ACID and is suitable for handling large-scale data. To improve I/O performance, InnoDB caches the data and indexes of its database in a buffer pool. It also supports multiple buffer pools to mitigate lock contention. However, the multiple buffer pool scheme introduces additional data consistency overhead. In this paper, we analyze the overhead of the multiple buffer pool scheme. In our experimental results, although the multiple buffer pool scheme mitigates lock contention by up to 46.3%, the throughput of the DBMS is significantly degraded, by up to 50.6%, due to increased disk I/O and fsync calls.

Two-phase Multicast in Wormhole-switched Bidirectional Banyan Networks (웜홀 스위칭하는 양방향 베니언 망에서의 두 단계 멀티캐스트)

  • Kwon, Wi-Nam;Kwon, Bo-Seob;Park, Jae-Hyung;Yun, Hyeon-Su
    • Journal of KIISE:Computer Systems and Theory / v.27 no.3 / pp.255-263 / 2000
  • A multistage interconnection network is a suitable class of interconnection architecture for constructing large-scale multicomputers. Broadcast and multicast communication are fundamental in supporting collective communication operations such as reduction and barrier synchronization. In this paper, we propose a new multicast technique for wormhole-switched bidirectional multistage banyan networks. To support broadcast and multicast efficiently and without deadlock using simple additional hardware, we propose a two-phase multicast algorithm that takes only two transmissions to perform a broadcast or a multicast to an arbitrary number of desired destinations. We encode a header as a cube and adopt the uppermost-input-link-first scheme with periodic priority rotation as the arbitration mechanism on contended output links. We coalesce the desired destination addresses into multiple cubes. We then evaluate the performance of the proposed algorithm by simulation. The proposed two-phase multicast algorithm significantly improves latency. Notably, the two-phase algorithm keeps broadcast latency as low as the multicast latency of fanout $2^m$, where $m$ is the minimum integer satisfying $2^m \geq \sqrt{N}$ ($N$ is the network size).
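
The fanout bound at the end of the abstract is easy to evaluate for a concrete network size. The sketch below computes the minimum integer $m$ with $2^m \geq \sqrt{N}$, using the equivalent integer condition $4^m \geq N$ to avoid floating-point square roots. The function name `min_fanout_exponent` is hypothetical.

```python
def min_fanout_exponent(n):
    """Smallest non-negative integer m such that 2**m >= sqrt(n).

    For positive n, 2**m >= sqrt(n) is equivalent to 4**m >= n, which
    can be checked exactly with integer arithmetic.
    """
    if n < 1:
        raise ValueError("network size must be a positive integer")
    m = 0
    while 4 ** m < n:
        m += 1
    return m


m_64 = min_fanout_exponent(64)     # sqrt(64) = 8 = 2**3
m_128 = min_fanout_exponent(128)   # sqrt(128) is about 11.3, so 2**4 = 16 is needed
```

For a 64-node network this gives $m = 3$, so the broadcast latency is compared against a multicast of fanout $2^3 = 8$.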

A Large-scale Test Set for Author Disambiguation (저자 식별을 위한 대용량 평가셋 구축)

  • Kang, In-Su;Kim, Pyung;Lee, Seung-Woo;Jung, Han-Min;You, Beom-Jong
    • The Journal of the Korea Contents Association / v.9 no.11 / pp.455-464 / 2009
  • To move beyond article-oriented search functions and provide author-oriented ones, the namesake problem for author names must be solved. Author disambiguation, proposed as its solution, assigns identifiers of real individuals to author name entities. Although recent state-of-the-art approaches to author disambiguation have reported performance above 90%, few academic information services have adopted author-resolving functions. This paper describes a large-scale test set for author disambiguation, created by KISTI to foster author resolution research. The results of such research can be applied to academic information systems to improve their services. The test set was constructed from DBLP data through web searches and manual inspection. Currently it consists of 881 author names, 41,673 author name entities, and 6,921 person identifiers.

Large-scale Ambient Display Environment for providing Multi Spatial Interaction Interface (멀티 공간 인터랙션 인터페이스 제공을 위한 대규모 앰비언트 디스플레이 환경)

  • Yun, Chang Ok;Park, Jung Pil;Yun, Tae Soo;Lee, Dong Hoon
    • Proceedings of the Korea Contents Association Conference / 2009.05a / pp.30-34 / 2009
  • Recently, systems that provide different interactions according to the distance between a user and the display have been developed in order to construct ambient or ubiquitous computing environments. We propose a new type of spatial interaction system whose main goal is to provide an interactive domain in a large-scale ambient display environment. We divide the space into two interaction zones according to the distance from the interaction surface: an interactive zone and an ambient zone. In the interactive zone, users can approach the interaction surface and interact with it through natural hand touch. When users are outside the range of the interactive zone, the display shows only general information. This system thus offers various interactions and information to users in the ubiquitous ambient environment.

A Study on the PEM Electrolysis Characteristics Using Ti Mesh Coated with Electrocatalysts (Ti Mesh 처리 촉매전극을 이용한 고체고분자 전해질 전기분해 특성연구)

  • Sim, Kyu-Sung;Kim, Youn-Soon;Kim, Jong-Won;Han, Sang-Do
    • Journal of Hydrogen and New Energy / v.7 no.1 / pp.29-37 / 1996
  • Alkaline water electrolysis has long been commercialized as the only large-scale method for producing hydrogen, and the technology is superior to other methods such as photochemical and thermochemical water splitting and thermal decomposition in terms of efficiency and related technical problems. However, such conventional electrolyzers do not have electric efficiency and productivity high enough for large-scale hydrogen production for energy or chemical feedstocks. Solid polymer electrolyte water electrolysis using a perfluorocation exchange membrane as an $H^+$ ion conductor is considered a promising method because it can operate at high current densities and low cell voltages. It is therefore a good technology for storing electricity generated by photovoltaic power plants, wind generators, and other energy conversion systems. One of the most important R&D topics for electrolyzers is how to minimize cell voltage and maximize current density in order to increase productivity. A commercialized technology is the hot press method, in which a film-type electrocatalyst is hot-pressed onto the solid polymer membrane in order to eliminate contact resistance. Various technologies have been studied for preparing membrane-electrocatalyst composites, including electrocatalysts formed on the Nafion membrane surface by a nonelectrolytic plating process and porous sintered metal (titanium powder) or titanium mesh coated with electrocatalyst. In this study, experiments were conducted with a solid polymer electrolyte water electrolyzer consisting of a single-cell stack with an electrode area of $25\,\mathrm{cm}^2$ in a unipolar arrangement, using titanium mesh coated with electrocatalyst.

Pilot Hopping Scheme for Massive Antenna Systems in Cellular Networks (극다중 안테나 셀룰러 시스템을 위한 파일럿 도약 기법)

  • Kim, Seong Hwan;Ban, Tae-Won;Lee, Wongsup;Ryu, Jong Yeol
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.4 / pp.718-723 / 2017
  • We propose a pilot hopping scheme that improves the system capacity limited by pilot contamination in a multi-cell environment with large-scale antenna arrays at the base station, assuming an infinite number of antennas. In the conventional fixed pilot scheme, each user obtains the same signal-to-interference ratio (SIR) over a long period of time. Therefore, a user with strong interference has a continuously low SIR, which degrades its service quality. In the proposed pilot hopping scheme, a different pilot signal is used in each time slot, so a different amount of interference is received each time and the SIR fluctuates from slot to slot. When the Hybrid Automatic Repeat reQuest (HARQ) technique is applied in such a channel, the outage probability and transmission rate are improved. Through computer simulations, we show that the proposed scheme achieves a performance gain over the conventional scheme.