• Title/Summary/Keyword: data partition

Search results: 416 items (processing time: 0.027 seconds)

Spatial Partitioning using Hilbert Space Filling Curve for Spatial Query Optimization (공간 질의 최적화를 위한 힐버트 공간 순서화에 따른 공간 분할)

  • Whang, Whan-Kyu; Kim, Hyun-Guk
    • The KIPS Transactions: Part D / Vol. 11D, No. 1 / pp. 23-30 / 2004
  • In order to approximate the spatial query result size, we partition the input rectangles into subsets and estimate the query result size based on the partitioned spatial areas. In this paper we examine query result size estimation for skewed data. We review the existing spatial partitioning techniques, such as equi-area and equi-count partitioning, which are analogous to the equi-width and equi-height histograms used in relational databases, as well as other partitioning techniques based on spatial indexing. We then propose a new spatial partitioning technique based on the Hilbert space filling curve. We present a detailed experimental evaluation comparing the proposed technique with the existing techniques using synthetic as well as real-life datasets. The experiments show that the proposed Hilbert-curve-based partitioning achieves better query result size estimation than the existing techniques across query sizes, bucket numbers, degrees of data skew, and spatial data sizes.
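
One plausible reading of the technique — ordering the input rectangles along a Hilbert curve and cutting the ordered sequence into equi-count buckets whose MBRs drive the estimate — can be sketched as follows. The xy2d routine is the standard iterative Hilbert mapping; the equi-count split and the uniform-overlap estimate inside each bucket are assumptions for illustration, not the authors' exact construction.

```python
# Minimal sketch: Hilbert-ordered equi-count partitioning for query result size estimation.
# Assumes a 2^k x 2^k integer grid; rectangles and queries are (x1, y1, x2, y2) tuples.

def xy2d(n, x, y):
    """Map grid cell (x, y) to its index along a Hilbert curve over an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) > 0 else 0
        ry = 1 if (y & s) > 0 else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

def hilbert_partition(rects, n_grid, n_buckets):
    """Sort rectangles by the Hilbert index of their centers and split into equi-count buckets."""
    keyed = sorted(rects, key=lambda r: xy2d(n_grid, (r[0] + r[2]) // 2, (r[1] + r[3]) // 2))
    size = max(1, len(keyed) // n_buckets)
    buckets = []
    for i in range(0, len(keyed), size):
        chunk = keyed[i:i + size]
        mbr = (min(r[0] for r in chunk), min(r[1] for r in chunk),
               max(r[2] for r in chunk), max(r[3] for r in chunk))
        buckets.append((mbr, len(chunk)))        # (bucket MBR, rectangle count)
    return buckets

def estimate_result_size(buckets, q):
    """Estimate how many rectangles intersect query q, assuming uniformity inside each bucket."""
    est = 0.0
    for (x1, y1, x2, y2), count in buckets:
        ix = max(0, min(x2, q[2]) - max(x1, q[0]))
        iy = max(0, min(y2, q[3]) - max(y1, q[1]))
        area = max(1, (x2 - x1) * (y2 - y1))
        est += count * (ix * iy) / area
    return est
```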

Modeling the Fate of Priority Pharmaceuticals in Korea in a Conventional Sewage Treatment Plant

  • Kim, Hyo-Jung; Lee, Hyun-Jeoung; Lee, Dong-Soo; Kwon, Jung-Hwan
    • Environmental Engineering Research / Vol. 14, No. 3 / pp. 186-194 / 2009
  • Understanding the environmental fate of human and animal pharmaceuticals and their risk assessment are of great importance due to growing environmental concerns. Although there are many potential pathways for them to reach the environment, effluents from sewage treatment plants (STPs) are recognized as major point sources. In this study, the removal efficiencies of 43 selected priority pharmaceuticals in a conventional STP were evaluated using two simple models: an equilibrium partitioning model (EPM) and the STPWIN$^{TM}$ program developed by the US EPA. Many pharmaceuticals are not expected to be removed by conventional activated sludge processes because of their relatively low sorption potential to suspended sludge and low biodegradability. Only a few pharmaceuticals were predicted to be easily removed by sorption or biodegradation, and hence a conventional STP may not protect the environment from the release of unwanted pharmaceuticals. However, the predictions made in this study rely strongly on the sorption coefficient to suspended sludge and the biodegradation half-life, which may vary significantly between models. Removal efficiencies predicted using the EPM were typically higher than those predicted by STPWIN for many hydrophilic pharmaceuticals due to the difference in the prediction method for sorption coefficients. Comparison with experimental organic carbon-water partition coefficients ($K_{oc}$) revealed that the log $K_{ow}$-based estimation used in STPWIN is likely to underestimate sorption coefficients, resulting in low predicted removal efficiency by sorption. Values predicted by the EPM were consistent with the limited experimental data even though this model does not include biodegradation processes, implying that this simple model can be very useful given reliable $K_{oc}$ values. Because few experimental data are available for the priority pharmaceuticals to evaluate model performance, it is important to obtain reliable experimental data, including sorption coefficients and biodegradation rate constants, for predicting the fate of the selected pharmaceuticals.
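
The sorption-only removal estimate behind an equilibrium partitioning model reduces to a one-line mass balance, sketched below in a generic form: the removed fraction is the fraction of the chemical bound to sludge solids at equilibrium, Kd·SS / (1 + Kd·SS) with Kd = f_oc·Koc. The parameter values (f_oc, suspended solids concentration) are illustrative assumptions, not values taken from the paper.

```python
def epm_sorption_removal(log_koc, f_oc=0.3, suspended_solids_kg_per_L=2e-4):
    """
    Generic equilibrium-partitioning estimate of removal by sorption to sludge.

    log_koc : log10 of the organic carbon-water partition coefficient (L/kg-OC)
    f_oc    : organic carbon fraction of the suspended sludge (assumed value)
    suspended_solids_kg_per_L : sludge solids concentration (assumed value, ~200 mg/L)

    Returns the fraction of the influent load predicted to leave with the sludge,
    i.e. Kd*SS / (1 + Kd*SS) with Kd = f_oc * Koc.  Biodegradation is ignored,
    as in the EPM described in the abstract.
    """
    kd = f_oc * 10.0 ** log_koc          # solids-water distribution coefficient (L/kg)
    x = kd * suspended_solids_kg_per_L   # dimensionless sorbed-to-dissolved ratio
    return x / (1.0 + x)

# Example: a fairly hydrophobic compound (log Koc = 4) vs. a hydrophilic one (log Koc = 2)
print(epm_sorption_removal(4.0))   # ~0.38: appreciable removal by sorption
print(epm_sorption_removal(2.0))   # ~0.006: essentially no removal by sorption
```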

An Analysis for the Structural Variation in the Unemployment Rate and the Test for the Turning Point (실업률 변동구조의 분석과 전환점 진단)

  • Kim, Tae-Ho; Hwang, Sung-Hye; Lee, Young-Hoon
    • The Korean Journal of Applied Statistics / Vol. 18, No. 2 / pp. 253-269 / 2005
  • One of the basic assumptions of regression models is that the parameter vector does not vary across sample observations. If the parameter vector is not constant for all observations in the sample, the statistical model changes and the usual least squares estimators no longer yield unbiased, consistent and efficient estimates. This study investigates regression models in which some or all parameters vary across partitions of the whole sample, allowing different response coefficients during unusual time periods. Since the usual test for overall homogeneity of regressions across partitions of the sample does not explicitly identify the break points between the partitions, the test of equality between subsets of coefficients in two or more linear regressions is generalized and combined with a procedure to search for the break point. The method is applied to assess the possibility and the timing of a structural change in the long-run unemployment rate in the usual static framework using a regression model. The relationships between the variables included in the model are then reexamined in a dynamic framework using a vector autoregression (VAR).
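
A standard way to combine a test of coefficient equality across sample partitions with a search for an unknown break point is to scan candidate break dates and compute a Chow-type F statistic at each, as sketched below. This is a generic version of such a procedure, not the authors' exact specification; note that when the break date is chosen by maximizing F, the usual F critical values no longer apply and sup-F critical values are needed.

```python
import numpy as np

def chow_f(y, X, tau):
    """Chow F statistic for a single break after observation index tau (exclusive)."""
    def ssr(yy, XX):
        beta, *_ = np.linalg.lstsq(XX, yy, rcond=None)
        resid = yy - XX @ beta
        return resid @ resid
    n, k = X.shape
    ssr_pooled = ssr(y, X)                                   # single regression on the full sample
    ssr_split = ssr(y[:tau], X[:tau]) + ssr(y[tau:], X[tau:])  # separate regressions per partition
    return ((ssr_pooled - ssr_split) / k) / (ssr_split / (n - 2 * k))

def search_break(y, X, trim=0.15):
    """Scan candidate break points (trimming the sample ends) and return the largest-F candidate."""
    n, k = X.shape
    lo, hi = max(k + 1, int(trim * n)), min(n - k - 1, int((1 - trim) * n))
    candidates = {tau: chow_f(y, X, tau) for tau in range(lo, hi)}
    tau_star = max(candidates, key=candidates.get)
    return tau_star, candidates[tau_star]
```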

Reduction of Input Pins in VLSI Array for High Speed Fractal Image Compression (고속 프랙탈 영상압축을 위한 VLSI 어레이의 입력핀의 감소)

  • 성길영; 전상현; 이수진; 우종호
    • The Journal of Korean Institute of Communications and Information Sciences / Vol. 26, No. 12A / pp. 2059-2066 / 2001
  • In this paper, we propose a method to reduce the number of input pins in a one-dimensional VLSI array for fractal image compression. We use a quad-tree partition scheme and reduce the number of input pins by up to 50% by sharing the domain and range data input pins in the proposed VLSI array architecture. We can further reduce the input pins and simplify the internal operation circuits of the processing elements by eliminating a few of the least significant bits of the input data. We simulated the proposed method using the 256$\times$256 and 512$\times$512 Lena images to verify its performance. The simulation results show that the original image can be reconstructed at about 32 dB (PSNR) despite eliminating the two least significant bits of the input data, while the number of input pins is reduced by an additional 25% compared to a VLSI array that only shares the range and domain input pins.
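
Two software-visible pieces of the scheme — the quad-tree partition of the image and the truncation of low-order bits of the input samples — are easy to illustrate. The sketch below is a toy version: the variance threshold and the 2-bit truncation are illustrative choices, the random array stands in for a test image, and nothing here models the VLSI array or the fractal codec itself.

```python
import numpy as np

def quadtree_partition(img, y=0, x=0, size=None, var_thresh=100.0, min_size=4):
    """Recursively split a square block into 4 until its variance is small or min_size is reached.
    Returns a list of (y, x, size) range blocks."""
    if size is None:
        size = img.shape[0]
    block = img[y:y + size, x:x + size]
    if size <= min_size or block.var() <= var_thresh:
        return [(y, x, size)]
    h = size // 2
    blocks = []
    for dy in (0, h):
        for dx in (0, h):
            blocks += quadtree_partition(img, y + dy, x + dx, h, var_thresh, min_size)
    return blocks

def drop_lsbs(img, n_bits=2):
    """Zero the n least significant bits of 8-bit samples (narrower datapath, fewer input pins)."""
    return (img.astype(np.uint8) >> n_bits) << n_bits

def psnr(a, b):
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

img = (np.random.rand(256, 256) * 255).astype(np.uint8)   # stand-in for the Lena test image
print(len(quadtree_partition(img)), "range blocks")
print(round(psnr(img, drop_lsbs(img)), 1), "dB PSNR after dropping 2 LSBs")
```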

Development of a Screening Method for Deforestation Area Prediction using Probability Model (확률모델을 이용한 산림전용지역의 스크리닝방법 개발)

  • Lee, Jung-Soo
    • Journal of the Korean Association of Geographic Information Studies / Vol. 11, No. 2 / pp. 108-120 / 2008
  • This paper discusses the prediction of deforestation areas using probability models built from the forest census database, a geographic information system (GIS) database, and a land cover database. The land cover data were derived from remotely sensed (RS) Landsat TM imagery from 1989 to 2001. Over the 12-year analysis period, the deforested area was about 40 ha, most of it attributable to road construction and residential development. About 80% of the areas deforested for residential development were found within 100 m of the road network, and more than 20% of the areas deforested for forest road construction were within 100 m of the road network. Geographic factors and vegetation change detection (VCD) factors were used in the probability models to construct a deforestation occurrence map. We also examined how the size of the partition into training and validation areas affects the probability models. The Bayes model provided a better deforestation prediction rate than the regression model.
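
A Bayes probability model of the kind compared here can be illustrated as a naive Bayes posterior over categorical geographic factors. The factor names ("road", "slope") and the tiny training set below are hypothetical; the sketch only shows how class-conditional frequencies combine into a deforestation probability per cell.

```python
from collections import defaultdict
import math

def train_naive_bayes(samples, labels):
    """samples: list of dicts of categorical factors; labels: 1 = deforested, 0 = not."""
    prior = defaultdict(int)
    cond = defaultdict(lambda: defaultdict(int))   # cond[(factor, value)][label] = count
    for x, y in zip(samples, labels):
        prior[y] += 1
        for f, v in x.items():
            cond[(f, v)][y] += 1
    return prior, cond

def posterior_deforestation(x, prior, cond):
    """Posterior P(deforested | factors) with add-one smoothing (binary factors assumed)."""
    logp = {}
    total = sum(prior.values())
    for y in (0, 1):
        lp = math.log((prior[y] + 1) / (total + 2))
        for f, v in x.items():
            lp += math.log((cond[(f, v)][y] + 1) / (prior[y] + 2))
        logp[y] = lp
    m = max(logp.values())
    w = {y: math.exp(lp - m) for y, lp in logp.items()}
    return w[1] / (w[0] + w[1])

# Hypothetical training cells: distance-to-road class and slope class
cells = [{"road": "<100m", "slope": "gentle"}, {"road": ">100m", "slope": "steep"},
         {"road": "<100m", "slope": "steep"}, {"road": ">100m", "slope": "gentle"}]
labels = [1, 0, 1, 0]
prior, cond = train_naive_bayes(cells, labels)
print(posterior_deforestation({"road": "<100m", "slope": "gentle"}, prior, cond))
```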

A New Efficient Group-wise Spatial Multiplexing Design for Closed-Loop MIMO Systems (폐루프 다중입출력 시스템을 위한 효율적인 그룹별 공간 다중화 기법 설계)

  • Moon, Sung-Myun; Lee, Heun-Chul; Kim, Young-Tae; Lee, In-Kyu
    • The Journal of Korean Institute of Communications and Information Sciences / Vol. 35, No. 4A / pp. 322-331 / 2010
  • This paper introduces a new efficient design scheme for spatial multiplexing (SM) systems over closed-loop multiple-input multiple-output (MIMO) wireless channels. Extending the recently developed orthogonalized spatial multiplexing (OSM) scheme, which transmits two data streams, we propose a new SM scheme that supports a larger number of data streams. To achieve this, we partition the data streams into several subblocks and execute a block-diagonalization process at the receiver. The proposed scheme still guarantees single-symbol maximum likelihood (ML) detection with a small amount of feedback information. Simulation results verify that the proposed scheme achieves a large performance gain at a bit error rate (BER) of $10^{-4}$ over conventional closed-loop schemes based on minimum mean-square error (MMSE) or BER criteria. We also show that an additional 2.5 dB gain can be obtained by optimizing the group selection with extra feedback information.
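
The group-wise orthogonalization step rests on standard linear algebra: for each subblock of streams, find a receive filter lying in the left null space of the other subblocks' channel columns, so each group can be detected free of inter-group interference. The sketch below is a generic receiver-side block-diagonalization illustration under an assumed 4x4 channel with two groups of two streams; it does not reproduce the paper's OSM extension, feedback design, or single-symbol ML detection.

```python
import numpy as np

def groupwise_receive_filters(H, groups):
    """
    H      : (n_rx, n_tx) complex channel matrix, one column per transmitted stream
    groups : list of lists of stream (column) indices, e.g. [[0, 1], [2, 3]]
    Returns, for each group, (W, Heff) where W @ H[:, others] == 0, so the groups
    can be detected independently.  Assumes H[:, others] has full column rank and
    n_rx exceeds the number of interfering streams per group.
    """
    filters = []
    for g, idx in enumerate(groups):
        others = [j for gg, jj in enumerate(groups) if gg != g for j in jj]
        U, _, _ = np.linalg.svd(H[:, others], full_matrices=True)
        W = U[:, len(others):].conj().T          # rows span the left null space of H[:, others]
        filters.append((W, W @ H[:, idx]))       # interference-free effective channel per group
    return filters

# Toy check: random 4x4 channel, two groups of two streams
rng = np.random.default_rng(0)
H = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
for W, Heff in groupwise_receive_filters(H, [[0, 1], [2, 3]]):
    print(np.round(np.abs(W @ H), 6))            # near-zero entries in the other group's columns
```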

Extended Information Entropy via Correlation for Autonomous Attribute Reduction of BigData (빅 데이터의 자율 속성 감축을 위한 확장된 정보 엔트로피 기반 상관척도)

  • Park, In-Kyu
    • Journal of Korea Game Society / Vol. 18, No. 1 / pp. 105-114 / 2018
  • Various data analysis methods used for customer type analysis are very important for game companies to understand customer types and characteristics, in order to plan customized content and provide more convenient services. In this paper, we propose a k-modes cluster analysis algorithm that uses information uncertainty, extending information entropy to reduce information loss. The similarity of attributes is therefore measured in two respects: one is the uncertainty between each attribute and the center of each partition, and the other is the uncertainty about the probability distribution of each attribute's uncertainty. In particular, attribute uncertainty is considered on both non-probabilistic and probabilistic scales, because the entropy of an attribute is transformed into probabilistic information to measure its uncertainty. Extensive performance analysis with various indexes shows that the accuracy of the algorithm can be observed in the cluster analysis results obtained from optimal initial values.
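
A minimal entropy-weighted k-modes loop conveys the flavor of the approach: each categorical attribute contributes to the dissimilarity in proportion to an entropy-based weight, and cluster modes are updated attribute-wise. The sketch below uses plain Shannon entropy over the whole dataset as the weight, which is an assumption; the paper's extended-entropy correlation measure is not reproduced.

```python
import math
import random
from collections import Counter

def attribute_entropies(data):
    """Shannon entropy of each categorical attribute over the whole dataset."""
    n_attr = len(data[0])
    ents = []
    for j in range(n_attr):
        counts = Counter(row[j] for row in data)
        total = sum(counts.values())
        ents.append(-sum(c / total * math.log(c / total) for c in counts.values()))
    return ents

def k_modes(data, k, weights, n_iter=20, seed=0):
    """Entropy-weighted k-modes: weighted simple-matching dissimilarity, attribute-wise mode update."""
    random.seed(seed)
    modes = random.sample(data, k)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for row in data:
            dists = [sum(w for w, a, b in zip(weights, row, m) if a != b) for m in modes]
            clusters[dists.index(min(dists))].append(row)
        for i, members in enumerate(clusters):
            if members:                                    # keep old mode if the cluster is empty
                modes[i] = tuple(Counter(r[j] for r in members).most_common(1)[0][0]
                                 for j in range(len(data[0])))
    return modes, clusters

# Hypothetical categorical customer records: (platform, play frequency, payment type)
data = [("pc", "daily", "paid"), ("mobile", "weekly", "free"),
        ("pc", "daily", "free"), ("mobile", "daily", "paid")]
modes, clusters = k_modes(data, 2, attribute_entropies(data))
```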

Classification of Proximity Relations Using Multiple Fuzzy Alpha Cut (MFAC) (MFAC를 사용한 근접관계의 분류)

  • Ryu, Kyung-Hyun; Chung, Hwan-Mook
    • Journal of the Korean Institute of Intelligent Systems / Vol. 18, No. 1 / pp. 139-144 / 2008
  • Generally, the real systems that are the objects of decision-making are highly variable and sometimes involve uncertainty. To address this, statistical methods such as significance levels, certainty factors, and sensitivity analysis have been used. In this paper, we propose a fuzzy decision-making method based on MFAC (Multiple Fuzzy Alpha Cut) to improve the definiteness of classification results with similarity evaluation. In the proposed method, MFAC is used to extract multiple ${\alpha}$-levels with proximity degrees from the proximity relation obtained via the relative Hamming distance and the max-min method, and to minimize the number of data associated with the partition intervals extracted by MFAC. To determine the final decision alternative, we compute weighted values over the data extracted by MFAC. The experimental results show that the proposed method is simpler and yields more definite classifications than conventional methods, and that it efficiently determines an alternative for the decision-maker, as confirmed by statistical significance tests on the sample data.
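
Two ingredients named in the abstract can be sketched directly: a proximity relation built from the relative Hamming distance between membership vectors, and multiple α-cuts of that relation. The code below is an illustrative reconstruction under those standard definitions; the weighting scheme and the final selection of a decision alternative are not reproduced.

```python
import numpy as np

def proximity_matrix(F):
    """F: (n_items, n_criteria) matrix of membership degrees in [0, 1].
    Proximity of items i and j = 1 - relative Hamming distance between their membership vectors."""
    n = F.shape[0]
    P = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            P[i, j] = 1.0 - np.mean(np.abs(F[i] - F[j]))
    return P

def alpha_cuts(P, alphas):
    """Multiple alpha-cuts: for each alpha, the crisp relation of pairs with proximity >= alpha."""
    return {a: (P >= a) for a in alphas}

# Hypothetical membership degrees of three alternatives over three criteria
F = np.array([[0.90, 0.80, 0.70],
              [0.85, 0.75, 0.80],
              [0.20, 0.30, 0.10]])
P = proximity_matrix(F)
for a, R in alpha_cuts(P, [0.7, 0.8, 0.9]).items():
    print(a, "\n", R.astype(int))        # items grouped together at each alpha level
```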

Analysis of Saccharomyces Cell Cycle Expression Data using Bayesian Validation of Fuzzy Clustering (퍼지 클러스터링의 베이지안 검증 방법을 이용한 발아효모 세포주기 발현 데이타의 분석)

  • Yoo Si-Ho; Won Hong-Hee; Cho Sung-Bae
    • Journal of KIISE: Software and Applications / Vol. 31, No. 12 / pp. 1591-1601 / 2004
  • Clustering, a technique for the analysis of genes, organizes patterns into groups by the similarity of the dataset and has been used for identifying the functions of the genes in a cluster or analyzing the functions of unknown genes. Since genes usually belong to multiple functional families, fuzzy clustering methods are more appropriate than conventional hard clustering methods, which assign each sample to only one group. In this paper, a Bayesian validation method is proposed to evaluate fuzzy partitions effectively. The Bayesian validation method is a probability-based approach that selects the fuzzy partition with the largest posterior probability given the dataset. First, the proposed Bayesian validation method is compared to four representative conventional fuzzy cluster validity measures on four well-known datasets, with the fuzzy c-means algorithm used for clustering. We then analyze the results of the Saccharomyces cell cycle expression data evaluated by the proposed method.
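
The fuzzy partitions being validated come from the standard fuzzy c-means algorithm, sketched below with the usual alternating membership/center updates. The Bayesian validity measure itself (choosing the partition with the largest posterior probability given the data) is specific to the paper and is not reproduced; only the clustering step is shown.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means: returns (centers, membership matrix U of shape (c, n))."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0, keepdims=True)            # memberships of each sample sum to 1
    for _ in range(n_iter):
        Um = U ** m
        centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-12
        U_new = 1.0 / (d ** (2.0 / (m - 1.0)))   # u_ij proportional to d_ij^(-2/(m-1))
        U_new /= U_new.sum(axis=0, keepdims=True)
        if np.abs(U_new - U).max() < tol:
            U = U_new
            break
        U = U_new
    return centers, U

# Usage idea: X is an (n_genes, n_timepoints) expression matrix; run for several
# candidate c values and compare the resulting fuzzy partitions with a validity measure.
```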

Algorithm for Block Packing of Main Memory Allocation Problem (주기억장치 할당 문제의 블록 채우기 알고리즘)

  • Lee, Sang-Un
    • The Journal of the Institute of Internet, Broadcasting and Communication / Vol. 22, No. 6 / pp. 99-105 / 2022
  • This paper deals with the problem of appropriately allocating multiple processes arriving at the ready queue to blocks in the user space of main memory, which is divided into blocks of variable size at compilation time. The existing allocation methods, first fit (FF), best fit (BF), worst fit (WF), and next fit (NF), have the disadvantage that a specific process may be left waiting because they fail to allocate all processes arriving at the ready queue. The algorithm proposed in this paper is a simple block packing algorithm that allocates as many processes as possible to the largest block by sorting the sizes of the partitioned blocks (holes) and the sizes of the processes in the ready queue in descending order. Applying the proposed algorithm to nine benchmark datasets showed that it allocates all processes with minimal internal fragmentation (IF) for eight of the nine datasets; in the remaining one, a waiting process occurs due to partition errors.
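
The packing rule described in the abstract — sort the holes and the ready-queue processes in descending order of size and fill the largest block with as many processes as possible before moving on — can be sketched as follows. This is a reconstruction from the abstract and may differ in detail from the author's algorithm; the example sizes are hypothetical, not the paper's benchmark data.

```python
def block_packing(holes, procs):
    """
    holes : sizes of the variable-size free blocks in user memory
    procs : memory requirements of the processes in the ready queue
    Greedy rule: sort both lists in descending order and pack as many processes as
    possible into the largest block before moving to the next one.
    Returns (assignment, waiting, internal_fragmentation).
    """
    hole_order = sorted(range(len(holes)), key=lambda i: -holes[i])
    proc_order = sorted(range(len(procs)), key=lambda j: -procs[j])
    remaining = {i: holes[i] for i in hole_order}
    assignment, waiting = {}, []
    for j in proc_order:
        for i in hole_order:
            if procs[j] <= remaining[i]:
                remaining[i] -= procs[j]
                assignment[j] = i          # process j placed in hole i
                break
        else:
            waiting.append(j)              # no block large enough: process must wait
    used = set(assignment.values())
    internal_fragmentation = sum(remaining[i] for i in used)
    return assignment, waiting, internal_fragmentation

# Hypothetical example (not from the paper's benchmark set)
print(block_packing([600, 500, 300, 200, 100], [212, 417, 112, 426, 300, 150, 90, 50]))
```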