• Title/Summary/Keyword: Skewed Data

Hierarchical Organization of Embryo Data for Supporting Efficient Search (배아 데이터의 효율적 검색을 위한 계층적 구조화 방법)

  • Won, Jung-Im; Oh, Hyun-Kyo; Jang, Min-Hee; Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI, v.48 no.2, pp.16-27, 2011
  • An embryo is a very early stage in the development of a multicellular organism such as an animal or a plant. It is an important research target for studying ontogeny because the basic body structure of a multicellular organism is determined during the embryonic stage. Researchers in developmental biology maintain large embryo image databases and frequently need to search them efficiently for particular embryo images. It is therefore crucial to organize these databases for efficient search. Hierarchical clustering methods have been widely used for database organization. However, most previous algorithms tend to produce a highly skewed tree, because they do not simultaneously consider both the size of a cluster and the number of objects within it. A skewed tree takes a long time to traverse during a user's search. In this paper, we propose a method that organizes a large volume of embryo image data into a balanced tree structure. We first represent the embryo image data as a similarity-based graph. Next, we identify clusters by repeatedly applying a graph partitioning algorithm. Throughout, we check the size of each cluster and its number of objects, and we partition any cluster whose size is too large or whose number of objects is too high, which prevents clusters from growing too large or containing too many objects. Extensive experiments demonstrate the superiority of the proposed method. Moreover, we implement a visualization tool that helps users navigate the embryo image database quickly and easily.
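
The cluster-splitting rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: objects are plain feature vectors (tuples), a cluster's "size" is approximated by its largest pairwise distance, and the graph partitioning step is replaced by a simple median split along the axis joining the two farthest points. The thresholds `max_objects` and `max_diameter` and all function names are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    items: list
    children: list = field(default_factory=list)


def diameter(points):
    """Largest pairwise distance, used here as a stand-in for cluster size."""
    return max((math.dist(p, q) for p in points for q in points), default=0.0)


def balanced_split(points):
    """Median split along the farthest-pair axis (swapped in for graph partitioning)."""
    a, b = max(((p, q) for p in points for q in points),
               key=lambda pair: math.dist(pair[0], pair[1]))
    axis = [bi - ai for ai, bi in zip(a, b)]

    def projection(p):
        return sum((pi - ai) * d for pi, ai, d in zip(p, a, axis))

    ordered = sorted(points, key=projection)
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]


def build_tree(points, max_objects, max_diameter):
    """Recursively split any cluster that has too many objects or is too large."""
    node = Cluster(list(points))
    too_many = len(points) > max_objects
    too_large = diameter(points) > max_diameter
    if len(points) > 1 and (too_many or too_large):
        left, right = balanced_split(points)
        node.children = [build_tree(left, max_objects, max_diameter),
                         build_tree(right, max_objects, max_diameter)]
    return node


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def depth(node):
    return 0 if not node.children else 1 + max(depth(c) for c in node.children)


points = [(x, y) for x in range(5) for y in range(4)]  # hypothetical feature vectors
tree = build_tree(points, max_objects=4, max_diameter=100.0)
print(depth(tree), [len(leaf.items) for leaf in leaves(tree)])
```

Splitting at the median keeps sibling clusters within one object of each other in size, which is what keeps the tree shallow; a real graph partitioner would instead cut low-similarity edges of the similarity graph.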

Disproportional Insertion Policy for Improving Query Performance in RFID Tag Data Indices (RFID 태그 데이타 색인의 질의 성능 향상을 위한 불균형 삽입 정책)

  • Kim, Gi-Hong; Hong, Bong-Hee; Ahn, Sung-Woo
    • Journal of KIISE:Databases, v.35 no.5, pp.432-446, 2008
  • Queries that trace tag locations are among the most challenging requirements of RFID-based applications, including automated manufacturing, inventory tracking, and supply chain management. For efficient query processing, a previous study proposed an index scheme, based on moving-object indexes, that stores tag objects in a 3-dimensional domain whose axes are the tag identifier, the reader identifier, and time. Unlike in a moving-object index, the coordinate ranges of the domains differ greatly, so the distribution of query regions is skewed toward the reader-identifier domain. Previous tag indexes, however, do not consider this skewed distribution of query regions. As a result, index nodes overlap heavily with query regions, and queries must traverse many index nodes. To solve this problem, we propose a new disproportional insertion and split policy for an R*-tree-based RFID tag index. For efficient insertion of tag data, our method derives a weighted margin for each node from per-axis weights and the node's margin. Based on the weighted margin, we choose the subtree and the split method so as to insert tag data at minimum cost. The proposed insertion method also reduces the cost of region queries by reducing the overlap between query regions and MBRs. Our experiments show that the index built with the proposed insertion and split methods answers queries considerably faster than indexes built with the previous methods.
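
A minimal sketch of weight-aware subtree selection, assuming (the abstract gives no formula) that the weighted margin is a per-axis weighted sum of an MBR's extents along the (tag identifier, reader identifier, time) axes, and that insertion descends into the child whose weighted margin grows least. The `MBR` class, the weights, and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MBR:
    lo: tuple  # lower corner: (tag_id, reader_id, time)
    hi: tuple  # upper corner

    def union(self, other):
        """Smallest MBR enclosing both rectangles."""
        return MBR(tuple(map(min, self.lo, other.lo)),
                   tuple(map(max, self.hi, other.hi)))


def weighted_margin(mbr, weights):
    """Assumed form: per-axis weighted sum of the rectangle's extents."""
    return sum(w * (h - l) for w, l, h in zip(weights, mbr.lo, mbr.hi))


def choose_subtree(children, entry, weights):
    """Pick the child whose weighted margin grows least when it absorbs entry."""
    def growth(child):
        return weighted_margin(child.union(entry), weights) - weighted_margin(child, weights)
    return min(children, key=growth)


a = MBR((0, 0, 0), (10, 2, 10))
b = MBR((0, 4, 0), (2, 6, 10))
tag = MBR((5, 5, 5), (5, 5, 5))  # a single tag event as a degenerate MBR
print(choose_subtree([a, b], tag, weights=(1, 10, 1)) == b)
```

With a heavy weight on the reader axis the entry is routed to `b`, while with a heavy weight on the tag axis the same entry goes to `a`; per-axis weights change the insertion path even though the unweighted enlargements are equal.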

Power Investigation of the Entropy-Based Test of Fit for Inverse Gaussian Distribution by the Information Discrimination Index

  • Choi, Byungjin
    • Communications for Statistical Applications and Methods, v.19 no.6, pp.837-847, 2012
  • The inverse Gaussian distribution is widely used to analyze and model right-skewed data. To assess the appropriateness of this distribution prior to data analysis, Mudholkar and Tian (2002) proposed an entropy-based test of fit. The test is based on the entropy power fraction (EPF) index suggested by Gokhale (1983). Their simulation results report that the power of the entropy-based test is superior to that of other goodness-of-fit tests; however, this observation rests on small-scale simulations against the standard exponential, Weibull W(1, 2), and lognormal LN(0.5, 1) distributions. Evaluating the power of the entropy-based test would require a large-scale simulation against various alternative distributions; however, a theoretical method is a more effective way to investigate the powers. In this paper, using the information discrimination (ID) index defined by Ehsan et al. (1995) as a mathematical tool, we scrutinize the power of the entropy-based test. The selected alternatives are the gamma, Weibull, and lognormal distributions, which are widely used in data analysis as alternatives to the inverse Gaussian distribution. The results of the study are presented, and an illustrative example is analyzed.
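
Entropy-based tests of fit generally start from a nonparametric estimate of the sample entropy. A minimal sketch of Vasicek's (1976) spacing estimator, $H_{mn} = \frac{1}{n}\sum_{i=1}^{n}\log\left[\frac{n}{2m}\left(X_{(i+m)} - X_{(i-m)}\right)\right]$ with order statistics clamped to $X_{(1)}$ and $X_{(n)}$ outside the range, is shown below. The default window rule for $m$ is a hypothetical choice, and the sketch does not reproduce the Mudholkar and Tian statistic itself.

```python
import math


def vasicek_entropy(sample, m=None):
    """Vasicek (1976) spacing estimate of differential entropy.

    Order statistics outside 1..n are clamped to the sample min/max.
    Assumes distinct values (a zero spacing would make the log undefined).
    """
    x = sorted(sample)
    n = len(x)
    if m is None:
        m = max(1, round(math.sqrt(n) / 2))  # hypothetical window rule
    if not 1 <= m < n / 2:
        raise ValueError("window m must satisfy 1 <= m < n/2")
    total = 0.0
    for i in range(n):  # 0-based i corresponds to X_(i+1)
        upper = x[min(i + m, n - 1)]
        lower = x[max(i - m, 0)]
        total += math.log(n / (2 * m) * (upper - lower))
    return total / n


data = [i ** 1.5 for i in range(1, 51)]  # hypothetical right-skewed sample
h = vasicek_entropy(data, m=3)
print(h, vasicek_entropy([3 * v for v in data], m=3) - h)  # difference is log(3)
```

Because the estimate shifts by exactly $\log c$ when the data are multiplied by $c$, scale-free test statistics are typically formed by comparing $\exp(H_{mn})$ with a sample scale estimate.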

Bayesian quantile regression analysis of Korean Jeonse deposit

  • Nam, Eun Jung; Lee, Eun Kyung; Oh, Man-Suk
    • Communications for Statistical Applications and Methods, v.25 no.5, pp.489-499, 2018
  • Jeonse is a property rental system unique to Korea in which a tenant pays part of the price of a leased property as a fixed-amount security deposit and gets the entire deposit back on moving out at the end of the tenancy. The Jeonse deposit is very important in the Korean real estate market because it is directly related to residential property sales prices and is a key indicator for predicting future real estate market trends. Jeonse deposit data show a skewed and heteroscedastic distribution, so the commonly used mean regression model may be inappropriate for analyzing them. In this paper, we apply a Bayesian quantile regression model, which is non-parametric and does not require any distributional assumptions, to analyze Jeonse deposit data. The analysis shows that the quantile regression coefficients of most explanatory variables change dramatically across quantiles. Some variables' regression coefficients even change sign across quantiles, implying that the same variable may affect the Jeonse deposit in opposite directions depending on the size of the deposit.
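
Quantile regression replaces the squared-error loss of mean regression with the check (pinball) loss $\rho_\tau(u) = u\,(\tau - I(u<0))$. The sketch below is a minimal frequentist illustration, not the Bayesian model used in the paper: it shows that minimizing this loss for a constant model picks out a sample quantile, so an outlier that drags the mean barely moves the fit. Function names and data are hypothetical.

```python
def check_loss(u, tau):
    """Check (pinball) loss: rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def best_constant(y, tau):
    """Constant fit minimizing total check loss.

    The objective is convex and piecewise linear with kinks at the data,
    so searching over the observed values is enough to find a minimizer.
    """
    return min(y, key=lambda c: sum(check_loss(yi - c, tau) for yi in y))


deposits = [1, 2, 3, 4, 100]  # hypothetical right-skewed sample
print(sum(deposits) / len(deposits))   # mean, pulled up by the outlier
print(best_constant(deposits, 0.5))    # median-type fit
print(best_constant(deposits, 0.3))    # lower-quantile fit
```

Here the mean is 22 while the $\tau = 0.5$ fit is 3; fitting different $\tau$ values separately is what allows coefficients, and even their signs, to differ across quantiles.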

Performance Analysis of Multimedia File System

  • Park, Jinyoun; Youjip Won; Jaideep Srivastava
    • Proceedings of the Korean Information Science Society Conference, 2001.04a, pp.100-102, 2001
  • The intensive I/O bandwidth demand of multimedia streaming services places a significant burden on the file system. Unlike legacy text or image data, multimedia data can have its semantics significantly affected if a data block is not delivered by its predefined deadline. The legacy file system used in Unix and Unix-like environments is designed to handle efficiently files whose sizes range from a few hundred bytes to several tens of gigabytes. This fundamental design philosophy results in a file system based on a multi-level, skewed tree structure. The multi-level i-node structure has a significant drawback when an application performs sequential reads. In this article, we present the results of a performance study of a file system specifically designed to handle multimedia streams. We implemented the file system in a Linux operating system environment and examined its performance under a streaming I/O workload. The results show that the proposed file system performs much more efficiently than the Linux ext2 file system.
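
The skewed i-node tree can be made concrete with a small sketch of ext2-style block mapping: 12 direct pointers followed by single, double, and triple indirect pointers. Assuming 4 KiB blocks and 4-byte block numbers (so 1024 pointers per indirect block), the function below returns how many levels of indirect blocks must be followed to locate a given logical block; the function name is hypothetical.

```python
def indirection_level(block_index, direct=12, ptrs_per_block=1024):
    """Number of indirect blocks to follow to map a logical block in an
    ext2-style inode: 12 direct pointers, then single, double, triple indirect.
    ptrs_per_block=1024 assumes 4 KiB blocks and 4-byte block numbers."""
    if block_index < 0:
        raise ValueError("block_index must be non-negative")
    if block_index < direct:
        return 0
    remaining = block_index - direct
    reach = ptrs_per_block  # blocks addressable at the current level
    for level in (1, 2, 3):
        if remaining < reach:
            return level
        remaining -= reach
        reach *= ptrs_per_block
    raise ValueError("block_index beyond triple-indirect range")


for idx in (0, 11, 12, 1035, 1036, 300_000):
    print(idx, indirection_level(idx))
```

Under these assumptions only the first 12 blocks (48 KiB) are reachable directly, and from block 1036 onward every block of a large streaming file needs double or triple indirection, one concrete form of the sequential-read drawback mentioned above.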

Efficient striping policy of NOD data on clustered storage server (Clustered Storage Server 환경에서 뉴스 데이터에 적합한 분산 저장방법)

  • 정귀옥; 박성호; 김영주; 정기동
    • Proceedings of the Korean Information Science Society Conference, 1998.10a, pp.89-91, 1998
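  • The growing demand for information in modern society and the pursuit of convenience, together with advances in information and communication technology, have led to a rapid increase in multimedia data services. Because NOD data meets these demands, it is expected to attract many users, so scalability, availability, and reliability are important requirements for server implementation. Accordingly, many studies have tried to satisfy these requirements with storage methods that exploit the characteristics of multimedia data. However, research on NOD systems remains insufficient, and there is almost no research on news data in a clustered environment. General storage methods known to suit VOD data are not necessarily suitable for NOD data. In this paper, taking into account characteristics of NOD data such as small volume and a skewed popularity distribution, we identify, among previously studied data storage methods, a striping policy suited to a clustered storage server environment.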
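
A skewed popularity distribution matters for striping because it determines how evenly request load spreads over storage nodes. The sketch below assumes a Zipf-like popularity model (the abstract says only "skewed") and equal-sized files, and compares placing each whole file on one node round-robin with striping every file across all nodes; names and parameters are hypothetical.

```python
def zipf_popularity(n_files, s=1.0):
    """Request probability per file under an assumed Zipf law (rank 1 = most popular)."""
    weights = [1.0 / rank ** s for rank in range(1, n_files + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def node_loads(popularity, n_nodes, striped):
    """Expected fraction of bytes served by each node (equal-sized files assumed).

    striped=True: every file is split evenly across all nodes.
    striped=False: each whole file lives on one node, assigned round-robin by rank.
    """
    loads = [0.0] * n_nodes
    for i, p in enumerate(popularity):
        if striped:
            for node in range(n_nodes):
                loads[node] += p / n_nodes
        else:
            loads[i % n_nodes] += p
    return loads


pop = zipf_popularity(100)
whole = node_loads(pop, 4, striped=False)
striped = node_loads(pop, 4, striped=True)
print([round(x, 3) for x in whole])
print([round(x, 3) for x in striped])
```

Under this idealized model, striping equalizes the expected load exactly, while whole-file placement overloads the node holding the most popular items; the sketch ignores per-request overhead, which becomes significant when objects are small, as NOD data is described to be.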

CMOS Clockless Wave Pipelined Adder Using Edge-Sensing Completion Detection (에지완료 검출을 이용한 클럭이 없는 CMOS 웨이브파이프라인 덧셈기 설계)

  • Ahn, Yong-Sung; Kang, Jin-Ku
    • Journal of IKEEE, v.8 no.2 s.15, pp.161-165, 2004
  • In this paper, an 8-bit wave-pipelined adder using static CMOS plus Edge-Sensing Completion Detection logic is presented. The clockless wave-pipelining algorithm was implemented in the circuit design. The Edge-Sensing Completion Detection (ESCD) in the algorithm consists of edge-sensing circuits and latches. Using this algorithm, skewed data at the output of the 8-bit adder can be aligned. Simulation results show that the adder operates at 1 GHz in $0.35{\mu}m$ CMOS technology with a 3.3 V supply voltage.
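
A toy behavioural model of the output skew that completion detection removes: in a ripple-carry adder, higher sum bits settle later, so the outputs arrive skewed. The sketch assumes a fixed worst-case carry delay per stage (real settling is data-dependent) and models completion detection simply as releasing all bits once the latest one has settled. It is not the ESCD circuit from the paper, and all names are hypothetical.

```python
def ripple_add_with_arrivals(a, b, width=8, gate_delay=1):
    """Bit-level ripple-carry addition that also records when each sum bit
    settles, under a toy model where each stage adds gate_delay to the carry path."""
    carry, carry_time = 0, 0
    bits, arrivals = [], []
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        bits.append(x ^ y ^ carry)
        arrivals.append(carry_time + gate_delay)  # sum bit waits for its carry-in
        carry = (x & y) | (carry & (x ^ y))
        carry_time += gate_delay
    return bits, arrivals, carry


def latch_when_complete(bits, arrivals):
    """Release all bits together once the latest bit has settled."""
    release_time = max(arrivals)
    value = sum(bit << i for i, bit in enumerate(bits))
    return value, release_time


bits, arrivals, carry = ripple_add_with_arrivals(200, 100)
value, release_time = latch_when_complete(bits, arrivals)
print(arrivals, value, carry, release_time)
```

The individual bits arrive at different times, but the latched word is presented all at once, which is the alignment effect the completion detector provides in hardware.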

A digital frame phase aligner in SDH-based transmission system (SDH 동기식 전송시스템의 디지털 프레임 위상 정열기)

  • 이상훈; 성영권
    • Journal of the Korean Institute of Telematics and Electronics S, v.34S no.12, pp.10-18, 1997
  • The parallel tributary signals in an SDH-based transmission system have frame phase skew due to uneven transmission delays in the data and clock paths. This phase skew must be eliminated prior to synchronous multiplexing. A new twenty-four-channel, 51.84 Mb/s DFPA (Digital Frame Phase Aligner) has been designed and fabricated in a $0.8{\mu}m$ CMOS gate array. This device phase-aligns the skewed input signals with a reference frame synchronization signal and a reference clock for the subsequent synchronous multiplexing process. The performance of the fabricated device was evaluated with an STM-16 transmission system and a DS-3 measurement set. A frame phase margin of +2/-3 bit periods has been demonstrated.
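
Frame phase alignment can be sketched at the byte level: locate each tributary's framing pattern, then delay each stream so its frame start coincides with a reference frame position. The sketch uses the SDH A1/A2 framing bytes (0xF6, 0x28) as the pattern and models the delay as prepended idle bytes; the frame length, the reference position, and all names are hypothetical, and a hardware aligner would use elastic buffers rather than list operations.

```python
FAS = [0xF6, 0x28]  # SDH A1, A2 framing bytes


def frame_start(stream, fas=FAS):
    """Index of the first occurrence of the framing pattern in a byte stream."""
    for i in range(len(stream) - len(fas) + 1):
        if stream[i:i + len(fas)] == fas:
            return i
    raise ValueError("framing pattern not found")


def align_to_reference(channels, ref_start, frame_len, fas=FAS, idle=0x00):
    """Delay each channel so its frame start falls at ref_start modulo frame_len."""
    aligned = []
    for ch in channels:
        delay = (ref_start - frame_start(ch, fas)) % frame_len
        aligned.append([idle] * delay + ch)
    return aligned


frame = FAS + [0x00] * 8  # hypothetical 10-byte frame
channels = [[0x00] * skew + frame * 3 for skew in (1, 4, 7)]
aligned = align_to_reference(channels, ref_start=3, frame_len=len(frame))
print([frame_start(ch) for ch in aligned])
```

Channels that are already past the reference position are delayed into the next frame, so every output frame boundary lands on the same phase of the reference frame.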

Manager's Attitude about Health Management of Workers in Coal Mine Industry (석탄광업소장의 근로자 건강관리에 대한 태도)

  • Rhee, Kyung-Yong; Hong, Jeong-Pyo
    • Journal of Preventive Medicine and Public Health, v.22 no.2 s.26, pp.197-207, 1989
  • This study was designed to investigate employers' attitudes toward the health management of workers in the coal mining industry. The sample comprised 38.3% (178 coal mines) of all 463 coal mines. A mail survey was used to collect data on the coal mines and their managers. The distribution of attitudes toward the health management of coal mine workers, specifically regarding the necessity and availability of certain health management items and certain working-environment apparatus, was skewed toward positive attitudes. While recognition of susceptibility to coal workers' pneumoconiosis was low, recognition of the seriousness of its occurrence was high.

Comparison of several computational turbulence models with full-scale measurements of flow around a building

  • Wright, N.G.; Easom, G.J.
    • Wind and Structures, v.2 no.4, pp.305-323, 1999
  • Accurate turbulence modeling is an essential prerequisite for the use of Computational Fluid Dynamics (CFD) in Wind Engineering. At present, the most popular turbulence model for general engineering flow problems is the ${\kappa}-{\varepsilon}$ model. Models such as this are based on the isotropic eddy viscosity concept and have well-documented shortcomings (Murakami et al. 1993) for flows encountered in Wind Engineering. This paper presents an objective assessment of several available alternative models. CFD results for the flow around a full-scale (6 m) three-dimensional surface-mounted cube in an atmospheric boundary layer are compared with recently obtained full-scale measurements. Cube orientations normal to and skewed at $45^{\circ}$ to the incident wind have been analysed at Reynolds numbers greater than $10^6$. In addition to turbulence modeling, other aspects of the CFD procedure are analysed and their effects discussed.
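
The isotropic eddy-viscosity concept behind the ${\kappa}-{\varepsilon}$ model can be shown in a few lines: the eddy viscosity is $\nu_t = C_\mu \kappa^2/\varepsilon$ with the standard constant $C_\mu = 0.09$, and the Boussinesq hypothesis gives the Reynolds stresses as $\overline{u_i'u_j'} = \tfrac{2}{3}\kappa\,\delta_{ij} - 2\nu_t S_{ij}$, where $S_{ij}$ is the mean strain rate. The sketch below (function names hypothetical) makes one shortcoming visible: in simple shear all three normal stresses come out equal, whereas measured turbulent flows are anisotropic.

```python
C_MU = 0.09  # standard k-epsilon model constant


def eddy_viscosity(k, eps, c_mu=C_MU):
    """nu_t = C_mu * k^2 / eps."""
    return c_mu * k * k / eps


def reynolds_stress(grad_u, k, eps):
    """Boussinesq estimate of <u_i' u_j'> from the mean velocity gradient.

    grad_u[i][j] = dU_i/dx_j. A single scalar nu_t is applied to every
    component, which is the isotropic eddy-viscosity assumption.
    """
    nu_t = eddy_viscosity(k, eps)
    stress = [[0.0] * 3 for _ in range(3)]
    for i in range(3):
        for j in range(3):
            s_ij = 0.5 * (grad_u[i][j] + grad_u[j][i])
            kron = 1.0 if i == j else 0.0
            stress[i][j] = (2.0 / 3.0) * k * kron - 2.0 * nu_t * s_ij
    return stress


shear = [[0.0, 2.0, 0.0],  # simple shear: only dU/dy is non-zero
         [0.0, 0.0, 0.0],
         [0.0, 0.0, 0.0]]
tau = reynolds_stress(shear, k=1.0, eps=0.09)
print([tau[i][i] for i in range(3)], tau[0][1])
```

With $\kappa = 1$ and $\varepsilon = 0.09$ (so $\nu_t = 1$) and $dU/dy = 2$, the shear stress is $-2$ while every normal stress equals $2/3$.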