• Title/Summary/Keyword: Distribution Information

Mining Quantitative Association Rules using Commercial Data Mining Tools (상용 데이타 마이닝 도구를 사용한 정량적 연관규칙 마이닝)

  • Kang, Gong-Mi;Moon, Yang-Sae;Choi, Hun-Young;Kim, Jin-Ho
    • Journal of KIISE:Databases
    • /
    • v.35 no.2
    • /
    • pp.97-111
    • /
    • 2008
  • Commercial data mining tools basically support only binary attributes in mining association rules; that is, they can mine only binary association rules. In general, however, transaction databases contain not only binary attributes but also quantitative attributes. Thus, in this paper we propose a systematic approach to mining quantitative association rules---association rules that contain quantitative attributes---using commercial mining tools. To achieve this goal, we first propose an overall working framework that mines quantitative association rules on top of commercial mining tools. The proposed framework consists of two steps: 1) a pre-processing step, which converts quantitative attributes into binary attributes, and 2) a post-processing step, which reconverts binary association rules into quantitative association rules. For the pre-processing step, we present the concept of domain partition and, based on it, formally redefine the previous bipartition and multi-partition techniques: mean-based and median-based techniques for bipartition, and equi-width and equi-depth techniques for multi-partition. These previous partition techniques, however, do not consider the distribution characteristics of attribute values. To solve this problem, we propose an intuitive partition technique named standard deviation minimization. In standard deviation minimization, adjacent attribute values are placed in the same partition if the change in their standard deviation is small, but in different partitions if the change is large. We also propose a post-processing step that integrates binary association rules and reconverts them into the corresponding quantitative rules. Through extensive experiments, we show that our framework works correctly and that standard deviation minimization is superior to the other partition techniques. Based on these results, we believe our framework enables even novice users to mine quantitative association rules in practice using commercial data mining tools.
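
As an illustration of the pre-processing step described above, the equi-width multi-partition technique can be sketched as follows (a minimal sketch; the function names and the one-hot binary encoding are illustrative, not the paper's implementation):

```python
def equi_width_partition(values, k):
    """Split the attribute domain [min, max] into k equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # Each (low, high) pair is one interval of the domain partition.
    return [(lo + i * width, lo + (i + 1) * width) for i in range(k)]

def to_binary_attributes(values, partitions):
    """Map each quantitative value to one-hot binary attributes, one per interval."""
    rows = []
    for v in values:
        # The last interval is closed on both ends so the maximum is covered.
        row = [int(lo <= v <= hi if i == len(partitions) - 1 else lo <= v < hi)
               for i, (lo, hi) in enumerate(partitions)]
        rows.append(row)
    return rows
```

After binary rules are mined on the one-hot attributes, the post-processing step would map each binary attribute back to its interval and merge adjacent intervals into a quantitative rule.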

A Comparative Study of Parametric Methods for Significant Gene Set Identification Depending on Various Expression Metrics (유전자 발현 메트릭에 기반한 모수적 방식의 유의 유전자 집합 검출 비교 연구)

  • Kim, Jae-Young;Shin, Mi-Young
    • Journal of KIISE:Software and Applications
    • /
    • v.37 no.1
    • /
    • pp.1-8
    • /
    • 2010
  • Recently, much attention has been paid to gene set analysis for identifying differentially expressed gene sets between two sample groups. Unlike earlier approaches, gene set analysis enables us to find significant gene sets along with their functional characteristics, and for this reason various novel approaches to it have been suggested lately. One such approach, PAGE, is a parametric method that employs the average difference (AD) as an expression metric to quantify expression differences between two sample groups, assuming that the distribution of gene scores is normal. This approach is preferred to non-parametric approaches because of its more effective performance. However, the AD metric reflects neither gene expression intensities nor their variances over samples when calculating gene scores. Thus, in this paper we investigate the usefulness of several other expression metrics for parametric gene set analysis that do consider the actual expression intensities of genes or their expression variances over samples. For this purpose, we examined three expression metrics, WAD (weighted average difference), FC (Fisher's criterion), and Abs_SNR (absolute value of the signal-to-noise ratio), for parametric gene set analysis and evaluated their experimental results.
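
The variance-aware metrics contrasted with AD above can be computed per gene from the group means and standard deviations. The sketch below uses the standard textbook definitions of FC and Abs_SNR (the WAD intensity weight needs gene-wise averages over the whole data set and follows the cited metric papers, so it is omitted here):

```python
import statistics as st

def gene_scores(expr1, expr2):
    """expr1/expr2: expression values of one gene in the two sample groups.
    Returns the AD, FC, and Abs_SNR scores for that gene."""
    m1, m2 = st.mean(expr1), st.mean(expr2)
    s1, s2 = st.stdev(expr1), st.stdev(expr2)
    ad = m1 - m2                                 # AD: ignores intensity and variance
    fc = (m1 - m2) ** 2 / (s1 ** 2 + s2 ** 2)    # Fisher's criterion: variance-normalized
    abs_snr = abs(m1 - m2) / (s1 + s2)           # Abs_SNR: noise-normalized
    return {"AD": ad, "FC": fc, "Abs_SNR": abs_snr}
```

A parametric gene set analysis would then average these per-gene scores over each gene set and compare the average against the normal null distribution, as PAGE does with AD.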

An On-chip Cache and Main Memory Compression System Optimized by Considering the Compression rate Distribution of Compressed Blocks (압축블록의 압축률 분포를 고려해 설계한 내장캐시 및 주 메모리 압축시스템)

  • Yim, Keun-Soo;Lee, Jang-Soo;Hong, In-Pyo;Kim, Ji-Hong;Kim, Shin-Dug;Lee, Yong-Surk;Koh, Kern
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.31 no.1_2
    • /
    • pp.125-134
    • /
    • 2004
  • Recently, an on-chip compressed cache system was presented to alleviate the processor-memory performance gap by reducing the on-chip cache miss rate and expanding memory bandwidth. This research presents an extended on-chip compressed cache system that also significantly expands main memory capacity. Several techniques are employed to expand main memory capacity, on-chip cache capacity, and memory bandwidth, as well as to reduce decompression time and metadata size. To evaluate the performance of the proposed system against existing systems, we use an execution-driven simulation method built by modifying a superscalar microprocessor simulator. This experimental methodology is more accurate than the previous trace-driven simulation method. The simulation results show that the proposed system reduces execution time by 4-23% compared with a conventional memory system, without counting the benefits obtained from main memory expansion. The expansion rates of the data and code areas of main memory are 57-120% and 27-36%, respectively.

Study on the Combustion Characteristics of a Small-Scale Orimulsion Boiler (소형 오리멀젼 보일러의 연소특성 연구)

  • Kim, Hey-Suk;Shin, Mi-Soo;Jang, Dong-Soon;Choi, Young-Chan;Lee, Jae-Gu
    • Journal of Korean Society of Environmental Engineers
    • /
    • v.27 no.10
    • /
    • pp.1081-1089
    • /
    • 2005
  • To examine the feasibility of applying Orimulsion fuel in a commercial boiler that uses heavy fuel oil, numerical and experimental research efforts were made to determine the fundamental combustion characteristics of this fuel in a small-scale boiler. One notable combustion feature of Orimulsion fuel is the delayed appearance of the flame location, with a rather broadly distributed flame shape; this was found experimentally and confirmed by numerical calculation. This flame characteristic is attributed to the high moisture content inherent in the Orimulsion manufacturing process, together with micro-explosion caused by fine water droplets. To investigate the effects on the combustion characteristics of Orimulsion, a series of parametric investigations was performed over important design and operational variables such as the injected amount of fuel, the type of atomization fluid, and the phenomenological radiation model employed in the calculation. The delayed appearance of the flame peak can be alleviated by adjusting the flow rate of the injected fuel, and the generation of CO, $SO_2$, and NO gases in the boiler is also evaluated. When steam injection is used as the atomizing fluid, the combustion process is stabilized and the region of high flame temperature is reduced. In general, the calculation results are physically acceptable and consistent, but some refinement of the phenomenological models is necessary for better resolution of pollutant formation. From the results for this small-scale Orimulsion boiler, we believe that much useful information, together with a working computer program, has been obtained for the near-future application of Orimulsion fuel to a conventional boiler.

Research Trends for Nanotoxicity Using Soil Nematode Caenorhabditis elegans (토양선충 Caenorhabditis elegans를 이용한 나노독성 연구동향)

  • Kim, Shin Woong;Lee, Woo-Mi;An, Youn-Joo
    • Journal of Korean Society of Environmental Engineers
    • /
    • v.34 no.12
    • /
    • pp.855-862
    • /
    • 2012
  • Caenorhabditis elegans, a free-living nematode mainly found in soil pore water, plays a critical role in trophic levels, energy flow, and decomposition in the soil ecosystem. C. elegans is a species commonly used to test soil toxicity, and recently it has been employed broadly as a test organism in nanotoxicology. In this study, a review of the toxicity of nanomaterials to C. elegans is presented based on SCI(E) papers. Nanotoxicity studies using C. elegans have been reported in 20 instances, including studies of toxicity mechanisms. Most studies used K-medium, S-medium, or NGM (Nematode Growth Medium) plates as the exposure medium for testing the toxicity of nanoparticles. The observed effects on C. elegans exposed to nanoparticles include anti-aging effects, phototoxicity, genotoxicity, and dermal effects. We found that the toxic mechanisms were related to various aspects such as lifespan abnormality, oxidative stress, the distribution of particles within organisms, and stress-related gene analysis. C. elegans is advantageous for testing the toxicity of nanoparticles because of its varied cellular activities, fully sequenced genome, and transparent body, which is easy to observe. C. elegans is therefore considered a good test species for evaluating nanotoxicity.

An Efficient Subsequence Matching Method Based on Index Interpolation (인덱스 보간법에 기반한 효율적인 서브시퀀스 매칭 기법)

  • Loh Woong-Kee;Kim Sang-Wook
    • The KIPS Transactions:PartD
    • /
    • v.12D no.3 s.99
    • /
    • pp.345-354
    • /
    • 2005
  • Subsequence matching is one of the most important operations in the field of data mining. The existing subsequence matching algorithms use only one index, and their performance degrades as the difference increases between the length of a query sequence and the size of the windows, which are subsequences of a fixed length extracted from data sequences to construct the index. In this paper, we propose a new subsequence matching method based on index interpolation to overcome this problem. An index interpolation method constructs two or more indexes and performs searching by selecting the most appropriate index among them according to the given query sequence length. We first examine, through preliminary experiments, how performance varies with the difference between the query sequence length and the window size, and we formulate a search cost model that reflects the distribution of query sequence lengths from the viewpoint of physical database design. Next, we propose a new subsequence matching method based on index interpolation to improve search performance. We also present an algorithm, based on the search cost formula above, for constructing optimal indexes to obtain better search performance. Finally, we verify the superiority of the proposed method through a series of experiments using real and synthetic data sets.
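
The core selection rule of index interpolation can be sketched as follows. This is an assumption-laden sketch: in window-based subsequence matching the window size cannot exceed the query length, so among several prebuilt indexes the one with the largest usable window is chosen; the exact cost model in the paper may select differently.

```python
def select_index(window_sizes, query_len):
    """Among indexes built with several window sizes, pick the largest
    window size that does not exceed the query length, since the gap
    between query length and window size is what degrades performance."""
    usable = [w for w in window_sizes if w <= query_len]
    if not usable:
        raise ValueError("no index is usable for this query length")
    return max(usable)
```

For example, with indexes built for window sizes 64, 128, 256, and 512, a query of length 300 would be answered with the 256-window index rather than the single fixed-window index of a conventional method.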

Residual Battery Capacity and Signal Strength Based Power-aware Routing Protocol in MANET (MANET에서 배터리 잔량과 신호세기를 동시에 고려한 Power-aware 라우팅 프로토콜)

  • Park Gun-Woo;Choi Jong-Oh;Kim Hyoung-Jin;Song Joo-Seok
    • The KIPS Transactions:PartC
    • /
    • v.13C no.2 s.105
    • /
    • pp.219-226
    • /
    • 2006
  • In a MANET (Mobile Ad-hoc Network), the shortest path is maintained only for a short time because the network topology changes very frequently and the mobile nodes depend on batteries to communicate with each other. Many studies have therefore been conducted to overcome this limitation or to take power into account. These protocols, however, consider only one side, either link stability or power consumption: a protocol can achieve high stability while consuming power inefficiently, or it can reduce the power consumption of the network while failing to balance that consumption across nodes. For this reason, we propose RBSSPR (Residual Battery capacity and Signal Strength based Power-aware Routing). RBSSPR considers both the residual capacity of the battery and the signal strength, so it achieves load balancing as well as minimal power consumption. RBSSPR is based on AODV (Ad-hoc On-demand Distance Vector Routing). We use ns-2 for simulation. The simulation results show that RBSSPR can extend the lifetime of the network by distributing traffic that would otherwise be concentrated on particular nodes and by reducing power consumption.
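
A rough sketch of the kind of route metric this implies is given below. The abstract does not state RBSSPR's exact formula, so the weighted combination of battery and signal and the weakest-hop scoring are illustrative assumptions, not the protocol's definition:

```python
def route_score(hops, alpha=0.5):
    """hops: list of (residual_battery, signal_strength) pairs, each
    normalized to [0, 1]. Score the route by its weakest hop so traffic
    avoids nearly drained nodes and links about to break; alpha weights
    battery against signal strength."""
    return min(alpha * b + (1 - alpha) * s for b, s in hops)

def select_route(routes, alpha=0.5):
    """Among candidate routes (e.g. from an AODV-style route discovery),
    pick the one whose weakest hop is strongest."""
    return max(routes, key=lambda r: route_score(r, alpha))
```

Scoring by the weakest hop rather than the hop sum is what yields load balancing: a short route through one nearly dead node loses to a slightly longer route whose nodes all have moderate reserves.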

A Prostate Segmentation of TRUS Image using Average Shape Model and SIFT Features (평균 형상 모델과 SIFT 특징을 이용한 TRUS 영상의 전립선 분할)

  • Kim, Sang Bok;Seo, Yeong Geon
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.1 no.3
    • /
    • pp.187-194
    • /
    • 2012
  • Prostate cancer is one of the most frequent cancers in men and a major cause of mortality in most countries. In many diagnostic and treatment procedures for prostate disease, transrectal ultrasound (TRUS) images are used because of their low cost. However, accurate detection of prostate boundaries is a challenging and difficult task due to weak prostate boundaries, speckle noise, and the short range of gray levels. This paper proposes a method for automatic prostate segmentation in TRUS images using an average shape model and invariant features. The approach consists of four steps. First, it detects the probe position and the two straight lines connected to the probe using the edge distribution. Next, it acquires three prostate patches located in the middle of the average model; these patches are used to compare the features of prostate and non-prostate regions. It then classifies which blocks are similar to the three representative patches. Last, the boundaries from the preceding classification and the rough boundaries from the first step are used to determine the segmentation. A number of experiments were conducted to validate this method, and the results showed that the new approach extracted the prostate boundary with less than 7.78% error relative to boundaries provided manually by experts.

Data Acquisition System Applying TMO for GIS Preventive Diagnostic System (GIS 예방진단시스템을 위한 TMO 응용 데이터 수집 시스템)

  • Kim, Tae-Wan;Kim, Yun-Gwan;Jang, Cheon-Hyeon
    • The KIPS Transactions:PartA
    • /
    • v.16A no.6
    • /
    • pp.481-488
    • /
    • 2009
  • GIS (gas-insulated switchgear) is used to insulate large electric power equipment using SF6 gas. GIS has a simple structure, rarely breaks down, and offers relatively high reliability, but because it is pressurized, its faults are hard to inspect. Faults in GIS can have a ripple effect on the community and are hard to recover from; consequently, GIS adopts a preventive diagnostic system to find internal faults in advance. Reliability is the most important property of a GIS preventive diagnostic system, because the system estimates abnormality from the analysis of collected data. However, the existing system, which uses centralized data management, is inefficient, and it is hard to guarantee the timeliness and accuracy of its data. To guarantee timeliness and accuracy, the GIS preventive diagnostic system needs a real-time middleware. Therefore, in this paper, to improve the reliability of the GIS preventive diagnostic system, we use a TMO-based middleware that guarantees the timeliness of real-time distributed computing, and we propose an improved GIS preventive diagnostic system applying data acquisition, monitoring, and control methods based on the TMO model. The proposed system uses a Communication Control Unit (CCU) for distributed data handling, supported by TMO. The CCU can improve the performance of the GIS preventive diagnostic system by guaranteeing the timeliness of the data handling process and increasing the reliability of data through the TMO middleware. It is also designed to take over the data acquisition load that had been processed in the existing server, so it reduces the load on the server and suits a distributed environment. Therefore, the proposed system can improve the performance and reliability of the GIS preventive diagnostic system and contribute to the stable operation of GIS.

The Analysis of the Number of Donations Based on a Mixture of Poisson Regression Model (포아송 분포의 혼합모형을 이용한 기부 횟수 자료 분석)

  • Kim In-Young;Park Su-Bum;Kim Byung-Soo;Park Tae-Kyu
    • The Korean Journal of Applied Statistics
    • /
    • v.19 no.1
    • /
    • pp.1-12
    • /
    • 2006
  • The aim of this study is to analyze survey data on the number of charitable donations using a mixture of two Poisson regression models. The survey was conducted in 2002 by Volunteer 21, a nonprofit organization, on Koreans older than 20. The mixture of two Poisson distributions is used to model the number of donations, based on the empirical distribution of the data. The two-component mixture implies that the whole population is subdivided into two groups, one with a smaller number of donations and the other with a larger number. We fit the mixture of Poisson regression models to the number of donations to identify significant covariates, employing the expectation-maximization algorithm to estimate the parameters. We computed 95% bootstrap confidence intervals based on the bias-corrected and accelerated method and used them to select significant explanatory variables. As a result, the income variable, with four categories, and the volunteering variable (1: experience of volunteering, 0: otherwise) turned out to be significant, with positive regression coefficients in both the smaller and the larger donation groups. However, the regression coefficients in the smaller donation group were larger than those in the larger donation group.
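
The expectation-maximization alternation used to fit such a mixture can be sketched for a simplified two-component Poisson mixture without covariates (the paper fits Poisson *regressions*, but the responsibility/weighted-update structure is the same idea; the initialization below is an arbitrary choice):

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def em_poisson_mixture(data, iters=200):
    """EM for a two-component Poisson mixture on count data.
    Returns (pi, lam1, lam2): the mixing weight of component 1
    and the two Poisson rates."""
    lam1, lam2 = min(data) + 0.5, max(data) + 0.5   # crude initialization
    pi = 0.5
    for _ in range(iters):
        # E-step: posterior probability that each count came from component 1
        resp = []
        for k in data:
            p1 = pi * poisson_pmf(k, lam1)
            p2 = (1 - pi) * poisson_pmf(k, lam2)
            resp.append(p1 / (p1 + p2))
        # M-step: responsibility-weighted means update the rates;
        # the average responsibility updates the mixing weight
        n1 = sum(resp)
        pi = n1 / len(data)
        lam1 = sum(r * k for r, k in zip(resp, data)) / n1
        lam2 = sum((1 - r) * k for r, k in zip(resp, data)) / (len(data) - n1)
    return pi, lam1, lam2
```

On well-separated count data the two estimated rates converge toward the means of the "smaller" and "larger" donation groups, which is exactly the two-group interpretation the mixture model gives the population.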