• Title/Summary/Keyword: data partition


Software Development Effort Estimation Using Partition of Project Delivery Rate Group (프로젝트 인도율 그룹 분할 방법을 이용한 소프트웨어 개발노력 추정)

  • Lee, Sang-Un;No, Myeong-Ok;Lee, Bu-Gwon
    • The KIPS Transactions:PartD
    • /
    • v.9D no.2
    • /
    • pp.259-266
    • /
    • 2002
  • A key issue in software development is the ability to estimate project effort and cost in the early phases of the software life cycle. Regression models for project effort and cost estimation are presented based on function points, a measure of software size. The data sets used in previous studies are often small and not very recent. Applied to 789 projects developed since 1990, those models explain less than 0.53 of the data variation ($R^2$, coefficient of determination). The data sets are therefore divided into homogeneous groups according to project delivery rate (PDR), and this paper presents general effort estimation models for each group. The presented models have randomly distributed residuals and explain more than 0.93 ($R^2$) of the data variation in most PDR ranges.
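
A minimal sketch of the group-wise regression idea described above, assuming a hypothetical data layout (NumPy arrays of function points and actual effort) and a log-linear model per PDR group; the paper's exact model form may differ.

```python
import numpy as np

def fit_pdr_group_models(fp, effort, pdr_bins):
    """Fit a separate log-linear model effort = a * fp**b per PDR range.

    fp, effort: 1-D arrays over projects; PDR = effort / function points.
    pdr_bins: list of (lo, hi) PDR ranges defining the homogeneous groups.
    """
    pdr = effort / fp
    models = {}
    for lo, hi in pdr_bins:
        mask = (pdr >= lo) & (pdr < hi)
        if mask.sum() < 3:          # skip groups too small to fit
            continue
        # Ordinary least squares on log-transformed data.
        X = np.column_stack([np.ones(mask.sum()), np.log(fp[mask])])
        coef, *_ = np.linalg.lstsq(X, np.log(effort[mask]), rcond=None)
        models[(lo, hi)] = coef     # coef = [log a, b]
    return models
```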

An Optimal Cluster Analysis Method with Fuzzy Performance Measures (퍼지 성능 측정자를 결합한 최적 클러스터 분석방법)

  • 이현숙;오경환
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.6 no.3
    • /
    • pp.81-88
    • /
    • 1996
  • Cluster analysis partitions a collection of data points into a number of clusters such that the points inside a cluster have a certain degree of similarity; it is a fundamental process of data analysis and has played an important role in solving many problems in pattern recognition and image processing. To this end, many clustering algorithms based on distance criteria have been developed, and fuzzy set theory has been introduced to describe real data, whose boundaries may be fuzzy. If fuzzy cluster analysis is to make a significant contribution to engineering applications, much more attention must be paid to the fundamental question of cluster validity: how well the clustering has identified the structure present in the data. Several validity functionals, such as the partition coefficient, classification entropy, and proportion exponent, have been used to measure validity mathematically. However, cluster validity involves complex aspects, and it is difficult to measure with a single function as in conventional studies. In this paper, we propose four performance indices and a way to measure the quality of the clustering formed by a given learning strategy.
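
For reference, the partition coefficient and classification entropy mentioned above have standard definitions (Bezdek's). A minimal sketch, assuming a fuzzy partition matrix U of shape (c, n) whose columns sum to 1:

```python
import numpy as np

def partition_coefficient(U):
    """PC in [1/c, 1]; higher means a crisper (more valid) partition."""
    n = U.shape[1]
    return np.sum(U ** 2) / n

def classification_entropy(U, eps=1e-12):
    """CE in [0, log c]; lower means a crisper partition."""
    n = U.shape[1]
    return -np.sum(U * np.log(U + eps)) / n
```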


Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong;Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems
    • /
    • v.19 no.2
    • /
    • pp.39-54
    • /
    • 2013
  • The recent explosive growth of electronic commerce offers customers many advantageous purchase opportunities. In this situation, customers who do not have enough knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect users' preferences and provide recommendation lists to users; thus, the product recommender system has become one of the most popular tools for one-to-one marketing in online shopping stores. However, recommender systems that do not properly reflect users' preferences cause disappointment and wasted time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by precisely reflecting users' preferences. The research data are collected from a real-world online shopping store that deals in products from famous art galleries and museums in Korea. The data initially contain 5,759 transactions; 3,167 transactions remain after deletion of null data. We transform the categorical variables into dummy variables and exclude outlier data. The proposed model consists of two steps. The first step predicts customers who are highly likely to purchase products in the online shopping store. In this step, we use logistic regression, decision trees, and artificial neural networks to predict such customers in each product group, performing these data mining techniques with the SAS E-Miner software. We partition the data into modeling and validation sets for logistic regression and decision trees, and into training, test, and validation sets for the artificial neural network model; the same validation set is used in all experiments. We then combine the results of the individual predictors using multi-model ensemble techniques, namely bagging and bumping. Bagging, short for "bootstrap aggregation," combines outputs from several machine learning techniques to raise the performance and stability of prediction or classification; it is a special form of the averaging method. Bumping, short for "bootstrap umbrella of model parameters," retains only the model with the lowest error. The results show that bumping outperforms bagging and the individual predictors except in the "Poster" product group, where the artificial neural network model performs best. In the second step, we use market basket analysis to extract association rules for co-purchased products. Thirty-one association rules are extracted according to the lift, support, and confidence measures, with the minimum transaction frequency to support an association set to 5%, the maximum number of items in an association set to 4, and the minimum confidence for rule generation set to 10%. Rules with a lift value below 1 are excluded, and after removing duplicates fifteen association rules remain. Of these, eleven rules associate products within the "Office Supplies" product group, one rule associates the "Office Supplies" and "Fashion" product groups, and the other three rules associate the "Office Supplies" and "Home Decoration" product groups.
    Finally, the proposed product recommender system provides a list of recommendations to the appropriate customers. We test the usability of the proposed system using a prototype and real-world transaction and profile data. To this end, we construct the prototype system with ASP, JavaScript, and Microsoft Access. In addition, we survey user satisfaction with the recommended product list from the proposed system against randomly selected product lists. The survey participants are 173 users of MSN Messenger, Daum Café, and P2P services. We evaluate user satisfaction on a five-point Likert scale and perform a paired-sample t-test on the survey results. The results show that the proposed model outperforms the random selection model at the 1% significance level; that is, users were significantly more satisfied with the recommended product list. The results also suggest that the proposed system may be useful in real-world online shopping stores.
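
The bagging/bumping contrast described above can be sketched generically. The following assumes scikit-learn-style classifiers rather than the SAS E-Miner models used in the study; the names `bagging_predict` and `bumping_fit` are illustrative.

```python
import numpy as np
from sklearn.base import clone

def bagging_predict(model, X, y, X_new, n_boot=25, seed=None):
    """Bagging: average predicted probabilities of models fit on
    bootstrap resamples (binary classification assumed)."""
    rng = np.random.default_rng(seed)
    probs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))    # bootstrap resample
        m = clone(model).fit(X[idx], y[idx])
        probs.append(m.predict_proba(X_new)[:, 1])
    return np.mean(probs, axis=0)

def bumping_fit(model, X, y, n_boot=25, seed=None):
    """Bumping: of the bootstrap-fitted models, keep only the one with
    the lowest error on the full training set."""
    rng = np.random.default_rng(seed)
    best, best_err = None, np.inf
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), len(X))
        m = clone(model).fit(X[idx], y[idx])
        err = 1.0 - m.score(X, y)                # misclassification rate
        if err < best_err:
            best, best_err = m, err
    return best
```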

Domain Decomposition Strategy for Pin-wise Full-Core Monte Carlo Depletion Calculation with the Reactor Monte Carlo Code

  • Liang, Jingang;Wang, Kan;Qiu, Yishu;Chai, Xiaoming;Qiang, Shenglong
    • Nuclear Engineering and Technology
    • /
    • v.48 no.3
    • /
    • pp.635-641
    • /
    • 2016
  • Because of prohibitive data storage requirements in large-scale simulations, memory is an obstacle for Monte Carlo (MC) codes in accomplishing pin-wise three-dimensional (3D) full-core calculations, particularly for whole-core depletion analyses. Various kinds of data are evaluated and total memory requirements are quantified based on the Reactor Monte Carlo (RMC) code, showing that tally data, material data, and isotope densities in depletion are the three major parts of memory storage. The domain decomposition method is investigated as a means of saving memory, by dividing the spatial geometry into domains that are simulated separately by parallel processors. To keep particle tracking valid during transport simulations, particles must be communicated between domains. For efficiency, an asynchronous particle communication algorithm is designed and implemented. Furthermore, we couple the domain decomposition method with the MC burnup process, using a consistent domain partition in both the transport and depletion modules. A numerical test of 3D full-core burnup calculations is carried out, indicating that the RMC code, with the domain decomposition method, is capable of pin-wise full-core burnup calculations with millions of depletion regions.
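
A conceptual sketch of the particle hand-off that domain decomposition requires (not RMC's actual implementation; `simulate_in_domain` and `domain_of` are assumed helpers):

```python
from collections import defaultdict

def track_histories(particles, my_domain, simulate_in_domain, domain_of):
    """Track particles in the local domain; bank the ones that leave.

    simulate_in_domain: transports a particle until it is absorbed,
    leaks, or reaches a domain boundary (returns None if terminated).
    domain_of: maps a particle's position to its destination domain.
    """
    outgoing = defaultdict(list)   # destination domain -> banked particles
    for p in particles:
        p = simulate_in_domain(p, my_domain)
        if p is not None:          # particle crossed a domain boundary
            outgoing[domain_of(p)].append(p)
    # Each bank would then be exchanged asynchronously (e.g., nonblocking
    # MPI sends) so tracking can continue without a global synchronization.
    return outgoing
```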

Pareto Analysis of Experimental Data by $L_{18}(2 \times 3^7)$ Orthogonal Array ($L_{18}(2 \times 3^7)$ 직교배열표 실험자료에 대한 파레토 그림 분석)

  • 임용빈
    • The Korean Journal of Applied Statistics
    • /
    • v.17 no.3
    • /
    • pp.499-505
    • /
    • 2004
  • The Pareto diagram analysis of experimental data from two-level orthogonal arrays has been widely used in practice because it is a quick, easy, graphical method for analyzing experimental results that does not rely on the analysis of variance to screen significant effects. For the analysis of experimental data from the $L_{18}(2 \times 3^7)$ orthogonal array, Park (1996) proposed a Pareto ANOVA in which the size of an effect is defined by its mean square and the Pareto principle is applied. In this paper, a new approach to the Pareto diagram analysis of experimental data from the $L_{18}(2 \times 3^7)$ orthogonal array is proposed. The main idea is to partition the size of each three-level effect into the sizes of its linear and quadratic orthogonal contrasts.
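
The contrast partition is easy to state concretely. A minimal sketch for one three-level factor with r observations per level, using the standard linear (-1, 0, 1) and quadratic (1, -2, 1) contrast coefficients:

```python
import numpy as np

def contrast_ss(level_means, r):
    """Split a three-level factor's sum of squares into linear and
    quadratic contrast components; their sum equals the factor SS."""
    means = np.asarray(level_means, dtype=float)   # means at levels 1,2,3
    lin = np.array([-1.0, 0.0, 1.0])               # linear contrast
    quad = np.array([1.0, -2.0, 1.0])              # quadratic contrast
    ss_lin = r * (lin @ means) ** 2 / (lin @ lin)
    ss_quad = r * (quad @ means) ** 2 / (quad @ quad)
    return ss_lin, ss_quad
```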

An Index Structure based on Space Partitions and Adaptive Bit Allocations for Multi-Dimensional Data (다차원 데이타를 위한 공간 분할 및 적응적 비트 할당 기반 색인 구조)

  • Bok, Kyoung-Soo;Kim, Eun-Jae;Yoo, Jae-Soo
    • Journal of KIISE:Databases
    • /
    • v.32 no.5
    • /
    • pp.509-525
    • /
    • 2005
  • In this paper, we propose an index structure based on vector approximation that efficiently supports similarity search over multi-dimensional data. The proposed index structure partitions the space and dynamically allocates bits to the resulting regions according to the data distribution. It thus splits the space into non-overlapping regions and can reduce the depth of the tree by storing more child-region information in each internal node. It also represents child nodes more precisely and supports efficient search by encoding each child's region information relative to that of its parent node. Experimental results show that the proposed index structure achieves about $40\%$ improvement in search performance over the existing method.
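
As an illustration of adaptive bit allocation in a vector-approximation index, the sketch below assigns quantization bits to dimensions in proportion to their variance; this allocation policy is an assumption, since the abstract does not give the paper's exact rule.

```python
import numpy as np

def allocate_bits(data, total_bits):
    """Distribute total_bits across dimensions proportionally to variance."""
    var = data.var(axis=0)
    bits = np.floor(total_bits * var / var.sum()).astype(int)
    # Hand out leftover bits to the highest-variance dimensions.
    for d in np.argsort(-var)[: total_bits - bits.sum()]:
        bits[d] += 1
    return bits

def approximate(vec, mins, maxs, bits):
    """Quantize a vector into per-dimension cell indices (2**b cells
    in a dimension with b bits)."""
    scale = (vec - mins) / (maxs - mins)
    return np.minimum((scale * (2 ** bits)).astype(int), 2 ** bits - 1)
```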

Low-Power Multiplier Using Input Data Partition (입력 데이터 분할을 이용한 저전력 부스 곱셈기 설계)

  • Park Jongsu;Kim Jinsang;Cho Won-Kyung
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.30 no.11A
    • /
    • pp.1092-1097
    • /
    • 2005
  • In this paper, we propose a low-power Booth multiplier that reduces the switching activity of partial products during the multiplication process. The radix-4 Booth algorithm produces zero-valued encoded partial products when consecutive input bits are equal (runs of 0s or 1s). Therefore, partial products are more likely to be zero when, of the two inputs, the one with the smaller effective dynamic range is used as the multiplier rather than the multiplicand. The proposed multiplier divides one multiplication into several multiplications of smaller bit width than the original input data, Booth-encodes and computes each of them independently, and finally adds their results. The proposed multiplier therefore has a higher chance of zero encoded products, giving lower switching activity and lower power. Implementation results show that the proposed multiplier saves up to about $20\%$ of power dissipation compared with a previous Booth multiplier.
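
Radix-4 Booth recoding itself is standard; a small sketch showing why runs of equal bits give zero digits (which is the property the proposed input partitioning makes more likely):

```python
# Overlapping 3-bit groups of the multiplier map to digits in {-2,...,2};
# the all-0s and all-1s groups both recode to 0.
BOOTH_DIGIT = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}

def booth_radix4_digits(x, width):
    """Recode a width-bit two's-complement multiplier into radix-4 digits."""
    x &= (1 << width) - 1
    padded = x << 1                    # append the implicit 0 below bit 0
    digits = []
    for i in range(0, width, 2):       # one digit per pair of multiplier bits
        group = (padded >> i) & 0b111  # bits i+1, i, i-1 of the multiplier
        digits.append(BOOTH_DIGIT[group])
    return digits                      # digit k carries weight 4**k

# Example: booth_radix4_digits(6, 4) == [-2, 2], i.e. 6 = -2*1 + 2*4.
```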

Improving Fault Tolerance for High-capacity Shared Distributed File Systems using the Rotational Lease Under Network Partitioning (대용량 공유 분산 화일 시스템에서 망 분할 시 순환 리스를 사용한 고장 감내성 향상)

  • Tak, Byung-Chul;Chung, Yon-Dohn;Kim, Myoung-Ho
    • Journal of KIISE:Databases
    • /
    • v.32 no.6
    • /
    • pp.616-627
    • /
    • 2005
  • In a shared-storage file system, unlike in network-attached file server systems, systems can directly access the shared storage device through a specialized data-only subnetwork. In this shared-storage architecture, data consistency is maintained by a designated set of lock servers that exchange lock information over a control network. Furthermore, a lease mechanism is introduced to cope with control network failures. However, when the control network is partitioned, participating systems can no longer make progress after the lease term expires, until the network recovers. This paper addresses this limitation and proposes a method that allows partitioned systems to make progress while the control network is partitioned. In the proposed method, each participating system is periodically granted a predefined lease term in rotation. It is also shown that the proposed mechanism always preserves data consistency.
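
A simplified model of the rotational-lease idea: with synchronized clocks, lease ownership can rotate on a fixed schedule, so each node decides locally whether it currently holds the lease even while the control network is partitioned. This is a sketch of the concept only, not the paper's exact protocol.

```python
def lease_holder(now, n_nodes, lease_term, epoch=0.0):
    """Index of the node holding the lease at time `now`: ownership
    advances one node per lease_term, wrapping around the group."""
    slot = int((now - epoch) // lease_term)
    return slot % n_nodes

def i_hold_lease(node_id, now, n_nodes, lease_term, epoch=0.0):
    """Local decision, requiring no control-network message exchange."""
    return lease_holder(now, n_nodes, lease_term, epoch) == node_id
```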

Comparison of Algorithms for Generating Parametric Images of Cerebral Blood Flow Using ${H_2}^{15}O$ Positron Emission Tomography (PET) (${H_2}^{15}O$ PET을 이용한 뇌혈류 파라메트릭 영상 구성을 위한 알고리즘 비교)

  • Lee, Jae-Sung;Lee, Dong-Soo;Park, Kwang-Suk;Chung, June-Key;Lee, Myung-Chul
    • The Korean Journal of Nuclear Medicine
    • /
    • v.37 no.5
    • /
    • pp.288-300
    • /
    • 2003
  • Purpose: To obtain regional cerebral blood flow (rCBF) and the tissue-blood partition coefficient from ${H_2}^{15}O$ PET time-activity curves, the parameters of the Kety model are conventionally fitted by nonlinear least squares (NLS) analysis. However, NLS requires considerable computation time and is therefore impractical for the pixel-by-pixel analysis needed to generate parametric images of these parameters. In this study, we investigated several fast parameter estimation methods for parametric image generation and compared their statistical reliability and computational efficiency. Materials and Methods: The methods included linear least squares (LLS), linear weighted least squares (LWLS), linear generalized least squares (GLS), linear generalized weighted least squares (GWLS), weighted integration (WI), and a model-based clustering method (CAKS). ${H_2}^{15}O$ dynamic brain PET with a Poisson noise component was simulated using the numerical Zubal brain phantom. Error and bias in the estimation of rCBF and the partition coefficient, as well as computation time, were estimated and compared in various noise environments. In addition, parametric images from ${H_2}^{15}O$ dynamic brain PET data acquired from 16 healthy volunteers under various physiological conditions were compared to examine the utility of these methods for real human data. Results: The fast algorithms produced parametric images with similar image quality and statistical reliability. When the CAKS and LLS methods were used in combination, computation time was significantly reduced, to less than 30 seconds for $128{\times}128{\times}46$ images on a Pentium III processor. Conclusion: Parametric images of rCBF and the partition coefficient with good statistical properties can be generated within a computation time acceptable in clinical situations.
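
The LLS formulation mentioned above comes from integrating the Kety one-compartment model $dC_t/dt = K_1 C_a(t) - k_2 C_t(t)$ to obtain $C_t(t) = K_1 \int_0^t C_a - k_2 \int_0^t C_t$, which is linear in $K_1$ and $k_2$ and can be solved per pixel by ordinary least squares. A minimal sketch using trapezoidal integration (the partition coefficient is $K_1/k_2$):

```python
import numpy as np

def lls_kety(t, ca, ct):
    """Estimate (K1, k2) by linear least squares from sampled curves.

    t: frame times; ca: arterial input function; ct: tissue curve.
    """
    # Cumulative trapezoidal integrals of Ca and Ct over time.
    int_ca = np.concatenate([[0.0], np.cumsum(np.diff(t) * (ca[1:] + ca[:-1]) / 2)])
    int_ct = np.concatenate([[0.0], np.cumsum(np.diff(t) * (ct[1:] + ct[:-1]) / 2)])
    # Solve Ct = K1 * int(Ca) - k2 * int(Ct) in the least-squares sense.
    X = np.column_stack([int_ca, -int_ct])
    (k1, k2), *_ = np.linalg.lstsq(X, ct, rcond=None)
    return k1, k2   # partition coefficient = k1 / k2
```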

Characteristics of Gas Furnace Process by Means of Partition of Input Spaces in Trapezoid-type Function (사다리꼴형 함수의 입력 공간분할에 의한 가스로공정의 특성분석)

  • Lee, Dong-Yoon
    • Journal of Digital Convergence
    • /
    • v.12 no.4
    • /
    • pp.277-283
    • /
    • 2014
  • Fuzzy modeling generally uses given data, and the fuzzy rules are established by selecting input variables and dividing the input space for each of them. The premise part of a fuzzy rule is specified by the selected input variables, the number of space divisions, and the membership functions; in this paper, the consequent part of the fuzzy rule is identified by polynomial functions in linear and modified quadratic forms. Premise parameters are identified by dividing the input space either with the Min-Max method, which uses the minimum and maximum values of the input data set, or with the C-Means clustering algorithm, which forms the input data into hard clusters. The consequent parameters of each rule, namely the polynomial coefficients, are identified by the standard least-squares method. In this paper, the premise input space is divided using trapezoid-type membership functions, and performance is evaluated on the gas furnace process, which is widely used as a nonlinear benchmark.
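
To make the premise-part construction concrete, here is a small sketch of a trapezoid membership function and an even Min-Max partition of one input variable; the overlap layout is an assumption, since the abstract does not fix it.

```python
import numpy as np

def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a,b], flat at 1 on [b,c],
    falls on [c,d], and is 0 outside [a,d]."""
    return np.clip(np.minimum((x - a) / (b - a), (d - x) / (d - c)), 0.0, 1.0)

def min_max_partition(x, n_sets, overlap=0.5):
    """Place n_sets trapezoids evenly over [min(x), max(x)]."""
    lo, hi = x.min(), x.max()
    width = (hi - lo) / n_sets
    params = []
    for i in range(n_sets):
        b = lo + i * width
        c = b + (1 - overlap) * width
        params.append((b - overlap * width, b, c, c + overlap * width))
    return params   # one (a, b, c, d) tuple per fuzzy set
```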