• Title/Summary/Keyword: Data Weights

A PNN approach for combining multiple forecasts (예측치 결합을 위한 PNN 접근방법)

  • Jun, Duk-Bin; Shin, Hyo-Duk; Lee, Jung-Jin
    • Journal of Korean Institute of Industrial Engineers / v.26 no.3 / pp.193-199 / 2000
  • In many studies, considerable attention has been focused on choosing a model that represents the underlying process of a time series and on forecasting the future. In the real world, however, there are cases in which a single model cannot reflect all the characteristics of the original time series. Under such circumstances, we may get better performance by combining the forecasts from several models. The most popular methods for combining forecasts take a weighted average of the multiple forecasts, but the weights are usually unstable. When the assumptions of normality and unbiasedness for forecast errors are satisfied, a Bayesian method can be used to update the weights; in the real world, however, there are many circumstances in which the Bayesian method is not appropriate. This paper proposes a PNN (Probabilistic Neural Network) approach for combining forecasts that can be applied when the assumption of normality or unbiasedness for forecast errors is not satisfied. The PNN method, which is similar to the Bayesian approach, is suggested for updating the unstable weights in the combination of forecasts. The PNN has usually been used in the field of pattern recognition. Unlike the Bayesian approach, it requires no assumption of a specific prior distribution because it obtains probabilities from the distribution estimated from the given data. Empirical results reveal that the PNN method offers superior predictive capabilities.
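
The abstract describes the PNN only at a high level; its core is a Parzen-window density estimate per class. Below is a minimal sketch of that idea, assuming Gaussian kernels with a common smoothing parameter sigma; the error-based features and helper names in the usage comment are illustrative, not the authors' implementation.

```python
import numpy as np

def pnn_class_probs(x, train_X, train_y, sigma=0.5):
    """Parzen-window class probabilities, the core of a Probabilistic
    Neural Network: one Gaussian kernel per training example."""
    probs = {}
    for c in np.unique(train_y):
        Xc = train_X[train_y == c]                        # patterns of class c
        d2 = np.sum((Xc - x) ** 2, axis=1)                # squared distances
        probs[c] = np.mean(np.exp(-d2 / (2 * sigma**2)))  # kernel density at x
    total = sum(probs.values())
    return {c: p / total for c, p in probs.items()}

# Hypothetical use for forecast combination: the "class" is which model was
# most accurate, the features are recent forecast errors, and the resulting
# class probabilities serve as combination weights:
# weights = pnn_class_probs(recent_errors, past_errors, best_model_ids)
```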

Varietal Difference of Leaf Breakdown in Field of Flue-Cured Tobacco (Nicotiana tabacum L.) (황색종 연초(Nicotiana tabacum L.)에서 엽탈락의 품종간 차이)

  • 조수헌
    • Journal of the Korean Society of Tobacco Science / v.10 no.2 / pp.93-98 / 1988
  • This study was conducted to obtain basic information on varietal differences in leaf breakdown in the field for flue-cured tobacco at Taegu Experiment Station, Korea Ginseng & Tobacco Research Institute, in 1987. The experiment was designed in randomized blocks with 3 replications, and the data were analysed as a split-split-plot design. Main plots were varieties, sub-plots were leaf positions (4th, 5th and 6th from the bottom), and each sub-plot was divided into 3 parts by distance along the midrib from the stalk (7, 10 and 13 cm). Four varieties, NC 95, NC 2326, NC 82 and BY 4, were transplanted on 15 April, and the weight at which leaves broke off under artificial loading was measured on 5 June. The results obtained are as follows: 1. Weights of leaf breakdown by leaf position for NC 95 were lower, at 358-579g, than those of the other varieties, at 555-597g, but were not significantly different among leaf positions regardless of variety. 2. Weights of leaf breakdown in relation to distance along the midrib from the stalk for NC 95 were lower, at 309-419g, than those of the other varieties, at 472-710g. 3. Weights of leaf breakdown were significantly different according to distance along the midrib from the stalk, and not significantly different according to leaf position at the same distance along the midrib, regardless of variety.

A weighted method for evaluating software quality (가중치를 적용한 소프트웨어 품질 평가 방법)

  • Jung, Hye Jung
    • Journal of Digital Convergence / v.19 no.8 / pp.249-255 / 2021
  • This study proposed a method for determining weights for the eight quality characteristics suggested by international standards (functionality, reliability, usability, maintainability, portability, efficiency, security, and interoperability), focusing on software test reports. Currently, test results for software quality evaluation apply the same weight to all eight quality characteristics and take the arithmetic average. In this study, weights for the eight characteristics were instead derived from text analysis of the test reports of two products and applied to the evaluation. It was confirmed that the average of test reports computed with the weighted quality characteristics was more efficient.
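
A minimal sketch of the contrast the abstract draws, an unweighted arithmetic mean versus a weighted mean over the eight characteristics; the scores and weights below are made up for illustration, not taken from the paper.

```python
# Hypothetical characteristic scores from a test report (0-100 scale assumed).
scores = {"functionality": 92, "reliability": 85, "usability": 78,
          "maintainability": 70, "portability": 66, "efficiency": 88,
          "security": 95, "interoperability": 81}

# Current practice: equal weights, i.e. the plain arithmetic mean.
equal_mean = sum(scores.values()) / len(scores)

# Illustrative weights, e.g. as derived from text analysis of test reports;
# they must sum to 1.
weights = {"functionality": 0.20, "reliability": 0.15, "usability": 0.10,
           "maintainability": 0.10, "portability": 0.05, "efficiency": 0.10,
           "security": 0.20, "interoperability": 0.10}

weighted_mean = sum(weights[k] * scores[k] for k in scores)
print(f"equal mean: {equal_mean:.1f}, weighted mean: {weighted_mean:.1f}")
```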

Blind Signal Separation Using Eigenvectors as Initial Weights in Delayed Mixtures (지연혼합에서의 초기 값으로 고유벡터를 이용하는 암묵신호분리)

  • Park, Jang-Sik; Son, Kyung-Sik; Park, Keun-Soo
    • The Journal of the Acoustical Society of Korea / v.25 no.1 / pp.14-20 / 2006
  • In this paper, a novel technique for setting the initial weights in BSS of delayed mixtures is proposed. After an eigendecomposition of the correlation matrix of the mixed data, the initial weights are set from the eigenvectors together with delay information. The proposed initialization improves the separation performance of the conventional FDICA technique. Computer simulations show that the proposed method achieves improved SIR and faster convergence of the learning curve.
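
A minimal sketch of eigenvector-based initialization for a frequency-domain unmixing matrix, assuming one complex observation matrix per frequency bin; how the delay information is folded into the initial weights is not detailed in the abstract, so it is omitted here.

```python
import numpy as np

def initial_weights_from_eigenvectors(X):
    """X: (sensors, samples) complex mixtures at one frequency bin.
    Returns an initial unmixing matrix built from eigenvectors."""
    R = (X @ X.conj().T) / X.shape[1]      # sample correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)   # Hermitian eigendecomposition
    order = np.argsort(eigvals)[::-1]      # strongest components first
    W0 = eigvecs[:, order].conj().T        # rows = candidate unmixing filters
    return W0

# W0 would then seed the iterative FDICA update in place of an identity or
# random initialization, which is what the reported SIR and convergence
# improvements are measured against.
```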

Neural Networks and Logistic Models for Classification: A Case Study

  • Hwang, Chang-Ha
    • Journal of the Korean Data and Information Science Society / v.7 no.1 / pp.13-19 / 1996
  • In this paper, we study and compare two types of methods for classification when both continuous and categorical variables are used to describe each individual. One is the neural network (NN) method using backpropagation learning (BPL); the other is the logistic model (LM) method. Both the NN and the LM are based on projections of the data in directions determined from interconnection weights.
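
A minimal sketch of such a comparison on synthetic data with both continuous and categorical (dummy-coded) predictors, using scikit-learn stand-ins rather than the paper's own BPL implementation; all data below are simulated for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_cont = rng.normal(size=(200, 2))                  # continuous predictors
X_cat = rng.integers(0, 2, size=(200, 2))           # binary dummy predictors
X = np.hstack([X_cont, X_cat]).astype(float)
y = (X[:, 0] + 0.8 * X[:, 2]                        # true signal
     + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

lm = LogisticRegression().fit(X_tr, y_tr)           # one linear projection
nn = MLPClassifier(hidden_layer_sizes=(4,), max_iter=2000,
                   random_state=0).fit(X_tr, y_tr)  # learned projections
print("LM accuracy:", lm.score(X_te, y_te))
print("NN accuracy:", nn.score(X_te, y_te))
```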

A Study on the Improvement Methods for Sausage Stuffing Process

  • Lee, Jae-Man; Cha, Young-Joon; Hong, Yeon-Woong
    • Journal of the Korean Data and Information Science Society / v.16 no.2 / pp.391-399 / 2005
  • Consider a stuffing process in which sausage casings are filled with sausage kneading. One of the most important factors in the stuffing process is the weight of the stuffed sausages. Sausages weighing above the specified limit are sold at the fixed regular market price, while underfilled sausages are reworked at the expense of reprocessing cost. In this paper, the sausage stuffing process is inspected with a view to improving productivity and quality levels. Several statistical process control tools are suggested, using real data obtained from a Korean Vienna sausage company.
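
The abstract names statistical process control tools only in general; an X-bar chart on subgroup fill weights is a typical example. A minimal sketch, with a hypothetical target weight and subgroup layout and a simplified 3-sigma limit estimate:

```python
import numpy as np

def xbar_limits(samples):
    """3-sigma X-bar chart limits from rational subgroups of fill weights.
    samples: (n_subgroups, subgroup_size) array, e.g. weights in grams."""
    center = samples.mean()                  # grand mean
    s = samples.std(axis=1, ddof=1).mean()   # rough pooled sigma estimate
    n = samples.shape[1]                     # (omits the usual c4 correction)
    return center - 3 * s / np.sqrt(n), center, center + 3 * s / np.sqrt(n)

# Hypothetical data: 20 subgroups of five weighings around a 50 g target.
rng = np.random.default_rng(1)
weights = rng.normal(loc=50.0, scale=0.8, size=(20, 5))
lcl, center, ucl = xbar_limits(weights)
out_of_control = (weights.mean(axis=1) < lcl) | (weights.mean(axis=1) > ucl)
print(lcl, center, ucl, out_of_control.sum())
```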

REGRESSION WITH CENSORED DATA BY LEAST SQUARES SUPPORT VECTOR MACHINE

  • Kim, Dae-Hak; Shim, Joo-Yong; Oh, Kwang-Sik
    • Journal of the Korean Statistical Society / v.33 no.1 / pp.25-34 / 2004
  • In this paper we propose a prediction method for the regression model with randomly censored observations in the training data set. Least squares support vector machine regression is applied to predict the regression function, incorporating weights assigned to each observation into the optimization problem. Numerical examples are given to show the performance of the proposed prediction method.
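
A minimal sketch of weighted LS-SVM regression in its dual form, where a per-observation weight rescales the regularization term. The RBF kernel, the hyperparameters, and the idea of down-weighting censored cases are assumptions here, since the abstract does not spell out the weighting scheme.

```python
import numpy as np

def wlssvm_fit(X, y, w, gamma=1.0, bw=1.0):
    """Weighted LS-SVM regression: solve the dual linear system
    [[0, 1'], [1, K + diag(1/(gamma*w))]] [b; alpha] = [0; y]."""
    n = len(y)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * bw**2))                  # RBF Gram matrix
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.diag(1.0 / (gamma * w))     # weighted regularization
    sol = np.linalg.solve(A, np.concatenate([[0.0], y]))
    return sol[0], sol[1:]                         # bias b, coefficients alpha

def wlssvm_predict(X_train, X_new, alpha, b, bw=1.0):
    d2 = ((X_new[:, None, :] - X_train[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * bw**2)) @ alpha + b

# Censored observations would receive small w (weak fit requirement),
# fully observed ones w close to 1.
```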

One-step spectral clustering of weighted variables on single-cell RNA-sequencing data (단세포 RNA 시퀀싱 데이터를 위한 가중변수 스펙트럼 군집화 기법)

  • Park, Min Young; Park, Seyoung
    • The Korean Journal of Applied Statistics / v.33 no.4 / pp.511-526 / 2020
  • Single-cell RNA-sequencing (scRNA-seq) data consist of each cell's RNA expression extracted from large populations of cells. One main purpose of using scRNA-seq data is to identify inter-cellular heterogeneity. However, scRNA-seq data pose statistical challenges for traditional clustering methods because they have many missing values and a high level of noise due to technical and sampling issues. In this paper, motivated by the analysis of scRNA-seq data, we propose a novel spectral clustering method that imposes different weights on genes when computing the similarity between cells. Assigning weights to genes and clustering cells are performed simultaneously in the proposed framework. We solve the resulting non-convex optimization problem with an iterative algorithm. Both a real data application and a simulation study suggest that the proposed method identifies the underlying clusters better than existing clustering methods.
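
A minimal sketch of the clustering step for a fixed gene-weight vector; the paper alternates a step like this with a weight update inside its iterative algorithm, and the Gaussian similarity, bandwidth, and normalization choices below are assumptions.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def weighted_spectral_clustering(X, gene_w, k, bw=1.0):
    """X: (cells, genes) expression matrix; gene_w: nonnegative gene weights;
    k: number of clusters. One spectral-clustering pass with weighted genes."""
    Xw = X * np.sqrt(gene_w)                          # weight genes in the metric
    d2 = ((Xw[:, None, :] - Xw[None, :, :]) ** 2).sum(-1)
    S = np.exp(-d2 / (2 * bw**2))                     # weighted cell similarity
    d = S.sum(axis=1)
    L = np.eye(len(S)) - S / np.sqrt(np.outer(d, d))  # normalized Laplacian
    _, U = eigh(L, subset_by_index=[0, k - 1])        # k smallest eigenvectors
    U /= np.linalg.norm(U, axis=1, keepdims=True) + 1e-12
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)
```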

Adjustment of Korean Birth Weight Data (한국 신생아의 출생체중 데이터 보정)

  • Shin, Hyungsik
    • Journal of the Korea Institute of Information and Communication Engineering / v.21 no.2 / pp.259-264 / 2017
  • The birth weight of a newborn baby provides very important information for evaluating many clinical issues such as fetal growth restriction. This paper analyzes the birth weight data of babies born in Korea from 2011 to 2013 and shows that the data contain a biologically implausible distribution of birth weights, implying that errors may have been introduced in the data collection process. In particular, the paper analyzes the relationship between gestational period and birth weight and shows that the birth weight data for gestational periods of 28 to 32 weeks have the most noticeable errors. The paper therefore employs a finite Gaussian mixture model to classify the collected data points into two classes, non-corrupted and corrupted, and then removes the data points predicted to be corrupted. This adjustment scheme provides more natural and medically plausible percentile values of birth weights for all gestational periods.
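
A minimal sketch of the mixture-based adjustment: fit a two-component Gaussian mixture to the weights within one gestational-week stratum and drop the records assigned to the implausible component. The rule that the larger component is the non-corrupted one is an assumption for illustration, not necessarily the paper's labeling criterion.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def flag_corrupted(birth_weights):
    """Return a boolean mask, True where a record is predicted corrupted."""
    x = np.asarray(birth_weights, dtype=float).reshape(-1, 1)
    gm = GaussianMixture(n_components=2, random_state=0).fit(x)
    labels = gm.predict(x)
    good = np.bincount(labels).argmax()   # assume the larger component is clean
    return labels != good

# Hypothetical stratum: weights recorded at 28 gestational weeks.
# clean_weights = weights_28w[~flag_corrupted(weights_28w)]
```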

A Folksonomy Ranking Framework: A Semantic Graph-based Approach (폭소노미 사이트를 위한 랭킹 프레임워크 설계: 시맨틱 그래프기반 접근)

  • Park, Hyun-Jung; Rho, Sang-Kyu
    • Asia pacific journal of information systems / v.21 no.2 / pp.89-116 / 2011
  • In collaborative tagging systems such as Delicious.com and Flickr.com, users assign keywords or tags to their uploaded resources, such as bookmarks and pictures, for future use or sharing purposes. The collection of resources and tags generated by a user is called a personomy, and the collection of all personomies constitutes the folksonomy. The most significant need of folksonomy users is to efficiently find useful resources or experts on specific topics. An excellent ranking algorithm would assign higher rankings to more useful resources or experts. What resources are considered useful in a folksonomic system? Does a standard superior to frequency or freshness exist? A resource recommended by more users with more expertise should be worthy of attention. This ranking paradigm can be implemented through a graph-based ranking algorithm. Two well-known representatives of such a paradigm are PageRank by Google and HITS (Hypertext Induced Topic Selection) by Kleinberg. Both PageRank and HITS assign a higher evaluation score to pages linked to by more higher-scored pages. HITS differs from PageRank in that it utilizes two kinds of scores: authority and hub scores. The ranking objects of these algorithms are limited to Web pages, whereas the ranking objects of a folksonomic system are heterogeneous (i.e., users, resources, and tags). Therefore, uniform application of the voting notion of PageRank and HITS to the links of a folksonomy would be unreasonable. In a folksonomic system, each link corresponding to a property can have an opposite direction, depending on whether the property is in an active or a passive voice. The current research stems from the idea that a graph-based ranking algorithm could be applied to the folksonomic system using the concept of mutual interactions between entities, rather than the voting notion of PageRank or HITS. The concept of mutual interactions, proposed for ranking Semantic Web resources, enables the calculation of importance scores of various resources unaffected by link directions. The weights of a property representing the mutual interaction between classes are assigned depending on the relative significance of the property to the resource importance of each class. This class-oriented approach is based on the fact that, in the Semantic Web, there are many heterogeneous classes; thus, applying a different appraisal standard for each class is more reasonable. This is similar to the evaluation method of humans, where different items are assigned specific weights, which are then summed up to determine the weighted average. We can check for missing properties more easily with this approach than with other predicate-oriented approaches. A user of a tagging system usually assigns more than one tag to the same resource, and there can be more than one tag with the same subjectivity and objectivity. When many users assign similar tags to the same resource, grading the users differently depending on the assignment order becomes necessary. This idea comes from studies in psychology wherein expertise involves the ability to select the most relevant information for achieving a goal. An expert should be someone who not only has a large collection of documents annotated with a particular tag, but also tends to add documents of high quality to his/her collections. Such documents are identified by the number, as well as the expertise, of the users who have the same documents in their collections. In other words, there is a relationship of mutual reinforcement between the expertise of a user and the quality of a document. In addition, there is a need to rank entities related more closely to a certain entity. Considering the property of social media that the popularity of a topic is temporary, recent data should have more weight than old data. We propose a comprehensive folksonomy ranking framework in which all these considerations are dealt with and that can be easily customized to each folksonomy site for ranking purposes. To examine the validity of our ranking algorithm and show the mechanism of adjusting property, time, and expertise weights, we first use a dataset designed for analyzing the effect of each ranking factor independently. We then show the ranking results of a real folksonomy site, with the ranking factors combined. Because the ground truth of a given dataset is not known when it comes to ranking, we inject simulated data whose ranking results can be predicted into the real dataset and compare the ranking results of our algorithm with those of a previous HITS-based algorithm. Our semantic ranking algorithm based on the concept of mutual interaction seems preferable to the HITS-based algorithm as a flexible folksonomy ranking framework. Some concrete points of difference are as follows. First, with the time concept applied to the property weights, our algorithm shows superior performance in lowering the scores of older data and raising the scores of newer data. Second, by applying the time concept to the expertise weights as well as to the property weights, our algorithm controls the conflicting influence of expertise weights and enhances the overall consistency of time-valued ranking. The expertise weights of the previous study can act as an obstacle to time-valued ranking because the number of followers increases as time goes on. Third, many new properties and classes can be included in our framework. The previous HITS-based algorithm, based on the voting notion, loses ground when the domain consists of more than two classes, or when other important properties, such as "sent through twitter" or "registered as a friend," are added to the domain. Fourth, there is a big difference in calculation time and memory use between the two kinds of algorithms. While the multiplication of two matrices has to be executed twice for the previous HITS-based algorithm, this is unnecessary with our algorithm. In our ranking framework, various folksonomy ranking policies can be expressed with the ranking factors combined, and our approach can work even if the folksonomy site is not implemented with Semantic Web languages. Above all, the time weight proposed in this paper should be applicable to various domains, including social media, where time value is considered important.
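
The abstract stays at the level of concepts; below is a generic sketch of the two central mechanisms, score propagation over a weighted entity graph where link direction does not matter (the mutual-interaction idea) and exponential time decay of weights. The update rule, damping factor, and half-life are assumptions, not the paper's exact formulation.

```python
import numpy as np

def mutual_interaction_rank(W, damping=0.85, tol=1e-9, max_iter=200):
    """Iterative score propagation on a weighted graph of heterogeneous
    entities (users, resources, tags). W[i, j] >= 0 is the interaction
    weight between entities i and j; using a symmetric W makes the scores
    independent of link direction."""
    n = W.shape[0]
    M = W / np.maximum(W.sum(axis=0, keepdims=True), 1e-12)  # column-normalize
    r = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        r_new = (1 - damping) / n + damping * (M @ r)
        if np.abs(r_new - r).sum() < tol:
            break
        r = r_new
    return r

def time_decayed(weight, age_days, half_life=30.0):
    """Time weighting: an interaction loses half its weight every
    half_life days, so recent data outweigh old data."""
    return weight * 0.5 ** (age_days / half_life)
```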