• Title/Summary/Keyword: Bootstrap technique

Forcing a Closer Fit in the Lower Tails of a Distribution for Better Estimating Extremely Small Percentiles of Strengths

  • Guess, Frank M.; Leon, Ramon V.; Chen, Weiwei; Young, Timothy M.
    • International Journal of Reliability and Applications, v.5 no.4, pp.129-145, 2004
  • We use a novel forced censoring technique that more closely fits the lower tails of strength distributions in order to better estimate extremely small percentiles for measuring progress in continuous improvement initiatives. These percentiles are of great interest to companies, government oversight organizations, and consumers concerned with safety and accident prevention for many products in general, but specifically for medium density fiberboard (MDF). The international industrial standard for measuring the highest quality of MDF is internal bond (IB, also called tensile strength), and its small percentiles are crucial, especially the first percentile and lower ones. We induce censoring at a value just above the median to weight lower observations more. Using this approach, we obtain better fits in the lower tails of the distribution, where these small percentiles are impacted most. Finally, bootstrap estimates of the small percentiles are used to demonstrate the improved intervals produced by our forced censoring approach and the fitted model. There was evidence from the study to suggest that MDF has potentially different failure modes for early failures. Overall, our approach is parsimonious and suitable for real-time manufacturing settings. The approach works for either strength distributions or lifetime distributions.
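
The two moves in this abstract, right-censoring everything above the median so the likelihood is dominated by the lower tail and then bootstrapping the fitted model's small percentile, can be sketched in a few lines. The sketch below is illustrative only: it assumes synthetic Weibull "IB" data and a two-parameter Weibull as the fitted model, neither of which is taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(42)
ib = 0.8 * rng.weibull(4.0, 500)  # hypothetical internal-bond strengths

def censored_weibull_nll(params, obs, n_cens, c):
    """Negative log-likelihood: exact density for observed points,
    survival probability for the n_cens points right-censored at c."""
    k, lam = params
    if k <= 0 or lam <= 0:
        return np.inf
    ll = np.sum(np.log(k / lam) + (k - 1) * np.log(obs / lam) - (obs / lam) ** k)
    ll -= n_cens * (c / lam) ** k
    return -ll

def percentile_ci(x, q=0.01, n_boot=300):
    """Force censoring at the median, fit by MLE, bootstrap the q-quantile."""
    est = []
    for _ in range(n_boot):
        xb = rng.choice(x, size=len(x), replace=True)
        c = np.median(xb)                  # forced censoring point
        obs = xb[xb <= c]
        res = minimize(censored_weibull_nll, x0=[1.0, xb.mean()],
                       args=(obs, (xb > c).sum(), c), method="Nelder-Mead")
        k, lam = res.x
        est.append(lam * (-np.log(1.0 - q)) ** (1.0 / k))  # Weibull q-quantile
    return np.percentile(est, [2.5, 97.5])

print("95% bootstrap interval for the 1st percentile:", percentile_ci(ib))
```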

Estimating the design flood interval of agricultural reservoirs using a non-parametric resampling technique (비매개변수적 리샘플링 기법 기반 농업용 저수지 설계홍수량 구간 추정 기법)

  • Park, Jihoon; Kang, Moon Seong; Kim, Keuk Soo; Choi, Kyu Hyun; Cho, Hyo Seob
    • Proceedings of the Korea Water Resources Association Conference, 2021.06a, pp.397-397, 2021
  • The purpose of this study is to propose a method for interval estimation of the design flood inflow of agricultural reservoirs using a non-parametric resampling technique, as an alternative to the existing practice of making a point estimate of the design flood and then applying a safety factor. A bootstrap technique was used to perform the interval estimation, producing confidence intervals at the 95% confidence level. The spatial scope of the study is 30 agricultural reservoirs in South Korea; the temporal scope covers a historical period (2015s: 1986-2015) and future periods (2040s: 2011-2040, 2070s: 2041-2070, 2100s: 2071-2100). The 200-year frequency, 24-hour duration event was selected and analyzed as the representative result. Frequency analysis used the GEV distribution with parameters estimated by the L-moment method, and design floods were computed with the HEC-1 model. Finally, the interval estimates of the design flood were compared with the existing method of point estimation followed by a safety factor. Comparing relative changes at the 97.5th BCa percentile, the design floods obtained by interval estimation increase progressively toward the future. Agricultural reservoirs in the Han River and Geum River basins showed relatively larger changes than those in the Nakdong River basin; for some reservoirs the design flood decreased somewhat in the 2040s but increased again after the 2070s, while reservoirs in the Nakdong River basin showed little increase toward the future. This study is expected to provide a methodology that moves beyond deterministic estimation of the design flood and performs interval estimation reflecting the statistical characteristics of the data.
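
The interval-estimation step described above can be approximated in a few lines: fit a GEV distribution to an annual-maximum series and bootstrap a 95% BCa interval for the 200-year event. In the sketch below everything is an assumption for illustration: the data are random draws, the fit uses maximum likelihood rather than the paper's L-moment method, and the HEC-1 rainfall-runoff step is omitted.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# hypothetical 30-year annual-maximum series; the study's real inputs are
# design rainfalls routed to reservoir inflows through HEC-1
annmax = stats.genextreme.rvs(c=-0.1, loc=120, scale=30, size=30,
                              random_state=rng)

def return_level(sample, T=200):
    """T-year return level from a GEV fit (MLE here, L-moments in the paper)."""
    c, loc, scale = stats.genextreme.fit(sample)
    return stats.genextreme.ppf(1 - 1 / T, c, loc=loc, scale=scale)

ci = stats.bootstrap((annmax,), return_level, vectorized=False,
                     n_resamples=2000, confidence_level=0.95, method="BCa")
print("200-year estimate:", return_level(annmax))
print("95% BCa interval:", ci.confidence_interval)
```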

Hybrid Multiple Classifier Systems (하이브리드 다중 분류기시스템)

  • Kim, In-cheol
    • Journal of Intelligence and Information Systems, v.10 no.2, pp.133-145, 2004
  • Combining multiple classifiers to obtain improved performance over any individual classifier is a widely used technique. The task of constructing a multiple classifier system (MCS) involves two distinct issues: how to generate a diverse set of base-level classifiers, and how to combine their predictions. In this paper, we review the characteristics of the existing multiple classifier systems: bagging, boosting, and stacking. We then propose new MCSs: stacked bagging, stacked boosting, bagged stacking, and boosted stacking. These are hybrid MCSs that combine the advantageous characteristics of the existing ones. To evaluate the performance of the proposed schemes, we conducted experiments with nine different real-world datasets from the UCI KDD archive. The results showed the superiority of our hybrid MCSs, especially bagged stacking and boosted stacking, over the existing ones.
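
To make the naming concrete, here is a minimal sketch of one of the four hybrids, "bagged stacking" (bootstrap replicates of a stacking ensemble), written with scikit-learn (>= 1.2) rather than the paper's own implementation; the base learners, dataset, and hyperparameters are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# a stacking ensemble of heterogeneous base-level classifiers
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=5)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=5000))

# ... trained on bootstrap replicates and averaged: "bagged stacking"
bagged_stack = BaggingClassifier(estimator=stack, n_estimators=10,
                                 random_state=0)

print("stacking alone :", cross_val_score(stack, X, y, cv=5).mean())
print("bagged stacking:", cross_val_score(bagged_stack, X, y, cv=5).mean())
```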

Bayesian estimation for frequency using resampling methods (재표본 방법론을 활용한 베이지안 주파수 추정)

  • Pak, Ro Jin
    • The Korean Journal of Applied Statistics, v.30 no.6, pp.877-888, 2017
  • Spectral analysis is used to determine the frequency of time series data. We first determine the frequency of the series through the power spectrum or the periodogram and then calculate the period of any cycle that may exist in the time series. Estimating the frequency using a Bayesian technique has been developed and proven useful; however, the Bayesian estimator of the frequency cannot be obtained in closed form and must be handled numerically or computationally. In this paper, we make inferences about the Bayesian frequency both by resampling a parameter with Markov chain Monte Carlo (MCMC) methods and by resampling the data with bootstrap methods for a time series. We take the Korean real estate price index as an example of Bayesian frequency estimation. We found a difference in the periods between the sale price index and the long-term rental price index, but the difference is not statistically significant.
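
The bootstrap half of this approach can be sketched as follows: locate the dominant frequency with a periodogram, fit the corresponding sinusoid, and resample its residuals to interval-estimate the frequency. This is a classical (non-Bayesian) stand-in under illustrative assumptions: the MCMC posterior step is omitted, and a synthetic monthly series replaces the real estate index.

```python
import numpy as np
from scipy.signal import periodogram

rng = np.random.default_rng(0)
n, f_true = 240, 1 / 12          # e.g., a monthly index with an annual cycle
t = np.arange(n)
series = np.sin(2 * np.pi * f_true * t) + rng.normal(0, 0.5, n)

def peak_frequency(x):
    """Dominant frequency = location of the periodogram maximum."""
    freqs, power = periodogram(x)
    return freqs[np.argmax(power[1:]) + 1]   # skip the zero-frequency bin

# residual bootstrap: fit the dominant sinusoid, then resample its residuals
f_hat = peak_frequency(series)
design = np.column_stack([np.sin(2 * np.pi * f_hat * t),
                          np.cos(2 * np.pi * f_hat * t)])
coef, *_ = np.linalg.lstsq(design, series, rcond=None)
fitted = design @ coef
resid = series - fitted

boot = [peak_frequency(fitted + rng.choice(resid, n, replace=True))
        for _ in range(500)]
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"frequency {f_hat:.4f}, 95% bootstrap interval ({lo:.4f}, {hi:.4f})")
```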

The Effects of Civic Consciousness and Sense of Community on Happiness in Adolescents: Mediating Effects of Career Decision (중학생의 시민의식과 공동체의식이 행복감에 미치는 영향: 진로결정의 매개효과)

  • Lee, Myung-Ha; Cho, Ouk-Sun
    • Journal of Industrial Convergence, v.21 no.5, pp.97-107, 2023
  • The purpose of this study was to provide basic data for improving happiness by verifying the mediating effect of career decision in the relationships among civic consciousness, sense of community, career decision, and happiness of middle school students. The analysis used the "2020 Gen Z Teenage Values Survey" data collected by the Korea Youth Policy Institute. Among the survey subjects, 2,703 middle school students who met the purpose of this study were sampled and analyzed using the SPSS WIN 25.0 program. Frequency analysis, descriptive statistical analysis, correlation analysis, and PROCESS macro Model 4 were used to verify the mediating effect, and the indirect effects and their significance were analyzed by applying the bootstrap technique. The results showed, first, that middle school students' civic consciousness and sense of community had a positive effect on happiness. Second, in the relationship between civic consciousness and happiness, career decision had a partial mediating effect. Third, in the relationship between sense of community and happiness, career decision had a partial mediating effect. This study is meaningful in that it presents policy alternatives and practical programs to improve the happiness of middle school students.
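
The bootstrap step described here is the usual PROCESS Model 4 recipe: resample cases, re-estimate the a (X to M) and b (M to Y) paths, and check whether the percentile interval for the indirect effect a*b excludes zero. A minimal sketch with simulated data follows; the variable names and effect sizes are assumptions, not the study's estimates.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2703                                    # matches the study's sample size
civic = rng.normal(size=n)                  # predictor (civic consciousness)
career = 0.4 * civic + rng.normal(size=n)   # mediator (career decision)
happy = 0.3 * civic + 0.5 * career + rng.normal(size=n)  # outcome (happiness)

def indirect_effect(x, m, y):
    """a*b indirect effect from two OLS fits (PROCESS Model 4 structure)."""
    a = np.polyfit(x, m, 1)[0]              # slope of x -> m
    X = np.column_stack([np.ones_like(x), x, m])
    b = np.linalg.lstsq(X, y, rcond=None)[0][2]   # slope of m in y ~ x + m
    return a * b

idx = np.arange(n)
boot = []
for _ in range(2000):
    s = rng.choice(idx, n, replace=True)
    boot.append(indirect_effect(civic[s], career[s], happy[s]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect {indirect_effect(civic, career, happy):.3f}, "
      f"95% bootstrap CI ({lo:.3f}, {hi:.3f})")   # significant if 0 excluded
```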

Product Recommender Systems using Multi-Model Ensemble Techniques (다중모형조합기법을 이용한 상품추천시스템)

  • Lee, Yeonjeong; Kim, Kyoung-Jae
    • Journal of Intelligence and Information Systems, v.19 no.2, pp.39-54, 2013
  • The recent explosive growth of electronic commerce provides customers with many advantageous purchase opportunities. In this situation, customers who do not have enough knowledge about their purchases may accept product recommendations. Product recommender systems automatically reflect users' preferences and provide recommendation lists to users; thus, the product recommender system in an online shopping store has become known as one of the most popular tools for one-to-one marketing. However, recommender systems that do not properly reflect users' preferences cause disappointment and waste users' time. In this study, we propose a novel recommender system that uses data mining and multi-model ensemble techniques to enhance recommendation performance by reflecting users' precise preferences. The research data were collected from a real-world online shopping store that deals in products from famous art galleries and museums in Korea. The data initially contained 5,759 transactions, of which 3,167 remained after deletion of null records. We transform the categorical variables into dummy variables and exclude outlier data. The proposed model consists of two steps. The first step predicts customers who have a high likelihood of purchasing products in the online shopping store. In this step, we first use logistic regression, decision trees, and artificial neural networks to predict such customers in each product group, applying these data mining techniques with SAS E-Miner software. We partition the dataset into modeling and validation sets for the logistic regression and decision trees, and into training, test, and validation sets for the artificial neural network model; the validation dataset is identical across all experiments. We then combine the results of each predictor using multi-model ensemble techniques such as bagging and bumping. Bagging, short for "Bootstrap Aggregation," combines outputs from several machine learning techniques to raise the performance and stability of prediction or classification; it is a special form of the averaging method. Bumping, short for "Bootstrap Umbrella of Model Parameters," retains only the model with the lowest error value. The results show that bumping outperforms bagging and the other predictors except in the "Poster" product group, where the artificial neural network model performs best. In the second step, we use market basket analysis to extract association rules for co-purchased products. We extracted thirty-one association rules according to the lift, support, and confidence measures, setting the minimum transaction frequency to support associations at 5%, the maximum number of items in an association at 4, and the minimum confidence for rule generation at 10%. Rules with a lift value below 1 were excluded, and after removing duplicates, fifteen association rules remained. Of these, eleven rules involve associations among products in the "Office Supplies" product group, one rule links the "Office Supplies" and "Fashion" product groups, and the other three link the "Office Supplies" and "Home Decoration" product groups.
Finally, the proposed product recommender system provides a list of recommendations to the appropriate customers. We test the usability of the proposed system using a prototype with real-world transaction and profile data. To this end, we built the prototype system using ASP, JavaScript, and Microsoft Access. In addition, we surveyed user satisfaction with the recommended product list from the proposed system against randomly selected product lists. The survey participants were 173 users of MSN Messenger, Daum Café, and P2P services. We evaluated user satisfaction on a five-point Likert scale and performed a paired-sample t-test on the survey results. The results show that the proposed model outperforms the random selection model at the 1% statistical significance level; that is, users were significantly more satisfied with the recommended product list. The results also suggest that the proposed system may be useful in a real-world online shopping store.
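
Bagging has many off-the-shelf implementations, but bumping has fewer; the sketch below shows the bumping loop the abstract describes (fit one model per bootstrap replicate, keep the single model with the lowest error on the original sample). The dataset and the tree learner are illustrative stand-ins for the paper's SAS E-Miner models.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=10, random_state=0)

# bumping: train on bootstrap replicates, then retain only the single model
# with the lowest error on the original training data
best_model, best_err = None, np.inf
for _ in range(25):
    idx = rng.choice(len(X), len(X), replace=True)
    model = DecisionTreeClassifier(max_depth=4).fit(X[idx], y[idx])
    err = 1 - model.score(X, y)        # error on the full original sample
    if err < best_err:
        best_model, best_err = model, err

print(f"bumped tree, error on original sample: {best_err:.3f}")
```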

An Empirical Analysis on the Appeal Case of Origin Verification for Korean Import Goods Using Bootstrapping Technique (부트스트랩 기법을 활용한 한국 수입 상품의 원산지검증 불복사례 실증분석)

  • Kim, Jong-Hyuk; Heo, Sang-Hyun; Kim, Suk-Chul
    • Korea Trade Review, v.42 no.4, pp.93-114, 2017
  • Under an FTA agreement, preferential tariffs between FTA members result in tariff reductions. To ensure stable use of the FTA tariff system, customs authorities must be able to determine clearly whether goods qualify as originating. This study analyzed the appeal procedure under the origin verification system, based on decisions made by the Korea Customs Service and the Tax Tribunal. From this, we examined whether the rate of re-claiming differs for cases rejected in the 'Review System of the Legality Before Taxation'. In addition, we carried out a quantitative analysis using a bootstrapping technique to overcome the scarcity of origin-verification cases among FTA members. The implications of this paper are summarized as follows. First, we tested the hypothesis that the re-claiming rate of Western countries is higher. Second, some issues showed higher re-claiming rates. Third, there was no significant difference in the re-claiming rate across verification groups. Finally, even if an applicant makes a claim again, there is a possibility of it being rejected again.
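
With scarce cases, a bootstrap interval for the difference in re-claiming rates between two groups is the natural form of the paper's test. A minimal sketch with made-up counts follows; the group sizes and rates are illustrative assumptions, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(3)
# hypothetical binary outcomes (1 = case re-claimed) for two partner groups;
# the small sample sizes are what motivate the bootstrap
western = rng.binomial(1, 0.55, 40)
asian = rng.binomial(1, 0.35, 30)

obs_diff = western.mean() - asian.mean()
boot = [rng.choice(western, len(western)).mean()
        - rng.choice(asian, len(asian)).mean()
        for _ in range(5000)]
lo, hi = np.percentile(boot, [2.5, 97.5])
# the difference is significant at the 5% level if the interval excludes 0
print(f"rate difference {obs_diff:.3f}, 95% bootstrap CI ({lo:.3f}, {hi:.3f})")
```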

Deep Learning Model Validation Method Based on Image Data Feature Coverage (영상 데이터 특징 커버리지 기반 딥러닝 모델 검증 기법)

  • Lim, Chang-Nam; Park, Ye-Seul; Lee, Jung-Won
    • KIPS Transactions on Software and Data Engineering, v.10 no.9, pp.375-384, 2021
  • Deep learning techniques have proven to deliver high performance in image processing and are applied in various fields. The most widely used methods for validating a deep learning model are the holdout method, k-fold cross-validation, and the bootstrap method. These legacy methods consider the balance of the ratio between classes when dividing the data set, but do not consider the ratio of the various features that exist within the same class. If these features are not considered, validation results may be biased toward some features. We therefore propose a deep learning model validation method based on data feature coverage for image classification that improves on the legacy methods. The proposed technique defines data feature coverage, a numerical measure of how well the training and evaluation data sets reflect the features of the entire data set. With this method, the data set can be divided while guaranteeing coverage of all features of the entire data set, and the model's evaluation results can be analyzed per feature cluster. As a result, by attaching feature-cluster information to the evaluation results of the trained model, the method reveals which data features affect the trained model.
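
The abstract does not give the exact coverage formula, but the splitting idea can be approximated by clustering within-class features and stratifying the split on the cluster labels so every cluster is represented on both sides. A minimal sketch under those assumptions (random vectors stand in for real image features):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 64))  # hypothetical feature vectors, one class

# group the class into feature clusters (a stand-in for the paper's own
# feature-extraction step)
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(feats)

# stratify on cluster labels so every feature cluster is covered in both
# the training set and the evaluation set
train_idx, eval_idx = train_test_split(np.arange(len(feats)), test_size=0.2,
                                       stratify=clusters, random_state=0)

# per-feature-cluster analysis of the evaluation set, as the abstract suggests
for c in range(5):
    print(f"feature cluster {c}: {(clusters[eval_idx] == c).sum()} eval samples")
```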

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan; Han, Nam-Gi; Song, Min
    • Journal of Intelligence and Information Systems, v.20 no.2, pp.109-122, 2014
  • People nowadays create a tremendous amount of data on Social Network Services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, greatly influencing society. This is an unmatched phenomenon in history, and we now live in the Age of Big Data. SNS data satisfies the defining conditions of Big Data: volume (the amount of data), velocity (data input and output speed), and variety (the diversity of data types). Trends of issues discovered in SNS Big Data can be used as an important new source of value creation, because this information covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the needs of analyzing SNS Big Data. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) provide the topic keyword set corresponding to the daily ranking; (2) visualize the daily time-series graph of a topic over a month; (3) show the importance of a topic through a treemap based on a score system and frequency; (4) visualize the daily time-series graph of keywords retrieved by keyword search. The present study analyzes the Big Data generated by SNS in real time. SNS Big Data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process unrefined, unstructured data. It also requires current big data technology to process a large amount of real-time data rapidly, such as the Hadoop distributed system or NoSQL databases, which are alternatives to relational databases. We built TITS on Hadoop, which is designed to scale from single-node computing to thousands of machines, to optimize big data processing. Furthermore, we use MongoDB, an open-source, document-oriented NoSQL database that provides high performance, high availability, and automatic scaling. Unlike conventional relational databases, MongoDB has no schemas or tables; its most important goals are data accessibility and data-processing performance. In the Age of Big Data, visualization is attractive to the Big Data community because it helps analysts examine data easily and clearly; TITS therefore uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to any data, making interaction with the data easy and well suited to managing a real-time data stream with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is designed with these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the superiority of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including Library and Information Science (LIS), and on this basis confirm the utility of storytelling and time-series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets in Korea during March 2013.
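
The topic-modeling step that TITS depends on can be illustrated with a small LDA example in gensim; the toy tweet corpus, topic count, and hyperparameters below are all illustrative assumptions (the real pipeline applies noun extraction and stop-word removal to the tweets first, at Hadoop scale).

```python
from gensim import corpora, models

# a few toy documents standing in for preprocessed tweets
tweets = [
    "election candidate debate policy vote",
    "vote election poll candidate",
    "baseball game season pitcher strikeout",
    "pitcher strikeout baseball game",
    "policy debate economy vote",
]
texts = [t.split() for t in tweets]

dictionary = corpora.Dictionary(texts)            # token -> id mapping
corpus = [dictionary.doc2bow(t) for t in texts]   # bag-of-words vectors

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      random_state=0, passes=10)
for topic_id, words in lda.print_topics(num_words=4):
    print(topic_id, words)
```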