• Title/Summary/Keyword: Probability Theory


Accelerated Learning of Latent Topic Models by Incremental EM Algorithm (점진적 EM 알고리즘에 의한 잠재토픽모델의 학습 속도 향상)

  • Chang, Jeong-Ho;Lee, Jong-Woo;Eom, Jae-Hong
    • Journal of KIISE:Software and Applications
    • /
    • v.34 no.12
    • /
    • pp.1045-1055
    • /
    • 2007
  • Latent topic models are statistical models that automatically capture salient patterns or correlations among features underlying a data collection in a probabilistic way. They are gaining increased popularity as an effective tool for automatic semantic feature extraction from text corpora, multimedia data analysis including image data, and bioinformatics. Among the important issues for applying latent topic models effectively to massive data sets is the efficient learning of the model. This paper proposes an accelerated learning technique for the PLSA model, one of the popular latent topic models, based on an incremental EM algorithm instead of the conventional EM algorithm. The incremental EM algorithm is characterized by a series of partial E-steps performed on subsets of the entire data collection, unlike the conventional EM algorithm, in which one batch E-step is done for the whole data set. By replacing a single batch E-step and M-step with a series of partial E-steps and M-steps, the inference result for the previous data subset is directly reflected in the next inference process, which can enhance the learning speed over the entire data set. The algorithm is also advantageous in that it is guaranteed to converge to a local maximum solution and can be implemented with only a slight modification of the existing algorithm based on the conventional EM. We present the basic application of the incremental EM algorithm to the learning of PLSA and empirically evaluate the acceleration performance with several possible data partitioning methods for practical application. Experimental results on a real-world news data set show that the proposed approach achieves a meaningful improvement in the convergence rate of latent topic model learning. Additionally, we present an interesting result that supports a possible synergistic effect of combining the incremental EM algorithm with parallel computing.

Empirical Analysis on the Disparity between Willingness to Pay and Willingness to Accept for Drinking Water Risks : Using Experimental Market Method (비시장재에 대한 WTP와 WTA 격차에 대한 실증분석 : 실험시장접근법을 이용한 음용수 건강위험을 사례로)

  • Eom, Young Sook
    • Environmental and Resource Economics Review
    • /
    • v.17 no.3
    • /
    • pp.135-166
    • /
    • 2008
  • This paper reports empirical results comparing the willingness to pay (WTP) for health risk reductions and the willingness to accept (WTA) for risk increases using experimental market methods for the first time in Korea. Health risks were defined as probabilities of premature death from exposure to one of As, Pb, or THM in tap water. A total of six experimental markets, with 15 participants in each experiment, were held using 20 repetitive Vickrey second-price sealed-bid auctions. To examine the effects of market experience, trading of a marketed good, a candy bar, was introduced before trading the non-marketed good, drinking water risk. Moreover, objective risk information was provided after the first 10 trials to incorporate learning processes. Regardless of whether the good was marketed or non-marketed, the mean WTA exceeded the mean WTP at the first auction trial. As the experimental trials proceeded, the disparity between WTA and WTP for the marketed good disappeared. However, results for the non-marketed goods were rather mixed: WTA for health risks from As (a relatively high risk level) was significantly larger than WTP, while there was no significant difference between WTA and WTP for health risks from Pb and THM (relatively low risk levels). On the other hand, participants seemed to respond in a 'rational' manner to the objective risk information provided, with positive learning effects of market-like experience (especially in the WTA experiments).
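The mechanism used in the experiments, the Vickrey second-price sealed-bid auction, awards the item to the highest bidder at the second-highest bid, which makes truthful bidding a dominant strategy. A minimal sketch (function name ours):

```python
def second_price_auction(bids):
    """Vickrey second-price sealed-bid auction: the highest bidder wins
    but pays only the second-highest bid, so no bidder gains by shading
    their bid away from their true value."""
    order = sorted(range(len(bids)), key=lambda i: bids[i], reverse=True)
    winner, price = order[0], bids[order[1]]
    return winner, price
```

In the paper's design this auction is repeated 20 times per market, with the clearing price posted between trials so participants can learn.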


Development of Simulation Model for Breeding Schemes of Hanwoo(Korean Cattle) (한우의 개량 체계 모의실험을 위한 모형 개발)

  • Ju, J.C.;Kim, N.S.
    • Journal of Animal Science and Technology
    • /
    • v.44 no.5
    • /
    • pp.507-518
    • /
    • 2002
  • A multiple-trait stochastic computer simulation model was constructed to evaluate breeding schemes and selection methods for Hanwoo (Korean cattle). The model supports four kinds of selection criteria (random, phenotype, and true or estimated breeding values). In test runs with various population sizes over 20 years, all estimated parameters of each simulated population were similar to the input parameters. The deviations between input and output parameter values were smaller in large populations than in small ones. The simulated results obtained from ten small populations, each consisting of one sire and ten dams, run for 500 years were as follows. Inbreeding coefficients of the populations followed the theoretical function. Mean values of the selected traits drifted randomly by generation, but they converged to a single value as inbreeding coefficients approached one. Additive genetic variances within each population were reduced by generation and converged to zero as inbreeding coefficients approached one. These results indicate that the simulated populations preserve the statistical properties of the input parameters.
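The theoretical inbreeding curve that small simulated populations are typically checked against can be written as the textbook recursion F(t+1) = ΔF + (1 − ΔF)·F(t), with per-generation rate ΔF = 1/(8·Nm) + 1/(8·Nf) for Nm sires and Nf dams under random mating. This is the classical formula, not necessarily the exact function used in the paper:

```python
def inbreeding_by_generation(n_sires, n_dams, generations):
    """Expected inbreeding coefficient by generation under random mating,
    using the classical rate dF = 1/(8*Nm) + 1/(8*Nf).  With one sire and
    ten dams the coefficient climbs toward 1, matching the paper's
    observation that genetic variance collapses as F approaches one."""
    dF = 1.0 / (8 * n_sires) + 1.0 / (8 * n_dams)
    F, out = 0.0, []
    for _ in range(generations):
        F = dF + (1 - dF) * F   # new inbreeding plus carried-over inbreeding
        out.append(F)
    return out
```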

Two Level Bin-Packing Algorithm for Data Allocation on Multiple Broadcast Channels (다중 방송 채널에 데이터 할당을 위한 두 단계 저장소-적재 알고리즘)

  • Kwon, Hyeok-Min
    • Journal of Korea Multimedia Society
    • /
    • v.14 no.9
    • /
    • pp.1165-1174
    • /
    • 2011
  • In data broadcasting systems, servers continuously disseminate data items through broadcast channels, and a mobile client only needs to wait for the data item of interest to appear on a broadcast channel. However, because broadcast channels are shared by a large set of data items, the expected delay in receiving a desired data item may increase. This paper explores the issue of designing a proper data allocation over multiple broadcast channels to minimize the average expected delay time of all data items, and proposes a new data allocation scheme named two-level bin-packing (TLBP). The paper first introduces the theoretical lower bound on the average expected delay and determines the bin capacity based on this value. TLBP partitions all data items into a number of groups using a bin-packing algorithm and allocates each group of data items to an individual channel. By employing the bin-packing algorithm in two steps, TLBP can reflect the variation of access probabilities among data items allocated to the same channel in the broadcast schedule, and thus enhance performance. Simulations were performed to compare TLBP with three existing approaches. The results show that TLBP outperforms the others in terms of average expected delay time at a reasonable execution overhead.
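The objective being minimized can be made concrete with a small greedy stand-in (not the paper's two-level algorithm): with a flat round-robin schedule per channel, an item on a channel carrying n items waits n/2 broadcast slots on average, so the average expected delay is the access-probability-weighted sum of those waits.

```python
def avg_expected_delay(channels, probs):
    """Flat round-robin per channel: an item on a channel of n items
    waits n/2 broadcast slots on average."""
    return sum(probs[i] * len(ch) / 2 for ch in channels for i in ch)

def greedy_allocate(probs, k):
    """Greedy sketch of multi-channel allocation (illustrative only):
    take items in decreasing access probability and put each on the
    channel where it increases the average expected delay the least.
    Adding item i to a channel with n items and probability mass P
    raises the delay by (P + (n + 1) * p_i) / 2."""
    channels = [[] for _ in range(k)]
    psum = [0.0] * k
    for i in sorted(range(len(probs)), key=probs.__getitem__, reverse=True):
        deltas = [(psum[c] + (len(channels[c]) + 1) * probs[i]) / 2
                  for c in range(k)]
        c = min(range(k), key=deltas.__getitem__)
        channels[c].append(i)
        psum[c] += probs[i]
    return channels
```

Skewed access patterns are exactly where splitting hot items away from cold ones pays off, which is the effect TLBP's second-level packing exploits within each channel.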

Bootstrap estimation of the standard error of treatment effect with double propensity score adjustment (이중 성향점수 보정 방법을 이용한 처리효과 추정치의 표준오차 추정: 붓스트랩의 적용)

  • Lim, So Jung;Jung, Inkyung
    • The Korean Journal of Applied Statistics
    • /
    • v.30 no.3
    • /
    • pp.453-462
    • /
    • 2017
  • Double propensity score adjustment is an analytic solution to address bias due to incomplete matching. However, it is difficult to estimate the standard error of the estimated treatment effect when using double propensity score adjustment. In this study, we propose two bootstrap methods to estimate the standard error. The first is a simple bootstrap method that draws bootstrap samples from the propensity-score-matched sample and estimates the standard error from those bootstrapped samples. The second is a complex bootstrap method that draws bootstrap samples from the original sample first and then applies propensity score matching to each bootstrapped sample. We examined the performance of the two methods using simulations under various scenarios. The standard error estimates from the complex bootstrap were closer to the empirical standard error than those from the simple bootstrap; the simple bootstrap method tended to underestimate. In addition, the coverage rates of a 95% confidence interval using the complex bootstrap were closer to the advertised rate of 0.95. We applied the two methods to a real data example and again found that the standard error estimate from the simple bootstrap was smaller than that from the complex bootstrap.

Determining Transit Vehicle Dispatching Time (최적 배차시각 설정에 관한 해석적 연구)

  • Park, Jun-Sik;Go, Seung-Yeong;Kim, Jeom-San;Gwon, Yong-Seok
    • Journal of Korean Society of Transportation
    • /
    • v.25 no.3
    • /
    • pp.137-144
    • /
    • 2007
  • This study takes an analytical approach to determining transit dispatching schedules (headways). Determining a timetable is an important process in transit system planning. In general, the transit headway should be shorter during peak hours than at non-peak hours for demand-responsive service; this allows passengers to minimize their waiting time under inelastic, fixed demand conditions. The transit headway should be longer as operating costs increase, and shorter as demand and the cost of waiting time increase. The optimal headway depends on the amount of ridership, and each individual vehicle dispatching time depends on the distribution of the ridership. This study provides a theoretical foundation for a dispatching scheme consistent with common sense. Previous research suggested a dispatching scheme with even headways; however, according to this research, that is valid only for the specific case in which the demand pattern is uniform. This study is a general analysis that expands on that previous research, and it suggests an easy method to set a timetable without complex and difficult calculation. Further, if the time axis is exchanged for a space axis, this study could be extended to address the spacing problems of facilities such as roads, stations, and routes.
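The qualitative results above (headway shortens as demand or waiting cost grows, lengthens as operating cost grows) are captured by the classical square-root headway obtained by minimizing dispatch cost per unit time, c/h, plus expected passenger waiting cost, v·q·h/2. The notation here is ours, not the paper's:

```python
import math

def optimal_headway(dispatch_cost, wait_value, demand_rate):
    """Minimize total cost  c/h + v*q*h/2  over headway h.
    Setting the derivative -c/h^2 + v*q/2 to zero gives
    h* = sqrt(2c / (v*q)): longer headway when dispatching is costly,
    shorter when demand or the value of waiting time is high."""
    return math.sqrt(2 * dispatch_cost / (wait_value * demand_rate))
```

The paper's contribution is the generalization of this kind of result to non-uniform demand, where individual dispatch times follow the demand distribution instead of being evenly spaced.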

Study on Success and Failure of Diversification Based on Neo-Schumpeterian Perspective: Samsung's Three Diversification Cases in the Semiconductor Industry (네오슘페터주의 관점에서 바라본 다각화의 성공과 실패: 삼성 반도체사업의 세 가지 다각화 사례 연구)

  • Park, Tae-Young
    • Journal of Technology Innovation
    • /
    • v.18 no.2
    • /
    • pp.175-219
    • /
    • 2010
  • Since diversification can be a necessary means for a company's survival and the preservation of its success, hundreds of studies have been done over 30 years by three schools: industrial economics, strategic management, and Neo-Schumpeterian economics. However, no school has presented a model that comprehensively explains the success or failure of diversification. This study suggests a theoretical framework integrating findings from the three schools. The framework considers both a firm's technological capabilities and sector-specific characteristics, and it reflects the Neo-Schumpeterian view emphasizing technological aspects. The goal of the study is to identify major reasons for success and failure in a company's diversification by studying three diversification cases at Samsung. Our findings show that the diversification into TFT-LCD was easier and more successful than the diversification into microprocessors because DRAM is more similar to TFT-LCD than to microprocessors. Samsung also tended to build only the types of capabilities that originated from capabilities accumulated in the DRAM business. These findings offer a lesson for a firm's strategists: they can increase the probability of success in diversification if they simultaneously consider a new sector's characteristics, the firm's technological capabilities accumulated in old sectors, and the applicability of those old capabilities to the new sector.


Implications of Cohabitation for the Korean Family: Cohabiter Characteristics Based on National Survey Data (동거와 한국가족: 전국조사에서 나타난 동거자의 특성)

  • Lee, Yean-Ju
    • Korea journal of population studies
    • /
    • v.31 no.2
    • /
    • pp.77-100
    • /
    • 2008
  • This study explores the implications of increasing cohabitation for the Korean family by comparing the characteristics of cohabiters with those of married couples and of never-married and divorced people. Data are from the Marriage Registration Files for the years 1997 through 2005 and the Social Statistics Survey conducted in 2006. Results from descriptive statistics and logit analysis generally confirm the predictions of the western literature. First, cohabitation is part of overall changes in the family system: cohabitation is more prevalent among the previously married than among the never married. Second, the socioeconomic status of cohabiting men is lower than that of married men. Third, judging by spouses' employment status, educational levels, and age differences, gender roles are more egalitarian among cohabiting couples than among married couples. The finding that cohabiter characteristics are not similar to those of married couples seems to suggest that cohabitation does not simply represent a cautious trial of marriage, unlike what most media articles assume. Instead, cohabitation may signify unconventional circumstances forcing the couple to choose it as an alternative to marriage, even temporarily. This and other conjectures discussed in this paper need to be reexamined with more rigorous data, as the increasing trend of cohabitation seems inevitable in the coming years.

Quantification of Directional Properties of Channel Network and Hill Slope (하천망과 사면의 방향성 정량화)

  • Park, Changyeol;Yoo, Chulsang
    • Korean Society of Hazard Mitigation: Conference Proceedings
    • /
    • 2011.02a
    • /
    • pp.211-211
    • /
    • 2011
  • Topography is an important factor determining a basin's runoff response to rainfall, so attempts to use geomorphological factors of a basin in hydrologic analysis have a long history (Rodriguez-Iturbe and Valdes, 1979). The channel network and the hill slope are representative components of topography, and the way they are combined naturally produces differences in runoff characteristics (Zevenbergen and Thorne, 1987; Brierley and Fryirs, 2005). This study therefore quantifies the directional properties of hill slopes and channel networks in a river basin and examines the relationship between the two. If slope and channel directionality can be quantified with a consistent relationship, these properties can be incorporated into rainfall-runoff models more simply; for example, slope and channel directions expressed as probability density functions can be reinterpreted within the GIUH framework, ultimately making it possible to identify differences in runoff response due to storm direction. The study area was the Naeseong stream basin. Digital topographic maps of the basin were collected to build a DEM, and the channel network was extracted with the Hydro Tool in ArcGIS. The directionality of the channel network was quantified by fitting a von Mises distribution, and the directional structure of the channel network and hill slopes was examined so that these characteristics could be flexibly considered in rainfall-runoff models. The results are summarized as follows. (1) The von Mises distribution considered in this study adequately represents the directional properties of the channel network. Channel directions organized by azimuth show one clear mode, while directions organized relative to the downstream channel at confluences show two clear modes. (2) The directionality of the channel network has a clear relationship with that of the hill slopes. The directional coupling of channel network and hill slopes describes basin characteristics more realistically, and if channel directionality is quantified under this assumed relationship, these properties can easily be reflected in rainfall-runoff models. (3) Higher-order channels show a more pronounced modal direction. This agrees with previous findings that the directions of higher-order channels align well with the major structural lineaments of the Korean peninsula. (4) The principal direction of the channel network is strongly affected by channel length, because channel runoff has a relatively larger influence on basin response than hillslope runoff. To incorporate channel-network directionality into a rainfall-runoff model, quantifying directionality weighted by channel length is expected to yield a runoff response that depends more clearly on storm direction. (5) The quantification scheme considered here can be used not only in runoff models but also to characterize runoff response quantitatively: directions organized by azimuth can be applied to runoff models of actual basins, while directions organized at confluences can be used to examine quantitatively how runoff responds to storm direction.
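The core quantification step, fitting a von Mises distribution to a sample of channel-segment azimuths, can be sketched with a moment fit: the mean direction comes from the resultant vector, and the concentration parameter kappa from the Best-Fisher approximation. This omits the mixture treatment that the bimodal confluence-relative directions would need.

```python
import math

def fit_von_mises(angles_rad):
    """Moment fit of a von Mises distribution to directional data
    (e.g. stream-segment azimuths in radians): mean direction mu from
    the resultant vector, concentration kappa via the Best-Fisher
    piecewise approximation.  Large kappa = tightly clustered
    directions (a pronounced mode); kappa near 0 = nearly uniform."""
    n = len(angles_rad)
    C = sum(math.cos(a) for a in angles_rad) / n
    S = sum(math.sin(a) for a in angles_rad) / n
    mu = math.atan2(S, C)        # mean direction
    R = math.hypot(C, S)         # mean resultant length, in [0, 1]
    if R < 0.53:
        kappa = 2 * R + R ** 3 + 5 * R ** 5 / 6
    elif R < 0.85:
        kappa = -0.4 + 1.39 * R + 0.43 / (1 - R)
    else:
        kappa = 1 / (R ** 3 - 4 * R ** 2 + 3 * R)
    return mu, kappa
```

The paper's finding that higher-order channels show a sharper modal direction would appear here as a larger fitted kappa for the higher-order subsample.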


The Significance Test on the AHP-based Alternative Evaluation: An Application of Non-Parametric Statistical Method (AHP를 이용한 대안 평가의 유의성 분석: 비모수적 통계 검정 적용)

  • Park, Joonsoo;Kim, Sung-Chul
    • The Journal of Society for e-Business Studies
    • /
    • v.22 no.1
    • /
    • pp.15-35
    • /
    • 2017
  • The method of weighted-sum evaluation using AHP is widely used in feasibility analysis and alternative selection. Final scores are given as weighted sums, and the alternative with the largest score is selected. With two alternatives, as in feasibility analysis, a final score greater than 0.5 determines the selection, but the question remains of how much greater than 0.5 is large enough. KDI suggested the concept of a 'grey area', with scores between 0.45 and 0.55, in which decisions are to be made with caution, but this lacks theoretical background. Statistical testing was introduced in some studies to answer the question; these assumed particular probability distributions but did not establish their validity. We examine various cases of weighted-sum evaluation scores and show why statistical testing has to be introduced. We suggest a non-parametric testing procedure that does not assume a specific distribution. A case study is conducted to analyze the validity of the suggested testing procedure. We conclude with remarks on the implications of the analysis and future directions for research.
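As one concrete distribution-free procedure in the spirit of the paper (the authors' own test may differ), an exact two-sided sign test on paired evaluators' scores for two alternatives needs only binomial coefficients and makes no assumption about how the scores are distributed:

```python
from math import comb

def sign_test_p(scores_a, scores_b):
    """Two-sided exact sign test on paired scores: count how many
    evaluators rank A above B, drop ties, and compute the binomial
    tail probability under the null that either ordering is equally
    likely.  A small p-value means A's lead over B is unlikely to be
    chance, regardless of the score distribution."""
    wins = sum(a > b for a, b in zip(scores_a, scores_b))
    losses = sum(a < b for a, b in zip(scores_a, scores_b))
    n = wins + losses                     # ties are dropped
    k = max(wins, losses)
    tail = sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Applied to AHP, the paired scores would be each evaluator's weighted-sum scores for the two alternatives, so a score of 0.52 vs 0.48 is declared significant only if enough evaluators independently agree on the ordering.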