• Title/Summary/Keyword: 확률적 매칭 (stochastic matching)

Voice Dialing system using Stochastic Matching (확률적 매칭을 사용한 음성 다이얼링 시스템)

  • 김원구
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2004.04a / pp.515-518 / 2004
  • This paper presents a method that improves the performance of a personal voice dialing system that uses speaker-independent phoneme HMMs. Because such a system stores only the phone transcription of each input utterance, it greatly reduces the required storage space. However, it performs worse than a system that uses speaker-dependent models, owing to the phone recognition errors that arise with speaker-independent models. To solve this problem, a new method is presented that jointly estimates the transformation vectors for speaker adaptation and the transcriptions from training utterances. The biases and transcriptions are estimated iteratively from each user's training data with a maximum likelihood approach to stochastic matching using speaker-independent phone models. Experimental results show that the proposed method is superior to the conventional method, which uses transcriptions only.
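
The bias-estimation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes a single additive bias vector, diagonal-covariance Gaussian states, and a frame-to-state alignment that is already known; the function name `estimate_bias` and its arguments are hypothetical and are not taken from the paper. In the joint scheme described above, a step like this would alternate with re-decoding the transcription.

```python
def estimate_bias(frames, means, variances):
    """Maximum likelihood estimate of an additive bias b, assuming each
    frame x_t is drawn from N(mean_t + b, diag(variance_t)).

    frames, means, variances: lists of equal-length float lists, where
    means[t] and variances[t] belong to the state aligned with frames[t].
    """
    dim = len(frames[0])
    numerator = [0.0] * dim
    denominator = [0.0] * dim
    for x, mu, var in zip(frames, means, variances):
        for d in range(dim):
            # Each frame contributes its residual, weighted by inverse variance.
            numerator[d] += (x[d] - mu[d]) / var[d]
            denominator[d] += 1.0 / var[d]
    return [n / d for n, d in zip(numerator, denominator)]
```

With unit variances the estimate reduces to the mean residual between the observed frames and the aligned model means.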

Direction Augmented Probabilistic Scan Matching for Reliable Localization (신뢰성 높은 위치 인식을 위하여 방향을 고려한 확률적 스캔 매칭 기법)

  • Choi, Min-Yong;Choi, Jin-Woo;Chung, Wan-Kyun
    • Journal of Institute of Control, Robotics and Systems / v.17 no.12 / pp.1234-1239 / 2011
  • Scan matching is widely used in the localization and mapping of mobile robots. This paper presents a probabilistic scan matching method. To improve scan matching performance, the direction of each data point is incorporated into the matching. The direction of a data point is calculated from a line fitted to its neighboring data points. With this addition, matching performance improved: the number of iterations decreased, and the tolerance to large rotations between scans increased. Experiments on real laser range finder data verified the performance of the proposed direction-augmented probabilistic scan matching algorithm.
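
As an illustration of computing a point direction from a line fitted to neighboring data, the sketch below fits a total-least-squares line to a small neighborhood and returns its orientation. The choice of total least squares, the neighborhood selection, and the name `point_direction` are assumptions; the abstract does not specify how the line is fitted or how the direction enters the matching.

```python
import math


def point_direction(neighbors):
    """Orientation (radians) of the total-least-squares line through a
    list of (x, y) neighbor points. The result lies in (-pi/2, pi/2]."""
    n = len(neighbors)
    mean_x = sum(p[0] for p in neighbors) / n
    mean_y = sum(p[1] for p in neighbors) / n
    # Second-order central moments of the neighborhood.
    sxx = sum((p[0] - mean_x) ** 2 for p in neighbors)
    syy = sum((p[1] - mean_y) ** 2 for p in neighbors)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in neighbors)
    # Angle of the principal axis of the scatter matrix.
    return 0.5 * math.atan2(2.0 * sxy, sxx - syy)
```

A line direction is only defined up to a half turn, so any comparison of two directions in a matcher would need to treat angles modulo pi.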

An Interpretable Log Anomaly System Using Bayesian Probability and Closed Sequence Pattern Mining (베이지안 확률 및 폐쇄 순차패턴 마이닝 방식을 이용한 설명가능한 로그 이상탐지 시스템)

  • Yun, Jiyoung;Shin, Gun-Yoon;Kim, Dong-Wook;Kim, Sang-Soo;Han, Myung-Mook
    • Journal of Internet Computing and Services / v.22 no.2 / pp.77-87 / 2021
  • With the development of the Internet and personal computers, varied and complex attacks have begun to emerge. As attacks become more complex, signature-based detection becomes difficult, which has led to research on behavior-based log anomaly detection. Recent work uses deep learning to learn the order of log events and shows good performance. Despite this performance, it provides no explanation for its predictions. The lack of explanation makes it difficult to find data contamination or vulnerabilities in the model itself, and as a result users lose trust in the model. To address this problem, this work proposes an explainable log anomaly detection system. Log parsing is performed first. Sequential rules are then extracted using Bayesian posterior probability, yielding a rule set of the form "If condition then result, posterior probability". A sample that matches the rule set is classified as normal; otherwise, it is classified as an anomaly. We use the HDFS dataset for the experiment and obtain an F1 score of 92.7% on the test dataset.
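
The "If condition then result, probability" rule format can be illustrated with a deliberately simplified sketch. It replaces closed sequential pattern mining with plain counts of adjacent event pairs and uses the relative frequency of the next event given the current one as a stand-in for the posterior probability; the function names and the `min_prob` threshold are hypothetical.

```python
from collections import Counter


def extract_rules(normal_sequences, min_prob=0.5):
    """Return {(condition, result): probability} for adjacent event pairs
    whose estimated conditional probability is at least min_prob."""
    pair_counts = Counter()
    condition_counts = Counter()
    for seq in normal_sequences:
        for current, following in zip(seq, seq[1:]):
            pair_counts[(current, following)] += 1
            condition_counts[current] += 1
    rules = {}
    for (current, following), count in pair_counts.items():
        prob = count / condition_counts[current]
        if prob >= min_prob:
            rules[(current, following)] = prob
    return rules


def is_normal(sequence, rules):
    """A sequence is normal only if every adjacent pair matches a rule."""
    return all(pair in rules for pair in zip(sequence, sequence[1:]))
```

Because each decision traces back to a specific rule and its probability, a rejected sample can be explained by naming the transition that no rule covers.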

Bootstrap estimation of the standard error of treatment effect with double propensity score adjustment (이중 성향점수 보정 방법을 이용한 처리효과 추정치의 표준오차 추정: 붓스트랩의 적용)

  • Lim, So Jung;Jung, Inkyung
    • The Korean Journal of Applied Statistics / v.30 no.3 / pp.453-462 / 2017
  • Double propensity score adjustment is an analytic solution to address bias due to incomplete matching. However, it is difficult to estimate the standard error of the estimated treatment effect when using double propensity score adjustment. In this study, we propose two bootstrap methods to estimate the standard error. The first is a simple bootstrap method that draws bootstrap samples from the sample already matched on the propensity score and estimates the standard error from the bootstrapped samples. The second is a complex bootstrap method that draws bootstrap samples first from the original sample and then applies propensity score matching to each bootstrapped sample. We examined the performance of the two methods using simulations under various scenarios. The estimates of the standard error using the complex bootstrap were closer to the empirical standard error than those using the simple bootstrap. The simple bootstrap method tended to underestimate the standard error. In addition, the coverage rates of a 95% confidence interval using the complex bootstrap were closer to the nominal rate of 0.95. We applied the two methods to a real data example and also found that the estimate of the standard error using the simple bootstrap was smaller than that using the complex bootstrap.
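
The difference between the two resampling schemes can be sketched as follows. This is a simplified illustration under several assumptions: each unit is a hypothetical `(treated, score, outcome)` tuple with a precomputed propensity score, matching is 1:1 nearest neighbor with reuse of controls, and the effect is the mean matched difference. The second (regression) adjustment on the propensity score that gives the method its name is omitted.

```python
import random
import statistics


def matched_pairs(units):
    """Match each treated unit to the control with the closest score."""
    treated = [u for u in units if u[0] == 1]
    controls = [u for u in units if u[0] == 0]
    return [(t, min(controls, key=lambda c: abs(c[1] - t[1]))) for t in treated]


def effect(pairs):
    """Mean outcome difference over matched (treated, control) pairs."""
    return statistics.mean(t[2] - c[2] for t, c in pairs)


def simple_bootstrap_se(units, n_boot, rng):
    """Match once, then resample the matched pairs."""
    pairs = matched_pairs(units)
    estimates = []
    for _ in range(n_boot):
        resampled = [rng.choice(pairs) for _ in pairs]
        estimates.append(effect(resampled))
    return statistics.stdev(estimates)


def complex_bootstrap_se(units, n_boot, rng):
    """Resample the original units, then redo the matching each time."""
    estimates = []
    while len(estimates) < n_boot:
        resampled = [rng.choice(units) for _ in units]
        if len({u[0] for u in resampled}) < 2:
            continue  # skip resamples lacking treated or control units
        estimates.append(effect(matched_pairs(resampled)))
    return statistics.stdev(estimates)
```

The complex scheme repeats the matching inside every replicate, so the variability introduced by the matching step itself is reflected in the estimate, which the simple scheme holds fixed.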

A Fast Motion Estimation Algorithm using Probability Distribution of Motion Vector and Adaptive Search (움직임벡터의 확률분포와 적응적인 탐색을 이용한 고속 움직임 예측 알고리즘)

  • Park, Seong-Mo;Ryu, Tae-Kyung;Kim, Jong-Nam
    • Journal of KIISE:Information Networking / v.37 no.2 / pp.162-165 / 2010
  • In this paper, we propose an algorithm that significantly reduces unnecessary computations while keeping prediction quality close to that of the full search. The proposed algorithm removes unnecessary computations efficiently by selecting different search patterns and block-matching error criteria according to the probability distribution of motion vectors. It requires only 20~30% of the computation, with a prediction quality loss of about 0~0.02 dB, compared with the fast full search of the H.264 reference software. Our algorithm will be useful for real-time video coding applications using the MPEG-2/4 AVC standards.
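
Block matching over a restricted candidate set can be sketched as below. The sum of absolute differences (SAD) is used as the error criterion, and the `candidates` list stands in for whatever set of likely motion vectors an adaptive scheme would select; how the paper derives its search patterns and criteria from the motion vector distribution is not given in the abstract, so these names and choices are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def best_motion_vector(cur, ref, bx, by, size, candidates):
    """Return (cost, (dx, dy)) for the cheapest in-bounds candidate,
    or None if no candidate keeps the block inside the reference frame."""
    height, width = len(ref), len(ref[0])
    best = None
    for dx, dy in candidates:
        inside = (0 <= bx + dx and bx + dx + size <= width
                  and 0 <= by + dy and by + dy + size <= height)
        if not inside:
            continue
        cost = sad(cur, ref, bx, by, dx, dy, size)
        if best is None or cost < best[0]:
            best = (cost, (dx, dy))
    return best
```

Evaluating a short candidate list instead of every displacement in the search window is where the computational saving comes from; the quality loss depends on how often the true vector falls outside the list.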

Estimation and Sensitivity Analysis on the Effect of Job Training for Non-Regular Employees (비정규직 직업훈련효과 추정과 민감도 분석)

  • Lee, Sang-Jun
    • The Korean Journal of Applied Statistics / v.25 no.1 / pp.163-181 / 2012
  • This paper studies the effect of job training for non-regular employees in the Korean labor market. Using an economically active population data set of Statistics Korea, we apply non-parametric matching and a sensitivity analysis to measure the effect of training for non-regular employees and to assess the impact of an unobservable variable or confounding factor on the selection effect and the outcome effect. Our empirical results indicate that training for non-regular employees has a stronger effect on obtaining a regular job than on wages; in addition, unobservable variables or confounding factors do not exert a statistically strong influence on the baseline ATT.

Estimation of Mass Rapid Transit Passenger's Train Choice Using a Mixture Distribution Analysis (통행시간 기반 혼합분포모형 분석을 통한 도시철도 승객의 급행 탑승 여부 추정 연구)

  • Jang, Jinwon;Yoon, Hosang;Park, Dongjoo
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.20 no.5 / pp.1-17 / 2021
  • Identifying the exact train and the type of train boarded by passengers is cumbersome in practice. Previous studies identified the train boarded by each passenger by matching Automated Fare Collection (AFC) data with the train schedule diagram. However, this approach has been shown to be inefficient because the exact train boarded by a considerable number of passengers cannot be accurately determined. In this study, we demonstrate that the AFC data - diagram matching technique could not estimate the train type selected by 28% of the passengers using Seoul Metro Line 9. To obtain more accurate results, this paper develops a two-step method for estimating the train type boarded by passengers, applying the AFC data - diagram matching method followed by a mixture distribution analysis. From the analysis, we derived reasonable classification points separating express train users from non-users, based on 298 origin-destination pairs that satisfied the verification criteria of this study.
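
A mixture distribution analysis of travel times can be sketched with a two-component model fitted by expectation-maximization. The Gaussian component shape, the initialization at the minimum and maximum travel time, and the function names are assumptions made for illustration; the abstract does not state which mixture family or fitting procedure the study used.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_two_component_mixture(times, iterations=50):
    """Fit a two-component Gaussian mixture to travel times with EM.
    Component 0 starts at the shortest time (e.g. express riders),
    component 1 at the longest. Returns (weights, means, variances)."""
    n = len(times)
    overall_mean = sum(times) / n
    overall_var = sum((t - overall_mean) ** 2 for t in times) / n
    weights = [0.5, 0.5]
    means = [min(times), max(times)]
    variances = [overall_var, overall_var]
    for _ in range(iterations):
        # E-step: posterior probability of each component for each time.
        resp = []
        for t in times:
            p = [weights[k] * normal_pdf(t, means[k], variances[k]) for k in range(2)]
            total = sum(p)
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weights, means, and variances.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * t for r, t in zip(resp, times)) / nk
            var = sum(r[k] * (t - means[k]) ** 2 for r, t in zip(resp, times)) / nk
            variances[k] = max(var, 1e-6)  # guard against a collapsed component
    return weights, means, variances
```

A classification point for one origin-destination pair could then be taken as the travel time at which the two weighted component densities are equal.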

Inversion of Acoustical Properties of Sedimentary Layers from Chirp Sonar Signals (Chirp 신호를 이용한 해저퇴적층의 음향학적 특성 역산)

  • 박철수;성우제
    • The Journal of the Acoustical Society of Korea / v.18 no.8 / pp.32-41 / 1999
  • In this paper, an inversion method using chirp signals and two near-field receivers is proposed. Inversion problems can be formulated as probabilistic models composed of signals, a forward model, and noise. The forward model used to simulate chirp signals is the source-wavelet-convolution plane-wave modeling method. The solution of the inversion problem is defined by the a posteriori pdf. The wavelet matching technique, using weighted least-squares fitting, estimates the sediment sound speed and thickness, which are used to determine the ranges of the a priori uniform distribution. The genetic algorithm can be applied to a global optimization problem to find a maximum a posteriori solution within the determined a priori search space. Here the objective function is defined by the L₂ norm of the difference between measured and modeled signals. The observed signals can be separated into a set of two signals reflected from the upper and lower boundaries of a sediment layer. The separation of signals and successive applications of the genetic algorithm optimization process reduce the search space, thereby improving the inversion results. Both the marginal pdf and the statistics are calculated by numerical evaluation of integrals using the samples selected during the importance sampling process of the genetic algorithm. The examples presented here show that, for synthetic data with noise, it is possible to carry out an inversion for sedimentary layers using the proposed method.

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • To discover significant social issues that urgently need to be solved in modern society, such as unemployment, economic crisis, and social welfare, researchers in the existing approach usually collect opinions from professional experts and scholars through online or offline surveys. However, this method is not always effective. Because of the expense, a large number of survey replies is seldom gathered. In some cases, it is also hard to find experts who deal with specific social issues. Thus, the sample set is often small and may be biased. Furthermore, several experts may draw totally different conclusions about a social issue because each has a subjective point of view and a different background. In this case, it is considerably hard to figure out what the current social issues are and which ones are really important. To overcome the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects keywords representing social issues and problems from about 1.3 million news articles published by about 10 major domestic news outlets in Korea from June 2009 until July 2012. Our proposed system consists of (1) collecting and extracting texts from the collected news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models. The goal of our proposed matching algorithm is to best match paragraphs to each topic. 
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA produces a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 as "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society. In other words, by looking only at social keywords, we have no idea of the detailed events occurring in our society. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) topic terms and (ii) their probability values. Specifically, given a set of text documents, we segment each text document into paragraphs. In the meantime, using LDA, we extract a set of topics from the text documents. Based on our matching process, each paragraph is assigned to the topic it best matches. Finally, each topic has several best-matched paragraphs. Furthermore, suppose there are a topic (e.g., Unemployment Problem) and its best-matched paragraph (e.g., Up to 300 workers lost their jobs in XXX company at Seoul). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly. 
Through this prototype system, we have detected various social issues appearing in our society and have also shown the effectiveness of our proposed methods in our experimental results. Note that our proof-of-concept system is available at http://dslab.snu.ac.kr/demo.html.
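
The idea of scoring a paragraph against a topic from the topic's terms and their probability values can be sketched as follows. The scoring rule (sum of the topic probabilities of the paragraph's tokens, normalized by paragraph length), the whitespace tokenizer, and the function names are assumptions for illustration; the abstract does not give the actual formula of the matching algorithm. The usage note reuses the Topic1 example from the abstract.

```python
def topic_score(paragraph, topic):
    """Average topic-term probability over the tokens of a paragraph.
    topic maps each term to its probability under that topic."""
    tokens = paragraph.lower().split()
    if not tokens:
        return 0.0
    return sum(topic.get(token, 0.0) for token in tokens) / len(tokens)


def best_topic(paragraph, topics):
    """Return the label of the topic with the highest score.
    topics maps each label to its {term: probability} dictionary."""
    return max(topics, key=lambda label: topic_score(paragraph, topics[label]))
```

For instance, with `{"Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3}}` plus any second topic that shares no terms, a paragraph mentioning layoffs and unemployment would be assigned to "Unemployment Problem". A real system would apply Korean morphological analysis rather than whitespace splitting.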

The Effect of the Extended Benefit Duration on the Aggregate Labor Market (실업급여 지급기간 변화의 효과 분석)

  • Moon, Weh-Sol
    • KDI Journal of Economic Policy / v.32 no.1 / pp.131-169 / 2010
  • I develop a matching model in which risk-averse workers face borrowing constraints and make a labor force participation decision as well as a job search decision. A sharp distinction is made between unemployment and being out of the labor force: those who look for work for a certain period but find no job are classified as unemployed, and those who do not look for work are classified as out of the labor force. In the model, the job search decision consists of two steps. First, each individual who is not working obtains information about employment opportunities. Second, each individual who decides to search has to take costly actions to find a job. Since individuals differ with respect to asset holdings, they have different reservation job-finding probabilities, at which an individual is indifferent between searching and not searching. Individuals who have large asset holdings, and are thereby less likely to participate in the labor market, have a high reservation job-finding probability, and they are less likely to search if they have lower-quality information. In other words, if individuals with large asset holdings search for a job, they must have very high-quality information and face a very high actual job-finding probability. On the other hand, individuals with small asset holdings have a low reservation job-finding probability and are likely to search even with lower-quality information. They face a very low actual job-finding probability and seem to remain unemployed for a long time. Therefore, differences in the quality of information explain heterogeneous job search decisions among individuals as well as the higher job-finding probability for those who reenter the labor market than for those who remain in the labor force. The effect of extending the maximum duration of unemployment insurance benefits on the aggregate labor market and labor market flows is investigated. The benchmark benefit duration is set to three months. 
As the maximum benefit duration is extended up to six months, the employment-population ratio decreases while the unemployment rate increases, because individuals who are eligible for benefits have strong incentives to remain unemployed and decide to search even if they obtain lower-quality information, which leads to a low job-finding probability and then a high unemployment rate. Then, the vacancy-unemployment ratio decreases and, in turn, the job-finding probability decreases for both the unemployed and those out of the labor force. Finally, the outflow from nonparticipation decreases with benefit duration because the equilibrium job-finding probability decreases. As the job-finding probability decreases, those who are out of the labor force are less likely to search for the same quality of information. I also consider the matching model with two states, employment and unemployment. Compared to the results of the two-state model, the simulated effects of changes in benefit duration on the aggregate labor market and the labor market flows are quite large and significant.