• Title/Abstract/Keyword: Unsupervised Probabilistic Model

Search results: 8 items

확률모형과 수식정보를 이용한 와/과 병렬사구 범위결정 (Range Detection of Wa/Kwa Parallel Noun Phrase using a Probabilistic Model and Modification Information)

  • 최용석;신지애;최기선
    • 한국정보과학회논문지:소프트웨어및응용 / Vol.35 No.2 / pp.128-136 / 2008
  • As an early stage of Korean syntactic analysis, resolving parallel structures can improve parsing efficiency. This paper proposes an unsupervised, language-independent probabilistic model for parallel structure analysis. The model is based on the symmetry and interchangeability of parallel structures: symmetry means that the same structure is repeated, and interchangeability means that the left and right constituents can be exchanged without changing the meaning. Parallel structures generally exhibit symmetry, but depending on the nature of the modifiers, asymmetric structures in which a modifier attaches to only one side also occur. To handle such asymmetric parallel structures, statistics on modification relations are used in addition. This paper focuses on applying the proposed model to Korean noun phrase parallel structures formed with the conjunctive particle "wa/kwa" [1]. Experiments show that it is more effective than other models, including a supervised one.
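
As a concrete illustration of the symmetry/interchangeability idea, here is a minimal, hypothetical sketch (not the authors' implementation): candidate conjunct boundaries around the wa/kwa particle are scored by how little the probability of the POS-tag sequence changes when the two conjuncts are swapped, plus a crude symmetry term. The bigram model, tagset, and scoring weights are all assumptions made for illustration.

```python
from collections import Counter
from math import log

class BigramLM:
    """Tiny add-one-smoothed bigram model over POS tags; a stand-in for the
    paper's probability model (corpus and tagset are hypothetical)."""
    def __init__(self, tagged_sents):
        self.uni, self.bi = Counter(), Counter()
        for tags in tagged_sents:
            seq = ["<s>"] + list(tags) + ["</s>"]
            self.uni.update(seq[:-1])
            self.bi.update(zip(seq, seq[1:]))
        self.V = len(self.uni) + 1

    def logprob(self, tags):
        seq = ["<s>"] + list(tags) + ["</s>"]
        return sum(log((self.bi[(a, b)] + 1) / (self.uni[a] + self.V))
                   for a, b in zip(seq, seq[1:]))

def swap_score(lm, tags, conj_idx, left_start, right_end):
    """Score a candidate parallel NP spanning tags[left_start:right_end+1]
    with the conjunctive particle at conj_idx.  Interchangeability: swapping
    the conjuncts should barely change the sequence probability.  Symmetry:
    conjuncts of similar length are rewarded (a crude proxy)."""
    left = tags[left_start:conj_idx]
    right = tags[conj_idx + 1:right_end + 1]
    orig = tags[:left_start] + left + [tags[conj_idx]] + right + tags[right_end + 1:]
    swap = tags[:left_start] + right + [tags[conj_idx]] + left + tags[right_end + 1:]
    return -abs(lm.logprob(orig) - lm.logprob(swap)) - abs(len(left) - len(right))

# usage: enumerate all (left_start, right_end) pairs around the particle
# and keep the pair with the highest swap_score.
```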

비지도 학습을 기반으로 한 한국어 부사격의 의미역 결정 (Unsupervised Semantic Role Labeling for Korean Adverbial Case)

  • 김병수;이용훈;이종혁
    • 한국정보과학회논문지:소프트웨어및응용 / Vol.34 No.2 / pp.112-122 / 2007
  • Statistical semantic role labeling from a corpus requires role-annotated data. For Korean, however, a large corpus annotated with semantic roles is hard to obtain, and building one by hand takes considerable time and effort. This paper proposes a method that assigns semantic roles using an unannotated corpus by applying the self-training algorithm, an unsupervised learning technique. To this end, a training corpus was built automatically from the case frame information of the Sejong electronic dictionary of predicates, and a probabilistic model was trained incrementally. As a result, the method achieved an average precision of 83.00% for four adverbial case particles.
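
The self-training loop described above can be pictured with a generic sketch like the one below, assuming a bag-of-features representation and scikit-learn's MultinomialNB as a stand-in for the paper's probabilistic model; the seed labels would come from matches against Sejong case frames, and the threshold and feature layout are hypothetical.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def self_train(X_seed, y_seed, X_unlabeled, threshold=0.9, max_rounds=10):
    """Generic self-training loop (a stand-in for the paper's setup).

    X_seed/y_seed : features and roles of instances labeled automatically,
                    e.g. by matching Sejong case-frame entries (hypothetical).
    X_unlabeled   : instances whose adverbial-case role is unknown.
    Each round, instances predicted with confidence >= threshold are moved
    from the unlabeled pool into the training set."""
    X_train, y_train = X_seed, y_seed
    pool = X_unlabeled
    clf = MultinomialNB()
    for _ in range(max_rounds):
        if len(pool) == 0:
            break
        clf.fit(X_train, y_train)
        proba = clf.predict_proba(pool)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        new_y = clf.classes_[proba.argmax(axis=1)][confident]
        X_train = np.vstack([X_train, pool[confident]])
        y_train = np.concatenate([y_train, new_y])
        pool = pool[~confident]          # shrink the unlabeled pool
    return clf
```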

확률적 자율 학습을 위한 베이지안 모델 (Bayesian Model for Probabilistic Unsupervised Learning)

  • 최준혁;김중배;김대수;임기욱
    • 한국지능시스템학회논문지 / Vol.11 No.9 / pp.849-854 / 2001
  • Generative Topographic Mapping (GTM), proposed by Bishop, is a probabilistic counterpart of the Self-Organizing Map (SOM), the unsupervised neural network proposed by Kohonen. GTM models the probability distribution from which the data are generated using latent (hidden) variables. This is a feature unique to GTM that cannot be realized in SOM, and it allows GTM to overcome SOM's limitations. This paper proposes Bayesian GTM, a classification algorithm with a low misclassification rate obtained by combining Bayesian learning with the GTM model. The algorithm exploits GTM's fast computation and its probability distribution over the data, together with the accuracy of Bayesian inference, to obtain better results than existing classification algorithms. This is confirmed on training data commonly used in experiments with existing classification algorithms.

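The classification step can be viewed as Bayes' rule over class-conditional density models. The sketch below is only illustrative: it uses scikit-learn's GaussianMixture as a stand-in for GTM (which scikit-learn does not provide) and does not reproduce the Bayesian GTM specifics of the paper.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

class DensityBayesClassifier:
    """Bayes-rule classifier over class-conditional density models.
    GaussianMixture stands in for GTM here (hypothetical substitution)."""
    def __init__(self, n_components=4):
        self.n_components = n_components

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.priors_, self.models_ = [], []
        for c in self.classes_:
            Xc = X[y == c]
            gm = GaussianMixture(n_components=self.n_components,
                                 covariance_type="full", random_state=0).fit(Xc)
            self.models_.append(gm)
            self.priors_.append(len(Xc) / len(X))
        return self

    def predict(self, X):
        # log p(x | c) + log p(c) for every class, then argmax
        log_post = np.column_stack([m.score_samples(X) + np.log(p)
                                    for m, p in zip(self.models_, self.priors_)])
        return self.classes_[log_post.argmax(axis=1)]
```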

정렬기법을 활용한 와/과 병렬명사구 범위 결정 (Range Detection of Wa/Kwa Parallel Noun Phrase by Alignment method)

  • 최용석;신지애;최기선;김기태;이상태
    • 한국감성과학회:학술대회논문집 / 한국감성과학회 2008년도 추계학술대회 / pp.90-93 / 2008
  • In natural language, repeated constituents in an expression are commonly omitted, and recovering the omitted constituents is necessary for analyzing the meaning of a sentence. This paper addresses the recognition of parallel noun phrase boundaries by recovering the omitted constituents. Recognizing parallel noun phrases can greatly reduce complexity in the sentence parsing phase. Moreover, in natural language information retrieval, recognizing nouns together with their modifiers can play an important role in building indexes. We propose an unsupervised probabilistic model that identifies the parallel cores as well as the boundaries of parallel noun phrases conjoined by a conjunctive particle. It is based on the idea of swapping constituents, exploiting the symmetry (two or more identical constituents are repeated) and reversibility (the order of constituents is interchangeable) of parallel structures. Semantic features of the modifiers around the parallel noun phrase are also used in the probabilistic swapping model. The model is language-independent and is presented here for parallel noun phrases in Korean. Experiments show that our probabilistic model outperforms a symmetry-based model and supervised machine-learning approaches.

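One way to picture the alignment idea is to align the POS-tag sequences of candidate left and right conjuncts and prefer the boundary pair that aligns best. The sketch below is a hypothetical illustration using a normalized longest-common-subsequence score; the tagset, window size, and scoring are assumptions, not the authors' method.

```python
def lcs_len(a, b):
    """Length of the longest common subsequence (simple alignment score)."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i-1][j-1] + 1 if x == y else max(dp[i-1][j], dp[i][j-1])
    return dp[-1][-1]

def best_parallel_span(pos_tags, conj_idx, max_len=6):
    """Pick the left/right conjunct boundaries around the particle at conj_idx
    whose POS-tag sequences align best (normalized LCS), reflecting the
    symmetry of parallel structures.  Purely illustrative scoring."""
    best, best_score = None, -1.0
    for llen in range(1, min(max_len, conj_idx) + 1):
        left = pos_tags[conj_idx - llen:conj_idx]
        for rlen in range(1, min(max_len, len(pos_tags) - conj_idx - 1) + 1):
            right = pos_tags[conj_idx + 1:conj_idx + 1 + rlen]
            score = lcs_len(left, right) / max(llen, rlen)
            if score > best_score:
                best, best_score = (conj_idx - llen, conj_idx + rlen), score
    return best, best_score

# e.g. tags = ["MM", "NNG", "JC", "NNG", "JKS"] with the wa/kwa particle ("JC") at index 2
```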

Topic Masks for Image Segmentation

  • Jeong, Young-Seob;Lim, Chae-Gyun;Jeong, Byeong-Soo;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol.7 No.12 / pp.3274-3292 / 2013
  • Unsupervised methods for image segmentation have recently been drawing attention because most images do not have labels or tags. A topic model is one such unsupervised probabilistic method: it captures latent aspects of the data, where each latent aspect, or topic, is associated with one homogeneous region. The results of topic models, however, usually contain noise, which decreases overall segmentation performance. In this paper, to improve the performance of image segmentation using topic models, we propose two topic masks applicable to the topic assignments of homogeneous regions obtained from topic models. The topic masks capture noise among the assigned topic labels and remove it by replacement, much like image masks operate on pixels. However, because topic assignments differ in nature from image pixels, the topic masks have properties different from existing pixel masks. This paper makes two contributions. First, the topic masks can be used to reduce the noise of topic assignments obtained from topic models in image segmentation tasks. Second, we test the effectiveness of the topic masks by applying them to segmented images obtained from the Latent Dirichlet Allocation model and the Spatial Latent Dirichlet Allocation model on the MSRC image dataset. The empirical results show that one of the masks successfully reduces the topic noise.
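
To make the "mask over topic assignments" idea concrete, here is a minimal sketch of one plausible instantiation: a majority-vote (mode) filter over a 2-D grid of per-patch topic labels. The paper defines its own two masks, which are not reproduced here; the window size and data layout are assumptions.

```python
import numpy as np
from collections import Counter

def mode_topic_mask(topic_map, radius=1):
    """Denoise a 2-D grid of per-patch topic labels by majority vote within a
    (2*radius+1)^2 window.  This mode filter only illustrates the 'topic mask'
    idea; the paper's masks are defined differently."""
    h, w = topic_map.shape
    out = topic_map.copy()
    for i in range(h):
        for j in range(w):
            window = topic_map[max(0, i - radius):i + radius + 1,
                               max(0, j - radius):j + radius + 1]
            out[i, j] = Counter(window.ravel().tolist()).most_common(1)[0][0]
    return out

# usage: labels = lda_topic_assignments.reshape(h, w); cleaned = mode_topic_mask(labels)
```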

생성 모형을 사용한 순항 항공기 향후 속도 예측 및 추론 (En-route Ground Speed Prediction and Posterior Inference Using Generative Model)

  • 백현진;이금진
    • 한국항공운항학회지 / Vol.27 No.4 / pp.27-36 / 2019
  • Accurate trajectory prediction is key to safe and efficient aircraft operations. One way to improve trajectory prediction accuracy is to develop a model for aircraft ground speed prediction. This paper proposes a generative model for posterior aircraft ground speed prediction. The proposed method fits a Gaussian Mixture Model (GMM) to historical aircraft speed data, and the fitted model is then used to generate probabilistic speed profiles of the aircraft. The performance of the proposed method is demonstrated with real traffic data from the Incheon Flight Information Region (FIR).
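
A minimal sketch of the generative idea, under the assumption that each flight's ground speed is sampled at a fixed set of points along the route: fit a GMM to the historical profiles, then, given the speeds observed so far, compute the posterior mean of the remaining points by per-component Gaussian conditioning. This is an illustration, not the paper's model or data layout.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture

def fit_speed_gmm(profiles, n_components=3):
    """Fit a GMM to historical ground-speed profiles.  `profiles` is an
    (n_flights, n_points) array of speeds at fixed route points (assumed)."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="full", random_state=0).fit(profiles)

def posterior_future_speed(gmm, observed):
    """Posterior mean of the remaining speed points given the first
    len(observed) points, via per-component Gaussian conditioning."""
    n_obs = len(observed)
    o, f = slice(0, n_obs), slice(n_obs, None)
    log_r, cond_means = [], []
    for k in range(gmm.n_components):
        mu, S = gmm.means_[k], gmm.covariances_[k]
        # responsibility of component k given the observed segment
        log_r.append(np.log(gmm.weights_[k]) +
                     multivariate_normal(mu[o], S[o, o]).logpdf(observed))
        # conditional mean of the future segment under component k
        cond_means.append(mu[f] + S[f, o] @ np.linalg.solve(S[o, o], observed - mu[o]))
    log_r = np.array(log_r)
    resp = np.exp(log_r - log_r.max())
    resp /= resp.sum()
    return sum(w * m for w, m in zip(resp, cond_means))

# usage: gmm = fit_speed_gmm(historical_profiles)
#        future_mean = posterior_future_speed(gmm, observed_speeds)
```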

Weighted Local Naive Bayes Link Prediction

  • Wu, JieHua;Zhang, GuoJi;Ren, YaZhou;Zhang, XiaYan;Yang, Qiao
    • Journal of Information Processing Systems / Vol.13 No.4 / pp.914-927 / 2017
  • Weighted network link prediction is a challenging issue in complex network analysis. Unsupervised methods based on local structure are widely used for this predictive task. However, the results are still far from satisfactory, as most prior work neglects two important points: different common neighbors exert different influence on a potential link, and the weights associated with links in the local structure also differ. In this paper, we adapt an effective link prediction model, the local naive Bayes model, to the weighted scenario to address this issue. Correspondingly, we propose a weighted local naive Bayes (WLNB) probabilistic link prediction framework. The main contribution is that a weighted clustering coefficient is incorporated, allowing our model to infer the weighted contribution of each common neighbor in the prediction stage. In addition, WLNB can be readily applied to several classic similarity metrics. We evaluate WLNB on several kinds of real-world weighted datasets. Experimental results show that our proposed approach performs better (in terms of AUC and precision) than several alternative methods for link prediction in weighted complex networks.
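
The sketch below conveys the flavor of a local-naive-Bayes-style score on a weighted graph, using each common neighbor's weighted clustering coefficient (via networkx) as its "role" term. It follows the spirit of LNB/WLNB but is not the paper's exact formula; the log-odds form and the way edge weights are combined are assumptions.

```python
import math
import networkx as nx

def wlnb_like_score(G, u, v, eps=1e-6):
    """Local-naive-Bayes-style score for a candidate link (u, v) in a weighted
    graph.  NOT the paper's exact WLNB formula, only an illustration."""
    score = 0.0
    for z in nx.common_neighbors(G, u, v):
        # combined weight of the two links passing through the common neighbor z
        w = G[u][z].get("weight", 1.0) + G[z][v].get("weight", 1.0)
        # weighted clustering coefficient of z as a proxy for its "role",
        # i.e. how strongly z tends to sit inside triangles
        cz = nx.clustering(G, z, weight="weight")
        score += w * math.log((cz + eps) / (1.0 - cz + eps))
    return score

# usage:
# G = nx.Graph()
# G.add_weighted_edges_from([(1, 2, 0.8), (2, 3, 0.5), (1, 3, 0.3), (3, 4, 0.9)])
# print(wlnb_like_score(G, 1, 4))
```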

Non-Simultaneous Sampling Deactivation during the Parameter Approximation of a Topic Model

  • Jeong, Young-Seob;Jin, Sou-Young;Choi, Ho-Jin
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol.7 No.1 / pp.81-98 / 2013
  • Since Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) were introduced, many revised or extended topic models have appeared. Due to the intractable likelihood of these models, training any topic model requires using an approximation algorithm such as variational approximation, Laplace approximation, or Markov chain Monte Carlo (MCMC). Although these approximation algorithms perform well, training a topic model is still computationally expensive given the large amount of data it requires. In this paper, we propose a new method, called non-simultaneous sampling deactivation, for efficient approximation of the parameters of a topic model. While traditional approximation algorithms sample or update every random variable for a single predefined burn-in period, our method is based on the observation that the random variable nodes of a topic model converge after different numbers of iterations. During the iterative approximation process, the proposed method allows each random variable node to be terminated, or deactivated, once it has converged. Therefore, compared to traditional approximation schemes in which every node is usually deactivated at the same time, the proposed method improves inference efficiency in terms of both time and memory. We do not propose a new approximation algorithm, but rather a new process applicable to existing approximation algorithms. Through experiments, we show the time and memory efficiency of the method, and discuss the tradeoff between the efficiency of the approximation process and parameter consistency.
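
A generic skeleton of the deactivation idea, assuming a Gibbs-style sweep over per-variable assignments: a node stops being resampled once its value has not changed for a fixed number of consecutive sweeps. The convergence criterion and the `resample` interface are assumptions, not the paper's exact procedure.

```python
def gibbs_with_deactivation(state, resample, n_sweeps=200, patience=10):
    """Gibbs-style sweeps with per-variable deactivation (a generic skeleton of
    the non-simultaneous deactivation idea; not the paper's exact criterion).

    state    : list of current assignments, one per random variable node
    resample : function(i, state) -> new assignment for variable i
    A variable is deactivated (no longer resampled) once its assignment has
    not changed for `patience` consecutive sweeps."""
    unchanged = [0] * len(state)
    active = set(range(len(state)))
    for _ in range(n_sweeps):
        for i in list(active):
            new = resample(i, state)
            if new == state[i]:
                unchanged[i] += 1
                if unchanged[i] >= patience:
                    active.discard(i)      # converged: stop sampling this node
            else:
                state[i] = new
                unchanged[i] = 0
        if not active:                     # every node has been deactivated
            break
    return state

# usage sketch: for LDA, `state` would hold per-token topic assignments and
# `resample` would draw from the collapsed conditional p(z_i | z_-i, w).
```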