Search | Korea Science

Text Categorization Based on the Maximum Entropy Principle (최대 엔트로피 기반 문서 분류기의 학습)

장정호;장병탁;김영택
Proceedings of the Korean Information Science Society Conference / 1999.10b / pp.57-59 / 1999
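This paper proposes learning a text classifier based on the maximum entropy principle. The maximum entropy method is one of the methods widely used in natural language processing for tasks such as language modeling and part-of-speech tagging. Feature selection is important for the efficiency of a maximum entropy model. In this paper we experiment with the chi-square test, log-likelihood ratio, information gain, and mutual information as criteria for selecting the feature set, and compare the results with those obtained using the full set of candidate features. Reuters-21578 was used as the data set, and binary classification experiments were performed for each class.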
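One of the feature selection criteria named in the abstract, the chi-square test, can be illustrated with a 2x2 table of document counts for a term and a class. The following is a minimal sketch, not the authors' implementation; the function names `chi_square` and `select_top_k` and the toy counts are assumptions made for this example.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: documents in the class that contain the term
    n10: documents outside the class that contain the term
    n01: documents in the class that lack the term
    n00: documents outside the class that lack the term
    """
    n = n11 + n10 + n01 + n00
    denominator = (n11 + n01) * (n10 + n00) * (n11 + n10) * (n01 + n00)
    if denominator == 0:
        return 0.0  # a degenerate table carries no evidence of association
    return n * (n11 * n00 - n10 * n01) ** 2 / denominator


def select_top_k(term_counts, k):
    """Return the k terms with the highest chi-square score (hypothetical helper)."""
    scored = {term: chi_square(*counts) for term, counts in term_counts.items()}
    return sorted(scored, key=scored.get, reverse=True)[:k]


# Toy counts invented for illustration; they are not taken from Reuters-21578.
toy_counts = {
    "profit": (40, 5, 10, 45),
    "the": (50, 50, 0, 0),
    "wheat": (2, 30, 48, 20),
}
top_terms = select_top_k(toy_counts, 2)
```

A term that appears in every document, such as "the" in the toy data, yields a degenerate table and a score of zero, so it is never preferred over terms whose presence or absence actually separates the class.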

Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences (효율적인 영어 구문 분석을 위한 최대 엔트로피 모델에 의한 문장 분할)

Kim Sung-Dong
Journal of KIISE:Software and Applications / v.32 no.5 / pp.385-395 / 2005
Long-sentence analysis has been a critical problem in machine translation because of its high complexity. Intra-sentence segmentation methods have been proposed to reduce parsing complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model to increase the coverage and accuracy of the segmentation. We construct the rules for choosing candidate segmentation positions by a learning method that uses the lexical context of the words tagged as segmentation positions. We also generate a model that assigns a probability value to each candidate segmentation position. The lexical contexts are extracted from a corpus tagged with segmentation positions and are incorporated into the probability model. We construct training data from Wall Street Journal sentences and evaluate intra-sentence segmentation on sentences from four different domains. The experiments show about 88% accuracy and about 98% coverage of the segmentation. The proposed method also improves parsing efficiency by a factor of 4.8 in speed and 3.6 in space.

An Analysis of Fuzzy Survey Data Based on the Maximum Entropy Principle (최대 엔트로피 분포를 이용한 퍼지 관측데이터의 분석법에 관한 연구)

유재휘;유동일
Journal of the Korea Society of Computer and Information / v.3 no.2 / pp.131-138 / 1998
In conventional statistical data analysis, statistical data are described by exact values. However, in modern complex and large-scale systems, it is difficult to treat the systems using only exact data. In this paper, we define such data as fuzzy data (i.e., linguistic variables are applied to construct the membership function) and propose a new method for analyzing fuzzy survey data based on the maximum entropy principle. We also propose a new discrimination method that measures the distance between the distribution of the stable state and the estimated distribution of the present state using the Kullback-Leibler information. Furthermore, we investigate the validity of our method by computer simulations under realistic situations.
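The Kullback-Leibler information mentioned in the abstract can be computed directly for discrete distributions. The sketch below is illustrative only: the function name `kl_divergence`, the example distributions, and the choice of which distribution is the first argument are assumptions, since the abstract does not specify them.

```python
import math


def kl_divergence(p, q):
    """D(p || q) = sum_i p_i * log(p_i / q_i) for discrete distributions, in nats."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # the term 0 * log(0 / q_i) is taken to be 0
        if q_i == 0:
            raise ValueError("q must be positive wherever p is positive")
        total += p_i * math.log(p_i / q_i)
    return total


stable = [0.5, 0.3, 0.2]   # hypothetical stable-state distribution
present = [0.4, 0.4, 0.2]  # hypothetical estimate of the present state
distance = kl_divergence(present, stable)
```

The value is zero when the two distributions coincide and positive otherwise. It is not symmetric in its arguments, so the order has to be fixed before values are compared.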

Korean Noun Phrase Identification Using Maximum Entropy Method (최대 엔트로피 모델을 이용한 한국어 명사구 추출)

강인호;전수영;김길창
Proceedings of the Korean Society for Cognitive Science Conference / 2000.06a / pp.127-132 / 2000
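This paper studies a method for extracting noun phrases, including their modifiers, using the syntactic properties of case particles. Unlike existing methods, which use contiguous morpheme sequences as context for noun phrase identification, we use the beginning and end of the noun phrase and the morphemes surrounding it, so that the modifying part and the head noun of the noun phrase serve as contextual information. The various kinds of contextual information are combined into a single probability distribution by the maximum entropy principle. The proposed method first obtains noun phrase grammar rules, expressed as part-of-speech sequences, from a syntactic-tree-tagged corpus. Using these rules, it extracts noun phrase candidates adjacent to case particles. Each candidate is assigned a probability of being interpreted as a noun phrase, based on the probability distribution obtained from the training corpus. The noun phrase related to each case particle is then extracted by selecting the candidate with the highest probability. In tests with the proposed model, we were able to extract noun phrases containing 4.5 phrases on average.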
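The scoring step described in the abstract, in which each candidate receives a probability and the most probable one is selected, can be sketched with a generic log-linear (maximum entropy) model. This is an illustration under stated assumptions, not the authors' model: the feature names, the weights, the two labels, and the function name `maxent_probability` are all hypothetical.

```python
import math


def maxent_probability(active_features, weights, label, labels=("NP", "not-NP")):
    """p(label | context) under a log-linear model with binary features."""
    scores = {
        y: sum(weights.get((feature, y), 0.0) for feature in active_features)
        for y in labels
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    normalizer = sum(math.exp(s - top) for s in scores.values())
    return math.exp(scores[label] - top) / normalizer


# Hypothetical weights; in practice they would be estimated from a training corpus.
weights = {
    ("first=DET", "NP"): 1.2,
    ("head=NOUN", "NP"): 0.8,
    ("head=VERB", "not-NP"): 1.5,
}

# Hypothetical candidates adjacent to one case particle, each with its active features.
candidates = {
    "cand_a": ["first=DET", "head=NOUN"],
    "cand_b": ["first=DET", "head=VERB"],
}

probabilities = {
    name: maxent_probability(features, weights, "NP")
    for name, features in candidates.items()
}
best = max(probabilities, key=probabilities.get)
```

Each feature contributes an additive weight to the score of a label, which is how heterogeneous contextual cues end up in one normalized probability distribution.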

A Study on Stress Analysis Considering Uncertain Characteristics (불확정적 특성을 고려한 응력해석에 관한 일고찰)

정명채
Computational Structural Engineering / v.6 no.4 / pp.10-13 / 1993
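This article introduces an example of understanding structural behavior through an uncertainty-based approach. The example deals with a theory for estimating the stresses in a shell structure subjected to differential settlement, using extreme value statistics and the maximum entropy principle. Differential settlement has comparatively many uncertain characteristics. In particular, for the ground supporting a structure, the physical constants and settlement characteristics are often difficult to treat deterministically. Specifically, in the extreme value statistics method, the differential settlement is assumed to be given by Fourier coefficients in the circumferential direction of the foundation ring. We introduce a method that, when the phase angle and the mean square value of the settlement are given as deterministic values, treats the amplitude spectrum as an uncertain variable and estimates it. Once the amplitude spectrum is obtained, the stresses follow easily, so only the spectrum is discussed here.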

Korean Noun Phrase Identification using Maximum Entropy Method (최대 엔트로피 모델을 이용한 한국어 명사구 추출)

Kang, In-Ho;Jeon, Su-Young;Kim, Gil-Chang
Annual Conference on Human and Language Technology / 2000.10d / pp.127-132 / 2000
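This paper studies a method for extracting noun phrases, including their modifiers, using the syntactic properties of case particles. Unlike existing methods, which use contiguous morpheme sequences as context for noun phrase identification, we use the beginning and end of the noun phrase and the morphemes surrounding it, so that the modifying part and the head noun of the noun phrase serve as contextual information. The various kinds of contextual information are combined into a single probability distribution by the maximum entropy principle. The proposed method first obtains noun phrase grammar rules, expressed as part-of-speech sequences, from a syntactic-tree-tagged corpus. Using these rules, it extracts noun phrase candidates adjacent to case particles. Each candidate is assigned a probability of being interpreted as a noun phrase, based on the probability distribution obtained from the training corpus. The noun phrase related to each case particle is then extracted by selecting the candidate with the highest probability. In experiments with the proposed model, we were able to extract noun phrases containing 4.5 phrases on average.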

A probabilistic information retrieval model by document ranking using term dependencies (용어간 종속성을 이용한 문서 순위 매기기에 의한 확률적 정보 검색)

You, Hyun-Jo;Lee, Jung-Jin
The Korean Journal of Applied Statistics / v.32 no.5 / pp.763-782 / 2019
This paper proposes a probabilistic document ranking model incorporating term dependencies. Document ranking is a fundamental information retrieval task: sorting the documents in a collection according to their relevance to the user query (Qin et al., Information Retrieval Journal, 13, 346-374, 2010). A probabilistic model computes the conditional probability of the relevance of each document given a query. Most widely used models assume term independence because it is challenging to compute the joint probabilities of multiple terms. However, words in natural language texts are clearly highly correlated. In this paper, we assume a multinomial distribution model to calculate the relevance probability of a document by considering the dependency structure of words, and propose an information retrieval model that ranks documents by estimating this probability with the maximum entropy method. Ranking simulation experiments in various multinomial situations show better retrieval results than a model that assumes the independence of words. Document ranking experiments on the real-world LETOR OHSUMED dataset also show better retrieval results.
https://doi.org/10.5351/KJAS.2019.32.5.763

Segmentation of Color Image using the Deterministic Annealing EM Algorithm (결정적 어닐링 EM 알고리즘을 이용한 칼라 영상의 분할)

Cho, Wan-Hyun;Park, Jong-Hyun;Park, Soon-Young
Journal of KIISE:Databases / v.28 no.3 / pp.324-333 / 2001
In this paper we present a novel color image segmentation algorithm based on a Gaussian Mixture Model (GMM). We introduce a Deterministic Annealing Expectation Maximization (DAEM) algorithm, developed using the principle of maximum entropy, to overcome the local maxima problem associated with the standard EM algorithm. In our approach, the GMM is used to represent multi-colored objects statistically, and its parameters are estimated by the DAEM algorithm. We also develop a method for automatically determining the number of components in Gaussian mixture models. The segmentation of the image is based on the maximum posterior probability distribution, which is calculated using the GMM. The experimental results show that the proposed DAEM estimates the parameters more accurately than the standard EM and that the method for determining the number of mixture components is very efficient. When tested on two natural images, the proposed algorithm performs much better than the traditional algorithm in segmenting the image fields.
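The difference between deterministic annealing EM and standard EM lies in the E-step, where the posterior is tempered by an inverse temperature beta. The sketch below shows that step for a one-dimensional Gaussian mixture. It follows the commonly used annealed-posterior form and is not taken from the paper, which works with color data; the function names and parameter values are assumptions.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a one-dimensional normal distribution."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def tempered_responsibilities(x, weights, means, stds, beta):
    """Annealed E-step: r_k is proportional to (w_k * N(x | mean_k, std_k)) ** beta.

    beta = 1 gives the standard EM responsibilities; beta near 0 gives a
    nearly uniform assignment, which smooths the objective early in annealing.
    """
    powered = [
        (w * gaussian_pdf(x, m, s)) ** beta
        for w, m, s in zip(weights, means, stds)
    ]
    total = sum(powered)
    return [value / total for value in powered]


# Hypothetical two-component mixture and a point closer to the first component.
mix_weights, mix_means, mix_stds = [0.5, 0.5], [0.0, 4.0], [1.0, 1.0]
hot = tempered_responsibilities(1.0, mix_weights, mix_means, mix_stds, beta=0.0)
warm = tempered_responsibilities(1.0, mix_weights, mix_means, mix_stds, beta=0.5)
cold = tempered_responsibilities(1.0, mix_weights, mix_means, mix_stds, beta=1.0)
```

In a full annealing schedule, beta would be raised gradually toward 1, with the usual M-step run at each temperature using these responsibilities.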

A New Formulation of the Reconstruction Problem in Neutronics Nodal Methods Based on Maximum Entropy Principle (노달방법의 중성자속 분포 재생 문제에의 최대 엔트로피 원리에 의한 새로운 접근)

Na, Won-Joon;Cho, Nam-Zin
Nuclear Engineering and Technology / v.21 no.3 / pp.193-204 / 1989
This paper develops a new method for reconstructing neutron flux distribution that is based on the maximum entropy principle in information theory. The probability distribution that maximizes the entropy provides the most unbiased, objective probability distribution consistent with the known partial information. The partial information consists of the assembly volume-averaged neutron flux, the surface-averaged neutron fluxes, and the surface-averaged neutron currents, which are the results of the nodal calculation. The flux distribution on the boundary of a fuel assembly, which is the boundary condition for the neutron diffusion equation, is transformed into the probability distribution in the entropy expression. The most objective boundary flux distribution is deduced from the results of the nodal calculation by the maximum entropy method. This boundary flux distribution is then used as the boundary condition in an imbedded heterogeneous assembly calculation to provide the detailed flux distribution. The results of the new method applied to several PWR benchmark problem assemblies show that the reconstruction errors are comparable with those of the form function methods in the inner region of the assembly, while they are relatively large near the boundary of the assembly. Incorporating the surface-averaged neutron currents in the constraint information (which is not done in the present study) should provide better results.
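The general idea of choosing the distribution that maximizes entropy subject to known partial information can be shown with a much simpler constraint than the nodal averages used in the paper. The sketch below finds the maximum entropy distribution on a finite set of values with a prescribed mean. It is a generic illustration, not the paper's reconstruction procedure; the function name `maxent_with_mean` and the six-valued example are assumptions.

```python
import math


def maxent_with_mean(values, target_mean, tol=1e-12):
    """Maximum entropy distribution on `values` with a prescribed mean.

    The solution has the form p_i proportional to exp(lam * x_i); the
    Lagrange multiplier lam is found by bisection, since the mean is
    increasing in lam.
    """
    if not min(values) < target_mean < max(values):
        raise ValueError("target_mean must lie strictly inside the value range")

    def distribution(lam):
        shift = max(lam * v for v in values)  # avoids overflow in exp
        unnormalized = [math.exp(lam * v - shift) for v in values]
        total = sum(unnormalized)
        return [u / total for u in unnormalized]

    def mean(lam):
        return sum(p * v for p, v in zip(distribution(lam), values))

    lo, hi = -1.0, 1.0
    while mean(lo) > target_mean:  # widen the bracket downward if needed
        lo *= 2.0
    while mean(hi) < target_mean:  # widen the bracket upward if needed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return distribution(0.5 * (lo + hi))


faces = [1, 2, 3, 4, 5, 6]
unbiased = maxent_with_mean(faces, 3.5)  # no information beyond the natural mean
skewed = maxent_with_mean(faces, 4.5)    # mean constrained above 3.5
```

When the constraint carries no information beyond the natural mean, the result is uniform; a tighter constraint tilts the distribution only as much as needed to satisfy it.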

