• Title/Summary/Keyword: Conditional Entropy

Method for Evaluating Optimal Air Monitoring Sites for SO2 in Ulsan (울산광역시 아황산가스(SO2)의 최적관측소 평가방법)

  • Lim, Junghyun;Yoon, Sanghoo
    • Journal of Environmental Science International
    • /
    • v.26 no.9
    • /
    • pp.1073-1080
    • /
    • 2017
  • Manufacturing and technology industries produce large amounts of air pollutants. Ulsan Metropolitan City, South Korea, is well-known for its large industrial complexes; in particular, the concentration of $SO_2$ here is the highest in the country. We assessed $SO_2$ monitoring sites based on conditional and joint entropy, because this is a common method for determining an optimal air monitoring network. Monthly $SO_2$ concentrations from 12 air monitoring sites were collected, and the distribution of spatial locations was determined by kriging. Mean absolute error, Root Mean Squared Error (RMSE), bias and correlation coefficients were employed to evaluate the considered algorithms. An optimal air monitoring network for Ulsan was suggested based on the improvement of RMSE.
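
A minimal sketch of the joint and conditional entropy quantities named in the abstract above, assuming the concentrations have already been discretized into levels. The site names and readings are hypothetical, and the plug-in estimator is only an illustration, not the authors' procedure.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(samples)
    counts = Counter(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def joint_entropy(xs, ys):
    """H(X, Y): entropy of the paired observations."""
    return entropy(list(zip(xs, ys)))


def conditional_entropy(ys, xs):
    """H(Y | X) = H(X, Y) - H(X)."""
    return joint_entropy(xs, ys) - entropy(xs)


# Hypothetical discretized SO2 levels at three sites (illustrative data only).
site_a = ["low", "low", "high", "high"]
site_b = ["low", "low", "high", "high"]   # always agrees with site A
site_c = ["low", "high", "low", "high"]   # unrelated to site A in this sample

redundant = conditional_entropy(site_b, site_a)    # nothing new given site A
informative = conditional_entropy(site_c, site_a)  # one full bit of new information
```

Under this reading, a site whose conditional entropy given the rest of the network is close to zero contributes little additional information.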

Efficient variable selection method using conditional mutual information (조건부 상호정보를 이용한 분류분석에서의 변수선택)

  • Ahn, Chi Kyung;Kim, Donguk
    • Journal of the Korean Data and Information Science Society
    • /
    • v.25 no.5
    • /
    • pp.1079-1094
    • /
    • 2014
  • In this paper, we study efficient gene selection methods that use conditional mutual information. We propose gene selection methods in which conditional mutual information is estimated semiparametrically, using the multivariate normal distribution and the Edgeworth approximation. We compare the proposed methods in SVM classification with other methods: the mutual information filter, SVM-RFE, and the gene selection method of Cai et al. (2009) (MIGS-original). The experiments show that the proposed methods perform better than the mutual information filter. Furthermore, they take far less computing time than the method of Cai et al. (2009) while achieving similar performance.
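
To illustrate why conditioning matters in variable selection, here is a small sketch of conditional mutual information for discrete variables. It uses a simple plug-in estimator, which is an assumption made for illustration; the abstract's semiparametric estimators (multivariate normal, Edgeworth) are not reproduced here, and all variable names are hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in Shannon entropy (bits) of a sequence of hashable symbols."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def conditional_mutual_information(xs, ys, zs):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(Z) - H(X, Y, Z)."""
    return (entropy(list(zip(xs, zs))) + entropy(list(zip(ys, zs)))
            - entropy(zs) - entropy(list(zip(xs, ys, zs))))


# Hypothetical toy data: the class label is the XOR of two binary features.
selected = [0, 0, 1, 1]       # feature already chosen
candidate = [0, 1, 0, 1]      # feature under consideration
label = [c ^ s for c, s in zip(candidate, selected)]

marginal = mutual_information(candidate, label)
conditional = conditional_mutual_information(candidate, label, selected)
```

In this toy case the candidate looks useless on its own but is fully informative once the already selected feature is taken into account, which is the situation a plain mutual information filter cannot detect.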

Determination of the Group of Classifiers by Minimizing the Conditional Entropy (조건부 엔트로피의 최소화를 통하여 인식기의 집합을 결정하는 방법)

  • Kang, Hee-Joong
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2008.06c
    • /
    • pp.569-573
    • /
    • 2008
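  • In research on pattern recognition, there have been attempts to improve recognition performance by applying conditional entropy, which is an upper bound on the Bayes error rate. In this paper, we consider the problem of determining the set of classifiers so that a multiple classifier system composed of several classifiers performs well, and we compare and analyze, through simple and clear examples, the approach based on minimizing this conditional entropy against other approaches. The representative voting method and the Bayesian method under the conditional independence assumption were used to combine the classifiers, and we were able to confirm the usefulness of determining the set of classifiers by minimizing the conditional entropy.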
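
For reference, a commonly cited form of the upper bound mentioned in this abstract is the Hellman-Raviv inequality. The symbols are assumptions introduced here for illustration: $C$ is the true class, $K_1, \dots, K_n$ are the decisions of the selected classifiers, $P_e$ is the Bayes error rate of deciding $C$ from those decisions, and the entropy is measured in bits. Whether the paper uses exactly this form is not stated in the abstract.

$$P_e \le \frac{1}{2}\, H(C \mid K_1, \dots, K_n), \qquad H(C \mid K_1, \dots, K_n) = -\sum_{c,\,k_1,\dots,k_n} p(c, k_1, \dots, k_n)\, \log_2 p(c \mid k_1, \dots, k_n)$$

Choosing the classifier set that minimizes the conditional entropy therefore tightens this bound on the achievable error.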

Construction of Multiple Classifier Systems based on a Classifiers Pool (인식기 풀 기반의 다수 인식기 시스템 구축방법)

  • Kang, Hee-Joong
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.8
    • /
    • pp.595-603
    • /
    • 2002
  • Only a few studies have addressed how to select multiple classifiers from a pool of available classifiers so as to achieve good classification performance. Thus, the problem of which classifiers to select, and how many, remains an important research issue. In this paper, provided that the number of selected classifiers is fixed in advance, a variety of selection criteria are proposed and applied to the construction of multiple classifier systems, and these criteria are then evaluated by the performance of the constructed systems. All possible sets of classifiers are examined under the selection criteria, and some of these sets are selected as candidate multiple classifier systems. The candidates were evaluated in experiments on recognizing unconstrained handwritten numerals obtained from Concordia University and the UCI Machine Learning Repository. Among the selection criteria, the information-theoretic criteria based on conditional entropy yielded candidates with more promising results than the other criteria.
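
The selection procedure described above (fix the number of classifiers, examine every subset, score each by a criterion) can be sketched as follows for the conditional entropy criterion. This is a minimal sketch under stated assumptions: the function names, the classifier names, and the toy decisions are hypothetical, and the plug-in entropy estimate stands in for whatever estimator the paper actually uses.

```python
import math
from collections import Counter
from itertools import combinations


def conditional_entropy(labels, outputs):
    """Plug-in estimate of H(labels | outputs) in bits from paired samples."""
    n = len(labels)
    joint = Counter(zip(outputs, labels))
    marginal = Counter(outputs)
    return -sum((c / n) * math.log2(c / marginal[o])
                for (o, _), c in joint.items())


def select_classifiers(labels, decisions, k):
    """Return (entropy, subset) for the k classifiers minimizing H(label | decisions).

    decisions maps a classifier name to its list of predicted labels.
    Ties are resolved in favor of the first subset encountered.
    """
    best = None
    for subset in combinations(sorted(decisions), k):
        # Each sample's joint output is the tuple of the subset's decisions.
        outputs = list(zip(*(decisions[name] for name in subset)))
        h = conditional_entropy(labels, outputs)
        if best is None or h < best[0]:
            best = (h, subset)
    return best


# Hypothetical toy data: three classes, three imperfect classifiers.
labels = [0, 0, 1, 1, 2, 2]
decisions = {
    "a": [0, 0, 1, 1, 1, 1],   # separates class 0 from the rest
    "b": [0, 0, 0, 0, 2, 2],   # separates class 2 from the rest
    "c": [0, 1, 0, 1, 0, 1],   # unrelated to the class
}
best_entropy, best_subset = select_classifiers(labels, decisions, 2)
```

Exhaustive search is only practical for small pools, since the number of subsets grows combinatorially with the pool size.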

Fine-Grained Named Entity Recognition using Conditional Random Fields for Question Answering (Conditional Random Fields를 이용한 세부 분류 개체명 인식)

  • Lee, Chang-Ki;Hwang, Yi-Gyu;Oh, Hyo-Jung;Lim, Soo-Jong;Heo, Jeong;Lee, Chung-Hee;Kim, Hyeon-Jin;Wang, Ji-Hyun;Jang, Myung-Gil
    • Annual Conference on Human and Language Technology
    • /
    • 2006.10e
    • /
    • pp.268-272
    • /
    • 2006
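  • Question answering systems use fine-grained named entities to find the answer corresponding to a user's query. For fine-grained named entity recognition, most systems first perform general coarse-grained named entity recognition and then subdivide the results into fine-grained classes using dictionaries and similar resources. In this paper, we use Conditional Random Fields for fine-grained named entity recognition in a question answering system. We divide named entity recognition into two stages, boundary detection and classification of the detected entities, using Conditional Random Fields for boundary detection and Maximum Entropy for classification. In experiments on 147 fine-grained named entity classes, we obtained a precision of 85.8%, a recall of 81.1%, and F1 = 83.4; compared with the baseline model, training time was reduced to 27% while performance improved. In addition, applying the proposed fine-grained named entity recognizer to a question answering system yielded a 26% performance improvement.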
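
As a consistency check on the reported figures, F1 is the harmonic mean of precision $P$ and recall $R$. With $P = 85.8$ and $R = 81.1$ as given in the abstract:

$$F_1 = \frac{2PR}{P + R} = \frac{2 \times 85.8 \times 81.1}{85.8 + 81.1} = \frac{13916.76}{166.9} \approx 83.4$$

This matches the reported F1 value.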

TAKES: Two-step Approach for Knowledge Extraction in Biomedical Digital Libraries

  • Song, Min
    • Journal of Information Science Theory and Practice
    • /
    • v.2 no.1
    • /
    • pp.6-21
    • /
    • 2014
  • This paper proposes a novel knowledge extraction system, TAKES (Two-step Approach for Knowledge Extraction System), which integrates advanced techniques from Information Retrieval (IR), Information Extraction (IE), and Natural Language Processing (NLP). In particular, TAKES adopts a novel keyphrase extraction-based query expansion technique to collect promising documents. It also uses a Conditional Random Field-based machine learning technique to extract important biological entities and relations. TAKES is applied to biological knowledge extraction, particularly retrieving promising documents that contain Protein-Protein Interaction (PPI) and extracting PPI pairs. TAKES consists of two major components: DocSpotter, which is used to query and retrieve promising documents for extraction, and a Conditional Random Field (CRF)-based entity extraction component known as FCRF. The present paper investigated research problems addressing the issues with a knowledge extraction system and conducted a series of experiments to test our hypotheses. The findings from the experiments are as follows: First, the author verified, using three different test collections to measure the performance of our query expansion technique, that DocSpotter is robust and highly accurate when compared to Okapi BM25 and SLIPPER. Second, the author verified that our relation extraction algorithm, FCRF, is highly accurate in terms of F-Measure compared to four other competitive extraction algorithms: Support Vector Machine, Maximum Entropy, Single POS HMM, and Rapier.

Chinese Prosody Generation Based on C-ToBI Representation for Text-to-Speech (음성합성을 위한 C-ToBI기반의 중국어 운율 경계와 F0 contour 생성)

  • Kim, Seung-Won;Zheng, Yu;Lee, Gary-Geunbae;Kim, Byeong-Chang
    • MALSORI
    • /
    • no.53
    • /
    • pp.75-92
    • /
    • 2005
  • Prosody modeling is critical in developing text-to-speech (TTS) systems, where speech synthesis is used to automatically generate natural speech. In this paper, we present a prosody generation architecture based on the Chinese Tone and Break Index (C-ToBI) representation. ToBI is a multi-tier representation system based on linguistic knowledge for transcribing events in an utterance. TTS systems that adopt ToBI as an intermediate representation are known to exhibit higher flexibility, modularity, and domain/task portability than TTS systems with direct prosody generation. However, corpus preparation is very expensive for practical-level performance, because a ToBI-labeled corpus has to be constructed manually by many prosody experts, and accurate statistical prosody modeling normally requires a large amount of data. This paper proposes a new method that transcribes C-ToBI labels automatically in Chinese speech. We model Chinese prosody generation as a classification problem and apply conditional Maximum Entropy (ME) classification to it. We empirically verify the usefulness of various natural language and phonology features in building well-integrated features for the ME framework.

Context-based Predictive Coding Scheme for Lossless Image Compression (무손실 영상 압축을 위한 컨텍스트 기반 적응적 예측 부호화 방법)

  • Kim, Jongho;Yoo, Hoon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.17 no.1
    • /
    • pp.183-189
    • /
    • 2013
  • This paper proposes a novel lossless image compression scheme composed of direction-adaptive prediction and context-based entropy coding. In the prediction stage, we analyze the directional property with respect to the current coding pixel and select an appropriate prediction pixel. To further reduce the prediction error, we propose a prediction error compensation technique based on a context model defined by the activities and directional properties of neighboring pixels. The proposed scheme applies context-based Golomb-Rice coding as the entropy coder because, from an information-theoretic viewpoint, exploiting conditional entropy can improve coding efficiency. Experimental results indicate that the proposed scheme outperforms the low-complexity, high-efficiency JPEG-LS in coding efficiency by 1.3% on average over various test images; it performs particularly well on images with pronounced directionality.
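
A minimal sketch of Golomb-Rice coding of prediction errors, the entropy coder named in the abstract above. The function names, the unary convention (ones terminated by a zero), and the signed-to-unsigned mapping are assumptions chosen for illustration. In a context-based scheme the parameter `k` would be chosen per context; that selection rule is not given in the abstract and is not modeled here.

```python
def map_error(e):
    """Map a signed prediction error to a non-negative integer (0, -1, 1, -2, ...)."""
    return 2 * e if e >= 0 else -2 * e - 1


def rice_encode(value, k):
    """Golomb-Rice code of a non-negative integer with divisor 2**k, as a bit string."""
    if value < 0 or k < 0:
        raise ValueError("value and k must be non-negative")
    quotient = value >> k
    bits = "1" * quotient + "0"              # unary quotient, terminated by 0
    if k:
        remainder = value & ((1 << k) - 1)
        bits += format(remainder, f"0{k}b")  # remainder in exactly k bits
    return bits


def rice_decode(bits, k):
    """Decode one Golomb-Rice codeword from the start of a bit string."""
    quotient = bits.index("0")               # number of leading ones
    start = quotient + 1
    remainder = int(bits[start:start + k], 2) if k else 0
    return (quotient << k) | remainder


codeword = rice_encode(map_error(-5), 2)     # error -5 maps to 9
decoded = rice_decode(codeword, 2)
```

Small `k` favors small errors with short codewords, which is why matching `k` to the local error statistics of each context can lower the average code length.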

Eojeol Syntactic Tag Prediction of Korean Text using Entropy Guided CRF (엔트로피 지도 CRF를 이용한 한국어 어절 구문태그 예측)

  • Oh, Jin-Young;Cha, Jeong-Won
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.5
    • /
    • pp.395-399
    • /
    • 2009
  • In this work, we describe a syntactic tag prediction system for Korean that uses a decision tree and CRFs. Features are generally selected by intuition, which depends on the designer's prior knowledge. In this work, we instead combine features systematically using the decision tree. We also analyze errors and optimize the features for the best performance. The experimental results show that the proposed method is effective for syntactic tag estimation and will be helpful for syntactic analysis.

A Big Data Analysis by Between-Cluster Information using k-Modes Clustering Algorithm (k-Modes 분할 알고리즘에 의한 군집의 상관정보 기반 빅데이터 분석)

  • Park, In-Kyoo
    • Journal of Digital Convergence
    • /
    • v.13 no.11
    • /
    • pp.157-164
    • /
    • 2015
  • This paper describes subspace clustering of categorical data for convergence and integration. Because conventional evaluation measures are designed for dealing only with numerical data, they are likely to be limited on categorical data by the absence of ordering, high dimensionality, and the scarcity of frequencies. Hence, a conditional entropy measure is proposed to evaluate how closely the attributes within each cluster cohere. We propose a new objective function that reflects the optimistic clustering, so that the within-cluster dispersion is minimized and the between-cluster separation is enhanced. We performed experiments on five real-world datasets, comparing the performance of our algorithm with four other algorithms using three evaluation metrics: accuracy, F-measure, and adjusted Rand index. According to the experiments, the proposed algorithm outperforms the algorithms considered in the evaluation on these metrics.