Search | Korea Science

Automatic Text Categorization by Term Weighting and Inverted Category Frequency (용어 가중치와 역범주 빈도에 의한 자동문서 범주화)

Lee, Kyung-Chan;Kang, Seung-Shik
- Annual Conference on Human and Language Technology
- 2003.10d
- pp.14-17
- 2003
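The naive Bayesian probability model is a representative text categorization technique that classifies documents automatically using document probabilities. In its basic form, the method computes probabilities for the terms that occur. In actual text categorization, however, terms that do not occur can also strongly affect performance, and a categorization system can be improved by applying inverted category frequency or term weights in addition to the frequencies of the occurring terms. In this paper, we experimented with applying smoothing techniques for occurring and non-occurring terms to the naive Bayesian probability model. Newsgroup documents were used for performance evaluation, and applying inverted category frequency and term weights improved performance by about 7% compared with the naive Bayesian probability model.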
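
The idea in the abstract above can be sketched as a multinomial naive Bayes classifier with additive smoothing and a per-term weight. This is a minimal sketch and not the authors' implementation: the abstract does not give the formulas, so the inverted category frequency form `log(|C| / cf(t)) + 1`, the smoothing constant `alpha`, and the names `train` and `classify` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (category, list_of_terms) pairs."""
    term_counts = defaultdict(Counter)  # category -> term frequencies
    doc_counts = Counter()              # category -> number of documents
    for category, terms in docs:
        doc_counts[category] += 1
        term_counts[category].update(terms)
    vocab = {t for counts in term_counts.values() for t in counts}
    n_categories = len(term_counts)
    icf = {}
    for t in vocab:
        # cf(t): number of categories in which term t occurs at least once.
        cf = sum(1 for counts in term_counts.values() if counts[t] > 0)
        # Assumed ICF form; terms spread over many categories get less weight.
        icf[t] = math.log(n_categories / cf) + 1.0
    return term_counts, doc_counts, vocab, icf


def classify(model, terms, alpha=1.0):
    """Return the category with the highest ICF-weighted log score."""
    term_counts, doc_counts, vocab, icf = model
    total_docs = sum(doc_counts.values())
    best, best_score = None, -math.inf
    for category, counts in term_counts.items():
        total_terms = sum(counts.values())
        score = math.log(doc_counts[category] / total_docs)
        for t in terms:
            if t not in vocab:
                continue  # assumption: terms unseen in training are skipped
            # Additive smoothing gives non-occurring terms a nonzero probability.
            p = (counts[t] + alpha) / (total_terms + alpha * len(vocab))
            score += icf[t] * math.log(p)
        if score > best_score:
            best, best_score = category, score
    return best
```

With `alpha=1.0` this is Laplace smoothing; a term that never occurs in a category still contributes a finite penalty instead of driving the score to negative infinity.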

Improving Multinomial Naive Bayes Text Classifier (다항시행접근 단순 베이지안 문서분류기의 개선)

김상범;임해창
- Journal of KIISE:Software and Applications
- v.30 no.3_4
- pp.259-267
- 2003
Although naive Bayes text classifiers are widely used because of their simplicity, techniques for improving their performance have rarely been studied. In this paper, we propose and evaluate several general and effective techniques for improving the performance of the naive Bayes text classifier. We suggest document-model-based parameter estimation and document length normalization to alleviate the problems of the traditional multinomial approach to text classification. In addition, a mutual-information-weighted naive Bayes text classifier is proposed to increase the effect of highly informative words. Our techniques are evaluated on the Reuters-21578 and 20 Newsgroups collections, and significant improvements are obtained over the existing multinomial naive Bayes approach.

The Risk Assessment and Prediction for the Mixed Deterioration in Cable Bridges Using a Stochastic Bayesian Modeling (확률론적 베이지언 모델링에 의한 케이블 교량의 복합열화 리스크 평가 및 예측시스템)

Cho, Tae Jun;Lee, Jeong Bae;Kim, Seong Soo
- Journal of the Korea institute for structural maintenance and inspection
- v.16 no.5
- pp.29-39
- 2012
The main objective is to predict the future degradation of, and the maintenance budget for, a suspension bridge system. Bayesian inference is applied to find the posterior probability density function of the source parameters (damage indices and serviceability), given ten years of maintenance data. The posterior distribution of the parameters is sampled using a Markov chain Monte Carlo method. The simulated risk predictions for decreased serviceability conditions are posterior distributions based on the prior distribution and the likelihood of data updated from annual maintenance tasks. Compared with a conventional linear prediction model, the proposed quadratic model provides greatly improved convergence and closeness to measured data in terms of serviceability, risk factors, and maintenance budget for bridge components. This allows the future performance and financial management of complex infrastructure to be forecast with the proposed quadratic stochastic regression model.
https://doi.org/10.11112/jksmi.2012.16.5.029
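
The sampling step mentioned in the abstract above can be illustrated with a random-walk Metropolis sampler. This is only a sketch under strong simplifying assumptions: it estimates a single parameter (the mean of Gaussian observations with known spread) under a Gaussian prior, not the paper's quadratic regression model, and the function names, prior settings, and step size are hypothetical.

```python
import math
import random


def log_posterior(theta, data, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Unnormalized log posterior: Gaussian prior plus Gaussian likelihood."""
    log_prior = -0.5 * ((theta - prior_mean) / prior_sd) ** 2
    log_lik = sum(-0.5 * ((y - theta) / sigma) ** 2 for y in data)
    return log_prior + log_lik


def metropolis(data, n_samples=20000, burn_in=2000, step=0.5, seed=0):
    """Random-walk Metropolis sampler for the posterior of theta."""
    rng = random.Random(seed)
    theta = 0.0
    current = log_posterior(theta, data)
    samples = []
    for i in range(n_samples + burn_in):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            samples.append(theta)  # keep draws only after burn-in
    return samples
```

For ten hypothetical annual observations averaging 5.0, the conjugate posterior mean under these settings is 50 / 10.01, about 4.995, so the sample mean of the chain should land close to that value.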

The Development of e-Learning System for Science and Engineering Mathematics using Computer Algebra System (컴퓨터 대수 시스템을 이용한 이공계 수학용이러닝 시스템 개발)

Park, Hong-Joon;Jun, Young-Cook;Jang, Moon-Suk
- The KIPS Transactions:PartA
- v.14A no.6
- pp.383-390
- 2007
This paper describes an e-learning system for science and engineering mathematics that uses a computer algebra system and a Bayesian inference network. The main features of the system are its use of a recent authoring model for dynamic mathematical web content, called the client-independent dynamic web content authoring model, and its use of a Bayesian inference network to diagnose student learning. The authoring module, built on the computer algebra system, gives teachers an easy way to create dynamic mathematical web content. The diagnosis module, built on the Bayesian inference network, helps students identify the weaker parts of their learning; the system uses this diagnosis to determine appropriate next learning sequences and to provide supplementary learning feedback.
https://doi.org/10.3745/KIPSTA.2007.14-A.6.383

Ranking by Inductive Inference in Collaborative Filtering Systems (협력적 여과 시스템에서 귀납 추리를 이용한 순위 결정)

Ko, Su-Jeong
- Journal of KIISE:Software and Applications
- v.37 no.9
- pp.659-668
- 2010
Collaborative filtering systems capture the behavior of a new user and need new information about that user in order to recommend interesting items. To acquire this information, collaborative filtering systems learn user behavior from previous data and derive new information from the results. In this paper, we propose an inductive inference method that obtains new information about users and ranks items using that information. The proposed method clusters users into groups by learning from users with NMF, an inductive machine learning method, and selects group features using the chi-square statistic. The method then classifies a new user into a group with a Bayesian probability model, an inductive inference method, based on the new user's rating values and the group features. Finally, the method ranks items by applying the Rocchio algorithm to items with missing values.

Probabilistic based Web Contents Mining (확률 기반 웹 콘텐츠 마이닝)

Yun, Bo-Hyun;Cho, Kwang-Moon
- Proceedings of the Korea Contents Association Conference
- 2006.11a
- pp.16-20
- 2006
In Web content mining, it is important to recognize unlabeled entities and to integrate sub-linked information with the extracted results. This paper presents a probabilistic method that recognizes unlabeled entities using a Bayesian model. We also propose a method that uses information from sub-linked web pages and integrates the extracted results. The experimental results show that probabilistic entity recognition combined with information integration achieves the best precision.

A Probabilistic Method for Recognizing Unlabeled Text on Web Pages (웹페이지에서 레이블이 없는 텍스트 인식을 위한 확률 모델)

정창후;이민호;주원균;맹성현
- Proceedings of the Korean Information Science Society Conference
- 2003.10a
- pp.163-165
- 2003
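Domain knowledge uses the format and semantic information of text to help interpret the various meanings of text on the Web. However, when text carries no label expressing the meaning of its data, domain knowledge cannot perform text recognition properly and becomes useless. To solve this problem, this paper proposes an entity recognition model that can effectively infer the meaning of unlabeled text. The entity recognition model combines a Bayesian model with context information and decides which entity each text token of a structurally analyzed HTML document belongs to. Experiments confirmed that the model effectively recognizes text that previously went unrecognized for lack of a label.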

Automatic Text Categorization Using Term Information of Anchor Text (Anchor Text의 단어 정보를 이용한 자동 문서 범주화)

Heo, Hee-keun;Han, Gi-deok;Jung, Sung-won;Lim, Sung-shin;Kwon, Hyuk-chul
- Proceedings of the Korea Information Processing Society Conference
- 2004.05a
- pp.665-668
- 2004
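Recent web documents are expressed not only as text but also in other forms such as images and sound, so the share of text is decreasing. For documents from which it is hard to extract a sufficient number of words, existing categorization methods that rely only on word information cannot be expected to perform well. This paper therefore proposes a new automatic text categorization model based on judging the feature suitability of Anchor Text word information. A Bayesian probability model was used as the categorization model, and features were selected using the chi-square statistic. When the word features extracted from a document are judged insufficient to classify it, performance is improved by using the document's link information to incorporate the word features of linked documents and of the Anchor Text.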
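
Chi-square feature selection, as mentioned in the abstract above, can be sketched from a 2x2 contingency table of term presence against category membership. The abstract does not state the exact formula used, so this sketch assumes the commonly used form for text categorization; the function names and the toy data layout are hypothetical.

```python
def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one category.

    a: documents in the category that contain the term
    b: documents outside the category that contain the term
    c: documents in the category that lack the term
    d: documents outside the category that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of dependence
    return n * (a * d - c * b) ** 2 / denom


def select_features(docs, category, k):
    """docs: list of (category, set_of_terms). Return the k top-scoring terms."""
    vocab = set().union(*(terms for _, terms in docs))
    scores = {}
    for t in vocab:
        a = sum(1 for cat, terms in docs if cat == category and t in terms)
        b = sum(1 for cat, terms in docs if cat != category and t in terms)
        c = sum(1 for cat, terms in docs if cat == category and t not in terms)
        d = sum(1 for cat, terms in docs if cat != category and t not in terms)
        scores[t] = chi_square(a, b, c, d)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

Note that the statistic measures dependence in either direction, so a term that reliably marks the absence of a category scores as high as one that marks its presence.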

Automatic Document Categorization by the Importance of Features (자질 중요도 계산 기법에 의한 자동문서 범주화)

Lee, Kyung-Chan;Kang, Seung-Shik
- Proceedings of the Korean Information Science Society Conference
- 2003.04c
- pp.537-539
- 2003
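To select features for text categorization, it is common to choose the features that represent a category according to their occurrence frequency. Statistical techniques that select features by occurrence frequency overlook the importance of the terms that represent a document's content. This paper proposes a text categorization technique that selects category-representative terms by feature importance in the training and test documents, and compares it experimentally with feature selection by inverted category frequency and by the chi-square statistic. A naive Bayesian probability model was used as the categorization model, and data collected from a web directory was used for performance evaluation. The proposed importance-based feature selection performed better than selection by term occurrence frequency and by the chi-square statistic.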

Skin Color Region Segmentation using classified 3D skin (계층화된 3차원 피부색 모델을 이용한 피부색 분할)

Park, Gyeong-Mi;Yoon, Ga-Rim;Kim, Young-Bong
- Journal of the Korea Institute of Information and Communication Engineering
- v.14 no.8
- pp.1809-1818
- 2010
To detect skin-color areas in input images, much prior research has divided an image into pixels with a skin color and all other pixels. In still images or video, it is very difficult to extract skin pixels exactly because lighting conditions and makeup produce many variations of skin color. In this paper, we propose a method that improves performance on such difficult images by hierarchically merging a 3D skin color model with context information. We first build 3D color histogram distributions from skin-color pixels in many YCbCr color images and then divide the color space into three layers: a skin color region (Skin), a non-skin color region (Non-skin), and a skin color candidate region (Skinness). When segmenting the skin color region of an image, pixels in the Skin and Non-skin layers are assigned to the skin region and the non-skin region, respectively. If a pixel belongs to the Skinness layer, it is assigned to the skin region or the non-skin region according to the context information of its neighbors. The proposed method helps segment skin color regions efficiently from images that contain many distorted skin colors and colors similar to skin.
https://doi.org/10.6109/jkiice.2010.14.8.1809

