• Title/Summary/Keyword: bayesian classification

Model selection algorithm in Gaussian process regression for computer experiments

  • Lee, Youngsaeng;Park, Jeong-Soo
    • Communications for Statistical Applications and Methods
    • /
    • v.24 no.4
    • /
    • pp.383-396
    • /
    • 2017
  • The model in our approach assumes that computer responses are a realization of a Gaussian process superimposed on a regression model, called a Gaussian process regression model (GPRM). Selecting a subset of variables, or building a good reduced model, is an important step in classical regression for identifying the variables that influence the responses and for further analysis such as prediction or classification. One reason to select variables for prediction is to prevent over-fitting or under-fitting to the data. The same reasoning and approach are applicable to GPRM; however, only a few works have addressed variable selection in GPRM. In this paper, we propose a new algorithm to build a good prediction model among candidate GPRMs. It is a post-processing step for the algorithm that includes the Welch method suggested by previous researchers. The proposed algorithms select some non-zero regression coefficients ($\beta$'s) using forward and backward methods along with a Lasso-guided approach. During this process, the covariance parameters ($\theta$'s) pre-selected by the Welch algorithm are held fixed. We illustrate the superiority of our proposed models over the Welch method and non-selection models using four test functions and one real data example. Future extensions are also discussed.

A Study of using Emotional Features for Information Retrieval Systems (감정요소를 사용한 정보검색에 관한 연구)

  • Kim, Myung-Gwan;Park, Young-Tack
    • The KIPS Transactions:PartB
    • /
    • v.10B no.6
    • /
    • pp.579-586
    • /
    • 2003
  • In this paper, we propose a novel approach that applies emotional features to document retrieval systems. Fine-grained emotional features, such as HAPPY, SAD, ANGRY, FEAR, and DISGUST, are used to represent Korean documents, and users can use these features to retrieve documents. The retrieved documents are then learned by classification methods such as the cohesion factor, naive Bayesian, and k-nearest neighbor approaches, and a voting method is used to combine these approaches. In addition, k-means clustering is used in our experiments. Our approach proved more accurate than other methods, and performed better on short texts than on long documents.
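
The voting step mentioned in the abstract above can be illustrated with a minimal sketch. The abstract does not state which voting rule was used, so plain majority voting is assumed here; the function name `majority_vote`, the tie-breaking rule, and the sample predictions are all hypothetical.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier labels by majority vote.

    `predictions` is a list with one predicted label per classifier
    (e.g. cohesion factor, naive Bayes, k-NN). Ties are broken in favor
    of the label that appears earliest in the list (an assumption).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    # Return the first label that reaches the highest vote count.
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical outputs of three classifiers for one document.
combined = majority_vote(["HAPPY", "SAD", "HAPPY"])
```

With three classifiers, a label wins as soon as two of them agree; if all three disagree, this sketch falls back to the first classifier's label.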

Crowd Activity Recognition using Optical Flow Orientation Distribution

  • Kim, Jinpyung;Jang, Gyujin;Kim, Gyujin;Kim, Moon-Hyun
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.9 no.8
    • /
    • pp.2948-2963
    • /
    • 2015
  • In the field of computer vision, visual surveillance systems have recently become an important research topic. Growth in this area is being driven both by the increasing availability of inexpensive computing devices and image sensors and by the general inefficiency of manual surveillance and monitoring. In particular, the ultimate goal for many visual surveillance systems is to provide automatic activity recognition for events at a given site. A higher level of understanding of these activities requires certain lower-level computer vision tasks to be performed. In this paper, we propose an intelligent activity recognition model that uses a structure learning method and a classification method. The structure learning method is a K2-learning algorithm that generates Bayesian networks of causal relationships between sensors for a given activity. The statistical characteristics of the sensor values and the topological characteristics of the generated graphs are learned for each activity, and a neural network is then designed to classify the current activity according to the features extracted from the collected sensor values. Finally, the proposed method is implemented and tested using PETS2013 benchmark data.

Taxonomy and phylogeny of the genus Cryptomonas (Cryptophyceae, Cryptophyta) from Korea

  • Choi, Bomi;Son, Misun;Kim, Jong Im;Shin, Woongghi
    • ALGAE
    • /
    • v.28 no.4
    • /
    • pp.307-330
    • /
    • 2013
  • The genus Cryptomonas is easily recognized by its two flagella, greenish-brown color, and swaying behavior. Its members have a relatively simple morphology and few diagnostic characters, which makes it difficult to differentiate species within the genus. To understand species delineation and phylogenetic relationships among Cryptomonas species, the nuclear-encoded internal transcribed spacer 2 (ITS2), partial large subunit (LSU) and small subunit ribosomal DNA (rDNA), and chloroplast-encoded psbA and LSU rDNA sequences were determined and used for phylogenetic analyses with Bayesian and maximum likelihood methods. In addition, secondary structures were predicted from the nuclear-encoded ITS2 sequences and were used to determine nine species and four unidentified species from 47 strains. Sequences of helices I, II, and IIIb in the ITS2 secondary structure were very useful for identifying Cryptomonas species, whereas helix IV was the most variable region across species in the alignment. The phylogenetic tree showed that fourteen species were monophyletic. However, some strains of C. obovata had chloroplasts with a pyrenoid while others lacked one, even though the pyrenoid is used as a key character in a few species. Therefore, classification systems that depend solely on morphological characters are inadequate and require the use of molecular data.

Development of Medical Cost Prediction Model Based on the Machine Learning Algorithm (머신러닝 알고리즘 기반의 의료비 예측 모델 개발)

  • Han Bi KIM;Dong Hoon HAN
    • Journal of Korea Artificial Intelligence Association
    • /
    • v.1 no.1
    • /
    • pp.11-16
    • /
    • 2023
  • Accurate hospital case modeling and prediction are crucial for efficient healthcare. In this study, we demonstrate the implementation of regression analysis methods in machine learning systems using mathematical statistics and machine learning techniques. The developed machine learning model includes Bayesian linear, artificial neural network, decision tree, decision forest, and linear regression analysis models. Through the application of these algorithms, corresponding regression models were constructed and analyzed. The results suggest the potential of leveraging machine learning systems for medical research. The experiment aimed to create an Azure Machine Learning Studio tool for the speedy evaluation of multiple regression models. The tool facilitates the comparison of 5 types of regression models in a unified experiment and presents assessment results with performance metrics. Evaluation of the regression machine learning models highlighted the advantages of boosted decision tree regression and decision forest regression in hospital case prediction. These findings could lay the groundwork for the deliberate development of new directions in medical data processing and decision making. Furthermore, potential avenues for future research may include exploring methods such as clustering, classification, and anomaly detection in healthcare systems.

Segmentation Method of Overlapped nuclei in FISH Image (FISH 세포영상에서의 군집세포 분할 기법)

  • Jeong, Mi-Ra;Ko, Byoung-Chul;Nam, Jae-Yeal
    • The KIPS Transactions:PartB
    • /
    • v.16B no.2
    • /
    • pp.131-140
    • /
    • 2009
  • This paper presents a new algorithm for the segmentation of FISH images. First, to segment the cell nuclei from the background, a threshold is estimated by using a Gaussian mixture model and maximizing the likelihood function of the gray values of the cell images. After nuclei segmentation, overlapped nuclei and isolated nuclei need to be classified for exact nuclei analysis. For nuclei classification, this paper extracts morphological features of the nuclei, such as compactness, smoothness, and moments, from training data. Three probability density functions are generated from these features and are applied to the proposed Bayesian networks as evidence. After nuclei classification, overlapped nuclei must be segmented into isolated nuclei. This paper first performs an intensity gradient transform and the watershed algorithm to segment overlapped nuclei. The proposed stepwise merging strategy is then applied to merge the resulting fragments into the major nucleus. The experimental results using FISH images show that our system can indeed improve segmentation performance compared to previous research, since we perform nuclei classification before separating overlapped nuclei.

Semantic Topic Selection Method of Document for Classification (문서분류를 위한 의미적 주제선정방법)

  • Ko, Kwang-Sup;Kim, Pan-Koo;Lee, Chang-Hoon;Hwang, Myung-Gwon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.1
    • /
    • pp.163-172
    • /
    • 2007
  • The web, as a global network, includes text documents, video, sound, and other media, and connects distributed information through links. As the web has developed, it has accumulated abundant information, most of which is text-based documents. Most users use the web to retrieve the information they want, so numerous studies have sought to retrieve text documents using methods such as probability, statistics, vector similarity, and Bayesian approaches. These studies, however, could not consider both the subject and the semantics of documents, so users have to search again by hand. Korean documents are especially hard to find because research on Korean document classification is insufficient. To overcome these problems, we propose a Korean document classification method for semantic retrieval. The method first extracts the TF value and RV value of the concepts included in a document, and maps them onto U-WIN, a Korean vocabulary dictionary, to select the topic of the document. The method can classify documents semantically, and experiments demonstrated its efficiency.

Classification and Analysis of Data Mining Algorithms (데이터마이닝 알고리즘의 분류 및 분석)

  • Lee, Jung-Won;Kim, Ho-Sook;Choi, Ji-Young;Kim, Hyon-Hee;Yong, Hwan-Seung;Lee, Sang-Ho;Park, Seung-Soo
    • Journal of KIISE:Databases
    • /
    • v.28 no.3
    • /
    • pp.279-300
    • /
    • 2001
  • Data mining plays an important role in the knowledge discovery process, and various existing algorithms are usually selected for the specific purpose of the mining. Currently, data mining techniques are actively applied to statistics, business, electronic commerce, biology, and medicine, and numerous algorithms are being researched and developed for these applications. In the long run, however, only a few algorithms that are well suited to specific applications and perform well on large databases will survive, so it is reasonable to focus future effort on those selected algorithms. This paper classifies about 30 existing algorithms into 7 categories: association rule, clustering, neural network, decision tree, genetic algorithm, memory-based reasoning, and Bayesian network. First, this work analyzes the systematic hierarchy and characteristics of the algorithms; we then present 14 criteria for classifying the algorithms and the results based on these criteria. Finally, we propose the best algorithms among comparable algorithms with different features and performance. The results of this paper can be used as a guideline for data mining research as well as for field applications of data mining.

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah;Song, Min
    • Journal of Intelligence and Information Systems
    • /
    • v.18 no.3
    • /
    • pp.53-77
    • /
    • 2012
  • This study analyzes the differences in contents and tones of arguments among three major Korean newspapers: the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo. It is commonly accepted that newspapers in Korea explicitly deliver their own tone of argument when they cover sensitive issues and topics. It could be problematic if readers read the news without being aware of the tone of argument, because the contents and the tones of arguments can easily affect readers. It is therefore desirable to have a tool that can inform readers of the tone of argument a newspaper has. This study presents the results of clustering and classification techniques as part of text mining analysis. We focus on six main subjects in newspapers, namely Culture, Politics, International, Editorial-opinion, Eco-business, and National issues, and attempt to identify differences and similarities among the newspapers. The basic unit of text mining analysis is a paragraph of a news article. This study uses a keyword-network analysis tool and visualizes relationships among keywords to make the differences easier to see. Newspaper articles were gathered from KINDS, the Korean integrated news database system, which preserves news articles of the Kyunghyang Shinmun, the HanKyoreh, and the Dong-A Ilbo and makes them open to the public. About 3,030 articles from 2008 to 2012 were used. The International, National issues, and Politics sections were gathered for specific issues: the International section was collected with the keyword 'Nuclear weapon of North Korea,' the National issues section with the keyword '4-major-river,' and the Politics section with the keyword 'Tonghap-Jinbo Dang.' All articles from April 2012 to May 2012 in the Eco-business, Culture, and Editorial-opinion sections were also collected. All of the collected data were edited into paragraphs, and stop-words were removed using the Lucene Korean Module. We calculated keyword co-occurrence counts from the paired co-occurrence list of keywords in a paragraph and built a co-occurrence matrix from the list. Once the co-occurrence matrix was built, we used the cosine coefficient matrix as input for PFNet (Pathfinder Network). To analyze the three newspapers and find the significant keywords in each paper, we analyzed the list of the 10 highest-frequency keywords and the keyword networks of the 20 highest-frequency keywords, to closely examine the relationships and show a detailed network map among keywords. We used NodeXL software to visualize the PFNet. After drawing all the networks, we compared the results with the classification results. Classification was first carried out to identify how the tone of argument of one newspaper differs from the others. Then, to analyze tones of arguments, all the paragraphs were divided into two types of tone, Positive and Negative. A supervised learning technique was used to identify and classify the tones of all collected paragraphs and articles: the Naïve Bayesian classifier algorithm provided in the MALLET package was used to classify all the paragraphs in the articles. After classification, Precision, Recall, and F-value were used to evaluate the classification results. Based on the results of this study, three subjects (Culture, Eco-business, and Politics) showed some differences in contents and tones of arguments among the three newspapers. In addition, for National issues, the tones of arguments on the 4-major-rivers project differed from one another. It seems that the three newspapers have their own specific tones of argument in those sections. The keyword networks also showed different shapes for the same period and the same section, which means that the frequently appearing keywords in the articles differ and that their contents are composed of different keywords. The Positive-Negative classification showed the possibility of distinguishing a newspaper's tone of argument from that of others. These results indicate that the approach in this study is promising and can be extended into a new tool for identifying the different tones of arguments of newspapers.
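
The co-occurrence counting and cosine coefficient steps described in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact counting rule, so each unordered keyword pair is counted once per paragraph, and the cosine coefficient is taken between rows of the co-occurrence matrix. The function names and the sample keywords are hypothetical, and the PFNet step itself is not shown.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cooccurrence_counts(paragraphs):
    """Count how many paragraphs contain each unordered keyword pair."""
    counts = Counter()
    for keywords in paragraphs:
        # Deduplicate within a paragraph and sort so (a, b) is canonical.
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    return counts


def cooccurrence_matrix(counts, vocab):
    """Build a symmetric co-occurrence matrix ordered by `vocab`."""
    index = {word: i for i, word in enumerate(vocab)}
    matrix = [[0] * len(vocab) for _ in vocab]
    for (a, b), n in counts.items():
        i, j = index[a], index[b]
        matrix[i][j] = n
        matrix[j][i] = n
    return matrix


def cosine(u, v):
    """Cosine coefficient between two rows; 0.0 if either row is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = sqrt(sum(x * x for x in u))
    norm_v = sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical keyword lists, one per paragraph, after stop-word removal.
paragraphs = [["river", "dam", "budget"], ["river", "dam"], ["budget", "tax"]]
vocab = ["budget", "dam", "river", "tax"]
counts = cooccurrence_counts(paragraphs)
matrix = cooccurrence_matrix(counts, vocab)
similarity = cosine(matrix[1], matrix[2])  # "dam" vs "river"
```

Keywords that co-occur with similar sets of other keywords get a high cosine coefficient even if they rarely appear together, which is one reason to feed a similarity matrix rather than raw counts into a network-pruning method.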

Construction of Multiple Classifier Systems based on a Classifiers Pool (인식기 풀 기반의 다수 인식기 시스템 구축방법)

  • Kang, Hee-Joong
    • Journal of KIISE:Software and Applications
    • /
    • v.29 no.8
    • /
    • pp.595-603
    • /
    • 2002
  • Only a few studies have been conducted on how to select multiple classifiers from a pool of available classifiers to achieve good classification performance. Thus, the classifier selection problem, that is, how to select classifiers and how many to select, remains an important research issue. In this paper, provided that the number of selected classifiers is constrained in advance, a variety of selection criteria are proposed and applied to the construction of multiple classifier systems, and these selection criteria are then evaluated by the performance of the constructed multiple classifier systems. All possible sets of classifiers are examined by the selection criteria, and some of these sets are selected as candidates for multiple classifier systems. The multiple classifier system candidates were evaluated in experiments on recognizing unconstrained handwritten numerals obtained from both Concordia University and the UCI machine learning repository. Among the selection criteria, the multiple classifier system candidates chosen by the information-theoretic selection criteria based on conditional entropy showed more promising results than those chosen by the other selection criteria.
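
A selection criterion based on conditional entropy, as mentioned in the abstract above, might look like the following sketch. The abstract does not define its criteria, so this assumes one plausible reading: among all subsets of a fixed size k, pick the one whose joint decisions leave the least uncertainty about the true class, using an empirical estimate of H(class | decisions). The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import log2


def conditional_entropy(labels, decisions):
    """Empirical H(label | decision) in bits from paired samples."""
    n = len(labels)
    joint = Counter(zip(decisions, labels))
    marginal = Counter(decisions)
    entropy = 0.0
    for (decision, _label), count in joint.items():
        # p(d, y) * log2 p(y | d), accumulated with a minus sign.
        entropy -= (count / n) * log2(count / marginal[decision])
    return entropy


def select_classifiers(labels, outputs, k):
    """Exhaustively pick the k classifiers with minimum conditional entropy.

    `outputs` maps a classifier name to its list of predicted labels,
    aligned with `labels`. Returns (names, entropy); ties keep the
    first subset in sorted-name order.
    """
    best = None
    for subset in combinations(sorted(outputs), k):
        # One joint decision tuple per sample.
        decisions = list(zip(*(outputs[name] for name in subset)))
        entropy = conditional_entropy(labels, decisions)
        if best is None or entropy < best[1]:
            best = (subset, entropy)
    return best


# Toy data: A and B together identify every class; C is uninformative.
labels = [0, 1, 2, 3]
outputs = {"A": [0, 0, 1, 1], "B": [0, 1, 0, 1], "C": [0, 0, 0, 0]}
chosen, entropy = select_classifiers(labels, outputs, k=2)
```

Exhaustive enumeration matches the abstract's statement that all possible sets are examined, but its cost grows combinatorially with the pool size, so it is only practical for small pools.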