• Title/Summary/Keyword: Classifier algorithm

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.1-13
    • /
    • 2015
  • As opinion mining in big data applications has gained attention, a great deal of research on unstructured data has been conducted. Social media on the Internet generate unstructured or semi-structured data every second, and these data are often written in the natural human languages we use in daily life. Many words in human languages have multiple meanings or senses. As a result, it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, which yields incorrect results that are far from users' intentions. Even though much progress has been made in recent years in enhancing search engine performance to provide users with appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method that automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, thereby avoiding expensive sense-tagging processes. It evaluates the effectiveness of the method with a Naïve Bayes model, a supervised learning algorithm, using the Korean standard unabridged dictionary and the Sejong Corpus. The Korean standard unabridged dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. For the experiment, the Korean standard unabridged dictionary and the Sejong Corpus were evaluated both in combination and as separate entities using cross validation. Only nouns, the target subjects in word sense disambiguation, were selected. 
93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the Korean standard unabridged dictionary because the Sejong Corpus was tagged with sense indices defined by that dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named entity dictionary of a Korean morphological analyzer. Using the extended named entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model built during the pre-processing stage, the sense-tagged terms were determined by vector space model based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from examples in the Korean standard unabridged dictionary and the Sejong Corpus. The experiment shows that better precision and recall are obtained with the merged corpus. This study suggests that the method can practically enhance the performance of Internet search engines and help capture the meaning of a sentence more accurately in natural language processing tasks pertinent to search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem. The Naïve Bayes classifier assumes that all senses are independent. Even though this assumption is not realistic and ignores the correlation between attributes, the Naïve Bayes classifier is widely used because of its simplicity, and in practice it is known to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of all senses in a sentence. 
Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
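
The Naïve Bayes step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the English toy tokens, the sense labels, the add-one smoothing, and the function names `train_nb` and `predict_sense` are all assumptions made for illustration. Each training example pairs a sense label with the context words of a dictionary example sentence.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Count senses and per-sense context words.

    examples: list of (sense_label, context_tokens) pairs (toy data here).
    """
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for sense, tokens in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(tokens)
        vocab.update(tokens)
    return sense_counts, word_counts, vocab


def predict_sense(model, tokens):
    """Return the sense with the highest log posterior under Naive Bayes."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, float("-inf")
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior of the sense
        denom = sum(word_counts[sense].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # words never seen in training are skipped
                # add-one (Laplace) smoothing, an assumption of this sketch
                score += math.log((word_counts[sense][tok] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical sense-tagged example sentences for the ambiguous noun "bank".
examples = [
    ("bank/finance", ["deposit", "money", "account"]),
    ("bank/finance", ["loan", "interest", "money"]),
    ("bank/river", ["river", "water", "fishing"]),
    ("bank/river", ["muddy", "river", "shore"]),
]
model = train_nb(examples)
print(predict_sense(model, ["money", "loan"]))
print(predict_sense(model, ["river", "water"]))
```

Working in log space avoids numerical underflow when a context contains many words, which is a common design choice for Naïve Bayes text classifiers.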

A method of background noise removal of Raman spectra for classification of liver disease (간 질병 분류를 위한 라만 스펙트럼의 배경 잡음 제거 방법)

  • Park, Aaron;Baek, Sung-June
    • Smart Media Journal
    • /
    • v.2 no.2
    • /
    • pp.33-38
    • /
    • 2013
  • In this paper, we investigated baseline estimation methods for removing background noise from Raman spectra of acute alcohol liver injury and acute ethanol-induced chronic liver fibrosis. For the baseline estimation, we applied the first derivative, linear programming, and rolling ball methods. The optimal input parameters of each method were determined by the training rate of a MAP (maximum a posteriori probability) classifier. According to the experimental results, classification after baseline estimation with the rolling ball algorithm gave a rate of about 89.4%, a very promising result for the classification of acute alcohol liver injury and acute ethanol-induced chronic liver fibrosis. These results confirm that choosing appropriate baseline estimation methods and parameters affects classification performance.
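
As an illustration of rolling ball baseline estimation, the sketch below uses one common formulation: a windowed minimum, then a windowed maximum of those minima, then a moving average. The abstract does not specify the variant or parameters the authors used, so the function names, the window parameters `half_width` and `smooth_half_width`, and the toy spectrum are assumptions.

```python
def _window(values, i, half_width):
    """Slice of values centered on index i, clipped at the array edges."""
    lo = max(0, i - half_width)
    hi = min(len(values), i + half_width + 1)
    return values[lo:hi]


def rolling_ball_baseline(spectrum, half_width, smooth_half_width):
    """Estimate a slowly varying baseline under narrow peaks."""
    n = len(spectrum)
    # Step 1: windowed minimum removes peaks narrower than the window.
    minima = [min(_window(spectrum, i, half_width)) for i in range(n)]
    # Step 2: windowed maximum of the minima restores the background level.
    maxima = [max(_window(minima, i, half_width)) for i in range(n)]
    # Step 3: moving average smooths the resulting step-like curve.
    baseline = []
    for i in range(n):
        w = _window(maxima, i, smooth_half_width)
        baseline.append(sum(w) / len(w))
    return baseline


def remove_baseline(spectrum, half_width=3, smooth_half_width=3):
    """Subtract the estimated baseline from the spectrum."""
    baseline = rolling_ball_baseline(spectrum, half_width, smooth_half_width)
    return [s - b for s, b in zip(spectrum, baseline)]


# Toy spectrum: flat background of 2.0 with one narrow peak at index 10.
spectrum = [2.0] * 20
spectrum[10] = 12.0
corrected = remove_baseline(spectrum)
print(corrected[10], corrected[0])
```

The window width plays the role of the tunable input parameter: it must be wider than the peaks to be preserved and narrower than the background variation to be removed.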

Robust Person Identification Using Optimal Reliability in Audio-Visual Information Fusion

  • Tariquzzaman, Md.;Kim, Jin-Young;Na, Seung-You;Choi, Seung-Ho
    • The Journal of the Acoustical Society of Korea
    • /
    • v.28 no.3E
    • /
    • pp.109-117
    • /
    • 2009
  • Reliable identity recognition in real environments is a key issue in human computer interaction (HCI). In this paper, we present a robust person identification system that uses a score-based optimal reliability measure of the audio and visual modalities. We propose an extension of the modified reliability function by introducing optimizing parameters for both the audio and visual modalities. To degrade the visual signals, we applied JPEG compression to the test images. In addition, to create a mismatch between the enrollment and test sessions, acoustic Babble noise and artificial illumination were added to the test audio and visual signals, respectively. Local PCA was used on both modalities to reduce the dimension of the feature vector. We applied a swarm intelligence algorithm, particle swarm optimization, to optimize the modified convection function's optimizing parameters. The overall person identification experiments were performed using the VidTimit DB. Experimental results show that the proposed optimal reliability measures improved identification accuracy by 7.73% and 8.18% under different illumination directions on the visual signal and the corresponding Babble noise on the audio signal, respectively, compared with the best classifier in the fusion system, while maintaining the modality reliability statistics in terms of performance; this verifies the consistency of the proposed extension.
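
The general idea of reliability-weighted score fusion can be sketched as follows. This is only an illustration under stated assumptions: reliability is taken here as the margin between the best and second-best score of a modality, which is a simple stand-in and not the modified reliability function or the particle swarm optimization described in the abstract. The names `reliability` and `fuse` and all scores are hypothetical.

```python
def reliability(scores):
    """Margin between the best and second-best score of one modality.

    A large margin suggests the modality is confident; a small margin
    suggests a degraded signal. This measure is an assumption of the sketch.
    """
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] - ranked[1]


def fuse(audio_scores, visual_scores):
    """Combine per-person scores with reliability-proportional weights."""
    r_a = reliability(audio_scores)
    r_v = reliability(visual_scores)
    total = r_a + r_v
    w_a = 0.5 if total == 0 else r_a / total  # equal weights if both are flat
    w_v = 1.0 - w_a
    fused = {
        person: w_a * audio_scores[person] + w_v * visual_scores[person]
        for person in audio_scores
    }
    return max(fused, key=fused.get), fused


# Hypothetical scores: noisy audio is nearly undecided, video is confident.
audio = {"alice": 0.40, "bob": 0.38, "carol": 0.22}
visual = {"alice": 0.10, "bob": 0.80, "carol": 0.10}
identity, fused_scores = fuse(audio, visual)
print(identity)
```

In this toy case the audio modality alone would pick "alice" by a narrow margin, while the fused decision follows the more reliable visual modality.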

Multiple Instance Mamdani Fuzzy Inference

  • Khalifa, Amine B.;Frigui, Hichem
    • International Journal of Fuzzy Logic and Intelligent Systems
    • /
    • v.15 no.4
    • /
    • pp.217-231
    • /
    • 2015
  • A novel fuzzy learning framework that employs fuzzy inference to solve the problem of Multiple Instance Learning (MIL) is presented. The framework introduces a new class of fuzzy inference systems called Multiple Instance Mamdani Fuzzy Inference Systems (MI-Mamdani). In multiple instance problems, the training data is ambiguously labeled. Instances are grouped into bags; the labels of the bags are known, but not those of individual instances. MIL deals with learning a classifier at the bag level. Over the years, many solutions to this problem have been proposed. However, no MIL formulation employing fuzzy inference exists in the literature. Fuzzy logic is powerful at modeling knowledge uncertainty and measurement imprecision. It is one of the best frameworks to model vagueness. However, in addition to uncertainty and imprecision, there is a third vagueness concept that fuzzy logic does not yet address well. This vagueness concept is due to the ambiguity that arises when the data have multiple forms of expression, as is the case for multiple instance problems. In this paper, we introduce multiple instance fuzzy logic that enables fuzzy reasoning with bags of instances. Accordingly, an MI-Mamdani system that extends the standard Mamdani inference system to compute with multiple instances is introduced. The proposed framework is tested and validated using a synthetic dataset suitable for MIL problems. Additionally, we apply the proposed multiple instance inference to fuse the output of multiple discrimination algorithms for the purpose of landmine detection using Ground Penetrating Radar.

Deep Learning based Scrapbox Accumulated Status Measuring

  • Seo, Ye-In;Jeong, Eui-Han;Kim, Dong-Ju
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.3
    • /
    • pp.27-32
    • /
    • 2020
  • In this paper, we propose an algorithm to measure the accumulated status of scrap boxes in which metal scraps are collected. Measuring the accumulated status is defined as a multi-class classification problem, and the deep learning method classifies the accumulated status using only the scrap box image. Training was conducted with transfer learning, and the deep learning model was NASNet-A. To improve the accuracy of the model, we combined a Random Forest classifier with the trained NASNet-A and improved the model through post-processing. Testing with 4,195 samples collected in the field showed 55% accuracy when only NASNet-A was applied, and the proposed method, NASNet with Random Forest, improved the accuracy to 88%.

Stress Identification and Analysis using Observed Heart Beat Data from Smart HRM Sensor Device

  • Pramanta, SPL Aditya;Kim, Myonghee;Park, Man-Gon
    • Journal of Korea Multimedia Society
    • /
    • v.20 no.8
    • /
    • pp.1395-1405
    • /
    • 2017
  • In this paper, we analyze heart beat data to identify subjects' stress state (binary) using heart rate variability (HRV) features extracted from the subjects' heart beat data, and implement supervised machine learning techniques to create a mental stress classifier. Four steps need to be completed before performance measurement: data acquisition, data processing (HRV analysis), feature selection, and machine learning. The HRV analysis module generates 56 features, several of which are selected (using our own algorithm) after computing the Pearson correlation matrix (p-values). The selected features are compared with the full feature set by model error after training with several machine learning techniques: support vector machine, decision tree, and discriminant analysis. The SVM and decision tree models using the selected features show results close to those using all recorded features, with only a 1% difference. Meanwhile, the discriminant analysis differs by about 5%. All the machine learning methods used in this work reach a maximum average accuracy of 90%.
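
Correlation-based feature selection of the kind mentioned in this abstract can be sketched as below. The authors' own selection algorithm is not described, so this sketch substitutes a simple rule: keep features whose absolute Pearson correlation with the binary stress label meets a threshold. The threshold, the feature values, and the function names are assumptions; `mean_rr` and `sdnn` are used only as example feature names.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant sequence carries no correlation information
    return cov / (sx * sy)


def select_features(features, labels, threshold=0.5):
    """Keep feature names whose |r| with the label is at least threshold."""
    return [
        name
        for name, values in features.items()
        if abs(pearson(values, labels)) >= threshold
    ]


# Hypothetical HRV feature values for four recordings (0 = calm, 1 = stressed).
features = {
    "mean_rr": [0.90, 0.85, 0.60, 0.55],
    "sdnn": [50, 48, 30, 28],
    "noise": [1, 2, 1, 2],
}
labels = [0, 0, 1, 1]
print(select_features(features, labels))
```

A selected subset can then be compared against the full feature set by training the same classifier on both, as the abstract describes.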

Skin Pigment Recognition using Projective Hemoglobin-Melanin Coordinate Measurements

  • Yang, Liu;Lee, Suk-Hwan;Kwon, Seong-Geun;Song, Ha-Joo;Kwon, Ki-Ryong
    • Journal of Electrical Engineering and Technology
    • /
    • v.11 no.6
    • /
    • pp.1825-1838
    • /
    • 2016
  • The detection of skin pigment is crucial in the diagnosis of skin diseases and in the evaluation of medical cosmetics and hairdressing. Accurate detection is a basis for the prompt cure of skin diseases. This study presents a method to recognize and measure human skin pigment using Hemoglobin-Melanin (HM) coordinates. The proposed method extracts the skin area through a Gaussian skin-color model estimated from statistical analysis and decomposes the skin area into the two pigments, hemoglobin and melanin, using an Independent Component Analysis (ICA) algorithm. Then, we divide the two-dimensional (2D) HM coordinate space into rectangular bins and compute the location histograms of hemoglobin and melanin for all the bins. We label each bin as hemoglobin, melanin, or normal skin according to a Bayesian classifier. These bin-based HM projective histograms can quantify the skin pigment and compute the standard deviation of the total quantification of skin pigments surrounding normal skin. We tested our scheme using images taken under different illumination conditions. Several cosmetic coverings were used to test the performance of the proposed method. The experimental results show that the proposed method detects skin pigments more accurately and evaluates cosmetic covering effects more effectively than conventional methods.
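
The binning step, dividing the 2D HM coordinate space into rectangular bins and counting points per bin, can be sketched as follows. The bin size, the sample coordinates, and the function name `bin_histogram` are assumptions; the ICA decomposition and the Bayesian labeling of bins are not covered here.

```python
from collections import Counter


def bin_histogram(points, bin_size):
    """Count (hemoglobin, melanin) coordinate pairs per square bin.

    Returns a Counter keyed by (hemoglobin_bin, melanin_bin) indices.
    """
    hist = Counter()
    for h, m in points:
        key = (int(h // bin_size), int(m // bin_size))
        hist[key] += 1
    return hist


# Hypothetical HM coordinates for four skin pixels, each in [0, 1).
points = [(0.10, 0.20), (0.15, 0.25), (0.90, 0.10), (0.55, 0.95)]
hist = bin_histogram(points, bin_size=0.5)
print(dict(hist))
```

Each bin count can then serve as the observation that a per-bin classifier labels as hemoglobin, melanin, or normal skin.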

Real-time Sign Language Recognition Using an Armband with EMG and IMU Sensors (근전도와 관성센서가 내장된 암밴드를 이용한 실시간 수화 인식)

  • Kim, Seongjung;Lee, Hansoo;Kim, Jongman;Ahn, Soonjae;Kim, Youngho
    • Journal of rehabilitation welfare engineering & assistive technology
    • /
    • v.10 no.4
    • /
    • pp.329-336
    • /
    • 2016
  • Deaf people who use sign language experience social inequalities and financial losses due to communication restrictions. In this paper, a real-time pattern recognition algorithm was applied to distinguish American Sign Language using an armband sensor (8-channel EMG sensors and one IMU) to enable communication between deaf and hearing people. The validation test was carried out with 11 people. The pattern classifier was trained while gradually increasing the size of the training database. Results showed that the recognition accuracy was over 97% with 20 training samples and over 99% with 30 training samples. The present study shows that sign language recognition using an armband sensor is convenient and performs well.

Classification of Forest Vegetation Zone over Southern Part of Korean Peninsula Using Geographic Information Systems (環境因子의 空間分析을 통한 南韓지역의 山林植生帶 구분/지리정보시스템(GIS)에 의한 접근)

  • Lee, Kyu-Sung;Lee, Byong-Chun;Shin, Joon Hwan
    • The Korean Journal of Ecology
    • /
    • v.19 no.5
    • /
    • pp.465-476
    • /
    • 1996
  • Several environmental variables may influence the spatial distribution of forest vegetation. To create a map of forest vegetation zones over the southern part of the Korean Peninsula, digital map layers were produced for each environmental variable, including topography, geographic location, and climate. In addition, an extensive set of field survey data was collected at relatively undisturbed forests and entered into the GIS database with the exact coordinates of the survey sites. Preliminary statistical analysis of the survey data showed that the environmental variables differed significantly among the five previously defined forest vegetation zones. The six digital map layers representing environmental variables were classified by a supervised classifier using training statistics from the field survey data and by a clustering algorithm. Although the maps from the two classifiers differed somewhat because of the classification procedure applied, both showed the overall patterns of vertical and horizontal distribution of forest zones. Considering the spatial content of many ecological studies, GIS can be used as an important tool to manage and analyze spatial data. This study focuses on the generation of the digital maps and the analysis procedure rather than on the resulting map of forest vegetation zones.

Feature extraction and Classification of EEG for BCI system

  • Kim, Eung-Soo;Cho, Han-Bum;Yang, Eun-Joo;Eum, Tae-Wan
    • Proceedings of the Korean Institute of Intelligent Systems Conference
    • /
    • 2003.09a
    • /
    • pp.260-263
    • /
    • 2003
  • EEG is an electrical signal that occurs during information processing in the brain. EEG signals have been used clinically, but nowadays they are mainly studied for Brain-Computer Interfaces (BCI), such as interfacing with a computer or controlling a machine through the EEG. The ultimate purpose of BCI research is to characterize the EEG in various mental states so as to control computers and machines. A BCI has to perform two tasks: the parameter estimation task, which attempts to describe the properties of the EEG signal, and the classification task, which separates different EEG patterns based on the estimated parameters. First, parameter estimation of the EEG is needed to build a BCI system. It is important for improving classifier performance, but it is not easy because the EEG is sensitive and subject to various influences. Therefore, this research performs parameter estimation and classification of the EEG using various analysis algorithms.
